!wget https://s3-us-west-1.amazonaws.com/udacity-aind/dog-project/dogImages.zip
!unzip -qo dogImages.zip
!rm dogImages.zip
Dog Breed Classification
This example is based on a very popular Udacity project, upgraded to use TensorFlow 2.3.0 with GPU acceleration. The goal is to classify images of dogs according to their breed.
In this notebook, we take the first steps towards developing an algorithm that could be used as part of a mobile or web app. At the end of this project, our code will accept any user-supplied image as input. If a dog is detected in the image, it will provide an estimate of the dog’s breed. In this real-world setting, we will piece together a series of state-of-the-art computer vision models to perform different tasks (Dog detection -> Breed classification).
Table of contents
We break the notebook into separate steps. Feel free to use the links below to navigate the notebook.
- Step 0: Download Datasets and Install Dependencies
- Step 1: Import Datasets
- Step 2: Detect Dogs
- Step 3: Create a CNN to Classify Dog Breeds (from Scratch)
- Step 4: Create a CNN (VGG16) to Classify Dog Breeds (using Transfer Learning)
- Step 5: Create a CNN (ResNet-50) to Classify Dog Breeds (using Transfer Learning)
- Step 6: Write your Own Dog Classifier
- Step 7: Test Your Classifier
## Step 0: Download Datasets and Install Dependencies
For this task we use TensorFlow 2.3.0, as well as a few helper libraries like Pillow. Also, we need to download and extract the dataset we will use to train our classifier.
Download the dataset
The dataset is available here. The following cell downloads it as a zip file, extracts it and moves it to the corresponding folder. Finally, it removes the zip file.
Install dependencies
The task requires Pillow, the friendly PIL fork by Alex Clark and Contributors. PIL is the Python Imaging Library by Fredrik Lundh and Contributors. Below, we install the Pillow package using pip.
!pip3 install --user -r requirements/requirements-v2.txt
Import the necessary libraries
We use TensorFlow 2.3.0 to build and train our dog breed classifier. We also need Pillow to load the images into memory, and we specifically instruct it to load truncated images as well.
import os
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from glob import glob
from PIL import Image
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True
print(f"Version of TensorFlow in use: {tf.__version__}")
print(f"Using GPU device: {tf.config.list_physical_devices('GPU')}")
First, let us define the pipeline-parameters cell. We use it to define the hyperparameters we would like to tune later. These variables will be converted to KFP pipeline parameters, so we should make sure they are used as global variables throughout the notebook.
LR = 6e-4
BATCH_SIZE = 32
NUMBER_OF_NODES = 256
EPOCHS = 4
IMG_SIZE = 224
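Throughout the notebook these variables are wrapped in int(...) or float(...) at the point of use. One plausible reason, stated here as an assumption rather than something the notebook confirms, is that pipeline parameters may not arrive with the numeric type they have in this cell (for example, they may arrive as strings). The sketch below illustrates the casts with hypothetical string inputs; the names raw_params and cast_params are made up for this example.

```python
# Hypothetical raw values, as they might look if parameters arrived as strings.
raw_params = {
    "LR": "6e-4",
    "BATCH_SIZE": "32",
    "NUMBER_OF_NODES": "256",
    "EPOCHS": "4",
    "IMG_SIZE": "224",
}


def cast_params(raw):
    """Cast each parameter to the numeric type the notebook code expects."""
    return {
        "LR": float(raw["LR"]),
        "BATCH_SIZE": int(raw["BATCH_SIZE"]),
        "NUMBER_OF_NODES": int(raw["NUMBER_OF_NODES"]),
        "EPOCHS": int(raw["EPOCHS"]),
        "IMG_SIZE": int(raw["IMG_SIZE"]),
    }


params = cast_params(raw_params)
print(params)
```

Casting at the point of use keeps each cell robust regardless of how the value was supplied.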
Processing the Dataset
We use TensorFlow native generators to load and transform the data. Pay attention to the training generator (get_train_generator), which also includes several transformations to augment our dataset (e.g. width and height shift, brightness alterations and horizontal flip). These transformations take place in memory, leaving the original data untouched.
def get_train_generator():
    # Training data: rescale to [0, 1] and apply random augmentations
    data_datagen = ImageDataGenerator(
        rescale=1./255,
        width_shift_range=.2,
        height_shift_range=.2,
        brightness_range=[0.5,1.5],
        horizontal_flip=True
    )
    return data_datagen.flow_from_directory(
        "dogImages/train/",
        target_size=(int(IMG_SIZE), int(IMG_SIZE)),
        batch_size=int(BATCH_SIZE),
    )

def get_valid_generator():
    # Validation data: rescale only, no augmentation
    data_datagen = ImageDataGenerator(rescale=1./255)
    return data_datagen.flow_from_directory(
        "dogImages/valid/",
        target_size=(int(IMG_SIZE), int(IMG_SIZE)),
        batch_size=int(BATCH_SIZE)
    )

def get_test_generator():
    # Test data: rescale only, no augmentation
    data_datagen = ImageDataGenerator(rescale=1./255)
    return data_datagen.flow_from_directory(
        "dogImages/test/",
        target_size=(int(IMG_SIZE), int(IMG_SIZE)),
        batch_size=int(BATCH_SIZE)
    )
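To make the idea of in-memory augmentation concrete, here is a minimal NumPy sketch. It is a simplified stand-in and not how ImageDataGenerator implements these transformations: brightness is modeled as a plain multiplication and the flip as an array reversal. The function name augment and the toy image are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy 4x4 RGB "image" with values already in [0, 1]
original = rng.random((4, 4, 3))


def augment(image, brightness=1.2, flip=True):
    """Return a brightened, optionally mirrored copy of the image."""
    out = np.clip(image * brightness, 0.0, 1.0)  # new array, input untouched
    if flip:
        out = out[:, ::-1, :]  # reverse the width axis (horizontal flip)
    return out


snapshot = original.copy()
augmented = augment(original)

# The source array is unchanged; only the returned copy differs.
print(np.array_equal(original, snapshot))
print(augmented.shape)
```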
In this section, we use a pre-trained ResNet-50 V2 model to detect dogs in images. First, we download a ResNet-50 V2 model pretrained on ImageNet, a very large, very popular dataset used for image classification and other computer vision tasks. ImageNet contains over 10 million URLs, each linking to an image containing an object from one of 1000 categories. Given an image, this pre-trained model returns a prediction (derived from the available categories in ImageNet) for the object that is contained in the image.
dog_classifier = tf.keras.applications.ResNet50V2(
    weights="imagenet",
    input_shape=(int(IMG_SIZE), int(IMG_SIZE), 3)
)
The categories corresponding to dogs appear in an uninterrupted sequence referring to dictionary keys 151-268, inclusive, to include all categories from ‘Chihuahua’ to ‘Mexican hairless’. Thus, in order to check if an image is predicted to contain a dog by the pre-trained ResNet model, we need only check if the function below returns a value between 151 and 268 (inclusive).
We use these ideas to complete the is_dog function below, which returns True if a dog is detected in an image.
def is_dog(data):
    probs = dog_classifier.predict(data)
    preds = tf.argmax(probs, axis=1)
    return ((preds >= 151) & (preds <= 268))
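The range check can be tried without loading the network. The sketch below mirrors the argmax-then-compare logic of is_dog in NumPy, using hand-made probability rows in place of real model output. The function name is_dog_from_probs and the sample rows are hypothetical.

```python
import numpy as np


def is_dog_from_probs(probs):
    """Flag rows whose most likely ImageNet class index lies in 151..268."""
    preds = np.argmax(probs, axis=1)
    return (preds >= 151) & (preds <= 268)


# Three fake predictions over the 1000 ImageNet classes
probs = np.zeros((3, 1000))
probs[0, 151] = 1.0  # first index of the dog range
probs[1, 268] = 1.0  # last index of the dog range
probs[2, 281] = 1.0  # an index outside the dog range

flags = is_dog_from_probs(probs)
print(flags)
```

The result is one boolean per image in the batch, which is why the notebook can sum it to count detected dogs.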
To test the classifier we request a batch from our training data generator and feed it through the network. The accuracy, as expected, is really high.
train_generator = get_train_generator()
batch = train_generator.next()
predictions = is_dog(batch[0])

n_dog = np.sum(predictions)
dog_percentage = n_dog/int(BATCH_SIZE)

print('{:.0%} of the files have a detected dog'.format(dog_percentage))
## Step 3: Create a CNN to Classify Dog Breeds (from Scratch)
Now that we have a function for detecting dogs in images, we need a way to predict the dog breed from images. In this step, you will create a CNN that classifies dog breeds. We should be careful not to add too many trainable layers, though. More parameters mean longer training and a higher risk of overfitting. Thankfully, tf.keras provides a handy estimate of the time that each epoch is likely to take; you can extrapolate this estimate to figure out how long it will take for your algorithm to train.
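As a rough illustration of that extrapolation, the sketch below multiplies the number of steps per epoch by a per-step time and the number of epochs. All numbers are hypothetical placeholders, not measurements from this notebook, and the function name estimate_training_seconds is made up for the example.

```python
import math


def estimate_training_seconds(num_images, batch_size, seconds_per_step, epochs):
    """Extrapolate total training time from a per-step timing estimate."""
    steps_per_epoch = math.ceil(num_images / batch_size)
    return steps_per_epoch * seconds_per_step * epochs


# Hypothetical inputs: 6400 images, batches of 32, 0.5 s per step, 4 epochs
estimate = estimate_training_seconds(
    num_images=6400, batch_size=32, seconds_per_step=0.5, epochs=4
)
print(f"Estimated training time: {estimate:.0f} s")
```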
We mention that the task of assigning breed to dogs from images is considered exceptionally challenging. To see why, consider that even a human would have great difficulty in distinguishing between a Brittany and a Welsh Springer Spaniel.
Brittany | Welsh Springer Spaniel |
---|---|
![]() | ![]() |
It is not difficult to find other dog breed pairs with minimal inter-class variation (for instance, Curly-Coated Retrievers and American Water Spaniels).
Curly-Coated Retriever | American Water Spaniel |
---|---|
![]() | ![]() |
Likewise, recall that labradors come in yellow, chocolate, and black. Your vision-based algorithm will have to conquer this high intra-class variation to determine how to classify all of these different shades as the same breed.
Yellow Labrador | Chocolate Labrador | Black Labrador |
---|---|---|
![]() | ![]() | ![]() |
We also mention that random chance presents an exceptionally low bar: setting aside the fact that the classes are slightly imbalanced, a random guess will provide a correct answer roughly 1 in 133 times, which corresponds to an accuracy of less than 1%.
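The baseline is a one-line calculation. The sketch below assumes perfectly balanced classes, which the text notes is only approximately true.

```python
# Accuracy of uniform random guessing over 133 breeds (balanced classes assumed)
num_classes = 133
chance_accuracy = 1 / num_classes
print(f"Chance accuracy: {chance_accuracy:.2%}")
```

Any model that scores meaningfully above this value has learned something about the breeds.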
Model architecture
Create a CNN to classify dog breed. At the end of your code cell block, summarize the layers of your model by executing model.summary().
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(int(IMG_SIZE), int(IMG_SIZE), 3)),
    tf.keras.layers.MaxPool2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPool2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPool2D(),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(int(NUMBER_OF_NODES), activation="relu"),
    tf.keras.layers.Dense(133, activation="softmax")
])
model.summary()
The next step is to compile the model. For this, we need to pass an optimizer and a loss function. We can also pass a list of metrics we want. In this example, we pass the accuracy metric.
model.compile(
    optimizer=tf.optimizers.Adam(learning_rate=float(LR)),
    loss=tf.losses.categorical_crossentropy,
    metrics=["accuracy"]
)
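Categorical cross-entropy fits here because flow_from_directory yields one-hot labels by default (its class_mode defaults to "categorical"). For a single sample the loss reduces to the negative log of the probability assigned to the true class. The sketch below computes it by hand on made-up three-class predictions; it is an illustration of the formula, not the TensorFlow implementation.

```python
import math


def categorical_crossentropy(one_hot, probs):
    """Cross-entropy for one sample: -sum(t * log(p)) over the classes."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t > 0)


true_label = [0, 1, 0]  # the sample belongs to class 1

loss_confident = categorical_crossentropy(true_label, [0.1, 0.8, 0.1])
loss_unsure = categorical_crossentropy(true_label, [0.4, 0.2, 0.4])

print(f"confident and correct: {loss_confident:.4f}")
print(f"low probability on the true class: {loss_unsure:.4f}")
```

The loss grows as the probability on the true class shrinks, which is the signal the optimizer minimizes.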
Finally, we can train the model using the fit method. This runs on batches yielded by the data generator and prints out the loss and accuracy for both the train and validation sets.
train_generator = get_train_generator()
valid_generator = get_valid_generator()

tb_callback = tf.keras.callbacks.TensorBoard(log_dir="custom_classifier_logs")

model.fit(
    train_generator,
    epochs=2,
    validation_data=valid_generator,
    callbacks=[tb_callback]
)
Evaluation
To evaluate the final model we feed it with the test dataset and call the evaluate method.
test_generator = get_test_generator()

model.compile(
    optimizer=tf.optimizers.Adam(learning_rate=float(LR)),
    loss=tf.losses.categorical_crossentropy,
    metrics=["accuracy"]
)

test_loss_custom, test_accuracy_custom = model.evaluate(test_generator)
print(f"The accuracy in the test set is {test_accuracy_custom:.3f}.")
## Step 4: Create a CNN (VGG16) to Classify Dog Breeds (using Transfer Learning)
To reduce training time without sacrificing accuracy, we train a CNN using Transfer Learning. Transfer Learning is the fine-tuning, with new classification layers, of a network that was pre-trained on a large dataset. The idea is that we want to keep all the good features learned in the lower levels of the network (because there’s a high probability the new images will also have those features) and just learn a new classifier on top of those. This tends to work well, especially with small datasets that don’t allow for a full training of the network from scratch (it’s also much faster than a full training).
One way of doing Transfer Learning is by loading a pretrained model up to a point, usually chopping off the final dense part of the model and adding a fully connected layer with the output that we want (e.g. a 133-node classifier). Then, we freeze the first part of the model (i.e. the body) and train only the final layer we added.
vgg_body = tf.keras.applications.VGG16(
    weights="imagenet",
    include_top=False,
    input_shape=(int(IMG_SIZE), int(IMG_SIZE), 3)
)

vgg_body.trainable = False

inputs = tf.keras.layers.Input(shape=(int(IMG_SIZE), int(IMG_SIZE), 3))
# We make sure that the vgg_body is running in inference mode here,
# by passing `training=False`. This is important for fine-tuning, as you will
# learn in a few paragraphs.
x = vgg_body(inputs, training=False)
# Convert features of shape `vgg_body.output_shape[1:]` to vectors
x = tf.keras.layers.GlobalAveragePooling2D()(x)
# A Dense classifier (categorical classification)
outputs = tf.keras.layers.Dense(133, activation="softmax")(x)

vgg_model = tf.keras.Model(inputs, outputs)
vgg_model.summary()
vgg_model.compile(
    optimizer=tf.optimizers.Adam(learning_rate=float(LR)),
    loss=tf.losses.categorical_crossentropy,
    metrics=["accuracy"]
)

train_generator = get_train_generator()
valid_generator = get_valid_generator()

vgg_model.fit(
    train_generator,
    epochs=2,
    validation_data=valid_generator
)
Evaluation
To evaluate the model on the test set we call the same evaluate method.
test_generator = get_test_generator()

vgg_model.compile(
    optimizer=tf.optimizers.Adam(learning_rate=float(LR)),
    loss=tf.losses.categorical_crossentropy,
    metrics=["accuracy"]
)

test_loss_vgg, test_accuracy_vgg = vgg_model.evaluate(test_generator)
print(f"The accuracy in the test set is {test_accuracy_vgg:.3f}.")
## Step 5: Create a CNN (ResNet-50) to Classify Dog Breeds (using Transfer Learning)
In this section, we will use the same procedure but with a pretrained ResNet-50 model.
resnet_body = tf.keras.applications.ResNet50V2(
    weights="imagenet",
    include_top=False,
    input_shape=(int(IMG_SIZE), int(IMG_SIZE), 3)
)

resnet_body.trainable = False

inputs = tf.keras.layers.Input(shape=(int(IMG_SIZE), int(IMG_SIZE), 3))
# We make sure that the resnet_body is running in inference mode here,
# by passing `training=False`. This is important for fine-tuning.
x = resnet_body(inputs, training=False)
# Convert features of shape `resnet_body.output_shape[1:]` to vectors
x = tf.keras.layers.Flatten()(x)
# A Dense classifier (categorical classification)
outputs = tf.keras.layers.Dense(133, activation="softmax")(x)

resnet_model = tf.keras.Model(inputs, outputs)
resnet_model.compile(
    optimizer=tf.optimizers.Adam(learning_rate=float(LR)),
    loss=tf.losses.categorical_crossentropy,
    metrics=["accuracy"]
)

train_generator = get_train_generator()
valid_generator = get_valid_generator()

resnet_model.fit(
    train_generator,
    epochs=int(EPOCHS),
    validation_data=valid_generator
)
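The two transfer-learning heads turn the body's feature map into a vector in different ways: the VGG16 model uses GlobalAveragePooling2D, while the ResNet-50 model uses Flatten. The sketch below estimates what that means for the number of trainable weights in the final Dense layer. It assumes 224x224 inputs, for which the VGG16 body outputs a 7x7x512 feature map and the ResNet50V2 body a 7x7x2048 one; the helper dense_params is hypothetical.

```python
def dense_params(n_inputs, n_outputs):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_outputs + n_outputs


NUM_BREEDS = 133

# Global average pooling keeps one value per channel: 512 inputs
vgg_head = dense_params(512, NUM_BREEDS)

# Flatten keeps every spatial position: 7 * 7 * 2048 inputs
resnet_head = dense_params(7 * 7 * 2048, NUM_BREEDS)

print(f"VGG16 head (GlobalAveragePooling2D): {vgg_head:,} trainable parameters")
print(f"ResNet-50 head (Flatten): {resnet_head:,} trainable parameters")
```

With the bodies frozen, these head parameters are the only ones updated during training, so the choice between pooling and flattening largely determines the size of the trainable part.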
Evaluation
test_generator = get_test_generator()

resnet_model.compile(
    optimizer=tf.optimizers.Adam(learning_rate=float(LR)),
    loss=tf.losses.categorical_crossentropy,
    metrics=["accuracy"]
)

test_loss_resnet, test_accuracy_resnet = resnet_model.evaluate(test_generator)
print(f"The accuracy in the test set is {test_accuracy_resnet:.3f}.")
## Step 6: Write your Own Dog Classifier
To create our own classifier we need a function that checks whether there is a dog in the image and, if so, reports the breed. For the first part we use the is_dog function, and then we predict the breed using the predict_breed function.
idx_to_class = {value: key for key, value in train_generator.class_indices.items()}

def predict_breed(images):
    probs = resnet_model.predict(images)
    pred = tf.argmax(probs, axis=1)
    label = idx_to_class[pred.numpy()[0]]
    return label.split(".")[-1]

def predict_dog(image):
    # Add a batch dimension: (H, W, 3) -> (1, H, W, 3)
    image = image[None,...]
    if is_dog(image):
        pred = predict_breed(image)
        print(f"This photo looks like a(n) {pred}.")
        return
    print("No dog detected")

image = train_generator.next()[0][0]

plt.imshow(image)
plt.show()

pred = predict_dog(image)
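The label lookup can be traced on a small example. The sketch below assumes class folder names of the form "NNN.Breed_name", which is what the split(".") call in predict_breed suggests; the two entries in class_indices and the helper breed_name are hypothetical stand-ins for the mapping the generator produces.

```python
# Hypothetical subset of the mapping from folder name to class index
class_indices = {"001.Affenpinscher": 0, "002.Afghan_hound": 1}

# Invert it so a predicted index can be turned back into a folder name
idx_to_class = {value: key for key, value in class_indices.items()}


def breed_name(index):
    """Strip the numeric prefix from the folder name of a predicted class."""
    return idx_to_class[index].split(".")[-1]


print(breed_name(0))
print(breed_name(1))
```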
## Step 7: Test Your Classifier
In the last section, we take your new algorithm for a spin; if you have a dog, does it predict your dog’s breed accurately? If you have a cat, does it mistakenly think that your cat is a dog?
for img_path in sorted(glob("check_images/*")):
    print(img_path)
    img = Image.open(img_path)
    img = img.resize((224, 224))

    plt.imshow(img)
    plt.show()

    img = np.array(img)
    predict_dog(img)
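User-supplied images can differ from the generator output: the generators rescale pixel values by 1/255 and yield three channels, whereas an arbitrary file may be grayscale or carry an alpha channel. The sketch below is one possible way to bring such an array into the same layout before prediction. It is an assumption about what a caller might want, not part of the original notebook, and the helper name to_model_input is hypothetical.

```python
import numpy as np


def to_model_input(pixels):
    """Return an (H, W, 3) float32 array with values in [0, 1]."""
    arr = np.asarray(pixels)
    if arr.ndim == 2:
        # Grayscale: repeat the single channel three times
        arr = np.stack([arr] * 3, axis=-1)
    arr = arr[..., :3]  # drop an alpha channel if present
    return arr.astype("float32") / 255.0


# Toy inputs standing in for decoded image files
rgba = np.full((224, 224, 4), 255, dtype=np.uint8)
gray = np.zeros((224, 224), dtype=np.uint8)

print(to_model_input(rgba).shape, to_model_input(gray).shape)
```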
Pipeline metrics
This is the pipeline-metrics cell. Use it to define the pipeline metrics that KFP will produce for every pipeline run. Kale will associate each of these metrics with the step that produced it. You will also have to choose one of these metrics as the Katib search objective metric.
print(test_accuracy_resnet)