In this module, we are going to use a pretrained CNN model to perform image classification on our dogs vs. cats images.
Learning objectives:
library(keras)
library(glue)
library(ggplot2)
We are working with the same dogs and cats images as before.
image_dir <- here::here("materials", "data", "dogs-vs-cats")
train_dir <- file.path(image_dir, "train")
valid_dir <- file.path(image_dir, "validation")
test_dir <- file.path(image_dir, "test")
There are two main ways we can apply a pretrained model: (1) feature extraction, where we run our images through the pretrained convolutional base and train a new classifier on the extracted features, and (2) end-to-end training, where we plug the base into a full model and train it, optionally fine-tuning some of its layers. There are several pretrained models available via keras; they all start with application_. Here we’ll use the VGG16 model; its structure is intuitive to understand, it does a good job with this task, and it is well suited for teaching purposes.
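To see which pretrained models your installed keras version supplies, you can list the exported functions whose names start with application_ (a quick sketch; the exact output depends on your keras version):

```r
library(keras)

# List all pretrained model constructors exported by the keras package;
# each can be called the same way as application_vgg16() below.
ls("package:keras", pattern = "^application_")
```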
Best practice for picking pretrained models
Applying pretrained models that are already supplied by keras is simple:

- weights: the weights to use. Most pretrained models are built on ImageNet, and using these weights tends to do well.
- include_top: whether to include the fully connected dense classifier. Typically, we want the classifier to be specific to our problem.
- input_shape: the shape of our inputs (150x150 pixel images with 3 color channels).

Tip: Check out the tfhub package, which makes it easy to interact with TensorFlow Hub, a library for publication, discovery, and consumption of reusable models.
conv_base <- application_vgg16(
weights = "imagenet",
include_top = FALSE,
input_shape = c(150, 150, 3)
)
summary(conv_base)
Model: "vgg16"
_________________________________________________________________________________________
Layer (type) Output Shape Param #
=========================================================================================
input_1 (InputLayer) [(None, 150, 150, 3)] 0
_________________________________________________________________________________________
block1_conv1 (Conv2D) (None, 150, 150, 64) 1792
_________________________________________________________________________________________
block1_conv2 (Conv2D) (None, 150, 150, 64) 36928
_________________________________________________________________________________________
block1_pool (MaxPooling2D) (None, 75, 75, 64) 0
_________________________________________________________________________________________
block2_conv1 (Conv2D) (None, 75, 75, 128) 73856
_________________________________________________________________________________________
block2_conv2 (Conv2D) (None, 75, 75, 128) 147584
_________________________________________________________________________________________
block2_pool (MaxPooling2D) (None, 37, 37, 128) 0
_________________________________________________________________________________________
block3_conv1 (Conv2D) (None, 37, 37, 256) 295168
_________________________________________________________________________________________
block3_conv2 (Conv2D) (None, 37, 37, 256) 590080
_________________________________________________________________________________________
block3_conv3 (Conv2D) (None, 37, 37, 256) 590080
_________________________________________________________________________________________
block3_pool (MaxPooling2D) (None, 18, 18, 256) 0
_________________________________________________________________________________________
block4_conv1 (Conv2D) (None, 18, 18, 512) 1180160
_________________________________________________________________________________________
block4_conv2 (Conv2D) (None, 18, 18, 512) 2359808
_________________________________________________________________________________________
block4_conv3 (Conv2D) (None, 18, 18, 512) 2359808
_________________________________________________________________________________________
block4_pool (MaxPooling2D) (None, 9, 9, 512) 0
_________________________________________________________________________________________
block5_conv1 (Conv2D) (None, 9, 9, 512) 2359808
_________________________________________________________________________________________
block5_conv2 (Conv2D) (None, 9, 9, 512) 2359808
_________________________________________________________________________________________
block5_conv3 (Conv2D) (None, 9, 9, 512) 2359808
_________________________________________________________________________________________
block5_pool (MaxPooling2D) (None, 4, 4, 512) 0
=========================================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________________________________
This may seem a little daunting, but it is just a more manual approach to what you have already been doing. Here we create a function that will:

1. Pre-allocate arrays to hold the extracted features and labels.
2. Create a generator that flows batches of rescaled images from a directory.
3. Loop over the batches: (a) grab the next batch, (b) run it through conv_base to extract features, (c) store the features and labels in the pre-allocated arrays, and (d) stop once sample_count images have been processed.
datagen <- image_data_generator(rescale = 1/255)
batch_size <- 20
extract_features <- function(directory, sample_count) {
  features <- array(0, dim = c(sample_count, 4, 4, 512))  # step 1
  labels <- array(0, dim = c(sample_count))               # step 1
  generator <- flow_images_from_directory(                # step 2
    directory = directory,
    generator = datagen,
    target_size = c(150, 150),
    batch_size = batch_size,
    class_mode = "binary"
  )
  i <- 0
  while (TRUE) {                                          # step 3
    message("Processing batch ", i + 1, " of ", ceiling(sample_count / batch_size))
    batch <- generator_next(generator)                    # step 3a
    inputs_batch <- batch[[1]]
    labels_batch <- batch[[2]]
    features_batch <- conv_base %>% predict(inputs_batch) # step 3b
    index_range <- ((i * batch_size) + 1):((i + 1) * batch_size)
    features[index_range,,,] <- features_batch            # step 3c
    labels[index_range] <- labels_batch                   # step 3c
    i <- i + 1
    if (i * batch_size >= sample_count) break             # step 3d
  }
  list(
    features = features,
    labels = labels
  )
}
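The index arithmetic in step 3c can be checked with a small base R toy (no keras required). Note that this bookkeeping assumes batch_size divides sample_count evenly, which holds here (20 divides 2000 and 1000); otherwise the final batch would write past the end of the pre-allocated array:

```r
# Toy version of the batch loop: each "batch" writes its batch number
# into the slots that extract_features() would fill with features.
sample_count <- 12
batch_size <- 3
out <- numeric(sample_count)
i <- 0
while (TRUE) {
  index_range <- ((i * batch_size) + 1):((i + 1) * batch_size)
  out[index_range] <- i + 1  # stand-in for features_batch
  i <- i + 1
  if (i * batch_size >= sample_count) break
}
out
# 1 1 1 2 2 2 3 3 3 4 4 4
```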
Let’s apply this function to our training, validation, and test data. Without a GPU this will take approximately 5 minutes to execute.
train <- extract_features(train_dir, 2000)
Found 2000 images belonging to 2 classes.
validation <- extract_features(valid_dir, 1000)
Found 1000 images belonging to 2 classes.
test <- extract_features(test_dir, 1000)
Found 1000 images belonging to 2 classes.
The extracted features form a 4D tensor of shape (samples, 4, 4, 512); this matches the last layer of our conv_base model above (block5_pool (MaxPooling2D)). Consequently, we need to reshape (flatten) these into a 2D tensor to feed into a densely connected classifier. This results in a 2D tensor of shape (samples, 4 * 4 * 512 = 8192).
reshape_features <- function(features) {
  array_reshape(features, dim = c(nrow(features), 4 * 4 * 512))
}
train$features <- reshape_features(train$features)
validation$features <- reshape_features(validation$features)
test$features <- reshape_features(test$features)
dim(train$features)
[1] 2000 8192
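As an aside, array_reshape() (rather than base R's dim<-) matters here: it reshapes in row-major (C) order by default, matching how Keras/NumPy lay out tensors, whereas base R is column-major and would scramble features across samples. A tiny base R sketch of the difference:

```r
# 2 "samples" with 3 "features" each
x <- matrix(1:6, nrow = 2, byrow = TRUE)
# Row-major flattening keeps each sample's features together:
as.vector(t(x))  # 1 2 3 4 5 6
# Base R's column-major flattening interleaves the samples:
as.vector(x)     # 1 4 2 5 3 6
```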
We’ve extracted and flattened our features from the convolution layers so now we only need to build the densely connected classifier portion of our model.
model <- keras_model_sequential() %>%
layer_dense(units = 256, activation = "relu", input_shape = 4 * 4 * 512) %>%
layer_dropout(rate = 0.5) %>%
layer_dense(units = 1, activation = "sigmoid")
summary(model)
Model: "sequential"
_________________________________________________________________________________________
Layer (type) Output Shape Param #
=========================================================================================
dense (Dense) (None, 256) 2097408
_________________________________________________________________________________________
dropout (Dropout) (None, 256) 0
_________________________________________________________________________________________
dense_1 (Dense) (None, 1) 257
=========================================================================================
Total params: 2,097,665
Trainable params: 2,097,665
Non-trainable params: 0
_________________________________________________________________________________________
Now we can compile and train our model. This will train quickly, taking approximately 1 min when trained on your local CPU. Our validation loss also improves over the previous CNN we built.
model %>% compile(
optimizer = optimizer_rmsprop(lr = 0.0001),
loss = "binary_crossentropy",
metrics = c("accuracy")
)
history1 <- model %>% fit(
train$features, train$labels,
epochs = 30,
batch_size = 32,
validation_data = list(validation$features, validation$labels),
callbacks = list(
callback_early_stopping(patience = 10),
callback_reduce_lr_on_plateau(patience = 2)
)
)
So we achieved a significant decrease in our loss score, increased our accuracy to about 90%, and did so in a fraction of the time!
best_epoch <- which.min(history1$metrics$val_loss)
best_loss <- history1$metrics$val_loss[best_epoch] %>% round(3)
best_acc <- history1$metrics$val_accuracy[best_epoch] %>% round(3)
glue("Our optimal loss is {best_loss} with an accuracy of {best_acc}")
Our optimal loss is 0.242 with an accuracy of 0.898
plot(history1) +
scale_x_continuous(limits = c(0, length(history1$metrics$val_loss)))
⚠️⚠️ ONLY RUN ON GPU!! ⚠️⚠️
The above approach performed pretty well. However, we can see that we are still overfitting, which may be reducing model performance. An alternative approach is to run a pretrained model end-to-end. This approach is much slower and more computationally intense; however, it offers greater flexibility in using and adjusting the pretrained model because it lets you: (1) apply data augmentation during training, and (2) fine-tune the pretrained model by unfreezing and retraining some of its layers.
The following approach simply plugs the pretrained convolution base into a sequential model but freezes the convolution base weights.
In this case we can literally plug our conv_base into our model architecture.
model <- keras_model_sequential() %>%
conv_base %>%
layer_flatten() %>%
layer_dense(units = 256, activation = "relu") %>%
layer_dense(units = 1, activation = "sigmoid")
model
Before you compile and train the model, it’s important to freeze the convolutional base weights. This prevents the weights from being updated during training. If you don’t do this then the representations found in the pretrained model will be modified and, potentially, completely destroyed.
cat(length(model$trainable_weights), "trainable weight tensors before freezing.\n")
freeze_weights(conv_base)
cat(length(model$trainable_weights), "trainable weight tensors after freezing.\n")
The following trains the model end-to-end using all CNN logic that you have seen before:
train_datagen <- image_data_generator(
rescale = 1/255,
rotation_range = 40,
width_shift_range = 0.2,
height_shift_range = 0.2,
shear_range = 0.2,
zoom_range = 0.2,
horizontal_flip = TRUE,
fill_mode = "nearest"
)
test_datagen <- image_data_generator(rescale = 1/255)
train_generator <- flow_images_from_directory(
train_dir,
train_datagen,
target_size = c(150, 150),
batch_size = 20,
class_mode = "binary"
)
validation_generator <- flow_images_from_directory(
valid_dir,
test_datagen,
target_size = c(150, 150),
batch_size = 20,
class_mode = "binary"
)
model %>% compile(
loss = "binary_crossentropy",
optimizer = optimizer_rmsprop(lr = 1e-5),
metrics = c("accuracy")
)
history2 <- model %>% fit_generator(
train_generator,
steps_per_epoch = 100,
epochs = 30,
validation_data = validation_generator,
validation_steps = 50
)
plot(history2)
Another widely used technique for using pretrained models is to unfreeze a few of the top layers of the convolutional base and allow those weights to be updated. Recall that the early layers in a CNN identify detailed edges and shapes, while later layers combine these edges and shapes into higher-order parts of the images we are trying to classify (e.g., cat ears, dog tails).
The more our images deviate from the images used to train the pretrained model, the more likely you will want to retrain the last few layers, so that these higher-order features become more relevant to your problem.
To fine-tune a pretrained model you:

1. Add your custom network on top of an already-trained base network (the CNN-base-and-classifier code chunk).
2. Freeze the base network (the freeze-parameters code chunk).
3. Train the part you added (the train-end-to-end code chunk).
4. Unfreeze some layers in the base network.
5. Jointly train both these layers and the part you added.

We already did steps 1-3. The following executes steps 4 and 5.
unfreeze_weights(conv_base, from = "block3_conv1")
model %>% compile(
loss = "binary_crossentropy",
optimizer = optimizer_rmsprop(lr = 1e-5),
metrics = c("accuracy")
)
history3 <- model %>% fit_generator(
  train_generator,
  steps_per_epoch = 100,
  epochs = 100,
  validation_data = validation_generator,
  validation_steps = 50
)
plot(history3)
Pretrained models can be efficient and effective for problems that align with common computer vision (and other) tasks.
Many pre-existing models are available on TensorFlow Hub and are worth researching.