Our prize-winning AWS Hackathon entry: a walkthrough of the project that won the runner-up Computer Vision award.
We are using the standard Food101 dataset to create a deep learning model that can distinguish between and make predictions across 101 classes of food, using all 75,750 training images and 25,250 testing images. We will also use mixed precision training to speed up the training process.
Our goal is to beat the original DeepFood paper, which reports an accuracy of 77.4%.
We will use the power of transfer learning by incorporating the EfficientNet family of architectures (specifically EfficientNetB0).
Here is the link to the complete GitHub repository.
Since we are planning to use mixed precision training, we need a compatible GPU. This model is being trained on Google Colab, which provides a few types of free NVIDIA GPUs.
Knowing this, we need access to an NVIDIA Tesla T4 (from Colab) or any GPU of our own with a compute capability of 7.0 or higher.
Let's find out our GPU type with the following command.
!nvidia-smi -L
GPU 0: Tesla T4 (UUID: GPU-b54d7bc4-6911-4395-1619-170d04e3161d)
Great! We have a compatible GPU, i.e., the Tesla T4. If you do not have a compatible GPU, try Runtime -> Factory reset runtime in Colab to reset your session.
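If you'd rather check from Python than parse the nvidia-smi output, here is a minimal sketch to confirm the GPU's compute capability. It assumes tf.config.experimental.get_device_details exposes the device_name and compute_capability keys on TF 2.4; if it doesn't on your setup, fall back to nvidia-smi.
# Minimal sketch: check the GPU's compute capability from Python
# (the returned keys are an assumption; fall back to nvidia-smi if they are missing)
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
if gpus:
    details = tf.config.experimental.get_device_details(gpus[0])
    print(details.get("device_name"), details.get("compute_capability"))  # e.g. Tesla T4 (7, 5)
else:
    print("No GPU detected - mixed precision will not help here.")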
# Show the Tensorflow Version (run this in Google Colab)
import tensorflow as tf
print(tf.__version__)
2.4.1
There is a known bug specifically with TensorFlow 2.5.0 that breaks mixed precision training. To avoid it, use the following command to downgrade to version 2.4.1. For more info.
# Downgrade Tensorflow Version (run this in Google Colab)
!pip install tensorflow==2.4.1
We have created some helper functions that we are importing into our project.
!wget https://raw.githubusercontent.com/realnihal/Random_Code/master/helper_functions.py
from helper_functions import create_tensorboard_callback, plot_loss_curves
--2021-08-05 05:15:53-- https://raw.githubusercontent.com/realnihal/Random_Code/master/helper_functions.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10510 (10K) [text/plain]
Saving to: ‘helper_functions.py’
helper_functions.py 100%[===================>] 10.26K --.-KB/s in 0s
2021-08-05 05:15:53 (89.1 MB/s) - ‘helper_functions.py’ saved [10510/10510]
The Food101 dataset is available to download from TensorFlow Datasets. You can find out more about it here: TensorFlow Datasets.
#importing tfds
import tensorflow_datasets as tfds
#checking that the food101 dataset is present within tfds
datasets_list = tfds.list_builders()
print("food101" in datasets_list)
True
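Before kicking off the download, we can peek at how big the dataset is. A small sketch, assuming the tfds DatasetInfo object exposes download_size and dataset_size:
# Peek at the dataset size before downloading
# (download_size/dataset_size attribute names are an assumption about the tfds API)
builder = tfds.builder("food101")
print(builder.info.download_size)  # compressed download size
print(builder.info.dataset_size)   # size on disk after preparation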
This may take a few minutes since the data is so large.
#Loading the dataset
(train_data, test_data), ds_info = tfds.load(name='food101',
split=['train','validation'],
shuffle_files=True,
as_supervised=True,
with_info=True)
Let's see what the downloaded data contains. This gives us a better idea of how to preprocess the data.
#Features of Food101 from TFDS
ds_info.features
FeaturesDict({
'image': Image(shape=(None, None, 3), dtype=tf.uint8),
'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=101),
})
The first thing we notice is that the image is in the uint8 format; we need to convert it to float32.
#Printing a few class names
class_names = ds_info.features["label"].names
class_names[:10] #only print the first 10 to avoid spamming the output
['apple_pie',
'baby_back_ribs',
'baklava',
'beef_carpaccio',
'beef_tartare',
'beet_salad',
'beignets',
'bibimbap',
'bread_pudding',
'breakfast_burrito']
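As a quick sanity check, the label vocabulary should contain exactly the 101 classes reported by ds_info:
# Sanity check: there should be one name per class
print(len(class_names))  # 101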
#Let's take one sample to dive deeper and view our data
train_one_sample = train_data.take(1)
#Printing the image features
for image, label in train_one_sample:
print(f'''
Image shape: {image.shape}
Image datatype: {image.dtype}
Target class from Food101 (tensor form): {label}
Class name (str form): {class_names[label.numpy()]}
''')
Image shape: (512, 512, 3)
Image datatype: <dtype: 'uint8'>
Target class from Food101 (tensor form): 25
Class name (str form): club_sandwich
#Checking whether the image is normalized
image
<tf.Tensor: shape=(512, 512, 3), dtype=uint8, numpy=
array([[[135, 156, 175],
[125, 148, 166],
[114, 136, 159],
...,
[ 26, 5, 12],
[ 26, 3, 11],
[ 27, 4, 12]],
[[128, 150, 171],
[115, 140, 160],
[102, 127, 149],
...,
[ 28, 7, 14],
[ 29, 6, 14],
[ 30, 7, 15]],
[[112, 139, 160],
[ 99, 127, 148],
[ 87, 115, 137],
...,
[ 29, 6, 16],
[ 31, 5, 16],
[ 32, 6, 17]],
...,
[[ 48, 47, 53],
[ 53, 52, 58],
[ 52, 51, 59],
...,
[111, 99, 99],
[108, 98, 97],
[106, 96, 97]],
[[ 44, 45, 47],
[ 48, 49, 51],
[ 46, 47, 51],
...,
[108, 96, 98],
[105, 94, 98],
[102, 93, 96]],
[[ 40, 42, 41],
[ 45, 47, 46],
[ 44, 45, 49],
...,
[105, 95, 96],
[104, 93, 99],
[100, 91, 96]]], dtype=uint8)>
tf.reduce_min(image), tf.reduce_max(image)
(<tf.Tensor: shape=(), dtype=uint8, numpy=0>,
<tf.Tensor: shape=(), dtype=uint8, numpy=255>)
#Plotting an image from the dataset to check if our labels are correct
import matplotlib.pyplot as plt
plt.imshow(image)
plt.title(class_names[label.numpy()])
plt.axis(False)
(-0.5, 511.5, 511.5, -0.5)
From our initial data exploration, we found that we need to do the following: resize the images to a consistent shape of (224, 224, 3), convert the datatype from uint8 to float32, and batch the data for training. Normalization is not required because EfficientNet models have rescaling built in.
# Make a function for preprocessing images
def preprocess_img(image, label, img_shape=224):
"""
Converts image datatype from 'uint8' -> 'float32' and reshapes image to
[img_shape, img_shape, color_channels]
"""
image = tf.image.resize(image, [img_shape, img_shape]) # reshape to img_shape
#image = image/255.0 (not required, EfficientNet models have rescaling built in)
return tf.cast(image, tf.float32), label # return (float32_image, label) tuple
#Using our preprocess function to test on the sample image
preprocessed_img = preprocess_img(image, label)[0]
print(f"Image before preprocessing:\n {image[:2]}...,\nShape: {image.shape},\nDatatype: {image.dtype}\n")
print(f"Image after preprocessing:\n {preprocessed_img[:2]}...,\nShape: {preprocessed_img.shape},\nDatatype: {preprocessed_img.dtype}")
Image before preprocessing:
[[[135 156 175]
[125 148 166]
[114 136 159]
...
[ 26 5 12]
[ 26 3 11]
[ 27 4 12]]
[[128 150 171]
[115 140 160]
[102 127 149]
...
[ 28 7 14]
[ 29 6 14]
[ 30 7 15]]]...,
Shape: (512, 512, 3),
Datatype: <dtype: 'uint8'>
Image after preprocessing:
[[[122.83163 146.17346 165.81633 ]
[ 95.07653 122.122444 144.47958 ]
[ 72.5051 106.994896 134.34694 ]
...
[ 20.714308 2.3570995 3.9285717]
[ 27.285715 6.285714 13.285714 ]
[ 28.28575 5.2857494 13.285749 ]]
[[ 88.65305 119.41326 140.41327 ]
[ 74.59694 108.30102 133.02042 ]
[ 75.2551 112.57143 141.91325 ]
...
[ 26.857143 6.285671 11.040798 ]
[ 30.061235 6.86222 16.795908 ]
[ 31.688843 5.688843 16.688843 ]]]...,
Shape: (224, 224, 3),
Datatype: <dtype: 'float32'>
tf.data.AUTOTUNE and the prefetch function work in tandem to utilize the multiple CPU cores available to us and the GPU, processing all of our images as efficiently as possible.
# Map preprocessing function to training data (and paralellize)
train_data = train_data.map(map_func=preprocess_img, num_parallel_calls=tf.data.AUTOTUNE)
# Shuffle train_data and turn it into batches and prefetch it (load it faster)
train_data = train_data.shuffle(buffer_size=1000).batch(batch_size=32).prefetch(buffer_size=tf.data.AUTOTUNE)
# Map preprocessing function to test data
test_data = test_data.map(preprocess_img, num_parallel_calls=tf.data.AUTOTUNE)
# Turn test data into batches (don't need to shuffle)
test_data = test_data.batch(32).prefetch(tf.data.AUTOTUNE)
train_data, test_data
(<PrefetchDataset shapes: ((None, 224, 224, 3), (None,)), types: (tf.float32, tf.int64)>,
<PrefetchDataset shapes: ((None, 224, 224, 3), (None,)), types: (tf.float32, tf.int64)>)
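To be sure the pipeline behaves as expected, we can pull a single batch and inspect its shape and dtype:
# Take one batch to confirm preprocessing and batching worked as intended
for images, labels in train_data.take(1):
    print(images.shape, images.dtype)  # (32, 224, 224, 3) float32
    print(labels.shape, labels.dtype)  # (32,) int64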
Now let's implement mixed precision. Here we use float16 on some layers to improve speed and efficiency. It is only compatible with GPUs with a compute capability of 7.0 or higher.
from tensorflow.keras import mixed_precision
mixed_precision.set_global_policy("mixed_float16")
INFO:tensorflow:Mixed precision compatibility check (mixed_float16): OK
Your GPU will likely run quickly with dtype policy mixed_float16 as it has compute capability of at least 7.0. Your GPU: Tesla T4, compute capability 7.5
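We can double-check that the policy took effect by inspecting the global policy:
# Confirm the global dtype policy is now mixed_float16
mixed_precision.global_policy()  # <Policy "mixed_float16">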
It's time to define our model. Here is the order in which we define our layers: an input layer, the frozen EfficientNetB0 base, a global average pooling layer, a dense layer with one neuron per class, and a final float32 softmax activation.
We are ensuring that the layers of EfficientNetB0 are frozen. This prevents the already-learned patterns from being overwritten. We also need to specify float32 as the output dtype of the activation layer, as using the global float16 policy may cause numerical issues with the softmax activation function.
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import preprocessing
# Create base model
input_shape = (224, 224, 3)
base_model = tf.keras.applications.EfficientNetB0(include_top=False)
base_model.trainable = False # freeze base model layers
# Create Functional model
inputs = layers.Input(shape=input_shape, name="input_layer")
# x = preprocessing.Rescaling(1./255)(inputs) # EfficientNetBX models have rescaling built in, but if your base model didn't, you could add a layer like this
x = base_model(inputs, training=False) # set base_model to inference mode only
x = layers.GlobalAveragePooling2D(name="pooling_layer")(x)
x = layers.Dense(len(class_names))(x) # want one output neuron per class
# Separate activation of output layer so we can output float32 activations
outputs = layers.Activation("softmax", dtype=tf.float32, name="softmax_float32")(x)
model = tf.keras.Model(inputs, outputs)
# Compile the model
model.compile(loss="sparse_categorical_crossentropy", # Use sparse_categorical_crossentropy when labels are *not* one-hot
optimizer=tf.keras.optimizers.Adam(),
metrics=["accuracy"])
Downloading data from https://storage.googleapis.com/keras-applications/efficientnetb0_notop.h5
16711680/16705208 [==============================] - 0s 0us/step
#Printing a summary of the model
model.summary()
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_layer (InputLayer) [(None, 224, 224, 3)] 0
_________________________________________________________________
efficientnetb0 (Functional) (None, None, None, 1280) 4049571
_________________________________________________________________
pooling_layer (GlobalAverage (None, 1280) 0
_________________________________________________________________
dense (Dense) (None, 101) 129381
_________________________________________________________________
softmax_float32 (Activation) (None, 101) 0
=================================================================
Total params: 4,178,952
Trainable params: 129,381
Non-trainable params: 4,049,571
_________________________________________________________________
#Checking the trainability and datatypes of the layers
for layer in model.layers:
print(layer.name, layer.trainable, layer.dtype, layer.dtype_policy)
input_layer True float32 <Policy "float32">
efficientnetb0 False float32 <Policy "mixed_float16">
pooling_layer True float32 <Policy "mixed_float16">
dense True float32 <Policy "mixed_float16">
softmax_float32 True float32 <Policy "float32">
Creating our callback functions.
# Create TensorBoard callback (already have "create_tensorboard_callback()" from a previous notebook)
from helper_functions import create_tensorboard_callback
# Create ModelCheckpoint callback to save model's progress
checkpoint_path = "model_checkpoints/cp.ckpt" # saving weights requires ".ckpt" extension
model_checkpoint = tf.keras.callbacks.ModelCheckpoint(checkpoint_path,
monitor = 'val_accuracy', # save the model weights with best validation accuracy
save_best_only=True, # only save the best weights
save_weights_only=True, # only save model weights (not whole model)
verbose=1)
We are running the training for 3 epochs, validating on only 15 percent of the test data to save time. This is still going to take a while, processing over 75,000 training images per epoch. So, I'm going to grab a cup of coffee.
history_101_food_classes_feature_extract = model.fit(train_data,
epochs=3,
steps_per_epoch=len(train_data),
validation_data=test_data,
validation_steps=int(0.15 * len(test_data)),
callbacks=[create_tensorboard_callback("training_logs",
"efficientnetb0_101_classes_all_data_feature_extract"),
model_checkpoint])
Saving TensorBoard log files to: training_logs/efficientnetb0_101_classes_all_data_feature_extract/20210805-060548
Epoch 1/3
2368/2368 [==============================] - 192s 80ms/step - loss: 0.6993 - accuracy: 0.8149 - val_loss: 1.1099 - val_accuracy: 0.7021
Epoch 00001: val_accuracy did not improve from 0.71478
Epoch 2/3
2368/2368 [==============================] - 167s 70ms/step - loss: 0.6780 - accuracy: 0.8221 - val_loss: 1.1195 - val_accuracy: 0.7058
Epoch 00002: val_accuracy did not improve from 0.71478
Epoch 3/3
2368/2368 [==============================] - 171s 72ms/step - loss: 0.6579 - accuracy: 0.8262 - val_loss: 1.1418 - val_accuracy: 0.6989
Epoch 00003: val_accuracy did not improve from 0.71478
#Evaluating our model on the test data
results_feature_extract_model = model.evaluate(test_data)
results_feature_extract_model
790/790 [==============================] - 55s 70ms/step - loss: 1.1396 - accuracy: 0.7001
[1.1395542621612549, 0.7001188397407532]
We got an accuracy of about 70%, which is close to our target, i.e., DeepFood's original score of 77.4%. We still have another step (fine-tuning) to go, so let's hope for the best.
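Since we imported plot_loss_curves from our helper functions earlier, we can also visualize how the feature-extraction run went. A quick sketch, assuming the helper accepts a Keras History object:
# Plot training vs validation loss/accuracy curves for the feature-extraction run
# (plot_loss_curves is the helper downloaded earlier; assumed to take a History object)
plot_loss_curves(history_101_food_classes_feature_extract)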
#Saving our model
model.save("101_food_feature_extract_mixedpred_model")
INFO:tensorflow:Assets written to: 101_food_feature_extract_mixedpred_model/assets
#Loading our model into a separate model
loaded_model = tf.keras.models.load_model("101_food_feature_extract_mixedpred_model")
loaded_model.load_weights(checkpoint_path)
#Checking if our layer types are accurate in loaded_model
for layer in loaded_model.layers:
print(layer.name, layer.trainable, layer.dtype, layer.dtype_policy)
input_layer True float32 <Policy "float32">
efficientnetb0 True float32 <Policy "mixed_float16">
pooling_layer True float32 <Policy "mixed_float16">
dense True float32 <Policy "mixed_float16">
softmax_float32 True float32 <Policy "float32">
#Evaluating our loaded_model
test_eval = loaded_model.evaluate(test_data)
790/790 [==============================] - 49s 60ms/step - loss: 1.0722 - accuracy: 0.7116
#Seeing if the results match our actual model.
results_feature_extract_model == test_eval
False
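This is expected: loaded_model has the best-checkpoint weights we loaded from checkpoint_path (saved when val_accuracy was at its best, 0.71478 in the logs above), whereas results_feature_extract_model came from the final-epoch weights. Even when weights do match, comparing floating-point metrics with == is brittle; a tolerance-based check is safer, as in this small sketch:
import numpy as np

# Compare [loss, accuracy] element-wise with a tolerance instead of exact equality
print(np.isclose(results_feature_extract_model, test_eval))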
To fine-tune the model, let's unfreeze the layers in our EfficientNet base.
# Unfreeze every layer in the loaded model for fine-tuning
for layer in loaded_model.layers:
layer.trainable = True # set all layers to trainable
print(layer.name, layer.trainable, layer.dtype, layer.dtype_policy) # make sure loaded model is using mixed precision dtype_policy ("mixed_float16")
input_layer True float32 <Policy "float32">
efficientnetb0 True float32 <Policy "mixed_float16">
pooling_layer True float32 <Policy "mixed_float16">
dense True float32 <Policy "mixed_float16">
softmax_float32 True float32 <Policy "float32">
Setting up some more callbacks for fine-tuning.
# Setup EarlyStopping callback to stop training if model's val_loss doesn't improve for 3 epochs
early_stopping = tf.keras.callbacks.EarlyStopping(monitor="val_loss", # watch the val loss metric
patience=3) # if val_loss doesn't improve for 3 epochs in a row, stop training
# Create ModelCheckpoint callback to save best model during fine-tuning
checkpoint_path = "fine_tune_checkpoints/"
model_checkpoint = tf.keras.callbacks.ModelCheckpoint(checkpoint_path,
save_best_only=True,
monitor="val_loss")
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss",
factor=0.2, # multiply the learning rate by 0.2 (reduce by 5x)
patience=2,
verbose=1, # print out when learning rate goes down
min_lr=1e-7)
loaded_model.compile(loss="sparse_categorical_crossentropy", # sparse_categorical_crossentropy for labels that are *not* one-hot
optimizer=tf.keras.optimizers.Adam(0.0001), # 10x lower learning rate than the default
metrics=["accuracy"])
We are running the training for up to 100 epochs. Training will stop early once the EarlyStopping callback triggers. This is again going to take a while, processing over 75,000 training images per epoch. So, I'm going to grab another cup of coffee.
history_101_food_classes_all_data_fine_tune = loaded_model.fit(train_data,
epochs=100, # fine-tune for a maximum of 100 epochs
steps_per_epoch=len(train_data),
validation_data=test_data,
validation_steps=int(0.15 * len(test_data)), # validation during training on 15% of test data
callbacks=[create_tensorboard_callback("training_logs", "efficientb0_101_classes_all_data_fine_tuning"), # track the model training logs
model_checkpoint, # save only the best model during training
early_stopping, # stop model after X epochs of no improvements
reduce_lr]) # reduce the learning rate after X epochs of no improvements
Saving TensorBoard log files to: training_logs/efficientb0_101_classes_all_data_fine_tuning/20210805-063310
Epoch 1/100
2368/2368 [==============================] - 313s 127ms/step - loss: 0.8116 - accuracy: 0.7748 - val_loss: 0.8895 - val_accuracy: 0.7524
INFO:tensorflow:Assets written to: fine_tune_checkpoints/assets
Epoch 2/100
2368/2368 [==============================] - 296s 122ms/step - loss: 0.4660 - accuracy: 0.8666 - val_loss: 0.9664 - val_accuracy: 0.7505
Epoch 3/100
2368/2368 [==============================] - 290s 122ms/step - loss: 0.2495 - accuracy: 0.9273 - val_loss: 1.0183 - val_accuracy: 0.7532
Epoch 00003: ReduceLROnPlateau reducing learning rate to 1.9999999494757503e-05.
Epoch 4/100
2368/2368 [==============================] - 290s 122ms/step - loss: 0.0796 - accuracy: 0.9808 - val_loss: 1.0898 - val_accuracy: 0.7773
loaded_model.save("efficientnetb0_fine_tuned_101_classes_mixed_precision")
INFO:tensorflow:Assets written to: efficientnetb0_fine_tuned_101_classes_mixed_precision/assets
final_score = loaded_model.evaluate(test_data)
790/790 [==============================] - 56s 70ms/step - loss: 1.0577 - accuracy: 0.7834
Yes! We did it! We beat the original DeepFood score of 77.4%.
#saving the tensorboard data online - I have multiple models since I tried a couple of ideas
!tensorboard dev upload --logdir ./training_logs \
--name "Fine-tuning EfficientNetB0 on all Food101 Data" \
--description "Training results for fine-tuning EfficientNetB0 on Food101 Data with learning rate 0.0001" \
--one_shot
Click here to view the analysis on TensorBoard.dev.
Downloading the files to my computer. These files should also be available in the GitHub repository.
!zip -r /content/file.zip '101_food_feature_extract_mixedpred_model'
adding: 101_food_feature_extract_mixedpred_model/ (stored 0%)
adding: 101_food_feature_extract_mixedpred_model/assets/ (stored 0%)
adding: 101_food_feature_extract_mixedpred_model/variables/ (stored 0%)
adding: 101_food_feature_extract_mixedpred_model/variables/variables.index (deflated 73%)
adding: 101_food_feature_extract_mixedpred_model/variables/variables.data-00000-of-00001 (deflated 8%)
adding: 101_food_feature_extract_mixedpred_model/saved_model.pb (deflated 92%)
from google.colab import files
files.download("/content/file.zip")
!zip -r /content/file2.zip 'efficientnetb0_fine_tuned_101_classes_mixed_precision'
adding: efficientnetb0_fine_tuned_101_classes_mixed_precision/ (stored 0%)
adding: efficientnetb0_fine_tuned_101_classes_mixed_precision/assets/ (stored 0%)
adding: efficientnetb0_fine_tuned_101_classes_mixed_precision/variables/ (stored 0%)
adding: efficientnetb0_fine_tuned_101_classes_mixed_precision/variables/variables.index (deflated 78%)
adding: efficientnetb0_fine_tuned_101_classes_mixed_precision/variables/variables.data-00000-of-00001 (deflated 8%)
adding: efficientnetb0_fine_tuned_101_classes_mixed_precision/saved_model.pb (deflated 92%)
files.download("/content/file2.zip")
We achieved an accuracy of about 78.3%. With this, we beat the score set by the original DeepFood paper.
It feels great that we achieved our goal. But for those paying close attention, our training accuracy was very high compared to our test accuracy. This is probably due to overfitting. On investigation, I found that the EfficientNet model was pretrained on the ImageNet dataset, whose images closely resemble ours; because of this, our model ended up slightly overfitting.
Nevertheless, we achieved our objective, and that's all that matters.
Feel free to contact me on any of my socials for anything. Peace out!