1. At the heart of many computer vision tasks, such as image classification, object detection and
segmentation, is a Convolutional Neural Network (CNN).
2. A CNN is an arrangement of convolutional, pooling and fully connected layers that
learns patterns or features from a set of input images.
3. The resulting arrangement of these layers is what we call a model architecture. The
architecture determines the size of the model and the number of parameters associated with it.
4. The history of classification models shows that accuracy improvements in subsequent
years often came at the expense of an increased number of model parameters.
5. For example, in 2012 AlexNet won the ImageNet competition, beating the nearest competitor by
more than 10 percentage points in top-5 accuracy on the ImageNet dataset. But it used a whopping 62 million parameters.
6. GoogLeNet, the winner of the ImageNet competition in 2014, used only 6.8 million parameters
while being substantially more accurate than AlexNet.
7. So how did GoogLeNet, with roughly one-tenth of AlexNet's parameters, perform better than
AlexNet? One of the reasons is the model architecture itself.
8. Over the past several years, researchers have kept proposing new model architectures
that try to solve problems such as a huge number of parameters, large model size and learning complex
non-linear features. We also want our models to run on low-powered devices such as
mobile phones, edge devices and personal computers.
9. So how do we design a network that is, say, half the size, even if it is somewhat less
accurate? One way is model scaling.
10. The concept of model scaling is to take a standard network like GoogLeNet or ResNet and
scale it up (i.e. use more parameters) or down (i.e. use fewer parameters) by changing the
depth and/or width of the network, or the size of the input image.
11. CNNs are commonly developed at a fixed resource cost and then scaled up to
achieve better accuracy when more resources become available. For example, ResNet
can be scaled up from ResNet-18 to ResNet-200 by increasing the number of layers.
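
The point that the architecture fixes the parameter count can be made concrete with a little arithmetic. Below is a minimal, stdlib-only Python sketch; the helper names are illustrative, and `plain_cnn_params` describes a hypothetical toy network (a plain stack of 3×3 convolutions with biases, no pooling, batch normalization or skip connections), so it is not ResNet. It only shows how scaling depth or width changes the number of parameters.

```python
def conv2d_params(k, c_in, c_out, bias=True):
    """Weights of a k x k convolution from c_in to c_out channels, plus one bias per output channel."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def dense_params(n_in, n_out):
    """Fully connected layer: an n_in x n_out weight matrix plus n_out biases."""
    return n_in * n_out + n_out


def plain_cnn_params(depth, width, k=3, in_channels=3):
    """Hypothetical toy CNN: `depth` conv layers that all have `width` output channels."""
    total = conv2d_params(k, in_channels, width)  # first layer sees the RGB input
    for _ in range(depth - 1):
        total += conv2d_params(k, width, width)   # every later layer maps width -> width
    return total


print(plain_cnn_params(depth=10, width=64))   # baseline
print(plain_cnn_params(depth=20, width=64))   # twice as deep
print(plain_cnn_params(depth=10, width=128))  # twice as wide
print(dense_params(2048, 512))                # a 2048 -> 512 fully connected layer
```

Doubling the depth roughly doubles the count (334,144 to 703,424), while doubling the width roughly quadruples it (to 1,331,840), because each convolution's weight count is the product of its input and output channels.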

Model Scaling

Following are a few different ways in which scaling can be achieved:

1. Change model depth: A CNN with more layers (i.e. a deeper network) can
capture richer details about the image and is therefore usually more accurate than a model
with fewer layers (i.e. a shallower network).

e.g. ResNet-18 and ResNet-200 are both based on the ResNet architecture, but ResNet-200
is much deeper than ResNet-18 and is, therefore, more accurate. On the other hand,
ResNet-18 is smaller and faster to run.

But there are a few problems with using very deep networks:
They are difficult to train due to the vanishing gradient problem.
The gain in accuracy saturates after a certain depth.

2. Change model width: A CNN layer also has multiple channels, much like the R, G and B
channels of an image. A network with more channels per layer is considered wider than a
network with fewer channels.

e.g. WideResNet and MobileNets are not very deep but are wide.

3. Input image resolution: CNN architectures take images of a fixed size as input. A
512×512 image carries more information than a 256×256 image, so one can modify the
architecture to accept larger input images. This typically improves accuracy but requires
more processing power, because a 512×512 image has 4x as many pixels as a 256×256 image.
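
The compute cost of all three dimensions can be compared with the standard multiply-accumulate (MAC) count of a convolution, H·W·k²·C_in·C_out. The stdlib-only Python sketch below assumes a hypothetical network made of identical stride-1, "same"-padded 3×3 convolutions (the function names are illustrative), so the absolute numbers mean little; only the ratios matter.

```python
def conv_macs(h, w, k, c_in, c_out):
    """Multiply-accumulates of one stride-1, 'same'-padded k x k convolution on an h x w feature map."""
    return h * w * k * k * c_in * c_out


def toy_network_macs(depth, width, resolution, k=3):
    """Hypothetical network: `depth` identical conv layers, `width` channels, resolution x resolution input."""
    return depth * conv_macs(resolution, resolution, k, width, width)


base = toy_network_macs(depth=10, width=64, resolution=256)
for name, cost in [
    ("2x depth", toy_network_macs(depth=20, width=64, resolution=256)),
    ("2x width", toy_network_macs(depth=10, width=128, resolution=256)),
    ("2x resolution (256 -> 512)", toy_network_macs(depth=10, width=64, resolution=512)),
]:
    print(f"{name}: {cost / base:.1f}x the compute")
```

Under these assumptions, depth scales the cost linearly, while width and resolution scale it quadratically, which matches the 4x pixel count of a 512×512 input compared with a 256×256 one.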

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

Excerpt from the paper:

1. While these methods do improve accuracy, they usually require tedious manual tuning, and
still often yield suboptimal performance. What if, instead, we could find a more principled
method to scale up a CNN to obtain better accuracy and efficiency?
2. We propose a novel model scaling method that uses a simple yet highly effective
compound coefficient to scale up CNNs in a more structured manner.
3. Compound Model Scaling: In order to understand the effect of scaling the network, we
systematically studied the impact of scaling different dimensions of the model. While
scaling individual dimensions improves model performance, we observed that balancing
all dimensions of the network—width, depth, and image resolution—against the available
resources would best improve overall performance.
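
In the paper, compound scaling ties the three dimensions to a single coefficient φ: depth d = α^φ, width w = β^φ and resolution r = γ^φ, subject to α·β²·γ² ≈ 2 with α, β, γ ≥ 1. Since the FLOPS of a convolutional network grow roughly with d·w²·r², total FLOPS then grow by about 2^φ. For the B0 baseline the paper reports α = 1.2, β = 1.1 and γ = 1.15. The Python sketch below only evaluates these multipliers; the function names are illustrative, and the released B1 to B7 models use their own depth, width and resolution settings, so this is not expected to reproduce their exact configurations.

```python
# Compound-scaling constants reported in the EfficientNet paper for the B0 baseline.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi, alpha=ALPHA, beta=BETA, gamma=GAMMA):
    """Return the (depth, width, resolution) multipliers for compound coefficient phi."""
    return alpha ** phi, beta ** phi, gamma ** phi


def flops_multiplier(phi):
    """FLOPS of a conv net scale roughly with depth * width^2 * resolution^2."""
    d, w, r = compound_scale(phi)
    return d * w ** 2 * r ** 2


base_resolution = 224  # EfficientNet-B0 input size
for phi in range(4):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, "
          f"resolution ~{round(base_resolution * r)}px, FLOPS x{flops_multiplier(phi):.2f}")
```

Because α·β²·γ² is about 1.92 rather than exactly 2, each step of φ slightly less than doubles the FLOPS.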
The authors found that the choice of the initial model to scale makes a difference in the final
output. So they developed their own baseline architecture and called it EfficientNet-B0. Based
on this baseline model, they developed a family of networks called EfficientNet. Following is the
comparison taken from the research paper.

For this presentation, I have implemented EfficientNetB2 and compared it with ResNet50.

Mounting Drive

%tensorflow_version 1.x
import tensorflow as tf
print(tf.__version__)

tf.keras.backend.clear_session()

import numpy as np
import os
import matplotlib.pyplot as plt

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, GlobalAveragePooling2D
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications.resnet50 import preprocess_input as keras_pp
from tensorflow.keras.callbacks import ReduceLROnPlateau

from efficientnet.tfkeras import EfficientNetB2
from efficientnet.tfkeras import preprocess_input as eff_pp

1.15.2


Creating models
# constants for both models
input_shape = (224, 224, 3)
num_classes = 3

def getModel(baseModel):
  # stack a small classification head on top of the pretrained backbone
  return Sequential([
                  baseModel,
                  GlobalAveragePooling2D(name="global_avg_pool"),
                  Dense(units=512, activation='relu', name='Dense_1'),
                  Dropout(rate=0.2, name='Dropout_1'),
                  Dense(units=256, activation='relu', name='Dense_2'),
                  Dropout(rate=0.2, name='Dropout_2'),
                  Dense(units = num_classes, activation='sigmoid', name='output')
              ])

# Resnet50 model
resnet_base_model = ResNet50(weights="imagenet", include_top=False, input_shape=input_shape)

resnet_model = getModel(resnet_base_model)

print('\n[INFO] Summary of resnet50 model..\n')
resnet_model.summary()


# efficientnetB2 model
eff_base_model = EfficientNetB2(weights="imagenet", include_top=False, input_shape=input_shape)

effnet_model = getModel(eff_base_model)

print('\n[INFO] Summary of efficientnetB2 model..\n')
effnet_model.summary()

[INFO] Summary of resnet50 model..

Model: "sequential"
Layer (type) Output Shape Param #
resnet50 (Model) (None, 7, 7, 2048) 23587712
global_avg_pool (GlobalAvera (None, 2048) 0
Dense_1 (Dense) (None, 512) 1049088
Dropout_1 (Dropout) (None, 512) 0
Dense_2 (Dense) (None, 256) 131328
Dropout_2 (Dropout) (None, 256) 0
output (Dense) (None, 3) 771
Total params: 24,768,899
Trainable params: 24,715,779
Non-trainable params: 53,120
[INFO] Summary of efficientnetB2 model..

Model: "sequential_1"
Layer (type) Output Shape Param #
efficientnet-b2 (Model) (None, 7, 7, 1408) 7768562
global_avg_pool (GlobalAvera (None, 1408) 0
Dense_1 (Dense) (None, 512) 721408
Dropout_1 (Dropout) (None, 512) 0
Dense_2 (Dense) (None, 256) 131328
Dropout_2 (Dropout) (None, 256) 0
output (Dense) (None, 3) 771
Total params: 8,622,069
Trainable params: 8,554,501
Non-trainable params: 67,568

epochs = 30
learningRate = 0.001
dataset_path = '/content/drive/My Drive/Colab Notebooks/datasets/classification/

resnet_datagen = ImageDataGenerator(brightness_range=(0.10, 0.25),
                                    rotation_range=20,
                                    preprocessing_function=keras_pp,
                                    validation_split=0.2)

resnet_train_gen = resnet_datagen.flow_from_directory(dataset_path,
                                                      batch_size=32,
                                                      target_size=(224, 224),
                                                      subset='training')

resnet_val_gen = resnet_datagen.flow_from_directory(dataset_path,
                                                  batch_size=32,
                                                  target_size=(224, 224),
                                                  subset='validation')

# Adam optimiser
opt = Adam(lr=learningRate)

callbacks = [ReduceLROnPlateau(patience=2, verbose=1)]

# compiling model
print('[INFO] Compiling ResNet50 model...')
resnet_model.compile(loss = 'categorical_crossentropy', optimizer = opt, metrics

# training
print('[INFO] Training Resnet50...')
history_resnet = resnet_model.fit_generator(resnet_train_gen, epochs=epochs, val

Found 2016 images belonging to 3 classes.
Found 504 images belonging to 3 classes.
[INFO] Compiling ResNet50 model...
[INFO] Training Resnet50...
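
The `ReduceLROnPlateau(patience=2, verbose=1)` callback relies on Keras defaults for everything else: it monitors `val_loss`, multiplies the learning rate by `factor=0.1`, and only counts a drop larger than `min_delta=1e-4` as an improvement. As a rough illustration of that behaviour, here is a simplified pure-Python re-implementation. It is not the Keras source: it assumes `mode='min'`, ignores `cooldown`, and the function name `reduce_lr_on_plateau` is hypothetical.

```python
def reduce_lr_on_plateau(val_losses, lr, factor=0.1, patience=2, min_delta=1e-4, min_lr=0.0):
    """Simplified plateau schedule: return the learning rate in effect after each epoch."""
    best = float("inf")
    wait = 0
    schedule = []
    for loss in val_losses:
        if loss < best - min_delta:      # a big enough drop counts as an improvement
            best = loss
            wait = 0
        else:
            wait += 1                    # one more epoch without improvement
            if wait >= patience and lr > min_lr:
                lr = max(lr * factor, min_lr)  # cut the learning rate
                wait = 0
        schedule.append(lr)
    return schedule


# Hypothetical validation losses, not values from the run above.
print(reduce_lr_on_plateau([1.0, 0.8, 0.8, 0.8, 0.7, 0.7, 0.7], lr=0.001))
```

With `patience=2`, the learning rate drops tenfold after two consecutive epochs without improvement, so over 30 epochs it can fall several times if validation loss stalls.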
Training EfficientNet
effnet_datagen = ImageDataGenerator(brightness_range=(0.10, 0.25),
Found 2016 images belonging to 3 classes.
Found 504 images belonging to 3 classes.
[INFO] Compiling EfficientNetB2 model...
[INFO] Training EfficientNetB2...
Saving Models
'/content/drive/My Drive/Colab Notebooks/resnet50.h5')
Size of Resnet50 model file = 297.62772 MB
Size of EfficientNetB2 model file = 104.20259999999999 MB
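
Each file is roughly three times the raw float32 weight size. One plausible explanation, stated here as an assumption: if the models were saved with Keras's `model.save` (whose `include_optimizer` argument defaults to `True`), the HDF5 file also stores Adam's two moment estimates for every trainable parameter. The stdlib-only sketch below assumes 4-byte values and that the sizes above are decimal megabytes (10^6 bytes); it ignores HDF5 metadata and the model config, so the estimates should come out slightly low. The function name is hypothetical.

```python
def estimated_h5_size_mb(total_params, trainable_params, bytes_per_value=4, optimizer_slots=2):
    """Rough .h5 size: all weights plus `optimizer_slots` extra copies (Adam's m and v) of each trainable weight."""
    n_values = total_params + optimizer_slots * trainable_params
    return n_values * bytes_per_value / 1e6


# Parameter counts copied from the model summaries above.
resnet_mb = estimated_h5_size_mb(24_768_899, 24_715_779)
effnet_mb = estimated_h5_size_mb(8_622_069, 8_554_501)
print(f"ResNet50 estimate: {resnet_mb:.1f} MB")
print(f"EfficientNetB2 estimate: {effnet_mb:.1f} MB")
```

This gives about 296.8 MB and 102.9 MB, close to the reported 297.6 MB and 104.2 MB, so the size ratio between the two files mostly reflects their parameter counts.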
Plotting training and validation accuracy and loss

# Getting range or number of epochs actually trained.
N = np.arange(0, len(history_effnet.history["loss"]))

def plot_graphs(title, metric):

  plt.figure(figsize=(15, 5))
  plt.plot(N, history_effnet.history[metric], label="efficient_net_B2")
  plt.plot(N, history_resnet.history[metric], label="resnet_50")
  plt.xlabel("Epoch #")
  plt.ylabel(title)
  plt.legend()


plot_graphs("Training Accuracy", "acc")
plot_graphs("Training Loss", "loss")
plot_graphs("Validation Accuracy", "val_acc")
plot_graphs("Validation Loss", "val_loss")

print('\n[INFO] Evaluating ResNet50 Model..\n')
loss, accuracy = resnet_model.evaluate(resnet_val_gen)
print('Loss: {} and Accuracy: {}'.format(loss, accuracy))

print('\n[INFO] Evaluating EfficientNetB2 Model..\n')
loss, accuracy = effnet_model.evaluate(effnet_val_gen)
print('Loss: {} and Accuracy: {}'.format(loss, accuracy))

[INFO] Evaluating ResNet50 Model..

16/16 [==============================] - 8s 478ms/step - loss: 0.0219 - acc:

Loss: 0.021850345341590582 and Accuracy: 0.9920634627342224

[INFO] Evaluating EfficientNetB2 Model..

16/16 [==============================] - 8s 496ms/step - loss: 6.2201e-04 - a

Loss: 0.0006220091796933502 and Accuracy: 1.0

