
IMAGE SUPER-RESOLUTION

USING DEEP LEARNING


NIKHIL GUPTA (2015EEB1066) and KARTIKEY GHAI (2015EEB1058)

INTRODUCTION-
In today’s world, high-resolution images and videos are desirable in a wide
variety of applications such as video surveillance, medical diagnosis, and
improving the performance of algorithms such as face detection. As imaging
techniques develop rapidly, resolution has reached a new level, so image
resolution enhancement is in great demand to keep pace with them. Currently
available techniques are not sufficient to satisfy these demands.

There are situations in which the captured image has low resolution relative
to the application, and there is no way to take an image of better resolution:
the device may be restricted to a particular resolution, or the object in the
image may be very small compared to the size of the image. So there is a need
for some technique that can improve the resolution of these images using
image features.

Super-resolution is one of the techniques for converting low-resolution
images into high-resolution images. Methods to implement super-resolution
include sparse coding (SC), neighbour embedding with locally linear
embedding, bicubic interpolation, and convolutional neural networks.

We implemented a convolutional neural network (CNN) for super-resolution,
following the approach of Dong et al. [1]. It uses the features of the images
to predict their appearance at high resolution; this is an example-based
learning method. The convolutional neural network is trained on the training
set and tested on the validation set.
Steps followed in this method –
1. Pre-processing
2. Feature extraction
3. Non-linear mapping
4. Reconstruction
Feature extraction, non-linear mapping and reconstruction are cast as layers
of the convolutional neural network.

Pre-processing – This step involves upscaling the low-resolution image to the
high-resolution size (say the up-scaled image is Img). Interpolation alone
leaves the image blurry, i.e. of degraded quality.
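The pre-processing step above can be sketched with Pillow’s bicubic resize; the library choice and the function name `preprocess` are our own illustration, not code from the original project:

```python
from PIL import Image

def preprocess(img, target_size):
    """Bicubic up-scaling of a low-resolution image to the target size.

    The result (Img in the text) has the right dimensions but is blurry;
    the network's job is to recover the missing detail.
    """
    return img.resize(target_size, Image.BICUBIC)
```

For example, a 32x32 input would be passed through `preprocess(img, (64, 64))` before being fed to the network.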

Feature Extraction – This step extracts a set of feature maps from Img.


Non-linear Mapping – This step maps the feature maps representing the
low-resolution image to feature maps representing the high-resolution image.

Reconstruction – This step produces the high-resolution image from the
high-resolution patches.

Structure of the Convolutional Neural Network implemented –

Input image: low-resolution image up-sampled to the desired high-resolution
size, with ‘p’ channels, i.e. the colour components of the image.

Layer-1-
64 filters of size 9x9

Activation function used: ReLU

Output: 64 feature maps

Layer-2-
32 filters of size 1x1

Activation function used: ReLU

Output: 32 feature maps

Layer-3-
1 filter of size 5x5

Activation function used: identity

Output: image of the desired size

The network as a whole (not each layer individually) is trained with the
Adam optimizer, using ‘mean-squared error’ as the loss function.
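As a sketch of how the three layers fit together, here is a minimal NumPy forward pass of the 9-1-5 network described above. The weights are random placeholders (a trained network would learn them by minimising the MSE loss with Adam), and the helper names are ours, not the original code:

```python
import numpy as np

def conv2d(x, kernels):
    """'Same'-padded 2-D convolution. x: (H, W, C_in),
    kernels: (k, k, C_in, C_out). Plain-loop reference code, not fast."""
    k = kernels.shape[0]
    pad = k // 2
    h, w, _ = x.shape
    c_out = kernels.shape[3]
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros((h, w, c_out))
    for i in range(h):
        for j in range(w):
            patch = xp[i:i + k, j:j + k, :]  # (k, k, C_in) window
            out[i, j, :] = np.tensordot(patch, kernels,
                                        axes=([0, 1, 2], [0, 1, 2]))
    return out

def srcnn_forward(img, w1, w2, w3):
    """Feature extraction -> non-linear mapping -> reconstruction."""
    f1 = np.maximum(conv2d(img, w1), 0)   # layer 1: 64 maps, ReLU
    f2 = np.maximum(conv2d(f1, w2), 0)    # layer 2: 32 maps, ReLU
    return conv2d(f2, w3)                 # layer 3: identity activation

# Random weights just to check shapes; real weights come from training.
rng = np.random.default_rng(0)
w1 = rng.normal(0, 0.01, (9, 9, 1, 64))   # 64 filters of size 9x9
w2 = rng.normal(0, 0.01, (1, 1, 64, 32))  # 32 filters of size 1x1
w3 = rng.normal(0, 0.01, (5, 5, 32, 1))   # 1 filter of size 5x5
```

Because every layer uses ‘same’ padding here, the output has the same spatial size as the up-scaled input, matching the “image of the desired size” above.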

Experiment -
The dataset we are using contains 101 images, downloaded from Google in
different sizes: 32x32, 64x64, 124x124 and 225x225. It consists of face
images of women celebrities. The dataset was split into training and testing
sets: the training set contains 90 images and the test (validation) set
contains 11 images. The model was trained with different filter sizes and
numbers of layers. When we added one more layer in the middle of the network,
with a filter size of 1x1, and tried to train it, the network converged
slowly and could not match the results of the three-layer network. So, the
filter sizes for each layer that give the best results are –

Size of filter for layer 1 – 9x9

Size of filter for layer 2 – 1x1

Size of filter for layer 3 – 5x5

This network was trained under two specifications.

First Specification –
Low resolution image – 32x32

High resolution image – 64x64

The model with this specification is trained for 1000 epochs with a batch
size of 1. First, the 32x32 images are up-scaled to 64x64 and given as input
to the convolutional neural network, which is then trained for 1000 epochs on
the training set. After training, we run the model to generate 64x64 images
and compare them with the images that were actually of size 64x64.
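The comparison in this report is visual; a common numeric proxy for how close a reconstruction is to the true high-resolution image (not reported in the original experiment) is PSNR, sketched here:

```python
import numpy as np

def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between a ground-truth image and a
    reconstruction; higher means closer to the reference."""
    diff = reference.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(diff ** 2)  # the same quantity the network's loss minimises
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)
```

Since the training loss is mean-squared error, a lower final loss directly corresponds to a higher PSNR on the same images.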

Second Specification –
Low resolution image – 124x124

High resolution image – 225x225

The model with this specification is trained for 600 epochs with a batch size
of 1. First, the 124x124 images are up-scaled to 225x225 and given as input
to the convolutional neural network, which is then trained for 600 epochs on
the training set. After training, we run the model to generate 225x225 images
and compare them with the images that were actually of size 225x225.
Observations –
Some of the output images from different layers are shown below. These
layers are effectively behaving as filters.

Representation in the first layer –

Fig.1 Fig.2 Fig.3

Fig.4 Fig.5 Fig.6


Fig.7 Fig.8

The output of the first layer appears to detect features of the image.

Fig.5 and Fig.6 seem to detect edge features of the image.

It can also be observed that some kind of smoothing filter has been applied
to produce Fig.7 and Fig.8. So it can be inferred that some of the learned
filters behave as edge-detection filters, some as smoothing filters, etc.
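To make the “edge detector vs. smoothing filter” interpretation concrete, here is a small sketch applying a hand-made Sobel edge kernel and a box-blur kernel, of the kind the first layer appears to learn on its own; the helper and the kernels are illustrative, not taken from the trained network:

```python
import numpy as np

def apply_filter(img, kernel):
    """'Valid' correlation of a 2-D grayscale image with a small kernel."""
    k = kernel.shape[0]
    h, w = img.shape
    out = np.zeros((h - k + 1, w - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + k, j:j + k] * kernel)
    return out

# A classic edge-detection kernel (horizontal Sobel) and a smoothing kernel.
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
box_blur = np.ones((3, 3)) / 9.0

# An edge filter responds with zeros on a flat region and strongly along a
# vertical step edge; the box blur leaves a flat region unchanged.
flat = np.full((8, 8), 50.0)
step = np.hstack([np.zeros((8, 4)), np.full((8, 4), 255.0)])
```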

Representation in the second layer –

Images in this layer closely resemble the original images. Different textures
and colours in the images are visible; almost all the features are well
learned by the network at this layer.
Comparison of Input Images and Reconstructed Images

Input Image
Output of Trained Network (Reconstructed Image)

Input Image
Output of Trained Network (Reconstructed Image)

The image output from the neural network is smoothed out: contrast is
reduced, and the image looks slightly washed out. Edges are not clearly
visible because of the smoothing, and some of the details are lost.

Now we apply the model to an image taken from outside the dataset and observe
how it performs on this image.
Input Image (fig.1)

Reconstructed Image (fig.2)

Comparing fig.1 and fig.2, it can be seen that the layers do not work as well
as they did on our dataset. Besides the loss of detail, the image is neither
being smoothed nor are the edges being detected correctly. This is because
the network was trained on face images, whose features are very different
from those of fig.1, so the network is not able to perform well.
Conclusion –
Although the neural network is able to learn features from the images, due to
the small number of epochs it is not able to preserve details. If we trained
the model for a larger number of epochs, it might learn the features more
accurately and its performance might improve.

Also, the dataset we are using is small and not very general. A larger, more
general dataset would improve performance in real-life use, but it would also
further increase the number of epochs required, which is computationally
expensive.

Hence, this model cannot yet be applied in the medical field, because details
of the images are not preserved.

References-
[1] C. Dong, C. C. Loy, K. He and X. Tang, "Learning a Deep Convolutional
Network for Image Super-Resolution," 2014.
[2] A. Greaves and H. Winter, "Multi-Frame Video Super-Resolution Using
Convolutional Neural Networks," 2016.
[3] Wikipedia.
