Abstract

In our project, we investigate conditional adversarial networks that have been used in a variety of image-to-image translation tasks for quite some time now. These networks not only learn the mapping from the input image to the output image but also learn the loss function to train this mapping. We've thus been able to apply a similar approach to a wide variety of tasks with different loss functions. Most problems in image processing, computer graphics & computer vision can be posed as translating an input image into a corresponding output image. These tasks include synthesizing photos from labels, converting aerial images to maps, and colorizing images. In this project, we solely focus on one specific application: Pix2Pix, i.e., the reconstruction of objects from edge maps.

We also need to specifically mention the loss function that the CNN has to minimize. If we don't, the CNN might take a naive approach and produce disfigured images. Sharp, realistic images need accurate loss functions, predicting which requires quite a lot of expert knowledge.

We thus need to make use of a network that automatically learns an appropriate loss function to make the output indistinguishable from reality. This is exactly what Generative Adversarial Networks (GANs) do. Using a Generator G, GANs produce the output image, which is then classified by the Discriminator as real or fake. This eliminates the possibility of producing a blurry output [5, 6], since such an output would easily be classified as fake. In our project we direct our focus on Conditional Generative Adversarial Networks (CGANs), where we condition on an input image to produce a corresponding output image.
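The adversarial game sketched above can be written down concretely. Using the standard conditional-GAN objective (with x the input edge map, y the target image, and z a noise vector), the generator G plays a minimax game against the discriminator D; pix2pix additionally mixes in an L1 term to keep outputs close to the ground truth:

```latex
\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}\left[\log D(x, y)\right]
                         + \mathbb{E}_{x,z}\left[\log\left(1 - D(x, G(x, z))\right)\right]

\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\left[\lVert y - G(x, z) \rVert_1\right]

G^{*} = \arg\min_{G}\max_{D}\; \mathcal{L}_{cGAN}(G, D) + \lambda\, \mathcal{L}_{L1}(G)
```

D tries to maximize the objective (classify real pairs as real, generated pairs as fake), while G tries to minimize it; the L1 term penalizes blurry, off-target outputs directly.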
Another interesting application of Pix2Pix was made by Christopher Hesse [18] in his Image-to-Image demo of cats. Around 2k stock photos, along with edges automatically generated from those photos, were used for training.
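Hesse's actual edge-extraction pipeline is more sophisticated than this, but as a minimal illustration of what "automatically generated edges" means, here is a crude Sobel-based edge map in plain NumPy (the function name and threshold are our own choices):

```python
import numpy as np

def sobel_edges(gray: np.ndarray, threshold: float = 0.25) -> np.ndarray:
    """Crude automatic edge map: Sobel gradient magnitude, thresholded
    relative to the strongest edge. `gray` is a 2-D float array."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    gx = np.zeros_like(gray, dtype=float)
    gy = np.zeros_like(gray, dtype=float)
    h, w = gray.shape
    for i in range(1, h - 1):            # skip the one-pixel border
        for j in range(1, w - 1):
            patch = gray[i - 1:i + 2, j - 1:j + 2]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    mag = np.hypot(gx, gy)
    if mag.max() == 0:
        return np.zeros_like(mag, dtype=np.uint8)
    return (mag > threshold * mag.max()).astype(np.uint8)
```

A learned detector such as HED produces far cleaner, object-level edges than this gradient filter, which is why it is the detector actually used later in this project.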
3. Approach

In our approach to implementing the Pix2Pix model on an entirely new dataset, we first need to prepare our training and test data: the input images need to be resized, combined with their edge maps, and then split into train and test sets.
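As a rough sketch of this preparation step (the function names and the 5% test fraction are our own illustration, not the project's actual code), the pairing and the random split could look like the following; loading and resizing the image files themselves would be done with an imaging library such as Pillow:

```python
import random
import numpy as np

def make_pair(edge: np.ndarray, photo: np.ndarray) -> np.ndarray:
    """Concatenate an edge map (left) and its photo (right) into one
    H x 2W image -- the side-by-side layout the pix2pix loader reads.
    Both inputs are assumed to be HxWx3 uint8 arrays, already resized
    to the same dimensions (e.g. 256x256)."""
    assert edge.shape == photo.shape, "resize both images first"
    return np.concatenate([edge, photo], axis=1)

def train_test_split(names, test_fraction=0.05, seed=0):
    """Shuffle file names deterministically and split them into
    (train, test) lists."""
    names = sorted(names)
    random.Random(seed).shuffle(names)
    n_test = max(1, int(len(names) * test_fraction))
    return names[n_test:], names[:n_test]
```

Seeding the shuffle keeps the split reproducible, so the same images end up in the test set on every run.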
The successively inherited edge maps are more concise, hence the term "nested". This approach reflects an integrated learning of hierarchical features.
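To illustrate the idea (this is our own simplified sketch, not HED's learned fusion layer), the nested side-output edge maps at several scales can be combined into a single map by a weighted average:

```python
import numpy as np

def fuse_side_outputs(side_outputs, weights=None):
    """Fuse nested side-output edge maps (each an HxW array with values
    in [0, 1]) into one edge map via a weighted average. In HED proper
    the fusion weights are learned; equal weights here are a stand-in."""
    stack = np.stack(side_outputs)               # (n_scales, H, W)
    if weights is None:
        weights = np.full(len(side_outputs), 1.0 / len(side_outputs))
    weights = np.asarray(weights, dtype=float)
    return np.tensordot(weights, stack, axes=1)  # (H, W)
```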
Figure 4 is the output image generated by our Pix2Pix model corresponding to the edge map given in Figure 3.

Figure 4.

Figure 5 is the actual ground-truth (real) image corresponding to the edge map given as input.

Figure 5.

From the results obtained above, we could confirm that our Pix2Pix model was working as expected on the dataset proposed in the base paper. In order to extend the implementation to our new dataset of fruits [4], the stock images were obtained from an open-source Kaggle dataset, resized, and then combined with their edge maps. The data was randomly split into about 5k training images and 300 test images. After appropriate HED post-processing using MATLAB and training the model, the results obtained were as follows:

Figure 6.

Figure 6 is an example of an input fruit image given to test our model.

Figure 7.

Figure 7 is the output fruit image generated by our Pix2Pix model corresponding to the edge map given in Figure 6.

Figure 8.

Figure 8 is the actual ground-truth (real) fruit image corresponding to the edge map given as input.

We now see that the Pix2Pix model proposed in our base paper can without doubt be extended to a customized dataset of our choice. Even though the images look close to perfect to the human eye, there are multiple evaluation techniques that can be applied to verify its correctness, such as the ones mentioned below.

5. Conclusion

Image-to-image translation is a challenging problem and often requires specialized models and loss functions for a
given translation task or dataset. The results in this paper suggest that conditional adversarial networks yield notable results for many image-to-image translation tasks. These networks learn a loss adapted to the task and data at hand, which makes them applicable in a wide variety of settings.

References

[3] A. Yu and K. Grauman. Fine-grained visual comparisons with local learning. CVPR, 2014.

[4] H. Muresan and M. Oltean. Fruit recognition from images using deep learning. Acta Univ. Sapientiae, Informatica, 10(1), 2018.

[13] M. Mathieu, C. Couprie, and Y. LeCun. Deep multi-scale video prediction beyond mean square error. ICLR, 2016.

[14] C. Li and M. Wand. Precomputed real-time texture synthesis with markovian generative adversarial networks. ECCV, 2016.

[15] Y. Zhou and T. L. Berg. Learning temporal transformations from time-lapse videos. ECCV, 2016.

[18] C. Hesse. Interactive Image Translation with pix2pix-tensorflow. https://affinelayer.com/pixsrv/

[19] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. NIPS, 2014.

[27] D. Ulyanov, A. Vedaldi, and V. Lempitsky. Instance normalization: The missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022, 2016.
[28] P. Sermanet, S. Chintala, and Y. LeCun. Convolutional neural networks applied to house numbers digit classification. ICPR, 2012.

[29] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.

[30] D. Eigen and R. Fergus. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In Proceedings of the IEEE International Conference on Computer Vision, pages 2650–2658, 2015.

[31] S. Iizuka, E. Simo-Serra, and H. Ishikawa. Let there be Color!: Joint end-to-end learning of global and local image priors for automatic image colorization with simultaneous classification. ACM Transactions on Graphics (TOG), 35(4), 2016.