DCGAN (Deep Convolutional Generative Adversarial Networks)
Presented by:
Lakpa Dorje Tamang
Dept. of Information and Communication Engineering
Introduction
• Unsupervised representation learning with Deep Convolutional GANs
• Authors: Alec Radford, Luke Metz, and Soumith Chintala
• Previously, we talked about GANs, which provide an attractive alternative to maximum likelihood
techniques.
• Their learning process and the lack of a heuristic cost function (such as pixel-wise mean squared error) are
attractive for representation learning.
• However, GANs are unstable to train, often resulting in generators that produce nonsensical outputs.
Representation learning from unlabeled data
• Unsupervised representation learning is a fairly well-studied problem in general computer vision research, as
well as in the context of images.
• A classic approach is to cluster the data (for example, with K-means) and use the clusters to improve
classification scores.
• In the context of images, hierarchical clustering of image patches can learn powerful image
representations.
• Another method is to train autoencoders (for example, ones that separate the "what" and "where" components
of the code, or ladder structures) that encode an image into a compact code and decode the code to reconstruct
the image as accurately as possible.
Generating Natural Images

• Non-parametric: These models often match against a database of existing images, frequently matching
patches of images. They have been used in texture synthesis, super-resolution, and inpainting.
• Parametric:
-A variational sampling approach can generate images, but the samples often suffer from being blurry.
-GAN-generated images suffered from being noisy and incomprehensible.
-A Laplacian pyramid extension to the GAN approach showed higher image quality, but the objects still
looked wobbly because of noise introduced by chaining multiple models.
-A deconvolutional network approach had some success with generating natural images, but it did not
leverage the generator for supervised tasks.
Visualizing the internal of CNNs

• Neural networks have been constantly criticized as black-box methods, with little understanding of what
the networks do in the form of a simple, human-consumable algorithm.
• In the context of CNNs, using deconvolutions and filtering the maximal activations reveals the
approximate purpose of each convolution filter in the network.
• Similarly, using gradient descent on the inputs lets us inspect the ideal image that activates certain
subsets of filters.
Approach and Model Architecture

• Historical attempts to scale up GANs using CNNs to model images were unsuccessful. This motivated the
development of an alternative approach (the Laplacian pyramid GAN) that iteratively upscales low-resolution
generated images, which can be modeled more reliably.
• In this paper, the authors identify a family of architectures that results in stable training across a
range of datasets.
• These architectures also allow training higher-resolution and deeper generative models.
• The core of the approach is adopting and modifying three recently demonstrated changes to CNN architectures.
First change

• Replace deterministic pooling functions (such as max pooling) with strided convolutions.
• This lets the network learn its own spatial downsampling, which is used in the discriminator.
• The same idea is used in the generator, where fractionally strided convolutions let it learn its own
spatial upsampling.
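The difference between fixed pooling and learned resampling can be sketched on a 1-D signal. This is a toy illustration, not code from the paper: the function names and the example kernels are assumptions chosen for readability, and in a real network the kernel weights would be learned rather than set by hand.

```python
def max_pool_1d(signal, size=2):
    """Fixed downsampling: keep the maximum of each non-overlapping window."""
    return [max(signal[i:i + size]) for i in range(0, len(signal) - size + 1, size)]


def strided_conv_1d(signal, kernel, stride=2):
    """Learnable downsampling: slide the kernel over the signal in steps of `stride`."""
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, signal[i:i + k]))
        for i in range(0, len(signal) - k + 1, stride)
    ]


def transposed_conv_1d(signal, kernel, stride=2):
    """Learnable upsampling: each input value adds a scaled copy of the kernel to the output."""
    k = len(kernel)
    out = [0.0] * ((len(signal) - 1) * stride + k)
    for i, x in enumerate(signal):
        for j, w in enumerate(kernel):
            out[i * stride + j] += x * w
    return out


signal = [1.0, 3.0, 2.0, 5.0, 4.0, 0.0, 6.0, 1.0]
pooled = max_pool_1d(signal)                            # 8 values -> 4 values
averaged = strided_conv_1d(signal, kernel=[0.5, 0.5])   # 8 values -> 4 values
upsampled = transposed_conv_1d([1.0, 2.0], kernel=[1.0, 1.0])  # 2 values -> 4 values
```

With the hand-picked kernel `[0.5, 0.5]` the strided convolution behaves like average pooling; the point of the change is that training can move those weights to whatever downsampling or upsampling suits the data.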
Second change

• Eliminate fully connected layers on top of convolutional features. The strongest example of this trend is
global average pooling.
• The authors found that global average pooling increased model stability but hurt convergence speed.
Third change

• Batch normalization stabilizes learning by normalizing the input to each unit to have zero mean and
unit variance.
• This helps with training problems that arise from poor initialization and helps gradient flow in
deeper models.
• Directly applying batchnorm to all layers, however, resulted in sample oscillation and model instability.
• This was avoided by not applying batchnorm to the generator output layer and the discriminator input layer.
• The ReLU (rectified linear unit) activation is used in the generator, with the exception of the output
layer, which uses the tanh function.
• Using a bounded activation allowed the model to learn more quickly to saturate and cover the color space of
the training distribution.
• In the discriminator, the leaky rectified activation (Leaky ReLU) worked well, especially for higher-resolution
modeling. This is in contrast to the original GAN paper, which used the maxout activation.
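As a rough sketch of the operations named above, the following plain-Python functions normalize one unit's activations across a mini-batch and apply the three activations. The function names and the `eps` value are assumptions; the learned scale and shift parameters that batch normalization normally includes are left out for brevity, and the leak slope of 0.2 is the value quoted in the training details.

```python
import math


def batch_norm(values, eps=1e-5):
    """Normalize one unit's activations over a mini-batch to zero mean, unit variance.

    The learned scale and shift of full batch normalization are omitted here.
    """
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def relu(x):
    """Generator hidden layers: pass positives through, zero out negatives."""
    return max(0.0, x)


def leaky_relu(x, slope=0.2):
    """Discriminator layers: negatives are scaled by a small slope instead of zeroed."""
    return x if x > 0 else slope * x


def tanh_output(x):
    """Generator output layer: bounded to the range [-1, 1]."""
    return math.tanh(x)


batch = [2.0, 4.0, 6.0, 8.0]
normalized = batch_norm(batch)
norm_mean = sum(normalized) / len(normalized)
norm_var = sum((v - norm_mean) ** 2 for v in normalized) / len(normalized)
```

Here `norm_mean` comes out at approximately 0 and `norm_var` at approximately 1, which is the property the slide attributes to batch normalization.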
Details of adversarial training

• In this paper, DCGAN was trained on three datasets: Large-scale Scene Understanding (LSUN), Imagenet-1k, and a
newly assembled faces dataset.
• No preprocessing was applied to the training images besides scaling to the range of the tanh activation
function, [-1, 1].
• All models were trained with mini-batch stochastic gradient descent (SGD) with a mini-batch size of 128.
• All weights were initialized from a zero-centered normal distribution with standard deviation 0.02.
• In the Leaky ReLU, the slope of the leak was set to 0.2.
• Optimizer: Adam
• Learning rate: 0.0002
• The momentum term β1 was set to 0.5 instead of the suggested default of 0.9, which helped stabilize training.
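The settings listed above can be collected in one place, as in the sketch below. The dictionary keys and helper names are illustrative, not taken from the paper or from any library, and the pixel scaling assumes 8-bit input images in the range [0, 255], which the slides do not state.

```python
import random

# Hyperparameters as listed on the slide; the key names are illustrative.
HYPERPARAMS = {
    "batch_size": 128,
    "optimizer": "adam",
    "learning_rate": 0.0002,
    "adam_beta1": 0.5,
    "leaky_relu_slope": 0.2,
    "init_std": 0.02,
}


def scale_pixel(value):
    """Map a pixel value in [0, 255] (assumed 8-bit input) to the tanh range [-1, 1]."""
    return value / 127.5 - 1.0


def init_weights(count, std=HYPERPARAMS["init_std"], seed=0):
    """Draw `count` weights from a zero-centered normal distribution."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, std) for _ in range(count)]


weights = init_weights(10_000)
weight_mean = sum(weights) / len(weights)
weight_std = (sum((w - weight_mean) ** 2 for w in weights) / len(weights)) ** 0.5
```

With 10,000 samples, `weight_mean` should sit close to 0 and `weight_std` close to 0.02.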
• The accompanying figure is a block diagram of the DCGAN generator used for LSUN scene modeling.
• A vector of 100 random values, drawn uniformly between -1 and 1, is passed through several transposed
convolution layers to turn it into an image.
• A series of four fractionally strided convolutions converts this high-level representation into a
64 × 64 pixel image.
• No fully connected or pooling layers are used.
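The spatial growth through the generator can be traced with the standard output-size formula for a transposed convolution. The kernel size, padding, starting 4 × 4 grid, and channel widths below are assumptions: they follow a common way of reproducing the generator diagram and are not stated on these slides.

```python
def transposed_conv_out(size, kernel=4, stride=2, padding=1):
    """Output width of a transposed convolution (no output padding, no dilation)."""
    return (size - 1) * stride - 2 * padding + kernel


# Assumed channel widths: the projected latent vector, then four upsampling layers.
channels = [1024, 512, 256, 128, 3]

size = 4  # assumed spatial size after projecting and reshaping the 100-d vector
shapes = [(channels[0], size, size)]
for out_channels in channels[1:]:
    size = transposed_conv_out(size)  # stride 2 doubles the width and height
    shapes.append((out_channels, size, size))

for shape in shapes:
    print(shape)
```

Each of the four stride-2 layers doubles the width and height, so the trace runs 4, 8, 16, 32, 64 and ends at a 3-channel 64 × 64 image.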
LSUN

• As the visual quality of samples from generative image models has improved, concerns about overfitting and
memorization of training samples have risen.
• The paper demonstrates training the model on the LSUN bedrooms dataset, which contains over 3 million
training examples.
• The figure below shows generated bedrooms after one training pass (one epoch) through the dataset.
• This figure shows generated bedrooms after five epochs of training.
Faces
• This dataset has 3M images from 10K people.
• An OpenCV face detector was run on these images, keeping the detections of sufficiently high
resolution. This gave approximately 350,000 face boxes, which were used for training.
• No data augmentation was applied to the images.
• Data augmentation is a technique commonly used to increase the amount of image data for image
classification tasks.
Imagenet-1k

• Imagenet-1k was also used as a source of natural images for unsupervised training.
• No data augmentation was applied.
Classifying CIFAR-10 using GANs as a feature extractor

• A common technique for evaluating the quality of unsupervised representation learning algorithms is to apply
them as feature extractors on supervised datasets and evaluate the performance of linear models fitted on top of
these features.
Classifying SVHN digits using GANs as a feature extractor

• Here, 1000 uniformly class distributed training examples are randomly selected and used to train a
regularized linear L2-SVM classifier on top of the same feature extraction pipeline used for CIFAR-10.
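Selecting a class-balanced subset like the one described above can be sketched as follows. The function name, the seed, and the toy label list are hypothetical; feature extraction and the L2-SVM itself are not shown.

```python
import random
from collections import Counter, defaultdict


def sample_balanced(labels, total=1000, seed=0):
    """Return indices of `total` examples with an equal number drawn from each class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    per_class = total // len(by_class)
    rng = random.Random(seed)
    chosen = []
    for label in sorted(by_class):
        # Sample without replacement inside each class.
        chosen.extend(rng.sample(by_class[label], per_class))
    return chosen


# Toy stand-in for a labeled digit dataset: 10 classes, 500 examples each.
labels = [i % 10 for i in range(5000)]
subset = sample_balanced(labels)
class_counts = Counter(labels[i] for i in subset)
```

With ten digit classes, this yields 100 randomly chosen training examples per class.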
What is a latent space?
• The latent space is the space in which the data lies in the bottleneck layer.
• The latent space contains a compressed representation of the image, which is the only information the
decoder is allowed to use to try to reconstruct the input as faithfully as possible.
GAN feature visualization

• Feature visualization in CNNs is typically achieved by using gradient descent to produce an image that
results in maximum activation of a given feature.
• In this paper, the authors tested this with their discriminator model on the LSUN bedrooms dataset and
present the following image.
• These are the features that the discriminator uses to tell whether the images are real or fake.
Vector Arithmetic in Latent space

• What is interpolation?
• It is the process of finding intermediate data points between specific known data points in a space.
• The generator model in the GAN architecture takes a point from the latent space as input and generates a
new image.
• New images are generated from random points in the latent space.
• The paper gives two demonstrations. The first is vector arithmetic with faces.
• Smiling woman - neutral woman + neutral man = smiling man
• The second demonstration is the transition between two generated faces.
• A series of points is created on a linear path between two points in the latent space, and these points
are used to generate a series of images that show the transition between the two faces.
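Both demonstrations reduce to simple operations on latent vectors, sketched below. The 3-dimensional vectors and all names are made up for illustration; real DCGAN latent vectors have 100 dimensions, and each resulting point would still need to be passed through a trained generator to obtain an image.

```python
def vec_add(a, b):
    """Element-wise sum of two latent vectors."""
    return [x + y for x, y in zip(a, b)]


def vec_sub(a, b):
    """Element-wise difference of two latent vectors."""
    return [x - y for x, y in zip(a, b)]


def interpolate(start, end, steps):
    """Return `steps` evenly spaced points on the straight line from start to end."""
    points = []
    for i in range(steps):
        t = i / (steps - 1)
        points.append([(1 - t) * s + t * e for s, e in zip(start, end)])
    return points


# Toy latent codes (hypothetical values chosen so the arithmetic is easy to follow).
smiling_woman = [1.0, 0.0, 1.0]
neutral_woman = [0.0, 0.0, 1.0]
neutral_man = [0.0, 1.0, 0.0]

# Smiling woman - neutral woman + neutral man
smiling_man = vec_add(vec_sub(smiling_woman, neutral_woman), neutral_man)

# Five latent points tracing the transition from the neutral to the smiling face.
path = interpolate(neutral_man, smiling_man, steps=5)
```

Feeding every point in `path` to the generator would produce the sequence of images that shows one face gradually turning into the other.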
Evaluating DCGAN's capability to capture data distributions
Summary of DCGAN

• Replace all max pooling with strided convolutions
• Use transposed convolutions for upsampling
• Eliminate fully connected layers
• Use batch normalization everywhere except the generator output layer and the discriminator input layer
• Use ReLU in the generator, except for the output, which uses tanh
• Use Leaky ReLU in the discriminator
THANK YOU
