
EE7390: Pattern Recognition and Machine Learning

Project Report

EE14MTECH11005
EE14RESCH11006

Determining mood from facial expression using Convolutional Neural Networks
Dr. K. Sri Rama Murty (Instructor: EE7390)
L. Praveen Kumar Reddy (EE14MTECH11005), M. Shanmukh Reddy (EE14RESCH11006)

Abstract:

In many pattern recognition and image classification problems, Convolutional Neural Networks (CNNs) often provide a better solution than other techniques. Determining a person's mood from his or her facial expression is an interesting learning problem. Here we propose a supervised CNN model. The model learns features from a database of face images and estimates all of its parameters from that data. Once trained on a particular dataset, the model predicts the mood from the facial expression. On a test set of frontal-view images, our model predicts the mood with an accuracy of 67.001% across 7 different expressions: angry, happy, sad, disgust, surprise, neutral and fear.

Keywords: CNNs, pooling, softmax.

Introduction:

The technology in human-computer interaction systems keeps improving. Machines (computers and electronic devices) have advanced to a stage where they can detect, process and reconstruct meaningful information from speech signals, images, videos and more. Mimicking the human visual system is quite challenging. It would, however, provide a good interface for human-computer interaction and communication.

Humans interact with each other mostly through speech. They also use body gestures to emphasize certain parts of speech and/or to display emotions. To achieve more effective human-computer interaction, recognizing a person's emotional state from his or her face could prove to be an invaluable tool. With this motivation, we propose a model that recognizes a person's mood from his or her facial expression.

Emotion detection has been researched for the past two decades, and various machine-learning-based models have been proposed in recent years. In [1], for example, facial landmarks are first extracted from every image and then used as input to train an SVM (Support Vector Machine) model. That approach does not take the spatial relationships in the image into account, and it depends on finding landmarks in every image. We instead propose a CNN-based model with several convolutional layers, each followed by a subsampling/pooling layer, to extract features. These features are then fed to a neural network with a softmax activation function.

Figure 1: Seven emotions, starting with surprise, sad, neutral, happy, disgust, anger and fear.
Database source: Emotion Lab (http://www.emotionlab.se/resources/kdef) [3]

The remaining sections of this report are organised as follows: a brief overview of CNNs, followed by the proposed recognition algorithm, results and conclusion.

Convolutional Neural Networks [2]:

A neural network is a system of interconnected artificial neurons that exchange messages with each other. The connections have numeric weights that are tuned during training, so a properly trained network responds correctly when presented with an image or pattern to recognize. The network consists of multiple layers of feature-detecting neurons. Each layer has many neurons that respond to different combinations of inputs from the previous layers. The layers are built up in a hierarchy:
- the first layer detects a set of primitive patterns in the input;
- the second layer detects patterns of patterns;
- the third layer detects patterns of those patterns, and so on.

The design of a CNN is inspired by the visual cortex of the brain. The visual cortex contains many cells that detect light in small, overlapping subregions of the visual field, called receptive fields. These cells act as local filters over the input space, and the more complex cells have larger receptive fields. The convolution layer in a CNN performs the function that these cells perform in the visual cortex.

A CNN is a special case of the neural network described above. It consists of one or more convolutional layers, often each followed by a subsampling layer, and then one or more fully connected layers as in a standard neural network. In a CNN, the convolutional layers play the role of feature extractor. They are not hand-designed, however: the convolution filter kernel weights are learned as part of the training process. Convolutional layers can extract local features because they restrict the receptive fields of the hidden layers to be local.

Figure 2: An example of a Convolutional Neural Network architecture.
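The growth of receptive fields with depth can be made concrete with the standard receptive-field recurrence. For a layer with kernel size k and stride s, the receptive field grows by (k - 1) times the current spacing between outputs, and the spacing is then multiplied by s. The sketch below applies this to four 3×3 convolutions, each followed by 2×2 pooling. That stack mirrors the layer sizes listed later under Recognition Algorithm. It is an illustration, not part of the original work, and the function name `receptive_fields` is hypothetical.

```python
def receptive_fields(layers):
    """Track receptive-field size through a stack of layers.

    layers: list of (name, kernel_size, stride).
    Returns [(name, r, j), ...] where r is the receptive field (in input
    pixels) of one output unit and j is the spacing between adjacent
    output units (in input pixels). Padding does not change r.
    """
    r, j = 1, 1  # one input pixel sees itself; neighbours are 1 pixel apart
    out = []
    for name, k, s in layers:
        r = r + (k - 1) * j  # the kernel spans k units that are j pixels apart
        j = j * s            # striding spreads the outputs further apart
        out.append((name, r, j))
    return out


# Assumed stack: four 3x3 stride-1 convolutions, each followed by
# 2x2 stride-2 (non-overlapping) max pooling.
stack = []
for i in range(1, 5):
    stack.append((f"conv{i}", 3, 1))
    stack.append((f"pool{i}", 2, 2))

for name, r, j in receptive_fields(stack):
    print(f"{name}: receptive field {r}x{r} input pixels")
```

Each unit after the last pooling layer sees a much larger patch of the input image than a first-layer unit does. This mirrors the increasingly large receptive fields of complex cells.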


Layers of a CNN
Complex architectures for classification problems are built by stacking multiple layers of different types in a CNN. Four types of layers are most common: convolution layers, pooling or subsampling layers, nonlinear layers and fully connected layers.

Convolutional Layers: The convolution operation extracts different features of the input. The first convolution layer extracts low-level features such as edges, lines and corners. Higher layers extract higher-level features.
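A minimal sketch of what a single convolution filter computes is given below. It slides a 3×3 kernel over a small image and takes a weighted sum at each position, in "valid" mode (no padding). Like most deep-learning libraries, it computes cross-correlation, so the kernel is not flipped. The vertical-edge kernel is hand-picked purely for illustration; in the CNN described here the kernel weights are learned. The function name `conv2d_valid` is hypothetical.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2D convolution (cross-correlation, no padding).

    image:  H x W list of lists
    kernel: kh x kw list of lists
    Returns an (H - kh + 1) x (W - kw + 1) feature map.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            # Weighted sum over the local receptive field anchored at (i, j)
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


# 4x6 image: dark left half (0), bright right half (1), i.e. a vertical edge
image = [[0, 0, 0, 1, 1, 1] for _ in range(4)]

# Hand-picked vertical-edge detector (illustrative; a CNN learns its kernels)
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

feature_map = conv2d_valid(image, kernel)
for row in feature_map:
    print(row)
```

The response is non-zero only where the kernel overlaps the edge, which is the sense in which an early layer "extracts edges".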
Pooling/Subsampling Layer: The pooling/subsampling layer reduces the resolution of the features, which makes them more robust against noise and distortion. There are two common ways to do pooling: max pooling and average pooling. In both cases, the input is divided into non-overlapping two-dimensional regions. Each region is then replaced by its maximum or its average, respectively.
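The sketch below shows non-overlapping 2×2 pooling (stride 2) in both variants. When a side has odd length, it assumes the leftover row or column is simply dropped, a common default that the report does not specify. The function name `pool2x2` is hypothetical.

```python
def pool2x2(fmap, mode="max"):
    """Non-overlapping 2x2 pooling with stride 2.

    mode: "max" keeps the largest value in each window,
          "avg" keeps the mean of the four values.
    A leftover odd row/column is dropped (assumed behaviour).
    """
    h, w = len(fmap), len(fmap[0])
    out = []
    for i in range(0, h - 1, 2):
        row = []
        for j in range(0, w - 1, 2):
            window = [fmap[i][j], fmap[i][j + 1],
                      fmap[i + 1][j], fmap[i + 1][j + 1]]
            if mode == "max":
                row.append(max(window))
            elif mode == "avg":
                row.append(sum(window) / 4)
            else:
                raise ValueError(f"unknown pooling mode: {mode}")
        out.append(row)
    return out


fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 1, 5, 6],
        [2, 1, 7, 8]]

print(pool2x2(fmap, "max"))
print(pool2x2(fmap, "avg"))
```

Either way, a 4×4 feature map becomes 2×2, halving the resolution along each axis.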
Nonlinear Layers: Neural networks in general, and CNNs in particular, rely on a nonlinear trigger function to signal the distinct identification of likely features on each hidden layer. CNNs may use a variety of specific functions to implement this nonlinear triggering efficiently, such as rectified linear units (ReLUs), sigmoids and tanh functions.
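The three nonlinearities named above can be written in a few lines, as in the sketch below. The sigmoid is written in a numerically stable form (an implementation choice, not something the report specifies), so very negative inputs do not overflow `math.exp`.

```python
import math


def relu(x):
    """Rectified linear unit: passes positive values, zeroes out negatives."""
    return max(0.0, x)


def sigmoid(x):
    """Logistic sigmoid 1 / (1 + e^-x), computed without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so exp(x) is small and cannot overflow
    return z / (1.0 + z)


def tanh(x):
    """Hyperbolic tangent, squashing inputs into (-1, 1)."""
    return math.tanh(x)


for x in [-2.0, 0.0, 2.0]:
    print(f"x={x:+.1f}  relu={relu(x):.3f}  "
          f"sigmoid={sigmoid(x):.3f}  tanh={tanh(x):.3f}")
```

ReLU is unbounded above, the sigmoid squashes into (0, 1) and tanh into (-1, 1). The model in this report uses a sigmoid after its 512-node fully connected layer.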
Fully Connected Layers: Fully connected layers are often used as the final layers of a CNN. These layers compute a weighted sum of the previous layer's features. The weights indicate the precise mix of ingredients that determines a specific target output. In a fully connected layer, every element of every feature in the previous layer is used in the calculation of each element of each output feature.
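A fully connected layer reduces to a matrix-vector product plus a bias, as in the minimal sketch below. Each output y_k = sum_i W[k][i] * x[i] + b[k] uses every input element. The function name `dense` and the small numeric values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def dense(x, weights, bias):
    """Fully connected layer: y[k] = sum_i weights[k][i] * x[i] + bias[k].

    x:       input vector (e.g. a flattened feature map), length n_in
    weights: n_out rows, each of length n_in
    bias:    length n_out
    """
    if len(weights) != len(bias):
        raise ValueError("weights and bias must have the same number of outputs")
    for w_k in weights:
        if len(w_k) != len(x):
            raise ValueError("each weight row must match the input length")
    return [sum(w_ki * x_i for w_ki, x_i in zip(w_k, x)) + b_k
            for w_k, b_k in zip(weights, bias)]


x = [1.0, 2.0, 3.0]            # toy "flattened features"
W = [[0.5, -1.0, 0.0],         # output 0 weights
     [1.0, 1.0, 1.0]]          # output 1 weights
b = [0.5, -1.0]

y = dense(x, W, b)
print(y)
```

Because every input feeds every output, the weight matrix has n_in × n_out entries. This is why fully connected layers usually hold most of a CNN's parameters.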
Recognition Algorithm:

Like many CNN architectures, our model has four convolution layers, each followed by a max pooling layer. The number of filters generally grows with depth. The extracted feature maps are flattened and classified using softmax. The number and size of the filters in each layer are listed below.

Convolutional layer I: 64 filters, each of size 3×3
Max pooling layer I: 2×2, without any overlap
Convolutional layer II: 128 filters, each of size 3×3
Max pooling layer II: 2×2, without any overlap
Convolutional layer III: 256 filters, each of size 3×3
Max pooling layer III: 2×2, without any overlap
Convolutional layer IV: 256 filters, each of size 3×3
Max pooling layer IV: 2×2, without any overlap
Flatten: the outputs are flattened and fed to a fully connected layer with 512 nodes (neurons)
Activation layer: sigmoid activation function
Output layer: 7 nodes
Output activation: softmax function
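The final softmax turns the 7 output-node values into a probability distribution over expressions. The predicted mood is the most probable class. The sketch below uses the usual max-subtraction trick for numerical stability. The label order in `EXPRESSIONS` and the example scores are assumptions for illustration only; the report does not give them.

```python
import math

# Assumed label order (illustrative only; not specified in the report)
EXPRESSIONS = ["angry", "happy", "sad", "disgust", "surprise", "neutral", "fear"]


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores from the 7-node output layer for one test image
logits = [0.2, 2.5, -0.3, -1.0, 0.8, 1.1, -0.5]
probs = softmax(logits)
predicted = EXPRESSIONS[probs.index(max(probs))]

for label, p in zip(EXPRESSIONS, probs):
    print(f"{label:>8}: {p:.3f}")
print("predicted:", predicted)
```

Subtracting the maximum score leaves the result unchanged mathematically but keeps `math.exp` from overflowing on large scores.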

The image database on which we trained our model contains 2400 images of 35 males and 35 females showing 7 facial expressions. Of these 2400 images, we use 1800 for training and 600 for testing. The images in the database are 562×762×3 pixels, which is very large compared with typical CNN input images. As a preprocessing step, every color image is converted to grayscale and resized to 128×128.

Consider an input of size $M_1\times M_2$ and a kernel of size $N_1\times N_2$. The first convolutional layer uses the "full" convolution mode. In this mode the input is zero-padded, so the output of the convolution has size $(M_1+N_1-1)\times(M_2+N_2-1)$. The remaining convolutional layers use the "valid" mode, which applies no zero padding, so their output size is $(M_1-N_1+1)\times(M_2-N_2+1)$.
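Combining the two output-size formulas with 2×2 pooling traces the feature-map size through the network for a 128×128 input. The sketch below makes two assumptions: pooling drops a leftover odd row or column (floor division), and each convolution produces as many channels as it has filters. The helper names `conv_out` and `pool_out` are hypothetical.

```python
def conv_out(m, n, mode):
    """Output length along one axis for input length m and kernel length n."""
    if mode == "full":   # zero-padded: every partial overlap produces an output
        return m + n - 1
    if mode == "valid":  # no padding: the kernel must fit inside the input
        return m - n + 1
    raise ValueError(f"unknown mode: {mode}")


def pool_out(m, p=2):
    """Non-overlapping p x p pooling; a leftover row/column is dropped (assumed)."""
    return m // p


size = 128                                # 128 x 128 grayscale input
modes = ["full", "valid", "valid", "valid"]
filters = [64, 128, 256, 256]

trace = []
for mode, f in zip(modes, filters):
    c = conv_out(size, 3, mode)           # 3x3 convolution
    size = pool_out(c)                    # 2x2 max pooling
    trace.append((c, size, f))
    print(f"conv ({mode}): {c}x{c}x{f} -> pool: {size}x{size}x{f}")

flattened = size * size * filters[-1]
print("flattened feature vector length:", flattened)
```

Under these assumptions the flattened vector fed to the 512-node layer has 6 × 6 × 256 entries.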

The entire model is built using Keras in Python [4].
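Keras's `model.summary()` prints per-layer parameter counts. The sketch below computes what one would expect for this architecture by hand rather than with Keras. It assumes a single grayscale input channel, one bias per filter or node, and the 6×6×256 flattened size obtained by tracing shapes under floor-division pooling. The helper names are hypothetical.

```python
def conv_params(in_ch, out_ch, k=3):
    """One k x k x in_ch kernel plus one bias per output filter."""
    return out_ch * (k * k * in_ch + 1)


def dense_params(n_in, n_out):
    """One weight per input-output pair plus one bias per output node."""
    return n_out * (n_in + 1)


layers = [
    ("conv1 (64 filters)", conv_params(1, 64)),     # grayscale input: 1 channel
    ("conv2 (128 filters)", conv_params(64, 128)),
    ("conv3 (256 filters)", conv_params(128, 256)),
    ("conv4 (256 filters)", conv_params(256, 256)),
    ("dense (512 nodes)", dense_params(6 * 6 * 256, 512)),  # assumed flatten size
    ("output (7 nodes)", dense_params(512, 7)),
]

total = sum(n for _, n in layers)
for name, n in layers:
    print(f"{name:>20}: {n:>9,d}")
print(f"{'total':>20}: {total:>9,d}")
```

Under these assumptions, the 512-node fully connected layer accounts for roughly 83% of all trainable parameters. The four convolutional layers together hold under a million.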

Results:

As a first cut, we trained our model on only the 471 frontal-view images of our dataset. Of these, 371 images were used as the training set and the remaining 100 as the test set. On this test set our model achieved an accuracy of 67.001% within 50 epochs. Training on the entire dataset with five different views (1800 images for training and 600 for testing) is in progress, and its results are not yet available.

Conclusion:

At present we have used a dataset of facial images from one particular region/country to design our CNN-based model. The model could be made more general by training on a dataset with global coverage. In addition, an SVM model that takes the features extracted by the CNN as input may improve the accuracy. The approach could also be extended to a sequence of video frames to provide dynamic classification. This might be useful for measuring the mood of an audience in meetings, for example.

Acknowledgements:

We would like to thank our course instructor, Dr. K. Sri Rama Murty, for suggesting that we use CNN-based classification rather than relying on facial landmarks in this project. We also thank the Emotion Lab at Karolinska for providing free access to their database.

References:

[1] Stanford CS class CS231n: Convolutional Neural Networks for Visual Recognition.
[2] Samer Hijazi, Rishi Kumar, and Chris Rowen. Using Convolutional Neural Networks for Image Recognition.
[3] The Karolinska Directed Emotional Faces. Department of Clinical Neuroscience, Psychology section, Karolinska.
[4] Keras documentation.
