Image-based emotion recognition using evolutionary algorithms

Hadjer Boubenna, Dohoon Lee
Department of Electrical and Computer Engineering, Pusan National University, Republic of Korea

Research article
Biologically Inspired Cognitive Architectures 24 (2018) 70–76
Keywords: Pattern recognition; Evolutionary algorithms; Optimization; Machine learning; Supervised learning

Abstract

In pattern recognition, classification accuracy is strongly correlated with the selected features. Therefore, in the present paper, we apply an evolutionary algorithm in combination with linear discriminant analysis (LDA) to enhance feature selection in a static image-based facial expression recognition system. The accuracy of the classification depends on how well the features represent the expression, so optimizing the selected features improves the classification accuracy. The proposed method not only improves the classification but also reduces the dimensionality of the features. Our approach outperforms linear dimensionality reduction algorithms and other existing genetic-based feature selection algorithms. Further, we compare our approach with the VGG (Visual Geometry Group)-face convolutional neural network (CNN). According to the experimental results, the overall accuracy is 98.67% for both our approach and VGG-face. However, the proposed method outperforms the CNN in terms of training time and feature size. The proposed method achieves high accuracy using far fewer features than the CNN and within a reasonable training time.
Corresponding author.
E-mail addresses: hadjer@pusan.ac.kr (H. Boubenna), dohoon@pusan.ac.kr (D. Lee).
https://doi.org/10.1016/j.bica.2018.04.008
Received 29 March 2018; Received in revised form 14 April 2018; Accepted 15 April 2018
2212-683X/ © 2018 Elsevier B.V. All rights reserved.

Introduction

Emotion recognition based on static images is one of the hardest pattern recognition problems, because facial expressions can be ambiguous and may have several possible interpretations. Developing an automatic facial expression recognition system is therefore important for the vision community. Facial expression recognition has increasingly been used in different fields such as entertainment, military, medicine, e-learning, monitoring and marketing.

The large size of the feature set is a crucial issue for facial emotion recognition systems and other pattern recognition systems. When the search space is very large, the curse of dimensionality (Gheyas & Smith, 2010) occurs, which lowers the classification accuracy.

Optimizing the selection of features is thus indispensable to overcome this problem and improve the classification accuracy. Feature selection has been widely applied to many machine learning applications, such as classification, regression and clustering (Dash & Liu, 1997). Feature selection techniques can be grouped into four categories: filter (Roffo & Melzi, 2016; Roffo, Melzi, & Cristani, 2015; Zhang et al., 2015), hybrid (Huerta, Duval, & Hao, 2006), wrapper (Chuang, Yang, & Yang, 2009; Zhang, Wang, Phillips, & Ji, 2014) and embedded (Duval, Hao, & Hernandez Hernandez, 2009; Hernandez, Duval, & Hao, 2007) feature selection.

The first category is the wrapper. This method is based on a predictive model, where each new subset is used for training and tested on the hold-out set. The score is then calculated based on the error of the model for the given subset. Wrapper methods are computationally expensive; however, they provide good performance (Kohavi & John, 1997). Instead of using a predictive model, filter-based methods use a proxy measure such as mutual information (Kohavi & John, 1997) or pointwise mutual information (Guyon & Elisseeff, 2003). Filter-based methods are computationally less expensive than wrapper-based methods; however, they provide lower performance than wrappers (Kohavi & John, 1997). The third category is the embedded method, which performs the selection of features as part of the model construction process. Embedded techniques are computationally more complex than filters and less complex than wrappers (Kohavi & John, 1997). A hybrid feature selection method can be a combination of filters and wrappers: first, filters are applied to select the features, and then wrappers are applied to further refine them (Yang & Pedersen, 1997).

Finding an optimal subset of features in a reasonable time (Hsu, Hsieh, & Lu, 2011; Liu & Motoda, 2007) is a hard task; however, approximating the optimal subset can overcome the time complexity. Therefore we suggest involving an optimization technique in the task of feature selection to reduce the search space and improve the classification accuracy. Our objective is to build an automatic emotion recognition system using bio-inspired optimization to decrease the redundancy and keep only the relevant features that
encode important information.

We propose a genetic algorithm (GA) in combination with a linear discriminant analysis (LDA) classifier to evaluate each combination of features. We run the algorithm for a finite number of iterations until it converges to the optimum subset of features, which is later used to classify the emotion.

The rest of the paper is structured as follows: in Section “Related work” we describe the related works. In Section “Framework” we give an overview of the proposed framework. In Section “Experiments and analysis” we describe the experimental results. In Section “Conclusion” we give the conclusion.

Related work

Feature selection is indispensable in any pattern recognition system, because it significantly decreases the dimensionality of the feature vectors. Removing redundant features in turn improves the classification accuracy (Janecek, Gansterer, Demel, & Ecker, 2008).

Principal component analysis (PCA) is one of the most frequently encountered dimensionality reduction techniques. It has been applied in many pattern recognition studies such as face recognition (Abbas, Safi, & Rijab, 2017). In addition to PCA, LDA has also been widely employed to reduce the feature size (Moon, Choi, Kim, & Pan, 2015). LDA enhances the class separability by providing the most discriminant projection of features from a high dimensionality to a low dimensionality; however, the performance of LDA drops when the number of samples is smaller than the dimensionality (Fukunaga, 2013).

To address the problem of feature selection optimization, evolutionary algorithms have been widely applied to decrease the feature size and enhance the selection process. The most used algorithms are GA and particle swarm optimization (PSO).

In Zhao, Fu, Ji, Tang, and Zhou (2011), GA is used to select the features and set the parameters of a support vector machine (SVM), which significantly influence the classification accuracy. The proposed method has been evaluated on several real-world datasets. According to the results, the fusion of GA and SVM improved both the accuracy and the processing time.

In Guo, White, Wang, Li, and Wang (2011), the authors propose GA feature selection (GAFES) for software product lines (SPL). GA is used to minimize or maximize an objective function such as a total cost. According to the results, GAFES can achieve better results than other feature selection methods and in less time.

In Huang (2012), the author proposes a stock selection method using GA and support vector regression (SVR). GA is used to optimize the model parameters and select the optimal set of input variables to the SVR model. The proposed GA-SVR significantly outperformed the benchmark.

In Gheyas and Smith (2010), the authors propose SAGA for feature selection, a hybridization of simulated annealing (SA) and GA. SA is good at avoiding local minima and GA has a high convergence rate. According to the results, the hybrid method outperforms other algorithms on real and synthetic datasets.

In Kabir, Shahjahan, and Murase (2011), the authors propose a hybrid of GA and local search. They embed a local search technique in GA to fine-tune the search. The local search aims to distinguish the nature of features based on their correlation information. They evaluate the performance of their approach on 11 real-world classification datasets. According to the results, their method performs better than other well-known feature selection algorithms.

In Suguna and Thanushkodi (2010), the authors combine GA with KNN. Instead of choosing the k neighbors among all the samples, GA is used to select the k neighbors directly and classify the new samples by computing the distance.

In Sidorov, Brester, Minker, and Semenkin (2014), the authors use a self-adaptive multi-objective GA for feature selection and a probabilistic neural network for classification. The results show that the proposed approach increased the emotion recognition accuracy by up to 26.08%.

In Ghamisi and Benediktsson (2015), the authors propose an integration of GA and PSO, where the accuracy of an SVM is used as the fitness function. According to the results obtained on the Indian Pines hyperspectral dataset, their method can select relevant features and achieve high accuracy within a reasonable time.

In Oreski and Oreski (2014), a hybrid of GA and neural networks (HGA-NN) is proposed as a feature selection method for credit risk assessment. The authors use a real-world credit dataset to evaluate the performance of the HGA-NN classifier.

In Li, Wu, Wan, and Zhu (2011), a hybrid GA-SVM is used to select a relevant subset of bands in order to classify hyperspectral images, which can be applied to land cover investigation and target detection.

Datasets in the medical domain often contain many more disease measurements than records. Feature selection is therefore necessary to reduce redundancy and the memory space used to store the disease measurements.

In Anbarasi, Anupriya, and Iyengar (2010), in order to reduce the number of tests that the patient needs to take, GA is employed to determine which tests contribute most toward the diagnosis of heart diseases. Classification by clustering, decision tree and naive Bayes is then applied to predict the diagnosis. In Inbarani, Azar, and Jothi (2014), a hybridization of PSO is proposed: PSO-based quick reduct (PSO-QR) and PSO-based relative reduct (PSO-RR) are used to optimize the selection of features for disease diagnosis.

In addition to GA and PSO, which are widely employed for feature selection, ant colony optimization (ACO) has also been employed to reduce the size of features and select an optimal subset of features. In Kashef and Nezamabadi-pour (2015), Advanced Binary ACO (ABACO) is proposed for feature selection. The features form a fully connected graph where each node consists of two sub-nodes, one to select and the other to deselect the feature. At each tour, each ant has to visit all the nodes on the graph. At the end of the tour, every ant holds a vector of zeros and ones, where 1 means select and 0 means deselect the corresponding feature. In Chen, Chen, and Chen (2013), the authors propose an alternative graph to be traversed by the ants: for n features, instead of using a fully connected graph with O(n²) arcs, they use a directed graph with only O(2n) arcs.

Framework

In this section, we present the proposed framework, which consists of three basic steps. The first step is feature extraction, in which each image is transformed into a feature vector that is supposed to be representative of the original face image. The next step is feature selection, which is the main aim of our study. In this step, the size of the feature vectors is significantly reduced by eliminating the irrelevant features and keeping only a small subset of features that best represents the face emotion. These vectors are then used in the classification step to segregate the features into different facial emotion classes. Fig. 1 shows the framework of our approach.

Before extracting the feature vectors, the images go through a pre-processing step, in which the region of interest is extracted from the original images: the face area is separated from the background. Extracting the face area is necessary to get more accurate emotion detection. The images are then resized to 100 × 100 pixels and normalized to zero mean. The Viola-Jones algorithm is employed to detect and extract the face area (Viola & Jones, 2004). The AdaBoost algorithm used in Viola-Jones face detection aims to construct a strong classifier by linearly combining weak classifiers. Viola-Jones is a robust and fast face detection technique and can be applied in real-time applications. After extracting the face area we applied the Canny edge detection operator (Canny, 1987).
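To make the phrase "linearly combining weak classifiers" concrete, the following minimal Python sketch shows how a boosted strong classifier takes the sign of a weighted sum of weak-classifier votes. It is an illustration only, not the Viola-Jones detector used in this work: the decision stumps, their thresholds and the weights in `alphas` are hypothetical values chosen for the example, whereas a real detector learns them from Haar-like image features.

```python
from typing import Callable, List, Sequence

# A weak classifier maps a feature vector to a vote of +1 or -1.
WeakClassifier = Callable[[Sequence[float]], int]


def make_stump(feature_index: int, threshold: float) -> WeakClassifier:
    """Build a decision stump: +1 if the chosen feature exceeds the threshold, else -1."""
    def stump(x: Sequence[float]) -> int:
        return 1 if x[feature_index] > threshold else -1
    return stump


def strong_classify(x: Sequence[float],
                    weak_classifiers: List[WeakClassifier],
                    alphas: List[float]) -> int:
    """Return the sign of the weighted (linear) combination of weak votes."""
    score = sum(a * h(x) for a, h in zip(alphas, weak_classifiers))
    return 1 if score >= 0 else -1


# Hypothetical weak classifiers and weights, for illustration only.
stumps = [make_stump(0, 0.5), make_stump(1, 0.2), make_stump(2, 0.7)]
alphas = [0.9, 0.4, 0.3]

if __name__ == "__main__":
    print(strong_classify([0.8, 0.1, 0.1], stumps, alphas))
    print(strong_classify([0.1, 0.9, 0.9], stumps, alphas))
```

In this toy setting a single heavily weighted stump can outvote two lightly weighted ones, which is the sense in which the strong classifier is a linear combination rather than a simple majority vote.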
Fig. 1. The framework of our approach: Input (face image), Pre-processing, Feature extraction, Feature selection, Classification, Output (class emotion).

Feature extraction

In this step, each face image is represented by a vector of features. We employed the pyramid histogram of oriented gradients (PHOG) feature extraction technique proposed in Bosch, Zisserman, and Munoz (2007). PHOG computes the histogram of oriented gradients (HOG) (Dalal & Triggs, 2005) over each sub-region of the image to ensure that images with more edges are not weighted more strongly than images with fewer edges. PHOG takes into account the spatial layout, which is represented as a pyramid structure as shown in Fig. 2. The final PHOG histogram is the concatenation of all the HOG histograms at each pyramid level. The hierarchical structure provides more precision; however, the size of the feature vectors increases with the number of layers.

Fig. 2. PHOG histogram computation.

Feature selection

In this step, the size of the feature vectors is significantly decreased. Feature selection is crucial because it is difficult to find a tradeoff between the size of the features and the accuracy. Therefore we propose a method to optimize the selection using GA, which is among the most widely used evolutionary algorithms. The GA process consists of generating an initial population of solutions and then evaluating each solution using a function called the fitness function. Then, the fitter solutions are randomly selected, and two-point crossover (rate = 0.8) and uniform mutation (rate = 0.01) are applied to build the next generation. The process is repeated for a finite number of iterations until convergence toward the optimal solution is achieved.

In our study, the GA feature selection process is as follows. As shown in Fig. 3, each candidate solution is represented by two vectors whose length is equal to 1360. The first vector is the feature vector representing the face emotion (the feature vector extracted in the previous step) and the second one is a binary vector, where the ith bit in the binary vector corresponds to the ith feature in the feature vector. The binary vector determines which features are selected: when the bit corresponding to a feature is one, that feature is selected by the algorithm; otherwise it is not selected.

Binary vector        0     1     1     0     1     1     0
Feature vector       0.03  0.21  0.09  0.56  0.33  0.45  0.12
Selected features          0.21  0.09        0.33  0.45
Fig. 3. The candidate solution structure.

Then, for each candidate solution, we train the classifier using the selected features and evaluate the solution using the loss function (fitness function). The solutions that achieve high accuracy are then selected stochastically, and their corresponding binary vectors undergo crossover and mutation to build the next generation of solutions. The algorithm is given in pseudo-code as Algorithm 1 (GA selection).

Table 1 compares the size of the original features with the size of the selected features. Our approach decreases the size of the features by about 54%.

Table 1
Comparison between the size of feature vectors before and after applying genetic selection.
Feature vector              Size
Original feature vectors    1360
Reduced vectors by GA       626

Classification

Classification is the last step in the facial emotion recognition process. After selecting the features using GA, the reduced feature vectors and their corresponding labels are fed to the classifier to segregate the features into different facial emotion classes. In this work, we have tried two different classifiers, LDA and KNN, in order to choose which one performs best. The LDA classifier was developed by R. A. Fisher in 1936. This method has been applied widely in pattern recognition and machine learning (Sharma & Paliwal, 2008). LDA aims to enhance the class
separability by maximizing the between-class dispersion and minimizing the within-class dispersion. KNN is widely applied to classification problems. KNN aims to classify a new sample according to its similarity to its closest neighbors (Cover & Hart, 1967).

Experiments and analysis

For the experiments, we adopted the Radboud Faces Database (RaFD) (Langner et al., 2010). As shown in Fig. 4, the samples include 8 expressions. We have used all the frontal images (train-images = 306, test-images = 150).

Fig. 4. RaFD samples: happy, angry, sad, contemptuous, disgusted, neutral, fearful, surprised.

The relationship between the fitness function and the accuracy

In this experiment, we train our method using several fitness functions in order to choose which one performs best. The functions used in this experiment are hinge loss (Gentile & Warmuth, 1999), exponential loss (Friedman, Hastie, & Tibshirani, 2000), binomial deviance (Burges et al., 2005) and classification error (Domingos & Pazzani, 1997). The equations corresponding to each loss function (fitness function) are given below (Hastie, 2008).

Hinge loss is given by (1):

\[ A = \sum_{j=1}^{n} w_j \max\{0,\, 1 - h_j\} \tag{1} \]

Exponential loss is given by (2):

\[ A = \sum_{j=1}^{n} w_j \exp(-h_j) \tag{2} \]

Binomial deviance loss is given by (3):

\[ A = \sum_{j=1}^{n} w_j \log\{1 + \exp[-2 h_j]\} \tag{3} \]

Classification error loss is given by (4):

\[ A = \sum_{j=1}^{n} w_j \, I\{\hat{y}_j \neq y_j\} \tag{4} \]

where
• $A$ represents the weighted average loss.
• $n$ represents the number of samples.
• $w_j$ represents the weight for observation $j$, where $\sum_{j=1}^{n} w_j = 1$.
• $\hat{y}_j$ represents the label of the class with maximum probability.
• $I\{x\}$ represents the indicator function.
• $y_j^{*}$ represents a vector of $L-1$ zeros, with a one in the position of the true observed class $y_j$.
• $f(Z_j)$ represents the length-$L$ vector of class scores for observation $j$ of the predictor data $Z$.
• $h_j = y_j^{*\prime} f(Z_j)$, where $h_j$ represents the classification score predicted for the true class.

We have compared the four fitness functions in terms of classification accuracy using the LDA classifier. Fig. 5 shows the accuracy over the six emotions and the average accuracy corresponding to each fitness function. The average classification accuracy is 98.67% for hinge loss, 97.33% for both binomial deviance and exponential loss, and 96% for classification error. According to the results, training GA feature selection using hinge loss as the fitness function achieved the highest accuracy. Based on these results we have selected the hinge loss function to train our feature selection algorithm.

Fig. 5. Comparison between the four fitness functions (hinge, exponential, binomial deviance, classification error) in terms of classification accuracy.

Further, we have compared the four fitness functions in terms of the time needed to train our algorithm for 50 generations. Table 2 shows the time needed to train our method using the different fitness functions. Based on the results, the time cost is almost equal.

Table 2
Comparison between the four fitness functions in terms of training time.
Fitness function        Training time (in minutes)
Hinge                   30
Exponential             31
Binomial deviance       30
Classification error    30

The choice of GA parameters has a strong impact on the performance of feature selection. Fig. 6 shows the effect of the number of generations on the accuracy. From 20 to 50 generations the accuracy increased from 96.67% to 98.67%, and from 50 to 60 generations it was stable. From 60 to 70 generations the accuracy decreased from 98.67% to 96.67%; we attribute this decrease to over-training of the algorithm.

Fig. 7 shows the effect of the number of candidate solutions on the classification accuracy. Increasing the population size from 50 to 100 solutions increased the accuracy from 95.36% to 98.67%. Using more candidate solutions provides more diversity and increases the chance of finding the optimal solution; however, it may slow the training.

Comparison between GA and PCA feature selection

The first experiment aims to compare the performance of the features selected by GA to those selected by PCA in terms of classification accuracy, using PHOG feature extraction and the LDA classifier. As shown in Fig. 8, the accuracy is 98.67% and 81.33% for GA and PCA respectively. Under the same experimental framework, GA feature selection outperforms PCA-based feature selection. The power of GA lies in its ability to use the feedback from the LDA classifier to evaluate and combine (by applying crossover) the best individuals to build fitter solutions. Moreover, GA reduces the feature size by selecting only the strong features that perform best in the classification.

The second experiment aims to show the effect of combining the
features of two different feature extraction techniques. Specifically, it shows which features perform best with GA and which features perform best with PCA.

Fig. 6. The effect of the number of generations on the classification accuracy.
Fig. 7. The effect of the number of candidate solutions on the classification accuracy.
Fig. 8. Comparison between GA and PCA feature selection.

We have combined PHOG features with local binary pattern (LBP) features.

The results shown in Fig. 9 are based on PCA feature selection and the LDA classifier. The combination of features increased the accuracy from 81.33% to 84.66%.

Fig. 9. Comparison between PHOG and PHOG-LBP features using PCA feature selection.

The results shown in Fig. 10 are based on GA feature selection and the LDA classifier. With GA feature selection, the combination of features did not improve the accuracy. Moreover, it slows down the training. We can conclude that PCA performs better with PHOG-LBP features than with PHOG features, while the combination of features did not affect the performance of GA.

Fig. 10. Comparison between PHOG and PHOG-LBP features using GA feature selection.

The third experiment aims to compare the LDA and KNN classifiers using either GA or PCA feature selection. As shown in Fig. 11, PCA performs better with the KNN classifier than with LDA. However, Fig. 12 shows that GA performs better with the LDA classifier than with KNN.

Fig. 11. Comparison between KNN and LDA classifiers using PCA feature selection.
Fig. 12. Comparison between KNN and LDA classifiers using GA feature selection.

In the fourth experiment, we have compared our approach with a convolutional neural network (CNN). We have used the pre-trained VGG (Visual Geometry Group)-face model, which is implemented based on VGG-Very-Deep-16 (Parkhi, Vedaldi, & Zisserman, 2015). Since we have a considerably small dataset, we do not train the CNN from scratch, in order to avoid catastrophic overfitting (Agrawal, Girshick, & Malik, 2014). Therefore we preferred to use a pre-trained CNN model to extract the features. We chose to extract the features from the FC7 layer, which has a high semantic relevance. The output of this layer is a 1D vector whose length is equal to 4096. We then applied classification using LDA. Table 3 shows the comparison results between the proposed method and VGG-face. Fig. 13 compares the accuracy of our method to VGG-face over the six facial emotions.

Table 3
Comparison between our approach and CNN.
Approach        Accuracy (%)    Size of feature vectors    Training time (in min)
Our approach    98.67           626                        30
VGG-face CNN    98.67           4096                       34

According to the results shown in Table 3, the accuracy for both VGG-face CNN and our approach is 98.67%. However, the proposed method based on GA-LDA achieved this accuracy in less time and with significantly fewer features than the CNN, which improves the time cost and the memory consumption. Further, while a CNN needs to be trained on a big dataset, our approach based on GA can be trained on a small dataset and still achieve good results. Note that the training time reported for VGG-face CNN is not the time needed to train the CNN from scratch; it is the time needed to extract the features using the pre-trained model.
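The GA-based selection loop evaluated in these experiments (a binary mask per candidate, a loss-based fitness, stochastic selection, two-point crossover at rate 0.8 and mutation at rate 0.01) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: a nearest-centroid classifier stands in for LDA, the fitness is an equally weighted hinge loss computed on the training data itself, selection is a binary tournament, mutation is a per-bit flip, elitism is added so the best loss never worsens, and `make_toy_data` generates hypothetical two-class data with two informative features plus noise.

```python
import random
from typing import List, Sequence, Tuple

Mask = List[int]


def centroid(rows: List[List[float]]) -> List[float]:
    """Column-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def hinge_fitness(mask: Mask, X: List[List[float]], y: List[int]) -> float:
    """Average hinge loss of a nearest-centroid stand-in classifier (lower is better)."""
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:
        return float("inf")  # an empty feature subset cannot classify anything
    Xs = [[row[i] for i in idx] for row in X]
    c0 = centroid([r for r, label in zip(Xs, y) if label == 0])
    c1 = centroid([r for r, label in zip(Xs, y) if label == 1])
    loss = 0.0
    for r, label in zip(Xs, y):
        own, other = (c0, c1) if label == 0 else (c1, c0)
        h = sq_dist(r, other) - sq_dist(r, own)  # score margin for the true class
        loss += max(0.0, 1.0 - h) / len(Xs)      # equal weights w_j = 1/n
    return loss


def two_point_crossover(a: Mask, b: Mask, rng: random.Random) -> Tuple[Mask, Mask]:
    """Swap the segment between two random cut points."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def mutate(mask: Mask, rate: float, rng: random.Random) -> Mask:
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in mask]


def ga_select(X, y, pop_size=20, generations=30, cx_rate=0.8, mut_rate=0.01, seed=0):
    rng = random.Random(seed)
    n = len(X[0])
    # Start from the full feature set plus random masks.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    best = min(pop, key=lambda m: hinge_fitness(m, X, y))

    def pick() -> Mask:
        # Binary tournament: a stochastic choice biased toward fitter masks.
        a, b = rng.sample(pop, 2)
        return a if hinge_fitness(a, X, y) <= hinge_fitness(b, X, y) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask over unchanged
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            if rng.random() < cx_rate:
                p1, p2 = two_point_crossover(p1, p2, rng)
            children += [mutate(p1, mut_rate, rng), mutate(p2, mut_rate, rng)]
        pop = children[:pop_size]
        best = min(pop, key=lambda m: hinge_fitness(m, X, y))
    return best, hinge_fitness(best, X, y)


def make_toy_data(n_samples=40, n_noise=6, seed=1):
    """Hypothetical data: features 0 and 1 are informative, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for k in range(n_samples):
        label = k % 2
        informative = [label * 3.0 + rng.gauss(0, 0.3), -label * 3.0 + rng.gauss(0, 0.3)]
        X.append(informative + [rng.gauss(0, 2.0) for _ in range(n_noise)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    mask, loss = ga_select(X, y)
    print("selected mask:", mask, "hinge loss:", loss)
```

A faithful wrapper would train the classifier on a training split and compute the loss on held-out data; evaluating on the training data here only keeps the sketch short.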
Learning to rank using gradient descent. Proceedings of the 22nd international conference on
100 machine learning (pp. 89–96). ACM.
Canny, J. (1987). A computational approach to edge detection. In Readings in computer vision
Accuracy(%)
(pp. 184–203).
95 Chen, B., Chen, L., & Chen, Y. (2013). Efficient ant colony optimization for image feature se-
lection. Signal Processing, 93(6), 1566–1576.
GA
Chuang, L. Y., Yang, C. H., & Yang, C. H. (2009). Tabu search and binary particle swarm op-
90
timization for feature selection using microarray data. Journal of Computational Biology,
CNN 16(12), 1689–1703.
85 Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. IEEE Transactions on
Information Theory, 13(1), 21–27.
Dalal, N. & Triggs, B. (2005). Histograms of oriented gradients for human detection. In IEEE
computer society conference on Computer Vision and Pattern Recognition, 2005. CVPR 2005
(Vol. 1, pp. 886–893). IEEE.
Dash, M., & Liu, H. (1997). Feature selection for classification. Intelligent Data Analysis, 1(3),
Fig. 13. Comparison between our approach based on GA and VGG-face CNN in 131–156.
Domingos, P., & Pazzani, M. (1997). On the optimality of the simple Bayesian classifier under
terms of accuracy over the six emotions (%). zero-one loss. Machine Learning, 29(2–3), 103–130.
Duval, B., Hao, J. K., & Hernandez Hernandez, J. C. (2009). A memetic algorithm for gene
selection and molecular classification of cancer. Proceedings of the 11th annual conference on
genetic and evolutionary computation (pp. 201–208). ACM.
Table 4 Friedman, J., Hastie, T., & Tibshirani, R. (2000). Additive logistic regression: a statistical view
Comparison between our approach and other existing approaches. of boosting (with discussion and a rejoinder by the authors). The Annals of Statistics, 28(2),
337–407.
Approach Comparison criteria Fukunaga, K. (2013). Introduction to statistical pattern recognition. Academic Press.
Gentile, C. & Warmuth, M. K. (1999). Linear hinge loss and average margin. In Advances in
Features Accuracy neural information processing systems (pp. 225–231).
Ghamisi, P., & Benediktsson, J. A. (2015). Feature selection based on hybridization of genetic
algorithm and particle swarm optimization. IEEE Geoscience and Remote Sensing Letters,
Our approach PHOG 98%
12(2), 309–313.
LPP (Rui Xiao et al., 2011) LBP 96.57% Gheyas, I. A., & Smith, L. S. (2010). Feature subset selection in large dimensionality domains.
Uniform LGBP GA Pareto Random Forest (Fowei Wang Gabor 95.5% Pattern Recognition, 43(1), 5–13.
et al.,2016) Guo, J., White, J., Wang, G., Li, J., & Wang, Y. (2011). A genetic algorithm for optimized feature
LPP (Rui Xiao et al., 2011) Gabor 95.38% selection with resource constraints in software product lines. Journal of Systems and
Uniform LGBP GA Pareto SVM (Fowei Wang et al.,2016) Gabor 93% Software, 84(12), 2208–2221.
Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of
LPP (Rui Xiao et al., 2011) IMG 89.35%
Machine Learning Research, 3(Mar), 1157–1182.
Hastie, T. (2008). Tibshirani, r. and Friedman, J. (2009): The elements of statistical learning.
Data mining, inference, and prediction.
Hernandez, J. C. H., Duval, B., & Hao, J. K. (2007). A genetic embedded approach for gene
selection and classification of microarray data. European conference on evolutionary com-
In order to compare our approach to other existing approaches, we putation, machine learning and data mining in bioinformatics (pp. 90–101). Berlin, Heidelberg:
Springer.
train the proposed method on Cohn-Kanade (Kanade, Cohn, & Tian, Hsu, H. H., Hsieh, C. W., & Lu, M. D. (2011). Hybrid feature selection by combining filters and
2000) dataset. According to the results shown in Table 4, our approach wrappers. Expert Systems with Applications, 38(7), 8144–8150.
Huang, C. F. (2012). A hybrid stock selection model using genetic algorithms and support
has achieved the highest accuracy.
vector regression. Applied Soft Computing, 12(2), 807–818.
Huerta, E. B., Duval, B., & Hao, J. K. (2006). A hybrid GA/SVM approach for gene selection and
Conclusion classification of microarray data. Workshops on applications of evolutionary computation (pp.
34–44). Berlin, Heidelberg: Springer.
Inbarani, H. H., Azar, A. T., & Jothi, G. (2014). Supervised hybrid feature selection based on
In this work, we have proposed an optimization of pattern re- PSO and rough sets for medical diagnosis. Computer methods and programs in biomedicine,
113(1), 175–185.
cognition using one of the most widely used evolutionary algorithms.
We used a GA whose fitness is computed from an LDA classifier and the hinge loss to evaluate and
select the relevant features. Our method reduced the size of the feature vector and improved the
classification accuracy within a reasonable training time. According to the experimental results,
our approach outperformed the PCA-based feature selection approach and other existing
algorithms. Further, we compared our method with the VGG-face CNN in terms of accuracy,
training time, and feature size. Based on the experimental results, the proposed GA-LDA method
achieved the same accuracy as the CNN (98.67%) in less training time and with far fewer features.
We can conclude that, although the proposed method based on evolutionary algorithm
optimization was trained on a small dataset, it outperforms other well-known dimensionality
reduction approaches.
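The selection loop summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a two-class problem, a hand-written LDA with a pooled covariance matrix, a plain mean hinge loss on a hold-out split, and arbitrary GA settings (tournament selection, single-point crossover, bit-flip mutation, elitism). All function names, parameter values, and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_lda(X, y):
    """Two-class LDA with a pooled covariance; returns (w, b) of the linear score."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance, lightly regularised so it stays invertible.
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    Sw = Sw / (len(X) - 2) + 1e-6 * np.eye(X.shape[1])
    w = np.linalg.solve(Sw, mu1 - mu0)
    b = -0.5 * w @ (mu0 + mu1)  # equal class priors assumed
    return w, b

def hinge_loss(scores, y):
    """Mean hinge loss, with labels mapped from {0, 1} to {-1, +1}."""
    t = 2 * y - 1
    return float(np.mean(np.maximum(0.0, 1.0 - t * scores)))

def fitness(mask, X_tr, y_tr, X_va, y_va):
    """Hold-out hinge loss of LDA trained on the masked features (lower is better)."""
    cols = mask.astype(bool)
    if not cols.any():
        return np.inf  # an empty feature subset cannot be evaluated
    w, b = fit_lda(X_tr[:, cols], y_tr)
    return hinge_loss(X_va[:, cols] @ w + b, y_va)

def ga_select(X_tr, y_tr, X_va, y_va, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Search binary feature masks with a simple GA; needs at least two features."""
    rng = np.random.default_rng(seed)
    n = X_tr.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, n))
    # Start from the full feature set so the result is never worse than no selection.
    best_mask = np.ones(n, dtype=int)
    best_loss = fitness(best_mask, X_tr, y_tr, X_va, y_va)
    for _ in range(generations):
        losses = np.array([fitness(m, X_tr, y_tr, X_va, y_va) for m in pop])
        i = int(np.argmin(losses))
        if losses[i] < best_loss:
            best_mask, best_loss = pop[i].copy(), float(losses[i])
        # Binary tournament selection: the lower loss of two random individuals wins.
        a = rng.integers(0, pop_size, size=pop_size)
        c = rng.integers(0, pop_size, size=pop_size)
        parents = pop[np.where(losses[a] <= losses[c], a, c)]
        # Single-point crossover on consecutive pairs of parents.
        children = parents.copy()
        for k in range(0, pop_size - 1, 2):
            cut = int(rng.integers(1, n))
            children[k, cut:] = parents[k + 1, cut:]
            children[k + 1, cut:] = parents[k, cut:]
        # Bit-flip mutation, then elitism: keep the best mask found so far.
        flips = rng.random(children.shape) < p_mut
        pop = np.where(flips, 1 - children, children)
        pop[0] = best_mask
    return best_mask.astype(bool), best_loss

# Synthetic data (an assumption for the demo): features 0 and 3 carry the class signal.
data_rng = np.random.default_rng(1)
y = data_rng.integers(0, 2, size=200)
X = data_rng.normal(size=(200, 10))
X[:, 0] += 3.0 * y
X[:, 3] -= 3.0 * y

mask, loss = ga_select(X[:120], y[:120], X[120:], y[120:])
print("selected features:", np.flatnonzero(mask), "hold-out hinge loss:", loss)
```

Each chromosome is a binary vector with one bit per feature, so the number of set bits in the returned mask gives the reduced feature size directly. Because the search starts from the full feature set and keeps the best mask seen, the reported hold-out loss cannot exceed that of using all features.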
Moon, H. M., Choi, D., Kim, P., & Pan, S. B. (2015). LDA-based face recognition using multiple
distance training face images with low user cooperation. In 2015 IEEE International
References Conference on Consumer Electronics (ICCE) (pp. 7–8). IEEE.
Oreski, S., & Oreski, G. (2014). Genetic algorithm-based heuristic for feature selection in credit
risk assessment. Expert Systems with Applications, 41(4), 2052–2064.
Abbas, E. I., Safi, M. E., & Rijab, K. S. (2017). Face recognition rate using different classifier
Parkhi, O. M., Vedaldi, A., & Zisserman, A. (2015). Deep face recognition. In BMVC (Vol. 1, No.
methods based on PCA. In 2017 International Conference on Current Research in Computer
3, p. 6).
Science and Information Technology (ICCIT) (pp. 37–40). IEEE.
Roffo, G. & Melzi, S. (2016). Features selection via eigenvector centrality. In Proceedings of New
Agrawal, P., Girshick, R., & Malik, J. (2014). Analyzing the performance of multilayer neural
Frontiers in Mining Complex Patterns (NFMCP 2016) (Oct 2016).
networks for object recognition. In European conference on computer vision (pp. 329–344).
Roffo, G., Melzi, S., & Cristani, M. (2015). Infinite feature selection. In Proceedings of the IEEE
Cham: Springer.
international conference on computer vision (pp. 4202–4210).
Anbarasi, M., Anupriya, E., & Iyengar, N. C. S. N. (2010). Enhanced prediction of heart disease
Sharma, A., & Paliwal, K. K. (2008). Cancer classification by gradient LDA technique using
with feature subset selection using genetic algorithm. International Journal of Engineering
microarray gene expression data. Data & Knowledge Engineering, 66(2), 338–347.
Science and Technology, 2(10), 5370–5376.
Sidorov, M., Brester, C., Minker, W., & Semenkin, E. (2014). Speech-based emotion recognition:
Bosch, A., Zisserman, A., & Munoz, X. (2007). Representing shape with a spatial pyramid
feature selection by self-adaptive multi-criteria genetic algorithm. In LREC (pp.
kernel. In Proceedings of the 6th ACM international conference on Image and video retrieval (pp.
3481–3485).
401–408). ACM.
Suguna, N., & Thanushkodi, K. (2010). An improved k-nearest neighbor classification using
Burges, C., Shaked, T., Renshaw, E., Lazier, A., Deeds, M., Hamilton, N., & Hullender, G. (2005).
genetic algorithm. International Journal of Computer Science Issues, 7(2), 18–21.
75
Viola, P., & Jones, M. J. (2004). Robust real-time face detection. International Journal of Zhang, Y., Wang, S., Phillips, P., & Ji, G. (2014). Binary PSO with mutation operator for feature
Computer Vision, 57(2), 137–154. selection using decision tree applied to spam detection. Knowledge-Based Systems, 64,
Yang, Y. & Pedersen, J. O. (1997). A comparative study on feature selection in text categor- 22–31.
ization. In Icml (Vol. 97, pp. 412–420). Zhao, M., Fu, C., Ji, L., Tang, K., & Zhou, M. (2011). Feature selection and parameter optimi-
Zhang, Y., Dong, Z., Phillips, P., Wang, S., Ji, G., Yang, J., & Yuan, T. F. (2015). Detection of zation for support vector machines: A new approach based on genetic algorithm with
subjects and brain regions related to Alzheimer's disease using 3D MRI scans based on feature chromosomes. Expert Systems with Applications, 38(5), 5197–5204.
eigenbrain and machine learning. Frontiers in Computational Neuroscience, 9, 66.
76
