Research article
ARTICLE INFO

Keywords: Pattern recognition; Evolutionary algorithms; Optimization; Machine learning; Supervised learning

ABSTRACT

In pattern recognition, the classification accuracy is strongly correlated with the selected features. Therefore, in the present paper, we apply an evolutionary algorithm in combination with linear discriminant analysis (LDA) to enhance the feature selection in a static image-based facial expression recognition system. The accuracy of the classification depends on how well the features represent the expression, so optimizing the selected features improves the classification accuracy. The proposed method not only improves the classification but also reduces the dimensionality of the features. Our approach outperforms linear-based dimensionality reduction algorithms and other existing genetic-based feature selection algorithms. Further, we compare our approach with the VGG (Visual Geometry Group)-face convolutional neural network (CNN). According to the experimental results, the overall accuracy is 98.67% for both our approach and VGG-face. However, the proposed method outperforms the CNN in terms of training time and feature size. The proposed method is able to achieve high accuracy with far fewer features than the CNN and within a reasonable training time.
⁎ Corresponding author.
E-mail addresses: hadjer@pusan.ac.kr (H. Boubenna), dohoon@pusan.ac.kr (D. Lee).
https://doi.org/10.1016/j.bica.2018.04.008
Received 29 March 2018; Received in revised form 14 April 2018; Accepted 15 April 2018
2212-683X/ © 2018 Elsevier B.V. All rights reserved.
H. Boubenna, D. Lee Biologically Inspired Cognitive Architectures 24 (2018) 70–76
encode important information.

We propose a genetic algorithm (GA) in combination with a linear discriminant analysis (LDA) classifier to evaluate each combination of features. We run the algorithm for a finite number of iterations until it converges to the optimal subset of features, which is later used to classify the emotion.

The rest of the paper is structured as follows: Section “Related work” describes the related work, Section “Framework” gives an overview of the proposed framework, Section “Experiments and analysis” describes the experimental results, and Section “Conclusion” concludes the paper.

Related work

Feature selection is indispensable in any pattern recognition system, since it significantly decreases the dimensionality of the feature vectors. Removing redundant features in turn increases the classification accuracy (Janecek, Gansterer, Demel, & Ecker, 2008).

Principal component analysis (PCA) is one of the most widely used dimensionality reduction techniques. It has been applied in many areas of pattern recognition research, such as face recognition (Abbas, Safi, & Rijab, 2017). In addition to PCA, LDA has also been widely employed to reduce the feature size (Moon, Choi, Kim, & Pan, 2015). LDA enhances class separability by providing the most discriminant projection of the features from a high-dimensional space to a low-dimensional one; however, the performance of LDA drops when the number of samples is smaller than the dimensionality (Fukunaga, 2013).

To address the problem of feature selection optimization, evolutionary algorithms have been widely applied to decrease the feature size and enhance the selection process. The most used algorithms are GA and particle swarm optimization (PSO).

In Zhao, Fu, Ji, Tang, and Zhou (2011), GA is used to select the features and set the parameters of a support vector machine (SVM), both of which significantly influence the classification accuracy. The proposed method was evaluated on several real-world datasets; according to the results, the fusion of GA and SVM improved both the accuracy and the processing time.

In Guo, White, Wang, Li, and Wang (2011), the authors propose GA feature selection (GAFES) for software product lines (SPL). GA is used to minimize or maximize an objective function such as total cost. According to the results, GAFES achieves better results than other feature selection methods, and in less time.

In Huang (2012), a stock selection method using GA and support vector regression (SVR) is proposed. GA is used to optimize the model parameters and select the optimal set of input variables for the SVR model. The proposed GA-SVR significantly outperformed the benchmark.

In Gheyas and Smith (2010), the authors propose SAGA for feature selection, a hybridization of simulated annealing (SA) and GA. SA is good at avoiding local minima, and GA has a high convergence rate. According to the results, the hybrid method outperforms other algorithms on real and synthetic datasets.

In Kabir, Shahjahan, and Murase (2011), the authors propose a hybrid of GA and local search. They embed a local search technique in GA to fine-tune the search. The local search aims to distinguish the nature of the features based on their correlation information. They evaluate their approach on 11 real-world classification datasets; according to the results, their method performs better than other well-known feature selection algorithms.

In Suguna and Thanushkodi (2010), GA is combined with k-nearest neighbors (KNN). Instead of choosing the k neighbors among all the samples, GA is used to select the k neighbors directly, and new samples are classified by computing the distance.

In Sidorov, Brester, Minker, and Semenkin (2014), a self-adaptive multi-objective GA is used for feature selection and a probabilistic neural network for classification. The results show that the proposed approach increased the emotion recognition accuracy by up to 26.08%.

In Ghamisi and Benediktsson (2015), the authors propose an integration of GA and PSO, where the accuracy of an SVM is used as the fitness function. According to results obtained on the Indian Pines hyperspectral dataset, their method can select relevant features and achieve high accuracy within a reasonable time.

In Oreski and Oreski (2014), a hybrid of GA and neural networks (HGA-NN) is proposed for feature selection in credit risk assessment. They use a real-world credit dataset to evaluate the performance of the HGA-NN classifier.

In Li, Wu, Wan, and Zhu (2011), a hybrid GA-SVM is used to select a relevant subset of bands in order to classify hyperspectral images, which can be applied to land cover investigation and target detection.

Datasets in the medical domain often contain many more disease measurements than records. Feature selection is therefore necessary to reduce redundancy and the memory needed to store the measurements.

In Anbarasi, Anupriya, and Iyengar (2010), in order to reduce the number of tests that a patient needs to take, GA is employed to determine which tests contribute most to the diagnosis of heart disease. Classification by clustering, decision tree, and naive Bayes is then applied to predict the diagnosis. In Inbarani, Azar, and Jothi (2014), a hybridization of PSO is proposed: PSO-based quick reduct (PSO-QR) and PSO-based relative reduct (PSO-RR) are used to optimize the selection of features for disease diagnosis.

In addition to GA and PSO, which are widely employed for feature selection, ant colony optimization (ACO) has also been employed to reduce the size of the features and select an optimal subset. In Kashef and Nezamabadi-pour (2015), Advanced Binary ACO (ABACO) is proposed for feature selection. The features form a fully connected graph where each node consists of two sub-nodes, one to select and the other to deselect the feature. In each tour, each ant has to visit all the nodes of the graph. At the end of the tour, every ant holds a vector of zeros and ones, where 1 means select and 0 means deselect the corresponding feature. In Chen, Chen, and Chen (2013), the authors propose an alternative graph for the ants to traverse: for n features, instead of a fully connected graph with O(n²) arcs, they use a directed graph with only O(2n) arcs.

Framework

In this section, we present the proposed framework, which consists of three basic steps. The first is feature extraction, in which each image is transformed into a feature vector that is intended to represent the original face image. The next step is feature selection, which is the main aim of our study: the size of the feature vectors is significantly reduced by eliminating irrelevant features and keeping only a small subset of features that best represents the facial emotion. These vectors are then used in the classification step to separate the feature vectors into the different facial emotion classes. Fig. 1 shows the framework of our approach.

Before extracting the feature vectors, the images go through a preprocessing step in which the region of interest is extracted from the original images: the face area is separated from the background. Extracting the face area is necessary for more accurate emotion detection. The images are then resized to 100 × 100 pixels and normalized to zero mean. The Viola-Jones algorithm is employed to detect and extract the face area (Viola & Jones, 2004). The AdaBoost algorithm used in Viola-Jones face detection constructs a strong classifier by linearly combining weak classifiers. Viola-Jones is a robust and fast face detection technique that can be used in real-time applications. After extracting the face area, we apply the Canny edge detection operator (Canny, 1987).
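The linear combination of weak classifiers used by AdaBoost in Viola-Jones detection can be illustrated with a small sketch. It follows the decision rule given in Viola and Jones (2004), where a window is accepted when the weighted vote of the weak classifiers reaches half of the total weight. The single-feature threshold classifiers, their weights, and the feature values below are invented for illustration and are not taken from this paper.

```python
from typing import Callable, List, Sequence, Tuple

# A weak classifier maps a feature vector to 1 (face) or 0 (non-face).
WeakClassifier = Callable[[Sequence[float]], int]


def make_stump(feature_index: int, threshold: float, polarity: int = 1) -> WeakClassifier:
    """Hypothetical weak classifier: thresholds a single feature value."""
    def stump(x: Sequence[float]) -> int:
        return 1 if polarity * x[feature_index] < polarity * threshold else 0
    return stump


def strong_classify(x: Sequence[float],
                    ensemble: List[Tuple[float, WeakClassifier]]) -> int:
    """Weighted vote: 1 if sum(alpha_t * h_t(x)) >= 0.5 * sum(alpha_t), else 0."""
    total = sum(alpha for alpha, _ in ensemble)
    score = sum(alpha * h(x) for alpha, h in ensemble)
    return 1 if score >= 0.5 * total else 0


# Illustrative ensemble; the weights (alpha) and thresholds are made up.
ensemble = [
    (0.9, make_stump(0, 0.5)),       # votes 1 when x[0] < 0.5
    (0.6, make_stump(1, 0.2, -1)),   # votes 1 when x[1] > 0.2
    (0.4, make_stump(2, 0.7)),       # votes 1 when x[2] < 0.7
]
```

Calling `strong_classify([0.3, 0.5, 0.9], ensemble)` accepts the window because the first two weak classifiers, which carry most of the weight, both vote for it; no single weak classifier can decide the outcome alone with these weights.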
Feature extraction
Table 1 compares the size of the original feature vector with the size of the selected feature subset. Our approach reduces the number of features by about 54%.
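As a rough illustration of wrapper-style GA feature selection, the sketch below evolves binary masks over the feature indices. It is not the authors' implementation: the mask encoding, truncation selection, one-point crossover, bit-flip mutation, the per-feature penalty, and all parameter values are assumptions. To stay self-contained, a nearest-centroid classifier stands in for the LDA classifier, and its training accuracy replaces the loss-based fitness used in the paper.

```python
import random
from typing import List, Sequence

Mask = List[int]  # 1 = keep the feature, 0 = drop it


def centroid_accuracy(X: Sequence[Sequence[float]], y: Sequence[int], mask: Mask) -> float:
    """Training accuracy of a nearest-centroid classifier on the selected
    features (a stand-in for the LDA-based evaluation)."""
    idx = [j for j, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in idx]
    correct = 0
    for x, t in zip(X, y):
        pred = min(centroids,
                   key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(idx, centroids[c])))
        correct += pred == t
    return correct / len(X)


def fitness(X, y, mask: Mask, penalty: float = 0.01) -> float:
    """Accuracy minus a small, assumed penalty per selected feature."""
    return centroid_accuracy(X, y, mask) - penalty * sum(mask)


def ga_select(X, y, pop_size: int = 20, generations: int = 30,
              p_mut: float = 0.1, seed: int = 0) -> Mask:
    """Return the best mask found after a fixed number of generations."""
    rng = random.Random(seed)
    n = len(X[0])  # needs at least 2 features for one-point crossover
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(X, y, m), reverse=True)
        parents = pop[: pop_size // 2]            # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)             # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda m: fitness(X, y, m))


# Toy data: feature 0 separates the two classes, features 1..3 are noise.
data_rng = random.Random(1)
X, y = [], []
for i in range(40):
    label = i % 2
    X.append([label * 4.0 + data_rng.gauss(0, 0.3)]
             + [data_rng.gauss(0, 1) for _ in range(3)])
    y.append(label)

best = ga_select(X, y)
```

On this toy data the selected mask keeps the informative feature, since any mask without it scores far lower. The `pop_size` and `generations` parameters correspond to the number of candidate solutions and the number of generations, the two quantities whose effect on accuracy is studied in the experiments.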
Classification
[Figure residue. Legend: Hinge, Exponential, Binomial deviance, Classification error. Axis labels: Fitness function; Training time (in minutes). Sample image labels: happy, angry, sad, contemptuous.]

• I{x} represents the indicator function.
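The fitness functions named in the figure legend (hinge, exponential, binomial deviance, classification error) are not defined in the surviving text, so the sketch below uses their common textbook forms, written in terms of the margin m = y·f(x) with labels y in {-1, +1}. The factor 2 in the binomial deviance follows the convention of Friedman, Hastie, and Tibshirani (2000); whether the paper uses exactly these forms is an assumption.

```python
import math
from typing import Callable, Sequence


def margin(y: int, score: float) -> float:
    """Classification margin m = y * f(x), with y in {-1, +1}."""
    return y * score


def hinge(m: float) -> float:
    return max(0.0, 1.0 - m)


def exponential(m: float) -> float:
    return math.exp(-m)


def binomial_deviance(m: float) -> float:
    # Assumed convention: log(1 + exp(-2m)).
    return math.log(1.0 + math.exp(-2.0 * m))


def classification_error(m: float) -> float:
    # Indicator function I{m <= 0}: 1 for a wrong (or undecided) prediction.
    return 1.0 if m <= 0 else 0.0


def mean_loss(loss: Callable[[float], float],
              labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average loss over a set of labelled classifier scores."""
    return sum(loss(margin(y, s)) for y, s in zip(labels, scores)) / len(labels)


# Illustrative labels and scores (not data from the paper).
labels = [1, -1, 1, -1]
scores = [0.8, -1.2, -0.3, 0.4]
error_rate = mean_loss(classification_error, labels, scores)
avg_hinge = mean_loss(hinge, labels, scores)
```

Unlike the classification error, the hinge, exponential, and binomial deviance losses also penalize how far a wrong score is from the decision boundary, which gives a search procedure a smoother signal to rank candidate feature subsets.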
Fig. 6. The effect of the number of generations on the classification accuracy.

Fig. 7. The effect of the number of candidate solutions on the classification accuracy.

Fig. 8. Comparison between GA and PCA feature selection.

Fig. 10. Comparison between PHOG and PHOG-LBP features using GA feature selection.

Fig. 11. Comparison between KNN and LDA classifiers using PCA feature selection.

Fig. 12. Comparison between KNN and LDA classifiers using GA feature selection.

features of two different feature extraction techniques. Specifically, it shows which features perform best with GA and which features perform best with PCA.

We have combined PHOG features with local binary pattern (LBP) features.

The results shown in Fig. 9 are based on PCA feature selection and the LDA classifier; combining the features increased the accuracy from 81.33% to 84.66%.

The results shown in Fig. 10 are based on GA feature selection and the LDA classifier; here, combining the features did not improve the accuracy. Moreover, it slows down the training. We can conclude that PCA performs better with PHOG-LBP features than with PHOG features alone, while the combination of features did not affect the performance of GA.

The third experiment compares the LDA and KNN classifiers using either GA or PCA feature selection. As shown in Fig. 11, PCA performs better with the KNN classifier than with LDA. However, Fig. 12 shows that GA performs better with the LDA classifier than with KNN.

In the fourth experiment, we compared our approach with a convolutional neural network (CNN). We used the pre-trained VGG (Visual Geometry Group)-face model, which is based on VGG-Very-Deep-16 (Parkhi, Vedaldi, & Zisserman, 2015). Since our dataset is considerably small, we cannot train a CNN from scratch without risking catastrophic overfitting (Agrawal, Girshick, & Malik, 2014). We therefore used a pre-trained CNN model to extract the features. We chose to extract the features from the FC7 layer, which has high semantic relevance. The output of this layer is a 1D vector of length 4096. We then applied classification using LDA. Table 3 shows the comparison between the proposed method and VGG-face. Fig. 13 compares the accuracy of our method with that of VGG-face over the six facial emotions.

According to the results shown in Table 3, the accuracy of both VGG-face CNN and our approach is 98.67%. However, the proposed method based on GA-LDA achieved this accuracy in less time and with significantly fewer features than the CNN, which reduces the time cost and the memory consumption. Further, while a CNN needs to be trained on a large dataset, our approach based on GA can be trained on a small dataset and still achieve good results. Note that the training time reported for VGG-face CNN is not the time needed to train the CNN from scratch, but the time needed to extract the features using the pre-trained model.

Table 3
Comparison between our approach and CNN.
Approach Comparison criteria
Fig. 13. Comparison between our approach based on GA and VGG-face CNN in terms of accuracy over the six emotions (%).

Table 4
Comparison between our approach and other existing approaches.

Approach                                                          Comparison criteria
                                                                  Features   Accuracy
Our approach                                                      PHOG       98%
LPP (Rui Xiao et al., 2011)                                       LBP        96.57%
Uniform LGBP GA Pareto Random Forest (Fowei Wang et al., 2016)    Gabor      95.5%
LPP (Rui Xiao et al., 2011)                                       Gabor      95.38%
Uniform LGBP GA Pareto SVM (Fowei Wang et al., 2016)              Gabor      93%
LPP (Rui Xiao et al., 2011)                                       IMG        89.35%

In order to compare our approach with other existing approaches, we train the proposed method on the Cohn-Kanade dataset (Kanade, Cohn, & Tian, 2000). According to the results shown in Table 4, our approach achieved the highest accuracy.

Conclusion

In this work, we have proposed an optimization of pattern recognition using one of the most widely used evolutionary algorithms. We used GA with an LDA classifier and hinge loss to evaluate and select the relevant features. Our method efficiently decreased the feature size and improved the classification accuracy within a reasonable time. According to the experimental results, our approach outperformed the PCA-based feature selection approach and other existing algorithms. Further, we compared our method with VGG-face CNN in terms of accuracy, training time, and feature size. Based on the experimental results, the proposed method based on GA-LDA achieved the same accuracy as the CNN in less time and with far fewer features. We can conclude that, although the proposed method based on evolutionary algorithm optimization was trained on a small dataset, it outperforms other well-known dimensionality reduction approaches.

References

Abbas, E. I., Safi, M. E., & Rijab, K. S. (2017). Face recognition rate using different classifier methods based on PCA. In 2017 International Conference on Current Research in Computer Science and Information Technology (ICCIT) (pp. 37–40). IEEE.
Agrawal, P., Girshick, R., & Malik, J. (2014). Analyzing the performance of multilayer neural networks for object recognition. In European conference on computer vision (pp. 329–344). Cham: Springer.
Anbarasi, M., Anupriya, E., & Iyengar, N. C. S. N. (2010). Enhanced prediction of heart disease with feature subset selection using genetic algorithm. International Journal of Engineering Science and Technology, 2(10), 5370–5376.
Bosch, A., Zisserman, A., & Munoz, X. (2007). Representing shape with a spatial pyramid kernel. In Proceedings of the 6th ACM international conference on Image and video retrieval (pp. 401–408). ACM.
Burges, C., Shaked, T., Renshaw, E., Lazier, A., Deeds, M., Hamilton, N., & Hullender, G. (2005). Learning to rank using gradient descent. In Proceedings of the 22nd international conference on machine learning (pp. 89–96). ACM.
Canny, J. (1987). A computational approach to edge detection. In Readings in computer vision (pp. 184–203).
Chen, B., Chen, L., & Chen, Y. (2013). Efficient ant colony optimization for image feature selection. Signal Processing, 93(6), 1566–1576.
Chuang, L. Y., Yang, C. H., & Yang, C. H. (2009). Tabu search and binary particle swarm optimization for feature selection using microarray data. Journal of Computational Biology, 16(12), 1689–1703.
Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1), 21–27.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In IEEE computer society conference on Computer Vision and Pattern Recognition, 2005. CVPR 2005 (Vol. 1, pp. 886–893). IEEE.
Dash, M., & Liu, H. (1997). Feature selection for classification. Intelligent Data Analysis, 1(3), 131–156.
Domingos, P., & Pazzani, M. (1997). On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 29(2–3), 103–130.
Duval, B., Hao, J. K., & Hernandez Hernandez, J. C. (2009). A memetic algorithm for gene selection and molecular classification of cancer. In Proceedings of the 11th annual conference on genetic and evolutionary computation (pp. 201–208). ACM.
Friedman, J., Hastie, T., & Tibshirani, R. (2000). Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). The Annals of Statistics, 28(2), 337–407.
Fukunaga, K. (2013). Introduction to statistical pattern recognition. Academic Press.
Gentile, C., & Warmuth, M. K. (1999). Linear hinge loss and average margin. In Advances in neural information processing systems (pp. 225–231).
Ghamisi, P., & Benediktsson, J. A. (2015). Feature selection based on hybridization of genetic algorithm and particle swarm optimization. IEEE Geoscience and Remote Sensing Letters, 12(2), 309–313.
Gheyas, I. A., & Smith, L. S. (2010). Feature subset selection in large dimensionality domains. Pattern Recognition, 43(1), 5–13.
Guo, J., White, J., Wang, G., Li, J., & Wang, Y. (2011). A genetic algorithm for optimized feature selection with resource constraints in software product lines. Journal of Systems and Software, 84(12), 2208–2221.
Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3(Mar), 1157–1182.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction.
Hernandez, J. C. H., Duval, B., & Hao, J. K. (2007). A genetic embedded approach for gene selection and classification of microarray data. In European conference on evolutionary computation, machine learning and data mining in bioinformatics (pp. 90–101). Berlin, Heidelberg: Springer.
Hsu, H. H., Hsieh, C. W., & Lu, M. D. (2011). Hybrid feature selection by combining filters and wrappers. Expert Systems with Applications, 38(7), 8144–8150.
Huang, C. F. (2012). A hybrid stock selection model using genetic algorithms and support vector regression. Applied Soft Computing, 12(2), 807–818.
Huerta, E. B., Duval, B., & Hao, J. K. (2006). A hybrid GA/SVM approach for gene selection and classification of microarray data. In Workshops on applications of evolutionary computation (pp. 34–44). Berlin, Heidelberg: Springer.
Inbarani, H. H., Azar, A. T., & Jothi, G. (2014). Supervised hybrid feature selection based on PSO and rough sets for medical diagnosis. Computer Methods and Programs in Biomedicine, 113(1), 175–185.
Janecek, A., Gansterer, W., Demel, M., & Ecker, G. (2008). On the relationship between feature selection and classification accuracy. In New challenges for feature selection in data mining and knowledge discovery (pp. 90–105).
Kabir, M. M., Shahjahan, M., & Murase, K. (2011). A new local search based hybrid genetic algorithm for feature selection. Neurocomputing, 74(17), 2914–2928.
Kanade, T., Cohn, J. F., & Tian, Y. (2000). Comprehensive database for facial expression analysis. In Fourth IEEE international conference on automatic face and gesture recognition, 2000. Proceedings (pp. 46–53). IEEE.
Kashef, S., & Nezamabadi-pour, H. (2015). An advanced ACO algorithm for feature subset selection. Neurocomputing, 147, 271–279.
Kohavi, R., & John, G. H. (1997). Wrappers for feature subset selection. Artificial Intelligence, 97(1–2), 273–324.
Langner, O., Dotsch, R., Bijlstra, G., Wigboldus, D. H., Hawk, S. T., & Van Knippenberg, A. D. (2010). Presentation and validation of the Radboud Faces Database. Cognition and Emotion, 24(8), 1377–1388.
Li, S., Wu, H., Wan, D., & Zhu, J. (2011). An effective feature selection method for hyperspectral image classification based on genetic algorithm and support vector machine. Knowledge-Based Systems, 24(1), 40–48.
Liu, H., & Motoda, H. (Eds.). (2007). Computational methods of feature selection. CRC Press.
Moon, H. M., Choi, D., Kim, P., & Pan, S. B. (2015). LDA-based face recognition using multiple distance training face images with low user cooperation. In 2015 IEEE International Conference on Consumer Electronics (ICCE) (pp. 7–8). IEEE.
Oreski, S., & Oreski, G. (2014). Genetic algorithm-based heuristic for feature selection in credit risk assessment. Expert Systems with Applications, 41(4), 2052–2064.
Parkhi, O. M., Vedaldi, A., & Zisserman, A. (2015). Deep face recognition. In BMVC (Vol. 1, No. 3, p. 6).
Roffo, G., & Melzi, S. (2016). Features selection via eigenvector centrality. In Proceedings of New Frontiers in Mining Complex Patterns (NFMCP 2016) (Oct 2016).
Roffo, G., Melzi, S., & Cristani, M. (2015). Infinite feature selection. In Proceedings of the IEEE international conference on computer vision (pp. 4202–4210).
Sharma, A., & Paliwal, K. K. (2008). Cancer classification by gradient LDA technique using microarray gene expression data. Data & Knowledge Engineering, 66(2), 338–347.
Sidorov, M., Brester, C., Minker, W., & Semenkin, E. (2014). Speech-based emotion recognition: feature selection by self-adaptive multi-criteria genetic algorithm. In LREC (pp. 3481–3485).
Suguna, N., & Thanushkodi, K. (2010). An improved k-nearest neighbor classification using genetic algorithm. International Journal of Computer Science Issues, 7(2), 18–21.
Viola, P., & Jones, M. J. (2004). Robust real-time face detection. International Journal of Computer Vision, 57(2), 137–154.
Yang, Y., & Pedersen, J. O. (1997). A comparative study on feature selection in text categorization. In ICML (Vol. 97, pp. 412–420).
Zhang, Y., Dong, Z., Phillips, P., Wang, S., Ji, G., Yang, J., & Yuan, T. F. (2015). Detection of subjects and brain regions related to Alzheimer's disease using 3D MRI scans based on eigenbrain and machine learning. Frontiers in Computational Neuroscience, 9, 66.
Zhang, Y., Wang, S., Phillips, P., & Ji, G. (2014). Binary PSO with mutation operator for feature selection using decision tree applied to spam detection. Knowledge-Based Systems, 64, 22–31.
Zhao, M., Fu, C., Ji, L., Tang, K., & Zhou, M. (2011). Feature selection and parameter optimization for support vector machines: A new approach based on genetic algorithm with feature chromosomes. Expert Systems with Applications, 38(5), 5197–5204.