Abstract— Facial recognition, one of the basic topics in computer vision and pattern recognition, has received substantial attention in recent years. However, for those traditional facial recognition algorithms, the facial images are reshaped to a long vector, thereby losing part of the original spatial constraints of each pixel. In this paper, a new tensor-based feature extraction algorithm termed tensor rank preserving discriminant analysis (TRPDA) for facial image recognition is proposed; the proposed method involves two stages: in the first stage, the low-dimensional tensor subspace of the original input tensor samples is obtained; in the second stage, discriminative locality alignment is utilized to obtain the ultimate vector feature representation for subsequent facial recognition. On the one hand, the proposed TRPDA algorithm fully utilizes the natural structure of the input samples, and it applies an optimization criterion that can directly handle the tensor spectral analysis problem, thereby decreasing the computation cost compared with traditional tensor-based feature selection algorithms. On the other hand, the proposed TRPDA algorithm extracts features by finding a tensor subspace that preserves most of the rank order information of the intra-class input samples. Experiments on three facial databases are performed to demonstrate the effectiveness of the proposed TRPDA algorithm.

Index Terms— Tensor representation, rank preserving, face recognition, discriminant analysis.

I. INTRODUCTION

… intelligent algorithms for face recognition results in efficiency improvements, innovations, and cost savings in several areas.

In general, the basic steps for facial recognition involve facial detection, handcrafted-based feature extraction, subspace-based feature extraction and classification [14]. At the facial detection and handcrafted-based feature extraction stage [27], local features encode features on interest regions [31]. Biswas et al. [2] used Scale Invariant Feature Transform (SIFT) features [6] to describe each landmark and combined the SIFT features of all landmarks to represent a face. The SIFT feature was popularized in the computer vision field and was initially designed for recognizing the identical object under different conditions. Because of its high discriminative power, the SIFT feature is widely adopted for facial recognition [5]. Chen et al. [3] obtained multi-scale Local Binary Pattern (LBP) [15] features from 27 landmarks, where each of the 27 landmarks defines a patch. For all patches, the LBP features are concatenated into a long feature vector as the pose feature. LBP features were originally proposed for texture classification; for an image, the values of the LBP features are determined by its local geometric structure, based on a non-parametric method. The LBP features have been widely used in image description [33].
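As a toy illustration of the basic LBP idea described above (not the multi-scale variant used in [3]), the 8-neighbor LBP code of a single pixel thresholds each neighbor against the center and packs the bits into one byte; the array values and function name here are illustrative only:

```python
import numpy as np

def lbp_code(img, r, c):
    """Basic 8-neighbor LBP: threshold each neighbor against the
    center pixel and pack the resulting bits into one byte."""
    center = img[r, c]
    # clockwise neighbor offsets starting at the top-left pixel
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr, c + dc] >= center:
            code |= 1 << bit
    return code

img = np.array([[6, 5, 2],
                [7, 6, 1],
                [9, 8, 7]])
print(lbp_code(img, 1, 1))
```

A histogram of such codes over an image region then serves as the non-parametric texture descriptor mentioned above.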
… selection algorithm that maximizes the mutual information between primitive high-dimensional Gaussian distributed samples and projected low-dimensional samples. However, as for LPP, PCA does not consider the label information; thus, we generally do not use it to classify directly. As a variant of MDS [7], ISOMAP preserves the global geodesic distances of all pair-wise samples. Representative conventional supervised feature selection algorithms include linear discriminant analysis (LDA) [8], Marginal Fisher's Analysis (MFA) [37] and discriminative locality alignment (DLA) [42]. LDA is one of the widely used globally supervised linear feature selection algorithms; it not only maximizes the determinant of the between-class scatter matrix but also minimizes the determinant of the within-class scatter matrix of the low-dimensional projected samples. Although LDA has extensive applications in pattern classification tasks [4], it often requires a mass of training samples to obtain a good model approximation, which is known as the small sample size (SSS) problem [32]. MFA is a popular supervised manifold learning based linear feature selection algorithm, which builds a penalty graph with the inter-class marginal samples to keep the separability of the inter-class samples. However, MFA ignores the discriminative information of non-marginal samples and faces the ill-posed problem. DLA is a popular supervised manifold learning based linear feature selection algorithm. DLA utilizes the following classification optimization criteria to preserve the discriminative information in a local patch: the distance between the intra-class output samples should be as small as possible, and the distance between the inter-class output samples should be as large as possible. In addition, DLA combines all the optimal weighted parts to form a global subspace structure. However, DLA does not preserve the rank order information of the intra-class samples.

For vector feature representation, the key shortcoming is that this scheme loses part of the original spatial constraints of each pixel in the face images, which hinders the subsequent algorithm from constructing the optimal classification model. To solve the aforementioned difficulty, some researchers propose to use tensor representation rather than vector representation as the input sample [41], [36]. Two-dimensional PCA (2DPCA) [38], an unsupervised algorithm, projects an image matrix to a low-dimensional matrix by a linear transformation in the 2-mode while maximizing the mutual information. Thus, the linear transformation in the 1-mode is ignored, resulting in poor performance. Multi-linear PCA (MPCA) [21] is an unsupervised algorithm whose input samples can be vectors, matrices, or higher-order tensors; MPCA captures most of the original structure of the input sample. The disadvantage of MPCA is that the class label information is not utilized; hence, it is not optimal for classification tasks. Two-dimensional LDA (2DLDA) [39] is a supervised algorithm that projects an image matrix to a low-dimensional matrix by linear transformations in the 1-mode and 2-mode simultaneously. The advantage of 2DLDA is that it maximizes the determinant of the between-class scatter matrix and minimizes the determinant of the within-class scatter matrix; moreover, 2DLDA preserves the original matrix structure of the data. However, 2DLDA does not consider the manifold structure of the data. Tensor discriminative locality alignment (TDLA) [41], a supervised algorithm, is a tensor generalization of DLA where the optimal solution is obtained by optimizing each mode of the input samples. The disadvantage of TDLA is that its computation is expensive and it does not consider the rank order information of the intra-class samples in a patch.

Lim [17] proposed a theory of singular values and singular vectors for tensors based on a constrained variational approach quite similar to the Rayleigh quotient for symmetric matrix eigenvalues. These notions are particularly useful in generalizing certain areas where the spectral analysis of matrices has traditionally played an important role. Thus, the traditional spectral analysis based feature selection algorithm can be generalized to tensor spectral analysis.

In this paper, we present a new feature selection algorithm called Tensor Rank Preserving Discriminant Analysis (TRPDA) that aims to directly approach the optimization criterion. TRPDA differs from the aforementioned tensor-based feature selection algorithms, which iteratively approach the optimization criterion and finally return to the traditional spectral analysis problem. TRPDA applies an optimization criterion that can directly solve the tensor spectral analysis problem. The main contributions of the proposed TRPDA algorithm are summarized as follows:

1) We represent the facial image as a 2-order tensor, so its data structure is preserved. Based on the 2-order tensor representation, the first step of the proposed TRPDA algorithm is to extract tensor features by finding a tensor subspace that preserves most of the rank order information of the intra-class input samples.

2) Following the first step of the TRPDA algorithm, we vectorize the redefined tensor features. Next, discriminative locality alignment is adopted to obtain the final vector feature representation, by which the recognition rate is improved.

The rest of the paper is organized as follows. Section II introduces a brief description of related tensor algebra. Next, the proposed TRPDA algorithm is presented in detail in Section III. Finally, Section IV reports the experimental details and discussions, followed by the conclusion in Section V.

II. TENSOR ALGEBRA

Let $\mathbb{R}$ denote the set of all real numbers. Tensors [18] are multidimensional arrays of numbers that transform linearly under coordinate transformations. We call $\mathcal{X} = (X_{n_1,n_2,\ldots,n_M}) \in \mathbb{R}^{N_1 \times N_2 \times \cdots \times N_M}$ a real tensor of order $M$ if $X_{n_1,n_2,\ldots,n_M} \in \mathbb{R}$, where $1 \le n_i \le N_i$ and $1 \le i \le M$. We briefly introduce the following relevant definitions in multi-linear algebra.

Definition 1 (Tensor Product): Let $\mathcal{A} \in \mathbb{R}^{N_1 \times N_2 \times \cdots \times N_M}$ and $\mathcal{B} \in \mathbb{R}^{N'_1 \times N'_2 \times \cdots \times N'_{M'}}$ be tensors of order $M$ and $M'$. In this case, the tensor product of $\mathcal{A}$ and $\mathcal{B}$ is a tensor of order $M + M'$, denoted $\mathcal{A} \otimes \mathcal{B} \in \mathbb{R}^{N_1 \times \cdots \times N_M \times N'_1 \times \cdots \times N'_{M'}}$, with its $(n_1, \ldots, n_M, n'_1, \ldots, n'_{M'})$-entry given by

$$(\mathcal{A} \otimes \mathcal{B})_{n_1,\ldots,n_M,n'_1,\ldots,n'_{M'}} = A_{n_1,\ldots,n_M}\, B_{n'_1,\ldots,n'_{M'}},$$

for all index values.
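The tensor product of Definition 1 is the familiar outer product; a minimal NumPy sketch (array values illustrative) is:

```python
import numpy as np

# A is a 2-order tensor (matrix), B is a 1-order tensor (vector)
A = np.arange(6).reshape(2, 3)        # order M = 2, shape (2, 3)
B = np.array([1.0, 10.0])             # order M' = 1, shape (2,)

# Tensor product: order M + M' = 3; with axes=0, tensordot forms
# the outer product, so entry (n1, n2, n1') = A[n1, n2] * B[n1']
C = np.tensordot(A, B, axes=0)

print(C.shape)                        # (2, 3, 2)
print(C[1, 2, 1] == A[1, 2] * B[1])   # True
```

The same result is obtained with `np.multiply.outer(A, B)`; both realize the entrywise definition above.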
TAO et al.: TRPDA FOR FACIAL RECOGNITION 327
For brevity, the sequence of mode products is written as

$$\mathcal{A} \times_1 U_1 \times_2 U_2 \times \cdots \times_M U_M = \mathcal{A} \prod_{i=1}^{M} \times_i U_i.$$

Definition 4 (Frobenius Norm): The Frobenius norm of a tensor $\mathcal{A} \in \mathbb{R}^{N_1 \times N_2 \times \cdots \times N_M}$ is given by

$$\|\mathcal{A}\|_F = \sqrt{\sum_{n_1=1}^{N_1} \cdots \sum_{n_M=1}^{N_M} A_{n_1,\ldots,n_M}^2}.$$

We unify the above by proposing a penalty factor $\omega_i$ and a combination factor $\alpha$:

$$\arg\min_{\mathbf{y}_i} \left( \sum_{j=1}^{k_1} \left\| \mathbf{y}_i - \mathbf{y}_{i_j} \right\|^2 (\omega_i)_j - \alpha \sum_{j=1}^{k_2} \left\| \mathbf{y}_i - \mathbf{y}_i^{\,j} \right\|^2 \right), \quad (4)$$

where …
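A minimal sketch of evaluating the part objective of Eq. (4) for one low-dimensional sample follows; the neighbor sets, penalty weights $\omega_i$, and trade-off $\alpha$ are illustrative assumptions, not values from the paper:

```python
import numpy as np

def part_objective(y_i, intra, inter, omega, alpha):
    """Eq. (4): weighted sum of squared distances to the k1 intra-class
    neighbors minus alpha times the sum of squared distances to the
    k2 inter-class neighbors (smaller is better)."""
    intra_term = sum(w * np.sum((y_i - y_j) ** 2)
                     for w, y_j in zip(omega, intra))
    inter_term = sum(np.sum((y_i - y_j) ** 2) for y_j in inter)
    return intra_term - alpha * inter_term

y_i = np.array([0.0, 0.0])
intra = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]   # k1 = 2 neighbors
inter = [np.array([3.0, 4.0])]                          # k2 = 1 neighbor
omega = [1.0, 0.5]                                      # penalty factors
print(part_objective(y_i, intra, inter, omega, alpha=0.1))
```

Minimizing this quantity pulls intra-class neighbors toward $\mathbf{y}_i$ while pushing inter-class neighbors away, which is the patch-level criterion DLA-style methods optimize.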
TABLE I: A GENERAL FRAMEWORK OF TRPDA

Fig. 4. The average recognition rate versus the subspace dimension on the sets of UMIST for six vector-based methods and the TRPDA method: (a) three samples for training; (b) five samples for training; (c) seven samples for training.

Fig. 5. The average recognition rate versus the subspace dimension on the sets of UMIST for six tensor-based methods: (a) three samples for training; (b) five samples for training; (c) seven samples for training.

Fig. 6. The boxplots on UMIST for seven methods, from left to right: LDA, PCA, ISOMAP, DLA, MFA, LPP and TRPDA: (a) three samples for training; (b) five samples for training; (c) seven samples for training.
… obtain the average recognition rate. For the CAS-PEAL-R1 expression and distance database, the role of the training set was to obtain the projection matrix or matrices that learn the low-dimensional subspace, while the role of the gallery and probe sets was to report the recognition accuracy. Fig. 4, Fig. 7, and Fig. 9 show the average recognition rate versus the subspace dimensions on the UMIST, ORL, and CAS-PEAL-R1 expression and distance databases for the six vector-based methods and the TRPDA method. Fig. 5 shows the average recognition rate versus the subspace dimensions on the UMIST database for the six tensor-based methods. Fig. 6 and Fig. 8 show the boxplots of the experimental results of the six vector-based methods and the TRPDA method on the UMIST and ORL datasets. For all boxplots, the number of dimensions of all the algorithms in the test was set from 11 to 15. Because the training time (> 4 hours) of TDLA on the CAS-PEAL-R1 expression and distance database was too long, Fig. 10 shows the average recognition rate versus the subspace …
Fig. 7. The average recognition rate versus the subspace dimension on the sets of ORL for six vector-based methods and the TRPDA method: (a) two samples for training; (b) four samples for training; (c) six samples for training.

Fig. 8. The boxplots on ORL for seven methods, from left to right: LDA, PCA, ISOMAP, DLA, MFA, LPP and TRPDA: (a) two samples for training; (b) four samples for training; (c) six samples for training.
TABLE II: BEST AVERAGE RECOGNITION RATE OF TWELVE ALGORITHMS ON THE UMIST DATABASE
TABLE IV: BEST AVERAGE RECOGNITION RATE OF ELEVEN ALGORITHMS ON THE CAS-PEAL-R1 EXPRESSION AND DISTANCE DATABASE

TABLE V: TRAINING TIME

V. CONCLUSION

During the past decade, a large number of subspace-based feature extraction algorithms for facial recognition have been proposed. However, most traditional extraction algorithms are vector-based feature selection algorithms, which lose part of the original spatial constraints of each pixel in the facial images. Thus, it is more effective to propose a tensor-based feature extraction algorithm, by which the natural structure of the input samples is fully utilized.

In this paper, by considering the facial image as a 2-order tensor, the low-dimensional tensor subspace of the original input tensor samples was obtained; moreover, discriminative locality alignment was adopted to transform the refined tensor samples into the ultimate vector feature representation for subsequent facial recognition. In addition, numerous experiments on three facial image databases were conducted to demonstrate the effective performance of the proposed algorithm.
… of dimensions of the selected subspace is low and the training set is small, our method still presents robust performance because it simultaneously considers the natural structure of the input samples and the rank order information of intra-class samples.

2. Fig. 6 and Fig. 8 show the boxplots of the experimental results of the six vector-based algorithms and TRPDA on the UMIST and ORL datasets. Each boxplot contains a box and whiskers; the box has lines at the lower, median, and upper quartile values, and the whiskers extend from the ends of the box to the adjacent values in the data by default. According to these boxplots, TRPDA extracts features with the rank information and eliminates the most unstable ones; manifold learning …

REFERENCES

[1] M. Belkin and P. Niyogi, "Laplacian eigenmaps and spectral techniques for embedding and clustering," in Proc. Adv. Neural Inf. Process. Syst., vol. 14, Dec. 2002, pp. 585–591.
[2] S. Biswas, G. Aggarwal, P. J. Flynn, and K. W. Bowyer, "Pose-robust recognition of low-resolution face images," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 12, pp. 3037–3049, Dec. 2013.
[3] D. Chen, X. Cao, F. Wen, and J. Sun, "Blessing of dimensionality: High-dimensional feature and its efficient compression for face verification," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2013, pp. 3025–3032.
[4] M. Devanne, H. Wannous, S. Berretti, P. Pala, M. Daoudi, and A. Del Bimbo, "3-D human action recognition by shape analysis of motion trajectories on Riemannian manifold," IEEE Trans. Cybern., vol. 45, no. 7, pp. 1340–1352, Jul. 2015.
[5] D. B. Graham and N. M. Allinson, "Characterizing virtual eigensignatures for general purpose face recognition," in Face Recognition: From Theory to Applications (NATO ASI Series F, Computer and Systems Sciences), vol. 163, H. Wechsler, P. J. Phillips, V. Bruce, F. Fogelman-Soulie, and T. S. Huang, Eds. Berlin, Germany: Springer, 1998, pp. 446–456.
[6] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, Nov. 2004.
[7] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification. Hoboken, NJ, USA: Wiley, 2012.
[8] R. A. Fisher, "The use of multiple measurements in taxonomic problems," Ann. Eugenics, vol. 7, no. 2, pp. 179–188, 1936.
[9] W. Gao et al., "The CAS-PEAL large-scale Chinese face database and baseline evaluations," IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 38, no. 1, pp. 149–161, Jan. 2008.
[10] X. Gao, X. Wang, D. Tao, and X. Li, "Supervised Gaussian process latent variable model for dimensionality reduction," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 41, no. 2, pp. 425–434, Apr. 2011.
[11] N. Guan, D. Tao, Z. Luo, and B. Yuan, "Non-negative patch alignment framework," IEEE Trans. Neural Netw., vol. 22, no. 8, pp. 1218–1230, Aug. 2011.
[12] X. He and P. Niyogi, "Locality preserving projections," in Proc. Adv. Neural Inf. Process. Syst., vol. 45, Dec. 2005, pp. 186–197.
[13] H. Hotelling, "Analysis of a complex of statistical variables into principal components," J. Edu. Psychol., vol. 24, no. 6, pp. 417–441, Sep. 1933.
[14] C. Hou, F. Nie, C. Zhang, D. Yi, and Y. Wu, "Multiple rank multi-linear SVM for matrix data classification," Pattern Recognit., vol. 47, no. 1, pp. 454–469, 2014.
[15] D. Huang, C. Shan, M. Ardabilian, Y. Wang, and L. Chen, "Local binary patterns and its application to facial image analysis: A survey," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 41, no. 6, pp. 765–781, Nov. 2011.
[16] Y. Jia, F. Nie, and C. Zhang, "Trace ratio problem revisited," IEEE Trans. Neural Netw., vol. 20, no. 4, pp. 729–735, Apr. 2009.
[17] L.-H. Lim, "Singular values and eigenvalues of tensors: A variational approach," in Proc. IEEE Int. Workshop Comput. Adv. Multi-Sensor Adapt. Process., Dec. 2005, pp. 129–132.
[18] L. D. Lathauwer, "Signal processing based on multilinear algebra," Ph.D. dissertation, Dept. Elektrotechniek, Katholieke Universiteit Leuven, Leuven, Belgium, 1997.
[19] W.-Y. Liu, K. Yue, and M.-H. Gao, "Constructing probabilistic graphical model from predicate formulas for fusing logical and probabilistic knowledge," Inf. Sci., vol. 181, no. 18, pp. 3825–3845, May 2011.
[20] W. Liu, H. Zhang, D. Tao, Y. Wang, and K. Lu, "Large-scale paralleled sparse principal component analysis," Multimedia Tools Appl., vol. 75, no. 3, pp. 1481–1493, 2014, doi: 10.1007/s11042-014-2004-4.
[21] H. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, "MPCA: Multilinear principal component analysis of tensor objects," IEEE Trans. Neural Netw., vol. 19, no. 1, pp. 18–39, Jan. 2008.
[22] X. Lu, X. Zheng, and X. Li, "Latent semantic minimal hashing for image retrieval," IEEE Trans. Image Process., vol. 26, no. 1, pp. 355–368, Jan. 2017.
[23] X. Lu, Y. Yuan, and X. Zheng, "Joint dictionary learning for multispectral change detection," IEEE Trans. Cybern., vol. 47, no. 4, pp. 884–897, Apr. 2017.
[24] F. Nie, S. Xiang, Y. Song, and C. Zhang, "Extracting the optimal dimensionality for local tensor discriminant analysis," Pattern Recognit., vol. 42, no. 1, pp. 105–114, 2009.
[25] F. Nie, S. Xiang, Y. Song, and C. Zhang, "Orthogonal locality minimizing globality maximizing projections for feature extraction," Opt. Eng., vol. 48, no. 1, pp. 017202-1–017202-5, 2009.
[26] F. Nie, D. Xu, I. W. Tsang, and C. Zhang, "Flexible manifold embedding: A framework for semi-supervised and unsupervised dimension reduction," IEEE Trans. Image Process., vol. 19, no. 7, pp. 1921–1932, Jul. 2010.
[27] F. Nie, J. Yuan, and H. Huang, "Optimal mean robust principal component analysis," in Proc. Int. Conf. Mach. Learn., Jun. 2007, pp. 1062–1070.
[28] F. Nie, S. Xiang, and C. Zhang, "Neighborhood MinMax projections," in Proc. Int. Joint Conf. Artif. Intell., 2007, pp. 993–998.
[29] S. T. Roweis and L. K. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science, vol. 290, no. 5500, pp. 2323–2326, 2000.
[30] F. S. Samaria and A. C. Harter, "Parameterisation of a stochastic model for human face identification," in Proc. IEEE Workshop Appl. Comput. Vis., Dec. 1994, pp. 138–142.
[31] F. M. Sukno, J. L. Waddington, and P. F. Whelan, "3-D facial landmark localization with asymmetry patterns and shape regression from incomplete local features," IEEE Trans. Cybern., vol. 45, no. 9, pp. 1717–1730, Sep. 2015.
[32] D. Tao, X. Li, X. Wu, and S. J. Maybank, "Geometric mean for subspace selection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 2, pp. 260–274, Feb. 2009.
[33] D. Tao, X. Tang, X. Li, and X. Wu, "Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 7, pp. 1088–1099, Jul. 2006.
[34] D. Tao, X. Li, X. Wu, and S. J. Maybank, "General tensor discriminant analysis and Gabor features for gait recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 10, pp. 1700–1715, Oct. 2007.
[35] J. B. Tenenbaum, V. de Silva, and J. C. Langford, "A global geometric framework for nonlinear dimensionality reduction," Science, vol. 290, no. 5500, pp. 2319–2323, Dec. 2000.
[36] W. K. Wong, Z. Lai, Y. Xu, J. Wen, and C. P. Ho, "Joint tensor feature analysis for visual object recognition," IEEE Trans. Cybern., vol. 45, no. 11, pp. 2425–2436, Nov. 2015.
[37] S. Yan, D. Xu, B. Zhang, H.-J. Zhang, Q. Yang, and S. Lin, "Graph embedding and extensions: A general framework for dimensionality reduction," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 1, pp. 40–51, Jan. 2007.
[38] J. Yang, D. Zhang, A. F. Frangi, and J.-Y. Yang, "Two-dimensional PCA: A new approach to appearance-based face representation and recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 1, pp. 131–137, Jan. 2004.
[39] J. Ye, R. Janardan, and Q. Li, "Two-dimensional linear discriminant analysis," in Proc. Adv. Neural Inf. Process. Syst., 2004, pp. 1569–1576.
[40] J. Yu, D. Tao, J. Li, and J. Cheng, "Semantic preserving distance metric learning and applications," Inf. Sci., vol. 281, pp. 674–686, Oct. 2014.
[41] L. Zhang, L. Zhang, D. Tao, and X. Huang, "Tensor discriminative locality alignment for hyperspectral image spectral–spatial feature extraction," IEEE Trans. Geosci. Remote Sens., vol. 51, no. 1, pp. 242–256, Jan. 2013.
[42] T. Zhang, D. Tao, X. Li, and J. Yang, "Patch alignment for dimensionality reduction," IEEE Trans. Knowl. Data Eng., vol. 21, no. 9, pp. 1299–1313, Sep. 2009.

Dapeng Tao received the B.E. degree from Northwestern Polytechnical University and the Ph.D. degree from the South China University of Technology. He is currently a Professor with the School of Information Science and Engineering, Yunnan University, Kunming, China. He has authored and co-authored over 50 scientific articles. His research interests include machine learning, computer vision, and robotics. He has served over 10 international journals, including the IEEE TNNLS, the IEEE TCYB, the IEEE TMM, the IEEE CSVT, the IEEE TBME, and Information Sciences.

Yanan Guo received the B.Eng. degree from Hubei Polytechnic University. She is currently pursuing the M.Sc. degree with Yunnan University, Kunming, China. Her research interests include machine learning and computer vision.
334 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 27, NO. 1, JANUARY 2018
Yaotang Li is currently a Professor with the School of Mathematics and Statistics, Yunnan University, China. He has authored or co-authored over 80 research papers. His main research interests include numerical algebra and special matrices.

Xinbo Gao (M'02–SM'07) received the B.E., M.S., and Ph.D. degrees from Xidian University, Xi'an, China, in 1994, 1997, and 1999, respectively, all in signal and information processing. He was a Research Fellow with the Department of Computer Science, Shizuoka University, Shizuoka, Japan, from 1997 to 1998. From 2000 to 2001, he was a Post-Doctoral Research Fellow with the Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong. Since 2001, he has been with the School of Electronic Engineering, Xidian University, where he is currently a Cheung Kong Professor with the Ministry of Education, a Professor of pattern recognition and intelligent systems, and the Director of the State Key Laboratory of Integrated Services Networks. He has authored five books and around 200 technical articles in refereed journals and proceedings. His current research interests include multimedia analysis, computer vision, pattern recognition, machine learning, and wireless communications. He is a fellow of the Institution of Engineering and Technology. He is on the Editorial Boards of several journals, including Signal Processing (Elsevier) and Neurocomputing (Elsevier). He served as the General Chair/Co-Chair, the Program Committee (PC) Chair/Co-Chair, or a PC Member for around 30 major international conferences.