
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 27, NO. 1, JANUARY 2018

Tensor Rank Preserving Discriminant Analysis for Facial Recognition

Dapeng Tao, Yanan Guo, Yaotang Li, and Xinbo Gao, Senior Member, IEEE

Abstract— Facial recognition, one of the basic topics in computer vision and pattern recognition, has received substantial attention in recent years. However, in traditional facial recognition algorithms the facial images are reshaped into a long vector, thereby losing part of the original spatial constraints of each pixel. In this paper, a new tensor-based feature extraction algorithm termed tensor rank preserving discriminant analysis (TRPDA) for facial image recognition is proposed. The proposed method involves two stages: in the first stage, the low-dimensional tensor subspace of the original input tensor samples is obtained; in the second stage, discriminative locality alignment is utilized to obtain the ultimate vector feature representation for subsequent facial recognition. On the one hand, the proposed TRPDA algorithm fully utilizes the natural structure of the input samples, and it applies an optimization criterion that can directly handle the tensor spectral analysis problem, thereby decreasing the computation cost compared with traditional tensor-based feature selection algorithms. On the other hand, the proposed TRPDA algorithm extracts features by finding a tensor subspace that preserves most of the rank order information of the intra-class input samples. Experiments on three facial databases are performed to demonstrate the effectiveness of the proposed TRPDA algorithm.

Index Terms— Tensor representation, rank preserving, face recognition, discriminant analysis.

Manuscript received March 18, 2016; revised January 26, 2017; accepted October 5, 2017. Date of publication October 12, 2017; date of current version November 3, 2017. This work was supported in part by the National Natural Science Foundation of China under Grant 61562053, Grant 61572486, Grant 61432014, Grant 61772402, Grant 61772455, and Grant 61402458, in part by the Yunnan Natural Science Funds under Grant 2016FB105, in part by the Guangdong Natural Science Funds under Grant 2014A030310252, in part by the Program for Excellent Young Talents of Yunnan University under Grant WX069051, in part by the National Key Research and Development Program of China under Grant 2016QY01W0200, in part by the National High-Level Talents Special Support Program of China under Grant CS31117200001, and in part by the Project of Innovative Research Team of Yunnan Province. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Shiguang Shan. (Corresponding author: Dapeng Tao.)

D. Tao is with the School of Information Science and Engineering, Yunnan University, Kunming 650091, China (e-mail: dapeng.tao@gmail.com).

Y. Guo and Y. Li are with the School of Mathematics and Statistics, Yunnan University, Kunming 650091, China (e-mail: yananguo.ynu@qq.com; liyaotang@ynu.edu.cn).

X. Gao is with the State Key Laboratory of Integrated Services Networks, School of Electronic Engineering, Xidian University, Xi'an 710071, China (e-mail: xbgao@mail.xidian.edu.cn).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIP.2017.2762588

1057-7149 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

FACIAL recognition, one of the popular topics in computer vision that has been studied and used for several decades, can be applied to a mass of applications such as security, entertainment, and forensics. The development of intelligent algorithms for face recognition results in efficiency improvements, innovations, and cost savings in several areas.

In general, the basic steps of facial recognition involve facial detection, handcrafted feature extraction, subspace-based feature extraction, and classification [14]. At the facial detection and handcrafted feature extraction stage [27], local features encode features on interest regions [31]. Biswas et al. [2] used Scale Invariant Feature Transform (SIFT) features [6] to describe each landmark and combined the SIFT features of all landmarks to represent a face. The SIFT feature was popularized in the computer vision field and was initially designed for recognizing the identical object under different conditions. Because of its high discriminative power, the SIFT feature is widely adopted for facial recognition [5]. Chen et al. [3] obtained multi-scale Local Binary Pattern (LBP) [15] features from 27 landmarks, where the 27 landmarks come from a patch. For all patches, the LBP features are concatenated into a long feature vector that serves as the pose feature. LBP features were originally proposed for texture classification; for an image, the values of the LBP features are determined by its local geometric structure, based on a non-parametric method. LBP features have been widely used in image description [33].

At the subspace-based feature extraction stage [24]–[26], [34], one of the most important problems is to find a projection that transforms the original samples from a high-dimensional space to a low-dimensional subspace for the subsequent classification. This approach aims to reveal an effective representation of the distribution of the samples in the original high-dimensional space. From the perspective of whether the input sample is a vector or a tensor, we can group these algorithms into two categories: vector-based algorithms and tensor-based algorithms.

Traditional vector-based algorithms, which represent each input sample as a vector, have been developed for several decades [1], [10], [11], [16], [19], [20], [22], [23], [28], [29], [40]. Representative conventional unsupervised feature selection algorithms include locality preserving projections (LPP) [12], principal component analysis (PCA) [13] and ISOMAP [35]. LPP, a popular manifold learning based unsupervised linear feature selection algorithm, builds a weighted graph to preserve the distance-based sample relationship information. In the training set, the undirected weighted graph combines the neighborhood information of sample pairs. The disadvantage of LPP is that the class label information is not utilized; hence, it is not optimal for classification tasks. PCA is the traditional globally unsupervised linear feature

selection algorithm that maximizes the mutual information between the primitive high-dimensional Gaussian distributed samples and the projected low-dimensional samples. However, as with LPP, PCA does not consider the label information; thus, it is generally not used directly for classification. As a variant of MDS [7], ISOMAP preserves the global geodesic distances of all pair-wise samples. Representative conventional supervised feature selection algorithms include linear discriminant analysis (LDA) [8], Marginal Fisher's Analysis (MFA) [37] and discriminative locality alignment (DLA) [42]. LDA is one of the most widely used globally supervised linear feature selection algorithms; it not only maximizes the determinant of the between-class scatter matrix but also minimizes the determinant of the within-class scatter matrix of the low-dimensional projected samples. Although LDA has extensive applications in pattern classification tasks [4], it often requires a mass of training samples to obtain a good model approximation, which is known as the small sample size (SSS) problem [32]. MFA is a popular supervised manifold learning based linear feature selection algorithm, which builds a penalty graph with the inter-class marginal samples to keep the inter-class divisibility. However, MFA ignores the discriminative information of non-marginal samples and faces an ill-posed problem. DLA is a popular supervised manifold learning based linear feature selection algorithm. DLA utilizes the classification optimization criteria that the distance between intra-class output samples should be as small as possible and the distance between inter-class output samples as large as possible, thereby preserving the discriminative information in a local patch. In addition, DLA combines all the optimal weighted parts to form a global subspace structure. However, DLA does not preserve the rank order information of the intra-class samples.

For vector feature representation, the key shortcoming is that this scheme loses part of the original spatial constraints of each pixel in the face images, which hinders the subsequent algorithm in constructing the optimal model and classification model. To overcome this difficulty, some researchers propose to use tensor representation rather than vector representation for the input sample [36], [41]. Two-dimensional PCA (2DPCA) [38], an unsupervised algorithm, projects an image matrix to a low-dimensional matrix by a linear transformation in the 2-mode while maximizing the mutual information. The linear transformation in the 1-mode is thus ignored, resulting in poor performance. Multi-linear PCA (MPCA) [21] is an unsupervised algorithm whose input samples can be vectors, matrices, or higher-order tensors; MPCA captures most of the original structure of the input sample. The disadvantage of MPCA is that the class label information is not utilized; hence, it is not optimal for classification tasks. Two-dimensional LDA (2DLDA) [39] is a supervised algorithm that projects an image matrix to a low-dimensional matrix by linear transformations in the 1-mode and 2-mode simultaneously. The advantage of 2DLDA is that it maximizes the determinant of the between-class scatter matrix and minimizes the determinant of the within-class scatter matrix; moreover, 2DLDA preserves the original matrix structure of the data. However, 2DLDA does not consider the manifold structure of the data. Tensor discriminative locality alignment (TDLA) [41], a supervised algorithm, is a tensor generalization of DLA in which the optimal solution is obtained by optimizing each mode of the input samples. The disadvantage of TDLA is that its computation is expensive and it does not consider the rank order information of the intra-class samples in a patch.

Lim [17] proposed a theory of singular values and singular vectors for tensors based on a constrained variational approach, quite similar to the Rayleigh quotient for symmetric matrix eigenvalues. These notions are particularly useful in generalizing areas where the spectral analysis of matrices has traditionally played an important role. Thus, traditional spectral analysis based feature selection algorithms can be generalized to tensor spectral analysis.

In this paper, we present a new feature selection algorithm called Tensor Rank Preserving Discriminant Analysis (TRPDA) that directly approaches the optimization criterion. TRPDA differs from the aforementioned tensor-based feature selection algorithms, which iteratively approach the optimization criterion and finally return to the traditional spectral analysis problem. TRPDA applies an optimization criterion that can directly solve the tensor spectral analysis problem. The main contributions of the proposed TRPDA algorithm are summarized as follows:

1) We represent the facial image as a 2-order tensor, so that its data structure is preserved. Based on the 2-order tensor representation, the first step of the proposed TRPDA algorithm extracts tensor features by finding a tensor subspace that preserves most of the rank order information of the intra-class input samples.

2) Following the first step of the TRPDA algorithm, we vectorize the refined tensor features. Next, discriminative locality alignment is adopted to obtain the final vector feature representation, by which the recognition rate is improved.

The rest of the paper is organized as follows. Section II gives a brief description of the related tensor algebra. Next, the proposed TRPDA algorithm is presented in detail in Section III. Finally, Section IV reports the experimental details and discussion, followed by the conclusion in Section V.

II. TENSOR ALGEBRA

Let $\mathbb{R}$ denote the set of all real numbers. Tensors [18] are multidimensional arrays of numbers that transform linearly under coordinate transformations. We call $\mathcal{X} = (\mathcal{X}_{n_1,n_2,\ldots,n_M}) \in \mathbb{R}^{N_1 \times N_2 \times \cdots \times N_M}$ a real tensor of order $M$ if $\mathcal{X}_{n_1,n_2,\ldots,n_M} \in \mathbb{R}$, where $1 \le n_i \le N_i$ and $1 \le i \le M$. We briefly introduce the following relevant definitions from multi-linear algebra.

Definition 1 (Tensor Product): Let $\mathcal{A} \in \mathbb{R}^{N_1 \times N_2 \times \cdots \times N_M}$ and $\mathcal{B} \in \mathbb{R}^{N'_1 \times N'_2 \times \cdots \times N'_{M'}}$ be tensors of order $M$ and $M'$. The tensor product of $\mathcal{A}$ and $\mathcal{B}$ is a tensor of order $M + M'$, denoted $\mathcal{A} \otimes \mathcal{B} \in \mathbb{R}^{N_1 \times \cdots \times N_M \times N'_1 \times \cdots \times N'_{M'}}$, with its $(n_1, \ldots, n_M, n'_1, \ldots, n'_{M'})$-entry given by

$$(\mathcal{A} \otimes \mathcal{B})_{n_1,\ldots,n_M,n'_1,\ldots,n'_{M'}} = \mathcal{A}_{n_1,\ldots,n_M} \, \mathcal{B}_{n'_1,\ldots,n'_{M'}},$$

for all index values.

Definition 2 (Tensor Contraction): Let $\mathcal{A} \in \mathbb{R}^{N_1 \times \cdots \times N_M \times N'_1 \times \cdots \times N'_{M'}}$ and $\mathcal{B} \in \mathbb{R}^{N_1 \times \cdots \times N_M \times N''_1 \times \cdots \times N''_{M''}}$ be tensors of order $M + M'$ and $M + M''$. The contraction of the tensor product $\mathcal{A} \otimes \mathcal{B}$ over the first $M$ modes of each factor is

$$[\mathcal{A} \otimes \mathcal{B}; (1:M)(1:M)] = \sum_{n_1=1}^{N_1} \cdots \sum_{n_M=1}^{N_M} (\mathcal{A})_{n_1,\ldots,n_M,n'_1,\ldots,n'_{M'}} \, (\mathcal{B})_{n_1,\ldots,n_M,n''_1,\ldots,n''_{M''}},$$

which is of order $M' + M''$.

Definition 3 (Mode-$d$ Product): Let $\mathcal{A} \in \mathbb{R}^{N_1 \times N_2 \times \cdots \times N_M}$ be a tensor and $U \in \mathbb{R}^{N'_d \times N_d}$ a matrix, of order $M$ and $2$, respectively. The mode-$d$ product of $\mathcal{A}$ and $U$ is a tensor of order $M$, denoted $\mathcal{A} \times_d U \in \mathbb{R}^{N_1 \times \cdots \times N_{d-1} \times N'_d \times N_{d+1} \times \cdots \times N_M}$, with its $(n_1, \ldots, n_{d-1}, j, n_{d+1}, \ldots, n_M)$-entry given by

$$(\mathcal{A} \times_d U)_{n_1,\ldots,n_{d-1},j,n_{d+1},\ldots,n_M} = \sum_{n_d=1}^{N_d} \mathcal{A}_{n_1,\ldots,n_{d-1},n_d,n_{d+1},\ldots,n_M} \, U_{j,n_d},$$

for all index values. The mode-$d$ product is a type of contraction.

To simplify the notation in this paper, we denote

$$\mathcal{A} \times_1 U_1 \times_2 U_2 \times \cdots \times_M U_M \triangleq \mathcal{A} \prod_{i=1}^{M} \times_i U_i.$$

Definition 4 (Frobenius Norm): The Frobenius norm of a tensor $\mathcal{A} \in \mathbb{R}^{N_1 \times N_2 \times \cdots \times N_M}$ is given by

$$\|\mathcal{A}\| = \big([\mathcal{A} \otimes \mathcal{A}; (1:M)(1:M)]\big)^{1/2} = \Big(\sum_{n_1=1}^{N_1} \cdots \sum_{n_M=1}^{N_M} \mathcal{A}^2_{n_1,\ldots,n_M}\Big)^{1/2}.$$

Definition 5 (Euclidean Distance): The Euclidean distance between two tensors $\mathcal{A}, \mathcal{B} \in \mathbb{R}^{N_1 \times N_2 \times \cdots \times N_M}$ is given by

$$D(\mathcal{A}, \mathcal{B}) = \|\mathcal{A} - \mathcal{B}\|.$$

III. TENSOR RANK PRESERVING DISCRIMINANT ANALYSIS

Given a training set $\{\mathcal{X}_i\}$ $(i = 1:N)$ in the high-dimensional space $\mathbb{R}^{D_1 \times D_2 \times \cdots \times D_{M-1}}$, for any sample $\mathcal{X}_i$ the corresponding class label is $C_i \in \mathbb{Z}_n$. First, we consider the problem of representing the samples $\{\mathcal{X}_i\}$ $(i = 1:N)$ by scalars $y = \{y_1, y_2, \ldots, y_N\}$ such that $y_i$ represents $\mathcal{X}_i$. Specifically, the objective is to find $M-1$ transformation vectors $u_j \in \mathbb{R}^{1 \times D_j}$ $(j = 1, \ldots, M-1)$ such that

$$y_i = \mathcal{X}_i \prod_{j=1}^{M-1} \times_j u_j, \qquad (1)$$

where $i \in \{1, \ldots, N\}$. Here, we consider the tensor-based Patch Alignment Framework (PAF) [42].

A. Rank Preserving Discriminant Analysis

For each sample $\mathcal{X}_i$, we use the Euclidean distance to find its $k_1$ nearest within-class neighbors $\mathcal{X}_{i^1}, \ldots, \mathcal{X}_{i^{k_1}}$ and its $k_2$ nearest between-class neighbors $\mathcal{X}_{i_1}, \ldots, \mathcal{X}_{i_{k_2}}$. Thus, we can build a local patch

$$P(\mathcal{X}_i) = \big[\mathcal{X}_i, \mathcal{X}_{i^1}, \ldots, \mathcal{X}_{i^{k_1}}, \mathcal{X}_{i_1}, \ldots, \mathcal{X}_{i_{k_2}}\big] \in \mathbb{R}^{D_1 \times D_2 \times \cdots \times D_{M-1} \times (1+k_1+k_2)}$$

with the corresponding low-dimensional representation denoted by

$$P(y_i) = \big[y_i, y_{i^1}, \ldots, y_{i^{k_1}}, y_{i_1}, \ldots, y_{i_{k_2}}\big] \in \mathbb{R}^{1 \times (1+k_1+k_2)}.$$

As our goal is to preserve the discriminability of classes for classification, the between-class distances should be as large as possible and the within-class rank order information should be preserved as much as possible. Based on this, we pose the following optimizations on the patch of $\mathcal{X}_i$:

$$\arg\min_{y_i} \sum_{j=1}^{k_1} \|y_i - y_{i^j}\|^2, \qquad (2)$$

$$\arg\max_{y_i} \sum_{j=1}^{k_2} \|y_i - y_{i_j}\|^2. \qquad (3)$$

We unify the two objectives with a penalty factor $\omega_i$ and a combination factor $\alpha$:

$$\arg\min_{y_i} \Big( \sum_{j=1}^{k_1} \|y_i - y_{i^j}\|^2 (\omega_i)_j - \alpha \sum_{j=1}^{k_2} \|y_i - y_{i_j}\|^2 \Big), \qquad (4)$$

where $\omega_i$ is defined by

$$(\omega_i)_j = \begin{cases} \dfrac{[\mathcal{X}_i \otimes \mathcal{X}_{i^j}; (1:M-1)(1:M-1)]}{\|\mathcal{X}_i\| \cdot \|\mathcal{X}_{i^j}\|}, & \text{if } \mathcal{X}_{i^j} \in N_{k_1}(\mathcal{X}_i), \\ 0, & \text{otherwise}, \end{cases}$$

and $\alpha \in [0, 1]$. The tradeoff factor $(\omega_i)_j$ uses different weighting values to emphasize different distances in the original sample space. In this way, the majority of the rank order information of the within-class samples is preserved, and small distances in the raw sample space lead to heavier penalization in the obtained subspace.

To simplify the following derivation, we set

$$\beta_i = [\underbrace{(\omega_i)_1, \cdots, (\omega_i)_{k_1}}_{k_1}, \underbrace{-\alpha, \cdots, -\alpha}_{k_2}].$$

Thus, (4) reduces to

$$\arg\min_{y_i} \sum_{j=1}^{k_1} (\beta_i)_j \|y_i - y_{i^j}\|^2 + \sum_{j=1}^{k_2} (\beta_i)_{j+k_1} \|y_i - y_{i_j}\|^2 = \arg\min_{y_i} \sum_{j=1}^{k_1+k_2} (\beta_i)_j \|y_{P_i(1)} - y_{P_i(j+1)}\|^2 = \arg\min_{y_i} P(y_i) L_i (P(y_i))^T, \qquad (5)$$

where

$$L_i = \begin{bmatrix} \sum_{j=1}^{k_1+k_2} (\beta_i)_j & -\beta_i^T \\ -\beta_i & \mathrm{diag}(\beta_i) \end{bmatrix}. \qquad (6)$$

B. Whole Alignment

Each $P(y_i)$ can be selected from the global coordinates $Y = [y_1, y_2, \ldots, y_N] \in \mathbb{R}^{1 \times N}$ by using a selection matrix, i.e.,

$$P(y_i) = Y S_i, \qquad (7)$$

where $S_i \in \mathbb{R}^{N \times (1+k_1+k_2)}$ is the selection matrix whose entries are defined by

$$(S_i)_{pq} = \begin{cases} 1 & \text{if } p = F_i(q), \\ 0 & \text{else}, \end{cases} \qquad (8)$$

for all index values, where $F_i = \{i, i^1, \ldots, i^{k_1}, i_1, \ldots, i_{k_2}\}$ denotes the set of indices for the $i$-th patch $P(\mathcal{X}_i)$.

Thus, (5) can be rewritten as

$$\arg\min_Y Y S_i L_i S_i^T Y^T. \qquad (9)$$

We obtain the whole alignment by summing over all the part optimizations defined in (9):

$$\arg\min_Y \sum_{i=1}^{N} Y S_i L_i S_i^T Y^T = \arg\min_Y Y L Y^T, \qquad (10)$$

where $L = \sum_{i=1}^{N} S_i L_i S_i^T \in \mathbb{R}^{N \times N}$ is the alignment matrix. By using (1), (10) can be written as

$$\arg\min_{u_1, u_2, \cdots, u_{M-1}} \Big( \mathcal{X} \prod_{j=1}^{M-1} \times_j u_j \Big) \, L \, \Big( \mathcal{X} \prod_{j=1}^{M-1} \times_j u_j \Big)^T$$

$$= \arg\min_{u_1, u_2, \cdots, u_{M-1}} \sum_{j_1=1}^{N} \sum_{j_2=1}^{N} \Big( \sum_{i_1=1}^{D_1} \cdots \sum_{i_{M-1}=1}^{D_{M-1}} \mathcal{X}_{i_1 \cdots i_{M-1} j_1} (u_1)_{i_1} \cdots (u_{M-1})_{i_{M-1}} \Big) L_{j_1 j_2} \Big( \sum_{i'_1=1}^{D_1} \cdots \sum_{i'_{M-1}=1}^{D_{M-1}} \mathcal{X}_{i'_1 \cdots i'_{M-1} j_2} (u_1)_{i'_1} \cdots (u_{M-1})_{i'_{M-1}} \Big)$$

$$= \arg\min_{u_1, u_2, \cdots, u_{M-1}} \big[\mathcal{X} \times_M L^T \otimes \mathcal{X}; (M)(M)\big] \prod_{i=1}^{2(M-1)} \times_i u_{P(i)}, \qquad (11)$$

where $\mathcal{X} \in \mathbb{R}^{D_1 \times \cdots \times D_{M-1} \times N}$ is the tensor obtained by stacking all training samples along the $M$-th mode, and $P = \{1, 2, \cdots, M-1, 1, 2, \cdots, M-1\}$ is the index set.

To avoid trivial solutions, we impose the following constraints to uniquely determine the projection vectors:

$$\|u_1\|^2 = \cdots = \|u_{M-1}\|^2 = 1. \qquad (12)$$

For simplicity, we denote

$$\big[\mathcal{X} \times_M L^T \otimes \mathcal{X}; (M)(M)\big] \triangleq \mathcal{A}. \qquad (13)$$

Using (12), (11) can be transformed to

$$\arg\min_{u_1, u_2, \cdots, u_{M-1}} \mathcal{A} \prod_{i=1}^{2(M-1)} \times_i u_{P(i)} \quad \text{s.t. } \|u_1\|^2 = \cdots = \|u_{M-1}\|^2 = 1. \qquad (14)$$

Similar to the Lagrange multiplier method used in matrix eigenvalue problems, by taking a constrained variational approach [17], we can transform (14) into a tensor singular value problem. The projection matrices $U_i$ $(i = 1, \cdots, M-1)$ are the $d_i$ singular vectors related to the $d_i$ smallest mode-$i$ singular values of $\mathcal{A}$.

To the best knowledge of the authors, most tensor-based algorithms approach an optimization criterion iteratively. The objective function of a tensor-based algorithm is often defined as

$$\arg\min_{U_k} \mathrm{tr}\big(U_k^T F^{(k)} U_k\big) \quad \text{s.t. } U_k^T U_k = I, \qquad (15)$$

where $k = 1, 2, \cdots, M-1$; $M-1$ is the order of the input sample; $U_k$ $(k = 1, \cdots, M-1)$ are the projection matrices; and $F^{(k)}$ refers to the alignment matrix, which varies across tensor-based methods. For any selected $k \in \{1, \cdots, M-1\}$, such methods update the projection matrix $U_k$ while fixing the other $M-2$ projection matrices $U_j$ $(j \neq k)$. Thus, these tensor-based algorithms not only increase the computation complexity but also face a convergence problem.

C. Discriminative Locality Alignment

The output of the above process is a tensor that cannot be processed directly by a conventional classifier. A tensor-to-vector feature selection scheme is therefore introduced to extract a vector feature from the output tensor data.
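The whole-alignment bookkeeping of (7)–(10) sums each patch matrix $L_i$ into a global $N \times N$ alignment matrix $L$ through the selection matrix $S_i$; a small NumPy sketch of this step (our illustration, with hypothetical patch indices), using fancy indexing instead of forming the explicit 0/1 selection matrices:

```python
import numpy as np

def whole_alignment(N, patches):
    """Global alignment matrix L = sum_i S_i L_i S_i^T of Eq. (10).
    `patches` is a list of (F_i, L_i) pairs: F_i holds the global
    sample indices of the i-th patch and L_i is its
    (1+k1+k2) x (1+k1+k2) patch matrix."""
    L = np.zeros((N, N))
    for F, Li in patches:
        F = np.asarray(F)
        L[np.ix_(F, F)] += Li      # same effect as S_i L_i S_i^T
    return L

# A tiny patch matrix of the Eq. (6) form, with beta = [1, 1]:
Li = np.array([[ 2.0, -1.0, -1.0],
               [-1.0,  1.0,  0.0],
               [-1.0,  0.0,  1.0]])
L = whole_alignment(5, [([0, 1, 3], Li), ([2, 4, 1], Li)])
assert np.allclose(L, L.T)         # the alignment matrix is symmetric

# Cross-check against the explicit selection matrix of Eq. (8).
S = np.zeros((5, 3))
S[[0, 1, 3], [0, 1, 2]] = 1.0      # (S)_pq = 1 iff p = F(q)
assert np.allclose(whole_alignment(5, [([0, 1, 3], Li)]), S @ Li @ S.T)
```

The fancy-indexed accumulation and the explicit $S_i L_i S_i^T$ product agree entry for entry, which is why implementations can avoid materializing the sparse selection matrices.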

TABLE I: A GENERAL FRAMEWORK OF TRPDA

Fig. 1. UMIST database.

Fig. 2. ORL database.

Fig. 3. CAS-PEAL-R1 expression and distance database.

Let $\mathcal{B}_n \in \mathbb{R}^{d_1 \times d_2 \times \cdots \times d_{M-1}}$ be the output tensor. The elements of $\mathcal{B}_n$ are rearranged into a vector feature $x_n$ by the vectorization of $\mathcal{B}_n$. To further promote the discriminability of the vector feature representation, in this paper we utilize the DLA [42] algorithm to transform the facial image set into a more discriminative feature representation.

As a summary of this section, we provide the procedure of the TRPDA algorithm in Table I.

IV. EXPERIMENTAL RESULTS

This section describes experiments conducted on three public face image databases to highlight the capability of the proposed TRPDA algorithm. Because a facial image is naturally a two-order tensor with rows and columns, the input is a tensor of order two, with its row space and column space accounting for its two modes.

A. Datasets

The first experiment was performed on the UMIST database [3], which contains 575 facial images from twenty people. Each individual covers a range of poses from profile to frontal views, and the subjects cover a range of races, sexes and appearances. Fig. 1 shows several images of one individual.

The ORL database [30] was the second dataset used in the experiments; it contains 400 facial images collected from forty persons. The ten images of each person vary in pose and facial expression, such as closed or open eyes, smiling or not, and so on. Fig. 2 shows one individual's image set.

The third experiment was performed on the CAS-PEAL-R1 expression and distance database [9], which contains one training set, two probe sets, and one gallery set. The training set contains 1200 images collected from 300 individuals, four images for each individual, varying in accessory, expression, background, distance, lighting, and time. The two probe sets (PE and PS) correspond to variations in expression and distance of frontal faces, respectively; the PE probe set contains 1570 images collected from 377 individuals, while the PS probe set contains 275 images collected from 247 individuals. The gallery set contains 1040 images collected from 1040 individuals under a normal condition, with each individual having one image. Fig. 3 shows several typical images.

B. Experimental Setup and Results

In this section, six vector-based feature extraction algorithms (LDA, PCA, ISOMAP, DLA, MFA, and LPP) and five tensor-based feature extraction algorithms (2DPCA, 2DLDA, MPCA, TDLA, and TLDA) are compared to evaluate the proposed TRPDA algorithm on three facial databases. The six vector-based methods reshape each facial image into a long vector by fixing an order in which to arrange its pixel values. Regarding whether the class label information is considered, LDA, DLA, ISOMAP, MFA, 2DLDA, TDLA, and TLDA are supervised algorithms that do consider the class label information, whereas PCA, LPP, 2DPCA, and MPCA are unsupervised algorithms that do not. When applying the aforementioned algorithms on the three databases, all facial images were normalized to 40 × 40 pixels with 256 gray levels per pixel. For the LDA and LPP algorithms, because the number of training samples is substantially smaller than the original feature dimension, a PCA projection is applied first to ensure that the within-class scatter matrix is nonsingular.

The UMIST and ORL databases were each randomly divided into two separate sets: a training set and a testing set. The role of the training set was to obtain the projection matrix or matrices that learn the low-dimensional subspace, while the role of the testing set was to report the recognition accuracy using the Nearest Neighbor (NN) rule, which is convenient compared to the Support Vector Machine (SVM) [14]. Specifically, for the UMIST database, different numbers (3, 5, and 7) of facial images per individual were randomly selected for training, with the remaining facial images used for testing; for the ORL database, different numbers (2, 4, and 6) of facial images per individual were randomly selected for training, with the remaining facial images used for testing. Each experiment was performed ten times independently to obtain the average recognition rate.
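The split-and-average protocol described above (randomly selecting a fixed number of training images per subject, classifying with the Nearest Neighbor rule, and averaging over ten independent repetitions) can be sketched as follows. This is our illustration, not the authors' code; the feature extractor is assumed to have already produced one feature vector per image:

```python
import numpy as np

def nn_accuracy(train_x, train_y, test_x, test_y):
    """Nearest Neighbor rule: each test sample takes the label of its
    closest training sample in Euclidean distance."""
    correct = 0
    for x, y in zip(test_x, test_y):
        j = np.argmin(np.linalg.norm(train_x - x, axis=1))
        correct += (train_y[j] == y)
    return correct / len(test_y)

def average_recognition(features, labels, per_class, runs=10, seed=0):
    """Randomly pick `per_class` samples per subject for training,
    test on the rest, and average accuracy over `runs` repetitions."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(runs):
        tr = []
        for c in np.unique(labels):
            idx = np.flatnonzero(labels == c)
            tr.extend(rng.choice(idx, size=per_class, replace=False))
        tr = np.array(tr)
        te = np.setdiff1d(np.arange(len(labels)), tr)
        accs.append(nn_accuracy(features[tr], labels[tr],
                                features[te], labels[te]))
    return float(np.mean(accs))

# Tiny synthetic check: two well-separated classes should be easy.
rng = np.random.default_rng(1)
feats = np.vstack([rng.normal(0, 0.1, (10, 3)),
                   rng.normal(5, 0.1, (10, 3))])
labs = np.array([0] * 10 + [1] * 10)
acc = average_recognition(feats, labs, per_class=3)
assert acc == 1.0
```

In the experiments reported below, the per-class training counts would be 3/5/7 (UMIST) or 2/4/6 (ORL), with any of the compared feature extractors plugged in ahead of the NN rule.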

Fig. 4. The average recognition rate versus the subspace dimension on UMIST for the six vector-based methods and TRPDA: (a) three samples for training; (b) five samples for training; (c) seven samples for training.

Fig. 5. The average recognition rate versus the subspace dimension on UMIST for the six tensor-based methods: (a) three samples for training; (b) five samples for training; (c) seven samples for training.

Fig. 6. The boxplots on UMIST for seven methods, from left to right: LDA, PCA, ISOMAP, DLA, MFA, LPP and TRPDA: (a) three samples for training; (b) five samples for training; (c) seven samples for training.

For the CAS-PEAL-R1 expression and distance database, the role of the training set was to obtain the projection matrix or matrices that learn the low-dimensional subspace, while the gallery set and probe sets were used to report the recognition accuracy. Fig. 4, Fig. 7, and Fig. 9 show the average recognition rate versus the subspace dimension on the UMIST, ORL, and CAS-PEAL-R1 expression and distance databases for the six vector-based methods and TRPDA. Fig. 5 shows the average recognition rate versus the subspace dimension on UMIST for the six tensor-based methods. Fig. 6 and Fig. 8 show the boxplots of the experimental results of the six vector-based methods and TRPDA on the UMIST and ORL datasets. For all boxplots, the number of dimensions of all the algorithms under test was set from 11 to 15. Because the training time (> 4 hours) of TDLA on the CAS-PEAL-R1 expression and distance database was too long, Fig. 10 shows the average recognition rate versus the subspace

dimension on the CAS-PEAL-R1 expression and distance database for the five tensor-based methods (2DPCA, 2DLDA, MPCA, TLDA, and TRPDA) combined with DLA. Table II, Table III, and Table IV report the best average recognition rates and the corresponding dimensionalities for the UMIST database, the ORL database, and the CAS-PEAL-R1 expression and distance database, respectively.

Fig. 7. The average recognition rate versus the subspace dimension on ORL for the six vector-based methods and TRPDA: (a) two samples for training; (b) four samples for training; (c) six samples for training.

Fig. 8. The boxplots on ORL for seven methods, from left to right: LDA, PCA, ISOMAP, DLA, MFA, LPP and TRPDA: (a) two samples for training; (b) four samples for training; (c) six samples for training.

TABLE II: BEST AVERAGE RECOGNITION RATE OF TWELVE ALGORITHMS ON THE UMIST DATABASE

Fig. 9. The average recognition rate versus the subspace dimension on the CAS-PEAL-R1 expression and distance database for the six vector-based methods and TRPDA; the probe sets are: (a) expression; (b) distance.

We selected the CAS-PEAL-R1 expression and distance dataset and applied TRPDA, MPCA [21] and TLDA to update the projection matrices, respectively. Table V reports the training time of each method. We conducted all experiments on an i5-2500K 3.30GHz computer with 8 GB of memory.

C. Discussion

Based on the above experiments, we made the following observations:

1. Figs. 4-5, Fig. 7, and Figs. 9-10 show the recognition rates of the eleven feature extraction algorithms on the three facial image datasets. TRPDA was found to have the most robust performance and to produce the highest recognition rate among the algorithms considered.

methods including MFA, DLA and ISOMAP are more stable


than LDA because they consider the manifold structure of
sample.
3. According to the average recognition rate of these
vector-based methods and TRPDA, PCA and LPP do not
exhibit promising performance because they ignore the class
label information. Because the class label information is
considered, ISOMAP, MFA and LDA perform better than
PCA and LPP, and their performance is superior to those
unsupervised algorithms. TRPDA and DLA perform the best
because they consider both the discriminative information and
Fig. 10. The average recognition rate versus the subspace dimension on the
sets of CAS-PEAL-R1 expression and distance database for five tensor-based local geometric information. Because TRPDA takes advantage
methods, and the probe sets are: (a) expression; (b) distance. of the natural structure of the input samples and preserves
the rank order information in a local patch, it outperforms
TABLE III
DLA. According to the average recognition rate of these
B EST AVERAGE R ECOGNITION R ATE OF S EVEN
A LGORITHMS ON THE ORL D ATABASE tensor-based methods, TDLA’s performance inferior to other
tensor-based method due to its convergence problem.
Typical tensor-based methods, such as 2DPCA, 2DLDA,
MPCA, achieve comparable performance, because they take
advantage of the natural structure of the input samples.
TRPDA perform the best because it considers the rank infor-
mation and directly approach the optimization criterion.

V. C ONCLUSION
During the past decade, a large number of subspace-based
feature extraction algorithms for facial recognition have been
proposed. However, most traditional extraction algorithms are
vector-based feature selection algorithms, which lose part of
TABLE IV the original spatial constraints of each pixel in the facial
B EST AVERAGE R ECOGNITION R ATE OF E LEVEN A LGORITHMS ON images. Thus, it will be more effective to propose a tensor-
THE CAS-PEAL-R1 E XPRESSION AND D ISTANCE D ATABASE
based feature extraction algorithm, by which the natural struc-
ture of the input samples is fully utilized.
In this paper, by considering the facial image as a two order
tensor, the low-dimensional tensor subspace of the original
input tensor samples was obtained; moreover, discriminative
locality alignment was captured to transform the refined ten-
sor samples to the ultimate vector feature representation for
subsequent facial recognition. In addition, numerous exper-
iments on the three facial image databases were conducted
TABLE V to demonstrate the effective performance of the proposed
T RAINING T IME algorithm.

of dimensions of the selected subspace is low and the training set is small, our method still presents robust performance because it simultaneously considers the natural structure of the input samples and the rank order information of intra-class samples.

2. Fig. 6 and Fig. 8 show the boxplots of the experimental results of the six vector-based algorithms and TRPDA on the UMIST and ORL datasets. Each boxplot contains a box and whiskers; the box has lines at the lower, median, and upper quartile values, and the whiskers extend from the ends of the box to the adjacent values in the data by default. According to these boxplots, TRPDA extracts features with the rank information and eliminates the most unstable ones; manifold learning

REFERENCES

[1] M. Belkin and P. Niyogi, "Laplacian eigenmaps and spectral techniques for embedding and clustering," in Proc. Adv. Neural Inf. Process. Syst., vol. 14, Dec. 2002, pp. 585–591.
[2] S. Biswas, G. Aggarwal, P. J. Flynn, and K. W. Bowyer, "Pose-robust recognition of low-resolution face images," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 12, pp. 3037–3049, Dec. 2013.
[3] D. Chen, X. Cao, F. Wen, and J. Sun, "Blessing of dimensionality: High-dimensional feature and its efficient compression for face verification," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2013, pp. 3025–3032.
[4] M. Devanne, H. Wannous, S. Berretti, P. Pala, M. Daoudi, and A. Del Bimbo, "3-D human action recognition by shape analysis of motion trajectories on Riemannian manifold," IEEE Trans. Cybern., vol. 45, no. 7, pp. 1340–1352, Jul. 2015.
[5] D. B. Graham and N. M. Allinson, "Characterizing virtual eigensignatures for general purpose face recognition," in Face Recognition: From Theory to Applications (NATO ASI Series F, Computer and Systems Sciences), vol. 163, H. Wechsler, P. J. Phillips, V. Bruce, F. Fogelman-Soulie, and T. S. Huang, Eds. Berlin, Germany: Springer, 1998, pp. 446–456.
[6] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, Nov. 2004.
[7] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification. Hoboken, NJ, USA: Wiley, 2012.
[8] R. A. Fisher, "The use of multiple measurements in taxonomic problems," Ann. Eugenics, vol. 7, no. 2, pp. 179–188, 1936.
[9] W. Gao et al., "The CAS-PEAL large-scale Chinese face database and baseline evaluations," IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 38, no. 1, pp. 149–161, Jan. 2008.
[10] X. Gao, X. Wang, D. Tao, and X. Li, "Supervised Gaussian process latent variable model for dimensionality reduction," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 41, no. 2, pp. 425–434, Apr. 2011.
[11] N. Guan, D. Tao, Z. Luo, and B. Yuan, "Non-negative patch alignment framework," IEEE Trans. Neural Netw., vol. 22, no. 8, pp. 1218–1230, Aug. 2011.
[12] X. He and P. Niyogi, "Locality preserving projections," in Proc. Adv. Neural Inf. Process. Syst., vol. 45, Dec. 2005, pp. 186–197.
[13] H. Hotelling, "Analysis of a complex of statistical variables into principal components," J. Edu. Psychol., vol. 24, no. 6, pp. 417–441, Sep. 1933.
[14] C. Hou, F. Nie, C. Zhang, D. Yi, and Y. Wu, "Multiple rank multi-linear SVM for matrix data classification," Pattern Recognit., vol. 47, no. 1, pp. 454–469, 2014.
[15] D. Huang, C. Shan, M. Ardabilian, Y. Wang, and L. Chen, "Local binary patterns and its application to facial image analysis: A survey," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 41, no. 6, pp. 765–781, Nov. 2011.
[16] Y. Jia, F. Nie, and C. Zhang, "Trace ratio problem revisited," IEEE Trans. Neural Netw., vol. 20, no. 4, pp. 729–735, Apr. 2009.
[17] L.-H. Lim, "Singular values and eigenvalues of tensors: A variational approach," in Proc. IEEE Int. Workshop Comput. Adv. Multi-Sensor Adapt. Process., Dec. 2005, pp. 129–132.
[18] L. De Lathauwer, "Signal processing based on multilinear algebra," Ph.D. dissertation, Dept. Elektrotechniek, Katholieke Universiteit Leuven, Leuven, Belgium, 1997.
[19] W.-Y. Liu, K. Yue, and M.-H. Gao, "Constructing probabilistic graphical model from predicate formulas for fusing logical and probabilistic knowledge," Inf. Sci., vol. 181, no. 18, pp. 3825–3845, May 2011.
[20] W. Liu, H. Zhang, D. Tao, Y. Wang, and K. Lu, "Large-scale paralleled sparse principal component analysis," Multimedia Tools Appl., vol. 75, no. 3, pp. 1481–1493, 2014, doi: 10.1007/s11042-014-2004-4.
[21] H. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, "MPCA: Multilinear principal component analysis of tensor objects," IEEE Trans. Neural Netw., vol. 19, no. 1, pp. 18–39, Jan. 2008.
[22] X. Lu, X. Zheng, and X. Li, "Latent semantic minimal hashing for image retrieval," IEEE Trans. Image Process., vol. 26, no. 1, pp. 355–368, Jan. 2017.
[23] X. Lu, Y. Yuan, and X. Zheng, "Joint dictionary learning for multispectral change detection," IEEE Trans. Cybern., vol. 47, no. 4, pp. 884–897, Apr. 2017.
[24] F. Nie, S. Xiang, Y. Song, and C. Zhang, "Extracting the optimal dimensionality for local tensor discriminant analysis," Pattern Recognit., vol. 42, no. 1, pp. 105–114, 2009.
[25] F. Nie, S. Xiang, Y. Song, and C. Zhang, "Orthogonal locality minimizing globality maximizing projections for feature extraction," Opt. Eng., vol. 48, no. 1, pp. 017202-1–017202-5, 2009.
[26] F. Nie, D. Xu, I. W. Tsang, and C. Zhang, "Flexible manifold embedding: A framework for semi-supervised and unsupervised dimension reduction," IEEE Trans. Image Process., vol. 19, no. 7, pp. 1921–1932, Jul. 2010.
[27] F. Nie, J. Yuan, and H. Huang, "Optimal mean robust principal component analysis," in Proc. Int. Conf. Mach. Learn., Jun. 2014, pp. 1062–1070.
[28] F. Nie, S. Xiang, and C. Zhang, "Neighborhood MinMax projections," in Proc. Int. Joint Conf. Artif. Intell., 2007, pp. 993–998.
[29] S. T. Roweis and L. K. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science, vol. 290, no. 5500, pp. 2323–2326, 2000.
[30] F. S. Samaria and A. C. Harter, "Parameterisation of a stochastic model for human face identification," in Proc. IEEE Workshop Appl. Comput. Vis., Dec. 1994, pp. 138–142.
[31] F. M. Sukno, J. L. Waddington, and P. F. Whelan, "3-D facial landmark localization with asymmetry patterns and shape regression from incomplete local features," IEEE Trans. Cybern., vol. 45, no. 9, pp. 1717–1730, Sep. 2015.
[32] D. Tao, X. Li, X. Wu, and S. J. Maybank, "Geometric mean for subspace selection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 2, pp. 260–274, Feb. 2009.
[33] D. Tao, X. Tang, X. Li, and X. Wu, "Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 7, pp. 1088–1099, Jul. 2006.
[34] D. Tao, X. Li, X. Wu, and S. J. Maybank, "General tensor discriminant analysis and Gabor features for gait recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 10, pp. 1700–1715, Oct. 2007.
[35] J. B. Tenenbaum, V. de Silva, and J. C. Langford, "A global geometric framework for nonlinear dimensionality reduction," Science, vol. 290, no. 5500, pp. 2319–2323, Dec. 2000.
[36] W. K. Wong, Z. Lai, Y. Xu, J. Wen, and C. P. Ho, "Joint tensor feature analysis for visual object recognition," IEEE Trans. Cybern., vol. 45, no. 11, pp. 2425–2436, Nov. 2015.
[37] S. Yan, D. Xu, B. Zhang, H.-J. Zhang, Q. Yang, and S. Lin, "Graph embedding and extensions: A general framework for dimensionality reduction," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 1, pp. 40–51, Jan. 2007.
[38] J. Yang, D. Zhang, A. F. Frangi, and J.-Y. Yang, "Two-dimensional PCA: A new approach to appearance-based face representation and recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 1, pp. 131–137, Jan. 2004.
[39] J. Ye, R. Janardan, and Q. Li, "Two-dimensional linear discriminant analysis," in Proc. Adv. Neural Inf. Process. Syst., 2004, pp. 1569–1576.
[40] J. Yu, D. Tao, J. Li, and J. Cheng, "Semantic preserving distance metric learning and applications," Inf. Sci., vol. 281, pp. 674–686, Oct. 2014.
[41] L. Zhang, L. Zhang, D. Tao, and X. Huang, "Tensor discriminative locality alignment for hyperspectral image spectral–spatial feature extraction," IEEE Trans. Geosci. Remote Sens., vol. 51, no. 1, pp. 242–256, Jan. 2013.
[42] T. Zhang, D. Tao, X. Li, and J. Yang, "Patch alignment for dimensionality reduction," IEEE Trans. Knowl. Data Eng., vol. 21, no. 9, pp. 1299–1313, Sep. 2009.

Dapeng Tao received the B.E. degree from Northwestern Polytechnical University and the Ph.D. degree from the South China University of Technology. He is currently a Professor with the School of Information Science and Engineering, Yunnan University, Kunming, China. He has authored and co-authored over 50 scientific articles. His research interests include machine learning, computer vision, and robotics. He has served for over 10 international journals, including the IEEE TNNLS, the IEEE TCYB, the IEEE TMM, the IEEE CSVT, the IEEE TBME, and Information Sciences.

Yanan Guo received the B.Eng. degree from Hubei Polytechnic University. She is currently pursuing the M.Sc. degree with Yunnan University, Kunming, China. Her research interests include machine learning and computer vision.
Yaotang Li is currently a Professor with the School of Mathematics and Statistics, Yunnan University, China. He has authored or co-authored over 80 research papers. His main research interests include numerical algebra and special matrices.

Xinbo Gao (M'02–SM'07) received the B.E., M.S., and Ph.D. degrees from Xidian University, Xi'an, China, in 1994, 1997, and 1999, respectively, all in signal and information processing. He was a Research Fellow with the Department of Computer Science, Shizuoka University, Shizuoka, Japan, from 1997 to 1998. From 2000 to 2001, he was a Post-Doctoral Research Fellow with the Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong. Since 2001, he has been with the School of Electronic Engineering, Xidian University, where he is currently a Cheung Kong Professor with the Ministry of Education, a Professor of pattern recognition and intelligent systems, and the Director of the State Key Laboratory of Integrated Services Networks. He has authored five books and around 200 technical articles in refereed journals and proceedings. His current research interests include multimedia analysis, computer vision, pattern recognition, machine learning, and wireless communications. He is a Fellow of the Institution of Engineering and Technology. He is on the editorial boards of several journals, including Signal Processing (Elsevier) and Neurocomputing (Elsevier). He served as the General Chair/Co-Chair, Program Committee (PC) Chair/Co-Chair, or a PC Member for around 30 major international conferences.