
NULL SPACE-BASED LDA WITH WEIGHTED DUAL PERSONAL SUBSPACES FOR FACE RECOGNITION

Xipeng Qiu and Lide Wu
Media Computing & Web Intelligence Lab, Department of Computer Science and Engineering, Fudan University, Shanghai, China
xpqiu,ldwu@fudan.edu.cn
ABSTRACT

Linear discriminant analysis (LDA) is a popular feature extraction technique for face recognition. However, it often suffers from the small sample size problem when dealing with high dimensional face data. Moreover, the within-class and between-class scatter matrices used in LDA are less effective for face data with non-Gaussian densities. In this paper, we propose a new method for face recognition: we first calculate weighted dual personal subspaces to replace the within-class and between-class scatter matrices, and then perform null space-based LDA. The experiments show that our method outperforms existing LDA variants and other state-of-the-art face recognition approaches.

1. INTRODUCTION

Among the various approaches to face recognition, the most successful seem to be the appearance-based approaches, which deal directly with facial images as two-dimensional holistic patterns [1]. However, appearance-based approaches often suffer from the curse of dimensionality, and many dimensionality reduction techniques have been proposed.

Linear discriminant analysis (LDA) [2] is one of the most popular linear dimensionality reduction methods. A drawback of LDA is that it often suffers from the small sample size (SSS) problem when dealing with high dimensional image data: when there are not enough training samples, the within-class scatter matrix $S_w$ may become singular. The traditional solution to the SSS problem incorporates a PCA step into the LDA framework. In this approach, PCA is used as a preprocessing step for dimensionality reduction, discarding the null space of the within-class scatter matrix of the training data set; LDA is then performed in the lower dimensional PCA subspace [3]. However, it has been shown that the discarded null space contains the most discriminant information [4, 5]. An effective method, LDA in the null space of $S_w$ (NLDA), was proposed in [4]. In the NLDA framework, the null space of $S_w$ is kept while the null space of $S_b$ is removed, under the assumption that the null space of $S_b$ contains no discriminative information. However, the class separability represented by $S_b$ is poor when the class conditional densities are non-Gaussian [2], as is typically the case for face data.

In this paper, we replace $S_w$ and $S_b$ with weighted dual personal subspaces (intra-personal and extra-personal subspaces), which are nonparametric within-class and between-class scatter matrices [2] and can also be regarded as a weighted version of Moghaddam's dual personal subspaces [6].
Thanks to NSF of China (69935010) for funding.

We give a fast algorithm to calculate them, and then perform null space-based LDA with the weighted dual personal subspaces. Our method does not assume that the density of the face data belongs to any particular parametric family.

The rest of the paper is organized as follows. Section 2 reviews LDA and null space-based LDA. Section 3 describes the weighted dual personal subspaces and gives a fast computational approach for them. Experimental evaluations of our method, existing LDA variants and other state-of-the-art face recognition approaches are presented in Section 4. Finally, we give the conclusions in Section 5.

2. LINEAR DISCRIMINANT ANALYSIS

2.1. LDA

LDA tries to find a set of projection vectors $W \in \mathbb{R}^{D \times d}$ maximizing the ratio of the determinant of the between-class scatter $S_b$ to the within-class scatter $S_w$,

\[
W^{*} = \arg\max_{W} \frac{|W^T S_b W|}{|W^T S_w W|}, \tag{1}
\]

where $D$ and $d$ are the dimensionalities of the data before and after the transformation, respectively. The between-class scatter matrix $S_b$ and the within-class scatter matrix $S_w$ are defined as

\[
S_b = \sum_{i=1}^{c} p_i (m_i - m)(m_i - m)^T, \tag{2}
\]
\[
S_w = \sum_{i=1}^{c} p_i S_i, \tag{3}
\]

where $c$ is the number of classes; $m_i$ and $p_i$ are the mean vector and a priori probability of class $i$, respectively; $m = \sum_{i=1}^{c} p_i m_i$ is the total mean vector; and $S_i$ is the covariance matrix of class $i$. From Eq. (1), the transformation matrix $W$ must be constituted by the $d$ eigenvectors of $S_w^{-1} S_b$ corresponding to its $d$ largest eigenvalues [2].

However, when the small sample size problem occurs, $S_w$ becomes singular and $S_w^{-1}$ does not exist. To avoid the singularity of $S_w$, a two-stage PCA+LDA approach is used [3]: PCA first projects the high dimensional face data into a low dimensional feature space, and LDA is then performed in the reduced PCA subspace, in which $S_w$ is non-singular.
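For concreteness, the following is a minimal NumPy sketch of the classical LDA of Eqs. (1)-(3); it is our illustration rather than code from the paper (the function name lda_fit is ours), and it assumes $S_w$ is non-singular, which is exactly what fails in the SSS setting:

```python
import numpy as np

def lda_fit(X, labels, d):
    """Classical LDA: the d eigenvectors of inv(S_w) S_b with the
    largest eigenvalues (Eqs. 1-3). Assumes S_w is non-singular."""
    classes, counts = np.unique(labels, return_counts=True)
    p = counts / len(labels)                  # empirical priors p_i
    m = X.mean(axis=0)                        # total mean vector m
    D = X.shape[1]
    Sb, Sw = np.zeros((D, D)), np.zeros((D, D))
    for c, pc in zip(classes, p):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sb += pc * np.outer(mc - m, mc - m)   # Eq. (2)
        Sw += pc * np.cov(Xc, rowvar=False, bias=True)  # Eq. (3)
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1]
    return evecs[:, order[:d]].real           # projection matrix W
```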

2.2. LDA in the Null Space of $S_w$ (NLDA)

The traditional LDA approaches described above are all performed in the principal subspace of $S_w$, in which $W^T S_w W \neq 0$. However, the null space of $S_w$, in which $W^T S_w W = 0$, also contains much discriminative information, since it is possible to find projection vectors $W$ satisfying $W^T S_w W = 0$ and $W^T S_b W > 0$, for which the Fisher criterion in Eq. (1) reaches its maximum value.

An LDA in the null space of $S_w$ was proposed by Chen et al. [4]. First, a basis $Q$ of the null space of $S_w$ is computed,

\[
Q^T S_w Q = 0 \quad (Q^T Q = I). \tag{4}
\]

Then $S_b$ is projected into the null space of $S_w$,

\[
\tilde{S}_b = Q^T S_b Q \neq 0. \tag{5}
\]

Next, the eigenvectors $U$ of $\tilde{S}_b$ with the largest eigenvalues $\Lambda$ are chosen,

\[
U^T \tilde{S}_b U = \Lambda. \tag{6}
\]

The LDA transformation matrix is defined as $W = QU$.
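The null space procedure of Eqs. (4)-(6) can be sketched as follows; this is our illustration, with scipy.linalg.null_space standing in for whatever decomposition [4] uses, and for pixel-level face data, where $S_w$ is very large, one would first reduce to the span of the training samples rather than form $S_w$ explicitly:

```python
import numpy as np
from scipy.linalg import null_space, eigh

def nlda_projection(Sw, Sb, d):
    """Null space-based LDA, Eqs. (4)-(6): project Sb into the
    null space of Sw and keep the d leading eigenvectors there."""
    Q = null_space(Sw)              # orthonormal basis: Sw Q = 0, Q^T Q = I
    Sb_tilde = Q.T @ Sb @ Q         # Eq. (5): Sb inside the null space
    evals, evecs = eigh(Sb_tilde)   # eigenvalues in ascending order
    U = evecs[:, -d:][:, ::-1]      # d largest eigenvalues, Eq. (6)
    return Q @ U                    # transformation matrix W = QU
```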

3. NULL SPACE-BASED LDA WITH WEIGHTED DUAL PERSONAL SUBSPACES

In fact, the class conditional densities of face data are non-Gaussian, so the class separability represented by $S_b$ is poor. In particular, when all the classes share the same mean, $S_b$ fails to find any discriminant direction because there is no scatter of the class means [2]. To improve the performance of LDA, we use a nonparametric within-class scatter matrix $\Sigma_I$ and between-class scatter matrix $\Sigma_E$. Moreover, if classification is the ultimate goal, the samples near the class boundaries are more important and should have larger weights. We define $\Sigma_I$ and $\Sigma_E$ as follows:

\[
\Sigma_I = \sum_{l(x_i) = l(x_j)} w_i w_j (x_i - x_j)(x_i - x_j)^T, \tag{7}
\]
\[
\Sigma_E = \sum_{l(x_i) \neq l(x_j)} w_i w_j (x_i - x_j)(x_i - x_j)^T, \tag{8}
\]

where $w_i$ is the weight of sample $i$ with $\sum_i w_i = 1$, and $l(x)$ is the class label of sample $x$.

Given a sample $x_i$, we calculate the distance $D_i^I$ from the sample to its nearest neighbor within its own class and the distance $D_i^E$ from the sample to its nearest neighbor outside its class. Its weight is then

\[
w_i = \frac{(D_i^I)^{\alpha}}{(D_i^I)^{\alpha} + (D_i^E)^{\alpha}}, \tag{9}
\]

where $\alpha$ is a control parameter between zero and infinity. This sample weight is introduced to deemphasize samples away from the class boundaries. In this paper, we set $\alpha = 2$.
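A small sketch of this weighting scheme, assuming the reading of Eq. (9) reconstructed above and NumPy-array inputs (the helper name boundary_weights is ours):

```python
import numpy as np
from scipy.spatial.distance import cdist

def boundary_weights(X, labels, alpha=2.0):
    """Weights of Eq. (9): w_i = (D_i^I)^a / ((D_i^I)^a + (D_i^E)^a),
    then normalized so that sum_i w_i = 1."""
    D = cdist(X, X)                        # pairwise Euclidean distances
    np.fill_diagonal(D, np.inf)            # a sample is not its own neighbor
    same = labels[:, None] == labels[None, :]
    d_intra = np.where(same, D, np.inf).min(axis=1)  # D_i^I
    d_extra = np.where(same, np.inf, D).min(axis=1)  # D_i^E
    w = d_intra**alpha / (d_intra**alpha + d_extra**alpha)
    return w / w.sum()                     # enforce the normalization
```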

In the nonparametric form, the separability of the classes is spanned not only by the class means but also by the differences between all pairs of samples belonging to different classes. Moghaddam et al. [6] define two classes of facial image variations, intra-personal variations $\Omega_I$ and extra-personal variations $\Omega_E$, and project the face variations into the dual subspaces, the intra-personal and extra-personal subspaces. The nonparametric $\Sigma_I$ and $\Sigma_E$ in our definition thus correspond to the covariance matrices of the weighted intra-personal and extra-personal subspaces.

Our aim is to find the projection vectors that minimize the intra-personal variations and maximize the extra-personal variations. Obviously, the optimal projection vectors $W$ must satisfy $W^T \Sigma_I W = 0$ and $W^T \Sigma_E W > 0$. We can calculate $W$ using the method proposed in NLDA [4].

3.1. Fast algorithm to calculate $\Sigma_I$ and $\Sigma_E$

From Eq. (8), it is expensive to calculate $\Sigma_E$ when the number of samples is large, since the differences between all pairs of samples must be computed, giving a time complexity of $O(N^2)$. Moghaddam [6] calculates the extra-personal variations using only a random subset, which loses much of the information about the extra-personal variations. In this paper, we calculate $\Sigma_E$ with an efficient method whose time complexity is $O(N)$.

Theorem 1. Given a set of samples $x_i$ with corresponding weights $w_i$,

\[
\sum_{i,j} w_i w_j (x_i - x_j)(x_i - x_j)^T = 2 \sum_i w_i (x_i - \mu)(x_i - \mu)^T, \tag{10}
\]

where $\mu = \sum_i w_i x_i$ is the mean of the samples and $\sum_i w_i = 1$.

Proof:

\[
\begin{aligned}
\sum_i w_i (x_i - \mu)(x_i - \mu)^T
&= \sum_i w_i \Big(x_i - \sum_j w_j x_j\Big)\Big(x_i - \sum_k w_k x_k\Big)^T \\
&= \sum_i w_i \Big[\sum_j w_j (x_i - x_j)\Big]\Big[\sum_k w_k (x_i - x_k)\Big]^T \\
&= \sum_{i,j,k} w_i w_j w_k (x_i - x_j)(x_i - x_k)^T.
\end{aligned} \tag{11}
\]

Rewriting Eq. (11),

\[
\begin{aligned}
\sum_{i,j,k} w_i w_j w_k (x_i - x_j)(x_i - x_k)^T
&= \sum_{i,j,k} w_i w_j w_k (x_i - x_j)(x_i - x_j + x_j - x_k)^T \\
&= \sum_{i,j} w_i w_j (x_i - x_j)(x_i - x_j)^T + \sum_{i,j,k} w_i w_j w_k (x_i - x_j)(x_j - x_k)^T \\
&= \sum_{i,j} w_i w_j (x_i - x_j)(x_i - x_j)^T - \sum_{i,j,k} w_i w_j w_k (x_j - x_i)(x_j - x_k)^T.
\end{aligned} \tag{12}
\]

Exchanging the subscripts $i$ and $j$, we see that

\[
\sum_{i,j,k} w_i w_j w_k (x_i - x_j)(x_i - x_k)^T = \sum_{i,j,k} w_i w_j w_k (x_j - x_i)(x_j - x_k)^T. \tag{13}
\]

From Eqs. (11), (12) and (13),

\[
\sum_i w_i (x_i - \mu)(x_i - \mu)^T = \frac{1}{2} \sum_{i,j} w_i w_j (x_i - x_j)(x_i - x_j)^T. \tag{14}
\]

End of proof.

Fig. 1. Face examples from the ATT database.
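As a numerical sanity check on Theorem 1, the following sketch (ours, not from the paper) computes both sides of Eq. (10) and confirms they agree; the fast form is what reduces the cost from $O(N^2)$ pairwise differences to $O(N)$ matrix products:

```python
import numpy as np

def pairwise_scatter_naive(X, w):
    """Left-hand side of Eq. (10): O(N^2) sum over all weighted pairs."""
    diffs = X[:, None, :] - X[None, :, :]     # (N, N, D) differences x_i - x_j
    ww = w[:, None] * w[None, :]              # weights w_i * w_j
    return np.einsum('ij,ijk,ijl->kl', ww, diffs, diffs)

def pairwise_scatter_fast(X, w):
    """Right-hand side of Eq. (10): O(N) weighted covariance form."""
    mu = w @ X                                # weighted mean (sum of w_i is 1)
    Xc = X - mu
    return 2.0 * (Xc * w[:, None]).T @ Xc

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
w = rng.random(50)
w /= w.sum()                                  # Theorem 1 assumes sum(w_i) = 1
assert np.allclose(pairwise_scatter_naive(X, w), pairwise_scatter_fast(X, w))
```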

From Theorem 1, we can calculate $\Sigma_I$ and $\Sigma_E$ by the following formulas:

\[
\Sigma_I = \sum_{c=1}^{C} \sum_{x_i, x_j \in c} w_i w_j (x_i - x_j)(x_i - x_j)^T = \sum_{c=1}^{C} 2 P_c \sum_{x_i \in c} w_i (x_i - \mu_c)(x_i - \mu_c)^T, \tag{15}
\]
\[
\Sigma_T = \sum_{i,j} w_i w_j (x_i - x_j)(x_i - x_j)^T = 2 \sum_i w_i (x_i - \mu)(x_i - \mu)^T, \tag{16}
\]
\[
\Sigma_E = \Sigma_T - \Sigma_I, \tag{17}
\]

where $P_c = \sum_{x_i \in c} w_i$ is the weight of class $c$ and $\mu_c = \frac{1}{P_c} \sum_{x_i \in c} w_i x_i$ is the mean vector of class $c$.
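Combining Eqs. (15)-(17), $\Sigma_I$ and $\Sigma_E$ can be assembled from per-class weighted covariances; the sketch below is our illustration (names ours), with the weights w expected to sum to one:

```python
import numpy as np

def dual_scatter_matrices(X, labels, w):
    """Sigma_I via Eq. (15) and Sigma_E via Eqs. (16)-(17), using
    per-class weighted covariances instead of O(N^2) pairwise sums."""
    D = X.shape[1]
    Sigma_I = np.zeros((D, D))
    for c in np.unique(labels):
        Xc, wc = X[labels == c], w[labels == c]
        Pc = wc.sum()                      # class weight P_c
        mu_c = (wc @ Xc) / Pc              # weighted class mean
        Xd = Xc - mu_c
        Sigma_I += 2.0 * Pc * (Xd * wc[:, None]).T @ Xd   # Eq. (15)
    mu = w @ X                             # total weighted mean
    Xd = X - mu
    Sigma_T = 2.0 * (Xd * w[:, None]).T @ Xd              # Eq. (16)
    return Sigma_I, Sigma_T - Sigma_I      # Eq. (17): Sigma_E = Sigma_T - Sigma_I
```

The final transformation $W$ is then obtained by running the null space procedure of Section 2.2 with $\Sigma_I$ in place of $S_w$ and $\Sigma_E$ in place of $S_b$.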

4. EXPERIMENTS

In this section, we apply our method to face recognition and compare it with existing LDA variants and other state-of-the-art face recognition approaches: PCA [7], PCA+LDA [3], NLDA [4], and the Bayesian approach [6]. All experiments are repeated 5 times independently and the average results are reported.

4.1. Datasets

To evaluate the robustness of our method, we perform experiments on three datasets drawn from the popular ATT face database [8] and the FERET face database [9]. The three datasets are described below.

ATT Dataset. This dataset is the ATT face database (formerly The ORL Database of Faces), which contains 400 images (112 x 92) of 40 persons, 10 images per person. Each image is linearly stretched to the full range of pixel values [0, 255]. The set of 10 images for each person is randomly partitioned into a training subset of 5 images and a test set of the other 5. The training set is used to learn the basis components, and the test set for evaluation. Fig. 1 shows some face examples from this database.

FERET Dataset 1. This dataset is a subset of the FERET database with 194 subjects. Each subject has 3 images: (a) one taken under a controlled lighting condition with a neutral expression; (b) one taken under the same lighting condition but with a different facial expression (mostly smiling); and (c) one taken under a different lighting condition and mostly with a neutral expression. All images are pre-processed using a zero-mean unit-variance operation and manually registered using the eye positions. All the images are normalized by the eye locations and cropped to the size of 75 x 65. A mask template is used to remove the background and the hair. Histogram equalization is applied to the face images for photometric normalization. Two images of each person are randomly selected for training and the remaining one is used for testing.

FERET Dataset 2. This dataset is a different subset of the FERET database. All 1195 people from the FERET Fa/Fb data set are used in the experiment. There are two face images for each person. This dataset has no overlap between the training set and the gallery/probe set according to the FERET protocol [9]. 500 people are randomly selected for training, and the remaining 695 people are used for testing. For each test person, one face image is in the gallery and the other is used as the probe. All images are pre-processed using the same method as in FERET Dataset 1.

4.2. Experimental Results

Fig. 2 shows the rank-1 recognition rates for different numbers of features on the three datasets. Our method outperforms the other methods. Fig. 3 shows the cumulative recognition rates on the three datasets; again, our method is better than the other methods. When a dataset contains changes in lighting conditions (such as FERET Dataset 1), our method performs noticeably better than the others.

Unlike the ATT dataset and FERET Dataset 1, where the class labels involved in training and testing are the same, FERET Dataset 2 has no overlap between the training set and the gallery/probe set according to the FERET protocol [9]. Each method therefore needs the ability to generalize from the known subjects in the training set to the unknown subjects in the gallery/probe set, so the result on FERET Dataset 2 is more convincing for evaluating the robustness of each method. Our method also gives the best performance on FERET Dataset 2.

A major characteristic displayed by the experimental results is that our method attains stable and high recognition rates on all three datasets, while the other methods perform unstably.

Fig. 2. Rank-1 recognition rates with different numbers of features on the three datasets (Left: ATT dataset; Middle: FERET Dataset 1; Right: FERET Dataset 2). Each panel plots recognition rate against the number of features for PCA, PCA+LDA, NLDA, Bayes ML, and our method.

Fig. 3. Cumulative recognition rates on the three datasets (Left: ATT dataset, 39 features; Middle: FERET Dataset 1, 60 features; Right: FERET Dataset 2, 60 features). Each panel plots recognition rate against rank (1-10) for PCA, PCA+LDA, NLDA, Bayes ML, and our method.

5. CONCLUSION

In this paper, we proposed null space-based LDA with weighted dual personal subspaces (the intra-personal and extra-personal subspaces), which serve as nonparametric within-class and between-class scatter matrices, for face recognition. Our method does not assume that the density of the face data belongs to any particular parametric family. We also gave a fast computational approach to calculate the weighted dual personal subspaces. Our experimental results on three datasets from the ATT and FERET face databases demonstrate that our method outperforms existing LDA variants and other state-of-the-art face recognition approaches. In future work, we will extend our method to non-linear discriminant analysis with the kernel method.

6. REFERENCES

[1] W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld, "Face recognition: A literature survey," ACM Computing Surveys, vol. 35, no. 4, pp. 399-458, 2003.
[2] K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, Boston, 2nd edition, 1990.
[3] P. N. Belhumeur, J. Hespanha, and D. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711-720, 1997.
[4] L. Chen, H. Liao, M. Ko, J. Lin, and G. Yu, "A new LDA-based face recognition system which can solve the small sample size problem," Pattern Recognition, vol. 33, no. 10, pp. 1713-1726, 2000.
[5] H. Yu and J. Yang, "A direct LDA algorithm for high-dimensional data with application to face recognition," Pattern Recognition, vol. 34, pp. 2067-2070, 2001.
[6] B. Moghaddam, T. Jebara, and A. Pentland, "Bayesian face recognition," Pattern Recognition, vol. 33, pp. 1771-1782, 2000.
[7] M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.
[8] F. Samaria and A. Harter, "Parameterisation of a stochastic model for human face identification," in Proc. of 2nd IEEE Workshop on Applications of Computer Vision, 1994.
[9] P. J. Phillips, H. Wechsler, J. Huang, and P. Rauss, "The FERET database and evaluation procedure for face-recognition algorithms," Image and Vision Computing, vol. 16, no. 5, pp. 295-306, 1998.
