
A COMPARISON OF DISCRETE AND CONTINUOUS OUTPUT MODELING TECHNIQUES FOR A PSEUDO-2D HIDDEN MARKOV MODEL FACE RECOGNITION SYSTEM

Frank Wallhoff, Stefan Eickeler, Gerhard Rigoll
Department of Computer Science, Faculty of Electrical Engineering
Gerhard-Mercator-University Duisburg, Germany
{wallhoff,eickeler,rigoll}@fb9-ti.uni-duisburg.de
ABSTRACT

Face recognition has become an important topic within the field of pattern recognition and computer vision. In this field a number of different approaches to feature extraction, modeling and classification techniques have been tested. However, many questions concerning the optimal modeling techniques for high-performance face recognition are still open. The face recognition system developed by our research group uses a Discrete Cosine Transform (DCT) combined with Pseudo-2D Hidden Markov Models (P2DHMM). In the past our system used continuous probability density functions and was tested on a smaller database. This paper addresses the question whether there is a major difference in recognition performance between discrete and continuous production probabilities. To this end the system is tested on a larger subset of the FERET database. We show that we are able to achieve higher recognition scores and an improvement in computation speed by using discrete modeling techniques.

1. INTRODUCTION

Face recognition is an important topic for security-relevant systems. A number of systems with different approaches to preprocessing and classification have been presented in the literature [1, 2]. Our research group has developed a baseline face recognition system which extracts features with a DCT and models the face image with a P2DHMM. This recognition system makes use of continuous output probabilities, and its performance was reported as perfect (100%) on a rather small database (the ORL database by Olivetti) and near perfect on the so-called Bochum database [3, 4, 5]. In this paper we use a considerably larger test set: the images are a subset of the well-known FERET database provided by the Army Research Laboratory (ARL). Typically we use continuous HMMs with one Gaussian per state, due to the large number of states and
(Stefan Eickeler is now with Cobion AG, Kassel, Germany.)

the limited training data for each face. However, it has turned out that discrete modeling techniques in conjunction with Hidden Markov Models (HMMs) are efficient for large vocabulary speech recognition (LVCSR) and even in the field of handwriting applications. Some examples of successful work with discrete modeling techniques are given in the literature [6, 7, 8]. It therefore seems promising to test this technology for our face recognizer as well. The advantages of discrete systems are the higher computation speed and the smaller model size compared to continuous HMMs. The goal of the present work is to examine whether there is a major difference between continuous and discrete output probability modeling regarding the recognition performance of our face recognizer. In the next section the subset of the database used and the preprocessing steps are briefly introduced. Then the P2DHMM system with continuous and discrete modeling techniques is presented. The paper closes with the presentation of the achieved recognition rates and a conclusion.

2. PREPROCESSING AND FEATURE EXTRACTION

For the task of testing and comparing different face recognition systems, the ARL has decided to provide the FERET database [9]. This database consists of sets of face images of several hundred people. To test our system we decided to use two galleries of images: the first gallery is the training set and the second one is the test set. From among all the pictures of a single person we chose to train the models with images of frontal views labeled "fa". The test set consists of the corresponding images of the individuals with a slightly different facial expression, taken on the same day and labeled "fb". The training and test set each contain images of 321 people.

To prepare the images for modeling with the Pseudo-2D Hidden Markov Models (P2DHMM) we used a semi-automated feature extraction that starts with the manual labeling of the eye- and mouth-center coordinates.
The next step is the automatic rotation of the original images so that a line through the eyes is horizontal; see (a) and (b) in figure 1. After this, the resulting image is cropped, leaving a margin of half the eye-to-eye distance between each eye and the image border in the horizontal direction. The height is given by the difference between the eye and mouth coordinates plus one and a half times the eye-to-eye distance in the vertical direction (c). The final step of the preprocessing is the normalization of the size: the images are resized to the smallest image among the resulting images.
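The geometry of the alignment and cropping steps above can be sketched as follows. This is a minimal illustration only; the function name, the (x, y) coordinate convention with y growing downward, and the toy coordinates are our own assumptions, not part of the original system.

```python
import math

def crop_box(left_eye, right_eye, mouth):
    """Rotation angle and crop size as described in section 2.

    Coordinates are (x, y) tuples in an image frame with y growing downward.
    Returns (angle_deg, width, height): the rotation that makes the eye line
    horizontal, and the size of the cropped region.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    # Rotate by the negative of the eye-line angle so the eyes become horizontal.
    angle_deg = -math.degrees(math.atan2(dy, dx))
    d = math.hypot(dx, dy)                  # eye-to-eye distance
    width = 2.0 * d                         # half of d as margin on each side
    eye_y = (left_eye[1] + right_eye[1]) / 2.0
    height = (mouth[1] - eye_y) + 1.5 * d   # eye-to-mouth distance plus 1.5 * d
    return angle_deg, width, height

angle, w, h = crop_box((100, 100), (180, 100), (140, 160))
# eye distance d = 80, so the crop is 160 wide and 60 + 120 = 180 high
```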

After the preprocessing, all images in the training and test set are aligned in the same way and consist of 64 x 96 pixels (figure 1).

Fig. 1. Preprocessing of the face images: (a) original image, (b) rotated image, (c) cropped image, (d) size-normalized image.

The subsequent feature extraction consists of a rectangular-windowed two-dimensional Discrete Cosine Transform (DCT). The blocksize (BS) of the windows used is 8 or 16 pixels (see section 4). The sampling window is moved in the vertical direction first, then in the horizontal direction. With each displacement the window is moved not by a full window size but by one fourth of the window size, so that an overlap of 75% with the previous window arises. To preserve the two-dimensional structure of the images, a special marker is inserted at the beginning of each row. The DCT coefficients of one window of the image I are computed as

    C(u,v) = (2/BS) c_u c_v \sum_{x=0}^{BS-1} \sum_{y=0}^{BS-1} I(x,y) \cos\frac{(2x+1) u \pi}{2 BS} \cos\frac{(2y+1) v \pi}{2 BS}    (1)

The coefficients C(u,v) of the transform above can be considered as arranged in a matrix indexed by u and v. A triangular-shaped mask extracts the first ten coefficients of this matrix, i.e. those where the condition u + v <= 3 is satisfied. The factors c_u (and analogously c_v) in equation 1 can take the following values:

    c_u = 1/\sqrt{2}  for u = 0,    c_u = 1  otherwise    (2)

After the transformation, the parameters of the statistical P2DHMM have to be trained with sequences of the computed coefficients. To derive the model parameters more robustly from just one single original image, the coefficients of the flipped image and of the images displaced by 2 pixels horizontally and vertically are additionally computed.

3. FACE RECOGNITION USING P2DHMM

For the classification process the following maximum-likelihood decision is used:

    \lambda^* = \operatorname{argmax}_{\lambda} P(O \mid \lambda)    (3)

In this formula, O is the feature vector sequence of an unknown image and \lambda represents one P2DHMM of an individual contained in the database. The system recognizes the image as belonging to the individual whose corresponding model \lambda yields the highest production probability P(O \mid \lambda). To solve this equation, the values of P(O \mid \lambda) for all models have to be computed. The probability of O given a certain model \lambda can be computed by the following sum over all possible state sequences Q [10]:

    P(O \mid \lambda) = \sum_{\text{all } Q} \pi_{q_1} b_{q_1}(o_1) \prod_{t=2}^{T} a_{q_{t-1} q_t} b_{q_t}(o_t)    (4)

The computation of this equation can be further simplified using the forward algorithm [11]. The possible state sequences of a P2DHMM are given by the state-transition matrices a_{ij}. A typical state-transition structure of a P2DHMM is shown in the following figure. The states marked with a cross represent the so-called marker states that are necessary to preserve the two-dimensional structure of the images.

Fig. 2. State transitions of a P2DHMM (crosses mark the marker states).

In the training phase of the recognition system, all the parameters \pi_i, b_i and a_{ij} for each of the 321 models have to be computed. This is done using the feature sequences from the training images and several well-known techniques presented in [10] and [11]. The difference between discrete and continuous HMMs is given by the use of different output production probabilities b_i(o), which are explained in the next sections.
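The maximum-likelihood decision of equation 3, evaluated with the forward algorithm, can be sketched for an ordinary discrete 1D HMM as follows. This is a simplified illustration, not the full P2DHMM with marker states, and all model values are toy numbers of our own choosing.

```python
import numpy as np

def forward_likelihood(pi, A, B, obs):
    """P(O | lambda) for a discrete HMM via the forward algorithm.

    pi:  initial state probabilities, shape (N,)
    A:   state-transition matrix, shape (N, N)
    B:   output probabilities per state and symbol, shape (N, M)
    obs: observation sequence as symbol indices
    """
    alpha = pi * B[:, obs[0]]          # initialization
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # induction step
    return alpha.sum()                 # termination

# Two toy "models" standing in for two individuals' P2DHMMs.
pi = np.array([1.0, 0.0])
A = np.array([[0.5, 0.5],
              [0.0, 1.0]])
B1 = np.array([[0.9, 0.1],
               [0.2, 0.8]])
B2 = np.array([[0.1, 0.9],
               [0.8, 0.2]])

obs = [0, 1]
scores = [forward_likelihood(pi, A, B, obs) for B in (B1, B2)]
best = int(np.argmax(scores))          # equation 3: pick the best-scoring model
```

The argmax over model likelihoods is exactly the decision rule of equation 3, applied here to two models instead of 321.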

3.1. Continuous Output Modeling

A continuous modeling technique consists of the use of multivariate weighted Gaussian mixtures. The probability of a t-dimensional observation o in a certain state S_j is given by a Gaussian density with the corresponding mean vector \mu and covariance matrix \Sigma:

    N(o; \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^t |\Sigma|}} \exp\left( -\frac{1}{2} (o - \mu)^T \Sigma^{-1} (o - \mu) \right)    (5)

The complete output probability can be approximated by a weighted sum (with mixture weights c_{jk}) of possibly several Gaussians:

    b_j(o) = \sum_k c_{jk} N(o; \mu_{jk}, \Sigma_{jk})    (6)

The number of mixtures used depends on the amount of training material available.
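Equations 5 and 6 can be written out directly. The snippet below is a small numerical sketch; the dimension, weights and parameters are arbitrary toy values of our own choosing.

```python
import numpy as np

def gaussian_density(o, mu, cov):
    """Multivariate Gaussian density N(o; mu, cov) of equation 5."""
    t = len(mu)
    diff = o - mu
    norm = np.sqrt((2 * np.pi) ** t * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff) / norm

def mixture_output(o, weights, mus, covs):
    """Weighted Gaussian mixture output probability b_j(o) of equation 6."""
    return sum(c * gaussian_density(o, mu, cov)
               for c, mu, cov in zip(weights, mus, covs))

# Single-mixture, 2-dimensional example: at o = mu with identity covariance,
# the density is 1 / (2 * pi).
o = np.array([0.0, 0.0])
p = mixture_output(o, [1.0], [np.array([0.0, 0.0])], [np.eye(2)])
```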

3.2. Discrete Output Modeling

In contrast to the continuous modeling techniques, the discrete output consists not of Gaussians but of a set of discrete probabilities p_j(k) that are tied to a certain state S_j and a corresponding codebook entry (CB) v_k. The codebook is typically obtained by k-means clustering of all available training data feature vectors, but can also be obtained by more sophisticated procedures (see [8]). The probability of seeing the input vector o in state S_j then becomes:

    b_j(o) = p_j(k)  for o quantized to codebook entry v_k    (7)

The number of different probabilities p_j(k) is called the codebook size. Figure 3 depicts the difference between discrete and continuous output probability functions.

Fig. 3. Continuous and discrete output probabilities.

4. EXPERIMENTS AND RESULTS

Recognition experiments were performed for several parameter sets. The size of the sliding window for the feature extraction (the blocksize parameter of the DCT) was chosen as 8 and 16 pixels (BS). The number of states of the P2DHMMs used (excluding marker states) was varied over four settings. The measured recognition scores of the systems tested are listed in the following tables, divided by continuous and discrete output probabilities. The cumulative match scores on rank 3 and rank 10 are also listed.

4.1. Results Using Continuous Outputs

BS=8 P2DHMM       Rank 1   Rank 3   Rank 10
(smallest model)   94.70    96.88    98.13
                   95.33    97.20    97.81
                   95.02    97.51    97.81
(largest model)    95.95    96.88    96.88

Table 1. % correct for blocksize 8

BS=16 P2DHMM      Rank 1   Rank 3   Rank 10
(smallest model)   94.70    96.88    98.75
                   95.33    96.57    98.44
                   94.70    96.26    98.13
(largest model)    94.70    96.88    98.13

Table 2. % correct for blocksize 16

A comparison of the performance of the systems tested reveals that the system using the smaller blocksize achieves slightly higher recognition scores. The best performance was 95.95%. In the following experiments the influence of the number of mixtures used is examined with P2DHMMs.

Mixtures   BS=8    BS=16
1          95.33   95.33
2          95.02   93.77
3          95.02   92.11

Table 3. % correct for several mixtures

Although it was expected that the performance would improve, the score decreased. The reason for this might be the fact that, in contrast to the experiments presented in [4], there was just one original training example per individual.

4.2. Results Using Discrete Outputs

In the following tables the results of systems with codebook sizes of 300 and 1000 are examined. Once again P2DHMMs with four different numbers of states and sliding windows with a size of 8 and 16 pixels are tested.

BS=8 P2DHMM       CB=300: Rank 1/3/10      CB=1000: Rank 1/3/10
(smallest model)   94.39 / 97.20 / 98.44    97.51 / 98.13 / 99.06
                   95.64 / 98.13 / 98.75    97.82 / 98.13 / 99.06
                   96.74 / 98.13 / 98.75    97.82 / 98.44 / 99.09
(largest model)    96.26 / 98.14 / 98.75    98.13 / 98.44 / 99.37

Table 4. % correct for blocksize 8

BS=16 P2DHMM      CB=300: Rank 1/3/10      CB=1000: Rank 1/3/10
(smallest model)   95.64 / 97.51 / 98.13    97.51 / 98.13 / 98.44
                   96.57 / 98.13 / 98.13    97.20 / 98.13 / 98.44
                   96.88 / 98.13 / 98.13    97.20 / 98.13 / 98.44
(largest model)    96.26 / 98.13 / 98.13    97.51 / 98.44 / 98.75

Table 5. % correct for blocksize 16

Once again it turned out that the systems using smaller windows outperform the systems using the larger ones. The highest score is 98.13%, which means that only 6 of the 321 individuals tested are not recognized correctly. Comparing the results of the discrete and continuous systems, it becomes clear that the discrete systems outperform the continuous ones. The computation time for classifying with the discrete models was just about half that of the continuous ones. We believe that these favorable results for discrete HMMs represent a quite surprising and novel fact in the field of face recognition and may encourage other researchers to pick up the suggested methods.

5. CONCLUSIONS AND OUTLOOK

We have presented a face recognition system for frontal views that makes use of DCT feature extraction and statistical P2DHMM modeling with continuous and discrete output production probabilities. As the results above show, using smaller sliding windows for the feature extraction achieves higher recognition rates. The best system tested achieved a recognition score of 98.13% for the fb versus fa test of 321 individuals from the FERET database, using discrete output probabilities. Although this result is good compared to other existing systems [9], there is still room for improvement. Besides harder tests using the duplicate images provided in the FERET database, we will use the fully automated preprocessing methods presented in [12] to find the eye and mouth coordinates. In addition, we plan to focus on the discrete P2DHMM by evaluating possible improvements through experiments with MMI vector quantizers as introduced in [8].

6. ACKNOWLEDGMENTS

The authors would like to thank Mr. Jonathan Phillips for providing us with the subset of the FERET database.

7. REFERENCES

[1] M. Turk and A. Pentland, "Face Recognition using Eigenfaces," in Conference on Computer Vision and Pattern Recognition, June 1991, pp. 586-591.
[2] L. Wiskott, J.-M. Fellous, N. Krüger, and C. von der Malsburg, "Face Recognition by Elastic Bunch Graph Matching," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 775-779, July 1997.
[3] S. Eickeler, S. Müller, and G. Rigoll, "High Quality Face Recognition in JPEG Compressed Images," in IEEE Int. Conference on Image Processing (ICIP), Kobe, Japan, Oct. 1999, pp. 672-676.
[4] S. Eickeler, S. Müller, and G. Rigoll, "Recognition of JPEG Compressed Face Images Based on Statistical Methods," Image and Vision Computing Journal, Special Issue on Facial Image Analysis, vol. 18, no. 4, pp. 279-287, Mar. 2000.
[5] S. Eickeler, M. Jabs, and G. Rigoll, "Comparison of Confidence Measures for Face Recognition," in Conference on Automatic Face and Gesture Recognition, Grenoble, France, Mar. 2000, pp. 257-262.
[6] G. Rigoll, A. Kosmala, J. Rottland, and C. Neukirchen, "A Comparison Between Continuous and Discrete Density Hidden Markov Models for Cursive Handwriting Recognition," in Int. Conference on Pattern Recognition (ICPR), Vienna, Aug. 1996, vol. 2, pp. 205-209.
[7] C. Neukirchen and G. Rigoll, "Ein systematischer Vergleich von diskreten, kontinuierlichen und hybriden HMM-basierten Systemen zur Spracherkennung," in Elektronische Sprachsignalverarbeitung, Tagungsband, Berlin, Oct. 1994, pp. 83-89.
[8] C. Neukirchen and G. Rigoll, "Advanced Training Methods and New Network Topologies for Hybrid MMI-Connectionist/HMM Speech Recognition Systems," in IEEE Int. Conference on Acoustics, Speech, and Signal Processing (ICASSP), Munich, Apr. 1997, pp. 3257-3260.
[9] P. J. Phillips, H. Moon, S. A. Rizvi, and P. J. Rauss, "The FERET Evaluation Methodology for Face-Recognition Algorithms," NISTIR 6264, Oct. 1999.
[10] L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257-285, Feb. 1989.
[11] F. Jelinek, Statistical Methods for Speech Recognition, MIT Press, 1997.
[12] H. A. Rowley, S. Baluja, and T. Kanade, "Neural Network-Based Face Detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, Jan. 1998.
