
Robust Face Recognition with Light Compensation

Yea-Shuan Huang¹, Yao-Hong Tsai¹, Jun-Wei Shieh²

¹Computer & Communications Research Laboratories,
Industrial Technology Research Institute, Taiwan
yeashuan@itri.org.tw
²Department of Electrical Engineering, Yuan Ze University

Abstract. This paper proposes a face recognition method which is based on a
Generalized Probabilistic Descent (GPD) learning rule with a three-layer feed-
forward network. This method aims to recognize faces in a loosely controlled
surveillance environment, which allows (1) large face image rotation (in and
out of the image plane), (2) different backgrounds, and (3) different illumination.
Besides, a novel light compensation approach is designed to compensate for the
gray-level differences resulting from different lighting conditions. Experiments
with three kinds of classifiers (LVQ2, BP, and GPD) have been performed on
the ITRI face database. GPD with the proposed light compensation approach
displays the best recognition accuracy among all tested combinations.

1 Introduction

Due to the rapid advance of computer hardware and the continuous progress of
computer software, we look forward to developing more powerful and friendly
computer use models so that computers can serve people in a more active and
intelligent way. To this end, the computer naturally needs to have a surveillance
ability, which enables it to detect, track, and recognize its surrounding people. As a
result, research on face processing (including detection [1-2], tracking [3], and
recognition [4-7]) has been very prosperous in the last two decades. This paper
mainly discusses our research effort on the face recognition (FR) issue.

The objective of our research is to develop an FR classifier which can recognize
faces in a loosely controlled surveillance environment. This means the desired FR
classifier can deal with faces having different rotations, illumination, and
backgrounds. Our approach is to train a three-layer feed-forward network by using a
Generalized Probabilistic Descent (GPD) learning rule. GPD was originally proposed
by Juang [8] to train a speech classifier, and it is reported to have a much better
recognition performance than the well-known Back-Propagation (BP) [9] training.
However, to the best of our knowledge, GPD is rarely, if ever, used by the computer-
vision community. Because GPD is based on minimizing a classification-related error
function, it theoretically can produce a better classification performance than
classifiers (such as BP) based on minimizing a least-mean-square error. Furthermore,
to make FR insensitive to illumination variation, a novel light compensation

H.-Y. Shum, M. Liao, and S.-F. Chang (Eds.): PCM 2001, LNCS 2195, pp. 237-244, 2001.
© Springer-Verlag Berlin Heidelberg 2001

method is proposed to compensate for the image variation resulting from the lighting
factor.

This paper consists of six sections. Section 2 describes the error function used
and the derived GPD weight-updating rule. Section 3 describes the proposed light
compensation method, which can reduce the image differences caused by
illumination variation. Section 4 specifies the ITRI (Industrial Technology Research
Institute) face database and its construction guidelines. Section 5 then
performs several experiments and makes a performance comparison among LVQ2,
BP, and GPD. Finally, Section 6 draws our conclusions and points out future
research directions.

2 A Generalized Probabilistic Descent Learning Rule

The key idea in GPD formulation is the incorporation of a smooth classification-


related error function into the gradient search optimization objective. In general, GPD
can be applied to train various kinds of classifiers. Here, a three-layer feed-forward
network is used to serve the classifier architecture.
Assume there are K persons in the concerned pattern domain, C_1, ..., C_K, and the
feature of an observed face is x = (x_1, ..., x_n), where n is the feature dimension. Let
g_i(x) be a discrimination function indicating the degree to which pattern x belongs to
person i. In general, a pattern x is classified to person M if g_M(x) is the largest value
among {g_i(x) | 1 ≤ i ≤ K}, that is, M = argmax_{1≤i≤K} g_i(x). In order to derive the
optimized {g_i(x) | 1 ≤ i ≤ K} which produce the best recognition accuracy, a training
stage must be performed. Of course, a set of face training samples should be collected
beforehand. During training, let T and M denote respectively the genuine person index
and the person index having the largest discrimination function value for pattern x. To
simplify notation, g_i denotes g_i(x). An error function E(x) is defined to specify the
possible classification error under the current classifier as

    E(x) = 1 / (1 + e^(-(g_M - g_T)/Λ(n))),
where Λ(n) is a bandwidth parameter in iteration n governing the size of the active
area of the sigmoid function, and 0 < Λ(n+1) < Λ(n). When g_T is smaller than
g_M, E(x) becomes large, and when g_T is equal to g_M, E(x) becomes 0.5, which
is the smallest value among all possible E(x). In fact, the more negative the value of

g_T - g_M is, the larger E(x) will be. This is a desired property, because a small
value of g_T - g_M indicates a poor classification ability of the trained classifier.
Therefore, the defined E(x) is appropriate to serve as an error function.
Consequently, minimizing E(x) corresponds to deriving a better classifier.
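The behavior of this error function can be sketched numerically as follows (an illustrative Python sketch, not the authors' code; the function and argument names are hypothetical):

```python
import numpy as np

def gpd_error(g, true_idx, bandwidth):
    """GPD classification error E(x) for one sample.

    g         : sequence of discriminant values g_i(x), one per person
    true_idx  : index T of the genuine person
    bandwidth : sigmoid bandwidth parameter Lambda(n)
    """
    g = np.asarray(g, dtype=float)
    g_M = g.max()          # best discriminant, M = argmax_i g_i(x)
    g_T = g[true_idx]      # discriminant of the genuine person
    # sigmoid of the margin: exactly 0.5 when the true person wins,
    # approaching 1 as a rival dominates
    return 1.0 / (1.0 + np.exp(-(g_M - g_T) / bandwidth))
```

Note that since g_M ≥ g_T by construction, E(x) lies in [0.5, 1), with 0.5 attained exactly when the classification is correct, matching the property stated above.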
A three-layer network is used to classify an input face pattern, which consists of the
input, hidden, and output layers. I_i denotes the i-th node of the input layer, which
contains the value of the i-th feature element of an input x; O_j is the output value of
the j-th node of the hidden layer; and O_k is the output value of the k-th node of the
output layer. In this network, g_k means O_k, and

    O_j = 1 / (1 + e^(-Σ_{i=1..n} W_ij I_i)),    O_k = 1 / (1 + e^(-Σ_{j=1..m} W_jk O_j)),

    E(x) = 1 / (1 + e^(-(O_M - O_T)/Λ(n))),

where n and m are the node numbers of the input and hidden layers respectively,
W_ij is the connection weight between the i-th node of the input layer and the j-th
node of the hidden layer, and W_jk is the connection weight between the j-th node of
the hidden layer and the k-th node of the output layer. Therefore, to minimize E(x)
corresponds to deriving the optimized W_ij and W_jk. According to the generalized
delta rule, W_ij and W_jk can be updated as follows:

    W_jk(n+1) = W_jk(n) - α(n) ∂E(x)/∂W_jk,    W_ij(n+1) = W_ij(n) - α(n) ∂E(x)/∂W_ij,

where α(n) is a positive and monotonically-decreasing learning rate. By using the
chain rule with simple mathematical operations, the required partial derivatives are
easy to derive.
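One such generalized-delta-rule update can be sketched as follows (an illustrative Python sketch under the assumption of sigmoid units; fixed values stand in for the decreasing schedules α(n) and Λ(n), and all names are hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gpd_step(x, t, W1, W2, lam, lr):
    """One GPD update of a three-layer sigmoid network (sketch).

    x  : input feature vector, shape (n,)
    t  : genuine person index T
    W1 : input->hidden weights, shape (m, n), updated in place
    W2 : hidden->output weights, shape (K, m), updated in place
    lam: bandwidth Lambda(n); lr: learning rate alpha(n)
    """
    h = sigmoid(W1 @ x)          # hidden activations O_j
    o = sigmoid(W2 @ h)          # output activations O_k = g_k(x)
    M = int(np.argmax(o))        # index of the best-scoring person
    E = float(sigmoid((o[M] - o[t]) / lam))

    # dE/dO_k is nonzero only for the rival M and the true class T;
    # when M == t the classification is correct and E sits at its minimum 0.5
    dE_do = np.zeros_like(o)
    if M != t:
        s = E * (1.0 - E) / lam
        dE_do[M], dE_do[t] = s, -s

    # backpropagate through the sigmoids (generalized delta rule)
    delta_o = dE_do * o * (1.0 - o)
    delta_h = (W2.T @ delta_o) * h * (1.0 - h)
    W2 -= lr * np.outer(delta_o, h)
    W1 -= lr * np.outer(delta_h, x)
    return E
```

Repeated application drives E(x) down toward 0.5, i.e. toward a correct decision for the training sample.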

3 Light Compensation for Face Images

It is well known that image colors (or gray levels) are very sensitive to lighting
variation. The same object under different illumination may produce considerably
different color or gray-level images. The first row of Fig. 1 shows several face color
images which were taken under different lighting conditions. In general, it is difficult
to produce a good classification accuracy if face samples in the training and testing
sets are taken under different lighting conditions. Therefore, a light compensation
preprocessing step is essential, which can reduce to a minimum the image differences
resulting from illumination variation.
A novel light compensation method with a six-step processing procedure is
proposed as follows:

Step 1: transform each face image pixel F(m,n) from the RGB space to the YCbCr
space, where 1 ≤ m ≤ H and 1 ≤ n ≤ V (H is the pixel width of the face image and V
is the pixel height of the face image), and let P(m,n) be the gray level of F(m,n).

Step 2: mark each F(m,n) as a skin-color pixel if it satisfies the conditions defined in
[10].

Step 3: fit an equation Q(x,y) to the gray levels P(m,n) of all marked skin-color
pixels. Here, Q(x,y) is designed to be a second-order equation, that is,

    Q(x,y) = ax^2 + bxy + cy^2 + dx + ey + f.

Parameters a, b, c, d, e, and f can be derived by minimizing the least-mean-square
error between P(m,n) and Q(m,n) over all marked skin-color pixels. The derived
Q(x,y) approximates the background lighting distribution of the current face image.
Step 4: subtract Q(m,n) from P(m,n) to derive a subtracted image P'(m,n), that is,

    P'(m,n) = P(m,n) - Q(m,n).

Step 5: compute the average gray level μ and the standard deviation σ of the marked
skin-color subtracted pixels, that is,

    μ = (1/N) Σ P'(m,n),    σ = sqrt( (1/N) Σ (P'(m,n) - μ)^2 ),

where N is the total number of pixels marked as skin color and both sums run over
the marked skin-color pixels.


Step 6: normalize each subtracted pixel's gray level P'(m,n) to generate a new value
P''(m,n) by the following equation:

    P''(m,n) = 128 + β (P'(m,n) - μ) / σ,

where β is a scale factor whose purpose is to make the range of the transformed gray
levels fall between 0 and 255. It is worthwhile to mention that the transformed faces
have an average gray level close to 128 and have similar gray-level standard
deviations.
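The six-step procedure above can be sketched in NumPy as follows (an illustrative sketch, not the authors' implementation; steps 1-2, the color-space transform and skin-color detection, are assumed done and supplied as a precomputed boolean mask):

```python
import numpy as np

def light_compensate(P, skin_mask, beta=50.0):
    """Steps 3-6 of the light compensation procedure (sketch).

    P         : (V, H) gray-level image (Y channel of YCbCr)
    skin_mask : boolean (V, H) mask of skin-color pixels (steps 1-2)
    beta      : scale factor placing the output roughly in [0, 255]
    """
    V, H = P.shape
    ys, xs = np.nonzero(skin_mask)
    vals = P[ys, xs].astype(float)

    # Step 3: least-squares fit of the quadratic lighting surface
    # Q(x, y) = a x^2 + b x y + c y^2 + d x + e y + f
    A = np.stack([xs**2, xs * ys, ys**2, xs, ys, np.ones_like(xs)], axis=1)
    coef, *_ = np.linalg.lstsq(A, vals, rcond=None)

    # Step 4: subtract the fitted lighting surface from the whole image
    yy, xx = np.mgrid[0:V, 0:H]
    Q = (coef[0] * xx**2 + coef[1] * xx * yy + coef[2] * yy**2
         + coef[3] * xx + coef[4] * yy + coef[5])
    P_sub = P.astype(float) - Q

    # Step 5: mean and standard deviation over the skin-color pixels
    mu = P_sub[skin_mask].mean()
    sigma = P_sub[skin_mask].std()

    # Step 6: normalize so skin pixels center on 128 with a common spread
    return 128.0 + beta * (P_sub - mu) / max(sigma, 1e-8)
```

By construction, the skin-color pixels of the output have mean exactly 128 and standard deviation β, which is the normalization property the section describes.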
Before applying the proposed light compensation method, the face portion of the
image should be extracted from the whole image so that the skin-color-pixel detection
can focus only on the face image. Fig. 1 shows respectively some face images (first
row), skin-color-pixel images where white denotes skin color and black denotes
non-skin color (second row), background lighting distributions (third row), subtracted
images (fourth row), and normalized images (fifth row) obtained by using the
proposed light compensation method. From inspection, the final normalized images
have more similar gray-level distributions, so that a face classifier can learn the truly
distinct personal characteristics.

4 Face Database Construction

Because we aim to recognize faces under a loosely controlled surveillance
environment, it is important to consider many variation factors when collecting the
face database, which include:

(1) Face rotation: Each person was asked to look at 14 marked points attached to the
walls; every two neighboring points approximately form a 10-degree viewing angle
to this person.

(2) Background: Pictures were taken at two diagonal corners of an office room. The
two corners look quite different from each other; one is near the windows and has a
more homogeneous background, and the other is near a door with a cluttered desk.

(3) Illumination: Since one background is close to the windows, it receives some
daylight. Therefore, the images taken there are brighter than those taken at the other
corner. Also, because the daylight is not always normal to the people's faces, some
face images are brighter on one side than on the other.

There are 46 persons in our face database, whose pictures are displayed in Fig. 4.
Each person was arranged to stand sequentially at the two corners of one office room,
and at each corner he/she was asked to rotate his/her head in steps of about 10
degrees by looking at the 14 predefined wall marks. For each head rotation, a video
camera (with resolution 320*240) was used to take pictures. Therefore, there are
46*14 pictures in total. Fig. 2 shows the images taken of one particular person.

5 Experiment Results

Currently, the face images are extracted from the original whole images manually. In
general, an extracted face image is the rectangular portion of the image which
contains the eyebrows, eyes, nose, and mouth. Since statistically-based classification
approaches generally require many training samples, it is important to increase the
number of database images. However, it is expensive to collect real face images, so
we increase the number of face images in two ways. The first is to produce a mirror
image for each photographed image, and the second is to generate virtual images by
rotating and shifting the original images.
Each extracted face image is first processed by applying the proposed light
compensation method, and accordingly a compensated image G is generated. Then, G
is normalized into a 25*25 image which is further projected onto 40 LDA eigen-bases.
Therefore, the total feature dimension is 40, and the three-layer network architecture
is set here to 40 (input) - 200 (hidden) - 46 (output). The odd-numbered samples of
the ITRI face database are used to train the network, and the even-numbered ones of
the same database are used to test the trained network.
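The feature extraction step just described reduces to a single matrix product (an illustrative sketch; the LDA basis matrix is an assumption here and would in practice be precomputed from the training set):

```python
import numpy as np

def extract_features(G, lda_basis):
    """Project a 25x25 compensated face image onto LDA eigen-bases.

    G         : (25, 25) light-compensated, resized face image
    lda_basis : (625, 40) matrix whose columns are the LDA eigen-bases
                (assumed precomputed from the training samples)
    """
    # flatten to a 625-vector and project to the 40-dim feature space
    return G.reshape(-1) @ lda_basis
```

The resulting 40-dimensional vector is what feeds the 40-200-46 network input layer.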
Three classification methods (LVQ2, BP, and GPD) were implemented to make a
performance comparison. LVQ2 [11] can construct appropriate face prototypes for
each person, and it classifies an input face pattern x to the person having the
prototype nearest to x. Here, three prototypes were constructed for each person. Both
BP and GPD use the same network architecture (40-200-46). To demonstrate the
effectiveness of the proposed light compensation method (LC), all three classifiers
were trained and tested both with and without LC. Table 1 displays the recognition
results on the test database. From inspection, it clearly shows that (1) GPD performs
better than the other two classifiers (BP and LVQ2), which confirms the effectiveness
of the proposed error function, and (2) the proposed light compensation method is
effective, because all three classifiers with LC achieve higher recognition accuracy
than their individual performance without LC.
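The nearest-prototype decision rule used by LVQ2 at test time can be sketched as follows (an illustrative helper, not the authors' code; with three prototypes per person as described above, `labels` would repeat each person index three times):

```python
import numpy as np

def classify_nearest_prototype(x, prototypes, labels):
    """Assign x to the person owning the nearest prototype (Euclidean).

    x          : feature vector, shape (d,)
    prototypes : (P, d) array of learned prototype vectors
    labels     : (P,) person index owning each prototype
    """
    dists = np.linalg.norm(prototypes - x, axis=1)
    return int(labels[np.argmin(dists)])
```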

6 Conclusions

Due to its huge number of applications, face recognition technology has attracted a
lot of research effort. This paper proposes a GPD training approach based on a three-
layer feed-forward network to recognize faces. GPD has been applied in speech
processing, but it is rarely mentioned by the computer-vision community.
Experiments show that GPD can produce a better performance than the well-known
BP and LVQ2. A novel light compensation method is also described, which can
effectively compensate for the variation of face images caused by illumination
variation, and produce uniformly lit gray-level face images with similar average gray
levels and standard deviations.
The experiment results, however, show that even the GPD approach cannot achieve a
sufficiently reliable recognition performance when dealing with face samples taken
with several kinds of variations. It is necessary to design other efficient face features
which are insensitive to camera variation. One feasible approach is to extract the
individual face components, such as the eyes, nose, and mouth, and to base
classification on the total sum of the individual component matching scores. Also, the
face images are currently extracted manually. We are working on face detection and
extraction algorithms, which will further be integrated with face recognition so that
an automatic face detection and recognition system can be implemented in the near
future.

Acknowledgements
This work is an outcome of the Network Generation Human/Computer Interface
Project (project code: 3XSlBI I) supported by the Ministry of Economic Affairs
(MOEA), Taiwan, R.O.C.

References
1. K.K. Sung and T. Poggio, "Example-Based Learning for View-Based Human Face
Detection," IEEE Trans. Pattern Anal. Machine Intell., Vol. 20, pp. 39-51, 1998.
2. H.A. Rowley, S. Baluja, and T. Kanade, "Neural network-based face detection," IEEE
Transactions on PAMI, Vol. 20, No. 1, pp. 22-38, Jan. 1998.
3. D.M. Gavrila, "The visual analysis of human movement: a survey," Computer Vision and
Image Understanding, Vol. 73, pp. 82-98, 1999.
4. M. Turk and A. Pentland, "Eigenfaces for Recognition," Journal of Cognitive
Neuroscience, March 1991.
5. R. Brunelli and T. Poggio, "Face Recognition: Features Versus Templates," IEEE Trans.
Pattern Anal. Machine Intell., Vol. 15, No. 10, pp. 1042-1052, October 1993.
6. R. Chellappa, C. Wilson, and S. Sirohey, "Human and Machine Recognition of Faces: A
Survey," Proc. of IEEE, Vol. 83, No. 5, pp. 705-740, May 1995.
7. A.K. Jain, R. Bolle, and S. Pankanti, Biometrics: Personal Identification in Networked
Society, Kluwer Academic Publishers, 1999.
8. B.H. Juang and S. Katagiri, "Discriminative Learning for Minimum Error Classification,"
IEEE Trans. on Signal Processing, Vol. 40, No. 12, pp. 3043-3054, December 1992.
9. B. Widrow and R. Winter, "Neural Nets for Adaptive Filtering and Adaptive Pattern
Recognition," Computer, Vol. 21, No. 3, pp. 25-39, March 1988.
10. C. Garcia and G. Tziritas, "Face Detection Using Quantized Skin Color Regions Merging
and Wavelet Packet Analysis," IEEE Trans. on Multimedia, Vol. 1, No. 3, pp. 264-277,
1999.
11. T. Kohonen, "The Self-Organizing Map," Proc. of IEEE, Vol. 78, No. 9, pp. 1464-1480,
1990.

Table 1. The test recognition rates of three face classifiers with and without light
compensation

Approach | Recognition Rate


Figure 1. This figure shows respectively some face images (first row), skin-
color-pixel images (second row), background lighting distributions (third
row), subtracted images (fourth row), and normalized images (fifth row)
obtained by using the proposed light compensation method.

Figure 2. Pictures taken under different illumination and head rotation angles
by a digital camera; the image size of each picture is 320*240.
