cated, the constructed VTM based on these regression relationships can efficiently transform gait features from the source viewing angle to the target viewing angle.

As a regression tool, SVR is chosen for the proposed approach because of the following advantages [11]. First, it features good generalization performance, which is a core requirement for most regression applications, including this research. Secondly, its representation is sparse, because a regression model obtained by SVR depends only on a subset of the training dataset; therefore, the dimension of the VTM is sparse and controllable. Thirdly, unlike some other regression tools such as neural networks, SVR does not suffer from the local minimum problem: its solution is unique and globally optimal. Hence, we can obtain a globally optimized VTM based on the supplied context of the training data. Fourthly, SVR is a kernel-based regression technique, so it allows the system to work with an arbitrarily large feature space rather than being limited to the input space of the linear kernel.

2.1.2 SVR

The SVR concept is briefly explained in the following; for more details, refer to [13]. Given the data $\{S_i : (x_i, y_i) \mid x_i \in X, y_i \in \mathbb{R}, i = 1 \ldots k\}$, where $X$ denotes the space of input patterns and $k$ is the number of samples, the regression function $f(x)$ for the linear case is defined as:

\[ f(x) = \langle w, x \rangle + b \tag{1} \]

where $w \in X$, $b \in \mathbb{R}$, and $\langle \cdot, \cdot \rangle$ denotes the dot product in $X$.

Figure 1. The soft margin loss setting for SVR

In $\varepsilon$-SVR [13], as shown in Figure 1, the goal is to find a function $f(x)$ that satisfies three fundamental aspects. First, the error between the observed target $y_i$ and the predicted value $f(x_i)$ is disregarded as long as it is less than $\varepsilon$. Secondly, the flattest possible function is sought by minimizing the norm $\|w\|^2 = \langle w, w \rangle$. Thirdly, a soft margin loss function is allowed: slack variables $\xi_i, \xi_i^*$ are introduced to cope with data points that lie outside the $\varepsilon$-insensitive region. Hence, $\varepsilon$-SVR arrives at the objective formulation:

\[
\begin{aligned}
\text{minimize} \quad & \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{k} (\xi_i + \xi_i^*) \\
\text{subject to} \quad & y_i - \langle w, x_i \rangle - b \le \varepsilon + \xi_i, \\
& \langle w, x_i \rangle + b - y_i \le \varepsilon + \xi_i^*, \\
& \xi_i, \xi_i^* \ge 0
\end{aligned} \tag{2}
\]

where the constant $C$ determines the trade-off between the flatness of $f$ and the amount up to which deviations larger than $\varepsilon$ are tolerated.

The optimization problem (2) can be solved more easily in its dual formulation. Thus, a standard dualization method utilizing Lagrange multipliers is used and yields the dual optimization problem:

\[
\begin{aligned}
\text{maximize} \quad & -\tfrac{1}{2} \sum_{i,j=1}^{k} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) \langle x_i, x_j \rangle - \varepsilon \sum_{i=1}^{k} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{k} y_i (\alpha_i - \alpha_i^*) \\
\text{subject to} \quad & \sum_{i=1}^{k} (\alpha_i - \alpha_i^*) = 0, \quad \alpha_i, \alpha_i^* \in [0, C]
\end{aligned} \tag{3}
\]

where $\alpha_i, \alpha_i^* \ge 0$ are the Lagrange multipliers.

The support vector expansion can also be obtained, in which $w$ is completely described as a linear combination of the training patterns $x_i$:

\[ w = \sum_{i=1}^{k} (\alpha_i - \alpha_i^*) x_i \tag{4} \]

Moreover, the complete algorithm can be described in terms of dot products between the data:

\[ f(x) = \sum_{i=1}^{k} (\alpha_i - \alpha_i^*) \langle x_i, x \rangle + b \tag{5} \]

The concept described above can be considered linear kernel-based SVR. To apply a non-linear kernel to SVR, the dot products $\langle x_i, x_j \rangle$ in equations (3) and (5) are replaced with an alternative kernel $k(x_i, x_j)$. Two non-linear kernels [1] are used in this study. The first is the polynomial kernel of degree $d$, defined as:

\[ K^{\text{polynomial}}_{d,s,c}(x_i, x_j) = (s \langle x_i, x_j \rangle + c)^d \tag{6} \]

The degree $d$ of the polynomial kernel controls the flexibility of the resulting regression models. The lowest-degree polynomial is the linear kernel, which is not sufficient when a non-linear relationship between features exists. The second kernel, which is widely used, is the Gaussian or Radial Basis Function (RBF) kernel, defined as:

\[ K^{\text{RBF}}_{\sigma}(x_i, x_j) = \exp\!\left(-\tfrac{1}{\sigma}\,\|x_i - x_j\|^2\right) \tag{7} \]

Here $\sigma > 0$ is a parameter that controls the width of the Gaussian and plays a role similar to the degree of the polynomial kernel. $\sigma$ in the Gaussian kernel and $d$ in the polynomial kernel determine the flexibility of the produced SVR in fitting the data; larger $d$ or smaller $\sigma$ may lead to over-fitting.

2.2. SVD based VTM construction and its limitation

Any given matrix $A \in \mathbb{R}^{n \times m}$ has a decomposition $A = U S V^T$ such that $U$ is an $(n \times n)$ orthogonal matrix whose columns are the left singular vectors of $A$, $S$ is an $(n \times m)$ matrix whose non-negative diagonal entries are the singular values of $A$, and $V$ is an $(m \times m)$ orthogonal matrix whose columns are the right singular vectors of $A$. The diagonal values of $S$ are the square roots of the eigenvalues of $A^T A$ and $A A^T$. Consequently, the left singular vectors are eigenvectors of $A A^T$ and the right singular vectors are eigenvectors of $A^T A$.
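The two kernels in equations (6) and (7) are simple to implement. The sketch below (ours, not from the paper; the function names and toy vectors are illustrative) shows both in NumPy, along with the locality effect of $\sigma$ discussed above:

```python
import numpy as np

def polynomial_kernel(xi, xj, d=2, s=1.0, c=1.0):
    """Polynomial kernel of degree d, eq. (6): (s * <xi, xj> + c) ** d."""
    return (s * np.dot(xi, xj) + c) ** d

def rbf_kernel(xi, xj, sigma=1.0):
    """Gaussian (RBF) kernel, eq. (7): exp(-||xi - xj||^2 / sigma)."""
    return np.exp(-np.sum((xi - xj) ** 2) / sigma)

x1 = np.array([1.0, 0.0])
x2 = np.array([0.0, 1.0])

# The RBF kernel of any point with itself is exactly 1.
print(rbf_kernel(x1, x1))                                             # 1.0
# Smaller sigma makes the kernel more local: the value decays faster
# with distance, which is why small sigma can over-fit.
print(rbf_kernel(x1, x2, sigma=0.5) < rbf_kernel(x1, x2, sigma=5.0))  # True
# Polynomial kernel with defaults d=2, s=1, c=1: (1*0 + 1)^2 = 1.0
print(polynomial_kernel(x1, x2))                                      # 1.0
```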
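These eigenvector relationships can be checked numerically before applying SVD to gait matrices. A minimal NumPy sketch (the matrix is an arbitrary example of ours):

```python
import numpy as np

A = np.array([[3.0, 1.0, 0.0],
              [1.0, 2.0, 1.0]])      # an (n x m) = (2 x 3) example matrix

U, s, Vt = np.linalg.svd(A)          # A = U @ diag(s) @ Vt

# Singular values are the square roots of the eigenvalues of A A^T
# (sorted in descending order to match the SVD convention).
eigvals = np.linalg.eigvalsh(A @ A.T)[::-1]
print(np.allclose(np.sqrt(eigvals), s))            # True

# Columns of U (the left singular vectors) are eigenvectors of A A^T:
# (A A^T) u = lambda * u for each column u of U.
for lam, u in zip(eigvals, U.T):
    print(np.allclose(A @ A.T @ u, lam * u))       # True
```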
To adopt SVD for VTM construction [7], [10], [14], the first step is to build the gait matrix $A$. In this matrix, each row contains gait information from the same viewing angle for different subjects, and each column contains gait information from the same subject under different angles. SVD can then factorize the gait matrix $A$ into a view-independent sub-matrix and a subject-independent sub-matrix. Let $g^k_{\theta_i}$ be the gait signature of subject $k$ under viewing angle $\theta_i$. The factorization process by SVD is:

\[
A =
\begin{bmatrix}
g^1_{\theta_1} & \cdots & g^K_{\theta_1} \\
\vdots & \ddots & \vdots \\
g^1_{\theta_I} & \cdots & g^K_{\theta_I}
\end{bmatrix}
= U S V^T =
\begin{bmatrix}
P_{\theta_1} \\
\vdots \\
P_{\theta_I}
\end{bmatrix}
\begin{bmatrix}
v^1 & \cdots & v^K
\end{bmatrix}
\tag{8}
\]

A vector $v^k$ is an intrinsic gait feature of the $k$-th subject for any viewing angle. $P_{\theta_i}$ is a projection matrix which projects the intrinsic vector $v$ of any subject to the gait feature vector under a specific viewing angle $\theta_i$. Thus, a gait feature can be written in factorized form as:

\[ g^k_{\theta_i} = P_{\theta_i} v^k \tag{9} \]

The subject-independent matrix $P$ is used as a VTM common to all subjects. For example, gait feature transformation from viewing angle $\theta_i$ to $\theta_j$ is obtained by:

\[ g^k_{\theta_j} = P_{\theta_j} P^{+}_{\theta_i} g^k_{\theta_i} \tag{10} \]

where $P^{+}_{\theta_i}$ denotes the pseudo-inverse of $P_{\theta_i}$.

The proposed approach then captures the gait information in the spatial domain through the Gait Energy Image (GEI) [7]. Figure 2 illustrates examples of GEI under various viewing angles for three different subjects.

Figure 2. Spatial-domain GEI under various viewing angles

Figure 2 shows that the appearance-based GEI varies across viewing angles, so it is not efficient to directly measure the similarity between two GEIs under different views. Thus, in the proposed approach, a constructed VTM is required to transform one GEI into the same view as the other GEI before the similarity is measured.

4. View transformation model construction

Let $VTM_{\theta_i \to \theta_j}$ denote a view transformation model that is used to transform a GEI from view $\theta_i$ to $\theta_j$, $g^k_{\theta_i}$ denote the GEI of subject $k$ under view $\theta_i$, $p^k_{\theta_i}$ denote the $p$-th pixel of $g^k_{\theta_i}$, and $ROI^{\theta_i}_{p^k_{\theta_j}}$ denote a region of interest, or group of connected pixels. In the second step of ROI selection, we aim to locate all these pixels in the neighboring region of $\acute{p}$.

4.1. Estimation of $\acute{p}$

The pixel $\acute{p}$ is not always located at the same coordinate as the pixel $p$, because GEIs under two views ($g_{\theta_i}$, $g_{\theta_j}$) may have different 2D display structures. The GEIs are classified into four categories according to the walking directions of the subjects (see Figure 3). Moreover, in this study, a GEI is divided into six independent areas by one vertical cut and two horizontal cuts, as shown in Figure 3. These three cuts are not trivial: they are generated from motion information and the geometric distribution of human body parts.

Figure 3. The top row shows the four possible walking direction types. The bottom row shows example GEI displays of the four types. In the top row, the dashed arrow is the walking direction, the gray vertical line is the optical axis, and the green horizontal line is the image plane. In the bottom row, the blue vertical line is used for side segmentation and the red horizontal lines are used for body part segmentation.

A GEI is divided into two sides by a vertical cut. The vertical cut, called the major axis, is calculated from the eigenvector of the covariance matrix of the tracked silhouette [9]. For each camera viewing angle, the major axes calculated from different sample silhouettes can be slightly different. Thus, their average (in terms of the axis's slope) is used as the representative major axis of a GEI under that camera viewing angle.

The four possible 2D displays of walking manners with regard to the vertical cut are shown in Figure 3. Each GEI must belong to one of these four manners. The first two cases usually occur when a person walks within $\pm 18°$ of the camera optical axis; here the major axis divides a GEI into a "Left" side and a "Right" side. The first case is walking away from the camera; the second case is walking toward the camera. The last two cases occur when a person walks at an angle larger than $\pm 18°$ away from the camera optical axis; here the major axis divides a GEI into a "Front" side and a "Back" side. The third case is walking from the left side to the right side of the camera, while the fourth case is walking from the right side to the left side of the camera.

Next, two horizontal cuts are used to divide the human body into three major parts: head (hair + face), upper body (torso + arms), and lower body (legs + feet). The body part proportions of GEIs under two views can be significantly different when the two views are captured by different cameras with different parameter settings. For each viewing angle, the average body part proportions for segmentation are estimated based on training data.

To obtain the two horizontal cuts, the GEI is first projected onto the major axis to create a histogram, which is then smoothed by an average filter. Example results are shown in Figure 4.

Figure 4. Projection histograms for the body part segmentation process. The blue line is the border between head and upper body, and the red line is the border between upper body and lower body.

The border between head and upper body is at the first saddle point from top to bottom, i.e., from head to feet. The border between upper body and lower body is taken to be at the hip, which, based on observation, is approximately located at the peak position of the histogram (see Figure 4).

As the result of one vertical cut and two horizontal cuts, a GEI is divided into six areas. The next step is to match areas between $g_{\theta_i}$ and $g_{\theta_j}$. For the vertical cut, the sides of the two views have to be correctly matched in terms of walking direction. According to Figure 3, if a view transformation is conducted between/inside the first two cases, the "Left" side must match the "Left" side and the "Right" side must match the "Right" side. If a view transformation is conducted between/inside the last two cases, the "Front" side must match the "Front" side and the "Back" side must match the "Back" side. For the horizontal cuts, body part areas between the two views have to be correctly matched; for example, the upper body area of $g_{\theta_i}$ is simply matched with the upper body area of $g_{\theta_j}$.

Figure 5. ROI selection: $p$ is a target pixel under view $\theta_j$, $\acute{p}$ is the estimated position of $p$ under view $\theta_i$, and the yellow area contains the candidate pixels for $ROI_p$.

Let $A_i$ be an area from view $\theta_i$ that is likely to contain a pixel $p$ from area $A_j$ of view $\theta_j$. For example, as shown in Figure 5, if $p$ is in the front-lower body area under view $\theta_j$, then $\acute{p}$ must also be in the front-lower body area, but under view $\theta_i$. The areas $A_i$ and $A_j$ may differ in size and shape because they are captured under different views or by different cameras. Thus, the position of $\acute{p}$ in $A_i$ is the position corresponding to that of $p$ in $A_j$, in proportion to the sizes of the areas.

4.2. Selection of ROI elements

Let $A_c$ be a local region that contains $\acute{p}$ and its neighboring pixels. Then the ROI of $T$ pixels is defined as:

\[ ROI = \{p^*_1, p^*_2, \ldots, p^*_T\}, \quad p^*_t \in A_c \tag{12} \]
where $p^*_t$ is a pixel with one of the $T$ highest values of $|COR(p^*_t, p)|$, and $COR(\cdot, \cdot)$ is the correlation coefficient of the two variables. This provides the $T$ most relevant pixels, i.e., those with the closest motion relationship to the pixel $p$. The size $T$ of the ROI is decided based on cross-validation tests. Figure 6 shows examples of ROI selections based on the proposed method.

Figure 6. The first row is the ROI selection for $VTM_{36° \to 54°}$ and the second row is the ROI selection for $VTM_{18° \to 162°}$. For each row, the first image contains the allocated ROI (red pixels) for predicting the corresponding target pixel (red pixel) shown in the second image. The third image shows the relationship between the target pixel (y-axis) and the selected pixel from the corresponding source ROI (x-axis) over various pairs of training samples ($g^k_{\theta_i}$, $g^k_{\theta_j}$).

4.3. Multi-view to one-view transformation

In practice, one-view to one-view transformation is not precise enough, because orthogonality is degraded when processing 2D silhouette images. To overcome this problem, a gait feature under one particular angle can be estimated from the features under multiple views. For example, two gait features under views $\theta_i$ and $\theta_m$ are transformed to the feature under view $\theta_j$ as:

\[ p^k_{\theta_j} \approx \langle w, ROI^{\theta_i}_{p^k_{\theta_j}} : ROI^{\theta_m}_{p^k_{\theta_j}} \rangle + b \tag{13} \]

where ":" is a special concatenation of the two ROI vectors: the operator selects the $T$ pixels that have the highest correlations with the target pixel, where $T$ is the size of the final ROI for regression.

In our study, it is seen that gaits from multiple views provide richer information. Thus, equation (13) generates more precise view transformation results.

5. Gait similarity measurement

This paper focuses on investigating the performance of the new VTM construction. For similarity measurement, the simple but widely adopted Euclidean distance is used once two gait features under the same viewing angle, $g^i$ and $g^j$, are obtained. The similarity of the two features ($g^i$, $g^j$) is measured as follows:

\[ d(g^i, g^j) = \sqrt{\sum_{n=1}^{N} \left(g^i(n) - g^j(n)\right)^2} \tag{14} \]

where $d(g^i, g^j)$ is the distance between gait signatures $g^i$ and $g^j$, and $N$ is the dimension of the gait feature. The smaller the value of $d$, the higher the possibility that the gait signatures $g^i$ and $g^j$ belong to the same subject.

6. Experimental results

The publicly available CASIA gait database B was used for our experiments. It contains 124 subjects. The gait data was captured from 11 viewing angles, namely 0°, 18°, 36°, 54°, 72°, 90°, 108°, 126°, 144°, 162°, and 180°. There are 6 video sequences for each person under each viewing angle. Therefore, we use a total of 11 × 124 × 6 = 8184 gait sequences. The dataset is divided into 2 groups. The first group contains 24 subjects for the VTM construction process. The second group contains the remaining 100 subjects for performance evaluation of multi-view gait recognition.

The implementation of SVR is based on the well-known SVM-Light support vector machine library [5]. The proposed SVR-based method requires tuning of SVR parameters such as $\varepsilon$ in equation (2), $C$ in equation (3), $d$ in equation (6), and $\sigma$ in equation (7). Some parameters can be roughly estimated based on each specific case of VTM construction. For example, $\varepsilon$ should be larger when constructing the VTM for a pair of viewing angles with a larger difference, and the SVR model should be more flexible than the one for closer viewing angles. Besides, $d$ (the degree of the polynomial kernel) and $\sigma$ (the width of the RBF kernel) can be adjusted based on the over-fitting of the regression models on a validation dataset.

The experiments were carried out using SVR with three different kernels (linear, polynomial, and RBF). The performance of the proposed technique (SVR) is compared with the SVD-based methods [7][10]. The method of [7] applied SVD to an optimized GEI, while the method of [10] applied SVD to a frequency-domain gait feature. Figure 7 shows examples of transformed GEI images using the proposed technique.

Figure 7. (a) is $g_{0°}$. (b), (c), and (d) are $g_{18°}$ transformed from $g_{0°}$ using SVR with linear, polynomial, and RBF kernels respectively. (e) is $g_{18°}$. (f) is $g_{126°}$. (g), (h), and (i) are $g_{108°}$ transformed from $g_{126°}$ using SVR with linear, polynomial, and RBF kernels respectively. (j) is $g_{108°}$.

The experiments were run on a machine with a 2.66 GHz quad-core processor and 4 GB of RAM. The size of a GEI is 30 × 30 pixels and the size of an ROI is 30 pixels. Under these specifications, the training time for one VTM using the proposed method is approximately 10-20 minutes, depending on the setting of the SVR parameters and the choice of SVR kernel. In addition, the performance and complexity of the proposed method also depend on the dimension of the GEI, the size of the ROI, and the context and size of the training gait dataset.

Figure 8 illustrates the first-rank (top-one) gait identity matching for multi-view gait recognition using five different methods: linear-SVR, polynomial-SVR, RBF-SVR, SVD [7], and SVD [10].

Figure 8. Comparisons of first-rank multi-view gait recognition performance between the proposed approaches (linear-SVR, polynomial-SVR, RBF-SVR) and the methods from the literature (SVD [7], SVD [10])

Figure 9. CMS curves for multi-view gait recognition using multi-view to one-view transformation based on RBF-kernel-based SVR

For each bar chart in Figure 8, probe data under one particular viewing angle is transformed to a feature set under another view that matches one of the views from the gallery data. Then the gait similarity is measured to determine the multi-view gait recognition rate. In Figure 9, Cumulative Match Scores (CMS) are used to demonstrate the results of multi-view to one-view transformation based on the RBF-SVR method. The CMS at rank r means that the top r matches must include the real identity.

From the experimental results, we can conclude the following key points. (1) The RBF kernel provides the highest accuracy, followed by the polynomial and linear kernels respectively. (2) Compared with the SVD-based approaches [7][10], the proposed method (RBF-SVR) significantly improves multi-view gait recognition performance. RBF-SVR achieves accuracy of up to 95% for close viewing angles (18° difference), whereas SVD [7] and SVD [10] achieve only up to 85% and 70% accuracy respectively. (3) Figure 9 shows that the multi-view gait recognition performance of multi-view to one-view transformation is significantly better than that of one-view to one-view transformation. In addition, the proposed approach (RBF-SVR) also performs better than the SVD-based methods [7][10] for multi-view to one-view transformation. (4) It is clearly seen from Figure 8 that transformation between closer viewing angles results in better performance, because features under closer views share more common gait information.

7. Conclusion

This paper has proposed new multi-view gait recognition using a View Transformation Model (VTM) based on Support Vector Regression (SVR). In our study, multi-view gait recognition is re-formulated as a regression problem. This is a completely different point of view from the typical SVD-based methods [7][10], which treat multi-view gait recognition as a matrix decomposition problem. In the proposal, a VTM is constructed from regression processes based on local motion feature selection through GEIs. The well-generated VTMs can efficiently transform GEIs under various viewing angles into a common viewing angle, after which gait similarity measurement can be conducted without difficulty.

The proposed approach has been verified with a large multiple-view gait database. Compared with the SVD-based methods [7][10], the proposed SVR-based method significantly improves the performance of multi-view gait recognition. In addition, by using local features, the proposed view transformation system is more robust to noise. Besides, its complexity is controllable because it does not require any camera calibration or a complicated multi-camera system.

Acknowledgement

1. NICTA is funded by the Australian Government as represented by the Department of Broadband, Communications and the Digital Economy and the Australian Research Council through the ICT Centre of Excellence program.
2. Portions of the research in this paper use the CASIA Gait Database collected by the Institute of Automation, Chinese Academy of Sciences.

References

[1] A. Ben-Hur, C. S. Ong, S. Sonnenburg, B. Scholkopf, and G. Ratsch. Support vector machines and kernels for computational biology. PLoS Computational Biology, 2008.
[2] C. BenAbdelkader. Gait as a biometric for person identification in video. Ph.D. thesis, University of Maryland, 2002.
[3] R. Bodor, A. Drenner, D. Fehr, O. Masoud, and N. Papanikolopoulos. View-independent human motion classification using image-based reconstruction. Image and Vision Computing, 2009.
[4] F. Jean, R. Bergevin, and A. B. Albu. Computing and evaluating view-normalized body part trajectories. Image and Vision Computing, 2009.
[5] T. Joachims. SVM-Light support vector machine, version 6.02, 2008.
[6] A. Kale, K. R. Chowdhury, and R. Chellappa. Towards a view invariant gait recognition algorithm. IEEE Conference on Advanced Video and Signal Based Surveillance, 2003.
[7] W. Kusakunniran, Q. Wu, H. Li, and J. Zhang. Multiple views gait recognition using view transformation model based on optimized gait energy image. 2nd IEEE International Workshop on Tracking Humans for the Evaluation of their Motion in Image Sequences, 2009.
[8] L. Lee. Gait analysis for classification. Ph.D. thesis, Massachusetts Institute of Technology, 2002.
[9] F. Lv, T. Zhao, and R. Nevatia. Camera calibration from video of a walking human. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006.
[10] Y. Makihara, R. Sagawa, Y. Mukaigawa, T. Echigo, and Y. Yagi. Gait recognition using a view transformation model in the frequency domain. European Conference on Computer Vision, 2006.
[11] N. A. Sakhanenko, G. F. Luger, H. E. Makaruk, J. B. Aubrey, and D. B. Holtkamp. Shock physics data reconstruction using support vector regression. International Journal of Modern Physics C, 2006.
[12] G. Shakhnarovich, L. Lee, and T. Darrell. Integrated face and gait recognition from multiple views. IEEE Conference on Computer Vision and Pattern Recognition, 2001.
[13] A. J. Smola and B. Scholkopf. A tutorial on support vector regression. Statistics and Computing, 2004.
[14] A. Utsumi and N. Tetsutani. Adaptation of appearance model for human tracking using geometrical pixel value distributions. In Proceedings of the 6th Asian Conference on Computer Vision, 2004.
[15] S. Yu, D. Tan, and T. Tan. Modelling the effect of view angle variation on appearance-based gait recognition. In Proceedings of the 7th Asian Conference on Computer Vision, 2006.