
Support Vector Regression for Multi-View Gait Recognition based on Local Motion Feature Selection


Worapan Kusakunniran1,4, Qiang Wu2, Jian Zhang1,4, and Hongdong Li3,4
1 School of Computer Science and Engineering - University of New South Wales,
2 School of Computing and Communications - University of Technology Sydney,
3 Research School of Information Sciences and Engineering - Australian National University,
and 4 National ICT Australia-NICTA
{worapan.kusakunniran@nicta.com.au, qiang.wu@uts.edu.au, jian.zhang@nicta.com.au, hongdong.li@nicta.com.au}

978-1-4244-6985-7/10/$26.00 ©2010 IEEE

Abstract

Gait is a well recognized biometric feature that can be used to identify a human at a distance. In real environments, however, appearance changes of individuals due to viewing angle changes cause many difficulties for gait recognition. This paper re-formulates this problem as a regression problem. A novel solution is proposed to create a View Transformation Model (VTM) across different points of view using Support Vector Regression (SVR). To facilitate the regression process, a new method is proposed to seek a local Region of Interest (ROI) under one viewing angle for predicting the corresponding motion information under another viewing angle. The well constructed VTM is thus able to transfer gait information under one viewing angle into another viewing angle, which achieves view-independent gait recognition: gait features under various viewing angles are normalized into a common viewing angle before similarity measurement is carried out. Extensive experimental results on a widely adopted benchmark dataset demonstrate that the proposed algorithm achieves significantly better performance than the existing methods in the literature.

1. Introduction

Gait has recently gained considerable attention in the human identification field because it is a biometric feature that can be efficiently recognized at a distance. It can be analyzed and recognized by computer vision procedures according to global shape statistics, motion frequency, and temporal and spatial changes of the human body.

In real applications, appearance change due to a change of viewing angle or walking direction is one of the main difficulties for gait recognition. This is because a person can walk freely in any direction within a single camera's field of view. The viewing angle may also change when a person walks from one camera into another camera with different parameter settings. Gait recognition accuracy drops drastically when the viewing angle is changed [2] [15].

To overcome this difficulty, three major categories of methods have been investigated recently. The first category performs gait recognition under a calibrated multi-camera system. This kind of approach requires a complicated multiple-camera setup instead of a simple single-camera system. A gait feature can be reconstructed under any view using 3D structure information. Bodor et al. [3] applied image-based rendering to reconstruct the gait feature for any required viewing angle, combining several gaits acquired by various cameras under different views. Shakhnarovich et al. [8] and Lee [12] introduced an image-based visual hull (IBVH), computed from a set of monocular views captured from multiple cameras, to render arbitrary views for gait recognition. Although this category of approaches can provide reliable performance, it requires the costly and complicated setup of a cooperating multi-camera system.

In the second category, gait features invariant to viewing angle change are developed. Jean et al. [4] introduced a method to compute and evaluate view-normalized trajectories of pedestrian body parts obtained from monocular video sequences. It used feet and head 2D trajectories from tracked silhouettes to segment the walking trajectory into piecewise linear segments. The normalized trajectories are invariant to viewing angle since they always appear as if seen from a frontal-parallel viewpoint. However, this technique works well only for a limited range of viewing angles, because it requires that the duration of self-occlusion intervals not be longer than the duration of time intervals where both feet are visible. Kale et al. [6] developed a method to synthesize arbitrary-view images from an image under a particular view through perspective projection in the sagittal plane. This approach drops significantly in performance in case of self-occlusion, when the angle between the image plane and the sagittal plane is large.

The third category is to transform gait features from one viewing angle to another one which has been observed in


the gallery data. Thus, gait similarity measurement can be carried out by a standard method. This uses a simpler single-camera system compared with the complicated multi-camera system of the first category. In addition, it creates a more stable gait recognition system, less sensitive to noise than the systems obtained from the second category. The method proposed in this paper belongs to this third category.

Typical methods in the third category establish a View Transformation Model (VTM) through a matrix factorization process adopting the Singular Value Decomposition (SVD) technique [7] [10] [14]. The method proposed in [14] created a VTM using a static image in the spatial domain, while [10] used a frequency-domain gait feature obtained through the Fourier transform. To improve the recognition rate, a Gait Energy Image (GEI) optimized by a Linear Discriminant Analysis (LDA) operation was built before constructing the VTM [7]. These methods [7], [10], [14] created a matrix in which each row contains gait information from the same viewing angle but different subjects, and each column contains gait information from the same subject but different viewing angles. SVD then factorizes the gait matrix into view-independent and subject-independent sub-matrices, which are used to construct the VTM for multi-view gait recognition.

These methods assume that the gait feature matrix in the training dataset can be completely decomposed into a view-independent sub-matrix and a subject-independent sub-matrix without overlapping elements. This assumption has not been clearly verified mathematically, so an optimized VTM is not guaranteed. Moreover, these methods always use a global feature, the complete silhouette, to perform view transformation. A global feature may also include noise and uncertainty due to other environmental changes, and it cannot effectively deal with partial occlusion either. In addition, the size of the VTM in these methods depends not only on the gait feature dimension but also on the size of the training dataset. Normally, a large training dataset is required to obtain a well constructed VTM; the VTM can then become large as well, which leads to unwelcome computational complexity. These drawbacks are tackled in this paper by re-formulating the problem of gait recognition across various views as a regression problem.

In this paper, we propose a new approach in which the Gait Energy Image (GEI) is first constructed using silhouettes obtained from complete walking cycle(s) of a walking human. Then, a set of VTMs based on GEI is created. Each VTM is used for view transformation between a pair of viewing angles. To facilitate the regression process, a new method is proposed to seek a local motion feature under one viewing angle, a Region of Interest (ROI), for predicting the corresponding motion information under another viewing angle. The carefully selected ROI is able to regressively predict gait information under the target viewing angle. Moreover, the regression relation so defined is independent of the subjects observed. In our study, Support Vector Regression (SVR) is used as the regression tool because of its well recognized advantages.

Compared with the existing VTMs based on SVD, this new view transformation is carried out in the fundamental and visible feature domain, GEI. Thus, the performance can potentially be further improved by adopting refining operations on GEI. Moreover, the proposed approach is less sensitive to noise, since regression is carried out between the reference point and a locally selected ROI, regardless of the global gait feature. In addition, the computational complexity can be well controlled, since the size of the VTM depends only on the size of the ROI and the number of support vectors. The proposed approach performs significantly better than the existing methods in the same category.

The rest of this paper is organized as follows. Basic techniques and gait feature extraction are described in sections 2 and 3 respectively. Detailed construction of the VTM is explained in section 4. Gait similarity measurement is stated in section 5. Experimental results are analyzed in section 6 and the conclusion is drawn in section 7.

2. Basic techniques

This section describes two underlying concepts: Support Vector Regression (SVR) and VTM construction based on Singular Value Decomposition (SVD). SVR is the regression tool adopted in the proposed approach. SVD-based VTM construction [7][10] is the baseline method for performance comparison on multi-view gait recognition.

2.1. Regression for VTM construction

To achieve gait measurement between different viewing angles, the task is re-formulated as a regression problem. In particular, SVR has been adopted as the regression tool considering its advantages on several aspects.

2.1.1 Problem re-formulation

In this proposal, a view transformation of the gait feature is re-formulated as a regression problem. A set of VTMs is constructed for multi-view gait recognition. Each VTM is used to transfer a gait feature under one viewing angle into another viewing angle. A VTM consists of multiple regression processes. Each regression process aims to predict a target pixel on the GEI under the target viewing angle from a selected ROI on the GEI under the source viewing angle. The ROI is carefully selected based on local motion relationships.

In statistics, a regression process focuses on the relationship between a dependent variable (a target pixel on the GEI under one viewing angle) and independent variables (a selected ROI on the GEI under another viewing angle). Regression analysis estimates the conditional expectation of the target pixel given the ROI. Thus, if a proper ROI can be located, the constructed VTM based on these regressive relationships can efficiently transform a gait feature from the source viewing angle to the target viewing angle.

As a regression tool, SVR is chosen for the proposed approach because of the following advantages [11]. First, it features good generalization performance, a core requirement for most regression applications including this research. Secondly, its representation is sparse, because a regression model obtained by SVR depends only on a subspace of the training dataset; the dimension of the VTM is therefore sparse and controllable. Thirdly, unlike some other regression tools such as neural networks, SVR does not suffer from a local minimum problem: its solution is unique and globally optimal. Hence, we can obtain a globally optimized version of the VTM based on the supplied training data. Fourthly, SVR is a kernel-based regression technique; it thus allows the system to work with an arbitrarily large feature space, not limited to the input space as with a linear kernel.

2.1.2 SVR

The SVR concept is briefly explained in the following; more details can be found in [13]. Given the data as {Si : (xi, yi) | xi ∈ X, yi ∈ R, i = 1...k}, where X denotes the space of input patterns and k is the number of samples, the regression equation f(x) for the linear case is defined as:

f(x) = <w, x> + b    (1)

where w ∈ X, b ∈ R, and <·,·> denotes the dot product in X.

Figure 1. The soft margin loss setting for SVR

In ε-SVR [13], as shown in Figure 1, the goal is to find a function f(x) that satisfies the following three fundamental aspects. First, the error between the observed target yi and the predicted value f(xi) is disregarded as long as it is less than ε. Secondly, the greatest possible flatness is attempted by minimizing the norm ||w||² = <w, w>. Thirdly, a soft margin loss function is allowed: slack variables ξi, ξi* are introduced to cope with data points that lie outside the absolute ε region. Hence, ε-SVR arrives at the objective formulation:

minimize    (1/2)||w||² + C Σ_{i=1}^{k} (ξi + ξi*)
subject to  yi − <w, xi> − b ≤ ε + ξi,
            <w, xi> + b − yi ≤ ε + ξi*,
            ξi, ξi* ≥ 0    (2)

where the constant C determines the trade-off between the flatness of f and the amount up to which deviations larger than ε are tolerated.

The optimization problem (2) can be solved more easily in its dual formulation. Thus, a standard dualization method utilizing Lagrange multipliers is used and yields the dual optimization problem:

maximize    −(1/2) Σ_{i,j=1}^{k} (αi − αi*)(αj − αj*) <xi, xj>
            − ε Σ_{i=1}^{k} (αi + αi*) + Σ_{i=1}^{k} yi (αi − αi*)
subject to  Σ_{i=1}^{k} (αi − αi*) = 0,  αi, αi* ∈ [0, C]    (3)

where αi, αi* ≥ 0 are the Lagrange multipliers.

The support vector expansion can also be obtained, in which w is completely described as a linear combination of the training patterns xi:

w = Σ_{i=1}^{k} (αi − αi*) xi    (4)

Moreover, the complete algorithm can be described in terms of dot products between the data:

f(x) = Σ_{i=1}^{k} (αi − αi*) <xi, x> + b    (5)

The concept described above can be considered a linear kernel-based SVR. To apply a non-linear kernel to SVR, the dot products <xi, xj> in equations (3) and (5) are replaced with an alternative kernel k(xi, xj). Two non-linear kernels [1] are used in this study. The first is the polynomial kernel of degree d, defined as:

K^polynomial_{d,s,k}(xi, xj) = (s <xi, xj> + k)^d    (6)

The degree of the polynomial kernel controls the flexibility of the resulting regression models. The lowest-degree polynomial is the linear kernel, which is not sufficient when a nonlinear relationship between features exists. The second kernel, also widely used, is the Gaussian or Radial Basis Function (RBF) kernel, defined as:

K^RBF_σ(xi, xj) = exp(−(1/σ)||xi − xj||²)    (7)

Here σ > 0 is a parameter that controls the width of the Gaussian and plays a role similar to the degree of the polynomial kernel. σ in the Gaussian kernel and d in the polynomial kernel determine the flexibility of the produced SVR in fitting the data; larger d or smaller σ may lead to over-fitting.

2.2. SVD based VTM construction and its limitation

Any given matrix A ∈ R^{n×m} has a decomposition A = U S V^T such that U is an (n × n) orthogonal matrix whose columns are the left singular vectors of A, S is an (n × m) matrix with non-negative diagonal entries which are the singular values of A, and V is an (m × m) orthogonal matrix whose columns are the right singular vectors of A. SVD can thus perform a factorization of the matrix A. The diagonal values of S are the square roots of the eigenvalues of A^T A and A A^T. Consequently, the left singular vectors are eigenvectors of A A^T and the right singular vectors are eigenvectors of A^T A.

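To make the SVD-based VTM construction concrete, the following NumPy sketch factorizes a small gait matrix as in equation (8) of the next section and transfers a gait feature between two views via the pseudo-inverse, as in equation (10). The matrix sizes and random data here are illustrative stand-ins only, not real GEI features from the paper's experiments.

```python
import numpy as np

# Illustrative sizes (assumed for this sketch): I viewing angles,
# K training subjects, Ng-dimensional gait feature per view.
I, K, Ng = 3, 5, 16
rng = np.random.default_rng(0)

# Synthetic stand-in for the gait matrix A of eq. (8):
# row blocks are views, columns are subjects.
A = rng.random((I * Ng, K))

# SVD factorization A = U S V^T (the reduced form suffices here).
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# P = U S is the claimed subject-independent part; each P_theta_i is the
# (Ng x K) block that projects an intrinsic vector v^k to view theta_i.
P = U * s                      # same as U @ np.diag(s)
P_blocks = [P[i * Ng:(i + 1) * Ng, :] for i in range(I)]

# Eq. (9): the gait feature of subject k under view 0 is P_theta_0 @ v^k,
# where v^k is the k-th column of V^T.
k = 2
g_view0 = P_blocks[0] @ Vt[:, k]
assert np.allclose(g_view0, A[0:Ng, k])

# Eq. (10): transform subject k's feature from view 0 to view 1 without
# knowing v^k, using the pseudo-inverse of P_theta_0.
g_est_view1 = P_blocks[1] @ np.linalg.pinv(P_blocks[0]) @ A[0:Ng, k]
print(np.allclose(g_est_view1, A[Ng:2 * Ng, k]))
```

Note that the last transformation is exact here only because the synthetic A has rank at most K and each P_θi block has full column rank; with real gait matrices the transfer is an approximation, which relates directly to the limitations discussed for this category of methods.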
To adopt SVD for VTM construction [7], [10], [14], the first step is to build the gait matrix A. In this matrix, each row contains gait information from the same viewing angle for different subjects, and each column contains gait information from the same subject under different angles. SVD then factorizes the gait matrix A into a view-independent sub-matrix and a subject-independent sub-matrix. Let g^k_θi be the gait signature of subject k under viewing angle θi. The factorization process by SVD is:

[ g^1_θ1 ... g^K_θ1 ]                 [ P_θ1 ]
[   ...       ...   ]  =  U S V^T  =  [  ... ] [ v^1 ... v^K ]    (8)
[ g^1_θI ... g^K_θI ]                 [ P_θI ]

A vector v^k is an intrinsic gait feature of the k-th subject for any viewing angle. P_θi is a projection matrix which projects the intrinsic vector v of any subject to the gait feature vector under a specific viewing angle θi. Thus, a gait feature can be written in factorized form as:

g^k_θi = P_θi v^k    (9)

The subject-independent matrix P is used as a VTM common to all subjects. For example, gait feature transformation from viewing angle θi to θj is obtained by:

g^k_θj = P_θj P^+_θi g^k_θi    (10)

where P^+_θi is the pseudo-inverse matrix of P_θi.

There are three main limitations in VTM construction based on the SVD factorization process [10]. First, from equation (8), the method claims P = US as a subject-independent sub-matrix and V^T as a view-independent sub-matrix; however, this claim lacks a concrete mathematical proof, so there is no guarantee of obtaining an optimized VTM. Secondly, from equation (8), the dimension of matrix P is (I × Ng) × K and the dimension of matrix V is K × K, where I is the total number of observed viewing angles, Ng is the gait feature dimension, and K is the total number of training samples. Thus, the size of the VTM clearly depends on the amount of training data, which can lead to huge memory consumption and high complexity when a system requires a large training dataset. Thirdly, a gait feature g^k_θi is a global feature, which is sensitive to background noise and to any defect in any local area, such as partial occlusion.

3. Gait feature extraction

Gait is a periodic action; therefore, gait analysis should be operated within complete walking cycle(s). This paper adopts the method from [7] to determine the period of each gait sequence. The idea is to build a waveform of the aspect ratio (width/height) of the silhouette bounding box along the time of the image sequence. Normalization and autocorrelation processes are then applied to the waveform to obtain the repeated curve pattern, which indicates the gait period.

The proposed approach then captures the gait information in the spatial domain through the Gait Energy Image (GEI) [7]. Figure 2 illustrates examples of GEI under various viewing angles for three different subjects.

Figure 2. Spatial-domain GEI under various viewing angles

Figure 2 shows that the appearance-based GEI varies with the viewing angle, so it is not efficient to directly measure the similarity between two GEIs under different views. Thus, in the proposed approach, a constructed VTM is required to transform a GEI to the same view as another GEI before the similarity is measured.

4. View transformation model construction

Let VTM_θi→θj denote a view transformation model that is used to transform a GEI from view θi to θj, let g^k_θi denote the GEI of subject k under view θi, let p^k_θj denote the p-th pixel of g^k_θj, and let ROI_{p,θj}(g^k_θi) denote a region of interest, a group of pixels on g^k_θi, selected according to its relation with the pixel p^k_θj.

As mentioned in section 2.1.1, a VTM_θi→θj consists of multiple regression processes. Each regression process is used to predict one pixel value on g^k_θj from relevant pixels on g^k_θi. The regression model f is defined as:

p^k_θj ≈ f(θi, θj, k, p) = <w, ROI_{p,θj}(g^k_θi)> + b    (11)

This equation is equivalent to the standard regression equation (1), where x is ROI_{p,θj}(g^k_θi) and y is p^k_θj.

To build up the VTM, a sufficient number of training pairs (g_θi, g_θj) from different subjects is required. The regression technique used for the training process is Support Vector Regression (SVR); the three kernels (linear, polynomial, and RBF) are attempted in this study.

In order to achieve reliable regression, a local ROI of relevant pixels on g_θi is selected to estimate the corresponding target pixel p on g_θj. Without a reliable ROI, the regression process will fail. An ROI, or group of pixels, is used instead of a single pixel to predict the target pixel for two supporting reasons: first, to avoid variation and noise from an individual pixel; secondly, a single source pixel might not be efficient and robust enough to predict the target pixel.

The ROI is claimed to be able to predict the motion information of the pixel p. There are two main steps to obtain an efficient ROI. The first step is to find ṕ on g_θi, where ṕ is the closest estimated position on g_θi for p. The ROI in our study may contain consecutive pixels and also disconnected pixels. In the second step of ROI selection, we aim to locate all these pixels in the neighboring region of ṕ.

4.1. Estimation of ṕ

The pixel ṕ is not always located at the same coordinate as the pixel p, because GEIs under two views (g_θi, g_θj) may have different 2D display structures. The GEIs are classified into four categories according to the walking directions of subjects (see Figure 3). Moreover, in this study, a GEI is divided into six independent areas by one vertical cut and two horizontal cuts, as shown in Figure 3. These three cuts are not trivial; they are generated from motion information and the geometric distribution of human body parts.

Figure 3. The top row shows the four possible walking direction types. The bottom row shows example GEI displays of the four types. In the top row, the dashed arrow is the walking direction, the gray vertical line is the optical axis, and the green horizontal line is the image plane. In the bottom row, the blue vertical line is used for side segmentation and the red horizontal lines are used for body part segmentation.

A GEI is divided into two sides by a vertical cut. The vertical cut, called the major axis, is calculated from the eigenvector of the covariance matrix of the tracked silhouette [9]. For each camera viewing angle, the major axes calculated from different sample silhouettes can be slightly different; thus, their average (in terms of the axis's slope) is used as the representative major axis of the GEI under that camera viewing angle.

The four possible 2D displays of walking manners with regard to the vertical cut are shown in Figure 3; each GEI must belong to one of these four manners. The first two cases usually occur when a person walks within ±18° of the camera optical axis; the major axis then divides a GEI into a "Left" side and a "Right" side. The first case is walking away from the camera; the second case is walking toward the camera. The last two cases occur when a person walks at an angle larger than ±18° away from the camera optical axis; the major axis then divides a GEI into a "Front" side and a "Back" side. The third case is walking from the left side to the right side of the camera, while the fourth case is walking from the right side to the left side.

Next, two horizontal cuts are used to divide the human body into three major parts: head (hair + face), upper body (torso + arms), and lower body (legs + feet). The proportions of the human body in GEIs under two views can be significantly different when the two views are captured from different cameras with different parameter settings. For each viewing angle, the average proportions of the body part segmentation are estimated based on training data.

To obtain the two horizontal cuts, the GEI is first projected onto the major axis to create a histogram, which is then smoothed by an average filter. Example results are shown in Figure 4.

Figure 4. Projection histograms for the body part segmentation process. The blue line is the border between head and upper body, and the red line is the border between upper body and lower body.

The border between the head and the upper body is at the first saddle point from top to bottom, i.e. from head to feet. The border between the upper body and the lower body is claimed to be at the hip, which, based on observation, is approximately located at the peak position of the histogram (see Figure 4).

As the result of one vertical cut and two horizontal cuts, a GEI is divided into six areas. The next step is to process the area matching between g_θi and g_θj. For the vertical cut, the sides between the two views have to be correctly matched in terms of walking direction. According to Figure 3, if a view transformation is conducted between/inside the first two cases, the "Left" side must match the "Left" side and the "Right" side must match the "Right" side. If a view transformation is conducted between/inside the last two cases, the "Front" side must match the "Front" side and the "Back" side must match the "Back" side. For the horizontal cuts, body part areas between the two views have to be correctly matched; for example, the upper body area of g_θi is simply matched with the upper body area of g_θj.

Figure 5. ROI selection: p is a target pixel under view θj, ṕ is the estimated position of p under view θi, and the yellow area contains the candidate pixels for ROI_p

Let A_i be an area from view θi that is likely to contain the pixel corresponding to p from area A_j of view θj. For example, as shown in Figure 5, if p is in the front-lower body area under view θj, then ṕ must also be in the front-lower body area, but under view θi. The areas A_i and A_j may have different sizes and shapes because they are captured under different views or by different cameras. Thus, the position of ṕ in A_i is the position corresponding to that of p in A_j, proportional to the sizes of the areas.

4.2. Selection of ROI elements

Let A_c be a local region that contains ṕ and its neighboring pixels. Then the ROI of T pixels is defined as:

ROI = {p*_1, p*_2, ..., p*_T},  p*_t ∈ A_c    (12)
where p*_t is a pixel with one of the T highest values of |COR(p*_t, p)|, and COR(·,·) is the correlation coefficient of the two variables. This provides the T most relevant pixels, those with the closest motion relationship to the pixel p. The size T of the ROI is decided based on cross-validation tests. Figure 6 shows examples of ROI selections based on the proposed method.

Figure 6. The first row is the ROI selection for VTM_36°→54° and the second row is the ROI selection for VTM_18°→162°. For each row, the first image contains the allocated ROI (red pixels) for predicting the corresponding target pixel (red pixel) shown in the second image. The third image shows the relationship between the target pixel (y-axis) and the selected pixel from the corresponding source ROI (x-axis) over various pairs of training samples (g^k_θi, g^k_θj).

4.3. Multi-view to one-view transformation

In practice, one-view to one-view transformation is not precise enough, because orthogonality is degenerated when processing a 2D silhouette image. To overcome this problem, a gait feature under one particular angle can be estimated from the features under multiple views. For example, two gait features under views θi and θm are transformed to the feature under view θj as:

p^k_θj ≈ <w, ROI_{p,θj}(g^k_θi) : ROI_{p,θj}(g^k_θm)> + b    (13)

where ":" is a special concatenation of the two ROI vectors such that the operator selects the T pixels that have the highest correlations with the target pixel, T being the size of the final ROI for regression.

In our study, it is seen that gaits from multiple views provide more sufficient information; thus, equation (13) generates more precise view transformation results.

5. Gait similarity measurement

This paper focuses on investigating the performance of the new VTM construction. To achieve similarity measurement, the simple but widely adopted Euclidean distance is used once two gait features under the same viewing angle, g^i and g^j, are obtained. The similarity of the two features (g^i, g^j) is measured as:

d(g^i, g^j) = sqrt( Σ_{n=1}^{N} (g^i(n) − g^j(n))² )    (14)

where d(g^i, g^j) is the distance between gait signatures g^i and g^j, and N is the dimension of the gait feature. The smaller the value of d, the higher the possibility that the gait signatures g^i and g^j belong to the same subject.

6. Experimental results

The publicly available CASIA gait database B was used for our experiments. It contains 124 subjects, with gait data captured from 11 viewing angles, namely 0°, 18°, 36°, 54°, 72°, 90°, 108°, 126°, 144°, 162°, and 180°. There are 6 video sequences for each person under each viewing angle; therefore, we use a total of 11×124×6 = 8184 gait sequences. The dataset is divided into 2 groups. The first group contains 24 subjects for the VTM construction process; the second group contains the remaining 100 subjects for performance evaluation on multi-view gait recognition.

The implementation of SVR is based on the well-known SVM-Light Support Vector Machine library [5]. The proposed SVR-based method requires tuning of SVR parameters such as ε in equation (2), C in equation (3), d in equation (6), and σ in equation (7). Some parameters can be roughly estimated based on each specific case of VTM construction. For example, ε should be larger when constructing the VTM for a pair of viewing angles with a larger difference, and the SVR model should be more flexible than the model for closer viewing angles. Besides, d (the degree of the polynomial kernel) and σ (the width of the RBF kernel) can be adjusted based on the over-fitting of the regression models on a validation dataset.

The experiments were carried out using SVR with three different kernels (linear, polynomial, and RBF). The performance of the proposed technique (SVR) is compared with the SVD-based methods [7][10]: the method [7] applied SVD on an optimized GEI, while the method [10] applied SVD on a frequency-domain gait feature. Figure 7 shows examples of transformed GEI images using the proposed technique.

Figure 7. (a) is g_0°. (b), (c), and (d) are g_18° transformed from g_0° by using SVR with linear, polynomial, and RBF kernels respectively. (e) is g_18°. (f) is g_126°. (g), (h), and (i) are g_108° transformed from g_126° by using SVR with linear, polynomial, and RBF kernels respectively. (j) is g_108°.

The experiments were completed on a machine with a 2.66 GHz quad-core processor and 4 GB of RAM. The size of the GEI is 30×30 pixels and the size of the ROI is 30 pixels. Based on these specifications, the training time for one VTM using the proposed method is approximately 10-20 minutes, depending on the setting of the SVR parameters and the choice of SVR kernel. In addition, the performance and complexity of the proposed method also depend on the dimension of the GEI, the size of the ROI, and the content and size of the training gait dataset.

Figure 8 illustrates the first-rank (top one) gait identity matching for multi-view gait recognition by using
Figure 8. Comparisons of first-rank multi-view gait recognition performance between the proposed approaches (linear-SVR, polynomial-SVR, RBF-SVR) and the methods (SVD [7], SVD [10]) from the literature

Figure 9. CMS curves for multi-view gait recognition using multi-view to one-view transformation based on RBF kernel-based SVR
five different methods: linear-SVR, polynomial-SVR, RBF-SVR, SVD [7], and SVD [10]. For each bar chart in Figure 8, probe data under one particular viewing angle is transformed to a feature set under another view that matches one of the views from the gallery data. Then the gait similarity is measured to determine the multi-view gait recognition rate. In Figure 9, Cumulative Match Scores (CMS) are used to demonstrate the results of multi-view to one-view transformation based on the RBF-SVR method. The CMS at rank r means the top r matches must include the real identity.

From the experimental results, we can conclude the following key points. (1) The RBF kernel provides the highest accuracy, followed by the polynomial and linear kernels respectively. (2) Compared with the SVD-based approaches [7][10], the proposed method (RBF-SVR) significantly improves multi-view gait recognition performance. RBF-SVR achieves accuracy up to 95% for close viewing angles (18° difference), whereas SVD [7] and SVD [10] only achieve up to 85% and 70% accuracy respectively. (3) Figure 9 shows that the multi-view gait recognition performance of multi-view to one-view transformation is significantly better than that of one-view to one-view transformation. In addition, the proposed approach (RBF-SVR) also performs better than the SVD-based methods [7][10] for multi-view to one-view transformation. (4) It is clearly seen from Figure 8 that transformation between closer viewing angles results in better performance, because features under closer views share more common gait information.

7. Conclusion

This paper has proposed a new multi-view gait recognition

Acknowledgement

1. NICTA is funded by the Australian Government as represented by the Department of Broadband, Communications and the Digital Economy and the Australian Research Council through the ICT Centre of Excellence program.
2. Portions of the research in this paper use the CASIA Gait Database collected by the Institute of Automation, Chinese Academy of Sciences.

References

[1] A. Ben-Hur, C. S. Ong, S. Sonnenburg, B. Scholkopf, and G. Ratsch. Support vector machines and kernels for computational biology, 2008. Public Library of Science Computational Biology.
[2] C. BenAbdelkader. Gait as a biometric for person identification in video, 2002. Ph.D. thesis, University of Maryland.
[3] R. Bodor, A. Drenner, D. Fehr, O. Masoud, and N. Papanikolopoulos. View-independent human motion classification using image-based reconstruction, 2009. Journal of Image and Vision Computing.
[4] F. Jean, R. Bergevin, and A. B. Albu. Computing and evaluating view-normalized body part trajectories, 2009. Journal of Image and Vision Computing.
[5] T. Joachims. SVM-Light Support Vector Machine, 2008. Version 6.02.
[6] A. Kale, K. R. Chowdhury, and R. Chellappa. Towards a view invariant gait recognition algorithm, 2003. IEEE Conference on Advanced Video and Signal Based Surveillance.
[7] W. Kusakunniran, Q. Wu, H. Li, and J. Zhang. Multiple views gait recognition using view transformation model based on optimized gait energy image, 2009. 2nd IEEE International Workshop on Tracking Humans for the Evaluation of their Motion in Image Sequences.
[8] L. Lee. Gait analysis for classification, 2002. Ph.D. thesis, Massachusetts Institute of Technology.
[9] F. Lv, T. Zhao, and R. Nevatia. Camera calibration from video of a walking human, 2006. IEEE Transactions on Pat-
nition using View Transformation Model (VTM) based on tern Analysis and Machine Intelligence.
Support Vector Regression (SVR). In our study, multi-view [10] Y. Makihara, R. Sagawa, Y. Mukaigawa, T. Echigo, and
gait recognition is re-formulated as a regression problem. Y. Yagi. Gait recognition using a view transformation model
It is completely under different point of view when be- in the frequency domain, 2006. European Conference on
ing compared with the typical SVD based method [7][10] Computer Vision.
which considered multi-view gait recognition as a matrix [11] N. A. Sakhanenko, G. F. Luger, H. E. Makaruk, J. B. Aubrey,
decomposition problem. In the proposal, a VTM is con- and D. B. Holtkamp. Shock physics data reconstruction us-
structed from regression processes based on local motion ing support vector regression, 2006. International Journal of
feature selection through GEIs. The well generated VTMs Modern Physics C.
can efficiently transform GEIs under various viewing an- [12] G. Shakhnarovich, L. Lee, and T. Darrell. Integrated face
and gait recognition from multiple views, 2001. IEEE Con-
gles into a common viewing angle. Then gaits similarity
ference on Computer Vision and Pattern Recognition.
measurement can be conducted without difficulty.
[13] A. J. Smola and B. Scholkopf. A tutorial on support vector
The proposed approach has been verified with a large regression, 2004. Statistics and Computing.
multiple views gait database. Compared with the SVD [14] A. Utsumi and N. Tetsutani. Adaptation of appearance model
based method [7][10] , the proposed SVR based method sig- for human tracking using geometrical pixel value distribu-
nificantly improves the performance of the multi-view gait tions, 2004. In Proceedings of the 6th Asian Conference on
recognition. In addition, by using local feature, the pro- Computer Vision.
posed view transformation system is more robust to noise. [15] S. Yu, D. Tan, , and T. Tan. Modelling the effect of view
Beside, its complexity is controllable because it does not re- angle variation on appearance-based gait recognition, 2006.
quire any camera calibration and complicated multi-camera In Proceedings of the 7th Asian Conference on Computer
system. Vision.
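To make the regression formulation above concrete, the sketch below trains one RBF-SVR per target-view pixel, predicting its motion intensity from a local window of source-view GEI pixels. This is only an illustrative stand-in, not the paper's implementation: the random data, the fixed ROI window, and all variable names are our assumptions, whereas the proposed method selects the ROI by learned local motion feature selection.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Hypothetical training data: paired GEIs of the same subjects under a
# source view and a target view, flattened to vectors (shapes assumed).
n_subjects, n_pixels = 50, 64
gei_source = rng.random((n_subjects, n_pixels))
gei_target = rng.random((n_subjects, n_pixels))

def roi(gei, j, half_width=3):
    """Fixed local window of source-view pixels around position j --
    a simplified stand-in for the paper's learned ROI selection."""
    lo, hi = max(0, j - half_width), min(gei.shape[1], j + half_width + 1)
    return gei[:, lo:hi]

# One RBF-SVR per target pixel: each regressor maps the local source-view
# ROI to the motion value of that pixel under the target view.
models = []
for j in range(n_pixels):
    m = SVR(kernel="rbf", C=1.0, epsilon=0.01)
    m.fit(roi(gei_source, j), gei_target[:, j])
    models.append(m)

# Transform a new source-view GEI into the target view pixel by pixel.
probe = rng.random((1, n_pixels))
predicted_target = np.array([models[j].predict(roi(probe, j))[0]
                             for j in range(n_pixels)])
print(predicted_target.shape)  # one predicted value per target pixel
```

In the paper's setting, the training pairs would be real GEIs of the same subjects captured under the two views, and a separate VTM would be trained for each (source view, target view) pair.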
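The evaluation protocol used above (a smaller Euclidean distance d between gait features means a more likely identity match, and the CMS at rank r is the fraction of probes whose true identity appears among the top r gallery matches) can be sketched as follows. The toy feature vectors and variable names are our own illustration, not data from the experiments.

```python
import numpy as np

def cms_curve(probe, gallery, probe_ids, gallery_ids, max_rank):
    """Cumulative Match Scores: fraction of probes whose true
    identity is among the top-r closest gallery entries."""
    hits = np.zeros(max_rank)
    for f, pid in zip(probe, probe_ids):
        # Euclidean distance d between this probe gait feature and
        # every gallery gait feature (smaller d = more similar).
        d = np.linalg.norm(gallery - f, axis=1)
        ranked_ids = np.array(gallery_ids)[np.argsort(d)]
        # Rank of the first correct match (1-indexed).
        match_rank = np.nonzero(ranked_ids == pid)[0][0] + 1
        hits[match_rank - 1:] += 1  # counts toward all ranks >= match_rank
    return hits / len(probe)

# Toy example: 3 gallery subjects, 2 probes.
gallery = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
gallery_ids = [0, 1, 2]
probe = np.array([[0.1, 0.0], [0.9, 1.4]])
probe_ids = [0, 2]
print(cms_curve(probe, gallery, probe_ids, gallery_ids, max_rank=3))
# CMS: 0.5 at rank 1, then 1.0 at ranks 2 and 3
```

By construction the curve is non-decreasing in r, which is why the multi-view to one-view results in Figure 9 rise toward 100% at higher ranks.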