
National Conference On “SOFT COMPUTING”

Sponsored by DRDO
8th & 9th March 2011

HUMAN ACTION RECOGNITION IN VIDEOS FOR VISUAL SURVEILLANCE

1 Ms. N. Prathiba, 2 Mrs. G. Glorindal Selvam
1 M.E. Student, Department of Computer Science and Engineering,
2 Lecturer, Department of Computer Science and Engineering,
Dr. Sivanthi Aditanar College of Engineering, Tiruchendur
mail4prathi@gmail.com

Abstract– This work proposes a set of kinematic features derived from the optical flow for representing complex human actions in videos. Each kinematic feature, when computed from the optical flow of a sequence of images, gives a spatiotemporal pattern. These patterns are represented in the form of dominant kinematic trends, or kinematic modes. For classification, Multiple Instance Learning (MIL) is used to represent each action video as a bag of kinematic modes; each video is then embedded into a kinematic-mode-based feature space. Through this approach, discriminative information about human motion is extracted for reliable recognition in surveillance scenarios.

Keywords– Action recognition, motion, video analysis, principal component analysis, kinematic features

I. INTRODUCTION

Action recognition has applications in video surveillance, human-computer interaction, and multimedia retrieval, among others. It is also very challenging, both because the range of possible human motions is so large and because variations in scene, viewpoint, and clothing add a further layer of complexity. Image Processing, Pattern Recognition, and Computer Vision have matured from recognizing simple objects, textures, and images to recognizing discrete actions, continuous actions, and sequences of actions, called events. A system that detects a person having a heart attack in a room, or a person drowning in a swimming pool, can now be realized.

Action recognition makes use of the flow generated by motion, called optical flow. Optical flow reflects the image changes due to motion during a time interval.

Since motion is an important source of information for classifying human actions, the main objective is to recognize human actions in videos from a set of kinematic features derived from the optical flow of a sequence of images, combined with multiple instance learning. The idea behind the introduction of kinematic features is to convert optical flow information into a more discriminative representation that can improve motion-based action classification.

The proposed kinematic features are: divergence; vorticity; symmetric and antisymmetric optical flow fields; the second and third principal invariants of the flow gradient and rate of strain tensors; and the third principal invariant of the rate of rotation tensor. Each feature is selected to capture a different aspect of the optical flow. For instance, divergence delineates the regions of optical flow that are undergoing expansion due to the movement of different limbs of the human body, while the vorticity feature emphasizes regions of optical flow that are undergoing circular motion.


Each kinematic feature, when computed from the optical flow of a sequence of images, gives rise to a spatiotemporal pattern, which is represented in the form of dominant kinematic trends or kinematic modes. These kinematic modes are computed by performing Principal Component Analysis (PCA) on the spatiotemporal volumes of the kinematic features.

The identified kinematic modes of each kinematic feature are used for the classification of human actions. To do this, the multiple instance learning (MIL) approach is employed. The idea of MIL is to represent each action video as a collection, or “bag,” of kinematic modes, in which each kinematic mode is referred to as an instance representing that video.

Multiple instance learning is a variation on supervised learning. Instead of receiving a set of instances that are labeled positive or negative, the learner receives a set of bags that are labeled positive or negative. Each bag contains many instances. A bag is labeled negative if all the instances in it are negative; on the other hand, a bag is labeled positive if at least one instance in it is positive. From a collection of labeled bags, the learner tries to induce a concept that will label individual instances correctly.

In supervised learning, every training instance is assigned a discrete or real-valued label. In comparison, in MIL, labels are assigned only to bags of instances. In the binary case, a bag is labeled positive if at least one instance in that bag is positive, and negative if all the instances in it are negative. There are no labels on the individual instances.

The goal of MIL is to classify unseen bags or instances by using the labeled bags as the training data. Multiple instance learning is used because it provides flexibility in picking the number of kinematic modes used to represent the action. This is important because a complex action may require more kinematic modes to represent its dynamics than a simple action.

II. OVERVIEW OF ALGORITHM

A video containing an action is converted into a corresponding bag of kinematic modes, whose pictorial representation is shown in Figure 1.1.

[Figure 1.1: Description of the process]

The steps involved in creating the bag of kinematic modes are:
1) A video containing an action is the input to the first step, which computes the optical flow between consecutive frames of the video and produces a stack of optical flow fields.
2) This stack of optical flow fields is the input to the second step, which computes the kinematic features and produces a separate spatiotemporal volume for each feature.
3) The third step takes the volume of each kinematic feature as input, performs PCA, and produces the kinematic modes.
4) Finally, the video is represented as a bag of kinematic modes pooled from all the kinematic features. This bag is then used to conduct the feature-space embedding of the video.
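As an illustration of the bag-labeling rule described above, the following is a minimal Python sketch. The clip names and boolean instance labels are hypothetical placeholders; in the actual pipeline each instance is a kinematic mode, not a ready-made label.

```python
from typing import Sequence

def bag_label(instance_labels: Sequence[bool]) -> bool:
    """MIL rule: a bag is positive if at least one instance is positive,
    and negative only if every instance in it is negative."""
    return any(instance_labels)

# Hypothetical bags of binary instance labels for three video clips.
bags = {"clip_bend": [False, True, False],   # one positive instance -> positive bag
        "clip_walk": [False, False],         # all negative -> negative bag
        "clip_wave": [True, True]}           # positive bag
labels = {name: bag_label(inst) for name, inst in bags.items()}
assert labels == {"clip_bend": True, "clip_walk": False, "clip_wave": True}
```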


III. KINEMATIC FEATURES

Kinematic features are features that are independent of the forces acting on an object, or of its mass, and capture only motion information, such as position, velocity, and acceleration. These are all geometrical and time-related properties of motion. They are useful for recognizing actions because they make the representation independent of the physical appearance of the subject performing the action.

In order to compute the kinematic features, the optical flow of the given video is first found using a block-based correlation algorithm. It is computed by selecting a square interrogation block centered at the same pixel location in two consecutive frames of the sequence. Pixel values in both blocks are mean-normalized, and a correlation surface is obtained by performing cross-correlation in the frequency domain. Peaks are located on the correlation surface and used to compute the displacement of the pixel. The process is repeated for all possible blocks in the image. As a post-processing step, local outliers are removed by applying local median filtering, and the removed vectors are filled in by interpolation of the neighboring flow vectors. The block size employed in our experiments is 16 x 16 pixels. The process is repeated for all frames to generate a stack of optical flows for the video. Thus, optical flow is generated between consecutive frames of the video, as shown in Figure 1.2. (A code sketch of this block-matching scheme is given at the end of this section.)

[Figure 1.2: Optical flow]

A. Divergence

The divergence of a flow field is a scalar quantity defined at a point $(\mathbf{x}, t_i)$ in space and time as:

$$f_1(\mathbf{x}, t_i) = \frac{\partial u(\mathbf{x}, t_i)}{\partial x} + \frac{\partial v(\mathbf{x}, t_i)}{\partial y},$$

where $\partial u(\mathbf{x}, t_i)/\partial x$ and $\partial v(\mathbf{x}, t_i)/\partial y$ are the partial derivatives of the u and v components of the optical flow with respect to the x and y directions at time $t_i$.

This feature is important for discriminating between types of motion that involve independent motion of different body parts. For instance, in the “hand-waving” action only one part of the body is involved, while in the “bending” action the complete upper body plays a role.

B. Vorticity

Vorticity is a measure of the local spin around the axis perpendicular to the plane of the flow field. It can also be defined as circulation per unit area. It is computed at a point $(\mathbf{x}, t_i)$ as follows:

$$f_2(\mathbf{x}, t_i) = \frac{\partial v(\mathbf{x}, t_i)}{\partial x} - \frac{\partial u(\mathbf{x}, t_i)}{\partial y}.$$

It is useful for distinguishing between actions that involve articulated motion and ones that do not. It is also useful for highlighting dynamics in the flow field resulting from local circular motion of the human body or a part of the body. The “bend” action is a good example of this type of motion, where the circular motion of the body is around the perpendicular axis passing through the torso.
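The following NumPy/SciPy sketch illustrates the block-based correlation scheme and the divergence and vorticity maps defined above. It is a simplified sketch rather than the exact implementation: the search-window size is an assumption, unit pixel spacing is used for the derivatives, and the median-filtering and interpolation post-processing steps are omitted.

```python
import numpy as np
from scipy.signal import fftconvolve

def block_flow(f0, f1, block=16, search=8):
    """Block-based correlation flow between two grayscale frames.
    For each 16x16 block of f0, a mean-normalized cross-correlation
    surface against a search window in f1 is computed in the frequency
    domain; the correlation peak gives the block's displacement."""
    h, w = f0.shape
    u = np.zeros((h // block, w // block))
    v = np.zeros_like(u)
    for bi in range(h // block):
        for bj in range(w // block):
            y, x = bi * block, bj * block
            ref = f0[y:y + block, x:x + block].astype(float)
            ref -= ref.mean()                        # mean normalization
            y0, y1 = max(0, y - search), min(h, y + block + search)
            x0, x1 = max(0, x - search), min(w, x + block + search)
            win = f1[y0:y1, x0:x1].astype(float)
            win -= win.mean()
            # cross-correlation = convolution with the flipped template
            surf = fftconvolve(win, ref[::-1, ::-1], mode="valid")
            dy, dx = np.unravel_index(np.argmax(surf), surf.shape)
            v[bi, bj] = (y0 + dy) - y                # vertical displacement
            u[bi, bj] = (x0 + dx) - x                # horizontal displacement
    return u, v

def divergence(u, v):
    # f1 = du/dx + dv/dy (axis 1 ~ x, axis 0 ~ y, unit spacing)
    return np.gradient(u, axis=1) + np.gradient(v, axis=0)

def vorticity(u, v):
    # f2 = dv/dx - du/dy
    return np.gradient(v, axis=1) - np.gradient(u, axis=0)
```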


C. Symmetric and Asymmetric Flow Fields

Symmetric and asymmetric fields capture dynamics that emphasize the symmetry or asymmetry of a human action around a diagonal axis. The symmetric and asymmetric kinematics of the u and v components of the flow field are computed separately, resulting in four kinematic measurements in total. The symmetric kinematic features are computed as follows:

$$f_3(t_i) = u(t_i) + u(t_i)^*, \qquad f_4(t_i) = v(t_i) + v(t_i)^*.$$

The asymmetric kinematic features are computed as follows:

$$f_5(t_i) = u(t_i) - u(t_i)^*, \qquad f_6(t_i) = v(t_i) - v(t_i)^*,$$

where $u(t_i)$ and $v(t_i)$ represent the u and v components of the optical flow at time $t_i$, and the symbol “*” denotes the transpose operation.

These features are useful for distinguishing between actions such as raising the right hand and raising the left hand. If the diagonal axis is drawn from the top left to the bottom right of the image, then the symmetric and asymmetric kinematics can help in differentiating these two actions.

D. Gradient Tensor Features

These features capture small-scale structures present in a flow field, which arise due to the small-scale motion of different limbs. The optical flow gradient tensor is computed as follows:

$$\nabla U(\mathbf{x}, t_i) = \begin{pmatrix} \dfrac{\partial u(\mathbf{x}, t_i)}{\partial x} & \dfrac{\partial u(\mathbf{x}, t_i)}{\partial y} \\[2mm] \dfrac{\partial v(\mathbf{x}, t_i)}{\partial x} & \dfrac{\partial v(\mathbf{x}, t_i)}{\partial y} \end{pmatrix}.$$

An invariant property is particularly useful for human actions, since many interesting aspects of a flow field can be described in terms of features that are coordinate invariant. The three principal invariants of the gradient tensor can be written as:

$$P(\mathbf{x}, t_i) = -\operatorname{trace}\left(\nabla U(\mathbf{x}, t_i)\right),$$
$$Q(\mathbf{x}, t_i) = \frac{1}{2}\left(P^2 - \operatorname{trace}\left(\nabla U(\mathbf{x}, t_i)^2\right)\right),$$
$$R(\mathbf{x}, t_i) = -\det\left(\nabla U(\mathbf{x}, t_i)\right).$$

The first invariant, P, is the negative trace of the gradient tensor and thus carries the same information as the divergence. Therefore, only the second and third invariants are utilized, as they provide new information, i.e., $f_7 = Q$ and $f_8 = R$.

E. Rate of Strain and Spin Tensor Features

The rate of strain tensor S and the rate of spin tensor O can be obtained by decomposing the flow gradient tensor as follows:

$$S(\mathbf{x}, t_i) = \frac{1}{2}\left(\nabla U(\mathbf{x}, t_i) + \nabla U(\mathbf{x}, t_i)^*\right),$$
$$O(\mathbf{x}, t_i) = \frac{1}{2}\left(\nabla U(\mathbf{x}, t_i) - \nabla U(\mathbf{x}, t_i)^*\right),$$

where “*” denotes the matrix transpose operation. These two tensors are often used as a measure of the deformability that arises due to the presence of gradients in the flow field.
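A minimal NumPy sketch of the features in Sections C, D, and E is given below. It assumes square frames for the transpose-based features f3 through f6 and unit pixel spacing for the derivatives; the function names are illustrative.

```python
import numpy as np

def symmetry_features(u, v):
    """f3..f6: symmetric/asymmetric parts of each flow component.
    The matrix transpose requires square frames (an assumption here)."""
    return u + u.T, v + v.T, u - u.T, v - v.T

def gradient_tensor_features(u, v):
    """Per-pixel flow gradient tensor, its invariants (f7 = Q, f8 = R),
    and the rate-of-strain / rate-of-spin tensors S and O."""
    du_dy, du_dx = np.gradient(u)          # np.gradient: axis 0 ~ y, axis 1 ~ x
    dv_dy, dv_dx = np.gradient(v)
    # 2x2 gradient tensor at every pixel, shape (H, W, 2, 2)
    G = np.stack([np.stack([du_dx, du_dy], axis=-1),
                  np.stack([dv_dx, dv_dy], axis=-1)], axis=-2)
    P = -(G[..., 0, 0] + G[..., 1, 1])     # first invariant (= -divergence)
    G2 = G @ G                             # matmul broadcasts over the H, W axes
    Q = 0.5 * (P**2 - (G2[..., 0, 0] + G2[..., 1, 1]))   # f7
    R = -np.linalg.det(G)                                # f8
    Gt = np.swapaxes(G, -1, -2)
    S = 0.5 * (G + Gt)                     # rate-of-strain tensor
    O = 0.5 * (G - Gt)                     # rate-of-spin tensor
    return Q, R, S, O
```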


IV. KINEMATIC MODES

These modes are obtained by performing PCA directly on the two components (u, v) of the optical flow. It can be observed that, as the number of modes increases, smaller energy-containing scales are exhibited. These modes do not, however, provide information about the dynamics (e.g., rotation, expansion, symmetry) of the underlying optical flow. In order to capture the dynamics information of the optical flow, a set of orthogonal basis vectors is computed in terms of the flow’s dynamics instead of its energy content. For that purpose, PCA is performed separately on the spatiotemporal volumes of each kinematic feature.

V. PRINCIPAL COMPONENT ANALYSIS

PCA is a well-known technique for determining an optimal basis for the reconstruction of data. Let $U(\mathbf{x}, t_i)$, $i = 1, \ldots, M$, represent a vectorized sequence of experimental observations; the observation at each $t_i$ is referred to as a “snapshot” of the physical process being measured. Here, $U(\mathbf{x}, t_i)$ is the optical flow computed at frame $t_i$ of the given video. Without loss of generality, the time average of the observations, defined by

$$\bar{U}(\mathbf{x}) = \left\langle U(\mathbf{x}, t_i) \right\rangle = \frac{1}{M} \sum_{i=1}^{M} U(\mathbf{x}, t_i),$$

is assumed to be zero. The symbol $\langle \cdot \rangle$ represents the averaging operation. PCA then extracts the time-independent orthonormal basis vectors $\phi_k(\mathbf{x})$ and the time-dependent expansion coefficients $\alpha_k(t_i)$ such that the reconstruction

$$U(\mathbf{x}, t_i) = \sum_{k=1}^{M} \alpha_k(t_i)\, \phi_k(\mathbf{x}), \qquad i = 1, \ldots, M,$$

is optimal in the sense that the average least-squares truncation error

$$\epsilon_m = \left\langle \left\| U(\mathbf{x}, t_i) - \sum_{k=1}^{m} \alpha_k(t_i)\, \phi_k(\mathbf{x}) \right\|^2 \right\rangle$$

is minimal for any given number $m \le M$ of basis vectors, over all possible sets of orthogonal bases.

The eigenvectors can be represented as a linear combination of the data vectors:

$$\phi_k = \sum_{i=1}^{M} v_i^k\, U(\mathbf{x}, t_i), \qquad k = 1, \ldots, M.$$

The coefficients $v_i^k$ can be obtained from the solution of

$$C V = \lambda V,$$

where $v^k = [v_1^k, \ldots, v_M^k]^T$ is the k-th eigenvector of the above equation and C is a symmetric $M \times M$ matrix defined by

$$C_{ij} = \frac{1}{M}\left( U(\mathbf{x}, t_i) \cdot U(\mathbf{x}, t_j) \right).$$
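A minimal NumPy sketch of this snapshot formulation follows, assuming the M observations are supplied as the mean-subtracted rows of a matrix.

```python
import numpy as np

def snapshot_pca(U):
    """Snapshot PCA.  U is an (M x N) matrix whose M rows are the
    vectorized, mean-subtracted observations U(x, t_i)."""
    M = U.shape[0]
    C = (U @ U.T) / M                        # C_ij = (1/M) U(x,t_i) . U(x,t_j)
    lam, V = np.linalg.eigh(C)               # solves C V = lambda V
    order = np.argsort(lam)[::-1]            # dominant modes first
    lam, V = lam[order], V[:, order]
    phi = V.T @ U                            # phi_k = sum_i v_i^k U(x,t_i)
    norms = np.linalg.norm(phi, axis=1, keepdims=True)
    phi = phi / np.maximum(norms, 1e-12)     # orthonormal basis vectors
    alpha = U @ phi.T                        # expansion coefficients alpha_k(t_i)
    return phi, alpha, lam
```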


VI. COMPUTATION OF KINEMATIC MODES

Kinematic modes are the essential ingredients of the spatiotemporal patterns representing a human action. To obtain the kinematic modes of the optical flow, the orthogonal basis of the kinematic features of the optical flow field is computed. Theoretically, this can be done by treating the kinematic features $f_1(\mathbf{x}, t_i), f_2(\mathbf{x}, t_i), \ldots, f_{11}(\mathbf{x}, t_i)$ as kinematic kernels for the application of PCA. The kernel matrix C is then represented as:

$$C^k(t_i, t_j) = \frac{1}{M}\left( f^k(t_i) \cdot f^k(t_j) \right),$$

where k is the index of the kinematic feature being used.
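As a sketch of how these kernels yield one bag per video, the following is a hypothetical wiring of Sections II, V, and VI, not the paper's exact implementation; the number of modes kept per feature is an assumption.

```python
import numpy as np

def bag_of_modes(feature_volumes, modes_per_feature=3):
    """Pool the dominant PCA modes of each kinematic-feature volume into
    one bag per video.  feature_volumes: list of (M x N) arrays, one per
    feature f_k, whose rows are that feature's vectorized frames."""
    bag = []
    for F in feature_volumes:
        F = F - F.mean(axis=0, keepdims=True)        # zero time-average
        C = (F @ F.T) / F.shape[0]                   # kinematic kernel C^k
        lam, V = np.linalg.eigh(C)
        V = V[:, np.argsort(lam)[::-1]]              # dominant eigenvectors first
        phi = V[:, :modes_per_feature].T @ F         # kinematic modes of f_k
        phi /= np.maximum(np.linalg.norm(phi, axis=1, keepdims=True), 1e-12)
        bag.extend(phi)                              # instances of this video's bag
    return bag
```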
VII. CONCLUSION

The proposed action recognition algorithm is evaluated on two publicly available data sets: the Weizmann action data set and the KTH action data set. The goal is to determine the presence or absence of the target action in a given video. The utility of kinematic features derived from motion information for the task of human action recognition in videos is explored.

The kinematic features are computed from the optical flow. It is then hypothesized that the dynamic information of the optical flow is represented by the kinematic features in terms of dominant kinematic modes, which are computed by performing PCA on each kinematic feature.

For classification, an MIL model is used in which each action video is treated as a bag, or collection, of kinematic modes. Each bag is embedded into a kinematic-mode-based feature space, and the coordinates of the videos in this space are used for classification with a nearest neighbor classifier.
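As a rough illustration of this final step, the sketch below embeds each bag by its maximum cosine similarity to a set of candidate prototype modes, in the spirit of instance-based MIL embeddings, and classifies with 1-NN; the similarity measure and the prototype set are assumptions, not the paper's exact formulation.

```python
import numpy as np

def embed_bag(bag, prototypes):
    """Map a bag (list of kinematic-mode vectors) to one coordinate per
    prototype mode: the bag's best cosine similarity to that prototype."""
    B = np.stack(bag)
    P = np.stack(prototypes)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    P = P / np.linalg.norm(P, axis=1, keepdims=True)
    return (B @ P.T).max(axis=0)           # one coordinate per prototype

def classify_1nn(train_X, train_y, x):
    """Nearest-neighbor classification in the embedded feature space."""
    return train_y[int(np.argmin(np.linalg.norm(train_X - x, axis=1)))]
```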
