Sponsored by DRDO
8th & 9th March 2011

1 Ms. N. Prathiba, 2 Mrs. G. Glorindal Selvam
1 M.E. Student, Department of Computer Science and Engineering,
2 Lecturer, Department of Computer Science and Engineering,
Dr. Sivanthi Aditanar College of Engineering, Tiruchendur
mail4prathi@gmail.com
Abstract–This work proposes a set of kinematic features, derived from the optical flow, for representing complex human action in videos. Each kinematic feature, when computed from the optical flow of a sequence of images, gives a spatiotemporal pattern. These patterns are represented in the form of dominant kinematic trends, or kinematic modes. For classification, Multiple Instance Learning (MIL) is used to represent each action video as a bag of kinematic modes, and each video is then embedded into a kinematic-mode-based feature space. Through this approach, clear information about the motion of the human is obtained for reliable identification in surveillance scenarios.

Keywords– Action recognition, motion, video analysis, principal component analysis, kinematic features

I. INTRODUCTION

... a system that may detect a person having a heart attack in a room, or a system that may detect a person drowning at a swimming pool, can be realized. Action recognition is done by making use of the flow generated by motion, called optical flow. Optical flow reflects the image changes due to motion during a time interval.

Since motion is an important source of information for classifying human actions, the main objective is to recognize human action in videos from a set of kinematic features derived from the optical flow of a sequence of images, together with multiple instance learning. The idea behind the introduction of kinematic features is to convert optical flow information into a more discriminative representation that can improve motion-based action classification.
Each kinematic feature, when computed from the optical flow of a sequence of images, gives rise to a spatiotemporal pattern, which is represented in the form of dominant kinematic trends or kinematic modes. These kinematic modes are computed by performing Principal Component Analysis (PCA) on the spatiotemporal volumes of the kinematic features.

The identified kinematic modes of each kinematic feature are used for the classification of human actions. To do this, the multiple instance learning (MIL) approach is involved. The idea of MIL is to represent each action video as a collection, or "bag", of kinematic modes, in which each kinematic mode is referred to as an instance representing that video.

Multiple-instance learning is a variation on supervised learning. Instead of receiving a set of instances which are labeled positive or negative, the learner receives a set of bags that are labeled positive or negative. Each bag contains many instances. A bag is labeled negative if all the instances in it are negative; on the other hand, a bag is labeled positive if there is at least one instance in it which is positive. From a collection of labeled bags, the learner tries to induce a concept that will label individual instances correctly.

In supervised learning, every training instance is assigned a discrete or real-valued label. In comparison, in MIL, the labels are assigned only to bags of instances. In the binary case, a bag is labeled positive if at least one instance in that bag is positive, and the bag is labeled negative if all the instances in it are negative. There are no labels on the individual instances.

The goal of MIL is to classify unseen bags or instances by using the labeled bags as the training data. Multiple instance learning is used because it provides flexibility in picking the number of kinematic modes used to represent the action. This is important because a complex action may require more kinematic modes to represent its dynamics than a simple action.

II. OVERVIEW OF ALGORITHM

A video containing an action is converted into a corresponding bag of kinematic modes, whose pictorial representation is shown in Figure 1.1.

Figure 1.1 Description of the process

The steps involved in creating the bag of kinematic modes are:
1) A video containing an action is the input to the first step, which computes the optical flow between consecutive frames of the video and produces a stack of optical flow fields.
2) This stack of optical flow fields is the input to the second step, which computes the kinematic features and produces a separate spatiotemporal volume for each feature.
3) The third step takes the volume of each kinematic feature as an input, performs the PCA, and produces the kinematic modes.
4) Finally, the video is represented as a bag of kinematic modes pooled from all the kinematic features.
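The bag-labeling rule of MIL described above can be sketched as follows. The function names are illustrative, not from the original system, and the instance-level classifier is a hypothetical placeholder:

```python
from typing import Callable, Sequence

def bag_label(instance_labels: Sequence[bool]) -> bool:
    # A bag is positive if at least one instance in it is positive;
    # it is negative only if every instance is negative.
    return any(instance_labels)

# Hypothetical instance-level decision applied to each kinematic mode
# in the bag; the bag inherits a positive label from any positive mode.
def classify_bag(modes: Sequence, is_positive: Callable[[object], bool]) -> bool:
    return bag_label([is_positive(m) for m in modes])
```

For example, a "bag" of three modes where only one mode fires positively still yields a positive bag label, which is exactly the flexibility the text attributes to MIL: complex actions may need several modes, but one discriminative mode suffices to flag the action.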
III. KINEMATIC FEATURES
Kinematic features represent features that are independent of the forces acting on the object, or of its mass, and capture only motion information. These include position, velocity and acceleration, which are all geometrical and time-related properties of motion. They are useful for recognizing actions because they make the representation independent of the physical features of the subject.

... displacement of the pixel. The process is repeated for all possible blocks in the image. As a post-processing step, local outliers are removed by applying local median filtering, and the removed vectors are filled in by interpolation of the neighboring flow vectors. The size of the block employed in our experiments is 16 x 16 pixels. The process is repeated for all frames, to generate a stack of optical flows for the video. Thus, optical flow is generated between consecutive frames of the video, as shown in Figure 1.2.

Figure 1.2 Optical flow

A. Divergence

Divergence of a flow field is a scalar quantity. ... This feature is important for discriminating between types of motions which involve independent motion of different body parts. For instance, in the "hand-waving" action, only one part of the body is involved, while, in the "bending" action, the complete upper body plays a role.

B. Vorticity

Vorticity is the measure of local spin around the axis perpendicular to the plane of the flow field. It can also be defined as circulation per unit area. It is computed at a point (x, t_i) as follows:
f_2(x, t_i) = ∂v(x, t_i)/∂x − ∂u(x, t_i)/∂y
It is useful for distinguishing between actions that involve articulated motion and those that do not. It is also useful for highlighting dynamics in the flow field resulting from local circular motion of the human body or a part of the body. The "bend" action is a good example of this type of motion, where the circular motion of the body is around the perpendicular axis passing through the torso. The flow gradient tensor used in these computations is:

∇U(x, t_i) = | ∂u(x, t_i)/∂x   ∂u(x, t_i)/∂y |
             | ∂v(x, t_i)/∂x   ∂v(x, t_i)/∂y |
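As a minimal sketch of how divergence and vorticity might be computed from a dense flow field, assuming the flow components are stored as NumPy arrays u and v (the array names and the use of np.gradient are illustrative choices, not from the original system):

```python
import numpy as np

def kinematic_fields(u: np.ndarray, v: np.ndarray):
    """Compute divergence and vorticity of a 2-D flow field (u, v).

    u, v: horizontal and vertical flow components, each of shape (H, W).
    Finite differences via np.gradient approximate the partial derivatives.
    """
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (cols, x).
    du_dy, du_dx = np.gradient(u)
    dv_dy, dv_dx = np.gradient(v)
    divergence = du_dx + dv_dy   # net local expansion/contraction of the flow
    vorticity = dv_dx - du_dy    # local spin about the axis normal to the plane
    return divergence, vorticity
```

For a rigid rotation (u = -y, v = x), this sketch yields zero divergence and constant vorticity, matching the interpretation of vorticity as local circular motion.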
C. Symmetric and Asymmetric Flow Fields
... it can be done by treating the kinematic features f_1(x, t_i), f_2(x, t_i), ..., f_11(x, t_i) as kinematic kernels for the application of PCA. The kernel matrix C is represented as:

C_k(t_i, t_j) = (1/M) ⟨ f_k(t_i) · f_k(t_j) ⟩

where k is the index of the kinematic feature being used. ... is assumed to be zero. The symbol ⟨·⟩ represents the averaging operation. The PCA then extracts the time-independent orthonormal basis φ_k(x) and ...
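The kernel-matrix form of PCA above corresponds to the method of snapshots; a sketch under the assumption that each frame's kinematic feature field has been flattened into a row of a matrix F (names and the number of modes are illustrative):

```python
import numpy as np

def kinematic_modes(F: np.ndarray, n_modes: int = 3) -> np.ndarray:
    """Method-of-snapshots PCA for one kinematic feature.

    F: array of shape (T, M) -- T frames, each a flattened feature field
       of M pixels, assumed zero-mean over time.
    Returns n_modes spatial modes, shape (n_modes, M).
    """
    T, M = F.shape
    # Kernel matrix C(t_i, t_j) = <f(t_i) . f(t_j)> / M over the M pixels.
    C = (F @ F.T) / M
    eigvals, eigvecs = np.linalg.eigh(C)          # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]             # dominant modes first
    a = eigvecs[:, order[:n_modes]]               # temporal coefficients
    modes = a.T @ F                               # project snapshots onto eigenvectors
    # Normalize each spatial mode, giving an orthonormal basis.
    modes /= np.linalg.norm(modes, axis=1, keepdims=True)
    return modes
```

Working with the T x T kernel matrix rather than the M x M pixel covariance is what makes this practical: T (frames) is typically far smaller than M (pixels).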
VII. CONCLUSION

The proposed action recognition algorithm is evaluated on two publicly available data sets: the Weizmann action data set and the KTH action data set. The goal is to determine the presence or absence of the target action in a given video. The utility of kinematic features derived from motion information for the task of human action recognition in videos is explored.

The kinematic features are computed from the optical flow. It is hypothesized that the dynamic information of the optical flow is represented by the kinematic features in terms of dominant kinematic modes. These dominant kinematic modes are computed by performing PCA on each kinematic feature.

For classification, an MIL model is used in which each action video is treated as a bag, or collection, of kinematic modes. Each bag is embedded into a kinematic-mode-based feature space, and the coordinates of the videos in this space are used for classification with the nearest neighbor classifier.
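A minimal sketch of the final classification step, nearest neighbor in the embedded feature space (the embedding itself, and all names here, are assumed to be given and are illustrative):

```python
import numpy as np

def nearest_neighbor_label(query: np.ndarray,
                           train_points: np.ndarray,
                           train_labels: list):
    """Label a query video by its nearest training video.

    query: coordinates of the test video in the kinematic-mode-based
           feature space, shape (D,).
    train_points: shape (N, D); train_labels: the N action labels.
    """
    # Euclidean distance from the query to every training embedding.
    dists = np.linalg.norm(train_points - query, axis=1)
    return train_labels[int(np.argmin(dists))]
```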