Abstract—This paper presents a novel posture classification system that analyzes human movements directly from video sequences. In the system, each sequence of movements is converted into a posture sequence. To better characterize a posture in a sequence, we triangulate it into triangular meshes, from which we extract two features: the skeleton feature and the centroid context feature. The first feature is used as a coarse representation of the subject, while the second is used to derive a finer description. We adopt a depth-first search (DFS) scheme to extract the skeletal features of a posture from the triangulation result. The proposed skeleton feature extraction scheme is more robust and efficient than conventional silhouette-based approaches. The skeletal features extracted in the first stage are used to extract the centroid context feature, which is a finer representation that can characterize the shape of a whole body or body parts. The two descriptors working together make human movement analysis a very efficient and accurate process because they generate a set of key postures from a movement sequence. The ordered key posture sequence is represented by a symbol string, so matching two arbitrary action sequences becomes a symbol string matching problem. Our experimental results demonstrate that the proposed method is a robust, accurate, and powerful tool for human movement analysis.

Index Terms—Behavior analysis, event detection, string matching, triangulation.

I. INTRODUCTION

… a system to detect pedestrians in outdoor environments and recognize their activities from multiple views based on a mixture of Gaussian classifiers. Other approaches extract human postures or body parts (such as the head, hands, torso, or feet) to analyze human behavior [5], [6]. Mikic et al. [5] used complex 3-D models and multiple video cameras to extract 3-D information for 3-D posture analysis, while Werghi [6] extracted 3-D information and applied wavelet transforms to analyze 3-D human postures. Although 3-D features are more informative than 2-D features (like contours or silhouettes) for classifying human postures, the inherent correspondence problem and high computational cost make this approach inappropriate for real-time applications. As a result, a number of researchers have based their analyses of human behavior on 2-D postures. In [2], Cucchiara et al. proposed a probabilistic posture classification scheme to identify several types of movement, such as walking, running, squatting, or sitting. Ivanov and Bobick [11] presented a 2-D posture classification system that uses HMMs to recognize human gestures and behaviors. In [13], Wren et al. proposed a Pfinder system for tracking and recognizing human behavior based on a 2-D blob model. The drawback of this approach is that incorporating 2-D posture models into the analysis of human behavior raises the possibility of ambiguity between the adopted models and real human behavior due to mutual occlusion.
Fig. 3. All the vertices are labeled anticlockwise such that the interior of V is located on their left.

Fig. 4. Triangulation result of a body posture. (a) Input posture. (b) Triangulation result of (a).

Fig. 6. Extracting the skeleton of a human model. (a) Original image; (b) the spanning tree of (a); and (c) a simple skeleton of (a).

Fig. 7. Distance transform of a posture. (a) Result of triangulation of a human posture; (b) skeleton extracted from (a); and (c) calculated distance map of (b).
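The graph construction and depth-first traversal used in the steps below can be sketched in Python. The edge-list representation, node labels, and function names here are illustrative, not the paper's own notation:

```python
from collections import defaultdict

def dfs_spanning_tree(edges, root):
    """Build an undirected adjacency list from (u, v) edge pairs and
    return a depth-first spanning tree as a child-list mapping."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    tree = {n: [] for n in adj}
    seen, stack = {root}, [root]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                tree[u].append(v)   # v becomes a child of u in the tree
                stack.append(v)
    return tree

def leaf_and_branch_nodes(tree):
    """Leaves have no children; branch nodes have two or more."""
    leaves = sorted(n for n, cs in tree.items() if not cs)
    branches = sorted(n for n, cs in tree.items() if len(cs) >= 2)
    return leaves, branches
```

On a small Y-shaped graph, the spanning tree rooted at one endpoint exposes the fork as a branch node and the two tips as leaves, which is the information the skeleton-extraction steps operate on.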
Step 1: Construct a graph from the triangulation result according to the connectivity of its nodes. In addition, obtain the centroid of each mesh.

Step 2: Find a node whose degree is one and whose position is the highest among all nodes in the graph.

Step 3: Apply a depth-first search to the graph to find its spanning tree.

Step 4: Get all leaf nodes and branch nodes from the tree, and let the skeleton node set be their union together with the root node.

Step 5: Extract the skeleton by linking any two nodes in this set if they are connected through other nodes.

Note that the spanning tree obtained by the depth-first search is also a skeleton. As shown in Fig. 6, (a) is the original posture, and (b) is the calculated spanning tree. From (b), we can get the corresponding skeleton, shown in (c), using the TSSE algorithm. In fact, the linked interior of (b) is also a skeleton of (a), but it is not as straight as the skeleton in (c). In this paper, we call the skeleton shown in Fig. 6(b) a "complex skeleton," and the one shown in Fig. 6(c) a "simple skeleton." The simple skeleton is obtained simply by connecting all the branch nodes of the spanning tree. We compare their performances in Section VI.

B. Posture Recognition Using a Skeleton

In the previous section, a triangulation-based method was proposed for extracting skeletal features from a body posture. Assume S_A and S_B are two skeletal images extracted from a posture A and a posture B, respectively. We use a distance transform to convert each skeleton into a gray-level image. Then, based on the distance maps, we calculate the degree of similarity between S_A and S_B.

Assume D_A is the distance map of S_A. Then, the value of a pixel p in D_A is the shortest distance between p and all foreground pixels in S_A, i.e.,

D_A(p) = min_{q ∈ S_A} ||p − q||,   (3)

where ||p − q|| denotes the Euclidean distance between pixels p and q. The distance between the two distance maps of S_A and S_B can be defined as follows:

d_S(S_A, S_B) = (1 / |I|) Σ_p |D_A(p) − D_B(p)|,   (5)

where |I| represents the image size of S_A. When calculating (5), S_A and S_B must be normalized to a unit size and their centers must be set to the origins of D_A and D_B, respectively. Fig. 7 shows the result of a distance transform of a posture after skeleton extraction. Fig. 7(a) shows the original posture, Fig. 7(b) is the result of skeleton extraction, and Fig. 7(c) shows the resultant distance map based on Fig. 7(b).

IV. POSTURE RECOGNITION USING THE CENTROID CONTEXT

The skeleton-based method presented in the previous section provides a simple and efficient way to represent body postures. However, the skeleton is a coarse feature that can only be used in a coarse search process. To better characterize human postures and improve the accuracy of the recognition result, we use the centroid context.

A. Centroid Context-Based Description of Postures

In this section, we introduce a centroid context-based shape descriptor that can characterize the interior of a shape. In the literature, quite a number of global features [27]–[29] have been proposed for shape recognition. For example, the moment descriptor [29] has good invariance properties for trademark indexing, but its calculation is very time-consuming. The Fourier descriptor [28] is simple but easily affected by noise. The shape context [24] is a good descriptor for shape recognition, but it requires a set of dense feature correspondences. Therefore, we propose a new type of descriptor, the "centroid context," to tackle the above problems. Basically, this descriptor can be used in the fine search process. Since the triangulation results of human postures vary, we calculate the distribution of every posture based on the relative positions of the meshes' centroids. A descriptor of this form guarantees robustness and compactness. Assume all postures are normalized to a unit size.
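As a concrete reading of (3) and (5), the sketch below computes a brute-force distance map for a binary skeleton image and compares two skeletons by the normalized absolute difference of their maps. The function names are ours, and a real implementation would use a fast distance transform rather than this O(N·K) loop:

```python
import numpy as np

def distance_map(skeleton):
    """Per (3): each pixel's value is the shortest Euclidean distance
    to any foreground (True) pixel of the skeleton image."""
    fg = np.argwhere(skeleton)                 # foreground coordinates
    h, w = skeleton.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([ys.ravel(), xs.ravel()], axis=1)
    d = np.linalg.norm(pts[:, None, :] - fg[None, :, :], axis=2).min(axis=1)
    return d.reshape(h, w)

def skeleton_distance(sa, sb):
    """Per (5): mean absolute difference of the two distance maps,
    normalized by the image size."""
    da, db = distance_map(sa), distance_map(sb)
    return float(np.abs(da - db).sum()) / da.size
```

Identical skeletons give a distance of zero, and shifting a skeleton sideways increases the score, which is the behavior the coarse search relies on.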
376 IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 10, NO. 3, APRIL 2008
Fig. 10. Multiple centroid contexts using different numbers of sectors and
shells: (a) 4 shells and 15 sectors, (b) 8 shells and 30 sectors, and (c) centroid
histogram of the gravity center of the posture.
Fig. 8. Polar transform of a human posture.
Algorithm for Centroid Context Extraction:

Input: the spanning tree of a posture.

Output: the centroid context of the posture.

Step 1: Recursively trace the tree using a depth-first search until the tree is empty. If a branch node (a node with two children) is found, collect all the visited nodes into a new path and remove these nodes from the tree.

Step 2: If a new path contains only two nodes, eliminate it; otherwise, find its path centroid.

Step 3: Use (6) to find the centroid histogram of each path centroid.

Step 4: Collect all the histograms as the centroid context of the posture.

The time complexity of centroid context extraction is proportional to the number of triangle meshes in the posture.

B. Posture Recognition Using the Skeleton and the Centroid Context

Given a posture, we can extract its skeleton feature and centroid context using the techniques described in Sections III-A and IV-A, respectively. Then, the distance between any two postures can be measured using (5) (for the skeleton) or (9) (for the centroid context). As noted previously, the skeleton feature is used for a coarse search and the centroid context feature is used for a fine search. To obtain better recognition results, the two distance measures should be integrated. We use the following weighted sum to represent the total distance:

d(A, B) = w d_skeleton(A, B) + (1 − w) d_cc(A, B),   (10)

where d(A, B) is the total distance between two postures A and B, and w is a weight used to balance the skeleton distance d_skeleton and the centroid context distance d_cc.

Assume that all the postures have been extracted from a video clip and that each frame contains only one posture. Given two adjacent postures, the distance between them is calculated using (10), where w is set at 0.5. For a posture, if its distance to the adjacent posture is greater than the average of this distance over all adjacent pairs, we define it as a posture-change instance. By collecting all postures that correspond to posture-change instances, we derive a set of key posture candidates. However, this set may still contain many redundant postures, which would degrade the accuracy of the sequence modeling. To address this problem, we use a clustering technique to find a better set of key postures.

Initially, each key posture candidate forms its own cluster. Then, given two clusters C_a and C_b, the distance between them is defined by

d(C_a, C_b) = (1 / (|C_a| |C_b|)) Σ_{p ∈ C_a} Σ_{q ∈ C_b} d(p, q),   (11)

where d(p, q) is the posture distance defined in (10) and |C| represents the number of elements in cluster C. According to (11), we can execute an iterative merging scheme to find a compact set of key postures. At each iteration, we choose the pair of clusters whose distance is the minimum among all pairs in the current collection. If this minimum distance is less than a predefined threshold, the two clusters are merged to form a new cluster and a new collection of clusters is constructed. The merging process is executed iteratively until no further merging is possible. Then, from each cluster C_k in the final set, we can extract a key posture p* that satisfies

p* = arg min_{p ∈ C_k} Σ_{q ∈ C_k} d(p, q).   (12)
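One plausible reading of the merging scheme behind (11) and (12) is average-linkage agglomeration followed by medoid extraction. The sketch below assumes that reading; the threshold handling and function names are ours, since the original symbols were lost:

```python
def cluster_distance(ca, cb, d):
    """Average pairwise distance between two clusters, as in (11)."""
    return sum(d(p, q) for p in ca for q in cb) / (len(ca) * len(cb))

def select_key_postures(candidates, d, threshold):
    """Iteratively merge the closest pair of clusters while their
    distance stays below the threshold, then return each final
    cluster's medoid as its key posture, as in (12)."""
    clusters = [[p] for p in candidates]
    while len(clusters) > 1:
        pairs = [(cluster_distance(clusters[i], clusters[j], d), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        dist, i, j = min(pairs)        # closest pair of clusters
        if dist >= threshold:
            break                      # no further merging is possible
        clusters[i].extend(clusters[j])
        del clusters[j]
    # medoid: the member minimizing its summed distance to the cluster
    return [min(c, key=lambda p: sum(d(p, q) for q in c)) for c in clusters]
```

With scalar "postures" and absolute difference as the distance, two tight groups of candidates collapse into two key postures, mirroring how redundant posture-change instances are pruned.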
where the "insertion," "deletion," and "replacement" operations correspond, respectively, to the transitions from cell (i, j−1) to cell (i, j), from cell (i−1, j) to cell (i, j), and from cell (i−1, j−1) to cell (i, j).

Assume we are given a "walking" video clip whose string representation is "swwwwwwe." This string differs from that in Fig. 11(a) due to the scaling problem along the time axis. According to (13), the edit distance between this string and the sequence in Fig. 11(a) is 3, while its edit distance to the sequences shown in Fig. 11(b) and (c) is 2. However, the string is more similar to Fig. 11(a) than to Fig. 11(b) and (c). Clearly, (13) cannot be directly applied to human movement sequence analysis and must be modified. Thus, we define a new edit distance to address this problem.

Let c_I, c_R, and c_D be the respective costs of the "insertion," "replacement," and "deletion" operations performed at the i-th and j-th characters of the two strings. Then, (13) can be rewritten as

D(i, j) = min{ D(i, j−1) + c_I, D(i−1, j−1) + c_R, D(i−1, j) + c_D }.   (14)

Here, the "replacement" operation is considered more important than "insertion" and "deletion," since it indicates a change of posture type. Thus, the weights of "insertion" and "deletion" should be scaled down relative to that of "replacement" by a factor between 0 and 1. Following this rule, if an "insertion" is found when calculating (14), its weight is scaled down by this factor.

Algorithm for Edit Distance Calculation:

Input: two strings of lengths m and n.

Output: the edit distance between them.

Step 1: Set D(0, 0) = 0.

Step 2: Set D(i, 0) = i and D(0, j) = j for all i and j.

Step 3: For all i and j, compute D(i, j) from its three neighboring cells.

Step 4: Return D(m, n).

VI. EXPERIMENTAL RESULTS

To test the efficiency and effectiveness of the proposed approach, we constructed a large database of more than thirty thousand postures, from which we obtained over three hundred human movement sequences. Twenty percent of the subjects in the database are female. In each sequence, we recorded a specific type of human behavior: walking, sitting, lying, squatting, falling, picking up, jumping, climbing, stooping, or gymnastics. Fig. 12 illustrates the skeleton-based posture recognition system. The left-hand side of the figure shows a person walking, while the right-hand side shows twelve predetermined key postures for matching. Among the twelve key postures, which were determined by a clustering algorithm, the one surrounded by a bold frame matches the current posture of the query action sequence [see (b)].
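The modified edit distance defined in (14) above can be sketched as a standard dynamic program. The cost values and the scaling factor gamma below are illustrative, not the paper's settings:

```python
def weighted_edit_distance(a, b, c_rep=1.0, gamma=0.5):
    """Edit-distance DP in the spirit of (14): insertion and deletion
    cost gamma * c_rep, replacement costs c_rep (free on a match)."""
    c_ind = gamma * c_rep
    m, n = len(a), len(b)
    D = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = i * c_ind            # delete all of a[:i]
    for j in range(1, n + 1):
        D[0][j] = j * c_ind            # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            rep = D[i - 1][j - 1] + (0.0 if a[i - 1] == b[j - 1] else c_rep)
            D[i][j] = min(rep, D[i][j - 1] + c_ind, D[i - 1][j] + c_ind)
    return D[m][n]
```

With gamma = 1 this reduces to the classical edit distance of (13); discounting insertions and deletions makes a time-scaled clip such as "swwwwwwe" score closer to a shorter walking string than a string of a different posture type would.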
HSIEH et al.: VIDEO-BASED HUMAN MOVEMENT ANALYSIS 379
Fig. 12. Results of walking behavior analysis using the skeleton feature. The recognized posture is highlighted with a bold frame and shown in (b).

Fig. 13. Result of walking behavior analysis using the centroid context feature. The recognized type is highlighted with a bold frame and shown in (b).

Fig. 14. Result of key posture selection from three action sequences: walking, squatting, and gymnastics.

Fig. 15. Results of skeleton extraction using different sampling rates. The first row shows the skeletons extracted by connecting all triangle centroids, while the second row shows the skeletons extracted by connecting all branch nodes.
Fig. 13 shows the centroid context-based posture recognition system. The current posture shown on the left-hand side matches the third key posture (surrounded by a bold frame) located on the right-hand side of the figure [see (b)].

In the first set of experiments, the performance of key posture selection was examined. Since there is no benchmark behavior database in the literature, twenty movement sequences (two sequences per type) were chosen to evaluate the accuracy of our proposed key posture selection algorithm. For each sequence, a set of key postures was manually selected to serve as the ground truth. Two postures were considered "correctly matched" if their distance was smaller than 0.1 [see (10)]. Let N_m and N_a be the numbers of key frames selected by a manual method and by our approach, respectively. We can then define the accuracy of key posture selection as follows:

accuracy = (N_c + N_cm) / (N_m + N_a),

where N_c is the number of key postures selected by our method, each of which matches one of the manually selected key postures, and N_cm is the number of manually selected key postures, each of which correctly matches one of the automatically selected key postures. Fig. 14 shows the results of key posture selection from sequences of walking, squatting, and gymnastics. Due to space limitations, only 36 key postures are shown in Fig. 14. With this set of key postures, different movement sequences can be converted into a set of symbols and then recognized through a string matching method. In the experiments, the average accuracy of key posture selection was 91.9%.

In the second set of experiments, we evaluated the performance of skeleton extraction using triangulation. Fig. 15 shows the results of skeleton extraction using different sampling rates. The top row illustrates the skeletons extracted by connecting all the triangle centroids, while the bottom row shows the skeletons extracted by connecting all the branch nodes. No matter which sampling rate was used, the skeletons were always extracted; however, a higher sampling rate achieved a better approximation result. Fig. 16 shows another example of extracting a simple skeleton and a complex skeleton directly from a profile posture. In this paper, a simple skeleton means a skeleton derived by connecting all the branch nodes, and a complex skeleton is a skeleton derived by connecting all the centroids of all the triangles. To classify a posture, we have to convert a skeleton into its corresponding distance map. Fig. 17(a) and (b) show the results of applying the distance transform to the skeletons shown in Fig. 16(a) and (b), respectively. Table I details the success rates of posture recognition when different numbers of meshes (resolutions) were used to extract both simple and complex skeletons.
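One way to read the key posture selection accuracy defined above, with N_c and N_cm the cross-matched counts and N_a and N_m the automatic and manual totals, is as a ratio of matched to selected postures. The sketch below assumes that reading; the original equation was garbled, so the exact formula is an assumption:

```python
def selection_accuracy(auto_keys, manual_keys, d, threshold=0.1):
    """Count cross-matches between automatically and manually selected
    key postures (two postures match when their distance is below the
    threshold), then return (N_c + N_cm) / (N_a + N_m)."""
    n_c = sum(1 for a in auto_keys
              if any(d(a, m) < threshold for m in manual_keys))
    n_cm = sum(1 for m in manual_keys
               if any(d(m, a) < threshold for a in auto_keys))
    return (n_c + n_cm) / (len(auto_keys) + len(manual_keys))
```

When the two selections coincide the score is 1.0, and each unmatched key posture on either side pulls the score down symmetrically.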
Fig. 16. Skeleton extraction directly from a posture profile. (a) Simple skeleton
derived by connecting branch nodes and (b) complex skeleton derived by con-
necting all triangle centroids.
Fig. 18. Posture recognition results when the simple skeleton is used. The first
row shows the skeletons of all query postures. The silhouettes in the second row
show the corresponding posture type retrieved from the database.
Fig. 17. Result of distance transform. (a) Distance map extracted from
Fig. 16(a) and (b) distance map extracted from Fig. 16(b).
TABLE I
SUCCESS RATES OF POSTURE RECOGNITION WHEN DIFFERENT NUMBERS OF
MESHES WERE USED FOR SKELETON EXTRACTION
Fig. 19. Success rates of posture recognition using the simple skeleton and
complex skeleton on different movement types.
From the table, we observe that 1) the performance obtained using the complex skeleton representation was better than that obtained using the simple skeleton, and 2) the more meshes used for representation, the better the performance of posture recognition.

We explain how the posture recognition process works with the following example. The top row of Fig. 18 shows a number of input postures represented by simple skeletons. Their corresponding binary maps are shown in the bottom row of the figure using dark shading. The associated silhouettes, shown in the second row, represent the corresponding posture types retrieved from the posture database. Fig. 19 compares the performance of the simple and complex skeletons in recognizing postures from different movement types. From Fig. 19, it is obvious that the complex skeleton consistently outperforms the simple skeleton, because the simple skeleton loses more information than the complex one.

The next set of experiments was conducted to evaluate the performance of the centroid context feature. Since the results of this experiment were influenced by the numbers of predefined sectors and shells, we conducted six sets of experiments. The (shell, sector) settings were (4, 15), (4, 20), (4, 30), (8, 15), (8, 20), and (8, 30), in which the first argument represents the number of shells and the second represents the number of sectors. Fig. 20 shows two types of polar partition. Table II lists the recognition rates when the different (shell, sector) parameter sets were applied to the single centroid context and the multiple centroid context cases. In the former case, the center was set at the gravity center of a posture.

Fig. 20. Different numbers of sectors and shells used to define the centroid context: (a) 4 shells and 15 sectors and (b) 8 shells and 30 sectors.
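The polar partition shown in Fig. 20 amounts to binning mesh centroids into shells and sectors around a reference center. A minimal sketch follows, with parameter names of our choosing:

```python
import math

def centroid_context(centroids, center, n_shells=4, n_sectors=15, r_max=1.0):
    """Histogram of mesh centroids over a polar partition of n_shells
    radial bins and n_sectors angular bins, normalized to sum to 1."""
    hist = [[0.0] * n_sectors for _ in range(n_shells)]
    for x, y in centroids:
        dx, dy = x - center[0], y - center[1]
        r = math.hypot(dx, dy)
        theta = math.atan2(dy, dx) % (2 * math.pi)
        s = min(int(r * n_shells / r_max), n_shells - 1)      # shell index
        t = min(int(theta * n_sectors / (2 * math.pi)), n_sectors - 1)
        hist[s][t] += 1.0
    total = sum(map(sum, hist)) or 1.0
    return [[v / total for v in row] for row in hist]
```

Because the histogram is normalized, it is a distribution over the polar bins, so two postures can be compared bin by bin regardless of how many triangles their triangulations produced.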
TABLE II
RECOGNITION RATES OF POSTURES WHEN DIFFERENT NUMBERS
OF SECTORS AND SHELLS WERE USED
Fig. 23. Posture recognition results using (from left to right) the CC, Shape
Context, PPM, and Zernike moment methods, respectively. The shaded region
shows the input query, and the silhouettes are their recognized posture types.
TABLE III
FUNCTIONAL COMPARISONS AMONG CENTROID CONTEXT, SHAPE CONTEXT, PPM, AND MOMENTS. 3K: NUMBER OF MESHES; v: NUMBER OF BODY PARTS; n: NUMBER OF BOUNDARY POINTS; (w, h): DIMENSION OF THE INPUT POSTURE; L: NUMBER OF MOMENT ORDERS. T: TRANSLATION; S: SCALING; R: ROTATION. IT IS NOTICED THAT n ≫ 3K ≫ v.
Fig. 21. Posture recognition results using multiple centroid contexts.
Fig. 25. Abnormal activity detection. (a) Five key postures defining several
normal human movements. (b) Normal condition is detected. (c) Warning mes-
sage is triggered when an abnormal posture is detected.
Fig. 24. Comparison of the performance of different approaches.
TABLE IV
ANALYSIS OF TEN BEHAVIOR TYPES USING OUR PROPOSED
STRING MATCHING SCHEME
Fig. 26. Abnormal posture detection. (a) and (c): Normal postures were detected. (b) and (d): Abnormal postures were detected due to the unexpected "shooting" and "climbing wall" postures, respectively.

… matching scheme. We collected 450 movement sequences, divided equally among the ten behavior types (i.e., each behavior type had 45 movement sequences). The experiment results, listed in Table IV, show that most of the test sequences were correctly identified and classified into the appropriate categories. However, some of the "squat" sequences tended to be misclassified as "picking up" behavior due to the high degree of similarity between the two actions. In this set of experiments, the average accuracy of behavior analysis was 94.5%.

To test the proposed approach in real-world applications, we analyzed irregular (or suspicious) human movements captured by a surveillance camera. First, we extracted a set of "normal" key postures from video sequences to learn "regular" human movements like walking or running. After the training process, we took unknown surveillance sequences and determined whether the postures in them were regular or not. If irregular postures appear continuously, an alarm message triggers a security warning. For example, in Fig. 25(a), a set of normal key postures was extracted from a walking sequence. From these postures, we can tell that the posture in (b) is a "normal" posture. However, the posture in (c) is classified as "abnormal," since the person adopted a bending-down posture. We used a marker to label these abnormal posture sequences until they returned to normal. Fig. 26 shows another two cases of abnormal posture detection. The postures in Fig. 26(a) and (c) were recognized as normal, since they were similar to the postures in Fig. 25(a). However, the postures in Fig. 26(b) and (d) were classified as abnormal, since the subjects adopted "shooting" and "climbing wall" postures. The abnormal posture detection function improves a video surveillance system in two ways. First, it saves a great deal of storage space, because only abnormal postures need to be stored. Second, it responds immediately when an abnormal posture is identified.

VII. CONCLUSION

We have proposed a method for analyzing human movements directly from videos. Specifically, the method comprises the following.
1) A triangulation-based technique that extracts two important features, the skeleton feature and the centroid context feature, from a posture to derive more semantic meaning. The features form a finer descriptor that can describe a posture from the shape of the whole body or from body parts. Since the features complement each other, all human postures can be compared and classified very accurately.
2) A clustering scheme for key posture selection, which recognizes and codes human movements using a set of symbols.
3) A novel string-based technique for recognizing human movements. Even when events undergo different scalings along the time axis, they can still be recognized.

The average accuracies of posture classification and behavior analysis were 95.16% and 94.5%, respectively. The experiment results show that the proposed method is superior to other approaches for analyzing human movements in terms of accuracy, robustness, and stability.

REFERENCES

[1] T. B. Moeslund and E. Granum, "A survey of computer vision-based human motion capture," Comput. Vis. Image Understand., vol. 81, no. 3, pp. 231–268, Mar. 2001.
[2] R. Cucchiara, C. Grana, A. Prati, and R. Vezzani, "Probabilistic posture classification for human-behavior analysis," IEEE Trans. Syst., Man, Cybern. A, vol. 35, no. 1, pp. 42–54, Jan. 2005.
[3] S. Haritaoglu, D. Harwood, and L. S. Davis, "W4: Real-time surveillance of people and their activities," IEEE Trans. Pattern Anal. Machine Intell., vol. 22, no. 8, pp. 809–830, 2000.
[4] S. Maybank and T. Tan, "Introduction to special section on visual surveillance," Int. J. Comput. Vis., vol. 37, no. 2, p. 173, 2000.
[5] I. Mikic et al., "Human body model acquisition and tracking using voxel data," Int. J. Comput. Vis., vol. 53, no. 3, pp. 199–223, 2003.
[6] N. Werghi, "A discriminative 3D wavelet-based descriptors: Application to the recognition of human body postures," Pattern Recognit. Lett., vol. 26, no. 5, pp. 663–677, 2005.
[7] I.-C. Chang and C.-L. Huang, "The model-based human body motion analysis," Image Vis. Comput., vol. 18, no. 14, pp. 1067–1083, 2000.
[8] H. Fujiyoshi, A. J. Lipton, and T. Kanade, "Real-time human motion analysis by image skeletonization," IEICE Trans. Inform. Syst., vol. E87-D, no. 1, pp. 113–120, Jan. 2004.
[9] N. Oliver, B. Rosario, and A. Pentland, "A Bayesian computer vision system for modeling human interactions," IEEE Trans. Pattern Anal. Machine Intell., vol. 22, no. 8, pp. 831–843, Aug. 2000.
[10] R. Rosales and S. Sclaroff, "3D trajectory recovery for tracking multiple objects and trajectory guided recognition of actions," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Jun. 1999, vol. 2, pp. 637–663.
[11] Y. Ivanov and A. Bobick, "Recognition of visual activities and interactions by stochastic parsing," IEEE Trans. Pattern Anal. Machine Intell., vol. 22, no. 8, pp. 852–872, Aug. 2000.
[12] S. Weik and C. E. Liedtke, "Hierarchical 3D pose estimation for articulated human body models from a sequence of volume data," in Proc. Int. Workshop on Robot Vision, Auckland, New Zealand, Feb. 2001, pp. 27–34.
[13] C. R. Wren et al., "Pfinder: Real-time tracking of the human body," IEEE Trans. Pattern Anal. Machine Intell., vol. 19, no. 7, pp. 780–785, Jul. 1997.
[14] J. Shewchuk, "Delaunay refinement algorithms for triangular mesh generation," Comput. Geometry Theory Applicat., vol. 23, pp. 21–74, May 2002.
[15] S. Park and J. K. Aggarwal, "Segmentation and tracking of interacting human body parts under occlusion and shadowing," in Proc. IEEE Workshop on Motion and Video Computing, Orlando, FL, 2002, pp. 105–111.
[16] S. Park and J. K. Aggarwal, "Semantic-level understanding of human actions and interactions using event hierarchy," in Proc. 2004 Conf. Computer Vision and Pattern Recognition Workshop, Washington, DC, Jun. 2004.
[17] Y. Guo, G. Xu, and S. Tsuji, "Understanding human motion patterns," in Proc. IEEE Int. Conf. Pattern Recognition, 1994, vol. 2, pp. 325–329.
[18] Y. Guo, G. Xu, and S. Tsuji, "Tracking human body motion based on a stick figure model," J. Vis. Commun. Image Representat., vol. 5, 1994.
[19] J. Ben-Arie, Z. Wang, P. Pandit, and S. Rajaram, "Human activity recognition using multidimensional indexing," IEEE Trans. Pattern Anal. Machine Intell., vol. 24, no. 8, pp. 1091–1104, Aug. 2002.
[20] S. X. Ju, M. J. Black, and Y. Yaccob, "Cardboard people: A parameterized model of articulated image motion," in Proc. Int. Conf. Automatic Face and Gesture Recognition, Killington, VT, 1996, pp. 38–44.
[21] E. Horowitz, S. Sahni, and S. Anderson-Freed, Fundamentals of Data Structures in C. New York: W. H. Freeman.
[22] G. Borgefors, "Distance transformations in digital images," Comput. Vis., Graph., Image Process., vol. 34, no. 3, pp. 344–371, Feb. 1986.
[23] M. Bern and D. Eppstein, "Mesh generation and optimal triangulation," in Computing in Euclidean Geometry, 2nd ed. World Scientific, 1995, pp. 47–123.
[24] C. Stauffer and E. Grimson, "Learning patterns of activity using real-time tracking," IEEE Trans. Pattern Anal. Machine Intell., vol. 22, no. 8, pp. 747–757, 2000.
[25] D. Chetverikov and Z. Szabo, "A simple and efficient algorithm for detection of high curvature points in planar curves," in Proc. 23rd Workshop Austrian Pattern Recognition Group, 1999, pp. 175–184.
[26] L. P. Chew, "Constrained Delaunay triangulations," Algorithmica, vol. 4, no. 1, pp. 97–108, 1989.
[27] S. Belongie, J. Malik, and J. Puzicha, "Shape matching and object recognition using shape contexts," IEEE Trans. Pattern Anal. Machine Intell., vol. 24, no. 4, pp. 509–522, Apr. 2002.
[28] Y. Rui, A. C. She, and T. S. Huang, "Modified Fourier descriptors for shape representation - a practical approach," in Proc. 1st Int. Workshop on Image Databases and Multi-Media Search, Amsterdam, The Netherlands, Aug. 1996.
[29] A. Khotanzad and Y. H. Hong, "Invariant image recognition by Zernike moments," IEEE Trans. Pattern Anal. Machine Intell., vol. 12, no. 5, pp. 489–497, May 1990.
[30] E. Ukkonen, "On approximate string matching," in Proc. Int. Conf. Foundations of Computation Theory, 1983, pp. 487–495.
[31] M. J. Swain and D. H. Ballard, "Color indexing," Int. J. Comput. Vis., vol. 7, no. 1, pp. 11–32, 1991.

Jun-Wei Hsieh (M'06) received the B.S. degree in computer science from Tunghai University, Taiwan, R.O.C., in 1990 and the Ph.D. degree in computer engineering from National Central University, Taiwan, in 1995.
From 1996 to 2000, he was a Research Fellow at the Industrial Technology Research Institute, Hsinchu, Taiwan, where he managed a team developing video-related technologies. He is presently an Associate Professor with the Department of Electrical Engineering, Yuan Ze University, Chung-Li, Taiwan. His research interests include content-based multimedia databases, video indexing and retrieval, computer vision, and pattern recognition.
Dr. Hsieh received the Best Paper Awards from the Image Processing and Pattern Recognition Society of Taiwan in 2005, 2006, and 2007, and the Distinguished Research Award from Yuan Ze University in 2006. He is a recipient of the Phi Tau Phi Award.

Yung-Tai Hsu was born in Kaohsiung, Taiwan, R.O.C., in 1978. He received the B.S. and M.S. degrees in mechanical engineering from Yuan Ze University, Taiwan, in 2000 and 2002, respectively. He is currently pursuing the Ph.D. degree in electrical engineering at Yuan Ze University, Chung-Li, Taiwan.
His research interests include pattern recognition, computer vision, and machine learning.
Mr. Hsu received the Best Paper Award at the 2006 IPPR Conference on Computer Vision, Graphics, and Image Processing.
Hong-Yuan Mark Liao (SM'01) received the B.S. degree in physics from National Tsing-Hua University, Hsinchu, Taiwan, R.O.C., in 1981, and the M.S. and Ph.D. degrees in electrical engineering from Northwestern University, Evanston, IL, in 1985 and 1990, respectively.
He was a Research Associate with the Computer Vision and Image Processing Laboratory, Northwestern University, in 1990–1991. In July 1991, he joined the Institute of Information Science, Academia Sinica, Taipei, Taiwan, as an Assistant Research Fellow. He was promoted to Associate Research Fellow and then Research Fellow in 1995 and 1998, respectively. From August 1997 to July 2000, he served as the Deputy Director of the institute. From February 2001 to January 2004, he was the Acting Director of the Institute of Applied Science and Engineering Research. He is jointly appointed as a Professor in the Computer Science and Information Engineering Department of National Chiao-Tung University. His current research interests include multimedia signal processing, video surveillance, and content-based multimedia retrieval.
Dr. Liao is the Editor-in-Chief of the Journal of Information Science and Engineering. He is on the Editorial Boards of the International Journal of Visual Communication and Image Representation, the EURASIP Journal on Advances in Signal Processing, and Research Letters in Signal Processing. He was an Associate Editor of the IEEE TRANSACTIONS ON MULTIMEDIA from 1998 to 2001. He received the Young Investigators' Award from Academia Sinica in 1998, the Excellent Paper Award from the Image Processing and Pattern Recognition Society of Taiwan in 1998 and 2000, the Distinguished Research Award from the National Science Council of Taiwan in 2003, and the National Invention Award in 2004. He served as the Program Chair of the International Symposium on Multimedia Information Processing (ISMIP'97), the Program Co-Chair of the Second IEEE Pacific-Rim Conference on Multimedia (2001), the Conference Co-Chair of the Fifth IEEE International Conference on Multimedia and Expo (ICME 2004), and the Technical Co-Chair of the Eighth IEEE ICME (2007).

Chih-Chiang Chen was born in Taipei, Taiwan, R.O.C., in 1982. He received the B.S. degree in electrical engineering from Yuan Ze University, Taiwan, in 2004, where he is currently pursuing the Ph.D. degree in electrical engineering.
His research interests include image processing, behavior analysis, and video surveillance.