
Monocular 3D Pose Estimation and Tracking by Detection

Mykhaylo Andriluka, Stefan Roth, Bernt Schiele

3D Pose Estimation
Estimates the positions and angles of individual body parts in 3D space.
"Monocular" refers to a single-camera system.
Very reliable in controlled settings, such as those used for motion tracking.
Currently performs poorly in realistic scenes.
Frequently relies on edge detection or background subtraction.
Potential problems: loose clothing, occlusions, ego-motion, background clutter.

Why is it interesting for us?

Accurate body pose estimation would make action recognition much easier.

This paper
Performs 3D pose estimation of multiple people simultaneously with a single camera in a realistic street scene

Pictorial Structures Model


2D part-based model. Each part $i$ at frame $m$ is represented by $l_i^m = \{x_i^m, y_i^m, \theta_i^m, s_i^m\}$ (image position, orientation and scale).
$L^m$: overall part configuration at frame $m$.
$D^m$: visual evidence at frame $m$.

Pictorial Structures Model


The body is represented by ten parts: left/right lower and upper legs, torso, head, and left/right upper and lower arms.
Each body part is detected individually by its own part detector.
The body is detected by maximizing the posterior probability $p(L^m \mid D^m)$ of the part configuration.
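In the usual pictorial structures formulation, the posterior factorizes as $p(L^m \mid D^m) \propto p(D^m \mid L^m)\,p(L^m)$. The likelihood is approximated as a product of per-part terms $\prod_i p(d_i^m \mid l_i^m)$, and the prior is tree-structured: $p(L^m) = p(l_0^m)\prod_{(i,j)} p(l_i^m \mid l_j^m)$. The notation $d_i^m$ for per-part evidence is introduced here for illustration. The sketch below finds the maximizing configuration with max-product message passing on a toy discrete version of this model, where each part can sit at one of K candidate locations. The function name `map_configuration` and its inputs are hypothetical, and the paper may use a different inference scheme.

```python
import numpy as np


def map_configuration(unary, pairwise, parent):
    """MAP part configuration of a tree-structured pictorial structures model.

    Toy discrete version (assumption): every part takes one of K locations.
    unary[i]    : (K,) log-likelihood log p(d_i | l_i = k) from part detector i
                  (a root prior, if any, can be added to unary[root])
    pairwise[i] : (K, K) log-prior log p(l_i = a | l_parent = b), indexed [b, a]
                  (ignored for the root)
    parent[i]   : index of the parent part, -1 for the root (e.g. the torso)
    Returns the best location index of every part and the maximum
    log-posterior (up to the constant log p(D)).
    """
    n = len(parent)
    children = [[] for _ in range(n)]
    for i, p in enumerate(parent):
        if p >= 0:
            children[p].append(i)
    root = parent.index(-1)

    # Depth-first pre-order: every parent appears before its descendants.
    preorder, stack = [], [root]
    while stack:
        i = stack.pop()
        preorder.append(i)
        stack.extend(children[i])

    score = [np.asarray(u, dtype=float).copy() for u in unary]
    argbest = {}
    # Leaves-to-root pass: fold each child's best score into its parent.
    for i in reversed(preorder):
        p = parent[i]
        if p < 0:
            continue
        msg = pairwise[i] + score[i][None, :]  # indexed [parent loc, child loc]
        argbest[i] = msg.argmax(axis=1)
        score[p] += msg.max(axis=1)

    # Root-to-leaves pass: read off each child's best location given its parent.
    loc = [0] * n
    loc[root] = int(score[root].argmax())
    for i in preorder:
        if parent[i] >= 0:
            loc[i] = int(argbest[i][loc[parent[i]]])
    return loc, float(score[root].max())
```

Because the prior is a tree, the cost is linear in the number of parts and quadratic in K, instead of exponential in the number of parts.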

Viewpoint Estimation
The basic model only detects people, and only from a single viewpoint.
This paper trains 10 such detectors on a multiview dataset, each assuming a different viewpoint.
This also yields viewpoint estimation: find the detector with the strongest response to the scene.
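Picking the viewpoint amounts to taking the argmax over the detector responses. A later step needs viewpoint probabilities rather than a single label, so the sketch below also normalizes the responses with a softmax. The softmax, the `temperature` parameter, the function name `viewpoint_distribution`, and the example scores are all assumptions for illustration. The paper may calibrate its detector outputs differently.

```python
import numpy as np


def viewpoint_distribution(scores, temperature=1.0):
    """Softmax over the responses of the viewpoint-specific detectors."""
    s = np.asarray(scores, dtype=float) / temperature
    s = s - s.max()  # subtract the max for numerical stability
    p = np.exp(s)
    return p / p.sum()


# Hypothetical responses of 10 viewpoint-specific detectors on one person.
scores = [0.1, 0.4, 2.3, 1.1, -0.5, 0.0, 0.3, 0.8, -1.2, 0.6]
probs = viewpoint_distribution(scores)
best_view = int(np.argmax(probs))  # index of the strongest detector (2 here)
```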

Tracklet Extraction
Goal: extract a track for each person. Relating states over time gives more information for body pose estimation and adds robustness against occlusion.
Use the pictorial structures model as a detector to obtain, at each frame and for each person, a bounding box and the likely viewpoint.
Treat bounding boxes and viewpoint probabilities as emissions, and hypotheses as states, in a Hidden Markov Model.
Use Viterbi decoding to extract the most likely sequence of states/viewpoints.
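Viterbi decoding finds the single most likely hidden-state sequence of an HMM by dynamic programming. Below is a generic log-space version. Mapping the states to detection hypotheses or viewpoints, and the emissions to detector scores, is left to the caller. The function and argument names are hypothetical.

```python
import numpy as np


def viterbi(log_init, log_trans, log_emit):
    """Most likely hidden-state sequence of an HMM, computed in log space.

    log_init  : (S,)   log p(state at frame 0)
    log_trans : (S, S) log p(state_t = j | state_{t-1} = i), indexed [i, j]
    log_emit  : (T, S) log p(observation_t | state_t = s)
    """
    T, S = log_emit.shape
    delta = log_init + log_emit[0]          # best log-score ending in each state
    back = np.zeros((T, S), dtype=int)      # backpointers to the previous state
    for t in range(1, T):
        cand = delta[:, None] + log_trans   # [previous state, next state]
        back[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + log_emit[t]
    # Trace the backpointers from the best final state.
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1], float(delta.max())
```

Working with log-probabilities avoids numerical underflow on long tracklets. The cost is O(T·S²) for T frames and S states.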

Tracklet Extraction
Transition Probabilities between states:
For viewpoints: high transition probabilities between similar viewpoints, reflecting that people turn slowly.
For bounding boxes: the transition probability increases with the similarity of the RGB colour histograms within the two bounding boxes (i.e. it decreases as the histograms differ more).
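A minimal sketch of both transition models follows. It rests on several assumptions that are not stated in the slides:

- The 10 viewpoints are evenly spaced around the person.
- Viewpoint changes are penalized with a Gaussian on their circular distance.
- Histogram similarity is measured with the Bhattacharyya coefficient.

Every function name, as well as `sigma` and `bins`, is hypothetical.

```python
import numpy as np


def viewpoint_transitions(n_views=10, sigma=1.0):
    """Row-stochastic matrix that favours small viewpoint changes.

    Assumes the viewpoints are evenly spaced around the person, so the
    distance wraps around (view 0 is a neighbour of view n_views - 1).
    """
    idx = np.arange(n_views)
    d = np.abs(idx[:, None] - idx[None, :])
    d = np.minimum(d, n_views - d)  # circular distance in viewpoint steps
    w = np.exp(-0.5 * (d / sigma) ** 2)
    return w / w.sum(axis=1, keepdims=True)


def colour_histogram(patch, bins=8):
    """Normalised joint RGB histogram of an H x W x 3 uint8 patch."""
    hist, _ = np.histogramdd(patch.reshape(-1, 3), bins=bins,
                             range=[(0, 256)] * 3)
    return hist.ravel() / hist.sum()


def bhattacharyya(h1, h2):
    """Histogram similarity in [0, 1]; 1 for identical histograms."""
    return float(np.sum(np.sqrt(h1 * h2)))


def box_transitions(hists_prev, hists_next, eps=1e-12):
    """Transition probabilities between boxes in consecutive frames.

    Each row is proportional to the colour similarity between one box in
    frame m and every candidate box in frame m + 1.
    """
    sim = np.array([[bhattacharyya(a, b) for b in hists_next]
                    for a in hists_prev]) + eps  # eps avoids all-zero rows
    return sim / sim.sum(axis=1, keepdims=True)
```

The logarithms of these matrices can serve as `log_trans` in an HMM decoder.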

3D Pose Estimation
Use 2D-to-3D exemplars to pick the most likely 3D pose in the tracklet for each frame. This gives M body pose hypotheses for each frame, where M is the length of the tracklet.
3D body pose at frame $m$: $Q^m = \{q^m, \phi^m, h^m\}$, where
$q$: joint configuration
$\phi$: body rotation in the 3D world
$h$: position and scale of the body

Representation of Pose

3D Pose Estimation
Single Frame likelihood: Breakdown:

Position of Body Parts


To reduce the computational complexity of 3D pose estimation, we take the J most likely locations of each body part n in frame m and fit a Gaussian distribution to them. This allows the posterior probability to be modelled as:
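One way to fit such a Gaussian is sketched below. It keeps the J highest-scoring candidate locations of a part, computes their score-weighted mean and covariance, and then evaluates a 2D point under the fitted density. The score weighting, the small ridge term, and the names `fit_part_gaussian` and `gaussian_logpdf` are assumptions. The slides do not specify how the Gaussian is estimated.

```python
import numpy as np


def fit_part_gaussian(locations, scores, J=50):
    """Fit a Gaussian to the J highest-scoring 2D locations of one body part.

    locations : (N, 2) candidate image positions (x, y)
    scores    : (N,)   non-negative posterior scores of those positions
    """
    locations = np.asarray(locations, dtype=float)
    scores = np.asarray(scores, dtype=float)
    top = np.argsort(scores)[::-1][:J]       # indices of the J best candidates
    pts, w = locations[top], scores[top]
    w = w / w.sum()                           # normalised weights
    mean = w @ pts
    diff = pts - mean
    cov = (w[:, None] * diff).T @ diff + 1e-6 * np.eye(2)  # ridge keeps cov invertible
    return mean, cov


def gaussian_logpdf(x, mean, cov):
    """Log-density of a 2D point under N(mean, cov)."""
    d = np.asarray(x, dtype=float) - mean
    _, logdet = np.linalg.slogdet(cov)
    return float(-0.5 * (d @ np.linalg.solve(cov, d) + logdet + 2 * np.log(2 * np.pi)))
```

With one Gaussian per part, scoring a 3D pose hypothesis only requires evaluating a closed-form density at its projected part positions. This avoids re-running the part detectors.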

hGPLVM
Given the above and a prior $p(Q^{1:M}) = p(q^{1:M})\,p(h^{1:M})$, the posterior over the poses in all frames of the tracklet can be estimated using an hGPLVM (hierarchical Gaussian process latent variable model). This models the sequence of poses as a Gaussian process, and the poses are found by MAP estimation.
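The sketch below is not an hGPLVM. It only illustrates the underlying idea of a Gaussian-process prior over time: it scores a single 1D trajectory, for example one coordinate of $h$ across the tracklet, under a zero-mean GP with a squared-exponential kernel over frame indices. Smooth trajectories receive a higher prior. The kernel choice, its hyperparameters, and the function names are assumptions.

```python
import numpy as np


def rbf_kernel(t, length_scale=3.0, variance=1.0, noise=1e-4):
    """Squared-exponential covariance between frame indices t, plus jitter."""
    t = np.asarray(t, dtype=float)
    sq = (t[:, None] - t[None, :]) ** 2
    return variance * np.exp(-0.5 * sq / length_scale**2) + noise * np.eye(len(t))


def gp_log_prior(trajectory, length_scale=3.0, variance=1.0, noise=1e-4):
    """Log-density of a 1D trajectory (one value per frame) under a zero-mean GP."""
    y = np.asarray(trajectory, dtype=float)
    K = rbf_kernel(np.arange(len(y)), length_scale, variance, noise)
    L = np.linalg.cholesky(K)                              # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # alpha = K^{-1} y
    log_det_half = np.log(np.diag(L)).sum()                # 0.5 * log|K|
    return float(-0.5 * y @ alpha - log_det_half - 0.5 * len(y) * np.log(2 * np.pi))
```

A MAP estimate would maximize the sum of the per-frame log-likelihoods and a prior term like this one over the whole sequence. The hGPLVM applies the GP over time in a learned low-dimensional latent space rather than directly on each coordinate.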

Pose Estimation Examples
