3D Pose Estimation
Estimate the positions and angles of individual body parts in 3D space.
Monocular refers to a single-camera system.
Very reliable in controlled situations; used in motion tracking.
Currently performs poorly in realistic scenes.
Frequently relies on edge detection or background subtraction.
Potential problems: loose clothing, occlusions, ego motion, background clutter.
This paper
Performs 3D pose estimation of multiple people simultaneously, using a single camera, in a realistic street scene.
Viewpoint Estimation
On its own, this method only detects people, and only from a single viewpoint.
This paper trains 10 of these detectors on a multiview dataset; each detector assumes a different viewpoint.
This gives us viewpoint estimation: find the detector with the strongest response to the scene.
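The "strongest response" rule can be sketched as a simple argmax over per-detector scores. This is a minimal illustration only: the function name `estimate_viewpoint`, the normalisation step, and the example scores are assumptions made here, not details taken from the paper.

```python
def estimate_viewpoint(responses):
    """Pick the viewpoint whose detector responds most strongly.

    `responses` holds one non-negative score per viewpoint-specific
    detector (hypothetical values; the real detector output is not
    specified in the slides). Returns the winning index and the scores
    normalised to sum to 1, which can serve as viewpoint probabilities.
    """
    if not responses:
        raise ValueError("need at least one detector response")
    total = sum(responses)
    if total <= 0:
        raise ValueError("responses must contain a positive score")
    scores = [r / total for r in responses]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Made-up responses from 10 viewpoint-specific detectors.
responses = [0.1, 0.3, 1.2, 2.5, 0.9, 0.2, 0.1, 0.05, 0.05, 0.1]
best, scores = estimate_viewpoint(responses)
```

Keeping the normalised scores, rather than only the winner, is convenient because later stages can treat them as per-frame viewpoint probabilities.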
Tracklet Extraction
We want to extract a track for each person.
Relating temporal states gives us more information for body pose estimation, and more robustness against occlusion.
Use the pictorial structure model as a detector to get bounding boxes and the likely viewpoint of each person at each frame.
Treat bounding boxes and viewpoint probabilities as emissions, and hypotheses as states, in a Hidden Markov Model.
Use Viterbi decoding to extract the most likely sequence of states/viewpoints.
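Viterbi decoding itself is a standard dynamic program. The sketch below works in log space on a toy two-state model; the function name `viterbi`, the argument layout, and all the numbers are illustrative assumptions rather than the paper's actual states or probabilities.

```python
import math


def viterbi(log_init, log_trans, log_emit):
    """Most likely state sequence of an HMM, computed in log space.

    log_init[s]     : log prior of state s at the first frame
    log_trans[p][s] : log probability of moving from state p to state s
    log_emit[t][s]  : log likelihood of frame t's observation under state s
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    back = []  # back[t-1][s] = best predecessor of state s at frame t
    for t in range(1, len(log_emit)):
        new_score, pointers = [], []
        for s in range(n_states):
            prev = max(range(n_states), key=lambda p: score[p] + log_trans[p][s])
            pointers.append(prev)
            new_score.append(score[prev] + log_trans[prev][s] + log_emit[t][s])
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=score.__getitem__)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def logs(rows):
    """Convert a table of probabilities to log probabilities."""
    return [[math.log(v) for v in row] for row in rows]


# Toy example: two viewpoint states, "sticky" transitions, and one noisy
# frame (index 2) whose emission slightly prefers the other state.
log_init = [math.log(0.5), math.log(0.5)]
log_trans = logs([[0.9, 0.1], [0.1, 0.9]])
log_emit = logs([[0.9, 0.1], [0.9, 0.1], [0.4, 0.6], [0.9, 0.1]])
path = viterbi(log_init, log_trans, log_emit)
```

In this toy run a frame-by-frame argmax would flip to state 1 at the noisy frame, while the decoded path stays in state 0 throughout, which is the kind of temporal smoothing the tracklet step relies on.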
Tracklet Extraction
Transition Probabilities between states:
For viewpoints, transition probabilities are high between similar viewpoints, reflecting that people turn slowly.
For bounding boxes, the transition probability decreases as the difference between the RGB colour histograms within each bounding box grows, so that similar-looking boxes in consecutive frames are linked.
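One way to picture the two transition terms is sketched below. Several things here are assumptions made for illustration and are not stated in the slides: viewpoints are treated as equally spaced around a circle, both terms use a Gaussian falloff, the appearance score is assumed to fall as the histograms diverge, and the `sigma` values and function names are hypothetical.

```python
import math


def viewpoint_transition(n_views=10, sigma=1.0):
    """Row-stochastic matrix favouring transitions between nearby viewpoints.

    Assumes viewpoints are arranged on a circle, so the first and last
    viewpoint are neighbours.
    """
    matrix = []
    for i in range(n_views):
        row = []
        for j in range(n_views):
            d = abs(i - j)
            d = min(d, n_views - d)  # circular distance between viewpoints
            row.append(math.exp(-0.5 * (d / sigma) ** 2))
        total = sum(row)
        matrix.append([v / total for v in row])
    return matrix


def histogram_distance(h1, h2):
    """Euclidean distance between two normalised colour histograms."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))


def appearance_transition(h1, h2, sigma=0.2):
    """Unnormalised transition score that decays with histogram distance."""
    d = histogram_distance(h1, h2)
    return math.exp(-0.5 * (d / sigma) ** 2)


T = viewpoint_transition()
same = appearance_transition([0.5, 0.3, 0.2], [0.5, 0.3, 0.2])
different = appearance_transition([0.5, 0.3, 0.2], [0.1, 0.2, 0.7])
```

With these choices, a small turn between adjacent viewpoints is far more probable than a jump to the opposite side, and two boxes with identical histograms score higher than two boxes with very different colours.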
3D Pose Estimation
Use 2D->3D exemplars to pick the most likely 3D pose in the tracklet for each frame.
This gives us M body pose hypotheses, one for each frame, where M is the length of the tracklet.
3D body pose at frame m: $Q_m = \{q_m, \phi_m, h_m\}$
  $q_m$: joint configuration
  $\phi_m$: body rotation in the 3D world
  $h_m$: position and scale of the body
Representation of Pose
3D Pose Estimation
Single Frame likelihood: Breakdown:
hGPLVM
Given the above information and a prior $p(Q_{1:m}) = p(q_{1:m})\,p(h_{1:m})$, we can estimate the posterior probability of the poses across frames using hGPLVM (hierarchical Gaussian Process Latent Variable Model).
This models the sequence of poses as a Gaussian process, and solves for the poses using MAP estimation.
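In general terms, MAP estimation picks the pose sequence that maximises the product of likelihood and prior. The lines below spell this out using Bayes' rule; the evidence symbol $E_{1:m}$ for the image observations is an assumption introduced here, as the slides do not name it.

```latex
\begin{align}
p(Q_{1:m} \mid E_{1:m}) &\propto p(E_{1:m} \mid Q_{1:m})\, p(Q_{1:m}) \\
p(Q_{1:m}) &= p(q_{1:m})\, p(h_{1:m}) \\
\hat{Q}_{1:m} &= \arg\max_{Q_{1:m}} \; p(E_{1:m} \mid Q_{1:m})\, p(q_{1:m})\, p(h_{1:m})
\end{align}
```

The first line is Bayes' rule with the normalising constant dropped, the second is the factorised prior from the slide, and the third is the resulting MAP objective.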