A Dual-Source Approach for 3D Pose Estimation from a Single Image

Hashim Yasin        Umar Iqbal        Björn Krüger        Andreas Weber        Juergen Gall


Abstract

One major challenge for 3D pose estimation from a single RGB image is the acquisition of sufficient training data. In particular, collecting large amounts of training data that contain unconstrained images annotated with accurate 3D poses is infeasible. We therefore propose to use two independent training sources. The first source consists of images with annotated 2D poses, and the second source consists of accurate 3D motion capture data. To integrate both sources, we propose a dual-source approach that combines 2D pose estimation with efficient and robust 3D pose retrieval. In our experiments, we show that our approach achieves state-of-the-art results and remains competitive even when the skeleton structures of the two sources differ substantially.


Overview


Overview: Our approach relies on two training sources. The first source is a motion capture database that contains only 3D poses. The second source is an image database with annotated 2D poses. The motion capture data is processed by normalizing the poses and projecting them to 2D using several virtual cameras. This yields many 3D-2D pairs in which the 2D poses serve as features. The image data is used to learn a pictorial structure model (PSM) for 2D pose estimation, where the unary terms are learned by a random forest. Given a test image, the PSM predicts the 2D pose, which is then used to retrieve the nearest normalized 3D poses. The final 3D pose is estimated by minimizing the projection error under the constraint that the solution is close to the retrieved poses, which are weighted by the unaries of the PSM. The steps (red arrows) in the dashed box can be iterated by updating the binary (pairwise) terms of the PSM using the retrieved poses and re-estimating the 2D pose.
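The preprocessing and retrieval steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the normalization, the number of virtual cameras, the orthographic projection, and all function names are illustrative assumptions.

```python
import numpy as np

def normalize_pose(pose3d):
    """Center a 3D pose (J x 3 array of joints) at its root joint (index 0)."""
    return pose3d - pose3d[0]

def virtual_cameras(n_views=12):
    """Yield rotation matrices sampling azimuth angles around the vertical axis
    (a simple stand-in for the paper's virtual cameras)."""
    for az in np.linspace(0.0, 2.0 * np.pi, n_views, endpoint=False):
        c, s = np.cos(az), np.sin(az)
        yield np.array([[c, 0.0, s],
                        [0.0, 1.0, 0.0],
                        [-s, 0.0, c]])

def build_database(poses3d, n_views=12):
    """Project every normalized 3D pose with every virtual camera.
    Returns scale-normalized 2D features and the corresponding 3D poses."""
    feats, labels = [], []
    for p in poses3d:
        p = normalize_pose(p)
        for R in virtual_cameras(n_views):
            p2d = (p @ R.T)[:, :2]                   # orthographic projection
            p2d = p2d / (np.linalg.norm(p2d) + 1e-9) # scale normalization
            feats.append(p2d.ravel())
            labels.append(p)
    return np.stack(feats), np.stack(labels)

def retrieve(query2d, feats, labels, k=3):
    """Return the k 3D poses whose projected 2D features are nearest to the
    (estimated) 2D pose, normalized the same way as the database features."""
    q = query2d - query2d[0]
    q = (q / (np.linalg.norm(q) + 1e-9)).ravel()
    idx = np.argsort(np.linalg.norm(feats - q, axis=1))[:k]
    return labels[idx]
```

The retrieved poses would then serve as the weighted prior in the final projection-error minimization; that optimization step is omitted here for brevity.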


Publication(s)

Hashim Yasin*, Umar Iqbal*, Björn Krüger, Andreas Weber, Juergen Gall
A Dual-Source Approach for 3D Pose Estimation from a Single Image

IEEE Conference on Computer Vision and Pattern Recognition 2016 (CVPR'16), Las Vegas, USA.
*equal contribution
[PDF] [Supplementary Material] [Poster]


Source Code

Source code is available here.


Acknowledgements

Hashim Yasin gratefully acknowledges the Higher Education Commission of Pakistan for providing financial support. The authors would also like to acknowledge financial support from the DFG Emmy Noether program (GA 1927/1-1) and the DFG research grant (KR 4309/2-1). A big thanks to Andreas Doering for preparing the source code for public release.