Pose for Action - Action for Pose

Umar Iqbal        Martin Garbade        Juergen Gall


In this work, we propose to utilize information about human actions to improve pose estimation in monocular videos. To this end, we present a pictorial structure model that exploits high-level information about activities to incorporate higher-order part dependencies by modeling action-specific appearance models and pose priors. However, instead of using an additional, expensive action recognition framework, the action priors are efficiently estimated by our pose estimation framework. This is achieved by starting with a uniform action prior and updating the action prior during pose estimation. We also show that learning the right amount of appearance sharing among action classes improves pose estimation. We demonstrate the effectiveness of the proposed method on two challenging datasets for pose estimation and action recognition with over 80,000 test images.


Overview: We propose an action-conditioned pictorial structure model for human pose estimation (2). Both the unaries φ and the binaries ψ of the model are conditioned on the distribution of action classes. While the pairwise terms are modeled by Gaussians conditioned on action probabilities, the unaries are learned by a regression forest conditioned on actions (1). Given an input video, we have no prior knowledge about the action and therefore start with a uniform action prior. We then predict the pose for each frame independently (3). Based on the estimated poses, the probabilities of the action classes are estimated for the entire video (4). Pose estimation is then repeated with the updated action prior to obtain better pose estimates (5).
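
The control flow of steps (3) to (5) in the overview above can be illustrated with a minimal Python sketch. It is not the released implementation: the function names, the 2-D toy "poses", and the two action templates are hypothetical stand-ins chosen only to make the loop runnable. In the actual method, the pose model is a pictorial structure with regression-forest unaries and Gaussian pairwise terms, and the action probabilities come from the authors' own recognition step.

```python
import numpy as np

# Hypothetical toy setup: a "pose" is a single 2-D point and each of the
# two action classes has one mean pose (template). Not from the paper.
ACTION_TEMPLATES = np.array([[0.0, 0.0], [1.0, 1.0]])


def toy_pose_model(frame, action_prior):
    """Stand-in for action-conditioned pose inference: blend the observation
    with the prior-weighted mean of the action templates."""
    expected_pose = action_prior @ ACTION_TEMPLATES
    return 0.5 * frame + 0.5 * expected_pose


def toy_action_classifier(poses):
    """Stand-in for video-level action recognition from estimated poses:
    softmax over negative distances between the mean pose and each template."""
    mean_pose = np.mean(poses, axis=0)
    distances = np.linalg.norm(ACTION_TEMPLATES - mean_pose, axis=1)
    scores = np.exp(-distances)
    return scores / scores.sum()


def estimate_with_action_feedback(frames, pose_model, action_classifier, num_actions):
    # No knowledge about the action yet: start from a uniform prior.
    prior = np.full(num_actions, 1.0 / num_actions)
    # (3) Estimate the pose in every frame independently.
    poses = [pose_model(frame, prior) for frame in frames]
    # (4) Estimate action probabilities for the whole video from the poses.
    prior = action_classifier(poses)
    # (5) Repeat pose estimation with the updated action prior.
    poses = [pose_model(frame, prior) for frame in frames]
    return poses, prior


# Toy "video": two observations that lie close to the second action template.
frames = [np.array([0.9, 1.1]), np.array([1.0, 0.8])]
poses, prior = estimate_with_action_feedback(
    frames, toy_pose_model, toy_action_classifier, num_actions=2
)
print(prior)  # the second action receives the larger probability
```

In this toy run the updated prior favors the second action, so the second pass pulls the poses toward that action's template. The sketch passes a full probability vector rather than a single predicted label, mirroring the overview's statement that the model is conditioned on the distribution of action classes.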


Umar Iqbal, Martin Garbade, Juergen Gall
Pose for Action - Action for Pose

IEEE Conference on Automatic Face and Gesture Recognition (FG'17), Washington DC, USA.
[PDF] [Supplementary Material] [Poster]

Source Code

Source code is available here


The work was partially supported by the ERC Starting Grant ARCA (677650).