We present an approach for modeling the human body by sums of spatial Gaussians (SoG), allowing us to perform fast and high-quality markerless motion capture from multi-view video sequences. The SoG model is equipped with a color model to represent the shape and appearance of the human and can be reconstructed from a sparse set of images. Analogously to the human body, we represent the image domain as a SoG that models color-consistent image blobs. Based on the SoG models of the image and the human body, we introduce a novel continuous and differentiable model-to-image similarity measure that can be used to estimate the skeletal motion of a human at 5-15 frames per second, even for many camera views. In our experiments, we show that our method, which does not rely on silhouettes or training data, offers a good balance between accuracy and computational cost.
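To illustrate why a similarity measure between two SoG models can be continuous and differentiable, the following minimal sketch sums pairwise overlaps between isotropic 2D Gaussians, each weighted by a smooth color term. It is not the paper's exact formulation: the function names, the Wendland-type color weight, the `max_dist` threshold, and the tuple layout `(mean, sigma, color)` are all assumptions made for illustration, and the model Gaussians are assumed to be already projected into the image plane.

```python
import numpy as np

def gaussian_overlap_2d(mu_a, sigma_a, mu_b, sigma_b):
    """Integral over R^2 of the product of two unnormalized isotropic
    Gaussians exp(-||x - mu||^2 / (2 sigma^2)), in closed form."""
    var_sum = sigma_a ** 2 + sigma_b ** 2
    diff = np.asarray(mu_a, dtype=float) - np.asarray(mu_b, dtype=float)
    dist_sq = float(diff @ diff)
    scale = 2.0 * np.pi * sigma_a ** 2 * sigma_b ** 2 / var_sum
    return scale * np.exp(-dist_sq / (2.0 * var_sum))

def color_similarity(c_a, c_b, max_dist=0.3):
    """Assumed smooth color weight: 1 for identical colors, 0 once the
    color distance reaches max_dist (Wendland-type falloff)."""
    d = np.linalg.norm(np.asarray(c_a, dtype=float) - np.asarray(c_b, dtype=float))
    r = min(d / max_dist, 1.0)
    return (1.0 - r) ** 4 * (4.0 * r + 1.0)

def sog_similarity(model_blobs, image_blobs):
    """Sum of color-weighted overlaps over all model/image Gaussian pairs.
    Each blob is a hypothetical tuple (mean_xy, sigma, rgb_color)."""
    total = 0.0
    for mu_m, sigma_m, color_m in model_blobs:
        for mu_i, sigma_i, color_i in image_blobs:
            weight = color_similarity(color_m, color_i)
            if weight > 0.0:
                total += weight * gaussian_overlap_2d(mu_m, sigma_m, mu_i, sigma_i)
    return total
```

Because every term is a smooth function of the Gaussian means, a measure of this kind can be maximized with gradient-based local optimization over the pose parameters that determine those means.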
Top: Construction of an actor-specific 3D human body model based on a set of spatial 3D Gaussians with a constant color model (3D SoG model of the actor). Bottom: Conversion of the input images into superpixels with a constant color model. The shape of each superpixel is represented by a spatial 2D Gaussian (2D SoG models of the images). Right: Pose estimation is performed by efficiently matching the 3D SoG model of the actor to the 2D SoG models of the images using local optimization.
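As a rough illustration of the image-side conversion described in the caption, the sketch below turns an image into constant-color blobs, each summarized by a 2D Gaussian. It uses a fixed uniform grid as a stand-in for the superpixel step, which is a simplification and not necessarily how the method clusters pixels; the function name `image_to_blobs`, the `block` parameter, and the choice of sigma as half the block size are assumptions.

```python
import numpy as np

def image_to_blobs(image, block=8):
    """Split an H x W x 3 image into square blocks and describe each block
    by a 2D Gaussian (center, sigma) plus its mean color.
    Assumption: sigma is set to half the block's larger side."""
    height, width, _ = image.shape
    centers, sigmas, colors = [], [], []
    for y in range(0, height, block):
        for x in range(0, width, block):
            patch = image[y:y + block, x:x + block]
            patch_h, patch_w = patch.shape[:2]  # border blocks may be smaller
            centers.append((x + patch_w / 2.0, y + patch_h / 2.0))
            sigmas.append(max(patch_h, patch_w) / 2.0)
            colors.append(patch.reshape(-1, 3).mean(axis=0))
    return np.array(centers), np.array(sigmas), np.array(colors)
```

For example, a 16 x 16 image with `block=8` yields four blobs; merging neighboring blocks of similar color would reduce the blob count further and bring the result closer to color-consistent superpixels.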
Tracking results of the proposed method, shown as a skeleton overlay on the input images.
Video ~30MB (AVI)
Stoll C., Hasler N., Gall J., Seidel H.-P., and Theobalt C., Fast Articulated Motion Tracking using a Sums of Gaussians Body Model (PDF), International Conference on Computer Vision (ICCV'11), 951-958, 2011. ©IEEE.
Stoll C., Hasler N., Gall J., Seidel H.-P., and Theobalt C., Fast Articulated Motion Tracking using a Sums of Gaussians Body Model - Supplementary Material (PDF), 2011.