Early work on human action recognition focused on tracking and classifying articulated body motions. Such methods required accurate localisation of body parts, which is a difficult task, particularly under realistic imaging conditions. As such, recent trends have shifted towards the use of more abstract, low-level appearance features such as spatio-temporal interest points. Motivated by the recent progress in pose estimation, we feel that pose-based action recognition systems warrant a second look. In this paper, we address the question of whether pose estimation is useful for action recognition or if it is better to train a classifier only on low-level appearance features drawn from video data. We compare pose-based, appearance-based and combined pose and appearance features for action recognition in a home-monitoring scenario. Our experiments show that pose-based features outperform low-level appearance features, even when heavily corrupted by noise, suggesting that pose estimation is beneficial for the action recognition task.
We address the question of whether it is useful to perform pose estimation for the task of action recognition by comparing the use of appearance-based features, pose-based features and combined appearance- and pose-based features.
Pose-based features. (a) Euclidean distance between two joints (red). (b) Plane feature: distance between a joint (red) and a plane defined by three joints (black). (c) Normal plane feature: same as the plane feature, but the plane is defined by its normal (the direction between two joints, black squares) and a joint it passes through (black circle). (d) Velocity feature: velocity component of a joint (red) along the direction defined by two joints (black). (e) Normal velocity feature: velocity component of a joint along the normal of the plane defined by three other joints (black).
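The five feature types in the caption above can be sketched geometrically as follows. This is a minimal illustration with NumPy, not the authors' implementation: the function names are hypothetical, joints are assumed to be 3D points, the velocity is assumed to be supplied by the caller (for example as a finite difference between frames), and the distances are returned as signed values, which may differ from the exact definitions used in the paper.

```python
import numpy as np


def _unit(v):
    """Return v scaled to unit length; reject degenerate (zero) directions."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)
    if norm == 0.0:
        raise ValueError("direction is undefined for coincident or collinear joints")
    return v / norm


def _plane_normal(p1, p2, p3):
    """Unit normal of the plane spanned by three joints."""
    p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p1, p2, p3))
    return _unit(np.cross(p2 - p1, p3 - p1))


def joint_distance(a, b):
    """(a) Euclidean distance between two joints."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))


def plane_feature(joint, p1, p2, p3):
    """(b) Signed distance from a joint to the plane through p1, p2, p3."""
    offset = np.asarray(joint, dtype=float) - np.asarray(p1, dtype=float)
    return float(np.dot(offset, _plane_normal(p1, p2, p3)))


def normal_plane_feature(joint, n_from, n_to, anchor):
    """(c) Signed distance from a joint to the plane that passes through
    `anchor` and whose normal is the direction n_from -> n_to."""
    normal = _unit(np.asarray(n_to, dtype=float) - np.asarray(n_from, dtype=float))
    offset = np.asarray(joint, dtype=float) - np.asarray(anchor, dtype=float)
    return float(np.dot(offset, normal))


def velocity_feature(velocity, d_from, d_to):
    """(d) Component of a joint's velocity along the direction d_from -> d_to."""
    direction = _unit(np.asarray(d_to, dtype=float) - np.asarray(d_from, dtype=float))
    return float(np.dot(np.asarray(velocity, dtype=float), direction))


def normal_velocity_feature(velocity, p1, p2, p3):
    """(e) Component of a joint's velocity along the normal of the plane
    through p1, p2, p3."""
    return float(np.dot(np.asarray(velocity, dtype=float), _plane_normal(p1, p2, p3)))
```

Each function returns a single scalar, so one plausible use is to evaluate them over selected joint combinations in every frame and concatenate the results into a per-frame feature vector. Which joint combinations to use is left open here.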
Yao A., Gall J., and van Gool L., Coupled Action Recognition and Pose Estimation from Multiple Views (PDF), International Journal of Computer Vision, Vol 100(1), 16-37, Springer, 2012. ©Springer
Yao A., Gall J., Fanelli G., and van Gool L., Does Human Action Recognition Benefit from Pose Estimation? (PDF), British Machine Vision Conference (BMVC'11), 2011.
Gall J., Yao A., Razavi N., van Gool L., and Lempitsky V., Hough Forests for Object Detection, Tracking, and Action Recognition (PDF), IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 33, No. 11, 2188-2202, 2011. ©IEEE
Yao A., Gall J., and van Gool L., A Hough Transform-Based Voting Framework for Action Recognition (PDF), IEEE Conference on Computer Vision and Pattern Recognition (CVPR'10), 2010. ©IEEE