A Hough Transform-Based Voting Framework for Action Recognition

Angela Yao, Juergen Gall, and Luc Van Gool

Abstract

We present a method to classify and localize human actions in video using a Hough transform voting framework. Random trees are trained to learn a mapping between densely-sampled feature patches and their corresponding votes in a spatio-temporal-action Hough space. The leaves of the trees form a discriminative multi-class codebook that share features between the action classes and vote for action centers in a probabilistic manner. Using low-level features such as gradients and optical flow, we demonstrate that Hough-voting can achieve state-of-the-art performance on several datasets covering a wide range of action-recognition scenarios.

Images/Videos

(a) Detection hypotheses are generated for each frame independently. (b) Particle filtering is used to link hypotheses across frames. (c) Action tracks. (d) Hough voting by 3D patches for action label and center.

Source Code/Data

Source Code

Bounding Box Annotations for Datasets

If you have questions concerning the annotations or source code, please contact Angela Yao.

Publications

Gall J., Yao A., Razavi N., van Gool L., and Lempitsky V., Hough Forests for Object Detection, Tracking, and Action Recognition (PDF), IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 33, No. 11, 2188-2202, 2011. ©IEEE

Waltisberg D., Yao A., Gall J., and van Gool L., Variations of a Hough-Voting Action Recognition System (PDF), Proceedings of the ICPR 2010 Contests, Springer, LNCS 6388, 306-312, 2010. ©Springer

Yao A., Gall J., and van Gool L., A Hough Transform-Based Voting Framework for Action Recognition (PDF), IEEE Conference on Computer Vision and Pattern Recognition (CVPR'10), 2010. ©IEEE

Gall J. and Lempitsky V., Class-Specific Hough Forests for Object Detection (PDF), IEEE Conference on Computer Vision and Pattern Recognition (CVPR'09), 2009. ©IEEE