Backprojection Revisited: Scalable Multi-view Object Detection and Similarity Metrics for Detections

Nima Razavi, Juergen Gall, and Luc Van Gool

Abstract

Hough transform based object detectors learn a mapping from the image domain to a Hough voting space. Within this space, object hypotheses are formed by local maxima, and the votes contributing to a hypothesis are called its support. In this work, we investigate the use of the support and its backprojection to the image domain for multi-view object detection. To this end, we create a shared codebook whose training and matching complexities are independent of the number of quantized views. We show that, since backprojection encodes enough information about the viewpoint, all views can be handled together. In our experiments, we demonstrate that treating views jointly achieves superior accuracy and efficiency compared to the popular one-vs-the-rest detectors, especially with few training examples and no view annotations. Furthermore, we go beyond the detection case and, based on the support, introduce a part-based similarity measure between two arbitrary detections that naturally takes the spatial relationships of parts into account and is insensitive to partial occlusions. We also show that backprojection can be used to efficiently measure the similarity of a detection to all training examples. Finally, we demonstrate how these metrics can be used to estimate continuous object parameters such as human pose and object viewpoint. In our experiments, we achieve state-of-the-art performance for view classification on the PASCAL VOC'06 dataset.

Images/Videos

(a) Features are matched against the codebook and cast votes into the voting space. (b) The local maximum of the voting space is localized and the votes contributing to it are identified (inside the red circle). (c) The votes are backprojected to the image domain, creating the backprojection mask. (d-e) Visualization of the backprojection mask.
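
Steps (a) to (c) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`cast_votes`, `support_of_maximum`, `backproject`), the 2D accumulator without scale or view dimensions, the dictionary codebook, and the fixed support radius are all assumptions made for illustration.

```python
import numpy as np

def cast_votes(features, codebook, shape):
    """Step (a): accumulate votes in a 2D Hough space (hypothetical layout).

    features: list of (x, y, codeword_id) matches in the image.
    codebook: dict mapping codeword_id -> list of (dx, dy, weight) offsets
              pointing from the feature to the object center.
    Returns the accumulator and one record per vote so that the support
    of a hypothesis can be traced back to its features.
    """
    hough = np.zeros(shape, dtype=float)  # indexed as [y, x]
    votes = []
    for i, (x, y, cw) in enumerate(features):
        for dx, dy, w in codebook.get(cw, []):
            vx, vy = x + dx, y + dy
            if 0 <= vx < shape[1] and 0 <= vy < shape[0]:
                hough[vy, vx] += w
                votes.append((vx, vy, w, i))
    return hough, votes

def support_of_maximum(hough, votes, radius):
    """Step (b): locate the strongest maximum and collect nearby votes."""
    py, px = np.unravel_index(np.argmax(hough), hough.shape)
    px, py = int(px), int(py)
    support = [v for v in votes
               if (v[0] - px) ** 2 + (v[1] - py) ** 2 <= radius ** 2]
    return (px, py), support

def backproject(support, features, shape):
    """Step (c): write each supporting vote's weight at its feature location."""
    mask = np.zeros(shape, dtype=float)
    for _, _, w, i in support:
        x, y, _ = features[i]
        mask[y, x] += w
    return mask

# Toy data: three features agree on the center (4, 4); one is an outlier.
features = [(2, 2, 0), (6, 2, 1), (4, 6, 2), (0, 0, 3)]
codebook = {0: [(2, 2, 1.0)], 1: [(-2, 2, 1.0)],
            2: [(0, -2, 1.0)], 3: [(7, 7, 0.5)]}
hough, votes = cast_votes(features, codebook, (8, 8))
peak, support = support_of_maximum(hough, votes, radius=1)
mask = backproject(support, features, (8, 8))
```

In this toy case the mask is non-zero only at the three features that voted for the detected center, so the outlier at (0, 0) is excluded from the support.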

(a) Sharing of codebook occurrences across views for Leuven-cars. (b) Viewpoint retrieval with the proposed nearest neighbor metric.
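
The idea of comparing a detection with all training examples through its support can be sketched as follows. This is an illustrative assumption rather than the metric defined in the paper: it supposes that every supporting vote remembers which training example its codebook occurrence came from, scores each training example by its share of the total vote weight, and returns the view label of the best match. The names `similarity_to_training`, `retrieve_view`, and `view_of` are hypothetical.

```python
from collections import defaultdict

def similarity_to_training(support):
    """Score training examples by their share of the support weight.

    support: iterable of (weight, train_id) pairs, one per supporting vote.
    Returns a dict train_id -> normalized score (scores sum to 1).
    """
    scores = defaultdict(float)
    for weight, train_id in support:
        scores[train_id] += weight
    total = sum(scores.values())
    if total <= 0:
        return {}
    return {tid: s / total for tid, s in scores.items()}

def retrieve_view(support, view_of):
    """Return (view label, score) of the most similar training example."""
    sims = similarity_to_training(support)
    if not sims:
        return None
    best = max(sims, key=sims.get)
    return view_of[best], sims[best]

# Toy support: training example "a" contributes most of the vote weight.
support = [(1.0, "a"), (0.5, "b"), (1.5, "a"), (1.0, "c")]
view_of = {"a": "front", "b": "side", "c": "rear"}
sims = similarity_to_training(support)
result = retrieve_view(support, view_of)
```

Because the scores are accumulated in a single pass over the support, the cost depends on the number of supporting votes and not on the number of training examples.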

Video ~40MB (AVI)

Publications

Razavi N., Gall J., and Van Gool L., Backprojection Revisited: Scalable Multi-view Object Detection and Similarity Metrics for Detections (PDF), European Conference on Computer Vision (ECCV'10), LNCS 6311, 620-633, 2010. ©Springer-Verlag

Gall J., Yao A., Razavi N., Van Gool L., and Lempitsky V., Hough Forests for Object Detection, Tracking, and Action Recognition (PDF), IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 33, No. 11, 2188-2202, 2011. ©IEEE

Gall J. and Lempitsky V., Class-Specific Hough Forests for Object Detection (PDF), IEEE Conference on Computer Vision and Pattern Recognition (CVPR'09), 2009. ©IEEE