CVPR2009: Object Detection Using a Max-Margin Hough Transform


  • Thank you. Good morning. I am going to present a learning framework for Hough transform based object detection.
  • We are interested in object detection, that is, localizing an instance of an object in an image. Our approach is based on the Hough transform. Before going into the details, I will present an overview of the Hough transform, followed by our learning framework. I will then present experimental results and conclude.
  • Yet another way of doing this is a Hough transform based approach. This is of course an old idea, proposed by Hough for detecting lines more than 50 years ago. Since then it has been generalized to detect parametric shapes like ellipses and circles. Local parts cast votes for the object pose, and the complexity scales linearly with the number of parts times the number of votes.
  • Recently Leibe and Schiele have extended this framework to object detection. A slide from their Implicit Shape Model framework illustrates the technique. Local parts are patches represented using a dictionary learned from training examples. The position of each codeword is recorded on the training examples to form a distribution of each codeword's location with respect to the object center. For example, the patch corresponding to the head of a person is typically at a fixed vertical offset with respect to the torso, as seen in the bottom left distribution. At test time, interest points are detected and matched to the codebook entries, which vote for the object center. The peaks of the voting space correspond to object locations. Quite a simple but powerful framework.
  • Let me introduce some notation for the next few slides. Let C be the learned codebook, let f denote the features and l their locations. The overall detection score is the sum of contributions from each feature f_j observed at location l_j. Each feature is matched to the codebook entries as given by p(C_i|f_j); this could simply be 1 for the nearest neighbour and 0 for all other codewords. p(x|O,C_i,l_j) is the distribution of the object centroid given codeword C_i observed at location l_j. The last term, p(O|C_i,l_j), is the confidence (or weight) of codeword C_i.
  • Learning codeword weights in the context of the Hough transform has not been addressed well in the literature. In an earlier talk today we saw a way of learning discriminative dictionaries for the Hough transform. However, in situations where the codebook is fixed, we would like to learn the importance of each codeword. That is, we are given a codebook and the posterior distribution of the object center for each codeword, and we would like to learn weights so that the Hough transform detector has the best detection rates. We show that these weights can be learned optimally using convex optimization, and that this leads to better detection rates than uniform weights and even a simple learning scheme.
  • A first attempt: assign each codeword a weight proportional to the relative frequency with which it occurs on the object. We call these the naïve Bayes weights. (Read from slides)
  • If you look at the equation of the Hough transform, you realize that the overall score is linear in the codeword weights. This assumes location invariance of the object (i.e. the object can appear anywhere in the image). Thus the score is a dot product of the weight vector and an activation vector. Given the features and their locations, the activations are independent of the weights. This suggests a learning scheme that learns weights which increase the score at positive locations over negative ones. We formalize this in the next slide.
  • We perform experiments on three datasets: ETHZ shape, UIUC cars and INRIA horses.
  • Our Hough transform detector is based on geometric blur (GB) descriptors (read from slide), and correct detections are counted using the PASCAL criterion, i.e. an overlap of greater than 0.5 between the detected and ground-truth bounding boxes.
  • To illustrate the idea, consider a toy example. We are trying to detect squares, where the negative examples are parallel lines as shown. We have four kinds of codewords: tips, vertical edges, horizontal edges and corners. Both corners and horizontal edges occur on the positive example only; however, let's assume that corners are easy to localize while a horizontal edge can appear anywhere. The naïve Bayes (NB) scheme assigns equal weights to both, whereas our framework distinguishes them correctly, as seen in the table of weights. The final scores on the positive and negative examples are shown for all the schemes, and one can see that the max-margin Hough transform (M2HT) achieves the maximum separation.
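The score described in these notes can be written out as a short derivation. This is a sketch reconstructed from the spoken description, not a copy of the slide equations: the proportionality sign (uniform priors over features and locations), the bias b, the trade-off constant C_0, the slack variables and the training-example index k are assumptions, and the non-negativity constraint on w is inferred from the "non negative" label on the training slide.

```latex
% Detection score: sum over features j and codewords i
S(O, x) \;\propto\; \sum_{i,j}
    p(x \mid O, C_i, l_j)\; p(C_i \mid f_j)\; p(O \mid C_i, l_j)

% Location invariance: p(O | C_i, l_j) = p(O | C_i) = w_i, so the score is linear in w
S(O, x) \;\propto\; \sum_i w_i\, a_i(x) \;=\; w^{\top} a(x),
\qquad
a_i(x) \;=\; \sum_j p(x \mid O, C_i, l_j)\; p(C_i \mid f_j)

% Max-margin learning on training locations x_k with labels y_k in {+1, -1}
\min_{w,\, b,\, \xi}\;\; \tfrac{1}{2}\, w^{\top} w \;+\; C_0 \sum_k \xi_k
\quad \text{s.t.} \quad
y_k \bigl( w^{\top} a(x_k) + b \bigr) \;\ge\; 1 - \xi_k,
\qquad w \ge 0, \qquad \xi_k \ge 0
```

The activations a(x) depend only on the features and their locations, so they can be computed once per training location before the weights are learned.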
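A minimal sketch of learning codeword weights so that positive locations score higher than negative ones, in the spirit of the toy example from the notes. It assumes a soft-margin hinge objective with non-negative weights, solved by projected subgradient descent. The solver, the bias term, the constants and the four-codeword activation values are illustrative assumptions, not the authors' implementation or data.

```python
def score(w, a, b=0.0):
    """Linear Hough score: dot product of weights and activations, plus a bias."""
    return sum(wi * ai for wi, ai in zip(w, a)) + b


def train_max_margin(acts, labels, c=1.0, epochs=200, lr=0.01):
    """Projected subgradient descent on 0.5*||w||^2 + c*sum(hinge), keeping w >= 0."""
    n = len(acts[0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5*||w||^2 regularizer
        grad_b = 0.0
        for a, y in zip(acts, labels):
            if y * score(w, a, b) < 1.0:  # margin violated: hinge term is active
                for k in range(n):
                    grad_w[k] -= c * y * a[k]
                grad_b -= c * y
        # Gradient step, then projection onto the non-negative orthant.
        w = [max(0.0, wk - lr * gk) for wk, gk in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical activations for codewords [tip, vertical edge, horizontal edge, corner].
# Corners localize well, so they contribute strongly at the true square center;
# horizontal edges localize poorly, so their contribution there is small.
ACTS = [
    [0.0, 0.5, 0.2, 1.0],  # positive: center of a square
    [0.0, 0.4, 0.3, 0.9],  # positive: center of a square
    [0.6, 0.5, 0.0, 0.0],  # negative: parallel lines
    [0.5, 0.6, 0.0, 0.0],  # negative: parallel lines
]
LABELS = [1, 1, -1, -1]

if __name__ == "__main__":
    weights, bias = train_max_margin(ACTS, LABELS)
    print("weights:", weights)
    for a, y in zip(ACTS, LABELS):
        print(y, score(weights, a, bias))
```

On this made-up data the corner codeword ends up with a larger weight than the horizontal edge, which is the behaviour the toy example describes. A real system would more likely pass the same objective to an off-the-shelf convex solver.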

    1. Object Detection Using a Max-Margin Hough Transform. Subhransu Maji and Jitendra Malik, University of California at Berkeley, Berkeley, CA-94720. CVPR 2009, Miami, Florida.
    2. Overview
         • Overview of probabilistic Hough transform
         • Learning framework
         • Experiments
         • Summary
    3. Our Approach: Hough Transform
         • Popular for detecting parameterized shapes
             • Hough’59, Duda & Hart’72, Ballard’81, …
         • Local parts vote for object pose
         • Complexity: # parts × # votes
             • Can be significantly lower than brute-force search over pose (for example, sliding-window detectors)
    4. Generalized to object detection
         • Learning
         • Learn appearance codebook
             • Cluster over interest points on training images
         • Use Hough space voting to find objects
             • Lowe’99, Leibe’04,’08, Opelt & Pinz’08
         • Implicit Shape Model
             • Leibe’04,’08
         • Learn spatial distributions
             • Match codebook to training images
             • Record matching positions on object
             • Centroid is given
         • Figure: spatial occurrence distributions (axes x, y, s)
    5. Detection Pipeline
         • Figure stages: Interest Points (e.g. SIFT, GB, local patches); Matched Codebook Entries (KD tree); Probabilistic Voting
         • Source: B. Leibe, A. Leonardis, and B. Schiele, "Combined object categorization and segmentation with an implicit shape model", 2004
    6. Probabilistic Hough Transform
         • C: codebook
         • f: features, l: locations
         • Detection score: sum over features f_j and codewords C_i of p(x|O,C_i,l_j) (position posterior) × p(C_i|f_j) (codeword match) × p(O|C_i,l_j) (codeword likelihood)
    7. Learning Feature Weights
         • Given:
             • Appearance codebook, C
             • Posterior distribution of object center for each codeword, P(x|…)
         • To do:
             • Learn codebook weights such that the Hough transform detector works well (i.e. better detection rates)
         • Contributions:
             • Show that these weights can be learned optimally using a max-margin framework
             • Demonstrate that this leads to improved accuracy on various datasets
    8. Learning Feature Weights: First Try
         • Naïve Bayes weights
         • Encourages relatively rare parts
         • However, rare parts may not be good predictors of the object location
         • Need to jointly consider both priors and distribution of location centers
    9. Learning Feature Weights: Second Try
         • Location invariance assumption
         • Overall score is linear given the matched codebook entries
         • Equation labels: Position Posterior, Codeword Match, Codeword likelihood; Activations, Feature weights
    10. Max-Margin Training
         • Training:
             • Construct dictionary
             • Record codeword distributions on training examples
             • Compute “a” vectors on positive and negative training examples
             • Learn codebook weights using max-margin training
         • Slide annotations: Standard ISM model (Leibe’04); Our Contribution
         • Equation labels: class label {+1, -1}; activations; non negative
    11. Experiment Datasets
         • ETHZ Shape Dataset (Ferrari et al., ECCV 2006): 255 images over 5 classes (Apple logo, Bottle, Giraffe, Mug, Swan)
         • UIUC Single Scale Cars Dataset (Agarwal & Roth, ECCV 2002): 1050 training, 170 test images
         • INRIA Horse Dataset (Jurie & Ferrari): 170 positive + 170 negative images (50 + 50 for training)
    12. Experimental Results
         • Hough transform details
             • Interest points: Geometric Blur descriptors at sparse sample of edges (Berg & Malik’01)
             • Codebook constructed using k-means
             • Voting over position and aspect ratio
             • Search over scales
         • Correct detections (PASCAL criterion)
    13. Learned Weights (ETHZ shape). Figure panels: Naïve Bayes, Max-Margin, Important Parts. Color scale: blue (low) to dark red (high). Annotation: influenced by clutter (rare structures).
    14. Learned Weights (UIUC cars). Figure panels: Naïve Bayes, Max-Margin, Important Parts. Color scale: blue (low) to dark red (high).
    15. Learned Weights (INRIA horses). Figure panels: Naïve Bayes, Max-Margin, Important Parts. Color scale: blue (low) to dark red (high).
    16. Detection Results (ETHZ dataset). Metric: Recall @ 1.0 False Positives Per Window.
    17. Detection Results (INRIA Horses). Figure label: Our Work.
    18. Detection Results (UIUC Cars). Figure labels: INRIA horses; Our Work.
    19. Hough Voting + Verification Classifier (ETHZ Shape Dataset)
         • Metric: Recall @ 0.3 False Positives Per Image
         • IKSVM was run on top 30 windows + local search
         • KAS: Ferrari, PAMI’08
         • TPS-RPM: Ferrari, CVPR’07
         • Annotations: better fitting bounding box; implicit sampling over aspect-ratio
    20. Hough Voting + Verification Classifier
         • IKSVM was run on top 30 windows + local search
         • Figure label: Our Work
    21. Hough Voting + Verification Classifier (UIUC Single Scale Car Dataset)
         • IKSVM was run on top 10 windows + local search
         • 1.7% improvement
    22. Summary
         • Hough transform based detectors offer good detection performance and speed.
         • To get better performance one may learn
             • Discriminative dictionaries (two talks ago, Gall’09)
             • Weights on codewords (our work)
         • Our approach directly optimizes detection performance using a max-margin formulation
         • Any weak predictor of object center can be used in this framework
             • E.g. regions (one talk ago, Gu CVPR’09)
    23. Acknowledgements
         • Work partially supported by ARO MURI W911NF-06-1-0076 and ONR MURI N00014-06-1-0734
         • Computer Vision Group @ UC Berkeley
         • Thank you. Questions?
    24. Backup Slide: Toy Example. Figure labels: rare but poor localization; rare and good localization.
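The PASCAL criterion used in the experiments to count correct detections can be sketched as a bounding-box overlap test. This is a minimal illustration: the function names and the (x_min, y_min, x_max, y_max) box format are assumptions, and it treats overlap as intersection area divided by union area with the 0.5 threshold mentioned in the notes.

```python
def overlap_ratio(box_a, box_b):
    """Intersection area over union area of two (x_min, y_min, x_max, y_max) boxes."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Width and height of the intersection rectangle, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def is_correct_detection(detected, ground_truth, threshold=0.5):
    """A detection counts as correct when its overlap exceeds the threshold."""
    return overlap_ratio(detected, ground_truth) > threshold


if __name__ == "__main__":
    truth = (0, 0, 10, 10)
    print(is_correct_detection((2, 0, 12, 10), truth))  # overlap 80/120
    print(is_correct_detection((5, 0, 15, 10), truth))  # overlap 50/150
```

A detector that votes over aspect ratio as well as position, as on the experimental-details slide, can produce tighter boxes and therefore pass this test more often at the same center location.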