ICCV 2009: Max-Margin Additive Classifiers for Detection

Speaker Notes

  • Thank you. Good afternoon, everybody. I am going to present ways to train additive classifiers efficiently. This work is part of an ongoing collaboration with Alex Berg.
  • For any classification task the two main things we care about are accuracy and evaluation time. Especially for object detection, where one evaluates a classifier on thousands of windows per image, the evaluation time becomes very important. In the past, linear SVMs, though relatively less accurate, were preferred over kernel SVMs for real-time applications.
  • In our CVPR 08 paper…
  • We identified a subset of non-linear kernels, called additive kernels, that are used in many of the current object recognition tasks. These kernels have the special form that they decompose as a sum of kernels over individual dimensions.
  • And we showed that they can be evaluated efficiently. This makes it possible to use more accurate classifiers with relatively no loss in speed. In fact, more than half of this year’s submissions to the PASCAL VOC object detection challenge use variants of additive kernels.
  • In this talk we are going to discuss additive models in general, where the classifier decomposes over dimensions. This may seem restrictive, but it is a useful class of classifiers that is strictly more general than linear classifiers. In fact, if the underlying kernel for the SVM is additive, then the classifier is also additive.
  • The picture looks similar to the one for evaluation time. It is important to note that this was not the case even somewhat recently…
  • As mentioned before, our previous work identified a subset of non-linear classifiers with an additive structure and showed they could be evaluated efficiently, but unfortunately did not address improving efficiency for training…
  • This paper addresses efficient training for additive classifiers, developing training methods that are about as efficient as the best methods for training linear classifiers. We also demonstrate the accuracy advantages on some popular datasets.
  • The idea of support vector machines is to find a separating hyperplane after mapping the data into a high-dimensional space using a kernel. The final classifier is of course a line in a very high-dimensional space, but it can be expressed using only the kernel function via the so-called kernel trick. If the embedded space is low dimensional, then one can take advantage of the very fast linear SVM training algorithms, which scale linearly with training data, as opposed to the quadratic growth for the kernel SVM.
  • Unfortunately, these embeddings are often high dimensional. Our approach can be seen as finding embeddings that are both sparse and accurate, so that we can use the very best of the linear SVM training algorithms for training the classifier. In fact, we would ideally like the number of non-zero entries in the embedded features to be a small multiple of the non-zero entries in the input features.
  • A key idea of the paper is to realize that additive kernels are easy to embed, since the final embedding is just a concatenation of the individual per-dimension embeddings. As an example, take the min kernel, or histogram intersection kernel, defined as the sum over dimensions of the coordinate-wise minima. A well-known embedding of the min kernel for integers is the unary encoding, where each number is represented in unary. For non-integers one may approximate this by quantization.
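    For reference (standard notation, not verbatim from the slides), the min kernel and the unary-encoding identity described above are:

    $$ K_{\min}(x, z) = \sum_{i=1}^{n} \min(x_i, z_i), \qquad \min(a, b) = \phi(a)^\top \phi(b) \;\; \text{for} \;\; \phi(a) = (\underbrace{1, \dots, 1}_{a \text{ ones}}, 0, \dots, 0), \; a, b \in \mathbb{Z}_{\geq 0}. $$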

Transcript

  • 1. Max-Margin Additive Classifiers for Detection
    Subhransu Maji & Alexander Berg
    University of California at Berkeley
    Columbia University
    ICCV 2009, Kyoto, Japan
  • 2. Accuracy vs. Evaluation Time for SVM Classifiers
    [Plot: evaluation time vs. accuracy; Linear Kernel is fast but less accurate, Non-linear Kernel is more accurate but slow]
  • 3. Accuracy vs. Evaluation Time for SVM Classifiers
    [Plot: same axes, adding Our CVPR 08]
  • 4. Accuracy vs. Evaluation Time for SVM Classifiers
    [Plot: same axes, adding Additive Kernel]
  • 5. Accuracy vs. Evaluation Time for SVM Classifiers
    [Plot: Linear Kernel, Additive Kernel, Non-linear Kernel, Our CVPR 08]
  • 6. Accuracy vs. Evaluation Time for SVM Classifiers
    [Plot: with Our CVPR 08, Additive Kernel evaluation time drops to near Linear Kernel]
    Made it possible to use SVMs with additive kernels for detection.
  • 7. Additive Classifiers
    Much work already uses them!
    SVMs with additive kernels are additive classifiers
    Histogram based kernels
    Histogram intersection, chi-squared kernel
    Pyramid Match Kernel (Grauman & Darrell, ICCV’05)
    Spatial Pyramid Match Kernel (Lazebnik et al., CVPR’06)
    ….
  • 8. Accuracy vs. Training Time for SVM Classifiers
    [Plot: training time vs. accuracy for Linear Kernel and Non-linear SVMs]
  • 9. Accuracy vs. Training Time for SVM Classifiers
    [Plot: same axes, labeled <=1990s]
  • 10. Accuracy vs. Training Time for SVM Classifiers
    [Plot: same axes, labeled Today]
    E.g. Cutting Plane, Stochastic Gradient Descent, Dual Coordinate Descent
  • 11. Accuracy vs. Training Time for SVM Classifiers
    [Plot: adds Additive and Our CVPR 08]
  • 12. Accuracy vs. Training Time for SVM Classifiers
    [Plot: same as previous slide]
  • 13. Accuracy vs. Training Time for SVM Classifiers
    [Plot: adds This Paper]
  • 14. Accuracy vs. Training Time for SVM Classifiers
    [Plot: with This Paper, Additive training time drops to near Linear]
    Makes it possible to train additive classifiers very fast.
  • 15. Summary
    Additive classifiers are widely used and can provide better accuracy than linear classifiers
    Our CVPR 08: SVMs with additive kernels are additive classifiers and can be evaluated in O(#dim) -- same as linear.
    This work: additive classifiers can be trained directly as efficiently (up to a small constant) as the best approaches for training linear classifiers.
    An example
  • 16. Support Vector Machines
    [Figure: Kernel Function maps Input Space to Embedded Space]
    Inner product in the embedded space
  • 17. Can learn non-linear boundaries in input space
    [Figure: Classification Function via the Kernel Trick]
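    In standard notation, the kernel is an inner product in the embedded space, and the classification function can be written purely in terms of kernel evaluations (the kernel trick):

    $$ K(x, z) = \langle \phi(x), \phi(z) \rangle, \qquad f(x) = \sum_{k} \alpha_k \, y_k \, K(x_k, x) + b $$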
  • 18. Embeddings…
    These embeddings can be high dimensional (even infinite)
    Our approach is based on embeddings that approximate kernels.
    We’d like this to be as accurate as possible.
    We are going to use fast linear classifier training algorithms on the embedded features, so sparseness is important.
  • 19. Key Idea: Embedding an Additive Kernel
    Additive Kernels are easy to embed, just embed each dimension independently
    Linear Embedding for min Kernel for integers
    For non-integers, one can approximate by quantizing
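    A rough sketch of these two encodings in Python (function names and the assumption that real-valued features lie in [0, 1] are ours, not from the paper):

        import numpy as np

        def unary_encode(x, n_bins):
            """Unary code of a non-negative integer x: first x entries 1, rest 0.
            For integers a, b: unary_encode(a) @ unary_encode(b) == min(a, b)."""
            phi = np.zeros(n_bins)
            phi[:x] = 1.0
            return phi

        def quantized_unary_encode(x, n_bins):
            """Approximate code for real x in [0, 1]: quantize, then unary.
            Dot products of these codes approximate n_bins * min(x, y)."""
            q = min(int(np.floor(x * n_bins)), n_bins)
            return unary_encode(q, n_bins)

        # min(3, 5) recovered as a dot product of unary codes
        print(unary_encode(3, 8) @ unary_encode(5, 8))  # prints 3.0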
  • 20. Issues: Embedding Error
    Quantization leads to large errors
    Better encoding
    x
    y
  • 21. Issues: Sparsity
    Represent with sparse values
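    One way to realize such a sparse representation (a sketch under our own assumptions; the paper gives the exact scheme) is linear interpolation between adjacent bins, so each value activates at most two entries and the learned weights define a piecewise-linear function per dimension:

        import numpy as np

        def sparse_encode(x, n_bins):
            """Encode x in [0, 1] with at most two non-zeros, so that
            w @ sparse_encode(x, n_bins) linearly interpolates a piecewise-linear
            function whose values at the bin centers are the weights w."""
            pos = x * (n_bins - 1)      # fractional position among the bins
            lo = int(np.floor(pos))
            hi = min(lo + 1, n_bins - 1)
            frac = pos - lo
            phi = np.zeros(n_bins)
            phi[lo] = 1.0 - frac        # weight on the lower bin
            phi[hi] += frac             # weight on the upper bin
            return phi

        phi = sparse_encode(0.37, 10)
        print(np.nonzero(phi)[0], phi.sum())  # two active bins; weights sum to 1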
  • 22. Linear vs. Encoded SVMs
    Linear SVM objective (solve with LIBLINEAR):
    Encoded SVM objective (not practical):
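    In standard form (our reconstruction with hinge loss over examples (x_k, y_k), not verbatim from the slide), the linear SVM objective is

    $$ \min_{w} \; \frac{\lambda}{2} \|w\|^2 + \frac{1}{n} \sum_{k=1}^{n} \max\bigl(0,\, 1 - y_k \, w^\top x_k\bigr) $$

    and the encoded objective replaces x_k with the embedding phi(x_k), which is impractical with the dense unary encoding:

    $$ \min_{w} \; \frac{\lambda}{2} \|w\|^2 + \frac{1}{n} \sum_{k=1}^{n} \max\bigl(0,\, 1 - y_k \, w^\top \phi(x_k)\bigr) $$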
  • 23. Linear vs. Encoded SVMs
    Linear SVM objective (solve with LIBLINEAR):
    Encoded SVM modified (custom solver):
    Encourages smooth functions
    Closely approximates min kernel SVM
    Custom solver: PWLSGD (see paper)
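    A plausible form of the modified objective (our reconstruction, not verbatim from the paper): the regularizer penalizes differences between adjacent bin weights within each dimension, which encourages each learned per-dimension function to be smooth:

    $$ \min_{w} \; \frac{\lambda}{2} \sum_{i} \sum_{j} \bigl(w_{i,j+1} - w_{i,j}\bigr)^2 + \frac{1}{n} \sum_{k=1}^{n} \max\bigl(0,\, 1 - y_k \, w^\top \phi(x_k)\bigr) $$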
  • 24. Linear vs. Encoded SVMs
    Linear SVM objective (solve with LIBLINEAR):
    Encoded SVM objective (solve with LIBLINEAR):
  • 25. Additive Classifier Choices
    [Grid: classifier choices along two axes, Regularization and Encoding]
  • 26. Additive Classifier Choices
    [Grid: same axes; accuracy increases along one axis; evaluation times are similar]
  • 27. Additive Classifier Choices
    [Grid: same axes; accuracy increases along both axes; evaluation times are similar]
  • 28. Additive Classifier Choices
    [Grid: same axes; accuracy increases along both axes]
    Standard solver, e.g. LIBSVM
    Few lines of code + standard solver, e.g. LIBLINEAR
  • 29. Additive Classifier Choices
    [Grid: same axes; accuracy increases along both axes]
    Custom solver
  • 30. Additive Classifier Choices
    [Grid: same axes; accuracy increases along both axes]
    Classifier Notations
  • 31. Experiments
    “Small” Scale: Caltech 101 (Fei-Fei et al.)
    “Medium” Scale: DC Pedestrians (Munder & Gavrila)
    “Large” Scale : INRIA Pedestrians (Dalal & Triggs)
  • 32. Experiment : DC Pedestrians
    [Plot: training time vs. accuracy; points (3.18s, 89.25%), (1.86s, 88.80%), (363s, 89.05%), (2.98s, 85.71%), (1.89s, 72.98%)]
    100x faster
    training time ~ linear SVM
    accuracy ~ kernel SVM
    20,000 features, 656 dimensional
    100 bins for encoding
    6-fold cross validation
  • 33. Experiment : Caltech 101
    [Plot: training time vs. accuracy; points (291s, 55.35%), (2687s, 56.49%), (102s, 54.8%), (90s, 51.64%), (41s, 46.15%)]
    10x faster
    Small loss in accuracy
    30 training examples per category
    100 bins for encoding
    Pyramid HOG + Spatial Pyramid Match Kernel
  • 34. Experiment : INRIA Pedestrians
    [Plot: training time vs. detection performance; points (140 mins, 0.95), (76s, 0.94), (27s, 0.88), (122s, 0.85), (20s, 0.82)]
    300x faster
    training time ~ linear SVM
    accuracy ~ kernel SVM; trains the detector in < 2 mins
    SPHOG: 39,000 features, 2268 dimensional
    100 bins for encoding
    Cross Validation Plots
  • 35. Experiment : INRIA Pedestrians
    300x faster
    training time ~ linear SVM
    accuracy ~ kernel SVM; trains the detector in < 2 mins
    SPHOG: 39,000 features, 2268 dimensional
    100 bins for encoding
    Cross Validation Plots
  • 36. Take Home Messages
    Additive models are practical for large scale data
    Can be trained discriminatively:
    Poor man’s version : encode + Linear SVM Solver
    Middle man’s version : encode + Custom Solver
    Rich man’s version : Min Kernel SVM
    Embedding only approximates kernels; this leads to a small loss in accuracy but up to a 100x speedup in training time
    Everyone should use it: see the code on our websites
    Fast IKSVM from CVPR’08, Encoded SVMs, etc.
  • 37. Thank You