Part 3: Classifier-based methods (Antonio Torralba)
Overview of section
- A short story of discriminative methods
- Object detection with classifiers
- Boosting
  - Gentle boosting
  - Weak detectors
  - Object model
  - Object detection
- Multiclass object detection
- Context-based object recognition
Classifier-based methods
Object detection and recognition is formulated as a classification problem: the image is partitioned into a set of overlapping windows ("Where are the screens?"), and a decision is made at each window about whether it contains the target object or not. Each window yields an image patch; in some feature space, a decision boundary separates the target class (computer screen) from the background (a bag of image patches).
Discriminative vs. generative (x = data)
- Generative model: models the class-conditional density p(x | class) ("the lousy painter")
- Discriminative model: models the posterior p(class | x)
- Classification function: directly outputs a label in {-1, +1} ("the artist")
(Figure: three plots over x showing, respectively, the density, the posterior, and the classification function.)
Face detection
- The representation and matching of pictorial structures. Fischler, Elschlager (1973)
- Face recognition using eigenfaces. M. Turk and A. Pentland (1991)
- Human Face Detection in Visual Scenes. Rowley, Baluja, Kanade (1995)
- Graded Learning for Object Detection. Fleuret, Geman (1999)
- Robust Real-time Object Detection. Viola, Jones (2001)
- Feature Reduction and Hierarchy of Classifiers for Fast Object Detection in Video Images. Heisele, Serre, Mukherjee, Poggio (2001)
- ...
Face detection
Formulation: binary classification
- Training data: image patches x_1, ..., x_N, each labeled y_i = +1 (contains the object) or y_i = -1 (background).
- Test data: patches x_{N+1}, ..., x_{N+M} with unknown labels.
- Classification function: the predicted label is F(x), where F belongs to some family of functions.
- Goal: minimize the misclassification error. (Not that simple: we need some guarantees that there will be generalization.)
Discriminative methods
- Nearest neighbor (with 10^6 examples): Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik 2005; ...
- Neural networks: LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998; ...
- Support Vector Machines and kernels: Guyon, Vapnik; Heisele, Serre, Poggio 2001; ...
- Conditional Random Fields: McCallum, Freitag, Pereira 2000; Kumar, Hebert 2003; ...
A simple object detector with boosting
Download from http://people.csail.mit.edu/torralba/iccv2005/:
- Toolbox for manipulating the dataset
- Code and dataset
- Matlab code: gentle boosting, and an object detector using a part-based model
- Dataset with cars and computer monitors
Why boosting?
- A simple algorithm for learning robust classifiers (Freund & Schapire, 1995; Friedman, Hastie, Tibshirani, 1998)
- Provides an efficient algorithm for sparse visual feature selection (Tieu & Viola, 2000; Viola & Jones, 2003)
- Easy to implement; does not require external optimization tools.
Boosting
Boosting fits the additive model
F(x) = f_1(x) + f_2(x) + ... + f_M(x)
by minimizing the exponential loss over the training samples {(x_i, y_i)}:
J = sum_i exp(-y_i F(x_i))
The exponential loss is a differentiable upper bound to the misclassification error.
Boosting
Sequential procedure: at each step we add one weak classifier f_m(x; θ_m) (input x, parameters θ_m, desired output y) to the current model so as to minimize the residual loss:
f_m = argmin_f sum_i exp(-y_i [F_{m-1}(x_i) + f(x_i)])
For more details: Friedman, Hastie, Tibshirani. "Additive Logistic Regression: a Statistical View of Boosting" (1998).
Weak classifiers
- The input is a set of weighted training samples (x, y, w).
- Regression stumps: simple, but commonly used in object detection:
f_m(x) = a if x < θ, b if x > θ
Four parameters: the threshold θ, the two output values a = E_w(y [x < θ]) and b = E_w(y [x > θ]), and the feature the stump is applied to.
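The reweight-and-fit loop can be sketched in a few lines. This is a minimal NumPy illustration of gentle boosting with regression stumps, not the course toolbox: the function names, the feature-matrix layout, and the number of rounds are my own choices.

```python
import numpy as np

def fit_stump(x, y, w):
    """Fit a regression stump f(x) = a*[x < theta] + b*[x >= theta] by
    minimizing the weighted squared error sum_i w_i*(y_i - f(x_i))**2.
    Returns (theta, a, b, err)."""
    order = np.argsort(x)
    xs, ys, ws = x[order], y[order], w[order]
    cw = np.cumsum(ws)            # cumulative weight left of each split
    cwy = np.cumsum(ws * ys)      # cumulative weighted label sum
    tot_w, tot_wy = cw[-1], cwy[-1]
    best = (xs[0] - 1.0, 0.0, tot_wy / tot_w, np.inf)
    for i in range(len(xs) - 1):
        if xs[i] == xs[i + 1]:
            continue              # identical values cannot be split here
        a = cwy[i] / cw[i]                           # E_w(y | x < theta)
        b = (tot_wy - cwy[i]) / (tot_w - cw[i])      # E_w(y | x >= theta)
        # weighted SSE = sum(w*y^2) - explained part (y is in {-1, +1})
        err = tot_w - (cwy[i] ** 2 / cw[i]
                       + (tot_wy - cwy[i]) ** 2 / (tot_w - cw[i]))
        if err < best[3]:
            best = ((xs[i] + xs[i + 1]) / 2.0, a, b, err)
    return best

def gentle_boost(X, y, rounds=20):
    """GentleBoost sketch: each round, pick the feature/stump with the
    lowest weighted error, add it to the additive model F, and reweight
    the samples by exp(-y*f(x)) so hard examples get more attention."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    F = np.zeros(n)
    model = []
    for _ in range(rounds):
        cands = [fit_stump(X[:, j], y, w) + (j,) for j in range(X.shape[1])]
        theta, a, b, _, j = min(cands, key=lambda c: c[3])
        fx = np.where(X[:, j] < theta, a, b)
        F += fx
        w *= np.exp(-y * fx)
        w /= w.sum()
        model.append((j, theta, a, b))
    return model, np.sign(F)
```

On a toy 1-D problem, a single stump already separates the classes; further rounds simply reinforce it.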
Flavors of boosting
- AdaBoost (Freund and Schapire, 1995)
- Real AdaBoost (Friedman et al., 1998)
- LogitBoost (Friedman et al., 1998)
- Gentle AdaBoost (Friedman et al., 1998)
- BrownBoost (Freund, 2000)
- FloatBoost (Li et al., 2002)
- ...
From images to features: a myriad of weak detectors
We will now define a family of visual features that can be used as weak classifiers ("weak detectors"): each one takes an image as input and produces a binary response.
A myriad of weak detectors
- Yuille, Snow, Nitzberg, 1998
- Amit, Geman, 1998
- Papageorgiou, Poggio, 2000
- Heisele, Serre, Poggio, 2001
- Agarwal, Awan, Roth, 2004
- Schneiderman, Kanade, 2004
- Carmichael, Hebert, 2004
- ...
Weak detectors: textures of textures (Tieu and Viola, CVPR 2000)
Every combination of three filters generates a different feature, which gives thousands of features. Boosting selects a sparse subset, so computation at test time is very efficient. Boosting also avoids overfitting to some extent.
Haar wavelets
Haar filters and the integral image (Viola and Jones, ICCV 2001). The average intensity in a block is computed with four sums, independently of the block size.
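The four-sum trick can be made concrete. A minimal sketch (function names are mine): padding the integral image with one extra row and column of zeros lets every block sum come from exactly four table lookups, and a two-rectangle Haar-like feature is then just a difference of two such sums.

```python
import numpy as np

def integral_image(img):
    """ii has one extra row/col of zeros, so ii[r, c] = sum(img[:r, :c])."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def block_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] using exactly four lookups, any block size."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

def haar_horizontal(ii, r0, c0, h, w):
    """A two-rectangle Haar-like feature: left half minus right half."""
    half = w // 2
    return (block_sum(ii, r0, c0, r0 + h, c0 + half)
            - block_sum(ii, r0, c0 + half, r0 + h, c0 + w))
```

On a constant image the two halves cancel, so the feature responds to intensity edges, not to overall brightness.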
Haar wavelets
Papageorgiou & Poggio (2000): polynomial SVM.
Edges and chamfer distance Gavrila, Philomin, ICCV 1999
Edge fragments (Opelt, Pinz, Zisserman, ECCV 2006)
Weak detector = k edge fragments and a threshold. The chamfer distance uses 8 orientation planes.
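To make the chamfer-matching idea concrete, here is a deliberately brute-force sketch (all names are mine). A real system would use a fast two-pass distance transform and, as in Opelt et al., one transform per orientation plane; this version only illustrates the scoring.

```python
import numpy as np

def distance_transform(edges):
    """Distance from every pixel to the nearest edge pixel (brute force;
    production code uses a fast two-pass chamfer transform, computed
    separately for each edge-orientation plane)."""
    ys, xs = np.nonzero(edges)
    pts = np.stack([ys, xs], axis=1).astype(float)
    rr, cc = np.indices(edges.shape)
    grid = np.stack([rr.ravel(), cc.ravel()], axis=1).astype(float)
    d2 = ((grid[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2)
    return np.sqrt(d2.min(axis=1)).reshape(edges.shape)

def chamfer_score(template_pts, dt, offset):
    """Mean distance-transform value under the template's edge points,
    placed at `offset`; a low score means a good match."""
    r, c = offset
    return dt[template_pts[:, 0] + r, template_pts[:, 1] + c].mean()
```

Thresholding this score over all offsets turns one edge-fragment template into a binary weak detector.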
Histograms of oriented gradients
- HOG: Dalal & Triggs, CVPR 2005
- Shape context: Belongie, Malik, Puzicha, NIPS 2000
- SIFT: D. Lowe, ICCV 1999
Weak detectors
Part-based: similar to part-based generative models. We create weak detectors by using parts and voting for the object center location (car model, screen model). These features are used for the detector on the course web site.
Weak detectors
First we collect a set of part templates from a set of training objects (Vidal-Naquet, Ullman, Nature Neuroscience 2003).
Weak detectors
We now define a family of "weak detectors": the image is correlated (*) with a part template and the response is thresholded. Each such detector performs only slightly better than chance.
Weak detectors
We can do a better job using filtered images: the correlation with the part template is computed on the output of a filter applied to the image. The result is still a weak detector, but better than before.
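A minimal sketch of such a correlation-based weak detector (pure NumPy; the 'valid'-mode correlation and the optional pre-filtering step are my simplifications of what the slides draw):

```python
import numpy as np

def correlate_valid(img, templ):
    """Plain 2-D cross-correlation, 'valid' mode (written out for clarity;
    real code would use an FFT or a vectorized library routine)."""
    th, tw = templ.shape
    out = np.empty((img.shape[0] - th + 1, img.shape[1] - tw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = (img[r:r + th, c:c + tw] * templ).sum()
    return out

def weak_detector(img, templ, threshold, filt=None):
    """Binary weak detector: correlate the (optionally pre-filtered) image
    with a part template and threshold the response map. Filtering first,
    e.g. with an edge filter, gives a stronger weak detector."""
    if filt is not None:
        img = correlate_valid(img, filt)
    return correlate_valid(img, templ) > threshold
```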
Training
First we evaluate all the N features on all the training images. Then we sample the feature outputs at the object center and at random locations in the background.
Representation and object model
Selected features for the screen detector (features ranked 1, 2, 3, 4, 10, ..., 100; cf. the "lousy painter" analogy).
Representation and object model
Selected features for the car detector (features ranked 1, 2, 3, 4, 10, ..., 100).
Detection
- Invariance: search strategy
- Part-based
Here, invariance in translation and scale is achieved by the search strategy: the classifier is evaluated at all locations (by translating the image) and at all scales (by scaling the image in small steps). The search cost can be reduced using a cascade.
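The search strategy above can be sketched directly: slide a fixed-size window over a pyramid of progressively shrunken images (shrinking the image is equivalent to enlarging the window). This is an illustrative implementation, not the course code; the nearest-neighbour rescaling and the step sizes are arbitrary choices.

```python
import numpy as np

def rescale(img, factor):
    """Nearest-neighbour rescaling (keeps the sketch dependency-free)."""
    rows = (np.arange(int(img.shape[0] * factor)) / factor).astype(int)
    cols = (np.arange(int(img.shape[1] * factor)) / factor).astype(int)
    return img[np.ix_(rows, cols)]

def detect_multiscale(img, score, win=8, stride=2, scale_step=1.5):
    """Brute-force translation/scale search: evaluate `score` on every
    window of a fixed size at every pyramid level. Returns (row, col,
    scale) in original-image coordinates for every window with score > 0."""
    detections, scale, cur = [], 1.0, img
    while min(cur.shape) >= win:
        for r in range(0, cur.shape[0] - win + 1, stride):
            for c in range(0, cur.shape[1] - win + 1, stride):
                if score(cur[r:r + win, c:c + win]) > 0:
                    # map the hit back to original-image coordinates
                    detections.append((int(r * scale), int(c * scale), scale))
        scale *= scale_step
        cur = rescale(img, 1.0 / scale)
    return detections
```

With a toy scorer (mean brightness above 0.5), a bright 8x8 patch is found at the correct location and scale.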
Example: screen detection
Each figure shows the feature output, the thresholded output, and the strong classifier:
- Iteration 1: a single weak 'detector' produces many false alarms.
- Iteration 2: the second weak 'detector' produces a different set of false alarms; the strong classifier combines both.
- Iteration 10: the strong classifier keeps improving as features are added.
- Iteration 200: the final classification.
Cascade of classifiers (Fleuret and Geman 2001; Viola and Jones 2001)
We want the complexity of the 3-feature classifier with the performance of the 100-feature classifier (stages: 3 features, 30 features, 100 features). Select a threshold with high recall for each stage; precision increases as windows pass through the cascade.
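A cascade is just short-circuited evaluation of increasingly expensive stages. A toy sketch (the scoring functions and thresholds below are placeholders, not trained stages):

```python
def make_stage(score_fn, threshold):
    """One cascade stage: a small boosted classifier plus a threshold set
    low enough that recall stays near 100% (only easy negatives rejected)."""
    return lambda x: score_fn(x) > threshold

def cascade(stages, window):
    """A window must pass every stage. Most background windows are killed
    by the cheap early stages, so the average cost per window stays near
    the cost of the first (e.g. 3-feature) stage, while final precision
    is that of the last (e.g. 100-feature) stage."""
    return all(stage(window) for stage in stages)
```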
Some goals for object recognition
- Able to detect and recognize many object classes
- Computationally efficient
- Able to deal with data-starved situations:
  - Some training samples might be harder to collect than others
  - We want on-line learning to be fast
Multiclass object detection
Multiclass object detection
(Plots: amount of labeled data vs. number of classes, and complexity vs. number of classes.)
Shared features
- Is learning the 1000th object class easier than learning the first?
- Can we transfer knowledge from one object to another?
- Are the shared properties interesting by themselves?
Multitask learning (R. Caruana. Multitask Learning. ML 1997)
Primary task: detect doorknobs. Auxiliary tasks used:
- horizontal location of doorknob
- single or double door
- horizontal location of doorway center
- width of doorway
- horizontal location of left door jamb
- horizontal location of right door jamb
- width of left door jamb
- width of right door jamb
- horizontal location of left edge of door
- horizontal location of right edge of door
Sharing invariances
S. Thrun. Is Learning the n-th Thing Any Easier Than Learning The First? NIPS 1996.
"Knowledge is transferred between tasks via a learned model of the invariances of the domain: object recognition is invariant to rotation, translation, scaling, lighting, ... These invariances are common to all object recognition tasks."
(Toy-world experiments compare learning with and without sharing.)
Sharing transformations
Miller, E., Matsakis, N., and Viola, P. (2000). Learning from one example through shared densities on transforms. In IEEE Computer Vision and Pattern Recognition.
Transformations are shared and can be learned from other tasks.
Models of object recognition
- I. Biederman, "Recognition-by-components: A theory of human image understanding," Psychological Review, 1987.
- M. Riesenhuber and T. Poggio, "Hierarchical models of object recognition in cortex," Nature Neuroscience, 1999.
- T. Serre, L. Wolf and T. Poggio, "Object recognition with features inspired by visual cortex," CVPR 2005.
Sharing in constellation models
- Pictorial structures: Fischler & Elschlager, IEEE Trans. Comp. 1973
- Constellation model: Burl, Leung, Perona, 1996; Weber, Welling, Perona, 2000; Fergus, Perona & Zisserman, CVPR 2003
- SVM detectors: Heisele, Poggio et al., NIPS 2001
- Model-guided segmentation: Mori, Ren, Efros & Malik, CVPR 2004
Learning with variational EM (Attias, Hinton, Beal, etc.)
Random initialization, then alternate: E-step, combining prior knowledge of p(θ) into a new estimate of p(θ | train); M-step, producing new θ's. Slide from Fei-Fei Li. Fei-Fei, Fergus & Perona, ICCV 2003.
Reusable parts
Goal: look for a vocabulary of edges that reduces the number of features. Krempp, Geman & Amit, "Sequential Learning of Reusable Parts for Object Detection", TR 2002. (Plot: number of features vs. number of classes; examples of reused parts.)
Sharing patches (Bart and Ullman, 2004)
For a new class, use only features similar to features that were good for other classes ("proposed dog features").
Multiclass boosting
- AdaBoost.MH (Schapire & Singer, 2000)
- Error-correcting output codes (Dietterich & Bakiri, 1995; ...)
- Lk-TreeBoost (Friedman, 2001)
- ...
Shared features (Torralba, Murphy, Freeman. CVPR 2004; PAMI 2007)
- Independent binary classifiers: separate screen, car, and face detectors, each with its own features.
- Binary classifiers that share features: the screen, car, and face detectors reuse a common pool of features.
Shared vs. class-specific features
Setup: 29 object classes, 50 training samples per class, 2000 entries in the dictionary; results averaged over 20 runs; error bars = 80% interval. Krempp, Geman & Amit, 2002; Torralba, Murphy, Freeman, CVPR 2004.
Generalization as a function of object similarities (Torralba, Murphy, Freeman. CVPR 2004; PAMI 2007)
(Plots: area under ROC vs. number of training samples per class, for 12 viewpoints and for 12 unrelated object classes; K = 2.1 and K = 4.8.)
Opelt, Pinz, Zisserman, CVPR 2006  Efficiency Generalization
Some references on multiclass
- Caruana 1997
- Schapire, Singer 2000
- Thrun, Pratt 1997
- Krempp, Geman, Amit 2002
- E. L. Miller, Matsakis, Viola 2000
- Mahamud, Hebert, Lafferty 2001
- Fink 2004
- LeCun, Huang, Bottou 2004
- Holub, Welling, Perona 2005
- ...
Context-based methods (Antonio Torralba)
Why is this hard?
What are the hidden objects? (Two regions, 1 and 2, are masked.)
What are the hidden objects? Chance ~ 1/30000
Context-based object recognition
- Cognitive psychology: Palmer 1975; Biederman 1981; ...
- Computer vision: Noton and Stark (1971); Hanson and Riseman (1978); Barrow & Tenenbaum (1978); Ohta, Kanade, Sakai (1978); Haralick (1983); Strat and Fischler (1991); Bobick and Pinhanez (1995); Campbell et al. (1997)
Global and local representations
An urban street scene contains local objects (building, car, sidewalk) within a global scene category.
Global and local representations
- Global: an image index built from summary statistics of the configuration of textures, mapped to a scene label (urban street scene).
- Local: feature histograms computed for individual objects (building, car, sidewalk).
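As a rough illustration of such a global image index (a simplified stand-in for gist-style descriptors, not the actual implementation): average local texture energy over a coarse spatial grid, so the vector captures where texture is, not which object produced it.

```python
import numpy as np

def global_descriptor(img, grid=4):
    """Crude global 'image index': gradient-magnitude (texture energy)
    averaged over a grid x grid layout of spatial cells, then L2-normalized.
    The result summarizes the configuration of textures in the image."""
    gy, gx = np.gradient(img.astype(float))
    energy = np.hypot(gx, gy)
    H, W = energy.shape
    v = np.array([energy[i * H // grid:(i + 1) * H // grid,
                         j * W // grid:(j + 1) * W // grid].mean()
                  for i in range(grid) for j in range(grid)])
    return v / (np.linalg.norm(v) + 1e-9)
```

Two images with texture in different quadrants produce clearly different descriptors, which is exactly the property a scene-level index needs.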
Global scene representations
Spatial structure is important in order to provide context for object localization. Representations range from bags of words to non-localized textons to spatially organized textures:
- Sivic, Russell, Freeman, Zisserman, ICCV 2005; Fei-Fei and Perona, CVPR 2005; Bosch, Zisserman, Munoz, ECCV 2006
- S. Lazebnik et al., CVPR 2006; Walker, Malik, Vision Research 2004
- M. Gorkani, R. Picard, ICPR 1994; A. Oliva, A. Torralba, IJCV 2001
Contextual object relationships
Carbonetto, de Freitas & Barnard (2004); Kumar, Hebert (2005); Torralba, Murphy, Freeman (2004); Fink & Perona (2003); E. Sudderth et al. (2005)
Context (Murphy, Torralba & Freeman, NIPS 03)
Use global context to predict the presence and location of objects (e.g., keyboards). (Figure: a graphical model linking per-class object presence and location variables to a global scene feature.)
3d Scene Context Image World [Hoiem, Efros, Hebert ICCV 2005]
3-D scene context (Hoiem, Efros, Hebert, ICCV 2005)
Each image region is classified into rough surface geometry: support, vertical (left, center, right, porous, solid), or sky. These surface and support estimates feed into object reasoning.
Object-object relationships
Enforce spatial consistency between labels using an MRF. Carbonetto, de Freitas & Barnard (2004)
Object-object relationships
Use latent variables to induce long-distance correlations between labels in a Conditional Random Field (CRF). He, Zemel & Carreira-Perpinan (2004)
Object-object relationships
Fink & Perona (NIPS 03): use the output of boosting for other objects at previous iterations as input into boosting for the current iteration.
Object-object relationships with CRFs
[Torralba, Murphy, Freeman 2004] [Kumar, Hebert 2005]
Hierarchical sharing and context (E. Sudderth, A. Torralba, W. T. Freeman, and A. Willsky)
- Scenes share objects
- Objects share parts
- Parts share features
Some references on context
- Strat & Fischler (PAMI 91)
- Torralba & Sinha (ICCV 01)
- Torralba (IJCV 03)
- Fink & Perona (NIPS 03)
- Murphy, Torralba & Freeman (NIPS 03)
- Kumar & Hebert (NIPS 04)
- Carbonetto, de Freitas & Barnard (ECCV 04)
- He, Zemel & Carreira-Perpinan (CVPR 04)
- Sudderth, Torralba, Freeman, Willsky (ICCV 05)
- Hoiem, Efros, Hebert (ICCV 05)
- ...
With a mixture of generative and discriminative approaches.
A car out of context …
Integrated models for scene and object recognition (image: Banksy)
Summary
Many techniques are used for training discriminative models that I have not mentioned here:
- Conditional random fields
- Kernels for object recognition
- Learning object similarities
- ...
Speaker notes:
- There are multiple flavors of multiclass boosting; here I use the original AdaBoost.MH.
- Stress: the main difficulty is not discriminating among classes; the hard part is detecting the objects when they are immersed in clutter.
- These results have been reproduced by Opelt with a completely different set of features, showing the same sublinear scaling of complexity and the same improved generalization, although the effect is weak because they use few categories.
- Despite being semantically wrong, some basic contextual principles are still preserved: the elephant is not sticking out of the wall.