Part 3: Classifier-based methods. Antonio Torralba
Overview of section: a short story of discriminative methods; object detection with classifiers; boosting; gentle boosting; weak detectors; object model; object detection; multiclass object detection; context-based object recognition.
Classifier-based methods. Object detection and recognition is formulated as a classification problem: where are the screens? The image is partitioned into a set of overlapping windows, and a decision is taken at each window about whether it contains a target object or not. Each window is described as a bag of image patches in some feature space, where a decision boundary separates the computer-screen class from the background.
Discriminative vs. generative, for x = data. [Figure: the same 1-D dataset under three models.] Generative model (the lousy painter): models p(x|y) for each class. Discriminative model: models p(y = 1|x). Classification function (the artist): models the decision boundary directly, f(x) in {-1, +1}.
The Representation and Matching of Pictorial Structures - Fischler, Elschlager (1973). Face Recognition Using Eigenfaces - Turk, Pentland (1991). Human Face Detection in Visual Scenes - Rowley, Baluja, Kanade (1995). Graded Learning for Object Detection - Fleuret, Geman (1999). Robust Real-time Object Detection - Viola, Jones (2001). Feature Reduction and Hierarchy of Classifiers for Fast Object Detection in Video Images - Heisele, Serre, Mukherjee, Poggio (2001). …
Face detection
Formulation: binary classification. Training data: image patches x_1, x_2, …, x_N, each labeled y_i in {+1, -1} as containing the object or background. Test data: patches x_{N+1}, …, x_{N+M} with unknown labels. Learn a classification function F(x), where F belongs to some family of functions, that minimizes the misclassification error. (Not that simple: we need some guarantees that there will be generalization.)
Discriminative methods. Nearest neighbor (with 10^6 examples): Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik 2005; … Neural networks: LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998; … Support Vector Machines and kernels: Guyon, Vapnik; Heisele, Serre, Poggio 2001; … Conditional Random Fields: McCallum, Freitag, Pereira 2000; Kumar, Hebert 2003; …
A simple object detector with boosting. Code and dataset available at http://people.csail.mit.edu/torralba/iccv2005/: Matlab code for gentle boosting and an object detector using a part-based model, a toolbox for manipulating the dataset, and a dataset with cars and computer monitors.
Why boosting? It is a simple algorithm for learning robust classifiers (Freund & Schapire, 1995; Friedman, Hastie, Tibshirani, 1998). It provides an efficient algorithm for sparse visual feature selection (Tieu & Viola, 2000; Viola & Jones, 2003). And it is easy to implement and does not require external optimization tools.
Boosting. Boosting fits an additive model by minimizing the exponential loss over the training samples. The exponential loss is a differentiable upper bound on the misclassification error.
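Written out, in the standard notation of Friedman et al. (1998) (reconstructed here rather than copied from the slide):

```latex
F(x) = \sum_{m=1}^{M} f_m(x), \qquad
J = \sum_{i=1}^{N} e^{-y_i F(x_i)}, \qquad
e^{-y_i F(x_i)} \;\ge\; \mathbf{1}\!\left[\, y_i \neq \operatorname{sign} F(x_i) \,\right].
```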
Boosting is a sequential procedure: at each step we add one weak classifier f_m(x; θ) (with input x, desired output y, and parameters θ) chosen to minimize the residual loss left by the classifiers added so far. For more details: Friedman, Hastie, Tibshirani. “Additive Logistic Regression: a Statistical View of Boosting” (1998).
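As an illustration, a minimal sketch of the gentle-boosting loop in Python (not the tutorial's Matlab toolbox; fit_weak is an assumed helper that fits one weak classifier to weighted samples and returns a scoring function):

```python
import numpy as np

def gentle_boost(X, y, n_rounds, fit_weak):
    """Minimal gentle-boosting loop (Friedman et al., 1998).
    X: (n_samples, n_features); y: labels in {-1, +1};
    fit_weak(X, y, w): assumed helper that fits one weak classifier by
    weighted least squares and returns a function f(X) -> per-sample scores."""
    n = len(y)
    w = np.ones(n) / n                    # uniform initial sample weights
    F = np.zeros(n)                       # additive model on the training set
    weak_classifiers = []
    for _ in range(n_rounds):
        f_m = fit_weak(X, y, w)           # fit the next weak classifier
        F += f_m(X)                       # add it to the additive model
        w = np.exp(-y * F)                # reweight with the exponential loss
        w /= w.sum()
        weak_classifiers.append(f_m)
    return lambda X_new: np.sign(sum(f(X_new) for f in weak_classifiers))
```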
Weak classifiers. The input is a set of weighted training samples (x, y, w). Regression stumps are simple but commonly used in object detection: f_m(x) = a·[x < θ] + b·[x ≥ θ], with four parameters: the feature used, the threshold θ, a = E_w(y [x < θ]) and b = E_w(y [x ≥ θ]).
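A sketch of fitting such a stump on a single feature by weighted least squares (illustrative only; a real implementation would also loop over feature columns and use a faster threshold sweep):

```python
import numpy as np

def fit_stump(x, y, w):
    """Weighted regression stump on one feature: returns
    f(x) = a*[x < theta] + b*[x >= theta] minimizing the weighted squared
    error, with a = E_w(y | x < theta) and b = E_w(y | x >= theta)."""
    best_err, best = np.inf, None
    for theta in np.unique(x):            # candidate thresholds
        left = x < theta
        right = ~left
        a = np.average(y[left], weights=w[left]) if left.any() else 0.0
        b = np.average(y[right], weights=w[right]) if right.any() else 0.0
        err = np.sum(w * (y - np.where(left, a, b)) ** 2)
        if err < best_err:
            best_err, best = err, (theta, a, b)
    theta, a, b = best
    return lambda x_new: np.where(x_new < theta, a, b)
```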
Flavors of boosting: AdaBoost (Freund and Schapire, 1995), Real AdaBoost (Friedman et al., 1998), LogitBoost (Friedman et al., 1998), Gentle AdaBoost (Friedman et al., 1998), BrownBoost (Freund, 2000), FloatBoost (Li et al., 2002), …
From images to features: a myriad of weak detectors. We will now define a family of visual features that can be used as weak classifiers (“weak detectors”): each one takes an image as input and outputs a binary response.
A myriad of weak detectors: Yuille, Snow, Nitzberg, 1998; Amit, Geman 1998; Papageorgiou, Poggio, 2000; Heisele, Serre, Poggio, 2001; Agarwal, Awan, Roth, 2004; Schneiderman, Kanade 2004; Carmichael, Hebert 2004; …
Weak detectors: textures of textures (Tieu and Viola, CVPR 2000). Every combination of three filters generates a different feature, which gives thousands of features. Boosting selects a sparse subset, so computation at test time is very efficient. Boosting also avoids overfitting to some extent.
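A rough sketch of one such feature, under the assumption that each stage filters, rectifies, and downsamples before a final aggregation (kernel choices are left to the caller):

```python
import numpy as np
from scipy import ndimage

def texture_of_textures_feature(image, f1, f2, f3):
    """One 'texture of textures' feature in the spirit of Tieu & Viola
    (2000), sketched: filter, rectify, downsample, three times over, then
    aggregate. f1, f2, f3 are small 2-D kernels; every triple of filters
    defines a different scalar feature."""
    x = image.astype(float)
    for f in (f1, f2, f3):
        x = np.abs(ndimage.convolve(x, f))   # filter + rectify
        x = x[::2, ::2]                      # downsample by 2
    return x.sum()                           # scalar feature value
```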
Haar wavelets: Haar filters and the integral image (Viola and Jones, ICCV 2001). The average intensity in a block is computed with four sums, independently of the block size.
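A minimal integral-image sketch showing the four-lookup block sum (divide by the block area to get the average intensity):

```python
import numpy as np

def integral_image(img):
    """Padded integral image: ii[y, x] = sum of img[:y, :x]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def block_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] from four lookups, whatever the block size."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
```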
Haar wavelets Papageorgiou & Poggio (2000) Polynomial SVM
Edges and chamfer distance Gavrila, Philomin, ICCV 1999
Edge fragments (Opelt, Pinz, Zisserman, ECCV 2006). Weak detector = k edge fragments and a threshold; the chamfer distance uses 8 orientation planes.
Histograms of oriented gradients: Dalal & Triggs, 2005. Shape context: Belongie, Malik, Puzicha, NIPS 2000. SIFT: D. Lowe, ICCV 1999.
Weak detectors, part-based: similar to part-based generative models. We create weak detectors by using parts and voting for the object center location (car model, screen model). These features are used by the detector on the course web site.
Weak detectors First we collect a set of part templates from a set of training objects. Vidal-Naquet, Ullman, Nature Neuroscience 2003 …
Weak detectors. We now define a family of “weak detectors”: a part template is correlated with the image (template * image = response map), yielding a detector that is better than chance.
Weak detectors. We can do a better job using filtered images: the template is correlated with a filtered version of the image (template * filtered image = response map). Still a weak detector, but better than before.
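A sketch of such a part-based weak detector (illustrative, not the course toolbox code): correlate the template with the (optionally filtered) image, shift the response so it votes for the object center, and threshold:

```python
import numpy as np
from scipy.signal import fftconvolve

def weak_detector(image, template, offset, threshold):
    """Part-based weak detector: correlate a part template with the image,
    shift the response map by the part's offset relative to the object
    center so strong responses vote at the center, then threshold."""
    # cross-correlation = convolution with the template flipped in x and y
    response = fftconvolve(image, template[::-1, ::-1], mode='same')
    dy, dx = offset                        # displacement from part to center
    response = np.roll(response, (dy, dx), axis=(0, 1))
    return response > threshold            # binary response, as on the slide
```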
Training. First we evaluate all the N features on all the training images. Then we sample the feature outputs at the object center and at random locations in the background.
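A sketch of this sampling step (names are illustrative; responses would come from the weak detectors above, and rng = np.random.default_rng()):

```python
import numpy as np

def sample_training_outputs(responses, centers, n_background, rng):
    """responses: list of (H, W) weak-detector response maps for one image;
    centers: list of (row, col) object centers. Positives are the feature
    outputs at the centers, negatives at random background locations."""
    H, W = responses[0].shape
    pos = np.array([[r[y, x] for r in responses] for (y, x) in centers])
    bg = [(rng.integers(H), rng.integers(W)) for _ in range(n_background)]
    neg = np.array([[r[y, x] for r in responses] for (y, x) in bg])
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(len(pos)), -np.ones(len(neg))])
    return X, y
```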
Representation and object model. [Figure: features 1, 2, 3, 4, 10, …, 100 selected for the screen detector; overlaying them gives a “lousy painter” sketch of the object.]
Representation and object model. [Figure: features 1, 2, 3, 4, 10, …, 100 selected for the car detector.]
Detection: invariance via the search strategy (part-based). Here, invariance in translation and scale is achieved by the search strategy: the classifier is evaluated at all locations (by translating the image) and at all scales (by scaling the image in small steps). The search cost can be reduced using a cascade.
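A minimal sliding-window sketch of this search strategy, assuming a grayscale image (score_window stands for any trained classifier on a fixed-size crop; skimage's rescale builds the pyramid):

```python
from skimage.transform import rescale

def detect_multiscale(image, score_window, window=(64, 64),
                      scales=(1.0, 0.8, 0.64), stride=4, thresh=0.0):
    """Sliding-window search: invariance to translation comes from scanning
    every location, invariance to scale from rescaling the image in steps."""
    detections = []
    wh, ww = window
    for s in scales:
        im = rescale(image, s)                       # image pyramid level
        H, W = im.shape[:2]
        for y in range(0, H - wh + 1, stride):       # translation
            for x in range(0, W - ww + 1, stride):
                score = score_window(im[y:y + wh, x:x + ww])
                if score > thresh:
                    # map the detection back to the original image scale
                    detections.append((score, y / s, x / s, s))
    return detections
```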
Example: screen detection Feature  output
Example: screen detection Feature  output Thresholded  output Weak ‘detector’ Produces many false alarms.
Example: screen detection Feature  output Thresholded  output Strong classifier  at iteration 1
Example: screen detection Feature  output Thresholded  output Strong classifier Second weak ‘detector’ Produces a different set of false alarms.
Example: screen detection + Feature  output Thresholded  output Strong classifier Strong classifier  at iteration 2
Example: screen detection + … Feature  output Thresholded  output Strong classifier Strong classifier  at iteration 10
Example: screen detection + … Feature  output Thresholded  output Strong classifier Adding  features Final classification Strong classifier  at iteration 200
Cascade of classifiers (Fleuret and Geman 2001; Viola and Jones 2001). We want the complexity of the 3-feature classifier with the performance of the 100-feature classifier. [Figure: precision vs. recall for 3-, 30-, and 100-feature classifiers.] Select a threshold with high recall for each stage; precision increases along the cascade.
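A sketch of cascade evaluation (stage classifiers and thresholds are assumed given, ordered from cheap to expensive):

```python
def cascade_classify(window_features, stages):
    """Cascade: `stages` is a list of (classifier, threshold) pairs, each
    threshold chosen for high recall. Background windows are rejected
    early, so the average cost stays near the cheapest stage's cost."""
    score = 0.0
    for classifier, threshold in stages:   # e.g. 3-, 30-, 100-feature stages
        score += classifier(window_features)
        if score < threshold:
            return False, score            # early reject: cheap on background
    return True, score                     # survived all stages
```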
Some goals for object recognition: able to detect and recognize many object classes; computationally efficient; able to deal with data-starved situations (some training samples might be harder to collect than others, and we want on-line learning to be fast).
Multiclass object detection
Multiclass object detection. [Figure: amount of labeled data vs. number of classes, and complexity vs. number of classes.]
Shared features. Is learning the 1000th object class easier than learning the first? Can we transfer knowledge from one object to another? Are the shared properties interesting by themselves? …
Multitask learning (R. Caruana, Multitask Learning, ML 1997). Primary task: detect doorknobs. Auxiliary tasks used: horizontal location of doorknob, single or double door, horizontal location of doorway center, width of doorway, horizontal location of left door jamb, horizontal location of right door jamb, width of left door jamb, width of right door jamb, horizontal location of left edge of door, horizontal location of right edge of door.
Sharing invariances. S. Thrun, Is Learning the n-th Thing Any Easier Than Learning The First?, NIPS 1996: “Knowledge is transferred between tasks via a learned model of the invariances of the domain: object recognition is invariant to rotation, translation, scaling, lighting, … These invariances are common to all object recognition tasks.” [Figure: toy-world results, without sharing vs. with sharing.]
Sharing transformations Miller, E., Matsakis, N., and Viola, P. (2000). Learning from one example through shared densities on transforms. In IEEE Computer Vision and Pattern Recognition. Transformations are shared and can be learnt from other tasks.
Models of object recognition I. Biederman, “Recognition-by-components: A theory of human image understanding,”  Psychological Review , 1987. M. Riesenhuber and T. Poggio, “Hierarchical models of object recognition in cortex,”  Nature Neuroscience  1999. T. Serre, L. Wolf and T. Poggio. “Object recognition with features inspired  by visual cortex”. CVPR 2005
Sharing in constellation models. Pictorial structures: Fischler & Elschlager, IEEE Trans. Comp. 1973. Constellation model: Burl, Leung, Perona, 1996; Weber, Welling, Perona, 2000; Fergus, Perona, & Zisserman, CVPR 2003. SVM detectors: Heisele, Poggio, et al., NIPS 2001. Model-guided segmentation: Mori, Ren, Efros, & Malik, CVPR 2004.
Variational EM (Attias, Hinton, Beal, etc.) with random initialization: the E-step combines prior knowledge of p(θ) with the training data to produce a new estimate of p(θ|train); the M-step produces new θ’s. Slide from Fei-Fei Li. Fei-Fei, Fergus, & Perona, ICCV 2003.
Reusable parts. Goal: look for a vocabulary of edges that reduces the number of features. Krempp, Geman, & Amit, “Sequential Learning of Reusable Parts for Object Detection”, TR 2002. [Figure: number of features vs. number of classes; examples of reused parts.]
Sharing patches (Bart and Ullman, 2004). For a new class, use only features similar to features that were good for other classes. [Figure: features proposed for a dog detector.]
Multiclass boosting: AdaBoost.MH (Schapire & Singer, 2000), error-correcting output codes (Dietterich & Bakiri, 1995; …), Lk-TreeBoost (Friedman, 2001), …
Shared features (Torralba, Murphy, Freeman. CVPR 2004; PAMI 2007). Independent binary classifiers: separate screen, car, and face detectors, each with its own features. Binary classifiers that share features: the screen, car, and face detectors reuse a common pool of features.
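A much-simplified sketch of one round of such shared-feature boosting (the actual JointBoost of Torralba et al. adds per-class constants for non-sharing classes and uses a greedy rather than exhaustive subset search; fit_stump is an assumed helper returning a scoring function and its weighted error):

```python
import numpy as np
from itertools import combinations

def shared_feature_round(X, Y, W, fit_stump):
    """One round of shared-feature multiclass boosting, sketched: choose
    the subset of classes that share the next stump so that the pooled
    weighted error is minimal. Y and W are (n_samples, n_classes) arrays
    of {-1, +1} labels and per-class sample weights."""
    n_classes = Y.shape[1]
    best = (np.inf, None, None)
    for size in range(1, n_classes + 1):
        for subset in combinations(range(n_classes), size):
            pooled_X = np.vstack([X] * size)          # pool sharing classes
            pooled_y = np.concatenate([Y[:, c] for c in subset])
            pooled_w = np.concatenate([W[:, c] for c in subset])
            stump, err = fit_stump(pooled_X, pooled_y, pooled_w)
            if err < best[0]:
                best = (err, subset, stump)
    return best   # (error, classes sharing the feature, shared stump)
```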
Shared vs. class-specific features (Krempp, Geman, & Amit, 2002; Torralba, Murphy, Freeman, CVPR 2004). [Figure: 50 training samples/class, 29 object classes, 2000 entries in the dictionary; results averaged over 20 runs, error bars = 80% interval.]
Generalization as a function of object similarities (Torralba, Murphy, Freeman. CVPR 2004; PAMI 2007). [Figure: area under ROC vs. number of training samples per class, for 12 viewpoints of the same object (K = 2.1) and 12 unrelated object classes (K = 4.8).]
Opelt, Pinz, Zisserman, CVPR 2006. [Figure: efficiency and generalization results.]
Some references on multiclass Caruana 1997 Schapire, Singer, 2000 Thrun, Pratt 1997 Krempp, Geman, Amit, 2002 E.L.Miller, Matsakis, Viola, 2000 Mahamud, Hebert, Lafferty, 2001 Fink 2004 LeCun, Huang, Bottou, 2004 Holub, Welling, Perona, 2005 …
Context-based methods. Antonio Torralba
Why is this hard?
What are the hidden objects? [Figure: scene with two hidden objects, labeled 1 and 2.]
What are the hidden objects? Chance ~ 1/30000
Context-based object recognition. Cognitive psychology: Palmer 1975; Biederman 1981; … Computer vision: Noton and Stark (1971); Hanson and Riseman (1978); Barrow & Tenenbaum (1978); Ohta, Kanade, Sakai (1978); Haralick (1983); Strat and Fischler (1991); Bobick and Pinhanez (1995); Campbell et al. (1997).
Global and local representations. [Figure: urban street scene with labeled building, car, sidewalk.]
Global and local representations. Local: building, car, sidewalk. Global image index: summary statistics, a configuration of textures (a features histogram mapping to “urban street scene”).
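A crude sketch of such a global image index (oriented-energy statistics pooled on a coarse grid; an illustration only, not the actual gist descriptor of Oliva & Torralba):

```python
import numpy as np
from scipy import ndimage

def global_scene_descriptor(image, n_orientations=4, grid=4):
    """Global image index sketch for a grayscale image: oriented gradient
    energy averaged over a grid x grid spatial layout, i.e. 'summary
    statistics, configuration of textures'."""
    gy = ndimage.gaussian_filter(image, sigma=2, order=(1, 0))  # d/dy
    gx = ndimage.gaussian_filter(image, sigma=2, order=(0, 1))  # d/dx
    H, W = image.shape
    feats = []
    for i in range(n_orientations):
        t = np.pi * i / n_orientations
        energy = np.abs(np.cos(t) * gx + np.sin(t) * gy)  # oriented energy
        for by in range(grid):                  # pool over the spatial grid
            for bx in range(grid):
                block = energy[by * H // grid:(by + 1) * H // grid,
                               bx * W // grid:(bx + 1) * W // grid]
                feats.append(block.mean())
    return np.array(feats)
```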
Global scene representations: bag of words, spatially organized textures, non-localized textons. Spatial structure is important in order to provide context for object localization. Sivic, Russell, Freeman, Zisserman, ICCV 2005; Fei-Fei and Perona, CVPR 2005; Bosch, Zisserman, Munoz, ECCV 2006; S. Lazebnik et al., CVPR 2006; Walker, Malik, Vision Research 2004; M. Gorkani, R. Picard, ICPR 1994; A. Oliva, A. Torralba, IJCV 2001; …
Contextual object relationships Carbonetto, de Freitas & Barnard (2004) Kumar, Hebert (2005) Torralba Murphy Freeman (2004) Fink & Perona (2003) E. Sudderth et al (2005)
Context (Murphy, Torralba & Freeman, NIPS 03): use global context to predict the presence and location of objects (e.g., keyboards). [Figure: graphical model coupling the global gist v_g with per-class object presence and location variables.]
3D scene context: from image to world. [Hoiem, Efros, Hebert, ICCV 2005]
3D scene context (Hoiem, Efros, Hebert, ICCV 2005). [Figure: geometric surface labels: support, sky, and vertical (left / center / right / porous / solid); an object’s surface type and support constrain where it can appear.]
Object-Object Relationships. Enforce spatial consistency between labels using an MRF. Carbonetto, de Freitas & Barnard (04)
Object-Object Relationships Use latent variables to induce long distance correlations between labels in a Conditional Random Field (CRF) He, Zemel & Carreira-Perpinan (04)
Object-Object Relationships. Fink & Perona (NIPS 03): use the output of boosting for other objects at previous iterations as input into boosting for the current object.
Object-Object Relationships with CRFs: [Torralba, Murphy, Freeman 2004]; [Kumar, Hebert 2005].
Hierarchical Sharing and Context: scenes share objects, objects share parts, parts share features. E. Sudderth, A. Torralba, W. T. Freeman, and A. Willsky.
Some references on context, with a mixture of generative and discriminative approaches: Strat & Fischler (PAMI 91); Torralba & Sinha (ICCV 01); Torralba (IJCV 03); Fink & Perona (NIPS 03); Murphy, Torralba & Freeman (NIPS 03); Kumar & Hebert (NIPS 04); Carbonetto, de Freitas & Barnard (ECCV 04); He, Zemel & Carreira-Perpinan (CVPR 04); Sudderth, Torralba, Freeman, Willsky (ICCV 05); Hoiem, Efros, Hebert (ICCV 05); …
A car out of context …
Integrated models for  scene and object recognition Banksy
Summary. Many techniques used for training discriminative models have not been mentioned here: conditional random fields, kernels for object recognition, learning object similarities, …


Editor's Notes

  • #54 There are multiple flavors of multiclass boosting; here I am going to use the original AdaBoost.MH.
  • #55 Stress: the main point is not to discriminate among classes. The hard part is to detect the objects when immersed in clutter.
  • #58 These results have been reproduced by Opelt with a completely different set of features: they found the same sublinear scaling of complexity and the same improved generalization, although the effect is weak because they use few categories.
  • #82 Despite being semantically wrong, some basic contextual principles are still preserved: the elephant is not sticking out of the wall.