Part 3: Classifier-based methods. Antonio Torralba
Overview of section: a short story of discriminative methods; object detection with classifiers; boosting; gentle boosting; weak detectors; object model; object detection; multiclass object detection; context-based object recognition.
Classifier-based methods. Object detection and recognition is formulated as a classification problem: where are the screens? The image is partitioned into a set of overlapping windows, and a decision is taken at each window about whether it contains a target object or not. Each window is described as a bag of image patches in some feature space, where a decision boundary separates the computer-screen class from the background.
Discriminative vs. generative, for x = data. [Figure: the same 1-D dataset under three models.] Generative model (the lousy painter): models p(x|y) for each class. Discriminative model: models p(y = 1|x). Classification function (the artist): models the decision boundary directly, f(x) in {-1, +1}.
The Representation and Matching of Pictorial Structures - Fischler, Elschlager (1973). Face Recognition Using Eigenfaces - Turk, Pentland (1991). Human Face Detection in Visual Scenes - Rowley, Baluja, Kanade (1995). Graded Learning for Object Detection - Fleuret, Geman (1999). Robust Real-time Object Detection - Viola, Jones (2001). Feature Reduction and Hierarchy of Classifiers for Fast Object Detection in Video Images - Heisele, Serre, Mukherjee, Poggio (2001). …
Face detection
Formulation: binary classification. Training data: image patches x_1, x_2, …, x_N, each labeled y_i in {+1, -1} as containing the object or background. Test data: patches x_{N+1}, …, x_{N+M} with unknown labels. Learn a classification function F(x), where F belongs to some family of functions, that minimizes the misclassification error. (Not that simple: we need some guarantees that there will be generalization.)
Discriminative methods. Nearest neighbor (with 10^6 examples): Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik 2005; … Neural networks: LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998; … Support Vector Machines and kernels: Guyon, Vapnik; Heisele, Serre, Poggio 2001; … Conditional Random Fields: McCallum, Freitag, Pereira 2000; Kumar, Hebert 2003; …
A simple object detector with boosting. Code and dataset available at http://people.csail.mit.edu/torralba/iccv2005/: Matlab code for gentle boosting and an object detector using a part-based model, a toolbox for manipulating the dataset, and a dataset with cars and computer monitors.
Why boosting? It is a simple algorithm for learning robust classifiers (Freund & Schapire, 1995; Friedman, Hastie, Tibshirani, 1998). It provides an efficient algorithm for sparse visual feature selection (Tieu & Viola, 2000; Viola & Jones, 2003). And it is easy to implement and does not require external optimization tools.
Boosting. Boosting fits an additive model by minimizing the exponential loss over the training samples. The exponential loss is a differentiable upper bound on the misclassification error.
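Written out, in the standard notation of Friedman et al. (1998) (reconstructed here rather than copied from the slide):

```latex
F(x) = \sum_{m=1}^{M} f_m(x), \qquad
J = \sum_{i=1}^{N} e^{-y_i F(x_i)}, \qquad
e^{-y_i F(x_i)} \;\ge\; \mathbf{1}\!\left[\, y_i \neq \operatorname{sign} F(x_i) \,\right].
```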
Boosting is a sequential procedure: at each step we add one weak classifier f_m(x; θ) (with input x, desired output y, and parameters θ) chosen to minimize the residual loss left by the classifiers added so far. For more details: Friedman, Hastie, Tibshirani. “Additive Logistic Regression: a Statistical View of Boosting” (1998).
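As an illustration, a minimal sketch of the gentle-boosting loop in Python (not the tutorial's Matlab toolbox; fit_weak is an assumed helper that fits one weak classifier to weighted samples and returns a scoring function):

```python
import numpy as np

def gentle_boost(X, y, n_rounds, fit_weak):
    """Minimal gentle-boosting loop (Friedman et al., 1998).
    X: (n_samples, n_features); y: labels in {-1, +1};
    fit_weak(X, y, w): assumed helper that fits one weak classifier by
    weighted least squares and returns a function f(X) -> per-sample scores."""
    n = len(y)
    w = np.ones(n) / n                    # uniform initial sample weights
    F = np.zeros(n)                       # additive model on the training set
    weak_classifiers = []
    for _ in range(n_rounds):
        f_m = fit_weak(X, y, w)           # fit the next weak classifier
        F += f_m(X)                       # add it to the additive model
        w = np.exp(-y * F)                # reweight with the exponential loss
        w /= w.sum()
        weak_classifiers.append(f_m)
    return lambda X_new: np.sign(sum(f(X_new) for f in weak_classifiers))
```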
Weak classifiers. The input is a set of weighted training samples (x, y, w). Regression stumps are simple but commonly used in object detection: f_m(x) = a·[x < θ] + b·[x ≥ θ], with four parameters: the feature used, the threshold θ, a = E_w(y [x < θ]) and b = E_w(y [x ≥ θ]).
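A sketch of fitting such a stump on a single feature by weighted least squares (illustrative only; a real implementation would also loop over feature columns and use a faster threshold sweep):

```python
import numpy as np

def fit_stump(x, y, w):
    """Weighted regression stump on one feature: returns
    f(x) = a*[x < theta] + b*[x >= theta] minimizing the weighted squared
    error, with a = E_w(y | x < theta) and b = E_w(y | x >= theta)."""
    best_err, best = np.inf, None
    for theta in np.unique(x):            # candidate thresholds
        left = x < theta
        right = ~left
        a = np.average(y[left], weights=w[left]) if left.any() else 0.0
        b = np.average(y[right], weights=w[right]) if right.any() else 0.0
        err = np.sum(w * (y - np.where(left, a, b)) ** 2)
        if err < best_err:
            best_err, best = err, (theta, a, b)
    theta, a, b = best
    return lambda x_new: np.where(x_new < theta, a, b)
```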
Flavors of boosting: AdaBoost (Freund and Schapire, 1995), Real AdaBoost (Friedman et al., 1998), LogitBoost (Friedman et al., 1998), Gentle AdaBoost (Friedman et al., 1998), BrownBoost (Freund, 2000), FloatBoost (Li et al., 2002), …
From images to features: a myriad of weak detectors. We will now define a family of visual features that can be used as weak classifiers (“weak detectors”): each one takes an image as input and outputs a binary response.
A myriad of weak detectors: Yuille, Snow, Nitzberg, 1998; Amit, Geman 1998; Papageorgiou, Poggio, 2000; Heisele, Serre, Poggio, 2001; Agarwal, Awan, Roth, 2004; Schneiderman, Kanade 2004; Carmichael, Hebert 2004; …
Weak detectors: textures of textures (Tieu and Viola, CVPR 2000). Every combination of three filters generates a different feature, which gives thousands of features. Boosting selects a sparse subset, so computation at test time is very efficient. Boosting also avoids overfitting to some extent.
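A rough sketch of one such feature, under the assumption that each stage filters, rectifies, and downsamples before a final aggregation (kernel choices are left to the caller):

```python
import numpy as np
from scipy import ndimage

def texture_of_textures_feature(image, f1, f2, f3):
    """One 'texture of textures' feature in the spirit of Tieu & Viola
    (2000), sketched: filter, rectify, downsample, three times over, then
    aggregate. f1, f2, f3 are small 2-D kernels; every triple of filters
    defines a different scalar feature."""
    x = image.astype(float)
    for f in (f1, f2, f3):
        x = np.abs(ndimage.convolve(x, f))   # filter + rectify
        x = x[::2, ::2]                      # downsample by 2
    return x.sum()                           # scalar feature value
```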
Haar wavelets: Haar filters and the integral image (Viola and Jones, ICCV 2001). The average intensity in a block is computed with four sums, independently of the block size.
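A minimal integral-image sketch showing the four-lookup block sum (divide by the block area to get the average intensity):

```python
import numpy as np

def integral_image(img):
    """Padded integral image: ii[y, x] = sum of img[:y, :x]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def block_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] from four lookups, whatever the block size."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
```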
Haar wavelets Papageorgiou & Poggio (2000) Polynomial SVM
Edges and chamfer distance Gavrila, Philomin, ICCV 1999
Edge fragments (Opelt, Pinz, Zisserman, ECCV 2006). Weak detector = k edge fragments and a threshold; the chamfer distance uses 8 orientation planes.
Histograms of oriented gradients: Dalal & Triggs, 2005. Shape context: Belongie, Malik, Puzicha, NIPS 2000. SIFT: D. Lowe, ICCV 1999.
Weak detectors, part-based: similar to part-based generative models. We create weak detectors by using parts and voting for the object center location (car model, screen model). These features are used by the detector on the course web site.
Weak detectors First we collect a set of part templates from a set of training objects. Vidal-Naquet, Ullman, Nature Neuroscience 2003 …
Weak detectors. We now define a family of “weak detectors”: a part template is correlated with the image (template * image = response map), yielding a detector that is better than chance.
Weak detectors. We can do a better job using filtered images: the template is correlated with a filtered version of the image (template * filtered image = response map). Still a weak detector, but better than before.
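A sketch of such a part-based weak detector (illustrative, not the course toolbox code): correlate the template with the (optionally filtered) image, shift the response so it votes for the object center, and threshold:

```python
import numpy as np
from scipy.signal import fftconvolve

def weak_detector(image, template, offset, threshold):
    """Part-based weak detector: correlate a part template with the image,
    shift the response map by the part's offset relative to the object
    center so strong responses vote at the center, then threshold."""
    # cross-correlation = convolution with the template flipped in x and y
    response = fftconvolve(image, template[::-1, ::-1], mode='same')
    dy, dx = offset                        # displacement from part to center
    response = np.roll(response, (dy, dx), axis=(0, 1))
    return response > threshold            # binary response, as on the slide
```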
Training. First we evaluate all the N features on all the training images. Then we sample the feature outputs at the object center and at random locations in the background.
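A sketch of this sampling step (names are illustrative; responses would come from the weak detectors above, and rng = np.random.default_rng()):

```python
import numpy as np

def sample_training_outputs(responses, centers, n_background, rng):
    """responses: list of (H, W) weak-detector response maps for one image;
    centers: list of (row, col) object centers. Positives are the feature
    outputs at the centers, negatives at random background locations."""
    H, W = responses[0].shape
    pos = np.array([[r[y, x] for r in responses] for (y, x) in centers])
    bg = [(rng.integers(H), rng.integers(W)) for _ in range(n_background)]
    neg = np.array([[r[y, x] for r in responses] for (y, x) in bg])
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(len(pos)), -np.ones(len(neg))])
    return X, y
```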
Representation and object model. [Figure: features 1, 2, 3, 4, 10, …, 100 selected for the screen detector; overlaying them gives a “lousy painter” sketch of the object.]
Representation and object model. [Figure: features 1, 2, 3, 4, 10, …, 100 selected for the car detector.]
Detection: invariance via the search strategy (part-based). Here, invariance in translation and scale is achieved by the search strategy: the classifier is evaluated at all locations (by translating the image) and at all scales (by scaling the image in small steps). The search cost can be reduced using a cascade.
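A minimal sliding-window sketch of this search strategy, assuming a grayscale image (score_window stands for any trained classifier on a fixed-size crop; skimage's rescale builds the pyramid):

```python
from skimage.transform import rescale

def detect_multiscale(image, score_window, window=(64, 64),
                      scales=(1.0, 0.8, 0.64), stride=4, thresh=0.0):
    """Sliding-window search: invariance to translation comes from scanning
    every location, invariance to scale from rescaling the image in steps."""
    detections = []
    wh, ww = window
    for s in scales:
        im = rescale(image, s)                       # image pyramid level
        H, W = im.shape[:2]
        for y in range(0, H - wh + 1, stride):       # translation
            for x in range(0, W - ww + 1, stride):
                score = score_window(im[y:y + wh, x:x + ww])
                if score > thresh:
                    # map the detection back to the original image scale
                    detections.append((score, y / s, x / s, s))
    return detections
```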
Example: screen detection Feature  output
Example: screen detection Feature  output Thresholded  output Weak ‘detector’ Produces many false alarms.
Example: screen detection Feature  output Thresholded  output Strong classifier  at iteration 1
Example: screen detection Feature  output Thresholded  output Strong classifier Second weak ‘detector’ Produces a different set of false alarms.
Example: screen detection + Feature  output Thresholded  output Strong classifier Strong classifier  at iteration 2
Example: screen detection + … Feature  output Thresholded  output Strong classifier Strong classifier  at iteration 10
Example: screen detection + … Feature  output Thresholded  output Strong classifier Adding  features Final classification Strong classifier  at iteration 200
Cascade of classifiers (Fleuret and Geman 2001; Viola and Jones 2001). We want the complexity of the 3-feature classifier with the performance of the 100-feature classifier. [Figure: precision vs. recall for 3-, 30-, and 100-feature classifiers.] Select a threshold with high recall for each stage; precision increases along the cascade.
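A sketch of cascade evaluation (stage classifiers and thresholds are assumed given, ordered from cheap to expensive):

```python
def cascade_classify(window_features, stages):
    """Cascade: `stages` is a list of (classifier, threshold) pairs, each
    threshold chosen for high recall. Background windows are rejected
    early, so the average cost stays near the cheapest stage's cost."""
    score = 0.0
    for classifier, threshold in stages:   # e.g. 3-, 30-, 100-feature stages
        score += classifier(window_features)
        if score < threshold:
            return False, score            # early reject: cheap on background
    return True, score                     # survived all stages
```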
Some goals for object recognition: able to detect and recognize many object classes; computationally efficient; able to deal with data-starved situations (some training samples might be harder to collect than others, and we want on-line learning to be fast).
Multiclass object detection
Multiclass object detection. [Figure: amount of labeled data vs. number of classes, and complexity vs. number of classes.]
Shared features. Is learning the 1000th object class easier than learning the first? Can we transfer knowledge from one object to another? Are the shared properties interesting by themselves? …
Multitask learning (R. Caruana, Multitask Learning, ML 1997). Primary task: detect doorknobs. Auxiliary tasks used: horizontal location of doorknob, single or double door, horizontal location of doorway center, width of doorway, horizontal location of left door jamb, horizontal location of right door jamb, width of left door jamb, width of right door jamb, horizontal location of left edge of door, horizontal location of right edge of door.
Sharing invariances. S. Thrun, Is Learning the n-th Thing Any Easier Than Learning The First?, NIPS 1996: “Knowledge is transferred between tasks via a learned model of the invariances of the domain: object recognition is invariant to rotation, translation, scaling, lighting, … These invariances are common to all object recognition tasks.” [Figure: toy-world results, without sharing vs. with sharing.]
Sharing transformations Miller, E., Matsakis, N., and Viola, P. (2000). Learning from one example through shared densities on transforms. In IEEE Computer Vision and Pattern Recognition. Transformations are shared and can be learnt from other tasks.
Models of object recognition I. Biederman, “Recognition-by-components: A theory of human image understanding,”  Psychological Review , 1987. M. Riesenhuber and T. Poggio, “Hierarchical models of object recognition in cortex,”  Nature Neuroscience  1999. T. Serre, L. Wolf and T. Poggio. “Object recognition with features inspired  by visual cortex”. CVPR 2005
Sharing in constellation models. Pictorial structures: Fischler & Elschlager, IEEE Trans. Comp. 1973. Constellation model: Burl, Leung, Perona, 1996; Weber, Welling, Perona, 2000; Fergus, Perona, & Zisserman, CVPR 2003. SVM detectors: Heisele, Poggio, et al., NIPS 2001. Model-guided segmentation: Mori, Ren, Efros, & Malik, CVPR 2004.
Variational EM (Attias, Hinton, Beal, etc.) with random initialization: the E-step combines prior knowledge of p(θ) with the training data to produce a new estimate of p(θ|train); the M-step produces new θ’s. Slide from Fei-Fei Li. Fei-Fei, Fergus, & Perona, ICCV 2003.
Reusable parts. Goal: look for a vocabulary of edges that reduces the number of features. Krempp, Geman, & Amit, “Sequential Learning of Reusable Parts for Object Detection”, TR 2002. [Figure: number of features vs. number of classes; examples of reused parts.]
Sharing patches (Bart and Ullman, 2004). For a new class, use only features similar to features that were good for other classes. [Figure: features proposed for a dog detector.]
Multiclass boosting: AdaBoost.MH (Schapire & Singer, 2000), error-correcting output codes (Dietterich & Bakiri, 1995; …), Lk-TreeBoost (Friedman, 2001), …
Shared features (Torralba, Murphy, Freeman. CVPR 2004; PAMI 2007). Independent binary classifiers: separate screen, car, and face detectors, each with its own features. Binary classifiers that share features: the screen, car, and face detectors reuse a common pool of features.
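A much-simplified sketch of one round of such shared-feature boosting (the actual JointBoost of Torralba et al. adds per-class constants for non-sharing classes and uses a greedy rather than exhaustive subset search; fit_stump is an assumed helper returning a scoring function and its weighted error):

```python
import numpy as np
from itertools import combinations

def shared_feature_round(X, Y, W, fit_stump):
    """One round of shared-feature multiclass boosting, sketched: choose
    the subset of classes that share the next stump so that the pooled
    weighted error is minimal. Y and W are (n_samples, n_classes) arrays
    of {-1, +1} labels and per-class sample weights."""
    n_classes = Y.shape[1]
    best = (np.inf, None, None)
    for size in range(1, n_classes + 1):
        for subset in combinations(range(n_classes), size):
            pooled_X = np.vstack([X] * size)          # pool sharing classes
            pooled_y = np.concatenate([Y[:, c] for c in subset])
            pooled_w = np.concatenate([W[:, c] for c in subset])
            stump, err = fit_stump(pooled_X, pooled_y, pooled_w)
            if err < best[0]:
                best = (err, subset, stump)
    return best   # (error, classes sharing the feature, shared stump)
```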
Shared vs. class-specific features (Krempp, Geman, & Amit, 2002; Torralba, Murphy, Freeman, CVPR 2004). [Figure: 50 training samples/class, 29 object classes, 2000 entries in the dictionary; results averaged over 20 runs, error bars = 80% interval.]
Generalization as a function of object similarities (Torralba, Murphy, Freeman. CVPR 2004; PAMI 2007). [Figure: area under ROC vs. number of training samples per class, for 12 viewpoints of the same object (K = 2.1) and 12 unrelated object classes (K = 4.8).]
Opelt, Pinz, Zisserman, CVPR 2006. [Figure: efficiency and generalization results.]
Some references on multiclass Caruana 1997 Schapire, Singer, 2000 Thrun, Pratt 1997 Krempp, Geman, Amit, 2002 E.L.Miller, Matsakis, Viola, 2000 Mahamud, Hebert, Lafferty, 2001 Fink 2004 LeCun, Huang, Bottou, 2004 Holub, Welling, Perona, 2005 …
Context-based methods. Antonio Torralba
Why is this hard?
What are the hidden objects? [Figure: scene with two hidden objects, labeled 1 and 2.]
What are the hidden objects? Chance ~ 1/30000
Context-based object recognition. Cognitive psychology: Palmer 1975; Biederman 1981; … Computer vision: Noton and Stark (1971); Hanson and Riseman (1978); Barrow & Tenenbaum (1978); Ohta, Kanade, Sakai (1978); Haralick (1983); Strat and Fischler (1991); Bobick and Pinhanez (1995); Campbell et al. (1997).
Global and local representations. [Figure: urban street scene with labeled building, car, sidewalk.]
Global and local representations. Local: building, car, sidewalk. Global image index: summary statistics, a configuration of textures (a features histogram mapping to “urban street scene”).
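A crude sketch of such a global image index (oriented-energy statistics pooled on a coarse grid; an illustration only, not the actual gist descriptor of Oliva & Torralba):

```python
import numpy as np
from scipy import ndimage

def global_scene_descriptor(image, n_orientations=4, grid=4):
    """Global image index sketch for a grayscale image: oriented gradient
    energy averaged over a grid x grid spatial layout, i.e. 'summary
    statistics, configuration of textures'."""
    gy = ndimage.gaussian_filter(image, sigma=2, order=(1, 0))  # d/dy
    gx = ndimage.gaussian_filter(image, sigma=2, order=(0, 1))  # d/dx
    H, W = image.shape
    feats = []
    for i in range(n_orientations):
        t = np.pi * i / n_orientations
        energy = np.abs(np.cos(t) * gx + np.sin(t) * gy)  # oriented energy
        for by in range(grid):                  # pool over the spatial grid
            for bx in range(grid):
                block = energy[by * H // grid:(by + 1) * H // grid,
                               bx * W // grid:(bx + 1) * W // grid]
                feats.append(block.mean())
    return np.array(feats)
```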
Global scene representations: bag of words, spatially organized textures, non-localized textons. Spatial structure is important in order to provide context for object localization. Sivic, Russell, Freeman, Zisserman, ICCV 2005; Fei-Fei and Perona, CVPR 2005; Bosch, Zisserman, Munoz, ECCV 2006; S. Lazebnik et al., CVPR 2006; Walker, Malik, Vision Research 2004; M. Gorkani, R. Picard, ICPR 1994; A. Oliva, A. Torralba, IJCV 2001; …
Contextual object relationships Carbonetto, de Freitas & Barnard (2004) Kumar, Hebert (2005) Torralba Murphy Freeman (2004) Fink & Perona (2003) E. Sudderth et al (2005)
Context (Murphy, Torralba & Freeman, NIPS 03): use global context to predict the presence and location of objects (e.g., keyboards). [Figure: graphical model coupling the global gist v_g with per-class object presence and location variables.]
3D scene context: from image to world. [Hoiem, Efros, Hebert, ICCV 2005]
3D scene context (Hoiem, Efros, Hebert, ICCV 2005). [Figure: geometric surface labels: support, sky, and vertical (left / center / right / porous / solid); an object’s surface type and support constrain where it can appear.]
Object-Object Relationships. Enforce spatial consistency between labels using an MRF. Carbonetto, de Freitas & Barnard (04)
Object-Object Relationships Use latent variables to induce long distance correlations between labels in a Conditional Random Field (CRF) He, Zemel & Carreira-Perpinan (04)
Object-Object Relationships. Fink & Perona (NIPS 03): use the output of boosting for other objects at previous iterations as input into boosting for the current object.
Object-Object Relationships with CRFs: [Torralba, Murphy, Freeman 2004]; [Kumar, Hebert 2005].
Hierarchical Sharing and Context: scenes share objects, objects share parts, parts share features. E. Sudderth, A. Torralba, W. T. Freeman, and A. Willsky.
Some references on context, with a mixture of generative and discriminative approaches: Strat & Fischler (PAMI 91); Torralba & Sinha (ICCV 01); Torralba (IJCV 03); Fink & Perona (NIPS 03); Murphy, Torralba & Freeman (NIPS 03); Kumar & Hebert (NIPS 04); Carbonetto, de Freitas & Barnard (ECCV 04); He, Zemel & Carreira-Perpinan (CVPR 04); Sudderth, Torralba, Freeman, Willsky (ICCV 05); Hoiem, Efros, Hebert (ICCV 05); …
A car out of context …
Integrated models for  scene and object recognition Banksy
Summary. Many techniques used for training discriminative models have not been mentioned here: conditional random fields, kernels for object recognition, learning object similarities, …


Editor's Notes

  • #54 There are multiple flavors of multiclass boosting; here I am going to use the original AdaBoost.MH.
  • #55 Stress: the main point is not to discriminate among classes. The hard part is to detect the objects when immersed in clutter.
  • #58 These results have been reproduced by Opelt with a completely different set of features: they found the same sublinear scaling of complexity and the same improved generalization, although the effect is weak because they use few categories.
  • #82 Despite being semantically wrong, some basic contextual principles are still preserved: the elephant is not sticking out of the wall.