CVPR 2007 Object Category Recognition, Part 3: Discriminative Models
Usage Rights: CC Attribution-ShareAlike License

  • There are multiple flavors of multiclass boosting; here I am going to use the original AdaBoost.MH.
  • Stress: the hard part is not discriminating among classes; it is detecting the objects when they are immersed in clutter.
  • These results have been reproduced by Opelt with a completely different set of features, and they found the same sublinear scaling of complexity and the same improved generalization, although the effect is weak because they use few categories.
  • Despite being semantically wrong, some basic contextual principles are still preserved: the elephant is not sticking out of the wall.

Presentation Transcript

  • Part 3: classifier based methods Antonio Torralba
  • Overview of section
    • A short story of discriminative methods
    • Object detection with classifiers
    • Boosting
      • Gentle boosting
      • Weak detectors
      • Object model
      • Object detection
    • Multiclass object detection
    • Context based object recognition
  • Classifier based methods: object detection and recognition is formulated as a classification problem. The image is partitioned into a set of overlapping windows, and a decision is taken at each window about whether it contains a target object or not. (Figure: “Where are the screens?” A bag of image patches is mapped into some feature space, where a decision boundary separates computer screens from background.)
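The window-scanning formulation above can be sketched in a few lines of NumPy; `classify` is a hypothetical stand-in for any trained patch classifier, not part of the tutorial's code:

```python
import numpy as np

def sliding_windows(image, win=24, stride=8):
    """Yield (row, col, patch) for every overlapping window."""
    H, W = image.shape
    for r in range(0, H - win + 1, stride):
        for c in range(0, W - win + 1, stride):
            yield r, c, image[r:r + win, c:c + win]

def detect(image, classify, threshold=0.0, win=24, stride=8):
    """Take a decision at each window: keep those scoring above threshold."""
    return [(r, c) for r, c, patch in sliding_windows(image, win, stride)
            if classify(patch) > threshold]
```

Any scoring function over patches can be plugged in as `classify`; in the rest of this section that role is played by a boosted classifier.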
  • Discriminative vs. generative (x = data)
    • Generative model: models the data of each class, p(x|y) (“the lousy painter”)
    • Discriminative model: models the class posterior p(y|x)
    • Classification function: models only the decision boundary y = f(x) (“the artist”)
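A minimal 1-D illustration of the three views, on hypothetical two-Gaussian data (NumPy only): the generative view fits p(x|y) per class, the discriminative view derives p(y|x), and the classification function keeps only the decision:

```python
import numpy as np

rng = np.random.default_rng(0)
x_pos = rng.normal(45, 5, 200)   # samples of class y = +1
x_neg = rng.normal(25, 5, 200)   # samples of class y = -1

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Generative model: fit p(x|y) for each class ("the lousy painter").
mu_p, sd_p = x_pos.mean(), x_pos.std()
mu_n, sd_n = x_neg.mean(), x_neg.std()

def p_y_given_x(x):
    """Discriminative view: p(y=+1|x) via Bayes' rule, equal priors."""
    pp, pn = gauss(x, mu_p, sd_p), gauss(x, mu_n, sd_n)
    return pp / (pp + pn)

def f(x):
    """Classification function: only the decision boundary ("the artist")."""
    return np.where(p_y_given_x(x) > 0.5, 1, -1)
```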
    • The representation and matching of pictorial structures Fischler, Elschlager (1973) .
    • Face recognition using eigenfaces M. Turk and A. Pentland (1991).
    • Human Face Detection in Visual Scenes - Rowley, Baluja, Kanade (1995)
    • Graded Learning for Object Detection - Fleuret, Geman (1999)
    • Robust Real-time Object Detection - Viola, Jones (2001)
    • Feature Reduction and Hierarchy of Classifiers for Fast Object Detection in Video Images - Heisele, Serre, Mukherjee, Poggio (2001)
    • … .
  • Face detection
    • Formulation: binary classification
    Formulation: training data consists of features x_1, x_2, x_3, …, x_N (image patches), each labeled y = +1 (contains the object) or y = −1 (background); test data x_{N+1}, x_{N+2}, …, x_{N+M} have unknown labels.
    • Classification function: y = f(x), where f belongs to some family of functions
    • Minimize misclassification error
    • (Not that simple: we need some guarantees that there will be generalization)
  • Discriminative methods (10^6 examples). Nearest neighbor: Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik 2005; … Neural networks: LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998; … Support Vector Machines and kernels: Guyon, Vapnik; Heisele, Serre, Poggio 2001; … Conditional Random Fields: McCallum, Freitag, Pereira 2000; Kumar, Hebert 2003; …
  • A simple object detector with Boosting
    • Download
    • Toolbox for manipulating dataset
    • Code and dataset
    • Matlab code
    • Gentle boosting
    • Object detector using a part based model
    • Dataset with cars and computer monitors
    http://people.csail.mit.edu/torralba/iccv2005/
    • A simple algorithm for learning robust classifiers
      • Freund & Schapire, 1995
      • Friedman, Hastie, Tibshirani, 1998
    • Provides efficient algorithm for sparse visual feature selection
      • Tieu & Viola, 2000
      • Viola & Jones, 2003
    • Easy to implement; does not require external optimization tools.
    Why boosting?
  • Boosting
    • Boosting fits the additive model F(x) = f_1(x) + f_2(x) + … + f_M(x)
    by minimizing the exponential loss Σ_i exp(−y_i F(x_i)) over the training samples. The exponential loss is a differentiable upper bound on the misclassification error.
  • Boosting is a sequential procedure: at each step we add a weak classifier f_m(x) (input x, desired output y, plus the weak classifier’s parameters) to minimize the residual loss. For more details: Friedman, Hastie, Tibshirani, “Additive Logistic Regression: a Statistical View of Boosting” (1998).
  • Weak classifiers
    • The input is a set of weighted training samples (x,y,w)
    • Regression stumps: simple but commonly used in object detection.
    Four parameters: the feature the stump is applied to, the threshold θ, and the two output values a = E_w(y · [x < θ]) and b = E_w(y · [x > θ]), giving f_m(x) = a · [x < θ] + b · [x ≥ θ].
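As a sketch (not the course's Matlab code), gentle boosting with these regression stumps on a single 1-D feature might look like the following; selection over many features is omitted for brevity:

```python
import numpy as np

def fit_stump(x, y, w):
    """Pick the threshold minimizing weighted squared error; return (theta, a, b)."""
    best = None
    for theta in np.unique(x):
        lo, hi = x < theta, x >= theta
        a = np.average(y[lo], weights=w[lo]) if lo.any() else 0.0
        b = np.average(y[hi], weights=w[hi]) if hi.any() else 0.0
        err = np.sum(w * (y - np.where(lo, a, b)) ** 2)
        if best is None or err < best[0]:
            best = (err, theta, a, b)
    return best[1:]

def gentle_boost(x, y, rounds=10):
    """Fit the additive model F(x) = sum_m f_m(x); weights w_i = exp(-y_i F(x_i))."""
    F = np.zeros_like(x, dtype=float)
    stumps = []
    for _ in range(rounds):
        w = np.exp(-y * F)
        w /= w.sum()                       # reweight toward current mistakes
        theta, a, b = fit_stump(x, y, w)
        F += np.where(x < theta, a, b)     # add the new weak classifier
        stumps.append((theta, a, b))
    return stumps

def predict(stumps, x):
    F = sum(np.where(x < t, a, b) for t, a, b in stumps)
    return np.sign(F)
```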
  • Flavors of boosting
    • AdaBoost (Freund and Schapire, 1995)
    • Real AdaBoost (Friedman et al, 1998)
    • LogitBoost (Friedman et al, 1998)
    • Gentle AdaBoost (Friedman et al, 1998)
    • BrownBoost (Freund, 2000)
    • FloatBoost (Li et al, 2002)
  • From images to features: A myriad of weak detectors
    • We will now define a family of visual features that can be used as weak classifiers (“weak detectors”)
    A weak detector takes the image as input and produces a binary response.
  • A myriad of weak detectors
    • Yuille, Snow, Nitzbert, 1998
    • Amit, Geman 1998
    • Papageorgiou, Poggio, 2000
    • Heisele, Serre, Poggio, 2001
    • Agarwal, Awan, Roth, 2004
    • Schneiderman, Kanade 2004
    • Carmichael, Hebert 2004
  • Weak detectors
    • Textures of textures
    • Tieu and Viola, CVPR 2000
    Every combination of three filters generates a different feature, yielding thousands of features. Boosting selects a sparse subset, so computation at test time is very efficient. Boosting also avoids overfitting to some extent.
  • Haar wavelets
    • Haar filters and integral image
    • Viola and Jones, ICCV 2001
    The average intensity in the block is computed with four sums independently of the block size.
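The integral-image trick can be sketched in NumPy: after one cumulative-sum pass, any rectangular block sum costs four lookups, independent of block size:

```python
import numpy as np

def integral_image(img):
    """ii[r, c] = sum of img[:r, :c]; padded with a zero row and column."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def block_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] from four corners of the integral image."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]
```

Haar-like features are then differences of a few such block sums, which is why they are so cheap to evaluate at every window.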
  • Haar wavelets Papageorgiou & Poggio (2000) Polynomial SVM
  • Edges and chamfer distance Gavrila, Philomin, ICCV 1999
  • Edge fragments Weak detector = k edge fragments and threshold. Chamfer distance uses 8 orientation planes Opelt, Pinz, Zisserman, ECCV 2006
  • Histograms of oriented gradients
    • Dalal & Triggs, 2006
    • Shape context
    • Belongie, Malik, Puzicha, NIPS 2000
    • SIFT, D. Lowe, ICCV 1999
  • Weak detectors
    • Part based : similar to part-based generative models. We create weak detectors by using parts and voting for the object center location
    Car model Screen model These features are used for the detector on the course web site.
  • Weak detectors First we collect a set of part templates from a set of training objects. Vidal-Naquet, Ullman, Nature Neuroscience 2003 …
  • Weak detectors: we now define a family of “weak detectors” by matching a part template against the image and thresholding the response; each one performs better than chance.
  • Weak detectors: we can do a better job using filtered images; the result is still a weak detector, but better than before.
  • Training First we evaluate all the N features on all the training images. Then, we sample the feature outputs on the object center and at random locations in the background:
  • Representation and object model: features 1, 2, 3, 4, 10, …, 100 selected for the screen detector (the selected features form a “lousy painter” sketch of the object)
  • Representation and object model: features 1, 2, 3, 4, 10, …, 100 selected for the car detector
  • Detection
    • Invariance: search strategy
    • Part based
    Here, invariance to translation and scale is achieved by the search strategy: the classifier is evaluated at all locations (by translating the image) and at all scales (by scaling the image in small steps). The search cost can be reduced using a cascade.
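The search strategy can be sketched as a loop over an image pyramid; the naive stride-based downsampling and the `classify` stand-in are simplifications for illustration, not the tutorial's actual detector:

```python
import numpy as np

def pyramid(image, scale=2, min_size=24):
    """Yield (factor, image) pairs, halving resolution at each level."""
    factor = 1
    while min(image.shape) >= min_size:
        yield factor, image
        image = image[::scale, ::scale]   # naive downsampling for the sketch
        factor *= scale

def search(image, classify, win=24, stride=4, threshold=0.0):
    """Evaluate the classifier at all locations and scales.

    Returns detections as (row, col, size) in original-image coordinates.
    """
    hits = []
    for factor, im in pyramid(image, min_size=win):
        H, W = im.shape
        for r in range(0, H - win + 1, stride):
            for c in range(0, W - win + 1, stride):
                if classify(im[r:r + win, c:c + win]) > threshold:
                    hits.append((r * factor, c * factor, win * factor))
    return hits
```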
  • Example: screen detection Feature output
  • Example: screen detection Feature output Thresholded output Weak ‘detector’ Produces many false alarms.
  • Example: screen detection Feature output Thresholded output Strong classifier at iteration 1
  • Example: screen detection Feature output Thresholded output Strong classifier Second weak ‘detector’ Produces a different set of false alarms.
  • Example: screen detection + Feature output Thresholded output Strong classifier Strong classifier at iteration 2
  • Example: screen detection + … Feature output Thresholded output Strong classifier Strong classifier at iteration 10
  • Example: screen detection + … Feature output Thresholded output Strong classifier Adding features Final classification Strong classifier at iteration 200
  • Cascade of classifiers (Fleuret and Geman 2001; Viola and Jones 2001). We want the complexity of the 3-feature classifier with the performance of the 100-feature classifier, using stages of 3, 30, and 100 features. Select a threshold with high recall for each stage; precision increases through the cascade. (Figure: precision vs. recall, 0–100%, for each stage.)
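A minimal sketch of the rejection idea, with hypothetical per-stage score functions and thresholds (each threshold would be chosen for high recall on its stage):

```python
def cascade(stages, patch):
    """stages: list of (score_fn, threshold) pairs, cheapest first.

    Reject at the first failing stage, so most background windows
    pay only for the cheap early stages.
    """
    for score, threshold in stages:
        if score(patch) < threshold:
            return False   # early rejection: later stages never run
    return True            # survived every stage -> detection
```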
  • Some goals for object recognition
    • Able to detect and recognize many object classes
    • Computationally efficient
    • Able to deal with data starving situations:
      • Some training samples might be harder to collect than others
      • We want on-line learning to be fast
  • Multiclass object detection
  • Multiclass object detection (figures: amount of labeled data vs. number of classes; complexity vs. number of classes)
  • Shared features
    • Is learning the object class 1000 easier than learning the first?
    • Can we transfer knowledge from one object to another?
    • Are the shared properties interesting by themselves?
  • Multitask learning (R. Caruana, “Multitask Learning”, ML 1997). Primary task: detect doorknobs. Tasks used:
    • horizontal location of doorknob
    • single or double door
    • horizontal location of doorway center
    • width of doorway
    • horizontal location of left door jamb
    • horizontal location of right door jamb
    • width of left door jamb
    • width of right door jamb
    • horizontal location of left edge of door
    • horizontal location of right edge of door
  • Sharing invariances S. Thrun. Is Learning the n-th Thing Any Easier Than Learning The First? NIPS 1996 “ Knowledge is transferred between tasks via a learned model of the invariances of the domain: object recognition is invariant to rotation, translation, scaling, lighting, … These invariances are common to all object recognition tasks”. Toy world Without sharing With sharing
  • Sharing transformations
    • Miller, E., Matsakis, N., and Viola, P. (2000). Learning from one example through shared densities on transforms. In IEEE Computer Vision and Pattern Recognition.
    Transformations are shared and can be learnt from other tasks.
  • Models of object recognition I. Biederman, “Recognition-by-components: A theory of human image understanding,” Psychological Review , 1987. M. Riesenhuber and T. Poggio, “Hierarchical models of object recognition in cortex,” Nature Neuroscience 1999. T. Serre, L. Wolf and T. Poggio. “Object recognition with features inspired by visual cortex”. CVPR 2005
  • Sharing in constellation models Pictorial Structures Fischler & Elschlager, IEEE Trans. Comp. 1973 Constellation Model Burl, Liung,Perona, 1996; Weber, Welling, Perona, 2000 Fergus, Perona, & Zisserman, CVPR 2003 SVM Detectors Heisele, Poggio, et. al., NIPS 2001 Model-Guided Segmentation Mori, Ren, Efros, & Malik, CVPR 2004
  • Random initialization, then variational EM (Attias, Hinton, Beal, etc.). E-step: prior knowledge of p(θ) yields a new estimate of p(θ|train); M-step: new θ’s. (Slide from Fei-Fei Li; Fei-Fei, Fergus, & Perona, ICCV 2003.)
  • Reusable parts. Goal: look for a vocabulary of edges that reduces the number of features. Krempp, Geman, & Amit, “Sequential Learning of Reusable Parts for Object Detection”, TR 2002. (Figures: number of features vs. number of classes; examples of reused parts.)
  • Sharing patches
    • Bart and Ullman, 2004
    For a new class, use only features similar to features that were good for other classes (figure: proposed dog features).
  • Multiclass boosting
    • AdaBoost.MH (Schapire & Singer, 2000)
    • Error correcting output codes (Dietterich & Bakiri, 1995; …)
    • Lk-TreeBoost (Friedman, 2001)
    • ...
  • Shared features Screen detector Car detector Face detector
    • Independent binary classifiers:
    Torralba, Murphy, Freeman. CVPR 2004. PAMI 2007 Screen detector Car detector Face detector
    • Binary classifiers that share features:
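The sharing idea can be sketched as weak learners that each vote for a subset of classes, so the K classifiers draw on a common pool instead of K disjoint feature sets. How the subsets are learned (the joint boosting of Torralba et al.) is omitted; this only shows how shared learners are evaluated:

```python
import numpy as np

def shared_scores(x, learners, n_classes):
    """Score all classes with a pool of shared weak learners.

    learners: list of (feature_idx, theta, a, b, class_subset), where
    class_subset is the set of classes this stump votes for.
    """
    scores = np.zeros(n_classes)
    for f, theta, a, b, subset in learners:
        vote = a if x[f] > theta else b
        for k in subset:          # one evaluation feeds several classes
            scores[k] += vote
    return scores
```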
  • 50 training samples/class, 29 object classes, 2000 entries in the dictionary; results averaged over 20 runs; error bars = 80% interval. Krempp, Geman, & Amit, 2002; Torralba, Murphy, Freeman, CVPR 2004. (Curves compare shared features against class-specific features.)
  • Generalization as a function of object similarities (plots: area under ROC vs. number of training samples per class, with K = 2.1 and K = 4.8, for 12 viewpoints and 12 unrelated object classes). Torralba, Murphy, Freeman. CVPR 2004, PAMI 2007.
  • Opelt, Pinz, Zisserman, CVPR 2006 Efficiency Generalization
  • Some references on multiclass
    • Caruana 1997
    • Schapire, Singer, 2000
    • Thrun, Pratt 1997
    • Krempp, Geman, Amit, 2002
    • E.L.Miller, Matsakis, Viola, 2000
    • Mahamud, Hebert, Lafferty, 2001
    • Fink 2004
    • LeCun, Huang, Bottou, 2004
    • Holub, Welling, Perona, 2005
  • Context based methods Antonio Torralba
  • Why is this hard?
  • What are the hidden objects? 2 1
  • What are the hidden objects? Chance ~ 1/30000
  • Context-based object recognition
    • Cognitive psychology
      • Palmer 1975
      • Biederman 1981
    • Computer vision
      • Noton and Stark (1971)
      • Hanson and Riseman (1978)
      • Barrow & Tenenbaum (1978)
      • Ohta, Kanade, Sakai (1978)
      • Haralick (1983)
      • Strat and Fischler (1991)
      • Bobick and Pinhanez (1995)
      • Campbell et al (1997)
  • Global and local representations building car sidewalk Urban street scene
  • Global and local representations. Image index: summary statistics, configuration of textures (figure: urban street scene, its feature histogram, and local labels building, car, sidewalk).
  • Global scene representations Spatial structure is important in order to provide context for object localization Sivic, Russell, Freeman, Zisserman, ICCV 2005 Fei-Fei and Perona, CVPR 2005 Bosch, Zisserman, Munoz, ECCV 2006 Bag of words Spatially organized textures Non localized textons S. Lazebnik, et al, CVPR 2006 Walker, Malik. Vision Research 2004 … M. Gorkani, R. Picard, ICPR 1994 A. Oliva, A. Torralba, IJCV 2001 …
  • Contextual object relationships Carbonetto, de Freitas & Barnard (2004) Kumar, Hebert (2005) Torralba Murphy Freeman (2004) Fink & Perona (2003) E. Sudderth et al (2005)
  • Context
    • Murphy, Torralba & Freeman (NIPS 03)
    • Use global context to predict presence and location of objects
    Keyboards example (figure: graphical model with per-patch detector outputs O and features v for each class, scene evidence E, and global scene features v_g).
  • 3d Scene Context Image World [Hoiem, Efros, Hebert ICCV 2005]
  • 3D scene context: image regions labeled as support, vertical (left / center / right / porous / solid), or sky; is each region an object or a surface, and what supports it? [Hoiem, Efros, Hebert ICCV 2005]
  • Object-Object Relationships
    • Enforce spatial consistency between labels using MRF
    Carbonetto, de Freitas & Barnard (04)
  • Object-Object Relationships
    • Use latent variables to induce long distance correlations between labels in a Conditional Random Field (CRF)
    He, Zemel & Carreira-Perpinan (04)
    • Fink & Perona (NIPS 03)
    • Use output of boosting from other objects at previous iterations as input into boosting for this iteration
    Object-Object Relationships
  • CRFs Object-Object Relationships [Torralba Murphy Freeman 2004] [Kumar Hebert 2005]
  • Hierarchical Sharing and Context
    • Scenes share objects
    • Objects share parts
    • Parts share features
    E. Sudderth, A. Torralba, W. T. Freeman, and A. Wilsky.
  • Some references on context
    • Strat & Fischler (PAMI 91)
    • Torralba & Sinha (ICCV 01),
    • Torralba (IJCV 03)
    • Fink & Perona (NIPS 03)
    • Murphy, Torralba & Freeman (NIPS 03)
    • Kumar and M. Hebert (NIPS 04)
    • Carbonetto, Freitas & Barnard (ECCV 04)
    • He, Zemel & Carreira-Perpinan (CVPR 04)
    • Sudderth, Torralba, Freeman, Wilsky (ICCV 05)
    • Hoiem, Efros, Hebert (ICCV 05)
    With a mixture of generative and discriminative approaches
  • A car out of context …
  • Integrated models for scene and object recognition Banksy
  • Summary
    • There are many techniques for training discriminative models that I have not mentioned here:
    • Conditional random fields
    • Kernels for object recognition
    • Learning object similarities