Fcv learn fergus
Published in Technology, Education
  • Winder and Brown paper. Slightly smoothed view of things.
  • Note pooling is across space, not across Gabor channel
  • Non-maximal suppression across visual words (VW). Like an L-Inf normalization
  • Note pooling is across space, not across Gabor channel


  • 1. The Role of Learning in Vision
    • Schedule:
      • 3.30pm: Rob Fergus
      • 3.40pm: Andrew Ng
      • 3.50pm: Kai Yu
      • 4.00pm: Yann LeCun
      • 4.10pm: Alan Yuille
      • 4.20pm: Deva Ramanan
      • 4.30pm: Erik Learned-Miller
      • 4.40pm: Erik Sudderth
      • 4.50pm: Spotlights - Qiang Ji, M-H Yang
      • 4.55pm: Discussion
      • 5.30pm: End
    • Topics: Feature / Deep Learning; Compositional Models; Learning Representations; Overview; Low-level Representations; Learning on the fly
  • 2. An Overview of Hierarchical Feature Learning and Relations to Other Models Rob Fergus Dept. of Computer Science, Courant Institute, New York University
  • 3. Motivation
    • Multitude of hand-designed features currently in use
      • SIFT, HOG, LBP, MSER, Color-SIFT, …
    • Is there some way of learning the features instead?
    • Also, hand-designed features capture only low-level edge gradients
    Felzenszwalb, Girshick, McAllester and Ramanan, PAMI 2007
    Yan & Huang (Winner of PASCAL 2010 classification competition)
  • 4. Beyond Edges?
    • Mid-level cues
    “Tokens” from Vision by D. Marr: continuation, parallelism, junctions, corners
    • High-level object parts:
    • Difficult to hand-engineer → What about learning them?
  • 5. Deep/Feature Learning Goal
      • Build hierarchy of feature extractors (≥ 1 layers)
      • All the way from pixels → classifier
      • Homogeneous structure per layer
      • Unsupervised training
    Image/Video Pixels → Layer 1 → Layer 2 → Layer 3 → Simple Classifier
    • Numerous approaches:
      • Restricted Boltzmann Machines (Hinton, Ng, Bengio,…)
      • Sparse coding (Yu, Fergus, LeCun)
      • Auto-encoders (LeCun, Bengio)
      • ICA variants (Ng, Cottrell)
      • & many more….
  • 6. Single Layer Architecture
    Input: Image Pixels / Features → Filter → Normalize → Pool → Output: Features / Classifier
    • Details in the boxes matter (especially in a hierarchy)
    • Links to neuroscience
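The filter / normalize / pool stages named on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not the architecture of any specific paper: the hand-written edge kernels, the absolute-value rectification, the L2 normalization across feature maps, and the 2x2 max pooling are all choices made here for the example. Plain Python lists are used to keep it dependency-free; a real system would use an array library.

```python
import math

def filter_valid(image, kernel):
    """Correlate a 2-D image with a 2-D kernel ('valid' region only)."""
    kh, kw = len(kernel), len(kernel[0])
    rows = len(image) - kh + 1
    cols = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(cols)]
            for r in range(rows)]

def normalize_across_features(maps, eps=1e-8):
    """Divide each location by the L2 norm of all feature responses there."""
    rows, cols = len(maps[0]), len(maps[0][0])
    out = [[[0.0] * cols for _ in range(rows)] for _ in maps]
    for r in range(rows):
        for c in range(cols):
            norm = math.sqrt(sum(m[r][c] ** 2 for m in maps)) + eps
            for k, m in enumerate(maps):
                out[k][r][c] = m[r][c] / norm
    return out

def max_pool(fmap, size=2):
    """Non-overlapping spatial max pooling; ragged borders are dropped."""
    rows, cols = len(fmap) // size, len(fmap[0]) // size
    return [[max(fmap[r * size + i][c * size + j]
                 for i in range(size) for j in range(size))
             for c in range(cols)]
            for r in range(rows)]

def single_layer(image, kernels, pool_size=2):
    """Filter -> rectify -> normalize -> pool, one output map per kernel."""
    maps = [[[abs(v) for v in row] for row in filter_valid(image, k)]
            for k in kernels]
    maps = normalize_across_features(maps)
    return [max_pool(m, pool_size) for m in maps]

# Toy input: a 5x5 image with a vertical step edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernels = [
    [[-1, 1], [-1, 1]],   # hypothetical filter responding to vertical edges
    [[-1, -1], [1, 1]],   # hypothetical filter responding to horizontal edges
]
features = single_layer(image, kernels)
```

On this toy image only the vertical-edge map is active after pooling, and only in the pooled cells that cover the edge. Stacking `single_layer` calls (feeding the pooled maps into the next layer) would give the hierarchy the earlier slides describe.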
  • 7. Example Feature Learning Architectures
    Pixels / Features → Filter with Dictionary (patch/tiled/convolutional) + Non-linearity → Normalization between feature responses → Spatial/Feature pooling (Sum or Max) → Features
    • Normalization options: Local Contrast Normalization (Subtractive / Divisive), (Group) Sparsity, Max / Softmax
  • 8. SIFT Descriptor
    Image Pixels → Apply Gabor filters → Spatial pool (Sum) → Normalize to unit length → Feature Vector
  • 9. Spatial Pyramid Matching
    SIFT Features → Filter with Visual Words → Max → Multi-scale spatial pool (Sum) → Classifier
    Lazebnik, Schmid, Ponce [CVPR 2006]
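A minimal sketch of the spatial pyramid pipeline, read as filter / max / sum-pool: each descriptor is hard-assigned to its nearest visual word (the "max" step), and word counts are sum-pooled over grids of increasing resolution. The function names, the two-word vocabulary, and the toy descriptors are hypothetical, and the per-level weights of the pyramid match kernel are omitted for brevity.

```python
def nearest_word(descriptor, words):
    """Index of the closest visual word (hard assignment, squared L2)."""
    dists = [sum((d - w) ** 2 for d, w in zip(descriptor, word))
             for word in words]
    return dists.index(min(dists))

def spatial_pyramid(points, descriptors, words, levels=2):
    """Concatenate per-cell visual-word histograms over a grid pyramid.

    points: (x, y) pairs with coordinates in [0, 1).
    Level l splits the image into 2**l x 2**l cells.
    """
    assignments = [nearest_word(d, words) for d in descriptors]
    feature = []
    for level in range(levels):
        cells = 2 ** level
        hist = [[0] * len(words) for _ in range(cells * cells)]
        for (x, y), word in zip(points, assignments):
            cell = int(y * cells) * cells + int(x * cells)
            hist[cell][word] += 1          # sum pooling inside the cell
        for cell_hist in hist:
            feature.extend(cell_hist)
    return feature

# Toy data: two visual words, three descriptors at known image positions.
words = [[0.0, 0.0], [1.0, 1.0]]
points = [(0.1, 0.1), (0.9, 0.1), (0.2, 0.8)]
descriptors = [[0.1, 0.0], [0.9, 1.1], [0.2, 0.1]]
feature = spatial_pyramid(points, descriptors, words)
```

The resulting vector starts with the whole-image histogram and is followed by one histogram per cell of the 2x2 grid; it would then be passed to the classifier.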
  • 10. Role of Normalization
    • Lots of different mechanisms (max, sparsity, LCN etc.)
    • All induce local competition between features to explain input
      • “ Explaining away”
      • Just like top-down models
      • But more local mechanism
    Example: Convolutional Sparse Coding (filters applied by convolution, with an |.|_1 sparsity penalty on each feature map)
    Zeiler et al. [CVPR’10/ICCV’11], Kavukcuoglu et al. [NIPS’10], Yang et al. [CVPR’10]
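The "explaining away" effect of an L1 penalty can be seen on a tiny example. The sketch below uses plain (non-convolutional) sparse coding solved with ISTA, which is a simplification chosen for brevity and not the convolutional models of the cited papers; the two-atom dictionary, `lam`, `step`, and the iteration count are illustrative assumptions.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal operator of the L1 penalty)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def ista(x, atoms, lam=0.1, step=0.5, iters=300):
    """Minimise 0.5*||x - sum_k z[k]*atoms[k]||^2 + lam*||z||_1 over z."""
    z = [0.0] * len(atoms)
    for _ in range(iters):
        recon = [sum(zk * a[i] for zk, a in zip(z, atoms))
                 for i in range(len(x))]
        residual = [r - xi for r, xi in zip(recon, x)]
        grad = [sum(a[i] * residual[i] for i in range(len(x))) for a in atoms]
        z = [soft_threshold(zk - step * g, step * lam)
             for zk, g in zip(z, grad)]
    return z

# Two correlated unit-norm atoms; the input is exactly the first atom.
atoms = [[1.0, 0.0], [0.8, 0.6]]
x = [1.0, 0.0]

# Plain feed-forward filtering: both atoms respond strongly.
responses = [sum(ai * xi for ai, xi in zip(a, x)) for a in atoms]

# Sparse inference: the first atom explains the input, the second is silenced.
z = ista(x, atoms)
```

Feed-forward filtering gives responses of 1.0 and 0.8, whereas the sparse code converges to roughly 0.9 for the first atom and 0 for the second: the competition between features is local to the code, with no top-down model involved.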
  • 11. Role of Pooling
    • Spatial pooling
      • Invariance to small transformations
    Chen, Zhu, Lin, Yuille, Zhang [NIPS 2007]
    • Pooling across feature groups
      • Gives AND/OR type behavior
      • Compositional models of Zhu, Yuille
      • Larger receptive fields
    Zeiler, Taylor, Fergus [ICCV 2011]
    • Pooling with latent variables (& springs)
      • Pictorial structures models
    Felzenszwalb, Girshick, McAllester, Ramanan [PAMI 2009]
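Pooling with latent variables and springs can be read as a max over part placements, where each placement pays a deformation cost for moving away from its anchor. The brute-force sketch below assumes a quadratic spring and a small search radius; `pool_with_spring`, `weight`, and `radius` are hypothetical names for this illustration, and practical detectors compute the same quantity far more efficiently with distance transforms.

```python
def pool_with_spring(response, anchor, weight=0.5, radius=2):
    """Best part placement near `anchor`: response minus quadratic spring cost.

    Returns (score, (row, col)); the chosen displacement is the latent variable.
    """
    ar, ac = anchor
    best = None
    for r in range(max(0, ar - radius), min(len(response), ar + radius + 1)):
        for c in range(max(0, ac - radius),
                       min(len(response[0]), ac + radius + 1)):
            cost = weight * ((r - ar) ** 2 + (c - ac) ** 2)
            score = response[r][c] - cost
            if best is None or score > best[0]:
                best = (score, (r, c))
    return best

# Toy part-filter response: a weak peak at the anchor, a strong one next to it.
response = [[0.0] * 5 for _ in range(5)]
response[2][2] = 1.0
response[2][3] = 3.0

loose = pool_with_spring(response, anchor=(2, 2), weight=0.5)
stiff = pool_with_spring(response, anchor=(2, 2), weight=4.0)
```

With a loose spring the part shifts one pixel to the stronger response (score 2.5 at (2, 3)); with a stiff spring it stays at the anchor (score 1.0 at (2, 2)). Setting `weight=0` reduces this to ordinary spatial max pooling over the window.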
  • 12.  
  • 13. Object Detection with Discriminatively Trained Part-Based Models
    HOG Pyramid → Apply object part filters → Pool part responses (latent variables & springs) → Non-max Suppression (Spatial) → Score
    Felzenszwalb, Girshick, McAllester, Ramanan [PAMI 2009]
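The spatial non-max suppression stage can be sketched as a greedy procedure over scored bounding boxes. This is a generic version under stated assumptions: the intersection-over-union overlap measure and the 0.5 threshold are choices made for this example and are not necessarily the overlap criterion used in the cited paper.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(detections, threshold=0.5):
    """Greedy NMS: visit boxes by descending score, drop heavy overlaps."""
    ordered = sorted(detections, key=lambda d: d[0], reverse=True)
    kept = []
    for score, box in ordered:
        if all(iou(box, kept_box) <= threshold for _, kept_box in kept):
            kept.append((score, box))
    return kept

# Toy detections as (score, box): two overlapping boxes and one far away.
detections = [
    (0.8, (1, 1, 11, 11)),
    (0.9, (0, 0, 10, 10)),
    (0.7, (20, 20, 30, 30)),
]
kept = non_max_suppression(detections)
```

The two overlapping boxes compete and only the higher-scoring one survives, which is the same local competition between responses that the normalization slide describes, applied here across space.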