Fcv learn fergus

  • Winder and Brown paper. Slightly smoothed view of things.
  • Note pooling is across space, not across Gabor channel
  • Non-maximal suppression across VW. Like an L-Inf normalization
  • Note pooling is across space, not across Gabor channel

    1. 1. The Role of Learning in Vision
        3.30pm: Rob Fergus
        3.40pm: Andrew Ng
        3.50pm: Kai Yu
        4.00pm: Yann LeCun
        4.10pm: Alan Yuille
        4.20pm: Deva Ramanan
        4.30pm: Erik Learned-Miller
        4.40pm: Erik Sudderth
        4.50pm: Spotlights - Qiang Ji, M-H Yang
        4.55pm: Discussion
        5.30pm: End
        Session labels: Feature / Deep Learning; Compositional Models; Learning Representations; Overview; Low-level Representations; Learning on the fly
    2. 2. An Overview of Hierarchical Feature Learning and Relations to Other Models Rob Fergus Dept. of Computer Science, Courant Institute, New York University
    3. 3. Motivation
        • Multitude of hand-designed features currently in use
            • SIFT, HOG, LBP, MSER, Color-SIFT, …
        • Maybe some way of learning the features?
        • Also, these features capture only low-level edge gradients
        Felzenszwalb, Girshick, McAllester and Ramanan, PAMI 2007
        Yan & Huang (Winner of PASCAL 2010 classification competition)
    4. 4. Beyond Edges?
        • Mid-level cues
            • “Tokens” from Vision by D. Marr: Continuation, Parallelism, Junctions, Corners
        • High-level object parts
        • Difficult to hand-engineer → What about learning them?
    5. 5. Deep/Feature Learning Goal
        • Build hierarchy of feature extractors (≥ 1 layers)
        • All the way from pixels → classifier
        • Homogeneous structure per layer
        • Unsupervised training
        Image/Video Pixels → Layer 1 → Layer 2 → Layer 3 → Simple Classifier
        • Numerous approaches:
            • Restricted Boltzmann Machines (Hinton, Ng, Bengio, …)
            • Sparse coding (Yu, Fergus, LeCun)
            • Auto-encoders (LeCun, Bengio)
            • ICA variants (Ng, Cottrell)
            • & many more…
    6. 6. Single Layer Architecture
        Input: Image Pixels / Features → Filter → Normalize → Pool → Output: Features / Classifier
        • Details in the boxes matter (especially in a hierarchy)
        • Links to neuroscience
    7. 7. Example Feature Learning Architectures
        Pixels / Features
        → Filter with Dictionary (patch/tiled/convolutional) + Non-linearity
        → Normalization between feature responses
            • Local Contrast Normalization (Subtractive / Divisive)
            • (Group) Sparsity
            • Max / Softmax
        → Spatial/Feature Pooling (Sum or Max)
        → Features
    8. 8. SIFT Descriptor
        Image Pixels → Apply Gabor filters → Spatial pool (Sum) → Normalize to unit length → Feature Vector
    9. 9. Spatial Pyramid Matching
        SIFT Features → Filter with Visual Words → Max → Multi-scale spatial pool (Sum) → Classifier
        Lazebnik, Schmid, Ponce [CVPR 2006]
    10. 10. Role of Normalization
        • Lots of different mechanisms (max, sparsity, LCN etc.)
        • All induce local competition between features to explain input
            • “Explaining away”
            • Just like top-down models
            • But more local mechanism
        Example: Convolutional Sparse Coding
        Figure labels: Filters; Convolution; |.|_1 (×4)
        Zeiler et al. [CVPR’10/ICCV’11], Kavukcuoglu et al. [NIPS’10], Yang et al. [CVPR’10]
    11. 11. Role of Pooling
        • Spatial pooling
            • Invariance to small transformations
        Chen, Zhu, Lin, Yuille, Zhang [NIPS 2007]
        • Pooling across feature groups
            • Gives AND/OR type behavior
            • Compositional models of Zhu, Yuille
            • Larger receptive fields
        Zeiler, Taylor, Fergus [ICCV 2011]
        • Pooling with latent variables (& springs)
            • Pictorial structures models
        Felzenszwalb, Girshick, McAllester, Ramanan [PAMI 2009]
    12. 13. Object Detection with Discriminatively Trained Part-Based Models
        HOG Pyramid → Apply object part filters → Pool part responses (latent variables & springs) → Non-max Suppression (Spatial) → Score
        Felzenszwalb, Girshick, McAllester, Ramanan [PAMI 2009]
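
The filter, pool, normalize pattern that the Single Layer Architecture and SIFT Descriptor slides describe can be sketched in a few lines of plain Python. This is only an illustrative sketch, not code from the talk. The function names (`filter2d`, `rectify`, `pool`, `single_layer`), the two 2×2 edge kernels standing in for a Gabor filter bank, the absolute-value non-linearity, and the toy image are all assumptions made for the example. It follows the SIFT ordering (filter, spatial sum pool, normalize to unit length), and pooling runs across space within each filter channel, not across channels.

```python
import math


def filter2d(image, kernel):
    """Valid-mode 2D cross-correlation of a list-of-lists image with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def rectify(fmap):
    """Absolute-value non-linearity (an assumed choice for this sketch)."""
    return [[abs(v) for v in row] for row in fmap]


def pool(fmap, size, op=sum):
    """Non-overlapping spatial pooling over size x size blocks; op is sum or max."""
    rows = range(0, len(fmap) - size + 1, size)
    cols = range(0, len(fmap[0]) - size + 1, size)
    return [
        [op(fmap[r + i][c + j] for i in range(size) for j in range(size))
         for c in cols]
        for r in rows
    ]


def normalize_unit_length(vec, eps=1e-12):
    """Scale a feature vector to (approximately) unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]


def single_layer(image, filters, pool_size=2, op=sum):
    """Filter -> rectify -> spatial pool per channel -> concatenate -> normalize."""
    features = []
    for kernel in filters:
        pooled = pool(rectify(filter2d(image, kernel)), pool_size, op)
        features.extend(v for row in pooled for v in row)
    return normalize_unit_length(features)


# Hypothetical stand-ins for an oriented filter bank (not real Gabor filters).
HORIZONTAL_EDGE = [[1, 1], [-1, -1]]
VERTICAL_EDGE = [[1, -1], [1, -1]]

# Toy 5x5 image containing a single vertical step edge.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

features = single_layer(image, [HORIZONTAL_EDGE, VERTICAL_EDGE])
print(features)
```

Passing `op=max` to `single_layer` swaps sum pooling for max pooling, the other pooling option listed on the Example Feature Learning Architectures slide. Stacking several such layers, each consuming the previous layer's output, would give the kind of hierarchy the Deep/Feature Learning Goal slide describes.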
