Transcript of "Fcv learn yu"

  1. Sparse Coding and Its Extensions for Visual Recognition. Kai Yu, Media Analytics Department, NEC Labs America, Cupertino, CA
  2. Visual Recognition is HOT in Computer Vision: Caltech 101, PASCAL VOC, 80 Million Tiny Images, ImageNet
  3. The pipeline of machine visual perception: low-level sensing → pre-processing → feature extraction → feature selection → inference (prediction, recognition). Most efforts in machine learning target the inference stage, while the feature stages are most critical for accuracy, account for most of the computation, are the most time-consuming in the development cycle, and are often hand-crafted in practice.
  4. Computer vision features: SIFT, Spin image, HoG, RIFT, GLOH. (Slide credit: Andrew Ng)
  5. Learning everything from data: machine learning spans the whole pipeline, from low-level sensing, pre-processing, feature extraction, and feature selection through to inference (prediction, recognition).
  6. BoW + SPM kernel: a bag-of-visual-words representation (BoW) based on vector quantization (VQ), combined with a spatial pyramid matching (SPM) kernel. Combining multiple features, this method had been the state of the art on Caltech-101, PASCAL, 15 Scene Categories, and more. (Figure credit: Fei-Fei Li, Svetlana Lazebnik)
  7. Winning method in PASCAL VOC before 2009: multiple feature sampling methods → multiple visual descriptors → VQ coding, histogram, SPM → nonlinear SVM.
  8. Convolutional Neural Networks: conv. filtering → pooling → conv. filtering → pooling. The architectures of some successful methods are not so different from CNNs.
  9. BoW+SPM: the same architecture. Local gradients (e.g., SIFT, HOG) → VQ coding → average pooling (obtain histogram) → nonlinear SVM. Observations: nonlinear SVM is not scalable; VQ coding may be too coarse; average pooling is not optimal. Why not learn the whole thing?
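For reference, the VQ coding + average pooling (histogram) step that sparse coding later replaces can be sketched as below; the codebook and descriptors are random placeholders, and the spatial pyramid and the nonlinear SVM are omitted:

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """VQ-code each local descriptor and average-pool into a histogram."""
    # VQ coding: assign each descriptor to its nearest visual word.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    # Average pooling: a normalized histogram of word counts.
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

codebook = np.random.randn(200, 128)        # k-means visual words (placeholder)
descriptors = np.random.randn(500, 128)     # SIFT-like descriptors of one image
print(bow_histogram(descriptors, codebook).shape)   # (200,)
```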
  10. Develop better methods: better coding → better pooling → scalable linear classifier.
  11. Sparse coding (Olshausen & Field, 1996), originally developed to explain early visual processing in the brain (edge detection). Training: given a set of random patches x, learn a dictionary of bases [Φ1, Φ2, …]. Coding: for a data vector x, solve the LASSO to find the sparse coefficient vector a.
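For concreteness, here is a minimal sketch of those two steps using scikit-learn's DictionaryLearning and sparse_encode; the random patches, dictionary size, and sparsity penalties are illustrative placeholders, not the settings used in the talk.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning, sparse_encode

# Stand-in for a set of random image patches (one 8x8 patch per row).
X = np.random.randn(2000, 64)

# Training: learn a dictionary of bases [phi_1, phi_2, ...].
dico = DictionaryLearning(n_components=128, alpha=1.0, max_iter=20)
bases = dico.fit(X).components_          # shape (128, 64)

# Coding: for a data vector x, solve the LASSO for its sparse coefficients a.
x = X[:1]
a = sparse_encode(x, bases, algorithm='lasso_lars', alpha=0.1)
print(a.shape, np.count_nonzero(a))      # high-dimensional, mostly zeros
```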
  12. Sparse coding example. From natural images, the learned bases (Φ1, …, Φ64) look like "edges". A test example decomposes as x ≈ 0.8 * Φ36 + 0.3 * Φ42 + 0.5 * Φ63, i.e. [a1, …, a64] = [0, 0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, 0] (the feature representation): compact and easily interpretable. (Slide credit: Andrew Ng)
  13. Self-taught learning [Raina, Lee, Battle, Packer & Ng, ICML 07]: learn features from unlabeled images, then use them to classify labeled examples (motorcycles vs. not motorcycles); testing: "What is this?" (Slide credit: Andrew Ng)
  14. Classification results on Caltech 101 (9K images, 101 classes): 64% with SIFT + VQ + nonlinear SVM vs. 50% with pixel-level sparse coding + linear SVM.
  15. Sparse coding on SIFT [Yang, Yu, Gong & Huang, CVPR09]: local gradients (e.g., SIFT, HOG) → sparse coding → max pooling → scalable linear classifier.
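A rough sketch of this pipeline under simplifying assumptions: each image is a bag of local descriptors, each descriptor is sparse-coded against a pre-learned dictionary, the codes are max-pooled over the image (the CVPR09 method pools within spatial pyramid cells, omitted here), and a linear SVM is trained on the pooled vectors. The dictionary, descriptors, and labels below are random placeholders.

```python
import numpy as np
from sklearn.decomposition import sparse_encode
from sklearn.svm import LinearSVC

def image_feature(descriptors, bases, alpha=0.15):
    """Sparse-code each local descriptor and max-pool the codes over the image."""
    codes = sparse_encode(descriptors, bases, algorithm='lasso_lars', alpha=alpha)
    return np.abs(codes).max(axis=0)          # max pooling -> one vector per image

# Placeholder dictionary over SIFT-like descriptors (256 atoms of dim 128).
bases = np.random.randn(256, 128)
bases /= np.linalg.norm(bases, axis=1, keepdims=True)

# Toy "images": each a bag of 50 descriptors, with binary labels.
images = [np.random.randn(50, 128) for _ in range(20)]
labels = np.arange(20) % 2
features = np.vstack([image_feature(d, bases) for d in images])

clf = LinearSVC(C=1.0).fit(features, labels)  # scalable linear classifier
```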
  16. Sparse coding on SIFT [Yang, Yu, Gong & Huang, CVPR09], Caltech-101: 73% with SIFT + sparse coding + linear SVM vs. 64% with SIFT + VQ + nonlinear SVM.
  17. What have we learned? Sparse coding is useful (why?), and a hierarchical architecture is needed: local gradients (e.g., SIFT, HOG) → sparse coding → max pooling → scalable linear classifier.
  18. MNIST experiments (reported errors: 4.54%, 3.75%, 2.64%). When SC achieves the best classification accuracy, the learned bases look like digits: each basis has a clear local class association.
  19. Distribution of coefficients (SIFT, Caltech101): neighboring bases tend to get nonzero coefficients.
  20. Two interpretations of sparse coding. Interpretation 1: discover subspaces; each basis is a "direction"; sparsity: each datum is a linear combination of only several bases; related to topic models. Interpretation 2: geometry of the data manifold; each basis is an "anchor point"; sparsity is induced by locality: each datum is a linear combination of neighboring anchors.
  21. A function approximation view of coding. Setting: f(x) is a nonlinear feature extraction function on image patches x. Coding: a nonlinear mapping x → a, where a is typically high-dimensional and sparse. Nonlinear learning: f(x) = <w, a>. A coding scheme is good if it helps learn f(x).
  22. A function approximation view of coding, the general formulation: the function approximation error is upper-bounded by an unsupervised learning objective.
  23. Local Coordinate Coding (LCC) [Yu, Zhang & Gong, NIPS 09; Wang, Yang, Yu, Lv, Huang, CVPR 10]. Dictionary learning: k-means (or hierarchical k-means). Coding for x, to obtain its sparse representation a: step 1, ensure locality by finding the K nearest bases; step 2, ensure low coding error.
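As a rough illustration of those two steps, here is a numpy sketch in the spirit of the locality-constrained coding of Wang et al. (CVPR 10): keep only the K nearest bases, then solve a small regularized least-squares problem whose coefficients sum to one. K, the regularizer, and the random data are illustrative choices, not the paper's settings.

```python
import numpy as np

def local_coordinate_code(x, bases, K=5, reg=1e-4):
    """Code x with only its K nearest bases: locality induces sparsity."""
    # Step 1: ensure locality -- pick the K nearest bases.
    idx = np.argsort(np.linalg.norm(bases - x, axis=1))[:K]
    B = bases[idx]                              # (K, d)

    # Step 2: ensure low coding error -- least squares on the shifted bases,
    # with the coefficients constrained to sum to one.
    Z = B - x
    G = Z @ Z.T
    c = np.linalg.solve(G + reg * np.trace(G) * np.eye(K), np.ones(K))
    c /= c.sum()

    a = np.zeros(len(bases))                    # full sparse code
    a[idx] = c
    return a

bases = np.random.randn(256, 128)               # k-means dictionary (placeholder)
x = np.random.randn(128)
print(np.count_nonzero(local_coordinate_code(x, bases)))   # at most K nonzeros
```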
  24. Super-Vector Coding (SVC) [Zhou, Yu, Zhang, and Huang, ECCV 10]. Dictionary learning: k-means (or hierarchical k-means). Coding for x, to obtain its sparse representation a: step 1, find the nearest basis of x and obtain its VQ coding (zero order), e.g. [0, 0, 1, 0, …]; step 2, form the super-vector coding (local tangent), e.g. [0, 0, 1, 0, …, 0, 0, (x − m3), 0, …].
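A minimal sketch of this coding step, assuming a single descriptor and ignoring the per-image aggregation and any weighting of the zero-order part; the constant s below is a hypothetical scale factor:

```python
import numpy as np

def super_vector_code(x, centers, s=1.0):
    """VQ assignment plus the displacement from the chosen center."""
    k = int(np.argmin(np.linalg.norm(centers - x, axis=1)))   # Step 1: nearest center
    d = x.shape[0]
    code = np.zeros(centers.shape[0] * (d + 1))
    code[k * (d + 1)] = s                                     # zero-order part: VQ indicator
    code[k * (d + 1) + 1:(k + 1) * (d + 1)] = x - centers[k]  # local tangent: (x - m_k)
    return code

centers = np.random.randn(16, 128)               # k-means centers (placeholder)
x = np.random.randn(128)
print(super_vector_code(x, centers).shape)       # (16 * 129,), sparse by construction
```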
  25. Function approximation based on LCC [Yu, Zhang & Gong, NIPS 09]: a locally linear approximation of the target function over the data points, anchored at the bases.
  26. Function approximation based on SVC [Zhou, Yu, Zhang, and Huang, ECCV 10]: a piecewise locally linear (first-order) approximation over the data points, using the cluster centers and local tangents.
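Written out roughly, as in the LCC and SVC papers, the two approximations behind these pictures are:

```latex
% LCC: locally linear approximation by anchor points v, with weights
% gamma_v(x) that are nonzero only for anchors near x and sum to one.
f(x) \;\approx\; \sum_{v} \gamma_v(x)\, f(v), \qquad \sum_{v} \gamma_v(x) = 1.

% SVC: piecewise first-order (local tangent) expansion around the nearest
% cluster center m_{k(x)}; the linear weights learned on the super vector
% play the roles of f(m_k) and the gradient of f at m_k.
f(x) \;\approx\; f\big(m_{k(x)}\big)
      + \nabla f\big(m_{k(x)}\big)^{\top}\big(x - m_{k(x)}\big).
```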
  27. PASCAL VOC Challenge 2009: No. 1 for 18 of 20 categories, using only the HOG feature on gray images (table: ours vs. best of other teams, with the difference per class).
  28. ImageNet Challenge 2010 (1.4 million images, 1000 classes, top-5 hit rate): ~40% for VQ + intersection kernel vs. 64%–73% for various coding methods + linear SVM.
  29. Hierarchical sparse coding [Yu, Lin, & Lafferty, CVPR 11]: conv. filtering → pooling → conv. filtering → pooling, learning from unlabeled data.
  30. A two-layer sparse coding formulation.
  31. MNIST results, classification. HSC vs. CNN: HSC provides even better performance than CNN; more remarkably, HSC learns its features in an unsupervised manner.
  32. MNIST results, effect of hierarchical learning. Comparing the Fisher scores of HSC and SC: discriminative power is significantly improved by HSC, even though HSC is an unsupervised coding method.
  33. MNIST results, learned codebook. One dimension in the second layer shows invariance to translation, rotation, and deformation.
  34. Caltech101 results, classification. The learned descriptor performs slightly better than SIFT + SC.
  35. Caltech101 results, learned codebook. First-layer bases look very much like edge detectors.
  36. Conclusion and future work. A "function approximation" view can be used to derive novel sparse coding methods. Locality is one way to achieve sparsity, and it is really useful, but we need a deeper understanding of feature learning methods. Interesting directions: hierarchical coding and deep learning (many papers now!); faster methods for sparse coding (e.g. from LeCun's group); learning features from richer data structures, e.g. video (learning invariance to out-of-plane rotation).
  37. References:
  - Learning Image Representations from Pixel Level via Hierarchical Sparse Coding. Kai Yu, Yuanqing Lin, John Lafferty. CVPR 2011.
  - Large-scale Image Classification: Fast Feature Extraction and SVM Training. Yuanqing Lin, Fengjun Lv, Liangliang Cao, Shenghuo Zhu, Ming Yang, Timothee Cour, Thomas Huang, Kai Yu. CVPR 2011.
  - ECCV 2010 Tutorial. Kai Yu, Andrew Ng (with links to some source codes).
  - Deep Coding Networks. Yuanqing Lin, Tong Zhang, Shenghuo Zhu, Kai Yu. NIPS 2010.
  - Image Classification using Super-Vector Coding of Local Image Descriptors. Xi Zhou, Kai Yu, Tong Zhang, Thomas Huang. ECCV 2010.
  - Efficient Highly Over-Complete Sparse Coding using a Mixture Model. Jianchao Yang, Kai Yu, Thomas Huang. ECCV 2010.
  - Improved Local Coordinate Coding using Local Tangents. Kai Yu, Tong Zhang. ICML 2010.
  - Supervised Translation-Invariant Sparse Coding. Jianchao Yang, Kai Yu, Thomas Huang. CVPR 2010.
  - Learning Locality-Constrained Linear Coding for Image Classification. Jinjun Wang, Jianchao Yang, Kai Yu, Fengjun Lv, Thomas Huang. CVPR 2010.
  - Nonlinear Learning using Local Coordinate Coding. Kai Yu, Tong Zhang, Yihong Gong. NIPS 2009.
  - Linear Spatial Pyramid Matching using Sparse Coding for Image Classification. Jianchao Yang, Kai Yu, Yihong Gong, Thomas Huang. CVPR 2009.