ECCV2010: feature learning for image classification, part 4

Speaker notes:
  • Sometimes, the most data wins. So how do we get more data? Even with Amazon Mechanical Turk, labeling is often slow and expensive.
  • End: one of the challenges is scaling up. Most people work with inputs from 14x14 up to 32x32 pixels.
  • Time-invariant features.
  • Visual bases: look at them and see whether they make sense / correspond to Gabor filters. Try to perform a similar analysis on the audio bases.
  • Aglioti et al., 1994; Halligan et al., 1993; Halligan et al., 1999; Weinstein, 1969; Ramachandran, 1998; Sadato et al., 1996
  • http://www.cbsnews.com/stories/2000/06/29/tech/main210684.shtml: 12.3 Tflops, $110 million, used to simulate nuclear weapon testing. That is like 13 graphics cards costing $250 each; 40 people with US$250 graphics cards would have been #18 on the top supercomputers list two years back. http://www.top500.org/list/2006/11/100

    1. Advanced topics
    2. Outline: Self-taught learning; Learning feature hierarchies (deep learning); Scaling up
    3. Self-taught learning
    4. Supervised learning: Cars, Motorcycles. Testing: What is this?
    5. Semi-supervised learning. Unlabeled images (all cars/motorcycles). Testing: What is this? Car / Motorcycle
    6. Self-taught learning. Unlabeled images (random internet images). Testing: What is this? Car / Motorcycle
    7. Self-taught learning. Sparse coding, LCC, etc. learn bases b1, …, bk from the unlabeled data. Use the learned b1, …, bk to represent the training/test sets: applying b1, …, bk to an image gives activations a1, …, ak. If the labeled training set is small, this can give a huge performance boost. Car / Motorcycle
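
        Below is a minimal sketch (not from the tutorial) of the self-taught learning pipeline, assuming scikit-learn's MiniBatchDictionaryLearning as a stand-in for the sparse coding / LCC step; the data shapes, component count, and regularization values are made-up placeholders.

            # Minimal self-taught learning sketch: learn a sparse-coding dictionary on
            # unlabeled data, re-represent the small labeled set with the learned
            # activations, then train an ordinary classifier on that representation.
            import numpy as np
            from sklearn.decomposition import MiniBatchDictionaryLearning
            from sklearn.linear_model import LogisticRegression

            rng = np.random.default_rng(0)
            X_unlabeled = rng.standard_normal((10000, 196))  # e.g. 14x14 patches from random internet images
            X_train = rng.standard_normal((200, 196))        # small labeled set (placeholder data)
            y_train = rng.integers(0, 2, 200)                # car vs. motorcycle labels (placeholder)

            # Learn bases b1..bk from the unlabeled data (sparse-coding stand-in).
            coder = MiniBatchDictionaryLearning(n_components=128, alpha=1.0,
                                                transform_algorithm="lasso_lars")
            coder.fit(X_unlabeled)

            # Represent the labeled data by its sparse activations a1..ak and train a classifier.
            A_train = coder.transform(X_train)
            clf = LogisticRegression(max_iter=1000).fit(A_train, y_train)
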
    8. Learning feature hierarchies / deep learning
    9. Why feature hierarchies: pixels → edges → object parts (combinations of edges) → object models
    10. Deep learning algorithms: Stacked sparse coding; Deep Belief Networks (DBN) (Hinton); Deep sparse autoencoders (Bengio). [Other related work: LeCun, Lee, Yuille, Ng, …]
    11. Deep learning with autoencoders: Logistic regression; Neural network; Sparse autoencoder; Deep autoencoder
    12. Logistic regression. Logistic regression has a learned parameter vector θ. On input x, it outputs hθ(x) = 1 / (1 + exp(−θᵀx)). Draw a logistic regression unit as a single node with inputs x1, x2, x3 and a +1 bias input.
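
        A minimal sketch of the logistic regression unit drawn above, assuming the parameter vector θ includes the weight for the +1 bias input; the example values are made up.

            # A single logistic regression unit with inputs x1..x3 plus a +1 bias term.
            import numpy as np

            def logistic_unit(x, theta):
                """Output h_theta(x) = 1 / (1 + exp(-theta^T [x; 1]))."""
                x_with_bias = np.append(x, 1.0)          # the "+1" intercept input
                return 1.0 / (1.0 + np.exp(-theta @ x_with_bias))

            print(logistic_unit(np.array([0.5, -1.0, 2.0]), np.zeros(4)))  # -> 0.5 for zero weights
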
    13. Neural Network. String a lot of logistic units together. Example 3-layer network: inputs x1, x2, x3, +1 (Layer 1); hidden units a1, a2, a3, +1 (Layer 2); output (Layer 3).
    14. Neural Network. Example 4-layer network with 2 output units: inputs x1, x2, x3, +1 (Layer 1); two hidden layers, each with a +1 bias unit (Layers 2 and 3); two output units (Layer 4).
    15. Neural Network example [Courtesy of Yann LeCun]
    16. Training a neural network. Given a training set (x1, y1), (x2, y2), (x3, y3), …, adjust the parameters θ (for every node) to make the network's outputs h(xi) match the labels yi. (Use gradient descent: the “backpropagation” algorithm. Susceptible to local optima.)
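
        A minimal backpropagation sketch for a small network like the one above, assuming a squared-error objective and plain batch gradient descent; the layer sizes, learning rate, and toy data are made up, not from the tutorial.

            # Train a 3-layer network (3 inputs -> 4 hidden -> 1 output) by backpropagation.
            import numpy as np

            rng = np.random.default_rng(0)
            X = rng.standard_normal((100, 3))             # 100 examples, 3 inputs
            Y = (X.sum(axis=1, keepdims=True) > 0) * 1.0  # toy targets

            sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
            W1, b1 = 0.1 * rng.standard_normal((3, 4)), np.zeros(4)   # Layer 1 -> Layer 2
            W2, b2 = 0.1 * rng.standard_normal((4, 1)), np.zeros(1)   # Layer 2 -> Layer 3
            lr = 0.5

            for _ in range(2000):
                # Forward pass.
                A1 = sigmoid(X @ W1 + b1)
                H = sigmoid(A1 @ W2 + b2)
                # Backward pass for the squared-error objective 0.5 * sum (h(x) - y)^2.
                dH = (H - Y) * H * (1 - H)
                dA1 = (dH @ W2.T) * A1 * (1 - A1)
                W2 -= lr * A1.T @ dH / len(X); b2 -= lr * dH.mean(axis=0)
                W1 -= lr * X.T @ dA1 / len(X); b1 -= lr * dA1.mean(axis=0)
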
    17. Unsupervised feature learning with a neural network: the autoencoder. The network is trained to output its input (i.e., to learn the identity function). This has trivial solutions unless we either constrain the number of units in Layer 2 (learn a compressed representation), or constrain Layer 2 to be sparse. [Diagram: inputs x1…x6 (Layer 1), hidden units a1, a2, a3 (Layer 2), reconstructed outputs x1…x6 (Layer 3), with +1 bias units.]
    18. Unsupervised feature learning with a neural network. Training a sparse autoencoder: given an unlabeled training set x1, x2, …, minimize a reconstruction error term plus an L1 sparsity term on the hidden activations a1, a2, a3.
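
        A minimal sketch of the objective on this slide (reconstruction error term plus an L1 sparsity term on the hidden activations), assuming a sigmoid encoder and a linear decoder; the weights, shapes, and λ value are made-up placeholders.

            # Sparse autoencoder objective: reconstruction error + L1 sparsity on activations.
            import numpy as np

            def sparse_autoencoder_objective(X, W1, b1, W2, b2, lam=0.1):
                """0.5 * sum_i ||xhat_i - x_i||^2 + lam * sum_i sum_j |a_j(x_i)|."""
                A = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))    # hidden activations a1..ak
                Xhat = A @ W2 + b2                          # linear reconstruction of the input
                reconstruction_error = 0.5 * np.sum((Xhat - X) ** 2)
                l1_sparsity = lam * np.sum(np.abs(A))
                return reconstruction_error + l1_sparsity

            rng = np.random.default_rng(0)
            X = rng.standard_normal((100, 6))               # x1..x6 as on the slide
            W1, b1 = 0.1 * rng.standard_normal((6, 3)), np.zeros(3)
            W2, b2 = 0.1 * rng.standard_normal((3, 6)), np.zeros(6)
            print(sparse_autoencoder_objective(X, W1, b1, W2, b2))
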
    19. Unsupervised feature learning with a neural network. [Autoencoder diagram: inputs x1…x6 (Layer 1), hidden units a1, a2, a3 (Layer 2), reconstructions x1…x6 (Layer 3).]
    20. Unsupervised feature learning with a neural network. The hidden activations a1, a2, a3 are a new representation for the input. [Diagram: encoder part only, inputs x1…x6 to a1, a2, a3.]
    21. Unsupervised feature learning with a neural network. [Diagram: encoder part only, inputs x1…x6 to a1, a2, a3.]
    22-24. Unsupervised feature learning with a neural network. Now train a second sparse autoencoder on the a-layer: train its parameters so that the units b1, b2, b3 reconstruct a1, a2, a3, subject to the bi's being sparse. [Diagram: inputs x1…x6, first feature layer a1, a2, a3, second feature layer b1, b2, b3, with +1 bias units; slides 22-24 are animation steps of the same figure.]
    25. Unsupervised feature learning with a neural network. The activations b1, b2, b3 are a new representation for the input.
    26. Unsupervised feature learning with a neural network. [Diagram: inputs x1…x6 with feature layers a1, a2, a3 and b1, b2, b3.]
    27. Unsupervised feature learning with a neural network. Train a third sparse autoencoder in the same way to get units c1, c2, c3 on top of b1, b2, b3.
    28. Unsupervised feature learning with a neural network. The activations c1, c2, c3 are a new representation for the input. Use [c1, c2, c3] as the representation to feed to a learning algorithm.
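
        A minimal sketch of the greedy layer-wise procedure of slides 17-28: train one sparse autoencoder, fix its encoder, compute the activations, and train the next layer on them. The trainer here (sigmoid encoder, linear decoder, plain gradient descent) and all sizes are simplifying assumptions, not the tutorial's exact setup.

            # Greedy layer-wise stacking of sparse autoencoders.
            import numpy as np

            sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

            def fit_sparse_layer(X, k, lam=0.01, lr=0.1, steps=500, seed=0):
                """Train one sparse autoencoder layer on X; return its encoder weights."""
                rng = np.random.default_rng(seed)
                n, d = X.shape
                W1, b1 = 0.1 * rng.standard_normal((d, k)), np.zeros(k)
                W2, b2 = 0.1 * rng.standard_normal((k, d)), np.zeros(d)
                for _ in range(steps):
                    A = sigmoid(X @ W1 + b1)
                    Xhat = A @ W2 + b2
                    dXhat = (Xhat - X) / n
                    dA = dXhat @ W2.T + (lam / n) * np.sign(A)   # L1 sparsity subgradient
                    dZ = dA * A * (1 - A)
                    W2 -= lr * A.T @ dXhat; b2 -= lr * dXhat.sum(axis=0)
                    W1 -= lr * X.T @ dZ;    b1 -= lr * dZ.sum(axis=0)
                return W1, b1

            X = np.random.default_rng(1).standard_normal((200, 6))  # raw inputs x1..x6
            layers, H = [], X
            for k in (3, 3, 3):                       # a1..a3, b1..b3, c1..c3
                W, b = fit_sparse_layer(H, k)
                layers.append((W, b))
                H = sigmoid(H @ W + b)                # feed activations to the next layer
            # H now holds [c1, c2, c3]; feed it to a supervised learning algorithm.
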
    29. Deep Belief Net. The Deep Belief Net (DBN) is another algorithm for learning a feature hierarchy. Building block: a 2-layer graphical model (the Restricted Boltzmann Machine). Can then learn additional layers one at a time.
    30. Restricted Boltzmann machine (RBM). Input [x1, x2, x3, x4]; Layer 2: [a1, a2, a3] (binary-valued). An MRF with joint distribution P(x, a) ∝ exp(xᵀWa). Use Gibbs sampling for inference. Given observed inputs x, we want the maximum likelihood estimate of W, i.e., maximize log P(x).
    31. Restricted Boltzmann machine (RBM). Input [x1, x2, x3, x4]; Layer 2: [a1, a2, a3] (binary-valued). Gradient ascent on log P(x): ∂ log P(x) / ∂Wij = [xi aj]_obs − [xi aj]_prior, where [xi aj]_obs comes from fixing x to its observed value and sampling a from P(a|x), and [xi aj]_prior comes from running Gibbs sampling to convergence. Adding a sparsity constraint on the ai's usually improves results.
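
        A minimal sketch of the gradient update on this slide, with the common simplification of approximating the [xi aj]_prior term by a single Gibbs step (contrastive divergence) rather than running the chain to convergence; biases and the sparsity term are omitted, and the data and sizes are made up.

            # RBM training by gradient ascent on log P(x), approximated with CD-1.
            import numpy as np

            rng = np.random.default_rng(0)
            sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

            X = (rng.random((500, 4)) < 0.5).astype(float)   # binary inputs [x1..x4]
            W = 0.01 * rng.standard_normal((4, 3))           # weights between x and [a1..a3]
            lr = 0.1

            for _ in range(100):
                # [x_i a_j]_obs : fix x to the data, sample a ~ P(a|x).
                pa_obs = sigmoid(X @ W)
                a_obs = (rng.random(pa_obs.shape) < pa_obs).astype(float)
                # [x_i a_j]_prior : approximate the model term with one Gibbs step.
                px = sigmoid(a_obs @ W.T)
                x_model = (rng.random(px.shape) < px).astype(float)
                pa_model = sigmoid(x_model @ W)
                # Gradient ascent: dW_ij ~ [x_i a_j]_obs - [x_i a_j]_prior.
                W += lr * (X.T @ pa_obs - x_model.T @ pa_model) / len(X)
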
    32. Deep Belief Network. Similar to a sparse autoencoder in many ways. Stack RBMs on top of each other to get a DBN: Input [x1, x2, x3, x4], Layer 2 [a1, a2, a3], Layer 3 [b1, b2, b3]. Train each layer with approximate maximum likelihood (often with a sparsity constraint on the ai's).
    33. Deep Belief Network. [Diagram: Input [x1, x2, x3, x4], Layer 2 [a1, a2, a3], Layer 3 [b1, b2, b3], Layer 4 [c1, c2, c3].]
    34. Deep learning examples
    35. Convolutional DBN for audio: spectrogram input, detection units, max-pooling unit.
    36. Convolutional DBN for audio. [Figure: spectrogram.]
    37. Probabilistic max pooling. Convolutional neural net: the pooling unit computes max{x1, x2, x3, x4}, where the xi are real numbers. Convolutional DBN: the xi are {0, 1} and mutually exclusive, so there are only 5 possible cases (all xi off, or exactly one xi on). This collapses 2^n configurations into n+1 configurations, and permits both bottom-up and top-down inference.
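
        A minimal sketch of probabilistic max pooling for a single 2x2 block, assuming the n+1 mutually exclusive configurations are sampled with a softmax over the bottom-up inputs (with zero energy for the all-off state); this illustrates the idea rather than the exact CDBN inference equations.

            # Sample one pooling block: 4 detection units + "all off" = 5 configurations.
            import numpy as np

            def prob_max_pool_block(bottom_up, rng):
                """bottom_up: length-4 array of inputs to x1..x4. Returns (x, pooling unit)."""
                energies = np.append(bottom_up, 0.0)          # last entry = all detection units off
                p = np.exp(energies - energies.max())
                p /= p.sum()                                  # softmax over the 5 configurations
                choice = rng.choice(5, p=p)
                x = np.zeros(4)
                if choice < 4:
                    x[choice] = 1.0                           # at most one detection unit is on
                return x, float(x.max())                      # pooling unit = max{x1..x4}

            rng = np.random.default_rng(0)
            print(prob_max_pool_block(np.array([0.2, 1.5, -0.3, 0.1]), rng))
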
    38. Convolutional DBN for audio. [Figure: spectrogram.]
    39. Convolutional DBN for audio. One CDBN layer (detection units, max pooling), followed by a second CDBN layer (detection units, max pooling).
    40. CDBNs for speech. Learned first-layer bases.
    41. Convolutional DBN for images. Input data V: visible nodes (binary or real). Detection layer H: hidden nodes (binary), computed with shared “filter” weights W^k. Max-pooling layer P: “max-pooling” nodes (binary); within each pooling block, at most one hidden node is active.
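
        A minimal sketch of one convolutional layer with shared filter weights W^k and non-overlapping pooling blocks, assuming SciPy's correlate2d for the convolution and an ordinary deterministic max in place of the probabilistic pooling of slide 37; the filter count, filter size, and block size C are made up.

            # Shared-filter detection layer H and block max-pooling layer P for a CRBM-style model.
            import numpy as np
            from scipy.signal import correlate2d

            rng = np.random.default_rng(0)
            V = rng.standard_normal((16, 16))                 # visible layer (real-valued input)
            filters = 0.1 * rng.standard_normal((4, 3, 3))    # K = 4 shared filters W^k
            C = 2                                             # pooling block size

            H, P = [], []
            for Wk in filters:
                h = 1.0 / (1.0 + np.exp(-correlate2d(V, Wk, mode="valid")))  # detection layer H^k
                h = h[: h.shape[0] // C * C, : h.shape[1] // C * C]          # crop to a multiple of C
                blocks = h.reshape(h.shape[0] // C, C, h.shape[1] // C, C)
                P.append(blocks.max(axis=(1, 3)))             # each pooling node covers one CxC block
                H.append(h)
            print(H[0].shape, P[0].shape)                     # (14, 14) (7, 7)
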
    42. Convolutional DBN on face images: pixels → edges → object parts (combinations of edges) → object models. Note: sparsity is important for these results.
    43. Learning of object parts. Examples of learned object parts from several object categories: faces, cars, elephants, chairs.
    44. Training on multiple objects. Trained on 4 classes (cars, faces, motorbikes, airplanes). Second layer: shared features and object-specific features. Third layer: more specific features. [Figures: plot of H(class | neuron active); second- and third-layer bases learned from the 4 object categories.]
    45. Hierarchical probabilistic inference. Generating posterior samples from faces by “filling in” experiments (cf. Lee and Mumford, 2003): combine bottom-up and top-down inference. [Figure: input images, samples from feedforward inference (control), and samples from full posterior inference.]
    46. Key issue in feature learning: scaling up
    47. Scaling up with graphics processors. [Plot: peak GFlops over 2003-2008 for NVIDIA GPUs (roughly US$250) versus Intel CPUs. Source: NVIDIA CUDA Programming Guide.]
    48. Scaling up with GPUs. Approximate number of parameters (millions) that can be learned, using GPUs (Raina et al., 2009).
    49. Unsupervised feature learning: Does it work?
    50. State-of-the-art task performance.
        Audio, TIMIT phone classification: prior art (Clarkson et al., 1999) 79.6%; Stanford feature learning 80.3%.
        Audio, TIMIT speaker identification: prior art (Reynolds, 1995) 99.7%; Stanford feature learning 100.0%.
        Images, CIFAR object classification: prior art (Yu and Zhang, 2010) 74.5%; Stanford feature learning 75.5%.
        Images, NORB object classification: prior art (Ranzato et al., 2009) 94.4%; Stanford feature learning 96.2%.
        Multimodal (audio/video), AVLetters lip reading: prior art (Zhao et al., 2009) 58.9%; Stanford feature learning 63.1%.
        Video, UCF activity classification: prior art (Kläser et al., 2008) 86%; Stanford feature learning 87%.
        Video, Hollywood2 classification: prior art (Laptev, 2004) 47%; Stanford feature learning 50%.
    51. Summary. Instead of hand-tuning features, use unsupervised feature learning! Sparse coding, LCC. Advanced topics: self-taught learning, deep learning, scaling up.
    52. Other resources: Workshop page: http://ufldl.stanford.edu/eccv10-tutorial/; code for sparse coding and LCC; references; full online tutorial.
