This document discusses techniques for unsupervised feature learning from unlabeled data using neural networks. It describes using sparse autoencoders to learn feature hierarchies in an unsupervised manner, by training networks to reconstruct their inputs while enforcing sparsity constraints. Convolutional deep belief networks are also discussed as a method for hierarchical probabilistic modeling of audio, images, and video. The document concludes that unsupervised feature learning has achieved state-of-the-art results on tasks such as object classification, activity recognition, and speech processing.
7. Self-taught learning. Sparse coding, LCC, etc. learn bases φ1, …, φk from unlabeled data. Use the learned φ1, …, φk to represent the training/test sets: approximate each input as a combination of the bases, giving sparse activations a1, …, ak as features. If the labeled training set is small, this can give a huge performance boost. (Example task: Car vs. Motorcycle.)
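The self-taught learning pipeline above can be sketched in NumPy. This is a minimal illustration, not the talk's actual implementation: it assumes the bases φ1, …, φk have already been learned (here a random unit-norm matrix `phi` stands in for them) and computes the sparse activations a1, …, ak with a few ISTA (iterative soft-thresholding) steps, a standard way to solve the sparse-coding encoding problem:

```python
import numpy as np

def sparse_code(x, phi, lam=0.1, n_iter=50):
    """Encode x as sparse activations a with x ~= phi @ a, via ISTA steps
    on 0.5*||x - phi a||^2 + lam*||a||_1."""
    a = np.zeros(phi.shape[1])
    L = np.linalg.norm(phi, 2) ** 2              # Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = phi.T @ (phi @ a - x)             # gradient of the reconstruction term
        z = a - grad / L
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return a

rng = np.random.default_rng(0)
phi = rng.standard_normal((20, 8))               # stand-in for 8 learned bases
phi /= np.linalg.norm(phi, axis=0)               # unit-norm columns
x = 1.5 * phi[:, 2] + 0.01 * rng.standard_normal(20)
a = sparse_code(x, phi)
# a = [a1, ..., ak] is the new feature vector fed to the supervised learner
```

The activations `a` would then be the features for the labeled Car/Motorcycle classifier.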
12. Logistic regression. Logistic regression has a learned parameter vector θ. On input x, it outputs hθ(x) = 1 / (1 + exp(−θᵀx)). Draw a logistic regression unit as: inputs x1, x2, x3 and a +1 bias unit, all feeding a single sigmoid output.
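The unit in the diagram can be written directly from the formula. A minimal sketch (the parameter values below are hypothetical):

```python
import numpy as np

def logistic_unit(x, theta):
    """Logistic regression unit: 1 / (1 + exp(-theta^T [x; 1])).
    The trailing 1 plays the role of the '+1' bias input in the diagram."""
    z = theta @ np.append(x, 1.0)
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])     # inputs x1, x2, x3
theta = np.zeros(4)                # illustrative all-zero parameters
out = logistic_unit(x, theta)      # zero weights give output 0.5
```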
14. Neural Network. Example 4-layer network with 2 output units: inputs x1, x2, x3 plus a +1 bias unit (Layer 1), two hidden layers each with a +1 bias unit (Layers 2 and 3), and an output layer (Layer 4).
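The forward pass of such a network is just repeated sigmoid layers. A minimal sketch, with hypothetical layer sizes matching the slide (3 inputs, two hidden layers, 2 outputs) and random illustrative weights:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights):
    """Forward pass through a fully connected net; each W maps
    [activations; 1] (the +1 bias unit) to the next layer."""
    a = x
    for W in weights:
        a = sigmoid(W @ np.append(a, 1.0))
    return a

rng = np.random.default_rng(0)
sizes = [3, 4, 4, 2]                  # Layer 1 .. Layer 4
weights = [0.1 * rng.standard_normal((m, n + 1))
           for n, m in zip(sizes, sizes[1:])]
y = forward(np.array([1.0, 0.0, -1.0]), weights)   # the 2 output units
```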
18. Unsupervised feature learning with a neural network. Training a sparse autoencoder: given an unlabeled training set x1, x2, …, minimize a reconstruction error term plus an L1 sparsity term on the hidden activations a1, a2, a3.
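The objective on the slide, reconstruction error plus an L1 penalty on the hidden activations, can be sketched as follows. This is an illustrative formulation (sigmoid encoder, linear decoder), not necessarily the exact parameterization used in the talk:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sae_loss(W1, b1, W2, b2, X, lam=0.1):
    """Sparse autoencoder objective:
    sum_i ||x_i - xhat_i||^2  +  lam * sum_{i,j} |a_ij|."""
    A = sigmoid(X @ W1 + b1)       # hidden activations a for each input row
    Xhat = A @ W2 + b2             # linear reconstruction of the input
    return np.sum((X - Xhat) ** 2) + lam * np.sum(np.abs(A))

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))    # 4 unlabeled examples x1..x4
W1, b1 = np.zeros((3, 2)), np.zeros(2)
W2, b2 = np.zeros((2, 3)), np.zeros(3)
loss = sae_loss(W1, b1, W2, b2, X)
```

Minimizing this loss over (W1, b1, W2, b2), e.g. by gradient descent, yields the sparse features.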
19. Unsupervised feature learning with a neural network. [Figure: a 3-layer autoencoder. Layer 1: inputs x1, …, x6 plus a +1 bias unit; Layer 2: hidden units a1, a2, a3 plus a +1 bias unit; Layer 3: reconstruction of x1, …, x6.]
20. Unsupervised feature learning with a neural network. [Figure: the encoder half of the autoencoder; the hidden activations a1, a2, a3 form a new representation for the input x1, …, x6.]
22. Unsupervised feature learning with a neural network. [Figure: a second autoencoder trained on the activations a1, a2, a3, with hidden units b1, b2, b3.] Train the parameters so that the reconstruction matches (a1, a2, a3), subject to the bi's being sparse.
25. Unsupervised feature learning with a neural network. [Figure: the second-layer encoder; the activations b1, b2, b3 form a new representation for the input.]
27. Unsupervised feature learning with a neural network. [Figure: a third autoencoder trained on b1, b2, b3, with hidden units c1, c2, c3.]
28. Unsupervised feature learning with a neural network. [Figure: the full stack x → a → b → c.] The activations c1, c2, c3 form a new representation for the input. Use [c1, c2, c3] as the representation to feed to the learning algorithm.
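The greedy layer-wise procedure in slides 18–28 (train one autoencoder, freeze it, train the next on its activations) can be sketched end to end. This is a toy stand-in: a plain autoencoder without the sparsity penalty, trained by gradient descent, with hypothetical layer sizes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, n_steps=200, lr=0.1, seed=0):
    """One-hidden-layer autoencoder (sigmoid encoder, linear decoder),
    trained by gradient descent on the squared reconstruction error."""
    rng = np.random.default_rng(seed)
    W1 = 0.1 * rng.standard_normal((X.shape[1], n_hidden))
    W2 = 0.1 * rng.standard_normal((n_hidden, X.shape[1]))
    for _ in range(n_steps):
        A = sigmoid(X @ W1)                    # encoder
        err = A @ W2 - X                       # reconstruction error
        gW2 = A.T @ err / len(X)
        gA = err @ W2.T * A * (1 - A)          # backprop through the sigmoid
        gW1 = X.T @ gA / len(X)
        W1 -= lr * gW1
        W2 -= lr * gW2
    return W1

def stack_features(X, layer_sizes):
    """Greedy layer-wise training: train a layer, freeze it, train the
    next layer on its activations; return the top-layer representation."""
    H = X
    for n_hidden in layer_sizes:
        W = train_autoencoder(H, n_hidden)
        H = sigmoid(H @ W)                     # new representation for the input
    return H

rng = np.random.default_rng(1)
X = rng.standard_normal((32, 6))
C = stack_features(X, [5, 4, 3])               # x -> a -> b -> c, as in the slides
# C plays the role of [c1, c2, c3]: the features fed to the learning algorithm
```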
30. Restricted Boltzmann machine (RBM). Input [x1, x2, x3, x4]; Layer 2 [a1, a2, a3] (binary-valued). An MRF with joint distribution P(x, a) = exp(xᵀWa + bᵀx + cᵀa) / Z. Use Gibbs sampling for inference. Given observed inputs x, want the maximum likelihood estimate of the parameters: maximize Σ log P(x).
31. Restricted Boltzmann machine (RBM). Gradient ascent on log P(x): ∂ log P(x) / ∂Wij = [xi aj]obs − [xi aj]prior, where [xi aj]obs comes from fixing x to its observed value and sampling a from P(a|x), and [xi aj]prior comes from running Gibbs sampling to convergence. Adding a sparsity constraint on the ai's usually improves results.
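In practice the [xi aj]prior term is rarely computed by running Gibbs sampling to convergence; contrastive divergence (CD-1) approximates it with a single Gibbs step. A minimal sketch for a binary RBM, with biases omitted for brevity (an assumption for illustration, not the talk's exact training code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(W, X, lr=0.05, rng=None):
    """One CD-1 step for a binary RBM with weight matrix W (visible x hidden).
    CD-1 replaces the intractable [x_i a_j]_prior statistics with statistics
    from a single Gibbs step started at the data."""
    rng = rng or np.random.default_rng(0)
    # Positive phase: fix x to the data, sample a ~ P(a|x).
    pa = sigmoid(X @ W)
    A = (rng.random(pa.shape) < pa).astype(float)
    # Negative phase: one Gibbs step x' ~ P(x|a), then P(a|x').
    px = sigmoid(A @ W.T)
    Xn = (rng.random(px.shape) < px).astype(float)
    pan = sigmoid(Xn @ W)
    # Gradient ascent on log P(x): [x_i a_j]_obs - [x_i a_j]_recon.
    grad = (X.T @ pa - Xn.T @ pan) / len(X)
    return W + lr * grad

rng = np.random.default_rng(0)
X = (rng.random((10, 4)) < 0.5).astype(float)   # binary inputs [x1..x4]
W = np.zeros((4, 3))                            # 4 visible units, 3 hidden units
W = cd1_update(W, X)
```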
33. Deep Belief Network. A stack of RBMs: input [x1, x2, x3, x4]; Layer 2 [a1, a2, a3]; Layer 3 [b1, b2, b3]; Layer 4 [c1, c2, c3].
37. Probabilistic max pooling. Convolutional neural net: max{x1, x2, x3, x4}, where the xi are real numbers. Convolutional DBN: max{x1, x2, x3, x4}, where the xi are {0, 1} and mutually exclusive. Thus there are 5 possible cases: (x1, x2, x3, x4) = 0000, 1000, 0100, 0010, or 0001. This collapses 2^n configurations into n+1 configurations and permits bottom-up and top-down inference.
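Because only n+1 configurations are possible, the posterior over a pooling block can be computed exactly in closed form. A minimal sketch, assuming each detection unit k receives a bottom-up input Ik and the all-off configuration has energy 0 (the standard convolutional-DBN formulation):

```python
import numpy as np

def prob_max_pool(I):
    """Probabilistic max pooling over one block of detection units.
    I holds the bottom-up inputs I_1..I_n; the 2^n joint configurations
    collapse to n+1: 'unit k on' (k = 1..n) or 'all off'.
    Returns (P(unit k on) for each k, P(pooling unit off))."""
    e = np.exp(I - np.max(I))          # numerically stabilized exp(I_k)
    z0 = np.exp(-np.max(I))            # weight of the all-off configuration
    Z = z0 + e.sum()                   # partition function over n+1 cases
    return e / Z, z0 / Z

p_on, p_off = prob_max_pool(np.array([1.0, -0.5, 0.3, 0.0]))
# p_on sums with p_off to 1: a proper distribution over the 5 cases
```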
41. Convolutional DBN for Images. Input data V: visible nodes (binary or real). Detection layer H: hidden nodes (binary), computed with shared "filter" weights W^k. Max-pooling layer P: binary "max-pooling" nodes; within each pooling region, at most one hidden node is active.
42. Convolutional DBN on face images. The learned hierarchy: pixels → edges → object parts (combinations of edges) → object models. Note: sparsity is important for these results.
43. Learning of object parts. Examples of learned object parts from object categories: faces, cars, elephants, chairs.
44. Training on multiple objects. Trained on 4 classes (cars, faces, motorbikes, airplanes). Plot of the conditional entropy H(class | neuron active). Second layer: shared features and object-specific features. Third layer: more specific features. [Figures: second- and third-layer bases learned from the 4 object categories.]
45. Hierarchical probabilistic inference. Combine bottom-up and top-down inference. Generating posterior samples from faces by "filling in" experiments (cf. Lee and Mumford, 2003): input images; samples from feedforward inference (control); samples from full posterior inference.
50. State-of-the-art task performance.

Audio:
  TIMIT phone classification: prior art (Clarkson et al., 1999) 79.6%; Stanford feature learning 80.3%
  TIMIT speaker identification: prior art (Reynolds, 1995) 99.7%; Stanford feature learning 100.0%
Images:
  CIFAR object classification: prior art (Yu and Zhang, 2010) 74.5%; Stanford feature learning 75.5%
  NORB object classification: prior art (Ranzato et al., 2009) 94.4%; Stanford feature learning 96.2%
Multimodal (audio/video):
  AVLetters lip reading: prior art (Zhao et al., 2009) 58.9%; Stanford feature learning 63.1%
Video:
  UCF activity classification: prior art (Kalser et al., 2008) 86%; Stanford feature learning 87%
  Hollywood2 classification: prior art (Laptev, 2004) 47%; Stanford feature learning 50%
Editor's Notes
Sometimes, most data wins. So, how do we get more data? Even with AMT (Amazon Mechanical Turk), collecting labels is often slow and expensive.
End: one of the challenges is scaling up. Most people work with images from 14x14 up to 32x32 pixels.
Time-invariant features
Visual bases: look at them and see if they make sense / correspond to Gabor filters. Try to perform a similar analysis on the audio bases.
http://www.cbsnews.com/stories/2000/06/29/tech/main210684.shtml: 12.3 Tflops, $110 million, used to simulate nuclear weapon testing. Comparable to 13 graphics cards costing $250 each; 40 people with a US$250 graphics card would match #18 on the top-supercomputers list from 2 years back. http://www.top500.org/list/2006/11/100