Deep learning

  1. Deep learning (simon@hekovnik.si)
  2. Deep/nature
     • Brains have a deep architecture
     • Humans organize their ideas hierarchically, through composition of simpler ideas
  3. Deep/math
     • Insufficiently deep architectures can be exponentially inefficient
     • Distributed (possibly sparse) representations are necessary to achieve non-local generalization
     • Intermediate representations allow sharing of statistical strength
  4. Enablers
     • CPU (GPU) power
     • Algorithmic:
       • pre-training / network stacking (RBMs, auto-encoders)
       • RBM + contrastive divergence
       • parallelization tricks
  5. State of the art
     • Feature detectors for faces and cats learned unsupervised from videos
     • Billions of units, >9 layers
     • Better-than-human recognition of traffic signs
  6. Applications
     • Sequence prediction (time series, gene sequences, natural language modeling, ...)
     • Machine vision
     • Speech recognition
     • Dimensionality reduction
     • Classification, ...
  7. The good
     • State-of-the-art results in many fields
     • Unsupervised and semi-supervised learning
     • (At least somewhat) online learning and adaptation
     • Multi-task learning
     • (Close to) linear scalability
  8. The bad
     • Expensive to train
     • Hard to inspect/visualize progress for non-visual tasks
  9. The ugly
     • Hyperparameter and topology selection are still critical
     • Dependence on tricks for practical results on real-life datasets
  10. Deep belief networks
  11. DBN key ideas: network stacking
     • Greedy layer-wise learning
     • Hidden units of level k serve as the visible units of level k+1
     • (Use backpropagation on the whole stack)
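The stacking recipe on this slide can be sketched in a few lines. This is only an illustration, not code from the deck: train_rbm(data, n_hidden) and rbm.hidden_probs(data) are hypothetical helpers standing in for whatever single-layer trainer is used.

```
def train_stack(data, layer_sizes, train_rbm):
    layers = []
    x = data
    for n_hidden in layer_sizes:
        rbm = train_rbm(x, n_hidden)   # greedy: train only this level
        layers.append(rbm)
        x = rbm.hidden_probs(x)        # hidden units of level k become the
                                       # visible units of level k+1
    return layers                      # the whole stack can then be fine-tuned
                                       # with backpropagation
```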
  12. DBN key ideas: unsupervised greedy layer-wise learning + supervised top layer
  13. DBN key ideas: RBM
     • Generative stochastic neural network
     • The network has an energy function, and learning searches for thermal equilibrium
     • Binary units; the weights determine the units' activation probabilities
     • Learning via contrastive divergence
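For concreteness, a minimal numpy sketch of the quantities this slide refers to (not part of the original deck): W is a hypothetical visible-to-hidden weight matrix, b_v and b_h are biases, and lower energy means a more probable configuration.

```
import numpy as np

def energy(v, h, W, b_v, b_h):
    # E(v, h) = -b_v·v - b_h·h - v·W·h ; lower energy = more probable state
    return -v @ b_v - h @ b_h - v @ W @ h

def p_h_given_v(v, W, b_h):
    # binary hidden units switch on with these probabilities
    return 1.0 / (1.0 + np.exp(-(v @ W + b_h)))
```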
  14. DBN key ideas: contrastive divergence
     (1) v → h
     (2) h → v′
     (3) v′ → h′
     (4) δw = v ⊗ h - v′ ⊗ h′
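A minimal CD-1 update following steps (1)-(4) above, written as a numpy sketch under assumed shapes: v is a batch of binary visible rows, W is visible-by-hidden, b_v and b_h are biases, and lr is a hypothetical learning rate. Using hidden probabilities rather than samples in the update is a common simplification.

```
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v, W, b_v, b_h, lr=0.1, rng=np.random.default_rng()):
    # (1) v -> h: sample hidden units from the data
    p_h = sigmoid(v @ W + b_h)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    # (2) h -> v': reconstruct the visible units
    p_v = sigmoid(h @ W.T + b_v)
    v_rec = (rng.random(p_v.shape) < p_v).astype(float)
    # (3) v' -> h': hidden probabilities for the reconstruction
    p_h_rec = sigmoid(v_rec @ W + b_h)
    # (4) dW = v (x) h - v' (x) h', averaged over the batch
    dW = (v.T @ p_h - v_rec.T @ p_h_rec) / v.shape[0]
    W += lr * dW
    b_v += lr * (v - v_rec).mean(axis=0)
    b_h += lr * (p_h - p_h_rec).mean(axis=0)
    return W, b_v, b_h
```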
  15. DBN key ideas: auto-encoders
     • Denoising auto-encoder (corrupt the input, then reconstruct it)
     • Sparse coding (each item is encoded by strong activation of a small set of neurons)
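One denoising auto-encoder update with tied weights, again as an illustrative numpy sketch rather than the deck's code: W, b, c and the corruption level are hypothetical parameters, and the reconstruction error is measured against the clean input.

```
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dae_step(x, W, b, c, lr=0.1, corruption=0.3, rng=np.random.default_rng()):
    # corrupt: zero out a random fraction of each input
    x_tilde = x * (rng.random(x.shape) > corruption)
    # encode the corrupted input, decode with tied weights
    h = sigmoid(x_tilde @ W + b)
    x_hat = sigmoid(h @ W.T + c)
    # squared reconstruction error against the *clean* input
    delta_out = (x_hat - x) * x_hat * (1 - x_hat)
    delta_hid = (delta_out @ W) * h * (1 - h)
    dW = (delta_out.T @ h + x_tilde.T @ delta_hid) / x.shape[0]
    W -= lr * dW
    b -= lr * delta_hid.mean(axis=0)
    c -= lr * delta_out.mean(axis=0)
    return W, b, c
```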
  16. LSTM
  17. LSTM
     • RNN with explicit state
     • Combination of BPTT and RTRL learning
     • Online learning
     • Can retain information over arbitrarily long periods of time
     • Can be trained by artificial evolution
     • Can combine LSTM blocks with regular units
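The "explicit state" point can be made concrete with one forward step of a cell, sketched in numpy with hypothetical gate parameters (W_i, W_f, W_o, W_g and biases) acting on the concatenated previous hidden state and input; the gating is what lets the cell retain information over long periods.

```
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W_i, W_f, W_o, W_g, b_i, b_f, b_o, b_g):
    z = np.concatenate([h_prev, x])
    i = sigmoid(W_i @ z + b_i)   # input gate
    f = sigmoid(W_f @ z + b_f)   # forget gate: keep or clear old state
    o = sigmoid(W_o @ z + b_o)   # output gate
    g = np.tanh(W_g @ z + b_g)   # candidate update
    c = f * c_prev + i * g       # explicit cell state
    h = o * np.tanh(c)           # hidden output
    return h, c
```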
  18. Tricks
     • Specialized layers (convolution, max-pooling, ...)
     • Multi-column
     • Mini-batching, bias, weight momentum, parameter scheduling, ...
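The mini-batching and momentum items can be pinned down with a small training-loop sketch, assuming numpy and a hypothetical grad(W, batch) function that returns the gradient of the loss on one batch; the decaying learning rate stands in for "parameter scheduling".

```
import numpy as np

def train(W, data, grad, lr=0.01, momentum=0.9, batch_size=128, epochs=10):
    velocity = np.zeros_like(W)
    rng = np.random.default_rng()
    for epoch in range(epochs):
        rng.shuffle(data)                    # new mini-batch order each epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # momentum: blend the new gradient with past update directions
            velocity = momentum * velocity - lr * grad(W, batch)
            W = W + velocity
        lr *= 0.95                           # simple parameter scheduling
    return W
```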
  19. DBN vs. LSTM
     DBN:
     • General purpose
     • More flexible
     • Bigger community
     LSTM:
     • Conceptually cleaner
     • Simpler and smaller topology
     • Faster convergence
  20. People to watch
     • G. Hinton (U. Toronto)
     • J. Schmidhuber (IDSIA)
     • A. Y. Ng (Stanford, Google Brain)
     • J. Hawkins (Numenta)
