Deep Learning: a birds eye view

  1. 1. @graphific Roelof Pieters Deep Learning: a (non-techy) birds-eye view 20 April 2015, Stockholm Deep Learning Slides at: http://www.slideshare.net/roelofp/deep-learning-a-birdseye-view
  2. 2. Refresher: Machine Learning ??? • Deals with the “construction and study of systems that can learn from data” • “A computer program is said to learn from experience (E) with respect to some class of tasks (T) and performance measure (P), if its performance at tasks in T, as measured by P, improves with experience E” — T. Mitchell 1997 2
  3. 3. Deep Learning = Machine Learning: improving some task T based on experience E with respect to performance measure P. “Learning denotes changes in the system that are adaptive in the sense that they enable the system to do the same task (or tasks drawn from a population of similar tasks) more effectively the next time.” — H. Simon 1983, “Why Should Machines Learn?”, cited in T. Mitchell 1997 3
  4. 4. Deep Learning: What? Representation learning: attempts to automatically learn good features or representations. Deep learning: attempts to learn multiple levels of representation of increasing complexity/abstraction. 4
  5. 5. Deep Learning ?? 5
  6. 6. Machine Learning ?? Traditional Programming: Data + Program → Computer → Output 6
  7. 7. Machine Learning ?? Traditional Programming: Data + Program → Computer → Output. Machine Learning: Data + Output (labels) → Computer → Program (“weights”/model) 7
  8. 8. Machine Learning ?? 8 • Most machine learning methods work well because of human-designed/hand-engineered features (representations) • machine learning -> optimising weights to best make a final prediction
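To make the “optimising weights” bullet concrete, here is a minimal sketch (toy data, plain gradient descent; sizes and learning rate are arbitrary assumptions): the “program” that traditional programming would hand-code is here just a weight vector fitted to labelled examples.

```python
import numpy as np

# Toy data: inputs X and labels y generated from a hidden "true" rule plus noise.
rng = np.random.RandomState(0)
X = rng.randn(100, 3)                      # 100 examples, 3 given features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.randn(100)      # labels (the "Output" in the diagram)

# Machine learning: start with arbitrary weights and adjust them so that
# predictions match the labels -- the learned "program" is just w.
w = np.zeros(3)
lr = 0.1
for _ in range(200):
    pred = X @ w
    grad = X.T @ (pred - y) / len(y)       # gradient of mean squared error
    w -= lr * grad                          # optimise the weights

print(w)  # close to true_w: the model was learned from data, not hand-coded
```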
  9. 9. Typical ML Regression Deep Learning: Why?
  10. 10. Neural NetTypical ML Regression Deep Learning: Why?
  11. 11. Machine ->Deep Learning ??
  12. 12. Machine ->Deep Learning ?? DEEP NET 
 (DEEP LEARNING)
  13. 13. Biological Inspiration 14
  14. 14. Deep Learning: Why?
  15. 15. Deep Learning is everywhere…
  16. 16. Deep Learning in the News
  17. 17. (source: Google Trends) 19
  18. 18. (source: arXiv bookworm, Culturomics) Scientific Articles:
  19. 19. Why Now? • Inspired by the architectural depth of the brain, researchers wanted for decades to train deep multi-layer neural networks. • No successful attempts were reported before 2006 …Exception: convolutional neural networks, LeCun 1998 • SVM: Vapnik and his co-workers developed the Support Vector Machine (1993) (shallow architecture). • Breakthrough in 2006! 25
  20. 20. Renewed Interest: 1990s • Learning multiple layers • “Backpropagation” • Can “theoretically” learn any function! But… • Very slow and inefficient • SVMs, random forests, etc. were state of the art (SOTA) 26
  21. 21. 2006 Breakthrough • More data • Faster hardware: GPUs, multi-core CPUs • Working ideas on how to train deep architectures 27
  22. 22. 2006 Breakthrough • More data • Faster hardware: GPUs, multi-core CPUs • Working ideas on how to train deep architectures 28
  23. 23. 2006 Breakthrough 29
  24. 24. 2006 Breakthrough 30 Growth of datasets
  25. 25. 2006 Breakthrough • More data • Faster hardware: GPUs, multi-core CPUs • Working ideas on how to train deep architectures 31
  26. 26. 2006 Breakthrough 32 vs
  27. 27. Rise of Raw Computation Power
  28. 28. 2006 Breakthrough • More data • Faster hardware: GPUs, multi-core CPUs • Working ideas on how to train deep architectures 34
  29. 29. 2006 Breakthrough Stacked Restricted Boltzmann Machines* (RBM): Hinton, G. E., Osindero, S., and Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18:1527–1554. Stacked Autoencoders (AE): Bengio, Y., Lamblin, P., Popovici, D., Larochelle, H. (2007). Greedy Layer-Wise Training of Deep Networks. Advances in Neural Information Processing Systems 19. * called Deep Belief Networks (DBN) 35
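A hedged sketch of the second 2006 recipe listed above, greedy layer-wise training of stacked autoencoders: each layer is trained to reconstruct the output of the layer below, and its codes then become the next layer's input. The tied weights, toy data, and all hyperparameters are illustrative assumptions, not the exact procedure of Bengio et al. (2007).

```python
import numpy as np

rng = np.random.RandomState(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def train_autoencoder(X, n_hidden, lr=0.1, epochs=200):
    """Train one autoencoder layer (tied weights) by gradient descent on
    squared reconstruction error; returns the encoder weights and bias."""
    n_in = X.shape[1]
    W = 0.1 * rng.randn(n_in, n_hidden)
    b_h, b_v = np.zeros(n_hidden), np.zeros(n_in)
    for _ in range(epochs):
        H = sigmoid(X @ W + b_h)            # encode
        R = H @ W.T + b_v                   # decode (reconstruction)
        err = R - X
        dH = (err @ W) * H * (1 - H)        # backprop through the encoder
        W -= lr * (X.T @ dH + err.T @ H) / len(X)   # tied-weight gradient
        b_h -= lr * dH.mean(axis=0)
        b_v -= lr * err.mean(axis=0)
    return W, b_h

# Greedy layer-wise pretraining: each layer is trained on the codes
# produced by the layer below, one layer at a time.
X = rng.rand(500, 64)                        # stand-in for real data
layers, inp = [], X
for n_hidden in (32, 16, 8):
    W, b = train_autoencoder(inp, n_hidden)
    layers.append((W, b))
    inp = sigmoid(inp @ W + b)               # representation fed to the next layer
```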
  30. 30. Deep Learning for the Win!
  31. 31. Impact on Computer Vision ImageNet Challenge 2012 • 1.2M images with 1000 object categories • AlexNet (University of Toronto): 15% error rate vs 26% for the 2nd-placed entry (traditional CV)
  32. 32. Impact on Computer Vision (from Clarifai)
  33. 33. Impact on Computer Vision
  34. 34. 40 Classification results on ImageNet 2012
 Team | Year | Place | Error (top-5) | Uses external data
 SuperVision | 2012 | - | 16.4% | no
 SuperVision | 2012 | 1st | 15.3% | ImageNet 22k
 Clarifai | 2013 | - | 11.7% | no
 Clarifai | 2013 | 1st | 11.2% | ImageNet 22k
 MSRA | 2014 | 3rd | 7.35% | no
 VGG | 2014 | 2nd | 7.32% | no
 GoogLeNet | 2014 | 1st | 6.67% | no
 Final Detection Results
 Team | Year | Place | mAP | external data | ensemble | contextual model | approach
 UvA-Euvision | 2013 | 1st | 22.6% | none | ? | yes | Fisher vectors
 Deep Insight | 2014 | 3rd | 40.5% | ILSVRC12 Classification + Localization | 3 models | yes | ConvNet
 CUHK DeepID-Net | 2014 | 2nd | 40.7% | ILSVRC12 Classification + Localization | ? | no | ConvNet
 GoogLeNet | 2014 | 1st | 43.9% | ILSVRC12 Classification | 6 models | no | ConvNet
 Detection results source: Szegedy et al., Going deeper with convolutions (GoogLeNet), ILSVRC2014, 19 Sep 2014
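The “Error (top-5)” column counts an image as correct when its true label is among the model's five highest-scoring classes; a minimal sketch of that metric (array shapes and random scores are purely illustrative):

```python
import numpy as np

def top5_error(scores, labels):
    """scores: (n_images, n_classes) class scores; labels: (n_images,) true class ids.
    Returns the fraction of images whose true class is NOT in the top-5 predictions."""
    top5 = np.argsort(scores, axis=1)[:, -5:]          # indices of the 5 best scores per image
    hit = (top5 == labels[:, None]).any(axis=1)        # true class among them?
    return 1.0 - hit.mean()

# Usage on random scores (expected error around 1 - 5/1000 for 1000 classes):
rng = np.random.RandomState(0)
print(top5_error(rng.randn(100, 1000), rng.randint(0, 1000, size=100)))
```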
  35. 35. 41 Winners of: Large Scale Visual Recognition Challenge 2014 (ILSVRC2014), 19 September 2014. [GoogLeNet architecture diagram; legend: Convolution, Pooling, Softmax, Other] source: Szegedy et al., Going deeper with convolutions (GoogLeNet), ILSVRC2014, 19 Sep 2014
  36. 36. 42 Inception • Width of inception modules ranges from 256 filters (in early modules) to 1024 in top inception modules (per-module widths: 256, 480, 480, 512, 512, 512, 832, 832, 1024) • Can remove fully connected layers on top completely • Number of parameters is reduced to 5 million • Computational cost is increased by less than 2X compared to Krizhevsky’s network (<1.5Bn operations/evaluation) source: Szegedy et al., Going deeper with convolutions (GoogLeNet), ILSVRC2014, 19 Sep 2014
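Some back-of-the-envelope arithmetic behind the parameter-count claims above: 1x1 “bottleneck” convolutions shrink a branch's weight count, and dropping the large fully connected layers removes tens of millions more. The branch sizes below are illustrative assumptions, not GoogLeNet's exact configuration.

```python
# Illustrative arithmetic only: the channel counts below are assumptions.
def conv_params(c_in, c_out, k):
    """Weights of a k x k convolution (ignoring biases)."""
    return c_in * c_out * k * k

c_in = 480                       # channels entering an inception-style module (see widths above)
direct = conv_params(c_in, 64, 5)                             # a 5x5 conv applied directly
reduced = conv_params(c_in, 24, 1) + conv_params(24, 64, 5)   # 1x1 "bottleneck" first
print(direct, reduced)           # 768000 vs 49920: roughly 15x fewer weights

# Dropping the big fully connected layer has a similar effect: an
# AlexNet-style 256*6*6 -> 4096 FC layer alone costs
print(256 * 6 * 6 * 4096)        # about 37.7 million weights
```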
  37. 37. Impact on Computer Vision Latest State of the Art:
  38. 38. Computer Vision: Current State of the Art
  39. 39. Impact on Audio Processing 45 First public Breakthrough with Deep Learning in 2010 Dahl et al. (2010)

  40. 40. Impact on Audio Processing 46 First public Breakthrough with Deep Learning in 2010 Dahl et al. (2010)
 -33%! -32%!
  41. 41. Impact on Audio Processing 47 Speech Recognition
  42. 42. Impact on Audio Processing 48 TIMIT Speech Recognition (from: Clarifai)
  43. 43. Impact on Audio Processing
  44. 44. Impact on Natural Language Processing POS: Toutanova et al. (2003); NER: Ando & Zhang (2005); C&W 2011
  45. 45. Impact on Natural Language Processing Named Entity Recognition:
  46. 46. Deep Learning: Who’s to blame?
  47. 47. 53 Deep Learning: Who’s to blame?
  48. 48. Deep Learning: Why? Deep architectures can be representationally efficient • Fewer computational units for the same function Deep representations might allow for a hierarchy of representations • Allows non-local generalisation • Comprehensibility Multiple levels of latent variables allow combinatorial sharing of statistical strength 54
  49. 49. — Andrew Ng “I’ve worked all my life in Machine Learning, and I’ve never seen one algorithm knock over benchmarks like Deep Learning” Deep Learning: Why? 55
  50. 50. Biological Justification Deep Learning = Brain “inspired”
 Audio/Visual Cortex has multiple stages == Hierarchical
  51. 51. Different Levels of Abstraction 57
  52. 52. Hierarchical Learning • Natural progression from low level to high level structure as seen in natural complexity Different Levels of Abstraction Feature Representation 58
  53. 53. Hierarchical Learning • Natural progression from low level to high level structure as seen in natural complexity • Easier to monitor what is being learnt and to guide the machine to better subspaces Different Levels of Abstraction Feature Representation 59
  54. 54. Hierarchical Learning • Natural progression from low level to high level structure as seen in natural complexity • Easier to monitor what is being learnt and to guide the machine to better subspaces • A good lower level representation can be used for many distinct tasks Different Levels of Abstraction Feature Representation 60
  55. 55. Hierarchical Learning • Natural progression from low level to high level structure as seen in natural complexity • Easier to monitor what is being learnt and to guide the machine to better subspaces • A good lower level representation can be used for many distinct tasks Different Levels of Abstraction Feature Representation 61
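To illustrate the last bullet (a good lower-level representation reused for many distinct tasks), a hedged numpy sketch in which a frozen, supposedly pretrained feature extractor is shared and only a small task-specific head is trained per task; all data and weights here are random stand-ins.

```python
import numpy as np

rng = np.random.RandomState(0)
relu = lambda a: np.maximum(a, 0.0)

# Pretend these lower layers were already learned on a large generic dataset
# (here they are random, purely for illustration).
W1, W2 = 0.1 * rng.randn(64, 128), 0.1 * rng.randn(128, 32)
extract = lambda X: relu(relu(X @ W1) @ W2)      # shared lower-level representation

def train_head(X, y, lr=0.1, epochs=200):
    """Train only a small logistic-regression head on top of the frozen features."""
    F = extract(X)
    w, b = np.zeros(F.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(F @ w + b)))
        w -= lr * F.T @ (p - y) / len(y)
        b -= lr * (p - y).mean()
    return w, b

# Two distinct tasks reuse the same representation; only the cheap heads differ.
X_a, y_a = rng.randn(200, 64), rng.randint(0, 2, 200)
X_b, y_b = rng.randn(200, 64), rng.randint(0, 2, 200)
head_a, head_b = train_head(X_a, y_a), train_head(X_b, y_b)
```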
  56. 56. Different Levels of Abstraction
  57. 57. Classic Deep Architecture Input layer Hidden layers Output layer
  58. 58. Modern Deep Architecture Input layer Hidden layers Output layer movie time: http://www.cs.toronto.edu/~hinton/adi/index.htm
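A minimal sketch of the input layer / hidden layers / output layer structure shown on these two slides; the layer sizes and random weights are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.RandomState(0)
relu = lambda a: np.maximum(a, 0.0)

def forward(x, layers):
    """Pass an input through the hidden layers, then a softmax output layer."""
    h = x
    for W, b in layers[:-1]:
        h = relu(h @ W + b)                 # hidden layers
    W_out, b_out = layers[-1]
    logits = h @ W_out + b_out              # output layer
    e = np.exp(logits - logits.max())
    return e / e.sum()                      # class probabilities

# Input layer of 784 units, two hidden layers, 10 output classes (sizes are arbitrary).
sizes = [784, 256, 128, 10]
layers = [(0.01 * rng.randn(m, n), np.zeros(n)) for m, n in zip(sizes, sizes[1:])]
print(forward(rng.randn(784), layers))
```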
  59. 59. Hierarchies Efficient Generalization Distributed Sharing Unsupervised* Black Box Training Time Major PWNAGE! Much Data Why go Deep ? 65
  60. 60. No More Handcrafted Features ! 66
  61. 61. [Kudos to Richard Socher, for this eloquent summary :) ] • Manually designed features are often over-specified, incomplete and take a long time to design and validate • Learned Features are easy to adapt, fast to learn • Deep learning provides a very flexible, (almost?) universal, learnable framework for representing world, visual and linguistic information. • Deep learning can learn unsupervised (from raw text/audio/ images/whatever content) and supervised (with specific labels like positive/negative) Why Deep Learning ?
  62. 62. Deep Learning: Future Developments Currently an explosion of developments • Hessian-Free networks (2010) • Long Short-Term Memory (2011) • Large Convolutional nets, max-pooling (2011) • Nesterov’s Gradient Descent (2013) Currently state of the art but... • No way of doing logical inference (extrapolation) • No easy integration of abstract knowledge • Hypothesis space bias might not conform with reality 68
  63. 63. Deep Learning: Future Challenges 69 Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R. (2013) Intriguing properties of neural networks. Left: correctly identified, Center: added noise x10, Right: “Ostrich”
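Szegedy et al. found these adversarial perturbations with a box-constrained L-BFGS search; the toy sketch below instead uses a single gradient-sign step on a stand-in logistic model, purely to illustrate how a tiny input change can flip a confident prediction. Everything here is an illustrative assumption, not the paper's procedure.

```python
import numpy as np

rng = np.random.RandomState(0)

# Toy differentiable "classifier": logistic regression standing in for a deep net.
w, b = rng.randn(64), 0.0
predict = lambda x: 1.0 / (1.0 + np.exp(-(x @ w + b)))

x = rng.randn(64)
p = predict(x)                       # original confidence for class 1

# The gradient of the class-1 score w.r.t. the *input* says how to nudge each
# component; a small step along its sign can change the decision while leaving
# the input almost unchanged.
grad_x = p * (1 - p) * w
eps = 0.25
x_adv = x - eps * np.sign(grad_x) if p > 0.5 else x + eps * np.sign(grad_x)
print(p, predict(x_adv))             # confidence before vs after the tiny perturbation
```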
  64. 64. Get in touch! As PhD candidate KTH/CSC: “Always interested in discussing Machine Learning, Deep Architectures, Graphs, and Language Technology” Academic/Research: roelof@kth.se www.csc.kth.se/~roelof/ Data Science Consultancy (Gve Systems, Graph Technologies): roelof@gve-systems.com www.gve-systems.com 72
  65. 65. Wanna Play ? General Deep Learning • Theano - CPU/GPU symbolic expression compiler in python (from LISA lab at University of Montreal). http://deeplearning.net/software/theano/ • Pylearn2 - library designed to make machine learning research easy. http://deeplearning.net/software/pylearn2/ • Torch - Matlab-like environment for state-of-the-art machine learning algorithms in lua (from Ronan Collobert, Clement Farabet and Koray Kavukcuoglu) http://torch.ch/ • more info: http://deeplearning.net/software_links/ 73
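As a minimal usage illustration of the first bullet: in Theano you build a symbolic expression, let the library derive the gradient, and compile both into a callable function. The toy loss below is an assumption for illustration, not from the slides.

```python
import theano
import theano.tensor as T

# Declare symbolic variables and an expression over them.
x = T.dvector('x')
w = T.dvector('w')
loss = (T.dot(x, w) - 1.0) ** 2          # a toy scalar loss

# Theano derives the symbolic gradient and compiles everything for CPU or GPU.
grad_w = T.grad(loss, w)
f = theano.function([x, w], [loss, grad_w])

print(f([1.0, 2.0], [0.5, 0.5]))         # loss and gradient at a concrete point
```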
  66. 66. • RNNLM (Mikolov)
 http://rnnlm.org • NB-SVM
 https://github.com/mesnilgr/nbsvm • Word2Vec (skipgrams/cbow)
 https://code.google.com/p/word2vec/ (original)
 http://radimrehurek.com/gensim/models/word2vec.html (python) • GloVe
 http://nlp.stanford.edu/projects/glove/ (original)
 https://github.com/maciejkula/glove-python (python) • Socher et al / Stanford RNN Sentiment code:
 http://nlp.stanford.edu/sentiment/code.html • Deep Learning without Magic Tutorial:
 http://nlp.stanford.edu/courses/NAACL2013/ Wanna Play ? NLP 74
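As a small usage illustration of the Word2Vec entry above, a hedged gensim sketch: the toy corpus is made up, and argument names differ across gensim versions (for example, size became vector_size in recent releases).

```python
from gensim.models import Word2Vec

# A toy corpus: in practice you would stream sentences from a real text dump.
sentences = [
    ["deep", "learning", "learns", "representations"],
    ["word", "vectors", "capture", "similarity"],
    ["deep", "networks", "learn", "word", "vectors"],
]

# Skip-gram model (sg=1); use sg=0 for cbow. In recent gensim releases the
# `size` argument is called `vector_size`.
model = Word2Vec(sentences, size=50, window=3, min_count=1, sg=1, workers=1)

print(model.wv["deep"][:5])              # the learned 50-d vector (first 5 dims)
print(model.wv.most_similar("deep"))     # nearest words by cosine similarity
```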
  67. 67. • cuda-convnet2 (Alex Krizhevsky, Toronto) (c++/CUDA, optimized for GTX 580) 
 https://code.google.com/p/cuda-convnet2/ • Caffe (Berkeley) (Cuda/OpenCL, Theano, Python)
 http://caffe.berkeleyvision.org/ • OverFeat (NYU) 
 http://cilvr.nyu.edu/doku.php?id=code:start Wanna Play ? Computer Vision 75
