
Deep Learning: Theory, History, State of the Art & Practical Tools

An introductory talk about Deep Learning given at the Machine Learning Estonia meetup.

Published in: Science


  1. Deep Learning: Theory, History, State of the Art & Practical Tools. By Ilya Kuzovkin, ilya.kuzovkin@gmail.com, Machine Learning Estonia, 2016. http://neuro.cs.ut.ee
  2. Where it has started • How it evolved • What is the state now • How can you use it • How it learns
  3. Where it has started
  4. Where it has started: Artificial Neuron. McCulloch and Pitts, “A Logical Calculus of the Ideas Immanent in Nervous Activity”, 1943
  7. Where it has started: Perceptron. Frank Rosenblatt, 1957
  10. “[The Perceptron is] the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence.” The New York Times
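The perceptron's learning rule fits in a few lines. Below is a minimal sketch (illustrative, not from the slides; the AND-gate data, function names, and the learning rate are my choices): whenever the prediction is wrong, the weights are nudged toward producing the correct answer.

```python
def train_perceptron(data, epochs=10, lr=0.1):
    """Train a single perceptron on (inputs, target) pairs with targets 0/1."""
    n = len(data[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, target in data:
            # Step activation: fire iff the weighted sum exceeds the threshold
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = target - pred
            # Rosenblatt's rule: move the weights only on mistakes
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Learn the AND function, which is linearly separable
and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(and_data)
```

Because AND is linearly separable, the perceptron convergence theorem guarantees this loop settles on a separating line; for XOR it never would, which is the classic limitation of a single perceptron.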
  11. Where it has started: Sigmoid & Backpropagation
  15. The sigmoid lets us measure how small changes in weights affect the output, apply NNs to regression, and train multilayer networks. Backpropagation: Werbos (1974); Rumelhart, Hinton and Williams, “Learning representations by back-propagating errors”, Nature, 1986
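The sigmoid and its derivative can be sketched directly (a standard definition, not specific to the slides). The closed-form derivative s'(x) = s(x)(1 - s(x)) is what makes propagating small weight changes through the network cheap.

```python
import math

def sigmoid(x):
    """Squash any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    """Slope of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid(0))             # 0.5
print(sigmoid_derivative(0))  # 0.25, the maximum possible slope
```

Note the derivative never exceeds 0.25, which is one reason deep stacks of sigmoid layers were hard to train (gradients shrink with every layer).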
  16. Why did the DL revolution not happen in 1986? (from a talk by Geoffrey Hinton)
  18. • Not enough data (datasets were 1,000 times too small) • Computers were too slow (1,000,000 times) • Not enough attention to network initialization • Wrong non-linearity
  19. How it learns
  20. How it learns: Backpropagation. Worked example from http://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example
  21. Given inputs 0.05 and 0.10, we want the neural network to output 0.01 and 0.99
  22. 1. The Forward Pass: calculating the total error
  26. Compute the net input and squashed output of each neuron in turn; repeat for h2 = 0.596, o1 = 0.751, o2 = 0.773
  27. With o1 and o2 in hand, the total error can be computed
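The whole forward pass fits in a handful of lines. This sketch uses the initial weights and biases from Mazur's tutorial (the weight values are from that tutorial, not visible in this transcript) and reproduces the numbers on the slides.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Inputs, targets, and initial weights/biases from Mazur's tutorial
i1, i2 = 0.05, 0.10
t1, t2 = 0.01, 0.99
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60

# Hidden layer: net input, then squashed output
out_h1 = sigmoid(w1 * i1 + w2 * i2 + b1)          # ~0.593
out_h2 = sigmoid(w3 * i1 + w4 * i2 + b1)          # ~0.596

# Output layer
out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)  # ~0.751
out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)  # ~0.773

# Total squared error over both outputs
E_total = 0.5 * (t1 - out_o1) ** 2 + 0.5 * (t2 - out_o2) ** 2  # ~0.2984
```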
  32. 2. The Backward Pass: updating the weights
  50. Each weight is adjusted by the gradient descent update rule: subtract the partial derivative of the total error with respect to that weight, scaled by the learning rate
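The chain-rule computation behind the update can be made concrete for a single weight. This sketch derives the gradient for output weight w5 only, using the initial weights and the learning rate 0.5 from Mazur's tutorial (those values come from the tutorial, not from this transcript).

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Forward pass with the tutorial's initial weights
i1, i2, b1, b2 = 0.05, 0.10, 0.35, 0.60
out_h1 = sigmoid(0.15 * i1 + 0.20 * i2 + b1)
out_h2 = sigmoid(0.25 * i1 + 0.30 * i2 + b1)
w5 = 0.40
out_o1 = sigmoid(w5 * out_h1 + 0.45 * out_h2 + b2)

# Chain rule: dE/dw5 = dE/dout_o1 * dout_o1/dnet_o1 * dnet_o1/dw5
target_o1 = 0.01
dE_dout = out_o1 - target_o1           # derivative of 0.5 * (t - out)^2
dout_dnet = out_o1 * (1.0 - out_o1)    # sigmoid derivative
dnet_dw5 = out_h1                      # net_o1 is linear in w5
dE_dw5 = dE_dout * dout_dnet * dnet_dw5

# Gradient descent update rule with learning rate 0.5
learning_rate = 0.5
w5_new = w5 - learning_rate * dE_dw5   # ~0.3589
```

The same three-factor pattern repeats for every weight; for the hidden-layer weights the error term is a sum over both output neurons.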
  55. • Repeat for w6, w7, w8 • Update w1, w2, w3, w4 in an analogous way • Calculate the total error again: it was 0.298371109, now it is 0.291027924 • Repeat 10,000 times and the error drops to 0.000035085
  56. How it learns: Optimization methods. Alec Radford, “Introduction to Deep Learning with Python”
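The differences between optimizers are easiest to see on a toy objective. This sketch (illustrative; the slide's animations come from Alec Radford's talk, and the learning rate and momentum constants here are my choices) compares plain gradient descent with momentum on f(w) = w², whose minimum is at 0.

```python
def grad(w):
    """Gradient of f(w) = w^2."""
    return 2.0 * w

def sgd(w, lr=0.1, steps=100):
    """Plain gradient descent: step straight down the gradient."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def sgd_momentum(w, lr=0.1, mu=0.9, steps=100):
    """Momentum: accumulate a velocity so consistent gradients build up speed."""
    v = 0.0
    for _ in range(steps):
        v = mu * v - lr * grad(w)
        w += v
    return w

print(sgd(5.0))           # close to the minimum at 0
print(sgd_momentum(5.0))  # also converges, after some overshooting
```

On this well-conditioned 1-D bowl plain descent is already fast; momentum pays off on the narrow ravines typical of real loss surfaces, where it damps oscillation across the valley while accelerating along it.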
  58. How it evolved
  59. How it evolved: 1-layer NN (input directly to output). 92.5% on the MNIST test set. Alec Radford, “Introduction to Deep Learning with Python”
  62. How it evolved: One hidden layer. 98.2% on the MNIST test set. Activity of 100 hidden neurons (out of 625) shown. Alec Radford, “Introduction to Deep Learning with Python”
  65. How it evolved: Overfitting
  66. How it evolved: Dropout. Srivastava, Hinton, Krizhevsky, Sutskever, Salakhutdinov, “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”, 2014
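The mechanism is simple enough to sketch. Below is the "inverted dropout" variant (a common formulation of the Srivastava et al. idea; the function name and example values are illustrative): during training each unit is zeroed with probability p and survivors are scaled by 1/(1-p), so the layer's expected output is unchanged and no rescaling is needed at test time.

```python
import random

def dropout(activations, p=0.5, rng=random):
    """Inverted dropout on a list of activations."""
    out = []
    for a in activations:
        if rng.random() < p:
            out.append(0.0)            # unit is dropped for this pass
        else:
            out.append(a / (1.0 - p))  # scale so the expected value is preserved
    return out

random.seed(0)
layer = [0.2, 0.9, 0.4, 0.7]
print(dropout(layer, p=0.5))  # each entry is either 0.0 or doubled
```

Because a different random sub-network is trained on every pass, units cannot co-adapt, which is the regularizing effect the paper describes.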
  70. How it evolved: ReLU. X. Glorot, A. Bordes, Y. Bengio, “Deep Sparse Rectifier Neural Networks”, 2011
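The rectifier itself is just max(0, x) (the standard definition, not specific to the slides). Its gradient is exactly 1 for positive inputs, instead of the sigmoid's at-most-0.25, which eases vanishing gradients; negative inputs give exact zeros, i.e. sparse activations.

```python
def relu(x):
    """Rectified linear unit: pass positives through, clamp negatives to 0."""
    return max(0.0, x)

def relu_grad(x):
    """Gradient is 1 where the unit is active, 0 where it is off."""
    return 1.0 if x > 0 else 0.0

print([relu(x) for x in (-2.0, -0.5, 0.0, 1.5)])  # [0.0, 0.0, 0.0, 1.5]
```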
  72. How it evolved: the “modern” ANN. Several hidden layers, ReLU activation units, dropout. 99.0% on the MNIST test set
  75. How it evolved: Convolution
  76. The Prewitt edge detector is the 3×3 kernel: +1 +1 +1 / 0 0 0 / −1 −1 −1
  79. Example image: three rows of 40s on top of three rows of 10s, i.e. a horizontal edge
  80. Sliding the kernel over the image gives responses 0, 90, 90, 0: the output is large exactly where the edge is
  86. An edge detector is a handcrafted feature detector.
  87. How it evolved: Convolution. The idea of a convolutional layer is to learn feature detectors instead of using handcrafted ones
  89. 99.50% on the MNIST test set. Current best: 99.77%, by a committee of 35 conv. nets. http://yann.lecun.com/exdb/mnist/
  90. How it evolved: More layers. C. Szegedy et al., “Going Deeper with Convolutions”, 2014
  92. ILSVRC 2015 winner: 152 (!) layers. K. He et al., “Deep Residual Learning for Image Recognition”, 2015
  93. How it evolved: Hyperparameters • Network: architecture, number of layers, number of units in each layer, type of the activation function, weight initialization • Convolutional layers: size, stride, number of filters • Optimization method: learning rate, other method-specific constants, …
  97. Tuning them: grid search :( • random search :/ • Bayesian optimization :) (Snoek, Larochelle, Adams, “Practical Bayesian Optimization of Machine Learning Algorithms”) • informal parameter search :)
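Random search is the easiest of these to sketch: sample configurations at random and keep the best-scoring one. Everything below is illustrative; in particular `evaluate` is a hypothetical stand-in for training a network and measuring validation accuracy, and the search ranges are invented.

```python
import math
import random

def evaluate(config):
    """Hypothetical objective: pretend accuracy peaks at lr = 0.01, 2 layers."""
    return (-abs(math.log10(config["lr"]) + 2)
            - 0.1 * abs(config["layers"] - 2))

def random_search(n_trials=50, rng=random):
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {
            "lr": 10 ** rng.uniform(-5, 0),        # sample lr on a log scale
            "layers": rng.randint(1, 5),
            "units": rng.choice([64, 128, 256, 512]),
        }
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

random.seed(42)
cfg, score = random_search()
```

Sampling the learning rate on a log scale matters: useful values span several orders of magnitude, and random search spends its budget on the sensitive dimensions instead of wasting a full grid axis on each.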
  100. How it evolved: Major types of ANNs. Feedforward, convolutional, recurrent, autoencoder
  101. What is the state now
  102. What is the state now: Computer vision. http://cs.stanford.edu/people/karpathy/deepimagesent/ • Kaiming He et al., “Deep Residual Learning for Image Recognition”, 2015
  103. What is the state now: Natural Language Processing. Speech recognition + translation • Facebook bAbI dataset: question answering http://smerity.com/articles/2015/keras_qa.html • http://karpathy.github.io/2015/05/21/rnn-effectiveness/
  104. What is the state now: AI. DeepMind’s DQN • Sukhbaatar et al., “MazeBase: A Sandbox for Learning from Games”, 2015
  106. What is the state now: Neuroscience. Güçlü and van Gerven, “Deep Neural Networks Reveal a Gradient in the Complexity of Neural Representations across the Ventral Stream”, 2015
  108. How can you use it
  109. How can you use it: Pre-trained models
  110. • Go to https://github.com/BVLC/caffe/wiki/Model-Zoo, pick a model and use it in your application • Or use part of it as the starting point for your own model
  113. How can you use it: Zoo of frameworks. Low-level; high-level & wrappers
  114. How can you use it: Keras. http://keras.io
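The Keras code on these slides did not survive the transcript, so here is a hedged sketch of what such a model looks like: an MNIST MLP with one 625-unit ReLU hidden layer and dropout, matching the architecture discussed earlier in the talk. It assumes the `keras` package is installed; the exact argument names shown follow the later Keras 2 Sequential API and may differ from the 2016-era version used in the talk, and the optimizer and batch settings are my choices.

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout

# 784 input pixels -> 625 ReLU hidden units -> 10 digit classes
model = Sequential([
    Dense(625, activation="relu", input_shape=(784,)),
    Dropout(0.5),                      # dropout regularization
    Dense(10, activation="softmax"),   # class probabilities
])
model.compile(optimizer="rmsprop",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Training would then be a single call (x_train etc. are your prepared data):
# model.fit(x_train, y_train, batch_size=128, epochs=10,
#           validation_data=(x_test, y_test))
```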
  121. • A Step by Step Backpropagation Example http://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example • “Neural Networks and Deep Learning”, an online book by Michael Nielsen http://neuralnetworksanddeeplearning.com • CS231n: Convolutional Neural Networks for Visual Recognition http://cs231n.stanford.edu/
