
Understanding Deep Learning


Video and slides synchronized; mp3 and slide download available at http://bit.ly/2QcNaOA.

Jessica Yung talks about the foundational concepts of neural networks. She highlights key things to pay attention to: learning rates, how to initialize a network, how networks are constructed and trained, and why these parameters are so important. She ends the talk with practical takeaways used in state-of-the-art models to help you start building neural networks. Filmed at qconlondon.com.

Jessica Yung is a research master's student in machine learning at University College London. She was previously at the University of Cambridge and was an NVIDIA Self-Driving Car Engineer Scholar. She applied machine learning to finance at Jump Trading and consults on machine learning. She is keen on sharing knowledge and writes about ML and how to learn effectively on her blog at JessicaYung.com.


Understanding Deep Learning

  1. 1. Understanding Deep Learning Jessica Yung
  2. 2. InfoQ.com: News & Community Site Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/understanding-deep-learning • Over 1,000,000 software developers, architects and CTOs read the site worldwide every month • 250,000 senior developers subscribe to our weekly newsletter • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • 2 dedicated podcast channels: The InfoQ Podcast, with a focus on Architecture and The Engineering Culture Podcast, with a focus on building • 96 deep dives on innovative topics packed as downloadable emags and minibooks • Over 40 new content items per week
  3. 3. Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide Presented at QCon London www.qconlondon.com
  4. 4. Time Price
  5. 5. ? Time Price
  6. 6. ? Time Price
  7. 7. learning rate
  8. 8. learning rate
  9. 9. def debug(): return 0 + "no"
  10. 10. 567 563m8dep j6lmj? _ g “?@”
  11. 11. Why deep learning?
  12. 12. (Super) human performance in
  13. 13. (Super) human performance in
  14. 14. (Super) human performance in
  15. 15. (Super) human performance in
  16. 16. Outline 1. Single layer and putting layers together 2. How to train models 3. Deeper is better 4. Practical tips
  17. 17. Understanding is not something you can Google.
  18. 18. Machine learning is engineering-driven.
  19. 19. I. Single Layer
  20. 20. Fully connected layer + Linear layer Non-linearity
  21. 21. A single layer Linear layer Non-linearity
  22. 22. Linear layer
  23. 23. Non-linearities Sigmoid ReLU
  24. 24. Non-linearities Sigmoid ReLU max(0, x)
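A minimal NumPy sketch of the single layer on slides 20-24: a linear map Wx + b followed by a non-linearity such as sigmoid or ReLU (max(0, x)). The layer sizes and random input below are illustrative assumptions, not from the talk.

import numpy as np

def sigmoid(x):
    # Squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # ReLU: max(0, x), applied elementwise
    return np.maximum(0.0, x)

# A single fully connected layer: linear layer, then non-linearity.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(3, 4))  # weights (3 units, 4 inputs; sizes assumed)
b = np.zeros(3)                         # biases
x = rng.normal(size=4)                  # one input example (assumed)

hidden = relu(W @ x + b)                # or sigmoid(W @ x + b)
print(hidden)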
  25. 25. Output layer
  26. 26. Sigmoid tanh
  27. 27. Sigmoid Linear layer
  28. 28. Sigmoid
  29. 29. MNIST dataset
  30. 30. [0.0003 0.047 0.95] Softmax(3) [-1, 4, 7]
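The numbers on slide 30 can be reproduced in a few lines: softmax exponentiates each score and normalises so the outputs sum to 1. A small sketch:

import numpy as np

def softmax(z):
    # Subtracting the max is a standard numerical-stability trick;
    # it does not change the result.
    e = np.exp(z - np.max(z))
    return e / e.sum()

print(softmax(np.array([-1.0, 4.0, 7.0])))
# -> roughly [0.0003 0.0474 0.9523], matching the slide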
  31. 31. Wider Deeper
  32. 32. Wider Deeper
  33. 33. Time of day Coffee needed Keynote Break Lunch Talk!
  34. 34. In
  35. 35. In Sigmoid
  36. 36. In
  37. 37. In Sophisticated feature Coffee needed
  38. 38. In Sophisticated feature Coffee needed Time of day
  39. 39. In Sophisticated feature Coffee needed Time of day
  40. 40. In Sophisticated feature Coffee needed Time of day
  41. 41. In Sophisticated feature Coffee needed Time of day
  42. 42. In Sophisticated feature Coffee needed Time of day
  43. 43. In Sophisticated feature Coffee needed Time of day
  44. 44. In Sophisticated feature Coffee needed Time of day
  45. 45. In Sophisticated feature Coffee needed Time of day
  46. 46. In
  47. 47. In
  48. 48. + =
  49. 49. + =
  50. 50. + = - =
  51. 51. + = - =
  52. 52. In
  53. 53. In
  54. 54. Time of day Coffee needed Keynote Break Lunch Talk!
  55. 55. Time of day Coffee needed Keynote Break Lunch Talk!
  56. 56. Time of day Coffee needed Keynote Break Lunch Talk!
  57. 57. Time of day Coffee needed Keynote Break Lunch Talk!
  58. 58. Wider Deeper
  59. 59. Foundational for CNNs and RNNs
  60. 60. Convolutional layers 1. Translation invariance 2. Locality
  61. 61. W1 W1 W1 Output
  62. 62. Convolutional layers Linear layer Non-linearity
  63. 63. Convolutional layers Linear layer Non-linearity Reshape
  64. 64. Convolutional layers Linear layer Non-linearity Reshape Reshape
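A hypothetical 1-D illustration of slides 60-64: a convolutional layer is still a linear layer plus a non-linearity, but the same small weight vector (the shared W1 on slide 61) is slid across the input, which gives locality and translation invariance. The filter width and toy input are assumptions:

import numpy as np

def conv1d(x, w, b=0.0):
    # Slide the same weight vector w across the input:
    # every output position reuses identical weights.
    k = len(w)
    return np.array([x[i:i + k] @ w + b for i in range(len(x) - k + 1)])

x = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0])  # toy 1-D input (assumed)
w = np.array([1.0, 0.0, -1.0])                     # one 3-wide filter (assumed)
print(np.maximum(0.0, conv1d(x, w)))               # linear layer + non-linearity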
  65. 65. Recurrent layers
  66. 66. Recurrent layers + Linear layer Non-linearity
  67. 67. Recurrent layers + Linear layer Non-linearity + Linear layer Non-linearity + Linear layer Non-linearity previous state/memory
  68. 68. Recurrent layers + Linear layer Non-linearity Input xt + Linear layer Non-linearity + Linear layer Non-linearity previous state/memory
  69. 69. Recurrent layers + Linear layer Non-linearity Input xt Output + Linear layer Non-linearity + Linear layer Non-linearity previous state/memory
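A minimal sketch of the recurrent step on slides 66-69, assuming a plain tanh RNN: the previous state/memory and the current input xt pass through the same linear layer plus non-linearity at every time step. Sizes are assumptions:

import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    # One recurrent step: current input plus previous state/memory
    # go through a linear layer, then a non-linearity.
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

rng = np.random.default_rng(0)
W_x = rng.normal(scale=0.1, size=(5, 3))  # input-to-state weights (sizes assumed)
W_h = rng.normal(scale=0.1, size=(5, 5))  # state-to-state weights
b = np.zeros(5)

h = np.zeros(5)                           # initial state/memory
for x_t in rng.normal(size=(4, 3)):       # a toy sequence of 4 inputs
    h = rnn_step(x_t, h, W_x, W_h, b)
print(h)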
  70. 70. II. How train?
  71. 71. Metric: ~Prediction error
  72. 72. Parameters
  73. 73. Parameters Reward
  74. 74. Parameters Reward
  75. 75. Parameters Reward
  76. 76. step size x gradient
  77. 77. Gradient descent
  78. 78. Gradient descent Parameters
  79. 79. Gradient descent Gradient Parameters
  80. 80. Gradient descent Learning rate (step size) Gradient Parameters
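The update behind slides 76-80 is: new parameters = old parameters minus learning rate (step size) times gradient. A toy sketch on an assumed one-dimensional loss:

# Gradient descent on a toy loss L(w) = (w - 3)^2 with gradient 2 * (w - 3).
# The loss, starting point and learning rate are illustrative assumptions.
def grad(w):
    return 2.0 * (w - 3.0)

w = 0.0              # parameter
learning_rate = 0.1  # step size
for _ in range(50):
    w = w - learning_rate * grad(w)  # parameters <- parameters - lr * gradient
print(w)             # approaches the minimum at w = 3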
  81. 81. Loss Iterations
  82. 82. Loss Iterations
  83. 83. gradient x step size Loss Iterations
  84. 84. gradient x step size Loss Iterations
  85. 85. gradient x step size Loss Iterations
  86. 86. Loss Iterations
  87. 87. Loss Iterations
  88. 88. gradient x step size Loss Iterations
  89. 89. gradient x step size Loss Iterations
  90. 90. gradient x step size Loss Iterations
  91. 91. Loss Iterations
  92. 92. Loss Iterations
  93. 93. Loss Iterations gradient x step size
  94. 94. Loss Iterations gradient x step size
  95. 95. Loss Iterations gradient x step size
  96. 96. Loss Iterations
  97. 97. Loss Iterations
  98. 98. gradient x step size Loss Iterations
  99. 99. gradient x step size Loss Iterations
  100. 100. gradient x step size Loss Iterations
  101. 101. Dense 1 Dense 2 Linear Output Input Softmax(3) Iris dataset
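A hedged Keras sketch of the slide-101 architecture for the Iris dataset (4 features, 3 classes). This is not the speaker's code; the hidden widths and optimizer are assumptions:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),  # Dense 1
    tf.keras.layers.Dense(16, activation="relu"),                    # Dense 2
    tf.keras.layers.Dense(3, activation="softmax"),                  # Linear + Softmax(3)
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=50)  # X_train: (n, 4) features, y_train: class ids 0-2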
  102. 102. How calculate gradient?
  103. 103. How calculate gradient?
  104. 104. Dense 1 Dense 2 Dense 10 … Output Input
  105. 105. Dense 1 Dense 2 Dense 10 … Output Input
  106. 106. Dense 1 Dense 2 Dense 10 … Output Input
  107. 107. Dense 1 Dense 2 Dense 10 … Output Input
  108. 108. Dense 1 Dense 2 Dense 10 … Output Input
  109. 109. Dense 1 Dense 2 Dense 10 … Output Input
  110. 110. Dense 1 Dense 2 Dense 10 … Output Input
  111. 111. Dense 1 Dense 2 Dense 10 … Output Input Backpropagation
  112. 112. Dense 1 Dense 2 Dense 10 … Output Input Backpropagation
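What slides 104-112 animate: backpropagation applies the chain rule layer by layer, from the output back towards the input, to get the gradient of the loss with respect to every weight. A minimal sketch with assumed shapes and a squared-error loss:

import numpy as np

# Tiny 2-layer network with a squared-error loss; shapes and data are assumptions.
rng = np.random.default_rng(0)
x, y = rng.normal(size=3), rng.normal(size=2)
W1 = rng.normal(scale=0.1, size=(4, 3))
W2 = rng.normal(scale=0.1, size=(2, 4))

# Forward pass, keeping intermediate values for the backward pass.
z1 = W1 @ x
h = np.maximum(0.0, z1)          # ReLU
y_hat = W2 @ h
loss = 0.5 * np.sum((y_hat - y) ** 2)

# Backward pass: chain rule, layer by layer, from output to input.
d_yhat = y_hat - y               # dL/dy_hat
dW2 = np.outer(d_yhat, h)        # dL/dW2
d_h = W2.T @ d_yhat              # dL/dh
d_z1 = d_h * (z1 > 0)            # through the ReLU
dW1 = np.outer(d_z1, x)          # dL/dW1
print(dW1.shape, dW2.shape)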
  113. 113. Dense 1 Dense 2 Dense 3 Output Input
  114. 114. Dense 1 Dense 2 Dense 3 Output Input
  115. 115. A single layer
  116. 116. A single layer Linear layer Non-linearity
  117. 117. A single layer Linear layer Non-linearity x
  118. 118. A single layer Linear layer Non-linearity x
  119. 119. Initialise weights with SMALL variance!
  120. 120. Initialise weights with SMALL variance! e.g. Glorot initialisation
  121. 121. A single layer Linear layer Non-linearity x
  122. 122. A single layer Linear layer Non-linearity x
  123. 123. A single layer Linear layer Non-linearity x
  124. 124. Initialise weights with SMALL variance!
  125. 125. Initialise weights with SMALL variance! e.g. Glorot initialisation
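Slides 119-125 say to initialise weights with small variance; Glorot initialisation picks that variance from the layer's fan-in and fan-out. A sketch of the uniform variant (layer sizes are assumptions):

import numpy as np

def glorot_uniform(fan_in, fan_out, seed=0):
    # Glorot (Xavier) initialisation: weights drawn uniformly from
    # [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)),
    # keeping activation variance roughly constant from layer to layer.
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return np.random.default_rng(seed).uniform(-limit, limit, size=(fan_out, fan_in))

W = glorot_uniform(fan_in=256, fan_out=128)  # layer sizes are assumptions
print(W.std())                               # small variance, as the slides advise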
  126. 126. III. Deeper is better
  127. 127. ImageNet
  128. 128. ImageNet leopard
  129. 129. sabre-toothed tiger snow leopard cat ImageNet leopard
  130. 130. sabre-toothed tiger snow leopard cat ImageNet leopard tree shrew
  131. 131. sabre-toothed tiger snow leopard cat ImageNet leopard tree shrew radiator
  132. 132. Revolution of Depth: top models on ImageNet by number of layers and year: AlexNet (2012), VGG (2014), GoogLeNet (2014), ResNet (2015). Data from Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun, ‘Deep Residual Learning for Image Recognition’, CVPR 2016.
  133. 133. Learning rate (step size) Gradient Parameters
  134. 134. Vanishing gradients
  135. 135. Exploding gradients
  136. 136. Vanishing gradients
  137. 137. Vanishing gradients
  138. 138. Vanishing gradients
  139. 139. Skip connections Output Dense 1 Dense 2 Input ResNets (He et al., 2015)
  140. 140. Skip connections Output Dense 1 Dense 2 Input + ResNets (He et al., 2015)
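A minimal sketch of the skip connection on slides 139-140, with assumed layer sizes: the block's input is added back to the output of the two dense layers, so gradients can flow through the addition even when the layers' own gradients are small:

import numpy as np

def residual_block(x, W1, b1, W2, b2):
    # Skip connection: the block's input is added back to its output, so
    # gradients can flow straight through the "+" even when the dense
    # layers contribute very small gradients.
    h = np.maximum(0.0, W1 @ x + b1)   # Dense 1
    h = W2 @ h + b2                    # Dense 2 (linear part)
    return np.maximum(0.0, h + x)      # add the input, then the non-linearity

rng = np.random.default_rng(0)
W1, W2 = rng.normal(scale=0.1, size=(8, 8)), rng.normal(scale=0.1, size=(8, 8))
b1, b2 = np.zeros(8), np.zeros(8)      # sizes assumed; the skip needs matching shapes
print(residual_block(rng.normal(size=8), W1, b1, W2, b2))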
  141. 141. Revolution of Depth: top models on ImageNet by number of layers and year: AlexNet (2012), VGG (2014), GoogLeNet (2014), ResNet (2015). Data from Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun, ‘Deep Residual Learning for Image Recognition’, CVPR 2016.
  142. 142. IV. Practical Tips
  143. 143. Overfit First
  144. 144. Model a few examples
  145. 145. Batchnorm
  146. 146. Output Dense 1 Dense 2 Input Batchnorm Batchnorm Output Dense 1 Dense 2 Input Ioffe and Szegedy, 2015
  147. 147. Output Dense 1 Dense 2 Input Batchnorm Batchnorm Ioffe and Szegedy, 2015
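A training-time sketch of the batch normalisation inserted between layers on slides 145-147 (Ioffe and Szegedy, 2015). At test time, running averages replace the batch statistics; the toy batch below is an assumption:

import numpy as np

def batchnorm(x, gamma, beta, eps=1e-5):
    # Normalise each feature over the batch to zero mean and unit variance,
    # then let the network rescale (gamma) and shift (beta) as it learns.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.random.default_rng(0).normal(loc=5.0, scale=2.0, size=(4, 3))  # toy batch
out = batchnorm(x, gamma=np.ones(3), beta=np.zeros(3))
print(out.mean(axis=0), out.std(axis=0))  # roughly 0 and 1 per feature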
  148. 148. Transfer Learning
  149. 149. Based on Zeiler and Fergus, 2013: Low Mid High
  150. 150. Trained network Last layer(s) e.g. ResNet
  151. 151. Trained network Last layer(s) Mid/high level features e.g. ResNet
  152. 152. Trained network Last layer(s) Mid/high level features e.g. ResNet
  153. 153. Trained network Last layer(s) New Mid/high level features e.g. ResNet
  154. 154. Trained network New output Last layer(s) New Mid/high level features e.g. ResNet
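A hedged Keras sketch of slides 150-154 (not from the talk): load a trained network such as ResNet, freeze it so its mid/high-level features are reused, and train only a new last layer. num_classes is an assumption:

import tensorflow as tf

base = tf.keras.applications.ResNet50(include_top=False, weights="imagenet",
                                      pooling="avg")  # trained network
base.trainable = False                 # freeze it: keep the learned features

num_classes = 5                        # assumed size of the new task
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(num_classes, activation="softmax"),  # new last layer
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")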
  155. 155. Simplify
  156. 156. VGG-19 2 x conv-64 pool/2 2 x conv-128 pool/2 4 x conv-256 pool/2 4 x conv-512 pool/2 4 x conv-512 pool/2 3 x dense softmax Simonyan and Zisserman, 2014
  157. 157. Dense 1
  158. 158. Dense 2 Linear Softmax(3) Dense 1
  159. 159. Output Input Dense 2 Linear Softmax(3) Dense 1
  160. 160. Time and Effort Model performance
  161. 161. Time and Effort Model performance
  162. 162. Time and Effort Model performance
  163. 163. Thank You!
  164. 164. Appendix
  165. 165. Resources on Coding Neural Networks • Jason Brownlee’s Machine Learning Mastery (lots of code) • My blog posts on training models in TensorFlow (training a model, MLP, CNN)
  166. 166. Resources on Deep Learning • THE Deep Learning book (Ian Goodfellow, Yoshua Bengio, Aaron Courville) • Stanford’s course on Convolutional Neural Networks • fast.ai courses on neural networks (haven’t tried much myself but I’ve heard they are good)
  167. 167. Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/understanding-deep-learning
