From Artificial Neural Networks to Deep Learning
Viet-Trung Tran

Perceptron
•  Rosenblatt 1957
•  Input signals x1, x2, …
•  Bias x0 = 1
•  Net input = weighted sum = Net(w,x)
•  Activation/transfer function = f(Net(w,x))
•  Output
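A minimal NumPy sketch of this perceptron (the weight values are illustrative, not from the slides):

```python
import numpy as np

def perceptron(x, w):
    """Rosenblatt perceptron: weighted sum with bias, then a step activation."""
    x = np.concatenate(([1.0], x))   # prepend the bias input x0 = 1
    net = np.dot(w, x)               # Net(w, x) = weighted sum
    return 1 if net >= 0 else 0      # hard-limiter / step function

# Illustrative usage with hand-picked weights (w[0] is the bias weight)
w = np.array([-0.5, 1.0, 1.0])
print(perceptron(np.array([0.0, 1.0]), w))   # -> 1
```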
Weighted Sum and Bias
•  Weighted sum
•  Bias 
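Written out, with the bias folded in as x0 = 1 (notation assumed; the slide shows this as a figure):

\[ \mathrm{Net}(\mathbf{w},\mathbf{x}) = \sum_{i=1}^{n} w_i x_i + w_0 = \sum_{i=0}^{n} w_i x_i, \qquad x_0 = 1 \]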
Hard-limiter function
•  Hard-limiter
– Threshold function
– Discontinuous function
– Discontinuous derivative
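In formula form, a common definition of the hard-limiter (threshold at 0 assumed here):

\[ f(\mathrm{Net}) = \begin{cases} 1 & \text{if } \mathrm{Net} \ge 0 \\ 0 & \text{otherwise} \end{cases} \]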

Threshold logic function
•  Saturating linear function
•  Continuous function
•  Discontinuous derivative
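One common parameterization of the saturating linear function (the exact thresholds on the slide may differ):

\[ f(\mathrm{Net}) = \begin{cases} 0 & \mathrm{Net} < 0 \\ \mathrm{Net} & 0 \le \mathrm{Net} \le 1 \\ 1 & \mathrm{Net} > 1 \end{cases} \]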
Sigmoid function
•  Most popular
•  Output (0,1)
•  Continuous derivatives
•  Easy to differentiate
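The sigmoid and its derivative, which is what makes it easy to differentiate:

\[ \sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma'(x) = \sigma(x)\,\big(1 - \sigma(x)\big) \]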
Artificial neural network – ANN structure
•  Number of input/output signals
•  Number of hidden layers
•  Number of neurons per layer
•  Neuron weights
•  Topology
•  Biases
Feed-forward neural network
•  Connections between the units do not form a directed cycle
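An illustrative forward pass through one hidden layer (sigmoid activations and layer sizes are assumptions, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Feed-forward pass: input -> hidden -> output, no cycles."""
    h = sigmoid(W1 @ x + b1)      # hidden layer activations
    return sigmoid(W2 @ h + b2)   # output layer activations

# Illustrative shapes: 3 inputs, 4 hidden units, 2 outputs
rng = np.random.default_rng(0)
x = rng.normal(size=3)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)
print(forward(x, W1, b1, W2, b2))
```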
Recurrent neural network
•  A class of artificial neural network where connections between units form a directed cycle
Why hidden layers
Neural network learning
•  2 types of learning
–  Parameter learning
•  Learn neuron weight connections
–  Structure learning
•  Learn ANN structure from training data
Error function
•  Consider an ANN with n neurons
•  For each learning example (x,d)
–  Training error caused by current weight w
•  Training error caused by w for entire learning examples
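A common squared-error form of the training error (an assumption; the slide gives the formula as a figure). For one example (x, d) with network outputs o_k, and for the whole training set D:

\[ E_{(\mathbf{x},\mathbf{d})}(\mathbf{w}) = \frac{1}{2}\sum_{k}\big(d_k - o_k(\mathbf{x},\mathbf{w})\big)^2, \qquad E(\mathbf{w}) = \sum_{(\mathbf{x},\mathbf{d}) \in D} E_{(\mathbf{x},\mathbf{d})}(\mathbf{w}) \]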
Learning principle
Neuron error gradients
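Written out (the slides show these as figures), the standard principle is to move each weight against its error gradient, with learning rate η:

\[ w_{ij} \leftarrow w_{ij} - \eta\,\frac{\partial E}{\partial w_{ij}} \]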
Parameter learning: back propagation of error
•  Calculate total error at the top
•  Calculate contributions to error at each step going backwards
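A minimal sketch of one back-propagation step for a two-layer sigmoid network with squared error (activations, error function, and sizes are assumptions; biases omitted for brevity):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, d, W1, W2, lr=0.1):
    """One gradient-descent step on a single training example (x, d)."""
    # Forward pass
    h = sigmoid(W1 @ x)                       # hidden activations
    o = sigmoid(W2 @ h)                       # output activations
    # Error at the top (output layer)
    delta_o = (o - d) * o * (1 - o)
    # Contribution of the error propagated back to the hidden layer
    delta_h = (W2.T @ delta_o) * h * (1 - h)
    # Gradient-descent weight updates
    W2 -= lr * np.outer(delta_o, h)
    W1 -= lr * np.outer(delta_h, x)
    return 0.5 * np.sum((d - o) ** 2)         # current training error

# Illustrative usage: the error shrinks over repeated steps
rng = np.random.default_rng(1)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
x, d = rng.normal(size=3), np.array([1.0, 0.0])
for _ in range(5):
    print(backprop_step(x, d, W1, W2))
```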
Back propagation discussion
•  Initial weights
•  Learning rate
•  Number of neurons per hidden layer
•  Number of hidden layers
Stochastic gradient descent (SGD)
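A sketch of the idea (names such as grad_error are placeholders): instead of summing gradients over the whole training set, SGD updates the weights after each randomly chosen example.

```python
import random

def sgd(weights, examples, grad_error, lr=0.01, epochs=10):
    """Stochastic gradient descent: one weight update per training example."""
    for _ in range(epochs):
        random.shuffle(examples)              # visit examples in random order
        for x, d in examples:
            g = grad_error(weights, x, d)     # gradient on this example only
            weights = [w - lr * gi for w, gi in zip(weights, g)]
    return weights
```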
Deep learning
Google Brain
GPU
Learning from tagged data
•  @Andrew Ng
2006 breakthrough
•  More data
•  Faster hardware: GPUs, multi-core CPUs
•  Working ideas on how to train deep architectures
Deep Learning trends
•  @Andrew Ng
AI will transform the internet
•  @Andrew Ng
•  Technology areas with potential for paradigm shift:
–  Computer vision
–  Speech recognition & speech synthesis
–  Language understanding: Machine translation; Web search; Dialog systems; …
–  Advertising
–  Personalization/recommendation systems
–  Robotics
•  All this is hard: scalability, algorithms.
Deep learning
CONVOLUTIONAL NEURAL NETWORK
http://colah.github.io/
Convolution
•  Convolution is a mathematical operation on two functions f and g, producing a third function that is typically viewed as a modified version of one of the original functions.
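In symbols, for continuous and discrete signals respectively:

\[ (f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\,g(t-\tau)\,d\tau, \qquad (f * g)[n] = \sum_{m} f[m]\,g[n-m] \]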
Convolutional neural networks
•  A ConvNet is a kind of neural network that uses many identical copies of the same neuron
–  Large number of neurons
–  Large computational models
–  Number of actual weights (parameters) to be learned remains fairly small
A 2D Convolutional Neural Network
•  A convolutional neural network can learn a neuron once and use it in many places, making it easier to learn the model and reducing error.
Structure of Conv Nets
•  Problem
–  Predict whether a human is speaking or not
•  Input: audio samples at different points in time
Simple approach
•  Just connect them all to a fully-connected layer
•  Then classify
A more sophisticated approach
•  Local properties of the data
–  Frequency of sounds (increasing/decreasing)
•  Look at a small window of the audio sample
–  Create a group of neurons A to compute certain features
–  The output of this convolutional layer is fed into a fully-connected layer, F
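A sketch of that idea in NumPy: the same neuron group A is slid over every small window, and its outputs are then fed to a fully-connected layer F (window size, feature count, and tanh activation are assumptions):

```python
import numpy as np

def conv_layer_1d(samples, A, window=9):
    """Apply the shared neuron group A to every small window of the audio."""
    outputs = []
    for start in range(len(samples) - window + 1):
        segment = samples[start:start + window]
        outputs.append(np.tanh(A @ segment))   # same weights A at every position
    return np.stack(outputs)                   # shape: (num_windows, num_features)

# Illustrative sizes: 100 audio samples, 4 features per window
rng = np.random.default_rng(2)
samples = rng.normal(size=100)
A = rng.normal(size=(4, 9))
features = conv_layer_1d(samples, A)
F = rng.normal(size=(1, features.size))        # fully-connected layer F
speaking_score = F @ features.ravel()          # classify "speaking or not"
```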
Max pooling layer
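A minimal 1D max-pooling sketch (pool size is illustrative): each non-overlapping window is reduced to its maximum, shrinking the feature map while keeping the strongest responses.

```python
import numpy as np

def max_pool_1d(features, pool=2):
    """Keep only the maximum within each non-overlapping window."""
    n = (len(features) // pool) * pool          # drop any ragged tail
    return features[:n].reshape(-1, pool).max(axis=1)

print(max_pool_1d(np.array([1.0, 3.0, 2.0, 5.0, 4.0])))   # -> [3. 5.]
```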
2D convolutional neural networks
Three-dimensional convolutional networks
Group of neurons: A
•  Bunch of neurons in parallel
•  All get the same inputs and compute different features
Network in Network (Lin et al., 2013)
Conv Nets breakthroughs in computer vision
•  Krizhevsky et al. (2012)
Different Levels of Abstraction
RECURRENT NEURAL NETWORKS
http://colah.github.io/
Recurrent Neural Networks (RNN) have loops
•  A loop allows information to be passed from one step of the network to the next.
Unroll RNN
•  Recurrent neural networks are intimately related to sequences and lists.
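Unrolled, every step applies the same weights to the current input and the previous hidden state; a standard formulation (assumed here, the slide shows a diagram) is:

\[ h_t = \tanh(W_{xh}\,x_t + W_{hh}\,h_{t-1} + b), \qquad y_t = W_{hy}\,h_t \]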
Examples
•  Predict the last word in “the clouds are in the sky”
•  The gap between the relevant information and the place that it’s needed is small
•  RNNs can learn to use the past information
•  “I grew up in France… I speak fluent French.”
•  As the gap grows, RNNs become unable to learn to connect the information.
LONG SHORT TERM MEMORY (LSTM) NETWORKS
LSTM networks
•  A special kind of RNN
•  Capable of learning long-term dependencies
•  Structured as a chain of repeating neural network modules
RNN
•  The repeating module has a very simple structure, such as a single tanh layer
•  The tanh(z) function is a rescaled version of the sigmoid, and its output range is (−1, 1) instead of (0, 1).
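Explicitly:

\[ \tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}} = 2\,\sigma(2z) - 1 \]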
LSTM networks
•  The repeating module consists of four neural network layers, interacting in a very special way
Core idea behind LSTMs
•  The key to LSTMs is the cell state, the horizontal line running through the top of the diagram
•  The cell state runs straight down the entire chain, with only some minor linear interactions
•  Easy for information to just flow along it unchanged
Gates
•  The ability to remove or add information to the cell state, carefully regulated by structures called gates
•  Sigmoid
–  How much of each component should be let through
–  Zero means let nothing through
–  One means let everything through
•  An LSTM has three of these gates
LSTM step 1
•  Decide what information we’re going to throw away from the cell state
•  Forget gate layer
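In the notation of colah.github.io (which these slides follow), the forget gate outputs a number in (0, 1) for each cell-state component:

\[ f_t = \sigma\big(W_f \cdot [h_{t-1}, x_t] + b_f\big) \]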
LSTM step 2
•  Decide what new information we’re going to store in the cell state
•  Input gate layer
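In the same notation, the input gate decides which values to update while a tanh layer proposes candidate values:

\[ i_t = \sigma\big(W_i \cdot [h_{t-1}, x_t] + b_i\big), \qquad \tilde{C}_t = \tanh\big(W_C \cdot [h_{t-1}, x_t] + b_C\big) \]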
LSTM step 3
•  Update the old cell state, Ct−1, into the new cell state Ct
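Combining the forget and input gates from the previous steps:

\[ C_t = f_t * C_{t-1} + i_t * \tilde{C}_t \]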
LSTM step 4
•  Decide what we’re going to output
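The output gate filters a tanh of the cell state to produce the new hidden state:

\[ o_t = \sigma\big(W_o \cdot [h_{t-1}, x_t] + b_o\big), \qquad h_t = o_t * \tanh(C_t) \]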
RECURRENT NEURAL NETWORKS WITH WORD EMBEDDINGS
APPENDIX
Perceptron 1957
Perceptron 1957
Perceptron 1986
Perceptron
Activation function
Back propagation 1974/1986
•  Inspired by the architectural depth of the brain, researchers wanted for decades to train deep multi-layer neural networks.
•  No successful attempts were reported before 2006 … Exception: convolutional neural networks, LeCun 1998
•  SVM: Vapnik and his co-workers developed the Support Vector Machine (1993) (a shallow architecture)
•  Breakthrough in 2006!
2006 breakthrough
•  More data
•  Faster hardware: GPUs, multi-core CPUs
•  Working ideas on how to train deep architectures
•  Beat state of the art in many areas:
–  Language Modeling (2012, Mikolov et al)
–  Image Recognition (Krizhevsky won 2012 ImageNet competition)
–  Sentiment Classification (2011, Socher et al)
–  Speech Recognition (2010, Dahl et al)
–  MNIST hand-written digit recognition (Ciresan et al, 2010)
Credits
•  Roelof Pieters, www.graph-technologies.com
•  Andrew Ng
•  http://colah.github.io/