- 1. From Artificial Neural Networks to Deep learning Viet-Trung Tran 1
- 2. 2
- 3. 3
- 4. 4
- 5. 5
- 6. Perceptron • Rosenblatt 1957 • input signals x1, x2, … • bias x0 = 1 • Net input = weighted sum = Net(w,x) • Activation/transfer func = f(Net(w,x)) • output: a step function applied to the weighted sum 6
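The perceptron's forward pass described above can be sketched in a few lines. The AND weights below are hand-picked for illustration and are not from the slides:

```python
# A minimal sketch of Rosenblatt's perceptron: weighted sum plus bias,
# passed through a hard-limiter (step) activation.

def step(net):
    """Hard-limiter: fires 1 if the net input is non-negative."""
    return 1 if net >= 0 else 0

def perceptron(x, w):
    """x: input signals x1, x2, ...; w: weights, with w[0] the bias weight for x0 = 1."""
    net = w[0] * 1 + sum(wi * xi for wi, xi in zip(w[1:], x))  # Net(w, x)
    return step(net)

# Example: a perceptron computing logical AND with hand-picked weights.
w_and = [-1.5, 1.0, 1.0]
print(perceptron([1, 1], w_and))  # 1
print(perceptron([0, 1], w_and))  # 0
```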
- 7. Weighted Sum and Bias • Weighted sum • Bias 7
- 8. 8
- 9. Hard-limiter function • Hard-limiter – Threshold function – Discontinuous function – Discontinuous derivative 9
- 10. Threshold logic function • Saturating linear function • Continuous function • Discontinuous derivative 10
- 11. Sigmoid function • Most popular • Output (0,1) • Continuous derivatives • Easy to differentiate 11
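A minimal sketch of the sigmoid and its derivative; the closed-form derivative s(z)·(1 − s(z)) is what makes it "easy to differentiate" during training:

```python
import math

def sigmoid(z):
    """Logistic function: continuous, output in the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    """The derivative factors through the output itself: s(z) * (1 - s(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

print(sigmoid(0))        # 0.5
print(sigmoid_prime(0))  # 0.25
```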
- 12. Artificial neural network – ANN structure • Number of input/output signals • Number of hidden layers • Number of neurons per layer • Neuron weights • Topology • Biases 12
- 13. Feed-forward neural network • connections between the units do not form a directed cycle 13
- 14. Recurrent neural network • A class of artificial neural network where connections between units form a directed cycle 14
- 15. Why hidden layers 15
- 16. Neural network learning • 2 types of learning – Parameter learning • Learn neuron weight connections – Structure learning • Learn ANN structure from training data 16
- 17. Error function • Consider an ANN with n neurons • For each learning example (x,d) – Training error caused by current weight w • Training error caused by w for entire learning examples 17
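The error definitions on this slide can be sketched with the common squared-error convention (the 1/2 factor is an assumption here; it is customary because it simplifies the derivative):

```python
# Sketch of the training-error definitions: error on one example, then
# the total error caused by the current weights over all examples.

def example_error(output, target):
    """Error caused by the current weights on one example (x, d)."""
    return 0.5 * (target - output) ** 2

def total_error(outputs, targets):
    """Training error caused by the weights over the entire set of examples."""
    return sum(example_error(o, d) for o, d in zip(outputs, targets))

print(total_error([0.8, 0.2], [1.0, 0.0]))  # 0.04
```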
- 18. Learning principle 18
- 19. Neuron error gradients 19
- 20. Parameter learning: back propagation of error • Calculate total error at the top • Calculate contributions to error at each step going backwards 20
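The two steps above (error at the top, contributions going backwards) can be sketched on a toy network with one scalar sigmoid hidden unit and one sigmoid output unit; weights, inputs, and the absence of biases are all illustrative simplifications:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Tiny illustrative network: input x -> one sigmoid hidden unit -> one
# sigmoid output unit, with scalar weights w1 and w2 (no biases, for brevity).
def forward(x, w1, w2):
    h = sigmoid(w1 * x)
    o = sigmoid(w2 * h)
    return h, o

# Backpropagation: compute the error term at the output, then each
# weight's contribution going backwards, using E = 1/2 * (d - o)^2.
def gradients(x, d, w1, w2):
    h, o = forward(x, w1, w2)
    delta_o = (o - d) * o * (1 - o)        # output-layer error term
    delta_h = delta_o * w2 * h * (1 - h)   # propagated back to the hidden layer
    return delta_o * h, delta_h * x        # dE/dw2, dE/dw1
```

The analytic gradients can be checked against a finite-difference approximation of the error, which is a standard sanity test for backpropagation code.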
- 21. Back propagation discussion • Initial weights • Learning rate • Number of neurons per hidden layers • Number of hidden layers 21
- 22. Stochastic gradient descent (SGD) 22
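SGD can be sketched on a bare linear neuron: update the weight after each randomly drawn example rather than after a full pass over the training set. The toy data and learning rate below are illustrative:

```python
import random

# Stochastic gradient descent sketch: one randomly chosen example per step.
# Model: a single linear neuron y = w * x, trained with squared error.

def sgd(data, w=0.0, lr=0.1, steps=200):
    for _ in range(steps):
        x, d = random.choice(data)   # draw one training example
        y = w * x                    # forward pass
        w -= lr * (y - d) * x        # step along the negative gradient of 1/2*(y-d)^2
    return w

random.seed(0)
data = [(x, 3.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]  # toy data with slope 3
print(round(sgd(data), 2))  # close to the target slope 3.0
```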
- 23. 23
- 24. Deep learning 24
- 25. Google brain 25
- 26. GPU 26
- 27. Learning from tagged data • @Andrew Ng 27
- 28. 2006 breakthrough • More data • Faster hardware: GPUs, multi-core CPUs • Working ideas on how to train deep architectures 28
- 29. 29
- 30. 30
- 31. 31
- 32. Deep Learning trends • @Andrew Ng 32
- 33. 33
- 34. 34
- 35. AI will transform the internet • @Andrew Ng • Technology areas with potential for paradigm shift: – Computer vision – Speech recognition & speech synthesis – Language understanding: Machine translation; Web search; Dialog systems; …. – Advertising – Personalization/recommendation systems – Robotics • All this is hard: scalability, algorithms. 35
- 36. 36
- 37. 37
- 38. 38
- 39. Deep learning 39
- 40. 40
- 41. CONVOLUTIONAL NEURAL NETWORK http://colah.github.io/ 41
- 42. Convolution • Convolution is a mathematical operation on two functions f and g, producing a third function that is typically viewed as a modified version of one of the original functions. 42
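The operation on this slide can be sketched for discrete 1D signals: slide one function over the other and sum the elementwise products at each offset (the "full" convolution):

```python
# Discrete 1D convolution: (f * g)[k] = sum over i of f[i] * g[k - i].

def conv1d(f, g):
    n, m = len(f), len(g)
    out = []
    for k in range(n + m - 1):
        s = 0.0
        for i in range(n):
            j = k - i
            if 0 <= j < m:          # only offsets where g is defined
                s += f[i] * g[j]
        out.append(s)
    return out

print(conv1d([1, 2, 3], [1, 1]))  # [1.0, 3.0, 5.0, 3.0] -- running pairwise sums
```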
- 43. Convolutional neural networks • Conv Nets are a kind of neural network that uses many identical copies of the same neuron – Large number of neurons – Large computational models – Number of actual weights (parameters) to be learned is fairly small 43
- 44. A 2D Convolutional Neural Network • a convolutional neural network can learn a neuron once and use it in many places, making it easier to learn the model and reducing error. 44
- 45. Structure of Conv Nets • Problem – predict whether a human is speaking or not • Input: audio samples at different points in time 45
- 46. Simple approach • just connect them all to a fully-connected layer • Then classify 46
- 47. A more sophisticated approach • Local properties of the data – frequency of sounds (increasing/decreasing) • Look at a small window of the audio sample – Create a group of neurons A to compute certain features – the output of this convolutional layer is fed into a fully-connected layer, F 47
- 48. 48
- 49. 49
- 50. Max pooling layer 50
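A max pooling layer can be sketched in one line: report the maximum in each window, which roughly says "this feature appeared somewhere in this region" while shrinking the layer. The window size of 2 below is an arbitrary illustrative choice:

```python
# Max pooling over non-overlapping windows of a 1D feature sequence.

def max_pool(xs, size=2):
    """Keep only the strongest activation in each window of `size` values."""
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]

print(max_pool([1, 9, 3, 4, 7, 2]))  # [9, 4, 7]
```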
- 51. 2D convolutional neural networks 51
- 52. 52
- 53. 53
- 54. Three-dimensional convolutional networks 54
- 55. Group of neurons: A • Bunch of neurons in parallel • all get the same inputs and compute different features. 55
- 56. Network in Network (Lin et al., 2013) 56
- 57. Conv Nets breakthroughs in computer vision • Krizhevsky et al. (2012) 57
- 58. Different Levels of Abstraction 58
- 59. 59
- 60. 60
- 61. RECURRENT NEURAL NETWORKS http://colah.github.io/ 61
- 62. Recurrent Neural Networks (RNN) have loops • A loop allows information to be passed from one step of the network to the next. 62
- 63. Unroll RNN • recurrent neural networks are intimately related to sequences and lists. 63
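Unrolling can be sketched directly: the same cell (the same weights) is applied at every time step, passing the hidden state from one step of the network to the next. The scalar weights below are illustrative:

```python
import math

# Unrolled RNN sketch: one copy of the same cell per time step,
# threading a hidden state h through the sequence.

def rnn(xs, w_in=0.5, w_rec=0.8, h0=0.0):
    h = h0
    states = []
    for x in xs:                          # step t of the unrolled loop
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states

states = rnn([1.0, 0.0, 0.0])
print(states)  # the first input's influence decays through later steps
```

The printed states shrink toward zero, a small-scale picture of why plain RNNs struggle once the gap to the relevant information grows.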
- 64. Examples • predict the last word in “the clouds are in the sky” • the gap between the relevant information and the place that it’s needed is small • RNNs can learn to use the past information 64
- 65. • “I grew up in France… I speak fluent French.” • As the gap grows, RNNs become unable to learn to connect the information. 65
- 66. LONG SHORT TERM MEMORY NETWORKS LSTM Networks 66
- 67. LSTM networks • A special kind of RNN • Capable of learning long-term dependencies • Structure in the form of a chain of repeating modules of neural network 67
- 68. RNN • repeating module has a very simple structure, such as a single tanh layer 68
- 69. • The tanh(z) function is a rescaled version of the sigmoid, and its output range is (−1, 1) instead of (0, 1). 69
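The rescaling relationship is an exact identity, tanh(z) = 2·sigmoid(2z) − 1, which a few lines of code can confirm:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# tanh as a rescaled sigmoid: stretch the (0, 1) output range to (-1, 1).
def tanh_via_sigmoid(z):
    return 2.0 * sigmoid(2.0 * z) - 1.0

print(tanh_via_sigmoid(1.0))  # matches math.tanh(1.0)
```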
- 70. LSTM networks • Repeating module consists of four neural network layers, interacting in a very special way 70
- 71. Core idea behind LSTMs • The key to LSTMs is the cell state, the horizontal line running through the top of the diagram. • The cell state runs straight down the entire chain, with only some minor linear interactions • Easy for information to just flow along it unchanged 71
- 72. Gates • The ability to remove or add information to the cell state, carefully regulated by structures called gates • Sigmoid – How much of each component should be let through – Zero means let nothing through – One means let everything through • An LSTM has three of these gates 72
- 73. LSTM step 1 • decide what information we’re going to throw away from the cell state • forget gate layer 73
- 74. LSTM step 2 • decide what new information we’re going to store in the cell state • input gate layer 74
- 75. LSTMs step 3 • update the old cell state, Ct−1, into the new cell state Ct 75
- 76. LSTMs step 4 • decide what we’re going to output 76
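The four steps above can be sketched as one LSTM step on scalar values. Real cells use weight matrices and bias vectors; the scalar weights in the dictionary below are illustrative placeholders:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One LSTM step on scalars, following the four slides above.
def lstm_step(x, h_prev, c_prev, w):
    f = sigmoid(w["f"] * x + w["uf"] * h_prev)          # step 1: forget gate
    i = sigmoid(w["i"] * x + w["ui"] * h_prev)          # step 2: input gate
    c_tilde = math.tanh(w["c"] * x + w["uc"] * h_prev)  # step 2: candidate values
    c = f * c_prev + i * c_tilde                        # step 3: update cell state Ct
    o = sigmoid(w["o"] * x + w["uo"] * h_prev)          # step 4: output gate
    h = o * math.tanh(c)                                # step 4: output
    return h, c

# Hand-picked illustrative weights (not trained).
w = {"f": 0.5, "uf": 0.1, "i": 0.5, "ui": 0.1,
     "c": 1.0, "uc": 0.2, "o": 0.5, "uo": 0.1}
h, c = lstm_step(1.0, 0.0, 0.0, w)
```

Note how the cell-state update in step 3 is purely linear in `c_prev`, which is the "easy for information to just flow along" property from the core-idea slide.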
- 77. 77
- 78. 78
- 79. 79
- 80. 80
- 81. RECURRENT NEURAL NETWORKS WITH WORD EMBEDDINGS 81
- 82. APPENDIX 82
- 83. 83
- 84. Perceptron 1957 84
- 85. Perceptron 1957 85
- 86. Perceptron 1986 86
- 87. Perceptron 87
- 88. Activation function 88
- 89. Back propagation 1974/1986 89
- 90. 90
- 91. 91
- 92. • Inspired by the architectural depth of the brain, researchers wanted for decades to train deep multi-layer neural networks. • No successful attempts were reported before 2006 … Exception: convolutional neural networks, LeCun 1998 • SVM: Vapnik and his co-workers developed the Support Vector Machine (1993) (shallow architecture). • Breakthrough in 2006! 92
- 93. 2006 breakthrough • More data • Faster hardware: GPUs, multi-core CPUs • Working ideas on how to train deep architectures 93
- 94. • Beat state of the art in many areas: – Language Modeling (2012, Mikolov et al) – Image Recognition (Krizhevsky won 2012 ImageNet competition) – Sentiment Classification (2011, Socher et al) – Speech Recognition (2010, Dahl et al) – MNIST hand-written digit recognition (Ciresan et al, 2010) 94
- 95. Credits • Roelof Pieters, www.graph-technologies.com • Andrew Ng • http://colah.github.io/ 95
