Recurrent Neural Networks I (D2L2 Deep Learning for Speech and Language UPC 2017)

https://telecombcn-dl.github.io/2017-dlsl/

Winter School on Deep Learning for Speech and Language. UPC BarcelonaTech ETSETB TelecomBCN.

The aim of this course is to train students in methods of deep learning for speech and language. Recurrent Neural Networks (RNNs) will be presented and analyzed in detail to understand the potential of these state-of-the-art tools for time-series processing. Engineering tips and scalability issues will be addressed to solve tasks such as machine translation, speech recognition, speech synthesis or question answering. Hands-on sessions will provide development skills so that attendees can become competent in contemporary data analytics tools.

Recurrent Neural Networks I (D2L2 Deep Learning for Speech and Language UPC 2017)

  1. [course site] Day 2 Lecture 2. Recurrent Neural Networks I. Santiago Pascual
  2. Outline: 1. The importance of context 2. Where is the memory? 3. Vanilla RNN 4. Problems 5. Gating methodology: a. LSTM, b. GRU
  3. The importance of context ● Recall the 5th digit of your phone number ● Sing your favourite song starting at its third sentence ● Recall the 10th character of the alphabet. You probably went straight from the beginning of the stream in each case… because in sequences, order matters! Idea: retain the information while preserving the importance of order.
  4. Recall from Day 1 Lecture 2… FeedFORWARD: every output y / hidden activation hi is computed from the sequence of forward activations out of the input x.
  5. Where is the Memory? If we have a sequence of samples, predict sample x[t+1] knowing the previous values {x[t], x[t-1], x[t-2], …, x[t-τ]}.
  6. Where is the Memory? Feed Forward approach: ● static window of size L ● slide the window time-step wise. [Figure: the window {x[t-L], …, x[t-1], x[t]} of length L is fed to the network to predict x[t+1]]
  7. Where is the Memory? Feed Forward approach: ● static window of size L ● slide the window time-step wise. [Figure: the window shifted one step, {x[t-L+1], …, x[t], x[t+1]}, now predicts x[t+2]]
  8. Where is the Memory? Feed Forward approach: ● static window of size L ● slide the window time-step wise. [Figure: the window shifted again, {x[t-L+2], …, x[t+1], x[t+2]}, now predicts x[t+3]]
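     A rough sketch of this sliding-window baseline (the toy signal, window size L and layer sizes below are made-up choices, and the calls use a recent tf.keras API rather than the 2017 Keras release used in the course):

        # Sliding-window feed-forward baseline (illustrative sketch only:
        # the toy signal, window size and layer sizes are made up).
        import numpy as np
        from tensorflow import keras

        L = 10
        x = np.sin(np.arange(0, 100, 0.1))      # toy 1-D signal

        # Build (window, next sample) pairs by sliding the window one step at a time.
        X = np.stack([x[t - L + 1:t + 1] for t in range(L - 1, len(x) - 1)])
        y = np.array([x[t + 1] for t in range(L - 1, len(x) - 1)])

        model = keras.Sequential([
            keras.layers.Dense(32, activation='tanh', input_shape=(L,)),
            keras.layers.Dense(1)               # predicts x[t+1] from the L previous samples
        ])
        model.compile(optimizer='adam', loss='mse')
        model.fit(X, y, epochs=5, verbose=0)

     Note that each window is treated as an independent example: nothing learned at time t is carried to time t+1 except through the shared weights, which is exactly the limitation the next slide points out.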
  9. Where is the Memory? Problems with the feed-forward + static-window approach: ● What happens as we increase L? → Fast growth in the number of parameters! ● Decisions are independent between time-steps! ○ The network does not care about what happened at the previous time-step; only the present window matters → not good ● Cumbersome padding when there are not enough samples to fill a window of size L ○ Cannot work with variable sequence lengths. [Figure: successive windows of growing length, x1, …, xL, then x1, …, x2L, then x1, …, x3L]
  10. Recurrent Neural Network. Solution: build specific connections capturing the temporal evolution → shared weights in time ● Give volatile memory to the network. [Figure: Feed-Forward vs. Fully-Connected views. Figure credit: Xavi Giro]
  11. Recurrent Neural Network. Solution: build specific connections capturing the temporal evolution → shared weights in time ● Give volatile memory to the network ● Fully connected in time: the recurrent matrix. [Figure: Feed-Forward vs. Fully-Connected views. Figure credit: Xavi Giro]
  12. Recurrent Neural Network. [Figure: the network rotated 90° along the time axis, shown in Front View and Side View. Slide credit: Xavi Giro]
  13. Recurrent Neural Network. Hence we have two data flows, forward in space + propagation in time: 2 projections per layer activation. BEWARE: we have extra depth now! Every time-step is an extra level of depth (like a deeper stack of layers in a feed-forward fashion!)
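     In equations (standard vanilla-RNN notation; the slides only name the recurrent matrix U, so W, V and the biases are the usual symbols for the input and output projections):

        h_t = \tanh\left(W x_t + U h_{t-1} + b\right), \qquad y_t = g\left(V h_t + c\right)

     The term W x_t is the forward-in-space projection and U h_{t-1} is the forward-in-time projection: the two projections per layer activation mentioned above.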
  14. Recurrent Neural Network. Hence we have two data flows, forward in space + propagation in time: 2 projections per layer activation ○ The last time-step includes the context of our decisions recursively.
  18. Bidirectional RNN (BRNN). Alex Graves, “Supervised Sequence Labelling with Recurrent Neural Networks”. Must learn weights w2, w3, w4 & w5, in addition to w1 & w6. Slide credit: Xavi Giro
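     A minimal Keras sketch of a bidirectional recurrent layer (layer sizes and dimensions are illustrative; Keras keeps the forward and backward weight sets internally, so the extra weights in Graves' figure do not have to be wired by hand):

        from tensorflow import keras

        # One RNN reads the sequence forward in time, another reads it backward;
        # their hidden states are concatenated at every time-step.
        brnn = keras.Sequential([
            keras.layers.Bidirectional(
                keras.layers.SimpleRNN(64, return_sequences=True),
                input_shape=(None, 40)),        # (time-steps, input_dim), both made up
            keras.layers.TimeDistributed(keras.layers.Dense(10, activation='softmax'))
        ])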
  19. Recurrent Neural Network. Back-Propagation Through Time (BPTT): the training method has to take the time operations into account → a cost function E is defined to train our RNN, and in this case the total error at the output of the network is the sum of the errors at each time-step, E = E_1 + E_2 + … + E_T, where T is the maximum number of time-steps to back-propagate through. In Keras this is specified when defining the “input shape” of the RNN layer, by means of (batch size, sequence length (T), input_dim).
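     For concreteness, a sketch of that Keras convention with a plain recurrent layer (T, input_dim and the layer width are illustrative numbers; the calls use current tf.keras names):

        from tensorflow import keras

        T, input_dim = 20, 40                   # illustrative values
        model = keras.Sequential([
            # input_shape omits the batch dimension: (sequence length T, input_dim).
            # T also bounds how many time-steps BPTT unrolls the recurrence over.
            keras.layers.SimpleRNN(128, input_shape=(T, input_dim)),
            keras.layers.Dense(1)
        ])
        model.compile(optimizer='adam', loss='mse')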
  20. Recurrent Neural Network. Main problems: ● Long-term memory (remembering time-steps quite far back) vanishes quickly because of the recursive operation with U ● During training, gradients explode/vanish easily because of the depth in time → exploding/vanishing gradients!
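     A short sketch of why, using the vanilla recurrence h_t = tanh(W x_t + U h_{t-1} + b) from above: back-propagating from time-step t to an earlier time-step k chains the Jacobians

        \frac{\partial h_t}{\partial h_k} = \prod_{i=k+1}^{t} \frac{\partial h_i}{\partial h_{i-1}}
                                          = \prod_{i=k+1}^{t} \operatorname{diag}\left(1 - h_i^{2}\right) U

     so the gradient scales roughly with the (t - k)-th power of U: singular values below 1 make it vanish exponentially, values above 1 make it explode.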
  21. Gating method. Solutions: 1. Change the way in which past information is kept → create the notion of a cell state, a memory unit that keeps long-term information in a safer way by protecting it from recursive operations. 2. Make every RNN unit able to decide whether the current time-step information matters or not, accepting or discarding it (an optimized reading mechanism). 3. Make every RNN unit able to forget whatever may not be useful anymore by clearing that info from the cell state (an optimized clearing mechanism). 4. Make every RNN unit able to output its decisions whenever it is ready to do so (an optimized output mechanism).
  22. Long Short-Term Memory (LSTM). Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural Computation 9, no. 8 (1997): 1735-1780. Slide credit: Xavi Giro
  23. Long Short-Term Memory (LSTM) cell. An LSTM cell is defined by two groups of neurons plus the cell state (memory unit): 1. Gates, 2. Activation units, 3. Cell state. [Figure: computation flow]
  24. Long Short-Term Memory (LSTM). Three gates, governed by sigmoid units (with values between 0 and 1), control the flow of information in and out of the cell. Figure: Christopher Olah, “Understanding LSTM Networks” (2015). Slide credit: Xavi Giro
  25. Long Short-Term Memory (LSTM). Forget gate, operating on the concatenation of h(t-1) and x(t). Figure: Christopher Olah, “Understanding LSTM Networks” (2015) / Slide: Alberto Montes
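     In Olah's notation (with [h_{t-1}, x_t] the concatenation the slide refers to):

        f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)

     Each component of f_t lies between 0 (completely forget that cell-state entry) and 1 (keep it entirely).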
  26. Long Short-Term Memory (LSTM). Input gate layer; the new contribution to the cell state is computed by a classic (tanh) neuron. Figure: Christopher Olah, “Understanding LSTM Networks” (2015) / Slide: Alberto Montes
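     Again in Olah's notation: the input gate i_t is the sigmoid unit that decides what to write, and the candidate contribution is the “classic” tanh neuron:

        i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right), \qquad
        \tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)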
  27. Long Short-Term Memory (LSTM). Update the cell state (memory). Figure: Christopher Olah, “Understanding LSTM Networks” (2015) / Slide: Alberto Montes
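     The update combines both gates element-wise: forget part of the old state, then add the gated new contribution:

        C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t

     Because the state is carried along this additive path rather than through repeated multiplications with U, long-term information is protected, which is exactly point 1 of the gating solutions above.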
  28. Long Short-Term Memory (LSTM). Output gate layer: the output passed to the next layer (and to the next time-step). Figure: Christopher Olah, “Understanding LSTM Networks” (2015) / Slide: Alberto Montes
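     Finally, the output gate decides how much of the (squashed) cell state is exposed as the hidden state h_t:

        o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right), \qquad
        h_t = o_t \odot \tanh(C_t)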
  29. Long Short-Term Memory (LSTM). Figure: Christopher Olah, “Understanding LSTM Networks” (2015) / Slide: Alberto Montes
  30. Long Short-Term Memory (LSTM) cell. An LSTM cell is defined by two groups of neurons plus the cell state (memory unit): 1. Gates, 2. Activation units, 3. Cell state → 3 gates + input activation.
  31. Gated Recurrent Unit (GRU). Cho, Kyunghyun, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." EMNLP 2014. Similar performance to the LSTM with less computation. Slide credit: Xavi Giro
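     For reference, the GRU equations in the same notation (following Cho et al. as summarised in Olah's write-up; biases omitted as in his figures): an update gate z_t, a reset gate r_t, and a single hidden state that plays the role of both cell state and output:

        z_t = \sigma\left(W_z \cdot [h_{t-1}, x_t]\right), \qquad
        r_t = \sigma\left(W_r \cdot [h_{t-1}, x_t]\right)

        \tilde{h}_t = \tanh\left(W \cdot [r_t \odot h_{t-1}, x_t]\right), \qquad
        h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t

     Merging the forget and input gates into the single update gate z_t is where the computational saving over the LSTM comes from.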
  33. Gated Recurrent Unit (GRU). Figure: Christopher Olah, “Understanding LSTM Networks” (2015) / Slide: Alberto Montes
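     At the framework level, swapping the vanilla recurrence for either gated cell is a one-line change; a hedged Keras sketch with illustrative sizes:

        from tensorflow import keras

        T, input_dim = 20, 40                   # illustrative values
        lstm_model = keras.Sequential([
            keras.layers.LSTM(128, input_shape=(T, input_dim)),
            keras.layers.Dense(1)
        ])
        gru_model = keras.Sequential([
            keras.layers.GRU(128, input_shape=(T, input_dim)),   # fewer parameters per unit
            keras.layers.Dense(1)
        ])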
  34. Thanks! Q&A? @Santty128
