Recurrent Neural
Networks (RNN)
Giuseppe Attardi
Some slides from Arun Mallya
 RNNs are called recurrent because they perform
the same task for every element of a sequence,
with the output depending on the previous
values.
 RNNs have a “memory” which captures
information about what has been calculated so
far.
 In theory RNNs can make use of information
in arbitrarily long sequences.
Recurrent Neural Network
 The hidden state st represents the memory of the
network.
 st captures information about what happened in all
the previous time steps.
 The output ot is calculated solely based on the
memory at time t.
 st typically can’t capture information from too many
time steps ago.
 Unlike a traditional deep neural network, which uses
different parameters at each layer, an RNN shares the
same parameters (U, V, W above) across all steps.
 This reflects the fact that we are performing the same
task at each step, just with different inputs. This
greatly reduces the total number of parameters we
need to learn.
 RNN Advantages:
 Can process any length input
 Model size doesn’t increase for longer input
 Computation for step t can (in theory) use information
from many steps back
 Weights are shared across timesteps →
representations are shared
 RNN Disadvantages:
 Recurrent computation is slow
 In practice, difficult to access information from many
steps back
Slide by Richard Socher
How do RNNs reduce complexity?
 Given function f: h’, y = f(h, x)
[Diagram: the same f is applied at every step: (h0, x1) → (h1, y1), (h1, x2) → (h2, y2), (h2, x3) → (h3, y3), …]
No matter how long the input/output sequence is, we only
need one function f. If f’s are different, then it becomes a
feedforward NN.
h and h’ are vectors with
the same dimension
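A minimal NumPy sketch of this idea (not from the slides; the dimensions and weight names are illustrative): a single step function f with fixed parameters is applied at every timestep, so the sequence can have any length.

```python
import numpy as np

def step(h, x, Wh, Wi, Wo, b):
    """One application of the shared function f: (h, x) -> (h', y)."""
    h_new = np.tanh(Wh @ h + Wi @ x + b)    # next hidden state h'
    y = Wo @ h_new                          # output computed from h'
    return h_new, y

rng = np.random.default_rng(0)
d_h, d_x, d_y = 4, 3, 2                     # illustrative dimensions
Wh = rng.normal(size=(d_h, d_h))
Wi = rng.normal(size=(d_h, d_x))
Wo = rng.normal(size=(d_y, d_h))
b = np.zeros(d_h)

h = np.zeros(d_h)                           # h0
for x in rng.normal(size=(6, d_x)):         # a sequence of any length
    h, y = step(h, x, Wh, Wi, Wo, b)        # same f, same parameters, every step
```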
Deep RNN
RNNs with multiple hidden layers
[Diagram: stacked recurrent layers processing inputs x1…x6 and producing outputs y1…y6]
Slide by Arun Mallya
Bidirectional RNN
 RNNs can process the input sequence in both the forward
and the reverse direction
[Diagram: forward and backward recurrent layers over x1…x6 producing y1…y6]
 Popular in speech recognition
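A small NumPy sketch of the bidirectional idea under the same illustrative assumptions: one pass reads the sequence left to right, a second pass with its own parameters reads it right to left, and the two hidden states are concatenated at each position.

```python
import numpy as np

def run_rnn(xs, W, U, b):
    """Simple tanh RNN; returns the hidden state at every timestep."""
    h = np.zeros(W.shape[0])
    states = []
    for x in xs:
        h = np.tanh(W @ h + U @ x + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
T, d_x, d_h = 6, 3, 4
xs = rng.normal(size=(T, d_x))

# Forward and backward passes use separate parameters.
Wf, Uf, bf = rng.normal(size=(d_h, d_h)), rng.normal(size=(d_h, d_x)), np.zeros(d_h)
Wb, Ub, bb = rng.normal(size=(d_h, d_h)), rng.normal(size=(d_h, d_x)), np.zeros(d_h)

h_fwd = run_rnn(xs, Wf, Uf, bf)                # processes x1 ... x6
h_bwd = run_rnn(xs[::-1], Wb, Ub, bb)[::-1]    # processes x6 ... x1, then realigned
h_bi = np.concatenate([h_fwd, h_bwd], axis=1)  # one (T, 2*d_h) representation
```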
Pyramid RNN
 Reducing the number of time steps
W. Chan, N. Jaitly, Q. Le and O. Vinyals, “Listen, attend and spell: A neural
network for large vocabulary conversational speech recognition,” ICASSP, 2016
[Diagram: a pyramid of stacked bidirectional RNN layers that progressively reduce the number of time steps]
Significantly speeds up training.
Vanilla RNN
[Diagram: the cell f maps (h, x) to (h′, y); the weights Wh, Wi, Wo act on h, x and h′ respectively]
Note: y is computed from h′.
h′ = σ(Wh h + Wi x + b)
y = σ(Wo h′)
Vanilla RNN
Unfolding the network in time
ht = σ(Wih xt + U ht−1 + b)
yt = σ(Wo ht)
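A minimal NumPy sketch of these two equations unrolled over a sequence (the dimensions and the choice of the logistic σ are illustrative assumptions):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
d_x, d_h, d_y, T = 3, 5, 2, 4
W_ih = rng.normal(size=(d_h, d_x))
U    = rng.normal(size=(d_h, d_h))
W_o  = rng.normal(size=(d_y, d_h))
b    = np.zeros(d_h)

xs = rng.normal(size=(T, d_x))
h = np.zeros(d_h)                         # initial hidden state
ys = []
for x_t in xs:                            # unfolding the network in time
    h = sigmoid(W_ih @ x_t + U @ h + b)   # ht = σ(Wih xt + U ht-1 + b)
    ys.append(sigmoid(W_o @ h))           # yt = σ(Wo ht)
```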
RNN Training
Tasks:
• For each timestep of the input sequence x, predict an output y (synchronously)
• For the input sequence x, predict a scalar value y (e.g., at the end of the sequence)
• For the input sequence x of length Lx, generate an output sequence y of a different length Ly
Methods:
• Backpropagation: reliable and controlled convergence; supported by most ML frameworks
• Research: evolutionary methods, expectation maximization, non-parametric methods, particle swarm optimization
Target: obtain the network parameters that optimize the cost function.
Cost functions: log loss, mean squared error, etc.
1. Unfold the network.
2. Repeat over the training data:
   1. Given an input sequence x
   2. For t in 0 … N−1:
      1. Forward-propagate
      2. Initialize the hidden state to the past value ht−1
      3. Obtain the output sequence y
      4. Calculate the error E(y, ŷ)
      5. Back-propagate the error across the unfolded network
      6. Average the weights
      7. Compute the next hidden state value ht
Back-propagation Through Time
ht = σ(Wih xt + U ht−1 + b)
yt = σ(Wo ht)
Cross-entropy loss: E(y, ŷ) = − Σt yt log ŷt
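A hedged PyTorch sketch of this training procedure for the first task above (predict an output at every timestep); the model size, data shapes and optimizer are illustrative assumptions, and the framework unrolls the network and back-propagates through time automatically:

```python
import torch
import torch.nn as nn

# Illustrative toy setup: an output is predicted at every timestep.
d_x, d_h, d_y, T, batch = 8, 16, 4, 10, 32
rnn  = nn.RNN(input_size=d_x, hidden_size=d_h, batch_first=True)
head = nn.Linear(d_h, d_y)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(list(rnn.parameters()) + list(head.parameters()), lr=0.1)

x = torch.randn(batch, T, d_x)              # input sequences
y = torch.randint(0, d_y, (batch, T))       # a target label at each timestep

for epoch in range(5):                      # repeat over the training data
    h0 = torch.zeros(1, batch, d_h)         # initial hidden state
    out, _ = rnn(x, h0)                     # forward pass through the unfolded network
    logits = head(out)                      # shape (batch, T, d_y)
    loss = criterion(logits.reshape(-1, d_y), y.reshape(-1))
    optimizer.zero_grad()
    loss.backward()                         # back-propagation through time
    optimizer.step()
```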
Apply the chain rule (θ denotes the network parameters). For time 2:
∂h2/∂h0 = (∂h2/∂h1) · (∂h1/∂h0)
Back-propagation Through Time
∂E2/∂θ = Σ (k = 0 … 2) (∂E2/∂y2) · (∂y2/∂h2) · (∂h2/∂hk) · (∂hk/∂θ)
[Diagram: the network unfolded over timesteps 0, 1, 2 with inputs x0, x1, x2; hidden states hinit → h0 → h1 → h2; outputs ŷ0, ŷ1, ŷ2; per-step errors E0, E1, E2]
Back-propagation Through Time
[Diagram: the same unfolded network, with the per-step gradients ∂E0/∂h0, ∂E1/∂h1, ∂E2/∂h2 and the recurrent factors ∂h2/∂h1 and ∂h1/∂h0 flowing backward through time]
Saturation: gradient close to 0
Saturated neurons have gradients → 0 and drive the gradients of previous layers to 0 (especially for far time steps).
This is a known problem for deep feed-forward networks. For recurrent networks (even shallow ones) it makes learning long-term dependencies impossible!
• Smaller weight parameters lead to faster gradient vanishing.
• Very large initial parameters make gradient descent diverge fast (explode).
∂ht/∂h0 = (∂ht/∂ht−1) ∙ ⋯ ∙ (∂h3/∂h2) ∙ (∂h2/∂h1) ∙ (∂h1/∂h0)
• Decays exponentially
• The network stops learning and can’t update
• Impossible to learn correlations between temporally distant events
Problem: vanishing gradients
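A small numeric illustration of the product above for a scalar tanh unit (all values are illustrative): each factor ∂ht/∂ht−1 = w · (1 − ht²) has magnitude below 1 when the recurrent weight w is small, so the product decays exponentially with the number of timesteps.

```python
import numpy as np

rng = np.random.default_rng(0)
w = 0.5                              # small recurrent weight -> factors below 1
h, grad = 0.0, 1.0                   # grad accumulates the product of dh_t/dh_{t-1}
for t in range(1, 21):
    h = np.tanh(w * h + rng.normal())    # scalar tanh unit with a random input
    grad *= w * (1.0 - h ** 2)           # dh_t/dh_{t-1} = w * (1 - h_t^2)
    if t % 5 == 0:
        print(f"t={t:2d}  |dh_t/dh_0| ~ {abs(grad):.2e}")   # decays exponentially

# With a large recurrent weight (e.g., w = 3.0) the same product can instead
# blow up, which is the exploding-gradient case discussed next.
```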
Problem: exploding gradients
The network cannot converge and the weight parameters do not stabilize; there is a large increase in the norm of the gradient during training.
Diagnostics: NaNs; large fluctuations of the cost function.
Pascanu R. et al., “On the difficulty of training recurrent neural networks”, arXiv (2012)
Solutions:
• Use gradient clipping
• Try reducing the learning rate
• Change the loss function by setting constraints on the weights (L1/L2 norms)
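A hedged PyTorch sketch of the first remedy (the model and the clipping threshold are illustrative): rescale the gradients between backward() and the optimizer step so their global norm never exceeds a fixed value.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)   # illustrative model
optimizer = torch.optim.SGD(rnn.parameters(), lr=0.01)

x = torch.randn(4, 50, 8)                       # a batch of long sequences
out, _ = rnn(x)
loss = out.pow(2).mean()                        # placeholder loss

optimizer.zero_grad()
loss.backward()
# Rescale the gradients so their global norm never exceeds the threshold.
torch.nn.utils.clip_grad_norm_(rnn.parameters(), max_norm=1.0)
optimizer.step()
```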
Problems with Vanilla RNN
 When dealing with a time series, it tends to forget
old information. When there is a distant
relationship of unknown length, we would like a
“memory” of it.
 Vanishing gradient problem.
LSTM
The sigmoid layers output numbers between 0 and 1 that determine how much of
each component should be let through. The pink × gate is point-wise multiplication.
LSTM
The core idea is the cell state Ct: it changes slowly, with only minor linear interactions, so it is very easy for information to flow along it unchanged.
Forget gate: a sigmoid gate that determines how much of Ct−1 gets through.
Input gate: decides what information is added to the cell state.
Output gate: controls what goes into the output.
Why sigmoid or tanh? The sigmoid gives 0–1 gating that acts as a switch. Since the vanishing gradient problem is already handled in the LSTM, could ReLU replace tanh?
The input gate it decides which components are to be updated, and C′t provides the candidate contents used to update the cell state; the output gate then decides what part of the cell state to output.
RNN vs LSTM
LSTM – Forward/Backward
Explore the equations for LSTM forward and backward
computations:
Illustrated LSTM Forward and Backward Pass
Information Flow in an LSTM
[Diagram: xt and ht−1 feed four parallel transformations that produce z, zi, zf, zo and gate the cell state ct−1]
z = tanh(W [xt ; ht−1])   (candidate update information)
zi = σ(Wi [xt ; ht−1])   (controls the input gate)
zf = σ(Wf [xt ; ht−1])   (controls the forget gate)
zo = σ(Wo [xt ; ht−1])   (controls the output gate)
These four matrix computations can be done concurrently.
Information flow of LSTM
[Diagram: one LSTM timestep combining xt, ht−1 and ct−1 into ct, ht and yt]
ct = zf ⊙ ct−1 + zi ⊙ z
ht = zo ⊙ tanh(ct)
yt = σ(W′ ht)
(⊙ denotes element-wise multiplication)
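A minimal NumPy sketch of one LSTM step implementing the equations above; the weight shapes and the [xt ; ht−1] concatenation follow the usual convention and are assumptions here.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, W, W_i, W_f, W_o, W_out):
    """One LSTM timestep following the z / zi / zf / zo formulation above."""
    v = np.concatenate([x_t, h_prev])          # [xt ; ht-1]
    z   = np.tanh(W   @ v)                     # candidate update
    z_i = sigmoid(W_i @ v)                     # input gate
    z_f = sigmoid(W_f @ v)                     # forget gate
    z_o = sigmoid(W_o @ v)                     # output gate
    c_t = z_f * c_prev + z_i * z               # cell state: mostly linear flow
    h_t = z_o * np.tanh(c_t)
    y_t = sigmoid(W_out @ h_t)
    return h_t, c_t, y_t

rng = np.random.default_rng(0)
d_x, d_h, d_y = 3, 5, 2
W, W_i, W_f, W_o = (rng.normal(size=(d_h, d_x + d_h)) for _ in range(4))
W_out = rng.normal(size=(d_y, d_h))

h, c = np.zeros(d_h), np.zeros(d_h)
for x_t in rng.normal(size=(6, d_x)):
    h, c, y = lstm_step(x_t, h, c, W, W_i, W_f, W_o, W_out)
```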
LSTM information flow
[Diagram: two consecutive LSTM timesteps; the ht and ct produced at step t feed step t+1 together with xt+1]
GRU – Gated Recurrent Unit
It combines the forget and input gates into a single update gate, and it merges the cell state and hidden state. This makes it simpler than the LSTM. There are many other variants too.
[Diagram: GRU cell with a reset gate and an update gate; ×/* denote element-wise multiplication]
Both the LSTM and the GRU take xt and ht−1 as inputs and compute ht to pass along.
GRUs don't need the cell state to pass values along.
The calculations within each iteration ensure that the ht values
being passed along either retain a high amount of old information
or are jump-started with a high amount of new information.
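A minimal NumPy sketch of one GRU step under the standard reset/update formulation (weight names and dimensions are illustrative assumptions):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, W_r, W_u, W_h):
    """One GRU timestep: reset gate r, update gate u, no separate cell state."""
    v = np.concatenate([x_t, h_prev])
    r = sigmoid(W_r @ v)                                        # reset gate
    u = sigmoid(W_u @ v)                                        # update gate
    h_tilde = np.tanh(W_h @ np.concatenate([x_t, r * h_prev]))  # candidate state
    # h_t mixes old information (h_prev) with new information (h_tilde).
    return (1.0 - u) * h_prev + u * h_tilde

rng = np.random.default_rng(0)
d_x, d_h = 3, 5
W_r, W_u, W_h = (rng.normal(size=(d_h, d_x + d_h)) for _ in range(3))

h = np.zeros(d_h)
for x_t in rng.normal(size=(6, d_x)):
    h = gru_step(x_t, h, W_r, W_u, W_h)
```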
Feed-forward vs Recurrent
Network
[Diagram: a feed-forward network x → f1 → a1 → f2 → a2 → f3 → a3 → f4 → y, where t indexes the layer, versus a recurrent network where the same f consumes (ht−1, xt) at each time step t and g produces y4 at the end]
1. Feedforward network does not have input at each step
2. Feedforward network has different parameters for each layer
Feed-forward: at = ft(at−1) = σ(Wt at−1 + bt)
Recurrent: at = f(at−1, xt) = σ(Wh at−1 + Wi xt + bi)
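The two update rules side by side as a small NumPy sketch (dimensions are illustrative): the feed-forward network uses a different Wt per layer and takes input only at the bottom, while the recurrent network reuses one (Wh, Wi) pair at every timestep and consumes a new input xt at each step.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
d, T = 4, 4

# Feed-forward: one weight matrix per layer, input only at the first layer.
x = rng.normal(size=d)
a = x
for W_t, b_t in [(rng.normal(size=(d, d)), np.zeros(d)) for _ in range(T)]:
    a = sigmoid(W_t @ a + b_t)            # a_t = f_t(a_{t-1}), parameters differ per layer

# Recurrent: the same (Wh, Wi, b) at every step, plus a new input x_t at each step.
W_h, W_i, b = rng.normal(size=(d, d)), rng.normal(size=(d, d)), np.zeros(d)
a = np.zeros(d)
for x_t in rng.normal(size=(T, d)):
    a = sigmoid(W_h @ a + W_i @ x_t + b)  # a_t = f(a_{t-1}, x_t), shared parameters
```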
Applications of LSTM / RNN
Neural machine translation
Sequence to sequence chat
model
Chat with context
U: Hi
M: Hi
M: Hello
Serban, Iulian V., Alessandro Sordoni, Yoshua Bengio, Aaron Courville, and Joelle Pineau, 2015. “Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models.”
Baidu’s speech recognition using
RNN
Attention Mechanism
Image caption generation using attention
(From CY Lee lecture)
[Diagram: a CNN produces a vector for each image region; a match function scores the query z0 against each region vector (e.g., 0.7)]
z0 is an initial parameter; it is also learned.
Image Caption Generation
[Diagram: the attention weights over the region vectors (0.7, 0.1, 0.1, 0.1, 0.0, 0.0) form a weighted sum; together with z0, this attends to a region and produces Word 1 and the next state z1]
Image Caption Generation
[Diagram: at the next step the attention weights (0.0, 0.8, 0.2, 0.0, 0.0, 0.0) give a new weighted sum; z1 produces Word 2 and the next state z2]
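A hedged NumPy sketch of the attention step these slides describe (the bilinear match function and all dimensions are illustrative assumptions): score each region vector against the current state z, normalize the scores with a softmax, and use the weighted sum of region vectors as the context for generating the next word.

```python
import numpy as np

def attend(z, regions, W_match):
    """Score each region against z, normalize, and return the weighted sum."""
    scores = regions @ (W_match @ z)            # one match score per region
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # e.g., [0.7, 0.1, 0.1, 0.1, 0.0, 0.0]
    context = weights @ regions                 # weighted sum of the region vectors
    return context, weights

rng = np.random.default_rng(0)
n_regions, d_region, d_z = 6, 8, 8
regions = rng.normal(size=(n_regions, d_region))   # one CNN vector per image region
W_match = rng.normal(size=(d_region, d_z))
z0 = rng.normal(size=d_z)                          # initial state, also learned

context, weights = attend(z0, regions, W_match)
# A decoder RNN would consume `context` to emit Word 1 and the next state z1,
# then attend again with z1 to produce Word 2, and so on.
```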
Image Caption Generation
Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron
Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio, “Show,
Attend and Tell: Neural Image Caption Generation with Visual Attention”,
ICML, 2015
Image Caption Generation
Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron
Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio, “Show,
Attend and Tell: Neural Image Caption Generation with Visual Attention”,
ICML, 2015
Image Caption Generation
Deep Visual-Semantic Alignments for Generating Image Descriptions.
Source: http://cs.stanford.edu/people/karpathy/deepimagesent/
Training RNNs
Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo
Larochelle, Aaron Courville, “Describing Videos by Exploiting Temporal Structure”, ICCV,
2015