Long short-term memory
Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term
memory." Neural computation 9.8 (1997): 1735-1780.
01
Long Short-Term Memory (LSTM)
Olivia Ni
• Recurrent Neural Networks (RNN)
• The Problem of Long-Term Dependencies
• LSTM Networks
• The Core Idea Behind LSTMs
• Step-by-Step LSTM Walk Through
• Variants on LSTMs
• Conclusions & References
• Appendix (BPTT & Gradient Exploding/Vanishing)
02
Outline
• Idea:
• condition the neural network on all previous information and tie the weights
at each time step
• Assumption: temporal information matters (i.e. time series data)
03
Recurrent Neural Networks (RNN)
[Figure: an RNN unrolled over time. Each RNN cell reads Input_t and the previous short-term memory STM_(t-1), then emits Output_t and the updated STM_t.]
• STM = Short-term memory
• RNN Definition:
$s_t = \sigma(U x_t + W s_{t-1})$, $o_t = \mathrm{softmax}(V s_t)$
• Model Training:
• All model parameters $\theta = \{U, V, W\}$ can be updated by gradient descent (a minimal forward-pass sketch follows this slide):
$\theta^{(i+1)} \leftarrow \theta^{(i)} - \eta \nabla_{\theta} C(\theta^{(i)})$
04
Recurrent Neural Networks (RNN)
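To make the definition concrete, here is a minimal NumPy sketch of the forward recurrence above. The shapes, the random initialization, and the choice of the logistic sigmoid for $\sigma$ are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())          # subtract max for numerical stability
    return e / e.sum()

def rnn_forward(xs, U, W, V, s0):
    """Run s_t = sigmoid(U x_t + W s_{t-1}), o_t = softmax(V s_t) over a sequence."""
    s, outputs = s0, []
    for x in xs:
        s = 1.0 / (1.0 + np.exp(-(U @ x + W @ s)))   # hidden state s_t
        outputs.append(softmax(V @ s))               # output o_t
    return outputs, s

# Tiny example with assumed sizes: input dim 4, hidden dim 3, output dim 2
rng = np.random.default_rng(0)
U, W, V = rng.normal(size=(3, 4)), rng.normal(size=(3, 3)), rng.normal(size=(2, 3))
xs = [rng.normal(size=4) for _ in range(5)]
outputs, s_T = rnn_forward(xs, U, W, V, s0=np.zeros(3))
```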
• Example: (Consider trying to predict the last word in the text)
• Issue: in theory, RNNs can handle such “long-term dependencies,” but in practice they often cannot!
“The clouds are in the sky.”
“I grew up in France… I speak fluent French.”
05
The Problem of Long-Term Dependencies
• RNN Training Issue:
(1) The gradient is a product of Jacobian matrices, each associated with a step
in the forward computation
(2) The same matrix is multiplied at each time step during BPTT
• The gradient becomes very small or very large quickly
• Vanishing or Exploding gradient
• The error surface is either very flat or very steep
06
The Problem of Long-Term Dependencies
• Possible Solutions:
• Gradient Exploding:
• Clipping (https://arxiv.org/abs/1211.5063?context=cs); see the sketch after this slide
• Gradient Vanishing:
• Better Initialization (https://arxiv.org/abs/1504.00941)
• Gating Mechanism (LSTM, GRU, …, etc.)
• Attention Mechanism (https://arxiv.org/pdf/1706.03762.pdf)
07
The Problem of Long-Term Dependencies
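As a concrete illustration of the clipping idea from the paper linked above, here is a minimal sketch of clipping by the global gradient norm in plain NumPy; the threshold value and the example gradients are assumptions for illustration only.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=5.0):
    """Rescale a list of gradient arrays so their combined L2 norm is at most max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

# Example: an "exploded" gradient gets rescaled, a small one is left untouched
big, small = np.full(4, 100.0), np.full(4, 0.01)
clipped = clip_by_global_norm([big, small], max_norm=5.0)
```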
08
LSTM Networks – The Core Idea Behind LSTMs
[Figure: (top) an unrolled RNN chain, where each cell passes only the short-term memory STM_t to the next step; (bottom) an unrolled LSTM chain, where each cell additionally passes a long-term memory LSTM_t (the cell state) along the chain.]
• STM = Short-term memory
• LSTM = Long Short-term memory
09
LSTM Networks – The Core Idea Behind LSTMs
• STM = Short-term memory
• LSTM = Long Short-term memory
[Figure: a single LSTM cell carries two memories: the long-term memory LSTM (cell state) and the short-term memory STM (hidden state), which enters and leaves the cell at each step.]
10
LSTM Networks – Step-by-Step LSTM Walk Through (0/4)
• The cell state runs straight down
the entire chain, with only some
minor linear interactions.
⟹ Easy for information to flow
along it unchanged
• The LSTM does have the ability
to remove or add information to
the cell state, carefully regulated
by structures called gates.
11
LSTM Networks – Step-by-Step LSTM Walk Through (1/4)
• Forget gate (sigmoid + pointwise
multiplication operation):
decides what information we’re
going to throw away from the
cell state
• 1: “completely keep this”
• 0: “completely get rid of this”
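For reference, the standard forget-gate equation, following the Understanding LSTM Networks post cited in the References; the weight/bias names $W_f$, $b_f$ are the conventional ones, not taken from this slide:
$$f_t = \sigma\!\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$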
12
LSTM Networks – Step-by-Step LSTM Walk Through (2/4)
• Input gate (sigmoid + pointwise
multiplication operation):
decides what new information
we’re going to store in the cell
state
(The tanh layer that proposes the new candidate values has the same form as a vanilla RNN update.)
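In the standard formulation (same source and conventional symbol names as above), the input gate and the candidate values are:
$$i_t = \sigma\!\left(W_i \cdot [h_{t-1}, x_t] + b_i\right), \qquad \tilde{C}_t = \tanh\!\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)$$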
13
LSTM Networks – Step-by-Step LSTM Walk Through (3/4)
• Cell state update: forget the things we decided to forget earlier and add the new candidate values, scaled by how much we decided to update
• $f_t$: decides what to forget
• $i_t$: decides what to update
⟹ $C_t$ is updated at time step $t$ and changes slowly!
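In equation form (standard formulation, with $\odot$ denoting pointwise multiplication):
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$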
14
LSTM Networks – Step-by-Step LSTM Walk Through (4/4)
• Output gate (sigmoid + pointwise multiplication operation): decides what information we’re going to output
⟹ $h_t$ is updated at time step $t$ and changes quickly!
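In the same standard formulation, the output gate and hidden state are:
$$o_t = \sigma\!\left(W_o \cdot [h_{t-1}, x_t] + b_o\right), \qquad h_t = o_t \odot \tanh(C_t)$$

Putting the four steps together, here is a minimal NumPy sketch of one LSTM cell step under this standard formulation; all parameter names, sizes, and the random initialization are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x, h_prev, C_prev, params):
    """One LSTM step: forget gate, input gate, cell-state update, output gate."""
    z = np.concatenate([h_prev, x])                      # [h_{t-1}, x_t]
    f = sigmoid(params["Wf"] @ z + params["bf"])         # forget gate
    i = sigmoid(params["Wi"] @ z + params["bi"])         # input gate
    C_tilde = np.tanh(params["Wc"] @ z + params["bc"])   # candidate values
    C = f * C_prev + i * C_tilde                         # slowly changing cell state C_t
    o = sigmoid(params["Wo"] @ z + params["bo"])         # output gate
    h = o * np.tanh(C)                                   # quickly changing hidden state h_t
    return h, C

# Tiny example with assumed sizes: input dim 4, hidden dim 3
rng = np.random.default_rng(0)
n_in, n_h = 4, 3
params = {k: rng.normal(size=(n_h, n_h + n_in)) for k in ("Wf", "Wi", "Wc", "Wo")}
params.update({b: np.zeros(n_h) for b in ("bf", "bi", "bc", "bo")})
h, C = lstm_step(rng.normal(size=n_in), np.zeros(n_h), np.zeros(n_h), params)
```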
15
LSTM Networks – Variants on LSTMs (1/3)
• LSTM with Peephole Connections
• Idea: allow gate layers to look at
the cell state
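A minimal sketch of the peephole equations, following the Understanding LSTM Networks post cited in the References (there, every gate is given a peephole; other variants add only some of them):
$$f_t = \sigma\!\left(W_f \cdot [C_{t-1}, h_{t-1}, x_t] + b_f\right), \quad i_t = \sigma\!\left(W_i \cdot [C_{t-1}, h_{t-1}, x_t] + b_i\right), \quad o_t = \sigma\!\left(W_o \cdot [C_t, h_{t-1}, x_t] + b_o\right)$$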
16
LSTM Networks – Variants on LSTMs (2/3)
• LSTM with Coupled Forget/ Input
Gate
• Idea: we only forget when we’re
going to input something in its
place, and vice versa.
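In equations (again following the cited blog post), coupling the two gates replaces the separate input gate with $1 - f_t$:
$$C_t = f_t \odot C_{t-1} + (1 - f_t) \odot \tilde{C}_t$$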
17
LSTM Networks – Variants on LSTMs (3/3)
• Gated Recurrent Unit (GRU)
• Idea:
• combine the forget and input gates
into a single “update gate”
• merge the cell state and hidden state
Update gate:
Reset gate:
State Candidate:
Current State:
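The usual GRU equations for the four labeled quantities (standard formulation from the literature; the symbol names are the conventional ones, not taken from this slide):
$$z_t = \sigma\!\left(W_z \cdot [h_{t-1}, x_t]\right) \quad \text{(update gate)}, \qquad r_t = \sigma\!\left(W_r \cdot [h_{t-1}, x_t]\right) \quad \text{(reset gate)}$$
$$\tilde{h}_t = \tanh\!\left(W \cdot [r_t \odot h_{t-1}, x_t]\right) \quad \text{(state candidate)}, \qquad h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t \quad \text{(current state)}$$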
• Review: Backpropagation (BP)
• Explain by: Backpropagation Through Time (BPTT)
• RNN Training Issue: Gradient Vanishing / Gradient Exploding
18
Appendix – The Problem of Long-Term Dependencies
• Gradient Descent for Neural Networks
• Computing the gradient involves millions of parameters.
• To compute it efficiently, we use backpropagation.
• Each partial derivative $\frac{\partial C(\theta)}{\partial w_{ij}^{(l)}}$ is computed from two pre-computed terms, one from the forward pass and one from the backward pass.
19
Appendix – The Problem of Long-Term Dependencies
• WLOG, we use $w_{ij}^{(l)}$ to demonstrate:
$$\frac{\partial C(\theta)}{\partial w_{ij}^{(l)}} = \frac{\partial C(\theta)}{\partial z_{i}^{(l)}} \cdot \frac{\partial z_{i}^{(l)}}{\partial w_{ij}^{(l)}}$$
• Forward pass:
$$\frac{\partial z_{i}^{(l)}}{\partial w_{ij}^{(l)}} = \begin{cases} x_j, & \text{if } l = 1 \\ a_j^{(l-1)}, & \text{if } l > 1 \end{cases}$$
20
Appendix – The Problem of Long-Term Dependencies
• WLOG, we use $w_{ij}^{(l)}$ to demonstrate:
$$\frac{\partial C(\theta)}{\partial w_{ij}^{(l)}} = \frac{\partial C(\theta)}{\partial z_{i}^{(l)}} \cdot \frac{\partial z_{i}^{(l)}}{\partial w_{ij}^{(l)}}$$
• Backward pass:
21
Appendix – The Problem of Long-Term Dependencies
Output layer ($l = L$):
$$\delta_i^{(L)} = \frac{\partial C(\theta)}{\partial z_i^{(L)}} = \frac{\partial C(\theta)}{\partial y_i}\frac{\partial y_i}{\partial z_i^{(L)}} = \frac{\partial C(\theta)}{\partial y_i}\frac{\partial a_i^{(L)}}{\partial z_i^{(L)}} = \frac{\partial C(\theta)}{\partial y_i}\frac{\partial \sigma(z_i^{(L)})}{\partial z_i^{(L)}} = \frac{\partial C(\theta)}{\partial y_i}\,\sigma'(z_i^{(L)})$$
Hidden layers ($l < L$):
$$\delta_i^{(l)} = \frac{\partial C(\theta)}{\partial z_i^{(l)}} = \sum_k \frac{\partial C(\theta)}{\partial z_k^{(l+1)}}\frac{\partial z_k^{(l+1)}}{\partial a_i^{(l)}}\frac{\partial a_i^{(l)}}{\partial z_i^{(l)}} = \frac{\partial a_i^{(l)}}{\partial z_i^{(l)}}\sum_k \delta_k^{(l+1)}\frac{\partial z_k^{(l+1)}}{\partial a_i^{(l)}} = \sigma'(z_i^{(l)})\sum_k \delta_k^{(l+1)} w_{ki}^{(l+1)}$$
Combined:
$$\frac{\partial C(\theta)}{\partial z_i^{(l)}} \triangleq \delta_i^{(l)} = \begin{cases} \sigma'(z_i^{(L)})\,\dfrac{\partial C(\theta)}{\partial y_i}, & \text{if } l = L \\[2ex] \sigma'(z_i^{(l)})\,\displaystyle\sum_k \delta_k^{(l+1)} w_{ki}^{(l+1)}, & \text{if } l < L \end{cases}$$
• WLOG, we use $w_{ij}^{(l)}$ to demonstrate:
$$\frac{\partial C(\theta)}{\partial w_{ij}^{(l)}} = \frac{\partial C(\theta)}{\partial z_{i}^{(l)}} \cdot \frac{\partial z_{i}^{(l)}}{\partial w_{ij}^{(l)}}$$
• Backward pass:
22
Appendix – The Problem of Long-Term Dependencies
$$\frac{\partial C(\theta)}{\partial z_i^{(l)}} \triangleq \delta_i^{(l)} = \begin{cases} \sigma'(z_i^{(L)})\,\dfrac{\partial C(\theta)}{\partial y_i}, & \text{if } l = L \\[2ex] \sigma'(z_i^{(l)})\,\displaystyle\sum_k \delta_k^{(l+1)} w_{ki}^{(l+1)}, & \text{if } l < L \end{cases}$$
• Concluding Remarks for Backpropagation (BP):
$$\frac{\partial C(\theta)}{\partial w_{ij}^{(l)}} = \frac{\partial C(\theta)}{\partial z_{i}^{(l)}} \cdot \frac{\partial z_{i}^{(l)}}{\partial w_{ij}^{(l)}}$$
23
Appendix – The Problem of Long-Term Dependencies
• Recap: Recurrent Neural Network (RNN) Architecture
• Model Training:
• All model parameters $\theta = \{U, V, W\}$ can be updated by gradient descent
24
Appendix – The Problem of Long-Term Dependencies
$s_t = \sigma(U x_t + W s_{t-1})$, $o_t = \mathrm{softmax}(V s_t)$
$\theta^{(i+1)} \leftarrow \theta^{(i)} - \eta \nabla_{\theta} C(\theta^{(i)})$
25
Appendix – The Problem of Long-Term Dependencies
$s_t = \sigma(U x_t + W s_{t-1})$, $o_t = \mathrm{softmax}(V s_t)$
For $\theta = \{U, V, W\}$, update them by $\theta^{(i+1)} \leftarrow \theta^{(i)} - \eta \nabla_{\theta} C(\theta^{(i)})$, where $C \triangleq \sum_t C^{(t)}$ (WLOG, using $C^{(3)}$ to explain).
If $\theta$ were NOT tied across time steps, each copy of $W$, $U$, $V$ would get its own update:
$$W^{(t)} \leftarrow W^{(t)} - \frac{\partial C^{(3)}(\theta)}{\partial W^{(t)}}, \quad U^{(t)} \leftarrow U^{(t)} - \frac{\partial C^{(3)}(\theta)}{\partial U^{(t)}} \quad (t = 1, 2, 3), \qquad V^{(3)} \leftarrow V^{(3)} - \frac{\partial C^{(3)}(\theta)}{\partial V^{(3)}}$$
Because $\theta$ IS tied, the per-time-step contributions are summed into one update per matrix:
$$W \leftarrow W - \sum_{t=1}^{3} \frac{\partial C^{(3)}(\theta)}{\partial W^{(t)}}, \qquad U \leftarrow U - \sum_{t=1}^{3} \frac{\partial C^{(3)}(\theta)}{\partial U^{(t)}}, \qquad V \leftarrow V - \frac{\partial C^{(3)}(\theta)}{\partial V^{(3)}}$$
26
Appendix – The Problem of Long-Term Dependencies
With tied parameters ($s_t = \sigma(U x_t + W s_{t-1})$, $o_t = \mathrm{softmax}(V s_t)$, $C \triangleq \sum_t C^{(t)}$; WLOG, using $C^{(3)}$ to explain), the BPTT gradients are:
$$\frac{\partial C^{(3)}(\theta)}{\partial W} = \frac{\partial C^{(3)}(\theta)}{\partial o_3}\frac{\partial o_3}{\partial s_3}\frac{\partial s_3}{\partial W} = \sum_{k=0}^{3}\frac{\partial C^{(3)}(\theta)}{\partial o_3}\frac{\partial o_3}{\partial s_3}\frac{\partial s_3}{\partial s_k}\frac{\partial s_k}{\partial W} = \sum_{k=0}^{3}\frac{\partial C^{(3)}(\theta)}{\partial o_3}\frac{\partial o_3}{\partial s_3}\left(\prod_{j=k+1}^{3}\frac{\partial s_j}{\partial s_{j-1}}\right)\frac{\partial s_k}{\partial W}$$
$$\frac{\partial C^{(3)}(\theta)}{\partial U} = \frac{\partial C^{(3)}(\theta)}{\partial o_3}\frac{\partial o_3}{\partial s_3}\frac{\partial s_3}{\partial U} = \sum_{k=1}^{3}\frac{\partial C^{(3)}(\theta)}{\partial o_3}\frac{\partial o_3}{\partial s_3}\frac{\partial s_3}{\partial s_k}\frac{\partial s_k}{\partial U} = \sum_{k=1}^{3}\frac{\partial C^{(3)}(\theta)}{\partial o_3}\frac{\partial o_3}{\partial s_3}\left(\prod_{j=k+1}^{3}\frac{\partial s_j}{\partial s_{j-1}}\right)\frac{\partial s_k}{\partial U}$$
$$\frac{\partial C^{(3)}(\theta)}{\partial V} = \frac{\partial C^{(3)}(\theta)}{\partial o_3}\frac{\partial o_3}{\partial V}$$
These gradients feed the tied updates $W \leftarrow W - \sum_{t=1}^{3}\frac{\partial C^{(3)}(\theta)}{\partial W^{(t)}}$, $U \leftarrow U - \sum_{t=1}^{3}\frac{\partial C^{(3)}(\theta)}{\partial U^{(t)}}$, $V \leftarrow V - \frac{\partial C^{(3)}(\theta)}{\partial V^{(3)}}$.
27
Appendix – The Problem of Long-Term Dependencies
Each factor $\frac{\partial s_j}{\partial s_{j-1}}$ in the BPTT product is a Jacobian matrix:
$$\frac{\partial s_3}{\partial s_k} = \prod_{j=k+1}^{3}\frac{\partial s_j}{\partial s_{j-1}} = \prod_{j=k+1}^{3}\underbrace{W^{T}\,\mathrm{diag}\!\left[\sigma'(s_{j-1})\right]}_{\text{Jacobian matrix}}$$
This product appears in every BPTT gradient:
$$\frac{\partial C^{(3)}(\theta)}{\partial W} = \sum_{k=0}^{3}\frac{\partial C^{(3)}(\theta)}{\partial o_3}\frac{\partial o_3}{\partial s_3}\left(\prod_{j=k+1}^{3}\frac{\partial s_j}{\partial s_{j-1}}\right)\frac{\partial s_k}{\partial W}, \qquad \frac{\partial C^{(3)}(\theta)}{\partial U} = \sum_{k=1}^{3}\frac{\partial C^{(3)}(\theta)}{\partial o_3}\frac{\partial o_3}{\partial s_3}\left(\prod_{j=k+1}^{3}\frac{\partial s_j}{\partial s_{j-1}}\right)\frac{\partial s_k}{\partial U}, \qquad \frac{\partial C^{(3)}(\theta)}{\partial V} = \frac{\partial C^{(3)}(\theta)}{\partial o_3}\frac{\partial o_3}{\partial V}$$
where $s_t = \sigma(U x_t + W s_{t-1})$ and $o_t = \mathrm{softmax}(V s_t)$. Multiplying essentially the same matrix over many time steps is what makes the gradient vanish or explode.
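To see the vanishing/exploding behavior numerically, here is a small sketch that repeatedly multiplies one matrix, just as BPTT repeatedly multiplies the Jacobian $W^{T}\,\mathrm{diag}[\sigma'(s_{j-1})]$; the matrix size, symmetrization, scales, and horizon are illustrative assumptions chosen so the geometric growth/decay is exact and easy to read.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 8, 60

A = rng.normal(size=(n, n))
base = (A + A.T) / 2.0             # symmetric, so the norm of the T-fold product is exactly ||W||_2**T
base /= np.linalg.norm(base, 2)    # rescale so the largest singular value is 1

def product_norm(W, steps):
    """Spectral norm of W @ W @ ... @ W (steps factors), mimicking the repeated BPTT Jacobian."""
    prod = np.eye(n)
    for _ in range(steps):
        prod = W @ prod
    return np.linalg.norm(prod, 2)

# Largest singular value < 1, = 1, > 1: the product vanishes, stays O(1), or explodes.
# In the real BPTT factor W^T diag[sigma'(.)], the extra sigma' <= 1/4 (logistic sigmoid)
# pushes the product even further toward zero.
for scale in (0.8, 1.0, 1.2):
    print(scale, product_norm(scale * base, T))
```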
• Understand the difficulty of training recurrent neural networks
• Gradient Exploding
• Gradient Vanishing
• One possible solution to the gradient vanishing problem is the “gating mechanism”, which is the key concept behind the LSTM
• LSTM can be “deep” if we stack multiple LSTM cells
• Extensions:
• Uni-directional vs. Bi-directional
• One-to-one, One-to-many, Many-to-one, Many-to-Many (w/ or w/o Encoder-Decoder)
28
Conclusions
• Understanding LSTM Networks
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
• Prof. Hung-yi Lee Courses
https://www.youtube.com/watch?v=xCGidAeyS4M
https://www.youtube.com/watch?v=rTqmWlnwz_0
• On the difficulty of training recurrent neural networks
https://arxiv.org/abs/1211.5063
• UDACITY Courses: Intro to Deep Learning with PyTorch
https://classroom.udacity.com/courses/ud188
29
References
30
Thanks for listening.