Recurrent Neural Networks
Alex Kalinin alex@alexkalinin.com
Content
1. Example of Vanilla RNN
2. RNN Forward pass
3. RNN Backward pass
4. LSTM design
RNN Training problem
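Feed-forward (“vanilla”) network
[Figure: an input vector X mapped directly to an output y.]
[Figure: a recurrent network with input X, hidden state h and output y; W_xh connects X to h, W_hh feeds h back into itself, W_hy connects h to y.]
Vanilla recurrent network
1) $h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t + b_h)$
2) $y_t = W_{hy} h_t + b_y$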
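Below is a minimal NumPy sketch of equations 1) and 2) for one time step. The function name `rnn_step`, the vector shapes, and the random weights in the usage part are illustrative assumptions and do not come from the slides.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    """One time step of a vanilla RNN, following equations 1) and 2).

    Assumed shapes: x_t (D,), h_prev (H,), W_xh (H, D), W_hh (H, H),
    W_hy (V, H), b_h (H,), b_y (V,).
    """
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)  # 1) new hidden state
    y_t = W_hy @ h_t + b_y                            # 2) output scores
    return h_t, y_t

# Usage: process a sequence by feeding each h_t back in as h_prev.
rng = np.random.default_rng(0)
D, H, V = 4, 3, 4                       # input size, hidden size, output size
W_xh = rng.normal(size=(H, D))
W_hh = rng.normal(size=(H, H))
W_hy = rng.normal(size=(V, H))
b_h, b_y = np.zeros(H), np.zeros(V)

h = np.zeros(H)                         # initial hidden state
for x in np.eye(D):                     # four one-hot input vectors
    h, y = rnn_step(x, h, W_xh, W_hh, W_hy, b_h, b_y)
print(h.shape, y.shape)
```

The same weights are reused at every step. Only the hidden state `h` changes from one step to the next.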
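Example: character-level language processing
[Figure: the RNN reads one character X at a time, updates its hidden state h, and produces scores Y, from which the next-character probabilities P are computed.]
Training sequence: “hello”
Vocabulary: [e, h, l, o]
One-hot inputs: “h” = [0, 1, 0, 0], “e” = [1, 0, 0, 0], “l” = [0, 0, 1, 0], “o” = [0, 0, 0, 1]
Weights (the hidden state h is a single number):
$W_{hh} = 4.1$
$W_{xh} = [3.6, -4.8, 0.35, -0.26]$
$W_{hy} = [-12, -0.67, -0.85, 14]^T$
$b_h = 0.41$
$b_y = [-0.2, -2.9, 6.1, -3.4]^T$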
“hello” RNN, step by step
The run starts from $h_0 = 0$. Each step computes $h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t + b_h)$ and then $y_t = W_{hy} h_t + b_y$. The scores are converted into probabilities with $p = \mathrm{softmax}(y_t)$, and the most likely character is the prediction for the next position. Vectors are ordered [e, h, l, o].
Step 1: input “h” = [0, 1, 0, 0], previous $h = 0$. Result: $h = -0.99$, $y = [11, -2.2, 6.9, -17]$, $p = [0.99, 0, 0.01, 0]$, prediction “e”
Step 2: input “e” = [1, 0, 0, 0], previous $h = -0.99$. Result: $h = -0.09$, $y = [0.86, -2.8, 6.2, -4.6]$, $p = [0, 0, 0.99, 0]$, prediction “l”
Step 3: input “l” = [0, 0, 1, 0], previous $h = -0.09$. Result: $h = 0.38$, $y = [-4.7, -3.2, 5.8, 1.9]$, $p = [0, 0, 0.98, 0.02]$, prediction “l”
Step 4: input “l” = [0, 0, 1, 0], previous $h = 0.38$. Result: $h = 0.98$, $y = [-12, -3.6, 5.3, 10]$, $p = [0, 0, 0.01, 0.99]$, prediction “o”
Result: “h” “e” “l” “l” “o”
Summary (input, hidden state carried in → predicted next character):
“h”, $h_0 = 0$ → “e”
“e”, $h_1 = -0.99$ → “l”
“l”, $h_2 = -0.09$ → “l”
“l”, $h_3 = 0.38$ → “o”
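The walk-through can be reproduced in a few lines of NumPy. This sketch assumes the vocabulary order [e, h, l, o] and the weights listed for this example. It converts the scores `y` into probabilities `p` with a standard softmax. The helper names `softmax` and `one_hot` are our own.

```python
import numpy as np

vocab = ["e", "h", "l", "o"]                  # vocabulary order from the slides
W_hh = 4.1                                    # the hidden state is a single number
W_xh = np.array([3.6, -4.8, 0.35, -0.26])     # one weight per input character
W_hy = np.array([-12.0, -0.67, -0.85, 14.0])  # one weight per output character
b_h = 0.41
b_y = np.array([-0.2, -2.9, 6.1, -3.4])

def softmax(z):
    e = np.exp(z - z.max())                   # subtract the max for numerical stability
    return e / e.sum()

def one_hot(ch):
    v = np.zeros(len(vocab))
    v[vocab.index(ch)] = 1.0
    return v

h = 0.0                                       # h_0 = 0
predictions = []
for ch in "hell":                             # feed h, e, l, l and predict the next character each time
    h = np.tanh(W_hh * h + W_xh @ one_hot(ch) + b_h)
    y = W_hy * h + b_y
    p = softmax(y)
    predictions.append(vocab[int(np.argmax(p))])
    print(ch, round(float(h), 2), np.round(p, 2))
print("".join(predictions))                   # ello
```

The first hidden state prints as -1.0. The exact value is tanh(-4.39) ≈ -0.9997, which the slide rounds to -0.99.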
Input → output after training:
“hello” → “hello”
“hello ben” → “hello ben”
“hello world” → “hello world”
Training on a longer text: “It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness…”, A Tale of Two Cities, Charles Dickens
Input → output, with training iterations and loss:
“it was” → “it was” (50,000 iterations)
“it was the” → “it was the” (300,000 iterations, loss = 1.6066)
“it was the best” → “it was the best” (1,000,000 iterations, loss = 1.8197)
“it was the best of” → “it wes the best of” (2,000,000 iterations, loss = 4.0844)
…
epoch 500000, loss: 6.447782290456328
…
epoch 1000000, loss: 5.290576956983398
…
epoch 1800000, loss: 4.267105168323299
epoch 1900000, loss: 4.175163586546514
epoch 2000000, loss: 4.0844739848413285
Input: “it was t” (i, t, ␣, w, a, s, ␣, t), where ␣ is a space
Target: “t was th” (t, ␣, w, a, s, ␣, t, h)
At each position the target is the next character of the input, so the network learns to predict the character that follows.
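This sketch shows one way to build such input/target pairs from raw text. It assumes, as the slide suggests, that the target is the input shifted by one character. `make_pairs` is a hypothetical helper name.

```python
def make_pairs(text):
    """Pair every character with the character that follows it."""
    return list(zip(text[:-1], text[1:]))

pairs = make_pairs("it was th")
inputs = "".join(a for a, _ in pairs)
targets = "".join(b for _, b in pairs)
print(repr(inputs), repr(targets))   # 'it was t' 't was th'
```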
RNNs for Different Problems
Vanilla Neural Network: one input -> one output
Image Captioning: image -> sequence of words
Sentiment Analysis: sequence of words -> class
Translation: sequence of words -> sequence of words
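[Figure: a three-step unrolled RNN with inputs $x_0 = 1$, $x_1 = 1$, $x_2 = 2$, hidden states $h_0$, $h_1$, $h_2$, and target output 3.]
$L = f(W_{xh}, W_{hh}, W_{hy})$, starting from $W_{xh} = 0.078$, $W_{hy} = 0.051$, $W_{hh} = 0.024$
Gradient descent with learning rate 0.01:
$w_{xh} := w_{xh} - 0.01 \cdot \frac{\partial L}{\partial w_{xh}}$
$w_{hh} := w_{hh} - 0.01 \cdot \frac{\partial L}{\partial w_{hh}}$
$w_{hy} := w_{hy} - 0.01 \cdot \frac{\partial L}{\partial w_{hy}}$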
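Below is a small sketch of the update rule, assuming the three-step scalar network used in this example: inputs 1, 1, 2, target 3, and no biases. To keep it short, it estimates the gradient with central finite differences. The slides use backpropagation instead. `loss` and `numerical_grad` are our own names.

```python
import math

def loss(w_xh, w_hh, w_hy, xs=(1.0, 1.0, 2.0), target=3.0):
    """L = (y - target)^2 for the three-step scalar RNN (no biases)."""
    h = 0.0                                   # with h = 0, the first step is tanh(W_xh * x_0)
    for x in xs:
        h = math.tanh(w_hh * h + w_xh * x)    # h_0, h_1, h_2
    y = w_hy * h
    return (y - target) ** 2

def numerical_grad(f, params, eps=1e-6):
    """Central finite differences: one partial derivative per parameter."""
    grads = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        grads.append((f(*up) - f(*down)) / (2 * eps))
    return grads

lr = 0.01
params = [0.078, 0.024, 0.051]                # [W_xh, W_hh, W_hy] from the slide
for step in range(3):
    grad = numerical_grad(loss, params)
    params = [p - lr * g for p, g in zip(params, grad)]  # w := w - 0.01 * dL/dw
    print(step, [round(p, 5) for p in params], round(loss(*params), 4))
```

All three gradients are negative at the starting point, so each update increases the weights slightly and the loss goes down.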
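Training is hard with vanilla RNNs
$\nabla L = \left[\frac{\partial L}{\partial w_{xh}}, \frac{\partial L}{\partial w_{hh}}, \frac{\partial L}{\partial w_{hy}}\right]$
[Figure: the forward pass flows through $W_{xh}$, $W_{hh}$ and $W_{hy}$ to the loss; the backward pass carries the gradient back through the same weights.]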
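[Figure: the same three-step network with $x_0 = 1$, $x_1 = 1$, $x_2 = 2$ and target 3.]
$h_0 = \tanh(W_{xh} x_0)$
$h_1 = \tanh(W_{hh} h_0 + W_{xh} x_1)$
$h_2 = \tanh(W_{hh} h_1 + W_{xh} x_2)$
$y = W_{hy} h_2$
$L = (y - 3)^2$
$L = ?$ and $\frac{\partial L}{\partial w_{hh}} = ?$
Substituting everything into the loss:
$L = \left(W_{hy} \tanh\big(W_{hh} \tanh(W_{hh} \tanh(W_{xh} x_0) + W_{xh} x_1) + W_{xh} x_2\big) - 3\right)^2$
Compute gradient by recursive application of the chain rule. If $L = f(g(h(k(l(m(n(w)))))))$, with $f = f(g)$, $g = g(h)$, $h = h(k)$ and so on, then
$\frac{\partial L}{\partial w} = \frac{\partial f}{\partial g} \cdot \frac{\partial g}{\partial h} \cdot \frac{\partial h}{\partial k} \cdot \frac{\partial k}{\partial l} \cdot \frac{\partial l}{\partial m} \cdot \frac{\partial m}{\partial n} \cdot \frac{\partial n}{\partial w}$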
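Applying the chain rule to the substituted expression gives ∂L/∂W_hh in closed form. This derivation is our own and uses tanh′(a) = 1 − tanh²(a). It shows that W_hh contributes a term for every time step in which it is used:

```latex
\begin{aligned}
\frac{\partial h_0}{\partial W_{hh}} &= 0 \\
\frac{\partial h_1}{\partial W_{hh}} &= (1 - h_1^2)\left(h_0 + W_{hh}\frac{\partial h_0}{\partial W_{hh}}\right) = (1 - h_1^2)\, h_0 \\
\frac{\partial h_2}{\partial W_{hh}} &= (1 - h_2^2)\left(h_1 + W_{hh}\frac{\partial h_1}{\partial W_{hh}}\right) \\
\frac{\partial L}{\partial W_{hh}} &= 2(y - 3)\, W_{hy}\, \frac{\partial h_2}{\partial W_{hh}}
  = 2(y - 3)\, W_{hy}\, (1 - h_2^2)\left[h_1 + W_{hh}(1 - h_1^2)\, h_0\right]
\end{aligned}
```

With the forward-pass values of this example, this evaluates to about −0.024.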
Gradient by hand
Forward Pass with $W_{xh} = 0.078$, $W_{hy} = 0.051$, $W_{hh} = 0.024$, inputs $x_0 = 1$, $x_1 = 1$, $x_2 = 2$ and target 3. The computational graph is evaluated node by node:
$W_{xh} x_0 = 0.078 \cdot 1 = 0.078$
$h_0 = \tanh(0.078) = 0.0778$
$W_{hh} h_0 = 0.024 \cdot 0.0778 = 0.00187$
$W_{xh} x_1 = 0.078 \cdot 1 = 0.078$
$W_{hh} h_0 + W_{xh} x_1 = 0.07987$
$h_1 = \tanh(0.07987) = 0.0797$
$W_{hh} h_1 = 0.024 \cdot 0.0797 = 0.0019$
$W_{xh} x_2 = 0.078 \cdot 2 = 0.156$
$W_{hh} h_1 + W_{xh} x_2 = 0.1579$
$h_2 = \tanh(0.1579) = 0.1566$
$y = W_{hy} h_2 = 0.051 \cdot 0.1566 = 0.0080$
$y - 3 = -2.99$
$L = (-2.99)^2 = 8.95$
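Here is the same forward pass in plain Python. This sketch should reproduce the slide's numbers up to rounding, and its variable names follow the equations.

```python
import math

W_xh, W_hy, W_hh = 0.078, 0.051, 0.024
x0, x1, x2, target = 1.0, 1.0, 2.0, 3.0

h0 = math.tanh(W_xh * x0)
h1 = math.tanh(W_hh * h0 + W_xh * x1)
h2 = math.tanh(W_hh * h1 + W_xh * x2)
y = W_hy * h2
L = (y - target) ** 2

for name, value in [("h0", h0), ("h1", h1), ("h2", h2), ("y", y), ("L", L)]:
    print(f"{name} = {value:.4f}")
```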
Backward Pass
The chain
$\frac{\partial L}{\partial w_{xh}} = \frac{\partial f}{\partial f} \cdot \frac{\partial f}{\partial g} \cdot \frac{\partial g}{\partial h} \cdot \frac{\partial h}{\partial k} \cdot \frac{\partial k}{\partial l} \cdot \frac{\partial l}{\partial m} \cdot \frac{\partial m}{\partial n} \cdot \frac{\partial n}{\partial w_{xh}}$
is evaluated one factor at a time, starting from $\frac{\partial f}{\partial f} = 1$. Each node multiplies the gradient arriving from above by its local derivative. The chain-rule letters name graph nodes: $f = L$, $g = y - 3$, $h = y$, $k = h_2$, and $l$ is the input of the last tanh.
1. $f = g^2$: $\frac{\partial f}{\partial g} = 2g = 2 \cdot (-2.99) = -5.98$
2. $g = h - 3$: $\frac{\partial g}{\partial h} = 1$, so the gradient at $y$ is still $-5.98$
3. $h = W_{hy} k$: $\frac{\partial h}{\partial k} = W_{hy} = 0.051$, so the gradient at $h_2$ is $-5.98 \cdot 0.051 = -0.305$. Also $\frac{\partial h}{\partial W_{hy}} = k = 0.1566$, so $\frac{\partial L}{\partial W_{hy}} = -5.98 \cdot 0.1566 = -0.936$
4. $k = \tanh(l)$: $\frac{\partial k}{\partial l} = 1 - k^2 = 1 - 0.1566^2 = 0.975$, so the gradient at $l$ is $-0.305 \cdot 0.975 = -0.297$
5. $l = W_{hh} h_1 + W_{xh} x_2$: the sum passes $-0.297$ unchanged to both terms. Through the product with $W_{hh}$, the gradient at $h_1$ is $-0.297 \cdot 0.024 = -0.0071$
6. $h_1 = \tanh(W_{hh} h_0 + W_{xh} x_1)$: $1 - h_1^2 = 1 - 0.0797^2 = 0.994$, so the gradient stays $\approx -0.0071$ and is passed to both $W_{hh} h_0$ and $W_{xh} x_1$
7. $W_{hh} h_0$: the gradient at $h_0$ is $-0.0071 \cdot 0.024 = -0.00017$. This product contributes $-0.0071 \cdot h_0 = -0.0071 \cdot 0.0778 \approx -0.00055$ to $\frac{\partial L}{\partial w_{hh}}$
8. $h_0 = \tanh(W_{xh} x_0)$: $1 - h_0^2 = 1 - 0.0778^2 = 0.994$, so the gradient stays $\approx -0.00017$. With $x_0 = 1$, it contributes $-0.00017$ to $\frac{\partial L}{\partial w_{xh}}$
Update with $w_a := w_a - 0.01 \cdot \frac{\partial L}{\partial w_a}$:
$w_{xh} := 0.078 - 0.01 \cdot (-0.00017) = 0.0780017$
$w_{hh} := 0.024 - 0.01 \cdot (-0.00055) = 0.0240055$
These two updates use only the contributions that reach the first time step, which shows how little gradient gets back that far. $W_{xh}$ and $W_{hh}$ are shared by all time steps, so the full gradient is the sum of the contributions from every step. For example, $\frac{\partial L}{\partial w_{xh}}$ also receives $-0.297 \cdot x_2$ and $-0.0071 \cdot x_1$ from the later steps.
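Below is a sketch of the full backward pass (backpropagation through time) for this scalar example, using the same weights and inputs. It records each time step's contribution to ∂L/∂W_xh and ∂L/∂W_hh separately and then sums them. The step-0 entry of ∂L/∂W_xh and the step-1 entry of ∂L/∂W_hh can be compared with −0.00017 and ≈ −0.00055 above, and the sums give the full gradients.

```python
import math

W_xh, W_hy, W_hh = 0.078, 0.051, 0.024
xs, target = [1.0, 1.0, 2.0], 3.0

# Forward pass, storing every hidden state for the backward pass.
hs, h = [], 0.0                     # starting from 0 makes the first step tanh(W_xh * x_0)
for x in xs:
    h = math.tanh(W_hh * h + W_xh * x)
    hs.append(h)
y = W_hy * hs[-1]

# Backward pass through time.
dy = 2 * (y - target)               # dL/dy for L = (y - 3)^2
dW_hy = dy * hs[-1]
dh = dy * W_hy                      # gradient arriving at the last hidden state
dW_xh_steps = [0.0] * len(xs)
dW_hh_steps = [0.0] * len(xs)
for t in reversed(range(len(xs))):
    da = dh * (1 - hs[t] ** 2)                  # through tanh: 1 - h^2
    dW_xh_steps[t] = da * xs[t]                 # step t's share of dL/dW_xh
    h_in = hs[t - 1] if t > 0 else 0.0          # hidden state that step t received
    dW_hh_steps[t] = da * h_in                  # step t's share of dL/dW_hh
    dh = da * W_hh                              # send the gradient one step further back

dW_xh, dW_hh = sum(dW_xh_steps), sum(dW_hh_steps)
print("dL/dW_xh per step:", [f"{g:.5f}" for g in dW_xh_steps])
print("dL/dW_hh per step:", [f"{g:.5f}" for g in dW_hh_steps])
print(f"full gradients: dW_xh={dW_xh:.4f}, dW_hh={dW_hh:.4f}, dW_hy={dW_hy:.4f}")
```

The step-2 contributions are thousands of times larger than the step-0 ones. The step-0 contribution to ∂L/∂W_hh is exactly zero, because W_hh is not used at that step.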
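[Figure: the computational graph again, highlighting the gradient path from $L$ back to the first input. Every step back passes through a tanh and a multiplication by $w_{hh}$.]
$\frac{\partial L}{\partial x} = w_{hh} \cdots w_{hh} \cdots w_{hh} \cdots w_{hh} = w_{hh}^{n} \cdot C(w)$
Here $n$ is the number of time steps the gradient travels back. $C(w)$ collects the remaining factors, such as the tanh derivatives, each of which is at most 1. With $W_{hh} = 0.024$, the factor $w_{hh}^{n}$ for $n = 1, \dots, 10$ is:
1. 0.024
2. 0.000576
3. 1.382e-05
4. 3.318e-07
5. 7.963e-09
6. 1.911e-10
7. 4.586e-12
8. 1.101e-13
9. 2.642e-15
10. 6.340e-17
[Figure: a chain of tanh units, one per time step.]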
Source: https://imgur.com/gallery/vaNahKE
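The table can be reproduced with a short loop. This sketch assumes only W_hh = 0.024 and leaves out the C(w) factor.

```python
W_hh = 0.024
factors = [W_hh ** n for n in range(1, 11)]   # w_hh^n for n = 1..10
for n, value in enumerate(factors, start=1):
    print(f"{n}. {value:.3e}")
```

Any |W_hh| < 1 shrinks geometrically in the same way. With |W_hh| > 1, the product grows instead.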
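Long Short-Term Memory (LSTM)
$\begin{pmatrix} i \\ f \\ o \\ g \end{pmatrix} = \begin{pmatrix} \mathrm{sigm} \\ \mathrm{sigm} \\ \mathrm{sigm} \\ \tanh \end{pmatrix} W \begin{pmatrix} x \\ h_{t-1} \end{pmatrix}$
$c_t = f \odot c_{t-1} + i \odot g$
$h_t = o \odot \tanh(c_t)$
$W$ is a $4n \times 2n$ matrix. It maps the stacked input $x$ and previous hidden state $h_{t-1}$ ($2n$ values) to four blocks of $n$ values. The sigmoid $\sigma$ (sigm) turns the first three blocks into the gates $i$, $f$, $o$, and tanh turns the last block into the candidate values $g$.
[Figure: LSTM cell between time steps $t-1$ and $t$, with the four blocks $i$, $f$, $o$, $g$, each of size $n$.]
Comparison with a vanilla RNN, which applies a single tanh to the same stacked input:
RNN: $h_t = \tanh\left(W \begin{pmatrix} x \\ h_{t-1} \end{pmatrix}\right) = \tanh(W_{hh} h_{t-1} + W_{xh} x)$
LSTM: $c_t = f \odot c_{t-1} + i \odot g$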
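Here is a minimal NumPy sketch of one LSTM step, written directly from the equations. It makes these assumptions:
- x and h_{t−1} both have size n, so W is 4n × 2n.
- There is no bias term, because the slide shows none.
- The rows of W are split in the order i, f, o, g.

`lstm_step` and the random weights are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W):
    """One LSTM step following the slide equations (no bias term).

    x, h_prev, c_prev: shape (n,); W: shape (4n, 2n).
    """
    n = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev])      # shape (4n,)
    i = sigmoid(z[0:n])                      # input gate
    f = sigmoid(z[n:2 * n])                  # forget gate
    o = sigmoid(z[2 * n:3 * n])              # output gate
    g = np.tanh(z[3 * n:4 * n])              # candidate values
    c = f * c_prev + i * g                   # c_t = f . c_{t-1} + i . g (elementwise)
    h = o * np.tanh(c)                       # h_t = o . tanh(c_t) (elementwise)
    return h, c

n = 3
rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(4 * n, 2 * n))
h, c = np.zeros(n), np.zeros(n)
for x in rng.normal(size=(5, n)):            # a short random input sequence
    h, c = lstm_step(x, h, c, W)
print(h, c)
```

The only difference from the RNN step is the extra cell state `c`. It is updated by an elementwise multiply and add instead of being passed through W and tanh.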
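Gate roles in $c_t = f \odot c_{t-1} + i \odot g$ and $h_t = o \odot \tanh(c_t)$:
f: forget gate, with values between 0 and 1; decides how much of $c_{t-1}$ is kept
i: input gate, with values between 0 and 1; decides how much of the incoming values $g$ is added
g: incoming candidate values
o: output gate; decides how much of $\tanh(c_t)$ is exposed as $h_t$
[Figure: the cell state $c_{t-1}$ is multiplied (X) by $f$, the product $i \cdot g$ is added (+), and the result passes through tanh and is multiplied (X) by $o$.]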
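Long Short-Term Memory (LSTM)
[Figure: LSTM cell, with the previous cell state $c_{t-1}$ coming in and the hidden state $h_t$ going out.]
Flow of gradient
RNN: going back from $t+1$ to $t$ to $t-1$, the gradient is multiplied by $w_{hh}$ at every step, so $\frac{\partial L}{\partial x} = w_{hh}^{n} \cdot C(w)$.
LSTM: along the cell state, each step back passes through a multiplication by the forget gate $f$ and an addition (+). The addition passes the gradient through unchanged, so along this path the gradient is scaled by the forget-gate values instead of by repeated factors of $w_{hh}$.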
Source: https://imgur.com/gallery/vaNahKE
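This is a rough numerical illustration of the two gradient paths, and it relies on strong simplifications. The RNN factor keeps only w_hh^n and ignores C(w). The LSTM factor follows only the direct cell-state path, where each step back multiplies by f. The forget-gate value 0.95 is a hypothetical constant; in a real LSTM, f is computed anew at every step.

```python
W_hh = 0.024        # recurrent weight from the slides
f_gate = 0.95       # hypothetical forget-gate value, held fixed for illustration

steps = [1, 5, 10, 50]
rnn = [W_hh ** n for n in steps]        # RNN path: one factor of w_hh per step back
lstm = [f_gate ** n for n in steps]     # LSTM cell-state path: one factor of f per step back
for n, r, c in zip(steps, rnn, lstm):
    print(f"n = {n:2d}   RNN {r:.2e}   LSTM cell state {c:.2e}")
```

When the forget gate stays close to 1, the cell-state path keeps a usable gradient over many more steps than the repeated multiplication by a small w_hh.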
Long Short-Term Memory (LSTM)
Source: https://colah.github.io/posts/2015-08-Understanding-LSTMs/
Essentials of Automations: Exploring Attributes & Automation Parameters
Safe Software
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
operationspcvita
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 
What is an RPA CoE? Session 2 – CoE Roles
What is an RPA CoE?  Session 2 – CoE RolesWhat is an RPA CoE?  Session 2 – CoE Roles
What is an RPA CoE? Session 2 – CoE Roles
DianaGray10
 
Session 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdfSession 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdf
UiPathCommunity
 
Day 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio FundamentalsDay 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio Fundamentals
UiPathCommunity
 
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge GraphGraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
Neo4j
 
AppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSFAppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSF
Ajin Abraham
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
Jason Yip
 
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeckPoznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
FilipTomaszewski5
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
Fwdays
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
A Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's ArchitectureA Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's Architecture
ScyllaDB
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
BibashShahi
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
DianaGray10
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
Miro Wengner
 
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid ResearchHarnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
Neo4j
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
AstuteBusiness
 
ScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking ReplicationScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking Replication
ScyllaDB
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
Pablo Gómez Abajo
 

Recently uploaded (20)

Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
What is an RPA CoE? Session 2 – CoE Roles
What is an RPA CoE?  Session 2 – CoE RolesWhat is an RPA CoE?  Session 2 – CoE Roles
What is an RPA CoE? Session 2 – CoE Roles
 
Session 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdfSession 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdf
 
Day 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio FundamentalsDay 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio Fundamentals
 
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge GraphGraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
 
AppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSFAppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSF
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
 
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeckPoznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
A Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's ArchitectureA Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's Architecture
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
 
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid ResearchHarnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
 
ScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking ReplicationScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking Replication
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
 

Recurrent Networks and LSTM deep dive

Editor's Notes

  1. First, we calculate the new hidden state from both the previous hidden state and the current input. Using the previous hidden state is what gives the network “memory”. Then we use the new hidden state to calculate the new output, y. Together these two steps are the forward pass. All operations are differentiable, so we can train the network with standard back-propagation.
  2. We need to train our network. Calculating the loss L is the forward pass. Back-propagation then computes the gradient of L with respect to every weight, and the weights are updated using those gradients.
  3. How to calculate the derivative of a function.
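The forward pass described in the notes can be sketched in plain Python with the weights from the “hello” example on the slides (vocabulary [e, h, l, o], a one-dimensional hidden state, h_0 = 0). The slide weights are rounded, so the hidden states computed here agree with the slides only to about two decimals (for example tanh(-4.39) ≈ -0.9997, shown on the slide as -0.99). The function names and the greedy argmax choice of the next character are illustrative assumptions, not taken from the deck.

```python
import math

# Weights as printed on the "hello" slides (rounded there).
VOCAB = ["e", "h", "l", "o"]
W_HH = 4.1                          # scalar: the hidden state has one unit
W_XH = [3.6, -4.8, 0.35, -0.26]     # one-hot input -> hidden
W_HY = [-12.0, -0.67, -0.85, 14.0]  # hidden -> one score per character
B_H = 0.41
B_Y = [-0.2, -2.9, 6.1, -3.4]


def one_hot(ch):
    """Encode a character as a one-hot vector over VOCAB."""
    return [1.0 if c == ch else 0.0 for c in VOCAB]


def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rnn_step(h_prev, x):
    """h_t = tanh(W_hh h_(t-1) + W_xh x + b_h);  y = W_hy h_t + b_y;  p = softmax(y)."""
    pre = W_HH * h_prev + sum(w * xi for w, xi in zip(W_XH, x)) + B_H
    h = math.tanh(pre)
    y = [w * h + b for w, b in zip(W_HY, B_Y)]
    return h, y, softmax(y)


def forward(text, h0=0.0):
    """Run the RNN over text; return hidden states and the most likely next characters."""
    h = h0
    hidden, predictions = [], []
    for ch in text:
        h, _, p = rnn_step(h, one_hot(ch))  # the new h is fed into the next step
        hidden.append(h)
        predictions.append(VOCAB[p.index(max(p))])
    return hidden, predictions


hidden, predictions = forward("hell")
print([round(h, 2) for h in hidden])  # hidden state after each input character
print("".join(predictions))           # ello
```

Feeding “h”, “e”, “l”, “l” in turn reproduces the slide sequence: each prediction is the character that follows in “hello”, and only the single number h carries information from one step to the next.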
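Training, as the notes describe it, means computing the loss L in a forward pass and then back-propagating to get a gradient for every weight. Below is a minimal sketch for the same one-unit architecture trained to map “hell” to “ello”. It assumes a cross-entropy loss summed over time steps, back-propagation through time, plain gradient descent with a hypothetical learning rate of 0.1, and a hypothetical random initialization; the notes do not specify any of these.

```python
import math
import random

VOCAB = ["e", "h", "l", "o"]
IDX = {c: i for i, c in enumerate(VOCAB)}
V = len(VOCAB)


def init_params(seed=0):
    """Hypothetical small random initialization for a 1-unit hidden state."""
    rng = random.Random(seed)

    def r():
        return rng.uniform(-0.5, 0.5)

    return {"W_hh": r(), "W_xh": [r() for _ in range(V)], "b_h": 0.0,
            "W_hy": [r() for _ in range(V)], "b_y": [0.0] * V}


def loss_and_grads(params, inputs, targets):
    """Forward pass: cross-entropy loss L. Backward pass (BPTT): dL/d(every weight)."""
    W_hh, W_xh, b_h = params["W_hh"], params["W_xh"], params["b_h"]
    W_hy, b_y = params["W_hy"], params["b_y"]
    hs, ps, loss = [0.0], [], 0.0  # hs[0] is h_0 = 0
    for ch, tgt in zip(inputs, targets):
        # W_xh . one_hot(ch) simply selects the weight for ch
        h = math.tanh(W_hh * hs[-1] + W_xh[IDX[ch]] + b_h)
        y = [w * h + b for w, b in zip(W_hy, b_y)]
        m = max(y)
        exps = [math.exp(v - m) for v in y]
        total = sum(exps)
        p = [v / total for v in exps]
        loss -= math.log(p[IDX[tgt]])
        hs.append(h)
        ps.append(p)

    g = {"W_hh": 0.0, "W_xh": [0.0] * V, "b_h": 0.0, "W_hy": [0.0] * V, "b_y": [0.0] * V}
    dh_next = 0.0  # gradient reaching h_t from step t+1
    for t in reversed(range(len(ps))):
        h, h_prev = hs[t + 1], hs[t]
        dy = list(ps[t])
        dy[IDX[targets[t]]] -= 1.0  # dL/dy = p - one_hot(target)
        for k in range(V):
            g["W_hy"][k] += dy[k] * h
            g["b_y"][k] += dy[k]
        dh = sum(W_hy[k] * dy[k] for k in range(V)) + dh_next
        da = dh * (1.0 - h * h)  # tanh'(a) = 1 - tanh(a)^2
        g["W_hh"] += da * h_prev
        g["W_xh"][IDX[inputs[t]]] += da
        g["b_h"] += da
        dh_next = W_hh * da  # pass the gradient back to h_(t-1)
    return loss, g


def sgd_step(params, grads, lr):
    """Gradient descent: w <- w - lr * dL/dw for every weight."""
    for name, grad in grads.items():
        if isinstance(grad, list):
            params[name] = [w - lr * dw for w, dw in zip(params[name], grad)]
        else:
            params[name] -= lr * grad


params = init_params()
for step in range(500):
    loss, grads = loss_and_grads(params, "hell", "ello")
    sgd_step(params, grads, lr=0.1)
    if step % 100 == 0:
        print(step, round(loss, 3))
```

The variable dh_next is what makes this back-propagation *through time*: the error at a later character flows back through W_hh and adjusts how earlier inputs were written into the hidden state.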
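The notes do not show which method the slide uses to calculate a derivative, so this sketch assumes the central finite difference f'(x) ≈ (f(x + ε) - f(x - ε)) / (2ε) and compares it with the analytic derivative of tanh, 1 - tanh²(x). That analytic factor is exactly what back-propagation multiplies by at the RNN's tanh hidden layer; numerical derivatives like this are a common way to check hand-derived gradients.

```python
import math


def numerical_derivative(f, x, eps=1e-5):
    """Central finite difference; the truncation error shrinks roughly like eps**2."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


def tanh_derivative(x):
    """Analytic derivative: d/dx tanh(x) = 1 - tanh(x)**2."""
    t = math.tanh(x)
    return 1.0 - t * t


for x in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    approx = numerical_derivative(math.tanh, x)
    exact = tanh_derivative(x)
    print(f"x={x:+.1f}  numeric={approx:.6f}  analytic={exact:.6f}")
```

The derivative is largest (1) at x = 0 and approaches 0 as |x| grows, so a saturated tanh unit passes very little gradient backward.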