Lecture 10: Recurrent Neural Networks
Fei-Fei Li, Jiajun Wu, Ruohan Gao - April 28, 2022
Administrative
- Project TA matchups out, see Ed for the link
- A2 is due next Monday May 2nd, 11:59pm
- Discussion section tomorrow 2:30-3:30 PT: object detection & RNNs review
Last time: Detection and Segmentation
- Classification: one label for the whole image, no spatial extent (e.g. CAT)
- Semantic Segmentation: per-pixel labels, no objects, just pixels (e.g. GRASS, CAT, TREE, SKY)
- Object Detection: multiple objects localized with boxes (e.g. DOG, DOG, CAT)
- Instance Segmentation: multiple objects with per-pixel masks (e.g. DOG, DOG, CAT)
Training "Feedforward" Neural Networks
1. One time set up: activation functions, preprocessing, weight initialization, regularization, gradient checking
2. Training dynamics: babysitting the learning process, parameter updates, hyperparameter optimization
3. Evaluation: model ensembles, test-time augmentation, transfer learning
Today: Recurrent Neural Networks
Vanilla Neural Networks
A "vanilla" (feedforward) neural network maps one fixed-size input to one fixed-size output.
Recurrent Neural Networks: Process Sequences
- one to many, e.g. image captioning: image -> sequence of words
- many to one, e.g. action prediction: sequence of video frames -> action class
- many to many, e.g. video captioning: sequence of video frames -> caption
- many to many (one output per input), e.g. video classification on frame level
Sequential Processing of Non-Sequence Data
- Classify images by taking a series of "glimpses" (Ba, Mnih, and Kavukcuoglu, "Multiple Object Recognition with Visual Attention", ICLR 2015)
- Generate images one piece at a time! (Gregor et al, "DRAW: A Recurrent Neural Network For Image Generation", ICML 2015)
Recurrent Neural Network
[Diagram: input x feeds an RNN block with a recurrent loop, producing output y.]
Key idea: RNNs have an "internal state" that is updated as a sequence is processed.
Unrolled RNN
[Diagram: the same RNN applied at each time step: x1 -> RNN -> y1, x2 -> RNN -> y2, x3 -> RNN -> y3, ..., xt -> RNN -> yt.]
RNN hidden state update
We can process a sequence of vectors x by applying a recurrence formula at every time step:

h_t = f_W(h_{t-1}, x_t)

where h_t is the new state, h_{t-1} is the old state, x_t is the input vector at some time step, and f_W is some function with parameters W.
RNN output generation
The output at each time step is read out from the new state by another function with parameters W_o:

y_t = f_{W_o}(h_t)
Recurrent Neural Network
[Diagram: the unrolled RNN passes hidden states between steps: h0 -> RNN(x1) -> h1 -> RNN(x2) -> h2 -> RNN(x3) -> h3 -> ..., emitting y1, y2, y3, ..., yt.]
Notice: the same function and the same set of parameters are used at every time step.
(Vanilla) Recurrent Neural Network
The state consists of a single "hidden" vector h, and the recurrence is:

h_t = tanh(W_hh h_{t-1} + W_xh x_t)
y_t = W_hy h_t

Sometimes called a "Vanilla RNN" or an "Elman RNN" after Prof. Jeffrey Elman.
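To make the recurrence concrete, here is a minimal numpy sketch of the vanilla RNN step (not course code; the sizes, seed, and variable names are illustrative):

```python
import numpy as np

# Illustrative sizes: 4-dim inputs, 3-dim hidden state.
rng = np.random.default_rng(0)
D, H = 4, 3
Wxh = rng.standard_normal((H, D)) * 0.01  # input-to-hidden weights
Whh = rng.standard_normal((H, H)) * 0.01  # hidden-to-hidden weights
Why = rng.standard_normal((D, H)) * 0.01  # hidden-to-output weights

def rnn_step(h_prev, x):
    """One vanilla RNN step: h_t = tanh(Whh h_{t-1} + Wxh x_t)."""
    h = np.tanh(Whh @ h_prev + Wxh @ x)
    y = Why @ h                 # y_t = Why h_t
    return h, y

# Unroll over a short sequence: the same weights are reused at every step.
h = np.zeros(H)                 # initial hidden state h_0
for x in rng.standard_normal((5, D)):
    h, y = rnn_step(h, x)
```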
RNN: Computational Graph
[Diagram: the recurrence unrolled as a computational graph: h0 and x1 feed f_W to produce h1; h1 and x2 feed f_W to produce h2; and so on up to hT.]
Re-use the same weight matrix W at every time-step.
RNN: Computational Graph: Many to Many
[Diagram: every hidden state ht produces an output yt; each output yt has its own loss Lt, and the total loss L is the sum L1 + L2 + L3 + ... + LT.]
RNN: Computational Graph: Many to One
[Diagram: only the final hidden state hT, which summarizes the entire input sequence, produces the single output y.]
RNN: Computational Graph: One to Many
[Diagram: a single input x initializes the recurrence, which then produces outputs y1, y2, y3, ..., yT.]
What is fed as input at the later time steps? One option is a fixed input of all zeros; another is to feed the previous output y_{t-1} back in as the next input.
Sequence to Sequence: Many-to-one + one-to-many
- Many to one: encode the input sequence in a single vector (encoder weights W1).
- One to many: produce the output sequence from that single input vector (decoder weights W2).
Sutskever et al, "Sequence to Sequence Learning with Neural Networks", NIPS 2014
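A minimal sketch of this encoder-decoder idea (illustrative names and sizes; not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
D, H = 4, 3
W1 = rng.standard_normal((H, H + D)) * 0.01  # encoder weights (many-to-one)
W2 = rng.standard_normal((H, H + D)) * 0.01  # decoder weights (one-to-many)

def step(W, h, x):
    # One recurrence step acting on the stacked vector [h; x].
    return np.tanh(W @ np.concatenate([h, x]))

# Encoder: compress the whole input sequence into a single vector hT.
h = np.zeros(H)
for x in rng.standard_normal((6, D)):
    h = step(W1, h, x)

# Decoder: unroll from hT, feeding a placeholder input (zeros here)
# and reading out a prediction at each step.
y_prev = np.zeros(D)
for t in range(4):
    h = step(W2, h, y_prev)
    # ... project h to an output y_t and feed it back as y_prev
```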
Example: Character-level Language Model
Vocabulary: [h,e,l,o]
Example training sequence: "hello"
[Diagram: each character is one-hot encoded and fed into the RNN; at every time step the model outputs scores over the vocabulary and is trained to predict the next character in the sequence.]
Example: Character-level Language Model - Sampling
Vocabulary: [h,e,l,o]
At test-time, sample characters one at a time and feed each sampled character back into the model as the next input.
[Diagram: starting from "h", the softmax at each step gives a distribution over [h,e,l,o] (e.g. [.03, .84, .00, .13]); sampling from it yields "e", "l", "l", "o" in turn.]
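A sketch of this test-time sampling loop, loosely in the spirit of min-char-rnn (the weights here are random stand-ins, so the samples are meaningless):

```python
import numpy as np

vocab = ['h', 'e', 'l', 'o']
rng = np.random.default_rng(2)

def softmax(z):
    z = z - z.max()               # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Stand-in weights, as in the vanilla RNN sketch above.
V, H = len(vocab), 3
Wxh = rng.standard_normal((H, V)) * 0.5
Whh = rng.standard_normal((H, H)) * 0.5
Why = rng.standard_normal((V, H)) * 0.5

h = np.zeros(H)
idx = vocab.index('h')            # seed character
out = []
for t in range(4):
    x = np.zeros(V)
    x[idx] = 1                    # one-hot encode the current character
    h = np.tanh(Whh @ h + Wxh @ x)
    p = softmax(Why @ h)          # distribution over the next character
    idx = rng.choice(V, p=p)      # sample, then feed back in
    out.append(vocab[idx])
print(''.join(out))
```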
Example: Character-level Language Model - Sampling (with an embedding layer)
Matrix multiply with a one-hot vector just extracts a column from the weight matrix:

[w11 w12 w13 w14]   [1]   [w11]
[w21 w22 w23 w24] x [0] = [w21]
[w31 w32 w33 w34]   [0]   [w31]
                    [0]

We often put a separate embedding layer between the input and hidden layers.
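A quick numpy check of this fact (illustrative values):

```python
import numpy as np

W = np.arange(12).reshape(3, 4)              # a 3x4 weight matrix
one_hot = np.array([1, 0, 0, 0])             # one-hot vector selecting column 0
assert np.array_equal(W @ one_hot, W[:, 0])  # the matmul just extracts a column
```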
Backpropagation through time
Forward through the entire sequence to compute the loss, then backward through the entire sequence to compute the gradient.
Truncated Backpropagation through time
Run forward and backward through chunks of the sequence instead of the whole sequence. Carry hidden states forward in time forever, but only backpropagate for some smaller number of steps.
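In practice this is often implemented by detaching the carried hidden state between chunks; a minimal PyTorch sketch (assumed framework and illustrative sizes, not course-provided code):

```python
import torch
import torch.nn as nn

# Illustrative model and data sizes.
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
readout = nn.Linear(16, 8)
opt = torch.optim.Adam(list(rnn.parameters()) + list(readout.parameters()))

seq = torch.randn(1, 1000, 8)       # one long sequence
target = torch.randn(1, 1000, 8)
chunk = 25                          # backprop through at most 25 steps

h = None
for start in range(0, seq.size(1), chunk):
    x = seq[:, start:start + chunk]
    y = target[:, start:start + chunk]
    out, h = rnn(x, h)
    loss = ((readout(out) - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()                 # gradient flows only within this chunk
    opt.step()
    h = h.detach()                  # carry the state forward, cut the graph
```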
min-char-rnn.py gist: 112 lines of Python (https://gist.github.com/karpathy/d4dee566867f8291f086)
Sampling from the model during training: at first the samples are gibberish, but as we train more and more, they look increasingly like the training text.
The Stacks Project: open source algebraic geometry textbook
LaTeX source: http://stacks.math.columbia.edu/
The Stacks Project is licensed under the GNU Free Documentation License.
Samples from an RNN trained on this source look like plausible (if nonsensical) algebraic geometry, complete with LaTeX markup.
Generated C code: an RNN trained on C source code likewise produces plausible-looking code.
For a modern take on code generation, see https://openai.com/blog/openai-codex/
OpenAI GPT-2 generated text
Input: In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.
Output: The scientist named the population, after their distinctive horn, Ovid's Unicorn. These four-horned, silver-white unicorns were previously unknown to science.
Now, after almost two centuries, the mystery of what sparked this odd phenomenon is finally solved.
Dr. Jorge Pérez, an evolutionary biologist from the University of La Paz, and several companions, were exploring the Andes Mountains when they found a small valley, with no other animals or humans. Pérez noticed that the valley had what appeared to be a natural fountain, surrounded by two peaks of rock and silver snow.
Searching for interpretable cells
Karpathy, Johnson, and Fei-Fei, "Visualizing and Understanding Recurrent Networks", ICLR Workshop 2016
Individual cells in the hidden state of a trained character-level RNN turn out to be interpretable, e.g.:
- quote detection cell
- line length tracking cell
- if statement cell
- quote/comment cell
- code depth cell
RNN tradeoffs
RNN Advantages:
- Can process any length input
- Computation for step t can (in theory) use information from many steps back
- Model size doesn't increase for longer input
- Same weights applied on every timestep, so there is symmetry in how inputs are processed
RNN Disadvantages:
- Recurrent computation is slow (time steps cannot be parallelized)
- In practice, it is difficult to access information from many steps back
Image Captioning
- Mao et al, "Explain Images with Multimodal Recurrent Neural Networks"
- Karpathy and Fei-Fei, "Deep Visual-Semantic Alignments for Generating Image Descriptions", CVPR 2015
- Vinyals et al, "Show and Tell: A Neural Image Caption Generator"
- Donahue et al, "Long-term Recurrent Convolutional Networks for Visual Recognition and Description"
- Chen and Zitnick, "Learning a Recurrent Visual Representation for Image Caption Generation"
[Diagram walkthrough: a test image is passed through a Convolutional Neural Network, and its feature vector v conditions a Recurrent Neural Network that generates the caption.]
The image enters the recurrence as an extra term:

before: h = tanh(Wxh * x + Whh * h)
now: h = tanh(Wxh * x + Whh * h + Wih * v)

Starting from a special <START> token x0, at each step we sample a word from the output distribution (e.g. "straw", then "hat"), feed the sampled word back in as the next input, and finish once the <END> token is sampled.
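A sketch of the modified recurrence (names mirror the slide; v is a stand-in for the CNN feature vector, and all sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
D, H, F = 4, 3, 5                 # word-embedding, hidden, and CNN-feature sizes
Wxh = rng.standard_normal((H, D)) * 0.01
Whh = rng.standard_normal((H, H)) * 0.01
Wih = rng.standard_normal((H, F)) * 0.01  # projects the image feature into the state

v = rng.standard_normal(F)        # CNN feature vector of the test image (stand-in)
h = np.zeros(H)
x = rng.standard_normal(D)        # embedding of the <START> token (stand-in)

# before: h = tanh(Wxh @ x + Whh @ h)
# now the image feature enters the recurrence at every step:
h = np.tanh(Wxh @ x + Whh @ h + Wih @ v)
```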
Image Captioning: Example Results
Captions generated using neuraltalk2 (all images CC0 public domain):
- "A cat sitting on a suitcase on the floor"
- "A cat is sitting on a tree branch"
- "A dog is running in the grass with a frisbee"
- "A white teddy bear sitting in the grass"
- "Two people walking on the beach with surfboards"
- "Two giraffes standing in a grassy field"
- "A man riding a dirt bike on a dirt track"
- "A tennis player in action on the court"
Image Captioning: Failure Cases
Captions generated using neuraltalk2 (all images CC0 public domain; the actual images show a fur coat, a handstand, a spider web, and a baseball play):
- "A woman is holding a cat in her hand"
- "A woman standing on a beach holding a surfboard"
- "A person holding a computer mouse on a desk"
- "A bird is perched on a tree branch"
- "A man in a baseball uniform throwing a ball"
Visual Question Answering (VQA)
- Agrawal et al, "VQA: Visual Question Answering", ICCV 2015
- Zhu et al, "Visual7W: Grounded Question Answering in Images", CVPR 2016
Visual Dialog: Conversations about images
Das et al, "Visual Dialog", CVPR 2017
Visual Language Navigation: "Go to the living room"
The agent encodes instructions in language and uses an RNN to generate a series of movements as the visual input changes after each move.
Wang et al, "Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation", CVPR 2019
Visual Question Answering: Dataset Bias
Jabri et al, "Revisiting Visual Question Answering Baselines", ECCV 2016
[Example: image of a dog; question: "What is the dog playing with?"; candidate answer: "Frisbee". The model only has to output Yes or No for the (image, question, answer) triple, and simple baselines of this form do surprisingly well, exposing dataset bias.]
Multilayer RNNs
[Diagram: RNN layers stacked in depth and unrolled in time; the hidden-state sequence of each layer is the input sequence of the layer above.]
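A minimal sketch of a two-layer RNN step under these conventions (illustrative sizes; each weight matrix acts on the stacked [state; input] vector):

```python
import numpy as np

rng = np.random.default_rng(4)
D, H = 4, 3
# One weight matrix per layer; layer 2 consumes layer 1's hidden states.
W1 = rng.standard_normal((H, H + D)) * 0.01
W2 = rng.standard_normal((H, H + H)) * 0.01

h1 = np.zeros(H)
h2 = np.zeros(H)
for x in rng.standard_normal((5, D)):
    h1 = np.tanh(W1 @ np.concatenate([h1, x]))    # layer 1: input is x_t
    h2 = np.tanh(W2 @ np.concatenate([h2, h1]))   # layer 2: input is h1_t
```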
Long Short Term Memory (LSTM)
Hochreiter and Schmidhuber, "Long Short-Term Memory", Neural Computation 1997
[Slide: the Vanilla RNN update shown side by side with the LSTM update; details follow below.]
Vanilla RNN Gradient Flow
[Diagram: one vanilla RNN cell: h_{t-1} and x_t are stacked, multiplied by W, and passed through tanh to produce h_t (and output y_t).]
Backpropagation from h_t to h_{t-1} multiplies by W (actually W_hh^T).
Bengio et al, "Learning long-term dependencies with gradient descent is difficult", IEEE Transactions on Neural Networks, 1994
Pascanu et al, "On the difficulty of training recurrent neural networks", ICML 2013
Gradients over multiple time steps: backpropagating from the last hidden state h_T to h_0 multiplies by W_hh^T (and by the local tanh gradient) once per time step, so the gradient involves a product of T such factors.
The tanh gradient factors are almost always < 1: vanishing gradients.
What if we assumed no non-linearity? Then the product is T repeated multiplications by W_hh^T:
- Largest singular value > 1: exploding gradients
- Largest singular value < 1: vanishing gradients
For exploding gradients, use gradient clipping: scale the gradient if its norm is too big. For vanishing gradients, change the RNN architecture.
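A minimal sketch of norm-based gradient clipping (the same idea as PyTorch's torch.nn.utils.clip_grad_norm_):

```python
import numpy as np

def clip_gradient(grads, max_norm=5.0):
    """Scale the gradient down if its global norm exceeds max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads
```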
Long Short Term Memory (LSTM)
[Hochreiter et al., 1997]
Compared with the vanilla RNN, the LSTM keeps two vectors per time step: a cell state c_t and a hidden state h_t, computed through four gates. A single weight matrix W of shape 4h x 2h acts on the stacked vector of h_{t-1} (the vector from before) and x_t (the vector from below) to produce four h-dimensional gate vectors:
- i: Input gate, whether to write to cell (sigmoid)
- f: Forget gate, whether to erase cell (sigmoid)
- o: Output gate, how much to reveal cell (sigmoid)
- g: "Gate gate" (?), how much to write to cell (tanh)
The cell and hidden states are then updated as:

c_t = f ⊙ c_{t-1} + i ⊙ g
h_t = o ⊙ tanh(c_t)
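Putting the four gates and the state updates together, a minimal numpy sketch of one LSTM step (biases omitted for brevity; sizes illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(c_prev, h_prev, x, W):
    """One LSTM step. W has shape (4h, h + d): one matrix produces all four gates."""
    H = h_prev.shape[0]
    a = W @ np.concatenate([h_prev, x])   # shape (4h,)
    i = sigmoid(a[0:H])                   # input gate
    f = sigmoid(a[H:2*H])                 # forget gate
    o = sigmoid(a[2*H:3*H])               # output gate
    g = np.tanh(a[3*H:4*H])               # "gate gate" (candidate write)
    c = f * c_prev + i * g                # cell state: elementwise, no matmul
    h = o * np.tanh(c)                    # hidden state revealed by output gate
    return c, h

# Illustrative usage with random weights.
rng = np.random.default_rng(5)
Hd, D = 3, 4
W = rng.standard_normal((4 * Hd, Hd + D)) * 0.1
c, h = lstm_step(np.zeros(Hd), np.zeros(Hd), rng.standard_normal(D), W)
```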
Long Short Term Memory (LSTM): Gradient Flow
[Hochreiter et al., 1997]
Backpropagation from c_t to c_{t-1} is only an elementwise multiplication by f; there is no matrix multiply by W.
[Diagram: the cell-state path c0 -> c1 -> c2 -> c3 gives uninterrupted gradient flow.]
Notice that the gradient contains the f gate's vector of activations, which allows better control of gradient values via suitable parameter updates of the forget gate. Also notice that the gradients are added through the f, i, g, and o gates, giving better balancing of gradient values.
Do LSTMs solve the vanishing gradient problem?
The LSTM architecture makes it easier for the RNN to preserve information over many timesteps:
- e.g. if f = 1 and i = 0, then the information in that cell is preserved indefinitely.
- By contrast, it is harder for a vanilla RNN to learn a recurrent weight matrix W_h that preserves information in the hidden state.
The LSTM doesn't guarantee that there is no vanishing/exploding gradient, but it does provide an easier way for the model to learn long-distance dependencies.
Long Short Term Memory (LSTM): Gradient Flow
The uninterrupted gradient flow along the cell state is similar to ResNet, where additive skip connections let gradients flow unimpeded through many layers.
[Diagram: ResNet architecture (7x7 conv, stacks of 3x3 conv layers, pooling, FC-1000, softmax) with skip connections.]
In between the two: Highway Networks, which gate the skip connection (Srivastava et al, "Highway Networks", ICML DL Workshop 2015).
Other RNN Variants
- GRU (Cho et al, "Learning phrase representations using RNN encoder-decoder for statistical machine translation", 2014)
- LSTM: A Search Space Odyssey (Greff et al., 2015)
- An Empirical Exploration of Recurrent Network Architectures (Jozefowicz et al., 2015)
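For reference, the standard GRU update from Cho et al. 2014, as it is commonly written (not shown in the extracted text; biases omitted):

```latex
\begin{aligned}
r_t &= \sigma\!\left(W_r\,[h_{t-1};\, x_t]\right) && \text{reset gate} \\
z_t &= \sigma\!\left(W_z\,[h_{t-1};\, x_t]\right) && \text{update gate} \\
\tilde{h}_t &= \tanh\!\left(W\,[r_t \odot h_{t-1};\, x_t]\right) && \text{candidate state} \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```

Like the LSTM, the GRU updates its state additively (interpolating between the old state and a candidate), which improves gradient flow.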
Neural Architecture Search for RNN architectures
Zoph and Le, "Neural Architecture Search with Reinforcement Learning", ICLR 2017
[Figure: a standard LSTM cell alongside the cell found by the search.]
Summary
- RNNs allow a lot of flexibility in architecture design
- Vanilla RNNs are simple but don't work very well
- It is common to use LSTM or GRU: their additive interactions improve gradient flow
- The backward flow of gradients in an RNN can explode or vanish; exploding is controlled with gradient clipping, vanishing with additive interactions (LSTM)
- Better/simpler architectures are a hot topic of current research, as are new paradigms for reasoning over sequences
- Better understanding (both theoretical and empirical) is needed
Next time: Attention and Transformers

More Related Content

Similar to lecture_10_ruohanTTTTTTTTTTTTTTTTTTT.pdf

Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
DeepVO - Towards Visual Odometry with Deep Learning
DeepVO - Towards Visual Odometry with Deep LearningDeepVO - Towards Visual Odometry with Deep Learning
DeepVO - Towards Visual Odometry with Deep LearningJacky Liu
 
Visual Object Tracking: Action-Decision Networks (ADNets)
Visual Object Tracking: Action-Decision Networks (ADNets)Visual Object Tracking: Action-Decision Networks (ADNets)
Visual Object Tracking: Action-Decision Networks (ADNets)Yehya Abouelnaga
 
Deep Learning through Examples
Deep Learning through ExamplesDeep Learning through Examples
Deep Learning through ExamplesSri Ambati
 
Continuous Learning with Deep Architectures
Continuous Learning with Deep ArchitecturesContinuous Learning with Deep Architectures
Continuous Learning with Deep ArchitecturesVincenzo Lomonaco
 
Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018
Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018
Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018Universitat Politècnica de Catalunya
 
Cs231n 2017 lecture12 Visualizing and Understanding
Cs231n 2017 lecture12 Visualizing and UnderstandingCs231n 2017 lecture12 Visualizing and Understanding
Cs231n 2017 lecture12 Visualizing and UnderstandingYanbin Kong
 
Multimedia data mining using deep learning
Multimedia data mining using deep learningMultimedia data mining using deep learning
Multimedia data mining using deep learningPeter Wlodarczak
 
Deep convnets for global recognition (Master in Computer Vision Barcelona 2016)
Deep convnets for global recognition (Master in Computer Vision Barcelona 2016)Deep convnets for global recognition (Master in Computer Vision Barcelona 2016)
Deep convnets for global recognition (Master in Computer Vision Barcelona 2016)Universitat Politècnica de Catalunya
 
Recurrent Neural Network and natural languages processing.pdf
Recurrent Neural Network and natural languages processing.pdfRecurrent Neural Network and natural languages processing.pdf
Recurrent Neural Network and natural languages processing.pdfDayakerP
 
Gentle Introduction: Bayesian Modelling and Probabilistic Programming in R
Gentle Introduction: Bayesian Modelling and Probabilistic Programming in RGentle Introduction: Bayesian Modelling and Probabilistic Programming in R
Gentle Introduction: Bayesian Modelling and Probabilistic Programming in RMarco Wirthlin
 
最近の研究情勢についていくために - Deep Learningを中心に -
最近の研究情勢についていくために - Deep Learningを中心に - 最近の研究情勢についていくために - Deep Learningを中心に -
最近の研究情勢についていくために - Deep Learningを中心に - Hiroshi Fukui
 
PR-146: CornerNet detecting objects as paired keypoints
PR-146: CornerNet detecting objects as paired keypointsPR-146: CornerNet detecting objects as paired keypoints
PR-146: CornerNet detecting objects as paired keypointsjaewon lee
 
Underwater sparse image classification using deep convolutional neural networks
Underwater sparse image classification using deep convolutional neural networksUnderwater sparse image classification using deep convolutional neural networks
Underwater sparse image classification using deep convolutional neural networksMohamed Elawady
 
Neural Machine Translation (D2L10 Insight@DCU Machine Learning Workshop 2017)
Neural Machine Translation (D2L10 Insight@DCU Machine Learning Workshop 2017)Neural Machine Translation (D2L10 Insight@DCU Machine Learning Workshop 2017)
Neural Machine Translation (D2L10 Insight@DCU Machine Learning Workshop 2017)Universitat Politècnica de Catalunya
 
Short course fuzzy logic applications
Short course fuzzy logic applicationsShort course fuzzy logic applications
Short course fuzzy logic applicationsDalia Baziuke
 
AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transforma...
AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transforma...AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transforma...
AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transforma...Tomoyuki Suzuki
 

Similar to lecture_10_ruohanTTTTTTTTTTTTTTTTTTT.pdf (20)

lecture_7.pdf
lecture_7.pdflecture_7.pdf
lecture_7.pdf
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
DeepVO - Towards Visual Odometry with Deep Learning
DeepVO - Towards Visual Odometry with Deep LearningDeepVO - Towards Visual Odometry with Deep Learning
DeepVO - Towards Visual Odometry with Deep Learning
 
Visual Object Tracking: Action-Decision Networks (ADNets)
Visual Object Tracking: Action-Decision Networks (ADNets)Visual Object Tracking: Action-Decision Networks (ADNets)
Visual Object Tracking: Action-Decision Networks (ADNets)
 
Deep Learning through Examples
Deep Learning through ExamplesDeep Learning through Examples
Deep Learning through Examples
 
Continuous Learning with Deep Architectures
Continuous Learning with Deep ArchitecturesContinuous Learning with Deep Architectures
Continuous Learning with Deep Architectures
 
Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018
Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018
Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018
 
Cs231n 2017 lecture12 Visualizing and Understanding
Cs231n 2017 lecture12 Visualizing and UnderstandingCs231n 2017 lecture12 Visualizing and Understanding
Cs231n 2017 lecture12 Visualizing and Understanding
 
Multimedia data mining using deep learning
Multimedia data mining using deep learningMultimedia data mining using deep learning
Multimedia data mining using deep learning
 
Deep convnets for global recognition (Master in Computer Vision Barcelona 2016)
Deep convnets for global recognition (Master in Computer Vision Barcelona 2016)Deep convnets for global recognition (Master in Computer Vision Barcelona 2016)
Deep convnets for global recognition (Master in Computer Vision Barcelona 2016)
 
Recurrent Neural Network and natural languages processing.pdf
Recurrent Neural Network and natural languages processing.pdfRecurrent Neural Network and natural languages processing.pdf
Recurrent Neural Network and natural languages processing.pdf
 
Gentle Introduction: Bayesian Modelling and Probabilistic Programming in R
Gentle Introduction: Bayesian Modelling and Probabilistic Programming in RGentle Introduction: Bayesian Modelling and Probabilistic Programming in R
Gentle Introduction: Bayesian Modelling and Probabilistic Programming in R
 
最近の研究情勢についていくために - Deep Learningを中心に -
最近の研究情勢についていくために - Deep Learningを中心に - 最近の研究情勢についていくために - Deep Learningを中心に -
最近の研究情勢についていくために - Deep Learningを中心に -
 
PR-146: CornerNet detecting objects as paired keypoints
PR-146: CornerNet detecting objects as paired keypointsPR-146: CornerNet detecting objects as paired keypoints
PR-146: CornerNet detecting objects as paired keypoints
 
Video + Language 2019
Video + Language 2019Video + Language 2019
Video + Language 2019
 
Video + Language
Video + LanguageVideo + Language
Video + Language
 
Underwater sparse image classification using deep convolutional neural networks
Underwater sparse image classification using deep convolutional neural networksUnderwater sparse image classification using deep convolutional neural networks
Underwater sparse image classification using deep convolutional neural networks
 
Neural Machine Translation (D2L10 Insight@DCU Machine Learning Workshop 2017)
Neural Machine Translation (D2L10 Insight@DCU Machine Learning Workshop 2017)Neural Machine Translation (D2L10 Insight@DCU Machine Learning Workshop 2017)
Neural Machine Translation (D2L10 Insight@DCU Machine Learning Workshop 2017)
 
Short course fuzzy logic applications
Short course fuzzy logic applicationsShort course fuzzy logic applications
Short course fuzzy logic applications
 
AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transforma...
AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transforma...AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transforma...
AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transforma...
 

Recently uploaded

Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueBhangaleSonal
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdfKamal Acharya
 
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best ServiceTamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Servicemeghakumariji156
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdfKamal Acharya
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityMorshed Ahmed Rahath
 
Moment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilMoment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilVinayVitekari
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Arindam Chakraborty, Ph.D., P.E. (CA, TX)
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdfKamal Acharya
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARKOUSTAV SARKAR
 
💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...
💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...
💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...vershagrag
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"mphochane1998
 
457503602-5-Gas-Well-Testing-and-Analysis-pptx.pptx
457503602-5-Gas-Well-Testing-and-Analysis-pptx.pptx457503602-5-Gas-Well-Testing-and-Analysis-pptx.pptx
457503602-5-Gas-Well-Testing-and-Analysis-pptx.pptxrouholahahmadi9876
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startQuintin Balsdon
 
Introduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfIntroduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfsumitt6_25730773
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiessarkmank1
 
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxA CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxmaisarahman1
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationBhangaleSonal
 
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptxOrlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptxMuhammadAsimMuhammad6
 

Recently uploaded (20)

Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdf
 
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best ServiceTamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdf
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna Municipality
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
Moment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilMoment Distribution Method For Btech Civil
Moment Distribution Method For Btech Civil
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
 
💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...
💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...
💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
 
457503602-5-Gas-Well-Testing-and-Analysis-pptx.pptx
457503602-5-Gas-Well-Testing-and-Analysis-pptx.pptx457503602-5-Gas-Well-Testing-and-Analysis-pptx.pptx
457503602-5-Gas-Well-Testing-and-Analysis-pptx.pptx
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
Introduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfIntroduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdf
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and properties
 
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxA CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equation
 
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptxOrlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
 

lecture_10_ruohanTTTTTTTTTTTTTTTTTTT.pdf

  • 1. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 1 Lecture 10: Recurrent Neural Networks
  • 2. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 2 Administrative - Project TA matchups out, see Ed for the link
  • 3. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 3 Administrative - A2 is due next Monday May 2nd, 11:59pm
  • 4. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 4 Administrative - Discussion section tomorrow 2:30-3:30PT Object detection & RNNs Review
  • 5. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 5 Last time: Detection and Segmentation Classification Semantic Segmentation Object Detection Instance Segmentation CAT GRASS, CAT, TREE, SKY DOG, DOG, CAT DOG, DOG, CAT No spatial extent Multiple Objects No objects, just pixels This image is CC0 public domain
  • 6. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 1. One time set up: activation functions, preprocessing, weight initialization, regularization, gradient checking 2. Training dynamics: babysitting the learning process, parameter updates, hyperparameter optimization 3. Evaluation: model ensembles, test-time augmentation, transfer learning Training “Feedforward” Neural Networks 6
  • 7. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 7 Today: Recurrent Neural Networks
  • 8. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 8 Vanilla Neural Networks “Vanilla” Neural Network
  • 9. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 9 Recurrent Neural Networks: Process Sequences e.g. Image Captioning image -> sequence of words
  • 10. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 10 Recurrent Neural Networks: Process Sequences e.g. action prediction sequence of video frames -> action class
  • 11. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 11 Recurrent Neural Networks: Process Sequences E.g. Video Captioning Sequence of video frames -> caption
  • 12. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 12 Recurrent Neural Networks: Process Sequences e.g. Video classification on frame level
  • 13. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 13 Sequential Processing of Non-Sequence Data Ba, Mnih, and Kavukcuoglu, “Multiple Object Recognition with Visual Attention”, ICLR 2015. Gregor et al, “DRAW: A Recurrent Neural Network For Image Generation”, ICML 2015 Figure copyright Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, and Daan Wierstra, 2015. Reproduced with permission. Classify images by taking a series of “glimpses”
  • 14. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 14 Sequential Processing of Non-Sequence Data Gregor et al, “DRAW: A Recurrent Neural Network For Image Generation”, ICML 2015 Figure copyright Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, and Daan Wierstra, 2015. Reproduced with permission. Generate images one piece at a time!
  • 15. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 15 Recurrent Neural Network x RNN y
  • 16. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 16 Recurrent Neural Network x RNN y Key idea: RNNs have an “internal state” that is updated as a sequence is processed
  • 17. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 17 Unrolled RNN x1 RNN y1 x2 RNN y2 x3 RNN y3 ... xt RNN yt
  • 18. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 18 RNN hidden state update x RNN y We can process a sequence of vectors x by applying a recurrence formula at every time step: new state old state input vector at some time step some function with parameters W
  • 19. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 19 RNN output generation x RNN y We can process a sequence of vectors x by applying a recurrence formula at every time step: new state another function with parameters Wo output
  • 20. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 20 Recurrent Neural Network x1 RNN y1 x2 RNN y2 x3 RNN y3 ... xt RNN yt h1 h2 h3 h0
  • 21. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 21 Recurrent Neural Network x RNN y We can process a sequence of vectors x by applying a recurrence formula at every time step: Notice: the same function and the same set of parameters are used at every time step.
  • 22. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 22 (Vanilla) Recurrent Neural Network x RNN y The state consists of a single “hidden” vector h: Sometimes called a “Vanilla RNN” or an “Elman RNN” after Prof. Jeffrey Elman
  • 23. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 23 h0 fW h1 x1 RNN: Computational Graph
  • 24. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 24 h0 fW h1 fW h2 x2 x1 RNN: Computational Graph
  • 25. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 25 h0 fW h1 fW h2 fW h3 x3 … x2 x1 RNN: Computational Graph hT
  • 26. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 26 h0 fW h1 fW h2 fW h3 x3 … x2 x1 W RNN: Computational Graph Re-use the same weight matrix at every time-step hT
  • 27. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 27 h0 fW h1 fW h2 fW h3 x3 yT … x2 x1 W RNN: Computational Graph: Many to Many hT y3 y2 y1
  • 28. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 28 h0 fW h1 fW h2 fW h3 x3 yT … x2 x1 W RNN: Computational Graph: Many to Many hT y3 y2 y1 L1 L2 L3 LT
  • 29. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 29 h0 fW h1 fW h2 fW h3 x3 yT … x2 x1 W RNN: Computational Graph: Many to Many hT y3 y2 y1 L1 L2 L3 LT L
  • 30. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 30 h0 fW h1 fW h2 fW h3 x3 y … x2 x1 W RNN: Computational Graph: Many to One hT
  • 31. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 31 h0 fW h1 fW h2 fW h3 x3 y … x2 x1 W RNN: Computational Graph: Many to One hT
  • 32. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 32 h0 fW h1 fW h2 fW h3 yT … x W RNN: Computational Graph: One to Many hT y3 y2 y1
  • 33. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 33 h0 fW h1 fW h2 fW h3 yT … x W RNN: Computational Graph: One to Many hT y3 y2 y1 ? ? ?
  • 34. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 34 h0 fW h1 fW h2 fW h3 yT … x W RNN: Computational Graph: One to Many hT y3 y2 y1 0 0 0
  • 35. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 yT-1 35 h0 fW h1 fW h2 fW h3 yT … x W RNN: Computational Graph: One to Many hT y3 y2 y1 y1 y2
  • 36. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 36 Sequence to Sequence: Many-to-one + one-to-many h0 fW h1 fW h2 fW h3 x3 … x2 x1 W1 hT Many to one: Encode input sequence in a single vector Sutskever et al, “Sequence to Sequence Learning with Neural Networks”, NIPS 2014
  • 37. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 37 Sequence to Sequence: Many-to-one + one-to-many y1 y2 … Many to one: Encode input sequence in a single vector One to many: Produce output sequence from single input vector fW h1 fW h2 fW W2 Sutskever et al, “Sequence to Sequence Learning with Neural Networks”, NIPS 2014 h0 fW h1 fW h2 fW h3 x3 … x2 x1 W1 hT
  • 38. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 38 Example: Character-level Language Model Vocabulary: [h,e,l,o] Example training sequence: “hello”
  • 39. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 39 Example: Character-level Language Model Vocabulary: [h,e,l,o] Example training sequence: “hello”
  • 40. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 40 Example: Character-level Language Model Vocabulary: [h,e,l,o] Example training sequence: “hello”
  • 41. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 41 Example: Character-level Language Model Sampling Vocabulary: [h,e,l,o] At test-time, sample characters one at a time and feed each sample back into the model. [Figure: at each step a softmax over the vocabulary is computed and a character is sampled; here the samples "e", "l", "l", "o" are drawn in turn and fed back as the next inputs]
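A minimal sampling loop in this spirit (random, untrained weights as stand-ins; a trained model would assign the probabilities shown on the slide):

import numpy as np

vocab = ['h', 'e', 'l', 'o']
V, H = len(vocab), 32
Wxh = 0.01 * np.random.randn(H, V)
Whh = 0.01 * np.random.randn(H, H)
Why = 0.01 * np.random.randn(V, H)

h = np.zeros(H)
x = np.zeros(V); x[vocab.index('h')] = 1.0   # seed with "h" as a one-hot vector
out = ['h']
for _ in range(10):
    h = np.tanh(Wxh @ x + Whh @ h)
    scores = Why @ h
    p = np.exp(scores) / np.sum(np.exp(scores))  # softmax over the vocabulary
    idx = np.random.choice(V, p=p)               # sample, don't just argmax
    out.append(vocab[idx])
    x = np.zeros(V); x[idx] = 1.0                # feed the sample back in
print(''.join(out))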
  • 45. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 45 Example: Character-level Language Model: Sampling
[w11 w12 w13 w14] [1]   [w11]
[w21 w22 w23 w24] [0] = [w21]
[w31 w32 w33 w34] [0]   [w31]
                  [0]
Matrix multiply with a one-hot vector just extracts a column from the weight matrix. We often put a separate embedding layer between the input and hidden layers. [Figure: an embedding layer inserted between the one-hot inputs and the hidden layer]
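The slide's point in a few lines of numpy: multiplying by a one-hot vector selects a column, so an embedding layer can be implemented as plain indexing (the 32-dim embedding size is an arbitrary choice):

import numpy as np

W = np.random.randn(3, 4)                  # weight matrix (3x4, as on the slide)
x = np.array([1.0, 0.0, 0.0, 0.0])         # one-hot input
assert np.allclose(W @ x, W[:, 0])         # matmul with a one-hot == extracting column 0

# So a learned embedding layer just indexes into its weight matrix:
embed = np.random.randn(4, 32)             # 4 characters, 32-dim embeddings (illustrative)
char_idx = 2
e = embed[char_idx]                        # the embedding fed to the hidden layer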
  • 46. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 46 Backpropagation through time Loss Forward through entire sequence to compute loss, then backward through entire sequence to compute gradient
  • 47. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 47 Truncated Backpropagation through time Loss Run forward and backward through chunks of the sequence instead of whole sequence
  • 48. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 48 Truncated Backpropagation through time Loss Carry hidden states forward in time forever, but only backpropagate for some smaller number of steps
  • 49. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 49 Truncated Backpropagation through time Loss
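Schematically, this is the pattern min-char-rnn.py follows with its 25-character chunks. A sketch of the bookkeeping (the backward pass itself is elided; all sizes and data are stand-ins):

import numpy as np

H, D, seq_length = 32, 8, 25               # illustrative; min-char-rnn uses seq_length = 25
data = [np.random.randn(D) for _ in range(500)]   # stand-in input stream
Wxh = 0.01 * np.random.randn(H, D)
Whh = 0.01 * np.random.randn(H, H)

hprev = np.zeros(H)                        # carried forward across chunks, never reset
for p in range(0, len(data) - seq_length, seq_length):
    chunk = data[p : p + seq_length]
    hs = [hprev]                           # cache states for backprop within this chunk
    for x in chunk:
        hs.append(np.tanh(Wxh @ x + Whh @ hs[-1]))
    # ... backprop through hs within this chunk only; hprev is treated as a
    # constant input here, so no gradient flows into earlier chunks ...
    hprev = hs[-1]                         # the hidden state still flows forward in time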
  • 50. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 50 min-char-rnn.py gist: 112 lines of Python (https://gist.github.com/karpathy/d4dee566867f8291f086)
  • 51. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 51 x RNN y
  • 52. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 52 [Figure: text sampled from the character RNN at increasing amounts of training. At first the samples are nearly random; after each "train more" stage they look progressively more like English.]
  • 53. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 53
  • 54. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 54 The Stacks Project: open source algebraic geometry textbook Latex source http://stacks.math.columbia.edu/ The stacks project is licensed under the GNU Free Documentation License
  • 55. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 55
  • 56. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 56
  • 57. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 57
  • 58. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 58 Generated C code
  • 59. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 59
  • 60. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 60
  • 61. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 61 https://openai.com/blog/openai-codex/
  • 62. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 OpenAI GPT-2 generated text 62 Input: In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English. Output: The scientist named the population, after their distinctive horn, Ovid’s Unicorn. These four-horned, silver-white unicorns were previously unknown to science. Now, after almost two centuries, the mystery of what sparked this odd phenomenon is finally solved. Dr. Jorge Pérez, an evolutionary biologist from the University of La Paz, and several companions, were exploring the Andes Mountains when they found a small valley, with no other animals or humans. Pérez noticed that the valley had what appeared to be a natural fountain, surrounded by two peaks of rock and silver snow. source
  • 63. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 63 Searching for interpretable cells
  • 64. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 64 Searching for interpretable cells Karpathy, Johnson, and Fei-Fei: Visualizing and Understanding Recurrent Networks, ICLR Workshop 2016 Figures copyright Karpathy, Johnson, and Fei-Fei, 2015; reproduced with permission
  • 65. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 65 Searching for interpretable cells Karpathy, Johnson, and Fei-Fei: Visualizing and Understanding Recurrent Networks, ICLR Workshop 2016 Figures copyright Karpathy, Johnson, and Fei-Fei, 2015; reproduced with permission quote detection cell
  • 66. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 66 Searching for interpretable cells line length tracking cell Karpathy, Johnson, and Fei-Fei: Visualizing and Understanding Recurrent Networks, ICLR Workshop 2016 Figures copyright Karpathy, Johnson, and Fei-Fei, 2015; reproduced with permission
  • 67. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 67 Searching for interpretable cells if statement cell Karpathy, Johnson, and Fei-Fei: Visualizing and Understanding Recurrent Networks, ICLR Workshop 2016 Figures copyright Karpathy, Johnson, and Fei-Fei, 2015; reproduced with permission
  • 68. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 68 Searching for interpretable cells Karpathy, Johnson, and Fei-Fei: Visualizing and Understanding Recurrent Networks, ICLR Workshop 2016 Figures copyright Karpathy, Johnson, and Fei-Fei, 2015; reproduced with permission quote/comment cell
  • 69. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 69 Searching for interpretable cells code depth cell Karpathy, Johnson, and Fei-Fei: Visualizing and Understanding Recurrent Networks, ICLR Workshop 2016 Figures copyright Karpathy, Johnson, and Fei-Fei, 2015; reproduced with permission
  • 70. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 RNN tradeoffs RNN Advantages: - Can process any length input - Computation for step t can (in theory) use information from many steps back - Model size doesn’t increase for longer input - Same weights applied on every timestep, so there is symmetry in how inputs are processed. RNN Disadvantages: - Recurrent computation is slow - In practice, difficult to access information from many steps back 70
  • 71. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 71 Explain Images with Multimodal Recurrent Neural Networks, Mao et al. Deep Visual-Semantic Alignments for Generating Image Descriptions, Karpathy and Fei-Fei Show and Tell: A Neural Image Caption Generator, Vinyals et al. Long-term Recurrent Convolutional Networks for Visual Recognition and Description, Donahue et al. Learning a Recurrent Visual Representation for Image Caption Generation, Chen and Zitnick Image Captioning Figure from Karpathy et al, “Deep Visual-Semantic Alignments for Generating Image Descriptions”, CVPR 2015; figure copyright IEEE, 2015. Reproduced for educational purposes.
  • 72. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 72 Convolutional Neural Network Recurrent Neural Network
  • 73. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 test image This image is CC0 public domain 73
  • 74. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 test image 74
  • 75. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 test image X 75
  • 76. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 test image x0 <START> 76
  • 77. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 h0 y0 test image before: h = tanh(Wxh * x + Whh * h) now: h = tanh(Wxh * x + Whh * h + Wih * v) Wih x0 <START> 77
  • 78. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 h0 y0 test image straw sample! x0 <START> 78
  • 79. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 h0 y0 test image straw h1 y1 x0 <START> 79
  • 80. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 h0 y0 test image straw h1 y1 hat sample! x0 <START> 80
  • 81. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 h0 y0 test image straw h1 y1 hat h2 y2 x0 <START> 81
  • 82. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 h0 y0 test image straw h1 y1 hat h2 y2 sample <END> token => finish. x0 <START> 82
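Putting the pieces together, a hedged sketch of test-time captioning with the modified recurrence from slide 77. Whether the image feature v is injected at every step or only the first varies across the cited papers; here it enters every step for simplicity, and all sizes, token indices, and random weights are illustrative assumptions:

import numpy as np

V, H, D, F = 1000, 512, 256, 4096          # vocab, hidden, embedding, CNN feature sizes
Wxh = 0.01 * np.random.randn(H, D)
Whh = 0.01 * np.random.randn(H, H)
Wih = 0.01 * np.random.randn(H, F)         # projects the image feature into the recurrence
Why = 0.01 * np.random.randn(V, H)
embed = 0.01 * np.random.randn(V, D)
START, END = 0, 1                          # assumed special-token indices

v = np.random.randn(F)                     # stand-in for the CNN's image feature
h = np.zeros(H)
word = START
caption = []                               # word indices; a real model maps these to strings
for _ in range(20):                        # cap the caption length
    x = embed[word]
    h = np.tanh(Wxh @ x + Whh @ h + Wih @ v)   # image information enters the recurrence
    scores = Why @ h
    p = np.exp(scores - scores.max()); p /= p.sum()
    word = np.random.choice(V, p=p)        # sample the next word, feed it back
    if word == END:
        break                              # sampling the <END> token finishes the caption
    caption.append(word)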
  • 83. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 83 A cat sitting on a suitcase on the floor A cat is sitting on a tree branch A dog is running in the grass with a frisbee A white teddy bear sitting in the grass Two people walking on the beach with surfboards Two giraffes standing in a grassy field A man riding a dirt bike on a dirt track Image Captioning: Example Results A tennis player in action on the court Captions generated using neuraltalk2 All images are CC0 Public domain: cat suitcase, cat tree, dog, bear, surfers, tennis, giraffe, motorcycle
  • 84. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 84 Image Captioning: Failure Cases A woman is holding a cat in her hand A woman standing on a beach holding a surfboard A person holding a computer mouse on a desk A bird is perched on a tree branch A man in a baseball uniform throwing a ball Captions generated using neuraltalk2 All images are CC0 Public domain: fur coat, handstand, spider web, baseball
  • 85. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 85 Visual Question Answering (VQA) Agrawal et al, “VQA: Visual Question Answering”, ICCV 2015 Zhu et al, “Visual 7W: Grounded Question Answering in Images”, CVPR 2016 Figure from Zhu et al, copyright IEEE 2016. Reproduced for educational purposes.
  • 86. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 86 Agrawal et al, “VQA: Visual Question Answering”, ICCV 2015 Figures from Agrawal et al, copyright IEEE 2015. Reproduced for educational purposes. Visual Question Answering (VQA)
  • 87. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 87 Das et al, “Visual Dialog”, CVPR 2017 Figures from Das et al, copyright IEEE 2017. Reproduced with permission. Visual Dialog: Conversations about images
  • 88. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 The agent encodes the language instruction and uses an RNN to generate a series of movements as the visual input changes after each move. 88 Wang et al, “Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation”, CVPR 2019 Figures from Wang et al, copyright IEEE 2019. Reproduced with permission. Visual Language Navigation: Go to the living room
  • 89. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 Jabri et al., “Revisiting Visual Question Answering Baselines”, ECCV 2016 89 Visual Question Answering: Dataset Bias All images are CC0 Public domain: dog. [Figure: the Image, the Question (“What is the dog playing with?”), and a candidate Answer (“Frisbee”) feed into a Model that outputs Yes or No]
  • 90. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 90 time depth Multilayer RNNs
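A minimal sketch of the depth axis: the hidden state of layer l becomes the input of layer l+1 at the same time step (the packed weights and all sizes are illustrative):

import numpy as np

L_layers, H, D, T = 3, 32, 16, 10          # depth, hidden size, input size, length
Ws = [0.01 * np.random.randn(H, (D if l == 0 else H) + H) for l in range(L_layers)]
hs = [np.zeros(H) for _ in range(L_layers)]    # one hidden state per layer

for x in [np.random.randn(D) for _ in range(T)]:
    inp = x
    for l in range(L_layers):              # depth axis: layer l feeds layer l+1
        hs[l] = np.tanh(Ws[l] @ np.concatenate([inp, hs[l]]))
        inp = hs[l]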
  • 91. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 91 Long Short Term Memory (LSTM) Hochreiter and Schmidhuber, “Long Short Term Memory”, Neural Computation 1997 Vanilla RNN LSTM
  • 92. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 92 ht-1 xt W stack tanh ht Vanilla RNN Gradient Flow Bengio et al, “Learning long-term dependencies with gradient descent is difficult”, IEEE Transactions on Neural Networks, 1994 Pascanu et al, “On the difficulty of training recurrent neural networks”, ICML 2013 yt
  • 93. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 93 ht-1 xt W stack tanh ht Vanilla RNN Gradient Flow Backpropagation from ht to ht-1 multiplies by W (actually Whh^T) Bengio et al, “Learning long-term dependencies with gradient descent is difficult”, IEEE Transactions on Neural Networks, 1994 Pascanu et al, “On the difficulty of training recurrent neural networks”, ICML 2013 yt
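The backward step makes the repeated factor explicit. One step of backprop through the tanh and the matmul, in numpy:

import numpy as np

H = 32
Whh = 0.01 * np.random.randn(H, H)
h_prev = np.random.randn(H)
h_t = np.tanh(Whh @ h_prev)                # input term omitted for clarity
dh_t = np.random.randn(H)                  # upstream gradient arriving at h_t

draw = (1.0 - h_t ** 2) * dh_t             # backprop through tanh (its local gradient)
dh_prev = Whh.T @ draw                     # backprop through the matmul: multiply by Whh^T

Every step back in time repeats these two multiplications, which is exactly what the next slides quantify.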
  • 95. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 95 Vanilla RNN Gradient Flow h0 h1 h2 h3 h4 x1 x2 x3 x4 Bengio et al, “Learning long-term dependencies with gradient descent is difficult”, IEEE Transactions on Neural Networks, 1994 Pascanu et al, “On the difficulty of training recurrent neural networks”, ICML 2013 y1 y2 y3 y4
  • 96. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 96 Vanilla RNN Gradient Flow Gradients over multiple time steps: h0 h1 h2 h3 h4 x1 x2 x3 x4 Bengio et al, “Learning long-term dependencies with gradient descent is difficult”, IEEE Transactions on Neural Networks, 1994 Pascanu et al, “On the difficulty of training recurrent neural networks”, ICML 2013 y1 y2 y3 y4
  • 99. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 99 Vanilla RNN Gradient Flow Gradients over multiple time steps: h0 h1 h2 h3 h4 x1 x2 x3 x4 Bengio et al, “Learning long-term dependencies with gradient descent is difficult”, IEEE Transactions on Neural Networks, 1994 Pascanu et al, “On the difficulty of training recurrent neural networks”, ICML 2013 y1 y2 y3 y4 The tanh-derivative factors in this product are almost always < 1: vanishing gradients
  • 100. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 100 Vanilla RNN Gradient Flow Gradients over multiple time steps: h0 h1 h2 h3 h4 x1 x2 x3 x4 Bengio et al, “Learning long-term dependencies with gradient descent is difficult”, IEEE Transactions on Neural Networks, 1994 Pascanu et al, “On the difficulty of training recurrent neural networks”, ICML 2013 y1 y2 y3 y4 What if we assumed no non-linearity?
  • 101. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 101 Vanilla RNN Gradient Flow Gradients over multiple time steps: h0 h1 h2 h3 h4 x1 x2 x3 x4 Bengio et al, “Learning long-term dependencies with gradient descent is difficult”, IEEE Transactions on Neural Networks, 1994 Pascanu et al, “On the difficulty of training recurrent neural networks”, ICML 2013 y1 y2 y3 y4 What if we assumed no non-linearity? Largest singular value > 1: Exploding gradients Largest singular value < 1: Vanishing gradients
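A quick numeric check of the linear case: repeatedly multiplying a gradient by Whh^T scales its norm like (largest singular value)^T. Using a scaled identity so the largest singular value is known exactly:

import numpy as np

H, T = 32, 50
g = np.random.randn(H)                     # some upstream gradient

for scale, label in [(1.1, "explodes"), (0.9, "vanishes")]:
    W = np.eye(H) * scale                  # largest singular value is exactly `scale`
    d = g.copy()
    for _ in range(T):
        d = W.T @ d                        # repeated multiplication, as in linear BPTT
    print(label, np.linalg.norm(d))        # ~ ||g|| * scale**T: huge or tiny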
  • 102. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 102 Vanilla RNN Gradient Flow Gradients over multiple time steps: h0 h1 h2 h3 h4 x1 x2 x3 x4 Bengio et al, “Learning long-term dependencies with gradient descent is difficult”, IEEE Transactions on Neural Networks, 1994 Pascanu et al, “On the difficulty of training recurrent neural networks”, ICML 2013 y1 y2 y3 y4 What if we assumed no non-linearity? Largest singular value > 1: Exploding gradients Largest singular value < 1: Vanishing gradients Gradient clipping: Scale gradient if its norm is too big
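Gradient clipping in a few lines; the 5.0 threshold is an arbitrary illustrative choice (min-char-rnn clips elementwise to [-5, 5] instead of by norm):

import numpy as np

def clip_gradient(grad, max_norm=5.0):
    # Scale the whole gradient down if its norm exceeds the threshold;
    # this bounds the update size without changing the gradient's direction.
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g = np.random.randn(1000) * 10.0
print(np.linalg.norm(clip_gradient(g)))    # <= 5.0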
  • 103. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 103 Vanilla RNN Gradient Flow Gradients over multiple time steps: h0 h1 h2 h3 h4 x1 x2 x3 x4 Bengio et al, “Learning long-term dependencies with gradient descent is difficult”, IEEE Transactions on Neural Networks, 1994 Pascanu et al, “On the difficulty of training recurrent neural networks”, ICML 2013 y1 y2 y3 y4 What if we assumed no non-linearity? Largest singular value > 1: Exploding gradients Largest singular value < 1: Vanishing gradients Change RNN architecture
  • 104. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 104 Long Short Term Memory (LSTM) Hochreiter and Schmidhuber, “Long Short Term Memory”, Neural Computation 1997 Vanilla RNN LSTM
  • 105. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 105 Long Short Term Memory (LSTM) Hochreiter and Schmidhuber, “Long Short Term Memory”, Neural Computation 1997 Vanilla RNN LSTM Cell state Hidden state Four gates
  • 106. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 106 Long Short Term Memory (LSTM) [Hochreiter et al., 1997] [Figure: the vector from below (x) and the vector from before (h) are stacked into a 2h vector; W is a 4h x 2h matrix, and its 4*h outputs are split into the gates i, f, o, g through three sigmoids and one tanh]
  • 107. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 107 Long Short Term Memory (LSTM) [Hochreiter et al., 1997] i: Input gate, whether to write to cell. f: Forget gate, whether to erase cell. o: Output gate, how much to reveal cell. g: Gate gate (?), how much to write to cell.
  • 111. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 ☉ 111 ct-1 ht-1 xt f i g o W ☉ + ct tanh ☉ ht Long Short Term Memory (LSTM) [Hochreiter et al., 1997] stack
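One LSTM step in numpy, following the slide's picture: a single matmul produces all four gate pre-activations, which are split and squashed. The slicing order of the gates is a convention chosen here, and all sizes are illustrative:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

H, D = 32, 16                              # hidden and input sizes (the slide assumes D = H)
W = 0.01 * np.random.randn(4 * H, H + D)   # the 4h x 2h matrix from the slide
h_prev, c_prev = np.zeros(H), np.zeros(H)
x = np.random.randn(D)

z = W @ np.concatenate([h_prev, x])        # stack [h; x]; one matmul gives 4*h values
i = sigmoid(z[0*H:1*H])                    # input gate: whether to write to cell
f = sigmoid(z[1*H:2*H])                    # forget gate: whether to erase cell
o = sigmoid(z[2*H:3*H])                    # output gate: how much to reveal cell
g = np.tanh(z[3*H:4*H])                    # "gate gate": what to write to cell

c = f * c_prev + i * g                     # cell update: elementwise, no matmul by W
h = o * np.tanh(c)                         # hidden state exposes a gated view of c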
  • 112. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 ☉ 112 ct-1 ht-1 xt f i g o W ☉ + ct tanh ☉ ht Long Short Term Memory (LSTM): Gradient Flow [Hochreiter et al., 1997] stack Backpropagation from ct to ct-1 only elementwise multiplication by f, no matrix multiply by W
  • 113. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 113 Long Short Term Memory (LSTM): Gradient Flow [Hochreiter et al., 1997] c0 c1 c2 c3 Uninterrupted gradient flow! Notice that the gradient contains the f gate’s vector of activations, which allows better control of gradient values via suitable parameter updates of the forget gate. Also notice that gradients are added through the f, i, g, and o gates, giving better balancing of gradient values.
  • 114. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 Do LSTMs solve the vanishing gradient problem? The LSTM architecture makes it easier for the RNN to preserve information over many timesteps - e.g. if f = 1 and i = 0, then the information in that cell is preserved indefinitely. - By contrast, it’s harder for a vanilla RNN to learn a recurrent weight matrix Wh that preserves information in the hidden state. The LSTM doesn’t guarantee that there is no vanishing/exploding gradient, but it does provide an easier way for the model to learn long-distance dependencies 114
  • 115. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 115 Long Short Term Memory (LSTM): Gradient Flow [Hochreiter et al., 1997] Uninterrupted gradient flow through the cell states c0, c1, c2, c3, ... [Figure: a ResNet layer diagram (7x7 conv, stacks of 3x3 convs, pooling, FC 1000, softmax) shown for comparison] Similar to ResNet!
  • 116. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 116 Long Short Term Memory (LSTM): Gradient Flow [Hochreiter et al., 1997] Uninterrupted gradient flow, similar to ResNet! In between: Highway Networks. Srivastava et al, “Highway Networks”, ICML DL Workshop 2015
  • 117. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 117 Other RNN Variants [LSTM: A Search Space Odyssey, Greff et al., 2015] [An Empirical Exploration of Recurrent Network Architectures, Jozefowicz et al., 2015] GRU [Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, Cho et al., 2014]
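For comparison, one GRU step (Cho et al., 2014): two gates instead of four and no separate cell state, but the same additive, gated interpolation of the state. The weight shapes here are illustrative:

import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

H, D = 32, 16
Wz = 0.01 * np.random.randn(H, H + D)      # update gate weights
Wr = 0.01 * np.random.randn(H, H + D)      # reset gate weights
Wh = 0.01 * np.random.randn(H, H + D)      # candidate-state weights
h = np.zeros(H)
x = np.random.randn(D)

hx = np.concatenate([h, x])
z = sigmoid(Wz @ hx)                       # update gate: keep old state vs take new
r = sigmoid(Wr @ hx)                       # reset gate: how much old state to use
h_tilde = np.tanh(Wh @ np.concatenate([r * h, x]))   # candidate state
h = (1 - z) * h + z * h_tilde              # additive interpolation, like the LSTM cell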
  • 118. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 LSTM cell 118 Neural Architecture Search for RNN architectures Zoph and Le, “Neural Architecture Search with Reinforcement Learning”, ICLR 2017 Figures copyright Zoph et al, 2017. Reproduced with permission. Cell they found
  • 119. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 119 Summary - RNNs allow a lot of flexibility in architecture design - Vanilla RNNs are simple but don’t work very well - Common to use LSTM or GRU: their additive interactions improve gradient flow - Backward flow of gradients in RNN can explode or vanish. Exploding is controlled with gradient clipping. Vanishing is controlled with additive interactions (LSTM) - Better/simpler architectures are a hot topic of current research, as well as new paradigms for reasoning over sequences - Better understanding (both theoretical and empirical) is needed.
  • 120. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 10 - April 28, 2022 120 Next time: Attention and Transformers