Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017
Lecture 10:
Recurrent Neural Networks
Administrative
A1 grades will go out soon
A2 is due today (11:59pm)
Midterm is in-class on Tuesday!
We will send out details on where to go soon
Extra Credit: Train Game
More details on Piazza
by early next week
Last Time: CNN Architectures
AlexNet
Figure copyright Kaiming He, 2016. Reproduced with permission.
Last Time: CNN Architectures
Figure copyright Kaiming He, 2016. Reproduced with permission.
[Figure: layer-by-layer diagrams of VGG16, VGG19, and GoogLeNet]
Last Time: CNN Architectures
Figure copyright Kaiming He, 2016. Reproduced with permission.
[Figure: ResNet layer diagram; a residual block computes F(x) + x, where two conv layers (with a ReLU in between) produce F(x), the identity shortcut carries x around them, and their sum passes through a final ReLU]
Figures copyright Larsson et al., 2017. Reproduced with permission.
[Figure: DenseNet, built from dense blocks in which each layer's output is concatenated with the outputs of all preceding layers, and FractalNet]
Last Time: CNN Architectures
Last Time: CNN Architectures
AlexNet and VGG have
tons of parameters in the
fully connected layers
AlexNet: ~62M parameters
FC6: 256x6x6 -> 4096: 38M params
FC7: 4096 -> 4096: 17M params
FC8: 4096 -> 1000: 4M params
~59M params in FC layers!
Today: Recurrent Neural Networks
Vanilla Neural Networks
“Vanilla” Neural Network
Recurrent Neural Networks: Process Sequences
e.g. Image Captioning
image -> sequence of words
Recurrent Neural Networks: Process Sequences
e.g. Sentiment Classification
sequence of words -> sentiment
Recurrent Neural Networks: Process Sequences
e.g. Machine Translation
seq of words -> seq of words
Recurrent Neural Networks: Process Sequences
e.g. Video classification on frame level
Sequential Processing of Non-Sequence Data
Ba, Mnih, and Kavukcuoglu, “Multiple Object Recognition with Visual Attention”, ICLR 2015.
Gregor et al, “DRAW: A Recurrent Neural Network For Image Generation”, ICML 2015
Figure copyright Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, and Daan Wierstra, 2015. Reproduced with
permission.
Classify images by taking a
series of “glimpses”
Sequential Processing of Non-Sequence Data
Gregor et al, “DRAW: A Recurrent Neural Network For Image Generation”, ICML 2015
Figure copyright Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, and Daan Wierstra, 2015. Reproduced with
permission.
Generate images one piece at a time!
Recurrent Neural Network
x
RNN
Recurrent Neural Network
x
RNN
y
usually want to
predict a vector at
some time steps
Recurrent Neural Network
x
RNN
y
We can process a sequence of vectors x by
applying a recurrence formula at every time step:
[Formula: h_t = f_W(h_{t-1}, x_t): the new state is computed from the old state and the input vector at some time step by some function f_W with parameters W]
Recurrent Neural Network
x
RNN
y
We can process a sequence of vectors x by
applying a recurrence formula at every time step:
Notice: the same function and the same set
of parameters are used at every time step.
(Vanilla) Recurrent Neural Network
x
RNN
y
The state consists of a single “hidden” vector h:
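The standard vanilla choice is h_t = tanh(Whh · h_{t-1} + Wxh · x_t), the same formula the captioning slides later write as h = tanh(Wxh * x + Whh * h). A minimal NumPy sketch of this recurrence, with toy array sizes chosen purely for illustration:

```python
import numpy as np

def rnn_step(x, h_prev, Wxh, Whh, bh):
    """One vanilla RNN step: new state = tanh(Whh @ h_prev + Wxh @ x + bh)."""
    return np.tanh(Whh @ h_prev + Wxh @ x + bh)

# Toy sizes (an illustrative assumption): 3-dim inputs, 4-dim hidden state.
rng = np.random.default_rng(0)
Wxh = rng.standard_normal((4, 3)) * 0.01
Whh = rng.standard_normal((4, 4)) * 0.01
bh = np.zeros(4)

h = np.zeros(4)                        # initial hidden state h0
for x in rng.standard_normal((5, 3)):  # a length-5 input sequence
    h = rnn_step(x, h, Wxh, Whh, bh)   # the same fW and weights at every step
```

Reusing the same Wxh and Whh at every step is what lets the same network process sequences of any length.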
RNN: Computational Graph
[Figure: the first unrolled step: h0 and x1 feed fW to produce h1]
RNN: Computational Graph
[Figure: two unrolled steps: h1 and x2 feed fW to produce h2]
RNN: Computational Graph
[Figure: the graph unrolled over time: h0 → fW → h1 → fW → h2 → … → hT, with inputs x1, x2, x3, … feeding each step]
RNN: Computational Graph
Re-use the same weight matrix at every time-step
[Figure: a single W node feeds every fW in the unrolled graph]
RNN: Computational Graph: Many to Many
[Figure: every hidden state ht now produces an output yt (y1, y2, y3, …, yT)]
RNN: Computational Graph: Many to Many
[Figure: each output yt gets its own per-step loss Lt (L1, L2, L3, …, LT)]
RNN: Computational Graph: Many to Many
[Figure: the per-step losses L1 … LT are summed into a single total loss L]
RNN: Computational Graph: Many to One
[Figure: only the final hidden state hT produces the single output y]
RNN: Computational Graph: One to Many
[Figure: a single input x starts the graph, which then emits an output at every step (y1, y2, …, yT)]
Sequence to Sequence: Many-to-one + one-to-many
Many to one: Encode input sequence in a single vector
[Figure: an encoder with weights W1 consumes x1 … xT and summarizes them in hT]
Sequence to Sequence: Many-to-one + one-to-many
Many to one: Encode input sequence in a single vector
One to many: Produce output sequence from single input vector
[Figure: a decoder with weights W2 unrolls from the encoder's hT, emitting y1, y2, …]
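As a rough sketch, the two halves are an encoder loop and a decoder loop. The weight names (Wxh/Whh on the encoder side, Whh2/Why on the decoder side) and the fixed-length unrolling are illustrative assumptions, not details fixed by the slides:

```python
import numpy as np

def encode(xs, Wxh, Whh):
    """Many to one: compress the whole input sequence into a single vector hT."""
    h = np.zeros(Whh.shape[0])
    for x in xs:
        h = np.tanh(Wxh @ x + Whh @ h)
    return h

def decode(h, Why, Whh2, steps):
    """One to many: unroll an output sequence from the encoder's summary vector."""
    ys = []
    for _ in range(steps):
        h = np.tanh(Whh2 @ h)   # decoder uses its own weights (the W2 of the slide)
        ys.append(Why @ h)      # one output vector per step
    return ys

D, H, V = 3, 4, 5               # toy input, hidden, and output sizes
rng = np.random.default_rng(5)
Wxh = rng.standard_normal((H, D)) * 0.1
Whh = rng.standard_normal((H, H)) * 0.1
Whh2 = rng.standard_normal((H, H)) * 0.1
Why = rng.standard_normal((V, H)) * 0.1

hT = encode(rng.standard_normal((7, D)), Wxh, Whh)  # encoder (W1)
ys = decode(hT, Why, Whh2, steps=3)                  # decoder (W2)
```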
Example:
Character-level
Language Model
Vocabulary:
[h,e,l,o]
Example training
sequence:
“hello”
Example:
Character-level
Language Model
Sampling
Vocabulary:
[h,e,l,o]
At test-time sample
characters one at a time,
feed back to model
[Figure: at each step the softmax over the vocabulary (e.g. .03/.13/.00/.84 at the first step) is sampled and the sampled character is fed back in; here the samples spell “e”, “l”, “l”, “o”]
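A minimal sketch of this test-time sampling loop over the [h,e,l,o] vocabulary; the one-hot encoding and the tiny random weights are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum()

vocab = ['h', 'e', 'l', 'o']
V, H = len(vocab), 8
rng = np.random.default_rng(1)
Wxh = rng.standard_normal((H, V)) * 0.1
Whh = rng.standard_normal((H, H)) * 0.1
Why = rng.standard_normal((V, H)) * 0.1

h = np.zeros(H)
x = np.zeros(V); x[vocab.index('h')] = 1.0   # seed with the character "h"
out = []
for _ in range(4):
    h = np.tanh(Wxh @ x + Whh @ h)
    p = softmax(Why @ h)                     # distribution over the vocabulary
    idx = rng.choice(V, p=p)                 # sample one character
    out.append(vocab[idx])
    x = np.zeros(V); x[idx] = 1.0            # feed the sample back as next input
```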
Backpropagation through time
Loss
Forward through entire sequence to
compute loss, then backward through
entire sequence to compute gradient
Truncated Backpropagation through time
Loss
Run forward and backward
through chunks of the
sequence instead of whole
sequence
Truncated Backpropagation through time
Loss
Carry hidden states
forward in time forever,
but only backpropagate
for some smaller
number of steps
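A sketch of the idea: run the forward pass in fixed-size chunks, carry the hidden state across chunk boundaries, but only ever backpropagate within a chunk. The chunk size, shapes, and weights below are toy assumptions:

```python
import numpy as np

def run_chunk(xs, h0, Wxh, Whh):
    """Forward through one chunk, caching the states a backward pass would need."""
    h, cache = h0, []
    for x in xs:
        h = np.tanh(Wxh @ x + Whh @ h)
        cache.append(h)
    return h, cache

rng = np.random.default_rng(2)
seq = rng.standard_normal((100, 3))      # a long input sequence
Wxh = rng.standard_normal((4, 3)) * 0.1
Whh = rng.standard_normal((4, 4)) * 0.1

chunk = 25                               # backprop only within 25-step chunks
h = np.zeros(4)
n_chunks = 0
for start in range(0, len(seq), chunk):
    h, cache = run_chunk(seq[start:start + chunk], h, Wxh, Whh)
    n_chunks += 1
    # A backward pass over `cache` would go here; the carried-over h is
    # treated as a constant, so gradients never flow across chunk boundaries.
```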
min-char-rnn.py gist: 112 lines of Python
(https://gist.github.com/karpathy/d4dee566867f8291f086)
x
RNN
y
[Figure: generated text samples: random gibberish at first, then increasingly English-like text after each round of “train more”]
The Stacks Project: open source algebraic geometry textbook
LaTeX source: http://stacks.math.columbia.edu/
The Stacks Project is licensed under the GNU Free Documentation License
Generated
C code
Searching for interpretable cells
Karpathy, Johnson, and Fei-Fei: Visualizing and Understanding Recurrent Networks, ICLR Workshop 2016
Searching for interpretable cells
Figures copyright Karpathy, Johnson, and Fei-Fei, 2015; reproduced with permission
Searching for interpretable cells
quote detection cell
Searching for interpretable cells
line length tracking cell
Searching for interpretable cells
if statement cell
Searching for interpretable cells
quote/comment cell
Searching for interpretable cells
code depth cell
Explain Images with Multimodal Recurrent Neural Networks, Mao et al.
Deep Visual-Semantic Alignments for Generating Image Descriptions, Karpathy and Fei-Fei
Show and Tell: A Neural Image Caption Generator, Vinyals et al.
Long-term Recurrent Convolutional Networks for Visual Recognition and Description, Donahue et al.
Learning a Recurrent Visual Representation for Image Caption Generation, Chen and Zitnick
Image Captioning
Figure from Karpathy et al, “Deep Visual-Semantic Alignments for Generating Image Descriptions”, CVPR 2015; figure copyright IEEE, 2015. Reproduced for educational purposes.
Convolutional Neural Network + Recurrent Neural Network
[Figure: a test image (CC0 public domain) is passed through the CNN to get a feature vector v]
The image feature v is injected into the recurrence as an extra term:
before: h = tanh(Wxh * x + Whh * h)
now: h = tanh(Wxh * x + Whh * h + Wih * v)
[Figure: starting from the <START> token, the model samples “straw”, feeds it back in, samples “hat”, and continues until it samples the <END> token => finish.]
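The “now” recurrence, with the extra Wih * v term, can be sketched like this (the sizes, the greedy argmax decoding, and the use of index 0 as a stand-in for <START> are all illustrative assumptions):

```python
import numpy as np

D, H, V = 3, 4, 5            # toy image-feature, hidden, and vocab sizes
rng = np.random.default_rng(6)
Wxh = rng.standard_normal((H, V)) * 0.1
Whh = rng.standard_normal((H, H)) * 0.1
Wih = rng.standard_normal((H, D)) * 0.1
Why = rng.standard_normal((V, H)) * 0.1
v = rng.standard_normal(D)   # CNN feature vector for the test image

h = np.zeros(H)
x = np.zeros(V); x[0] = 1.0  # index 0 plays the role of <START> here
words = []
for _ in range(3):
    h = np.tanh(Wxh @ x + Whh @ h + Wih @ v)  # the "now" recurrence with v
    scores = Why @ h
    idx = int(scores.argmax())                # greedy pick (sampling also works)
    words.append(idx)
    x = np.zeros(V); x[idx] = 1.0             # feed the chosen word back in
```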
Image Captioning: Example Results
[Figure: CC0 public-domain images with generated captions:]
- A cat sitting on a suitcase on the floor
- A cat is sitting on a tree branch
- A dog is running in the grass with a frisbee
- A white teddy bear sitting in the grass
- Two people walking on the beach with surfboards
- Two giraffes standing in a grassy field
- A man riding a dirt bike on a dirt track
- A tennis player in action on the court
Captions generated using neuraltalk2. All images are CC0 public domain: cat suitcase, cat tree, dog, bear, surfers, tennis, giraffe, motorcycle.
Image Captioning: Failure Cases
[Figure: images with incorrect generated captions:]
- A woman is holding a cat in her hand
- A woman standing on a beach holding a surfboard
- A person holding a computer mouse on a desk
- A bird is perched on a tree branch
- A man in a baseball uniform throwing a ball
Captions generated using neuraltalk2. All images are CC0 public domain: fur coat, handstand, spider web, baseball.
Image Captioning with Attention
Xu et al, “Show, Attend, and Tell: Neural Image Caption Generation with Visual Attention”, ICML 2015
Figure copyright Kelvin Xu, Jimmy Lei Ba, Jamie Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard S. Zemel, and Yoshua Bengio, 2015. Reproduced with permission.
RNN focuses its attention at a different spatial location
when generating each word
Image Captioning with Attention
[Figure: the CNN maps the image (H x W x 3) to a grid of features (L x D), which initializes the hidden state h0]
Xu et al, “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention”, ICML 2015
Image Captioning with Attention
[Figure: from h0 the model computes a1, a distribution over the L locations]
Image Captioning with Attention
[Figure: the distribution a1 selects a weighted combination of the features, giving weighted features z1 of dimension D]
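The weighted-combination step can be sketched directly: softmax the L attention scores into a distribution a, then take an a-weighted sum of the L x D feature grid. The random score vector here is a stand-in for whatever the hidden state would actually produce:

```python
import numpy as np

def soft_attention(features, scores):
    """Weighted combination of L feature vectors (features has shape L x D)."""
    a = np.exp(scores - scores.max())
    a = a / a.sum()               # softmax: a distribution over the L locations
    z = a @ features              # weighted features, shape (D,)
    return z, a

L, D = 6, 4
rng = np.random.default_rng(4)
feats = rng.standard_normal((L, D))   # CNN feature grid, flattened to L x D
scores = rng.standard_normal(L)       # stand-in for scores from the hidden state
z, a = soft_attention(feats, scores)
```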
Image Captioning with Attention
[Figure: z1 and the first word y1 feed the next step, producing h1]
Image Captioning with Attention
[Figure: h1 produces a2, a new distribution over the L locations, and d1, a distribution over the vocabulary]
Image Captioning with Attention
[Figure: the process repeats: z2 and y2 feed the next hidden state h2]
Image Captioning with Attention
[Figure: h2 in turn produces a3 and d2, and so on, one word per step]
Image Captioning with Attention
[Figure: soft attention (a differentiable weighted average over all locations) vs hard attention (selecting a single location)]
Xu et al, “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention”, ICML 2015
Visual Question Answering
Agrawal et al, “VQA: Visual Question Answering”, ICCV 2015
Zhu et al, “Visual 7W: Grounded Question Answering in Images”, CVPR 2016
Figure from Zhu et al, copyright IEEE 2016. Reproduced for educational purposes.
Zhu et al, “Visual 7W: Grounded Question Answering in Images”, CVPR 2016
Figures from Zhu et al, copyright IEEE 2016. Reproduced for educational purposes.
Visual Question Answering: RNNs with Attention
Multilayer RNNs
[Figure: hidden states stacked in depth as well as unrolled in time, e.g. a multilayer LSTM]
Vanilla RNN Gradient Flow
[Figure: one RNN step: ht-1 and xt are stacked, multiplied by W, and passed through tanh to produce ht]
Bengio et al, “Learning long-term dependencies with gradient descent
is difficult”, IEEE Transactions on Neural Networks, 1994
Pascanu et al, “On the difficulty of training recurrent neural networks”,
ICML 2013
Vanilla RNN Gradient Flow
[Figure: the same single step, with the backward path highlighted]
Backpropagation from ht to ht-1 multiplies by W (actually Whh^T)
Vanilla RNN Gradient Flow
[Figure: an unrolled chain h0 → h1 → h2 → h3 → h4 with inputs x1 … x4]
Computing the gradient of h0 involves many factors of W (and repeated tanh)
Vanilla RNN Gradient Flow
Computing the gradient of h0 involves many factors of W (and repeated tanh):
Largest singular value > 1: Exploding gradients
Largest singular value < 1: Vanishing gradients
Vanilla RNN Gradient Flow
Computing the gradient of h0 involves many factors of W (and repeated tanh):
Largest singular value > 1: Exploding gradients. Gradient clipping: scale the gradient if its norm is too big.
Largest singular value < 1: Vanishing gradients
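Gradient clipping by norm is only a few lines; the threshold of 5.0 below is just an example value:

```python
import numpy as np

def clip_gradient(grad, max_norm=5.0):
    """Rescale `grad` so that its L2 norm does not exceed `max_norm`."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g = np.full(100, 3.0)              # a gradient with L2 norm 30.0
g = clip_gradient(g, max_norm=5.0) # rescaled to norm 5.0; direction unchanged
```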
Vanilla RNN Gradient Flow
Computing the gradient of h0 involves many factors of W (and repeated tanh):
Largest singular value > 1: Exploding gradients
Largest singular value < 1: Vanishing gradients → Change RNN architecture
Long Short Term Memory (LSTM)
Hochreiter and Schmidhuber, “Long Short-Term Memory”, Neural Computation 1997
Vanilla RNN LSTM
Long Short Term Memory (LSTM)
[Hochreiter et al., 1997]
[Figure: the stacked vector [h; x] (vector from before, h, and vector from below, x) is multiplied by W (4h x 2h) to give a 4h vector, which is split into the gates i, f, o (each through a sigmoid) and g (through a tanh)]
f: Forget gate, whether to erase the cell
i: Input gate, whether to write to the cell
g: Gate gate (?), how much to write to the cell
o: Output gate, how much to reveal the cell
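Putting the four gates together, one LSTM step can be sketched in NumPy. The i/f/o/g ordering of the slices within the 4h vector is an arbitrary convention of this sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step; W maps the stacked [h_prev; x] to all four gates at once."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b   # 4h-dim pre-activation
    i = sigmoid(z[0:H])          # input gate: whether to write to the cell
    f = sigmoid(z[H:2*H])        # forget gate: whether to erase the cell
    o = sigmoid(z[2*H:3*H])      # output gate: how much to reveal the cell
    g = np.tanh(z[3*H:4*H])      # candidate values to write
    c = f * c_prev + i * g       # elementwise cell update: no matmul by W
    h = o * np.tanh(c)
    return h, c

H, D = 4, 3
rng = np.random.default_rng(3)
W = rng.standard_normal((4 * H, H + D)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.standard_normal((5, D)):   # run a length-5 sequence
    h, c = lstm_step(x, h, c, W, b)
```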
Long Short Term Memory (LSTM)
[Hochreiter et al., 1997]
[Figure: the cell update: ct = f ☉ ct-1 + i ☉ g, and ht = o ☉ tanh(ct)]
Long Short Term Memory (LSTM): Gradient Flow
[Hochreiter et al., 1997]
[Figure: the same cell diagram, with the backward path along the cell state highlighted]
Backpropagation from ct to ct-1 is only elementwise multiplication by f, with no matrix multiply by W
Long Short Term Memory (LSTM): Gradient Flow
[Hochreiter et al., 1997]
Uninterrupted gradient flow!
[Figure: the cell states c0 → c1 → c2 → c3 form an uninterrupted additive path for gradients]
Long Short Term Memory (LSTM): Gradient Flow
[Hochreiter et al., 1997]
Uninterrupted gradient flow!
[Figure: the c0 → c3 cell path shown next to a ResNet layer diagram]
Similar to ResNet!
Long Short Term Memory (LSTM): Gradient Flow
[Hochreiter et al., 1997]
Uninterrupted gradient flow! Similar to ResNet!
In between: Highway Networks
Srivastava et al, “Highway Networks”, ICML DL Workshop 2015
Other RNN Variants
GRU [Learning phrase representations using RNN encoder-decoder for statistical machine translation, Cho et al. 2014]
[LSTM: A Search Space Odyssey, Greff et al., 2015]
[An Empirical Exploration of Recurrent Network Architectures, Jozefowicz et al., 2015]
Summary
- RNNs allow a lot of flexibility in architecture design
- Vanilla RNNs are simple but don’t work very well
- Common to use LSTM or GRU: their additive interactions
improve gradient flow
- Backward flow of gradients in RNN can explode or vanish.
Exploding is controlled with gradient clipping. Vanishing is
controlled with additive interactions (LSTM)
- Better/simpler architectures are a hot topic of current research
- Better understanding (both theoretical and empirical) is needed.
Next time: Midterm!
Then Detection and Segmentation

Recurrent Neural Network and natural languages processing.pdf

  • 1. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 1 Lecture 10: Recurrent Neural Networks
  • 2. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 2 Administrative A1 grades will go out soon A2 is due today (11:59pm) Midterm is in-class on Tuesday! We will send out details on where to go soon
  • 3. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 3 Extra Credit: Train Game More details on Piazza by early next week
  • 4. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 4 Last Time: CNN Architectures AlexNet Figure copyright Kaiming He, 2016. Reproduced with permission.
  • 5. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 5 Last Time: CNN Architectures Figure copyright Kaiming He, 2016. Reproduced with permission. 3x3 conv, 128 Pool 3x3 conv, 64 3x3 conv, 64 Input 3x3 conv, 128 Pool 3x3 conv, 256 3x3 conv, 256 Pool 3x3 conv, 512 3x3 conv, 512 Pool 3x3 conv, 512 3x3 conv, 512 Pool FC 4096 FC 1000 Softmax FC 4096 3x3 conv, 512 3x3 conv, 512 Pool Input Pool Pool Pool Pool Softmax 3x3 conv, 512 3x3 conv, 512 3x3 conv, 256 3x3 conv, 256 3x3 conv, 128 3x3 conv, 128 3x3 conv, 64 3x3 conv, 64 3x3 conv, 512 3x3 conv, 512 3x3 conv, 512 3x3 conv, 512 3x3 conv, 512 3x3 conv, 512 FC 4096 FC 1000 FC 4096 VGG16 VGG19 GoogLeNet
  • 6. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 6 Last Time: CNN Architectures Figure copyright Kaiming He, 2016. Reproduced with permission. Input Softmax 3x3 conv, 64 7x7 conv, 64 / 2 FC 1000 Pool 3x3 conv, 64 3x3 conv, 64 3x3 conv, 64 3x3 conv, 64 3x3 conv, 64 3x3 conv, 128 3x3 conv, 128 / 2 3x3 conv, 128 3x3 conv, 128 3x3 conv, 128 3x3 conv, 128 ... 3x3 conv, 64 3x3 conv, 64 3x3 conv, 64 3x3 conv, 64 3x3 conv, 64 3x3 conv, 64 Pool relu Residual block conv conv X identity F(x) + x F(x) relu X
  • 7. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 7 Figures copyright Larsson et al., 2017. Reproduced with permission. Pool Conv Dense Block 1 Conv Input Conv Dense Block 2 Conv Pool Conv Dense Block 3 Softmax FC Pool Conv Conv 1x1 conv, 64 1x1 conv, 64 Input Concat Concat Concat Dense Block DenseNet FractalNet
  • 8. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 8 Last Time: CNN Architectures
  • 9. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 9 Last Time: CNN Architectures AlexNet and VGG have tons of parameters in the fully connected layers AlexNet: ~62M parameters FC6: 256x6x6 -> 4096: 38M params FC7: 4096 -> 4096: 17M params FC8: 4096 -> 1000: 4M params ~59M params in FC layers!
  • 10. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 10 Today: Recurrent Neural Networks
  • 11. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 11 Vanilla Neural Networks “Vanilla” Neural Network
  • 12. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 12 Recurrent Neural Networks: Process Sequences e.g. Image Captioning image -> sequence of words
  • 13. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 13 Recurrent Neural Networks: Process Sequences e.g. Sentiment Classification sequence of words -> sentiment
  • 14. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 14 Recurrent Neural Networks: Process Sequences e.g. Machine Translation seq of words -> seq of words
  • 15. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 15 Recurrent Neural Networks: Process Sequences e.g. Video classification on frame level
  • 16. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 16 Sequential Processing of Non-Sequence Data Ba, Mnih, and Kavukcuoglu, “Multiple Object Recognition with Visual Attention”, ICLR 2015. Gregor et al, “DRAW: A Recurrent Neural Network For Image Generation”, ICML 2015 Figure copyright Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, and Daan Wierstra, 2015. Reproduced with permission. Classify images by taking a series of “glimpses”
  • 17. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 17 Sequential Processing of Non-Sequence Data Gregor et al, “DRAW: A Recurrent Neural Network For Image Generation”, ICML 2015 Figure copyright Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, and Daan Wierstra, 2015. Reproduced with permission. Generate images one piece at a time!
  • 18. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 18 Recurrent Neural Network x RNN
  • 19. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 19 Recurrent Neural Network x RNN y usually want to predict a vector at some time steps
  • 20. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 20 Recurrent Neural Network x RNN y We can process a sequence of vectors x by applying a recurrence formula at every time step: new state old state input vector at some time step some function with parameters W
  • 21. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 21 Recurrent Neural Network x RNN y We can process a sequence of vectors x by applying a recurrence formula at every time step: Notice: the same function and the same set of parameters are used at every time step.
  • 22. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 22 (Vanilla) Recurrent Neural Network x RNN y The state consists of a single “hidden” vector h:
  • 23. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 23 h0 fW h1 x1 RNN: Computational Graph
  • 24. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 24 h0 fW h1 fW h2 x2 x1 RNN: Computational Graph
  • 25. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 25 h0 fW h1 fW h2 fW h3 x3 … x2 x1 RNN: Computational Graph hT
  • 26. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 26 h0 fW h1 fW h2 fW h3 x3 … x2 x1 W RNN: Computational Graph Re-use the same weight matrix at every time-step hT
  • 27. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 27 h0 fW h1 fW h2 fW h3 x3 yT … x2 x1 W RNN: Computational Graph: Many to Many hT y3 y2 y1
  • 28. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 28 h0 fW h1 fW h2 fW h3 x3 yT … x2 x1 W RNN: Computational Graph: Many to Many hT y3 y2 y1 L1 L2 L3 LT
  • 29. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 29 h0 fW h1 fW h2 fW h3 x3 yT … x2 x1 W RNN: Computational Graph: Many to Many hT y3 y2 y1 L1 L2 L3 LT L
  • 30. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 30 h0 fW h1 fW h2 fW h3 x3 y … x2 x1 W RNN: Computational Graph: Many to One hT
  • 31. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 31 h0 fW h1 fW h2 fW h3 yT … x W RNN: Computational Graph: One to Many hT y3 y3 y3
  • 32. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 32 Sequence to Sequence: Many-to-one + one-to-many h 0 fW h 1 fW h 2 fW h 3 x 3 … x 2 x 1 W 1 h T Many to one: Encode input sequence in a single vector
  • 33. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 33 Sequence to Sequence: Many-to-one + one-to-many h 0 fW h 1 fW h 2 fW h 3 x 3 … x 2 x 1 W 1 h T y 1 y 2 … Many to one: Encode input sequence in a single vector One to many: Produce output sequence from single input vector fW h 1 fW h 2 fW W 2
  • 34. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 34 Example: Character-level Language Model Vocabulary: [h,e,l,o] Example training sequence: “hello”
  • 35. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 35 Example: Character-level Language Model Vocabulary: [h,e,l,o] Example training sequence: “hello”
  • 36. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 36 Example: Character-level Language Model Vocabulary: [h,e,l,o] Example training sequence: “hello”
  • 37. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 37 Example: Character-level Language Model Sampling Vocabulary: [h,e,l,o] At test-time sample characters one at a time, feed back to model .03 .13 .00 .84 .25 .20 .05 .50 .11 .17 .68 .03 .11 .02 .08 .79 Softmax “e” “l” “l” “o” Sample
  • 38. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 38 .03 .13 .00 .84 .25 .20 .05 .50 .11 .17 .68 .03 .11 .02 .08 .79 Softmax “e” “l” “l” “o” Sample Example: Character-level Language Model Sampling Vocabulary: [h,e,l,o] At test-time sample characters one at a time, feed back to model
  • 39. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 39 .03 .13 .00 .84 .25 .20 .05 .50 .11 .17 .68 .03 .11 .02 .08 .79 Softmax “e” “l” “l” “o” Sample Example: Character-level Language Model Sampling Vocabulary: [h,e,l,o] At test-time sample characters one at a time, feed back to model
  • 40. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 40 .03 .13 .00 .84 .25 .20 .05 .50 .11 .17 .68 .03 .11 .02 .08 .79 Softmax “e” “l” “l” “o” Sample Example: Character-level Language Model Sampling Vocabulary: [h,e,l,o] At test-time sample characters one at a time, feed back to model
  • 41. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 41 Backpropagation through time Loss Forward through entire sequence to compute loss, then backward through entire sequence to compute gradient
  • 42. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 42 Truncated Backpropagation through time Loss Run forward and backward through chunks of the sequence instead of whole sequence
  • 43. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 43 Truncated Backpropagation through time Loss Carry hidden states forward in time forever, but only backpropagate for some smaller number of steps
  • 44. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 44 Truncated Backpropagation through time Loss
  • 45. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 45 min-char-rnn.py gist: 112 lines of Python (https://gist.github.com/karpathy/d4dee 566867f8291f086)
  • 46. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 46 x RNN y
  • 47. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 47 train more train more train more at first:
  • 48. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 48
  • 49. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 49 The Stacks Project: open source algebraic geometry textbook Latex source http://stacks.math.columbia.edu/ The stacks project is licensed under the GNU Free Documentation License
  • 50. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 50
  • 51. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 51
  • 52. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 52
  • 53. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 53 Generated C code
  • 54. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 54
  • 55. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 55
  • 56. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 56 Searching for interpretable cells Karpathy, Johnson, and Fei-Fei: Visualizing and Understanding Recurrent Networks, ICLR Workshop 2016
  • 57. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 57 Searching for interpretable cells Karpathy, Johnson, and Fei-Fei: Visualizing and Understanding Recurrent Networks, ICLR Workshop 2016 Figures copyright Karpathy, Johnson, and Fei-Fei, 2015; reproduced with permission
  • 58. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 58 Searching for interpretable cells Karpathy, Johnson, and Fei-Fei: Visualizing and Understanding Recurrent Networks, ICLR Workshop 2016 Figures copyright Karpathy, Johnson, and Fei-Fei, 2015; reproduced with permission quote detection cell
  • 59. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 59 Searching for interpretable cells Karpathy, Johnson, and Fei-Fei: Visualizing and Understanding Recurrent Networks, ICLR Workshop 2016 Figures copyright Karpathy, Johnson, and Fei-Fei, 2015; reproduced with permission line length tracking cell
  • 60. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 60 Searching for interpretable cells Karpathy, Johnson, and Fei-Fei: Visualizing and Understanding Recurrent Networks, ICLR Workshop 2016 Figures copyright Karpathy, Johnson, and Fei-Fei, 2015; reproduced with permission if statement cell
  • 61. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 61 Searching for interpretable cells Karpathy, Johnson, and Fei-Fei: Visualizing and Understanding Recurrent Networks, ICLR Workshop 2016 Figures copyright Karpathy, Johnson, and Fei-Fei, 2015; reproduced with permission quote/comment cell
  • 62. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 62 Searching for interpretable cells Karpathy, Johnson, and Fei-Fei: Visualizing and Understanding Recurrent Networks, ICLR Workshop 2016 Figures copyright Karpathy, Johnson, and Fei-Fei, 2015; reproduced with permission code depth cell
  • 63. Image Captioning. Explain Images with Multimodal Recurrent Neural Networks, Mao et al.; Deep Visual-Semantic Alignments for Generating Image Descriptions, Karpathy and Fei-Fei; Show and Tell: A Neural Image Caption Generator, Vinyals et al.; Long-term Recurrent Convolutional Networks for Visual Recognition and Description, Donahue et al.; Learning a Recurrent Visual Representation for Image Caption Generation, Chen and Zitnick. Figure from Karpathy et al., "Deep Visual-Semantic Alignments for Generating Image Descriptions", CVPR 2015; figure copyright IEEE, 2015. Reproduced for educational purposes.
  • 64. Convolutional Neural Network + Recurrent Neural Network.
  • 65. Test image. This image is CC0 public domain.
  • 69. h0, x0 = <START>, y0; test image. Before: h = tanh(Wxh * x + Whh * h). Now: h = tanh(Wxh * x + Whh * h + Wih * v), where v is the CNN image feature vector fed into the recurrence through a third weight matrix Wih.
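The image-conditioned recurrence on this slide can be sketched in a few lines of numpy. All sizes, the 0.01 initialization scale, and the variable names here are hypothetical choices for illustration; biases are omitted as on the slide.

```python
import numpy as np

np.random.seed(0)
D_x, D_h, D_v = 300, 512, 512  # hypothetical word-embedding, hidden, image-feature sizes

Wxh = np.random.randn(D_h, D_x) * 0.01  # input-to-hidden
Whh = np.random.randn(D_h, D_h) * 0.01  # hidden-to-hidden
Wih = np.random.randn(D_h, D_v) * 0.01  # image-to-hidden (the new term)

def rnn_step(x, h, v):
    """One step of the image-conditioned vanilla RNN: the image feature v
    enters the recurrence at every step through Wih."""
    return np.tanh(Wxh @ x + Whh @ h + Wih @ v)

h = np.zeros(D_h)          # h0
x = np.random.randn(D_x)   # embedding of <START>
v = np.random.randn(D_v)   # CNN feature of the test image
h = rnn_step(x, h, v)      # h1, from which the first word y1 would be predicted
```

Dropping the `Wih @ v` term recovers the unconditioned recurrence from the earlier slides.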
  • 75. Image Captioning: Example Results. "A cat sitting on a suitcase on the floor"; "A cat is sitting on a tree branch"; "A dog is running in the grass with a frisbee"; "A white teddy bear sitting in the grass"; "Two people walking on the beach with surfboards"; "Two giraffes standing in a grassy field"; "A man riding a dirt bike on a dirt track"; "A tennis player in action on the court". Captions generated using neuraltalk2. All images are CC0 public domain: cat suitcase, cat tree, dog, bear, surfers, tennis, giraffe, motorcycle.
  • 76. Image Captioning: Failure Cases. "A woman is holding a cat in her hand"; "A woman standing on a beach holding a surfboard"; "A person holding a computer mouse on a desk"; "A bird is perched on a tree branch"; "A man in a baseball uniform throwing a ball". Captions generated using neuraltalk2. All images are CC0 public domain: fur coat, handstand, spider web, baseball.
  • 77. Image Captioning with Attention: the RNN focuses its attention at a different spatial location when generating each word. Xu et al., "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention", ICML 2015. Figure copyright Kelvin Xu, Jimmy Lei Ba, Jamie Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard S. Zemel, and Yoshua Bengio, 2015. Reproduced with permission.
  • 78. Image Captioning with Attention. The CNN turns the image (H x W x 3) into a grid of features (L x D), which initializes h0. Xu et al., "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention", ICML 2015.
  • 79. From h0, compute a1: a distribution over the L locations.
  • 80. Use a1 to form z1, a weighted combination of the features (weighted features: D).
  • 81. Feed z1 together with the first word y1 into the RNN to get h1.
  • 82. From h1, produce both d1 (a distribution over the vocabulary) and a2 (a new distribution over the L locations).
  • 83. Repeat: z2 and y2 produce h2...
  • 84. ...which produces d2 and a3, and so on.
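The attention step in slides 79-80 (a distribution a over L locations, then a weighted combination z of the features) can be sketched as below. The linear scoring matrix `W_att` is a toy stand-in: the actual scoring function in Xu et al. is a small network over the hidden state and each feature vector, and all sizes here are hypothetical.

```python
import numpy as np

np.random.seed(0)
L, D, H = 196, 512, 512  # e.g. a 14x14 grid of D-dim CNN features (hypothetical)

feats = np.random.randn(L, D)          # Features: L x D
h = np.random.randn(H)                 # current hidden state
W_att = np.random.randn(L, H) * 0.01   # toy linear scoring weights (assumed form)

def soft_attention(h, feats):
    scores = W_att @ h                  # one scalar score per spatial location
    a = np.exp(scores - scores.max())   # softmax, numerically stable
    a /= a.sum()                        # a: distribution over L locations
    z = a @ feats                       # z: weighted combination of features, D-dim
    return a, z

a1, z1 = soft_attention(h, feats)
```

This is the "soft" variant: z is a convex combination of all L feature vectors, so the whole step is differentiable and trainable with backprop.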
  • 85. Image Captioning with Attention: soft attention vs. hard attention. Xu et al., "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention", ICML 2015. Figure copyright Kelvin Xu, Jimmy Lei Ba, Jamie Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard S. Zemel, and Yoshua Bengio, 2015. Reproduced with permission.
  • 86. Image Captioning with Attention (further examples from the same source).
  • 87. Visual Question Answering. Agrawal et al., "VQA: Visual Question Answering", ICCV 2015; Zhu et al., "Visual 7W: Grounded Question Answering in Images", CVPR 2016. Figure from Zhu et al., copyright IEEE 2016. Reproduced for educational purposes.
  • 88. Visual Question Answering: RNNs with Attention. Zhu et al., "Visual 7W: Grounded Question Answering in Images", CVPR 2016. Figures from Zhu et al., copyright IEEE 2016. Reproduced for educational purposes.
  • 89. Multilayer RNNs: unroll along the time axis, stack layers along the depth axis. (LSTM update equations shown per layer.)
  • 90. Vanilla RNN Gradient Flow. One step stacks [ht-1; xt], multiplies by W, and applies tanh to get ht. Bengio et al., "Learning long-term dependencies with gradient descent is difficult", IEEE Transactions on Neural Networks, 1994; Pascanu et al., "On the difficulty of training recurrent neural networks", ICML 2013.
  • 91. Backpropagation from ht to ht-1 multiplies by W (actually Whh^T).
  • 92. Computing the gradient of h0 therefore involves many factors of W (and repeated tanh).
  • 93. Largest singular value > 1: exploding gradients. Largest singular value < 1: vanishing gradients.
  • 94. Exploding gradients are controlled with gradient clipping: scale the gradient if its norm is too big.
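Both failure modes, and the clipping fix for the exploding case, can be demonstrated numerically. The sketch below ignores the tanh nonlinearity and uses a symmetric toy matrix (so its largest singular value equals its spectral radius); the dimensions, horizon T, and clipping threshold are all arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)

def grad_norm_through_time(sv, T=50, H=64):
    """Norm of a gradient after T repeated multiplications by Whh^T,
    with the largest singular value of Whh forced to sv."""
    A = np.random.randn(H, H)
    W = (A + A.T) / 2.0               # symmetric toy matrix
    W *= sv / np.linalg.norm(W, 2)    # set largest singular value to sv
    g = np.random.randn(H)
    for _ in range(T):
        g = W.T @ g                   # one factor of Whh^T per time step
    return np.linalg.norm(g)

explode = grad_norm_through_time(1.1)  # grows roughly like 1.1^T
vanish = grad_norm_through_time(0.9)   # shrinks roughly like 0.9^T

def clip_gradient(g, max_norm=5.0):
    """Gradient clipping: rescale g if its norm exceeds max_norm."""
    n = np.linalg.norm(g)
    return g * (max_norm / n) if n > max_norm else g
```

Clipping caps the update size but does nothing for vanishing gradients, which is why the next slide turns to changing the architecture instead.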
  • 95. Vanishing gradients: change the RNN architecture.
  • 96. Long Short Term Memory (LSTM). Hochreiter and Schmidhuber, "Long Short-Term Memory", Neural Computation 1997. (Vanilla RNN vs. LSTM update equations shown.)
  • 97. The LSTM computes four gates from the stacked vector of h (from before) and x (from below) via one 4h x 2h matrix W. i: input gate, whether to write to the cell (sigmoid). f: forget gate, whether to erase the cell (sigmoid). o: output gate, how much to reveal the cell (sigmoid). g: "gate gate" (?), how much to write to the cell (tanh).
  • 98. Diagram of one LSTM step: ct = f ⊙ ct-1 + i ⊙ g, then ht = o ⊙ tanh(ct).
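The diagram on slide 98 corresponds to the following step function: one big matrix multiply produces all four gate pre-activations, which are then sliced apart. Sizes and the initialization scale are hypothetical, and biases are omitted as in the slide's equations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(0)
H, D = 128, 100  # hypothetical hidden and input sizes

W = np.random.randn(4 * H, H + D) * 0.01  # the single 4h x 2h-style gate matrix

def lstm_step(x, h_prev, c_prev):
    """One LSTM step: four gates from one matrix multiply, then the cell update."""
    z = W @ np.concatenate([h_prev, x])  # 4h pre-activations, sliced below
    i = sigmoid(z[0:H])                  # input gate: whether to write to the cell
    f = sigmoid(z[H:2*H])                # forget gate: whether to erase the cell
    o = sigmoid(z[2*H:3*H])              # output gate: how much to reveal the cell
    g = np.tanh(z[3*H:4*H])              # "gate gate": what to write to the cell
    c = f * c_prev + i * g               # additive cell update
    h = o * np.tanh(c)
    return h, c

h, c = np.zeros(H), np.zeros(H)
h, c = lstm_step(np.random.randn(D), h, c)
```

The key line for the next slides is `c = f * c_prev + i * g`: the path from ct back to ct-1 is an elementwise multiply by f, with no factor of W.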
  • 99. Long Short Term Memory (LSTM): Gradient Flow [Hochreiter et al., 1997]. Backpropagation from ct to ct-1 is only an elementwise multiplication by f, with no matrix multiply by W.
  • 100. Uninterrupted gradient flow along the cell state c0 → c1 → c2 → c3!
  • 101. Uninterrupted gradient flow: similar to ResNet! (ResNet architecture diagram shown for comparison.)
  • 102. In between the two: Highway Networks. Srivastava et al., "Highway Networks", ICML DL Workshop 2015.
  • 103. Other RNN Variants. GRU: Cho et al., "Learning phrase representations using RNN encoder-decoder for statistical machine translation", 2014. See also Greff et al., "LSTM: A Search Space Odyssey", 2015; Jozefowicz et al., "An Empirical Exploration of Recurrent Network Architectures", 2015.
  • 104. Summary:
    - RNNs allow a lot of flexibility in architecture design.
    - Vanilla RNNs are simple but don't work very well.
    - It is common to use LSTM or GRU: their additive interactions improve gradient flow.
    - The backward flow of gradients in an RNN can explode or vanish. Exploding is controlled with gradient clipping; vanishing is controlled with additive interactions (LSTM).
    - Better/simpler architectures are a hot topic of current research.
    - Better understanding (both theoretical and empirical) is needed.
  • 105. Next time: Midterm! Then Detection and Segmentation.