Recurrent Neural Networks (RNNs) continue to show outstanding performance in sequence modeling tasks. However, training RNNs on long sequences often faces challenges such as slow inference, vanishing gradients, and difficulty in capturing long-term dependencies. In backpropagation through time settings, these issues are tightly coupled with the large, sequential computational graph resulting from unfolding the RNN in time. We introduce the Skip RNN model, which extends existing RNN models by learning to skip state updates, thereby shortening the effective size of the computational graph. The model can also be encouraged to perform fewer state updates through a budget constraint. We evaluate the proposed model on a variety of tasks and show how it can reduce the number of required RNN updates while preserving, and sometimes even improving, the performance of the baseline RNN models.
4. Recurrent Neural Networks
[Diagram: RNN cell S with input x_t and state s_t]
Recurrent Neural Networks (RNNs) are state-of-the-art solutions for sequence modeling tasks such as
● Natural Language Processing
● Image captioning
● Text generation
● … and many more!
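The diagram above illustrates the basic recurrence s_t = S(x_t, s_{t-1}). A minimal NumPy sketch of one such update and its sequential unrolling follows; the tanh non-linearity and the weight names W_x, W_s, b are illustrative assumptions, not details from the slides.

```python
import numpy as np

def rnn_step(x_t, s_prev, W_x, W_s, b):
    """One vanilla RNN state update: s_t = S(x_t, s_{t-1})."""
    return np.tanh(W_x @ x_t + W_s @ s_prev + b)

def rnn_forward(x, s0, W_x, W_s, b):
    """Unroll the recurrence over a sequence x of shape (T, input_dim).

    Note the strictly sequential loop: the computational graph grows
    linearly with the sequence length T, which is what Skip RNN shortens.
    """
    s, states = s0, []
    for x_t in x:
        s = rnn_step(x_t, s, W_x, W_s, b)
        states.append(s)
    return np.stack(states)
```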
12. Adaptive Computation Time (ACT)
[Figure: Fixed Computation Time vs. Adaptive Computation Time (ACT)]
A. Graves, Adaptive Computation Time for Recurrent Neural Networks, arXiv 2016
15. Model description
Intuition: introduce a binary state update gate, u_t, deciding whether the RNN state is updated or copied:

s_t = S(x_t, s_{t-1})   if u_t = 1   // update operation
s_t = s_{t-1}           if u_t = 0   // copy operation
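A minimal sketch of this state transition; the underlying update function S (e.g. an LSTM or GRU cell) is passed in, and how the network produces the binary gate u_t is omitted here.

```python
def skip_rnn_step(x_t, s_prev, u_t, S):
    """Skip RNN state transition for one time step.

    u_t == 1: update the state with the underlying RNN cell S
    u_t == 0: copy the previous state; S is not evaluated, so the
              computation for this step is effectively skipped
    """
    if u_t == 1:
        return S(x_t, s_prev)   # update operation
    return s_prev               # copy operation
```

Because u_t is binary, the same selection can equivalently be written as s_t = u_t · S(x_t, s_{t-1}) + (1 − u_t) · s_{t-1}.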
27. Limiting computation
Intuition: the network can be encouraged to perform fewer updates by adding a penalization when u_t = 1:

L_budget = λ · Σ_t u_t

where λ is the cost per sample and u_t = 1 if the sample is used, 0 otherwise.
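A sketch of this penalty as it would be added to the task loss during training; the variable names are illustrative.

```python
def budget_loss(update_gates, cost_per_sample):
    """L_budget = lambda * sum_t u_t, with u_t = 1 if the sample was used, 0 otherwise."""
    return cost_per_sample * sum(update_gates)

# total_loss = task_loss + budget_loss(u, cost_per_sample)
# A larger cost_per_sample pushes the model toward fewer state updates.
```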
30. Evaluated tasks
Skip RNN has been evaluated on
1. Adding task (synthetic data, regression)
2. Frequency discrimination task (synthetic data, classification)
3. Digit classification (image data)
4. Sentiment analysis (text data)
5. Action classification (video data)
31. Adding task: overview
▷ Input: (value, marker) pairs
○ Two elements are marked with 1
○ The rest are marked with 0
▷ Output: addition of the two marked values
▷ Marked values placed randomly in:
○ First marker: first 10% of the sequence
○ Second marker: last 50% of the sequence
▷ At least 40% of dummy data per sequence
▷ Loss: Mean Squared Error (MSE)
[Diagram: (value, marker) → RNN → FC (1) → out]
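A hedged sketch of how one such training sample could be generated; the value range and sampling distribution are assumptions, only the marker placement follows the slide above.

```python
import numpy as np

def adding_task_sample(seq_len, rng=None):
    """Return a (seq_len, 2) array of (value, marker) pairs and the target sum."""
    if rng is None:
        rng = np.random.default_rng()
    values = rng.uniform(-0.5, 0.5, size=seq_len)       # assumed value range
    markers = np.zeros(seq_len)
    i = rng.integers(0, max(1, seq_len // 10))           # first marker: first 10% of the sequence
    j = rng.integers(seq_len // 2, seq_len)              # second marker: last 50% of the sequence
    markers[i] = markers[j] = 1.0
    target = values[i] + values[j]                       # addition of the two marked values
    return np.stack([values, markers], axis=1), target
```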
34. Sequential MNIST: overview
▷ Digit classification task with 10 classes, i.e. digits in [0, 9]
▷ Traditionally addressed with CNNs, but it
can be converted into a sequential task by
flattening the images
○ Original images: 28x28
○ Flattened images: 784-d vectors
▷ The RNN is given 1 pixel at a time
[Diagram: pixel intensity → RNN → FC (10) → out]
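A minimal sketch of the flattening step described above.

```python
import numpy as np

def to_pixel_sequence(image):
    """Flatten a 28x28 digit image into a 784-step sequence of pixel intensities.

    The RNN then receives one pixel (a 1-d feature) per time step.
    """
    assert image.shape == (28, 28)
    return image.reshape(784, 1)   # (time steps, features per step)
```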
37. UCF-101: overview
▷ Short, trimmed videos
▷ 101 action classes
▷ 10 s of video per clip
○ Cropped longer videos
○ Padded shorter ones with empty frames
▷ Using original framerate: 25 fps
▷ Frame-level ResNet-50 GAP features
[Diagram: RGB frame → CNN → RNN → FC (101) → out]
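A sketch of the length normalization described above (10 s at 25 fps, i.e. 250 frames per clip); cropping from the start and zero-valued "empty frames" are assumptions.

```python
import numpy as np

def normalize_clip(features, target_len=250):
    """Crop or pad a sequence of frame-level features to 250 frames (10 s at 25 fps).

    features: array of shape (T, feature_dim), e.g. per-frame ResNet-50 GAP features
    """
    T, dim = features.shape
    if T >= target_len:
        return features[:target_len]                      # crop longer videos
    pad = np.zeros((target_len - T, dim), dtype=features.dtype)
    return np.concatenate([features, pad], axis=0)        # pad shorter ones with empty frames
```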
41. Novel RNN architecture
▷ Preserves performance while reducing
number of state updates
▷ Orthogonal to recent advances in RNNs
▷ Implemented on top of LSTM and GRU
▷ Evaluated on different modalities and tasks
▷ Potential for extension and improvement
53. Variable computation in RNNs
Neil et al., Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences, NIPS 2016
Jernite et al., Variable Computation in Recurrent Neural Networks, ICLR 2017
[Figures: Phased LSTM and Variable Computation RNN]