Xavier Giro-i-Nieto
xavier.giro@upc.edu
Associate Professor
Intelligent Data Science and Artificial Intelligence Center
Universitat Politecnica de Catalunya (UPC)
Deep Learning Workshop
Dublin City University
21-22 May 2018

Skipping and Repeating Samples in RNNs
Case study III

#InsightDL2018
bit.ly/InsightDL2018
Our team @ UPC IDEAI Barcelona
Víctor Campos, Amaia Salvador, Amanda Duarte, Dani Fojo, Görkem Çamli, Akis Linardos, Santiago Pascual, Xavi Giró, Miriam Bellver, Marta R. Costa-jussà, Ferran Marqués, Jordi Torres (BSC), Marc Assens, Sandra Roca, Miquel Llobet, Antonio Bonafonte
Our partners @ academia
Kevin McGuinness, Shih-Fu Chang, Noel E. O’Connor, Cathal Gurrin, Eva Mohedano, Carles Ventura
Our team @ industry
Francisco Roldan, Marta Coll, Paula Gómez, Alejandro Woodward, Marc Górriz, Dèlia Fernández, Elisenda Bou, Sebastian Palacio, Eduard Ramon, Andreu Girbau, Carlos Arenas
[Timeline 2015-2018: Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Adversarial Training, Evolution Strategies, Unsupervised Learning, Reinforcement Learning]
Motivation: Dynamic Computation
[Figure: 2+2, an easy input that should need little computation]
Outline
● Recurrent Neural Networks (RNNs)
● SkipRNN [ICLR 2018]
● RepeatRNN [ICLRW 2018]
Feed-forward Neural Network
[Figure: input x, feed-forward weights Wi, hidden states h1 and h2, output y. Figure: Hugo Larochelle]
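As a minimal sketch (layer sizes are illustrative, not from the figure), such a feed-forward network can be written in PyTorch as:

```python
import torch
import torch.nn as nn

# Two hidden layers h1 and h2, as in the figure; sizes are illustrative.
model = nn.Sequential(
    nn.Linear(784, 256),  # W1: input x -> hidden h1
    nn.Tanh(),
    nn.Linear(256, 128),  # W2: hidden h1 -> hidden h2
    nn.Tanh(),
    nn.Linear(128, 10),   # W3: hidden h2 -> output y
)

x = torch.randn(1, 784)   # one input vector
y = model(x)              # forward pass: information only flows forward
```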
Recurrent Neural Network
[Figure: the same network with recurrent weights (U) added alongside the feed-forward weights (W)]

The updated state s_t is computed from the previous state s_{t-1} and the input x_t:
s_t = tanh(W x_t + U s_{t-1})
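A minimal sketch of this update rule, with illustrative dimensions:

```python
import torch

input_size, hidden_size = 32, 64                 # illustrative dimensions
W = 0.1 * torch.randn(hidden_size, input_size)   # feed-forward weights
U = 0.1 * torch.randn(hidden_size, hidden_size)  # recurrent weights
b = torch.zeros(hidden_size)

def rnn_step(x_t, s_prev):
    # Updated state from the previous state and the current input:
    # s_t = tanh(W x_t + U s_{t-1} + b)
    return torch.tanh(W @ x_t + U @ s_prev + b)

s = torch.zeros(hidden_size)              # initial state s_0
for x_t in torch.randn(5, input_size):    # a short input sequence
    s = rnn_step(x_t, s)                  # unfold over time
```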
Recurrent Neural Network
[Figure: the recurrence unfolded over time (rotated 90°)]
Recurrent Neural Network
[Figure: unfolded RNN; at each time step the cell S maps the input x_t and the previous state to the new state s_t, for t = 1, ..., T]
Recurrent Neural Network
Video sequences (e.g. action recognition): a CNN encodes each frame and an RNN aggregates the features over time.
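A hedged sketch of this pipeline (the backbone and sizes are placeholders, not the models used in the experiments):

```python
import torch
import torch.nn as nn

# A shared CNN encodes each video frame; an RNN aggregates the
# per-frame features over time.
cnn = nn.Sequential(                  # stand-in for a pretrained CNN backbone
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
rnn = nn.LSTM(input_size=16, hidden_size=64, batch_first=True)

frames = torch.randn(1, 10, 3, 64, 64)               # (batch, time, C, H, W)
feats = torch.stack([cnn(frames[:, t]) for t in range(10)], dim=1)
outputs, _ = rnn(feats)                               # one state per frame
```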
Recurrent Neural Network
Word sequences (e.g. machine translation)
Recurrent Neural Network
Spectrogram sequences (e.g. speech recognition), with frame-level outputs such as "T H _ _ E … D _ O _ _ G"
Skip RNN: Learning to Skip State Updates in RNNs
Victor Campos, Brendan Jou, Xavier Giro-i-Nieto, Jordi Torres, and Shih-Fu Chang. "Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks", ICLR 2018.
Motivation
[Figure: per-frame CNN features fed to an RNN over a video; many of the samples end up unused]
SkipRNN
[Figure, unfolded over time: at t = 1 the cell S updates the state to s_1; at t = 2 the gate chooses COPY, so x_2 is skipped and the state is kept (s_2 = s_1); at t = 3 it chooses UPDATE, computing s_3 from x_3]
Limiting computation
Intuition: the network can be encouraged to perform fewer updates by adding a penalization when u_t = 1:

L_budget = λ · Σ_t u_t

where λ is the cost per sample (a hyperparameter) and u_t = 1 if the sample is used, 0 otherwise.
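A small sketch of how this penalization could enter the training loss (λ and u_t as defined above; the task loss is a placeholder):

```python
import torch

# u_t for a short sequence: 1 if the sample was used, 0 otherwise.
u = torch.tensor([1., 0., 1., 1., 0.])

lambda_cost = 1e-4                      # cost per sample (hyperparameter)
task_loss = torch.tensor(0.7)           # placeholder for the task loss
budget_loss = lambda_cost * u.sum()     # penalizes every step with u_t = 1
total_loss = task_loss + budget_loss    # fewer updates -> lower penalty
```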
Experiments: Sequential MNIST
[Figure: MNIST digits read as pixel sequences; the Skip LSTM marks each pixel as used or unused]
Experiments: Sequential MNIST
Epochs for Skip LSTM (λ = 10⁻⁴):

Epochs     11     51     101    400
Accuracy   ~30%   ~50%   ~70%   ~95%

[Figure: used vs unused pixels at each of these training stages]
Experiments: Sequential MNIST
[Plot: accuracy vs number of used samples for different per-sample costs λ (hyperparameter), against a random skipping baseline]
Experiments: Sequential MNIST
[Plots: our Skip RNN models reach better accuracy with less computation than the baselines]
Experiments: Action Localization
[Figure: per-frame CNN features fed to an RNN; the Skip RNN is both better and faster than the baseline]
Open science
https://git.io/skip-rnn
Reproducing and Analyzing Adaptive Computation Time in PyTorch and TensorFlow
Fojo, Daniel, Víctor Campos, and Xavier Giró-i-Nieto. "Comparing Fixed and Adaptive Computation Time for Recurrent Neural Networks." ICLR Workshop 2018.
Adaptive Computation Time (ACT)
[Figure: an RNN that learns how many times to process each input, e.g. REPEAT 3, then REPEAT 2]
Graves, Alex. "Adaptive computation time for recurrent neural networks." arXiv preprint arXiv:1603.08983 (2016).
Adaptive Computation Time (ACT)
[Figure: the ACT mechanism]
ACT vs RepeatRNN
[Figure: ACT learns how many times to process each input, while RepeatRNN repeats every input a fixed number of times set by a hyperparameter]
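A minimal sketch of the RepeatRNN idea, assuming a standard GRU cell (the helper name and dimensions are illustrative): each input is processed a fixed number of times, set by a hyperparameter, instead of ACT's learned halting.

```python
import torch
import torch.nn as nn

cell = nn.GRUCell(input_size=8, hidden_size=16)

def repeat_rnn_step(x_t, s, repeats):
    # RepeatRNN: process every input a *fixed* number of times, set by an
    # interpretable hyperparameter, instead of ACT's learned halting unit.
    for _ in range(repeats):
        s = cell(x_t, s)
    return s

s = torch.zeros(1, 16)
for x_t in torch.randn(5, 1, 8):   # a short input sequence
    s = repeat_rnn_step(x_t, s, repeats=2)
```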
Experiment: One-hot addition
Experiment: One-hot addition
[Plots: our RepeatRNN matches ACT with an interpretable hyperparameter, faster training, and faster inference]
Open Science
bit.ly/repeat-rnn
Acknowledgements
Deep Learning courses @ UPC (videos)
● MSc course (2017)
● BSc course (2018)
● 1st edition (2016)
● 2nd edition (2017)
● 3rd edition (2018)
● 1st edition (2017)
● 2nd edition (2018)
Next editions: Autumn 2018 and Winter/Spring 2019. Summer School starts 28/06.
@DocXavi
Xavier Giro-i-Nieto
Download slides from: http://bit.ly/InsightDL2018
We are looking for both industrial & academic partners: xavier.giro@upc.edu
Questions?
SkipRNN
Intuition: introduce a binary update state gate, u_t, deciding whether the RNN state is updated or copied:

s_t = S(x_t, s_{t-1})   if u_t = 1   (update operation)
s_t = s_{t-1}           if u_t = 0   (copy operation)
SkipRNN
u_t = f_binarize(ũ_t)   (update state gate ∈ {0, 1}; ũ_t is the update state probability ∈ [0, 1])
s_t = u_t · S(x_t, s_{t-1}) + (1 - u_t) · s_{t-1}
Δũ_t = σ(W_p s_t + b_p)   (increment for the update state probability)
ũ_{t+1} = u_t · Δũ_t + (1 - u_t) · (ũ_t + min(Δũ_t, 1 - ũ_t))
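A sketch of one SkipRNN step following these equations, wrapping a standard GRU cell as S (names and sizes are illustrative; for brevity, the hard threshold below ignores the straight-through gradient discussed on the next slide):

```python
import torch
import torch.nn as nn

cell = nn.GRUCell(input_size=8, hidden_size=16)  # the wrapped cell S
w_p = nn.Linear(16, 1)                           # W_p and b_p

def skip_rnn_step(x_t, s_prev, u_tilde):
    u_t = (u_tilde >= 0.5).float()               # u_t = f_binarize(ũ_t)
    s_t = u_t * cell(x_t, s_prev) + (1 - u_t) * s_prev   # update or copy
    delta = torch.sigmoid(w_p(s_t))              # Δũ_t, the probability increment
    # After an update, start over from Δũ_t; after a copy, accumulate it.
    u_tilde = u_t * delta + (1 - u_t) * (u_tilde + torch.min(delta, 1 - u_tilde))
    return s_t, u_tilde

s = torch.zeros(1, 16)
u_tilde = torch.ones(1, 1)                       # first sample is always used
for x_t in torch.randn(5, 1, 8):
    s, u_tilde = skip_rnn_step(x_t, s, u_tilde)
```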
SkipRNN
[Figure: the two operation modes of the gated cell, update state (u_t = 1) and copy state (u_t = 0)]
Straight Through Estimator
f_binarize is a step function, so its gradient is zero almost everywhere. The straight through estimator applies the step in the forward pass but treats it as the identity in the backward pass, letting gradients flow through the gate.
[Figure: f_binarize(x) as a step from 0 to 1 in the forward pass, approximated by the identity in the backward pass]
G. Hinton, Neural Networks for Machine Learning, Coursera video lectures 2012
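A minimal PyTorch sketch of the estimator with a custom autograd function (assuming f_binarize rounds to the nearest integer, i.e. thresholds at 0.5):

```python
import torch

class Binarize(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # Forward pass: the hard step function, f_binarize(x) in {0, 1}.
        return (x >= 0.5).float()

    @staticmethod
    def backward(ctx, grad_output):
        # Backward pass: act as the identity, so the gradient passes
        # straight through the non-differentiable step.
        return grad_output

x = torch.tensor([0.2, 0.7], requires_grad=True)
u = Binarize.apply(x)     # tensor([0., 1.])
u.sum().backward()        # x.grad == tensor([1., 1.])
```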
Skipping and Repeating Samples in Recurrent Neural Networks