Recurrent Neural Networks (RNNs) continue to show outstanding performance in sequence modeling tasks. However, training RNNs on long sequences often faces challenges such as slow inference, vanishing gradients, and difficulty in capturing long-term dependencies. In backpropagation through time settings, these issues are tightly coupled with the large, sequential computational graph resulting from unfolding the RNN in time. We introduce the Skip RNN model, which extends existing RNN models by learning to skip state updates, thereby shortening the effective size of the computational graph. The model can also be encouraged to perform fewer state updates through a budget constraint. We evaluate the proposed model on a variety of tasks and show how it can reduce the number of required RNN updates while preserving, and sometimes even improving, the performance of the baseline RNN models.
4. Recurrent Neural Networks
[Diagram: RNN cell S with input x_t and state s_t]
Recurrent Neural Networks (RNNs) are state-of-the-art solutions for sequence modeling tasks such as
● Natural Language Processing
● Image captioning
● Text generation
● … and many more!
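The diagram above illustrates the basic recurrence s_t = S(x_t, s_{t-1}). A minimal NumPy sketch of one such update and its sequential unrolling follows; the tanh non-linearity and the weight names W_x, W_s, b are illustrative assumptions, not details from the slides.

```python
import numpy as np

def rnn_step(x_t, s_prev, W_x, W_s, b):
    """One vanilla RNN state update: s_t = S(x_t, s_{t-1})."""
    return np.tanh(W_x @ x_t + W_s @ s_prev + b)

def rnn_forward(x, s0, W_x, W_s, b):
    """Unroll the recurrence over a sequence x of shape (T, input_dim).

    Note the strictly sequential loop: the computational graph grows
    linearly with the sequence length T, which is what Skip RNN shortens.
    """
    s, states = s0, []
    for x_t in x:
        s = rnn_step(x_t, s, W_x, W_s, b)
        states.append(s)
    return np.stack(states)
```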
12. Adaptive Computation Time (ACT)
[Figure: Fixed Computation Time vs. Adaptive Computation Time (ACT)]
A. Graves, Adaptive Computation Time for Recurrent Neural Networks, arXiv 2016
15. Model description
Intuition: introduce a binary state update gate, u_t, deciding whether the RNN state is updated or copied:

s_t = S(x_t, s_{t-1})   if u_t = 1   // update operation
s_t = s_{t-1}           if u_t = 0   // copy operation
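A minimal sketch of this state transition; the underlying update function S (e.g. an LSTM or GRU cell) is passed in, and how the network produces the binary gate u_t is omitted here.

```python
def skip_rnn_step(x_t, s_prev, u_t, S):
    """Skip RNN state transition for one time step.

    u_t == 1: update the state with the underlying RNN cell S
    u_t == 0: copy the previous state; S is not evaluated, so the
              computation for this step is effectively skipped
    """
    if u_t == 1:
        return S(x_t, s_prev)   # update operation
    return s_prev               # copy operation
```

Because u_t is binary, the same selection can equivalently be written as s_t = u_t · S(x_t, s_{t-1}) + (1 − u_t) · s_{t-1}.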
27. Limiting computation
Intuition: the network can be encouraged to perform fewer updates by adding a penalization when u_t = 1:

L_budget = λ · Σ_t u_t

where λ is the cost per sample and u_t = 1 if the sample is used, 0 otherwise.
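A sketch of this penalty as it would be added to the task loss during training; the variable names are illustrative.

```python
def budget_loss(update_gates, cost_per_sample):
    """L_budget = lambda * sum_t u_t, with u_t = 1 if the sample was used, 0 otherwise."""
    return cost_per_sample * sum(update_gates)

# total_loss = task_loss + budget_loss(u, cost_per_sample)
# A larger cost_per_sample pushes the model toward fewer state updates.
```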
30. Evaluated tasks
Skip RNN has been evaluated on
1. Adding task (synthetic data, regression)
2. Frequency discrimination task (synthetic data, classification)
3. Digit classification (image data)
4. Sentiment analysis (text data)
5. Action classification (video data)
31. Adding task: overview
▷ Input: (value, marker) pairs
○ Two elements are marked with 1
○ The rest are marked with 0
▷ Output: addition of the two marked values
▷ Marked values placed randomly in:
○ First marker: first 10% of the sequence
○ Second marker: last 50% of the sequence
▷ At least 40% of dummy data per sequence
▷ Loss: Mean Squared Error (MSE)
[Diagram: (value, marker) → RNN → FC (1) → out]
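A hedged sketch of how one such training sample could be generated; the value range and sampling distribution are assumptions, only the marker placement follows the slide above.

```python
import numpy as np

def adding_task_sample(seq_len, rng=None):
    """Return a (seq_len, 2) array of (value, marker) pairs and the target sum."""
    if rng is None:
        rng = np.random.default_rng()
    values = rng.uniform(-0.5, 0.5, size=seq_len)       # assumed value range
    markers = np.zeros(seq_len)
    i = rng.integers(0, max(1, seq_len // 10))           # first marker: first 10% of the sequence
    j = rng.integers(seq_len // 2, seq_len)              # second marker: last 50% of the sequence
    markers[i] = markers[j] = 1.0
    target = values[i] + values[j]                       # addition of the two marked values
    return np.stack([values, markers], axis=1), target
```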
34. Sequential MNIST: overview
▷ Digit classification task with 10 classes, i.e. digits in [0, 9]
▷ Traditionally addressed with CNNs, but it
can be converted into a sequential task by
flattening the images
○ Original images: 28x28
○ Flattened images: 784-d vectors
▷ The RNN is given 1 pixel at a time
[Diagram: pixel intensity → RNN → FC (10) → out]
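A minimal sketch of the flattening step described above.

```python
import numpy as np

def to_pixel_sequence(image):
    """Flatten a 28x28 digit image into a 784-step sequence of pixel intensities.

    The RNN then receives one pixel (a 1-d feature) per time step.
    """
    assert image.shape == (28, 28)
    return image.reshape(784, 1)   # (time steps, features per step)
```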
37. UCF-101: overview
▷ Short, trimmed videos
▷ 101 action classes
▷ 10 s of video per clip
○ Cropped longer videos
○ Padded shorter ones with empty frames
▷ Using original framerate: 25 fps
▷ Frame-level ResNet-50 GAP features
[Diagram: RGB frame → CNN → RNN → FC (101) → out]
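A sketch of the length normalization described above (10 s at 25 fps, i.e. 250 frames per clip); cropping from the start and zero-valued "empty frames" are assumptions.

```python
import numpy as np

def normalize_clip(features, target_len=250):
    """Crop or pad a sequence of frame-level features to 250 frames (10 s at 25 fps).

    features: array of shape (T, feature_dim), e.g. per-frame ResNet-50 GAP features
    """
    T, dim = features.shape
    if T >= target_len:
        return features[:target_len]                      # crop longer videos
    pad = np.zeros((target_len - T, dim), dtype=features.dtype)
    return np.concatenate([features, pad], axis=0)        # pad shorter ones with empty frames
```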
41. Novel RNN architecture
▷ Preserves performance while reducing
number of state updates
▷ Orthogonal to recent advances in RNNs
▷ Implemented on top of LSTM and GRU
▷ Evaluated on different modalities and tasks
▷ Potential for extension and improvement
53. Variable computation in RNNs
Neil et al., Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences, NIPS 2016
Jernite et al., Variable Computation in Recurrent Neural Networks, ICLR 2017
[Figures: Phased LSTM and Variable Computation RNN]