Reproducing and Analyzing
Adaptive Computation Time
in PyTorch and TensorFlow
Dani Fojo, Víctor Campos, Xavier Giró-i-Nieto
06/02/2018
Outline
1. Motivation
2. Theoretical background
3. Related work
4. Adaptive Computation Time
5. Implementation
6. Experiments
7. Conclusions
Motivation
Motivation
Why do we need adaptive computation?
Motivation
The complexity of posing a problem is not proportional to the complexity of solving it.
Example: Fermat's Last Theorem was stated in 1637, but only proved in 1995.
A. Wiles, “Modular elliptic curves and Fermat’s Last Theorem”, 1995
Motivation
Problems may differ in complexity
Theoretical background
Neural Networks
Neural networks alternate linear transformations with non-linear activation functions.
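As a minimal illustration (not code from the talk; layer sizes are arbitrary), a PyTorch model that alternates linear layers and non-linearities:

import torch
import torch.nn as nn

# Alternating linear transformations and non-linearities
model = nn.Sequential(
    nn.Linear(64, 128),   # linear
    nn.ReLU(),            # non-linear
    nn.Linear(128, 10),   # linear
)

x = torch.randn(32, 64)   # a batch of 32 input vectors
logits = model(x)         # shape (32, 10)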
Loss function
Mean Squared Error: $\mathcal{L}_{\mathrm{MSE}} = \frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2$
Cross Entropy: $\mathcal{L}_{\mathrm{CE}} = -\sum_{i} y_i \log \hat{y}_i$
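Both losses are available out of the box in PyTorch; a small illustration (not from the talk, shapes are arbitrary):

import torch
import torch.nn as nn

pred = torch.randn(8, 10)                 # model outputs (logits for cross entropy)
target_reg = torch.randn(8, 10)           # regression targets
target_cls = torch.randint(0, 10, (8,))   # class labels

mse = nn.MSELoss()(pred, target_reg)          # mean squared error
ce = nn.CrossEntropyLoss()(pred, target_cls)  # cross entropy over the logits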
Recurrent Neural Networks (RNNs)
Main idea: the network keeps a state that is updated at every time step.
Slide credit: Xavier Giró
Recurrent Neural Networks (RNNs)
[Figure: the same RNN unfolded over time, shown from a front view and a side view]
Slide credit: Xavier Giró
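In code, “the network has a state” simply means the same cell is applied at every time step to the new input and the previous state. A minimal PyTorch sketch (not the talk’s code; sizes are arbitrary):

import torch
import torch.nn as nn

cell = nn.RNNCell(input_size=4, hidden_size=8)
state = torch.zeros(1, 8)          # initial state

inputs = torch.randn(5, 1, 4)      # a sequence of 5 time steps
for x_t in inputs:                 # "unfolding" the network over time
    state = cell(x_t, state)       # same cell, updated state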
Related work
Related work
Spatially-Adaptive Computation Time for Residual Networks.
M. Figurnov et al. “Spatially Adaptive Computation Time for Residual Networks” CVPR 2017
Related work
LSTM-Jump and Skip-RNN.
A. Yu et al. “Learning to Skim Text” ACL 2017
V. Campos et al. “Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks” ICLR 2018
Adaptive Computation Time
Adaptive Computation Time
Adaptive Computation Time for RNNs (ACT).
A. Graves “Adaptive Computation Time for Recurrent Neural Networks”, arXiv 2016
Adaptive Computation Time (ACT)
Simple Recurrent Neural Network (RNN)
$s_t = \mathcal{S}(s_{t-1}, x_t), \qquad y_t = W_y s_t + b_y$
Model description
For each input $x_t$, the RNN performs a variable number of intermediate updates
$s_t^n = \mathcal{S}(s_t^{n-1}, x_t^n)$, each producing an intermediate output $y_t^n$.

Halting probability: $h_t^n = \sigma(W_h s_t^n + b_h)$

Each sample is processed until these probabilities add up to one:
$N(t) = \min\{n' : \sum_{n=1}^{n'} h_t^n \geq 1 - \epsilon\}$

● Residual: $R(t) = 1 - \sum_{n=1}^{N(t)-1} h_t^n$
● Outputs: $y_t = \sum_{n=1}^{N(t)} p_t^n y_t^n$ and $s_t = \sum_{n=1}^{N(t)} p_t^n s_t^n$,
  with $p_t^n = h_t^n$ for $n < N(t)$ and $p_t^{N(t)} = R(t)$
Limiting computation
● Modified loss: $\hat{\mathcal{L}}(x, y) = \mathcal{L}(x, y) + \tau P(x)$, where $\tau$ is the time penalty
● Ponder cost: $P(x) = \sum_{t=1}^{T} \rho_t$, with $\rho_t = N(t) + R(t)$
Ponder cost: Intuition
In $\rho_t = N(t) + R(t)$, the number of steps $N(t)$ is piecewise constant and provides no gradient.
The gradient flows only through the residual $R(t)$, i.e. through the negated halting probabilities
$1 - \sum_n h_t^n$, so minimizing the ponder cost pushes the halting probabilities up and encourages
the network to halt earlier.
Limiting computation
Maximum steps: the number of intermediate updates is also capped by a hard limit, $N(t) \leq M$.
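Putting the equations above together, a single ACT time step can be sketched in PyTorch as follows. This is only an illustrative sketch, not the implementation from the talk: the choice of nn.RNNCell, the names eps and max_steps, and the batching scheme are assumptions.

import torch
import torch.nn as nn

class ACTStep(nn.Module):
    def __init__(self, input_size, hidden_size, eps=0.01, max_steps=100):
        super().__init__()
        # One extra input feature: a flag marking the first intermediate update
        self.cell = nn.RNNCell(input_size + 1, hidden_size)
        self.halting = nn.Linear(hidden_size, 1)
        self.eps = eps
        self.max_steps = max_steps

    def forward(self, x, state):
        # x: (batch, input_size), state: (batch, hidden_size)
        batch = x.size(0)
        budget = torch.ones(batch)            # 1 - accumulated halting probability
        ponder = torch.zeros(batch)           # rho_t = N(t) + R(t)
        mean_state = torch.zeros_like(state)  # sum_n p_t^n * s_t^n
        running = torch.ones(batch)           # 1 while the sample has not halted

        for n in range(self.max_steps):
            flag = torch.full((batch, 1), 1.0 if n == 0 else 0.0)
            state = self.cell(torch.cat([x, flag], dim=1), state)
            h = torch.sigmoid(self.halting(state)).squeeze(1)

            halt = (budget - h < self.eps).float()
            if n == self.max_steps - 1:
                halt = torch.ones_like(halt)   # hard cap: force the remaining samples to halt
            halting_now = running * halt

            # p_t^n = h_t^n while running, except the residual R(t) on the halting step
            p = torch.where(halting_now.bool(), budget, h) * running

            mean_state = mean_state + p.unsqueeze(1) * state
            ponder = ponder + running + halting_now * budget  # counts N(t), then adds R(t)

            budget = budget - p
            running = running * (1.0 - halting_now)
            if running.sum() == 0:
                break

        return mean_state, ponder

The returned ponder cost is summed over time steps and added to the task loss scaled by the time penalty $\tau$, giving the modified loss above.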
Implementation
Deep Learning Frameworks
Computation graph
Static graph vs. dynamic graph
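A rough illustration of the difference, assuming the TensorFlow 1.x and PyTorch APIs of the time (not code from the talk):

import tensorflow as tf   # TensorFlow 1.x: build a static graph, then run it in a session
import torch              # PyTorch: operations execute eagerly (dynamic graph)

# Static graph: symbolic placeholders and ops, executed later via a session
a = tf.placeholder(tf.float32, shape=())
b = tf.placeholder(tf.float32, shape=())
c = a * b
with tf.Session() as sess:
    print(sess.run(c, feed_dict={a: 2.0, b: 3.0}))   # 6.0

# Dynamic graph: the graph is built on the fly as the Python code runs
x = torch.tensor(2.0)
y = torch.tensor(3.0)
print(x * y)                                          # tensor(6.)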
Implementation: PyTorch
Pros:
● Dynamic graph
Cons:
● Version 0.2
● Lack of documentation
● Too slow for custom RNNs
Implementation: TensorFlow
Pros:
● Version 1.4
● Much better documentation
● Much faster
Cons:
● Static graph
● Harder to use
Implementation
bit.ly/dani-tfg
Experiments
Experiments
We wanted to test ACT, but comparing it with a simple RNN is unfair: ACT performs extra computation for each input, so the baseline should too.
New baseline: Repeat-RNN
Each input is fed to the network a fixed number of times (repetitions) before moving on to the next sample.
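A minimal sketch of this baseline in PyTorch (an assumed form, not the talk’s code): the same input is processed by the recurrent cell a fixed number of times before the next element of the sequence.

import torch
import torch.nn as nn

class RepeatRNN(nn.Module):
    def __init__(self, input_size, hidden_size, repetitions=3):
        super().__init__()
        self.cell = nn.RNNCell(input_size, hidden_size)
        self.repetitions = repetitions

    def forward(self, inputs):
        # inputs: (seq_len, batch, input_size)
        state = torch.zeros(inputs.size(1), self.cell.hidden_size)
        states = []
        for x in inputs:                        # loop over time steps
            for _ in range(self.repetitions):   # fixed number of updates per input
                state = self.cell(x, state)
            states.append(state)
        return torch.stack(states)              # (seq_len, batch, hidden_size)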
Experiments: Parity
Experiments: Addition
Each digit is fed as a one-hot encoding.

One-hot encoding
Each digit $d \in \{0, \dots, 9\}$ is represented as a vector with a 1 in position $d$ and 0 everywhere else.
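For illustration, a hypothetical helper that builds the one-hot vector for a digit (not the talk’s code):

import torch

def one_hot_digit(d: int) -> torch.Tensor:
    # 10-dimensional vector with a 1 at position d and 0 elsewhere
    v = torch.zeros(10)
    v[d] = 1.0
    return v

one_hot_digit(3)   # tensor([0., 0., 0., 1., 0., 0., 0., 0., 0., 0.])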
Parity: Comparison
Addition: Comparison
Conclusions
Conclusions
1. We implemented ACT in two frameworks (PyTorch and TensorFlow).
2. We designed Repeat-RNN and compared it to ACT.
3. We achieved better performance than ACT with a simpler, more interpretable model.
Future work
● ICLR 2018 workshop
● Improve ACT
● Skipping samples in Repeat-RNN
Backup slides
Parity: ACT-RNN
Parity: Repeat-RNN
Parity: Comparison
Addition: ACT-LSTM
Addition: Repeat-LSTM
Addition: Ponder distribution
Addition: Comparison
