Persistent RNNs
(stashing recurrent weights on-chip)
Presenter: Gregory Diamos
Silicon Valley AI Lab
Baidu
Jun 20, 2016
Machine learning has benefited greatly from faster computer systems.
GPUs in particular have delivered a step forward.
Imagine the problems that you could solve
with even faster systems.
HPC is an opportunity
[Figure: roughly a 10,000x throughput gap between a single TitanX GPU and the fastest supercomputer]
Limits of data-parallelism
Hardware limits
[Figure: wall-clock time to convergence vs. mini-batch size, rising at small batch sizes due to inefficient hardware]
Hardware becomes less efficient at small batch sizes.
Optimization limits
[Figure: wall-clock time to convergence vs. mini-batch size, rising at large batch sizes due to inefficient optimization]
Optimization algorithms perform more work at large batch sizes.
Mini-batch limits
[Figure: wall-clock time to convergence vs. mini-batch size, rising at small batches from inefficient hardware and at large batches from inefficient optimization]
These effects combine to limit the maximum number of GPUs that data parallelism can use effectively.
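To make the combined limit concrete with illustrative numbers (these are not figures from the talk): if the hardware is only efficient at a per-GPU mini-batch of at least 64, and optimization stops scaling beyond a total mini-batch of 512, then data parallelism tops out at 512 / 64 = 8 GPUs. Driving the efficient per-GPU batch down to 4 raises that ceiling to 512 / 4 = 128 GPUs.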
Persistent RNNs
Open source CUDA implementation:
https://github.com/baidu-research/persistent-rnns
Persistent RNN Details
Persistent RNNs
[Diagram: an RNN built on GEMM calls runs one GEMM per timestep over data0..data4, reloading the weights for every call; a persistent RNN loads the weights once and streams the same inputs through them]
RNNs built on GEMM routines reload the weights each timestep.
However, the weights are constant, and this is wasteful.
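A minimal sketch of the conventional approach (this is illustrative, not the repository code; the names rnn_forward_gemm, h, batch, and T are assumptions): a vanilla RNN forward pass built on one cuBLAS GEMM per timestep. Every cublasSgemm call streams the recurrent weight matrix W in from off-chip DRAM again, even though W is identical at every timestep.

// Sketch: RNN forward pass as one GEMM per timestep (input projection
// and nonlinearity omitted for brevity).
#include <cublas_v2.h>

// Returns the buffer holding the final hidden state.
float* rnn_forward_gemm(cublasHandle_t handle,
                        const float* W,   // [h x h] recurrent weights, constant over time
                        float* h_state,   // [h x batch] hidden state (column-major)
                        float* tmp,       // [h x batch] scratch buffer
                        int h, int batch, int T) {
    const float one = 1.0f, zero = 0.0f;
    for (int t = 0; t < T; ++t) {
        // tmp = W * h_state -- W is re-read from DRAM on every iteration
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                    h, batch, h, &one, W, h, h_state, h, &zero, tmp, h);
        // swap buffers so tmp becomes the next hidden state
        float* swap = h_state; h_state = tmp; tmp = swap;
    }
    return h_state;
}

At a small batch size b, each timestep moves O(h^2) weight bytes to perform only O(h^2 * b) flops, so the GEMM becomes memory-bound on the weights.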
Cache weights in registers
[Diagram: the weights are loaded from off-chip memory into each GPU thread's registers, adjacent to the datapath, and stay there across timesteps]
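A minimal single-block sketch of the idea (illustrative, not the baidu-research kernel, which tiles the layer across many thread blocks; HIDDEN and all names here are assumptions): each thread owns one row of the recurrent weight matrix and keeps it in registers for the entire sequence, so weights are fetched from DRAM exactly once.

// Sketch: persistent RNN recurrence within a single thread block.
#define HIDDEN 128   // one thread per output unit; launch as <<<1, HIDDEN>>>

__global__ void persistent_rnn(const float* __restrict__ W,  // [HIDDEN x HIDDEN] row-major
                               const float* __restrict__ x,  // [T x HIDDEN] preprojected inputs
                               float* __restrict__ h_final,  // [HIDDEN] output hidden state
                               int T) {
    const int row = threadIdx.x;
    float w[HIDDEN];                       // this thread's weight row, register-resident
#pragma unroll
    for (int j = 0; j < HIDDEN; ++j)
        w[j] = W[row * HIDDEN + j];        // the only weight load, outside the time loop

    __shared__ float h[HIDDEN];            // hidden state, shared within the block
    h[row] = 0.0f;
    __syncthreads();

    for (int t = 0; t < T; ++t) {
        float acc = x[t * HIDDEN + row];   // input contribution for this unit
#pragma unroll
        for (int j = 0; j < HIDDEN; ++j)
            acc += w[j] * h[j];            // no weight traffic inside the time loop
        __syncthreads();                   // all reads of h[] finish before overwrite
        h[row] = fmaxf(acc, 0.0f);         // ReLU recurrence
        __syncthreads();                   // h[] fully written before the next step
    }
    h_final[row] = h[row];
}

A full-sized layer does not fit in one block's registers, so a real persistent kernel spreads the weights over every SM on the chip, which is why it needs the global barrier on the next slide.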
A global barrier
[Diagram: the GPU processes data0, synchronizes at a global barrier, then processes data1]
Because each timestep's hidden state feeds every thread at the next timestep, all thread blocks must synchronize between timesteps.
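A minimal sense-reversing global barrier sketch (illustrative names such as g_sense and g_count; the repository has its own implementation). It is only safe if every participating thread block is simultaneously resident on the GPU; otherwise blocks spin-wait on blocks that can never be scheduled.

// Sketch: software barrier across all resident thread blocks.
__device__ volatile unsigned g_sense = 0;
__device__ unsigned g_count = 0;

__device__ void global_barrier(unsigned num_blocks) {
    __syncthreads();                          // block-local arrival
    if (threadIdx.x == 0) {
        unsigned sense = g_sense;
        if (atomicAdd(&g_count, 1) == num_blocks - 1) {
            g_count = 0;                      // last block resets the counter...
            __threadfence();                  // ...makes prior writes visible...
            g_sense = sense + 1;              // ...and releases everyone
        } else {
            while (g_sense == sense) { }      // spin until released
        }
    }
    __syncthreads();                          // fan the release out to the whole block
}

Later CUDA versions (9 and up) expose grid-wide synchronization directly through cooperative groups and cooperative kernel launches.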
Experiments
Scaling to 128 GPUs
Exploring deep residual RNNs
Pascal and future
Future GPUs will enable bigger and faster RNN layers.
Three challenges
Close the gap with the fastest supercomputers.
Do not settle for inefficient algorithms.
Push performance to the edge of physical limits.
10 PetaFlops in 300 Watts.
150 ExaFlops in 25 MegaWatts.