1. Zoneout: Regularizing RNNs by Randomly
Preserving Hidden Activations
Krueger et al. In CoRR 2016
Federico Raue
Reading Group at DFKI
27-September-2016
2. Content
Dropout in Feed-forward Networks
Related Work
Dropout in RNN
Stochastic Depth
Zoneout
Experiments
Sequential Permuted MNIST
Character level – Penn Treebank
Word level – Penn Treebank
Conclusions
4. Dropout in Feed-forward Networks
1
N. Srivastava et al. (2014). “Dropout: A Simple Way to Prevent Neural
Networks from Overfitting”. In: Journal of Machine Learning Research 15.
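A minimal NumPy sketch (my own illustration, not from the slides) of the inverted-dropout mask described by Srivastava et al.: during training each unit is kept with probability keep_prob and the surviving activations are rescaled, while at test time the layer is left unchanged.

import numpy as np

def dropout(h, keep_prob=0.5, train=True, rng=np.random.default_rng(0)):
    # inverted dropout: zero units with prob (1 - keep_prob), rescale the rest
    if not train:
        return h                               # no noise at test time
    mask = rng.random(h.shape) < keep_prob     # Bernoulli(keep_prob) node mask
    return h * mask / keep_prob                # keeps the expected activation unchanged

h = np.array([0.2, -1.3, 0.7, 2.1])
print(dropout(h))                              # some entries zeroed, the rest scaled by 2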
7. Dropout in RNN
Train a pseudo-ensemble model2
the source network is the parent model
each sampled model is a child model
noise process → sample node masks → extract subnetworks
2
P. Bachman et al. (2014). “Learning with pseudo-ensembles”. In:
Advances in Neural Information Processing Systems.
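To make the pseudo-ensemble view concrete, a hedged NumPy illustration (sizes and names are made up, not from Bachman et al.): the parent model owns the shared weights, each sampled node mask defines a child subnetwork, and averaging the children's outputs approximates an ensemble.

import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 4))                       # parent model: one shared weight matrix

def child_forward(x, keep_prob=0.5):
    # one child subnetwork: sample a node mask, drop those units, reuse the parent weights
    mask = rng.random(4) < keep_prob
    return np.tanh(W @ x) * mask / keep_prob

x = rng.normal(size=4)
children = [child_forward(x) for _ in range(8)]   # eight sampled child models
print(np.mean(children, axis=0))                  # pseudo-ensemble average of their outputs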
8. Dropout in RNN
Figure: First attempts at Dropout in RNNs3,4
Dropout is applied only to the feed-forward connections (up the stack), not to the recurrent connections (forward through time)
3
V. Pham et al. (2014). “Dropout improves recurrent neural networks for
handwriting recognition”. In: Frontiers in Handwriting Recognition (ICFHR),
2014 14th International Conference on. IEEE.
4
W. Zaremba et al. (2014). “Recurrent neural network regularization”. In:
arXiv preprint arXiv:1409.2329.
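A small sketch of that scheme, assuming a two-layer vanilla tanh RNN in NumPy (the cited papers use LSTMs, so this is only illustrative): dropout masks are applied where activations move up the stack, while the hidden-to-hidden transitions are left untouched.

import numpy as np

rng = np.random.default_rng(0)

def drop(h, keep=0.5):
    # per-step dropout mask on a feed-forward (vertical) connection
    return h * (rng.random(h.shape) < keep) / keep

n = 8
Wx1, Wh1 = rng.normal(size=(n, n)), rng.normal(size=(n, n))   # layer 1
Wx2, Wh2 = rng.normal(size=(n, n)), rng.normal(size=(n, n))   # layer 2

h1, h2 = np.zeros(n), np.zeros(n)
for x in rng.normal(size=(20, n)):                 # a toy sequence of 20 input vectors
    h1 = np.tanh(Wx1 @ drop(x)  + Wh1 @ h1)        # dropout on the input connection only
    h2 = np.tanh(Wx2 @ drop(h1) + Wh2 @ h2)        # dropout between layers; Wh @ h untouched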
30. Sequential Permuted MNIST (1/3)
Sequential MNIST: the pixels of an image of a digit are presented to an RNN one at a time, in lexicographic order (left to right, top to bottom)
Permuted Sequential MNIST: the pixels are presented in a (fixed) random order
Metric: classification error
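For concreteness, a short NumPy sketch of how the two variants can be built from one 28x28 image (MNIST loading omitted; the random array is a stand-in): sequential MNIST flattens the image row by row, and permuted sequential MNIST additionally applies a single fixed permutation shared by the whole dataset.

import numpy as np

image = np.random.rand(28, 28)                       # stand-in for one 28x28 MNIST digit

seq = image.reshape(-1)                              # sequential MNIST: 784 pixels, row by row
perm = np.random.default_rng(0).permutation(784)     # one fixed permutation, shared by all images
permuted_seq = seq[perm]                             # permuted sequential MNIST

# the RNN then receives one pixel per time step, e.g. permuted_seq[t] at step t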
39. Conclusions
Instead of dropping neurons out, zone them out: preserved units keep their previous hidden activation
More robust to changes in the hidden state
The identity connections introduced by zoneout improve the flow of information through the network
Future Work: adapt the probability of updating each unit based on the input sequence
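As a summary, a minimal NumPy sketch of the zoneout update for one hidden-state step (the rate 0.15 is only an illustrative value): during training each unit keeps its previous activation with probability z, otherwise it takes the ordinary update; at test time the expected value of this stochastic update is used.

import numpy as np

def zoneout_step(h_prev, h_new, z=0.15, train=True, rng=np.random.default_rng(0)):
    # with probability z a unit keeps its previous value instead of taking the new update
    if train:
        keep_old = rng.random(h_prev.shape) < z
        return np.where(keep_old, h_prev, h_new)
    return z * h_prev + (1 - z) * h_new              # test time: expected value of the update

h_prev = np.zeros(4)
h_new = np.tanh(np.random.default_rng(1).normal(size=4))
print(zoneout_step(h_prev, h_new))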
41. References I
Bachman, P. et al. (2014). “Learning with pseudo-ensembles”. In:
Advances in Neural Information Processing Systems,
pp. 3365–3373.
Gal, Y. (2015). “A theoretically grounded application of dropout in
recurrent neural networks”. In: arXiv preprint arXiv:1512.05287.
Huang, G. et al. (2016). “Deep networks with stochastic depth”.
In: arXiv preprint arXiv:1603.09382.
Krueger, D. et al. (2016). “Zoneout: Regularizing RNNs by
Randomly Preserving Hidden Activations”. In: arXiv preprint
arXiv:1606.01305.
Moon, T. et al. (2015). “RNNDrop: A novel dropout for RNNs in
ASR”. In: 2015 IEEE Workshop on Automatic Speech
Recognition and Understanding (ASRU). IEEE, pp. 65–70.
42. References II
Pham, V. et al. (2014). “Dropout improves recurrent neural
networks for handwriting recognition”. In: Frontiers in
Handwriting Recognition (ICFHR), 2014 14th International
Conference on. IEEE, pp. 285–290.
Semeniuta, S. et al. (2016). “Recurrent Dropout without Memory
Loss”. In: arXiv preprint arXiv:1603.05118.
Srivastava, N. et al. (2014). “Dropout: A Simple Way to Prevent
Neural Networks from Overfitting”. In: Journal of Machine
Learning Research 15, pp. 1929–1958.
Zaremba, W. et al. (2014). “Recurrent neural network
regularization”. In: arXiv preprint arXiv:1409.2329.