Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations
Krueger et al. In CoRR 2016
Federico Raue
Reading Group at DFKI
27-September-2016
Content
Dropout in Feed-forward Networks
Related Work
Dropout in RNN
Stochastic Depth
Zoneout
Experiments
Sequential Permuted MNIST
Character level – Penn Treebank
Word level – Penn Treebank
Conclusions
Dropout in Feed-forward Networks
Dropout in Feed-forward Networks [1]
[1] N. Srivastava et al. (2014). “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”. In: Journal of Machine Learning Research 15.
Related Work
Dropout in RNN
Train a pseudo-ensemble model [2]
the source network is the parent model
each sampled model is a child model
noise process → sample node masks → extract subnetworks
[2] P. Bachman et al. (2014). “Learning with pseudo-ensembles”. In: Advances in Neural Information Processing Systems.
Dropout in RNN
Figure: First attempts at dropout in RNNs [3, 4]
Dropout is applied only to the feed-forward connections (up the stack), not to the recurrent connections (forward through time)
[3] V. Pham et al. (2014). “Dropout improves recurrent neural networks for handwriting recognition”. In: Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on. IEEE.
[4] W. Zaremba et al. (2014). “Recurrent neural network regularization”. In: arXiv preprint arXiv:1409.2329.
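The placement rule above (drop vertical connections, leave recurrent ones alone) can be sketched in a few lines. This is a minimal sketch, not the code of [3] or [4]: it assumes a stack of vanilla tanh RNN layers, vectors stored as plain Python lists, weight matrices as lists of rows, and a hypothetical `stacked_rnn` helper; the output layer is left out.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix given as a list of rows by a vector (list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rnn_step(W, b, x, h_prev):
    """Vanilla RNN step h_t = tanh(W [x_t, h_{t-1}] + b)."""
    z = matvec(W, x + h_prev)  # list concatenation builds [x_t, h_{t-1}]
    return [math.tanh(zi + bi) for zi, bi in zip(z, b)]


def stacked_rnn(xs, params, p, rng, training=True):
    """Run a stack of RNN layers over the input sequence xs.

    Dropout (rate p) touches only the vertical input of each layer; the
    recurrent input h_{t-1} of every layer is passed through unchanged.
    """
    hs = [[0.0] * len(b) for _, b in params]  # one zero state per layer
    outputs = []
    for x in xs:
        inp = x
        for layer, (W, b) in enumerate(params):
            if training:
                inp = [v if rng.random() >= p else 0.0 for v in inp]
            else:
                inp = [(1 - p) * v for v in inp]  # test-time scaling by (1 - p)
            hs[layer] = rnn_step(W, b, inp, hs[layer])  # h_{t-1} is never dropped
            inp = hs[layer]
        outputs.append(inp)
    return outputs


params = [
    ([[0.5, -0.2, 0.1], [0.3, 0.4, -0.1]], [0.0, 0.1]),            # 1 input -> 2 units
    ([[0.2, 0.1, 0.3, -0.4], [-0.3, 0.2, 0.1, 0.5]], [0.2, -0.1]),  # 2 inputs -> 2 units
]
outs = stacked_rnn([[1.0], [0.5], [-1.0]], params, p=0.5, rng=random.Random(0))
```

Even with p = 1 every vertical input is zeroed, yet each layer's state still evolves through its own recurrent weights, which is exactly the property these early approaches relied on.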
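Dropout in RNN
Vanilla RNN
$h_t = f(W_h[x_t, h_{t-1}] + b_h)$
Vanilla RNN + (recurrent) dropout
$h_t = f(W_h[x_t, d(h_{t-1})] + b_h)$
$d(x) = \begin{cases} \text{mask} * x & \text{if training phase} \\ (1 - p)\,x & \text{otherwise} \end{cases}$
where each entry of mask is drawn from Bernoulli(1 − p), i.e. a unit is dropped with probability p.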
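The two branches of d(x) can be checked numerically. This is a minimal sketch assuming vectors are plain Python lists and p is the drop probability. It shows that averaging many training-time outputs approaches the deterministic test-time output (1 − p)x.

```python
import random


def d(x, p, training, rng=random):
    """Dropout d(x) as defined above, for a vector given as a list.

    Training: multiply by a fresh Bernoulli mask that keeps each entry with
    probability 1 - p (so an entry is dropped with probability p).
    Test: scale deterministically by (1 - p), the mask's expected value.
    """
    if training:
        mask = [1.0 if rng.random() >= p else 0.0 for _ in x]
        return [m * v for m, v in zip(mask, x)]
    return [(1 - p) * v for v in x]


# Averaging many training-time outputs approaches the test-time output.
rng = random.Random(42)
x, p, n = [1.0, -2.0, 3.0], 0.25, 20000
avg = [0.0] * len(x)
for _ in range(n):
    for k, v in enumerate(d(x, p, True, rng)):
        avg[k] += v / n
print(avg)               # approximately [0.75, -1.5, 2.25]
print(d(x, p, False))    # [0.75, -1.5, 2.25]
```

Many frameworks instead use "inverted" dropout, which scales kept units by 1/(1 − p) during training so that no rescaling is needed at test time (PyTorch's `torch.nn.Dropout` works this way); the two formulations have the same expected activations.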
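Dropout in LSTM
$$
\begin{pmatrix} i_t \\ f_t \\ o_t \\ g_t \end{pmatrix} =
\begin{pmatrix}
\sigma(W_i[x_t, h_{t-1}] + b_i) \\
\sigma(W_f[x_t, h_{t-1}] + b_f) \\
\sigma(W_o[x_t, h_{t-1}] + b_o) \\
f(W_g[x_t, h_{t-1}] + b_g)
\end{pmatrix}
$$
$c_t = f_t * c_{t-1} + i_t * g_t$
$h_t = o_t * f(c_t)$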
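For reference in the dropout variants that follow, the plain LSTM step above can be written out directly. This is a minimal sketch under stated assumptions: f = tanh, vectors are plain Python lists, and a hypothetical `params` dict maps each gate name to a `(W, b)` pair whose W acts on the concatenation [x_t, h_{t-1}].

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Compute W v + b for a matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(params, x, h_prev, c_prev):
    """One LSTM step following the equations above, with f = tanh.

    params maps the gate names "i", "f", "o", "g" to (W, b) pairs; each W
    acts on the concatenation [x_t, h_{t-1}].
    """
    xh = x + h_prev
    i = [sigmoid(z) for z in affine(*params["i"], xh)]
    f = [sigmoid(z) for z in affine(*params["f"], xh)]
    o = [sigmoid(z) for z in affine(*params["o"], xh)]
    g = [math.tanh(z) for z in affine(*params["g"], xh)]
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c


# With all-zero weights every gate is sigmoid(0) = 0.5 and g_t = tanh(0) = 0,
# so the cell simply halves: c_t = 0.5 * c_{t-1}.
zero = {k: ([[0.0, 0.0]], [0.0]) for k in "ifog"}  # 1 input, 1 hidden unit
h, c = lstm_step(zero, [1.0], [0.0], [1.0])
print(c)  # [0.5]
```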
Dropout in LSTM [5]
$$
\begin{pmatrix} i_t \\ f_t \\ o_t \\ g_t \end{pmatrix} =
\begin{pmatrix}
\sigma(W_i[x_t, d(h_{t-1})] + b_i) \\
\sigma(W_f[x_t, d(h_{t-1})] + b_f) \\
\sigma(W_o[x_t, d(h_{t-1})] + b_o) \\
f(W_g[x_t, d(h_{t-1})] + b_g)
\end{pmatrix}
$$
$c_t = f_t * c_{t-1} + i_t * g_t$
$h_t = o_t * f(c_t)$
[5] Y. Gal (2015). “A theoretically grounded application of dropout in recurrent neural networks”. In: arXiv preprint arXiv:1512.05287.
Dropout in LSTM [6]
$$
\begin{pmatrix} i_t \\ f_t \\ o_t \\ g_t \end{pmatrix} =
\begin{pmatrix}
\sigma(W_i[x_t, h_{t-1}] + b_i) \\
\sigma(W_f[x_t, h_{t-1}] + b_f) \\
\sigma(W_o[x_t, h_{t-1}] + b_o) \\
f(W_g[x_t, h_{t-1}] + b_g)
\end{pmatrix}
$$
$c_t = d(f_t * c_{t-1} + i_t * g_t)$
$h_t = o_t * f(c_t)$
[6] T. Moon et al. (2015). “RNNDrop: A novel dropout for RNNs in ASR”. In: 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU). IEEE.
Dropout in LSTM [7]
$$
\begin{pmatrix} i_t \\ f_t \\ o_t \\ g_t \end{pmatrix} =
\begin{pmatrix}
\sigma(W_i[x_t, h_{t-1}] + b_i) \\
\sigma(W_f[x_t, h_{t-1}] + b_f) \\
\sigma(W_o[x_t, h_{t-1}] + b_o) \\
f(W_g[x_t, h_{t-1}] + b_g)
\end{pmatrix}
$$
$c_t = f_t * c_{t-1} + i_t * d(g_t)$
$h_t = o_t * f(c_t)$
[7] S. Semeniuta et al. (2016). “Recurrent Dropout without Memory Loss”. In: arXiv preprint arXiv:1603.05118.
Dropout in LSTM – Summary
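The three placements [5, 6, 7] differ only in where the mask enters the LSTM step, which a single function can make explicit. This is a minimal training-time sketch, not any author's implementation: it assumes f = tanh, plain Python lists, a hypothetical `params` dict of `(W, b)` pairs, and one 0/1 mask entry per hidden unit, and it omits test-time scaling. Gal [5] reuses one mask for the whole sequence, which is the default here; `per_step` is a hypothetical switch for resampling the mask at every step.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Compute W v + b for a matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def apply_mask(mask, v):
    return [m * x for m, x in zip(mask, v)]


def lstm_step_dropout(params, x, h_prev, c_prev, variant, mask):
    """Training-time LSTM step with one of the three dropout placements.

    variant "gal":       gates read d(h_{t-1})                    [5]
    variant "moon":      c_t = d(f_t * c_{t-1} + i_t * g_t)       [6]
    variant "semeniuta": c_t = f_t * c_{t-1} + i_t * d(g_t)       [7]
    mask holds one 0/1 entry per hidden unit.
    """
    h_in = apply_mask(mask, h_prev) if variant == "gal" else h_prev
    xh = x + h_in
    i = [sigmoid(z) for z in affine(*params["i"], xh)]
    f = [sigmoid(z) for z in affine(*params["f"], xh)]
    o = [sigmoid(z) for z in affine(*params["o"], xh)]
    g = [math.tanh(z) for z in affine(*params["g"], xh)]
    if variant == "semeniuta":
        g = apply_mask(mask, g)          # drop only the candidate update
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    if variant == "moon":
        c = apply_mask(mask, c)          # drop the whole new cell state
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c


def run_sequence(params, xs, n_hidden, variant, p, rng, per_step=False):
    """Run the LSTM over xs; by default one mask is reused for the whole
    sequence, and per_step=True resamples it at every time step."""
    def sample():
        return [1.0 if rng.random() >= p else 0.0 for _ in range(n_hidden)]

    h, c = [0.0] * n_hidden, [0.0] * n_hidden
    mask = sample()
    for x in xs:
        if per_step:
            mask = sample()
        h, c = lstm_step_dropout(params, x, h, c, variant, mask)
    return h, c
```

Only the "moon" variant can wipe out the stored cell state c_{t-1}; the other two leave the path f_t * c_{t-1} untouched, which is the "without memory loss" point of [7].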
Stochastic Depth
Stochastic Depth [8]
[8] G. Huang et al. (2016). “Deep networks with stochastic depth”. In: arXiv preprint arXiv:1603.09382.
Zoneout
Zoneout [9]
Recurrent dropout, for comparison:
$h_t = f(W_h[x_t, d(h_{t-1})] + b_h)$
$d(x) = \begin{cases} \text{mask} * x & \text{if training phase} \\ (1 - p)\,x & \text{otherwise} \end{cases}$
Write one RNN step as an operator $\tau_t$ on the hidden state, with $\tilde{\tau}_t$ the regular update and $p_t$ a per-unit binary mask:
Dropout: $\tau_t = p_t * \tilde{\tau}_t + (1 - p_t) * 0$
Zoneout: $\tau_t = p_t * \tilde{\tau}_t + (1 - p_t) * 1$
Where the mask is 0, dropout sets the unit to zero, whereas zoneout applies the identity (1) and the unit keeps its previous value.
[9] D. Krueger et al. (2016). “Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations”. In: arXiv preprint arXiv:1606.01305.
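The only difference between the two masking rules is what a masked unit falls back to: 0 for dropout, its previous value for zoneout. This is a minimal sketch assuming plain Python lists and the mask convention of the τ formulas (1 = apply the regular update). The helper names are hypothetical, and using the expected mask at test time is an assumption made by analogy with the (1 − p)x rule for dropout.

```python
import random


def masked_update(h_prev, h_candidate, update_mask, mode):
    """Apply a per-unit binary mask p_t to a hidden-state update.

    update_mask[k] == 1: unit k takes the regular update h_candidate[k].
    update_mask[k] == 0: "dropout" sets unit k to 0, while "zoneout" applies
    the identity and keeps the previous value h_prev[k].
    """
    out = []
    for hp, hc, m in zip(h_prev, h_candidate, update_mask):
        fallback = 0.0 if mode == "dropout" else hp
        out.append(m * hc + (1 - m) * fallback)
    return out


def zoneout_train(h_prev, h_candidate, z, rng):
    """Training: each unit is zoned out (keeps h_prev) with probability z."""
    mask = [0 if rng.random() < z else 1 for _ in h_prev]
    return masked_update(h_prev, h_candidate, mask, "zoneout")


def zoneout_test(h_prev, h_candidate, z):
    """Test (assumed): use the expectation of the mask instead of a sample."""
    return [z * hp + (1 - z) * hc for hp, hc in zip(h_prev, h_candidate)]


print(masked_update([1.0, 2.0], [5.0, 6.0], [1, 0], "dropout"))  # [5.0, 0.0]
print(masked_update([1.0, 2.0], [5.0, 6.0], [1, 0], "zoneout"))  # [5.0, 2.0]
```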
Zoneout
Figure: Zoneout vs Recurrent Dropout
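LSTM equations (recap)
$$
\begin{pmatrix} i_t \\ f_t \\ o_t \\ g_t \end{pmatrix} =
\begin{pmatrix}
\sigma(W_i[x_t, h_{t-1}] + b_i) \\
\sigma(W_f[x_t, h_{t-1}] + b_f) \\
\sigma(W_o[x_t, h_{t-1}] + b_o) \\
f(W_g[x_t, h_{t-1}] + b_g)
\end{pmatrix}
$$
$c_t = f_t * c_{t-1} + i_t * g_t$
$h_t = o_t * f(c_t)$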
LSTM equations – Zoneout
$$
\begin{pmatrix} i_t \\ f_t \\ o_t \\ g_t \end{pmatrix} =
\begin{pmatrix}
\sigma(W_i[x_t, h_{t-1}] + b_i) \\
\sigma(W_f[x_t, h_{t-1}] + b_f) \\
\sigma(W_o[x_t, h_{t-1}] + b_o) \\
f(W_g[x_t, h_{t-1}] + b_g)
\end{pmatrix}
$$
$c_t = p_t * c_{t-1} + (1 - p_t) * (f_t * c_{t-1} + i_t * g_t)$
$h_t = p_t * h_{t-1} + (1 - p_t) * (o_t * f(c_t))$
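These two equations amount to an ordinary LSTM step followed by a per-unit mix between the old and new values. This is a minimal training-time sketch assuming f = tanh, plain Python lists, and a hypothetical `params` dict of `(W, b)` pairs. Note that here p_t = 1 means "keep the previous value", the opposite of the τ convention on the previous slide. The sketch takes separate masks for c and h; passing the same list for both reproduces the slide.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Compute W v + b for a matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def mix(mask, old, new):
    """mask * old + (1 - mask) * new, entry by entry (mask 1 = zone out)."""
    return [m * a + (1 - m) * b for m, a, b in zip(mask, old, new)]


def lstm_zoneout_step(params, x, h_prev, c_prev, p_c, p_h):
    """Training-time LSTM step with zoneout on both c_t and h_t.

    p_c and p_h are binary masks (1 = keep the previous value).
    """
    xh = x + h_prev
    i = [sigmoid(z) for z in affine(*params["i"], xh)]
    f = [sigmoid(z) for z in affine(*params["f"], xh)]
    o = [sigmoid(z) for z in affine(*params["o"], xh)]
    g = [math.tanh(z) for z in affine(*params["g"], xh)]
    c_new = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    c = mix(p_c, c_prev, c_new)          # zoneout on the cell
    h_new = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    h = mix(p_h, h_prev, h_new)          # zoneout on the hidden state
    return h, c
```

A unit whose mask is 1 passes its previous value through unchanged, an identity connection through time, which is the mechanism the conclusions credit for better information flow.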
Zoneout + Recurrent Dropout
$$
\begin{pmatrix} i_t \\ f_t \\ o_t \\ g_t \end{pmatrix} =
\begin{pmatrix}
\sigma(W_i[x_t, h_{t-1}] + b_i) \\
\sigma(W_f[x_t, h_{t-1}] + b_f) \\
\sigma(W_o[x_t, h_{t-1}] + b_o) \\
f(W_g[x_t, h_{t-1}] + b_g)
\end{pmatrix}
$$
$c_t = f_t * c_{t-1} + d(i_t * g_t)$ (recurrent dropout)
$h_t = ((1 - p_t) * o_t + p_t * o_{t-1}) * f(c_t)$ (zoneout)
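In the combined form, recurrent dropout masks the cell update, and zoneout acts on the output gate, which now has to be carried between steps. This is a minimal training-time sketch assuming f = tanh, plain Python lists, and a hypothetical `params` dict of `(W, b)` pairs. It also assumes that the gate actually used at step t (after mixing) is what the next step sees as o_{t-1}. The slide does not pin this down, so treat it as one possible reading.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Compute W v + b for a matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_zoneout_rd_step(params, x, h_prev, c_prev, o_prev, drop_mask, zone_mask):
    """Training-time step for the combined equations above.

    drop_mask: 0/1 per unit, recurrent dropout d(.) on the update i_t * g_t.
    zone_mask: 0/1 per unit; 1 reuses the previous output gate o_{t-1}.
    Returns (h_t, c_t, o_used); o_used is fed back as o_{t-1} (assumption).
    """
    xh = x + h_prev
    i = [sigmoid(z) for z in affine(*params["i"], xh)]
    f = [sigmoid(z) for z in affine(*params["f"], xh)]
    o = [sigmoid(z) for z in affine(*params["o"], xh)]
    g = [math.tanh(z) for z in affine(*params["g"], xh)]
    c = [ft * cp + dm * it * gt
         for ft, cp, dm, it, gt in zip(f, c_prev, drop_mask, i, g)]
    o_used = [(1 - zm) * ot + zm * op for zm, ot, op in zip(zone_mask, o, o_prev)]
    h = [om * math.tanh(ct) for om, ct in zip(o_used, c)]
    return h, c, o_used
```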
Experiments
Sequential Permuted MNIST (1/3)
Sequential MNIST: the pixels of an image of a handwritten digit are presented to an RNN one at a time, in lexicographic order (left to right, top to bottom)
Permuted sequential MNIST: the pixels are presented in a (fixed) random order
Classification error
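The two input orderings described above can be produced as follows. This is a minimal sketch assuming an image given as a list of pixel rows; `to_pixel_sequence` and `fixed_permutation` are hypothetical helper names, and the seed is arbitrary. The essential point is that one permutation is drawn once and reused for every image.

```python
import random


def to_pixel_sequence(image, permutation=None):
    """Flatten a 2-D image row by row (left to right, top to bottom).

    If a permutation is given, reorder the pixels with it; the same fixed
    permutation must be used for every image (training and test).
    """
    seq = [px for row in image for px in row]
    if permutation is not None:
        seq = [seq[k] for k in permutation]
    return seq


def fixed_permutation(n_pixels, seed):
    """One random pixel order, reproducible from the seed."""
    perm = list(range(n_pixels))
    random.Random(seed).shuffle(perm)
    return perm


image = [[1, 2], [3, 4]]                 # toy 2x2 "image"; MNIST is 28x28
print(to_pixel_sequence(image))          # [1, 2, 3, 4]
perm = fixed_permutation(28 * 28, seed=0)
```

Each 28x28 digit thus becomes a 784-step sequence. The permutation destroys local pixel structure, so the task stresses long-range dependencies.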
Sequential Permuted MNIST (2/3)
Sequential Permuted MNIST (3/3)
Penn Treebank Corpus
Character level – Penn Treebank (1/2)
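$\text{BPC} = -\log_2 P(x_{t+1} \mid y_t)$, averaged over all time steps
$x_{t+1}$: the correct next symbol
$y_t$: the output (predicted distribution) of the model at step $t$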
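As a worked example of the metric, assume the model's probability for the correct next character has already been extracted at every step; `bits_per_character` is a hypothetical helper name. BPC is then the mean of −log2 of those probabilities.

```python
import math


def bits_per_character(p_correct):
    """Mean of -log2 P(x_{t+1} | y_t) over a sequence.

    p_correct[t] is the probability the model assigned to the character
    that actually came next.
    """
    return -sum(math.log2(p) for p in p_correct) / len(p_correct)


print(bits_per_character([0.5, 0.25, 1.0]))  # 1.0
```

Here the three steps cost 1, 2 and 0 bits, so the average is 1 bit per character. A model that spreads probability uniformly over 50 symbols would score log2(50), about 5.6 BPC.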
Character level – Penn Treebank (2/2)
Word level – Penn Treebank (1/2)
$\text{Perplexity} = 2^{H(p)} = 2^{-\sum_x p(x) \log_2 p(x)}$
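The formula above is the perplexity of a distribution. For a language model it is evaluated on held-out text, where the exponent becomes the model's average −log2 probability of each correct word (a cross-entropy). This is a minimal sketch of both forms; the function names are hypothetical.

```python
import math


def perplexity_of_distribution(p):
    """2 ** H(p) for a discrete distribution given as a list of probabilities."""
    entropy = -sum(px * math.log2(px) for px in p if px > 0)
    return 2 ** entropy


def corpus_perplexity(p_correct):
    """2 ** (mean of -log2 q(w_t)), q(w_t) = model probability of the true word."""
    cross_entropy = -sum(math.log2(q) for q in p_correct) / len(p_correct)
    return 2 ** cross_entropy


print(perplexity_of_distribution([0.25] * 4))  # 4.0
print(corpus_perplexity([0.25] * 4))           # 4.0
```

A uniform choice among 4 words gives a perplexity of 4 under either reading: the model is "as confused as" a fair 4-way guess.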
Word level – Penn Treebank (2/2)
Conclusions
Conclusions
Instead of dropping units out (setting them to zero), zoneout keeps them at their previous values
More robust to changes in the hidden state
The identity connections of zoneout improve the flow of information through the network
Future work: adapt the probabilities of updating individual units based on the input sequence
References
Bachman, P. et al. (2014). “Learning with pseudo-ensembles”. In: Advances in Neural Information Processing Systems, pp. 3365–3373.
Gal, Y. (2015). “A theoretically grounded application of dropout in recurrent neural networks”. In: arXiv preprint arXiv:1512.05287.
Huang, G. et al. (2016). “Deep networks with stochastic depth”. In: arXiv preprint arXiv:1603.09382.
Krueger, D. et al. (2016). “Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations”. In: arXiv preprint arXiv:1606.01305.
Moon, T. et al. (2015). “RNNDrop: A novel dropout for RNNs in ASR”. In: 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU). IEEE, pp. 65–70.
Pham, V. et al. (2014). “Dropout improves recurrent neural networks for handwriting recognition”. In: Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on. IEEE, pp. 285–290.
Semeniuta, S. et al. (2016). “Recurrent Dropout without Memory Loss”. In: arXiv preprint arXiv:1603.05118.
Srivastava, N. et al. (2014). “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”. In: Journal of Machine Learning Research 15, pp. 1929–1958.
Zaremba, W. et al. (2014). “Recurrent neural network regularization”. In: arXiv preprint arXiv:1409.2329.

Daily Lesson Plan in Mathematics Quarter 4
 

Zoneout

  • 1. Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations. Krueger et al., CoRR 2016. Federico Raue, Reading Group at DFKI, 27 September 2016
  • 2. Content: Dropout in Feed-forward Networks; Related Work (Dropout in RNN, Stochastic Depth); Zoneout; Experiments (Sequential Permuted MNIST, Character level – Penn Treebank, Word level – Penn Treebank); Conclusions
  • 4. Dropout in Feed-forward Networks [1]. [1] N. Srivastava et al. (2014). “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”. In: Journal of Machine Learning Research 15.
  • 7. Dropout in RNN: train a pseudo-ensemble model [2]. The source network is the parent model, and each sampled model is a child model: noise process → sampling node masks → extract subnetworks. [2] P. Bachman et al. (2014). “Learning with pseudo-ensembles”. In: Advances in Neural Information Processing Systems.
  • 10. Dropout in RNN. Figure: First attempts at dropout in RNNs [3][4]. Dropout is applied only to the feed-forward connections (up the stack), not to the recurrent connections (forward through time). [3] V. Pham et al. (2014). “Dropout improves recurrent neural networks for handwriting recognition”. In: Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on. IEEE. [4] W. Zaremba et al. (2014). “Recurrent neural network regularization”. In: arXiv preprint arXiv:1409.2329.
  • 12. Dropout in RNN. Vanilla RNN: $h_t = f(W_h[x_t, h_{t-1}] + b_h)$. Vanilla RNN + (recurrent) dropout: $h_t = f(W_h[x_t, d(h_{t-1})] + b_h)$, with $d(x) = \begin{cases} \text{mask} * x & \text{in the training phase} \\ (1 - p)\,x & \text{otherwise} \end{cases}$
  • 13. Dropout in LSTM. $\begin{pmatrix} i_t \\ f_t \\ o_t \\ g_t \end{pmatrix} = \begin{pmatrix} \sigma(W_i[x_t, h_{t-1}] + b_i) \\ \sigma(W_f[x_t, h_{t-1}] + b_f) \\ \sigma(W_o[x_t, h_{t-1}] + b_o) \\ f(W_g[x_t, h_{t-1}] + b_g) \end{pmatrix}$, $c_t = f_t * c_{t-1} + i_t * g_t$, $h_t = o_t * f(c_t)$
  • 15. Dropout in LSTM [5]. $\begin{pmatrix} i_t \\ f_t \\ o_t \\ g_t \end{pmatrix} = \begin{pmatrix} \sigma(W_i[x_t, d(h_{t-1})] + b_i) \\ \sigma(W_f[x_t, d(h_{t-1})] + b_f) \\ \sigma(W_o[x_t, d(h_{t-1})] + b_o) \\ f(W_g[x_t, d(h_{t-1})] + b_g) \end{pmatrix}$, $c_t = f_t * c_{t-1} + i_t * g_t$, $h_t = o_t * f(c_t)$. [5] Y. Gal (2015). “A theoretically grounded application of dropout in recurrent neural networks”. In: arXiv preprint arXiv:1512.05287.
  • 16. Dropout in LSTM [6]. $\begin{pmatrix} i_t \\ f_t \\ o_t \\ g_t \end{pmatrix} = \begin{pmatrix} \sigma(W_i[x_t, h_{t-1}] + b_i) \\ \sigma(W_f[x_t, h_{t-1}] + b_f) \\ \sigma(W_o[x_t, h_{t-1}] + b_o) \\ f(W_g[x_t, h_{t-1}] + b_g) \end{pmatrix}$, $c_t = d(f_t * c_{t-1} + i_t * g_t)$, $h_t = o_t * f(c_t)$. [6] T. Moon et al. (2015). “Rnndrop: A novel dropout for rnns in asr”. In: 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU). IEEE.
  • 17. Dropout in LSTM [7]. $\begin{pmatrix} i_t \\ f_t \\ o_t \\ g_t \end{pmatrix} = \begin{pmatrix} \sigma(W_i[x_t, h_{t-1}] + b_i) \\ \sigma(W_f[x_t, h_{t-1}] + b_f) \\ \sigma(W_o[x_t, h_{t-1}] + b_o) \\ f(W_g[x_t, h_{t-1}] + b_g) \end{pmatrix}$, $c_t = f_t * c_{t-1} + i_t * d(g_t)$, $h_t = o_t * f(c_t)$. [7] S. Semeniuta et al. (2016). “Recurrent Dropout without Memory Loss”. In: arXiv preprint arXiv:1603.05118.
  • 18. Dropout in LSTM – Summary
  • 20. Stochastic Depth [8]. [8] G. Huang et al. (2016). “Deep networks with stochastic depth”. In: arXiv preprint arXiv:1603.09382.
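  • 24. Zoneout [9]. Recurrent dropout: $h_t = f(W_h[x_t, d(h_{t-1})] + b_h)$, with $d(x) = \begin{cases} \text{mask} * x & \text{in the training phase} \\ (1 - p)\,x & \text{otherwise} \end{cases}$. Viewing each update as a transition operator $\tilde{\tau}_t$ (with 0 the zero map and 1 the identity): Dropout: $\tau_t = p_t * \tilde{\tau}_t + (1 - p_t) * 0$. Zoneout: $\tau_t = p_t * \tilde{\tau}_t + (1 - p_t) * 1$. [9] D. Krueger et al. (2016). “Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations”. In: arXiv preprint arXiv:1606.01305.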
  • 25. Zoneout Figure: Zoneout vs Recurrent Dropout
  • 26. Again – LSTM equations. $\begin{pmatrix} i_t \\ f_t \\ o_t \\ g_t \end{pmatrix} = \begin{pmatrix} \sigma(W_i[x_t, h_{t-1}] + b_i) \\ \sigma(W_f[x_t, h_{t-1}] + b_f) \\ \sigma(W_o[x_t, h_{t-1}] + b_o) \\ f(W_g[x_t, h_{t-1}] + b_g) \end{pmatrix}$, $c_t = f_t * c_{t-1} + i_t * g_t$, $h_t = o_t * f(c_t)$
  • 27. LSTM equations – Zoneout. $\begin{pmatrix} i_t \\ f_t \\ o_t \\ g_t \end{pmatrix} = \begin{pmatrix} \sigma(W_i[x_t, h_{t-1}] + b_i) \\ \sigma(W_f[x_t, h_{t-1}] + b_f) \\ \sigma(W_o[x_t, h_{t-1}] + b_o) \\ f(W_g[x_t, h_{t-1}] + b_g) \end{pmatrix}$, $c_t = p_t * c_{t-1} + (1 - p_t) * (f_t * c_{t-1} + i_t * g_t)$, $h_t = p_t * h_{t-1} + (1 - p_t) * (o_t * f(c_t))$
  • 28. Zoneout + Recurrent Dropout. $\begin{pmatrix} i_t \\ f_t \\ o_t \\ g_t \end{pmatrix} = \begin{pmatrix} \sigma(W_i[x_t, h_{t-1}] + b_i) \\ \sigma(W_f[x_t, h_{t-1}] + b_f) \\ \sigma(W_o[x_t, h_{t-1}] + b_o) \\ f(W_g[x_t, h_{t-1}] + b_g) \end{pmatrix}$, $c_t = f_t * c_{t-1} + d(i_t * g_t)$ (recurrent dropout), $h_t = ((1 - p_t) * o_t + p_t * o_{t-1}) * f(c_t)$ (zoneout)
  • 30. Sequential Permuted MNIST (1/3). Sequential MNIST: the pixels of an image of a handwritten digit are presented to an RNN one at a time, in lexicographic order (left to right, top to bottom). Permuted sequential MNIST: the pixels are presented in a fixed random order. Results: classification error.
  • 34. Character level – Penn Treebank (1/2). $\mathrm{BPC} = -\log_2 P(x_{t+1} \mid y_t)$, averaged over all characters, where $x_{t+1}$ is the correct next symbol and $y_t$ is the output of the model.
  • 35. Character level – Penn Treebank (2/2). $\mathrm{BPC} = -\log_2 P(x_{t+1} \mid y_t)$
  • 36. Word level – Penn Treebank (1/2). $\text{Perplexity} = 2^{H(p)} = 2^{-\sum_x p(x) \log_2 p(x)}$
  • 37. Word level – Penn Treebank (2/2)
  • 40. Conclusions. Instead of dropping units out (setting them to zero), zoneout randomly preserves their previous values. The resulting models are more robust to changes in the hidden state. The identity connections introduced by zoneout improve the flow of information through the network. Future work: adapt the probabilities of updating the various units based on the input sequence.
  • 41. References I Bachman, P. et al. (2014). “Learning with pseudo-ensembles”. In: Advances in Neural Information Processing Systems, pp. 3365–3373. Gal, Y. (2015). “A theoretically grounded application of dropout in recurrent neural networks”. In: arXiv preprint arXiv:1512.05287. Huang, G. et al. (2016). “Deep networks with stochastic depth”. In: arXiv preprint arXiv:1603.09382. Krueger, D. et al. (2016). “Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations”. In: arXiv preprint arXiv:1606.01305. Moon, T. et al. (2015). “Rnndrop: A novel dropout for rnns in asr”. In: 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU). IEEE, pp. 65–70.
  • 42. References II Pham, V. et al. (2014). “Dropout improves recurrent neural networks for handwriting recognition”. In: Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on. IEEE, pp. 285–290. Semeniuta, S. et al. (2016). “Recurrent Dropout without Memory Loss”. In: arXiv preprint arXiv:1603.05118. Srivastava, N. et al. (2014). “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”. In: Journal of Machine Learning Research 15, pp. 1929–1958. Zaremba, W. et al. (2014). “Recurrent neural network regularization”. In: arXiv preprint arXiv:1409.2329.
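Slide 4 introduces dropout for feed-forward networks, and slide 12 fixes the convention d(x) = mask * x during training and (1 - p)x otherwise. A minimal NumPy sketch of that convention, assuming p is the probability of dropping a unit (so each mask entry is 1 with probability 1 - p); the function name `dropout` and its arguments are illustrative, not taken from the slides:

```python
import numpy as np


def dropout(x, p, training, rng=None):
    """Dropout with the slides' convention: p is the drop probability.

    Training: multiply by a fresh 0/1 mask whose entries are 1 with probability 1 - p.
    Otherwise: scale by (1 - p) so activations match their training-time expectation.
    """
    if not training:
        return (1.0 - p) * x
    rng = np.random.default_rng() if rng is None else rng
    mask = (rng.random(x.shape) >= p).astype(x.dtype)  # 1 = keep, 0 = drop
    return mask * x


rng = np.random.default_rng(0)
x = np.ones(8)
print(dropout(x, p=0.5, training=True, rng=rng))  # a random subset of entries is zeroed
print(dropout(x, p=0.5, training=False))          # every entry becomes 0.5
```

Many frameworks use "inverted" dropout instead, dividing by (1 - p) during training so that inference needs no rescaling (PyTorch's `torch.nn.Dropout` behaves this way); both versions agree in expectation.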
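Slide 7 describes dropout as training a pseudo-ensemble in which every sampled node mask extracts a child subnetwork of the parent. The sketch below uses a toy parent made of a single linear layer (a hypothetical choice, picked because the expectation over children is then exact) and checks that averaging many children approaches the parent evaluated on (1 - p)-scaled inputs, which is the test-time rule from slide 12. The drop probability p = 0.5 is an assumed value:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5                          # drop probability (assumed value)
W = rng.normal(size=(3, 5))      # toy parent model: a single linear layer
x = rng.normal(size=5)           # one input vector

# Noise process: sample 20000 node masks; each mask extracts one child subnetwork
# (the parent with the masked-out input units removed).
masks = (rng.random((20000, 5)) >= p).astype(float)
child_outputs = (masks * x) @ W.T          # row k = output of child k, shape (20000, 3)
ensemble_mean = child_outputs.mean(axis=0)

# Parent evaluated with inputs scaled by (1 - p): the exact expectation over children.
parent_output = W @ ((1 - p) * x)
print(np.abs(ensemble_mean - parent_output).max())  # small Monte Carlo error
```

For deeper, nonlinear networks the (1 - p) scaling is only an approximation of this ensemble average.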
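Slide 10 restricts dropout to the feed-forward connections of a stacked RNN and leaves the recurrent connections untouched. A sketch of a multi-layer RNN in that style, assuming f = tanh, a fresh mask at every use, and dropout on each layer's input and on the final output; the weight layout and function names are hypothetical:

```python
import numpy as np


def dropout(x, p, training, rng):
    # Same convention as the slides: mask * x when training, (1 - p) * x otherwise.
    if not training:
        return (1.0 - p) * x
    return (rng.random(x.shape) >= p) * x


def stacked_rnn_forward(xs, Ws, bs, p, training, rng):
    """Multi-layer tanh RNN with dropout only on non-recurrent connections.

    xs: (T, D) inputs. Ws[l]: (H, in_l + H) weights acting on [input, h_{t-1}].
    """
    H = Ws[0].shape[0]
    h = [np.zeros(H) for _ in Ws]                  # one hidden state per layer
    outputs = []
    for x_t in xs:
        inp = x_t
        for l, (W, b) in enumerate(zip(Ws, bs)):
            inp = dropout(inp, p, training, rng)   # vertical connection: dropped
            h[l] = np.tanh(W @ np.concatenate([inp, h[l]]) + b)  # h_{t-1}: never dropped
            inp = h[l]
        outputs.append(dropout(inp, p, training, rng))  # connection to the output layer
    return np.stack(outputs)
```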
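Slide 12 moves the dropout operator onto the recurrent input, $h_t = f(W_h[x_t, d(h_{t-1})] + b_h)$. A single-layer sketch over a whole sequence, assuming f = tanh and a new mask at every time step; the function name is hypothetical:

```python
import numpy as np


def rnn_recurrent_dropout(xs, W_h, b_h, p, training, rng):
    """h_t = tanh(W_h [x_t, d(h_{t-1})] + b_h) over a sequence xs of shape (T, D)."""
    H = W_h.shape[0]
    h = np.zeros(H)
    states = []
    for x_t in xs:
        if training:
            h_in = (rng.random(H) >= p) * h        # d(h_{t-1}) = mask * h_{t-1}
        else:
            h_in = (1.0 - p) * h                   # d(h_{t-1}) = (1 - p) h_{t-1}
        h = np.tanh(W_h @ np.concatenate([x_t, h_in]) + b_h)
        states.append(h)
    return np.stack(states)
```

Because a new mask hits $h_{t-1}$ at every step, information carried in the hidden state can be erased again and again along the sequence; the LSTM variants on slides 15 to 17 differ in where the operator d is placed.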
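Slides 15 to 17 place the dropout operator d at three different points of the LSTM. A training-mode sketch of one LSTM step covering all three, assuming f = tanh, the four gate matrices stacked into one W, $h_{t-1}$ as the recurrent input, and d(v) = mask * v with the mask supplied by the caller (so it can be resampled per step or shared across a sequence, as each paper prescribes). The function name and the `variant` strings are hypothetical:

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b, variant=None, mask=None):
    """One training-mode LSTM step with an optional dropout placement.

    W: (4H, D + H) rows for i, f, o, g stacked; b: (4H,).
    variant: None, "gal", "moon" or "semeniuta"; mask: (H,) array of 0/1.
    """
    H = h_prev.shape[0]

    def d(v):
        # Dropout operator in training mode; identity when no variant is chosen.
        return v if variant is None else mask * v

    h_in = d(h_prev) if variant == "gal" else h_prev  # Gal: d(h_{t-1}) in every gate
    z = W @ np.concatenate([x_t, h_in]) + b
    i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
    g = np.tanh(z[3 * H:])
    if variant == "moon":
        c = d(f * c_prev + i * g)      # Moon et al.: drop the whole new cell state
    elif variant == "semeniuta":
        c = f * c_prev + i * d(g)      # Semeniuta et al.: drop only the candidate g_t
    else:
        c = f * c_prev + i * g         # plain LSTM update (also used by "gal")
    h = o * np.tanh(c)
    return h, c
```

With `mask` all zeros, the "moon" variant wipes the cell state entirely, while "semeniuta" still keeps $f_t * c_{t-1}$, which is the memory-preserving property its title refers to.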
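Slide 20 only names stochastic depth [8]. As described by Huang et al., each residual block is bypassed (replaced by the identity) with some probability during training, and at test time every block is kept with its residual branch scaled by its survival probability; it is related to zoneout in that both replace a transformation with the identity at random. A sketch with a hypothetical one-layer ReLU residual branch:

```python
import numpy as np


def residual_branch(h, W):
    # Hypothetical residual function f_l: one linear layer followed by ReLU.
    return np.maximum(W @ h, 0.0)


def stochastic_depth_forward(h, Ws, survival, training, rng):
    """Residual network whose blocks are randomly bypassed during training.

    survival[l]: probability that block l is active while training.
    """
    for W, p_l in zip(Ws, survival):
        if training:
            b_l = 1.0 if rng.random() < p_l else 0.0     # one Bernoulli draw per block
            branch = b_l * residual_branch(h, W)
        else:
            branch = p_l * residual_branch(h, W)         # test: expected contribution
        h = np.maximum(branch + h, 0.0)                  # ReLU(b_l * f_l(h) + h)
    return h
```

A skipped block reduces to ReLU(h), which is exactly the identity whenever h is already non-negative (as it is after any earlier block).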
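Slide 24 contrasts the two methods through the transition operator: where the mask $p_t$ is 1 both apply the usual update, but where it is 0 dropout outputs 0 while zoneout applies the identity and keeps the previous value. A per-unit sketch of that contrast; the function and argument names are hypothetical:

```python
import numpy as np


def stochastic_transition(h_prev, h_new, p_t, mode):
    """Mix the ordinary update h_new with 0 (dropout) or with h_prev (zoneout).

    p_t: 0/1 array, 1 = apply the usual transition for that unit.
    """
    if mode == "dropout":
        return p_t * h_new + (1 - p_t) * 0.0
    if mode == "zoneout":
        return p_t * h_new + (1 - p_t) * h_prev   # identity: unit keeps its old value
    raise ValueError(f"unknown mode: {mode}")


h_prev = np.array([0.1, 0.2, 0.3, 0.4])
h_new = np.array([0.9, 0.8, 0.7, 0.6])
p_t = np.array([1, 0, 1, 0])
dropped = stochastic_transition(h_prev, h_new, p_t, "dropout")  # units 2 and 4 become 0
zoned = stochastic_transition(h_prev, h_new, p_t, "zoneout")    # units 2 and 4 keep h_prev
```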
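Slide 27 applies zoneout to both the cell and the hidden state of an LSTM, with $p_t = 1$ meaning "preserve the previous value". A sketch of one step, assuming f = tanh and stacked gate weights. Two further assumptions go beyond the slide: the cell and hidden masks are sampled independently with their own probabilities `z_c` and `z_h` (the slide writes a single $p_t$), and at inference the masks are replaced by their expected values, as with dropout:

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def zoneout_lstm_step(x_t, h_prev, c_prev, W, b, z_c, z_h, training, rng):
    """One LSTM step with zoneout on c_t and h_t.

    z_c, z_h: probability that a unit keeps its previous cell / hidden value.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b
    i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
    g = np.tanh(z[3 * H:])
    if training:
        p_c = (rng.random(H) < z_c).astype(float)   # 1 = preserve c_{t-1}
        p_h = (rng.random(H) < z_h).astype(float)   # 1 = preserve h_{t-1}
    else:
        p_c = np.full(H, z_c)                       # expectation of the mask
        p_h = np.full(H, z_h)
    c = p_c * c_prev + (1 - p_c) * (f * c_prev + i * g)
    h = p_h * h_prev + (1 - p_h) * (o * np.tanh(c))
    return h, c
```

Unlike dropout, a zoned-out unit passes its old value through unchanged, so the gradient along that unit also flows through an identity connection.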
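Slide 28 combines recurrent dropout on the cell update $i_t * g_t$ with zoneout on the output gate. A training-mode sketch of one step, assuming f = tanh, stacked gate weights, and that the output gate actually used at step t is what gets carried forward as $o_{t-1}$ for the next step (the slide does not say which value is stored). All names are hypothetical:

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def zoneout_recdrop_step(x_t, h_prev, c_prev, o_prev, W, b, drop_mask, zone_mask):
    """Training-mode LSTM step: recurrent dropout on i_t * g_t, zoneout on o_t.

    drop_mask: 0/1, 1 = keep the cell update (the operator d on the slide).
    zone_mask: 0/1 (p_t on the slide), 1 = reuse the previous output gate.
    Returns h_t, c_t and the output gate actually used.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b
    i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
    g = np.tanh(z[3 * H:])
    c = f * c_prev + drop_mask * (i * g)                # recurrent dropout
    o_used = (1 - zone_mask) * o + zone_mask * o_prev   # zoneout on the output gate
    h = o_used * np.tanh(c)
    return h, c, o_used
```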
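Slide 30 defines the two MNIST tasks by the order in which pixels are fed to the RNN. A sketch of the input construction, using a random 28x28 array as a stand-in for a real MNIST image (no dataset is loaded); the function name is hypothetical:

```python
import numpy as np


def to_pixel_sequence(image, perm=None):
    """Flatten a 28x28 image into 784 time steps of one pixel each."""
    seq = image.reshape(-1)          # row-major: left to right, top to bottom
    if perm is not None:
        seq = seq[perm]              # the same fixed permutation for every image
    return seq[:, None]              # shape (784, 1): one input per time step


rng = np.random.default_rng(0)
perm = rng.permutation(28 * 28)      # drawn once, then reused for all images
image = rng.random((28, 28))         # stand-in for an MNIST digit
sequential = to_pixel_sequence(image)
permuted = to_pixel_sequence(image, perm)
```

The permutation is sampled once and shared by training and test images; otherwise there would be no consistent structure for the RNN to learn.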
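Slide 34 measures character-level models in bits per character. A sketch that averages $-\log_2 P(x_{t+1} \mid y_t)$ over a sequence, assuming the model's outputs $y_t$ are already normalized probability vectors; the function name is hypothetical:

```python
import numpy as np


def bits_per_character(probs, targets):
    """Average of -log2 P(x_{t+1} | y_t) over a sequence.

    probs: (T, V) predicted distributions y_t; targets: (T,) indices of x_{t+1}.
    """
    p_correct = probs[np.arange(len(targets)), targets]
    return float(np.mean(-np.log2(p_correct)))


uniform = np.full((5, 4), 0.25)          # a model that always guesses among 4 characters
print(bits_per_character(uniform, np.array([0, 1, 2, 3, 0])))  # 2.0
```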
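The formula on slide 36 is the perplexity of a distribution p. When a language model is evaluated, the same quantity is typically estimated on a test sequence as 2 raised to the average number of bits the model assigns to each correct word. A sketch of both, with hypothetical function names; a uniform distribution over k outcomes has perplexity k:

```python
import numpy as np


def perplexity_of_distribution(p):
    """2 ** H(p) with the entropy H measured in bits (0 * log 0 treated as 0)."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(2.0 ** (-np.sum(nz * np.log2(nz))))


def model_perplexity(probs, targets):
    """Test-set perplexity: 2 ** (average -log2 probability of the correct word)."""
    p_correct = probs[np.arange(len(targets)), targets]
    return float(2.0 ** np.mean(-np.log2(p_correct)))


print(perplexity_of_distribution(np.full(10, 0.1)))   # about 10
```

The second function is the word-level counterpart of the bits-per-character measure on slide 34: perplexity is 2 to the power of the average bits per word.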