SlideShare a Scribd company logo
1 of 1
Download to read offline
Bayesian neural networks increasingly
sparsify their units with depth
Mariia Vladimirova1
, Julyan Arbel1
and Pablo Mesejo2
1
Inria Grenoble Rhône-Alpes, France
2
University of Granada, Spain
Introduction
We investigate deep Bayesian neural
networks with Gaussian priors on the
weights and ReLU-like nonlinearities.
See Vladimirova et al. (2018).
...
... ...
...
...
...
...
x
h(1)
h(2)
h( )
input
layer
1st
hid.
layer
2nd
hid.
layer
th
hid.
layer
subW(1
2) subW(1) subW(2)
Notations
Given an input x ∈ RN
, the -th hid-
den layer unit activations are defined
as
g( )
(x) = W ( )
h( −1)
(x),
h( )
(x) = φ(g( )
(x)).
Assumptions
• Gaussian prior on weights:
W
( )
i,j ∼ N(0, σ2
w),
• A nonlinearity φ : R → R is said
to obey the extended envelope
property if there exist
c1, c2, d1, d2 ≥ 0 such that
|φ(u)| ≥ c1 + d1|u| for u ∈ R+
or u ∈ R−,
|φ(u)| ≤ c2 + d2|u| for u ∈ R.
Sub-Weibull
A random variable X, such that
P(|X| ≥ x) ≤ exp −x
1/θ
/K
for all x ≥ 0 and for some K > 0, is
called a sub-Weibull random vari-
able with tail parameter θ > 0:
X ∼ subW(θ).
Moment property:
X ∼ subW(θ) implies
X k = E|X|k
1/k
kθ
,
Meaning for all k ∈ N and for some
constants d, D > 0,
d < X k/kθ
< D.
Covariance theorem
The covariance between hidden
units of the same layer is non-
negative. Moreover, for any -th
hidden layer units h( )
and ˜h( )
, for
s, t ∈ N it holds
Cov h( )
s
, ˜h( )
t
≥ 0.
Penalized estimation
Regularized problem:
min
W
R(W ) + λL (W ), (1)
where R(W ) is a loss function,
λL (W ) is a penalty, λ > 0.
For Bayesian models with prior dis-
tribution π(W ), the maximum a
posteriori (MAP) solves (1) with:
L (W ) ∝ − log π(W )
−1.0
−0.5
0.0
0.5
1.0
−1.0 −0.5 0.0 0.5 1.0
U1
U2
Layer 1
−1.0
−0.5
0.0
0.5
1.0
−1.0 −0.5 0.0 0.5 1.0
U1
U2
Layer 2
−1.0
−0.5
0.0
0.5
1.0
−1.0 −0.5 0.0 0.5 1.0
U1
U2
Layer 3
−1.0
−0.5
0.0
0.5
1.0
−1.0 −0.5 0.0 0.5 1.0
U1
U2
Layer 10
Theorem (Vladimirova et al., 2018)
The -th hidden layer units U( )
(pre-activation g( )
or post-activation h( )
)
of a feed-forward Bayesian neural network with:
• Gaussian priors on weights and
• extended envelope condition activation function φ
have sub-Weibull marginal prior distribution with optimal tail pa-
rameter θ = /2, conditional on the input x:
U( )
∼ subW( /2),
Prior distributions of layers = 1, 2, 3
Illustration of units marginal prior distributions from the first three hidden
layers. Neural network parameters: (N, H1, H2, H3) = (50, 25, 24, 4).
0 2 4 6 8 10
Value
0.00
0.05
0.10
0.15
Density
sub-W(1/2)
sub-W(1)
sub-W(3/2)
0 2 4 6 8 10
Value
0.0
0.1
0.2
0.3
0.4
0.5
P(X≥x)
Proof sketch
Induction w.r.t. layer depth :
h( )
k k /2
,
which is the moment characterization
of sub-Weibull variable.
• Extended envelope property implies
h( )
k g( )
k
• Base step: g ∼ N(0, σ2
),
g k
√
k. Thus,
h k = φ(g) k g k
√
k.
• Inductive step: suppose
h( −1)
k k
( − 1)/2
.
• Lower bound: non-negative
covariance theorem:
Cov h( −1)
s
, ˜h( −1)
t
≥ 0.
• Upper bound: Holder’s inequality
• g( )
= H
j=1 W
( −1)
i,j h
( −1)
j implies
h( )
k g( )
k k /2
.
Sparsity interpretation
MAP on weights is L2-reg.
Independent Gaussian prior
π(W ) ∝
L
=1 i,j
e−1
2(W
( )
i,j )2
,
is equivalent to the weight decay
penalty with negative log-prior:
L (W ) ∝
L
=1 i,j
(W
( )
i,j )2
= W 2
2,
MAP on units induces sparsity
The joint prior distribution for all the
units can be expressed by Sklar’s rep-
resentation theorem as
π(U) =
L
=1
H
m=1
π( )
m (U( )
m ) C(F(U)),
where C is the copula of U (charac-
terizes all the dependence between the
units), F is its cumulative distribution
function. The penalty is the negative
log-prior:
L (U) ≈ U(1) 2
2 + · · · + U(L) 2/L
2/L
− log C(F(U)).
Layer W -penalty U-penalty
1 W (1) 2
2, L2
U(1) 2
2, L2
2 W (2) 2
2, L2
U(2)
, L1
W ( ) 2
2, L2
U( ) 2/
2/ , L2/
Conclusion
We prove that the marginal prior
unit distributions are heavier-tailed as
depth increases. We further interpret
this finding, showing that the units
tend to be more sparsely represented
as layers become deeper. This result
provides new theoretical insight on
deep Bayesian neural networks, under-
pinning their natural shrinkage prop-
erties and practical potential.
References
Vladimirova, M., Arbel, J., and Mesejo, P.
(2018). Bayesian neural networks increas-
ingly sparsify their units with depth. arXiv
preprint arXiv:1810.05193.

More Related Content

What's hot

Adaptive dynamic programming for control
Adaptive dynamic programming for controlAdaptive dynamic programming for control
Adaptive dynamic programming for controlSpringer
 
JAISTサマースクール2016「脳を知るための理論」講義04 Neural Networks and Neuroscience
JAISTサマースクール2016「脳を知るための理論」講義04 Neural Networks and Neuroscience JAISTサマースクール2016「脳を知るための理論」講義04 Neural Networks and Neuroscience
JAISTサマースクール2016「脳を知るための理論」講義04 Neural Networks and Neuroscience hirokazutanaka
 
Introduction to Diffusion Monte Carlo
Introduction to Diffusion Monte CarloIntroduction to Diffusion Monte Carlo
Introduction to Diffusion Monte CarloClaudio Attaccalite
 
JAISTサマースクール2016「脳を知るための理論」講義03 Network Dynamics
JAISTサマースクール2016「脳を知るための理論」講義03 Network DynamicsJAISTサマースクール2016「脳を知るための理論」講義03 Network Dynamics
JAISTサマースクール2016「脳を知るための理論」講義03 Network Dynamicshirokazutanaka
 
Causal Dynamical Triangulations
Causal Dynamical TriangulationsCausal Dynamical Triangulations
Causal Dynamical TriangulationsRene García
 
Chemical dynamics and rare events in soft matter physics
Chemical dynamics and rare events in soft matter physicsChemical dynamics and rare events in soft matter physics
Chemical dynamics and rare events in soft matter physicsBoris Fackovec
 
A Common Fixed Point Theorem on Fuzzy Metric Space Using Weakly Compatible an...
A Common Fixed Point Theorem on Fuzzy Metric Space Using Weakly Compatible an...A Common Fixed Point Theorem on Fuzzy Metric Space Using Weakly Compatible an...
A Common Fixed Point Theorem on Fuzzy Metric Space Using Weakly Compatible an...inventionjournals
 
Electromagnetic Scattering from Objects with Thin Coatings.2016.05.04.02
Electromagnetic Scattering from Objects with Thin Coatings.2016.05.04.02Electromagnetic Scattering from Objects with Thin Coatings.2016.05.04.02
Electromagnetic Scattering from Objects with Thin Coatings.2016.05.04.02Luke Underwood
 
11.0003www.iiste.org call for paper.common fixed point theorem for compatible...
11.0003www.iiste.org call for paper.common fixed point theorem for compatible...11.0003www.iiste.org call for paper.common fixed point theorem for compatible...
11.0003www.iiste.org call for paper.common fixed point theorem for compatible...Alexander Decker
 
3.common fixed point theorem for compatible mapping of type a -21-24
3.common fixed point theorem for compatible mapping of type a -21-243.common fixed point theorem for compatible mapping of type a -21-24
3.common fixed point theorem for compatible mapping of type a -21-24Alexander Decker
 
Computational Motor Control: State Space Models for Motor Adaptation (JAIST s...
Computational Motor Control: State Space Models for Motor Adaptation (JAIST s...Computational Motor Control: State Space Models for Motor Adaptation (JAIST s...
Computational Motor Control: State Space Models for Motor Adaptation (JAIST s...hirokazutanaka
 
Computational Motor Control: Optimal Control for Deterministic Systems (JAIST...
Computational Motor Control: Optimal Control for Deterministic Systems (JAIST...Computational Motor Control: Optimal Control for Deterministic Systems (JAIST...
Computational Motor Control: Optimal Control for Deterministic Systems (JAIST...hirokazutanaka
 
Computational Motor Control: Kinematics & Dynamics (JAIST summer course)
Computational Motor Control: Kinematics & Dynamics (JAIST summer course)Computational Motor Control: Kinematics & Dynamics (JAIST summer course)
Computational Motor Control: Kinematics & Dynamics (JAIST summer course)hirokazutanaka
 
MLP輪読スパース8章 トレースノルム正則化
MLP輪読スパース8章 トレースノルム正則化MLP輪読スパース8章 トレースノルム正則化
MLP輪読スパース8章 トレースノルム正則化Akira Tanimoto
 
Computational Motor Control: Optimal Estimation in Noisy World (JAIST summer ...
Computational Motor Control: Optimal Estimation in Noisy World (JAIST summer ...Computational Motor Control: Optimal Estimation in Noisy World (JAIST summer ...
Computational Motor Control: Optimal Estimation in Noisy World (JAIST summer ...hirokazutanaka
 

What's hot (20)

Adaptive dynamic programming for control
Adaptive dynamic programming for controlAdaptive dynamic programming for control
Adaptive dynamic programming for control
 
Adaptive dynamic programming algorithm for uncertain nonlinear switched systems
Adaptive dynamic programming algorithm for uncertain nonlinear switched systemsAdaptive dynamic programming algorithm for uncertain nonlinear switched systems
Adaptive dynamic programming algorithm for uncertain nonlinear switched systems
 
JAISTサマースクール2016「脳を知るための理論」講義04 Neural Networks and Neuroscience
JAISTサマースクール2016「脳を知るための理論」講義04 Neural Networks and Neuroscience JAISTサマースクール2016「脳を知るための理論」講義04 Neural Networks and Neuroscience
JAISTサマースクール2016「脳を知るための理論」講義04 Neural Networks and Neuroscience
 
Introduction to Diffusion Monte Carlo
Introduction to Diffusion Monte CarloIntroduction to Diffusion Monte Carlo
Introduction to Diffusion Monte Carlo
 
JAISTサマースクール2016「脳を知るための理論」講義03 Network Dynamics
JAISTサマースクール2016「脳を知るための理論」講義03 Network DynamicsJAISTサマースクール2016「脳を知るための理論」講義03 Network Dynamics
JAISTサマースクール2016「脳を知るための理論」講義03 Network Dynamics
 
Causal Dynamical Triangulations
Causal Dynamical TriangulationsCausal Dynamical Triangulations
Causal Dynamical Triangulations
 
Chemical dynamics and rare events in soft matter physics
Chemical dynamics and rare events in soft matter physicsChemical dynamics and rare events in soft matter physics
Chemical dynamics and rare events in soft matter physics
 
Ch04 4
Ch04 4Ch04 4
Ch04 4
 
A Common Fixed Point Theorem on Fuzzy Metric Space Using Weakly Compatible an...
A Common Fixed Point Theorem on Fuzzy Metric Space Using Weakly Compatible an...A Common Fixed Point Theorem on Fuzzy Metric Space Using Weakly Compatible an...
A Common Fixed Point Theorem on Fuzzy Metric Space Using Weakly Compatible an...
 
Electromagnetic Scattering from Objects with Thin Coatings.2016.05.04.02
Electromagnetic Scattering from Objects with Thin Coatings.2016.05.04.02Electromagnetic Scattering from Objects with Thin Coatings.2016.05.04.02
Electromagnetic Scattering from Objects with Thin Coatings.2016.05.04.02
 
11.0003www.iiste.org call for paper.common fixed point theorem for compatible...
11.0003www.iiste.org call for paper.common fixed point theorem for compatible...11.0003www.iiste.org call for paper.common fixed point theorem for compatible...
11.0003www.iiste.org call for paper.common fixed point theorem for compatible...
 
3.common fixed point theorem for compatible mapping of type a -21-24
3.common fixed point theorem for compatible mapping of type a -21-243.common fixed point theorem for compatible mapping of type a -21-24
3.common fixed point theorem for compatible mapping of type a -21-24
 
Computational Motor Control: State Space Models for Motor Adaptation (JAIST s...
Computational Motor Control: State Space Models for Motor Adaptation (JAIST s...Computational Motor Control: State Space Models for Motor Adaptation (JAIST s...
Computational Motor Control: State Space Models for Motor Adaptation (JAIST s...
 
Computational Motor Control: Optimal Control for Deterministic Systems (JAIST...
Computational Motor Control: Optimal Control for Deterministic Systems (JAIST...Computational Motor Control: Optimal Control for Deterministic Systems (JAIST...
Computational Motor Control: Optimal Control for Deterministic Systems (JAIST...
 
Ch13
Ch13Ch13
Ch13
 
Computational Motor Control: Kinematics & Dynamics (JAIST summer course)
Computational Motor Control: Kinematics & Dynamics (JAIST summer course)Computational Motor Control: Kinematics & Dynamics (JAIST summer course)
Computational Motor Control: Kinematics & Dynamics (JAIST summer course)
 
Multi degree of freedom systems
Multi degree of freedom systemsMulti degree of freedom systems
Multi degree of freedom systems
 
MLP輪読スパース8章 トレースノルム正則化
MLP輪読スパース8章 トレースノルム正則化MLP輪読スパース8章 トレースノルム正則化
MLP輪読スパース8章 トレースノルム正則化
 
Computational Motor Control: Optimal Estimation in Noisy World (JAIST summer ...
Computational Motor Control: Optimal Estimation in Noisy World (JAIST summer ...Computational Motor Control: Optimal Estimation in Noisy World (JAIST summer ...
Computational Motor Control: Optimal Estimation in Noisy World (JAIST summer ...
 
Disjoint sets
Disjoint setsDisjoint sets
Disjoint sets
 

Similar to Bayesian neural networks increasingly sparsify their units with depth

1 hofstad
1 hofstad1 hofstad
1 hofstadYandex
 
Schrodinger equation in quantum mechanics
Schrodinger equation in quantum mechanicsSchrodinger equation in quantum mechanics
Schrodinger equation in quantum mechanicsRakeshPatil2528
 
Detection of unknown signal
Detection of unknown signalDetection of unknown signal
Detection of unknown signalsumitf1
 
Alexei Starobinsky - Inflation: the present status
Alexei Starobinsky - Inflation: the present statusAlexei Starobinsky - Inflation: the present status
Alexei Starobinsky - Inflation: the present statusSEENET-MTP
 
Integration with kernel methods, Transported meshfree methods
Integration with kernel methods, Transported meshfree methodsIntegration with kernel methods, Transported meshfree methods
Integration with kernel methods, Transported meshfree methodsMercier Jean-Marc
 
Ph 101-9 QUANTUM MACHANICS
Ph 101-9 QUANTUM MACHANICSPh 101-9 QUANTUM MACHANICS
Ph 101-9 QUANTUM MACHANICSChandan Singh
 
MVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priorsMVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priorsElvis DOHMATOB
 
adv-2015-16-solution-09
adv-2015-16-solution-09adv-2015-16-solution-09
adv-2015-16-solution-09志远 姚
 
dhirota_hone_corrected
dhirota_hone_correcteddhirota_hone_corrected
dhirota_hone_correctedAndy Hone
 
Random Matrix Theory and Machine Learning - Part 3
Random Matrix Theory and Machine Learning - Part 3Random Matrix Theory and Machine Learning - Part 3
Random Matrix Theory and Machine Learning - Part 3Fabian Pedregosa
 
Geometric properties for parabolic and elliptic pde
Geometric properties for parabolic and elliptic pdeGeometric properties for parabolic and elliptic pde
Geometric properties for parabolic and elliptic pdeSpringer
 
Draft classical feynmangraphs higgs
Draft classical feynmangraphs higgsDraft classical feynmangraphs higgs
Draft classical feynmangraphs higgsfoxtrot jp R
 
Active Controller Design for Regulating the Output of the Sprott-P System
Active Controller Design for Regulating the Output of the Sprott-P SystemActive Controller Design for Regulating the Output of the Sprott-P System
Active Controller Design for Regulating the Output of the Sprott-P Systemijccmsjournal
 

Similar to Bayesian neural networks increasingly sparsify their units with depth (20)

1 hofstad
1 hofstad1 hofstad
1 hofstad
 
PCA on graph/network
PCA on graph/networkPCA on graph/network
PCA on graph/network
 
Schrodinger equation in quantum mechanics
Schrodinger equation in quantum mechanicsSchrodinger equation in quantum mechanics
Schrodinger equation in quantum mechanics
 
Quantum chaos of generic systems - Marko Robnik
Quantum chaos of generic systems - Marko RobnikQuantum chaos of generic systems - Marko Robnik
Quantum chaos of generic systems - Marko Robnik
 
Detection of unknown signal
Detection of unknown signalDetection of unknown signal
Detection of unknown signal
 
Alexei Starobinsky - Inflation: the present status
Alexei Starobinsky - Inflation: the present statusAlexei Starobinsky - Inflation: the present status
Alexei Starobinsky - Inflation: the present status
 
Integration with kernel methods, Transported meshfree methods
Integration with kernel methods, Transported meshfree methodsIntegration with kernel methods, Transported meshfree methods
Integration with kernel methods, Transported meshfree methods
 
Ph 101-9 QUANTUM MACHANICS
Ph 101-9 QUANTUM MACHANICSPh 101-9 QUANTUM MACHANICS
Ph 101-9 QUANTUM MACHANICS
 
MVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priorsMVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priors
 
adv-2015-16-solution-09
adv-2015-16-solution-09adv-2015-16-solution-09
adv-2015-16-solution-09
 
Lecture5-FEA.pdf
Lecture5-FEA.pdfLecture5-FEA.pdf
Lecture5-FEA.pdf
 
dhirota_hone_corrected
dhirota_hone_correcteddhirota_hone_corrected
dhirota_hone_corrected
 
Pres metabief2020jmm
Pres metabief2020jmmPres metabief2020jmm
Pres metabief2020jmm
 
Random Matrix Theory and Machine Learning - Part 3
Random Matrix Theory and Machine Learning - Part 3Random Matrix Theory and Machine Learning - Part 3
Random Matrix Theory and Machine Learning - Part 3
 
Geometric properties for parabolic and elliptic pde
Geometric properties for parabolic and elliptic pdeGeometric properties for parabolic and elliptic pde
Geometric properties for parabolic and elliptic pde
 
Bird’s-eye view of Gaussian harmonic analysis
Bird’s-eye view of Gaussian harmonic analysisBird’s-eye view of Gaussian harmonic analysis
Bird’s-eye view of Gaussian harmonic analysis
 
HashiamKadhimFNLHD
HashiamKadhimFNLHDHashiamKadhimFNLHD
HashiamKadhimFNLHD
 
MUMS Opening Workshop - Panel Discussion: Facts About Some Statisitcal Models...
MUMS Opening Workshop - Panel Discussion: Facts About Some Statisitcal Models...MUMS Opening Workshop - Panel Discussion: Facts About Some Statisitcal Models...
MUMS Opening Workshop - Panel Discussion: Facts About Some Statisitcal Models...
 
Draft classical feynmangraphs higgs
Draft classical feynmangraphs higgsDraft classical feynmangraphs higgs
Draft classical feynmangraphs higgs
 
Active Controller Design for Regulating the Output of the Sprott-P System
Active Controller Design for Regulating the Output of the Sprott-P SystemActive Controller Design for Regulating the Output of the Sprott-P System
Active Controller Design for Regulating the Output of the Sprott-P System
 

More from Julyan Arbel

Species sampling models in Bayesian Nonparametrics
Species sampling models in Bayesian NonparametricsSpecies sampling models in Bayesian Nonparametrics
Species sampling models in Bayesian NonparametricsJulyan Arbel
 
Dependent processes in Bayesian Nonparametrics
Dependent processes in Bayesian NonparametricsDependent processes in Bayesian Nonparametrics
Dependent processes in Bayesian NonparametricsJulyan Arbel
 
Asymptotics for discrete random measures
Asymptotics for discrete random measuresAsymptotics for discrete random measures
Asymptotics for discrete random measuresJulyan Arbel
 
Bayesian Nonparametrics, Applications to biology, ecology, and marketing
Bayesian Nonparametrics, Applications to biology, ecology, and marketingBayesian Nonparametrics, Applications to biology, ecology, and marketing
Bayesian Nonparametrics, Applications to biology, ecology, and marketingJulyan Arbel
 
A Gentle Introduction to Bayesian Nonparametrics
A Gentle Introduction to Bayesian NonparametricsA Gentle Introduction to Bayesian Nonparametrics
A Gentle Introduction to Bayesian NonparametricsJulyan Arbel
 
A Gentle Introduction to Bayesian Nonparametrics
A Gentle Introduction to Bayesian NonparametricsA Gentle Introduction to Bayesian Nonparametrics
A Gentle Introduction to Bayesian NonparametricsJulyan Arbel
 
Lindley smith 1972
Lindley smith 1972Lindley smith 1972
Lindley smith 1972Julyan Arbel
 
Diaconis Ylvisaker 1985
Diaconis Ylvisaker 1985Diaconis Ylvisaker 1985
Diaconis Ylvisaker 1985Julyan Arbel
 
Jefferys Berger 1992
Jefferys Berger 1992Jefferys Berger 1992
Jefferys Berger 1992Julyan Arbel
 
Poster DDP (BNP 2011 Veracruz)
Poster DDP (BNP 2011 Veracruz)Poster DDP (BNP 2011 Veracruz)
Poster DDP (BNP 2011 Veracruz)Julyan Arbel
 

More from Julyan Arbel (20)

UCD_talk_nov_2020
UCD_talk_nov_2020UCD_talk_nov_2020
UCD_talk_nov_2020
 
Species sampling models in Bayesian Nonparametrics
Species sampling models in Bayesian NonparametricsSpecies sampling models in Bayesian Nonparametrics
Species sampling models in Bayesian Nonparametrics
 
Dependent processes in Bayesian Nonparametrics
Dependent processes in Bayesian NonparametricsDependent processes in Bayesian Nonparametrics
Dependent processes in Bayesian Nonparametrics
 
Asymptotics for discrete random measures
Asymptotics for discrete random measuresAsymptotics for discrete random measures
Asymptotics for discrete random measures
 
Bayesian Nonparametrics, Applications to biology, ecology, and marketing
Bayesian Nonparametrics, Applications to biology, ecology, and marketingBayesian Nonparametrics, Applications to biology, ecology, and marketing
Bayesian Nonparametrics, Applications to biology, ecology, and marketing
 
A Gentle Introduction to Bayesian Nonparametrics
A Gentle Introduction to Bayesian NonparametricsA Gentle Introduction to Bayesian Nonparametrics
A Gentle Introduction to Bayesian Nonparametrics
 
A Gentle Introduction to Bayesian Nonparametrics
A Gentle Introduction to Bayesian NonparametricsA Gentle Introduction to Bayesian Nonparametrics
A Gentle Introduction to Bayesian Nonparametrics
 
Lindley smith 1972
Lindley smith 1972Lindley smith 1972
Lindley smith 1972
 
Berger 2000
Berger 2000Berger 2000
Berger 2000
 
Seneta 1993
Seneta 1993Seneta 1993
Seneta 1993
 
Lehmann 1990
Lehmann 1990Lehmann 1990
Lehmann 1990
 
Diaconis Ylvisaker 1985
Diaconis Ylvisaker 1985Diaconis Ylvisaker 1985
Diaconis Ylvisaker 1985
 
Hastings 1970
Hastings 1970Hastings 1970
Hastings 1970
 
Jefferys Berger 1992
Jefferys Berger 1992Jefferys Berger 1992
Jefferys Berger 1992
 
Bayesian Classics
Bayesian ClassicsBayesian Classics
Bayesian Classics
 
Bayesian Classics
Bayesian ClassicsBayesian Classics
Bayesian Classics
 
R in latex
R in latexR in latex
R in latex
 
Arbel oviedo
Arbel oviedoArbel oviedo
Arbel oviedo
 
Poster DDP (BNP 2011 Veracruz)
Poster DDP (BNP 2011 Veracruz)Poster DDP (BNP 2011 Veracruz)
Poster DDP (BNP 2011 Veracruz)
 
Causesof effects
Causesof effectsCausesof effects
Causesof effects
 

Recently uploaded

Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一F La
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAbdelrhman abooda
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 

Recently uploaded (20)

Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 

Bayesian neural networks increasingly sparsify their units with depth

  • 1. Bayesian neural networks increasingly sparsify their units with depth Mariia Vladimirova1 , Julyan Arbel1 and Pablo Mesejo2 1 Inria Grenoble Rhône-Alpes, France 2 University of Granada, Spain Introduction We investigate deep Bayesian neural networks with Gaussian priors on the weights and ReLU-like nonlinearities. See Vladimirova et al. (2018). ... ... ... ... ... ... ... x h(1) h(2) h( ) input layer 1st hid. layer 2nd hid. layer th hid. layer subW(1 2) subW(1) subW(2) Notations Given an input x ∈ RN , the -th hid- den layer unit activations are defined as g( ) (x) = W ( ) h( −1) (x), h( ) (x) = φ(g( ) (x)). Assumptions • Gaussian prior on weights: W ( ) i,j ∼ N(0, σ2 w), • A nonlinearity φ : R → R is said to obey the extended envelope property if there exist c1, c2, d1, d2 ≥ 0 such that |φ(u)| ≥ c1 + d1|u| for u ∈ R+ or u ∈ R−, |φ(u)| ≤ c2 + d2|u| for u ∈ R. Sub-Weibull A random variable X, such that P(|X| ≥ x) ≤ exp −x 1/θ /K for all x ≥ 0 and for some K > 0, is called a sub-Weibull random vari- able with tail parameter θ > 0: X ∼ subW(θ). Moment property: X ∼ subW(θ) implies X k = E|X|k 1/k kθ , Meaning for all k ∈ N and for some constants d, D > 0, d < X k/kθ < D. Covariance theorem The covariance between hidden units of the same layer is non- negative. Moreover, for any -th hidden layer units h( ) and ˜h( ) , for s, t ∈ N it holds Cov h( ) s , ˜h( ) t ≥ 0. Penalized estimation Regularized problem: min W R(W ) + λL (W ), (1) where R(W ) is a loss function, λL (W ) is a penalty, λ > 0. For Bayesian models with prior dis- tribution π(W ), the maximum a posteriori (MAP) solves (1) with: L (W ) ∝ − log π(W ) −1.0 −0.5 0.0 0.5 1.0 −1.0 −0.5 0.0 0.5 1.0 U1 U2 Layer 1 −1.0 −0.5 0.0 0.5 1.0 −1.0 −0.5 0.0 0.5 1.0 U1 U2 Layer 2 −1.0 −0.5 0.0 0.5 1.0 −1.0 −0.5 0.0 0.5 1.0 U1 U2 Layer 3 −1.0 −0.5 0.0 0.5 1.0 −1.0 −0.5 0.0 0.5 1.0 U1 U2 Layer 10 Theorem (Vladimirova et al., 2018) The -th hidden layer units U( ) (pre-activation g( ) or post-activation h( ) ) of a feed-forward Bayesian neural network with: • Gaussian priors on weights and • extended envelope condition activation function φ have sub-Weibull marginal prior distribution with optimal tail pa- rameter θ = /2, conditional on the input x: U( ) ∼ subW( /2), Prior distributions of layers = 1, 2, 3 Illustration of units marginal prior distributions from the first three hidden layers. Neural network parameters: (N, H1, H2, H3) = (50, 25, 24, 4). 0 2 4 6 8 10 Value 0.00 0.05 0.10 0.15 Density sub-W(1/2) sub-W(1) sub-W(3/2) 0 2 4 6 8 10 Value 0.0 0.1 0.2 0.3 0.4 0.5 P(X≥x) Proof sketch Induction w.r.t. layer depth : h( ) k k /2 , which is the moment characterization of sub-Weibull variable. • Extended envelope property implies h( ) k g( ) k • Base step: g ∼ N(0, σ2 ), g k √ k. Thus, h k = φ(g) k g k √ k. • Inductive step: suppose h( −1) k k ( − 1)/2 . • Lower bound: non-negative covariance theorem: Cov h( −1) s , ˜h( −1) t ≥ 0. • Upper bound: Holder’s inequality • g( ) = H j=1 W ( −1) i,j h ( −1) j implies h( ) k g( ) k k /2 . Sparsity interpretation MAP on weights is L2-reg. Independent Gaussian prior π(W ) ∝ L =1 i,j e−1 2(W ( ) i,j )2 , is equivalent to the weight decay penalty with negative log-prior: L (W ) ∝ L =1 i,j (W ( ) i,j )2 = W 2 2, MAP on units induces sparsity The joint prior distribution for all the units can be expressed by Sklar’s rep- resentation theorem as π(U) = L =1 H m=1 π( ) m (U( ) m ) C(F(U)), where C is the copula of U (charac- terizes all the dependence between the units), F is its cumulative distribution function. The penalty is the negative log-prior: L (U) ≈ U(1) 2 2 + · · · + U(L) 2/L 2/L − log C(F(U)). Layer W -penalty U-penalty 1 W (1) 2 2, L2 U(1) 2 2, L2 2 W (2) 2 2, L2 U(2) , L1 W ( ) 2 2, L2 U( ) 2/ 2/ , L2/ Conclusion We prove that the marginal prior unit distributions are heavier-tailed as depth increases. We further interpret this finding, showing that the units tend to be more sparsely represented as layers become deeper. This result provides new theoretical insight on deep Bayesian neural networks, under- pinning their natural shrinkage prop- erties and practical potential. References Vladimirova, M., Arbel, J., and Mesejo, P. (2018). Bayesian neural networks increas- ingly sparsify their units with depth. arXiv preprint arXiv:1810.05193.