Bayesian neural networks increasingly sparsify their units with depth
Mariia Vladimirova¹, Julyan Arbel¹ and Pablo Mesejo²
¹ Inria Grenoble Rhône-Alpes, France
² University of Granada, Spain
Introduction
We investigate deep Bayesian neural
networks with Gaussian priors on the
weights and ReLU-like nonlinearities.
See Vladimirova et al. (2018).
[Figure: feed-forward network diagram with input $x$ (input layer) and hidden layers $h^{(1)}, h^{(2)}, \ldots, h^{(\ell)}$ (1st, 2nd, ..., ℓ-th hidden layers), annotated with the unit prior distributions subW(1/2), subW(1), ..., subW(ℓ/2).]
Notations
Given an input $x \in \mathbb{R}^N$, the ℓ-th hidden layer unit activations are defined as
$$g^{(\ell)}(x) = W^{(\ell)} h^{(\ell-1)}(x), \qquad h^{(\ell)}(x) = \phi\big(g^{(\ell)}(x)\big).$$
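As a concrete illustration of these notations, the sketch below draws one set of weights from the Gaussian prior and propagates an input through ReLU layers. The layer sizes match the poster's illustration, $(N, H_1, H_2, H_3) = (50, 25, 24, 4)$; the prior scale $\sigma_w = 1$ is an assumption made only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(u):
    # ReLU is a ReLU-like nonlinearity satisfying the extended envelope property.
    return np.maximum(u, 0.0)

def sample_prior_forward(x, widths, sigma_w=1.0):
    """One prior draw of the network: returns pre-activations g^(l) and
    post-activations h^(l) for each hidden layer (sigma_w is an assumed value)."""
    h = x
    gs, hs = [], []
    for H in widths:
        # W^(l) has i.i.d. N(0, sigma_w^2) entries (Gaussian prior on weights).
        W = sigma_w * rng.standard_normal((H, h.shape[0]))
        g = W @ h        # g^(l)(x) = W^(l) h^(l-1)(x)
        h = relu(g)      # h^(l)(x) = phi(g^(l)(x))
        gs.append(g)
        hs.append(h)
    return gs, hs

x = rng.standard_normal(50)                        # input in R^N, N = 50
gs, hs = sample_prior_forward(x, widths=[25, 24, 4])
print([h.shape for h in hs])                       # [(25,), (24,), (4,)]
```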
Assumptions
• Gaussian prior on weights: $W^{(\ell)}_{i,j} \sim \mathcal{N}(0, \sigma_w^2)$.
• A nonlinearity $\phi : \mathbb{R} \to \mathbb{R}$ is said to obey the extended envelope property if there exist $c_1, c_2, d_1, d_2 \ge 0$ such that
$$|\phi(u)| \ge c_1 + d_1|u| \ \text{ for } u \in \mathbb{R}_+ \text{ or } u \in \mathbb{R}_-, \qquad |\phi(u)| \le c_2 + d_2|u| \ \text{ for } u \in \mathbb{R}.$$
Sub-Weibull
A random variable $X$ such that
$$\mathbb{P}(|X| \ge x) \le \exp\big(-x^{1/\theta}/K\big)$$
for all $x \ge 0$ and for some $K > 0$ is called a sub-Weibull random variable with tail parameter $\theta > 0$: $X \sim \mathrm{subW}(\theta)$.
Moment property: $X \sim \mathrm{subW}(\theta)$ implies
$$\|X\|_k = \big(\mathbb{E}|X|^k\big)^{1/k} \asymp k^{\theta},$$
meaning that for all $k \in \mathbb{N}$ and for some constants $d, D > 0$,
$$d < \|X\|_k / k^{\theta} < D.$$
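The moment property can be checked numerically on a canonical sub-Weibull example: a Weibull variable with shape $1/\theta$ has tails $\exp(-x^{1/\theta})$, so its moment norms should grow like $k^{\theta}$. The sketch below is a minimal Monte Carlo check; the choice of $\theta$ values and the sample size are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def moment_norm(x, k):
    # ||X||_k = (E|X|^k)^(1/k), estimated by Monte Carlo.
    return np.mean(np.abs(x) ** k) ** (1.0 / k)

for theta in (0.5, 1.0, 1.5):
    # Weibull with shape 1/theta has survival function exp(-x^(1/theta)), i.e. subW(theta).
    x = rng.weibull(1.0 / theta, size=200_000)
    ks = np.arange(2, 11)
    ratios = [moment_norm(x, k) / k ** theta for k in ks]
    # The ratio ||X||_k / k^theta stays bounded between two positive constants.
    print(f"theta={theta}: min ratio={min(ratios):.3f}, max ratio={max(ratios):.3f}")
```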
Covariance theorem
The covariance between hidden units of the same layer is non-negative. Moreover, for any two ℓ-th hidden layer units $h^{(\ell)}$ and $\tilde h^{(\ell)}$, and for $s, t \in \mathbb{N}$, it holds that
$$\mathrm{Cov}\Big(\big(h^{(\ell)}\big)^{s}, \big(\tilde h^{(\ell)}\big)^{t}\Big) \ge 0.$$
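This non-negativity can be checked by simulation: draw prior weights repeatedly, propagate a fixed input, and estimate the covariance between powers of two units of the same layer across prior draws. The layer index, the powers $s, t$, the prior scale and the number of draws below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
relu = lambda u: np.maximum(u, 0.0)

x = rng.standard_normal(50)           # fixed input
widths, sigma_w = [25, 24, 4], 1.0
n_draws, ell, s, t = 20_000, 2, 2, 3  # layer index and powers (illustrative choices)

samples = np.empty((n_draws, widths[ell - 1]))
for n in range(n_draws):
    h = x
    for H in widths[:ell]:
        W = sigma_w * rng.standard_normal((H, h.shape[0]))
        h = relu(W @ h)
    samples[n] = h                    # h^(ell)(x) under one prior draw

# Empirical Cov((h_i^(ell))^s, (h_j^(ell))^t) for two distinct units of layer ell.
cov = np.cov(samples[:, 0] ** s, samples[:, 1] ** t)[0, 1]
print(f"estimated covariance: {cov:.4f}")  # expected non-negative, up to Monte Carlo error
```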
Penalized estimation
Regularized problem:
$$\min_{W} \; R(W) + \lambda L(W), \qquad (1)$$
where $R(W)$ is a loss function, $\lambda L(W)$ is a penalty, and $\lambda > 0$. For Bayesian models with prior distribution $\pi(W)$, the maximum a posteriori (MAP) estimator solves (1) with
$$L(W) \propto -\log \pi(W).$$
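To make the MAP/penalty correspondence concrete, the sketch below writes the negative log-posterior of a toy linear regression model with a Gaussian prior, so the MAP estimate coincides with a ridge (L2-penalized) solution. This is not the neural network of the poster; the data, noise level and prior scale are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy linear regression data (illustrative assumptions).
X = rng.standard_normal((100, 5))
w_true = rng.standard_normal(5)
y = X @ w_true + 0.1 * rng.standard_normal(100)

sigma_noise, sigma_w = 0.1, 1.0

def neg_log_posterior(w):
    # -log p(y | X, w) plays the role of the loss R(W);
    # -log pi(w) with a Gaussian prior gives the L2 penalty L(W).
    R = 0.5 * np.sum((y - X @ w) ** 2) / sigma_noise ** 2
    L = 0.5 * np.sum(w ** 2) / sigma_w ** 2
    return R + L        # lambda is absorbed into the noise/prior variances

# MAP = ridge solution: (X^T X + (sigma_noise/sigma_w)^2 I)^{-1} X^T y
lam = (sigma_noise / sigma_w) ** 2
w_map = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)
print(neg_log_posterior(w_map) <= neg_log_posterior(w_true))  # True: MAP minimizes (1)
```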
[Figure: scatter plots of two units $(U_1, U_2)$ at layers 1, 2, 3 and 10, with both axes ranging from −1.0 to 1.0.]
Theorem (Vladimirova et al., 2018)
The ℓ-th hidden layer units $U^{(\ell)}$ (pre-activation $g^{(\ell)}$ or post-activation $h^{(\ell)}$) of a feed-forward Bayesian neural network with:
• Gaussian priors on the weights, and
• an activation function $\phi$ satisfying the extended envelope condition,
have a sub-Weibull marginal prior distribution with optimal tail parameter $\theta = \ell/2$, conditional on the input $x$:
$$U^{(\ell)} \sim \mathrm{subW}(\ell/2).$$
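The growth of the tail parameter with depth can be probed empirically: simulate prior draws of a unit at each layer, compute $\log \|U^{(\ell)}\|_k$ for several $k$, and fit the slope against $\log k$, which by the moment property gives a rough estimate of $\theta$. The sketch below uses the architecture of the illustration that follows, $(N, H_1, H_2, H_3) = (50, 25, 24, 4)$; the prior scale and number of draws are assumptions, and the estimates are noisy for deep layers and large $k$.

```python
import numpy as np

rng = np.random.default_rng(4)
relu = lambda u: np.maximum(u, 0.0)

x = rng.standard_normal(50)
widths, sigma_w, n_draws = [25, 24, 4], 1.0, 50_000

# Prior samples of the first unit of each hidden layer, for a fixed input x.
units = {ell: np.empty(n_draws) for ell in range(1, len(widths) + 1)}
for n in range(n_draws):
    h = x
    for ell, H in enumerate(widths, start=1):
        W = sigma_w * rng.standard_normal((H, h.shape[0]))
        h = relu(W @ h)
        units[ell][n] = h[0]

ks = np.arange(2, 9)
for ell, u in units.items():
    norms = [np.mean(np.abs(u) ** k) ** (1.0 / k) for k in ks]
    slope = np.polyfit(np.log(ks), np.log(norms), 1)[0]   # rough estimate of theta
    print(f"layer {ell}: estimated tail parameter ~ {slope:.2f} (theory: {ell / 2})")
```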
Prior distributions of layers ℓ = 1, 2, 3
Illustration of the units' marginal prior distributions for the first three hidden layers. Neural network parameters: $(N, H_1, H_2, H_3) = (50, 25, 24, 4)$.
[Figure: density (left) and survival function $\mathbb{P}(X \ge x)$ (right) of the sub-W(1/2), sub-W(1) and sub-W(3/2) distributions, over values from 0 to 10.]
Proof sketch
Induction with respect to layer depth ℓ:
$$\big\|h^{(\ell)}\big\|_k \asymp k^{\ell/2},$$
which is the moment characterization of a sub-Weibull variable.
• The extended envelope property implies $\|h^{(\ell)}\|_k \asymp \|g^{(\ell)}\|_k$.
• Base step: $g \sim \mathcal{N}(0, \sigma^2)$, so $\|g\|_k \asymp \sqrt{k}$. Thus $\|h\|_k = \|\phi(g)\|_k \asymp \|g\|_k \asymp \sqrt{k}$.
• Inductive step: suppose $\|h^{(\ell-1)}\|_k \asymp k^{(\ell-1)/2}$.
• Lower bound: by the non-negative covariance theorem, $\mathrm{Cov}\big((h^{(\ell-1)})^s, (\tilde h^{(\ell-1)})^t\big) \ge 0$.
• Upper bound: Hölder's inequality.
• $g^{(\ell)}_i = \sum_{j=1}^{H} W^{(\ell)}_{i,j} h^{(\ell-1)}_j$ implies $\|h^{(\ell)}\|_k \asymp \|g^{(\ell)}\|_k \asymp k^{\ell/2}$.
Sparsity interpretation
MAP on weights is L2 regularization
The independent Gaussian prior
$$\pi(W) \propto \prod_{\ell=1}^{L} \prod_{i,j} e^{-\frac{1}{2}\big(W^{(\ell)}_{i,j}\big)^2}$$
is equivalent to the weight decay penalty, with negative log-prior
$$L(W) \propto \sum_{\ell=1}^{L} \sum_{i,j} \big(W^{(\ell)}_{i,j}\big)^2 = \|W\|_2^2.$$
MAP on units induces sparsity
The joint prior distribution of all the units can be expressed by Sklar's representation theorem as
$$\pi(U) = \prod_{\ell=1}^{L} \prod_{m=1}^{H_\ell} \pi^{(\ell)}_m\big(U^{(\ell)}_m\big)\, C\big(F(U)\big),$$
where $C$ is the copula of $U$ (it characterizes all the dependence between the units) and $F$ is its cumulative distribution function. The penalty is the negative log-prior:
$$L(U) \approx \big\|U^{(1)}\big\|_2^2 + \cdots + \big\|U^{(L)}\big\|_{2/L}^{2/L} - \log C\big(F(U)\big).$$
Layer | W-penalty                     | U-penalty
  1   | $\|W^{(1)}\|_2^2$, $L_2$      | $\|U^{(1)}\|_2^2$, $L_2$
  2   | $\|W^{(2)}\|_2^2$, $L_2$      | $\|U^{(2)}\|_1$, $L_1$
  ℓ   | $\|W^{(\ell)}\|_2^2$, $L_2$   | $\|U^{(\ell)}\|_{2/\ell}^{2/\ell}$, $L_{2/\ell}$
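For ℓ ≥ 3 the exponent $2/\ell$ is below 1, so the unit penalty is a quasi-norm that favours vectors concentrated on few coordinates. The sketch below computes the per-layer unit penalty $\|U^{(\ell)}\|_{2/\ell}^{2/\ell}$ for a spread-out and a sparse vector of equal Euclidean norm; the two vectors are illustrative assumptions, not quantities from the poster.

```python
import numpy as np

def unit_penalty(u, ell):
    # ||U^(l)||_{2/l}^{2/l} = sum_i |u_i|^(2/l): L2 for l=1, L1 for l=2, quasi-norm for l>=3.
    p = 2.0 / ell
    return np.sum(np.abs(u) ** p)

dense = np.full(4, 0.5)                  # spread-out vector with ||.||_2 = 1
sparse = np.array([1.0, 0.0, 0.0, 0.0])  # same Euclidean norm, concentrated on one axis

for ell in (1, 2, 3, 10):
    print(f"layer {ell}: dense={unit_penalty(dense, ell):.3f}, "
          f"sparse={unit_penalty(sparse, ell):.3f}")
# For l >= 2 the sparse vector pays a strictly smaller penalty, and the gap widens with
# depth, so the deeper layers' penalties favour sparsely represented units.
```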
Conclusion
We prove that the marginal prior unit distributions become heavier-tailed as depth increases. We further interpret this finding, showing that the units tend to be more sparsely represented as layers become deeper. This result provides new theoretical insight into deep Bayesian neural networks, underpinning their natural shrinkage properties and practical potential.
References
Vladimirova, M., Arbel, J., and Mesejo, P. (2018). Bayesian neural networks increasingly sparsify their units with depth. arXiv preprint arXiv:1810.05193.