Bayesian optimal adaptive estimation using a
sieve prior
YES IV Workshop
Julyan Arbel, arbel@ensae.fr
ENSAE-CREST-Université Paris Dauphine
November 9, 2010
1 / 21
Outline
1 Motivations
2 Assumptions
3 Results
4 White noise model
5 Conclusion
2 / 21
Introduction
• Posterior concentration rates and risk convergence rates in a Bayesian nonparametric setting.
• Results in the same spirit as those of Ghosal, Ghosh and van der Vaart (2000) and Ghosal and van der Vaart (2007), in the specific case of models suitable for sieve priors.
• Use of a family of sieve priors (introduced by Zhao (2000) in the white noise model).
• Infinite-dimensional parameter from a Sobolev smoothness class.
4 / 21
Notations
• Let a model $(\mathcal{X}^{(n)}, \mathcal{A}^{(n)}, P^{(n)}_\theta : \theta \in \Theta)$ with observations $X^{(n)} = (X^n_i)_{1 \le i \le n}$, and
$$\Theta = \bigcup_{k=1}^{\infty} \mathbb{R}^k.$$
• Denote $\theta_0$ the parameter associated to the true model. Densities are denoted $p^{(n)}_\theta$ ($p^{(n)}_0$ for $\theta_0$). The first $k$ coordinates of $\theta_0$ are denoted $\theta_{0k}$.
• A sieve prior $\Pi$ on $\Theta$ is defined as follows
$$\Pi(\theta) = \sum_k \lambda_k \Pi_k(\theta), \qquad \sum_k \lambda_k = 1,$$
and $\theta_i / \tau_i \sim g$, where $\tau_i > 0$.
5 / 21
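As a concrete illustration of how one might draw from such a sieve prior, here is a minimal numerical sketch; the specific choices (lambda_k proportional to exp(-k log k), g standard Gaussian, tau_i = 1, truncation at k_max components) are assumptions made for the example, not the settings used in the talk.

import numpy as np

rng = np.random.default_rng(0)

def sample_sieve_prior(k_max=50, tau=1.0):
    # Weights lambda_k proportional to exp(-k log k), truncated at k_max for simulation.
    ks = np.arange(1, k_max + 1)
    log_w = -ks * np.log(ks)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    k = rng.choice(ks, p=w)                 # draw the model dimension k ~ lambda
    theta = tau * rng.standard_normal(k)    # theta_i / tau_i ~ g, with g = N(0, 1) here
    return k, theta

k, theta = sample_sieve_prior()
print(k, theta)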
We define four divergences
$$K(f, g) = \int f \log(f/g)\, d\mu, \qquad V_{p,0}(f, g) = \int f\, \big|\log(f/g) - K(f, g)\big|^p\, d\mu,$$
$$\overline{K}(f, g) = \int p^{(n)}_0 \big|\log(f/g)\big|\, d\mu, \qquad \overline{V}_{p,0}(f, g) = \int p^{(n)}_0 \big|\log(f/g) - \overline{K}(f, g)\big|^p\, d\mu.$$
6 / 21
Define a Kullback-Leibler neighborhood
$$B_n = \Big\{\theta : K\big(p^{(n)}_0, p^{(n)}_\theta\big) \le n\epsilon_n^2,\; V_{p,0}\big(p^{(n)}_0, p^{(n)}_\theta\big) \le \big(n\epsilon_n^2\big)^{p/2}\Big\}.$$
We use a semimetric $d_n$ on $\Theta$, and define $\Theta_n = \big\{\theta \in \mathbb{R}^{k_n} : \|\theta\| \le \omega_n\big\}$ with $k_n = k_0\, n\epsilon_n^2 / \log n$ and $\omega_n$ some power of $n$.
The posterior distribution is defined by
$$\Pi\big(B \mid X^{(n)}\big) = \frac{\int_B p^{(n)}_\theta\big(X^{(n)}\big)\, d\Pi(\theta)}{\int_\Theta p^{(n)}_\theta\big(X^{(n)}\big)\, d\Pi(\theta)}.$$
7 / 21
Assumptions
Assumption 1 On the prior
Assume there exist $a, b, c, d > 0$ such that $\lambda_k$ and $g$ satisfy
$$e^{-ak\log k} \le \lambda_k \le e^{-bk\log k},$$
$$A e^{-A_1 |t|^d} \le g(t) \le B e^{-B_1 |t|^d},$$
$$\exists\, T, \tau_0 > 0 \ \text{s.t.}\ \min_{i \le k_n} \tau_i \ge n^{-T} \ \text{and}\ \max_{i > 0} \tau_i \le \tau_0 < \infty,$$
$$\sum_{i=1}^{k_n} |\theta_{0i}|^d / \tau_i^d \le C k_n \log n.$$
Assumption 2 On the rate of convergence
The rate of convergence $\epsilon_n$ is bounded below by the two inequalities
$$K\big(p^{(n)}_0, p^{(n)}_{0k_n}\big) \le n\epsilon_n^2, \quad \text{and} \quad V_{p,0}\big(p^{(n)}_0, p^{(n)}_{0k_n}\big) \le \big(n\epsilon_n^2\big)^{p/2}.$$
9 / 21
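As an example (a sketch only; this specific choice is not stated in the talk and the constants $a, b, A, A_1, B, B_1$ are not checked here), one concrete family that plausibly fits Assumption 1 is
$$\lambda_k \propto e^{-k\log k}, \qquad g(t) = \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2} \ (\text{so } d = 2), \qquad \tau_i \equiv 1,$$
since the Gaussian tails give bounds of the form $A e^{-A_1|t|^2} \le g(t) \le B e^{-B_1|t|^2}$, and for $\theta_0$ in a Sobolev ball $\sum_{i \le k_n} \theta_{0i}^2 \le Q_0$, which is certainly $O(k_n \log n)$.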
Assumption 3 On divergences
$K$ and $V_{p,0}$ satisfy
$$K\big(p^{(n)}_{0k_n}, p^{(n)}_\theta\big) \le C\, \frac{n}{2}\, \|\theta_{0k_n} - \theta\|^2, \qquad V_{p,0}\big(p^{(n)}_{0k_n}, p^{(n)}_\theta\big) \le C n^{p/2}\, \|\theta_{0k_n} - \theta\|^p.$$
Assumption 4 On the semimetric $d_n$
There exist $G_0, G > 0$ such that, for any two $\theta, \theta'$,
$$d_n(\theta, \theta') \le C k_n^{G_0} \|\theta - \theta'\|^G.$$
10 / 21
Assumption 5 Test condition
There exist constants $c_1, \zeta > 0$ such that for every $\epsilon > 0$ and for each $\theta_1$ such that $d_n(\theta_1, \theta_0) > \epsilon$, one can construct a test statistic $\phi_n \in [0, 1]$ which satisfies
$$E^{(n)}_0 \phi_n \le e^{-c_1 n \epsilon^2}, \qquad \sup_{d_n(\theta, \theta_1) < \zeta\epsilon} E^{(n)}_\theta (1 - \phi_n) \le e^{-c_1 n \epsilon^2}.$$
11 / 21
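For intuition (a standard Gaussian testing bound, added here as an illustration rather than taken from the talk), in the finite-dimensional normal means model $X \sim N(\theta, I/n)$ with $d_n$ the $\ell^2$ norm, one candidate is the minimum-distance test
$$\phi_n = \mathbf{1}\big\{ \|X - \theta_1\| \le \|X - \theta_0\| \big\},$$
for which $E^{(n)}_0 \phi_n \le e^{-n\|\theta_1 - \theta_0\|^2/8}$ and $\sup_{\|\theta - \theta_1\| \le \|\theta_1 - \theta_0\|/4} E^{(n)}_\theta (1 - \phi_n) \le e^{-n\|\theta_1 - \theta_0\|^2/32}$, which has the exponential form required by Assumption 5.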
Results
Theorem Posterior concentration rate
The rate of convergence of the posterior distribution relative to $d_n$ is $\epsilon_n$:
$$E^{(n)}_0\, \Pi\big( d_n^2(\theta, \theta_0) \ge M \epsilon_n^2 \mid X^{(n)} \big) \to 0.$$
Corollary Risk convergence rate
If the assumptions are satisfied with $p > 2$, and if $d_n$ is bounded, then the integrated posterior risk given $\theta_0$ and $\Pi$ converges at least at the same rate $\epsilon_n$:
$$R^{d_n}_n(\theta_0, \Pi) = E^{(n)}_0\, E^{\Pi}\big[ d_n^2(\theta, \theta_0) \mid X^{(n)} \big] = O(\epsilon_n^2).$$
13 / 21
Suppose the true parameter $\theta_0$ has Sobolev regularity $\beta > 1/2$:
$$\Theta_\beta(Q_0) = \Big\{\theta : \sum_{i=1}^{\infty} \theta_i^2\, i^{2\beta} \le Q_0 < \infty \Big\}.$$
Then the assumption of the following corollary holds in the Gaussian white noise model and in regression. For these models, the rate given in the corollary coincides with the minimax rate (up to a $\log n$ term): in this sense it is adaptive optimal.
14 / 21
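The Sobolev condition is exactly what controls the bias of a $k$-term truncation; the elementary bound below (a one-line check, added here for completeness) is the ingredient that feeds the corollary:
$$\|\theta_0 - \theta_{0k}\|^2 = \sum_{i > k} \theta_{0i}^2 \le k^{-2\beta} \sum_{i > k} \theta_{0i}^2\, i^{2\beta} \le Q_0\, k^{-2\beta}.$$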
Corollary
If $\theta_0 \in \Theta_\beta(Q_0)$ and
$$K\big(p^{(n)}_0, p^{(n)}_{0k_n}\big) \le C n \|\theta_0 - \theta_{0k_n}\|^2, \qquad V_{p,0}\big(p^{(n)}_0, p^{(n)}_{0k_n}\big) \le C n^{p/2} \|\theta_0 - \theta_{0k_n}\|^p,$$
then the rate $\epsilon_n$ is
$$\epsilon_n = \epsilon_0 \Big(\frac{\log n}{n}\Big)^{\frac{\beta}{2\beta+1}}.$$
15 / 21
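Heuristically (a back-of-the-envelope sketch of where the exponent comes from, not the actual proof), the rate balances the squared truncation bias $Q_0 k_n^{-2\beta}$ from the bound above against the effective-dimension term $k_n \log n / n$:
$$Q_0\, k_n^{-2\beta} \asymp \frac{k_n \log n}{n} \;\Longrightarrow\; k_n \asymp \Big(\frac{n}{\log n}\Big)^{\frac{1}{2\beta+1}}, \qquad \epsilon_n^2 \asymp \Big(\frac{\log n}{n}\Big)^{\frac{2\beta}{2\beta+1}}.$$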
White noise model
$$dX^n(t) = f_0(t)\,dt + \frac{1}{\sqrt{n}}\, dW(t), \quad 0 \le t \le 1.$$
By projection onto a basis $(\phi_i)$ (e.g. the Fourier basis), equivalent normal means model
$$X^n_i = \theta_{0i} + \frac{1}{\sqrt{n}}\, \xi_i, \quad i = 1, 2, \ldots$$
Global $L^2$ loss
$$R^{L^2}_n = E^{(n)}_0 \big\| \hat f_n - f_0 \big\|^2 = E^{(n)}_0 \sum_{i=1}^{\infty} \big( \hat\theta_{ni} - \theta_{0i} \big)^2.$$
Pointwise $\ell^2$ loss at a point $t$ (with $a_i = \phi_i(t)$)
$$R^{\ell^2}_n = E^{(n)}_0 \big( \hat f_n(t) - f_0(t) \big)^2 = E^{(n)}_0 \Big( \sum_{i=1}^{\infty} a_i \big( \hat\theta_{ni} - \theta_{0i} \big) \Big)^2.$$
17 / 21
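Conjugacy makes the sieve-prior posterior in this sequence model easy to compute exactly over the model index k. The sketch below is a minimal numerical illustration; its choices (theta_0i = i^{-(beta+1)} so that theta_0 lies in a Sobolev ball, lambda_k proportional to exp(-k log k), g = N(0, tau^2), tau_i = tau, truncation at N coordinates) are assumptions made for the example, not the talk's exact settings.

import numpy as np

rng = np.random.default_rng(1)

# Normal means model X_i = theta_0i + xi_i / sqrt(n), i = 1, ..., N (truncated for computation).
n, N, beta, tau = 1000, 200, 1.0, 1.0
theta0 = np.arange(1.0, N + 1) ** (-(beta + 1.0))   # belongs to a Sobolev ball of smoothness beta
sigma2 = 1.0 / n
X = theta0 + np.sqrt(sigma2) * rng.standard_normal(N)

# Under model k: X_i ~ N(0, tau^2 + sigma2) for i <= k (theta_i integrated out), X_i ~ N(0, sigma2) for i > k.
ks = np.arange(1, N + 1)
v1, v0 = tau**2 + sigma2, sigma2
ll1 = -0.5 * (np.log(2 * np.pi * v1) + X**2 / v1)
ll0 = -0.5 * (np.log(2 * np.pi * v0) + X**2 / v0)
log_marginal = np.cumsum(ll1) + (ll0.sum() - np.cumsum(ll0))
log_weights = log_marginal - ks * np.log(ks)        # add log lambda_k, lambda_k ∝ exp(-k log k)
w = np.exp(log_weights - log_weights.max())
w /= w.sum()                                        # posterior probabilities of each dimension k

# Posterior mean: coordinate i is shrunk by tau^2 / (tau^2 + sigma2) whenever the model includes it (k >= i).
p_active = np.cumsum(w[::-1])[::-1]                 # P(k >= i | X)
theta_hat = (tau**2 / (tau**2 + sigma2)) * X * p_active

print("posterior mode of k:", ks[np.argmax(w)])
print("global squared error:", np.sum((theta_hat - theta0) ** 2))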
Results in the white noise model
We show that the model satisfies Assumptions 1 to 5.
Proposition
Under global loss, concentration and risk rates are adaptive optimal:
$$E^{(n)}_0\, \Pi\big( \|\theta - \theta_0\|^2 \ge M \epsilon_n^2 \mid X^{(n)} \big) \to 0,$$
$$R^{L^2}_n(\theta_0, \Pi) = E^{(n)}_0\, E^{\Pi}\big[ \|\theta - \theta_0\|^2 \mid X^{(n)} \big] = O(\epsilon_n^2).$$
18 / 21
Pointwise loss
Pointwise $\ell^2$ loss does not satisfy Assumption 4. We can show the following lower bound on the rate of the associated risk.
Proposition
Under pointwise loss, a lower bound on the frequentist risk rate is given by
$$\sup_{\theta_0 \in \Theta_\beta(Q_0)} R^{\ell^2}_n(\theta_0, \Pi) \gtrsim \frac{n^{-\frac{2\beta-1}{2\beta+1}}}{\log^2 n}.$$
A globally optimal estimator cannot be pointwise optimal (a result stated by Cai, Low and Zhao, 2007).
There is thus a penalty from global to pointwise loss of (up to a $\log n$ term)
$$n^{\frac{1}{2\beta(2\beta+1)}}.$$
19 / 21
Conclusion
• We have first derived posterior concentration and risk convergence rates for a variety of models that accommodate a sieve prior.
• In a second result we have obtained a lower bound for the frequentist risk under pointwise loss, showing that the sieve prior does not achieve the optimal rate under pointwise loss.
• Further work should focus on the posterior concentration rate under pointwise loss.
21 / 21
