Coverage of credible intervals for monotone regression
Subhashis Ghoshal,
North Carolina State University
BFF Conference, Duke University, April 29, 2019
Based on collaborations with my graduate student Moumita Chakraborty
Introduction
Monotone regression
$Y_i = f(X_i) + \varepsilon_i$, $\varepsilon_i \overset{\mathrm{iid}}{\sim} N(0, \sigma^2)$, $i = 1, \ldots, n$, independent of $X_1, \ldots, X_n \sim G$, where $G$ has a bounded positive density $g$.
$f\colon [0, 1] \to \mathbb{R}$ is monotone.
We wish to make inference on $f$. More specifically, we construct a credible interval for $f(x_0)$ at a fixed point $x_0$ with guaranteed frequentist coverage.
For the theoretical study, the errors are only assumed to be sub-Gaussian with mean zero.
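As a concrete point of reference, here is a minimal sketch (not from the talk) of simulating data from this model; taking $G$ to be uniform on $[0, 1]$ and using the logistic-type $f_0$ from the simulation section are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def f0(x):
    # an illustrative increasing regression function on [0, 1]
    return np.exp(x - 0.5) / (1 + np.exp(x - 0.5))

n, sigma = 500, 1.0
X = rng.uniform(size=n)                 # X_1, ..., X_n ~ G = Uniform(0, 1)
Y = f0(X) + sigma * rng.normal(size=n)  # Y_i = f(X_i) + eps_i, eps_i ~ N(0, sigma^2)
```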
To construct a prior for $f$ with monotonicity in mind, use a finite random series of step functions:
\[ f(x) = \sum_{j=1}^{J} \theta_j \, \mathbf{1}\{\xi_{j-1} < x \le \xi_j\}, \]
where $0 = \xi_0 < \xi_1 < \cdots < \xi_{J-1} < \xi_J = 1$ are the knots, $\theta_1, \ldots, \theta_J$ are the coefficients, and $J$ is the number of terms.
For the present purpose, it suffices to work with a deterministic choice of $J$ and equispaced knots.
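A small helper (an illustrative sketch, not the authors' code) for evaluating such a step function with equispaced knots $\xi_j = j/J$:

```python
import numpy as np

def step_function(theta, x):
    """Evaluate f(x) = sum_j theta_j 1{xi_{j-1} < x <= xi_j} with knots xi_j = j/J."""
    theta = np.asarray(theta)
    J = len(theta)
    # x in ((j-1)/J, j/J] falls in bin j, i.e. 0-based index j - 1; x = 0 goes to bin 1
    j = np.clip(np.ceil(np.asarray(x) * J).astype(int) - 1, 0, J - 1)
    return theta[j]
```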
Prior
We need to put a prior on $\theta = (\theta_1, \ldots, \theta_J)$.
The conjugate prior for $\theta$ is normal; it can be chosen as a product of independent normals.
This does not give monotone functions: $f = \sum_{j=1}^{J} \theta_j \mathbf{1}\{\xi_{j-1} < x \le \xi_j\}$ is monotone if and only if $\theta_1 \le \cdots \le \theta_J$.
A simple way to achieve monotonicity is to restrict the normal prior to the cone $\{\theta : \theta_1 \le \cdots \le \theta_J\}$.
Technically this is also conjugate, but it is not easy to work with theoretically (and, to some extent, computationally).
Projection posterior
“If the model space is too complicated for direct prior imposition, posterior computation, and theoretical analysis, a prior is put on a larger space and the posterior is projected onto the desired subset.”
In this context, samples can be generated from the conjugate normal posterior and then isotonized to $\theta^*_1 \le \cdots \le \theta^*_J$: minimize
\[ \sum_{j=1}^{J} N_j (\theta^*_j - \theta_j)^2 \quad \text{subject to } \theta^*_1 \le \cdots \le \theta^*_J, \qquad N_j = \sum_{i=1}^{n} \mathbf{1}\{\xi_{j-1} < X_i \le \xi_j\}. \]
There are efficient algorithms for this purpose, such as PAVA (discussed later); a sketch of the full sampling step follows below.
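The following sketch draws once from the projection posterior, assuming independent $N(0, \tau^2)$ priors on the $\theta_j$; the prior specification and the use of scikit-learn's IsotonicRegression for the weighted projection are illustrative choices, not the authors' implementation.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def projection_posterior_draw(X, Y, J, sigma, tau=10.0, rng=None):
    """One draw of (theta*_1, ..., theta*_J) from the projection posterior."""
    rng = np.random.default_rng() if rng is None else rng
    bins = np.clip(np.ceil(X * J).astype(int) - 1, 0, J - 1)  # bin index of each X_i
    N = np.bincount(bins, minlength=J)                        # bin counts N_j
    S = np.bincount(bins, weights=Y, minlength=J)             # sum of Y_i in bin j
    prec = N / sigma**2 + 1.0 / tau**2                        # posterior precisions
    theta = rng.normal(S / sigma**2 / prec, 1.0 / np.sqrt(prec))  # unrestricted draw
    # weighted least-squares projection onto {theta_1 <= ... <= theta_J}
    w = np.maximum(N, 1e-6)  # small weight keeps empty bins harmless
    return IsotonicRegression().fit_transform(np.arange(J), theta, sample_weight=w)
```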
Why is the projection posterior good?
If the unrestricted posterior concentrates, then so does the projection posterior, around a monotone function:
\[ \sum_{j=1}^{J} |\theta^*_j - \theta_{0j}| \le \sum_{j=1}^{J} |\theta^*_j - \theta_j| + \sum_{j=1}^{J} |\theta_j - \theta_{0j}| \le 2 \sum_{j=1}^{J} |\theta_j - \theta_{0j}|, \]
implying that $\|f^* - f_0\|_1 \le 2\,\|f - f_0\|_1$.
Thus the accuracy of the projection posterior is inherited from the unrestricted posterior, and hence it suffices to study the latter.
Why do step functions give a good approximation?
Approximation properties of step functions:
To approximate a monotone function using $J$ steps, the optimal approximation error $J^{-1}$ is attained
- using equal-length intervals, in terms of the $L_1$-distance;
- using varying intervals, for the $L_p$-distance (equal intervals give a suboptimal $J^{-1/p}$ rate).
The coefficients can be chosen to be monotone.
No additional smoothness is needed. (A quick numerical check of the $L_1$ rate follows below.)
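As a quick numerical illustration of the $J^{-1}$ rate (not from the talk), take $f(x) = x$: on each equal-length interval the $L_1$-optimal constant is the midpoint value, giving total $L_1$ error $1/(4J)$.

```python
import numpy as np

def l1_error(J, m=100_000):
    x = (np.arange(m) + 0.5) / m       # fine grid on [0, 1]
    mid = (np.floor(x * J) + 0.5) / J  # best (midpoint) step value on each interval
    return np.mean(np.abs(x - mid))    # Riemann approximation of the L1 distance

for J in (10, 20, 40, 80):
    print(J, J * l1_error(J))          # J * error stays near 1/4, confirming the rate
```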
Handling unknown σ
Unknown $\sigma$ can be handled by a plug-in or a fully Bayes approach.
Integrating out $\theta$ by conjugacy, an explicit expression for the marginal likelihood of $\sigma$ is obtained, and the MLE of $\sigma$ can be shown to be consistent.
In the fully Bayes approach, $\sigma^2$ is given a conjugate inverse-gamma prior. This leads to a consistent posterior for $\sigma$.
Effectively, $\sigma$ can almost be treated as known.
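For instance, in a Gibbs sampler the conditional update of $\sigma^2$ given the step function is a standard inverse-gamma computation; the sketch below uses hypothetical hyperparameters $a_0, b_0$ and shows this conditional step rather than the marginal-likelihood route.

```python
import numpy as np

def sample_sigma2(residuals, a0=2.0, b0=1.0, rng=None):
    """Draw sigma^2 ~ IG(a0 + n/2, b0 + RSS/2) given residuals Y_i - f(X_i)."""
    rng = np.random.default_rng() if rng is None else rng
    shape = a0 + 0.5 * len(residuals)
    scale = b0 + 0.5 * np.sum(np.asarray(residuals) ** 2)
    return scale / rng.gamma(shape)  # inverse-gamma draw via 1 / Gamma(shape, 1)
```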
Computational algorithm
slogcm operator
The solution $\theta^*_j$ of the isotonization problem is given by the slope of the left derivative of the greatest convex minorant (slogcm) of the cumulative sum diagram
\[ \Big\{(0, 0),\ \Big(n^{-1}\sum_{k=1}^{j} N_k,\ n^{-1}\sum_{k=1}^{j} N_k \bar{Y}_k\Big)_{j=1}^{J}\Big\} \]
at the point $\sum_{k=1}^{j} N_k/n$, where $\bar{Y}_k$ is the average of the responses in the $k$-th bin.
At a point $x_0$,
\[ f^*(x_0) = \operatorname{slogcm}\Big\{(0, 0),\ \Big(n^{-1}\sum_{k=1}^{j} N_k,\ n^{-1}\sum_{k=1}^{j} N_k \bar{Y}_k\Big)_{j=1}^{J}\Big\}\big(\lceil x_0 J\rceil / J\big). \]
Pool adjacent violators algorithm
The greatest convex minorant of a cumulative sum diagram is popularly obtained by the pool adjacent violators algorithm (PAVA).
PAVA algebraically describes a method of successive approximation to the GCM in $O(n)$ time.
Whenever it sees a violation of monotonicity between two adjacent points (blocks), it pools them and replaces both by the same weighted average.
It works for both ordinary and weighted sums of squares, as well as several other convex criteria. (A compact implementation follows below.)
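A compact sketch of weighted PAVA (illustrative; assumes positive weights):

```python
import numpy as np

def pava(y, w):
    """Weighted isotonic regression: minimize sum_i w_i (m_i - y_i)^2, m nondecreasing."""
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # pool the last two blocks as long as they violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(w1 * m1 + w2 * m2) / wt, wt, c1 + c2])
    return np.concatenate([[m] * c for m, _, c in blocks])
```

For example, `pava([1, 3, 2, 4], [1, 1, 1, 1])` returns `[1, 2.5, 2.5, 4]`: the violating pair (3, 2) is pooled into its average.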
Coverage of credible intervals
Credibility and coverage
Coverage is $E[\mathbf{1}(\theta \in S(X)) \mid \theta]$.
Credibility is $E[\mathbf{1}(\theta \in S(X)) \mid X]$.
Both are some sort of “projections” of the actual quantity of interest, $\mathbf{1}(\theta \in S(X))$.
Together, credibility and coverage may give a much more complete picture if they are in close agreement.
In parametric problems, they often agree in view of the Bernstein-von Mises theorem.
Even second-order matching is possible with probability matching priors (Jeffreys' prior if there is no nuisance parameter).
In curve estimation with optimal smoothing, the coverage of a credible set may be arbitrarily low [Cox (1993)].
This is usually addressed by undersmoothing or by inflating the credible region.
Coverage of pointwise credible interval
Fix $x_0 \in (0, 1)$ where $f_0'(x_0) > 0$.
Consider the projection posterior, given by the distribution of $f^*(x_0)$, where $f^*$ is the isotonization of $f$.
Consider a $(1-\alpha)$-credible interval (around the mean or median, or equal-tailed). Does its coverage go to $1-\alpha$?
If there were a Bernstein-von Mises type theorem,
\[ \Pi\big(n^{1/3}(f^*(x_0) - \hat{f}(x_0)) \le z \mid \mathbb{D}_n\big) \to_p H(z) \quad \text{and} \quad P\big(n^{1/3}(f_0(x_0) - \hat{f}(x_0)) \le z \mid f_0\big) \to H(z) \]
for some suitable estimator $\hat{f}$, then the coverage would approach $1-\alpha$, and the posterior median (the mean too, if $H$ is symmetric) would be asymptotically equivalent to $\hat{f}(x_0)$.
Centering estimator
What is the right choice of $\hat{f}$?
The common estimator is the MLE: minimize $\sum_{i=1}^{n}(Y_i - f(X_i))^2$ subject to monotonicity. The solution is the isotonization of the pairs $(X_i, Y_i)$, $i = 1, \ldots, n$.
This has a limiting distribution under centering at $f_0(x_0)$ and scaling by $n^{1/3}$, known as the Chernoff distribution: the distribution of the argmin of a two-sided Brownian motion with a parabolic drift.
The key technique in the proof is the “switch relation”: for a lower semicontinuous function $\Phi$ on an interval $I$, for all $t \in I$ and $v \in \mathbb{R}$, $\operatorname{slogcm}(\Phi)(t) > v$ if and only if $\operatorname{argmin}^{+}_{s \in I}(\Phi(s) - vs) < t$, with a mirrored statement for the right derivative.
Brownian motion enters through the Donsker theorem for a local empirical process.
The Chernoff distribution is symmetric and has tails sharper than the normal.
Sieve MLE
But this estimator's structure is hardly similar to that of the prior/posterior.
Modify to the sieve MLE: minimize $\sum_{i=1}^{n}(Y_i - f(X_i))^2$ subject to $f = \sum_{j=1}^{J} \theta_j \mathbf{1}_{I_j}$, $\theta_1 \le \cdots \le \theta_J$. (A sketch follows below.)
With some tweaking, for any choice $n^{1/3} \ll J \ll n^{2/3}$, the normalized sieve MLE also has the same asymptotic Chernoff distribution $CZ$, where $Z$ is a standard Chernoff variable, $C = 2b(a/b)^{2/3}$, $a = \sigma^2/g(x_0)$, $b = f_0'(x_0)\,g(x_0)/2$.
Let $f^* = \sum_{j=1}^{J} \theta^*_j \mathbf{1}_{I_j}$ be the isotonization of a random draw from the unrestricted posterior, so that the distribution of $f^*$ is the projection posterior.
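Concretely, the sieve MLE is the weighted isotonic regression of the within-bin means $\bar{Y}_j$ with weights $N_j$; a sketch (scikit-learn used as an illustrative tool):

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def sieve_mle(X, Y, J):
    """Sieve MLE: isotonic regression of bin means Ybar_j with weights N_j."""
    bins = np.clip(np.ceil(X * J).astype(int) - 1, 0, J - 1)
    N = np.bincount(bins, minlength=J).astype(float)
    ybar = np.bincount(bins, weights=Y, minlength=J) / np.maximum(N, 1.0)
    occupied = np.flatnonzero(N > 0)
    iso = IsotonicRegression(out_of_bounds="clip")
    iso.fit(occupied, ybar[occupied], sample_weight=N[occupied])
    return iso.predict(np.arange(J))  # theta_hat_1 <= ... <= theta_hat_J
```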
No Bernstein-von Mises type theorem
Theorem.
Let $W_1, W_2$ be independent two-sided Brownian motions on $\mathbb{R}$ with $W_1(0) = W_2(0) = 0$, $Z_2 = \arg\min\{W_1(t) + W_2(t) + t^2 : t \in \mathbb{R}\}$, $Z_1 = \arg\min\{W_1(t) + t^2 : t \in \mathbb{R}\}$, and $C = 2b(a/b)^{2/3}$ with $a = \sigma_0^2/g(x_0)$ and $b = f_0'(x_0)/2$.
(a) For every $z \in \mathbb{R}$, $P_0(n^{1/3}(\hat{f}_n(x_0) - f_0(x_0)) \le z) \to P(CZ_1 \le z)$.
(b) For every $z \in \mathbb{R}$, $P_0(n^{1/3}(f^*(x_0) - f_0(x_0)) \le z) \to P(CZ_2 \le z)$.
(c) The conditional process $z \mapsto \Pi(n^{1/3}(f^*(x_0) - \hat{f}_n(x_0)) \le z \mid \mathbb{D}_n)$ does not have a limit in probability.
[Figure: three panels (Case 1, Case 2, Case 3) demonstrating that $\Pi(n^{1/3}(f^*(x_0) - \hat{f}_n(x_0)) \le 0 \mid \mathbb{D}_n)$ does not have a limit in probability, using sample size $n = 2000$.]
Coverage of credible intervals
Let
\[ F^*_n(z \mid \mathbb{D}_n) = \Pi\big(n^{1/3}(f^*(x_0) - f_0(x_0)) \le z \mid \mathbb{D}_n\big), \qquad F^*_{a,b}(z \mid W_1) = P\Big(2b(a/b)^{2/3} \arg\min_{t \in \mathbb{R}} V(t) \le z \,\Big|\, W_1\Big), \]
where $V(t) = W_1(t) + W_2(t) + t^2$. For every $n \ge 1$ and $\gamma \in [0, 1]$, define
\[ Q_{n,\gamma} = \inf\{z \in \mathbb{R} : \Pi(f^*(x_0) \le z \mid \mathbb{D}_n) \ge 1 - \gamma\}, \qquad I_{n,\gamma} = [Q_{n,1-\gamma/2},\, Q_{n,\gamma/2}], \]
\[ \Delta^*_{W_1,W_2} = \arg\min_{t \in \mathbb{R}}\{W_1(t) + W_2(t) + t^2\}. \]
We are primarily interested in $P_0(f_0(x_0) \in I_{n,\gamma})$, the coverage of the credible interval.
Limiting coverage
Theorem.
(a) For every $z \in \mathbb{R}$, $F^*_n(z \mid \mathbb{D}_n) \rightsquigarrow F^*_{a,b}(z \mid W_1)$ (weak convergence);
(b) the distribution of $F^*_{a,b}(0 \mid W_1)$ is symmetric about $1/2$;
(c) the limiting coverage of $I_{n,\gamma}$ is characterized as
\[ P_0(f_0(x_0) \in I_{n,\gamma}) \to P\Big(\frac{\gamma}{2} \le P(\Delta^*_{W_1,W_2} \ge 0 \mid W_1) \le 1 - \frac{\gamma}{2}\Big). \]
Thus the Bayesian and frequentist distributions do not exactly tally, meaning that the asymptotic coverage is not $1-\alpha$; but how do they compare?
For every $\alpha, \gamma \in [0, 1]$, define
\[ A(\gamma) = P\big(P(\Delta^*_{W_1,W_2} \ge 0 \mid W_1) \le \gamma\big), \qquad \gamma(\alpha) = 2A^{-1}(\alpha/2). \]
The theorem thus says that the limiting coverage of the $(1-\gamma)$-credible interval $I_{n,\gamma}$ is $1 - 2A(\gamma/2)$, which depends only on $\gamma$.
It does not match the nominal level $(1-\gamma)$, but something remarkable happens with a recalibration: if the target coverage is $(1-\alpha)$, instead of starting with a $(1-\alpha)$-credible interval, start with a $(1-\gamma)$-credible interval, where $A(\gamma/2) = \alpha/2$. Then the limiting coverage $(1-\alpha)$ is attained exactly.
Unlike a confidence interval based on the MLE, there is no need to estimate nuisance parameters. (A Monte Carlo sketch of the recalibration follows below.)
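A coarse Monte Carlo sketch of the recalibration (not the authors' code): the two-sided Brownian motions are discretized on a grid, and the truncation $T$, step $h$, and Monte Carlo sizes are arbitrary illustrative choices, so the output is only roughly accurate.

```python
import numpy as np

rng = np.random.default_rng(1)
T, h = 3.0, 0.01
t = np.arange(-T, T + h, h)
zero = np.argmin(np.abs(t))  # index of the grid point t = 0

def two_sided_bm(size):
    """Simulate `size` two-sided Brownian motion paths on the grid t, with W(0) = 0."""
    inc = rng.normal(scale=np.sqrt(h), size=(size, len(t) - 1))
    W = np.zeros((size, len(t)))
    W[:, zero + 1:] = np.cumsum(inc[:, zero:], axis=1)
    W[:, :zero] = -np.cumsum(inc[:, :zero][:, ::-1], axis=1)[:, ::-1]
    return W

n1, n2 = 200, 200  # outer (W1) and inner (W2) Monte Carlo sizes
p = np.empty(n1)
for i, W1 in enumerate(two_sided_bm(n1)):
    W2 = two_sided_bm(n2)
    loc = t[np.argmin(W1 + W2 + t**2, axis=1)]  # argmin locations Delta*_{W1,W2}
    p[i] = np.mean(loc >= 0)                    # estimate of P(Delta* >= 0 | W1)

# A(gamma) = P(p <= gamma), so A^{-1}(alpha/2) is the (alpha/2)-quantile of p
def gamma_for(alpha):
    return 2 * np.quantile(p, alpha / 2)        # gamma(alpha) = 2 A^{-1}(alpha/2)

print(1 - gamma_for(0.05))  # should be roughly 0.93, cf. the recalibration table
```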
Recalibration table
1 − α 0.900 0.920 0.940 0.950 0.960 0.970 0.980 0.990
1 − γ 0.874 0.897 0.922 0.934 0.946 0.960 0.973 0.986
Simulation based, hence of limited accuracy.
This is therefore a reverse Cox phenomenon: we have to shrink a credible interval to get nominal coverage.
Main idea of the proof
How is the weak convergence assertion used?
\begin{align*}
P_0(f_0(x_0) \le Q_{n,\gamma})
&= P_0\big(\Pi(f^*(x_0) \le f_0(x_0) \mid \mathbb{D}_n) \le 1 - \gamma\big) \\
&= P_0\big(\Pi(n^{1/3}(f^*(x_0) - f_0(x_0)) \le 0 \mid \mathbb{D}_n) \le 1 - \gamma\big) \\
&= P_0\big(F^*_n(0 \mid \mathbb{D}_n) \le 1 - \gamma\big) \\
&\to P\big(F^*_{a,b}(0 \mid W_1) \le 1 - \gamma\big) \\
&= P\Big(P\big(C \arg\min_{t \in \mathbb{R}}\{W_1(t) + W_2(t) + t^2\} \ge 0 \mid W_1\big) \le 1 - \gamma\Big) \\
&= P\big(P(\Delta^*_{W_1,W_2} \ge 0 \mid W_1) \le 1 - \gamma\big).
\end{align*}
Simulation
Coverage
Consider data generated from the regression function $f_0(x) = e^{x-0.5}/(1 + e^{x-0.5})$, with $G$ uniform on $[0, 1]$.
Choose $x_0 = 0.5$, $\sigma = 1$, $J \approx n^{1/3}\log n$.
Take $n = 500, 1000, 1500, 2000$.
For each $n$, 1000 Monte Carlo samples are used. (A sketch of the experiment loop follows below.)
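A sketch of the experiment loop (illustrative; it reuses `f0` and `projection_posterior_draw` from the earlier sketches, and the recalibrated level $1 - \gamma = 0.934$ for a 95% target, as in the table above).

```python
import numpy as np

def coverage(n=500, n_mc=200, n_post=500, gamma=0.066, x0=0.5, sigma=1.0, seed=0):
    """Monte Carlo coverage of the equal-tailed (1 - gamma)-credible interval at x0."""
    rng = np.random.default_rng(seed)
    J = int(round(n ** (1 / 3) * np.log(n)))  # J ~ n^{1/3} log n
    hits = 0
    for _ in range(n_mc):
        X = rng.uniform(size=n)               # G uniform on [0, 1]
        Y = f0(X) + sigma * rng.normal(size=n)
        j0 = min(int(np.ceil(x0 * J)) - 1, J - 1)  # bin containing x0
        draws = np.array([projection_posterior_draw(X, Y, J, sigma, rng=rng)[j0]
                          for _ in range(n_post)])
        lo, hi = np.quantile(draws, [gamma / 2, 1 - gamma / 2])
        hits += (lo <= f0(x0) <= hi)
    return hits / n_mc
```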
Table: Comparison of coverage and average length of uncorrected ($C_B$, $L_B$) and corrected ($C^*_B$, $L^*_B$) Bayesian credible intervals and the confidence interval based on the MLE ($C_F$, $L_F$).

              n = 500                                         n = 1000
1 − α   C_B(α)  L_B(α)  C*_B(α)  L*_B(α)  C_F(α)  L_F(α)   C_B(α)  L_B(α)  C*_B(α)  L*_B(α)  C_F(α)  L_F(α)
0.99    0.994   0.48    0.983    0.43     0.992   0.54     0.996   0.39    0.991    0.35     0.986   0.43
0.95    0.958   0.38    0.935    0.35     0.957   0.42     0.967   0.30    0.951    0.28     0.952   0.34
0.90    0.911   0.32    0.893    0.30     0.907   0.36     0.929   0.26    0.900    0.24     0.891   0.28

              n = 1500                                        n = 2000
1 − α   C_B(α)  L_B(α)  C*_B(α)  L*_B(α)  C_F(α)  L_F(α)   C_B(α)  L_B(α)  C*_B(α)  L*_B(α)  C_F(α)  L_F(α)
0.99    0.994   0.34    0.984    0.31     0.989   0.38     0.996   0.31    0.988    0.28     0.993   0.34
0.95    0.967   0.27    0.949    0.25     0.953   0.29     0.968   0.25    0.939    0.23     0.951   0.27
0.90    0.914   0.23    0.894    0.21     0.905   0.25     0.914   0.21    0.895    0.19     0.889   0.23
Thank you