COMMUNICATIONS IN STATISTICS—THEORY AND METHODS
High-dimensional posterior consistency of the Bayesian lasso
Shibasish Dasgupta
Department of Mathematics & Statistics, University of South Alabama, Mobile, AL, USA
ARTICLE HISTORY: Received February; Accepted September

KEYWORDS: Bayesian lasso; high-dimensional variable selection; orthogonal design; posterior consistency
ABSTRACT

This paper considers posterior consistency in the context of high-dimensional variable selection using the Bayesian lasso algorithm. In a frequentist setting, consistency is perhaps the most basic property that we expect any reasonable estimator to achieve. However, in a Bayesian setting, consistency is often ignored or taken for granted, especially in more complex hierarchical Bayesian models. In this paper, we have derived sufficient conditions for posterior consistency in the Bayesian lasso model with the orthogonal design, where the number of parameters grows with the sample size.
1. Introduction
The simple problem of regressing a response vector of $n$ observations on $p$ covariates is among
the first encountered by any student of statistics, and regression models have practical applications in virtually any conceivable field of study. More specifically, regression under the
Bayesian paradigm has grown in popularity in recent decades with the rapid acceleration of
computing power and the continuing development of Markov chain Monte Carlo numerical
integration techniques. The behavior and theoretical properties of these Bayesian regression
models have therefore become important topics of study.
1.1. Interpretation of posterior consistency
This basic formulation of regression lends itself easily to a Bayesian analysis in which a prior is placed on the unknown coefficient vector $\beta$ and variance $\sigma^2$. Applying Bayes' theorem and integrating out the nuisance parameter $\sigma^2$ yields the marginal posterior of $\beta$, which is the primary object on which Bayesian inference is based. Call this entire Bayesian model $P_M$. However, suppose that the data are actually generated under some model $P_0$ comprising the same likelihood as $P_M$, but with some fixed but unknown parameter values $\beta_0$ and $\sigma_0^2$. One would hope that as the sample size tends to infinity, the marginal posterior of $\beta$ converges to degeneracy at $\beta_0$ almost surely under $P_0$. We call this property posterior consistency, and the obvious question is to determine the values of $\beta_0$ and $\sigma_0^2$ for which posterior consistency occurs. In this simplest possible setup, it is easily verified that posterior consistency occurs as long as basic regularity assumptions about the design matrix hold.
CONTACT Shibasish Dasgupta (shibasish.dasgupta@gmail.com), Department of Mathematics & Statistics, University of South Alabama, Mobile, AL, USA.
© Taylor & Francis Group, LLC
In Bayesian analysis, one starts with some prior knowledge (sometimes imprecise)
expressed as a distribution on the parameter space and updates the knowledge according
to the posterior distribution given the data. It is therefore of utmost importance to know
whether the updated knowledge becomes more and more accurate and precise as data are
collected indefinitely. This requirement is called the consistency of the posterior distribution.
Although it is an asymptotic property, consistency is a basic benchmark, since its violation is clearly undesirable and one may have serious doubts about inferences based on an inconsistent posterior distribution.
1.2. Formal definition and choice of vector norm
As was previously stated, the notion of posterior consistency considered herein is the convergence of the posterior distribution of $\beta$ to degeneracy at $\beta_0$ with $P_0$-probability 1. We now state a formal definition of posterior consistency.

Definition. Let $\beta_{0n} \in \mathbb{R}^{p_n}$ for each $n \geq 1$ and $\sigma_0^2 > 0$. Now let $P_0$ denote the distribution of $\{\hat{\beta}_n, n \geq 1\}$ under the model $y_n = X_n \beta_{0n} + e_n$ for each $n \geq 1$, where $e_n \sim N_n(0_n, \sigma_0^2 I_n)$ for each $n \geq 1$. The sequence of posterior distributions $P_M(\beta_n \mid \hat{\beta}_n)$ is said to be consistent at $\{(\beta_{0n}, \sigma_0^2), n \geq 1\}$ if $P_M(\|\beta_n - \beta_{0n}\|_\infty > \epsilon \mid \hat{\beta}_n) \to 0$ a.s. $(P_0)$ for every $\epsilon > 0$.
The choice of the $\ell_\infty$ norm in our definition of posterior consistency warrants some discussion. In the case where the number of covariates $p$ is fixed, it is clear that the particular choice of vector norm is irrelevant, since the $\ell_\infty$ norm could be replaced by any other $\ell_r$ norm, $1 \leq r < \infty$, and the definition would still be equivalent. However, the distinction becomes relevant when $p$ tends to infinity at some rate along with the sample size, in which case $p$, $\beta$, and $\beta_0$ become $p_n$, $\beta_n$, and $\beta_{0n}$. If we wish to allow $p_n$ to grow in proportion to $n$, then the conventional $\ell_2$ norm, defined as $\|x\|_2 = (\sum_{i=1}^{p_n} x_i^2)^{1/2}$, makes posterior consistency unreasonably difficult to achieve. As justification, note that under the $\ell_2$ norm, even the MLE itself fails to achieve classical frequentist consistency. Thus, we instead consider posterior consistency under the $\ell_\infty$ norm $\|x\|_\infty = \max_{1 \leq i \leq p_n} |x_i|$.
The following lemma and corollary illustrate why the $\ell_2$ norm is not sufficiently flexible for our purposes.
Lemma 1. Let $Z_n \sim N_{p_n}(0_{p_n}, n^{-1} V_n)$, where $p_n < n$, and where the eigenvalues $\omega_{n,1}, \ldots, \omega_{n,p_n}$ of $V_n$ satisfy $0 < \omega_{\min} \leq \inf_{n,i} \omega_{n,i} \leq \sup_{n,i} \omega_{n,i} \leq \omega_{\max} < \infty$ for some $\omega_{\min}$ and $\omega_{\max}$. Then $\|Z_n\|_2 \to 0$ almost surely if and only if $p_n/n \to 0$.

Proof. Note that $\mathrm{Var}(Z_{n,i}) = n^{-1} V_{n,ii} \leq n^{-1} \omega_{\max}$, and $n^{1/2} V_{n,ii}^{-1/2} Z_{n,i} \sim N(0, 1)$. Now let $U_n = n \sum_{i=1}^{p_n} V_{n,ii}^{-1} Z_{n,i}^2$, so that $U_n \sim \chi^2_{p_n}$. By the properties of the chi-squared distribution, $U_n/n \to 0$ almost surely if and only if $p_n/n \to 0$. Then since
$$\frac{\omega_{\min} U_n}{n} \leq \|Z_n\|_2^2 \leq \frac{\omega_{\max} U_n}{n},$$
it follows that $\|Z_n\|_2 \to 0$ almost surely if and only if $p_n/n \to 0$.
Corollary 1. $\|\hat{\beta}_n - \beta_{0n}\|_2 \to 0$ a.s. $(P_0)$ if and only if $p_n/n \to 0$.

Proof. Apply Lemma 1 under $P_0$ with $Z_n = \hat{\beta}_n - \beta_{0n}$ and $V_n = \sigma_0^2 (\frac{1}{n} X_n^T X_n)^{-1}$.
As is clear from Corollary 1, not even the MLE $\hat{\beta}_n$ achieves almost sure consistency under the $\ell_2$ norm when $p_n$ grows at the same rate as $n$. Thus, any attempt to establish posterior consistency under the $\ell_2$ norm of a Bayesian regression model under the same circumstances would be futile. However, the following lemma and corollary motivate the choice of the $\ell_\infty$ norm instead.
Lemma 2. Let $Z_n \sim N_{p_n}(0_{p_n}, n^{-1} V_n)$, where $p_n < n$, and where the eigenvalues $\omega_{n,1}, \ldots, \omega_{n,p_n}$ of $V_n$ satisfy $\sup_{n,i} \omega_{n,i} \leq \omega_{\max} < \infty$ for some $\omega_{\max}$. Then $\|Z_n\|_\infty \to 0$ almost surely.

Proof. Let $\epsilon > 0$. Note that $\mathrm{Var}(Z_{n,i}) = n^{-1} V_{n,ii} \leq n^{-1} \omega_{\max}$, and $n^{1/2} V_{n,ii}^{-1/2} Z_{n,i} \sim N(0, 1)$. Then
$$\sum_{n=1}^{\infty} P(\|Z_n\|_\infty > \epsilon) = \sum_{n=1}^{\infty} P\Big(\max_{1 \leq i \leq p_n} |Z_{n,i}| > \epsilon\Big) \leq \sum_{n=1}^{\infty} \sum_{i=1}^{p_n} P\Big(n^{1/2} V_{n,ii}^{-1/2} |Z_{n,i}| > \epsilon \,(n^{-1} V_{n,ii})^{-1/2}\Big)$$
$$\leq \sum_{n=1}^{\infty} \sum_{i=1}^{p_n} P\Big(n^{1/2} V_{n,ii}^{-1/2} |Z_{n,i}| > \epsilon\, \omega_{\max}^{-1/2} n^{1/2}\Big) \leq \sum_{n=1}^{\infty} \sum_{i=1}^{p_n} \frac{15\, \omega_{\max}^3}{\epsilon^6 n^3} < \infty,$$
by applying Markov's inequality to $n^3 V_{n,ii}^{-3} Z_{n,i}^6$ (recalling that $E(W^6) = 15$ for $W \sim N(0,1)$), and the result follows from the Borel-Cantelli lemma, noting that $p_n < n$.
Corollary 2. $\|\hat{\beta}_n - \beta_{0n}\|_\infty \to 0$ a.s. $(P_0)$.

Proof. Apply Lemma 2 under $P_0$ with $Z_n = \hat{\beta}_n - \beta_{0n}$ and $V_n = \sigma_0^2 (\frac{1}{n} X_n^T X_n)^{-1}$.
Although Corollaries 1 and 2 are not posterior consistency results per se, they nonetheless demonstrate the added flexibility that can arise from the use of the $\ell_\infty$ norm instead of the $\ell_2$ norm when proving consistency results. For this reason, we choose to work with the $\ell_\infty$ norm throughout our work.
1.3. Conditional independence prior and the Bayesian lasso
The most common approach to prior specification in Bayesian regression models is to first place a prior on $\sigma^2$, and to then place a prior on $\beta \mid \sigma^2$ such that the prior variance of $\beta \mid \sigma^2$ is proportional to $\sigma^2$. The conjugate choice for the prior on $\sigma^2$ is the inverse gamma with shape parameter $a/2$ and scale parameter $b/2$ (the factor of $1/2$ is included for later convenience), where $a, b > 0$. One may also wish to use an improper prior proportional to $1/\sigma^2$, $1/\sigma$, or $1$, but these improper priors can be seen to have the same basic form as inverse gamma densities if the parameter restrictions are relaxed to $a \geq -2$ and $b \geq 0$.
Although there exist various prior structures for the coefficient vector with interesting properties and applications, perhaps the most obvious alternative is to simply replace $(X_n^T X_n)^{-1}$ in the prior variance of $\beta \mid \sigma^2$ with some diagonal matrix $D_\tau = \mathrm{diag}(\tau_1^2, \ldots, \tau_p^2)$, yielding
$$\beta \mid \sigma^2 \sim N_p(\gamma, \sigma^2 D_\tau).$$
Thus, the components of $\beta$ are independent a priori when conditioned on $\sigma^2$. The values of $\tau_1^2, \ldots, \tau_p^2$ can be taken as fixed, or they can be set equal to a common value $\tau^2$ which is then estimated through an empirical Bayesian approach. However, the most important application of this model is the extension to a hierarchical model in which $\tau_1^2, \ldots, \tau_p^2$ are assigned independent exponential priors with common rate parameter $\lambda^2/2$. As noted by Park and Casella (2008), this formulation leads to a Bayesian version of the lasso of Tibshirani (1996) if the point estimate of the coefficient vector $\beta$ is taken to be its posterior mode. Park and Casella observe that the resulting Bayesian lasso typically yields results quite similar to those of the ordinary lasso, but with the advantage of automatic interval estimates for all parameters via any of the usual constructions of Bayesian credible intervals. Of course, this still leaves the question of how to specify the parameter $\lambda$. Casella (2001) examines the replacement of $\lambda$ with an empirical Bayes estimate $\hat{\lambda}_{EB}$ derived by maximizing the marginal likelihood of $\lambda$. Alternatively, the hierarchical structure can be extended further by specifying a prior on $\lambda$, though Park and Casella advise caution here, as seemingly innocuous improper priors such as $1/\lambda^2$ can lead to impropriety of the posterior. Further discussion of Bayesian lasso methods can be found in Kyung et al. (2010).
A slight but significant modification of the above structure is to take $\beta$ and $\sigma^2$ to be a priori independent, removing the dependence on $\sigma^2$ from the prior given above for the coefficient vector $\beta$. However, Park and Casella (2008) show that this unconditional prior can easily lead to a bimodal posterior on $(\beta, \sigma^2)$. In contrast, they show that the conditional prior always leads to a unimodal posterior as long as $\sigma^2 \sim \text{Inverse-Gamma}(a/2, b/2)$, where we permit $a \geq -2$ and $b \geq 0$, as before.
Moreover, Kyung et al. (2010) illustrate other lasso-type penalized regression schemes that
can be represented through hierarchical extensions of the conditional independence prior. In
addition to Tibshirani’s original lasso, both the group lasso of Yuan and Lin (2006) and the
elastic net of Zou and Hastie (2005) can be represented in this fashion. A general examination
of posterior consistency under hierarchical extensions of the conditional independence prior
could provide conditions under which these lasso-type regression techniques are consistent
in the frequentist sense.
1.4. Shrinkage priors
Shrinkage estimation through continuous priors (Griffin and Brown, 2007; Park and Casella, 2008; Hans, 2009; Carvalho et al., 2010; Griffin and Brown, 2010) has received much attention in recent years, along with its frequentist analogues in the regularization framework (Knight and Fu, 2000; Fan and Li, 2001; Yuan and Lin, 2005; Zhao and Yu, 2006; Zou, 2006; Zou and Li, 2008). The lasso of Tibshirani (1996) and its Bayesian analogues relying on double-exponential priors (Park and Casella, 2008; Hans, 2009) have drawn particular attention, with many variations being proposed. These priors yield undeniable computational advantages in regression models over Bayesian variable selection approaches that require a search over a huge discrete model space (George and McCulloch, 1993; Raftery et al., 1997; Chipman et al., 2001; Liang et al., 2008; Clyde et al., 2010).
Consider the linear model $y_n = X_n \beta_{0n} + \epsilon_n$, where $y_n$ is an $n$-dimensional vector of responses, $X_n$ is the $n \times p_n$ design matrix, $\epsilon_n \sim N_n(0, \sigma_0^2 I_n)$ with fixed $\sigma_0^2$, and some of the components of $\beta_{0n}$ are zero.
In the Bayesian framework, to justify use in high-dimensional settings, it is important to establish posterior consistency in cases in which the number of parameters $p$ increases with the sample size $n$. Armagan et al. (2013) investigated the asymptotic behavior of posterior distributions of regression coefficients in high-dimensional linear models as the number of parameters grows with the number of observations. Their main contribution is a simple sufficient condition on the prior concentration for strong posterior consistency (in the $\ell_2$ norm) when $p_n = o(n)$. Their particular focus is on shrinkage priors, including the Laplace, Student $t$, generalized double Pareto, and horseshoe-type priors (Johnstone and Silverman, 2004; Griffin and Brown, 2007; Carvalho et al., 2010; Armagan et al., 2011a).

In this paper, we focus on the Bayesian lasso model with an orthogonal design and a fixed variance parameter, where the number of parameters grows with the sample size. The main objective of this paper is to derive sufficient conditions for posterior consistency in the Bayesian lasso model. In Section 2, we introduce the model and provide the main result.
2. Main result
Consider the following Bayesian lasso (Park and Casella, 2008) model, where we treat the variance parameter $\sigma^2$ as a nonrandom quantity:
$$y_n \mid X_n, \beta_n, \sigma^2 \sim N_n(X_n \beta_n, \sigma^2 I_n),$$
$$\beta_n \mid \sigma^2, \tau_1^2, \ldots, \tau_{p_n}^2 \sim N_{p_n}(0_{p_n}, \sigma^2 D_\tau), \quad \text{where } D_\tau = \mathrm{diag}(\tau_1^2, \ldots, \tau_{p_n}^2),$$
i.e., $\beta_j \mid \sigma^2, \tau_j^2 \overset{\mathrm{ind}}{\sim} N(0, \sigma^2 \tau_j^2)$, $j = 1, \ldots, p_n$, and
$$\tau_j^2 \overset{\mathrm{iid}}{\sim} \mathrm{Exp}(\lambda^2/2), \quad j = 1, \ldots, p_n, \qquad \tau_1^2, \ldots, \tau_{p_n}^2 > 0.$$

Now, suppose the true model is $y_n = X_n \beta_{0n} + \epsilon_n$, where $\epsilon_n \sim N_n(0, \sigma_0^2 I_n)$. We then need to find conditions on $\{X_n\}_{n \geq 1}$, $\{\beta_{0n}\}_{n \geq 1}$, and $\sigma_0^2$ such that $P_n(\|\beta_n - \beta_{0n}\|_\infty > \epsilon \mid y_n) \to 0$ a.s. as $n \to \infty$, for every $\epsilon > 0$.
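As an aside, this hierarchy is straightforward to fit by Gibbs sampling, alternating the two full conditionals that appear later in the proof: $\beta_n \mid \tau^2, y_n \sim N_{p_n}\big((n I_{p_n} + D_\tau^{-1})^{-1} X_n^T y_n,\ \sigma^2 (n I_{p_n} + D_\tau^{-1})^{-1}\big)$ and $\tau_j^{-2} \mid \beta_n, y_n \sim \text{Inverse-Gaussian}(\lambda\sigma/|\beta_j|, \lambda^2)$. The sketch below is our own minimal implementation under an assumed orthogonal design; the data and tuning values are illustrative, not from the paper, and note that SciPy's invgauss($m$, scale=$s$) corresponds to Inverse-Gaussian($ms$, $s$).

```python
import numpy as np
from scipy.stats import invgauss

rng = np.random.default_rng(1)

# Hypothetical setup: orthogonal design X^T X = n I_p via scaled orthonormal columns.
n, p, sigma2, lam = 200, 5, 1.0, 1.0
Q, _ = np.linalg.qr(rng.normal(size=(n, p)))
X = np.sqrt(n) * Q                      # columns scaled so that X^T X = n I_p
beta0 = np.array([2.0, -1.5, 0.0, 0.0, 1.0])
y = X @ beta0 + rng.normal(scale=np.sqrt(sigma2), size=n)
beta_hat = X.T @ y / n                  # OLS estimate under the orthogonal design

# Gibbs sampler alternating the two full conditionals stated in the text
tau2 = np.ones(p)
draws = []
for it in range(3000):
    # beta | tau^2, y ~ N_p((nI + D_tau^{-1})^{-1} X^T y, sigma^2 (nI + D_tau^{-1})^{-1})
    prec = n + 1.0 / tau2               # diagonal of nI + D_tau^{-1}
    beta = (X.T @ y) / prec + rng.normal(size=p) * np.sqrt(sigma2 / prec)
    # 1/tau_j^2 | beta, y ~ Inverse-Gaussian(lam * sigma / |beta_j|, lam^2)
    mu_ig = lam * np.sqrt(sigma2) / np.maximum(np.abs(beta), 1e-10)
    inv_tau2 = invgauss.rvs(mu_ig / lam**2, scale=lam**2, random_state=rng)
    tau2 = 1.0 / inv_tau2
    if it >= 1000:                      # discard burn-in
        draws.append(beta)

post_mean = np.mean(draws, axis=0)
print(post_mean)  # close to beta_hat, with coefficients shrunk slightly toward 0
```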
We investigate the posterior consistency of the above Bayesian lasso model under a much more relaxed growth restriction on the dimension, i.e., when $p_n = O(n)$. As discussed above, posterior consistency in the $\ell_2$ norm seems unrealistic to achieve under this growth condition. Hence, we consider posterior consistency in the weaker $\ell_\infty$ norm. We have proved the following theorem:
Theorem 1. Let $s_n$ be the number of true nonzero regression coefficients, and suppose $s_n = O(n^{\delta/4})$. Then, for an orthogonal design, i.e., $X_n^T X_n = n I_{p_n}$, and under the condition $\|\beta_{0n}\|_2^2 = O(n^{2-\delta})$, $\delta > 0$, on the true regression coefficients, posterior consistency of the regression coefficients can be achieved, i.e., $P_n(\|\beta_n - \beta_{0n}\|_\infty > \epsilon \mid y_n) \to 0$ a.s. as $n \to \infty$, for every $\epsilon > 0$.

Remark: The assumptions on $s_n$ and $\beta_{0n}$ are satisfied, for example, when $s_n = O(n^{2/5})$ (so that $\delta = 8/5$) and the entries of $\beta_{0n}$ are uniformly bounded in $n$.
Proof.
$$P_n(\|\beta_n - \beta_{0n}\|_\infty > \epsilon \mid y_n) = P_n\big(\|\beta_n - E(\beta_n \mid \sigma^2, \tau^2, y_n) + E(\beta_n \mid \sigma^2, \tau^2, y_n) - \beta_{0n}\|_\infty > \epsilon \mid y_n\big)$$
$$\leq P_n\big(\|\beta_n - E(\beta_n \mid \sigma^2, \tau^2, y_n)\|_\infty > \epsilon/2 \mid y_n\big) + P_n\big(\|E(\beta_n \mid \sigma^2, \tau^2, y_n) - \beta_{0n}\|_\infty > \epsilon/2 \mid y_n\big)$$
$$= (\mathrm{I}) + (\mathrm{II}), \text{ say.} \quad (2.1)$$
Let $\tilde{\beta}(\sigma^2, \tau^2, y_n) = E(\beta_n \mid \sigma^2, \tau^2, y_n) = (n I_{p_n} + D_\tau^{-1})^{-1} X_n^T y_n$, and note that $\hat{\beta}_n = \frac{1}{n} X_n^T y_n$ under the orthogonal design. Notice that $\beta_n - \tilde{\beta}(\sigma^2, \tau^2, y_n) \mid \sigma^2, \tau^2, y_n \sim N_{p_n}\big(0, \sigma^2 (n I_{p_n} + D_\tau^{-1})^{-1}\big)$. Also, let $v_{ii}$ be the $i$th diagonal element of $(n I_{p_n} + D_\tau^{-1})^{-1}$. Then, by applying the tower property and the Bonferroni bound consecutively, we obtain
$$(\mathrm{I}) = P_n\big(\|\beta_n - \tilde{\beta}(\sigma^2, \tau^2, y_n)\|_\infty > \epsilon \mid y_n\big) = E\big[P_n\big(\|\beta_n - \tilde{\beta}(\sigma^2, \tau^2, y_n)\|_\infty > \epsilon \mid \sigma^2, \tau^2, y_n\big) \mid y_n\big]$$
$$\leq \sum_{i=1}^{p_n} E\bigg[P\bigg(\frac{|\beta_i - \tilde{\beta}_i(\sigma^2, \tau^2, y_n)|}{\sigma \sqrt{v_{ii}}} > \frac{\epsilon}{\sigma \sqrt{v_{ii}}} \,\Big|\, \sigma^2, \tau^2, y_n\bigg) \,\Big|\, y_n\bigg]$$
$$= \sum_{i=1}^{p_n} E\bigg[P\bigg(|Z_i| > \frac{\epsilon}{\sigma \sqrt{v_{ii}}}\bigg) \,\Big|\, y_n\bigg], \quad \text{where } Z_i \sim N(0, 1),$$
$$\leq \sum_{i=1}^{p_n} P\bigg(|Z_i| > \frac{\epsilon \sqrt{n}}{\sigma}\bigg) = p_n P\bigg(|Z| > \frac{\epsilon \sqrt{n}}{\sigma}\bigg) \to 0, \text{ as } n \to \infty, \quad (2.2)$$
where the last inequality uses $v_{ii} \leq 1/n$.
Next observe that, by the triangle inequality,
$$(\mathrm{II}) = P_n\big(\|(n I_{p_n} + D_\tau^{-1})^{-1} X_n^T y_n - \beta_{0n}\|_\infty > \epsilon \mid y_n\big)$$
$$\leq P_n\Big(\max_{1 \leq i \leq p_n} |\hat{\beta}_{n,i} - \beta_{0,i}| > \epsilon/2 \,\Big|\, y_n\Big) + P_n\Big(\max_{1 \leq i \leq p_n} \frac{\tau_i^{-2}}{n + \tau_i^{-2}} |\hat{\beta}_{n,i}| > \epsilon/2 \,\Big|\, y_n\Big)$$
$$= (\mathrm{III}) + (\mathrm{IV}), \text{ say.} \quad (2.3)$$
By Corollary 2, it is easy to see that (III) → 0 a.s. Also,
$$(\mathrm{IV}) = P_n\Big(\max_{1 \leq i \leq p_n} \frac{\tau_i^{-2}}{n + \tau_i^{-2}} |\hat{\beta}_{n,i}| > \epsilon/2 \,\Big|\, y_n\Big) I\big(\|\hat{\beta}_n - \beta_{0n}\|_\infty \leq \epsilon/2\big)$$
$$+ P_n\Big(\max_{1 \leq i \leq p_n} \frac{\tau_i^{-2}}{n + \tau_i^{-2}} |\hat{\beta}_{n,i}| > \epsilon/2 \,\Big|\, y_n\Big) I\big(\|\hat{\beta}_n - \beta_{0n}\|_\infty > \epsilon/2\big)$$
$$= (\mathrm{V}) + (\mathrm{VI}), \text{ say.} \quad (2.4)$$
Clearly, (VI) → 0 a.s. Now
$$(\mathrm{V}) = P_n\Big(\max_{1 \leq i \leq p_n} \frac{\tau_i^{-2}}{n + \tau_i^{-2}} |\hat{\beta}_{n,i}| > \epsilon/2 \,\Big|\, y_n\Big) I\big(\|\hat{\beta}_n - \beta_{0n}\|_\infty \leq \epsilon/2\big)$$
$$\leq \sum_{i: |\hat{\beta}_{n,i}| > \epsilon/2} P_n\Big(\frac{\tau_i^{-2}}{n + \tau_i^{-2}} |\hat{\beta}_{n,i}| > \epsilon/2 \,\Big|\, y_n\Big) I\big(\|\hat{\beta}_n - \beta_{0n}\|_\infty \leq \epsilon/2\big)$$
$$\leq \sum_{i: |\hat{\beta}_{n,i}| > \epsilon/2} P_n\bigg(\tau_i^{-2} > \frac{n \epsilon/2}{|\hat{\beta}_{n,i}|} \,\bigg|\, y_n\bigg) I\big(\|\hat{\beta}_n - \beta_{0n}\|_\infty \leq \epsilon/2\big), \quad (2.5)$$
where the sum may be restricted to indices with $|\hat{\beta}_{n,i}| > \epsilon/2$, since for the remaining indices $\frac{\tau_i^{-2}}{n + \tau_i^{-2}} |\hat{\beta}_{n,i}| \leq |\hat{\beta}_{n,i}| \leq \epsilon/2$ and the corresponding events are empty. Moreover, since $\|\hat{\beta}_n - \beta_{0n}\|_\infty \leq \epsilon/2$ and $\|\beta_{0n}\|_2^2 = O(n^{2-\delta})$, $\delta > 0$, we have $|\hat{\beta}_{n,i}| = O_p(n^{1-\delta/2})$. Thus, for large $n$, using $K (> 0)$ as a generic constant,
$$\sum_{i: |\hat{\beta}_{n,i}| > \epsilon/2} P_n\bigg(\tau_i^{-2} > \frac{n \epsilon/2}{|\hat{\beta}_{n,i}|} \,\bigg|\, y_n\bigg) I\big(\|\hat{\beta}_n - \beta_{0n}\|_\infty \leq \epsilon/2\big)$$
$$\leq \sum_{i: |\hat{\beta}_{n,i}| > \epsilon/2} P_n\big(\tau_i^{-2} > K n^{\delta/2} \mid y_n\big) I\big(\|\hat{\beta}_n - \beta_{0n}\|_\infty \leq \epsilon/2\big)$$
$$\leq \sum_{i: |\hat{\beta}_{n,i}| > \epsilon/2} E\big[P_n\big(\tau_i^{-2} > K n^{\delta/2} \mid \beta_n, y_n\big) \mid y_n\big]$$
$$\leq s_n\, E\big[P_n\big(\tau_i^{-2} > K n^{\delta/2} \mid \beta_n, y_n\big) \mid y_n\big]. \quad (2.6)$$
Next we observe that $\tau_i^{-2} \mid \beta_n, y_n \sim \text{Inverse-Gaussian}\big(\frac{\lambda \sigma}{|\beta_i|}, \lambda^2\big)$. In order to find an upper bound for the inner probability in (2.6), we need the following lemma.

Lemma 3. Suppose $X \sim \text{Inverse-Gaussian}(\mu, \lambda)$. Then
$$P(X > M) \leq \sqrt{\frac{2\lambda}{\pi M}}\, \exp\Big(\frac{\lambda}{\mu}\Big) \exp\Big(-\frac{\lambda M}{2\mu^2}\Big).$$

Proof.
$$P(X > M) = \int_M^{\infty} \sqrt{\frac{\lambda}{2\pi x^3}}\, \exp\bigg(-\frac{\lambda (x - \mu)^2}{2\mu^2 x}\bigg)\, dx = \sqrt{\frac{\lambda}{2\pi}}\, \exp\Big(\frac{\lambda}{\mu}\Big) \int_M^{\infty} \frac{1}{x^{3/2}} \exp\Big(-\frac{\lambda x}{2\mu^2}\Big) \exp\Big(-\frac{\lambda}{2x}\Big)\, dx$$
$$\leq \sqrt{\frac{\lambda}{2\pi}}\, \exp\Big(\frac{\lambda}{\mu}\Big) \int_M^{\infty} \frac{1}{x^{3/2}} \exp\Big(-\frac{\lambda x}{2\mu^2}\Big)\, dx \leq \sqrt{\frac{\lambda}{2\pi}}\, \exp\Big(\frac{\lambda}{\mu}\Big) \exp\Big(-\frac{\lambda M}{2\mu^2}\Big) \int_M^{\infty} \frac{1}{x^{3/2}}\, dx$$
$$= \sqrt{\frac{\lambda}{2\pi}}\, \frac{2}{\sqrt{M}}\, \exp\Big(\frac{\lambda}{\mu}\Big) \exp\Big(-\frac{\lambda M}{2\mu^2}\Big) = \sqrt{\frac{2\lambda}{\pi M}}\, \exp\Big(\frac{\lambda}{\mu}\Big) \exp\Big(-\frac{\lambda M}{2\mu^2}\Big).$$
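A quick Monte Carlo sanity check of an inverse-Gaussian tail bound of this form (our own sketch; the parameter values are arbitrary, and we use the conservative constant $\sqrt{2\lambda/(\pi M)}$ obtained from $\int_M^{\infty} x^{-3/2}\, dx = 2 M^{-1/2}$). Note that SciPy's invgauss($m$, scale=$s$) corresponds to Inverse-Gaussian($ms$, $s$):

```python
import numpy as np
from scipy.stats import invgauss

# Inverse-Gaussian(mu, lam) in the text's parameterization is invgauss(mu/lam, scale=lam) in SciPy.
mu, lam, M = 1.0, 2.0, 5.0
samples = invgauss.rvs(mu / lam, scale=lam, size=200_000, random_state=0)
tail = float(np.mean(samples > M))

# Analytic upper bound: sqrt(2*lam/(pi*M)) * exp(lam/mu) * exp(-lam*M/(2*mu^2))
bound = np.sqrt(2 * lam / (np.pi * M)) * np.exp(lam / mu - lam * M / (2 * mu**2))
print(tail, bound)  # empirical tail probability sits below the analytic bound
```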
By the above lemma, an upper bound for (2.6) is given by
$$s_n\, \frac{K}{n^{\delta/4}}\, E\bigg[\exp\bigg(\frac{\lambda |\beta_i|}{\sigma} - \frac{K n^{\delta/2} \beta_i^2}{2 \lambda \sigma^2}\bigg) \,\bigg|\, y_n\bigg]. \quad (2.7)$$
Next observe that the $\beta_i$ have iid priors with common pdf $f(\beta \mid \lambda, \sigma) = \frac{\lambda}{2\sigma} \exp[-(\lambda/\sigma)|\beta|]$. Also, since $X_n^T X_n = n I_{p_n}$, we may write $\|y_n - X_n \beta_n\|^2 = \|y_n - X_n \hat{\beta}_n\|^2 + \|X_n(\hat{\beta}_n - \beta_n)\|^2$, and further $\|X_n(\hat{\beta}_n - \beta_n)\|^2 = n \|\hat{\beta}_n - \beta_n\|^2 = n \sum_{i=1}^{p_n} (\beta_i - \hat{\beta}_{n,i})^2$. Hence, the posterior of $\beta_i \mid y_n$ is
$$\pi(\beta_i \mid y_n) \propto \exp\bigg[-\frac{n}{2\sigma^2} (\beta_i - \hat{\beta}_{n,i})^2 - \frac{\lambda |\beta_i|}{\sigma}\bigg]. \quad (2.8)$$
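The posterior (2.8) is a normal likelihood centered at $\hat{\beta}_{n,i}$ tilted by a Laplace factor, and its concentration can be checked directly by numerical integration (our own sketch; the values of $\sigma^2$, $\lambda$, and $\hat{\beta}_{n,i}$ are arbitrary):

```python
import numpy as np

# Numerical look at the one-dimensional posterior (2.8): a normal density centered
# at beta_hat_i times a Laplace prior factor, normalized on a grid.
sigma2, lam, beta_hat_i = 1.0, 1.0, 1.5

def posterior_sd(n):
    grid = np.linspace(beta_hat_i - 1, beta_hat_i + 1, 20001)
    logpost = -n / (2 * sigma2) * (grid - beta_hat_i)**2 - lam * np.abs(grid) / np.sqrt(sigma2)
    w = np.exp(logpost - logpost.max())
    w /= w.sum()
    m = np.sum(w * grid)
    return float(np.sqrt(np.sum(w * (grid - m)**2)))

sds = [posterior_sd(n) for n in (10**2, 10**4, 10**6)]
print(sds)  # shrinks roughly like sigma / sqrt(n), i.e. degeneracy at beta_hat_i
```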
In view of (2.7),
$$E\bigg[\exp\bigg(\frac{\lambda |\beta_i|}{\sigma} - \frac{K n^{\delta/2} \beta_i^2}{2 \lambda \sigma^2}\bigg) \,\bigg|\, y_n\bigg] = \frac{\displaystyle\int_{-\infty}^{\infty} \exp\bigg[-\frac{n}{2\sigma^2} (\beta_i - \hat{\beta}_{n,i})^2 - \frac{K n^{\delta/2} \beta_i^2}{2 \lambda \sigma^2}\bigg]\, d\beta_i}{\displaystyle\int_{-\infty}^{\infty} \exp\bigg[-\frac{n}{2\sigma^2} (\beta_i - \hat{\beta}_{n,i})^2 - \frac{\lambda |\beta_i|}{\sigma}\bigg]\, d\beta_i} = N/D \text{ (say)}. \quad (2.9)$$
Now,
$$N \leq \int_{-\infty}^{\infty} \exp\bigg[-\frac{n}{2\sigma^2} (\beta_i - \hat{\beta}_{n,i})^2\bigg]\, d\beta_i = (2\pi \sigma^2 / n)^{1/2}. \quad (2.10)$$
Also,
$$D = \int_0^{\infty} \exp\bigg[-\frac{n \beta_i^2}{2\sigma^2} + \frac{n \beta_i \hat{\beta}_{n,i}}{\sigma^2} - \frac{n \hat{\beta}_{n,i}^2}{2\sigma^2} - \frac{\lambda \beta_i}{\sigma}\bigg]\, d\beta_i + \int_{-\infty}^0 \exp\bigg[-\frac{n \beta_i^2}{2\sigma^2} + \frac{n \beta_i \hat{\beta}_{n,i}}{\sigma^2} - \frac{n \hat{\beta}_{n,i}^2}{2\sigma^2} + \frac{\lambda \beta_i}{\sigma}\bigg]\, d\beta_i$$
$$= \exp\bigg(\frac{\lambda^2}{2n} - \frac{\lambda \hat{\beta}_{n,i}}{\sigma}\bigg) \int_0^{\infty} \exp\bigg[-\frac{n}{2\sigma^2} \bigg(\beta_i - \Big(\hat{\beta}_{n,i} - \frac{\lambda \sigma}{n}\Big)\bigg)^2\bigg]\, d\beta_i + \exp\bigg(\frac{\lambda^2}{2n} + \frac{\lambda \hat{\beta}_{n,i}}{\sigma}\bigg) \int_{-\infty}^0 \exp\bigg[-\frac{n}{2\sigma^2} \bigg(\beta_i - \Big(\hat{\beta}_{n,i} + \frac{\lambda \sigma}{n}\Big)\bigg)^2\bigg]\, d\beta_i$$
$$= \bigg[\Phi\bigg(\frac{\sqrt{n}}{\sigma}\Big(\hat{\beta}_{n,i} - \frac{\lambda \sigma}{n}\Big)\bigg) \exp\bigg(\frac{\lambda^2}{2n} - \frac{\lambda \hat{\beta}_{n,i}}{\sigma}\bigg) + \Phi\bigg(-\frac{\sqrt{n}}{\sigma}\Big(\hat{\beta}_{n,i} + \frac{\lambda \sigma}{n}\Big)\bigg) \exp\bigg(\frac{\lambda^2}{2n} + \frac{\lambda \hat{\beta}_{n,i}}{\sigma}\bigg)\bigg] (2\pi \sigma^2 / n)^{1/2}, \quad (2.11)$$
where $\Phi$ denotes the standard normal cdf.
Since ˆβn,i → β0,i a.s., from (2.10) and (2.11) it follows that N/D = O(1) a.s. as n → ∞.
Hence, from (2.7) to (2.11), it follows that (V ) → 0 a.s. as n → ∞.
3. Discussion
The Bayesian lasso is a popular and widely used algorithm for sparse Bayesian estimation
in linear regression. In this paper, we have established posterior consistency of the Bayesian
lasso algorithm under the orthogonal design and fixed variance parameter where the number
of parameters grows with the sample size. Using the insights obtained from the analysis in
this paper, we are currently investigating the high-dimensional posterior consistency of the
Bayesian lasso for an arbitrary design matrix and a stochastic variance parameter. In future
work, we would also like to consider a careful analysis of the convergence rates.
Acknowledgment
The author would like to thank Prof. Malay Ghosh and Prof. Kshitij Khare for their help with
the paper.
References
Armagan, A., Dunson, D.B., Lee, J., Bajwa, W.U., Strawn, N. (2013). Posterior consistency in linear models under shrinkage priors. Biometrika 100:1011–1018.
Armagan, A., Dunson, D.B., Clyde, M. (2011a). Generalized beta mixtures of Gaussians. Adv. Neural Info. Proces. Syst. (NIPS).
Carvalho, C.M., Polson, N.G., Scott, J.G. (2010). The horseshoe estimator for sparse signals. Biometrika
97:465–480.
Chipman, H., George, E.I., McCulloch, R.E. (2001). The practical implementation of Bayesian model selection. IMS Lect. Notes - Monograph Ser. 38.
Clyde, M., Ghosh, J., Littman, M.L. (2010). Bayesian adaptive sampling for variable selection and model
averaging. J. Comput. Graph. Stat. 20(1):80–101.
Fan, J., Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties.
J. Am. Stat. Assoc. 96:1348–1360.
Griffin, J.E., Brown, P.J. (2007). Bayesian adaptive lassos with non-convex penalization. Technical Report.
Griffin, J.E., Brown, P.J. (2010). Inference with normal-gamma prior distributions in regression prob-
lems. Bayesian Anal. 5:171–188.
George, E.I., McCulloch, R.E. (1993). Variable selection via Gibbs sampling. J. Am. Stat. Assoc. 88:881–889.
Hans, C. (2009). Bayesian lasso regression. Biometrika 96:835–845.
Johnstone, I.M., Silverman, B.W. (2004). Needles and straw in haystacks: Empirical Bayes estimates of
possibly sparse sequences. Ann. Stat. 32:1594–1649.
Knight, K., Fu, W. (2000). Asymptotics for lasso-type estimators. Ann. Stat. 28:1356–1378.
Kyung, M., Gill, J., Ghosh, M., Casella, G. (2010). Penalized regression, standard errors, and Bayesian
lassos. Bayesian Anal. 5:369–412.
Liang, F., Paulo, R., Molina, G., Clyde, M., Berger, J. (2008). Mixtures of g priors for Bayesian variable
selection. J. Am. Stat. Assoc. 103:410–423.
Park, T., Casella, G. (2008). The Bayesian lasso. J. Am. Stat. Assoc. 103(6):681–686.
Raftery, A.E., Madigan, D., Hoeting, J.A., (1997). Bayesian model averaging for linear regression models.
J. Am. Stat. Assoc. 92:179–191.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Royal Stat. Soc. Ser. B 58:267–
288.
Yuan, M., Lin, Y. (2005). Efficient empirical Bayes variable selection and estimation in linear models. J.
Am. Stat. Assoc. 100:1215–1225.
Zhao, P., Yu, B. (2006). On model selection consistency of lasso. J. Mach. Learn. Res. 7:2541–2563.
Zou, H. (2006). The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 101(476):1418–1429.
Zou, H., Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models. Ann. Stat.
36:1509–1533.
Inferential statistics.pptInferential statistics.ppt
Inferential statistics.ppt
 
descriptive and inferential statistics
descriptive and inferential statisticsdescriptive and inferential statistics
descriptive and inferential statistics
 
Indian Paper Industry
Indian Paper IndustryIndian Paper Industry
Indian Paper Industry
 
Statistical tools in research
Statistical tools in researchStatistical tools in research
Statistical tools in research
 
A Multivariate Cumulative Sum Method for Continuous Damage Monitoring with La...
A Multivariate Cumulative Sum Method for Continuous Damage Monitoring with La...A Multivariate Cumulative Sum Method for Continuous Damage Monitoring with La...
A Multivariate Cumulative Sum Method for Continuous Damage Monitoring with La...
 
Statistical Process Control (SPC) Tools - 7 Basic Tools
Statistical Process Control (SPC) Tools - 7 Basic ToolsStatistical Process Control (SPC) Tools - 7 Basic Tools
Statistical Process Control (SPC) Tools - 7 Basic Tools
 

Similar to Com_Stat_Paper

L1 updated introduction.pptx
L1 updated introduction.pptxL1 updated introduction.pptx
L1 updated introduction.pptxMesfinTadesse8
 
Minimum mean square error estimation and approximation of the Bayesian update
Minimum mean square error estimation and approximation of the Bayesian updateMinimum mean square error estimation and approximation of the Bayesian update
Minimum mean square error estimation and approximation of the Bayesian updateAlexander Litvinenko
 
article_imen_ridha_2016_version_finale
article_imen_ridha_2016_version_finalearticle_imen_ridha_2016_version_finale
article_imen_ridha_2016_version_finaleMdimagh Ridha
 
Materi_Business_Intelligence_1.pdf
Materi_Business_Intelligence_1.pdfMateri_Business_Intelligence_1.pdf
Materi_Business_Intelligence_1.pdfHasan Dwi Cahyono
 
BlUP and BLUE- REML of linear mixed model
BlUP and BLUE- REML of linear mixed modelBlUP and BLUE- REML of linear mixed model
BlUP and BLUE- REML of linear mixed modelKyusonLim
 
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
(DL輪読)Variational Dropout Sparsifies Deep Neural NetworksMasahiro Suzuki
 
Converting Graphic Relationships into Conditional Probabilities in Bayesian N...
Converting Graphic Relationships into Conditional Probabilities in Bayesian N...Converting Graphic Relationships into Conditional Probabilities in Bayesian N...
Converting Graphic Relationships into Conditional Probabilities in Bayesian N...Loc Nguyen
 
Bayesian adaptive optimal estimation using a sieve prior
Bayesian adaptive optimal estimation using a sieve priorBayesian adaptive optimal estimation using a sieve prior
Bayesian adaptive optimal estimation using a sieve priorJulyan Arbel
 
Data Science Cheatsheet.pdf
Data Science Cheatsheet.pdfData Science Cheatsheet.pdf
Data Science Cheatsheet.pdfqawali1
 
BINOMIAL ,POISSON AND NORMAL DISTRIBUTION.pptx
BINOMIAL ,POISSON AND NORMAL DISTRIBUTION.pptxBINOMIAL ,POISSON AND NORMAL DISTRIBUTION.pptx
BINOMIAL ,POISSON AND NORMAL DISTRIBUTION.pptxletbestrong
 

Similar to Com_Stat_Paper (20)

PH 3206 note 6.pptx
PH 3206 note 6.pptxPH 3206 note 6.pptx
PH 3206 note 6.pptx
 
L1 updated introduction.pptx
L1 updated introduction.pptxL1 updated introduction.pptx
L1 updated introduction.pptx
 
Corr And Regress
Corr And RegressCorr And Regress
Corr And Regress
 
BNL_Research_Report
BNL_Research_ReportBNL_Research_Report
BNL_Research_Report
 
Minimum mean square error estimation and approximation of the Bayesian update
Minimum mean square error estimation and approximation of the Bayesian updateMinimum mean square error estimation and approximation of the Bayesian update
Minimum mean square error estimation and approximation of the Bayesian update
 
article_imen_ridha_2016_version_finale
article_imen_ridha_2016_version_finalearticle_imen_ridha_2016_version_finale
article_imen_ridha_2016_version_finale
 
Materi_Business_Intelligence_1.pdf
Materi_Business_Intelligence_1.pdfMateri_Business_Intelligence_1.pdf
Materi_Business_Intelligence_1.pdf
 
Ridge regression
Ridge regressionRidge regression
Ridge regression
 
BlUP and BLUE- REML of linear mixed model
BlUP and BLUE- REML of linear mixed modelBlUP and BLUE- REML of linear mixed model
BlUP and BLUE- REML of linear mixed model
 
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
 
Converting Graphic Relationships into Conditional Probabilities in Bayesian N...
Converting Graphic Relationships into Conditional Probabilities in Bayesian N...Converting Graphic Relationships into Conditional Probabilities in Bayesian N...
Converting Graphic Relationships into Conditional Probabilities in Bayesian N...
 
Bayesian adaptive optimal estimation using a sieve prior
Bayesian adaptive optimal estimation using a sieve priorBayesian adaptive optimal estimation using a sieve prior
Bayesian adaptive optimal estimation using a sieve prior
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
Data Science Cheatsheet.pdf
Data Science Cheatsheet.pdfData Science Cheatsheet.pdf
Data Science Cheatsheet.pdf
 
Iwsmbvs
IwsmbvsIwsmbvs
Iwsmbvs
 
Chap2
Chap2Chap2
Chap2
 
Lecture 2
Lecture 2Lecture 2
Lecture 2
 
v39i11.pdf
v39i11.pdfv39i11.pdf
v39i11.pdf
 
BINOMIAL ,POISSON AND NORMAL DISTRIBUTION.pptx
BINOMIAL ,POISSON AND NORMAL DISTRIBUTION.pptxBINOMIAL ,POISSON AND NORMAL DISTRIBUTION.pptx
BINOMIAL ,POISSON AND NORMAL DISTRIBUTION.pptx
 
Bayes gauss
Bayes gaussBayes gauss
Bayes gauss
 

Com_Stat_Paper

COMMUNICATIONS IN STATISTICS—THEORY AND METHODS

High-dimensional posterior consistency of the Bayesian lasso

Shibasish Dasgupta
Department of Mathematics & Statistics, University of South Alabama, Mobile, AL, USA

ARTICLE HISTORY: Received February; Accepted September
KEYWORDS: Bayesian lasso; high-dimensional variable selection; orthogonal design; posterior consistency

ABSTRACT: This paper considers posterior consistency in the context of high-dimensional variable selection using the Bayesian lasso algorithm. In a frequentist setting, consistency is perhaps the most basic property that we expect any reasonable estimator to achieve. However, in a Bayesian setting, consistency is often ignored or taken for granted, especially in more complex hierarchical Bayesian models. In this paper, we derive sufficient conditions for posterior consistency in the Bayesian lasso model with an orthogonal design, where the number of parameters grows with the sample size.

1. Introduction

The simple problem of regressing a response vector of n observations on p covariates is among the first encountered by any student of statistics, and regression models have practical applications in virtually any conceivable field of study. More specifically, regression under the Bayesian paradigm has grown in popularity in recent decades with the rapid acceleration of computing power and the continuing development of Markov chain Monte Carlo numerical integration techniques. The behavior and theoretical properties of these Bayesian regression models have therefore become important topics of study.

1.1. Interpretation of posterior consistency

This basic formulation of regression lends itself easily to a Bayesian analysis in which a prior is placed on the unknown coefficient vector β and the variance σ^2.
The application of Bayes' theorem and integration over the nuisance parameter σ^2 yield the marginal posterior of β, which is the primary object on which Bayesian inference is based. Call this entire Bayesian model P_M. However, suppose that the data are actually generated under some model P_0 comprising the same likelihood as P_M, but with fixed and unknown parameter values β_0 and σ_0^2. One would hope that as the sample size tends to infinity, the marginal posterior of β converges to degeneracy at β_0 almost surely under P_0. We call this property posterior consistency, and the obvious question is to determine the values of β_0 and σ_0^2 for which posterior consistency occurs. In this simplest possible setup, it is easily verified that posterior consistency holds as long as basic regularity assumptions about the design matrix are satisfied.

CONTACT: Shibasish Dasgupta, shibasish.dasgupta@gmail.com, Department of Mathematics & Statistics, University of South Alabama, Mobile, AL, USA. © Taylor & Francis Group, LLC
In Bayesian analysis, one starts with some prior knowledge (sometimes imprecise) expressed as a distribution on the parameter space and updates that knowledge according to the posterior distribution given the data. It is therefore of utmost importance to know whether the updated knowledge becomes more and more accurate and precise as data are collected indefinitely. This requirement is called consistency of the posterior distribution. Although it is an asymptotic property, consistency is a basic benchmark, since its violation is clearly undesirable, and one may have serious doubts about inferences based on an inconsistent posterior distribution.

1.2. Formal definition and choice of vector norm

As stated previously, the notion of posterior consistency considered herein is the convergence of the posterior distribution of β to degeneracy at β_0 with P_0-probability 1. We now state a formal definition of posterior consistency.

Definition. Let β_{0n} ∈ R^{p_n} for each n ≥ 1 and σ_0^2 > 0. Let P_0 denote the distribution of {β̂_n, n ≥ 1} under the model y_n = X_n β_{0n} + e_n, where e_n ∼ N_n(0_n, σ_0^2 I_n) for each n ≥ 1. The sequence of posterior distributions P_M(β_n | β̂_n) is said to be consistent at {(β_{0n}, σ_0^2), n ≥ 1} if

    P_M(||β_n − β_{0n}||_∞ > ε | β̂_n) → 0 a.s. (P_0), for every ε > 0.

The choice of the ℓ∞ norm in our definition of posterior consistency warrants some discussion. In the case where the number of covariates p is fixed, the particular choice of vector norm is irrelevant, since the ℓ∞ norm could be replaced by any other ℓ_r norm, 1 ≤ r < ∞, and the definition would still be equivalent. However, the distinction becomes relevant when p tends to infinity at some rate along with the sample size, in which case p, β, and β_0 become p_n, β_n, and β_{0n}.
If we wish to allow p_n to grow in proportion to n, then the conventional ℓ2 norm, defined as ||x||_2 = (Σ_{i=1}^{p_n} x_i^2)^{1/2}, makes posterior consistency unreasonably difficult to achieve. As justification, note that under the ℓ2 norm, even the MLE itself fails to achieve classical frequentist consistency. Thus, we instead consider posterior consistency under the ℓ∞ norm ||x||_∞ = max_{1≤i≤p_n} |x_i|. The following lemma and corollary illustrate why the ℓ2 norm is not sufficiently flexible for our purposes.

Lemma 1. Let Z_n ∼ N_{p_n}(0_{p_n}, n^{-1} V_n), where p_n < n, and where the eigenvalues ω_{n,1}, ..., ω_{n,p_n} of V_n satisfy 0 < ω_min ≤ inf_{n,i} ω_{n,i} ≤ sup_{n,i} ω_{n,i} ≤ ω_max < ∞ for some ω_min and ω_max. Then ||Z_n||_2 → 0 almost surely if and only if p_n/n → 0.

Proof. Note that Var(Z_{n,i}) = n^{-1} V_{n,ii} ≤ n^{-1} ω_max, and n^{1/2} V_{n,ii}^{-1/2} Z_{n,i} ∼ N(0, 1). Now let U_n = n Σ_{i=1}^{p_n} V_{n,ii}^{-1} Z_{n,i}^2, so that U_n ∼ χ^2_{p_n}. By the properties of the chi-squared distribution, U_n/n → 0 almost surely if and only if p_n/n → 0. Then since

    (ω_min U_n)/n ≤ ||Z_n||_2^2 ≤ (ω_max U_n)/n,

it follows that ||Z_n||_2 → 0 almost surely if and only if p_n/n → 0.

Corollary 1. ||β̂_n − β_{0n}||_2 → 0 a.s. (P_0) if and only if p_n/n → 0.

Proof. Apply Lemma 1 under P_0 with Z_n = β̂_n − β_{0n} and V_n = σ_0^2 ((1/n) X_n^T X_n)^{-1}.
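The dichotomy in Lemma 1, and the contrast with the ℓ∞ norm defined above, can be seen in a few lines of simulation. The sketch below is our own illustration, not part of the paper: it draws β̂_n − β_{0n} ∼ N_{p_n}(0, (σ_0^2/n) I_{p_n}), the error distribution that arises under an orthogonal design, with p_n = n/2.

```python
# Monte Carlo sketch of Corollary 1 vs. the l-infinity alternative: with
# p_n proportional to n, the l2 error of the MLE stabilizes near a nonzero
# constant, while the l-infinity error still shrinks toward 0.
import numpy as np

rng = np.random.default_rng(0)
sigma0 = 1.0
for n in (200, 2000, 20000):
    p = n // 2
    # betahat - beta0 ~ N_p(0, (sigma0^2 / n) I_p) when X_n^T X_n = n I_p
    err = rng.normal(0.0, sigma0 / np.sqrt(n), size=p)
    print(f"n={n:6d}  l2 error={np.linalg.norm(err):.3f}  "
          f"linf error={np.abs(err).max():.4f}")
```

Here the ℓ2 error concentrates near sqrt(p_n/n) σ_0 = σ_0/√2 for every n, in line with Lemma 1, while the ℓ∞ error decays at roughly the sqrt(2 log p_n / n) rate.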
As is clear from Corollary 1, not even the MLE β̂_n achieves almost sure consistency under the ℓ2 norm when p_n grows at the same rate as n. Thus, any attempt to establish posterior consistency under the ℓ2 norm for a Bayesian regression model under the same circumstances would be futile. However, the following lemma and corollary motivate the choice of the ℓ∞ norm instead.

Lemma 2. Let Z_n ∼ N_{p_n}(0_{p_n}, n^{-1} V_n), where p_n < n, and where the eigenvalues ω_{n,1}, ..., ω_{n,p_n} of V_n satisfy sup_{n,i} ω_{n,i} ≤ ω_max < ∞ for some ω_max. Then ||Z_n||_∞ → 0 almost surely.

Proof. Let ε > 0. Note that Var(Z_{n,i}) = n^{-1} V_{n,ii} ≤ n^{-1} ω_max, and n^{1/2} V_{n,ii}^{-1/2} Z_{n,i} ∼ N(0, 1). Then

    Σ_{n=1}^∞ P(||Z_n||_∞ > ε) = Σ_{n=1}^∞ P(max_{1≤i≤p_n} |Z_{n,i}| > ε)
    ≤ Σ_{n=1}^∞ Σ_{i=1}^{p_n} P(|n^{1/2} V_{n,ii}^{-1/2} Z_{n,i}| > ε (n^{-1} V_{n,ii})^{-1/2})
    ≤ Σ_{n=1}^∞ Σ_{i=1}^{p_n} P(|n^{1/2} V_{n,ii}^{-1/2} Z_{n,i}| > ε ω_max^{-1/2} n^{1/2})
    ≤ Σ_{n=1}^∞ Σ_{i=1}^{p_n} 15 ω_max^3 / (ε^6 n^3) < ∞,

by applying Markov's inequality to n^3 V_{n,ii}^{-3} Z_{n,i}^6 (recall that the sixth moment of a standard normal is 15), and the result follows from the Borel-Cantelli lemma, noting that p_n < n.

Corollary 2. ||β̂_n − β_{0n}||_∞ → 0 a.s. (P_0).

Proof. Apply Lemma 2 under P_0 with Z_n = β̂_n − β_{0n} and V_n = σ_0^2 ((1/n) X_n^T X_n)^{-1}.

Although Corollaries 1 and 2 are not posterior consistency results per se, they nonetheless demonstrate the added flexibility that can arise from the use of the ℓ∞ norm instead of the ℓ2 norm when proving consistency results. For this reason, we choose to work with the ℓ∞ norm throughout.

1.3. Conditional independence prior and the Bayesian lasso

The most common approach to prior specification in Bayesian regression models is to first place a prior on σ^2, and then place a prior on β | σ^2 such that the prior variance of β | σ^2 is proportional to σ^2. The conjugate choice for the prior on σ^2 is the inverse gamma with shape parameter a/2 and scale parameter b/2 (the factor of 1/2 is included for later convenience), where a, b > 0.
One may also wish to use an improper prior proportional to 1/σ^2, 1/σ, or 1, but these improper priors can be seen to have the same basic form as inverse gamma densities if the parameter restrictions are relaxed to a ≥ −2 and b ≥ 0. Although there exist various prior structures for the coefficient vector with interesting properties and applications, perhaps the most obvious alternative is to simply replace (X_n^T X_n)^{-1} in the prior variance of β | σ^2 with some diagonal matrix D_τ = diag(τ_1^2, ..., τ_p^2), yielding

    β | σ^2 ∼ N_p(γ, σ^2 D_τ).
Thus, the components of β are independent a priori when conditioned on σ^2. The values of τ_1^2, ..., τ_p^2 can be taken as fixed, or they can be set equal to a common value τ^2 which is then estimated through an empirical Bayes approach. However, the most important application of this model is the extension to a hierarchical model in which τ_1^2, ..., τ_p^2 are assigned independent exponential priors with common rate parameter λ^2/2. As noted by Park and Casella (2008), this formulation leads to a Bayesian version of the lasso of Tibshirani (1996) if the point estimate of the coefficient vector β is taken to be its posterior mode. Park and Casella observe that the resulting Bayesian lasso typically yields results quite similar to those of the ordinary lasso, but with the advantage of automatic interval estimates for all parameters via any of the usual constructions of Bayesian credible intervals.

Of course, this still leaves the question of how to specify the parameter λ. Casella (2001) examines the replacement of λ with an empirical Bayes estimate λ̂_EB derived by maximizing the marginal likelihood of λ. Alternatively, the hierarchical structure can be extended further by specifying a prior on λ, though Park and Casella advise caution here, as seemingly innocuous improper priors such as 1/λ^2 can lead to impropriety of the posterior. Further discussion of Bayesian lasso methods can be found in Kyung et al. (2010).

A slight but significant modification of the above structure is to take β and σ^2 to be a priori independent, removing the dependence on σ^2 from the prior given above for the coefficient vector β. However, Park and Casella (2008) show that this unconditional prior can easily lead to a bimodal posterior on (β, σ^2). In contrast, they show that the conditional prior always leads to a unimodal posterior as long as σ^2 ∼ Inverse-Gamma(a/2, b/2), where we permit a ≥ −2 and b ≥ 0, as before.
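For concreteness, the hierarchical model above can be fit with a short Gibbs sampler built from its standard full conditionals (Park and Casella, 2008). The sketch below is our own minimal illustration, with a fixed λ and the σ^2 ∼ Inverse-Gamma(a/2, b/2) prior; the function name and default values are ours, not the authors' code.

```python
# Minimal Gibbs sampler sketch for the Bayesian lasso with fixed lambda and the
# conditional prior beta | sigma^2 ~ N(0, sigma^2 D_tau). Full conditionals:
#   beta | rest       ~ N(A^{-1} X'y, sigma^2 A^{-1}),  A = X'X + D_tau^{-1}
#   1/tau_j^2 | rest  ~ Inverse-Gaussian(sqrt(lam^2 sigma^2 / beta_j^2), lam^2)
#   sigma^2 | rest    ~ Inverse-Gamma((n + p + a)/2,
#                         (||y - X beta||^2 + beta' D_tau^{-1} beta + b)/2)
import numpy as np

def bayesian_lasso_gibbs(X, y, lam=1.0, a=1.0, b=1.0, n_iter=2000, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    inv_tau2 = np.ones(p)          # 1 / tau_j^2
    sigma2 = 1.0
    draws = []
    for _ in range(n_iter):
        # Draw beta from its multivariate normal full conditional.
        A = XtX + np.diag(inv_tau2)
        beta = rng.multivariate_normal(np.linalg.solve(A, Xty),
                                       sigma2 * np.linalg.inv(A))
        # Draw 1/tau_j^2; numpy's wald(mean, scale) is the inverse Gaussian.
        mu_ig = np.sqrt(lam**2 * sigma2 / beta**2)
        inv_tau2 = rng.wald(mu_ig, lam**2)
        # Draw sigma^2 as the reciprocal of a gamma draw.
        shape = (n + p + a) / 2.0
        scale = (np.sum((y - X @ beta) ** 2) + np.sum(inv_tau2 * beta**2) + b) / 2.0
        sigma2 = 1.0 / rng.gamma(shape, 1.0 / scale)
        draws.append(beta)
    return np.asarray(draws)
```

A posterior point estimate (mean or mode across draws) then plays the role of the lasso-type estimator discussed above, with credible intervals available from the same draws.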
Moreover, Kyung et al. (2010) illustrate other lasso-type penalized regression schemes that can be represented through hierarchical extensions of the conditional independence prior. In addition to Tibshirani's original lasso, both the group lasso of Yuan and Lin (2006) and the elastic net of Zou and Hastie (2005) can be represented in this fashion. A general examination of posterior consistency under hierarchical extensions of the conditional independence prior could provide conditions under which these lasso-type regression techniques are consistent in the frequentist sense.

1.4. Shrinkage priors

Shrinkage estimation through continuous priors (Griffin and Brown, 2007; Park and Casella, 2008; Hans, 2009; Carvalho et al., 2010; Griffin and Brown, 2010) has received much attention in recent years, along with its frequentist analogues (Knight and Fu, 2000; Fan and Li, 2001; Yuan and Lin, 2005; Zhao and Yu, 2006; Zou, 2006; Zou and Li, 2008) in the regularization framework. The lasso of Tibshirani (1996) and its Bayesian analogues relying on double exponential priors (Park and Casella, 2008; Hans, 2009) have drawn particular attention, with many variations being proposed. These priors yield undeniable computational advantages in regression models over Bayesian variable selection approaches that require a search over a huge discrete model space (George and McCulloch, 1993; Raftery et al., 1997; Chipman et al., 2001; Liang et al., 2008; Clyde et al., 2010).

Consider the linear model y_n = X_n β_{0n} + ε_n, where y_n is an n-dimensional vector of responses, X_n is the n × p_n design matrix, ε_n ∼ N_n(0, σ_0^2 I_n) with fixed σ_0^2, and some of the components of β_{0n} are zero. In the Bayesian framework, to justify use in high-dimensional settings, it is important to establish posterior consistency in cases in which the number of parameters p increases
with sample size n. Armagan et al. (2013) investigated the asymptotic behavior of posterior distributions of regression coefficients in high-dimensional linear models as the number of parameters grows with the number of observations. Their main contribution is a simple sufficient condition on the prior concentration for strong posterior consistency (in the ℓ2 norm) when p_n = o(n). Their particular focus is on shrinkage priors, including the Laplace, Student t, generalized double Pareto, and horseshoe-type priors (Johnstone and Silverman, 2004; Griffin and Brown, 2007; Carvalho et al., 2010; Armagan et al., 2011a).

In this paper, we focus on the Bayesian lasso model with an orthogonal design and a fixed variance parameter, where the number of parameters grows with the sample size. The main objective of this paper is to derive sufficient conditions for posterior consistency in the Bayesian lasso model. In Section 2, we introduce the model and provide the main result.

2. Main result

Consider the following Bayesian lasso (Park and Casella, 2008) model, where we treat the variance parameter σ^2 as a nonrandom quantity:

    y_n | X_n, β_n, σ^2 ∼ N_n(X_n β_n, σ^2 I_n),
    β_n | σ^2, τ_1^2, ..., τ_{p_n}^2 ∼ N_{p_n}(0_{p_n}, σ^2 D_τ), where D_τ = diag(τ_1^2, ..., τ_{p_n}^2),

i.e., β_j | σ^2, τ_j^2 ∼ N(0, σ^2 τ_j^2) independently, j = 1, ..., p_n, and

    τ_j^2 ∼ Exp(λ^2/2) i.i.d., j = 1, ..., p_n, with τ_1^2, ..., τ_{p_n}^2 > 0.

Now suppose the true model is y_n = X_n β_{0n} + ε_n, where ε_n ∼ N_n(0, σ_0^2 I_n). We then need to find conditions on {X_n}_{n≥1}, {β_{0n}}_{n≥1}, and σ_0^2 such that

    P^n(||β_n − β_{0n}||_∞ > ε | y_n) → 0 a.s. as n → ∞, for every ε > 0.

We investigate the posterior consistency of the above Bayesian lasso model under a much more relaxed growth restriction on the dimension, namely p_n = O(n). As discussed above, posterior consistency in the ℓ2 norm seems unrealistic to achieve under this growth condition; hence, we consider posterior consistency in the weaker ℓ∞ norm. We have proved the following theorem.

Theorem 1.
Let s_n be the number of true nonzero regression coefficients, with s_n = O(n^{δ/4}). Then, for an orthogonal design, i.e., X_n^T X_n = n I_{p_n}, and under the condition ||β_{0n}||_2^2 = O(n^{2−δ}), δ > 0, on the true regression coefficients, posterior consistency of the regression coefficients holds; i.e.,

    P^n(||β_n − β_{0n}||_∞ > ε | y_n) → 0 a.s. as n → ∞, for every ε > 0.

Remark. The assumptions on s_n and β_{0n} are satisfied, for example, when s_n = O(n^{2/5}) and the entries of β_{0n} are uniformly bounded in n.

Proof.

    P^n(||β_n − β_{0n}||_∞ > ε | y_n)
    = P^n(||β_n − E(β_n | σ^2, τ^2, y_n) + E(β_n | σ^2, τ^2, y_n) − β_{0n}||_∞ > ε | y_n)
    ≤ P^n(||β_n − E(β_n | σ^2, τ^2, y_n)||_∞ > ε/2 | y_n) + P^n(||E(β_n | σ^2, τ^2, y_n) − β_{0n}||_∞ > ε/2 | y_n)
    = (I) + (II), say. (2.1)

Let β̃(σ^2, τ^2, y_n) = E(β_n | σ^2, τ^2, y_n) = (n I_{p_n} + D_τ^{-1})^{-1} X_n^T y_n, and note that X_n^T y_n = n β̂_n under the orthogonal design. Notice that

    β_n − β̃(σ^2, τ^2, y_n) | σ^2, τ^2, y_n ∼ N_{p_n}(0, σ^2 (n I_{p_n} + D_τ^{-1})^{-1}).

Also, let v_ii denote the ith diagonal element of (n I_{p_n} + D_τ^{-1})^{-1}. Then, by applying the tower property and the Bonferroni bound consecutively, we obtain
    (I) = P^n(||β_n − β̃(σ^2, τ^2, y_n)||_∞ > ε/2 | y_n)
    = E[ P^n(||β_n − β̃(σ^2, τ^2, y_n)||_∞ > ε/2 | σ^2, τ^2, y_n) | y_n ]
    ≤ Σ_{i=1}^{p_n} E[ P( |β_i − β̃_i(σ^2, τ^2, y_n)| / (σ √v_ii) > (ε/2)(σ √v_ii)^{-1} | σ^2, τ^2, y_n ) | y_n ]
    = Σ_{i=1}^{p_n} E[ P( |Z_i| > (ε/2)(σ √v_ii)^{-1} ) | y_n ], where Z_i ∼ N(0, 1),
    ≤ Σ_{i=1}^{p_n} P( |Z_i| > ε √n / (2σ) )
    = p_n P( |Z| > ε √n / (2σ) ) → 0 as n → ∞, (2.2)

since v_ii ≤ 1/n. Next, observe that by the triangle inequality,

    (II) = P^n(||(n I_{p_n} + D_τ^{-1})^{-1} X_n^T y_n − β_{0n}||_∞ > ε/2 | y_n)
    ≤ P^n( max_{1≤i≤p_n} |β̂_{n,i} − β_{0,i}| > ε/4 | y_n ) + P^n( max_{1≤i≤p_n} [τ_i^{-2}/(n + τ_i^{-2})] |β̂_{n,i}| > ε/4 | y_n )
    = (III) + (IV), say. (2.3)

By Corollary 2, it is easy to see that (III) → 0 a.s. Also,

    (IV) = P^n( max_{1≤i≤p_n} [τ_i^{-2}/(n + τ_i^{-2})] |β̂_{n,i}| > ε/4 | y_n ) I(||β̂_n − β_{0n}||_∞ ≤ ε/4)
        + P^n( max_{1≤i≤p_n} [τ_i^{-2}/(n + τ_i^{-2})] |β̂_{n,i}| > ε/4 | y_n ) I(||β̂_n − β_{0n}||_∞ > ε/4)
    = (V) + (VI), say. (2.4)

Clearly, (VI) → 0 a.s. Now

    (V) = P^n( max_{1≤i≤p_n} [τ_i^{-2}/(n + τ_i^{-2})] |β̂_{n,i}| > ε/4 | y_n ) I(||β̂_n − β_{0n}||_∞ ≤ ε/4)
    ≤ Σ_{i: |β̂_{n,i}| > ε/4} P^n( [τ_i^{-2}/(n + τ_i^{-2})] |β̂_{n,i}| > ε/4 | y_n ) I(||β̂_n − β_{0n}||_∞ ≤ ε/4)
    ≤ Σ_{i: |β̂_{n,i}| > ε/4} P^n( τ_i^{-2} > nε/(4|β̂_{n,i}|) | y_n ) I(||β̂_n − β_{0n}||_∞ ≤ ε/4). (2.5)

Since ||β̂_n − β_{0n}||_∞ ≤ ε/4 on this event and ||β_{0n}||_2^2 = O(n^{2−δ}), δ > 0, we have |β̂_{n,i}| = O_p(n^{1−δ/2}). Thus, for large n, using K (> 0) as a generic constant,
    Σ_{i: |β̂_{n,i}| > ε/4} P^n( τ_i^{-2} > nε/(4|β̂_{n,i}|) | y_n ) I(||β̂_n − β_{0n}||_∞ ≤ ε/4)
    ≤ Σ_{i: |β̂_{n,i}| > ε/4} P^n( τ_i^{-2} > K n^{δ/2} | y_n ) I(||β̂_n − β_{0n}||_∞ ≤ ε/4)
    ≤ Σ_{i: |β̂_{n,i}| > ε/4} E[ P^n( τ_i^{-2} > K n^{δ/2} | β_n, y_n ) | y_n ]
    ≤ s_n E[ P^n( τ_i^{-2} > K n^{δ/2} | β_n, y_n ) | y_n ]. (2.6)

Next we observe that τ_i^{-2} | β_n, y_n ∼ Inverse-Gaussian(λσ/|β_i|, λ^2). In order to find an upper bound for the inner probability in (2.6), we need the following lemma.

Lemma 3. Suppose X ∼ Inverse-Gaussian(μ, λ), with density sqrt(λ/(2πx^3)) exp(−λ(x − μ)^2/(2μ^2 x)) for x > 0. Then

    P(X > M) ≤ sqrt(2λ/(πM)) exp(λ/μ) exp(−λM/(2μ^2)).

Proof.

    P(X > M) = ∫_M^∞ sqrt(λ/(2πx^3)) exp(−λ(x − μ)^2/(2μ^2 x)) dx
    = sqrt(λ/(2π)) exp(λ/μ) ∫_M^∞ x^{−3/2} exp(−λx/(2μ^2)) exp(−λ/(2x)) dx
    ≤ sqrt(λ/(2π)) exp(λ/μ) ∫_M^∞ x^{−3/2} exp(−λx/(2μ^2)) dx
    ≤ sqrt(λ/(2π)) exp(λ/μ) exp(−λM/(2μ^2)) ∫_M^∞ x^{−3/2} dx
    = sqrt(λ/(2π)) (2/√M) exp(λ/μ) exp(−λM/(2μ^2))
    = sqrt(2λ/(πM)) exp(λ/μ) exp(−λM/(2μ^2)).

By the above lemma, an upper bound for (2.6) is given by

    s_n (K/n^{δ/4}) E[ exp( λ|β_i|/σ − K n^{δ/2} β_i^2/(2λσ^2) ) | y_n ]. (2.7)

Next observe that the β_i have i.i.d. priors with common pdf f(β | λ, σ) = (λ/(2σ)) exp[−(λ/σ)|β|]. Also, since X_n^T X_n = n I_{p_n}, we may write ||y_n − X_n β_n||^2 = ||y_n − X_n β̂_n||^2 + ||X_n(β̂_n − β_n)||^2, and further ||X_n(β̂_n − β_n)||^2 = n ||β̂_n − β_n||^2 = n Σ_{i=1}^{p_n} (β_i − β̂_{n,i})^2. Hence, the posterior of β_i | y_n is

    π(β_i | y_n) ∝ exp( −n(β_i − β̂_{n,i})^2/(2σ^2) − λ|β_i|/σ ). (2.8)
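As a quick numerical sanity check (ours, not part of the paper), the exponential tail decay used here can be compared against the exact inverse-Gaussian survival function. The bound sqrt(2λ/(πM)) e^{λ/μ} e^{−λM/(2μ^2)} below follows from the chain of inequalities in the proof together with ∫_M^∞ x^{−3/2} dx = 2/√M; note that SciPy parameterizes the Inverse-Gaussian(μ, λ) law as invgauss(μ/λ, scale=λ).

```python
# Compare the exact IG(mu, lam) tail probability with the Lemma 3-style bound.
import numpy as np
from scipy.stats import invgauss

def ig_tail(mu, lam, M):
    # P(X > M) for X ~ Inverse-Gaussian(mean mu, shape lam)
    return invgauss.sf(M, mu / lam, scale=lam)

def tail_bound(mu, lam, M):
    # sqrt(2 lam / (pi M)) * exp(lam/mu) * exp(-lam M / (2 mu^2))
    return np.sqrt(2.0 * lam / (np.pi * M)) * np.exp(lam / mu - lam * M / (2.0 * mu**2))

for mu, lam, M in [(1.0, 2.0, 3.0), (0.5, 1.0, 4.0), (2.0, 5.0, 10.0)]:
    exact, bound = ig_tail(mu, lam, M), tail_bound(mu, lam, M)
    assert exact <= bound
    print(f"mu={mu}, lam={lam}, M={M}: exact={exact:.3e} <= bound={bound:.3e}")
```

In the proof's application, M = K n^{δ/2} with μ and λ fixed in n, so the bound decays like n^{−δ/4} times an exponentially small factor, which is what feeds into (2.7).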
In view of (2.7),

    E[ exp( λ|β_i|/σ − K n^{δ/2} β_i^2/(2λσ^2) ) | y_n ]
    = [ ∫_{−∞}^{∞} exp( −n(β_i − β̂_{n,i})^2/(2σ^2) − K n^{δ/2} β_i^2/(2λσ^2) ) dβ_i ]
      / [ ∫_{−∞}^{∞} exp( −n(β_i − β̂_{n,i})^2/(2σ^2) − λ|β_i|/σ ) dβ_i ]
    = N/D, say. (2.9)

Now,

    N ≤ ∫_{−∞}^{∞} exp( −n(β_i − β̂_{n,i})^2/(2σ^2) ) dβ_i = (2πσ^2/n)^{1/2}. (2.10)

Also, splitting the integral at 0 and completing the square on each piece,

    D = ∫_0^∞ exp( −nβ_i^2/(2σ^2) + nβ_i β̂_{n,i}/σ^2 − nβ̂_{n,i}^2/(2σ^2) − λβ_i/σ ) dβ_i
      + ∫_{−∞}^0 exp( −nβ_i^2/(2σ^2) + nβ_i β̂_{n,i}/σ^2 − nβ̂_{n,i}^2/(2σ^2) + λβ_i/σ ) dβ_i
    = exp( λ^2/(2n) − λβ̂_{n,i}/σ ) ∫_0^∞ exp( −(n/(2σ^2))(β_i − β̂_{n,i} + λσ/n)^2 ) dβ_i
      + exp( λ^2/(2n) + λβ̂_{n,i}/σ ) ∫_{−∞}^0 exp( −(n/(2σ^2))(β_i − β̂_{n,i} − λσ/n)^2 ) dβ_i
    = [ Φ( √n (β̂_{n,i} − λσ/n)/σ ) exp( λ^2/(2n) − λβ̂_{n,i}/σ )
      + Φ( −√n (β̂_{n,i} + λσ/n)/σ ) exp( λ^2/(2n) + λβ̂_{n,i}/σ ) ] (2πσ^2/n)^{1/2}, (2.11)

where Φ denotes the standard normal cdf. Since β̂_{n,i} → β_{0,i} a.s., it follows from (2.10) and (2.11) that N/D = O(1) a.s. as n → ∞. Hence, from (2.7) to (2.11), it follows that (V) → 0 a.s. as n → ∞.

3. Discussion

The Bayesian lasso is a popular and widely used algorithm for sparse Bayesian estimation in linear regression. In this paper, we have established posterior consistency of the Bayesian lasso algorithm under an orthogonal design and a fixed variance parameter, where the number of parameters grows with the sample size. Using the insights obtained from the analysis in this paper, we are currently investigating the high-dimensional posterior consistency of the Bayesian lasso for an arbitrary design matrix and a stochastic variance parameter. In future work, we would also like to consider a careful analysis of the convergence rates.
Acknowledgment

The author would like to thank Prof. Malay Ghosh and Prof. Kshitij Khare for their help with the paper.

References

Armagan, A., Dunson, D.B., Lee, J., Bajwa, W.U., Strawn, N. (2013). Posterior consistency in linear models under shrinkage priors. Biometrika 100:1011-1018.
Armagan, A., Dunson, D.B., Clyde, M. (2011a). Generalized beta mixtures of Gaussians. Advances in Neural Information Processing Systems (NIPS).
Carvalho, C.M., Polson, N.G., Scott, J.G. (2010). The horseshoe estimator for sparse signals. Biometrika 97:465-480.
Casella, G. (2001). Empirical Bayes Gibbs sampling. Biostatistics 2:485-500.
Chipman, H., George, E.I., McCulloch, R.E. (2001). The practical implementation of Bayesian model selection. IMS Lecture Notes - Monograph Series 38.
Clyde, M., Ghosh, J., Littman, M.L. (2010). Bayesian adaptive sampling for variable selection and model averaging. J. Comput. Graph. Stat. 20(1):80-101.
Fan, J., Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96:1348-1360.
George, E.I., McCulloch, R.E. (1993). Variable selection via Gibbs sampling. J. Am. Stat. Assoc. 88:881-889.
Griffin, J.E., Brown, P.J. (2007). Bayesian adaptive lassos with non-convex penalization. Technical Report.
Griffin, J.E., Brown, P.J. (2010). Inference with normal-gamma prior distributions in regression problems. Bayesian Anal. 5:171-188.
Hans, C. (2009). Bayesian lasso regression. Biometrika 96:835-845.
Johnstone, I.M., Silverman, B.W. (2004). Needles and straw in haystacks: Empirical Bayes estimates of possibly sparse sequences. Ann. Stat. 32:1594-1649.
Knight, K., Fu, W. (2000). Asymptotics for lasso-type estimators. Ann. Stat. 28:1356-1378.
Kyung, M., Gill, J., Ghosh, M., Casella, G. (2010). Penalized regression, standard errors, and Bayesian lassos. Bayesian Anal. 5:369-412.
Liang, F., Paulo, R., Molina, G., Clyde, M., Berger, J. (2008). Mixtures of g priors for Bayesian variable selection. J. Am. Stat. Assoc. 103:410-423.
Park, T., Casella, G. (2008). The Bayesian lasso. J. Am. Stat. Assoc. 103:681-686.
Raftery, A.E., Madigan, D., Hoeting, J.A. (1997). Bayesian model averaging for linear regression models. J. Am. Stat. Assoc. 92:179-191.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58:267-288.
Yuan, M., Lin, Y. (2005). Efficient empirical Bayes variable selection and estimation in linear models. J. Am. Stat. Assoc. 100:1215-1225.
Yuan, M., Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B 68:49-67.
Zhao, P., Yu, B. (2006). On model selection consistency of lasso. J. Mach. Learn. Res. 7:2541-2563.
Zou, H. (2006). The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 101:1418-1429.
Zou, H., Hastie, T. (2005). Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B 67:301-320.
Zou, H., Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models. Ann. Stat. 36:1509-1533.