Journal of Multivariate Analysis 131 (2014) 126–148
Contents lists available at ScienceDirect
Journal of Multivariate Analysis
journal homepage: www.elsevier.com/locate/jmva
Asymptotic expansion of the posterior density in high
dimensional generalized linear models
Shibasish Dasgupta, Kshitij Khare, Malay Ghosh∗
University of Florida, United States
Article info
Article history:
Received 12 July 2013
Available online 21 June 2014
AMS 2010 subject classification:
62F15
Keywords:
Asymptotic expansion of the posterior
Generalized linear models
Canonical link function
High dimensional inference
Moment matching priors
Abstract
When developing a prior distribution for any Bayesian analysis, it is important to check whether the corresponding posterior distribution becomes degenerate at the true parameter value in the limit as the sample size increases. In the same vein, it is also important to understand the more detailed asymptotic behavior of posterior distributions.
This is particularly relevant in the development of many nonsubjective priors. The present
paper focuses on asymptotic expansions of posteriors for generalized linear models with
canonical link functions when the number of regressors grows to infinity at a certain rate
relative to the growth of the sample size. These expansions are then used to derive moment
matching priors in the generalized linear model setting.
© 2014 Elsevier Inc. All rights reserved.
1. Introduction
Bayesian methodology is gaining increasing prominence in the theory and application of statistics. Its versatility has been enhanced by its implementability via many statistical numerical integration techniques, in particular the Markov chain Monte Carlo method. Nevertheless, it is important not to overlook the asymptotic performance of any Bayesian procedure.
Specifically, it is important to check whether a posterior distribution generated by a prior becomes degenerate in the limit
to the true parameter value as the sample size grows to infinity. In the same vein, it is also important to understand a more
detailed asymptotic behavior of the posterior distribution of the (appropriately normalized) parameter of interest.
Asymptotic normality of the posterior for regular families of distributions (those whose support does not depend on the parameter) based on i.i.d. observations was first developed by Bernstein and von Mises (see [2]). Later,
analogous to frequentist Edgeworth expansion of the density or the distribution function, higher order asymptotic expansion
of the posterior was developed to address various other important issues needed for Bayesian analysis, most prominently
the development of non-subjective priors using a number of different criteria; see e.g. [11] where other references are cited.
To our knowledge, the first work dealing with a comprehensive asymptotic expansion of the posterior is due to
Johnson [16,17]. This was followed up later by Walker [22], Ghosh, Sinha and Joshi [13], Crowder [4], just to name a few.
However, much of this work focused on posteriors based on i.i.d. observations from a regular family of distributions and a smooth family of priors admitting derivatives up to a certain order. Ghosal and Samanta [10] established
asymptotic expansion of the posterior in the non-regular case, by considering a one-parameter family of discontinuous
densities.
Ghosal [6–8] made significant and topical contributions to this area by establishing posterior consistency in a high
dimensional context. Specifically, Ghosal [6] established posterior consistency (asymptotic normality in the Bernstein–von-
Mises sense) of the posterior for generalized linear models in a high dimensional setup. The number of regressors pn is
∗ Corresponding author.
E-mail address: ghoshm@stat.ufl.edu (M. Ghosh).
http://dx.doi.org/10.1016/j.jmva.2014.06.013
allowed to grow with the sample size $n$. In particular, it is assumed that $p_n^4 \log p_n / n \to 0$. Later, Ghosal [7] established
asymptotic normality of the posterior for linear regression models in a similar high dimensional setup as Ghosal [6]. In [8],
asymptotic normality of the posterior was established for exponential families as the number of parameters grows with the
sample size. Bontemps [3] extended the work of Ghosal [7] by permitting the model to be misspecified and the number of
regressors to grow proportionally to the sample size. Barron et al. [1] and Ghosal et al. [9] have considered the notion of
posterior consistency in nonparametric settings.
In this paper, we focus on generalized linear models (GLM) with canonical link function. The main objective of this paper
is to extend the asymptotic consistency result of Ghosal [6], by providing a third order correct asymptotic expansion of
the posterior density for GLM with canonical link function when the number of regressors grows to infinity at a certain rate
relative to the growth of the sample size n. Since a general link function is a one-to-one function of the canonical link function,
we can get a similar asymptotic expansion for the vector of regression parameters in the general case as well. The results
bear potential for the development of a variety of objective priors in this framework. The first step towards the development
of reference priors, probability matching priors, moment matching priors and others requires asymptotic expansions of
posteriors (cf. [11]). In particular, we use the asymptotic expansion to derive moment matching priors (introduced in [12])
in the GLM setting. To the best of the authors’ knowledge, identification of moment matching priors in this setting (both
when the number of regressors is bounded, and when the number of regressors increases with n) has not been undertaken
in the literature.
The paper is organized as follows. In Section 2, we introduce the model and provide the required assumptions. In Section 3,
we prove the main asymptotic expansion result (Theorem 1). In Section 4, we use this asymptotic expansion to derive
moment matching priors. The Appendix contains proofs which establish that the assumptions (in Section 2) on the prior
density are satisfied by the multivariate normal and multivariate t densities.
2. Preliminaries
2.1. Setup and assumptions
Let $X_1, \ldots, X_n$ be independent random variables. Let $f_i(\cdot)$ denote the density of $X_i$ with respect to a $\sigma$-finite measure $\nu$. Suppose
$$f_i(x_i) = \exp[x_i \theta_i - \psi(\theta_i)], \quad i = 1, \ldots, n, \qquad (2.1)$$
where $\theta_i = z_i^T \beta$, $\beta = (\beta_1, \ldots, \beta_{p_n})^T$ is the vector of parameters and $z_i = (z_{i1}, \ldots, z_{ip_n})^T$ is the vector of covariates for $i = 1, \ldots, n$. Note that we are allowing the dimension $p_n$ of the parameter $\beta$ to grow with the sample size $n$. Also, the cumulant generating function $\psi$ is infinitely differentiable and is assumed to be strictly convex. The above model is termed by Haberman [14] the ``Dempster model''.
Let $\pi(\cdot)$ denote the prior density of $\beta$. Then the posterior density of $\beta$ given the observations $X_1, \ldots, X_n$ is defined by
$$\pi(\beta \mid X) = \frac{\exp[l_n(\beta)]\,\pi(\beta)}{\int \exp[l_n(\beta)]\,\pi(\beta)\,d\beta}, \qquad (2.2)$$
where
$$l_n(\beta) = \sum_{i=1}^{n}\left(X_i z_i^T\beta - \psi(z_i^T\beta)\right)$$
is the log-likelihood function. Note that the covariate vectors $z_1, \ldots, z_n$, the true parameter value $\beta_0$, the prior $\pi(\cdot)$, and the posterior $\pi(\cdot \mid X)$ all change with $n$. However, we suppress this dependence in our notation for simplicity of exposition. We now state the regularity conditions needed for our result.
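To fix ideas, the following sketch (our own illustration, not part of the original paper; all function names are ours) evaluates $l_n(\beta)$, its gradient and its Hessian for the Bernoulli/logistic special case of model (2.1), where the canonical link gives $\psi(\theta) = \log(1+e^\theta)$:

```python
import numpy as np

# Logistic regression as a canonical-link GLM: psi(theta) = log(1 + exp(theta)).
def psi(theta):
    return np.logaddexp(0.0, theta)          # numerically stable log(1 + e^theta)

def psi_prime(theta):                        # psi'(theta): the mean function
    return 1.0 / (1.0 + np.exp(-theta))

def psi_double_prime(theta):                 # psi''(theta): the variance function
    p = psi_prime(theta)
    return p * (1.0 - p)

def log_lik(beta, X, Z):
    """l_n(beta) = sum_i [X_i z_i^T beta - psi(z_i^T beta)], with Z of shape (n, p_n)."""
    theta = Z @ beta
    return float(np.sum(X * theta - psi(theta)))

def grad_log_lik(beta, X, Z):
    return Z.T @ (X - psi_prime(Z @ beta))

def hess_log_lik(beta, X, Z):
    w = psi_double_prime(Z @ beta)
    return -(Z * w[:, None]).T @ Z           # negative definite, since psi is strictly convex
```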
• (A-0) The matrix $A_n = \sum_{i=1}^{n} z_i z_i^T$ is positive definite, and the eigenvalues of $\frac{1}{n}A_n$ are uniformly bounded, i.e., there exist constants $C_1$ and $C_2$ (independent of $n$) such that
$$0 < C_1 < \lambda_{\min}\left(\frac{1}{n}A_n\right) \le \lambda_{\max}\left(\frac{1}{n}A_n\right) < C_2 < \infty$$
for all $n$. Here $\lambda_{\max}$ and $\lambda_{\min}$ respectively denote the largest and smallest eigenvalues of the appropriate matrix. Further, we assume that $\|z_i\| = \sqrt{z_i^T z_i} = O(\sqrt{p_n})$. More specifically, there exists a constant $M$ (independent of $n$) such that $\|z_i\| \le M\sqrt{p_n}$.
• (A-1) Let $\beta_0$ denote the (sequence of) true values of the regression parameter vector $\beta$. Note that $\theta_{0i} = z_i^T\beta_0$ is the true value of the parameter $\theta_i$ in (2.1). We assume that $\max_{1\le i\le n}|\theta_{0i}|$ is uniformly bounded as $n$ varies, i.e., there exists a constant $K$ (independent of $n$) such that
$$\max_{1\le i\le n}|z_i^T\beta_0| = \max_{1\le i\le n}|\theta_{0i}| < K. \qquad (2.3)$$
As mentioned in [6,7], this assumption makes sense particularly if the data are free of extreme outliers. As in [6,7], we also assume that the parameter space is restricted to those values of $\beta$ for which
$$\max_{1\le i\le n}|z_i^T\beta| \le K', \qquad (2.4)$$
for some $K' > K$. This is equivalent to the statement that the parameter space is restricted to $\Theta_n$, where
$$\Theta_n = \left\{\beta : \max_{1\le i\le n}|z_i^T\beta| \le K'\right\}. \qquad (2.5)$$
Note that $\Theta_n$ is a convex set. The posterior density of $\beta$ given the observations $X_1,\ldots,X_n$ (introduced in (2.2)) is more precisely given by
$$\pi(\beta\mid X) = \frac{\exp[l_n(\beta)]\,\pi(\beta)}{\int_{\Theta_n}\exp[l_n(\beta)]\,\pi(\beta)\,d\beta}\,1_{\{\beta\in\Theta_n\}}. \qquad (2.6)$$
We refer the reader to Ghosal [6,7] for details and discussion about this assumption. In summary, a frequentist can think of this as a compactness assumption that prevents the posterior mass from escaping to infinity, while a Bayesian can think of it as a convenient and reasonable prior belief about $\theta$. It should be noted that actual knowledge of $K$ and $K'$ is not required to obtain the main terms (up to the third order) in the expansion in Theorem 1; however, $K$ and $K'$ do control the rate at which the $o_p(1)$ terms in the expansion converge to 0.
In this context it is also important to clarify that when we propose priors such as the multivariate normal or multivariate $t$ for $\beta$, we implicitly truncate these priors to the region $\Theta_n$.
• (A-2) The prior density $\pi(\cdot)$ of $\beta$ satisfies $\int_{\Theta_n}\pi(\beta)\,d\beta = 1$ and $\pi(\beta_0) > \eta_0^{p_n}$ for some $\eta_0 > 0$ ($\eta_0$ does not depend on $n$). Also, $\pi(\cdot)$ is assumed to be twice continuously differentiable with
$$\sup_{\|\beta-\beta_0\|\le C_n}\|\nabla\log\pi(\beta)\|_2 < M_1 p_n^{3/2}\ \text{for some } M_1 > 0, \qquad (2.7)$$
and
$$\sup_{\|\beta-\beta_0\|\le C_n}\max_{1\le j,j'\le p_n}\left|\frac{1}{\pi(\beta)}\frac{\partial^2\pi(\beta)}{\partial\beta_j\partial\beta_{j'}}\right| < M_2 p_n^{5/2}\ \text{for some } M_2 > 0, \quad \text{where } C_n = \sqrt[4]{\frac{p_n}{n}}. \qquad (2.8)$$
This assumption is satisfied by appropriate multivariate $t$ and multivariate normal densities (see the Appendix). Note that the prior density can be improper as a density on $\mathbb{R}^{p_n}$. We only assume that it has been normalized to integrate to 1 on the compact set $\Theta_n$.
• (A-3) The dimension $p_n$ can grow to infinity in such a way that $p_n^{6+\epsilon}/n \to 0$ as $n\to\infty$ for some small $\epsilon > 0$.
Note that (A-3) is stronger than the corresponding assumption in [6], which only requires $p_n^4 \log p_n/n \to 0$. However, the goal in [6] is to establish asymptotic normality of the posterior, whereas our goal is to obtain a third order asymptotic expansion of the posterior. Hence it is not surprising that we need a slower rate of increase for $p_n$.
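For a given design, the conditions in (A-0) are easy to inspect numerically; the sketch below (our own illustration, with hypothetical helper names) reports the eigenvalue bounds on $\frac{1}{n}A_n$ and the constant $M$ in $\|z_i\| \le M\sqrt{p_n}$:

```python
import numpy as np

def check_design(Z):
    """Report the quantities controlled by assumption (A-0) for an n x p_n design Z."""
    n, p = Z.shape
    An_over_n = Z.T @ Z / n
    eigvals = np.linalg.eigvalsh(An_over_n)
    row_norms = np.linalg.norm(Z, axis=1)
    return {
        "lambda_min": eigvals[0],           # should stay above some C1 > 0 as n grows
        "lambda_max": eigvals[-1],          # should stay below some C2 < infinity
        "M": row_norms.max() / np.sqrt(p),  # the constant in ||z_i|| <= M sqrt(p_n)
    }

# Example: bounded i.i.d. covariates satisfy these conditions with high probability.
rng = np.random.default_rng(0)
print(check_design(rng.uniform(-1.0, 1.0, size=(5000, 10))))
```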
2.2. Asymptotic convergence rate for MLE
Let $\hat\beta_n$ be the maximum likelihood estimator of $\beta$. It follows from the convexity of $\psi$ and assumption (A-0) that the Hessian matrix of $l_n(\beta)$ is negative definite for all $\beta$. Hence $l_n(\beta)$ is a strictly concave function and has a unique maximum. The following lemma (Lemma 1) establishes weak consistency of the maximum likelihood estimator $\hat\beta_n$, and provides an asymptotic rate of convergence. This lemma is helpful in proving the main result (Theorem 1). Haberman [14] established consistency and asymptotic normality for the MLE in exponential response models, a more general version of the Dempster model considered here, when $p_n^3/n \to 0$. However, it is not quite clear whether Haberman's results can be used under our assumptions to obtain the asymptotic rate in Lemma 1. Hence, for the sake of completeness, we provide an independent proof of Lemma 1 in the Appendix by adapting the approach of Fan and Peng [5] (developed in the i.i.d. setting) to the GLM setting.
We briefly mention some other works on high dimensional consistency and asymptotic normality of the MLE, and the differences between our setup and the setups in those papers. Portnoy [19,20] established consistency and asymptotic normality of M-estimators in the context of linear regression, as the number of regression parameters $p_n$ grows with the sample size $n$ (satisfying the condition $(p_n\log p_n)^{3/2}/n \to 0$).¹ Portnoy [21] established consistency and asymptotic normality of the MLE for i.i.d. observations from exponential families, as the number of parameters $p_n$ grows with the sample size $n$ (satisfying the condition $p_n^{3/2}/n \to 0$). This is a different setting from the regression based setting (with covariates) considered in this paper. Fan and Peng [5] established high dimensional consistency and asymptotic normality of penalized likelihood estimators (the MLE can be thought of as a special case). However, they considered the i.i.d. setting, which differs from the setting in this paper. Zhang et al. [23] considered penalized pseudo-likelihood estimators for high dimensional GLM. However, their Bregman divergence based loss functions do not include the negative log-likelihood loss function. More specifically, in the context of GLM with canonical link, Zhang et al.'s [23] loss function takes the form
$$\sum_{i=1}^{n}\left[-q(X_i) + q(\psi'(z_i^T\beta)) + (X_i - \psi'(z_i^T\beta))\,q'(\psi'(z_i^T\beta))\right], \qquad (2.9)$$
where $q(\cdot)$ is a concave function. The log-likelihood function $l_n(\beta)$ cannot be written in this form. A proof of high dimensional asymptotic normality of $\hat\beta_n$ in the special case of logistic regression is provided in [18].

¹ See [19,20] for references to earlier works in this area.
Lemma 1. Under assumptions (A-0)–(A-3), the maximum likelihood estimator $\hat\beta_n$ satisfies $\|\hat\beta_n - \beta_0\| = O_p\left(\sqrt{\frac{p_n}{n}}\right)$.

Remark 1. Note that by Lemma 1 and (A-0),
$$|z_i^T(\hat\beta_n - \beta_0)| \le \|z_i\|\,\|\hat\beta_n - \beta_0\| = O_p\left(\frac{p_n}{\sqrt{n}}\right).$$
By (A-1), it follows that
$$\hat\beta_n \in \left\{\beta : \max_{1\le i\le n}|z_i^T\beta| < \frac{K + K'}{2}\right\} \qquad (2.10)$$
with probability tending to 1 as $n\to\infty$. In particular, we get $\hat\beta_n \in \Theta_n$ with probability tending to 1 as $n\to\infty$.
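Since $l_n$ is strictly concave, $\hat\beta_n$ is easily computed by Newton–Raphson. The following self-contained sketch (our own illustration, for the logistic model) computes $\hat\beta_n$ on simulated data and reports $\|\hat\beta_n - \beta_0\|/\sqrt{p_n/n}$, which Lemma 1 predicts remains stochastically bounded:

```python
import numpy as np

def newton_mle(X, Z, tol=1e-10, max_iter=100):
    """Maximize the strictly concave logistic log-likelihood by Newton-Raphson."""
    beta = np.zeros(Z.shape[1])
    for _ in range(max_iter):
        theta = Z @ beta
        mu = 1.0 / (1.0 + np.exp(-theta))          # psi'(theta)
        w = mu * (1.0 - mu)                        # psi''(theta)
        grad = Z.T @ (X - mu)                      # gradient of l_n
        neg_hess = (Z * w[:, None]).T @ Z          # -Hessian of l_n (positive definite)
        step = np.linalg.solve(neg_hess, grad)
        beta += step
        if np.linalg.norm(step) < tol:
            break
    return beta

rng = np.random.default_rng(1)
n, p = 20000, 10
Z = rng.uniform(-1.0, 1.0, size=(n, p))
beta0 = rng.uniform(-0.3, 0.3, size=p)             # keeps max_i |z_i^T beta0| bounded, as in (A-1)
X = rng.binomial(1, 1.0 / (1.0 + np.exp(-(Z @ beta0)))).astype(float)
beta_hat = newton_mle(X, Z)
print(np.linalg.norm(beta_hat - beta0) / np.sqrt(p / n))   # O_p(1) by Lemma 1
```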
3. Main result
In this section, we derive our main result: a third order correct asymptotic expansion of the posterior $\pi(\cdot\mid X)$ around an appropriate normal density. We transform the parameter $\beta$ to $g = \sqrt{n}(\beta - \hat\beta_n)$. Since the parameter space for $\beta$ is $\Theta_n$, it follows that the parameter space for $g$ is
$$G_n := \left\{g : \hat\beta_n + \frac{g}{\sqrt{n}} \in \Theta_n\right\}.$$
From (2.2) we obtain that the posterior density of $g$ is given by
$$\pi^*(g\mid X) = \frac{\exp\left[l_n\left(\hat\beta_n + \frac{g}{\sqrt{n}}\right) - l_n(\hat\beta_n)\right]\pi\left(\hat\beta_n + \frac{g}{\sqrt{n}}\right)}{\int_{G_n}\exp\left[l_n\left(\hat\beta_n + \frac{g}{\sqrt{n}}\right) - l_n(\hat\beta_n)\right]\pi\left(\hat\beta_n + \frac{g}{\sqrt{n}}\right)dg}\,1_{\{g\in G_n\}}. \qquad (3.1)$$
We now prove a series of lemmas which help us to prove our main result (Theorem 1). We first show that Ghosal's [6] result on posterior consistency holds under our assumptions.
Lemma 2. Under assumptions (A-0)–(A-3) described above,
$$\int\left|\pi^*(g\mid X) - N_{p_n}(g\mid\mu_n,\Sigma_n)\right|dg \to 0, \qquad (3.2)$$
where $N_{p_n}(g\mid\mu_n,\Sigma_n)$ is a $p_n$-dimensional normal density with mean vector
$$\mu_n = \sqrt{n}\,B_n^{-1}\sum_{i=1}^{n}\left(X_i - \psi'(z_i^T\beta_0)\right)z_i - \sqrt{n}\,(\hat\beta_n - \beta_0),$$
and inverse covariance matrix
$$\Sigma_n^{-1} = \frac{1}{n}B_n = \frac{1}{n}\sum_{i=1}^{n}\psi''(z_i^T\beta_0)\,z_i z_i^T.$$
Proof. We verify that the assumptions in [6] follow from (A-0)–(A-3). Note that Ghosal's Eqs. (2.6) and (2.7) follow immediately from our assumptions (A-1) and (A-2). Let $\delta_n = \|A_n^{-1/2}\|$. By (A-0), it follows that $\delta_n = O(n^{-1/2})$. Note that by (A-2), if $\|\beta - \beta_0\| \le \sqrt[4]{p_n/n}$, then by the mean value theorem,
$$|\log\pi(\beta) - \log\pi(\beta_0)| \le \sup_{\|\beta-\beta_0\|\le\sqrt[4]{p_n/n}}\|\nabla\log\pi(\beta)\|\,\|\beta - \beta_0\| \le M_1 p_n^{3/2}\,\|\beta - \beta_0\|.$$
Note that
$$p_n(\log p_n)^{1/2}\,\delta_n = O\left(\frac{p_n^{1+\epsilon/3}}{\sqrt{n}}\right) = o\left(\sqrt[4]{\frac{p_n}{n}}\right).$$
Hence, Ghosal [6, Eq. (2.8)] is satisfied with $K_n = M_1 p_n^{3/2}$. Note that
$$K_n\delta_n p_n(\log p_n)^{1/2} = \frac{p_n^{5/2+\epsilon/3}}{\sqrt{n}} \to 0.$$
Let $\eta_n = \max_{1\le i\le n}\|A_n^{-1/2}z_i\|$. Then
$$\eta_n \le \|A_n^{-1/2}\|\,\max_{1\le i\le n}\|z_i\| = O\left(\sqrt{\frac{p_n}{n}}\right),$$
where
$$\|A_n^{-1/2}\| = \sup\left\{\frac{\|A_n^{-1/2}x\|}{\|x\|} : x\in\mathbb{R}^{p_n},\ x\ne 0\right\}.$$
This means
$$p_n^{3/2}(\log p_n)^{1/2}\,\eta_n = O\left(p_n^{3/2+\epsilon/3}\sqrt{\frac{p_n}{n}}\right) = O\left(\frac{p_n^{2+\epsilon/3}}{\sqrt{n}}\right) \to 0.$$
Hence, Ghosal [6, Eq. (2.10)] is satisfied.
Now, since $\frac{1}{n}\sum_{i=1}^{n}z_i z_i^T$ has uniformly bounded eigenvalues (by (A-0)), we have
$$\mathrm{tr}\left(\frac{1}{n}\sum_{i=1}^{n}z_i z_i^T\right) = O(p_n).$$
Elementary manipulations using properties of the trace imply that
$$\sum_{i=1}^{n}\sum_{j=1}^{p_n}z_{ij}^2 = \mathrm{tr}\left(\sum_{i=1}^{n}z_i z_i^T\right) = O(np_n).$$
Thus, Ghosal [6, Eq. (2.11)] is also satisfied. Hence, all the assumptions in [6] hold. The lemma now follows from Theorem 2.1 of Ghosal [6] (using a straightforward linear transformation). □
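In simulations, where $\beta_0$ is known, the parameters $\mu_n$ and $\Sigma_n$ of the approximating normal density in Lemma 2 can be computed directly; a minimal sketch for the logistic model (our own illustration, with our own function names):

```python
import numpy as np

def bvm_parameters(X, Z, beta0, beta_hat):
    """Mean mu_n and covariance Sigma_n of the approximating normal in Lemma 2."""
    n = Z.shape[0]
    mu = 1.0 / (1.0 + np.exp(-(Z @ beta0)))       # psi'(z_i^T beta0), logistic case
    w = mu * (1.0 - mu)                           # psi''(z_i^T beta0)
    Bn = (Z * w[:, None]).T @ Z                   # B_n = sum_i psi''(z_i^T beta0) z_i z_i^T
    score = Z.T @ (X - mu)                        # sum_i (X_i - psi'(z_i^T beta0)) z_i
    mu_n = np.sqrt(n) * np.linalg.solve(Bn, score) - np.sqrt(n) * (beta_hat - beta0)
    Sigma_n = n * np.linalg.inv(Bn)               # Sigma_n^{-1} = B_n / n
    return mu_n, Sigma_n
```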
Define the function
$$Z_n(g) := \exp\left[l_n\left(\hat\beta_n + \frac{g}{\sqrt{n}}\right) - l_n(\hat\beta_n)\right]. \qquad (3.3)$$
Note that
$$\pi^*(g\mid X) = \frac{Z_n(g)\,\pi\left(\hat\beta_n + \frac{g}{\sqrt{n}}\right)}{\int_{G_n}Z_n(g)\,\pi\left(\hat\beta_n + \frac{g}{\sqrt{n}}\right)dg}\,1_{\{g\in G_n\}}.$$
Henceforth, we assume that $p_n\to\infty$. If $p_n$ is uniformly bounded, a simple modification of the arguments below can be used to establish the asymptotic expansion result; see Section 3.1.
Lemma 3. Let $C_n := \left\{g : g\in G_n,\ \|g\| \le p_n^{1/2+\epsilon'}\right\}$ and $K_n = \pi(\hat\beta_n)\,(2\pi)^{p_n/2}\left|-\frac{\nabla^2 l_n(\hat\beta_n)}{n}\right|^{-1/2}$, where $\epsilon' = \frac{\epsilon}{6}$. Then
$$\int_{C_n}\frac{1}{K_n}Z_n(g)\,\pi\left(\hat\beta_n + \frac{g}{\sqrt{n}}\right)dg \stackrel{P}{\to} 1. \qquad (3.4)$$
Proof. Note that $Z_n(g) = \exp\left[l_n\left(\hat\beta_n + \frac{g}{\sqrt{n}}\right) - l_n(\hat\beta_n)\right]$. By a third order correct Taylor series expansion of $l_n$ around $\hat\beta_n$, we get that
$$Z_n(g) = \exp\left[\frac{g^T\nabla^2 l_n(\hat\beta_n)\,g}{2n} - \frac{1}{6n^{3/2}}\sum_{i=1}^{n}\psi'''(z_i^T\beta_n^*)\left(\sum_{r=1}^{p_n}z_{ir}g_r\right)^3\right], \qquad (3.5)$$
where $\beta_n^* = \beta_n^*(g)$ is an intermediate point on the line segment joining $\hat\beta_n$ and $\hat\beta_n + \frac{g}{\sqrt{n}}$. Note that by Lemma 1, $\hat\beta_n\in\Theta_n$ with probability tending to 1. Also, by the definition of $G_n$ it follows that $\hat\beta_n + \frac{g}{\sqrt{n}}\in\Theta_n$ for every $g\in G_n$. It follows by the convexity of $\Theta_n$ and (2.10) that
$$P\left(\beta_n^*(g)\in\Theta_n\ \forall\,g\in G_n\right) \to 1, \qquad (3.6)$$
as $n\to\infty$. Also, if $g\in C_n$,
$$\left|\sum_{r=1}^{p_n}z_{ir}g_r\right| \le \|z_i\|\,\|g\| \le M\sqrt{p_n}\,p_n^{1/2+\epsilon'} = M p_n^{1+\epsilon'}. \qquad (3.7)$$
Let
$$K_2 := \sup_{x\in[-K',K']}|\psi'''(x)|.$$
Note that $K_2 < \infty$ by continuity of $\psi'''$. Hence, if $\hat\beta_n\in\Theta_n$ and $g\in C_n$,
$$\left|\frac{1}{n^{3/2}}\sum_{i=1}^{n}\psi'''(z_i^T\beta_n^*)\left(\sum_{r=1}^{p_n}z_{ir}g_r\right)^3\right| \le \frac{K_2}{n^{3/2}}\sum_{i=1}^{n}\left|\sum_{r=1}^{p_n}z_{ir}g_r\right|^3 \le \frac{K_2 M^3 p_n^{3+3\epsilon'}}{\sqrt{n}}. \qquad (3.8)$$
The last inequality follows by (3.7). It follows by (A-3) that
$$\sup_{g\in C_n}\left|\frac{1}{n^{3/2}}\sum_{i=1}^{n}\psi'''(z_i^T\beta_n^*)\left(\sum_{r=1}^{p_n}z_{ir}g_r\right)^3\right| = O_p\left(\frac{p_n^{3+\epsilon/2}}{\sqrt{n}}\right) = o_p(1). \qquad (3.9)$$
Also, by (A-2), it follows that if $g\in C_n$, then
$$\frac{\pi\left(\hat\beta_n + \frac{g}{\sqrt{n}}\right)}{\pi(\hat\beta_n)} = \exp\left[\log\pi\left(\hat\beta_n + \frac{g}{\sqrt{n}}\right) - \log\pi(\hat\beta_n)\right] = \exp\left[\frac{\left(\nabla\log\pi(\beta_n^{**})\right)^T g}{\sqrt{n}}\right],$$
for some intermediate point $\beta_n^{**} = \beta_n^{**}(g)$ on the line segment joining $\hat\beta_n$ and $\hat\beta_n + \frac{g}{\sqrt{n}}$. Note that by Lemma 1 and (A-3), $\sup_{g\in C_n}\|\beta_n^{**} - \beta_0\| = o_p\left(\sqrt[4]{p_n/n}\right)$. It follows by (A-2) that
$$\sup_{g\in C_n}\frac{\left(\nabla\log\pi(\beta_n^{**})\right)^T g}{\sqrt{n}} \le \sup_{g\in C_n}\frac{\|\nabla\log\pi(\beta_n^{**})\|\,\|g\|}{\sqrt{n}} = O_p\left(\frac{p_n^{2+\epsilon'}}{\sqrt{n}}\right) = o_p(1). \qquad (3.10)$$
It follows by (3.5), (3.9), (3.10) and the definition of $K_n$ that
$$\int_{C_n}\frac{1}{K_n}Z_n(g)\,\pi\left(\hat\beta_n + \frac{g}{\sqrt{n}}\right)dg = \exp(o_p(1))\int_{C_n}N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg, \qquad (3.11)$$
where $\hat\Sigma_n = \left(-\frac{\nabla^2 l_n(\hat\beta_n)}{n}\right)^{-1} = \left(\frac{1}{n}\sum_{i=1}^{n}\psi''(z_i^T\hat\beta_n)\,z_i z_i^T\right)^{-1}$. Note that if $U_n\sim N_{p_n}(0,\hat\Sigma_n)$, then
$$\sup_{1\le i\le n}\left|z_i^T\left(\hat\beta_n + \frac{U_n}{\sqrt{n}}\right)\right| > K' \;\Rightarrow\; \sup_{1\le i\le n}|z_i^T U_n| > \sqrt{n}\left(K' - \sup_{1\le i\le n}|z_i^T\hat\beta_n|\right) \;\Rightarrow\; \|U_n\| > \sqrt{\frac{n}{p_n M^2}}\left(K' - \sup_{1\le i\le n}|z_i^T\hat\beta_n|\right) \quad \text{(by (A-0))}. \qquad (3.12)$$
By the strict convexity of $\psi$, Lemma 1 and (A-0), it follows that
$$E_{N_{p_n}(0,\hat\Sigma_n)}\left[\|U_n\|^2\right] = \mathrm{trace}(\hat\Sigma_n) = O_p(p_n). \qquad (3.13)$$
Also by (2.10), it follows that
$$\frac{1}{K' - \sup_{1\le i\le n}|z_i^T\hat\beta_n|} = O_p(1). \qquad (3.14)$$
Note that $C_n^c = G_n^c \cup \{g : \|g\| > p_n^{1/2+\epsilon'}\}$. A simple application of Markov's inequality, along with (3.12)–(3.14), yields
$$\int_{C_n^c}N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg \le \int_{G_n^c}N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg + \int_{\{g:\|g\|\ge p_n^{1/2+\epsilon'}\}}N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg \le \frac{E\|U_n\|^2}{\frac{n}{p_n M^2}\left(K' - \sup_{1\le i\le n}|z_i^T\hat\beta_n|\right)^2} + \frac{E\|U_n\|^2}{p_n^{1+2\epsilon'}} = O_p\left(\frac{p_n^2}{n}\right) + O_p\left(p_n^{-2\epsilon'}\right) = o_p(1). \qquad (3.15)$$
It follows by (3.11) and (3.15) that
$$\int_{C_n}\frac{1}{K_n}Z_n(g)\,\pi\left(\hat\beta_n + \frac{g}{\sqrt{n}}\right)dg \stackrel{P}{\to} 1$$
as $n\to\infty$. □
Lemma 4.
$$\int_{G_n\setminus C_n}\pi^*(g\mid X)\,dg = o_p(1). \qquad (3.16)$$
Proof. Let $U_n\sim N_{p_n}(\mu_n,\Sigma_n)$, where $\mu_n$ and $\Sigma_n$ are as defined in the statement of Lemma 2. Note that
$$\|\mu_n\| \le \left\|\sqrt{n}\,B_n^{-1}\sum_{i=1}^{n}\left(X_i - \psi'(z_i^T\beta_0)\right)z_i\right\| + O_P(\sqrt{p_n}). \qquad (3.17)$$
Since $B_n^{-1} = \frac{1}{n}\Sigma_n$ and $\|\Sigma_n\| = O(1)$, it follows that
$$E\left\|\sqrt{n}\,B_n^{-1}\sum_{i=1}^{n}\left(X_i - \psi'(z_i^T\beta_0)\right)z_i\right\|^2 = O\left(E\left\|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left(X_i - \psi'(z_i^T\beta_0)\right)z_i\right\|^2\right)$$
$$= O\left(E\left[\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \psi'(z_i^T\beta_0)\right)^2 z_i^T z_i\right]\right) \quad (\because\ X_i\text{'s are independent})$$
$$= O\left(\frac{1}{n}\sum_{i=1}^{n}\psi''(z_i^T\beta_0)\,z_i^T z_i\right) \le O\left(\frac{1}{n}\max_{1\le i\le n}\psi''(z_i^T\beta_0)\sum_{i=1}^{n}z_i^T z_i\right) = O\left(\frac{1}{n}\sum_{i=1}^{n}z_i^T z_i\right) \quad (\because\ \text{(A-1) and continuity of }\psi'')$$
$$= O(p_n) \quad (\because\ \text{(A-0)}).$$
It follows by (3.17) that
$$\|\mu_n\| = O_p(\sqrt{p_n}). \qquad (3.18)$$
Hence
$$E_{N_{p_n}(\mu_n,\Sigma_n)}\|U_n\|^2 = \mathrm{trace}(\Sigma_n) + \|\mu_n\|^2 = O_p(p_n).$$
By exactly the same argument as the one leading to Eq. (3.15) in the proof of Lemma 3, it follows that $\int_{C_n^c}N_{p_n}(g\mid\mu_n,\Sigma_n)\,dg = o_p(1)$. The result now follows by using Lemma 2. □
Lemma 5.
$$\int_{G_n}\frac{1}{K_n}Z_n(g)\,\pi\left(\hat\beta_n + \frac{g}{\sqrt{n}}\right)dg \stackrel{P}{\to} 1.$$
Proof. Note that by Lemma 4,
$$\int_{G_n\setminus C_n}\pi^*(g\mid X)\,dg = \frac{\int_{G_n\setminus C_n}\frac{1}{K_n}Z_n(g)\,\pi\left(\hat\beta_n + \frac{g}{\sqrt{n}}\right)dg}{\int_{G_n}\frac{1}{K_n}Z_n(g)\,\pi\left(\hat\beta_n + \frac{g}{\sqrt{n}}\right)dg} \to 0.$$
Hence,
$$\frac{\int_{G_n\setminus C_n}\frac{1}{K_n}Z_n(g)\,\pi\left(\hat\beta_n + \frac{g}{\sqrt{n}}\right)dg}{\int_{C_n}\frac{1}{K_n}Z_n(g)\,\pi\left(\hat\beta_n + \frac{g}{\sqrt{n}}\right)dg + \int_{G_n\setminus C_n}\frac{1}{K_n}Z_n(g)\,\pi\left(\hat\beta_n + \frac{g}{\sqrt{n}}\right)dg} \to 0. \qquad (3.19)$$
Now, by Lemma 3,
$$\int_{C_n}\frac{1}{K_n}Z_n(g)\,\pi\left(\hat\beta_n + \frac{g}{\sqrt{n}}\right)dg \stackrel{P}{\to} 1.$$
The result follows by (3.19). □
We now state and prove the main result of the paper.

Theorem 1. Suppose $\beta\in\mathbb{R}^{p_n}$ satisfies $\sqrt{n}\,\|\beta - \hat\beta_n\| \le p_n^{1/2+\epsilon/6}$ for every $n$; this is equivalent to the assumption that $g\in C_n$. In such a case, under assumptions (A-0)–(A-3),
$$\pi^*(g\mid X) = N_{p_n}(g\mid 0,\hat\Sigma_n)\Bigg[1 - \frac{1}{6n^{3/2}}\sum_{r,s,t=1}^{p_n}\sum_{i=1}^{n}\psi'''(z_i^T\hat\beta_n)\,g_r g_s g_t\,z_{ir}z_{is}z_{it} + \frac{1}{\sqrt{n}}\sum_{v=1}^{p_n}g_v\left(\nabla\log\pi(\hat\beta_n)\right)_v - \left(\frac{1}{6n^{3/2}}\sum_{r,s,t=1}^{p_n}\sum_{i=1}^{n}\psi'''(z_i^T\hat\beta_n)\,g_r g_s g_t\,z_{ir}z_{is}z_{it}\right)\left(\frac{1}{\sqrt{n}}\sum_{v=1}^{p_n}g_v\left(\nabla\log\pi(\hat\beta_n)\right)_v\right) + R(g)\Bigg]\times\left(1 - o_p(1)\right), \qquad (3.20)$$
where $\sup_{g\in C_n}R(g) = O_p\left(\frac{p_n^{6+\epsilon}}{n}\right)$ and $N_{p_n}(g\mid 0,\hat\Sigma_n)$ is a $p_n$-dimensional normal density with mean vector $0$ and covariance matrix $\hat\Sigma_n = \left(-\frac{\nabla^2 l_n(\hat\beta_n)}{n}\right)^{-1}$, evaluated at $g$.

Remark 2. Note that by Lemma 4, the posterior probability that $g$ does not lie in $C_n$ converges to 0.
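Theorem 1 lends itself to a direct numerical check in low dimensions. The sketch below (our own illustration, with $p_n = 1$, a logistic likelihood, and a flat prior on a compact interval, so that the $\nabla\log\pi$ terms vanish) compares the exact posterior density of $g$, obtained by numerical integration, with the plain normal approximation and with the third order corrected density; the corrected density should track the exact posterior more closely:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
z = rng.uniform(-1.0, 1.0, size=n)                 # scalar covariates: p_n = 1
beta0 = 0.5
X = rng.binomial(1, 1.0 / (1.0 + np.exp(-z * beta0))).astype(float)

def log_lik(b):
    theta = np.multiply.outer(b, z)                # works for scalar or vector b
    return np.sum(X * theta - np.logaddexp(0.0, theta), axis=-1)

bgrid = np.linspace(-2.0, 3.0, 20001)              # MLE via a fine grid (fine for p_n = 1)
beta_hat = bgrid[np.argmax(log_lik(bgrid))]

g = np.linspace(-4.0, 4.0, 801)                    # g = sqrt(n) (beta - beta_hat)
lZ = log_lik(beta_hat + g / np.sqrt(n)) - log_lik(beta_hat)
exact = np.exp(lZ) / np.trapz(np.exp(lZ), g)       # exact posterior of g under a flat prior

mu = 1.0 / (1.0 + np.exp(-beta_hat * z))
w2 = mu * (1.0 - mu)                               # psi''(z_i beta_hat)
w3 = w2 * (1.0 - 2.0 * mu)                         # psi'''(z_i beta_hat), logistic case
sigma2 = 1.0 / np.mean(w2 * z**2)                  # hat Sigma_n (a scalar here)
normal = np.exp(-g**2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)
third = normal * (1.0 - np.sum(w3 * z**3) / (6.0 * n**1.5) * g**3)

print(np.max(np.abs(exact - normal)), np.max(np.abs(exact - third)))
```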
Proof. Since $\nabla l_n(\hat\beta_n) = 0$, by a fourth order Taylor series expansion around $\hat\beta_n$, we have
$$l_n\left(\hat\beta_n + \frac{g}{\sqrt{n}}\right) - l_n(\hat\beta_n) = \frac{1}{2n}\,g^T\nabla^2 l_n(\hat\beta_n)\,g + \frac{1}{6n^{3/2}}\sum_{r,s,t=1}^{p_n}g_r g_s g_t\,\frac{\partial^3 l_n(\beta)}{\partial\beta_r\partial\beta_s\partial\beta_t}\Bigg|_{\beta=\hat\beta_n} + \frac{1}{24n^2}\sum_{r,s,t,u=1}^{p_n}g_r g_s g_t g_u\,\frac{\partial^4 l_n(\beta)}{\partial\beta_r\partial\beta_s\partial\beta_t\partial\beta_u}\Bigg|_{\beta=\beta_n^*} = A_1(g) + A_2(g) + A_3(g)\ \text{(say)}. \qquad (3.21)$$
Here $\beta_n^* = \beta_n^*(g)$ is an intermediate point on the line segment joining $\hat\beta_n$ and $\hat\beta_n + \frac{g}{\sqrt{n}}$. Based on exactly the same argument leading up to (3.6) (in the proof of Lemma 3),
$$P\left(\beta_n^*(g)\in\Theta_n\ \forall\,g\in G_n\right) \to 1, \qquad (3.22)$$
as $n\to\infty$. Also,
$$\pi\left(\hat\beta_n + \frac{g}{\sqrt{n}}\right) = \pi(\hat\beta_n) + \frac{1}{\sqrt{n}}\,g^T\nabla\pi(\hat\beta_n) + \frac{1}{2n}\,g^T\nabla^2\pi(\beta_n^{**})\,g = \pi(\hat\beta_n)\left[1 + \frac{1}{\sqrt{n}}\sum_{v=1}^{p_n}g_v\left(\nabla\log\pi(\hat\beta_n)\right)_v + \frac{1}{2n}\,\frac{g^T\nabla^2\pi(\beta_n^{**})\,g}{\pi(\hat\beta_n)}\right] = \pi(\hat\beta_n)\left(1 + B_1(g) + B_2(g)\right)\ \text{(say)}, \qquad (3.23)$$
where $\beta_n^{**} = \beta_n^{**}(g)$ is an intermediate point on the line segment joining $\hat\beta_n$ and $\hat\beta_n + \frac{g}{\sqrt{n}}$. Based on exactly the same argument leading up to (3.6) (in the proof of Lemma 3),
$$P\left(\beta_n^{**}(g)\in\Theta_n\ \forall\,g\in G_n\right) \to 1, \qquad (3.24)$$
as $n\to\infty$.
We now analyze the various terms in (3.21) and (3.23). By the continuity of $\psi'''$ and the fact that $\hat\beta_n\in\Theta_n$ with probability tending to 1, it follows that
$$\max_{1\le i\le n}|\psi'''(z_i^T\hat\beta_n)| = O_p(1). \qquad (3.25)$$
Hence, for $g\in\mathbb{R}^{p_n}$,
$$|A_2(g)| = \left|\frac{1}{6n^{3/2}}\sum_{r,s,t=1}^{p_n}g_r g_s g_t\,\frac{\partial^3 l_n(\beta)}{\partial\beta_r\partial\beta_s\partial\beta_t}\Bigg|_{\beta=\hat\beta_n}\right| = \left|\frac{1}{6n^{3/2}}\sum_{r,s,t=1}^{p_n}\sum_{i=1}^{n}\psi'''(z_i^T\hat\beta_n)\,z_{ir}z_{is}z_{it}\,g_r g_s g_t\right| \le \frac{1}{6n^{3/2}}\max_{1\le i\le n}\left|\psi'''(z_i^T\hat\beta_n)\right|\sum_{i=1}^{n}|z_i^T g|^3. \qquad (3.26)$$
In particular, we get that for $g\in C_n$,
$$|A_2(g)| \le \frac{1}{6n^{3/2}}\max_{1\le i\le n}\left|\psi'''(z_i^T\hat\beta_n)\right|\sum_{i=1}^{n}\left(\|g\|\,\|z_i\|\right)^3 \quad (\because\ \text{Cauchy–Schwarz}) = \frac{1}{6n^{3/2}}\max_{1\le i\le n}\left|\psi'''(z_i^T\hat\beta_n)\right|\,\|g\|^3\sum_{i=1}^{n}\|z_i\|^3 \le \max_{1\le i\le n}\left|\psi'''(z_i^T\hat\beta_n)\right|\,\frac{\left(p_n^{1/2+\epsilon/6}\right)^3 n\left(M\sqrt{p_n}\right)^3}{6n^{3/2}}. \qquad (3.27)$$
The last inequality follows by using (A-0). It follows by (3.27) that
$$\sup_{g\in C_n}|A_2(g)| = O_p\left(\frac{p_n^{3+\epsilon/2}}{\sqrt{n}}\right). \qquad (3.28)$$
By the continuity of $\psi''''$ and (3.22), it follows that
$$\sup_{g\in C_n}\max_{1\le i\le n}|\psi''''(z_i^T\beta_n^*)| = O_p(1). \qquad (3.29)$$
Hence, for $g\in G_n$,
$$|A_3(g)| = \left|\frac{1}{24n^2}\sum_{r,s,t,u=1}^{p_n}g_r g_s g_t g_u\,\frac{\partial^4 l_n(\beta)}{\partial\beta_r\partial\beta_s\partial\beta_t\partial\beta_u}\Bigg|_{\beta=\beta_n^*}\right| = \left|\frac{1}{24n^2}\sum_{r,s,t,u=1}^{p_n}\sum_{i=1}^{n}\psi''''(z_i^T\beta_n^*)\,z_{ir}z_{is}z_{it}z_{iu}\,g_r g_s g_t g_u\right| \le \frac{1}{24n^2}\max_{1\le i\le n}\left|\psi''''(z_i^T\beta_n^*)\right|\sum_{i=1}^{n}|z_i^T g|^4. \qquad (3.30)$$
In particular, for $g\in C_n$, we get that
$$|A_3(g)| \le \frac{1}{24n^2}\max_{1\le i\le n}\left|\psi''''(z_i^T\beta_n^*)\right|\sum_{i=1}^{n}\left(\|g\|\,\|z_i\|\right)^4 = \frac{1}{24n^2}\max_{1\le i\le n}\left|\psi''''(z_i^T\beta_n^*)\right|\,\|g\|^4\sum_{i=1}^{n}\|z_i\|^4 \le \max_{1\le i\le n}\left|\psi''''(z_i^T\beta_n^*)\right|\,\frac{\left(p_n^{1/2+\epsilon/6}\right)^4 n\left(M\sqrt{p_n}\right)^4}{24n^2}. \qquad (3.31)$$
The last inequality follows by using (A-0). It follows by (3.31) that
$$\sup_{g\in C_n}|A_3(g)| = O_p\left(\frac{p_n^{4+2\epsilon/3}}{n}\right). \qquad (3.32)$$
Next we analyze the second order remainder term in (3.23). Note that $\|\hat\beta_n - \beta_0\| = O_p\left(\sqrt{p_n/n}\right)$ by Lemma 1, and $\sup_{g\in C_n}\|\beta_n^{**}(g) - \hat\beta_n\| = O_p\left(\frac{p_n^{1/2+\epsilon'}}{\sqrt{n}}\right)$, as $\beta_n^{**}(g)$ is an intermediate point on the line segment joining $\hat\beta_n$ and $\hat\beta_n + \frac{g}{\sqrt{n}}$. Hence, by the triangle inequality, we get that
$$\sup_{g\in C_n}\|\beta_n^{**}(g) - \beta_0\| = O_p\left(\frac{p_n^{1/2+\epsilon'}}{\sqrt{n}}\right). \qquad (3.33)$$
By (A-3), we get that $\frac{p_n^{1/2+\epsilon'}}{\sqrt{n}} = o\left(\sqrt[4]{p_n/n}\right)$. By (A-2), it follows that
$$\sup_{g\in C_n}\max_{1\le r,s\le p_n}\left|\frac{1}{\pi(\beta)}\frac{\partial^2\pi(\beta)}{\partial\beta_r\partial\beta_s}\Bigg|_{\beta=\beta_n^{**}(g)}\right| = O_p\left(p_n^{5/2}\right). \qquad (3.34)$$
Note that
$$\frac{\pi(\beta_n^{**})}{\pi(\hat\beta_n)} = \exp\left[\log\pi(\beta_n^{**}) - \log\pi(\hat\beta_n)\right] = \exp\left[\left(\nabla\log\pi(\beta_n^{***})\right)^T\left(\beta_n^{**} - \hat\beta_n\right)\right],$$
where $\beta_n^{***} = \beta_n^{***}(g)$ is an intermediate point on the line segment joining $\beta_n^{**}$ and $\hat\beta_n$. Hence,
$$\sup_{g\in C_n}\|\beta_n^{***} - \hat\beta_n\| \le \sup_{g\in C_n}\|\beta_n^{**} - \hat\beta_n\| = O_p\left(\frac{p_n^{1/2+\epsilon'}}{\sqrt{n}}\right).$$
By (A-3), $\frac{p_n^{1/2+\epsilon'}}{\sqrt{n}} = o\left(\sqrt[4]{p_n/n}\right)$. Hence, by Lemma 1 and (A-2), it follows that
$$\sup_{g\in C_n}\|\nabla\log\pi(\beta_n^{***})\| = O_p(p_n^{3/2}).$$
Hence,
$$\sup_{g\in C_n}\left|\frac{\pi(\beta_n^{**})}{\pi(\hat\beta_n)}\right| \le \sup_{g\in C_n}\exp\left[\|\nabla\log\pi(\beta_n^{***})\|\,\|\beta_n^{**} - \hat\beta_n\|\right] \le \exp\left[O_p(p_n^{3/2})\,O_p\left(\frac{p_n^{1/2+\epsilon'}}{\sqrt{n}}\right)\right] = O_p(1). \qquad (3.35)$$
It follows that
$$|B_2(g)| = \frac{\left|\frac{1}{2n}\,g^T\nabla^2\pi(\beta_n^{**})\,g\right|}{\pi(\hat\beta_n)} = \left|\frac{\pi(\beta_n^{**})}{\pi(\hat\beta_n)}\right|\,\left|\frac{1}{2n}\sum_{r,s=1}^{p_n}\left(\frac{1}{\pi(\beta)}\frac{\partial^2\pi(\beta)}{\partial\beta_r\partial\beta_s}\Bigg|_{\beta=\beta_n^{**}}\right)g_r g_s\right| \le \left|\frac{\pi(\beta_n^{**})}{\pi(\hat\beta_n)}\right|\,\frac{1}{2n}\sum_{r,s=1}^{p_n}\left|\frac{1}{\pi(\beta)}\frac{\partial^2\pi(\beta)}{\partial\beta_r\partial\beta_s}\Bigg|_{\beta=\beta_n^{**}}\right||g_r|\,|g_s| \le \left|\frac{\pi(\beta_n^{**})}{\pi(\hat\beta_n)}\right|\left[\frac{1}{2n}\max_{1\le r,s\le p_n}\left|\frac{1}{\pi(\beta)}\frac{\partial^2\pi(\beta)}{\partial\beta_r\partial\beta_s}\right|_{\beta=\beta_n^{**}}\right]p_n\,\|g\|^2. \qquad (3.36)$$
It follows by (3.34)–(3.36) that
$$\sup_{g\in C_n}|B_2(g)| = O_p\left(\frac{p_n^{9/2+\epsilon/3}}{n}\right). \qquad (3.37)$$
Note that by (3.1), (3.21) and (3.23), $\pi^*(g\mid X) = N/D$, where
$$N = \frac{\pi(\hat\beta_n)\left(1 + B_1(g) + B_2(g)\right)\exp\left(A_1(g) + A_2(g) + A_3(g)\right)}{\pi(\hat\beta_n)\,(2\pi)^{p_n/2}\left|-\frac{\nabla^2 l_n(\hat\beta_n)}{n}\right|^{-1/2}} = N_{p_n}(g\mid 0,\hat\Sigma_n)\left(1 + B_1(g) + B_2(g)\right)\exp\left(A_2(g) + A_3(g)\right)$$
$$= N_{p_n}(g\mid 0,\hat\Sigma_n)\left\{(1 + B_1(g))(1 + A_2(g)) + B_2(g)\left(1 + A_2(g) + A_3(g)\right) + (1 + B_1(g))A_3(g)\right\} + N_{p_n}(g\mid 0,\hat\Sigma_n)\left\{\left(1 + B_1(g) + B_2(g)\right)\left(\exp\left(A_2(g) + A_3(g)\right) - \left(1 + A_2(g) + A_3(g)\right)\right)\right\}$$
$$= N_{p_n}(g\mid 0,\hat\Sigma_n)\left(N_1(g) + N_2(g) + N_3(g) + N_4(g)\right)\ \text{(say)}, \qquad (3.38)$$
and
$$D = \int N(g)\,dg. \qquad (3.39)$$
Now, from (3.28), (3.32) and (3.37), it follows that
$$\sup_{g\in C_n}|N_2(g)| = \sup_{g\in C_n}\left|B_2(g)\left(1 + A_2(g) + A_3(g)\right)\right| = O_p\left(\frac{p_n^{9/2+\epsilon/3}}{n}\right). \qquad (3.40)$$
In view of Lemma 1 and (A-2),
$$\sup_{g\in C_n}|1 + B_1(g)| \le 1 + \sup_{g\in C_n}\frac{1}{\sqrt{n}}\sum_{v=1}^{p_n}\left|g_v\left(\nabla\log\pi(\hat\beta_n)\right)_v\right| \le 1 + \sup_{g\in C_n}\frac{1}{\sqrt{n}}\|g\|\,\|\nabla\log\pi(\hat\beta_n)\| = 1 + O_p\left(\frac{p_n^{2+\epsilon'}}{\sqrt{n}}\right) = 1 + o_p(1). \qquad (3.41)$$
By (3.32) and (3.41), it follows that
$$\sup_{g\in C_n}|N_3(g)| = \sup_{g\in C_n}\left|(1 + B_1(g))\,A_3(g)\right| = O_p\left(\frac{p_n^{4+2\epsilon/3}}{n}\right). \qquad (3.42)$$
By (3.28) and (3.32),
$$\sup_{g\in C_n}|A_2(g) + A_3(g)| \le \sup_{g\in C_n}|A_2(g)| + \sup_{g\in C_n}|A_3(g)| = O_p\left(\frac{p_n^{3+\epsilon/2}}{\sqrt{n}}\right).$$
It follows by (A-3) that for large enough $n$,
$$\sup_{g\in C_n}\left|\exp\left(A_2(g) + A_3(g)\right) - \left(1 + A_2(g) + A_3(g)\right)\right| \le \sup_{g\in C_n}\left(A_2(g) + A_3(g)\right)^2 = O_p\left(\frac{p_n^{6+\epsilon}}{n}\right). \qquad (3.43)$$
It follows from (3.37), (3.41) and (3.43) that
$$\sup_{g\in C_n}|N_4(g)| = \sup_{g\in C_n}\left|\left(1 + B_1(g) + B_2(g)\right)\left(\exp\left(A_2(g) + A_3(g)\right) - \left(1 + A_2(g) + A_3(g)\right)\right)\right| = O_p\left(\frac{p_n^{6+\epsilon}}{n}\right). \qquad (3.44)$$
Let
$$R(g) := N_2(g) + N_3(g) + N_4(g). \qquad (3.45)$$
It follows from (3.40), (3.42) and (3.44) that
$$\sup_{g\in C_n}\left|N_2(g) + N_3(g) + N_4(g)\right| = O_p\left(\frac{p_n^{6+\epsilon}}{n}\right). \qquad (3.46)$$
By (3.39) and Lemma 5,
$$D = \int_{G_n}\frac{1}{K_n}Z_n(g)\,\pi\left(\hat\beta_n + \frac{g}{\sqrt{n}}\right)dg = 1 + o_p(1).$$
Thus,
$$\pi^*(g\mid X) = N/D = N_{p_n}(g\mid 0,\hat\Sigma_n)\left[\left(1 + \frac{1}{6n^{3/2}}\sum_{r,s,t=1}^{p_n}g_r g_s g_t\,\frac{\partial^3 l_n(\beta)}{\partial\beta_r\partial\beta_s\partial\beta_t}\Bigg|_{\beta=\hat\beta_n}\right)\left(1 + \frac{1}{\sqrt{n}}\sum_{v=1}^{p_n}g_v\left(\nabla\log\pi(\hat\beta_n)\right)_v\right) + R(g)\right]\left(1 - o_p(1)\right)$$
$$= N_{p_n}(g\mid 0,\hat\Sigma_n)\left[1 + \frac{1}{6n^{3/2}}\sum_{r,s,t=1}^{p_n}g_r g_s g_t\,\frac{\partial^3 l_n(\beta)}{\partial\beta_r\partial\beta_s\partial\beta_t}\Bigg|_{\beta=\hat\beta_n} + \frac{1}{\sqrt{n}}\sum_{v=1}^{p_n}g_v\left(\nabla\log\pi(\hat\beta_n)\right)_v + \left(\frac{1}{6n^{3/2}}\sum_{r,s,t=1}^{p_n}g_r g_s g_t\,\frac{\partial^3 l_n(\beta)}{\partial\beta_r\partial\beta_s\partial\beta_t}\Bigg|_{\beta=\hat\beta_n}\right)\left(\frac{1}{\sqrt{n}}\sum_{v=1}^{p_n}g_v\left(\nabla\log\pi(\hat\beta_n)\right)_v\right) + R(g)\right]\left(1 - o_p(1)\right)$$
$$= N_{p_n}(g\mid 0,\hat\Sigma_n)\left[1 - \frac{1}{6n^{3/2}}\sum_{r,s,t=1}^{p_n}\sum_{i=1}^{n}\psi'''(z_i^T\hat\beta_n)\,g_r g_s g_t\,z_{ir}z_{is}z_{it} + \frac{1}{\sqrt{n}}\sum_{v=1}^{p_n}g_v\left(\nabla\log\pi(\hat\beta_n)\right)_v - \left(\frac{1}{6n^{3/2}}\sum_{r,s,t=1}^{p_n}\sum_{i=1}^{n}\psi'''(z_i^T\hat\beta_n)\,g_r g_s g_t\,z_{ir}z_{is}z_{it}\right)\left(\frac{1}{\sqrt{n}}\sum_{v=1}^{p_n}g_v\left(\nabla\log\pi(\hat\beta_n)\right)_v\right) + R(g)\right]\left(1 - o_p(1)\right),$$
where $\sup_{g\in C_n}R(g) = O_p\left(\frac{p_n^{6+\epsilon}}{n}\right)$. □
Remark 3. Note that we have extended the first order results in [6] to a third order correct posterior expansion by requiring stronger growth restrictions on $p_n$. A natural question that arises is whether one can obtain a second order correct expansion under weaker restrictions on the growth of $p_n$. However, we have not considered second order correct expansions for two reasons. Firstly, the derivation of a moment matching prior, which is the application that we consider in Section 4, requires a third order correct asymptotic expansion. Secondly, the proof of Lemma 3 in the paper uses the assumption that $p_n^{6+\epsilon}/n \to 0$ (see (3.9)). We would still need Lemma 3 to establish a second order correct posterior expansion. Therefore, establishing a second order correct expansion would still require the same growth restriction (assuming the other conditions in (A-0), (A-1), (A-2) and (A-3) are left unchanged).
Remark 4. Bontemps [3] establishes posterior consistency under Gaussianity, by relaxing the restrictions in [6,7] in several
ways. However, the arguments in the proof of Bontemps’ results (in particular Theorems 1 and 2 in Bontemps’ paper) rely
heavily on Gaussianity. We have made efforts to adapt them for other models, but have not been successful so far.
3.1. Posterior expansion for the uniformly bounded case
We can consider the case when $p_n$ is uniformly bounded, and obtain an expansion of the posterior density parallel to (3.20). The fact that $p_n$ is uniformly bounded allows a slightly finer analysis of the terms in the expansion, which is useful when deriving moment matching priors in Section 4. Firstly, we note that Lemmas 3–5 can be established by the same set of arguments, using, for example, $C_n = \{g : \|g\| < n^{1/(6+\epsilon)}\}$ instead of $C_n = \{g : \|g\| < p_n^{1/2+\epsilon/6}\}$. Henceforth, in this subsection, it will be assumed that $C_n = \{g : \|g\| < n^{1/(6+\epsilon)}\}$. It can be easily seen by repeating appropriate steps in the proof of Theorem 1 that in this case
$$|A_2(g)| = O_p\left(\frac{\|g\|^3}{\sqrt{n}}\right) \qquad (3.47)$$
for every $g\in\mathbb{R}^{p_n}$, and
$$|A_3(g)| = O_p\left(\frac{\|g\|^4}{n}\right) \qquad (3.48)$$
for every $g\in G_n$. To clarify notation, (3.47) means that $|A_2(g)|$ is $\|g\|^3$ times a quantity which is independent of $g$ and is $O_p\left(\frac{1}{\sqrt{n}}\right)$.
Since $\Theta_n$ is a compact set, it follows by (3.24) and the twice continuous differentiability of $\pi(\cdot)$ that each entry of $\nabla\pi(\hat\beta_n)$ and $\nabla^2\pi(\beta_n^{**})$ is bounded above in probability. Also, by (A-2), it follows that $\pi(\hat\beta_n)$ is bounded below in probability. Combining (3.23) with the above facts gives us
$$|B_1(g)| = O_p\left(\frac{\|g\|}{\sqrt{n}}\right) \qquad (3.49)$$
for every $g\in\mathbb{R}^{p_n}$, and
$$|B_2(g)| = O_p\left(\frac{\|g\|^2}{n}\right) \qquad (3.50)$$
for every $g\in G_n$. It follows that
$$\sup_{g\in C_n}|B_1(g)| = O_p\left(n^{-\frac{4+\epsilon}{12+2\epsilon}}\right), \quad \sup_{g\in C_n}|B_2(g)| = O_p\left(n^{-\frac{4+\epsilon}{6+\epsilon}}\right), \quad \sup_{g\in C_n}|A_2(g)| = O_p\left(n^{-\frac{\epsilon}{12+2\epsilon}}\right), \quad \sup_{g\in C_n}|A_3(g)| = O_p\left(n^{-\frac{2+\epsilon}{6+\epsilon}}\right). \qquad (3.51)$$
It follows by (3.38) (along with the arguments following it, adjusted for the fact that $p_n$ is uniformly bounded and for the new choice of $C_n$) and (3.51) that for every $g\in C_n$,
$$\pi^*(g\mid X) = N_{p_n}(g\mid 0,\hat\Sigma_n)\left[1 - \frac{1}{6n^{3/2}}\sum_{r,s,t=1}^{p_n}\sum_{i=1}^{n}\psi'''(z_i^T\hat\beta_n)\,g_r g_s g_t\,z_{ir}z_{is}z_{it} + \frac{1}{\sqrt{n}}\sum_{v=1}^{p_n}g_v\left(\nabla\log\pi(\hat\beta_n)\right)_v - \left(\frac{1}{6n^{3/2}}\sum_{r,s,t=1}^{p_n}\sum_{i=1}^{n}\psi'''(z_i^T\hat\beta_n)\,g_r g_s g_t\,z_{ir}z_{is}z_{it}\right)\left(\frac{1}{\sqrt{n}}\sum_{v=1}^{p_n}g_v\left(\nabla\log\pi(\hat\beta_n)\right)_v\right) + R(g)\right]\times\left(1 - o_p(1)\right), \qquad (3.52)$$
and
$$|R(g)| = O_p\left(\frac{\|g\|^6}{n}\right). \qquad (3.53)$$
Note that (3.52) is identical to (3.20). However, the order of the remainder term differs between the two settings. In the setting of (3.20), we have $\sup_{g\in C_n}R(g) = O_p\left(\frac{p_n^{6+\epsilon}}{n}\right)$; in the current setting,
$$\sup_{g\in C_n}|R(g)| = O_p\left(n^{-\frac{\epsilon}{6+\epsilon}}\right). \qquad (3.54)$$
Note that even in this case, the posterior probability of the set $C_n$ converges to 1 as $n\to\infty$. We conclude this section by noting that if $g$ is fixed (or $\|g\|$ is uniformly bounded as $n\to\infty$), then the order of the leading terms $B_1(g)$ and $A_2(g)$ is $\frac{1}{\sqrt{n}}$ (as can be seen from (3.47) and (3.49)), while the order of the remainder term $R(g)$ is $\frac{1}{n}$ (as can be seen from (3.53)). However, if one is looking for bounds uniformly over $g\in C_n$, then the orders can be obtained from (3.51) and (3.54).
4. Moment matching prior
A moment matching prior (introduced by Ghosh and Liu [12]) is an objective prior for which the posterior mean matches
with the maximum likelihood estimator up to a high order of approximation. Ghosh and Liu [12] provide several examples
where they derive a moment matching prior using third order correct posterior expansions. In particular, they consider the
case with i.i.d. observations from a multi-parameter natural exponential family (with fixed p), and prove that the moment
matching prior in this situation can be uniquely determined, and in fact corresponds to Jeffreys’ general rule prior. However,
they did not consider the more complicated GLM setting. In this section we use the expansion in Theorem 1 to obtain
moment matching priors in the context of GLM with canonical link function (both when pn is uniformly bounded, and when
pn is unbounded). We will in fact show that the moment matching prior can be uniquely identified in this situation, and
corresponds to the Jeffreys’ general rule prior. In other words, Jeffreys’ general rule prior is the only prior which satisfies the
moment matching condition in the current GLM setup. We may add here that conditions for the propriety and existence of
moments for Jeffreys’ prior in the GLM setup (as well as the resulting posterior) have been addressed in [15].
The analysis in the current setup will be based on examining the posterior expectation of the quantity β − ˆβn. Note
that the remainder term in most posterior expansions (including the ones used in [12] and in this paper) is not uniformly
bounded in the variable used in the expansion (for example g in our setup) if we do not restrict to an appropriate set (such as
Cn in our setup). In fact, to show that the expected value (with respect to the posterior distribution) of the remainder term is
appropriately small, one has to restrict the computation of the expected value over a set such as Cn. Ghosh and Liu [12] take
a somewhat heuristic approach in their derivations, and do not take this issue into account. We undertake a more rigorous
approach to address this issue as follows. The computation of the posterior expectation for deriving the moment matching
prior will be restricted to the region Cn on which the appropriate bounds for the remainder term in the expansion are valid.
4.1. The uniformly bounded case
We first consider the case when the number of regressors $p_n$ is uniformly bounded. Note that the expansion in (3.52) holds in this case with $C_n = \{g : \|g\| < n^{1/(6+\epsilon)}\}$. The moment matching criterion of Ghosh and Liu [12] (with the modification discussed above) dictates that the prior $\pi(\cdot)$ be chosen such that the posterior expectation
$$E^{\pi(\cdot\mid X)}\left[(\beta - \hat\beta_n)\,1_{\{\sqrt{n}(\beta-\hat\beta_n)\in C_n\}}\right]$$
converges to zero faster than $\frac{1}{n}$. It follows by the expansion in (3.52) that
$$E^{\pi(\cdot\mid X)}\left[(\beta - \hat\beta_n)\,1_{\{\sqrt{n}(\beta-\hat\beta_n)\in C_n\}}\right] = \frac{1}{\sqrt{n}}\int_{C_n}g\,\pi^*(g\mid X)\,dg = \frac{1-o_p(1)}{\sqrt{n}}\int_{C_n}g\,N_{p_n}(g\mid 0,\hat\Sigma_n)\left[1 + B_1(g) + A_2(g) + B_1(g)A_2(g) + R(g)\right]dg. \qquad (4.1)$$
By (A-0) and (2.10), it follows that the eigenvalues of $\hat\Sigma_n$ are uniformly bounded above (with probability tending to 1). Since $p_n$ is uniformly bounded, for any subset $S$ of $\mathbb{R}^{p_n}$, we get
$$\int_S\|g\|^k\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg \le \int_{\mathbb{R}^{p_n}}\|g\|^k\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p(p_n^{k/2}) = O_p(1) \qquad (4.2)$$
for every fixed $k\in\mathbb{N}$. It follows by (3.47), (3.49) and (3.53) that
$$\int_{C_n}\|g\|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\left|B_1(g)A_2(g) + R(g)\right|dg = O_p\left(\frac{1}{n}\right). \qquad (4.3)$$
Recall that $C_n^c$ refers to $\mathbb{R}^{p_n}\setminus C_n$. A simple application of Markov's inequality, along with (3.12), (3.14), (4.2) and the uniform boundedness of $p_n$, yields
$$\int_{C_n^c}\|g\|^k\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg \le \int_{G_n^c}\|g\|^k\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg + \int_{\{g:\|g\|>n^{1/(6+\epsilon)}\}}\|g\|^k\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg \le \left[O_p\left(\frac{1}{n}\right) + \frac{1}{n^{2/(6+\epsilon)}}\right]\int_{\mathbb{R}^{p_n}}\|g\|^{k+2}\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\left(n^{-\frac{2}{6+\epsilon}}\right) \qquad (4.4)$$
for every fixed $k\in\mathbb{N}$. It follows by (3.47) and (3.49) that
$$\int_{C_n^c}\|g\|\,|A_2(g)|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\left(n^{-\frac{1}{2}-\frac{2}{6+\epsilon}}\right), \qquad (4.5)$$
and
$$\int_{C_n^c}\|g\|\,|B_1(g)|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\left(n^{-\frac{1}{2}-\frac{2}{6+\epsilon}}\right). \qquad (4.6)$$
Note that
$$\int_{C_n}g\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg + \int_{C_n^c}g\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = \int_{\mathbb{R}^{p_n}}g\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = 0. \qquad (4.7)$$
Another application of Markov's inequality along the lines of (4.4) (but increasing the moment by 6 instead of 2) gives
$$\int_{C_n^c}\|g\|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg \le \left[O_p\left(\frac{1}{n^3}\right) + \frac{1}{n^{6/(6+\epsilon)}}\right]\int_{\mathbb{R}^{p_n}}\|g\|^7\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\left(n^{-\frac{6}{6+\epsilon}}\right). \qquad (4.8)$$
It follows from (4.7) and (4.8) that
$$\int_{C_n}g\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\left(n^{-\frac{6}{6+\epsilon}}\right). \qquad (4.9)$$
Here, when we say that a vector $x$ is $O_p(c_n)$, we mean that $\|x\|$ is $O_p(c_n)$. By (4.1), (4.3), (4.5), (4.6) and (4.9), we get
$$E^{\pi(\cdot\mid X)}\left[(\beta - \hat\beta_n)\,1_{\{\sqrt{n}(\beta-\hat\beta_n)\in C_n\}}\right] = \frac{1-o_p(1)}{\sqrt{n}}\int_{C_n}g\,N_{p_n}(g\mid 0,\hat\Sigma_n)\left[1 + B_1(g) + A_2(g)\right]dg + O_p\left(n^{-\frac{3}{2}}\right) = \frac{1-o_p(1)}{\sqrt{n}}\int_{\mathbb{R}^{p_n}}g\,N_{p_n}(g\mid 0,\hat\Sigma_n)\left[B_1(g) + A_2(g)\right]dg + O_p\left(n^{-1-\frac{2}{6+\epsilon}}\right). \qquad (4.10)$$
We now simplify the integral in (4.10). Note that
$$\int_{\mathbb{R}^{p_n}}g\,B_1(g)\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = \frac{1}{\sqrt{n}}\left[\int_{\mathbb{R}^{p_n}}g g^T\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg\right]\nabla\log\pi(\hat\beta_n) = \frac{1}{\sqrt{n}}\,\hat\Sigma_n\nabla\log\pi(\hat\beta_n). \qquad (4.11)$$
Note that by Isserlis' formula for joint moments of a multivariate normal distribution, we get that for any $1\le j,r,s,t\le p_n$,
$$\int_{\mathbb{R}^{p_n}}g_j g_r g_s g_t\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = \hat\Sigma_{n,jr}\hat\Sigma_{n,st} + \hat\Sigma_{n,js}\hat\Sigma_{n,rt} + \hat\Sigma_{n,jt}\hat\Sigma_{n,rs}. \qquad (4.12)$$
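Isserlis' formula is easy to confirm by simulation; the sketch below (our own illustration) compares a Monte Carlo fourth moment with the right hand side of (4.12):

```python
import numpy as np

rng = np.random.default_rng(3)
p = 4
A = rng.normal(size=(p, p))
Sigma = A @ A.T + p * np.eye(p)                    # a positive definite covariance
G = rng.multivariate_normal(np.zeros(p), Sigma, size=1_000_000)

j, r, s, t = 0, 1, 2, 3
mc = np.mean(G[:, j] * G[:, r] * G[:, s] * G[:, t])
isserlis = (Sigma[j, r] * Sigma[s, t] + Sigma[j, s] * Sigma[r, t]
            + Sigma[j, t] * Sigma[r, s])
print(mc, isserlis)                                # agree up to Monte Carlo error
```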
Let $I_n(\beta) = \frac{1}{n}\sum_{i=1}^{n}\psi''(z_i^T\beta)\,z_i z_i^T$ denote the information matrix evaluated at $\beta$. It follows by the definition of $A_2(g)$ and (4.12) that
$$\int_{\mathbb{R}^{p_n}}g_j\,A_2(g)\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = -\frac{1}{6\sqrt{n}}\sum_{r,s,t=1}^{p_n}\left(\hat\Sigma_{n,jr}\hat\Sigma_{n,st} + \hat\Sigma_{n,js}\hat\Sigma_{n,rt} + \hat\Sigma_{n,jt}\hat\Sigma_{n,rs}\right)A_{n,r,s,t}, \qquad (4.13)$$
where
$$A_{n,r,s,t} = \frac{1}{n}\sum_{i=1}^{n}\psi'''(z_i^T\hat\beta_n)\,z_{ir}z_{is}z_{it} = \frac{\partial}{\partial\beta_r}\left(I_n(\beta)\right)_{st}\Bigg|_{\beta=\hat\beta_n}. \qquad (4.14)$$
Note that
$$\frac{\partial}{\partial\beta_r}\log|I_n(\beta)| = \sum_{s,t=1}^{p_n}\left(I_n(\beta)^{-1}\right)_{st}\frac{\partial}{\partial\beta_r}\left(I_n(\beta)\right)_{st}. \qquad (4.15)$$
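Identity (4.15) is Jacobi's formula for the derivative of a log determinant; the sketch below (our own illustration, for the logistic model) confirms it against a central finite difference:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 500, 3
Z = rng.uniform(-1.0, 1.0, size=(n, p))

def info(beta):
    """I_n(beta) = (1/n) sum_i psi''(z_i^T beta) z_i z_i^T for the logistic model."""
    mu = 1.0 / (1.0 + np.exp(-(Z @ beta)))
    return (Z * (mu * (1.0 - mu))[:, None]).T @ Z / n

beta = np.array([0.2, -0.1, 0.3])
r, h = 0, 1e-6
e_r = np.eye(p)[r]

# Left side of (4.15): finite-difference derivative of log|I_n(beta)| in beta_r.
lhs = (np.linalg.slogdet(info(beta + h * e_r))[1]
       - np.linalg.slogdet(info(beta - h * e_r))[1]) / (2.0 * h)

# Right side of (4.15), with dI/dbeta_r = (1/n) sum_i psi'''(z_i^T beta) z_ir z_i z_i^T.
mu = 1.0 / (1.0 + np.exp(-(Z @ beta)))
psi3 = mu * (1.0 - mu) * (1.0 - 2.0 * mu)
dI = (Z * (psi3 * Z[:, r])[:, None]).T @ Z / n
rhs = np.trace(np.linalg.solve(info(beta), dI))
print(lhs, rhs)                                    # agree up to O(h^2) discretization error
```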
By the symmetry in $r,s,t$ on the right hand side of (4.13), it follows that
$$\int_{\mathbb{R}^{p_n}}g_j\,A_2(g)\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = -\frac{1}{2\sqrt{n}}\sum_{r,s,t=1}^{p_n}\hat\Sigma_{n,jr}\hat\Sigma_{n,st}\,A_{n,r,s,t}. \qquad (4.16)$$
Combining (4.14)–(4.16) along with the fact that $I_n(\hat\beta_n) = \hat\Sigma_n^{-1}$, we get that
$$\int_{\mathbb{R}^{p_n}}g\,A_2(g)\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = -\frac{1}{2\sqrt{n}}\,\hat\Sigma_n\nabla\log|I_n(\hat\beta_n)|. \qquad (4.17)$$
It follows by (4.10), (4.11) and (4.17) that, to ensure
$$E^{\pi(\cdot\mid X)}\left[(\beta - \hat\beta_n)\,1_{\{\sqrt{n}(\beta-\hat\beta_n)\in C_n\}}\right] = O_p\left(n^{-1-\frac{2}{6+\epsilon}}\right),$$
the prior density $\pi(\cdot)$ should satisfy
$$\hat\Sigma_n\nabla\log\pi(\hat\beta_n) - \frac{1}{2}\,\hat\Sigma_n\nabla\log|I_n(\hat\beta_n)| = 0. \qquad (4.18)$$
Note that the maximum likelihood estimator $\hat\beta_n$ satisfies $\|\hat\beta_n - \beta_0\|\stackrel{P}{\to}0$ as $n\to\infty$. To ensure that (4.18) holds irrespective of the true $\beta_0$, we require
$$\hat\Sigma_n\nabla\log\pi(\beta) - \frac{1}{2}\,\hat\Sigma_n\nabla\log|I_n(\beta)| = 0 \qquad (4.19)$$
for every $\beta$. Since $\hat\Sigma_n$ is a positive definite matrix with probability tending to 1, it follows that (4.19) holds if and only if
$$\pi(\beta) \propto |I_n(\beta)|^{1/2}.$$
To ensure that the assumptions in (A-2) hold, we choose
$$\pi(\beta) = C_n\,|I_n(\beta)|^{1/2}, \qquad (4.20)$$
where $C_n$ is chosen such that $\int_{\Theta_n}\pi(\beta)\,d\beta = 1$. Since $\psi$ is infinitely differentiable and $\Theta_n$ is a compact set, it follows by (A-0) that the eigenvalues of $I_n(\beta)$ are uniformly bounded (above and below) over $\beta\in\Theta_n$ and $n\in\mathbb{N}$. Since $p_n$ is uniformly bounded, it follows that $\pi(\beta)$ is uniformly bounded (above and below) over $\beta\in\Theta_n$ and $n\in\mathbb{N}$. Since $\psi$ is infinitely differentiable, it also follows in particular that $\pi(\cdot)$ is twice continuously differentiable. Since $\Theta_n$ is a compact set, it follows that all the first and second order derivatives of $\pi(\cdot)$ are uniformly bounded above over $\Theta_n$ and $n$. All these facts combined imply that $\pi(\cdot)$ satisfies the assumptions in (A-2).
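For concreteness, the moment matching prior (4.20) can be evaluated pointwise up to the normalizing constant $C_n$ (which requires an integration over $\Theta_n$ and cancels in posterior computations); a hedged sketch for the logistic model, with our own function name:

```python
import numpy as np

def log_moment_matching_prior(beta, Z):
    """log pi(beta) = (1/2) log|I_n(beta)| + const: the Jeffreys general rule prior (4.20).

    The normalizing constant C_n, which makes pi integrate to 1 over the compact
    set Theta_n, is omitted here; it does not affect the posterior.
    """
    n = Z.shape[0]
    mu = 1.0 / (1.0 + np.exp(-(Z @ beta)))           # psi'(z_i^T beta), logistic case
    In = (Z * (mu * (1.0 - mu))[:, None]).T @ Z / n  # I_n(beta)
    sign, logdet = np.linalg.slogdet(In)
    return 0.5 * logdet                              # sign is +1 since I_n(beta) > 0
```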
4.2. The unbounded case
We now consider the case when $p_n\to\infty$ and $p_n^{6+\epsilon}/n\to 0$ as $n\to\infty$. For the moment matching prior derivation in this case, (a) we will assume that there exists an $\alpha > 0$ such that $p_n/n^\alpha\to\infty$ as $n\to\infty$, and (b) we will replace the assumption in (A-0) that $\|z_i\|\le M\sqrt{p_n}$ for every $1\le i\le n$ by the stronger assumption that $|z_{ir}|\le M$ for every $1\le i\le n$ and $1\le r\le p_n$. Note that $M$ does not depend on $n$. Recall that the posterior expansion (3.20) in this case holds with $C_n = \{g : \|g\| < p_n^{1/2+\epsilon/6}\}$. The basic technique for deriving the moment matching prior remains the same as in the uniformly bounded case. However, the orders of the various terms used in the analysis differ from those in the uniformly bounded case. Hence, this case is more complex, and needs a more careful consideration of all the relevant terms.
Note again that by (A-0) and (2.10), $\mathrm{tr}(\hat\Sigma_n) = O_p(p_n)$. By the analysis leading to (3.31) it follows that
$$\int_{C_n}\|g\|\,|A_3(g)|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\left(\frac{1}{n^2}\right)\sum_{i=1}^{n}\int_{C_n}\|g\|\,|z_i^T g|^4\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\left(\frac{1}{n^2}\right)\sum_{i=1}^{n}\int_{C_n}\left(\|g\|^5 + |z_i^T g|^5\right)N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg. \qquad (4.21)$$
The previous step follows from an application of Young's inequality for products, namely
$$|ab| \le \frac{|a|^q}{q} + \frac{|b|^{\tilde q}}{\tilde q} \quad \text{for every } a,b\in\mathbb{R},$$
with $q = 5$ and $\tilde q = \frac{5}{4}$. Note that by (A-0), $z_i^T\hat\Sigma_n z_i = O_p(p_n)$. It follows by (4.21) that
$$\int_{C_n}\|g\|\,|A_3(g)|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\left(\frac{p_n^{5/2}}{n}\right). \qquad (4.22)$$
By very similar arguments, which use the analysis leading up to (3.31), (3.26), (3.36) and (3.41) respectively, it can be established that
$$\int_{C_n}\|g\|\,(A_3(g))^2\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\left(\frac{p_n^{9/2}}{n^2}\right), \qquad (4.23)$$
$$\int_{C_n}\|g\|\,(A_2(g))^2\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\left(\frac{p_n^{7/2}}{n}\right), \qquad (4.24)$$
$$\int_{C_n}\|g\|\,|B_2(g)|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\left(\frac{p_n^{5}}{n}\right), \qquad (4.25)$$
$$\int_{C_n}\|g\|\,|A_2(g)B_1(g)|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\left(\frac{p_n^{7/2}}{n}\right). \qquad (4.26)$$
It follows by the definition of $R(g)$ in (3.45), and by (4.22)–(4.26), that
$$\int_{C_n}\|g\|\,\left|R(g) + A_2(g)B_1(g)\right|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\left(\frac{p_n^{5}}{n}\right). \qquad (4.27)$$
A simple application of Markov's inequality along with (3.12) and (3.14) implies that
$$\int_{C_n^c}\|g\|\,|B_1(g)|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg \le \frac{1}{\sqrt{n}}\int_{G_n^c}\|g\|\,\left|\left(\nabla\log\pi(\hat\beta_n)\right)^T g\right|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg + \frac{1}{\sqrt{n}}\int_{\{g:\|g\|>p_n^{1/2+\epsilon/6}\}}\|g\|\,\left|\left(\nabla\log\pi(\hat\beta_n)\right)^T g\right|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg$$
$$\le \frac{1}{\sqrt{n}}\left[\frac{p_n M^2}{n\left(K' - \sup_{1\le i\le n}|z_i^T\hat\beta_n|\right)^2} + \frac{1}{p_n^{1+\epsilon/3}}\right]\int_{\mathbb{R}^{p_n}}\|g\|^3\left|\left(\nabla\log\pi(\hat\beta_n)\right)^T g\right|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg$$
$$= O_p\left(\frac{1}{\sqrt{n}\,p_n^{1+\epsilon/3}}\right)\int_{\mathbb{R}^{p_n}}\left[\|g\|^6 + \left|\left(\nabla\log\pi(\hat\beta_n)\right)^T g\right|^2\right]N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg. \qquad (4.28)$$
Note that by (A-0), (A-2) and (2.10), we get $\left(\nabla\log\pi(\hat\beta_n)\right)^T\hat\Sigma_n\nabla\log\pi(\hat\beta_n) = O_p(p_n^3)$. It follows from (4.28) that
$$\int_{C_n^c}\|g\|\,|B_1(g)|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\left(\frac{p_n^{2-\epsilon/3}}{\sqrt{n}}\right). \qquad (4.29)$$
By a similar argument, it can be established that
$$\int_{C_n^c}\|g\|\,|A_2(g)|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\left(\frac{p_n^{2-\epsilon/3}}{\sqrt{n}}\right). \qquad (4.30)$$
Recall that there exists $\alpha > 0$ such that $p_n/n^\alpha\to\infty$ as $n\to\infty$. Let
$$\alpha^* = \max\left\{\frac{6}{\epsilon}\left(\frac{1}{\alpha} + \frac{1}{2}\right),\ 4\right\}.$$
An application of Markov's inequality along the lines of (4.4) (but increasing the moment by $\alpha^*$ instead of 2), along with the fact that $p_n = o(n^{1/6})$, gives
$$\int_{C_n^c}\|g\|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg \le \left[O_p\left(\frac{p_n^{\alpha^*/2}}{n^{\alpha^*/2}}\right) + \frac{1}{p_n^{\alpha^*/2+\alpha^*\epsilon/6}}\right]\int_{\mathbb{R}^{p_n}}\|g\|^{1+\alpha^*}\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\left(\frac{p_n^{1/2+\alpha^*}}{n^{\alpha^*/2}}\right) + O_p\left(\frac{p_n^{(1+\alpha^*)/2}}{p_n^{\alpha^*/2+\alpha^*\epsilon/6}}\right) = O_p\left(\frac{1}{n}\right). \qquad (4.31)$$
Since $p_n^{6+\epsilon}/n\to 0$, it follows that
$$\frac{p_n^5}{n^{3/2}} = o_p\left(\frac{p_n^{2-\epsilon/3}}{n}\right).$$
Using this fact along with (3.20), (4.7), (4.27), (4.29), (4.30) and (4.31), we get that
$$E^{\pi(\cdot\mid X)}\left[(\beta - \hat\beta_n)\,1_{\{\sqrt{n}(\beta-\hat\beta_n)\in C_n\}}\right] = \frac{1-o_p(1)}{\sqrt{n}}\int_{C_n}g\,N_{p_n}(g\mid 0,\hat\Sigma_n)\left[1 + B_1(g) + A_2(g)\right]dg + O_p\left(\frac{p_n^5}{n^{3/2}}\right) = \frac{1-o_p(1)}{\sqrt{n}}\int_{\mathbb{R}^{p_n}}g\,N_{p_n}(g\mid 0,\hat\Sigma_n)\left[B_1(g) + A_2(g)\right]dg + O_p\left(\frac{p_n^{2-\epsilon/3}}{n}\right). \qquad (4.32)$$
Using exactly the same arguments following (4.10) in the uniformly bounded case, it follows that to obtain
$$E^{\pi(\cdot\mid X)}\left[(\beta - \hat\beta_n)\,1_{\{\sqrt{n}(\beta-\hat\beta_n)\in C_n\}}\right] = O_p\left(\frac{p_n^{2-\epsilon/3}}{n}\right)$$
irrespective of the true value $\beta_0$, we must have
$$\pi(\beta) \propto |I_n(\beta)|^{1/2}.$$
Hence, the moment matching prior (up to order $p_n^{2-\epsilon/3}/n$) is given by
$$\pi(\beta) = C_n\,|I_n(\beta)|^{1/2}, \qquad (4.33)$$
where $C_n$ is chosen such that $\int_{\Theta_n}\pi(\beta)\,d\beta = 1$ (note that such a choice of $C_n$ is possible because $\Theta_n$ is a compact set and $\pi(\cdot)$ is a continuous function). Since $\psi''$ is a continuous function that is strictly positive (by the strict convexity of $\psi$), it follows by the definition of $\Theta_n$ that $\psi''(z_i^T\beta)$ is uniformly bounded (away from both zero and infinity) over $\beta\in\Theta_n$ and $n\in\mathbb{N}$. Hence, by (A-0), all the eigenvalues of $I_n(\beta)$ are uniformly bounded (away from both zero and infinity) over $\beta\in\Theta_n$ and $n\in\mathbb{N}$. It follows that $|I_n(\beta)|^{1/p_n}$ is uniformly bounded (away from both zero and infinity) over $\beta\in\Theta_n$ and $n\in\mathbb{N}$, which immediately implies that $C_n^{1/p_n}$ is uniformly bounded (away from both zero and infinity) over $n\in\mathbb{N}$. It follows by (4.33) that there exists $\eta_0 > 0$ (not depending on $n$) such that $\pi(\beta) > \eta_0^{p_n}$ for every $\beta\in\Theta_n$. Note that the dependence of $\pi$ on $n$ has been suppressed for simplicity of exposition.
We now verify that the prior density $\pi(\cdot)$ in (4.33) satisfies assumptions (2.7) and (2.8). Since $\psi$ is infinitely differentiable, it follows in particular that $\pi(\cdot)$ is twice continuously differentiable. Note that
$$\frac{\partial}{\partial\beta_r}\log\pi(\beta) = \frac{1}{2}\frac{\partial}{\partial\beta_r}\log|I_n(\beta)| = \frac{1}{2}\sum_{s,t=1}^{p_n}\left(I_n(\beta)^{-1}\right)_{st}\frac{\partial}{\partial\beta_r}\left(I_n(\beta)\right)_{st}. \qquad (4.34)$$
Let
$$K_2 = \sup_{x\in[-K',K']}|\psi'''(x)|$$
and recall that
$$\frac{\partial}{\partial\beta_r}\left(I_n(\beta)\right)_{st} = \frac{1}{n}\sum_{i=1}^{n}\psi'''(z_i^T\beta)\,z_{ir}z_{is}z_{it}.$$
It follows by (A-0) and $|z_{ir}|\le M$ for every $1\le i\le n$ and $1\le r\le p_n$ (see the first paragraph of this subsection) that
$$\frac{2K_2 M}{n}A_n + \frac{\partial}{\partial\beta_r}I_n(\beta) \quad\text{and}\quad \frac{2K_2 M}{n}A_n - \frac{\partial}{\partial\beta_r}I_n(\beta) \quad\text{are both positive definite} \qquad (4.35)$$
for every $1\le r\le p_n$. It follows by (4.34) and (4.35) that
$$\left|\frac{\partial}{\partial\beta_r}\log\pi(\beta)\right| = \frac{1}{2}\left|\mathrm{tr}\left(I_n(\beta)^{-1}\frac{\partial}{\partial\beta_r}I_n(\beta)\right)\right| \le \mathrm{tr}\left(I_n(\beta)^{-1}\,\frac{2K_2 M}{n}A_n\right) \le \frac{2K_2 M\,p_n\,\lambda_{\max}\left(\frac{1}{n}A_n\right)}{\lambda_{\min}\left(I_n(\beta)\right)}. \qquad (4.36)$$
Since the eigenvalues of $I_n(\beta)$ are uniformly bounded below over $\beta\in\Theta_n$ and $n\in\mathbb{N}$, it follows by (A-0) and (4.36) that there exists $M_1$ (independent of $\beta$ and $n$) satisfying
$$\|\nabla\log\pi(\beta)\| < M_1 p_n^{3/2} \qquad (4.37)$$
for every $\beta\in\Theta_n$ and $n\in\mathbb{N}$. Hence (2.7) is satisfied.
Let $1\le j,j'\le p_n$ be arbitrarily chosen. Then
$$\frac{1}{\pi(\beta)}\frac{\partial^2\pi(\beta)}{\partial\beta_j\partial\beta_{j'}} = \frac{\partial^2}{\partial\beta_j\partial\beta_{j'}}\log\pi(\beta) + \left(\frac{\partial}{\partial\beta_j}\log\pi(\beta)\right)\left(\frac{\partial}{\partial\beta_{j'}}\log\pi(\beta)\right). \qquad (4.38)$$
It follows by (4.34) that
$$\left|\frac{\partial^2}{\partial\beta_j\partial\beta_{j'}}\log\pi(\beta)\right| = \frac{1}{2}\left|\mathrm{tr}\left(\left(\frac{\partial}{\partial\beta_j}I_n(\beta)^{-1}\right)\left(\frac{\partial}{\partial\beta_{j'}}I_n(\beta)\right)\right) + \mathrm{tr}\left(I_n(\beta)^{-1}\frac{\partial^2}{\partial\beta_j\partial\beta_{j'}}I_n(\beta)\right)\right| = \frac{1}{2}\left|-\mathrm{tr}\left(I_n(\beta)^{-1}\left(\frac{\partial}{\partial\beta_j}I_n(\beta)\right)I_n(\beta)^{-1}\left(\frac{\partial}{\partial\beta_{j'}}I_n(\beta)\right)\right) + \mathrm{tr}\left(I_n(\beta)^{-1}\frac{\partial^2}{\partial\beta_j\partial\beta_{j'}}I_n(\beta)\right)\right|. \qquad (4.39)$$
Note that
$$\frac{\partial^2}{\partial\beta_j\partial\beta_{j'}}\left(I_n(\beta)\right)_{st} = \frac{1}{n}\sum_{i=1}^{n}\psi''''(z_i^T\beta)\,z_{ij}z_{ij'}z_{is}z_{it}. \qquad (4.40)$$
Let
$$K_3 = \sup_{x\in[-K',K']}|\psi''''(x)|.$$
It follows by (A-0), (4.40) and $|z_{ir}|\le M$ for every $1\le i\le n$ and $1\le r\le p_n$ (see the first paragraph of this subsection) that
$$\frac{2K_3 M^2}{n}A_n + \frac{\partial^2}{\partial\beta_j\partial\beta_{j'}}I_n(\beta) \quad\text{and}\quad \frac{2K_3 M^2}{n}A_n - \frac{\partial^2}{\partial\beta_j\partial\beta_{j'}}I_n(\beta) \quad\text{are both positive definite} \qquad (4.41)$$
for every $1\le j,j'\le p_n$. It follows by (4.35), (4.39) and (4.41) that
$$\left|\frac{\partial^2}{\partial\beta_j\partial\beta_{j'}}\log\pi(\beta)\right| \le 2K_2^2 M^2\,\mathrm{tr}\left(I_n(\beta)^{-1}\left(\frac{A_n}{n}\right)I_n(\beta)^{-1}\left(\frac{A_n}{n}\right)\right) + K_3 M^2\,\mathrm{tr}\left(I_n(\beta)^{-1}\frac{A_n}{n}\right). \qquad (4.42)$$
Since the eigenvalues of $I_n(\beta)$ are uniformly bounded below over $\beta\in\Theta_n$ and $n\in\mathbb{N}$, it follows by (A-0), (4.36), (4.38) and (4.42) that there exists $M_2$ (independent of $\beta$ and $n$) satisfying
$$\max_{1\le j,j'\le p_n}\left|\frac{\partial^2}{\partial\beta_j\partial\beta_{j'}}\log\pi(\beta)\right| \le M_2 p_n^2.$$
Combining this with (4.36) and (4.38) shows that $\max_{1\le j,j'\le p_n}\left|\frac{1}{\pi(\beta)}\frac{\partial^2\pi(\beta)}{\partial\beta_j\partial\beta_{j'}}\right| = O(p_n^2)$, which is smaller than the bound $M_2' p_n^{5/2}$ required in (2.8) for an appropriate constant $M_2'$. Hence (2.8) is satisfied.
Acknowledgments
We would like to thank Prof. Subhashis Ghosal for his help with the paper. Thanks are also due to the Associate Editor
and a referee for their useful comments.
Appendix
Multivariate normal distribution satisfies the assumptions in (2.7) and (2.8)

Suppose we put a normal prior on $\beta$, i.e., $\beta\sim N_{p_n}(\mu, A)$. We assume that $\|\mu\| = O(\sqrt{p_n})$ and $\|A^{-1}\| = O(\sqrt{p_n})$. Note that
$$\nabla\log\pi(\beta) = -A^{-1}(\beta - \mu).$$
Hence,
$$\|\nabla\log\pi(\beta)\| \le \|A^{-1}\|\,\|\beta - \mu\|. \qquad (A.1)$$
Also,
$$\|\beta - \mu\| \le \|\beta - \beta_0\| + \|\beta_0\| + \|\mu\| \le \|\beta - \beta_0\| + \sqrt{\frac{1}{C_1}\,\beta_0^T\left(\frac{1}{n}\sum_{i=1}^{n}z_i z_i^T\right)\beta_0} + \|\mu\| \quad \text{(by assumption (A-0))}$$
$$\le \|\beta - \beta_0\| + \sqrt{\frac{1}{C_1}\cdot\frac{1}{n}\sum_{i=1}^{n}(z_i^T\beta_0)^2} + \|\mu\| \le \|\beta - \beta_0\| + \frac{K}{\sqrt{C_1}} + \|\mu\| \quad \text{(by assumption (A-1))}$$
$$= \|\beta - \beta_0\| + O(\sqrt{p_n}). \qquad (A.2)$$
It follows from (A.1), (A.2) and the assumptions on $\mu$ and $A$ that
$$\sup_{\|\beta-\beta_0\|\le C_n}\|\nabla\log\pi(\beta)\| = O(p_n),$$
where $C_n = \sqrt[4]{p_n/n}$.
Note that for $1\le j,j'\le p_n$,
$$\left|\frac{1}{\pi(\beta)}\frac{\partial^2\pi(\beta)}{\partial\beta_j\partial\beta_{j'}}\right| = \left|-(A^{-1})_{jj'} + \left(\sum_{k=1}^{p_n}(A^{-1})_{jk}(\beta_k - \mu_k)\right)\left(\sum_{k=1}^{p_n}(A^{-1})_{j'k}(\beta_k - \mu_k)\right)\right| \le \|A^{-1}\| + \left(\|A^{-1}\|\,\|\beta - \mu\|\right)^2. \qquad (A.3)$$
It follows from (A.2) and (A.3) that
$$\sup_{\|\beta-\beta_0\|\le C_n}\max_{1\le j,j'\le p_n}\left|\frac{1}{\pi(\beta)}\frac{\partial^2\pi(\beta)}{\partial\beta_j\partial\beta_{j'}}\right| = O(p_n^2),$$
where $C_n = \sqrt[4]{p_n/n}$.
Multivariate t distribution satisfies the assumptions in (2.7) and (2.8)

Suppose we put a $t$-prior on $\beta$, i.e., $\beta\sim t_\gamma(\mu, A)$. Here $t_\gamma(\mu, A)$ denotes the multivariate $t$ distribution with parameters $\gamma$, $\mu$ and $A$. We take $\gamma$ to be independent of $n$, but allow $\mu = \mu_n$ and $A = A_n$ to vary with $n$ (the dependence on $n$ is suppressed henceforth for simplicity of exposition). The density of this distribution is proportional to
$$\left[1 + \frac{1}{\gamma}(\beta - \mu)^T A^{-1}(\beta - \mu)\right]^{-(\gamma+p_n)/2}.$$
We assume that $\|A^{-1}\| = O(\sqrt{p_n})$. Now,
$$\nabla\log\pi(\beta) = \frac{\nabla\pi(\beta)}{\pi(\beta)} = -\left(\frac{\gamma + p_n}{\gamma}\right)\frac{A^{-1}(\beta - \mu)}{1 + \frac{1}{\gamma}(\beta - \mu)^T A^{-1}(\beta - \mu)}.$$
Thus
$$\|\nabla\log\pi(\beta)\| = \left(\frac{\gamma + p_n}{\gamma}\right)\frac{\sqrt{(\beta - \mu)^T A^{-2}(\beta - \mu)}}{1 + \frac{1}{\gamma}(\beta - \mu)^T A^{-1}(\beta - \mu)} \le O(\gamma + p_n)\,\frac{\sqrt{\|A^{-1}\|}\,\sqrt{(\beta - \mu)^T A^{-1}(\beta - \mu)}}{1 + \frac{1}{\gamma}(\beta - \mu)^T A^{-1}(\beta - \mu)} \le O(p_n^{5/4}).$$
Now, let $A^{-1} = ((a^{ij}))$. By straightforward manipulations, we get
$$\frac{1}{\pi(\beta)}\frac{\partial^2\pi(\beta)}{\partial\beta_j\partial\beta_{j'}} = \frac{1}{4\gamma^2}(\gamma+p_n)(\gamma+p_n+2)\,\frac{\left(\sum_{k=1}^{p_n}a^{kj}(\beta_k-\mu_k)\right)\left(\sum_{k=1}^{p_n}a^{kj'}(\beta_k-\mu_k)\right)}{\left(1 + \frac{1}{\gamma}\sum_{k,l=1}^{p_n}a^{kl}(\beta_k-\mu_k)(\beta_l-\mu_l)\right)^2} - \frac{1}{2\gamma}(\gamma+p_n)\,\frac{a^{jj'}}{1 + \frac{1}{\gamma}\sum_{k,l=1}^{p_n}a^{kl}(\beta_k-\mu_k)(\beta_l-\mu_l)}$$
$$\stackrel{(a)}{\le} O(p_n^2)\,\frac{\left((\beta-\mu)^T A^{-1}\right)_j\left((\beta-\mu)^T A^{-1}\right)_{j'}}{\left(1 + (\beta-\mu)^T A^{-1}(\beta-\mu)\right)^2} + O(p_n^{3/2}) \stackrel{(b)}{\le} O(p_n^2)\,\frac{(\beta-\mu)^T A^{-2}(\beta-\mu)}{\left(1 + (\beta-\mu)^T A^{-1}(\beta-\mu)\right)^2} + O(p_n^{3/2}) \stackrel{(c)}{\le} O(p_n^2)\,\frac{\|A^{-1}\|\,(\beta-\mu)^T A^{-1}(\beta-\mu)}{\left(1 + (\beta-\mu)^T A^{-1}(\beta-\mu)\right)^2} + O(p_n^{3/2}) = O(p_n^{5/2}),$$
where (a) and (c) follow from the assumption that $\|A^{-1}\| = O(\sqrt{p_n})$, and (b) follows since $\left|\left((\beta-\mu)^T A^{-1}\right)_j\right| \le \sqrt{(\beta-\mu)^T A^{-2}(\beta-\mu)}$ for all $j = 1,\ldots,p_n$.
Proof of Lemma 1

Let $\alpha_n = \sqrt{p_n/n}$. We will show that for any given $\epsilon > 0$, there exists a constant $C$ such that
$$P\left(\sup_{\|u\|=C}l_n(\beta_0 + \alpha_n u) < l_n(\beta_0)\right) \ge 1 - \epsilon \qquad (A.4)$$
for large enough $n$. This will imply, with probability tending to 1, that the unique maximum $\hat\beta_n$ lies in the ball $\{\beta_0 + \alpha_n u : \|u\|\le C\}$, i.e., $\|\hat\beta_n - \beta_0\| = O_p(\alpha_n)$.
Note that
$$l_n(\beta_0 + \alpha_n u) - l_n(\beta_0) = \alpha_n\sum_{i=1}^{n}X_i z_i^T u - \sum_{i=1}^{n}\left[\psi\left(z_i^T(\beta_0 + \alpha_n u)\right) - \psi(z_i^T\beta_0)\right] = \alpha_n\sum_{i=1}^{n}\left(X_i - \psi'(z_i^T\beta_0)\right)z_i^T u - \frac{\alpha_n^2}{2}\sum_{i=1}^{n}\psi''(z_i^T\beta_0)(z_i^T u)^2 - \frac{\alpha_n^3}{6}\sum_{i=1}^{n}\psi'''(\theta_i^*)(z_i^T u)^3 = I_1 + I_2 + I_3,\ \text{say},$$
where $\theta_i^*$ lies between $z_i^T\beta_0$ and $z_i^T(\beta_0 + \alpha_n u)$ for every $1\le i\le n$.
Note that by (A-1), $z_i^T\beta_0$ is uniformly bounded (over $i$ and $n$) and $\psi''(\cdot)$ is a continuous function. Hence, $\psi''(z_i^T\beta_0)$ is also uniformly bounded (over $i$ and $n$) by, say, $K_1$. It follows that
$$E\left[\sum_{i=1}^{n}\left(X_i - \psi'(z_i^T\beta_0)\right)z_i^T u\right]^2 = \sum_{i=1}^{n}E\left[\left(X_i - \psi'(z_i^T\beta_0)\right)^2(z_i^T u)^2\right] \quad \left(\because\ X_i\text{'s are independent and } E[X_i] = \psi'(z_i^T\beta_0)\right)$$
$$= \sum_{i=1}^{n}(z_i^T u)^2\,\psi''(z_i^T\beta_0) \quad \left(\because\ E\left[\left(X_i - \psi'(z_i^T\beta_0)\right)^2\right] = \psi''(z_i^T\beta_0)\right)$$
$$\le K_1\sum_{i=1}^{n}(z_i^T u)^2 \le nK_1\,u^T\left(\frac{1}{n}\sum_{i=1}^{n}z_i z_i^T\right)u \le nK_1 C_2\,\|u\|^2.$$
The last step follows by (A-0). Hence $E\left[\sum_{i=1}^{n}\left(X_i - \psi'(z_i^T\beta_0)\right)z_i^T u\right]^2 = O(n)\|u\|^2$. Thus,
$$I_1 = O_p(\alpha_n\sqrt{n})\,\|u\| = O_p(\sqrt{p_n})\,\|u\|. \qquad (A.5)$$
Note that $\psi$ is a strictly convex function and hence $\psi''(\cdot) > 0$. Since $\psi''$ is continuous, it follows that its infimum on a bounded interval is strictly positive. By (A-1), $z_i^T\beta_0$ is uniformly bounded. This implies that $\psi''(z_i^T\beta_0)$ is uniformly bounded below by a positive constant, say $K_2$. Hence
$$I_2 = -\frac{\alpha_n^2}{2}\sum_{i=1}^{n}\psi''(z_i^T\beta_0)(z_i^T u)^2 \le -K_2\frac{\alpha_n^2}{2}\sum_{i=1}^{n}(z_i^T u)^2 = -K_2\frac{\alpha_n^2}{2}\,n\,u^T\left(\frac{1}{n}\sum_{i=1}^{n}z_i z_i^T\right)u < 0,$$
by (A-0). Also, by (A-0) and the arguments above,
$$|I_2| \ge K_2\frac{\alpha_n^2}{2}\,n\,u^T\left(\frac{1}{n}\sum_{i=1}^{n}z_i z_i^T\right)u \ge K_2\frac{\alpha_n^2}{2}\,nC_1\|u\|^2 = \frac{C_1 K_2}{2}\,p_n\|u\|^2. \qquad (A.6)$$
Now, since $\theta_i^*$ lies between $z_i^T\beta_0$ and $z_i^T(\beta_0 + \alpha_n u)$, it follows by (A-0) and (A-1) that
$$|\theta_i^*| < \max_{1\le i\le n}\left\{|z_i^T\beta_0|,\ |z_i^T(\beta_0 + \alpha_n u)|\right\} < \max_{1\le i\le n}\left\{K,\ K + \alpha_n|z_i^T u|\right\} \le K + \max_{1\le i\le n}\alpha_n\|z_i\|\,\|u\| \le K + \sqrt{\frac{p_n}{n}}\,O(\sqrt{p_n})\,\|u\| = K + O\left(\frac{p_n}{\sqrt{n}}\right)\|u\|.$$
Hence $\psi'''(\theta_i^*)$ is uniformly bounded by, say, $K_3$. Thus,
$$|I_3| = \left|\frac{\alpha_n^3}{6}\sum_{i=1}^{n}\psi'''(\theta_i^*)(z_i^T u)^3\right| \le K_3\frac{\alpha_n^3}{6}\sum_{i=1}^{n}|z_i^T u|^3 \le K_3\frac{p_n^{3/2}}{6n^{3/2}}\sum_{i=1}^{n}\left(\|z_i\|\,\|u\|\right)^3 = \frac{K_3 M^3 p_n^3}{6\sqrt{n}}\,\|u\|^3. \qquad (A.7)$$
The last step follows by (A-0). Since $p_n^6/n\to 0$ as $n\to\infty$, it follows by (A.5)–(A.7) that the order of $I_2$ dominates the orders of $I_1$ and $I_3$ (for a suitable choice of $\|u\|$). Since $I_2$ is negative, the assertion in (A.4) holds. □
References
[1] A. Barron, M. Schervish, L. Wasserman, The consistency of posterior distributions in nonparametric problems, Ann. Statist. 27 (1999) 536–561.
[2] S. Bernstein, Theory of Probability, 1917 (in Russian).
[3] D. Bontemps, Bernstein–von Mises theorems for Gaussian regression with increasing number of regressors, Ann. Statist. 39 (2011) 2557–2584.
[4] M. Crowder, Asymptotic expansions of posterior expectations, distributions and densities for stochastic processes, Ann. Inst. Statist. Math. 40 (1988)
297–309.
[5] J. Fan, H. Peng, Nonconcave penalized likelihood with a diverging number of parameters, Ann. Statist. 32 (2004) 928–961.
[6] S. Ghosal, Normal approximation to the posterior distribution for generalized linear models with many covariates, Math. Methods Statist. 6 (1997)
332–348.
[7] S. Ghosal, Asymptotic normality of posterior distributions in high dimensional linear models, Bernoulli 5 (1999) 315–331.
[8] S. Ghosal, Asymptotic normality of posterior distributions for exponential families with many parameters, J. Multivariate Anal. 74 (2000) 49–69.
[9] S. Ghosal, J. Ghosh, A. van der Vaart, Convergence rates of posterior distributions, Ann. Statist. 28 (2000) 500–531.
[10] S. Ghosal, T. Samanta, Asymptotic expansions of posterior distributions in nonregular cases, Ann. Inst. Statist. Math. 49 (1997) 181–197.
[11] M. Ghosh, Objective priors: an introduction for frequentists (with discussion), Statist. Sci. 26 (2011) 187–211.
[12] M. Ghosh, R. Liu, Moment matching priors, Sankhy¯a 73-A (2011) 185–201.
[13] J.K. Ghosh, B.K. Sinha, S.N. Joshi, Expansion for posterior probability and integrated Bayes risk, in: S.S. Gupta, J.O. Berger (Eds.), Statistical Decision
Theory and Related Topics III, Academic Press, 1982, pp. 403–456.
[14] S.J. Haberman, Maximum likelihood estimates in exponential response models, Ann. Statist. 5 (1977) 815–841.
[15] J.G. Ibrahim, P.W. Laud, On Bayesian analysis of generalized linear models using Jeffreys’s prior, J. Amer. Statist. Assoc. 86 (1991) 981–986.
[16] R.A. Johnson, On asymptotic expansion for posterior distribution, Ann. Math. Statist. 38 (1967) 1899–1906.
[17] R.A. Johnson, Asymptotic expansions associated with posterior distribution, Ann. Math. Statist. 42 (1970) 1241–1253.
[18] H. Liang, P. Du, Maximum likelihood estimation in logistic regression models with a diverging number of covariates, Electron. J. Stat. 6 (2012)
1838–1846.
[19] S. Portnoy, Asymptotic behavior of M-estimators of p regression parameters when p^2/n is large. I: Consistency, Ann. Statist. 12 (1984) 1298–1309.
[20] S. Portnoy, Asymptotic behavior of M-estimators of p regression parameters when p^2/n is large. II: Normal approximation, Ann. Statist. 13 (1985) 1403–1417.
[21] S. Portnoy, Asymptotic behavior of likelihood methods for exponential families when the number of parameters tends to infinity, Ann. Statist. 16
(1988) 356–366.
[22] A.M. Walker, On the asymptotic behavior of the posterior distribution, J. R. Stat. Soc. Ser. B 26 (1969) 80–88.
[23] C.M. Zhang, Y. Jiang, Y. Chai, Penalized Bregman divergence for large-dimensional regression and classification, Biometrika 97 (2011) 551–566.

JMVA_Paper_Shibasish

  • 1. Journal of Multivariate Analysis 131 (2014) 126–148 Contents lists available at ScienceDirect Journal of Multivariate Analysis journal homepage: www.elsevier.com/locate/jmva Asymptotic expansion of the posterior density in high dimensional generalized linear models Shibasish Dasgupta, Kshitij Khare, Malay Ghosh∗ University of Florida, United States a r t i c l e i n f o Article history: Received 12 July 2013 Available online 21 June 2014 AMS 2010 subject classification: 62F15 Keywords: Asymptotic expansion of the posterior Generalized linear models Canonical link function High dimensional inference Moment matching priors a b s t r a c t While developing a prior distribution for any Bayesian analysis, it is important to check whether the corresponding posterior distribution becomes degenerate in the limit to the true parameter value as the sample size increases. In the same vein, it is also important to understand a more detailed asymptotic behavior of posterior distributions. This is particularly relevant in the development of many nonsubjective priors. The present paper focuses on asymptotic expansions of posteriors for generalized linear models with canonical link functions when the number of regressors grows to infinity at a certain rate relative to the growth of the sample size. These expansions are then used to derive moment matching priors in the generalized linear model setting. © 2014 Elsevier Inc. All rights reserved. 1. Introduction Bayesian methodology is gaining increasing prominence in the theory and application of statistics. Its versatility has enhanced due to its implementability via many statistical numerical integration techniques, in particular, the Markov chain Monte Carlo method. Nevertheless, it is important not to overlook asymptotic performance of any Bayesian procedure. Specifically, it is important to check whether a posterior distribution generated by a prior becomes degenerate in the limit to the true parameter value as the sample size grows to infinity. In the same vein, it is also important to understand a more detailed asymptotic behavior of the posterior distribution of the (appropriately normalized) parameter of interest. Asymptotic normality of the posterior for regular (when the support of the distribution does not depend on the parameter) family of distributions based on i.i.d. observations was first developed by Bernstein and Von Mises (see [2]). Later, analogous to frequentist Edgeworth expansion of the density or the distribution function, higher order asymptotic expansion of the posterior was developed to address various other important issues needed for Bayesian analysis, most prominently the development of non-subjective priors using a number of different criteria; see e.g. [11] where other references are cited. To our knowledge, the first work dealing with a comprehensive asymptotic expansion of the posterior is due to Johnson [16,17]. This was followed up later by Walker [22], Ghosh, Sinha and Joshi [13], Crowder [4], just to name a few. However, much of this work was focused on posteriors generated from i.i.d. observations generated from a regular family of distributions and a smooth family of priors admitting derivatives up to a certain order. Ghosal and Samanta [10] established asymptotic expansion of the posterior in the non-regular case, by considering a one-parameter family of discontinuous densities. 
Ghosal [6–8] made significant and topical contributions to this area by establishing posterior consistency in a high dimensional context. Specifically, Ghosal [6] established posterior consistency (asymptotic normality in the Bernstein–von- Mises sense) of the posterior for generalized linear models in a high dimensional setup. The number of regressors pn is ∗ Corresponding author. E-mail address: ghoshm@stat.ufl.edu (M. Ghosh). http://dx.doi.org/10.1016/j.jmva.2014.06.013 0047-259X/© 2014 Elsevier Inc. All rights reserved.
  • 2. S. Dasgupta et al. / Journal of Multivariate Analysis 131 (2014) 126–148 127 allowed to grow with the sample size n. In particular, it is assumed that p4 n log pn/n → 0. Later, Ghosal [7] established asymptotic normality of the posterior for linear regression models in a similar high dimensional setup as Ghosal [6]. In [8], asymptotic normality of the posterior was established for exponential families as the number of parameters grows with the sample size. Bontemps [3] extended the work of Ghosal [7] by permitting the model to be misspecified and the number of regressors to grow proportionally to the sample size. Barron et al. [1] and Ghosal et al. [9] have considered the notion of posterior consistency in nonparametric settings. In this paper, we focus on generalized linear models (GLM) with canonical link function. The main objective of this paper is to extend the asymptotic consistency result of Ghosal [6], by providing a third order correct asymptotic expansion of the posterior density for GLM with canonical link function when the number of regressors grows to infinity at a certain rate relative to the growth of the sample size n. Since a general link function is a one-to-one function of the canonical link function, we can get a similar asymptotic expansion for the vector of regression parameters in the general case as well. The results bear potential for the development of a variety of objective priors in this framework. The first step towards the development of reference priors, probability matching priors, moment matching priors and others requires asymptotic expansions of posteriors (cf. [11]). In particular, we use the asymptotic expansion to derive moment matching priors (introduced in [12]) in the GLM setting. To the best of the authors’ knowledge, identification of moment matching priors in this setting (both when the number of regressors is bounded, and when the number of regressors increases with n) has not been undertaken in the literature. The paper is organized as follows. In Section 2, we introduce the model and provide the required assumptions. In Section 3, we prove the main asymptotic expansion result (Theorem 1). In Section 4, we use this asymptotic expansion to derive moment matching priors. The Appendix contains proofs which establish that the assumptions (in Section 2) on the prior density are satisfied by the multivariate normal and multivariate t densities. 2. Preliminaries 2.1. Setup and assumptions Let X1, . . . , Xn be independent random variables. Let fi(·) denote the density of Xi with respect to a σ-finite measure ν. Suppose fi(xi) = exp[xiθi − ψ(θi)], i = 1, . . . , n, (2.1) where, θi = zT i β, β = (β1, . . . , βpn )T is the vector of parameters and zi = (zi1, . . . , zipn )T is the vector of covariates for i = 1, . . . , n. Note that we are allowing the dimension pn of the parameter β to grow with the sample size n. Also, the cumulant generating function ψ is infinitely differentiable and is assumed to be strictly convex. The above model is termed by Haberman [14] as the ‘‘Dempster model’’. Let π(·) denote the prior density of β. Then the posterior density of β given the observations X1, . . . , Xn is defined by: π(β | X) = exp[ln(β)]π(β)  exp[ln(β)]π(β)dβ , (2.2) where, ln(β) = n i=1 (XizT i β − ψ(zT i β)) is the log-likelihood function. Note that the covariate vectors z1, . . . , zn, the true parameter value β0, the prior π(·), the posterior π(· | X) all change with n. 
However, we suppress this dependence in our notation for simplicity of exposition. We now state the regularity conditions needed for our result. • (A-0) The matrix An defined by the relation An = n i=1 zizT i is positive definite and the eigenvalues of 1 n An are uniformly bounded, i.e., ∃ constants C1 and C2 (independent of n) such that the matrix 1 n An satisfies the following. 0 < C1 < λmin  1 n An  ≤ λmax  1 n An  < C2 < ∞, for all n. Here λmax and λmin respectively denote the largest and smallest eigenvalues of the appropriate matrix. Further, we assume that ∥zi∥ =  zT i zi = O( √ pn). More specifically, there exists a constant M (independent of n) such that ∥zi∥ ≤ M √ pn.
  • 3. 128 S. Dasgupta et al. / Journal of Multivariate Analysis 131 (2014) 126–148 • (A-1) Let β0 denote the (sequence of) true value of the regression parameter vector β. Note that θ0i = zT i β0 is the true value of the parameter θi in (2.1). We assume that max1≤i≤n |θ0i| is uniformly bounded as n varies, i.e., there exists a constant K (independent of n) such that max 1≤i≤n |zT i β0| = max 1≤i≤n |θ0i| < K. (2.3) As mentioned in [6,7], this assumption makes sense particularly if the data is clean from extreme outliers. As in [6,7], we also assume that the parameter space is restricted to those values of β for which max 1≤i≤n |zT i β| ≤ K′ , (2.4) for some K′ > K. This is equivalent to the statement that the parameter space is restricted to Θn, where Θn = {β : max 1≤i≤n |zT i β| ≤ K′ }. (2.5) Note that Θn is a convex set. The posterior density of β given the observations X1, . . . , Xn (introduced in (2.2)) is more precisely given by π(β | X) = exp[ln(β)]π(β)  Θn exp[ln(β)]π(β)dβ 1{β∈Θn}. (2.6) We refer the reader to Ghosal [6,7] for details and discussion about this assumption. The summary is that a frequentist can think of this as a compactness assumption to prevent the posterior mass from going to infinity. A Bayesian can think this as a convenient and reasonable prior belief about θ. It should be noted that actual knowledge of K and K′ is not required to obtain the main terms (up to the third order) in the expansion in Theorem 1. But K and K′ do control the rate at which the op(1) terms in the expansion converge to 0. In this context it is also important to clarify that when we propose priors like multivariate normal or multivariate t for β, we implicitly truncate these priors to the region Θn. • (A-2) The prior density π(·) of β satisfies  Θn π(β)dβ = 1 and π(β0) > ηpn 0 , for some η0 > 0 (η0 does not depend on n). Also, π(·) is assumed to be twice continuously differentiable with sup ∥β−β0∥≤Cn ∥∇ log π(β)∥2 < M1p3/2 n for some M1 > 0, (2.7) and sup ∥β−β0∥≤Cn max 1≤j,j′≤pn     1 π(β)  ∂2 π(β) ∂βj∂βj′     < M2p5/2 n for some M2 > 0, where, Cn = 4  pn n . (2.8) This assumption is satisfied by appropriate multivariate t and multivariate normal densities (see Appendix). Note that the prior density can be improper as a density on Rpn . We only assume that it has been normalized to integrate to 1 on the compact set Θn. • (A-3) The dimension pn can grow to infinity such that p 6+ϵ n n → 0 as n → ∞ for some small ϵ > 0. Note that (A-3) is stronger than the corresponding assumption in [6] which only requires p4 n log pn/n → 0. However, the goal in [6] is to establish asymptotic normality of the posterior. Our goal is to get a third order asymptotic expansion of the posterior. Hence it is not surprising that we need a slower rate of increase for pn. 2.2. Asymptotic convergence rate for MLE Let ˆβn be the maximum likelihood estimator of β. It follows by the convexity of ψ and assumption (A-0) that the Hessian matrix of ln(β) is a negative definite matrix for all β. Hence ln(β) is a strictly convex function and has a unique maximum. The following lemma (Lemma 1) establishes weak consistency of the maximum likelihood estimator ˆβn, and provides an asymptotic rate of convergence. This lemma is helpful in proving the main result (Theorem 1). Haberman [14] established consistency and asymptotic normality for the MLE in exponential response models, a more general version of the Dempster model considered here, when p3 n n → 0. 
However, it is not quite clear if Haberman’s results can be used under our assumptions to obtain the asymptotic rate in Lemma 1. Hence, for the sake of completeness, we provide an independent proof of Lemma 1 in the Appendix by adapting the approach of Fan and Peng [5] (in the i.i.d. setting) to the GLM setting. We briefly mention some other works on high dimensional consistency and asymptotic normality of the MLE, and the differences between our setup and the setup in those papers. Portnoy [19,20] established consistency and asymptotic normality of M-estimators in the context of linear regression, as the number of regression parameters pn grows with the sample size n (satisfying the condition (pn log pn)3/2 n → 0).1 Portnoy [21] established consistency and asymptotic normality 1 See [19,20] for references to earlier works in this area.
  • 4. S. Dasgupta et al. / Journal of Multivariate Analysis 131 (2014) 126–148 129 of the MLE for i.i.d. observations from exponential families, as the number of parameters pn grows with the sample size n (satisfying the condition p 3/2 n n → 0). This is a different setting than the regression based setting (with covariates) considered in this paper. Fan and Peng [5] established high dimensional consistency and asymptotic normality of penalized likelihood estimators (MLE can be thought of as a special case). However, they considered the i.i.d. setting, which is different than the setting in this paper. Zhang et al. [23] considered penalized pseudo-likelihood estimators for high dimensional GLM. However, their Bregman divergence based loss functions do not include the negative log-likelihood loss function. More specifically, in the context of GLM with canonical link, Zhang et al.’s [23] loss function looks like n i=1 −q(Xi) + q(ψ′ (zT i β)) + (Xi − ψ′ (zT i β))q′ (ψ′ (zT i β)), (2.9) where q(·) is a concave function. The log-likelihood function ln(β) cannot be written in this form. A proof of high dimensional asymptotic normality of ˆβn in the special case of logistic regression is provided in [18]. Lemma 1. Under assumptions (A-0)–(A-3), the maximum likelihood estimator ˆβn satisfies ∥ˆβn − β0∥ = Op(  pn n ). Remark 1. Note that by Lemma 1 and (A-0), |zT i (ˆβn − β0)| ≤ ∥zi∥ ∥ˆβn − β0∥ = Op  pn √ n  . By (A-1), it follows that ˆβn ∈  β : max 1≤i≤n |zT i β| < K + K′ 2  (2.10) with probability tending to 1 as n → ∞. In particular, we get ˆβn ∈ Θn with probability tending to 1 as n → ∞. 3. Main result In this section, we derive our main result: a third order correct asymptotic expansion of the posterior π(· | X) around an appropriate normal density. We transform the parameter β to g = √ n(β − ˆβn). Since the parameter space for β is Θn, it follows that the parameter space for g is Gn :=  g : ˆβn + g √ n ∈ Θn  . From (2.2) we obtain that the posterior density of g is given by π∗ (g | X) = exp  ln  ˆβn + g √ n  − ln(ˆβn)  π  ˆβn + g √ n   Gn exp  ln  ˆβn + g √ n  − ln(ˆβn)  π  ˆβn + g √ n  dg 1g∈Gn . (3.1) We now prove a series of lemmas which help us to prove our main result (Theorem 1). We first show that Ghosal’s [6] result on posterior consistency holds under our assumptions. Lemma 2. Under assumptions (A-0)–(A-3) described above,  |π∗ (g | X) − Npn  g|µn, Σn  |dg → 0, (3.2) where Npn  g|µn, Σn  is a pn-dimensional normal density with mean vector µn = √ nB−1 n n i=1  Xi − ψ′  zT i β0  zi − √ n(ˆβn − β0), and the inverse covariance matrix Σ−1 n = 1 n Bn = 1 n n i=1 ψ′′  zT i β0  zizT i . Proof. We verify that the assumptions in [6] follow from (A-0) to (A-3). Note that Ghosal (Eqs. (2.6) and (2.7)) follows immediately from our assumptions (A-1) and (A-2). Let δn = ∥A −1/2 n ∥. By (A-0), it follows that δn = O(n−1/2 ). Note that by
  • 5. 130 S. Dasgupta et al. / Journal of Multivariate Analysis 131 (2014) 126–148 (A-2), if ∥β − β0∥ ≤ 4  pn n , then by the mean value theorem, | log π(β) − log π(β0)| ≤ sup ∥β−β0∥≤ 4 √pn n ∥∇ log π(β)∥ ∥β − β0∥ ≤ M1p3/2 n ∥β − β0∥. Note that pn(log pn)1/2 δn = O  p 1+ ϵ 3 n √ n  = o  4  pn n  . Hence, Ghosal [6, Eq. (2.8)] is satisfied with Kn = M1p 3/2 n . Note that Knδnpn(log pn)1/2 = p 5/2+ϵ/3 n √ n → 0. Let ηn = max1≤i≤n ∥A −1/2 n zi∥. Then ηn ≤ ∥A−1/2 n ∥ max 1≤i≤n ∥zi∥ = O  pn n  , where    A − 1 2 n     = sup       A − 1 2 n x     ∥x∥ : x ∈ Rn with x ̸= 0    . This means p3/2 n (log pn)1/2 ηn = O  p3/2+ϵ/3 n  pn n  = O  p 2+ϵ/3 n √ n  → 0. Hence, Ghosal [6, Eq. (2.10)] is satisfied. Now, since 1 n n i=1 zizT i has uniformly bounded eigenvalues (by (A-0)), hence tr  1 n n i=1 zizT i  = O(pn). Elementary manipulations using properties of trace imply that n i=1 pn j=1 z2 ij = tr  n i=1 zizT i  = O(npn). Thus, Ghosal [6, Eq. (2.11)] is also satisfied. Hence, all the assumptions in [6] hold. The lemma now follows from Theorem 2.1 of Ghosal [6] (using a straightforward linear transformation). Define the function Zn(g) := exp  ln  ˆβn + g √ n  − ln(ˆβn)  . (3.3) Note that π∗ (g|X) = Zn(g)π  ˆβn + g √ n   Gn Zn(g)π  ˆβn + g √ n  dg 1g∈Gn . Henceforth, we assume that pn → ∞. If pn is uniformly bounded, a simple modification of the arguments below can be used to establish the asymptotic expansion result. See Section 4.1. Lemma 3. Let Cn :=  g : g ∈ Gn, ∥g∥ ≤ p 1 2 +ϵ′ n  and Kn = π(ˆβn)(2π) pn 2   − ∇2ln(ˆβn) n    1/2 , where ϵ′ = ϵ 6 . Then,  Cn 1 Kn Zn(g)π  ˆβn + g √ n  dg P → 1. (3.4)
  • 6. S. Dasgupta et al. / Journal of Multivariate Analysis 131 (2014) 126–148 131 Proof. Note that, Zn(g) = exp  ln(ˆβn + g √ n ) − ln(ˆβn)  . By a third order correct Taylor series expansion of ln around ˆβn, we get that Zn(g) = exp  gT ∇2 ln(ˆβn)g 2n − 1 6n3/2 n i=1 ψ′′′ (zT i β∗ n)  pn r=1 zir gr 3   , (3.5) where β∗ n = β∗ n(g) is an intermediate point on the line joining between ˆβn and (ˆβn + g √ n ). Note that by Lemma 1, ˆβn ∈ Θn with probability tending to 1. Also, by the definition of Gn it follows that (ˆβn + g √ n ) ∈ Θn for every g ∈ Gn. It follows by the convexity of Θn and (2.10) that P  β∗ n(g) ∈ Θn ∀ g ∈ Gn  → 1, (3.6) as n → ∞. Also, if g ∈ Cn,      pn r=1 zir gr      ≤ ∥zi∥ ∥g∥ ≤ M √ pnp 1 2 +ϵ′ n = Mp1+ϵ′ n . (3.7) Let K2 := sup x∈[−K′,K′] ψ′′′ (x). Note that K2 < ∞ by continuity of ψ′′′ . Hence, if ˆβn ∈ Θn and g ∈ Cn,       1 n3/2 n i=1 ψ′′′ (zT i β∗ n)  pn r=1 zir gr 3       ≤ K2 n3/2       n i=1  pn r=1 zir gr 3       ≤ K2M3 p3+3ϵ′ n √ n . (3.8) The previous inequality follows by (3.7). It follows by (A-3) that sup g∈Cn       1 n3/2 n i=1 ψ′′′ (zT i β∗ n)  pn r=1 zir gr 3       = Op  p 3+ ϵ 2 n √ n  = op(1). (3.9) Also, by (A-2), it follows that if g ∈ Cn, then π  ˆβn + g √ n  π(ˆβn) = exp  log π  ˆβn + g √ n  − log π(ˆβn)  = exp  (∇ log π(β∗∗ n ))T g √ n  , for some intermediate point β∗∗ n = β∗∗ n (g) on the line joining ˆβn and ˆβn + g √ n . Note that by Lemma 1 and (A-3) that supg∈Cn ∥β∗∗ n − β0∥ = op  4  pn n  . It follows by (A-2) that, sup g∈Cn (∇ log π(β∗∗ n ))T g √ n ≤ sup g∈Cn  ∥∇ log π(β∗∗ n )∥ ∥g∥ √ n  = Op  p2+ϵ′ n √ n  = op(1). (3.10) It follows by (3.5), (3.9), (3.10) and the definition of Kn that  Cn 1 Kn Zn(g)π  ˆβn + g √ n  dg = exp(op(1))  Cn Npn  g|0, ˆΣn  dg, (3.11)
  • 7. 132 S. Dasgupta et al. / Journal of Multivariate Analysis 131 (2014) 126–148 where ˆΣn =  −∇2ln(ˆβn) n −1 =  1 n n i=1 ψ′′ (zT i ˆβn)zizT i −1 . Note that if Un ∼ Npn  0, ˆΣn  , then sup 1≤i≤n    zT i  ˆβn + Un √ n     > K′ ⇒ sup 1≤i≤n |zT i Un| > √ n  K′ − sup 1≤i≤n   zT i ˆβn     ⇒ ∥Un∥ >  n pnM2  K′ − sup 1≤i≤n   zT i ˆβn     (by (A-0)). (3.12) By the strict convexity of ψ, Lemma 1 and (A-0), it follows that ENpn (0, ˆΣn)  ∥Un∥2  = trace( ˆΣn) = Op(pn). (3.13) Also by (2.10), it follows that 1 K′ − sup 1≤i≤n |zT i ˆβn| = Op(1). (3.14) Note that Cc n = Gc n ∪  g : ∥g∥ > p 1 2 +ϵ′ n  . A simple application of Markov’s inequality, along with (3.12)–(3.14), yields that  Cc n Npn  g | 0, ˆΣn  dg ≤  Gc n Npn  g | 0, ˆΣn  dg +   g:∥g∥≥p 1 2 +ϵ′ n  Npn  g | 0, ˆΣn  dg ≤ E∥Un∥2 pnM2 n  K′ − sup 1≤i≤n   zT i ˆβn    2 + E∥Un∥2 p1+2ϵ′ n = Op  p2 n n  + Op(p−2ϵ′ n ) = op(1). (3.15) It follows by (3.11) and (3.15) that  Cn 1 Kn Zn(g)π  ˆβn + g √ n  dg P → 1 as n → ∞. Lemma 4.  GnCn π∗ (g|X)dg = op(1). (3.16) Proof. Let Un ∼ Npn  g|µn, Σn  , where µn and Σn are as defined in the statement of Lemma 2. Note that ∥µn∥ ≤      √ nB−1 n n i=1  Xi − ψ′  zT i β0  zi      + OP ( √ pn). (3.17) Since B−1 n = 1 n Σn and ∥Σn∥ = O(1), it follows that E      √ nB−1 n n i=1  Xi − ψ′  zT i β0  zi      2 = O  E      1 √ n n i=1  Xi − ψ′  zT i β0  zi      2   = O  E  1 n n i=1  Xi − ψ′  zT i β0 2 zT i zi  (∵ Xi’s are independent) = O  1 n n i=1 ψ′′  zT i β0  zT i zi 
  • 8. S. Dasgupta et al. / Journal of Multivariate Analysis 131 (2014) 126–148 133 ≤ O  1 n max 1≤i≤n ψ′′  zT i β0  n i=1 zT i zi  = O  1 n n i=1 zT i zi   ∵ By (A-1) and continuity of ψ′′  = O(pn) (∵ By (A-0)) . It follows by (3.17) that ∥µn∥ = Op( √ pn). (3.18) Hence ENpn (µn,Σn)∥Un∥2 = trace(Σn) + ∥µn∥2 = Op(pn). By exactly the same argument as the one leading to Eq. (3.15) in the proof of Lemma 3, it follows that  Cc n Npn (µn, Σn)dg = op(1). The result now follows by using Lemma 2. Lemma 5.  Gn 1 Kn Zn(g)π  ˆβn + g √ n  dg → 1. Proof. Note that by Lemma 4,  GnCn π∗ (g|X)dg =  GnCn 1 Kn Zn(g)π  ˆβn + g √ n  dg  Gn 1 Kn Zn,ˆβn (g)π  ˆβn + g √ n  dg → 0. Hence,  GnCn 1 Kn Zn(g)π  ˆβn + g √ n  dg  Cn 1 Kn Zn(g)π  ˆβn + g √ n  dg +  GnCn 1 Kn Zn(g)π  ˆβn + g √ n  dg → 0. (3.19) Now, by Lemma 3,  Cn 1 Kn Zn(g)π  ˆβn + g √ n  → 1. The result follows by (3.19). We now state and prove the main result of the paper. Theorem 1. Suppose β ∈ Rpn satisfies √ n∥β − ˆβn∥ ≤ p 1 2 + ϵ 6 n for every n. This is equivalent to the assumption that g ∈ Cn. In such a case, under assumptions (A-0)–(A-3), π∗ (g | X) = Npn  g|0, ˆΣn   1 − 1 6n3/2 pn r,s,t=1 n i=1 ψ′′′  zT i ˆβn  gr gsgt zir ziszit + 1 √ n pn v=1 gv  ∇ log π(ˆβn)  v −  1 6n3/2 pn r,s,t=1 n i=1 ψ′′′  zT i ˆβn  gr gsgt zir ziszit   1 √ n pn v=1 gv  ∇ log π(ˆβn)  v  + R(g)  ×  1 − op(1)  , (3.20) where, supg∈Cn R(g) = Op  p 6+ϵ n n  and Npn  g|0, ˆΣn  is a pn-dimensional normal density with mean vector 0 and covariance matrix ˆΣn =  −∇2ln(ˆβn) n −1 evaluated at g. Remark 2. Note that by Lemma 4, the posterior probability that g does not lie in Cn converges to 0.
  • 9. 134 S. Dasgupta et al. / Journal of Multivariate Analysis 131 (2014) 126–148 Proof. Since ∇ln(ˆβn) = 0, by a fourth order Taylor series expansion around ˆβn, we have: ln  ˆβn + g √ n  − ln(ˆβn) = 1 2n gT ∇2 ln(ˆβn)g + 1 6n3/2 pn r,s,t=1 gr gsgt ∂3 ln(β) ∂βr ∂βs∂βt     β=ˆβn + 1 24n2 pn r,s,t,u=1 gr gsgt gu ∂4 ln(β) ∂βr ∂βs∂βt ∂βu     β=β∗ n = A1(g) + A2(g) + A3(g) (say). (3.21) Here β∗ n = β∗ n(g) is an intermediate point on the line joining ˆβn and (ˆβn + g √ n ). Based on exactly the same argument leading up to (3.6) (in the proof of Lemma 3), P  β∗ n(g) ∈ Θn ∀ g ∈ Gn  → 1, (3.22) as n → ∞. Also, π  ˆβn + g √ n  = π(ˆβn) + 1 √ n gT ∇π(ˆβn) + 1 2n gT ∇2 π(β∗∗ n )g = π(ˆβn)  1 + 1 √ n pn v=1 gv  ∇ log π(ˆβn)  v + 1 2n gT ∇2 π(β∗∗ n )g π(ˆβn)  = π(ˆβn)(1 + B1(g) + B2(g)) (say), (3.23) where β∗∗ n = β∗∗ n (g) is an intermediate point on the line joining ˆβn and (ˆβn + g √ n ). Based on exactly the same argument leading up to (3.6) (in the proof of Lemma 3), P  β∗∗ n (g) ∈ Θn ∀ g ∈ Gn  → 1, (3.24) as n → ∞. We now analyze various terms in (3.21) and (3.23). By the continuity of ψ′′′ and the fact that ˆβn ∈ Θn with probability tending to 1, it follows that max 1≤i≤n |ψ′′′ (zT i ˆβn)| = Op(1). (3.25) Hence, for g ∈ Rpn , |A2(g)| =      1 6n3/2 pn r,s,t=1 gr gsgt ∂3 ln(β) ∂βr ∂βs∂βt     β=ˆβn      =      1 6n3/2 pn r,s,t=1 n i=1 ψ′′′  zT i ˆβn  zir ziszit gr gsgt      ≤ 1 6n3/2 max 1≤i≤n   ψ′′′  zT i ˆβn    n i=1 |zT i g|3 . (3.26) In particular, we get that for g ∈ Cn, |A2(g)| ≤ 1 6n3/2 max 1≤i≤n   ψ′′′  zT i ˆβn    n i=1 (∥g∥ ∥zi∥)3 (∵ By Cauchy–Schwarz) = 1 6n3/2 max 1≤i≤n   ψ′′′  zT i ˆβn    ∥g∥3 n i=1 ∥zi∥3 ≤ max 1≤i≤n   ψ′′′  zT i ˆβn     p 1 2 + ϵ 6 n 3 n(M √ pn)3 6n3/2 . (3.27) The last inequality follows by using (A-0). It follows by (3.27) that sup g∈Cn |A2(g)| = Op  p 3+ ϵ 2 n √ n  . (3.28)
  • 10. S. Dasgupta et al. / Journal of Multivariate Analysis 131 (2014) 126–148 135 By the continuity of ψ′′′′ and (3.22), it follows that sup g∈Cn max 1≤i≤n |ψ′′′′ (zT i β∗ n)| = Op(1). (3.29) Hence, for g ∈ Gn, |A3(g)| =      1 24n2 pn r,s,t,u=1 gr gsgt gu ∂4 ln(β) ∂βr ∂βs∂βt ∂βu     β=β∗ n      =      1 24n2 pn r,s,t,u=1 n i=1 ψ′′′′  zT i β∗ n  zir ziszit ziugr gsgt gu      ≤ 1 24n2 max 1≤i≤n  ψ′′′′  zT i β∗ n   n i=1 |zT i g|4 . (3.30) In particular, for g ∈ Cn, we get that |A3(g)| ≤ 1 24n2 max 1≤i≤n  ψ′′′′  zT i β∗ n   n i=1 (∥g∥ ∥zi∥)4 = 1 24n2 max 1≤i≤n  ψ′′′′  zT i β∗ n   ∥g∥4 n i=1 ∥zi∥4 ≤ max 1≤i≤n  ψ′′′′  zT i β∗ n    Cp 1 2 + ϵ 6 n 4 n(M √ pn)4 24n2 . (3.31) The last inequality follows by using (A-0). It follows by (3.31) that sup g∈Cn |A3(g)| = Op  p 4+ 2ϵ 3 n n   . (3.32) Next we analyze the second order remainder term in (3.23). Note that ∥ˆβn − β0∥ = Op(  pn n ) by Lemma 1 and supg∈Cn ∥β∗∗ n (g)− ˆβn∥ = Op  p 1 2 +ϵ′ n√ n  as β∗∗ n (g) is an intermediate point on the line joining ˆβn and (ˆβn + g √ n ). Hence, by the triangle inequality, we get that sup g∈Cn ∥β∗∗ n (g) − β0∥ = Op  p 1 2 +ϵ′ n √ n   . (3.33) By (A-3), we get that p 1 2 +ϵ′ n√ n = o  4  pn n  . By (A-2), it follows that sup g∈Cn max 1≤r,s≤pn       1 π(β) ∂2 π(β) ∂βr ∂βs  β=β∗∗ n (g)      = Op(p4 n). (3.34) Note that π(β∗∗ n ) π(ˆβn) = exp  log π(β∗∗ n ) − log(π(ˆβn))  = exp  ∇ log π(β∗∗∗ n ) T  β∗∗ n − ˆβn  , where β∗∗∗ n = β∗∗∗ n (g) is an intermediate point on the line joining β∗∗ n and ˆβn. Hence, sup g∈Cn ∥β∗∗∗ n − ˆβn∥ ≤ sup g∈Cn ∥β∗∗ n − ˆβn∥ = Op  p 1 2 +ϵ′ n √ n   .
  • 11. 136 S. Dasgupta et al. / Journal of Multivariate Analysis 131 (2014) 126–148 By (A-3), p 1 2 +ϵ′ n√ n = o  4  pn n  . Hence, by Lemma 1 and (A-2), it follows that sup g∈Cn ∥∇ log π(β∗∗∗ n )∥ = Op(p3/2 n ). Hence, sup g∈Cn      π(β∗∗ n ) π(ˆβn)      ≤ sup g∈Cn exp  ∥∇ log π(β∗∗∗ n )∥ ∥β∗∗ n − ˆβn∥  ≤ exp  Op(p3/2 n )Op  p 1 2 +ϵ′ n √ n     = Op(1). (3.35) It follows that |B2(g)| =  1 n gT ∇2 π(β∗∗ n )g     π(ˆβn)    =      π(β∗∗ n ) π(ˆβn)           1 n pn r,s=1  1 π(β) ∂2 π(β) ∂βr ∂βs  β=β∗∗ n gr gs      ≤      π(β∗∗ n ) π(ˆβn)       1 n pn r,s=1       1 π(β) ∂2 π(β) ∂βr ∂βs  β=β∗∗ n      |gr | |gs|  ≤      π(β∗∗ n ) π(ˆβn)       1 n max 1≤r,s≤pn      1 π(β) ∂2 π(β) ∂βr ∂βs     β=β∗∗ n  pn∥g∥2 . (3.36) It follows by (3.34)–(3.36) that sup g∈Cn |B2(g)| = Op  p 9/2+ ϵ 3 n n  . (3.37) Note that by (3.1), (3.21) and (3.23), π∗ (g|X) = N/D, where, N = π(ˆβn) (1 + B1(g) + B2(g)) exp (A1(g) + A2(g) + A3(g)) π(ˆβn)  (2π) pn 2   − ∇2ln(ˆβn) n    1 2  = Npn  g|0, ˆΣn  (1 + B1(g) + B2(g)) exp (A2(g) + A3(g)) = Npn  g|0, ˆΣn  {(1 + B1(g)) (1 + A2(g)) + B2(g)(1 + A2(g) + A3(g)) + (1 + B1(g)) A3(g)} + Npn  g|0, ˆΣn  {(1 + B1(g) + B2(g)) (exp (A2(g) + A3(g)) − (1 + A2(g) + A3(g)))} = Npn  g|0, ˆΣn  (N1(g) + N2(g) + N3(g) + N4(g)) , (say), (3.38) and D =  N(g)dg. (3.39) Now, from (3.28), (3.32) and (3.37), it follows that sup g∈Cn N2(g) = sup g∈Cn [B2(g)(1 + A2(g) + A3(g))] = Op  p 9/2+ ϵ 3 n n  . (3.40)
  • 12. S. Dasgupta et al. / Journal of Multivariate Analysis 131 (2014) 126–148 137 In view of Lemma 1 and (A-2), sup g∈Cn |1 + B1(g)| = 1 + sup g∈Cn 1 √ n pn v=1 |gv(∇ log π(ˆβn))v| ≤ 1 + sup g∈Cn 1 √ n ∥g∥ ∥∇ log π(ˆβn)∥ = 1 + Op  p2+ϵ′ n √ n  = 1 + op(1). (3.41) By (3.32), it follows that sup g∈Cn N3(g) = sup g∈Cn [(1 + B1(g)) A3(g)] = Op  p 4+ 2ϵ 3 n n   . (3.42) By (3.28) and (3.32), sup g∈Cn |A2(g) + A3(g)| ≤ sup g∈Cn |A2(g)| + sup g∈Cn |A3(g)| = Op  p 3+ ϵ 2 n √ n  . It follows by (A-3) that for large enough n, sup g∈Cn (exp (A2(g) + A3(g)) − (1 + A2(g) + A3(g))) ≤ sup g∈Cn (A2(g) + A3(g))2 = Op  p6+ϵ n n  . (3.43) It follows from (3.37), (3.41) and (3.43) that sup g∈Cn N4(g) = sup g∈Cn [(1 + B1(g) + B2(g)) (exp (A2(g) + A3(g)) − (1 + A2(g) + A3(g)))] = Op  p6+ϵ n n  . (3.44) Let R(g) := N2(g) + N3(g) + N4(g). (3.45) It follows from (3.40), (3.42) and (3.44) that sup g∈Cn (N2(g) + N3(g) + N4(g)) = Op  p6+ϵ n n  . (3.46) By (3.39) and Lemma 5, D =  Gn 1 Kn Zn(g)π  ˆβn + g √ n  dg = 1 + op(1). Thus, π∗ (g|X) = N/D = Npn  g|0, ˆΣn   1 + 1 6n3/2 pn r,s,t=1 gr gsgt ∂3 ln(β) ∂βr ∂βs∂βt     β=ˆβn  ×  1 + 1 √ n pn v=1 gv  ∇ log π(ˆβn)  v  + R(g)   1 − op(1)  = Npn  g|0, ˆΣn   1 + 1 6n3/2 pn r,s,t=1 gr gsgt ∂3 ln(β) ∂βr ∂βs∂βt     β=ˆβn + 1 √ n pn v=1 gv  ∇ log π(ˆβn)  v +  1 6n3/2 pn r,s,t=1 gr gsgt ∂3 ln(β) ∂βr ∂βs∂βt     β=ˆβn   1 √ n pn v=1 gv  ∇ log π(ˆβn)  v  + R(g)   1 − op(1) 
  • 13. 138 S. Dasgupta et al. / Journal of Multivariate Analysis 131 (2014) 126–148 = Npn  g|0, ˆΣn   1 − 1 6n3/2 pn r,s,t=1 n i=1 ψ′′′  zT i ˆβn  gr gsgt zir ziszit + 1 √ n pn v=1 gv  ∇ log π(ˆβn)  v −  1 6n3/2 pn r,s,t=1 n i=1 ψ′′′  zT i ˆβn  gr gsgt zir ziszit   1 √ n pn v=1 gv  ∇ log π(ˆβn)  v  + R(g)   1 − op(1)  , where supg∈Cn R(g) = Op  p 6+ϵ n n  . Remark 3. Note that we have extended the first order results in [6] to a third order correct posterior expansion by requiring stronger growth restrictions on pn. A natural question that arises is whether one can obtain a second order correct expansion with weaker restrictions on the growth on pn. However, we have not considered second order correct expansions for two reasons. Firstly, the derivation of a moment matching prior, which is the application that we consider in Section 4, requires a third order correct asymptotic expansion. Secondly, the proof of Lemma 3 in the paper uses the assumption that p6+ϵ n /n → 0 (see (3.9)). We would still need Lemma 3 to establish a second order correct posterior expansion. Therefore, establishing a second order correct expansion would still require the same growth restriction (assuming the other conditions in (A-0), (A-1), (A-2) and (A-3) are left unchanged). Remark 4. Bontemps [3] establishes posterior consistency under Gaussianity, by relaxing the restrictions in [6,7] in several ways. However, the arguments in the proof of Bontemps’ results (in particular Theorems 1 and 2 in Bontemps’ paper) rely heavily on Gaussianity. We have made efforts to adapt them for other models, but have not been successful so far. 3.1. Posterior expansion for the uniformly bounded case We can consider the case when pn is uniformly bounded, and obtain an expansion of the posterior density parallel to (3.20). The fact that pn is uniformly bounded allows a slightly finer analysis of the terms in the expansion, which is useful when deriving moment matching priors in Section 4. Firstly, we note that Lemmas 3–5 can be established by the same set of arguments, by using, for example, Cn =  g : ∥g∥ < n 1 6+ϵ  instead of Cn =  g : ∥g∥ < p 1 2 + ϵ 6 n  . Henceforth, in this subsection, it will be assumed that Cn =  g : ∥g∥ < n 1 6+ϵ  . It can be easily seen by repeating appropriate steps in the proof of Theorem 1 that in this case |A2(g)| = Op  ∥g∥3 √ n  , (3.47) for every g ∈ Rpn , and |A3(g)| = Op  ∥g∥4 n  . (3.48) for every g ∈ Gn. To clarify notation, (3.47) means that |A2(g)| is ∥g∥3 times a quantity which is independent of g, and is Op  1√ n  . Since Θn is a compact set, it follows by (3.24) and the twice continuous differentiability of π(·) that each entry of ∇π(ˆβn) and ∇2 π(β∗∗ n ) is bounded above in probability. Also, by (A-2), it follows that π(ˆβ) is bounded below in probability. Combining (3.23) with the above facts gives us |B1(g)| = Op  ∥g∥ √ n  , (3.49) for every g ∈ Rpn and |B2(g)| = Op  ∥g∥2 n  (3.50) for every g ∈ Gn. It follows that sup g∈Cn |B1(g)| = Op  1 n 4+ϵ 12+2ϵ  sup g∈Cn |B2(g)| = Op  1 n 4+ϵ 6+ϵ 
  • 14. S. Dasgupta et al. / Journal of Multivariate Analysis 131 (2014) 126–148 139 sup g∈Cn |A2(g)| = Op  1 n ϵ 12+2ϵ  sup g∈Cn |A3(g)| = Op  1 n 2+ϵ 6+ϵ  . (3.51) It follows by (3.38) (along with the arguments following it, adjusted for the fact that pn is uniformly bounded and for the new choice of Cn) and (3.51) that for every g ∈ Cn, π∗ (g|X) = Npn  g|0, ˆΣn   1 − 1 6n3/2 pn r,s,t=1 n i=1 ψ′′′  zT i ˆβn  gr gsgt zir ziszit + 1 √ n pn v=1 gv  ∇ log π(ˆβn)  v −  1 6n3/2 pn r,s,t=1 n i=1 ψ′′′  zT i ˆβn  gr gsgt zir ziszit   1 √ n pn v=1 gv  ∇ log π(ˆβn)  v  + R(g)  ×  1 − op(1)  , (3.52) and |R(g)| = Op  ∥g∥6 n  . (3.53) Note that (3.52) is identical to (3.20). However, the order of the remainder term is different for the two settings. In the setting for (3.20), we have supg∈Cn R(g) = Op  p 6+ϵ n n  . However, in the current setting, sup g∈Cn |R(g)| = Op  1 n ϵ 6+ϵ  . (3.54) Note that even in this case, the posterior probability of the set Cn converges to 1 as n → ∞. We conclude this section by noting that if g is fixed (or ∥g∥ is uniformly bounded as n → ∞) then the order of the leading terms B1(g) and A2(g) is 1√ n (as can be seen from (3.47) and (3.49)), while the order of the remainder term R(g) is 1 n (as can be seen from (3.53)). However, if one is looking for bounds uniformly over g ∈ Cn then the orders can be obtained from (3.51) and (3.54). 4. Moment matching prior A moment matching prior (introduced by Ghosh and Liu [12]) is an objective prior for which the posterior mean matches with the maximum likelihood estimator up to a high order of approximation. Ghosh and Liu [12] provide several examples where they derive a moment matching prior using third order correct posterior expansions. In particular, they consider the case with i.i.d. observations from a multi-parameter natural exponential family (with fixed p), and prove that the moment matching prior in this situation can be uniquely determined, and in fact corresponds to Jeffreys’ general rule prior. However, they did not consider the more complicated GLM setting. In this section we use the expansion in Theorem 1 to obtain moment matching priors in the context of GLM with canonical link function (both when pn is uniformly bounded, and when pn is unbounded). We will in fact show that the moment matching prior can be uniquely identified in this situation, and corresponds to the Jeffreys’ general rule prior. In other words, Jeffreys’ general rule prior is the only prior which satisfies the moment matching condition in the current GLM setup. We may add here that conditions for the propriety and existence of moments for Jeffreys’ prior in the GLM setup (as well as the resulting posterior) have been addressed in [15]. The analysis in the current setup will be based on examining the posterior expectation of the quantity β − ˆβn. Note that the remainder term in most posterior expansions (including the ones used in [12] and in this paper) is not uniformly bounded in the variable used in the expansion (for example g in our setup) if we do not restrict to an appropriate set (such as Cn in our setup). In fact, to show that the expected value (with respect to the posterior distribution) of the remainder term is appropriately small, one has to restrict the computation of the expected value over a set such as Cn. Ghosh and Liu [12] take a somewhat heuristic approach in their derivations, and do not take this issue into account. We undertake a more rigorous approach to address this issue as follows. 
The computation of the posterior expectation for deriving the moment matching prior will be restricted to the region Cn on which the appropriate bounds for the remainder term in the expansion are valid. 4.1. The uniformly bounded case We first consider the case when the number of regressors pn is uniformly bounded. Note that the expansion in (3.52) holds in this case with Cn = {g : ∥g∥ < n 1 6+ϵ }. The moment matching criterion of Ghosh and Liu [12] (with the modification discussed above) dictates that the prior π(·) be chosen such that the posterior expectation Eπ(·|X)  (β − ˆβn)1{ √ n(β−ˆβn)∈Cn} 
  • 15. 140 S. Dasgupta et al. / Journal of Multivariate Analysis 131 (2014) 126–148 converges to zero faster than 1 n . It follows by the expansion in (3.52) that Eπ(·|X)  (β − ˆβn)1{ √ n(β−ˆβn)∈Cn}  = 1 √ n  Cn gπ∗ (g | X) dg = 1 − op(1) √ n  Cn gNpn  g|0, ˆΣn   1 + B1(g) + A2(g) + B1(g)A2(g) + R(g)  dg. (4.1) By (A-0) and (2.10), it follows that the eigenvalues of ˆΣn are uniformly bounded above (with probability tending to 1). Since pn is uniformly bounded, for any subset S of Rpn , we get  S ∥g∥k Npn  g|0, ˆΣn  dg ≤  Rpn ∥g∥k Npn  g|0, ˆΣn  dg = Op(pk/2 n ) = Op(1) (4.2) for every fixed k ∈ N. It follows by (3.47), (3.49) and (3.53) that  Cn ∥g∥Npn  g|0, ˆΣn   B1(g)A2(g) + R(g)  dg = Op  1 n  . (4.3) Recall that Cc n refers to Rpn Cn. A simple application of Markov’s inequality, along with (3.12), (3.14), (4.2) and the uniform boundedness of pn yields that  Cc n ∥g∥k Npn  g | 0, ˆΣn  dg ≤  Gc n ∥g∥k Npn  g | 0, ˆΣn  dg +   g:∥g∥>n 1 6+ϵ  ∥g∥k Npn  g | 0, ˆΣn  dg ≤  Op  1 n  + 1 n 2 6+ϵ   Rpn ∥g∥k+2 Npn  g | 0, ˆΣn  dg = Op  1 n 2 6+ϵ  (4.4) for every fixed k ∈ N. It follows by (3.47) and (3.49) that  Cc n ∥g∥ |A2(g)|Npn  g|0, ˆΣn  dg = Op  1 n 1 2 + 2 6+ϵ  , (4.5) and  Cc n ∥g∥ |B1(g)|Npn  g|0, ˆΣn  dg = Op  1 n 1 2 + 2 6+ϵ  . (4.6) Note that  Cn gNpn  g|0, ˆΣn  dg +  Cc n gNpn  g|0, ˆΣn  dg =  Rpn gNpn  g|0, ˆΣn  dg = 0. (4.7) Another application of Markov’s inequality along the lines of (4.4) (but by increasing the moment by 6 instead of 2) gives  Cc n ∥g∥Npn  g | 0, ˆΣn  dg ≤  Op  1 n3  + 1 n 6 6+ϵ   Rpn ∥g∥7 Npn  g | 0, ˆΣn  dg = Op  1 n 6 6+ϵ  . (4.8) It follows from (4.7) and (4.8) that  Cn gNpn  g|0, ˆΣn  dg = Op  1 n 6 6+ϵ  . (4.9) Here, when we say that a vector x is Op(cn), we mean that ∥x∥ is Op(cn). By (4.1), (4.3), (4.5), (4.6) and (4.9) we get Eπ(·|X)  (β − ˆβn)1{ √ n(β−ˆβn)∈Cn}  = 1 − op(1) √ n  Cn gNpn  g|0, ˆΣn   1 + B1(g) + A2(g)  dg + Op  1 n 3 2  = 1 − op(1) √ n  Rpn gNpn  g|0, ˆΣn   B1(g) + A2(g)  dg + Op  1 n 1+ 2 6+ϵ  . (4.10)
  • 16. S. Dasgupta et al. / Journal of Multivariate Analysis 131 (2014) 126–148 141 We now simplify the integral in (4.10). Note that  Rpn gB1(g)Npn  g|0, ˆΣn  dg = 1 √ n  Rpn ggT Npn  g|0, ˆΣn  dg  ∇ log π(ˆβn) = 1 √ n ˆΣn∇ log π(ˆβn). (4.11) Note that by Isserlis’ formula for joint moments of a multivariate normal distribution, we get that for any 1 ≤ j, r, s, t ≤ pn,  Rpn gjgr gsgt Npn  g|0, ˆΣn  dg = ˆΣn,jr ˆΣn,st + ˆΣn,js ˆΣn,rt + ˆΣn,jt ˆΣn,rs. (4.12) Let In(β) = 1 n n i=1 ψ′′  zT i β  zizT i , the information matrix evaluated at β. It follows by the definition of A2(g) and (4.12) that  Rpn gjA2(g)Npn  g|0, ˆΣn  dg = − 1 6 √ n pn r,s,t=1  ˆΣn,jr ˆΣn,st + ˆΣn,js ˆΣn,rt + ˆΣn,jt ˆΣn,rs  An,r,s,t , (4.13) where An,r,s,t = 1 n n i=1 ψ′′′  zT i ˆβn  zir ziszit = ∂ ∂βr (In(β))st     β=ˆβn . (4.14) Note that ∂ ∂βr log |In(β)| = pn s,t=1 (In(β)−1 )st ∂ ∂βr (In(β))st . (4.15) By the symmetry in r, s, t on the right hand side of (4.13), it follows that  Rpn gjA2(g)Npn  g|0, ˆΣn  dg = − 1 2 √ n pn r,s,t=1 ˆΣn,jr ˆΣn,st An,r,s,t . (4.16) Combining (4.14)–(4.16) along with the fact In(ˆβn) = ˆΣ−1 n , we get that  Rpn gA2(g)Npn  g|0, ˆΣn  dg = − 1 2 √ n ˆΣn∇ log |In(ˆβn)|. (4.17) It follows by (4.10), (4.11) and (4.17) that to ensure Eπ(·|X)  (β − ˆβn)1{ √ n(β−ˆβn)∈Cn}  = Op  1 n 1+ 2 6+ϵ  , the prior density π(·) should satisfy ˆΣn∇ log π(ˆβn) − 1 2 ˆΣn∇ log |In(ˆβn)| = 0. (4.18) Note that the maximum likelihood estimator ˆβn satisfies ∥ˆβn −β0∥ P → 0 as n → ∞. To ensure that (4.18) holds irrespective of the true β0, we require ˆΣn∇ log π(β) − 1 2 ˆΣn∇ log |In(β)| = 0 (4.19) for every β. Since ˆΣn is a positive definite matrix with probability tending to 1, it follows that (4.19) holds if and only if π(β) ∝ |In(β)| 1 2 . To ensure that the assumptions in (A-2) hold, we choose π(β) = Cn|In(β)| 1 2 , (4.20) where Cn is chosen such that  Θn π(β)dβ = 1. Since ψ is infinitely differentiable, and Θn is a compact set, it follows by (A-0) that the eigenvalues of In(β) are uniformly bounded (above and below) over β ∈ Θn and n ∈ N. Since pn is uniformly bounded, it follows that π(β) is uniformly bounded (above and below) over β ∈ Θn and n ∈ N. Since ψ is infinitely differentiable, it also follows in particular that π(·) is twice continuously differentiable. Since Θn is a compact set, it follows that all the first and second order derivatives of π(·) are uniformly bounded above over Θn and n. All these facts combined together imply that π(·) satisfies the assumptions in (A-2).
  • 17. 142 S. Dasgupta et al. / Journal of Multivariate Analysis 131 (2014) 126–148 4.2. The unbounded case We now consider the case when pn → ∞ and p6+ϵ n /n → 0 as n → ∞. For the moment matching prior derivation in this case, (a) we will assume that there exists an α > 0 such that pn/nα → ∞ as n → ∞, and (b) we will replace the assumption that ∥zi∥ ≤ M √ pn for every 1 ≤ i ≤ n in (A-0), by the stronger assumption |zir | ≤ M for every 1 ≤ i ≤ n and 1 ≤ r ≤ pn. Note that the M does not depend on n. Recall that the posterior expansion (3.20) in this case holds with Cn = {g : ∥g∥ < p 1/2+ϵ/6 n }. The basic technique for deriving the moment matching prior remains the same as in the uniformly bounded case. However, the order of various terms used in the analysis is different as compared to the uniformly bounded case. Hence, this case is more complex, and needs a more careful consideration of all the relevant terms. Note again that by (A-0) and (2.10), tr( ˆΣn) = Op(pn). By the analysis leading to (3.31) it follows that  Cn ∥g∥ |A3(g)|Npn  g|0, ˆΣn  dg = Op  1 n2  n i=1  Cn ∥g∥  zT i g  4 Npn  g|0, ˆΣn  dg = Op  1 n2  n i=1  Cn  ∥g∥5 +  zT i g  5  Npn  g|0, ˆΣn  dg. (4.21) The previous step follows from an application of Minkowski’s inequality, in particular, |ab| ≤ aq q + b˜q ˜q for every a, b ∈ R, with q = 5 and ˜q = 5 4 . Note that by (A-0), zT i ˆΣnzi = Op(pn). It follows by (4.21) that  Cn ∥g∥ |A3(g)|Npn  g|0, ˆΣn  dg = Op  p 5/2 n n  . (4.22) By very similar arguments which use the analysis leading up to (3.31), (3.26), (3.36) and (3.41) respectively, it can be established that  Cn ∥g∥(A3(g))2 Npn  g|0, ˆΣn  dg = Op  p 9/2 n n2  (4.23)  Cn ∥g∥(A2(g))2 Npn  g|0, ˆΣn  dg = Op  p 7/2 n n  (4.24)  Cn ∥g∥ |B2(g)|Npn  g|0, ˆΣn  dg = Op  p5 n n  (4.25)  Cn ∥g∥ |A2(g)B1(g)|Npn  g|0, ˆΣn  dg = Op  p 7/2 n n  . (4.26) It follows by the definition of R(g) in (3.45), and by (4.22)–(4.26) that  Cn ∥g∥(R(g) + A2(g)B1(g))Npn  g|0, ˆΣn  dg = Op  p5 n n  . (4.27) A simple application of Markov’s inequality along with (3.12) and (3.14) implies that  Cc n ∥g∥ |B1(g)|Npn  g|0, ˆΣn  dg = 1 √ n  Gc n ∥g∥      ∇ log π(ˆβn) T g     Npn  g|0, ˆΣn  dg + 1 √ n  {g:∥g∥>p 1/2+ϵ/6 n } ∥g∥      ∇ log π(ˆβn) T g     Npn  g|0, ˆΣn  dg ≤ 1 √ n      1 n  K′ − sup 1≤i≤n   zT i ˆβn    2 + 1 p 1+ϵ/3 n       Rpn ∥g∥3      ∇ log π(ˆβn) T g     Npn  g|0, ˆΣn  dg = Op  1 √ np 1+ϵ/3 n   Rpn  ∥g∥6 +      ∇ log π(ˆβn) T g     2  Npn  g|0, ˆΣn  dg. (4.28)
Since $p_n^{6+\epsilon}/n \to 0$, it follows that $\frac{p_n^5}{n^{3/2}} = o_p\!\left(\frac{p_n^{2-\epsilon/3}}{n}\right)$. Using this fact along with (3.20), (4.7), (4.27), (4.29), (4.30) and (4.31), we get that
\[
\begin{aligned}
E^{\pi(\cdot\mid X)}\left[(\beta-\hat\beta_n)\mathbf{1}_{\{\sqrt{n}(\beta-\hat\beta_n)\in C_n\}}\right]
&= \frac{1 - o_p(1)}{\sqrt{n}}\int_{C_n} g\,N_{p_n}(g \mid 0,\hat\Sigma_n)\big(1 + B_1(g) + A_2(g)\big)\,dg + O_p\!\left(\frac{p_n^5}{n^{3/2}}\right) \\
&= \frac{1 - o_p(1)}{\sqrt{n}}\int_{\mathbb{R}^{p_n}} g\,N_{p_n}(g \mid 0,\hat\Sigma_n)\big(B_1(g) + A_2(g)\big)\,dg + O_p\!\left(\frac{p_n^{2-\epsilon/3}}{n}\right). \tag{4.32}
\end{aligned}
\]
Using exactly the same arguments following (4.10) in the uniformly bounded case, it follows that to obtain
\[
E^{\pi(\cdot\mid X)}\left[(\beta-\hat\beta_n)\mathbf{1}_{\{\sqrt{n}(\beta-\hat\beta_n)\in C_n\}}\right] = O_p\!\left(\frac{p_n^{2-\epsilon/3}}{n}\right)
\]
irrespective of the true value $\beta_0$, we must have $\pi(\beta) \propto |I_n(\beta)|^{1/2}$. Hence, the moment matching prior (up to order $p_n^{2-\epsilon/3}/n$) is given by
\[
\pi(\beta) = C_n|I_n(\beta)|^{1/2}, \tag{4.33}
\]
where $C_n$ is chosen such that $\int_{\Theta_n}\pi(\beta)\,d\beta = 1$ (note that such a choice of $C_n$ is possible because $\Theta_n$ is a compact set and $\pi(\cdot)$ is a continuous function). Since $\psi$ is strictly convex, $\psi''$ is a strictly positive continuous function, and it follows by the definition of $\Theta_n$ that $\psi''(z_i^T\beta)$ is uniformly bounded (away from both zero and infinity) over $\beta \in \Theta_n$ and $n \in \mathbb{N}$. Hence, by (A-0), all the eigenvalues of $I_n(\beta)$ are uniformly bounded (away from both zero and infinity) over $\beta \in \Theta_n$ and $n \in \mathbb{N}$. It follows that $|I_n(\beta)|^{1/p_n}$ is uniformly bounded (away from both zero and infinity) over $\beta \in \Theta_n$ and $n \in \mathbb{N}$, which immediately implies that $C_n^{1/p_n}$ is uniformly bounded (away from both zero and infinity) over $n \in \mathbb{N}$. It follows by (4.33) that there exists $\eta_0 > 0$ (not depending on $n$) such that $\pi(\beta) > \eta_0^{p_n}$ for every $\beta \in \Theta_n$. Note that the dependence of $\pi$ on $n$ has been suppressed for simplicity of exposition.
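For concreteness, the following is a minimal sketch of how the moment matching prior (4.33) can be evaluated, up to the normalizing constant $C_n$ (which is omitted), for a logistic model with canonical link, where $\psi(x) = \log(1+e^x)$ and $\psi''(x) = e^x/(1+e^x)^2$. The simulated covariates, dimensions and seed are hypothetical illustrative choices, not taken from the paper.

import numpy as np

def info_matrix(beta, Z):
    # I_n(beta) = (1/n) sum_i psi''(z_i^T beta) z_i z_i^T with psi(x) = log(1 + e^x)
    s = 1.0 / (1.0 + np.exp(-(Z @ beta)))        # sigmoid; psi''(x) = s(x)(1 - s(x))
    return (Z * (s * (1.0 - s))[:, None]).T @ Z / Z.shape[0]

def log_prior_unnormalized(beta, Z):
    # log pi(beta) = (1/2) log |I_n(beta)| + const; C_n over Theta_n is omitted here
    return 0.5 * np.linalg.slogdet(info_matrix(beta, Z))[1]

rng = np.random.default_rng(0)
Z = rng.uniform(-1.0, 1.0, (500, 5))             # bounded covariates, |z_ir| <= M = 1
print(log_prior_unnormalized(np.zeros(5), Z))

Working with the log determinant (via slogdet) rather than the determinant itself avoids the numerical underflow that $|I_n(\beta)|^{1/2}$ would suffer in higher dimensions.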
We now verify that the prior density $\pi(\cdot)$ in (4.33) satisfies assumptions (2.7) and (2.8). Since $\psi$ is infinitely differentiable, it follows in particular that $\pi(\cdot)$ is twice continuously differentiable. Note that
\[
\frac{\partial}{\partial\beta_r}\log\pi(\beta) = \frac{1}{2}\frac{\partial}{\partial\beta_r}\log|I_n(\beta)|
= \frac{1}{2}\sum_{s,t=1}^{p_n}\big(I_n(\beta)^{-1}\big)_{st}\,\frac{\partial}{\partial\beta_r}(I_n(\beta))_{st}. \tag{4.34}
\]
Let $K_2 = \sup_{x\in[-K',K']}|\psi'''(x)|$ and recall that
\[
\frac{\partial}{\partial\beta_r}(I_n(\beta))_{st} = \frac{1}{n}\sum_{i=1}^{n}\psi'''(z_i^T\beta)\,z_{ir}z_{is}z_{it}.
\]
It follows by (A-0), (4.34) and $|z_{ir}| \le M$ for every $1 \le i \le n$ and $1 \le r \le p_n$ (see the first paragraph of this subsection) that
\[
\frac{2K_2M}{n}A_n + \frac{\partial}{\partial\beta_r}I_n(\beta)
\quad\text{and}\quad
\frac{2K_2M}{n}A_n - \frac{\partial}{\partial\beta_r}I_n(\beta)
\quad\text{are both positive definite} \tag{4.35}
\]
for every $1 \le r \le p_n$. It follows by (4.34) and (4.35) that
\[
\left|\frac{\partial}{\partial\beta_r}\log\pi(\beta)\right|
= \frac{1}{2}\left|\operatorname{tr}\left(I_n(\beta)^{-1}\frac{\partial}{\partial\beta_r}I_n(\beta)\right)\right|
\le \operatorname{tr}\left(I_n(\beta)^{-1}\frac{K_2M}{n}A_n\right)
\le \frac{p_n}{\lambda_{\min}(I_n(\beta))}\,\lambda_{\max}\!\left(\frac{K_2M}{n}A_n\right). \tag{4.36}
\]
Since the eigenvalues of $I_n(\beta)$ are uniformly bounded below over $\beta \in \Theta_n$ and $n \in \mathbb{N}$, it follows by (A-0) and (4.36) that there exists $M_1$ (independent of $\beta$ and $n$) satisfying
\[
\|\nabla\log\pi(\beta)\| < M_1 p_n^{3/2} \tag{4.37}
\]
for every $\beta \in \Theta_n$ and $n \in \mathbb{N}$. Hence (2.7) is satisfied.

Let $1 \le j,j' \le p_n$ be arbitrarily chosen. Then
\[
\frac{1}{\pi(\beta)}\frac{\partial^2\pi(\beta)}{\partial\beta_j\partial\beta_{j'}}
= \frac{\partial^2}{\partial\beta_j\partial\beta_{j'}}\log\pi(\beta)
+ \left(\frac{\partial}{\partial\beta_j}\log\pi(\beta)\right)\left(\frac{\partial}{\partial\beta_{j'}}\log\pi(\beta)\right). \tag{4.38}
\]
It follows by (4.34) that
\[
\begin{aligned}
\left|\frac{\partial^2}{\partial\beta_j\partial\beta_{j'}}\log\pi(\beta)\right|
&= \frac{1}{2}\left|\operatorname{tr}\left(\left(\frac{\partial}{\partial\beta_j}I_n(\beta)^{-1}\right)\frac{\partial}{\partial\beta_{j'}}I_n(\beta)\right)
+ \operatorname{tr}\left(I_n(\beta)^{-1}\frac{\partial^2}{\partial\beta_j\partial\beta_{j'}}I_n(\beta)\right)\right| \\
&= \frac{1}{2}\left|-\operatorname{tr}\left(I_n(\beta)^{-1}\left(\frac{\partial}{\partial\beta_j}I_n(\beta)\right)I_n(\beta)^{-1}\frac{\partial}{\partial\beta_{j'}}I_n(\beta)\right)
+ \operatorname{tr}\left(I_n(\beta)^{-1}\frac{\partial^2}{\partial\beta_j\partial\beta_{j'}}I_n(\beta)\right)\right|. \tag{4.39}
\end{aligned}
\]
Note that
\[
\frac{\partial^2}{\partial\beta_j\partial\beta_{j'}}(I_n(\beta))_{st} = \frac{1}{n}\sum_{i=1}^{n}\psi''''(z_i^T\beta)\,z_{ij}z_{ij'}z_{is}z_{it}. \tag{4.40}
\]
Let $K_3 = \sup_{x\in[-K',K']}|\psi''''(x)|$. It follows by (A-0), (4.40) and $|z_{ir}| \le M$ for every $1 \le i \le n$ and $1 \le r \le p_n$ (see the first paragraph of this subsection) that
\[
\frac{2K_3M^2}{n}A_n + \frac{\partial^2}{\partial\beta_j\partial\beta_{j'}}I_n(\beta)
\quad\text{and}\quad
\frac{2K_3M^2}{n}A_n - \frac{\partial^2}{\partial\beta_j\partial\beta_{j'}}I_n(\beta)
\quad\text{are both positive definite} \tag{4.41}
\]
for every $1 \le j,j' \le p_n$. It follows by (4.35), (4.39) and (4.41) that
\[
\left|\frac{\partial^2}{\partial\beta_j\partial\beta_{j'}}\log\pi(\beta)\right|
\le 2K_2^2M^2\,\operatorname{tr}\left(I_n(\beta)^{-1}\frac{A_n}{n}\,I_n(\beta)^{-1}\frac{A_n}{n}\right)
+ K_3M^2\,\operatorname{tr}\left(I_n(\beta)^{-1}\frac{A_n}{n}\right). \tag{4.42}
\]
Since the eigenvalues of $I_n(\beta)$ are uniformly bounded below over $\beta \in \Theta_n$ and $n \in \mathbb{N}$, it follows by (A-0), (4.36), (4.38) and (4.42) that there exists $M_2$ (independent of $\beta$ and $n$) satisfying
\[
\max_{1\le j,j'\le p_n}\left|\frac{\partial^2}{\partial\beta_j\partial\beta_{j'}}\log\pi(\beta)\right| \le M_2 p_n^2.
\]
Hence (2.8) is satisfied.
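The identity (4.34) is Jacobi's formula for the derivative of a log determinant, and it is easy to check numerically. The sketch below (again for a hypothetical logistic model, where $\psi'''(x) = \sigma(x)(1-\sigma(x))(1-2\sigma(x))$ with $\sigma$ the sigmoid; the data and dimensions are illustrative) compares the trace expression with central differences of $\frac{1}{2}\log|I_n(\beta)|$.

import numpy as np

def info_matrix(beta, Z):
    # I_n(beta) = (1/n) sum_i psi''(z_i^T beta) z_i z_i^T; logistic psi'' = s(1-s)
    s = 1.0 / (1.0 + np.exp(-(Z @ beta)))
    return (Z * (s * (1.0 - s))[:, None]).T @ Z / Z.shape[0]

def grad_analytic(beta, Z):
    # (4.34): (1/2) tr(I_n^{-1} dI_n/dbeta_r), with
    # d(I_n)_{st}/dbeta_r = (1/n) sum_i psi'''(z_i^T beta) z_ir z_is z_it
    n, p = Z.shape
    s = 1.0 / (1.0 + np.exp(-(Z @ beta)))
    w3 = s * (1.0 - s) * (1.0 - 2.0 * s)          # psi''' for the logistic model
    Iinv = np.linalg.inv(info_matrix(beta, Z))
    return np.array([0.5 * np.trace(Iinv @ ((Z * (w3 * Z[:, r])[:, None]).T @ Z / n))
                     for r in range(p)])

def grad_numeric(beta, Z, h=1e-6):
    # central differences on (1/2) log |I_n(beta)|
    f = lambda b: 0.5 * np.linalg.slogdet(info_matrix(b, Z))[1]
    p = beta.size
    return np.array([(f(beta + h * np.eye(p)[r]) - f(beta - h * np.eye(p)[r])) / (2.0 * h)
                     for r in range(p)])

rng = np.random.default_rng(3)
Z = rng.uniform(-1.0, 1.0, (300, 4))
beta = rng.normal(0.0, 0.3, 4)
print(np.allclose(grad_analytic(beta, Z), grad_numeric(beta, Z), atol=1e-5))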
Acknowledgments

We would like to thank Prof. Subhashis Ghosal for his help with the paper. Thanks are also due to the Associate Editor and a referee for their useful comments.

Appendix

The multivariate normal distribution satisfies the assumptions in (2.7) and (2.8)

Suppose we put a normal prior on $\beta$, i.e., $\beta \sim N_{p_n}(\mu, A)$. We assume that $\|\mu\| = O(\sqrt{p_n})$ and $\|A^{-1}\| = O(\sqrt{p_n})$. Note that $\nabla\log\pi(\beta) = -A^{-1}(\beta-\mu)$. Hence,
\[
\|\nabla\log\pi(\beta)\| \le \|A^{-1}\|\,\|\beta-\mu\|. \tag{A.1}
\]
Also,
\[
\begin{aligned}
\|\beta-\mu\| &\le \|\beta-\beta_0\| + \|\beta_0\| + \|\mu\| \\
&\le \|\beta-\beta_0\| + \sqrt{\frac{1}{C_1}\,\beta_0^T\left(\frac{1}{n}\sum_{i=1}^{n}z_iz_i^T\right)\beta_0} + \|\mu\| \quad\text{(by assumption (A-0))} \\
&= \|\beta-\beta_0\| + \sqrt{\frac{1}{C_1}\cdot\frac{1}{n}\sum_{i=1}^{n}(z_i^T\beta_0)^2} + \|\mu\| \\
&\le \|\beta-\beta_0\| + \frac{K}{\sqrt{C_1}} + \|\mu\| \quad\text{(by assumption (A-1))} \\
&= \|\beta-\beta_0\| + O(\sqrt{p_n}). \tag{A.2}
\end{aligned}
\]
It follows from (A.1), (A.2) and the assumptions on $\mu$ and $A$ that
\[
\sup_{\|\beta-\beta_0\|\le C_n}\|\nabla\log\pi(\beta)\| = O(p_n),
\quad\text{where } C_n = \sqrt[4]{\frac{p_n}{n}}.
\]
Note that for $1 \le j,j' \le p_n$,
\[
\left|\frac{1}{\pi(\beta)}\frac{\partial^2\pi(\beta)}{\partial\beta_j\partial\beta_{j'}}\right|
= \left|-(A^{-1})_{jj'} + \left(\sum_{k=1}^{p_n}(A^{-1})_{jk}(\beta_k-\mu_k)\right)\left(\sum_{k=1}^{p_n}(A^{-1})_{j'k}(\beta_k-\mu_k)\right)\right|
\le \|A^{-1}\| + \left(\|A^{-1}\|\,\|\beta-\mu\|\right)^2. \tag{A.3}
\]
It follows from (A.2) and (A.3) that
\[
\sup_{\|\beta-\beta_0\|\le C_n}\max_{1\le j,j'\le p_n}\left|\frac{1}{\pi(\beta)}\frac{\partial^2\pi(\beta)}{\partial\beta_j\partial\beta_{j'}}\right| = O(p_n^2),
\quad\text{where } C_n = \sqrt[4]{\frac{p_n}{n}}.
\]
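Both normal-prior quantities above are easy to evaluate directly. The following sketch computes $\nabla\log\pi(\beta) = -A^{-1}(\beta-\mu)$ and the entrywise curvature ratio appearing in (A.3) for toy choices of $\mu$, $A$ and $\beta$; the specific scalings (e.g., $\|A^{-1}\| = \sqrt{p}$, and a point taken near $\mu$) are illustrative stand-ins for the assumptions $\|\mu\| = O(\sqrt{p_n})$ and $\|A^{-1}\| = O(\sqrt{p_n})$, not quantities from the paper.

import numpy as np

rng = np.random.default_rng(4)
for p in (5, 50, 500):
    mu = rng.normal(0.0, 1.0, p)               # ||mu|| = O(sqrt(p))
    Ainv = np.sqrt(p) * np.eye(p)              # spectral norm ||A^{-1}|| = sqrt(p), a toy choice
    beta = mu + rng.normal(0.0, 1.0, p) / p    # an arbitrary point close to mu
    v = Ainv @ (beta - mu)
    grad_norm = np.linalg.norm(v)              # ||grad log pi|| <= ||A^{-1}|| ||beta - mu||, cf. (A.1)
    curv = np.abs(-Ainv + np.outer(v, v)).max()            # entries of pi^{-1} d^2 pi, cf. (A.3)
    bound = np.sqrt(p) + (np.sqrt(p) * np.linalg.norm(beta - mu)) ** 2
    print(p, grad_norm, curv, curv <= bound)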
The multivariate t distribution satisfies the assumptions in (2.7) and (2.8)

Suppose we put a $t$-prior on $\beta$, i.e., $\beta \sim t_\gamma(\mu, A)$. Here $t_\gamma(\mu, A)$ denotes the multivariate $t$ distribution with parameters $\gamma$, $\mu$ and $A$. We take $\gamma$ to be independent of $n$, but allow $\mu = \mu_n$ and $A = A_n$ to vary with $n$ (the dependence on $n$ is suppressed henceforth for simplicity of exposition). The density of this distribution is proportional to
\[
\left(1 + \frac{1}{\gamma}(\beta-\mu)^TA^{-1}(\beta-\mu)\right)^{-(\gamma+p_n)/2}.
\]
We assume that $\|A^{-1}\| = O(\sqrt{p_n})$. Now,
\[
\nabla\log\pi(\beta) = \frac{\nabla\pi(\beta)}{\pi(\beta)}
= -\left(\frac{\gamma+p_n}{\gamma}\right)\frac{A^{-1}(\beta-\mu)}{1 + \frac{1}{\gamma}(\beta-\mu)^TA^{-1}(\beta-\mu)}.
\]
Thus
\[
\|\nabla\log\pi(\beta)\|
= \left(\frac{\gamma+p_n}{\gamma}\right)\frac{\sqrt{(\beta-\mu)^TA^{-2}(\beta-\mu)}}{1 + \frac{1}{\gamma}(\beta-\mu)^TA^{-1}(\beta-\mu)}
\le O\!\left(\frac{\gamma+p_n}{\gamma}\right)\frac{\sqrt{\|A^{-1}\|\,(\beta-\mu)^TA^{-1}(\beta-\mu)}}{1 + \frac{1}{\gamma}(\beta-\mu)^TA^{-1}(\beta-\mu)}
\le O(p_n^{5/4}).
\]
Now, let $A^{-1} = ((a_{ij}))$. By straightforward manipulations, we get
\[
\begin{aligned}
\frac{1}{\pi(\beta)}\frac{\partial^2\pi(\beta)}{\partial\beta_j\partial\beta_{j'}}
&= \frac{1}{4\gamma^2}(\gamma+p_n)(\gamma+p_n+2)\,
\frac{\left(\sum_{k=1}^{p_n}a_{kj}(\beta_k-\mu_k)\right)\left(\sum_{k=1}^{p_n}a_{kj'}(\beta_k-\mu_k)\right)}{\left(1 + \frac{1}{\gamma}\sum_{k,l=1}^{p_n}a_{kl}(\beta_k-\mu_k)(\beta_l-\mu_l)\right)^2} \\
&\quad - \frac{1}{2\gamma}(\gamma+p_n)\,
\frac{a_{jj'}}{1 + \frac{1}{\gamma}\sum_{k,l=1}^{p_n}a_{kl}(\beta_k-\mu_k)(\beta_l-\mu_l)} \\
&\overset{(a)}{\le} O(p_n^2)\,\frac{\big((\beta-\mu)^TA^{-1}\big)_j\big((\beta-\mu)^TA^{-1}\big)_{j'}}{\big(1 + (\beta-\mu)^TA^{-1}(\beta-\mu)\big)^2} + O(p_n^{3/2}) \\
&\overset{(b)}{\le} O(p_n^2)\,\frac{(\beta-\mu)^TA^{-2}(\beta-\mu)}{\big(1 + (\beta-\mu)^TA^{-1}(\beta-\mu)\big)^2} + O(p_n^{3/2}) \\
&\overset{(c)}{\le} O(p_n^2)\,\frac{\|A^{-1}\|\,(\beta-\mu)^TA^{-1}(\beta-\mu)}{\big(1 + (\beta-\mu)^TA^{-1}(\beta-\mu)\big)^2} + O(p_n^{3/2}) \\
&= O(p_n^{5/2}),
\end{aligned}
\]
where (a) and (c) follow from the assumption that $\|A^{-1}\| = O(\sqrt{p_n})$, and (b) follows by the Cauchy–Schwarz inequality, since
\[
\big((\beta-\mu)^TA^{-1}\big)_j\big((\beta-\mu)^TA^{-1}\big)_{j'} \le (\beta-\mu)^TA^{-2}(\beta-\mu)
\quad\text{for all } j, j' = 1,\dots,p_n.
\]
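The $t$-prior gradient formula above can likewise be verified by finite differences on the log kernel; the parameter values in the sketch below are arbitrary illustrative choices.

import numpy as np

def log_kernel(beta, mu, Ainv, gamma):
    # log of (1 + (1/gamma)(beta - mu)^T A^{-1} (beta - mu))^{-(gamma + p)/2}
    q = (beta - mu) @ Ainv @ (beta - mu)
    return -0.5 * (gamma + beta.size) * np.log1p(q / gamma)

def grad_analytic(beta, mu, Ainv, gamma):
    # the displayed formula for grad log pi(beta)
    q = (beta - mu) @ Ainv @ (beta - mu)
    return -((gamma + beta.size) / gamma) * (Ainv @ (beta - mu)) / (1.0 + q / gamma)

rng = np.random.default_rng(5)
p, gamma = 6, 3.0
mu, beta = rng.normal(size=p), rng.normal(size=p)
B = rng.standard_normal((p, p))
Ainv = B @ B.T + np.eye(p)                       # a symmetric positive definite A^{-1}
h = 1e-6
num = np.array([(log_kernel(beta + h * np.eye(p)[r], mu, Ainv, gamma)
                 - log_kernel(beta - h * np.eye(p)[r], mu, Ainv, gamma)) / (2 * h)
                for r in range(p)])
print(np.allclose(num, grad_analytic(beta, mu, Ainv, gamma), atol=1e-5))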
Proof of Lemma 1

Let $\alpha_n = \sqrt{\frac{p_n}{n}}$. We will show that for any given $\epsilon > 0$, there exists a constant $C$ such that
\[
P\left(\sup_{\|u\|=C} l_n(\beta_0 + \alpha_n u) < l_n(\beta_0)\right) \ge 1 - \epsilon \tag{A.4}
\]
for large enough $n$. This will imply, with probability tending to 1, that the unique maximum $\hat\beta_n$ lies in the ball $\{\beta_0 + \alpha_n u : \|u\| \le C\}$, i.e., $\|\hat\beta_n - \beta_0\| = O_p(\alpha_n)$.

Note that
\[
\begin{aligned}
l_n(\beta_0 + \alpha_n u) - l_n(\beta_0)
&= \alpha_n\sum_{i=1}^{n}X_i z_i^Tu - \sum_{i=1}^{n}\left(\psi(z_i^T(\beta_0+\alpha_n u)) - \psi(z_i^T\beta_0)\right) \\
&= \alpha_n\sum_{i=1}^{n}\left(X_i - \psi'(z_i^T\beta_0)\right)z_i^Tu
- \frac{\alpha_n^2}{2}\sum_{i=1}^{n}\psi''(z_i^T\beta_0)(z_i^Tu)^2
- \frac{\alpha_n^3}{6}\sum_{i=1}^{n}\psi'''(\theta_i^*)(z_i^Tu)^3 \\
&= I_1 + I_2 + I_3, \text{ say},
\end{aligned}
\]
where $\theta_i^*$ lies between $z_i^T\beta_0$ and $z_i^T(\beta_0+\alpha_n u)$ for every $1 \le i \le n$.

Note that by (A-1), $z_i^T\beta_0$ is uniformly bounded (over $i$ and $n$) and $\psi''(\cdot)$ is a continuous function. Hence, $\psi''(z_i^T\beta_0)$ is also uniformly bounded (over $i$ and $n$) by, say, $K_1$. It follows that
\[
\begin{aligned}
E\left[\left(\sum_{i=1}^{n}\left(X_i - \psi'(z_i^T\beta_0)\right)z_i^Tu\right)^2\right]
&= \sum_{i=1}^{n}E\left[(X_i - \psi'(z_i^T\beta_0))^2(z_i^Tu)^2\right]
\quad\big(\because X_i\text{'s are independent and } E[X_i] = \psi'(z_i^T\beta_0)\big) \\
&= \sum_{i=1}^{n}(z_i^Tu)^2\,\psi''(z_i^T\beta_0)
\quad\big(\because E[(X_i - \psi'(z_i^T\beta_0))^2] = \psi''(z_i^T\beta_0)\big) \\
&\le K_1\sum_{i=1}^{n}(z_i^Tu)^2
= nK_1\,u^T\left(\frac{\sum_{i=1}^{n}z_iz_i^T}{n}\right)u
\le nK_1C_2\|u\|^2.
\end{aligned}
\]
The last step follows by (A-0). Hence
\[
E\left[\left(\sum_{i=1}^{n}\left(X_i - \psi'(z_i^T\beta_0)\right)z_i^Tu\right)^2\right] = O(n)\|u\|^2.
\]
Thus,
\[
I_1 = O_p(\alpha_n\sqrt{n})\|u\| = O_p(\sqrt{p_n})\|u\|. \tag{A.5}
\]
Note that $\psi$ is a strictly convex function, and hence $\psi''(\cdot) > 0$. Since $\psi''$ is continuous, its infimum on a bounded interval is strictly positive. By (A-1), $z_i^T\beta_0$ is uniformly bounded. This implies that $\psi''(z_i^T\beta_0)$ is uniformly bounded below by a positive constant, say $K_2$. Hence
\[
I_2 = -\frac{\alpha_n^2}{2}\sum_{i=1}^{n}\psi''(z_i^T\beta_0)(z_i^Tu)^2
\le -K_2\frac{\alpha_n^2}{2}\sum_{i=1}^{n}(z_i^Tu)^2
= -K_2\frac{\alpha_n^2}{2}\,n\,u^T\left(\sum_{i=1}^{n}z_iz_i^T/n\right)u < 0,
\]
by (A-0). Also, by (A-0) and the arguments above,
\[
|I_2| \ge K_2\frac{\alpha_n^2}{2}\,n\,u^T\left(\sum_{i=1}^{n}z_iz_i^T/n\right)u
\ge K_2\frac{\alpha_n^2}{2}\,nC_1\|u\|^2 = \frac{C_1K_2}{2}\,p_n\|u\|^2. \tag{A.6}
\]
Now, since $\theta_i^*$ lies between $z_i^T\beta_0$ and $z_i^T(\beta_0+\alpha_n u)$, it follows by (A-0) and (A-1) that
\[
|\theta_i^*| < \max_{1\le i\le n}\left\{|z_i^T\beta_0|,\,|z_i^T(\beta_0+\alpha_n u)|\right\}
< \max_{1\le i\le n}\left\{K,\,K + \alpha_n|z_i^Tu|\right\}
\le K + \max_{1\le i\le n}\alpha_n\|z_i\|\,\|u\|
\le K + \sqrt{\frac{p_n}{n}}\,O(\sqrt{p_n})\|u\|
= K + O\!\left(\frac{p_n}{\sqrt{n}}\right)\|u\|.
\]
Hence $\psi'''(\theta_i^*)$ is uniformly bounded by, say, $K_3$. Thus,
\[
|I_3| = \left|\frac{\alpha_n^3}{6}\sum_{i=1}^{n}\psi'''(\theta_i^*)(z_i^Tu)^3\right|
\le K_3\frac{\alpha_n^3}{6}\sum_{i=1}^{n}|z_i^Tu|^3
\le K_3\frac{p_n^{3/2}}{6n^{3/2}}\sum_{i=1}^{n}\big(\|z_i\|\,\|u\|\big)^3
= \frac{K_3M^3p_n^3}{6\sqrt{n}}\|u\|^3. \tag{A.7}
\]
The last step follows by (A-0). Since $p_n^6/n \to 0$ as $n \to \infty$, it follows by (A.5)–(A.7) that the order of $I_2$ dominates the orders of $I_1$ and $I_3$ (for a suitable choice of $\|u\|$). Since $I_2$ is negative, the assertion in (A.4) holds.
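The mechanism of the proof, namely the negative quadratic term $I_2$ of order $p_n\|u\|^2$ swamping $I_1$ and $I_3$ once $\|u\| = C$ is large, can be observed in simulation. In the sketch below (a hypothetical logistic model; randomly sampled directions merely stand in for the supremum over $\|u\| = C$, and the sample size, dimension and $C$ are illustrative choices), the log-likelihood difference $l_n(\beta_0 + \alpha_n u) - l_n(\beta_0)$ is negative across sampled directions for moderate $n$.

import numpy as np

rng = np.random.default_rng(6)
n, p, C = 2000, 4, 15.0
Z = rng.uniform(-1.0, 1.0, (n, p))               # bounded covariates
beta0 = np.array([0.5, -0.3, 0.2, 0.1])
X = rng.binomial(1, 1.0 / (1.0 + np.exp(-(Z @ beta0))))   # logistic responses

def loglik(beta):
    # l_n(beta) = sum_i X_i z_i^T beta - psi(z_i^T beta), psi(x) = log(1 + e^x)
    eta = Z @ beta
    return X @ eta - np.sum(np.log1p(np.exp(eta)))

alpha_n = np.sqrt(p / n)
diffs = [loglik(beta0 + alpha_n * u) - loglik(beta0)
         for u in (C * v / np.linalg.norm(v) for v in rng.standard_normal((200, p)))]
print(max(diffs) < 0)   # typically True for this configuration, matching (A.4)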
References

[1] A. Barron, M. Schervish, L. Wasserman, The consistency of posterior distributions in nonparametric problems, Ann. Statist. 27 (1999) 536–561.
[2] S. Bernstein, Theory of Probability, 1917 (in Russian).
[3] D. Bontemps, Bernstein–von Mises theorems for Gaussian regression with increasing number of regressors, Ann. Statist. 39 (2011) 2557–2584.
[4] M. Crowder, Asymptotic expansions of posterior expectations, distributions and densities for stochastic processes, Ann. Inst. Statist. Math. 40 (1988) 297–309.
[5] J. Fan, H. Peng, Nonconcave penalized likelihood with a diverging number of parameters, Ann. Statist. 32 (2004) 928–961.
[6] S. Ghosal, Normal approximation to the posterior distribution for generalized linear models with many covariates, Math. Methods Statist. 6 (1997) 332–348.
[7] S. Ghosal, Asymptotic normality of posterior distributions in high dimensional linear models, Bernoulli 5 (1999) 315–331.
[8] S. Ghosal, Asymptotic normality of posterior distributions for exponential families with many parameters, J. Multivariate Anal. 74 (2000) 49–69.
[9] S. Ghosal, J. Ghosh, A. van der Vaart, Convergence rates of posterior distributions, Ann. Statist. 28 (2000) 500–531.
[10] S. Ghosal, T. Samanta, Asymptotic expansions of posterior distributions in nonregular cases, Ann. Inst. Statist. Math. 49 (1997) 181–197.
[11] M. Ghosh, Objective priors: an introduction for frequentists (with discussion), Statist. Sci. 26 (2011) 187–211.
[12] M. Ghosh, R. Liu, Moment matching priors, Sankhyā 73-A (2011) 185–201.
[13] J.K. Ghosh, B.K. Sinha, S.N. Joshi, Expansion for posterior probability and integrated Bayes risk, in: S.S. Gupta, J.O. Berger (Eds.), Statistical Decision Theory and Related Topics III, Academic Press, 1982, pp. 403–456.
[14] S.J. Haberman, Maximum likelihood estimates in exponential response models, Ann. Statist. 5 (1977) 815–841.
[15] J.G. Ibrahim, P.W. Laud, On Bayesian analysis of generalized linear models using Jeffreys's prior, J. Amer. Statist. Assoc. 86 (1991) 981–986.
[16] R.A. Johnson, On asymptotic expansion for posterior distribution, Ann. Math. Statist. 38 (1967) 1899–1906.
[17] R.A. Johnson, Asymptotic expansions associated with posterior distribution, Ann. Math. Statist. 42 (1970) 1241–1253.
[18] H. Liang, P. Du, Maximum likelihood estimation in logistic regression models with a diverging number of covariates, Electron. J. Stat. 6 (2012) 1838–1846.
[19] S. Portnoy, Asymptotic behavior of M-estimators of p regression parameters when p^2/n is large. I: Consistency, Ann. Statist. 12 (1984) 1298–1309.
[20] S. Portnoy, Asymptotic behavior of M-estimators of p regression parameters when p^2/n is large. II: Normal approximation, Ann. Statist. 13 (1985) 1403–1417.
[21] S. Portnoy, Asymptotic behavior of likelihood methods for exponential families when the number of parameters tends to infinity, Ann. Statist. 16 (1988) 356–366.
[22] A.M. Walker, On the asymptotic behaviour of posterior distributions, J. R. Stat. Soc. Ser. B 31 (1969) 80–88.
[23] C.M. Zhang, Y. Jiang, Y. Chai, Penalized Bregman divergence for large-dimensional regression and classification, Biometrika 97 (2011) 551–566.