This document presents an asymptotic expansion of the posterior density in high dimensional generalized linear models. The main results are:
1) The authors prove a third order correct asymptotic expansion of the posterior density for generalized linear models with canonical link functions when the number of regressors grows with sample size.
2) This asymptotic expansion is then used to derive moment matching priors in the generalized linear model setting.
3) The expansion assumes the number of regressors grows such that $p_n^{6+\epsilon}/n \to 0$ as $n \to \infty$ for some small $\epsilon > 0$, which is stronger than prior work requiring only $p_n^4 \log p_n / n \to 0$.
S. Dasgupta et al. / Journal of Multivariate Analysis 131 (2014) 126–148
allowed to grow with the sample size n. In particular, it is assumed that $p_n^4 \log p_n / n \to 0$. Later, Ghosal [7] established
asymptotic normality of the posterior for linear regression models in a similar high dimensional setup as Ghosal [6]. In [8],
asymptotic normality of the posterior was established for exponential families as the number of parameters grows with the
sample size. Bontemps [3] extended the work of Ghosal [7] by permitting the model to be misspecified and the number of
regressors to grow proportionally to the sample size. Barron et al. [1] and Ghosal et al. [9] have considered the notion of
posterior consistency in nonparametric settings.
In this paper, we focus on generalized linear models (GLM) with canonical link function. The main objective of this paper
is to extend the asymptotic consistency result of Ghosal [6], by providing a third order correct asymptotic expansion of
the posterior density for GLM with canonical link function when the number of regressors grows to infinity at a certain rate
relative to the growth of the sample size n. Since a general link function is a one-to-one function of the canonical link function,
we can get a similar asymptotic expansion for the vector of regression parameters in the general case as well. The results
bear potential for the development of a variety of objective priors in this framework. The first step towards the development
of reference priors, probability matching priors, moment matching priors and others requires asymptotic expansions of
posteriors (cf. [11]). In particular, we use the asymptotic expansion to derive moment matching priors (introduced in [12])
in the GLM setting. To the best of the authors’ knowledge, identification of moment matching priors in this setting (both
when the number of regressors is bounded, and when the number of regressors increases with n) has not been undertaken
in the literature.
The paper is organized as follows. In Section 2, we introduce the model and provide the required assumptions. In Section 3,
we prove the main asymptotic expansion result (Theorem 1). In Section 4, we use this asymptotic expansion to derive
moment matching priors. The Appendix contains proofs which establish that the assumptions (in Section 2) on the prior
density are satisfied by the multivariate normal and multivariate t densities.
2. Preliminaries
2.1. Setup and assumptions
Let $X_1, \ldots, X_n$ be independent random variables. Let $f_i(\cdot)$ denote the density of $X_i$ with respect to a $\sigma$-finite measure $\nu$. Suppose
$$ f_i(x_i) = \exp[x_i \theta_i - \psi(\theta_i)], \quad i = 1, \ldots, n, \qquad (2.1) $$
where $\theta_i = z_i^T \beta$, $\beta = (\beta_1, \ldots, \beta_{p_n})^T$ is the vector of parameters, and $z_i = (z_{i1}, \ldots, z_{ip_n})^T$ is the vector of covariates for $i = 1, \ldots, n$. Note that we are allowing the dimension $p_n$ of the parameter $\beta$ to grow with the sample size $n$. Also, the cumulant generating function $\psi$ is infinitely differentiable and is assumed to be strictly convex. The above model is termed by Haberman [14] the "Dempster model".
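For concreteness, logistic regression is the special case of (2.1) with Bernoulli responses and $\psi(\theta) = \log(1 + e^\theta)$. The following sketch (our illustration, with hypothetical covariate and parameter values, not part of the paper) evaluates the density in (2.1) and checks that the Bernoulli probabilities sum to 1:

```python
import numpy as np

# Model (2.1) with canonical link: f_i(x_i) = exp[x_i * theta_i - psi(theta_i)],
# where theta_i = z_i^T beta. For Bernoulli responses (logistic regression),
# psi(theta) = log(1 + e^theta), which is strictly convex since
# psi''(theta) = sigmoid(theta) * (1 - sigmoid(theta)) > 0.

def psi(theta):
    # numerically stable log(1 + exp(theta))
    return np.logaddexp(0.0, theta)

def log_density(x, z, beta):
    """Log of f_i(x_i) in (2.1) for a single observation."""
    theta = z @ beta
    return x * theta - psi(theta)

# Hypothetical covariate vector and parameter (illustration only).
z = np.array([0.5, -1.0, 2.0])
beta = np.array([0.2, 0.1, -0.3])

# For x in {0, 1} the two density values are Bernoulli probabilities, so they sum to 1.
p1 = np.exp(log_density(1.0, z, beta))
p0 = np.exp(log_density(0.0, z, beta))
print(p0 + p1)
```

The same function handles other canonical-link families by swapping in the appropriate $\psi$ (e.g. $\psi(\theta) = e^\theta$ for Poisson responses).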
Let $\pi(\cdot)$ denote the prior density of $\beta$. Then the posterior density of $\beta$ given the observations $X_1, \ldots, X_n$ is defined by
$$ \pi(\beta \mid X) = \frac{\exp[l_n(\beta)]\,\pi(\beta)}{\int \exp[l_n(\beta)]\,\pi(\beta)\,d\beta}, \qquad (2.2) $$
where
$$ l_n(\beta) = \sum_{i=1}^{n} \left( X_i z_i^T \beta - \psi(z_i^T \beta) \right) $$
is the log-likelihood function. Note that the covariate vectors $z_1, \ldots, z_n$, the true parameter value $\beta_0$, the prior $\pi(\cdot)$, and the posterior $\pi(\cdot \mid X)$ all change with $n$. However, we suppress this dependence in our notation for simplicity of exposition. We now state the regularity conditions needed for our result.
• (A-0) The matrix $A_n$ defined by the relation
$$ A_n = \sum_{i=1}^{n} z_i z_i^T $$
is positive definite and the eigenvalues of $\frac{1}{n} A_n$ are uniformly bounded, i.e., there exist constants $C_1$ and $C_2$ (independent of $n$) such that the matrix $\frac{1}{n} A_n$ satisfies
$$ 0 < C_1 < \lambda_{\min}\!\left(\tfrac{1}{n} A_n\right) \le \lambda_{\max}\!\left(\tfrac{1}{n} A_n\right) < C_2 < \infty $$
for all $n$. Here $\lambda_{\max}$ and $\lambda_{\min}$ respectively denote the largest and smallest eigenvalues of the appropriate matrix. Further, we assume that $\|z_i\| = \sqrt{z_i^T z_i} = O(\sqrt{p_n})$. More specifically, there exists a constant $M$ (independent of $n$) such that $\|z_i\| \le M \sqrt{p_n}$.
• (A-1) Let $\beta_0$ denote the (sequence of) true value of the regression parameter vector $\beta$. Note that $\theta_{0i} = z_i^T \beta_0$ is the true value of the parameter $\theta_i$ in (2.1). We assume that $\max_{1 \le i \le n} |\theta_{0i}|$ is uniformly bounded as $n$ varies, i.e., there exists a constant $K$ (independent of $n$) such that
$$ \max_{1 \le i \le n} |z_i^T \beta_0| = \max_{1 \le i \le n} |\theta_{0i}| < K. \qquad (2.3) $$
As mentioned in [6,7], this assumption makes sense particularly if the data is free of extreme outliers. As in [6,7], we also assume that the parameter space is restricted to those values of $\beta$ for which
$$ \max_{1 \le i \le n} |z_i^T \beta| \le K', \qquad (2.4) $$
for some $K' > K$. This is equivalent to the statement that the parameter space is restricted to $\Theta_n$, where
$$ \Theta_n = \left\{ \beta : \max_{1 \le i \le n} |z_i^T \beta| \le K' \right\}. \qquad (2.5) $$
Note that $\Theta_n$ is a convex set. The posterior density of $\beta$ given the observations $X_1, \ldots, X_n$ (introduced in (2.2)) is more precisely given by
$$ \pi(\beta \mid X) = \frac{\exp[l_n(\beta)]\,\pi(\beta)}{\int_{\Theta_n} \exp[l_n(\beta)]\,\pi(\beta)\,d\beta}\, 1_{\{\beta \in \Theta_n\}}. \qquad (2.6) $$
We refer the reader to Ghosal [6,7] for details and discussion of this assumption. The summary is that a frequentist can think of this as a compactness assumption that prevents the posterior mass from escaping to infinity. A Bayesian can think of this as a convenient and reasonable prior belief about $\theta$. It should be noted that actual knowledge of $K$ and $K'$ is not required to obtain the main terms (up to the third order) in the expansion in Theorem 1. But $K$ and $K'$ do control the rate at which the $o_p(1)$ terms in the expansion converge to 0.
In this context it is also important to clarify that when we propose priors like the multivariate normal or multivariate $t$ for $\beta$, we implicitly truncate these priors to the region $\Theta_n$.
• (A-2) The prior density $\pi(\cdot)$ of $\beta$ satisfies $\int_{\Theta_n} \pi(\beta)\,d\beta = 1$ and $\pi(\beta_0) > \eta_0^{p_n}$ for some $\eta_0 > 0$ ($\eta_0$ does not depend on $n$). Also, $\pi(\cdot)$ is assumed to be twice continuously differentiable with
$$ \sup_{\|\beta - \beta_0\| \le C_n} \|\nabla \log \pi(\beta)\|_2 < M_1 p_n^{3/2} \quad \text{for some } M_1 > 0, \qquad (2.7) $$
and
$$ \sup_{\|\beta - \beta_0\| \le C_n} \max_{1 \le j, j' \le p_n} \left| \frac{1}{\pi(\beta)} \frac{\partial^2 \pi(\beta)}{\partial \beta_j \partial \beta_{j'}} \right| < M_2 p_n^{5/2} \quad \text{for some } M_2 > 0, \text{ where } C_n = \sqrt[4]{\frac{p_n}{n}}. \qquad (2.8) $$
This assumption is satisfied by appropriate multivariate $t$ and multivariate normal densities (see Appendix). Note that the prior density can be improper as a density on $\mathbb{R}^{p_n}$. We only assume that it has been normalized to integrate to 1 on the compact set $\Theta_n$.
• (A-3) The dimension $p_n$ can grow to infinity such that
$$ \frac{p_n^{6+\epsilon}}{n} \to 0 \quad \text{as } n \to \infty, \text{ for some small } \epsilon > 0. $$
Note that (A-3) is stronger than the corresponding assumption in [6], which only requires $p_n^4 \log p_n / n \to 0$. However, the goal in [6] is to establish asymptotic normality of the posterior, while our goal is to obtain a third order asymptotic expansion of the posterior. Hence it is not surprising that we need a slower rate of increase for $p_n$.
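Assumptions (A-0) and (A-3) can be probed numerically for a concrete design. The sketch below is our illustration, not part of the paper: the random Gaussian design, the sample size, and $\epsilon = 0.1$ are all hypothetical choices (the paper treats the $z_i$ as given and only requires the stated bounds):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: n observations, p_n regressors.
n, p_n = 5000, 3
Z = rng.normal(size=(n, p_n))   # rows are the covariate vectors z_i^T

# (A-0): eigenvalues of (1/n) A_n = (1/n) sum_i z_i z_i^T should be bounded
# away from 0 and infinity; for this design both are near 1.
eigvals = np.linalg.eigvalsh((Z.T @ Z) / n)
print(eigvals.min(), eigvals.max())

# (A-0), second part: ||z_i|| <= M sqrt(p_n); report the empirical M for this sample.
max_norm = np.linalg.norm(Z, axis=1).max()
print(max_norm / np.sqrt(p_n))

# (A-3): the quantity p_n^(6+eps)/n that must vanish as n grows, for eps = 0.1.
eps = 0.1
print(p_n ** (6 + eps) / n)
```

For a growing sequence such as $p_n = \lfloor n^{1/7} \rfloor$, the last ratio tends to 0, so (A-3) holds; for $p_n \asymp n^{1/6}$ it does not.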
2.2. Asymptotic convergence rate for the MLE
Let $\hat\beta_n$ be the maximum likelihood estimator of $\beta$. It follows from the strict convexity of $\psi$ and assumption (A-0) that the Hessian matrix of $l_n(\beta)$ is negative definite for all $\beta$. Hence $l_n(\beta)$ is a strictly concave function and has a unique maximum. The following lemma (Lemma 1) establishes weak consistency of the maximum likelihood estimator $\hat\beta_n$, and provides an asymptotic rate of convergence. This lemma is helpful in proving the main result (Theorem 1). Haberman [14] established consistency and asymptotic normality for the MLE in exponential response models, a more general version of the Dempster model considered here, when $p_n^3/n \to 0$. However, it is not quite clear whether Haberman's results can be used under our assumptions to obtain the asymptotic rate in Lemma 1. Hence, for the sake of completeness, we provide an independent proof of Lemma 1 in the Appendix by adapting the approach of Fan and Peng [5] (in the i.i.d. setting) to the GLM setting.
We briefly mention some other works on high dimensional consistency and asymptotic normality of the MLE, and the differences between our setup and the setups in those papers. Portnoy [19,20] established consistency and asymptotic normality of M-estimators in the context of linear regression, as the number of regression parameters $p_n$ grows with the sample size $n$ (satisfying the condition $(p_n \log p_n)^{3/2}/n \to 0$).¹ Portnoy [21] established consistency and asymptotic normality
¹ See [19,20] for references to earlier works in this area.
of the MLE for i.i.d. observations from exponential families, as the number of parameters $p_n$ grows with the sample size $n$ (satisfying the condition $p_n^{3/2}/n \to 0$). This is a different setting from the regression based setting (with covariates) considered in this paper. Fan and Peng [5] established high dimensional consistency and asymptotic normality of penalized likelihood estimators (the MLE can be thought of as a special case). However, they considered the i.i.d. setting, which is different from the setting in this paper. Zhang et al. [23] considered penalized pseudo-likelihood estimators for high dimensional GLM. However, their Bregman divergence based loss functions do not include the negative log-likelihood loss function. More specifically, in the context of GLM with canonical link, Zhang et al.'s [23] loss function looks like
$$ \sum_{i=1}^{n} \left[ -q(X_i) + q\big(\psi'(z_i^T \beta)\big) + \big(X_i - \psi'(z_i^T \beta)\big)\, q'\big(\psi'(z_i^T \beta)\big) \right], \qquad (2.9) $$
where $q(\cdot)$ is a concave function. The log-likelihood function $l_n(\beta)$ cannot be written in this form. A proof of high dimensional asymptotic normality of $\hat\beta_n$ in the special case of logistic regression is provided in [18].
Lemma 1. Under assumptions (A-0)–(A-3), the maximum likelihood estimator $\hat\beta_n$ satisfies $\|\hat\beta_n - \beta_0\| = O_p\big(\sqrt{p_n/n}\big)$.
Remark 1. Note that by Lemma 1 and (A-0),
$$ |z_i^T(\hat\beta_n - \beta_0)| \le \|z_i\|\, \|\hat\beta_n - \beta_0\| = O_p\!\left( \frac{p_n}{\sqrt{n}} \right). $$
By (A-1), it follows that
$$ \hat\beta_n \in \left\{ \beta : \max_{1 \le i \le n} |z_i^T \beta| < \frac{K + K'}{2} \right\} \qquad (2.10) $$
with probability tending to 1 as $n \to \infty$. In particular, we get $\hat\beta_n \in \Theta_n$ with probability tending to 1 as $n \to \infty$.
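The $O_p(\sqrt{p_n/n})$ rate in Lemma 1 can be illustrated by simulation in the logistic special case. The snippet below is our sketch (the design, true parameter, and sample sizes are hypothetical); it computes the MLE by Newton-Raphson and compares the estimation error at two sample sizes against $\sqrt{p_n/n}$:

```python
import numpy as np

rng = np.random.default_rng(1)

def logistic_mle(Z, X, n_iter=50):
    """Newton-Raphson for the canonical-link Bernoulli GLM:
    maximizes l_n(beta) = sum_i (X_i z_i^T beta - psi(z_i^T beta))."""
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(Z @ beta)))     # psi'(z_i^T beta)
        grad = Z.T @ (X - mu)                       # gradient of l_n
        hess = -(Z.T * (mu * (1.0 - mu))) @ Z       # Hessian of l_n (negative definite)
        beta = beta - np.linalg.solve(hess, grad)   # Newton update
    return beta

p_n = 4
beta0 = np.full(p_n, 0.3)   # hypothetical true parameter
errs = []
for n in (1000, 16000):
    Z = rng.normal(size=(n, p_n))
    X = rng.binomial(1, 1.0 / (1.0 + np.exp(-(Z @ beta0))))
    errs.append(np.linalg.norm(logistic_mle(Z, X) - beta0))
    print(n, errs[-1], np.sqrt(p_n / n))   # error shrinks roughly at the sqrt(p_n/n) rate
```

A single run only suggests the rate; averaging the error over many replications at each $n$ would track $\sqrt{p_n/n}$ more closely.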
3. Main result
In this section, we derive our main result: a third order correct asymptotic expansion of the posterior $\pi(\cdot \mid X)$ around an appropriate normal density. We transform the parameter $\beta$ to $g = \sqrt{n}(\beta - \hat\beta_n)$. Since the parameter space for $\beta$ is $\Theta_n$, it follows that the parameter space for $g$ is
$$ G_n := \left\{ g : \hat\beta_n + \frac{g}{\sqrt{n}} \in \Theta_n \right\}. $$
From (2.2) we obtain that the posterior density of $g$ is given by
$$ \pi^*(g \mid X) = \frac{\exp\!\left[ l_n\!\left( \hat\beta_n + \frac{g}{\sqrt{n}} \right) - l_n(\hat\beta_n) \right] \pi\!\left( \hat\beta_n + \frac{g}{\sqrt{n}} \right)}{\int_{G_n} \exp\!\left[ l_n\!\left( \hat\beta_n + \frac{g}{\sqrt{n}} \right) - l_n(\hat\beta_n) \right] \pi\!\left( \hat\beta_n + \frac{g}{\sqrt{n}} \right) dg}\; 1_{\{g \in G_n\}}. \qquad (3.1) $$
We now prove a series of lemmas which help us to prove our main result (Theorem 1). We first show that Ghosal’s [6] result
on posterior consistency holds under our assumptions.
Lemma 2. Under assumptions (A-0)–(A-3) described above,
$$ \int \left| \pi^*(g \mid X) - N_{p_n}(g \mid \mu_n, \Sigma_n) \right| dg \to 0, \qquad (3.2) $$
where $N_{p_n}(g \mid \mu_n, \Sigma_n)$ is a $p_n$-dimensional normal density with mean vector
$$ \mu_n = \sqrt{n}\, B_n^{-1} \sum_{i=1}^{n} \big( X_i - \psi'(z_i^T \beta_0) \big) z_i - \sqrt{n}\,(\hat\beta_n - \beta_0), $$
and inverse covariance matrix
$$ \Sigma_n^{-1} = \frac{1}{n} B_n = \frac{1}{n} \sum_{i=1}^{n} \psi''(z_i^T \beta_0)\, z_i z_i^T. $$
Proof. We verify that the assumptions in [6] follow from (A-0)–(A-3). Note that Ghosal [6, Eqs. (2.6) and (2.7)] follows immediately from our assumptions (A-1) and (A-2). Let $\delta_n = \|A_n^{-1/2}\|$. By (A-0), it follows that $\delta_n = O(n^{-1/2})$. Note that by
(A-2), if $\|\beta - \beta_0\| \le \sqrt[4]{p_n/n}$, then by the mean value theorem,
$$ |\log \pi(\beta) - \log \pi(\beta_0)| \le \sup_{\|\beta - \beta_0\| \le \sqrt[4]{p_n/n}} \|\nabla \log \pi(\beta)\| \, \|\beta - \beta_0\| \le M_1 p_n^{3/2} \|\beta - \beta_0\|. $$
Note that
$$ p_n (\log p_n)^{1/2} \delta_n = O\!\left( \frac{p_n^{1+\epsilon/3}}{\sqrt{n}} \right) = o\!\left( \sqrt[4]{\frac{p_n}{n}} \right). $$
Hence, Ghosal [6, Eq. (2.8)] is satisfied with Kn = M1p
3/2
n . Note that
Knδnpn(log pn)1/2
=
p
5/2+ϵ/3
n
√
n
→ 0.
Let ηn = max1≤i≤n ∥A
−1/2
n zi∥. Then
ηn ≤ ∥A−1/2
n ∥ max
1≤i≤n
∥zi∥ = O
pn
n
,
where
A
− 1
2
n
= sup
A
− 1
2
n x
∥x∥
: x ∈ Rn
with x ̸= 0
.
This means
p3/2
n (log pn)1/2
ηn = O
p3/2+ϵ/3
n
pn
n
= O
p
2+ϵ/3
n
√
n
→ 0.
Hence, Ghosal [6, Eq. (2.10)] is satisfied.
Now, since 1
n
n
i=1 zizT
i has uniformly bounded eigenvalues (by (A-0)), hence
tr
1
n
n
i=1
zizT
i
= O(pn).
Elementary manipulations using properties of trace imply that
n
i=1
pn
j=1
z2
ij = tr
n
i=1
zizT
i
= O(npn).
Thus, Ghosal [6, Eq. (2.11)] is also satisfied. Hence, all the assumptions in [6] hold. The lemma now follows from Theorem
2.1 of Ghosal [6] (using a straightforward linear transformation).
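The matrices in Lemma 2 are directly computable. Below is a hypothetical numerical sketch (not from the paper) of $B_n$ and $\Sigma_n^{-1} = \frac1n B_n$ for the logistic model, where $\psi''(t) = \sigma(t)(1-\sigma(t))$; the design and parameter values are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch of the Lemma 2 quantities for a logistic GLM:
# B_n = sum_i psi''(z_i^T beta_0) z_i z_i^T and Sigma_n^{-1} = B_n / n.
rng = np.random.default_rng(1)
n, p = 1000, 3
Z = rng.uniform(-1.0, 1.0, size=(n, p))
beta0 = np.array([0.5, -0.2, 0.1])               # assumed true parameter
mu = 1.0 / (1.0 + np.exp(-(Z @ beta0)))          # psi'(z_i^T beta_0)
w = mu * (1.0 - mu)                              # psi''(z_i^T beta_0)
B_n = (Z * w[:, None]).T @ Z                     # sum_i psi'' z_i z_i^T
Sigma_inv = B_n / n                              # inverse covariance of Lemma 2
eigs = np.linalg.eigvalsh(Sigma_inv)             # should be bounded away from 0
```

In this bounded design the eigenvalues of $\Sigma_n^{-1}$ stay bounded above and below, which is the content of assumption (A-0) combined with the strict convexity of $\psi$.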
Define the function
\[ Z_n(g) := \exp\Big(\ell_n\Big(\hat\beta_n + \frac{g}{\sqrt n}\Big) - \ell_n(\hat\beta_n)\Big). \tag{3.3} \]
Note that
\[ \pi^*(g\mid X) = \frac{Z_n(g)\,\pi\big(\hat\beta_n + \frac{g}{\sqrt n}\big)}{\int_{G_n} Z_n(g)\,\pi\big(\hat\beta_n + \frac{g}{\sqrt n}\big)\,dg}\,\mathbf 1_{\{g\in G_n\}}. \]
Henceforth, we assume that $p_n \to \infty$. If $p_n$ is uniformly bounded, a simple modification of the arguments below can be used to establish the asymptotic expansion result. See Section 4.1.

Lemma 3. Let $C_n := \{g : g \in G_n,\ \|g\| \le p_n^{\frac12+\epsilon'}\}$ and $K_n = \pi(\hat\beta_n)(2\pi)^{p_n/2}\big|{-\frac{\nabla^2\ell_n(\hat\beta_n)}{n}}\big|^{-1/2}$, where $\epsilon' = \frac{\epsilon}{6}$. Then,
\[ \int_{C_n}\frac{1}{K_n}Z_n(g)\,\pi\Big(\hat\beta_n + \frac{g}{\sqrt n}\Big)\,dg \xrightarrow{P} 1. \tag{3.4} \]
Proof. Note that $Z_n(g) = \exp\big(\ell_n\big(\hat\beta_n + \frac{g}{\sqrt n}\big) - \ell_n(\hat\beta_n)\big)$. By a third order correct Taylor series expansion of $\ell_n$ around $\hat\beta_n$, we get
\[ Z_n(g) = \exp\Big(\frac{g^T\nabla^2\ell_n(\hat\beta_n)g}{2n} - \frac{1}{6n^{3/2}}\sum_{i=1}^n\psi'''(z_i^T\beta_n^*)\Big(\sum_{r=1}^{p_n}z_{ir}g_r\Big)^3\Big), \tag{3.5} \]
where $\beta_n^* = \beta_n^*(g)$ is an intermediate point on the line joining $\hat\beta_n$ and $\hat\beta_n + \frac{g}{\sqrt n}$. Note that by Lemma 1, $\hat\beta_n \in \Theta_n$ with probability tending to 1. Also, by the definition of $G_n$ it follows that $\hat\beta_n + \frac{g}{\sqrt n} \in \Theta_n$ for every $g \in G_n$. It follows by the convexity of $\Theta_n$ and (2.10) that
\[ P\big(\beta_n^*(g) \in \Theta_n\ \forall g \in G_n\big) \to 1, \tag{3.6} \]
as $n\to\infty$. Also, if $g \in C_n$,
\[ \Big|\sum_{r=1}^{p_n}z_{ir}g_r\Big| \le \|z_i\|\,\|g\| \le M\sqrt{p_n}\,p_n^{\frac12+\epsilon'} = Mp_n^{1+\epsilon'}. \tag{3.7} \]
Let
\[ K_2 := \sup_{x\in[-K',K']}|\psi'''(x)|. \]
Note that $K_2 < \infty$ by continuity of $\psi'''$. Hence, if $\hat\beta_n \in \Theta_n$ and $g \in C_n$,
\[ \Big|\frac{1}{n^{3/2}}\sum_{i=1}^n\psi'''(z_i^T\beta_n^*)\Big(\sum_{r=1}^{p_n}z_{ir}g_r\Big)^3\Big| \le \frac{K_2}{n^{3/2}}\sum_{i=1}^n\Big|\sum_{r=1}^{p_n}z_{ir}g_r\Big|^3 \le K_2M^3\,\frac{p_n^{3+3\epsilon'}}{\sqrt n}. \tag{3.8} \]
The previous inequality follows by (3.7). It follows by (A-3) that
\[ \sup_{g\in C_n}\Big|\frac{1}{n^{3/2}}\sum_{i=1}^n\psi'''(z_i^T\beta_n^*)\Big(\sum_{r=1}^{p_n}z_{ir}g_r\Big)^3\Big| = O_p\Big(\frac{p_n^{3+\frac{\epsilon}{2}}}{\sqrt n}\Big) = o_p(1). \tag{3.9} \]
Also, by (A-2), it follows that if $g \in C_n$, then
\[ \frac{\pi\big(\hat\beta_n + \frac{g}{\sqrt n}\big)}{\pi(\hat\beta_n)} = \exp\Big(\log\pi\Big(\hat\beta_n + \frac{g}{\sqrt n}\Big) - \log\pi(\hat\beta_n)\Big) = \exp\Big(\frac{(\nabla\log\pi(\beta_n^{**}))^Tg}{\sqrt n}\Big), \]
for some intermediate point $\beta_n^{**} = \beta_n^{**}(g)$ on the line joining $\hat\beta_n$ and $\hat\beta_n + \frac{g}{\sqrt n}$. Note by Lemma 1 and (A-3) that $\sup_{g\in C_n}\|\beta_n^{**} - \beta_0\| = o_p\big(\sqrt[4]{p_n/n}\big)$. It follows by (A-2) that
\[ \sup_{g\in C_n}\Big|\frac{(\nabla\log\pi(\beta_n^{**}))^Tg}{\sqrt n}\Big| \le \sup_{g\in C_n}\frac{\|\nabla\log\pi(\beta_n^{**})\|\,\|g\|}{\sqrt n} = O_p\Big(\frac{p_n^{2+\epsilon'}}{\sqrt n}\Big) = o_p(1). \tag{3.10} \]
It follows by (3.5), (3.9), (3.10) and the definition of $K_n$ that
\[ \int_{C_n}\frac{1}{K_n}Z_n(g)\,\pi\Big(\hat\beta_n + \frac{g}{\sqrt n}\Big)\,dg = \exp(o_p(1))\int_{C_n}N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg, \tag{3.11} \]
where $\hat\Sigma_n = \big({-\frac{\nabla^2\ell_n(\hat\beta_n)}{n}}\big)^{-1} = \big(\frac{1}{n}\sum_{i=1}^n\psi''(z_i^T\hat\beta_n)z_iz_i^T\big)^{-1}$. Note that if $U_n \sim N_{p_n}(0,\hat\Sigma_n)$, then
\[ \sup_{1\le i\le n}\Big|z_i^T\Big(\hat\beta_n + \frac{U_n}{\sqrt n}\Big)\Big| > K' \ \Rightarrow\ \sup_{1\le i\le n}|z_i^TU_n| > \sqrt n\Big(K' - \sup_{1\le i\le n}|z_i^T\hat\beta_n|\Big) \ \Rightarrow\ \|U_n\| > \sqrt{\frac{n}{p_nM^2}}\Big(K' - \sup_{1\le i\le n}|z_i^T\hat\beta_n|\Big) \quad \text{(by (A-0))}. \tag{3.12} \]
By the strict convexity of $\psi$, Lemma 1 and (A-0), it follows that
\[ E_{N_{p_n}(0,\hat\Sigma_n)}\|U_n\|^2 = \mathrm{trace}(\hat\Sigma_n) = O_p(p_n). \tag{3.13} \]
Also by (2.10), it follows that
\[ \frac{1}{K' - \sup_{1\le i\le n}|z_i^T\hat\beta_n|} = O_p(1). \tag{3.14} \]
Note that $C_n^c = G_n^c \cup \{g : \|g\| > p_n^{\frac12+\epsilon'}\}$. A simple application of Markov's inequality, along with (3.12)–(3.14), yields that
\[ \int_{C_n^c}N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg \le \int_{G_n^c}N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg + \int_{\{g:\|g\|\ge p_n^{1/2+\epsilon'}\}}N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg \le \frac{E\|U_n\|^2}{\frac{n}{p_nM^2}\big(K' - \sup_{1\le i\le n}|z_i^T\hat\beta_n|\big)^2} + \frac{E\|U_n\|^2}{p_n^{1+2\epsilon'}} = O_p\Big(\frac{p_n^2}{n}\Big) + O_p(p_n^{-2\epsilon'}) = o_p(1). \tag{3.15} \]
It follows by (3.11) and (3.15) that
\[ \int_{C_n}\frac{1}{K_n}Z_n(g)\,\pi\Big(\hat\beta_n + \frac{g}{\sqrt n}\Big)\,dg \xrightarrow{P} 1 \]
as $n\to\infty$. □
Lemma 4.
\[ \int_{G_n\setminus C_n}\pi^*(g\mid X)\,dg = o_p(1). \tag{3.16} \]
Proof. Let $U_n \sim N_{p_n}(\mu_n,\Sigma_n)$, where $\mu_n$ and $\Sigma_n$ are as defined in the statement of Lemma 2. Note that
\[ \|\mu_n\| \le \Big\|\sqrt n\,B_n^{-1}\sum_{i=1}^n\big(X_i - \psi'(z_i^T\beta_0)\big)z_i\Big\| + O_P(\sqrt{p_n}). \tag{3.17} \]
Since $B_n^{-1} = \frac{1}{n}\Sigma_n$ and $\|\Sigma_n\| = O(1)$, it follows that
\[ E\Big\|\sqrt n\,B_n^{-1}\sum_{i=1}^n\big(X_i - \psi'(z_i^T\beta_0)\big)z_i\Big\|^2 = O\Big(E\Big\|\frac{1}{\sqrt n}\sum_{i=1}^n\big(X_i - \psi'(z_i^T\beta_0)\big)z_i\Big\|^2\Big) = O\Big(\frac{1}{n}\sum_{i=1}^nE\big[(X_i - \psi'(z_i^T\beta_0))^2\big]z_i^Tz_i\Big) \quad (\because X_i\text{'s are independent}) = O\Big(\frac{1}{n}\sum_{i=1}^n\psi''(z_i^T\beta_0)\,z_i^Tz_i\Big). \]
Since $\max_{1\le i\le n}\psi''(z_i^T\beta_0) = O(1)$ by (A-1) and the continuity of $\psi''$, the last expression is
\[ O\Big(\frac{1}{n}\max_{1\le i\le n}\psi''(z_i^T\beta_0)\sum_{i=1}^nz_i^Tz_i\Big) = O\Big(\frac{1}{n}\sum_{i=1}^nz_i^Tz_i\Big) = O(p_n) \quad \text{(by (A-0))}. \]
It follows by (3.17) that
\[ \|\mu_n\| = O_p(\sqrt{p_n}). \tag{3.18} \]
Hence
\[ E_{N_{p_n}(\mu_n,\Sigma_n)}\|U_n\|^2 = \mathrm{trace}(\Sigma_n) + \|\mu_n\|^2 = O_p(p_n). \]
By exactly the same argument as the one leading to Eq. (3.15) in the proof of Lemma 3, it follows that $\int_{C_n^c}N_{p_n}(g\mid\mu_n,\Sigma_n)\,dg = o_p(1)$. The result now follows by using Lemma 2. □
Lemma 5.
\[ \int_{G_n}\frac{1}{K_n}Z_n(g)\,\pi\Big(\hat\beta_n + \frac{g}{\sqrt n}\Big)\,dg \xrightarrow{P} 1. \]
Proof. Note that by Lemma 4,
\[ \int_{G_n\setminus C_n}\pi^*(g\mid X)\,dg = \frac{\int_{G_n\setminus C_n}\frac{1}{K_n}Z_n(g)\,\pi\big(\hat\beta_n + \frac{g}{\sqrt n}\big)\,dg}{\int_{G_n}\frac{1}{K_n}Z_n(g)\,\pi\big(\hat\beta_n + \frac{g}{\sqrt n}\big)\,dg} \to 0. \]
Hence,
\[ \frac{\int_{G_n\setminus C_n}\frac{1}{K_n}Z_n(g)\,\pi\big(\hat\beta_n + \frac{g}{\sqrt n}\big)\,dg}{\int_{C_n}\frac{1}{K_n}Z_n(g)\,\pi\big(\hat\beta_n + \frac{g}{\sqrt n}\big)\,dg + \int_{G_n\setminus C_n}\frac{1}{K_n}Z_n(g)\,\pi\big(\hat\beta_n + \frac{g}{\sqrt n}\big)\,dg} \to 0. \tag{3.19} \]
Now, by Lemma 3,
\[ \int_{C_n}\frac{1}{K_n}Z_n(g)\,\pi\Big(\hat\beta_n + \frac{g}{\sqrt n}\Big)\,dg \xrightarrow{P} 1. \]
The result follows by (3.19). □
We now state and prove the main result of the paper.

Theorem 1. Suppose $\beta \in \mathbb R^{p_n}$ satisfies $\sqrt n\|\beta - \hat\beta_n\| \le p_n^{\frac12+\frac{\epsilon}{6}}$ for every $n$. This is equivalent to the assumption that $g \in C_n$. In such a case, under assumptions (A-0)–(A-3),
\[ \pi^*(g\mid X) = N_{p_n}(g\mid 0,\hat\Sigma_n)\Big\{1 - \frac{1}{6n^{3/2}}\sum_{r,s,t=1}^{p_n}\sum_{i=1}^n\psi'''(z_i^T\hat\beta_n)\,g_rg_sg_t\,z_{ir}z_{is}z_{it} + \frac{1}{\sqrt n}\sum_{v=1}^{p_n}g_v\big(\nabla\log\pi(\hat\beta_n)\big)_v - \frac{1}{6n^{3/2}}\sum_{r,s,t=1}^{p_n}\sum_{i=1}^n\psi'''(z_i^T\hat\beta_n)\,g_rg_sg_t\,z_{ir}z_{is}z_{it}\cdot\frac{1}{\sqrt n}\sum_{v=1}^{p_n}g_v\big(\nabla\log\pi(\hat\beta_n)\big)_v + R(g)\Big\}\times\big(1 - o_p(1)\big), \tag{3.20} \]
where $\sup_{g\in C_n}|R(g)| = O_p\big(\frac{p_n^{6+\epsilon}}{n}\big)$ and $N_{p_n}(g\mid 0,\hat\Sigma_n)$ is a $p_n$-dimensional normal density with mean vector $0$ and covariance matrix $\hat\Sigma_n = \big({-\frac{\nabla^2\ell_n(\hat\beta_n)}{n}}\big)^{-1}$, evaluated at $g$.

Remark 2. Note that by Lemma 4, the posterior probability that $g$ does not lie in $C_n$ converges to 0.
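The leading correction in (3.20) is easy to evaluate numerically for a concrete model. The sketch below (hypothetical, not from the paper) computes the cubic term $-\frac{1}{6n^{3/2}}\sum_i\psi'''(z_i^T\hat\beta_n)(z_i^Tg)^3$ for the logistic model, where $\psi'''(t) = \sigma(t)(1-\sigma(t))(1-2\sigma(t))$, with a flat prior so that the $\nabla\log\pi$ terms vanish; `beta_hat` and `g` are illustrative stand-ins.

```python
import numpy as np

# Illustrative sketch: the leading (cubic) correction term of Theorem 1 for a
# logistic GLM with a flat prior.  For fixed g it is O(1/sqrt(n)).
rng = np.random.default_rng(2)
n, p = 2000, 3
Z = rng.uniform(-1.0, 1.0, size=(n, p))
beta_hat = np.array([0.2, -0.1, 0.05])           # stand-in for the MLE
g = np.array([0.5, 0.5, -0.5])                   # a point in C_n
s = 1.0 / (1.0 + np.exp(-(Z @ beta_hat)))
psi3 = s * (1.0 - s) * (1.0 - 2.0 * s)           # psi'''(z_i^T beta_hat)
cubic = -np.sum(psi3 * (Z @ g) ** 3) / (6.0 * n ** 1.5)
correction = 1.0 + cubic                         # bracketed factor, flat prior
```

With $|z_i^Tg|$ bounded and $n = 2000$, the cubic term is tiny, consistent with the $1/\sqrt n$ order of the leading correction for fixed $g$.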
Proof. Since $\nabla\ell_n(\hat\beta_n) = 0$, by a fourth order Taylor series expansion around $\hat\beta_n$, we have
\[ \ell_n\Big(\hat\beta_n + \frac{g}{\sqrt n}\Big) - \ell_n(\hat\beta_n) = \frac{1}{2n}g^T\nabla^2\ell_n(\hat\beta_n)g + \frac{1}{6n^{3/2}}\sum_{r,s,t=1}^{p_n}g_rg_sg_t\,\frac{\partial^3\ell_n(\beta)}{\partial\beta_r\partial\beta_s\partial\beta_t}\bigg|_{\beta=\hat\beta_n} + \frac{1}{24n^2}\sum_{r,s,t,u=1}^{p_n}g_rg_sg_tg_u\,\frac{\partial^4\ell_n(\beta)}{\partial\beta_r\partial\beta_s\partial\beta_t\partial\beta_u}\bigg|_{\beta=\beta_n^*} = A_1(g) + A_2(g) + A_3(g)\ \text{(say)}. \tag{3.21} \]
Here $\beta_n^* = \beta_n^*(g)$ is an intermediate point on the line joining $\hat\beta_n$ and $\hat\beta_n + \frac{g}{\sqrt n}$. Based on exactly the same argument leading up to (3.6) (in the proof of Lemma 3),
\[ P\big(\beta_n^*(g) \in \Theta_n\ \forall g \in G_n\big) \to 1, \tag{3.22} \]
as $n\to\infty$. Also,
\[ \pi\Big(\hat\beta_n + \frac{g}{\sqrt n}\Big) = \pi(\hat\beta_n) + \frac{1}{\sqrt n}g^T\nabla\pi(\hat\beta_n) + \frac{1}{2n}g^T\nabla^2\pi(\beta_n^{**})g = \pi(\hat\beta_n)\Big(1 + \frac{1}{\sqrt n}\sum_{v=1}^{p_n}g_v\big(\nabla\log\pi(\hat\beta_n)\big)_v + \frac{1}{2n}\frac{g^T\nabla^2\pi(\beta_n^{**})g}{\pi(\hat\beta_n)}\Big) = \pi(\hat\beta_n)\big(1 + B_1(g) + B_2(g)\big)\ \text{(say)}, \tag{3.23} \]
where $\beta_n^{**} = \beta_n^{**}(g)$ is an intermediate point on the line joining $\hat\beta_n$ and $\hat\beta_n + \frac{g}{\sqrt n}$. Based on exactly the same argument leading up to (3.6) (in the proof of Lemma 3),
\[ P\big(\beta_n^{**}(g) \in \Theta_n\ \forall g \in G_n\big) \to 1, \tag{3.24} \]
as $n\to\infty$.

We now analyze the various terms in (3.21) and (3.23). By the continuity of $\psi'''$ and the fact that $\hat\beta_n \in \Theta_n$ with probability tending to 1, it follows that
\[ \max_{1\le i\le n}|\psi'''(z_i^T\hat\beta_n)| = O_p(1). \tag{3.25} \]
Hence, for $g \in \mathbb R^{p_n}$,
\[ |A_2(g)| = \Big|\frac{1}{6n^{3/2}}\sum_{r,s,t=1}^{p_n}g_rg_sg_t\,\frac{\partial^3\ell_n(\beta)}{\partial\beta_r\partial\beta_s\partial\beta_t}\bigg|_{\beta=\hat\beta_n}\Big| = \Big|\frac{1}{6n^{3/2}}\sum_{r,s,t=1}^{p_n}\sum_{i=1}^n\psi'''(z_i^T\hat\beta_n)\,z_{ir}z_{is}z_{it}\,g_rg_sg_t\Big| \le \frac{1}{6n^{3/2}}\max_{1\le i\le n}|\psi'''(z_i^T\hat\beta_n)|\sum_{i=1}^n|z_i^Tg|^3. \tag{3.26} \]
In particular, we get that for $g \in C_n$,
\[ |A_2(g)| \le \frac{1}{6n^{3/2}}\max_{1\le i\le n}|\psi'''(z_i^T\hat\beta_n)|\sum_{i=1}^n(\|g\|\,\|z_i\|)^3 \quad (\because \text{Cauchy–Schwarz}) = \frac{1}{6n^{3/2}}\max_{1\le i\le n}|\psi'''(z_i^T\hat\beta_n)|\,\|g\|^3\sum_{i=1}^n\|z_i\|^3 \le \max_{1\le i\le n}|\psi'''(z_i^T\hat\beta_n)|\,\frac{\big(p_n^{\frac12+\frac{\epsilon}{6}}\big)^3\,n(M\sqrt{p_n})^3}{6n^{3/2}}. \tag{3.27} \]
The last inequality follows by using (A-0). It follows by (3.27) that
\[ \sup_{g\in C_n}|A_2(g)| = O_p\Big(\frac{p_n^{3+\frac{\epsilon}{2}}}{\sqrt n}\Big). \tag{3.28} \]
By the continuity of $\psi''''$ and (3.22), it follows that
\[ \sup_{g\in C_n}\max_{1\le i\le n}|\psi''''(z_i^T\beta_n^*)| = O_p(1). \tag{3.29} \]
Hence, for $g \in G_n$,
\[ |A_3(g)| = \Big|\frac{1}{24n^2}\sum_{r,s,t,u=1}^{p_n}g_rg_sg_tg_u\,\frac{\partial^4\ell_n(\beta)}{\partial\beta_r\partial\beta_s\partial\beta_t\partial\beta_u}\bigg|_{\beta=\beta_n^*}\Big| = \Big|\frac{1}{24n^2}\sum_{r,s,t,u=1}^{p_n}\sum_{i=1}^n\psi''''(z_i^T\beta_n^*)\,z_{ir}z_{is}z_{it}z_{iu}\,g_rg_sg_tg_u\Big| \le \frac{1}{24n^2}\max_{1\le i\le n}|\psi''''(z_i^T\beta_n^*)|\sum_{i=1}^n|z_i^Tg|^4. \tag{3.30} \]
In particular, for $g \in C_n$, we get that
\[ |A_3(g)| \le \frac{1}{24n^2}\max_{1\le i\le n}|\psi''''(z_i^T\beta_n^*)|\sum_{i=1}^n(\|g\|\,\|z_i\|)^4 = \frac{1}{24n^2}\max_{1\le i\le n}|\psi''''(z_i^T\beta_n^*)|\,\|g\|^4\sum_{i=1}^n\|z_i\|^4 \le \max_{1\le i\le n}|\psi''''(z_i^T\beta_n^*)|\,\frac{\big(p_n^{\frac12+\frac{\epsilon}{6}}\big)^4\,n(M\sqrt{p_n})^4}{24n^2}. \tag{3.31} \]
The last inequality follows by using (A-0). It follows by (3.31) that
\[ \sup_{g\in C_n}|A_3(g)| = O_p\Big(\frac{p_n^{4+\frac{2\epsilon}{3}}}{n}\Big). \tag{3.32} \]
Next we analyze the second order remainder term in (3.23). Note that $\|\hat\beta_n - \beta_0\| = O_p\big(\sqrt{p_n/n}\big)$ by Lemma 1, and $\sup_{g\in C_n}\|\beta_n^{**}(g) - \hat\beta_n\| = O_p\big(\frac{p_n^{1/2+\epsilon'}}{\sqrt n}\big)$, as $\beta_n^{**}(g)$ is an intermediate point on the line joining $\hat\beta_n$ and $\hat\beta_n + \frac{g}{\sqrt n}$. Hence, by the triangle inequality, we get that
\[ \sup_{g\in C_n}\|\beta_n^{**}(g) - \beta_0\| = O_p\Big(\frac{p_n^{\frac12+\epsilon'}}{\sqrt n}\Big). \tag{3.33} \]
By (A-3), we get that $\frac{p_n^{1/2+\epsilon'}}{\sqrt n} = o\big(\sqrt[4]{p_n/n}\big)$. By (A-2), it follows that
\[ \sup_{g\in C_n}\max_{1\le r,s\le p_n}\Big|\frac{1}{\pi(\beta)}\frac{\partial^2\pi(\beta)}{\partial\beta_r\partial\beta_s}\bigg|_{\beta=\beta_n^{**}(g)}\Big| = O_p(p_n^4). \tag{3.34} \]
Note that
\[ \frac{\pi(\beta_n^{**})}{\pi(\hat\beta_n)} = \exp\big(\log\pi(\beta_n^{**}) - \log\pi(\hat\beta_n)\big) = \exp\big((\nabla\log\pi(\beta_n^{***}))^T(\beta_n^{**} - \hat\beta_n)\big), \]
where $\beta_n^{***} = \beta_n^{***}(g)$ is an intermediate point on the line joining $\beta_n^{**}$ and $\hat\beta_n$. Hence,
\[ \sup_{g\in C_n}\|\beta_n^{***} - \hat\beta_n\| \le \sup_{g\in C_n}\|\beta_n^{**} - \hat\beta_n\| = O_p\Big(\frac{p_n^{\frac12+\epsilon'}}{\sqrt n}\Big). \]
By (A-3), $\frac{p_n^{1/2+\epsilon'}}{\sqrt n} = o\big(\sqrt[4]{p_n/n}\big)$. Hence, by Lemma 1 and (A-2), it follows that
\[ \sup_{g\in C_n}\|\nabla\log\pi(\beta_n^{***})\| = O_p(p_n^{3/2}). \]
Hence,
\[ \sup_{g\in C_n}\frac{\pi(\beta_n^{**})}{\pi(\hat\beta_n)} \le \sup_{g\in C_n}\exp\big(\|\nabla\log\pi(\beta_n^{***})\|\,\|\beta_n^{**} - \hat\beta_n\|\big) \le \exp\Big(O_p(p_n^{3/2})\,O_p\Big(\frac{p_n^{\frac12+\epsilon'}}{\sqrt n}\Big)\Big) = O_p(1). \tag{3.35} \]
It follows that
\[ |B_2(g)| = \frac{1}{2n}\Big|\frac{g^T\nabla^2\pi(\beta_n^{**})g}{\pi(\hat\beta_n)}\Big| = \frac{\pi(\beta_n^{**})}{\pi(\hat\beta_n)}\cdot\frac{1}{2n}\Big|\sum_{r,s=1}^{p_n}\frac{1}{\pi(\beta)}\frac{\partial^2\pi(\beta)}{\partial\beta_r\partial\beta_s}\bigg|_{\beta=\beta_n^{**}}g_rg_s\Big| \le \frac{\pi(\beta_n^{**})}{\pi(\hat\beta_n)}\cdot\frac{1}{2n}\sum_{r,s=1}^{p_n}\Big|\frac{1}{\pi(\beta)}\frac{\partial^2\pi(\beta)}{\partial\beta_r\partial\beta_s}\bigg|_{\beta=\beta_n^{**}}\Big|\,|g_r|\,|g_s| \le \frac{\pi(\beta_n^{**})}{\pi(\hat\beta_n)}\cdot\frac{1}{2n}\max_{1\le r,s\le p_n}\Big|\frac{1}{\pi(\beta)}\frac{\partial^2\pi(\beta)}{\partial\beta_r\partial\beta_s}\bigg|_{\beta=\beta_n^{**}}\Big|\,p_n\|g\|^2. \tag{3.36} \]
It follows by (3.34)–(3.36) that
\[ \sup_{g\in C_n}|B_2(g)| = O_p\Big(\frac{p_n^{9/2+\frac{\epsilon}{3}}}{n}\Big). \tag{3.37} \]
Note that by (3.1), (3.21) and (3.23), $\pi^*(g\mid X) = N/D$, where
\[ N = \frac{\pi(\hat\beta_n)\big(1 + B_1(g) + B_2(g)\big)\exp\big(A_1(g) + A_2(g) + A_3(g)\big)}{\pi(\hat\beta_n)(2\pi)^{p_n/2}\big|{-\frac{\nabla^2\ell_n(\hat\beta_n)}{n}}\big|^{-1/2}} = N_{p_n}(g\mid 0,\hat\Sigma_n)\big(1 + B_1(g) + B_2(g)\big)\exp\big(A_2(g) + A_3(g)\big) \]
\[ = N_{p_n}(g\mid 0,\hat\Sigma_n)\big\{(1 + B_1(g))(1 + A_2(g)) + B_2(g)(1 + A_2(g) + A_3(g)) + (1 + B_1(g))A_3(g)\big\} + N_{p_n}(g\mid 0,\hat\Sigma_n)\big\{(1 + B_1(g) + B_2(g))\big(\exp(A_2(g) + A_3(g)) - (1 + A_2(g) + A_3(g))\big)\big\} = N_{p_n}(g\mid 0,\hat\Sigma_n)\big(N_1(g) + N_2(g) + N_3(g) + N_4(g)\big)\ \text{(say)}, \tag{3.38} \]
and
\[ D = \int N(g)\,dg. \tag{3.39} \]
Now, from (3.28), (3.32) and (3.37), it follows that
\[ \sup_{g\in C_n}|N_2(g)| = \sup_{g\in C_n}\big|B_2(g)(1 + A_2(g) + A_3(g))\big| = O_p\Big(\frac{p_n^{9/2+\frac{\epsilon}{3}}}{n}\Big). \tag{3.40} \]
In view of Lemma 1 and (A-2),
\[ \sup_{g\in C_n}|1 + B_1(g)| \le 1 + \sup_{g\in C_n}\frac{1}{\sqrt n}\sum_{v=1}^{p_n}|g_v(\nabla\log\pi(\hat\beta_n))_v| \le 1 + \sup_{g\in C_n}\frac{1}{\sqrt n}\|g\|\,\|\nabla\log\pi(\hat\beta_n)\| = 1 + O_p\Big(\frac{p_n^{2+\epsilon'}}{\sqrt n}\Big) = 1 + o_p(1). \tag{3.41} \]
By (3.32), it follows that
\[ \sup_{g\in C_n}|N_3(g)| = \sup_{g\in C_n}\big|(1 + B_1(g))A_3(g)\big| = O_p\Big(\frac{p_n^{4+\frac{2\epsilon}{3}}}{n}\Big). \tag{3.42} \]
By (3.28) and (3.32),
\[ \sup_{g\in C_n}|A_2(g) + A_3(g)| \le \sup_{g\in C_n}|A_2(g)| + \sup_{g\in C_n}|A_3(g)| = O_p\Big(\frac{p_n^{3+\frac{\epsilon}{2}}}{\sqrt n}\Big). \]
It follows by (A-3) that for large enough $n$,
\[ \sup_{g\in C_n}\big|\exp(A_2(g) + A_3(g)) - (1 + A_2(g) + A_3(g))\big| \le \sup_{g\in C_n}(A_2(g) + A_3(g))^2 = O_p\Big(\frac{p_n^{6+\epsilon}}{n}\Big). \tag{3.43} \]
It follows from (3.37), (3.41) and (3.43) that
\[ \sup_{g\in C_n}|N_4(g)| = \sup_{g\in C_n}\big|(1 + B_1(g) + B_2(g))\big(\exp(A_2(g) + A_3(g)) - (1 + A_2(g) + A_3(g))\big)\big| = O_p\Big(\frac{p_n^{6+\epsilon}}{n}\Big). \tag{3.44} \]
Let
\[ R(g) := N_2(g) + N_3(g) + N_4(g). \tag{3.45} \]
It follows from (3.40), (3.42) and (3.44) that
\[ \sup_{g\in C_n}\big|N_2(g) + N_3(g) + N_4(g)\big| = O_p\Big(\frac{p_n^{6+\epsilon}}{n}\Big). \tag{3.46} \]
By (3.39) and Lemma 5,
\[ D = \int_{G_n}\frac{1}{K_n}Z_n(g)\,\pi\Big(\hat\beta_n + \frac{g}{\sqrt n}\Big)\,dg = 1 + o_p(1). \]
Thus,
\[ \pi^*(g\mid X) = N/D = N_{p_n}(g\mid 0,\hat\Sigma_n)\Big\{\Big(1 + \frac{1}{6n^{3/2}}\sum_{r,s,t=1}^{p_n}g_rg_sg_t\,\frac{\partial^3\ell_n(\beta)}{\partial\beta_r\partial\beta_s\partial\beta_t}\bigg|_{\beta=\hat\beta_n}\Big)\Big(1 + \frac{1}{\sqrt n}\sum_{v=1}^{p_n}g_v\big(\nabla\log\pi(\hat\beta_n)\big)_v\Big) + R(g)\Big\}\big(1 - o_p(1)\big) \]
\[ = N_{p_n}(g\mid 0,\hat\Sigma_n)\Big\{1 + \frac{1}{6n^{3/2}}\sum_{r,s,t=1}^{p_n}g_rg_sg_t\,\frac{\partial^3\ell_n(\beta)}{\partial\beta_r\partial\beta_s\partial\beta_t}\bigg|_{\beta=\hat\beta_n} + \frac{1}{\sqrt n}\sum_{v=1}^{p_n}g_v\big(\nabla\log\pi(\hat\beta_n)\big)_v + \frac{1}{6n^{3/2}}\sum_{r,s,t=1}^{p_n}g_rg_sg_t\,\frac{\partial^3\ell_n(\beta)}{\partial\beta_r\partial\beta_s\partial\beta_t}\bigg|_{\beta=\hat\beta_n}\cdot\frac{1}{\sqrt n}\sum_{v=1}^{p_n}g_v\big(\nabla\log\pi(\hat\beta_n)\big)_v + R(g)\Big\}\big(1 - o_p(1)\big) \]
\[ = N_{p_n}(g\mid 0,\hat\Sigma_n)\Big\{1 - \frac{1}{6n^{3/2}}\sum_{r,s,t=1}^{p_n}\sum_{i=1}^n\psi'''(z_i^T\hat\beta_n)\,g_rg_sg_t\,z_{ir}z_{is}z_{it} + \frac{1}{\sqrt n}\sum_{v=1}^{p_n}g_v\big(\nabla\log\pi(\hat\beta_n)\big)_v - \frac{1}{6n^{3/2}}\sum_{r,s,t=1}^{p_n}\sum_{i=1}^n\psi'''(z_i^T\hat\beta_n)\,g_rg_sg_t\,z_{ir}z_{is}z_{it}\cdot\frac{1}{\sqrt n}\sum_{v=1}^{p_n}g_v\big(\nabla\log\pi(\hat\beta_n)\big)_v + R(g)\Big\}\big(1 - o_p(1)\big), \]
where $\sup_{g\in C_n}|R(g)| = O_p\big(\frac{p_n^{6+\epsilon}}{n}\big)$. □
Remark 3. Note that we have extended the first order results in [6] to a third order correct posterior expansion by requiring stronger growth restrictions on $p_n$. A natural question that arises is whether one can obtain a second order correct expansion with weaker restrictions on the growth of $p_n$. However, we have not considered second order correct expansions for two reasons. Firstly, the derivation of a moment matching prior, which is the application that we consider in Section 4, requires a third order correct asymptotic expansion. Secondly, the proof of Lemma 3 in the paper uses the assumption that $p_n^{6+\epsilon}/n \to 0$ (see (3.9)). We would still need Lemma 3 to establish a second order correct posterior expansion. Therefore, establishing a second order correct expansion would still require the same growth restriction (assuming the other conditions in (A-0), (A-1), (A-2) and (A-3) are left unchanged).
Remark 4. Bontemps [3] establishes posterior consistency under Gaussianity, by relaxing the restrictions in [6,7] in several
ways. However, the arguments in the proof of Bontemps’ results (in particular Theorems 1 and 2 in Bontemps’ paper) rely
heavily on Gaussianity. We have made efforts to adapt them for other models, but have not been successful so far.
3.1. Posterior expansion for the uniformly bounded case

We can consider the case when $p_n$ is uniformly bounded, and obtain an expansion of the posterior density parallel to (3.20). The fact that $p_n$ is uniformly bounded allows a slightly finer analysis of the terms in the expansion, which is useful when deriving moment matching priors in Section 4. Firstly, we note that Lemmas 3–5 can be established by the same set of arguments, by using, for example, $C_n = \{g : \|g\| < n^{\frac{1}{6+\epsilon}}\}$ instead of $C_n = \{g : \|g\| < p_n^{\frac12+\frac{\epsilon}{6}}\}$. Henceforth, in this subsection, it will be assumed that $C_n = \{g : \|g\| < n^{\frac{1}{6+\epsilon}}\}$. It can be easily seen by repeating appropriate steps in the proof of Theorem 1 that in this case
\[ |A_2(g)| = O_p\Big(\frac{\|g\|^3}{\sqrt n}\Big) \tag{3.47} \]
for every $g \in \mathbb R^{p_n}$, and
\[ |A_3(g)| = O_p\Big(\frac{\|g\|^4}{n}\Big) \tag{3.48} \]
for every $g \in G_n$. To clarify notation, (3.47) means that $|A_2(g)|$ is $\|g\|^3$ times a quantity which is independent of $g$ and is $O_p\big(\frac{1}{\sqrt n}\big)$.

Since $\Theta_n$ is a compact set, it follows by (3.24) and the twice continuous differentiability of $\pi(\cdot)$ that each entry of $\nabla\pi(\hat\beta_n)$ and $\nabla^2\pi(\beta_n^{**})$ is bounded above in probability. Also, by (A-2), it follows that $\pi(\hat\beta_n)$ is bounded below in probability. Combining (3.23) with the above facts gives us
\[ |B_1(g)| = O_p\Big(\frac{\|g\|}{\sqrt n}\Big) \tag{3.49} \]
for every $g \in \mathbb R^{p_n}$, and
\[ |B_2(g)| = O_p\Big(\frac{\|g\|^2}{n}\Big) \tag{3.50} \]
for every $g \in G_n$. It follows that
\[ \sup_{g\in C_n}|B_1(g)| = O_p\Big(n^{-\frac{4+\epsilon}{12+2\epsilon}}\Big), \qquad \sup_{g\in C_n}|B_2(g)| = O_p\Big(n^{-\frac{4+\epsilon}{6+\epsilon}}\Big), \]
\[ \sup_{g\in C_n}|A_2(g)| = O_p\Big(n^{-\frac{\epsilon}{12+2\epsilon}}\Big), \qquad \sup_{g\in C_n}|A_3(g)| = O_p\Big(n^{-\frac{2+\epsilon}{6+\epsilon}}\Big). \tag{3.51} \]
It follows by (3.38) (along with the arguments following it, adjusted for the fact that $p_n$ is uniformly bounded and for the new choice of $C_n$) and (3.51) that for every $g \in C_n$,
\[ \pi^*(g\mid X) = N_{p_n}(g\mid 0,\hat\Sigma_n)\Big\{1 - \frac{1}{6n^{3/2}}\sum_{r,s,t=1}^{p_n}\sum_{i=1}^n\psi'''(z_i^T\hat\beta_n)\,g_rg_sg_t\,z_{ir}z_{is}z_{it} + \frac{1}{\sqrt n}\sum_{v=1}^{p_n}g_v\big(\nabla\log\pi(\hat\beta_n)\big)_v - \frac{1}{6n^{3/2}}\sum_{r,s,t=1}^{p_n}\sum_{i=1}^n\psi'''(z_i^T\hat\beta_n)\,g_rg_sg_t\,z_{ir}z_{is}z_{it}\cdot\frac{1}{\sqrt n}\sum_{v=1}^{p_n}g_v\big(\nabla\log\pi(\hat\beta_n)\big)_v + R(g)\Big\}\times\big(1 - o_p(1)\big), \tag{3.52} \]
and
\[ |R(g)| = O_p\Big(\frac{\|g\|^6}{n}\Big). \tag{3.53} \]
Note that (3.52) is identical to (3.20). However, the order of the remainder term is different for the two settings. In the setting for (3.20), we have $\sup_{g\in C_n}|R(g)| = O_p\big(\frac{p_n^{6+\epsilon}}{n}\big)$. However, in the current setting,
\[ \sup_{g\in C_n}|R(g)| = O_p\Big(n^{-\frac{\epsilon}{6+\epsilon}}\Big). \tag{3.54} \]
Note that even in this case, the posterior probability of the set $C_n$ converges to 1 as $n\to\infty$. We conclude this section by noting that if $g$ is fixed (or $\|g\|$ is uniformly bounded as $n\to\infty$) then the order of the leading terms $B_1(g)$ and $A_2(g)$ is $\frac{1}{\sqrt n}$ (as can be seen from (3.47) and (3.49)), while the order of the remainder term $R(g)$ is $\frac{1}{n}$ (as can be seen from (3.53)). However, if one is looking for bounds uniformly over $g \in C_n$, then the orders can be obtained from (3.51) and (3.54).
4. Moment matching prior
A moment matching prior (introduced by Ghosh and Liu [12]) is an objective prior for which the posterior mean matches
with the maximum likelihood estimator up to a high order of approximation. Ghosh and Liu [12] provide several examples
where they derive a moment matching prior using third order correct posterior expansions. In particular, they consider the
case with i.i.d. observations from a multi-parameter natural exponential family (with fixed p), and prove that the moment
matching prior in this situation can be uniquely determined, and in fact corresponds to Jeffreys’ general rule prior. However,
they did not consider the more complicated GLM setting. In this section we use the expansion in Theorem 1 to obtain
moment matching priors in the context of GLM with canonical link function (both when pn is uniformly bounded, and when
pn is unbounded). We will in fact show that the moment matching prior can be uniquely identified in this situation, and
corresponds to the Jeffreys’ general rule prior. In other words, Jeffreys’ general rule prior is the only prior which satisfies the
moment matching condition in the current GLM setup. We may add here that conditions for the propriety and existence of
moments for Jeffreys’ prior in the GLM setup (as well as the resulting posterior) have been addressed in [15].
The analysis in the current setup will be based on examining the posterior expectation of the quantity β − ˆβn. Note
that the remainder term in most posterior expansions (including the ones used in [12] and in this paper) is not uniformly
bounded in the variable used in the expansion (for example g in our setup) if we do not restrict to an appropriate set (such as
Cn in our setup). In fact, to show that the expected value (with respect to the posterior distribution) of the remainder term is
appropriately small, one has to restrict the computation of the expected value over a set such as Cn. Ghosh and Liu [12] take
a somewhat heuristic approach in their derivations, and do not take this issue into account. We undertake a more rigorous
approach to address this issue as follows. The computation of the posterior expectation for deriving the moment matching
prior will be restricted to the region Cn on which the appropriate bounds for the remainder term in the expansion are valid.
4.1. The uniformly bounded case

We first consider the case when the number of regressors $p_n$ is uniformly bounded. Note that the expansion in (3.52) holds in this case with $C_n = \{g : \|g\| < n^{\frac{1}{6+\epsilon}}\}$. The moment matching criterion of Ghosh and Liu [12] (with the modification discussed above) dictates that the prior $\pi(\cdot)$ be chosen such that the posterior expectation
\[ E^{\pi(\cdot\mid X)}\big[(\beta - \hat\beta_n)\mathbf 1_{\{\sqrt n(\beta-\hat\beta_n)\in C_n\}}\big] \]
converges to zero faster than $\frac{1}{n}$. It follows by the expansion in (3.52) that
\[ E^{\pi(\cdot\mid X)}\big[(\beta - \hat\beta_n)\mathbf 1_{\{\sqrt n(\beta-\hat\beta_n)\in C_n\}}\big] = \frac{1}{\sqrt n}\int_{C_n}g\,\pi^*(g\mid X)\,dg = \frac{1 - o_p(1)}{\sqrt n}\int_{C_n}g\,N_{p_n}(g\mid 0,\hat\Sigma_n)\big(1 + B_1(g) + A_2(g) + B_1(g)A_2(g) + R(g)\big)\,dg. \tag{4.1} \]
By (A-0) and (2.10), it follows that the eigenvalues of $\hat\Sigma_n$ are uniformly bounded above (with probability tending to 1). Since $p_n$ is uniformly bounded, for any subset $S$ of $\mathbb R^{p_n}$, we get
\[ \int_S\|g\|^kN_{p_n}(g\mid 0,\hat\Sigma_n)\,dg \le \int_{\mathbb R^{p_n}}\|g\|^kN_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p(p_n^{k/2}) = O_p(1) \tag{4.2} \]
for every fixed $k \in \mathbb N$. It follows by (3.47), (3.49) and (3.53) that
\[ \int_{C_n}\|g\|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,\big|B_1(g)A_2(g) + R(g)\big|\,dg = O_p\Big(\frac{1}{n}\Big). \tag{4.3} \]
Recall that $C_n^c$ refers to $\mathbb R^{p_n}\setminus C_n$. A simple application of Markov's inequality, along with (3.12), (3.14), (4.2) and the uniform boundedness of $p_n$, yields that
\[ \int_{C_n^c}\|g\|^kN_{p_n}(g\mid 0,\hat\Sigma_n)\,dg \le \int_{G_n^c}\|g\|^kN_{p_n}(g\mid 0,\hat\Sigma_n)\,dg + \int_{\{g:\|g\|>n^{1/(6+\epsilon)}\}}\|g\|^kN_{p_n}(g\mid 0,\hat\Sigma_n)\,dg \le O_p\Big(\frac{1}{n}\Big) + n^{-\frac{2}{6+\epsilon}}\int_{\mathbb R^{p_n}}\|g\|^{k+2}N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\Big(n^{-\frac{2}{6+\epsilon}}\Big) \tag{4.4} \]
for every fixed $k \in \mathbb N$. It follows by (3.47) and (3.49) that
\[ \int_{C_n^c}\|g\|\,|A_2(g)|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\Big(n^{-\frac12-\frac{2}{6+\epsilon}}\Big), \tag{4.5} \]
and
\[ \int_{C_n^c}\|g\|\,|B_1(g)|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\Big(n^{-\frac12-\frac{2}{6+\epsilon}}\Big). \tag{4.6} \]
Note that
\[ \int_{C_n}g\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg + \int_{C_n^c}g\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = \int_{\mathbb R^{p_n}}g\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = 0. \tag{4.7} \]
Another application of Markov's inequality along the lines of (4.4) (but by increasing the moment by 6 instead of 2) gives
\[ \int_{C_n^c}\|g\|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg \le O_p\Big(\frac{1}{n^3}\Big) + n^{-\frac{6}{6+\epsilon}}\int_{\mathbb R^{p_n}}\|g\|^7N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\Big(n^{-\frac{6}{6+\epsilon}}\Big). \tag{4.8} \]
It follows from (4.7) and (4.8) that
\[ \int_{C_n}g\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\Big(n^{-\frac{6}{6+\epsilon}}\Big). \tag{4.9} \]
Here, when we say that a vector $x$ is $O_p(c_n)$, we mean that $\|x\|$ is $O_p(c_n)$. By (4.1), (4.3), (4.5), (4.6) and (4.9) we get
\[ E^{\pi(\cdot\mid X)}\big[(\beta - \hat\beta_n)\mathbf 1_{\{\sqrt n(\beta-\hat\beta_n)\in C_n\}}\big] = \frac{1 - o_p(1)}{\sqrt n}\int_{C_n}g\,N_{p_n}(g\mid 0,\hat\Sigma_n)\big(1 + B_1(g) + A_2(g)\big)\,dg + O_p\Big(\frac{1}{n^{3/2}}\Big) = \frac{1 - o_p(1)}{\sqrt n}\int_{\mathbb R^{p_n}}g\,N_{p_n}(g\mid 0,\hat\Sigma_n)\big(B_1(g) + A_2(g)\big)\,dg + O_p\Big(n^{-1-\frac{2}{6+\epsilon}}\Big). \tag{4.10} \]
We now simplify the integral in (4.10). Note that
\[ \int_{\mathbb R^{p_n}}g\,B_1(g)\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = \frac{1}{\sqrt n}\Big(\int_{\mathbb R^{p_n}}gg^TN_{p_n}(g\mid 0,\hat\Sigma_n)\,dg\Big)\nabla\log\pi(\hat\beta_n) = \frac{1}{\sqrt n}\hat\Sigma_n\nabla\log\pi(\hat\beta_n). \tag{4.11} \]
Note that by Isserlis' formula for joint moments of a multivariate normal distribution, we get that for any $1 \le j,r,s,t \le p_n$,
\[ \int_{\mathbb R^{p_n}}g_jg_rg_sg_t\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = \hat\Sigma_{n,jr}\hat\Sigma_{n,st} + \hat\Sigma_{n,js}\hat\Sigma_{n,rt} + \hat\Sigma_{n,jt}\hat\Sigma_{n,rs}. \tag{4.12} \]
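Isserlis' formula (4.12) can be checked against known closed-form Gaussian moments. The sketch below (an illustrative check, not from the paper; the covariance matrix is arbitrary) verifies two special cases: $E[g_0^4] = 3\Sigma_{00}^2$ and $E[g_0^2g_1^2] = \Sigma_{00}\Sigma_{11} + 2\Sigma_{01}^2$.

```python
import numpy as np

# Illustrative check of Isserlis' formula: for zero-mean Gaussian g,
# E[g_j g_r g_s g_t] = S_jr S_st + S_js S_rt + S_jt S_rs.
def isserlis4(S, j, r, s, t):
    return S[j, r] * S[s, t] + S[j, s] * S[r, t] + S[j, t] * S[r, s]

S = np.array([[2.0, 0.3],
              [0.3, 1.0]])                       # arbitrary covariance matrix
ok1 = np.isclose(isserlis4(S, 0, 0, 0, 0), 3 * S[0, 0] ** 2)
ok2 = np.isclose(isserlis4(S, 0, 0, 1, 1), S[0, 0] * S[1, 1] + 2 * S[0, 1] ** 2)
```

Both checks reduce to standard fourth-moment identities for the multivariate normal distribution.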
Let $I_n(\beta) = \frac{1}{n}\sum_{i=1}^n\psi''(z_i^T\beta)z_iz_i^T$, the information matrix evaluated at $\beta$. It follows by the definition of $A_2(g)$ and (4.12) that
\[ \int_{\mathbb R^{p_n}}g_jA_2(g)\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = -\frac{1}{6\sqrt n}\sum_{r,s,t=1}^{p_n}\big(\hat\Sigma_{n,jr}\hat\Sigma_{n,st} + \hat\Sigma_{n,js}\hat\Sigma_{n,rt} + \hat\Sigma_{n,jt}\hat\Sigma_{n,rs}\big)A_{n,r,s,t}, \tag{4.13} \]
where
\[ A_{n,r,s,t} = \frac{1}{n}\sum_{i=1}^n\psi'''(z_i^T\hat\beta_n)z_{ir}z_{is}z_{it} = \frac{\partial}{\partial\beta_r}\big(I_n(\beta)\big)_{st}\bigg|_{\beta=\hat\beta_n}. \tag{4.14} \]
Note that
\[ \frac{\partial}{\partial\beta_r}\log|I_n(\beta)| = \sum_{s,t=1}^{p_n}\big(I_n(\beta)^{-1}\big)_{st}\,\frac{\partial}{\partial\beta_r}\big(I_n(\beta)\big)_{st}. \tag{4.15} \]
By the symmetry in $r,s,t$ on the right hand side of (4.13), it follows that
\[ \int_{\mathbb R^{p_n}}g_jA_2(g)\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = -\frac{1}{2\sqrt n}\sum_{r,s,t=1}^{p_n}\hat\Sigma_{n,jr}\hat\Sigma_{n,st}A_{n,r,s,t}. \tag{4.16} \]
Combining (4.14)–(4.16) along with the fact $I_n(\hat\beta_n) = \hat\Sigma_n^{-1}$, we get that
\[ \int_{\mathbb R^{p_n}}g\,A_2(g)\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = -\frac{1}{2\sqrt n}\hat\Sigma_n\nabla\log|I_n(\hat\beta_n)|. \tag{4.17} \]
It follows by (4.10), (4.11) and (4.17) that to ensure
\[ E^{\pi(\cdot\mid X)}\big[(\beta - \hat\beta_n)\mathbf 1_{\{\sqrt n(\beta-\hat\beta_n)\in C_n\}}\big] = O_p\Big(n^{-1-\frac{2}{6+\epsilon}}\Big), \]
the prior density $\pi(\cdot)$ should satisfy
\[ \hat\Sigma_n\nabla\log\pi(\hat\beta_n) - \frac12\hat\Sigma_n\nabla\log|I_n(\hat\beta_n)| = 0. \tag{4.18} \]
Note that the maximum likelihood estimator $\hat\beta_n$ satisfies $\|\hat\beta_n - \beta_0\| \xrightarrow{P} 0$ as $n\to\infty$. To ensure that (4.18) holds irrespective of the true $\beta_0$, we require
\[ \hat\Sigma_n\nabla\log\pi(\beta) - \frac12\hat\Sigma_n\nabla\log|I_n(\beta)| = 0 \tag{4.19} \]
for every $\beta$. Since $\hat\Sigma_n$ is a positive definite matrix with probability tending to 1, it follows that (4.19) holds if and only if
\[ \pi(\beta) \propto |I_n(\beta)|^{\frac12}. \]
To ensure that the assumptions in (A-2) hold, we choose
\[ \pi(\beta) = C_n|I_n(\beta)|^{\frac12}, \tag{4.20} \]
where $C_n$ is chosen such that $\int_{\Theta_n}\pi(\beta)\,d\beta = 1$. Since $\psi$ is infinitely differentiable, and $\Theta_n$ is a compact set, it follows by (A-0) that the eigenvalues of $I_n(\beta)$ are uniformly bounded (above and below) over $\beta \in \Theta_n$ and $n \in \mathbb N$. Since $p_n$ is uniformly bounded, it follows that $\pi(\beta)$ is uniformly bounded (above and below) over $\beta \in \Theta_n$ and $n \in \mathbb N$. Since $\psi$ is infinitely differentiable, it also follows in particular that $\pi(\cdot)$ is twice continuously differentiable. Since $\Theta_n$ is a compact set, it follows that all the first and second order derivatives of $\pi(\cdot)$ are uniformly bounded above over $\Theta_n$ and $n$. All these facts combined together imply that $\pi(\cdot)$ satisfies the assumptions in (A-2).
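The moment matching prior (4.20), i.e., Jeffreys' general rule prior, is straightforward to evaluate up to the normalizing constant $C_n$. The following is a hypothetical sketch (not from the paper) for the logistic model, where $\psi''(t) = \sigma(t)(1-\sigma(t))$; the design matrix is an illustrative assumption and $C_n$ is not computed.

```python
import numpy as np

# Illustrative sketch of the unnormalized moment matching (Jeffreys) prior
# (4.20): log pi(beta) = const + (1/2) log |I_n(beta)| for a logistic GLM.
rng = np.random.default_rng(3)
n, p = 500, 2
Z = rng.uniform(-1.0, 1.0, size=(n, p))

def log_jeffreys(beta, Z):
    """(1/2) log |I_n(beta)|, with I_n(beta) = (1/n) sum_i psi'' z_i z_i^T."""
    s = 1.0 / (1.0 + np.exp(-(Z @ beta)))
    w = s * (1.0 - s)                            # psi''(z_i^T beta), logistic
    I_n = (Z * w[:, None]).T @ Z / Z.shape[0]
    sign, logdet = np.linalg.slogdet(I_n)        # stable log-determinant
    assert sign > 0                              # I_n is positive definite here
    return 0.5 * logdet

val = log_jeffreys(np.zeros(p), Z)
```

Using `slogdet` avoids underflow of $|I_n(\beta)|$ itself, which matters once $p_n$ grows.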
4.2. The unbounded case

We now consider the case when $p_n \to \infty$ and $p_n^{6+\epsilon}/n \to 0$ as $n\to\infty$. For the moment matching prior derivation in this case, (a) we will assume that there exists an $\alpha > 0$ such that $p_n/n^\alpha \to \infty$ as $n\to\infty$, and (b) we will replace the assumption that $\|z_i\| \le M\sqrt{p_n}$ for every $1 \le i \le n$ in (A-0) by the stronger assumption $|z_{ir}| \le M$ for every $1 \le i \le n$ and $1 \le r \le p_n$. Note that $M$ does not depend on $n$. Recall that the posterior expansion (3.20) in this case holds with $C_n = \{g : \|g\| < p_n^{1/2+\epsilon/6}\}$. The basic technique for deriving the moment matching prior remains the same as in the uniformly bounded case. However, the orders of the various terms used in the analysis differ from the uniformly bounded case. Hence, this case is more complex, and needs a more careful consideration of all the relevant terms.

Note again that by (A-0) and (2.10), $\mathrm{tr}(\hat\Sigma_n) = O_p(p_n)$. By the analysis leading to (3.31) it follows that
\[ \int_{C_n}\|g\|\,|A_3(g)|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\Big(\frac{1}{n^2}\sum_{i=1}^n\int_{C_n}\|g\|\,|z_i^Tg|^4\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg\Big) = O_p\Big(\frac{1}{n^2}\sum_{i=1}^n\int_{C_n}\big(\|g\|^5 + |z_i^Tg|^5\big)N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg\Big). \tag{4.21} \]
The previous step follows from an application of Young's inequality, in particular
\[ |ab| \le \frac{|a|^q}{q} + \frac{|b|^{\tilde q}}{\tilde q} \quad \text{for every } a,b \in \mathbb R, \]
with $q = 5$ and $\tilde q = \frac54$. Note that by (A-0), $z_i^T\hat\Sigma_nz_i = O_p(p_n)$. It follows by (4.21) that
\[ \int_{C_n}\|g\|\,|A_3(g)|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\Big(\frac{p_n^{5/2}}{n}\Big). \tag{4.22} \]
By very similar arguments which use the analysis leading up to (3.31), (3.26), (3.36) and (3.41) respectively, it can be established that
\[ \int_{C_n}\|g\|\,(A_3(g))^2\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\Big(\frac{p_n^{9/2}}{n^2}\Big), \tag{4.23} \]
\[ \int_{C_n}\|g\|\,(A_2(g))^2\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\Big(\frac{p_n^{7/2}}{n}\Big), \tag{4.24} \]
\[ \int_{C_n}\|g\|\,|B_2(g)|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\Big(\frac{p_n^5}{n}\Big), \tag{4.25} \]
\[ \int_{C_n}\|g\|\,|A_2(g)B_1(g)|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\Big(\frac{p_n^{7/2}}{n}\Big). \tag{4.26} \]
It follows by the definition of $R(g)$ in (3.45), and by (4.22)–(4.26), that
\[ \int_{C_n}\|g\|\,\big|R(g) + A_2(g)B_1(g)\big|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\Big(\frac{p_n^5}{n}\Big). \tag{4.27} \]
A simple application of Markov's inequality along with (3.12) and (3.14) implies that
\[ \int_{C_n^c}\|g\|\,|B_1(g)|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg \le \frac{1}{\sqrt n}\int_{G_n^c}\|g\|\,\big|(\nabla\log\pi(\hat\beta_n))^Tg\big|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg + \frac{1}{\sqrt n}\int_{\{g:\|g\|>p_n^{1/2+\epsilon/6}\}}\|g\|\,\big|(\nabla\log\pi(\hat\beta_n))^Tg\big|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg \]
\[ \le \frac{1}{\sqrt n}\Bigg(\frac{1}{\frac{n}{p_nM^2}\big(K' - \sup_{1\le i\le n}|z_i^T\hat\beta_n|\big)^2} + \frac{1}{p_n^{1+\epsilon/3}}\Bigg)\int_{\mathbb R^{p_n}}\|g\|^3\big|(\nabla\log\pi(\hat\beta_n))^Tg\big|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg \]
\[ = O_p\Big(\frac{1}{\sqrt n\,p_n^{1+\epsilon/3}}\Big)\int_{\mathbb R^{p_n}}\Big(\|g\|^6 + \big((\nabla\log\pi(\hat\beta_n))^Tg\big)^2\Big)N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg. \tag{4.28} \]
Note that by (A-0), (A-2) and (2.10), we get $\big(\nabla\log\pi(\hat\beta_n)\big)^T\hat\Sigma_n\nabla\log\pi(\hat\beta_n) = O_p(p_n^3)$. It follows from (4.28) that
\[ \int_{C_n^c}\|g\|\,|B_1(g)|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\Big(\frac{p_n^{2-\epsilon/3}}{\sqrt n}\Big). \tag{4.29} \]
By a similar argument, it can be established that
\[ \int_{C_n^c}\|g\|\,|A_2(g)|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\Big(\frac{p_n^{2-\epsilon/3}}{\sqrt n}\Big). \tag{4.30} \]
Recall that there exists $\alpha > 0$ such that $p_n/n^\alpha \to \infty$ as $n\to\infty$. Let
\[ \alpha^* = \max\Big\{\frac{6}{\epsilon}\Big(\frac{1}{\alpha} + \frac12\Big),\ 4\Big\}. \]
An application of Markov's inequality along the lines of (4.4) (but by increasing the moment by $\alpha^*$ instead of 2), along with the fact that $p_n = o(n^{1/6})$, gives
\[ \int_{C_n^c}\|g\|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg \le O_p\Big(\frac{p_n^{\alpha^*/2}}{n^{\alpha^*/2}}\Big) + \frac{1}{p_n^{\alpha^*/2+\alpha^*\epsilon/6}}\int_{\mathbb R^{p_n}}\|g\|^{1+\alpha^*}N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\Big(\frac{p_n^{1/2+\alpha^*}}{n^{\alpha^*/2}}\Big) + O_p\Big(\frac{p_n^{(1+\alpha^*)/2}}{p_n^{\alpha^*/2+\alpha^*\epsilon/6}}\Big) = O_p\Big(\frac{1}{n}\Big). \tag{4.31} \]
Since $p_n^{6+\epsilon}/n \to 0$, it follows that
\[ \frac{p_n^5}{n^{3/2}} = o_p\Big(\frac{p_n^{2-\epsilon/3}}{n}\Big). \]
Using this fact along with (3.20), (4.7), (4.27), (4.29), (4.30) and (4.31), we get that
\[ E^{\pi(\cdot\mid X)}\big[(\beta - \hat\beta_n)\mathbf 1_{\{\sqrt n(\beta-\hat\beta_n)\in C_n\}}\big] = \frac{1 - o_p(1)}{\sqrt n}\int_{C_n}g\,N_{p_n}(g\mid 0,\hat\Sigma_n)\big(1 + B_1(g) + A_2(g)\big)\,dg + O_p\Big(\frac{p_n^5}{n^{3/2}}\Big) = \frac{1 - o_p(1)}{\sqrt n}\int_{\mathbb R^{p_n}}g\,N_{p_n}(g\mid 0,\hat\Sigma_n)\big(B_1(g) + A_2(g)\big)\,dg + O_p\Big(\frac{p_n^{2-\epsilon/3}}{n}\Big). \tag{4.32} \]
Using exactly the same arguments following (4.10) in the uniformly bounded case, it follows that to obtain
\[ E^{\pi(\cdot\mid X)}\big[(\beta - \hat\beta_n)\mathbf 1_{\{\sqrt n(\beta-\hat\beta_n)\in C_n\}}\big] = O_p\Big(\frac{p_n^{2-\epsilon/3}}{n}\Big) \]
irrespective of the true value $\beta_0$, we must have
\[ \pi(\beta) \propto |I_n(\beta)|^{\frac12}. \]
Hence, the moment matching prior (up to order $p_n^{2-\epsilon/3}/n$) is given by
\[ \pi(\beta) = C_n|I_n(\beta)|^{\frac12}, \tag{4.33} \]
where $C_n$ is chosen such that $\int_{\Theta_n}\pi(\beta)\,d\beta = 1$ (note that such a choice of $C_n$ is possible because $\Theta_n$ is a compact set, and $\pi(\cdot)$ is a continuous function). Since $\psi$ is strictly convex and $\psi''$ is continuous, it follows by the definition of $\Theta_n$ that $\psi''(z_i^T\beta)$ is uniformly bounded (away from both zero and infinity) over $\beta \in \Theta_n$ and $n \in \mathbb N$. Hence, by (A-0), all the eigenvalues of $I_n(\beta)$ are uniformly bounded (away from both zero and infinity) over $\beta \in \Theta_n$ and $n \in \mathbb N$. It follows that $|I_n(\beta)|^{1/p_n}$ is uniformly bounded (away from both zero and infinity) over $\beta \in \Theta_n$ and $n \in \mathbb N$, which immediately implies that $C_n^{1/p_n}$ is uniformly bounded (away from both zero and infinity) over $n \in \mathbb N$. It follows by (4.33) that there exists $\eta_0 > 0$ (not depending on $n$) such that $\pi(\beta) > \eta_0^{p_n}$ for every $\beta \in \Theta_n$. Note that the dependence of $\pi$ on $n$ has been suppressed for simplicity of exposition.
We now verify that the prior density $\pi(\cdot)$ in (4.33) satisfies assumptions (2.7) and (2.8). Since $\psi$ is infinitely differentiable, it follows in particular that $\pi(\cdot)$ is twice continuously differentiable. Note that
\[ \frac{\partial}{\partial\beta_r}\log\pi(\beta) = \frac12\frac{\partial}{\partial\beta_r}\log|I_n(\beta)| = \frac12\sum_{s,t=1}^{p_n}\big(I_n(\beta)^{-1}\big)_{st}\,\frac{\partial}{\partial\beta_r}\big(I_n(\beta)\big)_{st}. \tag{4.34} \]
Let
\[ K_2 = \sup_{x\in[-K',K']}|\psi'''(x)| \]
and recall that
\[ \frac{\partial}{\partial\beta_r}\big(I_n(\beta)\big)_{st} = \frac{1}{n}\sum_{i=1}^n\psi'''(z_i^T\beta)\,z_{ir}z_{is}z_{it}. \]
It follows by (A-0), (4.34) and $|z_{ir}| \le M$ for every $1 \le i \le n$ and $1 \le r \le p_n$ (see the first paragraph of this subsection) that
\[ \frac{2K_2M}{n}A_n + \frac{\partial}{\partial\beta_r}I_n(\beta) \quad \text{and} \quad \frac{2K_2M}{n}A_n - \frac{\partial}{\partial\beta_r}I_n(\beta) \quad \text{are both positive definite} \tag{4.35} \]
for every $1 \le r \le p_n$. It follows by (4.34) and (4.35) that
\[ \Big|\frac{\partial}{\partial\beta_r}\log\pi(\beta)\Big| = \frac12\Big|\mathrm{tr}\Big(I_n(\beta)^{-1}\frac{\partial}{\partial\beta_r}I_n(\beta)\Big)\Big| \le \mathrm{tr}\Big(I_n(\beta)^{-1}\frac{K_2M}{n}A_n\Big) \le p_n\,\lambda_{\min}\big(I_n(\beta)\big)^{-1}\lambda_{\max}\Big(\frac{K_2M}{n}A_n\Big). \tag{4.36} \]
Since the eigenvalues of $I_n(\beta)$ are uniformly bounded below over $\beta \in \Theta_n$ and $n \in \mathbb N$, it follows by (A-0) and (4.36) that there exists $M_1$ (independent of $\beta$ and $n$) satisfying
\[ \|\nabla\log\pi(\beta)\| < M_1p_n^{3/2} \tag{4.37} \]
for every $\beta \in \Theta_n$ and $n \in \mathbb N$. Hence (2.7) is satisfied.

Let $1 \le j,j' \le p_n$ be arbitrarily chosen. Then
\[ \frac{1}{\pi(\beta)}\frac{\partial^2\pi(\beta)}{\partial\beta_j\partial\beta_{j'}} = \frac{\partial^2}{\partial\beta_j\partial\beta_{j'}}\log\pi(\beta) + \Big(\frac{\partial}{\partial\beta_j}\log\pi(\beta)\Big)\Big(\frac{\partial}{\partial\beta_{j'}}\log\pi(\beta)\Big). \tag{4.38} \]
It follows by (4.34) that
\[ \frac{\partial^2}{\partial\beta_j\partial\beta_{j'}}\log\pi(\beta) = \frac12\Big[\mathrm{tr}\Big(\frac{\partial}{\partial\beta_j}\big(I_n(\beta)^{-1}\big)\frac{\partial}{\partial\beta_{j'}}I_n(\beta)\Big) + \mathrm{tr}\Big(I_n(\beta)^{-1}\frac{\partial^2}{\partial\beta_j\partial\beta_{j'}}I_n(\beta)\Big)\Big] = \frac12\Big[-\mathrm{tr}\Big(I_n(\beta)^{-1}\Big(\frac{\partial}{\partial\beta_j}I_n(\beta)\Big)I_n(\beta)^{-1}\frac{\partial}{\partial\beta_{j'}}I_n(\beta)\Big) + \mathrm{tr}\Big(I_n(\beta)^{-1}\frac{\partial^2}{\partial\beta_j\partial\beta_{j'}}I_n(\beta)\Big)\Big]. \tag{4.39} \]
Note that
\[ \frac{\partial^2}{\partial\beta_j\partial\beta_{j'}}\big(I_n(\beta)\big)_{st} = \frac{1}{n}\sum_{i=1}^n\psi''''(z_i^T\beta)\,z_{ij}z_{ij'}z_{is}z_{it}. \tag{4.40} \]
Let
\[ K_3 = \sup_{x\in[-K',K']}|\psi''''(x)|. \]
It follows by (A-0), (4.40) and $|z_{ir}| \le M$ for every $1 \le i \le n$ and $1 \le r \le p_n$ (see the first paragraph of this subsection) that
\[ \frac{2K_3M^2}{n}A_n + \frac{\partial^2}{\partial\beta_j\partial\beta_{j'}}I_n(\beta) \quad \text{and} \quad \frac{2K_3M^2}{n}A_n - \frac{\partial^2}{\partial\beta_j\partial\beta_{j'}}I_n(\beta) \quad \text{are both positive definite} \tag{4.41} \]
for every $1 \le j,j' \le p_n$. It follows by (4.35), (4.39) and (4.41) that
\[ \Big|\frac{\partial^2}{\partial\beta_j\partial\beta_{j'}}\log\pi(\beta)\Big| \le 2K_2^2M^2\,\mathrm{tr}\Big(I_n(\beta)^{-1}\frac{A_n}{n}I_n(\beta)^{-1}\frac{A_n}{n}\Big) + K_3M^2\,\mathrm{tr}\Big(I_n(\beta)^{-1}\frac{A_n}{n}\Big). \tag{4.42} \]
Since the eigenvalues of $I_n(\beta)$ are uniformly bounded below over $\beta \in \Theta_n$ and $n \in \mathbb N$, it follows by (A-0), (4.36), (4.38) and (4.42) that there exists $M_2$ (independent of $\beta$ and $n$) satisfying
\[ \max_{1\le j,j'\le p_n}\Big|\frac{1}{\pi(\beta)}\frac{\partial^2\pi(\beta)}{\partial\beta_j\partial\beta_{j'}}\Big| \le M_2p_n^2. \]
Hence (2.8) is satisfied.
Acknowledgments
We would like to thank Prof. Subhashis Ghosal for his help with the paper. Thanks are also due to the Associate Editor
and a referee for their useful comments.
Appendix

Multivariate normal distribution satisfies assumptions in (2.7) and (2.8)

Suppose we put a normal prior on $\beta$, i.e., $\beta \sim N_{p_n}(\mu, A)$. We assume that $\|\mu\| = O(\sqrt{p_n})$ and $\|A^{-1}\| = O(\sqrt{p_n})$. Note that
\[ \nabla\log\pi(\beta) = -A^{-1}(\beta - \mu). \]
Hence,
\[ \|\nabla\log\pi(\beta)\| \le \|A^{-1}\|\,\|\beta - \mu\|. \tag{A.1} \]
Also,
\[ \|\beta - \mu\| \le \|\beta - \beta_0\| + \|\beta_0\| + \|\mu\| \le \|\beta - \beta_0\| + \sqrt{\frac{1}{C_1}\,\beta_0^T\Big(\frac{1}{n}\sum_{i=1}^nz_iz_i^T\Big)\beta_0} + \|\mu\| \quad \text{(by assumption (A-0))} \le \|\beta - \beta_0\| + \sqrt{\frac{1}{C_1}\cdot\frac{1}{n}\sum_{i=1}^n(z_i^T\beta_0)^2} + \|\mu\| \le \|\beta - \beta_0\| + \frac{K}{\sqrt{C_1}} + \|\mu\| \quad \text{(by assumption (A-1))} = \|\beta - \beta_0\| + O(\sqrt{p_n}). \tag{A.2} \]
It follows from (A.1), (A.2) and the assumptions on $\mu$ and $A$ that
\[ \sup_{\|\beta-\beta_0\|\le C_n}\|\nabla\log\pi(\beta)\| = O(p_n), \]
where $C_n = \sqrt[4]{p_n/n}$.

Note that for $1 \le j,j' \le p_n$,
\[ \Big|\frac{1}{\pi(\beta)}\frac{\partial^2\pi(\beta)}{\partial\beta_j\partial\beta_{j'}}\Big| = \Big|-(A^{-1})_{jj'} + \Big(\sum_{k=1}^{p_n}(A^{-1})_{jk}(\beta_k - \mu_k)\Big)\Big(\sum_{k=1}^{p_n}(A^{-1})_{j'k}(\beta_k - \mu_k)\Big)\Big| \le \|A^{-1}\| + \big(\|A^{-1}\|\,\|\beta - \mu\|\big)^2. \tag{A.3} \]
It follows from (A.2) and (A.3) that
\[ \sup_{\|\beta-\beta_0\|\le C_n}\max_{1\le j,j'\le p_n}\Big|\frac{1}{\pi(\beta)}\frac{\partial^2\pi(\beta)}{\partial\beta_j\partial\beta_{j'}}\Big| = O(p_n^2), \]
where $C_n = \sqrt[4]{p_n/n}$.
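The bounds (A.1) and (A.3) for a normal prior are easy to verify numerically. The sketch below (an illustrative check, not from the paper; the specific $\mu$, $A$ and $\beta$ are assumptions) computes $\nabla\log\pi$ and the second-derivative ratios and confirms the two inequalities.

```python
import numpy as np

# Illustrative check of (A.1) and (A.3) for a normal prior N(mu, A):
# grad log pi(beta) = -A^{-1}(beta - mu), and
# (1/pi) d^2 pi / dbeta_j dbeta_j' = -(A^{-1})_{jj'} + v_j v_j',
# with v = A^{-1}(beta - mu).
p = 3
A = np.diag([2.0, 1.0, 0.5])                     # assumed prior covariance
A_inv = np.linalg.inv(A)
mu = np.zeros(p)
beta = np.array([0.4, -0.3, 0.2])                # an arbitrary test point

grad = -A_inv @ (beta - mu)                      # gradient of log pi
v = A_inv @ (beta - mu)
ratio = -A_inv + np.outer(v, v)                  # matrix of (1/pi) d^2 pi terms

op_norm = np.linalg.norm(A_inv, 2)               # spectral norm ||A^{-1}||
bound_grad = op_norm * np.linalg.norm(beta - mu)              # (A.1)
bound_hess = op_norm + (op_norm * np.linalg.norm(beta - mu)) ** 2  # (A.3)
```

Both bounds hold entrywise because the spectral norm dominates every matrix entry of a symmetric matrix.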
Multivariate t distribution satisfies assumptions in (2.7) and (2.8)
Suppose we put a t-prior on β, i.e., β ∼ tγ (µ, A). Here tγ (µ, A) denotes the multivariate t distribution with parameters γ ,
µ and A. We take γ to be independent of n, but allow µ = µn and A = An to vary with n (the dependence on n is suppressed
henceforth for simplicity of exposition). The density of this distribution is proportional to
1 +
1
γ
(β − µ)T
A−1
(β − µ)
−(γ +pn)/2
.
We assume that $\|A^{-1}\| = O(\sqrt{p_n})$. Now,
\[
\nabla \log \pi(\beta) = \frac{\nabla \pi(\beta)}{\pi(\beta)} = -\frac{\gamma + p_n}{\gamma} \cdot \frac{A^{-1}(\beta - \mu)}{1 + \frac{1}{\gamma}(\beta - \mu)^T A^{-1}(\beta - \mu)}.
\]
Thus
\begin{align*}
\|\nabla \log \pi(\beta)\| &= \frac{\gamma + p_n}{\gamma} \cdot \frac{\left( (\beta - \mu)^T A^{-2} (\beta - \mu) \right)^{1/2}}{1 + \frac{1}{\gamma}(\beta - \mu)^T A^{-1}(\beta - \mu)} \\
&\le O\!\left( \frac{\gamma + p_n}{\gamma} \, \|A^{-1}\|^{1/2} \cdot \frac{\left( (\beta - \mu)^T A^{-1} (\beta - \mu) \right)^{1/2}}{1 + \frac{1}{\gamma}(\beta - \mu)^T A^{-1}(\beta - \mu)} \right) \\
&\le O(p_n^{5/4}).
\end{align*}
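The closed-form gradient of the $t$ log-density above can be checked against finite differences; the sketch below is illustrative only, with arbitrary values for $p$, $\gamma$, $\mu$, $A$ and $\beta$.

```python
import numpy as np

# Finite-difference check (illustrative only) of the multivariate-t gradient:
#   grad log pi(beta) = -((gamma + p)/gamma) A^{-1}(beta - mu)
#                        / (1 + (1/gamma)(beta - mu)^T A^{-1}(beta - mu)).
rng = np.random.default_rng(2)
p, gamma = 4, 5.0
mu = rng.normal(size=p)
L = rng.normal(size=(p, p))
A_inv = np.linalg.inv(L @ L.T + p * np.eye(p))
beta = rng.normal(size=p)

def log_pi(b):
    # log density up to an additive constant
    d = b - mu
    return -0.5 * (gamma + p) * np.log1p((d @ A_inv @ d) / gamma)

d = beta - mu
analytic = -((gamma + p) / gamma) * (A_inv @ d) / (1.0 + (d @ A_inv @ d) / gamma)

h = 1e-6
numeric = np.array([(log_pi(beta + h * e) - log_pi(beta - h * e)) / (2 * h)
                    for e in np.eye(p)])

max_err = float(np.max(np.abs(numeric - analytic)))
```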
Now, let $A^{-1} = ((a_{ij}))$. By straightforward manipulations, we get
\[
\frac{1}{\pi(\beta)} \frac{\partial^2 \pi(\beta)}{\partial \beta_j \partial \beta_{j'}}
= \frac{(\gamma + p_n)(\gamma + p_n + 2)}{\gamma^2} \,
\frac{\left( \sum_{k=1}^{p_n} a_{kj}(\beta_k - \mu_k) \right) \left( \sum_{k=1}^{p_n} a_{kj'}(\beta_k - \mu_k) \right)}{\left( 1 + \frac{1}{\gamma} \sum_{k,l=1}^{p_n} a_{kl}(\beta_k - \mu_k)(\beta_l - \mu_l) \right)^{2}}
- \frac{\gamma + p_n}{\gamma} \,
\frac{a_{jj'}}{1 + \frac{1}{\gamma} \sum_{k,l=1}^{p_n} a_{kl}(\beta_k - \mu_k)(\beta_l - \mu_l)}.
\]
Hence
\begin{align*}
\left| \frac{1}{\pi(\beta)} \frac{\partial^2 \pi(\beta)}{\partial \beta_j \partial \beta_{j'}} \right|
&\overset{(a)}{\le} O(p_n^2) \, \frac{\left| \left( (\beta - \mu)^T A^{-1} \right)_j \left( (\beta - \mu)^T A^{-1} \right)_{j'} \right|}{\left( 1 + (\beta - \mu)^T A^{-1} (\beta - \mu) \right)^2} + O(p_n^{3/2}) \\
&\overset{(b)}{\le} O(p_n^2) \, \frac{(\beta - \mu)^T A^{-2} (\beta - \mu)}{\left( 1 + (\beta - \mu)^T A^{-1} (\beta - \mu) \right)^2} + O(p_n^{3/2}) \\
&\overset{(c)}{\le} O(p_n^2) \, \frac{\|A^{-1}\| \, (\beta - \mu)^T A^{-1} (\beta - \mu)}{\left( 1 + (\beta - \mu)^T A^{-1} (\beta - \mu) \right)^2} + O(p_n^{3/2}) \\
&= O(p_n^{5/2}),
\end{align*}
where (a) and (c) follow from the assumption that $\|A^{-1}\| = O(\sqrt{p_n})$, and (b) follows since $\left| \left( (\beta - \mu)^T A^{-1} \right)_j \right| \le \left( (\beta - \mu)^T A^{-2} (\beta - \mu) \right)^{1/2}$ for all $j = 1, \dots, p_n$.
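Step (b) rests on the elementary fact that each coordinate of $v = A^{-1}(\beta - \mu)$ is bounded by the Euclidean norm of $v$, and $\|v\|^2 = (\beta-\mu)^T A^{-2} (\beta-\mu)$ for symmetric $A$. An illustrative check with arbitrary test values:

```python
import numpy as np

# |v_j| <= ||v|| for v = A^{-1}(beta - mu), where
# ||v||^2 = (beta - mu)^T A^{-2} (beta - mu) since A^{-1} is symmetric.
rng = np.random.default_rng(3)
p = 6
L = rng.normal(size=(p, p))
A_inv = np.linalg.inv(L @ L.T + p * np.eye(p))  # symmetric positive definite
d = rng.normal(size=p)                          # plays the role of beta - mu

v = A_inv @ d
rhs = float(np.sqrt(d @ A_inv @ A_inv @ d))     # sqrt(d^T A^{-2} d) = ||v||

coordinates_bounded = bool(np.all(np.abs(v) <= rhs + 1e-12))
```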
Proof of Lemma 1
Let $\alpha_n = \sqrt{p_n/n}$. We will show that for any given $\epsilon > 0$, there exists a constant $C$ such that
\[
P\left( \sup_{\|u\| = C} l_n(\beta_0 + \alpha_n u) < l_n(\beta_0) \right) \ge 1 - \epsilon \tag{A.4}
\]
for large enough $n$. This will imply that, with probability tending to 1, the unique maximum $\hat{\beta}_n$ lies in the ball $\{\beta_0 + \alpha_n u : \|u\| \le C\}$, i.e., $\|\hat{\beta}_n - \beta_0\| = O_p(\alpha_n)$.
Note that
\begin{align*}
l_n(\beta_0 + \alpha_n u) - l_n(\beta_0)
&= \alpha_n \sum_{i=1}^n X_i z_i^T u - \sum_{i=1}^n \left[ \psi(z_i^T(\beta_0 + \alpha_n u)) - \psi(z_i^T \beta_0) \right] \\
&= \alpha_n \sum_{i=1}^n \left( X_i - \psi'(z_i^T \beta_0) \right) z_i^T u - \frac{\alpha_n^2}{2} \sum_{i=1}^n \psi''(z_i^T \beta_0)(z_i^T u)^2 - \frac{\alpha_n^3}{6} \sum_{i=1}^n \psi'''(\theta_i^*)(z_i^T u)^3 \\
&= I_1 + I_2 + I_3, \text{ say},
\end{align*}
where $\theta_i^*$ lies between $z_i^T \beta_0$ and $z_i^T(\beta_0 + \alpha_n u)$, for every $1 \le i \le n$.
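The decomposition above is a standard third-order Taylor expansion of each summand. As an illustrative check (logistic cumulant $\psi(t) = \log(1 + e^t)$, arbitrary base point and direction), the remainder left after the first- and second-order terms, which is the $I_3$-type term, shrinks like $\alpha_n^3$:

```python
import math

# Illustrative check of the Taylor expansion behind I1 + I2 + I3 for one
# summand: with psi(t) = log(1 + e^t), the remainder after the first- and
# second-order terms scales like alpha^3, so halving alpha should shrink
# it by roughly 2^3 = 8.
def psi(t):
    return math.log1p(math.exp(t))

def psi1(t):
    return 1.0 / (1.0 + math.exp(-t))   # psi'

def psi2(t):
    s = psi1(t)
    return s * (1.0 - s)                # psi''

t0, du = 0.3, 0.7    # arbitrary values of z_i^T beta_0 and z_i^T u

def remainder(alpha):
    exact = psi(t0 + alpha * du) - psi(t0)
    two_term = psi1(t0) * alpha * du + 0.5 * psi2(t0) * (alpha * du) ** 2
    return abs(exact - two_term)

ratio = remainder(1e-2) / remainder(5e-3)   # expect roughly 8
```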
Note that by (A-1), $z_i^T \beta_0$ is uniformly bounded (over $i$ and $n$) and $\psi''(\cdot)$ is a continuous function. Hence, $\psi''(z_i^T \beta_0)$ is also uniformly bounded (over $i$ and $n$) by, say, $K_1$. It follows that
\begin{align*}
E\left[ \left( \sum_{i=1}^n \left( X_i - \psi'(z_i^T \beta_0) \right) z_i^T u \right)^2 \right]
&= \sum_{i=1}^n E\left[ \left( X_i - \psi'(z_i^T \beta_0) \right)^2 \right] (z_i^T u)^2 \quad \text{(since the $X_i$'s are independent and $E[X_i] = \psi'(z_i^T \beta_0)$)} \\
&= \sum_{i=1}^n (z_i^T u)^2 \, \psi''(z_i^T \beta_0) \quad \text{(since $E[(X_i - \psi'(z_i^T \beta_0))^2] = \psi''(z_i^T \beta_0)$)} \\
&\le K_1 \sum_{i=1}^n (z_i^T u)^2 = n K_1 u^T \left( \frac{1}{n} \sum_{i=1}^n z_i z_i^T \right) u \le n K_1 C_2 \|u\|^2.
\end{align*}
The last step follows by (A-0). Hence $E\left[ \left( \sum_{i=1}^n (X_i - \psi'(z_i^T \beta_0)) z_i^T u \right)^2 \right] = O(n) \|u\|^2$. Thus,
\[
I_1 = O_p(\alpha_n \sqrt{n}) \|u\| = O_p(\sqrt{p_n}) \|u\|. \tag{A.5}
\]
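The computation above uses the standard exponential-family identities $E[X_i] = \psi'(z_i^T \beta_0)$ and $\mathrm{Var}(X_i) = \psi''(z_i^T \beta_0)$. As an illustrative check for the Bernoulli case with canonical (logit) link, $\psi(t) = \log(1 + e^t)$, numerically differentiating $\psi$ recovers the familiar mean $p$ and variance $p(1-p)$:

```python
import math

# Exponential-family identities E[X] = psi'(theta), Var(X) = psi''(theta)
# for a Bernoulli outcome with canonical logit link: psi(t) = log(1 + e^t),
# psi'(t) = sigmoid(t) = p, psi''(t) = p(1 - p). theta is arbitrary.
def psi(t):
    return math.log1p(math.exp(t))

theta = 0.8
p_success = 1.0 / (1.0 + math.exp(-theta))   # Bernoulli mean p = psi'(theta)
var = p_success * (1.0 - p_success)          # Bernoulli variance p(1 - p)

h = 1e-5
psi1_num = (psi(theta + h) - psi(theta - h)) / (2 * h)                   # ~ psi'
psi2_num = (psi(theta + h) - 2 * psi(theta) + psi(theta - h)) / (h * h)  # ~ psi''
```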
Note that $\psi$ is a strictly convex function and hence $\psi''(\cdot) > 0$. Since $\psi''$ is continuous, it follows that its infimum on a bounded interval is strictly positive. By (A-1), $z_i^T \beta_0$ is uniformly bounded. This implies $\psi''(z_i^T \beta_0)$ is uniformly bounded below by a positive constant, say $K_2$. Hence
\[
I_2 = -\frac{\alpha_n^2}{2} \sum_{i=1}^n \psi''(z_i^T \beta_0)(z_i^T u)^2
\le -K_2 \frac{\alpha_n^2}{2} \sum_{i=1}^n (z_i^T u)^2
= -K_2 \frac{\alpha_n^2}{2} \, n u^T \left( \frac{1}{n} \sum_{i=1}^n z_i z_i^T \right) u < 0,
\]
by (A-0). Also, by (A-0) and the arguments above,
\[
|I_2| \ge K_2 \frac{\alpha_n^2}{2} \, n u^T \left( \frac{1}{n} \sum_{i=1}^n z_i z_i^T \right) u
\ge K_2 \frac{\alpha_n^2}{2} \, n C_1 \|u\|^2
= \frac{C_1 K_2}{2} \, p_n \|u\|^2. \tag{A.6}
\]
Now, since $\theta_i^*$ lies between $z_i^T \beta_0$ and $z_i^T(\beta_0 + \alpha_n u)$, it follows by (A-0) and (A-1) that
\begin{align*}
|\theta_i^*| &< \max_{1 \le i \le n} \left\{ |z_i^T \beta_0|, \, |z_i^T(\beta_0 + \alpha_n u)| \right\} \\
&< \max_{1 \le i \le n} \left\{ K, \, K + \alpha_n |z_i^T u| \right\} \\
&\le K + \max_{1 \le i \le n} \alpha_n \|z_i\| \, \|u\| \\
&\le K + \sqrt{\frac{p_n}{n}} \, O(\sqrt{p_n}) \|u\| \\
&= K + O\!\left( \frac{p_n}{\sqrt{n}} \right) \|u\|.
\end{align*}
Hence $\psi'''(\theta_i^*)$ is uniformly bounded by, say, $K_3$. Thus,
\begin{align*}
|I_3| = \frac{\alpha_n^3}{6} \left| \sum_{i=1}^n \psi'''(\theta_i^*)(z_i^T u)^3 \right|
&\le K_3 \frac{\alpha_n^3}{6} \sum_{i=1}^n |z_i^T u|^3 \\
&\le K_3 \frac{p_n^{3/2}}{6 n^{3/2}} \sum_{i=1}^n (\|z_i\| \, \|u\|)^3 \\
&= \frac{K_3 M^{3/2} p_n^3}{6 \sqrt{n}} \|u\|^3. \tag{A.7}
\end{align*}
The last step follows by (A-0). Since $p_n^6/n \to 0$ as $n \to \infty$, it follows by (A.5)–(A.7) that the order of $I_2$ dominates the orders of $I_1$ and $I_3$ (for a suitable choice of $\|u\|$). Since $I_2$ is negative, the assertion in (A.4) holds.
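As an illustrative numerical reading of this conclusion (values are arbitrary; $p_n = n^{1/7}$ is one growth rate satisfying $p_n^6/n \to 0$), plugging the orders from (A.5)–(A.7) into growing $n$ with a fixed radius $\|u\| = C$ shows both $|I_1|/|I_2|$ and $|I_3|/|I_2|$ vanishing, so the negative $I_2$ term dominates:

```python
# Illustrative order comparison (not part of the proof): plug the orders
#   |I1| ~ sqrt(p_n) * C,  |I2| ~ p_n * C^2,  |I3| ~ p_n^3 / sqrt(n) * C^3
# into the growth rate p_n = n^(1/7) (so p_n^6 / n -> 0) with fixed C.
C = 10.0
ratios = []
for n in (10**4, 10**6, 10**8):
    p = n ** (1.0 / 7.0)
    i1 = p ** 0.5 * C
    i2 = p * C ** 2
    i3 = (p ** 3 / n ** 0.5) * C ** 3
    ratios.append((i1 / i2, i3 / i2))

i1_vanishes = ratios[0][0] > ratios[1][0] > ratios[2][0]
i3_vanishes = ratios[0][1] > ratios[1][1] > ratios[2][1]
```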