This document presents an asymptotic expansion of the posterior density in high dimensional generalized linear models. The main results are:
1) The authors prove a third order correct asymptotic expansion of the posterior density for generalized linear models with canonical link functions when the number of regressors grows with sample size.
2) This asymptotic expansion is then used to derive moment matching priors in the generalized linear model setting.
3) The expansion assumes the number of regressors grows such that $p_n^{6+\epsilon}/n \to 0$ as $n \to \infty$ for some small $\epsilon > 0$, which is stronger than prior work requiring only $p_n^4 \log p_n / n \to 0$.
S. Dasgupta et al. / Journal of Multivariate Analysis 131 (2014) 126–148
allowed to grow with the sample size n. In particular, it is assumed that $p_n^4 \log p_n / n \to 0$. Later, Ghosal [7] established
asymptotic normality of the posterior for linear regression models in a similar high dimensional setup as Ghosal [6]. In [8],
asymptotic normality of the posterior was established for exponential families as the number of parameters grows with the
sample size. Bontemps [3] extended the work of Ghosal [7] by permitting the model to be misspecified and the number of
regressors to grow proportionally to the sample size. Barron et al. [1] and Ghosal et al. [9] have considered the notion of
posterior consistency in nonparametric settings.
In this paper, we focus on generalized linear models (GLM) with canonical link function. The main objective of this paper
is to extend the asymptotic consistency result of Ghosal [6], by providing a third order correct asymptotic expansion of
the posterior density for GLM with canonical link function when the number of regressors grows to infinity at a certain rate
relative to the growth of the sample size n. Since a general link function is a one-to-one function of the canonical link function,
we can get a similar asymptotic expansion for the vector of regression parameters in the general case as well. The results
bear potential for the development of a variety of objective priors in this framework. The first step towards the development
of reference priors, probability matching priors, moment matching priors and others requires asymptotic expansions of
posteriors (cf. [11]). In particular, we use the asymptotic expansion to derive moment matching priors (introduced in [12])
in the GLM setting. To the best of the authors’ knowledge, identification of moment matching priors in this setting (both
when the number of regressors is bounded, and when the number of regressors increases with n) has not been undertaken
in the literature.
The paper is organized as follows. In Section 2, we introduce the model and provide the required assumptions. In Section 3,
we prove the main asymptotic expansion result (Theorem 1). In Section 4, we use this asymptotic expansion to derive
moment matching priors. The Appendix contains proofs which establish that the assumptions (in Section 2) on the prior
density are satisfied by the multivariate normal and multivariate t densities.
2. Preliminaries
2.1. Setup and assumptions
Let $X_1, \ldots, X_n$ be independent random variables. Let $f_i(\cdot)$ denote the density of $X_i$ with respect to a $\sigma$-finite measure $\nu$. Suppose
$$ f_i(x_i) = \exp[x_i \theta_i - \psi(\theta_i)], \quad i = 1, \ldots, n, \qquad (2.1) $$
where $\theta_i = z_i^T \beta$, $\beta = (\beta_1, \ldots, \beta_{p_n})^T$ is the vector of parameters, and $z_i = (z_{i1}, \ldots, z_{ip_n})^T$ is the vector of covariates for $i = 1, \ldots, n$. Note that we are allowing the dimension $p_n$ of the parameter $\beta$ to grow with the sample size $n$. Also, the cumulant generating function $\psi$ is infinitely differentiable and is assumed to be strictly convex. The above model is termed by Haberman [14] the "Dempster model".
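For concreteness, logistic regression is the special case of (2.1) with Bernoulli responses and $\psi(\theta) = \log(1 + e^\theta)$. The following sketch (our illustration, with hypothetical covariate and parameter values, not part of the paper) evaluates the density in (2.1) and checks that the Bernoulli probabilities sum to 1:

```python
import numpy as np

# Model (2.1) with canonical link: f_i(x_i) = exp[x_i * theta_i - psi(theta_i)],
# where theta_i = z_i^T beta. For Bernoulli responses (logistic regression),
# psi(theta) = log(1 + e^theta), which is strictly convex since
# psi''(theta) = sigmoid(theta) * (1 - sigmoid(theta)) > 0.

def psi(theta):
    # numerically stable log(1 + exp(theta))
    return np.logaddexp(0.0, theta)

def log_density(x, z, beta):
    """Log of f_i(x_i) in (2.1) for a single observation."""
    theta = z @ beta
    return x * theta - psi(theta)

# Hypothetical covariate vector and parameter (illustration only).
z = np.array([0.5, -1.0, 2.0])
beta = np.array([0.2, 0.1, -0.3])

# For x in {0, 1} the two density values are Bernoulli probabilities, so they sum to 1.
p1 = np.exp(log_density(1.0, z, beta))
p0 = np.exp(log_density(0.0, z, beta))
print(p0 + p1)
```

The same function handles other canonical-link families by swapping in the appropriate $\psi$ (e.g. $\psi(\theta) = e^\theta$ for Poisson responses).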
Let $\pi(\cdot)$ denote the prior density of $\beta$. Then the posterior density of $\beta$ given the observations $X_1, \ldots, X_n$ is defined by
$$ \pi(\beta \mid X) = \frac{\exp[l_n(\beta)]\,\pi(\beta)}{\int \exp[l_n(\beta)]\,\pi(\beta)\,d\beta}, \qquad (2.2) $$
where
$$ l_n(\beta) = \sum_{i=1}^{n} \left( X_i z_i^T \beta - \psi(z_i^T \beta) \right) $$
is the log-likelihood function. Note that the covariate vectors $z_1, \ldots, z_n$, the true parameter value $\beta_0$, the prior $\pi(\cdot)$, and the posterior $\pi(\cdot \mid X)$ all change with $n$. However, we suppress this dependence in our notation for simplicity of exposition. We now state the regularity conditions needed for our result.
• (A-0) The matrix $A_n$ defined by the relation
$$ A_n = \sum_{i=1}^{n} z_i z_i^T $$
is positive definite and the eigenvalues of $\frac{1}{n} A_n$ are uniformly bounded, i.e., there exist constants $C_1$ and $C_2$ (independent of $n$) such that the matrix $\frac{1}{n} A_n$ satisfies
$$ 0 < C_1 < \lambda_{\min}\!\left(\tfrac{1}{n} A_n\right) \le \lambda_{\max}\!\left(\tfrac{1}{n} A_n\right) < C_2 < \infty $$
for all $n$. Here $\lambda_{\max}$ and $\lambda_{\min}$ respectively denote the largest and smallest eigenvalues of the appropriate matrix. Further, we assume that $\|z_i\| = \sqrt{z_i^T z_i} = O(\sqrt{p_n})$. More specifically, there exists a constant $M$ (independent of $n$) such that $\|z_i\| \le M \sqrt{p_n}$.
• (A-1) Let $\beta_0$ denote the (sequence of) true value of the regression parameter vector $\beta$. Note that $\theta_{0i} = z_i^T \beta_0$ is the true value of the parameter $\theta_i$ in (2.1). We assume that $\max_{1 \le i \le n} |\theta_{0i}|$ is uniformly bounded as $n$ varies, i.e., there exists a constant $K$ (independent of $n$) such that
$$ \max_{1 \le i \le n} |z_i^T \beta_0| = \max_{1 \le i \le n} |\theta_{0i}| < K. \qquad (2.3) $$
As mentioned in [6,7], this assumption makes sense particularly if the data is free of extreme outliers. As in [6,7], we also assume that the parameter space is restricted to those values of $\beta$ for which
$$ \max_{1 \le i \le n} |z_i^T \beta| \le K', \qquad (2.4) $$
for some $K' > K$. This is equivalent to the statement that the parameter space is restricted to $\Theta_n$, where
$$ \Theta_n = \left\{ \beta : \max_{1 \le i \le n} |z_i^T \beta| \le K' \right\}. \qquad (2.5) $$
Note that $\Theta_n$ is a convex set. The posterior density of $\beta$ given the observations $X_1, \ldots, X_n$ (introduced in (2.2)) is more precisely given by
$$ \pi(\beta \mid X) = \frac{\exp[l_n(\beta)]\,\pi(\beta)}{\int_{\Theta_n} \exp[l_n(\beta)]\,\pi(\beta)\,d\beta}\, 1_{\{\beta \in \Theta_n\}}. \qquad (2.6) $$
We refer the reader to Ghosal [6,7] for details and discussion of this assumption. The summary is that a frequentist can think of this as a compactness assumption that prevents the posterior mass from escaping to infinity. A Bayesian can think of this as a convenient and reasonable prior belief about $\theta$. It should be noted that actual knowledge of $K$ and $K'$ is not required to obtain the main terms (up to the third order) in the expansion in Theorem 1. But $K$ and $K'$ do control the rate at which the $o_p(1)$ terms in the expansion converge to 0.
In this context it is also important to clarify that when we propose priors like the multivariate normal or multivariate $t$ for $\beta$, we implicitly truncate these priors to the region $\Theta_n$.
• (A-2) The prior density $\pi(\cdot)$ of $\beta$ satisfies $\int_{\Theta_n} \pi(\beta)\,d\beta = 1$ and $\pi(\beta_0) > \eta_0^{p_n}$ for some $\eta_0 > 0$ ($\eta_0$ does not depend on $n$). Also, $\pi(\cdot)$ is assumed to be twice continuously differentiable with
$$ \sup_{\|\beta - \beta_0\| \le C_n} \|\nabla \log \pi(\beta)\|_2 < M_1 p_n^{3/2} \quad \text{for some } M_1 > 0, \qquad (2.7) $$
and
$$ \sup_{\|\beta - \beta_0\| \le C_n} \max_{1 \le j, j' \le p_n} \left| \frac{1}{\pi(\beta)} \frac{\partial^2 \pi(\beta)}{\partial \beta_j \partial \beta_{j'}} \right| < M_2 p_n^{5/2} \quad \text{for some } M_2 > 0, \text{ where } C_n = \sqrt[4]{\frac{p_n}{n}}. \qquad (2.8) $$
This assumption is satisfied by appropriate multivariate $t$ and multivariate normal densities (see Appendix). Note that the prior density can be improper as a density on $\mathbb{R}^{p_n}$. We only assume that it has been normalized to integrate to 1 on the compact set $\Theta_n$.
• (A-3) The dimension $p_n$ can grow to infinity such that
$$ \frac{p_n^{6+\epsilon}}{n} \to 0 \quad \text{as } n \to \infty, \text{ for some small } \epsilon > 0. $$
Note that (A-3) is stronger than the corresponding assumption in [6], which only requires $p_n^4 \log p_n / n \to 0$. However, the goal in [6] is to establish asymptotic normality of the posterior, while our goal is to obtain a third order asymptotic expansion of the posterior. Hence it is not surprising that we need a slower rate of increase for $p_n$.
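Assumptions (A-0) and (A-3) can be probed numerically for a concrete design. The sketch below is our illustration, not part of the paper: the random Gaussian design, the sample size, and $\epsilon = 0.1$ are all hypothetical choices (the paper treats the $z_i$ as given and only requires the stated bounds):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: n observations, p_n regressors.
n, p_n = 5000, 3
Z = rng.normal(size=(n, p_n))   # rows are the covariate vectors z_i^T

# (A-0): eigenvalues of (1/n) A_n = (1/n) sum_i z_i z_i^T should be bounded
# away from 0 and infinity; for this design both are near 1.
eigvals = np.linalg.eigvalsh((Z.T @ Z) / n)
print(eigvals.min(), eigvals.max())

# (A-0), second part: ||z_i|| <= M sqrt(p_n); report the empirical M for this sample.
max_norm = np.linalg.norm(Z, axis=1).max()
print(max_norm / np.sqrt(p_n))

# (A-3): the quantity p_n^(6+eps)/n that must vanish as n grows, for eps = 0.1.
eps = 0.1
print(p_n ** (6 + eps) / n)
```

For a growing sequence such as $p_n = \lfloor n^{1/7} \rfloor$, the last ratio tends to 0, so (A-3) holds; for $p_n \asymp n^{1/6}$ it does not.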
2.2. Asymptotic convergence rate for the MLE
Let $\hat\beta_n$ be the maximum likelihood estimator of $\beta$. It follows from the strict convexity of $\psi$ and assumption (A-0) that the Hessian matrix of $l_n(\beta)$ is negative definite for all $\beta$. Hence $l_n(\beta)$ is a strictly concave function and has a unique maximum. The following lemma (Lemma 1) establishes weak consistency of the maximum likelihood estimator $\hat\beta_n$, and provides an asymptotic rate of convergence. This lemma is helpful in proving the main result (Theorem 1). Haberman [14] established consistency and asymptotic normality for the MLE in exponential response models, a more general version of the Dempster model considered here, when $p_n^3/n \to 0$. However, it is not quite clear whether Haberman's results can be used under our assumptions to obtain the asymptotic rate in Lemma 1. Hence, for the sake of completeness, we provide an independent proof of Lemma 1 in the Appendix by adapting the approach of Fan and Peng [5] (in the i.i.d. setting) to the GLM setting.
We briefly mention some other works on high dimensional consistency and asymptotic normality of the MLE, and the differences between our setup and the setups in those papers. Portnoy [19,20] established consistency and asymptotic normality of M-estimators in the context of linear regression, as the number of regression parameters $p_n$ grows with the sample size $n$ (satisfying the condition $(p_n \log p_n)^{3/2}/n \to 0$).¹ Portnoy [21] established consistency and asymptotic normality
¹ See [19,20] for references to earlier works in this area.
of the MLE for i.i.d. observations from exponential families, as the number of parameters $p_n$ grows with the sample size $n$ (satisfying the condition $p_n^{3/2}/n \to 0$). This is a different setting from the regression based setting (with covariates) considered in this paper. Fan and Peng [5] established high dimensional consistency and asymptotic normality of penalized likelihood estimators (the MLE can be thought of as a special case). However, they considered the i.i.d. setting, which is different from the setting in this paper. Zhang et al. [23] considered penalized pseudo-likelihood estimators for high dimensional GLM. However, their Bregman divergence based loss functions do not include the negative log-likelihood loss function. More specifically, in the context of GLM with canonical link, Zhang et al.'s [23] loss function looks like
$$ \sum_{i=1}^{n} \left[ -q(X_i) + q\big(\psi'(z_i^T \beta)\big) + \big(X_i - \psi'(z_i^T \beta)\big)\, q'\big(\psi'(z_i^T \beta)\big) \right], \qquad (2.9) $$
where $q(\cdot)$ is a concave function. The log-likelihood function $l_n(\beta)$ cannot be written in this form. A proof of high dimensional asymptotic normality of $\hat\beta_n$ in the special case of logistic regression is provided in [18].
Lemma 1. Under assumptions (A-0)–(A-3), the maximum likelihood estimator $\hat\beta_n$ satisfies $\|\hat\beta_n - \beta_0\| = O_p\big(\sqrt{p_n/n}\big)$.
Remark 1. Note that by Lemma 1 and (A-0),
$$ |z_i^T(\hat\beta_n - \beta_0)| \le \|z_i\|\, \|\hat\beta_n - \beta_0\| = O_p\!\left( \frac{p_n}{\sqrt{n}} \right). $$
By (A-1), it follows that
$$ \hat\beta_n \in \left\{ \beta : \max_{1 \le i \le n} |z_i^T \beta| < \frac{K + K'}{2} \right\} \qquad (2.10) $$
with probability tending to 1 as $n \to \infty$. In particular, we get $\hat\beta_n \in \Theta_n$ with probability tending to 1 as $n \to \infty$.
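The $O_p(\sqrt{p_n/n})$ rate in Lemma 1 can be illustrated by simulation in the logistic special case. The snippet below is our sketch (the design, true parameter, and sample sizes are hypothetical); it computes the MLE by Newton-Raphson and compares the estimation error at two sample sizes against $\sqrt{p_n/n}$:

```python
import numpy as np

rng = np.random.default_rng(1)

def logistic_mle(Z, X, n_iter=50):
    """Newton-Raphson for the canonical-link Bernoulli GLM:
    maximizes l_n(beta) = sum_i (X_i z_i^T beta - psi(z_i^T beta))."""
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(Z @ beta)))     # psi'(z_i^T beta)
        grad = Z.T @ (X - mu)                       # gradient of l_n
        hess = -(Z.T * (mu * (1.0 - mu))) @ Z       # Hessian of l_n (negative definite)
        beta = beta - np.linalg.solve(hess, grad)   # Newton update
    return beta

p_n = 4
beta0 = np.full(p_n, 0.3)   # hypothetical true parameter
errs = []
for n in (1000, 16000):
    Z = rng.normal(size=(n, p_n))
    X = rng.binomial(1, 1.0 / (1.0 + np.exp(-(Z @ beta0))))
    errs.append(np.linalg.norm(logistic_mle(Z, X) - beta0))
    print(n, errs[-1], np.sqrt(p_n / n))   # error shrinks roughly at the sqrt(p_n/n) rate
```

A single run only suggests the rate; averaging the error over many replications at each $n$ would track $\sqrt{p_n/n}$ more closely.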
3. Main result
In this section, we derive our main result: a third order correct asymptotic expansion of the posterior $\pi(\cdot \mid X)$ around an appropriate normal density. We transform the parameter $\beta$ to $g = \sqrt{n}(\beta - \hat\beta_n)$. Since the parameter space for $\beta$ is $\Theta_n$, it follows that the parameter space for $g$ is
$$ G_n := \left\{ g : \hat\beta_n + \frac{g}{\sqrt{n}} \in \Theta_n \right\}. $$
From (2.2) we obtain that the posterior density of $g$ is given by
$$ \pi^*(g \mid X) = \frac{\exp\!\left[ l_n\!\left( \hat\beta_n + \frac{g}{\sqrt{n}} \right) - l_n(\hat\beta_n) \right] \pi\!\left( \hat\beta_n + \frac{g}{\sqrt{n}} \right)}{\int_{G_n} \exp\!\left[ l_n\!\left( \hat\beta_n + \frac{g}{\sqrt{n}} \right) - l_n(\hat\beta_n) \right] \pi\!\left( \hat\beta_n + \frac{g}{\sqrt{n}} \right) dg}\; 1_{\{g \in G_n\}}. \qquad (3.1) $$
We now prove a series of lemmas which help us to prove our main result (Theorem 1). We first show that Ghosal’s [6] result
on posterior consistency holds under our assumptions.
Lemma 2. Under assumptions (A-0)–(A-3) described above,
$$ \int \left| \pi^*(g \mid X) - N_{p_n}(g \mid \mu_n, \Sigma_n) \right| dg \to 0, \qquad (3.2) $$
where $N_{p_n}(g \mid \mu_n, \Sigma_n)$ is a $p_n$-dimensional normal density with mean vector
$$ \mu_n = \sqrt{n}\, B_n^{-1} \sum_{i=1}^{n} \big( X_i - \psi'(z_i^T \beta_0) \big) z_i - \sqrt{n}\,(\hat\beta_n - \beta_0), $$
and inverse covariance matrix
$$ \Sigma_n^{-1} = \frac{1}{n} B_n = \frac{1}{n} \sum_{i=1}^{n} \psi''(z_i^T \beta_0)\, z_i z_i^T. $$
Proof. We verify that the assumptions in [6] follow from (A-0)–(A-3). Note that Ghosal [6, Eqs. (2.6) and (2.7)] follows immediately from our assumptions (A-1) and (A-2). Let $\delta_n = \|A_n^{-1/2}\|$. By (A-0), it follows that $\delta_n = O(n^{-1/2})$. Note that by
(A-2), if $\|\beta - \beta_0\| \le \sqrt[4]{p_n/n}$, then by the mean value theorem,
$$ |\log \pi(\beta) - \log \pi(\beta_0)| \le \sup_{\|\beta - \beta_0\| \le \sqrt[4]{p_n/n}} \|\nabla \log \pi(\beta)\| \, \|\beta - \beta_0\| \le M_1 p_n^{3/2} \|\beta - \beta_0\|. $$
Note that
$$ p_n (\log p_n)^{1/2} \delta_n = O\!\left( \frac{p_n^{1+\epsilon/3}}{\sqrt{n}} \right) = o\!\left( \sqrt[4]{\frac{p_n}{n}} \right). $$
Hence, Ghosal [6, Eq. (2.8)] is satisfied with Kn = M1p
3/2
n . Note that
Knδnpn(log pn)1/2
=
p
5/2+ϵ/3
n
√
n
→ 0.
Let ηn = max1≤i≤n ∥A
−1/2
n zi∥. Then
ηn ≤ ∥A−1/2
n ∥ max
1≤i≤n
∥zi∥ = O
pn
n
,
where
A
− 1
2
n
= sup
A
− 1
2
n x
∥x∥
: x ∈ Rn
with x ̸= 0
.
This means
p3/2
n (log pn)1/2
ηn = O
p3/2+ϵ/3
n
pn
n
= O
p
2+ϵ/3
n
√
n
→ 0.
Hence, Ghosal [6, Eq. (2.10)] is satisfied.
Now, since 1
n
n
i=1 zizT
i has uniformly bounded eigenvalues (by (A-0)), hence
tr
1
n
n
i=1
zizT
i
= O(pn).
Elementary manipulations using properties of trace imply that
n
i=1
pn
j=1
z2
ij = tr
n
i=1
zizT
i
= O(npn).
Thus, Ghosal [6, Eq. (2.11)] is also satisfied. Hence, all the assumptions in [6] hold. The lemma now follows from Theorem
2.1 of Ghosal [6] (using a straightforward linear transformation).
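The matrices in Lemma 2 are directly computable. Below is a hypothetical numerical sketch (not from the paper) of $B_n$ and $\Sigma_n^{-1} = \frac1n B_n$ for the logistic model, where $\psi''(t) = \sigma(t)(1-\sigma(t))$; the design and parameter values are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch of the Lemma 2 quantities for a logistic GLM:
# B_n = sum_i psi''(z_i^T beta_0) z_i z_i^T and Sigma_n^{-1} = B_n / n.
rng = np.random.default_rng(1)
n, p = 1000, 3
Z = rng.uniform(-1.0, 1.0, size=(n, p))
beta0 = np.array([0.5, -0.2, 0.1])               # assumed true parameter
mu = 1.0 / (1.0 + np.exp(-(Z @ beta0)))          # psi'(z_i^T beta_0)
w = mu * (1.0 - mu)                              # psi''(z_i^T beta_0)
B_n = (Z * w[:, None]).T @ Z                     # sum_i psi'' z_i z_i^T
Sigma_inv = B_n / n                              # inverse covariance of Lemma 2
eigs = np.linalg.eigvalsh(Sigma_inv)             # should be bounded away from 0
```

In this bounded design the eigenvalues of $\Sigma_n^{-1}$ stay bounded above and below, which is the content of assumption (A-0) combined with the strict convexity of $\psi$.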
Define the function
\[ Z_n(g) := \exp\Big(\ell_n\Big(\hat\beta_n + \frac{g}{\sqrt n}\Big) - \ell_n(\hat\beta_n)\Big). \tag{3.3} \]
Note that
\[ \pi^*(g\mid X) = \frac{Z_n(g)\,\pi\big(\hat\beta_n + \frac{g}{\sqrt n}\big)}{\int_{G_n} Z_n(g)\,\pi\big(\hat\beta_n + \frac{g}{\sqrt n}\big)\,dg}\,\mathbf 1_{\{g\in G_n\}}. \]
Henceforth, we assume that $p_n \to \infty$. If $p_n$ is uniformly bounded, a simple modification of the arguments below can be used to establish the asymptotic expansion result. See Section 4.1.

Lemma 3. Let $C_n := \{g : g \in G_n,\ \|g\| \le p_n^{\frac12+\epsilon'}\}$ and $K_n = \pi(\hat\beta_n)(2\pi)^{p_n/2}\big|{-\frac{\nabla^2\ell_n(\hat\beta_n)}{n}}\big|^{-1/2}$, where $\epsilon' = \frac{\epsilon}{6}$. Then,
\[ \int_{C_n}\frac{1}{K_n}Z_n(g)\,\pi\Big(\hat\beta_n + \frac{g}{\sqrt n}\Big)\,dg \xrightarrow{P} 1. \tag{3.4} \]
Proof. Note that $Z_n(g) = \exp\big(\ell_n\big(\hat\beta_n + \frac{g}{\sqrt n}\big) - \ell_n(\hat\beta_n)\big)$. By a third order correct Taylor series expansion of $\ell_n$ around $\hat\beta_n$, we get
\[ Z_n(g) = \exp\Big(\frac{g^T\nabla^2\ell_n(\hat\beta_n)g}{2n} - \frac{1}{6n^{3/2}}\sum_{i=1}^n\psi'''(z_i^T\beta_n^*)\Big(\sum_{r=1}^{p_n}z_{ir}g_r\Big)^3\Big), \tag{3.5} \]
where $\beta_n^* = \beta_n^*(g)$ is an intermediate point on the line joining $\hat\beta_n$ and $\hat\beta_n + \frac{g}{\sqrt n}$. Note that by Lemma 1, $\hat\beta_n \in \Theta_n$ with probability tending to 1. Also, by the definition of $G_n$ it follows that $\hat\beta_n + \frac{g}{\sqrt n} \in \Theta_n$ for every $g \in G_n$. It follows by the convexity of $\Theta_n$ and (2.10) that
\[ P\big(\beta_n^*(g) \in \Theta_n\ \forall g \in G_n\big) \to 1, \tag{3.6} \]
as $n\to\infty$. Also, if $g \in C_n$,
\[ \Big|\sum_{r=1}^{p_n}z_{ir}g_r\Big| \le \|z_i\|\,\|g\| \le M\sqrt{p_n}\,p_n^{\frac12+\epsilon'} = Mp_n^{1+\epsilon'}. \tag{3.7} \]
Let
\[ K_2 := \sup_{x\in[-K',K']}|\psi'''(x)|. \]
Note that $K_2 < \infty$ by continuity of $\psi'''$. Hence, if $\hat\beta_n \in \Theta_n$ and $g \in C_n$,
\[ \Big|\frac{1}{n^{3/2}}\sum_{i=1}^n\psi'''(z_i^T\beta_n^*)\Big(\sum_{r=1}^{p_n}z_{ir}g_r\Big)^3\Big| \le \frac{K_2}{n^{3/2}}\sum_{i=1}^n\Big|\sum_{r=1}^{p_n}z_{ir}g_r\Big|^3 \le K_2M^3\,\frac{p_n^{3+3\epsilon'}}{\sqrt n}. \tag{3.8} \]
The previous inequality follows by (3.7). It follows by (A-3) that
\[ \sup_{g\in C_n}\Big|\frac{1}{n^{3/2}}\sum_{i=1}^n\psi'''(z_i^T\beta_n^*)\Big(\sum_{r=1}^{p_n}z_{ir}g_r\Big)^3\Big| = O_p\Big(\frac{p_n^{3+\frac{\epsilon}{2}}}{\sqrt n}\Big) = o_p(1). \tag{3.9} \]
Also, by (A-2), it follows that if $g \in C_n$, then
\[ \frac{\pi\big(\hat\beta_n + \frac{g}{\sqrt n}\big)}{\pi(\hat\beta_n)} = \exp\Big(\log\pi\Big(\hat\beta_n + \frac{g}{\sqrt n}\Big) - \log\pi(\hat\beta_n)\Big) = \exp\Big(\frac{(\nabla\log\pi(\beta_n^{**}))^Tg}{\sqrt n}\Big), \]
for some intermediate point $\beta_n^{**} = \beta_n^{**}(g)$ on the line joining $\hat\beta_n$ and $\hat\beta_n + \frac{g}{\sqrt n}$. Note by Lemma 1 and (A-3) that $\sup_{g\in C_n}\|\beta_n^{**} - \beta_0\| = o_p\big(\sqrt[4]{p_n/n}\big)$. It follows by (A-2) that
\[ \sup_{g\in C_n}\Big|\frac{(\nabla\log\pi(\beta_n^{**}))^Tg}{\sqrt n}\Big| \le \sup_{g\in C_n}\frac{\|\nabla\log\pi(\beta_n^{**})\|\,\|g\|}{\sqrt n} = O_p\Big(\frac{p_n^{2+\epsilon'}}{\sqrt n}\Big) = o_p(1). \tag{3.10} \]
It follows by (3.5), (3.9), (3.10) and the definition of $K_n$ that
\[ \int_{C_n}\frac{1}{K_n}Z_n(g)\,\pi\Big(\hat\beta_n + \frac{g}{\sqrt n}\Big)\,dg = \exp(o_p(1))\int_{C_n}N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg, \tag{3.11} \]
where $\hat\Sigma_n = \big({-\frac{\nabla^2\ell_n(\hat\beta_n)}{n}}\big)^{-1} = \big(\frac{1}{n}\sum_{i=1}^n\psi''(z_i^T\hat\beta_n)z_iz_i^T\big)^{-1}$. Note that if $U_n \sim N_{p_n}(0,\hat\Sigma_n)$, then
\[ \sup_{1\le i\le n}\Big|z_i^T\Big(\hat\beta_n + \frac{U_n}{\sqrt n}\Big)\Big| > K' \ \Rightarrow\ \sup_{1\le i\le n}|z_i^TU_n| > \sqrt n\Big(K' - \sup_{1\le i\le n}|z_i^T\hat\beta_n|\Big) \ \Rightarrow\ \|U_n\| > \sqrt{\frac{n}{p_nM^2}}\Big(K' - \sup_{1\le i\le n}|z_i^T\hat\beta_n|\Big) \quad \text{(by (A-0))}. \tag{3.12} \]
By the strict convexity of $\psi$, Lemma 1 and (A-0), it follows that
\[ E_{N_{p_n}(0,\hat\Sigma_n)}\|U_n\|^2 = \mathrm{trace}(\hat\Sigma_n) = O_p(p_n). \tag{3.13} \]
Also by (2.10), it follows that
\[ \frac{1}{K' - \sup_{1\le i\le n}|z_i^T\hat\beta_n|} = O_p(1). \tag{3.14} \]
Note that $C_n^c = G_n^c \cup \{g : \|g\| > p_n^{\frac12+\epsilon'}\}$. A simple application of Markov's inequality, along with (3.12)–(3.14), yields that
\[ \int_{C_n^c}N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg \le \int_{G_n^c}N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg + \int_{\{g:\|g\|\ge p_n^{1/2+\epsilon'}\}}N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg \le \frac{E\|U_n\|^2}{\frac{n}{p_nM^2}\big(K' - \sup_{1\le i\le n}|z_i^T\hat\beta_n|\big)^2} + \frac{E\|U_n\|^2}{p_n^{1+2\epsilon'}} = O_p\Big(\frac{p_n^2}{n}\Big) + O_p(p_n^{-2\epsilon'}) = o_p(1). \tag{3.15} \]
It follows by (3.11) and (3.15) that
\[ \int_{C_n}\frac{1}{K_n}Z_n(g)\,\pi\Big(\hat\beta_n + \frac{g}{\sqrt n}\Big)\,dg \xrightarrow{P} 1 \]
as $n\to\infty$. □
Lemma 4.
\[ \int_{G_n\setminus C_n}\pi^*(g\mid X)\,dg = o_p(1). \tag{3.16} \]
Proof. Let $U_n \sim N_{p_n}(\mu_n,\Sigma_n)$, where $\mu_n$ and $\Sigma_n$ are as defined in the statement of Lemma 2. Note that
\[ \|\mu_n\| \le \Big\|\sqrt n\,B_n^{-1}\sum_{i=1}^n\big(X_i - \psi'(z_i^T\beta_0)\big)z_i\Big\| + O_P(\sqrt{p_n}). \tag{3.17} \]
Since $B_n^{-1} = \frac{1}{n}\Sigma_n$ and $\|\Sigma_n\| = O(1)$, it follows that
\[ E\Big\|\sqrt n\,B_n^{-1}\sum_{i=1}^n\big(X_i - \psi'(z_i^T\beta_0)\big)z_i\Big\|^2 = O\Big(E\Big\|\frac{1}{\sqrt n}\sum_{i=1}^n\big(X_i - \psi'(z_i^T\beta_0)\big)z_i\Big\|^2\Big) = O\Big(\frac{1}{n}\sum_{i=1}^nE\big[(X_i - \psi'(z_i^T\beta_0))^2\big]z_i^Tz_i\Big) \quad (\because X_i\text{'s are independent}) = O\Big(\frac{1}{n}\sum_{i=1}^n\psi''(z_i^T\beta_0)\,z_i^Tz_i\Big). \]
Since $\max_{1\le i\le n}\psi''(z_i^T\beta_0) = O(1)$ by (A-1) and the continuity of $\psi''$, the last expression is
\[ O\Big(\frac{1}{n}\max_{1\le i\le n}\psi''(z_i^T\beta_0)\sum_{i=1}^nz_i^Tz_i\Big) = O\Big(\frac{1}{n}\sum_{i=1}^nz_i^Tz_i\Big) = O(p_n) \quad \text{(by (A-0))}. \]
It follows by (3.17) that
\[ \|\mu_n\| = O_p(\sqrt{p_n}). \tag{3.18} \]
Hence
\[ E_{N_{p_n}(\mu_n,\Sigma_n)}\|U_n\|^2 = \mathrm{trace}(\Sigma_n) + \|\mu_n\|^2 = O_p(p_n). \]
By exactly the same argument as the one leading to Eq. (3.15) in the proof of Lemma 3, it follows that $\int_{C_n^c}N_{p_n}(g\mid\mu_n,\Sigma_n)\,dg = o_p(1)$. The result now follows by using Lemma 2. □
Lemma 5.
\[ \int_{G_n}\frac{1}{K_n}Z_n(g)\,\pi\Big(\hat\beta_n + \frac{g}{\sqrt n}\Big)\,dg \xrightarrow{P} 1. \]
Proof. Note that by Lemma 4,
\[ \int_{G_n\setminus C_n}\pi^*(g\mid X)\,dg = \frac{\int_{G_n\setminus C_n}\frac{1}{K_n}Z_n(g)\,\pi\big(\hat\beta_n + \frac{g}{\sqrt n}\big)\,dg}{\int_{G_n}\frac{1}{K_n}Z_n(g)\,\pi\big(\hat\beta_n + \frac{g}{\sqrt n}\big)\,dg} \to 0. \]
Hence,
\[ \frac{\int_{G_n\setminus C_n}\frac{1}{K_n}Z_n(g)\,\pi\big(\hat\beta_n + \frac{g}{\sqrt n}\big)\,dg}{\int_{C_n}\frac{1}{K_n}Z_n(g)\,\pi\big(\hat\beta_n + \frac{g}{\sqrt n}\big)\,dg + \int_{G_n\setminus C_n}\frac{1}{K_n}Z_n(g)\,\pi\big(\hat\beta_n + \frac{g}{\sqrt n}\big)\,dg} \to 0. \tag{3.19} \]
Now, by Lemma 3,
\[ \int_{C_n}\frac{1}{K_n}Z_n(g)\,\pi\Big(\hat\beta_n + \frac{g}{\sqrt n}\Big)\,dg \xrightarrow{P} 1. \]
The result follows by (3.19). □
We now state and prove the main result of the paper.

Theorem 1. Suppose $\beta \in \mathbb R^{p_n}$ satisfies $\sqrt n\|\beta - \hat\beta_n\| \le p_n^{\frac12+\frac{\epsilon}{6}}$ for every $n$. This is equivalent to the assumption that $g \in C_n$. In such a case, under assumptions (A-0)–(A-3),
\[ \pi^*(g\mid X) = N_{p_n}(g\mid 0,\hat\Sigma_n)\Big\{1 - \frac{1}{6n^{3/2}}\sum_{r,s,t=1}^{p_n}\sum_{i=1}^n\psi'''(z_i^T\hat\beta_n)\,g_rg_sg_t\,z_{ir}z_{is}z_{it} + \frac{1}{\sqrt n}\sum_{v=1}^{p_n}g_v\big(\nabla\log\pi(\hat\beta_n)\big)_v - \frac{1}{6n^{3/2}}\sum_{r,s,t=1}^{p_n}\sum_{i=1}^n\psi'''(z_i^T\hat\beta_n)\,g_rg_sg_t\,z_{ir}z_{is}z_{it}\cdot\frac{1}{\sqrt n}\sum_{v=1}^{p_n}g_v\big(\nabla\log\pi(\hat\beta_n)\big)_v + R(g)\Big\}\times\big(1 - o_p(1)\big), \tag{3.20} \]
where $\sup_{g\in C_n}|R(g)| = O_p\big(\frac{p_n^{6+\epsilon}}{n}\big)$ and $N_{p_n}(g\mid 0,\hat\Sigma_n)$ is a $p_n$-dimensional normal density with mean vector $0$ and covariance matrix $\hat\Sigma_n = \big({-\frac{\nabla^2\ell_n(\hat\beta_n)}{n}}\big)^{-1}$, evaluated at $g$.

Remark 2. Note that by Lemma 4, the posterior probability that $g$ does not lie in $C_n$ converges to 0.
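The leading correction in (3.20) is easy to evaluate numerically for a concrete model. The sketch below (hypothetical, not from the paper) computes the cubic term $-\frac{1}{6n^{3/2}}\sum_i\psi'''(z_i^T\hat\beta_n)(z_i^Tg)^3$ for the logistic model, where $\psi'''(t) = \sigma(t)(1-\sigma(t))(1-2\sigma(t))$, with a flat prior so that the $\nabla\log\pi$ terms vanish; `beta_hat` and `g` are illustrative stand-ins.

```python
import numpy as np

# Illustrative sketch: the leading (cubic) correction term of Theorem 1 for a
# logistic GLM with a flat prior.  For fixed g it is O(1/sqrt(n)).
rng = np.random.default_rng(2)
n, p = 2000, 3
Z = rng.uniform(-1.0, 1.0, size=(n, p))
beta_hat = np.array([0.2, -0.1, 0.05])           # stand-in for the MLE
g = np.array([0.5, 0.5, -0.5])                   # a point in C_n
s = 1.0 / (1.0 + np.exp(-(Z @ beta_hat)))
psi3 = s * (1.0 - s) * (1.0 - 2.0 * s)           # psi'''(z_i^T beta_hat)
cubic = -np.sum(psi3 * (Z @ g) ** 3) / (6.0 * n ** 1.5)
correction = 1.0 + cubic                         # bracketed factor, flat prior
```

With $|z_i^Tg|$ bounded and $n = 2000$, the cubic term is tiny, consistent with the $1/\sqrt n$ order of the leading correction for fixed $g$.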
Proof. Since $\nabla\ell_n(\hat\beta_n) = 0$, by a fourth order Taylor series expansion around $\hat\beta_n$, we have
\[ \ell_n\Big(\hat\beta_n + \frac{g}{\sqrt n}\Big) - \ell_n(\hat\beta_n) = \frac{1}{2n}g^T\nabla^2\ell_n(\hat\beta_n)g + \frac{1}{6n^{3/2}}\sum_{r,s,t=1}^{p_n}g_rg_sg_t\,\frac{\partial^3\ell_n(\beta)}{\partial\beta_r\partial\beta_s\partial\beta_t}\bigg|_{\beta=\hat\beta_n} + \frac{1}{24n^2}\sum_{r,s,t,u=1}^{p_n}g_rg_sg_tg_u\,\frac{\partial^4\ell_n(\beta)}{\partial\beta_r\partial\beta_s\partial\beta_t\partial\beta_u}\bigg|_{\beta=\beta_n^*} = A_1(g) + A_2(g) + A_3(g)\ \text{(say)}. \tag{3.21} \]
Here $\beta_n^* = \beta_n^*(g)$ is an intermediate point on the line joining $\hat\beta_n$ and $\hat\beta_n + \frac{g}{\sqrt n}$. Based on exactly the same argument leading up to (3.6) (in the proof of Lemma 3),
\[ P\big(\beta_n^*(g) \in \Theta_n\ \forall g \in G_n\big) \to 1, \tag{3.22} \]
as $n\to\infty$. Also,
\[ \pi\Big(\hat\beta_n + \frac{g}{\sqrt n}\Big) = \pi(\hat\beta_n) + \frac{1}{\sqrt n}g^T\nabla\pi(\hat\beta_n) + \frac{1}{2n}g^T\nabla^2\pi(\beta_n^{**})g = \pi(\hat\beta_n)\Big(1 + \frac{1}{\sqrt n}\sum_{v=1}^{p_n}g_v\big(\nabla\log\pi(\hat\beta_n)\big)_v + \frac{1}{2n}\frac{g^T\nabla^2\pi(\beta_n^{**})g}{\pi(\hat\beta_n)}\Big) = \pi(\hat\beta_n)\big(1 + B_1(g) + B_2(g)\big)\ \text{(say)}, \tag{3.23} \]
where $\beta_n^{**} = \beta_n^{**}(g)$ is an intermediate point on the line joining $\hat\beta_n$ and $\hat\beta_n + \frac{g}{\sqrt n}$. Based on exactly the same argument leading up to (3.6) (in the proof of Lemma 3),
\[ P\big(\beta_n^{**}(g) \in \Theta_n\ \forall g \in G_n\big) \to 1, \tag{3.24} \]
as $n\to\infty$.

We now analyze the various terms in (3.21) and (3.23). By the continuity of $\psi'''$ and the fact that $\hat\beta_n \in \Theta_n$ with probability tending to 1, it follows that
\[ \max_{1\le i\le n}|\psi'''(z_i^T\hat\beta_n)| = O_p(1). \tag{3.25} \]
Hence, for $g \in \mathbb R^{p_n}$,
\[ |A_2(g)| = \Big|\frac{1}{6n^{3/2}}\sum_{r,s,t=1}^{p_n}g_rg_sg_t\,\frac{\partial^3\ell_n(\beta)}{\partial\beta_r\partial\beta_s\partial\beta_t}\bigg|_{\beta=\hat\beta_n}\Big| = \Big|\frac{1}{6n^{3/2}}\sum_{r,s,t=1}^{p_n}\sum_{i=1}^n\psi'''(z_i^T\hat\beta_n)\,z_{ir}z_{is}z_{it}\,g_rg_sg_t\Big| \le \frac{1}{6n^{3/2}}\max_{1\le i\le n}|\psi'''(z_i^T\hat\beta_n)|\sum_{i=1}^n|z_i^Tg|^3. \tag{3.26} \]
In particular, we get that for $g \in C_n$,
\[ |A_2(g)| \le \frac{1}{6n^{3/2}}\max_{1\le i\le n}|\psi'''(z_i^T\hat\beta_n)|\sum_{i=1}^n(\|g\|\,\|z_i\|)^3 \quad (\because \text{Cauchy–Schwarz}) = \frac{1}{6n^{3/2}}\max_{1\le i\le n}|\psi'''(z_i^T\hat\beta_n)|\,\|g\|^3\sum_{i=1}^n\|z_i\|^3 \le \max_{1\le i\le n}|\psi'''(z_i^T\hat\beta_n)|\,\frac{\big(p_n^{\frac12+\frac{\epsilon}{6}}\big)^3\,n(M\sqrt{p_n})^3}{6n^{3/2}}. \tag{3.27} \]
The last inequality follows by using (A-0). It follows by (3.27) that
\[ \sup_{g\in C_n}|A_2(g)| = O_p\Big(\frac{p_n^{3+\frac{\epsilon}{2}}}{\sqrt n}\Big). \tag{3.28} \]
By the continuity of $\psi''''$ and (3.22), it follows that
\[ \sup_{g\in C_n}\max_{1\le i\le n}|\psi''''(z_i^T\beta_n^*)| = O_p(1). \tag{3.29} \]
Hence, for $g \in G_n$,
\[ |A_3(g)| = \Big|\frac{1}{24n^2}\sum_{r,s,t,u=1}^{p_n}g_rg_sg_tg_u\,\frac{\partial^4\ell_n(\beta)}{\partial\beta_r\partial\beta_s\partial\beta_t\partial\beta_u}\bigg|_{\beta=\beta_n^*}\Big| = \Big|\frac{1}{24n^2}\sum_{r,s,t,u=1}^{p_n}\sum_{i=1}^n\psi''''(z_i^T\beta_n^*)\,z_{ir}z_{is}z_{it}z_{iu}\,g_rg_sg_tg_u\Big| \le \frac{1}{24n^2}\max_{1\le i\le n}|\psi''''(z_i^T\beta_n^*)|\sum_{i=1}^n|z_i^Tg|^4. \tag{3.30} \]
In particular, for $g \in C_n$, we get that
\[ |A_3(g)| \le \frac{1}{24n^2}\max_{1\le i\le n}|\psi''''(z_i^T\beta_n^*)|\sum_{i=1}^n(\|g\|\,\|z_i\|)^4 = \frac{1}{24n^2}\max_{1\le i\le n}|\psi''''(z_i^T\beta_n^*)|\,\|g\|^4\sum_{i=1}^n\|z_i\|^4 \le \max_{1\le i\le n}|\psi''''(z_i^T\beta_n^*)|\,\frac{\big(p_n^{\frac12+\frac{\epsilon}{6}}\big)^4\,n(M\sqrt{p_n})^4}{24n^2}. \tag{3.31} \]
The last inequality follows by using (A-0). It follows by (3.31) that
\[ \sup_{g\in C_n}|A_3(g)| = O_p\Big(\frac{p_n^{4+\frac{2\epsilon}{3}}}{n}\Big). \tag{3.32} \]
Next we analyze the second order remainder term in (3.23). Note that $\|\hat\beta_n - \beta_0\| = O_p\big(\sqrt{p_n/n}\big)$ by Lemma 1, and $\sup_{g\in C_n}\|\beta_n^{**}(g) - \hat\beta_n\| = O_p\big(\frac{p_n^{1/2+\epsilon'}}{\sqrt n}\big)$, as $\beta_n^{**}(g)$ is an intermediate point on the line joining $\hat\beta_n$ and $\hat\beta_n + \frac{g}{\sqrt n}$. Hence, by the triangle inequality, we get that
\[ \sup_{g\in C_n}\|\beta_n^{**}(g) - \beta_0\| = O_p\Big(\frac{p_n^{\frac12+\epsilon'}}{\sqrt n}\Big). \tag{3.33} \]
By (A-3), we get that $\frac{p_n^{1/2+\epsilon'}}{\sqrt n} = o\big(\sqrt[4]{p_n/n}\big)$. By (A-2), it follows that
\[ \sup_{g\in C_n}\max_{1\le r,s\le p_n}\Big|\frac{1}{\pi(\beta)}\frac{\partial^2\pi(\beta)}{\partial\beta_r\partial\beta_s}\bigg|_{\beta=\beta_n^{**}(g)}\Big| = O_p(p_n^4). \tag{3.34} \]
Note that
\[ \frac{\pi(\beta_n^{**})}{\pi(\hat\beta_n)} = \exp\big(\log\pi(\beta_n^{**}) - \log\pi(\hat\beta_n)\big) = \exp\big((\nabla\log\pi(\beta_n^{***}))^T(\beta_n^{**} - \hat\beta_n)\big), \]
where $\beta_n^{***} = \beta_n^{***}(g)$ is an intermediate point on the line joining $\beta_n^{**}$ and $\hat\beta_n$. Hence,
\[ \sup_{g\in C_n}\|\beta_n^{***} - \hat\beta_n\| \le \sup_{g\in C_n}\|\beta_n^{**} - \hat\beta_n\| = O_p\Big(\frac{p_n^{\frac12+\epsilon'}}{\sqrt n}\Big). \]
By (A-3), $\frac{p_n^{1/2+\epsilon'}}{\sqrt n} = o\big(\sqrt[4]{p_n/n}\big)$. Hence, by Lemma 1 and (A-2), it follows that
\[ \sup_{g\in C_n}\|\nabla\log\pi(\beta_n^{***})\| = O_p(p_n^{3/2}). \]
Hence,
\[ \sup_{g\in C_n}\frac{\pi(\beta_n^{**})}{\pi(\hat\beta_n)} \le \sup_{g\in C_n}\exp\big(\|\nabla\log\pi(\beta_n^{***})\|\,\|\beta_n^{**} - \hat\beta_n\|\big) \le \exp\Big(O_p(p_n^{3/2})\,O_p\Big(\frac{p_n^{\frac12+\epsilon'}}{\sqrt n}\Big)\Big) = O_p(1). \tag{3.35} \]
It follows that
\[ |B_2(g)| = \frac{1}{2n}\Big|\frac{g^T\nabla^2\pi(\beta_n^{**})g}{\pi(\hat\beta_n)}\Big| = \frac{\pi(\beta_n^{**})}{\pi(\hat\beta_n)}\cdot\frac{1}{2n}\Big|\sum_{r,s=1}^{p_n}\frac{1}{\pi(\beta)}\frac{\partial^2\pi(\beta)}{\partial\beta_r\partial\beta_s}\bigg|_{\beta=\beta_n^{**}}g_rg_s\Big| \le \frac{\pi(\beta_n^{**})}{\pi(\hat\beta_n)}\cdot\frac{1}{2n}\sum_{r,s=1}^{p_n}\Big|\frac{1}{\pi(\beta)}\frac{\partial^2\pi(\beta)}{\partial\beta_r\partial\beta_s}\bigg|_{\beta=\beta_n^{**}}\Big|\,|g_r|\,|g_s| \le \frac{\pi(\beta_n^{**})}{\pi(\hat\beta_n)}\cdot\frac{1}{2n}\max_{1\le r,s\le p_n}\Big|\frac{1}{\pi(\beta)}\frac{\partial^2\pi(\beta)}{\partial\beta_r\partial\beta_s}\bigg|_{\beta=\beta_n^{**}}\Big|\,p_n\|g\|^2. \tag{3.36} \]
It follows by (3.34)–(3.36) that
\[ \sup_{g\in C_n}|B_2(g)| = O_p\Big(\frac{p_n^{9/2+\frac{\epsilon}{3}}}{n}\Big). \tag{3.37} \]
Note that by (3.1), (3.21) and (3.23), $\pi^*(g\mid X) = N/D$, where
\[ N = \frac{\pi(\hat\beta_n)\big(1 + B_1(g) + B_2(g)\big)\exp\big(A_1(g) + A_2(g) + A_3(g)\big)}{\pi(\hat\beta_n)(2\pi)^{p_n/2}\big|{-\frac{\nabla^2\ell_n(\hat\beta_n)}{n}}\big|^{-1/2}} = N_{p_n}(g\mid 0,\hat\Sigma_n)\big(1 + B_1(g) + B_2(g)\big)\exp\big(A_2(g) + A_3(g)\big) \]
\[ = N_{p_n}(g\mid 0,\hat\Sigma_n)\big\{(1 + B_1(g))(1 + A_2(g)) + B_2(g)(1 + A_2(g) + A_3(g)) + (1 + B_1(g))A_3(g)\big\} + N_{p_n}(g\mid 0,\hat\Sigma_n)\big\{(1 + B_1(g) + B_2(g))\big(\exp(A_2(g) + A_3(g)) - (1 + A_2(g) + A_3(g))\big)\big\} = N_{p_n}(g\mid 0,\hat\Sigma_n)\big(N_1(g) + N_2(g) + N_3(g) + N_4(g)\big)\ \text{(say)}, \tag{3.38} \]
and
\[ D = \int N(g)\,dg. \tag{3.39} \]
Now, from (3.28), (3.32) and (3.37), it follows that
\[ \sup_{g\in C_n}|N_2(g)| = \sup_{g\in C_n}\big|B_2(g)(1 + A_2(g) + A_3(g))\big| = O_p\Big(\frac{p_n^{9/2+\frac{\epsilon}{3}}}{n}\Big). \tag{3.40} \]
In view of Lemma 1 and (A-2),
\[ \sup_{g\in C_n}|1 + B_1(g)| \le 1 + \sup_{g\in C_n}\frac{1}{\sqrt n}\sum_{v=1}^{p_n}|g_v(\nabla\log\pi(\hat\beta_n))_v| \le 1 + \sup_{g\in C_n}\frac{1}{\sqrt n}\|g\|\,\|\nabla\log\pi(\hat\beta_n)\| = 1 + O_p\Big(\frac{p_n^{2+\epsilon'}}{\sqrt n}\Big) = 1 + o_p(1). \tag{3.41} \]
By (3.32), it follows that
\[ \sup_{g\in C_n}|N_3(g)| = \sup_{g\in C_n}\big|(1 + B_1(g))A_3(g)\big| = O_p\Big(\frac{p_n^{4+\frac{2\epsilon}{3}}}{n}\Big). \tag{3.42} \]
By (3.28) and (3.32),
\[ \sup_{g\in C_n}|A_2(g) + A_3(g)| \le \sup_{g\in C_n}|A_2(g)| + \sup_{g\in C_n}|A_3(g)| = O_p\Big(\frac{p_n^{3+\frac{\epsilon}{2}}}{\sqrt n}\Big). \]
It follows by (A-3) that for large enough $n$,
\[ \sup_{g\in C_n}\big|\exp(A_2(g) + A_3(g)) - (1 + A_2(g) + A_3(g))\big| \le \sup_{g\in C_n}(A_2(g) + A_3(g))^2 = O_p\Big(\frac{p_n^{6+\epsilon}}{n}\Big). \tag{3.43} \]
It follows from (3.37), (3.41) and (3.43) that
\[ \sup_{g\in C_n}|N_4(g)| = \sup_{g\in C_n}\big|(1 + B_1(g) + B_2(g))\big(\exp(A_2(g) + A_3(g)) - (1 + A_2(g) + A_3(g))\big)\big| = O_p\Big(\frac{p_n^{6+\epsilon}}{n}\Big). \tag{3.44} \]
Let
\[ R(g) := N_2(g) + N_3(g) + N_4(g). \tag{3.45} \]
It follows from (3.40), (3.42) and (3.44) that
\[ \sup_{g\in C_n}\big|N_2(g) + N_3(g) + N_4(g)\big| = O_p\Big(\frac{p_n^{6+\epsilon}}{n}\Big). \tag{3.46} \]
By (3.39) and Lemma 5,
\[ D = \int_{G_n}\frac{1}{K_n}Z_n(g)\,\pi\Big(\hat\beta_n + \frac{g}{\sqrt n}\Big)\,dg = 1 + o_p(1). \]
Thus,
\[ \pi^*(g\mid X) = N/D = N_{p_n}(g\mid 0,\hat\Sigma_n)\Big\{\Big(1 + \frac{1}{6n^{3/2}}\sum_{r,s,t=1}^{p_n}g_rg_sg_t\,\frac{\partial^3\ell_n(\beta)}{\partial\beta_r\partial\beta_s\partial\beta_t}\bigg|_{\beta=\hat\beta_n}\Big)\Big(1 + \frac{1}{\sqrt n}\sum_{v=1}^{p_n}g_v\big(\nabla\log\pi(\hat\beta_n)\big)_v\Big) + R(g)\Big\}\big(1 - o_p(1)\big) \]
\[ = N_{p_n}(g\mid 0,\hat\Sigma_n)\Big\{1 + \frac{1}{6n^{3/2}}\sum_{r,s,t=1}^{p_n}g_rg_sg_t\,\frac{\partial^3\ell_n(\beta)}{\partial\beta_r\partial\beta_s\partial\beta_t}\bigg|_{\beta=\hat\beta_n} + \frac{1}{\sqrt n}\sum_{v=1}^{p_n}g_v\big(\nabla\log\pi(\hat\beta_n)\big)_v + \frac{1}{6n^{3/2}}\sum_{r,s,t=1}^{p_n}g_rg_sg_t\,\frac{\partial^3\ell_n(\beta)}{\partial\beta_r\partial\beta_s\partial\beta_t}\bigg|_{\beta=\hat\beta_n}\cdot\frac{1}{\sqrt n}\sum_{v=1}^{p_n}g_v\big(\nabla\log\pi(\hat\beta_n)\big)_v + R(g)\Big\}\big(1 - o_p(1)\big) \]
\[ = N_{p_n}(g\mid 0,\hat\Sigma_n)\Big\{1 - \frac{1}{6n^{3/2}}\sum_{r,s,t=1}^{p_n}\sum_{i=1}^n\psi'''(z_i^T\hat\beta_n)\,g_rg_sg_t\,z_{ir}z_{is}z_{it} + \frac{1}{\sqrt n}\sum_{v=1}^{p_n}g_v\big(\nabla\log\pi(\hat\beta_n)\big)_v - \frac{1}{6n^{3/2}}\sum_{r,s,t=1}^{p_n}\sum_{i=1}^n\psi'''(z_i^T\hat\beta_n)\,g_rg_sg_t\,z_{ir}z_{is}z_{it}\cdot\frac{1}{\sqrt n}\sum_{v=1}^{p_n}g_v\big(\nabla\log\pi(\hat\beta_n)\big)_v + R(g)\Big\}\big(1 - o_p(1)\big), \]
where $\sup_{g\in C_n}|R(g)| = O_p\big(\frac{p_n^{6+\epsilon}}{n}\big)$. □
Remark 3. Note that we have extended the first order results in [6] to a third order correct posterior expansion by requiring stronger growth restrictions on $p_n$. A natural question that arises is whether one can obtain a second order correct expansion with weaker restrictions on the growth of $p_n$. However, we have not considered second order correct expansions for two reasons. Firstly, the derivation of a moment matching prior, which is the application that we consider in Section 4, requires a third order correct asymptotic expansion. Secondly, the proof of Lemma 3 in the paper uses the assumption that $p_n^{6+\epsilon}/n \to 0$ (see (3.9)). We would still need Lemma 3 to establish a second order correct posterior expansion. Therefore, establishing a second order correct expansion would still require the same growth restriction (assuming the other conditions in (A-0), (A-1), (A-2) and (A-3) are left unchanged).
Remark 4. Bontemps [3] establishes posterior consistency under Gaussianity, by relaxing the restrictions in [6,7] in several
ways. However, the arguments in the proof of Bontemps’ results (in particular Theorems 1 and 2 in Bontemps’ paper) rely
heavily on Gaussianity. We have made efforts to adapt them for other models, but have not been successful so far.
3.1. Posterior expansion for the uniformly bounded case

We can consider the case when $p_n$ is uniformly bounded, and obtain an expansion of the posterior density parallel to (3.20). The fact that $p_n$ is uniformly bounded allows a slightly finer analysis of the terms in the expansion, which is useful when deriving moment matching priors in Section 4. Firstly, we note that Lemmas 3–5 can be established by the same set of arguments, by using, for example, $C_n = \{g : \|g\| < n^{\frac{1}{6+\epsilon}}\}$ instead of $C_n = \{g : \|g\| < p_n^{\frac12+\frac{\epsilon}{6}}\}$. Henceforth, in this subsection, it will be assumed that $C_n = \{g : \|g\| < n^{\frac{1}{6+\epsilon}}\}$. It can be easily seen by repeating appropriate steps in the proof of Theorem 1 that in this case
\[ |A_2(g)| = O_p\Big(\frac{\|g\|^3}{\sqrt n}\Big) \tag{3.47} \]
for every $g \in \mathbb R^{p_n}$, and
\[ |A_3(g)| = O_p\Big(\frac{\|g\|^4}{n}\Big) \tag{3.48} \]
for every $g \in G_n$. To clarify notation, (3.47) means that $|A_2(g)|$ is $\|g\|^3$ times a quantity which is independent of $g$ and is $O_p\big(\frac{1}{\sqrt n}\big)$.

Since $\Theta_n$ is a compact set, it follows by (3.24) and the twice continuous differentiability of $\pi(\cdot)$ that each entry of $\nabla\pi(\hat\beta_n)$ and $\nabla^2\pi(\beta_n^{**})$ is bounded above in probability. Also, by (A-2), it follows that $\pi(\hat\beta_n)$ is bounded below in probability. Combining (3.23) with the above facts gives us
\[ |B_1(g)| = O_p\Big(\frac{\|g\|}{\sqrt n}\Big) \tag{3.49} \]
for every $g \in \mathbb R^{p_n}$, and
\[ |B_2(g)| = O_p\Big(\frac{\|g\|^2}{n}\Big) \tag{3.50} \]
for every $g \in G_n$. It follows that
\[ \sup_{g\in C_n}|B_1(g)| = O_p\Big(n^{-\frac{4+\epsilon}{12+2\epsilon}}\Big), \qquad \sup_{g\in C_n}|B_2(g)| = O_p\Big(n^{-\frac{4+\epsilon}{6+\epsilon}}\Big), \]
\[ \sup_{g\in C_n}|A_2(g)| = O_p\Big(n^{-\frac{\epsilon}{12+2\epsilon}}\Big), \qquad \sup_{g\in C_n}|A_3(g)| = O_p\Big(n^{-\frac{2+\epsilon}{6+\epsilon}}\Big). \tag{3.51} \]
It follows by (3.38) (along with the arguments following it, adjusted for the fact that $p_n$ is uniformly bounded and for the new choice of $C_n$) and (3.51) that for every $g \in C_n$,
\[ \pi^*(g\mid X) = N_{p_n}(g\mid 0,\hat\Sigma_n)\Big\{1 - \frac{1}{6n^{3/2}}\sum_{r,s,t=1}^{p_n}\sum_{i=1}^n\psi'''(z_i^T\hat\beta_n)\,g_rg_sg_t\,z_{ir}z_{is}z_{it} + \frac{1}{\sqrt n}\sum_{v=1}^{p_n}g_v\big(\nabla\log\pi(\hat\beta_n)\big)_v - \frac{1}{6n^{3/2}}\sum_{r,s,t=1}^{p_n}\sum_{i=1}^n\psi'''(z_i^T\hat\beta_n)\,g_rg_sg_t\,z_{ir}z_{is}z_{it}\cdot\frac{1}{\sqrt n}\sum_{v=1}^{p_n}g_v\big(\nabla\log\pi(\hat\beta_n)\big)_v + R(g)\Big\}\times\big(1 - o_p(1)\big), \tag{3.52} \]
and
\[ |R(g)| = O_p\Big(\frac{\|g\|^6}{n}\Big). \tag{3.53} \]
Note that (3.52) is identical to (3.20). However, the order of the remainder term is different for the two settings. In the setting for (3.20), we have $\sup_{g\in C_n}|R(g)| = O_p\big(\frac{p_n^{6+\epsilon}}{n}\big)$. However, in the current setting,
\[ \sup_{g\in C_n}|R(g)| = O_p\Big(n^{-\frac{\epsilon}{6+\epsilon}}\Big). \tag{3.54} \]
Note that even in this case, the posterior probability of the set $C_n$ converges to 1 as $n\to\infty$. We conclude this section by noting that if $g$ is fixed (or $\|g\|$ is uniformly bounded as $n\to\infty$) then the order of the leading terms $B_1(g)$ and $A_2(g)$ is $\frac{1}{\sqrt n}$ (as can be seen from (3.47) and (3.49)), while the order of the remainder term $R(g)$ is $\frac{1}{n}$ (as can be seen from (3.53)). However, if one is looking for bounds uniformly over $g \in C_n$, then the orders can be obtained from (3.51) and (3.54).
4. Moment matching prior
A moment matching prior (introduced by Ghosh and Liu [12]) is an objective prior for which the posterior mean matches
with the maximum likelihood estimator up to a high order of approximation. Ghosh and Liu [12] provide several examples
where they derive a moment matching prior using third order correct posterior expansions. In particular, they consider the
case with i.i.d. observations from a multi-parameter natural exponential family (with fixed p), and prove that the moment
matching prior in this situation can be uniquely determined, and in fact corresponds to Jeffreys’ general rule prior. However,
they did not consider the more complicated GLM setting. In this section we use the expansion in Theorem 1 to obtain
moment matching priors in the context of GLM with canonical link function (both when pn is uniformly bounded, and when
pn is unbounded). We will in fact show that the moment matching prior can be uniquely identified in this situation, and
corresponds to the Jeffreys’ general rule prior. In other words, Jeffreys’ general rule prior is the only prior which satisfies the
moment matching condition in the current GLM setup. We may add here that conditions for the propriety and existence of
moments for Jeffreys’ prior in the GLM setup (as well as the resulting posterior) have been addressed in [15].
The analysis in the current setup will be based on examining the posterior expectation of the quantity β − ˆβn. Note
that the remainder term in most posterior expansions (including the ones used in [12] and in this paper) is not uniformly
bounded in the variable used in the expansion (for example g in our setup) if we do not restrict to an appropriate set (such as
Cn in our setup). In fact, to show that the expected value (with respect to the posterior distribution) of the remainder term is
appropriately small, one has to restrict the computation of the expected value over a set such as Cn. Ghosh and Liu [12] take
a somewhat heuristic approach in their derivations, and do not take this issue into account. We undertake a more rigorous
approach to address this issue as follows. The computation of the posterior expectation for deriving the moment matching
prior will be restricted to the region Cn on which the appropriate bounds for the remainder term in the expansion are valid.
4.1. The uniformly bounded case

We first consider the case when the number of regressors $p_n$ is uniformly bounded. Note that the expansion in (3.52) holds in this case with $C_n = \{g : \|g\| < n^{\frac{1}{6+\epsilon}}\}$. The moment matching criterion of Ghosh and Liu [12] (with the modification discussed above) dictates that the prior $\pi(\cdot)$ be chosen such that the posterior expectation
\[ E^{\pi(\cdot\mid X)}\big[(\beta - \hat\beta_n)\mathbf 1_{\{\sqrt n(\beta-\hat\beta_n)\in C_n\}}\big] \]
converges to zero faster than $\frac{1}{n}$. It follows by the expansion in (3.52) that
\[ E^{\pi(\cdot\mid X)}\big[(\beta - \hat\beta_n)\mathbf 1_{\{\sqrt n(\beta-\hat\beta_n)\in C_n\}}\big] = \frac{1}{\sqrt n}\int_{C_n}g\,\pi^*(g\mid X)\,dg = \frac{1 - o_p(1)}{\sqrt n}\int_{C_n}g\,N_{p_n}(g\mid 0,\hat\Sigma_n)\big(1 + B_1(g) + A_2(g) + B_1(g)A_2(g) + R(g)\big)\,dg. \tag{4.1} \]
By (A-0) and (2.10), it follows that the eigenvalues of $\hat\Sigma_n$ are uniformly bounded above (with probability tending to 1). Since $p_n$ is uniformly bounded, for any subset $S$ of $\mathbb R^{p_n}$, we get
\[ \int_S\|g\|^kN_{p_n}(g\mid 0,\hat\Sigma_n)\,dg \le \int_{\mathbb R^{p_n}}\|g\|^kN_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p(p_n^{k/2}) = O_p(1) \tag{4.2} \]
for every fixed $k \in \mathbb N$. It follows by (3.47), (3.49) and (3.53) that
\[ \int_{C_n}\|g\|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,\big|B_1(g)A_2(g) + R(g)\big|\,dg = O_p\Big(\frac{1}{n}\Big). \tag{4.3} \]
Recall that $C_n^c$ refers to $\mathbb R^{p_n}\setminus C_n$. A simple application of Markov's inequality, along with (3.12), (3.14), (4.2) and the uniform boundedness of $p_n$, yields that
\[ \int_{C_n^c}\|g\|^kN_{p_n}(g\mid 0,\hat\Sigma_n)\,dg \le \int_{G_n^c}\|g\|^kN_{p_n}(g\mid 0,\hat\Sigma_n)\,dg + \int_{\{g:\|g\|>n^{1/(6+\epsilon)}\}}\|g\|^kN_{p_n}(g\mid 0,\hat\Sigma_n)\,dg \le O_p\Big(\frac{1}{n}\Big) + n^{-\frac{2}{6+\epsilon}}\int_{\mathbb R^{p_n}}\|g\|^{k+2}N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\Big(n^{-\frac{2}{6+\epsilon}}\Big) \tag{4.4} \]
for every fixed $k \in \mathbb N$. It follows by (3.47) and (3.49) that
\[ \int_{C_n^c}\|g\|\,|A_2(g)|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\Big(n^{-\frac12-\frac{2}{6+\epsilon}}\Big), \tag{4.5} \]
and
\[ \int_{C_n^c}\|g\|\,|B_1(g)|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\Big(n^{-\frac12-\frac{2}{6+\epsilon}}\Big). \tag{4.6} \]
Note that
\[ \int_{C_n}g\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg + \int_{C_n^c}g\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = \int_{\mathbb R^{p_n}}g\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = 0. \tag{4.7} \]
Another application of Markov's inequality along the lines of (4.4) (but by increasing the moment by 6 instead of 2) gives
\[ \int_{C_n^c}\|g\|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg \le O_p\Big(\frac{1}{n^3}\Big) + n^{-\frac{6}{6+\epsilon}}\int_{\mathbb R^{p_n}}\|g\|^7N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\Big(n^{-\frac{6}{6+\epsilon}}\Big). \tag{4.8} \]
It follows from (4.7) and (4.8) that
\[ \int_{C_n}g\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\Big(n^{-\frac{6}{6+\epsilon}}\Big). \tag{4.9} \]
Here, when we say that a vector $x$ is $O_p(c_n)$, we mean that $\|x\|$ is $O_p(c_n)$. By (4.1), (4.3), (4.5), (4.6) and (4.9) we get
\[ E^{\pi(\cdot\mid X)}\big[(\beta - \hat\beta_n)\mathbf 1_{\{\sqrt n(\beta-\hat\beta_n)\in C_n\}}\big] = \frac{1 - o_p(1)}{\sqrt n}\int_{C_n}g\,N_{p_n}(g\mid 0,\hat\Sigma_n)\big(1 + B_1(g) + A_2(g)\big)\,dg + O_p\Big(\frac{1}{n^{3/2}}\Big) = \frac{1 - o_p(1)}{\sqrt n}\int_{\mathbb R^{p_n}}g\,N_{p_n}(g\mid 0,\hat\Sigma_n)\big(B_1(g) + A_2(g)\big)\,dg + O_p\Big(n^{-1-\frac{2}{6+\epsilon}}\Big). \tag{4.10} \]
We now simplify the integral in (4.10). Note that
\[ \int_{\mathbb R^{p_n}}g\,B_1(g)\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = \frac{1}{\sqrt n}\Big(\int_{\mathbb R^{p_n}}gg^TN_{p_n}(g\mid 0,\hat\Sigma_n)\,dg\Big)\nabla\log\pi(\hat\beta_n) = \frac{1}{\sqrt n}\hat\Sigma_n\nabla\log\pi(\hat\beta_n). \tag{4.11} \]
Note that by Isserlis' formula for joint moments of a multivariate normal distribution, we get that for any $1 \le j,r,s,t \le p_n$,
\[ \int_{\mathbb R^{p_n}}g_jg_rg_sg_t\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = \hat\Sigma_{n,jr}\hat\Sigma_{n,st} + \hat\Sigma_{n,js}\hat\Sigma_{n,rt} + \hat\Sigma_{n,jt}\hat\Sigma_{n,rs}. \tag{4.12} \]
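Isserlis' formula (4.12) can be checked against known closed-form Gaussian moments. The sketch below (an illustrative check, not from the paper; the covariance matrix is arbitrary) verifies two special cases: $E[g_0^4] = 3\Sigma_{00}^2$ and $E[g_0^2g_1^2] = \Sigma_{00}\Sigma_{11} + 2\Sigma_{01}^2$.

```python
import numpy as np

# Illustrative check of Isserlis' formula: for zero-mean Gaussian g,
# E[g_j g_r g_s g_t] = S_jr S_st + S_js S_rt + S_jt S_rs.
def isserlis4(S, j, r, s, t):
    return S[j, r] * S[s, t] + S[j, s] * S[r, t] + S[j, t] * S[r, s]

S = np.array([[2.0, 0.3],
              [0.3, 1.0]])                       # arbitrary covariance matrix
ok1 = np.isclose(isserlis4(S, 0, 0, 0, 0), 3 * S[0, 0] ** 2)
ok2 = np.isclose(isserlis4(S, 0, 0, 1, 1), S[0, 0] * S[1, 1] + 2 * S[0, 1] ** 2)
```

Both checks reduce to standard fourth-moment identities for the multivariate normal distribution.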
Let $I_n(\beta) = \frac{1}{n}\sum_{i=1}^n\psi''(z_i^T\beta)z_iz_i^T$, the information matrix evaluated at $\beta$. It follows by the definition of $A_2(g)$ and (4.12) that
\[ \int_{\mathbb R^{p_n}}g_jA_2(g)\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = -\frac{1}{6\sqrt n}\sum_{r,s,t=1}^{p_n}\big(\hat\Sigma_{n,jr}\hat\Sigma_{n,st} + \hat\Sigma_{n,js}\hat\Sigma_{n,rt} + \hat\Sigma_{n,jt}\hat\Sigma_{n,rs}\big)A_{n,r,s,t}, \tag{4.13} \]
where
\[ A_{n,r,s,t} = \frac{1}{n}\sum_{i=1}^n\psi'''(z_i^T\hat\beta_n)z_{ir}z_{is}z_{it} = \frac{\partial}{\partial\beta_r}\big(I_n(\beta)\big)_{st}\bigg|_{\beta=\hat\beta_n}. \tag{4.14} \]
Note that
\[ \frac{\partial}{\partial\beta_r}\log|I_n(\beta)| = \sum_{s,t=1}^{p_n}\big(I_n(\beta)^{-1}\big)_{st}\,\frac{\partial}{\partial\beta_r}\big(I_n(\beta)\big)_{st}. \tag{4.15} \]
By the symmetry in $r,s,t$ on the right hand side of (4.13), it follows that
\[ \int_{\mathbb R^{p_n}}g_jA_2(g)\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = -\frac{1}{2\sqrt n}\sum_{r,s,t=1}^{p_n}\hat\Sigma_{n,jr}\hat\Sigma_{n,st}A_{n,r,s,t}. \tag{4.16} \]
Combining (4.14)–(4.16) along with the fact $I_n(\hat\beta_n) = \hat\Sigma_n^{-1}$, we get that
\[ \int_{\mathbb R^{p_n}}g\,A_2(g)\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = -\frac{1}{2\sqrt n}\hat\Sigma_n\nabla\log|I_n(\hat\beta_n)|. \tag{4.17} \]
It follows by (4.10), (4.11) and (4.17) that to ensure
\[ E^{\pi(\cdot\mid X)}\big[(\beta - \hat\beta_n)\mathbf 1_{\{\sqrt n(\beta-\hat\beta_n)\in C_n\}}\big] = O_p\Big(n^{-1-\frac{2}{6+\epsilon}}\Big), \]
the prior density $\pi(\cdot)$ should satisfy
\[ \hat\Sigma_n\nabla\log\pi(\hat\beta_n) - \frac12\hat\Sigma_n\nabla\log|I_n(\hat\beta_n)| = 0. \tag{4.18} \]
Note that the maximum likelihood estimator $\hat\beta_n$ satisfies $\|\hat\beta_n - \beta_0\| \xrightarrow{P} 0$ as $n\to\infty$. To ensure that (4.18) holds irrespective of the true $\beta_0$, we require
\[ \hat\Sigma_n\nabla\log\pi(\beta) - \frac12\hat\Sigma_n\nabla\log|I_n(\beta)| = 0 \tag{4.19} \]
for every $\beta$. Since $\hat\Sigma_n$ is a positive definite matrix with probability tending to 1, it follows that (4.19) holds if and only if
\[ \pi(\beta) \propto |I_n(\beta)|^{\frac12}. \]
To ensure that the assumptions in (A-2) hold, we choose
\[ \pi(\beta) = C_n|I_n(\beta)|^{\frac12}, \tag{4.20} \]
where $C_n$ is chosen such that $\int_{\Theta_n}\pi(\beta)\,d\beta = 1$. Since $\psi$ is infinitely differentiable, and $\Theta_n$ is a compact set, it follows by (A-0) that the eigenvalues of $I_n(\beta)$ are uniformly bounded (above and below) over $\beta \in \Theta_n$ and $n \in \mathbb N$. Since $p_n$ is uniformly bounded, it follows that $\pi(\beta)$ is uniformly bounded (above and below) over $\beta \in \Theta_n$ and $n \in \mathbb N$. Since $\psi$ is infinitely differentiable, it also follows in particular that $\pi(\cdot)$ is twice continuously differentiable. Since $\Theta_n$ is a compact set, it follows that all the first and second order derivatives of $\pi(\cdot)$ are uniformly bounded above over $\Theta_n$ and $n$. All these facts combined together imply that $\pi(\cdot)$ satisfies the assumptions in (A-2).
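The moment matching prior (4.20), i.e., Jeffreys' general rule prior, is straightforward to evaluate up to the normalizing constant $C_n$. The following is a hypothetical sketch (not from the paper) for the logistic model, where $\psi''(t) = \sigma(t)(1-\sigma(t))$; the design matrix is an illustrative assumption and $C_n$ is not computed.

```python
import numpy as np

# Illustrative sketch of the unnormalized moment matching (Jeffreys) prior
# (4.20): log pi(beta) = const + (1/2) log |I_n(beta)| for a logistic GLM.
rng = np.random.default_rng(3)
n, p = 500, 2
Z = rng.uniform(-1.0, 1.0, size=(n, p))

def log_jeffreys(beta, Z):
    """(1/2) log |I_n(beta)|, with I_n(beta) = (1/n) sum_i psi'' z_i z_i^T."""
    s = 1.0 / (1.0 + np.exp(-(Z @ beta)))
    w = s * (1.0 - s)                            # psi''(z_i^T beta), logistic
    I_n = (Z * w[:, None]).T @ Z / Z.shape[0]
    sign, logdet = np.linalg.slogdet(I_n)        # stable log-determinant
    assert sign > 0                              # I_n is positive definite here
    return 0.5 * logdet

val = log_jeffreys(np.zeros(p), Z)
```

Using `slogdet` avoids underflow of $|I_n(\beta)|$ itself, which matters once $p_n$ grows.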
4.2. The unbounded case

We now consider the case when $p_n \to \infty$ and $p_n^{6+\epsilon}/n \to 0$ as $n\to\infty$. For the moment matching prior derivation in this case, (a) we will assume that there exists an $\alpha > 0$ such that $p_n/n^\alpha \to \infty$ as $n\to\infty$, and (b) we will replace the assumption that $\|z_i\| \le M\sqrt{p_n}$ for every $1 \le i \le n$ in (A-0) by the stronger assumption $|z_{ir}| \le M$ for every $1 \le i \le n$ and $1 \le r \le p_n$. Note that $M$ does not depend on $n$. Recall that the posterior expansion (3.20) in this case holds with $C_n = \{g : \|g\| < p_n^{1/2+\epsilon/6}\}$. The basic technique for deriving the moment matching prior remains the same as in the uniformly bounded case. However, the orders of the various terms used in the analysis differ from the uniformly bounded case. Hence, this case is more complex, and needs a more careful consideration of all the relevant terms.

Note again that by (A-0) and (2.10), $\mathrm{tr}(\hat\Sigma_n) = O_p(p_n)$. By the analysis leading to (3.31) it follows that
\[ \int_{C_n}\|g\|\,|A_3(g)|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\Big(\frac{1}{n^2}\sum_{i=1}^n\int_{C_n}\|g\|\,|z_i^Tg|^4\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg\Big) = O_p\Big(\frac{1}{n^2}\sum_{i=1}^n\int_{C_n}\big(\|g\|^5 + |z_i^Tg|^5\big)N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg\Big). \tag{4.21} \]
The previous step follows from an application of Young's inequality, in particular
\[ |ab| \le \frac{|a|^q}{q} + \frac{|b|^{\tilde q}}{\tilde q} \quad \text{for every } a,b \in \mathbb R, \]
with $q = 5$ and $\tilde q = \frac54$. Note that by (A-0), $z_i^T\hat\Sigma_nz_i = O_p(p_n)$. It follows by (4.21) that
\[ \int_{C_n}\|g\|\,|A_3(g)|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\Big(\frac{p_n^{5/2}}{n}\Big). \tag{4.22} \]
By very similar arguments which use the analysis leading up to (3.31), (3.26), (3.36) and (3.41) respectively, it can be established that
\[ \int_{C_n}\|g\|\,(A_3(g))^2\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\Big(\frac{p_n^{9/2}}{n^2}\Big), \tag{4.23} \]
\[ \int_{C_n}\|g\|\,(A_2(g))^2\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\Big(\frac{p_n^{7/2}}{n}\Big), \tag{4.24} \]
\[ \int_{C_n}\|g\|\,|B_2(g)|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\Big(\frac{p_n^5}{n}\Big), \tag{4.25} \]
\[ \int_{C_n}\|g\|\,|A_2(g)B_1(g)|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\Big(\frac{p_n^{7/2}}{n}\Big). \tag{4.26} \]
It follows by the definition of $R(g)$ in (3.45), and by (4.22)–(4.26), that
\[ \int_{C_n}\|g\|\,\big|R(g) + A_2(g)B_1(g)\big|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\Big(\frac{p_n^5}{n}\Big). \tag{4.27} \]
A simple application of Markov's inequality along with (3.12) and (3.14) implies that
\[ \int_{C_n^c}\|g\|\,|B_1(g)|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg \le \frac{1}{\sqrt n}\int_{G_n^c}\|g\|\,\big|(\nabla\log\pi(\hat\beta_n))^Tg\big|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg + \frac{1}{\sqrt n}\int_{\{g:\|g\|>p_n^{1/2+\epsilon/6}\}}\|g\|\,\big|(\nabla\log\pi(\hat\beta_n))^Tg\big|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg \]
\[ \le \frac{1}{\sqrt n}\Bigg(\frac{1}{\frac{n}{p_nM^2}\big(K' - \sup_{1\le i\le n}|z_i^T\hat\beta_n|\big)^2} + \frac{1}{p_n^{1+\epsilon/3}}\Bigg)\int_{\mathbb R^{p_n}}\|g\|^3\big|(\nabla\log\pi(\hat\beta_n))^Tg\big|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg \]
\[ = O_p\Big(\frac{1}{\sqrt n\,p_n^{1+\epsilon/3}}\Big)\int_{\mathbb R^{p_n}}\Big(\|g\|^6 + \big((\nabla\log\pi(\hat\beta_n))^Tg\big)^2\Big)N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg. \tag{4.28} \]
Note that by (A-0), (A-2) and (2.10), we get $\big(\nabla\log\pi(\hat\beta_n)\big)^T\hat\Sigma_n\nabla\log\pi(\hat\beta_n) = O_p(p_n^3)$. It follows from (4.28) that
\[ \int_{C_n^c}\|g\|\,|B_1(g)|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\Big(\frac{p_n^{2-\epsilon/3}}{\sqrt n}\Big). \tag{4.29} \]
By a similar argument, it can be established that
\[ \int_{C_n^c}\|g\|\,|A_2(g)|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\Big(\frac{p_n^{2-\epsilon/3}}{\sqrt n}\Big). \tag{4.30} \]
Recall that there exists $\alpha > 0$ such that $p_n/n^\alpha \to \infty$ as $n\to\infty$. Let
\[ \alpha^* = \max\Big\{\frac{6}{\epsilon}\Big(\frac{1}{\alpha} + \frac12\Big),\ 4\Big\}. \]
An application of Markov's inequality along the lines of (4.4) (but by increasing the moment by $\alpha^*$ instead of 2), along with the fact that $p_n = o(n^{1/6})$, gives
\[ \int_{C_n^c}\|g\|\,N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg \le O_p\Big(\frac{p_n^{\alpha^*/2}}{n^{\alpha^*/2}}\Big) + \frac{1}{p_n^{\alpha^*/2+\alpha^*\epsilon/6}}\int_{\mathbb R^{p_n}}\|g\|^{1+\alpha^*}N_{p_n}(g\mid 0,\hat\Sigma_n)\,dg = O_p\Big(\frac{p_n^{1/2+\alpha^*}}{n^{\alpha^*/2}}\Big) + O_p\Big(\frac{p_n^{(1+\alpha^*)/2}}{p_n^{\alpha^*/2+\alpha^*\epsilon/6}}\Big) = O_p\Big(\frac{1}{n}\Big). \tag{4.31} \]
Since $p_n^{6+\epsilon}/n \to 0$, it follows that
\[ \frac{p_n^5}{n^{3/2}} = o_p\Big(\frac{p_n^{2-\epsilon/3}}{n}\Big). \]
Using this fact along with (3.20), (4.7), (4.27), (4.29), (4.30) and (4.31), we get that
\[ E^{\pi(\cdot\mid X)}\big[(\beta - \hat\beta_n)\mathbf 1_{\{\sqrt n(\beta-\hat\beta_n)\in C_n\}}\big] = \frac{1 - o_p(1)}{\sqrt n}\int_{C_n}g\,N_{p_n}(g\mid 0,\hat\Sigma_n)\big(1 + B_1(g) + A_2(g)\big)\,dg + O_p\Big(\frac{p_n^5}{n^{3/2}}\Big) = \frac{1 - o_p(1)}{\sqrt n}\int_{\mathbb R^{p_n}}g\,N_{p_n}(g\mid 0,\hat\Sigma_n)\big(B_1(g) + A_2(g)\big)\,dg + O_p\Big(\frac{p_n^{2-\epsilon/3}}{n}\Big). \tag{4.32} \]
Using exactly the same arguments following (4.10) in the uniformly bounded case, it follows that to obtain
\[ E^{\pi(\cdot\mid X)}\big[(\beta - \hat\beta_n)\mathbf 1_{\{\sqrt n(\beta-\hat\beta_n)\in C_n\}}\big] = O_p\Big(\frac{p_n^{2-\epsilon/3}}{n}\Big) \]
irrespective of the true value $\beta_0$, we must have
\[ \pi(\beta) \propto |I_n(\beta)|^{\frac12}. \]
Hence, the moment matching prior (up to order $p_n^{2-\epsilon/3}/n$) is given by
\[ \pi(\beta) = C_n|I_n(\beta)|^{\frac12}, \tag{4.33} \]
where $C_n$ is chosen such that $\int_{\Theta_n}\pi(\beta)\,d\beta = 1$ (note that such a choice of $C_n$ is possible because $\Theta_n$ is a compact set, and $\pi(\cdot)$ is a continuous function). Since $\psi$ is strictly convex and $\psi''$ is continuous, it follows by the definition of $\Theta_n$ that $\psi''(z_i^T\beta)$ is uniformly bounded (away from both zero and infinity) over $\beta \in \Theta_n$ and $n \in \mathbb N$. Hence, by (A-0), all the eigenvalues of $I_n(\beta)$ are uniformly bounded (away from both zero and infinity) over $\beta \in \Theta_n$ and $n \in \mathbb N$. It follows that $|I_n(\beta)|^{1/p_n}$ is uniformly bounded (away from both zero and infinity) over $\beta \in \Theta_n$ and $n \in \mathbb N$, which immediately implies that $C_n^{1/p_n}$ is uniformly bounded (away from both zero and infinity) over $n \in \mathbb N$. It follows by (4.33) that there exists $\eta_0 > 0$ (not depending on $n$) such that $\pi(\beta) > \eta_0^{p_n}$ for every $\beta \in \Theta_n$. Note that the dependence of $\pi$ on $n$ has been suppressed for simplicity of exposition.
We now verify that the prior density $\pi(\cdot)$ in (4.33) satisfies assumptions (2.7) and (2.8). Since $\psi$ is infinitely differentiable, it follows in particular that $\pi(\cdot)$ is twice continuously differentiable. Note that
\[ \frac{\partial}{\partial\beta_r}\log\pi(\beta) = \frac12\frac{\partial}{\partial\beta_r}\log|I_n(\beta)| = \frac12\sum_{s,t=1}^{p_n}\big(I_n(\beta)^{-1}\big)_{st}\,\frac{\partial}{\partial\beta_r}\big(I_n(\beta)\big)_{st}. \tag{4.34} \]
Let
\[ K_2 = \sup_{x\in[-K',K']}|\psi'''(x)| \]
and recall that
\[ \frac{\partial}{\partial\beta_r}\big(I_n(\beta)\big)_{st} = \frac{1}{n}\sum_{i=1}^n\psi'''(z_i^T\beta)\,z_{ir}z_{is}z_{it}. \]
It follows by (A-0), (4.34) and $|z_{ir}| \le M$ for every $1 \le i \le n$ and $1 \le r \le p_n$ (see the first paragraph of this subsection) that
\[ \frac{2K_2M}{n}A_n + \frac{\partial}{\partial\beta_r}I_n(\beta) \quad \text{and} \quad \frac{2K_2M}{n}A_n - \frac{\partial}{\partial\beta_r}I_n(\beta) \quad \text{are both positive definite} \tag{4.35} \]
for every $1 \le r \le p_n$. It follows by (4.34) and (4.35) that
\[ \Big|\frac{\partial}{\partial\beta_r}\log\pi(\beta)\Big| = \frac12\Big|\mathrm{tr}\Big(I_n(\beta)^{-1}\frac{\partial}{\partial\beta_r}I_n(\beta)\Big)\Big| \le \mathrm{tr}\Big(I_n(\beta)^{-1}\frac{K_2M}{n}A_n\Big) \le p_n\,\lambda_{\min}\big(I_n(\beta)\big)^{-1}\lambda_{\max}\Big(\frac{K_2M}{n}A_n\Big). \tag{4.36} \]
Since the eigenvalues of $I_n(\beta)$ are uniformly bounded below over $\beta \in \Theta_n$ and $n \in \mathbb N$, it follows by (A-0) and (4.36) that there exists $M_1$ (independent of $\beta$ and $n$) satisfying
\[ \|\nabla\log\pi(\beta)\| < M_1p_n^{3/2} \tag{4.37} \]
for every $\beta \in \Theta_n$ and $n \in \mathbb N$. Hence (2.7) is satisfied.

Let $1 \le j,j' \le p_n$ be arbitrarily chosen. Then
\[ \frac{1}{\pi(\beta)}\frac{\partial^2\pi(\beta)}{\partial\beta_j\partial\beta_{j'}} = \frac{\partial^2}{\partial\beta_j\partial\beta_{j'}}\log\pi(\beta) + \Big(\frac{\partial}{\partial\beta_j}\log\pi(\beta)\Big)\Big(\frac{\partial}{\partial\beta_{j'}}\log\pi(\beta)\Big). \tag{4.38} \]
It follows by (4.34) that
\[ \frac{\partial^2}{\partial\beta_j\partial\beta_{j'}}\log\pi(\beta) = \frac12\Big[\mathrm{tr}\Big(\frac{\partial}{\partial\beta_j}\big(I_n(\beta)^{-1}\big)\frac{\partial}{\partial\beta_{j'}}I_n(\beta)\Big) + \mathrm{tr}\Big(I_n(\beta)^{-1}\frac{\partial^2}{\partial\beta_j\partial\beta_{j'}}I_n(\beta)\Big)\Big] = \frac12\Big[-\mathrm{tr}\Big(I_n(\beta)^{-1}\Big(\frac{\partial}{\partial\beta_j}I_n(\beta)\Big)I_n(\beta)^{-1}\frac{\partial}{\partial\beta_{j'}}I_n(\beta)\Big) + \mathrm{tr}\Big(I_n(\beta)^{-1}\frac{\partial^2}{\partial\beta_j\partial\beta_{j'}}I_n(\beta)\Big)\Big]. \tag{4.39} \]
Note that
\[ \frac{\partial^2}{\partial\beta_j\partial\beta_{j'}}\big(I_n(\beta)\big)_{st} = \frac{1}{n}\sum_{i=1}^n\psi''''(z_i^T\beta)\,z_{ij}z_{ij'}z_{is}z_{it}. \tag{4.40} \]
Let
\[ K_3 = \sup_{x\in[-K',K']}|\psi''''(x)|. \]
It follows by (A-0), (4.40) and $|z_{ir}| \le M$ for every $1 \le i \le n$ and $1 \le r \le p_n$ (see the first paragraph of this subsection) that
\[ \frac{2K_3M^2}{n}A_n + \frac{\partial^2}{\partial\beta_j\partial\beta_{j'}}I_n(\beta) \quad \text{and} \quad \frac{2K_3M^2}{n}A_n - \frac{\partial^2}{\partial\beta_j\partial\beta_{j'}}I_n(\beta) \quad \text{are both positive definite} \tag{4.41} \]
for every $1 \le j,j' \le p_n$. It follows by (4.35), (4.39) and (4.41) that
\[ \Big|\frac{\partial^2}{\partial\beta_j\partial\beta_{j'}}\log\pi(\beta)\Big| \le 2K_2^2M^2\,\mathrm{tr}\Big(I_n(\beta)^{-1}\frac{A_n}{n}I_n(\beta)^{-1}\frac{A_n}{n}\Big) + K_3M^2\,\mathrm{tr}\Big(I_n(\beta)^{-1}\frac{A_n}{n}\Big). \tag{4.42} \]
Since the eigenvalues of $I_n(\beta)$ are uniformly bounded below over $\beta \in \Theta_n$ and $n \in \mathbb N$, it follows by (A-0), (4.36), (4.38) and (4.42) that there exists $M_2$ (independent of $\beta$ and $n$) satisfying
\[ \max_{1\le j,j'\le p_n}\Big|\frac{1}{\pi(\beta)}\frac{\partial^2\pi(\beta)}{\partial\beta_j\partial\beta_{j'}}\Big| \le M_2p_n^2. \]
Hence (2.8) is satisfied.
Acknowledgments
We would like to thank Prof. Subhashis Ghosal for his help with the paper. Thanks are also due to the Associate Editor
and a referee for their useful comments.
Appendix

Multivariate normal distribution satisfies assumptions in (2.7) and (2.8)

Suppose we put a normal prior on $\beta$, i.e., $\beta \sim N_{p_n}(\mu, A)$. We assume that $\|\mu\| = O(\sqrt{p_n})$ and $\|A^{-1}\| = O(\sqrt{p_n})$. Note that
\[ \nabla\log\pi(\beta) = -A^{-1}(\beta - \mu). \]
Hence,
\[ \|\nabla\log\pi(\beta)\| \le \|A^{-1}\|\,\|\beta - \mu\|. \tag{A.1} \]
Also,
\[ \|\beta - \mu\| \le \|\beta - \beta_0\| + \|\beta_0\| + \|\mu\| \le \|\beta - \beta_0\| + \sqrt{\frac{1}{C_1}\,\beta_0^T\Big(\frac{1}{n}\sum_{i=1}^nz_iz_i^T\Big)\beta_0} + \|\mu\| \quad \text{(by assumption (A-0))} \le \|\beta - \beta_0\| + \sqrt{\frac{1}{C_1}\cdot\frac{1}{n}\sum_{i=1}^n(z_i^T\beta_0)^2} + \|\mu\| \le \|\beta - \beta_0\| + \frac{K}{\sqrt{C_1}} + \|\mu\| \quad \text{(by assumption (A-1))} = \|\beta - \beta_0\| + O(\sqrt{p_n}). \tag{A.2} \]
It follows from (A.1), (A.2) and the assumptions on $\mu$ and $A$ that
\[ \sup_{\|\beta-\beta_0\|\le C_n}\|\nabla\log\pi(\beta)\| = O(p_n), \]
where $C_n = \sqrt[4]{p_n/n}$.

Note that for $1 \le j,j' \le p_n$,
\[ \Big|\frac{1}{\pi(\beta)}\frac{\partial^2\pi(\beta)}{\partial\beta_j\partial\beta_{j'}}\Big| = \Big|-(A^{-1})_{jj'} + \Big(\sum_{k=1}^{p_n}(A^{-1})_{jk}(\beta_k - \mu_k)\Big)\Big(\sum_{k=1}^{p_n}(A^{-1})_{j'k}(\beta_k - \mu_k)\Big)\Big| \le \|A^{-1}\| + \big(\|A^{-1}\|\,\|\beta - \mu\|\big)^2. \tag{A.3} \]
It follows from (A.2) and (A.3) that
\[ \sup_{\|\beta-\beta_0\|\le C_n}\max_{1\le j,j'\le p_n}\Big|\frac{1}{\pi(\beta)}\frac{\partial^2\pi(\beta)}{\partial\beta_j\partial\beta_{j'}}\Big| = O(p_n^2), \]
where $C_n = \sqrt[4]{p_n/n}$.
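The bounds (A.1) and (A.3) for a normal prior are easy to verify numerically. The sketch below (an illustrative check, not from the paper; the specific $\mu$, $A$ and $\beta$ are assumptions) computes $\nabla\log\pi$ and the second-derivative ratios and confirms the two inequalities.

```python
import numpy as np

# Illustrative check of (A.1) and (A.3) for a normal prior N(mu, A):
# grad log pi(beta) = -A^{-1}(beta - mu), and
# (1/pi) d^2 pi / dbeta_j dbeta_j' = -(A^{-1})_{jj'} + v_j v_j',
# with v = A^{-1}(beta - mu).
p = 3
A = np.diag([2.0, 1.0, 0.5])                     # assumed prior covariance
A_inv = np.linalg.inv(A)
mu = np.zeros(p)
beta = np.array([0.4, -0.3, 0.2])                # an arbitrary test point

grad = -A_inv @ (beta - mu)                      # gradient of log pi
v = A_inv @ (beta - mu)
ratio = -A_inv + np.outer(v, v)                  # matrix of (1/pi) d^2 pi terms

op_norm = np.linalg.norm(A_inv, 2)               # spectral norm ||A^{-1}||
bound_grad = op_norm * np.linalg.norm(beta - mu)              # (A.1)
bound_hess = op_norm + (op_norm * np.linalg.norm(beta - mu)) ** 2  # (A.3)
```

Both bounds hold entrywise because the spectral norm dominates every matrix entry of a symmetric matrix.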
Multivariate t distribution satisfies assumptions in (2.7) and (2.8)
Suppose we put a t-prior on β, i.e., β ∼ tγ (µ, A). Here tγ (µ, A) denotes the multivariate t distribution with parameters γ ,
µ and A. We take γ to be independent of n, but allow µ = µn and A = An to vary with n (the dependence on n is suppressed
henceforth for simplicity of exposition). The density of this distribution is proportional to
1 +
1
γ
(β − µ)T
A−1
(β − µ)
−(γ +pn)/2
.
We assume that $\|A^{-1}\| = O(\sqrt{p_n})$. Now,
\[
\nabla \log \pi(\beta) = \frac{\nabla \pi(\beta)}{\pi(\beta)} = -\frac{\gamma + p_n}{\gamma} \cdot \frac{A^{-1}(\beta - \mu)}{1 + \frac{1}{\gamma}(\beta - \mu)^T A^{-1}(\beta - \mu)}.
\]
Thus
\begin{align*}
\|\nabla \log \pi(\beta)\| &= \frac{\gamma + p_n}{\gamma} \cdot \frac{\left( (\beta - \mu)^T A^{-2} (\beta - \mu) \right)^{1/2}}{1 + \frac{1}{\gamma}(\beta - \mu)^T A^{-1}(\beta - \mu)} \\
&\le O\!\left( \frac{\gamma + p_n}{\gamma} \, \|A^{-1}\|^{1/2} \cdot \frac{\left( (\beta - \mu)^T A^{-1} (\beta - \mu) \right)^{1/2}}{1 + \frac{1}{\gamma}(\beta - \mu)^T A^{-1}(\beta - \mu)} \right) \\
&\le O(p_n^{5/4}).
\end{align*}
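The closed-form gradient of the $t$ log-density above can be checked against finite differences; the sketch below is illustrative only, with arbitrary values for $p$, $\gamma$, $\mu$, $A$ and $\beta$.

```python
import numpy as np

# Finite-difference check (illustrative only) of the multivariate-t gradient:
#   grad log pi(beta) = -((gamma + p)/gamma) A^{-1}(beta - mu)
#                        / (1 + (1/gamma)(beta - mu)^T A^{-1}(beta - mu)).
rng = np.random.default_rng(2)
p, gamma = 4, 5.0
mu = rng.normal(size=p)
L = rng.normal(size=(p, p))
A_inv = np.linalg.inv(L @ L.T + p * np.eye(p))
beta = rng.normal(size=p)

def log_pi(b):
    # log density up to an additive constant
    d = b - mu
    return -0.5 * (gamma + p) * np.log1p((d @ A_inv @ d) / gamma)

d = beta - mu
analytic = -((gamma + p) / gamma) * (A_inv @ d) / (1.0 + (d @ A_inv @ d) / gamma)

h = 1e-6
numeric = np.array([(log_pi(beta + h * e) - log_pi(beta - h * e)) / (2 * h)
                    for e in np.eye(p)])

max_err = float(np.max(np.abs(numeric - analytic)))
```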
Now, let $A^{-1} = ((a_{ij}))$. By straightforward manipulations, we get
\[
\frac{1}{\pi(\beta)} \frac{\partial^2 \pi(\beta)}{\partial \beta_j \partial \beta_{j'}}
= \frac{(\gamma + p_n)(\gamma + p_n + 2)}{\gamma^2} \,
\frac{\left( \sum_{k=1}^{p_n} a_{kj}(\beta_k - \mu_k) \right) \left( \sum_{k=1}^{p_n} a_{kj'}(\beta_k - \mu_k) \right)}{\left( 1 + \frac{1}{\gamma} \sum_{k,l=1}^{p_n} a_{kl}(\beta_k - \mu_k)(\beta_l - \mu_l) \right)^{2}}
- \frac{\gamma + p_n}{\gamma} \,
\frac{a_{jj'}}{1 + \frac{1}{\gamma} \sum_{k,l=1}^{p_n} a_{kl}(\beta_k - \mu_k)(\beta_l - \mu_l)}.
\]
Hence
\begin{align*}
\left| \frac{1}{\pi(\beta)} \frac{\partial^2 \pi(\beta)}{\partial \beta_j \partial \beta_{j'}} \right|
&\overset{(a)}{\le} O(p_n^2) \, \frac{\left| \left( (\beta - \mu)^T A^{-1} \right)_j \left( (\beta - \mu)^T A^{-1} \right)_{j'} \right|}{\left( 1 + (\beta - \mu)^T A^{-1} (\beta - \mu) \right)^2} + O(p_n^{3/2}) \\
&\overset{(b)}{\le} O(p_n^2) \, \frac{(\beta - \mu)^T A^{-2} (\beta - \mu)}{\left( 1 + (\beta - \mu)^T A^{-1} (\beta - \mu) \right)^2} + O(p_n^{3/2}) \\
&\overset{(c)}{\le} O(p_n^2) \, \frac{\|A^{-1}\| \, (\beta - \mu)^T A^{-1} (\beta - \mu)}{\left( 1 + (\beta - \mu)^T A^{-1} (\beta - \mu) \right)^2} + O(p_n^{3/2}) \\
&= O(p_n^{5/2}),
\end{align*}
where (a) and (c) follow from the assumption that $\|A^{-1}\| = O(\sqrt{p_n})$, and (b) follows since $\left| \left( (\beta - \mu)^T A^{-1} \right)_j \right| \le \left( (\beta - \mu)^T A^{-2} (\beta - \mu) \right)^{1/2}$ for all $j = 1, \dots, p_n$.
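Step (b) rests on the elementary fact that each coordinate of $v = A^{-1}(\beta - \mu)$ is bounded by the Euclidean norm of $v$, and $\|v\|^2 = (\beta-\mu)^T A^{-2} (\beta-\mu)$ for symmetric $A$. An illustrative check with arbitrary test values:

```python
import numpy as np

# |v_j| <= ||v|| for v = A^{-1}(beta - mu), where
# ||v||^2 = (beta - mu)^T A^{-2} (beta - mu) since A^{-1} is symmetric.
rng = np.random.default_rng(3)
p = 6
L = rng.normal(size=(p, p))
A_inv = np.linalg.inv(L @ L.T + p * np.eye(p))  # symmetric positive definite
d = rng.normal(size=p)                          # plays the role of beta - mu

v = A_inv @ d
rhs = float(np.sqrt(d @ A_inv @ A_inv @ d))     # sqrt(d^T A^{-2} d) = ||v||

coordinates_bounded = bool(np.all(np.abs(v) <= rhs + 1e-12))
```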
Proof of Lemma 1
Let $\alpha_n = \sqrt{p_n/n}$. We will show that for any given $\epsilon > 0$, there exists a constant $C$ such that
\[
P\left( \sup_{\|u\| = C} l_n(\beta_0 + \alpha_n u) < l_n(\beta_0) \right) \ge 1 - \epsilon \tag{A.4}
\]
for large enough $n$. This will imply that, with probability tending to 1, the unique maximum $\hat{\beta}_n$ lies in the ball $\{\beta_0 + \alpha_n u : \|u\| \le C\}$, i.e., $\|\hat{\beta}_n - \beta_0\| = O_p(\alpha_n)$.
Note that
\begin{align*}
l_n(\beta_0 + \alpha_n u) - l_n(\beta_0)
&= \alpha_n \sum_{i=1}^n X_i z_i^T u - \sum_{i=1}^n \left[ \psi(z_i^T(\beta_0 + \alpha_n u)) - \psi(z_i^T \beta_0) \right] \\
&= \alpha_n \sum_{i=1}^n \left( X_i - \psi'(z_i^T \beta_0) \right) z_i^T u - \frac{\alpha_n^2}{2} \sum_{i=1}^n \psi''(z_i^T \beta_0)(z_i^T u)^2 - \frac{\alpha_n^3}{6} \sum_{i=1}^n \psi'''(\theta_i^*)(z_i^T u)^3 \\
&= I_1 + I_2 + I_3, \text{ say},
\end{align*}
where $\theta_i^*$ lies between $z_i^T \beta_0$ and $z_i^T(\beta_0 + \alpha_n u)$, for every $1 \le i \le n$.
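The decomposition above is a standard third-order Taylor expansion of each summand. As an illustrative check (logistic cumulant $\psi(t) = \log(1 + e^t)$, arbitrary base point and direction), the remainder left after the first- and second-order terms, which is the $I_3$-type term, shrinks like $\alpha_n^3$:

```python
import math

# Illustrative check of the Taylor expansion behind I1 + I2 + I3 for one
# summand: with psi(t) = log(1 + e^t), the remainder after the first- and
# second-order terms scales like alpha^3, so halving alpha should shrink
# it by roughly 2^3 = 8.
def psi(t):
    return math.log1p(math.exp(t))

def psi1(t):
    return 1.0 / (1.0 + math.exp(-t))   # psi'

def psi2(t):
    s = psi1(t)
    return s * (1.0 - s)                # psi''

t0, du = 0.3, 0.7    # arbitrary values of z_i^T beta_0 and z_i^T u

def remainder(alpha):
    exact = psi(t0 + alpha * du) - psi(t0)
    two_term = psi1(t0) * alpha * du + 0.5 * psi2(t0) * (alpha * du) ** 2
    return abs(exact - two_term)

ratio = remainder(1e-2) / remainder(5e-3)   # expect roughly 8
```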
Note that by (A-1), $z_i^T \beta_0$ is uniformly bounded (over $i$ and $n$) and $\psi''(\cdot)$ is a continuous function. Hence, $\psi''(z_i^T \beta_0)$ is also uniformly bounded (over $i$ and $n$) by, say, $K_1$. It follows that
\begin{align*}
E\left[ \left( \sum_{i=1}^n \left( X_i - \psi'(z_i^T \beta_0) \right) z_i^T u \right)^2 \right]
&= \sum_{i=1}^n E\left[ \left( X_i - \psi'(z_i^T \beta_0) \right)^2 \right] (z_i^T u)^2 \quad \text{(since the $X_i$'s are independent and $E[X_i] = \psi'(z_i^T \beta_0)$)} \\
&= \sum_{i=1}^n (z_i^T u)^2 \, \psi''(z_i^T \beta_0) \quad \text{(since $E[(X_i - \psi'(z_i^T \beta_0))^2] = \psi''(z_i^T \beta_0)$)} \\
&\le K_1 \sum_{i=1}^n (z_i^T u)^2 = n K_1 u^T \left( \frac{1}{n} \sum_{i=1}^n z_i z_i^T \right) u \le n K_1 C_2 \|u\|^2.
\end{align*}
The last step follows by (A-0). Hence $E\left[ \left( \sum_{i=1}^n (X_i - \psi'(z_i^T \beta_0)) z_i^T u \right)^2 \right] = O(n) \|u\|^2$. Thus,
\[
I_1 = O_p(\alpha_n \sqrt{n}) \|u\| = O_p(\sqrt{p_n}) \|u\|. \tag{A.5}
\]
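The computation above uses the standard exponential-family identities $E[X_i] = \psi'(z_i^T \beta_0)$ and $\mathrm{Var}(X_i) = \psi''(z_i^T \beta_0)$. As an illustrative check for the Bernoulli case with canonical (logit) link, $\psi(t) = \log(1 + e^t)$, numerically differentiating $\psi$ recovers the familiar mean $p$ and variance $p(1-p)$:

```python
import math

# Exponential-family identities E[X] = psi'(theta), Var(X) = psi''(theta)
# for a Bernoulli outcome with canonical logit link: psi(t) = log(1 + e^t),
# psi'(t) = sigmoid(t) = p, psi''(t) = p(1 - p). theta is arbitrary.
def psi(t):
    return math.log1p(math.exp(t))

theta = 0.8
p_success = 1.0 / (1.0 + math.exp(-theta))   # Bernoulli mean p = psi'(theta)
var = p_success * (1.0 - p_success)          # Bernoulli variance p(1 - p)

h = 1e-5
psi1_num = (psi(theta + h) - psi(theta - h)) / (2 * h)                   # ~ psi'
psi2_num = (psi(theta + h) - 2 * psi(theta) + psi(theta - h)) / (h * h)  # ~ psi''
```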
Note that $\psi$ is a strictly convex function and hence $\psi''(\cdot) > 0$. Since $\psi''$ is continuous, it follows that its infimum on a bounded interval is strictly positive. By (A-1), $z_i^T \beta_0$ is uniformly bounded. This implies $\psi''(z_i^T \beta_0)$ is uniformly bounded below by a positive constant, say $K_2$. Hence
\[
I_2 = -\frac{\alpha_n^2}{2} \sum_{i=1}^n \psi''(z_i^T \beta_0)(z_i^T u)^2
\le -K_2 \frac{\alpha_n^2}{2} \sum_{i=1}^n (z_i^T u)^2
= -K_2 \frac{\alpha_n^2}{2} \, n u^T \left( \frac{1}{n} \sum_{i=1}^n z_i z_i^T \right) u < 0,
\]
by (A-0). Also, by (A-0) and the arguments above,
\[
|I_2| \ge K_2 \frac{\alpha_n^2}{2} \, n u^T \left( \frac{1}{n} \sum_{i=1}^n z_i z_i^T \right) u
\ge K_2 \frac{\alpha_n^2}{2} \, n C_1 \|u\|^2
= \frac{C_1 K_2}{2} \, p_n \|u\|^2. \tag{A.6}
\]
Now, since $\theta_i^*$ lies between $z_i^T \beta_0$ and $z_i^T(\beta_0 + \alpha_n u)$, it follows by (A-0) and (A-1) that
\begin{align*}
|\theta_i^*| &< \max_{1 \le i \le n} \left\{ |z_i^T \beta_0|, \, |z_i^T(\beta_0 + \alpha_n u)| \right\} \\
&< \max_{1 \le i \le n} \left\{ K, \, K + \alpha_n |z_i^T u| \right\} \\
&\le K + \max_{1 \le i \le n} \alpha_n \|z_i\| \, \|u\| \\
&\le K + \sqrt{\frac{p_n}{n}} \, O(\sqrt{p_n}) \|u\| \\
&= K + O\!\left( \frac{p_n}{\sqrt{n}} \right) \|u\|.
\end{align*}
Hence $\psi'''(\theta_i^*)$ is uniformly bounded by, say, $K_3$. Thus,
\begin{align*}
|I_3| = \frac{\alpha_n^3}{6} \left| \sum_{i=1}^n \psi'''(\theta_i^*)(z_i^T u)^3 \right|
&\le K_3 \frac{\alpha_n^3}{6} \sum_{i=1}^n |z_i^T u|^3 \\
&\le K_3 \frac{p_n^{3/2}}{6 n^{3/2}} \sum_{i=1}^n (\|z_i\| \, \|u\|)^3 \\
&= \frac{K_3 M^{3/2} p_n^3}{6 \sqrt{n}} \|u\|^3. \tag{A.7}
\end{align*}
The last step follows by (A-0). Since $p_n^6/n \to 0$ as $n \to \infty$, it follows by (A.5)–(A.7) that the order of $I_2$ dominates the orders of $I_1$ and $I_3$ (for a suitable choice of $\|u\|$). Since $I_2$ is negative, the assertion in (A.4) holds.
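As an illustrative numerical reading of this conclusion (values are arbitrary; $p_n = n^{1/7}$ is one growth rate satisfying $p_n^6/n \to 0$), plugging the orders from (A.5)–(A.7) into growing $n$ with a fixed radius $\|u\| = C$ shows both $|I_1|/|I_2|$ and $|I_3|/|I_2|$ vanishing, so the negative $I_2$ term dominates:

```python
# Illustrative order comparison (not part of the proof): plug the orders
#   |I1| ~ sqrt(p_n) * C,  |I2| ~ p_n * C^2,  |I3| ~ p_n^3 / sqrt(n) * C^3
# into the growth rate p_n = n^(1/7) (so p_n^6 / n -> 0) with fixed C.
C = 10.0
ratios = []
for n in (10**4, 10**6, 10**8):
    p = n ** (1.0 / 7.0)
    i1 = p ** 0.5 * C
    i2 = p * C ** 2
    i3 = (p ** 3 / n ** 0.5) * C ** 3
    ratios.append((i1 / i2, i3 / i2))

i1_vanishes = ratios[0][0] > ratios[1][0] > ratios[2][0]
i3_vanishes = ratios[0][1] > ratios[1][1] > ratios[2][1]
```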