Bayes Estimates for the Linear Model
by D.V. Lindley and A. F. M. Smith
Paolo Baudissone
April 17, 2015
Outline
1. Introduction
2. An introductory example
3. General Bayesian Linear Model
4. An application
Introduction
The object of this paper is the linear model E[y] = Aθ, where y is a vector of observations, A a known design matrix, and θ a vector of unknown parameters. The usual estimate of θ employed in this framework is the one derived by the least-squares method. The authors argue that prior information about the parameters is often available and may be exploited to find improved estimates; in particular, they focus on situations in which the parameters themselves have a general linear structure in terms of other quantities called hyperparameters. A particular form of prior information is assumed: the one based on De Finetti's (1964) idea of exchangeability.
An introductory example
Suppose, in the general linear model, that the design matrix is the identity, so that E[yi] = θi for i = 1, . . . , n, and that, given the θi's, y1, . . . , yn are independent random variables, normally distributed with known variance σ^2. Assume moreover that the distribution of the θi's is exchangeable; this exchangeable prior knowledge gives E[θi] = µ, a common value for each i; in other words, there is a linear structure to the parameters analogous to the linear structure supposed for the observations y.
In this simple example, µ is the only hyperparameter. Denoting by τ^2 the variance of each θi, it can be shown (Lindley, 1971) that the posterior mean is given by

E[θi | y] = (yi/σ^2 + ȳ/τ^2) / (1/σ^2 + 1/τ^2), where ȳ = (1/n) Σ_{i=1}^n yi.

Notice that the Bayes estimate is a weighted average of yi and the overall mean ȳ, with weights inversely proportional to the variances of yi and θi.
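As a quick numerical check, the shrinkage formula above can be evaluated directly; the sketch below uses made-up data and illustrative values of σ^2 and τ^2 (both assumed known, as in the slide).

```python
import numpy as np

# Numerical sketch of the introductory example (made-up data; sigma2 and
# tau2 are illustrative values, assumed known as in the slide).
rng = np.random.default_rng(0)
sigma2, tau2 = 1.0, 4.0
y = rng.normal(loc=5.0, scale=np.sqrt(sigma2), size=10)
ybar = y.mean()

# E[theta_i | y] = (y_i/sigma^2 + ybar/tau^2) / (1/sigma^2 + 1/tau^2)
post_mean = (y / sigma2 + ybar / tau2) / (1 / sigma2 + 1 / tau2)

# Equivalently a weighted average: the weight on y_i is proportional
# to 1/sigma^2, the weight on ybar to 1/tau^2.
w = (1 / sigma2) / (1 / sigma2 + 1 / tau2)
assert np.allclose(post_mean, w * y + (1 - w) * ybar)
```

Each posterior mean is pulled from yi towards the overall mean ȳ, and the smaller τ^2 is relative to σ^2, the stronger the pull.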
Notation
The notation y ∼ N(µ, D) means that the column vector y has a multivariate normal distribution with mean µ, a column vector, and dispersion matrix D, a positive semi-definite matrix.
Theorem
Suppose that, given θ1, y | θ1 ∼ N(A1θ1, C1) and that, given θ2, a vector of p2 hyperparameters, θ1 | θ2 ∼ N(A2θ2, C2). Then
1. y ∼ N(A1A2θ2, C1 + A1C2A1^T)
2. θ1 | y ∼ N(Bb, B), where

B^{-1} = A1^T C1^{-1} A1 + C2^{-1},  b = A1^T C1^{-1} y + C2^{-1} A2θ2.
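The theorem can be checked numerically by comparing part 2 with direct Gaussian conditioning in the joint distribution of (θ1, y); the matrices below are arbitrary illustrative choices, not taken from the paper.

```python
import numpy as np

# Numerical check of the two-stage theorem (illustrative matrices only).
# The posterior N(Bb, B) should agree with conditioning in the joint
# Gaussian of (theta1, y).
rng = np.random.default_rng(1)
p1, p2, n = 3, 2, 5
A1 = rng.normal(size=(n, p1))
A2 = rng.normal(size=(p1, p2))
theta2 = rng.normal(size=p2)
C1 = 0.5 * np.eye(n)                 # dispersion of y | theta1
C2 = 2.0 * np.eye(p1)                # dispersion of theta1 | theta2
y = rng.normal(size=n)

# Theorem: B^{-1} = A1' C1^{-1} A1 + C2^{-1},  b = A1' C1^{-1} y + C2^{-1} A2 theta2
Binv = A1.T @ np.linalg.inv(C1) @ A1 + np.linalg.inv(C2)
b = A1.T @ np.linalg.inv(C1) @ y + np.linalg.inv(C2) @ A2 @ theta2
post_mean = np.linalg.solve(Binv, b)

# Direct conditioning in the joint Gaussian of (theta1, y).
m1 = A2 @ theta2                     # E[theta1]
my = A1 @ m1                         # E[y]
S11 = C2                             # Var(theta1)
S12 = C2 @ A1.T                      # Cov(theta1, y)
S22 = C1 + A1 @ C2 @ A1.T            # Var(y), part 1 of the theorem
cond_mean = m1 + S12 @ np.linalg.solve(S22, y - my)
assert np.allclose(post_mean, cond_mean)
```

The same comparison on the covariances, B versus S11 − S12 S22^{-1} S12^T, confirms the dispersion part of the result.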
Proof.
Part 1. Observe that y = A1θ1 + u, where u ∼ N(0, C1), and θ1 = A2θ2 + v, where v ∼ N(0, C2); it follows that y = A1A2θ2 + A1v + u. Moreover, A1v + u is a linear function of independent (multivariate) normal random variables and is distributed as N(0, C1 + A1C2A1^T), hence the result follows.
Proof.
Part 2. To prove the second part we use Bayes' theorem and write the posterior distribution of θ1 as

p(θ1 | y) ∝ p(y | θ1) p(θ1).

The right-hand side can be written as e^{−Q/2}, where Q is given by

(y − A1θ1)^T C1^{-1} (y − A1θ1) + (θ1 − A2θ2)^T C2^{-1} (θ1 − A2θ2),

and after some calculations (completing the square in θ1) we obtain that Q is equal to

(θ1 − Bb)^T B^{-1} (θ1 − Bb) + {y^T C1^{-1} y + θ2^T A2^T C2^{-1} A2θ2 − b^T B b}.

Since the term in braces does not depend on θ1, it follows that θ1 | y ∼ N(Bb, B).
The following result allows us to rewrite the expression of the inverse of the dispersion matrix of the marginal distribution of y in a way that will be useful for subsequent developments of the topic.
Theorem
For any matrices A1, C1 and C2 of appropriate dimensions and for which the inverses stated in the result exist, we have

(C1 + A1C2A1^T)^{-1} = C1^{-1} − C1^{-1} A1 (A1^T C1^{-1} A1 + C2^{-1})^{-1} A1^T C1^{-1}.
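This is a Woodbury-type identity and is easy to verify numerically; the sketch below uses random symmetric positive-definite C1 and C2 (illustrative only) so that all the stated inverses exist.

```python
import numpy as np

# Numerical sanity check of the stated matrix identity, with random
# symmetric positive-definite C1, C2 so all inverses exist.
rng = np.random.default_rng(2)
n, p = 4, 2
A1 = rng.normal(size=(n, p))
M1 = rng.normal(size=(n, n)); C1 = M1 @ M1.T + n * np.eye(n)
M2 = rng.normal(size=(p, p)); C2 = M2 @ M2.T + p * np.eye(p)

lhs = np.linalg.inv(C1 + A1 @ C2 @ A1.T)
C1i = np.linalg.inv(C1)
rhs = C1i - C1i @ A1 @ np.linalg.inv(A1.T @ C1i @ A1 + np.linalg.inv(C2)) @ A1.T @ C1i
assert np.allclose(lhs, rhs)
```

The practical point is that the left-hand side inverts an n × n matrix while the right-hand side only inverts matrices of the (typically much smaller) parameter dimension.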
Posterior distribution of θ1
Theorem
Suppose that, given θ1, y | θ1 ∼ N(A1θ1, C1); given θ2, θ1 | θ2 ∼ N(A2θ2, C2); and given θ3, θ2 | θ3 ∼ N(A3θ3, C3). Then the posterior distribution of θ1 is

θ1 | {Ai}, {Ci}, θ3, y ∼ N(Dd, D),

where D^{-1} = A1^T C1^{-1} A1 + (C2 + A2C3A2^T)^{-1} and d = A1^T C1^{-1} y + (C2 + A2C3A2^T)^{-1} A2A3θ3.
Remarks I
Observe that, thanks to the first theorem we stated, it is possible to write down the marginal distribution of θ1, that is the prior distribution, free of the hyperparameters θ2:

θ1 ∼ N(A2A3θ3, C2 + A2C3A2^T).

The mean of the posterior distribution may be regarded as a point estimate of θ1 to replace the usual least-squares estimate.
Remarks II
The form of this estimate is a weighted average of the least-squares estimate (A1^T C1^{-1} A1)^{-1} A1^T C1^{-1} y and the prior mean A2A3θ3, with weights given by the inverses of the corresponding dispersion matrices: A1^T C1^{-1} A1 for the least-squares values and (C2 + A2C3A2^T)^{-1} for the prior distribution of θ1.
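The matrix-weighted-average form can be verified directly; in the sketch below all matrices are arbitrary illustrative choices, with A1 of full column rank so that the least-squares estimate exists.

```python
import numpy as np

# Sketch of the weighted-average form of the posterior mean.
# All matrices are illustrative; A1 has full column rank.
rng = np.random.default_rng(4)
n, p1, p2, p3 = 8, 3, 2, 2
A1 = rng.normal(size=(n, p1))
A2 = rng.normal(size=(p1, p2))
A3 = rng.normal(size=(p2, p3))
theta3 = rng.normal(size=p3)
C1, C2, C3 = np.eye(n), np.eye(p1), np.eye(p2)
y = rng.normal(size=n)

W_ls = A1.T @ np.linalg.inv(C1) @ A1         # weight on the least-squares estimate
W_pr = np.linalg.inv(C2 + A2 @ C3 @ A2.T)    # weight on the prior mean
theta_ls = np.linalg.solve(W_ls, A1.T @ np.linalg.inv(C1) @ y)
prior_mean = A2 @ A3 @ theta3

Dinv = W_ls + W_pr
d = A1.T @ np.linalg.inv(C1) @ y + W_pr @ prior_mean
post_mean = np.linalg.solve(Dinv, d)

# Posterior mean = matrix-weighted average of theta_ls and the prior mean.
assert np.allclose(post_mean, np.linalg.solve(Dinv, W_ls @ theta_ls + W_pr @ prior_mean))
```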
Thanks to the second theorem we stated before, we are now in a position to obtain several alternative expressions for the posterior mean and variance; this is illustrated in the two following results:
Theorem
An alternative expression for D^{-1} is given by

A1^T C1^{-1} A1 + C2^{-1} − C2^{-1} A2 (A2^T C2^{-1} A2 + C3^{-1})^{-1} A2^T C2^{-1}.
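This alternative form follows by applying the previous matrix identity to (C2 + A2C3A2^T)^{-1}, and the equivalence can be checked numerically; the matrices below are random positive-definite choices, illustrative only.

```python
import numpy as np

# Check that the alternative expression for D^{-1} agrees with the
# earlier form A1^T C1^{-1} A1 + (C2 + A2 C3 A2^T)^{-1}.
rng = np.random.default_rng(5)
n, p1, p2 = 5, 3, 2
A1 = rng.normal(size=(n, p1))
A2 = rng.normal(size=(p1, p2))
M1 = rng.normal(size=(n, n)); C1 = M1 @ M1.T + n * np.eye(n)
M2 = rng.normal(size=(p1, p1)); C2 = M2 @ M2.T + p1 * np.eye(p1)
M3 = rng.normal(size=(p2, p2)); C3 = M3 @ M3.T + p2 * np.eye(p2)

C1i, C2i, C3i = (np.linalg.inv(C) for C in (C1, C2, C3))
Dinv_first = A1.T @ C1i @ A1 + np.linalg.inv(C2 + A2 @ C3 @ A2.T)
Dinv_alt = (A1.T @ C1i @ A1 + C2i
            - C2i @ A2 @ np.linalg.inv(A2.T @ C2i @ A2 + C3i) @ A2.T @ C2i)
assert np.allclose(Dinv_first, Dinv_alt)
```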
Theorem
If C3^{-1} = 0, the posterior distribution of θ1 is N(D0d0, D0), where

D0^{-1} = A1^T C1^{-1} A1 + C2^{-1} − C2^{-1} A2 (A2^T C2^{-1} A2)^{-1} A2^T C2^{-1}

and

d0 = A1^T C1^{-1} y.

This result gives the form most often used in applications.
An application: Two-factor Experimental Designs
Consider t "treatments" assigned to n experimental units arranged in b "blocks". The usual model is given by

E[yij] = µ + αi + βj for 1 ≤ i ≤ t, 1 ≤ j ≤ b,

with the errors independent N(0, σ^2). In the general notation we have θ1^T = (µ, α1, . . . , αt, β1, . . . , βb), and A1 describes the design used.
At the second stage it seems reasonable to assume exchangeable prior knowledge within the treatment constants {αi} and within the block constants {βj}, with these two sets independent. Adding the assumption of normality, the second stage can be described by

αi ∼ N(0, σ_α^2), βj ∼ N(0, σ_β^2), µ ∼ N(ω, σ_µ^2),

these distributions being independent. A third stage is not necessary.
Our goal now is to derive the expressions for D^{-1} and d. Since C2 is diagonal, the same is true for C2^{-1}, whose leading diagonal is (σ_µ^{-2}, σ_α^{-2}, . . . , σ_α^{-2}, σ_β^{-2}, . . . , σ_β^{-2}). Furthermore, remember that C3 = 0. Hence we get that

D^{-1} = σ^{-2} A1^T A1 + C2^{-1} and d = σ^{-2} A1^T y.

The Bayes estimate Dd, call it θ1*, satisfies the equation

(A1^T A1 + σ^2 C2^{-1}) θ1* = A1^T y,

which differs from the least-squares equations only in the inclusion of the extra term σ^2 C2^{-1}.
The Bayes estimate θ1* is given by the following expressions:

µ* = y.., α_i* = b σ_α^2 (y_i. − y..) / (b σ_α^2 + σ^2), β_j* = t σ_β^2 (y_.j − y..) / (t σ_β^2 + σ^2),

where y.., y_i. and y_.j are sample means. Observe that the estimators of the treatment and block effects are shrunk towards zero by a factor depending on σ^2 and σ_α^2 or σ_β^2.
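These closed-form expressions can be checked against the equation (A1^T A1 + σ^2 C2^{-1}) θ1* = A1^T y from the previous slide. The sketch below uses a made-up complete two-factor layout (one observation per treatment-block cell) and, consistent with µ* = y.., takes a vague prior on µ (σ_µ^{-2} = 0); all numerical values are illustrative.

```python
import numpy as np

# Two-factor Bayes estimates on a small made-up complete design
# (t=3 treatments, b=4 blocks, one observation per cell).
rng = np.random.default_rng(3)
t, b = 3, 4
sigma2, sa2, sb2 = 1.0, 2.0, 0.5
Y = rng.normal(size=(t, b))                 # y_ij laid out as a t x b array

# Design matrix: row for cell (i, j) is [1, e_i (treatment), e_j (block)].
rows = []
for i in range(t):
    for j in range(b):
        r = np.zeros(1 + t + b)
        r[0] = 1.0; r[1 + i] = 1.0; r[1 + t + j] = 1.0
        rows.append(r)
A1 = np.array(rows)
y = Y.reshape(-1)

# Vague prior on mu (sigma_mu^{-2} = 0), proper priors on alphas and betas.
C2inv = np.diag([0.0] + [1 / sa2] * t + [1 / sb2] * b)
theta_star = np.linalg.solve(A1.T @ A1 + sigma2 * C2inv, A1.T @ y)

# Closed-form shrinkage estimates from the slide.
ydd = Y.mean()                              # y..
alpha = b * sa2 * (Y.mean(axis=1) - ydd) / (b * sa2 + sigma2)
beta = t * sb2 * (Y.mean(axis=0) - ydd) / (t * sb2 + sigma2)
assert np.allclose(theta_star, np.concatenate(([ydd], alpha, beta)))
```

Note that the treatment and block estimates each sum to zero, and shrinking sa2 or sb2 towards zero shrinks the corresponding effects towards zero as the slide describes.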