Models & Estimation

Merlise Clyde

STA721 Linear Models
Duke University

September 6, 2012




Outline




  Readings: Christensen, Chapters 1-2 and Appendix A




Models

  Take a random vector Y ∈ R^n which is observable and decompose

                              Y = µ + ε

  into µ ∈ R^n (unknown, fixed) and ε ∈ R^n, an unobservable random
  error vector.

  Usual assumptions:
      E[ε] = 0         ⇒  E[Y] = µ (mean vector)
      Cov[ε] = σ²I_n   ⇒  Cov[Y] = σ²I_n (errors are uncorrelated)
      ε ∼ N(0, σ²I)    ⇒  Y ∼ N(µ, σ²I)

  The distributional assumption allows us to write down a likelihood
  function.
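
  As a concrete illustration of the decomposition, a small simulation
  sketch (hypothetical values, not from the original slides):

      import numpy as np

      rng = np.random.default_rng(42)
      mu = np.array([1.0, 1.0, 2.0, 2.0, 2.0])   # fixed, unknown in practice
      sigma = 0.5                                 # error standard deviation
      eps = rng.normal(0.0, sigma, size=mu.size)  # unobservable eps ~ N(0, sigma^2 I)
      Y = mu + eps                                # the observable random vector
      print(Y)                                    # E[Y] = mu, Cov[Y] = sigma^2 I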

Likelihood Functions


  The likelihood function for (µ, σ²) is proportional to the sampling
  distribution of the data:

      L(µ, σ²) ∝ (2πσ²)^{-n/2} exp{ -(1/2)(Y − µ)ᵀ(Y − µ)/σ² }
               ∝ (σ²)^{-n/2} exp{ -(1/2)‖Y − µ‖²/σ² }

  Find the values µ̂ ∈ R^n and σ̂² ∈ R⁺ that maximize the likelihood
  L(µ, σ²).

  Clearly µ̂ = Y, but then σ̂² = 0, which is outside the parameter
  space.
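
  To see why, profile out σ² (a standard calculus step, sketched here;
  it is not spelled out on the original slide). For fixed µ, setting
  ∂ log L/∂σ² = 0 gives

      σ̂²(µ) = ‖Y − µ‖²/n,   so   µ̂ = Y  ⇒  σ̂² = ‖Y − Y‖²/n = 0,

  a degenerate point mass at Y. Restricting µ to a lower-dimensional
  subspace (the next slides) avoids this whenever Y does not lie in
  that subspace.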

Restrictions on µ

  Linear model: µ = Xβ for fixed X (n × p) and some β ∈ R^p,
  restricting the possible values for µ
      Oneway Anova Model: j = 1, . . . , J groups with n_j replicates in
      each group j
      “cell means”: Y_ij = µ_j + ε_ij

          µ = \begin{pmatrix}
                1_{n_1} & 0_{n_1} & \cdots  & 0_{n_1} \\
                0_{n_2} & 1_{n_2} & \cdots  & 0_{n_2} \\
                \vdots  &         & \ddots  & \vdots  \\
                0_{n_J} & \cdots  & 0_{n_J} & 1_{n_J}
              \end{pmatrix}
              \begin{pmatrix} µ_1 \\ µ_2 \\ \vdots \\ µ_J \end{pmatrix}

      1_{n_j} is a vector of n_j ones; 0_{n_j} is a vector of n_j zeros.
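
      A minimal sketch of this design matrix in code (hypothetical
      sizes, J = 3; not from the original slides):

          import numpy as np

          n = [2, 3, 2]                      # hypothetical replicates n_j, J = 3
          J = len(n)
          X_cell = np.zeros((sum(n), J))     # cell-means design matrix
          row = 0
          for j, nj in enumerate(n):
              X_cell[row:row + nj, j] = 1.0  # the 1_{n_j} block in column j
              row += nj

          beta = np.array([1.0, 2.0, 3.0])   # group means mu_1, mu_2, mu_3
          mu = X_cell @ beta                 # every observation gets its group mean
          print(mu)                          # [1. 1. 2. 2. 2. 3. 3.]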

Equivalent Model

  Linear model µ = Xβ for fixed X (n × p) and some β ∈ R^p
      Oneway Anova Model: j = 1, . . . , J groups with n_j replicates in
      each group j
      “Treatment effects”: Y_ij = µ + τ_j + ε_ij

          µ = \begin{pmatrix}
                1_{n_1} & 1_{n_1} & 0_{n_1} & \cdots  & 0_{n_1} \\
                1_{n_2} & 0_{n_2} & 1_{n_2} & \cdots  & 0_{n_2} \\
                \vdots  & \vdots  &         & \ddots  & \vdots  \\
                1_{n_J} & 0_{n_J} & \cdots  & 0_{n_J} & 1_{n_J}
              \end{pmatrix}
              \begin{pmatrix} µ \\ τ_1 \\ \vdots \\ τ_J \end{pmatrix}

      Equivalent means: µ_j = µ + τ_j
      Should our inference for µ depend on how we represent or
      parameterize µ? (See the sketch below.)
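
      A sketch checking the equivalence numerically (reusing X_cell and
      n from the previous snippet; the names are illustrative, not from
      the slides):

          # treatment-effects design: intercept column plus group indicators
          X_treat = np.column_stack([np.ones(sum(n)), X_cell])

          # each matrix's columns lie in the other's column space:
          B1 = np.linalg.lstsq(X_cell, X_treat, rcond=None)[0]
          B2 = np.linalg.lstsq(X_treat, X_cell, rcond=None)[0]
          print(np.allclose(X_cell @ B1, X_treat))  # True: C(X_treat) ⊆ C(X_cell)
          print(np.allclose(X_treat @ B2, X_cell))  # True: C(X_cell) ⊆ C(X_treat)

      The two parameterizations describe exactly the same set of mean
      vectors; only the coordinates β differ.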
Column Space

  Many equivalent ways to represent the same mean vector –
  inference should be independent of the coordinate system used.
      Let X_1, X_2, . . . , X_p ∈ R^n
      The set of all linear combinations of X_1, . . . , X_p is the space
      spanned by X_1, . . . , X_p, written S(X_1, . . . , X_p)
      Let X = [X_1 X_2 . . . X_p] be an n × p matrix with columns X_j;
      then the column space of X, C(X) = S(X_1, . . . , X_p), is the
      space spanned by the (column) vectors of X
      µ ∈ C(X): C(X) = {µ ∈ R^n | Xβ = µ for some β ∈ R^p}
      (also called the range of X, R(X))
      β gives the “coordinates” of µ in this space
      C(X) is a subspace of R^n

Vector spaces
  A collection of vectors V is a real vector space if the following
  conditions hold: to any pair of vectors x and y in V there
  corresponds a vector x + y ∈ V, and for all scalars α, β ∈ R:
    1   x + y = y + x (vector addition is commutative)
    2   (x + y) + z = x + (y + z) (vector addition is associative)
    3   there exists a unique vector 0 ∈ V (the origin) such that
        x + 0 = x for every vector x
    4   for every x ∈ V , there exists a unique vector −x such that
        x + (−x) = 0
    5   α(βx) = (αβ)x (multiplication by scalars is associative)
    6   1x = x for every x
    7   α(x + y) = αx + αy (multiplication by scalars is distributive
        with respect to vector addition)
    8   (α + β)x = αx + βx (multiplication by scalars is distributive
        with respect to scalar addition)


Subspaces

  Definition
  Let V be a vector space and let V0 be a set with V0 ⊆ V . V0 is a
  subspace of V if and only if V0 is a vector space.

  Theorem
  Let V be a vector space, and let V0 be a non-empty subset of V .
  If V0 is closed under vector addition and scalar multiplication, then
  V0 is a subspace of V .

  Theorem
  Let V be a vector space, and let x1 , . . . , xr be vectors in V . The
  set of all linear combinations of x1 , . . . , xr is a subspace of V .

  The column space of X, C(X), is a subspace of R^n
Basis

  Definition
  A finite set of vectors {x_i} is linearly dependent if there exist
  scalars {a_i}, not all zero, such that Σ_i a_i x_i = 0. Conversely, if
  Σ_i a_i x_i = 0 implies that a_i = 0 ∀ i, the set is linearly
  independent.

  Definition
  If V0 is a subspace of V and if {x_1, . . . , x_r} is a linearly
  independent spanning set for V0, then {x_1, . . . , x_r} is a basis
  for V0.

  Theorem
  If V0 is a subspace of V, then all bases for V0 have the same
  number of vectors.

  Can the collections of vectors in both the cell-means and the
  treatment-effects parameterizations be bases?
Rank




  Definition
  The rank of a subspace V0 is the number of elements in a basis for
  V0 and is written r(V0). Similarly, if A is a matrix, the rank of
  C(A) is called the rank of A and is written r(A).

  What is the rank of the subspace in the Oneway ANOVA model?
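
  A self-contained numerical answer (a sketch; group sizes are
  hypothetical): the cell-means columns are linearly independent, so the
  rank is J; the treatment-effects matrix has J + 1 columns, but the
  intercept column is the sum of the indicators, so the rank is still J
  and only the cell-means columns form a basis for the subspace.

      import numpy as np

      n = [2, 3, 2]; J = len(n)              # rebuild the earlier designs
      X_cell = np.zeros((sum(n), J))
      row = 0
      for j, nj in enumerate(n):
          X_cell[row:row + nj, j] = 1.0
          row += nj
      X_treat = np.column_stack([np.ones(sum(n)), X_cell])

      print(np.linalg.matrix_rank(X_cell))   # 3 = J: columns are a basis
      print(np.linalg.matrix_rank(X_treat))  # 3, not 4: 1_n = sum of indicators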




Orthogonality

  Definition
  An inner product space is a vector space V equipped with an inner
  product ⟨·, ·⟩: a mapping V × V → R. Two vectors are orthogonal,
  written x ⊥ y, if ⟨x, y⟩ = 0 (usually ⟨x, y⟩ = xᵀy)

  Definition
  Two subspaces M and N are orthogonal if x ∈ M and y ∈ N imply
  that x ⊥ y

  Definition
  Orthonormal basis: {x_1, . . . , x_r} is an orthonormal basis (ONB)
  for M if ⟨x_i, x_j⟩ = 0 for i ≠ j and ⟨x_i, x_i⟩ = 1 for all i. The
  length of x is ‖x‖ = √⟨x, x⟩; the distance between two vectors is
  ‖x − y‖
Oneway Anova

  Find an orthonormal basis for the oneway ANOVA model. Is it
  unique?




  Can also use Gram-Schmidt sequential orthogonalization
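
  A minimal Gram-Schmidt sketch in code (illustrative, not from the
  slides); applied to a one-way ANOVA design it returns one orthonormal
  basis, and reordering or rescaling the inputs shows it is not unique:

      import numpy as np

      def gram_schmidt(X, tol=1e-10):
          # sequentially orthonormalize the columns of X;
          # linearly dependent columns are dropped
          basis = []
          for v in X.T:
              w = v.astype(float)
              for q in basis:
                  w = w - (q @ w) * q      # remove the component along q
              norm = np.linalg.norm(w)
              if norm > tol:               # keep independent directions only
                  basis.append(w / norm)
          return np.column_stack(basis)

      X = np.array([[1, 1, 0], [1, 1, 0],  # hypothetical 2-group design
                    [1, 0, 1], [1, 0, 1],  # with the intercept included
                    [1, 0, 1]])
      Q = gram_schmidt(X)
      print(Q.shape)   # (5, 2): rank 2, one column was dependent
      print(Q.T @ Q)   # 2 x 2 identity: orthonormal columns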

Decomposition

  Definition
  If M and N are subspaces of V that satisfy
      M ∩ N = {0}
      M + N = V , where M + N = {z | z = x + y, x ∈ M, y ∈ N}
      (written M ⊕ N = V , a direct sum)
  then M and N are complementary spaces; N is the complement of
  M.

  Definition
  If M and N are complementary spaces (M + N = R^n) and M and N
  are orthogonal subspaces, then M is the orthogonal complement of
  N, written N⊥.

  If z ∈ R^n = M + N, then z decomposes uniquely into a part
  x ∈ M plus a part y ∈ N, and r(M) + r(N) = n.
Projections



  Definition
  P is an orthogonal projection operator onto C(X) if and only if
      u ∈ C(X) implies Pu = u (projection)
      v ⊥ C(X) implies Pv = 0 (perpendicularity)

      C(X) and C(X)⊥ are complementary spaces
      If y = u + v for y ∈ R^n, u ∈ C(X), and v ⊥ C(X), then
      Py = Pu + Pv = u
      u is the orthogonal projection of y onto C(X)



More on Projections



  Proposition
  If P is an orthogonal projection operator onto C(X), then
  C(X) = C(P).

  Theorem
  P is an orthogonal projection operator onto C(P) if and only if
      P = P² (idempotent)
      P = Pᵀ (symmetric)

  Claim: If X is n × p and r(X) = p, then P_X = X(XᵀX)⁻¹Xᵀ is an
  orthogonal projection operator onto C(X).
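
  A quick numerical sanity check of the claim (an illustrative sketch
  with a random full-column-rank X, not from the slides):

      import numpy as np

      rng = np.random.default_rng(0)
      X = rng.standard_normal((10, 3))       # n = 10, p = 3, full column rank
      P = X @ np.linalg.solve(X.T @ X, X.T)  # P_X = X (X^T X)^{-1} X^T

      print(np.allclose(P @ P, P))           # idempotent: P^2 = P
      print(np.allclose(P, P.T))             # symmetric:  P = P^T
      print(np.allclose(P @ X, X))           # fixes C(X): P_X X = X
      print(np.allclose((np.eye(10) - P) @ X, 0))  # I - P_X annihilates C(X)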



Projections
  Claim: If X is n × p and r(X) = p, then P_X = X(XᵀX)⁻¹Xᵀ is an
  orthogonal projection operator onto C(X)
      P = P² (idempotent):

          P_X² = P_X P_X = X(XᵀX)⁻¹XᵀX(XᵀX)⁻¹Xᵀ
                         = X(XᵀX)⁻¹Xᵀ
                         = P_X

      P = Pᵀ (symmetric):

          P_Xᵀ = (X(XᵀX)⁻¹Xᵀ)ᵀ
               = (Xᵀ)ᵀ((XᵀX)⁻¹)ᵀ(X)ᵀ
               = X(XᵀX)⁻¹Xᵀ       (XᵀX is symmetric, so is its inverse)
               = P_X

      C(X) = C(P_X)


Projections


  Claim: I − P_X is an orthogonal projection onto C(X)⊥
      Idempotent:

          (I − P_X)² = (I − P_X)(I − P_X)
                     = I − P_X − P_X + P_X P_X
                     = I − P_X − P_X + P_X
                     = I − P_X

      Symmetry: I − P_X = (I − P_X)ᵀ
      u ∈ C(X)⊥ ⇒ u ⊥ C(X) and (I − P_X)u = u (projection)
      If v ∈ C(X), then (I − P_X)v = v − v = 0 (perpendicularity)

Null Space

      Column space: C(X)
      N(A), the null space of A, is {u | Au = 0}
      The null space of Xᵀ is {u | Xᵀu = 0}
      N(Xᵀ) = C(X)⊥, the orthogonal complement of C(X)
      N(P_X) = C(X)⊥:

          u ∈ N(P_X) ⇔ P_X u = 0
                      ⇔ X(XᵀX)⁻¹Xᵀu = 0
                      ⇔ uᵀX(XᵀX)⁻¹Xᵀ = 0ᵀ     (transpose; P_X is symmetric)
                      ⇔ uᵀX(XᵀX)⁻¹XᵀXβ = 0 for all β ∈ R^p
                      ⇔ uᵀv = 0 for all v = Xβ ∈ C(X)


Models Again


      Y ∼ N(µ, σ² I_n) with µ ∈ C(X)
      Claim: the Maximum Likelihood Estimator (MLE) of µ is P_X Y
      Log likelihood:

          log L(µ, σ²) = −(n/2) log(σ²) − (1/2) ‖Y − µ‖²/σ²

      Decompose Y = P_X Y + (I − P_X)Y
      Use P_X µ = µ and simplify:

          ‖Y − µ‖² = ‖(I − P_X)Y‖² + ‖P_X Y − µ‖²

      (the cross term vanishes because (I − P_X)Y ⊥ C(X)), so over
      µ ∈ C(X) the norm is minimized, and the likelihood maximized,
      at µ̂ = P_X Y.
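
      A closing numerical sketch (simulated data; the design, coefficients,
      and seed are hypothetical):

          import numpy as np

          rng = np.random.default_rng(1)
          X = rng.standard_normal((50, 3))         # any full-column-rank design
          Y = X @ np.array([1.0, -2.0, 0.5]) + rng.standard_normal(50)

          P = X @ np.linalg.solve(X.T @ X, X.T)    # projection onto C(X)
          mu_hat = P @ Y                           # MLE of mu

          beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
          print(np.allclose(mu_hat, X @ beta_hat)) # True: same as least squares

          sigma2_hat = np.sum((Y - mu_hat) ** 2) / len(Y)
          print(sigma2_hat)                        # MLE of sigma^2 (divides by n)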


