3. Introduction
▶ Factor analysis (FA) is a useful multivariate statistical technique for modelling the covariance or correlation structure between variables.
▶ The objective is to model the covariance or correlation structure by introducing some unobservable factors (also known as latent variables).
▶ This technique is commonly used in psychology, education research and marketing research, which often involve factors that cannot be directly observed, such as self-confidence, intelligence quotient (IQ), emotional quotient (EQ), verbal ability, analytic power, loyalty, etc.
4. The factor analysis model
The k-factor analysis model can be formulated as follows. Let
x = (X1, . . . , Xp)′ be observable random variables,
µ = (µ1, . . . , µp)′ be a constant vector representing the mean, and
f = (F1, . . . , Fk)′ be unobservable common factors (latent variables). Then we can write

    X1 = µ1 + λ11F1 + · · · + λ1kFk + ε1,
    X2 = µ2 + λ21F1 + · · · + λ2kFk + ε2,
        ⋮
    Xp = µp + λp1F1 + · · · + λpkFk + εp,

where λij is the factor loading (sensitivity) of the i-th response with respect to the j-th factor.
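As a quick illustration (a Python/NumPy sketch, not part of the original slides), we can simulate data from a 2-factor model with hypothetical loadings and check that the sample covariance of x is reproduced by the model structure:

```python
import numpy as np

rng = np.random.default_rng(0)
p, k, n = 4, 2, 50_000

Lam = np.array([[4.0, 1.0], [7.0, 2.0], [-1.0, 6.0], [1.0, 8.0]])  # hypothetical Λ
psi = np.array([2.0, 4.0, 1.0, 3.0])                               # diag of Ψ

F = rng.standard_normal((n, k))                    # common factors, Var(f) = I_k
eps = rng.standard_normal((n, p)) * np.sqrt(psi)   # unique factors, Var(ε) = Ψ
X = F @ Lam.T + eps                                # x = µ + Λf + ε, with µ = 0

S = np.cov(X, rowvar=False)                        # sample covariance of x
Sigma = Lam @ Lam.T + np.diag(psi)                 # model-implied covariance
print(np.round(S - Sigma, 1))                      # entries close to zero
```

With n large, the sample covariance of the simulated x matches ΛΛ′ + Ψ up to sampling error.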
5. The factor analysis model
In matrix notation, we have

    x = µ + Λf + ε,    (1)

where

    Λ = ⎡ λ11 · · · λ1k ⎤         ⎡ ε1 ⎤
        ⎢  ⋮         ⋮  ⎥ ,   ε = ⎢  ⋮ ⎥ ,   and k < p.
        ⎣ λp1 · · · λpk ⎦         ⎣ εp ⎦

Λ is a p × k matrix of factor loadings with respect to the common factors f, and ε is a p × 1 vector of unique factors (also known as specific factors, or uniquenesses).
Since we are interested in the covariance structure rather than the mean, we may simply assume that µ = 0. In addition, we also assume that E(f) = 0 and E(ε) = 0, and hence E(x) = 0.
7. The factor analysis model
We further assume that

    Var(f) = Ik,
    Var(ε) = Ψ = diag(ψ11, . . . , ψpp),
    Cov(f, ε) = 0k×p.

The covariance matrix of x can then be written as

    Σ = Var(Λf + ε) = ΛΛ′ + Ψ.    (2)

The variance of Xi can be split into two parts:

    σii = λ²i1 + · · · + λ²ik + ψii = h²i + ψii,

where h²i = λ²i1 + · · · + λ²ik is called the communality and ψii is the specific variance.
11. Example
with

    Λ = ⎡  4  1 ⎤              ⎡ 2 0 0 0 ⎤
        ⎢  7  2 ⎥    and   Ψ = ⎢ 0 4 0 0 ⎥
        ⎢ −1  6 ⎥              ⎢ 0 0 1 0 ⎥
        ⎣  1  8 ⎦              ⎣ 0 0 0 3 ⎦

The variance of X1 can be decomposed as

    σ11 = λ²11 + λ²12 + ψ11,

or

    19 = 4² + 1² + 2.
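The same decomposition holds for every variable; a small Python sketch using the Λ and Ψ above:

```python
# Λ and Ψ = diag(2, 4, 1, 3) from the example above
Lam = [[4, 1], [7, 2], [-1, 6], [1, 8]]
psi = [2, 4, 1, 3]

# communality h_i² = λ_i1² + λ_i2², and σ_ii = h_i² + ψ_ii
communality = [sum(l**2 for l in row) for row in Lam]
sigma_diag = [h2 + ps for h2, ps in zip(communality, psi)]

print(communality)  # [17, 53, 37, 65]
print(sigma_diag)   # [19, 57, 38, 68]
```

The first entry reproduces σ11 = 17 + 2 = 19 from the slide.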
12. Remarks on standardization
▶ For cases in which the units of the variables are not comparable, it is usually desirable to work with the standardized variables.
▶ Standardization avoids the problem of having one variable with a large variance dominate the factor loadings.
▶ The decomposition of the covariance matrix can then be applied to the sample correlation matrix R,

    R = Λ̂Λ̂′ + Ψ̂.

▶ Note that the results based on Σ and R are not the same.
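The first two remarks can be seen numerically; a Python/NumPy sketch (with made-up data) showing that the covariance matrix of the standardized variables is exactly the correlation matrix of the originals:

```python
import numpy as np

rng = np.random.default_rng(1)
# three variables on wildly different scales (hypothetical data)
x = rng.standard_normal((200, 3)) * np.array([1.0, 100.0, 0.01])

z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)   # standardize each column
R = np.corrcoef(x, rowvar=False)                   # correlation matrix of x

print(np.allclose(np.cov(z, rowvar=False), R))     # True
```

Factoring R rather than Σ therefore amounts to factoring the covariance matrix of the standardized data.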
13. Maximum likelihood estimation
If x has a multivariate normal distribution, then (n − 1)S has a Wishart distribution with n − 1 degrees of freedom, where n is the number of observations. Recall from Chapter 2 that the density function of the Wishart distribution with n − 1 degrees of freedom is

    f_{n−1}(S | Σ) = |S|^{(n−p−2)/2} |Σ|^{−(n−1)/2} e^{−tr(Σ⁻¹S)/2} / K,

where K is a scaling constant. Hence, the log-likelihood function can be defined as

    l(Σ | S) = −ln K + ((n − p − 2)/2) ln|S| − ((n − 1)/2) ln|Σ| − tr(Σ⁻¹S)/2

in terms of the unknown parameter Σ.
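The log-likelihood is straightforward to code; a Python/NumPy sketch (ln K is set to 0 here, since the constant does not affect the maximization):

```python
import numpy as np

def loglik(Sigma, S, n, lnK=0.0):
    """Wishart log-likelihood l(Σ|S) from the formula above."""
    p = S.shape[0]
    _, logdet_S = np.linalg.slogdet(S)        # ln|S|
    _, logdet_Sigma = np.linalg.slogdet(Sigma)  # ln|Σ|
    return (-lnK
            + (n - p - 2) / 2 * logdet_S
            - (n - 1) / 2 * logdet_Sigma
            - np.trace(np.linalg.solve(Sigma, S)) / 2)  # tr(Σ⁻¹S)/2

# sanity check: Σ = S = I_2, n = 10 gives 0 + 0 − tr(I_2)/2 = −1
print(loglik(np.eye(2), np.eye(2), n=10))  # -1.0
```

The factor-model likelihood of the next slide is obtained by evaluating this at Σ = ΛΛ′ + Ψ.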
14. Maximum likelihood estimation
Alternatively, using Σ = ΛΛ′ + Ψ, we can rewrite the log-likelihood function as

    l(Λ, Ψ | S) = −ln K + ((n − p − 2)/2) ln|S| − ((n − 1)/2) ln|ΛΛ′ + Ψ|
                  − tr[(ΛΛ′ + Ψ)⁻¹S]/2    (3)

in terms of Λ and Ψ. The maximum likelihood estimators (MLEs) Λ̂ and Ψ̂ are the values that maximize l(Λ, Ψ | S).
This is a complicated maximization problem because both Λ̂ and Ψ̂ are matrix-valued. While an explicit solution is not available, Jöreskog (1969) developed a reliable numerical method for computing the maximum likelihood estimates.
15. Maximum likelihood estimation
R has a built-in function factanal() to compute the MLE of the k-factor model on the correlation matrix. Let us use the decath.csv data again to illustrate this.

d <- read.csv("decath.csv")       # read in data
x <- d[, 2:11]                    # extract columns 2 to 11
fa2 <- factanal(x, factors=2, scores="regression")  # save output to fa2
names(fa2)                        # display items in fa2
 [1] "converged"    "loadings"     "uniquenesses" "correlation"  "criteria"
 [6] "factors"      "dof"          "method"       "rotmat"       "scores"
[11] "STATISTIC"    "PVAL"         "n.obs"        "call"
(U <- fa2$uniquenesses)           # save and display uniquenesses to U
  m100  h110  m400 m1500 longjump highjump  pole  shot discus   jav
 0.284 0.287 0.234 0.594    0.347    0.737 0.285 0.109  0.188 0.473
16. Maximum likelihood estimation
Let us display the factor loadings Λ̂ = L, the communality and the uniqueness Ψ̂ = diag(U).

(L <- fa2$loadings)               # save and display factor loadings to L
Loadings:
         Factor1 Factor2
m100      0.781  -0.325
h110      0.741  -0.406
m400      0.875  -0.030
m1500     0.544   0.333
longjump -0.738   0.328
highjump -0.388   0.335
pole     -0.576   0.619
shot     -0.135   0.934
discus   -0.102   0.895
jav      -0.177   0.704

               Factor1 Factor2
SS loadings      3.308   3.155
Proportion Var   0.331   0.315
Cumulative Var   0.331   0.646

apply(L^2, 1, sum)                # compute communality
  m100  h110  m400 m1500 longjump highjump  pole  shot discus   jav
 0.716 0.713 0.766 0.406    0.653    0.263 0.715 0.891  0.812 0.527
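As a check on these numbers (a Python sketch with the values transcribed from the output above), each communality plus the corresponding uniqueness should equal 1 up to rounding, since the analysis is based on the correlation matrix:

```python
# rotated loadings and uniquenesses transcribed from the factanal() output
loadings = {
    "m100": (0.781, -0.325), "h110": (0.741, -0.406), "m400": (0.875, -0.030),
    "m1500": (0.544, 0.333), "longjump": (-0.738, 0.328),
    "highjump": (-0.388, 0.335), "pole": (-0.576, 0.619),
    "shot": (-0.135, 0.934), "discus": (-0.102, 0.895), "jav": (-0.177, 0.704),
}
uniq = {"m100": 0.284, "h110": 0.287, "m400": 0.234, "m1500": 0.594,
        "longjump": 0.347, "highjump": 0.737, "pole": 0.285,
        "shot": 0.109, "discus": 0.188, "jav": 0.473}

for name, (l1, l2) in loadings.items():
    h2 = l1**2 + l2**2                        # communality h_i²
    assert abs(h2 + uniq[name] - 1.0) < 0.01  # h_i² + ψ_ii ≈ 1, up to rounding
```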
17. Maximum likelihood estimation
Factor 1 can be interpreted as a weighted average of speed and jumping ability, while factor 2 represents power. We can also compute Λ̂Λ̂′ + Ψ̂ and compare it with the correlation matrix.

RMLE <- L %*% t(L) + diag(U)      # compute RMLE = LL'+U
round(RMLE, 3)                    # display RMLE
           m100   h110   m400  m1500 longjump highjump   pole   shot discus    jav
m100      1.000  0.710  0.693  0.317   -0.683   -0.412 -0.651 -0.409 -0.371 -0.367
h110      0.710  1.000  0.660  0.268   -0.680   -0.423 -0.678 -0.479 -0.439 -0.417
m400      0.693  0.660  1.000  0.466   -0.656   -0.350 -0.522 -0.146 -0.116 -0.176
m1500     0.317  0.268  0.466  1.000   -0.292   -0.099 -0.107  0.238  0.242  0.138
longjump -0.683 -0.680 -0.656 -0.292    1.000    0.396  0.628  0.406  0.369  0.361
highjump -0.412 -0.423 -0.350 -0.099    0.396    1.000  0.431  0.366  0.340  0.305
pole     -0.651 -0.678 -0.522 -0.107    0.628    0.431  1.000  0.656  0.613  0.538
shot     -0.409 -0.479 -0.146  0.238    0.406    0.366  0.656  1.000  0.850  0.681
discus   -0.371 -0.439 -0.116  0.242    0.369    0.340  0.613  0.850  1.000  0.648
jav      -0.367 -0.417 -0.176  0.138    0.361    0.305  0.538  0.681  0.648  1.000
18. Maximum likelihood estimation
R <- cor(x)
round(R, 3)                       # compare with corr. matrix
           m100   h110   m400  m1500 longjump highjump   pole   shot discus    jav
m100      1.000  0.751  0.698  0.254   -0.691   -0.364 -0.627 -0.420 -0.353 -0.344
h110      0.751  1.000  0.655  0.155   -0.654   -0.487 -0.709 -0.489 -0.403 -0.350
m400      0.698  0.655  1.000  0.554   -0.636   -0.275 -0.521 -0.142 -0.154 -0.150
m1500     0.254  0.155  0.554  1.000   -0.356   -0.132 -0.070  0.202  0.288  0.045
longjump -0.691 -0.654 -0.636 -0.356    1.000    0.471  0.632  0.391  0.375  0.446
highjump -0.364 -0.487 -0.275 -0.132    0.471    1.000  0.472  0.321  0.376  0.338
pole     -0.627 -0.709 -0.521 -0.070    0.632    0.472  1.000  0.643  0.620  0.557
shot     -0.420 -0.489 -0.142  0.202    0.391    0.321  0.643  1.000  0.856  0.703
discus   -0.353 -0.403 -0.154  0.288    0.375    0.376  0.620  0.856  1.000  0.618
jav      -0.344 -0.350 -0.150  0.045    0.446    0.338  0.557  0.703  0.618  1.000
19. Rotation of factors
The covariance structure does not change if Λ is replaced by Π = ΛG for any orthogonal matrix G, because

    ΠΠ′ + Ψ = ΛGG′Λ′ + Ψ
            = ΛIΛ′ + Ψ
            = ΛΛ′ + Ψ
            = Σ.

Geometrically speaking, multiplication by an orthogonal matrix is equivalent to a rotation of the principal axes. So it is possible to find an orthogonal matrix (also known as a rotation matrix) that makes the interpretation of the factors easier.
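This invariance is easy to verify numerically; a Python/NumPy sketch with a hypothetical loading matrix and a 2×2 rotation:

```python
import numpy as np

rng = np.random.default_rng(2)
Lam = rng.standard_normal((5, 2))             # hypothetical 5×2 loadings
Psi = np.diag(rng.uniform(0.5, 2.0, size=5))  # hypothetical specific variances

theta = 0.7                                   # any angle gives an orthogonal G
G = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

Pi = Lam @ G                                  # rotated loadings
print(np.allclose(Pi @ Pi.T + Psi, Lam @ Lam.T + Psi))  # True: Σ is unchanged
```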
20. Rotation of factors
One commonly used method to determine the rotation matrix is called varimax, which is the default option of factanal() in R.
Varimax finds the orthogonal matrix G such that

    V = ∑ⱼ₌₁ᵏ [ ∑ᵢ₌₁ᵖ π⁴ij − (1/p)( ∑ᵢ₌₁ᵖ π²ij )² ]
      = ∑ⱼ₌₁ᵏ ∑ᵢ₌₁ᵖ ( π²ij − π̄²•j )²

is maximized, where π̄²•j = (1/p) ∑ᵢ₌₁ᵖ π²ij and πij is the (i, j)-th entry of Π = ΛG. The factor loadings in the output of factanal() are the rotated factor loadings.
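The two forms of the criterion are algebraically identical, which a short Python/NumPy sketch can confirm on a random loading matrix:

```python
import numpy as np

def varimax_criterion(Pi):
    # V = Σ_j [ Σ_i π_ij⁴ − (1/p)(Σ_i π_ij²)² ]
    p = Pi.shape[0]
    return np.sum(np.sum(Pi**4, axis=0) - (Pi**2).sum(axis=0)**2 / p)

rng = np.random.default_rng(3)
Pi = rng.standard_normal((6, 2))

# equivalent form: Σ_j Σ_i (π_ij² − column mean of π_ij²)²
V2 = np.sum((Pi**2 - (Pi**2).mean(axis=0))**2)
print(np.isclose(varimax_criterion(Pi), V2))  # True
```

In words, varimax maximizes the variance of the squared loadings within each column, pushing each loading toward 0 or ±1 so that each variable loads mainly on one factor.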
22. Factor scores
Once the k-factor model is fitted and the factor loadings are obtained, it may be of interest to estimate the realized value of the factors f given an individual observation x0 = (x1, . . . , xp)′. That is,

    f̂0 = E(f | x = x0).

To compute the above conditional expectation, we need the joint distribution of x = (X1, . . . , Xp)′ and f = (F1, . . . , Fk)′. Noting that Cov(Xi, Fj) = λij, we get

    ⎡ x ⎤            ⎛ ⎡ 0 ⎤    ⎡ ΛΛ′ + Ψ   Λ  ⎤ ⎞
    ⎣ f ⎦  ∼  Np+k   ⎝ ⎣ 0 ⎦ ,  ⎣    Λ′     Ik ⎦ ⎠ .
24. Factor scores
Then, the conditional expectation of f given x is

    f̂ = E(f | x) = Λ′(ΛΛ′ + Ψ)⁻¹x.

Replacing Λ and Ψ with their estimates and given the observation x = x0, the factor score is given by

    f̂0 = Λ̂′(Λ̂Λ̂′ + Ψ̂)⁻¹x0.

The factor score defined above is called the regression factor score. However, it is a biased estimator.
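A Python/NumPy sketch of the regression score, reusing the hypothetical Λ and Ψ from the earlier numerical example (x0 here is a made-up observation):

```python
import numpy as np

Lam = np.array([[4.0, 1.0], [7.0, 2.0], [-1.0, 6.0], [1.0, 8.0]])  # example Λ
Psi = np.diag([2.0, 4.0, 1.0, 3.0])                                # example Ψ

def regression_score(x0):
    # f̂0 = Λ'(ΛΛ' + Ψ)⁻¹ x0
    Sigma = Lam @ Lam.T + Psi
    return Lam.T @ np.linalg.solve(Sigma, x0)

x0 = np.array([5.0, 9.0, 5.0, 9.0])   # hypothetical observation (mean removed)
print(regression_score(x0))           # a length-2 vector of factor scores
```

Using np.linalg.solve rather than an explicit inverse is the numerically preferred way to apply (ΛΛ′ + Ψ)⁻¹.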
26. Factor scores
An alternative, unbiased estimate is

    f̂ = [(Ψ^{−1/2}Λ)′(Ψ^{−1/2}Λ)]⁻¹(Ψ^{−1/2}Λ)′Ψ^{−1/2}x
      = (Λ′Ψ⁻¹Λ)⁻¹Λ′Ψ⁻¹x.
27. Factor scores
The estimate f̂ defined above is unbiased in the sense that

    E[f̂ | f] = E[(Λ′Ψ⁻¹Λ)⁻¹Λ′Ψ⁻¹x | f]
             = E[(Λ′Ψ⁻¹Λ)⁻¹Λ′Ψ⁻¹(Λf + ε) | f]
             = (Λ′Ψ⁻¹Λ)⁻¹Λ′Ψ⁻¹Λf
             = f.

The factor score is then obtained by replacing Λ and Ψ with their estimates and x by the observation x0:

    f̂0 = (Λ̂′Ψ̂⁻¹Λ̂)⁻¹Λ̂′Ψ̂⁻¹x0.

This estimate is known as Bartlett's factor score.
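A Python/NumPy sketch of Bartlett's score; with ε = 0 the estimator recovers f exactly, which mirrors the unbiasedness argument above (Λ and Ψ are the hypothetical values from the earlier example):

```python
import numpy as np

Lam = np.array([[4.0, 1.0], [7.0, 2.0], [-1.0, 6.0], [1.0, 8.0]])  # example Λ
Psi_inv = np.diag(1 / np.array([2.0, 4.0, 1.0, 3.0]))              # Ψ⁻¹

def bartlett_score(x0):
    # f̂0 = (Λ'Ψ⁻¹Λ)⁻¹ Λ'Ψ⁻¹ x0
    A = Lam.T @ Psi_inv @ Lam
    return np.linalg.solve(A, Lam.T @ Psi_inv @ x0)

f = np.array([1.5, -0.5])
print(np.allclose(bartlett_score(Lam @ f), f))  # True: noise-free x returns f
```

This is just generalized least squares of x on the columns of Λ, with Ψ as the error covariance.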
29. Factor scores
The R function factanal() computes the factor scores and stores them in the scores component of the output object. The option

    scores="regression"

computes the regression factor scores, while

    scores="Bartlett"

outputs Bartlett's factor scores.
30. Factor scores
Let us plot these factor scores against the observation number.

fs1 <- fa2$scores[,1]             # save 1st factor scores to fs1
fs2 <- fa2$scores[,2]             # save 2nd factor scores to fs2
par(mfrow=c(2,1))                 # define 2x1 multi-frame graph
plot(fs1, type="o")               # plot fs1
plot(fs2, type="o")               # plot fs2
par(mfrow=c(1,1))                 # reset multi-frame graph to 1x1
plot(fs1, fs2, main="factor score with obs. no.")
text(fs1-0.1, fs2+0.1, cex=0.6)   # add obs. no. to the points

Recall that these observations are ordered according to the athletes' official results. For the first factor score, smaller is better, while for the second factor score, larger is better.
31. Factor scores
0 5 10 15 20 25 30 35
−1
0
1
2
Index
fs1
0 5 10 15 20 25 30 35
−3
−1
0
1
2
Index
fs2
Figure 1: A plot of the factor scores against observation number.
24 / 25