Linear Algebra in PCA

Problem Definition
Suppose we have n individuals and, for each individual, we measure the
same m variables, i.e. we have n data points, each a point in $\mathbb{R}^m$.
For example, suppose we have 120 students in a class and have recorded
the following for each student:
‘Registration Number’, ‘MA204 End-Sem Marks’ and ‘MA204 grade’, i.e. m
= 3 and n = 120 here.
1: Which variables are correlated?
In our example above, we can expect a correlation between the
‘MA204 grade’ and ‘MA204 End-Sem Marks’ variables,
but there wouldn’t be any correlation between the ‘Registration
Number’ and ‘MA204 grade’ of a student.
2: Which variables are the most important in describing the full
dataset?
Some variables are more important in describing the dataset, while
others contribute no significant information about it.
3: Can the data be visualized in a simpler way?
In our example, do the data points in $\mathbb{R}^3$ essentially cluster
around a plane, or is there a simpler way of seeing the data?

Linear Algebra Application in Principal Component Analysis
Let us take a 3×4 dataset $A = (a_{ij})$, with one row per observation (n = 3) and one column per variable (m = 4).
Mean and variance:
We know that the mean of n points $a_1, a_2, \dots, a_n$ is given by
$$\mu = \frac{1}{n}\,(a_1 + a_2 + \dots + a_n)$$
and the variance is given by
$$\sigma^2 = \frac{1}{n-1}\,\left[(a_1-\mu)^2 + (a_2-\mu)^2 + \dots + (a_n-\mu)^2\right]$$
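Now we recenter the data so that the mean of every variable becomes zero. This is done by subtracting
from each column of A the mean of that column.
Transposing the result gives the 4×3 matrix B, each of whose rows has mean zero:
$$B = \begin{pmatrix}
a_{11}-\mu_1 & a_{21}-\mu_1 & a_{31}-\mu_1 \\
a_{12}-\mu_2 & a_{22}-\mu_2 & a_{32}-\mu_2 \\
a_{13}-\mu_3 & a_{23}-\mu_3 & a_{33}-\mu_3 \\
a_{14}-\mu_4 & a_{24}-\mu_4 & a_{34}-\mu_4
\end{pmatrix}$$
where $\mu_i$ is the mean of the ith column of A.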
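The centering step can be sketched in NumPy as follows. The numerical values of A on the original slide are not available, so the 3×4 matrix below is a made-up example; B is formed as the transpose of the centered data so that it is 4×3, as in the text.

```python
import numpy as np

# Hypothetical 3x4 dataset: 3 observations (rows), 4 variables (columns).
# Illustrative values only; these are NOT the numbers from the original slide.
A = np.array([
    [1.0, 2.0, 0.0, 4.0],
    [3.0, 0.0, 1.0, 2.0],
    [5.0, 4.0, 2.0, 0.0],
])

n = A.shape[0]                # number of observations
mu = A.mean(axis=0)           # mean of each column (variable)
var = A.var(axis=0, ddof=1)   # sample variance: divides by n - 1

# Subtract each column's mean, then transpose so that rows are variables.
B = (A - mu).T                # shape (4, 3)

print("column means:", mu)
print("column variances:", var)
print("row means of B:", B.mean(axis=1))
```

Passing `ddof=1` makes NumPy divide by n - 1, matching the variance formula used here; the default `ddof=0` would divide by n instead.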
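Covariance:
Let us find the covariance between two variables A and B, which tells us how much B
varies as A varies. (Here A and B denote two columns of data with entries $a_1, \dots, a_n$ and $b_1, \dots, b_n$, not the matrices above.)
$$\operatorname{cov}(A, B) = \frac{1}{n-1}\,\left[(a_1-\mu_A)(b_1-\mu_B) + (a_2-\mu_A)(b_2-\mu_B) + \dots + (a_n-\mu_A)(b_n-\mu_B)\right]$$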
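As a small illustration, the covariance formula can be written directly in NumPy and compared with the built-in `np.cov`. The helper name `sample_cov` and the two short data vectors are hypothetical, chosen only for this sketch.

```python
import numpy as np

def sample_cov(a, b):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = a.size
    return np.sum((a - a.mean()) * (b - b.mean())) / (n - 1)

# Hypothetical measurements of two variables on the same 3 individuals.
a = [1.0, 3.0, 5.0]
b = [2.0, 0.0, 4.0]

c = sample_cov(a, b)
c_numpy = np.cov(a, b)[0, 1]   # off-diagonal entry of the 2x2 covariance matrix

print(c, c_numpy)
```

A positive value indicates that the two variables tend to increase together, a negative value that one tends to decrease as the other increases.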
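Now, let S be defined as
$$S = \frac{1}{n-1}\,BB^T$$
Clearly S is a symmetric matrix.
Now,
$$S_{ii} = \frac{1}{n-1}\,\left[(a_{1i}-\mu_i)^2 + (a_{2i}-\mu_i)^2 + (a_{3i}-\mu_i)^2\right]$$
and
$$S_{ij} = \frac{1}{n-1}\,\left[(a_{1i}-\mu_i)(a_{1j}-\mu_j) + (a_{2i}-\mu_i)(a_{2j}-\mu_j) + (a_{3i}-\mu_i)(a_{3j}-\mu_j)\right]$$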

Clearly $S_{ii}$ is the variance of the ith variable and $S_{ij}$ is the covariance of the ith
and jth variables, so S is the covariance matrix of the dataset.
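The construction of S can be checked numerically. The sketch below again assumes a made-up 3×4 matrix A (not the values from the original slide), builds S from B, and compares it with NumPy's own covariance routine.

```python
import numpy as np

# Hypothetical 3x4 dataset: rows are observations, columns are variables.
A = np.array([
    [1.0, 2.0, 0.0, 4.0],
    [3.0, 0.0, 1.0, 2.0],
    [5.0, 4.0, 2.0, 0.0],
])
n = A.shape[0]

B = (A - A.mean(axis=0)).T    # 4x3: centered data, one row per variable
S = B @ B.T / (n - 1)         # 4x4 covariance matrix

# Reference values computed by NumPy; rowvar=False means columns are variables.
S_ref = np.cov(A, rowvar=False)

print(S)
print("symmetric:", np.allclose(S, S.T))
print("matches np.cov:", np.allclose(S, S_ref))
```

The diagonal of S holds the variance of each variable, and each off-diagonal entry holds the covariance of a pair of variables.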
Spectral Theorem:
If A is a real symmetric n×n matrix (meaning $A = A^T$), then A is orthogonally diagonalizable and has only real
eigenvalues. In other words, there exist real numbers $\lambda_1, \dots, \lambda_n$ (the eigenvalues) and mutually orthogonal,
non-zero real vectors $\bar{u}_1, \dots, \bar{u}_n$ (the eigenvectors) such that for each i = 1, 2, …, n:
$$A\bar{u}_i = \lambda_i \bar{u}_i$$
For any real matrix A, the matrices $AA^T$ and $A^TA$ share the same non-zero eigenvalues, and the
eigenvalues of $AA^T$ and $A^TA$ are non-negative numbers.
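Both statements can be checked numerically on an example. The sketch below uses an arbitrary random 4×3 matrix M (an assumption made only for illustration) and `np.linalg.eigh`, NumPy's eigen-solver for symmetric matrices, which returns real eigenvalues in ascending order and orthonormal eigenvectors as columns.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 3))   # arbitrary real 4x3 matrix (illustrative)

G_big = M @ M.T      # 4x4 symmetric matrix
G_small = M.T @ M    # 3x3 symmetric matrix

w_big, U = np.linalg.eigh(G_big)
w_small, _ = np.linalg.eigh(G_small)

# Columns of U are orthonormal: U^T U = I.
print("orthonormal:", np.allclose(U.T @ U, np.eye(4)))
# Each column satisfies G u_i = lambda_i u_i (U * w_big scales column i by w_big[i]).
print("eigen-equation:", np.allclose(G_big @ U, U * w_big))
# G_big has one extra eigenvalue, which is numerically zero.
print("eigenvalues of M M^T:", w_big)
print("eigenvalues of M^T M:", w_small)
```

Because M has only 3 columns, the 4×4 product has rank at most 3, so its smallest eigenvalue is zero up to rounding error, while the remaining three agree with those of the 3×3 product.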
From the Spectral Theorem, we can orthogonally diagonalize S, as it is a symmetric
matrix. Let the eigenvalues of S, ordered from largest to smallest, be $\lambda_1 \ge \lambda_2 \ge \lambda_3 \ge \lambda_4$ and the corresponding
orthonormal eigenvectors be $\bar{u}_1, \bar{u}_2, \bar{u}_3, \bar{u}_4$. These eigenvectors are called the
principal components of the dataset.
The trace T of S is the sum of its diagonal elements, which is the
sum of the variances of all the variables, and hence is the total variance.
The trace of a matrix is also equal to the sum of its eigenvalues, so $T = \lambda_1 + \lambda_2 + \lambda_3 + \lambda_4$.
The following interpretation is fundamental to PCA:
• The direction in $\mathbb{R}^m$ given by $\bar{u}_1$ (the first principal direction) “explains” or “accounts for” an amount $\lambda_1$ of the total variance T. What fraction of the total variance? It’s $\lambda_1/T$. Similarly, the second principal direction $\bar{u}_2$ accounts for the fraction $\lambda_2/T$ of the total variance, and so on.
• Thus, the vector $\bar{u}_1 \in \mathbb{R}^m$ points in the most “significant” direction of the data set.
• Among directions that are orthogonal to $\bar{u}_1$, $\bar{u}_2$ points in the most significant direction; among directions orthogonal to both $\bar{u}_1$ and $\bar{u}_2$, $\bar{u}_3$ points in the most significant direction, and so on.
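The interpretation above can be illustrated with a short NumPy sketch. It assumes a made-up 3×4 dataset A (not the values from the original slide), computes the eigenvalues of S in decreasing order, and reports the fraction of the total variance T explained by each principal direction.

```python
import numpy as np

# Hypothetical 3x4 dataset: rows are observations, columns are variables.
A = np.array([
    [1.0, 2.0, 0.0, 4.0],
    [3.0, 0.0, 1.0, 2.0],
    [5.0, 4.0, 2.0, 0.0],
])
n = A.shape[0]

B = (A - A.mean(axis=0)).T
S = B @ B.T / (n - 1)

# eigh returns eigenvalues in ascending order; reorder to descending.
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]      # column i is the i-th principal direction

T = np.trace(S)                  # total variance
explained = eigvals / T          # fraction explained by each direction

print("total variance T:", T)
print("sum of eigenvalues:", eigvals.sum())
print("explained fractions:", explained)
```

With only 3 observations the centered data spans at most 2 dimensions, so in this small example at most two of the four fractions are non-zero (up to rounding error).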
Dimension reduction:
It is often the case that the largest few eigenvalues of S are much greater
than all the others. For instance, suppose m = 10, the total variance T = 100,
and $\lambda_1 = 90.5$, $\lambda_2 = 8.9$ and $\lambda_3, \dots, \lambda_{10}$ are all less than 0.1. This means that
the first and second principal directions together explain 99.4 percent of the
total variation in the data. Thus, even though our data points might
form a cloud in $\mathbb{R}^{10}$ (which seems impossible to visualize), PCA tells us
that these points cluster near a two-dimensional plane (spanned by $\bar{u}_1$
and $\bar{u}_2$). In fact, the data points will look something like a rectangular strip
inside that plane, since $\lambda_1$ is a lot bigger than $\lambda_2$ (similar to the previous
example). We have effectively reduced the problem from ten dimensions
down to two.
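A rough sketch of this reduction in NumPy is given below. The data are synthetic and generated only for illustration: 200 points in ten dimensions that vary strongly along two hidden directions plus a little noise. The scale factors 9.5 and 3.0 are assumptions chosen so that the two leading eigenvalues are of roughly the same order as in the example above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 200, 10, 2     # observations, variables, dimensions to keep

# Synthetic data (illustrative): two dominant directions plus small noise.
basis, _ = np.linalg.qr(rng.standard_normal((m, k)))     # m x k, orthonormal columns
coords = rng.standard_normal((n, k)) * np.array([9.5, 3.0])
X = coords @ basis.T + 0.1 * rng.standard_normal((n, m)) # n x m data matrix

B = (X - X.mean(axis=0)).T        # m x n centered data, one row per variable
S = B @ B.T / (n - 1)             # m x m covariance matrix

eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]           # sort eigenvalues in descending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

U2 = eigvecs[:, :k]               # first two principal directions (m x 2)
Y = U2.T @ B                      # 2 x n: coordinates of each point in that plane
fraction = eigvals[:k].sum() / eigvals.sum()

print("fraction of variance in the first two directions:", fraction)
print("reduced data shape:", Y.shape)
```

Each column of Y gives the two coordinates of one data point within the plane spanned by the first two principal directions, and the sample variances of the two rows of Y equal the two largest eigenvalues.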
