2. PCA - Overview
• A backbone of modern data analysis.
• A black box that is widely used but poorly understood.
• It is a mathematical tool from applied linear algebra.
• It is a simple, non-parametric method of extracting relevant
information from confusing data sets.
• It provides a roadmap for how to reduce a complex data set to a
lower dimension.
• PCA is used to reduce the dimensionality of data without much loss of
information.
3. • PCA is “an orthogonal linear transformation that transforms the data to a new
coordinate system such that the greatest variance by any projection of the data
comes to lie on the first coordinate (first principal component), the second
greatest variance lies on the second coordinate (second principal component),
and so on.”
• An exploratory technique used to reduce the dimensionality of the data set to
2D or 3D
• Can be used to:
• Reduce number of dimensions in data
• Find patterns in high-dimensional data
• Visualize data of high dimensionality
• Example applications:
• Face recognition, Image compression, Gene expression analysis
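The steps the slides describe (project onto the directions of greatest variance, keep the top few) can be sketched in Python with NumPy. The data values below are made up purely for illustration; this is a minimal sketch, not a production implementation.

```python
import numpy as np

# Toy data: 6 samples in 3 dimensions (values are made up for illustration).
X = np.array([[2.5, 2.4, 0.5],
              [0.5, 0.7, 1.9],
              [2.2, 2.9, 0.4],
              [1.9, 2.2, 0.8],
              [3.1, 3.0, 0.1],
              [2.3, 2.7, 0.6]])

# Step 1: center the data by subtracting the mean of each dimension.
X_centered = X - X.mean(axis=0)

# Step 2: eigendecomposition of the covariance matrix.
cov = np.cov(X_centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# Step 3: sort components by decreasing eigenvalue (variance explained),
# so the first principal component carries the greatest variance.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 4: project onto the top-2 principal components (3D -> 2D).
X_reduced = X_centered @ eigvecs[:, :2]
print(X_reduced.shape)  # (6, 2)
```

The projection keeps the two coordinates with the greatest variance, which is exactly the 2D visualization use case mentioned above.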
4. Background
• Linear Algebra
• Principal Component Analysis (PCA)
• Independent Component Analysis (ICA)
• Linear Discriminant Analysis (LDA)
• Examples
5. Variance
• Variance – measure of the deviation from the mean for points in one
dimension, e.g., heights
• A measure of the spread of the data in a data set about its mean.
• Variance is often described as the original statistical measure of the spread of data.
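The heights example from the bullet above can be computed directly; the height values here are hypothetical. This uses the population variance (divide by n), matching NumPy's default.

```python
import numpy as np

heights = np.array([160.0, 165.0, 170.0, 175.0, 180.0])  # hypothetical heights in cm
mean = heights.mean()

# Variance: the average squared deviation from the mean.
variance = ((heights - mean) ** 2).mean()
print(variance)  # 50.0, same as np.var(heights)
```

For the sample variance (divide by n - 1), NumPy's `np.var(heights, ddof=1)` does the same computation with the adjusted denominator.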
6. Covariance
• Covariance – a measure of how much each of the dimensions varies from
the mean with respect to each other.
• Covariance is measured between 2 dimensions to see if there is a
relationship between the 2 dimensions, e.g., number of hours studied
and grade obtained.
• The covariance between one dimension and itself is the variance.
8. Covariance
• Covariance calculations are used to find relationships between
dimensions in high-dimensional data sets (usually greater than 3) where
visualization is difficult.
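The hours-studied vs. grade example from the slide above can be made concrete; the numbers below are made up for illustration. Note how cov(x, x) reduces to the variance of x, as the slide states.

```python
import numpy as np

# Hypothetical data: hours studied vs. grade obtained (made-up values).
hours  = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
grades = np.array([52.0, 60.0, 65.0, 72.0, 81.0])

# Covariance: the mean product of the joint deviations from each mean.
cov_xy = ((hours - hours.mean()) * (grades - grades.mean())).mean()
print(cov_xy)  # 14.0 -- positive, so the two dimensions increase together

# Covariance of a dimension with itself is just its variance.
var_x = ((hours - hours.mean()) ** 2).mean()
print(var_x)  # 2.0
```

A positive covariance says the dimensions increase together, a negative one says one increases as the other decreases, and a value near zero says they are uncorrelated.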
9. Covariance matrix
• Suppose we have n attributes, A1, ..., An.
• Covariance matrix: C is the n×n matrix with entries c(i,j) = cov(Ai, Aj).
• Example for three attributes (x, y, z):

C = | cov(x,x) cov(x,y) cov(x,z) |
    | cov(y,x) cov(y,y) cov(y,z) |
    | cov(z,x) cov(z,y) cov(z,z) |
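Building C entry by entry as defined above is equivalent to NumPy's `np.cov`. The three-attribute data set below is made up for illustration; the sketch also checks the symmetry implied by cov(Ai, Aj) = cov(Aj, Ai).

```python
import numpy as np

# Hypothetical data set with three attributes (columns: x, y, z).
data = np.array([[1.0, 2.0, 0.5],
                 [2.0, 3.0, 0.2],
                 [3.0, 5.0, 0.9],
                 [4.0, 4.0, 0.1]])

# c(i, j) = cov(Ai, Aj): center each column, then average the products.
centered = data - data.mean(axis=0)
n = data.shape[0]
C = centered.T @ centered / (n - 1)   # sample covariance (divide by n - 1)

print(np.allclose(C, np.cov(data, rowvar=False)))  # True: matches np.cov
print(np.allclose(C, C.T))                         # True: C is symmetric
```

The diagonal of C holds the variances of x, y, and z; the off-diagonal entries hold the pairwise covariances.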
16. Eigenvalue Problem
• Going back to our example, A · v = λ · v:

  | 2 3 | | 3 |   | 12 |       | 3 |
  | 2 1 | | 2 | = |  8 |  = 4 ·| 2 |

• Therefore, (3, 2) is an eigenvector of the square matrix A, and 4 is an
eigenvalue of A.
• The question is:
Given matrix A, how can we calculate the eigenvectors and eigenvalues
of A?
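In practice this question is answered numerically; a sketch with NumPy's `np.linalg.eig`, using the matrix A = [[2, 3], [2, 1]] from the slide:

```python
import numpy as np

# The matrix from the slide's example.
A = np.array([[2.0, 3.0],
              [2.0, 1.0]])

# eig returns the eigenvalues and a matrix whose columns are eigenvectors.
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)  # one eigenvalue is 4, as in the slide (the other is -1)

# Verify A @ v == lambda * v for each eigenpair.
for lam, v in zip(eigvals, eigvecs.T):
    assert np.allclose(A @ v, lam * v)
```

NumPy returns unit-length eigenvectors, so the eigenvector for λ = 4 is a scaled version of (3, 2); any nonzero scalar multiple of an eigenvector is also an eigenvector.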