2. Concept
Principal component analysis (PCA) projects the features onto the principal
components.
The motivation is to reduce the dimensionality of the features while losing only
a small amount of information.
3. Procedure:
The first principal component is the normalized linear combination of the
variables that has the highest variance.
The second principal component has the largest variance subject to being
uncorrelated with the first (equivalently, its direction is orthogonal to the
first one).
And so on, as formalized below.
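In symbols (a standard formulation; here X is the data matrix and S its sample covariance matrix):
$$w_1 = \arg\max_{\|w\|=1} \operatorname{Var}(Xw) = \arg\max_{\|w\|=1} w^\top S w$$
The maximizer is the eigenvector of S with the largest eigenvalue, and each later component solves the same problem restricted to directions orthogonal to all earlier ones.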
4. Why we choose the direction with the most variation
• Reason 1: In signal analysis, the signal is assumed to have larger variance
and the noise smaller variance, so the high-variance directions are the ones
that carry the signal.
• Reason 2: When the data are projected onto the green line (var = 0.6524),
they remain well separated, whereas projecting onto the purple line
(var = 0.1678) mixes them together.
So choosing the principal-component direction with the most variation in the
data is our goal.
5. Example
DATA  p1   p2   p3   p4   p5   p6   p7   p8   p9   p10
x     2.5  0.5  2.2  1.9  3.1  2.3  2.0  1.0  1.5  1.1
y     2.4  0.7  2.9  2.2  3.0  2.7  1.6  1.1  1.6  0.9
We have 10 points (p1~p10) in two dimensions, as listed in the table above.
We want to use PCA to reduce the dimensionality from 2 to 1.
6. First step: zero-centering
Reason: We want to move the center of the data to the origin, which makes the
later calculations cleaner because no bias (mean) term has to be carried along.
See the sketch after the table below.
DATA  p1    p2     p3    p4    p5    p6    p7     p8     p9     p10
x     0.69  -1.31  0.39  0.09  1.29  0.49  0.19   -0.81  -0.31  -0.71
y     0.49  -1.21  0.99  0.29  1.09  0.79  -0.31  -0.81  -0.31  -1.01
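As a concrete check, here is a minimal NumPy sketch of this step (the array names are my own); the printed values should reproduce the table above:

import numpy as np

# The 10 example points (p1~p10), one row per point.
X = np.array([
    [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
    [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9],
])

# Zero-centering: subtract the per-feature mean (x mean = 1.81, y mean = 1.91)
# so that the center of the data moves to the origin.
X_centered = X - X.mean(axis=0)
print(X_centered)  # first row: [0.69, 0.49], matching p1 in the table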
7. Second step: calculate the covariance matrix
Reason: In probability theory and statistics, covariance is a measure of the
joint variability of two random variables.
The sign of the covariance therefore shows the tendency in the linear
relationship between the variables. Variables whose covariance is zero are
called uncorrelated.
In our case:
$$\mathrm{cov} = \begin{pmatrix} 0.616556 & 0.615444 \\ 0.615444 & 0.716556 \end{pmatrix}$$
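A minimal sketch of this step in NumPy (np.cov with rowvar=False treats rows as observations and uses the n − 1 divisor, which is what reproduces the matrix above):

import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
X_centered = X - X.mean(axis=0)

# Sample covariance: (X_c^T X_c) / (n - 1); np.cov computes the same thing.
cov = np.cov(X_centered, rowvar=False)
print(cov)  # [[0.61655556 0.61544444]
            #  [0.61544444 0.71655556]]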
8. Covariance matrix
The covariance matrix defines the shape of the data: diagonal spread is
captured by the covariance (the off-diagonal entries), while axis-aligned
spread is captured by the variance (the diagonal entries). The sketch below
illustrates this.
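To see this with synthetic data (a hedged sketch; the numbers here are made up purely for the demonstration): an axis-aligned cloud has a diagonal covariance matrix, and rotating it moves the spread into the off-diagonal entries.

import numpy as np

rng = np.random.default_rng(0)

# Axis-aligned cloud: large spread along x, small along y.
axis_aligned = rng.normal(size=(10_000, 2)) * np.array([2.0, 0.5])

# Rotate the cloud by 45 degrees: the spread becomes diagonal in the plane,
# and the covariance matrix picks up large off-diagonal entries.
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
rotated = axis_aligned @ R.T

print(np.cov(axis_aligned, rowvar=False))  # ~[[4, 0], [0, 0.25]]: variance only
print(np.cov(rotated, rowvar=False))       # ~[[2.1, 1.9], [1.9, 2.1]]: covariance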
9. Third step: calculate the eigenvalues and eigenvectors of the covariance
matrix
Reason: We want to find the directions of the principal components.
$$\text{eigenvalues} = (0.049,\ 1.284), \qquad \text{eigenvectors} = \begin{pmatrix} -0.735 & 0.678 \\ 0.678 & 0.735 \end{pmatrix}$$
(the columns are the eigenvectors, listed in the same order as the eigenvalues)
Sort the eigenvalues from largest to smallest, and rearrange the eigenvectors
(the columns) in the same order:
$$\text{eigenvalues} = (1.284,\ 0.049), \qquad \text{eigenvectors} = \begin{pmatrix} 0.678 & -0.735 \\ 0.735 & 0.678 \end{pmatrix}$$
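A minimal NumPy sketch of this step (np.linalg.eigh is the eigendecomposition for symmetric matrices and returns eigenvalues in ascending order, so we re-sort them descending as above; eigenvector signs may come out flipped, which does not change the directions):

import numpy as np

cov = np.array([[0.616556, 0.615444],
                [0.615444, 0.716556]])

# Eigendecomposition of the symmetric covariance matrix.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Sort eigenvalues from largest to smallest and reorder the
# eigenvectors (columns) to match.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

print(eigenvalues)   # ~[1.284, 0.049]
print(eigenvectors)  # first column ~[0.678, 0.735] (up to sign)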
The direction in which the data vary the most falls along the green line. This is the direction with the most variation in the data, which is why it is the first principal component (direction).
Equivalently, among all lines through the data center, this line makes the sum of squared perpendicular distances from the points the smallest possible.
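Both claims can be checked numerically (a sketch under the same setup as above; w1 is the first eigenvector, rounded): the variance of the 1-D projection is the top eigenvalue, and the residual sum of squares equals (n − 1) times the smaller eigenvalue.

import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
Xc = X - X.mean(axis=0)

w1 = np.array([0.678, 0.735])      # first principal direction (unit length)
scores = Xc @ w1                   # 1-D coordinates along the first PC

print(scores.var(ddof=1))          # ~1.284: the largest eigenvalue

# Perpendicular residuals to the PC line; their sum of squares is minimal
# and equals (n - 1) * lambda_2 ~ 9 * 0.049 ~ 0.44.
residuals = Xc - np.outer(scores, w1)
print((residuals ** 2).sum())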
The magnitude of the covariance is not easy to interpret because it is not normalized and hence depends on the magnitudes of the variables. The normalized version of the covariance, the correlation coefficient, however, shows by its magnitude the strength of the linear relation.
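For the example covariance matrix above, this works out to
$$r = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y} = \frac{0.615444}{\sqrt{0.616556 \times 0.716556}} \approx 0.926,$$
i.e., x and y are strongly positively linearly related.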