3. Principal Component Analysis
Rotates a multivariate dataset into a new configuration that is easier to interpret
Reduces dimensionality
Purposes
- simplify data
- look at relationships between variables
- look at patterns of objects (samples)
5. Principal Components Analysis
Y = A'X (1)
where
Y is the matrix of new variables (principal components),
A is the matrix whose columns are the orthonormal eigenvectors of matrix C, and
X is the data matrix
Transformation (1) is possible only after solving the characteristic equation (2):
|C − λI| = 0 (2)
where
C is the variance-covariance matrix of order (k×k),
I is the identity matrix of order (k×k), and
λ is a characteristic root of equation (2), called an eigenvalue
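In practice, equation (2) is not expanded by hand; a numerical eigensolver finds the eigenvalues and orthonormal eigenvectors directly. A minimal NumPy sketch (the toy data matrix is invented for illustration, with rows as samples rather than the column convention of equation (1)):

```python
import numpy as np

# Toy data matrix: rows = samples, columns = k variables (invented values)
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 4.0, 1.0],
              [4.0, 3.0, 2.0]])

C = np.cov(X, rowvar=False)          # variance-covariance matrix C (k x k)

# Solve |C - lambda*I| = 0: eigh handles symmetric matrices and returns
# the eigenvalues with the orthonormal eigenvectors as columns of A
eigenvalues, A = np.linalg.eigh(C)

# Order by decreasing eigenvalue so the first component carries most variance
order = np.argsort(eigenvalues)[::-1]
eigenvalues, A = eigenvalues[order], A[:, order]

# Transformation (1), Y = A'X, rewritten for row-wise samples: Y = Xc A
Xc = X - X.mean(axis=0)              # center the variables
Y = Xc @ A                           # new variables (principal components)
```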
7. Principal Components Analysis
From k original variables: x1, x2, ..., xk:
Produce k new variables: y1, y2, ..., yk:
y1 = a11x1 + a12x2 + ... + a1kxk
y2 = a21x1 + a22x2 + ... + a2kxk
...
yk = ak1x1 + ak2x2 + ... + akkxk
such that:
the y's are uncorrelated (orthogonal)
y1 explains as much of the original variance in the data as possible
y2 explains as much of the remaining variance as possible, etc.
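Continuing the NumPy sketch above, both defining properties can be checked directly on the covariance matrix of the new variables:

```python
# Covariance matrix of the new variables Y from the sketch above
S = np.cov(Y, rowvar=False)

# The y's are uncorrelated: all off-diagonal covariances are (numerically) zero
assert np.allclose(S - np.diag(np.diag(S)), 0.0, atol=1e-10)

# Each y's variance equals its eigenvalue, sorted in decreasing order,
# so y1 explains the most variance, y2 the most of what remains, etc.
assert np.allclose(np.diag(S), eigenvalues)
```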
14. Principal Component Analysis: Terminology
jth principal component is defined by the jth eigenvector of the correlation/covariance matrix
coefficients, ajk, are elements of the eigenvectors and relate the original variables (standardized if using the correlation matrix) to the components
scores are the values of the units (samples) on the components, computed from the coefficients
amount of variance accounted for by a component is given by its eigenvalue, λj
proportion of variance accounted for by a component is given by λj / Σλj
loading of the kth original variable on the jth component is ajk√λj, the correlation between the variable and the component
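A short continuation of the same sketch, mapping this terminology to code (note the loading equals the variable-component correlation only when the correlation matrix is used):

```python
# Proportion of variance accounted for: lambda_j / sum(lambda)
proportion = eigenvalues / eigenvalues.sum()

# Loadings: a_jk * sqrt(lambda_j); column j of A scaled by sqrt(lambda_j)
loadings = A * np.sqrt(eigenvalues)

# Scores: values of the units (rows of X) on the components
scores = Xc @ A
```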
15. How Many Components to Use?
If λj < 1, the component explains less variance than a single original variable (Kaiser criterion; applies when using the correlation matrix)
Use 2 components (or 3) for visual ease
Scree diagram: eigenvalue (y axis, 0 to 2.5) plotted against component number (x axis, 1 to 5)
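A matplotlib sketch of such a scree diagram, reusing the eigenvalues from the NumPy sketch above; the dashed line marks the λ = 1 cut-off, which is meaningful for a correlation matrix:

```python
import matplotlib.pyplot as plt

components = np.arange(1, len(eigenvalues) + 1)
plt.plot(components, eigenvalues, "o-")
plt.axhline(1.0, linestyle="--")   # lambda = 1: a component below this line
                                   # explains less than one original variable
plt.xlabel("Component number")
plt.ylabel("Eigenvalue")
plt.show()
```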
16. Principal Component Analysis on:
Covariance Matrix:
Variables must be in same units
Emphasizes variables with most variance
Mean eigenvalue ≠ 1.0
Useful in morphometrics, a few other cases
Correlation Matrix:
Variables are standardized (mean 0.0, SD 1.0)
Variables can be in different units
All variables have same impact on analysis
Mean eigenvalue = 1.0
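The two choices differ only in which matrix is handed to the eigensolver, and the mean-eigenvalue facts can be verified directly (continuing the NumPy sketch):

```python
C_cov  = np.cov(X, rowvar=False)        # covariance: variables keep their units
C_corr = np.corrcoef(X, rowvar=False)   # correlation: variables standardized

# The trace of a correlation matrix is k (every diagonal entry is 1.0),
# so its mean eigenvalue is exactly 1.0; the covariance matrix has no such bound
print(np.linalg.eigvalsh(C_corr).mean())   # -> 1.0
print(np.linalg.eigvalsh(C_cov).mean())    # generally != 1.0
```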
17. PCA: Potential Problems
Lack of independence: no problem
Lack of normality: normality desirable but not essential
Lack of precision: precision desirable but not essential
Many zeroes in the data matrix: a problem (use Correspondence Analysis)
18. Procedure for Principal Component Analysis
1. Decide whether to use the correlation or the covariance matrix
2. Find the eigenvectors (components) and eigenvalues (variance accounted for)
3. Decide how many components to use by examining the eigenvalues (perhaps using a scree diagram)
4. Examine the loadings (perhaps with a vector loading plot)
5. Plot the scores
6. Try rotation, then return to step 4
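A minimal end-to-end sketch of steps 1-5 with scikit-learn (assumed available), reusing the toy matrix X from above; step 6 (rotation) is not part of sklearn's PCA and is omitted:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Step 1: choosing the correlation matrix == PCA on standardized variables
Xs = StandardScaler().fit_transform(X)

# Step 2: eigenvectors (components_) and eigenvalues (explained_variance_)
pca = PCA().fit(Xs)

# Step 3: keep components with eigenvalue > 1 (a scree diagram also works here)
n = int((pca.explained_variance_ > 1.0).sum())

# Step 4: loadings = eigenvector elements times sqrt(eigenvalue)
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)

# Step 5: scores of the samples on the retained components
scores = pca.transform(Xs)[:, :n]
```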
19. Chemical elements and their properties
Columns: Symbol; Group (1 = alkali metals, 2 = alkaline earth metals, 3 = halogens, 4 = noble gases, 5 = transition metals, 6 = heavy metals); Tt = melting point (K); Tf = boiling point (K); d = density (kg/m³); NO = oxidation number; E = electronegativity
Symbol Group Tt Tf d NO E
Li 1 453.69 1615 534 1 0.98
Na 1 371 1156 970 1 0.93
K 1 336.5 1032 860 1 0.82
Rb 1 312.5 961 1530 1 0.82
Cs 1 301.6 944 1870 1 0.79
Be 2 1550 3243 1800 2 1.57
Mg 2 924 1380 1741 2 1.31
Ca 2 1120 1760 1540 2 1
Sr 2 1042 1657 2600 2 0.95
F 3 53.5 85 1.7 -1 3.98
Cl 3 172.1 238.5 3.2 -1 3.16
Br 3 265.9 331.9 3100 -1 2.96
I 3 386.6 457.4 4940 -1 2.66
He 4 0.9 4.2 0.2 0 0
Ne 4 24.5 27.2 0.8 0 0
Ar 4 83.7 87.4 1.7 0 0
Kr 4 116.5 120.8 3.5 0 0
Xe 4 161.2 166 5.5 0 0
Zn 5 692.6 1180 7140 2 1.6
Co 5 1765 3170 8900 3 1.8
Cu 5 1356 2868 8930 2 1.9
Fe 5 1808 3300 7870 2 1.8
Mn 5 1517 2370 7440 2 1.5
Ni 5 1726 3005 8900 2 1.8
Bi 6 544.4 1837 9780 3 2.02
Pb 6 600.61 2022 11340 2 1.8
Tl 6 577 1746 11850 3 1.62
20. Correlation matrix
Correlations (Elemente.sta); marked correlations are significant at p < .05; N = 27 (casewise deletion of missing data)
Variable  Mean    Std.Dev.   Tt      Tf      d       NO      E
Tt        676     593.6      1.000   0.938   0.573   0.705   0.188
Tf        1361.6  1095.1             1.000   0.671   0.811   0.182
d         3838.9  4068.7                     1.000   0.684   0.339
NO        1.1     1.3                                1.000  -0.107
E         1.4     1.0                                        1.000
21. Eigenvalues of correlation matrix
Eigenvalues of correlation matrix and related statistics (Elemente.sta); active variables only
    Eigenvalue  % Total variance  Cumulative eigenvalue  Cumulative %
1   3.241       64.82             3.241                  64.8
2   1.095       21.91             4.336                  86.7
3   0.476       9.52              4.813                  96.3
4   0.145       2.90              4.958                  99.2
5   0.042       0.85              5.000                  100.0
22. Eigenvectors of correlation matrix
Eigenvectors of correlation matrix (Elemente.sta); active variables only
Variable Factor 1 Factor 2 Factor 3 Factor 4 Factor 5
Tt 0.504 -0.037 0.552 -0.335 0.572
Tf 0.534 -0.058 0.313 -0.013 -0.783
d 0.457 0.203 -0.716 -0.487 0.018
NO 0.485 -0.338 -0.272 0.722 0.235
E 0.132 0.916 0.101 0.360 0.056
23. PC1-PC2 loading scatterplot
[Figure: projection of the variables (Tt, Tf, d, NO, E) on the factor plane (1 × 2); x axis: Factor 1, 64.82%; y axis: Factor 2, 21.91%; both axes run from -1.0 to 1.0; Tt and Tf plot nearly on top of each other]
24. PC1-PC2 score scatterplot
[Figure: projection of the cases (the 27 elements, Li through Tl) on the factor plane (1 × 2); x axis: Factor 1, 64.82%, from -4 to 5; y axis: Factor 2, 21.91%, from -2.5 to 4.0; cases with sum of cosine squares >= 0.00]
26. What is Cluster Analysis?
Cluster: a collection of data objects
Similar to one another within the same cluster
Dissimilar to the objects in other clusters
Cluster analysis
Grouping a set of data objects into clusters
Clustering is unsupervised classification: no predefined
classes
Typical applications
As a stand-alone tool to get insight into data distribution
As a preprocessing step for other algorithms
27. K-means clustering
1. Select the number of clusters, K
2. Randomly select K distinct data points as the initial cluster centers
3. Measure the distance from each point to each center and assign the point to the nearest one
4. Recompute each center as the mean of the points assigned to it
5. Repeat steps 3-4 until the assignments no longer change
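One way these steps can be written down; a minimal NumPy sketch with a simple convergence test (the empty-cluster edge case is ignored for brevity):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: randomly pick k distinct data points as initial cluster centers
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 3: distance from every point to every center; assign to nearest
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: recompute each center as the mean of its assigned points
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 5: stop once the centers (and hence the assignments) stabilize
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```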