MULTIPLE DIMENSIONAL ANALYSIS
MDA
Principal Component Analysis
Clustering/Classification Methods
Data Matrix
$$
X = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1k} \\
x_{21} & x_{22} & \cdots & x_{2k} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nk}
\end{pmatrix}
$$
Principal Component Analysis
• Rotates a multivariate dataset into a new configuration that is easier to interpret
• Reduces dimensionality
• Purposes
  - simplify data
  - look at relationships between variables
  - look at patterns of objects (samples)
Principal Components Analysis
[Figure: scatterplot showing the 1st principal component, y1, and the 2nd principal component, y2]
Principal Components Analysis
Y = A'X (1)
where
Y is the matrix of new variables (main components)
A is the matrix of the values of the orthonormal eigenvectors of matrix
C and
X is the data matrix
Transformation (1) is possible only after solving equation (2)
$|C - \lambda I| = 0$ (2)
where
C is the variance-covariance matrix of order (kxk)
I is the unit matrix of order kxk, and
λ is the characteristic root of equation (2), called eigenvalue
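A minimal sketch of transformation (1), assuming NumPy and a small synthetic data matrix (the function name `pca_eig` and the test data are illustrative and not part of the slides). Because the data matrix here has units in rows (n x k), the row-wise form of Y = A'X is `Xc @ A`.

```python
import numpy as np

def pca_eig(X):
    """PCA of an (n x k) data matrix via its variance-covariance matrix C.

    Returns the eigenvalues (largest first), the matrix A whose columns are
    orthonormal eigenvectors of C, and the matrix Y of new variables.
    """
    Xc = X - X.mean(axis=0)            # center each variable
    C = np.cov(Xc, rowvar=False)       # (k x k) variance-covariance matrix
    eigvals, A = np.linalg.eigh(C)     # roots of |C - lambda*I| = 0 (C is symmetric)
    order = np.argsort(eigvals)[::-1]  # sort so the largest eigenvalue comes first
    eigvals, A = eigvals[order], A[:, order]
    Y = Xc @ A                         # row-wise form of Y = A'X
    return eigvals, A, Y

# Hypothetical data: 200 units, 3 correlated variables
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.array([[2.0, 0.5, 0.0],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.2]])
eigvals, A, Y = pca_eig(X)
print("eigenvalues:", eigvals.round(3))
```

The covariance matrix of `Y` is diagonal with the eigenvalues on the diagonal, which is the sense in which the new variables are uncorrelated.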
Principal Components Analysis
From k original variables: x1, x2, ..., xk:
Produce k new variables: y1, y2, ..., yk:
y1 = a11x1 + a12x2 + ... + a1kxk
y2 = a21x1 + a22x2 + ... + a2kxk
...
yk = ak1x1 + ak2x2 + ... + akkxk
such that:
yk's are uncorrelated (orthogonal)
y1 explains as much as possible of original variance of data
y2 explains as much as possible of remaining variance etc.
Principal Components Analysis
Uses:
• Correlation matrix, or
• Covariance matrix when variables are in the same units (morphometrics, etc.)
Principal Components Analysis
So, principal components are given by:
y1 = a11x1 + a12x2 + ... + a1kxk
y2 = a21x1 + a22x2 + ... + a2kxk
...
yk = ak1x1 + ak2x2 + ... + akkxk
xj’s are standardized if correlation matrix is used (mean
0.0, SD 1.0)
Principal Component Analysis
Score of ith unit on jth principal component
yi,j = aj1xi1 + aj2xi2 + ... + ajkxik
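The score formula can be sketched as a dot product between the coefficients of component j and the standardized values of unit i. This is an illustration only: the data are synthetic, the helper `score` is hypothetical, and indices are 0-based in the code.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: 50 units, 3 variables on very different scales
X = rng.normal(size=(50, 3)) * np.array([1.0, 10.0, 100.0]) + np.array([5.0, 50.0, 500.0])

# Standardize (mean 0, SD 1), as required when the correlation matrix is used
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

R = np.corrcoef(X, rowvar=False)
eigvals, vecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, vecs = eigvals[order], vecs[:, order]
A = vecs.T  # row j holds the coefficients a_j1 ... a_jk of component j

def score(i, j):
    """Score of unit i on component j: a_j1*z_i1 + a_j2*z_i2 + ... + a_jk*z_ik."""
    return float(np.dot(A[j], Z[i]))

scores = Z @ A.T  # all scores at once; scores[i, j] equals score(i, j)
print("score of unit 0 on component 0:", round(score(0, 0), 3))
```

With standardized variables, the variance of each column of `scores` equals the corresponding eigenvalue.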
PCA Scores
[Figure: PCA scores; a unit with original coordinates (xi1, xi2) and its scores yi,1 and yi,2 on the principal components]
Principal Component Analysis
Amount of variance accounted for by:
1st principal component, λ1, 1st eigenvalue
2nd principal component, λ2, 2nd eigenvalue
...
λ1 > λ2 > λ3 > λ4 > ...
Average λj = 1 (correlation matrix)
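The claim that the eigenvalues average 1 for a correlation matrix can be checked numerically. The sketch below assumes NumPy and synthetic data with deliberately different spreads. It also shows that the same data analysed through the covariance matrix give eigenvalues whose sum is the total variance instead.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical data: 100 units, 4 variables with very different spreads
X = rng.normal(size=(100, 4)) * np.array([1.0, 5.0, 20.0, 100.0])

def sorted_eigenvalues(M):
    """Eigenvalues of a symmetric matrix, largest first."""
    return np.sort(np.linalg.eigvalsh(M))[::-1]

lam_corr = sorted_eigenvalues(np.corrcoef(X, rowvar=False))
lam_cov = sorted_eigenvalues(np.cov(X, rowvar=False))

print("correlation matrix: mean eigenvalue =", lam_corr.mean().round(6))
print("covariance matrix:  mean eigenvalue =", lam_cov.mean().round(2))
```

The eigenvalues of a k x k correlation matrix sum to k, its trace, so their mean is 1. For the covariance matrix the largest eigenvalue is dominated by the variable with the most variance.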
Principal Component Analysis:
Eigenvalues
[Figure: scatterplot annotated with the eigenvalues λ1 and λ2]
Principal Component Analysis:
Terminology
• The jth principal component is the jth eigenvector of the correlation/covariance matrix
• Coefficients, ajk, are elements of the eigenvectors and relate the original variables (standardized if using the correlation matrix) to the components
• Scores are the values of units on components (produced using the coefficients)
• The amount of variance accounted for by a component is given by its eigenvalue, λj
• The proportion of variance accounted for by a component is given by λj / Σ λj
• The loading of the kth original variable on the jth component is given by ajk√λj, which for a correlation-matrix PCA is the correlation between variable and component
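As a worked illustration of the last two definitions, the sketch below uses the eigenvalues and first-component coefficients reported in the chemical elements example later in this deck (Elemente.sta). The variable names in the code are illustrative.

```python
import math

# Eigenvalues of the correlation matrix, copied from the worked example
eigenvalues = [3.241, 1.095, 0.476, 0.145, 0.042]
# Coefficients of the first component (Factor 1) for each variable, same example
a1 = {"Tt": 0.504, "Tf": 0.534, "d": 0.457, "NO": 0.485, "E": 0.132}

# Proportion of variance: lambda_j / sum of all lambdas
total = sum(eigenvalues)
proportions = [lam / total for lam in eigenvalues]

# Loading: coefficient times the square root of the component's eigenvalue
loadings_pc1 = {var: a * math.sqrt(eigenvalues[0]) for var, a in a1.items()}

for var, load in loadings_pc1.items():
    print(f"{var}: loading on component 1 = {load:.3f}")
print("proportions of variance:", [round(p, 3) for p in proportions])
```

Because this example uses the correlation matrix, each loading can be read as the correlation between the variable and the first component, so its absolute value cannot exceed 1.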
How Many Components to Use?
• If λj < 1, the component explains less variance than a single original variable (correlation matrix)
• Use 2 components (or 3) for visual ease
• Scree diagram:
[Figure: scree diagram of eigenvalue (0 to 2.5) against component number (1 to 5)]
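The λj < 1 rule (often called the Kaiser criterion) and the cumulative proportion of variance can be sketched as follows. The eigenvalues are those of the chemical elements example later in this deck. The function names are hypothetical.

```python
# Eigenvalues of the correlation matrix from the worked example (Elemente.sta)
eigenvalues = [3.241, 1.095, 0.476, 0.145, 0.042]

def kaiser_count(eigs):
    """Number of components whose eigenvalue exceeds 1 (correlation matrix only)."""
    return sum(1 for lam in eigs if lam > 1.0)

def cumulative_proportion(eigs):
    """Running share of total variance explained by the first 1, 2, ... components."""
    total = sum(eigs)
    running, out = 0.0, []
    for lam in eigs:
        running += lam
        out.append(running / total)
    return out

n_keep = kaiser_count(eigenvalues)
cum = cumulative_proportion(eigenvalues)
print("components with eigenvalue > 1:", n_keep)
print("cumulative proportion:", [round(c, 3) for c in cum])
```

A scree diagram plots the same eigenvalues against component number. The rule above is only one heuristic and may disagree with the visual break in the scree diagram.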
Principal Component Analysis on:
• Covariance Matrix:
  - Variables must be in the same units
  - Emphasizes variables with the most variance
  - Mean eigenvalue ≠ 1.0
  - Useful in morphometrics, a few other cases
• Correlation Matrix:
  - Variables are standardized (mean 0.0, SD 1.0)
  - Variables can be in different units
  - All variables have the same impact on the analysis
  - Mean eigenvalue = 1.0
PCA: Potential Problems
• Lack of Independence
  - No problem
• Lack of Normality
  - Normality desirable but not essential
• Lack of Precision
  - Precision desirable but not essential
• Many Zeroes in Data Matrix
  - Problem (use Correspondence Analysis)
Procedure for Principal Component Analysis
1. Decide whether to use the correlation or covariance matrix
2. Find eigenvectors (components) and eigenvalues (variance accounted for)
3. Decide how many components to use by examining eigenvalues (perhaps using a scree diagram)
4. Examine loadings (perhaps with a vector loading plot)
5. Plot scores
6. Try rotation, then go back to step 4
Chemical elements and their properties
(Tt and Tf appear to be melting and boiling temperatures, d density, NO oxidation number, and E electronegativity.)
Symbol  Group  Tt       Tf      d      NO  E
Li      1      453.69   1615    534     1  0.98
Na      1      371      1156    970     1  0.93
K       1      336.5    1032    860     1  0.82
Rb      1      312.5    961     1530    1  0.82
Cs      1      301.6    944     1870    1  0.79
Be      2      1550     3243    1800    2  1.57
Mg      2      924      1380    1741    2  1.31
Ca      2      1120     1760    1540    2  1
Sr      2      1042     1657    2600    2  0.95
F       3      53.5     85      1.7    -1  3.98
Cl      3      172.1    238.5   3.2    -1  3.16
Br      3      265.9    331.9   3100   -1  2.96
I       3      386.6    457.4   4940   -1  2.66
He      4      0.9      4.2     0.2     0  0
Ne      4      24.5     27.2    0.8     0  0
Ar      4      83.7     87.4    1.7     0  0
Kr      4      116.5    120.8   3.5     0  0
Xe      4      161.2    166     5.5     0  0
Zn      5      692.6    1180    7140    2  1.6
Co      5      1765     3170    8900    3  1.8
Cu      5      1356     2868    8930    2  1.9
Fe      5      1808     3300    7870    2  1.8
Mn      5      1517     2370    7440    2  1.5
Ni      5      1726     3005    8900    2  1.8
Bi      6      544.4    1837    9780    3  2.02
Pb      6      600.61   2022    11340   2  1.8
Tl      6      577      1746    11850   3  1.62
Correlation matrix
Correlations (Elemente.sta). Marked correlations are significant at p < .05000; N=27 (casewise deletion of missing data)
Variable  Mean    Std. Dev.  Tt     Tf     d      NO     E
Tt        676     593.6      1.000  0.938  0.573  0.705   0.188
Tf        1361.6  1095.1            1.000  0.671  0.811   0.182
d         3838.9  4068.7                   1.000  0.684   0.339
NO        1.1     1.3                             1.000  -0.107
E         1.4     1.0                                     1.000
Eigenvalues of correlation matrix
Eigenvalues of correlation matrix, and related statistics (Elemente.sta). Active variables only
Component  Eigenvalue  % Total variance  Cumulative eigenvalue  Cumulative %
1          3.241       64.82             3.241                  64.8
2          1.095       21.91             4.336                  86.7
3          0.476       9.52              4.813                  96.3
4          0.145       2.90              4.958                  99.2
5          0.042       0.85              5.000                  100.0
Eigenvectors of correlation matrix
Eigenvectors of correlation matrix (Elemente.sta). Active variables only
Variable  Factor 1  Factor 2  Factor 3  Factor 4  Factor 5
Tt         0.504    -0.037     0.552    -0.335     0.572
Tf         0.534    -0.058     0.313    -0.013    -0.783
d          0.457     0.203    -0.716    -0.487     0.018
NO         0.485    -0.338    -0.272     0.722     0.235
E          0.132     0.916     0.101     0.360     0.056
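The eigenvalues and eigenvectors above can be reproduced approximately from the printed correlation matrix. The sketch below assumes NumPy and types in the correlations as rounded on the "Correlation matrix" slide, so the results agree with the tables only to about two decimal places. Eigenvector signs are arbitrary, so only absolute values are compared.

```python
import numpy as np

names = ["Tt", "Tf", "d", "NO", "E"]
# Upper triangle of the correlation matrix as printed (rounded to 3 decimals)
upper = np.array([
    [1.000, 0.938, 0.573, 0.705,  0.188],
    [0.000, 1.000, 0.671, 0.811,  0.182],
    [0.000, 0.000, 1.000, 0.684,  0.339],
    [0.000, 0.000, 0.000, 1.000, -0.107],
    [0.000, 0.000, 0.000, 0.000,  1.000],
])
R = upper + upper.T - np.eye(5)  # mirror into a full symmetric matrix

eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print("eigenvalues:", eigvals.round(3))
print("% of total variance:", (100 * eigvals / eigvals.sum()).round(2))
for name, coef in zip(names, np.abs(eigvecs[:, 0])):
    print(f"|coefficient| of {name} on Factor 1: {coef:.3f}")
```

The eigenvalues sum to 5, the number of variables, because the matrix analysed is a correlation matrix.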
PC1-PC2 loading scatterplot
Projection of the variables on the factor-plane ( 1 x 2)
[Figure: loading plot of the variables Tt, Tf, d, NO and E; x-axis Factor 1 (64.82%), y-axis Factor 2 (21.91%), both from -1.0 to 1.0]
PC1-PC2 score scatterplot
Projection of the cases on the factor-plane ( 1 x 2)
Cases with sum of cosine square >= 0.00
[Figure: score plot of the 27 elements; x-axis Factor 1 (64.82%), from -4 to 5; y-axis Factor 2 (21.91%), from -2.5 to 4.0]
MULTIPLE DIMENSIONAL ANALYSIS
MDA
CLUSTERING METHODS
• K-means cluster
• Hierarchical clustering
• Two-way joining clustering
What is Cluster Analysis?
• Cluster: a collection of data objects
  - Similar to one another within the same cluster
  - Dissimilar to the objects in other clusters
• Cluster analysis
  - Grouping a set of data objects into clusters
• Clustering is unsupervised classification: no predefined classes
• Typical applications
  - As a stand-alone tool to get insight into data distribution
  - As a preprocessing step for other algorithms
K-means clustering
1. Select the number of clusters (K)
2. Randomly select K distinct data points (three in this example) as the initial clusters
3. Measure the distance between each point and the cluster centers, and assign each point to the nearest one
4. Calculate the mean of each cluster so formed
5. Repeat steps 3 and 4, using the new means as centers, until the assignments stop changing
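The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: the function name `kmeans`, the Euclidean distance, the stopping rule (stop when assignments no longer change) and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Basic K-means: returns a cluster label per point and the cluster centers."""
    rng = np.random.default_rng(seed)
    # Step 2: pick k distinct data points as the initial centers
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Step 3: Euclidean distance from every point to every center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Step 4: recompute each center as the mean of its assigned points
        for c in range(k):
            members = X[labels == c]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[c] = members.mean(axis=0)
    return labels, centers

# Hypothetical data: three groups of 30 points in two dimensions
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=m, scale=0.3, size=(30, 2))
               for m in ([0, 0], [5, 5], [0, 5])])
labels, centers = kmeans(X, k=3)
print("cluster sizes:", np.bincount(labels, minlength=3))
```

The result depends on the random initial points, so in practice the algorithm is usually run several times with different seeds and the best solution kept.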
Hierarchical clustering
Two-way joining clustering
[Figure: clustering diagram with a scale from 0 to 400; labels: Height, Wine, Weight, Strength, Hair, Sex, Region, Shoes, Age, Income, Beer, IQ; case labels: FS, FN, MS, MN]
