2. Agenda
Theory:
◦ 1. Scenario
◦ 2. What is PCA?
◦ 3. How to minimize Squared-Error?
◦ 4. Dimensionality Reduction
Toolkit:
◦ A list of PCA toolkits
◦ Demo
5. What is PCA? (1)
Principal component analysis (PCA) is a mathematical procedure that transforms a number of possibly correlated variables into a smaller number of uncorrelated variables called “principal components”.
6. What is PCA? (2)
What can PCA do?
◦ Dimensionality Reduction
For example:
◦ Assume N points in a D-dim space
◦ e.g. {x1, x2, x3, x4}; xi = (v1, v2)
◦ A set of M basis vectors for projection
◦ e.g. {u1}
They are orthonormal bases (each has length 1, and the inner product of any two is 0)
M << D (represent each point with M coordinates instead of D)
◦ e.g. xi = (p1)
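The projection example above can be sketched in NumPy; the four points and the basis direction u1 are made-up illustrative values, not from the slides:

```python
import numpy as np

# Four 2-D points, as in the example {x1, x2, x3, x4} with xi = (v1, v2)
points = np.array([[2.0, 1.0],
                   [3.0, 1.5],
                   [4.0, 2.0],
                   [6.0, 3.0]])

# One orthonormal basis vector u1 (length 1), so M = 1 << D = 2
u1 = np.array([2.0, 1.0])
u1 = u1 / np.linalg.norm(u1)

# Each xi is now represented by a single coordinate p1 = u1 . xi
p1 = points @ u1
print(p1.shape)  # (4,) -- one number per point instead of two
```

Because these particular points happen to lie along u1, the one-coordinate representation here loses nothing; in general the projection is only an approximation.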
8. How to minimize Squared-Error?
Consider a D-dimension space
◦ Given N points: {x1, x2, …, xn}
◦ xi is a D-dim vector
How to
◦ 1. Find a point that minimizes the squared error
◦ 2. Find a line that minimizes the squared error
9. How to? – Point
◦ Goal: Find x0 s.t. J0(x0) = Σk ||x0 − xk||² is minimized.
◦ Let m = (1/n) Σk xk. Then J0(x0) = Σk ||(x0 − m) − (xk − m)||² = n||x0 − m||² + Σk ||xk − m||², because the cross term Σk (xk − m) is 0.
10. How to? – Point → Line
∴ x0 = m
◦ 1. Find a point that minimizes the squared error (done: the mean)
◦ 2. Find a line that minimizes the squared error
L : xk’ − x0 = ak e
xk’ = x0 + ak e
    = m + ak e
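That x0 = m can be checked numerically; this is a quick sketch with random data, not data from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))   # N = 100 points in a D = 3 dim space
m = x.mean(axis=0)              # candidate x0: the sample mean

def squared_error(x0):
    """Sum of squared distances from x0 to every point."""
    return np.sum((x - x0) ** 2)

# Moving away from the mean in any direction only increases the error
for _ in range(5):
    d = rng.normal(size=3)
    assert squared_error(m) <= squared_error(m + 0.1 * d)
```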
11. How to? – Line
L : xk’ = m + ak e
Goal:
Find a1 … an and e that minimize J1(a1, …, an, e) = Σk ||(m + ak e) − xk||²
12. How to? – Line
Differentiating J1 with respect to each ak gives [2ak − 2eᵀ(xk − m)]; setting it to zero yields ak = eᵀ(xk − m).
What does it mean? ak is the length of the projection of xk − m onto e, so xk’ = m + ak e is the point on L closest to xk.
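The optimal coefficients ak = eᵀ(xk − m) can be verified numerically for any fixed unit direction e; the data and the direction below are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(50, 2))               # N = 50 points in 2-D
m = x.mean(axis=0)
e = np.array([1.0, 1.0]) / np.sqrt(2.0)    # any unit direction

a = (x - m) @ e                            # a_k = e^T (x_k - m)

def J1(coeffs):
    """Squared error of the approximation x_k' = m + a_k e."""
    return np.sum((m + np.outer(coeffs, e) - x) ** 2)

# Nudging the coefficients away from a_k increases the squared error
assert J1(a) <= J1(a + 0.05 * rng.normal(size=a.shape))
```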
15. How to? – Line
Substituting ak = eᵀ(xk − m) back in, the error becomes a function of e alone: J1’(e) = −eᵀSe + Σk ||xk − m||², so minimizing J1 means maximizing eᵀSe.
To optimize f(x,y) on its own we set the gradient to zero; but if x,y are constrained by g(x,y) = 0, we use a Lagrange multiplier.
Because |e| = 1, the constraint is eᵀe − 1 = 0, and u = eᵀSe − λ(eᵀe − 1).
16. How to? – Line
◦ What is S?
The covariance (scatter) matrix
◦ Assuming D-dim data: S = Σk (xk − m)(xk − m)ᵀ, a D×D matrix.
17. How to? – Line
From the data, we know S. Then, what is e? An eigenvector of S.
Setting the derivative of u to zero gives 2Se − 2λe = 0, i.e. Se = λe: the eigen equation AX = λX, where A maps X to a scalar multiple of itself (the same direction).
18. How to? – Conclusion
Summary:
◦ Find a line: xk’ = m + ak e
ak = eᵀ(xk − m)
Se = λe; e is an eigenvector of the covariance matrix.
◦ A D-dim space yields D eigenvectors.
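The summary above can be reproduced with NumPy's symmetric eigensolver; the random data is only an illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.3])  # N=200, D=3
m = x.mean(axis=0)

# Covariance (scatter) matrix S = sum_k (x_k - m)(x_k - m)^T
S = (x - m).T @ (x - m)

# S is symmetric, so eigh returns real eigenvalues (ascending order)
# and orthonormal eigenvectors as the columns of E
lam, E = np.linalg.eigh(S)

# A D-dim space yields D eigenvectors, each satisfying S e = lambda e
for i in range(3):
    assert np.allclose(S @ E[:, i], lam[i] * E[:, i])
```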
21. Dimensionality Reduction
Consider a 2-dim space …
In the original coordinates:
X1 = (a, b)
X2 = (c, d)
In the eigenvector coordinates:
X1 = (a’, b’)
X2 = (c’, d’)
We are going to keep only the first coordinate:
X1 = (a’)
X2 = (c’)
22. Dimensionality Reduction
We want to prove:
◦ The axes of the projected data are uncorrelated.
Consider N D-dim vectors
◦ {x1, x2, …, xn}
◦ Let X = [x1−m x2−m … xn−m], where m is the mean
◦ Let E = [e1 e2 … eD]
Solving Se = λe, the eigen decomposition gives the eigenvectors {e1, …, eD} and the eigenvalues {λ1, …, λD}.
24. Dimensionality Reduction
We want to know the new covariance matrix SY of the projected vectors.
Let Y = [y1 y2 … yn] and E = [e1 e2 … eD], so that
Y = EᵀX
SY = YYᵀ = EᵀXXᵀE = EᵀSE
25. Dimensionality Reduction
SY = EᵀSE = diag(λ1, …, λD), a diagonal matrix.
1. The covariance between any two different axes is 0: the new axes are uncorrelated.
2. The better an axis represents the data, the larger the variance along it, and hence the larger its λ.
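That SY is diagonal with the eigenvalues on its diagonal can be checked directly; here is a sketch with arbitrary correlated 2-D data:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.0],
                                          [1.0, 0.5]])  # correlated data
m = x.mean(axis=0)
S = (x - m).T @ (x - m)

lam, E = np.linalg.eigh(S)   # columns of E are the eigenvectors
Y = (x - m) @ E              # y_k = E^T (x_k - m), stacked as rows
SY = Y.T @ Y                 # covariance matrix of the projected vectors

# SY is diagonal: the new axes are uncorrelated, and the diagonal
# entries are exactly the eigenvalues of S
assert np.allclose(SY, np.diag(lam))
```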
26. Dimensionality Reduction
Conclusion:
To reduce the dimension from D to M (M << D):
1. Find the covariance matrix S
2. Compute its eigenvalues and eigenvectors
3. Select the top M eigenvectors (largest λ)
4. Project the data onto them
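The four steps can be put together in a short sketch; the function name and the random data are illustrative:

```python
import numpy as np

def pca_reduce(x, M):
    """Project N x D data onto its top-M principal components (M << D)."""
    m = x.mean(axis=0)
    S = (x - m).T @ (x - m)                  # 1. find the covariance matrix S
    lam, E = np.linalg.eigh(S)               # 2. eigenvalues / eigenvectors
    top = E[:, np.argsort(lam)[::-1][:M]]    # 3. select the top M eigenvectors
    return (x - m) @ top                     # 4. project the data onto them

rng = np.random.default_rng(4)
x = rng.normal(size=(100, 10))   # N = 100 points, D = 10
y = pca_reduce(x, 2)             # reduced to M = 2 dimensions
print(y.shape)  # (100, 2)
```

In practice, library routines such as sklearn.decomposition.PCA perform these steps for you (typically via an SVD rather than an explicit eigen decomposition).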