SlideShare a Scribd company logo
1 of 5
Ashoka bairwa
Page 1
CENTRAL UNIVERSITY OF HARYANA
Department of Computer Science & Engineering under SOET
MACHINE LEARNING LAB
Submitted by
Ashoka
Roll No:- 191890
Submitted to
Dr. Sangeeta
Assistant Professor
Central University of Haryana (SOET)
Ashoka bairwa
Page 2
PRACTICAL-1
Aim: Introduction about Principal Component Analysis (PCA) feature extraction.
Theory:
Principal Component Analysis, or PCA for short, is a method for reducing the dimensionality of
data.
It can be thought of as a projection method where data with m-columns (features) is projected into a
subspace with m or fewer columns, whilst retaining the essence of the original data.
The PCA method can be described and implemented using the tools of linear algebra.
Implementation:
PCA is an operation applied to a dataset, represented by an n x m matrix A that results in a projection
of A which we will call B. Let’s walk through the steps of this operation.
The first step is to calculate the mean values of each column.
Or
Next, we need to center the values in each column by subtracting the mean column value.
The next step is to calculate the covariance matrix of the centered matrix C.
Correlation is a normalized measure of the amount and direction (positive or negative) that two columns
change together. Covariance is a generalized and unnormalized version of correlation across multiple
columns. A covariance matrix is a calculation of covariance of a given matrix with covariance scores for
every column with every other column, including itself.
Ashoka bairwa
Page 3
Finally, we calculate the eigen decomposition of the covariance matrix V. This results in a list of eigenvalues
and a list of eigenvectors.
The eigenvectors represent the directions or components for the reduced subspace of B, whereas the
eigenvalues represent the magnitudes for the directions. For more on this topic, see the post:
Gentle Introduction to Eigen decomposition, Eigenvalues, and Eigenvectors for Machine Learning
The eigenvectors can be sorted by the eigenvalues in descending order to provide a ranking of the
components or axes of the new subspace for A.
If all eigenvalues have a similar value, then we know that the existing representation may already be
reasonably compressed or dense and that the projection may offer little. If there are eigenvalues close to
zero, they represent components or axes of B that may be discarded.
A total of m or less components must be selected to comprise the chosen subspace. Ideally, we would select
k eigenvectors, called principal components, that have the k largest eigenvalues.
Other matrix decomposition methods can be used such as Singular-Value Decomposition, or SVD. As such,
generally the values are referred to as singular values and the vectors of the subspace are referred to as
principal components.
Once chosen, data can be projected into the subspace via matrix multiplication.
Where A is the original data that we wish to project, B^T is the transpose of the chosen principal
components and P is the projection of A.
This is called the covariance method for calculating the PCA, although there are alternative ways to to
calculate it.
Ashoka bairwa
Page 4
Manually Calculate Principal Component Analysis
Ashoka bairwa
Page 5
Reusable Principal Component Analysis
 We can calculate a Principal Component Analysis on a dataset using the PCA() class in the
scikit-learn library. The benefit of this approach is that once the projection is calculated, it can
be applied to new data again and again quite easily.
 When creating the class, the number of components can be specified as a parameter.
 The class is first fit on a dataset by calling the fit() function, and then the original dataset or
other data can be projected into a subspace with the chosen number of dimensions by calling the
transform() function.
 Once fit, the eigenvalues and principal components can be accessed on the PCA class via the
explained_variance_ and components_ attributes.

More Related Content

Similar to Machine Learning Lab Report on PCA Feature Extraction

Slides distancecovariance
Slides distancecovarianceSlides distancecovariance
Slides distancecovarianceShrey Nishchal
 
Project management
Project managementProject management
Project managementAvay Minni
 
Machine Learning.pdf
Machine Learning.pdfMachine Learning.pdf
Machine Learning.pdfBeyaNasr1
 
Recognition of Handwritten Mathematical Equations
Recognition of  Handwritten Mathematical EquationsRecognition of  Handwritten Mathematical Equations
Recognition of Handwritten Mathematical EquationsIRJET Journal
 
Classification Of Web Documents
Classification Of Web Documents Classification Of Web Documents
Classification Of Web Documents hussainahmad77100
 
PRML Chapter 12
PRML Chapter 12PRML Chapter 12
PRML Chapter 12Sunwoo Kim
 
PCA and LDA in machine learning
PCA and LDA in machine learningPCA and LDA in machine learning
PCA and LDA in machine learningAkhilesh Joshi
 
Dimensionality Reduction and feature extraction.pptx
Dimensionality Reduction and feature extraction.pptxDimensionality Reduction and feature extraction.pptx
Dimensionality Reduction and feature extraction.pptxSivam Chinna
 
Singular Value Decomposition (SVD).pptx
Singular Value Decomposition (SVD).pptxSingular Value Decomposition (SVD).pptx
Singular Value Decomposition (SVD).pptxrajalakshmi5921
 
EDAB Module 5 Singular Value Decomposition (SVD).pptx
EDAB Module 5 Singular Value Decomposition (SVD).pptxEDAB Module 5 Singular Value Decomposition (SVD).pptx
EDAB Module 5 Singular Value Decomposition (SVD).pptxrajalakshmi5921
 
Linear Algebra – A Powerful Tool for Data Science
Linear Algebra – A Powerful Tool for Data ScienceLinear Algebra – A Powerful Tool for Data Science
Linear Algebra – A Powerful Tool for Data SciencePremier Publishers
 
Introduction to Principle Component Analysis
Introduction to Principle Component AnalysisIntroduction to Principle Component Analysis
Introduction to Principle Component AnalysisSunjeet Jena
 
PRML Chapter 3
PRML Chapter 3PRML Chapter 3
PRML Chapter 3Sunwoo Kim
 
A Novel Algorithm for Design Tree Classification with PCA
A Novel Algorithm for Design Tree Classification with PCAA Novel Algorithm for Design Tree Classification with PCA
A Novel Algorithm for Design Tree Classification with PCAEditor Jacotech
 

Similar to Machine Learning Lab Report on PCA Feature Extraction (20)

Slides distancecovariance
Slides distancecovarianceSlides distancecovariance
Slides distancecovariance
 
Project management
Project managementProject management
Project management
 
Machine Learning.pdf
Machine Learning.pdfMachine Learning.pdf
Machine Learning.pdf
 
Recognition of Handwritten Mathematical Equations
Recognition of  Handwritten Mathematical EquationsRecognition of  Handwritten Mathematical Equations
Recognition of Handwritten Mathematical Equations
 
Classification Of Web Documents
Classification Of Web Documents Classification Of Web Documents
Classification Of Web Documents
 
PRML Chapter 12
PRML Chapter 12PRML Chapter 12
PRML Chapter 12
 
PCA and LDA in machine learning
PCA and LDA in machine learningPCA and LDA in machine learning
PCA and LDA in machine learning
 
Ch17 Hashing
Ch17 HashingCh17 Hashing
Ch17 Hashing
 
Dimensionality Reduction and feature extraction.pptx
Dimensionality Reduction and feature extraction.pptxDimensionality Reduction and feature extraction.pptx
Dimensionality Reduction and feature extraction.pptx
 
Singular Value Decomposition (SVD).pptx
Singular Value Decomposition (SVD).pptxSingular Value Decomposition (SVD).pptx
Singular Value Decomposition (SVD).pptx
 
EDAB Module 5 Singular Value Decomposition (SVD).pptx
EDAB Module 5 Singular Value Decomposition (SVD).pptxEDAB Module 5 Singular Value Decomposition (SVD).pptx
EDAB Module 5 Singular Value Decomposition (SVD).pptx
 
pca.pdf
pca.pdfpca.pdf
pca.pdf
 
Linear Algebra – A Powerful Tool for Data Science
Linear Algebra – A Powerful Tool for Data ScienceLinear Algebra – A Powerful Tool for Data Science
Linear Algebra – A Powerful Tool for Data Science
 
Introduction to Principle Component Analysis
Introduction to Principle Component AnalysisIntroduction to Principle Component Analysis
Introduction to Principle Component Analysis
 
PRML Chapter 3
PRML Chapter 3PRML Chapter 3
PRML Chapter 3
 
Ann a Algorithms notes
Ann a Algorithms notesAnn a Algorithms notes
Ann a Algorithms notes
 
Linear Algebra
Linear AlgebraLinear Algebra
Linear Algebra
 
1376846406 14447221
1376846406  144472211376846406  14447221
1376846406 14447221
 
A Novel Algorithm for Design Tree Classification with PCA
A Novel Algorithm for Design Tree Classification with PCAA Novel Algorithm for Design Tree Classification with PCA
A Novel Algorithm for Design Tree Classification with PCA
 
Mat lab.pptx
Mat lab.pptxMat lab.pptx
Mat lab.pptx
 

More from Central university of Haryana

MATLAB-Cheat-Sheet-for-Data-Science_LondonSchoolofEconomics (1).pdf
MATLAB-Cheat-Sheet-for-Data-Science_LondonSchoolofEconomics (1).pdfMATLAB-Cheat-Sheet-for-Data-Science_LondonSchoolofEconomics (1).pdf
MATLAB-Cheat-Sheet-for-Data-Science_LondonSchoolofEconomics (1).pdfCentral university of Haryana
 

More from Central university of Haryana (19)

Practical --2..pdf
Practical --2..pdfPractical --2..pdf
Practical --2..pdf
 
MATLAB-Cheat-Sheet-for-Data-Science_LondonSchoolofEconomics (1).pdf
MATLAB-Cheat-Sheet-for-Data-Science_LondonSchoolofEconomics (1).pdfMATLAB-Cheat-Sheet-for-Data-Science_LondonSchoolofEconomics (1).pdf
MATLAB-Cheat-Sheet-for-Data-Science_LondonSchoolofEconomics (1).pdf
 
LittleBookOfRuby.pdf
LittleBookOfRuby.pdfLittleBookOfRuby.pdf
LittleBookOfRuby.pdf
 
all matlab_prog.pdf
all              matlab_prog.pdfall              matlab_prog.pdf
all matlab_prog.pdf
 
Practical13.docx
Practical13.docxPractical13.docx
Practical13.docx
 
Practical 111.docx
Practical 111.docxPractical 111.docx
Practical 111.docx
 
Matlab Practical--11.pdf
Matlab Practical--11.pdfMatlab Practical--11.pdf
Matlab Practical--11.pdf
 
Matlab Practical--11.docx
Matlab Practical--11.docxMatlab Practical--11.docx
Matlab Practical--11.docx
 
Matlab Practical--9.docx
Matlab Practical--9.docxMatlab Practical--9.docx
Matlab Practical--9.docx
 
Matlab Practical-- 12.pdf
Matlab Practical-- 12.pdfMatlab Practical-- 12.pdf
Matlab Practical-- 12.pdf
 
Matlab practical ---9.pdf
Matlab practical ---9.pdfMatlab practical ---9.pdf
Matlab practical ---9.pdf
 
Matlab practical ---7.pdf
Matlab practical ---7.pdfMatlab practical ---7.pdf
Matlab practical ---7.pdf
 
Matlab practical ---6.pdf
Matlab practical ---6.pdfMatlab practical ---6.pdf
Matlab practical ---6.pdf
 
Matlab practical ---5.pdf
Matlab practical ---5.pdfMatlab practical ---5.pdf
Matlab practical ---5.pdf
 
Matlab practical ---4.pdf
Matlab practical ---4.pdfMatlab practical ---4.pdf
Matlab practical ---4.pdf
 
Matlab practical ---3.pdf
Matlab practical ---3.pdfMatlab practical ---3.pdf
Matlab practical ---3.pdf
 
Matlab practical ---2.pdf
Matlab practical ---2.pdfMatlab practical ---2.pdf
Matlab practical ---2.pdf
 
Matlab practical ---1.pdf
Matlab practical ---1.pdfMatlab practical ---1.pdf
Matlab practical ---1.pdf
 
Matlab practical --8.pdf
Matlab practical --8.pdfMatlab practical --8.pdf
Matlab practical --8.pdf
 

Recently uploaded

Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝soniya singh
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptxthe ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptxhumanexperienceaaa
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learningmisbanausheenparvam
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 

Recently uploaded (20)

Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptxthe ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learning
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 

Machine Learning Lab Report on PCA Feature Extraction

  • 1. Ashoka bairwa Page 1 CENTRAL UNIVERSITY OF HARYANA Department of Computer Science & Engineering under SOET MACHINE LEARNING LAB Submitted by Ashoka Roll No:- 191890 Submitted to Dr. Sangeeta Assistant Professor Central University of Haryana (SOET)
  • 2. Ashoka bairwa Page 2 PRACTICAL-1 Aim: Introduction about Principal Component Analysis (PCA) feature extraction. Theory: Principal Component Analysis, or PCA for short, is a method for reducing the dimensionality of data. It can be thought of as a projection method where data with m-columns (features) is projected into a subspace with m or fewer columns, whilst retaining the essence of the original data. The PCA method can be described and implemented using the tools of linear algebra. Implementation: PCA is an operation applied to a dataset, represented by an n x m matrix A that results in a projection of A which we will call B. Let’s walk through the steps of this operation. The first step is to calculate the mean values of each column. Or Next, we need to center the values in each column by subtracting the mean column value. The next step is to calculate the covariance matrix of the centered matrix C. Correlation is a normalized measure of the amount and direction (positive or negative) that two columns change together. Covariance is a generalized and unnormalized version of correlation across multiple columns. A covariance matrix is a calculation of covariance of a given matrix with covariance scores for every column with every other column, including itself.
  • 3. Ashoka bairwa Page 3 Finally, we calculate the eigen decomposition of the covariance matrix V. This results in a list of eigenvalues and a list of eigenvectors. The eigenvectors represent the directions or components for the reduced subspace of B, whereas the eigenvalues represent the magnitudes for the directions. For more on this topic, see the post: Gentle Introduction to Eigen decomposition, Eigenvalues, and Eigenvectors for Machine Learning The eigenvectors can be sorted by the eigenvalues in descending order to provide a ranking of the components or axes of the new subspace for A. If all eigenvalues have a similar value, then we know that the existing representation may already be reasonably compressed or dense and that the projection may offer little. If there are eigenvalues close to zero, they represent components or axes of B that may be discarded. A total of m or less components must be selected to comprise the chosen subspace. Ideally, we would select k eigenvectors, called principal components, that have the k largest eigenvalues. Other matrix decomposition methods can be used such as Singular-Value Decomposition, or SVD. As such, generally the values are referred to as singular values and the vectors of the subspace are referred to as principal components. Once chosen, data can be projected into the subspace via matrix multiplication. Where A is the original data that we wish to project, B^T is the transpose of the chosen principal components and P is the projection of A. This is called the covariance method for calculating the PCA, although there are alternative ways to to calculate it.
  • 4. Ashoka bairwa Page 4 Manually Calculate Principal Component Analysis
  • 5. Ashoka bairwa Page 5 Reusable Principal Component Analysis  We can calculate a Principal Component Analysis on a dataset using the PCA() class in the scikit-learn library. The benefit of this approach is that once the projection is calculated, it can be applied to new data again and again quite easily.  When creating the class, the number of components can be specified as a parameter.  The class is first fit on a dataset by calling the fit() function, and then the original dataset or other data can be projected into a subspace with the chosen number of dimensions by calling the transform() function.  Once fit, the eigenvalues and principal components can be accessed on the PCA class via the explained_variance_ and components_ attributes.