Implement Principal Component Analysis (PCA) in Python
Presented by Eshan Agarwal
Given a classification problem, how do we choose the right features?
Introduction to PCA
• PCA is a method for reducing the dimensionality of data.
• It can be thought of as a projection method in which data with m columns (features) is projected into a subspace with m or fewer columns while retaining the essence of the original data.
[Figure: PCA maps an n × m data matrix A to an n × k matrix X]
In this presentation, we will discover the PCA
method for dimensionality reduction and how to
implement it from scratch in Python.
• Before going deeper into PCA, let us review some of its key concepts.
Geometric Rationale of PCA
• Variance
  • The variance of each variable is the average squared deviation of its n values around the mean of that variable. It can also be thought of as the spread of the data points.
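• Covariance
  c_{ij} = \frac{1}{n-1} \sum_{m=1}^{n} (x_{im} - \bar{x}_i)(x_{jm} - \bar{x}_j)
  where c_{ij} is the covariance of variables i and j, the sum runs over all n objects, x_{im} is the value of variable i in object m, \bar{x}_i is the mean of variable i, x_{jm} is the value of variable j in object m, and \bar{x}_j is the mean of variable j. The factor 1/(n-1) is the usual sample normalization.
  • The degree to which the variables are linearly correlated is represented by their covariances.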
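As a rough illustration of the covariance definition, the sketch below computes it directly from the deviations around each mean. It assumes the sample normalization 1/(n - 1); the function name `covariance` and the toy `rooms` and `price` arrays are hypothetical and are not taken from the Boston data.

```python
import numpy as np

def covariance(x_i, x_j):
    """Sample covariance of two variables measured on the same n objects."""
    x_i = np.asarray(x_i, dtype=float)
    x_j = np.asarray(x_j, dtype=float)
    n = x_i.size
    # Sum over all n objects of the product of deviations from each mean.
    return np.sum((x_i - x_i.mean()) * (x_j - x_j.mean())) / (n - 1)

# Hypothetical toy measurements (assumption, for illustration only).
rooms = np.array([4.0, 5.0, 6.0, 7.0, 8.0])
price = np.array([15.0, 18.0, 24.0, 30.0, 33.0])

# The variance of a variable is its covariance with itself.
var_rooms = covariance(rooms, rooms)
cov_rooms_price = covariance(rooms, price)
print(var_rooms, cov_rooms_price)
```

A positive covariance here reflects that the two toy variables tend to increase together.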
Objective of PCA
• The objective of PCA is to rigidly rotate the axes of this m-dimensional space to new positions (principal axes).
• The principal axes are ordered so that axis 1 has the highest variance, axis 2 has the next highest variance, ..., and axis m has the lowest variance.
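The idea of a rigid rotation onto variance-ordered axes can be checked numerically. The sketch below is only an illustration on hypothetical random data; it uses `numpy.linalg.eigh` on the covariance matrix to obtain the new axes, which is one common way to do this and is an assumption here. Strictly speaking, the resulting orthogonal change of axes may combine a rotation with a reflection.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical correlated 3-feature data, centered so the axes turn about the origin.
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.2]])
X = rng.normal(size=(200, 3)) @ mixing
X = X - X.mean(axis=0)

S = np.cov(X, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(S)   # eigenvalues come back in ascending order
order = np.argsort(eigenvalues)[::-1]           # reorder from highest to lowest variance
axes = eigenvectors[:, order]                   # columns are the principal axes

rotated = X @ axes                              # coordinates of each object on the new axes
variances = rotated.var(axis=0, ddof=1)
print(variances)                                # decreasing from axis 1 to axis 3
```

Because the new axes are orthonormal, distances between objects are unchanged; only the coordinate system differs.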
Implement PCA in Python (Scratch)
• Load the data set:
  • We can use the Boston Housing dataset for PCA. It has 13 features, so the question is how to visualize the data. We can reduce the dimensionality of the data with PCA and then visualize it.
• Standardize the data:
  • PCA is strongly affected by scale, and different features may be measured on different scales, so it is better to standardize the data before finding the principal components. Sklearn's StandardScaler rescales each feature to zero mean and unit variance.
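To make the standardization step concrete, here is a minimal NumPy-only sketch of what such scaling amounts to. It is not the StandardScaler implementation; the helper name `standardize` and the toy matrix are hypothetical. It assumes the population standard deviation (ddof=0), which is the convention StandardScaler follows.

```python
import numpy as np

def standardize(X):
    """Scale each column to zero mean and unit variance (population std, ddof=0)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)           # ddof=0 by default
    std[std == 0.0] = 1.0         # avoid dividing by zero for constant columns
    return (X - mean) / std

# Hypothetical features on very different scales (assumption, for illustration).
X = np.array([[1.0, 1000.0],
              [2.0, 3000.0],
              [3.0, 2000.0],
              [4.0, 6000.0]])
X_std = standardize(X)
print(X_std.mean(axis=0), X_std.std(axis=0))
```

After scaling, both columns contribute on an equal footing, so the large raw values of the second feature no longer dominate the variance.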
The Algebra of PCA
• Calculating PCA involves the following steps:
  a. Calculating the covariance matrix.
  b. Calculating the eigenvalues and eigenvectors.
  c. Forming the principal components.
  d. Projecting into the new feature space.
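• Calculating the covariance matrix (S):
  • The covariance matrix is a matrix of the variances and covariances (or correlations) among every pair of the m variables.
  • It is a square, symmetric matrix.
  • For standardized data X with n rows, S = X.T @ X / (n - 1). The unnormalized product X.T @ X has the same eigenvectors and differs only in the scale of the eigenvalues. The product can be computed with NumPy's matmul() function.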
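The covariance matrix computation can be sketched as follows, under the assumption that the columns of X have already been centered and that the sample normalization 1/(n - 1) is wanted. The random data is hypothetical; `numpy.cov` is used only as a cross-check.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))         # hypothetical data: n = 50 objects, m = 4 variables
X = X - X.mean(axis=0)               # center each column first

n = X.shape[0]
S = np.matmul(X.T, X) / (n - 1)      # m x m covariance matrix

print(S.shape)                                   # square: (m, m)
print(np.allclose(S, S.T))                       # symmetric
print(np.allclose(S, np.cov(X, rowvar=False)))   # agrees with NumPy's own estimate
```

Note that `*` on NumPy arrays is elementwise, so the matrix product needs `matmul` or the `@` operator.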
Calculating the eigenvalues and eigenvectors:
• λ is an eigenvalue of a square matrix A (here, the covariance matrix S) if it is a solution of the characteristic equation:
  det(λI - A) = 0
  where I is the identity matrix of the same dimension as A.
• The sum of all m eigenvalues equals the trace of S (the sum of the variances of the original variables).
• For each eigenvalue λ, a corresponding eigenvector v can be found by solving:
  (λI - A)v = 0
• The eigenvalues λ_1, λ_2, ..., λ_m are the variances of the coordinates on each principal component axis.
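These eigenvalue properties can be verified on a small example. The 2 x 2 matrix below is a hypothetical covariance matrix chosen so that the characteristic equation is easy to solve by hand: det(λI - S) = (λ - 2)^2 - 1 = 0, which gives λ = 1 and λ = 3.

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])            # hypothetical 2 x 2 covariance matrix
I = np.eye(2)

eigenvalues, eigenvectors = np.linalg.eigh(S)   # ascending order; eigenvectors are columns

# Each eigenvalue solves the characteristic equation det(lambda*I - S) = 0.
dets = [np.linalg.det(lam * I - S) for lam in eigenvalues]

# Each eigenvector v solves (lambda*I - S) v = 0.
residuals = [np.linalg.norm((lam * I - S) @ eigenvectors[:, k])
             for k, lam in enumerate(eigenvalues)]

print(eigenvalues)                     # the two roots of the characteristic equation
print(eigenvalues.sum(), np.trace(S))  # sum of eigenvalues equals the trace
```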
Code for finding eigenvalues and eigenvectors:
• We use the eigh function from scipy.linalg to compute the eigenvalues and eigenvectors, keeping the top 2 eigenvalues and their eigenvectors.
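The slide's own code for this step is not reproduced in the text, so the following is only a sketch of one way to pick the top 2 eigenpairs. It swaps in `numpy.linalg.eigh` for `scipy.linalg.eigh` to stay NumPy-only; both return eigenvalues in ascending order, so the largest ones have to be selected explicitly. The helper `top_k_eigen` and the diagonal test matrix are hypothetical. Recent SciPy versions also appear to offer a `subset_by_index` argument on `scipy.linalg.eigh` for requesting just a range of eigenvalues.

```python
import numpy as np

def top_k_eigen(S, k=2):
    """Return the k largest eigenvalues of symmetric S and their eigenvectors (as columns)."""
    eigenvalues, eigenvectors = np.linalg.eigh(S)    # ascending eigenvalues
    order = np.argsort(eigenvalues)[::-1][:k]        # indices of the k largest
    return eigenvalues[order], eigenvectors[:, order]

# Hypothetical 3 x 3 covariance matrix (diagonal, so the answer is easy to read off).
S = np.array([[4.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 9.0]])
values, vectors = top_k_eigen(S, k=2)
print(values)          # largest eigenvalue first
print(vectors.shape)   # one column per retained eigenvector
```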
Forming Principal Components:
• The principal components are formed by multiplying the standardized data matrix by the two leading eigenvectors.
• Projection into the new feature space:
  • Create a DataFrame holding the 1st and 2nd principal components.
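The projection step might look like the sketch below. It is an illustration on hypothetical random data rather than the presentation's code, and it keeps the two components in a plain dictionary of NumPy arrays; the column names are assumptions, and the dictionary could be passed to a pandas DataFrame constructor if pandas is available.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))                   # hypothetical data with 5 features
X = (X - X.mean(axis=0)) / X.std(axis=0)        # standardize each column

S = X.T @ X / (X.shape[0] - 1)                  # covariance matrix of the standardized data
eigenvalues, eigenvectors = np.linalg.eigh(S)   # ascending order
W = eigenvectors[:, ::-1][:, :2]                # m x 2 matrix of the two leading eigenvectors

projected = X @ W                               # n x 2 coordinates in the new feature space
components = {
    "1st_principal": projected[:, 0],
    "2nd_principal": projected[:, 1],
}
print(projected.shape)
```

The two new columns are uncorrelated, and their variances equal the two largest eigenvalues.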
Visualize Data after PCA
Steps for PCA
• Standardize the data.
• Calculate the covariance matrix.
• Find the eigenvalues and eigenvectors of the covariance matrix.
• Plot the eigenvectors / principal components over the scaled data.
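The steps above can be strung together in a single function. This is a minimal sketch under stated assumptions, not the author's implementation: the function name `pca`, the parameter `n_components`, and the random input are hypothetical, `numpy.linalg.eigh` stands in for the SciPy call, and the plotting step is left out.

```python
import numpy as np

def pca(X, n_components=2):
    """Return (scores, variances) for the leading principal components of X."""
    X = np.asarray(X, dtype=float)

    # 1. Standardize each feature to zero mean and unit variance.
    std = X.std(axis=0)
    std[std == 0.0] = 1.0
    Z = (X - X.mean(axis=0)) / std

    # 2. Covariance matrix of the standardized data.
    S = Z.T @ Z / (Z.shape[0] - 1)

    # 3. Eigenvalues and eigenvectors, largest eigenvalues first.
    eigenvalues, eigenvectors = np.linalg.eigh(S)
    order = np.argsort(eigenvalues)[::-1][:n_components]

    # 4. Project the data onto the leading eigenvectors.
    return Z @ eigenvectors[:, order], eigenvalues[order]

rng = np.random.default_rng(3)
# Hypothetical data: 150 objects, 4 features on very different scales.
X = rng.normal(size=(150, 4)) * np.array([1.0, 10.0, 100.0, 0.1])
scores, variances = pca(X, n_components=3)
print(scores.shape, variances)
```

The returned `scores` can be handed to any plotting library for a 2-D or 3-D scatter plot.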
Assessment and Evaluation
1) [True or False] PCA can be used for projecting and visualizing data in lower dimensions.
A. TRUE
B. FALSE
2) [True or False] PCA can be applied to an image dataset.
A. TRUE
B. FALSE
3) [True or False] PCA is based on variance maximization and distance minimization.
A. TRUE
B. FALSE
• Exercise: Implement PCA with 3 components and visualize the data; then load the iris dataset and perform the same task.
Ans: 1-A, 2-A, 3-A
For full code: https://github.com/Eshan2203/PCA-on-Boston-House-price-Data-Set/blob/master/PCA_BOston.ipynb
