A Review and Implementation of Principal Component Analysis
Taweh Beysolow II
Professor Moretti
Fordham University
Abstract
In this experiment, we shall look at the famous iris data set and perform principal
component analysis on the data. We want to identify the principal components that
explain the most variance within the data set. Furthermore, we will discuss the
application of principal component analysis in conjunction with other data analysis
techniques. All computations were performed in Python, and the data were loaded from the
UCI machine learning repository. Rather than simply using a built-in PCA function, we
shall implement principal component analysis by manually performing each step, with
assistance from packages for eigen-decomposition. In conclusion, we find that the first two
principal components, out of four in total, explain roughly 96% of the variance in the data.
I. What is Principal Component Analysis?
Principal component analysis (PCA) is an orthogonal linear transformation of data, in
which the data is projected onto a new coordinate system. The transformation is arranged
so that the first coordinate captures the greatest variance in the data, and each
subsequent coordinate captures the next greatest variance, in decreasing order. These
coordinates themselves are the principal components of the data. The primary purpose of
principal component analysis is “to reduce the dimensionality of a data set consisting of
a large number of interrelated variables, while retaining as much as possible of the
variation present in the data set.” (Wood, pg.2)
II. Notation
$x$ = vector of $p$ random variables, $\alpha_k$ = vector of $p$ constants

$\alpha_k^{\top} x = \sum_{j=1}^{p} \alpha_{kj} x_j$

$\Sigma$ = covariance matrix of $x$, replaced in practice by $S$, the sample covariance matrix

$\lambda_k$ = eigenvalue (variance) of the $k$-th principal component, $k = 1, 2, \ldots, p$
III. Derivation of Principal Component Analysis
Our goal is to find the linear function $\alpha_k^{\top} x$ of the random variables in the
vector $x$, with coefficient vector $\alpha_k$, that has maximum variance. This linear
function produces our principal components. Moreover, the principal components
must be ordered by decreasing variance, and each principal component must be
uncorrelated with the others.
Objective:
$\text{Maximize } \operatorname{Var}(\alpha_k^{\top} x) = \alpha_k^{\top} \Sigma \alpha_k$
We use constrained optimization, as without a constraint the variance could be made
arbitrarily large simply by scaling $\alpha_k$. As such, we shall choose the following
normalization constraint:
$\alpha_k^{\top} \alpha_k = 1$
This brings us to the concept of Lagrange multipliers, which shall be the method
by which we achieve this constrained optimization.
Lagrange Multipliers in PCA
The Lagrange Multiplier method is a tool “for constrained optimization of
differentiable functions, especially for nonlinear constrained optimization.”(Huijuan,
pg.1) In particular, this is helpful for finding local maxima and minima of a respective
function subject to a given constraint. Within the context of the experiment, the Lagrange
multipliers are applied as follows:
$\alpha_k^{\top} \Sigma \alpha_k - \lambda(\alpha_k^{\top} \alpha_k - 1)$

$\frac{d}{d\alpha_k}\left[\alpha_k^{\top} \Sigma \alpha_k - \lambda(\alpha_k^{\top} \alpha_k - 1)\right] = 0$

$\Sigma \alpha_k - \lambda \alpha_k = 0$

$\Sigma \alpha_k = \lambda_k \alpha_k$
The final equation shows that $\alpha_k$ is an eigenvector of $\Sigma$ with corresponding
eigenvalue $\lambda_k$.
What are Eigenvalues and Eigenvectors?
An eigenvalue is a scalar derived from a square matrix, and it corresponds to a
specific eigenvector of that same matrix. Together, they “provide the
Eigen-decomposition of a matrix.” (Abdi, pg.1) Plainly spoken, the Eigen-decomposition
of a matrix merely provides the matrix in the form of eigenvectors and their
corresponding eigenvalues. Eigen-decomposition is important because it is a “method by
which we can find the maximum (or minimum) of functions involving matrices.” (Abdi,
pg.1) In this context, this is the method by which we find the principal components in
order of decreasing variance.
Eigen-decomposition
$Au = \lambda u$

$(A - \lambda I)u = 0$
Where
A = square matrix,
u = eigenvector of matrix A (a vector whose direction is unchanged when multiplied by A; only its length is scaled, by the eigenvalue)
Assume that

$A = \begin{bmatrix} 2 & 3 \\ 2 & 1 \end{bmatrix}$. Therefore

$u_1 = \begin{bmatrix} 3 \\ 2 \end{bmatrix}, \quad \lambda_1 = 4$

$u_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}, \quad \lambda_2 = -1$
For most applications, the eigenvectors are normalized to unit length: $u^{\top} u = 1$
The eigenvectors of A are collected together in a matrix U; each column of U is an
eigenvector of A. The eigenvalues are stored in a diagonal matrix $\Lambda$, whose diagonal
entries are the eigenvalues. Thus, we rewrite the first equation accordingly:
$AU = U\Lambda$

$A = U \Lambda U^{-1} = \begin{bmatrix} 3 & -1 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 4 & 0 \\ 0 & -1 \end{bmatrix} \frac{1}{10}\begin{bmatrix} 2 & 2 \\ -4 & 6 \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ 2 & 1 \end{bmatrix}$
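As a quick numerical check of this worked example, the decomposition can be reproduced with NumPy. This is a minimal sketch; note that numpy.linalg.eig returns unit-length eigenvectors, so the columns of U are rescaled versions of the vectors shown above.

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [2.0, 1.0]])

# Eigen-decomposition: columns of U are the (unit-length) eigenvectors of A
eigenvalues, U = np.linalg.eig(A)
print(eigenvalues)                                      # approximately [ 4., -1.]

# Reconstruct A = U * Lambda * U^{-1}
Lambda = np.diag(eigenvalues)
print(np.allclose(A, U @ Lambda @ np.linalg.inv(U)))    # True
```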
Moving forward, as mentioned above, our objective is to maximize $\lambda_k$, with the
eigenvalues ordered in decreasing fashion. If $\lambda_1$ is the largest eigenvalue, then the first
principal component is defined by

$\Sigma \alpha_1 = \lambda_1 \alpha_1$

In general, $\alpha_k^{\top} x$ is the $k$-th principal component of $x$, and its variance is given by
the corresponding eigenvalue $\lambda_k$. We shall now demonstrate this process when k = 2;
the same argument extends to k > 2.
2nd and k-th Principal Components
The second principal component maximizes the variance subject to being
uncorrelated with the first principal component. The non-correlation constraint is
expressed as follows:
$\operatorname{cov}(\alpha_1^{\top} x, \alpha_2^{\top} x) = \alpha_1^{\top} \Sigma \alpha_2 = \alpha_2^{\top} \Sigma \alpha_1 = \alpha_2^{\top} \lambda_1 \alpha_1 = \lambda_1 \alpha_2^{\top} \alpha_1 = \lambda_1 \alpha_1^{\top} \alpha_2 = 0$
The quantity to maximize, with Lagrange multipliers $\lambda_2$ and $\phi$ for the two constraints, is

$\alpha_2^{\top} \Sigma \alpha_2 - \lambda_2(\alpha_2^{\top} \alpha_2 - 1) - \phi\, \alpha_2^{\top} \alpha_1$

$\frac{d}{d\alpha_2}\left[\alpha_2^{\top} \Sigma \alpha_2 - \lambda_2(\alpha_2^{\top} \alpha_2 - 1) - \phi\, \alpha_2^{\top} \alpha_1\right] = 0$

$\Sigma \alpha_2 - \lambda_2 \alpha_2 - \phi \alpha_1 = 0$

Multiplying on the left by $\alpha_1^{\top}$ gives

$\alpha_1^{\top} \Sigma \alpha_2 - \lambda_2\, \alpha_1^{\top} \alpha_2 - \phi\, \alpha_1^{\top} \alpha_1 = 0$

$0 - 0 - \phi(1) = 0$

$\phi = 0$

$\Sigma \alpha_2 - \lambda_2 \alpha_2 = 0$
This process can be repeated up to k = p, yielding principal components for each
of the p random variables.
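The end result of this derivation — mutually uncorrelated components whose variances are the eigenvalues of $\Sigma$ — can also be illustrated numerically. The following is a minimal sketch on synthetic data (not part of the experiment below); the covariance matrix and sample size are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0.0, 0.0, 0.0],
                            cov=[[3.0, 1.0, 0.5],
                                 [1.0, 2.0, 0.3],
                                 [0.5, 0.3, 1.0]],
                            size=2000)

S = np.cov(X.T)                                 # sample covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(S)   # eigh: S is symmetric
order = np.argsort(eigenvalues)[::-1]           # sort in decreasing order
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

Z = (X - X.mean(axis=0)) @ eigenvectors         # principal component scores

# The covariance of the scores is (numerically) diagonal, with the
# eigenvalues of S on the diagonal: the components are uncorrelated.
print(np.round(np.cov(Z.T), 3))
print(np.round(eigenvalues, 3))
```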
IV. Data
For this experiment, we shall be using Ronald Fisher’s Iris flower data set, originally
collected by Edgar Anderson to study the variation among three Iris species. Our objective is
to determine which principal components explain the most variance in this data set.
There are a total of 150 observations, 50 of each of the three species of flower. The
species and variables observed are:
Species
• Iris-Setosa
• Iris-Virginica
• Iris-Versicolor
Variables
• Sepal Length
• Sepal Width
• Petal Length
• Petal Width
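For reference, a minimal sketch of loading the data is shown below; the experiment in the next section builds on it. The UCI file location and the column names are assumptions based on the standard distribution of the iris file, and the original analysis may have loaded the data differently.

```python
import pandas as pd

# Assumed location of the raw iris file in the UCI machine learning repository
URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]

iris = pd.read_csv(URL, header=None, names=columns)
X = iris[columns[:4]].values      # 150 x 4 matrix of measurements
y = iris["species"].values        # species labels
print(X.shape)                    # (150, 4)
```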
V. Experiment
When performing initial exploratory analysis on our data, we notice the following: the
data exhibits very high variance within and between species with respect to sepal length
and sepal width, but is considerably less variable between species, and only moderately
variable within species, when observing petal length and petal width. This will be a point
of interest to keep in mind for later, but for now let us move on to describing the
implementation as performed here. After we load our data into a variable within Python,
we standardize our values (mean = 0, variance = 1), then we calculate the covariance
matrix for X:
$S = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^{\top}(x_i - \bar{x})$
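A minimal sketch of this step, assuming X is the 150 × 4 matrix of measurements loaded earlier:

```python
import numpy as np

# Standardize each variable to mean 0 and variance 1
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Sample covariance matrix S (4 x 4); np.cov expects variables in rows,
# so the standardized matrix is transposed
S = np.cov(X_std.T)
print(S.shape)                    # (4, 4)
```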
Generally speaking, we want to standardize values when they are not measured on
the same scale. Although in this experiment all of the variables are measured in
centimeters, it is still advisable to do so. Moving forward, we perform the eigen-
decomposition and obtain the eigenvalues and eigenvectors. After sorting the
eigenvalue/eigenvector pairs in decreasing order of eigenvalue, we observe the
proportion of variance explained by each principal component:
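A sketch of the decomposition and sorting steps, continuing from the standardized data above (since S is symmetric, numpy.linalg.eigh could equally be used):

```python
# Eigen-decomposition of the covariance matrix
eigenvalues, eigenvectors = np.linalg.eig(S)

# Sort eigenvalue/eigenvector pairs in decreasing order of eigenvalue
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Proportion of variance explained by each principal component
explained = eigenvalues / eigenvalues.sum()
print(explained)   # the first two entries sum to roughly 0.96 for this data
```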
As we can see, the first two principal components explain the vast majority of the
variance within the data set. As pointed out earlier, the high variability in sepal
length and sepal width between and within species foreshadowed this result. Finally,
we project the transformed data onto the new feature space:
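A sketch of the projection onto the first two principal components and the resulting two-dimensional scatter plot (variable names continue from the sketches above):

```python
import matplotlib.pyplot as plt

# Projection matrix: the first two eigenvectors as columns (4 x 2)
W = eigenvectors[:, :2]

# Project the standardized data onto the first two principal components
X_pca = X_std @ W                 # 150 x 2 matrix of principal component scores

for species in np.unique(y):
    mask = (y == species)
    plt.scatter(X_pca[mask, 0], X_pca[mask, 1], label=species)
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.legend()
plt.show()
```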
VI. Conclusion and Comments
We observe that instead of a 4-dimensional plot, as we would have originally had,
we are now looking at a very familiar two-dimensional xy plot. For exploratory analysis
purposes, this brings considerable ease both visually and analytically. It is easy to see that
Iris-virginica and Iris-versicolor show considerable similarities with respect to sepal length
and sepal width properties, while Iris-setosa in general seems considerably more distinct.
As for further applications of PCA, it is often used in regression analysis to determine
which variables should be included in a model, and in neuroscience to identify properties
of stimuli, among other uses. As shown above, both in theory and in application, principal
component analysis provides a robust method of simplifying very complex data into less
complex forms.
VII. Appendix
1. Wood, F. (2009, September). Principal Component Analysis. Retrieved from
http://www.stat.columbia.edu/~fwood/Teaching/w4315/Fall2009/pca.pdf
2. Abdi, H. (2007). The Eigen-Decomposition. Retrieved from
https://www.utdallas.edu/~herve/Abdi-EVD2007-pretty.pdf
3. Huijuan, L. (2008, September 28). Lagrange Multipliers and their
Applications. Retrieved from
http://sces.phys.utk.edu/~moreo/mm08/method_HLi.pdf