Principal Component Analysis
Sumit Singh
October 2017
0.1 Prologue
Figure 1: Rotation of Axes
As shown in Figure 1, rotating the coordinate axes by an angle α maps a point with coordinates (x, y) to new coordinates (x′, y′):
x′ = x cos α + y sin α
y′ = −x sin α + y cos α
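As a quick numeric check of these rotation formulas (a minimal sketch, not part of the original notes; it assumes NumPy is available and picks α = 30° and the point (1, 2) arbitrarily):

```python
import numpy as np

alpha = np.radians(30)    # rotation angle of the axes (30 degrees, chosen arbitrarily)
x, y = 1.0, 2.0           # coordinates of a point in the original axes

# Coordinates of the same point in the rotated axes
x_new = x * np.cos(alpha) + y * np.sin(alpha)
y_new = -x * np.sin(alpha) + y * np.cos(alpha)

print(x_new, y_new)       # approximately 1.866 and 1.232
```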
0.2 Introduction
Principal component analysis (PCA) is a data reduction technique developed by Hotelling in 1933. It reduces the number of dimensions by constructing new dimensions that are orthogonal to each other; these new dimensions are the principal components.
0.3 Mathematics of PCA
Let us take a two-dimensional data matrix, that is, data with 2 variables and n observations. Say the scatter plot of the data looks like what is shown in Figure 2.
X_{n\times 2} = \begin{bmatrix}
x_{11} & x_{12} \\
x_{21} & x_{22} \\
\vdots & \vdots \\
x_{n1} & x_{n2}
\end{bmatrix}
Figure 2: Principal Components in 2-D
Let us compute the covariance matrix of X and denote it as:
\mathrm{Cov}(X_1, X_2) = \begin{bmatrix}
s_{11} & s_{12} \\
s_{21} & s_{22}
\end{bmatrix}
It is a 2 × 2 matrix. The scatter plot shows that there is a relationship between X1 and X2, so if we calculate the correlation matrix, we will get a non-zero correlation coefficient.
\mathrm{Corr}(X_1, X_2) = \begin{bmatrix}
1 & r_{12} \\
r_{21} & 1
\end{bmatrix}
We know that −1 ≤ r12 = r21 ≤ 1; say r12 ≅ 0.9. The variance of X1 is s11 and the variance of X2 is s22; these are sample variances. The corresponding population covariance matrix is denoted as:
\mathrm{Cov}_{\text{population}}(X) = \begin{bmatrix}
\sigma_{11} & \sigma_{12} \\
\sigma_{21} & \sigma_{22}
\end{bmatrix}
The population variance of X1 is σ11 and that of X2 is σ22.
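To make these quantities concrete, here is a hedged sketch (assuming NumPy; the simulated data and the chosen population covariance are illustrative values of mine, not from the notes) that computes the sample covariance matrix S and the correlation matrix for two correlated variables:

```python
import numpy as np

rng = np.random.default_rng(0)

# n observations on two correlated variables X1 and X2
n = 500
Sigma_true = [[4.0, 2.7],          # illustrative population covariance
              [2.7, 3.0]]
X = rng.multivariate_normal([0.0, 0.0], Sigma_true, size=n)   # shape (n, 2)

S = np.cov(X, rowvar=False)        # sample covariance matrix (s11, s12; s21, s22)
R = np.corrcoef(X, rowvar=False)   # sample correlation matrix (1, r12; r21, 1)

print("s11, s22:", S[0, 0], S[1, 1])
print("r12:     ", R[0, 1])        # close to 2.7 / sqrt(4 * 3) ≈ 0.78
```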
Figure 3: Original Data
As shown in Figure 3, the variability of X1 and X2 (s11 and s22 respectively) is not the same, but both are high.
Now, we rotate the axes X1 and X2 by an angle θ and call the new axes Z1 and Z2. Let us look at the variability along the new axes: redrawing the data along Z1 and Z2 gives the transformed data. Comparing the two diagrams (Figures 4 and 5), we see that Variability(Z1) > Variability(X1) and, even more markedly, Variability(Z2) < Variability(X2).
Figure 4: Transformation
Figure 5: Transformed Data
And, in the transformed axes as shown in Figure 5:
• V(Z1) ≫ V(Z2)    (1)
  Since the variability along Z2 is much less than that along Z1, the information content along the Z2 dimension is very small. In fact, we can ignore that information and capture only the information along Z1. In a statistical sense, information is nothing but variability/variance. Z1 alone is enough to give the information that was scattered across X1 & X2. We can in fact drop the dimension Z2, which leaves us with a smaller number of dimensions. That is why PCA is essentially a dimensionality reduction technique. The same dimension reduction can be done for a p-dimensional matrix as well.
• Orthogonal dimensions. The major and minor axes of the ellipse (of the scatter plot) are parallel to the (transformed) coordinate axes, which was not the case earlier in the X1 & X2 axes system. When the major and minor axes of the ellipse are not parallel to the coordinate axes, it shows a dependency between the axes X1 & X2. That is not the case with the transformed axes Z1 & Z2, which shows that they are independent, that they are orthogonal. Orthogonality is preserved, as the sketch after this list illustrates.
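The sketch referenced above (a minimal illustration, assuming NumPy; the simulated data, the seed, and the way θ is chosen from the leading eigenvector are my additions, not part of the original notes) rotates correlated two-variable data and confirms that Var(Z1) ≫ Var(Z2) while Cov(Z1, Z2) ≈ 0:

```python
import numpy as np

rng = np.random.default_rng(1)

# Correlated two-variable data: an elongated, tilted cloud of points
n = 1000
X = rng.multivariate_normal([0.0, 0.0], [[4.0, 2.7], [2.7, 3.0]], size=n)

# One way to pick theta: the direction of the leading eigenvector of the
# sample covariance matrix, so that Z1 lines up with the major axis
S = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)                 # eigenvalues in ascending order
theta = np.arctan2(eigvecs[1, -1], eigvecs[0, -1])   # angle of the leading eigenvector

# A has columns a1 = (cos t, sin t) and a2 = (-sin t, cos t); rows of Z are (z1, z2)
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Z = X @ A                                            # z_j = a_j^T x for each observation

S_Z = np.cov(Z, rowvar=False)
print("Var(Z1), Var(Z2):", S_Z[0, 0], S_Z[1, 1])     # large vs. small
print("Cov(Z1, Z2):    ", S_Z[0, 1])                 # approximately 0
```

Choosing θ from the leading eigenvector here anticipates exactly the eigen-decomposition that the notes formalize at the end.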
So, we are trying to develop a mathematical formulation where a correlated data structure is transformed into an uncorrelated data structure (the axes of the ellipse are parallel to the transformed coordinate axes) in a reduced dimension.
A reduction from p dimensions to a smaller number of dimensions is what PCA does. By p dimensions, we mean an n × p data matrix: p denotes the number of variables (the number of columns of data) and n denotes the number of observations (the number of rows of data):
X = \begin{bmatrix}
x_{11} & x_{12} & x_{13} & \dots & x_{1p} \\
x_{21} & x_{22} & x_{23} & \dots & x_{2p} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & x_{n3} & \dots & x_{np}
\end{bmatrix}
= \begin{bmatrix}
X_1^T \\ X_2^T \\ \vdots \\ X_n^T
\end{bmatrix}
where each row X_i^T is one observation on the p variables.
We transform the data from the X space to the Z space: X_{n×p} → Z_{n×q}, where q < p. PCA is a dimensionality reduction technique: it transforms the original data matrix into a reduced components matrix while preserving the orthogonality of the components. Suppose we want to build a prediction model using multivariate regression, with the Xs as independent variables (IVs). If these IVs are correlated, that leads to the problem of multicollinearity: the linear model will not work well because the determinant of X^T X will be (close to) zero. One solution to this problem is ridge regression. But if we can make the variables independent by a transformation, then they would be truly independent and ordinary linear regression can be applied. This is one of the advantages of PCA.
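As an illustrative sketch of this use of PCA before regression (assuming NumPy; the simulated predictors, the choice of keeping q = 2 components, and all variable names are my assumptions, not the author's method), principal component regression replaces correlated predictors with their uncorrelated leading components:

```python
import numpy as np

rng = np.random.default_rng(2)

# Three predictors, two of them almost identical -> severe multicollinearity
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.001 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 3.0 * x1 + 1.5 * x3 + rng.normal(scale=0.5, size=n)

# X^T X is nearly singular, so a plain least-squares fit is unstable
print("condition number of X^T X:", np.linalg.cond(X.T @ X))

# PCA on the predictors: eigen-decompose the sample covariance matrix
Xc = X - X.mean(axis=0)
S = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]          # sort by decreasing variance
A = eigvecs[:, order[:2]]                  # keep q = 2 principal directions
Z = Xc @ A                                 # uncorrelated component scores

# Ordinary least squares on the orthogonal scores is well conditioned
coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), Z]), y, rcond=None)
print("principal component regression coefficients:", coef)
```

Because the component scores are orthogonal, the normal equations are well conditioned and the least-squares fit is numerically stable.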
What is "principal"?
What is the method? We will go by the same two-variable data, with the scatter plot of the data in the shape of an ellipse. Let us consider a point at random from all the data points, say M with coordinates (x1, x2). The coordinates of the point M on the transformed axes will be (assuming that Z1 is at angle θ counterclockwise from X1):
z1 = x1 cos θ + x2 sin θ    (2)
z2 = −x1 sin θ + x2 cos θ    (3)
which in matrix form can be written as:
\begin{bmatrix} z_1 \\ z_2 \end{bmatrix} =
\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}    (4)
or
Z = A^T X    (5)
where,
Z = \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}    (6)
A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}    (7)
and
X = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}    (8)
Now, if instead of p = 2 we have a general p, we get:
Z = \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_p \end{bmatrix}
= \begin{bmatrix}
a_{11} & a_{12} & \dots & a_{1p} \\
a_{21} & a_{22} & \dots & a_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
a_{p1} & a_{p2} & \dots & a_{pp}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{bmatrix}    (9)
In terms of dimensions:
Z_{p×1} = [A_{p×p}]^T × X_{p×1}    (10)
We say that this is an orthogonal transformation. Let us now see how this orthogonality is maintained, starting with the two-dimensional case.
A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
= \begin{bmatrix} a_1 & a_2 \end{bmatrix}    (11)
where
a_1 = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}    (12)
and
a_2 = \begin{bmatrix} -\sin\theta \\ \cos\theta \end{bmatrix}    (13)
Now,
a_1^T a_1 = \begin{bmatrix} \cos\theta & \sin\theta \end{bmatrix}
\begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix} = 1    (14)
Similarly,
a_2^T a_2 = 1    (15)
Now,
A^T A = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}
\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}    (16)
=⇒ A^T A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = A A^T    (17)
=⇒ A^T = A^{-1}, i.e. A^{-1} A = I    (18)
This means A is an orthogonal matrix: the off-diagonal elements of A^T A are 0 and its diagonal elements are 1. It shows that the transformation we are applying is an orthogonal transformation; A is an orthogonal transformation matrix.
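A quick numerical confirmation of this orthogonality (a sketch assuming NumPy; θ = 30° is an arbitrary choice of mine):

```python
import numpy as np

theta = np.radians(30)             # arbitrary rotation angle
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(A.T @ A, np.eye(2)))       # True: A^T A = I
print(np.allclose(A.T, np.linalg.inv(A)))    # True: A^T = A^{-1}
```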
In case of p variables:
z_1 = a_1^T x = a_{11} X_1 + a_{12} X_2 + \dots + a_{1p} X_p    (19)
z_2 = a_2^T x = a_{21} X_1 + a_{22} X_2 + \dots + a_{2p} X_p    (20)
\vdots    (21)
z_j = a_j^T x = a_{j1} X_1 + a_{j2} X_2 + \dots + a_{jp} X_p    (22)
\vdots    (23)
z_p = a_p^T x = a_{p1} X_1 + a_{p2} X_2 + \dots + a_{pp} X_p    (24)
where,
a_j^T a_j = 1, j = 1, 2, \dots, p    (25)
and
var(z_1) ≥ var(z_2) ≥ · · · ≥ var(z_p)    (26)
We want the first principal component to explain the maximum variance possible. The second principal component explains the next largest amount of variance (after the variance already explained by the first principal component has been accounted for), and so on and so forth. Now, the next step is how to extract the principal components given the data matrix X.
The j-th principal component is Z_j = a_j^T x. So,
Var(Z_j) = Var(a_j^T x) = a_j^T Var(x) a_j    (27)
as a_j is a constant vector.
Var(X) = Cov(X) = \Sigma = \begin{bmatrix}
\sigma_{11} & \sigma_{12} & \dots & \sigma_{1p} \\
\sigma_{21} & \sigma_{22} & \dots & \sigma_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{p1} & \sigma_{p2} & \dots & \sigma_{pp}
\end{bmatrix}    (28)
as X is multivariate data.
=⇒ Var(Z_j) = a_j^T Σ a_j    (29)
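As a hedged numerical check of this identity (a sketch assuming NumPy, using the sample covariance S in place of Σ since only simulated data is available; the data and the vector a_j are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated multivariate data: n observations on p = 3 variables
X = rng.multivariate_normal([0.0, 0.0, 0.0],
                            [[2.0, 0.8, 0.3],
                             [0.8, 1.5, 0.5],
                             [0.3, 0.5, 1.0]], size=5000)
S = np.cov(X, rowvar=False)        # sample covariance matrix

a = np.array([0.5, -1.0, 2.0])     # an arbitrary constant vector a_j
z = X @ a                          # z = a_j^T x for every observation

print(np.var(z, ddof=1))           # sample variance of a_j^T x ...
print(a @ S @ a)                   # ... equals a_j^T S a_j exactly
```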
Now,
E[Z_j] = E[a_j^T x] = a_j^T E[x] = a_j^T µ    (30)
so that
a_j^T x ∼ (a_j^T µ, a_j^T Σ a_j)    (31)
If x is normally distributed, then
a_j^T x ∼ N(a_j^T µ, a_j^T Σ a_j)    (32)
Σ and µ are population parameters. If they are known, then it is population principal component analysis. But they are usually unknown. In such a case X̄, the sample mean, is used as the best estimate of µ, and S, the sample covariance matrix, is used as the estimate of Σ. In that case the PCA is known as sample principal component analysis. Since population parameters are rarely known, we will be going ahead with sample PCA. In sample PCA:
E[z_j] = E[a_j^T X] = a_j^T E[X] = a_j^T X̄    (33)
Var(z_j) = Var(a_j^T X) = a_j^T Cov(X) a_j = a_j^T S a_j    (34)
If it is normally distributed, then
a_j^T X ∼ N(a_j^T X̄, a_j^T S a_j)    (35)
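A minimal sketch of these sample estimates (assuming NumPy; the simulated µ and Σ are illustrative values of mine) shows X̄ approaching µ and S approaching Σ:

```python
import numpy as np

rng = np.random.default_rng(4)

mu_true = np.array([1.0, -2.0, 0.5])
Sigma_true = np.array([[2.0, 0.8, 0.3],
                       [0.8, 1.5, 0.5],
                       [0.3, 0.5, 1.0]])
X = rng.multivariate_normal(mu_true, Sigma_true, size=2000)

X_bar = X.mean(axis=0)             # sample mean: estimate of mu
S = np.cov(X, rowvar=False)        # sample covariance: estimate of Sigma

print(np.round(X_bar - mu_true, 2))    # differences close to zero
print(np.round(S - Sigma_true, 2))     # differences close to the zero matrix
```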
Now, we will look at the principles followed to extract the PCs. Principles of principals:
• Each PC is a linear combination of X, a p × 1 variable vector, i.e., a_j^T X.
• The first PC is a_1^T X, subject to a_1^T a_1 = 1, that maximizes Var(a_1^T X).
• The second PC is a_2^T X that maximizes Var(a_2^T X), subject to a_2^T a_2 = 1 and Cov(a_1^T X, a_2^T X) = 0.
• The j-th PC is a_j^T X that maximizes Var(a_j^T X), subject to a_j^T a_j = 1 and Cov(a_j^T X, a_k^T X) = 0 for k < j.
Var(Z_1) is the maximum variance in the data, with a_1^T a_1 = 1. Var(Z_2) is the maximum variance in the data after Var(Z_1) has been removed, with a_2^T a_2 = 1 and Cov(a_1^T X, a_2^T X) = 0; since the components are orthogonal, this covariance is 0. Var(Z_3) is the maximum variance in the data after Var(Z_1) and Var(Z_2) have been removed, with a_3^T a_3 = 1 and Cov(a_1^T X, a_3^T X) = 0, Cov(a_2^T X, a_3^T X) = 0. And so on and so forth.
Now, the optimization problem is: maximize Var(Z_j) = maximize a_j^T S a_j subject to a_j^T a_j = 1, which can be written as a_j^T a_j − 1 = 0.
Using the Lagrangian:
Max L = a_j^T S a_j − λ(a_j^T a_j − 1)    (36)
where λ is the Lagrange multiplier. We need to find the value of a_j such that L is maximized. Using the first-order condition,
∂L/∂a_j = 0    (37)
=⇒ 2 S a_j − 2 λ a_j = 0    (38)
=⇒ (S − λI) a_j = 0    (39)
S is a p × p matrix, as there are p variables; λ is a scalar, hence the p × p identity matrix I. For a non-trivial solution a_j, we need |S − λI| = 0, which is the characteristic equation. If S is a p × p matrix, the characteristic equation has p roots, λ1 ≥ λ2 ≥ · · · ≥ λp. The λs are the eigenvalues, and the corresponding eigenvectors give the corresponding principal directions a_j. Each eigenvalue is the variance along the corresponding dimension.
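Putting the pieces together, the following sketch (assuming NumPy; the simulated data and the variable names are mine, not from the notes) performs sample PCA by solving the characteristic equation of S and verifies that each eigenvalue equals the variance along the corresponding principal component and that the components are uncorrelated:

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated data matrix X: n observations on p = 4 correlated variables
n, p = 1000, 4
B = rng.normal(size=(p, p))
X = rng.multivariate_normal(np.zeros(p), B @ B.T + np.eye(p), size=n)

# Sample PCA: eigen-decompose the sample covariance matrix S
Xc = X - X.mean(axis=0)                    # centre the data at X-bar
S = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)       # roots of |S - lambda*I| = 0
order = np.argsort(eigvals)[::-1]          # lambda_1 >= lambda_2 >= ... >= lambda_p
lam = eigvals[order]
A = eigvecs[:, order]                      # columns are a_1, ..., a_p (unit norm)

Z = Xc @ A                                 # principal component scores

print(np.round(lam, 3))                              # eigenvalues
print(np.round(np.var(Z, axis=0, ddof=1), 3))        # variances of the PCs: the same numbers
print(np.round(np.cov(Z, rowvar=False), 3))          # approximately diagonal: PCs are uncorrelated
```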