Introduction to PCA
Christian Zuniga, PhD
Friday, November 8, 2019
  
	
  
	
Principal component analysis (PCA) is an unsupervised, linear technique for dimensionality reduction first developed by Pearson in 1901 [1,2,3]. It is widely used in many areas of data mining such as visualization, image processing, and anomaly detection. It is based on the fact that data may have redundancies in its representation. Data refers to a collection of similar objects and their features. An object could be a house, and the features the location, the number of bedrooms, the square footage, and any other characteristic of the house that can be recorded. In PCA, redundancy in the data refers to linear correlation among features: knowledge of one feature reveals some knowledge of another feature. PCA uses this redundancy to form a smaller set of features, called principal components, that can approximate the data well.
  	
  
	
  
Figure 1 shows the general idea. The data is represented as a matrix X with N objects (like houses) and F features (like square footage). PCA linearly transforms the features into a new set and retains the G most relevant features, where G < F. The new features are called the principal components. The new data matrix Y is Y = PX, where P is a G by F projection matrix. The first principal component captures most of the variance of the data. Each additional principal component is made to capture the remaining variance and is uncorrelated, or orthogonal, to the other principal components.
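As a quick check of the shapes involved, a G by F projection matrix applied to an F by N data matrix gives a G by N matrix of new features. The sketch below uses random stand-in numbers (not the car data) only to illustrate the dimensions:

```python
import numpy as np

N, F, G = 100, 5, 2          # objects, original features, retained components
X = np.random.randn(F, N)    # stand-in data matrix with features as rows
P = np.random.randn(G, F)    # stand-in G by F projection matrix (rows would be eigenvectors)

Y = P @ X                    # new representation Y = PX
print(Y.shape)               # (2, 100): G new features for the same N objects
```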
  
	
  
	
  
Figure 1 PCA transforms a data matrix into a new one with fewer features.
  
	
  
	
  
	
  
The cars dataset from UC Irvine will be used as an example [4]. This set contains 9 features for 392 cars of various makes and models. Figure 2 shows two sample features, 'acceleration' plotted vs. 'horsepower'. Acceleration is given as the time taken for a car to accelerate from 0 to 60 mph. The figure shows the two features have opposite trends, i.e. they are negatively correlated. This is not surprising, since higher horsepower should result in shorter times.
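A hedged sketch of how such a plot might be produced; the file name auto-mpg.csv and the column names are assumptions, not part of the original write-up:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Assumed local copy of the UCI auto data with named columns.
cars = pd.read_csv("auto-mpg.csv").dropna(subset=["horsepower", "acceleration"])

plt.scatter(cars["horsepower"], cars["acceleration"], s=10)
plt.xlabel("horsepower")
plt.ylabel("acceleration (time to 60 mph)")
plt.title("Acceleration vs. horsepower")
plt.show()
```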
  	
  
	
  
 
Figure 2 Two features of the car data set show that the data is concentrated along a line P1.
  
	
  
Figure 2 shows that most of the variation of the features is concentrated along a line labeled 'P1'. The remainder of the variation is along a second line labeled 'P2'. The lines can be characterized by unit vectors vj = [a1j, a2j] (j = 1, 2) that give the lines' orientations. The lines' displacements from the origin do not matter, since the data will later be centered at zero. Each point represents a car and can also be represented by a vector xi = [horsepoweri, accelerationi], where the subscript corresponds to the ith car. Each point can be projected onto a line Pj by taking the inner product of vj and xi, as shown in Figure 3.
  	
  	
  
	
  
$$p_{i1} = a_{11}\,\mathrm{horsepower}_i + a_{21}\,\mathrm{acceleration}_i$$
  
	
  
	
  
	
  
Figure 3 Projecting a point onto line P1.
  
	
  
This new feature p1 is the first principal component and is a linear combination of the original two features, horsepower and acceleration. In general it will not have a more descriptive name, but one could be given to clarify the concept. One option is to think of the combination of 'horsepower' and 'acceleration' as the 'performance' of the car.
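A minimal sketch of this projection, with a handful of made-up feature values and an assumed unit vector v1 (the actual coefficients are derived below):

```python
import numpy as np

# made-up illustrative values, one entry per car (not rows from the real data set)
horsepower = np.array([130.0, 165.0, 150.0, 95.0, 88.0])
acceleration = np.array([12.0, 11.5, 11.0, 16.0, 17.5])

v1 = np.array([0.707, -0.707])              # assumed coefficients [a11, a21]
X = np.vstack([horsepower, acceleration])   # features as rows, one column per car

p1 = v1 @ X                                 # inner product of v1 with each x_i
print(p1)                                   # the new 'performance' feature
```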
  	
  
	
  
The question is then how to find the coefficients a11 and a21 of vector v1, which gives the direction of the best-fit line P1. This line should be as close to all points as possible, minimizing the average squared distance J to all the points.
  	
  
	
  
$$J = \frac{1}{N}\sum_{i=1}^{N} d_i^2$$
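The objective can be evaluated directly for any candidate direction: di is the perpendicular distance from point xi to the line. A small sketch, assuming a centered 2 x N data matrix with points as columns:

```python
import numpy as np

def avg_sq_distance(v, X):
    """Mean squared perpendicular distance of the columns of X (points)
    to the line through the origin with direction v."""
    v = v / np.linalg.norm(v)
    along = np.outer(v, v @ X)          # component of each point along the line
    resid = X - along                   # perpendicular residual
    return np.mean(np.sum(resid**2, axis=0))

X = np.random.randn(2, 392)             # stand-in centered data, 2 features x N cars
print(avg_sq_distance(np.array([0.707, -0.707]), X))
print(avg_sq_distance(np.array([1.0, 0.0]), X))
```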
	
  
	
  
The solution lies in the covariance matrix of the features, SX. Specifically, the eigenvectors of SX give the required vectors v1 and v2. To calculate the covariance matrix, the mean of each feature is subtracted from each row. To put the features on a similar scale, they should also be divided by their standard deviation. This is done to prevent the analysis from capturing uninteresting directions in the data. After this preprocessing of the data matrix X, the covariance matrix is an F by F matrix:
  
	
  
$$\mathbf{S}_X = \frac{1}{N}\mathbf{X}\mathbf{X}^T$$
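A minimal NumPy sketch of this preprocessing and covariance computation, keeping the features-as-rows convention used above (the raw arrays are random stand-ins for the real columns):

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-in raw feature arrays, one entry per car
acceleration = rng.normal(15.5, 2.8, 392)
horsepower = 200.0 - 8.0 * acceleration + rng.normal(0.0, 15.0, 392)

X = np.vstack([acceleration, horsepower])   # F x N data matrix, features as rows
X = X - X.mean(axis=1, keepdims=True)       # subtract the mean of each feature
X = X / X.std(axis=1, keepdims=True)        # divide by each standard deviation

S_X = (X @ X.T) / X.shape[1]                # F x F covariance matrix
print(np.round(S_X, 2))
```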
	
  
	
  
The vector v1 corresponds to the eigenvector with the largest eigenvalue λM. Vector v2 corresponds to the second, smaller eigenvalue λm (λm < λM).
  
	
  
$$\mathbf{S}_X \mathbf{v}_1 = \lambda_M \mathbf{v}_1 \qquad\qquad \mathbf{S}_X \mathbf{v}_2 = \lambda_m \mathbf{v}_2$$
  
	
  
For the car data set, using only the features 'acceleration' and 'horsepower', the covariance matrix is:
  
	
  
$$\mathbf{S}_X = \begin{bmatrix} 1 & -0.69 \\ -0.69 & 1 \end{bmatrix}$$
	
  
	
  
The off-diagonal term −0.69 shows the cross-covariance between horsepower and acceleration, which is negative as implied by Figure 2. The diagonal terms show the auto-covariance of each feature and have value 1 because of the pre-scaling.
  	
  
	
  
Using any linear algebra solver readily gives the eigenvectors of the covariance matrix. The eigenvalues are (1.69, 0.31). Their sum is the total variance, which is 2. Vector v1 is [0.707, −0.707] and captures 84.5% of the total variance (1.69/2). Figure 4 shows the resulting directions of v1 and v2.
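Continuing the sketch above, the eigendecomposition can be done with numpy.linalg.eigh (suitable because S_X is symmetric); on the real car data the printed values should be close to those quoted here, up to a sign flip of the eigenvectors:

```python
import numpy as np

eigvals, eigvecs = np.linalg.eigh(S_X)                 # ascending eigenvalue order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]     # largest eigenvalue first

v1, v2 = eigvecs[:, 0], eigvecs[:, 1]
print(eigvals)                          # roughly (1.69, 0.31) for the car data
print(eigvals[0] / eigvals.sum())       # fraction of total variance along v1
print(v1)                               # roughly [0.707, -0.707]
```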
  
 
Figure 4 Rescaled data with the directions of the 2 principal components.
  
The projection matrix P can be made with v1 and v2 as rows. If both vectors are kept, there is no loss in representation. The new representation would have the covariance matrix:
  
	
  
$$\mathbf{S}_Y = \frac{1}{N}\mathbf{Y}\mathbf{Y}^T$$

$$\mathbf{S}_Y = \frac{1}{N}(\mathbf{P}\mathbf{X})(\mathbf{P}\mathbf{X})^T = \mathbf{P}\mathbf{S}_X\mathbf{P}^T = \mathbf{\Lambda}$$
  
	
  
Since P has the eigenvectors of SX as rows, the right-hand side results in a diagonal matrix of the eigenvalues of SX.
  	
  
	
  
$$\mathbf{S}_Y = \begin{bmatrix} 1.69 & 0 \\ 0 & 0.31 \end{bmatrix}$$
	
  
	
  
For dimensionality reduction, only v1 would be used. The new representation, matrix Y, would have a single feature, the first principal component (Y = v1X).
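Continuing the sketch, keeping both eigenvectors as rows of P reproduces the diagonal covariance above, while keeping only v1 gives the reduced one-feature representation:

```python
import numpy as np

P = np.vstack([v1, v2])                    # 2 x 2 projection matrix, eigenvectors as rows
S_Y = (P @ X) @ (P @ X).T / X.shape[1]     # covariance of the full new representation
print(np.round(S_Y, 2))                    # approximately diag(1.69, 0.31)

Y = v1 @ X                                 # dimensionality reduction to one feature
print(Y.shape)                             # (392,)
```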
  
	
  
PCA can be applied to any number of features. The car data set has the additional features 'cylinders', 'displacement', and 'weight', for a total of 5 features. Other features are categorical, and one, 'mpg', is usually the target variable of interest. The same process can be done to obtain the principal components. Each principal component is a linear combination of these 5 features. Standard scaling is applied to the features before forming the covariance matrix SX (now a 5 by 5 matrix). The eigenvalues and eigenvectors are found, and the eigenvectors are used to make the projection matrix P. Figure 5 shows the percentage of variance explained by each component. Again the first component captures over 80% of the total variance. Instead of 5 features, 1-2 principal components may be enough for various purposes.
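In practice the whole pipeline is a few lines with scikit-learn. A sketch assuming the five numeric columns are available in the cars DataFrame loaded earlier (the column names are assumptions); note that scikit-learn expects objects as rows rather than columns:

```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

cols = ["cylinders", "displacement", "horsepower", "weight", "acceleration"]
X5 = StandardScaler().fit_transform(cars[cols])   # standard scaling, one row per car

pca = PCA(n_components=5)
scores = pca.fit_transform(X5)                    # principal component values per car
print(pca.explained_variance_ratio_)              # bar-plot these for a Figure 5 style chart
```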
  
	
  
 
Figure 5 Percentage of variance explained by each principal component.
  
	
  
For example, Figure 6 shows a plot of principal component 2 vs. principal component 1. Together they capture about 96% of the total variance. The figure shows there may be 3 clusters, or groups, of cars. These may correspond to different types of cars such as sports cars, sedans, and trucks. Checking against the make of the car might clarify this. A clustering algorithm like k-means may be applied to quantify the clusters. This shows a common application of PCA in dimensionality reduction, where fewer features help with many machine learning algorithms.
  
	
  
	
  
Figure 6 Principal component 2 vs. 1 indicates there may be around 3 clusters.
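A sketch of that clustering step on the first two components, using the scores from the scikit-learn sketch above (three clusters is only the visual guess from Figure 6):

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

pc12 = scores[:, :2]                        # first two principal components
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(pc12)

plt.scatter(pc12[:, 0], pc12[:, 1], c=labels, s=10)
plt.xlabel("principal component 1")
plt.ylabel("principal component 2")
plt.show()
```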
  
	
  
To summarize, PCA is a linear dimensionality reduction technique that forms new features using linear combinations of the original ones. These new features, the principal components, maximize the total variance captured and are uncorrelated with each other. The eigenvectors of the covariance matrix are used to transform the data matrix. In practice, if there are many features, forming the covariance matrix may be computationally expensive and an SVD of the data matrix is used instead.
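A minimal sketch of the SVD route, reusing the centered and scaled features-as-rows matrix X from the two-feature sketch; it gives the same directions and eigenvalues without forming S_X explicitly:

```python
import numpy as np

U, s, Vt = np.linalg.svd(X, full_matrices=False)   # X is F x N, already centered and scaled

eigvals = s**2 / X.shape[1]      # eigenvalues of S_X = (1/N) X X^T
directions = U                   # columns of U are the principal directions v1, v2, ...
print(eigvals)
print(directions)
```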
  	
  
	
  
[1] Gilbert Strang, "Linear Algebra and Learning from Data," Wellesley-Cambridge Press, 2019.
[2] Deisenroth et al., "Mathematics for Machine Learning," to be published by Cambridge University Press. https://mml-book.com
[3] Shlens, "A Tutorial on Principal Component Analysis," 2014. https://arxiv.org/abs/1404.1100
[4] https://archive.ics.uci.edu/ml/datasets/car+evaluation
  
	
  
	
  
	
  
