Data Transformation
Summer Data Jam
Chris Orwa
14th July 2015
Principal Component Analysis
Principal component analysis (PCA) is a technique used
to emphasize variation and bring out strong patterns in a
dataset. It's often used to make data easy to explore and
visualize.
Statistically, the principal components are the eigenvectors
of the data's covariance matrix, ordered by decreasing
eigenvalue.
Let Us Look at Some Concepts
Covariance
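The covariance of two variables x and y in a data sample
measures how the two variables vary together. It is positive
when they tend to increase together and negative when one
tends to decrease as the other increases.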
R code
duration = faithful$eruptions
waiting = faithful$waiting
cov(duration, waiting)
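As a rough check of what cov() computes, the sample covariance can be worked out by hand as the sum of the products of deviations from the means, divided by n - 1. This is only a sketch; the names n and manual_cov are illustrative and not part of the original slides.

```r
# Built-in Old Faithful data set, as in the example above
duration <- faithful$eruptions
waiting <- faithful$waiting

# Sample covariance: sum of products of deviations from the means, over n - 1
n <- length(duration)
manual_cov <- sum((duration - mean(duration)) * (waiting - mean(waiting))) / (n - 1)

# Should agree with the built-in function up to floating-point error
all.equal(manual_cov, cov(duration, waiting))
```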
Covariance Matrix
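A covariance matrix collects the covariance of every pair of variables in a data set, with each variable's variance on the diagonal. A minimal sketch, assuming the built-in faithful data set; the name S is illustrative.

```r
# Covariance matrix of the two columns of the built-in faithful data set
S <- cov(faithful)
print(S)

# Diagonal entries are the variances of each column
all.equal(unname(diag(S)), c(var(faithful$eruptions), var(faithful$waiting)))

# The matrix is symmetric: cov(x, y) equals cov(y, x)
isSymmetric(S)
```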
Eigenvectors
An eigenvector of a square matrix is a nonzero vector whose
direction is unchanged by the associated linear
transformation. The transformation only scales it, and the
scaling factor is the corresponding eigenvalue.
R code
B <- matrix(1:9, 3)
eigen(B)
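To see the definition at work, one can check that multiplying the matrix by an eigenvector gives back the same vector scaled by its eigenvalue. A sketch using the same B; the names e, lambda, and v are illustrative.

```r
B <- matrix(1:9, 3)
e <- eigen(B)

# First eigenvalue and its eigenvector (eigenvectors are the columns of e$vectors)
lambda <- e$values[1]
v <- e$vectors[, 1]

# B %*% v should equal lambda * v up to floating-point error
all.equal(as.vector(B %*% v), lambda * v)
```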
Principal Component Analysis
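R code
# load data (prcomp expects all columns to be numeric)
a = read.csv('my_data.csv')
# perform PCA; the result is named pca to avoid masking the c() function
pca = prcomp(a)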
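To tie the concepts together, the sketch below runs prcomp() on the built-in faithful data set (a stand-in assumed here, since my_data.csv is not available) and compares the result with the eigendecomposition of the covariance matrix. The names pca and e are illustrative.

```r
# Stand-in data: the built-in faithful data set instead of my_data.csv
pca <- prcomp(faithful)

# Eigendecomposition of the covariance matrix
e <- eigen(cov(faithful))

# Variances of the principal components equal the eigenvalues
all.equal(pca$sdev^2, e$values)

# Loadings match the eigenvectors up to sign
all.equal(abs(unname(pca$rotation)), abs(e$vectors))

# Proportion of variance explained by each component
summary(pca)
```

By default prcomp() centers the data but does not scale it, which is why its output lines up with the covariance matrix here. Passing scale. = TRUE would correspond to the correlation matrix instead.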
