Data Transformation
Summer Data Jam
Chris Orwa
14th July 2015
Principal Component Analysis
Principal component analysis (PCA) is a technique used
to emphasize variation and bring out strong patterns in a
dataset. It's often used to make data easy to explore and
visualize.
Statistically, the principal components are the eigenvectors
of the data's covariance matrix, ordered by decreasing
eigenvalue.
Let us Look at Some Concepts
Covariance
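The covariance of two variables x and y in a data sample
measures how the two variables vary together: it is positive
when they tend to increase together and negative when one
tends to decrease as the other increases.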
R code
duration = faithful$eruptions
waiting = faithful$waiting
cov(duration, waiting)
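To make the definition concrete, the sketch below recomputes the sample covariance by hand and compares it with cov(). It assumes the usual sample formula with an n - 1 divisor, which is what cov() uses; the names n and manual_cov are illustrative and not from the slides.

```r
# Sample covariance from its definition, using the same faithful columns.
duration <- faithful$eruptions
waiting <- faithful$waiting

n <- length(duration)
manual_cov <- sum((duration - mean(duration)) * (waiting - mean(waiting))) / (n - 1)

# Should agree with the built-in cov() up to floating-point error.
all.equal(manual_cov, cov(duration, waiting))
```

A positive result here says that longer eruptions tend to go with longer waiting times.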
Covariance Matrix
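A covariance matrix collects the covariance of every pair of variables: entry (i, j) is the covariance of variables i and j, so the diagonal holds the variances and the matrix is symmetric. A minimal sketch, assuming the built-in faithful data frame from the covariance example (the name S is illustrative):

```r
# Covariance matrix of all columns of the built-in faithful data frame.
S <- cov(faithful)
S

# Diagonal entries are the variances of the individual columns.
all.equal(unname(diag(S)), c(var(faithful$eruptions), var(faithful$waiting)))

# The matrix is symmetric: cov(x, y) equals cov(y, x).
isSymmetric(S)
```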
Eigenvectors
An eigenvector of a square matrix is a nonzero vector whose
direction is unchanged by the associated linear
transformation; the transformation only scales it by a
factor, the eigenvalue.
R code
B <- matrix(1:9, 3)
eigen(B)
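The defining property can be checked directly: multiplying B by an eigenvector v should give the same result as scaling v by its eigenvalue. The sketch below assumes the same matrix B as above; the names e, lambda and v are illustrative.

```r
B <- matrix(1:9, 3)
e <- eigen(B)

# Take the first eigenvalue/eigenvector pair returned by eigen().
lambda <- e$values[1]
v <- e$vectors[, 1]

# B %*% v should equal lambda * v up to floating-point error.
all.equal(as.vector(B %*% v), lambda * v)
```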
Principal Component Analysis
R code
# load data (prcomp() needs numeric columns)
a <- read.csv('my_data.csv')
# perform PCA; prcomp() centers each column by default
pca <- prcomp(a)
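To tie this back to the earlier concepts, the sketch below checks that prcomp() agrees with an eigendecomposition of the covariance matrix. Since my_data.csv is not available here, it assumes the built-in faithful data set as a stand-in; the names X, pca and ev are illustrative.

```r
# Illustration on the built-in faithful data set (a stand-in for my_data.csv).
X <- faithful

pca <- prcomp(X)       # centers the columns by default, no scaling
ev <- eigen(cov(X))    # eigendecomposition of the covariance matrix

# The variance of each principal component equals the matching eigenvalue.
all.equal(pca$sdev^2, ev$values)

# The loadings match the eigenvectors up to the sign of each column.
all.equal(abs(unname(pca$rotation)), abs(ev$vectors))
```

The sign of each eigenvector is arbitrary, which is why the comparison uses absolute values. When the variables are on very different scales, prcomp(X, scale. = TRUE) is a common alternative; it corresponds to using the correlation matrix instead.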
