3 principal components analysis

•Download as PPTX, PDF•

7 likes•35,400 views

This document discusses using principal component analysis (PCA) to analyze metabolomic sample data from pumpkin experiments. PCA was performed on the raw data and scaled data to identify major sources of variance. For the raw data, the first two principal components captured most of the variance and separated samples by extraction method and treatment. Several samples were identified as potential outliers. When PCA was done on autoscaled data, the loadings showed differences due to both extraction and treatment. The scaled analysis also identified some outlier samples.

Biology

Chemistry

Principal Components Analysis

Informatics

Principal Component Analysis
(PCA) of metabolomic sample
processing methods

Goal:
Use PCA to identify the major modes of variance
(Used DATA: Pumpkin data 1.csv)
Topics:
1. Principal component number selection
2. Data pretreatment
3. PCA results visualization

Principal Components Analysis

Biology

Chemistry

Principal Components Analysis

Informatics

Steps
1. Calculate a PCA model
2. Select optimal model principal component (PC)
3. Overview PCA scores and loadings plots
4. Repeat steps 1-2 using data centering and scaling
Visualize:
1. Sample scores annotated by extraction and treatment
2. Leverage and DmodX (distance from model plane)
3. Variable loadings and biplots
Exercise:
1. How many PCs are needed to capture 80% variance for raw data and
scaled data?
2. Are their any moderate or extreme outliers?
3. What variables contribute most to the variance for raw and scaled data?

Biology

Chemistry
Informatics

PCA Variance Explained
(raw data)
•

Principal Components Analysis

•

PCs can be selected to
explain a minimum
%variance in the data
(~80%)
PCs explaining below
1% variance can be
excluded

•

q2 is the crossvalidated PCA
prediction of left out
data

PCA Scores (raw data)
Biology

Chemistry

Principal Components Analysis

Informatics

• Hotelling's T2 ellipse
shows 95% CI for
bivariate normal
distribution
• Samples lying outside
of the ellipse could be
outliers

PCA Loadings (raw data, centered)
Biology

Chemistry

Principal Components Analysis

Informatics

• Unscaled data PCA
loadings are highly
correlated with
magnitude

PCA Biplot (raw data)
Biology

Chemistry

Sample contains high maleic acid

Principal Components Analysis

Informatics

• Biplots can be used to
rapidly overview the
correlation between
sample scores and
variable loadings
Sample contains high sucrose (low maleic acid)

Biology

Chemistry

PCA Leverage and DmodX
(raw data)

Principal Components Analysis

Informatics

Leverage is the
distance to samples
center in the PCA plane
(extreme outliers)
Distance to model X
(DmodX) is the
orthogonal distance to
the PCA plane
(moderate outliers)

Biology

Chemistry

Principal Components Analysis

Informatics

PCA Variance Explained
(autoscaled)

PCA Scores (autoscaled)
Biology

Chemistry

Principal Components Analysis

Informatics

• Loadings on PC1
describe differences
due to extraction
• Loadings on PC2
describe differences
due to treatment

Biology

Chemistry

Principal Components Analysis

Informatics

PCA Leverage and DmodX
(autoscaled)
• Samples with both high
leverage and DomdX are likely
outliers
• Evaluate PCA results after their
removal

PCA Loadings (autoscaled)
Biology

Chemistry

Principal Components Analysis

Informatics

• Scaled loadings are
independent of
variable magnitude
and show a rich
variance structure of
the data

Biology

Chemistry

Relationship between scores and
loadings (autoscaled)

Informatics

Principal Components Analysis

Higher in
100%
MeOH

Lower in
100%
MeOH

Extraction

Loadings and Scores
Biology

Chemistry
Informatics

Principal Components Analysis

Highest negative
loading on PC1

Highest positive
loading on PC1

What's hot

Metabolomics and Beyond Challenges and Strategies for Next-gen Omic Analyses Dmitry Grapov

Data analysis workflows part 2 2015Dmitry Grapov

Advanced strategies for Metabolomics Data AnalysisDmitry Grapov

Normalization of Large-Scale Metabolomic Studies 2014Dmitry Grapov

4 partial least squares modelingDmitry Grapov

Data Normalization Approaches for Large-scale Biological StudiesDmitry Grapov

Strategies for Metabolomics Data AnalysisDmitry Grapov

Mapping to the Metabolomic ManifoldDmitry Grapov

Automation of (Biological) Data Analysis and Report GenerationDmitry Grapov

Multivariate data analysis and visualization tools for biological dataDmitry Grapov

Metabolomic data analysis and visualization toolsDmitry Grapov

2 cluster analysisDmitry Grapov

Case Study: Overview of Metabolomic Data Normalization StrategiesDmitry Grapov

3 data normalization (2014 lab tutorial)Dmitry Grapov

6 metabolite enrichment analysisDmitry Grapov

Omic Data Integration StrategiesDmitry Grapov

5 data analysis case studyDmitry Grapov

Complex Systems Biology Informed Data Analysis and Machine LearningDmitry Grapov

Data analysis workflows part 1 2015Dmitry Grapov

The International Journal of Engineering and Science (The IJES)theijes

What's hot (20)

Metabolomics and Beyond Challenges and Strategies for Next-gen Omic Analyses

Data analysis workflows part 2 2015

Advanced strategies for Metabolomics Data Analysis

Normalization of Large-Scale Metabolomic Studies 2014

4 partial least squares modeling

Data Normalization Approaches for Large-scale Biological Studies

Strategies for Metabolomics Data Analysis

Mapping to the Metabolomic Manifold

Automation of (Biological) Data Analysis and Report Generation

Multivariate data analysis and visualization tools for biological data

Metabolomic data analysis and visualization tools

2 cluster analysis

Case Study: Overview of Metabolomic Data Normalization Strategies

3 data normalization (2014 lab tutorial)

6 metabolite enrichment analysis

Omic Data Integration Strategies

5 data analysis case study

Complex Systems Biology Informed Data Analysis and Machine Learning

Data analysis workflows part 1 2015

The International Journal of Engineering and Science (The IJES)

Similar to 3 principal components analysis

Lecture 9 molecular descriptorsRAJAN ROLTA

ADMET.pptxSantu Chall

Prediction of pKa from chemical structure using free and open source toolsUS Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure

Multivariate Analysis and Visualization of Proteomic DataUC Davis

Webinar: How to Develop a Regulatory-compliant Continued Process Verificatio...MilliporeSigma

Webinar: How to Develop a Regulatory-compliant Continued Process Verification...Merck Life Sciences

Using open bioactivity data for developing machine-learning prediction models...Sunghwan Kim

Data handling metabolomicsShruthi Shree Gandhi

Final presentation dwi riyonoDwi Riyono

ARCHIVED: new version available - METASPACE Step by Step guideMETASPACE

Points to Consider in QC Method Validation and Transfer for Biological ProductsWeijun Li

The importance of data curation on QSAR Modeling: PHYSPROP open data as a cas...Kamel Mansouri

PCR Array Data Analysis Tutorial: qPCR Technology Webinar Series Part 3QIAGEN

Too good to be true? How validate your dataAlex Henderson

Development and Validation of a RP-HPLC methodUshaKhanal3

Chromatography: Part 4 of 4 Pesticide Residue Analysis Webinar Series - Late...Chromatography & Mass Spectrometry Solutions

презентация за варшаваValeriya Simeonova

Aqbd seminar DOEDimitris Papamatthaiakis

Pa nalyticals high_score_suite_brochureNhut Duong

Distributed approach for Peptide Identificationabhinav vedanbhatla

Similar to 3 principal components analysis (20)

Lecture 9 molecular descriptors

ADMET.pptx

Prediction of pKa from chemical structure using free and open source tools

Multivariate Analysis and Visualization of Proteomic Data

Webinar: How to Develop a Regulatory-compliant Continued Process Verificatio...

Webinar: How to Develop a Regulatory-compliant Continued Process Verification...

Using open bioactivity data for developing machine-learning prediction models...

Data handling metabolomics

Final presentation dwi riyono

ARCHIVED: new version available - METASPACE Step by Step guide

Points to Consider in QC Method Validation and Transfer for Biological Products

The importance of data curation on QSAR Modeling: PHYSPROP open data as a cas...

PCR Array Data Analysis Tutorial: qPCR Technology Webinar Series Part 3

Too good to be true? How validate your data

Development and Validation of a RP-HPLC method

Chromatography: Part 4 of 4 Pesticide Residue Analysis Webinar Series - Late...

презентация за варшава

Aqbd seminar DOE

Pa nalyticals high_score_suite_brochure

Distributed approach for Peptide Identification

More from Dmitry Grapov

R programming for Data Science - A Beginner’s GuideDmitry Grapov

Network mapping 101 courseDmitry Grapov

Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integratio...Dmitry Grapov

Dmitry Grapov Resume and CVDmitry Grapov

Machine Learning Powered Metabolomic Network AnalysisDmitry Grapov

Modeling posterDmitry Grapov

Gene Ontology Enrichment Network Analysis -TutorialDmitry Grapov

American Society of Mass Spectrommetry Conference 2014Dmitry Grapov

More from Dmitry Grapov (8)

R programming for Data Science - A Beginner’s Guide

Network mapping 101 course

Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integratio...

Dmitry Grapov Resume and CV

Machine Learning Powered Metabolomic Network Analysis

Modeling poster

Gene Ontology Enrichment Network Analysis -Tutorial

American Society of Mass Spectrommetry Conference 2014

3 principal components analysis

1. Biology Chemistry Principal Components Analysis Informatics Principal Component Analysis (PCA) of metabolomic sample processing methods Goal: Use PCA to identify the major modes of variance (Used DATA: Pumpkin data 1.csv) Topics: 1. Principal component number selection 2. Data pretreatment 3. PCA results visualization

2. Principal Components Analysis Biology Chemistry Principal Components Analysis Informatics Steps 1. Calculate a PCA model 2. Select optimal model principal component (PC) 3. Overview PCA scores and loadings plots 4. Repeat steps 1-2 using data centering and scaling Visualize: 1. Sample scores annotated by extraction and treatment 2. Leverage and DmodX (distance from model plane) 3. Variable loadings and biplots Exercise: 1. How many PCs are needed to capture 80% variance for raw data and scaled data? 2. Are their any moderate or extreme outliers? 3. What variables contribute most to the variance for raw and scaled data?

3. Biology Chemistry Informatics PCA Variance Explained (raw data) • Principal Components Analysis • PCs can be selected to explain a minimum %variance in the data (~80%) PCs explaining below 1% variance can be excluded • q2 is the crossvalidated PCA prediction of left out data

4. PCA Scores (raw data) Biology Chemistry Principal Components Analysis Informatics • Hotelling's T2 ellipse shows 95% CI for bivariate normal distribution • Samples lying outside of the ellipse could be outliers

5. PCA Loadings (raw data, centered) Biology Chemistry Principal Components Analysis Informatics • Unscaled data PCA loadings are highly correlated with magnitude

6. PCA Biplot (raw data) Biology Chemistry Sample contains high maleic acid Principal Components Analysis Informatics • Biplots can be used to rapidly overview the correlation between sample scores and variable loadings Sample contains high sucrose (low maleic acid)

7. Biology Chemistry PCA Leverage and DmodX (raw data) Principal Components Analysis Informatics Leverage is the distance to samples center in the PCA plane (extreme outliers) Distance to model X (DmodX) is the orthogonal distance to the PCA plane (moderate outliers)

8. Biology Chemistry Principal Components Analysis Informatics PCA Variance Explained (autoscaled)

9. PCA Scores (autoscaled) Biology Chemistry Principal Components Analysis Informatics • Loadings on PC1 describe differences due to extraction • Loadings on PC2 describe differences due to treatment

10. Biology Chemistry Principal Components Analysis Informatics PCA Leverage and DmodX (autoscaled) • Samples with both high leverage and DomdX are likely outliers • Evaluate PCA results after their removal

11. PCA Loadings (autoscaled) Biology Chemistry Principal Components Analysis Informatics • Scaled loadings are independent of variable magnitude and show a rich variance structure of the data

12. Biology Chemistry Relationship between scores and loadings (autoscaled) Informatics Principal Components Analysis Higher in 100% MeOH Lower in 100% MeOH Extraction

13. Loadings and Scores Biology Chemistry Informatics Principal Components Analysis Highest negative loading on PC1 Highest positive loading on PC1

3 principal components analysis

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to 3 principal components analysis

Similar to 3 principal components analysis (20)

More from Dmitry Grapov

More from Dmitry Grapov (8)

3 principal components analysis