SlideShare a Scribd company logo
1 of 10
Biology

Chemistry

Partial Least Squares (O-/PLS/-DA)

Informatics

Partial Least Squares (O-/PLS/-DA)
modeling of metabolomic sample
processing methods

Goal:
Use PLS to identify metabolites which best
discriminate (most different between) sample
processing methods
(Used DATA: Pumpkin data 1.csv)
Topics:
1.Model Selection
2.Results visualization
3.Feature Selection
4.Validation
Biology

Chemistry

Partial Least Squares (O-/PLS/-DA)

Informatics

Partial Least Squares Modeling
Discriminant Analysis (PLS-DA)

Steps
1.Calculate a single Y PLS model to discriminate between extraction/treatment
2.Select optimal scaling and model latent variable (LV) number
3.Overview PLS scores and loadings plots
4.Validate model
5.Repeat steps 1-4 for an O-PLS model
Visualize:
1.Sample scores annotated by extraction and treatment
2.Variable loadings plot
Exercise:
1.How are scores different between 1-Y PLS, 1-Y O-PLS, 2-Y O-PLS?
2.Are there any moderate or extreme outliers?
3.What variables contribute most to the differences between treatments?
Model Performance

Biology

Chemistry

Partial Least Squares (O-/PLS/-DA)

Informatics

RMSEP - root mean
squared error of
prediction (fit to left
out set)
Q2 - cross-validated fit
to the training data
Xvar- explained
variance in X
(measurements)
Model Validation

Biology

Chemistry

Partial Least Squares (O-/PLS/-DA)

Informatics

single model
performance estimates

Model properties and
validation settings

Based on training/test
splitting
Based on training/test
splitting and permuted Y
Significance of difference between model performance
and permuted NULL distributions
Biology

Chemistry
Informatics

Model Scores single Y
(extraction_treatment)

Partial Least Squares (O-/PLS/-DA)

Treatment

Extraction

Variance in extraction
dominates model
Biology

Chemistry
Informatics

Model Scores
Comparison

Partial Least Squares (O-/PLS/-DA)

1-Y PLS

treatment

extraction

1-Y O-PLS

2-Y O-PLS
Top Discriminants

Biology

Fatty acids, lipids

Chemistry

Partial Least Squares (O-/PLS/-DA)

Informatics

2-Y O-PLS
treatment

extraction
Sugar phosphates, polyols
Feature Selection

Biology

Chemistry

Steps
1.Identify top 10% of discriminants for extraction based on metabolite
• correlation with model scores
• importance (loading, coeffcient, VIP)

Visualize:
1.Feature selection diagnostics

correlation

Partial Least Squares (O-/PLS/-DA)

Informatics

Loading (LV1)
Feature Selection for Extraction

Biology

Chemistry

Partial Least Squares (O-/PLS/-DA)

Informatics

1

2

higher in 3
3

higher in 1
Summary

Biology

Chemistry

Partial Least Squares (O-/PLS/-DA)

Informatics

Modeling Strategy (advanced)
1. Fit preliminary model
2. Evaluate of scores/loadings
3. Remove outliers
4. split data into test and train set (optional)
5. model selection (LV, OLV, etc)
6. internal (training set) model validation (permutation testing,
training/testing)
7. Feature selection (optional)
•

Comparison of selected to excluded feature models

8. External model validation (test set)

More Related Content

What's hot

Applications of Genomic and Proteomic Tools
Applications of Genomic and Proteomic ToolsApplications of Genomic and Proteomic Tools
Applications of Genomic and Proteomic ToolsRaju Paudel
 
toxicokinetics and saturation kinetics
toxicokinetics and saturation kineticstoxicokinetics and saturation kinetics
toxicokinetics and saturation kineticspharmacistnitish
 
screening model for parkinsons disease
screening model for parkinsons diseasescreening model for parkinsons disease
screening model for parkinsons diseaseKundlik Rathod
 
High throughput screening
High throughput screeningHigh throughput screening
High throughput screeningJeremy Ogbadu
 
2D QSAR DESCRIPTORS
2D QSAR DESCRIPTORS2D QSAR DESCRIPTORS
2D QSAR DESCRIPTORSSmita Jain
 
SlideShare on Traditional drug design methods
 SlideShare on Traditional drug design methods  SlideShare on Traditional drug design methods
SlideShare on Traditional drug design methods Naveen K L
 
Pharmacophore mapping
Pharmacophore mapping Pharmacophore mapping
Pharmacophore mapping GamitKinjal
 
Assignment on Limitation of animal experimentation
Assignment on Limitation of animal experimentationAssignment on Limitation of animal experimentation
Assignment on Limitation of animal experimentationDeepak Kumar
 
Rational drug design method
Rational drug design methodRational drug design method
Rational drug design methodRangnathChikane
 
economics of drug discovery.pptx
economics of drug discovery.pptxeconomics of drug discovery.pptx
economics of drug discovery.pptxTamannaKumari8
 
Partial Least Square model.pdf
Partial Least Square model.pdfPartial Least Square model.pdf
Partial Least Square model.pdfbhaskarpathak15
 
Genetic variations in gpcr
Genetic variations in gpcrGenetic variations in gpcr
Genetic variations in gpcrMeenakshi Gupta
 
Molecular Modeling and virtual screening techniques
Molecular Modeling and virtual screening techniquesMolecular Modeling and virtual screening techniques
Molecular Modeling and virtual screening techniquesTheabhi.in
 
Target discovery and validation
Target discovery and validation Target discovery and validation
Target discovery and validation ANAND SAGAR TIWARI
 
Role of nuclicacid microarray &protein micro array for drug discovery process
Role of nuclicacid microarray &protein micro array for drug discovery processRole of nuclicacid microarray &protein micro array for drug discovery process
Role of nuclicacid microarray &protein micro array for drug discovery processmohamed abusalih
 
General principles of preclinical screening
General principles of preclinical screeningGeneral principles of preclinical screening
General principles of preclinical screeningpradnya Jagtap
 

What's hot (20)

Applications of Genomic and Proteomic Tools
Applications of Genomic and Proteomic ToolsApplications of Genomic and Proteomic Tools
Applications of Genomic and Proteomic Tools
 
toxicokinetics and saturation kinetics
toxicokinetics and saturation kineticstoxicokinetics and saturation kinetics
toxicokinetics and saturation kinetics
 
screening model for parkinsons disease
screening model for parkinsons diseasescreening model for parkinsons disease
screening model for parkinsons disease
 
High throughput screening
High throughput screeningHigh throughput screening
High throughput screening
 
2D QSAR DESCRIPTORS
2D QSAR DESCRIPTORS2D QSAR DESCRIPTORS
2D QSAR DESCRIPTORS
 
SlideShare on Traditional drug design methods
 SlideShare on Traditional drug design methods  SlideShare on Traditional drug design methods
SlideShare on Traditional drug design methods
 
3d qsar
3d qsar3d qsar
3d qsar
 
Pharmacophore mapping
Pharmacophore mapping Pharmacophore mapping
Pharmacophore mapping
 
Assignment on Limitation of animal experimentation
Assignment on Limitation of animal experimentationAssignment on Limitation of animal experimentation
Assignment on Limitation of animal experimentation
 
Rational drug design method
Rational drug design methodRational drug design method
Rational drug design method
 
Denovo Drug Design
Denovo Drug DesignDenovo Drug Design
Denovo Drug Design
 
economics of drug discovery.pptx
economics of drug discovery.pptxeconomics of drug discovery.pptx
economics of drug discovery.pptx
 
Partial Least Square model.pdf
Partial Least Square model.pdfPartial Least Square model.pdf
Partial Least Square model.pdf
 
Genetic variations in gpcr
Genetic variations in gpcrGenetic variations in gpcr
Genetic variations in gpcr
 
Molecular Modeling and virtual screening techniques
Molecular Modeling and virtual screening techniquesMolecular Modeling and virtual screening techniques
Molecular Modeling and virtual screening techniques
 
Target discovery and validation
Target discovery and validation Target discovery and validation
Target discovery and validation
 
De novo drug design
De novo drug designDe novo drug design
De novo drug design
 
Role of nuclicacid microarray &protein micro array for drug discovery process
Role of nuclicacid microarray &protein micro array for drug discovery processRole of nuclicacid microarray &protein micro array for drug discovery process
Role of nuclicacid microarray &protein micro array for drug discovery process
 
General principles of preclinical screening
General principles of preclinical screeningGeneral principles of preclinical screening
General principles of preclinical screening
 
Screening of antiparkinsonian agents
Screening of antiparkinsonian agentsScreening of antiparkinsonian agents
Screening of antiparkinsonian agents
 

Viewers also liked

5 data analysis case study
5  data analysis case study5  data analysis case study
5 data analysis case studyDmitry Grapov
 
3 principal components analysis
3  principal components analysis3  principal components analysis
3 principal components analysisDmitry Grapov
 
6 metabolite enrichment analysis
6  metabolite enrichment analysis6  metabolite enrichment analysis
6 metabolite enrichment analysisDmitry Grapov
 
1 statistical analysis
1  statistical analysis1  statistical analysis
1 statistical analysisDmitry Grapov
 
Data Normalization Approaches for Large-scale Biological Studies
Data Normalization Approaches for Large-scale Biological StudiesData Normalization Approaches for Large-scale Biological Studies
Data Normalization Approaches for Large-scale Biological StudiesDmitry Grapov
 
Multivarite and network tools for biological data analysis
Multivarite and network tools for biological data analysisMultivarite and network tools for biological data analysis
Multivarite and network tools for biological data analysisDmitry Grapov
 

Viewers also liked (9)

5 data analysis case study
5  data analysis case study5  data analysis case study
5 data analysis case study
 
7 network mapping i
7  network mapping i7  network mapping i
7 network mapping i
 
0 introduction
0  introduction0  introduction
0 introduction
 
3 principal components analysis
3  principal components analysis3  principal components analysis
3 principal components analysis
 
2 cluster analysis
2  cluster analysis2  cluster analysis
2 cluster analysis
 
6 metabolite enrichment analysis
6  metabolite enrichment analysis6  metabolite enrichment analysis
6 metabolite enrichment analysis
 
1 statistical analysis
1  statistical analysis1  statistical analysis
1 statistical analysis
 
Data Normalization Approaches for Large-scale Biological Studies
Data Normalization Approaches for Large-scale Biological StudiesData Normalization Approaches for Large-scale Biological Studies
Data Normalization Approaches for Large-scale Biological Studies
 
Multivarite and network tools for biological data analysis
Multivarite and network tools for biological data analysisMultivarite and network tools for biological data analysis
Multivarite and network tools for biological data analysis
 

Similar to PLS-DA Modeling Discriminates Sample Processing Methods

3 data normalization (2014 lab tutorial)
3  data normalization (2014 lab tutorial)3  data normalization (2014 lab tutorial)
3 data normalization (2014 lab tutorial)Dmitry Grapov
 
1501 Refocus metabolomics
1501 Refocus metabolomics1501 Refocus metabolomics
1501 Refocus metabolomicsIS-X
 
Lecture 9 molecular descriptors
Lecture 9  molecular descriptorsLecture 9  molecular descriptors
Lecture 9 molecular descriptorsRAJAN ROLTA
 
Semantics for integrated laboratory analytical processes - The Allotrope Pers...
Semantics for integrated laboratory analytical processes - The Allotrope Pers...Semantics for integrated laboratory analytical processes - The Allotrope Pers...
Semantics for integrated laboratory analytical processes - The Allotrope Pers...OSTHUS
 
Development and Validation of a RP-HPLC method
Development and Validation of a RP-HPLC methodDevelopment and Validation of a RP-HPLC method
Development and Validation of a RP-HPLC methodUshaKhanal3
 
Pg. 05Question FiveAssignment #Deadline Day 22.docx
Pg. 05Question FiveAssignment #Deadline Day 22.docxPg. 05Question FiveAssignment #Deadline Day 22.docx
Pg. 05Question FiveAssignment #Deadline Day 22.docxmattjtoni51554
 
IRSAE aquatic ecology 28 June 2018 metabolomics
IRSAE aquatic ecology 28 June 2018 metabolomicsIRSAE aquatic ecology 28 June 2018 metabolomics
IRSAE aquatic ecology 28 June 2018 metabolomicsPanagiotis Arapitsas
 
Worked examples of sampling uncertainty evaluation
Worked examples of sampling uncertainty evaluationWorked examples of sampling uncertainty evaluation
Worked examples of sampling uncertainty evaluationGH Yeoh
 
PurposeStudents will conduct one brief research exercise that w.docx
PurposeStudents will conduct one brief research exercise that w.docxPurposeStudents will conduct one brief research exercise that w.docx
PurposeStudents will conduct one brief research exercise that w.docxamrit47
 
Systematic review and meta analaysis course - part 2
Systematic review and meta analaysis course - part 2Systematic review and meta analaysis course - part 2
Systematic review and meta analaysis course - part 2Ahmed Negida
 
Module 4 data analysis
Module 4 data analysisModule 4 data analysis
Module 4 data analysisILRI-Jmaru
 
Final presentation dwi riyono
Final presentation dwi riyonoFinal presentation dwi riyono
Final presentation dwi riyonoDwi Riyono
 
Final Applied Lab Project-BIOL 103The Effect of low pH on Enzyme A.docx
Final Applied Lab Project-BIOL 103The Effect of low pH on Enzyme A.docxFinal Applied Lab Project-BIOL 103The Effect of low pH on Enzyme A.docx
Final Applied Lab Project-BIOL 103The Effect of low pH on Enzyme A.docxmydrynan
 
Review of "Survey Research Methods & Design in Psychology"
Review of "Survey Research Methods & Design in Psychology"Review of "Survey Research Methods & Design in Psychology"
Review of "Survey Research Methods & Design in Psychology"James Neill
 
LamiaFinal data ( results).docx1- label all lanes, label ma.docx
LamiaFinal data ( results).docx1- label all lanes, label ma.docxLamiaFinal data ( results).docx1- label all lanes, label ma.docx
LamiaFinal data ( results).docx1- label all lanes, label ma.docxDIPESH30
 
Prediction Of Bioactivity From Chemical Structure
Prediction Of Bioactivity From Chemical StructurePrediction Of Bioactivity From Chemical Structure
Prediction Of Bioactivity From Chemical StructureJeremy Besnard
 
Lab report 100 points
Lab report 100 pointsLab report 100 points
Lab report 100 pointsMaria Donohue
 

Similar to PLS-DA Modeling Discriminates Sample Processing Methods (20)

Food metabolomics Arapitsas 2017
Food metabolomics Arapitsas 2017Food metabolomics Arapitsas 2017
Food metabolomics Arapitsas 2017
 
Bioanalytical ppt
Bioanalytical pptBioanalytical ppt
Bioanalytical ppt
 
3 data normalization (2014 lab tutorial)
3  data normalization (2014 lab tutorial)3  data normalization (2014 lab tutorial)
3 data normalization (2014 lab tutorial)
 
1501 Refocus metabolomics
1501 Refocus metabolomics1501 Refocus metabolomics
1501 Refocus metabolomics
 
Lecture 9 molecular descriptors
Lecture 9  molecular descriptorsLecture 9  molecular descriptors
Lecture 9 molecular descriptors
 
Semantics for integrated laboratory analytical processes - The Allotrope Pers...
Semantics for integrated laboratory analytical processes - The Allotrope Pers...Semantics for integrated laboratory analytical processes - The Allotrope Pers...
Semantics for integrated laboratory analytical processes - The Allotrope Pers...
 
Development and Validation of a RP-HPLC method
Development and Validation of a RP-HPLC methodDevelopment and Validation of a RP-HPLC method
Development and Validation of a RP-HPLC method
 
Pg. 05Question FiveAssignment #Deadline Day 22.docx
Pg. 05Question FiveAssignment #Deadline Day 22.docxPg. 05Question FiveAssignment #Deadline Day 22.docx
Pg. 05Question FiveAssignment #Deadline Day 22.docx
 
IRSAE aquatic ecology 28 June 2018 metabolomics
IRSAE aquatic ecology 28 June 2018 metabolomicsIRSAE aquatic ecology 28 June 2018 metabolomics
IRSAE aquatic ecology 28 June 2018 metabolomics
 
Worked examples of sampling uncertainty evaluation
Worked examples of sampling uncertainty evaluationWorked examples of sampling uncertainty evaluation
Worked examples of sampling uncertainty evaluation
 
PurposeStudents will conduct one brief research exercise that w.docx
PurposeStudents will conduct one brief research exercise that w.docxPurposeStudents will conduct one brief research exercise that w.docx
PurposeStudents will conduct one brief research exercise that w.docx
 
Instrumentation
InstrumentationInstrumentation
Instrumentation
 
Systematic review and meta analaysis course - part 2
Systematic review and meta analaysis course - part 2Systematic review and meta analaysis course - part 2
Systematic review and meta analaysis course - part 2
 
Module 4 data analysis
Module 4 data analysisModule 4 data analysis
Module 4 data analysis
 
Final presentation dwi riyono
Final presentation dwi riyonoFinal presentation dwi riyono
Final presentation dwi riyono
 
Final Applied Lab Project-BIOL 103The Effect of low pH on Enzyme A.docx
Final Applied Lab Project-BIOL 103The Effect of low pH on Enzyme A.docxFinal Applied Lab Project-BIOL 103The Effect of low pH on Enzyme A.docx
Final Applied Lab Project-BIOL 103The Effect of low pH on Enzyme A.docx
 
Review of "Survey Research Methods & Design in Psychology"
Review of "Survey Research Methods & Design in Psychology"Review of "Survey Research Methods & Design in Psychology"
Review of "Survey Research Methods & Design in Psychology"
 
LamiaFinal data ( results).docx1- label all lanes, label ma.docx
LamiaFinal data ( results).docx1- label all lanes, label ma.docxLamiaFinal data ( results).docx1- label all lanes, label ma.docx
LamiaFinal data ( results).docx1- label all lanes, label ma.docx
 
Prediction Of Bioactivity From Chemical Structure
Prediction Of Bioactivity From Chemical StructurePrediction Of Bioactivity From Chemical Structure
Prediction Of Bioactivity From Chemical Structure
 
Lab report 100 points
Lab report 100 pointsLab report 100 points
Lab report 100 points
 

More from Dmitry Grapov

R programming for Data Science - A Beginner’s Guide
R programming for Data Science - A Beginner’s GuideR programming for Data Science - A Beginner’s Guide
R programming for Data Science - A Beginner’s GuideDmitry Grapov
 
Network mapping 101 course
Network mapping 101 courseNetwork mapping 101 course
Network mapping 101 courseDmitry Grapov
 
Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integratio...
Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integratio...Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integratio...
Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integratio...Dmitry Grapov
 
Dmitry Grapov Resume and CV
Dmitry Grapov Resume and CVDmitry Grapov Resume and CV
Dmitry Grapov Resume and CVDmitry Grapov
 
Machine Learning Powered Metabolomic Network Analysis
Machine Learning Powered Metabolomic Network AnalysisMachine Learning Powered Metabolomic Network Analysis
Machine Learning Powered Metabolomic Network AnalysisDmitry Grapov
 
Complex Systems Biology Informed Data Analysis and Machine Learning
Complex Systems Biology Informed Data Analysis and Machine LearningComplex Systems Biology Informed Data Analysis and Machine Learning
Complex Systems Biology Informed Data Analysis and Machine LearningDmitry Grapov
 
Data analysis workflows part 1 2015
Data analysis workflows part 1 2015Data analysis workflows part 1 2015
Data analysis workflows part 1 2015Dmitry Grapov
 
Data analysis workflows part 2 2015
Data analysis workflows part 2 2015Data analysis workflows part 2 2015
Data analysis workflows part 2 2015Dmitry Grapov
 
Metabolomics and Beyond Challenges and Strategies for Next-gen Omic Analyses
Metabolomics and Beyond Challenges and Strategies for Next-gen Omic Analyses Metabolomics and Beyond Challenges and Strategies for Next-gen Omic Analyses
Metabolomics and Beyond Challenges and Strategies for Next-gen Omic Analyses Dmitry Grapov
 
Case Study: Overview of Metabolomic Data Normalization Strategies
Case Study: Overview of Metabolomic Data Normalization StrategiesCase Study: Overview of Metabolomic Data Normalization Strategies
Case Study: Overview of Metabolomic Data Normalization StrategiesDmitry Grapov
 
Mapping to the Metabolomic Manifold
Mapping to the Metabolomic ManifoldMapping to the Metabolomic Manifold
Mapping to the Metabolomic ManifoldDmitry Grapov
 
Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)Dmitry Grapov
 
Normalization of Large-Scale Metabolomic Studies 2014
Normalization of Large-Scale Metabolomic Studies 2014Normalization of Large-Scale Metabolomic Studies 2014
Normalization of Large-Scale Metabolomic Studies 2014Dmitry Grapov
 
Gene Ontology Enrichment Network Analysis -Tutorial
Gene Ontology Enrichment Network Analysis -TutorialGene Ontology Enrichment Network Analysis -Tutorial
Gene Ontology Enrichment Network Analysis -TutorialDmitry Grapov
 
Prote-OMIC Data Analysis and Visualization
Prote-OMIC Data Analysis and VisualizationProte-OMIC Data Analysis and Visualization
Prote-OMIC Data Analysis and VisualizationDmitry Grapov
 
American Society of Mass Spectrommetry Conference 2014
American Society of Mass Spectrommetry Conference 2014American Society of Mass Spectrommetry Conference 2014
American Society of Mass Spectrommetry Conference 2014Dmitry Grapov
 
Omic Data Integration Strategies
Omic Data Integration StrategiesOmic Data Integration Strategies
Omic Data Integration StrategiesDmitry Grapov
 
Automation of (Biological) Data Analysis and Report Generation
Automation of (Biological) Data Analysis and Report GenerationAutomation of (Biological) Data Analysis and Report Generation
Automation of (Biological) Data Analysis and Report GenerationDmitry Grapov
 
Metabolomic data analysis and visualization tools
Metabolomic data analysis and visualization toolsMetabolomic data analysis and visualization tools
Metabolomic data analysis and visualization toolsDmitry Grapov
 

More from Dmitry Grapov (20)

R programming for Data Science - A Beginner’s Guide
R programming for Data Science - A Beginner’s GuideR programming for Data Science - A Beginner’s Guide
R programming for Data Science - A Beginner’s Guide
 
Network mapping 101 course
Network mapping 101 courseNetwork mapping 101 course
Network mapping 101 course
 
Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integratio...
Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integratio...Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integratio...
Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integratio...
 
Dmitry Grapov Resume and CV
Dmitry Grapov Resume and CVDmitry Grapov Resume and CV
Dmitry Grapov Resume and CV
 
Machine Learning Powered Metabolomic Network Analysis
Machine Learning Powered Metabolomic Network AnalysisMachine Learning Powered Metabolomic Network Analysis
Machine Learning Powered Metabolomic Network Analysis
 
Complex Systems Biology Informed Data Analysis and Machine Learning
Complex Systems Biology Informed Data Analysis and Machine LearningComplex Systems Biology Informed Data Analysis and Machine Learning
Complex Systems Biology Informed Data Analysis and Machine Learning
 
Data analysis workflows part 1 2015
Data analysis workflows part 1 2015Data analysis workflows part 1 2015
Data analysis workflows part 1 2015
 
Data analysis workflows part 2 2015
Data analysis workflows part 2 2015Data analysis workflows part 2 2015
Data analysis workflows part 2 2015
 
Metabolomics and Beyond Challenges and Strategies for Next-gen Omic Analyses
Metabolomics and Beyond Challenges and Strategies for Next-gen Omic Analyses Metabolomics and Beyond Challenges and Strategies for Next-gen Omic Analyses
Metabolomics and Beyond Challenges and Strategies for Next-gen Omic Analyses
 
Case Study: Overview of Metabolomic Data Normalization Strategies
Case Study: Overview of Metabolomic Data Normalization StrategiesCase Study: Overview of Metabolomic Data Normalization Strategies
Case Study: Overview of Metabolomic Data Normalization Strategies
 
Modeling poster
Modeling posterModeling poster
Modeling poster
 
Mapping to the Metabolomic Manifold
Mapping to the Metabolomic ManifoldMapping to the Metabolomic Manifold
Mapping to the Metabolomic Manifold
 
Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)
 
Normalization of Large-Scale Metabolomic Studies 2014
Normalization of Large-Scale Metabolomic Studies 2014Normalization of Large-Scale Metabolomic Studies 2014
Normalization of Large-Scale Metabolomic Studies 2014
 
Gene Ontology Enrichment Network Analysis -Tutorial
Gene Ontology Enrichment Network Analysis -TutorialGene Ontology Enrichment Network Analysis -Tutorial
Gene Ontology Enrichment Network Analysis -Tutorial
 
Prote-OMIC Data Analysis and Visualization
Prote-OMIC Data Analysis and VisualizationProte-OMIC Data Analysis and Visualization
Prote-OMIC Data Analysis and Visualization
 
American Society of Mass Spectrommetry Conference 2014
American Society of Mass Spectrommetry Conference 2014American Society of Mass Spectrommetry Conference 2014
American Society of Mass Spectrommetry Conference 2014
 
Omic Data Integration Strategies
Omic Data Integration StrategiesOmic Data Integration Strategies
Omic Data Integration Strategies
 
Automation of (Biological) Data Analysis and Report Generation
Automation of (Biological) Data Analysis and Report GenerationAutomation of (Biological) Data Analysis and Report Generation
Automation of (Biological) Data Analysis and Report Generation
 
Metabolomic data analysis and visualization tools
Metabolomic data analysis and visualization toolsMetabolomic data analysis and visualization tools
Metabolomic data analysis and visualization tools
 

PLS-DA Modeling Discriminates Sample Processing Methods

  • 1. Biology Chemistry Partial Least Squares (O-/PLS/-DA) Informatics Partial Least Squares (O-/PLS/-DA) modeling of metabolomic sample processing methods Goal: Use PLS to identify metabolites which best discriminate (most different between) sample processing methods (Used DATA: Pumpkin data 1.csv) Topics: 1.Model Selection 2.Results visualization 3.Feature Selection 4.Validation
  • 2. Biology Chemistry Partial Least Squares (O-/PLS/-DA) Informatics Partial Least Squares Modeling Discriminant Analysis (PLS-DA) Steps 1.Calculate a single Y PLS model to discriminate between extraction/treatment 2.Select optimal scaling and model latent variable (LV) number 3.Overview PLS scores and loadings plots 4.Validate model 5.Repeat steps 1-4 for an O-PLS model Visualize: 1.Sample scores annotated by extraction and treatment 2.Variable loadings plot Exercise: 1.How are scores different between 1-Y PLS, 1-Y O-PLS, 2-Y O-PLS? 2.Are there any moderate or extreme outliers? 3.What variables contribute most to the differences between treatments?
  • 3. Model Performance Biology Chemistry Partial Least Squares (O-/PLS/-DA) Informatics RMSEP - root mean squared error of prediction (fit to left out set) Q2 - cross-validated fit to the training data Xvar- explained variance in X (measurements)
  • 4. Model Validation Biology Chemistry Partial Least Squares (O-/PLS/-DA) Informatics single model performance estimates Model properties and validation settings Based on training/test splitting Based on training/test splitting and permuted Y Significance of difference between model performance and permuted NULL distributions
  • 5. Biology Chemistry Informatics Model Scores single Y (extraction_treatment) Partial Least Squares (O-/PLS/-DA) Treatment Extraction Variance in extraction dominates model
  • 6. Biology Chemistry Informatics Model Scores Comparison Partial Least Squares (O-/PLS/-DA) 1-Y PLS treatment extraction 1-Y O-PLS 2-Y O-PLS
  • 7. Top Discriminants Biology Fatty acids, lipids Chemistry Partial Least Squares (O-/PLS/-DA) Informatics 2-Y O-PLS treatment extraction Sugar phosphates, polyols
  • 8. Feature Selection Biology Chemistry Steps 1.Identify top 10% of discriminants for extraction based on metabolite • correlation with model scores • importance (loading, coeffcient, VIP) Visualize: 1.Feature selection diagnostics correlation Partial Least Squares (O-/PLS/-DA) Informatics Loading (LV1)
  • 9. Feature Selection for Extraction Biology Chemistry Partial Least Squares (O-/PLS/-DA) Informatics 1 2 higher in 3 3 higher in 1
  • 10. Summary Biology Chemistry Partial Least Squares (O-/PLS/-DA) Informatics Modeling Strategy (advanced) 1. Fit preliminary model 2. Evaluate of scores/loadings 3. Remove outliers 4. split data into test and train set (optional) 5. model selection (LV, OLV, etc) 6. internal (training set) model validation (permutation testing, training/testing) 7. Feature selection (optional) • Comparison of selected to excluded feature models 8. External model validation (test set)