Dmitry Grapov and Oliver Fiehn
University of California, Davis
Multivariate Analysis and
Visualization Tools for
Metabolomic Data
State of the art facility producing massive
amounts of biological data…
>20-30K samples/yr
>200 studies
Sample
Variable
Data Analysis and Visualization
Quality Assessment
• use replicated measurements
and/or internal standards to
estimate analytical variance
Statistical and Multivariate
• use the experimental design
to test hypotheses and/or
identify trends in analytes
Functional
• use statistical and multivariate
results to identify impacted
biochemical domains
Network
• integrate statistical and
multivariate results with the
experimental design and
analyte metadata
experimental design: organism, sex, age, etc.
analyte description and metadata: biochemical class, mass spectra, etc.
[Diagram labels: Variable, Sample]
Network Mapping
Data Quality Assessment
Principal Component Analysis (PCA) of all analytes, showing QC sample scores
Drift in >400 replicated measurements across >100 analytical batches for a single analyte
[Plot axes: acquisition batch vs. abundance]
QCs embedded among >5,500 samples (1:10) collected over 1.5 yrs
If the biological effect size is smaller than the analytical variance,
the experiment is likely to miss a real effect and report
a non-significant result (a false negative)
Data Quality Assessment
Analyte-specific data quality overview
Sample-specific normalization can be used to estimate and remove analytical variance
[Panels: Raw Data, Normalized Data]
Normalizations need to be numerically and visually validated
[Plot: %RSD vs. log mean for samples and QCs; annotations: low precision, high precision]
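The numerical validation mentioned above can be sketched as a comparison of QC %RSD before and after normalization. This is a minimal illustration only: the data are made up, and dividing by the per-batch QC median is an assumed, simple normalization, not necessarily the one used in the deck.

```python
from statistics import mean, median, stdev

def percent_rsd(values):
    """Relative standard deviation (%RSD) = 100 * sample SD / mean."""
    return 100.0 * stdev(values) / mean(values)

def normalize_by_batch_qc(records):
    """Divide each abundance by the median QC abundance of its batch.

    records: list of (batch, is_qc, abundance) tuples for one analyte.
    Returns a new list with the same layout and rescaled abundances.
    """
    qc_by_batch = {}
    for batch, is_qc, abundance in records:
        if is_qc:
            qc_by_batch.setdefault(batch, []).append(abundance)
    qc_median = {b: median(v) for b, v in qc_by_batch.items()}
    return [(b, q, a / qc_median[b]) for b, q, a in records]

# Hypothetical single-analyte data: batch 2 reads about 50% higher than batch 1.
raw = [
    (1, True, 100.0), (1, True, 104.0), (1, False, 80.0), (1, False, 120.0),
    (2, True, 150.0), (2, True, 156.0), (2, False, 120.0), (2, False, 180.0),
]
normalized = normalize_by_batch_qc(raw)

qc_raw = [a for _, q, a in raw if q]
qc_norm = [a for _, q, a in normalized if q]
print(f"QC %RSD raw: {percent_rsd(qc_raw):.1f}")
print(f"QC %RSD normalized: {percent_rsd(qc_norm):.1f}")
```

A lower QC %RSD after normalization suggests that analytical variance was reduced; the visual check (for example, plotting QC abundance against acquisition batch) is still needed alongside it.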
Network Mapping
Ranked statistically significant differences within a biochemical context
Statistics + Multivariate + Context = Network Mapping
Statistical and Multivariate Analyses
Group 1 vs. Group 2
Which analytes differ between the two groups of samples?
Statistical (t-Test): significant differences lacking rank and context
Multivariate (O-PLS-DA): ranked differences lacking significance and context
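The statistical side of this comparison can be sketched as a per-analyte two-group test. The example below computes Welch's t statistic with the standard library only; the analyte names and values are hypothetical. Turning t into a p-value needs the t distribution (for instance `scipy.stats.ttest_ind` with `equal_var=False`), which is left out here. The O-PLS-DA side is not shown.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(group1, group2):
    """Welch's t statistic for two independent samples (unequal variances)."""
    se = sqrt(variance(group1) / len(group1) + variance(group2) / len(group2))
    return (mean(group1) - mean(group2)) / se

# Hypothetical abundances: analyte -> (Group 1 values, Group 2 values).
data = {
    "analyte_A": ([10.0, 11.0, 9.0, 10.0], [10.0, 10.5, 9.5, 10.0]),
    "analyte_B": ([5.0, 6.0, 5.5, 5.5], [8.0, 9.0, 8.5, 8.5]),
    "analyte_C": ([20.0, 22.0, 21.0, 21.0], [18.0, 19.0, 20.0, 19.0]),
}

t_stats = {name: welch_t(g1, g2) for name, (g1, g2) in data.items()}

# Sorting by |t| is only for display; it is not a substitute for
# the multivariate ranking the slide refers to.
ranked = sorted(t_stats, key=lambda name: abs(t_stats[name]), reverse=True)
for name in ranked:
    print(f"{name}: t = {t_stats[name]:.2f}")
```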
To see the big picture, it is necessary to view the data from multiple different angles
DeviumWeb https://github.com/dgrapov/DeviumWeb
• visualization
• statistics
• clustering
• PCA
• O-PLS
Functional Analysis
Nucl. Acids Res. (2008) 36 (suppl 2): W423-W426. doi: 10.1093/nar/gkn282
Identify changes or enrichment in biochemical domains
• decrease
• increase
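One common way to test for enrichment in a biochemical domain is an over-representation test based on the hypergeometric distribution. The sketch below is an assumption about how such a test could look, not the method of the cited tool, and all counts are hypothetical.

```python
from math import comb

def enrichment_p(total, in_domain, changed, changed_in_domain):
    """Upper-tail hypergeometric probability P(X >= changed_in_domain).

    total: number of measured analytes
    in_domain: analytes annotated to the biochemical domain
    changed: analytes flagged as significantly changed
    changed_in_domain: changed analytes that belong to the domain
    """
    upper = min(in_domain, changed)
    hits = sum(
        comb(in_domain, k) * comb(total - in_domain, changed - k)
        for k in range(changed_in_domain, upper + 1)
    )
    return hits / comb(total, changed)

# Hypothetical counts: 200 measured analytes, 20 in the domain,
# 30 significantly changed, 8 of which belong to the domain.
p = enrichment_p(total=200, in_domain=20, changed=30, changed_in_domain=8)
print(f"enrichment p-value: {p:.4g}")
```

With these counts about 3 domain members would be expected among the changed analytes by chance, so observing 8 gives a small p-value. When many domains are tested, the p-values should be corrected for multiple testing.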
Functional Analysis: opportunity for ‘Omic integration
Use domain knowledge
databases to integrate
genomic, proteomic
and metabolomic data
Current approaches can
be limited to pathway
level analyses
Networks
Biochemical
• reaction
• domain
Structural
• molecular fingerprints
• mass spectra
Empirical
• correlation
• partial correlation
BMC Bioinformatics 2012, 13:99. doi:10.1186/1471-2105-13-99
Mapped
Network
- displaying metabolic
differences in control vs.
malignant lung tissue
Biochemical
Relationships
http://www.genome.jp/dbget-bin/www_bget?rn:R00975
Structural
Similarity
http://pubchem.ncbi.nlm.nih.gov//score_matrix/score_matrix.cgi
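Structural similarity edges between molecular fingerprints are often scored with the Tanimoto coefficient. The sketch below assumes fingerprints are available as sets of "on" bit positions; the compounds, bit positions, and the 0.6 cutoff are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

# Hypothetical fingerprints: each set holds indices of bits that are set.
fingerprints = {
    "compound_1": {1, 4, 7, 9, 12},
    "compound_2": {1, 4, 7, 9, 15},
    "compound_3": {2, 3, 20},
}

threshold = 0.6  # assumed cutoff for drawing an edge
names = sorted(fingerprints)
edges = [
    (a, b, tanimoto(fingerprints[a], fingerprints[b]))
    for i, a in enumerate(names)
    for b in names[i + 1:]
    if tanimoto(fingerprints[a], fingerprints[b]) >= threshold
]
print(edges)
```

Only pairs at or above the cutoff become edges, so the choice of threshold controls how dense the structural network is.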
Empirical Networks
Use experiment specific or data driven relationships to gain novel insight
into biochemical relationships
[Figure labels: urea cycle, nucleotide synthesis, protein glycosylation]
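The difference between correlation and partial correlation edges can be sketched with three variables. In this hypothetical example, x and y both follow z, so they correlate strongly with each other, but the first-order partial correlation (controlling for z) is close to zero. The data are made up for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Hypothetical abundances across 8 samples: x and y each equal z plus
# a small offset, and the two offset patterns are unrelated to each other.
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x = [zi + e for zi, e in zip(z, [0.3, -0.3, 0.3, -0.3, 0.3, -0.3, 0.3, -0.3])]
y = [zi + e for zi, e in zip(z, [0.3, 0.3, -0.3, -0.3, 0.3, 0.3, -0.3, -0.3])]

r_xy = pearson(x, y)
r_xy_given_z = partial_corr(x, y, z)
print(f"correlation(x, y) = {r_xy:.3f}")
print(f"partial correlation(x, y | z) = {r_xy_given_z:.3f}")
```

A correlation network would draw an x-y edge here, while a partial correlation network would not, which is why partial correlations tend to give sparser networks with fewer indirect edges.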
Mass Spectral Networks
Use mass spectra as a proxy for structure to help make sense of
unknown compounds’ biochemical identities
Watrous J et al. PNAS 2012;109:E1743-E1752
unknown compounds are likely phytosterol
esters
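Using mass spectra as a proxy for structure requires a spectral similarity score. A plain cosine score on binned spectra is one simple choice; it is shown here as an assumption for illustration and is not presented as the scoring used in the cited work. The spectra are hypothetical.

```python
from math import sqrt

def cosine_similarity(spec_a, spec_b):
    """Cosine similarity between spectra given as {m/z bin: intensity} dicts."""
    dot = sum(i * spec_b.get(mz, 0.0) for mz, i in spec_a.items())
    norm_a = sqrt(sum(i * i for i in spec_a.values()))
    norm_b = sqrt(sum(i * i for i in spec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical unit-binned spectra (integer m/z -> intensity).
known = {73: 100.0, 129: 45.0, 357: 20.0}
unknown_1 = {73: 95.0, 129: 50.0, 357: 15.0}
unknown_2 = {57: 100.0, 71: 60.0, 85: 30.0}

sim_1 = cosine_similarity(known, unknown_1)
sim_2 = cosine_similarity(known, unknown_2)
print(f"known vs unknown_1: {sim_1:.3f}")
print(f"known vs unknown_2: {sim_2:.3f}")
```

An unknown whose spectrum scores highly against an annotated compound becomes its neighbor in the network, which is a hint about its chemical class rather than an identification.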
Mass Spectral Networks
Use mass spectra and empirical relationships to narrow down the
biochemical roles for unknown compounds
Rigorous chemical experiments identified the unknown compounds as partial
derivatization products of glucose
MetaMapR https://github.com/dgrapov/MetaMapR
Analysis at the Metabolomic Scale and Beyond
[Diagram labels: pyruvate, lactate, enzyme, gene A, gene B]
Pathway independent metabolomic (known and unknown),
proteomic and genomic data integration
Software and Resources
• DeviumWeb: Dynamic multivariate data analysis and visualization platform
  url: https://github.com/dgrapov/DeviumWeb
• imDEV: Microsoft Excel add-in for multivariate analysis
  url: http://sourceforge.net/projects/imdev/
• MetaMapR: Network analysis tools for metabolomics
  url: https://github.com/dgrapov/MetaMapR
• TeachingDemos: Tutorials and demonstrations
  url: http://sourceforge.net/projects/teachingdemos/?source=directory
  url: https://github.com/dgrapov/TeachingDemos
• Data analysis case studies and examples
  url: http://imdevsoftware.wordpress.com/
dgrapov@ucdavis.edu
metabolomics.ucdavis.edu
This research was supported in part by NIH 1 U24 DK097154

Multivariate and network tools for biological data analysis
