Creative
Data
Solutions
Machine Learning Powered Metabolomic Network Analysis
Dmitry Grapov, PhD
www.createdatasol.com
DAVe
Predictive Modeling Within a Biochemical Context
Grapov et al., Circ. Cardiovasc. Genet. 2014
Personalized Medicine
Complex Data Integration
Grapov et al., PLoS ONE (2014) doi:10.1371/journal.pone.0084260
J. Proteome Res., 2015, 14 (1), pp 557–566 DOI: 10.1021/pr500782g
Biomarker Discovery
DOI: 10.1126/science.356.6338.646
‘The impact of metabolomics on systems biology is still in its infancy… few people have a solid understanding across all the 'omics, including lipidomics and epigenetics.’
Present and Future
‘There is a need for generalized data analysis methods capable of efficiently integrating across [and within] 'omic domains’
Maximizing Metabolomic Coverage
American Journal of Physiology - Endocrinology and Metabolism 2015, DOI: 10.1152/ajpendo.00019.2015
http://www.archaeology.org/issues/207-1603/features/4157-arles-roman-wall-paintings
Materials of Connected Biological Data Analysis and Visualization
Quality Assessment
• use replicated measurements and/or internal standards to estimate analytical variance
Statistical and Multivariate
• use the experimental design to test hypotheses and/or identify trends in analytes
Functional
• use statistical and multivariate results to identify impacted biochemical domains
Network and Predictive
• integrate statistical and multivariate results with the experimental design and analyte metadata
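The quality-assessment step can be made concrete with replicated quality-control (QC) samples: a common summary of analytical variance is each analyte's coefficient of variation (CV) across the replicates. A minimal sketch, assuming a hypothetical dict of replicate intensities and a hypothetical 30% CV cutoff (neither comes from the slides):

```python
from statistics import mean, stdev


def qc_cv(replicates):
    """Return the coefficient of variation (%) of each analyte.

    `replicates` maps an analyte name to its intensities measured in
    replicated QC samples (at least two values per analyte).
    """
    return {
        analyte: 100.0 * stdev(values) / mean(values)
        for analyte, values in replicates.items()
    }


def flag_noisy(replicates, max_cv=30.0):
    """Sorted names of analytes whose replicate CV exceeds `max_cv` percent.

    The 30% default is a hypothetical threshold chosen for illustration.
    """
    return sorted(a for a, cv in qc_cv(replicates).items() if cv > max_cv)


# Hypothetical replicate QC intensities for three analytes.
qc = {
    "glucose": [100.0, 102.0, 98.0],
    "lactate": [50.0, 80.0, 20.0],
    "alanine": [10.0, 10.5, 9.5],
}
print(qc_cv(qc))
print(flag_noisy(qc))
```

Analytes flagged this way are usually reviewed or excluded before the statistical and multivariate steps, since their measured differences may reflect instrument noise rather than biology.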
Data Analysis Visualization engine
Statistical Data Analysis
Simplest representation: two-class comparison
Figure: fold-change vs. significance; analytes increased (↑) or decreased (↓) in diabetics
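A two-class comparison summarizes each analyte by an effect size (fold-change) and a significance value. The sketch below sticks to the Python standard library, so it swaps in a two-sided permutation test on the difference in group means where a t-test is often used; the function names and example values are hypothetical.

```python
import math
import random


def _mean(values):
    return sum(values) / len(values)


def log2_fold_change(cases, controls):
    """log2 of the ratio of group means (cases over controls); values must be positive."""
    return math.log2(_mean(cases) / _mean(controls))


def permutation_p_value(cases, controls, n_perm=10000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Group labels are reshuffled n_perm times; the p-value is the share of
    shuffles whose absolute mean difference is at least the observed one,
    with a +1 correction so the estimate is never exactly zero.
    """
    rng = random.Random(seed)
    observed = abs(_mean(cases) - _mean(controls))
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(_mean(pooled[:n_cases]) - _mean(pooled[n_cases:]))
        # Small tolerance guards against floating-point rounding in the sums.
        if diff >= observed - 1e-12:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


# Hypothetical intensities of one analyte in diabetics vs. controls.
diabetics = [2.0, 2.2, 1.9, 2.1, 2.3]
controls = [1.0, 1.1, 0.9, 1.2, 1.0]
print(log2_fold_change(diabetics, controls), permutation_p_value(diabetics, controls))
```

Running both functions over every analyte gives the (fold-change, significance) pairs that a two-class comparison plot displays; with many analytes, the p-values would normally also be corrected for multiple testing.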
Multivariate Data Analysis
Figure: data matrix with sample metadata, clustered by row and column
Clustering: distance + linkage
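Hierarchical clustering of the rows (samples) or columns (analytes) of a data matrix requires two choices: a distance between observations and a linkage rule for the distance between groups. The naive sketch below assumes Euclidean distance and average linkage; it recomputes all pairwise linkages at every merge, so it only suits small examples, and the function names are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(cluster_a, cluster_b, points, dist):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(dist(points[i], points[j]) for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def agglomerate(points, n_clusters, dist=euclidean, linkage=average_linkage):
    """Merge the closest pair of clusters until `n_clusters` remain.

    Returns the clusters as sorted lists of row indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]], points, dist),
        )
        merged = sorted(clusters[i] + clusters[j])
        # combinations() yields i < j, so deleting j first keeps i valid.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return sorted(clusters)


# Hypothetical 2-D profiles for five samples (rows).
profiles = [[1.0, 1.0], [1.2, 0.9], [5.0, 5.0], [5.1, 4.8], [0.9, 1.1]]
print(agglomerate(profiles, n_clusters=2))
```

To cluster columns instead of rows, the same function can be applied to the transposed matrix, e.g. `list(zip(*matrix))`. Swapping `dist` (for example to a correlation-based distance) or `linkage` (single, complete) changes the resulting tree, which is why both should be reported.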
Multivariate Data Analysis
Dimensional reduction: projection, similarity
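Dimensional reduction projects samples into a low-dimensional space where proximity reflects similarity. Principal component analysis (PCA) is one common choice; the slide does not name a specific method, so PCA here is an assumption. A minimal NumPy sketch on simulated data:

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project the rows (samples) of X onto the leading principal components.

    Columns (analytes) are mean-centered and scaled to unit variance
    (autoscaling, an assumption here) before a singular value
    decomposition; constant columns would cause a division by zero.
    Returns (scores, explained_variance_ratio) for the kept components.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    scaled = centered / centered.std(axis=0, ddof=1)
    U, s, _ = np.linalg.svd(scaled, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


# Hypothetical data: 20 samples x 5 analytes.
rng = np.random.default_rng(0)
data = rng.normal(size=(20, 5))
scores, explained = pca_scores(data)
print(scores.shape, explained)
```

Samples that lie close together in the score plot have similar overall profiles; the explained-variance ratios indicate how much of the data each projected axis captures.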
Considerations for Selecting a Modeling Method
Is it appropriate for my data?
• Missing values
• Data types e.g. numeric/discrete
• Multicollinearity
• Fitting properties e.g. non-linear
Training and test performance?
• Optimization
• Parameter tuning
• Feature selection
• Feature weights/rank
https://en.wikipedia.org/wiki/Anscombe%27s_quartet
https://en.wikipedia.org/wiki/Overfitting
Very important to avoid overfitting
Figure panels: linear, non-linear, outlier, classification
Visualization is critical for model evaluation
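The linked Anscombe's quartet is a compact argument for why visualization matters: four small data sets share nearly identical summary statistics yet look completely different when plotted. The sketch below uses the quartet's values as listed on that Wikipedia page and computes the shared statistics with the standard library only; the function name is hypothetical.

```python
from statistics import mean

# Anscombe's quartet: sets I-III share the same x values.
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Mean of y, Pearson r, and least-squares slope and intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    return {
        "mean_y": my,
        "r": sxy / (sxx * syy) ** 0.5,
        "slope": slope,
        "intercept": my - slope * mx,
    }


for name, (x, y) in QUARTET.items():
    stats = {k: round(v, 2) for k, v in summarize(x, y).items()}
    print(name, stats)
```

All four sets fit essentially the same line with the same correlation, yet one is linear, one is non-linear, and two are driven by a single outlier; only a plot reveals which model is appropriate.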
Machine Learning Based Predictive Modeling
Auto or manual parameter tuning
Optimize based on a variety of performance metrics
Control model cross-validation
Share results with global state
Choose from over 200 model types
Calculate classification or regression models
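Cross-validation and parameter tuning follow a generic loop: split the samples into folds, fit on all but one fold, score the held-out fold, and keep the parameter value with the best average score. A minimal sketch of that loop, using a hand-written k-nearest-neighbor classifier and accuracy as stand-ins (the slides do not show which model types or metrics the software uses), on simulated two-class data:

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority label among the k training points nearest to x."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def k_fold_indices(n, n_folds, seed=0):
    """Shuffle the indices 0..n-1 and deal them into n_folds folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]


def cross_validated_accuracy(X, y, k, n_folds=5, seed=0):
    """Mean held-out accuracy of k-NN across the folds."""
    scores = []
    for test_idx in k_fold_indices(len(X), n_folds, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct = sum(knn_predict(train_X, train_y, X[j], k) == y[j] for j in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


def tune_k(X, y, candidates=(1, 3, 5, 7), n_folds=5):
    """Return the best k and the cross-validated accuracy of every candidate."""
    results = {k: cross_validated_accuracy(X, y, k, n_folds) for k in candidates}
    return max(results, key=results.get), results


# Simulated, well-separated classes (hypothetical data).
rng = random.Random(1)
X = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(20)]
X += [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(20)]
y = ["control"] * 20 + ["case"] * 20
best_k, accuracy = tune_k(X, y)
print(best_k, accuracy)
```

Because every score comes from samples the model did not see during fitting, the tuned parameter is less likely to reflect overfitting than one chosen on training accuracy.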
Pathway Analysis
Analyses:
• visualization
• mapping
• enrichment
• topology
Pathway: expert definition of a biochemical module
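Enrichment analysis asks whether a pathway contains more significant analytes than expected by chance. One widely used formulation is a one-sided hypergeometric (over-representation) test; the sketch below assumes that formulation, and the counts in the usage line are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_pathway, n_selected, n_overlap):
    """One-sided hypergeometric (over-representation) p-value.

    n_background: analytes measured in total (N)
    n_pathway:    measured analytes annotated to the pathway (K)
    n_selected:   analytes called significant (n)
    n_overlap:    significant analytes that belong to the pathway (k)
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    upper = min(n_pathway, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_pathway, i) * comb(n_background - n_pathway, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 200 measured, 12 in the pathway, 30 significant, 6 overlap.
print(enrichment_p_value(200, 12, 30, 6))
```

The background should be the set of analytes actually measured, not the whole pathway database; otherwise enrichment tends to be overstated.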
Complex Metabolic Systems Biology
Figure panels: biochemistry, structural similarity
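Structural similarity between metabolites is often scored with the Tanimoto coefficient on binary molecular fingerprints. In the sketch below the fingerprints are hypothetical sets of "on" bit positions, assumed to have been computed elsewhere (for example by a cheminformatics toolkit), and the edge threshold is an illustrative choice.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two binary fingerprints.

    Fingerprints are sets of "on" bit positions; two empty fingerprints
    are treated as dissimilar (0.0) by convention here.
    """
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def similarity_edges(fingerprints, threshold=0.7):
    """Pairs of molecules whose Tanimoto similarity meets the threshold."""
    names = sorted(fingerprints)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            score = tanimoto(fingerprints[a], fingerprints[b])
            if score >= threshold:
                edges.append((a, b, round(score, 3)))
    return edges


# Hypothetical fingerprints for three molecules.
fps = {"A": {1, 2, 3, 4}, "B": {1, 2, 3, 5}, "C": {9}}
print(similarity_edges(fps, threshold=0.5))
```

The resulting edge list links structurally related metabolites even when no known reaction connects them, which complements reaction-based networks.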
Encoding Biochemical Reactions
http://www.genome.jp/dbget-bin/www_bget?rn:R00703
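A biochemical reaction such as the linked KEGG entry can be encoded as substrate and product stoichiometry, which also gives one column of a stoichiometric matrix. The parser below is a minimal sketch that assumes a simple "A + 2 B <=> C" text form. The example uses KEGG compound identifiers for (S)-lactate (C00186), NAD+ (C00003), pyruvate (C00022), NADH (C00004) and H+ (C00080), written out by hand for illustration rather than retrieved from KEGG.

```python
import re


def parse_reaction(equation):
    """Split 'a A + b B <=> c C' into (substrates, products) dicts.

    Coefficients are optional integers separated from the compound
    identifier by whitespace; missing coefficients default to 1.
    """
    left, right = equation.split("<=>")

    def side(text):
        counts = {}
        for term in text.split(" + "):
            match = re.fullmatch(r"\s*(?:(\d+)\s+)?(\S+)\s*", term)
            if match is None:
                raise ValueError(f"cannot parse term: {term!r}")
            coeff, compound = match.groups()
            counts[compound] = counts.get(compound, 0) + int(coeff or 1)
        return counts

    return side(left), side(right)


def stoichiometry_column(equation):
    """Net stoichiometry: substrates negative, products positive."""
    substrates, products = parse_reaction(equation)
    column = {c: -n for c, n in substrates.items()}
    for c, n in products.items():
        column[c] = column.get(c, 0) + n
    return column


print(parse_reaction("C00186 + C00003 <=> C00022 + C00004 + C00080"))
print(stoichiometry_column("2 A + B <=> 3 C"))  # hypothetical compounds
```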
Biochemical Reactions as Networks
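Once reactions are encoded, they can be turned into a metabolite network in which compounds are nodes and reactions supply the edges. The sketch below makes a simple assumption: every substrate is linked to every product, and currency metabolites passed in `exclude` are skipped. Atom-mapping-based reactant pairs would give more precise edges. The two reactions (lactate dehydrogenase, LDH, and alanine transaminase, ALT) are written by hand for illustration.

```python
from collections import defaultdict


def reaction_network(reactions, exclude=frozenset()):
    """Build an undirected metabolite graph from reactions.

    `reactions` maps a reaction id to a (substrates, products) pair of
    compound lists. Each substrate is linked to each product; compounds in
    `exclude` (e.g. cofactors) are skipped. Returns an adjacency dict:
    compound -> {neighbor: [reaction ids]}.
    """
    graph = defaultdict(lambda: defaultdict(list))
    for rxn_id, (substrates, products) in reactions.items():
        for s in substrates:
            for p in products:
                if s in exclude or p in exclude or s == p:
                    continue
                graph[s][p].append(rxn_id)
                graph[p][s].append(rxn_id)
    return {node: dict(nbrs) for node, nbrs in graph.items()}


reactions = {
    "LDH": (["lactate", "NAD+"], ["pyruvate", "NADH"]),
    "ALT": (["pyruvate", "glutamate"], ["alanine", "2-oxoglutarate"]),
}
network = reaction_network(reactions, exclude={"NAD+", "NADH"})
for node, neighbors in sorted(network.items()):
    print(node, sorted(neighbors))
```

Excluding cofactors matters: without it, NAD+/NADH would connect nearly every redox reaction and dominate the network's structure.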
Network Analysis and Mapping
Biochemical
• reaction
• domain
Structural
• molecular fingerprints
• mass spectra
Empirical
• correlation
• partial correlation
BMC Bioinformatics 2012, 13:99 doi:10.1186/1471-2105-13-99
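Empirical networks connect analytes whose measurements co-vary. Plain correlation also links analytes that are related only indirectly through a third one; partial correlation conditions on all other analytes and removes many such edges. A minimal NumPy sketch using the precision (inverse covariance) matrix, with simulated data and an illustrative 0.3 threshold. It assumes more samples than analytes; high-dimensional data would need a regularized estimator instead.

```python
import numpy as np


def partial_correlation(X):
    """Partial correlation matrix computed from the precision matrix.

    X holds samples in rows and analytes in columns. Entry (i, j) is the
    correlation of analytes i and j after conditioning on all the others:
    pcor_ij = -P_ij / sqrt(P_ii * P_jj), with P the inverse covariance.
    """
    precision = np.linalg.inv(np.cov(X, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def edges(matrix, names, threshold):
    """Analyte pairs whose absolute (partial) correlation exceeds threshold."""
    n = len(names)
    return [
        (names[i], names[j], round(float(matrix[i, j]), 2))
        for i in range(n)
        for j in range(i + 1, n)
        if abs(matrix[i, j]) > threshold
    ]


# Simulated chain x -> y -> z: x and z are related only through y.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = x + 0.5 * rng.normal(size=500)
z = y + 0.5 * rng.normal(size=500)
data = np.column_stack([x, y, z])
names = ["x", "y", "z"]
print("correlation:", edges(np.corrcoef(data, rowvar=False), names, 0.3))
print("partial correlation:", edges(partial_correlation(data), names, 0.3))
```

In the simulated chain, correlation links all three pairs, while partial correlation keeps only the direct x-y and y-z edges.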
Figure: Network + Mappings → Mapped Network
Grapov D., American Society of Mass Spectrometry Conference (2013, 2014)
Network Mapping + DAVe
Network Mapping
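Network mapping overlays analysis results onto a network so that node appearance encodes statistics. A minimal sketch, assuming hypothetical per-node results of (log2 fold-change, p-value): significant increases are drawn red, significant decreases blue, and node size grows with -log10(p). The colors, sizes and alpha are illustrative choices, not those of any particular tool.

```python
import math


def map_results_to_nodes(nodes, results, alpha=0.05, base_size=10.0):
    """Attach visual attributes to network nodes from statistical results.

    `results` maps a node id to (log2 fold-change, p-value), with p > 0.
    Nodes without results keep the default color and size.
    """
    attributes = {}
    for node in nodes:
        color, size = "grey", base_size
        if node in results:
            lfc, p = results[node]
            # Size scales with significance: p = 1 keeps the base size.
            size = base_size * (1.0 - math.log10(p))
            if p < alpha:
                color = "red" if lfc > 0 else "blue"
        attributes[node] = {"color": color, "size": size}
    return attributes


# Hypothetical results for three metabolites.
results = {"lactate": (1.2, 0.001), "pyruvate": (-0.8, 0.01), "alanine": (0.1, 0.5)}
print(map_results_to_nodes(["lactate", "pyruvate", "alanine", "glutamate"], results))
```

The attribute dict can then be handed to whatever graph-drawing tool renders the network.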
DOI: 10.1126/science.356.6338.646
‘Metabolomics can’t really function on its own without the genome sequence.’
Present and Future
Integrated Omic Signal State
http://dx.doi.org/10.1016/j.tibtech.2015.12.013
‘Omic’ data integration strategies
Biomarker Insights 2015:Suppl. 4 1-6 DOI: 10.4137/BMI.S29511
• Empirical correlation
• Network based
• Biochemical pathway
http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/
Machine Learning
Figure labels: biology, computer vision, ranking, deep learning
https://arxiv.org/ftp/arxiv/papers/1603/1603.06430.pdf
Biochemical Deep Learning
How can we achieve the scale and encoding required for metabolomic deep learning?
DeepVariant http://biorxiv.org/content/biorxiv/early/2016/12/14/092890.full.pdf
Biochemical Deep Learning
• Unexpected repurposing of established methods hints at the potential of neural networks as universal data encoders
• ‘Black-box’ algorithms are less useful for understanding causal biochemical relationships
• The difficulty of implementation limits routine adoption among biologists
Summary
• Metabolomic analysis integrates analytical methods, biochemical domain knowledge and experimental data
• Network analysis and mapping are useful methods for integrating, analyzing and visualizing diverse data
• Combining machine learning with network analysis provides a flexible framework for data analysis and visualization
data visualization network analysis machine learning
predictive modeling biochemistry software
createdatasol@gmail.com
www.createdatasol.com
