Step-by-step tutorial for conducting GO enrichment analysis and then creating a network from the results.
Material from the UC Davis 2014 Proteomics Workshop.
See more at: http://sourceforge.net/projects/teachingdemos/files/2014%20UC%20Davis%20Proteomics%20Workshop/
Metabolomic data analysis and visualization tools, by Dmitry Grapov
A description of data analysis and visualization tools for metabolomic and other high dimensional data sets, developed at the NIH West Coast Metabolomics Center.
https://www.youtube.com/watch?v=Y_-o-4rKxUk
Machine learning powered metabolomic network analysis
Dmitry Grapov PhD,
Director of Data Science and Bioinformatics,
CDS- Creative Data Solutions
www.createdatasol.com
Metabolomic network analysis can be used to interpret experimental results within a variety of contexts including: biochemical relationships, structural and spectral similarity and empirical correlation. Machine learning is useful for modeling relationships in the context of pattern recognition, clustering, classification and regression based predictive modeling. The combination of developed metabolomic networks and machine learning based predictive models offer a unique method to visualize empirical relationships while testing key experimental hypotheses. The following presentation focuses on data analysis, visualization, machine learning and network mapping approaches used to create richly mapped metabolomic networks. Learn more at www.createdatasol.com
Model organisms such as budding yeast provide a common platform to interrogate and understand cellular and physiological processes. Knowledge about model organisms, whether generated during the course of scientific investigation or extracted from published articles, is made available by model organism databases (MODs), such as the Saccharomyces Genome Database (SGD), for powerful, data-driven bioinformatic analyses. Integrative platforms such as InterMine offer a standard platform for MOD data exploration and data mining. Yet today's bioinformatic analyses also require access to a significantly broader set of structured biomedical data, such as what can be found in the emerging network of Linked Open Data (LOD). If MOD data could be provisioned as FAIR (Findable, Accessible, Interoperable, and Reusable), then scientists could leverage a greater amount of interoperable data in knowledge discovery.
The goal of this proposal is to increase the utility of MOD data by implementing standards-compliant data access interfaces that interoperate with Linked Data. We will focus our efforts on developing interfaces for data access, data retrieval, and query answering for SGD. Our software will publish InterMine data as LOD that are semantically annotated with ontologies and can be retrieved using standardized formats (e.g. JSON-LD, Turtle). We will facilitate the exploration of MOD data for hypothesis testing by implementing efficient query answering using Linked Data Fragments, and by developing a set of graphical user interfaces to search for data of interest, explore connections, and answer questions that leverage the wider LOD network. Finally, we will develop a locally and cloud-deployable image to enable the rapid deployment of the proposed infrastructure. Our efforts to increase interoperability and ease of deployment for biomedical data repositories will increase research productivity and reduce costs associated with data integration and warehouse maintenance.
Robust Pathway-based Multi-Omics Data Integration using Directed Random Walk ..., by Soyeon Kim
17th Annual International Conference on Critical Assessment of Massive Data Analysis (CAMDA 2018)
Cancer Data Integration Challenge (http://camda.info/)
Deep learning based multi-omics integration, a survey, by Soyeon Kim
1. Unsupervised feature construction and knowledge extraction from genome-wide assays of breast cancer with denoising autoencoders, Pacific Symposium on Biocomputing, 2015
2. A deep learning approach for cancer detection and relevant gene identification, Pacific Symposium on Biocomputing, 2016
3. Deep Learning based multi-omics integration robustly predicts survival in liver cancer, preprint, 2017
Making it Easier, Possibly Even Pleasant, to Author Rich Experimental Metadata, by Michel Dumontier
Biomedical researchers will remain stymied in their ability to take full advantage of the Big Data revolution if they cannot find the datasets they need to analyze, if there is a lack of clarity about what particular datasets contain, and if data are insufficiently described.
CEDAR, an NIH BD2K Center of Excellence, aims to develop methods and tools to vastly ease the burden of authoring good experimental metadata, and to maximally use this information to zero in on datasets of interest.
Semantic web technologies offer a potential mechanism for the representation and integration of thousands of biomedical databases. Many of these databases offer cross-references to other data sources, but these are generally incomplete and prone to error. In this paper, we conduct an empirical analysis of the link structure of life science Linked Data, obtained from the Bio2RDF project. Three different link graphs for datasets, entities and terms are characterized by degree, connectivity, and clustering metrics, and their correlation is measured as well. Furthermore, we utilize the symmetry and transitivity of entity links to build a benchmark and evaluate several popular entity matching approaches. Our findings indicate that the life science data network can help find hidden links, can be used to validate links, and may offer a mechanism to integrate a wider set of resources to support biomedical knowledge discovery.
With its focus on investigating the basis for the sustained existence
of living systems, modern biology has always been a fertile, if not
challenging, domain for formal knowledge representation and automated
reasoning. With thousands of databases and hundreds of ontologies now
available, there is a salient opportunity to integrate these for
discovery. In this talk, I will discuss our efforts to build a rich
foundational network of ontology-annotated linked data, develop
methods to intelligently retrieve content of interest, uncover
significant biological associations, and pursue new avenues for drug
discovery. As the portfolio of Semantic Web technologies continues to
mature in terms of functionality, scalability, and an understanding of
how to maximize their value, researchers will be strategically poised
to pursue increasingly sophisticated KR projects aimed at improving
our overall understanding of human health and disease.
bio: Dr. Michel Dumontier is an Associate Professor of Medicine
(Biomedical Informatics) at Stanford University. His research aims to
find new treatments for rare and complex diseases. His research
interests lie in the publication, integration, and discovery of
scientific knowledge. Dr. Dumontier serves as a co-chair for the World
Wide Web Consortium Semantic Web in Health Care and Life Sciences
Interest Group (W3C HCLSIG) and is the Scientific Director for
Bio2RDF, a widely used open-source project to create and provide
linked data for life sciences.
Harnessing the Proteome with ProteoIQ Quantitative Proteomics Software, by jatwood3
Learn how successful researchers are using ProteoIQ to streamline their proteomic data analysis.
Centralize data analysis on a single software platform
Most laboratories have multiple MS platforms with different software packages. ProteoIQ simplifies data analysis as a vendor-independent software platform supporting qualitative and quantitative analysis.
Learn how to achieve robust peptide and protein quantification
ProteoIQ is the only commercial software platform supporting all popular forms of quantification. Learn how ProteoIQ performs protein and peptide quantification using isobaric tags, isotopic labels and label-free methods, including intensity-based peptide profiling.
Elucidate biological significance
Learn how to integrate biological databases with ProteoIQ. Quickly move from MS results to the discovery of novel biological insights through an integrated biological annotation pipeline.
Prote-OMIC Data Analysis and Visualization, by Dmitry Grapov
Introductory lecture to multivariate analysis of proteomic data.
Material from the UC Davis 2014 Proteomics Workshop.
See more at: http://sourceforge.net/projects/teachingdemos/files/2014%20UC%20Davis%20Proteomics%20Workshop/
Data Normalization Approaches for Large-scale Biological Studies, by Dmitry Grapov
Overview of how to estimate data quality and validate normalization approaches to remove analytical variance.
See here for animations used in the presentation:
http://imdevsoftware.wordpress.com/2014/06/04/using-repeated-measures-to-remove-artifacts-from-longitudinal-data/
Automation of (Biological) Data Analysis and Report Generation, by Dmitry Grapov
I've been experimenting with automating simple and complex data analysis and report-generation tasks for biological data, mostly using R and LaTeX. You can see some of my progress and the challenges encountered.
Metabolomic Data Analysis Workshop and Tutorials (2014), by Dmitry Grapov
Get more information:
http://imdevsoftware.wordpress.com/2014/10/11/2014-metabolomic-data-analysis-and-visualization-workshop-and-tutorials/
Recently I had the pleasure of teaching statistical and multivariate data analysis and visualization at the annual Summer Sessions in Metabolomics 2014, organized by the NIH West Coast Metabolomics Center.
Similar to last year, I've posted all the content (lectures, labs and software) for anyone to follow along with at their own pace. I also plan to release videos for all the lectures and labs.
KnetMiner provides an easy-to-use web interface to visualisation and data mining tools for the discovery and evaluation of candidate genes from large-scale integrations of public and private data sets. It addresses the needs of scientists who generally lack the time and technical expertise to review all relevant information available in the literature, from key model species and from a potentially wide range of related biological databases. We have previously developed genome-scale knowledge networks (GSKNs) for multiple crop and animal species (Hassani-Pak et al. 2016). The KnetMiner web server searches and evaluates millions of relations and concepts within the GSKNs in real time to determine if direct or indirect links between genes and trait-based keywords can be established. KnetMiner accepts as user inputs: search terms in combination with a gene list and/or genomic regions. It produces a table of ranked candidate genes and allows users to explore the output in interactive genome and network map visualisation tools that have been optimised for web use on desktop and mobile devices. The KnetMiner web server and the GSKNs provide a step forward towards systematic and evidence-based gene discovery.
EnrichNet: Graph-based statistic and web-application for gene/protein set enr..., by Enrico Glaab
EnrichNet is a web-application and web-service to identify and visualize functional associations between a user-defined list of genes/proteins and known cellular pathways. As a complement to classical overlap-based enrichment analysis methods, the EnrichNet approach integrates a novel graph-based statistic with a new interactive visualization of network sub-structures to enable a direct molecular interpretation of how a set of genes or proteins is related to a specific cellular pathway. Available at: http://www.enrichnet.org
O.M.GSEA - An in-depth introduction to gene-set enrichment analysis, by Shana White
A comprehensive overview of 'classic' gene-set enrichment analysis that was presented for a Biostatistics/Bioinformatics divisional seminar. Supplemental slides (58+) include details for running GSEA with a variety of options (GUI, R script, R package).
Short tutorials on how to use the web-based tool DAVID (Database for Annotation, Visualization and Integrated Discovery): http://david.abcc.ncifcrf.gov/
DAVID provides a comprehensive set of functional annotation tools for investigators to understand the biological meaning behind large lists of genes.
Full course: https://creativedatasolutions.github.io/CDS.courses/courses/network_mapping_101/docs/
The course covered all of the steps required to go from `raw data` to a rich `mapped biochemical network` incorporating statistical, multivariate and machine learning results. This included [examples](https://creativedatasolutions.github.io/CDS.courses/courses/network_mapping_101/docs/#topics) and tutorials for:
* Preparing raw data for analysis
* Multivariate data exploration
* Supervised clustering
* Machine learning: classification model validation and feature selection
* Network analysis - biochemical, structural similarity and correlation networks
* Network mapping: putting it all together to create a publication-quality network
url:
https://github.com/CreativeDataSolutions/CDS.courses/blob/gh-pages/courses/network_mapping_101/materials/lectures/tutorial.pdf
Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integratio..., by Dmitry Grapov
Machine learning (ML) is being ubiquitously incorporated into everyday products such as Internet search, email spam filters, product recommendations, image classification, and speech recognition. New approaches for highly integrated manufacturing and automation such as the Industry 4.0 and the Internet of things are also converging with ML methodologies. Many approaches incorporate complex artificial neural network architectures and are collectively referred to as deep learning (DL) applications. These methods have been shown capable of representing and learning predictable relationships in many diverse forms of data and hold promise for transforming the future of omics research and applications in precision medicine. Omics and electronic health record data pose considerable challenges for DL. This is due to many factors such as low signal to noise, analytical variance, and complex data integration requirements. However, DL models have already been shown capable of both improving the ease of data encoding and predictive model performance over alternative approaches. It may not be surprising that concepts encountered in DL share similarities with those observed in biological message relay systems such as gene, protein, and metabolite networks. This expert review examines the challenges and opportunities for DL at a systems and biological scale for a precision medicine readership.
current: https://drive.google.com/open?id=0B51AEMfo-fh9M3FmWXVlb05pdm8
I am always looking for the next data science, machine learning and visualization challenge.
Here is a link to my up-to-date resume:
https://drive.google.com/open?id=0B51AEMfo-fh9M3FmWXVlb05pdm8
cv:
https://drive.google.com/open?id=0B51AEMfo-fh9Z05aM2p6XzFIOFE
Case Study: Overview of Metabolomic Data Normalization Strategies, by Dmitry Grapov
Five normalization methods were compared, of which the combination of qc-LOESS and cubic splines showed the best performance based on within-batch and between-batch variable relative standard deviations for QCs. This approach was used to normalize sample measurements, the results of which were analyzed using principal components analysis.
3. Data Normalization (2014 lab tutorial), by Dmitry Grapov
2. Examples
Nature Reviews Genetics 15, 107–120 (2014), doi:10.1038/nrg3643
FBA = flux-balance analysis
• Topological enrichment can give a broad overview of impacted genes, proteins and metabolites
• Changes in biochemical domains corroborated by multi-omic data sets can be used to identify robust candidates responsible for phenotypic variation between comparisons
• Gene-gene, protein-protein or gene-protein interaction networks can be used to deconvolute ambiguous metabolic pathways
4. Biochemical Domain Enrichment Analysis
• Genes/Proteins → DAVID, AmiGO, etc. → GO terms
• Genes/Proteins + Metabolites → IMPaLA: Integrated Molecular Pathway Level Analysis (http://impala.molgen.mpg.de/) → pathways
1. Classify all species into domains (e.g. biological process, pathway, etc.)
2. Calculate the probability of observing changes in species by chance
5. IMPaLA: Gene + Metabolite pathway enrichment
Challenges:
• Removal of redundant information
• Preference for specific vs. generic pathways
• Visualization of gene + metabolite + pathway relationships
6. Determining significance of the enrichment: Hypergeometric Test
How do we calculate a statistic to determine enrichment?
hit.num = 51   # number of significantly changed pathway metabolites
set.num = 1455 # number of metabolites in the pathway
full = 3358    # all possible metabolites in the organism
q.size = 72    # number of significantly changed metabolites
phyper(hit.num - 1, set.num, full - set.num, q.size, lower.tail = FALSE)
# = 1.717553e-06
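The same upper-tail probability can be cross-checked outside R. A minimal Python sketch using exact integer arithmetic, with variable names mirroring the slide's counts:

```python
from math import comb

# Exact hypergeometric upper tail, mirroring the phyper() call above:
# P(X >= 51), where 72 significant metabolites are drawn from 3358 total,
# of which 1455 belong to the pathway of interest.
hit_num, set_num, full, q_size = 51, 1455, 3358, 72

denom = comb(full, q_size)
p = sum(
    comb(set_num, k) * comb(full - set_num, q_size - k)
    for k in range(hit_num, q_size + 1)
) / denom
print(p)  # ~1.7e-06, in line with the R result
```

Because `math.comb` works on exact integers, there is no loss of precision summing the 22 tail terms before the final division.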
7. GO Enrichment analysis: Hierarchy of Redundancy (parents)
• GO is an ontology in which enrichment is often shared by children and parents.
• It is difficult to co-visualize the term hierarchy and the gene-to-term mapping.
8. Enrichment networks: Removing the Hierarchy of Redundancy
Workflow:
1. If two nodes share all genes, drop the least enriched (highest p-value)
2. Filter terms based on enrichment
3. Display term-to-gene/protein relationships as edges in a network
4. Map the direction of change in genes/proteins to network node attributes
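Steps 1-3 of this workflow can be sketched in a few lines of Python. The GO term names, gene sets, p-values and the 0.01 cutoff below are all invented for illustration:

```python
# Toy input: term -> (annotated gene set, enrichment p-value).
terms = {
    "GO:A (parent)": ({"g1", "g2", "g3"}, 1e-3),
    "GO:B (child)": ({"g1", "g2", "g3"}, 1e-5),  # same genes, more enriched
    "GO:C": ({"g4", "g5"}, 5e-2),
}

# Step 1: if two terms share all genes, keep only the most enriched
# (lowest p-value) by scanning terms in order of increasing p.
kept = {}
for name, (genes, p) in sorted(terms.items(), key=lambda kv: kv[1][1]):
    if not any(genes == g for g, _ in kept.values()):
        kept[name] = (genes, p)

# Step 2: filter the surviving terms on an enrichment cutoff.
kept = {t: v for t, v in kept.items() if v[1] < 0.01}

# Step 3: term-to-gene relationships become edges in the network.
edges = [(t, g) for t, (genes, _) in kept.items() for g in genes]
print(sorted(kept))   # ['GO:B (child)']
```

Here the redundant parent term is dropped in favor of its more-enriched child, and the weakly enriched term is filtered out, leaving one term with three gene edges.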
9. Enrichment Network: Mapping of parents through children
A GO enrichment network displays:
• gene names associated with each overrepresented term
• fold change in protein expression between two groups (can be extended to k > 2 groups)
• optionally, the enrichment p-value for each term
• optionally, metabolites incorporated as children of genes
10. Empirical Networks
• Correlation-based networks (CN): simple, but with a tendency to hairball
• GGM or partial-correlation-based networks: advanced, with a preference for direct over indirect relationships
• *Robustness increases with sample size
doi:10.1007/978-1-4614-1689-0_17