Integrative causality analysis of genetic, epigenetic, and
transcriptomic data in a large cohort
Rosemary McCloskey and Sara Mostafavi
rmcclosk.math@gmail.com
http://slideshare.net/rmcclosk/omics-integration
March 27, 2015
R. McCloskey & S. Mostafavi () Omics data integration March 27, 2015 1 / 12
Motivation
genetic, epigenetic, and transcriptomic data provide snapshots of
cellular processes
usually one data type is studied at a time, in relation to a phenotype
or disease
[Figure: genotype (GATTACA), gene expression, methylation, and histone acetylation, with a "?" between them]
how do these data fit together?
The data
large cohort designed
to study cognitive
decline and
Alzheimer’s disease
genotype, gene
expression, DNA
methylation, and
histone acetylation
(ChIP-seq) data
392 individuals with
all four data types
were used for this
analysis
[Figure: four-set Venn diagram of individuals with expression, methylation, acetylation, and genotype data; region counts as extracted: 2, 19, 1080, 0, 3, 392, 152, 20, 0, 1, 40, 61, 47, 17, 11]
Quantitative trait loci (QTLs)
a QTL is a genetic locus
correlated with a
phenotype
we are interested in
QTLs for gene
expression (eQTLs),
histone acetylation
(aceQTLs), and
methylation (meQTLs)
QTLs provide a tool to
study interaction
between other molecular
phenotypes
[Figure: expression, acetylation, and methylation plotted against genotype (0, 1, 2)]
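As a rough illustration of what "correlated with a phenotype" can mean in practice, the sketch below computes Spearman's ρ (the statistic named on the Identifying QTLs slide) between a genotype coded 0/1/2 and a molecular phenotype. The function names and data values are invented for illustration and do not come from the cohort.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the run while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# hypothetical data: genotype as minor-allele count, expression as a z-score
genotype = [0, 0, 1, 1, 1, 2, 2, 2]
expression = [-1.2, -0.4, -0.1, 0.3, 0.2, 0.9, 1.5, 1.1]
rho = spearman_rho(genotype, expression)
print(round(rho, 3))
```

A rank-based statistic is a natural fit here because genotype takes only three values and the phenotype need not be linear in allele count.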
Identifying QTLs
↓ SNPs in 200 kb window, Spearman’s ρ
↓ Holm-Bonferroni correction, best SNP per feature
↓ FDR correction
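The correction steps on this slide can be sketched as follows. For each feature, the p-values of the SNPs in its window are Holm-Bonferroni adjusted and the best SNP is kept; the per-feature best p-values are then corrected across features. Benjamini-Hochberg is used for the FDR step purely as a stand-in, since the slides do not say which FDR procedure was applied (the qvalue package is listed under Software). Feature names, SNP IDs, p-values, and the 0.05 threshold are all invented.

```python
def holm_adjust(pvalues):
    """Holm-Bonferroni adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # the k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1
        running_max = max(running_max, (m - rank) * pvalues[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def best_snp_per_feature(window_pvalues):
    """Map each feature to (best SNP, Holm-adjusted p-value within its window)."""
    best = {}
    for feature, snp_p in window_pvalues.items():
        snps = list(snp_p)
        adj = holm_adjust([snp_p[s] for s in snps])
        j = min(range(len(snps)), key=lambda i: adj[i])
        best[feature] = (snps[j], adj[j])
    return best


# hypothetical Spearman p-values for the SNPs within each feature's window
window_pvalues = {
    "geneA": {"rs1": 1e-6, "rs2": 0.02, "rs3": 0.4},
    "geneB": {"rs4": 0.03, "rs5": 0.5},
    "geneC": {"rs6": 0.2},
}
best = best_snp_per_feature(window_pvalues)
features = list(best)
fdr = bh_adjust([best[f][1] for f in features])
qtls = {f: best[f][0] for f, q in zip(features, fdr) if q <= 0.05}
print(qtls)
```

Correcting within each window first means a feature with many nearby SNPs is not favoured over a feature with few when the best p-values are compared across features.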
Removing Principal Components
technical, environmental,
and biological covariates
can swamp out QTL
effects
correct by removing
principal components
number of peaks with a
QTL plateaus at 10 PCs,
while genes and CpGs
continue to increase
for this analysis, removed
10 PCs from all data
[Figure: panels for genes, peaks, and CpGs, each plotted against PCs removed (0 to 20)]
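One common way to "remove" principal components is to subtract the rank-k reconstruction given by the top k singular vectors of the centred data matrix. The sketch below does this with NumPy on simulated data. The slides do not give the exact procedure used, so the function name, the samples × features layout, and the simulated confounder are all assumptions.

```python
import numpy as np


def remove_top_pcs(data, k):
    """Return data (samples x features) with its top-k principal components removed."""
    centered = data - data.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # rank-k reconstruction from the k largest singular values
    top_k = (u[:, :k] * s[:k]) @ vt[:k, :]
    return centered - top_k


rng = np.random.default_rng(0)
n_samples, n_features = 50, 200
# simulated confounder (for example a batch effect) shared across all features
batch = rng.normal(size=(n_samples, 1))
loadings = rng.normal(size=(1, n_features))
noise = rng.normal(size=(n_samples, n_features))
data = 5.0 * batch @ loadings + noise

residual = remove_top_pcs(data, k=1)
print(data.var(), residual.var())
```

In this toy setting the first component captures the shared confounder, so the residual variance drops sharply. With real data the choice of k is a trade-off, which is why the slide tracks the number of QTLs found as more PCs are removed.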
Identifying multi-QTLs
By intersecting QTL sets, found
240 gene, CpG, and peak triples
which shared the same QTL
[Figure: Venn diagram of eQTL, meQTL, and aceQTL sets; region counts as extracted: 2984, 1799, 50981, 127, 240, 1604, 2129]
Also assessed QTL overlap using π0 approach
[Figure: pairwise overlap among eQTLs, aceQTLs, and meQTLs; values as extracted: 100 %, 46 %, 14 %, 31 %, 100 %, 11 %, 83 %, 84 %, 100 %]
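Intersecting QTL sets can be read as grouping features by their best SNP and keeping the SNPs that appear in all three sets. The sketch below does that, and also shows a Storey-style estimate, π0(λ) = #{p > λ} / (m(1 − λ)), as one plausible reading of the "π0 approach". The slides do not spell out the estimator or the value of λ, so both are assumptions, and all identifiers and p-values are invented.

```python
from itertools import product


def multi_qtl_triples(eqtl, meqtl, aceqtl):
    """Return (gene, CpG, peak, SNP) tuples whose best SNP is shared by all three."""
    def by_snp(feature_to_snp):
        groups = {}
        for feature, snp in feature_to_snp.items():
            groups.setdefault(snp, []).append(feature)
        return groups

    genes, cpgs, peaks = by_snp(eqtl), by_snp(meqtl), by_snp(aceqtl)
    shared = sorted(set(genes) & set(cpgs) & set(peaks))
    return [
        (g, c, p, snp)
        for snp in shared
        for g, c, p in product(genes[snp], cpgs[snp], peaks[snp])
    ]


def pi0_estimate(pvalues, lam=0.5):
    """Storey-style estimate of the proportion of true null hypotheses."""
    m = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))


# hypothetical best-SNP assignments for each data type
eqtl = {"gene1": "rs10", "gene2": "rs20"}
meqtl = {"cg01": "rs10", "cg02": "rs10", "cg03": "rs30"}
aceqtl = {"peak7": "rs10", "peak8": "rs20"}
triples = multi_qtl_triples(eqtl, meqtl, aceqtl)
print(triples)

# hypothetical p-values of eQTL SNPs when tested against acetylation peaks
p_in_other_type = [0.001, 0.004, 0.02, 0.03, 0.2, 0.6, 0.7, 0.9]
pi1 = 1.0 - pi0_estimate(p_in_other_type)
print(pi1)
```

Here π1 = 1 − π0 estimates the share of QTLs from one data type that also carry signal in another, without requiring them to pass a significance threshold in both.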
Bayesian networks
Bayesian networks are directed graphical models, where the directed
edges represent causal relationships
We use conditional Gaussian networks
Score = likelihood of data given network
[Figure: temperature → precipitation]
temp ∼ N(0, 1), precip | temp ∼ N(temp, 1)
Observed: temp = 0.7, precip = 0.5
Score = Pr(N(0, 1) = 0.7) × Pr(N(0.7, 1) = 0.5)
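The score on this slide is the product Pr(N(0, 1) = 0.7) × Pr(N(0.7, 1) = 0.5), which the sketch below evaluates by reading each Pr(·) as a normal density. For comparison it also scores a network with no edge, where precipitation is taken to be N(0, 1) regardless of temperature. That second network is an assumption added for illustration and is not on the slide.

```python
import math


def normal_pdf(x, mean, sd=1.0):
    """Density of a normal distribution N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


temp, precip = 0.7, 0.5

# network with edge temperature -> precipitation: precip | temp ~ N(temp, 1)
score_edge = normal_pdf(temp, 0.0) * normal_pdf(precip, temp)

# network with no edge (assumed here): precip ~ N(0, 1) regardless of temp
score_no_edge = normal_pdf(temp, 0.0) * normal_pdf(precip, 0.0)

print(score_edge, score_no_edge)
```

With these two observations the network containing the edge scores higher, because 0.5 is closer to 0.7 than to 0. Structure learning applies the same comparison across all samples and all candidate networks.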
Networks for QTLs
Used the deal and CGBayesNets packages to construct one Bayesian network
for each multi-QTL by exhaustive search
With deal, edges into genotype were blacklisted
Most common network structure was independence
Accounted for 42% of deal networks, 29% of CGBayesNets networks
[Figure: network over genotype, expression, acetylation, and methylation]
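With only four nodes, exhaustive search is feasible because the set of candidate structures is small. The sketch below enumerates every directed acyclic graph over the four variables, with and without a blacklist on edges into genotype. It illustrates the search space only: it does not call deal or CGBayesNets, and the function names are hypothetical.

```python
from itertools import combinations

NODES = ("genotype", "expression", "acetylation", "methylation")


def is_acyclic(edges):
    """Kahn's algorithm: a graph is a DAG if every node can be removed in turn."""
    indegree = {n: 0 for n in NODES}
    for _, child in edges:
        indegree[child] += 1
    ready = [n for n in NODES if indegree[n] == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for parent, child in edges:
            if parent == node:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
    return removed == len(NODES)


def candidate_networks(blacklist=()):
    """Enumerate every DAG over NODES that avoids the blacklisted edges."""
    allowed = [
        (a, b) for a in NODES for b in NODES
        if a != b and (a, b) not in blacklist
    ]
    dags = []
    for size in range(len(allowed) + 1):
        for edges in combinations(allowed, size):
            if is_acyclic(edges):
                dags.append(edges)
    return dags


# forbid edges pointing into genotype
into_genotype = {(n, "genotype") for n in NODES if n != "genotype"}
all_dags = candidate_networks()
restricted = candidate_networks(blacklist=into_genotype)
print(len(all_dags), len(restricted))
```

An exhaustive search would score each candidate against the data and keep the best one. The empty edge set, which corresponds to the independence structure mentioned above, is one of the candidates.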
Future Work
Expand the number of multi-QTLs
More than just the best SNP per feature
Identify overlapping QTLs intelligently
More rigorous criterion for number of PCs to remove
Try other packages for network learning (HyPhy)
Are QTLs enriched in SNPs identified in GWAS studies?
Correlations with phenotype (cognitive decline etc.)
Thank you!
Harvard / Broad: Philip L. De Jager, Lori Chibnik, Jishu Xu, Charles White, Cristin McCabe, Towfique Raj
Rush: David A Bennett, Chris Gaiteri, Lei Yu
Bioinformatics Training Program: All the students, Sharon Ruschkowski
Software
QTL analysis: Matrix eQTL, qvalue
Bayesian networks: deal, CGBayesNets
Slides: beamer, TikZ, tikzDevice
Plots: pheatmap, ggplot2, VennDiagram
Colour Scheme: solarized
