Methylation and Expression data integration

Introduction
Methods
Results
Future Directions
Making sense of Methylation & Expression data in Cord blood and Placenta tissues
Sahir Rai Bhatnagar
Greenwood Group Lab Meeting
March 5, 2015

Outline
1 Talk about the data I’m working with
2 Some preliminary results
3 A proposition

Motivation
1 in 4 adult Canadians and 1 in 10 children are clinically obese.
6 million Canadians are at higher risk for type 2 diabetes, high blood pressure, and cardiovascular disease.
Overweight- and obesity-related health care costs ≈ $6 billion, or 4.1% of Canada’s total health care budget.
Events during pregnancy are suspected to play a role in childhood obesity, but the mechanisms involved are not known.
Children born to women who had a pregnancy affected by gestational diabetes mellitus are more likely to be overweight and obese.
Evidence suggests epigenetic factors are an important piece of the puzzle.

Objectives
1 Identify epigenetic marks observed at birth that help predict childhood obesity
2 Determine if these epigenetic changes are associated with specific maternal factors (GD, weight gain during pregnancy)
3 Assess the impact of these epigenetic changes on gene expression levels

The data
Expression: p = 46,889
Methylation (Illumina 450k): p = 375,561
Tissues: Placenta (n = 45), Cord blood (n = 45)
Phenotype:
Gestational Diabetes (binary): n = 45, GD = 29
7 continuous fat measures (child age = 5): n = 23, GD = 16
The links between expression, methylation, and phenotype are marked "??" in the diagram.

Percent Fat and Gestational Age
[Figure 1: Distribution of covariates. Two panels by case (NGT, DG): percentFAT (axis ticks 5 to 20) and Age_gestationnel, i.e. gestational age (axis ticks 38 to 41).]

Child age and Zscore BMI
[Figure 2: Distribution of covariates. Two panels by case (NGT, DG): AgeMois, i.e. child age in months (axis ticks 60 to 90) and ZScoreBMI (axis ticks −1 to 0).]

[Figure 3: Distribution of plis adipeux (skinfold measurements: Tricep, Bicep, Sous_Scapulaire, Iliaque) by case (NGT, DG); y-axis: value.]

[Figure 4: Density plot of mean methylation values for each probe by tissue. Panels: cord, placenta; x-axis from 0.0 to 1.0.]

Adjusting for Cell type mixtures
Motivation
[Diagram relating Methylation in Cord blood & Placenta, Gestational Diabetes, and Cell type mixture; one link is marked "??".]

We perform the adjustment for cell type mixture using SVA (surrogate variable analysis).
Why SVA? See Kevin for details.

Regression forms
Methylation (M) and Expression (E) for Cord blood and Placenta
M or E ∼ Gestational Diabetes + Gestational Age + Cell Mixture   (1)
M or E ∼ Body Fat Measures + Gestational Age + Sex and Age of child + Cell Mixture   (2)
Note: the 7 body fat measures were modelled separately.
Note: n = 45 for model (1), n = 23 for model (2).
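To make concrete how models (1) and (2) are applied one feature at a time, here is a minimal Python sketch of an ordinary least squares fit for a single probe. It is an illustration only, not the pipeline behind these slides (the software used for the fits is not stated). The function names solve and ols and all toy numbers are hypothetical, and the conversion of the t-statistic to a p-value (a t distribution with n − k degrees of freedom) is left out.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(y, covariates):
    """Fit y ~ intercept + covariates for one probe.

    Returns (coefficients, t_statistics); index 0 is the intercept.
    Assumes more samples than coefficients and a non-perfect fit.
    """
    X = [[1.0] + list(row) for row in covariates]
    n, k = len(X), len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
           for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = solve(xtx, xty)
    resid = [y[i] - sum(X[i][a] * beta[a] for a in range(k)) for i in range(n)]
    sigma2 = sum(r * r for r in resid) / (n - k)  # residual variance
    t_stats = []
    for a in range(k):
        unit = [1.0 if j == a else 0.0 for j in range(k)]
        inv_col = solve(xtx, unit)            # column a of (X'X)^-1
        se = (sigma2 * inv_col[a]) ** 0.5     # standard error of beta[a]
        t_stats.append(beta[a] / se)
    return beta, t_stats


# Hypothetical toy data: methylation of one probe in 6 samples,
# covariates are (gestational diabetes 0/1, gestational age).
methylation = [0.50, 0.52, 0.61, 0.66, 0.53, 0.60]
covariates = [(0, 38), (0, 40), (1, 39), (1, 41), (0, 39), (1, 40)]
beta, t_stats = ols(methylation, covariates)
print("GD coefficient:", beta[1], "t-statistic:", t_stats[1])
```

In a full analysis the same fit would be repeated for every probe, with the cell mixture surrogate variables appended as extra covariate columns, and the resulting p-values passed on to the q-value step.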

Reporting Evidence
Evidence is reported in terms of the p-value and the q-value.
The q-value extends the False Discovery Rate (FDR) by giving each feature its own individual measure of significance.
The q-value for a CpG site is the expected proportion of false positives incurred when calling that site significant.
Whereas the p-value is a measure of significance in terms of the false positive rate, the q-value is a measure in terms of the FDR.
Example: if 10 CpG sites with q-values ≤ 5% are called significant in an EWAS, about 5% of them (0.5 sites on average) are expected to be false positives.
The q-value methodology estimates the proportion of features that are truly null, denoted by π0, from the given p-values, whereas the standard FDR methodology assumes π0 = 1.
We calculated the q-values using the qvalue package in R.
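The role of π0 can be illustrated with a short Python sketch. This is a simplified stand-in and not the qvalue package: it estimates π0 from the fraction of p-values above a single cut-off lam (a hypothetical parameter name, set to 0.5 here), whereas the package uses its own estimation procedure, so its numbers will generally differ. Passing pi0=1 reproduces the standard FDR adjustment that assumes all features are null.

```python
def qvalues(pvals, lam=0.5, pi0=None):
    """Return (q-values, pi0) for a list of p-values.

    pi0 is estimated as #{p > lam} / (m * (1 - lam)), capped at 1,
    unless a value is supplied. pi0=1 gives the standard FDR adjustment.
    """
    m = len(pvals)
    if pi0 is None:
        pi0 = min(1.0, sum(p > lam for p in pvals) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvals[i] / rank)
        q[i] = running_min
    return q, pi0


# Hypothetical p-values for 8 CpG sites.
p = [0.01, 0.02, 0.03, 0.04, 0.1, 0.2, 0.7, 0.9]
q_est, pi0_est = qvalues(p)           # pi0 estimated from the data
q_fdr, _ = qvalues(p, pi0=1.0)        # pi0 fixed at 1
print("estimated pi0:", pi0_est)
print("q-values (estimated pi0):", q_est)
print("q-values (pi0 = 1):", q_fdr)
```

Because the q-values scale with π0, an estimate below 1 yields smaller q-values than the π0 = 1 adjustment for the same p-values.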

Methylation ∼ Gestational Diabetes
Methylation ∼ Body Fat measures
Gene Expression ∼ Body Fat measures
Cord blood and Placenta
Table 1: Number of differentially methylated CpG sites in cord blood and placenta DNA samples from newborns with or without exposure to gestational diabetes mellitus, for unadjusted, age-adjusted, and age- and cell-mixture-adjusted models at different p- and q-value thresholds.
Threshold    1 × 10^-3        0.01       0.025        0.05        0.10
Criteria         p   q       p   q       p   q       p   q       p   q
Cord blood
Unadj          389   0   3,961   0  10,321   0  21,620   0  44,988   4
Age            253   1   2,648   1   6,904   1  14,457   1  31,250   1
Cell-adj       575   1   4,150   1   9,531   3  18,365   5  36,100   9
Placenta
Unadj          260   0   2,520   0   6,437   0  13,445   0  28,571   0
Age            259   0   2,492   0   6,493   0  13,425   0  28,692   0
Cell-adj       451   0   3,368   1   7,997   2  15,919   6  32,333   7

Cord blood
Table 2: Number of significant CpG sites out of 229,550, restricted to probes with mean methylation values between 10% and 90%.
Threshold                  0.01       0.001   1 × 10^-4   1 × 10^-5
Primary Covariate         p    q      p    q      p    q      p    q
BMI                   11881  300   2492   44    644   16    192    7
BMI.latent             3713   93    924   26    305   11    124    7
ZScoreBMI              5558   12    987    2    204    0     64    0
ZScoreBMI.latent       3686   43    789    8    243    1     90    0
bicep                  1843    0    295    0     73    0     20    0
bicep.latent           6154  202   1651   56    558   34    208   15
iliaque                2815    6    509    1    117    0     33    0
iliaque.latent         3580   40    791    5    227    4     81    1
percentFAT             1533   40    318   17    117    7     59    4
percentFAT.latent      3498   95    914   45    288   27    122   14
scap                   4947   27    842    8    178    2     48    0
scap.latent            3899   72    882    9    254    2     98    2
tricep                 3965   32    775   12    187    8     65    2
tricep.latent          5172  115   1302   41    387   14    150    7

Placenta
Table 3: Number of significant CpG sites out of 229,533, restricted to probes with mean methylation values between 10% and 90%.
Threshold                  0.01       0.001   1 × 10^-4   1 × 10^-5
Primary Covariate         p    q      p    q      p    q      p    q
BMI                    5052   98   1164   32    365   12    134    7
BMI.latent             7043  339   2006   90    704   33    293   13
ZScoreBMI              4168   33    935   13    254    1     82    0
ZScoreBMI.latent       5515  138   1425   50    451   17    167   12
bicep                  2275    7    436    0     80    0     19    0
bicep.latent           3431   40    745    7    200    0     76    0
iliaque                9235  326   2304   77    686   13    262    6
iliaque.latent         5415   98   1419   40    453   13    143    7
percentFAT             2154   91    546   51    219   38    116   29
percentFAT.latent      3185  101    797   37    280   17    124    8
scap                   6699  181   1647   48    484   14    183    7
scap.latent            6698  317   1934  104    673   43    289   18
tricep                 3221   86    781   30    269   12    122    4
tricep.latent          4136   88   1016   25    310   13    120    7

Gene Expression Results
Threshold                  0.05        0.01   1 × 10^-3   1 × 10^-4   1 × 10^-5
Primary Covariate         p    q      p    q      p    q      p    q      p    q
BMI                    3579    0    938    0    157    0     27    0      2    0
BMI.latent             3281    4    921    0    153    0     36    0      5    0
ZScoreBMI              3680    0    994    0    151    0     24    0      3    0
ZScoreBMI.latent       3570    2   1043    1    194    0     34    0      5    0
bicep                  4262   15   1294    0    220    0     41    0     12    0
bicep.latent           3010    0    777    0    123    0     30    0      5    0
iliaque                2518    0    495    0     39    0      4    0      0    0
iliaque.latent         2665    0    618    0     96    0     18    0      6    0
percentFAT             2892    0    579    0     67    0     10    0      0    0
percentFAT.latent      2862    6    754    1    123    0     21    0      6    0
scap                   2886    0    687    0     92    0     21    0      3    0
scap.latent            2780    3    736    0    119    0     21    0      5    0
tricep                 3335    6    848    0    124    0     18    0      6    0
tricep.latent          2896    3    728    1    137    0     30    0      6    0

Overlap Analysis
PLS Path Modeling
Ad-hoc Overlap Analysis
Table 4: Number of overlapping locations within 5 kb that are significant at p < 10^-3 in the placenta, for both expression and methylation data.
Primary Covariate    # of Overlapping locations
BMI                   3
BMI.latent            6
ZScoreBMI             5
ZScoreBMI.latent     10
bicep                 4
bicep.latent          1
iliaque               1
iliaque.latent        1
percentFAT            1
percentFAT.latent     3
scap                  3
scap.latent           4
tricep                2
tricep.latent         1
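One way such an overlap count could be obtained is sketched below in Python. The slides do not say how a "location" is defined or how the matching was done, so this rests on assumptions: each significant expression probe and each significant CpG site is reduced to a single (chromosome, position) pair, and two features overlap when they lie on the same chromosome at most 5,000 bp apart. The function name overlapping_pairs and the coordinates are hypothetical.

```python
from bisect import bisect_left, bisect_right


def overlapping_pairs(expr_hits, cpg_hits, window=5000):
    """Return (expression_index, cpg_index) pairs within `window` bp.

    expr_hits and cpg_hits are lists of (chromosome, position) tuples
    for features already filtered to the chosen p-value threshold.
    """
    # Group CpG sites by chromosome and sort them by position.
    by_chrom = {}
    for j, (chrom, pos) in enumerate(cpg_hits):
        by_chrom.setdefault(chrom, []).append((pos, j))
    index = {}
    for chrom, sites in by_chrom.items():
        sites.sort()
        index[chrom] = ([p for p, _ in sites], [j for _, j in sites])

    pairs = []
    for i, (chrom, pos) in enumerate(expr_hits):
        positions, ids = index.get(chrom, ([], []))
        lo = bisect_left(positions, pos - window)   # first site >= pos - window
        hi = bisect_right(positions, pos + window)  # one past last site <= pos + window
        pairs.extend((i, ids[k]) for k in range(lo, hi))
    return pairs


# Hypothetical coordinates for significant features.
expr_hits = [("chr1", 10000), ("chr2", 500)]
cpg_hits = [("chr1", 14000), ("chr1", 16000), ("chr2", 500), ("chr3", 1)]
pairs = overlapping_pairs(expr_hits, cpg_hits)
print(len(pairs), "overlapping pairs:", pairs)
```

Whether pairs, distinct genes, or distinct CpG sites are counted changes the totals, so the counting rule would need to match the one used for Table 4.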

PLS Path Modeling
[Path diagram with three blocks and their example indicators]
Expression: Gene 1, Gene 2, Gene 3
Methylation: CpG 1, CpG 2, CpG 3
Fat: BMI, % Fat, Bicep
