
Methylation and Expression data integration

Presentation to my lab on my current project.

Published in: Science

  1. Making sense of Methylation & Expression data in Cord blood and Placenta tissues. Sahir Rai Bhatnagar (Greenwood Group Lab Meeting), March 5, 2015. Sections: Introduction, Methods, Results, Future Directions.
  2. Outline: (1) Talk about the data I'm working with; (2) Some preliminary results; (3) A proposition.
  3. Motivation: 1 in 4 adult Canadians and 1 in 10 children are clinically obese. 6 million Canadians are at higher risk for type 2 diabetes, high blood pressure, and cardiovascular disease. Overweight- and obesity-related health care costs are ≈ $6 billion, or 4.1% of Canada's total health care budget. Events during pregnancy are suspected to play a role in childhood obesity, but the mechanisms involved are unknown. Children born to women whose pregnancy was affected by gestational diabetes mellitus are more likely to be overweight or obese. Evidence suggests that epigenetic factors are an important piece of the puzzle.
  6. Research question(s) and objectives: (1) Identify epigenetic marks observed at birth that help predict childhood obesity; (2) Determine whether these epigenetic changes are associated with specific maternal factors (gestational diabetes (GD), weight gain during pregnancy); (3) Assess the impact of these epigenetic changes on gene expression levels.
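  13. The data. Expression: p = 46,889. Methylation (Illumina 450k): p = 375,561. Tissues: placenta (n = 45) and cord blood (n = 45). Phenotype: gestational diabetes (binary), n = 45, of whom GD = 29; 7 continuous fat measures at child age 5, n = 23, of whom GD = 16. [Diagram of the three data types (Expression, Methylation, Phenotype) with three links marked "??".]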
  14. Percent Fat and Gestational Age. [Figure 1: Distribution of covariates. Panels: percentFAT (axis ≈ 5 to 20) and Age_gestationnel (gestational age, axis ≈ 38 to 41), each by case group: NGT (normal glucose tolerance) vs DG (gestational diabetes).]
  15. Child age and Z-score BMI. [Figure 2: Distribution of covariates. Panels: AgeMois (child age in months, axis ≈ 60 to 90) and ZScoreBMI, each by case group, NGT vs DG.]
  16. [Figure 3: Distribution of skinfold thicknesses (plis adipeux) at the Tricep, Bicep, Sous_Scapulaire (subscapular) and Iliaque (iliac) sites, by case group, NGT vs DG.]
  17. Mean methylation values for each probe, by tissue. [Figure 4: Density plot of mean methylation values for each probe, by tissue (cord blood, placenta); x-axis: mean methylation, 0.0 to 1.0.]
  19. Motivation (adjusting for cell type mixtures). [Diagram: Methylation in Cord blood & Placenta, Gestational Diabetes, and Cell type mixture, with a link marked "??".]
  22. Motivation: We adjust for cell type mixture using SVA (surrogate variable analysis). Why SVA? See Kevin for details.
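A rough sketch of the idea behind SVA, assuming a probe-by-sample matrix and a single primary variable: regress each probe on the primary variable, then take the leading right singular vector of the residual matrix as a first surrogate variable. This is a heavily simplified stand-in, not the algorithm in the Bioconductor `sva` package, which also estimates the number of surrogate variables and iteratively reweights probes. The function names are hypothetical, and pure Python is far too slow for 375,561 probes; it only shows the logic.

```python
import random


def residualize(y, x):
    """Residuals of y after least-squares regression on an intercept and x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def first_surrogate_variable(data, x, iters=200, seed=0):
    """Leading right singular vector of the probe-by-sample residual matrix R.

    data: list of probes, each a list of n per-sample values (e.g. M-values).
    x:    primary variable of interest, length n (e.g. GD coded 0/1).
    Computed by power iteration on the n x n matrix R^T R (hypothetical helper)."""
    n = len(x)
    resid = [residualize(row, x) for row in data]
    s = [[sum(r[i] * r[j] for r in resid) for j in range(n)] for i in range(n)]
    # Random start: the residuals are orthogonal to the intercept and to x,
    # so a start vector lying in that span would be mapped to zero.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(s[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(wi * wi for wi in w) ** 0.5
        if norm == 0.0:
            raise ValueError("residual matrix is zero")
        v = [wi / norm for wi in w]
    return v
```

In this simplified picture, the returned vector would be added as an extra covariate standing in for the Cell Mixture term of the regression models.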
  23. Regression forms: methylation (M) and expression (E) for cord blood and placenta.
      (1) M or E ∼ Gestational Diabetes + Gestational Age + Cell Mixture
      (2) M or E ∼ Body Fat Measure + Gestational Age + Sex and Age of child + Cell Mixture
      Note: the 7 body fat measures were modelled separately; n = 45 for model (1) and n = 23 for model (2).
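Models (1) and (2) are fitted separately for every probe or transcript. A minimal per-feature sketch, assuming all covariates are already numeric columns (GD coded 0/1, the cell-mixture term represented by whatever adjustment columns are passed in, such as surrogate variables): ordinary least squares with an intercept, then a two-sided t-test on the primary covariate. The helper names are hypothetical; a real analysis would more likely call R's `lm` than hand-rolled linear algebra.

```python
import math


def t_two_sided_p(t, df):
    """Two-sided p-value P(|T| >= |t|) for Student's t with integer df >= 1,
    using the closed-form finite series available for integer degrees of freedom."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    total, term = 0.0, 1.0
    if df % 2 == 0:
        for j in range((df - 2) // 2 + 1):
            total += term * c ** (2 * j)
            term *= (2 * j + 1) / (2 * j + 2)
        a = s * total
    else:
        for j in range((df - 3) // 2 + 1):
            total += term * c ** (2 * j + 1)
            term *= (2 * j + 2) / (2 * j + 3)
        a = (2.0 / math.pi) * (theta + s * total)
    return min(1.0, max(0.0, 1.0 - a))  # a = P(|T| <= |t|)


def solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_probe(y, covariates):
    """OLS fit of y ~ 1 + covariates[0] + covariates[1] + ...

    covariates[0] is the primary covariate (GD status or one body fat measure);
    the rest are adjustment columns (gestational age, sex, child age, cell mixture).
    Returns (coefficient, two-sided p-value) for the primary covariate."""
    n = len(y)
    cols = [[1.0] * n] + [list(c) for c in covariates]
    k = len(cols)
    xtx = [[sum(u * v for u, v in zip(cols[a], cols[b])) for b in range(k)] for a in range(k)]
    xty = [sum(u * v for u, v in zip(cols[a], y)) for a in range(k)]
    beta = solve(xtx, xty)
    rss = sum((y[i] - sum(beta[a] * cols[a][i] for a in range(k))) ** 2 for i in range(n))
    df = n - k
    sigma2 = rss / df
    # Column 1 of (X^T X)^{-1}: its entry 1 is the variance factor of the primary covariate.
    inv_col = solve(xtx, [1.0 if a == 1 else 0.0 for a in range(k)])
    se = math.sqrt(sigma2 * inv_col[1])
    return beta[1], t_two_sided_p(beta[1] / se, df)
```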
  24. Reporting evidence. Evidence is reported in terms of p-values and q-values. The q-value extends the false discovery rate (FDR) by giving each feature its own measure of significance: the q-value for a CpG site is the expected proportion of false positives incurred when calling that site significant. Whereas the p-value measures significance in terms of the false positive rate, the q-value measures it in terms of the FDR. Example: if 20 CpG sites with q-values ≤ 5% are called significant in an EWAS, we expect no more than about 1 of these 20 sites (5%) to be a false positive. The q-value method estimates, from the given p-values, the proportion of features that are truly null, denoted π0, whereas the standard (Benjamini–Hochberg) FDR procedure assumes π0 = 1. We calculated the q-values using the qvalue package in R.
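The q-value calculation can be sketched as follows, assuming the simplest form of Storey's estimator: π0 is estimated at a single fixed λ = 0.5 as #{p > λ} / (m(1 − λ)), capped at 1, and each q-value is the running minimum of π0·m·p(i)/i taken from the largest p-value down. The qvalue package smooths the π0 estimate over a grid of λ values by default, so its numbers will differ somewhat; with π0 = 1 this reduces to Benjamini–Hochberg adjusted p-values. The function name is hypothetical.

```python
def qvalues(pvals, lam=0.5):
    """Storey-style q-values with pi0 estimated at one fixed lambda.

    Simplification: no smoothing of pi0 over a grid of lambda values.
    Returns (list of q-values in the input order, pi0 estimate)."""
    m = len(pvals)
    pi0 = min(1.0, sum(p > lam for p in pvals) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvals[i] / rank)
        q[i] = running_min
    return q, pi0
```

Sites called significant at the 5% level would then be those with `q[i] <= 0.05`.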
  25. Methylation ∼ Gestational Diabetes: cord blood and placenta.
      Table 1: The number of differentially methylated CpG sites in cord blood and placenta DNA samples from newborns with or without exposure to gestational diabetes mellitus, for unadjusted, age-adjusted, and age- and cell-mixture-adjusted models, at different p- and q-value thresholds. Each cell gives the count by p-value / by q-value.

      Threshold        1×10⁻³       0.01         0.025        0.05         0.10
      Criterion        p / q        p / q        p / q        p / q        p / q
      Cord blood
        Unadj          389 / 0      3,961 / 0    10,321 / 0   21,620 / 0   44,988 / 4
        Age            253 / 1      2,648 / 1    6,904 / 1    14,457 / 1   31,250 / 1
        Cell-adj       575 / 1      4,150 / 1    9,531 / 3    18,365 / 5   36,100 / 9
      Placenta
        Unadj          260 / 0      2,520 / 0    6,437 / 0    13,445 / 0   28,571 / 0
        Age            259 / 0      2,492 / 0    6,493 / 0    13,425 / 0   28,692 / 0
        Cell-adj       451 / 0      3,368 / 1    7,997 / 2    15,919 / 6   32,333 / 7
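Each cell of Table 1 counts CpG sites below a threshold, once using the p-value and once using the q-value. A small sketch of that tabulation, assuming lists of p- and q-values are already available and that "significant" means strictly below the threshold (the slides do not say whether the cutoff is strict); the function name is hypothetical.

```python
def count_significant(pvals, qvals, thresholds):
    """For each threshold t, return (#sites with p < t, #sites with q < t)."""
    return {
        t: (sum(p < t for p in pvals), sum(q < t for q in qvals))
        for t in thresholds
    }
```

For Table 1 the thresholds would be `[1e-3, 0.01, 0.025, 0.05, 0.10]`, run once per tissue and model.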
  28. Methylation ∼ Body Fat measures: cord blood.
      Table 2: Number of significant CpG sites out of 229,550, restricted to probes with mean methylation values between 10% and 90%. Each cell gives the count by p-value / by q-value.

      Threshold          0.01         0.001        1×10⁻⁴       1×10⁻⁵
      Primary covariate  p / q        p / q        p / q        p / q
      BMI                11881 / 300  2492 / 44    644 / 16     192 / 7
      BMI.latent         3713 / 93    924 / 26     305 / 11     124 / 7
      ZScoreBMI          5558 / 12    987 / 2      204 / 0      64 / 0
      ZScoreBMI.latent   3686 / 43    789 / 8      243 / 1      90 / 0
      bicep              1843 / 0     295 / 0      73 / 0       20 / 0
      bicep.latent       6154 / 202   1651 / 56    558 / 34     208 / 15
      iliaque            2815 / 6     509 / 1      117 / 0      33 / 0
      iliaque.latent     3580 / 40    791 / 5      227 / 4      81 / 1
      percentFAT         1533 / 40    318 / 17     117 / 7      59 / 4
      percentFAT.latent  3498 / 95    914 / 45     288 / 27     122 / 14
      scap               4947 / 27    842 / 8      178 / 2      48 / 0
      scap.latent        3899 / 72    882 / 9      254 / 2      98 / 2
      tricep             3965 / 32    775 / 12     187 / 8      65 / 2
      tricep.latent      5172 / 115   1302 / 41    387 / 14     150 / 7
  29. Methylation ∼ Body Fat measures: placenta.
      Table 3: Number of significant CpG sites out of 229,533, restricted to probes with mean methylation values between 10% and 90%. Each cell gives the count by p-value / by q-value.

      Threshold          0.01         0.001        1×10⁻⁴       1×10⁻⁵
      Primary covariate  p / q        p / q        p / q        p / q
      BMI                5052 / 98    1164 / 32    365 / 12     134 / 7
      BMI.latent         7043 / 339   2006 / 90    704 / 33     293 / 13
      ZScoreBMI          4168 / 33    935 / 13     254 / 1      82 / 0
      ZScoreBMI.latent   5515 / 138   1425 / 50    451 / 17     167 / 12
      bicep              2275 / 7     436 / 0      80 / 0       19 / 0
      bicep.latent       3431 / 40    745 / 7      200 / 0      76 / 0
      iliaque            9235 / 326   2304 / 77    686 / 13     262 / 6
      iliaque.latent     5415 / 98    1419 / 40    453 / 13     143 / 7
      percentFAT         2154 / 91    546 / 51     219 / 38     116 / 29
      percentFAT.latent  3185 / 101   797 / 37     280 / 17     124 / 8
      scap               6699 / 181   1647 / 48    484 / 14     183 / 7
      scap.latent        6698 / 317   1934 / 104   673 / 43     289 / 18
      tricep             3221 / 86    781 / 30     269 / 12     122 / 4
      tricep.latent      4136 / 88    1016 / 25    310 / 13     120 / 7
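Tables 2 and 3 restrict the analysis to probes whose mean methylation lies between 10% and 90%. A sketch of that filter, assuming beta values on the 0 to 1 scale stored per probe and inclusive bounds (the slides do not specify whether the bounds are inclusive); the function name and data layout are hypothetical.

```python
def filter_by_mean(beta_values, low=0.10, high=0.90):
    """Keep probes whose mean beta value across samples lies in [low, high].

    beta_values: dict mapping probe ID -> list of per-sample beta values (0 to 1)."""
    return {
        probe: values
        for probe, values in beta_values.items()
        if low <= sum(values) / len(values) <= high
    }
```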
  32. Gene Expression ∼ Body Fat measures: gene expression results. Each cell gives the count by p-value / by q-value.

      Threshold          0.05        0.01        1×10⁻³      1×10⁻⁴      1×10⁻⁵
      Primary covariate  p / q       p / q       p / q       p / q       p / q
      BMI                3579 / 0    938 / 0     157 / 0     27 / 0      2 / 0
      BMI.latent         3281 / 4    921 / 0     153 / 0     36 / 0      5 / 0
      ZScoreBMI          3680 / 0    994 / 0     151 / 0     24 / 0      3 / 0
      ZScoreBMI.latent   3570 / 2    1043 / 1    194 / 0     34 / 0      5 / 0
      bicep              4262 / 15   1294 / 0    220 / 0     41 / 0      12 / 0
      bicep.latent       3010 / 0    777 / 0     123 / 0     30 / 0      5 / 0
      iliaque            2518 / 0    495 / 0     39 / 0      4 / 0       0 / 0
      iliaque.latent     2665 / 0    618 / 0     96 / 0      18 / 0      6 / 0
      percentFAT         2892 / 0    579 / 0     67 / 0      10 / 0      0 / 0
      percentFAT.latent  2862 / 6    754 / 1     123 / 0     21 / 0      6 / 0
      scap               2886 / 0    687 / 0     92 / 0      21 / 0      3 / 0
      scap.latent        2780 / 3    736 / 0     119 / 0     21 / 0      5 / 0
      tricep             3335 / 6    848 / 0     124 / 0     18 / 0      6 / 0
      tricep.latent      2896 / 3    728 / 1     137 / 0     30 / 0      6 / 0
  35. Ad-hoc Overlap Analysis.
      Table 4: Number of overlapping locations (within 5 kb) that are significant at p < 10⁻³ in the placenta, for both the expression and the methylation data.

      Primary covariate  # of overlapping locations
      BMI                3
      BMI.latent         6
      ZScoreBMI          5
      ZScoreBMI.latent   10
      bicep              4
      bicep.latent       1
      iliaque            1
      iliaque.latent     1
      percentFAT         1
      percentFAT.latent  3
      scap               3
      scap.latent        4
      tricep             2
      tricep.latent      1
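One way to do the ad-hoc overlap count in Table 4, under stated assumptions: each significant gene is represented by a single coordinate (for example its transcription start site), each significant CpG by its position, and a gene counts as an "overlapping location" if at least one significant CpG on the same chromosome lies within 5 kb. The slides do not define the counting rule, so this is a guess at it rather than the exact procedure used; the function name is hypothetical.

```python
import bisect
from collections import defaultdict


def count_overlaps(cpgs, genes, window=5000):
    """Count genes with at least one CpG within `window` bp on the same chromosome.

    cpgs, genes: lists of (chromosome, position) tuples for the features that
    passed the p < 1e-3 cutoff in the methylation and expression analyses."""
    by_chrom = defaultdict(list)
    for chrom, pos in cpgs:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()
    count = 0
    for chrom, pos in genes:
        positions = by_chrom.get(chrom, [])
        # First CpG at or after pos - window; it overlaps if it is <= pos + window.
        i = bisect.bisect_left(positions, pos - window)
        if i < len(positions) and positions[i] <= pos + window:
            count += 1
    return count
```

Sorting the CpG positions once per chromosome keeps each gene lookup at O(log n), which matters with hundreds of thousands of probes.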
  36. PLS Path Modeling. [Diagram with three blocks: Expression (Gene 1, Gene 2, Gene 3), Methylation (CpG 1, CpG 2, CpG 3) and Fat (BMI, % Fat, Bicep).]
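A minimal sketch of PLS path modeling for the three blocks in the diagram, assuming Wold's iterative algorithm with Mode A outer weights and the centroid inner scheme, plus a hypothetical path structure (Methylation → Expression, and Methylation + Expression → Fat); the slide does not state the direction of the paths. Dedicated software such as the R package plspm would be the practical choice; this only shows the mechanics, and all function names are hypothetical.

```python
import math


def _standardize(v):
    """Center to mean 0 and scale to unit (population) variance."""
    n = len(v)
    mean = sum(v) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in v) / n)
    return [(x - mean) / sd for x in v]


def _corr(a, b):
    """Pearson correlation of two equal-length lists."""
    za, zb = _standardize(a), _standardize(b)
    return sum(x * y for x, y in zip(za, zb)) / len(a)


def _lv_scores(weights, std, n):
    """Latent-variable scores: standardized weighted sums of each block's indicators."""
    return {
        b: _standardize([sum(w * col[i] for w, col in zip(weights[b], std[b])) for i in range(n)])
        for b in std
    }


def pls_pm(blocks, adjacency, max_iter=300, tol=1e-10):
    """Wold's PLS-PM with Mode A outer weights and the centroid inner scheme.

    blocks:    dict block name -> list of indicator columns (each a list of n values)
    adjacency: set of frozensets {a, b} for blocks connected in the path diagram;
               every block must have at least one neighbour.
    Returns (scores, weights)."""
    std = {b: [_standardize(col) for col in cols] for b, cols in blocks.items()}
    n = len(next(iter(std.values()))[0])
    weights = {b: [1.0] * len(cols) for b, cols in std.items()}
    for _ in range(max_iter):
        scores = _lv_scores(weights, std, n)
        new_weights = {}
        for b in std:
            # Inner estimate (centroid scheme): neighbours' scores signed by correlation.
            z = [0.0] * n
            for c in std:
                if c != b and frozenset((b, c)) in adjacency:
                    sign = 1.0 if _corr(scores[b], scores[c]) >= 0 else -1.0
                    z = [zi + sign * si for zi, si in zip(z, scores[c])]
            # Mode A: weight = covariance of each indicator with the inner estimate.
            raw = [sum(col[i] * z[i] for i in range(n)) / n for col in std[b]]
            # Rescale so that the block's score has unit variance.
            y = [sum(w * col[i] for w, col in zip(raw, std[b])) for i in range(n)]
            scale = math.sqrt(sum(v * v for v in y) / n)
            new_weights[b] = [w / scale for w in raw]
        delta = max(abs(a - w) for b in std for a, w in zip(new_weights[b], weights[b]))
        weights = new_weights
        if delta < tol:
            break
    return _lv_scores(weights, std, n), weights


def path_coefficients(scores):
    """Standardized OLS path coefficients for the hypothetical model
    Methylation -> Expression and (Methylation, Expression) -> Fat."""
    r_me = _corr(scores["Methylation"], scores["Expression"])
    r_mf = _corr(scores["Methylation"], scores["Fat"])
    r_ef = _corr(scores["Expression"], scores["Fat"])
    denom = 1.0 - r_me ** 2
    return {
        ("Methylation", "Expression"): r_me,
        ("Methylation", "Fat"): (r_mf - r_ef * r_me) / denom,
        ("Expression", "Fat"): (r_ef - r_mf * r_me) / denom,
    }
```

With the diagram's labels, `blocks["Methylation"]` would hold the CpG 1 to 3 columns, `blocks["Expression"]` the Gene 1 to 3 columns, and `blocks["Fat"]` the BMI, % Fat and Bicep columns.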
