DMRL: A versatile Python tool to assess DNA methylation variation and identify DMRs @ PyCon APAC 2014

PyCon APAC 2014
"DMRL: A versatile Python tool to assess DNA methylation variation and identify DMRs" by Wen-Wei Liao

Published in: Data & Analytics, Technology

  1. A versatile Python tool to assess DNA methylation variation and identify DMRs. Wen-Wei Liao, tw.linkedin.com/in/wwliao/
  2. “So if you're a CS major and you want to start a startup, instead of taking a class on entrepreneurship you're better off taking a class on, say, genetics. Or better still, go work for a biotech company. CS majors normally get summer jobs at computer hardware or software companies. But if you want to find startup ideas, you might do better to get a summer job in some unrelated field.” (Paul Graham, "How to Get Startup Ideas")
  3. Man
  4. Cells
  5. Chromosomes: 1 to 22, X, Y
  6. Human Genome Project (2003): 3 billion bases (3 Gb). Photo courtesy of Shutterstock.
  7. Do we really HACK the genome?
  8. Why are identical twins different? Identical twins, who carry the same sequence of DNA nucleotides, often develop different physical characteristics, behaviors, and even tendencies to disease.
  9. DNA sequence: ...GATTACACCCATGTCAGTGCGA... / ...CTAATGTGGGTACAGTCACGCT...
  10. DNA sequence with methylation marks (m = methylation): ATTACACCCATGTCAGTGCGA / TAATGTGGGTACAGTCACGCT
  11. DNA methylation • 5 position of cytosine • Regulation of gene expression • Environmental effect • Heritable mark
  12. The 5th base: methylated cytosine. Lister, R. & Ecker, J. R. Finding the fifth base: genome-wide sequencing of cytosine methylation. Genome Res. 19, 959–966 (2009).
  13. Improvements in the rate of DNA sequencing over the past 30 years (figure annotations: Human Genome Project, next-generation sequencing). Stratton, M. R., Campbell, P. J. & Futreal, P. A. The cancer genome. Nature 458, 719–724 (2009).
  14. How does next-generation sequencing work? DNA extraction → fragmentation & amplification → massively parallel sequencing
  15. Bisulfite sequencing: treatment of DNA with sodium bisulfite • unmethylated cytosines are converted to uracils • methylated cytosines are unaffected • after PCR amplification, unmethylated cytosines appear as thymines and methylated cytosines appear as cytosines. In short: C → T, mC → C
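
The conversion rule on slide 15 (C → T, mC → C) can be simulated in a few lines. This is a minimal sketch, not code from DMRL: the function name `bisulfite_convert` and the `methylated_positions` argument are made up for illustration, and only a single strand is modeled. The fragment and the resulting read are taken from the alignment slides that follow.

```python
def bisulfite_convert(fragment, methylated_positions):
    """Simulate bisulfite treatment followed by PCR on one DNA strand.

    Unmethylated C is read as T; methylated C stays C.
    `methylated_positions` holds 0-based indices of methylated cytosines
    (a hypothetical input format chosen for this sketch).
    """
    methylated = set(methylated_positions)
    converted = []
    for index, base in enumerate(fragment):
        if base == "C" and index not in methylated:
            converted.append("T")  # unmethylated C -> U -> T after PCR
        else:
            converted.append(base)  # methylated C and all other bases unchanged
    return "".join(converted)


# First seven bases of the sample fragment, with the C at index 1 methylated.
read = bisulfite_convert("TCGCCGA", methylated_positions=[1])
print(read)  # TCGTTGA
```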
  16. Bisulfite sequence alignment. Reference genome: CCGATGATGTCGCCGACGCA. Sample fragment: TCGCCGACGCAC (highlighted C = methylated C).
  17. Bisulfite sequencing of the sample fragment yields the bisulfite read TCGTTGA. Where the reference has C, the read may show either C or T.
  18. Aligning the bisulfite read directly against the reference is marked as failing (X).
  19. Bisulfite sequence alignment, restated. Bisulfite read: TCGTTGA. Reference genome: CCGATGATGTCGCCGACGCA. Sample fragment: TCGCCGACGCAC.
  20. Convert the bisulfite read to three letters: TCGTTGA becomes TTGTTGA.
  21. Convert the reference genome to three letters as well: CCGATGATGTCGCCGACGCA becomes TTGATGATGTTGTTGATGTA.
  22. The three-letter read TTGTTGA now matches the three-letter reference (match!).
  23. The three-letter match places the original bisulfite read TCGTTGA on the original reference genome.
  24. Comparing the aligned read with the reference: a reference C read as C is methylated; a reference C read as T is unmethylated.
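
The three-letter trick from slides 20 to 24 can be sketched as follows. This is an illustration under simplifying assumptions, not BS-Seeker2's implementation: it uses exact substring search, only the forward strand, and takes the first hit, whereas a real aligner tolerates mismatches and handles both strands. The function names are hypothetical.

```python
def to_three_letter(sequence):
    """Collapse C into T so methylation status cannot cause mismatches."""
    return sequence.replace("C", "T")


def align_bisulfite_read(read, reference):
    """Align a bisulfite read in three-letter space, then call methylation.

    Returns (position, calls) where `calls` maps 0-based reference
    positions of covered cytosines to "methylated" or "unmethylated",
    or None if the read does not match anywhere.
    """
    position = to_three_letter(reference).find(to_three_letter(read))
    if position == -1:
        return None
    calls = {}
    for offset, read_base in enumerate(read):
        if reference[position + offset] != "C":
            continue  # only cytosines in the reference are informative
        if read_base == "C":
            calls[position + offset] = "methylated"
        elif read_base == "T":
            calls[position + offset] = "unmethylated"
    return position, calls


reference = "CCGATGATGTCGCCGACGCA"
print(align_bisulfite_read("TCGTTGA", reference))
# (9, {10: 'methylated', 12: 'unmethylated', 13: 'unmethylated'})
```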
  25. Quantification of DNA methylation at single-base resolution. Reference genome: CCGATGATGTCGCCGACGCA. Bisulfite reads: TCGTTGA, plus further overlapping reads (C G T T G A C T G T C G C T T T G T T G A) • Estimate the methylation level at each covered C • Methylation level = #C / (#C + #T)
  26. Example: 3 / (3 + 1) = 75%
  27. Example: 1 / (1 + 3) = 25%
  28. Example: 0 / (0 + 4) = 0%
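
The formula on slides 25 to 28 translates directly into code. A minimal sketch: the function name and the choice to pass the observed bases as a string are assumptions made for this example. Bases other than C and T are ignored, and an uncovered site returns `None` rather than a level.

```python
def methylation_level(observed_bases):
    """Methylation level = #C / (#C + #T) at one reference cytosine.

    `observed_bases` is a string of the bases that aligned reads show
    at that position, for example "CCCT" for four covering reads.
    """
    num_c = observed_bases.count("C")
    num_t = observed_bases.count("T")
    if num_c + num_t == 0:
        return None  # site not covered by any informative read
    return num_c / (num_c + num_t)


# The three worked examples from the slides.
print(methylation_level("CCCT"))  # 0.75
print(methylation_level("CTTT"))  # 0.25
print(methylation_level("TTTT"))  # 0.0
```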
  29. Correlation between DNA methylation and cancer. Robertson, K. D. DNA methylation and human disease. Nat. Rev. Genet. 6, 597–610 (2005).
  30. Statistical testing for differential DNA methylation at genomic regions, comparing normal and cancer samples. Tests: t-test, ANOVA, Fisher's exact test.
  31–35. Animation builds of slide 30 with the same text.
  36. Same slide, with one region labeled as a differentially methylated region (DMR).
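
Of the three tests named on these slides, Fisher's exact test is the easiest to show end to end. Below is a standard-library sketch of the two-sided test on a 2x2 table of methylated and unmethylated read counts in two samples. It is not DMRL's code, and the counts are invented for illustration; in practice `scipy.stats.fisher_exact` (SciPy is among the talk's listed dependencies) does the same job.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Rows are samples (e.g. normal, cancer); columns are methylated and
    unmethylated read counts pooled over a region.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denominator = comb(row1 + row2, col1)

    def probability(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the
    # observed one; the small tolerance guards against float rounding.
    return sum(
        probability(x)
        for x in range(lowest, highest + 1)
        if probability(x) <= observed * (1 + 1e-7)
    )


# Hypothetical region: normal has 3 methylated / 1 unmethylated reads,
# cancer has 1 methylated / 3 unmethylated reads.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))  # 0.4857
```

With so few reads the difference is not significant, which is one reason region-level tests pool counts over many cytosines.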
  37. Heterogeneity
  38. Hamming distance. Pairwise comparison of three reads (1 vs 2, 1 vs 3, 2 vs 3) gives Hamming distances of 2, 1, and 3. Average Hamming distance = (2 + 1 + 3) / 3 = 2. Tsai, A. G. et al. Heterogeneity and randomness of DNA methylation patterns in human embryonic stem cells. DNA Cell Biol. 31, 893–907 (2012).
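
The average pairwise Hamming distance on slide 38 can be computed as below. This is a sketch, not DMRL's code: reads are encoded as strings of "1" (methylated) and "0" (unmethylated) per CpG site, which is an encoding chosen for this example, and the three patterns are hypothetical ones picked so that the pairwise distances come out as 2, 1, and 3.

```python
from itertools import combinations


def hamming(pattern_a, pattern_b):
    """Number of CpG sites at which two methylation patterns differ."""
    if len(pattern_a) != len(pattern_b):
        raise ValueError("patterns must cover the same CpG sites")
    return sum(x != y for x, y in zip(pattern_a, pattern_b))


def average_hamming(patterns):
    """Mean Hamming distance over all unordered pairs of reads."""
    pairs = list(combinations(patterns, 2))
    if not pairs:
        return 0.0  # fewer than two reads: no pair to compare
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


# Hypothetical reads 1, 2, 3 over four CpG sites.
reads = ["1010", "0110", "1011"]
print([hamming(a, b) for a, b in combinations(reads, 2)])  # [2, 1, 3]
print(average_hamming(reads))  # 2.0
```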
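  39. Methylation entropy, illustrated with reads 1, 2, 3. Xie, H. et al. Genome-wide quantitative assessment of variation in DNA methylation patterns. Nucleic Acids Research 39, 4099–4108 (2011).
  40. Three measures compared across four example methylation patterns (columns, left to right; ML, H, and ME presumably stand for methylation level, Hamming distance, and methylation entropy):
        ML   0   1   0.5    0.5
        H    0   0   2      2.13
        ME   0   0   0.25   1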
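
A sketch of methylation entropy, under a stated assumption: the formula used here is the Shannon entropy (in bits) of the observed read-pattern frequencies divided by the number of CpG sites, which is how the measure in the cited Xie et al. paper is commonly summarized. It is not taken from DMRL's source, and the example reads are hypothetical, using the same "1"/"0" string encoding per CpG site.

```python
from collections import Counter
from math import log2


def methylation_entropy(patterns):
    """Shannon entropy of read patterns, normalized by CpG site count.

    Assumed formula: ME = (1 / b) * sum(-(n_i / N) * log2(n_i / N)),
    where b is the number of CpG sites, n_i the count of pattern i,
    and N the total number of reads.
    """
    if not patterns:
        raise ValueError("at least one read is required")
    num_reads = len(patterns)
    num_sites = len(patterns[0])
    counts = Counter(patterns)
    entropy_bits = sum(
        -(count / num_reads) * log2(count / num_reads)
        for count in counts.values()
    )
    return entropy_bits / num_sites


# All reads identical: no variation.
print(methylation_entropy(["1111"] * 4))  # 0.0
# Two patterns, equally frequent, over four sites.
print(methylation_entropy(["0000", "0000", "1111", "1111"]))  # 0.25
# Every possible four-site pattern seen once: maximal variation.
print(methylation_entropy([format(i, "04b") for i in range(16)]))  # 1.0
```

Under this assumed formula, the bimodal and fully random examples share a methylation level of 0.5 but differ in entropy, which is the kind of contrast the table on slide 40 draws.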
  41. Analysis workflow. BS-Seeker2 maps raw reads to the reference (BAM) and calls methylation (CGmap). DMRL then assesses heterogeneity • Hamming distance • methylation entropy, producing a variation profile, and identifies DMRs • t-test • ANOVA • Fisher's exact test, producing a DMR list. Dependencies: pysam, NumPy, SciPy, matplotlib, pandas. Guo, W. et al. BS-Seeker2: a versatile aligning pipeline for bisulfite sequencing data. BMC Genomics 14, 774 (2013).
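
To show how the CGmap step could feed a region-level test, here is a standard-library sketch that reads CGmap-style lines and pools counts over a region. The column layout is an assumption (tab-separated: chromosome, nucleotide, position, context, dinucleotide context, methylation level, methylated count, total count) and should be checked against the BS-Seeker2 documentation for your version. The record type, function names, and sample lines are all hypothetical, and this is not how DMRL itself is implemented.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class CGmapRecord(NamedTuple):
    chrom: str
    nucleotide: str
    position: int
    context: str
    dinucleotide: str
    level: float
    methylated: int
    total: int


def parse_cgmap(lines: Iterable[str]) -> Iterator[CGmapRecord]:
    """Yield one record per non-empty line (assumed eight-column layout)."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        f = line.split("\t")
        yield CGmapRecord(f[0], f[1], int(f[2]), f[3], f[4],
                          float(f[5]), int(f[6]), int(f[7]))


def region_counts(records: Iterable[CGmapRecord], chrom: str,
                  start: int, end: int) -> Tuple[int, int]:
    """Pool (methylated, unmethylated) read counts over an inclusive region."""
    methylated = total = 0
    for record in records:
        if record.chrom == chrom and start <= record.position <= end:
            methylated += record.methylated
            total += record.total
    return methylated, total - methylated


# Invented example lines, not real data.
sample = [
    "chr1\tC\t10\tCG\tCG\t0.75\t3\t4\n",
    "chr1\tC\t13\tCG\tCG\t0.25\t1\t4\n",
    "chr2\tC\t10\tCG\tCG\t0.00\t0\t4\n",
]
print(region_counts(parse_cgmap(sample), "chr1", 1, 20))  # (4, 4)
```

The pooled counts from two samples form the 2x2 table that a Fisher's exact test expects.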
  42. Correlation between DMR and gene expression
  43. Repeat of the Paul Graham quote from slide 2.
  44. Thank you :)
