An Epigenetics Odyssey @ PyCon Taiwan 2013

PyCon Taiwan 2013
"An Epigenetics Odyssey" by Wen-Wei Liao

Published in: Technology, News & Politics

  1. 1. AN EPIGENETICS ODYSSEY
        Wen-Wei Liao
        wwliao@gate.sinica.edu.tw
  2. 2. Wen-Wei Liao (廖玟崴)
        Work: Pao-Yang Chen’s (陳柏仰) Lab, Academia Sinica
        Education:
          M.S., Dept. of Systems Neuroscience, NTHU
          B.S., Dept. of Life Science, NTHU
        Talks:
          “Use the Matplotlib, Luke,” PyCon Taiwan 2012
          “Matplotlib for Python Programmers,” PyHUG
  3. 3. PyCon Taiwan, Our Lab
  4. 4. Who’s Paul Graham?
  5. 5. “So if you’re a CS major and you want to start a startup, instead of taking a class on entrepreneurship you’re better off taking a class on, say, genetics. Or better still, go work for a biotech company. CS majors normally get summer jobs at computer hardware or software companies. But if you want to find startup ideas, you might do better to get a summer job in some unrelated field.”
        - How to Get Startup Ideas, Paul Graham
  6. 6. Human
  7. 7. Chromosomes (karyotype figure: 1 to 22, X, Y)
  8. 8. Double Helix
  9. 9. 2003: Human Genome Project
        3 billion bases (3 Gb)
  10. 10. Why are Identical Twins different?
  11. 11. DNA Sequences
          ...GATTACACCCATGTCAGTGCG...
          ...CTAATGTGGGTACAGTCACGC...
  12. 12. DNA Sequences
          ATTACACCCATGTCAGTGC
          TAATGTGGGTACAGTCACG
  13. 13. DNA Sequences
          ATTACACCCATGTCAGTGC
          TAATGTGGGTACAGTCACG
          (two bases are marked “m”)
  14. 14. DNA Sequences
          ATTACACCCATGTCAGTGC
          TAATGTGGGTACAGTCACG
          (two bases are marked “m”)
          m = methylation
  15. 15. Epigenetics: Above Genetics
          DNA Methylation
          • Regulation of gene expression
          • Environmental effects
          • Heritable marks
  16. 16. “The new science of epigenetics reveals how the choices you make can change your genes and those of your kids.”
  17. 17. Next-generation Sequencing
          30 Gb (30 billion bases)
          Sonication → Adaptor Ligation → PCR Amplification → Sequencing in Parallel
  18. 18. Next-generation Sequencing
          30 Gb (30 billion bases)
          Sonication → Adaptor Ligation → PCR Amplification → Sequencing in Parallel
          + Bisulfite Conversion
  19. 19. Detecting Cytosine Methylation
               m  m
          GATTACACCCATGT
            ↓ sodium bisulfite, amplify
          GATTACATCTATGT
          1. Apply sodium bisulfite
          2. Unmethylated C → T; methylated C (and A/T/G) unchanged
          3. Align new sequence to known reference and compare
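
The three steps on this slide can be tried on the slide's own sequence. The following is a minimal sketch, not the lab's pipeline: the function names `bisulfite_convert` and `call_methylation` are made up for illustration, and the methylated positions (0-based 5 and 8) are read off the two "m" marks above the sequence.

```python
def bisulfite_convert(seq, methylated):
    """Return seq with every unmethylated C turned into T.

    `methylated` holds the 0-based positions of methylated cytosines,
    which sodium bisulfite leaves unchanged.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference, converted):
    """For each C in the reference, report whether it survived conversion."""
    return {
        i: converted[i] == "C"
        for i, base in enumerate(reference)
        if base == "C"
    }


original = "GATTACACCCATGT"
converted = bisulfite_convert(original, methylated={5, 8})
print(converted)                              # GATTACATCTATGT
print(call_methylation(original, converted))  # {5: True, 7: False, 8: True, 9: False}
```

Comparing the converted sequence with the reference recovers exactly the two marked cytosines, which is the idea behind step 3.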
  20. 20. Analysis Workflow
          Sequence Mapping → Methylation Calling → Statistics & Plotting
          BS Seeker*: mapping of bisulfite-treated reads
          Biopython*: parses bioinformatics files into data structures usable from Python
          PyTables*: manages hierarchical datasets (HDF5); designed to cope efficiently with extremely large amounts of data
          Pysam: Python wrapper of the SAMtools C API; reads and manipulates SAM files
          re: Python built-in module for regular expressions
          NumPy & SciPy: matrix operations, statistics, clustering, and more
          Matplotlib: data visualization
  21. 21. Mapping Approach: BS Seeker
          BS reads are C/T converted, so normal aligners are NOT applicable.
          3-letter alignment algorithm: Chen et al. (2010). BMC Bioinformatics.

          Original:                  BS read      AATCGTA
                                     Ref. genome  CTAATCGCAGG
          1. Convert C to T:         BS read      AATTGTA
                                     Ref. genome  TTAATTGTAGG
          2. Bowtie mapping:         BS read        AATTGTA
                                     Ref. genome  TTAATTGTAGG
          3. Restore to 4 letters,   BS read        AATCGTA
             compare alignment:      Ref. genome  CTAATCGCAGG
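
The convert, map, restore sequence above can be sketched in a few lines. This is only an illustration of the three-letter idea under strong assumptions: exact substring search (`str.find`) stands in for Bowtie, only the forward-strand C to T conversion is handled, and the function names are hypothetical rather than BS Seeker's API.

```python
def to_three_letters(seq):
    """Collapse the alphabet to A/G/T by converting every C to T."""
    return seq.replace("C", "T")


def map_read(read, reference):
    """Return the 0-based offset of the read in 3-letter space, or -1."""
    return to_three_letters(reference).find(to_three_letters(read))


def compare_alignment(read, reference):
    """Map in 3-letter space, then compare the original 4-letter sequences.

    Returns (offset, calls), where calls maps each reference C covered by
    the read to True (read still has C: methylated) or False (read has T).
    """
    offset = map_read(read, reference)
    if offset < 0:
        return None
    calls = {}
    for i, read_base in enumerate(read):
        if reference[offset + i] == "C":
            calls[offset + i] = read_base == "C"
    return offset, calls


offset, calls = compare_alignment("AATCGTA", "CTAATCGCAGG")
print(offset)  # 2
print(calls)   # {5: True, 7: False}
```

Mapping the raw read directly would fail here, because its T at reference position 7 mismatches the reference C; after both sequences are reduced to three letters the read aligns cleanly, and the difference reappears only in the final comparison.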
  22. 22-26. Methylation levels at single-base resolution
                                   TAGTGCGTGGTG
                              CATTTTAGTGCGTGG
                                TTTTAGCGCGTGGTG
          Ref. genome  ATTGAGACATCCTAGCGCGTGGTGACAATAATA
          • Estimate methylation level at each covered C
          • Methylation level = #C / (#C + #T)
          For the two cytosines covered by all three reads:
          1 / (1 + 2) = 33.3%
          3 / (3 + 0) = 100%
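
The formula can be checked against the three reads shown on these slides. In the sketch below the read offsets (12, 7 and 9) are an assumption: the slide's spacing was lost, so they were recovered by matching each read to the reference after C to T conversion. The function name `methylation_levels` is likewise made up for illustration.

```python
from collections import Counter

REFERENCE = "ATTGAGACATCCTAGCGCGTGGTGACAATAATA"

# (0-based offset on the reference, bisulfite-converted read);
# offsets are inferred, not stated on the slide.
ALIGNED_READS = [
    (12, "TAGTGCGTGGTG"),
    (7, "CATTTTAGTGCGTGG"),
    (9, "TTTTAGCGCGTGGTG"),
]


def methylation_levels(reference, aligned_reads):
    """Return {position: #C / (#C + #T)} for every covered reference C."""
    counts = {}
    for offset, read in aligned_reads:
        for i, base in enumerate(read):
            pos = offset + i
            if reference[pos] == "C" and base in "CT":
                counts.setdefault(pos, Counter())[base] += 1
    return {
        pos: c["C"] / (c["C"] + c["T"])
        for pos, c in sorted(counts.items())
    }


levels = methylation_levels(REFERENCE, ALIGNED_READS)
for pos, level in levels.items():
    print(pos, "{:.1%}".format(level))
```

Positions 15 and 17 are the two cytosines covered by all three reads; they come out at 33.3% and 100.0%, matching the slide's arithmetic.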
  27. 27. Biopython
          from Bio import SeqIO

          # load all the records into memory at once
          with open('some.fasta') as infile:
              seq = SeqIO.to_dict(SeqIO.parse(infile, 'fasta'))

          # indexing approach for very large files:
          # provides dictionary-like access to any record
          seq = SeqIO.index('some.fasta', 'fasta')

          FASTA file:
          >chr1
          TTTAATTATCTCTGAAATTTAAACCCCCAAATCCAGGTAATAAAGCAAGGAAATGTCTTACAGCCCAACACTTGCCATCAATACTTTTTCGATGTTA...
          >chr2
          GGCTGCTCTATCCTTTTCTGCACATTTGAACTCCTCCGCTGTGGGCCATTCTCATTTGCTTTACTTCCTAGTCTGAATTCCATGGGAACTGCATTTA...
  28. 28. PyTables
          CGmap file:
          chr1  C  10502  CHG  CT  0.000  0  14
          chr1  G  10504  CHG  CA  0.000  0  29
          chr1  C  10506  CHH  CC  0.000  0  15
          chr1  C  10507  CHG  CT  0.067  1  15
          chr1  G  10509  CHG  CA  0.000  0  30
          chr1  G  10511  CHH  CT  0.000  0  30
          chr1  G  10512  CHH  CC  0.000  0  29
          chr1  G  10514  CHH  CT  0.000  0  28
          chr1  C  10517  CHG  CT  0.000  0  15
          chr1  G  10519  CHG  CA  0.000  0  21
          chr1  G  10521  CHH  CA  0.045  1  22
          ...
          chrY  G  63410  CHH  CT  0.000  0  18
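
A CGmap row can be read into a typed record before it goes anywhere else. The sketch below is an illustration only: the field names follow the `MethylSite` columns used later in the talk, while `chrom`, `CGmapRecord` and `parse_cgmap_line` are hypothetical names introduced here.

```python
from collections import namedtuple

CGmapRecord = namedtuple(
    "CGmapRecord",
    "chrom strand position context dint level mdepth depth",
)


def parse_cgmap_line(line):
    """Split one whitespace-separated CGmap row into typed fields."""
    chrom, strand, position, context, dint, level, mdepth, depth = line.split()
    return CGmapRecord(
        chrom, strand, int(position), context, dint,
        float(level), int(mdepth), int(depth),
    )


record = parse_cgmap_line("chr1 C 10507 CHG CT 0.067 1 15")
print(record.position, record.context, record.level)
# The level column is the methylated depth over the total depth:
print(round(record.mdepth / record.depth, 3))  # 0.067
```

The last line shows how the sixth column relates to the last two: 1 methylated read out of 15 gives the 0.067 printed in the file.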
  29. 29. PyTables
          • Hierarchical Data Format (HDF5)
          / (root)
            cgmap (group)
              chr1, chr2, ..., chrY (tables)
  30. 30. PyTables
          import tables

          class MethylSite(tables.IsDescription):
              strand = tables.StringCol(1)
              position = tables.Int64Col()
              context = tables.StringCol(3)
              dint = tables.StringCol(2)
              level = tables.Float32Col()
              mdepth = tables.Int32Col()
              depth = tables.Int32Col()
  31. 31. PyTables
          import os

          # PyTables 2.x names; PyTables 3.x renames them to
          # open_file, create_group, create_table and _f_get_child
          with open(args.cgmap) as cgmap:
              root = os.path.splitext(os.path.basename(args.cgmap))[0]
              with tables.openFile('{0}.h5'.format(root), 'w', root) as h5file:
                  group = h5file.createGroup('/', 'cgmap')
                  for line in cgmap:
                      line = line.strip().split()
                      # one table per chromosome, created on first use
                      try:
                          table = group._f_getChild(line[0])
                      except tables.NoSuchNodeError:
                          table = h5file.createTable(group, line[0], MethylSite)
                      methylsite = table.row
                      methylsite['strand'] = line[1]
                      methylsite['position'] = int(line[2])
                      methylsite['context'] = line[3]
                      methylsite['dint'] = line[4]
                      methylsite['level'] = float(line[5])
                      methylsite['mdepth'] = int(line[6])
                      methylsite['depth'] = int(line[7])
                      methylsite.append()
                  table.flush()
  37. 37. PyTables
          with tables.openFile(args.h5file, 'a') as h5file:
              # Indexed mode: indexing is a kind of sorting operation over a
              # column, so searches along that column can use the sorted
              # information through a binary search.
              table = h5file.root.cgmap.chr1
              table.cols.position.createIndex()
              table.cols.level.createIndex()
              table.cols.depth.createIndex()

              # In-kernel mode: the condition is passed to the PyTables kernel,
              # written in C, and evaluated there at full C speed.
              condition = "(depth >= 4) & (level >= 0.05)"
              res = [row['position'] for row in table.where(condition)]

              condition = "(position > 1000) & (position < 5000)"
              res = [row['position'] for row in table.where(condition)]
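
The comment about indexing (sort a column, then binary-search it) can be illustrated with the standard library alone. The sketch below shows the general idea only and is not how PyTables implements its indexes; `SortedColumnIndex` and the sample positions are invented for the example.

```python
from bisect import bisect_left, bisect_right


class SortedColumnIndex:
    """Toy column index: values kept sorted alongside their row numbers."""

    def __init__(self, values):
        self._pairs = sorted((v, i) for i, v in enumerate(values))
        self._keys = [v for v, _ in self._pairs]

    def rows_between(self, low, high):
        """Row numbers whose value v satisfies low < v < high."""
        start = bisect_right(self._keys, low)   # first key strictly above low
        stop = bisect_left(self._keys, high)    # first key at or above high
        return sorted(i for _, i in self._pairs[start:stop])


positions = [10502, 4200, 999, 1000, 1001, 5000, 4999, 63410]
index = SortedColumnIndex(positions)

# Same condition as "(position > 1000) & (position < 5000)" above
rows = index.rows_between(1000, 5000)
print(rows)                          # [1, 4, 6]
print([positions[r] for r in rows])  # [4200, 1001, 4999]
```

Building the index costs one sort, after which each range query needs two binary searches instead of a scan over every row.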
  38. 38. “So if you’re a CS major and you want to start a startup, instead of taking a class on entrepreneurship you’re better off taking a class on, say, genetics. Or better still, go work for a biotech company. CS majors normally get summer jobs at computer hardware or software companies. But if you want to find startup ideas, you might do better to get a summer job in some unrelated field.”
          - How to Get Startup Ideas, Paul Graham
  39. 39. join us :)
