An Epigenetics Odyssey @ PyCon Taiwan 2013

"An Epigenetics Odyssey" by Wen-Wei Liao

Presentation Transcript

  • AN EPIGENETICS ODYSSEY. Wen-Wei Liao, wwliao@gate.sinica.edu.tw
  • Wen-Wei Liao (廖玟崴)
      • Work: Pao-Yang Chen’s (陳柏仰) Lab, Academia Sinica
      • Education: M.S., Dept. of Systems Neuroscience, NTHU; B.S., Dept. of Life Science, NTHU
      • Talks: “Use the Matplotlib, Luke,” PyCon Taiwan 2012; “Matplotlib for Python Programmers,” PyHUG
  • PyCon Taiwan / Our Lab
  • Who’s Paul Graham?
  • “So if you're a CS major and you want to start a startup, instead of taking a class on entrepreneurship you're better off taking a class on, say, genetics. Or better still, go work for a biotech company. CS majors normally get summer jobs at computer hardware or software companies. But if you want to find startup ideas, you might do better to get a summer job in some unrelated field.” (How to Get Startup Ideas, Paul Graham)
  • Human
  • Chromosomes: 1 to 22, X, Y
  • Double Helix
  • 2003: Human Genome Project. 3 billion bases (3 Gb)
  • Why are Identical Twins different?
  • DNA Sequences (two complementary strands): ...GATTACACCCATGTCAGTGCG... / ...CTAATGTGGGTACAGTCACGC...
  • DNA Sequences: ATTACACCCATGTCAGTGC / TAATGTGGGTACAGTCACG, with individual bases marked m (m = methylation)
  • Epigenetics: “Above Genetics”. DNA Methylation:
      • Regulation of gene expression
      • Environmental effects
      • Heritable marks
  • “The new science of epigenetics reveals how the choices you make can change your genes and those of your kids.”
  • Next-generation Sequencing: 30 Gb (30 billion bases). Sonication → Adaptor Ligation → PCR Amplification → Sequencing in Parallel. Bisulfite sequencing adds a Bisulfite Conversion step to this pipeline.
  • Detecting Cytosine Methylation. GATTACACCCATGT (m on the 1st and 3rd C) → sodium bisulfite, amplify → GATTACATCTATGT
      1. Apply sodium bisulfite
      2. Unmethylated C → T; methylated C (and A/T/G) unchanged
      3. Align new sequence to known reference and compare
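
The three steps above can be sketched in a few lines of Python. This is a toy model rather than anything from the talk: `bisulfite_convert` and `call_methylation` are hypothetical helper names, and the methylated positions `{5, 8}` (0-based) are read off the slide's example by comparing the sequence before and after treatment.

```python
def bisulfite_convert(seq, methylated):
    """Turn every unmethylated C into T; methylated C and A/T/G are unchanged.

    `methylated` is a set of 0-based positions of methylated cytosines
    (a hypothetical input format chosen for this sketch).
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference, converted):
    """Compare a converted sequence with its reference, position by position.

    Returns {position: True/False} for every C in the reference:
    True if the C survived conversion (methylated), False if it reads as T.
    """
    return {
        i: converted[i] == "C"
        for i, base in enumerate(reference)
        if base == "C"
    }


reference = "GATTACACCCATGT"
converted = bisulfite_convert(reference, methylated={5, 8})
calls = call_methylation(reference, converted)
print(converted)
print(calls)
```

Real reads also carry sequencing errors and come from both strands, which this sketch ignores.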
  • Analysis Workflow: Sequence Mapping → Methylation Calling → Statistics & Plotting
      • BS Seeker*: mapping of bisulfite-treated reads
      • Biopython*: parse bioinformatics files into data structures usable from Python
      • PyTables*: manage hierarchical datasets (HDF5); designed to cope efficiently with extremely large amounts of data
      • Pysam: Python wrapper of the SAMtools C API; read and manipulate SAM files
      • re: Python built-in module for regular expressions
      • NumPy & SciPy: matrix operations, statistics, clustering, and more
      • Matplotlib: data visualization
  • Mapping Approach: BS Seeker. BS reads are C/T converted, so normal aligners are NOT applicable. Three-letter alignment algorithm: Chen et al. (2010). BMC Bioinformatics.
      • Steps: convert C to T → Bowtie mapping → restore to 4 letters → compare alignment
      • BS read: AATCGTA → AATTGTA
      • Ref. genome: CTAATCGCAGG → TTAATTGTAGG
      • Mapping: AATTGTA aligns within TTAATTGTAGG; restored to 4 letters, AATCGTA is compared against CTAATCGCAGG
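
A minimal sketch of the three-letter idea, using the read and reference from the slide. It is not BS Seeker's implementation: exact substring search with `str.find` stands in for the Bowtie mapping step, only the forward strand is handled, and `three_letter` and `map_bs_read` are hypothetical names.

```python
def three_letter(seq):
    """Collapse the alphabet to A/G/T by converting every C to T."""
    return seq.replace("C", "T")


def map_bs_read(read, genome):
    """Map a bisulfite read in three-letter space, then compare in four letters.

    Returns (offset, calls), where calls maps each reference C covered by
    the read to True (read shows C: methylated) or False (read shows T).
    Returns None if the converted read does not occur in the converted genome.
    """
    # Exact search is an assumption of this sketch; a real aligner
    # tolerates mismatches and checks that the hit is unique.
    offset = three_letter(genome).find(three_letter(read))
    if offset < 0:
        return None
    calls = {}
    window = genome[offset:offset + len(read)]
    for i, (ref_base, read_base) in enumerate(zip(window, read)):
        if ref_base == "C" and read_base in "CT":
            calls[offset + i] = read_base == "C"
    return offset, calls


result = map_bs_read("AATCGTA", "CTAATCGCAGG")
print(result)
```

The read could not be found by a plain four-letter search, because its T at reference position 7 faces a C; collapsing both sequences to three letters removes exactly that kind of mismatch.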
  • Methylation levels at single-base resolution

        Read 1                  TAGTGCGTGGTG
        Read 2             CATTTTAGTGCGTGG
        Read 3               TTTTAGCGCGTGGTG
        Ref. genome ATTGAGACATCCTAGCGCGTGGTGACAATAATA

      • Estimate methylation level at each covered C
      • Methylation level = #C / (#C + #T)
      • First C of CGCG (one read shows C, two show T): 1 / (1 + 2) = 33.3%
      • Second C of CGCG (all three reads show C): 3 / (3 + 0) = 100%
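
The formula can be checked against the slide's own reads with a short sketch. The read offsets (12, 7 and 9, 0-based) are an assumption inferred by aligning each read to the reference in three-letter space, and `methylation_levels` is a hypothetical helper, not part of BS Seeker.

```python
from collections import defaultdict

REF = "ATTGAGACATCCTAGCGCGTGGTGACAATAATA"

# (0-based offset in REF, read sequence); offsets inferred for this sketch
ALIGNED_READS = [
    (12, "TAGTGCGTGGTG"),
    (7, "CATTTTAGTGCGTGG"),
    (9, "TTTTAGCGCGTGGTG"),
]


def methylation_levels(ref, aligned_reads):
    """Return {position: #C / (#C + #T)} for every covered C in ref."""
    counts = defaultdict(lambda: [0, 0])  # position -> [#C, #T]
    for offset, read in aligned_reads:
        for i, base in enumerate(read):
            pos = offset + i
            if ref[pos] != "C":
                continue  # only reference cytosines are informative
            if base == "C":
                counts[pos][0] += 1
            elif base == "T":
                counts[pos][1] += 1
    return {
        pos: n_c / (n_c + n_t)
        for pos, (n_c, n_t) in sorted(counts.items())
        if n_c + n_t > 0
    }


levels = methylation_levels(REF, ALIGNED_READS)
for pos, level in levels.items():
    print(pos, "{:.1%}".format(level))
```

Positions 15 and 17 are the two cytosines of CGCG and reproduce the 33.3% and 100% shown on the slide.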
  • Biopython

        from Bio import SeqIO

        # load all the records into memory at once
        with open("some.fasta") as infile:
            seq = SeqIO.to_dict(SeqIO.parse(infile, "fasta"))

        # indexing approach for very large files:
        # provides dictionary-like access to any record
        seq = SeqIO.index("some.fasta", "fasta")

    FASTA file:

        >chr1
        TTTAATTATCTCTGAAATTTAAACCCCCAAATCCAGGTAATAAAGCAAGGAAATGTCTTACAGCCCAACACTTGCCATCAATACTTTTTCGATGTTA...
        >chr2
        GGCTGCTCTATCCTTTTCTGCACATTTGAACTCCTCCGCTGTGGGCCATTCTCATTTGCTTTACTTCCTAGTCTGAATTCCATGGGAACTGCATTTA...
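
To show what "dictionary-like access to any record" amounts to, here is a standard-library sketch that builds a plain dict from FASTA text. It is not Biopython's implementation and returns bare strings rather than record objects; `read_fasta` is a hypothetical name, and the demo input uses only the first bases of the slide's two records.

```python
import io


def read_fasta(handle):
    """Parse FASTA text into {record_id: sequence}.

    The record id is the first word after '>'; sequence lines belonging
    to a record are concatenated.
    """
    chunks = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {name: "".join(parts) for name, parts in chunks.items()}


demo = io.StringIO(">chr1\nTTTAATTATC\nTCTGAAATTT\n>chr2\nGGCTGCTCTA\n")
seqs = read_fasta(demo)
print(seqs["chr1"])
```

Like the first Biopython variant above, this holds every sequence in memory, which is the cost the indexing approach avoids for very large files.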
  • PyTables: CGmap file

        chrom  strand  position  context  dint  level  mdepth  depth
        chr1   C       10502     CHG      CT    0.000  0       14
        chr1   G       10504     CHG      CA    0.000  0       29
        chr1   C       10506     CHH      CC    0.000  0       15
        chr1   C       10507     CHG      CT    0.067  1       15
        chr1   G       10509     CHG      CA    0.000  0       30
        chr1   G       10511     CHH      CT    0.000  0       30
        chr1   G       10512     CHH      CC    0.000  0       29
        chr1   G       10514     CHH      CT    0.000  0       28
        chr1   C       10517     CHG      CT    0.000  0       15
        chr1   G       10519     CHG      CA    0.000  0       21
        chr1   G       10521     CHH      CA    0.045  1       22
        .      .       .         .        .     .      .       .
        chrY   G       63410     CHH      CT    0.000  0       18
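
A small sketch of reading one CGmap line into a typed record. The field names follow the `MethylSite` columns used later in the talk, plus `chrom` for the first column, which is a label chosen here; `MethylRecord` and `parse_cgmap_line` are hypothetical names. In the rows shown, the level column agrees with mdepth / depth (for example 1 / 15 ≈ 0.067).

```python
from collections import namedtuple

MethylRecord = namedtuple(
    "MethylRecord",
    "chrom strand position context dint level mdepth depth",
)


def parse_cgmap_line(line):
    """Split a whitespace-separated CGmap line and convert numeric fields."""
    chrom, strand, position, context, dint, level, mdepth, depth = line.split()
    return MethylRecord(
        chrom, strand, int(position), context, dint,
        float(level), int(mdepth), int(depth),
    )


rec = parse_cgmap_line("chr1  C  10507  CHG  CT  0.067  1  15")
print(rec)
print(round(rec.mdepth / rec.depth, 3))
```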
  • PyTables: Hierarchical Data Format (HDF5). Layout: root (/) → group cgmap → tables chr1, chr2, ..., chrY
  • PyTables

        import tables

        class MethylSite(tables.IsDescription):
            strand = tables.StringCol(1)
            position = tables.Int64Col()
            context = tables.StringCol(3)
            dint = tables.StringCol(2)
            level = tables.Float32Col()
            mdepth = tables.Int32Col()
            depth = tables.Int32Col()
  • PyTables

        import os
        import tables

        # args: parsed command-line arguments (not shown on the slide)
        # PyTables >= 3.0 names (2.x: openFile, createGroup, createTable, _f_getChild)
        with open(args.cgmap) as cgmap:
            root = os.path.splitext(os.path.basename(args.cgmap))[0]
            with tables.open_file("{0}.h5".format(root), "w", title=root) as h5file:
                group = h5file.create_group("/", "cgmap")
                for line in cgmap:
                    line = line.strip().split()
                    # one table per chromosome, created the first time it is seen
                    try:
                        table = group._f_get_child(line[0])
                    except tables.NoSuchNodeError:
                        table = h5file.create_table(group, line[0], MethylSite)
                    methylsite = table.row
                    methylsite["strand"] = line[1]
                    methylsite["position"] = int(line[2])
                    methylsite["context"] = line[3]
                    methylsite["level"] = float(line[5])
                    methylsite["mdepth"] = int(line[6])
                    methylsite["depth"] = int(line[7])
                    methylsite.append()
                # flush the last table; closing the file flushes the others
                table.flush()
  • PyTables

        # PyTables >= 3.0 names (2.x: openFile, createIndex)
        with tables.open_file(args.h5file, "a") as h5file:
            # indexed mode. Indexing is just a kind of sorting operation
            # over a column, so that searches along such a column will look
            # at this sorted information by using a binary search
            table = h5file.root.cgmap.chr1
            table.cols.position.create_index()
            table.cols.level.create_index()
            table.cols.depth.create_index()

            # in-kernel mode, the condition is passed to the PyTables kernel,
            # written in C, and evaluated there at full C speed
            condition = "(depth >= 4) & (level >= 0.05)"
            res = [row["position"] for row in table.where(condition)]

            condition = "(position > 1000) & (position < 5000)"
            res = [row["position"] for row in table.where(condition)]
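
The comment about indexing describes a general idea: once a column is sorted, a range condition can be answered by binary search instead of a full scan. The sketch below illustrates that idea with the standard-library `bisect` module on the positions from the CGmap example. It is not how PyTables stores or searches its indexes, and `range_query` is a hypothetical name.

```python
import bisect

# chr1 positions from the CGmap example, already in sorted order
positions = [10502, 10504, 10506, 10507, 10509, 10511,
             10512, 10514, 10517, 10519, 10521]


def range_query(sorted_values, low, high):
    """Return all values v with low < v < high using two binary searches."""
    start = bisect.bisect_right(sorted_values, low)   # first index with v > low
    stop = bisect.bisect_left(sorted_values, high)    # first index with v >= high
    return sorted_values[start:stop]


hits = range_query(positions, 10506, 10514)
print(hits)
```

Each lookup costs O(log n) comparisons plus the size of the result, which is why an index pays off on tables with millions of rows.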
  • “So if you're a CS major and you want to start a startup, instead of taking a class on entrepreneurship you're better off taking a class on, say, genetics. Or better still, go work for a biotech company. CS majors normally get summer jobs at computer hardware or software companies. But if you want to find startup ideas, you might do better to get a summer job in some unrelated field.” (How to Get Startup Ideas, Paul Graham)
  • join us :)