Next-generation genomics: an integrative approach



  1. 1. Korea Center for Disease Control & Prevention. Next-generation genomics: an integrative approach. Chang Bum Hong, Division of Structural and Functional Genomics, Center for Genome Sciences, NIH. Permissions: you are free to blog or live-blog about this presentation as long as you attribute the work to its authors.
  2. 2. twitter
  3. 3. APPLICATIONS OF NEXT-GENERATION SEQUENCING
     2011
     • Genome structural variation discovery and genotyping
     • RNA sequencing: advances, challenges and opportunities
     • Charting histone modifications and the functional organization of mammalian genomes
     2010
     • Evaluating genome-scale approaches to eukaryotic DNA replication
     • Advances in understanding cancer genomes through second-generation sequencing
     • Genome-wide allele-specific analysis: insights into regulatory variation
     • Next-generation genomics: an integrative approach
     • Uncovering the roles of rare variants in common disease through whole-genome sequencing
     • Principles and challenges of genome-wide DNA methylation analysis
     • Prokaryotic transcriptomics: a new view on regulation, physiology and pathogenicity
     • Sequencing technologies - the next generation
     • RNA processing and its regulation: global insights into biological networks
     2009
     • The complex eukaryotic transcriptome: unexpected pervasive transcription and novel small RNAs
     • ChIP-seq: advantages and challenges of a maturing technology
     • Insights from genomic profiling of transcription factors
     • RNA-Seq: a revolutionary tool for transcriptomics
  4. 4. Genome-scale data: GWAS, ChIP-seq and RNA-seq
     [Figure labels, grouped by layer]
     • Phenotype: disease
     • Proteomics (protein): translated into proteins
     • Transcriptomics (RNA): DNA being transcribed into RNA; transcriptome sequencing; small RNA sequencing
     • Epigenome: chromatin immunoprecipitation sequencing; sequencing of bisulfite-treated DNA
     • Genomics (DNA): complete genome resequencing; targeted genomic resequencing; de novo sequencing
  5. 5. Next-generation sequencing
     • We define this as the use of established sequencing platforms, including the
       • Illumina/Solexa Genome Analyzer
       • Roche/454 Genome Sequencer
       • Applied Biosystems SOLiD
       • Helicos and Pacific Biosciences
     [Instruments pictured: HiSeq 2000, MiSeq, Genome Sequencer FLX System, GS Junior, 5500xl SOLiD System, Ion Personal Genome Machine, HeliScope Single Molecule Sequencer, PacBio RS]
  6. 6. [Photos] Jim Watson, Craig Venter, Jay Flatley, Greg Lucier, Stephen Quake, ?, John West. [Captions] Illumina CEO, Life Technologies CEO, Founder of Helicos, Former Illumina CEO
  7. 7. Next Generation Genomics: World Map of High-throughput Sequencers
     • Broad Institute: 94 x Illumina GA2, 10 x 454, 8 x SOLiD3/4, 1 x Heliscope, 1 x Polonator, 1 x PacBio
     • BGI: 1 x 454, 27 x SOLiD3/4, 128 x Illumina HiSeq
     • GMI at Seoul National University College of Medicine: 10 x Illumina GA2
     • Macrogen: 10 x Illumina GA2, 1 x 454, 2 x SOLiD3/4
     • NICEM: Illumina GA2, 454
     • Gachon University of Medicine and Science: Illumina GA2, 2 x SOLiD 3/4
     • KRIBB: 1 x Illumina GA2
  8. 8. Sequencing technologies
     • Next-next-...-generation: how many ‘next’s are there?
       • First generation: automated version of Sanger sequencing (DNA-sequencing method invented by Fred Sanger in the 1970s)
       • Second generation
         • Roche/454 sequencing machine from 454 Life Sciences (2005): 450 bases per read / $0.02 per 1000 bases / 2 days per Gb
         • Solexa from Illumina (2006): 75 bases per read / $0.01 per 1000 bases / 0.5 days per Gb
         • SOLiD from Applied Biosystems (2006): 50 bases per read / $0.001 per 1000 bases / 0.5 days per Gb
     • Next-next-gen: third generation?
       • HiSeq 2000 from Illumina: 0.04 days per Gb
       • Helicos HeliScope
       • Pacific Biosciences SMRT
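The per-platform figures on this slide can be turned into a rough cost and run-time estimate for a whole project. The sketch below is illustrative only: the platform numbers are copied from the slide, while the function name `run_estimate`, the 3 Gb genome size and the 30x depth are assumptions added for the example.

```python
# Figures as quoted on the slide: (USD per 1000 bases, days per Gb of output).
PLATFORMS = {
    "Roche/454": (0.02, 2.0),
    "Illumina/Solexa": (0.01, 0.5),
    "SOLiD": (0.001, 0.5),
}


def run_estimate(platform, genome_size_bp, depth):
    """Return (cost in USD, instrument days) to sequence a genome at a given depth."""
    usd_per_kb, days_per_gb = PLATFORMS[platform]
    total_bases = genome_size_bp * depth          # raw bases that must be produced
    cost = total_bases / 1000 * usd_per_kb        # the slide quotes cost per 1000 bases
    days = total_bases / 1e9 * days_per_gb        # the slide quotes time per Gb
    return cost, days


if __name__ == "__main__":
    # Assumed example: a 3 Gb genome at 30x depth (neither value is from the slide).
    for name in PLATFORMS:
        cost, days = run_estimate(name, 3_000_000_000, 30)
        print(f"{name}: about ${cost:,.0f} and {days:,.1f} instrument days")
```

The estimate ignores library preparation, failed runs and read filtering, so it is a lower bound at best.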
  9. 9. Sequencing technologies Feature generation Shendure & Ji, 2008 Michael L. Metzker, 2010
  10. 10. Sequencing technologies Sequencing by synthesis Michael L. Metzker, 2010
  11. 11. NGS typical procedure
      • Sequencing
        • How deep?
        • Single reads, paired reads or both
      • Alignment
        • Reference, assembly or both
      • Experiment-specific analysis
        • A ‘one-size-fits-all’ program does not exist
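The "How deep?" question is usually answered with the standard average-coverage relation, coverage = (number of reads x read length) / genome size. The slide does not give this formula, so the sketch below is an added illustration; the function names and the example numbers are assumptions.

```python
def coverage(n_reads, read_length, genome_size):
    """Average depth of coverage: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


def reads_needed(genome_size, read_length, depth):
    """Smallest number of reads that reaches the requested average depth."""
    total_bases = genome_size * depth
    # Integer ceiling division avoids floating-point rounding on large genomes.
    return -(-total_bases // read_length)


if __name__ == "__main__":
    # Assumed example: 75-base reads on a 3 Gb genome at 30x.
    n = reads_needed(3_000_000_000, 75, 30)
    print(n, coverage(n, 75, 3_000_000_000))
```

For paired reads, each pair contributes two reads, so the number of fragments to sequence is half the value returned by `reads_needed`.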
  12. 12. Applications
      • Sequence assembly
        • Whole genome assembly (reference, de novo)
        • Transcriptome assembly
      • Short sequence alignment
        • Single read
        • Paired read
      • Genomic variation detection
        • Detection of single nucleotide polymorphisms (SNPs)
        • Detection of alternative splicing events
        • Detection of major/minor transcript isoforms
  13. 13. Applications Shendure & Ji, 2008
  14. 14. Bioinformatics tools Shendure & Ji, 2008
  15. 15. File Format
      • Sequence reads
        • FASTQ
        • FASTA
      • Alignment
        • SAM (Sequence Alignment/Map)
        • BAM (Binary Alignment/Map)
      • Variation
        • VCF (Variant Call Format)
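As an illustration of the read formats listed here, a FASTQ record is four lines: an `@` header, the sequence, a `+` separator and a quality string of the same length as the sequence. The minimal reader below is a sketch rather than a full parser; it assumes unwrapped four-line records and Phred+33 quality encoding, and the name `read_fastq` is made up for the example.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, qualities) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:                      # end of file (or a blank line) stops parsing
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Assumption: Phred+33 encoding, so the score is the ASCII code minus 33.
        yield header[1:], seq, [ord(c) - 33 for c in qual]


example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!5I\n")
records = list(read_fastq(example))
```

Older Illumina pipelines used a Phred+64 offset, so the offset should be checked against the instrument software before scores are interpreted.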
  16. 16. Data: Sequence Reads
  17. 17. Data: Sequence Reads. A challenge: a call for a new compression algorithm. "Compression of genomic sequences in FASTQ format"
  18. 18. Data: Sequence Reads (Sebastian Deorowicz, 2011)
      Compress type | Compress time | Size
      gzip          | 14s           | 28M
      bzip2         | 9.75s         | 23M
      dsrc          | 1.36s         | 21M
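A comparison like the one in this table can be reproduced for the two general-purpose compressors with the Python standard library. This is a sketch under stated assumptions: the input is synthetic random FASTQ rather than real reads, DSRC is left out because it is a separate tool, and the helper names `make_fastq` and `benchmark` are invented for the example. Timings and sizes will not match the slide.

```python
import bz2
import gzip
import random
import time


def make_fastq(n_reads, read_len, seed=0):
    """Build synthetic FASTQ bytes (random bases and qualities; not real data)."""
    rng = random.Random(seed)
    lines = []
    for i in range(n_reads):
        seq = "".join(rng.choice("ACGT") for _ in range(read_len))
        qual = "".join(chr(33 + rng.randint(20, 40)) for _ in range(read_len))
        lines += [f"@read{i}", seq, "+", qual]
    return ("\n".join(lines) + "\n").encode("ascii")


def benchmark(data):
    """Return {name: (seconds, compressed size in bytes)} for gzip and bzip2."""
    codecs = (
        ("gzip", gzip.compress, gzip.decompress),
        ("bzip2", bz2.compress, bz2.decompress),
    )
    results = {}
    for name, compress, decompress in codecs:
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        if decompress(packed) != data:      # verify the round trip is lossless
            raise RuntimeError(f"{name} round trip failed")
        results[name] = (elapsed, len(packed))
    return results


data = make_fastq(2000, 75)
results = benchmark(data)
```

Real reads compress differently from random ones, since quality strings and repeated sequence are far from uniform, which is the redundancy a FASTQ-specific compressor is designed to exploit.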
  19. 19. Example of Applications
      • ChIP-Seq
        • assays the location and amount of protein binding to DNA, such as a transcription factor bound to the start site of a gene, or histones of a certain type
      • RNA-Seq
        • Mapping transcription start sites
        • Characterization of alternative splicing patterns
        • Gene fusion detection
        • Estimation of transcript abundance from the depth of coverage in the mapping
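The last RNA-Seq bullet, estimating abundance from depth of coverage, is often done with a length- and library-size-normalized count. The slide does not name a metric; RPKM (reads per kilobase of transcript per million mapped reads) is used below as one common choice, and the function name and example numbers are assumptions.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # Assumed example: 500 reads on a 2 kb transcript in a 10-million-read library.
    print(rpkm(500, 2000, 10_000_000))
```

Dividing by transcript length corrects for longer transcripts collecting more reads, and dividing by library size makes values comparable between runs of different depth.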
  20. 20. ChIP-Seq: chromatin immunoprecipitation (ChIP). Kharchenko et al, 2008; Barski A & Zhao K, 2009; Shirely et al, 2009
  21. 21. ChIP-Seq Shirely et al, 2009
  22. 22. ChIP-Seq Software packages Shirely et al, 2009
  23. 23. RNA-Seq: de novo RNA-Seq (transcriptome assembly) and RNA-Seq (transcriptome resequencing). Zhong Wang, 2009
  24. 24. RNA-Seq: mapping of short reads over exon-exon junctions. Depending on where each end maps, a read can be defined as a trans or a cis event.
  25. 25. RNA-Seq Software packages Shirely et al, 2009
  26. 26. DNA encodes heritable traits
      • Genes in DNA are transcribed into RNA, which
        • might be spliced
        • is transported to an appropriate cellular compartment
        • is translated into proteins
      • Regulated at many levels
        • DNA methylation
        • chromatin modification
        • binding of transcription factors to the DNA
        • binding of splicing factors to the RNA and RNA transport
  27. 27. NGG (next-generation genomics): an integrative approach
      • What types of genomic data sets are available?
      • Why perform integrative genomic analysis?
      • Approaches to an integrative analysis
      • Using large-scale data sets for integrative analysis
      • Future perspectives
  28. 28. What types of genomic data sets are available?
      • Sequence variation data
        • SNP genotyping arrays
        • resequencing
      • Transcriptomic data
        • RNA-Seq
        • identify transcripts arising from gene fusion events
        • detect novel classes of non-coding RNAs
      • Epigenomic data
        • Bisulphite treatment
        • Chromatin immunoprecipitation
      • Interactome data
        • RNA-protein interaction
        • protein-protein interaction networks
        • define genetic and signaling pathways
  29. 29. Why perform integrative genomic analysis?
      • Annotating functional features of the genome
      • Inferring the function of genetic variants
      • Understanding mechanisms of gene regulation
      Figure 1 | Annotating the genome through detecting transcription-factor binding sites and histone-modification states.
      Figure 2 | Identification of regulatory SNPs
  30. 30. Approaches to an integrative analysis
      • Data complexity reduction
        • summarize each experiment as a collection of genomic regions with strong enrichment of signal
        • especially important to inspect at least some of the results by eye
      • Unsupervised integration
        • the goal is not to find a single correct answer but to discover structure within the data set
        • Clustering: partitioning a large data set into easily digestible, conceptual pieces
      • Supervised integration
        • a technique that learns how to make predictions from example inputs and outputs
        • Bayesian network
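The data complexity reduction step, summarizing an experiment as regions with strong signal enrichment, can be sketched as a simple threshold over binned read counts. This is a deliberately naive illustration and not how any particular peak caller works: the function name, the fixed threshold and the example counts are all assumptions.

```python
def enriched_regions(signal, bin_size, threshold):
    """Merge consecutive bins at or above threshold into (start, end) regions.

    signal is a list of per-bin read counts along one chromosome; coordinates
    are returned in base pairs, half-open, assuming bin i covers
    [i * bin_size, (i + 1) * bin_size).
    """
    regions = []
    start = None
    for i, value in enumerate(signal):
        if value >= threshold:
            if start is None:               # a new enriched run begins here
                start = i
        elif start is not None:             # the current run just ended
            regions.append((start * bin_size, i * bin_size))
            start = None
    if start is not None:                   # run extends to the end of the signal
        regions.append((start * bin_size, len(signal) * bin_size))
    return regions


counts = [1, 2, 9, 12, 3, 0, 8, 1]          # made-up counts in 100 bp bins
regions = enriched_regions(counts, bin_size=100, threshold=8)
```

Reducing each experiment to a short list of intervals like this is what makes it practical to intersect many data sets, and it is also why the slide recommends checking some regions by eye: a fixed threshold has no notion of background or statistical significance.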
  31. 31. Approaches to an integrative analysis. [Figure labels: Promoter; Transcribed; an intronic H3K4me1 peak predicts an enhancer element]
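The rule in this figure, that an H3K4me1 peak falling inside an intron suggests an enhancer, amounts to intersecting peak intervals with gene annotation. The sketch below shows only that interval logic; the coordinates, the labels and the function names are hypothetical, and real annotation would use a genome annotation file and more than one histone mark.

```python
def overlaps(a, b):
    """True if half-open intervals a = (start, end) and b = (start, end) intersect."""
    return a[0] < b[1] and b[0] < a[1]


def annotate_peak(peak, promoters, introns):
    """Label a peak by the first annotation class it overlaps (promoters take priority)."""
    if any(overlaps(peak, p) for p in promoters):
        return "promoter"
    if any(overlaps(peak, i) for i in introns):
        return "candidate enhancer"
    return "unannotated"


# Hypothetical coordinates on a single chromosome, for illustration only.
promoters = [(900, 1100)]
introns = [(1500, 2500)]
peaks = [(950, 1050), (1800, 1900), (5000, 5100)]
labels = [annotate_peak(p, promoters, introns) for p in peaks]
```

The label is worded as a candidate because a single mark at a single location is a prediction to be tested, not a confirmed regulatory element.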
  32. 32. UCSC browser with ENCODE data
  33. 33. Using large-scale data sets for integrative analysis
      • For the bench scientist
        • open-source web browser, such as Firefox
        • add-ons: gatekeepers
  34. 34. Using large-scale data sets for integrative analysis
      • For the bench scientist
        • stand-alone analytical system: CisGenome
        • genome browser: UCSC browser, Anno-J
        • Galaxy
      Figure 4 | Flow chart for data analysis
      [Figure labels: Workflow for ChIP-seq analysis; UCSC browser; Online or stand-alone tools]
  35. 35. Using large-scale data sets for integrative analysis
      • Bioinformatics hurdles
        • normalized data
  36. 36. Future perspectives
      • Data integration itself is not an end
        • designed to generate novel hypotheses and help to test them
      • Community-wide effort, akin to Wikipedia
      • Searchable with Google-like capabilities
  37. 37. Future perspectives
  38. 38. Future perspectives
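  39. 39. Satnam Alag: Vice President of Engineering at NextBio, which develops a vertical search engine for the life science community. Toby Segaran: designed algorithms at Genstruct for understanding the mechanisms of drug action.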