Next-generation genomics: an integrative approach
Presentation Transcript

  • Korea Center for Disease Control & Prevention
    Next-generation genomics: an integrative approach
    Chang Bum Hong, Division of Structural and Functional Genomics, Center for Genome Sciences, NIH
    Permissions: you are free to blog or live-blog about this presentation as long as you attribute the work to its authors.
  • APPLICATIONS OF NEXT-GENERATION SEQUENCING
    2011
    • Genome structural variation discovery and genotyping
    • RNA sequencing: advances, challenges and opportunities
    • Charting histone modifications and the functional organization of mammalian genomes
    2010
    • Evaluating genome-scale approaches to eukaryotic DNA replication
    • Advances in understanding cancer genomes through second-generation sequencing
    • Genome-wide allele-specific analysis: insights into regulatory variation
    • Next-generation genomics: an integrative approach
    • Uncovering the roles of rare variants in common disease through whole-genome sequencing
    • Principles and challenges of genome-wide DNA methylation analysis
    • Prokaryotic transcriptomics: a new view on regulation, physiology and pathogenicity
    • Sequencing technologies - the next generation
    • RNA processing and its regulation: global insights into biological networks
    2009
    • The complex eukaryotic transcriptome: unexpected pervasive transcription and novel small RNAs
    • ChIP-seq: advantages and challenges of a maturing technology
    • Insights from genomic profiling of transcription factors
    • RNA-Seq: a revolutionary tool for transcriptomics
  • Genome-scale data: GWAS, ChIP-seq and RNA-seq
    • Phenotype: disease
    • Proteomics (protein): translated into proteins
    • Transcriptomics (RNA): DNA being transcribed into RNA; transcriptome sequencing, small RNA sequencing
    • Epigenome: chromatin immunoprecipitation sequencing, sequencing of bisulfite-treated DNA
    • Genomics (DNA): complete genome resequencing, targeted genomic resequencing, de novo sequencing
  • Next-generation sequencing
    • We define this as the use of established sequencing platforms, including the
      • Illumina/Solexa Genome Analyzer
      • Roche/454 Genome Sequencer
      • Applied Biosystems SOLiD
      • Helicos and Pacific Biosciences
    • Instruments pictured: HiSeq 2000, MiSeq, Genome Sequencer FLX System, GS Junior, 5500xl SOLiD System, Ion Personal Genome Machine, HeliScope Single Molecule Sequencer, PacBio RS
  • Pictured: Jim Watson, Craig Venter, Jay Flatley, Greg Lucier, Stephen Quake, ?, John West
    Captions: Illumina CEO; Life Technologies CEO; Founder of Helicos; Former Illumina CEO
  • Next Generation Genomics: World Map of High-throughput Sequencers (http://pathogenomics.bham.ac.uk/hts/)
    • Broad Institute: 94 x Illumina GA2, 10 x 454, 8 x SOLiD3/4, 1 x Heliscope, 1 x Polonator, 1 x PacBio
    • BGI: 1 x 454, 27 x SOLiD3/4, 128 x Illumina HiSeq
    • GMI at Seoul National University College of Medicine: 10 x Illumina GA2
    • Macrogen: 10 x Illumina GA2, 1 x 454, 2 x SOLiD3/4
    • NICEM: Illumina GA2, 454
    • Gachon University of Medicine and Science: Illumina GA2, 2 x SOLiD3/4
    • KRIBB: 1 x Illumina GA2
  • Sequencing technologies
    • Next-next....-generation: how many ‘next’s are there?
    • First generation: automated version of Sanger sequencing (the DNA-sequencing method invented by Fred Sanger in the 1970s)
    • Second generation
      • Roche/454 sequencing machine from 454 Life Sciences (2005): 450 bases per read / $0.02 per 1000 bases / 2 days per Gb
      • Solexa from Illumina (2006): 75 bases per read / $0.01 per 1000 bases / 0.5 days per Gb
      • SOLiD from Applied Biosystems (2006): 50 bases per read / $0.001 per 1000 bases / 0.5 days per Gb
    • Next-next-gen: third generation?
      • HiSeq 2000 from Illumina: 0.04 days per Gb
      • Helicos HeliScope
      • Pacific Biosciences SMRT
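
The per-read and per-1000-base figures on this slide can be converted to per-gigabase numbers with simple arithmetic. The sketch below is only an illustration: the platform figures are copied from the slide, while the function names and the example of a 3 Gb genome at 30x depth are assumptions, not values from the presentation.

```python
# Figures copied from the slide: (bases per read, USD per 1000 bases, days per Gb).
PLATFORMS = {
    "Roche/454": (450, 0.02, 2.0),
    "Illumina/Solexa": (75, 0.01, 0.5),
    "SOLiD": (50, 0.001, 0.5),
}

BASES_PER_GB = 1_000_000_000


def cost_per_gb(usd_per_1000_bases):
    """Convert a price per 1000 bases into a price per gigabase."""
    return usd_per_1000_bases * (BASES_PER_GB / 1000)


def reads_per_gb(bases_per_read):
    """Number of reads needed to produce one gigabase of raw sequence."""
    return BASES_PER_GB / bases_per_read


def days_for_depth(days_per_gb, genome_size_gb, depth):
    """Instrument time to reach an average depth (hypothetical helper)."""
    return days_per_gb * genome_size_gb * depth


if __name__ == "__main__":
    # Assumed example: a 3 Gb genome sequenced to 30x average depth.
    for name, (read_length, usd, days) in PLATFORMS.items():
        print(
            name,
            round(cost_per_gb(usd)),
            round(reads_per_gb(read_length)),
            days_for_depth(days, 3, 30),
        )
```

The conversion ignores library preparation, failed runs and duplicate reads, so it gives a lower bound rather than a real project estimate.
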
  • Sequencing technologies: Feature generation (Shendure & Ji, 2008; Michael L. Metzker, 2010)
  • Sequencing technologies: Sequencing by synthesis (Michael L. Metzker, 2010)
  • NGS typical procedure
    • Sequencing
      • How deep?
      • Single reads, paired reads or both
    • Alignment
      • Reference, assembly or both
    • Experiment-specific analysis
      • A ‘one-size-fits-all’ program does not exist
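
The "How deep?" question is usually answered with the average-depth relation C = N × L / G (reads × read length / genome size). The sketch below assumes that relation plus the Lander-Waterman (Poisson) approximation for the covered fraction. Neither formula appears on the slide, and the function names are hypothetical.

```python
import math


def mean_depth(n_reads, read_length, genome_size):
    """Average depth C = N * L / G."""
    return n_reads * read_length / genome_size


def reads_needed(target_depth, read_length, genome_size, paired=False):
    """Reads (or read pairs, if paired=True) needed for a target average depth."""
    bases_per_unit = read_length * (2 if paired else 1)
    return math.ceil(target_depth * genome_size / bases_per_unit)


def expected_fraction_covered(depth):
    """Fraction of the genome hit by at least one read, assuming reads land
    uniformly at random (Poisson approximation): 1 - exp(-C)."""
    return 1.0 - math.exp(-depth)


if __name__ == "__main__":
    # Assumed example: 75-base reads on a 3 Gb genome at 30x.
    print(reads_needed(30, 75, 3_000_000_000))
    print(reads_needed(30, 75, 3_000_000_000, paired=True))
    print(expected_fraction_covered(5))
```

Real coverage is less even than the Poisson model suggests (GC bias, repeats), so these numbers are best read as a starting point for experimental design.
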
  • Applications
    • Sequence assembly
      • Whole genome assembly (reference, de novo)
      • Transcriptome assembly
    • Short sequence alignment
      • Single read
      • Paired read
    • Genomic variation detection
      • Detection of single nucleotide polymorphisms (SNPs)
      • Detection of alternative splicing events
      • Detection of major/minor transcript isoforms
  • Applications (Shendure & Ji, 2008)
  • Bioinformatics tools (Shendure & Ji, 2008)
  • File formats
    • Sequence reads
      • FASTQ
      • FASTA
    • Alignment
      • SAM (Sequence Alignment/Map)
      • BAM (Binary Alignment/Map)
    • Variation
      • VCF (Variant Call Format)
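
To make the alignment and variation formats concrete, here is a minimal sketch that splits one SAM alignment line and one VCF data line into named fields. The column order follows the published SAM and VCF layouts. The record classes, function names and sample lines (including the identifier `rs001`) are hypothetical, and optional SAM tags and VCF genotype columns are ignored.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int  # 1-based leftmost mapping position
    mapq: int
    cigar: str
    seq: str
    qual: str

    @property
    def is_unmapped(self):
        return bool(self.flag & 0x4)  # FLAG bit 0x4: segment unmapped

    @property
    def is_reverse(self):
        return bool(self.flag & 0x10)  # FLAG bit 0x10: reverse strand


class VcfRecord(NamedTuple):
    chrom: str
    pos: int
    variant_id: str
    ref: str
    alts: list
    qual: str
    filters: str
    info: dict


def parse_sam_line(line):
    """Parse the 11 mandatory tab-separated SAM columns."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 11:
        raise ValueError("SAM alignment lines need at least 11 columns")
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]), f[5], f[9], f[10])


def parse_vcf_line(line):
    """Parse the 8 fixed tab-separated VCF columns."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 8:
        raise ValueError("VCF data lines need at least 8 columns")
    info = {}
    if f[7] != ".":
        for item in f[7].split(";"):
            key, sep, value = item.partition("=")
            info[key] = value if sep else True  # keys without '=' are flags
    return VcfRecord(f[0], int(f[1]), f[2], f[3], f[4].split(","), f[5], f[6], info)


if __name__ == "__main__":
    print(parse_sam_line("read1\t16\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII"))
    print(parse_vcf_line("chr1\t12345\trs001\tA\tG,T\t50\tPASS\tDP=20;DB"))
```

For real data, an established parser that also handles headers and the binary BAM encoding is a better choice than hand-splitting lines.
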
  • Data: Sequence Reads
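
As an illustration of what a sequence-read file contains, the sketch below reads four-line FASTQ records (header, sequence, separator, quality) and decodes the quality string. It assumes the Phred+33 encoding and unwrapped sequences. The function and class names are hypothetical.

```python
from typing import Iterable, Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]


def parse_fastq(lines: Iterable[str], offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from an iterable of FASTQ lines (four lines per record)."""
    stripped = (line.rstrip("\r\n") for line in lines)
    while True:
        header = next(stripped, None)
        if header is None:
            return  # end of input
        if header == "":
            continue  # tolerate blank lines between records
        sequence = next(stripped, None)
        separator = next(stripped, None)
        quality = next(stripped, None)
        if quality is None:
            raise ValueError("truncated FASTQ record: " + header)
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record: " + header)
        if len(sequence) != len(quality):
            raise ValueError("sequence/quality length mismatch: " + header)
        # Each quality character encodes one Phred score as ASCII code - offset.
        scores = [ord(char) - offset for char in quality]
        yield FastqRecord(header[1:], sequence, scores)


if __name__ == "__main__":
    example = "@r1\nACGT\n+\nII!5\n@r2\nGG\n+\n##\n"
    for record in parse_fastq(example.splitlines()):
        print(record.name, record.sequence, record.qualities)
```
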
  • Data: Sequence Reads
    A challenge: a call for a new compression algorithm ("Compression of genomic sequences in FASTQ format")
  • Data: Sequence Reads (Sebastian Deorowicz et al., 2011)

    Compression type   Compression time   Size
    gzip               14 s               28M
    bzip2              9.75 s             23M
    dsrc               1.36 s             21M
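
A comparison of this kind can be reproduced in outline with general-purpose compressors from the Python standard library. The sketch below is an assumption-laden illustration, not the benchmark from the slide: the FASTQ data are synthetic and random, dsrc is not included because it is a separate tool, and sizes and timings will differ from the table above.

```python
import bz2
import gzip
import random
import time


def synthetic_fastq(n_reads=2000, read_length=75, seed=0):
    """Build random FASTQ text as bytes (hypothetical test data, not real reads)."""
    rng = random.Random(seed)
    records = []
    for i in range(n_reads):
        seq = "".join(rng.choice("ACGT") for _ in range(read_length))
        qual = "".join(rng.choice("FGHI") for _ in range(read_length))
        records.append(f"@read{i}\n{seq}\n+\n{qual}\n")
    return "".join(records).encode("ascii")


def compressed_sizes(data):
    """Return the raw size and the gzip/bzip2 compressed sizes in bytes."""
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data)),
        "bzip2": len(bz2.compress(data)),
    }


def time_compressor(compress, data):
    """Wall-clock seconds for one compression call."""
    started = time.perf_counter()
    compress(data)
    return time.perf_counter() - started


if __name__ == "__main__":
    data = synthetic_fastq()
    print(compressed_sizes(data))
    print("gzip seconds:", time_compressor(gzip.compress, data))
    print("bzip2 seconds:", time_compressor(bz2.compress, data))
```

Random bases and qualities compress worse than real reads, which contain repeated sequence and correlated quality scores, so this only shows how such a measurement is set up.
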
  • Examples of applications
    • ChIP-Seq
      • assays the location and amount of binding of a protein to DNA, such as a transcription factor bound to the start site of a gene, or histones of a certain type
    • RNA-Seq
      • Mapping transcription start sites
      • Characterization of alternative splicing patterns
      • Gene fusion detection
      • Estimation of transcript abundance from the depth of coverage in the mapping
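
One widely used way to turn mapped read counts into an abundance estimate is RPKM (reads per kilobase of transcript per million mapped reads). The slide does not name a specific measure, so RPKM is an assumption here, as are the function names and example numbers.

```python
def mean_coverage(mapped_reads, read_length, transcript_length):
    """Average depth of coverage over a transcript."""
    return mapped_reads * read_length / transcript_length


def rpkm(mapped_reads, transcript_length, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    Dividing by transcript length removes the bias toward long transcripts;
    dividing by the library size makes samples of different depth comparable.
    """
    return mapped_reads * 1e9 / (transcript_length * total_mapped_reads)


if __name__ == "__main__":
    # Assumed example: 1000 reads on a 2 kb transcript, 10 million mapped reads.
    print(mean_coverage(1000, 50, 2000))
    print(rpkm(1000, 2000, 10_000_000))
```
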
  • ChIP-Seq: chromatin immunoprecipitation (ChIP) (Kharchenko et al, 2008; Barski A & Zhao K, 2009; Shirley et al, 2009)
  • ChIP-Seq (Shirley et al, 2009)
  • ChIP-Seq software packages (Shirley et al, 2009)
  • RNA-Seq: de novo transcriptome assembly and transcriptome resequencing (Zhong Wang, 2009)
  • RNA-Seq: mapping of short reads over exon-exon junctions. Depending on where each end maps, it can be defined as a trans or a cis event (from wikipedia.org)
  • RNA-Seq software packages (Shirley et al, 2009)
  • DNA encodes heritable traits
    • Genes in DNA are transcribed into RNA, which
      • might be spliced
      • is transported to an appropriate cellular compartment
      • is translated into proteins
    • Regulated at many levels
      • DNA methylation
      • chromatin modification
      • binding of transcription factors to the DNA
      • binding of splicing factors to the RNA and RNA transport
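
The flow from DNA to RNA to protein described above can be sketched as three small functions. This is a toy model under stated assumptions: the input is the coding strand, exon coordinates are supplied by hand, the standard genetic code is used, and translation starts at the first AUG. The sequence and function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order; '*' marks stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand):
    """Coding-strand DNA to pre-mRNA: same sequence with T replaced by U."""
    return coding_strand.upper().replace("T", "U")


def splice(pre_mrna, exons):
    """Join exon intervals given as half-open (start, end) pairs."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def translate(mrna):
    """Translate from the first AUG up to the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]  # KeyError on non-ACGU bases
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    dna = "ATGGCCGTAAGTAAATAA"  # exon, 6-base intron, exon (made-up sequence)
    mrna = splice(transcribe(dna), [(0, 6), (12, 18)])
    print(mrna, translate(mrna))
```
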
  • NGG (Next-generation genomics): an integrative approach
    • What types of genomic data sets are available?
    • Why perform integrative genomic analysis?
    • Approaches to an integrative analysis
    • Using large-scale data sets for integrative analysis
    • Future perspectives
  • What types of genomic data sets are available?
    • Sequence variation data
      • SNP genotyping arrays
      • resequencing
    • Transcriptomic data
      • RNA-Seq
      • identify transcripts arising from gene fusion events
      • detect novel classes of non-coding RNAs
    • Epigenomic data
      • Bisulphite treatment
      • Chromatin immunoprecipitation
    • Interactome data
      • RNA-protein interaction
      • protein-protein interaction networks
      • define genetic and signaling pathways
  • Why perform integrative genomic analysis?
    • Annotating functional features of the genome
    • Inferring the function of genetic variants
    • Understanding mechanisms of gene regulation
    Figure 1 | Annotating the genome through detecting transcription-factor binding sites and histone-modification states.
    Figure 2 | Identification of regulatory SNPs
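
One simple form of integration behind "identification of regulatory SNPs" is to intersect SNP positions with transcription-factor binding regions. The sketch below assumes half-open, non-overlapping peak intervals per chromosome. The data, identifiers and function names are hypothetical, and an overlap only nominates a candidate, it does not prove a regulatory effect.

```python
from bisect import bisect_right
from collections import defaultdict


def index_peaks(peaks):
    """Group (chrom, start, end) peaks by chromosome, sorted by start."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [start for start, _ in intervals]
        index[chrom] = (starts, intervals)
    return index


def snps_in_peaks(snps, peaks):
    """Return ids of SNPs whose position falls inside a peak [start, end).

    Assumes peaks on the same chromosome do not overlap each other.
    """
    index = index_peaks(peaks)
    hits = []
    for snp_id, chrom, pos in snps:
        if chrom not in index:
            continue
        starts, intervals = index[chrom]
        i = bisect_right(starts, pos) - 1  # last peak starting at or before pos
        if i >= 0 and intervals[i][0] <= pos < intervals[i][1]:
            hits.append(snp_id)
    return hits


if __name__ == "__main__":
    peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
    snps = [("snpA", "chr1", 150), ("snpB", "chr1", 200),
            ("snpC", "chr2", 60), ("snpD", "chr3", 10)]
    print(snps_in_peaks(snps, peaks))
```
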
  • Approaches to an integrative analysis
    • Data complexity reduction
      • summarize each experiment as a collection of genomic regions with strong enrichment of signal
      • especially important to inspect at least some of the results by eye
    • Unsupervised integration
      • the goal is not to find some correct answer but to discover structure within the data set
      • Clustering: partitioning a large data set into easily digestible, conceptual pieces
    • Supervised integration
      • a technique that learns how to make predictions from example inputs and outputs
      • Bayesian network
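
Two of these approaches can be sketched in a few lines. The first function reduces a binned signal track to enriched regions by thresholding. The second is a bare-bones k-means clustering, used here as one example of unsupervised integration. The threshold, bin size, seeding rule and toy data are all assumptions, and neither function is the specific method used in the presentation.

```python
def enriched_regions(signal, threshold, bin_size=1):
    """Merge consecutive bins with signal >= threshold into (start, end) regions."""
    regions = []
    start = None
    for i, value in enumerate(signal):
        if value >= threshold and start is None:
            start = i  # a region opens
        elif value < threshold and start is not None:
            regions.append((start * bin_size, i * bin_size))
            start = None  # the region closes
    if start is not None:
        regions.append((start * bin_size, len(signal) * bin_size))
    return regions


def kmeans(points, k, iterations=20):
    """Plain k-means, seeded with the first k points so results are repeatable."""
    centroids = [list(point) for point in points[:k]]
    labels = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        labels = []
        for point in points:
            distances = [
                sum((a - b) ** 2 for a, b in zip(point, centroid))
                for centroid in centroids
            ]
            label = distances.index(min(distances))
            labels.append(label)
            clusters[label].append(point)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    print(enriched_regions([0, 1, 8, 9, 7, 1, 0, 6, 6, 0], threshold=5, bin_size=100))
    # Each point: (signal of mark 1, signal of mark 2) for one genomic region.
    print(kmeans([(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9)], k=2))
```

Seeding with the first k points keeps the example deterministic. Real analyses usually use several random restarts and, as the slide notes, a look at some of the results by eye.
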
  • Approaches to an integrative analysis (figure labels: Promoter; Transcribed; an intronic H3K4me1 peak predicts an enhancer element)
  • UCSC browser with ENCODE data
  • Using large-scale data sets for integrative analysis
    • For the bench scientist
      • open-source web browser, such as Firefox
      • add-ons: gatekeepers
  • Using large-scale data sets for integrative analysis
    • For the bench scientist
      • stand-alone analytical system: CisGenome
      • genome browser: UCSC browser, Anno-J
      • Galaxy
    Figure 4 | Flow chart for data analysis (workflow for ChIP-seq analysis; UCSC browser; online or stand-alone tools)
  • Using large-scale data sets for integrative analysis
    • Bioinformatics hurdles
      • normalized data
  • Future perspectives
    • Data integration itself is not an end
      • designed to generate novel hypotheses and help to test them
    • Community-wide effort, akin to Wikipedia
    • Searchable with Google-like capabilities
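  • Satnam Alag: Vice President of Engineering at NextBio, which develops a vertical search engine for the life sciences community
    Toby Segaran: designed algorithms at Genstruct for understanding drug mechanisms of action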