Transcript detection in RNAseq

Abstract: This session focuses on the differences between standard DNA mapping and RNAseq-specific transcript mapping: identifying splice variants and isoforms. Transcript quantification and the genomic variants that can be identified from RNAseq data will also be discussed.

Published in: Technology
  • http://www.nature.com/nbt/journal/v29/n7/full/nbt.1915.html
  • That is, double-stranded cDNA is denatured, then allowed to partially re-anneal, and the most abundant species, which re-anneal most rapidly, are digested with crab duplex-specific nuclease
  • Transcript detection in RNAseq

    1. 1. [by Joseph Robertson]<br />RNAseq analysis: Transcript detection (1/2)<br />What is a jar ?<br />August 11, 2011<br />
    2. 2. Quick recap: Production informatics<br />August 11, 2011<br />Sequencing->Images->Conversion (Demultiplexing)<br />Resulting file type: FASTQ<br />“Having raw sequence reads and quality scores”<br />Sequencing<br />Image<br />Fastq<br />Quality Control<br />Projects<br />
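As a small illustration of the FASTQ file type named on this slide, the sketch below reads four-line FASTQ records and decodes the quality string into Phred scores. The names `FastqRecord` and `parse_fastq` are hypothetical, and the quality offset of 33 is an assumption (older Illumina pipelines used 64); multi-line FASTQ records are not handled.

```python
import io
from typing import Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]  # Phred scores decoded from the quality string


def parse_fastq(handle, phred_offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FASTQ stream.

    phred_offset=33 is an assumption; adjust it for older encodings.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or a trailing blank line)
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is ignored
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        scores = [ord(char) - phred_offset for char in quality]
        yield FastqRecord(header[1:], sequence, scores)


if __name__ == "__main__":
    example = io.StringIO("@read1\nACGT\n+\nIIII\n")
    for record in parse_fastq(example):
        print(record.name, record.sequence, record.qualities)
```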
    3. 3. Objective & Challenges<br />Objective: study the active transcriptome of the cell<br />Problems:<br />The RNA content of a cell is dominated by tRNA, rRNA and housekeeping genes<br />The flowcell has only finite real estate, most of which would be occupied by these largely invariant transcripts<br />How to focus the sequencing on the “interesting” part of the transcriptome: mRNA and ncRNA?<br />August 11, 2011<br />
    4. 4. What RNAseq protocols are there?<br />RNA seq<br />Total RNA -> tRNA/rRNA removed + polyA-tail filtered<br />Good for studying protein coding genes, e.g. <br />gene expression, isoforms, expression of variant alleles<br />RNA editing events<br />RNA-DNA differences in the human transcriptome provide a yet-unexplored aspect of genome variation. <br />Small RNAseq: <br />Total RNA -> size selection for small RNA molecules<br />Good for small ncRNA, e.g. miRNAs, snoRNA<br />Duplex-specific thermostable nuclease (DSN) guided RNA seq normalization<br />Total RNA -> highly abundant transcripts are digested <br />Good for studying all transcripts<br />August 11, 2011<br />Today<br />Li M, Wang IX, Li Y, Bruzel A, Richards AL, Toung JM, Cheung VG. Widespread RNA and DNA sequence differences in the human transcriptome. Science. 2011. PMID: 21596952.<br />Christodoulou DC, Gorham JM, Herman DS, Seidman JG. Construction of normalized RNA-seq libraries for next-generation sequencing using the crab duplex-specific nuclease. Curr Protoc Mol Biol. 2011. PMID: 21472699.<br />
    5. 5. RNA-seq workflow<br />Select PolyA-tail + remove tRNA/rRNA<br />Fragment RNA<br />Make cDNA (caution: you may lose strand info)<br />Sequence<br />Map reads<br />Identify transcripts<br />Quantify transcripts<br />Identify differences between conditions<br />August 11, 2011<br />Today<br />Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008. PMID: 18516045.<br />
    6. 6. Production Informatics and Bioinformatics<br />August 11, 2011<br />Produce raw sequence reads<br />Basic Production<br />Informatics<br />Map to genome and generate raw genomic features (e.g. SNPs)<br />Advanced <br />Production Inform.<br />Analyze the data; Uncover the biological meaning<br />Bioinformatics<br />Research<br />Per one-flowcell project<br />
    7. 7. Challenges for RNAseq read mapping<br />Losing reads because they do not match the ref. genome<br />Reads spanning exon junctions<br />RNA editing events <br />Approaches<br />Align to ref. transcriptome library<br />Exon-first, e.g. TopHat<br />Seed-extend methods, e.g. GSNAP<br />August 11, 2011<br />Sequencing reads<br />DNA<br />gRNA<br />mRNA<br />editing event<br />Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011. PMID: 21623353.<br />Li M, Wang IX, Li Y, Bruzel A, Richards AL, Toung JM, Cheung VG. Widespread RNA and DNA sequence differences in the human transcriptome. Science. 2011. PMID: 21596952.<br />
    8. 8. Exon-first approach<br />Align reads to ref. genome<br />Chop up unaligned reads and try to identify matching regions<br />Find splice junctions around the matches<br />August 11, 2011<br />Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011 PMID: 21623353.<br />
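The three exon-first steps on this slide can be sketched on a toy example. This is a minimal illustration using exact string matching, not how TopHat is implemented; the function name `align_exon_first`, the `min_anchor` parameter and the toy sequences are all assumptions made for the example.

```python
def align_exon_first(read, genome, min_anchor=4):
    """Toy exon-first alignment with exact matching only.

    Step 1: try to place the whole read on the genome.
    Step 2: if that fails, chop the read into a left and a right part.
    Step 3: place both parts and report the gap between them as a
            candidate intron (0-based, end-exclusive coordinates).
    """
    position = genome.find(read)
    if position != -1:
        return {"type": "unspliced", "start": position}

    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        left_pos = genome.find(left)
        while left_pos != -1:
            left_end = left_pos + len(left)
            # The right part must start at least one base downstream.
            right_pos = genome.find(right, left_end + 1)
            if right_pos != -1:
                return {
                    "type": "spliced",
                    "start": left_pos,
                    "intron": (left_end, right_pos),
                }
            left_pos = genome.find(left, left_pos + 1)
    return None  # read could not be placed


if __name__ == "__main__":
    # Hypothetical two-exon locus with a GT...AG intron in between.
    exon1 = "ACGTACGGTC"
    intron = "GTAAG" + "T" * 8 + "CAG"
    exon2 = "CCATGGATTA"
    genome = exon1 + intron + exon2

    junction_read = exon1[-5:] + exon2[:5]
    print(align_exon_first("GTACGG", genome))
    print(align_exon_first(junction_read, genome))
```

Real aligners tolerate mismatches and score candidate junctions; the sketch only shows why unspliced reads are cheap to place and why junction reads need the extra splitting step.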
    9. 9. Seed-extend approach<br />Break reads into smaller k-mers and find matches<br />Iteratively extend k-mers to identify the exact spliced alignment<br />August 11, 2011<br />Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011. PMID: 21623353.<br />
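A toy version of the seed-extend idea: index every k-mer of the genome, look up each k-mer of the read as a seed, then extend each seed in both directions while the bases agree. This is a sketch under simplifying assumptions (exact matches only, tiny genome held in memory) and is not GSNAP's algorithm; `build_kmer_index` and `seed_and_extend` are hypothetical names.

```python
from collections import defaultdict


def build_kmer_index(genome, k):
    """Map every k-mer in the genome to the list of its start positions."""
    index = defaultdict(list)
    for start in range(len(genome) - k + 1):
        index[genome[start:start + k]].append(start)
    return index


def seed_and_extend(read, genome, k=4):
    """Return maximal exact matches as (read_start, read_end, genome_start).

    Coordinates are 0-based and read_end is exclusive.
    """
    index = build_kmer_index(genome, k)
    segments = set()
    for r in range(len(read) - k + 1):
        for g in index.get(read[r:r + k], []):
            r_start, g_start = r, g
            # Extend the seed to the left while bases agree.
            while r_start > 0 and g_start > 0 and read[r_start - 1] == genome[g_start - 1]:
                r_start -= 1
                g_start -= 1
            r_end, g_end = r + k, g + k
            # Extend the seed to the right while bases agree.
            while r_end < len(read) and g_end < len(genome) and read[r_end] == genome[g_end]:
                r_end += 1
                g_end += 1
            segments.add((r_start, r_end, g_start))
    return sorted(segments)


if __name__ == "__main__":
    exon1 = "ACGTACGGTC"
    intron = "GTAAG" + "T" * 8 + "CAG"
    exon2 = "CCATGGATTA"
    genome = exon1 + intron + exon2
    junction_read = exon1[-5:] + exon2[:5]
    for read_start, read_end, genome_start in seed_and_extend(junction_read, genome):
        print(f"read[{read_start}:{read_end}] matches genome at {genome_start}")
```

Two adjacent read segments that land far apart on the genome suggest a splice junction between them.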
    10. 10. Which method ?<br />Exon-first: less computationally intensive<br />The additional exon-junctions found by seed-extend have not (yet) been demonstrated to be real.<br />August 11, 2011<br />Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011 PMID: 21623353.<br />
    11. 11. Challenges for transcript detection<br />Identifying isoforms is difficult<br />Transcript abundance is volatile<br />Most reads are not helpful (reads from exons) or even misleading (incompletely spliced precursor RNA) <br />Genes can have many isoforms<br />Approaches<br />Ignore isoforms<br />Genome-guided reconstruction, e.g. Cufflinks<br />Genome-independent reconstruction, e.g. Trinity<br />August 11, 2011<br />QBI data<br />Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011 PMID: 21623353.<br />
    12. 12. Genome-guided reconstruction<br />Use reads spanning splice junctions to assemble the transcript paths<br />Work out the minimal possible set of paths so that all reads are visited (graph theory)<br />If there is more than one such set, use read counts to pick the most probable <br />August 11, 2011<br />Reads aligned to the genome<br />Isoforms<br />Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011. PMID: 21623353.<br />
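The "minimal set of paths" step can be illustrated with a textbook technique: a minimum path cover of a directed acyclic graph, computed through maximum bipartite matching. The sketch assumes that aligned fragments are numbered nodes and that an edge (u, v) means fragment v is compatible with and downstream of u; the function name and the toy graph are hypothetical, and choosing between equally small covers by read count is not shown.

```python
def min_path_cover(n_nodes, edges):
    """Minimum vertex-disjoint path cover of a DAG.

    Each matched edge (u, v) links u to its chosen successor v, so the
    number of paths equals n_nodes minus the size of the matching.
    """
    successors = {u: [] for u in range(n_nodes)}
    for u, v in edges:
        successors[u].append(v)

    predecessor_of = {}  # v -> u, meaning the edge u -> v is in the matching

    def try_augment(u, seen):
        # Standard augmenting-path search for bipartite matching.
        for v in successors[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in predecessor_of or try_augment(predecessor_of[v], seen):
                predecessor_of[v] = u
                return True
        return False

    for u in range(n_nodes):
        try_augment(u, set())

    successor_of = {u: v for v, u in predecessor_of.items()}
    paths = []
    for start in range(n_nodes):
        if start in predecessor_of:
            continue  # not the first node of a path
        path = [start]
        while path[-1] in successor_of:
            path.append(successor_of[path[-1]])
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Hypothetical fragments: 0 = first exon, 1 and 2 = mutually
    # incompatible junction reads (two splice variants), 3 = last exon.
    compatible = [(0, 1), (0, 2), (0, 3), (1, 3), (2, 3)]
    for path in min_path_cover(4, compatible):
        print(path)
```

Because fragments 1 and 2 cannot belong to the same transcript, no single path visits all four nodes, and at least two paths (two isoforms) are needed.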
    13. 13. Genome-independent reconstruction<br />Break reads into k-mers and find their mutual overlaps to build a de Bruijn graph<br />Find probable paths through the graph by using read counts<br />Map the consensus assembly to the genome<br />August 11, 2011<br />Iyer MK, Chinnaiyan AM. RNA-Seq unleashed. Nat Biotechnol. 2011. PMID: 21747384.<br />
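The first two steps on this slide can be sketched as follows: every k-mer in the reads becomes an edge between its (k-1)-mer prefix and suffix, with a count of how often it was seen, and a greedy walk then follows the best-supported edge at each node. This is a deliberately simplified illustration and not Trinity's algorithm; the function names, the toy reads and the greedy strategy are assumptions.

```python
from collections import Counter, defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: (k-1)-mer nodes, k-mer edges with counts."""
    edge_counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edge_counts[(kmer[:-1], kmer[1:])] += 1
    graph = defaultdict(dict)
    for (source, target), count in edge_counts.items():
        graph[source][target] = count
    return graph


def greedy_path(graph, start):
    """Follow the highest-count edge from each node until a dead end or a cycle."""
    contig, node, visited = start, start, {start}
    while graph.get(node):
        best = max(graph[node], key=graph[node].get)
        if best in visited:
            break  # stop instead of looping forever
        contig += best[-1]
        visited.add(best)
        node = best
    return contig


if __name__ == "__main__":
    # Toy reads; the last one carries a hypothetical sequencing error
    # that creates a weakly supported side branch in the graph.
    reads = ["ATGGCGT", "GGCGTGC", "GCGTGCA", "GTGCAAT", "GGCGTTC"]
    graph = build_de_bruijn(reads, k=4)
    print(dict(graph["CGT"]))
    print(greedy_path(graph, "ATG"))
```

At the branching node the walk prefers the edge seen in more reads, which is the sense in which read counts pick the probable path.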
    14. 14. Which method?<br />De novo methods are very computationally intensive<br />However, they are able to find alternative isoforms and promoters and structural variation<br />deletions (yellow)<br />chimeras (green)<br />August 11, 2011<br />Iyer MK, Chinnaiyan AM. RNA-Seq unleashed. Nat Biotechnol. 2011 PMID: 21747384.<br />
    15. 15. What are real transcripts?<br />Even the most sophisticated computational method can’t tell you what is a real transcript.<br />August 11, 2011<br />Roberts et al.<br />QBI data<br />Roberts A, Pimentel H, Trapnell C, Pachter L. Identification of novel transcripts in annotated genomes using RNA-Seq. Bioinformatics. 2011 PMID: 21697122.<br />
    16. 16. Solution: biological replicates<br />Significant findings (here: new isoforms) in small sample sets can be due to <br />Technical errors<br />Biological variability<br />Population outliers<br />Sequencing experiments are subject to the same issues (even though they are more expensive than arrays)<br />Replicates are necessary to build confidence in your results!<br />August 11, 2011<br />Hansen KD, Wu Z, Irizarry RA, Leek JT. Sequencing technology does not eliminate biological variability. Nat Biotechnol. 2011 PMID: 21747377<br />
    17. 17. Three things to remember<br />Methods for analyzing RNAseq data are not yet as mature as expression array analysis tools.<br />Identifying transcript isoforms is especially difficult.<br />Replicates are crucial to account for biological variability.<br />August 11, 2011<br />
    18. 18. Next Week:<br />August 11, 2011<br />Abstract: This session will follow up on transcript quantification of RNAseq data, discuss statistical means of identifying differentially regulated transcripts and isoforms, and contrast these with microarray analysis approaches.<br />
