Transcript detection in RNAseq

Abstract: This session focuses on the differences between standard DNA mapping and RNAseq-specific transcript mapping: identifying splice variants and isoforms. Transcript quantification and the genomic variants that can be identified from RNAseq data will also be discussed.

Published in: Technology

  • http://www.nature.com/nbt/journal/v29/n7/full/nbt.1915.html
  • That is, double-stranded cDNA is denatured, then allowed to partially re-anneal, and the most abundant species, which re-anneal most rapidly, are digested with crab duplex-specific nuclease
  • Transcript

    • 1. [by Joseph Robertson]
      RNAseq analysis: Transcript detection (1/2)
      What is a jar ?
      August 11, 2011
    • 2. Quick recap: Production informatics
      Sequencing->Images->Conversion (Demultiplexing)
      Resulting file type: FASTQ
      “Having raw sequence reads and quality scores”
      [Figure: pipeline stages Sequencing, Image, Fastq, Quality Control, Projects]
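      The FASTQ file type named on this slide stores each read as four lines: a header, the sequence, a "+" separator and a quality string. The following is a minimal sketch of reading such records, not part of the original slides. It assumes four-line records and a Phred offset of 33 (older Illumina pipelines used 64, so the offset is a parameter); the names `FastqRecord`, `parse_fastq` and `mean_quality` are hypothetical.

```python
import io
from typing import Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]  # one Phred score per base


def parse_fastq(handle, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FASTQ stream.

    The offset of 33 is an assumption; pass 64 for older Illumina output.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a trailing blank line)
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is ignored
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        scores = [ord(char) - offset for char in quality]
        yield FastqRecord(header[1:], sequence, scores)


def mean_quality(record: FastqRecord) -> float:
    """Average Phred score of a read, a crude quality-control metric."""
    return sum(record.qualities) / len(record.qualities)


# Two made-up reads: one of uniformly high quality, one of mixed quality.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!5I\n")
records = list(parse_fastq(example))
for record in records:
    print(record.name, record.sequence, mean_quality(record))
```

      A quality-control step could, for example, discard reads whose mean score falls below a chosen threshold before mapping.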
    • 3. Objective & Challenges
      Objective: study the active transcriptome of the cell
      Problems:
      The RNA content of a cell is dominated by tRNA, rRNA and housekeeping genes
      A flowcell has only finite real estate, most of which would be occupied by these largely invariant transcripts
      How to focus the sequencing on the “interesting” part of the transcriptome: mRNA and ncRNA?
    • 4. What RNAseq protocols are there?
      RNA seq
      total RNA tRNA/rRNA removed + PolyA-tail filtered
      Good for studying protein coding genes, e.g.
      gene expression, isoforms, expression of variant alleles
      RNA editing events
      RNA-DNA differences in the human transcriptome provide a yet-unexplored aspect of genome variation.
      Small RNAseq:
      Total RNA size selection for small RNA molecules
      Good for small ncRNA e.g. miRNAs, snoRNA
      Duplex-specific thermostable nuclease (DSN) guided RNA seq normalization
      Total RNA  high abundant transcripts are digested
      Good for studying all transcripts
      Today
      Li M, Wang IX, Li Y, Bruzel A, Richards AL, Toung JM, Cheung VG. Widespread RNA and DNA sequence differences in the human transcriptome. Science. 2011. PMID: 21596952.
      Christodoulou DC, Gorham JM, Herman DS, Seidman JG. Construction of normalized RNA-seq libraries for next-generation sequencing using the crab duplex-specific nuclease. Curr Protoc Mol Biol. 2011 PMID: 21472699.
    • 5. RNA-seq workflow
      Select PolyA-tail + remove tRNA/rRNA
      Fragment RNA
      Make cDNA (caution: you may lose strand info)
      Sequence
      Map reads
      Identify transcripts
      Quantify transcripts
      Identify differences between conditions
      Today
      Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008 PMID: 18516045.
    • 6. Production Informatics and Bioinformatics
      Basic Production Informatics: produce raw sequence reads
      Advanced Production Informatics: map to genome and generate raw genomic features (e.g. SNPs)
      Bioinformatics Research: analyze the data; uncover the biological meaning
      Per one-flowcell project
    • 7. Challenges for RNAseq read mapping
      Losing reads because they do not match the ref. genome
      Reads spanning exon junctions
      RNA editing events
      Approaches
      Align to ref. transcriptome library
      Exon-first e.g. Tophat
      Seed-extend methods e.g. GSNAP
      [Figure labels: sequencing reads, DNA, gRNA, mRNA, editing event]
      Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011 PMID: 21623353.
      Li M, Wang IX, Li Y, Bruzel A, Richards AL, Toung JM, Cheung VG. Widespread RNA and DNA sequence differences in the human transcriptome. Science. 2011. PMID: 21596952.
    • 8. Exon-first approach
      Align reads to ref. genome
      Chop up unaligned reads and try to identify matching regions
      Find splice junctions around the matches
      Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011 PMID: 21623353.
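      The three exon-first steps above can be illustrated with a toy example. This is a simplified sketch, not TopHat's actual algorithm: it uses exact string matching instead of a real aligner, tries every split point of an unaligned read, and accepts a gap only if it starts with GT and ends with AG (the canonical intron motif, added here as an assumption to resolve ambiguous split points). The genome, the read and the function names are made up for illustration.

```python
from typing import Optional, Tuple


def align_unspliced(genome: str, read: str) -> Optional[int]:
    """Step 1: try to place the whole read on the genome (exact match only)."""
    position = genome.find(read)
    return None if position < 0 else position


def find_junction(genome: str, read: str,
                  min_anchor: int = 4) -> Optional[Tuple[int, int, int]]:
    """Steps 2 and 3: chop an unaligned read in two and look for a splice junction.

    Returns (read_start, intron_start, intron_end), where the intron is
    genome[intron_start:intron_end], or None if no junction is found.
    """
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        left_pos = genome.find(left)
        if left_pos < 0:
            continue
        intron_start = left_pos + split
        right_pos = genome.find(right, intron_start + 1)
        if right_pos < 0:
            continue
        intron = genome[intron_start:right_pos]
        # Assumption: only canonical GT...AG introns are accepted.
        if intron.startswith("GT") and intron.endswith("AG"):
            return left_pos, intron_start, right_pos
    return None


# Toy genome: exon 1, a GT...AG intron, exon 2.
exon1, intron, exon2 = "ATGGCCATTG", "GTAAGTCCCCTTTTCAG", "CTGAACGTAC"
genome = exon1 + intron + exon2
read = exon1[-6:] + exon2[:6]  # a read spanning the exon junction

print(align_unspliced(genome, read))  # no contiguous placement exists
print(find_junction(genome, read))
```

      The junction-spanning read fails the first pass and is only placed once it is split, which is why such reads are lost by a plain DNA mapper.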
    • 9. Seed-extend approach
      Break reads into smaller k-mers and find matches
      Iteratively extend k-mers to identify exact spliced alignment
      Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011 PMID: 21623353.
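      A toy version of the seed-extend idea is sketched below. It is not GSNAP's implementation: it indexes every k-mer of a made-up genome, looks up the k-mer at the current read position as a seed, and extends the match base by base until the first mismatch, then seeds again from the remainder of the read. Exact matching, the greedy extension and the names `build_index` and `seed_and_extend` are all simplifying assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_index(genome: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer of the genome to the positions where it occurs."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def seed_and_extend(genome: str, read: str, index: Dict[str, List[int]],
                    k: int) -> List[Tuple[int, int, int]]:
    """Return aligned blocks as (read_start, genome_start, length) tuples.

    More than one block means the read was split, e.g. across an intron.
    A tail shorter than k that cannot be seeded is left unaligned.
    """
    blocks = []
    read_pos = 0
    min_genome_pos = 0  # later blocks must lie downstream of earlier ones
    while read_pos + k <= len(read):
        seed = read[read_pos:read_pos + k]
        hits = [p for p in index.get(seed, []) if p >= min_genome_pos]
        if not hits:
            read_pos += 1  # no seed hit here: slide one base along the read
            continue
        genome_pos = hits[0]
        length = k
        # Extend the seed to the right while read and genome agree.
        while (read_pos + length < len(read)
               and genome_pos + length < len(genome)
               and read[read_pos + length] == genome[genome_pos + length]):
            length += 1
        blocks.append((read_pos, genome_pos, length))
        read_pos += length
        min_genome_pos = genome_pos + length
    return blocks


# Toy genome: exon 1 (0-9), intron (10-26), exon 2 (27-36).
genome = "ATGGCCATTG" + "GTAAGTCCCCTTTTCAG" + "CTGAACGTAC"
read = "CCATTG" + "CTGAAC"  # spans the exon 1 / exon 2 junction
index = build_index(genome, 4)
print(seed_and_extend(genome, read, index, 4))
```

      The gap between the two blocks on the genome corresponds to the intron. Real seed-extend aligners also tolerate mismatches, which this sketch does not.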
    • 10. Which method?
      Exon-first: less computationally intensive
      The additional exon-junctions found by seed-extend have not (yet) been demonstrated to be real.
      Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011 PMID: 21623353.
    • 11. Challenges for transcript detection
      Identifying isoforms is difficult
      Transcript abundance is volatile
      Most reads are not helpful (reads from exons) or even misleading (incompletely spliced precursor RNA)
      Genes can have many isoforms
      Approaches
      Ignore isoforms
      Genome-guided reconstruction, e.g. Cufflinks
      Genome-independent reconstruction, e.g. Trinity
      QBI data
      Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011 PMID: 21623353.
    • 12. Genome-guided reconstruction
      Use reads spanning splice junctions to assemble the transcript paths
      Work out the minimal set of paths that covers all reads (graph theory)
      If more than one such set exists, use read counts to pick the most probable one
      Reads aligned to the genome
      Isoforms
      Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011 PMID: 21623353.
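      The graph idea can be illustrated with a toy splice graph in which exons are nodes and each edge carries the number of reads spanning that junction. The sketch below is a deliberate simplification and does not solve the minimal path cover problem that tools such as Cufflinks address: it only enumerates every path from the first to the last exon and ranks the candidate isoforms by their weakest junction. The junction counts and function names are invented for illustration.

```python
from typing import Dict, List

SpliceGraph = Dict[str, Dict[str, int]]  # exon -> {next exon: junction read count}


def enumerate_paths(graph: SpliceGraph, node: str, sink: str) -> List[List[str]]:
    """List every exon path from node to sink (the graph must be acyclic)."""
    if node == sink:
        return [[node]]
    paths = []
    for successor in sorted(graph.get(node, {})):
        for tail in enumerate_paths(graph, successor, sink):
            paths.append([node] + tail)
    return paths


def path_support(graph: SpliceGraph, path: List[str]) -> int:
    """Score a path by its least-supported junction (a bottleneck score)."""
    return min(graph[a][b] for a, b in zip(path, path[1:]))


# Hypothetical junction read counts: exon2 is skipped by a minority of reads.
junction_counts: SpliceGraph = {
    "exon1": {"exon2": 30, "exon3": 5},
    "exon2": {"exon3": 28},
}

paths = enumerate_paths(junction_counts, "exon1", "exon3")
ranked = sorted(paths, key=lambda p: path_support(junction_counts, p), reverse=True)
for path in ranked:
    print(" -> ".join(path), path_support(junction_counts, path))
```

      Here the full-length isoform is ranked above the exon-skipping one, but both remain candidates. Deciding whether the weakly supported one is real is exactly the difficulty the later slides discuss.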
    • 13. Genome-independent reconstruction
      Break reads into k-mers and find their mutual overlaps to build a de Bruijn graph
      Find probable paths through the graph by using read counts
      Map consensus assembly to genome
      Iyer MK, Chinnaiyan AM. RNA-Seq unleashed. Nat Biotechnol. 2011 PMID: 21747384.
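      The first two steps of the genome-independent approach can be sketched on a handful of made-up reads. This is a toy under stated assumptions, not how Trinity works internally: nodes are (k-1)-mers, each k-mer in a read adds one to the count of the edge between its prefix and suffix, and a greedy walk follows the best-supported edge at every branch. The reads, the start node and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def build_de_bruijn(reads: List[str], k: int) -> Dict[Edge, int]:
    """Count how often each k-mer (an edge between two (k-1)-mers) is observed."""
    edges: Dict[Edge, int] = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[(kmer[:-1], kmer[1:])] += 1
    return edges


def greedy_path(edges: Dict[Edge, int], start: str) -> str:
    """Walk from start, always taking the outgoing edge with the highest count."""
    successors = defaultdict(list)
    for (left, right), count in edges.items():
        successors[left].append((count, right))
    contig, node, visited = start, start, {start}
    while successors.get(node):
        _, best = max(successors[node])
        if best in visited:  # stop rather than loop through a cycle
            break
        contig += best[-1]
        visited.add(best)
        node = best
    return contig


# Three overlapping reads from one transcript, plus one read ending in an error.
reads = ["ATGGCG", "GGCGTG", "CGTGCA", "GGCGTT"]
edges = build_de_bruijn(reads, k=4)
print(greedy_path(edges, "ATG"))
```

      The erroneous read creates a branch at node CGT, and the higher read count on the correct edge decides which way the walk goes. Real assemblers report several paths per graph, since alternative isoforms also show up as branches.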
    • 14. Which method?
      De novo methods are very computationally intensive
      However, they are able to find alternative isoforms and promoters and structural variation
      deletions (yellow)
      chimeras (green)
      Iyer MK, Chinnaiyan AM. RNA-Seq unleashed. Nat Biotechnol. 2011 PMID: 21747384.
    • 15. What are real transcripts?
      Even the most sophisticated computational method can’t tell you what is a real transcript.
      Roberts et al.
      QBI data
      Roberts A, Pimentel H, Trapnell C, Pachter L. Identification of novel transcripts in annotated genomes using RNA-Seq. Bioinformatics. 2011 PMID: 21697122.
    • 16. Solution: biological replicates
      Significant findings (here: new isoforms) in small sample sets can be due to
      Technical errors
      Biological variability
      Population outliers
      Sequencing experiments are subject to the same issues (even though they are more expensive than arrays)
      Replicates are necessary to build confidence in your results!
      Hansen KD, Wu Z, Irizarry RA, Leek JT. Sequencing technology does not eliminate biological variability. Nat Biotechnol. 2011 PMID: 21747377
    • 17. Three things to remember
      Methods for analyzing RNAseq data are not yet as mature as expression array analysis tools.
      Identifying transcript isoforms is especially difficult.
      Replicates are crucial to account for biological variability.
    • 18. Next Week:
      Abstract: This session follows up on transcript quantification of RNAseq data, discusses statistical means of identifying differentially regulated transcripts and isoforms, and contrasts these with microarray analysis approaches.