Transcript detection in RNAseq

Abstract: The focus in this session will be put on the differences between standard DNA mapping and RNAseq-specific transcript mapping: identifying splice variants and isoforms. The issue of transcript quantification and genomic variants that can be identified from RNAseq data will be discussed.

  • http://www.nature.com/nbt/journal/v29/n7/full/nbt.1915.html
  • That is, double-stranded cDNA is denatured, then allowed to partially re-anneal, and the most abundant species, which re-anneal most rapidly, are digested with crab duplex-specific nuclease

Transcript

  • 1. [by Joseph Robertson]
    RNAseq analysis: Transcript detection (1/2)
    August 11, 2011
  • 2. Quick recap: Production informatics
    Sequencing->Images->Conversion (Demultiplexing)
    Resulting file type: FASTQ
    “Having raw sequence reads and quality scores”
    Diagram: Sequencing -> Image -> FASTQ -> Quality Control -> Projects
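As a rough illustration of what "raw sequence reads and quality scores" means in practice, the sketch below reads four-line FASTQ records and decodes the quality string into Phred scores. The function name `parse_fastq` and the default offset of 33 are assumptions made for this example; some Illumina pipeline versions of this period wrote qualities with an offset of 64 instead, so the offset is left as a parameter.

```python
def parse_fastq(lines, offset=33):
    """Yield (read_id, sequence, phred_scores) from four-line FASTQ records.

    Assumes unwrapped records (exactly four lines each) and an ASCII
    quality offset of 33 unless told otherwise.
    """
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between records
        sequence = next(it).strip()
        next(it)  # the '+' separator line carries no information we need
        quality = next(it).strip()
        scores = [ord(ch) - offset for ch in quality]
        yield header[1:], sequence, scores


example = ["@read1", "ACGT", "+", "II5!"]
records = list(parse_fastq(example))
for read_id, sequence, scores in records:
    print(read_id, sequence, scores)
```

A Phred score Q corresponds to an error probability of 10^(-Q/10), so a score of 20 means roughly a 1 in 100 chance that the base call is wrong.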
  • 3. Objective & Challenges
    Objective: study the active transcriptome of the cell
    Problems:
    The RNA content of a cell is dominated by tRNA, rRNA and housekeeping genes
    The flowcell has only finite real estate, most of which would be occupied by these mainly invariable transcripts
    How to focus the sequencing on the “interesting” part of the transcriptome: mRNA and ncRNA?
  • 4. What RNAseq protocols are there?
    RNAseq:
    Total RNA -> tRNA/rRNA removed + PolyA-tail filtered
    Good for studying protein coding genes, e.g.
    gene expression, isoforms, expression of variant alleles
    RNA editing events
    RNA-DNA differences in the human transcriptome provide a yet-unexplored aspect of genome variation.
    Small RNAseq:
    Total RNA -> size selection for small RNA molecules
    Good for small ncRNA e.g. miRNAs, snoRNA
    Duplex-specific thermostable nuclease (DSN) guided RNAseq normalization:
    Total RNA -> highly abundant transcripts are digested
    Good for studying all transcripts
    Today
    Li M, Wang IX, Li Y, Bruzel A, Richards AL, Toung JM, Cheung VG. Widespread RNA and DNA sequence differences in the human transcriptome. Science. 2011. PMID: 21596952.
    Christodoulou DC, Gorham JM, Herman DS, Seidman JG. Construction of normalized RNA-seq libraries for next-generation sequencing using the crab duplex-specific nuclease. Curr Protoc Mol Biol. 2011 PMID: 21472699.
  • 5. RNA-seq workflow
    Select PolyA-tail + remove tRNA/rRNA
    Fragment RNA
    Make cDNA (caution: you may lose strand information)
    Sequence
    Map reads
    Identify transcripts
    Quantify transcripts
    Identify differences between conditions
    Today
    Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008 PMID: 18516045.
  • 6. Production Informatics and Bioinformatics
    Basic Production Informatics: produce raw sequence reads
    Advanced Production Informatics: map to genome and generate raw genomic features (e.g. SNPs)
    Bioinformatics Research: analyze the data; uncover the biological meaning
    (Per one-flowcell project)
  • 7. Challenges for RNAseq read mapping
    Losing reads because they do not match the ref. genome
    Reads spanning exon junctions
    RNA editing events
    Approaches
    Align to ref. transcriptome library
    Exon-first, e.g. TopHat
    Seed-extend methods e.g. GSNAP
    Figure labels: sequencing reads; DNA; gRNA; mRNA; editing event
    Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011 PMID: 21623353.
    Li M, Wang IX, Li Y, Bruzel A, Richards AL, Toung JM, Cheung VG. Widespread RNA and DNA sequence differences in the human transcriptome. Science. 2011. PMID: 21596952.
  • 8. Exon-first approach
    Align reads to ref. genome
    Chop up unaligned reads and try to identify matching regions
    Find splice junctions around the matches
    Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011 PMID: 21623353.
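The three steps on this slide can be sketched with a toy example. This is a minimal illustration only, not how TopHat is implemented: it uses exact string matching instead of a real aligner, splits each unaligned read into just a prefix and a suffix, and takes the first consistent placement. The function name `exon_first_align`, its parameters and the toy genome are all assumptions made for the example.

```python
def exon_first_align(read, genome, min_anchor=4, min_intron=2):
    """Toy exon-first alignment using exact matching only.

    Step 1: try to place the whole read without a gap.
    Step 2: if that fails, chop the read into a prefix and a suffix and
            look for both pieces, the suffix downstream of the prefix.
    Step 3: the gap between the two matches is reported as the intron.
    """
    start = genome.find(read)
    if start != -1:
        return {"type": "unspliced", "start": start}

    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        left_pos = genome.find(left)
        if left_pos == -1:
            continue
        donor = left_pos + split  # first genomic base after the prefix
        right_pos = genome.find(right, donor + min_intron)
        if right_pos != -1:
            return {"type": "spliced", "start": left_pos,
                    "intron": (donor, right_pos)}
    return None  # read could not be placed


# Hypothetical two-exon gene with a GT...AG intron in between.
exon1, intron, exon2 = "ACGTTGCA", "GTAAGTCCCTTTAG", "GATTACAG"
genome = exon1 + intron + exon2

exonic_read = "CGTTGC"                      # lies entirely inside exon 1
junction_read = exon1[-5:] + exon2[:5]      # spans the exon1/exon2 junction

exonic_hit = exon_first_align(exonic_read, genome)
junction_hit = exon_first_align(junction_read, genome)
print(exonic_hit)
print(junction_hit)
```

The point of the design is that most reads are resolved by the cheap first step, and only the leftovers go through the more expensive split search.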
  • 9. Seed-extend approach
    Break reads into smaller k-mers and find matches
    Iteratively extend k-mers to identify exact spliced alignment
    Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011 PMID: 21623353.
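A toy version of the seed-extend idea is sketched below. It is not GSNAP's algorithm: it indexes the genome's k-mers in a dictionary, uses the first downstream hit of each seed, and extends only to the right with exact matches. The names `build_kmer_index` and `seed_extend` and the toy genome are assumptions made for the example.

```python
from collections import defaultdict


def build_kmer_index(genome, k):
    """Map every k-mer in the genome to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(genome) - k + 1):
        index[genome[pos:pos + k]].append(pos)
    return index


def seed_extend(read, genome, index, k):
    """Return aligned blocks as (read_start, genome_start, length) tuples."""
    blocks = []
    read_pos = 0
    min_genome_pos = 0  # later blocks must lie downstream of earlier ones
    while read_pos + k <= len(read):
        seed = read[read_pos:read_pos + k]
        hits = [g for g in index.get(seed, []) if g >= min_genome_pos]
        if not hits:
            read_pos += 1  # no seed here, slide one base along the read
            continue
        genome_pos = hits[0]
        length = k
        # Extend the seed to the right while read and genome keep agreeing.
        while (read_pos + length < len(read)
               and genome_pos + length < len(genome)
               and read[read_pos + length] == genome[genome_pos + length]):
            length += 1
        blocks.append((read_pos, genome_pos, length))
        read_pos += length
        min_genome_pos = genome_pos + length
    return blocks


# Hypothetical two-exon gene: exon 1, a GT...AG intron, exon 2.
genome = "ACGTTGCA" + "GTAAGTCCCTTTAG" + "GATTACAG"
index = build_kmer_index(genome, 4)
blocks = seed_extend("TTGCAGATTA", genome, index, 4)
print(blocks)
```

In this example the first block runs one base into the intron, because the intron and the second exon both begin with G. Greedy extension cannot resolve that ambiguity on its own; real aligners typically use splice-site signals such as GT...AG to decide where the junction is placed.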
  • 10. Which method?
    Exon-first: less computationally intensive
    The additional exon-junctions found by seed-extend have not (yet) been demonstrated to be real.
    Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011 PMID: 21623353.
  • 11. Challenges for transcript detection
    Identifying isoforms is difficult
    Transcript abundance is volatile
    Most reads are not helpful (reads from exons) or even misleading (incompletely spliced precursor RNA)
    Genes can have many isoforms
    Approaches
    Ignore isoforms
    Genome-guided reconstruction, e.g. Cufflinks
    Genome-independent reconstruction, e.g. Trinity
    QBI data
    Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011 PMID: 21623353.
  • 12. Genome-guided reconstruction
    Use reads spanning splice junctions to assemble the transcript paths
    Work out the smallest set of paths that covers all reads (graph theory)
    If more than one such set exists, use read counts to pick the most probable
    Reads aligned to the genome
    Isoforms
    Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011 PMID: 21623353.
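The "smallest set of paths that covers all reads" step can be illustrated as a minimum path cover on a directed acyclic graph, solved here with a simple augmenting-path bipartite matching. This is a sketch of the graph-theory idea only, not Cufflinks' implementation. The four fragments and their compatibility edges are invented for the example: an edge u -> v means v lies downstream of u and both could come from the same transcript.

```python
def min_path_cover(nodes, edges):
    """Cover every node of a DAG with the fewest vertex-disjoint paths.

    Uses the classic reduction to bipartite matching: each matched edge
    u -> v joins v onto the path ending at u, so the number of paths is
    len(nodes) minus the size of a maximum matching.
    """
    successors = {u: [] for u in nodes}
    for u, v in edges:
        successors[u].append(v)

    predecessor = {}  # predecessor[v] = u means v directly follows u

    def try_augment(u, seen):
        for v in successors[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in predecessor or try_augment(predecessor[v], seen):
                predecessor[v] = u
                return True
        return False

    for u in nodes:
        try_augment(u, set())

    follower = {u: v for v, u in predecessor.items()}
    paths = []
    for start in nodes:
        if start in predecessor:
            continue  # not the first fragment of its path
        path = [start]
        while path[-1] in follower:
            path.append(follower[path[-1]])
        paths.append(path)
    return paths


# Hypothetical fragments from a four-exon gene:
#   a spans exon1-exon2, b spans exon2-exon3,
#   c spans exon1-exon3 (skips exon 2), d spans exon3-exon4.
# a and c imply different introns, as do b and c, so they share no edge.
fragments = ["a", "b", "c", "d"]
compatible = [("a", "b"), ("a", "d"), ("b", "d"), ("c", "d")]

paths = min_path_cover(fragments, compatible)
print(len(paths), paths)
```

Two paths are needed here, which matches the intuition that an exon-skipping fragment cannot be explained by the same isoform as the fragments that include the exon. Several equally small covers can exist, which is where read counts come in as a tie-breaker.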
  • 13. Genome-independent reconstruction
    Break reads into k-mers and find their mutual overlaps to build a de Bruijn graph
    Find probable paths through the graph by using read counts
    Map consensus assembly to genome
    Iyer MK, Chinnaiyan AM. RNA-Seq unleashed. Nat Biotechnol. 2011 PMID: 21747384.
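The first two steps on this slide can be sketched as follows. This is a toy illustration under simplifying assumptions (error-free reads, a single strand, a greedy walk from a known start node) and not how Trinity works internally; the function names and the three reads are invented for the example.

```python
from collections import Counter, defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers.

    graph[left][right] holds how often the k-mer joining the two nodes
    was seen across all reads.
    """
    edge_counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edge_counts[(kmer[:-1], kmer[1:])] += 1
    graph = defaultdict(dict)
    for (left, right), count in edge_counts.items():
        graph[left][right] = count
    return graph


def greedy_path(graph, start):
    """Walk from start, always following the best-supported outgoing edge."""
    contig = start
    node = start
    visited = {start}
    while node in graph:
        best = max(graph[node], key=graph[node].get)
        if best in visited:
            break  # stop rather than loop around a cycle
        visited.add(best)
        contig += best[-1]  # each step adds exactly one new base
        node = best
    return contig


# Three overlapping reads from a hypothetical transcript.
reads = ["ATGGCGT", "GGCGTAC", "CGTACCA"]
graph = build_de_bruijn(reads, k=4)
contig = greedy_path(graph, "ATG")
print(contig)
```

Where a node has several outgoing edges, for example at an alternative splice site, the edge counts are what lets the walk prefer the better-supported isoform. Mapping the resulting contig back to the genome is the third step and is not shown here.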
  • 14. Which method?
    De novo methods are very computationally intensive
    However, they are able to find alternative isoforms and promoters and structural variation
    deletions (yellow)
    chimeras (green)
    Iyer MK, Chinnaiyan AM. RNA-Seq unleashed. Nat Biotechnol. 2011 PMID: 21747384.
  • 15. What are real transcripts?
    Even the most sophisticated computational method can’t tell you what is a real transcript.
    Roberts et al.
    QBI data
    Roberts A, Pimentel H, Trapnell C, Pachter L. Identification of novel transcripts in annotated genomes using RNA-Seq. Bioinformatics. 2011 PMID: 21697122.
  • 16. Solution: biological replicates
    Significant findings (here: new isoforms) in small sample sets can be due to
    Technical errors
    Biological variability
    Population outliers
    Sequencing experiments are subject to the same issues (even though they are more expensive than arrays)
    Replicates are necessary to build confidence in your results!
    Hansen KD, Wu Z, Irizarry RA, Leek JT. Sequencing technology does not eliminate biological variability. Nat Biotechnol. 2011 PMID: 21747377
  • 17. Three things to remember
    Methods for analyzing RNAseq data are not yet as mature as expression array analysis tools.
    Identifying transcript isoforms is especially difficult.
    Replicates are crucial to account for biological variability.
  • 18. Next Week:
    Abstract: This session will follow up on transcript quantification of RNAseq data. It will discuss statistical means of identifying differentially regulated transcripts and isoforms, and contrast these with microarray analysis approaches.