Applied Bioinformatics Journal Club: PacBio RNA-Seq

  1. Journal Club
     A single-molecule long-read survey of the human transcriptome
     Sharon et al., Nature Biotechnology 31, 1009–1014 (2013)
     Sanzhen Liu, Plant Pathology, 3/12/2014
  2. PacBio technology
     • Amplification-free sequencing
     • Very long reads (up to 20 kb, peaking at 2-6 kb)
     • High error rate (errors are random, not context-specific)
     (Source: PacBio website)
  3. CCS approach
     • High-quality, single-molecule, circular-consensus (CCS) reads
     (Source: http://flxlexblog.wordpress.com/2013/02/11/applications-for-pacbio-circular-consensus-sequencing/)
  4. Figure 1
     • Input: pooled RNA from 20 tissues
     • Approach: prepare double-stranded cDNA -> CCS library -> PacBio sequencing
     • Output: 476,000 CCS reads, mean length 1 kb
     • 61% of reads cover all introns and most of the first and last exons
     • CCS reads cover short transcripts (<1.2 kb) well (generally >90% coverage), but coverage stays low for long transcripts, especially those >2.4 kb
  5. Figure 2
     • Missing 3’ ends
     • Missing 5’ ends
     • The correlations of the number of reads and …
     • ERCC: a mixture of known, quantified RNAs
  6. Figure 3
     • An estimated 67% of molecules contain splice sites
     • CSMM: consensus split-mapped molecule (accurate CCS reads with splice sites?)
     • Detected splice sites match annotated splice sites well
     • PacBio (versus 454) shows much higher power to detect isoforms with ≥10 introns
     • Estimate: 21,000 genes and 139,000 isoforms could be detected with high-depth sequencing
  7. Summary
     • Full-length RNAs of up to 1.5 kb can readily be monitored, with little sequence loss at the 5’ ends
     • From 476k CCS reads (>300 bp), 14,000 spliced genes were identified
     • Most introns are consistent with annotations, but >10% are novel
  8. Conclusion
     • Isoforms can be monitored at the single-molecule level without amplification or fragmentation
     • Most reads represent all splice sites of the original transcripts
     • Unannotated splice isoforms: long non-coding RNAs with few introns, and isoforms of known protein-coding genes with many introns
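Slides 2 and 3 note that raw PacBio errors are random rather than context-specific, which is why circular consensus helps: each pass over the same circular molecule makes independent mistakes. The sketch below estimates, under a deliberately simplified model, how quickly a per-base majority vote improves as passes are added. Assumptions not taken from the slides: passes are already aligned, errors at a position are independent across passes, and the 15% per-pass error rate is a hypothetical value chosen for illustration. Real CCS software aligns passes and models insertions and deletions, so this is a toy upper bound, not PacBio's algorithm.

```python
from math import comb


def consensus_error_upper_bound(per_pass_error: float, passes: int) -> float:
    """Upper bound on the chance that a per-base majority vote is wrong.

    Simplified model (assumption): each of `passes` aligned reads of the same
    base is independently wrong with probability `per_pass_error`. If fewer
    than half of the passes are wrong, the correct base holds a strict
    majority and wins, so the vote can only fail when at least
    ceil(passes / 2) passes are wrong (ties are counted as failures).
    """
    if passes < 1:
        raise ValueError("need at least one pass")
    if not 0.0 <= per_pass_error <= 1.0:
        raise ValueError("error rate must be a probability")
    p, q = per_pass_error, 1.0 - per_pass_error
    threshold = (passes + 1) // 2  # equals ceil(passes / 2)
    # Binomial tail: probability that `threshold` or more passes are wrong.
    return sum(
        comb(passes, k) * p**k * q ** (passes - k)
        for k in range(threshold, passes + 1)
    )


if __name__ == "__main__":
    RAW_ERROR = 0.15  # hypothetical per-pass error rate, for illustration only
    for n in (1, 3, 5, 7, 9):
        print(f"{n} passes: <= {consensus_error_upper_bound(RAW_ERROR, n):.4f}")
```

Because the model ignores indels and assumes full independence, the numbers are qualitative: they show why consensus accuracy climbs with the number of passes, not the accuracy a real CCS read achieves.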
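Slide 4's coverage observation (near-complete coverage for transcripts under 1.2 kb, low coverage above 2.4 kb) comes from comparing each read's aligned span with the length of the transcript it maps to. A minimal sketch of that bookkeeping follows, assuming each read has already been assigned to a single transcript and its alignment is given as a half-open (start, end) interval in transcript coordinates. The bin edges reuse the slide's 1.2 kb and 2.4 kb thresholds; the `ReadHit` type and the toy values are hypothetical.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class ReadHit:
    """One read aligned to one transcript (hypothetical input format)."""
    transcript_length: int  # annotated transcript length, in bases
    start: int              # 0-based alignment start on the transcript
    end: int                # alignment end (exclusive)


def coverage_fraction(hit: ReadHit) -> float:
    """Fraction of the transcript spanned by the read's alignment."""
    # Clamp the interval to the transcript so overhangs are not counted.
    span = max(0, min(hit.end, hit.transcript_length) - max(hit.start, 0))
    return span / hit.transcript_length


def length_bin(length: int) -> str:
    """Bin transcripts using the 1.2 kb and 2.4 kb thresholds from the slide."""
    if length < 1200:
        return "<1.2 kb"
    if length <= 2400:
        return "1.2-2.4 kb"
    return ">2.4 kb"


def median_coverage_by_bin(hits: list[ReadHit]) -> dict[str, float]:
    """Median per-read coverage fraction within each transcript-length bin."""
    groups: dict[str, list[float]] = {}
    for hit in hits:
        groups.setdefault(length_bin(hit.transcript_length), []).append(
            coverage_fraction(hit)
        )
    return {name: median(values) for name, values in groups.items()}


if __name__ == "__main__":
    # Toy alignments, invented for illustration (not data from the paper).
    toy = [
        ReadHit(900, 20, 900),
        ReadHit(1100, 0, 1050),
        ReadHit(2000, 600, 2000),
        ReadHit(3500, 2300, 3500),
        ReadHit(4200, 3100, 4200),
    ]
    for name, value in median_coverage_by_bin(toy).items():
        print(f"{name}: median coverage {value:.2f}")
```

The median is used here so that a few unusually short or long reads do not dominate a bin; a mean or a full distribution per bin would work just as well for a plot like Figure 1.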
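Slide 5 uses the ERCC spike-ins, a mixture of RNAs with known amounts, to check how well read counts track input abundance; the exact comparison on the slide is elided. A common way to quantify such a relationship, assumed here rather than taken from the paper, is the Pearson correlation between log10 read counts and log10 known amounts, skipping spike-ins that received no reads. The function name and the toy values are hypothetical.

```python
import math


def log_pearson(known_amounts: list[float], read_counts: list[int]) -> float:
    """Pearson correlation between log10(known amount) and log10(read count).

    Spike-ins with a zero amount or zero reads are skipped because log10(0)
    is undefined (an assumption; adding a pseudocount is another option).
    """
    if len(known_amounts) != len(read_counts):
        raise ValueError("inputs must have the same length")
    pairs = [
        (math.log10(amount), math.log10(count))
        for amount, count in zip(known_amounts, read_counts)
        if amount > 0 and count > 0
    ]
    if len(pairs) < 2:
        raise ValueError("need at least two spike-ins with reads")
    mean_x = sum(x for x, _ in pairs) / len(pairs)
    mean_y = sum(y for _, y in pairs) / len(pairs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical spike-in amounts (arbitrary units) and read counts.
    amounts = [0.5, 2.0, 8.0, 30.0, 120.0, 500.0]
    counts = [0, 3, 10, 41, 150, 610]
    print(f"log-log Pearson r = {log_pearson(amounts, counts):.3f}")
```

Working on the log scale keeps the few most abundant spike-ins from dominating the correlation, since ERCC amounts typically span several orders of magnitude.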
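Slides 6 to 8 compare the splice sites seen in reads with annotated ones, reporting that most introns match annotations while more than 10% are novel, and that most reads represent all splice sites of their transcripts. One simple way to make that comparison, assumed here rather than taken from the paper, is to turn each split alignment into an intron chain: every gap between consecutive aligned exon blocks is treated as an intron, keyed by (chromosome, strand, start, end), and looked up in the set of annotated introns. The block format, function names, and coordinates are hypothetical, and real pipelines would also filter short alignment gaps caused by deletions rather than splicing.

```python
from collections.abc import Iterable

Intron = tuple[str, str, int, int]  # (chrom, strand, intron_start, intron_end)


def introns_from_blocks(chrom: str, strand: str,
                        blocks: list[tuple[int, int]]) -> list[Intron]:
    """Introns implied by the half-open exon blocks of one split alignment."""
    ordered = sorted(blocks)
    return [
        (chrom, strand, prev_end, next_start)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
        if next_start > prev_end  # any gap between blocks is treated as an intron
    ]


def classify_introns(reads: Iterable[list[Intron]],
                     annotated: set[Intron]) -> dict[str, int]:
    """Count distinct read-supported introns that are annotated vs novel."""
    observed = {intron for chain in reads for intron in chain}
    known = len(observed & annotated)
    return {"annotated": known, "novel": len(observed) - known}


def covers_full_chain(read_chain: list[Intron],
                      transcript_chain: list[Intron]) -> bool:
    """True if the read's intron chain exactly matches the transcript's."""
    return read_chain == transcript_chain


if __name__ == "__main__":
    # Hypothetical annotation and reads on one chromosome and strand.
    annotated = {("chr1", "+", 200, 500), ("chr1", "+", 600, 900)}
    read_a = introns_from_blocks("chr1", "+", [(100, 200), (500, 600), (900, 1000)])
    read_b = introns_from_blocks("chr1", "+", [(150, 200), (500, 650), (900, 980)])
    print(classify_introns([read_a, read_b], annotated))
    print(covers_full_chain(read_a, sorted(annotated)))
```

In this toy example `read_b` uses a shifted donor site (650 instead of 600), so it contributes one novel intron; requiring an exact match of the whole intron chain is a strict reading of "represents all splice sites".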
