Applied Bioinformatics Journal Club: PacBio RNA-Seq
1. Journal Club
A single-molecule long-read survey of
the human transcriptome
Sharon et al., Nature Biotechnology 31, 1009–1014 (2013)
Sanzhen Liu
Plant Pathology
3/12/2014
2. PacBio technology
• Amplification-free sequencing
• Very long reads (up to 20 kb; length distribution peaks at 2-6 kb)
• High error rate (errors are random, not context-specific)
PacBio website
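Random, non-context-specific errors are what make consensus calling work: when the same insert is read several times, an error in one pass is unlikely to recur at the same position in another. The sketch below shows that idea as a per-column majority vote. It is a simplification, not PacBio's actual CCS algorithm: it assumes the passes are already aligned to equal length with "-" marking gaps, whereas real software must first align passes that contain insertions and deletions. The passes are made-up examples.

```python
from collections import Counter


def majority_consensus(passes: list[str]) -> str:
    """Per-column majority vote over passes that are already aligned.

    Simplification: '-' marks a gap, and every pass must have the same
    length. Real CCS software has to align the passes first.
    """
    if not passes:
        raise ValueError("need at least one pass")
    width = len(passes[0])
    if any(len(p) != width for p in passes):
        raise ValueError("passes must be pre-aligned to the same length")
    consensus = []
    for column in zip(*passes):
        base, _ = Counter(column).most_common(1)[0]
        if base != "-":  # skip columns where a gap wins the vote
            consensus.append(base)
    return "".join(consensus)


# Hypothetical passes over one insert, each error at a different position.
passes = [
    "ACGTTAGC",
    "ACGATAGC",  # substitution in column 4
    "ACGT-AGC",  # deletion in column 5
]
print(majority_consensus(passes))  # prints ACGTTAGC
```

With only two passes, any disagreement is a tie that a vote cannot resolve, which is one reason consensus accuracy improves as the number of passes grows.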
4. Figure 1
• Input: pooled RNAs from 20 tissues
• Approach: prepare double-stranded cDNA -> circular consensus sequencing (CCS) library -> PacBio sequencing
• Output: 476,000 CCS reads, mean length 1 kb
• 61% of reads cover all introns and most of the first and last exons
• CCS reads generally cover >90% of short transcripts (<1.2 kb), but coverage is low for longer transcripts, especially those >2.4 kb
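The two Figure 1 measurements (how much of each transcript a read covers, binned by transcript length, and whether a read spans all of a transcript's introns) can be sketched as below. This is an illustration under stated assumptions, not the paper's pipeline: alignments are assumed to be already projected onto transcript coordinates as half-open intervals, introns are represented by exon-exon junction positions in those coordinates, and "spans every junction" is used as a hypothetical proxy for "covers all introns". The bin edges come from the slide; the alignments are made up.

```python
from statistics import median


def coverage_fraction(start: int, end: int, transcript_len: int) -> float:
    """Fraction of a transcript spanned by one read.

    The alignment is given as half-open [start, end) in transcript
    coordinates and is clipped to the transcript boundaries.
    """
    covered = min(end, transcript_len) - max(start, 0)
    return max(covered, 0) / transcript_len


def spans_all_junctions(start: int, end: int, junctions: list[int]) -> bool:
    """True if the read has >=1 base on both sides of every exon-exon
    junction; junction j sits between transcript positions j-1 and j."""
    return all(start < j < end for j in junctions)


def length_bin(transcript_len: int) -> str:
    # Bin edges taken from the 1.2 kb and 2.4 kb thresholds on the slide.
    if transcript_len < 1200:
        return "<1.2 kb"
    if transcript_len <= 2400:
        return "1.2-2.4 kb"
    return ">2.4 kb"


# Hypothetical alignments: (transcript length, junctions, read start, read end)
alignments = [
    (900, [300, 600], 10, 900),
    (1000, [250, 700], 0, 980),
    (3000, [500, 1200, 2000], 1400, 3000),
]

by_bin: dict[str, list[float]] = {}
for tlen, junctions, start, end in alignments:
    by_bin.setdefault(length_bin(tlen), []).append(
        coverage_fraction(start, end, tlen)
    )
for name, fractions in by_bin.items():
    print(name, "median coverage:", round(median(fractions), 2))

n_full = sum(spans_all_junctions(s, e, j) for _, j, s, e in alignments)
print(f"{n_full}/{len(alignments)} reads span all junctions")
```

In this toy data the two short transcripts are almost fully covered while the long one is only about half covered, mirroring the length trend described on the slide.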
5. Figure 2
• Panels: missing 3’ ends; missing 5’ ends
• The correlations of the number of reads and …
• ERCC (External RNA Controls Consortium) spike-ins: a mixture of RNAs with known, quantified amounts
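The "missing 5’ ends" and "missing 3’ ends" panels summarize how many bases of each annotated transcript a read fails to reach at either end. A minimal sketch of that per-read quantity follows, assuming each alignment has already been projected onto transcript coordinates, oriented 5’ to 3’, as a half-open interval. The `share_within` helper and its 20-bp tolerance are hypothetical illustration choices, not values from the paper.

```python
def missing_ends(start: int, end: int, transcript_len: int) -> tuple[int, int]:
    """Bases of the annotated transcript absent from a read at each end.

    Assumes a half-open alignment [start, end) already projected onto
    transcript coordinates and oriented 5' -> 3'.
    """
    missing_5p = max(start, 0)
    missing_3p = max(transcript_len - end, 0)
    return missing_5p, missing_3p


def share_within(values: list[int], tolerance: int) -> float:
    """Fraction of reads missing at most `tolerance` bases at one end."""
    return sum(v <= tolerance for v in values) / len(values)


# Hypothetical (start, end, transcript length) triples.
alignments = [(0, 1500, 1500), (12, 1500, 1500), (0, 1460, 1500), (300, 2200, 2500)]
five_prime = [missing_ends(s, e, n)[0] for s, e, n in alignments]
three_prime = [missing_ends(s, e, n)[1] for s, e, n in alignments]
print(five_prime)   # [0, 12, 0, 300]
print(three_prime)  # [0, 0, 40, 300]
print(share_within(five_prime, 20), share_within(three_prime, 20))  # 0.75 0.5
```

Plotting the distributions of `five_prime` and `three_prime` separately is what makes any asymmetry between the two ends visible.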
6. Figure 3
• An estimated 67% of molecules contain splice sites
• CSMM: consensus split-mapped molecule (accurate CCS reads with splice sites?)
• Splice sites closely match annotated splice sites
• Compared with 454, PacBio has much higher power to detect isoforms with ≥10 introns
• Estimate: with deeper sequencing, 21,000 genes and 139,000 isoforms could be detected
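Checking whether splice sites "match annotated splice sites" amounts to reading introns off a split-mapped read (the gaps between its aligned exon blocks) and comparing them with the annotation. The sketch below assumes reads are given as sorted, half-open genomic blocks and that an intron is identified by a hypothetical `(chromosome, donor, acceptor)` key that requires an exact match; strand and tolerance for small alignment shifts are ignored, which a real analysis would likely need to handle. All coordinates are made up.

```python
Intron = tuple[str, int, int]  # (chromosome, donor, acceptor); hypothetical key


def introns_from_blocks(chrom: str, blocks: list[tuple[int, int]]) -> list[Intron]:
    """Introns implied by the aligned exon blocks of one split-mapped read.

    blocks: sorted, non-overlapping half-open (start, end) genomic intervals.
    Each gap between consecutive blocks is reported as one intron.
    """
    return [(chrom, blocks[i][1], blocks[i + 1][0]) for i in range(len(blocks) - 1)]


def classify_introns(reads, annotated):
    """Split the distinct observed introns into annotated and novel sets."""
    observed = set()
    for chrom, blocks in reads:
        observed.update(introns_from_blocks(chrom, blocks))
    known = observed & set(annotated)
    novel = observed - set(annotated)
    return known, novel


annotated = {("chr1", 200, 500), ("chr1", 650, 900), ("chr1", 1100, 1400)}
reads = [
    ("chr1", [(100, 200), (500, 650), (900, 1000)]),
    ("chr1", [(150, 200), (500, 650), (900, 1100), (1400, 1500)]),
    ("chr1", [(100, 200), (520, 650)]),  # acceptor shifted: novel intron
]
known, novel = classify_introns(reads, annotated)
print(len(known), len(novel))  # 3 1
print(sorted(novel))  # [('chr1', 200, 520)]
```

`len(novel) / (len(known) + len(novel))` then gives the share of distinct introns not found in the annotation.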
7. Summary
• Full-length RNAs of up to 1.5 kb can readily be monitored with little sequence loss at the 5’ ends
• With 476k CCS reads (>300 bp), 14,000 spliced genes were identified.
• The majority of introns are consistent with
annotations, but >10% are novel.
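The "14,000 spliced genes" count implies a filter-then-count step: keep reads longer than 300 bp, then count genes supported by at least one read containing an intron. The sketch below assumes each read has already been assigned to a gene and its introns counted; the record layout and gene names are hypothetical.

```python
def spliced_genes(reads, min_len: int = 300) -> set[str]:
    """Genes supported by at least one read longer than `min_len` bp
    that contains one or more introns."""
    return {
        gene
        for gene, length, n_introns in reads
        if length > min_len and n_introns >= 1
    }


# Hypothetical (gene, read length in bp, introns in read) records.
reads = [
    ("geneA", 1200, 4),
    ("geneA", 250, 1),  # too short, filtered out
    ("geneB", 800, 0),  # unspliced, not counted
    ("geneC", 310, 1),
    ("geneD", 290, 2),  # too short
]
print(sorted(spliced_genes(reads)))  # ['geneA', 'geneC']
```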
8. Conclusion
• Isoforms can be monitored at a single-molecule level
without amplification or fragmentation
• Most reads capture all splice sites of the original transcripts
• Unannotated splice isoforms include long non-coding RNAs with few introns and isoforms of known protein-coding genes with many introns