High-throughput RNA sequencing
with Thermostable Group II Intron
Reverse Transcriptase
Douglas C. Wu
@wckdouglas
wckdouglas.github.io
Lambowitz’s Lab
MBS Retreat 2016
Group II Intron Reverse Transcriptase
Intro | Differential expression | Sampling bias | Total RNA | Summary
• Mobile genetic elements found in bacterial and organellar genomes
• Thought to be ancestors of eukaryotic spliceosomal introns, retrotransposons and retroviruses
Group II intron RTs function in intron mobility
[Figure labels: intron-encoded RT ORF; catalytically active intron RNA with stable secondary and tertiary structure; DNA target; reverse splicing and bottom-strand cleavage; target-DNA-primed reverse transcription; full-length cDNA of a highly structured intron RNA]
Lambowitz and Zimmerly (2011) Cold Spring Harb Perspect Biol
Thermostable Group II Intron
Reverse Transcriptase (TGIRT)
Group II Intron RTs vs. Retroviral RTs
1. Group II intron RTs have higher processivity and fidelity than retroviral RTs
2. Group II intron RTs from thermophiles are also highly thermostable
Mohr et al. (2013) RNA
[Figure. (A) Read-processing pipeline: raw reads → adaptor trimming → trimmed reads → alignment with Tophat and Bowtie (local mode), each step yielding mapped and unmapped reads, with tRNA reads as a separate branch → combined reads → MAPQ15 filter → downstream analysis. (B) TGIRT-seq library preparation: (1) template switching by TGIRT from an adaptor duplex of R2 RNA (3′-blocker) and R2R DNA (3′-N overhang) onto the 3′-OH of the target RNA, then alkaline treatment and cDNA clean-up; (2) ligation of the R1R adaptor (5′-App, 3′-blocker) to the cDNA by a thermostable 5′ App DNA/RNA ligase, then cDNA clean-up; (3) PCR amplification adding the P5 and barcode + P7 sequences.]
Human plasma RNA-Illumina Sequencing Library
Preparation with TGIRT
• cDNA synthesis and adaptor attachment in one step (template switching by TGIRT)
• No RNA ligation
• Strand-specific
Qin and Yao et al. (2016) RNA
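The Tophat, Bowtie local and MAPQ15 filter steps of the read-processing pipeline can be illustrated with a short sketch. This is not the lab's pipeline (the actual code is in the tgirtERCC repository listed in the acknowledgements). It assumes alignments are available as SAM text lines, that "MAPQ15 filter" means keeping alignments with MAPQ ≥ 15, and that only reads left unmapped by Tophat are passed to Bowtie, so concatenation does not double-count reads. All function names are hypothetical.

```python
"""Sketch: merge Tophat and Bowtie (local) alignments, then apply a MAPQ filter.

Assumptions (not stated on the slide): SAM text input, MAPQ >= 15 threshold.
All names are hypothetical.
"""

MIN_MAPQ = 15
UNMAPPED_FLAG = 0x4  # SAM FLAG bit: segment unmapped


def parse_sam_line(line):
    """Return (read_name, flag, mapq) from one tab-separated SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], int(fields[1]), int(fields[4])


def passes_filter(line, min_mapq=MIN_MAPQ):
    """True for mapped alignments whose MAPQ is at least min_mapq."""
    if line.startswith("@"):  # SAM header lines are not alignments
        return False
    _, flag, mapq = parse_sam_line(line)
    return not (flag & UNMAPPED_FLAG) and mapq >= min_mapq


def combine_and_filter(tophat_lines, bowtie_local_lines, min_mapq=MIN_MAPQ):
    """Concatenate alignments from both aligners and keep those passing the filter."""
    combined = list(tophat_lines) + list(bowtie_local_lines)
    return [line for line in combined if passes_filter(line, min_mapq)]


tophat = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t50\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "r2\t16\tchr1\t200\t3\t10M\t*\t0\t0\tACGTACGTAC\t*",
]
bowtie_local = [
    "r3\t0\tchr2\t50\t42\t8M2S\t*\t0\t0\tACGTACGTAC\t*",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\t*",
]
kept = combine_and_filter(tophat, bowtie_local)
print([parse_sam_line(line)[0] for line in kept])  # ['r1', 'r3']
```

Here r2 is dropped for its low MAPQ and r4 because its FLAG marks it as unmapped.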
Validation with MAQC Samples
FIGURE 1. RNA sample and TGIRT-seq library preparation. (A) Sample A is composed of Universal Human Reference RNA (UHR) mixed with ERCC Spike-in Mix 1, and Sample B is composed of Human Brain Reference RNA (HBR) mixed with ERCC Spike-in Mix 2. Samples A and B were mixed at ratios of 3:1 or 1:3 to constitute Samples C and D, respectively. (B) TGIRT-seq library preparation was carried out as previously described (Qin et al. 2016).
ERCC spike-in mix (92 transcripts; ThermoFisher)
Nottingham and Wu et al. (2016) RNA
Downloaded from rnajournal.cshlp.org on January 29, 2016 - Published by Cold Spring Harbor Laboratory Press
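The sample design in Figure 1 implies expected abundance ratios that the data can be checked against. A worked sketch, assuming Samples C and D are exact 3:1 and 1:3 mixtures of A and B (ignoring any difference in mRNA content between UHR and HBR), and using the four ERCC ExFold subgroup ratios for Mix 1 : Mix 2 (4:1, 1:1, 1:1.5, 1:2), which are not listed on the slide:

```python
import math

# Mix 1 : Mix 2 concentration ratios of the four ERCC ExFold subgroups
# (23 transcripts each, 92 in total). Sample A carries Mix 1 and Sample B Mix 2,
# so these are also the expected A/B ratios for the spike-ins.
ERCC_A_OVER_B = {"4:1": 4.0, "1:1": 1.0, "1:1.5": 1 / 1.5, "1:2": 0.5}


def mixture_abundance(abund_a, abund_b, frac_a):
    """Abundance in a mixture containing frac_a of A and (1 - frac_a) of B."""
    return frac_a * abund_a + (1.0 - frac_a) * abund_b


def expected_c_over_d(a_over_b):
    """Expected C/D ratio for a transcript with A/B ratio a_over_b (B scaled to 1)."""
    c = mixture_abundance(a_over_b, 1.0, 0.75)  # Sample C = 3 parts A : 1 part B
    d = mixture_abundance(a_over_b, 1.0, 0.25)  # Sample D = 1 part A : 3 parts B
    return c / d


for subgroup, ratio in ERCC_A_OVER_B.items():
    print(f"{subgroup:>6}  log2(A/B) = {math.log2(ratio):+.2f}  "
          f"log2(C/D) = {math.log2(expected_c_over_d(ratio)):+.2f}")
```

Mixing compresses fold changes toward 1. For example, a 4-fold A/B difference becomes 3.25/1.75 ≈ 1.86-fold between C and D, so a method that recovers the relative abundances should show that compression.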
Relative abundance/Differential expression
(Spike-in transcripts)
Nottingham and Wu et al. (2016) RNA
Relative abundance/Differential expression
(Human genes)
Nottingham and Wu et al. (2016) RNA
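The slide does not state which abundance unit was used for the human genes. Transcripts per million (TPM) is one common choice for comparing relative abundance across libraries, because it corrects for transcript length and sums to the same total in every library. A minimal sketch with hypothetical inputs:

```python
def tpm(counts, lengths_bp):
    """Transcripts per million from read counts and transcript lengths (in bases).

    Each count is divided by the transcript length in kilobases (reads per
    kilobase), then the values are rescaled to sum to one million.
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths must have the same length")
    rates = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        raise ValueError("no reads assigned to any transcript")
    return [rate / total * 1e6 for rate in rates]


# A 2-kb transcript with twice the reads of a 1-kb transcript gets the same TPM.
print(tpm([10, 20], [1000, 2000]))  # [500000.0, 500000.0]
```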
Strand-specificity
Nottingham and Wu et al. (2016) RNA
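Strand specificity can be summarized as the fraction of gene-overlapping reads whose orientation matches what the library design predicts. A sketch, assuming each read has already been paired with the strand of the single annotated gene it overlaps (reads overlapping genes on both strands are excluded upstream). Whether reads are expected to match or oppose the RNA strand depends on the library and which mate is counted, so that is a parameter here; the function name is hypothetical.

```python
def strand_specificity(pairs, read_matches_rna=True):
    """Fraction of reads whose strand is consistent with the library orientation.

    pairs: iterable of (read_strand, gene_strand), each "+" or "-".
    read_matches_rna: True if reads are expected on the same strand as the RNA,
    False if they are expected on the opposite strand.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no reads overlapping annotated genes")
    consistent = sum(
        1 for read_strand, gene_strand in pairs
        if (read_strand == gene_strand) == read_matches_rna
    )
    return consistent / len(pairs)


reads = [("+", "+"), ("-", "-"), ("+", "-"), ("-", "-")]
print(strand_specificity(reads))         # 0.75
print(strand_specificity(reads, False))  # 0.25
```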
[…] most splice junctions, as well as additional junctions detected throughout the genes in the TGIRT-seq data sets (Fig. 8). Strikingly, TGIRT-seq also detected abundant reads corresponding to snoRNAs encoded within introns in ribosomal protein genes, whereas these embedded snoRNAs were not readily detected by TruSeq v3 (Fig. 8, boxed regions for RPS8 and RPL17). Analysis of detected splice sites showed that TGIRT-seq and TruSeq v3 detected annotated and unannotated splice junctions, which were overwhelmingly canonical GU–AG junctions (>90%), with relatively few U12-type splice junctions (AU–AC, <0.1%; Supplemental Fig. S5A,B). The TruSeq v3 data sets showed […]
[Figure 5 (cropped): […] show more uniform gene coverage than TruSeq data sets. Panels: TGIRT-seq (left), TruSeq v2 (middle), TruSeq v3 (right), for combined replicate data sets for each library preparation. (A) Reads classified as coding, intergenic, intronic, and un[…]. Normalized coverage of the 1000 most abundant pro[…] plotted against normalized gene position from the 5′ (left) to the 3′ end. Error bars […]]
Sampling Bias and Gene Coverage
1. More uniform gene coverage
2. Higher relative abundance of splice junctions near the 5’ end of transcripts
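Gene-coverage uniformity is usually shown as a metagene profile: coverage along each transcript is binned by relative position from 5′ to 3′, normalized, and averaged across genes. A minimal sketch; scaling each transcript to its own highest bin is an assumption, since the slide does not say how coverage was normalized, and the function name is hypothetical.

```python
def gene_body_profile(coverages, n_bins=100):
    """Average normalized coverage across transcripts, binned by relative position.

    coverages: list of per-base coverage lists, each ordered 5' -> 3'.
    Each transcript is split into n_bins bins; each bin's mean coverage is
    scaled by the transcript's highest bin (assumed normalization), and the
    scaled profiles are averaged across transcripts.
    """
    profile = [0.0] * n_bins
    used = 0
    for cov in coverages:
        length = len(cov)
        if length < n_bins or max(cov) == 0:
            continue  # too short to bin, or no coverage at all
        binned = []
        for i in range(n_bins):
            start = i * length // n_bins
            end = (i + 1) * length // n_bins
            binned.append(sum(cov[start:end]) / (end - start))
        peak = max(binned)
        for i, value in enumerate(binned):
            profile[i] += value / peak
        used += 1
    if used == 0:
        raise ValueError("no transcripts long enough with nonzero coverage")
    return [value / used for value in profile]


# One uniformly covered transcript and one covered only in its 5' half.
print(gene_body_profile([[5] * 200, [10] * 100 + [0] * 100], n_bins=2))  # [1.0, 0.5]
```

A flat profile indicates uniform coverage; a drop toward one end indicates 5′ or 3′ bias.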
FIGURE 7. TGIRT-seq detects more transcripts and splice junctions than does […] The number of protein-coding gene transcripts and annotated splice junctions detected […] of mapped reads for combined data sets for Samples A–D for each library preparation method. The TruSeq v2 libraries were down-sampled to match the sequencing depth of the TGIRT-seq and TruSeq v3 libraries. Shaded areas represent 95% confidence intervals […] (C) Splice junction density plotted versus relative distance from the 5′ end of p[…] gene RNAs for combined data sets for Samples A–D for each library preparation method. […] Distribution of splice junction types for replicates of Samples A and B for TGIRT-seq and TruSeq v3. Antisense to annotated junctions are exact complements on the opposite strand of the annotated junctions. Novel junctions are those for which no annotation cur[…] the reference genome used.
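Figure 7 plots detected transcripts against the number of mapped reads, with the TruSeq v2 libraries down-sampled to a matched depth. A sketch of such a saturation curve by random subsampling without replacement, assuming each mapped read has already been assigned a gene ID and that a gene counts as detected at a hypothetical minimum of `min_reads` reads:

```python
import random
from collections import Counter


def genes_detected(read_gene_ids, min_reads=1):
    """Number of genes with at least min_reads reads."""
    counts = Counter(read_gene_ids)
    return sum(1 for n in counts.values() if n >= min_reads)


def saturation_curve(read_gene_ids, depths, min_reads=1, seed=0):
    """Genes detected after randomly down-sampling reads to each depth."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    curve = []
    for depth in depths:
        if depth > len(read_gene_ids):
            raise ValueError(f"depth {depth} exceeds {len(read_gene_ids)} available reads")
        subsample = rng.sample(read_gene_ids, depth)
        curve.append((depth, genes_detected(subsample, min_reads)))
    return curve


# Hypothetical gene assignments for nine mapped reads.
reads = ["g1"] * 5 + ["g2"] * 3 + ["g3"]
print(saturation_curve(reads, depths=[0, 3, 6, 9]))
```

Repeating the subsampling with different seeds gives a spread from which an interval could be estimated; how the shaded bands in Figure 7 were computed is not stated on the slide.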
Nottingham and Wu et al. (2016) RNA
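The splice-junction categories in the text and in Figure 7 can be expressed as two small classifiers. The first looks at the intron's terminal dinucleotides (GT–AG in DNA, i.e. GU–AG in RNA, versus AT–AC). The second looks at annotation status (annotated, antisense to annotated, novel). A sketch assuming 0-based, half-open intron coordinates on the + strand of a genome string and a set of annotated junction tuples. All names are hypothetical. AT–AC junctions are labelled U12-type following the text, although not every AT–AC intron is U12-type.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def junction_motif(genome, intron_start, intron_end, strand):
    """Classify an intron by its terminal dinucleotides, read on the transcript strand."""
    intron = genome[intron_start:intron_end].upper()
    if strand == "-":
        intron = reverse_complement(intron)
    motif = intron[:2] + "-" + intron[-2:]
    if motif == "GT-AG":
        return "canonical GU-AG"
    if motif == "AT-AC":
        return "U12-type AU-AC"
    return "other"


def annotation_class(junction, annotated):
    """junction = (chrom, start, end, strand); annotated = set of such tuples."""
    chrom, start, end, strand = junction
    if junction in annotated:
        return "annotated"
    opposite = "-" if strand == "+" else "+"
    if (chrom, start, end, opposite) in annotated:
        return "antisense to annotated"  # same coordinates, opposite strand
    return "novel"


print(junction_motif("CCGTAAAAAGCC", 2, 10, "+"))  # canonical GU-AG
print(annotation_class(("chr1", 100, 200, "-"), {("chr1", 100, 200, "+")}))
```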
Total RNA recovery
Nottingham and Wu et al. (2016) RNA
Total RNA recovery
FIGURE 2. TGIRT-seq reads map mostly to protein-coding genes but with greater representation of small ncRNAs than TruSeq libraries. (A) Stacked bar graphs showing the percentage of uniquely mapped reads for each class of annotated genomic features in Ensembl GRCh38 release 76. […]
Nottingham and Wu et al. (2016) RNA
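The stacked bars in Figure 2 summarize the percentage of uniquely mapped reads per annotated feature class. A sketch, assuming the set of Ensembl biotypes each read overlaps has already been computed. The priority order used to resolve reads overlapping several features is hypothetical. It is chosen here so that reads from intron-embedded snoRNAs, such as those in ribosomal protein genes, are counted as snoRNA rather than as their host gene.

```python
from collections import Counter

# Hypothetical priority order: small ncRNAs first, so intron-embedded snoRNAs
# are not absorbed into their protein-coding host genes.
PRIORITY = ["snoRNA", "snRNA", "miRNA", "protein_coding", "lincRNA", "antisense"]


def assign_class(overlapping_biotypes):
    """Pick a single class for a read from the biotypes of the features it overlaps."""
    if not overlapping_biotypes:
        return "intergenic"
    for biotype in PRIORITY:
        if biotype in overlapping_biotypes:
            return biotype
    return "other"


def class_percentages(reads):
    """Percentage of reads assigned to each class."""
    counts = Counter(assign_class(biotypes) for biotypes in reads)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads")
    return {cls: 100.0 * n / total for cls, n in counts.items()}


reads = [{"protein_coding"}, {"protein_coding", "snoRNA"}, set(), {"Mt_tRNA"}]
print(class_percentages(reads))
```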
Summary
1. Recapitulates the relative abundance of human transcripts and spike-ins
2. Higher strand-specificity than TruSeq v3
3. More uniform 5’ to 3’ gene coverage and better detection of splice junctions near the 5’ end of transcripts
4. Enables simultaneous profiling of mRNAs, lncRNAs and structured small ncRNAs
5. Other applications:
• Structured RNA-seq
• Plasma and exosomal RNA-seq (cancer diagnostics)
• Long cDNA synthesis
Acknowledgement
Lambowitz Lab
Ryan Nottingham, Ph.D.
Yidan Qin, almost Ph.D.
Jun Yao, Ph.D.
Sabine Mohr, Ph.D.
and other lab members.
GSAF
Scott Hunicke-Smith, Ph.D.
and members of GSAF
Professor Vishy Iyer
Article: Ryan M. Nottingham*, Douglas C. Wu*, Yidan Qin, Jun Yao, Scott Hunicke-Smith, and Alan M. Lambowitz (2016). RNA-seq of human
reference RNA samples using a thermostable group II intron reverse transcriptase. RNA.
Code: https://github.com/wckdouglas/tgirtERCC.git
Slides: http://www.slideshare.net/DouglasWu1/highthrouput-rna-sequencing-with-thermostable-group-ii-intron-reverse-transcriptase
