FIND MEANING IN COMPLEXITY
© Copyright 2013 by Pacific Biosciences of California, Inc. All rights reserved.
Profiling Full-Length cDNAs by SMRT® Sequencing
Short Reads Fall Short in the Era of “Alternative Events”
[Figure: genome versus transcriptome, showing alternative transcription start sites, alternative splicing, and alternative polyA sites]
Bringing Full-Length cDNA Sequencing to the PacBio® RS
PacBio read lengths offer a unique opportunity for transcriptome biology
Methods development/optimization
• Robust synthesis of full-length cDNA libraries
• Sample normalization
• Target enrichment
• Sequencing recommendations
• Paths to analysis
[Figure: cumulative distribution of human transcripts in 2 databases; x-axis: length (nucleotides). Au et al. PLoS ONE, 2012]
Full-Length cDNA Synthesis Kit - Invitrogen
• Very time-consuming
• Large polyA RNA input
• More stringent about full-length
cDNA because of cap selection
• 5 μg of polyA RNA
• 50-100 ng of ds cDNA
• Successfully scaled down the
input to 1 μg polyA RNA
“Full-Length” cDNA: Template Switching
Matz M, Shagin D, Bogdanova E, Britanova O, Lukyanov S, Diatchenko L,
Chenchik A (1999) Amplification of cDNA ends based on template-switching
effect and step-out PCR. Nucleic Acids Res. 27, 1558-1560.
• Clontech® SMART kit
• Evrogen® Mint-2 kit
• Uses less material
• Less time consuming/fewer
steps
• Less stringent about selecting
full-length cDNA
cDNA SMRTbell™ Libraries Display Size Distributions That
Correlate with Expected Full-Length mRNA Sizes
[Figure: SMRTbell™ library size distributions (axis range 500 to 5k) for S. cerevisiae and H. sapiens cerebellum. Workflow shown: cDNA synthesis method, PCR, PacBio SMRTbell™ library prep]
Both Kits Produce Good Results with a High Quality Sample
Clontech® Human Cerebellum polyA RNA, aligned to Gencode by BLASR
[Figure: 5’ to 3’ coverage panels for Invitrogen and Evrogen; annotation: more 3’ bias]
Lower Quality RNA Benefits from the 7mG Enrichment used in
Invitrogen® Method
MAQC-B (Human brain cDNA), 1-2 kb cDNA fraction
[Figure: panels comparing Clontech and Invitrogen]
Normalization During cDNA Sample Preparation
1 μg input
4-7 hr procedure
Unknown yield into PCR
PCR of 15-20 cycles to get
DNA for SMRTbell™ template prep
Zhulidov PA et al. A method for the preparation of normalized cDNA libraries
enriched with full-length sequences. Bioorg Khim. 2005; 31 (2):186-94.
Zhulidov PA et al. Simple cDNA normalization using Kamchatka crab
duplex-specific nuclease. Nucleic Acids Res. 2004; 32 (3):e37.
Normalization Increases the Sequence Breadth of a Sample
Yeast full-length cDNA aligned to genome by BLASR
[Figure: coverage tracks on Chr IV and Chr VII, normalized versus non-normalized]
Normalization Increases the Sequence Breadth of a Sample
[Figure: number of genes observed in a single SMRT® Cell of transcript data, Normalized versus Non-Normalized; values shown: 3485, 1736, 306]
Improving Subread Lengths for cDNA Libraries
H1 human stem cell cDNA: Clontech, no size selection
[Figure: SMRTbell™ library size by Bioanalyzer (axis range 500 to 5k) and subread length distribution]
Improving Subread Lengths for cDNA Libraries
[Figure: H1 SMRTbell™ library size by Bioanalyzer and subread length distribution for the 1-2 kb, 2-3 kb, and >3 kb fractions]
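When inspecting subread lengths after sequencing, it can help to group them into the same fractions named on this slide. The following is a minimal sketch, not part of the original protocol: the function names are hypothetical, the lower bound of each bin is treated as inclusive, and a separate "<1 kb" bin is an assumption added so that every length lands somewhere.

```python
from collections import Counter

# Fraction labels come from the slide; the exact boundary handling
# (lower bound inclusive) and the "<1 kb" catch-all are assumptions.
FRACTIONS = [
    (1000, 2000, "1-2 kb"),
    (2000, 3000, "2-3 kb"),
]


def size_fraction(length_bp):
    """Return the size-fraction label for one subread length in base pairs."""
    if length_bp < 1000:
        return "<1 kb"
    for low, high, label in FRACTIONS:
        if low <= length_bp < high:
            return label
    # Everything from 3000 bp upward is reported under the slide's ">3 kb" label.
    return ">3 kb"


def fraction_counts(lengths_bp):
    """Count how many subread lengths fall into each size fraction."""
    return Counter(size_fraction(n) for n in lengths_bp)


if __name__ == "__main__":
    example_lengths = [800, 1500, 1700, 2500, 3500]  # made-up lengths for illustration
    print(fraction_counts(example_lengths))
```

This only bins lengths in software; the physical size selection of the library is a separate wet-lab step.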
Agilent® SureSelect® Enrichment to Increase the Coverage of a
Subset of cDNAs
[Figure: full-length ds_cDNA before enrichment and enriched full-length ds_cDNA]
Agilent® Kinome Enrichment Significantly Increases Target
Reads
Precision = # reads / total reads
[Figure: precision values shown: 0.015 and 0.61]
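The precision metric on this slide can be worked through in a few lines. This is a sketch under stated assumptions: "# reads" is read here as the number of on-target reads, and 0.015 is taken as the value without enrichment and 0.61 as the value with enrichment, which the slide title suggests but the extracted text does not state explicitly. The function names are hypothetical.

```python
def precision(target_reads, total_reads):
    """Fraction of all reads that hit the target set (assumed meaning of '# reads')."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return target_reads / total_reads


def fold_enrichment(precision_before, precision_after):
    """Ratio of precision after enrichment to precision before enrichment."""
    if precision_before <= 0:
        raise ValueError("precision_before must be positive")
    return precision_after / precision_before


# Values from the slide; which one is "before" and which is "after" is an assumption.
PRECISION_UNENRICHED = 0.015
PRECISION_ENRICHED = 0.61

if __name__ == "__main__":
    fold = fold_enrichment(PRECISION_UNENRICHED, PRECISION_ENRICHED)
    print(f"fold enrichment: {fold:.1f}")
```

Under that reading, the two values correspond to roughly a 40-fold increase in the share of on-target reads.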
Detection of Novel Splice Forms of a Cyclin-Dependent Kinase
Shared Protocol Available on Sample Net
http://www.smrtcommunity.com/Share/Protocol?id=a1q70000000H6LmAAK&strRecordTypeName=Protocol
Analysis Options: Transcriptome Reference
• If alignment to a transcript database (RefSeq/Gencode) is desired, BLASR and BWA-SW
can be used for alignment
BLASR
• https://github.com/PacificBiosciences/blasr
BWA-SW
• http://bio-bwa.sourceforge.net/
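As an illustration of the transcript-reference route, the sketch below assembles the two BWA commands involved: building an index and running the BWA-SW aligner. It only constructs the command lines and does not run them. The helper name and the file names are hypothetical, and the thread count is an arbitrary default; `bwa index` and `bwa bwasw` with its `-t` thread option are standard BWA subcommands, but check them against the installed version. BLASR is not shown because its options vary between releases.

```python
import shlex


def bwa_bwasw_commands(reference_fasta, reads_file, threads=4):
    """Return (index_cmd, align_cmd) argument lists for a BWA-SW alignment.

    BWA-SW writes SAM to standard output, so the caller is expected to
    redirect the output of the second command to a file.
    """
    index_cmd = ["bwa", "index", reference_fasta]
    align_cmd = ["bwa", "bwasw", "-t", str(threads), reference_fasta, reads_file]
    return index_cmd, align_cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for cmd in bwa_bwasw_commands("transcripts.fa", "subreads.fastq"):
        print(shlex.join(cmd))
```

Returning argument lists rather than a shell string keeps the commands safe to pass to `subprocess.run` without extra quoting.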
Analysis Options: Genomic Reference-Based Alignment
• BLAT and GMAP can be used to align PacBio CLR and CCS reads to
the genomic reference.
BLAT
• http://www.soe.ucsc.edu/~kent/src
GMAP
• http://research-pub.gene.com/gmap/
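For the genomic-reference route, BLAT reports alignments in PSL format, where each aligned block roughly corresponds to an exon of the cDNA read. The sketch below parses one PSL data line and computes the gaps between consecutive blocks on the genome. It assumes the standard 21-column PSL layout; the function names and the example line are made up for illustration, and treating long gaps as candidate introns is an interpretation, since short gaps may simply be deletions or sequencing errors.

```python
def psl_blocks(psl_line):
    """Return [(target_start, block_size), ...] for one PSL data line."""
    fields = psl_line.rstrip("\n").split("\t")
    if len(fields) < 21:
        raise ValueError("expected 21 tab-separated PSL fields")
    block_count = int(fields[17])
    # blockSizes and tStarts are comma-separated lists with a trailing comma.
    block_sizes = [int(x) for x in fields[18].split(",") if x]
    target_starts = [int(x) for x in fields[20].split(",") if x]
    if not (len(block_sizes) == len(target_starts) == block_count):
        raise ValueError("block lists do not match blockCount")
    return list(zip(target_starts, block_sizes))


def target_gaps(blocks):
    """Return the genomic gap lengths between consecutive aligned blocks."""
    gaps = []
    for (start_a, size_a), (start_b, _size_b) in zip(blocks, blocks[1:]):
        gaps.append(start_b - (start_a + size_a))
    return gaps


if __name__ == "__main__":
    # Made-up alignment: three 100-base blocks on a hypothetical target "chr1".
    example = "\t".join([
        "300", "0", "0", "0", "0", "0", "2", "1700", "+",
        "read1", "300", "0", "300",
        "chr1", "100000", "1000", "3000",
        "3", "100,100,100,", "0,100,200,", "1000,1600,2900,",
    ])
    print(target_gaps(psl_blocks(example)))
```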
Short Reads and Genomic-Reference-Based Alignment
• If short-read data is available, error correction can be done prior to
alignment.
[Workflow: short-read / CCS data, then error correction (pacBioToCA, P_ErrorCorrection, or LSC), then alignment to the genomic reference (BLAT, GMAP)]
cDNA Takeaways
Workflow stages: Sample Prep; Run Design; Sequencing on the PacBio® RS and primary analysis; Secondary Analysis; Tertiary Analysis
• Two cDNA prep methods show
promising results but differ in
input and stringency
• Double-stranded cDNA is
converted to SMRTbell™
libraries with the PacBio®
Large Insert Kit
• Normalization can be an
effective means to increase
breadth, but use caution if
characterizing rare isoforms
• Agilent® SureSelect® system is
effective for custom
enrichment of select genes
• Size selection can enrich for
larger cDNAs
• Run design depends on size
• <2k fractions: 2x45 or 2x55
min movies, C2/C2, diffusion
loading, CPS start
• >2k fractions: 1x120 min
movies, mag loading, XL/C2,
stage start
• Non-size selected: run both
conditions above to cover all
sizes, or select for desired
size range
• Full-pass subreads represent
putative full-length isoforms
• Transcript Reference
• BLASR
• BWA-SW
• Genomic Reference
• BLAT
• GMAP
• Error Correction: Short
Reads
• pacBioToCA
• P_ErrorCorrection
• LSC (Au et al. PLoS
ONE, 2012)
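
The run-design bullets above amount to a small lookup keyed on insert size. The sketch below encodes them as data. The values are copied from the slide, but the dictionary field names ("movies", "chemistry", "loading", "start"), the function name, and the choice to treat exactly 2 kb as belonging to the larger fraction are assumptions made for illustration.

```python
# Run conditions as listed on the takeaways slide; field names are illustrative labels.
RUN_DESIGNS = {
    "<2k": {
        "movies": "2x45 or 2x55 min",
        "chemistry": "C2/C2",
        "loading": "diffusion",
        "start": "CPS",
    },
    ">2k": {
        "movies": "1x120 min",
        "chemistry": "XL/C2",
        "loading": "mag",
        "start": "stage",
    },
}


def recommended_runs(fraction_kb=None):
    """Return the run conditions suggested for a size fraction.

    fraction_kb is a representative insert size in kilobases, or None for a
    library that was not size selected, in which case both conditions are
    returned so that all sizes are covered.
    """
    if fraction_kb is None:
        return [RUN_DESIGNS["<2k"], RUN_DESIGNS[">2k"]]
    key = "<2k" if fraction_kb < 2 else ">2k"  # 2 kb itself goes to ">2k" (assumption)
    return [RUN_DESIGNS[key]]


if __name__ == "__main__":
    for design in recommended_runs():
        print(design)
```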
