20140711 3 t_clark_ercc2.0_workshop

Published in: Science

  1. 1. Single Molecule, Real-Time Sequencing of Full-length cDNA Transcripts. Tyson Clark, 7/11/14. FIND MEANING IN COMPLEXITY. © Copyright 2012 by Pacific Biosciences of California, Inc. All rights reserved.
  2. 2. Single Molecule, Real-Time (SMRT) DNA Sequencing: PacBio RS II
  3. 3. P5-C3 Sequencing Chemistry
  4. 4. Transcript Diversity
  5. 5. Current State of Transcript Assembly: “The way we do RNA-seq now is… you take the transcriptome, you blow it up into pieces and then you try to figure out how they all go back together again… If you think about it, it’s kind of a crazy way to do things.” Michael Snyder, Stanford University. References: Tal Nawy (2013) End-to-end RNA sequencing. Nature Methods 10: 1144–1145. Ian Korf (2013) Genomics: the state of the art in RNA-seq analysis. Nature Methods 10: 1165–1166.
  6. 6. PacBio Iso-Seq for High-quality, Full-length Transcripts. [Figure 1. Experimental pipeline: polyA mRNA → cDNA synthesis with adapters → size partitioning & PCR amplification → SMRTbell ligation → PacBio RS II sequencing. Informatics pipeline labels: PacBio raw sequence reads, reads of insert, remove adapters, remove artifacts, clean sequence reads, reads clustering, isoform clusters, consensus calling, nonredundant transcript isoforms, quality filtering, final isoforms, map to reference genome, evidence-based gene models. Read structure labels: SMRT adapter, 5’ primer, 5’ UTR, coding sequence, 3’ UTR, polyA tail (AAA)n, 3’ primer.] https://github.com/PacificBiosciences/cDNA_primer/
  7. 7. Detailed Clontech workflow for conversion of cDNA into SMRTbell libraries: Total RNA → optional poly-A selection → polyA+ RNA → reverse transcription → full-length 1st strand cDNA → PCR optimization → large-scale amplification → amplified cDNA → size selection (Blue Pippin or gel) into 1-2 kb, 2-3 kb, and 3-6 kb fractions → re-amplification of each fraction → SMRTbell template preparation of each fraction → SMRT Sequencing. The 3-6 kb fraction has an additional optional size selection (Blue Pippin).
  8. 8. Brain Amplified cDNA – Testing PCR Enzymes: Phusion, Kapa HiFi, SeqAmp
  9. 9. Brain Amplified cDNA (zoom): Phusion, Kapa HiFi, SeqAmp
  10. 10. 2nd Amplification (after Blue Pippin size selection), Kapa polymerase. Brain fractions: 1-2 kb, 2-3 kb, 3-6 kb, 5-10 kb, 6-10 kb, 8-12 kb, 10-15 kb. Ladder marks: 4000, 2000, 1250, 800, 500.
  11. 11. 2nd Amplification (after Blue Pippin size selection), Kapa polymerase. Heart fractions: 1-2 kb, 2-3 kb, 3-6 kb, 5-10 kb, 8-12 kb. Liver fractions: 1-2 kb, 2-3 kb, 3-6 kb, 5-10 kb. Ladder marks: 4000, 2000, 1250, 800, 500.
  12. 12. Amplified cDNA from Multiple Human Tissues: Brain, Heart, Liver
  13. 13. SageELF
  14. 14. Brain Amplified cDNA – Size Selected, Kapa polymerase. SageELF lanes: M, 12 through 1. BluePippin fractions: 800-1600, 1600-2700, 2700-4800, 4800-8000. Ladder marks: 3000, 1500, 800, 500, 300, 100.
  15. 15. SageELF – 12 size bins (Amplified cDNA)
  16. 16. SageELF – 12 size bins (Amplified cDNA)
  17. 17. Brain cDNA – ELF Size Selected – 2nd Amplification
  18. 18. Actual FL Lengths from each ELF Fraction (25th percentile – 75th percentile): ELF 12 (400 bp), actual 181 – 266 bp; ELF 11 (550 bp), actual 370 – 480 bp; ELF 10 (800 bp), actual 617 – 727 bp
  19. 19. Actual FL Lengths from each ELF Fraction: ELF 9 (1.2 kb), actual 955 – 1113 bp; ELF 8 (1.5 kb), actual 1355 – 1544 bp; ELF 7 (1.8 kb), actual 1800 – 2033 bp
  20. 20. Actual FL Lengths from each ELF Fraction: ELF 6 (2.5 kb), actual 2398 – 2737 bp; ELF 5 (3 kb), actual 3193 – 3574 bp; ELF 4 (4 kb), actual 2127 – 4664 bp
  21. 21. Actual FL Lengths from each ELF Fraction: ELF 3 (5.5 kb), actual 1342 – 6075 bp; ELF 2 (7 kb), actual 1229 – 7446 bp
  22. 22. Actual FL Lengths from each ELF Fraction: ELF 1 (9 kb), 180 min, actual 1295 – 1814 bp
  23. 23. Summarizing ELF for Size Selection
      ELF Lane #      Actual FL range
      ELF12-400bp     181 - 266 bp
      ELF11-500bp     370 - 480 bp
      ELF10-800bp     617 - 727 bp
      ELF9-1.2kb      955 - 1113 bp
      ELF8-1.5kb      1355 - 1544 bp
      ELF7-1.8kb      1800 - 2033 bp
      ELF6-2.5kb      2398 - 2737 bp
      ELF5-3kb        3193 - 3574 bp
      ELF4-4kb        2127 - 4664 bp
      ELF3-5.5kb      1342 - 6075 bp
      ELF2-7kb        1229 - 7446 bp
      ELF1-9kb        1295 - 1814 bp
      The Good: (1) One run, 12 fractions. (2) Finer size fractions (~200 bp). (3) 100 bp – 10 kb spread.
      The Not-Good-Yet: (1) > 4 kb gets small inserts competing.
      To Work On: (1) New beta machine. (2) Combining fractions.
  24. 24. Targeted Sequencing
  25. 25. Targeted Sequencing
  26. 26. Targeted Sequencing
  27. 27. ERCC 2.0 Controls (from the PacBio perspective)
      • Long transcripts (>10 kb, if possible)
      • Transcript isoforms that span size bins
      • Complex alternative splicing patterns
      • Diversity of GC contents
  28. 28. Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, and SMRTbell are trademarks of Pacific Biosciences in the United States and/or other countries. All other trademarks are the sole property of their respective owners.
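
The ELF summary on slide 23 can be checked numerically. The following is a minimal Python sketch, not the presenter's analysis: it takes the nominal sizes and the 25th to 75th percentile full-length (FL) ranges from that slide and flags fractions whose range is very wide or sits far below the nominal size, which is one way to read "small inserts competing". The two thresholds (`MAX_IQR_BP`, `MIN_P75_FRACTION`) and the function names are assumptions chosen for illustration only.

```python
# (lane, nominal size in bp, 25th percentile FL length, 75th percentile FL length)
# Values are copied from the slide 23 summary. Note that slide 18 labels
# ELF 11 as 550 bp while the summary says 500 bp; the summary value is used.
ELF_FRACTIONS = [
    ("ELF12", 400, 181, 266),
    ("ELF11", 500, 370, 480),
    ("ELF10", 800, 617, 727),
    ("ELF9", 1200, 955, 1113),
    ("ELF8", 1500, 1355, 1544),
    ("ELF7", 1800, 1800, 2033),
    ("ELF6", 2500, 2398, 2737),
    ("ELF5", 3000, 3193, 3574),
    ("ELF4", 4000, 2127, 4664),
    ("ELF3", 5500, 1342, 6075),
    ("ELF2", 7000, 1229, 7446),
    ("ELF1", 9000, 1295, 1814),
]

# Hypothetical cut-offs, not taken from the slides.
MAX_IQR_BP = 1000        # interquartile width above this counts as "broad"
MIN_P75_FRACTION = 0.5   # 75th percentile below this share of nominal counts as "short"


def looks_off_target(nominal_bp, p25_bp, p75_bp):
    """Return True if the observed FL range is broad or well below nominal."""
    too_broad = (p75_bp - p25_bp) > MAX_IQR_BP
    too_short = p75_bp < MIN_P75_FRACTION * nominal_bp
    return too_broad or too_short


def flag_fractions(fractions=ELF_FRACTIONS):
    """Return the lane names that look off target, in table order."""
    return [
        lane
        for lane, nominal_bp, p25_bp, p75_bp in fractions
        if looks_off_target(nominal_bp, p25_bp, p75_bp)
    ]


if __name__ == "__main__":
    for lane, nominal_bp, p25_bp, p75_bp in ELF_FRACTIONS:
        status = "CHECK" if looks_off_target(nominal_bp, p25_bp, p75_bp) else "ok"
        print(f"{lane:6s} nominal={nominal_bp:5d} bp  IQR={p25_bp}-{p75_bp} bp  {status}")
```

With these assumed thresholds the flagged lanes are ELF4, ELF3, ELF2, and ELF1, which lines up with the slide's note that the larger fractions pick up small inserts.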

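Slide 7 partitions amplified cDNA into 1-2 kb, 2-3 kb, and 3-6 kb fractions, and slide 27 asks for control transcripts whose isoforms span size bins. As a rough illustration of that idea, the sketch below assigns transcript lengths to those three bins and reports which bins a set of isoforms covers. The half-open bin boundaries, the function names, and the example isoform lengths are all assumptions and do not come from the slides.

```python
# Size fractions from slide 7, in kb. Treating each bin as half-open
# [low, high) is an assumption made here so that every length maps to
# at most one bin.
SIZE_BINS_KB = [(1, 2), (2, 3), (3, 6)]


def bin_label(low_kb, high_kb):
    """Format a bin the way the slides write it, e.g. '1-2 kb'."""
    return f"{low_kb}-{high_kb} kb"


def size_bin(length_bp):
    """Return the bin label for a transcript length in bp, or None if outside all bins."""
    for low_kb, high_kb in SIZE_BINS_KB:
        if low_kb * 1000 <= length_bp < high_kb * 1000:
            return bin_label(low_kb, high_kb)
    return None


def bins_spanned(isoform_lengths_bp):
    """Return the bins covered by a set of isoform lengths, in bin order."""
    covered = {size_bin(length) for length in isoform_lengths_bp}
    return [
        bin_label(low_kb, high_kb)
        for low_kb, high_kb in SIZE_BINS_KB
        if bin_label(low_kb, high_kb) in covered
    ]


if __name__ == "__main__":
    # Hypothetical isoform lengths for a single control gene.
    example_isoforms_bp = [1200, 2500, 4100]
    print(bins_spanned(example_isoforms_bp))
```

A control gene whose isoforms land in more than one bin would exercise the size-selection step as well as the sequencing step, which appears to be the motivation behind that bullet.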