New data from giab genomes promethion

  1. Karen Miga, 03/28/19, GIAB Workshop. Generating high-quality human reference genomes using PromethION nanopore sequencing. @khmiga
  2. Broader Goal: Improving Diploid T2T Assemblies. One (haploid) genome reference assembly.
  3. Technology Bottleneck. Long-read Sequencing. Compute: Assembly +
  4. Requirements for Long Read Sequencing. Comprehensive Genome Representation; Capacity to Scale: Parallelized Long-Read Sequencing; Consistency in Assembly Quality. Diagram labels: PromethION; 100 kb+ Reads; Scalable Assembly Tools; Multi-flow Cells.
  5. Sequencing 11 Reference Genomes in 9 Days
  6. Sequencing 11 Reference Genomes in 9 Days. Pipeline diagram: Sequencing/Basecalling (Flip Flop) -> Assembly (wtdbg2) -> Polishing (Racon, Medaka; 4x) -> Scaffolding (HiRise) -> FINISHED ASSEMBLY. Also labeled: HiC Data, Phasing.
  7. Sequencing strategy for enrichment of UL-reads (https://www.circulomics.com/). Short Read Eliminator Kit workflow labels: gDNA + buffer; Centrifuge; Wash Step; Re-suspend; x2; Size-selected HMW DNA. [Figure: Number of Bases (Mb) vs. Read Lengths (kb), Standard HMW DNA Prep compared with Circulomics Short Read Eliminator Kit; annotations: Decrease, Increase.]
  8. Sequencing strategy for enrichment of UL-reads (https://www.circulomics.com/). Short Read Eliminator Kit workflow and read-length plot as on slide 7. [Figure: Fold Enrichment (0 to 40) vs. Read Lengths (0 to 200 kb); title: Enrichment of 100kb+ reads.]
  9. Sequencing strategy for enrichment of UL-reads (https://www.circulomics.com/). Short Read Eliminator Kit workflow and enrichment plots as on slide 8. [Figure: Coverage (0 to 20) of reads >10 kb and 100 kb+ for HG020, HG02055, HG01243, HG01109, HG00733, GM24385, GM24149, GM24143; title: Boost in Overall Coverage of 100kb+.]
  10. Sequencing strategy for enrichment of UL-reads. N50s: 44kb. [Figures: Mb by read length class (100kb+, 10-100kb, <10kb); Number of Bases (Gb) vs. Read Length (kb); N50 per genome (axis 30 to 50) for 24143, 24149, 24385, 00733, 01109, 01243, 02055, 02080, 02723, 03098, 03492.]
  11. High-Throughput Runs. Diploid Genomes. Flow Cell Throughput (Gb): ave 69 Gb Per Flow Cell. 100 kb+ Reads (ave 22Gb, 7.3x).
      Genome    Flow cell max (Gb)  Flow cell min (Gb)  Total throughput (Gb)  cov
      GM24143    62                 45                  159                    48x
      GM24149    79                 40                  177                    54x
      GM24385    80                 68                  227                    69x
      HG00733    71                 41                  173                    52x
      HG01109    68                 64                  201                    61x
      HG01243    74                 43                  188                    57x
      HG02055    79                 52                  201                    61x
      HG02080    81                 62                  243                    74x
      HG02723    71                 27                  156                    47x
      HG03098   107                 82                  274                    83x
      HG03492    98                 88                  280                    85x
  12. Evaluation of Read Accuracy. Coverage and total throughput per genome as on slide 11; 100 kb+ Reads (ave 22Gb, 7.3x). [Figure: Sequence Identity (0.5 to 1.0), Flip-flop vs. Non-flip flop, HG00733 Flow Cell Replicates.]
  13. Assembly Performance: Base Accuracy. Pipeline diagram as on slide 6. HG00733 Consensus Base Accuracy (GRCh38): 99.18%, 2.76 Gb aligned.
 • Not phased alignments
 • Additional polishing steps (pilon/methylation aware polishing)
 • Alignments are not to the individual's genome
  14. Assembly Performance: Base Accuracy. Pipeline diagram as on slide 6. HG00733 Consensus Base Accuracy (GRCh38): 99.18%, 2.76 Gb aligned.
 • Alignments are not to the individual's genome
 Complete BAC alignments: 21 BACs, 3.1Mb.
  15. Assembly Performance: Base Accuracy. Pipeline diagram as on slide 6. HG00733 Consensus Base Accuracy (GRCh38): 99.18%, 2.76 Gb aligned.
 • Alignments are not to the individual's genome
 Complete BAC alignments: 21 BACs, 3.1Mb. 0.9976. NA12878 ONT (NBT 2018, update): Nanopolish (x2), CpG methylation-mode (Sergey Koren and Adam Phillippy) *
  16. MinION: • 6 months (May-Oct) • 62 MinION Flow Cells • 155Gb (50X Coverage) • N50s 70kb • 44Gb 100kb+ (16.5x)
  17. MinION: • 6 months (May-Oct) • 62 MinION Flow Cells • 155Gb (50X Coverage) • N50s 70kb • 44Gb 100kb+ (16.5x)
      PromethION: • 4 days • 3 PromethION Flow Cells • 207 Gb (69X Coverage) • N50s 44 kb • 22Gb 100kb+ (7x)
  18. 10 Reference Genome Assemblies in 10 Days
  19. Pipeline diagram as on slide 6. Not yet running at full capacity. Improvement: Assembly and Polishing (reduce cost, improve quality); Haplotype Phasing.
  20. Acknowledgements. Benedict Paten, Mark Akeson, David Haussler. ONT TEAM: Simon Mayes, Vania Costa, Daniel Garalde, David Stoddart, Rosemary Dokos, Jon Pugh, Chris Seymour, Chris Wright. Adam Novak, Glenn Hickey, Jordan Eizenga, Erik Garrison, Jean Monlong, Xian Chang, Miten Jain, Hugh Olsen, Kristof Tigyi, Marina Haukness, Ryan Lorig-Roach, Trevor Pesout, Joel Armstrong, Nicholas Maurer. Justin Zook, Nate Olson.
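
The throughput figures on slide 11 and the N50 values on slides 10, 16 and 17 can be cross-checked with a little arithmetic. The sketch below is not from the talk: it takes the per-genome totals and coverage values listed on slide 11, assumes three flow cells per genome (slide 17 lists 3 PromethION flow cells for one genome; applying that to all eleven is an assumption), and uses a made-up set of read lengths to illustrate the usual N50 definition.

```python
# Per-genome totals (Gb) and fold coverage as listed on slide 11.
total_gb = [159, 177, 227, 173, 201, 188, 201, 243, 156, 274, 280]
coverage_x = [48, 54, 69, 52, 61, 57, 61, 74, 47, 83, 85]

# Assumption: three flow cells per genome, as slide 17 states for one genome.
FLOW_CELLS_PER_GENOME = 3

# Mean total throughput per genome and per flow cell.
mean_total_gb = sum(total_gb) / len(total_gb)
mean_per_flow_cell_gb = mean_total_gb / FLOW_CELLS_PER_GENOME

# Genome size implied by each (total, coverage) pair: coverage = bases / genome size.
implied_genome_gb = [t / c for t, c in zip(total_gb, coverage_x)]


def n50(lengths):
    """Return the length L such that reads of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no read lengths given")


# Hypothetical read lengths in kb, for illustration only (not data from the talk).
toy_reads_kb = [10, 20, 30, 40, 50]

print(f"mean total per genome: {mean_total_gb:.1f} Gb")
print(f"mean per flow cell:    {mean_per_flow_cell_gb:.1f} Gb")
print(f"implied genome size:   {min(implied_genome_gb):.2f} to {max(implied_genome_gb):.2f} Gb")
print(f"toy N50:               {n50(toy_reads_kb)} kb")
```

Under these assumptions the mean total rounds to the 207 Gb quoted on slide 17 and the per-flow-cell mean rounds to the 69 Gb quoted on slide 11.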