Telomere-to-telomere assembly of a complete human X chromosome

Presentation at AGBT 2019

Published in: Science

  1. Telomere-to-telomere assembly of a complete human X chromosome • Adam M. Phillippy, Head, Genome Informatics Section • AGBT, March 2, 2019
  2. I have some troubling news… • The human reference genome is incomplete • 368 unresolved issues, 102 gaps • Segmental duplications, rDNAs • Centromeres, telomeres, heterochromatin • These gaps contain important information • Missing reference sequence leads to analysis artifacts • Variation in these gaps is unexplored (e.g. rDNAs) • We don’t know what we don’t know…
  3. Let’s finish the human genome • Karen Miga (@khmiga) • Adam Phillippy (@aphillippy)
  4. What’s the problem? • Repeats are long, reads are short • “If the overlap is of sufficient length to distinguish it from being a repeat in the sequence the two sequences must be contiguous.” (Rodger Staden, 1979, MRC Laboratory of Molecular Biology)
  5. Flashback to AGBT 2012 • The return of closed (bacterial) genomes • Bibersteinia trehalosi 192
  6. How long do reads need to be, for human? • How long are the repeats? • 7 kbp LINEs • 1 Mbp+ rDNA arrays • 1 Mbp+ centromere arrays • 10 Mbp+ heterochromatin blocks • Coverage and accuracy matter too • 1,000X of 100 bp reads at 100% accuracy? NO • 10X of 10,000,000 bp reads at 100% accuracy? YES • 100X of 100,000 bp reads at 90% accuracy? MAYBE
  7. Ultra-long nanopore sequencing • ONT R9 pore: E. coli CsgG membrane protein • Read lengths >1 Mbp possible • *Assuming 3.4 Å per bp, 1 Mbp = 3,400,000 Å (0.34 mm) = 40,000x height of the pore (figure labels: 120 Å 85 Å 3.2 km in 37 m 8 cm)
  8. LONG READ CLUB • Really very long reads indeed • Nick Loman (@pathogenomenick) • Matt Loose (@mattloose)
  9. It’s time to finish the human genome • The Telomere-to-Telomere (T2T) consortium is an open, community-based effort to generate the first complete assembly of a human genome. • CHM13 cell line from Urvashi Surti, Pitt; SKY karyotype from Jennifer Gerton, Stowers (N=46; XX)
  10. We need long reads. Lots of long reads • 30x Nanopore ultra-long: contig building • 60x PacBio: polishing • 50x 10x Genomics: polishing • BioNano: structural validation (figure label: 100 kb)
  11. It pays to go deep • Nanopore UL read length distribution is long tailed (figure label: repeat)
  12. CHM13 sequencing • May 1 to October 29, 2018 • 62 MinION/GridION flow cells • 8.9M reads, 98 Gb, 1.6 Gb / cell • N50 read length 76 kb • 44 Gb in reads >100 kb • Max read length 1.03 Mb • Assembled with Canu • 10x cov of 100 kb at 90% acc • Now upwards of 90 flow cells and counting…
  13. The human genome, 2001 • ref28 • NG50 contig 0.5 Mbp
  14. The human genome, 2019 • CHM13 • NG50 contig 75 Mbp (70x PacBio + 35x UL ONT) • Canu (figure: contigs drawn on chromosomes 1 to 22 and X)
  15. The first complete assembly of a human chromosome
  16. A complete X chromosome (figure label: ddPCR)
  17. Stitching across the X centromere • Unique structural variants from PacBio • Unique k-mers confirmed by Duplex-Seq
  18. An assembly is a hypothesis
  19. Anchored 100 kb+ centromere reads • Requires a careful measure of “mapping quality”
  20. Centromere array validation
  21. Centromere array validation (figure labels: 1.8 Mb, 0.7 Mb, 0.3 Mb)
  22. It’s time to finish the human genome
  23. Are we there yet? • Almost! • Have proven it’s possible for the X chromosome • T2T assembly of all chrs within the next 2 years • Remaining challenges • Satellite arrays, rDNA arrays, segmental duplications • Nanopore consensus quality • Targeted long-read sequencing • Better methods for phasing repeats and haplotypes
  24. All our CHM13 data is openly released • github.com/nanopore-wgs-consortium/chm13 • Draft whole-genome assemblies • Nanopore ultra-long reads • 10x Genomics reads • BioNano DLS (WashU) • PacBio (SRA) • Coming soon: • Hi-C (Arima Genomics)
  25. Acknowledgements • NHGRI: Sergey Koren, Arang Rhie, Jim Mullikin, Alice Young, Shelise Brooks, Valerie Maduro, Gerard Bouffard, Sofia Barreira, Andy Baxevanis, Nancy Hansen • Karen Miga, UCSC • Jennifer Gerton, Stowers • Tamara Potapova, Stowers • Tina Graves Lindsay, WashU • Ira Hall, WashU • Valerie Schneider, NCBI • Kerstin Howe, Sanger • Jo Wood, Sanger • Matt Loose, Nottingham • Nick Loman, Birmingham • Urvashi Surti, Pitt (ret.)
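
Several figures quoted in the slides are plain arithmetic, and the N50 and NG50 statistics on slides 12 to 14 follow a standard definition. The Python sketch below re-derives them as a sanity check. It is not code from the presentation: the function name `nx50`, the toy read lengths, and the 3.1 Gb genome size used for the coverage estimate are all assumptions made for illustration. The only inputs taken from the slides are 62 flow cells, 8.9M reads, 98 Gb, 3.4 Å per bp, and the 85 Å pore height.

```python
from typing import Iterable, Optional


def nx50(lengths: Iterable[int], total: Optional[int] = None) -> int:
    """Return N50 of `lengths`, or NG50 when `total` is a genome size.

    The result is the length L such that sequences of length >= L
    together cover at least half of `total` (default: sum of lengths).
    Returns 0 if the sequences cannot cover half of `total`.
    """
    ordered = sorted(lengths, reverse=True)
    target = (sum(ordered) if total is None else total) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= target:
            return length
    return 0


# Toy lengths in kb, invented for illustration (not CHM13 data).
toy_lengths = [100, 80, 60, 40, 20]
toy_n50 = nx50(toy_lengths)        # 100 + 80 = 180 >= 150, so N50 = 80
toy_ng50 = nx50(toy_lengths, 500)  # 100 + 80 + 60 + 40 = 280 >= 250, so NG50 = 40

# Slide 12: run statistics as quoted.
TOTAL_BASES = 98e9
FLOW_CELLS = 62
READS = 8.9e6
gb_per_cell = TOTAL_BASES / FLOW_CELLS / 1e9   # slide quotes 1.6 Gb / cell
mean_read_kb = TOTAL_BASES / READS / 1e3       # mean read length, not the N50

# ASSUMPTION: a haploid human genome of about 3.1 Gb (not stated in the slides).
ASSUMED_GENOME_BASES = 3.1e9
coverage = TOTAL_BASES / ASSUMED_GENOME_BASES  # compare with "30x" on slide 10

# Slide 7 footnote: physical length of a 1 Mbp read versus the pore height.
ANGSTROM_PER_BP = 3.4
PORE_HEIGHT_ANGSTROM = 85.0
read_angstrom = 1_000_000 * ANGSTROM_PER_BP
read_mm = read_angstrom * 1e-7                 # 1 Å = 1e-7 mm
pore_heights = read_angstrom / PORE_HEIGHT_ANGSTROM

if __name__ == "__main__":
    print(f"toy N50 = {toy_n50} kb, toy NG50 = {toy_ng50} kb")
    print(f"{gb_per_cell:.2f} Gb per flow cell, mean read {mean_read_kb:.1f} kb")
    print(f"about {coverage:.0f}x coverage under the assumed genome size")
    print(f"1 Mbp = {read_mm:.2f} mm = {pore_heights:,.0f} pore heights")
```

The gap between the mean read length computed here and the 76 kb N50 on slide 12 reflects the long-tailed read length distribution noted on slide 11: N50 weights each read by its length, so a minority of very long reads dominates it.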
