Telomere-to-telomere assembly of a
complete human chromosome
Karen Miga
UC Davis Genetics Seminar
Sept 30, 2019
@khmiga
New Era in Genetics and Genomics
We are finally reaching complete, high-quality
telomere-to-telomere chromosome assemblies
Human reference genome is incomplete.
• 368 unresolved issues, 102 gaps
• Segmental duplications, gene families, satellite
arrays, centromeres, rDNAs
• Uncharacterized sequence variation in the human
population
Our current understanding of
genome biology and function
30 Mb
chr21
~20 Mb ?
Challenge:
Generating assemblies across repetitive regions that
span hundreds of kilobases.
[Figure labels: Repeats (100 kb+); Unique variant]
Can high-coverage ultra-long sequencing resolve
complete assemblies of the human genome?
MinION
100kb+
It’s time to finish the human genome
The Telomere-to-Telomere (T2T) consortium is an
open, community-based effort to generate the
first complete assembly of a human genome.
Our target: CHM13hTERT
Cell line from Urvashi Surti, Pitt; SKY karyotype from Jennifer Gerton and Tamara Potapova, Stowers
N=46; XX
Intramural Sequencing Center
CHM13 Sequencing
94 MinION/GridION flow cells
11.1M reads
155 Gb (1.6 Gb / flow cell) (50x)
99 Gb in reads >50 kb (32x)
78 Gb in reads >70 kb (25x)
Max mapped read length 1.04 Mb
From May 1/18 – Jan 8/19
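The fold-coverage figures above follow from dividing sequenced bases by genome size. A minimal sketch of that arithmetic, assuming a genome size of about 3.1 Gb (the slide does not state the value used; `GENOME_SIZE_GB` and `fold_coverage` are illustrative names):

```python
# Assumption: ~3.1 Gb genome size; the slide does not state the value used.
GENOME_SIZE_GB = 3.1


def fold_coverage(total_gb, genome_gb=GENOME_SIZE_GB):
    """Return fold coverage: sequenced gigabases divided by genome gigabases."""
    return total_gb / genome_gb


# Yields reported on the slide, in Gb.
for label, gb in [("all reads", 155), ("reads >50 kb", 99), ("reads >70 kb", 78)]:
    print(f"{label}: {fold_coverage(gb):.0f}x")
```

Under that assumption the three yields round to the 50x, 32x, and 25x shown on the slide.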
50x Nanopore ultra-long: contig building
60x PacBio: polishing
50x 10x Genomics: polishing
BioNano: structural validation
• 2.94 Gbp assembly NG50: 75 Mbp
• Exceeds the continuity of the reference
genome GRCh38 (56 Mbp NG50
contig size).
• Subset of chromosome assemblies
break only at centromere.
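NG50 is the contig length at which the longest contigs, taken in descending order, first cover half of the genome size (rather than half of the assembly, as in N50). A minimal sketch with invented toy lengths; `ng50` is an illustrative helper, not part of any assembler:

```python
def ng50(contig_lengths, genome_size):
    """Return the length L such that contigs of length >= L together
    cover at least half of genome_size; 0 if the assembly is too small."""
    half = genome_size / 2
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


# Toy contig lengths in Mbp (invented for illustration only).
toy_contigs = [40, 30, 20, 10]
print(ng50(toy_contigs, genome_size=120))
```

Because the denominator is the genome size, NG50 values from different assemblies of the same genome are directly comparable.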
Roadmap for completing the genome
Canu
Orthogonal Validation
Jo and Valerie
2.2 - 3.7 Mb
mean of 3010 kb (S.D. = 429; n = 49)
Assemble contigs using overlapping structural variant (SV) patterns
Scaffold Assembly of XCEN
Xq Xp
Rel3 Assembly: ~3.1 Mb
The assembly is a hypothesis(!)
Beth Sullivan, Jennifer Gerton, Edmund Howe
@NanoporeConf | #NanoporeConf
Marker-assisted mapping
Adam Phillippy, Arang Rhie, Sergey Koren
Create a scaffold of unique (single-copy)
k-mers genome-wide
Anchor high-confidence
long-read alignments to
repeat assemblies
Confident mapping of long reads
using a single-copy k-mer strategy
Identify and mark all sites of unique anchors across the chromosome
chrX
• 21-mers that appear ~c times in Illumina data
• Also found in PacBio/Nanopore reads
• Less frequent in the centromere, but still there
• (Validated with Duplex-Seq)
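The marker selection above can be sketched as a counting problem. This is a simplified illustration, not the consortium's pipeline: it reads "~c" as the short-read sequencing depth (an assumption), ignores reverse complements, and uses hypothetical names (`count_kmers`, `single_copy_kmers`, `tolerance`).

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer across a collection of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def single_copy_kmers(short_counts, long_counts, coverage, tolerance=0.5):
    """Keep k-mers whose short-read count is close to the coverage c
    and that are also observed in the long reads."""
    low = coverage * (1 - tolerance)
    high = coverage * (1 + tolerance)
    return {
        kmer
        for kmer, n in short_counts.items()
        if low <= n <= high and kmer in long_counts
    }


# Toy counts (invented): "AAA" is near 30x, "CCC" looks repetitive,
# and "GGG" never appears in the long reads.
short_counts = Counter({"AAA": 30, "CCC": 95, "GGG": 31})
long_counts = Counter({"AAA": 5, "CCC": 9})
print(single_copy_kmers(short_counts, long_counts, coverage=30))
```

The slide uses 21-mers; the toy counts use 3-mers only to keep the example readable.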
Filter long read alignments: retaining those with unique k-mer anchoring
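One way to picture this filter: keep an alignment only if its span on the assembly covers at least one single-copy marker position. The sketch below is a simplification under stated assumptions: alignments are hypothetical `(read_name, start, end)` tuples with half-open coordinates, `min_markers` is an invented parameter, and a fuller check would also confirm that the k-mer is present in the read itself.

```python
from bisect import bisect_left


def count_markers(sorted_markers, start, end):
    """Count marker positions p with start <= p < end."""
    return bisect_left(sorted_markers, end) - bisect_left(sorted_markers, start)


def filter_alignments(alignments, marker_positions, min_markers=1):
    """Retain alignments whose span contains at least min_markers markers."""
    markers = sorted(marker_positions)
    return [
        aln for aln in alignments
        if count_markers(markers, aln[1], aln[2]) >= min_markers
    ]


# Toy data (invented): marker positions and three alignments.
markers = [100, 5_000, 90_000]
alignments = [("r1", 0, 200), ("r2", 200, 4_000), ("r3", 4_000, 100_000)]
print(filter_alignments(alignments, markers))
```

Alignments that fall entirely inside a marker-free stretch of repeat, such as `r2` here, are dropped because nothing anchors them to a unique location.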
Spacing of single-copy k-mers can be irregular in
repeat-dense regions
X centromere array (CENX): 3.1 Mbp
Number of k-mers: 2,034
Spacing N50: 6,879 bp
Longest distance between k-mers: 53,798 bp
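The spacing statistics above can be computed from sorted marker positions. This sketch assumes "Spacing N50" means an N50 taken over the gaps between adjacent markers, which is an interpretation of the slide rather than a stated definition; the function names are illustrative.

```python
def marker_gaps(positions):
    """Return distances between adjacent marker positions."""
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def spacing_n50(positions):
    """Gap length G such that gaps of length >= G span half the total gap length."""
    gaps = sorted(marker_gaps(positions), reverse=True)
    half = sum(gaps) / 2
    running = 0
    for gap in gaps:
        running += gap
        if running >= half:
            return gap
    return 0


def longest_gap(positions):
    """Largest distance between adjacent markers (0 if fewer than two)."""
    return max(marker_gaps(positions), default=0)


# Toy marker positions in bp (invented for illustration).
toy_positions = [0, 10, 30, 100]
print(spacing_n50(toy_positions), longest_gap(toy_positions))
```

The longest gap matters for mapping: a read shorter than that distance can land entirely between markers and cannot be anchored.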
Unique k-mer-based filtering, Nanopore reads: nanopolish (two rounds)
Unique k-mer-based filtering, PacBio (CLR) reads: arrow (two rounds)
10XG polishing: longranger + freebayes (two rounds)
GAGE pre-polishing
ChrX GAGE array: 19 tandemly arrayed ~9.4 kb repeats
[Plot: coverage (0 to 250) by base position; most frequent base vs. second most frequent base (error)]
GAGE with marker-assisted polishing
ChrX GAGE array: 19 tandemly arrayed ~9.4 kb repeats
[Plot: coverage (0 to 250) by base position; most frequent base vs. second most frequent base (error)]
CCS/HiFi Evaluation
HiFi Alignments to Evaluate Polishing
CENTROMERE X (DXZ1: 3.1 Mb): BEFORE POLISHING
CENTROMERE X (DXZ1: 3.1 Mb): AFTER POLISHING
NOTE: Underlying satellite array structure remains the same.
Opens the whole genome to analysis
Ariel Gershman
Winston Timp’s
Laboratory
Finished T2T X Chromosome:
High Accuracy and High Continuity
1. Structurally validated assembly from telomere to telomere, including the
3.1 Mb tandem repeat at the X centromere, and providing a complete
assessment across tandemly repeated gene families.
2. Novel polishing strategy capable of improving the quality of large
repeat-rich regions, demonstrating dramatic improvements in quality over
the entirety of the X chromosome.
3. Statistics of CHM13 full-length BAC alignments to polished assembly:
275/341 (81%) QV 37.4 QV 27.9
153/341 (45%) QV 37.7 QV 27.4
Column labels: Mean, Median, BACs Aligned
Row labels: HiFi, UL-asm
Vollger M, Logsdon G, et al. bioRxiv doi.org/10.1101/635037
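To read the QV figures, it helps to recall that a Phred-scaled quality value relates to the per-base error rate as QV = -10 log10(error rate). A minimal sketch of the conversion, assuming the slide's QVs follow this standard scaling (function names are illustrative):

```python
import math


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error rate."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(rate):
    """Convert a per-base error rate to a Phred-scaled quality value."""
    return -10 * math.log10(rate)


# QV values taken from the slide; printed as error rates per base.
for qv in (37.4, 27.9):
    print(f"QV {qv}: {qv_to_error_rate(qv):.2e} errors per base")
```

A difference of 10 QV units corresponds to a tenfold difference in error rate, so the gap between the two QV columns is close to an order of magnitude.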
It is time to finish the
human genome
NEW! Rel3 open data release
• github.com/nanopore-wgs-consortium/chm13
• 120x Nanopore reads: NHGRI, UW, Nottingham, UC Davis (PromethION, Megan Dennis)
• 50x 10x Genomics linked reads (NHGRI)
• 70x PacBio CLR reads (WashU)
• 24x PacBio HiFi reads (UW)
• 40x Hi-C (Arima Genomics)
• BioNano optical map (WashU)
• Unpolished Canu assemblies
Additional ultra-long ONT data
from Glennis Logsdon (UW)
Read length   Coverage   Percent of data
>50 kbp       12X        86%
>100 kbp      9.1X       66%
>150 kbp      6.8X       49%
>200 kbp      4.9X       35%
>250 kbp      3.4X       24%
Total: 13.9X coverage
Read length (kbp): N50 = 147.1, N1 = 649.6, Max = 1538.3
[Histogram: number of reads by read length (kbp), log-scale axis from 0.1 to 10,000]
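Statistics like those in the table can be derived from a list of read lengths. The sketch below uses invented toy lengths and hypothetical helper names; it is meant only to show how read N50, coverage above a length threshold, and percent of data relate to one another.

```python
def read_n50(lengths):
    """Length L such that reads of length >= L hold half of all bases."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


def coverage_above(lengths, threshold, genome_size):
    """Return (fold coverage, fraction of data) for reads longer than threshold."""
    kept = sum(length for length in lengths if length > threshold)
    return kept / genome_size, kept / sum(lengths)


# Toy read lengths and genome size in arbitrary units (invented).
toy_lengths = [10, 20, 30, 40]
print(read_n50(toy_lengths))
print(coverage_above(toy_lengths, threshold=25, genome_size=50))
```

The table is internally consistent with this relationship: 49% of the data lies above 150 kbp, which places the read N50 just under 150 kbp, in line with the reported 147.1.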
Triple the coverage, what changed?
• github.com/nanopore-wgs-consortium/chm13
• Minimal change in continuity
  • 79.5 Mbp (rel2) vs. 71.8 Mbp (rel3) NG50
  • Don’t judge assemblies based on continuity
• Tricky regions are fixed
  • GAGE and more SegDups automatically resolved
• Improved BAC validation
  • 288 (rel2) vs. 310 (rel3) of 341 BACs resolved
• 1 chromosome down, 23 to go…
Goal of a complete human genome in the next two
years.
Challenges in front of us:
• Acrocentric p-arms
• Large segmental duplications
• Classical Human satellites 2,3
Establishing new benchmarking standards (XChr)
Pioneering new pipelines: Polishing, repeat assembly, and array
structural validation.
Setting the bar higher for quality and completeness.
Immersive Learning That Works: Research Grounding and Paths ForwardImmersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths Forward
Leonel Morgado
 
Farming systems analysis: what have we learnt?.pptx
Farming systems analysis: what have we learnt?.pptxFarming systems analysis: what have we learnt?.pptx
Farming systems analysis: what have we learnt?.pptx
Frédéric Baudron
 
Gadgets for management of stored product pests_Dr.UPR.pdf
Gadgets for management of stored product pests_Dr.UPR.pdfGadgets for management of stored product pests_Dr.UPR.pdf
Gadgets for management of stored product pests_Dr.UPR.pdf
PirithiRaju
 

Recently uploaded (20)

GBSN - Biochemistry (Unit 6) Chemistry of Proteins
GBSN - Biochemistry (Unit 6) Chemistry of ProteinsGBSN - Biochemistry (Unit 6) Chemistry of Proteins
GBSN - Biochemistry (Unit 6) Chemistry of Proteins
 
11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf
 
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
 
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
 
Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.
 
Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ...
Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ...Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ...
Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ...
 
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
 
ESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptxESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptx
 
Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...
 
AJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdfAJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdf
 
8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf
 
Modelo de slide quimica para powerpoint
Modelo  de slide quimica para powerpointModelo  de slide quimica para powerpoint
Modelo de slide quimica para powerpoint
 
Randomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNERandomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNE
 
molar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptxmolar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptx
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
 
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
 
The binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defectsThe binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defects
 
Immersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths ForwardImmersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths Forward
 
Farming systems analysis: what have we learnt?.pptx
Farming systems analysis: what have we learnt?.pptxFarming systems analysis: what have we learnt?.pptx
Farming systems analysis: what have we learnt?.pptx
 
Gadgets for management of stored product pests_Dr.UPR.pdf
Gadgets for management of stored product pests_Dr.UPR.pdfGadgets for management of stored product pests_Dr.UPR.pdf
Gadgets for management of stored product pests_Dr.UPR.pdf
 

Telomere-to-telomere assembly of a complete human chromosome

  • 1. Telomere-to-telomere assembly of a complete human chromosome. Karen Miga, UC Davis Genetics Seminar, Sept 30, 2019. @khmiga
  • 2-6. New Era in Genetics and Genomics. We are finally reaching complete, high-quality telomere-to-telomere chromosome assemblies. The human reference genome is incomplete: 368 unresolved issues and 102 gaps; segmental duplications, gene families, satellite arrays, centromeres, and rDNAs; uncharacterized sequence variation in the human population. chr21: our current understanding of genome biology and function, 30 Mb; ~20 Mb ?
  • 7. Challenge: Generating assemblies across repetitive regions that span hundreds of kilobases. [Figure: repeats (100 kb+) flanked by unique variants.] Can high-coverage ultra-long sequencing resolve complete assemblies of the human genome?
  • 9. It’s time to finish the human genome. The Telomere-to-Telomere (T2T) consortium is an open, community-based effort to generate the first complete assembly of a human genome.
  • 10. Our target: CHM13hTERT. Cell line from Urvashi Surti (Pitt); SKY karyotype from Jennifer Gerton and Tamara Potapova (Stowers). N=46; XX
  • 12. CHM13 Sequencing (Intramural Sequencing Center), May 1, 2018 to Jan 8, 2019: 94 MinION/GridION flow cells; 11.1M reads; 155 Gb (1.6 Gb / flow cell) (50x); 99 Gb in reads >50 kb (32x); 78 Gb in reads >70 kb (25x); max mapped read length 1.04 Mb.
  • 13. CHM13 Sequencing: 50x Nanopore ultra-long, contig building; 60x PacBio, polishing; 50x 10x Genomics, polishing; BioNano, structural validation.
  • 14. Roadmap for completing the genome (Canu). 2.94 Gbp assembly, NG50: 75 Mbp. Exceeds the continuity of the reference genome GRCh38 (56 Mbp NG50 contig size). A subset of chromosome assemblies break only at the centromere.
  • 19. 2.2 - 3.7 Mb mean of 3010 kb (S.D. = 429; n = 49)
  • 21. Structural variants: assemble contigs using overlapping SV patterns.
  • 23. XqXp Rel3 Assembly: ~3.1 Mb The assembly is a hypothesis(!)
  • 24. Rel3 Assembly: ~3.1 Mb. Beth Sullivan, Jennifer Gerton, Edmund Howe.
  • 25. @NanoporeConf | #NanoporeConf. Marker-assisted mapping (Adam Phillippy, Arang Rhie, Sergey Koren).
  • 26. Marker-assisted mapping: create a scaffold of unique, or single-copy, k-mers genome-wide.
  • 27. Marker-assisted mapping: anchor high-confidence long-read alignments to repeat assemblies.
  • 28. Confident mapping of long reads using a single-copy k-mer strategy. Identify and mark all sites of unique anchors across the chromosome (chrX): 21-mers that appear ~c times in Illumina data; also found in PacBio/Nanopore reads; less frequent in the centromere, but still there (validated with Duplex-Seq).
  • 29. Confident mapping of long reads using a single-copy k-mer strategy. Filter long-read alignments, retaining those with unique k-mer anchoring (chrX).
  • 30. Spacing of single-copy k-mers can be irregular in repeat-dense regions. chrX centromere array (CENX): 3.1 Mbp. Number of k-mers: 2,034. Spacing N50: 6,879. Longest distance between k-mers: 53,798 bp.
  • 31. Polishing (chrX): unique k-mer-based filtering of Nanopore reads, nanopolish (two rounds); unique k-mer-based filtering of PacBio (CLR) reads, arrow (two rounds); 10XG polishing, longranger + freebayes (two rounds).
  • 32. GAGE pre-polishing. ChrX GAGE array: 19 tandemly arrayed ~9.4 kb repeats. [Plot: coverage (0 to 250) by base position, showing the most frequent base and the second most frequent base (error).]
  • 33. GAGE with marker-assisted polishing. ChrX GAGE array: 19 tandemly arrayed ~9.4 kb repeats. [Plot: coverage (0 to 250) by base position, showing the most frequent base and the second most frequent base (error).]
  • 34. CCS/HiFi evaluation (chrX): HiFi alignments to evaluate polishing. Centromere X, before polishing. DXZ1: 3.1 Mb
  • 35. CCS/HiFi evaluation (chrX): HiFi alignments to evaluate polishing. Centromere X, after polishing. Note: the underlying satellite array structure remains the same. DXZ1: 3.1 Mb
  • 36. Opens the whole genome to analysis. Ariel Gershman, Winston Timp’s Laboratory.
  • 40-42. Finished T2T X Chromosome: High Accuracy and High Continuity
      1. Structurally validated assembly from telomere to telomere, including the 3.1 Mb tandem repeat at the X centromere and providing a complete assessment across tandemly repeated gene families.
      2. Novel polishing strategy capable of improving the quality of large repeat-rich regions, demonstrating dramatic improvements in quality over the entirety of the X chromosome.
      3. Statistics of CHM13 full-length BAC alignments to the polished assembly:
         Assembly | BACs aligned | Median | Mean
         UL-asm | 275/341 (81%) | QV 37.4 | QV 27.9
         HiFi | 153/341 (45%) | QV 37.7 | QV 27.4
         Vollger M, Logsdon G, et al. bioRxiv doi.org/10.1101/635037
  • 43. It is time to finish the human genome
  • 44. NEW! Rel3 open data release (github.com/nanopore-wgs-consortium/chm13): 120x Nanopore reads (NHGRI, UW, Nottingham, UC Davis (PromethION, Megan Dennis)); 50x 10x Genomics linked reads (NHGRI); 70x PacBio CLR reads (WashU); 24x PacBio HiFi reads (UW); 40x Hi-C (Arima Genomics); BioNano optical map (WashU); unpolished Canu assemblies.
  • 45. Additional ultra-long ONT data from Glennis Logsdon (UW): 13.9X coverage. github.com/nanopore-wgs-consortium/chm13
      Read length | Coverage | Percent of data
      >50 kbp | 12X | 86%
      >100 kbp | 9.1X | 66%
      >150 kbp | 6.8X | 49%
      >200 kbp | 4.9X | 35%
      >250 kbp | 3.4X | 24%
      Read length (kbp): N50 = 147.1, N1 = 649.6, Max = 1538.3. [Histogram: number of reads by read length (kbp).]
  • 46. Triple the coverage, what changed? Minimal change in continuity: 79.5 Mbp (rel2) vs. 71.8 Mbp (rel3) NG50; don’t judge assemblies based on continuity. Tricky regions are fixed: GAGE and more SegDups automatically resolved. Improved BAC validation: 288 (rel2) vs. 310 (rel3) of 341 BACs resolved. 1 chromosome down, 23 to go…
  • 47. Goal of a complete human genome in the next two years. Challenges in front of us: acrocentric p-arms; large segmental duplications; classical human satellites 2 and 3. Establishing new benchmarking standards (XChr). Pioneering new pipelines: polishing, repeat assembly, and array structural validation. Setting the bar higher for quality and completeness.
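
The coverage figures on slide 12 follow from dividing sequenced bases by genome size. Here is a minimal Python sketch of that arithmetic. It assumes a genome size of about 3.1 Gb, which the slides do not state; `GENOME_SIZE_GB` and the function names are illustrative only.

```python
# Assumed genome size in gigabases; not given on the slides.
GENOME_SIZE_GB = 3.1


def coverage(total_gb, genome_gb=GENOME_SIZE_GB):
    """Return fold coverage for a given yield in gigabases."""
    return total_gb / genome_gb


def yield_per_flow_cell(total_gb, flow_cells):
    """Return the average yield per flow cell in gigabases."""
    return total_gb / flow_cells


if __name__ == "__main__":
    # Yields reported on slide 12: all reads, reads >50 kb, reads >70 kb.
    for label, gb in [("all reads", 155), (">50 kb", 99), (">70 kb", 78)]:
        print(f"{label}: {gb} Gb is about {coverage(gb):.0f}x")
    print(f"per flow cell: {yield_per_flow_cell(155, 94):.1f} Gb")
```

Under this assumption the three yields round to the 50x, 32x, and 25x shown on the slide, and 155 Gb over 94 flow cells rounds to 1.6 Gb per flow cell.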

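Slides 28 to 30 describe marking single-copy k-mers and retaining only long reads anchored by them. The toy Python sketch below illustrates the idea under stated assumptions; it is not the consortium's pipeline. It uses k = 3 rather than the 21-mers named on the slide, treats "appears ~c times" as a count within a band around c (the `tolerance` parameter is hypothetical), ignores reverse complements, and inspects read sequences directly instead of parsed alignments.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every k-mer of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def single_copy_kmers(short_reads, k, c, tolerance=0.5):
    """Return k-mers whose short-read count lies within c * (1 +/- tolerance).

    c stands for the expected single-copy k-mer coverage; the tolerance
    band is an assumption made for this sketch.
    """
    counts = Counter(km for read in short_reads for km in kmers(read, k))
    lo, hi = c * (1 - tolerance), c * (1 + tolerance)
    return {km for km, n in counts.items() if lo <= n <= hi}


def marker_positions(assembly, markers, k):
    """Return start positions in the assembly where a marker k-mer occurs."""
    return [i for i, km in enumerate(kmers(assembly, k)) if km in markers]


def keep_alignment(read_seq, markers, k, min_anchors=1):
    """Keep a read only if it contains at least min_anchors marker k-mers."""
    hits = sum(1 for km in kmers(read_seq, k) if km in markers)
    return hits >= min_anchors


def spacing_n50(positions):
    """Return the N50 of the gaps between consecutive sorted positions."""
    gaps = sorted((b - a for a, b in zip(positions, positions[1:])),
                  reverse=True)
    total = sum(gaps)
    running = 0
    for gap in gaps:
        running += gap
        if running * 2 >= total:
            return gap
    return 0  # fewer than two positions: no gaps to summarize


if __name__ == "__main__":
    # Toy short reads: "ACG" and "CGT" are seen twice, "AAA" four times.
    reads = ["ACGT", "ACGT", "AAAA", "AAAA"]
    markers = single_copy_kmers(reads, k=3, c=2)
    print(sorted(markers))
    print(keep_alignment("AAAAACGTAAAA", markers, 3))  # has an anchor
    print(keep_alignment("AAAAAAA", markers, 3))       # repeat only
```

`spacing_n50` is the same kind of summary as the "Spacing N50" on slide 30: half of the total marker-to-marker distance lies in gaps at least that long.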
Editor's Notes

  1. KEY POINT HERE: spacing of unique variants… Some regions are easier than others….
  2. Number of k-mers: 2,034 Spacing N50: 6,879 Longest distance: 53,798 bp
  3. Median BAC QV 37.4 (mean QV 28.0) vs. median QV 37.6 (mean QV 27.4) for the best CHM13 HiFi asm. And resolve 85% of BACs at >99.8% identity vs. 54% for prior PacBio asm. Total BACs: 341. Compressed: 166 1. Median: 99.9895, QV: 39.78811. Mean: 99.8706, QV: 28.88052. Mitchell HiFi: 153 1. Median: 99.9827, QV: 37.61954. Mean: 99.81871, QV: 27.41627. UL + 10x: 275 1. Median: 99.982, QV: 37.44727. Mean: 99.84145, QV: 27.99832
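
The QV values in the note above are consistent with the standard Phred-scaled conversion from percent identity, QV = -10 · log10(1 - identity/100). A short Python sketch of that conversion follows. The function name is illustrative, and the notes do not say which tool produced the numbers.

```python
import math


def identity_to_qv(identity_percent):
    """Convert percent identity to a Phred-scaled quality value (QV)."""
    error_rate = 1.0 - identity_percent / 100.0
    if error_rate <= 0:
        return math.inf  # a perfect match has no finite QV
    return -10.0 * math.log10(error_rate)


if __name__ == "__main__":
    # Identities taken from the "UL + 10x" entry in the editor's note.
    for label, identity in [("median", 99.982), ("mean", 99.84145)]:
        print(f"{label}: {identity}% identity -> QV {identity_to_qv(identity):.2f}")
```

Because the scale is logarithmic, a few low-identity BACs pull the mean QV well below the median, which may explain the gap between the two columns.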