GRC/GIAB Workshop:
Getting the Most from the Reference
Assembly and Reference Materials
Oct 15, 2019: 9 am-12 pm
What's new and what's next for the human
reference assembly?
Valerie Schneider, Ph.D.
NCBI
15 October 2019
https://genomereference.org
GRC
• Valerie Schneider
• Kerstin Howe
• Tina Graves
• Paul Flicek
• Tayebeh Rezaie
• Nathan Bouk
• Hsiu-Chuan Chen
• Jo Wood
• Joanna Collins
• Sarah Pelan
• Will Chow
• James Torrance
• Derek Albracht
• Milinn Kremitzki
• Laura Clarke
Thanks to many GRC Collaborators
https://www.ncbi.nlm.nih.gov/grc/credits/
Credits
Twitter: @GenomeRef
Announcements: grc-announce@ncbi.nlm.nih.gov
Funding:
• This work was supported in part by the Intramural Research Program
of the National Library of Medicine, National Institutes of Health.
• The European Molecular Biology Laboratory.
• The Wellcome Trust, UK.
• The MGI was supported by National Institutes of Health grants
5U54HG003079, 5U41HG007635 and 5U24HG009081.
• What’s the reference?
• What’s new: GRCh38.p13 through today
• What’s next?
Outline
What’s the reference?
Anonymous samples
Individual 1A Individual 2A Individual 1B
Haploid mosaic assembly
• Highly contiguous
• Contig N50: 57.9 Mb
• Highly accurate
• per bp error: <10^-5
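The contig N50 quoted above is the length of the contig at which the length-sorted contigs first cover half of the total assembled bases. A minimal sketch of that calculation follows; the function name and the toy lengths are illustrative assumptions, not GRCh38 data.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    # Walk contigs from longest to shortest, accumulating bases.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half of the
        # assembly defines the N50.
        if running >= half_total:
            return length


# Toy lengths (hypothetical, for illustration only).
toy_lengths = [80, 70, 50, 40, 30, 20]
print(contig_n50(toy_lengths))
```

A larger N50 means that fewer, longer contigs account for half of the assembly, which is why it is used here as a measure of contiguity.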
Today’s reference assembly does not
represent:
1. The most common allele/haplotype
2. The longest allele/haplotype
3. The ancestral allele/haplotype
The reference represents
the available Human
Genome Project sequence
1 library ~
What’s the reference? Assembly Model Evolution
[Diagram labels: Sample (Gene1, Gene2); Reference Assembly (chromosome with a false gap, alt scaffold); legend: Sequences from haplotype 1, Sequences from haplotype 2]
Linear model: impacts on assembly building and analysis
GRCh37/GRCh38 reference assembly model: represent both haplotypes
[Diagram labels: Reference Assembly; chromosome; many alt loci scaffolds (alt loci scaffold 1, alt loci scaffold 2, alt loci scaffold 3)]
Reference Assembly 101: Assembly Model Evolution
chromosome
Patch release: No change to chromosome coordinates
Assembly nomenclature: GRCh38.p$
novel patch scaffold
ALLELIC
fix patch scaffold
PREFERRED
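Because a patch release leaves chromosome coordinates unchanged, two assembly names that differ only in their patch suffix can share coordinates on the chromosomes. The sketch below parses names of the form GRCh38.pN to make that check explicit. The regular expression and function names are assumptions made for illustration, not part of any GRC or NCBI tool.

```python
import re

# Assumed name pattern: "GRC" + species letter(s) + major version,
# optionally followed by ".p" + patch number (e.g. GRCh38.p13).
_GRC_NAME = re.compile(r"^(GRC[a-z]+)(\d+)(?:\.p(\d+))?$")


def parse_assembly_name(name):
    """Split an assembly name into its major release and patch number."""
    match = _GRC_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised assembly name: {name!r}")
    prefix, major, patch = match.groups()
    # A name without a patch suffix is treated as patch 0 (initial release).
    return {"major": f"{prefix}{major}", "patch": int(patch) if patch else 0}


def same_chromosome_coordinates(name_a, name_b):
    """True when both names belong to the same major release."""
    return parse_assembly_name(name_a)["major"] == parse_assembly_name(name_b)["major"]


print(parse_assembly_name("GRCh38.p13"))
print(same_chromosome_coordinates("GRCh38.p12", "GRCh38.p13"))
print(same_chromosome_coordinates("GRCh37.p13", "GRCh38.p13"))
```

The check covers chromosome coordinates only. Patch scaffolds are separate sequences with their own coordinates.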
• What’s the reference?
• What’s new: GRCh38.p13 through today
• What’s next?
Outline
GRCh38.p13 (cumulative stats)
• 113 Fix patches: Add >3.88 Mb novel sequence
• 43 added in p13
• 72 Novel patches: Add >1.1 Mb novel sequence
• 2 added in p13
• >25 genes affected
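The cumulative counts above can be broken down into what existed before p13 and what p13 added. The short sketch below does that arithmetic. It assumes the earlier totals are simply the cumulative count minus the p13 additions, which the slide does not state explicitly.

```python
# Counts taken from the slide: cumulative totals at GRCh38.p13 and the
# number of patches newly added in p13.
cumulative = {"fix": 113, "novel": 72}
added_in_p13 = {"fix": 43, "novel": 2}

# Assumption: patches present before p13 = cumulative total - p13 additions.
before_p13 = {kind: cumulative[kind] - added_in_p13[kind] for kind in cumulative}

# Share of each patch type that is new in p13.
share_new = {kind: added_in_p13[kind] / cumulative[kind] for kind in cumulative}

for kind in cumulative:
    print(f"{kind}: {before_p13[kind]} before p13, "
          f"{share_new[kind]:.0%} of the cumulative total added in p13")
```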
What’s new?: GRCh38.p13
Tayebeh Rezaie
Weds, 9 am
Grand Ballroom B
Level 3 Convention Center
What’s new?: NOR Distal Junction Regions
Brian McStay Lab
DJ sequences are >99% identical between acrocentrics
What’s new?: NOR Distal Junction Regions
Updated chr 21 p-arm
<<<<CENTROMERE TELOMERE>>>>
Reduced clone path (unordered/unoriented)
GRCh38 chr 21 alignment
rDNA + NOR DJ
What’s new?: Gap Closures
Data Source  Origin           Assembly Accession  Status                        # Contigs  Contig N50
CHM1         NA (haploid)     GCA_001297185.2     Contig Assembly Submitted     3,709      26.5 Mb
CHM13        NA (haploid)     GCA_002884485.1     Contig Assembly Submitted     1,916      29.2 Mb
NA19240      Yoruban          GCA_001524155.4     Chr-level Assembly Submitted  1,826      29.1 Mb
HG00514      Han Chinese      GCA_002180035.3     Chr-level Assembly Submitted  2,877      29.4 Mb
NA12878      European         GCA_002077035.3     Chr-level Assembly Submitted  3,220      16.8 Mb
HG00733      Puerto Rican     GCA_002208065.1     Contig Assembly Submitted     3,580      22.2 Mb
HG01352      Colombian        GCA_002209525.1     Contig Assembly Submitted     3,120      22.8 Mb
NA19434      Luhya            GCA_002872155.1     Contig Assembly Submitted     3,123      21.5 Mb
HG02059      Kinh-Vietnamese  GCA_003070785.1     Contig Assembly Submitted     3,180      25.3 Mb
HG03486      Mende            GCA_003086635.1     Contig Assembly Submitted     3,465      5.3 Mb (Sequel)
HG02818      Gambian          GCA_003574075.1     Contig Assembly Submitted     3,267      22.5 Mb
HG03807      Bengali          GCA_003601015.1     Contig Assembly Submitted     3,103      8.4 Mb (Sequel)
HG04217      Telugu           GCA_007821485.1     Contig Assembly Submitted     4,249      3.4 Mb (Sequel)
HG02106      Peruvian         GCA_008583285.1     Contig Assembly Submitted     2,636      3.2 Mb (Sequel)
HG00268      Finnish          GCA_008065235.1     Contig Assembly Submitted     1,995      20.0 Mb (Sequel)
Compressed diploid assemblies unless otherwise noted
• GRCh38 gaps to be evaluated (n=196)
• Excludes biological gaps and WGS intra-scaffold gaps
• Evaluation: Alignment of 8 collapsed diploid assemblies
• 26 gaps spanned by all 8 WGS assemblies, with constant insert length
• Spanning sequence included in GRCh38.p13
• 3 gaps spanned by all 8 WGS assemblies, with variable insert length
• 24 gaps spanned by only a subset of the 8 assemblies
• Remainder of gap evaluations still in progress
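The gap evaluation numbers above can be tallied to see how much of the work is still outstanding. The sketch below assumes the three reported categories are disjoint and all drawn from the 196 gaps under evaluation. The category labels are shorthand introduced here.

```python
# Figures from the slide.
gaps_to_evaluate = 196
evaluated = {
    "spanned by all 8, constant insert length": 26,
    "spanned by all 8, variable insert length": 3,
    "spanned by a subset of the 8": 24,
}

# Assumption: the categories do not overlap, so they can be summed.
evaluated_total = sum(evaluated.values())
remaining = gaps_to_evaluate - evaluated_total

print(f"evaluated so far: {evaluated_total} of {gaps_to_evaluate}")
print(f"still in progress: {remaining}")
```

Under these assumptions, only the 26 gaps with a constant insert length contributed spanning sequence to GRCh38.p13.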
[Diagram labels: GRCh38 components (Clone, WGS); PacBio Assembly; "Assessed as one gap"]
What’s new?: Gap Closures
• What’s the reference?
• What’s new: GRCh38.p13 through today
• What’s next?
Outline
Unresolved genome issues Current curation status
Resolution likelihoods as determined by the GRC review
n=234
What’s next?
Slide: Tayebeh Rezaie
What’s next?
Data Source  Origin            Status
NA19836      African American  Assembly Submission Underway
NA20502      Tuscan            Assembly Submission Underway
NA20862      Gujarati Indian   Assembly Submission Underway
HG03125      Esan              Assembly Assessment Underway
HG02970      Esan              Assembly Assessment Underway
NA21309      Maasai            Assembly Assessment Underway
NA20300      African American  Assembly Assessment Underway
NA20129      African American  Assembly Assessment Underway
HG01567      Peruvian          Assembly Assessment Underway
HG03719      Telugu            Assembly Assessment Underway
HG00766      Chinese Dai       Assembly Assessment Underway
NA12395      CEPH              Assembly Underway
NA19030      Luhya             Assembly Underway
NA19734      Mexican Ancestry  Assembly Underway
HG03736      Sri Lankan        Assembly Underway
[Chart: Growth of accessioned complete human genome assemblies in NCBI Assembly database, by year, 2003-2019; y-axis 0-120]
• Engagement with T2T consortium
– Chr X
– Other missing sequences
• Continued engagement with Gold Genomes project
– Gap closures
– New Novel patches
• New clone paths for immune regions (improve
existing paths and add diversity)
– MHC
– IgH
• Chr 21 p-arm sequence review and update
– Not possible as patches?
• Community outreach
– Workshops
– Website: Help Desk/FAQs
• Your Data?
What’s next?
(For updated assemblies, only date of initial submission is counted)
Chart annotations: GRCh38 released; n=98
GRCh38.p14 (2020)
What’s next?
• Consortium Goals
– Produce 350 human whole genome assemblies
– Fully phased diploid assemblies
– Identify SVs between samples and current reference GRCh38
– Incorporate those SVs into the reference, likely as a graph representation
• What’s the reference?
• What’s new: GRCh38.p13 through today
• What’s next?
Outline

  • 2. What's new and what's next for the human reference assembly? Valerie Schneider, Ph.D. NCBI 15 October 2019 https://genomereference.org
  • 3. Credits. GRC: Valerie Schneider, Kerstin Howe, Tina Graves, Paul Flicek, Tayebeh Rezaie, Nathan Bouk, Hsiu-Chuan Chen, Jo Wood, Joanna Collins, Sarah Pelan, Will Chow, James Torrance, Derek Albracht, Milinn Kremitzki, Laura Clarke. Thanks to many GRC collaborators: https://www.ncbi.nlm.nih.gov/grc/credits/ Twitter: @GenomeRef Announcements: grc-announce@ncbi.nlm.nih.gov Funding: • This work was supported in part by the Intramural Research Program of the National Library of Medicine, National Institutes of Health. • The European Molecular Biology Laboratory. • The Wellcome Trust, UK. • The MGI was supported by National Institutes of Health grants 5U54HG003079, 5U41HG007635 and 5U24HG009081.
  • 4. • What’s the reference? • What’s new: GRCh38.p13 through today • What’s next? Outline
  • 5. What’s the reference? Anonymous samples: Individual 1A, Individual 2A, Individual 1B. Haploid mosaic assembly. • Highly contiguous: contig N50 57.9 Mb • Highly accurate: per-bp error <10^-5. Today’s reference assembly does not represent: 1. The most common allele/haplotype 2. The longest allele/haplotype 3. The ancestral allele/haplotype. The reference represents the available Human Genome Project sequence. 1 library ~
  • 6. What’s the reference? Assembly Model Evolution. Linear model: impacts on assembly building and analysis. GRCh37/GRCh38 reference assembly model: represent both haplotypes. [Figure labels: Sample; Gene1; Gene2; chromosome; alt scaffold; false gap; sequences from haplotype 1; sequences from haplotype 2; many; alt loci scaffold 1; alt loci scaffold 2; alt loci scaffold 3; Reference Assembly]
  • 7. Reference Assembly 101: Assembly Model Evolution. Patch release: no change to chromosome coordinates. Assembly nomenclature: GRCh38.p$. [Figure labels: chromosome; novel patch scaffold (ALLELIC); fix patch scaffold (PREFERRED)]
  • 8. • What’s the reference? • What’s new: GRCh38.p13 through today • What’s next? Outline
  • 9. What’s new?: GRCh38.p13 (cumulative stats) • 113 Fix patches: add >3.88 Mb novel sequence (43 added in p13) • 72 Novel patches: add >1.1 Mb novel sequence (2 added in p13) • >25 genes affected. Tayebeh Rezaie: Weds, 9 am, Grand Ballroom B, Level 3, Convention Center
  • 10. What’s new?: NOR Distal Junction Regions Brian McStay Lab DJ sequences are >99% identical between acrocentrics
  • 11. What’s new?: NOR Distal Junction Regions. Updated chr 21 p-arm. [Figure labels: centromere-to-telomere orientation; reduced clone path (unordered/unoriented); GRCh38 chr 21 alignment; rDNA; NOR; DJ]
  • 12. What’s new?: Gap Closures
      Data Source  Origin           Assembly Accession  Status                        # Contigs  Contig N50
      CHM1         NA (haploid)     GCA_001297185.2     Contig Assembly Submitted     3,709      26.5 Mb
      CHM13        NA (haploid)     GCA_002884485.1     Contig Assembly Submitted     1,916      29.2 Mb
      NA19240      Yoruban          GCA_001524155.4     Chr-level Assembly Submitted  1,826      29.1 Mb
      HG00514      Han Chinese      GCA_002180035.3     Chr-level Assembly Submitted  2,877      29.4 Mb
      NA12878      European         GCA_002077035.3     Chr-level Assembly Submitted  3,220      16.8 Mb
      HG00733      Puerto Rican     GCA_002208065.1     Contig Assembly Submitted     3,580      22.2 Mb
      HG01352      Colombian        GCA_002209525.1     Contig Assembly Submitted     3,120      22.8 Mb
      NA19434      Luhya            GCA_002872155.1     Contig Assembly Submitted     3,123      21.5 Mb
      HG02059      Kinh-Vietnamese  GCA_003070785.1     Contig Assembly Submitted     3,180      25.3 Mb
      HG03486      Mende            GCA_003086635.1     Contig Assembly Submitted     3,465      5.3 Mb (Sequel)
      HG02818      Gambian          GCA_003574075.1     Contig Assembly Submitted     3,267      22.5 Mb
      HG03807      Bengali          GCA_003601015.1     Contig Assembly Submitted     3,103      8.4 Mb (Sequel)
      HG04217      Telugu           GCA_007821485.1     Contig Assembly Submitted     4,249      3.4 Mb (Sequel)
      HG02106      Peruvian         GCA_008583285.1     Contig Assembly Submitted     2,636      3.2 Mb (Sequel)
      HG00268      Finnish          GCA_008065235.1     Contig Assembly Submitted     1,995      20.0 Mb (Sequel)
      Compressed diploid assemblies unless otherwise noted.
  • 13. What’s new?: Gap Closures
      • GRCh38 gaps to be evaluated (n=196)
      • Excludes biological gaps and WGS intra-scaffold gaps
      • Evaluation: alignment of 8 collapsed diploid assemblies
      • 26 gaps spanned by all 8 WGS assemblies, with constant insert length (spanning sequence included in GRCh38.p13)
      • 3 gaps spanned by all 8 WGS assemblies, with variable insert length
      • 24 gaps spanned by only a subset of the 8 assemblies
      • Remainder of gap evaluations still in progress
      [Figure labels: GRCh38; Clone; WGS; PacBio Assembly; assessed as one gap]
  • 14. • What’s the reference? • What’s new: GRCh38.p13 through today • What’s next? Outline
  • 15. What’s next? Unresolved genome issues (n=234). [Charts: current curation status; resolution likelihoods as determined by the GRC review] Slide: Tayebeh Rezaie
  • 16. What’s next?
      Data Source  Origin            Status
      NA19836      African American  Assembly Submission Underway
      NA20502      Tuscan            Assembly Submission Underway
      NA20862      Gujarati Indian   Assembly Submission Underway
      HG03125      Esan              Assembly Assessment Underway
      HG02970      Esan              Assembly Assessment Underway
      NA21309      Maasai            Assembly Assessment Underway
      NA20300      African American  Assembly Assessment Underway
      NA20129      African American  Assembly Assessment Underway
      HG01567      Peruvian          Assembly Assessment Underway
      HG03719      Telugu            Assembly Assessment Underway
      HG00766      Chinese Dai       Assembly Assessment Underway
      NA12395      CEPH              Assembly Underway
      NA19030      Luhya             Assembly Underway
      NA19734      Mexican Ancestry  Assembly Underway
      HG03736      Sri Lankan        Assembly Underway
  • 17. What’s next?
      • Engagement with T2T consortium – Chr X – Other missing sequences
      • Continued engagement with Gold Genomes project – Gap closures – New Novel patches
      • New clone paths for immune regions (improve existing paths and add diversity) – MHC – IgH
      • Chr 21 p-arm sequence review and update – Not possible as patches?
      • Community outreach – Workshops – Website: Help Desk/FAQs
      • Your Data?
      [Chart: Growth of accessioned complete human genome assemblies in NCBI Assembly database, by year, 2003 to 2019; y-axis 0 to 120. For updated assemblies, only date of initial submission is counted. Annotations: GRCh38 released; n=98; GRCh38.p14 (2020)]
  • 18. What’s next? • Consortium Goals – Produce 350 Human whole genome assemblies – Fully phased diploid assemblies – Identify SVs between samples and current Reference GRCh38 – Incorporate those SVs into the reference, likely as a graph representation
  • 19. • What’s the reference? • What’s new: GRCh38.p13 through today • What’s next? Outline
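
Slides 5 and 12 quote contig N50 values without defining the statistic. As a minimal sketch, assuming the commonly used definition (the contig length L such that contigs of length L or longer together contain at least half of the assembled bases), N50 could be computed as follows. The function name `contig_n50` and the toy lengths are hypothetical illustrations, not GRC code or data.

```python
def contig_n50(lengths):
    """Return N50 under the common definition: the largest length L such that
    contigs of length >= L together cover at least half of all assembled bases."""
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    # Walk contigs from longest to shortest, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical toy contig lengths (arbitrary units), not real assembly data.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(contig_n50(toy_lengths))  # 80 + 70 = 150 reaches half of 300, so this prints 70
```

Because N50 depends only on the length distribution, it says nothing about correctness, which is why the slides report a per-bp error rate alongside it.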