Advancements in the human genome reference assembly (GRCh38)
Tayebeh Rezaie, Ph.D.
NCBI
16 October 2019
Funding:
• This work was supported in part by the Intramural Research Program
of the National Library of Medicine, National Institutes of Health.
• The European Molecular Biology Laboratory.
• The Wellcome Trust, UK.
• The MGI was supported by National Institutes of Health grants
5U54HG003079, 5U41HG007635 and 5U24HG009081.
GRC
• Valerie Schneider
• Kerstin Howe
• Tina Graves
• Paul Flicek
• Tayebeh Rezaie
• Nathan Bouk
• Hsiu-Chuan Chen
• Jo Wood
• Joanna Collins
• Sarah Pelan
• Will Chow
• James Torrance
• Ying Sims
• Derek Albracht
• Milinn Kremitzki
Thanks to the many GRC collaborators
https://www.ncbi.nlm.nih.gov/grc/credits/
History of the reference assembly
GRCh38, the human reference genome:
• A critical resource for the basic and clinical research communities: it provides the coordinate system, a source of annotation, and a basis for discovering disease-associated variants
• Built from Sanger-sequenced clones generated by the Human Genome Project, drawn from multiple individuals
[Figure: the reference as a mosaic haploid sequence assembled from clones of Individual 1 and Individual 2]
[Figure: number of ALT loci]
• HGP model (2003): each genomic region was represented by a single sequence on the chromosome
• Current model: ALT loci were added alongside the chromosomes to represent population genomic diversity
• ALT loci capture genomic regions with large, divergent variation (not SNPs or small indels)
From the HGP to the GRC: the GRC maintains, improves and updates the reference (figure label: African-European)
• Major (coordinate-changing) release: GRCh38 (Dec 2013)
• Patch releases (no coordinate change): GRCh38.p13 (Mar 2019)
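The difference between a major release and a patch release comes down to where new sequence lives. The toy model below is a sketch under stated assumptions, not GRC software: every name (`Release`, `add_patch`, `incorporate_insertion`, `lift_position`) and every sequence is hypothetical. A patch is stored as a separate scaffold, so chromosome positions are unchanged, whereas folding the same sequence into a chromosome, as a major release does, shifts every downstream coordinate.

```python
from dataclasses import dataclass, field


@dataclass
class Release:
    """Toy assembly release: primary chromosomes plus separately stored patch scaffolds."""
    chromosomes: dict[str, str]
    patches: list[tuple[str, str]] = field(default_factory=list)


def add_patch(rel: Release, name: str, seq: str) -> Release:
    """Patch release: new sequence becomes a separate scaffold; chromosomes are copied unchanged."""
    return Release(dict(rel.chromosomes), rel.patches + [(name, seq)])


def incorporate_insertion(rel: Release, chrom: str, after: int, seq: str) -> Release:
    """Major release: insert `seq` after 1-based position `after`, changing coordinates.

    Simplification: the toy major release starts with no patches.
    """
    old = rel.chromosomes[chrom]
    chromosomes = dict(rel.chromosomes)
    chromosomes[chrom] = old[:after] + seq + old[after:]
    return Release(chromosomes)


def lift_position(pos: int, after: int, inserted: int) -> int:
    """Map a 1-based position on the old chromosome to the new one."""
    return pos + inserted if pos > after else pos


base = Release({"chrToy": "ACGTACGTAC" * 3})
patched = add_patch(base, "FIX_toy", "GGGG")
major = incorporate_insertion(base, "chrToy", 5, "GGGG")

# The patch release leaves every chromosome coordinate valid.
assert patched.chromosomes == base.chromosomes
# The major release moves bases after the insertion point by 4 positions.
for pos in range(1, len(base.chromosomes["chrToy"]) + 1):
    new_pos = lift_position(pos, 5, 4)
    assert base.chromosomes["chrToy"][pos - 1] == major.chromosomes["chrToy"][new_pos - 1]
```

This is why updates can accumulate as patches for years and only change coordinates when they are folded into the next major assembly.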
Reference assembly updates
• 113 FIX patches: add >3.88 Mb of novel sequence
• 72 NOVEL patches: add >1.1 Mb of novel sequence
• 261 ALT loci: add 3.6 Mb of novel sequence
The next version of the reference should capture ALL the updates made to GRCh38.
Reference updates released as patches: 185/430 (42%)
The idea of representing variation in the reference began a long time ago.
Curation of the reference assembly
• Issue sources: GRC assembly evaluation, reports from collaborators and the community, and the literature
• Technology: sequencing, FISH, optical mapping
• Data resources: sequences generated by the GRC or available in public databases (clones, WGS, PCR products)
Evaluation of gaps in GRCh38
• Gap count = 196 (excluding biological gaps and gaps within WGS scaffolds)
• Reports of new assemblies that can close reference gaps
• Aim: identify gaps that can be spanned
[Figure: a GRCh38 gap between clone and WGS components, spanned by a PacBio assembly]
Data provided by Tina Graves
Alignments of 8 diploid PacBio assemblies to the reference:
• Spanned with the same amount of sequence: 26 (missing sequence)
• Spanned with varying amounts of sequence: 3 (variation)
• Spanned by some but not all assemblies: 24 (complex: missing sequence + variation)
• The remaining gaps are under review
https://www.genome.wustl.edu/research/projects/
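The three spanning categories above amount to a simple decision rule over per-assembly results. The sketch below assumes each of the 8 assemblies reports either the number of bases it places across a gap or `None` if it does not span it; the function name, the `tolerance` parameter for "the same amount", and the example data are all hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def classify_gap(span_lengths: Sequence[Optional[int]], tolerance: int = 0) -> str:
    """Classify one reference gap from the sequence length each assembly places across it.

    span_lengths: one entry per assembly, the number of bases spanning the gap,
    or None if that assembly does not span it.
    tolerance: hypothetical threshold (bp) for treating lengths as "the same amount".
    """
    spanned = [n for n in span_lengths if n is not None]
    if not spanned:
        return "under review"
    if len(spanned) < len(span_lengths):
        return "complex (missing + variation)"
    if max(spanned) - min(spanned) <= tolerance:
        return "missing sequence"
    return "variation"


# Hypothetical spanning lengths for three gaps across 8 assemblies.
gaps = {
    "gap_1": [12_000] * 8,
    "gap_2": [12_000, 15_500, 12_000, 9_800, 12_000, 12_000, 15_500, 12_000],
    "gap_3": [4_000, None, 4_000, None, 4_000, 4_000, None, 4_000],
}
tally = Counter(classify_gap(lengths, tolerance=100) for lengths in gaps.values())
print(tally)
```

In practice the lengths would come from whole-assembly alignments to GRCh38, and gaps in the "complex" bin still need manual review.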
Curation of the reference assembly: missing sequences
Evaluation to distinguish assembly error from variation
Reported genome issues = 195
• Resolved, no change: 94 (variation < 5 kb)
• Patches (added since p1 in 2014):
  • FIX = 22
  • NOVEL = 43
• Pending action: 36 (variation 8, sequence error 17, unknown 11)
• Find the chromosome context for missing sequences
• Add variants (>5 kb) as NOVEL patches
Data sources:
• Eichler lab (Kidd et al. (2010), PMID: 20440878): fosmid sequences from structurally variant regions
• Heng Li (GCA_000786075.2): a set of non-redundant sequences absent from GRCh38 and its ALT loci
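The curation outcomes above follow a rule of thumb: sequence errors become FIX patches, variation larger than 5 kb becomes a NOVEL patch, and smaller variation is resolved without change. The sketch below encodes that rule as a toy function, assuming each issue has already been classified as "error" or "variation"; the enum, the function name, and the handling of exactly 5 kb are assumptions, not GRC policy.

```python
from enum import Enum


class Resolution(Enum):
    NO_CHANGE = "resolved, no change"
    FIX_PATCH = "FIX patch"
    NOVEL_PATCH = "NOVEL patch"
    PENDING = "pending review"


NOVEL_THRESHOLD_BP = 5_000  # the 5 kb cutoff quoted on the slide


def triage(cause: str, size_bp: int) -> Resolution:
    """Toy decision rule for a reported genome issue.

    cause: "error" (reference sequence is wrong), "variation" (reference carries a
    valid allele), or anything else if not yet determined. Treating exactly 5 kb
    as "no change" is an assumption; the slide gives only "< 5 kb" and "> 5 kb".
    """
    if cause == "error":
        return Resolution.FIX_PATCH
    if cause == "variation":
        if size_bp > NOVEL_THRESHOLD_BP:
            return Resolution.NOVEL_PATCH
        return Resolution.NO_CHANGE
    return Resolution.PENDING


for cause, size in [("error", 300), ("variation", 1_200), ("variation", 40_000), ("unknown", 0)]:
    print(cause, size, "->", triage(cause, size).value)
```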
GRCh38.p13 updates to the reference assembly
The most recent curation of GRCh38:
• FIX patches (43) + NOVEL patches (2)
• Added >0.5 Mb of novel sequence
  • Gap closure: 28
  • Sequence error correction: 8
  • Path: 2
  • p-arms of acrocentric chromosomes: 5
• Sequence data sources for updates:
  • CHM1 assembly: 21
  • CHM13 assembly: 12
  • Other WGS assemblies: 3
  • Clones: 9
• Highlights of p13:
  • Improved clinically important genomic regions:
    • Prader-Willi region (5.5 Mb, 1.63 Mb unique)
    • CT47A gene cluster
  • Improved gene representations: SLC5A11, GCNT2, SAMD1, GRCK1, C1R, ECSCR, 5S rRNA
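The p13 breakdowns are internally consistent: the four update types sum to the 43 FIX patches, and the four data-source counts sum to the 45 FIX + NOVEL patches. The check below makes that arithmetic explicit. It assumes, as the totals suggest but the slide does not state, that each patch is counted under exactly one data source; the variable and function names are illustrative.

```python
# Counts transcribed from the GRCh38.p13 slide.
P13_FIX, P13_NOVEL = 43, 2

fix_by_type = {
    "gap closure": 28,
    "sequence error correction": 8,
    "path": 2,
    "acrocentric p-arm": 5,
}
patches_by_source = {
    "CHM1 assembly": 21,
    "CHM13 assembly": 12,
    "other WGS assemblies": 3,
    "clones": 9,
}


def share(counts: dict[str, int], keys: list[str]) -> float:
    """Fraction of the total contributed by the given categories."""
    return sum(counts[k] for k in keys) / sum(counts.values())


assert sum(fix_by_type.values()) == P13_FIX
# Assumption: each FIX or NOVEL patch is counted under exactly one data source.
assert sum(patches_by_source.values()) == P13_FIX + P13_NOVEL

chm_share = share(patches_by_source, ["CHM1 assembly", "CHM13 assembly"])
print(f"CHM1 + CHM13 share of p13 patches: {chm_share:.1%}")
```

Under that assumption, the two CHM assemblies supplied 33 of the 45 p13 patches.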
[Figure: chromosome distribution of GRCh38.p13 patches]
Correction of a false assembly gap caused by haplotype incompatibility
• Mixed haplotype representation of CT47A in GRCh38 (long haplotype: 12 copies)
• CHM1 optical map supporting the updated CT47A haplotype (panels: NW_021160027.1 (patch); CHM1 chr. X)
• Single haplotype representation of CT47A in GRCh38.p13 (short haplotype: 7 copies)
https://www.ncbi.nlm.nih.gov/grc/human/issues
CYP2D6 haplotypes: genomic diversity of a clinically important region
• CYP2D6 is involved in metabolizing many prescribed drugs
• Scaffolds provide alternate sequence representations of the CYP2D6 region
[Figure: alignment of alt loci and patch scaffolds to the CYP2D6 region of chr. 22]
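The alignment figure summarizes which alternate scaffolds cover the CYP2D6 region. Given alignment records of the form (scaffold, kind, chromosome, start, end), listing the scaffolds that overlap a region of interest is an interval-overlap query. A minimal sketch follows; the scaffold names and coordinates are placeholders, not the real CYP2D6 alignments.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    scaffold: str
    kind: str    # "ALT" or "PATCH"
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


def overlapping(alignments: list[Alignment], chrom: str, start: int, end: int) -> list[Alignment]:
    """Return alignments on `chrom` whose span overlaps the closed interval [start, end]."""
    return [
        a for a in alignments
        if a.chrom == chrom and a.start <= end and start <= a.end
    ]


# Placeholder records; real ones would come from the GRC alt-locus/patch alignments.
alignments = [
    Alignment("ALT_A", "ALT", "chr22", 100, 500),
    Alignment("PATCH_B", "PATCH", "chr22", 450, 900),
    Alignment("ALT_C", "ALT", "chr22", 1_000, 1_200),
    Alignment("ALT_D", "ALT", "chr1", 100, 500),
]
for a in overlapping(alignments, "chr22", 400, 600):
    print(a.scaffold, a.kind)
```

A linear scan is fine for a few hundred scaffolds; for genome-wide queries an interval tree or sorted index would scale better.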
Unresolved genome issues: current curation status
[Figure: resolution likelihoods as determined by GRC review (n = 234)]
GRCh38.p14 is planned for release in 2020.
The future is BRIGHT
Conclusion and future
GRCh38.p14: coming in 2020
MGI, a GRC member, has been awarded NHGRI funding to:
• Produce 350 whole-genome phased diploid assemblies;
• Identify SVs between these samples and the current GRCh38;
• Incorporate those SVs into the reference, likely as a graph representation.
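The slide says the SVs will likely be incorporated "as a graph representation" but does not specify a format. As a rough illustration only, the sketch below borrows the node/edge/path idea found in sequence-graph formats such as GFA: sequence segments are nodes, adjacencies are edges, and each haplotype, including the reference, is a walk through the graph. The class, node IDs and sequences are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class VariationGraph:
    """Minimal sequence graph: nodes hold sequence, paths are haplotype walks."""
    nodes: dict[int, str] = field(default_factory=dict)
    edges: set[tuple[int, int]] = field(default_factory=set)
    paths: dict[str, list[int]] = field(default_factory=dict)

    def add_node(self, node_id: int, seq: str) -> None:
        self.nodes[node_id] = seq

    def add_path(self, name: str, node_ids: list[int]) -> None:
        """Record a haplotype walk, adding the edges it uses."""
        missing = [n for n in node_ids if n not in self.nodes]
        if missing:
            raise KeyError(f"unknown nodes: {missing}")
        self.edges.update(zip(node_ids, node_ids[1:]))
        self.paths[name] = list(node_ids)

    def spell(self, name: str) -> str:
        """Concatenate node sequences along a path to recover that haplotype."""
        return "".join(self.nodes[n] for n in self.paths[name])


g = VariationGraph()
g.add_node(1, "ACGTAC")
g.add_node(2, "TTTTTTTT")  # hypothetical SV insertion absent from the reference path
g.add_node(3, "GGCATG")
g.add_path("reference", [1, 3])
g.add_path("sample_A", [1, 2, 3])
print(g.spell("reference"), g.spell("sample_A"))
```

The reference remains one named path, so its coordinates can still be reported while sample haplotypes share the flanking nodes.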
The reference has informed its own evolution.
(Image: Washington Post)
GRCh39 is pending. The GRC is engaged in validating the pan-genome assemblies and in providing curation tools and support for them.

 
Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...
Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...
Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...
 
Discovery of An Apparent Red, High-Velocity Type Ia Supernova at 𝐳 = 2.9 wi...
Discovery of An Apparent Red, High-Velocity Type Ia Supernova at  𝐳 = 2.9  wi...Discovery of An Apparent Red, High-Velocity Type Ia Supernova at  𝐳 = 2.9  wi...
Discovery of An Apparent Red, High-Velocity Type Ia Supernova at 𝐳 = 2.9 wi...
 
Firoozeh Kashani-Sabet - An Esteemed Professor
Firoozeh Kashani-Sabet - An Esteemed ProfessorFiroozeh Kashani-Sabet - An Esteemed Professor
Firoozeh Kashani-Sabet - An Esteemed Professor
 
Holsinger, Bruce W. - Music, body and desire in medieval culture [2001].pdf
Holsinger, Bruce W. - Music, body and desire in medieval culture [2001].pdfHolsinger, Bruce W. - Music, body and desire in medieval culture [2001].pdf
Holsinger, Bruce W. - Music, body and desire in medieval culture [2001].pdf
 
BIRDS DIVERSITY OF SOOTEA BISWANATH ASSAM.ppt.pptx
BIRDS  DIVERSITY OF SOOTEA BISWANATH ASSAM.ppt.pptxBIRDS  DIVERSITY OF SOOTEA BISWANATH ASSAM.ppt.pptx
BIRDS DIVERSITY OF SOOTEA BISWANATH ASSAM.ppt.pptx
 
Reaching the age of Adolescence- Class 8
Reaching the age of Adolescence- Class 8Reaching the age of Adolescence- Class 8
Reaching the age of Adolescence- Class 8
 
Module_1.In autotrophic nutrition ORGANISM
Module_1.In autotrophic nutrition ORGANISMModule_1.In autotrophic nutrition ORGANISM
Module_1.In autotrophic nutrition ORGANISM
 
Compositions of iron-meteorite parent bodies constrainthe structure of the pr...
Compositions of iron-meteorite parent bodies constrainthe structure of the pr...Compositions of iron-meteorite parent bodies constrainthe structure of the pr...
Compositions of iron-meteorite parent bodies constrainthe structure of the pr...
 
Physiology of Nervous System presentation.pptx
Physiology of Nervous System presentation.pptxPhysiology of Nervous System presentation.pptx
Physiology of Nervous System presentation.pptx
 

Advancements in the human genome reference assembly (GRCh38)

  • 1. Advancements in the human genome reference assembly (GRCh38) Tayebeh Rezaie, Ph.D. NCBI 16 October 2019
  • 2. Funding:
    • This work was supported in part by the Intramural Research Program of the National Library of Medicine, National Institutes of Health.
    • The European Molecular Biology Laboratory.
    • The Wellcome Trust, UK.
    • The MGI was supported by National Institutes of Health grants 5U54HG003079, 5U41HG007635 and 5U24HG009081.
    GRC: Valerie Schneider, Kerstin Howe, Tina Graves, Paul Flicek, Tayebeh Rezaie, Nathan Bouk, Hsiu-Chuan Chen, Jo Wood, Joanna Collins, Sarah Pelan, Will Chow, James Torrance, Ying Sims, Derek Albracht, Milinn Kremitzki
    Thanks to many GRC collaborators: https://www.ncbi.nlm.nih.gov/grc/credits/
  • 3. History of the reference assembly
    GRCh38 / reference genome:
    • A critical resource for the basic and clinical research community: a coordinate system, an annotation source, and a basis for discovering disease-associated variants
    • Built from Sanger-sequenced clones from the Human Genome Project, drawn from multiple individuals
    [Figure labels: Individual 1, Individual 2, mosaic haploid, number of ALT loci, African-European]
    • HGP model (2003): each genomic region of a chromosome was represented by one sequence
    • Current model: ALT loci added to represent population genomic diversity. Alt loci capture large, divergent variation in genomic regions (not SNPs or small indels)
    • HGP → GRC: the GRC maintains, improves and updates the reference
  • 4. Reference assembly updates
    • Major (coordinate-changing) release: GRCh38 (Dec 2013)
    • Patches (no coordinate change): GRCh38.p13 (Mar 2019)
    • 113 FIX patches: add >3.88 Mb of novel sequence
    • 72 NOVEL patches: add >1.1 Mb of novel sequence
    • 261 ALT loci: add 3.6 Mb of novel sequence
    • Reference updates released as patches: 185/430 (42%)
    • The new version of the reference should capture ALL the updates to GRCh38.
    • The idea of representing variation in the reference started long ago.
  • 5. Curation of the reference assembly
    • Issue sources: GRC assembly evaluation, reports from collaborators and the community, literature
    • Technology: sequencing, FISH, optical mapping
    • Data resources: sequences generated by the GRC or available in public databases (clones, WGS, PCR products)
    Evaluation of gaps in GRCh38:
    • Gap count = 196 (excluding biological gaps and gaps within WGS scaffolds)
    • Reports of new assemblies that can close reference gaps are used to identify gaps that can be spanned
    [Figure: a gap between GRCh38 clone and WGS components, spanned by a PacBio assembly. Data provided by Tina Graves]
    Alignments of 8 diploid PacBio assemblies to the reference:
    • Spanned with the same amount of sequence: 26 (missing sequence)
    • Spanned with varying amounts of sequence: 3 (variation)
    • Spanned by some but not all assemblies: 24 (complex: missing sequence + variation)
    • The remaining gaps are under review
    https://www.genome.wustl.edu/research/projects/
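The three gap outcomes above amount to a simple decision rule over the assembly alignments. The following is a minimal Python sketch, not GRC tooling. It assumes each of the 8 PacBio assemblies reports either the number of bases it places across a gap or None if it does not span it. The function name `classify_gap`, the label constants and the `tolerance` parameter are all hypothetical.

```python
from typing import Optional, Sequence

# Hypothetical labels mirroring the outcomes listed on the slide.
MISSING = "missing sequence"
VARIATION = "variation"
COMPLEX = "complex (missing + variation)"
NOT_SPANNED = "not spanned"


def classify_gap(spans: Sequence[Optional[int]], tolerance: int = 0) -> str:
    """Classify one reference gap from per-assembly spanning lengths.

    spans[i] is the number of bases assembly i places across the gap,
    or None if assembly i does not span it. `tolerance` (an assumption)
    is how far lengths may differ and still count as "the same amount".
    """
    spanned = [s for s in spans if s is not None]
    if not spanned:
        return NOT_SPANNED      # no assembly spans the gap
    if len(spanned) < len(spans):
        return COMPLEX          # spanned by some, but not all, assemblies
    if max(spanned) - min(spanned) <= tolerance:
        return MISSING          # every assembly adds the same amount of sequence
    return VARIATION            # every assembly spans it, but lengths differ


# Illustrative (made-up) spanning lengths for 8 assemblies.
print(classify_gap([1500] * 8))               # missing sequence
print(classify_gap([1500] * 4 + [900] * 4))   # variation
print(classify_gap([1500] * 5 + [None] * 3))  # complex (missing + variation)
```

In practice, a gap that no assembly spans would join the "under review" set. The tolerance would also need tuning, because independent assemblies rarely agree base-for-base.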
  • 6. Curation of the reference assembly: missing sequences
    Evaluation to distinguish sequence error from variation:
    • Reported genome issues: 195
    • Resolved, no change: 94 (variation < 5 kb)
    • Patches (added starting with p1 in 2014): FIX = 22, NOVEL = 43
    • Pending action: 36 (variation 8, sequence error 17, unknown 11)
    • Find the chromosome context for missing sequences
    • Add variants (>5 kb) as NOVEL patches
    Data sources:
    • Eichler lab (Kidd et al. (2010) PMID: 20440878): structurally variant fosmid sequences
    • Heng Li (GCA_000786075.2): a set of non-redundant sequences absent from GRCh38 and its ALT loci
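The triage on this slide can be read as a rule mapping each reviewed issue to an outcome. The following Python sketch is one reading of it, under stated assumptions. Sequence errors become FIX patches, since FIX patches correct the primary assembly. Variation larger than 5 kb becomes a NOVEL patch. Smaller variation is left unchanged. Anything undetermined stays pending. The names `Action`, `triage` and `NOVEL_MIN_BP` are hypothetical, and treating exactly 5 kb as "no change" is an assumption, since the slide leaves that boundary open. The rule describes an eventual outcome, not a status: the 36 pending issues include errors and variants that have not yet been acted on.

```python
from enum import Enum


class Action(Enum):
    NO_CHANGE = "resolved, no change"
    FIX_PATCH = "FIX patch"
    NOVEL_PATCH = "NOVEL patch"
    PENDING = "pending action"


# From the slide: variation under 5 kb is left as-is; larger variants become NOVEL patches.
NOVEL_MIN_BP = 5_000


def triage(finding: str, size_bp: int) -> Action:
    """Map a reviewed issue to an outcome (hypothetical helper).

    finding: "error" (reference sequence is wrong), "variation" (reference is
    a valid but different haplotype), or anything else (not yet determined).
    """
    if finding == "error":
        return Action.FIX_PATCH
    if finding == "variation":
        return Action.NOVEL_PATCH if size_bp > NOVEL_MIN_BP else Action.NO_CHANGE
    return Action.PENDING


# Tallies from the slide: outcomes of the 195 reported issues.
outcomes = {Action.NO_CHANGE: 94, Action.FIX_PATCH: 22, Action.NOVEL_PATCH: 43, Action.PENDING: 36}

print(triage("variation", 1_200).value)   # resolved, no change
print(triage("variation", 40_000).value)  # NOVEL patch
print(triage("error", 300).value)         # FIX patch
print(sum(outcomes.values()))             # 195
```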
  • 7. GRCh38.p13 updates to the reference assembly
    The most recent curation of GRCh38:
    • FIX patches (43) + NOVEL patches (2)
    • Added >0.5 Mb of novel sequence
    • Gap closure: 28
    • Sequence error correction: 8
    • Path: 2
    • p-arm of acrocentric chromosomes: 5
    Sequence data sources for updates:
    • CHM1 assembly: 21
    • CHM13 assembly: 12
    • Other WGS assemblies: 3
    • Clones: 9
    Highlights of p13:
    • Improved clinically important genomic regions: Prader-Willi (5.5 Mb, 1.63 Mb unique); CT47A gene cluster
    • Improved gene representations: SLC5A11, GCNT2, SAMD1, GRCK1, C1R, ECSCR, 5S rRNA
    [Figure: chromosome distribution of GRCh38.p13 patches]
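The p13 breakdowns are internally consistent. The four update purposes sum to the 43 FIX patches, and the four data sources sum to all 45 patches (43 FIX + 2 NOVEL). The Python check below makes that arithmetic explicit. It assumes the purposes break down the FIX patches only and the sources cover every p13 patch, which is how the numbers add up. The dictionary and function names are hypothetical.

```python
from typing import Tuple

# Counts transcribed from the GRCh38.p13 slide.
FIX_BY_PURPOSE = {
    "gap closure": 28,
    "sequence error correction": 8,
    "path": 2,
    "p-arm of acrocentric chromosomes": 5,
}
NOVEL_PATCHES = 2
PATCHES_BY_SOURCE = {
    "CHM1 assembly": 21,
    "CHM13 assembly": 12,
    "other WGS assemblies": 3,
    "clones": 9,
}


def p13_totals() -> Tuple[int, int, int]:
    """Return (FIX total, all p13 patches, patches attributed to a data source)."""
    fix_total = sum(FIX_BY_PURPOSE.values())
    return fix_total, fix_total + NOVEL_PATCHES, sum(PATCHES_BY_SOURCE.values())


fix_total, all_patches, by_source = p13_totals()
print(fix_total, all_patches, by_source)  # 43 45 45
```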
  • 8. Correction of a false assembly gap caused by haplotype incompatibility
    • Mixed haplotype representation of CT47A in GRCh38; long haplotype: 12 copies
    • Single haplotype representation of CT47A in GRCh38.p13; short haplotype: 7 copies
    [Figure: CHM1 optical map of chr. X supporting the updated CT47A haplotype; patch NW_021160027.1]
    https://www.ncbi.nlm.nih.gov/grc/human/issues
  • 9. CYP2D6 haplotypes: genomic diversity of a clinically important region
    • CYP2D6 is involved in metabolizing many prescribed drugs
    • Scaffolds provide alternate sequence representations of the CYP2D6 region
    [Figure: alignment of alt loci and patch scaffolds to the CYP2D6 region of chr. 22]
    https://www.ncbi.nlm.nih.gov/grc/human/issues
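Each alt locus and patch is a separate scaffold aligned to a region of a primary chromosome. Finding every alternate representation of a region such as CYP2D6 is therefore an interval-overlap query over those placements. The following Python sketch illustrates that query. It assumes a hypothetical `Placement` record with 1-based inclusive coordinates. The scaffold names and coordinates are made up for illustration: they are not real CYP2D6 or GRC placements and do not reflect any NCBI file format.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Placement:
    """Where an alt-locus or patch scaffold aligns on a primary chromosome (hypothetical record)."""
    scaffold: str  # scaffold name
    kind: str      # "ALT", "FIX" or "NOVEL"
    chrom: str     # parent chromosome, e.g. "22"
    start: int     # 1-based inclusive start on the chromosome
    stop: int      # 1-based inclusive end on the chromosome


def overlapping(placements: Iterable[Placement], chrom: str, start: int, stop: int) -> List[Placement]:
    """Return placements on `chrom` that overlap [start, stop], ordered by start."""
    hits = [p for p in placements if p.chrom == chrom and p.start <= stop and start <= p.stop]
    return sorted(hits, key=lambda p: p.start)


# Made-up placements and region bounds, for illustration only.
placements = [
    Placement("alt_scaffold_1", "ALT", "22", 1_000, 5_000),
    Placement("fix_patch_1", "FIX", "22", 4_500, 9_000),
    Placement("novel_patch_1", "NOVEL", "22", 20_000, 25_000),
    Placement("alt_scaffold_2", "ALT", "17", 1_000, 5_000),
]
region_hits = overlapping(placements, "22", 3_000, 6_000)
print([p.scaffold for p in region_hits])  # ['alt_scaffold_1', 'fix_patch_1']
```

A linear scan is enough for a few hundred scaffolds. An interval tree would only matter at much larger scale.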
  • 10. Unresolved genome issues
    [Figure: current curation status; resolution likelihoods as determined by the GRC review, n=234]
    • GRCh38.p14 is planned for release in 2020
  • 11. Conclusion and future: the future is BRIGHT
    • GRCh38.p14: coming in 2020
    • MGI, a GRC member, has been awarded NHGRI funding to:
    • Produce 350 whole-genome, phased diploid assemblies;
    • Identify structural variants (SVs) between samples and the current GRCh38;
    • Incorporate those SVs into the reference, likely as a graph representation.
    • The reference has informed its own evolution. [Image: Washington Post]
    • GRCh39 is pending.
    • The GRC is engaged in validating the pan-genome assemblies and in providing curation tools and support for them.
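The slide does not say how a graph-based reference would be built. A common model, used by graph formats such as GFA, stores sequence segments as nodes, joins them with edges, and spells each haplotype as a path through the graph. In that model, an SV appears as an alternative branch between shared flanks. The following Python sketch is a toy illustration of that general model, not the GRC's design. The class, node names and sequences are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


class SequenceGraph:
    """Toy sequence graph: nodes carry sequence, edges join nodes, paths spell haplotypes."""

    def __init__(self) -> None:
        self.nodes: Dict[str, str] = {}
        self.edges: Set[Tuple[str, str]] = set()

    def add_node(self, node_id: str, seq: str) -> None:
        self.nodes[node_id] = seq

    def add_edge(self, src: str, dst: str) -> None:
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError(f"unknown node in edge {src}->{dst}")
        self.edges.add((src, dst))

    def spell(self, path: List[str]) -> str:
        """Return the sequence of a path, checking that every step follows an edge."""
        for src, dst in zip(path, path[1:]):
            if (src, dst) not in self.edges:
                raise ValueError(f"no edge {src}->{dst}")
        return "".join(self.nodes[n] for n in path)


# Hypothetical region: shared flanks around a short reference allele and a longer SV allele.
g = SequenceGraph()
for node_id, seq in [("flank5", "ACGT"), ("ref", "TT"), ("sv", "TTGACCATT"), ("flank3", "GCA")]:
    g.add_node(node_id, seq)
for src, dst in [("flank5", "ref"), ("ref", "flank3"), ("flank5", "sv"), ("sv", "flank3")]:
    g.add_edge(src, dst)

print(g.spell(["flank5", "ref", "flank3"]))  # ACGTTTGCA
print(g.spell(["flank5", "sv", "flank3"]))   # ACGTTTGACCATTGCA
```

Both haplotypes share the flanking nodes, so coordinates in those segments stay common to every path. That property is one reason graphs are attractive for adding SVs without multiplying near-identical scaffolds.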