GRC/GIAB Workshop: Getting the Most from the Reference Assembly and Reference Materials
Oct 16, 2018: 1-4 pm
The human reference assembly: past, present and future
Valerie Schneider, Ph.D.
NCBI
16 October 2018
https://genomereference.org
Credits
GRCh38 Collaborators
• NCBI RefSeq and gpipe annotation team
• Havana annotators
• Karen Miga
• Karyn Meltz Steinberg
• David Schwartz
• Steve Goldstein
• Mario Caceres
• Giulio Genovese
• Jeff Kidd
• Peter Lansdorp
• Mark Hills
• David Page
• Jim Knight
• Stephan Schuster
• 1000 Genomes
GRC SAB
• Rick Myers
• Granger Sutton
• Evan Eichler
• Jim Kent
• Roderic Guigo
• Jan Korbel
• Liz Worthey
• Matthew Hurles
• Richard Gibbs
• Carol Bult
• Derek Stemple
GRC
Tina Graves-Lindsay
Tayebeh Rezaie
Kerstin Howe
Paul Flicek
Monte Westerfield
Curators
Developers
Deanna Church
Richard Durbin
Laura Clarke
Twitter: @GenomeRef
Announcements: grc-announce@ncbi.nlm.nih.gov
• Past: Reference assembly 101
• Present: Curating GRCh38
• Future: What’s next for the reference?
Outline
The reference is a Sanger-seq’d, clone-based assembly
BAC insert
BAC vector
Shotgun sequence clone
Assemble clone
GAPS
Finish (via PCR)
Minimal Clone Tiling Path
Define consensus from switch points of adjacent clones
Ordering the Path
Fingerprint maps
Genetic linkage maps
Radiation hybrid maps
Reference Assembly 101
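The idea of defining a consensus from switch points of adjacent clones can be sketched in a few lines. This is a minimal illustration, not the GRC's actual tooling: the `TilingPathEntry` type, its field names, and the toy clone sequences are all hypothetical, and real tiling paths also carry orientation and gap information that is omitted here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TilingPathEntry:
    """One clone in a (hypothetical) minimal tiling path."""
    clone_id: str   # illustrative clone name, not a real accession
    sequence: str   # finished clone sequence
    start: int      # 0-based index of the first base this clone contributes
    end: int        # 0-based, exclusive index just past the last contributed base


def build_consensus(path: List[TilingPathEntry]) -> str:
    """Concatenate the slice of each clone between its switch points."""
    pieces = []
    for entry in path:
        if not (0 <= entry.start < entry.end <= len(entry.sequence)):
            raise ValueError(f"bad switch points for {entry.clone_id}")
        pieces.append(entry.sequence[entry.start:entry.end])
    return "".join(pieces)


# Two toy clones that overlap by four bases ("GTAA").
# The switch point falls inside the overlap: cloneA supplies bases 0-7,
# cloneB takes over from its own base 2 onward.
path = [
    TilingPathEntry("cloneA", "ACGTACGTAA", 0, 8),
    TilingPathEntry("cloneB", "GTAACCGGTT", 2, 10),
]
consensus = build_consensus(path)
print(consensus)
```

Because each stretch of the result comes from exactly one clone, a path built from clones of different donors yields the haploid mosaic described on the following slides.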
Today’s reference assembly does not represent:
1. The most common allele/haplotype
2. The longest allele/haplotype
3. The ancestral allele/haplotype
It represents the clone-based sequence available from the HGP
Reference Assembly 101
• Highly contiguous
• High sequence accuracy (finished: <10^-5 error rate)
• Haploid mosaic
The reference is composed of sequences from multiple individuals
Reference Assembly 101
Reference Assembly 101
Gene1 Gene2
Sample
Gene1
Ref
Assembly
Slide Credit: Deanna Church
Reference Assembly 101
Current assembly model:
represent both haplotypes
alt loci scaffold
chromosome
many
Gene1 Gene2
Sample
Gene2
Gene1
chromosome
alt scaffold
Reference
GRCh38 (Dec. 2013)
• 178 regions with alt loci: 2% of chromosome sequence (61.9 Mb)
• 261 Alt Loci: 3.6 Mb novel sequence relative to chromosomes
• Average alt length = 400 kb, max = ~5 Mb
• >150 genes only represented on alt loci
Gene1
Ref
Assembly
Original assembly model:
compress into a consensus
false
gap
chromosome
Sequences from haplotype 1
Sequences from haplotype 2
• Past: Reference assembly 101
• Present: Curating GRCh38
• Future: What’s next for the reference?
Outline
• >1000 reported issues resolved
• Closed gaps
• Targeted base fixes
• Corrected path errors
Genome Research 27(5):849-864 (2017)
• Addition of missing paralogs
• Better representation of variation
• Better annotation substrate
• Modeled centromeres
GRCh38 (Dec 2013)
Curating GRCh38
Curating GRCh38
chromosome
novel patch scaffold
fix patch scaffold
Patch release: No change to chromosome coordinates
Assembly nomenclature: GRCh38.p# (# = patch release number)
GRCh38.p12
• 70 FIX, 70 NOVEL
• Added >2.2 Mb novel sequence
• >20 genes affected
Since ASHG 2017: 113 resolved
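The patch naming scheme lends itself to a small parsing sketch. The code below is illustrative only: the function names are hypothetical, and treating an unpatched name such as `GRCh38` as patch 0 is an assumption made for convenience. It encodes the point from the slide that patch releases do not change chromosome coordinates, so two names with the same major release share a coordinate system.

```python
import re
from typing import NamedTuple


class AssemblyName(NamedTuple):
    major: str   # e.g. "GRCh38"
    patch: int   # e.g. 12; 0 is used here for an unpatched name (assumption)


# Matches names such as "GRCh38" or "GRCh38.p12".
_NAME_PATTERN = re.compile(r"^(GRC[a-z]+\d+)(?:\.p(\d+))?$")


def parse_assembly_name(name: str) -> AssemblyName:
    """Split an assembly name into its major release and patch number."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized assembly name: {name!r}")
    return AssemblyName(match.group(1), int(match.group(2) or 0))


def same_coordinate_system(name_a: str, name_b: str) -> bool:
    """Patch releases keep chromosome coordinates, so compare major releases only."""
    return parse_assembly_name(name_a).major == parse_assembly_name(name_b).major


print(parse_assembly_name("GRCh38.p12"))
print(same_coordinate_system("GRCh38", "GRCh38.p12"))
```

A check like this is mainly useful as a guard before mixing annotation or alignment files produced against different patch levels of the same major release.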
[Figure: Resolution Odds (n=215/385). Bar chart by issue type (Gap, Clone, Variation, Localization, Path, Missing Seq, GRC Housekeeping, Unknown*), split into likely, potential, and unlikely; axis 0 to 70.]
Curating GRCh38
*Unknown: typically a bp discrepancy for which there is currently insufficient info to distinguish clone error vs. variation
Poster 444F (3:00-4:00)
Latest improvements in the human genome reference assembly (GRCh38)
Tayebeh Rezaie
• Past: Reference assembly 101
• Present: Curating GRCh38
• Future: What’s next for the reference?
Outline
• Ideals:
• Provides chromosome context for any common human sequence >500 bp
• Supports unambiguous data interpretation at all clinically relevant loci
• Imparts no systematic error/bias in genome-wide analyses
• Real-World:
• Community interest
• Resources for curation
HGP GRC
What’s next?
Defining “Done”
What’s next?
Initial Falcon Assembly
Collection of 40-50 Falcon Assemblies w/ varied parameters
Select “Best” Assembly: combo of N50/length
Error Correction Quiver/Pilon
Identify chimeric contigs from BioNano alignment
Submit to GenBank
What’s next?
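The "select best assembly" step in the pipeline above depends on contig N50 and total length. The sketch below shows the standard N50 calculation and one possible way to rank candidate assemblies. The ranking rule (N50 first, total length as tie-breaker) is an assumption, since the slide only says "combo of N50/length", and the candidate names and contig lengths are made up for illustration.

```python
from typing import Dict, List


def n50(contig_lengths: List[int]) -> int:
    """Return the length L such that contigs of length >= L hold at least half the bases."""
    if not contig_lengths:
        raise ValueError("no contigs given")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


def pick_best(assemblies: Dict[str, List[int]]) -> str:
    """Rank candidates by (N50, total length); this rule is hypothetical."""
    return max(assemblies, key=lambda name: (n50(assemblies[name]), sum(assemblies[name])))


# Toy candidates standing in for assemblies built with varied parameters.
candidates = {
    "run_a": [80, 70, 50, 40, 30, 20],
    "run_b": [100, 100, 30, 30, 30],
}
for name, lengths in candidates.items():
    print(name, "N50 =", n50(lengths), "total =", sum(lengths))
print("best:", pick_best(candidates))
```

Both toy candidates total 290 bases, so the difference in N50 (70 versus 100) is what separates them under this rule.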
Data Source | Origin | Assembly Accession | Status
CHM1 | NA (haploid) | GCA_001297185.2 | Contig Assembly Submitted
CHM13 | NA (haploid) | GCA_002884485.1 | Contig Assembly Submitted
NA19240 | Yoruban | GCA_001524155.4 | Chr-level Assembly Submitted
HG00514 | Han Chinese | GCA_002180035.2 | Chr-level Assembly Submitted
NA12878 | European | GCA_002077035.3 | Chr-level Assembly Submitted
HG00733 | Puerto Rican | GCA_002208065.1 | Contig Assembly Submitted
HG01352 | Colombian | GCA_002209525.1 | Contig Assembly Submitted
NA19434 | Luhya | GCA_002872155.1 | Contig Assembly Submitted
HG02059 | Kinh-Vietnamese | GCA_003070785.1 | Contig Assembly Submitted
HG03486 | Mende | GCA_003086635.1 | Contig Assembly Submitted
HG02818 | Gambian | GCA_003574075.1 | Contig Assembly Submitted
HG03807 | Bengali | GCA_003601015.1 | Contig Assembly Submitted
HG04217 | Telugu | | Assembly Assessment
HG02106 | Peruvian | | Assembly Assessment
HG00268 | Finnish | | Assembly Assessment
NA19836 | African American | | Assembly Underway
HG03125 | Esan | | Data Generation Underway
What’s next?
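The assembly accessions in the table above follow the accession.version pattern, where the number after the dot increases each time an assembly is updated. A minimal parsing sketch, assuming the usual layout of a `GCA` (GenBank) or `GCF` (RefSeq) prefix, nine digits, and a version suffix; the function name is hypothetical.

```python
import re
from typing import Tuple

# Prefix, nine-digit identifier, and version, e.g. "GCA_001297185.2".
_ACCESSION_PATTERN = re.compile(r"^(GC[AF])_(\d{9})\.(\d+)$")


def parse_assembly_accession(accession: str) -> Tuple[str, str, int]:
    """Return (prefix, identifier, version); the identifier keeps its leading zeros."""
    match = _ACCESSION_PATTERN.match(accession)
    if match is None:
        raise ValueError(f"not an assembly accession: {accession!r}")
    prefix, identifier, version = match.groups()
    return prefix, identifier, int(version)


prefix, identifier, version = parse_assembly_accession("GCA_001297185.2")
print(prefix, identifier, version)
```

Keeping the identifier as a string avoids losing the leading zeros, and comparing the integer version is enough to tell whether one record supersedes another for the same assembly.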
Sample | Population | Ungapped Size | # Contigs | Contig N50 | Sequencer
NA19240 | Yoruban | 2.87 Gb | 2521 | 29.1 Mb | RSII
HG00733 | Puerto Rican | 2.88 Gb | 3580 | 22.2 Mb | RSII
NA12878 | European | 2.85 Gb | 3220 | 16.8 Mb | RSII
HG01352 | Colombian | 2.88 Gb | 3120 | 22.8 Mb | RSII
HG00514 | Han Chinese | 2.87 Gb | 3190 | 25.3 Mb | RSII
NA19434 | Luhya | 2.86 Gb | 3123 | 21.5 Mb | RSII
HG02059 | Kinh-Vietnamese | 2.90 Gb | 3180 | 25.3 Mb | RSII
HG02818 | Gambian | 2.88 Gb | 3267 | 22.5 Mb | RSII
HG03486 | Mende | 2.87 Gb | 3465 | 5.3 Mb* | Sequel
HG03807 | Bengali | 2.86 Gb | 3103 | 8.4 Mb** | Sequel + RSII
Poster 442W (3:00-4:00)
New methods for discovery and interpretation of allelic diversity in human genomes
Bob Fulton
What’s next?
GRC curation challenge: which insertion(s) to represent?
Indel polymorphism at GRCh38 gap
What’s next?
GAP
Optical map confirmation of WGS contigs
Indel region
Indel region
Indel region
• Add representation for acrocentric chromosome short-arm sequences (McStay)
• Improved centromere representations (Miga)
• New clone paths for immune regions (improve existing paths and add diversity) (Watson)
• Community outreach
–Workshops
–Website: Help Desk/FAQs
• Your Data?
What’s next?
[Figure: Growth of accessioned (full) human genome assemblies in NCBI Assembly database, 2003 to 2018 (axis 0 to 100; n=91). Annotation: GRCh38 released. For updated assemblies, only date of initial submission is counted.]
GRCh39?
• Remain committed to mission to provide the best representation of the human genome to meet basic and clinical research needs
• Make GRCh38 updates publicly available at regular intervals in the form of patch releases
• Indefinitely postpone GRCh39 while evaluating new models and sequence content for the human reference assembly currently in development
What’s next?
MGI Assemblies Acknowledgements
The McDonnell Genome Institute at
Washington University in St. Louis
Susan Dutcher
Bob Fulton
Wes Warren
Ira Hall
Karyn Meltz Steinberg
Derek Albracht
Milinn Kremitzki
Susan Rock
Chad Tomlinson
Patrick Minx
Chris Markovic
Eddie Belter
Lee Trani
Sara Kohlberg
University of Washington
Evan Eichler
NCBI
Valerie Schneider
BioNano Genomics
Alex Hastie
Pacific Biosciences
Nick Sisneros
Sarah Kingan
Luke Hickey
Greg Concepcion
UCSF
Pui-Yan Kwok
Yvonne Lai
Chin Lin
Catherine Chu
10X Genomics
Deanna Church
Nationwide Children’s Hospital
Richard Wilson
Vince Magrini
Sean McGrath
UCSC
Ed Green

Schneider grc workshop_final

  • 1. GRC/GIAB Workshop: Getting the Most from the Reference Assembly and Reference Materials Oct 16, 2018: 1-4 pm
  • 2. The human reference assembly: past, present and future Valerie Schneider, Ph.D. NCBI 16 October 2018 https://genomereference.org
  • 3. Credits GRCh38 Collaborators • NCBI RefSeq and gpipe annotation team • Havana annotators • Karen Miga • Karyn Meltz Steinberg • David Schwartz • Steve Goldstein • Mario Caceres • Giulio Genovese • Jeff Kidd • Peter Lansdorp • Mark Hills • David Page • Jim Knight • Stephan Schuster • 1000 Genomes GRC SAB • Rick Myers • Granger Sutton • Evan Eichler • Jim Kent • Roderic Guigo • Jan Korbel • Liz Worthey • Matthew Hurles • Richard Gibbs • Carol Bult • Derek Stemple GRC Tina Graves-Lindsay Tayebeh Rezaie Kerstin Howe Paul Flicek Monte Westerfield Curators Developers Deanna Church Richard Durbin Laura Clarke Twitter: @GenomeRef Announcements: grc-announce@ncbi.nlm.nih.gov
  • 4. • Past: Reference assembly 101 • Present: Curating GRCh38 • Future: What’s next for the reference? Outline
  • 5. The reference is a Sanger-seq’d, clone-based assembly BAC insert BAC vector Shotgun sequence clone Assemble clone GAPS Finish (via PCR) Minimal Clone Tiling Path Define consensus from switch points of adjacent clones Ordering the Path Fingerprint maps Genetic linkage maps Radiation hybrid maps Reference Assembly 101
  • 6. Today’s reference assembly does not represent: 1.The most common allele/haplotype 2.The longest allele/haplotype 3.The ancestral allele/haplotype It represents the clone-based sequence available from the HGP Reference Assembly 101 • Highly contiguous • High sequence accuracy (finished: <10-5) • Haploid mosaic
  • 7. The reference is comprised of sequences from multiple individuals Reference Assembly 101
  • 8. Reference Assembly 101 Gene1 Gene2 Sample Gene1 Ref Assembly Slide Credit: Deanna Church
  • 9. Reference Assembly 101 Current assembly model: represent both haplotypes alt loci scaffold chromosomemany Gene1 Gene2 Sample Gene2 Gene1 chromosome alt scaffold Reference GRCh38 (Dec. 2013) • 178 regions with alt loci: 2% of chromosome sequence (61.9 Mb) • 261 Alt Loci: 3.6 Mb novel sequence relative to chromosomes • Average alt length = 400 kb, max = ~5 Mb • >150 genes only represented on alt loci Gene1 Ref Assembly Original assembly model: compress into a consensus false gap chromosome Sequences from haplotype 1 Sequences from haplotype 2
  • 10. Outline
    • Past: Reference assembly 101
    • Present: Curating GRCh38
    • Future: What’s next for the reference?
  • 11. Curating GRCh38
    GRCh38 (Dec 2013); Genome Research 27(5):849-864 (2017)
    • >1000 reported issues resolved
    • Closed gaps
    • Targeted base fixes
    • Corrected path errors
    • Addition of missing paralogs
    • Better representation of variation
    • Better annotation substrate
    • Modeled centromeres
  • 12. Curating GRCh38
    [Figure labels: chromosome, novel patch scaffold, fix patch scaffold]
    Patch release: No change to chromosome coordinates
    Assembly nomenclature: GRCh38.p$
    GRCh38.p12
    • 70 FIX, 70 NOVEL
    • Added >2.2 Mb novel sequence
    • >20 genes affected
    Since ASHG 2017: 113 resolved
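As a rough illustration of the patch nomenclature on this slide, the sketch below splits an assembly name such as GRCh38.p12 into its major version and patch number, then checks whether two names share a major release. It relies only on the slide's statement that patch releases do not change chromosome coordinates. The function names and the regular expression are assumptions made for illustration, not part of any NCBI or GRC API.

```python
import re

# Hypothetical pattern: "GRC" + organism letter(s) + major version + optional ".p<N>"
_ASSEMBLY_NAME = re.compile(r"^(GRC[a-z]+)(\d+)(?:\.p(\d+))?$")


def parse_assembly_name(name):
    """Split a name like 'GRCh38.p12' into prefix, major version and patch number."""
    match = _ASSEMBLY_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized assembly name: {name!r}")
    prefix, major, patch = match.groups()
    return {
        "prefix": prefix,
        "major": int(major),
        # A name without a ".p" suffix is treated as the initial release (patch 0).
        "patch": int(patch) if patch is not None else 0,
    }


def share_chromosome_coordinates(name_a, name_b):
    """Patch releases keep chromosome coordinates, so only prefix and major must match."""
    a, b = parse_assembly_name(name_a), parse_assembly_name(name_b)
    return (a["prefix"], a["major"]) == (b["prefix"], b["major"])


print(parse_assembly_name("GRCh38.p12"))
print(share_chromosome_coordinates("GRCh38", "GRCh38.p12"))
```

A check like this only covers chromosome coordinates. Fix and novel patch scaffolds are separate sequences that differ between patch releases.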
  • 13. Curating GRCh38
    [Bar chart: Resolution Odds (n=215/385) by issue type (Gap, Clone, Variation, Localization, Path, Missing Seq, GRC Housekeeping, Unknown*); series: likely, potential, unlikely; axis 0-70]
    *Unknown: typically bp discrepancy for which there is currently insufficient info to distinguish clone error vs. variation
    Poster 444F (3:00-4:00): Latest improvements in the human genome reference assembly (GRCh38), Tayebeh Rezaie
  • 14. Outline
    • Past: Reference assembly 101
    • Present: Curating GRCh38
    • Future: What’s next for the reference?
  • 15. What’s next? Defining “Done”
    • Ideals:
      • Provides chromosome context for any common human sequence >500 bp
      • Supports unambiguous data interpretation at all clinically relevant loci
      • Imparts no systematic error/bias in genome-wide analyses
    • Real-World:
      • Community interest
      • Resources for curation
    [Figure labels: HGP, GRC]
  • 17. What’s next?
    1. Initial Falcon Assembly
    2. Collection of 40-50 Falcon Assemblies w/ varied parameters
    3. Select “Best” Assembly: combo of N50/length
    4. Error Correction: Quiver/Pilon
    5. Identify chimeric contigs from BioNano alignment
    6. Submit to GenBank
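The slide does not say how N50 and length are combined when the “best” assembly is selected. The sketch below shows one possible rule, purely as an assumption: prefer the highest contig N50 and break ties by total assembled length. The candidate names and numbers are made up for illustration.

```python
# Hypothetical summary statistics for a few candidate assemblies
# produced with different parameter sets (values are invented).
candidates = [
    {"name": "params_01", "n50": 18_500_000, "total_length": 2_850_000_000},
    {"name": "params_02", "n50": 22_200_000, "total_length": 2_870_000_000},
    {"name": "params_03", "n50": 22_200_000, "total_length": 2_880_000_000},
]


def pick_best(assemblies):
    """Assumed rule: highest N50 wins; ties are broken by total length."""
    if not assemblies:
        raise ValueError("no candidate assemblies supplied")
    return max(assemblies, key=lambda a: (a["n50"], a["total_length"]))


best = pick_best(candidates)
print(best["name"])
```

In practice the choice could weigh other evidence as well, such as agreement with optical maps. The tuple key here is only the simplest way to express a combination of the two metrics.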
  • 18. What’s next?
    Data Source | Origin | Assembly Accession | Status
    CHM1 | NA (haploid) | GCA_001297185.2 | Contig Assembly Submitted
    CHM13 | NA (haploid) | GCA_002884485.1 | Contig Assembly Submitted
    NA19240 | Yoruban | GCA_001524155.4 | Chr-level Assembly Submitted
    HG00514 | Han Chinese | GCA_002180035.2 | Chr-level Assembly Submitted
    NA12878 | European | GCA_002077035.3 | Chr-level Assembly Submitted
    HG00733 | Puerto Rican | GCA_002208065.1 | Contig Assembly Submitted
    HG01352 | Colombian | GCA_002209525.1 | Contig Assembly Submitted
    NA19434 | Luhya | GCA_002872155.1 | Contig Assembly Submitted
    HG02059 | Kinh-Vietnamese | GCA_003070785.1 | Contig Assembly Submitted
    HG03486 | Mende | GCA_003086635.1 | Contig Assembly Submitted
    HG02818 | Gambian | GCA_003574075.1 | Contig Assembly Submitted
    HG03807 | Bengali | GCA_003601015.1 | Contig Assembly Submitted
    HG04217 | Telugu | - | Assembly Assessment
    HG02106 | Peruvian | - | Assembly Assessment
    HG00268 | Finnish | - | Assembly Assessment
    NA19836 | African American | - | Assembly Underway
    HG03125 | Esan | - | Data Generation Underway
  • 19. What’s next?
    Sample | Population | Ungapped Size | # Contigs | Contig N50 | Sequencer
    NA19240 | Yoruban | 2.87 Gb | 2521 | 29.1 Mb | RSII
    HG00733 | Puerto Rican | 2.88 Gb | 3580 | 22.2 Mb | RSII
    NA12878 | European | 2.85 Gb | 3220 | 16.8 Mb | RSII
    HG01352 | Colombian | 2.88 Gb | 3120 | 22.8 Mb | RSII
    HG00514 | Han Chinese | 2.87 Gb | 3190 | 25.3 Mb | RSII
    NA19434 | Luhya | 2.86 Gb | 3123 | 21.5 Mb | RSII
    HG02059 | Kinh-Vietnamese | 2.90 Gb | 3180 | 25.3 Mb | RSII
    HG02818 | Gambian | 2.88 Gb | 3267 | 22.5 Mb | RSII
    HG03486 | Mende | 2.87 Gb | 3465 | 5.3 Mb* | Sequel
    HG003087 | Bengali | 2.86 Gb | 3103 | 8.4 Mb** | Sequel + RSII
    Poster 442W (3:00-4:00): New methods for discovery and interpretation of allelic diversity in human genomes, Bob Fulton
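For readers unfamiliar with the Contig N50 column, here is a minimal sketch of the standard definition: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembled length. The contig lengths in the example are toy values, not data from the assemblies listed above.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


toy_contigs = [2, 3, 4, 5, 6]  # total = 20, half = 10
print(n50(toy_contigs))        # 6 + 5 = 11 reaches the halfway mark, so N50 is 5
```

A higher N50 at a similar ungapped size means the same amount of sequence is held in fewer, longer contigs.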
  • 20. What’s next?
    GRC curation challenge: which insertion(s) to represent?
    [Figure: Indel polymorphism at GRCh38 gap; optical map confirmation of WGS contigs; labels: GAP, Indel region (x3)]
  • 21. What’s next?
    • Add representation for acrocentric chromosome short-arm sequences (McStay)
    • Improved centromere representations (Miga)
    • New clone paths for immune regions (improve existing paths and add diversity) (Watson)
    • Community outreach
      – Workshops
      – Website: Help Desk/FAQs
    • Your Data?
    [Chart: Growth of accessioned (full) human genome assemblies in NCBI Assembly database, 2003-2018, y-axis 0-100; annotations: GRCh38 released, n=91. For updated assemblies, only date of initial submission is counted]
  • 22. What’s next? GRCh39?
    • Remain committed to mission to provide the best representation of the human genome to meet basic and clinical research needs
    • Make GRCh38 updates publicly available at regular intervals in the form of patch releases
    • Indefinitely postpone GRCh39 while evaluating new models and sequence content for the human reference assembly currently in development
  • 23. MGI Assemblies Acknowledgements
    The McDonnell Genome Institute at Washington University in St. Louis: Susan Dutcher, Bob Fulton, Wes Warren, Ira Hall, Karyn Meltz Steinberg, Derek Albracht, Milinn Kremitzki, Susan Rock, Chad Tomlinson, Patrick Minx, Chris Markovic, Eddie Belter, Lee Trani, Sara Kohlberg
    University of Washington: Evan Eichler
    NCBI: Valerie Schneider
    BioNano Genomics: Alex Hastie
    Pacific Biosciences: Nick Sisneros, Sarah Kingan, Luke Hickey, Greg Concepcion
    UCSF: Pui-Yan Kwok, Yvonne Lai, Chin Lin, Catherine Chu
    10X Genomics: Deanna Church
    Nationwide Children’s Hospital: Richard Wilson, Vince Magrini, Sean McGrath
    UCSC: Ed Green