Genome in a Bottle: Reference Materials to Benchmark Challenging Variants and Regions of the Human Genome
Justin Zook, on behalf of the Genome in a Bottle Consortium
National Institute of Standards and Technology (NIST)
Human Genomics Team
Sept 30, 2021
Motivation for Genome in a Bottle: Sequencing and analysis methods can give different answers, particularly in challenging, repetitive regions
O’Rawe et al, Genome Medicine, 2013
https://doi.org/10.1186/gm432
GIAB has characterized variants in 7 human genomes
National Institute of Standards & Technology
Report of Investigation
Reference Material 8391
Human DNA for Whole-Genome Variant Assessment
(Son of Eastern European Ashkenazim Jewish Ancestry)
This Reference Material (RM) is intended for validation, optimization, and process evaluation purposes. It consists
of a male whole human genome sample of Eastern European Ashkenazim Jewish ancestry, and it can be used to assess
performance of variant calling from genome sequencing. A unit of RM 8391 consists of a vial containing human
genomic DNA extracted from a single large growth of human lymphoblastoid cell line GM24385 from the Coriell
Institute for Medical Research (Camden, NJ). The vial contains approximately 10 µg of genomic DNA, with the peak
of the nominal length distribution longer than 48.5 kb, as referenced by Lambda DNA, and the DNA is in TE buffer
(10 mM TRIS, 1 mM EDTA, pH 8.0).
This material is intended for assessing performance of human genome sequencing variant calling by obtaining
estimates of true positives, false positives, true negatives, and false negatives. Sequencing applications could include
whole genome sequencing, whole exome sequencing, and more targeted sequencing such as gene panels. This
genomic DNA is intended to be analyzed in the same way as any other extracted DNA sample that a lab would process
and analyze. Because the RM is extracted DNA, it is not useful for assessing pre-analytical steps such as DNA
extraction, but it does challenge sequencing library preparation, sequencing machines, and the bioinformatics steps of
mapping, alignment, and variant calling. This RM is not intended to assess subsequent bioinformatics steps such as
functional or clinical interpretation.
Information Values: Information values are provided for single nucleotide polymorphisms (SNPs), small insertions
and deletions (indels), and homozygous reference genotypes for approximately 88 % of the genome, using methods
similar to those described in reference 1. An information value is considered to be a value that will be of interest and use to
the RM user, but insufficient information is available to assess the uncertainty associated with the value. We describe
and disseminate our best, most confident, estimate of the genotypes using the data and methods currently available.
These data and genomic characterizations will be maintained over time as new data accrue and measurement and
informatics methods become available. The information values are given as a variant call file (vcf) that contains the
high-confidence SNPs and small indels, as well as a tab-delimited “bed” file that describes the regions that are called
high-confidence. Information values cannot be used to establish metrological traceability. The files referenced in this
report are available at the Genome in a Bottle ftp site hosted by the National Center for Biotechnology Information
(NCBI). The Genome in a Bottle ftp site for the high-confidence vcf and high confidence regions is:
Pilot Genome: HG001 (NA12878)
AJ Trio: HG002, HG003, HG004
Chinese Trio: HG005, HG006, HG007
GIAB “Open Science” Virtuous Cycle
Users analyze GIAB Samples
Benchmark vs. GIAB data
Critical feedback to GIAB
Integrate new methods
New benchmark data
Method development, optimization, and demonstration
Part of assay validation
GIAB/NIST expands to more difficult regions
Design of our human genome reference values
● Reference Values* consist of Benchmark Variant Calls and Benchmark Regions
● Benchmark Regions: regions in which the benchmark contains (almost) all the variants
● Variants from any method being evaluated are compared with the Benchmark Variant Calls
● Variants outside benchmark regions are not assessed
● Matching variants assumed to be true positives
● Majority of variants unique to method should be false positives (FPs)
● Majority of variants unique to benchmark should be false negatives (FNs)
*Currently no quality or confidence scores associated with our reference values
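The comparison logic on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: variants are represented as hypothetical `(chrom, pos, ref, alt)` tuples, benchmark regions are non-overlapping BED-style intervals (0-based, half-open), and matching is by exact tuple equality. Real benchmarking tools, such as those from the GA4GH Benchmarking Team linked at the end of the deck, also reconcile different representations of the same variant, which this sketch does not attempt.

```python
from bisect import bisect_right


def build_index(bed_intervals):
    """Group (chrom, start, end) BED intervals by chromosome, sorted by start.

    Assumes the intervals on each chromosome do not overlap.
    """
    index = {}
    for chrom, start, end in bed_intervals:
        index.setdefault(chrom, []).append((start, end))
    for intervals in index.values():
        intervals.sort()
    return index


def in_regions(index, chrom, pos):
    """True if a 1-based VCF position lies inside a 0-based, half-open BED interval."""
    intervals = index.get(chrom, [])
    zero_based = pos - 1
    # Find the last interval whose start is <= the position.
    i = bisect_right(intervals, (zero_based, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= zero_based < intervals[i][1]


def compare(benchmark_calls, query_calls, bed_intervals):
    """Count TP, FP and FN inside the benchmark regions (exact matching only)."""
    index = build_index(bed_intervals)
    bench = {v for v in benchmark_calls if in_regions(index, v[0], v[1])}
    query = {v for v in query_calls if in_regions(index, v[0], v[1])}
    tp = len(bench & query)   # matching variants, assumed true positives
    fp = len(query - bench)   # unique to the method being evaluated
    fn = len(bench - query)   # unique to the benchmark
    return {
        "TP": tp,
        "FP": fp,
        "FN": fn,
        "precision": tp / (tp + fp) if tp + fp else None,
        "recall": tp / (tp + fn) if tp + fn else None,
    }


# Toy data (made up for illustration): one benchmark region on chr1.
regions = [("chr1", 100, 200)]
benchmark = {("chr1", 150, "A", "G"), ("chr1", 180, "C", "T"), ("chr1", 500, "G", "A")}
query = {("chr1", 150, "A", "G"), ("chr1", 190, "T", "C"), ("chr1", 600, "G", "C")}
result = compare(benchmark, query, regions)
```

In the toy data, the calls at positions 500 and 600 fall outside the benchmark region, so they are not assessed at all rather than being counted as errors.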
In 2019, GIAB and GA4GH Published Resources for “Easier” Small Variants
First Structural Variant Benchmark Published
https://doi.org/10.1038/s41436-021-01187-w
v4.2.1 Small Variant Benchmark used Long and Linked Reads
Reference Build   Benchmark Set   Reference Coverage (%)   SNVs        Indels    Base pairs in Seg Dups and low mappability
GRCh37            v3.3.2          87.8                     3,048,869   464,463   57,277,670
GRCh37            v4.2.1          94.1                     3,353,881   522,388   133,848,288
GRCh38            v3.3.2          85.4                     3,030,495   475,332   65,714,199
GRCh38            v4.2.1          92.2                     3,367,208   525,545   145,585,710
Wagner et al, https://doi.org/10.1101/2020.07.24.2127
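To make the size of the v3.3.2 to v4.2.1 change concrete, the table rows can be compared directly. The short Python sketch below only restates the numbers from the table; the dictionary layout and the `change` helper are illustrative names, not part of any GIAB tool, and the coverage column is assumed to be a percentage of the reference.

```python
# (coverage %, SNVs, indels, bp in seg dups and low mappability), copied from the table.
rows = {
    ("GRCh37", "v3.3.2"): (87.8, 3_048_869, 464_463, 57_277_670),
    ("GRCh37", "v4.2.1"): (94.1, 3_353_881, 522_388, 133_848_288),
    ("GRCh38", "v3.3.2"): (85.4, 3_030_495, 475_332, 65_714_199),
    ("GRCh38", "v4.2.1"): (92.2, 3_367_208, 525_545, 145_585_710),
}


def change(build):
    """Summarize how v4.2.1 differs from v3.3.2 for one reference build."""
    old = rows[(build, "v3.3.2")]
    new = rows[(build, "v4.2.1")]
    return {
        "coverage_gain_points": round(new[0] - old[0], 1),
        "snv_gain": new[1] - old[1],
        "indel_gain": new[2] - old[2],
        "segdup_lowmap_fold": new[3] / old[3],
    }


for build in ("GRCh37", "GRCh38"):
    print(build, change(build))
```

On both builds the bases covered in segmental duplications and low-mappability regions more than double, which is where the long and linked reads contribute most.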
New benchmark includes challenging genes like PMS2
Segmental duplications
Collaborating with FDA to use GIAB benchmark to inspire new methods
https://precision.fda.gov/challenges/10
The best-performing submissions were from new sequencing technologies and bioinformatics methods
Olson et al, https://doi.org/10.1101/2020.11.13.380
Expanding the benchmark was important to demonstrate improved technologies and analysis methods for difficult genome regions
Olson et al, https://doi.org/10.1101/2020.11.13.380
Figure panels: INDELs, SNVs
Stratification helps understand strengths of each technology/method
Olson et al, https://doi.org/10.1101/2020.11.13.380
Shortcomings in Medical Genes for v4.2.1 benchmark
● Mandelker et al. in 2016 created a list of medical genes with at least one exon that is difficult to map with short reads
● v4.2.1 improved coverage of these genes but many are still not fully covered
Why Create a Medical Gene Benchmark for Genome in a Bottle?
● HG002 v4.2.1 benchmark still excludes >10% of 395 medically relevant genes on chromosomes 1-22 on GRCh37 or GRCh38 due to structural variants, large segmental duplications, or other difficult regions
● Advances in diploid assembly enabled us to develop phased small variant and structural variant benchmarks in 273 of these 395 genes on both GRCh37 and GRCh38 for HG002
Wagner et al, https://doi.org/10.1101/2021.06.07.444885
Justin Wagner
Jason Chin
Fritz Sedlazeck
GIAB CMRG Team
Generating a Challenging Medical Gene Benchmark
Trio-based diploid assembly
Diploid Assembly Using PacBio HiFi reads
● Trio-hifiasm
○ Illumina reads for parents and PacBio HiFi reads for HG002
○ Best performance in Human Pangenome Reference Consortium diploid assembly bakeoff
● Called variants with dipcall
○ Outputs variant calls and confident regions
○ Confident regions: covered by exactly one contig from each haplotype
https://github.com/lh3/dipcall
https://doi.org/10.1038/s41592-020-01056-5
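The rule "covered by exactly one contig from each haplotype" can be illustrated with plain interval arithmetic. The sketch below is not dipcall's implementation: it assumes that contig alignments for each haplotype on one chromosome have already been reduced to hypothetical `(start, end)` half-open intervals on the reference, and it ignores any additional filtering the real tool may apply.

```python
def depth_one(intervals):
    """Return sorted half-open intervals covered by exactly one input interval."""
    events = []
    for start, end in intervals:
        events.append((start, 1))    # coverage goes up at the start
        events.append((end, -1))     # and down at the end
    events.sort()

    out, depth, prev = [], 0, None
    for pos, delta in events:
        # The segment [prev, pos) has the depth accumulated so far.
        if prev is not None and pos > prev and depth == 1:
            if out and out[-1][1] == prev:
                out[-1] = (out[-1][0], pos)   # extend an adjacent segment
            else:
                out.append((prev, pos))
        depth += delta
        prev = pos
    return out


def intersect(a, b):
    """Intersect two sorted lists of non-overlapping half-open intervals."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def confident_regions(hap1_contigs, hap2_contigs):
    """Regions covered by exactly one contig in haplotype 1 AND in haplotype 2."""
    return intersect(depth_one(hap1_contigs), depth_one(hap2_contigs))


# Toy alignments: haplotype 1 has two overlapping contigs, haplotype 2 has one.
hap1 = [(0, 100), (80, 200)]
hap2 = [(50, 150)]
regions = confident_regions(hap1, hap2)
```

In the toy example the stretch from 80 to 100 is dropped because two haplotype-1 contigs overlap there, and everything outside 50 to 150 is dropped because haplotype 2 has no contig.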
New benchmark includes 273 challenging genes
● Curated each gene for accurate resolution by assembly in IGV
● Manually curated >1000 variant discrepancies and excluded errors in benchmark
● Most errors in homopolymers and/or highly homozygous regions
The new CMRG small variant benchmark includes more challenging variants and identifies more false negatives
Highlighting Genes in the New Benchmark – SMN1
GRCh37 and GRCh38 contain different false duplications
• GRCh38 has an extra copy of some medically relevant genes like CBS, KCNE1, and CRYAA, causing mis-mapped reads
https://gnomad.broadinstitute.org/gene/ENSG00000160200?dataset=gnomad_r2_1
gnomAD coverage of CBS on GRCh38 decreases for genome sequencing due to mapping ambiguity
gnomAD coverage of CBS on GRCh37 is generally normal for genome (green) and exome (blue) samples
False duplications on GRCh38 can be fixed by masking
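As a rough illustration of what masking means here, the falsely duplicated copy can be replaced with `N` bases so that aligners no longer have a second place to put reads from the true copy. The Python sketch below works on an in-memory sequence string with made-up coordinates; the function name and the interval convention (0-based, half-open) are assumptions for illustration, and a real workflow would operate on the reference FASTA with the published false-duplication coordinates.

```python
def mask_sequence(seq, intervals):
    """Return a copy of seq with each 0-based, half-open interval replaced by 'N'."""
    chars = list(seq)
    for start, end in intervals:
        # Clamp to the sequence bounds so out-of-range intervals are harmless.
        start = max(0, start)
        end = min(len(chars), end)
        if start < end:
            chars[start:end] = "N" * (end - start)
    return "".join(chars)


# Toy example: mask positions 2, 3 and 4 of a 10-base sequence.
masked = mask_sequence("ACGTACGTAC", [(2, 5)])
print(masked)
```

Masking keeps every other coordinate in the reference unchanged, so existing annotations on the unmasked copy still apply.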
T2T identified and fixed additional false duplications
● 12 regions affecting ~1.2 Mbp and 74 genes (including 22 protein coding genes)
● Most medically relevant genes included in 11 pairs of genes in 5 large duplicated regions on chr21
https://doi.org/10.1101/2021.07.12.452063
Genes found to be falsely duplicated in CMRG and T2T work
T2T also identified collapsed duplications in GRCh38
● 203 regions affecting ~8 Mbp and 308 genes (including 48 protein coding genes)
● Includes several medically-relevant genes:
○ KCNJ18/KCNJ12
○ KMT2C
○ MAP2K3
https://doi.org/10.1101/2021.07.12.452063
Which medical genes are still not >90% included in the benchmark?
● 110 on GRCh37 and 100 on GRCh38 + all genes on chrX/chrY
Progressively categorizing all 100 on GRCh38:
● 20 affected by gaps in the reference
● 38 had evidence of duplications in HG002 relative to GRCh38
○ Collapsed duplications in GRCh38 (e.g., KCNJ18)
○ Population copy number variability (e.g., LPA, KIR)
● 2 resolved on GRCh38 but not GRCh37
● 18 were >90% included by the dip.bed but had multiple contigs or a break in the assembly-assembly alignment
● 7 have a large deletion of part or all of the gene on one haplotype
● 4 have breaks or false duplications in the hifiasm assembly (e.g., SMN2)
● 2 are in the structurally variable immunoglobulin locus
● 6 resolved but excluded due to being previously assembled in the MHC
● 1 (TNNT3) has a structural error in GRCh38
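The ">90% included" criterion used in this list can be read as the fraction of a gene's bases that fall inside the benchmark regions. The Python sketch below is one way to compute that fraction, assuming a hypothetical gene interval and benchmark intervals on the same chromosome, all 0-based and half-open; the exact definition used for the CMRG benchmark (for example, whether flanking sequence is included) is not stated on the slide.

```python
def fraction_included(gene, benchmark_intervals):
    """Fraction of the gene interval covered by the union of benchmark intervals."""
    gene_start, gene_end = gene
    covered = 0
    last_end = gene_start  # everything before this point is already counted
    for start, end in sorted(benchmark_intervals):
        # Clip to the gene and skip any part already counted.
        start = max(start, last_end)
        end = min(end, gene_end)
        if start < end:
            covered += end - start
            last_end = end
    return covered / (gene_end - gene_start)


# Toy gene of 1000 bp with three benchmark intervals, two of them overlapping.
gene = (1000, 2000)
benchmark = [(900, 1500), (1400, 1800), (1950, 2100)]
frac = fraction_included(gene, benchmark)
is_included = frac > 0.9
```

Here 850 of the 1000 gene bases are covered, so this toy gene would land in the "not >90% included" group.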
Plans for future assembly-based benchmarks
● Long-read assembly-based variants are reaching/surpassing the accuracy of our benchmarks (with some exceptions)
● Use T2T-HPRC’s assembly of HG002 chrX (and chrY?) to develop small variant and structural variant benchmark for genic and non-genic regions
● Use diploid assemblies of children in trios
Exploring if AI can be used for Genomic Reference Material Development
● Exploring deep learning to assign uncertainty to genomic reference materials
● Exploring transparency for genomics AI (e.g., "model cards")
● Exploring explainability for AI-based reference materials
https://mdic.org/project/cancer-genomic-somatic-reference-samples/
21st Century Cell Lines: Fully Consented and Characterized Cancer Tumor/Normal Cell Lines as Reference Materials
● Developing matched tumor/normal cell line pairs and donor normal tissue analyzed at early passages
○ Initial collaboration with Andrew Liss at MGH for pancreatic ductal adenocarcinoma (PDAC) cell lines
● Broadly consented for public release of genomic data and commercial use and redistribution
● Path to Cancer Genome in a Bottle
Seeking collaborations for additional broadly-consented tumor/normal cell lines
Take-home messages
● Ongoing improvement of benchmarks has been needed to drive technology and bioinformatics innovations
● Assembly methods have advanced rapidly and are enabling characterization of increasingly challenging genome regions
● More work is needed to develop better benchmarks and benchmarking tools, particularly for tumor genomes
Acknowledgment of many GIAB contributors
Government
Clinical Laboratories Academic Laboratories
Bioinformatics developers
NGS technology developers
Reference samples
* Funders
Interested in getting involved?
www.genomeinabottle.org - sign up for general GIAB and Analysis Team google groups
GIAB slides: www.slideshare.net/genomeinabottle
Public, Unembargoed Data:
• http://www.nature.com/articles/sdata201625
• ftp://ftp-trace.ncbi.nlm.nih.gov/giab/
• github.com/genome-in-a-bottle
Global Alliance Benchmarking Team
• https://github.com/ga4gh/benchmarking-tools
• Web-based implementation at precision.fda.gov
• Best Practices at https://rdcu.be/bqpDT
GIAB Analysis Team Calls
• Sign up for the google group to attend biweekly calls
Justin Zook: jzook@nist.gov
We are hiring!
Machine learning, diploid assembly, cancer genomes, data science, other ‘omics, …

How giab fits in the rest of the world telomere to telomere consortium
 


Genome in a Bottle- reference materials to benchmark challenging variants and regions of the human genome 210930

  • 1. Genome in a Bottle: Reference Materials to Benchmark Challenging Variants and Regions of the Human Genome Justin Zook, on behalf of the Genome in a Bottle Consortium National Institute of Standards and Technology (NIST) Human Genomics Team Sept 30, 2021
  • 2. Motivation for Genome in a Bottle: Sequencing and analysis methods can give different answers, particularly in challenging, repetitive regions O’Rawe et al, Genome Medicine, 2013 https://doi.org/10.1186/gm432
  • 3. GIAB has characterized variants in 7 human genomes
      Genomes: HG001 (Pilot Genome NA12878); HG002, HG003, HG004 (AJ Trio); HG005, HG006, HG007 (Chinese Trio)
      National Institute of Standards & Technology, Report of Investigation: Reference Material 8391, Human DNA for Whole-Genome Variant Assessment (Son of Eastern European Ashkenazim Jewish Ancestry)
      This Reference Material (RM) is intended for validation, optimization, and process evaluation purposes. It consists of a male whole human genome sample of Eastern European Ashkenazim Jewish ancestry, and it can be used to assess performance of variant calling from genome sequencing. A unit of RM 8391 consists of a vial containing human genomic DNA extracted from a single large growth of human lymphoblastoid cell line GM24385 from the Coriell Institute for Medical Research (Camden, NJ). The vial contains approximately 10 µg of genomic DNA, with the peak of the nominal length distribution longer than 48.5 kb, as referenced by Lambda DNA, and the DNA is in TE buffer (10 mM TRIS, 1 mM EDTA, pH 8.0).
      This material is intended for assessing performance of human genome sequencing variant calling by obtaining estimates of true positives, false positives, true negatives, and false negatives. Sequencing applications could include whole genome sequencing, whole exome sequencing, and more targeted sequencing such as gene panels. This genomic DNA is intended to be analyzed in the same way as any other sample a lab would process and analyze extracted DNA. Because the RM is extracted DNA, it is not useful for assessing pre-analytical steps such as DNA extraction, but it does challenge sequencing library preparation, sequencing machines, and the bioinformatics steps of mapping, alignment, and variant calling. This RM is not intended to assess subsequent bioinformatics steps such as functional or clinical interpretation.
      Information Values: Information values are provided for single nucleotide polymorphisms (SNPs), small insertions and deletions (indels), and homozygous reference genotypes for approximately 88 % of the genome, using methods similar to described in reference 1. An information value is considered to be a value that will be of interest and use to the RM user, but insufficient information is available to assess the uncertainty associated with the value. We describe and disseminate our best, most confident, estimate of the genotypes using the data and methods currently available. These data and genomic characterizations will be maintained over time as new data accrue and measurement and informatics methods become available. The information values are given as a variant call file (vcf) that contains the high-confidence SNPs and small indels, as well as a tab-delimited “bed” file that describes the regions that are called high-confidence. Information values cannot be used to establish metrological traceability. The files referenced in this report are available at the Genome in a Bottle ftp site hosted by the National Center for Biotechnology Information (NCBI). The Genome in a Bottle ftp site for the high-confidence vcf and high confidence regions is:
  • 4. GIAB “Open Science” Virtuous Cycle Users analyze GIAB Samples Benchmark vs. GIAB data Critical feedback to GIAB Integrate new methods New benchmark data Method development, optimization, and demonstration Part of assay validation GIAB/NIST expands to more difficult regions
  • 5. Design of our human genome reference values Benchmark Variant Calls
  • 6. Benchmark Regions – regions in which the benchmark contains (almost) all the variants Benchmark Variant Calls Design of our human genome reference values
  • 7. Reference Values* Benchmark Variant Calls Design of our human genome reference values Benchmark Regions *Currently no quality or confidence scores associated with our reference values
  • 8. Variants from any method being evaluated Design of our human genome reference values Benchmark Regions Benchmark Variant Calls
  • 9. Benchmark Regions Variants outside benchmark regions are not assessed Majority of variants unique to method should be false positives (FPs) Majority of variants unique to benchmark should be false negatives (FNs) Matching variants assumed to be true positives Variants from any method being evaluated Benchmark Variant Calls Design of our human genome reference values
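
The assessment rules on slide 9 can be sketched in a few lines of Python. This is a minimal illustration, not the GIAB/GA4GH benchmarking tooling: it assumes a single chromosome, variants given as hypothetical `(pos, ref, alt)` tuples, and exact matching, whereas real comparison tools also reconcile different representations of the same variant. The function names `in_regions` and `classify` are invented for this sketch.

```python
from bisect import bisect_right


def in_regions(pos, starts, ends):
    """Return True if pos falls inside any half-open [start, end) region."""
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < ends[i]


def classify(query, benchmark, regions):
    """Count TP/FP/FN for one chromosome using exact matching.

    query, benchmark: sets of (pos, ref, alt) tuples (hypothetical format).
    regions: sorted, non-overlapping (start, end) benchmark regions.
    Variants outside the benchmark regions are dropped before counting.
    """
    starts = [s for s, _ in regions]
    ends = [e for _, e in regions]
    q = {v for v in query if in_regions(v[0], starts, ends)}
    b = {v for v in benchmark if in_regions(v[0], starts, ends)}
    return {"TP": len(q & b), "FP": len(q - b), "FN": len(b - q)}


# Made-up example data, for illustration only.
regions = [(100, 200), (300, 400)]
benchmark = {(150, "A", "G"), (350, "C", "T"), (250, "G", "A")}
query = {(150, "A", "G"), (360, "T", "C"), (500, "A", "C")}
counts = classify(query, benchmark, regions)
print(counts)
```

The variants at positions 250 and 500 lie outside the benchmark regions, so they contribute to none of the three counts.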
  • 10. In 2019, GIAB and GA4GH Published Resources for “Easier” Small Variants
  • 13. v4.2.1 Small Variant Benchmark used Long and Linked Reads
      Reference Build | Benchmark Set | Reference Coverage | SNVs | Indels | Base pairs in Seg Dups and low mappability
      GRCh37 | v3.3.2 | 87.8 | 3,048,869 | 464,463 | 57,277,670
      GRCh37 | v4.2.1 | 94.1 | 3,353,881 | 522,388 | 133,848,288
      GRCh38 | v3.3.2 | 85.4 | 3,030,495 | 475,332 | 65,714,199
      GRCh38 | v4.2.1 | 92.2 | 3,367,208 | 525,545 | 145,585,710
      Wagner et al, https://doi.org/10.1101/2020.07.24.2127
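
To make the size of the v4.2.1 expansion concrete, the GRCh38 rows of the table on slide 13 can be compared directly. The sketch below copies the counts from the slide; the dictionary keys and the helper `relative_gain` are names chosen for this example, and it assumes the "Reference Coverage" column is a percentage of the reference.

```python
# Counts copied from the GRCh38 rows of the slide's table.
v332 = {"coverage": 85.4, "snvs": 3_030_495, "indels": 475_332, "hard_bp": 65_714_199}
v421 = {"coverage": 92.2, "snvs": 3_367_208, "indels": 525_545, "hard_bp": 145_585_710}


def relative_gain(old, new):
    """Percentage increase from old to new."""
    return 100.0 * (new - old) / old


for key in ("snvs", "indels", "hard_bp"):
    print(f"{key}: {relative_gain(v332[key], v421[key]):+.1f}%")

# Coverage is assumed to already be a percentage, so report the difference.
print(f"coverage: {v421['coverage'] - v332['coverage']:+.1f} percentage points")
```

Here `hard_bp` stands for the "Base pairs in Seg Dups and low mappability" column, which is where the relative growth is largest.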
  • 14. New benchmark includes challenging genes like PMS2 Segmental duplications
  • 15. Collaborating with FDA to use GIAB benchmark to inspire new methods https://precision.fda.gov/challenges/10
  • 16. The best-performing submissions were from new sequencing technologies and bioinformatics methods Olson et al, https://doi.org/10.1101/2020.11.13.380
  • 17. Expanding the benchmark was important to demonstrate improved technologies and analysis methods for difficult genome regions Olson et al, https://doi.org/10.1101/2020.11.13.380
  • 18. INDELs SNVs Stratification helps understand strengths of each technology/method Olson et al, https://doi.org/10.1101/2020.11.13.380
  • 19. Shortcomings in Medical Genes for v4.2.1 benchmark ● Mandelker et al. in 2016 created a list of medical genes with at least one exon that is difficult to map with short reads ● v4.2.1 improved coverage of these genes but many are still not fully covered
  • 20. Why Create a Medical Gene Benchmark for Genome in a Bottle? ● HG002 v4.2.1 benchmark still excludes >10% of 395 medically relevant genes on chromosomes 1-22 on GRCh37 or GRCh38 due to structural variants, large segmental duplications, or other difficult regions ● Advances in diploid assembly enabled us to develop phased small variant and structural variant benchmarks in 273 of these 395 genes on both GRCh37 and GRCh38 for HG002 Wagner et al, https://doi.org/10.1101/2021.06.07.444885 Justin Wagner Jason Chin Fritz Sedlazeck GIAB CMRG Team
  • 21. Generating a Challenging Medical Gene Benchmark Trio-based diploid assembly
  • 22. Diploid Assembly Using PacBio HiFi reads ● Trio-hifiasm ○ Illumina reads for parents and PacBio HiFi reads for HG002 ○ Best performance in Human Pangenome Reference Consortium diploid assembly bakeoff ● Called variants with dipcall ○ Outputs variant calls and confident regions ○ Confident regions: covered by exactly one contig from each haplotype https://github.com/lh3/dipcall https://doi.org/10.1038/s41592-020-01056-5
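
The "covered by exactly one contig from each haplotype" rule on slide 22 can be illustrated with simple interval arithmetic. This is a sketch of the idea only, not dipcall's implementation, which applies further filters not described on the slide. It assumes contig alignments for one chromosome are given as half-open `(start, end)` intervals; `depth_one`, `intersect`, and `confident_regions` are hypothetical helper names.

```python
def depth_one(intervals):
    """Return sorted half-open segments covered by exactly one interval."""
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    events.sort()  # at equal positions, ends (-1) sort before starts (+1)
    out, depth, prev = [], 0, None
    for pos, delta in events:
        if depth == 1 and prev is not None and pos > prev:
            if out and out[-1][1] == prev:
                out[-1] = (out[-1][0], pos)  # extend an adjoining segment
            else:
                out.append((prev, pos))
        depth += delta
        prev = pos
    return out


def intersect(a, b):
    """Intersect two sorted lists of non-overlapping half-open intervals."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def confident_regions(hap1, hap2):
    """Regions where each haplotype has exactly one aligned contig."""
    return intersect(depth_one(hap1), depth_one(hap2))


# Made-up alignments: two haplotype-1 contigs overlap on [900, 1000).
hap1 = [(0, 1000), (900, 2000)]
hap2 = [(100, 1500)]
print(confident_regions(hap1, hap2))
```

In this example the overlap between the two haplotype-1 contigs and the stretches where haplotype 2 has no contig are both excluded.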
  • 23. New benchmark includes 273 challenging genes ● Curated each gene for accurate resolution by assembly in IGV ● Manually curated >1000 variant discrepancies and excluded errors in benchmark ● Most errors in homopolymers and/or highly homozygous regions
  • 24. The new CMRG small variant benchmark includes more challenging variants and identifies more false negatives
  • 25. Highlighting Genes in the New Benchmark – SMN1
  • 26. GRCh37 and GRCh38 contain different false duplications • GRCh38 has an extra copy of some medically relevant genes like CBS, KCNE1, and CRYAA, causing mis-mapped reads https://gnomad.broadinstitute.org/gene/ENSG00000160200?dataset=gnomad_r2_1 gnomAD coverage of CBS on GRCh38 decreases for genome sequencing due to mapping ambiguity gnomAD coverage of CBS on GRCh37 is generally normal for genome (green) and exome (blue) samples
  • 27. False duplications on GRCh38 can be fixed by masking
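
Slide 27 does not say how the masking is done. One common approach, assumed here, is to replace the bases of the falsely duplicated copy with `N` so that reads can no longer align to it. The sketch below shows that operation on an in-memory sequence; `mask_regions` is a hypothetical helper and the coordinates are made up, not the actual GRCh38 mask.

```python
def mask_regions(sequence, regions, mask_char="N"):
    """Return sequence with each half-open [start, end) region replaced by mask_char."""
    chars = list(sequence)
    for start, end in regions:
        if not 0 <= start <= end <= len(chars):
            raise ValueError(f"region ({start}, {end}) is outside the sequence")
        chars[start:end] = mask_char * (end - start)
    return "".join(chars)


# Toy sequence and coordinates, for illustration only.
masked = mask_regions("ACGTACGTAC", [(2, 5)])
print(masked)
```

Masking keeps the coordinates of every other base unchanged, which is why it can be applied to an existing reference without re-annotating it.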
  • 28. T2T identified and fixed additional false duplications ● 12 regions affecting ~1.2 Mbp and 74 genes (including 22 protein coding genes) ● Most medically relevant genes included in 11 pairs of genes in 5 large duplicated regions on chr21 https://doi.org/10.1101/2021.07.12.452063
  • 29. Genes found to be falsely duplicated in CMRG and T2T work
  • 30. T2T also identified collapsed duplications in GRCh38 ● 203 regions affecting ~8 Mbp and 308 genes (including 48 protein coding genes) ● Includes several medically-relevant genes: ○ KCNJ18/KCNJ12 ○ KMT2C ○ MAP2K3 https://doi.org/10.1101/2021.07.12.452063
  • 31. What medical genes do we still not include >90%? ● 110 on GRCh37 and 100 on GRCh38 + all genes on chrX/chrY Progressively categorizing all 100 on GRCh38: ● 20 affected by gaps in the reference ● 38 had evidence of duplications in HG002 relative to GRCh38 ○ Collapsed duplications in GRCh38 (e.g., KCNJ18) ○ Population copy number variability (e.g., LPA, KIR) ● 2 resolved on GRCh38 but not GRCh37 ● 18 were >90% included by the dip.bed but had multiple contigs or a break in the assembly-assembly alignment ● 7 have a large deletion of part or all of the gene on one haplotype ● 4 have breaks or false duplications in the hifiasm assembly (e.g., SMN2) ● 2 are in the structurally variable immunoglobulin locus ● 6 resolved but excluded due to being previously assembled in the MHC ● one (TNNT3) has a structural error in GRCh38
  • 32. Plans for future assembly-based benchmarks ● Long-read assembly-based variants are reaching/surpassing the accuracy of our benchmarks (with some exceptions) ● Use T2T-HPRC’s assembly of HG002 chrX (and chrY?) to develop small variant and structural variant benchmark for genic and non-genic regions ● Use diploid assemblies of children in trios
  • 33. Exploring if AI can be used for Genomic Reference Material Development ● Exploring deep learning to assign uncertainty to genomic reference materials ● Exploring transparency for genomics AI (e.g., "model cards") ● Exploring explainability for AI-based reference materials
  • 35. 21st Century Cell Lines: Fully Consented and Characterized Cancer Tumor/Normal Cell Lines as Reference Materials ● Developing matched tumor/normal cell lines pairs and donor normal tissue analyzed at early passages ○ Initial collaboration with Andrew Liss at MGH for pancreatic ductal adenocarcinoma (PDAC) cell lines ● Broadly consented for public release of genomic data and commercial use and redistribution ● Path to Cancer Genome in a Bottle Seeking collaborations for additional broadly- consented tumor/normal cell lines
  • 36. Take-home messages ● Ongoing improvement of benchmarks has been needed to drive technology and bioinformatics innovations ● Assembly methods have advanced rapidly and are enabling characterization of increasingly challenging genome regions ● More work is needed to develop better benchmarks and benchmarking tools, particularly for tumor genomes
  • 37. Acknowledgment of many GIAB contributors Government Clinical Laboratories Academic Laboratories Bioinformatics developers NGS technology developers Reference samples * Funders * *
  • 38. Interested in getting involved? www.genomeinabottle.org - sign up for general GIAB and Analysis Team google groups GIAB slides: www.slideshare.net/genomeinabottle Public, Unembargoed Data: • http://www.nature.com/articles/sdata201625 • ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ • github.com/genome-in-a-bottle Global Alliance Benchmarking Team • https://github.com/ga4gh/benchmarking-tools • Web-based implementation at precision.fda.gov • Best Practices at https://rdcu.be/bqpDT GIAB Analysis Team Calls • Sign up for the google group to attend biweekly calls Justin Zook: jzook@nist.gov We are hiring! Machine learning, diploid assembly, cancer genomes, data science, other ‘omics, …

Editor's Notes

  1. Strawman