Benchmarking with Genome in a Bottle
GIAB Improves Confidence in Genome Sequencing and Variant Calling
• Reference materials
• Characterizations (benchmark sets)
• Reference data
• Benchmarking methods
Genome Sequencing and Variant Calling
GIAB Reference Materials
GIAB has characterized variants in 7 human genomes:
• HG001* (NA12878)
• Ashkenazi Jewish (AJ) trio: HG002*, HG003*, HG004*
• Chinese trio: HG005*, HG006, HG007
*NIST Reference Materials (RMs) developed from large batches of DNA
GIAB Reference Data
Public Data Sources
• NIH-hosted FTP site: https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/
• NIH SRA: https://www.ncbi.nlm.nih.gov/bioproject/200694
• HPRC S3 bucket: https://github.com/human-pangenomics/HG002_Data_Freeze_v1.0
GIAB Data Indexes on GitHub
https://github.com/genome-in-a-bottle/giab_data_indexes
Work in Progress: Data Registry
• Queryable database with pointers to publicly available GIAB data, along with summary statistics
• Data types: sample, FASTQs, BAMs, VCFs
• Captures methods and links datasets for data provenance
GIAB Characterizations
Small Variant Integration Process
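Design of GIAB Benchmarks: Reliable IDentification of Errors (RIDE)
• Benchmarks are available for GRCh37 and GRCh38.
• Variants from any method are compared with the benchmark variants inside the benchmark regions.
• Matching variants are assumed to be true positives.
• Benchmark regions: reliably identify false positives.
• Benchmark variants: reliably identify false negatives.
• Variants outside the benchmark regions are not assessed.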
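The RIDE design above can be sketched as a set comparison. This is a minimal illustration, assuming each variant is reduced to an exact (chrom, pos, ref, alt) key with a 0-based position and that benchmark regions are half-open BED-style intervals; the function and type names are hypothetical, and real comparison tools such as hap.py also reconcile different representations of the same variant, which this sketch does not attempt.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical simplified types: a variant is (chrom, pos, ref, alt) with a
# 0-based position; a region is a half-open (chrom, start, end) BED interval.
Variant = Tuple[str, int, str, str]
Region = Tuple[str, int, int]


def in_regions(variant: Variant, regions: List[Region]) -> bool:
    """Return True if the variant's position lies inside any benchmark region."""
    chrom, pos = variant[0], variant[1]
    return any(c == chrom and start <= pos < end for c, start, end in regions)


def classify(benchmark: Set[Variant], query: Set[Variant],
             regions: List[Region]) -> Dict[str, Set[Variant]]:
    """Classify query calls against benchmark variants (exact matching only)."""
    bench_in = {v for v in benchmark if in_regions(v, regions)}
    query_in = {v for v in query if in_regions(v, regions)}
    return {
        "TP": query_in & bench_in,         # matching variants assumed true positives
        "FP": query_in - bench_in,         # calls in benchmark regions but absent from the benchmark
        "FN": bench_in - query_in,         # benchmark variants the method missed
        "not_assessed": query - query_in,  # calls outside the benchmark regions
    }
```

Because matching here is exact, an indel written two different ways would be counted as one false positive plus one false negative; this representation problem is one reason tools such as hap.py compare calls at the haplotype level.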
v4.2.1 Small Variant Benchmark Used Long and Linked Reads
Reference build  Benchmark set  Reference coverage (%)  SNVs       Indels   Base pairs in seg dups and low mappability
GRCh37           v3.3.2         87.8                    3,048,869  464,463  57,277,670
GRCh37           v4.2.1         94.1                    3,353,881  522,388  133,848,288
GRCh38           v3.3.2         85.4                    3,030,495  475,332  65,714,199
GRCh38           v4.2.1         92.2                    3,367,208  525,545  145,585,710
Wagner et al., https://doi.org/10.1101/2020.07.24.212712
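The gain from v3.3.2 to v4.2.1 can be read directly off the table. The sketch below simply transcribes the table values and subtracts them per reference build; the dictionary keys (for example difficult_bp for base pairs in seg dups and low mappability) are illustrative names, not part of any GIAB file format.

```python
# Values transcribed from the v3.3.2 vs v4.2.1 table above.
# "difficult_bp" = base pairs in segmental duplications and low-mappability regions.
TABLE = {
    ("GRCh37", "v3.3.2"): {"coverage_pct": 87.8, "snvs": 3_048_869, "indels": 464_463, "difficult_bp": 57_277_670},
    ("GRCh37", "v4.2.1"): {"coverage_pct": 94.1, "snvs": 3_353_881, "indels": 522_388, "difficult_bp": 133_848_288},
    ("GRCh38", "v3.3.2"): {"coverage_pct": 85.4, "snvs": 3_030_495, "indels": 475_332, "difficult_bp": 65_714_199},
    ("GRCh38", "v4.2.1"): {"coverage_pct": 92.2, "snvs": 3_367_208, "indels": 525_545, "difficult_bp": 145_585_710},
}


def gains(build: str) -> dict:
    """Change from v3.3.2 to v4.2.1 for one reference build."""
    old, new = TABLE[(build, "v3.3.2")], TABLE[(build, "v4.2.1")]
    # Counts are integers, so their differences are exact; the coverage
    # percentage is rounded to one decimal to hide floating-point noise.
    return {
        key: round(new[key] - old[key], 1) if isinstance(new[key], float) else new[key] - old[key]
        for key in old
    }


for build in ("GRCh37", "GRCh38"):
    print(build, gains(build))
```

On GRCh38, for example, v4.2.1 adds 336,713 SNVs and 50,213 indels, and the base pairs covered in seg dups and low-mappability regions more than double on both builds.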
Structural Variant Benchmark Set
Zook, J.M., Hansen, N.F., Olson, N.D. et al. A robust benchmark for detection of germline large deletions and insertions. Nat Biotechnol 38, 1347–1355 (2020). https://doi.org/10.1038/s41587-020-0538-8
GIAB Benchmarking Methods
Small Variant Benchmarking Highlights (TL;DR)
• Best practices for benchmarking germline variant calling: https://rdcu.be/bVtIF
  - Supplemental Table 2 summarizes the best practices
• Hap.py: best-practices implementation
  - Command line: https://github.com/Illumina/hap.py
  - Graphical interface: https://precision.fda.gov/
• HappyR: R package for hap.py results
  - GitHub: https://github.com/Illumina/happyR
• www.slideshare.net/genomeinabottle
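Hap.py writes its headline metrics to a summary CSV, and HappyR loads those results into R. For readers working in Python, the sketch below pulls recall, precision, and F1 per variant type and filter. It assumes the summary CSV has columns named Type, Filter, METRIC.Recall, METRIC.Precision, and METRIC.F1_Score; treat those names as an assumption and check them against your own hap.py output. The function names are hypothetical.

```python
import csv
from typing import Dict, Iterable, Optional, Tuple


def _as_float(value: str) -> Optional[float]:
    """Treat a blank cell as a missing metric rather than failing (an assumption)."""
    return float(value) if value.strip() else None


def read_happy_summary(lines: Iterable[str]) -> Dict[Tuple[str, str], Dict[str, Optional[float]]]:
    """Map (Type, Filter), e.g. ("SNP", "PASS"), to recall, precision and F1.

    Column names are assumed to match hap.py's summary CSV; verify against your file.
    """
    summary = {}
    for row in csv.DictReader(lines):
        summary[(row["Type"], row["Filter"])] = {
            "recall": _as_float(row["METRIC.Recall"]),
            "precision": _as_float(row["METRIC.Precision"]),
            "f1": _as_float(row["METRIC.F1_Score"]),
        }
    return summary
```

Any open text file works as input, e.g. `read_happy_summary(open(path, newline=""))`, where `path` points at the summary CSV written under your chosen output prefix.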
Benchmarking Process
Best Practices Summary
• Benchmark sets
• Stringency of variant comparison
• Variant comparison tools
• Manual curation
• Metric interpretation
• Stratifications
• Confidence intervals
• Additional benchmarking approaches
Applying Best Practices
Best Practices for Benchmarking Small Variants
• GA4GH benchmarking tools: https://github.com/ga4gh/benchmarking-tools
• Paper: https://rdcu.be/bqpDT
• https://precision.fda.gov/
Stratified Performance Metrics
• Plot metrics on a Phred scale to better separate values above 99%.
• Precision = TP/(TP + FP)
• Recall = TP/(TP + FN)
• Confidence intervals indicate uncertainty and help account for differences in the number of variants per stratification.
Figure: precision and recall for SNPs and INDELs (%, Phred scale, 99 to 99.99) by genomic context (difficult homopolymers, not in difficult TR and homopolymers, CDS, chainSelf, lowmap and segdups, lowmap, SegDups, chainSelf >10kb, SegDups >10kb) for GIAB samples HG003 and HG004, by stratification type (all, notin).
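The metric definitions and plotting advice on this slide can be made concrete. The sketch below implements the precision and recall formulas from the slide, adds F1 (the harmonic mean used in the challenge results), and Phred-scales a proportion as -10 * log10(1 - metric), so 99%, 99.9%, and 99.99% map to 20, 30, and 40. For confidence intervals it uses the Wilson score interval, a swapped-in choice for illustration; the best-practices tools may compute intervals with a different method.

```python
import math
from typing import Tuple


def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP)."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Recall = TP / (TP + FN)."""
    return tp / (tp + fn)


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


def phred(metric: float) -> float:
    """Phred-scale a proportion: -10 * log10(1 - metric).

    0.99 -> 20, 0.999 -> 30, 0.9999 -> 40, so values above 99% spread out.
    Undefined for metric == 1 (no errors), which needs special handling when plotting.
    """
    return -10 * math.log10(1 - metric)


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for a proportion such as recall."""
    p_hat = successes / total
    denom = 1 + z ** 2 / total
    center = (p_hat + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / total + z ** 2 / (4 * total ** 2)) / denom
    return center - half, center + half
```

A stratification with 99 of 100 benchmark variants recalled and one with 9,900 of 10,000 both have 99% recall, but `wilson_interval(99, 100)` is far wider than `wilson_interval(9900, 10000)`; this is why the slide recommends intervals when strata contain very different numbers of variants.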
Pairwise callset comparison
Figure: precision and recall (%, Phred scale, 0 to 99.99) for INDELs and SNPs, comparing the DeepVariant_PacBio and DeepVariant_ILL callsets across stratifications such as L1H, MHC, diTR, triTR, and quadTR 51–200bp, quadTR >200bp, TR 201bp–10kb, nonunique l250m0e0, and Not in All Difficult, colored by stratification group (All Diff, LowComplexity, Map and SegDups, mappability, Other Diff, SegDups).
(Optional) Optimization: identifying biases responsible for poor performance in specific stratifications.
Benchmarking Take-Home Messages
• Krusche et al. (URL) is a great resource for germline small variant benchmarking.
• Appropriate data visualizations are critical to interpreting benchmarking results.
• Use manual curation to evaluate benchmarking results.
• Resources are available for benchmarking small and structural variants against GRCh37 and GRCh38.
Collaborating with FDA to Use the GIAB Benchmark to Inspire New Methods
https://precision.fda.gov/challenges/10
Challenge Results
• Received 64 submissions from 20 participants
• Most submissions used deep-learning-based variant-calling methods
• Submissions using multiple technologies outperformed single-technology submissions
• Submission performance varied by genomic stratification
Figure: F1 (%, Phred scale, 0 to 99.9) of challenge submissions in all benchmark regions, difficult-to-map regions, and the MHC, by technology (Illumina, multiple technologies, ONT, PacBio). W marks winning submissions; labeled teams include Sentieon, Roche Sequencing Solutions, The Genomics Team in Google Health, DRAGEN, Seven Bridges Genomics, The UCSC CGL and Google Health, and Wang Genomics Lab.
Results (cont.)
• Updated stratifications enable comparison of method strengths.
• Graph-based variant calling enables highly accurate short-read variant calls in the difficult MHC region.
• Improved benchmark sets and stratifications reveal significant progress in DNA sequencing and variant calling since the 2016 challenge.
Future of Genome in a Bottle
DEvelopment Framework for Assembly-Based Benchmarks (DEFRABB)
Developing Benchmarks on New References Using Assemblies
• The Telomere-to-Telomere (T2T) Consortium generated a new reference, T2T-CHM13.
• Developed the CMRG benchmark on T2T-CHM13 using the diploid assembly of HG002, similar to the benchmarks on GRCh37 and GRCh38.
Assembly-Based Benchmark Process
• Minimap2 for assembly-to-assembly alignment
• Variants called and diploid assembled regions identified using dipcall v0.3
• VCF formatting and modifications for use in benchmarking
• Exclude regions from dip.bed (assembled regions) that are problematic for small variant calling and comparison due to SVs and gaps in the reference or alignment
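The exclusion step is an interval subtraction: keep the parts of dip.bed that do not overlap any problematic region. In practice a tool such as bedtools subtract would do this; the sketch below shows the same idea in plain Python, assuming 0-based half-open BED intervals and non-overlapping input regions. The excluded intervals (SVs, gaps) are hypothetical inputs, and the function names are illustrative rather than part of DEFRABB.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), as in BED


def subtract(regions: List[Interval], excluded: List[Interval]) -> List[Interval]:
    """Remove every excluded interval from the regions on one chromosome."""
    excluded = sorted(excluded)
    kept: List[Interval] = []
    for start, end in sorted(regions):
        cursor = start  # first position of this region not yet kept or excluded
        for ex_start, ex_end in excluded:
            if ex_end <= cursor or ex_start >= end:
                continue  # this exclusion does not touch the remaining piece
            if ex_start > cursor:
                kept.append((cursor, ex_start))  # keep the stretch before the exclusion
            cursor = max(cursor, ex_end)
            if cursor >= end:
                break
        if cursor < end:
            kept.append((cursor, end))
    return kept


def subtract_by_chrom(regions: Dict[str, List[Interval]],
                      excluded: Dict[str, List[Interval]]) -> Dict[str, List[Interval]]:
    """Apply `subtract` chromosome by chromosome."""
    return {chrom: subtract(ivs, excluded.get(chrom, [])) for chrom, ivs in regions.items()}
```

For example, subtracting a hypothetical SV at chr1:90-200 from an assembled region chr1:0-100 leaves chr1:0-90, while chromosomes with nothing to exclude pass through unchanged.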
Take-Home Messages
• Reference materials available for 5 individuals
• Small variant benchmark sets for 7 individuals for GRCh37 and GRCh38; SV benchmark for one individual for GRCh37
• Best practices established for small variant benchmarking
• Current efforts focus on developing small variant and structural variant benchmark sets using diploid assemblies
Acknowledgment of Many GIAB Contributors
• Government
• Clinical laboratories
• Academic laboratories
• Bioinformatics developers
• NGS technology developers
• Reference samples
* Funders
Interested in Getting Involved?
• www.genomeinabottle.org: sign up for the general GIAB and Analysis Team Google Groups
• GIAB slides: www.slideshare.net/genomeinabottle
• Public, unembargoed data: github.com/genome-in-a-bottle
• We are hiring! Data manager, machine learning, diploid assembly, cancer genomes, data science, other ’omics, …

Benchmarking with GIAB 220907
