2017 AGBT Benchmarking Poster

Introduction
Robust benchmarking of small variant calls
Justin Zook1, Peter Krusche2, Michael Eberle2, Len Trigg3, Kevin Jacobs4, Brendan O’Fallon5, Marc Salit1,
the Global Alliance for Genomics and Health Benchmarking Team, and the Genome in a Bottle Consortium
(1) Genome-Scale Measurements Group, National Institute of Standards and Technology
(2) Illumina, Inc.; (3) Real Time Genomics; (4) Helix; (5) ARUP Laboratories
• The Global Alliance for Genomics and Health Benchmarking Team has
developed a variety of resources for benchmarking germline small
variant calls:
• Standardized performance metrics definitions (e.g., false positives,
false negatives, precision, recall/sensitivity, genotype error rate)
• Links to high-confidence calls and data for benchmark genomes
• Benchmarking Tools
• Integrate variant comparison tools into a single benchmarking
framework
• Enable stratification of performance by variant type and genome
context
• Sophisticated variant comparison tools are important to handle different
representations of complex variants
• Benchmarking tools have been used in PrecisionFDA Challenges
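The standardized metrics listed above are simple functions of the comparison counts. A minimal sketch (the function name, the example counts, and the exact genotype-error denominator are illustrative, not taken from the GA4GH tools):

```python
# Illustrative computation of the standardized performance metrics from
# TP/FP/FN counts. Names and the genotype-error convention are assumptions
# for this sketch, not the GA4GH reference implementation.

def benchmarking_metrics(tp, fp, fn, genotype_errors=0):
    """Compute precision, recall/sensitivity, F1, and genotype error rate.

    `genotype_errors` counts matched sites with the wrong genotype; here it
    is reported as a fraction of TPs (conventions vary between tools).
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    genotype_error_rate = genotype_errors / tp if tp else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "genotype_error_rate": genotype_error_rate}

m = benchmarking_metrics(tp=9980, fp=20, fn=120, genotype_errors=15)
print(f"precision={m['precision']:.4f} recall={m['recall']:.4f}")
```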
Public Benchmark Callsets/Genomes
Resources
• GitHub site: https://github.com/ga4gh/benchmarking-tools
• In-progress benchmarking standards document: doc/standards
• Description of intermediate formats: doc/ref-impl
• Benchmark descriptions and downloads: resources/high-confidence-sets
• Stratification bed files and descriptions: resources/stratification-bed-files
• Python-code for HTML reporting and running benchmarks: reporting/basic
• Please contribute / join the discussion! Email jzook@nist.gov if you’re
interested
Genome PGP ID Coriell ID NIST ID NIST RM #
CEPH Mother N/A GM12878 HG001 8398
AJ Son huAA53E0 GM24385 HG002 8391 (son)/8392
AJ Father hu6E4515 GM24149 HG003 8392 (trio)
AJ Mother hu8E87A9 GM24143 HG004 8392 (trio)
Chinese Son hu91BD69 GM24631 HG005 8393
Table 1: Genomes currently being characterized by GIAB by integrating data from multiple technologies. Vials from a large homogeneous batch of DNA are available as NIST Reference Materials (RMs).

Practical Implications of Benchmark Callsets

Stratification by Variant Type and Context
Clinical Benchmarking Considerations
Different variant representations change variant counts.
MNP (=> one TP / FP / FN):
chr1 16837188 TGC CGT
SNPs (=> two TP / FP / FN):
chr1 16837188 T C
chr1 16837190 C T
Variant types can change when decomposing
or recomposing variants:
Complex variant:
chr1 201586350 CTCTCTCTCT CA
DEL + SNP:
chr1 201586350 CTCTCTCTCT C
chr1 201586359 T A
Variants cannot always be canonicalized uniquely:
Complex variant:
chr20 21221450 GCCC GCG
One possible decomposition:
chr20 21221450 GC G
chr20 21221452 C G
Another possible decomposition:
chr20 21221452 C G
chr20 21221453 C <DEL>
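The need for representation-aware comparison can be illustrated by replaying each representation onto the reference: the MNP above and its two-SNP decomposition produce the same haplotype sequence even though the VCF records differ. A minimal sketch (the flanking reference bases in the window are made up):

```python
# Sketch of haplotype-level comparison: apply each set of (pos, ref, alt)
# records to the same reference window and compare the resulting sequences.
# A record-level comparison of position/ref/alt strings would call these
# different; the replayed sequences are identical.

def apply_variants(window, window_start, variants):
    """Apply non-overlapping (pos, ref_allele, alt_allele) records to a
    reference window, right-to-left so earlier coordinates stay valid."""
    seq = window
    for pos, ra, aa in sorted(variants, reverse=True):
        off = pos - window_start
        assert seq[off:off + len(ra)] == ra, "ref allele mismatch"
        seq = seq[:off] + aa + seq[off + len(ra):]
    return seq

window_start = 16837185            # chr1 window; flanking bases are invented
ref_window = "AAATGCAAA"           # contains TGC at position 16837188

mnp  = [(16837188, "TGC", "CGT")]  # one MNP record
snps = [(16837188, "T", "C"),      # equivalent pair of SNP records
        (16837190, "C", "T")]

print(apply_variants(ref_window, window_start, mnp) ==
      apply_variants(ref_window, window_start, snps))  # True: same haplotype
```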
What is the effect of not mapping to the decoy?

BWA/GATK (no decoy)   vs. NIST v2.18   vs. NIST v3.3.2   vs. PG 2016-1.0
Precision             91%              67%               93%
Sensitivity           99.8%            99.4%             93%
Outside bed           91%              92%               78%

• Decoy was designed to capture mis-mapped reads that cause FPs – can we test this?
• v3.3.2 is best at identifying FP SNPs
• 43% of all FPs are in the decoy (only 0.5% of TPs)
• PG and v2.18 exclude most FP sites
• PG is best at identifying possible FN SNPs (clustered, unclear, difficult-to-map variants)
What is the SNP sensitivity in coding exons?

• 97.98% sensitivity vs. PG
• FNs predominantly in low-MQ and/or segmental duplication regions
• ~80% of FNs supported by long or linked reads
• 99.96% sensitivity vs. NISTv3.3.2
• 62x lower FN rate than vs. PG
True accuracy is hard to estimate, especially in difficult regions.

[Figure: FN rate vs. average coverage (0.3x–30x) for indels of 11 to 50 bp and 51 to 200 bp in 2 bp, 3 bp, and 4 bp unit repeats]
What is the FP rate for compound heterozygous indels?

• 93% precision vs. PG
• 4/10 manually inspected putative FPs were errors in the test set
• 6/10 were correct in the test set (partial calls or missing in the PG VCF)
• 95% precision vs. NISTv3.3.2
• 9/10 manually inspected FPs were errors in the test set (1 error in v3.3.2)
• Benchmark genomes may contain a limited number of variants in targeted
regions, particularly for non-SNPs, so always calculate confidence intervals
• The variants that many clinical assays are most interested in detecting are
often enriched for difficult variants (e.g., indels, complex variants)
• Could we use the stratified performance on benchmark genomes to predict
the assay’s performance for clinically interesting variants?
• Other useful benchmarking approaches:
• Synthetic DNA spike-ins
• Cell lines with engineered mutations
• Simulated reads
• Modified real reads
• Modified reference genomes
• Confirming results found in real samples over time
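One way to follow the confidence-interval advice above is to compute a binomial interval from the TP/FN counts. A sketch using the Wilson score interval (the counts are invented, and real benchmarking tools may use a different interval method):

```python
# With few variants in a targeted region, a point estimate of sensitivity is
# unstable; report an interval. Wilson score interval shown as one option.
import math

def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if trials == 0:
        return (0.0, 1.0)
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials
                                   + z * z / (4 * trials * trials))
    return (max(0.0, center - half), min(1.0, center + half))

# Hypothetical example: 48 of 50 indels in a small targeted region detected.
# The point estimate is 96%, but the interval is wide.
lo, hi = wilson_interval(48, 50)
print(f"sensitivity = 0.96, 95% CI ({lo:.3f}, {hi:.3f})")
```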
• Accuracy often varies by variant type and
genomic context
• Error rates for complex variants >
indels > SNPs
• Error rates in tandem repeats and difficult-to-map regions are greater than in non-repetitive regions
• The benchmarking team has made
available a set of bed files describing
difficult and interesting regions
• Different types of tandem repeats
• Low mappability regions
• Segmental duplications
• Coding regions
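Stratification amounts to intersecting FPs/FNs with region files. A toy sketch (the regions and calls are made up; real pipelines intersect VCFs with the GA4GH stratification bed files using tools such as bedtools):

```python
# Illustrative stratification: count false negatives inside vs. outside a
# bed-style region list. Coordinates follow the bed convention of 0-based,
# half-open intervals; variant positions are 1-based.

def in_regions(chrom, pos, regions):
    """True if a 1-based position falls in any (chrom, start, end) bed region."""
    return any(c == chrom and start < pos <= end for c, start, end in regions)

tandem_repeats = [("chr1", 1000, 1200), ("chr2", 500, 900)]     # made-up strata
false_negatives = [("chr1", 1100), ("chr1", 5000), ("chr2", 600)]

stratified = {"tandem_repeat": 0, "other": 0}
for chrom, pos in false_negatives:
    key = "tandem_repeat" if in_regions(chrom, pos, tandem_repeats) else "other"
    stratified[key] += 1

print(stratified)   # {'tandem_repeat': 2, 'other': 1}
```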
PrecisionFDA Challenge Results Example (precision.fda.gov)
Genome PGP ID Coriell ID NIST ID NIST RM #
CEPH Mother N/A GM12878 HG001 8398
CEPH Father N/A GM12877 N/A N/A
Table 2: Genomes with high-confidence calls from the Illumina Platinum
Genomes Project by phasing parents and 11 children and finding variants
inherited as expected
Accounting for different representations of complex variants

• Complex variants (i.e., nearby SNPs and indels) can usually be correctly represented in multiple ways
• GA4GH Benchmarking tools account for these differences in representation
Sophisticated comparison tools (right) make a significant difference in performance metrics compared to naïve tools (left).
Example complex variant where normalization alone (e.g., with vt) does not work
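A sketch of why normalization alone is insufficient (coordinates and sequences here are invented, not from the poster's examples): trimming and left-alignment leave a complex record and its SNP+deletion decomposition as different record sets, even though replaying both onto the reference produces the same haplotype.

```python
# Trimming/left-alignment in the style of vt normalize operates per record,
# so it cannot match one complex record against a set of decomposed records.
# Replaying both onto the reference shows they describe the same haplotype.

def trim(pos, ref, alt):
    """Remove shared trailing, then leading, bases (keeping at least one each)."""
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        pos, ref, alt = pos + 1, ref[1:], alt[1:]
    return (pos, ref, alt)

def replay(window, window_start, variants):
    """Apply non-overlapping (pos, ref, alt) records right-to-left."""
    for pos, ra, aa in sorted(variants, reverse=True):
        off = pos - window_start
        assert window[off:off + len(ra)] == ra
        window = window[:off] + aa + window[off + len(ra):]
    return window

complex_rec = [(100, "CTT", "AT")]                 # one complex record
decomposed  = [(100, "C", "A"), (101, "TT", "T")]  # equivalent SNP + deletion

norm_a = sorted(trim(*v) for v in complex_rec)
norm_b = sorted(trim(*v) for v in decomposed)
print(norm_a == norm_b)                            # False: records still differ
print(replay("GCTTG", 99, complex_rec) ==
      replay("GCTTG", 99, decomposed))             # True: same haplotype
```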