Genome in a Bottle: Tools for
Using NIST Reference Materials
Next Generation Diagnostics Summit Short Course
August 2014
Justin Zook, Marc Salit, and the Genome in a Bottle
Consortium
Learning Objectives
• How can Genome in a Bottle Reference Materials help with validating NGS assays?
• Comparing your variant calls to high-confidence calls
• Tools available for understanding potential false positives and false negatives
• Examples of how labs are using our high-confidence calls
NIST-hosted
Genome in a Bottle Consortium
• Infrastructure for performance assessment of NGS
– support science-based regulatory oversight
• No widely accepted set of metrics to characterize the fidelity of variant calls from NGS…
• Genome in a Bottle Consortium is developing standards to address this…
– human genomes as Reference Materials (RMs)
• characterized and disseminated by NIST
– tools and methods to use these RMs
• common sequencing instruments
• bioinformatics workflows
http://genomeinabottle.org
Whole-genome sequencing technologies disagree about hundreds of thousands of variants
[Venn diagram: SNPs detected by Platform #1, Platform #2, and Platform #3. Each region shows # SNPs (% of SNPs detected by any platform).]
Region counts: 3,198,316 (80.05%); 125,574 (3.14%); 230,311 (5.76%); 121,440 (3.04%); 208,038 (5.21%); 71,944 (1.80%); 39,604 (0.99%)
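The percentages on this slide can be checked with a few lines of arithmetic. The sketch below sums the seven region counts and recomputes each percentage. It assumes the largest region (80.05%) is the one shared by all three platforms; the text alone does not say which platform overlap each of the other counts belongs to.

```python
# SNP counts for the seven regions of the three-platform Venn diagram.
# Assumption: the largest count is the region shared by all three platforms.
region_counts = [3_198_316, 125_574, 230_311, 121_440, 208_038, 71_944, 39_604]

total = sum(region_counts)      # SNPs detected by any platform
largest = max(region_counts)    # assumed three-way concordant region
discordant = total - largest    # SNPs not reported by all three platforms

for count in region_counts:
    print(f"{count:>9,d}  {100 * count / total:5.2f}%")
print(f"total = {total:,d}; not shared by all three = {discordant:,d}")
```

Under that assumption, the discordant remainder is what the slide title refers to as hundreds of thousands of variants.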
Bioinformatics programs also disagree
O’Rawe et al. Genome Medicine 2013, 5:28
Measurement Process
Sample
gDNA isolation
Library Prep
Sequencing
Alignment/Mapping
Variant Calling
Confidence Estimates
Downstream Analysis
• gDNA reference materials will be developed to characterize performance of a part of the process
– materials will be certified for their variants against a reference sequence, with confidence estimates
NIST Human Genome RMs in the pipeline
• All are 10 µg samples of DNA isolated from multistage, large-growth cell cultures
– all are intended to act as stable, homogeneous references suitable for use in regulated applications
– all genomes also available from Coriell repository
• Pilot Genome
– ~8400 tubes
• Ashkenazim Jewish Trio
– ~10000 son; ~2500 each parent
• Asian Trio
– ~10000 son; parents not yet planned as NIST RM
Goals for Data to Accompany RM
• ~0 false positive AND false negative calls in
confident regions
• Include as much of the genome as possible in
the confident regions (i.e., don’t just take the
intersection)
• Avoid bias towards any particular platform
– take advantage of strengths of each platform
• Avoid bias towards any particular
bioinformatics algorithms
Integration Methods to Establish
Reference Variant Calls
Candidate variants
Concordant variants
Find characteristics of bias
Arbitrate using evidence of bias
Confidence Level
Zook et al., Nature Biotechnology, 2014.
Assigning confidence to genotypes
High-confidence sites
• Sequencing/bioinformatics methods agree or we understand the biases causing disagreement
• At least some methods have no evidence of bias
• Inherited as expected
Less confident sites
• In a region known to be difficult for current technologies
• State reasons for lower confidence
• If a site is near a low-confidence site, make it low confidence
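The last rule (a site near a low-confidence site becomes low confidence) can be illustrated with a toy sketch. The function name, the data layout, and the 10 bp window are all hypothetical choices made for illustration; the slides do not state the distance actually used.

```python
def propagate_low_confidence(sites, window=10):
    """Downgrade sites that lie close to a low-confidence site.

    sites: list of (position, is_high_confidence) on one chromosome.
    window: hypothetical distance in bp; not a value taken from the slides.
    """
    low_positions = [pos for pos, high in sites if not high]
    result = []
    for pos, high in sites:
        near_low = any(abs(pos - low) <= window for low in low_positions)
        result.append((pos, high and not near_low))
    return result


sites = [(100, True), (105, False), (120, True), (500, True)]
flagged = propagate_low_confidence(sites, window=10)
print(flagged)
```

Here the site at position 100 is downgraded because it lies 5 bp from the low-confidence site at 105, while the sites at 120 and 500 stay high confidence.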
Reasons we exclude regions from high-confidence set
Challenges with assessing performance
• Not all variant types are equal
• Not all regions of the genome are equal
– Homopolymers, STRs, duplications
– Can be similar or different in different genomes
• Labeling difficult variants as uncertain leads to higher apparent accuracy when assessing performance
• Genotypes fall in 3+ categories (not positive/negative)
– standard diagnostic accuracy measures not well posed
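To see why three genotype categories do not map cleanly onto positive/negative, it can help to tabulate truth against calls per genotype. The sketch below is a minimal illustration with made-up sites; the function name and the site keys are hypothetical.

```python
from collections import Counter

GENOTYPES = ("hom-ref", "het", "hom-alt")


def genotype_confusion(truth, calls):
    """Count (truth genotype, called genotype) pairs for sites present in both."""
    return Counter((truth[site], calls[site]) for site in truth if site in calls)


# Made-up example sites, for illustration only.
truth = {"chr1:100": "het", "chr1:200": "hom-alt", "chr1:300": "hom-ref", "chr1:400": "het"}
calls = {"chr1:100": "het", "chr1:200": "het", "chr1:300": "hom-ref", "chr1:400": "hom-ref"}

table = genotype_confusion(truth, calls)
for t in GENOTYPES:
    print(f"{t:>8}", [table[(t, c)] for c in GENOTYPES])
```

The site at chr1:200 is a variant in both truth and calls but with the wrong genotype, so whether it counts as a true positive depends on a convention that a two-category measure leaves open.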
Preliminary uses of high-confidence
NIST-GIAB genotypes for NA12878
• NIST has released several versions of high-confidence genotypes for its pilot RM
• These data are presently being used for benchmarking
– prior to release of RMs
– SNPs & indels
• ~77% of the genome
NIST Plays a Role in the First FDA Authorization for
Next-Generation Sequencer
November 20, 2013
Integrating NIST Call Sets into a Validation Workflow
Validation Report
False Positive Ratio: FPR = FP/(FP + TN)
False Discovery Rate: FDR = FP/(FP + TP)
Sensitivity: Sens. = TP/(TP + FN)
Specificity: Spec. = TN/(FP + TN)
Balanced Accuracy: (Sens. + Spec.)/2
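The five metrics in the validation report follow directly from the four counts. A minimal sketch is given below; the function name and the example counts are illustrative and not taken from any report, and zero denominators are not handled.

```python
def validation_metrics(tp, fp, tn, fn):
    """Compute the validation-report metrics from confusion counts."""
    sens = tp / (tp + fn)
    spec = tn / (fp + tn)
    return {
        "FPR": fp / (fp + tn),            # false positive ratio
        "FDR": fp / (fp + tp),            # false discovery rate
        "sensitivity": sens,
        "specificity": spec,
        "balanced_accuracy": (sens + spec) / 2,
    }


# Hypothetical counts, for illustration only.
metrics = validation_metrics(tp=90, fp=10, tn=890, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Note that FPR and specificity always sum to 1, whereas FDR depends on how many true variants are present and so changes with the size of the region tested.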
GCAT – Interactive Performance Metrics
• NIST is working with GCAT to use our high-confidence variant calls
• Assess performance of many combinations of mappers and variant callers
• Currently assesses only exome sequencing
• www.bioplanet.com/gcat
GCAT Tests
GCAT Variant Calling Tests
Pre-run Tests
Upload your own variant calls
GCAT – Upload your own exome calls
Freebayes SNP calls changed very little in 2013
http://www.bioplanet.com/gcat/reports/1933-westleouzm/variant-calls/illumina-100bp-pe-exome-150x/bwamem-freebayes-0-9-10-131226/compare-1934-akckizzzfr-1931-laqgzjytqw-1935-xwckffckoa/snp/group-quality
Freebayes indel calls improved in 2013
http://www.bioplanet.com/gcat/reports/1933-westleouzm/variant-calls/illumina-100bp-pe-exome-150x/bwamem-freebayes-0-9-10-131226/compare-1934-akckizzzfr-1931-laqgzjytqw-1935-xwckffckoa/indel/group-quality
Background
• Clinical laboratory: Division of Genomic Diagnostics, certified by regulatory agencies (CAP).
• CWES test requires stringent validation per CAP criteria to establish performance metrics of the test.
Utilizing NIST data in validation of CWES Test
• Sequence and call variants of NA12878 at CHOP
• CHOP ROI: Agilent SureSelect V5+ (SSV5+) baits file
• Compare CHOP dataset to NIST data set for concordance
NIST* Data Set Details:
• High-quality reference data set on NA12878 (Dec. 2013)
• NIST’s high-confidence Regions of Interest (ROIs)
• Variants called in 219,222 regions on hg19 assembly
*NIST: National Institute of Standards and Technology
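The comparison step can be sketched as a set comparison restricted to positions that fall inside both the NIST high-confidence regions and the lab's ROI. Everything below is a simplified illustration: the function names, the data layout, and the example variants are hypothetical, and exact matching on (chrom, pos, ref, alt) ignores the indel representation differences that the dedicated comparison tools listed later in this deck are built to handle.

```python
def in_regions(chrom, pos, regions):
    """True if a 1-based VCF position lies in any 0-based, half-open BED interval."""
    return any(start < pos <= end for start, end in regions.get(chrom, ()))


def compare_calls(test_calls, reference_calls, high_conf, roi):
    """Classify variants as TP/FP/FN inside (high-confidence regions AND ROI)."""
    def keep(variant):
        chrom, pos = variant[0], variant[1]
        return in_regions(chrom, pos, high_conf) and in_regions(chrom, pos, roi)

    test = {v for v in test_calls if keep(v)}
    ref = {v for v in reference_calls if keep(v)}
    return {"TP": test & ref, "FP": test - ref, "FN": ref - test}


# Made-up intervals and variants, for illustration only.
high_conf = {"chr1": [(0, 1000)]}
roi = {"chr1": [(100, 600)]}
reference = {("chr1", 150, "A", "G"), ("chr1", 300, "C", "T"), ("chr1", 900, "G", "A")}
lab_calls = {("chr1", 150, "A", "G"), ("chr1", 450, "T", "C"), ("chr1", 950, "G", "C")}

result = compare_calls(lab_calls, reference, high_conf, roi)
print({k: sorted(v) for k, v in result.items()})
```

Variants at positions 900 and 950 are ignored because they fall outside the ROI, which is why the choice of regions directly affects the reported counts.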
Analytical Validation of Clinical
Whole-Exome Sequencing (CWES) Test
SENSITIVITY /SPECIFICITY
RefGene +/- 15bp (SSV5+)
CHOP NIST
TP: SNVs 18480; INDELs 396
FP: SNVs 26; INDELs 3
FN: SNVs 63; INDELs 30
FP: False Positive
TP: True Positive
FN: False Negative
TN: True Negative
Metric                              SNVs     INDELs
Sensitivity (TP/(TP+FN))            99.66%   92.96%
Specificity (TN/(TN+FP))            ~100%    ~100%
FPR (FP/(FP+TN))                    0.02%    0.08%
Accuracy ((TP+TN)/(TP+TN+FP+FN))    ~100%    ~100%
TN = NIST highly confident regions – CHOP ROIs
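As a check on the reported figures, sensitivity can be recomputed from the TP and FN counts on this slide. The sketch below also computes the false discovery rate as FP/(FP + TP), the definition given on the validation-workflow slide. TN is not given as a count, so specificity and accuracy are not recomputed here.

```python
# TP/FP/FN counts as reported on this slide.
counts = {
    "SNVs":   {"TP": 18480, "FP": 26, "FN": 63},
    "INDELs": {"TP": 396,   "FP": 3,  "FN": 30},
}

summary = {}
for variant_type, c in counts.items():
    sensitivity = c["TP"] / (c["TP"] + c["FN"])
    fdr = c["FP"] / (c["FP"] + c["TP"])   # FDR = FP/(FP + TP)
    summary[variant_type] = (sensitivity, fdr)
    print(f"{variant_type}: sensitivity {sensitivity:.2%}, FDR {fdr:.2%}")
```

The recomputed sensitivities match the 99.66% and 92.96% in the table. The FDR values computed this way are larger than the 0.02% and 0.08% row, whose formula has TN in the denominator.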
Further analysis on presumptive 93 FNs and 29 FPs
93 FNs: 63 SNVs, 30 INDELs
29 FPs: 26 SNVs, 3 INDELs
Using the GeT-RM Browser
• http://www.ncbi.nlm.nih.gov/variation/tools/get-rm/
• Allows visualization of questionable calls
GeT-RM Load alignments for visualization
Chr6:151669820 Chr6:151669828
Difficult site in homopolymer in intron of gene AKAP12
Chr1:1666303
SNP in Gene SLC35E2, which is also in a pseudogene and a segmental duplication
[Browser track labels: Segmental Duplication, Pseudogene, Structural Variant]
Feedback from MoCha lab in NCI
• We built a targeted amplicon NGS assay for detecting mutations in clinical tumor specimens
• To assess the assay’s specificity, we compared 84 runs of CEPH NA12878 data from our assay with NIST’s consensus variant list (VCF v2.15)
• We observed high overall concordance, with a few FP variants in homopolymeric regions unique to our platform
• We concluded that NIST GIAB is a useful reference standard to evaluate assay specificity
Using Genome in a Bottle calls to
benchmark clinical exome sequencing
at Mount Sinai School of Medicine
“We evaluate a set of
NA12878 technical replicates
against GIAB for each new
pipeline version.”
Benchmarking somatic variant calling
at Qiagen
HSPH – Brad Chapman
Comparing variant callers
http://bcbio.wordpress.com/2013/10/21/updated-comparison-of-variant-detection-methods-ensemble-freebayes-and-minimal-bam-preparation-pipelines/
NextSeq: New Chemistry – Does it work?
Whole Genome Metrics                               NextSeq500          HiSeq2500
% Genome Covered (>= 10X in Q20 bases)             96%                 96%
Mean Coverage in Q20 Bases                         28.3X               31.8X
SNPs Called (% dbSNP 129)                          3,643,998 (89%)     3,664,014 (88%)
InDels Called (% dbSNP 129)                        646,907 (65.7%)     686,547 (64.5%)
Genome in a Bottle SNP Sensitivity & Precision     99.07% | 99.04%     99.25% | 99.90%
Genome in a Bottle Indel Sensitivity & Precision   86.90% | 98.85%     93.29% | 97.54%
[Figure: NextSeq 500: Genomic Coverage in High Quality Bases. x-axis: Coverage in Bases with MQ>=20 and Q>=20; y-axis: Proportion of Genome at Coverage. Mean: 28.33X; Fraction at 2/3 Mean: 0.9]
[Figure: HiSeq 2000: Genomic Coverage in High Quality Bases. x-axis: Coverage in Bases with MQ>=20 and Q>=20; y-axis: Proportion of Genome at Coverage. Mean: 31.86X; Fraction at 2/3 Mean: 0.91]
[Figure: Normalized Coverage vs. GC Content, by Platform (HiSeq 2000, NextSeq 500)]
Ion Benchmarking I
Ion Benchmarking II
Command-line tools for variant
benchmarking
• USeq VCFComparator
– http://sourceforge.net/projects/useq/
• RTG vcfeval
– ftp://ftp-trace.ncbi.nih.gov/giab/ftp/tools/RTG/
• bcbio.variation
– http://bcbio.wordpress.com/2013/05/06/framework-for-evaluating-variant-detection-methods-comparison-of-aligners-and-callers/
• SMaSH
– http://smash.cs.berkeley.edu/
How Can I Get Involved?
• Use our integrated SNP/indel
genotypes for NA12878 and give
us feedback
– Cells and DNA currently available
from Coriell
– NIST RM available late 2014
• Sequencing/analyzing the new
Genome in a Bottle samples
• Help with Structural Variant calls
• Help with analyzing data from
long-read technologies
• Attend our biannual workshops
(January in CA, August in MD)
• Help develop methods to
measure performance using our
well-characterized genomes
http://genomeinabottle.org
Email:
Justin Zook - jzook@nist.gov
Marc Salit – salit@nist.gov
Slides on slideshare at:
http://www.slideshare.net/GenomeInABottle

More Related Content

What's hot

What's hot (20)

171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshop
 
2017 agbt benchmarking_poster
2017 agbt benchmarking_poster2017 agbt benchmarking_poster
2017 agbt benchmarking_poster
 
Sept2016 plenary nist_intro
Sept2016 plenary nist_introSept2016 plenary nist_intro
Sept2016 plenary nist_intro
 
Aug2015 salit standards architecture
Aug2015 salit standards architectureAug2015 salit standards architecture
Aug2015 salit standards architecture
 
161115 precision fda giab
161115 precision fda giab161115 precision fda giab
161115 precision fda giab
 
Aug2015 horizon diagnostics
Aug2015 horizon diagnosticsAug2015 horizon diagnostics
Aug2015 horizon diagnostics
 
Jan2016 horizon GIAB
Jan2016 horizon GIABJan2016 horizon GIAB
Jan2016 horizon GIAB
 
170120 giab stanford genetics seminar
170120 giab stanford genetics seminar170120 giab stanford genetics seminar
170120 giab stanford genetics seminar
 
How giab fits in the rest of the world seqc2 tumor normal
How giab fits in the rest of the world   seqc2 tumor normalHow giab fits in the rest of the world   seqc2 tumor normal
How giab fits in the rest of the world seqc2 tumor normal
 
GIAB Technical Germline Benchmark roadmap discussion
GIAB Technical Germline Benchmark roadmap discussionGIAB Technical Germline Benchmark roadmap discussion
GIAB Technical Germline Benchmark roadmap discussion
 
Giab ashg webinar 160224
Giab ashg webinar 160224Giab ashg webinar 160224
Giab ashg webinar 160224
 
2016 ashg giab poster
2016 ashg giab poster2016 ashg giab poster
2016 ashg giab poster
 
Giab aug2015 intro and update 150821.pptx
Giab aug2015 intro and update 150821.pptxGiab aug2015 intro and update 150821.pptx
Giab aug2015 intro and update 150821.pptx
 
GIAB Sep2016 Lightning megan cleveland targeted seq
GIAB Sep2016 Lightning megan cleveland targeted seqGIAB Sep2016 Lightning megan cleveland targeted seq
GIAB Sep2016 Lightning megan cleveland targeted seq
 
GIAB GRC Workshop slides
GIAB GRC Workshop slidesGIAB GRC Workshop slides
GIAB GRC Workshop slides
 
160628 giab for festival of genomics
160628 giab for festival of genomics160628 giab for festival of genomics
160628 giab for festival of genomics
 
Giab agbt small_var_2020
Giab agbt small_var_2020Giab agbt small_var_2020
Giab agbt small_var_2020
 
ASHG 2015 Genome in a bottle
ASHG 2015 Genome in a bottleASHG 2015 Genome in a bottle
ASHG 2015 Genome in a bottle
 
170326 giab abrf
170326 giab abrf170326 giab abrf
170326 giab abrf
 
Ga4gh 2019 - Assuring data quality with benchmarking tools from GIAB and GA4GH
Ga4gh 2019 - Assuring data quality with benchmarking tools from GIAB and GA4GHGa4gh 2019 - Assuring data quality with benchmarking tools from GIAB and GA4GH
Ga4gh 2019 - Assuring data quality with benchmarking tools from GIAB and GA4GH
 

Similar to Tools for Using NIST Reference Materials

140128 use cases of giab RMs
140128 use cases of giab RMs140128 use cases of giab RMs
140128 use cases of giab RMs
GenomeInABottle
 
Identity testing by NGS as a means of risk mitigation for viral gene therapies
Identity testing by NGS as a means of risk mitigation for viral gene therapiesIdentity testing by NGS as a means of risk mitigation for viral gene therapies
Identity testing by NGS as a means of risk mitigation for viral gene therapies
MilliporeSigma
 
Identity testing by NGS as a means of risk mitigation for viral gene therapies
Identity testing by NGS as a means of risk mitigation for viral gene therapiesIdentity testing by NGS as a means of risk mitigation for viral gene therapies
Identity testing by NGS as a means of risk mitigation for viral gene therapies
Merck Life Sciences
 
140127 GIAB update and NIST high-confidence calls
140127 GIAB update and NIST high-confidence calls140127 GIAB update and NIST high-confidence calls
140127 GIAB update and NIST high-confidence calls
GenomeInABottle
 

Similar to Tools for Using NIST Reference Materials (20)

150224 giab 30 min generic slides
150224 giab 30 min generic slides150224 giab 30 min generic slides
150224 giab 30 min generic slides
 
GIAB Integrating multiple technologies to form benchmark SVs 180517
GIAB Integrating multiple technologies to form benchmark SVs 180517GIAB Integrating multiple technologies to form benchmark SVs 180517
GIAB Integrating multiple technologies to form benchmark SVs 180517
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshop
 
Giab for jax long read 190917
Giab for jax long read 190917Giab for jax long read 190917
Giab for jax long read 190917
 
Large Scale PCA Analysis in SVS
Large Scale PCA Analysis in SVSLarge Scale PCA Analysis in SVS
Large Scale PCA Analysis in SVS
 
GIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM ForumGIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM Forum
 
140128 use cases of giab RMs
140128 use cases of giab RMs140128 use cases of giab RMs
140128 use cases of giab RMs
 
Genome in a bottle april 30 2015 hvp Leiden
Genome in a bottle april 30 2015 hvp LeidenGenome in a bottle april 30 2015 hvp Leiden
Genome in a bottle april 30 2015 hvp Leiden
 
Identity testing by NGS as a means of risk mitigation for viral gene therapies
Identity testing by NGS as a means of risk mitigation for viral gene therapiesIdentity testing by NGS as a means of risk mitigation for viral gene therapies
Identity testing by NGS as a means of risk mitigation for viral gene therapies
 
Identity testing by NGS as a means of risk mitigation for viral gene therapies
Identity testing by NGS as a means of risk mitigation for viral gene therapiesIdentity testing by NGS as a means of risk mitigation for viral gene therapies
Identity testing by NGS as a means of risk mitigation for viral gene therapies
 
150219 agbt giab_poster_marc
150219 agbt giab_poster_marc150219 agbt giab_poster_marc
150219 agbt giab_poster_marc
 
140127 GIAB update and NIST high-confidence calls
140127 GIAB update and NIST high-confidence calls140127 GIAB update and NIST high-confidence calls
140127 GIAB update and NIST high-confidence calls
 
GIAB update for GRC GIAB workshop 191015
GIAB update for GRC GIAB workshop 191015GIAB update for GRC GIAB workshop 191015
GIAB update for GRC GIAB workshop 191015
 
Genome in a bottle for next gen dx v2 180821
Genome in a bottle for next gen dx v2 180821Genome in a bottle for next gen dx v2 180821
Genome in a bottle for next gen dx v2 180821
 
Genome in a bottle for amp GeT-RM 181030
Genome in a bottle for amp GeT-RM 181030Genome in a bottle for amp GeT-RM 181030
Genome in a bottle for amp GeT-RM 181030
 
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
 
Genome in a bottle for ashg grc giab workshop 181016
Genome in a bottle for ashg grc giab workshop 181016Genome in a bottle for ashg grc giab workshop 181016
Genome in a bottle for ashg grc giab workshop 181016
 
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
 
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
 
Functional Predictions and Conservation Scores in VSClinical
Functional Predictions and Conservation Scores in VSClinicalFunctional Predictions and Conservation Scores in VSClinical
Functional Predictions and Conservation Scores in VSClinical
 

More from GenomeInABottle

More from GenomeInABottle (19)

2023 GIAB AMP Update
2023 GIAB AMP Update2023 GIAB AMP Update
2023 GIAB AMP Update
 
GIAB Tumor Normal ASHG 2023
GIAB Tumor Normal ASHG 2023GIAB Tumor Normal ASHG 2023
GIAB Tumor Normal ASHG 2023
 
Stratomod ASHG 2023
Stratomod ASHG 2023Stratomod ASHG 2023
Stratomod ASHG 2023
 
GIAB_ASHG_JZook_2023.pdf
GIAB_ASHG_JZook_2023.pdfGIAB_ASHG_JZook_2023.pdf
GIAB_ASHG_JZook_2023.pdf
 
Benchmarking with GIAB 220907
Benchmarking with GIAB 220907Benchmarking with GIAB 220907
Benchmarking with GIAB 220907
 
GIAB ASHG 2019 Structural Variant poster
GIAB ASHG 2019 Structural Variant posterGIAB ASHG 2019 Structural Variant poster
GIAB ASHG 2019 Structural Variant poster
 
GIAB GRC Workshop ASHG 2019 Billy Rowell Evaluation of v4 with CCS GATK
GIAB GRC Workshop ASHG 2019 Billy Rowell Evaluation of v4 with CCS GATKGIAB GRC Workshop ASHG 2019 Billy Rowell Evaluation of v4 with CCS GATK
GIAB GRC Workshop ASHG 2019 Billy Rowell Evaluation of v4 with CCS GATK
 
GIAB ASHG 2019 Small Variant poster
GIAB ASHG 2019 Small Variant posterGIAB ASHG 2019 Small Variant poster
GIAB ASHG 2019 Small Variant poster
 
GRC GIAB Workshop ASHG 2019 Small Variant Benchmark
GRC GIAB Workshop ASHG 2019 Small Variant BenchmarkGRC GIAB Workshop ASHG 2019 Small Variant Benchmark
GRC GIAB Workshop ASHG 2019 Small Variant Benchmark
 
Jason Chin MHC diploid assembly
Jason Chin MHC diploid assemblyJason Chin MHC diploid assembly
Jason Chin MHC diploid assembly
 
GIAB and long reads for bio it world 190417
GIAB and long reads for bio it world 190417GIAB and long reads for bio it world 190417
GIAB and long reads for bio it world 190417
 
New methods diploid assembly with graphs
New methods   diploid assembly with graphsNew methods   diploid assembly with graphs
New methods diploid assembly with graphs
 
New data from giab genomes pacbio ccs
New data from giab genomes   pacbio ccsNew data from giab genomes   pacbio ccs
New data from giab genomes pacbio ccs
 
New data from giab genomes strand-seq
New data from giab genomes   strand-seqNew data from giab genomes   strand-seq
New data from giab genomes strand-seq
 
New data from giab genomes promethion
New data from giab genomes   promethionNew data from giab genomes   promethion
New data from giab genomes promethion
 
New data from giab genomes intro and ultralong nanopore
New data from giab genomes   intro and ultralong nanoporeNew data from giab genomes   intro and ultralong nanopore
New data from giab genomes intro and ultralong nanopore
 
How giab fits in the rest of the world mdic somatic reference samples
How giab fits in the rest of the world   mdic somatic reference samplesHow giab fits in the rest of the world   mdic somatic reference samples
How giab fits in the rest of the world mdic somatic reference samples
 
How giab fits in the rest of the world telomere to telomere consortium
How giab fits in the rest of the world   telomere to telomere consortiumHow giab fits in the rest of the world   telomere to telomere consortium
How giab fits in the rest of the world telomere to telomere consortium
 
How giab fits in the rest of the world human genome structural variation co...
How giab fits in the rest of the world   human genome structural variation co...How giab fits in the rest of the world   human genome structural variation co...
How giab fits in the rest of the world human genome structural variation co...
 

Recently uploaded

Call Girl In Pune 👉 Just CALL ME: 9352988975 💋 Call Out Call Both With High p...
Call Girl In Pune 👉 Just CALL ME: 9352988975 💋 Call Out Call Both With High p...Call Girl In Pune 👉 Just CALL ME: 9352988975 💋 Call Out Call Both With High p...
Call Girl In Pune 👉 Just CALL ME: 9352988975 💋 Call Out Call Both With High p...
chetankumar9855
 
Russian Call Girls Lucknow Just Call 👉👉7877925207 Top Class Call Girl Service...
Russian Call Girls Lucknow Just Call 👉👉7877925207 Top Class Call Girl Service...Russian Call Girls Lucknow Just Call 👉👉7877925207 Top Class Call Girl Service...
Russian Call Girls Lucknow Just Call 👉👉7877925207 Top Class Call Girl Service...
adilkhan87451
 
💚Call Girls In Amritsar 💯Anvi 📲🔝8725944379🔝Amritsar Call Girl No💰Advance Cash...
💚Call Girls In Amritsar 💯Anvi 📲🔝8725944379🔝Amritsar Call Girl No💰Advance Cash...💚Call Girls In Amritsar 💯Anvi 📲🔝8725944379🔝Amritsar Call Girl No💰Advance Cash...
💚Call Girls In Amritsar 💯Anvi 📲🔝8725944379🔝Amritsar Call Girl No💰Advance Cash...
Sheetaleventcompany
 

Recently uploaded (20)

8980367676 Call Girls In Ahmedabad Escort Service Available 24×7 In Ahmedabad
8980367676 Call Girls In Ahmedabad Escort Service Available 24×7 In Ahmedabad8980367676 Call Girls In Ahmedabad Escort Service Available 24×7 In Ahmedabad
8980367676 Call Girls In Ahmedabad Escort Service Available 24×7 In Ahmedabad
 
Call Girls Service Jaipur {9521753030 } ❤️VVIP BHAWNA Call Girl in Jaipur Raj...
Call Girls Service Jaipur {9521753030 } ❤️VVIP BHAWNA Call Girl in Jaipur Raj...Call Girls Service Jaipur {9521753030 } ❤️VVIP BHAWNA Call Girl in Jaipur Raj...
Call Girls Service Jaipur {9521753030 } ❤️VVIP BHAWNA Call Girl in Jaipur Raj...
 
Call Girls Hyderabad Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Hyderabad Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Hyderabad Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Hyderabad Just Call 8250077686 Top Class Call Girl Service Available
 
Models Call Girls In Hyderabad 9630942363 Hyderabad Call Girl & Hyderabad Esc...
Models Call Girls In Hyderabad 9630942363 Hyderabad Call Girl & Hyderabad Esc...Models Call Girls In Hyderabad 9630942363 Hyderabad Call Girl & Hyderabad Esc...
Models Call Girls In Hyderabad 9630942363 Hyderabad Call Girl & Hyderabad Esc...
 
VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋
VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋
VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋
 
Top Rated Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...
Top Rated  Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...Top Rated  Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...
Top Rated Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...
 
Call Girls Rishikesh Just Call 9667172968 Top Class Call Girl Service Available
Call Girls Rishikesh Just Call 9667172968 Top Class Call Girl Service AvailableCall Girls Rishikesh Just Call 9667172968 Top Class Call Girl Service Available
Call Girls Rishikesh Just Call 9667172968 Top Class Call Girl Service Available
 
Top Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any Time
Top Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any TimeTop Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any Time
Top Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any Time
 
💕SONAM KUMAR💕Premium Call Girls Jaipur ↘️9257276172 ↙️One Night Stand With Lo...
💕SONAM KUMAR💕Premium Call Girls Jaipur ↘️9257276172 ↙️One Night Stand With Lo...💕SONAM KUMAR💕Premium Call Girls Jaipur ↘️9257276172 ↙️One Night Stand With Lo...
💕SONAM KUMAR💕Premium Call Girls Jaipur ↘️9257276172 ↙️One Night Stand With Lo...
 
Call Girl In Pune 👉 Just CALL ME: 9352988975 💋 Call Out Call Both With High p...
Call Girl In Pune 👉 Just CALL ME: 9352988975 💋 Call Out Call Both With High p...Call Girl In Pune 👉 Just CALL ME: 9352988975 💋 Call Out Call Both With High p...
Call Girl In Pune 👉 Just CALL ME: 9352988975 💋 Call Out Call Both With High p...
 
Russian Call Girls Lucknow Just Call 👉👉7877925207 Top Class Call Girl Service...
Russian Call Girls Lucknow Just Call 👉👉7877925207 Top Class Call Girl Service...Russian Call Girls Lucknow Just Call 👉👉7877925207 Top Class Call Girl Service...
Russian Call Girls Lucknow Just Call 👉👉7877925207 Top Class Call Girl Service...
 
Independent Call Girls In Jaipur { 8445551418 } ✔ ANIKA MEHTA ✔ Get High Prof...
Independent Call Girls In Jaipur { 8445551418 } ✔ ANIKA MEHTA ✔ Get High Prof...Independent Call Girls In Jaipur { 8445551418 } ✔ ANIKA MEHTA ✔ Get High Prof...
Independent Call Girls In Jaipur { 8445551418 } ✔ ANIKA MEHTA ✔ Get High Prof...
 
Call Girls Mysore Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Mysore Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Mysore Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Mysore Just Call 8250077686 Top Class Call Girl Service Available
 
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...
 
Call Girls Service Jaipur {8445551418} ❤️VVIP BHAWNA Call Girl in Jaipur Raja...
Call Girls Service Jaipur {8445551418} ❤️VVIP BHAWNA Call Girl in Jaipur Raja...Call Girls Service Jaipur {8445551418} ❤️VVIP BHAWNA Call Girl in Jaipur Raja...
Call Girls Service Jaipur {8445551418} ❤️VVIP BHAWNA Call Girl in Jaipur Raja...
 
Saket * Call Girls in Delhi - Phone 9711199012 Escorts Service at 6k to 50k a...
Saket * Call Girls in Delhi - Phone 9711199012 Escorts Service at 6k to 50k a...Saket * Call Girls in Delhi - Phone 9711199012 Escorts Service at 6k to 50k a...
Saket * Call Girls in Delhi - Phone 9711199012 Escorts Service at 6k to 50k a...
 
💚Call Girls In Amritsar 💯Anvi 📲🔝8725944379🔝Amritsar Call Girl No💰Advance Cash...
💚Call Girls In Amritsar 💯Anvi 📲🔝8725944379🔝Amritsar Call Girl No💰Advance Cash...💚Call Girls In Amritsar 💯Anvi 📲🔝8725944379🔝Amritsar Call Girl No💰Advance Cash...
💚Call Girls In Amritsar 💯Anvi 📲🔝8725944379🔝Amritsar Call Girl No💰Advance Cash...
 
Coimbatore Call Girls in Thudiyalur : 7427069034 High Profile Model Escorts |...
Coimbatore Call Girls in Thudiyalur : 7427069034 High Profile Model Escorts |...Coimbatore Call Girls in Thudiyalur : 7427069034 High Profile Model Escorts |...
Coimbatore Call Girls in Thudiyalur : 7427069034 High Profile Model Escorts |...
 
Call Girls in Delhi Triveni Complex Escort Service(🔝))/WhatsApp 97111⇛47426
Call Girls in Delhi Triveni Complex Escort Service(🔝))/WhatsApp 97111⇛47426Call Girls in Delhi Triveni Complex Escort Service(🔝))/WhatsApp 97111⇛47426
Call Girls in Delhi Triveni Complex Escort Service(🔝))/WhatsApp 97111⇛47426
 
Top Rated Call Girls Kerala ☎ 8250092165👄 Delivery in 20 Mins Near Me
Top Rated Call Girls Kerala ☎ 8250092165👄 Delivery in 20 Mins Near MeTop Rated Call Girls Kerala ☎ 8250092165👄 Delivery in 20 Mins Near Me
Top Rated Call Girls Kerala ☎ 8250092165👄 Delivery in 20 Mins Near Me
 

Tools for Using NIST Reference Materials

  • 1. Genome in a Bottle: Tools for Using NIST Reference Materials Next Generation Diagnostics Summit Short Course August 2014 Justin Zook, Marc Salit, and the Genome in a Bottle Consortium
  • 2. Learning Objectives • How can Genome in a Bottle Reference Materials help with validating NGS assays? • Comparing your variant calls to high- confidence calls • Tools available for understanding potential false positives and false negatives • Examples of how labs are using our high- confidence calls
  • 3. NIST-hosted Genome in a Bottle Consortium • Infrastructure for performance assessment of NGS – support science-based regulatory oversight • No widely accepted set of metrics to characterize the fidelity of variant calls from NGS… • Genome in a Bottle Consortium is developing standards to address this… – human genomes as Reference Materials (RMs) • characterize and disseminate by NIST – tools and methods to use these RMs • common sequencing instruments • bioinformatics workflows. http://genomeinabottle.org
  • 4. Whole genome sequencing technologies disagree about 100,000’s of variants 3,198,316 (80.05%) 125,574 (3.14%) Platform #1 Platform #2 Platform #3 230,311 (5.76%) 121,440 (3.04%) 208,038 (5.21%) 71,944 (1.80%) 39,604 (0.99%) # SNPs (% of SNPs detected by any platform)
  • 5. Bioinformatics programs also disagree O’Rawe et al. Genome Medicine 2013, 5:28
  • 6. Measurement Process Sample gDNA isolation Library Prep Sequencing Alignment/Mapping Variant Calling Confidence Estimates Downstream Analysis • gDNA reference materials will be developed to characterize performance of a part of process – materials will be certified for their variants against a reference sequence, with confidence estimates genericmeasurementprocess
  • 7. NIST Human Genome RMs in the pipeline • All 10 ug samples of DNA isolated from multistage large growth cell cultures – all are intended to act as stable, homogeneous references suitable for use in regulated applications – all genomes also available from Coriell repository • Pilot Genome – ~8400 tubes • Ashkenazim Jewish Trio – ~10000 son; ~2500 each parent • Asian Trio – ~10000 son; parents not yet planned as NIST RM
  • 8. Goals for Data to Accompany RM • ~0 false positive AND false negative calls in confident regions • Include as much of the genome as possible in the confident regions (i.e., don’t just take the intersection) • Avoid bias towards any particular platform – take advantage of strengths of each platform • Avoid bias towards any particular bioinformatics algorithms 8
  • 9. Integration Methods to Establish Reference Variant Calls Candidate variants Concordant variants Find characteristics of bias Arbitrate using evidence of bias Confidence Level Zook et al., Nature Biotechnology, 2014.
  • 10. Assigning confidence to genotypes High-confidence sites • Sequencing/bioinformatics methods agree or we understand the biases causing disagreement • At least some methods have no evidence of bias • Inherited as expected Less confident sites • In a region known to be difficult for current technologies • State reasons for lower confidence • If a site is near a low confidence site, make it low confidence
  • 11. Reasons we exclude regions from high-confidence set
  • 12. Challenges with assessing performance • Not all variant types are equal • Not all regions of the genome are equal – Homopolymers, STRs, duplications – Can be similar or different in different genomes • Labeling difficult variants as uncertain leads to higher apparent accuracy when assessing performance • Genotypes fall in 3+ categories (not positive/negative) – standard diagnostic accuracy measures not well posed
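The last point on slide 12 can be illustrated with a small sketch. It tabulates truth genotype against called genotype for three genotype categories; the site coordinates, genotypes, and the `genotype_confusion` helper are hypothetical and are not part of any GIAB tool.

```python
from collections import Counter

# Genotype categories: homozygous reference, heterozygous, homozygous variant.
GENOTYPES = ("0/0", "0/1", "1/1")


def genotype_confusion(truth, calls):
    """Count (truth genotype, called genotype) pairs over sites present in both dicts."""
    table = Counter()
    for site, truth_gt in truth.items():
        if site in calls:
            table[(truth_gt, calls[site])] += 1
    return table


# Hypothetical sites keyed by (chromosome, position).
truth = {("chr1", 100): "0/1", ("chr1", 200): "1/1",
         ("chr1", 300): "0/0", ("chr1", 400): "0/1"}
calls = {("chr1", 100): "0/1", ("chr1", 200): "0/1",
         ("chr1", 300): "0/0", ("chr1", 400): "1/1"}

table = genotype_confusion(truth, calls)
for truth_gt in GENOTYPES:
    # One row per truth genotype, one column per called genotype.
    print(truth_gt, [table[(truth_gt, call_gt)] for call_gt in GENOTYPES])
```

In this toy data the site at chr1:200 is a real variant called with the wrong zygosity, so it does not fit cleanly into either "true positive" or "false positive". That is the sense in which two-class accuracy measures are not well posed for genotypes.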
  • 13. Preliminary uses of high-confidence NIST-GIAB genotypes for NA12878 • NIST has released several versions of high-confidence genotypes for its pilot RM • These data are presently being used for benchmarking – prior to release of RMs – SNPs & indels • ~77% of the genome
  • 14. NIST Plays a Role in the First FDA Authorization for Next-Generation Sequencer November 20, 2013
  • 15. Integrating NIST Call Sets into a Validation Workflow • Validation Report • False Positive Ratio: FPR = FP/(FP+TN) • False Discovery Rate: FDR = FP/(FP+TP) • Sensitivity: Sens. = TP/(TP+FN) • Specificity: Spec. = TN/(FP+TN) • Balanced Accuracy: (Sens. + Spec.)/2
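The formulas on slide 15 can be written down directly. The following is a minimal sketch, not the consortium's validation code; the `Counts` class, the `metrics` function, and the example counts are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def metrics(c):
    """Return the five validation metrics from slide 15 as a dict."""
    sens = c.tp / (c.tp + c.fn)
    spec = c.tn / (c.fp + c.tn)
    return {
        "fpr": c.fp / (c.fp + c.tn),
        "fdr": c.fp / (c.fp + c.tp),
        "sensitivity": sens,
        "specificity": spec,
        "balanced_accuracy": (sens + spec) / 2,
    }


# Hypothetical counts, not taken from any real assay.
example = metrics(Counts(tp=90, fp=10, tn=890, fn=10))
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

Note that FPR and specificity always sum to 1, and that FDR depends only on called variants (FP and TP), so it can be computed even when the number of true negatives is hard to define.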
  • 16. GCAT – Interactive Performance Metrics • NIST is working with GCAT to use our high-confidence variant calls • Assess performance of many combinations of mappers and variant callers • Currently assesses only exome sequencing • www.bioplanet.com/gcat
  • 18. GCAT Variant Calling Tests Pre-run Tests Upload your own variant calls
  • 19. GCAT – Upload your own exome calls
  • 20. Freebayes SNP calls changed very little in 2013 http://www.bioplanet.com/gcat/reports/1933-westleouzm/variant-calls/illumina-100bp-pe-exome-150x/bwamem-freebayes-0-9-10-131226/compare-1934-akckizzzfr-1931-laqgzjytqw-1935-xwckffckoa/snp/group-quality
  • 21. Freebayes indel calls improved in 2013 http://www.bioplanet.com/gcat/reports/1933-westleouzm/variant-calls/illumina-100bp-pe-exome-150x/bwamem-freebayes-0-9-10-131226/compare-1934-akckizzzfr-1931-laqgzjytqw-1935-xwckffckoa/indel/group-quality
  • 22. Analytical Validation of Clinical Whole-Exome Sequencing (CWES) Test • Background • Clinical laboratory – Division of Genomic Diagnostics, certified by regulatory agencies (CAP) • CWES test requires stringent validation per CAP criteria to establish performance metrics of the test • Utilizing NIST* data in validation of CWES Test • Sequence and call variants of NA12878 at CHOP • CHOP ROI: Agilent SureSelect V5+ (SSV5+) baits file • Compare CHOP data set to NIST data set for concordance • NIST Data Set Details: high-quality reference data set on NA12878 (Dec. 2013); NIST’s high-confidence regions of interest (ROIs); variants called in 219,222 regions on hg19 assembly • *NIST: National Institute of Standards and Technology
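The comparison described on slide 22 can be sketched as follows, under strong simplifying assumptions: a single chromosome, variants as hypothetical `(position, ref, alt)` tuples, regions as sorted, non-overlapping half-open `(start, end)` intervals, and exact matching of position and alleles. The function names and all data are illustrative and are not CHOP's or NIST's code.

```python
from bisect import bisect_right


def in_regions(pos, regions):
    """True if pos lies in one of the sorted, non-overlapping [start, end) regions."""
    starts = [start for start, _ in regions]
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < regions[i][1]


def compare(calls, truth, confident, roi):
    """Classify variants that fall inside both the confident regions and the ROI."""
    def keep(variant):
        return in_regions(variant[0], confident) and in_regions(variant[0], roi)

    kept_calls = {v for v in calls if keep(v)}
    kept_truth = {v for v in truth if keep(v)}
    return {
        "TP": len(kept_calls & kept_truth),
        "FP": len(kept_calls - kept_truth),
        "FN": len(kept_truth - kept_calls),
    }


# Hypothetical regions and variants.
confident = [(100, 200), (300, 400)]
roi = [(150, 350)]
truth = {(160, "A", "G"), (310, "C", "T"), (390, "G", "A")}  # 390 is outside the ROI
calls = {(160, "A", "G"), (170, "T", "C"), (250, "G", "C")}  # 250 is outside confident regions

result = compare(calls, truth, confident, roi)
print(result)
```

Restricting both call sets to the same regions before counting keeps calls outside the high-confidence regions from being scored as errors. Exact matching is a simplification: the same indel or complex variant can be represented in more than one way, which is one reason to prefer a dedicated comparison tool for real data.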
  • 23. SENSITIVITY / SPECIFICITY, RefGene +/- 15bp (SSV5+), CHOP vs. NIST • TP: 18480 SNVs, 396 INDELs • FP: 26 SNVs, 3 INDELs • FN: 63 SNVs, 30 INDELs • Sensitivity (TP/(TP+FN)): SNVs 99.66%, INDELs 92.96% • Specificity (TN/(TN+FP)): SNVs ~100%, INDELs ~100% • FDR (FP/(FP+TN)): SNVs 0.02%, INDELs 0.08% • Accuracy ((TP+TN)/(TP+TN+FP+FN)): SNVs ~100%, INDELs ~100% • TN = NIST highly confident regions – CHOP ROIs • FP: False Positive; TP: True Positive; FN: False Negative; TN: True Negative
  • 24. Further analysis on presumptive 93 FNs (63 SNVs, 30 INDELs) and 29 FPs (26 SNVs, 3 INDELs)
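As a worked check, the sensitivity figures on slide 23 follow from the TP and FN counts given there. The sketch below uses only those reported counts; the slide does not state the TN count, so specificity and accuracy are not recomputed.

```python
# Counts reported on slide 23 (TP, FP, FN per variant type).
counts = {
    "SNVs": {"tp": 18480, "fp": 26, "fn": 63},
    "INDELs": {"tp": 396, "fp": 3, "fn": 30},
}

# Sensitivity = TP / (TP + FN)
sensitivity = {kind: c["tp"] / (c["tp"] + c["fn"]) for kind, c in counts.items()}

for kind, value in sensitivity.items():
    print(f"{kind}: sensitivity = {value:.2%}")
```

This gives 18480/18543 for SNVs and 396/426 for INDELs, which round to the 99.66% and 92.96% shown on the slide.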
  • 25. Using the GeT-RM Browser • http://www.ncbi.nlm.nih.gov/variation/tools/get-rm/ • Allows visualization of questionable calls
  • 26. GeT-RM Load alignments for visualization
  • 27. Chr6:151669820 Chr6:151669828 Difficult site in homopolymer in intron of gene AKAP12
  • 28. Chr1:1666303 SNP in Gene SLC35E2, which is also in a pseudogene and a segmental duplication
  • 30. Feedback from MoCha lab in NCI • We built a targeted amplicon NGS assay for detecting mutations in clinical tumor specimens • To assess the assay’s specificity, we compared 84 runs of CEPH NA12878 data from our assay with NIST’s consensus variant list (VCF v2.15) • We observed a high overall concordance, with a few FP variants in homopolymeric regions unique to our platform • We concluded that NIST GIAB is a useful reference standard to evaluate assay specificity
  • 31. Using Genome in a Bottle calls to benchmark clinical exome sequencing at Mount Sinai School of Medicine “We evaluate a set of NA12878 technical replicates against GIAB for each new pipeline version.”
  • 32. Benchmarking somatic variant calling at Qiagen
  • 33. HSPH – Brad Chapman Comparing variant callers http://bcbio.wordpress.com/2013/10/21/updated-comparison-of-variant-detection-methods-ensemble-freebayes-and-minimal-bam-preparation-pipelines/
  • 34. NextSeq: New Chemistry – Does it work? • Whole Genome Metrics (NextSeq500 vs. HiSeq2500) • % Genome Covered (>= 10X in Q20 bases): NextSeq500 96%; HiSeq2500 96% • Mean Coverage in Q20 Bases: NextSeq500 28.3X; HiSeq2500 31.8X • SNPs Called (% dbSNP 129): NextSeq500 3,643,998 (89%); HiSeq2500 3,664,014 (88%) • InDels Called (% dbSNP 129): NextSeq500 646,907 (65.7%); HiSeq2500 686,547 (64.5%) • Genome in a Bottle SNP Sensitivity | Precision: NextSeq500 99.07% | 99.04%; HiSeq2500 99.25% | 99.90% • Genome in a Bottle Indel Sensitivity | Precision: NextSeq500 86.90% | 98.85%; HiSeq2500 93.29% | 97.54% • [Figure: NextSeq 500 genomic coverage in high-quality bases (MQ>=20 and Q>=20), proportion of genome at each coverage; mean 28.33X, fraction at 2/3 mean 0.9] • [Figure: HiSeq 2000 genomic coverage in high-quality bases (MQ>=20 and Q>=20), proportion of genome at each coverage; mean 31.86X, fraction at 2/3 mean 0.91] • [Figure: normalized coverage vs. GC content for HiSeq 2000 and NextSeq 500]
  • 37. Command-line tools for variant benchmarking • USeq VCFComparator – http://sourceforge.net/projects/useq/ • RTG vcfeval – ftp://ftp-trace.ncbi.nih.gov/giab/ftp/tools/RTG/ • bcbio.variation – http://bcbio.wordpress.com/2013/05/06/framework-for-evaluating-variant-detection-methods-comparison-of-aligners-and-callers/ • SMaSH – http://smash.cs.berkeley.edu/
  • 38. How Can I Get Involved? • Use our integrated SNP/indel genotypes for NA12878 and give us feedback – Cells and DNA currently available from Coriell – NIST RM available late 2014 • Sequencing/analyzing the new Genome in a Bottle samples • Help with Structural Variant calls • Help with analyzing data from long-read technologies • Attend our biannual workshops (January in CA, August in MD) • Help develop methods to measure performance using our well-characterized genomes • http://genomeinabottle.org • Email: Justin Zook – jzook@nist.gov; Marc Salit – salit@nist.gov • Slides on SlideShare at: http://www.slideshare.net/GenomeInABottle

Editor's Notes

  1. One FN SNV is confirmed to be reference. One FP indel is confirmed to be a real indel. Three FP SNVs are confirmed to be real SNVs.