Results from Adding Long and Linked Reads
NIST hosts the Genome in a Bottle (GIAB) Consortium, which develops metrology infrastructure for characterizing human whole-genome variant detection. Consortium products include:
• Characterization of seven broadly consented human genomes, including 2 son-mother-father trios, released as Reference Materials (RMs)
• Reference data associated with the RMs: benchmark variants and genomic regions covering, for example, 87.8% of assembled bases in chromosomes 1-22 of GRCh37 for sample HG002
A limitation of the current GIAB benchmark is that it relies largely on short reads, and short-read variant callers perform poorly in genomic locations with high homology, such as segmental duplications, and in low-complexity, repeat-rich regions. We incorporated PacBio CCS long reads and 10x Genomics linked reads to generate a draft of a new GIAB benchmark. Initial results show that long and linked reads add more than 276,840 SNPs and 42,980 insertions/deletions (indels) to the benchmark, mostly in regions that are difficult to map with short reads.
Overview
Integration data for HG002
Using long and linked reads to generate a new Genome in a Bottle small variant benchmark
J. Wagner1, A. Carroll6, I.T. Fiddes3, A.M. Wenger2, W.J. Rowell2, N. Olson1, L. Harris1, J. McDaniel1, C. Xiao5, M. Salit4, J. Zook1, Genome in a Bottle Consortium
1) Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD 20899
2) Pacific Biosciences, 1305 O'Brien Drive, Menlo Park, CA 94025
3) 10x Genomics, 7068 Koll Center Parkway, Pleasanton, CA 94566
4) Joint Initiative for Metrology in Biology, Stanford, CA 94305
5) National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20894
6) Google, Inc., Mountain View, CA
Ongoing and Future work
Integration Pipeline Process
Benchmark includes more bases, variants, and segmental duplications in v4
Comparison of an Illumina RTG VCF against the v3.3.2 and v4 draft benchmark sets
• SNP FNs increase by a factor of more than 3, mostly because of new benchmark variants in difficult-to-map regions and segmental duplications
Performance in medically relevant genes in GRCh37
• The v4 draft covers more of the MHC region; see poster 1707W for details
• Outside the MHC, the five genes with the largest increase in benchmark variants from v3.3.2 to the v4 draft are TSPEAR (31), LAMA5 (28), FCGBP (18), TPSAB1 (15), and HSPG2 (13)
• Among the ACMG59 genes, the v4 draft covers variants that are not in v3.3.2: 2 more in PMS2 and 1 more each in RET, SCN5A, and TNNI3
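Ranking genes by their gain in benchmark variants is a simple counting step once each benchmark's variants have been assigned to genes. Below is a minimal Python sketch that assumes per-gene variant counts have already been tallied, for example by intersecting each benchmark VCF with gene coordinates. The function name and the toy gene names and counts are hypothetical and are not the data behind the gene list above.

```python
from collections import Counter


def genes_with_largest_gain(v3_counts: Counter, v4_counts: Counter, n: int = 5) -> list[tuple[str, int]]:
    """Return the n genes with the largest increase in benchmark variants.

    Counter subtraction keeps only positive differences, so genes whose
    count stayed the same or dropped are left out of the ranking.
    """
    return (v4_counts - v3_counts).most_common(n)


# Toy per-gene counts (hypothetical gene names and values).
v3 = Counter({"GENE_A": 10, "GENE_B": 4, "GENE_C": 7})
v4 = Counter({"GENE_A": 12, "GENE_B": 9, "GENE_C": 7, "GENE_D": 3})
top = genes_with_largest_gain(v3, v4, n=2)
print(top)  # [('GENE_B', 5), ('GENE_D', 3)]
```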
Sanger sequencing
• Performed long-range PCR before sequencing
• Confirmed 12 variants in CYP21A2, a medically relevant gene in the MHC region
• Confirmed 6 variants in PMS2
Platform | Characteristics | Alignment; variant calling
PacBio Sequel II | ~11 kbp reads; ~32x coverage | minimap2; GATK4
PacBio Sequel II | ~11 kbp reads; ~32x coverage | minimap2; DeepVariant
10x Genomics | Linked reads; ~84x coverage | Long Ranger pipeline
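[Figure: Integration pipeline process. For each input method, PASS variants (#1, #2, ...) and callable regions (#1, #2, ...) are compared site by site. Sites are classified as concordant, concordant but not callable, discordant and arbitrated, or discordant and unresolved, producing the benchmark calls (0/1 and 1/1 genotypes) and the benchmark regions.]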
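The integration step can be pictured as a per-site decision over the input methods. The following is a minimal Python sketch of one way such a rule could work, assuming each method contributes only a PASS genotype and a flag saying whether the site lies in its callable regions. The `MethodCall` structure, the rule that "no call in a callable region means homozygous reference", and the category strings are illustrative assumptions, not the GIAB integration code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MethodCall:
    """One input method's view of a single site (hypothetical structure)."""
    genotype: Optional[str]   # PASS genotype such as "0/1" or "1/1"; None if no PASS call
    in_callable_region: bool  # True if the site lies in this method's callable regions


def opinion(call: MethodCall) -> Optional[str]:
    # A PASS call is always an opinion. No call inside a callable region is read
    # as homozygous reference; no call outside callable regions is no opinion.
    if call.genotype is not None:
        return call.genotype
    return "0/0" if call.in_callable_region else None


def classify_site(calls: list[MethodCall]) -> tuple[str, Optional[str]]:
    """Return (category, benchmark genotype or None) for one site."""
    all_gts = {g for g in map(opinion, calls) if g is not None}
    callable_gts = {opinion(c) for c in calls if c.in_callable_region}
    if not all_gts:
        return "no calls", None
    if len(all_gts) == 1:
        if callable_gts:
            return "concordant", next(iter(all_gts))
        return "concordant, not callable", None
    # Methods disagree: only methods that are callable at this site arbitrate.
    if len(callable_gts) == 1:
        return "discordant, arbitrated", next(iter(callable_gts))
    return "discordant, unresolved", None
```

In this sketch a discordant site is resolved only when every method that is callable there agrees; it ignores how the benchmark regions themselves are derived.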
Variants in the medical exome (genes from OMIM, HGMD, ClinVar, UniProt)
Benchmark | Variants in benchmark regions
v3.3.2 | 8,209
v4 draft | 9,527
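The relative gain in medical-exome variants follows directly from the two counts above. A trivial Python check:

```python
def percent_increase(old: int, new: int) -> float:
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old


# Medical-exome variant counts in the v3.3.2 and v4 draft benchmark regions.
gain = percent_increase(8_209, 9_527)
print(f"{9_527 - 8_209:,} more variants ({gain:.1f}% increase)")  # 1,318 more variants (16.1% increase)
```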
Difficult region description | Bases covered in GRCh37 | Bases covered in GRCh38
v0.6 SV benchmark | 32,596,754 | 32,872,907
Potential copy number variation | 51,713,344 | 62,666,746
Tandem repeats > 10 kb | 5,731,885 | 71,942,255
Highly similar and high-depth segmental duplications | 1,232,701 | 2,094,143
Collapsed and expanded regions from GRCh37/38 primary assembly alignments | 17,979,597 | N/A
Modeled centromeres and heterochromatin | N/A | 62,304,573
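Bases-covered totals like these are usually obtained by merging the overlapping intervals of a region (BED) file and summing the merged lengths, so that overlapping records are not double counted. A minimal Python sketch, assuming 0-based, half-open BED intervals held in memory; the function name is illustrative, and this is not necessarily how the GIAB stratification totals were computed.

```python
from collections import defaultdict


def bases_covered(intervals: list[tuple[str, int, int]]) -> int:
    """Count bases covered by BED-style (chrom, start, end) half-open intervals."""
    by_chrom: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    total = 0
    for spans in by_chrom.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                # Overlapping or touching: extend the current merged block.
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


example = [("chr1", 0, 100), ("chr1", 50, 150), ("chr1", 200, 250), ("chr2", 10, 20)]
print(bases_covered(example))  # 210
```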
Subset | v3.3.2 FNs | v4 draft FNs
All SNPs | 8,594 | 30,229
Low mappability | 6,708 | 25,295
Segmental duplications | 1,429 | 14,008
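The "factor of more than 3" follows from the FN table: dividing each v4 draft FN count by its v3.3.2 count gives the fold change per subset, and each subset's share of the additional FNs shows where they concentrate. A short Python check (the subsets can overlap, so the shares should not be summed):

```python
# FN counts for the Illumina RTG callset, copied from the FN table.
fn_counts = {
    "All SNPs": (8_594, 30_229),
    "Low mappability": (6_708, 25_295),
    "Segmental duplications": (1_429, 14_008),
}

v3_all, v4_all = fn_counts["All SNPs"]
new_fns = v4_all - v3_all  # additional SNP FNs against the v4 draft

for subset, (v3, v4) in fn_counts.items():
    fold = v4 / v3
    share = (v4 - v3) / new_fns  # fraction of the additional FNs in this subset
    print(f"{subset}: {fold:.2f}x more FNs, {share:.0%} of the new FNs")
```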
• Refine use of genome stratifications
• Add variant calls from raw PacBio and Oxford Nanopore reads
• Improve the benchmark for larger indels, homopolymers, and tandem repeats
• Improve normalization of complex variants
• Generate benchmark variants from diploid assemblies
• Machine learning
  - Outlier detection, active learning
The input data for GIAB benchmark v3.3.2 came from Illumina, Complete Genomics, Ion Torrent, 10x Genomics, and SOLiD technologies. The v4 draft benchmark incorporates new PacBio CCS and 10x Genomics linked-read data.
New members welcome! Sign up for newsletters at www.genomeinabottle.org
Volunteer to evaluate the draft benchmark by emailing justin.zook@nist.gov
Excluded from all methods:
The following regions are excluded from all technologies and methods, except those listed for each region:
• Tandem repeats < 51 bp, except GATK from Illumina PCR-free, Complete Genomics, and CCS DeepVariant
• Tandem repeats > 51 bp and < 200 bp, except GATK from Illumina PCR-free and CCS DeepVariant
• Tandem repeats > 200 bp, except CCS DeepVariant
• Homopolymers > 6 bp, except GATK from Illumina PCR-free, Complete Genomics, Ion Exome, and CCS
• Imperfect homopolymers > 10 bp, except GATK from Illumina PCR-free
• Regions difficult to map with short reads, except 10x and CCS
• LINE:L1Hs > 500, except Illumina MatePair, 10x, and CCS
• Segmental duplications, except 10x and CCS
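The exclusion list amounts to a lookup table from region type to the methods whose calls are still used there. A minimal Python sketch of that lookup; the region keys and the (technology, caller) labels are hypothetical names chosen to mirror the list, and a caller of None means any caller on that technology.

```python
from typing import Optional

# Hypothetical encoding of the exclusion list: region type -> exempt methods.
# Each exemption is (technology, caller); caller None means "any caller".
EXEMPTIONS: dict[str, set[tuple[str, Optional[str]]]] = {
    "tandem_repeat_lt_51bp": {
        ("Illumina PCR-free", "GATK"), ("Complete Genomics", None), ("CCS", "DeepVariant"),
    },
    "tandem_repeat_51_to_200bp": {("Illumina PCR-free", "GATK"), ("CCS", "DeepVariant")},
    "tandem_repeat_gt_200bp": {("CCS", "DeepVariant")},
    "homopolymer_gt_6bp": {
        ("Illumina PCR-free", "GATK"), ("Complete Genomics", None),
        ("Ion Exome", None), ("CCS", None),
    },
    "imperfect_homopolymer_gt_10bp": {("Illumina PCR-free", "GATK")},
    "difficult_to_map_short_reads": {("10x Genomics", None), ("CCS", None)},
    "line_l1hs_gt_500": {("Illumina MatePair", None), ("10x Genomics", None), ("CCS", None)},
    "segmental_duplication": {("10x Genomics", None), ("CCS", None)},
}


def method_used_in(region: str, technology: str, caller: str) -> bool:
    """True if calls from (technology, caller) are kept in this region type."""
    if region not in EXEMPTIONS:
        return True  # region type is not on the exclusion list
    return any(
        tech == technology and exempt_caller in (None, caller)
        for tech, exempt_caller in EXEMPTIONS[region]
    )
```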
Evaluation by GIAB collaborators
Collaborators compared the benchmark to callsets from a variety of technologies and variant calling methods, including:
• Illumina PCR-free and DRAGEN
• 10x Genomics and Aquila (variants from local diploid assembly)
• PacBio CCS and GATK4
• PacBio CCS and Clair (the successor to Clairvoyante)
• PacBio CCS and DeepVariant
• ONT PromethION and Clair
Preliminary results suggest that at the majority of FP and FN sites the benchmark is correct and the tested callset is in error.
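Comparing a callset to the benchmark comes down to counting true positives, false positives, and false negatives inside the benchmark regions. The Python sketch below uses naive exact matching on (chrom, pos, ref, alt, genotype) and is only an illustration; the names are hypothetical. It will miscount variants written with different representations, which is why comparison tools such as RTG Tools vcfeval or hap.py perform haplotype-aware matching instead.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position, as in VCF
    ref: str
    alt: str
    gt: str   # e.g. "0/1"


def in_regions(v: Variant, regions: dict[str, list[tuple[int, int]]]) -> bool:
    # BED spans are 0-based and half-open, so they contain 1-based positions start+1..end.
    return any(start < v.pos <= end for start, end in regions.get(v.chrom, []))


def compare(truth: list[Variant], query: list[Variant],
            regions: dict[str, list[tuple[int, int]]]) -> dict[str, float]:
    """Naive TP/FP/FN counts restricted to benchmark regions."""
    truth_set = {v for v in truth if in_regions(v, regions)}
    query_set = {v for v in query if in_regions(v, regions)}
    tp = len(truth_set & query_set)
    fp = len(query_set - truth_set)  # a genotype error counts as one FP ...
    fn = len(truth_set - query_set)  # ... and one FN in this naive scheme
    return {
        "TP": tp, "FP": fp, "FN": fn,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```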
Metric | v4 draft GRCh37 | v4 draft GRCh38
Base pairs | 2,504,027,936 | 2,509,269,277
Reference covered | 93.2% | 91.03%
SNPs | 3,323,773 | 3,314,941
Indels | 519,152 | 519,494
Base pairs in segmental duplications | 64,300,499 | 73,819,342
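Counts such as base pairs in segmental duplications come from intersecting the benchmark regions with a stratification file. A minimal Python sketch of that overlap on one chromosome, assuming both inputs are already sorted, merged, 0-based half-open (start, end) spans; the function name and example spans are illustrative.

```python
def overlap_bases(a: list[tuple[int, int]], b: list[tuple[int, int]]) -> int:
    """Bases shared by two sorted, non-overlapping span lists on one chromosome."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            total += end - start
        # Advance whichever span ends first; the other may still overlap the next one.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


benchmark = [(0, 1_000), (2_000, 3_000)]
segdups = [(500, 2_500), (2_900, 4_000)]
print(overlap_bases(benchmark, segdups))  # 1100
```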
[Figure: Arbitration example]
[Figure: Percent of reference covered (axis from 80% to 95%) and counts of SNPs and indels found only in v3.3.2 versus only in the v4 draft, for GRCh37 and GRCh38. Values shown in the panels: 343,358; 69,495; 77,324; 23,828; 376,653; 91,837; 91,719; 48,753]

GIAB ASHG 2019 Small Variant poster

Using long and linked reads to generate a new Genome in a Bottle small variant benchmark
J. Wagner1, A. Carroll6, I.T. Fiddes3, A.M. Wenger2, W.J. Rowell2, N. Olson1, L. Harris1, J. McDaniel1, C. Xiao5, M. Salit4, J. Zook1, Genome in a Bottle Consortium
1) Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD 20899; 2) Pacific Biosciences, 1305 O'Brien Drive, Menlo Park, CA 94025; 3) 10x Genomics, 7068 Koll Center Parkway, Pleasanton, CA 94566; 4) Joint Initiative for Metrology in Biology, Stanford, CA 94305; 5) National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20894; 6) Google, Inc., Mountain View, CA

Overview
NIST hosts the Genome in a Bottle (GIAB) Consortium, which develops metrology infrastructure for characterizing human whole-genome variant detection. Consortium products include:
• Characterization of seven broadly consented human genomes, including 2 son-mother-father trios released as Reference Materials (RMs)
• Reference data associated with the RMs: benchmark variants and genomic regions covering, for example, 87.8% of assembled bases in chromosomes 1-22 in GRCh37 for sample HG002
A limitation of the current GIAB benchmark is that short-read variant callers perform poorly in genomic regions with high homology, such as segmental duplications and low-complexity, repeat-rich regions. We incorporated PacBio CCS long reads and 10x Genomics linked reads to generate a draft of a new GIAB benchmark. Initial results show that long and linked reads add more than 276,840 SNPs and 42,980 insertions/deletions to the benchmark, mostly in regions that are difficult to map with short reads.

Integration data for HG002
The input data for GIAB benchmark v3.3.2 came from Illumina, Complete Genomics, Ion, 10x, and SOLiD technologies. The draft v4 benchmark adds new PacBio CCS and 10x Genomics linked-read data:
• PacBio Sequel II: ~11 kbp reads, ~32x coverage; aligned with minimap2, with variants called by GATK4 and, separately, by DeepVariant
• 10x Genomics: linked reads, ~84x coverage; LongRanger pipeline

Integration Pipeline Process
[Figure: integration pipeline schematic. Labels: Input Methods; PASS variants #1 and #2; Callable regions #1 and #2; genotypes 0/1 and 1/1; Concordant; Discordant arbitrated; Discordant unresolved; Concordant not callable; Benchmark calls; Benchmark regions]

Excluded from all methods: the following regions are excluded from all technologies and methods, except those listed:
• Tandem repeats < 51 bp, except GATK from Illumina PCR-free, Complete Genomics, and CCS DeepVariant
• Tandem repeats > 51 bp and < 200 bp, except GATK from Illumina PCR-free and CCS DeepVariant
• Tandem repeats > 200 bp, except CCS DeepVariant
• Homopolymers > 6 bp, except GATK from Illumina PCR-free, Complete Genomics, Ion Exome, and CCS
• Imperfect homopolymers > 10 bp, except GATK from Illumina PCR-free
• Regions difficult to map with short reads, except 10x and CCS
• LINE:L1Hs > 500, except Illumina MatePair, 10x, and CCS
• Segmental duplications, except 10x and CCS

[Figure: Arbitration example]

Evaluation by GIAB collaborators
Compared the benchmark to callsets from a variety of technologies and variant-calling methods, including:
• Illumina PCR-free and DRAGEN
• 10x Genomics and Aquila (variants from local diploid assembly)
• PacBio CCS and GATK4
• PacBio CCS and Clair (next generation of Clairvoyante)
• PacBio CCS and DeepVariant
• ONT PromethION and Clair
Preliminary results suggest that the majority of false positives (FPs) and false negatives (FNs) are errors in the tested callsets, with the benchmark correct at those sites.

Results from Adding Long and Linked Reads

Benchmark includes more bases, variants, and segmental duplications in v4
v4 draft benchmark, GRCh37 / GRCh38:
• Base pairs: 2,504,027,936 / 2,509,269,277
• Reference covered: 93.2% / 91.03%
• SNPs: 3,323,773 / 3,314,941
• Indels: 519,152 / 519,494
• Base pairs in segmental duplications: 64,300,499 / 73,819,342
[Figure: percent of reference covered]
[Figure: SNPs and INDELs only in v3.3.2 vs. only in v4 draft, for GRCh37 and GRCh38; values shown: 343,358; 69,495; 77,324; 23,828; 376,653; 91,837; 91,719; 48,753]

Difficult regions, bases covered in GRCh37 / GRCh38:
• v0.6 SV benchmark: 32,596,754 / 32,872,907
• Potential copy number variation: 51,713,344 / 62,666,746
• Tandem repeats > 10 kb: 5,731,885 / 71,942,255
• Highly similar and high-depth segmental duplications: 1,232,701 / 2,094,143
• Regions that are collapsed and expanded from GRCh37/38 primary assembly alignments: 17,979,597 / N/A
• Modeled centromere and heterochromatin: N/A / 62,304,573

Comparison of Illumina RTG VCF against benchmark sets
SNP FNs, v3.3.2 / v4 draft:
• All SNPs: 8,594 / 30,229
• Low mappability: 6,708 / 25,295
• Segmental duplications: 1,429 / 14,008
SNP FNs increase by a factor of more than 3, mostly due to new benchmark variants in difficult-to-map regions and segmental duplications.

Performance in medically relevant genes in GRCh37
• The v4 draft covers more of the MHC region; see poster 1707W for details
• Outside the MHC updates, the top 5 genes by variants added from v3.3.2 to the v4 draft benchmark: TSPEAR (31), LAMA5 (28), FCGBP (18), TPSAB1 (15), HSPG2 (13)
• In the v4 draft benchmark, PMS2 from ACMG59 has 2 additional covered variants not in v3.3.2, and RET, SCN5A, and TNNI3 have 1 more
Variants in the medical exome (genes from OMIM, HGMD, ClinVar, UniProt):
• Benchmark regions v3.3.2: 8,209
• Benchmark regions v4 draft: 9,527

Sanger sequencing
• Performed long-range PCR before sequencing
• Confirmed 12 variants in CYP21A2, a medically relevant gene in the MHC region
• Confirmed 6 variants in PMS2

Ongoing and Future work
• Refine use of genome stratifications
• Add variant calls from raw PacBio and Oxford Nanopore reads
• Improve the benchmark for larger indels, homopolymers, and tandem repeats
• Improve normalization of complex variants
• Generate benchmark variants from diploid assemblies
• Machine learning: outlier detection, active learning

New members welcome! Sign up for newsletters at www.genomeinabottle.org. Volunteer to evaluate the draft benchmark by emailing justin.zook@nist.gov.
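The statement that SNP FNs increase by a factor of more than 3 can be checked against the FN counts reported for the Illumina RTG VCF comparison (All SNPs, low mappability, segmental duplications). The short Python sketch below recomputes the v4/v3.3.2 ratio for each subset and each subset's share of all v4 SNP FNs. The names `FN_COUNTS`, `fold_change`, and `summarize` are hypothetical and not part of any GIAB tool. Because stratifications such as low mappability and segmental duplications can overlap, the shares need not sum to 100%.

```python
from typing import Dict, List, Tuple

# SNP false-negative (FN) counts for the Illumina RTG VCF, copied from the
# poster's comparison: subset -> (v3.3.2 FNs, v4 draft FNs).
FN_COUNTS: Dict[str, Tuple[int, int]] = {
    "All SNPs": (8_594, 30_229),
    "Low mappability": (6_708, 25_295),
    "Segmental duplications": (1_429, 14_008),
}


def fold_change(old: int, new: int) -> float:
    """Ratio of v4 draft FNs to v3.3.2 FNs for one subset."""
    return new / old


def summarize(counts: Dict[str, Tuple[int, int]]) -> List[Tuple[str, float, float]]:
    """Return (subset, fold change, share of all v4 SNP FNs) for each subset."""
    _, all_v4 = counts["All SNPs"]
    return [
        (subset, fold_change(old, new), new / all_v4)
        for subset, (old, new) in counts.items()
    ]


if __name__ == "__main__":
    for subset, fc, share in summarize(FN_COUNTS):
        print(f"{subset:<24}{fc:6.2f}x  {share:7.1%} of v4 SNP FNs")
```

For all SNPs the ratio is about 3.5x, while for the segmental duplication subset it is about 9.8x, consistent with the new FNs concentrating in regions the v3.3.2 benchmark did not cover.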
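The list of regions excluded from all technologies and methods amounts to a lookup from region type to the callsets still allowed to contribute there. A minimal sketch, assuming each input callset is labeled by a (platform, caller) pair; the labels, `EXCEPTIONS`, `is_excluded`, and `excluded_regions` are hypothetical and are not GIAB's integration code. Two readings are also assumptions: a platform listed without a caller (for example "CCS" for homopolymers) exempts every caller on that platform, and in "GATK from Illumina PCR-free, Complete Genomics, ..." the GATK qualifier applies only to Illumina PCR-free.

```python
from typing import Dict, List, Optional, Set, Tuple

# An exemption is (platform, caller); caller None means "any caller on this
# platform". This reading of the poster's list is an assumption.
Exemption = Tuple[str, Optional[str]]

# Region type -> callsets exempt from the exclusion (hypothetical labels).
EXCEPTIONS: Dict[str, Set[Exemption]] = {
    "Tandem repeats < 51 bp": {
        ("Illumina PCR-free", "GATK"), ("Complete Genomics", None),
        ("PacBio CCS", "DeepVariant"),
    },
    "Tandem repeats > 51 bp and < 200 bp": {
        ("Illumina PCR-free", "GATK"), ("PacBio CCS", "DeepVariant"),
    },
    "Tandem repeats > 200 bp": {("PacBio CCS", "DeepVariant")},
    "Homopolymers > 6 bp": {
        ("Illumina PCR-free", "GATK"), ("Complete Genomics", None),
        ("Ion Exome", None), ("PacBio CCS", None),
    },
    "Imperfect homopolymers > 10 bp": {("Illumina PCR-free", "GATK")},
    "Difficult to map for short reads": {("10x Genomics", None), ("PacBio CCS", None)},
    "LINE:L1Hs > 500": {
        ("Illumina MatePair", None), ("10x Genomics", None), ("PacBio CCS", None),
    },
    "Segmental duplications": {("10x Genomics", None), ("PacBio CCS", None)},
}


def is_excluded(region: str, platform: str, caller: str) -> bool:
    """True if calls from (platform, caller) are not used in this region type."""
    exempt = EXCEPTIONS[region]
    return (platform, caller) not in exempt and (platform, None) not in exempt


def excluded_regions(platform: str, caller: str) -> List[str]:
    """All region types excluded for one callset, sorted by name."""
    return sorted(r for r in EXCEPTIONS if is_excluded(r, platform, caller))


if __name__ == "__main__":
    for platform, caller in [
        ("Illumina PCR-free", "GATK"),
        ("PacBio CCS", "DeepVariant"),
        ("10x Genomics", "LongRanger"),
    ]:
        print(platform, caller, "->", excluded_regions(platform, caller))
```

Under these assumptions, CCS DeepVariant is excluded only from imperfect homopolymers > 10 bp, while GATK on Illumina PCR-free data is excluded from tandem repeats > 200 bp, regions difficult to map with short reads, LINE:L1Hs > 500, and segmental duplications. Keeping the rules in one table means adding a callset or region type changes data rather than branching logic.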