Genome in a Bottle:
So you’ve sequenced a genome – how well did you do?
Justin Zook and Marc Salit
NIST Genome-Scale Measurements Group
Joint Initiative for Metrology in Biology (JIMB)
March 26, 2017
Genome in a Bottle Consortium
Authoritative Characterization of Human Genomes
Sample
gDNA isolation
Library Prep
Sequencing
Alignment/Mapping
Variant Calling
Confidence Estimates
Downstream Analysis
• gDNA reference materials to
evaluate performance
– materials characterized for their
variants against a reference
sequence, with confidence
estimates
• established consortium to
develop reference materials,
data, methods, performance
metrics
generic measurement process
www.slideshare.net/genomeinabottle
In September, we released 4 new
GIAB RM Genomes.
• PGP Human Genomes
– AJ son
– AJ trio
– Asian son
• Parents also characterized
National Institute of Standards & Technology
Report of Investigation
Reference Material 8391
Human DNA for Whole-Genome Variant Assessment
(Son of Eastern European Ashkenazim Jewish Ancestry)
This Reference Material (RM) is intended for validation, optimization, and process evaluation purposes. It consists
of a male whole human genome sample of Eastern European Ashkenazim Jewish ancestry, and it can be used to assess
performance of variant calling from genome sequencing. A unit of RM 8391 consists of a vial containing human
genomic DNA extracted from a single large growth of human lymphoblastoid cell line GM24385 from the Coriell
Institute for Medical Research (Camden, NJ). The vial contains approximately 10 µg of genomic DNA, with the peak
of the nominal length distribution longer than 48.5 kb, as referenced by Lambda DNA, and the DNA is in TE buffer
(10 mM TRIS, 1 mM EDTA, pH 8.0).
This material is intended for assessing performance of human genome sequencing variant calling by obtaining
estimates of true positives, false positives, true negatives, and false negatives. Sequencing applications could include
whole genome sequencing, whole exome sequencing, and more targeted sequencing such as gene panels. This
genomic DNA is intended to be analyzed in the same way as any other sample a lab would process and analyze
extracted DNA. Because the RM is extracted DNA, it is not useful for assessing pre-analytical steps such as DNA
extraction, but it does challenge sequencing library preparation, sequencing machines, and the bioinformatics steps of
mapping, alignment, and variant calling. This RM is not intended to assess subsequent bioinformatics steps such as
functional or clinical interpretation.
Information Values: Information values are provided for single nucleotide polymorphisms (SNPs), small insertions
and deletions (indels), and homozygous reference genotypes for approximately 88 % of the genome, using methods
similar to described in reference 1. An information value is considered to be a value that will be of interest and use to
the RM user, but insufficient information is available to assess the uncertainty associated with the value. We describe
and disseminate our best, most confident, estimate of the genotypes using the data and methods currently available.
These data and genomic characterizations will be maintained over time as new data accrue and measurement and
informatics methods become available. The information values are given as a variant call file (vcf) that contains the
high-confidence SNPs and small indels, as well as a tab-delimited “bed” file that describes the regions that are called
high-confidence. Information values cannot be used to establish metrological traceability. The files referenced in this
report are available at the Genome in a Bottle ftp site hosted by the National Center for Biotechnology Information
(NCBI). The Genome in a Bottle ftp site for the high-confidence vcf and high confidence regions is:
We also released a
Microbial Genome RM
National Institute of Standards & Technology
Report of Investigation
Reference Material 8375
Microbial Genomic DNA Standards for Sequencing Performance Assessment
(MG-001, MG-002, MG-003, MG-004)
This Reference Material (RM) is intended for validation, optimization, process evaluation, and performance
assessment of whole genome sequencing. A unit of RM 8375 consists of four vials. Each vial contains a different
microbial genomic DNA sample (MG-001 Salmonella Typhimurium LT2, MG-002 Staphylococcus aureus, MG-003
Pseudomonas aeruginosa, and MG-004 Clostridium sporogenes). Each vial contains approximately 2 µg of microbial
genomic DNA; with the peak of the nominal length distribution longer than 48.5 kb, as referenced by Lambda DNA;
in TE buffer (10 mM TRIS, 0.1 mM EDTA, pH 8.0).
This material is intended to help assess performance of high-throughput DNA sequencing methods. This genomic
DNA is intended to be analyzed in the same way as any other sample a laboratory would analyze extracted DNA, such
as through the use of a genome assembly or variant calling bioinformatics pipelines. Because the RM is extracted
DNA, it does not assess pre-analytical steps such as DNA extraction. It does, however, challenge sequencing library
preparation, sequencing machines, base calling algorithms, and the subsequent bioinformatics analyses such as variant
calling. This RM is not intended to assess other bioinformatics steps such as genome assembly, strain identification,
phylogenetic analysis, or genome annotation.
Information Values: Information values are currently provided for the whole genome sequence to enable
performance assessment of variant calling and assembly methods. An information value is considered to be a value
that will be of interest and use to the RM user, but insufficient information is available to assess the uncertainty
associated with the value. We describe and disseminate our best, most confident, estimate of the assembly using the
data and methods available at present [1]. Information values cannot be used to establish metrological traceability.
The genome sequence files referenced in this Report of Investigation are available at:
MG-001 Salmonella Typhimurium LT2
https://github.com/usnistgov/NIST_Micro_Genomic_RM_Data/MG001/ref_genome/MG001_v1.00.fasta
MG-002 Staphylococcus aureus
This Reference Material (RM) is
intended for validation,
optimization, process
evaluation, and performance
assessment of whole genome
sequencing.
• Salmonella Typhimurium
• Pseudomonas aeruginosa
• Staphylococcus aureus
• Clostridium sporogenes
Bringing Principles of Metrology
to the Genome
• Reference materials
– DNA in a tube you can buy from
NIST
• Extensive state-of-the-art
characterization
– arbitrated “gold standard” calls for
SNPs, small indels
• “Upgradable” as technology
develops
• PGP genomes suitable for
commercial derived products
• Developing benchmarking tools
and software
– with GA4GH
• Samples being used to develop
and demonstrate new technology
NIST Reference Materials
Genome | PGP ID | Coriell ID | NIST ID | NIST RM #
CEPH Mother/Daughter | N/A | GM12878 | HG001 | RM8398
AJ Son | huAA53E0 | GM24385 | HG002 | RM8391 (son)/RM8392 (trio)
AJ Father | hu6E4515 | GM24149 | HG003 | RM8392 (trio)
AJ Mother | hu8E87A9 | GM24143 | HG004 | RM8392 (trio)
Asian Son | hu91BD69 | GM24631 | HG005 | RM8393
Asian Father | huCA017E | GM24694 | N/A | N/A
Asian Mother | hu38168C | GM24695 | N/A | N/A
Data for GIAB PGP Trios
Dataset | Characteristics | Coverage | Availability | Most useful for…
Illumina Paired-end WGS | 150x150bp; 250x250bp | ~300x/individual; ~50x/individual | on SRA/FTP | SNPs/indels/some SVs
Complete Genomics | - | 100x/individual | on SRA/FTP | SNPs/indels/some SVs
SOLiD 5500W WGS | 50bp single end | 70x/son | on FTP | SNPs
Illumina Paired-end WES | 100x100bp | ~300x/individual | on SRA/FTP | SNPs/indels in exome
Ion Proton Exome | - | 1000x/individual | on SRA/FTP | SNPs/indels in exome
Illumina Mate pair | ~6000 bp insert | ~30x/individual | on FTP | SVs
Illumina “moleculo” | Custom library | ~30x by long fragments | on FTP | SVs/phasing/assembly
Complete Genomics LFR | - | 100x/individual | on SRA/FTP | SNPs/indels/phasing
10X | Linked reads | 30-45x/individual | on FTP | SNPs/SVs/phasing/assembly
PacBio | ~10kb reads | ~70x on AJ son, ~30x on each AJ parent | on SRA/FTP | SVs/phasing/assembly/STRs
Oxford Nanopore | 5.8kb 2D reads | 0.05x on AJ son | on FTP | SVs/assembly
Nabsys 2.0 | ~100kbp N50 nanopore maps | 70x on AJ son | - | SVs/assembly
BioNano Genomics | 200-250kbp optical map reads | ~100x/AJ individual; 57x on Asian son | on FTP | SVs/assembly
Paper describing data…
51 authors
14 institutions
12 datasets
7 genomes
Data described in ISA-tab
Principles of Integration Process
• Form sensitive variant calls from
each dataset
• Define “callable regions” for each
callset
• Filter calls from each method
with annotations unlike
concordant calls
• Compare high-confidence calls to
other callsets and manually
inspect subset of differences
– vs. pedigree-based calls
– vs. common pipelines
– Trio analysis
• When benchmarking a new
callset against ours, most
putative FPs/FNs should actually
be FPs/FNs
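The idea of benchmarking only inside "callable" (high-confidence) regions can be sketched in a few lines of Python. This is a minimal illustration, not the GIAB integration pipeline or the GA4GH tools: the function names and toy inputs are hypothetical, variants are matched by exact (chrom, pos, ref, alt) identity, genotypes are ignored, and a variant counts as inside a region if its start position is. Real comparison tools also reconcile different representations of the same variant.

```python
from bisect import bisect_right
from collections import defaultdict


def build_region_index(bed_rows):
    """Group BED intervals (chrom, start, end; 0-based, half-open) by chromosome.

    Assumes the intervals on each chromosome do not overlap.
    """
    index = defaultdict(list)
    for chrom, start, end in bed_rows:
        index[chrom].append((start, end))
    for intervals in index.values():
        intervals.sort()
    return index


def in_regions(index, chrom, pos):
    """Return True if a 1-based VCF position falls inside any interval."""
    intervals = index.get(chrom, [])
    # Locate the last interval whose start is <= the 0-based position.
    i = bisect_right(intervals, (pos - 1, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos - 1 < intervals[i][1]


def benchmark(truth_calls, query_calls, bed_rows):
    """Count TP/FP/FN by exact match, restricted to the confident regions."""
    index = build_region_index(bed_rows)
    truth = {v for v in truth_calls if in_regions(index, v[0], v[1])}
    query = {v for v in query_calls if in_regions(index, v[0], v[1])}
    tp = len(truth & query)
    fp = len(query - truth)
    fn = len(truth - query)
    return {
        "TP": tp,
        "FP": fp,
        "FN": fn,
        "precision": tp / (tp + fp) if tp + fp else None,
        "recall": tp / (tp + fn) if tp + fn else None,
        # Query calls outside the bed file are neither TP nor FP.
        "query_outside_bed": len(set(query_calls)) - len(query),
    }


# Toy data (hypothetical): one confident region on chr1.
truth = [("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr1", 900, "G", "A")]
query = [("chr1", 100, "A", "G"), ("chr1", 300, "T", "C"),
         ("chr1", 900, "G", "A"), ("chr1", 950, "A", "C")]
bed = [("chr1", 50, 400)]
print(benchmark(truth, query, bed))
```

Calls outside the bed file are reported separately rather than counted as errors, because the benchmark makes no claim about those positions.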
Integration Methods to Establish Benchmark Variant
Calls
Candidate variants
Concordant variants
Find characteristics of bias
Arbitrate using evidence of bias
Confidence Level Zook et al., Nature Biotechnology, 2014.
NEW: Reproducible
integration pipeline with
new calls for NA12878 and
PGP Trios on GRCh37 and
GRCh38!
Evolution of high-confidence calls
Calls | HC Regions | HC Calls | HC indels | Concordant with PG | NIST-only in beds | PG-only in beds | PG-only | Variants Phased
v2.19 | 2.22 Gb | 3153247 | 352937 | 3030703 | 87 | 404 | 1018795 | 0.3%
v3.2.2 | 2.53 Gb | 3512990 | 335594 | 3391783 | 57 | 52 | 657715 | 3.9%
v3.3 | 2.57 Gb | 3566076 | 358753 | 3441361 | 40 | 60 | 608137 | 8.8%
v3.3.2 | 2.58 Gb | 3691156 | 487841 | 3529641 | 47 | 61 | 469202 | 99.6%
Slide annotations: “5-7 errors in NIST”; “1-7 errors in NIST”
~2 FPs and ~2 FNs per million NIST variants in PG and NIST bed files
Global Alliance for Genomics and Health Benchmarking Task
Team
• Developed standardized
definitions for performance
metrics like TP, FP, and FN.
• Developing sophisticated
benchmarking tools
• Integrated into a single framework
with standardized inputs and
outputs
• Standardized bed files with
difficult genome contexts for
stratification
https://github.com/ga4gh/benchmarking-tools
Variant types can change when decomposing
or recomposing variants:
Complex variant:
chr1 201586350 CTCTCTCTCT CA
DEL + SNP:
chr1 201586350 CTCTCTCTCT C
chr1 201586359 T A
Credit: Peter Krusche, Illumina
GA4GH Benchmarking Team
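A short Python sketch shows why the same site can be counted under different variant types depending on representation. It assumes a simple classification scheme (trim shared flanking bases, then compare lengths); the `classify` function and its category labels are illustrative only and are not the definitions used by the GA4GH benchmarking tools.

```python
def classify(ref, alt):
    """Classify one REF/ALT pair after trimming shared flanking bases."""
    # Trim the shared suffix first, then the shared prefix.
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
    if not ref and not alt:
        return "REF"
    if not alt:
        return "DEL"
    if not ref:
        return "INS"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) == len(alt):
        return "MNP"
    return "COMPLEX"


# The records from the slide above.
records = [
    ("chr1", 201586350, "CTCTCTCTCT", "CA"),  # single complex record
    ("chr1", 201586350, "CTCTCTCTCT", "C"),   # decomposed: deletion
    ("chr1", 201586359, "T", "A"),            # decomposed: SNP
]
for chrom, pos, ref, alt in records:
    print(chrom, pos, ref, alt, classify(ref, alt))
```

A tool that counts by record type would report one complex variant for the first form and one deletion plus one SNP for the second, which is one reason standardized comparison and counting matter.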
Workflow output
Benchmarking example: NA12878 / GiaB / 50X / PCR-Free / Hiseq2000
https://illumina.box.com/s/vjget1dumwmy0re19usetli2teucjel1
Credit: Peter Krusche, Illumina
GA4GH Benchmarking Team
Benchmarking Tools
Standardized comparison, counting, and stratification with
Hap.py + vcfeval
https://precision.fda.gov/
https://github.com/ga4gh/benchmarking-tools
FN rates high in some tandem repeats
[Figure: FN rate vs. average, axis ticks at 0.3x, 1x, 3x, 10x, and 30x; panels for repeats of 11 to 50bp and 51 to 200bp, each showing 2bp, 3bp, and 4bp unit repeats]
GA4GH benchmarking on Github
In-progress benchmarking standards document: doc/standards
Description of intermediate formats: doc/ref-impl
Truthset descriptions and download links: resources/high-confidence-sets
Stratification bed files and descriptions: resources/stratification-bed-files
Python-code for HTML reporting and running benchmarks: reporting/basic
Please contribute / join the discussion!
https://github.com/ga4gh/benchmarking-tools
Credit: Peter Krusche, Illumina
GA4GH Benchmarking Team
Benchmarking stats can be difficult to interpret
Example: decoy-like regions
“Decoy” sequence for GRCh37
• Created to capture reads that are from
sequences that are not in the GRCh37
reference assembly, which otherwise
can cause FPs
• We only include calls in decoy-
homologous regions if they have clear
support in both 10X haplotypes
• We look at error rates for bwa-GATK
without using the decoy
SNP benchmarking stats vs. different callsets
BWA/GATK, no decoy | vs. 2.18 | vs. 3.3.2 | vs. PG
Precision | 91% | 67% | 93%
Recall | 99.8% | 99.4% | 93%
Outside bed | 91% | 92% | 78%
• v3.3.2 best at identifying FP SNPs
– 43% of FPs in decoy (only 0.5% of TPs)
• PG best at identifying FN SNPs
– Mostly clustered, unclear variants in
difficult-to-map regions
Benchmarking stats can be difficult to interpret
Example: FN SNPs in coding regions
RefSeq Coding Regions
• Studies often focus on variants in
coding regions
• We look at FN SNP rates for bwa-GATK
using the decoy
SNP benchmarking stats vs. PG and 3.3.2
• 97.98% sensitivity vs. PG
– FNs predominately in low MQ and/or
segmental duplication regions
– ~80% of FNs supported by long or linked
reads
• 99.96% sensitivity vs. NISTv3.3.2
– 62x lower FN rate than vs PG
• As always, true sensitivity is unknown
True accuracy is hard to
estimate, especially in
difficult regions
Approaches to Benchmarking Variant Calling
• Well-characterized whole genome Reference Materials
• Many samples characterized in clinically relevant regions
• Synthetic DNA spike-ins
• Cell lines with engineered mutations
• Simulated reads
• Modified real reads
• Modified reference genomes
• Confirming results found in real samples over time
Challenges in Benchmarking Variant Calling
• It is difficult to do robust benchmarking of tests designed to detect
many analytes (e.g., many variants)
• Easiest to benchmark only within high-confidence bed file, but…
• Benchmark calls/regions tend to be biased towards easier variants
and regions
– Some clinical tests are enriched for difficult sites
• Always manually inspect a subset of FPs/FNs
• Stratification by variant type and region is important
• Always calculate confidence intervals on performance metrics
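One common way to attach a confidence interval to a proportion such as recall or precision is the Wilson score interval. The sketch below is an illustration of that general technique, not a method prescribed by these slides; the function name and the per-stratum counts are hypothetical.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical TP and FN counts per stratum; recall = TP / (TP + FN).
strata = [
    ("all confident regions", 3_500_000, 7_000),
    ("coding regions", 20_000, 8),
    ("tandem repeats 51 to 200bp", 95, 5),
]
for name, tp, fn in strata:
    low, high = wilson_interval(tp, tp + fn)
    print(f"{name}: recall {tp / (tp + fn):.4f} (95% CI {low:.4f} to {high:.4f})")
```

Strata with few benchmark variants produce wide intervals, so a high point estimate in a small stratum carries much less evidence than the same estimate genome-wide.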
How can we extend this approach to structural
variants?
Similarities to small variants
• Collect callsets from multiple
technologies
• Compare callsets to find calls
supported by multiple technologies
Differences from small variants
• Callsets have limited sensitivity
• Variants are often imprecisely
characterized
– breakpoints, size, type, etc.
• Representation of variants is poorly
standardized, especially when complex
• Comparison tools in infancy
Preliminary process for integrated deletions
• Merge deletions within 1kb
• Rank calls by closeness of predicted size to median size and select call in each region from best callset
• Find calls supported by 2+ technologies with size within 20%
• Filter calls overlapping seg dups, reference N’s, or with call with predicted size 2x larger
ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/NIST_DraftIntegratedDeletionsgt19bp_v0.1.8
Size range | <50bp | 50-100bp | 100-1000bp | 1kb-3kb | >3kbp
Pre-filtered calls | 2627 | 1600 | 2306 | 385 | 389
Post-filtered calls | 2548 | 1448 | 1996 | 297 | 262
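The first three steps of the deletion process can be sketched as below. This is a rough illustration under stated assumptions, not the code used to produce the draft callset: "within 1kb" is read as a gap of at most 1000 bp between sorted calls, "size within 20%" is read as within 20% of the cluster's median size, the representative is simply the call closest to the median, and the seg dup and reference N filters are omitted. The function names and the toy calls are hypothetical.

```python
from statistics import median


def cluster_deletions(calls, max_gap=1000):
    """Group deletion calls that lie within max_gap bp of each other.

    Each call is a dict with chrom, start, end, and tech keys.
    """
    clusters = []
    for call in sorted(calls, key=lambda c: (c["chrom"], c["start"])):
        last = clusters[-1] if clusters else None
        if (last and last["chrom"] == call["chrom"]
                and call["start"] - last["end"] <= max_gap):
            last["calls"].append(call)
            last["end"] = max(last["end"], call["end"])
        else:
            clusters.append({"chrom": call["chrom"], "end": call["end"],
                             "calls": [call]})
    return clusters


def supported_calls(clusters, min_techs=2, size_tolerance=0.2):
    """Return one call per cluster where 2+ technologies agree on size."""
    kept = []
    for cluster in clusters:
        sizes = [c["end"] - c["start"] for c in cluster["calls"]]
        mid = median(sizes)
        agreeing = [c for c in cluster["calls"]
                    if abs((c["end"] - c["start"]) - mid) <= size_tolerance * mid]
        if len({c["tech"] for c in agreeing}) >= min_techs:
            # Representative: the call whose size is closest to the median.
            kept.append(min(agreeing,
                            key=lambda c: abs((c["end"] - c["start"]) - mid)))
    return kept


# Toy calls (hypothetical coordinates and technologies).
calls = [
    {"chrom": "chr1", "start": 1000, "end": 1300, "tech": "illumina"},
    {"chrom": "chr1", "start": 1010, "end": 1320, "tech": "pacbio"},
    {"chrom": "chr1", "start": 50000, "end": 50100, "tech": "illumina"},
    {"chrom": "chr1", "start": 50020, "end": 50500, "tech": "pacbio"},
    {"chrom": "chr2", "start": 1100, "end": 1400, "tech": "illumina"},
]
for call in supported_calls(cluster_deletions(calls)):
    print(call)
```

In the toy data only the first cluster survives: the second has two technologies that disagree on size, and the third is supported by a single technology.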
Proposed SV integration process
Stage labels:
• SV Discovery
• SV Comparison
• SV Corroboration
• Form SV benchmark calls
• SV Refinement
• Evaluate/optimize benchmark calls
Boxes:
• Calls with REF and ALT sequence
• Imprecise SV calls
• Sequence-based comparison
• SV corroboration methods (e.g., svviz, nabsys, bionano, Illumina population, lumpy?)
• Heuristics to form tiers of benchmark SVs
• Machine learning to form benchmark SVs
• Comparison of all candidate calls (SURVIVOR/svcompare)
• SV sequence refinement (Parliament, Spiral Genetics, PBRefine, graphs?)
• Paper about calls and comparisons?
• Manually curate alignments around a subset of calls; ask community for feedback
Draft de novo assemblies for AJ Son
Data | Method | Contig N50 | Scaffold N50 | Number Scaffolds | Total Size
PacBio | Falcon | 5.3 Mb | 5.3 Mb | 13231 | 3.04 Gb
PacBio | PBcR | 4.5 Mb | 4.5 Mb | 12523 | 2.99 Gb
PacBio+BioNano | Falcon+BioNano | 6.1 Mb | 59.4 Mb | 10591 | 3.27 Gb
PacBio+Dovetail | Falcon+HiRise | 5.3 Mb | 12.9 Mb | 12459 | 3.04 Gb
PacBio+Dovetail | PBcR+HiRise | 4.1 Mb | 20.6 Mb | 10491 | 2.99 Gb
Illumina | DISCOVAR | 81 kb | 149 kb | 1.06M | 3.13 Gb
Illumina+Dovetail | DISCOVAR+HiRise | 85 kb | 12.9 Mb | 1.03M | 3.15 Gb
10X | Supernova | 106 kb | 15.2 Mb | 1360 | 2.73 Gb
Credits for assemblies:
Ali Bashir, Mt. Sinai
Jason Chin, PacBio
Alex Hastie, BioNano
Serge Koren, NHGRI
Adam Phillippy, NHGRI
Kareina Dill, Dovetail
Noushin Ghaffari, TAMU
10X Genomics
Assembly-based SV calls:
MSPAC
Assemblytics
PBRefine
IMPORTANT NOTE: These are draft assemblies and statistics should not be used to compare quality of assembly methods.
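For readers unfamiliar with the N50 columns in the assembly table: N50 is the length L such that contigs (or scaffolds) of length L or longer contain at least half of the total assembled bases. A minimal Python sketch of that standard definition follows; the function name and the toy lengths are illustrative and are not taken from the assemblies above.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running total reaches half is the N50.
        if running >= half:
            return length
    raise ValueError("lengths must not be empty")


# Toy example: total length is 32, so half is 16.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Because N50 depends only on the length distribution, it says nothing about whether the sequence is correct, which is consistent with the note above about not using these statistics to compare assembly methods.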
New Samples
Additional ancestries
• Shorter term
– Use existing PGP individual samples
– Use existing integration pipeline
• Data-based selection
– Proportion of potential genomes from
different ancestries
• 3 to 8 new samples
• Longer term
– Recruit large family
– Recruit trios from other ancestry groups
Cancer samples
• Longer term
• Make PGP-consented tumor and
normal cell lines from same individual
• Select tumor with diversity of mutation
types
Acknowledgements
• NIST/JIMB
– Marc Salit
– Jenny McDaniel
– Lindsay Vang
– David Catoe
– Lesley Chapman
• Genome in a Bottle Consortium
• GA4GH Benchmarking Team
• FDA
– Liz Mansfield
– Zivana Tevak
– David Litwack
For More Information
www.genomeinabottle.org - sign up for general GIAB and Analysis Team google group emails
github.com/genome-in-a-bottle – Guide to GIAB data & ftp
www.slideshare.net/genomeinabottle
www.ncbi.nlm.nih.gov/variation/tools/get-rm/ - Get-RM Browser
Data: http://www.nature.com/articles/sdata201625
Global Alliance Benchmarking Team
– https://github.com/ga4gh/benchmarking-tools
Public workshops
– Possible SV integration mini-workshop in 2017
– Next large workshop early 2018
NIST/JIMB postdoc opportunities available!
Justin Zook: jzook@nist.gov
Marc Salit: salit@nist.gov

More Related Content

What's hot

Aug2013 illumina platinum genomes
Aug2013 illumina platinum genomesAug2013 illumina platinum genomes
Aug2013 illumina platinum genomes
GenomeInABottle
 

What's hot (20)

Sept2016 plenary nist_intro
Sept2016 plenary nist_introSept2016 plenary nist_intro
Sept2016 plenary nist_intro
 
2016 ashg giab poster
2016 ashg giab poster2016 ashg giab poster
2016 ashg giab poster
 
Sept2016 plenary mercer_sequins
Sept2016 plenary mercer_sequinsSept2016 plenary mercer_sequins
Sept2016 plenary mercer_sequins
 
171114 best practices for benchmarking variant calls justin
171114 best practices for benchmarking variant calls justin171114 best practices for benchmarking variant calls justin
171114 best practices for benchmarking variant calls justin
 
GIAB Sep2016 Lightning megan cleveland targeted seq
GIAB Sep2016 Lightning megan cleveland targeted seqGIAB Sep2016 Lightning megan cleveland targeted seq
GIAB Sep2016 Lightning megan cleveland targeted seq
 
GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005
 
Aug2013 illumina platinum genomes
Aug2013 illumina platinum genomesAug2013 illumina platinum genomes
Aug2013 illumina platinum genomes
 
Tools for Using NIST Reference Materials
Tools for Using NIST Reference MaterialsTools for Using NIST Reference Materials
Tools for Using NIST Reference Materials
 
Aug2015 salit standards architecture
Aug2015 salit standards architectureAug2015 salit standards architecture
Aug2015 salit standards architecture
 
2017 amp benchmarking_poster_justin
2017 amp benchmarking_poster_justin2017 amp benchmarking_poster_justin
2017 amp benchmarking_poster_justin
 
Aug2015 horizon diagnostics
Aug2015 horizon diagnosticsAug2015 horizon diagnostics
Aug2015 horizon diagnostics
 
Giab aug2015 intro and update 150821.pptx
Giab aug2015 intro and update 150821.pptxGiab aug2015 intro and update 150821.pptx
Giab aug2015 intro and update 150821.pptx
 
ASHG 2015 Genome in a bottle
ASHG 2015 Genome in a bottleASHG 2015 Genome in a bottle
ASHG 2015 Genome in a bottle
 
160627 giab for festival sv workshop
160627 giab for festival sv workshop160627 giab for festival sv workshop
160627 giab for festival sv workshop
 
Aug2015 Giab nist integration methods
Aug2015 Giab nist integration methodsAug2015 Giab nist integration methods
Aug2015 Giab nist integration methods
 
Genome in a Bottle- reference materials to benchmark challenging variants and...
Genome in a Bottle- reference materials to benchmark challenging variants and...Genome in a Bottle- reference materials to benchmark challenging variants and...
Genome in a Bottle- reference materials to benchmark challenging variants and...
 
Jan2016 horizon GIAB
Jan2016 horizon GIABJan2016 horizon GIAB
Jan2016 horizon GIAB
 
Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3
Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3
Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3
 
Jan2016 bina giab
Jan2016 bina giabJan2016 bina giab
Jan2016 bina giab
 
Giab ashg webinar 160224
Giab ashg webinar 160224Giab ashg webinar 160224
Giab ashg webinar 160224
 

Viewers also liked

Case Study: SRM 2.0 - A next generation shared resource management system bui...
Case Study: SRM 2.0 - A next generation shared resource management system bui...Case Study: SRM 2.0 - A next generation shared resource management system bui...
Case Study: SRM 2.0 - A next generation shared resource management system bui...
Matt Stine
 
George Church: Standards & Open-Access Genome-Environment-Trait Data
George Church: Standards & Open-Access Genome-Environment-Trait DataGeorge Church: Standards & Open-Access Genome-Environment-Trait Data
George Church: Standards & Open-Access Genome-Environment-Trait Data
GenomeInABottle
 
Genome in a Bottle Consortium Workshop Welcome Aug. 16
Genome in a Bottle Consortium Workshop Welcome Aug. 16Genome in a Bottle Consortium Workshop Welcome Aug. 16
Genome in a Bottle Consortium Workshop Welcome Aug. 16
GenomeInABottle
 
NIST program to develop genomic reference materials
NIST program to develop genomic reference materialsNIST program to develop genomic reference materials
NIST program to develop genomic reference materials
GenomeInABottle
 
Information Sciences Solutions to Core Facility Problems at St. Jude Children...
Information Sciences Solutions to Core Facility Problems at St. Jude Children...Information Sciences Solutions to Core Facility Problems at St. Jude Children...
Information Sciences Solutions to Core Facility Problems at St. Jude Children...
Matt Stine
 

Viewers also liked (20)

Case Study: SRM 2.0 - A next generation shared resource management system bui...
Case Study: SRM 2.0 - A next generation shared resource management system bui...Case Study: SRM 2.0 - A next generation shared resource management system bui...
Case Study: SRM 2.0 - A next generation shared resource management system bui...
 
George Church: Standards & Open-Access Genome-Environment-Trait Data
George Church: Standards & Open-Access Genome-Environment-Trait DataGeorge Church: Standards & Open-Access Genome-Environment-Trait Data
George Church: Standards & Open-Access Genome-Environment-Trait Data
 
150219 agbt giab_poster_marc
150219 agbt giab_poster_marc150219 agbt giab_poster_marc
150219 agbt giab_poster_marc
 
A National Network of Biomedical Research Expertise
A National Network of Biomedical Research ExpertiseA National Network of Biomedical Research Expertise
A National Network of Biomedical Research Expertise
 
Standards and tools for model management in biomedical research
Standards and tools for model management in biomedical researchStandards and tools for model management in biomedical research
Standards and tools for model management in biomedical research
 
Leadership in Decline: Assessing U.S. International Competitiveness in Biomed...
Leadership in Decline: Assessing U.S. International Competitiveness in Biomed...Leadership in Decline: Assessing U.S. International Competitiveness in Biomed...
Leadership in Decline: Assessing U.S. International Competitiveness in Biomed...
 
Biomedical research
Biomedical researchBiomedical research
Biomedical research
 
Genome in a Bottle Consortium Workshop Welcome Aug. 16
Genome in a Bottle Consortium Workshop Welcome Aug. 16Genome in a Bottle Consortium Workshop Welcome Aug. 16
Genome in a Bottle Consortium Workshop Welcome Aug. 16
 
I V I F2 F July 2005 Talk
I V I  F2 F  July 2005  TalkI V I  F2 F  July 2005  Talk
I V I F2 F July 2005 Talk
 
Maximizing Social Capital to Increase Core Facility Exposure and Usage
Maximizing Social Capital to Increase Core Facility Exposure and UsageMaximizing Social Capital to Increase Core Facility Exposure and Usage
Maximizing Social Capital to Increase Core Facility Exposure and Usage
 
NIST program to develop genomic reference materials
NIST program to develop genomic reference materialsNIST program to develop genomic reference materials
NIST program to develop genomic reference materials
 
Information Sciences Solutions to Core Facility Problems at St. Jude Children...
Information Sciences Solutions to Core Facility Problems at St. Jude Children...Information Sciences Solutions to Core Facility Problems at St. Jude Children...
Information Sciences Solutions to Core Facility Problems at St. Jude Children...
 
Clean Labs Training
Clean Labs TrainingClean Labs Training
Clean Labs Training
 
decentralization: a trend in biomedical research
decentralization: a trend in biomedical researchdecentralization: a trend in biomedical research
decentralization: a trend in biomedical research
 
Making Biomedical Research More Like Airbnb
Making Biomedical Research More Like AirbnbMaking Biomedical Research More Like Airbnb
Making Biomedical Research More Like Airbnb
 
Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128
 
Biomedical Research as an Open Digital Enterprise
Biomedical Research as an Open Digital EnterpriseBiomedical Research as an Open Digital Enterprise
Biomedical Research as an Open Digital Enterprise
 
Cross-Disciplinary Biomedical Research at Calit2
Cross-Disciplinary Biomedical Research at Calit2Cross-Disciplinary Biomedical Research at Calit2
Cross-Disciplinary Biomedical Research at Calit2
 
Core Facility 2.0 - leveraging social media to enhance visibility
Core Facility 2.0 - leveraging social media to enhance visibilityCore Facility 2.0 - leveraging social media to enhance visibility
Core Facility 2.0 - leveraging social media to enhance visibility
 
HIE technical infrastructure
HIE technical infrastructureHIE technical infrastructure
HIE technical infrastructure
 

Similar to 170326 giab abrf

Impact_of_gene_length_on_DEG
Impact_of_gene_length_on_DEGImpact_of_gene_length_on_DEG
Impact_of_gene_length_on_DEG
Long Pei
 
20100509 bioinformatics kapushesky_lecture03-04_0
20100509 bioinformatics kapushesky_lecture03-04_020100509 bioinformatics kapushesky_lecture03-04_0
20100509 bioinformatics kapushesky_lecture03-04_0
Computer Science Club
 
2944_IJDR_final_version
2944_IJDR_final_version2944_IJDR_final_version
2944_IJDR_final_version
Dago Noel
 
2944_IJDR_final_version
2944_IJDR_final_version2944_IJDR_final_version
2944_IJDR_final_version
Dago Noel
 

Similar to 170326 giab abrf (20)

171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshop
 
GIAB Integrating multiple technologies to form benchmark SVs 180517
GIAB Integrating multiple technologies to form benchmark SVs 180517GIAB Integrating multiple technologies to form benchmark SVs 180517
GIAB Integrating multiple technologies to form benchmark SVs 180517
 
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
Using accurate long reads to improve Genome in a Bottle Benchmarks 220923
 
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
 
GIAB update for GRC GIAB workshop 191015
GIAB update for GRC GIAB workshop 191015GIAB update for GRC GIAB workshop 191015
GIAB update for GRC GIAB workshop 191015
 
Genome in a bottle for next gen dx v2 180821
Genome in a bottle for next gen dx v2 180821Genome in a bottle for next gen dx v2 180821
Genome in a bottle for next gen dx v2 180821
 
Giab for jax long read 190917
Giab for jax long read 190917Giab for jax long read 190917
Giab for jax long read 190917
 
GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511
GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511
GIAB Benchmarks for SVs and Repeats for stanford genetics sv 200511
 
Genome in a bottle for amp GeT-RM 181030
Genome in a bottle for amp GeT-RM 181030Genome in a bottle for amp GeT-RM 181030
Genome in a bottle for amp GeT-RM 181030
 
Genome in a bottle for ashg grc giab workshop 181016
Genome in a bottle for ashg grc giab workshop 181016Genome in a bottle for ashg grc giab workshop 181016
Genome in a bottle for ashg grc giab workshop 181016
 
GIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM ForumGIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM Forum
 
DNA Markers Techniques for Plant Varietal Identification
DNA Markers Techniques for Plant Varietal Identification DNA Markers Techniques for Plant Varietal Identification
DNA Markers Techniques for Plant Varietal Identification
 
Impact_of_gene_length_on_DEG
Impact_of_gene_length_on_DEGImpact_of_gene_length_on_DEG
Impact_of_gene_length_on_DEG
 
150224 giab 30 min generic slides
150224 giab 30 min generic slides150224 giab 30 min generic slides
150224 giab 30 min generic slides
 
20100509 bioinformatics kapushesky_lecture03-04_0
20100509 bioinformatics kapushesky_lecture03-04_020100509 bioinformatics kapushesky_lecture03-04_0
20100509 bioinformatics kapushesky_lecture03-04_0
 
Multiplex Assays for Studying Gene Regulation and Cell Function
Multiplex Assays for Studying Gene Regulation and Cell FunctionMultiplex Assays for Studying Gene Regulation and Cell Function
Multiplex Assays for Studying Gene Regulation and Cell Function
 
Next generation sequencing & microarray-- Genotypic Technology
Next generation sequencing & microarray-- Genotypic TechnologyNext generation sequencing & microarray-- Genotypic Technology
Next generation sequencing & microarray-- Genotypic Technology
 
Evaluation of the impact of error correction algorithms on SNP calling.
Evaluation of the impact of error correction algorithms on SNP calling.Evaluation of the impact of error correction algorithms on SNP calling.
Evaluation of the impact of error correction algorithms on SNP calling.
 
2944_IJDR_final_version
2944_IJDR_final_version2944_IJDR_final_version
2944_IJDR_final_version
 
2944_IJDR_final_version
2944_IJDR_final_version2944_IJDR_final_version
2944_IJDR_final_version
 



170326 giab abrf

  • 1. Genome in a Bottle: So you’ve sequenced a genome – how well did you do? Justin Zook and Marc Salit NIST Genome-Scale Measurements Group Joint Initiative for Metrology in Biology (JIMB) March 26, 2017
  • 2. Genome in a Bottle Consortium Authoritative Characterization of Human Genomes Generic measurement process: Sample, gDNA isolation, Library Prep, Sequencing, Alignment/Mapping, Variant Calling, Confidence Estimates, Downstream Analysis • gDNA reference materials to evaluate performance – materials characterized for their variants against a reference sequence, with confidence estimates • established consortium to develop reference materials, data, methods, performance metrics www.slideshare.net/genomeinabottle
  • 3. In September, we released 4 new GIAB RM Genomes. • PGP Human Genomes – AJ son – AJ trio – Asian son • Parents also characterized National Institute of Standards & Technology Report of Investigation Reference Material 8391 Human DNA for Whole-Genome Variant Assessment (Son of Eastern European Ashkenazim Jewish Ancestry) This Reference Material (RM) is intended for validation, optimization, and process evaluation purposes. It consists of a male whole human genome sample of Eastern European Ashkenazim Jewish ancestry, and it can be used to assess performance of variant calling from genome sequencing. A unit of RM 8391 consists of a vial containing human genomic DNA extracted from a single large growth of human lymphoblastoid cell line GM24385 from the Coriell Institute for Medical Research (Camden, NJ). The vial contains approximately 10 µg of genomic DNA, with the peak of the nominal length distribution longer than 48.5 kb, as referenced by Lambda DNA, and the DNA is in TE buffer (10 mM TRIS, 1 mM EDTA, pH 8.0). This material is intended for assessing performance of human genome sequencing variant calling by obtaining estimates of true positives, false positives, true negatives, and false negatives. Sequencing applications could include whole genome sequencing, whole exome sequencing, and more targeted sequencing such as gene panels. This genomic DNA is intended to be analyzed in the same way as any other sample a lab would process and analyze extracted DNA. Because the RM is extracted DNA, it is not useful for assessing pre-analytical steps such as DNA extraction, but it does challenge sequencing library preparation, sequencing machines, and the bioinformatics steps of mapping, alignment, and variant calling. This RM is not intended to assess subsequent bioinformatics steps such as functional or clinical interpretation. 
Information Values: Information values are provided for single nucleotide polymorphisms (SNPs), small insertions and deletions (indels), and homozygous reference genotypes for approximately 88 % of the genome, using methods similar to described in reference 1. An information value is considered to be a value that will be of interest and use to the RM user, but insufficient information is available to assess the uncertainty associated with the value. We describe and disseminate our best, most confident, estimate of the genotypes using the data and methods currently available. These data and genomic characterizations will be maintained over time as new data accrue and measurement and informatics methods become available. The information values are given as a variant call file (vcf) that contains the high-confidence SNPs and small indels, as well as a tab-delimited “bed” file that describes the regions that are called high-confidence. Information values cannot be used to establish metrological traceability. The files referenced in this report are available at the Genome in a Bottle ftp site hosted by the National Center for Biotechnology Information (NCBI). The Genome in a Bottle ftp site for the high-confidence vcf and high confidence regions is:
  • 4. We also released a Microbial Genome RM National Institute of Standards & Technology Report of Investigation Reference Material 8375 Microbial Genomic DNA Standards for Sequencing Performance Assessment (MG-001, MG-002, MG-003, MG-004) This Reference Material (RM) is intended for validation, optimization, process evaluation, and performance assessment of whole genome sequencing. A unit of RM 8375 consists of four vials. Each vial contains a different microbial genomic DNA sample (MG-001 Salmonella Typhimurium LT2, MG-002 Staphylococcus aureus, MG-003 Pseudomonas aeruginosa, and MG-004 Clostridium sporogenes). Each vial contains approximately 2 µg of microbial genomic DNA; with the peak of the nominal length distribution longer than 48.5 kb, as referenced by Lambda DNA; in TE buffer (10 mM TRIS, 0.1 mM EDTA, pH 8.0). This material is intended to help assess performance of high-throughput DNA sequencing methods. This genomic DNA is intended to be analyzed in the same way as any other sample a laboratory would analyze extracted DNA, such as through the use of a genome assembly or variant calling bioinformatics pipelines. Because the RM is extracted DNA, it does not assess pre-analytical steps such as DNA extraction. It does, however, challenge sequencing library preparation, sequencing machines, base calling algorithms, and the subsequent bioinformatics analyses such as variant calling. This RM is not intended to assess other bioinformatics steps such as genome assembly, strain identification, phylogenetic analysis, or genome annotation. Information Values: Information values are currently provided for the whole genome sequence to enable performance assessment of variant calling and assembly methods. An information value is considered to be a value that will be of interest and use to the RM user, but insufficient information is available to assess the uncertainty associated with the value. 
We describe and disseminate our best, most confident, estimate of the assembly using the data and methods available at present [1]. Information values cannot be used to establish metrological traceability. The genome sequence files referenced in this Report of Investigation are available at: MG-001 Salmonella Typhimurium LT2 https://github.com/usnistgov/NIST_Micro_Genomic_RM_Data/MG001/ref_genome/MG001_v1.00.fasta MG-002 Staphylococcus aureus This Reference Material (RM) is intended for validation, optimization, process evaluation, and performance assessment of whole genome sequencing. • Salmonella Typhimurium • Pseudomonas aeruginosa • Staphylococcus aureus • Clostridium sporogenes
  • 5. Bringing Principles of Metrology to the Genome • Reference materials – DNA in a tube you can buy from NIST • Extensive state-of-the-art characterization – arbitrated “gold standard” calls for SNPs, small indels • “Upgradable” as technology develops • PGP genomes suitable for commercial derived products • Developing benchmarking tools and software – with GA4GH • Samples being used to develop and demonstrate new technology
  • 6. NIST Reference Materials
      Genome | PGP ID | Coriell ID | NIST ID | NIST RM #
      CEPH Mother/Daughter | N/A | GM12878 | HG001 | RM8398
      AJ Son | huAA53E0 | GM24385 | HG002 | RM8391 (son)/RM8392 (trio)
      AJ Father | hu6E4515 | GM24149 | HG003 | RM8392 (trio)
      AJ Mother | hu8E87A9 | GM24143 | HG004 | RM8392 (trio)
      Asian Son | hu91BD69 | GM24631 | HG005 | RM8393
      Asian Father | huCA017E | GM24694 | N/A | N/A
      Asian Mother | hu38168C | GM24695 | N/A | N/A
  • 7. Data for GIAB PGP Trios
      Dataset | Characteristics | Coverage | Availability | Most useful for…
      Illumina Paired-end WGS | 150x150bp; 250x250bp | ~300x/individual; ~50x/individual | on SRA/FTP | SNPs/indels/some SVs
      Complete Genomics | | 100x/individual | on SRA/FTP | SNPs/indels/some SVs
      SOLiD 5500W WGS | 50bp single end | 70x/son | on FTP | SNPs
      Illumina Paired-end WES | 100x100bp | ~300x/individual | on SRA/FTP | SNPs/indels in exome
      Ion Proton Exome | | 1000x/individual | on SRA/FTP | SNPs/indels in exome
      Illumina Mate pair | ~6000 bp insert | ~30x/individual | on FTP | SVs
      Illumina “moleculo” | Custom library | ~30x by long fragments | on FTP | SVs/phasing/assembly
      Complete Genomics LFR | | 100x/individual | on SRA/FTP | SNPs/indels/phasing
      10X | Linked reads | 30-45x/individual | on FTP | SNPs/SVs/phasing/assembly
      PacBio | ~10kb reads | ~70x on AJ son, ~30x on each AJ parent | on SRA/FTP | SVs/phasing/assembly/STRs
      Oxford Nanopore | 5.8kb 2D reads | 0.05x on AJ son | on FTP | SVs/assembly
      Nabsys 2.0 | ~100kbp N50 nanopore maps | 70x on AJ son | | SVs/assembly
      BioNano Genomics | 200-250kbp optical map reads | ~100x/AJ individual; 57x on Asian son | on FTP | SVs/assembly
  • 8. Paper describing data… 51 authors 14 institutions 12 datasets 7 genomes Data described in ISA-tab
  • 9. Principles of Integration Process • Form sensitive variant calls from each dataset • Define “callable regions” for each callset • Filter calls from each method with annotations unlike concordant calls • Compare high-confidence calls to other callsets and manually inspect subset of differences – vs. pedigree-based calls – vs. common pipelines – Trio analysis • When benchmarking a new callset against ours, most putative FPs/FNs should actually be FPs/FNs
  • 10. Integration Methods to Establish Benchmark Variant Calls Candidate variants Concordant variants Find characteristics of bias Arbitrate using evidence of bias Confidence Level Zook et al., Nature Biotechnology, 2014.
  • 11. Integration Methods to Establish Benchmark Variant Calls Candidate variants Concordant variants Find characteristics of bias Arbitrate using evidence of bias Confidence Level Zook et al., Nature Biotechnology, 2014. NEW: Reproducible integration pipeline with new calls for NA12878 and PGP Trios on GRCh37 and GRCh38!
  • 12. Evolution of high-confidence calls
      Calls | HC Regions | HC Calls | HC indels | Concordant with PG | NIST-only in beds | PG-only in beds | PG-only | Variants Phased
      v2.19 | 2.22 Gb | 3153247 | 352937 | 3030703 | 87 | 404 | 1018795 | 0.3%
      v3.2.2 | 2.53 Gb | 3512990 | 335594 | 3391783 | 57 | 52 | 657715 | 3.9%
      v3.3 | 2.57 Gb | 3566076 | 358753 | 3441361 | 40 | 60 | 608137 | 8.8%
      v3.3.2 | 2.58 Gb | 3691156 | 487841 | 3529641 | 47 | 61 | 469202 | 99.6%
      Annotations: 5-7 errors in NIST; 1-7 errors in NIST; ~2 FPs and ~2 FNs per million NIST variants in PG and NIST bed files
  • 13. Global Alliance for Genomics and Health Benchmarking Task Team • Developed standardized definitions for performance metrics like TP, FP, and FN. • Developing sophisticated benchmarking tools • Integrated into a single framework with standardized inputs and outputs • Standardized bed files with difficult genome contexts for stratification https://github.com/ga4gh/benchmarking-tools Variant types can change when decomposing or recomposing variants: Complex variant: chr1 201586350 CTCTCTCTCT CA DEL + SNP: chr1 201586350 CTCTCTCTCT C chr1 201586359 T A Credit: Peter Krusche, Illumina GA4GH Benchmarking Team
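The point on slide 13, that one haplotype can be written as a single complex record or as a deletion plus a SNP, can be checked with a small sketch. Everything below is illustrative: the function name `apply_variants` and the toy reference are assumptions, not part of the GA4GH tools, and the coordinates are not the slide's. In this toy the deletion's REF allele stops one base before the SNP so that the two decomposed records do not overlap.

```python
def apply_variants(ref_seq, ref_start, variants):
    """Apply non-overlapping (pos, ref, alt) records to a reference string.

    ref_start is the 1-based coordinate of ref_seq[0]; pos is 1-based, as in VCF.
    """
    pieces = []
    cursor = 0  # 0-based offset of the next unconsumed reference base
    for pos, ref, alt in sorted(variants):
        offset = pos - ref_start
        if offset < cursor:
            raise ValueError(f"variant at {pos} overlaps the previous one")
        if ref_seq[offset:offset + len(ref)] != ref:
            raise ValueError(f"REF allele does not match reference at {pos}")
        pieces.append(ref_seq[cursor:offset])  # untouched reference bases
        pieces.append(alt)                     # substituted allele
        cursor = offset + len(ref)
    pieces.append(ref_seq[cursor:])
    return "".join(pieces)


# Toy reference (hypothetical): a flanking G, a CT repeat, a flanking G.
reference = "G" + "CTCTCTCTCT" + "G"

# One complex record versus the same change split into a deletion and a SNP.
complex_record = [(2, "CTCTCTCTCT", "CA")]
decomposed = [(2, "CTCTCTCTC", "C"), (11, "T", "A")]

haplotype_a = apply_variants(reference, 1, complex_record)
haplotype_b = apply_variants(reference, 1, decomposed)
print(haplotype_a, haplotype_b, haplotype_a == haplotype_b)
```

Both representations produce the same haplotype, yet a record-by-record comparison would count one as a complex variant and the other as a deletion and a SNP, which is why the comparison step needs standardized handling of representation.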
  • 14. Workflow output Benchmarking example: NA12878 / GiaB / 50X / PCR-Free / Hiseq2000 https://illumina.box.com/s/vjget1dumwmy0re19usetli2teucjel1 Credit: Peter Krusche, Illumina GA4GH Benchmarking Team
  • 15. Benchmarking Tools Standardized comparison, counting, and stratification with Hap.py + vcfeval https://precision.fda.gov/ https://github.com/ga4gh/benchmarking-tools
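As a rough illustration of the counting step, here is a minimal sketch that restricts both callsets to high-confidence BED regions and then counts TP, FP, and FN by exact record match. This is a deliberately naive stand-in, not how Hap.py or vcfeval work: exact matching would miscount variants that are merely represented differently. All function names are hypothetical, and the BED intervals are assumed to be non-overlapping.

```python
from bisect import bisect_right


def build_index(bed_regions):
    """Group (chrom, start, end) BED intervals by chromosome, sorted by start."""
    index = {}
    for chrom, start, end in bed_regions:
        index.setdefault(chrom, []).append((start, end))
    for intervals in index.values():
        intervals.sort()
    return index


def in_regions(index, chrom, pos):
    """True if 1-based VCF position pos lies in a 0-based half-open BED interval."""
    intervals = index.get(chrom, [])
    i = bisect_right(intervals, (pos - 1, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos - 1 < intervals[i][1]


def benchmark(truth, query, bed_regions):
    """Count exact-match TP/FP/FN for (chrom, pos, ref, alt) tuples inside the BED."""
    index = build_index(bed_regions)
    truth_in = {v for v in truth if in_regions(index, v[0], v[1])}
    query_in = {v for v in query if in_regions(index, v[0], v[1])}
    tp = len(truth_in & query_in)
    fp = len(query_in - truth_in)
    fn = len(truth_in - query_in)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return {"TP": tp, "FP": fp, "FN": fn, "precision": precision, "recall": recall}


# Made-up toy data: the variant at position 5000 falls outside the BED and is ignored.
truth_calls = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr1", 5000, "G", "A")}
query_calls = {("chr1", 100, "A", "G"), ("chr1", 150, "T", "C"), ("chr1", 5000, "G", "A")}
stats = benchmark(truth_calls, query_calls, [("chr1", 0, 1000)])
print(stats)
```

Running the same function once per stratification BED file (for example, one per difficult genome context) gives a simple form of the stratified counts the slide describes.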
  • 16. FN rates high in some tandem repeats [Figure: FN rate vs. average (scale 0.3x to 30x) for 2bp, 3bp, and 4bp unit repeats, shown separately for repeats of 11 to 50bp and 51 to 200bp]
  • 17. GA4GH benchmarking on Github In-progress benchmarking standards document: doc/standards Description of intermediate formats: doc/ref-impl Truthset descriptions and download links: resources/high-confidence-sets Stratification bed files and descriptions: resources/stratification-bed-files Python-code for HTML reporting and running benchmarks: reporting/basic Please contribute / join the discussion! https://github.com/ga4gh/benchmarking-tools Credit: Peter Krusche, Illumina GA4GH Benchmarking Team
  • 18. Benchmarking stats can be difficult to interpret Example: decoy-like regions “Decoy” sequence for GRCh37 • Created to capture reads that are from sequences that are not in the GRCh37 reference assembly, which otherwise can cause FPs • We only include calls in decoy-homologous regions if they have clear support in both 10X haplotypes • We look at error rates for bwa-GATK without using the decoy
      SNP benchmarking stats vs. different callsets (BWA/GATK, no decoy)
      Metric | vs. 2.18 | vs. 3.3.2 | vs. PG
      Precision | 91% | 67% | 93%
      Recall | 99.8% | 99.4% | 93%
      Outside bed | 91% | 92% | 78%
      • v3.3.2 best at identifying FP SNPs – 43% of FPs in decoy (only 0.5% of TPs) • PG best at identifying FN SNPs – Mostly clustered, unclear variants in difficult-to-map regions
  • 19. Benchmarking stats can be difficult to interpret Example: FN SNPs in coding regions RefSeq Coding Regions • Studies often focus on variants in coding regions • We look at FN SNP rates for bwa-GATK using the decoy SNP benchmarking stats vs. PG and 3.3.2 • 97.98% sensitivity vs. PG – FNs predominately in low MQ and/or segmental duplication regions – ~80% of FNs supported by long or linked reads • 99.96% sensitivity vs. NISTv3.3.2 – 62x lower FN rate than vs PG • As always, true sensitivity is unknown
  • 20. Benchmarking stats can be difficult to interpret Example: FN SNPs in coding regions RefSeq Coding Regions • Studies often focus on variants in coding regions • We look at FN SNP rates for bwa-GATK using the decoy SNP benchmarking stats vs. PG and 3.3.2 • 97.98% sensitivity vs. PG – FNs predominately in low MQ and/or segmental duplication regions – ~80% of FNs supported by long or linked reads • 99.96% sensitivity vs. NISTv3.3.2 – 62x lower FN rate than vs PG • As always, true sensitivity is unknown True accuracy is hard to estimate, especially in difficult regions
  • 21. Approaches to Benchmarking Variant Calling • Well-characterized whole genome Reference Materials • Many samples characterized in clinically relevant regions • Synthetic DNA spike-ins • Cell lines with engineered mutations • Simulated reads • Modified real reads • Modified reference genomes • Confirming results found in real samples over time
  • 22. Challenges in Benchmarking Variant Calling • It is difficult to do robust benchmarking of tests designed to detect many analytes (e.g., many variants) • Easiest to benchmark only within high-confidence bed file, but… • Benchmark calls/regions tend to be biased towards easier variants and regions – Some clinical tests are enriched for difficult sites • Always manually inspect a subset of FPs/FNs • Stratification by variant type and region is important • Always calculate confidence intervals on performance metrics
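The advice to always calculate confidence intervals on performance metrics can be sketched with a Wilson score interval for a binomial proportion such as recall, TP / (TP + FN). The choice of the Wilson interval is an assumption made for illustration; the slides do not name a method, and the counts below are made up.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Same observed recall (99%) from a small stratum and from a large one.
small_low, small_high = wilson_interval(99, 100)
large_low, large_high = wilson_interval(9900, 10000)
print(f"n=100:   {small_low:.4f} to {small_high:.4f}")
print(f"n=10000: {large_low:.4f} to {large_high:.4f}")
```

The interval is much wider for the small stratum, which matters when stratifying: a difficult region with few benchmark variants can show an impressive point estimate that the data barely support.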
  • 23. How can we extend this approach to structural variants? Similarities to small variants • Collect callsets from multiple technologies • Compare callsets to find calls supported by multiple technologies Differences from small variants • Callsets have limited sensitivity • Variants are often imprecisely characterized – breakpoints, size, type, etc. • Representation of variants is poorly standardized, especially when complex • Comparison tools in infancy
  • 24. Preliminary process for integrated deletions • Merge deletions within 1kb • Rank calls by closeness of predicted size to median size and select call in each region from best callset • Find calls supported by 2+ technologies with size within 20% • Filter calls overlapping seg dups, reference N’s, or with call with predicted size 2x larger ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/NIST_DraftIntegratedDeletionsgt19bp_v0.1.8
      Size | <50bp | 50-100bp | 100-1000bp | 1kb-3kb | >3kbp
      Pre-filtered calls | 2627 | 1600 | 2306 | 385 | 389
      Post-filtered calls | 2548 | 1448 | 1996 | 297 | 262
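A simplified sketch of the merging and support steps on slide 24 follows. It is one possible reading of the slide, not the NIST pipeline: the `Deletion` class, the function names, and the choice to pick the call closest to the median size (rather than a call from the "best callset") are all assumptions, and the filtering against segmental duplications and reference N's is omitted.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Deletion:
    chrom: str
    start: int
    end: int
    tech: str  # technology or callset that produced the call

    @property
    def size(self):
        return self.end - self.start


def cluster_calls(calls, max_gap=1000):
    """Group calls on the same chromosome that lie within max_gap bp of each other."""
    clusters, current, current_end = [], [], None
    for call in sorted(calls, key=lambda c: (c.chrom, c.start)):
        same_cluster = (
            current
            and call.chrom == current[0].chrom
            and call.start - current_end <= max_gap
        )
        if same_cluster:
            current.append(call)
            current_end = max(current_end, call.end)
        else:
            if current:
                clusters.append(current)
            current, current_end = [call], call.end
    if current:
        clusters.append(current)
    return clusters


def integrate(calls, max_gap=1000, size_tol=0.2, min_techs=2):
    """Keep one call per cluster when 2+ technologies agree on size within size_tol."""
    selected = []
    for cluster in cluster_calls(calls, max_gap):
        med = median(c.size for c in cluster)
        agreeing = [c for c in cluster if abs(c.size - med) <= size_tol * med]
        if len({c.tech for c in agreeing}) >= min_techs:
            selected.append(min(agreeing, key=lambda c: abs(c.size - med)))
    return selected


# Made-up calls: two technologies agree near chr1:1000; the third call is unsupported.
candidate_calls = [
    Deletion("chr1", 1000, 1500, "illumina"),
    Deletion("chr1", 1010, 1520, "pacbio"),
    Deletion("chr1", 50000, 50300, "illumina"),
]
integrated = integrate(candidate_calls)
print(integrated)
```

The thresholds are exposed as parameters because the slide describes this as a preliminary process; the 1 kb merge distance and 20% size tolerance are the only values taken from it.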
  • 25. Proposed SV integration process Calls with REF and ALT sequence SV Discovery Imprecise SV calls Sequence-based comparison SV corroboration methods (e.g., svviz, nabsys, bionano, Illumina population, lumpy?) Heuristics to form tiers of benchmark SVs Machine learning to form benchmark SVs Comparison of all candidate calls (SURVIVOR/svcompare) SV Comparison SV Corroboration Form SV benchmark calls SV sequence refinement (Parliament, Spiral Genetics, PBRefine, graphs?) Paper about calls and comparisons? SV Refinement Manually curate alignments around a subset of calls; ask community for feedback Evaluate/optimize benchmark calls
  • 26. Draft de novo assemblies for AJ Son
      Data | Method | Contig N50 | Scaffold N50 | Number Scaffolds | Total Size
      PacBio | Falcon | 5.3 Mb | 5.3 Mb | 13231 | 3.04 Gb
      PacBio | PBcR | 4.5 Mb | 4.5 Mb | 12523 | 2.99 Gb
      PacBio+BioNano | Falcon+BioNano | 6.1 Mb | 59.4 Mb | 10591 | 3.27 Gb
      PacBio+Dovetail | Falcon+HiRise | 5.3 Mb | 12.9 Mb | 12459 | 3.04 Gb
      PacBio+Dovetail | PBcR+HiRise | 4.1 Mb | 20.6 Mb | 10491 | 2.99 Gb
      Illumina | DISCOVAR | 81 kb | 149 kb | 1.06M | 3.13 Gb
      Illumina+Dovetail | DISCOVAR+HiRise | 85 kb | 12.9 Mb | 1.03M | 3.15 Gb
      10X | Supernova | 106 kb | 15.2 Mb | 1360 | 2.73 Gb
      Credits for assemblies: Ali Bashir, Mt. Sinai; Jason Chin, PacBio; Alex Hastie, BioNano; Serge Koren, NHGRI; Adam Phillippy, NHGRI; Kareina Dill, Dovetail; Noushin Ghaffari, TAMU; 10X Genomics
      Assembly-based SV calls: MSPAC, Assemblytics, PBRefine
      IMPORTANT NOTE: These are draft assemblies and statistics should not be used to compare quality of assembly methods.
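For readers unfamiliar with the contig and scaffold N50 columns, the statistic can be computed as below: N50 is the length L such that sequences of length L or longer contain at least half of the assembled bases. This is the standard definition; the function name and the example lengths are made up for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy assembly: total is 20, and the two longest contigs (6 + 5) cover half of it.
toy_n50 = n50([2, 3, 4, 5, 6])
print(toy_n50)
```

Because N50 depends only on the length distribution, it says nothing about correctness, which is consistent with the slide's warning against using these statistics to compare assembly methods.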
  • 27. New Samples Additional ancestries • Shorter term – Use existing PGP individual samples – Use existing integration pipeline • Data-based selection – Proportion of potential genomes from different ancestries • 3 to 8 new samples • Longer term – Recruit large family – Recruit trios from other ancestry groups Cancer samples • Longer term • Make PGP-consented tumor and normal cell lines from same individual • Select tumor with diversity of mutation types
  • 29. Acknowledgements • NIST/JIMB – Marc Salit – Jenny McDaniel – Lindsay Vang – David Catoe – Lesley Chapman • Genome in a Bottle Consortium • GA4GH Benchmarking Team • FDA – Liz Mansfield – Zivana Tezak – David Litwack
  • 30. For More Information www.genomeinabottle.org - sign up for general GIAB and Analysis Team google group emails github.com/genome-in-a-bottle – Guide to GIAB data & ftp www.slideshare.net/genomeinabottle www.ncbi.nlm.nih.gov/variation/tools/get-rm/ - Get-RM Browser Data: http://www.nature.com/articles/sdata201625 Global Alliance Benchmarking Team – https://github.com/ga4gh/benchmarking-tools Public workshops – Possible SV integration mini-workshop in 2017 – Next large workshop early 2018 NIST/JIMB postdoc opportunities available! Justin Zook: jzook@nist.gov Marc Salit: salit@nist.gov