Standards for Characterizing Whole-Genome Sequencing
Marc Salit, NIST/JIMB
GIAB Workshop – August 28, 2015
Enterprise WGS Test Architecture
Preanalytical
Sequencing
Sequence Bioinformatics
Functional Variant Annotation
Clinical Variant Knowledgebase Query
Clinical Interpretation
Reporting
EHR Archival
Genome in a Bottle Consortium
Whole Genome Variant Calling
Sample -> gDNA isolation -> Library Prep -> Sequencing -> Alignment/Mapping -> Variant Calling -> Confidence Estimates -> Downstream Analysis
(generic measurement process)
• gDNA reference materials to evaluate performance
– materials certified for their variants against a reference sequence, with confidence estimates
• established a consortium to develop reference materials, data, methods, and performance metrics
• Characterized pilot genome NA12878
• Ashkenazi trio and Asian trio from the Personal Genome Project (PGP) in progress
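A certified reference material is used to evaluate performance by comparing a lab's variant calls against the benchmark calls, counting only variants inside regions where the benchmark is confident. The Python below is a minimal sketch of that comparison. The tuple layout (chrom, pos, ref, alt), the BED-style 0-based half-open region convention, and the names `Variant`, `in_confident`, and `compare` are illustrative assumptions, not GIAB's tooling.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    # Hypothetical minimal record; real VCF rows carry much more.
    chrom: str
    pos: int  # 1-based position, as in VCF
    ref: str
    alt: str


def in_confident(v: Variant, regions: dict[str, list[tuple[int, int]]]) -> bool:
    """True if the variant's position lies in a confident interval.

    Intervals are assumed to be 0-based half-open (BED-style), so a
    1-based position p is covered when start < p <= end.
    """
    return any(start < v.pos <= end for start, end in regions.get(v.chrom, []))


def compare(calls: Iterable[Variant], truth: Iterable[Variant],
            regions: dict[str, list[tuple[int, int]]]) -> dict[str, int]:
    """Count TP/FP/FN by exact matching, restricted to confident regions."""
    query = {v for v in calls if in_confident(v, regions)}
    bench = {v for v in truth if in_confident(v, regions)}
    return {
        "TP": len(query & bench),  # called and in the benchmark
        "FP": len(query - bench),  # called but not in the benchmark
        "FN": len(bench - query),  # in the benchmark but not called
    }


if __name__ == "__main__":
    truth = [Variant("chr1", 100, "A", "G"), Variant("chr1", 200, "C", "T"),
             Variant("chr1", 5000, "G", "A")]  # last one lies outside the region
    calls = [Variant("chr1", 100, "A", "G"), Variant("chr1", 150, "T", "C")]
    print(compare(calls, truth, {"chr1": [(0, 1000)]}))
    # {'TP': 1, 'FP': 1, 'FN': 1}
```

Exact matching is the simplest possible rule. It undercounts matches when the same indel is written differently in the two call sets, so real comparisons also have to reconcile variant representation.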
Analytical/Technical Performance Assessment
Preanalytical
Sequencing
Sequence Bioinformatics
Functional Variant Annotation
Clinical Variant Knowledgebase Query
Clinical Interpretation
Reporting
EHR Archival
Sequencing
from DNA -> Raw Sequence Data
Evidence to be established:
• Accurate (unbiased) sequencing
• Fit-for-purpose characteristics
Standards/Evidence developing:
• Well-characterized genomic DNA reference materials
• Documentary standard describing sequencing characteristics appropriate for different clinical indications
Stakeholders:
• Standards Labs
• Clinical labs
• Sequencing Technology Developers
• Academic Labs developing methods
• Genome Centers
Example Knowledge Gaps:
• Sequencing “difficult” regions
• Platform artifacts
• High quality benchmark genomes
• Performance expectations (sensitivity/specificity)
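Performance expectations are usually stated against a benchmark genome. Under the common convention (an assumption here, since the slide does not define the terms), each call is classified as a true positive (TP), false positive (FP), false negative (FN), or true negative (TN):

```latex
\text{sensitivity (recall)} = \frac{TP}{TP + FN}, \qquad
\text{specificity} = \frac{TN}{TN + FP}, \qquad
\text{precision (PPV)} = \frac{TP}{TP + FP}
```

Nearly every reference position is a true negative, so specificity tends to sit very close to 1 for variant calling. For that reason, benchmarking reports commonly give precision alongside sensitivity. For illustration only, hypothetical counts of TP = 990, FN = 10, and FP = 5 give a sensitivity of 990/1000 = 0.99 and a precision of 990/995 ≈ 0.995.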
Sequence Bioinformatics
Raw Sequence Data -> VCF
Evidence to be established:
• Unbiased processing of sequence data (mapping/assembly)
• Accurate variant calling
• Accurate and unambiguous representation, interoperability
Standards/Evidence developing:
• Protocols to critically evaluate processes, informed by platform idiosyncrasies
• Data representation standards
• Reference data/implementation: benchmark VCF files
• Reference software to evaluate VCF files
Stakeholders:
• Standards Labs
• Clinical labs
• Sequencing Technology Developers
• Academic Labs developing methods
• Genome Centers
Example Knowledge Gaps:
• Assembly/mapping in “difficult” regions
• Artifacts
• Benchmark genomes
• Performance expectations
• Standards development organization (SDO) and accreditation body fluency with bioinformatics
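The need for "accurate and unambiguous representation" arises because a single indel can often be written several ways in a VCF. For example, deleting one base from a homopolymer run can be placed at any position in the run. A common remedy is to normalize each variant before comparison: trim bases shared by REF and ALT and left-shift the variant against the reference. The Python below is a minimal sketch of that normalization. The hypothetical `normalize` helper handles biallelic variants only and assumes the reference is a single in-memory string with 1-based positions. Production tools such as `bcftools norm` handle many more cases.

```python
def normalize(pos: int, ref: str, alt: str, genome: str) -> tuple[int, str, str]:
    """Left-align and trim one biallelic variant (illustrative sketch).

    pos is 1-based; genome is one chromosome's reference sequence, so
    genome[pos - 1] is the reference base at pos.
    """
    if ref == alt:
        raise ValueError("REF and ALT must differ")
    while True:
        changed = False
        # Drop a shared trailing base; this lets the variant slide left.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, prepend the preceding reference base.
        if (not ref or not alt) and pos > 1:
            pos -= 1
            base = genome[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
        if not changed:
            break
    # Drop shared leading bases while both alleles keep at least one base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


if __name__ == "__main__":
    genome = "GCAAAT"  # toy reference: positions 1..6
    # Two VCF spellings of the same one-base deletion in the AAA run.
    print(normalize(4, "AA", "A", genome))  # (2, 'CA', 'C')
    print(normalize(5, "AT", "T", genome))  # (2, 'CA', 'C')
```

Both spellings collapse to the same left-most representation, so an exact-match comparison of normalized calls counts them as one variant.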
Analytical/Technical Performance Assessment
Preanalytical
Sequencing
Sequence Bioinformatics
Functional Variant Annotation
Clinical Variant Knowledgebase Query
Clinical Interpretation
Reporting
EHR Archival