Model results: SNVs and INDELs
• Despite advances in sequencing technology, correctly identifying complex variation in the human genome remains challenging
• Challenging variants share general characteristics, but we lack a data-driven model that quantitatively links these characteristics to the likelihood of successful identification
• We are using explainable boosting machines (EBMs) to model variant calling accuracy as a function of genomic context, with specific goals to:
  • understand sequencing errors to enable more precise stratifications when benchmarking
  • predict which types of variants and genomic contexts a variant caller might miss
Introduction
Generating Features from Genomic Regions
StratoMod: Using Machine Learning Models to Understand
Errors in Human Genomic Variant Calling
N. Dwarshuis1, J. Wagner1, N. Olson1, J. McDaniel1, P. Tonner2, F. Sedlazeck3, J.M. Zook1
1) Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD
2) Information Technology Laboratory, National Institute of Standards and Technology, Gaithersburg, MD
3) Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
Analysis Pipeline Overview
Genome in a Bottle Consortium
Explainable Boosting Machines
[Figure: model classes compared by transparency and accuracy]
$g(y) = f_1(x_1) + f_2(x_2) + \dots + f_{1,2}(x_1, x_2) + \dots$
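g: logit link function
f_k: shape function for feature k, learned by boosting shallow decision trees on that feature ($f_{1,2}$ is a pairwise interaction term)
EBM: a GAM with interaction terms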
• Flexibility of black-box models (random forest, XGBoost)
• Interpretability of simpler models
• Efficiently fits second-order (pairwise interaction) terms
Nori et al., arXiv, 2019
Lou et al., Proc. SIGKDD, 2012
github.com/interpretml/interpret
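To make the additive structure of the equation above concrete, here is a minimal sketch of how an EBM-style model turns genomic features into an error probability. It is not the interpret package's internals: the feature names, bin edges, scores, intercept, and the `pairwise_term` rule are all hypothetical, chosen only to show that each term is a per-feature (or per-pair) lookup whose contributions add up on the logit scale, with an intercept that EBMs also fit.

```python
import bisect
import math
from typing import Dict, List, Tuple

# Hypothetical intercept and shape functions (logit scale); a fitted EBM learns these.
INTERCEPT = -2.0
# Each main effect f_k is a piecewise-constant lookup table: sorted bin edges plus
# one score per bin (len(scores) == len(edges) + 1).
MAIN_EFFECTS: Dict[str, Tuple[List[float], List[float]]] = {
    "homopolymer_len": ([5.0, 10.0, 15.0], [-1.0, -0.5, 0.3, 1.2]),
    "tandem_repeat_len": ([20.0, 100.0], [-0.4, 0.2, 0.8]),
}


def lookup(edges: List[float], scores: List[float], x: float) -> float:
    """Return the score of the bin that x falls into."""
    return scores[bisect.bisect_right(edges, x)]


def pairwise_term(homopolymer_len: float, tandem_repeat_len: float) -> float:
    """Hypothetical interaction f_{1,2}(x1, x2): nonzero only when both are long."""
    return 0.5 if homopolymer_len > 15 and tandem_repeat_len > 100 else 0.0


def ebm_score(x: Dict[str, float]) -> float:
    """g(y): intercept + sum of main effects + pairwise interaction (log-odds)."""
    score = INTERCEPT
    for name, (edges, scores) in MAIN_EFFECTS.items():
        score += lookup(edges, scores, x[name])
    score += pairwise_term(x["homopolymer_len"], x["tandem_repeat_len"])
    return score


def ebm_probability(x: Dict[str, float]) -> float:
    """Invert the logit link with the logistic function to get P(error)."""
    return 1.0 / (1.0 + math.exp(-ebm_score(x)))
```

For example, `{"homopolymer_len": 20, "tandem_repeat_len": 150}` scores -2.0 + 1.2 + 0.8 + 0.5 = 0.5 on the logit scale, i.e. a probability of about 0.62. Because every term depends on one feature (or one pair), each can be plotted on its own, which is where the transparency comes from. In practice such a model would be fit with a library such as interpret (linked above) rather than written by hand.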
[Figure: analysis pipeline. Benchmark variants (ground truth) and comparison variants (region, REF, ALT) → compare variants → variant labels (y: TP/FP) with regions → join labels with genomic region features (x1, x2, x3) by region → fit EBM on y vs. x's → interpretation]
Labels
TP: variant in both benchmark and comparison
FP: variant only in comparison (error)
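The labeling and joining steps of the pipeline can be sketched as follows. This is a simplified illustration under stated assumptions, not StratoMod's code: variants are matched by exact (chrom, pos, ref, alt) identity, whereas dedicated tools such as hap.py or vcfeval perform haplotype-aware comparison; positions are treated as 0-based so they can be tested directly against BED-style half-open regions; and the function names and feature values are hypothetical. FN labels are included because the second use case trains on them; an FP model would keep only the TP and FP rows.

```python
from typing import Dict, List, Tuple

# Hypothetical representations: a variant is (chrom, pos, ref, alt) with a 0-based
# pos, and a region is a 0-based, half-open interval (chrom, start, end).
Variant = Tuple[str, int, str, str]
Region = Tuple[str, int, int]


def label_variants(benchmark: List[Variant], comparison: List[Variant]) -> Dict[Variant, str]:
    """Label variants by exact match (a simplification of haplotype-aware comparison)."""
    bench, comp = set(benchmark), set(comparison)
    labels = {v: "TP" for v in comp & bench}        # in both callsets
    labels.update({v: "FP" for v in comp - bench})  # only in the comparison (error)
    labels.update({v: "FN" for v in bench - comp})  # missed by the comparison
    return labels


def join_features(
    labels: Dict[Variant, str],
    region_features: Dict[Region, Dict[str, float]],
    default: Dict[str, float],
) -> List[Dict[str, object]]:
    """Attach the features of the region containing each variant (or defaults)."""
    rows: List[Dict[str, object]] = []
    for (chrom, pos, _ref, _alt), y in sorted(labels.items()):
        feats = default
        # Linear scan for clarity; an interval tree or bedtools scales better.
        for (r_chrom, start, end), region_feats in region_features.items():
            if r_chrom == chrom and start <= pos < end:
                feats = region_feats
                break
        rows.append({"chrom": chrom, "pos": pos, "y": y, **feats})
    return rows
```

Each output row pairs a label y with the feature columns x1, x2, ..., which is the table the EBM is fit on.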
Model Fit
Shows transparently which genomic contexts are likely to lead to incorrect variant calls
Feature class: description
• Segmental duplications: highly similar sequences (>1 kb) repeated throughout the genome
• Tandem repeats: short repeated patterns (e.g., TATATA)
• Transposable elements: sequences that copy themselves to other locations in the genome
Other features included:
In general, variants in highly repetitive regions are hard to identify correctly
Example: imperfect homopolymers, i.e., regions consisting (almost entirely) of a single repeated base
AAAAAAAAAAAAA
GGGGGGGGGCGGGGGGGGGGGG
CCCCCCCCCCCCCCCCCCC
Use the length of these regions as a feature
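One way to compute an imperfect-homopolymer length feature is sketched below. The exact definition StratoMod uses may differ; here we assume (hypothetically) that a homopolymer is a run of at least `min_len` identical bases, and that runs of the same base separated by at most `max_gap` other bases (one, by default, as in the G/C example above) are merged into a single imperfect homopolymer. The function names are illustrative.

```python
import re
from typing import List, Tuple

Run = Tuple[int, int, str]  # (start, end, base), 0-based half-open


def perfect_runs(seq: str, min_len: int = 4) -> List[Run]:
    """Maximal runs of a single base (A/C/G/T) of length >= min_len."""
    runs = []
    for m in re.finditer(r"A+|C+|G+|T+", seq.upper()):
        if m.end() - m.start() >= min_len:
            runs.append((m.start(), m.end(), m.group()[0]))
    return runs


def imperfect_homopolymers(seq: str, min_len: int = 4, max_gap: int = 1) -> List[Run]:
    """Merge same-base runs separated by <= max_gap other bases (assumed rule)."""
    merged: List[Run] = []
    for start, end, base in perfect_runs(seq, min_len):
        if merged and merged[-1][2] == base and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], end, base)  # extend the previous run
        else:
            merged.append((start, end, base))
    return merged


def homopolymer_length_at(seq: str, pos: int, **kwargs: int) -> int:
    """Feature value for a variant at 0-based pos: length of the enclosing region, else 0."""
    for start, end, _base in imperfect_homopolymers(seq, **kwargs):
        if start <= pos < end:
            return end - start
    return 0
```

With these assumptions, `"G" * 9 + "C" + "G" * 12` is reported as a single 22 bp G homopolymer, while `"AAAATTAAAA"` stays as two separate 4 bp runs because the gap is two bases.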
Funding: NIST Use-Inspired AI Program (FARSAIT)
bioRxiv preprint: https://doi.org/10.1101/2023.01.20.524401
Contact: njd2@nist.gov
• Parameters: 19 main effects and 18 interactions (all features crossed with PCR-free vs. PCR-plus)
• Trained on false positives (FPs) in unfiltered DeepVariant candidate calls vs. true positives (TPs)
• FP errors are less likely overall for PCR-free
• The length at which the FP error rate exceeds baseline (dotted line) is longer for PCR-free
• FP errors are equally likely¹ in both technologies
• FP errors are likely in A/T homopolymers longer than 15 bp and in G/C homopolymers of any length
INDELs
SNVs
Use case 1: assessing FP errors in PCR-free/plus Illumina sequencing methods
• Parameters: 17 main effects and 16 interactions with the VCF input file (to compare pipelines)
• Modeled missed or incorrectly filtered variants by training on false negatives (FNs) in final DeepVariant callsets vs. TPs
Use case 2: assessing FN error likelihood in HiFi and Illumina analysis pipelines
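Both use cases cross every genomic feature with a single grouping variable (PCR-free vs. PCR-plus, or the VCF input). The sketch below shows how such a term list might be built and checks it against the parameter counts quoted above. It assumes, as our reading of the poster, that the grouping variable is itself counted as one main effect (19 = 18 + 1 and 17 = 16 + 1); the feature and indicator names are hypothetical. The interpret package also lets users specify interaction terms explicitly, but consult its documentation for the accepted format rather than relying on this sketch.

```python
from typing import List, Tuple


def crossed_terms(features: List[str], indicator: str) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Main effects: every feature plus the indicator; interactions: each feature x indicator."""
    main_effects = list(features) + [indicator]
    interactions = [(feature, indicator) for feature in features]
    return main_effects, interactions


# Use case 1: 18 hypothetical genomic features crossed with PCR-free vs. PCR-plus.
uc1_mains, uc1_pairs = crossed_terms([f"feature_{i}" for i in range(18)], "pcr_free")
# Use case 2: 16 hypothetical genomic features crossed with the VCF input.
uc2_mains, uc2_pairs = crossed_terms([f"feature_{i}" for i in range(16)], "vcf_input")
```

Restricting interactions to feature-by-group pairs, instead of letting the model search all pairs, keeps each interaction directly interpretable as "how this feature's effect differs between the two methods".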
1. The “score” on each y-axis is in log-odds (logit) space; each plot is the sum of the indicated feature's main effect, its interaction with the VCF input, and the VCF input main effect.
2. Difficult-to-map regions as defined by the GIAB v3.0 stratifications
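Footnote 1 describes each plotted curve as a sum of three EBM terms. A minimal sketch of that sum is shown below; the shape functions, their values, and the pipeline names `pipeline_a`/`pipeline_b` are hypothetical placeholders, not fitted results. Note that the plotted score omits the model intercept and all other features, so it shows relative differences in error likelihood between pipelines rather than an absolute probability.

```python
from typing import Dict

# Hypothetical main effect of the VCF input (pipeline) on the logit scale.
MAIN_EFFECT_VCF: Dict[str, float] = {"pipeline_a": 0.3, "pipeline_b": -0.3}


def main_effect_homopolymer(length: int) -> float:
    """Hypothetical shape function f_k: flat up to 10 bp, then rising linearly."""
    return -0.5 if length <= 10 else 0.1 * (length - 10) - 0.5


def interaction_homopolymer_vcf(length: int, vcf: str) -> float:
    """Hypothetical interaction f_{k,vcf}: the two pipelines diverge above 15 bp."""
    if length <= 15:
        return 0.0
    return 0.4 if vcf == "pipeline_a" else -0.4


def plotted_score(length: int, vcf: str) -> float:
    """Feature main effect + its interaction with the VCF input + VCF input main effect."""
    return (
        main_effect_homopolymer(length)
        + interaction_homopolymer_vcf(length, vcf)
        + MAIN_EFFECT_VCF[vcf]
    )
```

With these placeholder values, a 20 bp homopolymer scores 0.5 + 0.4 + 0.3 = 1.2 for `pipeline_a` and 0.5 - 0.4 - 0.3 = -0.2 for `pipeline_b`; the gap between the two curves at a given length is what the per-pipeline plots make visible.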
INDELs
SNVs
Performance was consistent across genomic samples
• DeepVariant/Benchmark had more overlap than StratoMod/Benchmark
• StratoMod/DeepVariant had significant non-benchmark overlap
• Most ClinVar variants that were predicted to be missed were in hard-to-map regions
• This approach has the additional advantage of also identifying difficult variants in large INDELs and tandem repeats
Element shows an advantage for longer A/T homopolymers
Linear model: $y = \beta_1 x_1 + \beta_2 x_2 + \dots$
Generalized linear model: $g(y) = \beta_1 x_1 + \beta_2 x_2 + \dots$
Additive model: $y = f_1(x_1) + f_2(x_2) + \dots$
Generalized additive model: $g(y) = f_1(x_1) + f_2(x_2) + \dots$
Full-complexity model: $y = f(x_1, x_2, \dots)$
Legend: Shared (p(FN) > 0.9); Non-shared (p(FN) > 0.9)

Stratomod ASHG 2023
