1. GIAB Analysis Team: SNP/indel update and SV comparisons
Justin Zook
January 28, 2016
2. Overview of SNP/indel integration process
• Find sensitive variant calls and callable regions for each dataset
• Find “consensus” calls with support from 2+ technologies (and no other technologies disagree)
• Use “consensus” calls to train a simple one-class model for each dataset and find “outliers” that are less trustworthy for each dataset
• Find high-confidence calls by using callable regions and “outliers” to arbitrate between datasets when they disagree
• Find high-confidence regions by taking the union of callable regions and subtracting uncertain variants and difficult regions
• Notes: not yet finalized; not yet on DNAnexus; which parts are most useful to others to make easier to use?
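The “consensus” step above can be sketched in a few lines. This is an illustrative toy, not the GIAB pipeline itself: the function name, the `(chrom, pos) -> genotype` representation, and the per-technology callable sets are all assumptions made for the example.

```python
def consensus_calls(calls_by_tech, callable_by_tech):
    """Keep a call when 2+ technologies support the same genotype and no
    technology whose callable regions cover the site calls it differently.

    calls_by_tech: {tech: {(chrom, pos): genotype}}
    callable_by_tech: {tech: set of (chrom, pos) the tech can assess}
    """
    sites = set()
    for calls in calls_by_tech.values():
        sites.update(calls)

    consensus = {}
    for site in sites:
        genotypes = [c[site] for c in calls_by_tech.values() if site in c]
        covering = [t for t, cov in callable_by_tech.items() if site in cov]
        # support from 2+ technologies, all agreeing on the genotype
        agree = len(genotypes) >= 2 and len(set(genotypes)) == 1
        # no covering technology dissents (a covering tech with no call
        # at the site is treated as non-dissenting here)
        no_dissent = agree and all(
            calls_by_tech[t].get(site, genotypes[0]) == genotypes[0]
            for t in covering
        )
        if agree and no_dissent:
            consensus[site] = genotypes[0]
    return consensus
```

The real integration additionally trains per-dataset one-class models on these consensus calls to flag outliers, which this sketch omits.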
3. Preliminary comparisons to 2.19 on chr20 for NA12878
V2.19
• Bases in 2.19 bed: 51.8 Mbp
• Total calls: 73412 (9k indels; 2k in homopol>10)
• Total calls in 2.19 bed: 73412 (9k indels)
• Concordant calls: 71669
• Concordant in 3.0 bed: 67657
• FPs in both beds: 0
• FNs in both beds: 9
• Genotype errors: 0
• Allele errors: 1
• Extra calls outside 3.0 bed: 1674 (708 SNPs)
V3.0
• Bases in 3.0 bed: 51.1 Mbp
• Total calls: 86886 (13k indels; 4k in homopol>10)
• Total calls in 3.0 bed: 70202 (7.4k indels); in 2+ platforms: 64617 (5k indels)
• Concordant calls: 71669
• Concordant in 3.0 bed: 67657
• FPs in both beds: 0
• FNs in both beds: 0
• Genotype errors: 3
• Allele errors: 0
• Extra calls outside 2.19 bed: 2463 (1285 SNPs)
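The tallies above (concordant calls, genotype errors, allele errors, FPs/FNs inside both beds) can be computed with logic like the following sketch. The `(chrom, pos) -> (alt_allele, genotype)` representation and the pre-intersected `shared_bed` are simplifying assumptions for illustration; the actual comparison handles full VCF records and interval beds.

```python
def compare_callsets(calls_a, calls_b, shared_bed):
    """Tally agreement between two callsets restricted to sites inside
    both high-confidence beds.

    calls_a, calls_b: {(chrom, pos): (alt_allele, genotype)}
    shared_bed: set of (chrom, pos) inside both confidence beds
    """
    counts = {"concordant": 0, "genotype_errors": 0, "allele_errors": 0,
              "fn": 0, "fp": 0}
    for site in shared_bed:
        a, b = calls_a.get(site), calls_b.get(site)
        if a is None and b is None:
            continue                       # no variant called by either
        elif a is None:
            counts["fp"] += 1              # call only in B inside both beds
        elif b is None:
            counts["fn"] += 1              # call only in A inside both beds
        elif a == b:
            counts["concordant"] += 1
        elif a[0] == b[0]:
            counts["genotype_errors"] += 1  # same allele, different genotype
        else:
            counts["allele_errors"] += 1    # different alt allele
    return counts
```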
4. How can we add more difficult calls/regions to our high-confidence set?
• Develop new method
• Confirm subset of calls (e.g., manual inspection of multiple datasets/individuals or targeted experimental validation)
• Make calls available for community curation
• Others compare and submit feedback about curation results
• NIST critically evaluates results for integration into the GIAB callsets
5. Potential Breakout discussions
• Benchmarking SVs
– Can we use existing tools?
– What should the performance metrics be?
– How stringent is the matching (e.g., correct type, correct size, correct
breakpoints, correct sequence)?
• Confirmation/Validation of SVs
– Design questions for manual inspectors
– When is targeted experimental validation needed/useful?
– Randomly selected vs. stratified by size/type/difficulty…
• How to establish benchmark SVs?
– How many levels of confidence?
• Can we establish confident regions that do not have SVs in order to assess
FP rates?
• Ideas for a hackathon adjacent to August workshop
– Manual curation of SVs and other difficult variants/regions
– SV benchmarking tools
– Manual curation tools for SNPs, indels, and/or SVs
– …
• Other ideas?
6. Actual Breakout discussions
• What criteria should we use to decide when 2 SVs should be
considered the “same” and merged?
– For establishing the benchmark
– When comparing to the benchmark
• How should we confirm/validate candidate SV calls and establish benchmark SVs?
– Questions for manual inspectors
– When is targeted experimental validation needed/useful?
– Randomly selected vs. stratified by size/type/difficulty…
– How many levels of confidence?
• How can we utilize new sophisticated variant
comparison tools to improve our benchmark SNP/indel
callsets and how can we develop high-confidence calls
for GRCh38?
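One possible instantiation of the matching-stringency question raised above (correct type, correct size, correct breakpoints) can be sketched as a simple predicate. The thresholds and the dict-based SV representation here are illustrative assumptions, not agreed GIAB parameters:

```python
def svs_match(a, b, max_breakpoint_dist=100, min_size_ratio=0.7):
    """Decide whether two SV calls should be considered the "same":
    same chromosome and type, similar size, and both breakpoints
    within a distance tolerance.

    a, b: dicts with 'chrom', 'start', 'end', 'type' (e.g. 'DEL')
    """
    if a["chrom"] != b["chrom"] or a["type"] != b["type"]:
        return False
    size_a = a["end"] - a["start"]
    size_b = b["end"] - b["start"]
    # size similarity: smaller SV must be at least min_size_ratio
    # of the larger one
    if min(size_a, size_b) / max(size_a, size_b) < min_size_ratio:
        return False
    # both breakpoints within the allowed distance
    return (abs(a["start"] - b["start"]) <= max_breakpoint_dist
            and abs(a["end"] - b["end"]) <= max_breakpoint_dist)
```

Tightening or loosening `max_breakpoint_dist` and `min_size_ratio` (or adding a sequence-identity check for insertions) is exactly the stringency trade-off the breakout raises: stricter matching penalizes imprecise but real calls, while looser matching can merge distinct events.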