1. GIAB Analysis Team: SNP/indel update and SV comparisons
Justin Zook
January 28, 2016
2. Overview of SNP/indel integration process
• Find sensitive variant calls and callable regions for each dataset
• Find “consensus” calls with support from 2+ technologies (and no other technologies disagree)
• Use “consensus” calls to train a simple one-class model for each dataset and find “outliers” that are less trustworthy for each dataset
• Find high-confidence calls by using callable regions and “outliers” to arbitrate between datasets when they disagree
• Find high-confidence regions by taking the union of callable regions and subtracting uncertain variants and difficult regions
• Notes: not yet finalized; not yet on DNAnexus; which parts are most useful to others to make easier to use?
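The “consensus” step above can be sketched in a few lines. This is an illustrative toy, not the GIAB pipeline itself: the function name, the `(chrom, pos) -> genotype` representation, and the per-technology callable sets are all assumptions made for the example.

```python
def consensus_calls(calls_by_tech, callable_by_tech):
    """Keep a call when 2+ technologies support the same genotype and no
    technology whose callable regions cover the site calls it differently.

    calls_by_tech: {tech: {(chrom, pos): genotype}}
    callable_by_tech: {tech: set of (chrom, pos) the tech can assess}
    """
    sites = set()
    for calls in calls_by_tech.values():
        sites.update(calls)

    consensus = {}
    for site in sites:
        genotypes = [c[site] for c in calls_by_tech.values() if site in c]
        covering = [t for t, cov in callable_by_tech.items() if site in cov]
        # support from 2+ technologies, all agreeing on the genotype
        agree = len(genotypes) >= 2 and len(set(genotypes)) == 1
        # no covering technology dissents (a covering tech with no call
        # at the site is treated as non-dissenting here)
        no_dissent = agree and all(
            calls_by_tech[t].get(site, genotypes[0]) == genotypes[0]
            for t in covering
        )
        if agree and no_dissent:
            consensus[site] = genotypes[0]
    return consensus
```

The real integration additionally trains per-dataset one-class models on these consensus calls to flag outliers, which this sketch omits.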
3. Preliminary comparisons to 2.19 on chr20 for NA12878
V2.19
• Bases in 2.19 bed: 51.8 Mbp
• Total calls: 73412 (9k indels; 2k in homopol>10)
• Total calls in 2.19 bed: 73412 (9k indels)
• Concordant calls: 71669
• Concordant in 3.0 bed: 67657
• FPs in both beds: 0
• FNs in both beds: 9
• Genotype errors: 0
• Allele errors: 1
• Extra calls outside 3.0 bed: 1674 (708 SNPs)
V3.0
• Bases in 3.0 bed: 51.1 Mbp
• Total calls: 86886 (13k indels; 4k in homopol>10)
• Total calls in 3.0 bed: 70202 (7.4k indels); in 2+ platforms: 64617 (5k indels)
• Concordant calls: 71669
• Concordant in 3.0 bed: 67657
• FPs in both beds: 0
• FNs in both beds: 0
• Genotype errors: 3
• Allele errors: 0
• Extra calls outside 2.19 bed: 2463 (1285 SNPs)
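The tallies above (concordant calls, genotype errors, allele errors, FPs/FNs inside both beds) can be computed with logic like the following sketch. The `(chrom, pos) -> (alt_allele, genotype)` representation and the pre-intersected `shared_bed` are simplifying assumptions for illustration; the actual comparison handles full VCF records and interval beds.

```python
def compare_callsets(calls_a, calls_b, shared_bed):
    """Tally agreement between two callsets restricted to sites inside
    both high-confidence beds.

    calls_a, calls_b: {(chrom, pos): (alt_allele, genotype)}
    shared_bed: set of (chrom, pos) inside both confidence beds
    """
    counts = {"concordant": 0, "genotype_errors": 0, "allele_errors": 0,
              "fn": 0, "fp": 0}
    for site in shared_bed:
        a, b = calls_a.get(site), calls_b.get(site)
        if a is None and b is None:
            continue                       # no variant called by either
        elif a is None:
            counts["fp"] += 1              # call only in B inside both beds
        elif b is None:
            counts["fn"] += 1              # call only in A inside both beds
        elif a == b:
            counts["concordant"] += 1
        elif a[0] == b[0]:
            counts["genotype_errors"] += 1  # same allele, different genotype
        else:
            counts["allele_errors"] += 1    # different alt allele
    return counts
```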
4. How can we add more difficult calls/regions to our high-confidence set?
• Develop new method
• Confirm subset of calls (e.g., manual inspection of multiple datasets/individuals or targeted experimental validation)
• Make calls available for community curation
• Others compare and submit feedback about curation results
• NIST critically evaluates results for integration into the GIAB callsets
5. Potential Breakout discussions
• Benchmarking SVs
– Can we use existing tools?
– What should the performance metrics be?
– How stringent is the matching (e.g., correct type, correct size, correct
breakpoints, correct sequence)?
• Confirmation/Validation of SVs
– Design questions for manual inspectors
– When is targeted experimental validation needed/useful?
– Randomly selected vs. stratified by size/type/difficulty…
• How to establish benchmark SVs?
– How many levels of confidence?
• Can we establish confident regions that do not have SVs in order to assess
FP rates?
• Ideas for a hackathon adjacent to August workshop
– Manual curation of SVs and other difficult variants/regions
– SV benchmarking tools
– Manual curation tools for SNPs, indels, and/or SVs
– …
• Other ideas?
6. Actual Breakout discussions
• What criteria should we use to decide when 2 SVs should be
considered the “same” and merged?
– For establishing the benchmark
– When comparing to the benchmark
• How should we confirm/validate candidate SV calls and establish benchmark SVs?
– Questions for manual inspectors
– When is targeted experimental validation needed/useful?
– Randomly selected vs. stratified by size/type/difficulty…
– How many levels of confidence?
• How can we utilize new sophisticated variant
comparison tools to improve our benchmark SNP/indel
callsets and how can we develop high-confidence calls
for GRCh38?
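One possible instantiation of the matching-stringency question raised above (correct type, correct size, correct breakpoints) can be sketched as a simple predicate. The thresholds and the dict-based SV representation here are illustrative assumptions, not agreed GIAB parameters:

```python
def svs_match(a, b, max_breakpoint_dist=100, min_size_ratio=0.7):
    """Decide whether two SV calls should be considered the "same":
    same chromosome and type, similar size, and both breakpoints
    within a distance tolerance.

    a, b: dicts with 'chrom', 'start', 'end', 'type' (e.g. 'DEL')
    """
    if a["chrom"] != b["chrom"] or a["type"] != b["type"]:
        return False
    size_a = a["end"] - a["start"]
    size_b = b["end"] - b["start"]
    # size similarity: smaller SV must be at least min_size_ratio
    # of the larger one
    if min(size_a, size_b) / max(size_a, size_b) < min_size_ratio:
        return False
    # both breakpoints within the allowed distance
    return (abs(a["start"] - b["start"]) <= max_breakpoint_dist
            and abs(a["end"] - b["end"]) <= max_breakpoint_dist)
```

Tightening or loosening `max_breakpoint_dist` and `min_size_ratio` (or adding a sequence-identity check for insertions) is exactly the stringency trade-off the breakout raises: stricter matching penalizes imprecise but real calls, while looser matching can merge distinct events.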