Aug 2013 performance metrics working group
Genome in a Bottle
Workgroup 4: Performance Metrics
Q1. What performance metrics can/should be generated when someone sequences the GIAB RMs?
Sequence group 1: Initial characterization of the RM to develop the ‘truth set’
Sequence group 2: People using reference materials to
To do list:
Create a document describing metadata we want to capture (Chris Mason)
Identify fields we can reliably get from sequencers (Chris Mason)
Develop a flat data structure to capture information (Brad Chapman)
Help develop an improved individual genotype reporting format.
Work with CDC group on this.
Work with VCF/gVCF/GVF developers
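One possible shape for the flat data structure in the to-do list is a single record per sequencing run, written out as one tab-separated row. This is only a minimal sketch: every field name below (`sample_id`, `platform`, `read_length`, and so on) is a hypothetical placeholder, not a field list the workgroup agreed on.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class RunMetadata:
    # Hypothetical fields for illustration only; the real list would come
    # from the metadata document described in the to-do list.
    sample_id: str
    platform: str
    instrument_model: str
    read_length: int
    mean_coverage: float
    aligner: str
    variant_caller: str


def to_tsv(records):
    """Serialize RunMetadata records as a flat TSV table with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=[f.name for f in fields(RunMetadata)],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buf.getvalue()
```

Keeping the record flat (no nesting) means each run maps to exactly one row, which is easy to load into a spreadsheet or a database table.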
Q2. How should performance be subdivided by region?
Q3. How should performance be subdivided by variant type?
Assembly Region Reproducibility Track (for all RMs)
Highly confident regions
Less confident regions
Regions we can’t reliably call
NA12878 high quality genotype calls
Focus on SNVs and small indels first
Expand to other variant types as we get more confidence
Update definitions as we add additional reference materials.
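A rough sketch of how Q2 and Q3 could combine: count true positives, false positives, and false negatives separately for each (region tier, variant type) pair. The assumptions here are not from the slides: variants are plain `(chrom, pos, ref, alt)` tuples compared by exact match, region tiers are given as BED-style 0-based half-open intervals, and the tier names and function names are hypothetical. Real comparisons would also need to reconcile different representations of the same indel, which this sketch ignores.

```python
from collections import defaultdict


def variant_type(ref, alt):
    # Simplification: anything that is not a single-base substitution is
    # counted as an indel (so multi-base substitutions are misfiled here).
    return "SNV" if len(ref) == 1 and len(alt) == 1 else "indel"


def region_tier(chrom, pos, tiers):
    """Return the first tier whose intervals contain the 1-based position.

    tiers maps a tier name to a list of (chrom, start, end) intervals,
    0-based and half-open as in BED files.
    """
    for name, intervals in tiers.items():
        for c, start, end in intervals:
            if c == chrom and start < pos <= end:
                return name
    return "uncallable"


def stratified_metrics(truth, calls, tiers):
    """Compare two sets of (chrom, pos, ref, alt) tuples per tier and type."""
    counts = defaultdict(lambda: {"TP": 0, "FP": 0, "FN": 0})
    for variant in truth | calls:
        chrom, pos, ref, alt = variant
        key = (region_tier(chrom, pos, tiers), variant_type(ref, alt))
        if variant in truth and variant in calls:
            counts[key]["TP"] += 1
        elif variant in truth:
            counts[key]["FN"] += 1
        else:
            counts[key]["FP"] += 1

    results = {}
    for key, c in counts.items():
        tp, fp, fn = c["TP"], c["FP"], c["FN"]
        results[key] = {
            **c,
            # None marks a metric that is undefined for an empty denominator.
            "sensitivity": tp / (tp + fn) if tp + fn else None,
            "precision": tp / (tp + fp) if tp + fp else None,
        }
    return results
```

Reporting per stratum rather than one genome-wide number keeps strong SNV performance in highly confident regions from masking weaker indel performance elsewhere.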
Q4. How can GIAB help coordinate
the different groups developing
Develop APIs for existing software:
X-prize/Harvard School of Public Health software
bcbio.variation (comparison software)
bcbio-nextgen (pipeline for running comparison)
Chris Mason’s software suite
GCAT software (Bioplanet)
GeT-RM browser for visualization