Workgroup 4 Meeting Report
Group leader: Deanna M. Church
Co-leader: Melissa Landrum
Meeting date: Jan 27-28
Location: Stanford University
Workgroup 4 is tasked with defining how users in the community interact with and
use the GIAB data. During the course of the two-day meeting we focused on aspects
of the user interface that need to be addressed. These topics include defining the
target audience, understanding how these various user groups will interface with the
tools and integrating with visualization tools, such as the GeT-RM browser. The goal
of this workgroup is to produce a specification document by the end of February.
Defining the target audience
It is anticipated that a wide variety of users will want to interact with this data. A
prioritized list of users was proposed:
1. Regulators (FDA)
2. Accreditors (CLIA/CAP)
3. Clinical Labs
4. Platform Developers
There were four aspects of tools development that were discussed and need to be
addressed in the specification document.
1. Software development and licensing
Francisco de la Vega presented software for comparing VCF files that was
developed by Real Time Genomics. While the software is freely available, it
is not open source; this led to a discussion of source code availability and
licensing. The workgroup unanimously agreed that software should be open
source. There was less clarity on the licensing requirements, but Nils Homer
volunteered to research license types and make recommendations in the
specification document.
2. Software interface
There was also unanimous agreement that software used to compare user
variant calls to GIAB datasets would need to be accessible via a web interface
and an API.
3. Inputs and outputs
The input and output formats need to be well defined in the specification
document. It is likely we will need some translation tools to support the
web interface, as many users of this interface may have difficulty
producing well-formatted VCF files. NCBI is building a suite of tools to
handle such conversions.
4. Development cycles
We are likely better off getting tools out to the community sooner rather than
later so we can gather feedback. This means we may need to be prepared to
discard early versions of software if they don't fully meet our needs (which
will be better defined as we get feedback from the community).
5. User feedback
It is critical to provide a mechanism to allow users to provide feedback on the
utility of the tool.
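To make the web-interface and API requirement (item 2) concrete, the sketch below shows what a client-side submission to a comparison API might look like. The endpoint URL, field names, and default truth-set label are all hypothetical placeholders; the actual interface is still to be defined in the specification document.

```python
# Sketch of a client building a submission to a hypothetical comparison API.
# The URL and JSON field names below are placeholders, not a defined interface.
import json
import urllib.request

def build_comparison_request(vcf_text, bed_text, truth_set="GIAB-default"):
    """Build (but do not send) an HTTP request submitting calls for comparison."""
    payload = json.dumps({
        "truth_set": truth_set,   # which truth set to compare against
        "vcf": vcf_text,          # the user's variant calls
        "regions": bed_text,      # BED file of the regions actually analyzed
    }).encode("utf-8")
    return urllib.request.Request(
        "https://example.org/giab/compare",  # placeholder URL
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Separating request construction from submission like this keeps the same payload usable from both the web interface and programmatic API clients.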
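The translation tooling in item 3 could include basic sanity checks run before a user's calls are accepted for comparison. The following is a minimal sketch of such a check; the function names are hypothetical and this is not the NCBI tool suite mentioned above (for example, it ignores symbolic ALT alleles and header validation).

```python
# Minimal sketch of a VCF data-line sanity check of the kind a web front end
# might run before submitting user calls for comparison. Hypothetical helper,
# not the NCBI tooling; real VCF validation is considerably more involved.

def check_vcf_line(line):
    """Return a list of problems found in one VCF data line (empty if OK)."""
    problems = []
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        problems.append("fewer than 8 tab-separated fields")
        return problems
    chrom, pos, _id, ref, alt = fields[:5]
    if not pos.isdigit():
        problems.append("POS is not a positive integer: %r" % pos)
    if not ref or not all(c in "ACGTN" for c in ref.upper()):
        problems.append("REF is not a DNA string: %r" % ref)
    for allele in alt.split(","):
        if allele != "." and not all(c in "ACGTN" for c in allele.upper()):
            problems.append("ALT allele is not a DNA string: %r" % allele)
    return problems

def check_vcf(lines):
    """Yield (line_number, problems) for malformed data lines."""
    for i, line in enumerate(lines, 1):
        if line.startswith("#") or not line.strip():
            continue
        problems = check_vcf_line(line)
        if problems:
            yield i, problems
```

Reporting line-by-line problems, rather than rejecting the whole file, gives users of the web interface actionable feedback on badly formatted submissions.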
Much of the discussion focused on data analysis. For some aspects of analysis, there
was strong agreement:
- Users need to be able to provide a BED file of the regions analyzed so that
  they are not overly penalized with false-negative calls in regions of the
  genome they did not analyze.
- Analysis needs to be performed at various levels depending on the user's
  needs. For example, some users will only want to score variant calling,
  others will want to score genotype calls, and others may want to score
  phasing.
- It is likely we will need to support more than one 'Truth set', though a
  reasonable default will need to be chosen.
  o It is critical to provide a mechanism that allows users to give feedback
    concerning problems or errors with the 'Truth set'.
- We need crisp definitions of comparison terms, so that as different
  developers begin developing software we can all communicate using the same
  vocabulary.
- We will need to support different analyses for different variant types.
  o We will need to support all variant types defined in the truth set. This
    means no SVs/CNVs in phase 1.
  o We will not likely have the same level of support for complex variants
    as we do for substitution variants.
- We need clear definitions for sensitivity and specificity calculations.
- We need to provide users with concise summaries, but we also need to
  provide very detailed analysis files as well.
- Ideally the software will produce files suitable for import into the GeT-RM
  browser to facilitate manual review of the data.
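The BED-restricted scoring and the metric definitions above could be sketched as follows. The data structures are deliberately simplified (variants as chromosome/position pairs, no allele or genotype matching), and since true negatives are ill-defined in variant calling, the sketch computes precision where the report says "specificity"; the exact definitions remain to be fixed in the specification document.

```python
# Sketch of BED-restricted sensitivity/precision scoring. Variants are
# (chrom, pos) pairs and regions are (chrom, start, end) half-open intervals;
# a real comparison tool must also match alleles and genotypes. Hypothetical
# helper names, not a defined GIAB interface.

def in_regions(variant, regions):
    chrom, pos = variant
    return any(c == chrom and start <= pos < end for c, start, end in regions)

def score(truth, calls, regions):
    """Compare a call set to a truth set, counting only analyzed regions."""
    truth_in = {v for v in truth if in_regions(v, regions)}
    calls_in = {v for v in calls if in_regions(v, regions)}
    tp = len(truth_in & calls_in)
    fn = len(truth_in - calls_in)   # missed truth variants, inside regions only
    fp = len(calls_in - truth_in)   # calls with no truth support
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return tp, fp, fn, sensitivity, precision
```

Note that a truth variant lying outside the user-supplied regions is excluded before counting, so it never becomes a false negative; this is the behavior the BED-file requirement is meant to guarantee.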
There was a great deal of discussion about the best way to deal with complex
variants. While it is clear that there is no standard approach to complex
variant comparison, and that it is a very difficult problem, there was no
strong consensus about how important it is for this to be handled robustly in
phase 1 of implementing this software. This will need to be addressed more
fully in the specification document.