2. Genome in a Bottle
Consortium Development
• NIST met with sequencing
technology developers to assess
standards needs
– Stanford, June 2011
• Open, exploratory workshop
– ASHG, Montreal, Canada
– October 2011
• Small, invitational workshop at
NIST to develop consortium for
human genome reference
materials
– FDA, NCBI, NHGRI, NCI, CDC, Wash
U, Broad, technology developers,
clinical labs, CAP, PGP, Partners,
ABRF, others
– developed draft work plan
– April 2012
• Open, public meeting at NIST to
formally establish consortium,
present draft work plan
– formed working groups
– identified candidate genomes
– established principles of:
• reference material selection
• characterization
• informatics
• performance metrics
– August 2012
• Open, public workshop at XGen
Congress
– March 2013
• Website
– www.genomeinabottle.org
3. Well-characterized, stable RMs
• Obtain metrics for validation,
QC, QA, PT
• Determine sources and types
of bias/error
• Learn to resolve difficult
structural variants
• Improve reference genome
assembly
• Optimization
– integration of data from
multiple platforms
– sequencing and analysis
• Enable regulated applications
[Figure: Comparison of SNP calls for NA12878 on 2 platforms, 3 analysis methods]
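A comparison of SNP calls across platforms and analysis methods, as on this slide, can be sketched as pairwise concordance between call sets. The snippet below is a minimal illustration only: the call-set names, the toy variants, and the choice of Jaccard overlap as the concordance measure are all assumptions, not the method used to produce the figure.

```python
from itertools import combinations

def jaccard(a: set, b: set) -> float:
    """Shared calls divided by all calls made by either call set."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def pairwise_concordance(callsets: dict) -> dict:
    """Jaccard overlap for every pair of named call sets."""
    return {
        (x, y): jaccard(callsets[x], callsets[y])
        for x, y in combinations(sorted(callsets), 2)
    }

# Hypothetical toy data: each variant is (chromosome, position, ref, alt).
callsets = {
    "platformA_caller1": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")},
    "platformA_caller2": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")},
    "platformB_caller1": {("chr1", 100, "A", "G"), ("chr2", 50, "G", "A"), ("chr3", 10, "T", "C")},
}

for pair, score in pairwise_concordance(callsets).items():
    print(pair, round(score, 3))
```

Low overlap between two call sets flags sites worth a closer look; it does not say which call set is correct.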
4. Measurement Process
Sample
gDNA isolation
Library Prep
Sequencing
Alignment/Mapping
Variant Calling
Confidence Estimates
Downstream Analysis
• gDNA reference
materials will be
developed to
characterize
performance of a part
of process
– materials will be
certified for their
variants against a
reference sequence,
with confidence
estimates
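The idea of variants carrying confidence estimates can be illustrated with a simple vote across independent datasets. This is a sketch under stated assumptions: the function name, the majority threshold, and the use of the support fraction as a stand-in for confidence are hypothetical and are not the consortium's certification method.

```python
from collections import Counter

def consensus_calls(callsets, min_fraction=0.5):
    """Return {variant: support_fraction} for variants called in more than
    min_fraction of the datasets. The support fraction is only a crude
    proxy for a confidence estimate."""
    n = len(callsets)
    if n == 0:
        return {}
    # Count each variant once per dataset, even if a dataset lists it twice.
    counts = Counter(v for calls in callsets for v in set(calls))
    return {v: c / n for v, c in counts.items() if c / n > min_fraction}

# Hypothetical toy data: three datasets for the same sample.
v1 = ("chr1", 100, "A", "G")
v2 = ("chr1", 200, "C", "T")
v3 = ("chr2", 50, "G", "A")
datasets = [{v1, v2, v3}, {v1, v2}, {v1}]

consensus = consensus_calls(datasets)
for variant, support in sorted(consensus.items()):
    print(variant, round(support, 2))
```

Here v1 is supported by all three datasets, v2 by two of three, and v3 is dropped because only one dataset calls it.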
5. Putting “Genomes” in Bottles
• NIST working with GiaB
to select genomes
• Current plan
– NA12878 HapMap
sample as Pilot sample
• part of 17-member
pedigree
– trios from PGP as more
complete set
• 8 trios, focus on children
• varying biogeographic
ancestry
[Figure: CEPH Utah Pedigree 1463. Grandparents: 12889, 12890, 12891, 12892. Parents: 12877, 12878. 11 children (birth order redacted): 12879, 12880, 12881, 12882, 12883, 12884, 12885, 12887, 12886, 12888, 12893.]
6. Genome in a Bottle Working Groups
Reference Material
Selection
& Design
Andrew Grupe,
Celera
•Develop prioritized list
of whole human
genomes for Reference
Materials
•Identify candidate
approaches and
materials for artificial
RMs
•Develop prioritized
list
Measurements for
Reference Material
Characterization
Mike Eberle, Illumina
•Develop consensus
plan for experimental
characterization of
Reference Materials
Bioinformatics,
Data Integration,
and Data
Representation
Steve Sherry, NCBI
•Develop plan for
integrating
experimental data and
forming consensus
variant calls and
confidence estimates
•Develop consensus
plan for data
representation
Performance Metrics
& Figures of Merit
Justin Johnson
•User interface to the
Genome-in-a-Bottle
Reference Material
•“Dashboard”
•what an end user will
see and report to
understand and
describe the
performance of their
experiment
•variant call accuracy
•process performance
measures to enable
optimization
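The "variant call accuracy" item of the Performance Metrics group can be made concrete with a small sketch that scores a lab's calls against reference-material calls. The function name and toy variants are hypothetical, and the sketch assumes both call sets are already restricted to regions where the reference calls are considered reliable.

```python
def accuracy_metrics(test_calls: set, reference_calls: set) -> dict:
    """Compare a lab's variant calls with the reference calls."""
    tp = len(test_calls & reference_calls)   # called and in the reference
    fp = len(test_calls - reference_calls)   # called but not in the reference
    fn = len(reference_calls - test_calls)   # in the reference but missed
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    precision = tp / (tp + fp) if tp + fp else float("nan")
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "precision": precision}

# Hypothetical toy data: each variant is (chromosome, position, ref, alt).
reference = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
             ("chr2", 50, "G", "A"), ("chr3", 10, "T", "C")}
lab_calls = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
             ("chr2", 50, "G", "A"), ("chr4", 75, "G", "C")}

metrics = accuracy_metrics(lab_calls, reference)
print(metrics)
```

A "dashboard" of the kind described above could report these numbers per variant type or per genomic region; that breakdown is left out here.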
7. Agenda
Thursday
Welcome and Intro
Integrating large scale sequencing into clinical
practice
Heidi Rehm
Personal Genomics
Michael Snyder
Break/Poster Session
Update on GIAB Progress
Marc Salit
Comparison of NIST, Platinum Genomes, and
other NA12878 call-sets to understand
sequencing performance
Justin Zook
Presentations from related projects
Platinum Genomes
Michael Eberle
NA12878 Trio Analysis
Francisco De La Vega
GeT-RM Project and Genome Browser
Deanna Church
Lunch (on your own in NIST cafeteria)
Working Group Meetings
Reference Material Selection & Design (Lecture
Room E)
Measurements for Reference Material
Characterization (Dining Room A&B)
Bioinformatics, Data Integration, and Data
Representation (Lecture Room A)
Performance Metrics and Figures of Merit
(Lecture Room C)
Friday
Discussion between working groups
Working group reports (Green Auditorium)
Workplan refinement, timeline
Lunch (on your own in NIST cafeteria)
Discussion: Scope of consortium, how to make
decisions
Resource needs, how to meet them, and next
steps
Please Note
The plenary sessions of this workshop are being
webcast (audio & slides); please use the
microphones when asking questions. Web
attendees can ask questions via chat. Slides will
be made available on SlideShare after the
workshop (see genomeinabottle.org).
Tweets are welcome unless the speaker requests
otherwise. Please use #giab as the hashtag.
10. Consenting Genomes for use as
Reference Materials
• Risk of re-identification
– this is a real risk
– privacy
– implications for family members
• Meaning of possibility of
withdrawal
• Commercial application
– indirect, research
– direct, derived products
• PGP currently state of the art
– broad and direct
– test to demonstrate understanding
• “Wild West”
11. NIST Reference Materials
Pilot RM - NA12878
• 8300 vials (10 µg each) of NA12878
gDNA at NIST, 4/2013
– Available for sequencing by
GIAB participants
– target for release as NIST RM
2/2014
• SNPs, small indels
• Will be sequenced at ~10 labs
– ~4 technologies, multiple
modes
• Received “Human Subjects
Approval” for release of
NA12878 as NIST RM
Personal Genome Project
• Ashkenazim trio DNA expected
~Dec 2013
• Asian son DNA expected ~Dec
2013
– Parents’ cell lines in process at
Coriell
• “Human subjects review”
close to approval for release of
PGP genomes as NIST RMs
• Plan is 5-6 additional trios of
diverse ancestry
– Ideally African, Asian, Hispanic
– What should we do if PGP
doesn’t have trios from each of
these groups?
12. Planned Measurements on NA12878
candidate RM
• NIST
– ~300x total 2x150bp Illumina
over 6 vials of NA12878
– ~100x SOLiD 5500W 2x50bp
coverage
– ~50x SOLiD 5500W 2x50bp
coverage of parents
• Illumina
– PCR-free
– Mate-pair
• Complete Genomics
– Normal pipeline
– LFR pipeline
• NCI
– Ion Proton
– Illumina
– Various libraries
• Garvan
– Illumina exome
• Celera
– Targeted panels
• Weill Cornell
– Illumina
• MTAs pending
– Univ. of Nebraska Medical
– Univ. of Michigan
13. HOW DO WE WANT TO FUNCTION
AS A CONSORTIUM?
What’s our scope?
How do we make decisions?
14. Spectrum of Possibilities
• NIST develops and
disseminates gDNA
RMs with consortium
input
• Consortium functions as
a Standards Body, with
dynamic portfolio and
broad influence
16. Scope
Basic Scope
• Develop/disseminate pilot
genome and 8 trios as RMs
– gDNA and reference data
• Develop/disseminate
“Performance Metrics
Suite”
– data repository?
• Documentary Standards to
describe methods?
– through a clinical SDO? CLSI?
IFCC?
Extended Scope
• Other RMs as part of GiaB
portfolio?
– tumor/normal pair
– artificial spike-in controls
• pDNA from NCI
– derived commercial materials
• Cell lines for which we have
reference gDNA
• Such cell lines embedded in
FFPE
– engineered cell lines
• designed as controls for
specific variants
17. Extended Scope
• need process to include
new material/product in
portfolio
– what does it mean to put the
GiaB imprimatur on
something?
• some possible requirements
– guidelines for usage
– methods for characterization
– conduct interlab studies to
establish utility
• how do we decide what to
do?
– need to be
• open, transparent, public
• form consensus
– pragmatic consensus
needs champion and
commitment
• e.g. proposer pilots interlab
• consortium members
participate in interlab
• how do we decide policy
matters?
– see draft data release policy
discussion on this tomorrow after lunch…