Jan2015 GIAB intro, Update, and Data Analysis Planning
1. Genome in a Bottle Consortium
January 2015
Stanford University
Reference Materials for Clinical Applications of Human Genome
Sequencing
Marc Salit, Ph.D. and Justin Zook, Ph.D.
National Institute of Standards and Technology
Advances in Biological/Medical Measurement Science
(ABMS @ Stanford)
2. GIAB Scope
• The Genome in a Bottle Consortium is
developing the reference materials, reference
methods, and reference data needed to assess
confidence in human whole genome variant
calls.
• A principal motivation for this consortium is to
enable performance assessment of
sequencing and science-based regulatory
oversight of clinical sequencing.
3. Genome in a Bottle
Consortium Development
• NIST met with sequencing
technology developers to assess
standards needs
– Stanford, June 2011
• Open, exploratory workshop
– ASHG, Montreal, Canada
– October 2011
• Small, invitational workshop at
NIST to develop consortium for
human genome reference
materials
– FDA, NCBI, NHGRI, NCI, CDC, Wash
U, Broad, technology developers,
clinical labs, CAP, PGP, Partners,
ABRF, others
– developed draft work plan
– April 2012
• Open, public meetings of GIAB
– August 2012 at NIST
– March 2013 at Xgen
– August 2013 at NIST
– January 2014 at Stanford
– August 2014 at NIST
– January 2015 at Stanford
• Website
– www.genomeinabottle.org
4. Well-characterized, stable RMs
• Obtain metrics for validation,
QC, QA, PT
• Determine sources and types
of bias/error
• Learn to resolve difficult
structural variants
• Improve reference genome
assembly
• Optimization
– integration of data from
multiple platforms
– sequencing and analysis
• Enable regulated applications
Figure: Comparison of SNP calls for NA12878 on 2 platforms, 3 analysis methods
5. Measurement Process
Sample
gDNA isolation
Library Prep
Sequencing
Alignment/Mapping
Variant Calling
Confidence Estimates
Downstream Analysis
• gDNA reference
materials will be
developed to
characterize
performance of a part
of process
– materials will be
certified for their
variants against a
reference sequence,
with confidence
estimates
6. Putting “Genomes” in Bottles
• NIST working with GIAB to select genomes
• Current plan
– NA12878 HapMap sample as Pilot sample
• part of 17-member pedigree
– trios from PGP as more complete set
• 2 trios, focus on children
• varying biogeographic ancestry
Figure: CEPH Utah Pedigree 1463
Grandparents: 12889, 12890, 12891, 12892
Parents: 12877, 12878
11 children (birth order redacted): 12879, 12880, 12881, 12882, 12883, 12884, 12885, 12887, 12886, 12888, 12893
8. Genome in a Bottle Working Groups
Reference Material
Selection
& Design
Andrew Grupe,
Celera
•Develop prioritized list
of whole human
genomes for Reference
Materials
•Identify candidate
approaches and
materials for artificial
RMs
•Develop prioritized
list
Measurements for
Reference Material
Characterization
Mike Eberle, Illumina
•Develop consensus
plan for experimental
characterization of
Reference Materials
Bioinformatics,
Data Integration,
and Data
Representation
Chunlin Xiao, NCBI
•Develop plan for
integrating
experimental data and
forming consensus
variant calls and
confidence estimates
•Develop consensus
plan for data
representation
Performance Metrics
& Figures of Merit
Deanna Church,
Personalis
•User interface to the
Genome-in-a-Bottle
Reference Material
•“Dashboard”
•what an end user will
see and report to
understand and
describe the
performance of their
experiment
•variant call accuracy
•process performance
measures to enable
optimization
9. Update
Zook et al., Nature Biotechnology, 2014.
• methods to develop
SNP/indel call set
described in manuscript
• broad and quick
adoption of call set for
benchmarking
– struck a nerve
10. Preliminary uses of high-confidence
NIST-GIAB genotypes for NA12878
• NIST has released
several versions of high-
confidence genotypes
for its pilot RM
• These data are
presently being used for
benchmarking
– prior to release of RMs
– SNPs & indels
• ~77% of the genome
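A benchmarking comparison of the kind described on this slide can be sketched as follows. This is a minimal illustration, not the GIAB method: it assumes variants are already normalized and represented as (chrom, pos, ref, alt) tuples, that the high-confidence regions are half-open, 0-based (start, end) intervals per chromosome, and that the names `in_regions` and `benchmark` are hypothetical. Calls outside the high-confidence regions are ignored, since the benchmark makes no claim there.

```python
from bisect import bisect_right


def in_regions(variant, regions_by_chrom):
    """Return True if the variant position lies in a high-confidence interval.

    regions_by_chrom maps chrom -> sorted, non-overlapping list of
    half-open (start, end) intervals (assumed layout, BED-style).
    """
    chrom, pos = variant[0], variant[1]
    intervals = regions_by_chrom.get(chrom, [])
    # Index of the last interval whose start is <= pos.
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos < intervals[i][1]


def benchmark(query_calls, truth_calls, regions_by_chrom):
    """Compare a query call set with a benchmark call set inside the regions."""
    query = {v for v in query_calls if in_regions(v, regions_by_chrom)}
    truth = {v for v in truth_calls if in_regions(v, regions_by_chrom)}
    tp = len(query & truth)   # calls present in both sets
    fp = len(query - truth)   # query calls absent from the benchmark
    fn = len(truth - query)   # benchmark calls missed by the query
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "precision": tp / (tp + fp) if tp + fp else float("nan"),
    }


# Toy data, invented for illustration only.
regions = {"chr1": [(100, 200), (300, 400)]}
truth = [("chr1", 150, "A", "G"), ("chr1", 350, "C", "T"), ("chr1", 250, "G", "A")]
query = [("chr1", 150, "A", "G"), ("chr1", 360, "T", "C"), ("chr1", 500, "A", "T")]
result = benchmark(query, truth, regions)
print(result)
```

Exact tuple matching is the simplest possible comparison; real comparisons also have to reconcile different representations of the same indel, which this sketch does not attempt.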
11. Highlights
This workshop
• Pilot genome release and
use
• Coordinating analyses for
PGP GIAB Trios
• Working groups
– Spike-in mutation interlab,
FFPE
– FTP site, analysis coordination
– GA4GH
• GIAB papers
Future GIAB work
• Beyond support,
improvement/development
and maintenance of existing
GIAB products…
– What future work should
GIAB do that would uniquely
take advantage of the
momentum we’ve built?
12. Agenda
Thursday
• Breakfast
• Welcome and Status Update
• Using the Pilot RM
• Break
• Coordination of PGP analyses
• Lunch (provided)
• Working Group Breakout
Discussions
• Break
• Discussion about Planned
GIAB papers
• Informal discussions
• Reception
Friday
• Breakfast
• Working Group leaders
present plans and discussion
• Break
• Future GIAB work
• Lunch (provided)
• Steering committee meeting
13. Agenda
Monday
• Breakfast and registration
• Welcome and Context Setting
• NIST RM Update and Status Report
• Charge to Working Groups
• Coffee Break
• Working Group Breakout Discussions
• Lunch (provided)
• Informal Working Group Reports
• Coffee Break
• Breakout Topical Discussions
– Topic #1: Moving beyond the 'easy'
variants and regions of the genome
– Topic #2: Selecting future genomes for
Reference Materials
Tuesday
• Breakfast and registration
• Use cases: Experiences using the pilot
Reference Material
• Discussion of plans to release pilot
Reference Material
• Coffee Break
• Working Group Breakout discussions
• Lunch (provided)
• Working Group leaders present plans
and discussion
• Steering committee Overview
• First meeting of the Steering
Committee (others adjourn)
Please Note
Slides will be made available on SlideShare after
the workshop (see genomeinabottle.org).
Tweets are welcome unless the speaker requests
otherwise. Please use #giab as the hashtag.
14. What’s the future of GIAB?
• What is GIAB uniquely positioned to
do?
– how will we know when we’re done?
• If we do other stuff, are we the best
cohort to do it?
• Other biogeographical ancestry
groups?
• Cancer?
– spike-in controls
– whole-genomes
• tumor/normal?
• Create list of mutations for spike-ins
for germline
• Somatic genomes other than cancer
• Prenatal
• Forensics – decay of DNA
• Transcriptome?
• Epigenome?
• Interpretation standards?
– functional
– clinical
15. Others working in this space…
Well-characterized genomes
• Illumina Platinum Genomes
• CDC GeT-RM
• Korean Genome Project
• Human Longevity, Inc.
• Hydatidiform mole haploid
cell line
• Genome Reference
Consortium
Performance Metrics
• Global Alliance for
Genomics and Health
Benchmarking Team
• NCBI/CDC GeT-RM Browser
• GCAT website
17. Data Release Plans
Individual Datasets
• Uploaded to GIAB FTP site
as it is collected
• May include raw reads,
aligned reads, and
variant/reference calls
Integrated High-confidence Calls
• First develop SNP, indel, and
homozygous reference calls
• Then develop SV and non-
SV calls
• Released calls are versioned
• Preliminary callsets will be
made available to be
critiqued
18. Pilot RM (NA12878)
• Developing
reproducible methods
for new integrated high-
confidence SNPs/indels
• Illumina Platinum
Genomes released
phased pedigree calls in
Dec 2014
– Blog will be posted
– also working on SVs
• Developing SV calls
– High-confidence
deletions and pre-print
will be released Feb
2015
• Planned release as NIST
RM8398 in April 2015
19. Ashkenazim PGP trio
Short reads
• Completed
– 300x Illumina paired end on
trio
– Complete Genomics
– Ion exome
• Scheduled
– Illumina mate-pair
– possibly SOLiD
Long reads
• Completed
– 20x/8x/8x PacBio
– BioNano Genomics
• Scheduled
– 60x/30x/30x PacBio (or more)
– custom moleculo
20. Ashkenazim Jewish PGP RM Trio
Dataset | Characteristics | Coverage | Availability | Good for…
Illumina Paired-end | 150x150bp | ~300x/individual | Fastq on ftp | SNPs/indels/some SVs
Illumina Long Mate pair | ~6000 bp insert | ~40x/individual | Feb-Mar 2015 | SVs
Illumina “moleculo” | Custom library | ~30x by long fragments | Feb-Mar 2015 | SVs/phasing/assembly
Complete Genomics | | 100x/individual | On ftp | SNPs/indels/some SVs
Complete Genomics | LFR | | ?? | SNPs/indels/phasing
Ion Proton | Exome | 1000x/individual | On SRA | SNPs/indels in exome
BioNano Genomics | | | Feb 2015 | SVs/assembly
PacBio | ~10kb reads | ~120-150x on AJ trio | Finished ~Mar 2015 | SVs/phasing/assembly/STRs
21. Asian PGP trio
• Similar sequencing to
Ashkenazim trio except
for PacBio
• Only son will be NIST
RM
22. SNP/Indel Integration Method Update
• Implementing new integration methods on
DNAnexus
– Easier for others to reproduce results
– Easier to apply same methods to new genomes
• First, analyzing NA12878 RM data with new
methods to ensure they work well
• Then, apply to PGP trios
23. SV Integration Methods
SVClassify
• Inputs: reference genome and RepeatMasker data, aligned sequence data (BAM file), list of structural variants (BED file)
• Up to 180 annotations per SV
• Up to 35 selected annotations per SV
• Methods: one-class methods, unsupervised clustering, support vector machine, L1 distance
24. Multi-dimensional scaling showing separation of 8 clusters
Multidimensional scaling (MDS) plot for visualizing the 8 clusters. We use a 3-dimensional
representation of the data space, which associates 3 MDS coordinates with each site,
one for each dimension. This figure plots MDS-3 against MDS-1.
26. One-class scores for Random Non-SVs and “Validated” Callsets
Number of sites from each candidate callset that have k=3 L1 Classification scores in
each range, where the score is the proportion p of random sites that are closer to
the center than each candidate site. These numbers are after filtering out sites for which
the flanking regions have low mapping quality or high coverage.
Callset | <0.68 | 0.68-0.90 | 0.90-0.99 | >0.99
Random | 2,599 | 773 | 279 | 17
Personalis | 4 | 4 | 182 | 1,783
1000 Genomes | 38 | 65 | 557 | 1,493
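The score defined in the caption (the proportion p of random sites that lie closer to the center than a candidate site) can be sketched as below. This is an illustration under stated assumptions, not the SVClassify implementation: each site is taken to be a vector of three numeric annotations (one reading of "k=3"), the center is assumed to be the per-annotation median of the random sites, and the names `l1_distance`, `l1_scores`, and `score_bin` as well as the handling of bin boundaries are hypothetical.

```python
from bisect import bisect_left
from statistics import median


def l1_distance(a, b):
    """Sum of absolute coordinate differences between two annotation vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l1_scores(random_sites, candidate_sites):
    """Score each candidate by the fraction of random sites closer to the center."""
    # Assumption: the center is the per-annotation median of the random sites.
    center = [median(column) for column in zip(*random_sites)]
    random_dists = sorted(l1_distance(site, center) for site in random_sites)
    scores = []
    for site in candidate_sites:
        d = l1_distance(site, center)
        # bisect_left counts random sites with distance strictly less than d.
        scores.append(bisect_left(random_dists, d) / len(random_dists))
    return scores


def score_bin(p):
    """Assign a score to one of the ranges used in the table above."""
    if p < 0.68:
        return "<0.68"
    if p < 0.90:
        return "0.68-0.90"
    if p <= 0.99:
        return "0.90-0.99"
    return ">0.99"


# Toy annotation vectors, invented for illustration only.
random_sites = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (1, 1, 1), (3, 3, 3)]
candidates = [(1, 1, 0), (1, 3, 1), (10, 10, 10)]
scores = l1_scores(random_sites, candidates)
bins = [score_bin(p) for p in scores]
print(scores, bins)
```

A candidate that resembles a random non-SV site scores near 0, while a candidate far from the random sites scores near 1, which is why most sites in the validated callsets fall in the >0.99 column.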
27. Parliament SV Integration Pipeline
From Baylor College of Medicine
Figure: pipeline flowchart with stages Sample → Data → Discovery → Merge → Evaluate → Results.
Labels in the figure: Personal Genome; Illumina, PacBio, Nextera, aCGH, Irys; Breakdancer, Delly, CNVnator, Pindel, Crest, SV-STAT, Tiresias, Honey; Multi-Source Reduce & Cluster; Annotation Sources; Discordant Loci Database; Hybrid Assembly; SVatchra; PacBio; Force Calling; Heuristics; Putative SVs.
Data | Type | Resolution
WGS Illumina HiSeq | NGS | 48X, 100x100 bp paired-end
WGS Illumina Nextera | NGS | ~2X, 100x100 bp mate-pair
WGS SOLiD | NGS | 3X 35 bp fragment; 10X 25x25 bp paired-end; 17X 50x50 bp paired-end
WGS PacBio | Long-Read | 10X, ~10,000 bp
Agilent 1M | aCGH | 1-million-probe oligo array
NimbleGen 2.1M | aCGH | 2.1-million-probe oligo array
Custom Agilent Array | aCGH | 44,000 neuropathy-specific oligo array
BioNano Irys | Genome Mapping | Whole genome architecture
Sanger-Validated Deletions | Manual | 42 fully resolved deletions
Program | Method
BreakDancer | Paired End
Crest | Split Read
Pindel | Paired End
Delly | Paired End / Split Read
CNVnator | Read Depth
Tiresias | Consensus Sequence
SV-STAT | Split Read
SVatchra | Paired-End
PBHoney | Errors, Tail Mapping, Assembly
pb-jelly.sourceforge.net
28. Potential SV Integration Approach
• GIAB members generate candidate SV calls
• Use SVClassify and Parliament to classify
candidate calls as likely TPs or FPs
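Before candidate calls from several groups can be classified, calls that describe the same event usually need to be grouped. The sketch below shows one common way to do that, using reciprocal overlap. It is not the Parliament or SVClassify algorithm: the (chrom, start, end, caller) tuple layout, the half-open coordinates, the 50% threshold, the greedy one-pass clustering, and the caller names are all assumptions made for illustration.

```python
def reciprocal_overlap(a, b):
    """Overlap length as a fraction of the longer of two (chrom, start, end, ...) calls."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    # The smaller of the two fractions is the reciprocal overlap.
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def merge_candidates(calls, min_overlap=0.5):
    """Greedily group calls whose reciprocal overlap with a cluster's first call is high."""
    clusters = []
    for call in sorted(calls, key=lambda c: (c[0], c[1], c[2])):
        for cluster in clusters:
            if reciprocal_overlap(cluster["region"], call) >= min_overlap:
                cluster["callers"].add(call[3])
                cluster["calls"].append(call)
                break
        else:
            # No existing cluster matched, so this call starts a new one.
            clusters.append(
                {"region": call[:3], "callers": {call[3]}, "calls": [call]}
            )
    return clusters


# Toy deletion calls from hypothetical callers, invented for illustration only.
calls = [
    ("chr1", 1000, 2000, "callerA"),
    ("chr1", 1050, 2100, "callerB"),
    ("chr1", 5000, 5500, "callerA"),
    ("chr2", 1000, 2000, "callerC"),
]
clusters = merge_candidates(calls)
for cluster in clusters:
    print(cluster["region"], sorted(cluster["callers"]))
```

The number of distinct callers supporting each cluster is one simple piece of evidence that a later classification step could use alongside the annotations.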
29. Analysis Coordinator(s)
• “Face” of the group
• Maintain table of groups doing different types of
analyses
• Recruit groups to do missing analyses
• Make workplan and timeline
• Follow-up with analysis groups
• Coordinate comparisons and integration of
analyses
• Coordinate writing of papers
30. Ashkenazim Jewish PGP RM Trio
Dataset | Characteristics | Coverage | Availability | Most useful for…
Illumina Paired-end | 150x150bp | ~300x/individual | Fastq on ftp | SNPs/indels/some SVs
Illumina Long Mate pair | ~6000 bp insert | ~40x/individual | Feb-Mar 2015 | SVs
Illumina “moleculo” | Custom library | ~30x by long fragments | Feb-Mar 2015 | SVs/phasing/assembly
Complete Genomics | | 100x/individual | On ftp | SNPs/indels/some SVs
Complete Genomics | LFR | | ?? | SNPs/indels/phasing
Ion Proton | Exome | 1000x/individual | On SRA | SNPs/indels in exome
BioNano Genomics | Long optical map reads | | Feb 2015 | SVs/assembly
PacBio | ~10kb reads | ~120-150x on AJ trio | Finished ~Mar 2015 | SVs/phasing/assembly/STRs