NIST Reference Material
Development Plans
August 2014
Published in: Health & Medicine

  1. NIST Reference Material Development Plans, August 2014
  2. NIST RM Development Plans (timeline columns: Q4 2014, Q1 2015, Q2 2015, Q3 2015, Q4 2015; milestones listed in slide order)
     • HG-001/NA12878: Release NIST RM8398; Preliminary large deletions; Refined Structural Variants
     • HG-002 to HG-004 (Ashkenazim trio): Illumina, Complete Genomics, Ion, BioNano, and SOLiD data; Preliminary SNPs/indels; 100x PacBio data; Illumina assembled long reads; Refined SNPs/indels; Preliminary SVs; Refined Structural Variants; NIST RMs 8391/8392 release
     • HG-005 (son in Asian trio): Illumina, Complete Genomics, Ion, BioNano, and SOLiD data; Illumina assembled long reads; Preliminary SNPs/indels; Refined SNPs/indels; Refined Structural Variants; NIST RM8393 release
  3. Preliminary uses of high-confidence NIST-GIAB genotypes for NA12878
     • NIST have released several versions of high-confidence genotypes for its pilot RM
     • These data are presently being used for benchmarking
       – prior to release of RMs
       – SNPs & indels
     • ~77% of the genome
  4. Data Release Plans
     Individual Datasets
     • Uploaded to GIAB FTP site as it is collected
     • May include raw reads, aligned reads, and variant/reference calls
     Integrated High-confidence Calls
     • First develop SNP, indel, and homozygous reference calls
     • Then develop SV and non-SV calls
     • Released calls are versioned
     • Preliminary callsets will be made available to be critiqued
     • Data jamboree??
  5. Pilot RM (NA12878)
     • HapMap/1000 Genomes sample
     • Lots of public data and analyses
     • Not consented for commercial redistribution
     • Data from pedigree available and analyzed
     • ~8000 units for NIST RM
     • High-confidence calls released
       – integrates multiple datasets and phased pedigree analysis
     • Developing SV calls
     • Planned release as NIST RM8398 in Q4 2014
  6. Ashkenazim PGP trio
     • Personal Genome Project trio (huAA53E0/hu8E87A9/hu6E4515)
     • Father/mother/son at Coriell (GM24143/GM24149/GM24385)
     • Consented for commercial redistribution
     • Most short-read data will be available Q3 2014
     • 100x PacBio WGS completed ~Q1 2015
     • 10x Illumina assembled long reads for son ~Q1 2015
     • Planned NIST RM release ~Q4 2015
       – NIST RM 8391 will be only the son (~8000 units)
       – NIST RM 8392 will contain all 3 family members (~2500 units)
  7. Asian PGP trio
     • Personal Genome Project trio (hu91BD69/hu38168C/huCA017E)
     • Father/mother/son at Coriell (GM24695/GM24694/GM24631)
     • Only the son planned for NIST RM but trio will be characterized
     • Consented for commercial redistribution
     • Most short-read data will be available Q3-Q4 2014
     • 10x Illumina assembled long reads for son ~Q1 2015
     • Planned NIST RM release ~Q4 2015
       – NIST RM 8393 will be only the son (~11000 units)
  8. New Platform-specific (-independent?) Integration Method
     • Normalize and take union of calls
     • Simple SNPs/indels
       – Illumina/SOLiD: GATK HC force calls
       – Ion: TVC force calls
       – CG: use Ref file
       – If all biased or low qual, uncertain
       – Elseif all concordant, high-conf
       – Elseif all unbiased are concordant, high-conf
       – Else uncertain
     • Complex Variants
       – Use vcfeval or SMASH for sequential pairwise comparison
  9. Integration Method Plans
     • Implement new integration methods on the cloud
       – Easier for…
         • distributed analysis
         • scalability
         • transparency
         • others to reproduce results
     • First, analyze NA12878 RM data with new methods to ensure they work well
     • Then, apply to PGP trios
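
The If/Elseif rule on slide 8 for simple SNPs/indels can be sketched as a small function. This is only an illustration of the four branches as written on the slide, not the NIST-GIAB implementation. The `Call` record, its field names, the `integrate` function, and the choice to treat "unbiased" as "neither biased nor low quality" are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Call:
    """One dataset's force-called genotype at a site (hypothetical record)."""
    genotype: str              # e.g. "0/1"
    biased: bool = False       # assumed flag: evidence of systematic bias
    low_quality: bool = False  # assumed flag: low-quality call


def integrate(calls: List[Call]) -> Tuple[str, Optional[str]]:
    """Apply the slide's rule and return (status, genotype or None)."""
    # Assumption: "unbiased" means neither biased nor low quality.
    usable = [c for c in calls if not (c.biased or c.low_quality)]

    # If all biased or low qual, uncertain (also covers an empty input).
    if not usable:
        return ("uncertain", None)

    # Elseif all concordant, high-conf.
    if len({c.genotype for c in calls}) == 1:
        return ("high-confidence", calls[0].genotype)

    # Elseif all unbiased are concordant, high-conf.
    if len({c.genotype for c in usable}) == 1:
        return ("high-confidence", usable[0].genotype)

    # Else uncertain.
    return ("uncertain", None)


if __name__ == "__main__":
    site = [Call("0/1"), Call("0/1"), Call("1/1", biased=True)]
    print(integrate(site))
```

In this sketch a discordant call is ignored only when it is flagged; two unflagged datasets that disagree leave the site uncertain.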