
Aug2014 nist rm development plans


Published in: Health & Medicine

  1. NIST Reference Material Development Plans, August 2014
  2. NIST RM Development Plans (timeline columns: Q4 2014, Q1 2015, Q2 2015, Q3 2015, Q4 2015; milestones listed per genome in timeline order)
     • HG-001/NA12878
       – Release NIST RM8398; preliminary large deletions
       – Refined structural variants
     • HG-002 to HG-004 (Ashkenazim trio)
       – Illumina, Complete Genomics, Ion, BioNano, and SOLiD data
       – Preliminary SNPs/indels; 100x PacBio data; Illumina assembled long reads
       – Refined SNPs/indels; preliminary SVs
       – Refined structural variants
       – NIST RMs 8391/8392 release
     • HG-005 (son in Asian trio)
       – Illumina, Complete Genomics, Ion, BioNano, and SOLiD data
       – Illumina assembled long reads
       – Preliminary SNPs/indels
       – Refined SNPs/indels; refined structural variants
       – NIST RM8393 release
  3. Preliminary uses of high-confidence NIST-GIAB genotypes for NA12878
     • NIST has released several versions of high-confidence genotypes for its pilot RM
     • These data are already being used for benchmarking
       – prior to release of the RMs
       – SNPs & indels
         • ~77% of the genome
  4. Data Release Plans
     Individual Datasets
     • Uploaded to the GIAB FTP site as they are collected
     • May include raw reads, aligned reads, and variant/reference calls
     Integrated High-confidence Calls
     • First develop SNP, indel, and homozygous reference calls
     • Then develop SV and non-SV calls
     • Released calls are versioned
     • Preliminary callsets will be made available to be critiqued
     • Data jamboree??
  5. Pilot RM (NA12878)
     • HapMap/1000 Genomes sample
     • Lots of public data and analyses
     • Not consented for commercial redistribution
     • Data from pedigree available and analyzed
     • ~8000 units for NIST RM
     • High-confidence calls released
       – integrates multiple datasets and phased pedigree analysis
     • Developing SV calls
     • Planned release as NIST RM8398 in Q4 2014
  6. Ashkenazim PGP trio
     • Personal Genome Project trio (huAA53E0/hu8E87A9/hu6E4515)
     • Father/mother/son at Coriell (GM24143/GM24149/GM24385)
     • Consented for commercial redistribution
     • Most short-read data will be available Q3 2014
     • 100x PacBio WGS completed ~Q1 2015
     • 10x Illumina assembled long reads for son ~Q1 2015
     • Planned NIST RM release ~Q4 2015
       – NIST RM 8391 will be only the son (~8000 units)
       – NIST RM 8392 will contain all 3 family members (~2500 units)
  7. Asian PGP trio
     • Personal Genome Project trio (hu91BD69/hu38168C/huCA017E)
     • Father/mother/son at Coriell (GM24695/GM24694/GM24631)
     • Only the son planned for NIST RM but trio will be characterized
     • Consented for commercial redistribution
     • Most short-read data will be available Q3-Q4 2014
     • 10x Illumina assembled long reads for son ~Q1 2015
     • Planned NIST RM release ~Q4 2015
       – NIST RM 8393 will be only the son (~11000 units)
  8. New Platform-specific (-independent?) Integration Method
     • Normalize and take union of calls
     • Simple SNPs/indels
       – Illumina/SOLiD: GATK HC force calls
       – Ion: TVC force calls
       – If all biased or low qual, uncertain
       – Elseif all concordant, high-conf
       – Elseif all unbiased are concordant, high-conf
       – Else uncertain
       – CG: use Ref file
     • Complex Variants
       – Use vcfeval or SMASH for sequential pair-wise comparison
  9. Integration Method Plans
     • Implement new integration methods on the cloud
       – Easier for…
         • distributed analysis
         • scalability
         • transparency
         • others to reproduce results
     • First, analyze NA12878 RM data with new methods to ensure they work well
     • Then, apply to PGP trios
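
The if/elseif rule on slide 8 for simple SNPs/indels can be read as a small per-site classifier. The sketch below is one possible reading, not the NIST implementation. It assumes each dataset contributes one force-called genotype per site, that genotypes are already normalized to comparable strings, and that "biased or low qual" has been reduced to a single boolean flag. The names `DatasetCall`, `flagged`, and `classify_site` are hypothetical, and treating a site with no calls as uncertain is an added assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DatasetCall:
    """One dataset's force-called genotype at a site (hypothetical structure)."""
    dataset: str    # label such as "Illumina" or "Ion" (illustrative only)
    genotype: str   # assumed already normalized, e.g. "0/1"
    flagged: bool   # True if the call shows bias or is low quality


def classify_site(calls: List[DatasetCall]) -> str:
    """Apply the slide's decision rule to all calls at a single site."""
    # Assumption not stated on the slide: no calls at all means uncertain.
    if not calls:
        return "uncertain"

    unflagged = [c for c in calls if not c.flagged]

    # "If all biased or low qual, uncertain"
    if not unflagged:
        return "uncertain"

    # "Elseif all concordant, high-conf"
    if len({c.genotype for c in calls}) == 1:
        return "high-conf"

    # "Elseif all unbiased are concordant, high-conf"
    if len({c.genotype for c in unflagged}) == 1:
        return "high-conf"

    # "Else uncertain"
    return "uncertain"
```

For example, a site where two unflagged datasets report "0/1" and one flagged dataset reports "1/1" would be classified as "high-conf" under this reading, because the disagreement comes only from the flagged call.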
