Aug 2014 NIST integration plans

Published in: Health & Medicine

  1. Pedigree and Trio Data Integration. Justin Zook, NIST
  2. Integration of Pedigree with NIST arbitrated calls
     High-confidence
       • In the NIST high-confidence set and not in the RTG phase-inconsistent set
       • In the NIST low-confidence set and polymorphic in either the RTG or PG phase-consistent sets
     Uncertain
       • Homopolymers not in the phase-consistent sets
       • In the NIST low-confidence set and not polymorphic in either the RTG or PG phase-consistent sets
       • In RTG or PG and homozygous reference in NIST
       • Calls missing from our high- and low-confidence calls that fall outside our high-confidence regions
       • NA12878 SVs in dbVar and known segmental duplications
  3. Integration of Pedigree with multi-platform calls
     Legend
       • NIST-PASS = NIST passing calls, v2.19
       • NIST-All = NIST v2.19 calls, including filtered calls if they are not likely homozygous reference
       • RTG-PHQ = Real Time Genomics phase-consistent calls with any phase quality
       • RTG-PHQ>20 = Real Time Genomics phase-consistent calls with phase quality > 20
       • RTG-PHI = Real Time Genomics phase-inconsistent calls
       • PlatGen = Platinum Genomes phase-consistent calls
       • PlatGenPoly = Platinum Genomes phase-consistent calls that are polymorphic in the pedigree
       • Bold means included in the final call set
       • Italic means removed, with 50 bp on either side, from the final BED file
     Venn diagram labels, in extraction order (bold/italic styling lost)
       NIST-PASS Both 3.04M RTG-PHQ 12.6k NIST-PASS-noPHQ 23.6k RTG-PHI (174k) NIST-PASS 55.6k Both 31.8k PlatGen NIST-PASS-noPHQ-noPHI 17k Both 6.6k PlatGen NIST-PASS-noPHQ-PHI (18k) Both 13.5k
  4. Integration of Pedigree with multi-platform calls – NIST filtered
     Legend
       • NIST-PASS = NIST passing calls, v2.19
       • NIST-All = NIST v2.19 calls, including filtered calls if they are not likely homozygous reference
       • RTG-PHQ = Real Time Genomics phase-consistent calls with any phase quality
       • RTG-PHQ>0 = Real Time Genomics phase-consistent calls with phase quality > 0
       • RTG-PHI = Real Time Genomics phase-inconsistent calls
       • PlatGen = Platinum Genomes phase-consistent calls
       • PlatGenPoly = Platinum Genomes phase-consistent calls that are polymorphic in the pedigree
       • Bold means included in the final call set
       • Italic means removed, with 50 bp on either side, from the final BED file
     Venn diagram labels, in extraction order (bold/italic styling lost)
       Both 2.74M RTG-PHQ>0 61.0k NIST-All-PHQ>0 364k NIST-PASS-PHQ (664k) NIST-All Both 2.37M NIST-All-PlatGenPoly-noNIST-PASS 62k NIST-All-PHQ>0-noNIST-PASS 134k Both 230k Both 2.05M PlatGenPoly 32.7k NIST-All-PlatGenPoly 364k NIST-PASS-PHQ (664k) NIST-All Both 2.37M
  5. New Platform-specific Integration Method for PGP Trios
     Normalize and take the union of calls
     Simple SNPs/indels
       • Illumina/SOLiD – GATK HC force calls
       • Ion – TVC force calls
       • If all biased or low qual: uncertain
       • Else if all concordant: high-conf
       • Else if all unbiased are concordant: high-conf
       • Else: uncertain
       • CG – use Ref file
     Complex variants
       • Use vcfeval or SMASH for sequential pair-wise comparison
  6. Integration Method Plans
       • Implement new integration methods on the cloud
         – Easier for others to reproduce results
       • First, analyze NA12878 RM data with new methods to ensure they work well
       • Then, apply to PGP trios
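
The decision rule for simple SNPs/indels on slide 5 can be sketched in a few lines of Python. This is only an illustration of the if/else-if chain as written on the slide, not the NIST implementation: the `PlatformCall` record, its `biased` and `low_qual` flags, and the function name `classify_site` are hypothetical, and the sketch assumes that "unbiased" means a call that is neither biased nor low quality.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PlatformCall:
    """One platform's force call at a site (hypothetical record layout)."""
    platform: str   # e.g. "Illumina", "SOLiD", "Ion"
    genotype: str   # normalized genotype string, e.g. "0/1"
    biased: bool    # assumed flag: call shows evidence of platform bias
    low_qual: bool  # assumed flag: call is low quality


def classify_site(calls: List[PlatformCall]) -> str:
    """Apply the slide's rule chain to the per-platform calls at one site."""
    # Rule 1: if every call is biased or low quality, the site is uncertain.
    # (A site with no calls at all also lands here.)
    if all(c.biased or c.low_qual for c in calls):
        return "uncertain"
    # Rule 2: if all platforms agree on the genotype, high confidence.
    if len({c.genotype for c in calls}) == 1:
        return "high-conf"
    # Rule 3: if the unbiased calls agree among themselves, high confidence.
    # Assumption: "unbiased" excludes low-quality calls as well.
    unbiased = [c for c in calls if not (c.biased or c.low_qual)]
    if len({c.genotype for c in unbiased}) == 1:
        return "high-conf"
    # Rule 4: otherwise the site stays uncertain.
    return "uncertain"


if __name__ == "__main__":
    site = [
        PlatformCall("Illumina", "0/1", biased=False, low_qual=False),
        PlatformCall("SOLiD", "0/1", biased=False, low_qual=False),
        PlatformCall("Ion", "1/1", biased=True, low_qual=False),
    ]
    print(classify_site(site))
```

In the usage example the two unbiased platforms agree and the discordant Ion call is flagged as biased, so the third rule applies. How bias and low quality are actually detected is not stated on the slides and is left outside the sketch.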
