Aug2014 nist integration plans
Published in: Health & Medicine

Transcript

  • 1. Pedigree and Trio Data Integration. Justin Zook, NIST
  • 2. Integration of Pedigree with NIST arbitrated calls
      High-confidence
        • In the NIST high-confidence set and not in the RTG phase-inconsistent set
        • In the NIST low-confidence set and polymorphic in either the RTG or PG phase-consistent sets
      Uncertain
        • Homopolymers not in the phase-consistent sets
        • In the NIST low-confidence set and not polymorphic in either the RTG or PG phase-consistent sets
        • In RTG or PG and homozygous reference in NIST
        • Calls missing from our high- and low-confidence calls that fall outside our high-confidence regions
        • NA12878 SVs in dbVar and known segmental duplications
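The slide 2 rules can be sketched as a small classifier. This is only an illustration, not the NIST pipeline: the `CallEvidence` record, its field names, and the order in which the rules are checked are all assumptions, and any call that does not match a high-confidence rule is treated here as uncertain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallEvidence:
    """Hypothetical per-call summary; field names are illustrative only."""
    nist: str                                  # "high", "low", "homref", or "absent"
    in_rtg_phase_inconsistent: bool = False    # call is in the RTG phase-inconsistent set
    in_phase_consistent: bool = False          # call is in the RTG or PG phase-consistent set
    polymorphic_in_phase_consistent: bool = False  # polymorphic in RTG or PG consistent set
    in_homopolymer: bool = False
    in_sv_or_segdup: bool = False              # NA12878 dbVar SV or known segmental duplication


def classify(ev: CallEvidence) -> str:
    """Return "high-confidence" or "uncertain" following the slide 2 bullets.

    The precedence of the checks is an assumption; the slide does not state one.
    """
    if ev.in_sv_or_segdup:
        return "uncertain"
    if ev.in_homopolymer and not ev.in_phase_consistent:
        return "uncertain"
    if ev.nist == "high":
        return "uncertain" if ev.in_rtg_phase_inconsistent else "high-confidence"
    if ev.nist == "low":
        return "high-confidence" if ev.polymorphic_in_phase_consistent else "uncertain"
    # Homozygous reference in NIST, or missing from both NIST call sets.
    return "uncertain"
```

For example, `classify(CallEvidence(nist="low", in_phase_consistent=True, polymorphic_in_phase_consistent=True))` yields `"high-confidence"`, while the same call without pedigree support stays uncertain.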
  • 3. Integration of Pedigree with multi-platform calls
      Legend
        • NIST-PASS = NIST passing calls v.2.19
        • NIST-All = NIST v2.19 calls, including filtered calls if they are not likely homozygous reference
        • RTG-PHQ = Real Time Genomics Phase Consistent calls with any phase quality
        • RTG-PHQ>20 = Real Time Genomics Phase Consistent calls with phase quality > 20
        • RTG-PHI = Real Time Genomics Phase Inconsistent calls
        • PlatGen = Platinum Genomes Phase Consistent calls
        • PlatGenPoly = Platinum Genomes Phase Consistent calls that are polymorphic in the pedigree
        • Bold means included in the final call set
        • Italic means removed, plus 50 bp on either side, from the final BED file
      [Venn diagrams; labels in extraction order] NIST-PASS; Both 3.04M; RTG-PHQ 12.6k; NIST-PASS-noPHQ 23.6k; RTG-PHI (174k); NIST-PASS 55.6k; Both 31.8k; PlatGen; NIST-PASS-noPHQ-noPHI 17k; Both 6.6k; PlatGen; NIST-PASS-noPHQ-PHI (18k); Both 13.5k
  • 4. Integration of Pedigree with multi-platform calls – NIST filtered
      Legend
        • NIST-PASS = NIST passing calls v.2.19
        • NIST-All = NIST v2.19 calls, including filtered calls if they are not likely homozygous reference
        • RTG-PHQ = Real Time Genomics Phase Consistent calls with any phase quality
        • RTG-PHQ>0 = Real Time Genomics Phase Consistent calls with phase quality > 0
        • RTG-PHI = Real Time Genomics Phase Inconsistent calls
        • PlatGen = Platinum Genomes Phase Consistent calls
        • PlatGenPoly = Platinum Genomes Phase Consistent calls that are polymorphic in the pedigree
        • Bold means included in the final call set
        • Italic means removed, plus 50 bp on either side, from the final BED file
      [Venn diagrams; labels in extraction order] Both 2.74M; RTG-PHQ>0 61.0k; NIST-All-PHQ>0 364k; NIST-PASS-PHQ (664k); NIST-All; Both 2.37M; NIST-All-PlatGen poly-noNIST PASS 62k; NIST-All-PHQ>0-noNIST PASS 134k; Both 230k; Both 2.05M; PlatGen poly 32.7k; NIST-All-PlatGen poly 364k; NIST-PASS-PHQ (664k); NIST-All; Both 2.37M
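The legend on slides 3 and 4 says that removed calls are excluded, along with 50 bp on either side, from the final BED file. A minimal sketch of that padding-and-subtraction step follows, assuming a single chromosome and 0-based half-open `(start, end)` intervals as in BED. The function names `pad`, `merge`, and `subtract` are hypothetical helpers written for this illustration; in practice a tool such as bedtools would normally do this.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open (start, end), one chromosome


def pad(intervals: List[Interval], flank: int = 50) -> List[Interval]:
    """Extend each interval by `flank` bp on both sides, clamping at 0."""
    return [(max(0, start - flank), end + flank) for start, end in intervals]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def subtract(regions: List[Interval], excluded: List[Interval]) -> List[Interval]:
    """Return the parts of `regions` not covered by `excluded`."""
    excluded = merge(excluded)
    result: List[Interval] = []
    for start, end in merge(regions):
        cursor = start
        for ex_start, ex_end in excluded:
            if ex_end <= cursor:
                continue          # excluded interval lies entirely before the cursor
            if ex_start >= end:
                break             # excluded interval lies entirely after this region
            if ex_start > cursor:
                result.append((cursor, ex_start))
            cursor = max(cursor, ex_end)
            if cursor >= end:
                break
        if cursor < end:
            result.append((cursor, end))
    return result


# A removed single-base call at BED interval (500, 501) inside a confident region.
confident = subtract([(0, 1000)], pad([(500, 501)], flank=50))
```

Here `confident` is `[(0, 450), (551, 1000)]`: the call and 50 bp on each side are cut out of the high-confidence region.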
  • 5. New Platform-specific Integration Method for PGP Trios
      Normalize and take union of calls
      Simple SNPs/indels
        • Illumina/SOLiD: GATK HC force calls
        • Ion: TVC force calls
        • CG: use Ref file
        • If all biased or low quality, uncertain
        • Else if all concordant, high-conf
        • Else if all unbiased are concordant, high-conf
        • Else uncertain
      Complex Variants
        • Use vcfeval or SMASH for sequential pairwise comparison
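The decision rules for simple SNPs/indels on slide 5 can be sketched as a short function. This is an illustrative reading, not the actual NIST code: the `PlatformCall` record and its fields are hypothetical, and "unbiased" is assumed here to mean a call flagged neither as biased nor as low quality.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class PlatformCall:
    """Hypothetical forced call from one platform at one site."""
    platform: str          # e.g. "Illumina", "SOLiD", "Ion", "CG"
    genotype: str          # e.g. "0/1"
    biased: bool = False
    low_quality: bool = False


def arbitrate(calls: List[PlatformCall]) -> Tuple[str, Optional[str]]:
    """Apply the slide 5 rules in order; return (status, consensus genotype)."""
    # Assumption: "unbiased" calls are those neither biased nor low quality.
    usable = [c for c in calls if not (c.biased or c.low_quality)]
    if not usable:
        return "uncertain", None            # all biased or low quality (or no calls)
    if len({c.genotype for c in calls}) == 1:
        return "high-conf", calls[0].genotype    # all platforms concordant
    if len({c.genotype for c in usable}) == 1:
        return "high-conf", usable[0].genotype   # all unbiased calls concordant
    return "uncertain", None


status, genotype = arbitrate([
    PlatformCall("Illumina", "0/1"),
    PlatformCall("Ion", "1/1", biased=True),
    PlatformCall("CG", "0/1"),
])
```

In this example the biased Ion call disagrees, but the two unbiased calls agree, so the site is `("high-conf", "0/1")`.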
  • 6. Integration Method Plans
        • Implement new integration methods on the cloud
            – Easier for others to reproduce results
        • First, analyze NA12878 RM data with new methods to ensure they work well
        • Then, apply to PGP trios