Aug2013 real time genomics trio pedigree analysis

Published in: Technology, Business

  1. NA12878 Trio/Pedigree Analysis
     Francisco M. De La Vega, D.Sc., VP Genome Science
     © 2013 Real Time Genomics, Inc.
  2. Leveraging trio information
     • GiaB has selected reference materials in the form of father, mother, offspring trios
     • The goal was to leverage Mendelian inheritance patterns to:
       – Identify variant genotype errors that are inconsistent with Mendelian inheritance
       – Remove these errors from the reference baseline calls
     • However, if variant identification methods do not use pedigree information directly and do not jointly analyze the trio alignments, an opportunity to improve the genotype calls is missed
     • We focused on using the RTG Family caller to better leverage the shared information in the trios and improve the call set, while reducing Mendelian-inconsistent genotype errors
  3. Variant calling can be improved by jointly analyzing related samples
     [Figure: read pileups with called genotypes for each trio member. Label: Shared haplotypes]
  4. Variant calling can be improved by jointly analyzing related samples
     [Figure: read pileups with called genotypes for each trio member. Labels: Mendelian variant segregation; Shared haplotypes]
  5. Mendelian inconsistency
     [Figure: read pileups with called genotypes for each trio member; one genotype call is marked "(Low QV)"]
  6. Joint trio analysis corrects Mendelian errors
     [Figure: read pileups with called genotypes for each trio member; the corrected genotype call is marked "(Good QV)"]
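The Mendelian inconsistency idea in slides 5 and 6 can be sketched in a few lines. This is a minimal illustration, not the RTG Family caller's method: it assumes diploid autosomal genotypes given as allele pairs, and the function name `is_mendelian_consistent` and the example genotypes are hypothetical.

```python
from itertools import product


def is_mendelian_consistent(father, mother, child):
    """Return True if the child's genotype can be formed by taking
    one allele from the father and one allele from the mother.

    Genotypes are unphased allele pairs, e.g. ("A", "C").
    Assumes a diploid autosomal site and ignores de novo mutation.
    """
    child_sorted = tuple(sorted(child))
    return any(
        tuple(sorted((paternal, maternal))) == child_sorted
        for paternal, maternal in product(father, mother)
    )


# Hypothetical calls: two C/C parents cannot transmit an A allele,
# so an A/C child is flagged as a Mendelian inconsistency.
inconsistent = is_mendelian_consistent(("C", "C"), ("C", "C"), ("A", "C"))
consistent = is_mendelian_consistent(("C", "C"), ("A", "C"), ("A", "C"))
print(inconsistent, consistent)
```

A check like this only flags an inconsistent trio; it does not say which of the three genotypes is wrong, which is where the quality values shown on the slides come in.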
  7. NA12878 calls from trio calling
     • Comparing offspring variants from singleton vs pedigree calling
       – Both showing good quality metrics
     • Using family information more good calls can be made and dubious calls are downgraded

     Variant Statistics
     NA12878 call set   SNVs        Indels    MNPs     SNV Het/Hom   Ti/Tv   % dbSNP (r129)
     RTG single         3,329,797   558,242   31,070   1.55          2.11    90.8%
     RTG trio           3,363,619   595,030   33,686   1.57          2.11    90.4%
     GATK/VQSR          3,263,289   610,837   N/A      1.51          2.09    91.7%

     Data: WGS 2x100bp >50X Illumina Platinum Genomes data (ENA Acc. No. ERP001960). RTG AVR score cut-off 0.15; GATK v1.7 & BWA 0.6.1.
     [Figure: Venn diagram of Family vs Singleton calls; values shown: 142,848; 68,000; 3,849,457]
     [Pedigree diagram: NA12878, NA12891, NA12892]
  8. NA12878 vs reference datasets
     NA12878 call set   1kP OMNI Poly (TP%)   1kP OMNI Mono (FP%)   Get-RM¶ (TP%)   GiaB (TP%)   GiaB-BED (TP%)
     RTG single         97.5%                 0.10%                 97.4%           N/A          N/A
     RTG trio           97.5%                 0.24%                 97.0%           90.5%        94.1%
     GATK/VQSR          97.8%                 0.17%                 87.8%           88.4%        92.5%

     § Relative to dbSNP 137; statistics for SNVs only. ¶ Get-RM consistent high-quality variants; n=498
     – 1000 Genomes Illumina OMNI SNP array
       • Polymorphic sites – TP proxy
       • Monomorphic sites – FP proxy
     – Get-RM high confidence call set
     – GiaB high confidence calls in BED region
     [Pedigree diagram: NA12878, NA12891, NA12892]
  9. ROC Trio calls vs. GiaB baseline (BED)
     [Figure: ROC curves. RTG snpsimeval tool; SNV/indel/MNP; zygosity match]
  10. ROC Trio calls vs. GiaB baseline
      [Figure: ROC curves. RTG snpsimeval tool; SNV/indel/MNP; zygosity match]
  11. ROC Trio calls vs. CGI baseline
      [Figure: ROC curves. RTG snpsimeval tool; SNV/indel/MNP; zygosity match]
  12. Mendelian inconsistency errors
      RTG family caller reduces Mendelian Inheritance Errors (MIE) over 60X vs. RTG singleton calling (over 70X vs. GATK/VQSR)

      [Figure: bar chart, log counts of MIE]
      Call set     MIE count
      RTG single   335,625
      RTG trio     4,870
      GATK/VQSR    351,904
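The "over 60X" and "over 70X" figures on slide 12 follow directly from the MIE counts in the chart. A minimal sketch of that arithmetic, with the helper name `fold_reduction` chosen here for illustration:

```python
# MIE counts as reported on slide 12.
mie_counts = {
    "RTG single": 335_625,
    "RTG trio": 4_870,
    "GATK/VQSR": 351_904,
}


def fold_reduction(baseline_count, improved_count):
    """Ratio of baseline errors to errors after joint calling."""
    return baseline_count / improved_count


single_vs_trio = fold_reduction(mie_counts["RTG single"], mie_counts["RTG trio"])
gatk_vs_trio = fold_reduction(mie_counts["GATK/VQSR"], mie_counts["RTG trio"])

print(f"RTG single vs trio: {single_vs_trio:.1f}x")
print(f"GATK/VQSR vs trio:  {gatk_vs_trio:.1f}x")
```

The two ratios come out at roughly 69 and 72, consistent with the rounded-down claims on the slide.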
  13. Pattern #1: Heterozygous variant
      [Pedigree diagram (TrioCalling): NA12878, NA12891, NA12892, NA12877, NA12889, NA12890, NA12879 to NA12888, and NA12893, each annotated with a genotype of 0/0 or 0/1]
  14. Segregation of heterozygous variants
      [Figure: four histograms (SNV, MNP, indel, All Variants). x-axis: # of offspring segregating (1 to 11); y-axis: variant count]
      Segregation of NA12878 heterozygous variants called as family, GQ>50, homozygous reference in other parent.
  15. Pattern #2: Homozygous-alt variant
      [Pedigree diagram (TrioCalling): NA12878, NA12891, NA12892, NA12877, NA12889, NA12890, NA12879 to NA12888, and NA12893, each annotated with a genotype of 0/0, 0/1, or 1/1]
  16. Segregation of homo-alt variants
      [Figure: four histograms (SNV, MNP, indel, All Variants). x-axis: # of offspring segregating (1 to 11); y-axis: variant count]
      Segregation of NA12878 homozygous alternative variants called as family, GQ>50, homozygous reference in other parent.
  17. False positive estimate by segregation
      GT type    Metric       All variants   SNV      MNP     indel
      Het        TP (10-11)   123672         110262   693     12717
      Het        FP (1-8)     1901           1000     47      854
      Het        FP%          1.40%          0.88%    1.42%   5.67%
      Homo-alt   TP (2-10)    373260         329642   2258    41360
      Homo-alt   FP (1,11)    4457           3672     36      749
      Homo-alt   FP%          1.18%          1.10%    1.57%   1.78%
      Overall    TP           496932         439904   2951    54077
      Overall    FP           6358           4672     83      1603
      Overall    FP%          1.26%          1.05%    2.74%   2.88%
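The FP% values on slide 17 can be recomputed from the TP and FP counts. The sketch below assumes FP% = FP / (TP + FP), which is not stated on the slide but reproduces the Homo-alt and Overall rows to two decimals; the Het row percentages do not follow from the same formula and are not recomputed here. The function name `fp_percent` is illustrative.

```python
def fp_percent(tp, fp):
    """False positives as a percentage of all segregation-classified calls.

    Assumed definition: FP / (TP + FP). It matches the Homo-alt and
    Overall rows of the slide when rounded to two decimals.
    """
    return 100.0 * fp / (tp + fp)


# (TP, FP) counts copied from slide 17.
homo_alt = {
    "All variants": (373260, 4457),
    "SNV": (329642, 3672),
    "MNP": (2258, 36),
    "indel": (41360, 749),
}
overall = {
    "All variants": (496932, 6358),
    "SNV": (439904, 4672),
    "MNP": (2951, 83),
    "indel": (54077, 1603),
}

homo_alt_pct = {k: round(fp_percent(tp, fp), 2) for k, (tp, fp) in homo_alt.items()}
overall_pct = {k: round(fp_percent(tp, fp), 2) for k, (tp, fp) in overall.items()}

print("Homo-alt FP%:", homo_alt_pct)
print("Overall FP%: ", overall_pct)
```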
  18. Data imputation by pedigree caller
      • For genomes with no data, use population priors
        – With care, the caller can iterate over the offspring and then over each parent independently
        – This avoids exponential explosion, so a whole extended family can be handled in one calling step
  19. Imputation of family members with no data
      [Figure: simulated data; axes: True Positives and False Positives; series: 1 offspring, 2 offspring, 4 offspring, 4 offspring + father]
  20. ROC vs NA12878 imputed baseline
      [Figure: ROC curves. RTG snpsimeval tool; SNV/indel/MNP; zygosity match]
  21. de novo mutation identification
      de novo Mutation Accuracy (NA12878)
      Call set          de novo candidates   de novo germline*   de novo somatic*   TP/FP
      Singleton calls   16,902               49 (100%)           941 (99%)          1:17
      Trio calls        2,205                49 (100%)           941 (99%)          1:2.2

      *Sensitivity vs. Conrad et al. (2011) validated dataset of germline and somatic cell line de novo mutations.
      – Uses the parental genomes to identify & score de novo mutations in offspring
      – Greater than 7X improvement in precision to find de novo mutations vs. naïve methods
      [Pedigree diagram: NA12878, NA12891, NA12892]
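The ratios in the TP/FP column of slide 21 and the "greater than 7X" claim can be checked from the table. The sketch below assumes the column compares validated true positives (germline plus somatic) with the total number of candidates; that reading is an assumption, chosen because it reproduces 1:17 and 1:2.2.

```python
# Counts copied from slide 21.
validated_true_positives = 49 + 941  # germline + somatic de novo calls recovered
candidates = {
    "Singleton calls": 16_902,
    "Trio calls": 2_205,
}

# Candidates that must be examined per validated true positive
# (assumed interpretation of the slide's "TP/FP" column).
candidates_per_tp = {
    call_set: n / validated_true_positives for call_set, n in candidates.items()
}

# Relative improvement of trio calling over singleton calling.
improvement = candidates_per_tp["Singleton calls"] / candidates_per_tp["Trio calls"]

for call_set, ratio in candidates_per_tp.items():
    print(f"{call_set}: 1:{ratio:.1f}")
print(f"Improvement: {improvement:.1f}x")
```

Because both call sets recover the same 990 validated mutations, the improvement reduces to the ratio of candidate counts, 16,902 / 2,205.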
  22. Status
      • Working through the complete trio datasets to produce joint pedigree calls for the NA12878 trio
        – Aiming for a trio call set and another that includes the full Platinum pedigree data
        – There is disproportionately more data for NA12878 than for her parents or offspring
      • Comprehensive segregation analysis that includes all Mendelian patterns
      • Phasing analysis to identify variants that are inconsistent with transmitted phases
  23. Issues
      • How to integrate pedigree calls with other data?
        – Variants that segregate appropriately are candidates for inclusion in the baseline
        – Variants that don't segregate appropriately are candidates for removal from the baseline
        – Improvement of baseline genotypes using pedigree-based genotypes
      • Use of the imputed NA12878 baseline
      • Creation of a more inclusive baseline for ROC curves to compare new methods and select thresholds
  24. Acknowledgements
      • RTG team at Hamilton, New Zealand
        – Led by John Cleary, CTO
      • RTG team at San Bruno, CA
        – Sahar Malakshah
        – Minita Shah
        – Brian Hilbush
      • Michael Eberle, Illumina, Inc. – Platinum Data
      • Justin Zook, NIST
      • 1000 Genomes Project

      © 2013 Real Time Genomics, Inc. All rights reserved. US Patent 7,640,256. Other patents pending. For research use only. Not for diagnostic applications.
