© 2013 Real Time Genomics, Inc.

NA12878 Trio/Pedigree Analysis
Francisco M. De La Vega, D.Sc.
VP Genome Science
  
  
Leveraging trio information
•  GiaB has selected reference materials in the form of father, mother, and offspring trios
•  The goal was to leverage Mendelian inheritance patterns to:
–  Identify variant genotype errors that are inconsistent with Mendelian inheritance
–  Remove these errors from the reference baseline calls
•  However, if variant identification methods do not directly use pedigree information and jointly analyze the trio alignments, an opportunity to improve the genotype calls is missed
•  We focused on using the RTG Family caller to better leverage the shared information in the trios and improve the call set, while reducing Mendelian-inconsistent genotype errors
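
As a rough illustration of what "inconsistent with Mendelian inheritance" means for a single autosomal site, the sketch below checks whether a child's genotype can be formed from one paternal and one maternal allele. The function name and the example genotypes are hypothetical; this is a minimal post-hoc check, not the RTG Family caller, which analyzes the trio jointly.

```python
from itertools import product

def is_mendelian_consistent(father, mother, child):
    """Return True if the child's diploid genotype can be formed by taking
    one allele from the father and one from the mother (autosomal site)."""
    child_sorted = sorted(child)
    return any(sorted((f, m)) == child_sorted
               for f, m in product(father, mother))

# Hypothetical genotypes, written as allele pairs.
consistent = is_mendelian_consistent(("A", "A"), ("C", "C"), ("A", "C"))
inconsistent = is_mendelian_consistent(("A", "A"), ("C", "C"), ("C", "C"))
print(consistent, inconsistent)
```

A check like this can only flag a site after the fact; it cannot say which of the three genotypes is wrong, which is the motivation for calling the trio jointly.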
  
Variant calling can be improved by jointly analyzing related samples
[Figure: aligned reads and called genotypes for the members of a trio; annotation: shared haplotypes]
  
Variant calling can be improved by jointly analyzing related samples (continued)
[Figure: aligned reads and called genotypes for the members of a trio; annotations: Mendelian variant segregation, shared haplotypes]
  
Mendelian inconsistency
[Figure: aligned reads and called genotypes for the members of a trio; one genotype call is marked "Low QV"; genotype labels in the pedigree inset: A A, CC, AC]
  
Joint trio analysis corrects Mendelian errors
[Figure: aligned reads and called genotypes for the members of a trio after joint calling; one genotype call is marked "Good QV"; genotype labels in the pedigree inset: A A, CC, AC]
  
NA12878 calls from trio calling
•  Comparing offspring variants from singleton vs. pedigree calling
–  Both show good quality metrics
•  Using family information, more good calls can be made and dubious calls are downgraded

Variant Statistics (NA12878)
Call set     SNVs       Indels   MNPs    SNV Het/Hom  SNV Ti/Tv  % dbSNP (r129)
RTG single   3,329,797  558,242  31,070  1.55         2.11       90.8%
RTG trio     3,363,619  595,030  33,686  1.57         2.11       90.4%
GATK/VQSR    3,263,289  610,837  N/A     1.51         2.09       91.7%

Data: WGS 2x100bp >50X Illumina Platinum Genomes data (ENA Acc. No. ERP001960). RTG AVR score cut-off 0.15; GATK v1.7 & BWA 0.6.1.

[Figure: Venn diagram of Family vs. Singleton calls; region labels 142,848, 68,000, and 3,849,457]
[Pedigree: NA12878 with parents NA12891 and NA12892]
  
NA12878 vs reference datasets
NA12878 call set  1kP OMNI Poly (TP%)  1kP OMNI Mono (FP%)  Get-RM¶ (TP%)  GiaB (TP%)  GiaB-BED (TP%)
RTG single        97.5%                0.10%                97.4%          N/A         N/A
RTG trio          97.5%                0.24%                97.0%          90.5%       94.1%
GATK/VQSR         97.8%                0.17%                87.8%          88.4%       92.5%

§ Relative to dbSNP 137; statistics for SNVs only. ¶ Get-RM consistent high-quality variants; n=498
[Pedigree: NA12878 with parents NA12891 and NA12892]
–  1000 Genomes Illumina OMNI SNP array
•  Polymorphic sites – TP proxy
•  Monomorphic sites – FP proxy
–  Get-RM high confidence call set
–  GiaB high confidence calls in BED region
  
ROC Trio calls vs. GiaB baseline (BED)
[Figure: ROC curves. RTG snpsimeval tool; SNV/indel/MNP; zygosity match]
  
ROC Trio calls vs. GiaB baseline
[Figure: ROC curves. RTG snpsimeval tool; SNV/indel/MNP; zygosity match]
  
ROC Trio calls vs. CGI baseline
[Figure: ROC curves. RTG snpsimeval tool; SNV/indel/MNP; zygosity match]
  
Mendelian inconsistency errors
RTG family caller reduces Mendelian inheritance errors (MIE) more than 60-fold vs. RTG singleton calling (more than 70-fold vs. GATK/VQSR)
[Figure: bar chart of MIE counts on a log scale (1 to 1,000,000)]
RTG single   335,625
RTG trio     4,870
GATK/VQSR    351,904
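
The fold reductions quoted on this slide follow directly from the three MIE counts. The short calculation below reproduces them; the variable and function names are illustrative only.

```python
# MIE counts as reported on the slide.
mie_counts = {"RTG single": 335_625, "RTG trio": 4_870, "GATK/VQSR": 351_904}

def fold_reduction(baseline, improved):
    """How many times smaller the improved error count is than the baseline."""
    return baseline / improved

fold_vs_single = fold_reduction(mie_counts["RTG single"], mie_counts["RTG trio"])
fold_vs_gatk = fold_reduction(mie_counts["GATK/VQSR"], mie_counts["RTG trio"])
print(f"{fold_vs_single:.1f}x vs. RTG single, {fold_vs_gatk:.1f}x vs. GATK/VQSR")
```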
  
  
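Pattern #1: Heterozygous variant
[Figure: pedigree diagram, trio calling. NA12878 (parents NA12891 and NA12892) and NA12877 (parents NA12889 and NA12890) with their 11 offspring NA12879, NA12880, NA12881, NA12882, NA12883, NA12884, NA12885, NA12886, NA12887, NA12888, NA12893. Genotype labels as shown: 0/1; 0/1, 0/0; 0/0, 0/0, 0/0, 0/0, 0/0, 0/1, 0/1, 0/1, 0/1, 0/1]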
  
Segregation of heterozygous variants
[Figure: four bar charts (SNV, MNP, indel, All Variants) of variant count vs. number of offspring segregating (1 to 11)]
Segregation of NA12878 heterozygous variants called as family, GQ>50, homozygous reference in other parent.
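
For orientation, the textbook expectation for a variant that is heterozygous in one parent and homozygous reference in the other is that each child carries it with probability 1/2. The sketch below computes the resulting binomial distribution for 11 offspring. It assumes independent transmission to each child and error-free genotypes, which are simplifications, and the function name is hypothetical.

```python
from math import comb

def segregation_pmf(n_offspring, p_transmit=0.5):
    """P(k of n offspring carry the variant) for k = 0..n, binomial model."""
    return [comb(n_offspring, k)
            * p_transmit ** k
            * (1 - p_transmit) ** (n_offspring - k)
            for k in range(n_offspring + 1)]

pmf = segregation_pmf(11)
for k, prob in enumerate(pmf):
    print(f"{k:2d} carriers: {prob:.4f}")
```

Under this model the counts cluster around 5 to 6 carriers, and a true heterozygous variant carried by none or all of the 11 children is rare (probability 1/2048 each).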
  
  
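Pattern #2: Homozygous-alt variant
[Figure: pedigree diagram, trio calling. NA12878 (parents NA12891 and NA12892) and NA12877 (parents NA12889 and NA12890) with their 11 offspring NA12879, NA12880, NA12881, NA12882, NA12883, NA12884, NA12885, NA12886, NA12887, NA12888, NA12893. Genotype labels as shown: 0/1; 1/1, 0/0; 0/1, 0/1, 0/1, 0/1, 0/1, 0/1, 0/1, 0/1, 0/1, 0/1]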
  
Segregation of homo-alt variants
[Figure: four bar charts (SNV, MNP, indel, All Variants) of variant count vs. number of offspring segregating (1 to 11)]
Segregation of NA12878 homozygous alternative variants called as family, GQ>50, homozygous reference in other parent.
  
False positive estimate by segregation
GT Type                All variants  SNV     MNP    indel
Het       TP (10-11)   123672        110262  693    12717
Het       FP (1-8)     1901          1000    47     854
Het       FP%          1.40%         0.88%   1.42%  5.67%
Homo-alt  TP (2-10)    373260        329642  2258   41360
Homo-alt  FP (1,11)    4457          3672    36     749
Homo-alt  FP%          1.18%         1.10%   1.57%  1.78%
Overall   TP           496932        439904  2951   54077
Overall   FP           6358          4672    83     1603
Overall   FP%          1.26%         1.05%   2.74%  2.88%
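
The Homo-alt and Overall FP% rows can be reproduced as FP / (TP + FP), as the sketch below shows using the counts from the table. That formula is inferred from the numbers, not stated on the slide. The Het FP% row does not follow from it (for example, 1901 / (123672 + 1901) is about 1.51%, not 1.40%), so that row may use a denominator the slide does not give.

```python
def fp_percent(tp, fp):
    """False-positive percentage, computed as FP / (TP + FP) * 100."""
    return 100.0 * fp / (tp + fp)

# (TP, FP) counts copied from the table.
counts = {
    "Homo-alt": {"All variants": (373260, 4457), "SNV": (329642, 3672),
                 "MNP": (2258, 36), "indel": (41360, 749)},
    "Overall": {"All variants": (496932, 6358), "SNV": (439904, 4672),
                "MNP": (2951, 83), "indel": (54077, 1603)},
}

fp_rates = {
    gt_type: {label: round(fp_percent(tp, fp), 2)
              for label, (tp, fp) in row.items()}
    for gt_type, row in counts.items()
}
for gt_type, row in fp_rates.items():
    print(gt_type, row)
```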
  
  
Data imputation by pedigree caller
•  For genomes with no data, use population priors
–  With care, the caller can iterate over the offspring and then over each of the parents independently
–  This avoids an exponential explosion, so the whole extended family can be called in one step
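
To make the idea of imputing from population priors concrete, here is a minimal sketch for one biallelic site. It assumes one parent and the children have known, error-free genotypes, and it puts a Hardy-Weinberg prior on the parent with no data. All names and the allele frequency are hypothetical, and this is not the RTG pedigree caller's algorithm, which works from read evidence across the whole family.

```python
from itertools import product

GENOTYPES = [(0, 0), (0, 1), (1, 1)]  # biallelic site: 0 = ref, 1 = alt

def hwe_prior(alt_freq):
    """Hardy-Weinberg genotype prior for a given alternate-allele frequency."""
    q = alt_freq
    p = 1.0 - q
    return {(0, 0): p * p, (0, 1): 2 * p * q, (1, 1): q * q}

def transmission_prob(parent_a, parent_b, child):
    """P(child genotype | both parental genotypes) under Mendelian inheritance."""
    hits = sum(1 for a, b in product(parent_a, parent_b)
               if tuple(sorted((a, b))) == child)
    return hits / 4.0

def impute_parent(known_parent, children, alt_freq):
    """Posterior over the genotype of a parent with no data."""
    prior = hwe_prior(alt_freq)
    unnormalized = {}
    for genotype in GENOTYPES:
        likelihood = 1.0
        for child in children:
            likelihood *= transmission_prob(known_parent, genotype, child)
        unnormalized[genotype] = prior[genotype] * likelihood
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("children are not consistent with any parental genotype")
    return {g: v / total for g, v in unnormalized.items()}

# One heterozygous child leaves two candidate genotypes for the missing parent.
one_child = impute_parent((0, 0), [(0, 1)], alt_freq=0.1)
# Adding a homozygous-reference sibling rules out 1/1.
two_children = impute_parent((0, 0), [(0, 1), (0, 0)], alt_freq=0.1)
print(one_child)
print(two_children)
```

Each additional child narrows the posterior, which is the intuition behind imputing family members from more offspring.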
  
Imputation of family members with no data
[Figure: simulated data; true positives vs. false positives for imputation from 1 offspring, 2 offspring, 4 offspring, and 4 offspring + father]
  
ROC vs NA12878 imputed baseline
[Figure: ROC curves. RTG snpsimeval tool; SNV/indel/MNP; zygosity match]
  
de novo mutation identification

de novo Mutation Accuracy (NA12878)
Call set         de novo candidates  de novo germline*  de novo somatic*  TP/FP
Singleton calls  16,902              49 (100%)          941 (99%)         1:17
Trio calls       2,205               49 (100%)          941 (99%)         1:2.2

*Sensitivity vs. Conrad et al. (2011) validated dataset of germline and somatic cell line de novo mutations.
–  Uses the parental genomes to identify & score de novo mutations in offspring
–  More than 7-fold improvement in precision for finding de novo mutations vs. naïve methods
[Pedigree: NA12878 with parents NA12891 and NA12892]
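
The ratios on this slide can be checked with a little arithmetic. The sketch below assumes that the TP/FP column is the number of candidates per validated mutation recovered, and that the 49 germline and 941 somatic mutations are the only true positives among the candidates. Both are interpretations, since the slide does not define the column.

```python
# Counts copied from the slide.
validated_found = 49 + 941  # germline + somatic de novo mutations recovered
candidates = {"Singleton calls": 16_902, "Trio calls": 2_205}

def candidates_per_true_positive(n_candidates, n_true):
    """How many candidates must be examined per validated mutation found."""
    return n_candidates / n_true

ratios = {name: candidates_per_true_positive(n, validated_found)
          for name, n in candidates.items()}
improvement = ratios["Singleton calls"] / ratios["Trio calls"]

for name, ratio in ratios.items():
    print(f"{name}: 1:{ratio:.1f}")
print(f"improvement: {improvement:.1f}-fold")
```

Under these assumptions the ratios come out close to the slide's 1:17 and 1:2.2, and the improvement is a little over 7-fold.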
  
Status
•  Working through the complete trio datasets to produce joint pedigree calls for the NA12878 trio
– Aiming for a trio call set and another that includes the full Platinum pedigree data
– There is disproportionately more data for NA12878 than for her parents or offspring
•  Comprehensive segregation analysis that includes all Mendelian patterns
•  Phasing analysis to identify variants that are inconsistent with transmitted phases
  
Issues
•  How to integrate pedigree calls with other data?
– Variants that segregate appropriately are candidates for inclusion in the baseline
– Variants that don’t segregate appropriately are candidates for removal from the baseline
– Improvement of baseline genotypes using pedigree-based genotypes
•  Use of the imputed NA12878 baseline
•  Creation of a more inclusive baseline for ROC curves to compare new methods and select thresholds
  
Acknowledgements
•  RTG team at Hamilton, New Zealand
–  Led by John Cleary, CTO
•  RTG team at San Bruno, CA
–  Sahar Malakshah
–  Minita Shah
–  Brian Hilbush
•  Michael Eberle, Illumina, Inc. – Platinum Data
•  Justin Zook, NIST
•  1000 Genomes Project
© 2013 Real Time Genomics, Inc. All rights reserved.
US Patent 7,640,256. Other patents pending.
For research use only. Not for diagnostic applications.

Aug2013 real time genomics trio pedigree analysis
