Aug2014 use cases combined

Example Use Cases

Published in: Health & Medicine

Slide notes:
  • One FN SNV was confirmed to be a reference call (no variant present).
    One FP indel was confirmed to be a real indel.
    Three FP SNVs were confirmed to be real SNVs.
  • Most of the Mendelian violations are almost certainly real cell-line mutations.

    1. Use of GIAB NA12878 at Invitae (8/18/2014)
       • Performance assessment
         − Complement to known-pathogenic control samples (e.g. Coriell/GeT-RM, NIBSC). These control samples are the most relevant to our product, but each carries only ~1 variant, and only a limited number of such samples are available.
         − GIAB greatly increases n, though its variants are generally not clinically relevant.
           o We also use Mike Eberle's NA12878 calls and an internally constructed truth set for the CEPH 1463 family and NA19240.
         − Validation documents, performance assessment of poorly covered genes with control samples, upcoming publications
    2. Integrating NIST Call Sets into a Validation Workflow
       Validation report metrics:
       • False Positive Rate:  FPR = FP/(FP + TN)
       • False Discovery Rate: FDR = FP/(FP + TP)
       • Sensitivity:          Sens. = TP/(TP + FN)
       • Specificity:          Spec. = TN/(FP + TN)
       • Balanced Accuracy:    (Sens. + Spec.)/2
    3. Nephropathology Associates' Kidney Disease Gene Panel: Excerpts from an NA12878 Validation Report
       • Data provided by Marjorie Beggs (Nephropathology Associates)
       • 301 genes from 13 renal disease categories
       • Agilent oligo capture followed by MiSeq 2x150 sequencing
       • Genotypes/probabilities determined with a modified version of the MAQ variant caller (Li et al., 2008)

       Summary of all targeted positions:
         In Standard VCF          614
         Not in Standard VCF  803,980
         Total                804,594

       Summary of targeted zero-coverage positions in the experiment:
         In Standard VCF            3
         Not in Standard VCF    5,100
         Total                  5,103

       Summary grid:
         Depth*  PNotRef**  TP   FP  TN       FN  Total    FPR     FDR     Sens.    Spec.    Bal. Accuracy
         10      0.5        592  14  789,743  6   790,355  0.002%  2.310%  98.997%  99.998%  99.50%
         10      0.75       591  14  789,743  7   790,355  0.002%  2.314%  98.829%  99.998%  99.41%
         10      0.9        591  14  789,743  7   790,355  0.002%  2.314%  98.829%  99.998%  99.41%
         20      0.5        540  11  740,860  3   741,414  0.001%  1.996%  99.448%  99.999%  99.72%
         20      0.75       539  11  740,860  4   741,414  0.001%  2.000%  99.263%  99.999%  99.63%
         20      0.9        539  11  740,860  4   741,414  0.001%  2.000%  99.263%  99.999%  99.63%
         30      0.5        408  7   611,453  3   611,871  0.001%  1.687%  99.270%  99.999%  99.63%
         30      0.75       408  7   611,453  3   611,871  0.001%  1.687%  99.270%  99.999%  99.63%
         30      0.9        408  7   611,453  3   611,871  0.001%  1.687%  99.270%  99.999%  99.63%

       * Only positions with a depth greater than or equal to this value are included in the calculation.
       ** The minimum PNotRef value for a position to be counted as a variant.
    4. Ion Benchmarking I
    5. Ion Benchmarking II
    6. Ion Benchmarking III
    7. Analytical Validation of Clinical Whole-Exome Sequencing (CWES) Test
       Background
       • Clinical laboratory: Division of Genomic Diagnostics, certified by regulatory agencies (CAP).
       • The CWES test requires stringent validation per CAP criteria to establish its performance metrics.
       Utilizing NIST data in validation of the CWES test
       • Sequence NA12878 and call its variants at CHOP
       • CHOP ROI: Agilent SureSelect V5+ (SSV5+) baits file
       • Compare the CHOP data set with the NIST* data set for concordance
       NIST data set details
       • High-quality reference data set for NA12878 (Dec. 2013)
       • NIST's high-confidence regions of interest (ROI)
       • Variants called in 219,222 regions on the hg19 assembly
       * National Institute of Standards and Technology
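    8. Sensitivity/Specificity
       Region: RefGene +/- 15 bp (SSV5+); CHOP calls compared with NIST calls
                   SNVs     INDELs
         TP        18,480   396
         FP        26       3
         FN        63       30

         Metric                            SNVs     INDELs
         Sensitivity (TP/(TP+FN))          99.66%   92.96%
         Specificity (TN/(TN+FP))          ~100%    ~100%
         FDR (FP/(FP+TN))                  0.02%    0.08%
         Accuracy ((TP+TN)/(TP+TN+FP+FN))  ~100%    ~100%

       TP: true positive; FP: false positive; FN: false negative; TN: true negative
       TN = NIST high-confidence regions minus CHOP ROIs
       Note: by the slide 2 definitions, FP/(FP+TN) is the false positive rate; the false discovery rate FP/(FP+TP) computed from the counts above is 26/18,506 ≈ 0.14% for SNVs and 3/399 ≈ 0.75% for indels.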
    9. Further Analysis of the Presumptive 93 FNs and 29 FPs
       • 93 FNs: 63 SNVs, 30 INDELs
       • 29 FPs: 26 SNVs, 3 INDELs
    10. Acknowledgment
        • Director
          − Avni Santani
          − Mehdi Sarmady
        • Clinical WES Team
          − Zhenming Yu
          − Kristin McDonald Gibson
          − Tanya Tischler
          − Addie I Nesbitt
          − Elizabeth H Denenberg
    11. Chr6:151669820 and Chr6:151669828: difficult site in a homopolymer in an intron of the gene AKAP12
    12. Chr1:1666303: SNP in the gene SLC35E2, at a position that also lies within a pseudogene and a segmental duplication
    13. Using Genome in a Bottle calls to benchmark clinical exome sequencing at Mount Sinai School of Medicine
        “We evaluate a set of NA12878 technical replicates against GIAB for each new pipeline version.”
    14. Benchmarking somatic variant calling at Qiagen
    15. NextSeq: New Chemistry – Does it work?
        Whole-genome metrics                    NextSeq500        HiSeq2500
        % genome covered (>= 10X in Q20 bases)  96%               96%
        Mean coverage in Q20 bases              28.3X             31.8X
        SNPs called (% dbSNP 129)               3,643,998 (89%)   3,664,014 (88%)
        InDels called (% dbSNP 129)             646,907 (65.7%)   686,547 (64.5%)
        GIAB SNP sensitivity | precision        99.07% | 99.04%   99.25% | 99.90%
        GIAB indel sensitivity | precision      86.90% | 98.85%   93.29% | 97.54%
        [Figure: NextSeq 500 genomic coverage in high-quality bases; x: coverage in bases with MQ >= 20 and Q >= 20, y: proportion of genome at coverage. Mean: 28.33X; fraction at 2/3 mean: 0.9]
        [Figure: HiSeq 2000 genomic coverage in high-quality bases; same axes. Mean: 31.86X; fraction at 2/3 mean: 0.91]
        [Figure: normalized coverage vs. GC content, HiSeq 2000 and NextSeq 500]
    16. NextSeq: Exomes
        Compare 12-plex Rapid Capture Exome data from HiSeq 2500 Rapid to NextSeq500
        • 12-plex capture containing
          − NA12878
          − 2 cell line tumor/normal pairs
          − TCGA samples
        • 2 runs with 2x76
          − PF yield: 72 Gb & 75 Gb
          − Run time: 18 hours
          − Cluster density: 227-238k/mm2
        • High-level metrics
          − Error rate: 0.58%
          − %Q30: 83.6% (72.2% post-BQSR)

        Hybrid selection metrics        NextSeq500   HiSeq2500
        % selected                      75.4%        74.5%
        Penalty 20x                     4.83         4.67
        Mean target coverage            112X         165X
        % target bases >= 20x           92.9%        95.1%
        % target bases >= 50x           79.1%        87.8%

        Variant calling metrics         NextSeq500      HiSeq2500
        SNPs (% dbSNP 129)              22,786 (94.7%)  22,953 (94.6%)
        GIAB sensitivity                96.53%          96.79%
        GIAB precision                  99.87%          99.96%
        InDels (% dbSNP 129)            816 (64.2%)     813 (65.4%)
        GIAB sensitivity                83.16%          83.92%
        GIAB precision                  88.43%          92.31%
    17. Other use cases
        LabCorp (Kyle Hart)
        • We are using this data to validate our variant identification pipelines, which are based on Qiagen/CLC software and Illumina sequence data.
        • We are seeking high clinical sensitivity to minimize false negatives, and we use a variety of strategies to rescue uncallable segments and to confirm called variants before reporting, to increase specificity.
        NHGRI (Nancy Hansen)
        • We have a variant analysis pipeline that analyzes whole-exome sequence data (Illumina HiSeq2000/2500) for SNPs and small indels.
        • We are using the GIAB variant dataset to assess the accuracy of our pipeline and to compare it with other publicly available pipelines.
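
The metric definitions on slide 2 translate directly into code. The following is a minimal Python sketch; the class and function names are illustrative and not taken from any lab's pipeline. Plugging in the counts from the depth >= 10, PNotRef 0.5 row of slide 3 reproduces the FPR, FDR, sensitivity, specificity, and balanced accuracy reported there.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Position-level comparison counts against a truth set such as GIAB NA12878."""

    tp: int
    fp: int
    tn: int
    fn: int

    def fpr(self) -> float:
        """False positive rate: FP / (FP + TN)."""
        return self.fp / (self.fp + self.tn)

    def fdr(self) -> float:
        """False discovery rate: FP / (FP + TP)."""
        return self.fp / (self.fp + self.tp)

    def sensitivity(self) -> float:
        """TP / (TP + FN)."""
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        """TN / (FP + TN)."""
        return self.tn / (self.fp + self.tn)

    def balanced_accuracy(self) -> float:
        """Mean of sensitivity and specificity."""
        return (self.sensitivity() + self.specificity()) / 2


def report(c: ConfusionCounts) -> dict:
    """Format each metric with the precision used in the slide 3 summary grid."""
    return {
        "FPR": f"{c.fpr():.3%}",
        "FDR": f"{c.fdr():.3%}",
        "Sensitivity": f"{c.sensitivity():.3%}",
        "Specificity": f"{c.specificity():.3%}",
        "Balanced accuracy": f"{c.balanced_accuracy():.2%}",
    }


if __name__ == "__main__":
    # Depth >= 10, PNotRef 0.5 row of the slide 3 summary grid
    row = ConfusionCounts(tp=592, fp=14, tn=789743, fn=6)
    print(report(row))  # FDR 2.310%, sensitivity 98.997%, balanced accuracy 99.50%
```

Keeping the counts in one immutable record makes it easy to recompute every grid row from its four counts alone.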
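
The slide 3 summary grid re-counts TP/FP/TN/FN for every pair of depth and PNotRef cutoffs. A minimal sketch of that procedure follows, assuming each targeted position is a hypothetical `Position` record holding its read depth, the caller's PNotRef value, and whether the position is a variant in the standard VCF. This is one reading of the grid's two footnotes, not the lab's actual code, and it compares only variant presence, not genotype.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Position:
    """Hypothetical per-position record for one targeted base."""

    depth: int        # read depth observed at the position
    p_not_ref: float  # caller's PNotRef value for the position
    in_truth: bool    # True if the position is a variant in the standard VCF


def grid_row(positions, min_depth, min_p_not_ref):
    """Return (TP, FP, TN, FN, total) for one depth / PNotRef cutoff pair."""
    tp = fp = tn = fn = 0
    for pos in positions:
        if pos.depth < min_depth:
            continue  # footnote *: only positions with depth >= cutoff are counted
        called = pos.p_not_ref >= min_p_not_ref  # footnote **: variant-call cutoff
        if called and pos.in_truth:
            tp += 1
        elif called:
            fp += 1
        elif pos.in_truth:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn, tp + fp + tn + fn


def summary_grid(positions, depths=(10, 20, 30), p_cutoffs=(0.5, 0.75, 0.9)):
    """Evaluate every depth x PNotRef combination, as in the slide 3 grid."""
    return {(d, p): grid_row(positions, d, p) for d, p in product(depths, p_cutoffs)}


if __name__ == "__main__":
    # Toy positions, not data from the slides
    toy = [Position(12, 0.95, True), Position(25, 0.6, True), Position(8, 0.99, True),
           Position(40, 0.8, False), Position(15, 0.1, False), Position(35, 0.2, True)]
    for cutoffs, counts in summary_grid(toy).items():
        print(cutoffs, counts)
```

Because positions below the depth cutoff are dropped before classification, the total shrinks as the depth cutoff rises, which matches the falling totals in the slide 3 grid.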
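
Slides 7 and 8 compare CHOP calls with the NIST NA12878 calls inside the CHOP ROI and the NIST high-confidence regions. A minimal sketch of that concordance step follows, assuming both call sets are already normalized so that identical alleles are written identically, and matching variants exactly on (chrom, pos, ref, alt). Dedicated comparison tools also reconcile different representations of the same indel, which this sketch does not. The `Variant`, `Regions`, and `concordance` names are hypothetical.

```python
from bisect import bisect_right
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position, as in VCF
    ref: str
    alt: str

    @property
    def kind(self) -> str:
        # Single-base REF and ALT is an SNV; everything else (incl. MNVs) is lumped with indels
        return "SNV" if len(self.ref) == 1 and len(self.alt) == 1 else "INDEL"


class Regions:
    """Non-overlapping BED-style intervals (0-based, half-open) grouped by chromosome."""

    def __init__(self, intervals):
        self._by_chrom = {}
        for chrom, start, end in sorted(intervals):
            self._by_chrom.setdefault(chrom, []).append((start, end))

    def contains(self, chrom, pos):
        """True if the 1-based VCF position lies inside one of the intervals."""
        ivs = self._by_chrom.get(chrom, [])
        x = pos - 1  # VCF 1-based -> BED 0-based coordinate
        i = bisect_right(ivs, (x, float("inf"))) - 1  # last interval starting at or before x
        return i >= 0 and ivs[i][0] <= x < ivs[i][1]


def concordance(test_calls, truth_calls, roi, high_conf):
    """Count TP/FP/FN per variant type inside ROI and high-confidence regions.

    Only each variant's start position is checked against the regions.
    """
    def in_scope(v):
        return roi.contains(v.chrom, v.pos) and high_conf.contains(v.chrom, v.pos)

    test = {v for v in test_calls if in_scope(v)}
    truth = {v for v in truth_calls if in_scope(v)}
    counts = {kind: {"TP": 0, "FP": 0, "FN": 0} for kind in ("SNV", "INDEL")}
    for v in test & truth:
        counts[v.kind]["TP"] += 1  # called by the lab and present in the truth set
    for v in test - truth:
        counts[v.kind]["FP"] += 1  # called by the lab only
    for v in truth - test:
        counts[v.kind]["FN"] += 1  # present in the truth set only
    return counts
```

Applied to a lab call set and the NIST call set, the per-type TP/FP/FN totals correspond to the kind of counts tabulated on slide 8.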
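
Slides 15 and 16 summarize coverage as a mean depth, a fraction of bases at or above a depth threshold (e.g. % target bases >= 20x), and a "fraction at 2/3 mean". A minimal sketch that computes these from a coverage histogram (depth mapped to number of bases) follows. It assumes the histogram already counts only bases passing the MQ >= 20 and Q >= 20 filters, and it assumes "fraction at 2/3 mean" means the fraction of bases with depth >= 2/3 of the mean; the slides do not define it.

```python
def mean_coverage(hist: dict) -> float:
    """Mean depth from a histogram mapping depth -> number of bases at that depth."""
    total = sum(hist.values())
    return sum(depth * n for depth, n in hist.items()) / total


def fraction_at_least(hist: dict, threshold: float) -> float:
    """Fraction of bases whose depth is >= threshold (e.g. % target bases >= 20x)."""
    total = sum(hist.values())
    return sum(n for depth, n in hist.items() if depth >= threshold) / total


def fraction_at_two_thirds_mean(hist: dict) -> float:
    """Assumed meaning of the slide's 'Fraction at 2/3 Mean' annotation."""
    return fraction_at_least(hist, 2 / 3 * mean_coverage(hist))


if __name__ == "__main__":
    # Toy histogram, not data from the slides: 100 bases in total
    hist = {0: 10, 10: 20, 20: 40, 30: 30}
    print(mean_coverage(hist))                # 19.0
    print(fraction_at_least(hist, 20))        # 0.7
    print(fraction_at_two_thirds_mean(hist))  # 0.7
```

Working from a histogram rather than per-base depths keeps memory small even for whole-genome data, since only one count per observed depth is stored.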
