Avoiding Nonsense Results in your NGS Variant Studies

Neurodevelopment Disorder Research (Genetics, Toxicology, Biomarkers, Prevention, Treatment)
Sep. 6, 2014

  1. Avoiding Nonsense Results in your NGS Variant Studies. James Lyons-Weiler, PhD, Scientific Director/Senior Research Scientist, Bioinformatics Analysis Core, Genomics & Proteomics Core Laboratories, University of Pittsburgh, Pittsburgh, PA. May 1, 2014
  2. Two Parts • Identifying sites with low genotypic signal increases concordance among variant callers • Hazards in finding differentially expressed genes in RNASeq – how to do it more robustly.
  3. 23andMe: High risk of RA and psoriasis; GTL: Low risk of RA and psoriasis
  4. NYTimes Article, etc.
  5. Data were from an Illumina HiSeq 2000
  6. Among-method average concordance: 57.5% overall; 32.7% at high coverage (O’Rawe et al.)
  7. Diagram: TRUTH (BIOLOGICAL MOLECULAR SEQUENCE) -> SEQUENCER -> MAPPER -> VARIANT CALLERS -> LOW CONCORDANCE (O’Rawe et al., 2013). Approaches shown: Information Theory; Consensus Analysis (e.g., 2/3, 3/4, set analysis -> modeling); Improve Callers (fix errors, modeling); Bake-Offs; Simulations; Spiked-Ins
  8. Entropy of Base Distributions. [Figure: three example A/T/C/G base distributions, labeled in order: low entropy, high enthalpy; low entropy, high enthalpy; high entropy, low enthalpy.]
  9. Boltzmann Entropy • s = k ln w (Planck) • w = antiln(s/k) = exp(s/k) http://schneider.ncifcrf.gov/images/boltzmann/boltzmann-tomb-4.html
  10. Rank-Sorted Distribution of w (O’Rawe et al. data). Heterozygotes: w = 2; Homozygotes: w = 1
  11. Example w Density Distribution
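  12. w and FBVC

      | A | T | C | G | w | pw | Zygosity | Genotype |
      |---|---|---|---|---|----|----------|----------|
      | 200 | 0 | 0 | 0 | 1 | 0 | Homozygote | AA |
      | 16 | 158 | 13 | 13 | 2.102558 | 0 | Homozygote | TT |
      | 100 | 100 | 0 | 0 | 2 | 0 | Heterozygote | AT |
      | 58 | 30 | 1 | 111 | 2.768507 | 0 | Heterozygote | AG |
      | 28 | 80 | 14 | 78 | 3.303636 | 0 | Heterozygote | TG |
      | 76 | 38 | 29 | 57 | 3.758733 | 0 | Heterozygote | AG |
      | 33 | 49 | 60 | 58 | 3.895496 | 0.0126 | Heterozygote? | CG? |
      | 50 | 50 | 50 | 50 | 4 | 1 | noise | unknown |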
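The slides do not give the formula behind the w column. One reading, offered here as an assumption rather than the authors' stated method, is that w is the exponential of the Shannon entropy (in nats) of the base frequencies at a site, by analogy with w = antiln(s/k) on the Boltzmann slide. Under that reading the sketch below reproduces w = 1, 2, and 4 for the pure, 50/50, and uniform rows, but only approximates the intermediate rows (roughly 2.10 for 16/158/13/13 against the slide's 2.102558), so the actual computation may differ in detail. The function name `effective_states` is hypothetical.

```python
import math

def effective_states(counts):
    """Return exp(H), where H is the Shannon entropy (nats) of the base counts.

    Assumption: this stands in for the slide's w; it is not the authors' code.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("site has no reads")
    # Zero counts contribute nothing to the entropy, so they are skipped.
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return math.exp(entropy)

# Base counts in A, T, C, G order, taken from three rows of the slide's table.
for counts in ([200, 0, 0, 0], [100, 100, 0, 0], [50, 50, 50, 50]):
    print(counts, round(effective_states(counts), 6))
```

A value near 1 means one base dominates, a value near 2 means two bases share the reads, and a value near 4 means the site is indistinguishable from uniform noise.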
  13. Operational* Equiprobable Null Distribution {f(A) = f(T) = f(G) = f(C)}
  14. Convergence of significance (pw)
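How pw is computed is not spelled out in the slides. The sketch below assumes a Monte Carlo p-value: simulate sites at the same read depth under the equiprobable null f(A) = f(T) = f(G) = f(C), and report the fraction of simulated sites whose w is at or below the observed w. It also assumes w is the exponential of the Shannon entropy of the base frequencies. The names `effective_states` and `pw_monte_carlo`, and the `n_sim` and `seed` parameters, are hypothetical.

```python
import math
import random

def effective_states(counts):
    """exp(Shannon entropy in nats) of the base counts (assumed form of w)."""
    total = sum(counts)
    return math.exp(-sum((c / total) * math.log(c / total) for c in counts if c > 0))

def pw_monte_carlo(counts, n_sim=2000, seed=0):
    """Estimate P(w_null <= w_observed) under an equiprobable four-base null."""
    rng = random.Random(seed)
    depth = sum(counts)
    observed = effective_states(counts)
    hits = 0
    for _ in range(n_sim):
        draws = rng.choices(range(4), k=depth)  # each base drawn with probability 1/4
        sim_counts = [draws.count(base) for base in range(4)]
        # Small tolerance guards against floating-point noise at w = 4.
        if effective_states(sim_counts) <= observed + 1e-9:
            hits += 1
    return hits / n_sim

# Base counts in A, T, C, G order, as in the slide's table.
print(pw_monte_carlo([200, 0, 0, 0]))
print(pw_monte_carlo([50, 50, 50, 50]))
```

With a simulation-based estimate like this, raising `n_sim` should stabilize the value, which may be what the "convergence" on this slide refers to.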
  15. What We Expect. Diagram: TRUTH (BIOLOGICAL MOLECULAR SEQUENCE) -> SEQUENCER -> MAPPER -> VARIANT/BASE CALLERS -> Genotypic Signal Filtering -> INCREASED CONCORDANCE
  16. pHom Function
  17. From the O’Rawe et al. generated results. FBVC = frequency-based variant caller (Lyons-Weiler et al.)

      | Caller | Signal | Concordance w/ FBVC | Hom | Het |
      |--------|--------|---------------------|-----|-----|
      | gatk | ALL | 0.5762 | 11868 | 17670 |
      | gatk | pw<=0.05 | 0.9976 | 11282 | 5676 |
      | gatk | pw>0.05 | 0.0074 | 586 | 11994 |
      | samtools | ALL | 0.5649 | 11541 | 18799 |
      | samtools | pw<=0.05 | 0.9917 | 11489 | 5761 |
      | samtools | pw>0.05 | 0.0002 | 52 | 13038 |
      | snver | ALL | 0.6006 | 11904 | 16729 |
      | snver | pw<=0.05 | 0.9934 | 11812 | 5470 |
      | snver | pw>0.05 | 0.0007 | 92 | 11259 |

  18. %Concordance (FBVC_vs_FBVC) by treatment (Tx) and signal

      | Tx | ALL | pw<=0.05 | pw>0.05 |
      |----|-----|----------|---------|
      | Marked | 85.64 | 91.08 | 35.66 |
      | Realigned | 83.82 | 91.69 | 28.21 |
      | Recalibrated | 93.14 | ***99.39 | 48.53 |
      | Reduced | 21.54 | 24.57 | 4.25 |
      | Marked-Realigned | 76.91 | 86.11 | 15.44 |
      | Marked-Realigned-Recalibrated | 76.73 | 85.99 | 15.34 |
      | Marked-Realigned-Recalibrated-Reduced | 19.98 | 22.9 | 2.66 |
  19. Diagram: TRUTH (BIOLOGICAL MOLECULAR SEQUENCE) -> SEQUENCER -> MAPPER -> VARIANT CALLERS -> LOW CONCORDANCE (O’Rawe et al., 2013). Approaches shown: Information Theory; Consensus Analysis (e.g., 2/3, 3/4, set analysis -> modeling); Improve Callers (fix errors, modeling); Bake-Offs; Simulations; Spiked-Ins
  20. Lifescope reads (red); Shrimp2 reads (blue). Mappers must be systematically evaluated
  21. Part 2: Good and Bad News for RNASeq (and everything else): The Bad News: Fold Change is Biased. The Good News: We have identified a much less biased method.
  22. T-test is not appropriate for small N, large P data (such as RNASeq)
  23. Fold Change > 2.0; Delta > 25
  24. FC(A/B) is Blind to Large Portions of Your Data. [Panels: FC(A/B); Delta (and J5: Patel & Lyons-Weiler, 2004)]
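  25. Ratios are Hard to Interpret as Biological Differences

      | Gene | A | B | delta (A-B) | FC(A/B) |
      |------|---|---|-------------|---------|
      | gene1 | 5 | 3 | 2 | 1.667 |
      | gene2 | 50 | 30 | 20 | 1.667 |
      | gene3 | 500 | 300 | 200 | 1.667 |
      | gene4 | 5000 | 3000 | 2000 | 1.667 |
      | gene5 | 50000 | 30000 | 20000 | 1.667 |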
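The arithmetic on this slide can be checked with a short sketch. It recomputes delta (A-B) and FC(A/B) for the five genes listed, then applies the cutoffs from slide 23 (Fold Change > 2.0, Delta > 25) to show which genes each measure would flag. The variable and function names are illustrative and not taken from the authors' software.

```python
# Expression values (A, B) for the five genes listed on the slide.
genes = {
    "gene1": (5, 3),
    "gene2": (50, 30),
    "gene3": (500, 300),
    "gene4": (5000, 3000),
    "gene5": (50000, 30000),
}

def delta(a, b):
    """Difference between the two conditions."""
    return a - b

def fold_change(a, b):
    """Quotient of the two conditions."""
    return a / b

for name, (a, b) in genes.items():
    print(name, delta(a, b), round(fold_change(a, b), 3))

# Cutoffs taken from slide 23: Fold Change > 2.0 and Delta > 25.
flagged_by_fc = [g for g, (a, b) in genes.items() if fold_change(a, b) > 2.0]
flagged_by_delta = [g for g, (a, b) in genes.items() if delta(a, b) > 25]
print("FC > 2.0:", flagged_by_fc)
print("Delta > 25:", flagged_by_delta)
```

Every gene has the same fold change, so the ratio cutoff treats a difference of 2 and a difference of 20000 identically, while the delta cutoff separates them.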
  26. A-B is a difference; A/B is a quotient.
  27. Log2 Transformation Does Not Help; Reveals Minor Delta (& J5) Bias. Pink = FC(A/B); Black = Delta
  28. G-Thresholding J5
  29. FC Bias in Amyotrophic Lateral Sclerosis. [Scatter plot: Control vs. ALS; series DEGy and FCDEGy.] Black circles = FC(A/B). Pink = Gthr-J5 genes
  30. FC(A/B) Bias in Alcohol-Induced Hepatitis. Black circles = FC(A/B). Pink = Gthr-J5 genes
  31. Conclusions • Not all NGS/HTS sites have sufficient genotypic signal to warrant a base call. High coverage alone does not provide a solution. • By measuring genotypic signal, we can determine which sites we can call with confidence. • Fold change (FC(A/B)) is blind to highly expressed genes and should be abandoned as a measure of differential expression altogether – even for single gene or single protein studies! • Published microarray data sets analyzed to date using only FC(A/B) are a gold mine for re-analysis using less biased methods.
  32. Credits and Contact • pw, pHom, etc: James Lyons-Weiler, Alan Twaddle, Rahil Sethi. – (MS in preparation) – Our software is called Gconf (not yet available) • Fold-Change Bias: James Lyons-Weiler, Tamanna Sultana, Rick Jordan, Rahil Sethi – (Paper in review) – For now, read • Mariani TJ, Budhraja V, Mecham BH, Gu CC, Watson MA, Sadovsky Y. 2003. A variable fold change threshold determines significance for expression microarrays. FASEB J. 17:321-3. doi: 10.1096/fj.02-0351fje • Pearson, K. 1897. On a form of spurious correlation that may arise when indices are used for the measurement of organs. Proc Roy Soc Lond 60:489-498 doi: 10.1098/rspl.1896.0076