Rodriguez_UROC_Final_Presentation

1. Ameliorating Statistical Methodologies as Genomic Data Burgeon: Refined Proportional Odds Model with Application to New Dravet Dataset. Ivan Rodriguez (The University of Arizona, Department of Mathematics; UROC-PREP/STAR Program). Joseph C. Watkins, Ph.D. (The University of Arizona, Department of Mathematics; Chair, Graduate Interdisciplinary Program in Statistics). August 5, Summer 2016.

2. Focus Ivan Rodriguez (The University of Arizona) Ameliorating Statistical Methodologies August 5, 2016 2 / 19

4. Research Overview. Motivation: ≈150,000 newborns diagnosed with genetic disease annually (Nussbaum, McInnes, & Willard, 2007). Challenge: making sense of this abundant data. Objectives: Match data and diagnosis by improving an existing technique. Apply the model to a new and exclusive dataset.

10. Methods: Ordinal Categorical Data Analysis. Clear ordering of categories. Relevant example: disease severity. Complications: Assigning numeric values to categories. Unequal distances between categories. Naïve solution: dichotomize the ordinal outcome.

22. Methods: Proportional Odds Model, General. Better method: the proportional odds model (McCullagh, 1980). Extends binary logistic regression (Cox, 1958). Celebrated method for ordinal data analysis (Bender & Grouven, 1998). Applications: surveys, quality assurance, radiology, clinical research (McCullagh, 1999). Model: $\operatorname{logit} P(Y_i \le j \mid X_i) = \theta_j - \beta^{T} X_i$ for $j \in \{1, \dots, J - 1\}$, where $\operatorname{logit}(\pi) = \log \frac{\pi}{1 - \pi}$.
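The model on this slide fixes the cumulative probabilities, and the per-category probabilities follow by differencing. A minimal sketch of that computation, assuming NumPy; the function names and the numeric values of `theta`, `beta`, and `x` are hypothetical and chosen only for illustration:

```python
import numpy as np

def cumulative_probs(theta, beta, x):
    """P(Y <= j | x) for j = 1..J-1, from logit P(Y <= j | x) = theta_j - beta^T x."""
    eta = np.asarray(theta, dtype=float) - np.dot(beta, x)
    return 1.0 / (1.0 + np.exp(-eta))

def category_probs(theta, beta, x):
    """P(Y = j | x) for j = 1..J, by differencing the cumulative probabilities."""
    cum = np.concatenate(([0.0], cumulative_probs(theta, beta, x), [1.0]))
    return np.diff(cum)

# Illustrative values only (not taken from the presentation).
theta = [-1.0, 0.5, 2.0]          # J - 1 = 3 increasing cutpoints, so J = 4 categories
beta = np.array([0.8, -0.3])
x = np.array([1.0, 2.0])
probs = category_probs(theta, beta, x)
```

The same `beta` shifts every cumulative logit by the same amount, which is the proportional odds assumption; only the cutpoints `theta` differ across categories.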

23. Methods: Proportional Odds Model, Limitations. Great on paper, but not in practice. Proportional odds assumption often violated (Long & Freese, 2006). A standard workaround: modify the model. Refine the latent variable. Fine-tune the null hypothesis.

32. Methods: Latent Variable. Variables that are inferred, not directly observed. The focus is to make better inferences. $Y^{*} = \beta^{T} X + \varepsilon$, $P(Y \le j \mid X) = \frac{1}{\exp(\beta^{T} X - \theta_j) + 1}$.
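The step from the latent variable to the cumulative probability can be written out as follows. This is a sketch under two assumptions not stated on the slide: that $\varepsilon$ follows a standard logistic distribution, and that $Y = j$ is observed exactly when $\theta_{j-1} < Y^{*} \le \theta_j$.

```latex
\begin{align*}
P(Y \le j \mid X)
  &= P(Y^{*} \le \theta_j \mid X) \\
  &= P(\varepsilon \le \theta_j - \beta^{T} X) \\
  &= \frac{1}{1 + \exp\!\bigl(-(\theta_j - \beta^{T} X)\bigr)} \\
  &= \frac{1}{\exp(\beta^{T} X - \theta_j) + 1}.
\end{align*}
```

Taking the logit of the last line gives $\theta_j - \beta^{T} X$, which agrees with the proportional odds model stated earlier in the deck.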

38. Methods: Hypothesis Testing. Null versus alternative hypotheses: $H_0$ against $H_A$. Traditionally, $H_0$ is the status quo. $H_0 : \beta_1 = \cdots = \beta_q = 0$. $\beta = \tau \xi$, $\tau \in F$, $S_i = \xi^{T} x_i$. $H_0 : \tau = 0$, $P(Y \le j \mid X) = \frac{1}{\exp(S \tau - \theta_j) + 1}$.

41. Methods: Score Function. Allows for quantification of performance of model. $u(\theta_1, \dots, \theta_{J-1}, \tau) = -\sum_{j=1}^{J} \sum_{i=1}^{n_j} S_{ij} \left[ 1 - \psi(\theta_j - \tau S_{ij}) - \psi(\theta_{j-1} - \tau S_{ij}) \right]$, $\psi(t) = \frac{1}{1 + \exp(-t)}$.
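The score function on this slide can be evaluated directly once each observation carries its score $S_i$ and its category. A minimal sketch, assuming NumPy and the usual conventions $\theta_0 = -\infty$ and $\theta_J = +\infty$ (an assumption, since the slide does not state the boundary cutpoints); the function name `score_tau` and the example data are hypothetical:

```python
import numpy as np

def psi(t):
    """Logistic function psi(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + np.exp(-t))

def score_tau(theta, tau, s, y):
    """Evaluate u(theta_1..theta_{J-1}, tau) for scores s and categories y in 1..J.

    Assumes theta_0 = -inf and theta_J = +inf, so psi is 0 and 1 at the boundaries.
    """
    theta_ext = np.concatenate(([-np.inf], np.asarray(theta, dtype=float), [np.inf]))
    s = np.asarray(s, dtype=float)
    y = np.asarray(y, dtype=int)
    upper = psi(theta_ext[y] - tau * s)      # psi(theta_j - tau * S_ij)
    lower = psi(theta_ext[y - 1] - tau * s)  # psi(theta_{j-1} - tau * S_ij)
    return -np.sum(s * (1.0 - upper - lower))

# Illustrative data only: three categories, five observations.
theta = np.array([-0.5, 1.0])
s = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
y = np.array([1, 2, 3, 3, 1])
u_null = score_tau(theta, 0.0, s, y)  # score evaluated at tau = 0
```

Under these conventions the expression is the derivative with respect to $\tau$ of the log-likelihood $\sum_i \log\left[\psi(\theta_{y_i} - \tau S_i) - \psi(\theta_{y_i - 1} - \tau S_i)\right]$, which is why evaluating it at $\tau = 0$ yields a score test of $H_0 : \tau = 0$.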

42. Methods: Simulations. Criteria: type I error frequency and power. Algorithm: 1. Generate genotype data. 2. Obtain error terms. 3. Fix latent variables. 4. Produce ordinal categorical responses. 5. Estimate $\theta_j$ under modified $H_0$. 6. Plug $\hat{\theta}_j$ into score function. 7. Obtain p-values.
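The seven steps above can be sketched as a single simulation replicate. This is an illustration under stated assumptions, not the presenters' implementation: genotypes are drawn as independent Binomial(2, maf) counts, errors are standard logistic, the weights $\xi$ are all ones, and the p-value in step 7 comes from a permutation test, since the slides do not say how p-values were obtained. All function names and default values are hypothetical.

```python
import numpy as np

def psi(t):
    """Logistic function psi(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + np.exp(-t))

def null_cutpoints(y, n_categories):
    """Step 5: with tau = 0, estimate theta_j as the logit of cumulative proportions."""
    counts = np.bincount(y, minlength=n_categories + 1)[1:]
    cum = np.cumsum(counts)[:-1] / counts.sum()
    cum = np.clip(cum, 1e-6, 1.0 - 1e-6)  # guard against empty boundary categories
    return np.log(cum / (1.0 - cum))

def null_score(theta_hat, s, y):
    """Step 6: score for tau at tau = 0, using the estimated cutpoints."""
    f = np.concatenate(([0.0], psi(theta_hat), [1.0]))  # cumulative probs F_0..F_J
    return -np.sum(s * (1.0 - f[y] - f[y - 1]))

def simulate_pvalue(rng, tau, n=300, q=5, maf=0.2, theta=(1.0, 3.0), n_perm=199):
    """One replicate of steps 1-7; returns a two-sided permutation p-value."""
    xi = np.ones(q)                                  # hypothetical weights xi
    g = rng.binomial(2, maf, size=(n, q))            # 1. genotype data (minor-allele counts)
    eps = rng.logistic(size=n)                       # 2. error terms
    s = g @ xi                                       #    scores S_i = xi^T x_i
    y_star = tau * s + eps                           # 3. latent variables
    y = 1 + np.searchsorted(np.asarray(theta), y_star)  # 4. ordinal responses in 1..J
    theta_hat = null_cutpoints(y, len(theta) + 1)    # 5. cutpoints under H0: tau = 0
    u_obs = null_score(theta_hat, s, y)              # 6. observed score
    u_perm = np.array([null_score(theta_hat, rng.permutation(s), y)
                       for _ in range(n_perm)])
    # 7. permutation p-value (an assumed choice, not stated in the slides)
    return (1 + np.sum(np.abs(u_perm) >= np.abs(u_obs))) / (n_perm + 1)

rng = np.random.default_rng(2016)
p_null = simulate_pvalue(rng, tau=0.0)  # data generated under H0
p_alt = simulate_pvalue(rng, tau=1.0)   # data generated under an alternative
```

Repeating `simulate_pvalue` many times with `tau=0.0` and recording how often the p-value falls below a chosen level estimates the type I error frequency; doing the same with a nonzero `tau` estimates power.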

52. Methods: Application. Response: Dravet syndrome patient severity. Predictor: 12 stress-related single nucleotide polymorphisms. Sample size: 22 relatively isolated Japanese observations. Categories: 2, mild and severe. Other data: sex, status, IQ, allele count.

58. Results: The Proposed Model Is Successful. Type I error and power comparable to: Sequence kernel association test (Wu et al., 2011). Optimized sequence kernel association test (Lee et al., 2012). In terms of power, outperforms: Variable threshold test (Price et al., 2010). Cohort allelic sums test (Morgenthaler & Thilly, 2007). Cumulative minor-allele test (Zawistowski et al., 2010).

66. Results: Stress and Dravet Are Intricately Correlated. Rare phenotypes prevalent for young severe patients. Several genes protect against or exacerbate Dravet. Likely varies on a case-by-case basis. Stress-Dravet link contingent on sample heterogeneity.

71. Discussion. Preliminary evaluation of model and dataset analysis. In severe genetic disease, modifying genes determine quality of life. Identification of modifying genes is paramount. Provides impetus for new medication and treatment. Personalized care will rise with genomic information.

77. In Conclusion. Problems: The proportional odds model can be improved. The stress-Dravet link is not known. Takeaways: The proposed model is formidable. A new stress-Dravet link has been established.

84. Acknowledgments. Joseph C. Watkins, Ph.D. Miao Zhang, M.S. Michael Hammer, Ph.D., and the Hammer Lab. Andrew Huerta, Ph.D. and Reneé Reynolds, M.A. Andrew Carnie, Ph.D.

90. References
Bender, R., & Grouven, U. (1998). Using binary logistic regression models for ordinal data with non-proportional odds. Journal of Clinical Epidemiology, 51(10), 809–816. doi:10.1016/S0895-4356(98)00066-3
Cox, D. R. (1958). The regression analysis of binary sequences (with discussion). Journal of the Royal Statistical Society, Series B, 20, 215–242.
Lee, S., Emond, M. J., Bamshad, M. J., Barnes, K. C., Rieder, M. J., Nickerson, D. A., NHLBI GO Exome Sequencing Project, ... Lin, X. (2012). Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies. The American Journal of Human Genetics, 91(2), 224–237. doi:10.1016/j.ajhg.2012.06.007
Long, J. S., & Freese, J. (2006). Regression models for categorical dependent variables using Stata. College Station, TX: Stata Press.
McCullagh, P. (1980). Regression models for ordinal data. Journal of the Royal Statistical Society, Series B, 42(2), 109–142.
McCullagh, P. (1999). The proportional odds model. In P. Armitage, Encyclopedia of Biostatistics Vol. 5 (3560–3563). Hoboken, NJ: John Wiley & Sons.
Morgenthaler, S., & Thilly, W. G. (2007). A strategy to discover genes that carry multi-allelic or mono-allelic risk for common diseases: A cohort allelic sums test (CAST). Mutation Research, 615(1–2), 28–56. doi:10.1016/j.mrfmmm.2006.09.003
Nussbaum, R. L., McInnes, R. R., & Willard, H. F. (2007). Thompson & Thompson genetics in medicine (6th ed.). Philadelphia, PA: W. B. Saunders. doi:10.1016/S0015-0282(02)03084-4
Price, A. L., Kryukov, G. V., de Bakker, P. I., Purcell, S. M., Staples, J., Wei, L. J., & Sunyaev, S. R. (2010). Pooled association tests for rare variants in exon-resequencing studies. The American Journal of Human Genetics, 86(6), 832–838. doi:10.1016/j.ajhg.2010.04.005
Wu, M. C., Lee, S., Cai, T., Li, Y., Boehnke, M., & Lin, X. (2011). Rare-variant association testing for sequencing data with the sequence kernel association test. The American Journal of Human Genetics, 89(1), 82–93. doi:10.1016/j.ajhg.2011.05.029
Zawistowski, M., Gopalakrishnan, S., Ding, J., Li, Y., Grimm, S., & Zöllner, S. (2010). Extending rare-variant testing strategies: Analysis of noncoding sequence and imputed genotypes. The American Journal of Human Genetics, 87(5), 604–617. doi:10.1016/j.ajhg.2010.10.012

91. Questions? Ivan Rodriguez: ivanrodriguez@email.arizona.edu.
