Biostatistics 760


Published in: Technology, Health & Medicine

  1. Biostatistics 760: Random Thoughts
  2. Upcoming Classes
     • Bios 761: Advanced Probability and Statistical Inference
     • Bios 763: Generalized Linear Model Theory and Applications
     • Bios 767: Longitudinal Data Analysis
     • Bios 780: Theory and Methods for Survival Analysis
     • Bios 841: Statistical Consulting
  3. Bios 760
     • Frequentist and Bayesian decision theory
     • Hypothesis testing: UMP tests, etc.
     • Bootstrap and other methods of inference
     • Stochastic processes:
       – Poisson processes
       – Markov chains
       – Martingales
       – Brownian motion
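As a rough illustration of the "bootstrap" item in the Bios 760 topic list, here is a minimal sketch of a percentile bootstrap confidence interval. The function name `bootstrap_percentile_ci`, its parameters, and the choice of the median as the statistic are assumptions made for this example. They are not taken from the course materials.

```python
import random
import statistics


def bootstrap_percentile_ci(data, statistic=statistics.median,
                            n_resamples=2000, level=0.95, seed=0):
    """Approximate percentile bootstrap interval for `statistic` (illustrative sketch)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Resample the data with replacement and recompute the statistic each time.
    replicates = sorted(
        statistic(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    alpha = (1.0 - level) / 2.0
    # Read approximate lower and upper quantiles off the sorted replicates.
    lower = replicates[int(alpha * n_resamples)]
    upper = replicates[int((1.0 - alpha) * n_resamples) - 1]
    return lower, upper


if __name__ == "__main__":
    sample = list(range(1, 21))
    print(bootstrap_percentile_ci(sample))
```

The percentile interval is only one of several bootstrap constructions. It is used here because it is the shortest to write down, not because the slides single it out.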
  4. Bios 780
     • Time-to-event data
     • Right censoring
     • Counting processes; martingales
     • Semiparametric approaches
       – Kaplan-Meier estimator
       – Log-rank statistic
       – Cox model
     • Data analysis
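To make the Kaplan-Meier estimator and right censoring from the Bios 780 list concrete, the following is a minimal sketch of the product-limit calculation. The function name `kaplan_meier` and the `(time, event)` input format are assumptions chosen for this example.

```python
from itertools import groupby


def kaplan_meier(observations):
    """Product-limit survival estimate (illustrative sketch).

    observations: iterable of (time, event) pairs, where event is 1 for an
    observed failure and 0 for a right-censored observation.
    Returns a list of (time, survival) pairs at each observed failure time.
    """
    ordered = sorted(observations, key=lambda pair: pair[0])
    at_risk = len(ordered)
    survival = 1.0
    curve = []
    for time, group in groupby(ordered, key=lambda pair: pair[0]):
        group = list(group)
        failures = sum(event for _, event in group)
        if failures > 0:
            # Multiply by the conditional probability of surviving past `time`.
            survival *= 1.0 - failures / at_risk
            curve.append((time, survival))
        # Failures and censored subjects both leave the risk set after `time`.
        at_risk -= len(group)
    return curve


if __name__ == "__main__":
    print(kaplan_meier([(1, 1), (2, 0), (3, 1)]))
```

Censored subjects never trigger a drop in the curve, but they do count in the risk set until their censoring time. That is how the estimator uses their partial information.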
  5. Bios 841
     • Consulting versus collaboration
     • Bringing it all together to solve problems
     • Communicating about statistics
       – Three real problems
       – Three journal-style reports
       – One final oral presentation
     • Real-time problem solving
     • What is the role of statistical theory?
  6. A Few War Stories
     • As a student: thesis on surrogates
     • As a postdoc: infectious diseases
     • As a new professor: cystic fibrosis (CF)*
     • Working on tenure: empirical processes
     • Empirical processes and cancer*
     • Chair of the DSMC for NICHD
     • Artificial intelligence and NSCLC
  7. CF Neonatal Screening
     • 1992: Joined Phil Farrell’s CF study team
     • 1997: Farrell, Kosorok, Laxova, et al. published in NEJM
     • 2004 (Oct. 15): CDC recommended CF newborn screening; the 1997 article was judged the only valid randomized trial
     • States offering CF newborn screening: 3 in 1997, 12 in 2004, 45 today
  8. What Role Did “Theory” Play?
     • Used state-of-the-art statistical methods that were robust (GEE)
     • In other CF research we have used:
       – Current status methods (parametric, robust)
       – Constrained regression estimation
       – Semiparametric bootstrap inference
       – Martingale based survival analysis
       – New work using artificial intelligence
  9. Empirical Processes and Cancer
     • Non-Hodgkin’s Lymphoma Prognostic Factors Project (1993, NEJM)
     • Cox proportional hazards model employed to ascertain risks of 5 prognostic factors: Age, performance Status, serum lactate dehydrogenase Level, number of extranodal disease Sites, tumor Stage
     • Diagnostics show the model fits poorly
  10. What is the Problem?
      • Poor survival function prediction
      • Possibly incorrect interpretation of risk factor effects
      • A model that adds a single parameter to the Cox model was developed and fit
      • This new model fits well (Kosorok, Lee and Fine, 2004)
      • Inference for the new model is complicated
  11. What Does Theory Tell Us?
      • We can derive valid inferential tools for the new model: estimation and bootstrap
      • Robustness was also studied; theory shows that the Cox model is robust to this kind of model misspecification:
        – The direction of the regression coefficients is preserved
        – A robust variance estimate should be used with the Cox model
  12. Theory Versus Applications
      • The title implies there is conflict between theory and applications
      • This isn’t true!
      • Theory provides a basis for correct thinking and problem solving for applications
      • Applications drive new theoretical development
  13. Theory Can Be Impractical
      • Law of the iterated logarithm: needs a sample size of 10^8 (“asymptopia”).
      • Sometimes higher order approximations are needed before theory becomes useful.
      • Sometimes computational properties of asymptotically optimal estimators are poor.
      • Some hard problems take years to solve.
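For reference, the law of the iterated logarithm mentioned on slide 13 can be stated as follows in its classical form. The setting (independent, identically distributed X_1, X_2, ... with mean 0 and variance 1) and the notation S_n are introduced here for illustration and do not come from the slides.

```latex
\[
  \limsup_{n \to \infty} \frac{S_n}{\sqrt{2 n \log\log n}} = 1
  \quad \text{almost surely},
  \qquad S_n = \sum_{i=1}^{n} X_i .
\]
```

The normalizing factor grows extremely slowly: at n = 10^8 (one hundred million), log log n is still only about 2.9. This may be part of why the limiting behavior is hard to observe at practical sample sizes.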
  14. Why Theory is Needed
      • Often it does work for practical sample sizes.
      • Can reveal properties that are universally valid: simulation studies are limited to the scenarios investigated.
      • Theory can lead toward methodological solutions (Cook and Kosorok, 2004 JASA).
      • Theory can drive scientific discovery.
      • Some results are beautiful.
  15. Data Mining Versus Inference
      • Data mining is summarizing and representing data no matter how complicated
      • Inference is determining valid measures of uncertainty
      • Patterns obtained from data mining can be misleading
      • Inference without data mining may miss important structure
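One way to see how mined patterns can mislead is to search many pure-noise features for the one most correlated with a pure-noise outcome. The sketch below does this with simulated data. The function names, the sample size of 20, and the 200 features are arbitrary assumptions for illustration, not figures from the talk.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def largest_spurious_correlation(n_subjects=20, n_features=200, seed=1):
    """Largest |correlation| between a noise outcome and many noise features."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_subjects)]
    best = 0.0
    for _ in range(n_features):
        # Each feature is generated independently of the outcome.
        feature = [rng.gauss(0, 1) for _ in range(n_subjects)]
        best = max(best, abs(pearson(feature, outcome)))
    return best


if __name__ == "__main__":
    print(largest_spurious_correlation())
```

Every feature here is unrelated to the outcome by construction, yet the best of 200 typically shows a sizable correlation. A valid measure of uncertainty has to account for the search that produced the pattern.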
  16. The Core of Statistics
      • Statistics is the science of science
      • How do we learn from our world and draw meaningful and valid conclusions from it?
      • Need both data mining and valid inference
      • Requires a unique kind of intuition
      • Needs many different intellectual perspectives
      • One of the most challenging of all fields
  17. Everyone Needs Core Literacy
      • All statisticians need to know enough theory to have core literacy about statistics and to be able to solve problems
      • All statisticians need to know enough about applications to know what is important
      • All biostatisticians need to know enough statistical methods to be useful in practice
      • The purpose of a Ph.D. in Biostatistics is to enable the creation of new methodology
  18. Semiparametric Inference
      • The study of statistical models with parametric and/or nonparametric parts
      • Can achieve trade-off between scientific meaning and model “robustness”
      • Estimation and inference are often hard
      • There exists an efficiency bound for parametric and some nonparametric parts
      • NPMLE, testing and estimating equations
  19. Empirical Processes
      • Tools for complex model inference and high dimensional data
      • Can determine universal properties of semiparametric methods:
        – Consistency
        – Rate of convergence
        – Limiting distributions
        – Valid inference (empirical process bootstrap)
      • Empirical processes are everywhere
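The simplest empirical process is built from the empirical distribution function: the scaled difference between it and the true distribution function. As a minimal sketch, the code below computes the supremum of that difference for a sample and a known distribution function. The names `sup_distance` and `uniform_cdf`, and the Uniform(0, 1) example, are assumptions chosen for illustration.

```python
import math
import random


def sup_distance(sample, cdf):
    """Supremum of |F_n(x) - F(x)|, where F_n is the empirical CDF of `sample`."""
    ordered = sorted(sample)
    n = len(ordered)
    # F_n jumps at each order statistic, so the supremum is attained just
    # before or at one of the jumps.
    return max(
        max((i + 1) / n - cdf(x), cdf(x) - i / n)
        for i, x in enumerate(ordered)
    )


def uniform_cdf(x):
    """Distribution function of Uniform(0, 1)."""
    return min(1.0, max(0.0, x))


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (100, 10_000):
        sample = [rng.random() for _ in range(n)]
        # The unscaled distance shrinks with n; the sqrt(n)-scaled version
        # stays on a stable scale.
        d = sup_distance(sample, uniform_cdf)
        print(n, d, math.sqrt(n) * d)
```

The unscaled distance going to zero corresponds to consistency, and the need for the sqrt(n) scaling corresponds to the rate of convergence, two of the properties listed on the slide.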
  20. The Road Ahead
      • Whatever you choose to do, the core statistical theory classes will help you.
      • Be patient as you learn.
      • Be willing to work hard (struggle is good).
      • It takes many different kinds of thinkers with different learning styles.
      • There are important discoveries to be made in both applications and theory.