
A shrinkage estimator for causal inference in low-dimensional data


Slides of a talk given at the research meeting of the Department of Clinical Epidemiology in Leiden (Netherlands)



  1. 1. A shrinkage estimator for causal inference in low-dimensional data Maarten van Smeden Research meeting, department of Clinical Epidemiology, LUMC Leiden, February 13, 2018
  2. 2. Shrinkage - example
  3. 3. Shrinkage - example
  4. 4. Shrinkage - example
  5. 5. 1961 James and Stein. Estimation with quadratic loss. Proceedings of the fourth Berkeley symposium on mathematical statistics and probability. Vol. 1. 1961.
  6. 6. 1977 Efron and Morris (1977). Stein′s paradox in statistics. Scientific American, 236 (5): 119–127.
  7. 7. 1977 Efron and Morris (1977). Stein′s paradox in statistics. Scientific American, 236 (5): 119–127.
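To make the Stein shrinkage idea on slides 5-7 concrete, here is a minimal R sketch (my addition, not part of the original slides) comparing the total squared error of the raw sample means with the James-Stein estimator; the number of means p and the true values are arbitrary choices.

# James-Stein shrinkage illustration: for p >= 3 normal means observed with unit
# variance, shrinking the raw estimates towards zero lowers the total squared error.
set.seed(123)
p     <- 10                                 # number of means (arbitrary)
theta <- rnorm(p)                           # true means (arbitrary)

one_rep <- function() {
  x  <- rnorm(p, mean = theta, sd = 1)      # one noisy observation per mean
  js <- (1 - (p - 2) / sum(x^2)) * x        # James-Stein estimator (shrinks towards 0)
  c(raw = sum((x - theta)^2), js = sum((js - theta)^2))
}

rowMeans(replicate(10000, one_rep()))       # average total squared error: raw vs James-Stein

On average the shrunken estimates beat the raw means in total squared error, even though each individual mean is estimated with some bias: the Stein paradox discussed by Efron and Morris.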
  8. 8. Shrinkage and overfitting (prediction) Overfitting of prediction models: the model predictions of the expected probability (risk) in new individuals are too extreme. By shrinking the predictor effects, the expected risks become less extreme.
  9. 9. Shrinkage for prediction literature (small selection)
  10. 10. Shrinkage in a causal inference context? • Shrinkage estimators are often used to improve predictions • Are they useful for answering causal questions?
  11. 11. Why not use the best fitting line?
  12. 12. For the remainder, consider the simple(st) situation:
• Binary logistic regression (binary outcome, 1 exposure, P-1 confounders)
• Interest is in the conditional log-odds ratio for the exposure-outcome relation
Assumptions (all met):
• Linear effects (on the logit scale) and no interactions
• No unmeasured confounding
• ‘Low dimensional’: N >> P
• IID sample (i.e., no clustering/nesting/matching/…)
• No estimation issues (i.e., no collinearity/separation/…)
• No missing data
• No measurement error
• No outliers
• No colliders
• Data not very sparse (e.g., outcome events are not extremely rare)
• No data-driven variable selection (DAG predefined)
  13. 13. For the remainder, consider the simple(st) situation: [DAG relating the outcome Y to X1, X2, X3 and X4]
  14. 14. Two-line R analysis (numerical example, data were simulated)
> df <- read.csv("mydata.csv")
> glm(Y ~ X1 + X2 + X3 + X4, family = "binomial", data = df)

Call: glm(formula = Y ~ X1 + X2 + X3 + X4, family = "binomial", data = df)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.04396    0.37171   0.118  0.90587
X1           1.68899    0.57755   2.924  0.00345
X2           0.73910    0.48419   1.526  0.12690
X3           1.04510    0.44755   2.335  0.01954
X4          -0.76366    0.41490  -1.841  0.06568

Null deviance: 69.235 on 49 degrees of freedom
Residual deviance: 45.504 on 45 degrees of freedom
  15. 15. Behind the software scenes
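As a rough sketch of what happens behind the software scenes (my addition; the slide's own content is not in the transcript): glm() obtains the maximum likelihood estimates by Newton-Raphson, also known as iteratively reweighted least squares. A minimal hand-rolled version, assuming the same data frame df with a 0/1 outcome Y and covariates X1-X4 as in the example above:

# Sketch of maximum likelihood estimation for logistic regression via Newton-Raphson
# (iteratively reweighted least squares), essentially what glm() does internally.
fit_logistic_ml <- function(X, y, tol = 1e-8, max_iter = 25) {
  beta <- rep(0, ncol(X))                    # start at zero
  for (i in seq_len(max_iter)) {
    eta   <- as.vector(X %*% beta)
    p     <- 1 / (1 + exp(-eta))             # fitted probabilities
    w     <- p * (1 - p)                     # IRLS weights
    score <- t(X) %*% (y - p)                # gradient of the log-likelihood
    info  <- t(X) %*% (X * w)                # Fisher information X'WX
    step  <- as.vector(solve(info, score))   # Newton step
    beta  <- beta + step
    if (max(abs(step)) < tol) break          # converged
  }
  beta
}

X <- model.matrix(~ X1 + X2 + X3 + X4, data = df)   # assumes df$Y is coded 0/1
fit_logistic_ml(X, df$Y)   # matches coef(glm(Y ~ X1 + X2 + X3 + X4, family = binomial, data = df))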
  16. 16. Sources of bias Epidemiology text-books: • Confounding bias • Information bias • Selection bias
  17. 17. Sources of bias Epidemiology text-books: • Confounding bias • Information bias • Selection bias Statistics text-books: • Estimator: consistency • Estimator: (finite sample) bias
  18. 18. ML log(OR): consistent but not unbiased
  19. 19. ML log(OR): consistent but not unbiased
  20. 20. ML log(OR): consistent but not unbiased
  21. 21. ML log(OR): consistent but not unbiased
  22. 22. ML log(OR): consistent but not unbiased
  23. 23. ML log(OR): consistent but not unbiased
  24. 24. ML log(OR): consistent but not unbiased
  25. 25. ML log(OR): consistent but not unbiased
  26. 26. Formal proof given in a comment by Richardson in Stat Med (1985); this proof was preceded by the same proof in Anderson and Richardson, 1979, Technometrics
  27. 27. Informal proof
• Simulate 1 exposure and 3 confounders (multivariate standard normal with equal pairwise correlations of 0.1)
• Exposure and confounders related to the outcome with equal multivariable odds ratios of 2
• 1,000 simulation samples of N = 50
• Consistency: create 1,000 meta-datasets of increasing size, where meta-dataset r consists of each created dataset up to r; outcome: difference between the meta-data estimates of the exposure effect and the true value (log(OR) = log(2))
• Bias: calculate the difference between the estimate of the exposure effect and the true value for each of the created datasets up to r; outcome: difference between the average of the exposure effect estimates and the true value (log(OR) = log(2))
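A compressed R reconstruction of the bias part of this simulation (my own simplification; the author's full code is on the GitHub page given on the last slide, and details such as the event rate may differ from the original setup):

# 1 exposure + 3 confounders, multivariate normal with pairwise correlations of 0.1,
# all conditional odds ratios equal to 2; 1,000 replications of N = 50.
set.seed(2018)
n_rep <- 1000
n     <- 50
beta  <- rep(log(2), 4)                       # true log odds ratios
Sigma <- matrix(0.1, 4, 4); diag(Sigma) <- 1  # correlation matrix

est <- replicate(n_rep, {
  X <- MASS::mvrnorm(n, mu = rep(0, 4), Sigma = Sigma)
  y <- rbinom(n, 1, plogis(X %*% beta))       # intercept 0, so roughly 50% events
  coef(glm(y ~ X, family = binomial))[2]      # ML estimate of the exposure log(OR)
})                                            # occasional convergence warnings are expected at N = 50

(mean(est) - log(2)) / log(2)                 # relative finite sample bias (overestimation)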
  28. 28. Simulation - result [plot: consistency curve over 1,000 iterations]
  29. 29. Simulation - result [plot: consistency curve over 1,000 iterations; ~2% overestimated at N = 50,000]
  30. 30. Simulation - result [plot: consistency and bias curves over 1,000 iterations; ~2% overestimated at N = 50,000]
  31. 31. Simulation - result [plot: consistency and bias curves over 1,000 iterations; ~2% overestimated at N = 50,000, ~25% overestimated at N = 50 (1,000 replications)]
  32. 32. Simulation - summary
• The magnitude of the bias on the original scale (log(OR)) was about 25% -> when evaluated on the OR scale, the bias is about 50%(!!!!)
• It is surprisingly easy to simulate situations that yield much larger (and much smaller) bias
• The amount of bias depends on the number of confounders, the (true) effect sizes of each variable and the size of the smallest outcome group (i.e. the prevalence of events). Bias is in the direction of more extreme effects and has been observed for samples where N >> 1000. For more details, see van Smeden et al. BMC MRM 2016
  33. 33. Implication
  34. 34. Implication [figure: decreasing sample size, how we usually think about sample size]
  35. 35. Implication [figure: decreasing sample size, based on the preceding simulations]
  36. 36. David Firth’s solution Web of Science: cited ~900 times (21% of the citing publications are from 2017!)
  37. 37. David Firth’s solution
  38. 38. David Firth’s solution
  39. 39. David Firth’s solution
  40. 40. David Firth’s solution
• Firth’s ”correction” aims to reduce finite sample bias in maximum likelihood estimates and is applicable to logistic regression
• It makes clever use of the “Jeffreys prior” (from the Bayesian literature) to penalize the log-likelihood, which shrinks the estimated coefficients (see the penalized log-likelihood below)
• It has a nice theoretical justification, but does it work well?
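For reference, the penalized log-likelihood that Firth (1993) maximizes, with the log of the Jeffreys invariant prior as the penalty (my notation, not taken from the slides):

% Firth's penalized log-likelihood: ordinary log-likelihood plus half the
% log-determinant of the Fisher information (the log of the Jeffreys prior)
\ell^{*}(\beta) = \ell(\beta) + \tfrac{1}{2} \log \left| I(\beta) \right|

Here \ell(\beta) is the ordinary log-likelihood and I(\beta) the Fisher information matrix; maximizing \ell^{*} instead of \ell pulls the coefficient estimates towards zero, which is the shrinkage referred to above.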
  41. 41. Simulation - ML vs Firth’s corrected estimates [two plots: consistency and bias over 1,000 iterations, one panel for ML and one for Firth’s correction] Estimated bias reduced from ~25% with maximum likelihood to ~3% with Firth’s correction.
  42. 42. More elaborate simulations
  43. 43. More elaborate simulations [panels: bias of the exposure log(OR) estimate (b1) against events per variable; top row: ML, bottom row: Firth’s correction] Averaged over 465 simulations with 10,000 replications each
  44. 44. Two becomes three-line R analysis (numerical example, data were simulated)
> require("logistf")
> df <- read.csv("mydata.csv")
> logistf(Y ~ X1 + X2 + X3 + X4, firth = T, data = df)

logistf(formula = Y ~ X1 + X2 + X3 + X4, data = df, firth = T)

            coef     se(coef)  lb.95    ub.95    Chisq    p
(Intercept)  0.0405   0.3547  -0.6506   0.7267   0.0137  0.9067
X1           1.4319   0.5218   0.5160   2.5844  10.2622  0.0013
X2           0.6193   0.4502  -0.1924   1.5789   2.1967  0.1383
X3           0.8659   0.4036   0.1605   1.7738   6.0391  0.0139
X4          -0.6336   0.3770  -1.4677   0.0331   3.4435  0.0635

Likelihood ratio test=20.53 on 4 df, p=0.0004, n=50
  45. 45. Other properties of Firth’s correction Compared to ML: • It reduces both bias and mean squared error of the effect estimator
  46. 46. Simulations - MSE [plot: MSE over 1,000 iterations, ML vs Firth]
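A self-contained R sketch of this comparison (again my own simplified reconstruction, reusing the setup of the earlier bias sketch), contrasting ML (glm) with Firth's correction (logistf) on bias and MSE of the exposure log(OR):

# Compare ML and Firth's correction on bias and MSE of the exposure log(OR);
# same setup as before: 1 exposure + 3 confounders, all ORs = 2, N = 50.
library(logistf)

set.seed(2018)
n_rep <- 1000
n     <- 50
beta  <- rep(log(2), 4)
Sigma <- matrix(0.1, 4, 4); diag(Sigma) <- 1

res <- replicate(n_rep, {
  X <- MASS::mvrnorm(n, mu = rep(0, 4), Sigma = Sigma)
  colnames(X) <- paste0("X", 1:4)
  d <- data.frame(y = rbinom(n, 1, plogis(X %*% beta)), X)
  c(ml    = coef(glm(y ~ X1 + X2 + X3 + X4, family = binomial, data = d))["X1"],
    firth = coef(logistf(y ~ X1 + X2 + X3 + X4, data = d))["X1"])
})

apply(res, 1, function(b) c(bias = mean(b) - log(2),      # finite sample bias
                            mse  = mean((b - log(2))^2))) # mean squared error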
  47. 47. Other properties of Firth’s correction. Compared to ML:
• It reduces both bias and mean squared error of the effect estimator
• It typically comes with smaller standard errors (and corresponding confidence intervals)
• It is similarly easy to apply in R and Stata, without noticeable extra computing time
• It is large-sample equivalent: for larger samples the estimates will hardly differ between Firth’s correction and ML
• It remains finite in case of “separation” (a case where ML fails)
  48. 48. Example of separation
  49. 49. Example of separation
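A toy illustration of separation (my addition; the actual example on slides 48-49 is not in the transcript): when the exposure perfectly separates the outcome groups, the ML estimate diverges, while Firth's penalized estimate stays finite.

# Complete separation: x perfectly predicts y, so the ML log(OR) runs off to infinity
# (glm stops at a huge value with warnings); Firth's correction gives a finite estimate.
library(logistf)

d <- data.frame(x = c(0, 0, 0, 0, 1, 1, 1, 1),
                y = c(0, 0, 0, 0, 1, 1, 1, 1))

coef(glm(y ~ x, family = binomial, data = d))   # enormous coefficient plus convergence warnings
coef(logistf(y ~ x, data = d))                  # finite, shrunken log(OR)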
  50. 50. What is the catch? Firth’s correction needs some modifications to intercept estimation to become suitable for developing prediction models
  51. 51. Concluding remarks
• Standard logistic regression based on maximum likelihood estimation produces estimates that are finite sample biased. When uncorrected, over-optimistic effect estimates may be produced
• Firth’s correction is a penalized estimation procedure that shrinks the coefficients, thereby removing a large part of the finite sample bias
• Firth’s correction is also available for other popular models, such as Cox models, conditional logistic regression models, Poisson regression and multinomial logistic regression models, which also produce finite sample biased estimates
• The use of other shrinkage estimators, such as Ridge, LASSO or Elastic Net, should not be taken lightly when causal inference is concerned: these approaches are designed to introduce bias in effect estimators, rather than to remove it
  52. 52.
• The handouts of this presentation are available via: https://www.slideshare.net/MaartenvanSmeden
• R code to rerun and expand the presented simulations is available via: https://github.com/MvanSmeden/LRMbias
• Unfamiliar with R? Learn the basics in just two hours via: http://www.r-tutorial.nl/
• Contact: M.van_Smeden@lumc.nl
