
- 1. A shrinkage estimator for causal inference in low-dimensional data Maarten van Smeden Research meeting, department of Clinical Epidemiology, LUMC Leiden, February 13, 2018
- 2. Shrinkage - example
- 5. 1961 James and Stein. Estimation with quadratic loss. Proceedings of the fourth Berkeley symposium on mathematical statistics and probability. Vol. 1. 1961.
- 6. 1977 Efron and Morris (1977). Stein′s paradox in statistics. Scientific American, 236 (5): 119–127.
- 8. Shrinkage and overfitting (prediction) Overfitting of prediction models: the model's predictions of the expected probability (risk) in new individuals are too extreme. By shrinking the predictor effects, the predicted risks become less extreme.
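The idea can be sketched in a few lines of base R: apply a uniform shrinkage factor to the linear predictor (the factor 0.8 and the simulated data are arbitrary, for illustration only), and the predicted risks move toward the average risk.

```r
# Sketch: uniform shrinkage of predictor effects makes predicted risks
# less extreme. The factor 0.8 is arbitrary, chosen for illustration.
set.seed(1)
n <- 50
x <- matrix(rnorm(n * 4), n, 4)
y <- rbinom(n, 1, plogis(x %*% rep(log(2), 4)))
fit <- glm(y ~ x, family = binomial)

lp <- predict(fit)                             # linear predictor (log-odds)
lp_shrunk <- mean(lp) + 0.8 * (lp - mean(lp))  # pull log-odds toward the mean

p_ml     <- plogis(lp)
p_shrunk <- plogis(lp_shrunk)
range(p_ml)      # more extreme predicted risks
range(p_shrunk)  # less extreme predicted risks
```

Because plogis is monotone, the shrunken risks are guaranteed to lie strictly inside the range of the unshrunken ones.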
- 9. Shrinkage for prediction literature (small selection)
- 10. Shrinkage in a causal inference context? • Shrinkage estimators are often used to improve predictions • Are they useful for answering causal questions?
- 11. Why not use the best fitting line?
- 12. For the remainder, consider the simple(st) situation:
  • Binary logistic regression (binary outcome, 1 exposure, P−1 confounders)
  • Interest is in the conditional log-odds ratio for the exposure–outcome relation
  Assumptions (met):
  • Linear effects (on the logit scale) and no interactions
  • No unmeasured confounding
  • 'Low-dimensional': N >> P
  • IID sample (i.e., no clustering/nesting/matching/…)
  • No estimation issues (i.e., no collinearity/separation/…)
  • No missing data
  • No measurement error
  • No outliers
  • No colliders
  • Data not very sparse (e.g., outcome events are not extremely rare)
  • No data-driven variable selection (DAG predefined)
- 13. For the remainder, consider the simple(st) situation: [DAG: exposure X1, confounders X2, X3, X4, outcome Y]
- 14. Two-line R analysis
  > df <- read.csv("mydata.csv")
  > glm(Y ~ X1 + X2 + X3 + X4, family = "binomial", data = df)

  Call: glm(formula = Y ~ X1 + X2 + X3 + X4, family = "binomial", data = df)
  Coefficients:
               Estimate Std. Error z value Pr(>|z|)
  (Intercept)  0.04396    0.37171   0.118  0.90587
  X1           1.68899    0.57755   2.924  0.00345
  X2           0.73910    0.48419   1.526  0.12690
  X3           1.04510    0.44755   2.335  0.01954
  X4          -0.76366    0.41490  -1.841  0.06568
  Null deviance: 69.235 on 49 degrees of freedom
  Residual deviance: 45.504 on 45 degrees of freedom
  Numerical example (data were simulated)
- 15. Behind the software scenes
- 16. Sources of bias Epidemiology text-books: • Confounding bias • Information bias • Selection bias
- 17. Sources of bias Epidemiology text-books: • Confounding bias • Information bias • Selection bias Statistics text-books: • Estimator: consistency • Estimator: (finite sample) bias
- 18. ML log(OR): consistent but not unbiased
- 26. A formal proof is given in a comment by Richardson in Statistics in Medicine (1985); the same proof appeared earlier in Anderson and Richardson (1979), Technometrics.
- 27. Informal proof
  • Simulate 1 exposure and 3 confounders (multivariate standard normal with equal pairwise correlations of 0.1)
  • Exposure and confounders related to the outcome with equal multivariable odds ratios of 2
  • 1,000 simulation samples of N = 50
  • Consistency: create 1,000 meta-datasets of increasing size, where meta-dataset r consists of all created datasets up to r. Outcome: the difference between the meta-data estimate of the exposure effect and the true value (log(OR) = log(2))
  • Bias: for each of the created datasets up to r, calculate the difference between the estimate of the exposure effect and the true value. Outcome: the difference between the average of the exposure effect estimates and the true value (log(OR) = log(2))
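The bias part of this setup can be sketched in base R (plus MASS, which ships with R). The intercept of 0 and the 500 replications are simplifications for speed, not part of the original design:

```r
# Sketch of the bias simulation: 1 exposure + 3 confounders, all with
# multivariable OR = 2, equal pairwise correlations 0.1, N = 50 per sample.
library(MASS)
set.seed(42)

n_rep <- 500; n <- 50
beta  <- rep(log(2), 4)                       # true log(OR) = log(2) for all four
Sigma <- matrix(0.1, 4, 4); diag(Sigma) <- 1  # 0.1 equal pairwise correlations

est <- replicate(n_rep, {
  X <- mvrnorm(n, mu = rep(0, 4), Sigma = Sigma)
  y <- rbinom(n, 1, plogis(X %*% beta))       # intercept 0: ~50% event rate
  coef(glm(y ~ X, family = binomial))[2]      # ML estimate of exposure log(OR)
})

bias <- mean(est) - log(2)
bias  # positive: the exposure log(OR) is overestimated in small samples
```

Occasional glm warnings (fitted probabilities of 0 or 1) are expected at N = 50 and only push the average estimate further away from zero.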
- 28. Simulation - result [figure: exposure-effect estimate vs. iteration (0–1,000); the consistency line shows ~2% overestimation at N = 50,000, while the bias line shows ~25% overestimation at N = 50 over 1,000 replications]
- 32. Simulation - summary • The magnitude of bias on the original scale (log(OR)) was about 25% -> when evaluated on the OR-scale: bias is about 50%(!!!!) • It is surprisingly easy to simulate situations that yield much larger bias (and much smaller) • The amount of bias depends on the number of confounders, the (true) effect sizes of each variable and the size of the smallest outcome group (i.e. the prevalence of events). Bias is in direction of extreme effects and has been observed for samples where N >> 1000. For more details, see van Smeden et al. BMC MRM 2016
- 33. Implication [figure: how we usually think about sample size vs. what the preceding simulations imply, as sample size decreases]
- 36. David Firth's solution Web of Science: cited ~900 times (21% of the citing publications are from 2017!)
- 37. David Firth’s solution
- 40. David Firth's solution
  • Firth's "correction" aims to reduce the finite-sample bias of maximum likelihood estimates and is applicable to logistic regression
  • It makes clever use of the Jeffreys prior (from the Bayesian literature) to penalize the log-likelihood, which shrinks the estimated coefficients
  • It has a nice theoretical justification, but does it work well?
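For reference, the penalization behind Firth's correction can be written compactly (standard notation: ℓ(β) is the log-likelihood and I(β) the Fisher information matrix):

```latex
% Firth (1993): maximize the penalized log-likelihood
% \ell^{*}(\beta) = \ell(\beta) + \tfrac{1}{2} \log \left| I(\beta) \right|,
% which corresponds to multiplying the likelihood by the Jeffreys
% invariant prior |I(\beta)|^{1/2}.
\ell^{*}(\beta) = \ell(\beta) + \tfrac{1}{2}\,\log\left|I(\beta)\right|
```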
- 41. Simulation – ML vs Firth's corrected estimates [figures: consistency and bias vs. iteration for ML and for Firth's correction]. Estimated bias reduced from ~25% with maximum likelihood to ~3% with Firth's correction.
- 42. More elaborate simulations
- 43. More elaborate simulations [figures: bias of the exposure log(OR) estimate vs. events per variable; top row: ML, bottom row: Firth's correction]. Averaged over 465 simulations with 10,000 replications each.
- 44. Two-line becomes three-line R analysis
  > require("logistf")
  > df <- read.csv("mydata.csv")
  > logistf(Y ~ X1 + X2 + X3 + X4, firth = TRUE, data = df)

  logistf(formula = Y ~ X1 + X2 + X3 + X4, data = df, firth = TRUE)
                 coef se(coef)   lb.95  ub.95   Chisq      p
  (Intercept)  0.0405   0.3547 -0.6506 0.7267  0.0137 0.9067
  X1           1.4319   0.5218  0.5160 2.5844 10.2622 0.0013
  X2           0.6193   0.4502 -0.1924 1.5789  2.1967 0.1383
  X3           0.8659   0.4036  0.1605 1.7738  6.0391 0.0139
  X4          -0.6336   0.3770 -1.4677 0.0331  3.4435 0.0635
  Likelihood ratio test = 20.53 on 4 df, p = 0.0004, n = 50
  Numerical example (data were simulated)
- 45. Other properties of Firth’s correction Compared to ML: • It reduces both bias and mean squared error of the effect estimator
- 46. Simulations - MSE [figure: MSE vs. iteration for ML and Firth's correction]
- 47. Other properties of Firth's correction Compared to ML:
  • It reduces both the bias and the mean squared error of the effect estimator
  • It typically comes with smaller standard errors (and correspondingly narrower confidence intervals)
  • It is similarly easy to apply in R and Stata, without noticeable extra computing time
  • It is asymptotically equivalent to ML: in larger samples the estimates hardly differ between Firth's correction and ML
  • It remains finite in case of "separation" (a case where ML fails)
- 48. Example of separation
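A minimal base-R illustration of separation (made-up data in which the exposure perfectly predicts the outcome):

```r
# Perfect separation: y equals x exactly, so the ML estimate of the
# x coefficient diverges; glm stops iterating and reports a huge value.
x <- c(0, 0, 0, 0, 1, 1, 1, 1)
y <- c(0, 0, 0, 0, 1, 1, 1, 1)
fit <- suppressWarnings(glm(y ~ x, family = binomial))
coef(fit)[["x"]]  # very large: finite only because the algorithm stops iterating
# A Firth-corrected fit, e.g. logistf::logistf(y ~ x, firth = TRUE),
# would return a finite, well-defined estimate for these data.
```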
- 50. What is the catch? Firth’s correction needs some modifications to intercept estimation to become suitable for developing prediction models
- 51. Concluding remarks
  • Standard logistic regression based on maximum likelihood estimation produces estimates that are finite-sample biased. When uncorrected, over-optimistic estimates of effect may be produced
  • Firth's correction is a penalized estimation procedure that shrinks the coefficients, thereby removing a large part of the finite-sample bias
  • Firth's correction is also available for other popular models that produce finite-sample-biased estimates, such as Cox models, conditional logistic regression, Poisson regression and multinomial logistic regression
  • The use of other shrinkage estimators, such as Ridge, LASSO or Elastic Net, should not be taken lightly when causal inference is concerned: these approaches are designed to introduce bias in effect estimators (trading it for variance), rather than to remove it
- 52. • The handouts of this presentation are available via: https://www.slideshare.net/MaartenvanSmeden
  • R code to rerun and expand the presented simulations is available via: https://github.com/MvanSmeden/LRMbias
  • Unfamiliar with R? Learn the basics in just two hours via: http://www.r-tutorial.nl/
  • Contact: M.van_Smeden@lumc.nl
