Improving predictions: Lasso, Ridge and Stein's paradox

Slides of masterclass "Improving predictions: Lasso, Ridge and Stein's paradox" at the (Dutch) National Institute for Public Health and the Environment (RIVM)

Published in: Science

  1. 1. Improving predictions: Ridge, Lasso and Stein’s paradox. RIVM Epi masterclass (22/3/18). Maarten van Smeden, post-doc clinical epidemiology/medical statistics, Leiden University Medical Center
  2. 2. This slide deck is available at: https://www.slideshare.net/MaartenvanSmeden
  3. 3. Diagnostic / prognostic prediction Clinical prediction models •Diagnostic prediction: probability of disease D = d in patient i? •Prognostic prediction: probability of developing health outcome Y = y within (or up to) T years in patient i?
  4. 4. Apgar score (since 1952)
  5. 5. Just this morning
  6. 6. Rise of prediction models •>110 models for prostate cancer (Shariat 2008) •>100 models for Traumatic Brain Injury (Perel 2006) •83 models for stroke (Counsell 2001) •54 models for breast cancer (Altman 2009) •43 models for type 2 diabetes (Collins 2011; Dieren 2012) •31 models for osteoporotic fracture (Steurer 2011) •29 models in reproductive medicine (Leushuis 2009) •26 models for hospital readmission (Kansagara 2011) •>25 models for length of stay in cardiac surgery (Ettema 2010) •>350 models for CVD outcomes (Damen 2016) The overview was created and first presented by Prof. KGM Moons (Julius Center, UMC Utrecht)
  7. 7. Reality Bell et al. BMJ 2015;351:h5639
  8. 8. This talk Key message Regression shrinkage strategies, such as Ridge and Lasso, have the ability to dramatically improve predictive performance of prediction models Outline •What is wrong with traditional prediction model development strategies? •What are Ridge and Lasso? •Some thoughts on when to consider Ridge/Lasso.
  9. 9. Setting •Development data: with subjects (i = 1, . . . , N) for which an outcome is observed (y: the outcome to predict), and P predictor variables (X: explanatory variables to make a prediction of y) •(External) validation data: with subjects that were not part of the development data but have the same outcome and predictor variables observed. Perhaps subjects from a different geographical area •The goal is to develop a prediction model with the highest possible predictive performance in validation (out-of-sample performance); performance in the development sample is not directly relevant •I’ll focus on the linear model for illustrative reasons •N >> P
  10. 10. Setting •Development data: with subjects (i = 1, . . . , N) for which an outcome is observed (y: the outcome to predict), and P predictor variables (X: explanatory variables to make a prediction of y) •(External) validation data: with subjects that were not part of the development data but have the same outcome and predictor variables observed. Perhaps subjects from a different geographical area •The goal is to develop a prediction model with the highest possible predictive performance in validation (out-of-sample performance); performance in the development sample is not directly relevant •I’ll focus on the linear model for illustrative reasons •N >> P
  11. 11. Linear model: OLS regression Linear regression model $y = f(X) + \epsilon$, $\epsilon \sim N(0, \sigma^2)$ •With linear main effects only: $\hat{f}(X) = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \ldots + \hat{\beta}_P x_P$ •Find $\beta$ that minimizes the (in-sample) squared prediction error: $\sum_i (y_i - \hat{f}(x_i))^2$ •Closed form solution: $\hat{\beta} = (X^\top X)^{-1} X^\top y$ Question Is $\hat{f}(\cdot)$ the best estimator to predict for future individuals?
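The closed form solution on this slide can be checked numerically. The sketch below uses simulated data (the sample size, coefficients and seed are assumptions made purely for illustration, not taken from the talk) and compares the matrix formula with R's built-in `lm()`.

```r
# Hypothetical simulated data: 200 subjects, 3 predictors
set.seed(1)
N <- 200; P <- 3
X <- matrix(rnorm(N * P), N, P)
beta_true <- c(1, -0.5, 0.25)                  # assumed "true" effects
y <- drop(2 + X %*% beta_true + rnorm(N))      # intercept 2, sigma = 1

# Closed form OLS: (X'X)^{-1} X'y, with a column of ones for the intercept
X1 <- cbind(1, X)
beta_hat <- drop(solve(crossprod(X1), crossprod(X1, y)))

# Same model fitted with lm() for comparison
fit <- lm(y ~ X)
max_diff <- max(abs(beta_hat - unname(coef(fit))))
max_diff                                       # numerically close to zero
```

Using `solve(A, b)` rather than explicitly inverting `crossprod(X1)` is a common numerical choice; the result is the same estimator.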
  12. 12. 1955: Stein’s paradox
  13. 13. 1955: Stein’s paradox Stein’s paradox in words (rather simplified) When one has three or more units (say, individuals), and for each unit one can calculate an average score (say, average blood pressure), then the best guess of future observations (blood pressure) for each unit is NOT its average score.
  14. 14. 1961: James-Stein estimator: the next Berkeley Symposium James and Stein. Estimation with quadratic loss. Proceedings of the fourth Berkeley symposium on mathematical statistics and probability. Vol. 1. 1961.
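A small simulation can make the paradox concrete. The sketch below assumes the textbook setting of the James-Stein estimator (one normally distributed observation per unit with known variance 1, shrinkage toward zero); the number of units, the true means and the seed are hypothetical choices, not values from the slides.

```r
set.seed(2018)
P <- 10                                   # number of units (needs to be 3 or more)
theta <- rnorm(P)                         # assumed true unit means
n_sim <- 2000
loss_mle <- loss_js <- numeric(n_sim)

for (s in seq_len(n_sim)) {
  x <- rnorm(P, mean = theta, sd = 1)     # one observed score per unit
  shrink <- 1 - (P - 2) / sum(x^2)        # James-Stein shrinkage factor
  js <- shrink * x                        # every unit is pulled toward 0
  loss_mle[s] <- sum((x - theta)^2)       # total squared error, unshrunken
  loss_js[s]  <- sum((js - theta)^2)      # total squared error, shrunken
}

c(unshrunken = mean(loss_mle), james_stein = mean(loss_js))
```

The comparison is on total squared error summed over all units; an individual unit can still be estimated worse after shrinkage.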
  15. 15. 1977: Baseball example Efron and Morris (1977). Stein’s paradox in statistics. Scientific American, 236 (5): 119-127.
  16. 16. Lessons from Stein’s paradox •Probably among the most surprising (and initially doubted) phenomena in statistics •Now a large “family”: shrinkage estimators reduce prediction variance to an extent that typically outweighs the bias that is introduced •Bias/variance trade-off principle has motivated many statistical developments Bias, variance and prediction error: Expected prediction error = irreducible error + bias² + variance (Friedman et al. (2001). The elements of statistical learning. Vol. 1. New York: Springer series.)
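Written out for a new observation $y_0 = f(x_0) + \epsilon$ with $\epsilon \sim N(0, \sigma^2)$, the decomposition on this slide takes the following standard form. The symbols $x_0$ and $y_0$ are introduced here for illustration; the expectation is over both the new error and the development sample used to obtain $\hat{f}$.

```latex
\mathrm{E}\left[\left(y_0 - \hat{f}(x_0)\right)^2\right]
  = \underbrace{\sigma^2}_{\text{irreducible error}}
  + \underbrace{\left(\mathrm{E}[\hat{f}(x_0)] - f(x_0)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathrm{Var}\left(\hat{f}(x_0)\right)}_{\text{variance}}
```

Shrinkage accepts a non-zero second term in exchange for a smaller third term.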
  17. 17. Illustration of regression shrinkage
  18. 18. Illustration of regression shrinkage
  19. 19. Illustration of regression shrinkage
  20. 20. Illustration of regression shrinkage
  21. 21. Illustration of regression shrinkage
  22. 22. Illustration of shrinkage
  23. 23. Illustration of shrinkage
  24. 24. Illustration of shrinkage
  25. 25. Illustration of shrinkage
  26. 26. Illustration of shrinkage
  27. 27. Illustration of shrinkage Was I just lucky?
  28. 28. Simulate 100 times
  29. 29. Not just lucky •5% reduction in MSPE just by using a shrinkage estimator •Van Houwelingen and le Cessie’s heuristic shrinkage factor
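The slide names the heuristic shrinkage factor without giving its formula. As it is commonly stated (the notation below is an assumption, not copied from the deck), it is computed from the model likelihood-ratio chi-square statistic $\chi^2_{\text{model}}$ and the number of estimated predictor coefficients $p$:

```latex
\hat{s} = \frac{\chi^2_{\text{model}} - p}{\chi^2_{\text{model}}},
\qquad
\hat{\beta}_j^{\text{shrunk}} = \hat{s}\,\hat{\beta}_j \quad (j = 1, \ldots, p)
```

All slopes are multiplied by the same factor $\hat{s}$, after which the intercept is typically re-estimated so that predictions remain correct on average.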
  30. 30. Heuristic argument for shrinkage [Figure: calibration plot, observed versus predicted, showing the ideal line and the model line] Typical calibration plot: “overfitting”
  31. 31. Heuristic argument for shrinkage [Figure: calibration plot, observed versus predicted, showing the ideal line and the model line] Typical calibration plot: “overfitting”
  32. 32. Overfitting "Idiosyncrasies in the data are fitted rather than generalizable patterns. A model may hence not be applicable to new patients, even when the setting of application is very similar to the development setting." Steyerberg (2009). Clinical Prediction Models.
  33. 33. Ridge regression Objective: $\sum_i (y_i - \hat{f}(x_i))^2 + \lambda \sum_{p=1}^{P} \hat{\beta}_p^2$ •Note: $\lambda = 0$ corresponds to the OLS solution •Closed form solution: $\hat{\beta} = (X^\top X + \lambda I_P)^{-1} X^\top y$, where $I_P$ is the $P \times P$ identity matrix •In most software programs X is standardized and y centered for estimation (output is mostly transformed back to the original scale) The challenge of ridge regression is finding a good value for the "tuning parameter": $\lambda$.
  34. 34. Diabetes data Source: https://web.stanford.edu/~hastie/Papers/LARS/ (accessed 19/3/2018) Details: Efron et al. (2004). Least angle regression. The Annals of Statistics.
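The ridge closed form can be sketched in a few lines of base R. The data, the helper name `ridge_coef` and the value $\lambda = 50$ below are hypothetical and only serve to illustrate the formula on this slide.

```r
set.seed(42)
N <- 100; P <- 5
X <- scale(matrix(rnorm(N * P), N, P))        # standardized predictors
y <- drop(X %*% c(2, -1, 0.5, 0, 0) + rnorm(N))
y <- y - mean(y)                              # centered outcome, so no intercept is needed

# Hypothetical helper implementing (X'X + lambda * I_P)^{-1} X'y
ridge_coef <- function(X, y, lambda) {
  drop(solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y)))
}

beta_ols   <- ridge_coef(X, y, lambda = 0)    # lambda = 0 gives the OLS solution
beta_ridge <- ridge_coef(X, y, lambda = 50)   # lambda > 0 shrinks the coefficients
rbind(OLS = beta_ols, Ridge = beta_ridge)
```

Note that glmnet scales its objective differently (the residual sum of squares is divided by 2N), so a value of $\lambda$ used here is not directly comparable with a glmnet `lambda`.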
  35. 35. Diabetes data
  36. 36. K-fold cross-validation to find “optimal” λ •Usually K = 10 or K = 5 •Partition the dataset into K non-overlapping sub-datasets of equal size (disjoint subsets) •Fit statistical model on all but 1 of the subsets (training set), and evaluate performance of the model in the left-out subset (test set) •Fit and evaluate K times
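The procedure above can be sketched by hand for ridge regression. Everything specific in this example (simulated data, the grid of candidate $\lambda$ values, the helper `ridge_coef`) is an assumption for illustration; in practice `cv.glmnet` automates these steps.

```r
set.seed(7)
N <- 120; P <- 8
X <- scale(matrix(rnorm(N * P), N, P))        # simplification: scaled once on all data
y <- drop(X %*% c(3, -2, 1, rep(0, P - 3)) + rnorm(N, sd = 2))

ridge_coef <- function(X, y, lambda) {
  drop(solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y)))
}

K <- 5
lambdas <- exp(seq(log(0.01), log(1000), length.out = 30))  # hypothetical grid
fold <- sample(rep(seq_len(K), length.out = N))             # K disjoint subsets of equal size

cv_mspe <- sapply(lambdas, function(lam) {
  fold_err <- sapply(seq_len(K), function(k) {
    train <- fold != k
    mu <- mean(y[train])                      # intercept estimated on the training folds only
    xm <- colMeans(X[train, , drop = FALSE])  # predictor means from the training folds only
    b <- ridge_coef(sweep(X[train, , drop = FALSE], 2, xm), y[train] - mu, lam)
    pred <- mu + drop(sweep(X[!train, , drop = FALSE], 2, xm) %*% b)
    mean((y[!train] - pred)^2)                # MSPE in the left-out fold
  })
  mean(fold_err)                              # average over the K folds
})

best_lambda <- lambdas[which.min(cv_mspe)]
best_lambda
```

The same fold labels are reused for every candidate $\lambda$, so differences in `cv_mspe` reflect the penalty rather than the partition.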
  37. 37. First fold of cross-validation (Diabetes data)
  38. 38. 5-fold cross-validation (Diabetes data)
  39. 39. Diabetes data: Ridge regression results
      Regression coefficients (data were standardized, outcome centered):
               AGE     SEX      BMI     BP      s1       s2      s3       s4      s5      s6
      OLS     -10.00  -239.80  519.80  324.40  -792.20  476.70  -101.00  177.10  751.30  67.60
      Ridge    -9.93  -239.68  520.11  324.25  -763.50  454.28   -88.23  173.37  740.69  67.66
      •log(λ) = 1.60 minimized average cross-validation MSPE
      •R-code Ridge regression (glmnet package; glmnetUtils supplies the formula interface):
      library(glmnet)
      library(glmnetUtils)
      df <- read.table("diabetes.txt", header = TRUE)
      # alpha = 0 selects the ridge penalty; 5-fold cross-validation over lambda
      rcv <- cv.glmnet(y ~ ., df, alpha = 0, family = "gaussian", nfolds = 5)
      # refit on all data at the lambda with the lowest cross-validated MSPE
      fitr <- glmnet(y ~ ., df, alpha = 0, lambda = rcv$lambda.min)
      coef(fitr)
  40. 40. Lasso regression Objective: $\sum_i (y_i - \hat{f}(x_i))^2 + \lambda_2 \sum_{p=1}^{P} |\hat{\beta}_p|$ •Remember Ridge regression: $\sum_i (y_i - \hat{f}(x_i))^2 + \lambda \sum_{p=1}^{P} \hat{\beta}_p^2$ •No closed form solution for Lasso: estimation proceeds iteratively •Like Ridge regression, cross-validation is used for estimating $\lambda_2$
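One common way to carry out the iterative estimation is cyclic coordinate descent with soft-thresholding, which is also the approach glmnet is based on. The sketch below is a minimal illustration for the objective on this slide, not glmnet's implementation; the function names, the simulated data and the penalty value are hypothetical.

```r
# Soft-thresholding operator: shrink z toward 0 by g, and set it to 0 if |z| <= g
soft_threshold <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Minimizes sum((y - X b)^2) + lambda * sum(abs(b)) one coordinate at a time
lasso_cd <- function(X, y, lambda, n_iter = 200) {
  P <- ncol(X)
  beta <- numeric(P)
  for (it in seq_len(n_iter)) {
    for (j in seq_len(P)) {
      # partial residual: outcome minus the fit of all predictors except j
      r_j <- y - drop(X[, -j, drop = FALSE] %*% beta[-j])
      beta[j] <- soft_threshold(sum(X[, j] * r_j), lambda / 2) / sum(X[, j]^2)
    }
  }
  beta
}

set.seed(11)
N <- 100; P <- 6
X <- scale(matrix(rnorm(N * P), N, P))
y <- drop(X %*% c(2, -1.5, 0, 0, 0, 0) + rnorm(N))
y <- y - mean(y)

beta_ols   <- lasso_cd(X, y, lambda = 0)      # no penalty: converges to OLS
beta_lasso <- lasso_cd(X, y, lambda = 60)     # hypothetical penalty value
rbind(OLS = beta_ols, Lasso = beta_lasso)
```

A fixed number of sweeps is used here for brevity; a convergence check on the change in `beta` would be the more usual stopping rule.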
  41. 41. Diabetes data: Lasso regression results
      Regression coefficients (data were standardized, outcome centered):
               AGE     SEX      BMI     BP      s1       s2      s3       s4      s5      s6
      OLS     -10.00  -239.80  519.80  324.40  -792.20  476.70  -101.00  177.10  751.30  67.60
      Ridge    -9.93  -239.68  520.11  324.25  -763.50  454.28   -88.23  173.37  740.69  67.66
      Lasso     0.00  -184.39  520.52  290.18   -87.53    0.00   219.67    0.00  504.93  48.08
      •Lasso shrinks some coefficients to exactly zero: built-in variable selection (!!!)
      •R-code Lasso regression (glmnet package; glmnetUtils supplies the formula interface):
      library(glmnet)
      library(glmnetUtils)
      df <- read.table("diabetes.txt", header = TRUE)
      # alpha = 1 selects the lasso penalty; 5-fold cross-validation over lambda
      lcv <- cv.glmnet(y ~ ., df, alpha = 1, family = "gaussian", nfolds = 5)
      # refit on all data at the lambda with the lowest cross-validated MSPE
      fitl <- glmnet(y ~ ., df, alpha = 1, lambda = lcv$lambda.min)
      coef(fitl)
  42. 42. The argument to use Ridge/Lasso Key message Regression shrinkage strategies, such as Ridge and Lasso, have the ability to dramatically improve predictive performance of prediction models
  43. 43. Some arguments against Ridge/Lasso •Interpretation of regression coefficients •Shrinkage not needed due to sufficient sample size (e.g. based on rule of thumb) •Cross-validation can lead to unstable estimation of the λ parameter •Difficult to implement
  44. 44. Interpretation of regression coefficients •Shrinkage estimators such as Ridge and Lasso introduce bias in (‘shrink’) the regression coefficients by design •Most software programs do not provide standard errors and confidence intervals for Ridge/Lasso regression coefficients •Interpretation of coefficients is not / should not be the goal of a prediction model Note Popular approaches to develop prediction models yield biased regression coefficients and provide uninterpretable confidence intervals
  45. 45. Variable selection without shrinkage
  46. 46. Parameters may need shrinkage to become unbiased Available at: https://www.slideshare.net/MaartenvanSmeden
  47. 47. Some arguments against Ridge/Lasso •Interpretation of regression coefficients •Shrinkage not needed due to sufficient sample size •Cross-validation can lead to unstable estimation of the λ parameter •Difficult to implement
  48. 48. Sufficient sample size? Benefit of regression shrinkage depends on: •Sample size •Correlations between predictor variables •Sparsity of outcome and predictor variables •The irreducible error component •Type of outcome (continuous, binary, count, time-to-event, . . . ) •Number of candidate predictor variables •Non-linear/interaction effects •Weak/strong predictor balance How to know that there is no need for shrinkage at some sample size?
  49. 49. Is a rule of thumb a rule of dumb? (Direct quote from a tweet by Prof. Stephen Senn: https://twitter.com/stephensenn/status/936213710770753536)
  50. 50. Some arguments against Ridge/Lasso •Interpretation of regression coefficients •Shrinkage not needed due to sufficient sample size (e.g. based on rule of thumb) •Cross-validation can lead to unstable estimation of the λ parameter •Difficult to implement
  51. 51. Estimating Ridge/Lasso •“Programming” Ridge/Lasso regression isn’t hard with user-friendly software such as the glmnet package in R •Getting it right might be a bit tougher than traditional approaches. It’s all about the tuning parameter (λ) •K-fold cross-validation makes arbitrary partitions of data which may make estimating the tuning parameter unstable (there are some suggestions to circumvent the problems). Note: this is not a flaw of cross-validation: it means that there is probably insufficient data to estimate how much shrinkage is really needed!
  52. 52. Closing remarks •Shrinkage is highly recommended when developing a prediction model (e.g. see TRIPOD guidelines for reporting) •Software and methodological developments have made Lasso and Ridge regression relatively easy to implement and computationally fast •The cross-validation procedure can provide insights about possible overfitting (much like propensity score analysis can provide information about balance) •Consider the Lasso instead of traditional backward/forward selection strategies
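The instability mentioned above can be inspected by repeating the cross-validation with different random partitions and looking at how much the selected $\lambda$ moves. The base R sketch below does this for ridge regression on a deliberately small simulated dataset; the data, the grid and the helper names are hypothetical, and repeating the partitioning is only one possible diagnostic, not necessarily one of the suggestions the slide refers to.

```r
ridge_coef <- function(X, y, lambda) {
  drop(solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y)))
}

# One run of K-fold cross-validation with a fresh random partition;
# returns the lambda with the lowest average left-out MSPE
cv_best_lambda <- function(X, y, lambdas, K = 5) {
  fold <- sample(rep(seq_len(K), length.out = nrow(X)))
  mspe <- sapply(lambdas, function(lam) {
    mean(sapply(seq_len(K), function(k) {
      tr <- fold != k
      b <- ridge_coef(X[tr, , drop = FALSE], y[tr], lam)
      mean((y[!tr] - drop(X[!tr, , drop = FALSE] %*% b))^2)
    }))
  })
  lambdas[which.min(mspe)]
}

set.seed(123)
N <- 40; P <- 10                               # small sample relative to P
X <- matrix(rnorm(N * P), N, P)
y <- drop(X %*% c(1, -1, rep(0, P - 2)) + rnorm(N))   # simulated without an intercept
lambdas <- exp(seq(log(0.1), log(100), length.out = 20))

# Same data, 50 different random partitions
chosen <- replicate(50, cv_best_lambda(X, y, lambdas))
summary(chosen)
```

A wide spread in `chosen` is a sign that the data carry little information about how much shrinkage is needed.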
  53. 53. Slide deck available: https://www.slideshare.net/MaartenvanSmeden Free R tutorial (~ 2 hours): http://www.r-tutorial.nl/
  54. 54. AI and machine learning
  55. 55. AI and machine learning
