
Estimation and Accuracy after Model Selection

We discuss the main results in Estimation and Accuracy after Model Selection by Bradley Efron. This well-written article addresses how variability in the model selection process can lead to unstable post-selection inferences. The main result is an easy-to-use, closed-form formula for the standard deviation of a smoothed bootstrap (or bagged) estimator. A projection-type argument in the paper proves that the proposed estimator is always less than or equal to the commonly used bootstrap standard error. We investigate the validity of these results on the prostate data set, a simulated data set where p > n, and the African data set as a representative example of a GLM. We find substantial gains in the accuracy of post-selection confidence intervals for all-subsets selection, and modest gains when a regularization procedure is used for model selection.



  1. Estimation and Accuracy after Model Selection, by Bradley Efron (Stanford). Presented by Sahir Rai Bhatnagar, McGill University, sahir.bhatnagar@mail.mcgill.ca, April 7, 2014. Outline: Bradley Efron; Motivation; Bootstrap Smoothing; Results.
  2. Who? Born in St. Paul, Minnesota, in 1938 to Jewish-Russian immigrants.
  3. B.S. in Mathematics, Caltech (1960); Ph.D. in Statistics, Stanford (1964), under the direction of Rupert Miller and Herb Solomon. Professor of Statistics at Stanford for the past 50 years.
  4. Achievements: best known for the bootstrap (Annals of Statistics, 1979). Founding editor of the Annals of Applied Statistics. Awarded the Guy Medal in Gold from the RSS (2014); only 34 awarded since 1892, including Rao, Cox, Fisher, and Nelder.
  5. National Medal of Science (2005). Established by Congress in 1959 and administered by the National Science Foundation, the medal is the nation's highest scientific honour.
  6. Some quotes: “Statistics is the science of information gathering, especially when the information arrives in little pieces instead of big ones.”
  7. “Statistics did not come naturally to me. Dad's keeping score for the baseball league helped a lot.”
  8. “I spent the first year at Stanford in the Math Department... After, I started taking stats courses, which I thought would be easy. In fact I found them harder.”
  9. Motivation, in four parts: A Quick Review of the Bootstrap; Typical Model Selection Setting; Cholesterol Data Example; Prostate Data Example.
  11. Look at the data: one response, many covariates.
  12. Identify a list of candidate models M: 2^p submodels; linear, quadratic, cubic, ...
  13. Perform model selection (see Abbas' class notes).
  14. Do inference based on the chosen model: prediction, confidence intervals.
  15. Today's question: should we care about the variability of the variable selection step in our post-selection inference?
  16. An example: n = 164 men took cholestyramine (meant to reduce cholesterol in the blood) for 7 years. x: a compliance measure, adjusted so that x ∼ N(0, 1). y: cholesterol decrease. Perform a regression of y on x; we want to predict the cholesterol decrease for a given compliance value, µ = E[y | x].
  17. Multiple linear regression model: y = Xβ + ε, ε_i ∼ N(0, σ²). Six candidate models: M = {linear, quadratic, ..., sextic}, e.g. y = β₀ + β₁x + β₂x² + ... + β₆x⁶ + ε. Cp criterion for model selection: Cp(M) = SS_res(M)/n [goodness of fit] + 2σ²p_M/n [complexity penalty]. Use the OLS estimate of β from the chosen model and predict: µ̂ = Xβ̂.
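The Cp rule above can be made concrete with a short sketch. This is not the talk's R code: it is a toy Python illustration with simulated data standing in for the cholesterol measurements, and `cp_select` is a hypothetical helper that picks a polynomial degree by the rescaled Cp criterion Cp(M) = SS_res(M)/n + 2σ²p_M/n.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the cholesterol data: n = 164 compliance values and a
# quadratic-ish response (illustrative only; not the real data).
n = 164
x = rng.standard_normal(n)
y = 10 + 25 * x + 5 * x**2 + rng.normal(scale=8.0, size=n)

def cp_select(x, y, max_degree=6, sigma2=None):
    """Pick the polynomial degree minimizing Cp(M) = SS_res(M)/n + 2*sigma2*p_M/n."""
    n = len(y)
    if sigma2 is None:
        # Estimate sigma^2 from the largest candidate model.
        X_full = np.vander(x, max_degree + 1, increasing=True)
        resid = y - X_full @ np.linalg.lstsq(X_full, y, rcond=None)[0]
        sigma2 = resid @ resid / (n - X_full.shape[1])
    best_deg, best_cp = None, np.inf
    for deg in range(1, max_degree + 1):
        X = np.vander(x, deg + 1, increasing=True)  # columns 1, x, ..., x^deg
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        ss_res = np.sum((y - X @ beta) ** 2)
        cp = ss_res / n + 2 * sigma2 * (deg + 1) / n
        if cp < best_cp:
            best_deg, best_cp = deg, cp
    return best_deg, best_cp

deg, cp = cp_select(x, y)
```

On data with a genuine quadratic component, the rule will typically settle on a low degree; the point is only that the selected degree is itself a function of the data.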
  18. An example: nonparametric bootstrap analysis. Bootstrap the data: data* = {(x_j, y_j)*, j = 1, ..., n}, where the (x_j, y_j)* are drawn randomly with replacement from the original data. Then data* → Cp-chosen model M* → OLS estimate β̂*_{M*} → µ̂* = X_{M*} β̂*_{M*}. Repeat B = 4000 times.
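The pipeline on this slide (resample pairs → Cp selection → OLS → prediction) can be sketched as follows. This is an illustrative Python stand-in for the R analysis, with toy data, B = 200 rather than 4000, and a hypothetical helper `fit_and_predict` that is not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data standing in for the (x_j, y_j) pairs.
n, B = 80, 200
x = rng.standard_normal(n)
y = 2.0 + 1.5 * x + 0.5 * x**2 + rng.normal(scale=1.0, size=n)
x0 = 1.0  # point at which we predict mu = E[y | x = x0]

def fit_and_predict(xb, yb, x0, max_degree=3):
    """Cp model selection over polynomial degrees, then OLS prediction at x0."""
    nb = len(yb)
    X_full = np.vander(xb, max_degree + 1, increasing=True)
    r = yb - X_full @ np.linalg.lstsq(X_full, yb, rcond=None)[0]
    sigma2 = r @ r / (nb - X_full.shape[1])
    best = (np.inf, None, None)
    for deg in range(1, max_degree + 1):
        X = np.vander(xb, deg + 1, increasing=True)
        beta = np.linalg.lstsq(X, yb, rcond=None)[0]
        cp = np.sum((yb - X @ beta) ** 2) / nb + 2 * sigma2 * (deg + 1) / nb
        if cp < best[0]:
            best = (cp, deg, beta)
    _, deg, beta = best
    return np.vander(np.array([x0]), deg + 1, increasing=True)[0] @ beta

# Nonparametric bootstrap: resample (x, y) pairs with replacement and
# redo the selection + fitting on every resample.
t_star = np.empty(B)
for i in range(B):
    idx = rng.integers(0, n, size=n)
    t_star[i] = fit_and_predict(x[idx], y[idx], x0)

mu_hat = fit_and_predict(x, y, x0)  # original (unsmoothed) estimate
mu_tilde = t_star.mean()            # smoothed (bagged) estimate
```

The histogram of `t_star` is the analogue of the fitted-value histograms shown for subject 95 below: its spread reflects both coefficient noise and the jumpiness of the selection step.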
  19. (Figure reproduced from Efron 2013.)
  21. Prostate data: examine the relation between the level of PSA and clinical measures. n = 97 men who were about to receive a prostatectomy. x = (x₁, ..., x₈): clinical measures (adjusted so that x ∼ N(0, 1)). y = log PSA. Perform a regression of y on x. Eight candidate models were identified using regsubsets with nbest=1. We want to estimate µ_j = E[y | x_j], j = 1, ..., 97.
  22. (Figure.) Fitted values µ̂₉₅ for subject 95, from B = 4000 nonparametric bootstrap replications of the Cp-chosen model; 60% of the replications are greater than the original estimate of 3.6 from the Cp-chosen model.
  23. (Figure.) Fitted values µ̂₉₅ for subject 95, from B = 4000 nonparametric bootstrap replications, separated by the three most frequently chosen models under Cp (m3: 18%, m5: 22%, m7: 24%). Original estimate = 3.6 based on the Cp-chosen model.
  24. (Figure.) Boxplots of fitted values µ̂₉₅ for subject 95 under the model chosen by the Cp criterion, based on B = 4000 nonparametric bootstrap samples (selection frequencies: m2: 1%, m3: 18%, m4: 12%, m5: 22%, m6: 15%, m7: 24%, m8: 8%).
  25. Questions: are you convinced there is a problem in the way we do post-selection inference?
  26. Is the juice worth the squeeze?
  27. Bootstrap smoothing: bagging (Breiman 1996). Replace the original estimator µ̂ = t(y) with the bootstrap average µ̃ = s(y) = (1/B) Σ_{i=1}^{B} t(y*_i), where y*_i is the i-th bootstrap sample.
  28. Also known as model averaging. “If perturbing the learning set can cause significant changes in the predictor constructed, then bagging can improve accuracy.”
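Breiman's point (instability of the base estimator is exactly when bagging helps) is easy to demonstrate. A minimal sketch, using a deliberately unstable toy estimator (a hard-thresholded mean, a caricature of choosing between a null and a non-null model) that is invented here for illustration and is not from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

def t(y):
    """Unstable 'selection' estimator: the sample mean, hard-thresholded at
    0.25 (returns 0 when the mean is small, mimicking model selection)."""
    m = y.mean()
    return m if abs(m) > 0.25 else 0.0

def bagged(y, B=400, rng=rng):
    """Bootstrap smoothing: average t over B resamples of y."""
    n = len(y)
    return np.mean([t(y[rng.integers(0, n, n)]) for _ in range(B)])

y = rng.normal(loc=0.3, scale=1.0, size=50)
mu_hat = t(y)         # jumpy: exactly 0 or the sample mean
mu_tilde = bagged(y)  # a smooth function of the data
```

Because `t` jumps discontinuously as the data cross the threshold, `mu_hat` has a discontinuous sampling distribution, while the bagged `mu_tilde` varies smoothly; that smoothness is what makes the delta-method-style standard error of the next slides tractable.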
  29. Main contribution of this paper. Notation: t*_i = t(y*_i), i = 1, ..., B (the value of the statistic in bootstrap sample i).
  30. Y*_ij = the number of times the j-th data point appears in the i-th bootstrap sample.
  31. cov_j = cov(Y*_ij, t*_i).
  32. The nonparametric estimate of the standard deviation of the ideal smoothed bootstrap statistic µ̃ = s(y) = B⁻¹ Σ_{i=1}^{B} t(y*_i) is sd(µ̃) = [ Σ_{j=1}^{n} cov_j² ]^{1/2}.
  34. Note that cov_j = cov(Y*_ij, t*_i) is an unknown quantity, so it must be estimated. The estimate of the standard deviation of µ̃ = s(y) in the non-ideal (finite-B) case is sd_B(µ̃) = [ Σ_{j=1}^{n} ĉov_j² ]^{1/2}, where ĉov_j = B⁻¹ Σ_{i=1}^{B} (Y*_ij − Y*_·j)(t*_i − t*_·), with Y*_·j = B⁻¹ Σ_{i=1}^{B} Y*_ij and t*_· = B⁻¹ Σ_{i=1}^{B} t*_i.
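The finite-B estimator transcribes directly into code: record the resampling counts Y*_ij alongside each replication t*_i, then combine them. An illustrative Python sketch (toy data; the statistic is the sample mean only so the block runs quickly, and `smoothed_bootstrap_sd` is a name invented here):

```python
import numpy as np

rng = np.random.default_rng(3)

def smoothed_bootstrap_sd(y, t, B=1000, rng=rng):
    """Efron's sd_B for the smoothed statistic: record counts Y[i, j] = number
    of times point j appears in bootstrap sample i, then combine with t*_i."""
    n = len(y)
    Y = np.empty((B, n))
    t_star = np.empty(B)
    for i in range(B):
        idx = rng.integers(0, n, size=n)
        Y[i] = np.bincount(idx, minlength=n)
        t_star[i] = t(y[idx])
    # cov_hat_j = B^{-1} sum_i (Y_ij - Ybar_j)(t_i - tbar)
    cov = ((Y - Y.mean(axis=0)) * (t_star - t_star.mean())[:, None]).mean(axis=0)
    sd_smooth = np.sqrt(np.sum(cov ** 2))  # [ sum_j cov_hat_j^2 ]^{1/2}
    sd_unsmooth = t_star.std(ddof=1)       # usual bootstrap sd of t*
    return t_star.mean(), sd_smooth, sd_unsmooth

y = rng.standard_normal(40)
mu_tilde, sd_s, sd_u = smoothed_bootstrap_sd(y, lambda v: v.mean())
# Efron's projection argument gives sd_smooth <= sd_unsmooth for the ideal
# bootstrap; with finite B the comparison holds up to Monte Carlo error.
```

For a linear statistic such as the mean, smoothing changes little and the two standard deviations nearly coincide; the gap opens up for jumpy, selection-based statistics.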
  35. Improvement on the traditional standard error: sd_B(µ̃) = [ Σ_{j=1}^{n} ĉov_j² ]^{1/2} is always less than or equal to the usual bootstrap estimate of the standard deviation of the unsmoothed statistic, sd_B(µ̂) = [ B⁻¹ Σ_{i=1}^{B} (t*_i − t*_·)² ]^{1/2}.
  36. Three types of 95% confidence intervals. 1. Standard: µ̂ ± 1.96 · sd_B(µ̂).
  37. 2. Percentile: (µ̂*^(0.025), µ̂*^(0.975)).
  38. 3. Smoothed: µ̃ ± 1.96 · sd_B(µ̃).
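Given the replications t*_i and the resampling counts Y*_ij, each of the three intervals is a few lines. An illustrative sketch (toy data, statistic = median, B = 1000; not the prostate analysis, and 1.96 is the usual normal 95% multiplier):

```python
import numpy as np

rng = np.random.default_rng(4)

y = rng.standard_normal(60)
n, B = len(y), 1000
t_star = np.empty(B)
Y = np.empty((B, n))  # Y[i, j] = times point j appears in bootstrap sample i
for i in range(B):
    idx = rng.integers(0, n, size=n)
    Y[i] = np.bincount(idx, minlength=n)
    t_star[i] = np.median(y[idx])  # any statistic t would do here

mu_hat = np.median(y)        # original estimate
mu_tilde = t_star.mean()     # smoothed (bagged) estimate
sd_hat = t_star.std(ddof=1)  # usual bootstrap sd
cov = ((Y - Y.mean(0)) * (t_star - mu_tilde)[:, None]).mean(0)
sd_tilde = np.sqrt(np.sum(cov ** 2))  # Efron's smoothed sd

standard = (mu_hat - 1.96 * sd_hat, mu_hat + 1.96 * sd_hat)
percentile = tuple(np.quantile(t_star, [0.025, 0.975]))
smoothed = (mu_tilde - 1.96 * sd_tilde, mu_tilde + 1.96 * sd_tilde)
```

The smoothed interval is centered at the bagged estimate and uses the smoothed standard deviation, which is what produces the shorter intervals reported for the prostate data below.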
  39. Results: L1-norm penalty functions. Recall the optimization problem of interest: max_β { ℓ_n(β) − n Σ_{j=1}^{p} p(|β_j|; λ) }, where ℓ_n(β) is the log-likelihood.
  40. LASSO, SCAD, and MCP penalties. LASSO (Tibshirani, 1996): p(|β|; λ) = λ|β|. SCAD (Fan and Li, 2001), defined through its derivative: p′(|β|; λ, γ) = λ sign(β) [ I(|β| ≤ λ) + (γλ − |β|)₊ / ((γ − 1)λ) · I(|β| > λ) ], γ > 2. MCP (Zhang, 2010): p(|β|; λ, γ) = λ|β| − |β|²/(2γ) if |β| ≤ γλ, and γλ²/2 if |β| > γλ.
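The three penalties can be written out directly. Below are illustrative NumPy implementations (the talk used glmnet and ncvreg, not hand-rolled code); SCAD is shown in its integrated, piecewise-quadratic form, which follows from the derivative on the slide.

```python
import numpy as np

# lam = lambda, gam = gamma; each function accepts a scalar or array beta.

def lasso_penalty(b, lam):
    return lam * np.abs(b)

def scad_penalty(b, lam, gam):
    """Integrating the SCAD derivative gives this piecewise form (gam > 2):
    lam*|b| on [0, lam]; quadratic blend on (lam, gam*lam]; constant after."""
    b = np.abs(b)
    return np.where(
        b <= lam,
        lam * b,
        np.where(
            b <= gam * lam,
            (2 * gam * lam * b - b**2 - lam**2) / (2 * (gam - 1)),
            lam**2 * (gam + 1) / 2,
        ),
    )

def mcp_penalty(b, lam, gam):
    """MCP: quadratic up to gam*lam, then flat at gam*lam^2/2."""
    b = np.abs(b)
    return np.where(b <= gam * lam, lam * b - b**2 / (2 * gam), gam * lam**2 / 2)
```

Unlike the LASSO, both SCAD and MCP flatten out for large |β|, so large coefficients are (asymptotically) unpenalized; both pieces meet continuously at the knots, e.g. the MCP branches agree at |β| = γλ.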
  41. Software: the analysis was performed in R. LASSO via the glmnet package (Friedman, Hastie, Tibshirani, 2013); SCAD and MCP via the coordinate descent algorithm (Breheny and Huang, 2011) in the ncvreg package; BIC and Cp model selection via the leaps package (Lumley, 2009).
  42. (Figure.) Fitted values µ̂₉₅ for subject 95, from B = 4000 nonparametric bootstrap replications, for the MCP, SCAD, and LASSO penalties.
  43. (Figure.) Fitted values µ̂₉₅ for subject 95, from B = 4000 nonparametric bootstrap replications, for BIC and Cp.
  44. (Figure.) 95% confidence intervals (standard, quantile, smooth) for the fitted value of subject 95, based on B = 4000 nonparametric bootstrap samples, for the MCP, SCAD, and LASSO penalties.
  45. (Figure.) Lengths of the 95% confidence intervals (standard, quantile, smooth) for the fitted value of subject 95, based on B = 4000 nonparametric bootstrap samples, for Cp and BIC.
  46. Table: prostate data, B = 4000, observation 95.

      model   type       fitted value   sd     length   coverage
      LASSO   standard   3.62           0.31   1.21     0.94
              quantile                         1.20     0.95
              smooth     3.57           0.29   1.14     0.93
      SCAD    standard   3.60           0.35   1.37     0.95
              quantile                         1.33     0.95
              smooth     3.62           0.33   1.28     0.93
      MCP     standard   3.60           0.35   1.38     0.96
              quantile                         1.35     0.95
              smooth     3.61           0.33   1.29     0.94
      BIC     standard   5.50           4.75   18.62    0.84
              quantile                         16.05    0.95
              smooth     3.22           3.46   13.55    0.83
      Cp      standard   5.13           5.11   20.02    0.86
              quantile                         16.15    0.95
              smooth     0.64           4.40   17.24    0.97
  47. An example: parametric bootstrap analysis. Obtain the OLS estimate µ̂_OLS based on the full model. Generate y* ∼ N(µ̂_OLS, I) (full-model bootstrap). Then y* → Cp-chosen M*, β̂*_{M*} → µ̂* = X_{M*} β̂*_{M*}. Repeat B = 4000 times → t*_ij = µ̂*_ij. Smoothed estimates: s_j = B⁻¹ Σ_{i=1}^{B} t*_ij.
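The parametric pipeline differs from the earlier nonparametric one only in how y* is generated: the full-model fit is computed once, and fresh responses are simulated around it. An illustrative Python sketch with simulated Gaussian data and all-subsets Cp selection over p = 4 covariates (the real analysis used R, p = 8, and B = 4000; `cp_fit` is a name invented here):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(5)

# Simulated design and response standing in for the prostate data.
n, p, B = 60, 4, 300
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, 0.5, 0.0, 0.0])
y = X @ beta_true + rng.standard_normal(n)

# Step 1: OLS on the FULL model, once.
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]
mu_ols = X @ beta_full  # full-model fitted values mu_hat_OLS
resid = y - mu_ols
sigma2 = resid @ resid / (n - p)

def cp_fit(Xs, ys, sigma2):
    """All-subsets Cp selection, then fitted values from the chosen OLS fit."""
    nb = len(ys)
    best = (np.inf, None)
    for k in range(1, Xs.shape[1] + 1):
        for cols in combinations(range(Xs.shape[1]), k):
            Xm = Xs[:, cols]
            b = np.linalg.lstsq(Xm, ys, rcond=None)[0]
            cp = np.sum((ys - Xm @ b) ** 2) / nb + 2 * sigma2 * k / nb
            if cp < best[0]:
                best = (cp, Xm @ b)
    return best[1]

# Step 2: parametric bootstrap: y* ~ N(mu_OLS, I), redo selection + fitting.
t_star = np.empty((B, n))
for i in range(B):
    y_star = mu_ols + rng.standard_normal(n)
    t_star[i] = cp_fit(X, y_star, sigma2)

s = t_star.mean(axis=0)  # smoothed estimates s_j, one per observation
```

Note the design matrix X stays fixed across replications; only the responses are redrawn, which is the point of contrast with resampling (x, y) pairs.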
  48. (Figure.) Fitted values µ̂₉₅ for subject 95, from B = 4000 parametric bootstrap replications of the Cp-chosen model; 53% of the replications are greater than the original estimate of 3.6 from the Cp-chosen model.
  49. (Figure.) Fitted values µ̂₉₅ for subject 95, from B = 4000 parametric bootstrap replications, separated by the three most frequently chosen models under Cp (m6, m7, m8). Original estimate = 3.6 based on the Cp-chosen model.
  50. (Figure.) Boxplots of fitted values µ̂₉₅ for subject 95 under the model chosen by the Cp criterion, based on B = 4000 parametric bootstrap samples (selection frequencies: m1: 3%, m2: 6%, m3: 16%, m4: 12%, m5: 13%, m6: 14%, m7: 17%, m8: 19%).
  51. (Figure.) Fitted values µ̂₉₅ for subject 95, from B = 4000 parametric bootstrap replications of the BIC-chosen model; 40% of the replications are greater than the original estimate of 3.7 from the BIC-chosen model.
  52. (Figure.) Fitted values µ̂₉₅ for subject 95, from B = 4000 parametric bootstrap replications, separated by the three most frequently chosen models under BIC (m1: 27%, m2: 20%, m3: 18%). Original estimate = 3.7 based on the BIC-chosen model.
  53. (Figure.) Boxplots of fitted values µ̂₉₅ for subject 95 under the model chosen by the BIC criterion, based on B = 4000 parametric bootstrap samples (selection frequencies: m1: 20%, m2: 18%, m3: 27%, m4: 13%, m5: 9%, m6: 5%, m7: 5%, m8: 3%).
  54. Discussion: improvements for regularized procedures where tuning parameters are also chosen in a data-driven fashion. What about GLMs? Why the parametric bootstrap?
  55. Family
  56. Roots
  57. What I have done so far: 1. BSc Actuarial Math, Concordia (2005-2008); 2. Pension actuary (2008-2011); 3. RA at the Chest with Andrea Benedetti (2011-2012); 4. MSc Biostats, Queen's (2012-2013).
  58. What's Next? 1. PhD Biostatistics, McGill (2013-???); 2. Supervisor: Celia Greenwood (Statistical Genetics).
