We discuss the main results of "Estimation and Accuracy after Model Selection" by Bradley Efron. This well-written article addresses how variability in the model selection process can lead to unstable post-selection inferences. The main result is an easy-to-use, closed-form formula for the standard deviation of a smoothed bootstrap (or bagged) estimator. A projection-type argument is given in the paper to prove that the proposed estimator is always less than or equal to the commonly used bootstrap standard error. We investigate the validity of these results on the prostate data set, a simulated data set where p > n, and the african data set as a representative example for GLMs. We find substantial gains in the accuracy of post-selection confidence intervals for all-subset selection, and modest gains when a regularization procedure is used for model selection.
3. Bradley Efron
Motivation
Bootstrap Smoothing
Results
Who?
Achievements
Some Quotes
Born in St. Paul, Minnesota, in 1938 to Jewish-Russian immigrants
B.S. in Mathematics, Caltech (1960)
Ph.D. in Statistics (1964), under the direction of Rupert Miller and Herb Solomon
Professor of Statistics at Stanford for the past 50 years
7. Some Quotes
"Statistics is the science of information gathering, especially when the information arrives in little pieces instead of big ones"
"Statistics did not come naturally to me. Dad's keeping score for the baseball league helped a lot"
8. Some Quotes (continued)
"I spent the first year at Stanford in the Math Department... After, I started taking stats courses, which I thought would be easy. In fact I found them harder"
11-15. Typical Model Selection Setting
Look at the data: one response, many covariates
Identify a list of candidate models $\mathcal{M}$: $2^p$ submodels; linear, quadratic, cubic, ...
Perform model selection (see Abbas class notes)
Do inference based on the chosen model: prediction, confidence intervals
Today's Question: Should we care about the variability of the variable selection step in our post-selection inference?
16. An Example: Cholesterol Data
n = 164 men took Cholestyramine (meant to reduce cholesterol in the blood) for 7 years
x: a compliance measure, standardized so that $x \sim N(0, 1)$
y: cholesterol decrease
Perform a regression of y on x
We want to predict the cholesterol decrease for a given compliance value: $\mu = E[y \mid x]$
17. An Example (continued)
Multiple linear regression model: $Y = X\beta + \epsilon$, $\epsilon_i \sim N(0, \sigma^2)$
6 candidate models: $\mathcal{M} = \{\text{linear}, \text{quadratic}, \ldots, \text{sextic}\}$, e.g.
$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \ldots + \beta_6 x^6 + \epsilon$
Cp criterion for model selection:
$C_p(M) = \underbrace{SS_{res}(M)/n}_{\text{goodness of fit}} + \underbrace{2\sigma^2 p_M/n}_{\text{complexity}}$
Use the OLS estimate of $\beta$ from the chosen model and predict: $\hat{\mu} = X\hat{\beta}$
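The Cp selection step above can be sketched as follows. This is a minimal illustration on simulated data (not the cholesterol data), with $\sigma^2$ estimated from the largest candidate model; all names here are for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in for the cholesterol data: one covariate, cubic truth.
n = 164
x = rng.normal(size=n)
y = 10 + 5 * x + 2 * x**3 + rng.normal(scale=5, size=n)

def poly_design(x, degree):
    """Design matrix with columns 1, x, x^2, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)

def cp(y, X, sigma2):
    """Mallows' Cp in the form used on the slide: SS_res/n + 2*sigma^2*p/n."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid / n + 2 * sigma2 * p / n

# Estimate sigma^2 from the largest candidate model (the sextic).
X_full = poly_design(x, 6)
beta_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)
resid_full = y - X_full @ beta_full
sigma2 = resid_full @ resid_full / (n - X_full.shape[1])

# Candidate models: linear through sextic; pick the Cp minimizer.
scores = {d: cp(y, poly_design(x, d), sigma2) for d in range(1, 7)}
best_degree = min(scores, key=scores.get)
```

Since the simulated truth is cubic with a strong signal, the selected degree should be at least 3, but the exact winner can wobble between degrees 3 and 6, which is precisely the selection variability the talk is about.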
18. An Example: Nonparametric Bootstrap Analysis
Bootstrap the data: $\text{data}^* = \{(x_j, y_j)^*, j = 1, \ldots, n\}$, where the $(x_j, y_j)^*$ are drawn randomly with replacement from the original data
$\text{data}^* \xrightarrow{C_p} M^* \xrightarrow{OLS} \hat{\beta}^*_{M^*} \rightarrow \hat{\mu}^* = X_{M^*}\hat{\beta}^*_{M^*}$
Repeat B = 4000 times
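The pipeline above (resample, reselect by Cp, refit by OLS, record the fitted values at the original covariates) can be sketched as below. This is a toy sketch on simulated polynomial data with a hypothetical cp_select helper, not the talk's R code, and it uses a small B for speed:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for the data: one covariate, quadratic truth.
n = 100
x = rng.normal(size=n)
y = 1 + 2 * x + 1.5 * x**2 + rng.normal(size=n)

def cp_select(x, y, max_degree=6):
    """Pick a polynomial degree by Cp; return (degree, OLS coefficients)."""
    X_full = np.vander(x, max_degree + 1, increasing=True)
    beta_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)
    r = y - X_full @ beta_full
    sigma2 = r @ r / (len(y) - X_full.shape[1])
    best = None
    for d in range(1, max_degree + 1):
        X = np.vander(x, d + 1, increasing=True)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        score = resid @ resid / len(y) + 2 * sigma2 * X.shape[1] / len(y)
        if best is None or score < best[0]:
            best = (score, d, beta)
    return best[1], best[2]

# data* --(Cp)--> M* --(OLS)--> beta* --> mu* evaluated at the original x
B = 200  # the talk uses B = 4000
boot_fits = np.empty((B, n))
for i in range(B):
    idx = rng.integers(0, n, size=n)      # resample with replacement
    d, beta = cp_select(x[idx], y[idx])   # reselect and refit on the bootstrap sample
    boot_fits[i] = np.vander(x, d + 1, increasing=True) @ beta

mu_smooth = boot_fits.mean(axis=0)        # smoothed (bagged) fitted values
```

Each bootstrap replication may choose a different model, so the spread of boot_fits for one subject reflects both coefficient noise and model-selection noise.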
21. Prostate Data
Examine the relation between the level of PSA and clinical measures
n = 97 men who were about to receive a prostatectomy
$x = (x_1, \ldots, x_8)$: clinical measures (standardized so that $x \sim N(0, 1)$)
y = log PSA
Perform a regression of y on x
8 candidate models were identified using regsubsets with nbest=1
We want to estimate $\mu_j = E[y \mid x_j]$, $j = 1, \ldots, 97$
22. Prostate Data
[Figure: histogram of fitted values $\hat{\mu}_{95}$ for subject 95, from B = 4000 nonparametric bootstrap replications of the Cp-chosen model; 60% of the replications are greater than the original estimate of 3.6 (based on the Cp-chosen model).]
23. Prostate Data
[Figure: histograms of fitted values $\hat{\mu}_{95}$ for subject 95, from B = 4000 nonparametric bootstrap replications, separated by the three models most frequently chosen by Cp (m3: 18%, m5: 22%, m7: 24%); original estimate 3.6.]
24. Prostate Data
[Figure: boxplots of fitted values $\hat{\mu}_{95}$ for subject 95 by the model chosen by the Cp criterion, based on B = 4000 nonparametric bootstrap samples; selection frequencies m2: 1%, m3: 18%, m4: 12%, m5: 22%, m6: 15%, m7: 24%, m8: 8%.]
25-26. Questions
Are you convinced there is a problem in the way we do post-selection inference?
Is the juice worth the squeeze?
28. Bagging (Breiman 1996)
Replace the original estimator $\hat{\mu} = t(\mathbf{y})$ with the bootstrap average
$\tilde{\mu} = s(\mathbf{y}) = \frac{1}{B}\sum_{i=1}^{B} t(\mathbf{y}^*_i)$
$\mathbf{y}^*_i$: ith bootstrap sample
Known as model averaging
"If perturbing the learning set can cause significant changes in the predictor constructed, then bagging can improve accuracy"
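As a minimal illustration of the smoothing idea, here is bagging applied to a generic nonsmooth statistic (the sample median on toy data, not the paper's model-selection estimator):

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.exponential(size=50)   # toy data set

def t(sample):
    """A nonsmooth statistic: the sample median."""
    return np.median(sample)

B = 1000
t_star = np.array([
    t(rng.choice(y, size=y.size, replace=True))  # t(y*_i) on bootstrap sample i
    for _ in range(B)
])

mu_hat = t(y)               # original estimator: mu-hat = t(y)
mu_tilde = t_star.mean()    # smoothed (bagged) estimator: mu-tilde = s(y)
```

The smoothed estimator averages over the jumps of the nonsmooth statistic, which is exactly why perturbation-sensitive procedures like model selection stand to benefit.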
30-33. Main Contribution of this Paper
$t^*_i = t(\mathbf{y}^*_i)$, $i = 1, \ldots, B$ (value of the statistic in bootstrap sample i)
$Y^*_{ij}$ = number of times the jth data point appears in the ith bootstrap sample
$cov_j = cov(Y^*_{ij}, t^*_i)$
The nonparametric estimate of the standard deviation of the ideal smoothed bootstrap statistic $\tilde{\mu} = s(\mathbf{y}) = B^{-1}\sum_{i=1}^{B} t(\mathbf{y}^*_i)$ is
$\widetilde{sd} = \left[\sum_{j=1}^{n} cov_j^2\right]^{1/2}$
34. Main Contribution of this Paper (continued)
Note that $cov_j = cov(Y^*_{ij}, t^*_i)$ is an unknown quantity, so we must estimate it. The estimate of the standard deviation of $\tilde{\mu} = s(\mathbf{y})$ in the non-ideal case is
$\widetilde{sd}_B = \left[\sum_{j=1}^{n} \widehat{cov}_j^2\right]^{1/2}$
$\widehat{cov}_j = B^{-1}\sum_{i=1}^{B} (Y^*_{ij} - Y^*_{\cdot j})(t^*_i - t^*_\cdot)$
$Y^*_{\cdot j} = B^{-1}\sum_{i=1}^{B} Y^*_{ij}$, \quad $t^*_\cdot = B^{-1}\sum_{i=1}^{B} t^*_i$
35. Improvement on the Traditional Standard Error
$\widetilde{sd}_B = \left[\sum_{j=1}^{n} \widehat{cov}_j^2\right]^{1/2}$
is always less than or equal to the bootstrap estimate of the standard deviation of the unsmoothed statistic,
$\widehat{sd}_B = \left[B^{-1}\sum_{i=1}^{B} (t^*_i - t^*_\cdot)^2\right]^{1/2}$
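Both quantities are cheap to compute from the same set of bootstrap replications. A numerical sketch, using the sample median as a stand-in statistic (an assumption; the paper's t(y) is the post-model-selection estimate):

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(size=40)
n, B = y.size, 4000

def t(sample):
    """Any statistic; the median is nonsmooth, so smoothing should help."""
    return np.median(sample)

t_star = np.empty(B)
Y = np.empty((B, n))   # Y[i, j] = # of times data point j appears in sample i
for i in range(B):
    idx = rng.integers(0, n, size=n)
    Y[i] = np.bincount(idx, minlength=n)
    t_star[i] = t(y[idx])

# Efron's smoothed standard deviation: sd_B = (sum_j cov_j^2)^(1/2)
cov_j = ((Y - Y.mean(axis=0)) * (t_star - t_star.mean())[:, None]).mean(axis=0)
sd_smooth = np.sqrt(np.sum(cov_j**2))

# Usual bootstrap standard error of the unsmoothed statistic
sd_unsmooth = t_star.std()
```

On nonsmooth statistics like the median, sd_smooth typically comes out below the unsmoothed bootstrap standard error, in line with the theorem.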
41. Software
Analysis was performed in R
LASSO implemented using the glmnet package (Friedman, Hastie, and Tibshirani, 2013)
SCAD and MCP via the coordinate descent algorithm (Breheny and Huang, 2011) in the ncvreg package
BIC and Cp model selection using the leaps package (Lumley, 2009)
44. SCAD, MCP, LASSO
[Figure: 95% confidence intervals (standard, quantile, and smoothed) for the fitted value of subject 95, based on B = 4000 nonparametric bootstrap samples, for the MCP, SCAD, and LASSO penalties.]
45. BIC and Cp
[Figure: lengths of 95% confidence intervals (standard, quantile, and smoothed) for the fitted value of subject 95, based on B = 4000 nonparametric bootstrap samples, for Cp and BIC model selection.]
47. An Example: Parametric Bootstrap Analysis
Obtain OLS estimates $\hat{\mu}_{OLS}$ based on the full model
Generate $\mathbf{y}^* \sim N(\hat{\mu}_{OLS}, I)$ (full-model bootstrap)
$\mathbf{y}^* \xrightarrow{C_p} M^*, \hat{\beta}^*_{M^*} \rightarrow \hat{\mu}^* = X_{M^*}\hat{\beta}^*_{M^*}$
Repeat B = 4000 times $\rightarrow t^*_{ij} = \hat{\mu}^*_{ij}$
Smoothed estimates: $s_j = B^{-1}\sum_{i=1}^{B} t^*_{ij}$
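The parametric variant can be sketched as follows. This is a toy sketch assuming a nested candidate list over the first k covariates and a hypothetical cp_fit helper (the talk built its candidate set with regsubsets in R), with noise variance known to be 1 by construction:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy stand-in: n subjects, p standardized covariates, sparse linear truth.
n, p = 60, 5
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 0.0]) + rng.normal(size=n)

# Full-model OLS fit supplies the parametric bootstrap center.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
mu_ols = X @ beta_ols

def cp_fit(X, y, sigma2=1.0):
    """Cp over nested submodels using the first k covariates; return fitted values."""
    n = len(y)
    best_fit, best_cp = None, np.inf
    for k in range(1, X.shape[1] + 1):
        Xk = X[:, :k]
        beta, *_ = np.linalg.lstsq(Xk, y, rcond=None)
        resid = y - Xk @ beta
        score = resid @ resid / n + 2 * sigma2 * k / n
        if score < best_cp:
            best_cp, best_fit = score, Xk @ beta
    return best_fit

B = 500                      # the talk uses B = 4000
t_star = np.empty((B, n))    # t*_ij = mu-hat*_ij
for i in range(B):
    y_star = mu_ols + rng.normal(size=n)   # y* ~ N(mu_OLS, I)
    t_star[i] = cp_fit(X, y_star)

s = t_star.mean(axis=0)      # smoothed estimates s_j
```

Unlike the nonparametric version, resampling here is from the fitted full model rather than from the empirical distribution of the data.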
48. Parametric Bootstrap
[Figure: histogram of fitted values $\hat{\mu}_{95}$ for subject 95, from B = 4000 parametric bootstrap replications of the Cp-chosen model; 53% of the replications are greater than the original estimate of 3.6 (based on the Cp-chosen model).]
49. Parametric Bootstrap
[Figure: histograms of fitted values $\hat{\mu}_{95}$ for subject 95, from B = 4000 parametric bootstrap replications, separated by the three models most frequently chosen by Cp (m6, m7, m8); original estimate 3.6.]
51. Parametric Bootstrap
[Figure: histogram of fitted values $\hat{\mu}_{95}$ for subject 95, from B = 4000 parametric bootstrap replications of the BIC-chosen model; 40% of the replications are greater than the original estimate of 3.7 (based on the BIC-chosen model).]
52. Parametric Bootstrap
[Figure: histograms of fitted values $\hat{\mu}_{95}$ for subject 95, from B = 4000 parametric bootstrap replications, separated by the three models most frequently chosen by BIC (m1: 27%, m2: 20%, m3: 18%); original estimate 3.7.]
57. What I have done so far
1. BSc Actuarial Math, Concordia (2005-2008)
2. Pension actuary (2008-2011)
3. RA at the Chest with Andrea Benedetti (2011-2012)
4. MSc Biostats, Queen's (2012-2013)