We discuss the main results of "Estimation and Accuracy after Model Selection" by Bradley Efron. This well-written article addresses how variability in the model selection process can lead to unstable post-selection inferences. The main result is an easy-to-use, closed-form formula for the standard deviation of a smoothed bootstrap (or bagged) estimator. A projection-type argument is given in the paper to prove that the proposed estimator is always less than or equal to the commonly used bootstrap standard error. We investigate the validity of these results on the prostate data set, a simulated data set where p > n, and the african data set as a representative example for GLMs. We find substantial gains in the accuracy of post-selection confidence intervals for all-subset selection, and modest gains when a regularization procedure is used for model selection.
3. Bradley Efron
Motivation
Bootstrap Smoothing
Results
Who?
Achievements
Some Quotes
Born in St. Paul, Minnesota, in 1938 to Jewish-Russian immigrants
B.S. in Mathematics, Caltech (1960)
Ph.D. in Statistics (1964), under the direction of Rupert Miller and Herb Solomon
Professor of Statistics at Stanford for the past 50 years
7. Some Quotes
"Statistics is the science of information gathering, especially when the information arrives in little pieces instead of big ones"
"Statistics did not come naturally to me. Dad's keeping score for the baseball league helped a lot"
8. Some Quotes (continued)
"I spent the first year at Stanford in the Math Department... After, I started taking stats courses, which I thought would be easy. In fact I found them harder"
11-15. Typical Model Selection Setting
Look at the data: one response, many covariates
Identify a list of candidate models $\mathcal{M}$: $2^p$ submodels; linear, quadratic, cubic, ...
Perform model selection (see Abbas class notes)
Do inference based on the chosen model: prediction, confidence intervals
Today's Question: Should we care about the variability of the variable selection step in our post-selection inference?
16. An Example: Cholesterol Data
n = 164 men took Cholestyramine (meant to reduce cholesterol in the blood) for 7 years
x: a compliance measure, standardized so that $x \sim N(0, 1)$
y: cholesterol decrease
Perform a regression of y on x
We want to predict the cholesterol decrease for a given compliance value: $\mu = E[y \mid x]$
17. An Example (continued)
Multiple linear regression model: $Y = X\beta + \epsilon$, $\epsilon_i \sim N(0, \sigma^2)$
6 candidate models: $\mathcal{M} = \{\text{linear}, \text{quadratic}, \ldots, \text{sextic}\}$, e.g.
$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \ldots + \beta_6 x^6 + \epsilon$
Cp criterion for model selection:
$C_p(M) = \underbrace{SS_{res}(M)/n}_{\text{goodness of fit}} + \underbrace{2\sigma^2 p_M/n}_{\text{complexity}}$
Use the OLS estimate of $\beta$ from the chosen model and predict: $\hat{\mu} = X\hat{\beta}$
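The Cp selection step above can be sketched as follows. This is a minimal illustration on simulated data (not the cholesterol data), with $\sigma^2$ estimated from the largest candidate model; all names here are for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in for the cholesterol data: one covariate, cubic truth.
n = 164
x = rng.normal(size=n)
y = 10 + 5 * x + 2 * x**3 + rng.normal(scale=5, size=n)

def poly_design(x, degree):
    """Design matrix with columns 1, x, x^2, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)

def cp(y, X, sigma2):
    """Mallows' Cp in the form used on the slide: SS_res/n + 2*sigma^2*p/n."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid / n + 2 * sigma2 * p / n

# Estimate sigma^2 from the largest candidate model (the sextic).
X_full = poly_design(x, 6)
beta_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)
resid_full = y - X_full @ beta_full
sigma2 = resid_full @ resid_full / (n - X_full.shape[1])

# Candidate models: linear through sextic; pick the Cp minimizer.
scores = {d: cp(y, poly_design(x, d), sigma2) for d in range(1, 7)}
best_degree = min(scores, key=scores.get)
```

Since the simulated truth is cubic with a strong signal, the selected degree should be at least 3, but the exact winner can wobble between degrees 3 and 6, which is precisely the selection variability the talk is about.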
18. An Example: Nonparametric Bootstrap Analysis
Bootstrap the data: $\text{data}^* = \{(x_j, y_j)^*, j = 1, \ldots, n\}$, where the $(x_j, y_j)^*$ are drawn randomly with replacement from the original data
$\text{data}^* \xrightarrow{C_p} M^* \xrightarrow{OLS} \hat{\beta}^*_{M^*} \rightarrow \hat{\mu}^* = X_{M^*}\hat{\beta}^*_{M^*}$
Repeat B = 4000 times
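The pipeline above (resample, reselect by Cp, refit by OLS, record the fitted values at the original covariates) can be sketched as below. This is a toy sketch on simulated polynomial data with a hypothetical cp_select helper, not the talk's R code, and it uses a small B for speed:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for the data: one covariate, quadratic truth.
n = 100
x = rng.normal(size=n)
y = 1 + 2 * x + 1.5 * x**2 + rng.normal(size=n)

def cp_select(x, y, max_degree=6):
    """Pick a polynomial degree by Cp; return (degree, OLS coefficients)."""
    X_full = np.vander(x, max_degree + 1, increasing=True)
    beta_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)
    r = y - X_full @ beta_full
    sigma2 = r @ r / (len(y) - X_full.shape[1])
    best = None
    for d in range(1, max_degree + 1):
        X = np.vander(x, d + 1, increasing=True)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        score = resid @ resid / len(y) + 2 * sigma2 * X.shape[1] / len(y)
        if best is None or score < best[0]:
            best = (score, d, beta)
    return best[1], best[2]

# data* --(Cp)--> M* --(OLS)--> beta* --> mu* evaluated at the original x
B = 200  # the talk uses B = 4000
boot_fits = np.empty((B, n))
for i in range(B):
    idx = rng.integers(0, n, size=n)      # resample with replacement
    d, beta = cp_select(x[idx], y[idx])   # reselect and refit on the bootstrap sample
    boot_fits[i] = np.vander(x, d + 1, increasing=True) @ beta

mu_smooth = boot_fits.mean(axis=0)        # smoothed (bagged) fitted values
```

Each bootstrap replication may choose a different model, so the spread of boot_fits for one subject reflects both coefficient noise and model-selection noise.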
21. Prostate Data
Examine the relation between the level of PSA and clinical measures
n = 97 men who were about to receive a prostatectomy
$x = (x_1, \ldots, x_8)$: clinical measures (standardized so that $x \sim N(0, 1)$)
y = log PSA
Perform a regression of y on x
8 candidate models were identified using regsubsets with nbest=1
We want to estimate $\mu_j = E[y \mid x_j]$, $j = 1, \ldots, 97$
22. Prostate Data
[Figure: histogram of fitted values $\hat{\mu}_{95}$ for subject 95, from B = 4000 nonparametric bootstrap replications of the Cp-chosen model; 60% of the replications are greater than the original estimate of 3.6 (based on the Cp-chosen model).]
23. Prostate Data
[Figure: histograms of fitted values $\hat{\mu}_{95}$ for subject 95, from B = 4000 nonparametric bootstrap replications, separated by the three models most frequently chosen by Cp (m3: 18%, m5: 22%, m7: 24%); original estimate 3.6.]
24. Prostate Data
[Figure: boxplots of fitted values $\hat{\mu}_{95}$ for subject 95 by the model chosen by the Cp criterion, based on B = 4000 nonparametric bootstrap samples; selection frequencies m2: 1%, m3: 18%, m4: 12%, m5: 22%, m6: 15%, m7: 24%, m8: 8%.]
25-26. Questions
Are you convinced there is a problem in the way we do post-selection inference?
Is the juice worth the squeeze?
28. Bagging (Breiman 1996)
Replace the original estimator $\hat{\mu} = t(\mathbf{y})$ with the bootstrap average
$\tilde{\mu} = s(\mathbf{y}) = \frac{1}{B}\sum_{i=1}^{B} t(\mathbf{y}^*_i)$
$\mathbf{y}^*_i$: ith bootstrap sample
Known as model averaging
"If perturbing the learning set can cause significant changes in the predictor constructed, then bagging can improve accuracy"
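As a minimal illustration of the smoothing idea, here is bagging applied to a generic nonsmooth statistic (the sample median on toy data, not the paper's model-selection estimator):

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.exponential(size=50)   # toy data set

def t(sample):
    """A nonsmooth statistic: the sample median."""
    return np.median(sample)

B = 1000
t_star = np.array([
    t(rng.choice(y, size=y.size, replace=True))  # t(y*_i) on bootstrap sample i
    for _ in range(B)
])

mu_hat = t(y)               # original estimator: mu-hat = t(y)
mu_tilde = t_star.mean()    # smoothed (bagged) estimator: mu-tilde = s(y)
```

The smoothed estimator averages over the jumps of the nonsmooth statistic, which is exactly why perturbation-sensitive procedures like model selection stand to benefit.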
30-33. Main Contribution of this Paper
$t^*_i = t(\mathbf{y}^*_i)$, $i = 1, \ldots, B$ (value of the statistic in bootstrap sample i)
$Y^*_{ij}$ = number of times the jth data point appears in the ith bootstrap sample
$cov_j = cov(Y^*_{ij}, t^*_i)$
The nonparametric estimate of the standard deviation of the ideal smoothed bootstrap statistic $\tilde{\mu} = s(\mathbf{y}) = B^{-1}\sum_{i=1}^{B} t(\mathbf{y}^*_i)$ is
$\widetilde{sd} = \left[\sum_{j=1}^{n} cov_j^2\right]^{1/2}$
34. Main Contribution of this Paper (continued)
Note that $cov_j = cov(Y^*_{ij}, t^*_i)$ is an unknown quantity, so we must estimate it. The estimate of the standard deviation of $\tilde{\mu} = s(\mathbf{y})$ in the non-ideal case is
$\widetilde{sd}_B = \left[\sum_{j=1}^{n} \widehat{cov}_j^2\right]^{1/2}$
$\widehat{cov}_j = B^{-1}\sum_{i=1}^{B} (Y^*_{ij} - Y^*_{\cdot j})(t^*_i - t^*_\cdot)$
$Y^*_{\cdot j} = B^{-1}\sum_{i=1}^{B} Y^*_{ij}$, \quad $t^*_\cdot = B^{-1}\sum_{i=1}^{B} t^*_i$
35. Improvement on the Traditional Standard Error
$\widetilde{sd}_B = \left[\sum_{j=1}^{n} \widehat{cov}_j^2\right]^{1/2}$
is always less than or equal to the bootstrap estimate of the standard deviation of the unsmoothed statistic,
$\widehat{sd}_B = \left[B^{-1}\sum_{i=1}^{B} (t^*_i - t^*_\cdot)^2\right]^{1/2}$
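Both quantities are cheap to compute from the same set of bootstrap replications. A numerical sketch, using the sample median as a stand-in statistic (an assumption; the paper's t(y) is the post-model-selection estimate):

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(size=40)
n, B = y.size, 4000

def t(sample):
    """Any statistic; the median is nonsmooth, so smoothing should help."""
    return np.median(sample)

t_star = np.empty(B)
Y = np.empty((B, n))   # Y[i, j] = # of times data point j appears in sample i
for i in range(B):
    idx = rng.integers(0, n, size=n)
    Y[i] = np.bincount(idx, minlength=n)
    t_star[i] = t(y[idx])

# Efron's smoothed standard deviation: sd_B = (sum_j cov_j^2)^(1/2)
cov_j = ((Y - Y.mean(axis=0)) * (t_star - t_star.mean())[:, None]).mean(axis=0)
sd_smooth = np.sqrt(np.sum(cov_j**2))

# Usual bootstrap standard error of the unsmoothed statistic
sd_unsmooth = t_star.std()
```

On nonsmooth statistics like the median, sd_smooth typically comes out below the unsmoothed bootstrap standard error, in line with the theorem.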
41. Software
Analysis was performed in R
LASSO implemented using the glmnet package (Friedman, Hastie, and Tibshirani, 2013)
SCAD and MCP via the coordinate descent algorithm (Breheny and Huang, 2011) in the ncvreg package
BIC and Cp model selection using the leaps package (Lumley, 2009)
44. SCAD, MCP, LASSO
[Figure: 95% confidence intervals (standard, quantile, and smoothed) for the fitted value of subject 95, based on B = 4000 nonparametric bootstrap samples, for the MCP, SCAD, and LASSO penalties.]
45. BIC and Cp
[Figure: lengths of 95% confidence intervals (standard, quantile, and smoothed) for the fitted value of subject 95, based on B = 4000 nonparametric bootstrap samples, for Cp and BIC model selection.]
47. An Example: Parametric Bootstrap Analysis
Obtain OLS estimates $\hat{\mu}_{OLS}$ based on the full model
Generate $\mathbf{y}^* \sim N(\hat{\mu}_{OLS}, I)$ (full-model bootstrap)
$\mathbf{y}^* \xrightarrow{C_p} M^*, \hat{\beta}^*_{M^*} \rightarrow \hat{\mu}^* = X_{M^*}\hat{\beta}^*_{M^*}$
Repeat B = 4000 times $\rightarrow t^*_{ij} = \hat{\mu}^*_{ij}$
Smoothed estimates: $s_j = B^{-1}\sum_{i=1}^{B} t^*_{ij}$
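The parametric variant can be sketched as follows. This is a toy sketch assuming a nested candidate list over the first k covariates and a hypothetical cp_fit helper (the talk built its candidate set with regsubsets in R), with noise variance known to be 1 by construction:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy stand-in: n subjects, p standardized covariates, sparse linear truth.
n, p = 60, 5
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 0.0]) + rng.normal(size=n)

# Full-model OLS fit supplies the parametric bootstrap center.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
mu_ols = X @ beta_ols

def cp_fit(X, y, sigma2=1.0):
    """Cp over nested submodels using the first k covariates; return fitted values."""
    n = len(y)
    best_fit, best_cp = None, np.inf
    for k in range(1, X.shape[1] + 1):
        Xk = X[:, :k]
        beta, *_ = np.linalg.lstsq(Xk, y, rcond=None)
        resid = y - Xk @ beta
        score = resid @ resid / n + 2 * sigma2 * k / n
        if score < best_cp:
            best_cp, best_fit = score, Xk @ beta
    return best_fit

B = 500                      # the talk uses B = 4000
t_star = np.empty((B, n))    # t*_ij = mu-hat*_ij
for i in range(B):
    y_star = mu_ols + rng.normal(size=n)   # y* ~ N(mu_OLS, I)
    t_star[i] = cp_fit(X, y_star)

s = t_star.mean(axis=0)      # smoothed estimates s_j
```

Unlike the nonparametric version, resampling here is from the fitted full model rather than from the empirical distribution of the data.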
48. Parametric Bootstrap
[Figure: histogram of fitted values $\hat{\mu}_{95}$ for subject 95, from B = 4000 parametric bootstrap replications of the Cp-chosen model; 53% of the replications are greater than the original estimate of 3.6 (based on the Cp-chosen model).]
49. Parametric Bootstrap
[Figure: histograms of fitted values $\hat{\mu}_{95}$ for subject 95, from B = 4000 parametric bootstrap replications, separated by the three models most frequently chosen by Cp (m6, m7, m8); original estimate 3.6.]
51. Parametric Bootstrap
[Figure: histogram of fitted values $\hat{\mu}_{95}$ for subject 95, from B = 4000 parametric bootstrap replications of the BIC-chosen model; 40% of the replications are greater than the original estimate of 3.7 (based on the BIC-chosen model).]
52. Parametric Bootstrap
[Figure: histograms of fitted values $\hat{\mu}_{95}$ for subject 95, from B = 4000 parametric bootstrap replications, separated by the three models most frequently chosen by BIC (m1: 27%, m2: 20%, m3: 18%); original estimate 3.7.]
57. What I have done so far
1. BSc Actuarial Math, Concordia (2005-2008)
2. Pension actuary (2008-2011)
3. RA at the Chest with Andrea Benedetti (2011-2012)
4. MSc Biostats, Queen's (2012-2013)