Bradley Efron
Motivation
Bootstrap Smoothing
Results
Estimation and Accuracy after Model Selection by
Bradley Efron (Stanford)
Sahir Rai Bhatnagar
McGill University
sahir.bhatnagar@mail.mcgill.ca
April 7, 2014
Who?
Achievements
Some Quotes
Born in St. Paul, Minnesota
in 1938 to Jewish-Russian
immigrants
B.S., Mathematics Caltech
(1960)
Ph.D., Statistics (1964)
under the direction of
Rupert Miller and Herb
Solomon
Professor of Statistics at
Stanford for the past 50
years
Best known for the
bootstrap (Annals of
Statistics, 1979)
Founding Editor Annals of
Applied Statistics
Awarded Guy Medal in Gold
from RSS (2014) (34
awarded since 1892 including
Rao, Cox, Fisher, Nelder)
National Medal of Science 2005
Established by Congress in 1959 and administered by the National
Science Foundation, the medal is the nation’s highest scientific
honour
“Statistics is the science of information
gathering, especially when the information
arrives in little pieces instead of big ones”
“Statistics did not come naturally to me.
Dad's keeping score for the baseball league
helped a lot”
“I spent the first year at Stanford in the
Math Department...After, I started taking
stats courses, which I thought would be
easy. In fact I found them harder”
A Quick Review of the Bootstrap
Typical Model Selection Setting
Cholesterol Data Example
Prostate Data Example
Look at the data: one response, many covariates
Identify list of candidate models M
$2^p$ submodels
linear, quadratic, cubic . . .
Perform Model Selection (see Abbas class notes)
Do inference based on chosen model
Prediction
Confidence Intervals
Today’s Question: Should we care about the variability of the
variable selection step in our post-selection inference?
An Example
n = 164 men took Cholestyramine (meant to reduce
cholesterol in the blood) for 7 years
x: a compliance measure, adjusted so that x ∼ N(0, 1)
y: cholesterol decrease
Perform a regression of y on x
We want to predict cholesterol decrease for a given
compliance value
µ = E[y|x]
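An Example
Multiple Linear Regression Model
$$Y = X\beta + \epsilon, \qquad \epsilon_i \sim N(0, \sigma^2)$$
6 candidate models: M = {linear, quadratic, . . . , sextic}, e.g.
$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_6 x^6 + \epsilon$$
Cp Criterion for Model Selection
$$C_p(M) = \underbrace{\frac{SS_{res}(M)}{n}}_{\text{goodness of fit}} + \underbrace{\frac{2\sigma^2 p_M}{n}}_{\text{complexity}}$$
Use OLS estimate for β from chosen model and predict:
$$\hat{\mu} = X\hat{\beta}$$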
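A minimal sketch of the Cp selection step, assuming the residual sums of squares have already been computed for each candidate model. The function names, the SS_res values, the parameter counts and the value of σ² are all made up for illustration; only n = 164 comes from the cholesterol example.

```python
def mallows_cp(ss_res, n, sigma2, p_model):
    """Cp(M) = SS_res(M)/n + 2*sigma^2*p_M/n (goodness of fit + complexity)."""
    return ss_res / n + 2.0 * sigma2 * p_model / n


def select_by_cp(candidates, n, sigma2):
    """Return (best model name, all scores).

    `candidates` maps a model name to (SS_res, number of parameters).
    """
    scores = {
        name: mallows_cp(ss_res, n, sigma2, p_model)
        for name, (ss_res, p_model) in candidates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical SS_res values; p_M counts the intercept plus polynomial terms.
candidates = {
    "linear": (820.0, 2),
    "quadratic": (790.0, 3),
    "cubic": (760.0, 4),
    "quartic": (758.0, 5),
}
best, scores = select_by_cp(candidates, n=164, sigma2=4.5)
print(best)
```

With these invented numbers the quartic model fits slightly better than the cubic, but the complexity term outweighs the gain, so the cubic has the smallest Cp.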
An Example: Nonparametric Bootstrap Analysis
Bootstrap the data:
$$data^* = \{(x_j, y_j)^*,\ j = 1, \dots, n\}$$
where $(x_j, y_j)^*$ are drawn randomly with replacement from the
original data
$$data^* \xrightarrow{C_p} M^* \xrightarrow{OLS} \hat{\beta}^*_{M^*} \rightarrow \hat{\mu}^* = X_{M^*}\hat{\beta}^*_{M^*}$$
Repeat B = 4000 times
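The resampling loop can be sketched as follows. This is a simplified illustration, not the analysis from the talk: the Cp selection step is left out and a single straight-line OLS fit stands in for the selected model. The data values and function names are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        # Degenerate resample (every x identical): fall back to the mean of y.
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def bootstrap_fitted_values(xs, ys, x0, B=4000, seed=0):
    """Resample (x, y) pairs with replacement B times; return fitted values at x0."""
    rng = random.Random(seed)
    n = len(xs)
    fitted = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n pairs with replacement
        b0, b1 = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        fitted.append(b0 + b1 * x0)
    return fitted


# Made-up compliance / cholesterol-decrease pairs, for illustration only.
xs = [-1.5, -0.8, -0.2, 0.1, 0.6, 1.1, 1.9]
ys = [2.0, 9.5, 14.0, 20.5, 31.0, 38.5, 60.0]
reps = bootstrap_fitted_values(xs, ys, x0=1.0, B=500, seed=1)
print(len(reps), min(reps), max(reps))
```

To mirror the pipeline on the slide, the call to `fit_line` would be replaced by a function that first selects a model on the resampled data and then fits it.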
[Figure: reproduced from Efron 2013]
Prostate Data
Examine relation between level of PSA and clinical measures
n = 97 men who were about to receive prostatectomy
x = (x1, . . . , x8): clinical measures, adjusted so that x ∼ N(0, 1)
y = log PSA
Perform regression of y on x
8 candidate models were identified using regsubsets and
nbest=1
We want to estimate
µj = E [y|xj ] , j = 1, . . . , 97
[Figure: histogram of fitted value $\hat{\mu}_{95}$ (x-axis) against count (y-axis); original estimate = 3.6, based on Cp chosen model]
Fitted values for subject 95, from B=4000 nonparametric bootstrap replications
of the Cp chosen model; 60% of the replications greater than the
original estimate 3.6
[Figure: histograms of fitted value $\hat{\mu}_{95}$ (x-axis) against count (y-axis) for models m3, m5 and m7, with percentage labels 18%, 22% and 24%; original estimate = 3.6, based on Cp chosen model]
Fitted values for subject 95, from B=4000 nonparametric bootstrap
replications separated by three most frequently chosen models by Cp
[Figure: boxplots of fitted value $\hat{\mu}_{95}$ (y-axis) by model (x-axis). Selection frequencies: m2 1%, m3 18%, m4 12%, m5 22%, m6 15%, m7 24%, m8 8%. Model 7 is marked with asterisks.]
Boxplot of fitted values for Subject 95 for the model chosen by Cp criteria
based on B=4000 nonparametric bootstrap samples
Questions
Are you convinced there is a problem in the way we do
post-selection inference?
Is the juice worth the squeeze ?
Idea
Standard Errors
Theorem
Confidence Intervals
Bagging (Breiman 1996)
Replace original estimator $\hat{\mu} = t(y)$ with bootstrap average
$$\tilde{\mu} = s(y) = \frac{1}{B}\sum_{i=1}^{B} t(y^*_i)$$
$y^*_i$: ith bootstrap sample
Known as model averaging
“If perturbing the learning set can cause significant changes in
the predictor constructed, then bagging can improve accuracy”
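A small sketch of the bootstrap average s(y), assuming the statistic t can be evaluated on any resampled list. The sample median is used here only as a convenient example of an estimator that jumps when the data are perturbed; the data and the function name `bagged_estimate` are hypothetical.

```python
import random
import statistics


def bagged_estimate(data, t, B=4000, seed=0):
    """Average the statistic t over B bootstrap resamples of `data`."""
    rng = random.Random(seed)
    n = len(data)
    total = 0.0
    for _ in range(B):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        total += t(resample)
    return total / B


# Made-up sample, for illustration only.
data = [2.1, 2.4, 2.5, 3.9, 4.0, 4.2, 7.5]
mu_hat = statistics.median(data)  # original estimator t(y)
mu_tilde = bagged_estimate(data, statistics.median, B=1000, seed=3)  # s(y)
print(mu_hat, mu_tilde)
```

The original estimate is one of the observed values, whereas the bagged estimate is a weighted blend of the medians seen across resamples.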
Main Contribution of this Paper
$t^*_i = t(y^*_i),\ i = 1, \dots, B$ (value of statistic in boot sample i)
$Y^*_{ij}$ = # of times jth data point appears in ith boot sample
$cov_j = cov(Y^*_{ij}, t^*_i)$
The non-parametric estimate of standard deviation for the ideal
smoothed bootstrap statistic $\tilde{\mu} = s(y) = B^{-1}\sum_{i=1}^{B} t(y^*_i)$ is
$$\widetilde{sd} = \left[\sum_{j=1}^{n} cov_j^2\right]^{1/2}$$
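Main Contribution of this Paper
Note that $cov_j = cov(Y^*_{ij}, t^*_i)$ is an unknown quantity. Therefore
we must estimate it. The estimate of standard deviation for
$\tilde{\mu} = s(y)$ in the non-ideal case is
$$\widetilde{sd}_B = \left[\sum_{j=1}^{n} \widehat{cov}_j^2\right]^{1/2}$$
$$\widehat{cov}_j = B^{-1}\sum_{i=1}^{B}\left(Y^*_{ij} - Y^*_{\cdot j}\right)\left(t^*_i - t^*_{\cdot}\right)$$
$$Y^*_{\cdot j} = B^{-1}\sum_{i=1}^{B} Y^*_{ij}, \qquad t^*_{\cdot} = B^{-1}\sum_{i=1}^{B} t^*_i$$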
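The estimate above needs only the resampling counts and the bootstrap replications of the statistic. The sketch below is one way to compute it, under the assumption that the counts are stored as a B × n list of lists; the function names and the data are hypothetical, and the sample mean is used as the statistic purely to keep the example short.

```python
import math
import random


def resample_counts(n, rng):
    """One bootstrap draw, recorded as counts: entry j is Y*_ij."""
    counts = [0] * n
    for _ in range(n):
        counts[rng.randrange(n)] += 1
    return counts


def smoothed_sd(counts, t_values):
    """sqrt(sum_j cov_j^2), with cov_j the bootstrap covariance of Y*_ij and t*_i."""
    B = len(t_values)
    n = len(counts[0])
    t_bar = sum(t_values) / B
    total = 0.0
    for j in range(n):
        y_bar_j = sum(row[j] for row in counts) / B
        cov_j = sum(
            (row[j] - y_bar_j) * (t - t_bar) for row, t in zip(counts, t_values)
        ) / B
        total += cov_j ** 2
    return math.sqrt(total)


# Made-up data; the statistic is the mean of each resample, written via the counts.
data = [2.1, 2.4, 2.5, 3.9, 4.0, 4.2, 7.5]
rng = random.Random(4)
counts = [resample_counts(len(data), rng) for _ in range(2000)]
t_values = [sum(c * x for c, x in zip(row, data)) / len(data) for row in counts]
mu_tilde = sum(t_values) / len(t_values)  # smoothed estimate
sd_tilde = smoothed_sd(counts, t_values)  # its estimated standard deviation
print(mu_tilde, sd_tilde)
```

Storing counts rather than the resampled data keeps the covariance computation independent of how expensive the statistic itself is.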
Improvement on Traditional Standard Error
$$\widetilde{sd}_B = \left[\sum_{j=1}^{n} \widehat{cov}_j^2\right]^{1/2}$$
is always less than the bootstrap estimate of standard deviation for
the unsmoothed statistic
$$\widehat{sd}_B = \left[B^{-1}\sum_{i=1}^{B} \left(t^*_i - t^*_{\cdot}\right)^2\right]^{1/2}$$
Three Types
1 Standard
$$\hat{\mu} \pm 1.96\,\widehat{sd}_B$$
2 Percentile
$$\left[\hat{\mu}^{*(0.025)},\ \hat{\mu}^{*(0.975)}\right]$$
3 Smoothed
$$\tilde{\mu} \pm 1.96\,\widetilde{sd}_B$$
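The three interval types can be sketched as below. The fitted values and standard deviations plugged in are the LASSO rows for Observation 95 from the results table later in the deck; the function names and the simple order-statistic rule used for the percentile interval are assumptions of this sketch, not necessarily what the talk used.

```python
def normal_interval(center, sd, z=1.96):
    """center +/- z * sd; used for both the standard and the smoothed interval."""
    return (center - z * sd, center + z * sd)


def percentile_interval(t_values, alpha=0.05):
    """Empirical alpha/2 and 1 - alpha/2 quantiles of the bootstrap replications."""
    ordered = sorted(t_values)
    last = len(ordered) - 1
    lower = ordered[int((alpha / 2) * last)]
    upper = ordered[int((1 - alpha / 2) * last)]
    return (lower, upper)


# Standard: original estimate with the unsmoothed bootstrap sd.
standard = normal_interval(3.62, 0.31)
# Smoothed: bootstrap average with the smoothed sd.
smoothed = normal_interval(3.57, 0.29)
print(standard[1] - standard[0], smoothed[1] - smoothed[0])
```

The resulting lengths, about 1.22 and 1.14, agree with the table's 1.21 and 1.14 up to rounding of the reported standard deviations.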
Setting
Prostate Data: Revisited
Parametric Bootstrap
Discussion
L1-Norm Penalty Functions
Recall the optimization problem of interest:
$$\max_{\beta}\left\{\ell_n(\beta) - n\sum_{j=1}^{p} p(|\beta_j|; \lambda)\right\}$$
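LASSO, SCAD and MCP penalties
LASSO (Tibshirani, 1996)
$$p(|\beta|; \lambda) = \lambda|\beta|$$
SCAD (Fan and Li, 2001)
$$p'(|\beta|; \lambda, \gamma) = \lambda\,\mathrm{sign}(\beta)\left[I_{(|\beta| \le \lambda)} + \frac{(\gamma\lambda - |\beta|)_+}{(\gamma - 1)\lambda}\, I_{(|\beta| > \lambda)}\right], \quad \gamma > 2$$
MCP (Zhang, 2010)
$$p(|\beta|; \lambda, \gamma) = \begin{cases} \lambda|\beta| - \dfrac{|\beta|^2}{2\gamma}, & |\beta| \le \gamma\lambda \\ \dfrac{\gamma\lambda^2}{2}, & |\beta| > \gamma\lambda \end{cases}$$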
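For readers who want to evaluate these penalties numerically, a direct transcription might look like the following. The function names and the default values of γ (3.7 for SCAD, 3.0 for MCP) are assumptions of this sketch; note that the SCAD expression on the slide is the derivative of the penalty, so that is what the middle function returns.

```python
def lasso_penalty(beta, lam):
    """LASSO penalty: lam * |beta|."""
    return lam * abs(beta)


def scad_derivative(beta, lam, gamma=3.7):
    """Derivative of the SCAD penalty with respect to beta (requires gamma > 2)."""
    if gamma <= 2:
        raise ValueError("SCAD requires gamma > 2")
    a = abs(beta)
    sign = (beta > 0) - (beta < 0)
    if a <= lam:
        inner = 1.0
    else:
        inner = max(gamma * lam - a, 0.0) / ((gamma - 1.0) * lam)
    return lam * sign * inner


def mcp_penalty(beta, lam, gamma=3.0):
    """MCP penalty: quadratic taper up to gamma * lam, constant afterwards."""
    a = abs(beta)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return gamma * lam * lam / 2.0


for b in (0.5, 2.0, 5.0):
    print(b, lasso_penalty(b, 1.0), scad_derivative(b, 1.0), mcp_penalty(b, 1.0))
```

For large |β| the SCAD derivative drops to zero and the MCP penalty flattens out, while the LASSO penalty keeps growing linearly; this is the sense in which the two non-convex penalties shrink large coefficients less.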
Software
Analysis was performed in R
Implement LASSO using the glmnet package (Friedman,
Hastie, Tibshirani, 2013)
SCAD and MCP using the coordinate descent algorithm
(Breheny and Huang, 2011) in the ncvreg package
BIC and Cp model selection using the leaps package
(Lumley, 2009)
[Figure: three histogram panels (MCP, SCAD, LASSO) of fitted value $\hat{\mu}_{95}$ (x-axis) against count (y-axis)]
Fitted values for subject 95, from B=4000 nonparametric bootstrap replications
[Figure: two histogram panels (BIC, Cp) of fitted value $\hat{\mu}_{95}$ (x-axis, from −10 to 10) against count (y-axis)]
Fitted values for subject 95, from B=4000 nonparametric bootstrap replications
SCAD, MCP, LASSO
[Figure: interval length (x-axis, 0.0 to 1.0) by penalty (LASSO, SCAD, MCP) and interval type (standard, quantile, smooth)]
95% Confidence Intervals for fitted value of Subject 95 based
on B=4000 nonparametric bootstrap samples for MCP, SCAD and LASSO penalties
BIC and Cp
[Figure: interval length (x-axis, 0 to 20) by criterion (BIC, Cp) and interval type (standard, quantile, smooth)]
Length of 95% Confidence Intervals for fitted value of Subject 95 based
on B=4000 nonparametric bootstrap samples for Cp and BIC
Table: Prostate data, B=4000, Observation 95
model  type      fitted value  sd    length  coverage
LASSO  standard  3.62          0.31  1.21    0.94
       quantile                      1.20    0.95
       smooth    3.57          0.29  1.14    0.93
SCAD   standard  3.60          0.35  1.37    0.95
       quantile                      1.33    0.95
       smooth    3.62          0.33  1.28    0.93
MCP    standard  3.60          0.35  1.38    0.96
       quantile                      1.35    0.95
       smooth    3.61          0.33  1.29    0.94
BIC    standard  5.50          4.75  18.62   0.84
       quantile                      16.05   0.95
       smooth    3.22          3.46  13.55   0.83
Cp     standard  5.13          5.11  20.02   0.86
       quantile                      16.15   0.95
       smooth    0.64          4.40  17.24   0.97
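An Example: Parametric Bootstrap Analysis
Obtain OLS estimates $\hat{\mu}_{OLS}$ based on full model
Generate
$$y^* \sim N(\hat{\mu}_{OLS}, I)$$
Full Model Bootstrap
$$y^* \xrightarrow{C_p} M^*,\ \hat{\beta}^*_{M^*} \rightarrow \hat{\mu}^* = X_{M^*}\hat{\beta}^*_{M^*}$$
Repeat B = 4000 times $\rightarrow t^*_{ij} = \hat{\mu}^*_{ij}$
Smoothed Estimates
$$s_j = B^{-1}\sum_{i=1}^{B} t^*_{ij}$$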
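A rough sketch of the parametric version, assuming the full-model fitted values are already available as a list. The Cp selection and refit are replaced here by a hypothetical hard-threshold rule (`hard_threshold`), chosen only because it is a simple discontinuous estimator; the input values and all names are made up for illustration.

```python
import random


def parametric_smoothed_estimates(mu_hat, t, B=4000, seed=0):
    """Draw y* ~ N(mu_hat, I), apply t to each draw, and average componentwise."""
    rng = random.Random(seed)
    n = len(mu_hat)
    sums = [0.0] * n
    for _ in range(B):
        y_star = [m + rng.gauss(0.0, 1.0) for m in mu_hat]
        t_star = t(y_star)  # t*_ij for j = 1..n in this replication
        for j in range(n):
            sums[j] += t_star[j]
    return [s / B for s in sums]


def hard_threshold(y, cutoff=1.0):
    """Hypothetical stand-in for selection: keep a component only if it is large."""
    return [v if abs(v) > cutoff else 0.0 for v in y]


mu_hat = [0.2, 1.1, 3.0]  # made-up full-model fitted values
s = parametric_smoothed_estimates(mu_hat, hard_threshold, B=2000, seed=5)
print(s)
```

Each component of the result is a smooth function of `mu_hat` even though the thresholding rule itself jumps, which is the point of averaging over the bootstrap draws.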
[Figure: histogram of fitted value $\hat{\mu}_{95}$ (x-axis) against count (y-axis); original estimate = 3.6, based on Cp chosen model]
Fitted values for subject 95, from B=4000 parametric bootstrap replications
of the Cp chosen model; 53% of the replications greater than the
original estimate 3.6
[Figure: histograms of fitted value $\hat{\mu}_{95}$ (x-axis) against count (y-axis) for models m6, m7 and m8; original estimate = 3.6, based on Cp chosen model]
Fitted values for subject 95, from B=4000 parametric bootstrap
replications separated by three most frequently chosen models by Cp
[Figure: boxplots of fitted value $\hat{\mu}_{95}$ (y-axis) by model (x-axis). Selection frequencies: m1 3%, m2 6%, m3 16%, m4 12%, m5 13%, m6 14%, m7 17%, m8 19%. Model 8 is marked with asterisks.]
Boxplot of fitted values for Subject 95 for the model chosen by Cp criteria
based on B=4000 parametric bootstrap samples
[Figure: histogram of fitted value $\hat{\mu}_{95}$ (x-axis) against count (y-axis); original estimate = 3.7, based on BIC chosen model]
Fitted values for subject 95, from B=4000 parametric bootstrap replications
of the BIC chosen model; 40% of the replications greater than the original estimate 3.7
[Figure: histograms of fitted value $\hat{\mu}_{95}$ (x-axis) against count (y-axis) for models m1, m2 and m3, with percentage labels 27%, 20% and 18%; original estimate = 3.7, based on BIC chosen model]
Fitted values for subject 95, from B=4000 parametric bootstrap
replications separated by three most frequently chosen models by BIC
[Figure: boxplots of fitted value $\hat{\mu}_{95}$ (y-axis) by model (x-axis). Selection frequencies: m1 20%, m2 18%, m3 27%, m4 13%, m5 9%, m6 5%, m7 5%, m8 3%. Model 3 is marked with asterisks.]
Boxplot of fitted values for Subject 95 for the model chosen by BIC criteria
based on B=4000 parametric bootstrap samples
Improvements for regularized procedures where tuning
parameters are also chosen in a data-driven fashion
GLM ?
Why parametric bootstrap?
Family
Roots
What I have done so far
1 BSc Actuarial Math - Concordia (2005-2008)
2 Pension actuary (2008-2011)
3 RA at the Chest with Andrea Benedetti (2011-2012)
4 MSc Biostats - Queen’s (2012-2013)
What’s Next?
1 PhD Biostatistics - McGill (2013-???)
2 Supervisor Celia Greenwood (Statistical Genetics)

More Related Content

Viewers also liked

Ciclo basico diurno vigencia 2009 scp
Ciclo basico diurno vigencia 2009 scpCiclo basico diurno vigencia 2009 scp
Ciclo basico diurno vigencia 2009 scpRuth Santana
 
P pt keys for good and happy life.
P pt keys for good and happy life.P pt keys for good and happy life.
P pt keys for good and happy life.Rajasekhar Dasari
 
المعارض والمؤتمرات شهر يونيو
المعارض والمؤتمرات شهر يونيوالمعارض والمؤتمرات شهر يونيو
المعارض والمؤتمرات شهر يونيوPalestinian Business Forum
 
Cilmatic Risk Assessment Of Southern Express Way in Sri Lanka
Cilmatic Risk Assessment Of Southern Express Way in Sri LankaCilmatic Risk Assessment Of Southern Express Way in Sri Lanka
Cilmatic Risk Assessment Of Southern Express Way in Sri LankaMaersk Line
 
Hospice letter
Hospice letterHospice letter
Hospice letternm118486
 
Diy homemade business cards
Diy homemade business cardsDiy homemade business cards
Diy homemade business cardsSameDay Printing
 
Yesu Nallavar -Testimony of Sister Nirmala
Yesu Nallavar -Testimony of Sister NirmalaYesu Nallavar -Testimony of Sister Nirmala
Yesu Nallavar -Testimony of Sister NirmalaRaja Venkatesan
 
Control prenatal
Control prenatal Control prenatal
Control prenatal pedrothg
 
From SOA to SCA and FraSCAti
From SOA to SCA and FraSCAtiFrom SOA to SCA and FraSCAti
From SOA to SCA and FraSCAtiphilippe_merle
 
Geography of Bihar by Eithasab Ahmed
Geography of Bihar by Eithasab AhmedGeography of Bihar by Eithasab Ahmed
Geography of Bihar by Eithasab AhmedPonnuru Varun
 

Viewers also liked (13)

Ciclo basico diurno vigencia 2009 scp
Ciclo basico diurno vigencia 2009 scpCiclo basico diurno vigencia 2009 scp
Ciclo basico diurno vigencia 2009 scp
 
P pt keys for good and happy life.
P pt keys for good and happy life.P pt keys for good and happy life.
P pt keys for good and happy life.
 
المعارض والمؤتمرات شهر يونيو
المعارض والمؤتمرات شهر يونيوالمعارض والمؤتمرات شهر يونيو
المعارض والمؤتمرات شهر يونيو
 
Beijing2011
Beijing2011Beijing2011
Beijing2011
 
Modul 1 bahasa indonesia kb 1
Modul 1 bahasa indonesia kb 1Modul 1 bahasa indonesia kb 1
Modul 1 bahasa indonesia kb 1
 
Cilmatic Risk Assessment Of Southern Express Way in Sri Lanka
Cilmatic Risk Assessment Of Southern Express Way in Sri LankaCilmatic Risk Assessment Of Southern Express Way in Sri Lanka
Cilmatic Risk Assessment Of Southern Express Way in Sri Lanka
 
Hospice letter
Hospice letterHospice letter
Hospice letter
 
Diy homemade business cards
Diy homemade business cardsDiy homemade business cards
Diy homemade business cards
 
Auraplus ciuziniai
Auraplus ciuziniaiAuraplus ciuziniai
Auraplus ciuziniai
 
Yesu Nallavar -Testimony of Sister Nirmala
Yesu Nallavar -Testimony of Sister NirmalaYesu Nallavar -Testimony of Sister Nirmala
Yesu Nallavar -Testimony of Sister Nirmala
 
Control prenatal
Control prenatal Control prenatal
Control prenatal
 
From SOA to SCA and FraSCAti
From SOA to SCA and FraSCAtiFrom SOA to SCA and FraSCAti
From SOA to SCA and FraSCAti
 
Geography of Bihar by Eithasab Ahmed
Geography of Bihar by Eithasab AhmedGeography of Bihar by Eithasab Ahmed
Geography of Bihar by Eithasab Ahmed
 

Similar to Estimation and Accuracy after Model Selection

Euro 2013 barrow crone - slideshare
Euro 2013 barrow crone - slideshareEuro 2013 barrow crone - slideshare
Euro 2013 barrow crone - slideshareDevon Barrow
 
Absolute risk estimation in a case cohort study of prostate cancer
Absolute risk estimation in a case cohort study of prostate cancerAbsolute risk estimation in a case cohort study of prostate cancer
Absolute risk estimation in a case cohort study of prostate cancersahirbhatnagar
 
Identifying the sampling distribution module5
Identifying the sampling distribution module5Identifying the sampling distribution module5
Identifying the sampling distribution module5REYEMMANUELILUMBA
 
APHA 2012: Hierarchical Multiple Informants Model
APHA 2012: Hierarchical Multiple Informants ModelAPHA 2012: Hierarchical Multiple Informants Model
APHA 2012: Hierarchical Multiple Informants ModelJonggyu Baek
 
slides-correlations.pdf
slides-correlations.pdfslides-correlations.pdf
slides-correlations.pdfFlorentBersani
 
Optimization Method for Weighting Explicit and Latent Concepts in Clinical De...
Optimization Method for Weighting Explicit and Latent Concepts in Clinical De...Optimization Method for Weighting Explicit and Latent Concepts in Clinical De...
Optimization Method for Weighting Explicit and Latent Concepts in Clinical De...Saeid Balaneshinkordan (Balaneshin-kordan)
 
MLA - Austin - Efficiently searching for systematic reviews
MLA - Austin - Efficiently searching for systematic reviewsMLA - Austin - Efficiently searching for systematic reviews
MLA - Austin - Efficiently searching for systematic reviewsWichor Bramer
 
Unit-4 classification
Unit-4 classificationUnit-4 classification
Unit-4 classificationLokarchanaD
 
Classification in Data Mining
Classification in Data MiningClassification in Data Mining
Classification in Data MiningRashmi Bhat
 
Statistics chapter1
Statistics chapter1Statistics chapter1
Statistics chapter1cabadia
 
Statistics pres 10 27 2015 roy sabo
Statistics pres 10 27 2015   roy saboStatistics pres 10 27 2015   roy sabo
Statistics pres 10 27 2015 roy sabotjcarter
 
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...Vahid Taslimitehrani
 
Non-parametric analysis of models and data
Non-parametric analysis of models and dataNon-parametric analysis of models and data
Non-parametric analysis of models and datahaharrington
 
MH Prediction Modeling and Validation -clean
MH Prediction Modeling and Validation -cleanMH Prediction Modeling and Validation -clean
MH Prediction Modeling and Validation -cleanMin-hyung Kim
 

Similar to Estimation and Accuracy after Model Selection (17)

Euro 2013 barrow crone - slideshare
Euro 2013 barrow crone - slideshareEuro 2013 barrow crone - slideshare
Euro 2013 barrow crone - slideshare
 
Absolute risk estimation in a case cohort study of prostate cancer
Absolute risk estimation in a case cohort study of prostate cancerAbsolute risk estimation in a case cohort study of prostate cancer
Absolute risk estimation in a case cohort study of prostate cancer
 
Overview of statistical tests: Data handling and data quality (Part II)
Overview of statistical tests: Data handling and data quality (Part II)Overview of statistical tests: Data handling and data quality (Part II)
Overview of statistical tests: Data handling and data quality (Part II)
 
Identifying the sampling distribution module5
Identifying the sampling distribution module5Identifying the sampling distribution module5
Identifying the sampling distribution module5
 
APHA 2012: Hierarchical Multiple Informants Model
APHA 2012: Hierarchical Multiple Informants ModelAPHA 2012: Hierarchical Multiple Informants Model
APHA 2012: Hierarchical Multiple Informants Model
 
slides-correlations.pdf
slides-correlations.pdfslides-correlations.pdf
slides-correlations.pdf
 
Optimization Method for Weighting Explicit and Latent Concepts in Clinical De...
Optimization Method for Weighting Explicit and Latent Concepts in Clinical De...Optimization Method for Weighting Explicit and Latent Concepts in Clinical De...
Optimization Method for Weighting Explicit and Latent Concepts in Clinical De...
 
sampling.pdf
sampling.pdfsampling.pdf
sampling.pdf
 
Advanced Statistics.pptx
Advanced Statistics.pptxAdvanced Statistics.pptx
Advanced Statistics.pptx
 
MLA - Austin - Efficiently searching for systematic reviews
MLA - Austin - Efficiently searching for systematic reviewsMLA - Austin - Efficiently searching for systematic reviews
MLA - Austin - Efficiently searching for systematic reviews
 
Unit-4 classification
Unit-4 classificationUnit-4 classification
Unit-4 classification
 
Classification in Data Mining
Classification in Data MiningClassification in Data Mining
Classification in Data Mining
 
Statistics chapter1
Statistics chapter1Statistics chapter1
Statistics chapter1
 
Statistics pres 10 27 2015 roy sabo
Statistics pres 10 27 2015   roy saboStatistics pres 10 27 2015   roy sabo
Statistics pres 10 27 2015 roy sabo
 
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...
 
Non-parametric analysis of models and data
Non-parametric analysis of models and dataNon-parametric analysis of models and data
Non-parametric analysis of models and data
 
MH Prediction Modeling and Validation -clean
MH Prediction Modeling and Validation -cleanMH Prediction Modeling and Validation -clean
MH Prediction Modeling and Validation -clean
 

More from sahirbhatnagar

Strong Heredity Models in High Dimensional Data
Strong Heredity Models in High Dimensional DataStrong Heredity Models in High Dimensional Data
Strong Heredity Models in High Dimensional Datasahirbhatnagar
 
Methods for High Dimensional Interactions
Methods for High Dimensional InteractionsMethods for High Dimensional Interactions
Methods for High Dimensional Interactionssahirbhatnagar
 
An introduction to knitr and R Markdown
An introduction to knitr and R MarkdownAn introduction to knitr and R Markdown
An introduction to knitr and R Markdownsahirbhatnagar
 
Reproducible Research: An Introduction to knitr
Reproducible Research: An Introduction to knitrReproducible Research: An Introduction to knitr
Reproducible Research: An Introduction to knitrsahirbhatnagar
 
Analysis of DNA methylation and Gene expression to predict childhood obesity
Analysis of DNA methylation and Gene expression to predict childhood obesityAnalysis of DNA methylation and Gene expression to predict childhood obesity
Analysis of DNA methylation and Gene expression to predict childhood obesitysahirbhatnagar
 
Computational methods for case-cohort studies
Computational methods for case-cohort studiesComputational methods for case-cohort studies
Computational methods for case-cohort studiessahirbhatnagar
 
Factors influencing participation in cancer screening
Factors influencing participation in cancer screeningFactors influencing participation in cancer screening
Factors influencing participation in cancer screeningsahirbhatnagar
 
Methylation and Expression data integration
Methylation and Expression data integrationMethylation and Expression data integration
Methylation and Expression data integrationsahirbhatnagar
 

More from sahirbhatnagar (11)

Strong Heredity Models in High Dimensional Data
Strong Heredity Models in High Dimensional DataStrong Heredity Models in High Dimensional Data
Strong Heredity Models in High Dimensional Data
 
Methods for High Dimensional Interactions
Methods for High Dimensional InteractionsMethods for High Dimensional Interactions
Methods for High Dimensional Interactions
 
An introduction to knitr and R Markdown
An introduction to knitr and R MarkdownAn introduction to knitr and R Markdown
An introduction to knitr and R Markdown
 
Atelier r-gerad
Atelier r-geradAtelier r-gerad
Atelier r-gerad
 
Reproducible Research: An Introduction to knitr
Reproducible Research: An Introduction to knitrReproducible Research: An Introduction to knitr
Reproducible Research: An Introduction to knitr
 
Analysis of DNA methylation and Gene expression to predict childhood obesity
Analysis of DNA methylation and Gene expression to predict childhood obesityAnalysis of DNA methylation and Gene expression to predict childhood obesity
Analysis of DNA methylation and Gene expression to predict childhood obesity
 
Computational methods for case-cohort studies
Computational methods for case-cohort studiesComputational methods for case-cohort studies
Computational methods for case-cohort studies
 
Factors influencing participation in cancer screening
Factors influencing participation in cancer screeningFactors influencing participation in cancer screening
Factors influencing participation in cancer screening
 
Introduction to LaTeX
Introduction to LaTeXIntroduction to LaTeX
Introduction to LaTeX
 
Methylation and Expression data integration
Methylation and Expression data integrationMethylation and Expression data integration
Methylation and Expression data integration
 
Reproducible Research
Reproducible ResearchReproducible Research
Reproducible Research
 

Estimation and Accuracy after Model Selection

  • 1. Estimation and Accuracy after Model Selection, by Bradley Efron (Stanford). Sahir Rai Bhatnagar, McGill University, sahir.bhatnagar@mail.mcgill.ca. April 7, 2014. Outline: Bradley Efron; Motivation; Bootstrap Smoothing; Results.
  • 3. [Section: Bradley Efron. Topics: Who?; Achievements; Some Quotes] Born in St. Paul, Minnesota in 1938 to Jewish-Russian immigrants. B.S., Mathematics, Caltech (1960). Ph.D., Statistics (1964), under the direction of Rupert Miller and Herb Solomon. Professor of Statistics at Stanford for the past 50 years.
  • 4. Best known for the bootstrap, Annals of Statistics (1979). Founding Editor, Annals of Applied Statistics. Awarded the Guy Medal in Gold from the RSS (2014); 34 have been awarded since 1892, including to Rao, Cox, Fisher and Nelder.
  • 5. National Medal of Science, 2005. Established by Congress in 1959 and administered by the National Science Foundation, the medal is the nation’s highest scientific honour.
  • 8. Some quotes: “Statistics is the science of information gathering, especially when the information arrives in little pieces instead of big ones.” “Statistics did not come naturally to me. Dad’s keeping score for the baseball league helped a lot.” “I spent the first year at Stanford in the Math Department... After, I started taking stats courses, which I thought would be easy. In fact I found them harder.”
  • 9. [Section: Motivation. Topics: A Quick Review of the Bootstrap; Typical Model Selection Setting; Cholesterol Data Example; Prostate Data Example]
  • 15. Typical model selection setting: (1) Look at the data: one response, many covariates. (2) Identify a list of candidate models $\mathcal{M}$: $2^p$ submodels; linear, quadratic, cubic, ... (3) Perform model selection (see Abbas's class notes). (4) Do inference based on the chosen model: prediction, confidence intervals. Today’s question: should we care about the variability of the variable selection step in our post-selection inference?
  • 16. An example (cholesterol data): n = 164 men took cholestyramine (meant to reduce cholesterol in the blood) for 7 years. x: a compliance measure, adjusted so that $x \sim N(0, 1)$. y: cholesterol decrease. Perform a regression of y on x. We want to predict the cholesterol decrease for a given compliance value, $\mu = E[y \mid x]$.
  • 17. An example, multiple linear regression model: $Y = X\beta + \epsilon$, $\epsilon_i \sim N(0, \sigma^2)$. Six candidate models: $\mathcal{M}$ = {linear, quadratic, ..., sextic}, e.g. $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_6 x^6 + \epsilon$. $C_p$ criterion for model selection: $C_p(M) = \frac{SS_{res}(M)}{n} + \frac{2\sigma^2 p_M}{n}$, where the first term measures goodness of fit and the second complexity. Use the OLS estimate of $\beta$ from the chosen model and predict: $\hat\mu = X\hat\beta$.
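
The Cp selection step on slide 17 can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the helper names (`solve`, `fit_poly`, `cp_select`) are hypothetical, OLS is done through the normal equations for brevity, and `sigma2` must be supplied by the caller (the demo estimates it from the residuals of the largest candidate model, which is one common choice and not necessarily what the slides did).

```python
import random


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def fit_poly(x, y, degree):
    """OLS fit of y on 1, x, ..., x^degree via the normal equations."""
    X = [[xi ** d for d in range(degree + 1)] for xi in x]
    p = degree + 1
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    return solve(XtX, Xty)


def predict(beta, x0):
    """Evaluate the fitted polynomial at x0."""
    return sum(b * x0 ** d for d, b in enumerate(beta))


def ss_res(x, y, beta):
    """Residual sum of squares of a fitted polynomial."""
    return sum((yi - predict(beta, xi)) ** 2 for xi, yi in zip(x, y))


def cp_select(x, y, max_degree, sigma2):
    """Return (degree, beta) minimizing Cp = SSres/n + 2*sigma2*p/n."""
    n = len(y)
    best = None
    for degree in range(1, max_degree + 1):
        beta = fit_poly(x, y, degree)
        cp = ss_res(x, y, beta) / n + 2 * sigma2 * (degree + 1) / n
        if best is None or cp < best[0]:
            best = (cp, degree, beta)
    return best[1], best[2]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for the cholesterol data (NOT the real data set).
    x = [rng.gauss(0.0, 1.0) for _ in range(164)]
    y = [5.0 + 10.0 * xi + 2.0 * xi ** 2 + rng.gauss(0.0, 3.0) for xi in x]
    # Assumption: estimate sigma^2 from the largest (sextic) model.
    full = fit_poly(x, y, 6)
    sigma2 = ss_res(x, y, full) / (len(y) - 7)
    degree, beta = cp_select(x, y, 6, sigma2)
    print("Cp-chosen degree:", degree, "fitted value at x=0:", predict(beta, 0.0))
```

The fitted value $\hat\mu$ at any compliance level is then `predict(beta, x0)` for the chosen model.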
  • 18. An example, nonparametric bootstrap analysis: bootstrap the data, $\text{data}^* = \{(x_j, y_j)^*,\ j = 1, \dots, n\}$, where the $(x_j, y_j)^*$ are drawn randomly with replacement from the original data. Then $\text{data}^* \xrightarrow{C_p} M^* \xrightarrow{\text{OLS}} \hat\beta^*_{M^*} \longrightarrow \hat\mu^* = X_{M^*}\hat\beta^*_{M^*}$. Repeat B = 4000 times.
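
A minimal sketch of the loop on slide 18, in standard-library Python. To stay short it assumes only two candidate models (constant and linear) instead of the six polynomial models in the slides; all function names are hypothetical and `sigma2` is supplied by the caller. The point is that the selection step is redone inside every bootstrap replication, so the replications reflect selection variability.

```python
import random


def fit_constant(x, y):
    """Intercept-only model: (intercept, slope=0)."""
    return (sum(y) / len(y), 0.0)


def fit_linear(x, y):
    """Simple linear regression by OLS: (intercept, slope)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0.0:  # degenerate resample: every x is identical
        return (my, 0.0)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return (my - slope * mx, slope)


def cp(x, y, coef, n_params, sigma2):
    """Cp = SSres/n + 2*sigma2*p/n for a fitted (intercept, slope) pair."""
    n = len(y)
    ss = sum((yi - coef[0] - coef[1] * xi) ** 2 for xi, yi in zip(x, y))
    return ss / n + 2 * sigma2 * n_params / n


def select_and_predict(x, y, x0, sigma2):
    """Pick the Cp-minimizing model; return (model name, fitted value at x0)."""
    candidates = [
        ("constant", fit_constant(x, y), 1),
        ("linear", fit_linear(x, y), 2),
    ]
    name, coef, _ = min(candidates, key=lambda c: cp(x, y, c[1], c[2], sigma2))
    return name, coef[0] + coef[1] * x0


def bootstrap_fitted_values(x, y, x0, sigma2, B, rng):
    """Resample (x, y) pairs with replacement, reselect, refit, predict."""
    n = len(y)
    out = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]  # indices drawn with replacement
        xb = [x[j] for j in idx]
        yb = [y[j] for j in idx]
        out.append(select_and_predict(xb, yb, x0, sigma2))
    return out


if __name__ == "__main__":
    rng = random.Random(1)
    x = [rng.gauss(0.0, 1.0) for _ in range(50)]
    y = [0.3 * xi + rng.gauss(0.0, 1.0) for xi in x]  # synthetic data
    reps = bootstrap_fitted_values(x, y, x0=1.0, sigma2=1.0, B=500, rng=rng)
    share_linear = sum(name == "linear" for name, _ in reps) / len(reps)
    print("share of replications choosing the linear model:", share_linear)
```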
  • 19. [Figure reproduced from Efron 2013.]
  • 21. Prostate data: examine the relation between the level of PSA and clinical measures. n = 97 men who were about to receive prostatectomy. $x = (x_1, \dots, x_8)$: clinical measures (adjusted so that $x \sim N(0, 1)$). y = log PSA. Perform a regression of y on x. Eight candidate models were identified using regsubsets with nbest=1. We want to estimate $\mu_j = E[y \mid x_j]$, $j = 1, \dots, 97$.
  • 22. [Figure: histogram of fitted value $\hat\mu_{95}$ against count.] Fitted values for subject 95, from B = 4000 nonparametric bootstrap replications of the Cp-chosen model; 60% of the replications are greater than the original estimate of 3.6 (based on the Cp-chosen model).
  • 23. [Figure: histograms of fitted value $\hat\mu_{95}$ against count, by model (m3, m5, m7); percentage labels 18%, 22%, 24%.] Fitted values for subject 95, from B = 4000 nonparametric bootstrap replications, separated by the three models most frequently chosen by Cp. Original estimate = 3.6, based on the Cp-chosen model.
  • 24. [Figure: boxplots of fitted value $\hat\mu_{95}$ by model, m2 to m8; Model 7 is flagged with ******; selection-frequency labels: 1%, 18%, 12%, 22%, 15%, 24%, 8%.] Boxplot of fitted values for subject 95 for the model chosen by the Cp criterion, based on B = 4000 nonparametric bootstrap samples.
  • 26. Questions: Are you convinced there is a problem in the way we do post-selection inference? Is the juice worth the squeeze?
  • 28. [Section: Bootstrap Smoothing. Topics: Idea; Standard Errors; Theorem; Confidence Intervals] Bagging (Breiman 1996): replace the original estimator $\hat\mu = t(y)$ with the bootstrap average $\tilde\mu = s(y) = \frac{1}{B}\sum_{i=1}^{B} t(y_i^*)$, where $y_i^*$ is the ith bootstrap sample. Known as model averaging. “If perturbing the learning set can cause significant changes in the predictor constructed, then bagging can improve accuracy.”
  • 33. Main contribution of this paper: let $t_i^* = t(y_i^*)$, $i = 1, \dots, B$ (value of the statistic in bootstrap sample i); $Y_{ij}^*$ = number of times the jth data point appears in the ith bootstrap sample; $cov_j = cov(Y_{ij}^*, t_i^*)$. The nonparametric estimate of standard deviation for the ideal smoothed bootstrap statistic $\tilde\mu = s(y) = B^{-1}\sum_{i=1}^{B} t(y_i^*)$ is $\widetilde{sd} = \left(\sum_{j=1}^{n} cov_j^2\right)^{1/2}$.
  • 34. Main contribution of this paper (continued): note that $cov_j = cov(Y_{ij}^*, t_i^*)$ is an unknown quantity, so we must estimate it. The estimate of standard deviation for $\tilde\mu = s(y)$ in the non-ideal case is $\widetilde{sd}_B = \left(\sum_{j=1}^{n} \widehat{cov}_j^2\right)^{1/2}$, where $\widehat{cov}_j = B^{-1}\sum_{i=1}^{B} (Y_{ij}^* - Y_{\cdot j}^*)(t_i^* - t_\cdot^*)$, $Y_{\cdot j}^* = B^{-1}\sum_{i=1}^{B} Y_{ij}^*$ and $t_\cdot^* = B^{-1}\sum_{i=1}^{B} t_i^*$.
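
The formulas on slides 28 and 34 translate almost line by line into code. The sketch below (standard-library Python, hypothetical function name) records the resampling counts $Y_{ij}^*$ alongside each $t_i^*$, then returns the smoothed estimate, the smoothed standard deviation built from the estimated covariances, and the usual bootstrap standard deviation for comparison. The `statistic` argument stands in for any estimator, including one with a model selection step; the demo uses the sample mean only because it is easy to check.

```python
import random


def smoothed_estimate_and_sd(data, statistic, B, rng):
    """Return (smoothed estimate, smoothed sd, ordinary bootstrap sd, t_star)."""
    n = len(data)
    counts = []  # counts[i][j] = Y*_ij, times point j appears in sample i
    t_star = []  # t_star[i] = t(y*_i)
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]
        row = [0] * n
        for j in idx:
            row[j] += 1
        counts.append(row)
        t_star.append(statistic([data[j] for j in idx]))

    t_bar = sum(t_star) / B  # smoothed (bagged) estimate, also t*_.
    y_bar = [sum(row[j] for row in counts) / B for j in range(n)]  # Y*_.j
    cov_hat = [
        sum((counts[i][j] - y_bar[j]) * (t_star[i] - t_bar) for i in range(B)) / B
        for j in range(n)
    ]
    sd_smooth = sum(c * c for c in cov_hat) ** 0.5
    sd_boot = (sum((t - t_bar) ** 2 for t in t_star) / (B - 1)) ** 0.5
    return t_bar, sd_smooth, sd_boot, t_star


if __name__ == "__main__":
    rng = random.Random(2)
    data = [rng.gauss(0.0, 1.0) for _ in range(30)]  # synthetic data

    def sample_mean(values):
        return sum(values) / len(values)

    t_bar, sd_smooth, sd_boot, _ = smoothed_estimate_and_sd(data, sample_mean, 2000, rng)
    print("smoothed estimate:", t_bar)
    print("smoothed sd:", sd_smooth, "bootstrap sd:", sd_boot)
```

For the sample mean the exact covariance is $(x_j - \bar{x})/n$, so the smoothed standard deviation should come out close to $\left(\sum_j (x_j - \bar{x})^2\right)^{1/2}/n$, which gives a quick sanity check of an implementation.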
  • 35. Improvement on the traditional standard error: in the ideal limit of large B, $\widetilde{sd}_B = \left(\sum_{j=1}^{n} \widehat{cov}_j^2\right)^{1/2}$ is never larger than the bootstrap estimate of standard deviation for the unsmoothed statistic, $\widehat{sd}_B = \left(\frac{1}{B-1}\sum_{i=1}^{B} (t_i^* - t_\cdot^*)^2\right)^{1/2}$.
  • 38. Three types of confidence interval: (1) Standard: $\hat\mu \pm 1.96\,\widehat{sd}_B$. (2) Percentile: $\left[\hat\mu^{*(0.025)},\ \hat\mu^{*(0.975)}\right]$. (3) Smoothed: $\tilde\mu \pm 1.96\,\widetilde{sd}_B$.
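
Given the bootstrap replications, the three intervals on slide 38 can be computed as follows. This is a sketch in standard-library Python with hypothetical names; the linear-interpolation quantile rule is an assumption (the slides do not say which quantile definition was used), and `sd_smooth` is taken as an input computed from the covariance formula on slide 34.

```python
def percentile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def three_intervals(mu_hat, t_star, sd_smooth):
    """Standard, percentile and smoothed 95% intervals from bootstrap output.

    mu_hat    : original (unsmoothed) estimate t(y)
    t_star    : list of bootstrap replications t(y*_i)
    sd_smooth : smoothed standard deviation from the covariance formula
    """
    B = len(t_star)
    t_bar = sum(t_star) / B  # smoothed estimate
    sd_boot = (sum((t - t_bar) ** 2 for t in t_star) / (B - 1)) ** 0.5
    s = sorted(t_star)
    return {
        "standard": (mu_hat - 1.96 * sd_boot, mu_hat + 1.96 * sd_boot),
        "percentile": (percentile(s, 0.025), percentile(s, 0.975)),
        "smoothed": (t_bar - 1.96 * sd_smooth, t_bar + 1.96 * sd_smooth),
    }


if __name__ == "__main__":
    # Toy replications; in practice t_star comes from the bootstrap loop.
    toy = [3.1, 3.4, 3.6, 3.7, 3.9, 4.0, 4.2, 3.5, 3.8, 3.3]
    for name, (low, high) in three_intervals(3.6, toy, 0.25).items():
        print(name, round(low, 3), round(high, 3), "length", round(high - low, 3))
```

The standard and smoothed intervals both have length 2 × 1.96 × sd, so any shortening of the smoothed interval comes entirely from the smaller standard deviation.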
  • 39. [Section: Results. Topics: Setting; Prostate Data: Revisited; Parametric Bootstrap; Discussion] L1-norm penalty functions. Recall the optimization problem of interest: $\max_{\beta} \left\{ \ell_n(\beta) - n \sum_{j=1}^{p} p(|\beta_j|; \lambda) \right\}$.
  • 40. LASSO, SCAD and MCP penalties. LASSO (Tibshirani, 1996): $p(|\beta|; \lambda) = \lambda|\beta|$. SCAD (Fan and Li, 2001), defined through its derivative: $p'(|\beta|; \lambda, \gamma) = \lambda\,\mathrm{sign}(\beta)\left[ I(|\beta| \le \lambda) + \frac{(\gamma\lambda - |\beta|)_+}{(\gamma - 1)\lambda}\, I(|\beta| > \lambda) \right]$, $\gamma > 2$. MCP (Zhang, 2010): $p(|\beta|; \lambda, \gamma) = \lambda|\beta| - \frac{|\beta|^2}{2\gamma}$ if $|\beta| \le \gamma\lambda$, and $\frac{\gamma\lambda^2}{2}$ if $|\beta| > \gamma\lambda$.
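
The three penalties on slide 40 are easy to compare numerically. The functions below are a direct transcription of those formulas into standard-library Python (the function names are hypothetical); note that SCAD is given through its derivative, as on the slide, while LASSO and MCP return the penalty value itself.

```python
def lasso_penalty(beta, lam):
    """LASSO penalty: lam * |beta|."""
    return lam * abs(beta)


def scad_derivative(beta, lam, gamma):
    """Derivative of the SCAD penalty with respect to beta (requires gamma > 2)."""
    if gamma <= 2:
        raise ValueError("SCAD requires gamma > 2")
    a = abs(beta)
    sign = (beta > 0) - (beta < 0)
    if a <= lam:
        return lam * sign  # same slope as the LASSO near zero
    # lam * (gamma*lam - |beta|)_+ / ((gamma - 1) * lam), with lam cancelled
    return sign * max(gamma * lam - a, 0.0) / (gamma - 1)


def mcp_penalty(beta, lam, gamma):
    """MCP penalty: quadratic taper up to gamma*lam, constant afterwards."""
    a = abs(beta)
    if a <= gamma * lam:
        return lam * a - a * a / (2 * gamma)
    return gamma * lam * lam / 2


if __name__ == "__main__":
    lam, gamma = 1.0, 3.0
    for beta in (0.5, 1.0, 2.0, 3.0, 5.0):
        print(
            beta,
            lasso_penalty(beta, lam),
            scad_derivative(beta, lam, gamma),
            mcp_penalty(beta, lam, gamma),
        )
```

Both SCAD and MCP stop penalizing once $|\beta| \ge \gamma\lambda$ (zero derivative, constant penalty), whereas the LASSO penalty keeps growing linearly; that is the usual motivation for these non-convex penalties.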
  • 41. Software: the analysis was performed in R. LASSO was implemented using the glmnet package (Friedman, Hastie, Tibshirani, 2013); SCAD and MCP using the coordinate descent algorithm (Breheny and Huang, 2011) in the ncvreg package; BIC and Cp model selection using the leaps package (Lumley, 2009).
  • 42. [Figure: histograms of fitted value $\hat\mu_{95}$ against count; panels MCP, SCAD, LASSO.] Fitted values for subject 95, from B = 4000 nonparametric bootstrap replications.
  • 43. [Figure: histograms of fitted value $\hat\mu_{95}$ against count; panels BIC, Cp.] Fitted values for subject 95, from B = 4000 nonparametric bootstrap replications.
  • 44. SCAD, MCP, LASSO. [Figure: interval length by penalty (LASSO, SCAD, MCP) and interval type (standard, quantile, smooth).] 95% confidence intervals for the fitted value of subject 95, based on B = 4000 nonparametric bootstrap samples, for the MCP, SCAD and LASSO penalties.
  • 45. BIC and Cp. [Figure: interval length by criterion (BIC, Cp) and interval type (standard, quantile, smooth).] Length of 95% confidence intervals for the fitted value of subject 95, based on B = 4000 nonparametric bootstrap samples, for Cp and BIC.
  • 46. Table: Prostate data, B = 4000, observation 95

       model   type       fitted value   sd     length   coverage
       LASSO   standard   3.62           0.31    1.21    0.94
               quantile                          1.20    0.95
               smooth     3.57           0.29    1.14    0.93
       SCAD    standard   3.60           0.35    1.37    0.95
               quantile                          1.33    0.95
               smooth     3.62           0.33    1.28    0.93
       MCP     standard   3.60           0.35    1.38    0.96
               quantile                          1.35    0.95
               smooth     3.61           0.33    1.29    0.94
       BIC     standard   5.50           4.75   18.62    0.84
               quantile                         16.05    0.95
               smooth     3.22           3.46   13.55    0.83
       Cp      standard   5.13           5.11   20.02    0.86
               quantile                         16.15    0.95
               smooth     0.64           4.40   17.24    0.97
  • 47. An example, parametric bootstrap analysis: obtain the OLS estimates $\hat\mu_{OLS}$ based on the full model. Generate $y^* \sim N(\hat\mu_{OLS}, I)$. Full-model bootstrap: $y^* \xrightarrow{C_p} M^*, \hat\beta^*_{M^*} \longrightarrow \hat\mu^* = X_{M^*}\hat\beta^*_{M^*}$. Repeat B = 4000 times, giving $t_{ij}^* = \hat\mu_{ij}^*$. Smoothed estimates: $s_j = B^{-1}\sum_{i=1}^{B} t_{ij}^*$.
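
The parametric version on slide 47 keeps the design fixed and resamples only the response around the full-model fit. A minimal standard-library Python sketch follows; it assumes a single covariate with just two candidate models (constant and linear, with the linear model playing the role of the full model), and the `sigma` argument is an added convenience that defaults to the unit variance written on the slide. Function names are hypothetical.

```python
import random


def ols_line(x, y):
    """Simple linear regression by OLS: returns (intercept, slope)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def cp_fit(x, y, sigma2):
    """Fitted values from the Cp-chosen model among {constant, linear}."""
    n = len(y)
    my = sum(y) / n
    b0, b1 = ols_line(x, y)
    fits = {
        "constant": ([my] * n, 1),
        "linear": ([b0 + b1 * xi for xi in x], 2),
    }

    def cp(item):
        mu, p = item
        return sum((yi - m) ** 2 for yi, m in zip(y, mu)) / n + 2 * sigma2 * p / n

    name = min(fits, key=lambda k: cp(fits[k]))
    return name, fits[name][0]


def parametric_bootstrap(x, y, B, rng, sigma=1.0):
    """Draw y* ~ N(mu_full, sigma^2 I), reselect by Cp, refit, and average."""
    n = len(y)
    b0, b1 = ols_line(x, y)
    mu_full = [b0 + b1 * xi for xi in x]  # full-model OLS fit
    t_star = []  # t_star[i][j] = fitted value for subject j in replication i
    for _ in range(B):
        y_star = [rng.gauss(m, sigma) for m in mu_full]
        _, mu_star = cp_fit(x, y_star, sigma ** 2)
        t_star.append(mu_star)
    smoothed = [sum(t[j] for t in t_star) / B for j in range(n)]  # s_j
    return t_star, smoothed


if __name__ == "__main__":
    rng = random.Random(3)
    x = [rng.gauss(0.0, 1.0) for _ in range(40)]
    y = [0.4 * xi + rng.gauss(0.0, 1.0) for xi in x]  # synthetic data
    t_star, smoothed = parametric_bootstrap(x, y, B=500, rng=rng)
    print("smoothed fitted value for the first subject:", smoothed[0])
```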
  • 48. [Figure: histogram of fitted value $\hat\mu_{95}$ against count.] Fitted values for subject 95, from B = 4000 parametric bootstrap replications of the Cp-chosen model; 53% of the replications are greater than the original estimate of 3.6 (based on the Cp-chosen model).
  • 49. [Figure: histograms of fitted value $\hat\mu_{95}$ against count, by model (m6, m7, m8).] Fitted values for subject 95, from B = 4000 parametric bootstrap replications, separated by the three models most frequently chosen by Cp. Original estimate = 3.6, based on the Cp-chosen model.
  • 50. [Figure: boxplots of fitted value $\hat\mu_{95}$ by model, m1 to m8; Model 8 is flagged with ******; selection-frequency labels: 3%, 6%, 16%, 12%, 13%, 14%, 17%, 19%.] Boxplot of fitted values for subject 95 for the model chosen by the Cp criterion, based on B = 4000 parametric bootstrap samples.
  • 51. [Figure: histogram of fitted value $\hat\mu_{95}$ against count.] Fitted values for subject 95, from B = 4000 parametric bootstrap replications of the BIC-chosen model; 40% of the replications are greater than the original estimate of 3.7 (based on the BIC-chosen model).
  • 52. [Figure: histograms of fitted value $\hat\mu_{95}$ against count, by model (m1, m2, m3); percentage labels 27%, 20%, 18%.] Fitted values for subject 95, from B = 4000 parametric bootstrap replications, separated by the three models most frequently chosen by BIC. Original estimate = 3.7, based on the BIC-chosen model.
  • 53. [Figure: boxplots of fitted value $\hat\mu_{95}$ by model, m1 to m8; Model 3 is flagged with ******; selection-frequency labels: 20%, 18%, 27%, 13%, 9%, 5%, 5%, 3%.] Boxplot of fitted values for subject 95 for the model chosen by the BIC criterion, based on B = 4000 parametric bootstrap samples.
  • 54. Discussion: Improvements for regularized procedures where tuning parameters are also chosen in a data-driven fashion. GLM? Why parametric bootstrap?
  • 56. Roots
  • 57. What I have done so far: (1) BSc Actuarial Math, Concordia (2005-2008); (2) Pension actuary (2008-2011); (3) RA at the Chest with Andrea Benedetti (2011-2012); (4) MSc Biostats, Queen’s (2012-2013).
  • 58. What’s next? (1) PhD Biostatistics, McGill (2013-???); (2) Supervisor: Celia Greenwood (Statistical Genetics).