How not to make a fool of yourself with model selection
Paul Johnson
IBAHCM PostDoc/PI seminar, 29 May 2015
paul.johnson@glasgow.ac.uk | @paulcdjo

“Reproducibility is the ability of an entire experiment or study to be reproduced, either by the researcher or by someone else working independently. It is one of the main principles of the scientific method”
Wikipedia

“non-reproducible single occurrences are of no significance to science”
Karl Popper, The Logic of Scientific Discovery

A crisis of irreproducibility?
Schoenfeld & Ioannidis, Am J Clin Nutr. 2013; 97(1): 127-34.
[Figure annotations: "Decreased risk of cancer", "Increased risk of cancer"]
Most foods are associated with both increased and decreased risk of cancer

How to not make a fool of yourself with P values
•  “You make a fool of yourself if you declare that you have discovered something, when all you are observing is random chance.”
•  “…what matters is the probability that, when you find that a result is ‘statistically significant’, there is actually a real effect.”
•  “If you find a ‘significant’ result when there is nothing but chance at play, your result is a false positive, and the chance of getting a false positive is often alarmingly high.”
•  This probability is called the false discovery rate (FDR).
   o  FDR = P(no effect | significant result)
Colquhoun D. 2014 An investigation of the false discovery rate and the misinterpretation of p-values. R. Soc. Open Sci. 1: 140216

The false discovery rate (FDR)
[Figure from Colquhoun D. 2014, R. Soc. Open Sci. 1: 140216]
•  No of discoveries = 80 + 45 = 125
•  No of false discoveries = 45
•  False discovery rate = 45/125 = 36%
In a real analysis we know the number of discoveries, but not the number of false discoveries. We can guess the second number, though:
•  No of discoveries = 125
•  No of false discoveries < 50
•  False discovery rate < 50/125 = 40%

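
The arithmetic above can be reproduced with a short sketch. The counts 80 and 45 are consistent with the worked example in Colquhoun (2014): 1000 tests, of which 100 have a real effect, tested with 80% power at a significance threshold of 0.05. Those inputs are not stated on the slide and are assumptions here; the function name is hypothetical.

```python
def false_discovery_rate(n_real, n_null, power, alpha):
    """Expected FDR = false discoveries / all discoveries.

    n_real: number of tested hypotheses with a real effect (assumed)
    n_null: number of tested hypotheses with no effect (assumed)
    power:  P(significant | real effect)
    alpha:  P(significant | no effect)
    """
    true_discoveries = power * n_real
    false_discoveries = alpha * n_null
    fdr = false_discoveries / (true_discoveries + false_discoveries)
    return fdr, true_discoveries, false_discoveries


# Assumed inputs: 1000 tests, 10% with a real effect, 80% power, alpha = 0.05.
fdr, tp, fp = false_discovery_rate(n_real=100, n_null=900, power=0.8, alpha=0.05)
print(f"{tp:.0f} true + {fp:.0f} false discoveries, FDR = {fdr:.0%}")
```

Under these assumed inputs the sketch gives 80 true and 45 false discoveries and an FDR of 36%, matching the slide.
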
OK, I’m alarmed, but what has this got to do with model selection?
•  The alarmingly high risk of a “significant” result being false potentially applies to any statistical method (not only significance testing) that divides hypotheses into “hits” and “misses”
•  Simple model selection methods – in particular stepwise selection – are prone to making random noise look like discoveries
•  The false discovery rate provides a simple way to
   o  Illustrate the unreliability of stepwise selection
   o  Potentially make it more reliable

What is model selection?
•  A method of statistical inference (learning from data) that selects the best model from a set of several candidate models
•  Very commonly applied to regression models
•  There’s a great deal of debate about how (and how not) to do model selection – no time to get into this here
IBAHCM Away Day, 18 December 2013

What is stepwise model selection?
•  Aims to identify the subset of p explanatory variables, x₁, x₂, …, xₚ, that best explains variation in the response variable, y
•  Backwards stepwise selection
   1.  Fit full regression model: y = β₀ + β₁x₁ + β₂x₂ + … + βₚxₚ + ε
   2.  Drop weakest x (e.g. largest P value) and refit
   3.  Repeat step 2 until all surviving x are significant

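
The backwards procedure can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code used for the talk: instead of exact t-test P values it uses a likelihood-ratio-style threshold `k` (the weakest predictor is the one whose removal increases the residual sum of squares least, and it is dropped while n·log(RSS_reduced/RSS_full) ≤ k), with k = 3.84 roughly corresponding to P < 0.05. The helper names `solve`, `rss` and `backward_select` are hypothetical, and the toy data use one deliberately strong predictor.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def rss(X, y, cols):
    """Residual sum of squares of the least-squares fit of y on an intercept plus columns cols."""
    rows = [[1.0] + [row[j] for j in cols] for row in X]
    q = len(cols) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(q)] for i in range(q)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(q)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def backward_select(X, y, k=3.84):
    """Backwards stepwise selection; returns the indices of the surviving predictors."""
    n = len(y)
    kept = list(range(len(X[0])))
    while kept:
        full = rss(X, y, kept)
        # Weakest predictor: the one whose removal increases RSS the least.
        cost, weakest = min(
            (n * math.log(rss(X, y, [c for c in kept if c != j]) / full), j)
            for j in kept
        )
        if cost > k:  # every surviving predictor passes the threshold
            break
        kept.remove(weakest)
    return kept


# Toy data (assumed): 50 observations, 8 predictors, only x0 has a real effect.
random.seed(1)
n_obs, n_pred = 50, 8
X = [[random.gauss(0, 1) for _ in range(n_pred)] for _ in range(n_obs)]
y = [3.0 * row[0] + random.gauss(0, 1) for row in X]
selected = backward_select(X, y)
print("selected predictors:", selected)
```

Any index other than 0 in the printed list is a noise predictor that survived selection, i.e. a false discovery. The sketch assumes the design matrix is not singular and the full model does not fit perfectly.
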
What is stepwise model selection?
•  Forwards stepwise selection
   1.  Fit minimal regression model: y = β₀ + ε
   2.  Add the strongest x (smallest P value) if significant
   3.  Repeat step 2 until strongest predictor is not significant
•  For both directions, we divide p predictors into:
   o  Selected: β ≠ 0
   o  Not selected: β = 0

Problems with simple stepwise model selection
•  Overconfidence & bias in the selected model
   o  P-values underestimated due to multiple testing
      •  Leads to selection of too many variables
      •  Difficult to adjust for
   o  Uncertainty (standard errors, CIs) underestimated
   o  Effect sizes (slope, R², etc) overestimated
•  Poor search algorithm
   o  Inconsistent, e.g. forwards ≠ backwards (sometimes)
   o  Majority of models unexplored

Example using simulated data
•  We have a continuous response, y, that we would like to explain using 20 continuous explanatory variables, x₁ to x₂₀
•  n = 50 observations
•  We plan to use backwards stepwise selection:
   1.  Fit maximal model
   2.  Drop weakest x (largest P value) and refit
   3.  Repeat step 2 until P < 0.05 for all surviving x

Example using simulated data
•  Initial (full) model:
   o  y = β₀ + β₁x₁ + β₂x₂ + … + β₂₀x₂₀ + ε
•  The true values of the slopes β₁ to β₂₀ are
   o  β₁ to β₃ = 0.3
   o  β₄ to β₂₀ = 0
•  So the correct model is:
   o  y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + ε

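
A data set with this structure could be simulated as follows. The slides give n, the number of predictors and the true slopes; everything else in this sketch is an assumption: independent standard normal predictors, residual standard deviation 1, intercept β₀ = 0, and an arbitrary seed.

```python
import random

random.seed(2015)  # arbitrary seed, for repeatability only

n_obs, n_pred = 50, 20
true_beta = [0.3] * 3 + [0.0] * 17  # slopes from the slide: three real effects, 17 nulls
resid_sd = 1.0                      # assumed, not stated on the slides

# Predictors: independent standard normal draws (assumed).
X = [[random.gauss(0, 1) for _ in range(n_pred)] for _ in range(n_obs)]

# Response: linear predictor (intercept assumed to be 0) plus normal noise.
y = [
    sum(b * x for b, x in zip(true_beta, row)) + random.gauss(0, resid_sd)
    for row in X
]

# Population R^2 implied by these assumptions: signal variance / total variance.
signal_var = sum(b * b for b in true_beta)
r2 = signal_var / (signal_var + resid_sd ** 2)
print(f"implied population R^2 = {r2:.1%}")
```

Under these assumptions the implied R² is about 21%, close to but not exactly the true R² of 22% reported with the figures, so the settings used for the talk may have differed slightly.
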
[Figure: 20 scatterplots of the simulated data, one per predictor, plotting y against each of x1 to x20 (x-axis range −3 to 3).]

[Figure: true slopes for predictors 1 to 20. x-axis: Predictor; y-axis: Slope (β), range −1.0 to 1.0.]
[Figure sequence: slope estimates +/− 95% CI at each step of backwards stepwise selection, from 20 down to 7 predictors remaining. True R² = 22% throughout.]
Predictors remaining    Est. adjusted R²
20                      34%
19                      36%
18                      38%
17                      40%
16                      41%
15                      43%
14                      44%
13                      45%
12                      45%
11                      46%
10                      45%
9                       44%
8                       43%
7                       43%

[Figure sequence: slope estimates +/− 95% CI at each step of backwards stepwise selection on data where all true slopes are zero (apparently the permuted-y data referred to in the Recap), from 20 down to 5 predictors remaining. True R² = 0% throughout.]
Predictors remaining    Est. adjusted R²
20                      21%
19                      23%
18                      26%
17                      28%
16                      30%
15                      32%
14                      33%
13                      35%
12                      36%
11                      37%
10                      38%
9                       39%
8                       39%
7                       38%
6                       37%
5                       36%

Recap
•  Backwards stepwise selection:
   o  7 discoveries
   o  5 false discoveries using data with permuted y
   o  FDR = 5/7 = 71%?
•  No of false discoveries is random – need to average over many permutations

[Figure: histogram, "Distribution of N false discoveries from 1000 permutations, k = 3.84". x-axis: n.false.discoveries (0 to 12); y-axis: Frequency (0 to 500).]
•  7 discoveries
•  Mean 2 false discoveries
•  FDR = 28%

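
The permutation estimate can be sketched generically. Shuffling y breaks any real association with the predictors, so everything selected from permuted data is a false discovery; averaging over many permutations and dividing by the number of discoveries in the real data estimates the FDR. In this sketch `select` is any selection procedure that returns the indices of selected predictors. `marginal_select` is a hypothetical and much simpler stand-in for backwards stepwise selection, used only to keep the example short, and the toy data are assumed.

```python
import math
import random


def marginal_select(X, y, r_crit=0.28):
    """Stand-in selector: keep predictors whose |Pearson r| with y exceeds r_crit."""
    n = len(y)
    my = sum(y) / n
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    kept = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mx = sum(col) / n
        sx = math.sqrt(sum((v - mx) ** 2 for v in col))
        r = sum((a - mx) * (b - my) for a, b in zip(col, y)) / (sx * sy)
        if abs(r) > r_crit:
            kept.append(j)
    return kept


def permutation_fdr(X, y, select, n_perm=1000, seed=0):
    """Estimate FDR as (mean no. selected with permuted y) / (no. selected with real y)."""
    rng = random.Random(seed)
    n_discoveries = len(select(X, y))
    total_false = 0
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)  # destroys any real x-y relationship
        total_false += len(select(X, y_perm))
    mean_false = total_false / n_perm
    # With no discoveries the FDR is taken to be 0 by convention.
    fdr = mean_false / n_discoveries if n_discoveries else 0.0
    return n_discoveries, mean_false, fdr


# Toy data (assumed): 50 observations, 8 predictors, only x0 has a real effect.
random.seed(1)
X = [[random.gauss(0, 1) for _ in range(8)] for _ in range(50)]
y = [3.0 * row[0] + random.gauss(0, 1) for row in X]

n_disc, mean_false, fdr = permutation_fdr(X, y, marginal_select, n_perm=200)
print(f"{n_disc} discoveries, mean {mean_false:.2f} false discoveries, FDR = {fdr:.0%}")
```

With the slide's numbers the same ratio is a mean of about 2 false discoveries over 7 discoveries.
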
Now what?
•  Having an estimate of FDR = P(making a fool of ourselves) is useful in itself
•  Now that we can estimate FDR, we can increase stringency until FDR is acceptable

Effect on FDR of increasing selection criterion stringency
[Figure: "Relationship between FDR and test stringency (k)". x-axis: k (2 to 9); y-axis: False discovery rate (0.0 to 0.6). Points labelled 11, 7, 2, 1, 1, 1, 1, 1, 1, 0, 0, 0. Annotations on the k axis: min(AIC), P < 0.05.]
•  k = 7.5 gives FDR = 20%

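
The annotations suggest how k relates to a P value threshold. The sketch below assumes, as the "min(AIC)" and "P < 0.05" labels and the "k = 3.84" in the histogram title imply, that k is the per-parameter penalty of an AIC-style criterion (k = 2 for AIC), so that dropping a single parameter is equivalent to a likelihood-ratio test against a chi-square distribution with 1 degree of freedom. The function name is hypothetical.

```python
import math


def p_threshold(k):
    """Upper-tail probability of a chi-square variable with 1 df at k.

    For 1 df, P(X > k) = P(|Z| > sqrt(k)) = erfc(sqrt(k / 2)).
    """
    return math.erfc(math.sqrt(k / 2.0))


for k in (2.0, 3.84, 7.5):
    print(f"k = {k}: equivalent to P < {p_threshold(k):.4f}")
```

Under this assumption, k = 2 corresponds to a threshold of roughly P < 0.16, k = 3.84 to P < 0.05, and k = 7.5 to roughly P < 0.006.
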
Final model aiming for FDR = 20%
[Figure: "Final model selected using FDR = 20%", slope estimates plotted with the true slopes. x-axis: Predictor; y-axis: Slope (β), range −1.0 to 1.0.]
•  True R² = 22%
•  Est. adjusted R² = 14%

How can we avoid making fools of ourselves with model selection?
•  Is it necessary?
•  Is automatic selection appropriate, i.e. are all hypotheses equally plausible?
•  Avoid stepwise selection – use superior methods, e.g. lasso
•  If using stepwise selection
   o  Gauge reliability of results (e.g. monitor FDR)
   o  Control reliability (e.g. control FDR)
•  Transparency is the last defence against folly!

Conclusions
•  The “crisis of irreproducibility” is harmful and we need to avoid contributing to it
•  We need to be aware of the (un)reliability of our findings…
   o  Some scientists have unreasonable expectations of replication of results – Stephen Senn
•  …so we need to understand the properties of our statistical analyses
   o  Banning P-values is not the answer
   o  Better statistical understanding from design to analysis is part of the answer
