Sample size for binary logistic prediction models:
Beyond events per variable criteria
Maarten van Smeden, PhD
Leiden University Medical Center

Senior researcher

MEMTAB 2018

Utrecht, July 3
Slides available at: https://www.slideshare.net/MaartenvanSmeden/presentations MEMTAB, Utrecht, July 3 2018
Sample size prediction modeling literature (2018)
Events per variable (EPV)
Critique

• Flimsy supporting evidence for 10 EPV rule [1]

• 50 EPV rule more realistic with traditional variable selection techniques [2]

• 5 EPV sufficient to reduce (average) overfitting after “modern” shrinkage [3]

• EPV only part of sample size story [4]

[1] van Smeden et al., BMC MRM, 2016, doi: 10.1186/s12874-016-0267-3

[2] Steyerberg et al., Stat Med, 2000, doi: 10.1002/(SICI)1097-0258(20000430)19:8<1059::AID-SIM412>3.0.CO;2-0 

[3] Pavlou et al., Stat Med, 2016, doi: 10.1002/sim.6782

[4] Ogundimu et al., JCE, 2016, doi: 10.1016/j.jclinepi.2016.02.031
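
To illustrate why EPV is only part of the sample size story, here is a minimal sketch, assuming EPV is defined as the number of events divided by the number of candidate predictors. The function names and the two example settings are illustrative choices, picked from the ranges used in the simulation study; they are not taken from the slides.

```python
import math

def events_per_variable(n_events: int, n_candidate_predictors: int) -> float:
    """EPV: number of events divided by the number of candidate predictors."""
    return n_events / n_candidate_predictors

def total_sample_size(epv: float, n_candidate_predictors: int, events_fraction: float) -> int:
    """Total N needed to reach a target EPV, given the expected events fraction."""
    required_events = epv * n_candidate_predictors
    return math.ceil(required_events / events_fraction)

# Two hypothetical settings that both satisfy "10 EPV"
small = total_sample_size(epv=10, n_candidate_predictors=4, events_fraction=1 / 2)
large = total_sample_size(epv=10, n_candidate_predictors=12, events_fraction=1 / 16)
print(small, large)
```

Both settings meet the same EPV rule, yet the total sample sizes differ by a factor of 24, so EPV alone says little about how much data is behind the model.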
EPV forgets about the intercept?
New sample size criteria: rMSPE
Root Mean Squared Prediction Error (rMSPE): 

standard deviation of out-of-sample probability prediction error

Rationale: since clinical prediction is about probability estimation, a
sample size criterion should be based on allowable error rates in these
estimates
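
As a minimal sketch of this criterion, assuming the "true" probabilities are known (as they are in a simulation), rMSPE can be computed as the square root of the mean squared difference between true and estimated probabilities. The function name `rmspe` and the example values are hypothetical and only serve as illustration.

```python
import math

def rmspe(true_prob, est_prob):
    """Root mean squared prediction error between true and estimated probabilities."""
    if len(true_prob) != len(est_prob):
        raise ValueError("inputs must have the same length")
    squared_errors = [(p - q) ** 2 for p, q in zip(true_prob, est_prob)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical true and estimated probabilities for three individuals
true_prob = [0.10, 0.20, 0.50]
est_prob = [0.15, 0.10, 0.50]
value = rmspe(true_prob, est_prob)
print(round(value, 4))
```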
*Coverage property not guaranteed: assuming errors are IID normal
Unfortunately no closed form solution for out-of-sample rMSPE
Simulation study
• 4,032 simulation conditions (factorial design)

simulation factors: EPV (3 to 50), number of candidate predictors (4 to 12),
events fraction (1/16 to 1/2), area under the ROC curve (0.65 to 0.85), distribution
and correlation of predictors, number of noise variables

• 5,000 replications per condition, i.e. more than 20 million simulation runs
• Each run: generate pairs of derivation data and validation data
(large, with 5,000 expected events) and develop + validate various
logistic prediction models

• Will focus on maximum likelihood logistic regression

• Simulation meta models: fit linear (Ridge) regression models to predict
simulation outcome (rMSPE) from simulation factors
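
A single simulation run of this kind could look roughly as follows. This is a heavily scaled-down sketch, not the study's actual code: the data-generating coefficients, the independent standard-normal predictors, the sample sizes, the number of replications, and all function names are assumptions made for illustration. It uses only the Python standard library and fits the model by maximum likelihood with Newton-Raphson.

```python
import math
import random

def sigmoid(z):
    # Clamp the linear predictor to avoid overflow in exp().
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def draw_data(n, beta, rng):
    """Rows with an intercept and independent N(0, 1) predictors, plus true probabilities and outcomes."""
    X = [[1.0] + [rng.gauss(0.0, 1.0) for _ in beta[1:]] for _ in range(n)]
    p = [sigmoid(sum(b * v for b, v in zip(beta, x))) for x in X]
    y = [1 if rng.random() < pi else 0 for pi in p]
    return X, p, y

def solve(A, b):
    """Solve the small linear system A x = b by Gaussian elimination with pivoting."""
    k = len(b)
    M = [A[i][:] + [b[i]] for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for j in range(c, k + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * k
    for r in reversed(range(k)):
        x[r] = (M[r][k] - sum(M[r][j] * x[j] for j in range(r + 1, k))) / M[r][r]
    return x

def fit_logistic(X, y, max_iter=25, tol=1e-8):
    """Maximum likelihood logistic regression via Newton-Raphson."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, yi in zip(X, y):
            pi = sigmoid(sum(b * v for b, v in zip(beta, x)))
            w = pi * (1.0 - pi)
            for a in range(k):
                grad[a] += (yi - pi) * x[a]
                for c in range(k):
                    hess[a][c] += w * x[a] * x[c]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

def one_run(n_dev, n_val, beta_true, rng):
    """Develop on derivation data, then return out-of-sample rMSPE on validation data."""
    X_dev, _, y_dev = draw_data(n_dev, beta_true, rng)
    X_val, p_val, _ = draw_data(n_val, beta_true, rng)
    beta_hat = fit_logistic(X_dev, y_dev)
    p_hat = [sigmoid(sum(b * v for b, v in zip(beta_hat, x))) for x in X_val]
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(p_val, p_hat)) / n_val)

def mean_rmspe(n_dev, n_rep=5, n_val=2000, seed=2018):
    """Average rMSPE over a few replications of one hypothetical condition."""
    rng = random.Random(seed)
    beta_true = [-1.0, 0.5, 0.5]  # hypothetical intercept and two predictor effects
    return sum(one_run(n_dev, n_val, beta_true, rng) for _ in range(n_rep)) / n_rep

rmspe_small_n = mean_rmspe(n_dev=100)
rmspe_large_n = mean_rmspe(n_dev=2000)
print(rmspe_small_n, rmspe_large_n)
```

Repeating such runs across a grid of conditions yields one average rMSPE per condition, which is the kind of outcome a meta-model can then be fitted to.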
Simulation meta models
rMSPE

• Meta-model with 3 (of 7) factors: N, events fraction and number of
(candidate) predictors: R² = 0.992
• (Meta-model with only EPV as factor: R² = 0.432)
https://mvansmeden.shinyapps.io/BeyondEPV/
In press
Thanks to Richard Riley for commenting on early draft
Final remarks
• Prediction models developed with 10 EPV can produce wildly inaccurate probability
estimates

• New sample size criterion - based on rMSPE - could be accurately
approximated by predictable data characteristics

• Validation, analytical work, and extensions still need to be done

• Our new sample size calculation shiny-app is “Beta”; can be used to
approximate rMSPE for settings that stay close to our simulation
design (article in press)

• One sample size criterion probably isn’t always enough. Notably, low events
fraction settings may come with low rMSPE and a high need for shrinkage
Final remarks
Binary logistic regression sample size recommendations

1. Think about allowable probability prediction error (e.g. in terms of 95%
coverage regions)

2. If you can, run a realistic simulation study

3. If you can’t do 2, use our shiny-app with caution to calculate minimal
sample size
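
For recommendation 1, one way to translate between an rMSPE value and an approximate 95% coverage region is sketched below. It assumes prediction errors are IID normal with mean zero, an assumption under which the coverage property is not guaranteed in practice. The function names are hypothetical and the example values are illustrative only.

```python
from statistics import NormalDist

def coverage_half_width(rmspe: float, level: float = 0.95) -> float:
    """Half-width of the approximate coverage region implied by a given rMSPE."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return z * rmspe

def max_rmspe_for_half_width(half_width: float, level: float = 0.95) -> float:
    """Largest rMSPE compatible with an allowable prediction error half-width."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return half_width / z

# Example: an rMSPE of 0.05, and an allowable error of +/- 0.10 on the probability scale
half_width = coverage_half_width(0.05)
target_rmspe = max_rmspe_for_half_width(0.10)
print(round(half_width, 3), round(target_rmspe, 3))
```

Under these assumptions, the allowable error chosen in step 1 gives a target rMSPE that a simulation study or the shiny-app can then be used to reach.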
https://mvansmeden.shinyapps.io/BeyondEPV/
Logistic prediction models
Schmidt et al., Schizophr Bull, 2017, doi: 10.1093/schbul/sbw098
Damen et al., BMJ, 2016, doi: 10.1136/bmj.i2416
Collins et al., BMC MRM, 2014, doi: 10.1186/1471-2288-14-40
Collins et al., BMC Med, 2011, doi: 10.1186/1741-7015-9-103
Bouwmeester et al., PLoS Med, 2012, doi: 10.1371/journal.pmed.1001221
New sample size criterion
Use expected root Mean Squared Prediction Error (rMSPE)

$$\mathrm{rMSPE} = \sqrt{E\left[(\pi_i - \hat{\pi}_i)^2\right]}$$

Interpretation: standard deviation of expected out-of-sample probability
prediction error

Where $\pi_i$ are the unobservable “true” probabilities that would have been
obtained had the prediction model been derived with the correct
functional form and an infinite sample size; $\hat{\pi}_i$ are the estimated
probabilities from the derived model in a large external set of similar
individuals (“out-of-sample”).
Difference between the estimated probability from a prediction model
applied in a large-sample validation study and the “true” probability
that would be obtained had the same model been derived from an
infinitely large sample

More Related Content

What's hot

ML and AI: a blessing and curse for statisticians and medical doctors
ML and AI: a blessing and curse forstatisticians and medical doctorsML and AI: a blessing and curse forstatisticians and medical doctors
ML and AI: a blessing and curse for statisticians and medical doctors
Maarten van Smeden
 
A gentle introduction to AI for medicine
A gentle introduction to AI for medicineA gentle introduction to AI for medicine
A gentle introduction to AI for medicine
Maarten van Smeden
 
AI in Healthcare: Real-World Machine Learning Use Cases
AI in Healthcare: Real-World Machine Learning Use CasesAI in Healthcare: Real-World Machine Learning Use Cases
AI in Healthcare: Real-World Machine Learning Use Cases
Health Catalyst
 
Prognosis-based medicine: merits and pitfalls of forecasting patient health
Prognosis-based medicine: merits and pitfalls of forecasting patient healthPrognosis-based medicine: merits and pitfalls of forecasting patient health
Prognosis-based medicine: merits and pitfalls of forecasting patient health
Maarten van Smeden
 
HEART DISEASE PREDICTION USING NAIVE BAYES ALGORITHM
HEART DISEASE PREDICTION USING NAIVE BAYES ALGORITHMHEART DISEASE PREDICTION USING NAIVE BAYES ALGORITHM
HEART DISEASE PREDICTION USING NAIVE BAYES ALGORITHM
amiteshg
 
Data analytics
Data analyticsData analytics
Data analytics
Dr.Bhuvaneswari Velumani
 
Is it causal, is it prediction or is it neither?
Is it causal, is it prediction or is it neither?Is it causal, is it prediction or is it neither?
Is it causal, is it prediction or is it neither?
Maarten van Smeden
 
How to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - StatsworkHow to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - Statswork
Stats Statswork
 
Master of Technology in Enterprise Business Analytics
Master of Technology in Enterprise Business AnalyticsMaster of Technology in Enterprise Business Analytics
Master of Technology in Enterprise Business Analytics
NUS-ISS
 
Data Analytics
Data AnalyticsData Analytics
Data Analytics
Srinimf-Slides
 
Fairness in Machine Learning and AI
Fairness in Machine Learning and AIFairness in Machine Learning and AI
Fairness in Machine Learning and AI
Seth Grimes
 
Personalizing a One-To-Many Customer Success Approach
Personalizing a One-To-Many Customer Success ApproachPersonalizing a One-To-Many Customer Success Approach
Personalizing a One-To-Many Customer Success Approach
Amity
 
Regression shrinkage: better answers to causal questions
Regression shrinkage: better answers to causal questionsRegression shrinkage: better answers to causal questions
Regression shrinkage: better answers to causal questions
Maarten van Smeden
 
Predictive Analytics: Context and Use Cases
Predictive Analytics: Context and Use CasesPredictive Analytics: Context and Use Cases
Predictive Analytics: Context and Use Cases
Kimberley Mitchell
 
Data quality metrics infographic
Data quality metrics infographicData quality metrics infographic
Data quality metrics infographic
Intellspot
 
Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...
Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...
Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...
Simplilearn
 
Data Quality
Data QualityData Quality
Data Quality
jerdeb
 
Data analytics
Data analyticsData analytics
Data analytics
davidfergarcia
 
Machine learning for social media analytics
Machine learning for  social media analyticsMachine learning for  social media analytics
Machine learning for social media analytics
Jenya Terpil
 
The Basics of Statistics for Data Science By Statisticians
The Basics of Statistics for Data Science By StatisticiansThe Basics of Statistics for Data Science By Statisticians
The Basics of Statistics for Data Science By Statisticians
Stat Analytica
 

What's hot (20)

ML and AI: a blessing and curse for statisticians and medical doctors
ML and AI: a blessing and curse forstatisticians and medical doctorsML and AI: a blessing and curse forstatisticians and medical doctors
ML and AI: a blessing and curse for statisticians and medical doctors
 
A gentle introduction to AI for medicine
A gentle introduction to AI for medicineA gentle introduction to AI for medicine
A gentle introduction to AI for medicine
 
AI in Healthcare: Real-World Machine Learning Use Cases
AI in Healthcare: Real-World Machine Learning Use CasesAI in Healthcare: Real-World Machine Learning Use Cases
AI in Healthcare: Real-World Machine Learning Use Cases
 
Prognosis-based medicine: merits and pitfalls of forecasting patient health
Prognosis-based medicine: merits and pitfalls of forecasting patient healthPrognosis-based medicine: merits and pitfalls of forecasting patient health
Prognosis-based medicine: merits and pitfalls of forecasting patient health
 
HEART DISEASE PREDICTION USING NAIVE BAYES ALGORITHM
HEART DISEASE PREDICTION USING NAIVE BAYES ALGORITHMHEART DISEASE PREDICTION USING NAIVE BAYES ALGORITHM
HEART DISEASE PREDICTION USING NAIVE BAYES ALGORITHM
 
Data analytics
Data analyticsData analytics
Data analytics
 
Is it causal, is it prediction or is it neither?
Is it causal, is it prediction or is it neither?Is it causal, is it prediction or is it neither?
Is it causal, is it prediction or is it neither?
 
How to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - StatsworkHow to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - Statswork
 
Master of Technology in Enterprise Business Analytics
Master of Technology in Enterprise Business AnalyticsMaster of Technology in Enterprise Business Analytics
Master of Technology in Enterprise Business Analytics
 
Data Analytics
Data AnalyticsData Analytics
Data Analytics
 
Fairness in Machine Learning and AI
Fairness in Machine Learning and AIFairness in Machine Learning and AI
Fairness in Machine Learning and AI
 
Personalizing a One-To-Many Customer Success Approach
Personalizing a One-To-Many Customer Success ApproachPersonalizing a One-To-Many Customer Success Approach
Personalizing a One-To-Many Customer Success Approach
 
Regression shrinkage: better answers to causal questions
Regression shrinkage: better answers to causal questionsRegression shrinkage: better answers to causal questions
Regression shrinkage: better answers to causal questions
 
Predictive Analytics: Context and Use Cases
Predictive Analytics: Context and Use CasesPredictive Analytics: Context and Use Cases
Predictive Analytics: Context and Use Cases
 
Data quality metrics infographic
Data quality metrics infographicData quality metrics infographic
Data quality metrics infographic
 
Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...
Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...
Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...
 
Data Quality
Data QualityData Quality
Data Quality
 
Data analytics
Data analyticsData analytics
Data analytics
 
Machine learning for social media analytics
Machine learning for  social media analyticsMachine learning for  social media analytics
Machine learning for social media analytics
 
The Basics of Statistics for Data Science By Statisticians
The Basics of Statistics for Data Science By StatisticiansThe Basics of Statistics for Data Science By Statisticians
The Basics of Statistics for Data Science By Statisticians
 

Similar to Sample size for binary logistic prediction models: Beyond events per variable criteria

An Homophily-based Approach for Fast Post Recommendation in Microblogging Sys...
An Homophily-based Approach for Fast Post Recommendation in Microblogging Sys...An Homophily-based Approach for Fast Post Recommendation in Microblogging Sys...
An Homophily-based Approach for Fast Post Recommendation in Microblogging Sys...
recsysfr
 
March 2, 2018 - Machine Learning for Production Forecasting
March 2, 2018 - Machine Learning for Production ForecastingMarch 2, 2018 - Machine Learning for Production Forecasting
March 2, 2018 - Machine Learning for Production Forecasting
David Fulford
 
Revealing Differences in Designer‘s and Users‘Perspectives
Revealing Differences in Designer‘s and Users‘PerspectivesRevealing Differences in Designer‘s and Users‘Perspectives
Revealing Differences in Designer‘s and Users‘Perspectives
Sebastian Feuerstack
 
Big data fusion and parametrization for strategic transport models
Big data fusion and parametrization for strategic transport modelsBig data fusion and parametrization for strategic transport models
Big data fusion and parametrization for strategic transport models
Luuk Brederode
 
Machine Learning for Finance Master Class
Machine Learning for Finance Master Class Machine Learning for Finance Master Class
Machine Learning for Finance Master Class
QuantUniversity
 
Selecting Ontologies and Publishing Data of Electrical Appliances: A Refrige...
Selecting Ontologies  and Publishing Data of Electrical Appliances: A Refrige...Selecting Ontologies  and Publishing Data of Electrical Appliances: A Refrige...
Selecting Ontologies and Publishing Data of Electrical Appliances: A Refrige...
Anna Fensel
 
A Bug Report Analysis and Search Tool (presentation for M.Sc. degree)
A Bug Report Analysis and Search Tool (presentation for M.Sc. degree)A Bug Report Analysis and Search Tool (presentation for M.Sc. degree)
A Bug Report Analysis and Search Tool (presentation for M.Sc. degree)
yguarata
 
Story behind Microelectronic Circuits
Story behind Microelectronic CircuitsStory behind Microelectronic Circuits
Story behind Microelectronic Circuits
Hoopeer Hoopeer
 
Lecture_1_-_Course_Overview_(Inked).pdf
Lecture_1_-_Course_Overview_(Inked).pdfLecture_1_-_Course_Overview_(Inked).pdf
Lecture_1_-_Course_Overview_(Inked).pdf
RTEFGDFGJU
 
Sgd teaching-consulting-10-jan-2009 (1)
Sgd teaching-consulting-10-jan-2009 (1)Sgd teaching-consulting-10-jan-2009 (1)
Sgd teaching-consulting-10-jan-2009 (1)
Sanjeev Deshmukh
 
M2CAT: Extracting reproducible simulation studies from model repositories usi...
M2CAT: Extracting reproducible simulation studies from model repositories usi...M2CAT: Extracting reproducible simulation studies from model repositories usi...
M2CAT: Extracting reproducible simulation studies from model repositories usi...
Martin Scharm
 
e:Bio Kick-Off Meeting, SEMS
e:Bio Kick-Off Meeting, SEMSe:Bio Kick-Off Meeting, SEMS
e:Bio Kick-Off Meeting, SEMS
University Medicine Greifswald
 
Topics of interest for IWPT'01.doc
Topics of interest for IWPT'01.docTopics of interest for IWPT'01.doc
Topics of interest for IWPT'01.doc
butest
 
Subject: Ex-post impact evaluations of energy efficiency policies in Europe
Subject:	Ex-post impact evaluations of energy efficiency policies in EuropeSubject:	Ex-post impact evaluations of energy efficiency policies in Europe
Subject: Ex-post impact evaluations of energy efficiency policies in Europe
Leonardo ENERGY
 
PythonQuants conference - QuantUniversity presentation - Stress Testing in th...
PythonQuants conference - QuantUniversity presentation - Stress Testing in th...PythonQuants conference - QuantUniversity presentation - Stress Testing in th...
PythonQuants conference - QuantUniversity presentation - Stress Testing in th...
QuantUniversity
 
M2CAT: Extracting reproducible simulation studies from model repositories usi...
M2CAT: Extracting reproducible simulation studies from model repositories usi...M2CAT: Extracting reproducible simulation studies from model repositories usi...
M2CAT: Extracting reproducible simulation studies from model repositories usi...
Martin Scharm
 
ISWC 2015 - Collecting, integrating, enriching and republishing open city dat...
ISWC 2015 - Collecting, integrating, enriching and republishing open city dat...ISWC 2015 - Collecting, integrating, enriching and republishing open city dat...
ISWC 2015 - Collecting, integrating, enriching and republishing open city dat...
Stefan Bischof
 
Risk-based cost methods - David Engel, Pacific Northwest National Laboratory
Risk-based cost methods - David Engel, Pacific Northwest National LaboratoryRisk-based cost methods - David Engel, Pacific Northwest National Laboratory
Risk-based cost methods - David Engel, Pacific Northwest National Laboratory
Global CCS Institute
 
2011 02-04 - d sallier - prévision probabiliste
2011 02-04 - d sallier - prévision probabiliste2011 02-04 - d sallier - prévision probabiliste
2011 02-04 - d sallier - prévision probabiliste
Cdiscount
 
ABB Scheduling.pdf
ABB Scheduling.pdfABB Scheduling.pdf
ABB Scheduling.pdf
AmricoAzevedo2
 

Similar to Sample size for binary logistic prediction models: Beyond events per variable criteria (20)

An Homophily-based Approach for Fast Post Recommendation in Microblogging Sys...
An Homophily-based Approach for Fast Post Recommendation in Microblogging Sys...An Homophily-based Approach for Fast Post Recommendation in Microblogging Sys...
An Homophily-based Approach for Fast Post Recommendation in Microblogging Sys...
 
March 2, 2018 - Machine Learning for Production Forecasting
March 2, 2018 - Machine Learning for Production ForecastingMarch 2, 2018 - Machine Learning for Production Forecasting
March 2, 2018 - Machine Learning for Production Forecasting
 
Revealing Differences in Designer‘s and Users‘Perspectives
Revealing Differences in Designer‘s and Users‘PerspectivesRevealing Differences in Designer‘s and Users‘Perspectives
Revealing Differences in Designer‘s and Users‘Perspectives
 
Big data fusion and parametrization for strategic transport models
Big data fusion and parametrization for strategic transport modelsBig data fusion and parametrization for strategic transport models
Big data fusion and parametrization for strategic transport models
 
Machine Learning for Finance Master Class
Machine Learning for Finance Master Class Machine Learning for Finance Master Class
Machine Learning for Finance Master Class
 
Selecting Ontologies and Publishing Data of Electrical Appliances: A Refrige...
Selecting Ontologies  and Publishing Data of Electrical Appliances: A Refrige...Selecting Ontologies  and Publishing Data of Electrical Appliances: A Refrige...
Selecting Ontologies and Publishing Data of Electrical Appliances: A Refrige...
 
A Bug Report Analysis and Search Tool (presentation for M.Sc. degree)
A Bug Report Analysis and Search Tool (presentation for M.Sc. degree)A Bug Report Analysis and Search Tool (presentation for M.Sc. degree)
A Bug Report Analysis and Search Tool (presentation for M.Sc. degree)
 
Story behind Microelectronic Circuits
Story behind Microelectronic CircuitsStory behind Microelectronic Circuits
Story behind Microelectronic Circuits
 
Lecture_1_-_Course_Overview_(Inked).pdf
Lecture_1_-_Course_Overview_(Inked).pdfLecture_1_-_Course_Overview_(Inked).pdf
Lecture_1_-_Course_Overview_(Inked).pdf
 
Sgd teaching-consulting-10-jan-2009 (1)
Sgd teaching-consulting-10-jan-2009 (1)Sgd teaching-consulting-10-jan-2009 (1)
Sgd teaching-consulting-10-jan-2009 (1)
 
M2CAT: Extracting reproducible simulation studies from model repositories usi...
M2CAT: Extracting reproducible simulation studies from model repositories usi...M2CAT: Extracting reproducible simulation studies from model repositories usi...
M2CAT: Extracting reproducible simulation studies from model repositories usi...
 
e:Bio Kick-Off Meeting, SEMS
e:Bio Kick-Off Meeting, SEMSe:Bio Kick-Off Meeting, SEMS
e:Bio Kick-Off Meeting, SEMS
 
Topics of interest for IWPT'01.doc
Topics of interest for IWPT'01.docTopics of interest for IWPT'01.doc
Topics of interest for IWPT'01.doc
 
Subject: Ex-post impact evaluations of energy efficiency policies in Europe
Subject:	Ex-post impact evaluations of energy efficiency policies in EuropeSubject:	Ex-post impact evaluations of energy efficiency policies in Europe
Subject: Ex-post impact evaluations of energy efficiency policies in Europe
 
PythonQuants conference - QuantUniversity presentation - Stress Testing in th...
PythonQuants conference - QuantUniversity presentation - Stress Testing in th...PythonQuants conference - QuantUniversity presentation - Stress Testing in th...
PythonQuants conference - QuantUniversity presentation - Stress Testing in th...
 
M2CAT: Extracting reproducible simulation studies from model repositories usi...
M2CAT: Extracting reproducible simulation studies from model repositories usi...M2CAT: Extracting reproducible simulation studies from model repositories usi...
M2CAT: Extracting reproducible simulation studies from model repositories usi...
 
ISWC 2015 - Collecting, integrating, enriching and republishing open city dat...
ISWC 2015 - Collecting, integrating, enriching and republishing open city dat...ISWC 2015 - Collecting, integrating, enriching and republishing open city dat...
ISWC 2015 - Collecting, integrating, enriching and republishing open city dat...
 
Risk-based cost methods - David Engel, Pacific Northwest National Laboratory
Risk-based cost methods - David Engel, Pacific Northwest National LaboratoryRisk-based cost methods - David Engel, Pacific Northwest National Laboratory
Risk-based cost methods - David Engel, Pacific Northwest National Laboratory
 
2011 02-04 - d sallier - prévision probabiliste
2011 02-04 - d sallier - prévision probabiliste2011 02-04 - d sallier - prévision probabiliste
2011 02-04 - d sallier - prévision probabiliste
 
ABB Scheduling.pdf
ABB Scheduling.pdfABB Scheduling.pdf
ABB Scheduling.pdf
 

More from Maarten van Smeden

Uncertainty in AI
Uncertainty in AIUncertainty in AI
Uncertainty in AI
Maarten van Smeden
 
UMC Utrecht AI Methods Lab
UMC Utrecht AI Methods LabUMC Utrecht AI Methods Lab
UMC Utrecht AI Methods Lab
Maarten van Smeden
 
Associate professor lecture
Associate professor lectureAssociate professor lecture
Associate professor lecture
Maarten van Smeden
 
Algorithm based medicine
Algorithm based medicineAlgorithm based medicine
Algorithm based medicine
Maarten van Smeden
 
Clinical prediction models for covid-19: alarming results from a living syste...
Clinical prediction models for covid-19: alarming results from a living syste...Clinical prediction models for covid-19: alarming results from a living syste...
Clinical prediction models for covid-19: alarming results from a living syste...
Maarten van Smeden
 
Development and evaluation of prediction models: pitfalls and solutions
Development and evaluation of prediction models: pitfalls and solutionsDevelopment and evaluation of prediction models: pitfalls and solutions
Development and evaluation of prediction models: pitfalls and solutions
Maarten van Smeden
 
Prediction models for diagnosis and prognosis related to COVID-19
Prediction models for diagnosis and prognosis related to COVID-19Prediction models for diagnosis and prognosis related to COVID-19
Prediction models for diagnosis and prognosis related to COVID-19
Maarten van Smeden
 
Clinical prediction models: development, validation and beyond
Clinical prediction models:development, validation and beyondClinical prediction models:development, validation and beyond
Clinical prediction models: development, validation and beyond
Maarten van Smeden
 
Correcting for missing data, measurement error and confounding
Correcting for missing data, measurement error and confoundingCorrecting for missing data, measurement error and confounding
Correcting for missing data, measurement error and confounding
Maarten van Smeden
 
Why the EPV≥10 sample size rule is rubbish and what to use instead
Why the EPV≥10 sample size rule is rubbish and what to use instead Why the EPV≥10 sample size rule is rubbish and what to use instead
Why the EPV≥10 sample size rule is rubbish and what to use instead
Maarten van Smeden
 
Living systematic reviews: now and in the future
Living systematic reviews: now and in the futureLiving systematic reviews: now and in the future
Living systematic reviews: now and in the future
Maarten van Smeden
 
Voorspelmodellen en COVID-19
Voorspelmodellen en COVID-19Voorspelmodellen en COVID-19
Voorspelmodellen en COVID-19
Maarten van Smeden
 
The statistics of the coronavirus
The statistics of the coronavirusThe statistics of the coronavirus
The statistics of the coronavirus
Maarten van Smeden
 
COVID-19 related prediction models for diagnosis and prognosis - a living sys...
COVID-19 related prediction models for diagnosis and prognosis - a living sys...COVID-19 related prediction models for diagnosis and prognosis - a living sys...
COVID-19 related prediction models for diagnosis and prognosis - a living sys...
Maarten van Smeden
 
Measurement error in medical research
Measurement error in medical researchMeasurement error in medical research
Measurement error in medical research
Maarten van Smeden
 
The basics of prediction modeling
The basics of prediction modeling The basics of prediction modeling
The basics of prediction modeling
Maarten van Smeden
 
The absence of a gold standard: a measurement error problem
The absence of a gold standard: a measurement error problemThe absence of a gold standard: a measurement error problem
The absence of a gold standard: a measurement error problem
Maarten van Smeden
 
Anatomy of a successful science thread
Anatomy of a successful science threadAnatomy of a successful science thread
Anatomy of a successful science thread
Maarten van Smeden
 

More from Maarten van Smeden (18)

Uncertainty in AI
Uncertainty in AIUncertainty in AI
Uncertainty in AI
 
UMC Utrecht AI Methods Lab
UMC Utrecht AI Methods LabUMC Utrecht AI Methods Lab
UMC Utrecht AI Methods Lab
 
Associate professor lecture
 

 

 

Sample size for binary logistic prediction models: Beyond events per variable criteria

  • 1. Sample size for binary logistic prediction models: Beyond events per variable criteria Maarten van Smeden, PhD Leiden University Medical Center Senior researcher MEMTAB 2018 Utrecht, July 3
  • 2. Slides available at: https://www.slideshare.net/MaartenvanSmeden/presentations MEMTAB, Utrecht, July 3 2018 Sample size prediction modeling literature (2018)
  • 3. Events per variable (EPV)
  • 4. Events per variable (EPV)
  • 5. Events per variable (EPV): Critique
      • Flimsy supporting evidence for 10 EPV rule [1]
      • 50 EPV rule more realistic with traditional variable selection techniques [2]
      • 5 EPV sufficient to reduce (average) overfitting after “modern” shrinkage [3]
      • EPV only part of sample size story [4]
      [1] van Smeden et al., BMC MRM, 2016, doi: 10.1186/s12874-016-0267-3
      [2] Steyerberg et al., Stat Med, 2000, doi: 10.1002/(SICI)1097-0258(20000430)19:8<1059::AID-SIM412>3.0.CO;2-0
      [3] Pavlou et al., Stat Med, 2016, doi: 10.1002/sim.6782
      [4] Ogundimu et al., JCE, 2016, doi: 10.1016/j.jclinepi.2016.02.031
  • 6. EPV forgets about the intercept?
  • 7. New sample size criteria: rMSPE
      Root Mean Squared Prediction Error (rMSPE): standard deviation of out-of-sample probability prediction error
      Rationale: since clinical prediction is about probability estimation, a sample size criterion should be based on allowable error rates in these estimates
  • 10. *Coverage property not guaranteed: assuming errors are IID normal
  • 12. Unfortunately, there is no closed-form solution for out-of-sample rMSPE
  • 13-15. Simulation study
      • 4,032 simulation conditions (factorial design). Simulation factors: EPV (3 to 50), number of candidate predictors (4 to 12), events fraction (1/16 to 1/2), area under the ROC curve (0.65 to 0.85), distribution and correlation of predictors, number of noise variables
      • 5,000 replications per condition, i.e. more than 20 million simulation runs
      • Each run: generate pairs of derivation data and validation data (large, with 5,000 expected events) and develop + validate various logistic prediction models
      • Will focus on maximum likelihood logistic regression
      • Simulation meta-models: fit linear (ridge) regression models to predict the simulation outcome (rMSPE) from the simulation factors
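A single simulation condition of the kind described above could be sketched as follows. This is a minimal illustration, not the authors' simulation code: the function names, the independent standard normal predictors, the Newton-Raphson fitting routine, and the reduced validation size (`n_val` individuals rather than 5,000 expected events) are all assumptions made for brevity.

```python
import numpy as np

def expit(eta):
    """Inverse logit, with clipping to avoid overflow in exp."""
    return 1.0 / (1.0 + np.exp(-np.clip(eta, -30, 30)))

def fit_logistic_ml(X, y, n_iter=50, tol=1e-8):
    """Maximum likelihood logistic regression via Newton-Raphson.
    No safeguards against separation (assumed not to occur here)."""
    Xd = np.column_stack([np.ones(len(X)), X])  # add intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = expit(Xd @ beta)
        w = p * (1.0 - p)
        step = np.linalg.solve(Xd.T @ (Xd * w[:, None]), Xd.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def simulate_rmspe(epv, n_predictors, true_beta, n_reps=200, n_val=5000, seed=1):
    """Average out-of-sample rMSPE for one (hypothetical) simulation condition.
    true_beta = [intercept, slope_1, ..., slope_P] of the data-generating model."""
    rng = np.random.default_rng(seed)
    true_beta = np.asarray(true_beta, dtype=float)
    # Approximate the events fraction implied by the true model.
    X_big = rng.standard_normal((100_000, n_predictors))
    events_fraction = expit(true_beta[0] + X_big @ true_beta[1:]).mean()
    # Derivation sample size giving the requested expected EPV.
    n = int(np.ceil(epv * n_predictors / events_fraction))
    results = []
    for _ in range(n_reps):
        X = rng.standard_normal((n, n_predictors))
        y = rng.binomial(1, expit(true_beta[0] + X @ true_beta[1:]))
        beta_hat = fit_logistic_ml(X, y)
        # Validation: compare estimated with true probabilities out of sample.
        X_val = rng.standard_normal((n_val, n_predictors))
        pi_true = expit(true_beta[0] + X_val @ true_beta[1:])
        pi_hat = expit(beta_hat[0] + X_val @ beta_hat[1:])
        results.append(np.sqrt(np.mean((pi_true - pi_hat) ** 2)))
    return float(np.mean(results))
```

For example, `simulate_rmspe(10, 4, [0.0, 0.5, 0.5, 0.5, 0.5])` returns the average rMSPE at 10 EPV with four predictors under this toy data-generating model; repeating the call over a grid of settings would mimic the factorial design on a small scale.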
  • 16. Simulation meta-models for rMSPE
      • Meta-model with 3 (of 7) factors: N, events fraction and number of (candidate) predictors: R² = 0.992
      • (Meta-model with only EPV as factor: R² = 0.432)
      https://mvansmeden.shinyapps.io/BeyondEPV/
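A three-factor meta-model of this kind could be fitted roughly as sketched below. The slides only state that linear (ridge) regression was used; the log transformation of rMSPE and of the three factors, the unpenalised intercept, and the function names are assumptions made for illustration, and the inputs would be the per-condition results of a simulation study.

```python
import numpy as np

def fit_meta_model(n, events_fraction, n_predictors, rmspe, penalty=0.0):
    """Ridge regression of log(rMSPE) on log(N), log(events fraction), log(P).
    The log-log form is an assumption; the intercept is left unpenalised.
    Returns [intercept, coef_logN, coef_log_events_fraction, coef_logP]."""
    n = np.asarray(n, dtype=float)
    X = np.column_stack([
        np.ones(len(n)),
        np.log(n),
        np.log(np.asarray(events_fraction, dtype=float)),
        np.log(np.asarray(n_predictors, dtype=float)),
    ])
    y = np.log(np.asarray(rmspe, dtype=float))
    D = np.eye(X.shape[1])
    D[0, 0] = 0.0  # do not shrink the intercept
    return np.linalg.solve(X.T @ X + penalty * D, X.T @ y)

def predict_rmspe(coef, n, events_fraction, n_predictors):
    """Predicted rMSPE from a fitted meta-model."""
    log_pred = (coef[0]
                + coef[1] * np.log(n)
                + coef[2] * np.log(events_fraction)
                + coef[3] * np.log(n_predictors))
    return np.exp(log_pred)
```

With a fitted `coef`, `predict_rmspe(coef, n, events_fraction, n_predictors)` can be evaluated over a range of `n` to find the smallest sample size whose predicted rMSPE falls below a chosen threshold.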
  • 18. In press. Thanks to Richard Riley for commenting on an early draft
  • 19. Final remarks
      • Prediction models developed at 10 EPV can produce wildly inaccurate probability estimates
      • New sample size criterion, based on rMSPE, could be accurately approximated by predictable data characteristics
      • Validation, analytical work, and extensions still need to be done
      • Our new sample size calculation shiny-app is “Beta”; it can be used to approximate rMSPE for settings that stay close to our simulation design (article in press)
      • One sample size criterion probably isn’t always enough. Notably, low events fraction settings may come with low rMSPE and a high need for shrinkage
  • 20. Final remarks: binary logistic regression sample size recommendations
      1. Think about allowable probability prediction error (e.g. in terms of 95% coverage regions)
      2. If you can, run a realistic simulation study
      3. If you can’t do 2, use our shiny-app with caution to calculate minimal sample size
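Recommendation 1 can be made concrete with a small calculation. The sketch below assumes, as the slide 10 footnote does, that prediction errors are IID normal, and additionally that they are centred on zero; under those assumptions about 95% of errors fall within ±1.96 × rMSPE. The function names are hypothetical.

```python
from statistics import NormalDist

def coverage_half_width(rmspe, level=0.95):
    """Half-width of the region expected to contain `level` of the
    prediction errors, assuming IID normal errors with mean zero."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * rmspe

def rmspe_for_allowable_error(max_error, level=0.95):
    """Largest rMSPE for which `level` of the errors stay within
    +/- max_error, under the same normality assumption."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return max_error / z
```

For instance, an rMSPE of 0.05 corresponds to a 95% region of roughly ±0.098 on the probability scale, which can then be judged against what is clinically allowable.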
  • 21. https://mvansmeden.shinyapps.io/BeyondEPV/
  • 24. Logistic prediction models
      Schmidt et al., Schizo Bulletin, 2017, doi: 10.1093/schbul/sbw098
      Damen et al., BMJ, 2017, doi: 10.1136/bmj.i2416
      Collins et al., BMC MRM, 2014, doi: 10.1186/1471-2288-14-40
      Collins et al., BMC Med, 2011, doi: 10.1186/1741-7015-9-103
      Bouwmeester et al., Plos Med, 2012, doi: 10.1371/journal.pmed.1001221
  • 25. New sample size criterion
      Use expected root Mean Squared Prediction Error (rMSPE)
      Interpretation: standard deviation of expected out-of-sample probability prediction error
      rMSPE = \sqrt{E[(\pi_i - \hat{\pi}_i)^2]}
      where \pi_i are the unobservable “true” probabilities that would have been obtained had the prediction model been derived with correct functional form and infinite sample size, and \hat{\pi}_i are the estimated probabilities from the derived model in a large external set of similar individuals (“out-of-sample”).
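As a worked illustration of the definition, the expectation can be replaced by an average over a validation set. This is a minimal sketch; in practice the “true” probabilities are unobservable and are only known in a simulation, and the function name is an assumption.

```python
import math

def rmspe(true_probs, estimated_probs):
    """Root mean squared prediction error between true and estimated
    probabilities, averaging over individuals in place of the expectation."""
    if len(true_probs) != len(estimated_probs):
        raise ValueError("inputs must have the same length")
    squared_errors = [(p - p_hat) ** 2
                      for p, p_hat in zip(true_probs, estimated_probs)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

With true probabilities 0.1, 0.2, 0.3 and estimates 0.2, 0.2, 0.1, the squared errors are 0.01, 0 and 0.04, so the rMSPE is the square root of their mean, about 0.129.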
  • 26. Difference between the estimated probability from a prediction model when applied in a large-sample validation study and the “true” probability that would be obtained if the same model were derived from an infinitely large sample