Open Science Better Science? Steyerberg 2June2022.pptx

June, 2022
Is Open Science Better Science?
Ewout W. Steyerberg, PhD
Professor of Clinical Biostatistics and
Medical Decision Making
Thanks to many for assistance and inspiration,
including the GAP3 consortium and the CENTER-TBI study
Yes, but …

Open vs closed science
Long ago
- Performed by a few elite scientists
- Doing private experiments
- Discussion in small, closed communities

Probabilities to quantify uncertainty
• Christiaan Huygens 1657:
'Van rekeningh in spelen van geluck'
• Thomas Bayes 1763:
“An Essay towards solving a Problem in the Doctrine of Chances”
(read to the Royal Society by Richard Price)
• Pierre Laplace 1812:
Théorie analytique des probabilités
6-Jun-22

Open vs closed science
Long ago
- Performed by a few elite scientists
- Doing private experiments
- Discussion in small, closed communities
Recent
- Science as a profession
- Protect data + code as intellectual property
- Aim for shocking findings in journals with a high impact factor (IF)
https://www.sciencemag.org/news/2020/06/whos-blame-these-three-scientists-are-heart-surgisphere-covid-19-scandal

Overall claim
“Open Science will make research better”
Vote pro / neutral / con
“More data is better”
Vote pro / neutral / con

Today
Aims:
- Highlight some strong points in Open Science
- Hint at some challenges in Open Science
Reflections based on personal 30-yr research experience,
specific focus on prediction research / decision making

Open Science to better address
Big Research questions

Open science research questions: case 1
Example 1: Red cards and dark-skinned soccer players
https://psyarxiv.com/qkwst/

• 29 teams involving 61 analysts; same dataset; same research question:
whether soccer referees are more likely to give red cards to players with a
dark skin tone than to players with a light skin tone
• Estimated odds ratios: 0.89 to 2.93 (median 1.3)
• 20 teams: statistically significant positive effect; 9 teams: non-significant relation

Estimated odds ratios by 29 research teams

“Logistic regression”

• 29 teams involving 61 analysts; same dataset; same research question:
whether soccer referees are more likely to give red cards to players with a
dark skin tone than to players with a light skin tone
• Estimated odds ratios: 0.89 to 2.93 (median 1.3)
• 20 teams: statistically significant positive effect; 9 teams: non-significant relation
• 21 unique combinations of covariates
• “Variation in analysis of complex data may be difficult to
avoid, even by experts with honest intentions”
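Why the choice of covariates can move an odds ratio can be sketched with a small example. The counts below are hypothetical, invented purely for illustration, and Mantel-Haenszel stratification is used here as a simple stand-in for covariate adjustment; it is not the method of any of the 29 teams. The crude odds ratio ignores the covariate, the adjusted one stratifies on it.

```python
from dataclasses import dataclass


@dataclass
class Table2x2:
    """Counts for one stratum: exposed/unexposed by event/no event."""
    exposed_events: int
    exposed_nonevents: int
    unexposed_events: int
    unexposed_nonevents: int


def odds_ratio(t):
    """Cross-product odds ratio for a single 2x2 table."""
    return (t.exposed_events * t.unexposed_nonevents) / (
        t.exposed_nonevents * t.unexposed_events
    )


def collapse(strata):
    """Ignore the covariate: add all strata into one crude table."""
    return Table2x2(
        sum(s.exposed_events for s in strata),
        sum(s.exposed_nonevents for s in strata),
        sum(s.unexposed_events for s in strata),
        sum(s.unexposed_nonevents for s in strata),
    )


def mantel_haenszel_or(strata):
    """Mantel-Haenszel odds ratio, adjusted for the stratifying covariate."""
    num = 0.0
    den = 0.0
    for s in strata:
        n = (s.exposed_events + s.exposed_nonevents
             + s.unexposed_events + s.unexposed_nonevents)
        num += s.exposed_events * s.unexposed_nonevents / n
        den += s.exposed_nonevents * s.unexposed_events / n
    return num / den


# Hypothetical counts (assumption, not data from the red card study).
strata = [
    Table2x2(30, 270, 10, 190),  # stratum A of the covariate
    Table2x2(10, 490, 12, 788),  # stratum B of the covariate
]
crude_or = odds_ratio(collapse(strata))
adjusted_or = mantel_haenszel_or(strata)
print(f"crude OR = {crude_or:.2f}, adjusted OR = {adjusted_or:.2f}")
```

With these made-up counts the crude odds ratio is about 2.3 and the stratified one about 1.8: same data, same question, different defensible answer.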

Example from Maarten van Smeden
@MaartenvSmeden

Predicting mortality – the media

Findings not convincing
Cox, #4, 30 vars, max c = 0.793
RF, #7, 600 vars, c = 0.797
Elastic, #9, 600 vars, c = 0.801
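For readers less familiar with the c statistic, a minimal sketch follows. It assumes a binary outcome and ignores censoring, so it is a simplification of the time-to-event c reported for survival models; the risks and outcomes are hypothetical. The c statistic is the fraction of (event, non-event) pairs in which the event received the higher predicted risk.

```python
def c_statistic(predictions, outcomes):
    """Fraction of (event, non-event) pairs ranked correctly; ties count 0.5."""
    events = [p for p, y in zip(predictions, outcomes) if y == 1]
    nonevents = [p for p, y in zip(predictions, outcomes) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event in events:
        for p_nonevent in nonevents:
            if p_event > p_nonevent:
                concordant += 1.0
            elif p_event == p_nonevent:
                concordant += 0.5
    return concordant / (len(events) * len(nonevents))


# Hypothetical predicted risks and observed outcomes (1 = died).
risks = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
died = [1, 1, 0, 1, 0, 1, 0, 0]
c = c_statistic(risks, died)
print(c)  # 13 of 16 pairs are concordant -> 0.8125
```

On this scale, the differences between 0.793, 0.797 and 0.801 correspond to well under one percentage point of correctly ranked pairs.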

Machine learning vs conventional modeling
1. Findings convincing?
“We found that random forests did not outperform Cox models despite their
inherent ability to accommodate nonlinearities and interactions. …
Elastic nets achieved the highest discrimination performance …, demonstrating
the ability of regularisation to select relevant variables and optimise model
coefficients in an EHR context.”

Machine learning vs conventional modeling
1. Findings convincing? Not in case-study
2. Systematic / “it depends”?

• 243 real datasets from “the OpenML database”
• RF performed better than LR:
mean difference in Area Under the ROC Curve between RF and LR was 0.041
(95% CI [0.031, 0.053])
• Results were dependent on the inclusion criteria used to select the example
datasets
• ES: Results rely on 10 times repeated 10-fold cross-validation
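What 10 x 10-fold cross-validation means in practice can be sketched as below. This is a minimal standard-library illustration, not the code of the OpenML comparison; the function name `repeated_kfold` and the sample size are assumptions. Each repetition reshuffles the data and splits it into 10 folds, giving 100 train/test splits over which performance is averaged.

```python
import random


def repeated_kfold(n_samples, n_splits=10, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh random partition for every repetition
        # Fold f takes every n_splits-th shuffled index; sizes differ by at most 1.
        folds = [order[f::n_splits] for f in range(n_splits)]
        for f, test_idx in enumerate(folds):
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield repeat, f, train_idx, test_idx


# Hypothetical dataset of 95 subjects.
splits = list(repeated_kfold(n_samples=95))
print(len(splits))  # 10 repetitions x 10 folds = 100 splits
```

In a small dataset each test fold holds only a handful of events, which is one reason results from such comparisons can depend on which datasets are included.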

• More clarification is needed on when ML / RF works best; at the least, a large N is needed

Systematic review on ML vs classic modeling

Summary on examples of Open Science
to better address Big research questions
• 1 data set
• multiple modelers
• Multiple modeling options
• 1 neutral comparison; 243 OpenML datasets
• Review of 282 comparative studies: meta-research

Open Science: data sharing
Collaboration vs giving

Heterogeneity in data .. ignored

Data sharing
• Pro:
• Allowed for larger sample size in a rare disease
• Cons:
• Heterogeneity?
• Substantial politics / efforts

Open Science: analyses and interpretation

Analyses: OHDSI model

OHDSI: COVID and other research topics

The power of OHDSI

OMOP common data model enables sharing of
model development code

Performance for different outcomes in multiple cohorts

OHDSI: bridging data sharing and analyses
• Keep data local
• Run centrally available analyses, started locally
• Share results centrally
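The three steps above can be sketched as a pattern. This is a toy illustration only: it does not use OHDSI tooling or the OMOP common data model, and every name in it (`SiteSummary`, `local_analysis`, `pool`, the site labels and data) is hypothetical. The point is that only aggregates cross the site boundary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteSummary:
    """Aggregate results a site shares centrally; contains no patient rows."""
    site: str
    n: int
    events: int


def local_analysis(site, outcomes):
    """Centrally available analysis code, run inside each site on its own data."""
    return SiteSummary(site=site, n=len(outcomes), events=sum(outcomes))


def pool(summaries):
    """Central step: combine the shared aggregates into an overall event rate."""
    total_n = sum(s.n for s in summaries)
    total_events = sum(s.events for s in summaries)
    return total_events / total_n


# Hypothetical local datasets (1 = outcome occurred); these stay at the site.
site_data = {
    "site_a": [1, 0, 0, 1, 0],
    "site_b": [0, 0, 1, 0, 0, 0, 1, 0, 0, 0],
}
summaries = [local_analysis(name, rows) for name, rows in site_data.items()]
overall_rate = pool(summaries)
print(summaries, overall_rate)
```

A shared data format is what makes the same `local_analysis` runnable at every site without modification.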

Open Science challenge:
dealing with heterogeneity for prediction research
Heterogeneity
• Study design
• Selection of subjects
• Measurement of covariates
• Measurement of outcomes
• Associations of covariates with outcome
• Overall outcome rates
• Performance of prediction models
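One common way to quantify between-cohort heterogeneity, for example in a predictor effect or in a performance measure, is a random-effects meta-analysis. The sketch below uses the DerSimonian-Laird estimator as an assumption of convenience; the talk does not state which estimator was used, and the per-cohort estimates are hypothetical.

```python
def dersimonian_laird(estimates, variances):
    """Random-effects pooling; returns (pooled estimate, tau^2, I^2)."""
    k = len(estimates)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-cohort variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    return pooled, tau2, i2


# Hypothetical log odds ratios for one predictor in four cohorts.
log_ors = [0.2, 0.5, 0.8, 1.1]
variances = [0.04, 0.04, 0.04, 0.04]
pooled, tau2, i2 = dersimonian_laird(log_ors, variances)
print(f"pooled = {pooled:.2f}, tau^2 = {tau2:.2f}, I^2 = {i2:.0%}")
```

A tau^2 well above zero indicates that the cohorts differ by more than sampling error alone would explain, so a single pooled estimate hides real variation.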

Analyses: dealing with heterogeneity

15 cohorts: 11 RCTs, 4 Observational studies

Heterogeneous case-mix

Heterogeneous predictor effects

Heterogeneous predictions

Heterogeneity  uncertainty in individual predictions
given that a prespecified logistic model is fitted

“Open Science is Better Science”
1. Research questions in competitions
• Red cards
• Neutral comparisons / meta-research
2. Data sharing
• Collaborative efforts most successful
3. Analyses
• OHDSI: modern, keep data local
• Heterogeneity
