1. Big data in research:
possibilities and pitfalls
Joppe Nijman
Pediatric Intensivist / Clinician Researcher
University Medical Center Utrecht
SOWIESO dag 2022
Slideshare.net/joppenijman
2. “Big data is data that contains
greater variety, arriving in
increasing volumes and with
more velocity, making it difficult
or impossible to process using
traditional methods.”
9. “Artificial intelligence will not
replace doctors.
But doctors who use artificial
intelligence will replace those
who don’t.”
Berci Meskó - https://medicalfuturist.com/
10. How (not) to make your AI model
⬡ Step 1: Healthcare data
⬡ Step 2: Data preparation
⬡ Step 3: Choosing the right statistical method
⬡ Step 4: Modelling
⬡ Step 5: Model evaluation & prediction
⬡ Step 6: Implementation, legal & ethical issues
11. Type of data
⬡ (Semi-)structured vs unstructured
⬡ Text, numbers, images, etc.
Data availability?
FAIR data
(Findability, Accessibility, Interoperability, Reusability)
Healthcare data
Johnson et al. JACC 2018
12. Research for new breast
cancer biomarkers
⬡ Structured data
⬡ Mainly numbers (imaging)
⬡ Machine learning?
∙ Amount of data
∙ Complex interactions
Example
Rodrigues-Ferreira et al. Cancer Lett 2022.
13. Garbage in = garbage out
⬡ Measurement error
⬡ Missing data
⬡ Normalization
⬡ Unclassified data
⬡ Confounding (e.g. treatment effects)?
Data preparation
Johnson et al. JACC 2018
https://xkcd.com/1838/
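Two of the data preparation steps listed on this slide, missing data and normalization, can be sketched in a few lines. This is a minimal illustration only, assuming median imputation and z-score scaling; the slide does not say which methods were used, and the function names and the heart-rate values are hypothetical.

```python
from statistics import mean, median, pstdev


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = median(observed)
    return [fill if v is None else v for v in values]


def zscore(values):
    """Rescale values to mean 0 and population standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical heart-rate readings with two missing measurements.
heart_rate = [80, None, 100, 120, None]
imputed = impute_median(heart_rate)
normalized = zscore(imputed)
```

Median imputation is only one option and can bias results when data are not missing at random, so the choice of method should follow from why the values are missing.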
16. Automated sepsis prediction
algorithm in the Epic EHR
Unclassified data
https://www.statnews.com/2021/09/27/epic-sepsis-algorithm-antibiotics-model/
Huat Goh et al. Nature Comm 2021
17. Technique selection
⬡ Conventional
∙ E.g. Regression
⬡ Advanced data science / machine
learning / artificial intelligence
Johnson et al. JACC 2018
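As an illustration of the "conventional" end of this choice, the sketch below fits a simple linear regression by ordinary least squares. It is a minimal example under stated assumptions: one predictor, a hypothetical function name, and made-up data. It is not the method used in any of the cited studies.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data: a predictor and an outcome with an exact linear relation.
age_years = [1, 2, 3, 4, 5]
score = [3.0, 5.0, 7.0, 9.0, 11.0]
slope, intercept = fit_simple_linear_regression(age_years, score)
```

A baseline like this is easy to interpret and to check, which makes it a useful reference point before moving to a more complex machine learning model.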