A critical reflection on the role of Machine Learning in medical research, with specific comparisons to more classical statistical approaches to learning from data
Statistics and ML Paris 20sept22
1. Sept 20, 2022
Statistics and machine learning:
friends or foes?
Ewout W. Steyerberg, PhD
Professor of Clinical Biostatistics and
Medical Decision Making
Dept of Biomedical Data Sciences
Leiden University Medical Center
Thanks to many, including Ben van Calster, Leuven;
Maarten van Smeden, Utrecht
2. Statistics and machine learning:
friends or foes?
21-Sep-22
• Introduction for debate
• Friction points: foes
• Commonalities between statistics and ML: friends
3. Statistics and Machine Learning (ML)
In medical research, “artificial intelligence” usually just means “machine learning” or
“algorithm”
5. Friction points between statistics and ML: foes
1. ML claims to be new and supersede statistics
2. ML claims data is most relevant
3. ML makes promises it cannot keep
6. 1. ML claims to be new and supersede statistics
11. 1. ML claims to be new and supersede statistics
ML has developed from statistics
ML as part of statistics
Statistics as part of ML
12. 2. ML claims data is most relevant
Typical context: Electronic Health Records (EHR); large administrative data sets
Uncover patterns in data that are there but remained hidden
Strong point of EHR: large N, large sets of features
Weak point of EHR: ‘quality’
Selection
Start point definition
End point definition
Missing values
…
13. More data is better? Lessons from meta-analysis
Meta-analysis:
Risk of bias assessment
Respect clustering nature
Personal protective equipment for preventing highly infectious diseases
15. 3. ML makes promises it cannot keep
“Uncover patterns in data that are there but remained hidden”
Unsupervised learning
Clustering unstable and determined by optimization criterion
Supervised learning
Trees / neural networks > regression
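The point that clustering is "determined by optimization criterion" can be illustrated with a toy example. The sketch below is not from the talk: the twelve data points and both function names are made up for illustration. It splits the same one-dimensional data into two clusters twice, once by minimizing the within-cluster sum of squares (the k-means objective) and once by cutting at the widest gap (what single-linkage clustering yields in one dimension).

```python
# Toy illustration (hypothetical data, not from the talk):
# one data set, two clustering criteria, two different answers.
points = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11.5]  # sorted 1-D values


def within_ss(values):
    """Sum of squared deviations from the group mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_min_within_ss(sorted_values):
    """Two-group split that minimizes the total within-cluster sum of squares."""
    cut = min(
        range(1, len(sorted_values)),
        key=lambda i: within_ss(sorted_values[:i]) + within_ss(sorted_values[i:]),
    )
    return sorted_values[:cut], sorted_values[cut:]


def split_largest_gap(sorted_values):
    """Two-group split at the widest gap between neighbouring values."""
    cut = max(
        range(1, len(sorted_values)),
        key=lambda i: sorted_values[i] - sorted_values[i - 1],
    )
    return sorted_values[:cut], sorted_values[cut:]


by_variance = split_min_within_ss(points)
by_gap = split_largest_gap(points)
print("min within-cluster SS:", by_variance)
print("largest gap:          ", by_gap)
```

The variance criterion splits the data into two halves, while the gap criterion isolates the single outlying point. Neither answer is "the" hidden pattern; each follows from the criterion chosen.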
20. Machine learning vs conventional modeling
“We found that random forests did not outperform Cox models despite their inherent ability to
accommodate nonlinearities and interactions. …
Elastic nets achieved the highest discrimination performance …, demonstrating
the ability of regularisation to select relevant variables and optimise model coefficients in an EHR context.”
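For readers unfamiliar with the term, the regularisation behind an elastic net can be written in one line. The form below is the generic textbook objective, not necessarily the exact parameterization of the quoted study; the symbols \ell (log-likelihood, or partial log-likelihood for a Cox model), \lambda (overall penalty strength) and \alpha (mix between lasso and ridge) are notation chosen here, and software packages differ in how they scale the likelihood term.

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\;
  \Bigl\{ -\ell(\beta)
  \;+\; \lambda \Bigl[ \alpha \lVert \beta \rVert_1
  + \tfrac{1-\alpha}{2} \lVert \beta \rVert_2^2 \Bigr] \Bigr\},
\qquad \lambda \ge 0,\; 0 \le \alpha \le 1
```

With \alpha = 1 this reduces to the lasso, which can set coefficients exactly to zero (variable selection); with \alpha = 0 it reduces to ridge regression, which only shrinks them.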
21. Systematic review on ML vs classic modeling
25. Commonalities between statistics and ML: friends
4. Research question is key
5. Complex data structures require innovative approaches
6. Some problems are really hard
27. 4. Research question is key
From easy to hard questions
- Exploratory / descriptive
- Prediction / classification
- Causal
28. 4. Research questions
Separate
- Exploratory: data mining
“enjoy the results, because you will never see these results again”
- Descriptive: patterns in the data to learn about nature;
hypothesis generating; biomarkers – disease
ML provides more flexibility; less interpretability?
- Prediction: machine learning / trees often poor in performance
ML may provide benefits in specific circumstances?
29. ML is data hungry
31. ML good for prediction?
Large N, small p
“Natural flexibility”?
Versus non-linear terms / interactions?
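A small simulation can make the contrast between "natural flexibility" and explicit non-linear terms concrete. The sketch below is an illustration under assumed conditions, not an analysis from the talk: the outcome is simulated as a quadratic function of one predictor plus noise, and the sample size, seed and noise level are arbitrary choices. It compares the R² of a straight-line fit on x with a straight-line fit on the transformed term x², using the fact that R² in simple linear regression equals the squared Pearson correlation.

```python
import random

random.seed(1)  # arbitrary seed, only for reproducibility

# Simulated data (assumption): y depends on x through x squared, plus noise.
x = [random.uniform(-2, 2) for _ in range(500)]
y = [xi ** 2 + random.gauss(0, 0.5) for xi in x]


def r_squared(predictor, outcome):
    """R-squared of a simple linear regression (squared Pearson correlation)."""
    n = len(predictor)
    mean_p = sum(predictor) / n
    mean_o = sum(outcome) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predictor, outcome))
    var_p = sum((p - mean_p) ** 2 for p in predictor)
    var_o = sum((o - mean_o) ** 2 for o in outcome)
    return cov ** 2 / (var_p * var_o)


r2_linear = r_squared(x, y)                         # regression on x only
r2_quadratic = r_squared([xi ** 2 for xi in x], y)  # regression on the term x^2

print(f"R2, linear term only: {r2_linear:.2f}")
print(f"R2, quadratic term:   {r2_quadratic:.2f}")
```

In this setting a regression model with one well-chosen non-linear term captures the signal that the purely linear model misses, which is the alternative to relying on an algorithm's built-in flexibility.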
33. Large N, large p
“model based on all of the 473
available variables”
Alaa et al. PLoS One 2019;14:e0213653.
34. ML good for treatment selection rules?
High hopes
“The incorporation of new data modalities such as single-cell profiling, along with techniques that
rapidly find effective drug combinations will likely be instrumental in improving cancer care.”
35. Statistics good for treatment selection rules?
37. Alternatives
1) Risk-based methods (11 papers) use only prognostic factors to define patient subgroups,
relying on the mathematical dependency of the absolute risk difference on baseline risk;
2) Treatment effect modeling methods (9 papers) use both prognostic factors and treatment effect modifiers,
including penalization or separate data sets for subgroup identification / effect
3) Optimal treatment regime methods (12 papers) focus primarily on treatment effect modifiers
to classify the trial population into those who benefit from treatment and those who do not
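The "mathematical dependency of the absolute risk difference on baseline risk" in approach 1 can be shown with a few lines of arithmetic. The sketch below assumes a constant relative treatment effect on the odds scale; the odds ratio of 0.8 and the three baseline risks are hypothetical values chosen for illustration, and the function name is not from any of the reviewed papers.

```python
def absolute_risk_reduction(baseline_risk, odds_ratio):
    """Absolute risk reduction implied by a constant odds ratio (assumption)."""
    baseline_odds = baseline_risk / (1 - baseline_risk)
    treated_odds = baseline_odds * odds_ratio
    treated_risk = treated_odds / (1 + treated_odds)
    return baseline_risk - treated_risk


HYPOTHETICAL_ODDS_RATIO = 0.8  # illustrative value, not an estimate from any trial

for risk in (0.05, 0.20, 0.50):
    arr = absolute_risk_reduction(risk, HYPOTHETICAL_ODDS_RATIO)
    print(f"baseline risk {risk:.2f} -> absolute risk reduction {arr:.3f}")
```

Even with the same relative effect for every patient, the absolute benefit grows with baseline risk over this range, which is why subgroups defined by prognostic factors alone can already differ in absolute treatment benefit.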
38. 5. Complex data structures require innovative approaches
Examples of successful ML
- Image analysis: Deep Learning (DL)
- Radiology, pathology, dermatology, ophthalmology, gastroenterology, cardiology,
…
- Free text: natural language processing (NLP)
- Mining electronic health records, building blocks for prediction, …
- Pharmacovigilance in social media
39. 6. Some problems are really hard
Prediction
Small N, small p regression
Small N, large p hopeless
Large N, small p regression
Large N, large p ?
Treatment selection
Balance bias – precision
Causal interpretation
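Why "small N, large p" is labelled hopeless can be seen in a simulation with no signal at all. The sketch below is a made-up illustration, not an analysis from the talk: 200 pure-noise features are measured in 10 cases and 10 controls, the feature with the best apparent AUC is selected, and its performance is then checked in new patients. All sizes, the seed and the function names are arbitrary assumptions.

```python
import random


def auc(scores_cases, scores_controls):
    """Probability that a random case scores higher than a random control."""
    wins = sum(
        (c > k) + 0.5 * (c == k) for c in scores_cases for k in scores_controls
    )
    return wins / (len(scores_cases) * len(scores_controls))


def noise_feature_auc(rng, n_per_group):
    """AUC of one feature that is pure noise, unrelated to the outcome."""
    cases = [rng.gauss(0, 1) for _ in range(n_per_group)]
    controls = [rng.gauss(0, 1) for _ in range(n_per_group)]
    return auc(cases, controls)


rng = random.Random(2022)  # arbitrary seed, only for reproducibility

# Small N (10 cases, 10 controls), large p (200 candidate features, all noise).
apparent_aucs = [noise_feature_auc(rng, 10) for _ in range(200)]
best_apparent_auc = max(apparent_aucs)

# The selected feature is still noise, so its values in new patients are
# simply fresh noise: validate it in 500 new cases and 500 new controls.
validation_auc = noise_feature_auc(rng, 500)

print(f"best apparent AUC among 200 noise features: {best_apparent_auc:.2f}")
print(f"AUC of the selected feature in new data:     {validation_auc:.2f}")
```

Selection among many candidates in a small sample produces an apparently strong predictor by chance alone; in new data its discrimination falls back to about 0.5.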
40. Summary 20 sept 2022
1. ML is not really new and needs to liaise with statistics
2. Data quality and bias: design is key, learn from clinical epidemiology
3. Don’t make too many promises
4. Research questions relate to description, prediction and causality
5. Recognized power for specific complex data structures
6. Work on the truly hard problems together