Oct 21, 2022
Statistics and machine learning:
friends or foes?
Ewout W. Steyerberg, PhD
Professor of Clinical Biostatistics and
Medical Decision Making
Dept of Biomedical Data Sciences
Leiden University Medical Center
Thanks to many, including Ben van Calster, Leuven;
Maarten van Smeden, Utrecht
21-Oct-22
• Introduction for debate
• Friction points: foes
• Commonalities between statistics and ML: friends
Statistics and Machine Learning (ML)
In medical research, “artificial intelligence” usually just means “machine learning” or “algorithm”
Machine learning in medical research
Machine learning and AI everywhere
IBM Watson winning Jeopardy! (2011)
Dr Watson
Dr Watson lessons
Dr Watson lesson 1
Dr Watson lesson 2
Dr Watson lesson 3
Friction points between statistics and ML: foes
1. ML claims to be new and supersede statistics
2. ML claims any data is relevant
3. ML makes promises it cannot keep
1. ML claims to be new and supersede statistics
Reviewer comment
“Everything is an ML method”
Statistics                    Machine learning
Covariates                    Features
Outcome variable              Target
Model                         Network, graphs
Parameters                    Weights
Model for discrete var.       Classifier
Model for continuous var.     Regression
Log-likelihood                Cross-entropy loss
Multinomial regression        Softmax
Measurement error             Noise
Subject/observation           Sample/instance
Dummy coding                  One-hot encoding
Measurement invariance        Concept drift

Statistics                    Machine learning
Prediction                    Supervised learning
Latent variable modeling      Unsupervised learning
Fitting                       Learning
Prediction error              Error
Sensitivity                   Recall
Positive predictive value     Precision
Contingency table             Confusion matrix
Measurement error model       Noise-aware ML
Structural equation model     Gaussian Bayesian network
Gold standard                 Ground truth
Derivation–validation         Training–test
Experiment                    A/B test
Language
https://www.analyticsvidhya.com/glossary-of-common-statistics-and-machine-learning-terms/
https://developers.google.com/machine-learning/glossary
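A few of the paired terms above can be checked numerically. The sketch below is illustrative only: the labels and predicted probabilities are made up, and the function names are hypothetical. It shows that sensitivity and recall, and positive predictive value and precision, are the same quantities read from the same 2x2 table, and that the mean binary cross-entropy loss is the Bernoulli log-likelihood with the sign flipped and divided by the sample size.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def sensitivity(y_true, y_pred):
    """Statistics: sensitivity = TP / (TP + FN)."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def positive_predictive_value(y_true, y_pred):
    """Statistics: PPV = TP / (TP + FP)."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp)


# Machine learning names for the same two quantities.
recall = sensitivity
precision = positive_predictive_value


def bernoulli_log_likelihood(y_true, p_hat):
    """Log-likelihood of binary outcomes given predicted probabilities."""
    return sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(y_true, p_hat)
    )


def cross_entropy_loss(y_true, p_hat):
    """Mean binary cross-entropy: minus the log-likelihood, divided by n."""
    return -bernoulli_log_likelihood(y_true, p_hat) / len(y_true)


# Made-up outcomes and predicted probabilities, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
p_hat = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1, 0.3, 0.8]
y_pred = [1 if p >= 0.5 else 0 for p in p_hat]  # 0.5 cut-off is arbitrary

print("sensitivity / recall:", sensitivity(y_true, y_pred))
print("PPV / precision:     ", positive_predictive_value(y_true, y_pred))
print("log-likelihood:      ", bernoulli_log_likelihood(y_true, p_hat))
print("cross-entropy loss:  ", cross_entropy_loss(y_true, p_hat))
```

Maximising the log-likelihood and minimising the cross-entropy loss therefore select the same parameters; only the vocabulary differs.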
Where to place Machine Learning?
https://codeburst.io/statistics-a-machine-learning-essential-ee537695786b
1. ML claims to be new and supersede statistics
ML has developed from statistics
ML as part of statistics
Statistics as part of ML
ML:
models roughly outside of the traditional regression types of analysis:
• Decision trees (and descendants such as XGBoost, …)
• Support vector machines (SVMs)
• Neural networks (including deep learning)
2. ML claims any data is relevant
Typical context: Electronic Health Records (EHR); large administrative data sets
Uncover patterns in data that are there but remained hidden
Strong point of EHR: large N, large sets of features
Weak point of EHR: ‘quality’
Selection of patients
Start point definition
End point definition
Selective measurement
Missing values
…
More data is better? Lessons from meta-analysis
Meta-analysis:
Risk of bias assessment
Respect the clustered nature of the data
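A small numerical sketch of why clustering matters when data sets are combined. The two "studies" below use hypothetical counts chosen for illustration; they are not taken from any review. Within each study the treated group has the lower event risk, yet lumping all patients together as if they came from one sample reverses the sign, because treatment allocation differs between a low-risk and a high-risk population. A fixed-effect inverse-variance pooling of the within-study risk differences, one standard meta-analysis approach, keeps the within-study direction.

```python
def risk_difference(events_t, n_t, events_c, n_c):
    """Risk in the treated group minus risk in the control group."""
    return events_t / n_t - events_c / n_c


def rd_variance(events_t, n_t, events_c, n_c):
    """Large-sample variance of a risk difference."""
    p_t, p_c = events_t / n_t, events_c / n_c
    return p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c


# Hypothetical studies: (events treated, n treated, events control, n control)
studies = [
    (10, 100, 135, 900),   # low-risk population, few patients treated
    (360, 900, 50, 100),   # high-risk population, most patients treated
]

# Naive analysis: ignore study membership and add up all the counts.
totals = [sum(study[i] for study in studies) for i in range(4)]
naive_rd = risk_difference(*totals)

# Meta-analysis: estimate the effect within each study, then pool the
# estimates with fixed-effect inverse-variance weights.
within_rd = [risk_difference(*study) for study in studies]
weights = [1 / rd_variance(*study) for study in studies]
pooled_rd = sum(w * rd for w, rd in zip(weights, within_rd)) / sum(weights)

print("within-study risk differences:", [round(rd, 3) for rd in within_rd])
print("naive lumped risk difference: ", round(naive_rd, 3))
print("inverse-variance pooled:      ", round(pooled_rd, 3))
```

More rows do not help here: the naive estimate is biased by design, however many patients are added.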
Personal protective equipment for preventing highly infectious diseases
Big Data, Big Errors
3. ML makes promises it cannot keep
“Uncover patterns in data that are there but remained hidden”
Unsupervised learning
Clustering unstable and determined by optimization criterion
Supervised learning
Trees / neural networks better for prediction than regression
Supervised learning example
Example from Maarten van Smeden, Utrecht; @MaartenvSmeden
Predicting mortality – the media
Findings not convincing
Cox, #4, 30 vars, max c = 0.793
RF, #7, 600 vars, c = 0.797
Elastic, #9, 600 vars, c = 0.801
RF showed poor calibration
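Discrimination (the c statistic) and calibration are separate properties, so near-identical c values do not imply equally usable risk predictions. The sketch below is a simplification under stated assumptions: it uses a binary outcome rather than the survival setting of the slides, the data are made up, and the function names are hypothetical. Two sets of predictions with identical ranking have the same c statistic, but one of them systematically underestimates risk.

```python
def c_statistic(y_true, p_hat):
    """Share of (event, non-event) pairs in which the event has the
    higher predicted risk; tied predictions count as one half."""
    events = [p for y, p in zip(y_true, p_hat) if y == 1]
    non_events = [p for y, p in zip(y_true, p_hat) if y == 0]
    concordant = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1.0
            elif p_event == p_non_event:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def calibration_in_the_large(y_true, p_hat):
    """Observed event rate minus mean predicted risk (0 is ideal)."""
    n = len(y_true)
    return sum(y_true) / n - sum(p_hat) / n


# Made-up outcomes and predictions, for illustration only.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
model_a = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
# Model B ranks patients exactly as model A does (a monotone transform),
# but its predicted risks are far too low on average.
model_b = [p ** 3 for p in model_a]

for name, preds in (("A", model_a), ("B", model_b)):
    print(
        name,
        "c =", round(c_statistic(y_true, preds), 4),
        "calibration-in-the-large =",
        round(calibration_in_the_large(y_true, preds), 3),
    )
```

A fuller assessment would also look at the calibration slope and a calibration plot; this sketch only covers the mean.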
Machine learning vs conventional modeling
“We found that random forests did not outperform Cox models despite their inherent ability to accommodate nonlinearities and interactions. … Elastic nets achieved the highest discrimination performance …, demonstrating the ability of regularisation to select relevant variables and optimise model coefficients in an EHR context.”
Systematic review on ML vs classic modeling
Differences in discrimination
Commonalities between statistics and ML: friends
4. Research question is key
5. Complex data structures require innovative approaches
6. Some problems are really hard
4. Research question is key
From easy to hard questions
- Exploratory / descriptive
- Prediction / classification
- Causal
4. Research questions
Separate
- Exploratory: data mining
“enjoy the results, because you will never see these results again”
- Descriptive: patterns in the data to learn about nature;
hypothesis generating; biomarkers – disease
ML provides more flexibility; less interpretability?
- Prediction: machine learning / trees often perform poorly
ML may provide benefits in specific circumstances?
Van der Ploeg et al. BMC Med Res Methodol 2014;14:137.
ML good for prediction?
Large N, small p
“Natural flexibility”?
Versus non-linear terms / interactions in regression?
ML good for treatment selection rules?
High hopes
“The incorporation of new data modalities such as single-cell profiling, along with techniques that
rapidly find effective drug combinations will likely be instrumental in improving cancer care.”
Statistics good for treatment selection rules?
https://hbiostat.org/blog/post/path/index.html
Alternatives
1) Risk-based methods (11 papers) use only prognostic factors to define patient subgroups,
relying on the mathematical dependency of the absolute risk difference on baseline risk;
2) Treatment effect modeling methods (9 papers): prognostic factors and treatment effect modifiers,
including penalization or separate data sets for subgroup identification / effect
3) Optimal treatment regime methods (12 papers) focus primarily on treatment effect modifiers
to classify the trial population into those who benefit from treatment and those who do not
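The "mathematical dependency of the absolute risk difference on baseline risk" in the risk-based methods can be illustrated with a short calculation. This is a sketch under an assumption the slides do not state explicitly: the relative treatment effect is a constant odds ratio across patients. The odds ratio of 0.8 and the baseline risks are hypothetical values chosen for illustration.

```python
def risk_under_treatment(baseline_risk, odds_ratio):
    """Apply a constant odds ratio to a baseline (control) risk."""
    baseline_odds = baseline_risk / (1 - baseline_risk)
    treated_odds = baseline_odds * odds_ratio
    return treated_odds / (1 + treated_odds)


def absolute_risk_reduction(baseline_risk, odds_ratio):
    """Baseline risk minus risk under treatment."""
    return baseline_risk - risk_under_treatment(baseline_risk, odds_ratio)


ODDS_RATIO = 0.8                     # hypothetical constant relative effect
BASELINE_RISKS = (0.02, 0.10, 0.30)  # hypothetical low, medium, high risk

for risk in BASELINE_RISKS:
    arr = absolute_risk_reduction(risk, ODDS_RATIO)
    print(
        f"baseline risk {risk:.2f}: "
        f"absolute risk reduction {arr:.4f}, "
        f"number needed to treat about {1 / arr:.0f}"
    )
```

With the same relative effect, patients at higher baseline risk gain more in absolute terms, which is why prognostic factors alone can define subgroups with different expected benefit.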
5. Complex data structures require innovative approaches
Examples of successful ML
- Image analysis: Deep Learning (DL)
- Radiology, pathology, dermatology, ophthalmology, gastroenterology, cardiology,
…
- Free text: natural language processing (NLP)
- Mining electronic health records, building blocks for prediction, …
- Pharmacovigilance in social media
6. Some problems are really hard
Prediction
Small N, small p → regression
Small N, large p → hopeless
Large N, small p → regression
Large N, large p → ?
Treatment selection
Balance bias – precision
Causal interpretation
Summary 21 Oct 2022
1. ML is not really new and needs to liaise with statistics
2. Data quality and bias: design is key, learn from clinical epidemiology
3. Don’t make too many promises
4. Research questions relate to description, prediction and causality
5. Recognized power for specific complex data structures
6. Work on the truly hard problems together
HOT NEW PRODUCT! BIG SALES FAST SHIPPING NOW FROM CHINA!! EU KU DB BK substit...
 
For Better Surat #ℂall #Girl Service ❤85270-49040❤ Surat #ℂall #Girls
For Better Surat #ℂall #Girl Service ❤85270-49040❤ Surat #ℂall #GirlsFor Better Surat #ℂall #Girl Service ❤85270-49040❤ Surat #ℂall #Girls
For Better Surat #ℂall #Girl Service ❤85270-49040❤ Surat #ℂall #Girls
 

Statistics and ML 21Oct22 sel.pptx

  • 1. Oct 21, 2022 Statistics and machine learning: friends or foes? Ewout W. Steyerberg, PhD Professor of Clinical Biostatistics and Medical Decision Making Dept of Biomedical Data Sciences Leiden University Medical Center Thanks to many, including Ben van Calster, Leuven; Maarten van Smeden, Utrecht
  • 2. Statistics and machine learning: friends or foes? • Introduction for debate • Friction points: foes • Commonalities between statistics and ML: friends
  • 3. Statistics and Machine Learning (ML) In medical research, “artificial intelligence” usually just means “machine learning” or “algorithm”
  • 4. Machine learning in medical research
  • 5. Machine learning and AI everywhere
  • 6.
  • 7. IBM Watson winning Jeopardy! (2011)
  • 8. Dr Watson
  • 9.
  • 10. Dr Watson lessons
  • 11. Dr Watson lesson 1
  • 12. Dr Watson lesson 2
  • 13. Dr Watson lesson 3
  • 14. Friction points between statistics and ML: foes 1. ML claims to be new and supersede statistics 2. ML claims any data is relevant 3. ML makes promises it cannot keep
  • 15. 1. ML claims to be new and supersede statistics
  • 17. “Everything is an ML method”
  • 18.
  • 19. Language
      Statistics | Machine learning
      Covariates | Features
      Outcome variable | Target
      Model | Network, graphs
      Parameters | Weights
      Model for discrete var. | Classifier
      Model for continuous var. | Regression
      Log-likelihood | Cross-entropy loss
      Multinomial regression | Softmax
      Measurement error | Noise
      Subject/observation | Sample/instance
      Dummy coding | One-hot encoding
      Measurement invariance | Concept drift
      Prediction | Supervised learning
      Latent variable modeling | Unsupervised learning
      Fitting | Learning
      Prediction error | Error
      Sensitivity | Recall
      Positive predictive value | Precision
      Contingency table | Confusion matrix
      Measurement error model | Noise-aware ML
      Structural equation model | Gaussian Bayesian network
      Gold standard | Ground truth
      Derivation–validation | Training–test
      Experiment | A/B test
      https://www.analyticsvidhya.com/glossary-of-common-statistics-and-machine-learning-terms/
      https://developers.google.com/machine-learning/glossary
  • 20. Where to place Machine Learning? https://codeburst.io/statistics-a-machine-learning-essential-ee537695786b
  • 21.
  • 22. 1. ML claims to be new and supersede statistics. ML has developed from statistics; ML as part of statistics; statistics as part of ML. ML: models roughly outside of the traditional regression types of analysis: • decision trees (and descendants, XGBoost, ..) • support vector machines (SVMs) • neural networks (including deep learning)
  • 23. 2. ML claims any data is relevant. Typical context: Electronic Health Records (EHR); large administrative data sets. Uncover patterns in data that are there but remained hidden. Strong point of EHR: large N, large sets of features. Weak point of EHR: ‘quality’: selection of patients; start point definition; end point definition; selective measurement; missing values; …
  • 24. More data is better? Lessons from meta-analysis. Meta-analysis: risk of bias assessment; respect clustering nature. Personal protective equipment for preventing highly infectious diseases
  • 25. Big Data, Big Errors
  • 26. 3. ML makes promises it cannot keep. “Uncover patterns in data that are there but remained hidden”. Unsupervised learning: clustering unstable and determined by optimization criterion. Supervised learning: trees / neural networks better for prediction than regression
  • 27. Supervised learning example. Example from Maarten van Smeden, Utrecht; @MaartenvSmeden
  • 29. Findings not convincing. Cox, #4, 30 vars, max c = 0.793; RF, #7, 600 vars, c = 0.797; Elastic, #9, 600 vars, c = 0.801
  • 30. RF showed poor calibration
  • 31. Machine learning vs conventional modeling Text “We found that random forests did not outperform Cox models despite their inherent ability to accommodate nonlinearities and interactions. … Elastic nets achieved the highest discrimination performance …, demonstrating the ability of regularisation to select relevant variables and optimise model coefficients in an EHR context.”
  • 32. Systematic review on ML vs classic modeling
  • 33.
  • 35. Commonalities between statistics and ML: friends 4. Research question is key 5. Complex data structures require innovative approaches 6. Some problems are really hard
  • 36.
  • 37. 4. Research question is key. From easy to hard questions: - Exploratory / descriptive - Prediction / classification - Causal
  • 38. 4. Research questions. Separate: - Exploratory: data mining “enjoy the results, because you will never see these results again” - Descriptive: patterns in the data to learn about nature; hypothesis generating; biomarkers – disease. ML provides more flexibility; less interpretability? - Prediction: machine learning / trees often poor in performance. ML may provide benefits in specific circumstances?
  • 39. Van der Ploeg et al. BMC Med Res Methodol 2014;14:137.
  • 40. ML good for prediction? Large N, small p. “Natural flexibility”? Versus non-linear terms / interactions in regression?
  • 41.
  • 42. ML good for treatment selection rules? High hopes: “The incorporation of new data modalities such as single-cell profiling, along with techniques that rapidly find effective drug combinations will likely be instrumental in improving cancer care.”
  • 43. Statistics good for treatment selection rules?
  • 44. https://hbiostat.org/blog/post/path/index.html
  • 45. Alternatives: 1) Risk-based methods (11 papers) use only prognostic factors to define patient subgroups, relying on the mathematical dependency of the absolute risk difference on baseline risk; 2) Treatment effect modeling methods (9 papers): prognostic factors and treatment effect modifiers, including penalization or separate data sets for subgroup identification / effect; 3) Optimal treatment regime methods (12 papers) focus primarily on treatment effect modifiers to classify the trial population into those who benefit from treatment and those who do not
  • 46. 5. Complex data structures require innovative approaches. Examples of successful ML: - Image analysis: Deep Learning (DL) - Radiology, pathology, dermatology, ophthalmology, gastroenterology, cardiology, … - Free text: natural language processing (NLP) - Mining electronic health records, building blocks for prediction, … - Pharmacovigilance in social media
  • 47. 6. Some problems are really hard. Prediction: small N, small p → regression; small N, large p → hopeless; large N, small p → regression; large N, large p → ? Treatment selection: balance bias – precision; causal interpretation
  • 48. Summary 21 Oct 2022 1. ML is not really new and needs to liaise with statistics 2. Data quality and bias: design is key, learn from clinical epidemiology 3. Don’t make too many promises 4. Research questions relate to description, prediction and causality 5. Recognized power for specific complex data structures 6. Work on the truly hard problems together
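
The terminology slide (19) pairs sensitivity with recall, positive predictive value with precision, and log-likelihood with cross-entropy loss. A minimal sketch of why these pairs coincide for a binary outcome is given below. It is not taken from the slides: the toy data, the 0.5 classification threshold, and all function names are assumptions made for illustration only.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def sensitivity(tp, fn):
    """Statistics: sensitivity. ML: recall. Same formula."""
    return tp / (tp + fn)


def positive_predictive_value(tp, fp):
    """Statistics: positive predictive value. ML: precision. Same formula."""
    return tp / (tp + fp)


def bernoulli_log_likelihood(y_true, probs):
    """Log-likelihood of the observed 0/1 outcomes given predicted risks."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(y_true, probs)
    )


def cross_entropy_loss(y_true, probs):
    """Mean binary cross-entropy, as commonly defined in ML."""
    total = sum(
        y * math.log(p) + (1 - y) * math.log(1.0 - p)
        for y, p in zip(y_true, probs)
    )
    return -total / len(y_true)


# Hypothetical toy data (assumption, not from the slides).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
probs = [0.9, 0.2, 0.6, 0.4, 0.1, 0.7, 0.8, 0.3]
y_pred = [1 if p >= 0.5 else 0 for p in probs]  # assumed 0.5 threshold

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print("sensitivity / recall:", sensitivity(tp, fn))
print("PPV / precision:", positive_predictive_value(tp, fp))

# Cross-entropy equals the negative log-likelihood divided by N.
ll = bernoulli_log_likelihood(y_true, probs)
print("cross-entropy:", cross_entropy_loss(y_true, probs))
print("-log-likelihood / N:", -ll / len(y_true))
```

The last two printed values agree up to floating-point rounding, so minimising cross-entropy loss and maximising the Bernoulli log-likelihood lead to the same fitted model.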