gsk.com
AI & Big Data Expo, London
Machine learning, biomedical data & trust
Paul Agapow (Statistics & Data Science Innovation Hub)
Background & disclaimer
• Previously a health informatician, biomedical ML
researcher, bioinformatician, “computer guy”,
disease chaser, epi-informatician,
phylogeneticist, evolutionary biologist,
immunologist, biochemist …
• Now a director @GSK
• This presentation does not reflect thought,
policy or projects in progress at GSK
• There are no conflicts of interest
10 June 2021 3
“AI will not replace
drug hunters, but drug
hunters who don’t use
AI will be replaced by
those who do.”
-Andrew Hopkins, CEO Exscientia
07 February 2023
3 hurdles to using AI/ML in therapy development
Biological & physiological
complexity
Insufficient & uneven data
A gap between AI/ML practice &
medical needs
To make a
new drug,
you must
first solve for
everything
6
12 July 2021 7
The complexity of biology:
About 50 trillion cells of 200 types
Each cell has 23 pairs of chromosomes
In total 6.4 billion basepairs (positions)
Organised into about 18,000 genes
(Or maybe more like 40,000 genes)
Genetic material elsewhere in the cell
Epigenetic modification
1 million different types of molecules
Lifestyle & history
Exposure & environment
Immune system repertoire & priming
…
Of which we know only a fraction
The data types and sources we need are myriad & varied
8
Hughes et al. (2010) “Principles of early drug discovery”
• There are many different
modalities of intervention
• With different (data)
considerations & different
levels of ML experience
There are many different means to the same end
McKinsey, EvaluatePharma 2022
It’s often not
the right data
• Difficult / expensive to generate
• Unstructured
• Unlabeled
• The wrong type
• Sparse, unevenly sampled
• WEIRD (Western, Educated, Industrialized, Rich, Democratic)
• In different formats and silos
10
Melanie Mitchell via Dagmar Monett
A disconnect between AI/ML practice and medical needs
Academic focus on problems with low medical value
• There are many models
that work perfectly … in
the lab
• Why?
- Unrealistic or poor
training data
- Emphasis on hitting
metrics
A disconnect between AI/ML practice and medical needs
A tendency to treat biomedicine as simply a data / ML problem
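One way an emphasis on hitting metrics can mislead is sketched below. The cohort is hypothetical (95 non-severe patients, 5 severe) and the "model" is a trivial one that never predicts severe disease; neither comes from the talk. The point is only that a headline accuracy can look strong while the clinically important cases are all missed.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Fraction of all cases labelled correctly."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def sensitivity(y_true, y_pred):
    """Fraction of true positives that the model actually finds."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical, made-up cohort: 95 non-severe (0) and 5 severe (1) patients.
y_true = [0] * 95 + [1] * 5
# Trivial "model" that always predicts non-severe.
y_pred_trivial = [0] * len(y_true)

print("accuracy:   ", accuracy(y_true, y_pred_trivial))
print("sensitivity:", sensitivity(y_true, y_pred_trivial))
```

Reporting sensitivity (and similar class-specific measures) alongside accuracy is one simple guard; which metric matters depends on the clinical question.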
The classic
analytical
tension
13
What we need to solve
What we tend to solve
Easy things
Available, ideal data
Ground truth
Simplify
“Interesting”
“Table-land”
Useful things
Incomplete messy data
Unclear biological reality
Uncertain findings
Needful
“Network-land”
14
Laure Wynants via Maarten van Smeden
A disconnect between AI/ML practice and medical needs
Many “good” models are not fit for production
• The pandemic prompted a flood of publications &
preprints
• Most plagued by the usual biomedical AI problems
• … and also produced by those outside the field
• As a general principle, any paper applying ML to COVID
is terrible
• Bad models in a crisis situation are not neutral: they
distract, expend effort and carry an opportunity cost
COVID was a lightning rod for bad biomedical ML
• What does it purport to do: Find risk factors
associated with deterioration of COVID patients
• Why? Better / faster assessment of incoming
patients
• Who? Patients admitted to two hospitals with +ve
PCR test for COVID with CT scan with lesions
• Data? Demographics, bloods, labs, breathing/
oxygen scores, CT scans manually scored
“Interpretable Prediction of Severity & Crucial Factors of COVID Patients”
Zheng et al. BioMed Research International (2021), DOI: 10.1155/2021/8840835
• Conflates diagnosis & prognosis
• The cohort:
- Suggests this can replace PCR, but the cohort is selected
by PCR result
- The act of taking a CT scan in some ways selects the
cohort
- Unclear when some readings were taken, which matters
when we are looking at deterioration
- Is the training set the set that a model might be used on
in the clinic?
- Not many critical – so actually testing for severe cases
- What’s the split between hospitals?
- Patients are different already, pre-existing conditions
- Association with age & general health
- Old patients running a temperature with lesioned lungs do
poorly
• Clinical use:
- Will all this data be available in a timely fashion for a
model in the clinic?
- If the severity is based on bloods & oxygenation readings,
why not just use them?
- Information complexity?
• Validation:
- Would it work for another time period at same hospitals?
At other hospitals?
• Analytics
- “The impenetrable wall of math”
- XGBoost is always a good place to start
- Ensemble methods usually are
- Feature interaction?
- Some features overlap (neutrophils, n. ratio, NLR)
- What features correlate?
- No attempt to simplify model
- Any model is interpretable with SHAP
• Still useful for intrinsic / research purposes
Thoughts and questions
Not necessarily faults, not all easily answerable
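The question of overlapping features (neutrophils, a neutrophil ratio, NLR) can be checked before any modelling. A minimal sketch follows, assuming NLR means the neutrophil-to-lymphocyte ratio; the values are made up for illustration and are not from Zheng et al., and the helper names `pearson` and `correlated_pairs` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / sqrt(var_x * var_y)


def correlated_pairs(features, threshold=0.9):
    """List (name_a, name_b, r) for feature pairs with |r| >= threshold."""
    flagged = []
    for a, b in combinations(sorted(features), 2):
        r = pearson(features[a], features[b])
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return flagged


# Made-up example values, one entry per (hypothetical) patient.
features = {
    "neutrophils": [3.1, 4.5, 6.2, 7.8, 9.0, 11.4],
    "lymphocytes": [2.0, 1.8, 1.5, 1.1, 0.9, 0.7],
    "age": [34, 71, 52, 45, 80, 63],
}
# NLR is derived from two other columns, so it overlaps with them by construction.
features["nlr"] = [
    n / l for n, l in zip(features["neutrophils"], features["lymphocytes"])
]

for a, b, r in correlated_pairs(features):
    print(f"{a} ~ {b}: r = {r:.2f}")
```

Flagged pairs are candidates for dropping or combining, which is one route to the simpler model the slide asks about. Pearson correlation only captures linear overlap, so this is a first screen rather than a full answer to "what features correlate?".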
• Models will always tell you the truth
- But it’s the truth conditioned on the data they’ve seen
- It might not be the truth you think
• Biomedical data is complex, it always comes with a context
• Patients are complex, they always come with a medical history
• How were these patients selected?
• What is this model actually saying and why?
• Does this model replicate in other populations?
• But despite all this, we have to make and actionably interpret
models
Some principles for better biomedical ML
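"Does this model replicate in other populations?" can be asked of the data at hand by holding out one hospital at a time. The sketch below is an illustration under stated assumptions: the records, the `spo2` feature, the hospital names and the toy threshold "model" are all hypothetical stand-ins, not anything from the talk or the paper discussed.

```python
from collections import defaultdict


def leave_one_site_out(records, site_key="hospital"):
    """Yield (held_out_site, train, test), holding out each site once."""
    by_site = defaultdict(list)
    for rec in records:
        by_site[rec[site_key]].append(rec)
    for site in sorted(by_site):
        test = by_site[site]
        train = [r for s, rows in by_site.items() if s != site for r in rows]
        yield site, train, test


def fit_threshold(train, feature="spo2", label="severe"):
    """Toy model: midpoint between the class means of a single feature.

    Assumes both classes are present in the training records.
    """
    pos = [r[feature] for r in train if r[label] == 1]
    neg = [r[feature] for r in train if r[label] == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def evaluate(threshold, test, feature="spo2", label="severe"):
    """Accuracy when predicting 'severe' for readings below the threshold."""
    correct = sum(1 for r in test if int(r[feature] < threshold) == r[label])
    return correct / len(test)


# Made-up records: hospital C's non-severe patients read lower than at A and B.
raw = {
    "A": [(97, 0), (96, 0), (89, 1), (88, 1)],
    "B": [(98, 0), (95, 0), (90, 1), (87, 1)],
    "C": [(93, 0), (92, 0), (91, 0), (85, 1)],
}
records = [
    {"hospital": site, "spo2": spo2, "severe": severe}
    for site, rows in raw.items()
    for spo2, severe in rows
]

results = {}
for site, train, test in leave_one_site_out(records):
    results[site] = evaluate(fit_threshold(train), test)
    print(f"held out {site}: accuracy {results[site]:.2f}")
```

A model that scores well when hospitals are pooled but drops on a held-out site is showing exactly the context dependence the principles above warn about. With real data the same split could be applied to time periods as well as sites.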
Why not join us?
19
Academic Press (2021)
Some light
reading
20
Academic Press (2021)
