Machine learning, health data & the limits of knowledge
Paul Agapow
ONC R&D ML&AI AstraZeneca
<paul.agapow@astrazeneca.com>
2021/3/10
Disclosure
• Does not reflect official AZ thought or projects
• No conflicts of interest
About me
• Have been a: health informatician, data scientist, bioinformatician, database administrator, epi-informaticist, software dev, data manager, consultant, molecular geneticist, evolutionary scientist, biochemist, phylogeneticist, immunologist, programmer …
• At:
  • Oncology R&D ML&AI / RWE @AZ
  • Data Science Institute @ICL
  • Centre for Infection @HPA (UK)
  • Universities, industry, government …
Using this paper as a jumping-off point
• The Hierarchical Classifier for COVID-19 Resistance Evaluation (2021) Shakhovska, Izonin & Melnykova, Data v6:6
  • https://doi.org/10.3390/data6010006
  • https://www.mdpi.com/2306-5729/6/1/6/htm
• How to analyse for patterns in COVID data when the observational data is diverse & complex
Data is a saviour & a curse
• Data & analytics has saved us several times in the current crisis
• But too much data can create problems
• And data is not information
RWE: real world evidence
• Electronic Health Records
• Registries
• Claims databases
• Repurposed trial data
• Defined:
  • Anything that isn’t an RCT (randomised controlled trial)
  • Observational data
  • Anything whose context & sourcing we have to consider?
• Why?
  • Cheap
  • Ethical
  • Accesses scales & types of data & situations that are otherwise unavailable
But all (RWE) data is biased
• What population does it come from?
  • Where was it collected?
  • Who did they look for?
  • What are those people’s habits and histories?
• What are the definitions used?
  • E.g. “severe asthma” or “PDL1 expression”
  • What are the diagnostic devices?
  • What’s common medical practice there?
• What causes data to be included / excluded?
  • E.g. surveys, visits
  • Is inclusion / exclusion random?
  • Are there incidental correlations?
  • Choice of features
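To make “is inclusion / exclusion random?” concrete, the sketch below simulates a population in which infected people are more likely to be tested than uninfected people. All rates are made-up assumptions for illustration (not taken from the paper), and the test is assumed to be perfect; the point is only that a non-random route into the data can distort even a simple estimate such as prevalence.

```python
import random

def simulate_selection_bias(n=100_000, true_prevalence=0.05,
                            p_test_if_infected=0.6, p_test_if_uninfected=0.05,
                            seed=0):
    """Return (true prevalence, prevalence among the people who were tested).

    Hypothetical setting: infected people are more likely to be tested
    (e.g. because they have symptoms), and the test is assumed perfect.
    """
    rng = random.Random(seed)
    infected_total = tested = tested_positive = 0
    for _ in range(n):
        infected = rng.random() < true_prevalence
        infected_total += infected
        # Inclusion in the "dataset" (being tested) depends on infection status.
        p_test = p_test_if_infected if infected else p_test_if_uninfected
        if rng.random() < p_test:
            tested += 1
            tested_positive += infected
    return infected_total / n, tested_positive / tested

true_rate, observed_rate = simulate_selection_bias()
print(f"true prevalence: {true_rate:.3f}, prevalence among tested: {observed_rate:.3f}")
```

With these invented rates the expected positive rate among the tested is 0.03 / 0.0775, roughly 0.39, against a true prevalence of 0.05: the data faithfully records who was tested, not who was infected.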
The COVID publication: is it good data?
• Do we know where it came from?
• Do we know who is in it?
• Is there missing data?
• “maybe” COVID?
• Are the populations comparable?
• Are antibody levels comparable?
• Different test kits?
• Imbalanced classes?
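Two of these questions (missing data and imbalanced classes) can be checked mechanically before any modelling. This is a minimal sketch assuming records arrive as Python dicts; the field names (`igg`, `igm`, `covid`) and values are hypothetical stand-ins, not the paper’s actual variables.

```python
from collections import Counter

def data_quality_report(records, label_key):
    """Return the fraction of missing values per field and the class counts.

    A field that is absent from a record is counted as missing, as is an
    explicit None.
    """
    fields = sorted({key for record in records for key in record})
    missing = {
        field: sum(record.get(field) is None for record in records) / len(records)
        for field in fields
    }
    classes = Counter(record.get(label_key) for record in records)
    return missing, classes

# Hypothetical toy records; field names and values are illustrative only.
records = [
    {"igg": 1.2, "igm": None, "covid": "yes"},
    {"igg": 0.3, "igm": 0.1, "covid": "no"},
    {"igg": None, "igm": 0.4, "covid": "maybe"},
    {"igg": 0.2, "igm": 0.2, "covid": "no"},
]
missing, classes = data_quality_report(records, label_key="covid")
print(missing)
print(classes)
```

A “maybe” label shows up here as its own class, which forces an explicit decision about whether to drop it, merge it, or model it.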
How do we analyse RWE correctly?
• Patients are complex:
• Co-morbidities
• Lifestyle, prior history, exposure
• Demographics, genetics, epigenome, microbiome …
• Disease is complex:
• Affects different body subsystems
• Health data is complex:
• Sparse, irregular
• A product of a healthcare system …
• Underlying models unclear
• Many opportunities for confounders & noise
Is ML the best approach for RWE analysis?
A continuum of approaches:
• Statistical modelling: explicit models, clear assumptions, clean & controlled data
• …
• Machine learning / AI: no model (trained from data), few assumptions, messy data, larger data
But what are the pitfalls of using ML on health data?
• Need more (labelled) data
• Bias – how was the data sourced?
  • Needs to be handled carefully
• May require specialised computation & skills
• Some problems are difficult to adapt to ML
• Interpretability – data never lies, but what is it telling us?
Clustering: how simple algorithms can actually be very complex
• The idea of clustering is simple, but what does it actually do?
• A clustering algorithm will find clusters in any dataset, even random noise
• Do clusters reflect the underlying reality?
• Are the clusters revealed valid and/or robust?
• Do the clusters correspond to the groups you are interested in?
• A cluster is not the truth, it’s a hypothesis
(The paper is modestly convincing about these points)
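The claim that every dataset has clusters, even random noise, is easy to demonstrate. The sketch below runs a plain Lloyd’s k-means (written from scratch here, not the paper’s method) on uniform 2-D noise, then compares repeated runs with the Rand index as a crude stability check. The choice of k = 4, the sample size and the seeds are arbitrary assumptions. Note the plain Rand index is not corrected for chance; the adjusted Rand index (e.g. scikit-learn’s `adjusted_rand_score`) is the more usual choice in practice.

```python
import random
from itertools import combinations

def kmeans(points, k, seed, iterations=50):
    """Plain Lloyd's k-means on 2-D points; returns one cluster label per point."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # start from k randomly chosen points
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centre by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: (x - centres[c][0]) ** 2 + (y - centres[c][1]) ** 2)
            for x, y in points
        ]
        # Update step: move each centre to the mean of its members (keep it if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return labels

def rand_index(a, b):
    """Fraction of point pairs on which two labellings agree (same vs different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

# Uniform random noise: there is no real group structure here.
rng = random.Random(42)
noise = [(rng.random(), rng.random()) for _ in range(200)]

# k-means still partitions it into the k groups it was asked for.
runs = [kmeans(noise, k=4, seed=s) for s in range(5)]
print("clusters found per run:", [len(set(r)) for r in runs])

# Agreement between runs with different starting points (1.0 = identical partitions).
print("Rand index vs first run:", [round(rand_index(runs[0], r), 3) for r in runs[1:]])
```

Even high agreement between runs would only show the partition is reproducible, not that it reflects anything real in the population.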
The COVID publication: is it good methodology?
• Many different methods, but:
  • What’s the concordance between them?
  • What is the use of 6-7 methods?
  • Ensemble them?
• Where’s the validation?
• What’s the question?
  • How many people are actually infected with COVID? or
  • Can we build a model to calculate this?
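One way to answer “what’s the concordance?” is a chance-corrected agreement statistic such as Cohen’s kappa between each pair of methods; a simple majority vote is one (of many) ways to ensemble them. This sketch uses made-up binary predictions from three hypothetical models. Agreement between methods says nothing about whether any of them is correct, so an ensemble still needs validation.

```python
from collections import Counter

def cohens_kappa(a, b):
    """Agreement between two classifiers' labels, corrected for chance agreement.

    Undefined (division by zero) if both label the whole set with one class.
    """
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)

def majority_vote(*predictions):
    """Combine several classifiers' labels per sample by simple majority."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

# Hypothetical predictions from three methods on the same 8 patients.
m1 = [1, 1, 0, 0, 1, 0, 1, 0]
m2 = [1, 0, 0, 0, 1, 0, 1, 1]
m3 = [1, 1, 0, 1, 1, 0, 0, 0]

kappa_12 = cohens_kappa(m1, m2)
ensemble = majority_vote(m1, m2, m3)
print(kappa_12, ensemble)
```

Here m1 and m2 agree on 6 of 8 patients (0.75), but with balanced labels half of that is expected by chance, so kappa is 0.5.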
What makes a good machine-learning approach?
• Be clear what it is predicting
• It should be reproducible
• It should be validated:
  • Internally: performance, convergence, loss, sensitivity, robustness, …
  • Externally: against another dataset
• Almost any ML method can:
  • Do (slightly) better than humans
  • Get better than 50%
• If it is “better”, compared to what?
How do the systems in the paper measure up?
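“Better than 50%” means little when classes are imbalanced. The sketch below compares a hypothetical model against the trivial baseline of always predicting the majority class, using both plain accuracy and balanced accuracy (mean per-class recall). The labels and the model’s predictions are invented for illustration.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = []
    for c in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# Hypothetical imbalanced cohort: 90 negatives, 10 positives.
y_true = [0] * 90 + [1] * 10

# Baseline: always predict the most common class.
majority = Counter(y_true).most_common(1)[0][0]
baseline = [majority] * len(y_true)

# Hypothetical model: 5 false positives, 6 of the 10 positives found.
model = [0] * 85 + [1] * 5 + [1] * 6 + [0] * 4

for name, pred in [("baseline", baseline), ("model", model)]:
    print(name, round(accuracy(y_true, pred), 3), round(balanced_accuracy(y_true, pred), 3))
```

The do-nothing baseline scores 0.90 accuracy and the model only 0.91, while balanced accuracy separates them (0.50 vs about 0.77). Reporting the baseline answers “compared to what?”.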
How do we know what a system is doing?
• Interpretability is non-negotiable
• AI models can only be built for the data that you have
• Biased data gives rise to biased models
• A model may not be doing what we think it is
• Toolkits like SHAP & LIME make interpretability easy and comparable
(The paper used very interpretable systems)
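SHAP and LIME are the toolkits named above; as a dependency-free illustration of the same goal, this sketch swaps in a simpler model-agnostic technique, permutation importance: shuffle one feature at a time and measure how much accuracy drops. The data and the stand-in “model” are hypothetical. scikit-learn offers a fuller version as `sklearn.inspection.permutation_importance`.

```python
import random

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)

    def acc(rows):
        return sum(predict(r) == t for r, t in zip(rows, y)) / len(y)

    base = acc(X)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the outcome
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(base - acc(shuffled))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical data: feature 0 drives the label, feature 1 is pure noise.
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(500)]
y = [int(row[0] > 0.5) for row in X]

def predict(row):
    """Stand-in 'model' that happens to use only feature 0."""
    return int(row[0] > 0.5)

imp = permutation_importance(predict, X, y)
print(imp)
```

This shows what the model relies on, which is not the same as what causes the outcome: a model trained on biased data can lean heavily on a proxy feature and still score well here.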
How could this have been done better?
• What question are we trying to solve?
• “What’s the actual level of infected people in the population?”
• In what time period or setting?
• What’s actionable?
• What data can we get?
• What data can we get for validation?
• We don’t need 6-7 different methods, just 1 good one
• Be clear about “how good” the results are
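Being clear about “how good” includes reporting uncertainty, not just a point estimate. A percentile bootstrap over patients is one common way to put an interval on accuracy; this sketch assumes a hypothetical 50-patient validation set with 40 correct predictions, and the resample count and alpha are arbitrary choices.

```python
import random

def bootstrap_accuracy_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Point accuracy plus a percentile-bootstrap (1 - alpha) confidence interval."""
    rng = random.Random(seed)
    n = len(y_true)
    correct = [t == p for t, p in zip(y_true, y_pred)]
    stats = []
    for _ in range(n_boot):
        # Resample patients with replacement and recompute accuracy.
        resample = [correct[rng.randrange(n)] for _ in range(n)]
        stats.append(sum(resample) / n)
    stats.sort()
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return sum(correct) / n, lower, upper

# Hypothetical small validation set: 40 of 50 predictions correct.
y_true = [1] * 50
y_pred = [1] * 40 + [0] * 10
point, lower, upper = bootstrap_accuracy_ci(y_true, y_pred)
print(point, lower, upper)
```

On a set this small the interval spans more than ten percentage points, which is worth stating alongside the headline 80%.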
Summary
• RWE may be a broad and over-reaching category
• But it underlines the complexity & biases of health data
• ML may be the best approach for analysing RWE
• However, its power and flexibility introduce other problems
• Data “bias”
• Validation
• Interpretability
• ML “findings” are almost always just hypotheses
• Healthcare analytics should not be about analytics but about biology
Final thought
• If you are driven by science and passionate about improving lives, why not work at AstraZeneca?
• Example jobs – please visit our careers website
  • Principal Data Scientist - https://careers.astrazeneca.com/job/gaithersburg/principal-data-scientist/7684/14833674
  • Associate Director Imaging & AI - Imaging & Data Analytics - https://careers.astrazeneca.com/job/gothenburg/associate-director-imaging-and-ai-imaging-and-data-analytics/7684/14469379
  • Data Sciences & AI Graduate Programme – UK - https://careers.astrazeneca.com/data-sciences-and-ai-graduate-programme

 

Recently uploaded (20)

Call Girls Thane Just Call 9910780858 Get High Class Call Girls Service
Call Girls Thane Just Call 9910780858 Get High Class Call Girls ServiceCall Girls Thane Just Call 9910780858 Get High Class Call Girls Service
Call Girls Thane Just Call 9910780858 Get High Class Call Girls Service
 
Call Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service Available
 
call girls in green park DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in green park  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in green park  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in green park DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
 
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment Booking
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment BookingCall Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment Booking
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment Booking
 
VIP Call Girls Lucknow Nandini 7001305949 Independent Escort Service Lucknow
VIP Call Girls Lucknow Nandini 7001305949 Independent Escort Service LucknowVIP Call Girls Lucknow Nandini 7001305949 Independent Escort Service Lucknow
VIP Call Girls Lucknow Nandini 7001305949 Independent Escort Service Lucknow
 
Mumbai Call Girls Service 9910780858 Real Russian Girls Looking Models
Mumbai Call Girls Service 9910780858 Real Russian Girls Looking ModelsMumbai Call Girls Service 9910780858 Real Russian Girls Looking Models
Mumbai Call Girls Service 9910780858 Real Russian Girls Looking Models
 
Call Girls Service Noida Maya 9711199012 Independent Escort Service Noida
Call Girls Service Noida Maya 9711199012 Independent Escort Service NoidaCall Girls Service Noida Maya 9711199012 Independent Escort Service Noida
Call Girls Service Noida Maya 9711199012 Independent Escort Service Noida
 
Call Girls ITPL Just Call 7001305949 Top Class Call Girl Service Available
Call Girls ITPL Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls ITPL Just Call 7001305949 Top Class Call Girl Service Available
Call Girls ITPL Just Call 7001305949 Top Class Call Girl Service Available
 
Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...
Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...
Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...
 
Call Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
Call Girl Bangalore Nandini 7001305949 Independent Escort Service BangaloreCall Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
Call Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
 
Russian Call Girl Brookfield - 7001305949 Escorts Service 50% Off with Cash O...
Russian Call Girl Brookfield - 7001305949 Escorts Service 50% Off with Cash O...Russian Call Girl Brookfield - 7001305949 Escorts Service 50% Off with Cash O...
Russian Call Girl Brookfield - 7001305949 Escorts Service 50% Off with Cash O...
 
Call Girls Hosur Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hosur Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Hosur Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hosur Just Call 7001305949 Top Class Call Girl Service Available
 
Call Girls Whitefield Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Whitefield Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Whitefield Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Whitefield Just Call 7001305949 Top Class Call Girl Service Available
 
Dwarka Sector 6 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few Cl...
Dwarka Sector 6 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few Cl...Dwarka Sector 6 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few Cl...
Dwarka Sector 6 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few Cl...
 
Bangalore Call Girls Marathahalli đź“ž 9907093804 High Profile Service 100% Safe
Bangalore Call Girls Marathahalli đź“ž 9907093804 High Profile Service 100% SafeBangalore Call Girls Marathahalli đź“ž 9907093804 High Profile Service 100% Safe
Bangalore Call Girls Marathahalli đź“ž 9907093804 High Profile Service 100% Safe
 
Call Girls Jayanagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jayanagar Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Jayanagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jayanagar Just Call 7001305949 Top Class Call Girl Service Available
 
97111 47426 Call Girls In Delhi MUNIRKAA
97111 47426 Call Girls In Delhi MUNIRKAA97111 47426 Call Girls In Delhi MUNIRKAA
97111 47426 Call Girls In Delhi MUNIRKAA
 
Low Rate Call Girls Mumbai Suman 9910780858 Independent Escort Service Mumbai
Low Rate Call Girls Mumbai Suman 9910780858 Independent Escort Service MumbaiLow Rate Call Girls Mumbai Suman 9910780858 Independent Escort Service Mumbai
Low Rate Call Girls Mumbai Suman 9910780858 Independent Escort Service Mumbai
 
Ahmedabad Call Girls CG Road 🔝9907093804 Short 1500 💋 Night 6000
Ahmedabad Call Girls CG Road 🔝9907093804  Short 1500  💋 Night 6000Ahmedabad Call Girls CG Road 🔝9907093804  Short 1500  💋 Night 6000
Ahmedabad Call Girls CG Road 🔝9907093804 Short 1500 💋 Night 6000
 
College Call Girls Vyasarpadi Whatsapp 7001305949 Independent Escort Service
College Call Girls Vyasarpadi Whatsapp 7001305949 Independent Escort ServiceCollege Call Girls Vyasarpadi Whatsapp 7001305949 Independent Escort Service
College Call Girls Vyasarpadi Whatsapp 7001305949 Independent Escort Service
 

Machine learning, health data & the limits of knowledge

  • 1. Machine learning, health data & the limits of knowledge
      Paul Agapow, ONC R&D ML&AI, AstraZeneca <paul.agapow@astrazeneca.com>
      2021/3/10
  • 2. Disclosure
      • Does not reflect official AZ thought or projects
      • No conflicts of interest
  • 3. About me
      • Have been a: health informatician, data scientist, bioinformatician, database administrator, epi-informaticist, software dev, data manager, consultant, molecular geneticist, evolutionary scientist, biochemist, phylogeneticist, immunologist, programmer …
      • At:
          • Oncology R&D ML&AI / RWE @AZ
          • Data Science Institute @ICL
          • Centre for Infection @HPA (UK)
          • Universities, industry, government …
  • 4. Using this paper as a jumping-off point
      • The Hierarchical Classifier for COVID-19 Resistance Evaluation (2021), Shakhovska, Izonin & Melnykova, Data v6:6
          • https://doi.org/10.3390/data6010006
          • https://www.mdpi.com/2306-5729/6/1/6/htm
      • How to look for patterns in COVID data when the observational data is diverse & complex
  • 5. Data is a saviour & a curse
      • Data & analytics have saved us several times in the current crisis
      • But too much data can create problems
      • And data is not information
  • 6. RWE: real-world evidence
      • Electronic Health Records
      • Registries
      • Claims databases
      • Repurposed trial data
      • Defined as:
          • Anything that isn’t an RCT (randomised controlled trial)
          • Observational data
          • Anything whose context & sourcing we have to consider?
      • Why use it?
          • Cheap
          • Ethical
          • Gives access to scales & types of data & situations that are otherwise unavailable
  • 7. But all (RWE) data is biased
      • What population does it come from?
          • Where was it collected?
          • Who did they look for?
          • What are those people’s habits and histories?
      • What are the definitions used?
          • “Severe asthma” or “PD-L1 expression”
          • What are the diagnostic devices?
          • What’s common medical practice there?
      • What causes data to be included / excluded?
          • E.g. surveys, visits
          • Are inclusion / exclusion random?
          • What incidental correlations are there?
          • Choice of features
  • 8. The COVID publication: is it good data?
      • Do we know where it came from?
      • Do we know who is in it?
      • Is there missing data?
          • “Maybe” COVID?
      • Are the populations comparable?
      • Are antibody levels comparable?
          • Different test kits?
      • Imbalanced classes?
      [Figure: The data]
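Several of these data questions can be checked mechanically before any modelling. The sketch below, assuming records arrive as Python dicts, counts missing values per field, tabulates the outcome label (so an ambiguous “maybe” class and any class imbalance become visible) and compares mean antibody levels by test kit. The field names (`age`, `antibody_level`, `test_kit`, `covid`) and all values are hypothetical, not taken from the paper.

```python
from collections import Counter, defaultdict


def missing_counts(records):
    """Count records where each field is absent, None or an empty string."""
    fields = sorted({key for rec in records for key in rec})
    return {f: sum(rec.get(f) in (None, "") for rec in records) for f in fields}


def label_balance(records, label_field):
    """Tabulate outcome labels, most common first (exposes imbalance and odd classes)."""
    return Counter(rec.get(label_field) for rec in records).most_common()


def mean_by_group(records, value_field, group_field):
    """Mean of a numeric field within each group, skipping missing values."""
    groups = defaultdict(list)
    for rec in records:
        if rec.get(value_field) is not None:
            groups[rec.get(group_field)].append(rec[value_field])
    return {g: sum(v) / len(v) for g, v in groups.items()}


# Hypothetical records: field names and values are illustrative only.
records = [
    {"age": 54, "antibody_level": 1.8, "test_kit": "A", "covid": "yes"},
    {"age": 61, "antibody_level": None, "test_kit": "B", "covid": "no"},
    {"age": None, "antibody_level": 0.4, "test_kit": "A", "covid": "maybe"},
    {"age": 38, "antibody_level": 2.3, "test_kit": "B", "covid": "no"},
    {"age": 47, "antibody_level": 0.9, "test_kit": "B", "covid": "no"},
]
print(missing_counts(records))          # {'age': 1, 'antibody_level': 1, 'covid': 0, 'test_kit': 0}
print(label_balance(records, "covid"))  # [('no', 3), ('yes', 1), ('maybe', 1)]
print(mean_by_group(records, "antibody_level", "test_kit"))
```

A per-kit difference in a toy table like this proves nothing by itself, but it is the kind of discrepancy worth explaining before pooling data from different sources.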
  • 9. How do we analyse RWE correctly?
      • Patients are complex:
          • Co-morbidities
          • Lifestyle, prior history, exposure
          • Demographics, genetics, epigenome, microbiome …
      • Disease is complex:
          • Affects different body subsystems
      • Health data is complex:
          • Sparse, irregular
          • A product of a healthcare system …
      • Underlying models are unclear
      • Many opportunities for confounders & noise
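The point about confounders can be made concrete with a toy stratified analysis. In the hypothetical counts below (invented for illustration), a treatment looks worse than no treatment overall, yet better within each severity stratum, because severe patients were far more likely to be treated (Simpson's paradox). Stratifying by a suspected confounder is only a basic check, and it assumes the confounder was recorded at all.

```python
# Hypothetical counts: (recovered, total) per severity stratum and treatment arm.
counts = {
    "mild":   {"treated": (18, 20), "untreated": (64, 80)},
    "severe": {"treated": (48, 80), "untreated": (10, 20)},
}


def rate(recovered, total):
    """Recovery proportion."""
    return recovered / total


def crude_rate(arm):
    """Recovery rate for one arm, pooling all strata (i.e. ignoring severity)."""
    recovered = sum(counts[s][arm][0] for s in counts)
    total = sum(counts[s][arm][1] for s in counts)
    return rate(recovered, total)


# Within each stratum treated does better (0.90 vs 0.80, 0.60 vs 0.50) ...
for stratum, arms in counts.items():
    print(f"{stratum:>6}: treated {rate(*arms['treated']):.2f} vs untreated {rate(*arms['untreated']):.2f}")
# ... but pooled, treated looks worse (0.66 vs 0.74).
print(f" crude: treated {crude_rate('treated'):.2f} vs untreated {crude_rate('untreated'):.2f}")
```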
  • 10. Is ML the best approach for RWE analysis?
      [Figure: a continuum of approaches, from statistical modelling to machine learning / AI]
      • Statistical modelling: clear assumptions, explicit models, clean & controlled data
      • Machine learning / AI: few assumptions, no model (trained from data), messy data, larger data
  • 11. But what are the pitfalls of using ML on health data?
      • Needs more (labelled) data
      • Bias – how was the data sourced?
          • Needs to be handled carefully
      • May require specialised computation & skills
      • Some problems are difficult to adapt to ML
      • Interpretability – data never lies, but what is it telling us?
  • 12. Clustering: how simple algorithms can actually be very complex
      • The idea of clustering is simple, but what does it actually do?
      • Every dataset has clusters, even random noise
      • Do clusters reflect the underlying reality?
      • Are the clusters revealed valid and/or robust?
      • Do the clusters correspond to the groups you are interested in?
      • A cluster is not the truth, it’s a hypothesis
      (The paper is modestly convincing about these points)
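The claim that every dataset has clusters, even random noise, is easy to demonstrate. The sketch below runs plain Lloyd's k-means (a stand-in for whichever clustering method is used, not the paper's method) on uniformly random 2-D points: it will happily partition pure noise into k groups. Re-running with a different random initialisation and scoring agreement with the Rand index is one cheap robustness check. The choice of k = 3, the seeds and the 200 points are arbitrary assumptions.

```python
import random


def kmeans(points, k, seed, n_iter=100):
    """Plain Lloyd's k-means on 2-D points; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k randomly chosen data points
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2)
            for x, y in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((sum(p[0] for p in members) / len(members),
                                      sum(p[1] for p in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # an empty cluster keeps its old centroid
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return labels


def rand_index(a, b):
    """Fraction of point pairs on which two partitions agree (same vs different cluster)."""
    agree = total = 0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            agree += (a[i] == a[j]) == (b[i] == b[j])
            total += 1
    return agree / total


# Uniform random noise: by construction there is no real cluster structure.
noise_rng = random.Random(0)
noise = [(noise_rng.random(), noise_rng.random()) for _ in range(200)]

run_a = kmeans(noise, k=3, seed=1)
run_b = kmeans(noise, k=3, seed=2)
print("clusters returned:", len(set(run_a)))
print("agreement between two runs:", round(rand_index(run_a, run_b), 3))
```

The Rand index is invariant to how clusters are numbered, so two runs that find the same groups under different label numbers still score 1.0.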
  • 13. The COVID publication: is it good methodology?
      • Many different methods, but:
          • What’s the concordance?
          • What use are 6-7 methods?
          • Ensemble them?
      • Where’s the validation?
      • What’s the question?
          • How many people are actually infected with COVID? or
          • Can we build a model to calculate this?
      [Figure: The data]
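For “what’s the concordance?” and “ensemble them?”, a minimal sketch: Cohen's kappa measures agreement between two methods' labels corrected for chance, and a simple majority vote is one (of many) ways to ensemble several methods. The three label vectors are hypothetical.

```python
from collections import Counter


def cohens_kappa(a, b):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[lab] * freq_b[lab] for lab in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both gave one identical label throughout; kappa is undefined, treat as perfect
    return (observed - expected) / (1 - expected)


def majority_vote(*predictions):
    """Ensemble several methods' per-patient labels by simple majority.

    Ties go to the label seen first for that patient (Counter.most_common ordering).
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical per-patient labels (1 = positive class) from three methods.
method_1 = [1, 1, 0, 0, 1, 0, 1, 0]
method_2 = [1, 0, 0, 0, 1, 0, 1, 1]
method_3 = [1, 1, 0, 1, 1, 0, 0, 0]

print(cohens_kappa(method_1, method_2))             # 0.5
print(majority_vote(method_1, method_2, method_3))  # [1, 1, 0, 0, 1, 0, 1, 0]
```

Methods 1 and 2 agree on 6 of 8 patients (75%), but with these label frequencies 50% agreement is expected by chance alone, so kappa is only 0.5.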
  • 14. What makes a good machine-learning approach?
      • Be clear what it is predicting
      • It should be reproducible
      • It should be validated:
          • Internally: performance, convergence, loss, sensitivity, robustness, …
          • Externally: against another dataset
      • Almost any ML method can:
          • Do (slightly) better than humans
          • Get better than 50%
      • If it is “better”, compared to what?
      How do the systems in the paper measure up?
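“Better than 50%” and “compared to what?” are worth making concrete: on an imbalanced outcome, a trivial model that always predicts the majority class already beats 50% accuracy. The sketch below, using a hypothetical 90:10 outcome, compares plain accuracy with balanced accuracy (the mean of per-class recall), under which the same trivial model scores exactly chance. A candidate model should at least beat such a baseline, ideally on an external dataset.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of cases predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall: predicting one class for everyone scores 1 / (number of classes)."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def majority_baseline(y_train, n):
    """Trivial model: predict the most common training label for all n cases."""
    most_common = Counter(y_train).most_common(1)[0][0]
    return [most_common] * n


# Hypothetical imbalanced outcome: 90 negatives, 10 positives.
y_true = [0] * 90 + [1] * 10
# Fitted on the same labels for brevity; in practice use the training split only.
baseline = majority_baseline(y_true, len(y_true))
print(accuracy(y_true, baseline))           # 0.9
print(balanced_accuracy(y_true, baseline))  # 0.5
```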
  • 15. How do we know what a system is doing?
      • Interpretability is non-negotiable
      • AI models can only be built for data that you have
      • Biased data gives rise to biased models
      • A model may not be doing what we think it is
      • Toolkits like SHAP & LIME make interpretability easy and comparable
      (The paper used very interpretable systems)
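SHAP and LIME are dedicated libraries. As a dependency-free illustration of the same goal, checking what a model actually relies on, the sketch below implements permutation importance, a different and simpler model-agnostic technique: shuffle one feature at a time and measure how much the score drops. It does not use SHAP or LIME. The toy model and data are hypothetical; the model deliberately ignores its second feature, so that feature's importance comes out as zero.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of cases predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, metric, n_repeats=10, seed=0):
    """Mean drop in `metric` when each feature column of X is shuffled."""
    rng = random.Random(seed)
    baseline = metric(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the outcome
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - metric(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(X):
    """Hypothetical model: uses only feature 0 and ignores feature 1."""
    return [1 if row[0] > 0.5 else 0 for row in X]


data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = toy_model(X)  # outcome defined by the model itself, so baseline accuracy is 1.0
print(permutation_importance(toy_model, X, y, accuracy))
```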
  • 16. How could this have been done better?
      • What question are we trying to solve?
          • “What’s the actual level of infected people in the population?”
          • In what time period or setting?
          • What’s actionable?
      • What data can we get?
      • What data can we get for validation?
      • We don’t need 6-7 different methods, just 1 good one
      • Be clear about “how good” the results are
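For “what’s the actual level of infected people?” and “how good are the results?”, one standard approach (a sketch, not what the paper did) is to report the test-positive proportion with a confidence interval, then correct it for an imperfect test with the Rogan-Gladen estimator: true prevalence = (apparent + specificity - 1) / (sensitivity + specificity - 1). The survey counts and the kit's sensitivity and specificity are hypothetical, and mapping the Wilson bounds through the correction ignores uncertainty in sensitivity and specificity themselves.

```python
import math


def wilson_interval(positives, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95%)."""
    p = positives / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))
    return centre - half_width, centre + half_width


def rogan_gladen(apparent, sensitivity, specificity):
    """Correct an apparent (test-positive) prevalence for an imperfect test."""
    if sensitivity + specificity <= 1:
        raise ValueError("the test must be better than chance (sensitivity + specificity > 1)")
    estimate = (apparent + specificity - 1) / (sensitivity + specificity - 1)
    return min(1.0, max(0.0, estimate))  # clip to a valid proportion


# Hypothetical survey: 120 of 1000 people test positive with a kit whose
# sensitivity (0.90) and specificity (0.95) are assumed known exactly.
positives, n = 120, 1000
sens, spec = 0.90, 0.95
low, high = wilson_interval(positives, n)
print(f"apparent prevalence:  {positives / n:.3f} (95% CI {low:.3f} to {high:.3f})")
# Mapping the CI bounds through the correction ignores uncertainty in sens/spec.
print(f"corrected prevalence: {rogan_gladen(positives / n, sens, spec):.3f} "
      f"(approx. {rogan_gladen(low, sens, spec):.3f} to {rogan_gladen(high, sens, spec):.3f})")
```

Because the correction is monotonic whenever sensitivity + specificity > 1, transforming the interval's endpoints preserves their order; a fuller analysis would also propagate the uncertainty in the kit's validation data.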
  • 17. Summary
      • RWE may be a broad and over-reaching category
          • But it underlines the complexity & biases of health data
      • ML may be the best approach for analysing RWE
          • However, its power and flexibility introduce other problems:
              • Data “bias”
              • Validation
              • Interpretability
      • ML “findings” are almost always just hypotheses
      • Healthcare analytics should not be about analytics but about biology
  • 18. Final thought
      • If you are driven by science and passionate about improving lives, why not work at AstraZeneca?
      • Example jobs – please visit our careers website:
          • Principal Data Scientist - https://careers.astrazeneca.com/job/gaithersburg/principal-data-scientist/7684/14833674
          • Associate Director Imaging & AI - Imaging & Data Analytics - https://careers.astrazeneca.com/job/gothenburg/associate-director-imaging-and-ai-imaging-and-data-analytics/7684/14469379
          • Data Sciences & AI Graduate Programme – UK - https://careers.astrazeneca.com/data-sciences-and-ai-graduate-programme