Beyond Proofs of Concept for Biomedical AI

Moving beyond
proofs of concept for
biomedical AI
PAUL AGAPOW
STATISTICS & DATA SCIENCE INNOVATION, GSK
FESTIVAL OF GENOMICS JANUARY 2022

Or …
The vast gulf
between the
promise and
practice of AI
in medicine

Obligatory disclosure
◦ About myself:
◦ Currently Statistics & Data Science Innovation @GSK
◦ Health Informatics / Oncology ML&AI @AZ
◦ Data Science Institute @ICL
◦ Bioinformatics @Health Protection Agency UK
◦ Does not reflect any current or past projects at any of the above
◦ Solely my opinion
◦ No conflicts of interest

Why do so many ML/AI systems
that promise improvements to
healthcare & medicine fail to
deliver on that promise?
THE PROBLEM

We live in an
age of wonders
Fungible & powerful computation
together with powerful AI
techniques working on a mountain
of high-throughput biological data
promise to revolutionize drug
development & healthcare

10 June 2021
“AI will not replace
drug hunters, but drug
hunters who don’t use
AI will be replaced by
those who do.”
-Andrew Hopkins, CEO Exscientia

Just when we
needed it, AI
failed us
Of 232 models for COVID
diagnosis / prognosis / prediction,
only 2 held any promise for actual
clinical application …
Wynants et al. (2020) Prediction models for diagnosis and prognosis of covid-19:
systematic review and critical appraisal. BMJ

Every bad
model
exacts a cost

Excuses
“It takes time …”
“We need to use
<ML/AI
APPROACH> …”
“We need more
powerful
computation …”
“We need to attract
more computer
scientists and AI
experts to the field
…”
“We need more
data …”
“Computers will
never understand
biology …”

ML/AI in medicine is performed by
many different types of people with
different knowledge, different skills
and different goals, leading to
misalignment between research & the
clinic
A PROPOSED DIAGNOSIS

“I don’t need 100
new drug
candidates, I
need 5 good
ones”

The data isn’t
right
• “Big Data” is a problem
• A paucity of large, labelled datasets
• Frankenstein datasets (cf. Christoph Molnar)
• Bias and selection cooked in
• Often lack data on crucial issues
•Advances will often be limited by biological knowledge …

12 July 2021
Biology is complicated
About 50 trillion cells of 200 types
Each cell has 23 pairs of chromosomes
In total 6.4 billion basepairs (positions)
Organised into about 18,000 genes
(Or maybe more like 40,000 genes)
Genetic material elsewhere in the cell
Epigenetic modification
1 million different types of molecules
Lifestyle & history
Exposure & environment
Immune system repertoire & priming
…
Of which we know only a fraction

We might not
understand what
the system is
doing
The wolf-husky problem – are we just building “snow
detectors”?
Does a patient have a right to know why a medical
decision was made?
Algorithms have frequently been shown to be biased
Thus explainability / interpretability
◦ As a smoke test
◦ But interpretability is not straightforward

There are few
incentives for
writing good
software …
•Software engineering is still under-valued in academia.
“Research” software is often unsuitable for real world use
•Software in the clinic needs to be robust, reliable and regulated
• Writing software to that quality is non-trivial
•If a result cannot be reproduced, did it ever really work?

… let alone
clinically
useful AI
•Well-known pressure in academia for novel results
• Doubly so in AI, focus on novel methods on standard problems
•Desire to “do something”
•Difficult to get biologists & informaticians collaborating
• “Every time I fire a linguist, the accuracy of my NLP models
improves”
“The proposed solutions are never
intended to be applied directly”

We may be
running a
massive multiple
hypothesis test
Consider:
• Maybe millions of researchers working on similar problems
• Using different approaches and assumptions
• Using different data, processed differently
• Using different software stacks
• Using different tunings & hyper-parameters on these models
• Throwing out models that “don’t work”
How many results may be due to simple chance?
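A minimal sketch of the effect described above, under simplifying assumptions that are not from the talk: every "model" guesses at random (true accuracy 0.5), every attempt is scored on a small test set of 50 patients, and only the best attempt is reported. The function name and all numbers are hypothetical.

```python
import random


def best_chance_accuracy(n_attempts, n_test, seed=0):
    """Best test accuracy among n_attempts models that guess at random."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(n_attempts):
        # Each "model" is a coin flip per patient, so its true accuracy is 0.5.
        correct = sum(rng.random() < 0.5 for _ in range(n_test))
        best = max(best, correct / n_test)
    return best


if __name__ == "__main__":
    # Assumed set-up: 50 test patients, increasing numbers of attempts.
    for attempts in (1, 10, 100, 1000):
        print(attempts, best_chance_accuracy(attempts, n_test=50))
```

The more attempts that are made and discarded, the better the best surviving result looks, even though no model here has learned anything.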

So what do we
do?
•Broad validation is a non-negotiable
•Likewise, reproducibility
•And interpretability?
•Need collaborating experts
• In biology
• In software and programming
•Focus on incremental improvement of models
•Distrust accuracy metrics
•Accept that nothing ever works as well in the real world
•Does the model solve a useful problem? If it works,
what will you do?
•As always, we need more of the right sort of data
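One possible reason to distrust accuracy, sketched with invented numbers (not from the talk): in a hypothetical cohort of 1,000 patients where 20 have the disease, a "model" that calls everyone healthy scores high accuracy while finding no cases at all. The function names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity(y_true, y_pred):
    """Fraction of true positive cases that the model detects."""
    positives = sum(y_true)
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return true_positives / positives


# Hypothetical cohort: 20 diseased (1) and 980 healthy (0) patients.
y_true = [1] * 20 + [0] * 980
# A useless "model" that predicts healthy for everyone.
y_pred = [0] * 1000

print(accuracy(y_true, y_pred))     # 0.98
print(sensitivity(y_true, y_pred))  # 0.0
```

Reporting a metric tied to the clinical question (here, how many sick patients are found) alongside accuracy makes this failure visible.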

Come along to …
Vibhor Gupta (PangaeaAI) and myself leading a
discussion (later today?)
Towards the Industrial Use of AI in Biotech &
Medicine

Looking for a job?
https://www.gsk.com/en-gb/careers/
