Multi-omics for drug discovery: what we lose, what we gain
PAUL AGAPOW
STATISTICS & DATA SCIENCE INNOVATION HUB, GSK
The obligatory disclaimer
No conflicts of interest
Personal opinion, does not reflect views of or projects at my employer
Background:
◦ Data science @GSK
◦ ML/AI & RWE @AZ
◦ Data Science Institute @ICL
Agenda
Where, why and how could multi-omics be used in
therapeutic development
What does it get us?
What’s the price of entry?
Very much a talking-out-loud presentation:
◦ Ask “why” a lot
◦ Lots of principles and high-level thinking
◦ Slim on outcomes realized to date
◦ Largely about late(r) dev not basic research
What problem are we trying to solve?
DRUG DEVELOPMENT IS CRASHING
It was already too slow & expensive
Now it’s worse & getting worse
New drugs often only marginally
better
‘Siege’ approaches only partly
effective
Ditto ‘clever’ approaches
Why the downturn?
“The Cautious Regulator”?
“Just throwing money at it”
“Better than the Beatles”?
◦ Higher bar to reach / all the low hanging fruit is gone?
“Lack of basic research”
◦ Whole organism models giving way to high throughput
methods
Biology is the throttling factor
We have many, many candidates but far
fewer targets
◦ “undruggable targets”
High throughput screening simplifies
biology
◦ Pushes potential problems to late dev
“High hanging fruit” ill-understood &
complex
◦ Exacerbated by combo therapies etc.
Machine learning can’t function without
data
Understanding (complex) biology
is the answer (?)
“Complex” (multifactorial)
disease is common, perhaps
even the rule
Pure science is what you’re rewarded
for. That’s what you get promoted for.
That’s what they give the Nobel Prizes
for. And yet developing a drug is a
hundred times harder than getting a
Nobel Prize.
-BEN A. BARRES, STANFORD UNIVERSITY SCHOOL OF MEDICINE
Why multi-omics?
Taking a broad definition of multi-omics …
◦ An integrated multivariate analysis over different data
modalities
1. Complex diseases arise through interactions in
different compartments / on different levels
2. Diseases express differently in different
compartments / on different levels
3. Ignorance of drug candidate behaviour in vivo
can lead to late, costly mistakes
4. Multi-omics gives a view of information flow in disease
Does it matter what methodology
is used?
Broad divide between “multi-omics” and “ML”
approaches
Many (most) multi-omics methods don’t work
◦ Let citations sort it out?
But failure to “work” in any case does not mean
there isn’t a signal there
Difficult to assert a strong theoretical basis for
most multi-omic algorithms … like most ML
algorithms in practice
◦ Often unclear if assumptions are met
Experimenter's regress – treat just like
hypothesis generation
Does it matter what data is used?
Temptation to use any and all data
But the Hughes (peaking) phenomenon & the curse
of dimensionality …
… and omnigenics
Effective multi-omics requires data curation
and selection – how do we do this?
◦ Automated feature selection
◦ Dimensionality reduction
◦ Subjective expert opinion
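
As a minimal sketch of the "automated feature selection" option, the snippet below ranks features by their absolute Pearson correlation with an outcome and keeps the top k. The feature names, toy values and the `select_features` helper are all hypothetical, and a univariate filter is only one of many possible selection schemes.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def select_features(columns, outcome, k):
    """Rank features by |correlation with outcome| and return the top k names plus all scores."""
    scores = {name: abs(pearson(vals, outcome)) for name, vals in columns.items()}
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return ranked[:k], scores

# Toy data: hypothetical feature names from three modalities, six samples each.
columns = {
    "gene_A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],     # tracks the outcome closely
    "protein_B": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],  # noisier relationship
    "metab_C": [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],    # constant, carries no information
}
outcome = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]

selected, scores = select_features(columns, outcome, k=2)
print(selected)
```

If selection like this is run on the full data set before cross-validation, information leaks from the test folds, so in practice it belongs inside each training fold.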
Other challenges
Usual data gathering, integration,
harmonization hassles
◦ And possible introduction of bias
How to do this at scale?
◦ Khadeer@AZ: OmicsFold
◦ Chen: multiomics
Poor grasp of sample size & power
◦ Tarazona (2020) “Figures of Merit”
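
To make the sample size point concrete, here is a rough sketch using the textbook normal-approximation formula for a two-sided, two-sample comparison of means, n per group = 2((z_{1-α/2} + z_{power}) / d)², with a Bonferroni correction across features. This is not the method of the work cited above; the `samples_per_group` helper, the effect size of 0.5 and the 20,000-feature count are illustrative assumptions.

```python
import math
from statistics import NormalDist

def samples_per_group(effect_size, alpha=0.05, power=0.8, n_tests=1):
    """Approximate samples per group for a two-sided two-sample test of means.

    effect_size is the standardised mean difference (Cohen's d).
    alpha is divided by n_tests (Bonferroni) before the calculation.
    """
    z = NormalDist()
    alpha_adj = alpha / n_tests
    z_alpha = z.inv_cdf(1 - alpha_adj / 2)  # critical value for the adjusted alpha
    z_beta = z.inv_cdf(power)               # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

# One pre-specified biomarker versus an untargeted screen of 20,000 features.
n_single = samples_per_group(0.5)
n_omics = samples_per_group(0.5, n_tests=20000)
print(n_single, n_omics)
```

Bonferroni is deliberately conservative here; false discovery rate control would ask for fewer samples, but the direction of the effect is the same: the more features are tested, the more samples are needed per group.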
Example: cancer prediction &
stratification
Complex whole-body disease
Abnormalities in DNA, RNA, protein,
metabolite & regulatory molecules
Disease evolves & changes
Valuable information in medical imaging
A very personal disease requiring
personalised treatment
Promise with survival prediction & imaging
data
Example: cancer prediction &
stratification #2
Fang 2021: DeePaN
Patient responses to IO are variable,
influenced by health, immune & tumor
factors
Graph Convolutional Network over RWE
(EHRs) & genomic data
◦ Learns latent patient representation
◦ Cluster these for stratification
Clusters show significantly different survival
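
The "learn a latent patient representation over a graph" step can be sketched with a single graph-convolution layer using the common symmetric-normalisation rule, ReLU(D^-1/2 (A + I) D^-1/2 X W). This is a generic illustration, not the DeePaN architecture: the patient graph, the feature values and the fixed (untrained) weights below are all made up.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def gcn_layer(adj, features, weights):
    """One graph-convolution step: ReLU(A_norm @ X @ W)."""
    propagated = matmul(normalized_adjacency(adj), features)
    return [[max(0.0, v) for v in row] for row in matmul(propagated, weights)]

# Toy graph of four hypothetical patients: 0-1 are similar, 2-3 are similar.
adj = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
# Two made-up features per patient (e.g. one clinical, one genomic score).
features = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9]]
# Arbitrary fixed weights; a real model would learn these during training.
weights = [[1.0, -1.0], [-1.0, 1.0]]

embeddings = gcn_layer(adj, features, weights)
for row in embeddings:
    print(row)
```

Each patient's embedding is an average over itself and its neighbours, so the two connected pairs end up with identical rows here; clustering such embeddings is what yields the strata.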
Example: asthma subtyping &
understanding response
A heterogeneous disease
◦ Symptoms
◦ Progression
◦ Response to intervention
Simple clustering on transcriptomics
then proteomics rearranges clinical
types
◦ Immune-related
◦ Inflammasome-related
◦ Mito-ox-related
No clear distinction at clinical level … Kermani et al. 2018
Example: knowledge graphs for
drug repurposing
Much of ML & data science lives in “table-
land” but biology is “network land”
◦ Associative
◦ Multi-level / multi-modal
Graphs are one way of capturing these
Then what?
◦ Heritage methodology – exploration, path
lengths, link prediction
◦ Distil into latent forms for GCNs etc.
Many suggested repurposing candidates …
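
The "heritage" graph methods can be sketched in a few lines: build an undirected graph from edges, measure path lengths with breadth-first search, and score candidate drug-disease links by the number of shared neighbours. All node names and edges below are hypothetical, and common-neighbour counting is just one simple link-prediction heuristic.

```python
from collections import deque

def build_graph(edges):
    """Undirected adjacency sets from a list of (node, node) edges."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph

def shortest_path_length(graph, source, target):
    """Breadth-first-search hop count between two nodes, or None if unreachable."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nb in graph.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return None

def common_neighbours(graph, a, b):
    """Link-prediction score: number of neighbours shared by a and b."""
    return len(graph.get(a, set()) & graph.get(b, set()))

# Hypothetical toy knowledge graph: drugs target genes, genes associate with diseases.
edges = [
    ("drug_X", "gene_1"), ("drug_X", "gene_2"),
    ("gene_1", "disease_A"), ("gene_2", "disease_A"), ("gene_2", "disease_B"),
    ("drug_Y", "gene_3"), ("gene_3", "disease_B"),
]
graph = build_graph(edges)

# Rank diseases as repurposing candidates for drug_X by shared gene neighbours.
diseases = ["disease_A", "disease_B"]
ranking = sorted(diseases,
                 key=lambda d: common_neighbours(graph, "drug_X", d),
                 reverse=True)
print(ranking)
```

Scores like these only generate hypotheses; a high-scoring drug-disease pair still needs biological and clinical validation.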
Example: predicting adverse
events
Immune-related adverse events (irAEs)
◦ Increasing issue with IO therapies
◦ Unpredictable
Jing et al. (2020):
◦ For many cancers
◦ Collected 42 factors linked to immune response,
filtered for association against reports down to 7
◦ Constructed series of bivariate models to explain
irAEs
◦ TCR & CD8+ cells explain 56% of irAEs
◦ Repeat on molecular markers to implicate
LCP1 and ADPGK, not previously found
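
A "series of bivariate models" can be sketched as fitting a two-predictor linear regression for every pair of factors and comparing the proportion of variance explained (R²). The snippet assumes that "explain 56%" refers to an R²-style measure, which the slide does not state; the factor names and values are invented and do not reproduce the published analysis.

```python
from itertools import combinations

def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def r_squared(x1, x2, y):
    """R^2 of the least-squares fit y ~ b0 + b1*x1 + b2*x2 (via normal equations)."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Invented per-cancer-type factor values and an invented irAE reporting measure.
factors = {
    "factor_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "factor_b": [0.0, 1.0, 0.0, 2.0, 1.0, 3.0],
    "factor_c": [2.0, 0.0, 1.0, 0.0, 2.0, 1.0],
}
irae_measure = [3.2, 7.9, 7.1, 14.8, 14.1, 22.0]

# Fit every pair of factors and keep the pair with the highest R^2.
results = {
    pair: r_squared(factors[pair[0]], factors[pair[1]], irae_measure)
    for pair in combinations(sorted(factors), 2)
}
best_pair = max(results, key=results.get)
best_r2 = results[best_pair]
print(best_pair, round(best_r2, 3))
```

Picking the best of many pairs on the same data inflates R², so a result like this should be checked on held-out data.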
Take-away
1. Much of the downturn in drug development can be
attributed to incomplete knowledge / use of complex
biology
2. Multi-omics is one way to investigate complex, multi-
level biology
3. We need to be careful, although not puritanical, about
methodology. This is about hypothesis-generation &
what works
4. Multi-omics needs much the same care in analysis
as ML/AI – validation, sensitivity, …
5. Multi-omics can be used successfully along the entire
drug development pathway
Thanks
Data Science Institute @ICL
NHLI
ML/AI ONC R&D @AZ
Sanjay Budhdeo
Michal Krassowski
Jinyi Wu
