AI & Scientific Discovery in Oncology: Opportunities, Challenges & Trends
1. AI & Scientific Discovery in Oncology: Opportunities, Challenges & Trends
André Freitas & dECMT AI Team
DART Meeting
Barcelona, May 2023
2. Outline
Which recent developments in AI are relevant for oncology research?
Three perspectives:
• Explainable AI (XAI): building associations over heterogeneous data
• Variational Autoencoders (VAEs): unified evidence spaces for multi-omics
• Large Language Models (LLMs): interpreting textual evidence (at scale)
New Infrastructures for Scientific Discovery
New Models of Inference
Disclaimer:
AI-centered
Near-future(-istic) perspective (emerging trends)
4. Motivation:
• Certain aspects of tumor pathology need to be studied within tissue context.
• Interaction of the neoplastic cell with the surrounding microenvironment, including the immune system.
Goal: models which
• bridge the gap between microscopic imaging and high-dimensional "omics" technologies;
• facilitate the discovery of localised molecular features that drive the spatially heterogeneous phenotypes of a tumour.
Breast cancer profiling by explainable AI
Binder et al. (Nature MI, 2021)
5. Breast cancer profiling by explainable AI
Binder et al. (Nature MI, 2021)
Morpho-molecular integration
Computationally generated “fluorescence microscopy”
Correlation of spatio-morphological and molecular features
6. Pipeline (figure): bag of keypoints → SVM → ‘LRP’
Morphological targets: Cancer, Lymphocytes, Stroma (over 200,000 individually annotated cells; a positive label indicates the presence of at least one cell of the respective type per patch).
Molecular targets: Som. mut., Copy num. var., RNA-seq, DNA meth, Prot. profiles (prediction for each protein/gene separately).
Binder et al. (Nature MI, 2021)
Heatmap: relevance of a pixel is the sum of the
relevance scores over all local features which
cover that pixel.
Layer-wise Relevance Propagation (LRP): yields high-resolution (pixel-wise) classification results, allowing individual cells to be identified while requiring only coarse-grained training data (region annotations).
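Sketch (not from Binder et al.): the heatmap rule above, where a pixel's relevance is the sum of the relevance scores of all local features covering it. The relevance scores are taken as given here; the function name, the rectangular (top, left, height, width) feature regions and the toy values are illustrative assumptions.

```python
import numpy as np

def relevance_heatmap(image_shape, boxes, scores):
    """Sum local-feature relevance scores into a pixel-wise heatmap.

    boxes  : list of (top, left, height, width) regions, one per local feature
             (rectangular regions are an assumption made for this sketch)
    scores : one relevance score per local feature (assumed already computed)
    """
    heatmap = np.zeros(image_shape, dtype=float)
    for (top, left, height, width), score in zip(boxes, scores):
        # Every pixel covered by this feature receives the feature's score.
        heatmap[top:top + height, left:left + width] += score
    return heatmap

# Toy example: two overlapping local features on a 4x4 image.
boxes = [(0, 0, 2, 2), (1, 1, 2, 2)]
scores = [0.5, -0.2]
heatmap = relevance_heatmap((4, 4), boxes, scores)
```

Pixels covered by both features accumulate both scores, so overlapping positive and negative evidence partially cancels.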
TCGA: over 500
A kind of abductive reasoning:
‘Inference to the best explanation’
Additive/monotonic explanatory data integration
7. Motivation:
• Molecular heterogeneity of cancer cells.
• Relapse of disease due to the escape of resistant cell populations.
Goal:
• A model which obtains characteristic network patterns for tumor cells and normal epithelial cells.
Approach:
• Coping with sampling heterogeneities (unique rare molecular properties).
• Capturing complex global correlations, which are inherent to biological networks.
• Leveraging single-cell data, which can deliver large training datasets (10k cells per patient).
Single-cell gene regulatory network prediction
by explainable AI
Keyl et al. (Nuc. Ac. Res, 2023)
9. Pipeline (figure): scRNA-seq → NN → predicted masked gene, with LRP applied to the NN
Data: 10 patients with non-small cell lung cancer
Cell types: Tumor, Ciliated, AT1, AT2, Club
(1) A target gene is predicted based on a set of other genes.
(2) LRP is used to infer the relevance of every gene for this prediction.
Single-cell gene regulatory network
prediction by explainable AI
Keyl et al. (Nuc. Ac. Res, 2023)
10. Pipeline (figure): scRNA-seq → NN → predicted masked gene; LRP → inferred interaction strength graph
Data: 10 patients with non-small cell lung cancer
Cell types: Tumor, Ciliated, AT1, AT2, Club
(1) A target gene is predicted based on a set of other genes.
(2) LRP is used to infer the relevance of every gene for this prediction.
(3) Aggregate interaction strength.
Single-cell gene regulatory network
prediction by explainable AI
Keyl et al. (Nuc. Ac. Res, 2023)
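Sketch (not the implementation of Keyl et al.): the three steps, with a linear model swapped in for the neural network. For a linear model without bias, the basic LRP rule reduces to weight times input, which keeps the example short. The function name, the mean-absolute-relevance aggregation and the synthetic data are assumptions.

```python
import numpy as np

def interaction_strengths(expr):
    """expr: (n_cells, n_genes) expression matrix.

    Returns an (n_genes, n_genes) matrix whose entry [source, target]
    is the aggregated relevance of `source` for predicting `target`.
    """
    n_cells, n_genes = expr.shape
    strengths = np.zeros((n_genes, n_genes))
    for target in range(n_genes):
        sources = [g for g in range(n_genes) if g != target]
        X, y = expr[:, sources], expr[:, target]
        # (1) Predict the masked target gene from all other genes
        #     (least-squares linear model, no bias term).
        weights, *_ = np.linalg.lstsq(X, y, rcond=None)
        # (2) Per-cell relevance of each source gene: weight * input.
        relevance = X * weights
        # (3) Aggregate over cells (mean absolute relevance, an assumption).
        strengths[sources, target] = np.abs(relevance).mean(axis=0)
    return strengths

# Synthetic data: gene 2 is driven by gene 0, gene 1 is independent.
rng = np.random.default_rng(0)
g0 = rng.normal(size=500)
g1 = rng.normal(size=500)
g2 = 2.0 * g0 + 0.01 * rng.normal(size=500)
strengths = interaction_strengths(np.column_stack([g0, g1, g2]))
```

In this toy setting the edge from gene 0 to gene 2 should come out much stronger than the edge from gene 1 to gene 2. As with the original method, the resulting graph is associational, not causal.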
11. Pipeline (figure): scRNA-seq → NN → predicted masked gene
Data: 10 patients with non-small cell lung cancer
Cell types: Tumor, Ciliated, AT1, AT2, Club
Single-cell gene regulatory network
prediction by explainable AI
Keyl et al. (Nuc. Ac. Res, 2023)
UMAP: inter- and intra-tumoral distribution of tumor-specific network activity (dots represent tumor cells)
13. Certain patients show different active
networks in the same cells (e.g. T1 and T6 in
patient p032).
pathogenic networks
Keyl et al. (Nuc. Ac. Res, 2023)
Linked associations (expl.) qualify the similarity/clustering ‘view’ (NN).
14. Some network modules are distinctly active
only in a minority of tumour cells (e.g. T2 in
patient p024), indicating a functional
heterogeneity.
pathogenic networks
Keyl et al. (Nuc. Ac. Res, 2023)
Linked associations (expl.) qualify the similarity/clustering ‘view’ (NN).
15. Explainable AI (XAI): Take-away
• ML:
• Computes the relevant correlations in a predictive setting.
• Flexibility for operating over multi-modal, heterogeneous data.
• Explainability:
• Shifts the focus from prediction to association.
• Allows for transparency, verifiability and control.
• Discovered relations are associational, not causal.
• Exploratory: useful to inform hypotheses for new
interventional studies.
17. Motivation:
• Integration of transcriptomics from diverse/multicenter data sources.
• Accounting for batch effects.
Example:
• Transcriptomics profiles from
• 932 CCLE cell lines
• 434 patient-derived tumor xenografts
• 10,550 patient tumors from TCGA
• 406 metastatic tumors from MET500
• 203 breast tumors from Count Me In (CMI)
Integration of transcriptomics profiles from
different datasets
Dimitrieva et al. (BioRxiv, 2022)
18. MOBER architecture (figure): RNA-seq → Enc → (μ, σ) → z → Dec → RNA-seq
Input sources: TCGA, CCLE, PTX, MET500, CMI
The decoder takes a sample from the latent space and reconstructs the gene expression profile.
Source discriminator (aNN)
Self-supervision
Variational autoencoders (VAE)
Integration of transcriptomics profiles from different datasets
Dimitrieva et al. (BioRxiv, 2022)
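Sketch (not the MOBER code): a generic VAE forward pass showing how the encoder outputs μ and σ, how z is sampled, and how the decoder reconstructs the expression profile from z. Single linear layers, random untrained weights, the layer sizes and all names are assumptions; the source discriminator is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_latent = 50, 4

# Untrained single-layer encoder/decoder weights (illustrative only).
W_mu = rng.normal(scale=0.1, size=(n_genes, n_latent))
W_logvar = rng.normal(scale=0.1, size=(n_genes, n_latent))
W_dec = rng.normal(scale=0.1, size=(n_latent, n_genes))

def encode(x):
    """Map expression profiles to the mean and log-variance of q(z|x)."""
    return x @ W_mu, x @ W_logvar

def sample_latent(mu, log_var, rng):
    """Reparameterisation: z = mu + sigma * eps, with eps ~ N(0, I)."""
    sigma = np.exp(0.5 * log_var)
    return mu + sigma * rng.standard_normal(mu.shape)

def decode(z):
    """Reconstruct an expression profile from a latent sample."""
    return z @ W_dec

def vae_loss(x, x_hat, mu, log_var):
    """Squared reconstruction error plus KL divergence to N(0, I)."""
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    kl = np.mean(-0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var), axis=1))
    return recon + kl

x = rng.normal(size=(8, n_genes))   # 8 synthetic "RNA-seq" profiles
mu, log_var = encode(x)
z = sample_latent(mu, log_var, rng)
x_hat = decode(z)
loss = vae_loss(x, x_hat, mu, log_var)
```

Training would minimise this loss by gradient descent. An adversarial source discriminator, as named on the slide, would add a further term to the objective.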
19. • Compensates for batch effects.
• Alignment preserves biological subtype relationships.
• Information transfer between cell line and patient tumor datasets.
Dimitrieva et al. (BioRxiv, 2022)
22. InterFaceGAN: Interpreting the Disentangled Face Representation Learned by GANs
VAEs induce a lower-dimensional smooth space.
Disentangling latent factors.
Observations are organised within a lower-dimensional (perceptual-semantic) manifold.
Implication: ‘extrapolations’ are possible.
24. Style transfer in the gene expression space
Jain et al. (ArXiv:2106.15456, 2021)
Uhler (Talk at ICML, 2020)
Similarity/clustering ‘view’
Qualified changes as aligned trajectories (style transfer)
Predict, project, extrapolate via style transfer
Lotfollahi et al. (2018): Generative modelling and
latent space arithmetics predict single-cell response
across cell types, studies and species.
• Foundation for a digital twin framework.
• Patient, cell states, etc.
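Sketch: latent space arithmetic in the spirit of Lotfollahi et al. The shift between control and perturbed cells is estimated in latent space on one cell type and applied to control cells of another. The latent codes are assumed to come from an already trained encoder; the function name and toy values are illustrative.

```python
import numpy as np

def predict_shifted_latents(z_ctrl_a, z_stim_a, z_ctrl_b):
    """Transfer a perturbation from cell type A to cell type B in latent space.

    z_ctrl_a, z_stim_a : latent codes of control / perturbed cells of type A
    z_ctrl_b           : latent codes of control cells of type B
    """
    # Perturbation direction: difference of the latent means on type A.
    delta = z_stim_a.mean(axis=0) - z_ctrl_a.mean(axis=0)
    # Apply the same shift to every control cell of type B.
    return z_ctrl_b + delta

z_ctrl_a = np.array([[0.0, 0.0], [2.0, 0.0]])
z_stim_a = np.array([[1.0, 1.0], [3.0, 3.0]])
z_ctrl_b = np.array([[5.0, 5.0]])
z_pred_b = predict_shifted_latents(z_ctrl_a, z_stim_a, z_ctrl_b)
```

Passing the shifted codes through the decoder would give the predicted expression profiles of perturbed type-B cells.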
25. Better mechanistic grounding
We need to account for the following:
• Gene expression is governed by a gene regulatory network.
• Models should be in closer dialogue with the real biochemical processes.
• Grounding allows for better control (moving the system to a desirable state).
• Open question: how to ground on the actual mechanistic model?
Uhler et al. (ICML, 2022)
In order to:
• Identify optimal interventions.
• Transport drug intervention to a new cell-type.
30. Accumulated Knowledge
Hypotheses
Questions
Natural Language Inference (NLI)
NLI Models
Adapted from: https://human-centered.ai/project/explainable-ai-fwf-32554/
Automating meta-analyses
Cytokine release syndrome (CRS): a significant adverse event of T cell-engaging therapies.
Need: predictive models for CRS.
Problem: lack of patient-level datasets.
Can one explore relevant evidence in the literature?
Bogatu et al. (JBI, 2023)
32. Pipeline (figure): ~460 papers → 17 highly aligned papers → parameter extraction → meta-review
Times shown: 19 hrs, 38 hrs, 7 mins
Bogatu et al. (JBI, 2023)
34. Pipeline (figure): ~460 papers → 17 highly aligned papers → parameter extraction → meta-review
LLM components: context window, chain of prompts, GPT 3.5, Table builder, Layout Extractor
Not possible one year ago!
Demo: Wysocki & Wysocka
35. 64-year-old woman with:
• multiple myeloma,
• s/p allogeneic transplant with recurrent disease and with systemic amyloidosis (involvement of lungs, tongue, bladder, heart),
• on hemodialysis for ESRD who represents for malaise, weakness, and generalized body aching x 2 days.
• she was admitted with hypercalcemia and treated with pamidronate 30mg, calcitonin, and dialysis.
• patient was initially treated with melphalan and prednisone, followed by VAD regimen, and autologous stem cell transplant.
• with relapse of her myeloma, she received thalidomide velcade and thalidomide, which were eventually also held due to
worsening edema and kidney function.
Clinical trial matching: ~375,600 CTs
37. Pipeline (figure):
• Indexing: ~375,600 CT reports → Layout Extractor → eligibility criteria → neural indexing
• Coarse-grained ranking: patient description → neural search → ranked list (relevant trials)
• Fine-grained inference: LM
LLM: context window, chain of prompts, GPT 3.5
Demo from Bogatu, Jullien
Jullien et al. (Semeval 2021)
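Sketch (not the demo system): the two-stage pattern of coarse-grained ranking followed by fine-grained inference. Bag-of-words cosine similarity stands in for neural search, and a caller-supplied scoring function stands in for the LM-based inference. Trial identifiers, trial texts and the toy scoring rule are invented for illustration.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words vector (stand-in for a neural embedding)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def match_trials(patient, trials, fine_score, k=2):
    """Return trial ids: coarse top-k by similarity, re-ranked by fine_score."""
    query = bow(patient)
    # Stage 1: cheap ranking over the whole collection.
    coarse = sorted(trials, key=lambda t: cosine(query, bow(trials[t])),
                    reverse=True)[:k]
    # Stage 2: expensive inference only on the shortlisted trials.
    return sorted(coarse, key=lambda t: fine_score(patient, trials[t]),
                  reverse=True)

# Hypothetical eligibility criteria, invented for this example.
trials = {
    "T1": "multiple myeloma relapse adults",
    "T2": "multiple myeloma excluding patients on hemodialysis",
    "T3": "breast cancer screening",
}
patient = "woman with multiple myeloma on hemodialysis"

def toy_fine_score(patient, criteria):
    # Placeholder for LM-based inference: penalise a matching exclusion.
    excluded = "excluding" in criteria and "hemodialysis" in patient
    return -1.0 if excluded else 1.0

ranked = match_trials(patient, trials, toy_fine_score, k=2)
```

The coarse stage keeps the expensive model away from most of the collection. The fine stage can then demote a trial that looks similar on the surface but whose criteria exclude the patient.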
38. Take-away
Universal framework for integrating and organising heterogeneous evidence
Emerging foundations for industrial-scale scientific inference
Explainable ML
• Flexibility for operating over multi-modal, heterogeneous data.
• Allows for associations, transparency and verifiability.
Variational Autoencoders
• Foundation for digital twins.
• Integrating and extrapolating over multicentric, heterogeneous data.
• Integrating mechanistic knowledge.
Large Language Models
• Allows for semantic interpretation of text at scale.
• Extracting and structuring complex textual evidence.