NTU-2019

Machine Learning and Network Analysis Approaches for
Predicting Clinically Relevant Outcomes
Francisco Azuaje, PhD
Head of Bioinformatics
Luxembourg Institute of Health (LIH)

304 citations (Source: Google Scholar)

Mission
To enable patient-oriented research and biological
understanding through advanced computational approaches
Bioinformatics research and support @ LIH
Members (April 2019):
F. Azuaje (PI)
P. Nazarov (Scientist)
T. Kaoma (Bioinformatician)
S.Y. Kim (Bioinformatician)
A. Muller (Bioinformatician)
K. Baum (Postdoc. Fellow, part-time)
Y. Zhang (PhD Candidate)
+ MSc research students

Our research activities
Questions: Diagnostic; Prognostic; Predictive (drug response); Other descriptive/modeling
Data: Multiple sources/technologies; Multi-omics; Clinically relevant; Cells, animals, patients
Approaches: Statistical models; Machine learning; Network-based models; Their combinations
Outcomes: Biological understanding; Candidate biomarkers, drugs and targets; Software, workflows
Collaborations: National and international; Leading and non-leading partner
Funding targets: FNR; EU

Our collaborations
Figure panels: External; Focus on Luxembourg

Biomedical research: larger and diverse datasets
• High inter-individual variability
• High intra-individual variability
• Datasets change in time and space
(Topol, 2014, Cell)
(Eisenstein, 2015, Nature)

Key challenges in the field
• Heterogeneity: data, events, states, within and between individuals…
• Data not always “big”: relative lack of labelled data, curse of dimensionality
• Data: multi-layered, hierarchical
• For the same data type/layer: multiple measurement platforms

Shared, key challenges in the field (2)
• Interpretability, understandability: global and local, novelty and consistency with prior knowledge
• Reproducibility: crucial requirement
• “Gold standards”/“ground truth”: lack, limitations
• Complexity of pattern recurrence, regularities

Addressing key challenges through the combination of ML and
biological network models
Why networks?
• Networks are intuitive and biologically meaningful representations of
biological data
• Networks can be used to encode and visualize data and, more
importantly, to extract features and make predictions about the data
• Network-based models can address different predictive modelling
challenges, including multi-modal/multi-layered data analysis
and model interpretability

A biological network can be represented as a graph that is
biologically meaningful
From: McGillivray et al., 2018, Annu. Rev.
Biomed. Data Sci.

Using biological networks and machine learning for multi-omics
patient stratification
Hypothesis: information encoded in graphs is biologically relevant.
Protein-protein network
Jeong et al., Nature (2001)
Patient similarity network
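A patient similarity network can be made concrete with a small sketch. The slides do not say how similarity was computed, so the Euclidean distance, the k-nearest-neighbour linking rule, the function name `patient_similarity_network` and the toy values are all assumptions for illustration only.

```python
import math

def patient_similarity_network(profiles, k=2):
    """Link each patient to its k most similar patients (assumed rule).

    profiles: dict mapping patient id -> list of omics measurements.
    Returns an undirected graph as dict: patient id -> set of neighbours.
    """
    ids = sorted(profiles)
    graph = {p: set() for p in ids}
    for p in ids:
        # Rank the other patients by Euclidean distance to p.
        others = sorted(
            (q for q in ids if q != p),
            key=lambda q: math.dist(profiles[p], profiles[q]),
        )
        for q in others[:k]:
            # Store the edge in both directions (undirected network).
            graph[p].add(q)
            graph[q].add(p)
    return graph

# Toy data: two groups of patients with similar profiles (made-up values).
profiles = {
    "P1": [1.0, 1.1], "P2": [1.2, 0.9], "P3": [0.9, 1.0],
    "P4": [5.0, 5.2], "P5": [5.1, 4.9], "P6": [4.8, 5.0],
}
network = patient_similarity_network(profiles, k=2)
```

In this toy case the two patient groups end up as separate connected components, which is the kind of structure that topological features can then describe.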

Using biological networks and machine learning for multi-omics
patient stratification (cont.)
Global strategy (figure panel: examples of centrality features)
• 4 categories of topological features: centrality (12 measures), modularity features (from 7 to 153 features), diffusion features (1000), Node2Vec-derived features (256).
• Each category generates a model
• Integrated models (weighted voting) also investigated
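The two ideas in this list, topological features and weighted voting, can be sketched as follows. The slides mention 12 centrality measures without listing them here, so degree and closeness centrality are stand-ins chosen for illustration. The voting weights and probabilities are made up, and how the actual study set its weights is not stated.

```python
from collections import deque

def degree_centrality(graph):
    """Fraction of the other nodes that each node is connected to."""
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}

def closeness_centrality(graph):
    """Reachable other nodes divided by the summed distance to them."""
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores

def weighted_vote(probabilities, weights):
    """Combine per-model class probabilities by a weighted average.

    probabilities: one [p_class0, p_class1] list per model.
    weights: one non-negative weight per model (assumed to be given).
    Returns the index of the winning class.
    """
    total = sum(weights)
    n_classes = len(probabilities[0])
    combined = [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=lambda c: combined[c])

# Toy star-shaped network: node A is connected to every other node.
graph = {"A": {"B", "C", "D"}, "B": {"A"}, "C": {"A"}, "D": {"A"}}
deg = degree_centrality(graph)
clo = closeness_centrality(graph)
features = {v: [deg[v], clo[v]] for v in graph}  # one feature vector per node

# Three hypothetical category-specific models voting on one patient.
model_outputs = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
predicted_class = weighted_vote(model_outputs, weights=[0.5, 0.3, 0.2])
```

Each feature category would produce its own `model_outputs` entry. Changing the weights changes which models dominate the integrated prediction.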

Application example (1): neuroblastoma multi-omic datasets
from the CAMDA challenge
Dataset 1 (498 patients, 2 omic datasets)
Dataset 2 (142 patients, 3 omic datasets)
Focus on Dataset 1
6,300 classification models
• Models based on graph topology features outperform models based on the “classical” approach
• Among topological features, centrality metrics are the most predictive (followed by diffusion-based features)

Application example (2): Neuroblastoma multi-omics datasets
from the CAMDA challenge, a deep learning approach*
Global strategy
• Network features from each dataset: centrality (12) and modularity (30 to 47) features.
• Models based on each feature category, and their combination
• Data: 498 patients (2 omic datasets, gene expression data)
• Training (50% of total data), validation and test datasets
• DNNs: multiple architectures, Rectified Linear Units (ReLU), Softmax function (2 outputs)
Prediction performance on test dataset (top models)
Algorithm  Parameters                            Balanced accuracy
Death from disease, Fischer-M
DNN        h=[8,8,8,2], o=Adam, lr=1e-3, d=0.3   87.3% *
SVM        t=RBF, c=64, g=0.25                   75.4%
RF         n=100                                 75.1% *
Disease progression, Fischer
DNN        h=[4,2,2,2], o=Adam, lr=1e-3, d=0.3   84.7% *
SVM        t=RBF, c=16, g=0.0625                 81.8%
RF         n=100                                 78.1% *
Top DNN: Input features are graph centrality measures
Fischer-M: 1 dataset only (microarrays)
Fischer: Combination of 2 datasets (microarrays and RNA-Seq)
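The performance figures above are balanced accuracies. The slides do not define the metric, so the sketch below assumes the usual definition, the mean of per-class recall, which is useful when one outcome is much rarer than the other. The labels are made up.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for label in sorted(set(y_true)):
        # Indices of the samples that truly belong to this class.
        idx = [i for i, t in enumerate(y_true) if t == label]
        hits = sum(1 for i in idx if y_pred[i] == label)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)

# Hypothetical imbalanced outcome: 8 patients in class 0, 2 in class 1.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
majority_only = [0] * 10  # a model that always predicts class 0

plain_accuracy = sum(t == p for t, p in zip(y_true, majority_only)) / len(y_true)
balanced = balanced_accuracy(y_true, majority_only)
```

Here the majority-only model reaches a plain accuracy of 0.8 but a balanced accuracy of only 0.5, because it never detects the rare class.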
* Article submitted in
cooperation with:

Global strategy
• Additional independent dataset (Versteeg, 88 patients, microarray dataset)
• Network centrality features
• 3,000 DNNs per classification task
• DNNs: Rectified Linear Units (ReLU), Softmax function (2 outputs)
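As a rough illustration of the DNN set-up named in the list (ReLU hidden units, softmax over 2 outputs), here is a forward pass only. It reads the notation h=[8,8,8,2] as layer widths ending in the 2-unit softmax layer, which is an assumption. The 12 inputs stand for the 12 centrality features, and the weights are random and untrained. Training details such as Adam, the learning rate and dropout are deliberately left out.

```python
import math
import random

def relu(values):
    return [max(0.0, v) for v in values]

def softmax(values):
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def init_layers(sizes, rng):
    """Random weights and zero biases for consecutive layer sizes."""
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers

def forward(layers, features):
    """ReLU on the hidden layers, softmax on the final layer."""
    activation = features
    for index, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activation)) + b
             for row, b in zip(weights, biases)]
        is_last = index == len(layers) - 1
        activation = softmax(z) if is_last else relu(z)
    return activation

rng = random.Random(0)
# 12 centrality inputs, then assumed layer widths 8, 8, 8 and 2 outputs.
layers = init_layers([12, 8, 8, 8, 2], rng)
patient = [rng.random() for _ in range(12)]  # made-up feature vector
probabilities = forward(layers, patient)
```

The two values in `probabilities` sum to 1 and can be read as class probabilities for the binary outcome, once the network has been trained.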
Further evaluation using independent datasets
Death from disease, centralities
Train      Test       DNN    SVM    RF
Fischer-M  Fischer-M  87.3%  75.4%  75.1%
Fischer-M  Fischer-R  82.1%  53.5%  66.8%
Fischer-M  Versteeg   75.0%  53.3%  67.5%
Fischer-R  Fischer-R  85.8%  66.0%  62.4%
Fischer-R  Fischer-M  81.5%  75.4%  61.2%
Fischer-R  Versteeg   70.8%  68.3%  67.5%
Deep neural nets using graph centrality-based input features offer the best prediction performance
* Article submitted in
cooperation with:

Example 2: Linking gene network centrality to anti-cancer drug response
• The biological relevance of central genes/proteins has previously been determined in several model organisms and phenotypes.
• Their predictive capability in gene co-expression networks, in the specific context of cancer-related drug response, has yet to be investigated in depth.
Hubs in a pan-cancer cell line co-expression
network are biologically meaningful and
predictive of drug responses
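To illustrate what a hub in a co-expression network is, the sketch below links two genes when their expression profiles are strongly correlated and ranks genes by their number of links. The Pearson correlation, the 0.9 cut-off, the gene names and the expression values are all assumptions. The slides do not state how the pan-cancer network or its hubs were defined.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long, non-constant lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def coexpression_hubs(expression, threshold=0.9, top=1):
    """Return the `top` genes with the most co-expression partners.

    expression: dict gene -> expression values across cell lines.
    Two genes are linked when |Pearson r| >= threshold (assumed cut-off).
    """
    genes = sorted(expression)
    degree = {g: 0 for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if abs(pearson(expression[g], expression[h])) >= threshold:
                degree[g] += 1
                degree[h] += 1
    # Sort by degree (highest first); ties are broken alphabetically.
    ranked = sorted(genes, key=lambda g: (-degree[g], g))
    return ranked[:top], degree

# Made-up expression of 4 hypothetical genes across 4 cell lines.
expression = {
    "GENE_A": [6.0, 4.0, 6.0, 4.0],
    "GENE_B": [6.0, 4.0, 5.5, 4.5],
    "GENE_C": [5.5, 4.5, 6.0, 4.0],
    "GENE_D": [6.0, 6.0, 4.0, 4.0],
}
hubs, degree = coexpression_hubs(expression, threshold=0.9, top=1)
```

In this toy network `GENE_A` is the hub: it is linked to both `GENE_B` and `GENE_C`, which are not linked to each other at this cut-off.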

Linking gene centrality to anti-cancer drug response (cont.)
• A (linear) model based on the expression of 47 hubs shows accurate drug sensitivity prediction capability (CCLE and GDSC datasets)
• Independent of expression platform technology (microarrays, RNA-Seq, qPCR)
• Comparable performance to published models
• Relatively accurate predictions in other independent cell lines and drugs
Figure: Predicted vs. actual drug sensitivity in the CCLE dataset
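A linear model of drug sensitivity from hub expression could look like the sketch below. The fitting procedure is not given in the slides, so ordinary least squares via the normal equations is an assumption. Two hub genes stand in for the 47, and all numbers are made up.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x

def fit_linear(rows, targets):
    """Least-squares fit of targets ~ intercept + weights . row."""
    design = [[1.0] + list(row) for row in rows]  # prepend intercept column
    p = len(design[0])
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(p)]
    return solve(xtx, xty)

def predict(weights, row):
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))

# Made-up training data: expression of 2 hub genes in 5 cell lines, with a
# sensitivity value generated as 1.0 + 2.0 * hub1 - 0.5 * hub2.
hub_expression = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
sensitivity = [2.0, 4.5, 5.0, 7.5, 8.5]

weights = fit_linear(hub_expression, sensitivity)
predicted = predict(weights, [6, 2])  # prediction for an unseen cell line
```

With 47 hubs and a limited number of cell lines, a regularised fit would likely be needed in practice. Plotting `predict` outputs against measured values gives the kind of predicted vs. actual comparison described on the slide.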

Example 3: Biological pathway-focused prediction of drug sensitivity
Expression of autophagy-related genes accurately predicts anti-cancer drug response.
Prediction accuracy in GDSC dataset
Tests on a leukemia patient dataset
Patients treated with Cytarabine (Data from Farge et al., Cancer Disc 2017)
Article in preparation in cooperation with:

Takeaways:
• Many ML challenges in biomedical research are shared with other application domains, but this field also poses its own unique challenges.
• Supervised learning, including deep learning, will meet many of these needs. However, unbiased exploration, hypothesis generation and interpretation (including “mechanistic” interpretation) remain crucial.
• Using graphs/networks to represent data, extract predictive features and integrate datasets, together with ML, will continue to enable new discoveries and applications closer to the patient.

Thanks to:
Bioinformatics team
Our research partners in Luxembourg and abroad
Funding from:
