6. Objectives of the work
▶ prediction: expression −→ phenotype
▶ can the inclusion of network information improve prediction?
▶ which type of network (PPI, co-expression, ...)?
▶ can the learning embed network inference? [not in this talk]
Graph Neural Network for Phenotype Prediction
Jan. 17th, 2022 / Céline Brouard and Nathalie Vialaneix
p. 3
8. Earlier work in this field
Problem: predict Y (numerical) from X (multivariate, dimension p) with a linear
model:

$$\underbrace{y}_{\text{vector, length } n} = \underbrace{X}_{\text{matrix, dimension } n \times p} \times \underbrace{\beta}_{\text{vector to be estimated, length } p} + \varepsilon$$
Examples:
▶ [Rapaport et al., 2007]: Y is “Radiated/Not radiated sample” and X is gene
expression. A network is given on the p genes based on KEGG metabolic pathways
▶ [Li and Li, 2008]: Y is time to death (Glioblastoma) and X is gene expression. A
network is given on the p genes based on KEGG metabolic pathways
9. Sketch of main directions
1. use a kernel based on the Laplacian and its associated dot product to compute a
distance
2. use a standard linear model but regularize/penalize it with the Laplacian norm
11. Background and notations
What we have: a network (graph), G, with p nodes, v1, . . . , vp and edges between
these nodes
Example: nodes are genes; edges are known regulatory information or co-expression
An important matrix: the Laplacian

$$L^G_{ij} = \begin{cases} -1 & \text{if } i \neq j \text{ and } v_i \text{ and } v_j \text{ are linked by an edge} \\ 0 & \text{if } i \neq j \text{ and } v_i \text{ and } v_j \text{ are not linked by an edge} \\ d_i & \text{if } i = j \end{cases}$$

with d_i the degree (i.e., the number of edges) of node v_i.
Minor note (however important for those who like linear algebra): rows and columns of this matrix
sum to 0. Equivalently, 0 is an eigenvalue (the smallest one) with eigenvector 1_p.
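These properties can be checked numerically. A minimal numpy illustration on a toy 4-node path graph (illustrative, not a dataset from the talk):

```python
import numpy as np

# Toy graph: a path v1 - v2 - v3 - v4 (illustrative, not a dataset from the talk)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))   # degrees d_i on the diagonal
L = D - A                    # Laplacian: -1 for edges, 0 otherwise, d_i on the diagonal

# rows and columns sum to 0
print(L.sum(axis=0), L.sum(axis=1))

# equivalently, 0 is the smallest eigenvalue, with eigenvector 1_p
eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues in increasing order
print(abs(eigvals[0]) < 1e-10, np.allclose(L @ np.ones(4), 0))
```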
13. Eigendecomposition of the Laplacian
L is symmetric and positive semi-definite (not positive definite) so it can be
decomposed into:

$$L = \sum_{i=1}^{p} \lambda_i e_i e_i^\top$$

with λ_i the eigenvalues (in increasing order) and e_i the orthonormal eigenvectors in
R^p.
If you want to extract the most relevant information from the network, use the
smallest eigenvalues with:
▶ low pass filter (similar to signal processing): $F_G = \sum_{i=1}^{r} \lambda_i e_i e_i^\top$ for $r \ll p$
▶ regularization: $F_G = \sum_{i=1}^{p} \phi(\lambda_i) e_i e_i^\top$ with, for instance, $\phi(\lambda_i) = e^{-\beta \lambda_i}$ or $1/\lambda_i$ (which is, most of the time, a kernel)
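Both filters can be formed directly from the eigendecomposition. A numpy sketch on a toy path graph (the graph, r, and β are illustrative choices):

```python
import numpy as np

# Toy 4-node path graph (illustrative); L = D - A
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
lam, E = np.linalg.eigh(L)   # eigenvalues in increasing order, orthonormal eigenvectors

# Low-pass filter: keep only the r smallest eigenvalues (r << p)
r = 2
F_low = sum(lam[i] * np.outer(E[:, i], E[:, i]) for i in range(r))

# Regularization with phi(lambda) = exp(-beta * lambda): the diffusion (heat) kernel
beta = 1.0
F_heat = sum(np.exp(-beta * lam[i]) * np.outer(E[:, i], E[:, i]) for i in range(len(lam)))

# F_heat is symmetric with positive eigenvalues exp(-beta * lambda_i): a valid kernel
print(np.linalg.eigvalsh(F_heat).min() > 0)
```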
14. Take home messages
▶ eigenvectors of the Laplacian associated with the smallest eigenvalues are
strongly related to the graph structure
▶ many kernels have been derived from the Laplacian [Smola and Kondor, 2003]; they
perform regularization on graphs and can be used to measure similarities between
nodes of the graph
16. How to use L in prediction models?
$$\arg\min_{\beta \in \mathbb{R}^p} \sum_{i=1}^{n} \left(\beta^\top x_i - y_i\right)^2 + C\,\beta^\top L \beta + C' \underbrace{\|\beta\|_1}_{\text{to enforce sparsity}}$$
⇒ implemented in R package glmgraph (not maintained, archived on CRAN)
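The ℓ1 term needs a dedicated solver (which is what glmgraph provides); dropping it, the remaining Laplacian-penalized least squares has a closed form. A toy numpy sketch (the graph, sizes, true β, and C are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting (the graph, sizes, the true beta and C are illustrative)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

n, p = 50, 4
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, 1.0, 0.9, 1.1])   # smooth over the graph
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# Without the l1 term, minimising ||X beta - y||^2 + C beta' L beta gives the
# closed form beta_hat = (X'X + C L)^{-1} X'y (a generalized ridge)
C = 1.0
beta_hat = np.linalg.solve(X.T @ X + C * L, X.T @ y)
print(beta_hat)
```

The Laplacian penalty shrinks β toward vectors that are smooth over the graph, rather than toward zero as a plain ridge does.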
18. Overview of GNN
(figure): the last layer is fed to a standard MLP for prediction (if performed at the
graph level)
21. Message passing layers
▶ are the generalization of convolutional layers to graph data
▶ general concept introduced in [Gilmer et al., 2017]
More formally, given node features x_i (for node v_i), the representation h_i ∈ R^K of
node v_i is learned iteratively (layers t = 1, . . . , T) with:

$$h_i^{t+1} = F\Big(h_i^t,\ \bigoplus_{j \in N(v_i)} \phi_t(h_i^t, h_j^t)\Big)$$

with ⊕ a differentiable permutation-invariant function (mean, sum, ...)
Here: ChebNets [Defferrard et al., 2016] (based on Laplacian low band filtering)
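A minimal message-passing layer in numpy, as a simplified sketch (not ChebNet): the message here depends only on h_j, the aggregation is the mean, and F is a ReLU of a linear map; the toy graph and all weights are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy path graph v1 - v2 - v3 - v4; neighbour lists N(v_i) (0-indexed)
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

p, K = 4, 3                            # p nodes, K-dimensional representations
H = rng.standard_normal((p, K))        # h_i^t: node representations at layer t
W_self = rng.standard_normal((K, K))   # weights of the update F (illustrative)
W_msg = rng.standard_normal((K, K))    # weights of the message phi (illustrative)

def message_passing_step(H):
    """One layer: h_i^{t+1} = ReLU(h_i^t W_self + mean_{j in N(v_i)} h_j^t W_msg)."""
    H_new = np.empty_like(H)
    for i in range(p):
        # mean over neighbours: a differentiable, permutation-invariant aggregation
        msg = np.mean([H[j] @ W_msg for j in neighbours[i]], axis=0)
        H_new[i] = np.maximum(0.0, H[i] @ W_self + msg)
    return H_new

H1 = message_passing_step(H)
print(H1.shape)   # (4, 3)
```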
22. GNN in practice
▶ Spektral [Grattarola and Alippi, 2020]
  ▶ based on TensorFlow (at least 2.3.1); easy to install on Ubuntu with pip3, but
  installation from source is required for the latest version
  ▶ GitHub repository https://github.com/danielegrattarola/spektral and
  detailed documentation https://graphneural.network/ with tutorials
  ▶ many datasets included: https://graphneural.network/datasets/
▶ PyTorch Geometric [Fey and Lenssen, 2019]
  ▶ based on PyTorch (a bit harder to install on Ubuntu due to dependencies)
  ▶ GitHub repository https://github.com/rusty1s/pytorch_geometric and
  detailed documentation https://pytorch-geometric.readthedocs.io/en/latest/
  with examples
  ▶ many datasets included:
  https://pytorch-geometric.readthedocs.io/en/latest/modules/datasets.html
27. Starting point
Two references: [Chereda et al., 2019, Chereda et al., 2021]
Data and code provided.
Our questions:
▶ are we able to reproduce that result...
▶ ... and to extend it to other datasets? [not in this talk]
▶ what part does the network play?
▶ and which type of network is the most interesting?
28. Architecture of the GCN used in Chereda et al., 2019, 2021
(figure: layer pipeline with output shapes)
input: batch_size × 10032 × 1
Chebyshev convolutional layer → batch_size × 10032 × 32
Pooling layer → batch_size × 5016 × 32
Chebyshev convolutional layer → batch_size × 5016 × 32
Pooling layer → batch_size × 2508 × 32
Reshape → batch_size × 80256 (2508 × 32)
Fully connected layer → batch_size × 512
Fully connected layer → batch_size × 128
Fully connected layer → batch_size × 2
29. Graph coarsening [Defferrard et al., 2016]
▶ Graclus algorithm: computes successive coarser versions of the graph
▶ clustering objective (normalized cut): $W_{ij}\left(\frac{1}{d_i} + \frac{1}{d_j}\right)$
▶ creation of a balanced binary tree: fake (disconnected) nodes are added to pair
with singletons
▶ vertices are then rearranged
→ pooling is analogous to pooling a regular 1D signal
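The greedy pairing step can be sketched in a few lines (a toy sketch, not the actual Graclus implementation): each unmarked node is paired with the unmarked neighbour maximising the local score W_ij (1/d_i + 1/d_j); the graph and weights are illustrative.

```python
# Toy sketch of the greedy pairing pass of graph coarsening (not the real
# Graclus code). Leftover nodes are the singletons that receive fake partners
# when building the balanced binary tree.
# W: weighted adjacency of an illustrative 5-node path graph
W = {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {1: 1.0, 3: 1.0},
     3: {2: 1.0, 4: 1.0}, 4: {3: 1.0}}
deg = {i: sum(nbrs.values()) for i, nbrs in W.items()}   # weighted degrees d_i

marked, pairs, singletons = set(), [], []
for i in sorted(W):
    if i in marked:
        continue
    candidates = [j for j in W[i] if j not in marked]
    if candidates:
        # pair i with the unmarked neighbour maximising W_ij * (1/d_i + 1/d_j)
        j = max(candidates, key=lambda j: W[i][j] * (1 / deg[i] + 1 / deg[j]))
        pairs.append((i, j))
        marked.update({i, j})
    else:
        singletons.append(i)   # would be paired with a fake (disconnected) node
        marked.add(i)

print(pairs, singletons)   # [(0, 1), (2, 3)] [4]
```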
30. Implementation of the model of [Chereda et al., 2021] using
Keras and Spektral
▶ Layers:
  ▶ convolutional layers: Spektral (ChebConv)
  ▶ pooling layers: the coarsening from [Defferrard et al., 2016] is computed in the
  preprocessing step and then a max pooling of size 2 is used
  ▶ fully connected layers: Keras (Dense) with ℓ2 regularization
▶ For creating mini-batches of data, we use the mixed data mode of Spektral (single
graph and different node attributes)
▶ The GNN model had to be adapted to take into account the different coarsened
graphs
34. Methodology
▶ tested methods: GNN, RF, SVC, perceptron (with or without regularization),
glmgraph (including a 5-fold CV to tune hyperparameters)
▶ with expression data scaled or not
▶ with different networks (for the relevant methods): PPI network, correlation
network [partially in this talk], random network [partially in this talk], complete
network [partially in this talk]
▶ methodology: 10-fold CV (same folds for all methods)
▶ quality metrics: AUC, accuracy, balanced accuracy, training time, and prediction
time
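The fold bookkeeping ("same folds for all methods") can be sketched as follows: draw the 10 splits once and reuse the same index arrays for every method (n and the seed are illustrative; the per-method fit is only indicated by a comment).

```python
import numpy as np

# Draw the 10 folds once and reuse the same index arrays for every method,
# so all methods see identical train/test splits (n and the seed are illustrative)
rng = np.random.default_rng(42)
n, n_folds = 100, 10
perm = rng.permutation(n)
folds = np.array_split(perm, n_folds)   # 10 disjoint test-index arrays

for test_idx in folds:
    train_idx = np.setdiff1d(perm, test_idx)
    # each method (GNN, RF, SVC, ...) is fit on train_idx, scored on test_idx
    assert len(test_idx) == 10 and len(train_idx) == 90
```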
35. Results: computation times
(not shown) glmgraph computation time was extremely high compared to the other
methods (due to the need for hyperparameter tuning); after it, RF and GNN are the
slowest
36. Results: accuracy
glmgraph accuracy was very poor (∼ 0.6), slightly better with unscaled input data
and the correlation network (significant? the second best is the random network...)
⇒ improved tuning is required
GNN is not better than most methods based on data with no network information
37. Results: accuracy
(figure)
38. References
(unofficial) Beamer template made with the help of Thomas Schiex and Andreea Dreau:
https://forgemia.inra.fr/nathalie.villa-vialaneix/bainrae
Chereda, H., Bleckmann, A., Kramer, F., Leha, A., and Beissbarth, T. (2019).
Utilizing molecular network information via graph convolutional neural networks to predict metastatic event in breast cancer.
Studies in Health Technology and Informatics, 267:181–186.
Chereda, H., Bleckmann, A., Menck, K., Perera-Bel, J., Stegmaier, P., Auer, F., Kramer, F., Leha, A., and Beißbarth, T. (2021).
Explaining decisions of graph convolutional neural networks: patient-specific molecular subnetworks responsible for metastasis prediction in
breast cancer.
Genome Medicine, 13:42.
Defferrard, M., Bresson, X., and Vandergheynst, P. (2016).
Convolutional neural networks on graphs with fast localized spectral filtering.
In Lee, D. D., von Luxburg, U., Garnett, R., Sugiyama, M., and Guyon, I., editors, Advances in Neural Information Processing Systems (NIPS
2016), volume 29, pages 3844–3852, Red Hook, NY, USA. Curran Associates Inc.
Fey, M. and Lenssen, J. E. (2019).
Fast graph representation learning with PyTorch Geometric.
In Proceedings of RLGM Workshop at ICLR 2019.
Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. (2017).
Neural message passing for quantum chemistry.
In Precup, D. and Teh, Y. W., editors, Proceedings of the 34th International Conference on Machine Learning (ICML 2017), volume 70,
pages 1263–1272, Sydney, Australia.
39. Grattarola, D. and Alippi, C. (2020).
Graph neural networks in TensorFlow and Keras with Spektral.
In Proceedings of the Graph Representation Learning and Beyond – ICML 2020 Workshop.
Li, C. and Li, H. (2008).
Network-constrained regularization and variable selection for analysis of genomic data.
Bioinformatics, 24(9):1175–1182.
Rapaport, F., Zinovyev, A., Dutreix, M., Barillot, E., and Vert, J.-P. (2007).
Classification of microarray data using gene networks.
BMC Bioinformatics, 8:35.
Smola, A. and Kondor, R. (2003).
Kernels and regularization on graphs.
In Warmuth, M. and Schölkopf, B., editors, Proceedings of the Conference on Learning Theory (COLT) and Kernel Workshop, Lecture Notes
in Computer Science, pages 144–158, Washington, DC, USA. Springer-Verlag Berlin Heidelberg.