NS-CUK Journal club: HELee, Review on "Graph embedding on biomedical networks: methods, applications and evaluations", Bioinformatics 2020

Hyo Eun Lee
Network Science Lab
Dept. of Biotechnology
The Catholic University of Korea
E-mail: gydnsml@gmail.com
2023.06.28
Bioinformatics 2020

1
▪ Introduction
• Motivation and tasks
• Biomedical graph
• Purpose
▪ Method
• Graph embedding methods
• Application of Graph embedding on biomedical network
▪ Result
• Dataset and experimental set-up
• Link prediction / Node classification results
• Influence of hyperparameters
▪ Discussion and Conclusion

2
Motivation
• Graph embedding is underutilized in
biomedical networks
• Graph embedding in biomedical networks
can help uncover potential discoveries
1. Introduction
Tasks
: Biomedical link prediction tasks and node classification tasks
• Biomedical Link Prediction Tasks
DDA(Drug-Disease Association)
DDI(Drug-Drug Interaction)
PPI(Protein - Protein Interaction)
• Node Classification Tasks
Medical term semantic type
Protein function prediction

3
1. Introduction
Biomedical graph
• Graph
node ∶ biomedical entities
edge ∶ relations
• Benefits of graph analysis
: DDA-based prediction of potential drug indications and clinical decision support
: Detection of lncRNA function
• Embedding method: automatically learns a low-dimensional feature representation for each node
- Preserves the structural information of the graph
- Can be used for downstream tasks

4
Purpose
1. Investigate the potential of advanced graph embedding methods on biomedical networks
2. Apply link prediction to 3 critical biomedical applications (DDA, DDI, PPI)
3. Formalize the semantic type classification of medical terms and classify them using embedding techniques
4. Suggest a suitable embedding method and hyperparameter settings for each task
Fig. 1. Pipeline for applying graph embedding methods to biomedical tasks. Low-dimensional node representations are first learned from biomedical networks by graph embedding methods and then used as features to build specific classifiers for different tasks. For
(a) matrix factorization-based methods, they use a data matrix (e.g. adjacency matrix) as the input to learn embeddings through matrix factorization. For (b) random walk-based methods, they first generate sequences of nodes through random walks and then feed the
sequences into the word2vec model (Mikolov et al., 2013) to learn node representations. For (c) neural network-based methods, their architectures and inputs vary from different models (see Section 2 for details)
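The second half of the pipeline in Fig. 1 (learned node embeddings used as classifier features) can be sketched as follows. This is only an illustration: the embedding values are made up, and the slides do not say which operator combines two node vectors into one pair feature, so the Hadamard product and concatenation shown here are two common choices assumed for the example. `edge_features` is a hypothetical helper name.

```python
import numpy as np

# Hypothetical 4-dimensional embeddings (made-up numbers, not from the paper).
embeddings = {
    "drug_A": np.array([0.1, 0.3, -0.2, 0.5]),
    "drug_B": np.array([0.4, -0.1, 0.2, 0.0]),
    "disease_X": np.array([0.2, 0.2, -0.1, 0.3]),
}


def edge_features(u, v, emb, op="hadamard"):
    """Combine two node embeddings into one feature vector for the pair (u, v)."""
    if op == "hadamard":
        # Element-wise product: keeps the embedding dimension.
        return emb[u] * emb[v]
    if op == "concat":
        # Concatenation: doubles the dimension and is order-sensitive.
        return np.concatenate([emb[u], emb[v]])
    raise ValueError(f"unknown operator: {op}")


# Feature vectors that a downstream classifier could be trained on.
x_had = edge_features("drug_A", "disease_X", embeddings)
x_cat = edge_features("drug_A", "disease_X", embeddings, op="concat")
```

For node classification no combination step is needed: the embedding of a node is used directly as its feature vector.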
1. Introduction

5
2. Method
Graph embedding methods
• 11 Embedding Methods
Type: MF (5) / Random walk (3) / Neural network (3)

6
2. Method
• First-order proximity
: Based on direct connections between two nodes
(local)
• Second-order proximity
: Based on the similarity of two nodes' neighborhoods, i.e. indirect connections through shared neighbors
(global)
• High-order proximity
: Extends this to neighbors of neighbors and beyond
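The three notions of proximity can be made concrete on a toy adjacency matrix. The sketch below is an assumption-laden illustration, not the definition used by any particular method in the paper: first-order proximity is read off the adjacency matrix, second-order proximity is approximated by the cosine similarity of two nodes' neighbor vectors, and high-order proximity is illustrated by counting k-step walks with a matrix power. The function names are hypothetical.

```python
import numpy as np

# Toy undirected graph: nodes 0 and 3 are NOT linked,
# but both are linked to nodes 1 and 2.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
], dtype=float)


def first_order(A, i, j):
    """Direct connection strength between i and j."""
    return A[i, j]


def second_order(A, i, j):
    """Cosine similarity of the neighbor vectors of i and j."""
    ni, nj = A[i], A[j]
    denom = np.linalg.norm(ni) * np.linalg.norm(nj)
    return float(ni @ nj / denom) if denom > 0 else 0.0


def k_step_walks(A, k):
    """Entry (i, j) counts the walks of length k from i to j."""
    return np.linalg.matrix_power(A, k)


direct = first_order(A, 0, 3)        # no edge between 0 and 3
shared = second_order(A, 0, 3)       # identical neighborhoods
two_step = k_step_walks(A, 2)[0, 3]  # walks 0-1-3 and 0-2-3
```

Nodes 0 and 3 have no first-order proximity, yet their neighborhoods are identical, which is the kind of signal that second-order and high-order methods try to preserve.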

7
• (a) MF-based methods
- Factorize a data matrix (e.g. the adjacency matrix) into low-dimensional node vectors
- Preserve the hidden manifold structure and topological properties
2. Method
Examples: HOPE, GraRep
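The general idea behind MF-based embedding can be sketched with a plain truncated SVD of the adjacency matrix. This is a generic illustration and not the HOPE or GraRep algorithm: those methods factorize other proximity matrices, and the function name `svd_embedding` and the toy graph are assumptions made for the example.

```python
import numpy as np


def svd_embedding(A, dim):
    """Factorize A ~ Z_src @ Z_tgt.T and keep the top `dim` components."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    scale = np.sqrt(s[:dim])
    Z_src = U[:, :dim] * scale     # one row per node (source role)
    Z_tgt = Vt[:dim].T * scale     # one row per node (target role)
    return Z_src, Z_tgt


# Toy graph: two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

Z_src, Z_tgt = svd_embedding(A, dim=2)         # 2-dimensional embeddings
full_src, full_tgt = svd_embedding(A, dim=6)   # no truncation
reconstruction = full_src @ full_tgt.T         # recovers A up to rounding
```

Lowering `dim` trades reconstruction accuracy for a more compact representation, which is the same trade-off the embedding dimension controls in the evaluated methods.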

8
• (b) Random walk-based methods
- Generate node sequences through random walks, then learn node representations from the sequences
2. Method
Examples: DeepWalk, node2vec, struc2vec
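The sequence-generation step shared by the random walk-based methods can be sketched as below. This is a minimal uniform walk in the style of DeepWalk, written under assumptions: the graph, the walk length and the number of walks are made up, and the function names are hypothetical. node2vec would replace the uniform neighbor choice with a biased one, and the resulting walks would then be passed to a word2vec model, which is not shown here.

```python
import random


def random_walk(adj, start, length, rng):
    """Walk `length` nodes from `start`, choosing neighbors uniformly."""
    walk = [start]
    while len(walk) < length:
        neighbors = adj[walk[-1]]
        if not neighbors:          # dead end: stop early
            break
        walk.append(rng.choice(neighbors))
    return walk


def generate_walks(adj, num_walks, length, seed=0):
    """Start `num_walks` walks from every node, in shuffled order."""
    rng = random.Random(seed)
    nodes = list(adj)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(random_walk(adj, node, length, rng))
    return walks


# Toy graph as an adjacency list.
adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
walks = generate_walks(adj, num_walks=2, length=5)
```

Each walk plays the role of a sentence and each node the role of a word, which is why a word-embedding model can be reused to learn the node vectors.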

9
• (c) Neural network-based methods
- Different methods use different architectures and information inputs
2. Method
Examples: LINE, SDNE, GAE

10
2. Method
Application of Graph embedding on biomedical network
• 3 biomedical link prediction tasks (DDA, DDI, PPI) and 2 node classification tasks
Type: Link prediction (3), Node classification (2)

11
• 1) Link prediction
- Predict potential (not yet observed) interactions between biomedical entities from the known interactions
2. Method
Formalize
• Traditional methods
: Use biological features (e.g. structures, gene ontology) and graph properties
→ Problems: 1. Biological features are difficult to obtain and use
2. Biological features may not fit the task well
⇒ Graph embedding methods are used to address these problems
• Supervised or semi-supervised graph inference models are used to make
predictions

12
• 2) Node classification
- Protein function prediction, Medical terms classification
2. Method
Protein function prediction
• Real experiments are expensive
, so graph-based methods were introduced
Medical terms classification
• The growing volume of clinical text can be used to build models that improve
personalized care and support clinical judgment
• Medical terms are extracted (using UMLS data) and their co-occurrence is measured,
which avoids the privacy concerns of using raw clinical text
Fig. 2. Illustration of (a) how medical term–term co-occurrence graph is
constructed and (b) node type classification in the graph. Our work
assumes that the graph is given as in Finlayson et al. (2014) and mainly
focuses on (b), i.e. testing various embedding methods on the
classification performance

13
2. Method
Summary of embedding methods

14
3. Results
Dataset (7)
Link prediction
: DDA(2), DDI(1), PPI(1)
• DDA
- Validated chemical-disease associations from CTD
- Drug-disease relationships from NDF-RT in UMLS
• DDI
- Comprehensive data from DrugBank
• PPI
- Homo sapiens PPIs from STRING
Node classification
: Term-Term Co-occurrence Graph(1), PPI(2)
• Term-Term Co-occurrence Graph
- Data from Stanford Hospitals and Clinics, refined using frequency of occurrence statistics
• PPI
- PPI data from the Mashup and node2vec studies

16
3. Results
Experimental set-up
Link prediction
• Known interactions(Positive)
: 80% Training 20% Testing
• Unknown interactions(majority)
: Negative sampling
• Evaluation: ROC curve (AUC), accuracy, F1 score
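The link prediction set-up above can be sketched in plain Python. This is an illustration under stated assumptions: the slides do not give the ratio of negative to positive samples, so one negative per positive is assumed; the toy graph is made up; and `split_edges`, `sample_negatives` and `auc` are hypothetical helper names. The AUC is computed as the fraction of positive/negative pairs that are ranked correctly.

```python
import random
from itertools import combinations


def split_edges(edges, test_ratio=0.2, seed=0):
    """Shuffle the known (positive) edges and split them into train/test."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def sample_negatives(nodes, positive_edges, k, seed=0):
    """Draw k node pairs that are not known interactions.

    Enumerating all pairs is only practical for small graphs; a large
    graph would need rejection sampling instead.
    """
    rng = random.Random(seed)
    known = {frozenset(e) for e in positive_edges}
    candidates = [p for p in combinations(sorted(nodes), 2)
                  if frozenset(p) not in known]
    return rng.sample(candidates, k)


def auc(pos_scores, neg_scores):
    """Probability that a positive scores higher than a negative (ties = 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


nodes = range(6)
positives = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
train_pos, test_pos = split_edges(positives)
negatives = sample_negatives(nodes, positives, k=len(positives))
example_auc = auc([0.9, 0.8], [0.1, 0.85])   # 3 of 4 pairs ranked correctly
```

In a full experiment the embeddings would be learned from the training edges only, so that the held-out test edges do not leak into the features.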
Node classification
• Training by embedding the entire graph
information
• Nodes with label information
: 80% Training 20% Testing
• Evaluation : F1(Percentage) Micro/Macro
• Dimension setting: 100
Use grid search to tune 1-2 critical hyperparameters
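The difference between the Micro-F1 and Macro-F1 scores used for node classification can be shown with a small worked example. The sketch assumes the single-label case (one class per node), which is a simplification, and the labels and the helper name `f1_scores` are made up for illustration.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label classification."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted class gains a false positive
            fn[t] += 1   # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: average the per-class F1, so every class counts equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts first, so every node counts equally.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro


y_true = ["drug", "drug", "drug", "disease", "disease", "gene"]
y_pred = ["drug", "drug", "disease", "disease", "disease", "drug"]
micro_f1, macro_f1 = f1_scores(y_true, y_pred)
```

Here the rare class "gene" is never predicted correctly, which pulls Macro-F1 well below Micro-F1; this is why reporting both is informative on imbalanced label sets.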

17
3. Results
Link prediction results
Note: Due to the limited space, we only show the AUC value. Other evaluation metrics can be found in Supplementary Material. The best performing method in each category is in bold.

18
3. Results
Link prediction results
Fig. 3. (a) Comparison with the state-of-the-arts for drug-disease association prediction (LRSSL) (Liang et al., 2017); (b) drug–drug interaction prediction (DeepDDI) (Ryu et al., 2018) and (c) gene (protein)
function prediction (Mashup) (Cho et al., 2016). Same as Mashup, we evaluate their performance on three-level human Biological Process (BP) gene annotations (each containing GO terms with 101–300, 31–
100 and 11–30 genes, respectively). As can be seen, in each task, general graph embedding methods achieve competitive performance against them

19
3. Results
Node classification
Note: The best performing method in each category is in bold. a The source code of GAE provided by the authors does not support a large-scale graph (nodes>40k). We omit its performance on ‘Clini COOC’ here.

20
3. Results
The influence of dimension
Fig. 4. The influence of dimensionality on the performance and training time of different embedding methods based on ‘CTD DDA’ dataset

21
3. Results
Influence of hyperparameters
• The embedding dimension affects both prediction
performance and time efficiency
- When the dimensionality exceeds 100,
performance saturates while
the time cost increases rapidly.
Fig. 4. The influence of dimensionality on the performance and training time of different
embedding methods based on ‘CTD DDA’ dataset

22
3. Results
Influence of hyperparameters

23
4. Discussion and Conclusion
Discussion
• Need for a comprehensive evaluation of graph embedding methods in biomedical networks
• Future research
: Exploring the use of graph embedding methods for various biomedical challenges
(such as gene expression analysis and disease diagnosis)
: Investigating the interpretability of graph embeddings and developing methods to
incorporate domain knowledge into the embedding process
• Emphasizes the importance of open-source tools and datasets, and the need to develop them
Conclusion
• Evaluated 11 graph embedding methods on 7 biomedical datasets
• Found that the embedding methods performed well and show potential for future prediction tasks
• Provided guidance on setting hyperparameters and discussed potential directions for future work
