NS-CUK Journal club: HELee, Review on "Graph embedding on biomedical networks: methods, applications and evaluations", Bioinformatics 2020
1. Hyo Eun Lee
Network Science Lab
Dept. of Biotechnology
The Catholic University of Korea
E-mail: gydnsml@gmail.com
2023.06.28
Bioinformatics 2020
2. 1
Introduction
• Motivation and tasks
• Biomedical graph
• Purpose
Method
• Graph embedding methods
• Application of Graph embedding on biomedical network
Result
• Dataset and experimental set-up
• Link prediction / Node classification results
• Influence of hyperparameters
Discussion and Conclusion
3. 2
Motivation
• Graph embedding is underutilized in biomedical networks
• Graph embedding in biomedical networks can help uncover potential discoveries
1. Introduction
Tasks: biomedical link prediction tasks and node classification tasks
• Biomedical Link Prediction Tasks
DDA (Drug-Disease Association)
DDI (Drug-Drug Interaction)
PPI (Protein-Protein Interaction)
• Node Classification Tasks
Medical term semantic type
Protein function prediction
4. 3
1. Introduction
Biomedical graph
• Graph
node ∶ biomedical entities
edge ∶ relations
• Benefits of graph analysis
: DDA-based prediction of potential drug indications and clinical decision support; detection of lncRNA function
• Embedding methods: automatically learn low-dimensional feature representations
- Preserve the structural information of the graph
- The learned representations can be used for downstream tasks
5. 4
Purpose
1. Investigate the potential of advanced graph embedding methods
2. Evaluate link prediction on 3 important biomedical applications (DDA, DDI, PPI)
3. Formalize semantic type classification of medical terms and classify them using embedding techniques
4. Suggest a proper embedding method and hyperparameter settings for each task
Fig. 1. Pipeline for applying graph embedding methods to biomedical tasks. Low-dimensional node representations are first learned from biomedical networks by graph embedding methods and then used as features to build specific classifiers for different tasks. For
(a) matrix factorization-based methods, they use a data matrix (e.g. adjacency matrix) as the input to learn embeddings through matrix factorization. For (b) random walk-based methods, they first generate sequences of nodes through random walks and then feed the
sequences into the word2vec model (Mikolov et al., 2013) to learn node representations. For (c) neural network-based methods, their architectures and inputs vary from different models (see Section 2 for details)
1. Introduction
6. 5
2. Method
Graph embedding methods
• 11 embedding methods
Type: MF (5) / random walk (3) / neural network (3)
7. 6
2. Method
Graph embedding methods
• First-order proximity
: Based on the direct connection (edge) between two nodes (local)
• Second-order proximity
: Based on the similarity of two nodes' neighborhoods, i.e. indirect connections through shared neighbors (global)
• High-order proximity
: Considers multi-hop relations, such as the neighbors of neighbors
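The three proximity notions can be illustrated on a toy graph. This is a minimal sketch, not taken from the reviewed paper: the graph data and the function names are hypothetical, and Jaccard overlap of neighbor sets is only one possible way to score second-order proximity.

```python
# Toy undirected graph as an adjacency dict (hypothetical example data).
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}


def first_order(g, u, v):
    """First-order proximity: 1.0 if u and v share an edge, else 0.0."""
    return 1.0 if v in g[u] else 0.0


def second_order(g, u, v):
    """Second-order proximity, scored here (an assumption) as the
    Jaccard overlap of the two nodes' neighbor sets."""
    union = g[u] | g[v]
    return len(g[u] & g[v]) / len(union) if union else 0.0


def within_k_hops(g, u, k):
    """High-order view: nodes reachable from u in at most k hops (u excluded)."""
    frontier, seen = {u}, {u}
    for _ in range(k):
        frontier = {w for n in frontier for w in g[n]} - seen
        seen |= frontier
    return seen - {u}
```

Nodes "a" and "d" are not directly connected, so their first-order proximity is zero, yet they share the neighbor "c", which gives them a non-zero second-order score.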
8. 7
Graph embedding methods
• (a) MF-based methods
- Factorize a data matrix (e.g. the adjacency matrix) into low-dimensional vectors
- Preserves hidden manifold structure and topological properties
2. Method
Examples: HOPE, GraRep
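The idea of factorizing a data matrix can be sketched with a plain truncated SVD of the adjacency matrix. This is an illustrative assumption, not the HOPE or GraRep algorithm itself, which factorize other proximity matrices. The function name `svd_embedding` and the toy matrix are hypothetical.

```python
import numpy as np


def svd_embedding(adjacency, dim):
    """Factorize A ~ U S V^T and keep the top `dim` components
    as node embeddings (rows of U scaled by sqrt of singular values)."""
    u, s, _ = np.linalg.svd(adjacency, full_matrices=False)
    return u[:, :dim] * np.sqrt(s[:dim])


# Toy 4-node undirected graph (hypothetical data).
A = np.array(
    [
        [0, 1, 1, 0],
        [1, 0, 1, 0],
        [1, 1, 0, 1],
        [0, 0, 1, 0],
    ],
    dtype=float,
)

Z = svd_embedding(A, dim=2)  # one 2-dimensional vector per node
```

Each row of `Z` is a node representation that could be passed to a downstream classifier.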
9. 8
Graph embedding methods
• (b) Random walk-based methods
- Generate node sequences through random walks, then learn node representations from the sequences
2. Method
Examples: DeepWalk, node2vec, struc2vec
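The sequence-generation step can be sketched as uniform random walks in the style of DeepWalk. This is a minimal sketch under assumptions: the function name, parameters, and toy graph are hypothetical, and the biased walks of node2vec or struc2vec are not reproduced. The resulting sequences would then be fed to a word2vec model, as described in the Fig. 1 caption.

```python
import random


def random_walks(graph, walks_per_node, walk_length, seed=0):
    """Generate uniform random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(graph)
        rng.shuffle(nodes)  # visit start nodes in random order
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = sorted(graph[walk[-1]])
                if not neighbors:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Toy undirected graph (hypothetical data).
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

walks = random_walks(toy_graph, walks_per_node=2, walk_length=5)
```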
10. 9
Graph embedding methods
• (c) Neural network-based methods
- Different methods use different architectures and information inputs
2. Method
Examples: LINE, SDNE, GAE
11. 10
2. Method
Application of Graph embedding on biomedical network
• 3 biomedical link prediction tasks (DDA, DDI, PPI) and 2 node classification tasks
Type: link prediction (3), node classification (2)
12. 11
• 1) Link prediction
- Predicting potential interactions between biomedical entities based on known interactions
2. Method
Formalize
• Traditional methods
: Use biological features (structures, gene ontology) and graph properties
→ Problems: 1. Biological features are difficult to obtain and use
2. Biological features may not fit (characterize) the entities well
⇒ Use graph embedding methods to address these problems
• Use supervised or semi-supervised graph inference models to make predictions
Application of Graph embedding on biomedical network
13. 12
• 2) Node classification
- Protein function prediction, Medical terms classification
2. Method
Protein function prediction
• Real experiments are expensive, so graph-based methods were introduced
Medical terms classification
• The growing volume of clinical text enables models that improve personalized care and aid clinical judgment
• To overcome privacy concerns, use medical terms (from UMLS data) and measures of their co-occurrence rather than the clinical text itself
Application of Graph embedding on biomedical network
Fig. 2. Illustration of (a) how medical term–term co-occurrence graph is
constructed and (b) node type classification in the graph. Our work
assumes that the graph is given as in Finlayson et al. (2014) and mainly
focuses on (b), i.e. testing various embedding methods on the
classification performance
15. 14
3. Results
Dataset (7)
Link prediction: DDA (2), DDI (1), PPI (1)
• DDA
- Validated associations of chemicals and disease pathways in CTD
- Drug-disease relationships from NDF-RT in UMLS
• DDI
- Comprehensive data from DrugBank
• PPI
- Homo sapiens PPIs from STRING
Node classification: term-term co-occurrence graph (1), PPI (2)
• Term-term co-occurrence graph
- Refined data from Stanford hospitals and clinics using frequency-of-occurrence statistics
• PPI
- Using Mashup and node2vec data
17. 16
3. Results
Experimental set-up
Link prediction
• Known interactions (positive)
: 80% training, 20% testing
• Unknown interactions (majority)
: Negative sampling
• Evaluation: ROC curve (AUC), accuracy, F1 score
Node classification
• Embeddings are trained on the entire graph
• Nodes with label information
: 80% training, 20% testing
• Evaluation: Micro-F1 / Macro-F1 (as percentages)
• Dimension setting: 100
• Use grid search to tune 1-2 critical hyperparameters
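The link prediction set-up above (an 80/20 split of known interactions plus negative sampling from unknown pairs) might look roughly like the following. This is a sketch under assumptions: the function name is hypothetical, one negative is drawn per positive (the slides do not state the ratio), and enumerating all node pairs is only practical for toy graphs.

```python
import itertools
import random


def split_with_negatives(nodes, positive_edges, test_ratio=0.2, seed=0):
    """Split known edges into train/test and sample an equal number of
    unknown node pairs as negatives (assumed 1:1 ratio)."""
    rng = random.Random(seed)
    positives = sorted({tuple(sorted(e)) for e in positive_edges})
    rng.shuffle(positives)
    n_test = int(round(len(positives) * test_ratio))
    test_pos, train_pos = positives[:n_test], positives[n_test:]

    known = set(positives)
    # All unordered pairs with no known interaction (toy-scale only).
    candidates = [
        pair
        for pair in itertools.combinations(sorted(nodes), 2)
        if pair not in known
    ]
    negatives = rng.sample(candidates, len(positives))
    test_neg, train_neg = negatives[:n_test], negatives[n_test:]
    return (train_pos, train_neg), (test_pos, test_neg)


# Toy path graph with 6 nodes and 5 known interactions (hypothetical data).
toy_nodes = range(6)
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
(train_pos, train_neg), (test_pos, test_neg) = split_with_negatives(
    toy_nodes, toy_edges
)
```

A classifier would then be trained on features built from the embeddings of each training pair and scored on the held-out pairs.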
18. 17
3. Results
Link prediction results
Note: Due to the limited space, we only show the AUC value. Other evaluation metrics can be found in Supplementary Material. The best performing method in each category is in bold.
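For reference, the AUC reported for link prediction can be computed directly from labels and predicted scores. The sketch below uses the pairwise-ranking definition (the fraction of positive/negative pairs in which the positive is scored higher, with ties counted as half). The function name and the example scores are hypothetical, not values from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count correctly ordered pairs; a tie contributes half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical test-set labels (1 = known interaction) and classifier scores.
example_labels = [1, 1, 0, 0]
example_scores = [0.9, 0.4, 0.5, 0.1]
example_auc = roc_auc(example_labels, example_scores)
```

This quadratic-time version is meant for illustration only; sort-based implementations are preferable on large test sets.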
19. 18
3. Results
Link prediction results
Fig. 3. (a) Comparison with the state-of-the-arts for drug-disease association prediction (LRSSL) (Liang et al., 2017); (b) drug–drug interaction prediction (DeepDDI) (Ryu et al., 2018) and (c) gene (protein) function prediction (Mashup) (Cho et al., 2016). Same as Mashup, we evaluate their performance on three-level human Biological Process (BP) gene annotations (each containing GO terms with 101–300, 31–100 and 11–30 genes, respectively). As can be seen, in each task, general graph embedding methods achieve competitive performance against them
20. 19
3. Results
Node classification
Note: The best performing method in each category is in bold. a The source code of GAE provided by the authors does not support a large-scale graph (nodes>40k). We omit its performance on ‘Clini COOC’ here.
21. 20
3. Results
The influence of dimension
Fig. 4. The influence of dimensionality on the performance and training time of different embedding methods based on ‘CTD DDA’ dataset
22. 21
3. Results
Influence of hyperparameters
• Embedding dimension affects prediction performance and time efficiency
- When dimensionality exceeds 100, performance saturates and time cost increases rapidly.
Fig. 4. The influence of dimensionality on the performance and training time of different
embedding methods based on ‘CTD DDA’ dataset
24. 23
4. Discussion and Conclusion
Discussion
• Need for a comprehensive evaluation of graph embedding methods in biomedical networks
• Future research
: Exploring the use of graph embedding methods for various biomedical challenges (such as gene expression analysis and disease diagnosis); investigating the interpretability of graph embeddings and developing methods to incorporate domain knowledge into the embedding process
• Emphasized the importance of open source tools and datasets, and the need to develop them
Conclusion
• Evaluated 11 graph embedding methods on 7 biomedical datasets
• Found that embedding methods performed well and show potential for future prediction work
• Provided guidance on setting hyperparameters and discussed potential directions for future work
Editor's Notes
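Motivation: Until now, graph embedding has been used on social networks or simple bioinformatics networks, but it has not been applied to biomedical networks with systematic experiments and analysis.
Therefore, applying it to biomedical networks could lead to potential discoveries.
Task
: In this paper, 11 embedding methods are applied to two broad tasks: biomedical link prediction tasks and node classification tasks. In detail, --.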