Sparse Graph Attention Networks
Tien-Bach-Thanh Do
Network Science Lab
Dept. of Artificial Intelligence
The Catholic University of Korea
E-mail: @catholic.ac.kr
2024/02/20
Yang Ye et al.
IEEE Transactions on Knowledge and Data Engineering, 2021
2
Introduction
• Graphs in Data Representation
○ Graphs model relationships between entities in data
○ Valuable for scenarios such as social networks, molecular structures, and recommendation systems
• Success of Graph Attention Networks (GATs)
○ GATs capture complex dependencies in graph-structured data
○ Demonstrate effectiveness in tasks like node classification, graph classification, and link prediction
• Scalability Challenge
○ GATs face scalability issues with large graphs
○ Computational complexity grows with the number of nodes and edges
3
Background and related work
• G = (V, E) denotes a graph with a set of nodes V = {v1, ..., vN} connected by a set of edges E
• A denotes the adjacency matrix, which encodes the graph structure of G
• An encoder function f(X, A, W), parameterized by W, maps the node features X and the graph structure A to node embeddings H
• H is fed to a classifier to predict the class label of each unlabeled node
• To learn the model parameters W, minimize an empirical risk over all labeled nodes (a sketch follows)
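A hedged sketch of that objective, assuming a standard classification loss ℓ (e.g., cross-entropy) over the labeled node set V_L with labels y_i:

$$\min_{W}\;\frac{1}{|\mathcal{V}_L|}\sum_{i\in\mathcal{V}_L}\ell\big(f(X, A, W)_i,\; y_i\big)$$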
4
Background and related work
Neighbor Aggregation Methods
• Most graph learning algorithms follow a neighbor aggregation mechanism
• Idea: learn a parameter-sharing aggregator that takes the feature vector xi of node i and its neighbors' feature vectors as inputs and outputs a new feature vector for node i
• Example: the 2-layer GCN encoder function and the corresponding GCN aggregator (standard forms are sketched below)
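The equations referenced above were figures in the original slides; the standard forms from Kipf & Welling are reconstructed here as a reference, with Â the symmetrically normalized adjacency matrix with self-loops:

$$H = f(X, A, W) = \mathrm{softmax}\big(\hat{A}\,\mathrm{ReLU}(\hat{A} X W^{(0)})\,W^{(1)}\big), \qquad \hat{A} = \tilde{D}^{-1/2}(A + I)\,\tilde{D}^{-1/2}$$

The per-node GCN aggregator correspondingly reads

$$h_i^{(l+1)} = \sigma\Big(\sum_{j\in\tilde{\mathcal{N}}(i)} \frac{1}{\sqrt{d_i d_j}}\, h_j^{(l)} W^{(l)}\Big),$$

where N̄(i) contains node i and its neighbors and d_i is the degree of node i in A + I.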
5
Problem statement
● Challenges with GATs: GATs tend to overfit, and they perform poorly on disassortative graphs, where nodes of different classes tend to connect
● Real-world graphs: real-world graphs are often large and noisy, which exacerbates these issues
6
Sparse Graph Attention Networks
Key idea
• Sparse Attention Mechanism
○ Instead of considering all neighbors, focus on a subset
○ Achieved through techniques such as neighbor sampling and attention sparsity (one way to express this is sketched below)
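As an illustration only (not the paper's exact formulation), restricting the GAT attention softmax to a retained subset S(i) ⊆ N(i) of node i's neighbors could be written as:

$$e_{ij} = \mathrm{LeakyReLU}\big(a^{\top}[W h_i \,\Vert\, W h_j]\big), \qquad \alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k\in S(i)}\exp(e_{ik})}, \quad j \in S(i)$$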
7
Sparse Graph Attention Networks
Advantages
• Scalability
○ Reduces computation time and resources for large graphs
○ Enables the application of attention mechanisms to massive datasets
• Memory efficiency
○ Optimizes memory usage by computing attention only on selected neighbors
○ Particularly crucial for graphs with millions or billions of nodes
8
Sparse Graph Attention Networks
Formulation
• Attach a binary gate zij ∈ {0, 1} to each edge (i, j) ∈ E, collecting the gates into a binary mask Z ∈ {0, 1}^M, where M is the number of edges
● To use as few edges as possible for semi-supervised node classification, train the model parameters W and the binary mask Z by minimizing an L0-norm regularized empirical risk (sketched after this list)
● Attention-based aggregation function: each node aggregates its neighbors' features, weighted jointly by the attention coefficients and the learned gates (sketched below)
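A hedged reconstruction of the L0-regularized objective, writing A ⊙ Z for the adjacency matrix with each edge scaled by its gate and λ for the sparsity weight:

$$\mathcal{R}(W, Z) = \frac{1}{|\mathcal{V}_L|}\sum_{i\in\mathcal{V}_L}\ell\big(f(X, A\odot Z, W)_i,\; y_i\big) + \lambda\,\|Z\|_0, \qquad \|Z\|_0 = \sum_{(i,j)\in E}\mathbb{1}[z_{ij}\neq 0]$$

The gated, attention-based aggregation can then be sketched as

$$h_i^{(l+1)} = \sigma\Big(\sum_{j\in\mathcal{N}(i)\cup\{i\}} z_{ij}\,\alpha_{ij}\,h_j^{(l)} W^{(l)}\Big),$$

where α_ij is the attention coefficient of edge (i, j) and σ a nonlinearity.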
9
Sparse Graph Attention Networks
Model optimization
• Stochastic Variational Optimization: the L0 term and the binary gates are not directly differentiable, so the objective is optimized through a stochastic relaxation
• Each gate zij is treated as a Bernoulli random variable
• The hard concrete gradient estimator relaxes the discrete gates to a continuous, stretched-and-clipped distribution so that gradients can flow to the gate parameters (sketched after this list)
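A hedged sketch of the hard concrete relaxation (following Louizos et al.), where log αij is the learnable parameter of the gate on edge (i, j), β is a temperature, and (γ, ζ) with γ < 0 < 1 < ζ is the stretch interval:

$$u \sim \mathcal{U}(0,1), \qquad s = \mathrm{sigmoid}\big((\log u - \log(1-u) + \log\alpha_{ij})/\beta\big)$$

$$\bar{s} = s\,(\zeta - \gamma) + \gamma, \qquad z_{ij} = \min\big(1, \max(0, \bar{s})\big)$$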
10
Sparse Graph Attention Networks
Model optimization
• Optimize log αij for each edge. In the test phase, generate a deterministic mask Ẑ, which is the expectation of Z under the hard concrete distribution q(Z | log α) (a sketch follows)
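A hedged sketch of that test-time estimator, reusing the stretch parameters (γ, ζ) from training:

$$\hat{z}_{ij} = \min\big(1, \max\big(0, \mathrm{sigmoid}(\log\alpha_{ij})\,(\zeta-\gamma) + \gamma\big)\big)$$

A minimal PyTorch-style sketch of both the training-time sampling and this test-time gate; the hyperparameter values (beta, gamma, zeta) are illustrative, not taken from the slides or the paper:

import torch

# Stretch/temperature hyperparameters of the hard concrete distribution
# (illustrative values, not taken from the slides or the paper).
BETA, GAMMA, ZETA = 2.0 / 3.0, -0.1, 1.1

def hard_concrete_sample(log_alpha: torch.Tensor) -> torch.Tensor:
    """Training phase: sample relaxed gates z_ij in [0, 1], one per edge."""
    u = torch.rand_like(log_alpha).clamp(1e-6, 1 - 1e-6)   # u ~ Uniform(0, 1)
    s = torch.sigmoid((torch.log(u) - torch.log(1 - u) + log_alpha) / BETA)
    s_bar = s * (ZETA - GAMMA) + GAMMA                      # stretch to (gamma, zeta)
    return s_bar.clamp(0.0, 1.0)                            # hard-clip to [0, 1]

def hard_concrete_mean(log_alpha: torch.Tensor) -> torch.Tensor:
    """Test phase: deterministic gate, the expectation of z under q(z | log_alpha)."""
    return (torch.sigmoid(log_alpha) * (ZETA - GAMMA) + GAMMA).clamp(0.0, 1.0)

# Usage sketch: one learnable log_alpha per edge (10 edges, hypothetical).
log_alpha = torch.zeros(10, requires_grad=True)
z_train = hard_concrete_sample(log_alpha)   # stochastic, differentiable in log_alpha
z_test = hard_concrete_mean(log_alpha)      # deterministic edge mask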
11
Benefits of SGATs
● Identifying noisy/task-irrelevant edges: SGATs can identify and remove noisy or task-irrelevant edges, so feature aggregation is performed over the most informative neighbors
● Performance on disassortative graphs: SGATs show superior performance, especially on disassortative graphs
● Edge removal: SGATs can remove about 50-80% of the edges from large assortative graphs while retaining similar classification accuracies
13
Evaluation
Graph datasets
14
Evaluation
Synthetic dataset
15
Evaluation
Assortative graphs
16
Evaluation
Disassortative graphs
17
Evaluation
Analysis of removed edges
18
Evaluation
Hyperparameter tuning
19
Evaluation
Hyperparameter tuning
20
Evaluation
Visualization of learned features
21
Challenges and future directions
● Trade-off
○ Fine-tuning the level of sparsity to find the right balance
○ Optimal sparsity may vary depending on the nature of the graph and the task
● Dynamic graphs
○ Extending techniques for graphs that evolve over time
○ Adapting to changes in the structure and relationships
● Benchmarking
○ Developing standardized benchmarks for evaluating Sparse GATs
○ Ensuring fair comparisons with other graph-based models
22
Conclusion
● Efficient graph learning:
○ Sparse graph attention networks offer an efficient solution for large-scale graph learning
○ Balancing computational complexity with model performance is a critical consideration
● First of its kind: SGATs represent the first graph learning algorithm to show that graphs contain significant redundancies and that edge-sparsified graphs can achieve similar or sometimes higher predictive performance than the original graphs
