"Sparse Graph Attention Networks", IEEE Transactions on Knowledge and Data Engineering.pptx
1. Ho-Beom Kim
Network Science Lab
Dept. of Mathematics
The Catholic University of Korea
E-mail: hobeom2001@catholic.ac.kr
2023 / 09 / 25
IEEE Transactions on Knowledge and Data Engineering
2. 2
Introduction
Problem Statements
• Graph-structured data is ubiquitous in many real-world systems, such as social networks, biological
networks, and citation networks
• To learn from graph-structured data, typically an encoder function is needed to project high-
dimensional node features into a low-dimensional embedding space such that “semantically” similar
nodes are close to each other in the low-dimensional Euclidean space
• Trained in an unsupervised way, these methods use dot products or co-occurrences on short
random walks over graphs to measure the similarity between a pair of nodes
• Existing methods differ in how they manipulate the adjacency matrix
• Spectral graph convolutional networks
• Neighbor aggregation or message-passing algorithms
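The neighbor-aggregation idea above can be sketched in a few lines. This is a minimal NumPy illustration of one message-passing layer (mean aggregation with self-loops, then a linear projection and ReLU), not the paper's implementation; the function name and toy graph are invented for the example.

```python
import numpy as np

def message_passing_layer(A, X, W):
    """One neighbor-aggregation step: each node averages the features of
    its neighbors (plus itself) and applies a shared linear projection."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    deg = A_hat.sum(axis=1, keepdims=True)    # node degrees (incl. self-loop)
    H = (A_hat / deg) @ X @ W                 # mean-aggregate, then project
    return np.maximum(H, 0.0)                 # ReLU non-linearity

# toy graph: 3 nodes on a path, edges 0-1 and 1-2
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
X = np.random.rand(3, 4)   # 4-dim input features per node
W = np.random.rand(4, 2)   # project into a 2-dim embedding space
Z = message_passing_layer(A, X, W)
print(Z.shape)  # (3, 2)
```

Stacking several such layers lets each node's embedding depend on an increasingly large neighborhood; GATs replace the uniform mean weights with learned attention coefficients.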
3. 3
Introduction
Contributions
• SGATs simplify the architecture of GATs, and this reduces the risk of overfitting
• SGATs can identify noisy/task-irrelevant edges of a graph, so that an edge-sparsified graph structure
can be discovered, which is more robust for downstream classification tasks.
• SGAT is a robust graph learning algorithm that can learn from both assortative and disassortative
graphs, while GAT fails on disassortative graphs.
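The edge sparsification behind these contributions rests on attaching a binary gate to each edge and penalizing the expected number of open gates (an L0 surrogate). Below is a minimal NumPy sketch of the hard-concrete gate relaxation commonly used for this (Louizos et al.); the constants are the standard ones, and the function names are my own, not SGAT's actual code.

```python
import numpy as np

# Hard-concrete relaxation of binary per-edge gates: each edge e has a
# learnable parameter log_alpha[e]; gates near 0 drop the edge.
BETA, GAMMA, ZETA = 2 / 3, -0.1, 1.1   # standard hard-concrete constants

def sample_gates(log_alpha, rng):
    """Sample one relaxed binary gate in [0, 1] per edge (training mode)."""
    u = rng.uniform(1e-6, 1 - 1e-6, size=log_alpha.shape)
    s = 1 / (1 + np.exp(-(np.log(u) - np.log(1 - u) + log_alpha) / BETA))
    s_bar = s * (ZETA - GAMMA) + GAMMA   # stretch to (gamma, zeta)
    return np.clip(s_bar, 0.0, 1.0)      # hard clip puts mass exactly at 0 and 1

def expected_l0(log_alpha):
    """Expected number of open gates: a differentiable L0 surrogate."""
    return (1 / (1 + np.exp(-(log_alpha - BETA * np.log(-GAMMA / ZETA))))).sum()

rng = np.random.default_rng(0)
log_alpha = np.array([4.0, -4.0, 0.0])   # one parameter per edge
gates = sample_gates(log_alpha, rng)      # multiply into attention coefficients
```

Adding `expected_l0(log_alpha)` to the training loss pressures uninformative edges toward gate 0, which is what permanently removes them from the graph.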
4. 4
Related Work
Graph Sparsification
• Spectral graph sparsification aims to remove unnecessary edges for graph compression
• DropEdge
• propose to stochastically drop edges from graphs to regularize the training of GNNs
• Randomly removes a certain number of edges from an input graph at each training iteration to mitigate
overfitting and over-smoothing
• Does not induce an edge-sparsified graph since different subsets of edges are removed at different
training iterations and the full graph is utilized for validation and test
• SGAT learns an edge-sparsified graph by removing noisy/task-irrelevant edges permanently from the
input graph.
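The contrast with DropEdge is easy to see in code: DropEdge resamples a random edge subset every iteration, so no fixed sparsified graph ever emerges. A minimal sketch, assuming a COO edge list (the function name is illustrative, and real implementations usually drop both directions of an undirected edge together):

```python
import numpy as np

def drop_edge(edge_index, drop_rate, rng):
    """DropEdge-style sketch: keep a random subset of edges for one
    training iteration; the full edge set is reused next iteration."""
    num_edges = edge_index.shape[1]
    keep = rng.random(num_edges) >= drop_rate   # independent Bernoulli keep mask
    return edge_index[:, keep]

# toy edge list in COO form (2 x num_edges)
edges = np.array([[0, 1, 1, 2, 2, 3],
                  [1, 0, 2, 1, 3, 2]])
rng = np.random.default_rng(0)
sparsified = drop_edge(edges, drop_rate=0.5, rng=rng)
```

SGAT instead learns which specific edges to remove, so the same sparsified graph is used at training, validation, and test time.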
• LAGCN
• Add/remove edges based on the predictions of a trained edge classifier
• It assumes the input graph is almost noise-free, so that an edge classifier can be trained reliably from
the existing graph topology
• This assumption does not hold for very noisy graphs
5. 5
Related Work
Graph Sparsification
• NeuralSparse
• Learns a sparsification network to sample a k-neighbor subgraph, which is then fed to a GCN,
GraphSAGE, or GAT for node classification
• It does not aim to learn an edge-sparsified graph as the sparsification network produces a different
subgraph sample each time and multiple subgraphs are used to improve accuracy
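The k-neighbor sampling step can be sketched as follows. In NeuralSparse the per-edge scores come from a learned sparsification network and the sampling is differentiable; here the scores are simply passed in and the selection is a deterministic top-k, so this is only a structural illustration with invented names.

```python
import numpy as np

def sample_k_neighbor_subgraph(A, k, scores):
    """Keep at most k neighbors per node, chosen by per-edge scores.
    A is a dense adjacency matrix; scores[i, j] rates edge (i, j)."""
    n = A.shape[0]
    A_sub = np.zeros_like(A)
    for i in range(n):
        nbrs = np.nonzero(A[i])[0]                        # neighbors of node i
        if len(nbrs) > k:
            nbrs = nbrs[np.argsort(scores[i, nbrs])[-k:]]  # top-k by score
        A_sub[i, nbrs] = 1.0
    return A_sub

# toy graph: node 0 has three neighbors, k = 2 keeps its two best-scored edges
A = np.array([[0., 1., 1., 1.],
              [1., 0., 0., 0.],
              [1., 0., 0., 1.],
              [1., 0., 1., 0.]])
scores = np.random.default_rng(1).random((4, 4))
A_sub = sample_k_neighbor_subgraph(A, k=2, scores=scores)
```

Because a fresh subgraph is drawn per forward pass (and several are averaged at inference), the method improves robustness without committing to one sparsified graph, unlike SGAT.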
• PTDNet
• Proposes to improve the robustness and generalization performance of GNNs by learning to drop
task-irrelevant edges
• Samples a subgraph for each layer and applies a denoising layer before each GNN layer
• SuperGAT
• Improves GAT with an edge self-supervision regularization.
19. 19
Conclusions
Conclusions and Future Work
• They propose sparse graph attention networks (SGATs) that integrate a sparse attention mechanism into graph
attention networks via an L0-norm regularization on the number of edges of a graph
• By assigning a single attention coefficient to each edge, SGATs further simplify the architecture of GATs,
sharing one set of attention coefficients across all heads and all layers
• Can detect and remove noisy/task-irrelevant edges from a graph in order to achieve a similar or
improved accuracy on downstream classification tasks
• As for future extensions, they plan to investigate the applications of SGATs on detecting superficial or
malicious edges injected by adversaries
• They also plan to explore the application of the sparse attention mechanism of SGATs in unsupervised graph
domain adaptation