Towards Deep Attention in Graph Neural Networks: Problems and Remedies
1. Van Thuy Hoang
Network Science Lab
Dept. of Artificial Intelligence
The Catholic University of Korea
E-mail: hoangvanthuy90@gmail.com
2023-12-18
PMLR 2023
2. 2
Graph Convolutional Networks (GCNs)
Generate node embeddings based on local network neighborhoods.
Nodes have embeddings at each layer, repeatedly combining messages
from their neighbors using neural networks (sketched below).
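A minimal PyTorch sketch of one such layer (illustrative, not the lecture's code); `a_hat` is assumed to be the symmetrically normalized adjacency matrix with self-loops:

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One GCN layer: aggregate neighbor messages, then transform.

    Dense sketch for illustration; a_hat is assumed to be the
    symmetrically normalized adjacency matrix with self-loops.
    """
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, a_hat, h):
        # Combine each node's neighborhood messages, then apply the
        # shared neural network (linear map + nonlinearity).
        return torch.relu(self.linear(a_hat @ h))
```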
4. 4
Graph Attention Networks
GATs compute node representations by employing self-attention over
the node features.
This choice was well motivated: self-attention had previously been
shown to be sufficient for state-of-the-art results on machine
translation, as demonstrated by the Transformer architecture (a
single-head sketch follows).
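A single-head dense sketch of GAT-style edge attention (illustrative only; `a_src`/`a_dst` follow the usual GAT scoring parameterization):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """Single-head GAT layer (dense sketch, not the authors' code)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a_src = nn.Parameter(torch.randn(out_dim))
        self.a_dst = nn.Parameter(torch.randn(out_dim))

    def forward(self, adj, h):
        # adj is assumed to include self-loops so every row has an edge.
        z = self.W(h)                                    # (N, d)
        # e_ij = LeakyReLU(a_src . z_i + a_dst . z_j)
        e = F.leaky_relu(z @ self.a_src[:, None]
                         + (z @ self.a_dst[:, None]).T)  # (N, N)
        e = e.masked_fill(adj == 0, float("-inf"))       # real edges only
        alpha = torch.softmax(e, dim=1)                  # edge attention
        return alpha @ z
```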
5. 5
Questions
Can the model remain expressive over deep layers?
How can we design a deep GAT?
6. 6
From Hard to Soft Attentions
Message-Passing GNNs
Edge Attention: edge-attention GNNs (e.g., GAT and its variants)
learn an edge-attention matrix A^(k) at each layer k.
Hop Attention: with hop attention, a different importance γ_i^(k) can be
assigned at each layer k for every node i; stacking these per-node
weights gives the diagonal hop-attention matrix Γ^(k) = diag(γ^(k))
(see the sketch below).
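A short sketch of how hop attention combines per-layer representations; the Γ^(k) = diag(γ^(k)) form is an assumption based on the definition above:

```python
import torch

def apply_hop_attention(layer_reprs, gamma):
    """Combine per-layer representations H^(0..K) with hop attention.

    layer_reprs: list of K+1 tensors, each of shape (N, d)
    gamma:       (N, K+1) per-node, per-layer weights γ_i^(k)
    Returns Σ_k diag(γ^(k)) H^(k).
    """
    out = torch.zeros_like(layer_reprs[0])
    for k, h_k in enumerate(layer_reprs):
        out = out + gamma[:, k:k + 1] * h_k  # diag(γ^(k)) H^(k)
    return out
```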
7. 7
Cumulative Attention
A concept of a cumulative attention matrix, denoted by T^(k),
intuitively represents the attention between all node pairs within k
hops (or equivalently, at layer k), taking both edge and hop attention
into account: the edge-attention matrices are multiplied along the
layers and weighted by the hop attentions (see the sketch below).
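A sketch under one plausible reading of the definition, T^(k) = Σ_{ℓ=0}^{k} diag(γ^(ℓ)) A^(ℓ)···A^(1); this exact formula is an assumption, not taken verbatim from the paper:

```python
import torch

def cumulative_attention(edge_atts, hop_atts):
    """Cumulative attention matrix T^(k) (illustrative sketch).

    edge_atts: list of k (N, N) edge-attention matrices A^(1..k)
    hop_atts:  (N, k+1) hop attentions γ^(0..k)
    """
    n = edge_atts[0].shape[0]
    prod = torch.eye(n)                          # empty product = identity
    t = hop_atts[:, 0:1] * prod                  # layer-0 term
    for ell, a in enumerate(edge_atts, start=1):
        prod = a @ prod                          # A^(ell) ... A^(1)
        t = t + hop_atts[:, ell:ell + 1] * prod  # add hop-weighted term
    return t
```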
8. 8
Proposed Method: AERO-GNN
Attentive dEep pROpagation-GNN (AERO-GNN)
The feature transformation and propagation of AERO-GNN
use layer-aggregated features, which are more stable than
features from any single layer (see the sketch below).
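A heavily simplified propagation skeleton in this spirit; the `edge_attention` and `hop_attention` callables are placeholders for the attention functions on the next slide, not the paper's exact interface:

```python
import torch

def aero_propagate(x, edge_attention, hop_attention, num_layers):
    """Propagation loop using layer-aggregated features (sketch only)."""
    h = x                                        # H^(0): transformed input
    gamma = hop_attention(x, layer=0)            # γ^(0), shape (N, 1)
    h_agg = gamma * x                            # layer-aggregated features
    for k in range(1, num_layers + 1):
        # Attention at layer k is computed from features aggregated over
        # layers 0..k-1, which is more stable than single-layer features.
        alpha = edge_attention(h_agg, layer=k)   # A^(k), shape (N, N)
        h = alpha @ h                            # propagate one hop
        gamma = hop_attention(h_agg, layer=k)    # γ^(k)
        h_agg = h_agg + gamma * h                # update aggregation
    return h_agg
```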
9. 9
Proposed Method: AERO-GNN
Attention Functions
Edge Attention: compute the pre-normalized edge attention at each layer.
Softplus is used to map the edge attention to positive values, with
two primary advantages over the alternative mapping functions, exp
and tanh.
Hop Attention: the hop attention γ^(k) is likewise computed at each
layer (a sketch of the softplus-mapped edge attention follows).
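A sketch of softplus-mapped, pre-normalized edge attention; the scoring function used here (a dot product with a learned vector per endpoint) is an illustrative assumption, and the slide does not name the two advantages, so the comment only gestures at them:

```python
import torch
import torch.nn.functional as F

def prenorm_edge_attention(h, att_vec, adj):
    """Softplus-mapped edge attention, then row normalization (sketch)."""
    s = h @ att_vec                              # (N,) per-node score
    # Softplus maps scores to positive values, which the slide argues
    # is preferable to the exp and tanh mappings.
    e = F.softplus(s[:, None] + s[None, :])      # positive edge scores
    e = e * adj                                  # keep real edges only
    alpha = e / e.sum(1, keepdim=True).clamp_min(1e-12)  # normalize rows
    return alpha
```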
10. 10
Experiments
Datasets:
12 node classification benchmark datasets, among which 6 are
homophilic and 6 are heterophilic
Baseline Methods:
edge-attention GNNs (e.g., GAT and GATv2)
12. 12
Discussion
Bridges the two research directions, addressing two underexplored
questions:
What are the unique challenges in deep graph attention?
How can we design provably more expressive deep graph attention?
In a broader context, these findings extend the prior literature on
the limitations of deep attention in general,
demonstrate that attention-based GNNs face related, yet distinct,
problems, and propose a novel solution.
This study may inspire future research on deep attention and graph
learning in various directions.