Mar. 30, 2023

NS-CUK Joint Journal Club: S.T.Nguyen, Review on "How Attentive are Graph Attention Networks?", ICLR 2022


- Nguyen Thanh Sang, Network Science Lab, Dept. of Artificial Intelligence, The Catholic University of Korea. E-mail: sang.ngt99@gmail.com
- 1 Outline: Introduction (Graphs, GNN, GAT) • Method (Dynamic Attention, Dynamic Graph Attention Network) • Evaluations (Datasets, Robustness to noise, Downstream tasks) • Conclusions (Strengths, Limitations of this work)
- 2 Graphs Graphs (Networks) are complex. Several applications of Graph mining: • Node classification: predict a property of a node • Ex: Categorize online users / items • Link prediction: predict whether there are missing links between two nodes • Ex: Knowledge graph completion • Graph classification: categorize different graphs • Ex: Molecule property prediction • Clustering: detect if nodes form a community • Ex: Social circle detection • Other tasks: • Graph generation: drug discovery • Graph evolution: physical simulation
- 3 Graph Neural Network • A GNN layer aggregates the representations of each node's neighbors (e.g., by sum or mean) and combines the result with the node's own representation to produce its updated embedding.
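As a concrete illustration of the aggregation step, here is a minimal pure-Python sketch (the tiny graph and feature values are invented for illustration, not taken from the slides): each node's aggregated message is the element-wise mean of its neighbors' feature vectors.

```python
# Toy graph and 2-d node features (hypothetical values for illustration).
graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
feats = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [2.0, 1.0]}

def aggregate_mean(node):
    """One round of mean-neighbor aggregation for a single node."""
    nbrs = graph[node]
    dim = len(feats[node])
    return [sum(feats[n][d] for n in nbrs) / len(nbrs) for d in range(dim)]

print(aggregate_mean("a"))  # mean of b and c: [1.0, 1.0]
```

A real GNN layer would follow this with a learned linear transformation and a nonlinearity; GAT replaces the uniform mean with learned attention weights over the neighbors.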
- 4 GAT: Graph Attention Network
- 5 Problems with GAT • In GAT, every node attends to its neighbors given its own representation as the query. • However, GAT computes only a very limited kind of attention: the ranking of the attention scores is unconditioned on the query node, i.e., static attention. • Static attention hinders GAT from even fitting the training data. => The paper proposes a dynamic graph attention variant that is strictly more expressive than GAT.
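The static-attention limitation follows directly from GAT's scoring function e(h_i, h_j) = LeakyReLU(a^T [W h_i || W h_j]): the score splits into a query term plus a key term, and since LeakyReLU is monotonic, the best-scoring key is the same for every query. A minimal pure-Python sketch (toy features and attention vector, with W taken as the identity to keep it short; all values are invented for illustration):

```python
def leaky_relu(x, slope=0.2):
    return x if x >= 0 else slope * x

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Split attention vector a = [a1 || a2]; toy values, W = identity.
a1, a2 = [1.0, -0.5], [0.3, 0.8]
nodes = {"u": [0.2, 1.0], "v": [1.5, -0.3], "w": [-0.4, 0.9]}

def gat_score(h_query, h_key):
    # e(h_i, h_j) = LeakyReLU(a1 . h_i + a2 . h_j)
    return leaky_relu(dot(a1, h_query) + dot(a2, h_key))

# The query term is a constant for each fixed query, and LeakyReLU is
# monotonic, so the argmax over keys reduces to the argmax of a2 . h_key:
# every query ends up preferring the same key.
for q in nodes:
    best = max(nodes, key=lambda k: gat_score(nodes[q], nodes[k]))
    print(q, "prefers", best)
```

Whatever the query, the key ranking is shared, which is exactly the "static attention" property the paper identifies.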
- 6 Contributions • Identified that one of the most popular GNN types, the graph attention network (GAT), does not compute dynamic attention. • Introduced formal definitions for analyzing the expressive power of graph attention mechanisms. • Proposed a simple fix, switching the order of internal operations in GAT, yielding GATv2, which does compute dynamic attention. • Conducted a thorough empirical comparison of GAT and GATv2, finding that GATv2 outperforms GAT across 12 benchmarks of node-, link-, and graph-prediction. • Found that dynamic attention provides much better robustness to noise.
- 7 Dynamic Attention • GAT computes only a restricted "static" form of attention: for any query node, the attention function is monotonic with respect to the neighbor (key) scores. • The ranking (the argsort) of the attention coefficients is: • shared across all nodes in the graph, • unconditioned on the query node. • In dynamic attention, the ranking of the keys depends on the query: every query can select a different key. • Note that dynamic and static attention are exclusive properties, but they are not complementary. • A family that computes dynamic attention is strictly more expressive than a family limited to static attention, with respect to the same K and Q.
- 8 Dynamic Graph Attention • GATv2 is a simple fix of GAT with a strictly more expressive attention mechanism: it modifies the order of internal operations in GAT. • The attention vector a is applied after the nonlinearity (LeakyReLU), and the weight matrix W is applied right after the concatenation. • A GATv2 layer computes dynamic attention for any set of node representations K = Q = {h1, ..., hn}. • GATv2 has the same time complexity as GAT's declared complexity; by merging its linear layers, GATv2 can even be computed faster than GAT.
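GATv2's reordered scoring, e(h_i, h_j) = a^T LeakyReLU(W [h_i || h_j]), mixes the query and key before the nonlinearity, so which key wins can change with the query. A minimal pure-Python sketch with hand-picked toy parameters (W, a, and the feature vectors are invented for illustration, not from the paper):

```python
def leaky_relu(x, slope=0.2):
    return x if x >= 0 else slope * x

# Toy parameters: W maps the 4-d concatenation [h_i || h_j] to 2-d.
W = [[1.0, 0.0, 0.0, 1.0],
     [0.0, 1.0, 1.0, 0.0]]
a = [1.0, 1.0]

def gatv2_score(h_i, h_j):
    # e(h_i, h_j) = a . LeakyReLU(W [h_i || h_j])
    concat = h_i + h_j  # list concatenation acts as vector concatenation
    z = [sum(w * x for w, x in zip(row, concat)) for row in W]
    return sum(ai * leaky_relu(zi) for ai, zi in zip(a, z))

h_plus, h_minus = [1.0, -1.0], [-1.0, 1.0]

# Each query prefers the *other* key: the key ranking flips with the
# query, which static (GAT-style) attention can never express.
print(gatv2_score(h_plus, h_minus) > gatv2_score(h_plus, h_plus))    # True
print(gatv2_score(h_minus, h_plus) > gatv2_score(h_minus, h_minus))  # True
```

Because the nonlinearity sits between W and a, the score no longer decomposes into independent query and key terms, which is the essence of the GATv2 fix.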
- 9 Evaluations • The OGB datasets are used for node- and link-prediction. • The QM9 dataset is used for graph prediction. • The VarMisuse dataset is used for evaluating an inductive node-pointing problem. Datasets
- 10 Evaluations • Accuracy on two node-prediction datasets is reported as a function of the noise ratio p. • As p increases, all models show a natural decline in test accuracy on both datasets. • Because it computes dynamic attention, GATv2 shows a milder degradation in accuracy than GAT, which shows a steeper decline. • GAT cannot distinguish between given data edges and noise edges, because it scores the source and target nodes separately. => Solved by dynamic attention. Robustness to noise
- 11 Evaluations • VarMisuse is an inductive node-pointing problem that depends on 11 types of syntactic and semantic interactions between elements in computer programs. • GATv2 is more accurate than GAT and other GNNs on the SeenProj test sets. • Furthermore, GATv2 achieves an even larger improvement on the UnseenProj test set. => This demonstrates the power of GATv2 in modeling complex relational problems, especially since it outperforms extensively tuned models without any further tuning. Programs
- 12 Evaluations • GATv2 is more accurate than GAT and the non-attentive GNNs. • A single head of GATv2 outperforms GAT with 8 heads. • Increasing the number of heads results in a major improvement for GAT. Node Prediction
- 13 Evaluations • GATv2 achieves a lower (better) average error than GAT, by 11.5% relatively. • GAT achieves the overall highest average error. • In some properties, the non-attentive GNNs, GCN and GIN, perform best. Graph Prediction
- 14 Evaluations • GATv2 achieves a higher MRR than GAT, which achieves the lowest MRR. • The non-attentive GraphSAGE performs better than all attentive GNNs. => Attention might not be needed in these datasets; another possibility is that dynamic attention is especially useful in graphs with high node degrees, where it helps select the most relevant neighbors among many. Link Prediction
- 15 Conclusions • GATv2 is more accurate than GAT. • Further, GATv2 is significantly more robust to noise than GAT. • On the synthetic DictionaryLookup benchmark, GAT fails to express the data and thus achieves poor accuracy even on the training set. • By modifying the order of operations in GAT, GATv2 computes a universal approximator attention function and is thus strictly more powerful than GAT. • This paper shows that many modern graph benchmarks and datasets contain complex interactions and thus require dynamic attention. • GATv2 might overfit the training data if the task is "too simple" and does not require such expressiveness.