Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks, arXiv e-prints 2019
1. Hyo Eun Lee
Network Science Lab
Dept. of Biotechnology
The Catholic University of Korea
E-mail: gydnsml@gmail.com
2023.09.11
arXiv e-prints 2019
2. 1
Theoretical definition
• Notation and Background
• Weisfeiler-Leman Algorithm
• Graph Neural Networks
k-dimensional Graph Neural Networks
Experimental Study
• Datasets
• Baselines
• Experimental Protocol
• Model Configuration
• Results and Discussion
Conclusion
3. 2
1. Theoretical definition
Notation and Background
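• Graph $G = (V, E)$ with $E \subseteq \{\{u, v\} \subseteq V \mid u \neq v\}$
• Neighbors $N(v) = \{u \in V(G) \mid (v, u) \in E(G)\}$
• Two graphs $G$ and $H$ are isomorphic
if there exists an edge-preserving bijection $\varphi : V(G) \to V(H)$, i.e., $(u, v) \in E(G)$ if and only if $(\varphi(u), \varphi(v)) \in E(H)$
• Node coloring $l : V(G) \to \Sigma$
• Coloring $c$ refines coloring $d$, written $c \sqsubseteq d$, if $c(v) = c(w)$ implies $d(v) = d(w)$; the two are equivalent, $c \equiv d$, if $c \sqsubseteq d$ and $d \sqsubseteq c$
• Color class $Q \subseteq V(G)$: a maximal set of nodes that share the same color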
4. 3
1. Theoretical definition
Weisfeiler-Leman Algorithm
• Define a node coloring $c_l^{(t)} : V(G) \to \Sigma$
$c_l^{(t)}(v) = \mathrm{HASH}\big(\big(c_l^{(t-1)}(v), \{\!\!\{ c_l^{(t-1)}(u) \mid u \in N(v) \}\!\!\}\big)\big)$
• HASH bijectively maps each such pair to a unique color that has not been used in previous iterations
• If the two graphs have a different number of nodes of some color, they are not isomorphic
• Terminate the algorithm when the number of colors does not change between iterations
• Otherwise stop at a maximum number of iterations (at most $\max\{|V(G)|, |V(H)|\}$ are ever needed)
• The method cannot distinguish all non-isomorphic graphs,
but it is a powerful heuristic that succeeds on a broad class of graphs
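The refinement loop described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `wl_refine` and `wl_distinguishes`, the adjacency-dict input format, and the use of sorted tuples to stand in for multisets are all assumptions made for the example. HASH is emulated by giving every distinct signature a fresh integer.

```python
from collections import Counter


def wl_refine(adj, labels, max_iter=None):
    """Run 1-WL color refinement and return the final node coloring.

    adj:    dict mapping each node to an iterable of its neighbors
    labels: dict mapping each node to its initial color
    """
    colors = dict(labels)
    max_iter = len(adj) if max_iter is None else max_iter
    for _ in range(max_iter):
        # Signature = (own color, multiset of neighbor colors as a sorted tuple).
        sigs = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                for v in adj}
        # Emulated HASH: one fresh integer per distinct signature (injective).
        palette = {sig: i for i, sig in enumerate(sorted(set(sigs.values())))}
        new_colors = {v: palette[sigs[v]] for v in adj}
        # Stop once the number of colors no longer changes.
        if len(set(new_colors.values())) == len(set(colors.values())):
            return new_colors
        colors = new_colors
    return colors


def wl_distinguishes(adj_g, adj_h):
    """Return True if 1-WL proves the two unlabeled graphs non-isomorphic."""
    # Refine on the disjoint union so both graphs share one color palette.
    union = {("g", v): [("g", u) for u in nbrs] for v, nbrs in adj_g.items()}
    union.update({("h", v): [("h", u) for u in nbrs] for v, nbrs in adj_h.items()})
    colors = wl_refine(union, {v: 0 for v in union})
    hist_g = Counter(c for (side, _), c in colors.items() if side == "g")
    hist_h = Counter(c for (side, _), c in colors.items() if side == "h")
    return hist_g != hist_h
```

A classic failure case is two disjoint triangles versus a 6-cycle: every node has degree 2, so the coloring never splits and `wl_distinguishes` returns `False` although the graphs are not isomorphic.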
5. 4
1. Theoretical definition
The k-dimensional Weisfeiler-Leman algorithm
• The k-dimensional algorithm generalizes 1-WL by coloring k-tuples of nodes instead of single nodes.
• That is, $c_{k,l}^{(t)} : V(G)^k \to \Sigma$
$N_j(s) = \{(s_1, \dots, s_{j-1}, r, s_{j+1}, \dots, s_k) \mid r \in V(G)\}$
$C_j^{(t)}(s) = \mathrm{HASH}\big(\{\!\!\{ c_{k,l}^{(t-1)}(s') \mid s' \in N_j(s) \}\!\!\}\big)$
$c_{k,l}^{(t)}(s) = \mathrm{HASH}\big(\big(c_{k,l}^{(t-1)}(s), (C_1^{(t)}(s), \dots, C_k^{(t)}(s))\big)\big)$
• Each of the $k$ neighborhoods of a tuple is hashed to a color, so tuples whose neighborhoods differ in size or in colors receive different colors
• Distinguishing power grows with $k$, but for every $k$ there are still non-isomorphic graphs that cannot be told apart.
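To make the tuple neighborhoods concrete, here is a small sketch of $N_j(s)$ and of a single k-WL refinement step. The helper names `tuple_neighborhood` and `kwl_step`, and the representation of a coloring as a dict keyed by tuples, are assumptions for illustration only. As in 1-WL, HASH is emulated by numbering the distinct signatures.

```python
from itertools import product


def tuple_neighborhood(s, j, nodes):
    """N_j(s): replace the j-th component (1-indexed) of s by every node r."""
    return [s[: j - 1] + (r,) + s[j:] for r in nodes]


def kwl_step(colors, nodes, k):
    """One k-WL refinement step on a coloring of all k-tuples.

    colors: dict mapping every k-tuple over `nodes` to its current color
    """
    sigs = {}
    for s in product(nodes, repeat=k):
        # C_j(s): multiset of colors in the j-th neighborhood, as a sorted tuple.
        C = tuple(
            tuple(sorted(colors[t] for t in tuple_neighborhood(s, j, nodes)))
            for j in range(1, k + 1)
        )
        sigs[s] = (colors[s], C)
    # Emulated HASH: one fresh integer per distinct signature.
    palette = {sig: i for i, sig in enumerate(sorted(set(sigs.values())))}
    return {s: palette[sigs[s]] for s in sigs}
```

Because the number of k-tuples is $|V(G)|^k$, even this toy version shows why the method becomes expensive quickly as $k$ grows.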
6. 5
1. Theoretical definition
Graph Neural Networks
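• Define the initial features $f^{(0)} : V(G) \to \mathbb{R}^{1 \times d}$, consistent with the node labels
• Each layer aggregates the features of a node's neighbors
and passes the aggregated information to the next layer using
$f^{(t)}(v) = \sigma\Big(f^{(t-1)}(v) \cdot W_1^{(t)} + \sum_{w \in N(v)} f^{(t-1)}(w) \cdot W_2^{(t)}\Big)$
• More generally, the sums can be replaced by other functions (e.g., an LSTM-style update), giving
$f^{(t)}(v) = f_{\mathrm{merge}}^{W_1}\Big(f^{(t-1)}(v), f_{\mathrm{aggr}}^{W_2}\big(\{\!\!\{ f^{(t-1)}(w) \mid w \in N(v) \}\!\!\}\big)\Big)$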
7. 6
1. Theoretical definition
Graph Neural Networks
• $f_{\mathrm{aggr}}^{W_2}$ aggregates the multiset of neighbor features,
and $f_{\mathrm{merge}}^{W_1}$ merges the node's representation from the previous step with the aggregated neighbor features.
• A feature vector for the entire graph can be computed by summing over all nodes
$f_{\mathrm{GNN}}(G) = \sum_{v \in V(G)} f^{(T)}(v)$
• More sophisticated methods can use differentiable pooling, soft assignment, etc.
The parameters $W_1$ and $W_2$ are optimized end-to-end together with the downstream classifier or regressor.
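The basic 1-GNN layer and the sum readout can be written compactly in matrix form. The sketch below assumes NumPy, a dense adjacency matrix `A`, a feature matrix `F` with one row per node, and ReLU as the nonlinearity $\sigma$; the names `gnn_layer` and `gnn_readout` are illustrative, not from the paper.

```python
import numpy as np


def gnn_layer(F, A, W1, W2):
    """One 1-GNN step: sigma(F W1 + A F W2), with ReLU assumed as sigma.

    F:  (n, d) node features from the previous layer
    A:  (n, n) adjacency matrix, so (A @ F)[v] sums the features of N(v)
    W1: (d, d') weight applied to the node's own features
    W2: (d, d') weight applied to the summed neighbor features
    """
    return np.maximum(0.0, F @ W1 + A @ F @ W2)


def gnn_readout(F):
    """Graph-level feature: sum of the final node features."""
    return F.sum(axis=0)


# Usage on the path graph 0 - 1 - 2 with scalar features and unit weights.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
F0 = np.ones((3, 1))
W = np.array([[1.0]])
F1 = gnn_layer(F0, A, W, W)   # each node ends up with 1 + its degree
graph_vec = gnn_readout(F1)
```

Because the readout is a sum over nodes, the graph-level vector does not depend on how the nodes are numbered.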
8. 7
1. Theoretical definition
Relationship Between 1-WL and 1-GNNs
• Based on the update rule defined above,
we show that 1-GNNs are at most as powerful as the 1-WL algorithm in distinguishing non-isomorphic (sub-)graphs.
• Theorem 1.
: Let $(G, l)$ be a labeled graph. For all $t \geq 0$, all initial colorings $f^{(0)}$ consistent with $l$, and all weights $W^{(t)}$,
$c_l^{(t)} \sqsubseteq f^{(t)}$
• Theorem 2.
: Let $(G, l)$ be a labeled graph. For all $t \geq 0$ there exist a sequence of weights $W^{(t)}$ and a 1-GNN architecture such that
$c_l^{(t)} \equiv f^{(t)}$
• Hence 1-GNNs can match, but not exceed, the ability of 1-WL to distinguish non-isomorphic graphs
• However, GNNs have the advantage that they adapt to the given data distribution
and can handle continuous features.
9. 8
2. k-dimensional Graph Neural Networks
Model description
• For a given $k$, consider the set $[V(G)]^k$ of all k-element subsets of $V(G)$
$N(s) = \{t \in [V(G)]^k \mid |s \cap t| = k - 1\}$
• The neighborhood is divided into a local and a global neighborhood,
defined as follows
• The local neighborhood $N_L(s)$ contains all $t \in N(s)$ such that $(v, w) \in E(G)$ for the unique $v \in s \setminus t$ and the unique $w \in t \setminus s$
• The global neighborhood $N_G(s)$ is then defined as $N(s) \setminus N_L(s)$
• Analogously to the set-based k-WL coloring $c_{s,k,l}^{(t)}$, the k-GNN computes a feature vector for each k-set as follows
$f_k^{(t)}(s) = \sigma\Big(f_k^{(t-1)}(s) \cdot W_1^{(t)} + \sum_{u \in N_L(s) \cup N_G(s)} f_k^{(t-1)}(u) \cdot W_2^{(t)}\Big)$
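The local and global neighborhoods of a k-set can be enumerated directly from the definition: every neighbor $t$ arises by swapping exactly one node $v \in s$ for one node $w \notin s$. The sketch below assumes edges are stored as a set of `frozenset` pairs; the names `k_sets` and `neighborhoods` are hypothetical helpers for this example.

```python
from itertools import combinations


def k_sets(nodes, k):
    """All k-element subsets of the node set, as frozensets."""
    return [frozenset(c) for c in combinations(nodes, k)]


def neighborhoods(s, nodes, edges):
    """Return (local, global) neighborhoods of the k-set s.

    edges: set of frozenset({v, w}) pairs (undirected graph)
    """
    local, global_ = [], []
    for v in s:
        for w in nodes:
            if w in s:
                continue
            t = (s - {v}) | {w}          # swap v for w, so |s ∩ t| = k - 1
            if frozenset((v, w)) in edges:
                local.append(t)          # swapped nodes are adjacent
            else:
                global_.append(t)        # swapped nodes are not adjacent
    return local, global_


# Usage on the path graph 0 - 1 - 2 - 3 with k = 2.
nodes = [0, 1, 2, 3]
edges = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3)]}
local, global_ = neighborhoods(frozenset({0, 1}), nodes, edges)
```

Each k-set has $k(n-k)$ neighbors in total, and on sparse graphs most of them are global, which is the motivation for dropping the global part on large datasets.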
10. 9
2. k-dimensional Graph Neural Networks
Model description
• The importance of local and global neighbors can be learned by using separate parameters for each
• To scale k-GNN to large datasets and avoid overfitting, we propose a local k-GNN that omits global
neighbors
$f_{k,L}^{(t)}(s) = \sigma\Big(f_{k,L}^{(t-1)}(s) \cdot W_1^{(t)} + \sum_{u \in N_L(s)} f_{k,L}^{(t-1)}(u) \cdot W_2^{(t)}\Big)$
• Proposition 3
: Let $(G, l)$ be a labeled graph and $k \geq 2$. For all $t \geq 0$, all initial colorings $f_k^{(0)}$ consistent with $l$, and all weights $W^{(t)}$,
$c_{s,k,l}^{(t)} \sqsubseteq f_k^{(t)}$
• Proposition 4
: Let $(G, l)$ be a labeled graph and $k \geq 2$. For all $t \geq 0$ there exist a sequence of weights $W^{(t)}$ and a k-GNN architecture such that
$c_{s,k,l}^{(t)} \equiv f_k^{(t)}$
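A local k-GNN layer has the same shape as a 1-GNN layer, except that it runs on k-sets connected by the local neighborhood relation. The following sketch assumes NumPy, ReLU as $\sigma$, and edges stored as `frozenset` pairs; `local_kset_adjacency` and `local_kgnn_layer` are illustrative names, not an API from the paper.

```python
from itertools import combinations

import numpy as np


def local_kset_adjacency(nodes, edges, k):
    """Build the k-sets and the 0/1 matrix of the local neighborhood relation.

    A[i, j] = 1 iff sets[j] is obtained from sets[i] by swapping one node v
    for a node w with {v, w} in `edges`.
    """
    sets = [frozenset(c) for c in combinations(nodes, k)]
    index = {s: i for i, s in enumerate(sets)}
    A = np.zeros((len(sets), len(sets)))
    for s in sets:
        for v in s:
            for w in nodes:
                if w not in s and frozenset((v, w)) in edges:
                    A[index[s], index[(s - {v}) | {w}]] = 1.0
    return sets, A


def local_kgnn_layer(F, A_local, W1, W2):
    """One local k-GNN step: sigma(F W1 + A_local F W2), ReLU assumed."""
    return np.maximum(0.0, F @ W1 + A_local @ F @ W2)
```

For an undirected graph the resulting matrix is symmetric, so the k-sets with their local neighborhoods can be treated as an ordinary graph and fed to any message-passing implementation.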
11. 10
2. k-dimensional Graph Neural Networks
Hierarchical Variant
• Propose a hierarchical k-GNN that initializes the feature of each k-set from its isomorphism type together with the features
learned by a (k-1)-GNN, so that features are learned recursively from dimension 1 up to k
$f_k^{(0)}(s) = \sigma\Big(\Big[f^{\mathrm{iso}}(s), \sum_{u \subset s} f_{k-1}^{(T_{k-1})}(u)\Big] \cdot W_{k-1}\Big)$
• $T_{k-1} > 0$, $W_{k-1}$ is an appropriately sized matrix, and square brackets indicate matrix concatenation
• This model also satisfies Propositions 3 and 4, so it has the same representational power as a standard k-GNN,
but its hierarchy makes it better suited to capturing the hierarchical structure of real-world graphs.
12. 11
3. Experimental Study
Goals
• Investigate the benefits of the proposed GNN architectures over kernel methods
• Performance difference between k-GNNs and state-of-the-art kernel methods
• Performance difference between k-GNNs and 1-GNNs
• Impact of optimizing the GNN parameters
Datasets
• Used well-known benchmark datasets to compare the k-GNN architecture to kernel methods
• Used the QM9 dataset to verify that the approach scales to large datasets
• QM9 consists of 133,385 small molecules, and regression was performed on 12 target properties
13. 12
3. Experimental Study
Baselines
• Kernel Baselines
• Graphlet kernel / Weisfeiler-Lehman subtree kernel (WL)
/ Weisfeiler-Lehman optimal assignment kernel (WL-OA) / Global-local k-WL
• Compute the normalized Gram matrix for each kernel
• Use 10-fold cross-validation to calculate classification accuracy
• GNN Baselines
• Basic 1-GNN layer / PatchySan
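Normalizing a Gram matrix is a one-line computation. The sketch below assumes the common cosine normalization $\hat{K}_{ij} = K_{ij} / \sqrt{K_{ii} K_{jj}}$; the slides do not state which normalization was used, so treat the formula and the name `normalize_gram` as assumptions.

```python
import math


def normalize_gram(K):
    """Cosine-normalize a kernel (Gram) matrix given as a list of lists.

    After normalization every diagonal entry is 1, so graphs of different
    sizes contribute on a comparable scale.
    """
    n = len(K)
    return [
        [K[i][j] / math.sqrt(K[i][i] * K[j][j]) for j in range(n)]
        for i in range(n)
    ]


# Usage on a toy 2 x 2 kernel matrix.
K_hat = normalize_gram([[4.0, 2.0], [2.0, 9.0]])
```

The normalized matrix can then be passed to a kernel classifier that accepts precomputed kernels.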
14. 13
3. Experimental Study
Experimental Protocol
• Always used 3 layers for 1-GNN and 2 layers for 2-GNN and 3-GNN, and used the computed features from
1-GNN as the initial features for 2-GNN (1-2-GNN) and 3-GNN (1-3-GNN)
• Computed features from 1-2-GNN and 1-3-GNN are concatenated component-wise
to get 1-2-3-GNN
• For the final classification and regression steps,
we used a three-layer MLP trained with binary cross-entropy and mean squared error loss, respectively.
• Used a dropout layer after the first layer of the MLP for classification
• Applied global mean pooling to generate a vector representation of the graph.
• Trained for 100 epochs for the classification networks and 200 epochs for the regression networks
• Used the Adam optimizer
15. 14
3. Experimental Study
Model Configuration
• The benchmark datasets are evaluated with 10-fold cross-validation for comparison with the kernel methods,
and 10% of each training fold is randomly sampled and used as the validation set.
• QM9: randomly sampled 10% of the dataset for validation, another 10% for testing, and used the remaining data for training.
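Both splitting schemes can be sketched with the standard library alone. The function names, the fixed seed, and the exact rounding of the 10% fractions are assumptions for illustration; the slides do not specify them.

```python
import random


def ten_fold_splits(n, seed=0, val_fraction=0.1):
    """Yield (train, val, test) index lists for 10-fold cross-validation.

    In every fold, val_fraction of the training part is held out at random
    as a validation set.
    """
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::10] for i in range(10)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        rng.shuffle(train)
        n_val = int(len(train) * val_fraction)
        yield train[n_val:], train[:n_val], test


def holdout_split(n, seed=0):
    """Return (train, val, test) with 10% validation and 10% test."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_val = n_test = n // 10
    return idx[n_val + n_test:], idx[:n_val], idx[n_val:n_val + n_test]
```

The first function matches the protocol for the benchmark datasets, the second the single random split used for QM9.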
16. 15
3. Experimental Study
Results and Discussion
• Hierarchical k-GNN achieves results on par with the kernel methods despite the small dataset sizes (Question 1)
• 1-2-3-GNN outperforms 1-GNN on all seven datasets (Question 2)
• Better results on the PROTEINS benchmark dataset even without optimization (Question 3)
17. 16
3. Experimental Study
Results and Discussion
• However, the additional structural information extracted by the k-GNN layers is not helpful for all tasks
• Note that the k-GNN models have more parameters than the 1-GNN model.
• Stacking additional layers on the 1-GNN to match the number of parameters
did not give better results in any of the experiments
18. 17
4. Conclusions
Conclusions
• Presented theoretical work on GNNs, showing that a broad class of GNN architectures cannot be more
powerful than 1-WL
• GNNs have, in general, the same ability as 1-WL to distinguish non-isomorphic graphs,
while having the advantage of adapting to a given data distribution
• Proposed k-GNN, a generalization of GNNs based on k-WL
• Strictly more powerful than 1-GNNs at distinguishing non-isomorphic (sub-)graphs, and able to distinguish more
graph properties
• Proposed a hierarchical k-GNN variant that can take advantage of hierarchical structure in graphs
• A suitable, task-specific k-GNN design is still needed for each task