- Graph representation learning usually relies on
- Supervised learning
- Feature engineering
- Generic representations of graphs
- Algorithmic approach
- Measuring graph similarity is hard due to
- Graph isomorphism
- DDGK learns without supervision or domain knowledge
Traditional Graph Kernels:
- Graph Edit Distance (Gao, et al., 2010) and Maximum Common Subgraph (Bunke, et al., 2002)
- Weisfeiler-Lehman Graph Kernels (Kriege, et al., 2016)
Node Embedding Methods:
- DeepWalk (Perozzi, et al., 2014)
- Graph Attention (Abu-El-Haija, et al., 2018)
Graph Statistics (Feature engineering):
- NetSimile (Berlingerio, et al., 2012)
- DeltaCon (Koutra, et al., 2013)
Supervised Graph Similarity:
- CNN for graphs (Niepert, et al., 2016)
- Graph Convolutional Networks (T. Kipf and M. Welling, 2016)
Given two graphs 𝑆 (Source graph) and 𝑇 (Target graph)
Provides a bidirectional mapping across the pair’s nodes
Input: A one-hot encoded vertex from 𝑇
Output: The vertex’s neighbors
The first attention network (𝑀𝑇→𝑆 )
Assigns every node in 𝑇 a probability
distribution over the nodes of 𝑆
Consists of one Linear layer
Modeled as a multiclass classifier
𝑃𝑟(𝑣𝑗 | 𝑢𝑖) = exp(𝑀𝑇→𝑆(𝑢𝑖)𝑗) / Σ𝑣∈𝑆 exp(𝑀𝑇→𝑆(𝑢𝑖)𝑣)
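As a concrete sketch of this classifier (with made-up sizes and random weights), the attention network is a single linear layer whose row is selected by the one-hot target vertex, followed by a softmax over the source nodes:

```python
import numpy as np

# Hypothetical sizes: |V_T| target nodes, |V_S| source nodes.
n_target, n_source = 4, 5
rng = np.random.default_rng(0)

# One linear layer: a weight matrix of shape (|V_T|, |V_S|).
W = rng.normal(size=(n_target, n_source))

def attention(u_i: int) -> np.ndarray:
    """Pr(v_j | u_i): softmax over source nodes for one-hot target node u_i."""
    logits = W[u_i]                     # the one-hot input selects row u_i
    e = np.exp(logits - logits.max())   # numerically stable softmax
    return e / e.sum()

p = attention(0)
assert np.isclose(p.sum(), 1.0) and (p > 0).all()
```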
Node attribute regularizer
Attribute distribution over nodes
Vertices and edges could have their own attributes
Cross-Graph attention could provide several
equally good mappings
Solution: adding regularizing losses to
preserve node and edge attributes
Replacing 𝑄𝑛 with 𝑄𝑒, we obtain the Edge Attribute regularizer
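A minimal sketch of the node attribute regularizer's intuition, with made-up one-hot attribute matrices and a toy attention map: the attention-weighted source attributes should match each target node's own attributes, which a cross-entropy penalty enforces.

```python
import numpy as np

# Hypothetical one-hot attribute matrices: rows = nodes, cols = attribute values.
A_S = np.array([[1, 0], [0, 1], [0, 1]])   # 3 source nodes, 2 attribute values
A_T = np.array([[1, 0], [0, 1]])           # 2 target nodes
attn = np.array([[0.8, 0.1, 0.1],          # Pr(v_j | u_i); each row sums to 1
                 [0.1, 0.5, 0.4]])

# Expected attribute distribution for each target node under the attention.
pred = attn @ A_S

# Cross-entropy against the target's true attributes (averaged over nodes).
node_attr_loss = -np.sum(A_T * np.log(pred + 1e-9)) / len(A_T)
```

A mapping that sends nodes to like-attributed nodes drives this loss toward zero, breaking ties between otherwise equally good alignments.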
Save the similarity score in the matrix 𝚿
for every pair of source and target graphs
The vector of scores against all source graphs could be used as a graph’s representation vector
- Since 𝚿 is not a perfect function, 𝐷(𝑆||𝑆) ≠ 0 could occur
𝐷(𝑆||𝑇) ≔ 𝐷(𝑆||𝑇) − 𝐷(𝑆||𝑆)
ensures 𝐷(𝑆||𝑆) = 0
- If symmetry is required, we can define
𝐷(𝑆||𝑇) ≔ 𝐷(𝑆||𝑇) + 𝐷(𝑇||𝑆)
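Both corrections are one-line matrix operations on 𝚿. A sketch with a made-up 3×3 divergence matrix:

```python
import numpy as np

# Hypothetical raw divergence matrix: Psi[i, j] = D(G_i || G_j).
Psi = np.array([[0.3, 1.2, 2.0],
                [0.9, 0.4, 1.5],
                [1.8, 1.1, 0.2]])

# Zero out self-divergence: D(S||T) := D(S||T) - D(S||S)
# (subtract each row's diagonal entry from the whole row).
D = Psi - np.diag(Psi)[:, None]

# Symmetrize if needed: D(S||T) := D(S||T) + D(T||S).
D_sym = D + D.T

assert np.allclose(np.diag(D), 0)
assert np.allclose(D_sym, D_sym.T)
```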
DDGK requires 𝑂(𝑇𝑁²𝑉) computations, where
𝑇 = max(𝜌, 𝜏)
𝑁 = The number of graphs
𝑉 = The average number of nodes
Linear layers in Cross-Graph Attention could be replaced
by a DNN with fixed-size hidden layers to reduce the
network size from 𝑂(𝑉𝑆 × 𝑉𝑇) to 𝑂(𝑉𝑆 + 𝑉𝑇)
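The savings are easy to check by counting parameters; the sizes and the fixed hidden width H below are made-up assumptions:

```python
# Hypothetical sizes for one source/target pair.
V_S, V_T, H = 1000, 800, 32      # H = assumed fixed hidden width

direct = V_T * V_S               # one |V_T| x |V_S| linear map
factored = V_T * H + H * V_S     # two fixed-width layers: O(V_S + V_T)

assert factored < direct         # 57,600 vs 800,000 parameters here
```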
For a large number of source graphs, we could sample 20%
of them and DDGK still achieves high accuracy
Extensions & Future Works
- Edge-to-Nodes Encoder.
- Neighborhood Encoder.
- Subgraph alignment.
- Better regularization to avoid overfitting.
- Combination of the two could be useful for graph classification.
- Perozzi’s newer work, “Just SLaQ When You Approximate: Accurate Spectral Distances for Web-Scale Graphs” (WWW ’20), could handle graphs with billions of nodes within an hour.
- Neural Networks can learn powerful representations of graphs without feature engineering.
- Proposed DDGK:
- Graph Encoder
- Isomorphism-preserving attention
- Provides interpretability into the alignment of pairs of graphs
- Divergence score to measure (dis)similarity between source and target graphs
- Representations produced by DDGK are competitive with challenging baselines.
Generic representations of graphs -> Generic node alignment -> Extract useful information
Algorithmic approach from theoretical computer science
NP-hard nature of classical measures such as Graph Edit Distance and Maximum Common Subgraph
Graph isomorphism is a hard problem (no polynomial algorithm)
DeepWalk learns embeddings of a graph's vertices by modeling a stream of short random walks
Overfit the model on the source graph to accurately capture the graph’s structure
The same idea applies to the target graph
The idea is given a vertex in the target graph, find the most similar vertex from the source graph
Activation layer is Softmax
The source graph encoder outputs the neighbors of the chosen vertex
From that, the reverse attention predicts its corresponding position in the target graph
Activation layer: Sigmoid
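A toy sketch of the overfitting step, using a hand-rolled one-layer encoder trained by plain gradient descent on a made-up path graph: each one-hot vertex learns to reproduce its row of the adjacency matrix through a sigmoid (multi-label prediction of neighbors).

```python
import numpy as np

# Hypothetical toy source graph: adjacency of a 4-node path 0-1-2-3.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
W = rng.normal(scale=0.1, size=(4, 4))   # one linear layer; one-hot inputs select rows
lr = 1.0
for _ in range(2000):                    # overfit deliberately, as the method intends
    pred = sigmoid(W)                    # predicted neighbor probabilities per vertex
    W -= lr * (pred - adj) / 4           # averaged binary cross-entropy gradient

# After overfitting, thresholding the encoder recovers the neighbor sets exactly.
assert ((sigmoid(W) > 0.5) == adj.astype(bool)).all()
```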
Overall structure of the model
There could be many node mappings from the target graph to the source graph.
But not all of them preserve the attributes on the graph’s nodes and edges.
This is to demonstrate the power of attribute regularization.
These are two identical graphs, so the attention map should produce an identity matrix
This is one application of DDGK, Hierarchical Clustering.
30 different graphs
Graphs are sampled from different data sets, such as a neural network structure, a social network, a network of common nouns and adjectives in a novel, and chemistry-related graphs.
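To illustrate how 𝚿 feeds hierarchical clustering, here is a naive single-linkage agglomeration over a made-up symmetric divergence matrix (a real run would use a library routine such as scipy's linkage):

```python
import numpy as np

# Hypothetical symmetric divergence matrix for 4 graphs,
# with two clear groups: {0, 1} and {2, 3}.
D = np.array([[0.0, 0.2, 1.5, 1.4],
              [0.2, 0.0, 1.6, 1.3],
              [1.5, 1.6, 0.0, 0.3],
              [1.4, 1.3, 0.3, 0.0]])

clusters = [{i} for i in range(len(D))]
merges = []
while len(clusters) > 1:
    # Single linkage: merge the two clusters with the closest pair of members.
    a, b = min(((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
               key=lambda ij: min(D[u, v] for u in clusters[ij[0]] for v in clusters[ij[1]]))
    merges.append((clusters[a], clusters[b]))
    clusters[a] = clusters[a] | clusters[b]
    del clusters[b]

# The block structure of D is recovered: 0,1 merge first, then 2,3.
assert merges[0] == ({0}, {1}) and merges[1] == ({2}, {3})
```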
Experiment with different amounts of sampling in the source graph set.
You can notice that the accuracy converges quickly from just 20% of the original size.
I also did my own experiment with this method:
I implemented the model on Google Colab and measured the time taken to process graphs of different sizes.
SLaQ uses spectral analysis on graphs, which relies on linear-algebra properties of a graph. I have looked at this paper, but it’s quite hard to understand.