"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
1. Scalable Global Alignment Graph Kernel Using
Random Features: From Node Embedding to
Graph Embedding
KDD2019
Lingfei Wu, Ian En-Hsu Yen, Zhen Zhang †, Kun Xu, Liang Zhao, Xi
Peng, Yinglong Xia, Charu Aggarwal
Presenter: Hagawa, Nishi, Eugene
2019.11.11
1 / 35
2. Problem Setup
Goal:
▶ Create a good kernel to measure graph similarity
▶ Low computational complexity
▶ Takes both global and local graph properties into account
▶ Positive definite
▶ Leads to a good classifier
Application:
▶ Kernel SVM (input: graph,
output: binary)
▶ Kernel PCA
▶ Kernel Ridge Regression
▶ . . .
How similar are two graphs? Figure: 𝑘(G1, G2) = 0.5 for two example graphs
3. Difficulty : Graph isomorphism
It is difficult to define similarity between graphs
▶ 2 graphs: G1(V1, E1, ℓ1, L1), G2(V2, E2, ℓ2, L2)
▶ G1 is isomorphic to G2 if and only if a bijection¹ f exists
▶ Bijection f : V1 → V2 s.t. {va, vb} ∈ E1 ⟺ {f(va), f(vb)} ∈ E2
▶ Subgraph isomorphism is NP-complete
¹ bijection: a one-to-one and onto mapping
4. Related Work
2 groups of recent graph kernel methods
Comparing sub-structure:
▶ The major difference is how to define and explore sub-structures
- random walks, shortest paths, cycles, subtree patterns, graphlets...
Geometric node embeddings:
▶ Capture global property
▶ Achieved state-of-the-art performance in the graph classification task
Drawbacks of related work
Comparing sub-structure:
▶ Does not take the global properties into account
Geometric node embeddings:
▶ Does not necessarily yield a positive definite kernel
▶ Poor scalability
5. Contribution
▶ Propose a positive definite kernel
▶ Reduce computational complexity
▶ From quadratic to (quasi-)linear²
▶ Propose an approximation of the kernel with a convergence analysis
▶ Take the global property into account
▶ Outperforms 12 state-of-the-art graph classification algorithms
- Including graph kernels and deep graph neural networks
² quasi-linear: O(n log n), in both time and space.
6. Common kernel
Compare 2 graphs directly using a kernel
Similarity
𝒌(・, ・)
Figure: calculation of kernel value between 2 graphs
7. Proposed kernel
Instead of comparing 2 graphs directly, compare each graph to a set of Random Graphs and measure similarity with 𝒌(・, ・)
Figure: calculation of the kernel value between 2 graphs via random graphs
8. Notation : Graph definition
Graph: G = (V, E, ℓ)
Nodes: V = {v_i}_{i=1}^n
Edges: E ⊆ (V × V)
Label assignment function: ℓ : V → Σ
# of nodes: n
# of edges: m
Node label: l
# of graphs: N
Figure: example graph G with nodes v1, v2, v3; V = {v1, v2, v3}, Σ = {●, ●}
9. Notation
Set of graphs: G = {G_i}_{i=1}^N
Set of graph labels: Y = {Y_i}_{i=1}^N
Set of geometric embeddings (one per graph): U = {u_i}_{i=1}^n ∈ R^{n×d}
Latent node embedding space (one per node): u ∈ R^d
Figure: N graphs G_1, …, G_N with labels Y_1, …, Y_N; each graph has an n × d matrix of latent node embeddings (u_1, u_2, u_3 ∈ R^d)
10. Geometric Embeddings
Use partial eigendecomposition³ to extract node embeddings:
1. Create the normalized Laplacian matrix L ∈ R^{n×n}
2. Perform partial eigendecomposition, obtaining U
3. Use the smallest d eigenvectors
Figure: example of obtaining U.
Adjacency matrix (edges A–B, A–C):
  A B C
A 0 1 1
B 1 0 0
C 1 0 0
Degree matrix:
  A B C
A 2 0 0
B 0 1 0
C 0 0 1
Laplacian matrix (= Degree − Adjacency):
  A  B  C
A  2 -1 -1
B -1  1  0
C -1  0  1
Normalize, then take the partial eigendecomposition L_{n×n} → UΛU^⊤ (n×d, d×d, d×n), keeping the smallest d eigenvectors.
³ Time complexity: linear in the # of graph edges (...I don't know how.)
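The three steps above can be sketched in Python. This is an illustrative sketch, not the authors' code: it uses NumPy's dense eigensolver for clarity, whereas the claimed near-linear cost needs a sparse partial solver.

```python
# Illustrative sketch (not the authors' code): build the normalized
# Laplacian of a small graph and keep its d smallest eigenvectors
# as the node embeddings U.
import numpy as np

def node_embeddings(adj, d):
    """Return the n x d matrix of the d smallest eigenvectors of the
    normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    lap = np.eye(len(adj)) - d_inv_sqrt @ adj @ d_inv_sqrt
    # dense eigendecomposition for clarity; a sparse partial solver
    # (e.g. scipy.sparse.linalg.eigsh) would scale far better
    vals, vecs = np.linalg.eigh(lap)       # eigenvalues in ascending order
    return vecs[:, :d]

# the example graph from the slide: edges A-B and A-C
adj = np.array([[0., 1., 1.],
                [1., 0., 0.],
                [1., 0., 0.]])
U = node_embeddings(adj, d=2)
print(U.shape)  # (3, 2)
```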
11. Transportation Distance [1]
Earth Mover's Distance (EMD): a measure of dissimilarity

EMD(Gx, Gy) := min_{T ∈ R_+^{nx×ny}} ⟨D, T⟩  s.t.  T1 = t(Gx),  T^⊤1 = t(Gy)

▶ Linear programming problem
▶ Flow matrix T
- T_ij: how much of v_i in Gx travels to v_j in Gy
▶ Gx → Ux = {u_1^x, u_2^x, …, u_{nx}^x}
▶ Gy → Uy = {u_1^y, u_2^y, …, u_{ny}^y}
▶ Transport cost matrix D
- D_ij = ∥u_i^x − u_j^y∥_2
12. Transportation Distance [1]
Earth Mover's Distance (EMD): a measure of dissimilarity

EMD(Gx, Gy) := min_{T ∈ R_+^{nx×ny}} ⟨D, T⟩  s.t.  T1 = t(Gx),  T^⊤1 = t(Gy)

▶ Node v_i has c_i outgoing edges
▶ Normalized bag-of-words (nBOW) weight: t_i = c_i / Σ_{j=1}^n c_j ∈ R
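This transportation LP can be sketched with SciPy's `linprog`. A hedged sketch only: the `emd` function and toy inputs are my own, and real implementations use specialized optimal-transport solvers.

```python
# Sketch (assumed, not the authors' implementation): solve
#   min <D, T>  s.t.  T 1 = t(Gx),  T^T 1 = t(Gy),  T >= 0
# as a linear program over the flattened flow matrix T.
import numpy as np
from scipy.optimize import linprog

def emd(Ux, tx, Uy, ty):
    nx, ny = len(Ux), len(Uy)
    # transport cost D_ij = ||u_i^x - u_j^y||_2
    D = np.linalg.norm(Ux[:, None, :] - Uy[None, :, :], axis=2)
    A_eq = np.zeros((nx + ny, nx * ny))
    for i in range(nx):                 # row sums equal t(Gx)
        A_eq[i, i * ny:(i + 1) * ny] = 1.0
    for j in range(ny):                 # column sums equal t(Gy)
        A_eq[nx + j, j::ny] = 1.0
    b_eq = np.concatenate([tx, ty])
    res = linprog(D.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun

# identical toy embedding sets with uniform nBOW weights -> distance ~0
Ux = np.array([[0.0], [1.0]]); tx = np.array([0.5, 0.5])
print(emd(Ux, tx, Ux, tx))
```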
14. Straightforward way to define a kernel: high cost

EMD-based kernel: K = −(1/2) J D_emd J, where J = I − (1/N) 11^⊤

▶ Not necessarily positive definite
▶ Time complexity: O(N²n³ log(n)), space complexity: O(N²)

Figure: straightforward kernel based on EMD. The N × N distance matrix D_emd holds all pairwise values, e.g. for graphs A, B, C the first row is [EMD(A,A), EMD(A,B), EMD(A,C)].
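The centering step can be sketched numerically. An illustrative sketch only: the distance matrix below is made up, and the result is symmetric but, as the slide notes, not guaranteed positive semidefinite.

```python
# Sketch: EMD-based kernel K = -1/2 J D_emd J with centering
# J = I - (1/N) 1 1^T; K is symmetric but not guaranteed PSD.
import numpy as np

def emd_kernel(D_emd):
    N = D_emd.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N   # centering matrix
    return -0.5 * J @ D_emd @ J

# toy pairwise EMD matrix for three graphs A, B, C (made up)
D_emd = np.array([[0.0, 1.0, 2.0],
                  [1.0, 0.0, 1.0],
                  [2.0, 1.0, 0.0]])
K = emd_kernel(D_emd)
print(np.allclose(K, K.T))  # True
```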
15. Global Alignment Graph Kernel
Using EMD and Random Features (RF)

Proposed kernel:⁴

k(Gx, Gy) := ∫ p(Gω) φ_{Gω}(Gx) φ_{Gω}(Gy) dGω, where φ_{Gω}(Gx) := exp(−γ EMD(Gx, Gω))

▶ Gω: random graph with node embeddings W = {w_i}_{i=1}^D
▶ w_i is sampled from V ⊆ R^d
▶ p(Gω) is a distribution over the space of all random graphs of variable sizes Ω := ∪_{D=1}^{Dmax} V^D

⁴ I meant to dig into the details of the random graphs, but it got so involved that Hagawa gave up. If you are curious, see the reference. (I really don't get probability.)
16. Global Alignment Graph Kernel Using EMD and RF
Approximation⁵:

k̃(Gx, Gy) = (1/R) Σ_{i=1}^R φ_{Gω_i}(Gx) φ_{Gω_i}(Gy) → k(Gx, Gy), as R → ∞

Figure: both Gx and Gy are mapped to features φ_{Gω}(Gx) and φ_{Gω}(Gy) against the same set of Random Graphs Gω.

⁵ The approximate kernel converges uniformly to the proposed kernel.
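The random-feature average can be sketched with a toy stand-in. All names here are hypothetical placeholders; in particular `toy_emd` is not the paper's EMD, just a cheap cost so the averaging structure is visible.

```python
# Sketch: phi_{G_omega}(G) = exp(-gamma * EMD(G, G_omega)); averaging the
# product of features over R random graphs approximates the kernel.
import numpy as np

rng = np.random.default_rng(0)
gamma, R, d, Dmax = 1.0, 100, 2, 4

def toy_emd(U, W):
    # hypothetical stand-in cost: distance between centroids
    return np.linalg.norm(U.mean(axis=0) - W.mean(axis=0))

def phi(U, random_graphs):
    # one feature per random graph, scaled so z_x . z_y ~ k(Gx, Gy)
    z = np.array([np.exp(-gamma * toy_emd(U, W)) for W in random_graphs])
    return z / np.sqrt(R)

random_graphs = [rng.uniform(size=(rng.integers(1, Dmax + 1), d))
                 for _ in range(R)]
zx = phi(rng.uniform(size=(3, d)), random_graphs)
zy = phi(rng.uniform(size=(5, d)), random_graphs)
print(float(zx @ zy))  # approximate kernel value in (0, 1]
```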
17. Algorithm
Set Data and hyperparameters
▶ Node embedding size (dimension): d
▶ Max size of random graphs: Dmax
▶ Graph embedding size: R
𝜙"#
(𝐺&)
Random Graphs
𝐺(
𝐺&
𝜙"#
(𝐺))
𝐺)
DataGraphs
𝑅𝐷,-&
𝑑
Algorithm 1 Random Graph Embedding
Input: Data graphs {Gi }N
i=1, node embedding size d, maximum
size of random graphs Dmax , graph embedding size R.
Output: Feature matrix ZN ⇥R for data graphs
1: Compute nBOW weights vectors {t(Gi )}N
i=1 of the normalized
Laplacian L of all graphs
2: Obtain node embedding vectors {ui }n
i=1 by computing d small-
est eigenvectors of L
3: for j = 1, . . . ,R do
4: Draw Dj uniformly from [1, Dmax ].
5: Generate a random graph G j with Dj number of nodes
embeddings W from Algorithm 2.
6: Compute a feature vector Zj = G j
({Gi }N
i=1)) using EMD
or other optimal transportation distance in Equation (3).
7: end for
8: Return feature matrix Z({Gi }N
i=1) = 1p
R
{Zi }R
i=1
18. Compute {t(Gi)}_{i=1}^N and the Laplacian matrix L

Figure: for the example graph (edges A–B, A–C), the Laplacian matrix
  A  B  C
A  2 -1 -1
B -1  1  0
C -1  0  1
gives nBOW weights t(Gx) = (1/2, 1/4, 1/4). → For all graphs
(Algorithm 1, step 1)
19. Obtain node embedding vectors
Figure: for all graphs, normalize the Laplacian matrix L_{n×n} and take the partial eigendecomposition UΛU^⊤ (n×d, d×d, d×n), keeping the smallest d eigenvectors as node embeddings u1, u2, u3 ∈ R^d.
(Algorithm 1, step 2)
20. Generate random graph 6
Dj ← Rand(1, Dmax)   (here Dj = 2)
W_{Dj×d} ← Generate_random_graph(Dj, d)

Figure: example of a 2-node random graph with embeddings u1, u2 ∈ R^d
(Algorithm 1, steps 4 and 5)

⁶ A later section shows 2 ways to generate random graphs.
21. Compute a feature vector Zj

Zj = (Zj1, …, ZjN)^⊤, where Zji = φ_{Gωj}(Gi) := exp(−γ EMD(Gi, Gωj))

Figure: the feature φ_{Gωj}(Gi) fills column zj of the feature matrix.
(Algorithm 1, step 6)
22. Generate random graphs R times⁷

Figure: repeating steps 4 to 6 for j = 1, …, R yields R feature vectors z1, …, zR, each zj = (Zj1, …, ZjN)^⊤ computed against one random graph Gωj.
(Algorithm 1, steps 3 to 7)

⁷ R: the number of Random Graphs.
23. Output N × R Matrix Z
Z = (1/√R)(Zji) ∈ R^{N×R}: row i is the R-dimensional embedding of data graph Gi.
(Algorithm 1, step 8)
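Putting the steps of Algorithm 1 together as a sketch. Assumptions are labeled: graphs arrive as precomputed (U, t) pairs of node embeddings and nBOW weights, the random graphs use the data-independent draw, and the `emd_fn` passed in below is a hypothetical placeholder rather than a real EMD.

```python
# Sketch of Algorithm 1: produce the N x R feature matrix Z, scaled
# by 1/sqrt(R) so that Z[x] . Z[y] approximates k(Gx, Gy).
import numpy as np

def random_graph_embedding(graphs, d, Dmax, R, emd_fn, rng):
    """graphs: list of (U_i, t_i) node-embedding / nBOW-weight pairs."""
    Z = np.zeros((len(graphs), R))
    for j in range(R):
        Dj = rng.integers(1, Dmax + 1)      # step 4: size of random graph
        W = rng.uniform(size=(Dj, d))       # step 5 (data-independent draw)
        tw = np.full(Dj, 1.0 / Dj)          # uniform weights on W
        for i, (U, t) in enumerate(graphs): # step 6: feature per graph
            Z[i, j] = np.exp(-emd_fn(U, t, W, tw))
    return Z / np.sqrt(R)                   # step 8

rng = np.random.default_rng(0)
# hypothetical stand-in cost, not a real EMD
toy = lambda U, t, W, tw: np.linalg.norm(t @ U - tw @ W)
graphs = [(rng.uniform(size=(4, 2)), np.full(4, 0.25)),
          (rng.uniform(size=(3, 2)), np.full(3, 1.0 / 3))]
Z = random_graph_embedding(graphs, d=2, Dmax=5, R=64, emd_fn=toy, rng=rng)
print(Z.shape)  # (2, 64)
```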
24. How to generate Random Graph
Data-independent and Data-dependent Distributions
Data-dependent⁸ Random Graph Embedding (Anchor Sub-Graphs, ASG):
1. Pick a graph Gk from the data set
2. Uniformly draw Dj nodes
3. {wi}_{i=1}^{Dj} = {u_{n1}, u_{n2}, …, u_{nDj}}
Incorporating label information:
▶ d(ui, uj) = max(∥ui − uj∥2, √d) if vi and vj have different node labels
▶ This enforces a margin between nodes with different labels
▶ √d is the largest distance in a d-dimensional unit hypercube

⁸ For the data-independent distribution, see the appendix.
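The ASG draw and the label-aware distance above can be sketched as follows. Function names and details here are my assumptions for illustration, not the authors' code.

```python
# Sketch of the data-dependent (ASG) random-graph draw and the
# label-aware distance; details are assumptions for illustration.
import numpy as np

def draw_anchor_subgraph(data_embeddings, Dmax, rng):
    """Pick one data graph, then uniformly sample Dj of its node embeddings."""
    U = data_embeddings[rng.integers(len(data_embeddings))]
    Dj = rng.integers(1, min(Dmax, len(U)) + 1)
    idx = rng.choice(len(U), size=Dj, replace=False)
    return U[idx]

def label_aware_distance(ui, uj, same_label, d):
    # different labels are pushed at least sqrt(d) apart, the largest
    # distance in a d-dimensional unit hypercube
    dist = np.linalg.norm(ui - uj)
    return dist if same_label else max(dist, np.sqrt(d))

rng = np.random.default_rng(0)
embs = [rng.uniform(size=(5, 3)), rng.uniform(size=(8, 3))]
W = draw_anchor_subgraph(embs, Dmax=4, rng=rng)
print(W.shape)  # (Dj, 3) with 1 <= Dj <= 4
```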
25. Complexity comparison (Left: Proposed, Right: Straightforward)
𝜙"#
(𝐺&)
Random Graphs
𝐺(
𝐺&
𝜙"#
(𝐺))
𝐺)
Figure: Proposed kernel
Graph A Graph CGraph B
A B C
A EMD(A,A) EMD(A,B) EMD(A,C)
B EMD(B,A) EMD(B,B) EMD(B,C)
C EMD(C,A) EMD(C,B) EMD(C,C)
Distance
Matrix
Figure: Straitforward kernel
Time complexity (dmz is partial eigendecomposition cost) 9:
▶ O(NRD2nlog(n) + dmz) ▶ O(N2n3log(n) + dmz)
※ R is # of Random Graphs, D is # of Random Graph nodes (D < n)
Space complexity:
▶ O(NR) ▶ O(N2)
9
dmz is eigendecomposition cost.
25 / 35
26. Experiments
Experimental setup
Machine:
▶ Use linear SVM (LIBLINEAR)
Data:
▶ 9 Datasets
Hyperparameters:
▶ γ (kernel) ∈ {1e-3, 1e-2, 1e-1, 1, 10}
▶ Dmax (size of random graphs) ∈ [3:3:30]
▶ SVM parameters
Evaluation:
▶ 10-fold cross-validation
▶ Accuracy averaged over 10 repetitions
27. # of Random Graphs (R): testing accuracy and runtime

Figure 2: Test accuracies and runtime of three variants of RGE (RGE(RF), RGE(ASG), RGE(ASG)-NodeLab) with and without node labels when varying R. Panels (a) ENZYMES, (b) NCI109, (c) IMDBBINARY, (d) COLLAB show testing accuracy vs. R; panels (e) to (h) show total runtime vs. R on the same datasets.

▶ Testing accuracy converges very rapidly as R increases
▶ Runtime shows quasi-linear scalability with respect to R
28. Scalability in the number of graphs N and graph size n

Figure: runtime vs. (a) the number of graphs N and (b) the size of graph n, plotting RGE(Eigentime), RGE(FeaGentime), and RGE(Runtime) against Linear and Quadratic reference lines.

▶ Shows linear scalability with respect to N (a)
▶ Shows quasi-linear scalability with respect to n (b)
32. Appendix I
▶ If two graphs are isomorphic, the eigenvalues of their adjacency matrices coincide, but the converse does not hold
Normalized Laplacian matrix:
L_{i,j} := 1, if i = j and deg(vi) ≠ 0
L_{i,j} := −1/√(deg(vi) deg(vj)), if i ≠ j and vi is adjacent to vj
L_{i,j} := 0, otherwise
deg(v): degree of node (vertex) v
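A quick numeric check (illustrative only) that this entrywise definition matches the matrix form L = I − D^{-1/2} A D^{-1/2} on the 3-node example graph from the earlier slides:

```python
# Check the entrywise definition against the matrix form
# L = I - D^{-1/2} A D^{-1/2} for the example graph (edges A-B, A-C).
import numpy as np

A = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])
deg = A.sum(axis=1)                 # degrees [2, 1, 1]
Dinv = np.diag(1.0 / np.sqrt(deg))
L = np.eye(3) - Dinv @ A @ Dinv

# entrywise: 1 on the diagonal, -1/sqrt(deg_i * deg_j) on edges, else 0
s = 1.0 / np.sqrt(2.0)
expected = np.array([[1., -s, -s],
                     [-s, 1., 0.],
                     [-s, 0., 1.]])
print(np.allclose(L, expected))  # True
```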
34. Appendix III Table 4: Properties of the datasets.
Dataset MUTAG PTC ENZYMES PROTEINS NCI1 NCI109 IMDB-B IMDB-M COLLAB
Max # Nodes 28 109 126 620 111 111 136 89 492
Min # Nodes 10 2 2 4 3 4 12 7 32
Ave # Nodes 17.9 25.6 32.6 39.05 29.9 29.7 19.77 13.0 74.49
Max # Edges 33 108 149 1049 119 119 1249 1467 40119
Min # Edges 10 1 1 5 2 3 26 12 60
Ave # Edges 19.8 26.0 62.1 72.81 32.3 32.1 96.53 65.93 2457.34
# Graph 188 344 600 1113 4110 4127 1000 1500 5000
# Graph Labels 2 2 6 2 2 2 2 3 3
# Node Labels 7 19 3 3 37 38 — — —
Terms
WL test:
▶ A technique to improve kernels using node labels
RGE(ASG)-NodeLab:
▶ Data-dependent random graphs + incorporating label information
WL-RGE:
▶ Data-dependent random graphs + WL test
35. References I
[1] Giannis Nikolentzos, Polykarpos Meladianos, and Michalis Vazirgiannis. Matching node embeddings for graph similarity. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.