The paper proposes the Multiscale Laplacian Graph Kernel, which compares graphs at multiple scales by recursively comparing subgraphs. Each graph is represented as a Gaussian graphical model whose covariance is the regularized inverse of its Laplacian matrix, and two graphs are compared by applying the Bhattacharyya kernel to the resulting Gaussian distributions. This basic kernel, however, captures only global structure and cannot compare graphs of different sizes or different vertex orderings. The paper therefore introduces the feature space Laplacian graph kernel (FLG), which projects the inverse Laplacian through node-feature matrices; this fixes the size and ordering issues but still captures only global structure. Multiscale comparison is finally achieved by inducing the node features themselves from neighborhood similarity scores computed at smaller scales.
1. The Multiscale Laplacian Graph Kernel
NIPS 2016
Risi Kondor
Dept. of CS and Statistics,
University of Chicago
Horace Pan
Dept. of CS,
University of Chicago
3 citations
3. Why Compare Graphs At Multiscale?
[Figure: structural formulas of the two C2H6O isomers]
Q: which one of the following is ethanol?
4. Why Compare Graphs At Multiscale?
[Figure: structural formulas of the two C2H6O isomers]
● local structures are critical
at specific positions in the graph
5. Why Compare Graphs At Multiscale?
[Figure: two network graphs, labeled phishing and trojan]
● local structures are critical
at specific positions in the graph
6. Why Compare Graphs At Multiscale?
[Figure: reply networks of a discussion-based and a QA-based community]
● local structures are critical
at specific positions in the graph
● global properties roughly summarize the graph
7. A Good Graph Kernel Should...
● Be able to detect local structures
and their relative positions in the graph
● Be able to capture global properties of the graph
● Be efficient to compute (this work → X)
8. Related Work: Spectral Graph Kernels
● Be able to detect local structures
and their relative positions in the graph (?)
● Be able to capture global properties of the graph
● Be efficient to compute (?)
9. Related Work: Local Graph Kernels
● Be able to detect local structures
and their relative positions in the graph
● Be able to capture global properties of the graph
● Be efficient to compute (only the WL kernel)
10. Problem Formulation
● How to define a graph kernel that can take
structure into account at multiple scales?
11. Main Idea
● “Compare graphs by subgraphs”, recursively
– two graphs are compared via their subgraphs
● two subgraphs are compared via smaller subgraphs
– … and so on
18. Laplacian Graph Kernel
● Laplacian graph kernel
– cannot compare graphs of different sizes
– sensitive to vertex ordering
How to overcome this? [if node features are given]
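The Laplacian graph kernel above fits in a few lines. A minimal sketch (my own naming), assuming the Gaussian's covariance is the regularized inverse Laplacian (L + ηI)^-1 and using the closed-form Bhattacharyya kernel for zero-mean Gaussians:

```python
import numpy as np

def graph_laplacian(A):
    """Laplacian L = D - A of an adjacency matrix."""
    return np.diag(A.sum(axis=1)) - A

def bhattacharyya_gaussian(S1, S2):
    """Closed-form Bhattacharyya kernel between N(0, S1) and N(0, S2)."""
    S = np.linalg.inv((np.linalg.inv(S1) + np.linalg.inv(S2)) / 2)
    return np.sqrt(np.linalg.det(S)) / (
        np.linalg.det(S1) ** 0.25 * np.linalg.det(S2) ** 0.25)

def laplacian_graph_kernel(A1, A2, eta=0.1):
    """Compare two graphs via their regularized inverse Laplacians.

    Note: this only works when both graphs have the same number of
    vertices, and it depends on the vertex ordering -- exactly the
    limitations the slide points out."""
    S1 = np.linalg.inv(graph_laplacian(A1) + eta * np.eye(len(A1)))
    S2 = np.linalg.inv(graph_laplacian(A2) + eta * np.eye(len(A2)))
    return bhattacharyya_gaussian(S1, S2)

tri = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], float)   # triangle
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)  # path 0-1-2
print(laplacian_graph_kernel(tri, tri))   # identical graphs: ~1.0
print(laplacian_graph_kernel(tri, path))  # different graphs: < 1
```

The Bhattacharyya kernel is at most 1, with equality only when the two distributions coincide, which is why the self-comparison scores 1.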
20. Multiscale Laplacian Graph Kernel
● feature space Laplacian graph kernel [hint: node features]
– still captures only global structure;
how to compare structures at multiple scales?
21. Multiscale Laplacian Graph Kernel
● feature space Laplacian graph kernel
– induce feature vectors from similarity scores
calculated on smaller neighborhoods
22. Multiscale Laplacian Graph Kernel
● Generalized FLG
– calculate the joint Gram matrix,
then construct vertex features from it
24. Multiscale Laplacian Graph Kernel
● Generalized FLG
– each entry in the Gram matrix
captures the neighborhood similarity of two vertices
– used as the base kernel at larger-scale comparisons
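One way to read the FLG construction in code. This is a sketch: the feature-space covariance S = Uᵀ(L + ηI)⁻¹U is my reconstruction of "projecting node features through the inverse Laplacian", and η, γ are illustrative regularizers. Because S is d×d (d = feature dimension), graphs of different sizes become comparable:

```python
import numpy as np

def flg_kernel(A1, U1, A2, U2, eta=0.1, gamma=1e-6):
    """Feature-space Laplacian graph kernel (sketch).

    A*: adjacency matrices; U*: n_i x d node-feature matrices.
    Projecting N(0, (L + eta*I)^-1) into the shared d-dimensional
    feature space makes graphs of different sizes (and, if the
    features are permutation-covariant, different vertex orderings)
    comparable."""
    def cov(A, U):
        L = np.diag(A.sum(axis=1)) - A
        S = U.T @ np.linalg.inv(L + eta * np.eye(len(A))) @ U
        return S + gamma * np.eye(U.shape[1])  # regularize in feature space
    S1, S2 = cov(A1, U1), cov(A2, U2)
    S = np.linalg.inv((np.linalg.inv(S1) + np.linalg.inv(S2)) / 2)
    return np.sqrt(np.linalg.det(S)) / (
        np.linalg.det(S1) ** 0.25 * np.linalg.det(S2) ** 0.25)

# Degree as a trivial 1-d permutation-invariant node feature:
tri = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], float)           # 3 nodes
square = np.array([[0, 1, 0, 1], [1, 0, 1, 0],
                   [0, 1, 0, 1], [1, 0, 1, 0]], float)             # 4 nodes
k = flg_kernel(tri, tri.sum(1, keepdims=True),
               square, square.sum(1, keepdims=True))
print(k)  # different-size graphs can now be compared
```

The choice of U is what the rest of the talk is about: hand-picked features only fix the size/ordering problem, while features induced from neighborhood comparisons give the multiscale kernel.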
35. Speedup
● Dynamic programming
– compute all 1st-level pairs, then let 2nd-level
comparisons look up 1st-level pairs, … and so on
– for a dataset with M graphs, requires O(LM^2 n^5)
● Still too slow; how to speed up further?
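The bottom-up tabulation can be sketched as follows. This is a toy: in the real MLG kernel each pair of neighborhoods is compared with the FLG kernel, whereas here a simple average of looked-up level-l scores stands in for that comparison, just to show the memoization structure:

```python
import numpy as np

def neighborhoods(A):
    """1-hop neighborhood of each vertex, including the vertex itself."""
    n = len(A)
    return [{i} | {j for j in range(n) if A[i][j]} for i in range(n)]

def dp_vertex_similarities(A, base_sim, levels=2):
    """table[i, j] = level-`levels` similarity of vertices i and j.

    Level l+1 only LOOKS UP level-l entries -- nothing is recomputed,
    which is the whole point of the dynamic program."""
    n = len(A)
    nbr = neighborhoods(A)
    table = np.array([[base_sim(i, j) for j in range(n)] for i in range(n)])
    for _ in range(levels):
        new = np.empty_like(table)
        for i in range(n):
            for j in range(n):
                # stand-in for the FLG comparison of the two neighborhoods
                new[i, j] = np.mean([table[a, b] for a in nbr[i] for b in nbr[j]])
        table = new
    return table

# Path graph 0-1-2: the two endpoints are structurally identical.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
sims = dp_vertex_similarities(A, base_sim=lambda i, j: float(i == j))
print(sims.round(3))
```

Each level touches all n^2 vertex pairs, so across L levels the number of (stand-in) kernel evaluations is Ln^2, matching the count in the notes.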
36. Speedup
● At each level of DP,
– compute the Gram matrix between all vertices of all graphs
– eigendecompose the Gram matrix to get basis vectors
– then, project all vertices onto the same space
37. Speedup
● At each level of DP,
– compute the Gram matrix between all vertices of all graphs
– eigendecompose the Gram matrix to get basis vectors
– then, project all vertices onto the same space
● But decomposing the huge Gram matrix is costly,
and the # of basis vectors increases
38. Speedup
● At each level of DP,
– compute the Gram matrix between all vertices of all graphs
– then, project all vertices onto the same space
– sample only some vertices and select only the top basis vectors
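The sampling trick reads like a Nyström-style low-rank approximation; here is a sketch under that reading (sizes, names, and the Gaussian base kernel are purely illustrative, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

def project_all(K_full_vs_sample, K_sample, P):
    """Project every vertex onto the top-P eigenbasis of a sampled Gram matrix.

    K_sample: N x N Gram matrix of the N sampled vertices.
    K_full_vs_sample: (all vertices) x N kernel evaluations.
    Only the small N x N matrix is eigendecomposed, never the full
    all-vertices Gram matrix."""
    w, V = np.linalg.eigh(K_sample)
    idx = np.argsort(w)[::-1][:P]             # keep top-P eigenpairs
    w, V = w[idx], V[:, idx]
    return K_full_vs_sample @ V / np.sqrt(w)  # Nystrom-style feature map

# Toy data: 200 "vertices" under a Gaussian kernel, 20 of them sampled.
X = rng.normal(size=(200, 5))
S = X[rng.choice(200, size=20, replace=False)]
gauss = lambda A, B: np.exp(-((A[:, None] - B[None]) ** 2).sum(-1) / 2)
feats = project_all(gauss(X, S), gauss(S, S), P=5)
print(feats.shape)  # every vertex now has a P-dimensional coordinate
```

The cubic cost of the eigendecomposition now depends on the sample size N rather than the total number of vertices, which is where the claimed speedup comes from.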
39. Speedup
● At each level of DP,
– compute the Gram matrix between all vertices of all graphs
– then, project all vertices onto the same space
– sample only some vertices and select only the top basis vectors
● Time complexity for comparing M graphs
40. Speedup
● At each level of DP,
– compute the Gram matrix between all vertices of all graphs
– then, project all vertices onto the same space
– sample only some vertices and select only the top basis vectors
● Time complexity for comparing M graphs
– speedup applies only when all graphs are known in advance
41. Speedup
● At each level of DP,
– compute the Gram matrix between all vertices of all graphs
– then, project all vertices onto the same space
– sample only some vertices and select only the top basis vectors
● Time complexity for comparing M graphs
– speedup applies only when all graphs are known in advance
– large graphs might require a larger sample size and
more basis vectors
44. Conclusion
● Refreshingly novel approach, almost too dense to
fit into just 9 pages
● However, it
– is hard to scale to large datasets,
– requires node features, and
– requires all graphs to be known in advance to be computationally feasible
45. Conclusion
● Refreshingly novel approach, almost too dense to
fit into just 9 pages
● Never mind: their solid theoretical contribution
makes up for its limited use in practice.
● However, it
– is hard to scale to large datasets,
– requires node features, and
– requires all graphs to be known in advance to be computationally feasible
46. The Multiscale Laplacian Graph Kernel
What a graph kernel does: given two graphs, it returns a similarity score indicating how similar they are. This paper claims to be the first graph kernel that can compare the structure of two graphs at multiple scales, but they are actually wrong: a reviewer also pointed out that the WL kernel, which we covered before, can already do multiscale comparison.
Still, the method is interesting and the theory is solid, so it got into NIPS anyway.
48. Why Compare Graphs At Multiscale?
[Figure: structural formulas of the two C2H6O isomers]
Q: which one of the following is ethanol?
Why would comparing two graphs require looking at multiple scales? Take a question that often shows up in high-school chemistry exams: both of the compounds below are C2H6O; which one is the alcohol?
49. Why Compare Graphs At Multiscale?
[Figure: structural formulas of the two C2H6O isomers]
● local structures are critical
at specific positions in the graph
You can tell them apart just from the OH bond, or by whether the ether's oxygen atom sits at the center of symmetry. In other words, to compare two graphs, e.g. for graph classification, you not only need to know what a local structure looks like but may also need to know where that local structure sits in the graph, e.g. whether the OH appears at the end.
50. Why Compare Graphs At Multiscale?
[Figure: two network graphs, labeled phishing and trojan]
● local structures are critical
at specific positions in the graph
51. Why Compare Graphs At Multiscale?
[Figure: reply networks of a discussion-based and a QA-based community]
● local structures are critical
at specific positions in the graph
● global properties roughly summarize the graph
Global, coarse properties of the whole network also matter. For example, these are the discussion patterns of two different kinds of communities: the center node is the person who posts, and everyone who replies to them gets a link. In QA-based communities such as Stack Overflow, a question usually gets a precise answer; in discussion-based communities, e.g. political debates, there is usually no correct answer, and the repliers may end up arguing with each other, so compared to QA-based communities there are more interlinks.
52. A Good Graph Kernel Should...
● Be able to detect local structures
and their relative positions in the graph
● Be able to capture global properties of the graph
● Be efficient to compute (this work → X)
A good graph kernel should be able to go down to low-level comparison of local structures and also scale up to high-level, coarse comparison, taking relative position into account as it scales up. This paper claims to be the first to achieve both. Finally, it should not take too long to compute in practice; this paper made it as fast as they could, but the time complexity is still very high.
53. Related Work: Spectral Graph Kernels
● Be able to detect local structures
and their relative positions in the graph (?)
● Be able to capture global properties of the graph
● Be efficient to compute (?)
Graph kernels fall into two broad families. One is spectral kernels, which take the spectrum of the whole graph's adjacency matrix or Laplacian matrix. These kernels only capture very rough comparisons, like the network's overall shape or the number of connected components and other global properties, and are not sensitive to local structure.
54. Related Work: Local Graph Kernels
● Be able to detect local structures
and their relative positions in the graph
● Be able to capture global properties of the graph
● Be efficient to compute (only the WL kernel)
The other family is bag-of-structures kernels: define several kinds of small structures, then count how many times each appears in the graph. This of course can detect local structures, but it cannot scale up to know where a local structure sits in the graph. And apart from WL, all the other bag-of-structures kernels run very slowly.
55. Problem Formulation
● How to define a graph kernel that can take
structure into account at multiple scales?
How do we define a kernel that can compare the structures of two graphs at multiple scales?
56. Main Idea
● “Compare graphs by subgraphs”, recursively
– two graphs are compared via their subgraphs
● two subgraphs are compared via smaller subgraphs
– … and so on
The main idea: judging how similar two graphs are can go down to comparing how similar their subgraphs are. So why not recurse? Comparisons between the subgraphs of the two graphs can themselves go down to comparing subgraphs of subgraphs.
57. Laplacian Graph Kernel
● Construct a Gaussian graphical model
They introduce a so-called Laplacian graph kernel: represent a graph as a Markov random field, and with the node potentials and edge potentials defined in this way, the joint distribution of these random variables…
58. Laplacian Graph Kernel
● Construct a Gaussian graphical model
…can be written as a Gaussian distribution centered at the origin whose covariance matrix is the inverse of the Laplacian matrix.
59. Laplacian Graph Kernel
● To compare two graphs G1, G2
So to compare the two graphs, I need to compare the two distributions.
60. Laplacian Graph Kernel
● To compare two graphs G1, G2
Use the Bhattacharyya kernel to compare the distributions
This paper uses a kernel called the Bhattacharyya kernel to compare the two distributions; it is computed as follows.
61. Laplacian Graph Kernel
● To compare two graphs G1, G2
Has a closed form for Gaussian distributions
When both distributions are Gaussian this can be written in closed form, but we know the Laplacian matrix always has a zero eigenvalue, so in the denominator, inverting the Laplacian matrix and taking the determinant blows up, which is why a regularizer is needed.
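As I reconstruct the closed form referred to above (for the zero-mean Gaussians N(0, Σ_i); the η-regularization is what keeps the determinants finite despite the Laplacian's zero eigenvalue):

```latex
k(G_1,G_2)
  = \int \sqrt{p_1(x)\,p_2(x)}\,dx
  = \frac{\bigl|\bigl(\tfrac{1}{2}\Sigma_1^{-1}+\tfrac{1}{2}\Sigma_2^{-1}\bigr)^{-1}\bigr|^{1/2}}
         {|\Sigma_1|^{1/4}\,|\Sigma_2|^{1/4}},
  \qquad \Sigma_i = (L_i + \eta I)^{-1}.
```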
65. Multiscale Laplacian Graph Kernel
● feature space Laplacian graph kernel [hint: node features]
– still captures only global structure;
how to compare structures at multiple scales?
So how do we improve this kernel so that it can take structure at different scales into account?
66. Multiscale Laplacian Graph Kernel
● feature space Laplacian graph kernel
– induce feature vectors from similarity scores
calculated on smaller neighborhoods
Recall the main idea: comparing G1 and G2 comes down to comparing their subgraphs, so why not let the vertex features U1, U2 encode the results of those subgraph comparisons? But a vertex feature has to be tied to a vertex; how do I link a subgraph-comparison result to its corresponding vertex? The intuitive choice is to define the subgraph as the vertex's neighborhood.
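The idea above can be compressed into a recursion (my notation, not the paper's: N(v) is the neighborhood subgraph of v, u_v its raw node feature, and k^{(l)} the level-l vertex kernel):

```latex
k^{(0)}(v,v') = \kappa\bigl(u_v, u_{v'}\bigr)
\quad\text{(base kernel on raw node features)},
\qquad
k^{(l+1)}(v,v') = k_{\mathrm{FLG}}\bigl(N(v),\,N(v')\bigr),
```

where the FLG comparison of N(v) and N(v') uses vertex features induced from the level-l Gram matrix of k^{(l)}.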
77. Speedup
● Recursive approach compares the same
subgraph pairs multiple times
Can we speed this up? The top-down recursive computation clearly contains a lot of repeated work; for instance, in these two kernel evaluations the top and bottom red graphs are the 2-hop neighborhoods of different vertices,
78. Speedup
● Recursive approach compares the same
subgraph pairs multiple times
● Can you speed up the computation?
but they share many vertices, so the computation that unfolds beneath them is duplicated. Anyone who has properly studied algorithms will have a sense of how to speed up this kind of problem.
79. Speedup
● Dynamic programming
– compute all 1st-level pairs, then let 2nd-level
comparisons look up 1st-level pairs, … and so on
Exactly: dynamic programming. Start from the bottom level; at each level directly compute the pairwise neighborhood similarities of all vertices, then work up to level L. The total number of kernel evaluations drops to Ln^2, and each kernel evaluation must decompose an inverse Laplacian matrix, which costs n^3, so the overall complexity is O(Ln^5).
80. Speedup
● Dynamic programming
– compute all 1st-level pairs, then let 2nd-level
comparisons look up 1st-level pairs, … and so on
– for a dataset with M graphs, requires O(LM^2 n^5)
● Still too slow; how to speed up further?
And that was only two graphs: if my dataset has M graphs, the complexity is O(LM^2 n^5), which is still very high for a graph kernel. With graphs of around 100 vertices on average it exceeds 10^10 and takes hours to compute. So can we speed it up further? Think carefully: there is still duplicated work.
81. Speedup
● At each level of DP,
– compute the Gram matrix between all vertices of all graphs
– eigendecompose the Gram matrix to get basis vectors
– then, project all vertices onto the same space
Their speedup will make you smile: at each level of comparison they want to construct the features of every vertex of every graph at once, so they compute one enormous Gram matrix containing the pairwise similarities between all vertices of all graphs in the dataset.
82. Speedup
● At each level of DP,
– compute the Gram matrix between all vertices of all graphs
– eigendecompose the Gram matrix to get basis vectors
– then, project all vertices onto the same space
● But decomposing the huge Gram matrix is costly,
and the # of basis vectors increases
But once the dataset grows even a little, say to 1000 graphs, this Gram matrix no longer fits in memory, let alone its eigenvalue decomposition. Compared with constructing the vertex features of only two graphs at a time, this huge Gram matrix contains many more eigenvectors, so the effect is not the same as pairwise comparison.
Their complexity is simply too extreme, though, so they set this issue aside and reduce the complexity first.
Any ideas?
83. Speedup
● At each level of DP,
– compute the Gram matrix between all vertices of all graphs
– then, project all vertices onto the same space
– sample only some vertices and select only the top basis vectors
Their approach: sample only N vertices, take the eigenvectors of the top-P eigenvalues as the basis, and project all vertices at once onto the space spanned by these P basis vectors.
84. Speedup
● At each level of DP,
– compute the Gram matrix between all vertices of all graphs
– then, project all vertices onto the same space
– sample only some vertices and select only the top basis vectors
● Time complexity for comparing M graphs
The final complexity is the dataset size, times the L levels, times the cube of the sample size (or of the number of basis vectors). Either way, for a dataset of 1000 graphs with a sample size of 100, it can still take hours.
85. Speedup
● At each level of DP,
– compute the Gram matrix between all vertices of all graphs
– then, project all vertices onto the same space
– sample only some vertices and select only the top basis vectors
● Time complexity for comparing M graphs
– speedup applies only when all graphs are known in advance
There are still a few problems. First, because this speedup constructs the features of all graphs at once, it cannot be used when comparing against unseen graphs: the vertex features already constructed for the graphs in the dataset would have to be recomputed.
86. Speedup
● At each level of DP,
– compute the Gram matrix between all vertices of all graphs
– then, project all vertices onto the same space
– sample only some vertices and select only the top basis vectors
● Time complexity for comparing M graphs
– speedup applies only when all graphs are known in advance
– large graphs might require a larger sample size and
more basis vectors
Second, it obviously does not scale: for larger graphs, the sample size used for the speedup has to grow, and there will also be more basis vectors.
87. Experiment
– setting
A quick look at the experiments: on small benchmark datasets, they report the performance of the sped-up method with 10 basis vectors and a few hundred sampled vertices. I really wanted to see the computation time, but they did not include it.
88. Conclusion
● Refreshingly novel approach, almost too dense to
fit into just 9 pages
It really is a novel method, full of twists and very satisfying to work through. Nine pages was too few for them: their presentation has not a single figure, and it took me some time to fully grasp the method.
89. Conclusion
● Refreshingly novel approach, almost too dense to
fit into just 9 pages
● However, it
– is hard to scale to large datasets,
– requires node features, and
– requires all graphs to be known in advance to be computationally feasible
But the complexity is too high, so it is hard to use in practice. Also, to compute pairwise vertex similarities they need extra node features, which we do not. And when the graph to be compared is unseen, the required computation time goes back up to the fifth power of the number of nodes.
90. Conclusion
● Refreshingly novel approach, almost too dense to
fit into just 9 pages
● Never mind: their solid theoretical contribution
makes up for its limited use in practice.
● However, it
– is hard to scale to large datasets,
– requires node features, and
– requires all graphs to be known in advance to be computationally feasible
But none of that matters: what the machine learning community appreciates is exactly this kind of theoretically solid method, so these practical shortcomings, and even the mistake in their related-work discussion, can all be forgiven.