The paper proposes the Multiscale Laplacian Graph Kernel, which compares graphs at multiple scales by recursively comparing subgraphs. Each graph is represented as a Gaussian graphical model whose covariance is the regularized inverse of its Laplacian matrix, and two graphs are compared by applying the Bhattacharyya kernel to the resulting Gaussian distributions. This basic kernel, however, captures only global structure and cannot compare graphs of different sizes or different vertex orderings. The paper therefore introduces the feature space Laplacian graph kernel (FLG), which projects the inverse Laplacian through node-feature matrices; this fixes the size and ordering issues but still captures only global structure. Multiscale comparison is finally achieved by inducing the node features themselves from neighborhood similarity scores computed at smaller scales.
1. The Multiscale Laplacian Graph Kernel
NIPS 2016
Risi Kondor
Dept. of CS and Statistics,
University of Chicago
Horace Pan
Dept. of CS,
University of Chicago
3 citations
3. Why Compare Graphs At Multiscale?
[Figure: structural formulas of the two C2H6O isomers]
Q: which one of the following is ethanol?
4. Why Compare Graphs At Multiscale?
[Figure: structural formulas of the two C2H6O isomers]
● local structures are critical
at specific positions in the graph
5. Why Compare Graphs At Multiscale?
[Figure: two network graphs, labeled phishing and trojan]
● local structures are critical
at specific positions in the graph
6. Why Compare Graphs At Multiscale?
[Figure: reply networks of a discussion-based and a QA-based community]
● local structures are critical
at specific positions in the graph
● global properties roughly summarize the graph
7. A Good Graph Kernel Should...
● Be able to detect local structures
and their relative positions in the graph
● Be able to capture global properties of the graph
● Be efficient to compute (this work → X)
8. Related Work: Spectral Graph Kernels
● Be able to detect local structures
and their relative positions in the graph (?)
● Be able to capture global properties of the graph
● Be efficient to compute (?)
9. Related Work: Local Graph Kernels
● Be able to detect local structures
and their relative positions in the graph
● Be able to capture global properties of the graph
● Be efficient to compute (only the WL kernel)
10. Problem Formulation
● How to define a graph kernel that can take
structure into account at multiple scales?
11. Main Idea
● “Compare graphs by subgraphs”, recursively
– two graphs are compared via their subgraphs
● two subgraphs are compared via smaller subgraphs
– … and so on
18. Laplacian Graph Kernel
● Laplacian graph kernel
– cannot compare graphs of different sizes
– sensitive to vertex ordering
How to overcome this? [if node features are given]
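The Laplacian graph kernel above fits in a few lines. A minimal sketch (my own naming), assuming the Gaussian's covariance is the regularized inverse Laplacian (L + ηI)^-1 and using the closed-form Bhattacharyya kernel for zero-mean Gaussians:

```python
import numpy as np

def graph_laplacian(A):
    """Laplacian L = D - A of an adjacency matrix."""
    return np.diag(A.sum(axis=1)) - A

def bhattacharyya_gaussian(S1, S2):
    """Closed-form Bhattacharyya kernel between N(0, S1) and N(0, S2)."""
    S = np.linalg.inv((np.linalg.inv(S1) + np.linalg.inv(S2)) / 2)
    return np.sqrt(np.linalg.det(S)) / (
        np.linalg.det(S1) ** 0.25 * np.linalg.det(S2) ** 0.25)

def laplacian_graph_kernel(A1, A2, eta=0.1):
    """Compare two graphs via their regularized inverse Laplacians.

    Note: this only works when both graphs have the same number of
    vertices, and it depends on the vertex ordering -- exactly the
    limitations the slide points out."""
    S1 = np.linalg.inv(graph_laplacian(A1) + eta * np.eye(len(A1)))
    S2 = np.linalg.inv(graph_laplacian(A2) + eta * np.eye(len(A2)))
    return bhattacharyya_gaussian(S1, S2)

tri = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], float)   # triangle
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)  # path 0-1-2
print(laplacian_graph_kernel(tri, tri))   # identical graphs: ~1.0
print(laplacian_graph_kernel(tri, path))  # different graphs: < 1
```

The Bhattacharyya kernel is at most 1, with equality only when the two distributions coincide, which is why the self-comparison scores 1.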
20. Multiscale Laplacian Graph Kernel
● feature space Laplacian graph kernel [hint: node features]
– still captures only global structure;
how to compare structures at multiple scales?
21. Multiscale Laplacian Graph Kernel
● feature space Laplacian graph kernel
– induce feature vectors from similarity scores
calculated on smaller neighborhoods
22. Multiscale Laplacian Graph Kernel
● Generalized FLG
– calculate the joint Gram matrix,
then construct vertex features from it
24. Multiscale Laplacian Graph Kernel
● Generalized FLG
– each entry in the Gram matrix
captures the neighborhood similarity of two vertices
– used as the base kernel at larger-scale comparisons
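One way to read the FLG construction in code. This is a sketch: the feature-space covariance S = Uᵀ(L + ηI)⁻¹U is my reconstruction of "projecting node features through the inverse Laplacian", and η, γ are illustrative regularizers. Because S is d×d (d = feature dimension), graphs of different sizes become comparable:

```python
import numpy as np

def flg_kernel(A1, U1, A2, U2, eta=0.1, gamma=1e-6):
    """Feature-space Laplacian graph kernel (sketch).

    A*: adjacency matrices; U*: n_i x d node-feature matrices.
    Projecting N(0, (L + eta*I)^-1) into the shared d-dimensional
    feature space makes graphs of different sizes (and, if the
    features are permutation-covariant, different vertex orderings)
    comparable."""
    def cov(A, U):
        L = np.diag(A.sum(axis=1)) - A
        S = U.T @ np.linalg.inv(L + eta * np.eye(len(A))) @ U
        return S + gamma * np.eye(U.shape[1])  # regularize in feature space
    S1, S2 = cov(A1, U1), cov(A2, U2)
    S = np.linalg.inv((np.linalg.inv(S1) + np.linalg.inv(S2)) / 2)
    return np.sqrt(np.linalg.det(S)) / (
        np.linalg.det(S1) ** 0.25 * np.linalg.det(S2) ** 0.25)

# Degree as a trivial 1-d permutation-invariant node feature:
tri = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], float)           # 3 nodes
square = np.array([[0, 1, 0, 1], [1, 0, 1, 0],
                   [0, 1, 0, 1], [1, 0, 1, 0]], float)             # 4 nodes
k = flg_kernel(tri, tri.sum(1, keepdims=True),
               square, square.sum(1, keepdims=True))
print(k)  # different-size graphs can now be compared
```

The choice of U is what the rest of the talk is about: hand-picked features only fix the size/ordering problem, while features induced from neighborhood comparisons give the multiscale kernel.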
35. Speedup
● Dynamic programming
– compute all 1st-level pairs, then let 2nd-level
comparisons look up 1st-level pairs, … and so on
– for a dataset with M graphs, requires O(LM^2 n^5)
● Still too slow; how to speed up further?
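The bottom-up tabulation can be sketched as follows. This is a toy: in the real MLG kernel each pair of neighborhoods is compared with the FLG kernel, whereas here a simple average of looked-up level-l scores stands in for that comparison, just to show the memoization structure:

```python
import numpy as np

def neighborhoods(A):
    """1-hop neighborhood of each vertex, including the vertex itself."""
    n = len(A)
    return [{i} | {j for j in range(n) if A[i][j]} for i in range(n)]

def dp_vertex_similarities(A, base_sim, levels=2):
    """table[i, j] = level-`levels` similarity of vertices i and j.

    Level l+1 only LOOKS UP level-l entries -- nothing is recomputed,
    which is the whole point of the dynamic program."""
    n = len(A)
    nbr = neighborhoods(A)
    table = np.array([[base_sim(i, j) for j in range(n)] for i in range(n)])
    for _ in range(levels):
        new = np.empty_like(table)
        for i in range(n):
            for j in range(n):
                # stand-in for the FLG comparison of the two neighborhoods
                new[i, j] = np.mean([table[a, b] for a in nbr[i] for b in nbr[j]])
        table = new
    return table

# Path graph 0-1-2: the two endpoints are structurally identical.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
sims = dp_vertex_similarities(A, base_sim=lambda i, j: float(i == j))
print(sims.round(3))
```

Each level touches all n^2 vertex pairs, so across L levels the number of (stand-in) kernel evaluations is Ln^2, matching the count in the notes.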
36. Speedup
● At each level of DP,
– compute the Gram matrix between all vertices of all graphs
– eigendecompose the Gram matrix to get basis vectors
– then, project all vertices onto the same space
37. Speedup
● At each level of DP,
– compute the Gram matrix between all vertices of all graphs
– eigendecompose the Gram matrix to get basis vectors
– then, project all vertices onto the same space
● But decomposing the huge Gram matrix is costly,
and the # of basis vectors increases
38. Speedup
● At each level of DP,
– compute the Gram matrix between all vertices of all graphs
– then, project all vertices onto the same space
– sample only some vertices and select only the top basis vectors
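The sampling trick reads like a Nyström-style low-rank approximation; here is a sketch under that reading (sizes, names, and the Gaussian base kernel are purely illustrative, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

def project_all(K_full_vs_sample, K_sample, P):
    """Project every vertex onto the top-P eigenbasis of a sampled Gram matrix.

    K_sample: N x N Gram matrix of the N sampled vertices.
    K_full_vs_sample: (all vertices) x N kernel evaluations.
    Only the small N x N matrix is eigendecomposed, never the full
    all-vertices Gram matrix."""
    w, V = np.linalg.eigh(K_sample)
    idx = np.argsort(w)[::-1][:P]             # keep top-P eigenpairs
    w, V = w[idx], V[:, idx]
    return K_full_vs_sample @ V / np.sqrt(w)  # Nystrom-style feature map

# Toy data: 200 "vertices" under a Gaussian kernel, 20 of them sampled.
X = rng.normal(size=(200, 5))
S = X[rng.choice(200, size=20, replace=False)]
gauss = lambda A, B: np.exp(-((A[:, None] - B[None]) ** 2).sum(-1) / 2)
feats = project_all(gauss(X, S), gauss(S, S), P=5)
print(feats.shape)  # every vertex now has a P-dimensional coordinate
```

The cubic cost of the eigendecomposition now depends on the sample size N rather than the total number of vertices, which is where the claimed speedup comes from.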
39. Speedup
● At each level of DP,
– compute the Gram matrix between all vertices of all graphs
– then, project all vertices onto the same space
– sample only some vertices and select only the top basis vectors
● Time complexity for comparing M graphs
40. Speedup
● At each level of DP,
– compute the Gram matrix between all vertices of all graphs
– then, project all vertices onto the same space
– sample only some vertices and select only the top basis vectors
● Time complexity for comparing M graphs
– speedup applies only when all graphs are known in advance
41. Speedup
● At each level of DP,
– compute the Gram matrix between all vertices of all graphs
– then, project all vertices onto the same space
– sample only some vertices and select only the top basis vectors
● Time complexity for comparing M graphs
– speedup applies only when all graphs are known in advance
– large graphs might require a larger sample size and
more basis vectors
44. Conclusion
● Refreshingly novel approach, almost too dense to
fit into just 9 pages
● However, it
– is hard to scale to large datasets,
– requires node features, and
– requires all graphs to be known in advance to be computationally feasible
45. Conclusion
● Refreshingly novel approach, almost too dense to
fit into just 9 pages
● Never mind: their solid theoretical contribution
makes up for its limited use in practice.
● However, it
– is hard to scale to large datasets,
– requires node features, and
– requires all graphs to be known in advance to be computationally feasible
46. The Multiscale Laplacian Graph Kernel
What a graph kernel does: given two graphs, it returns a similarity score indicating how similar they are. This paper claims to be the first graph kernel that can compare the structure of two graphs at multiple scales, but they are actually wrong: a reviewer also pointed out that the WL kernel, which we covered before, can already do multiscale comparison.
Still, the method is interesting and the theory is solid, so it got into NIPS anyway.
48. Why Compare Graphs At Multiscale?
[Figure: structural formulas of the two C2H6O isomers]
Q: which one of the following is ethanol?
Why would comparing two graphs require looking at multiple scales? Take a question that often shows up in high-school chemistry exams: both of the compounds below are C2H6O; which one is the alcohol?
49. Why Compare Graphs At Multiscale?
[Figure: structural formulas of the two C2H6O isomers]
● local structures are critical
at specific positions in the graph
You can tell them apart just from the OH bond, or by whether the ether's oxygen atom sits at the center of symmetry. In other words, to compare two graphs, e.g. for graph classification, you not only need to know what a local structure looks like but may also need to know where that local structure sits in the graph, e.g. whether the OH appears at the end.
50. Why Compare Graphs At Multiscale?
[Figure: two network graphs, labeled phishing and trojan]
● local structures are critical
at specific positions in the graph
51. Why Compare Graphs At Multiscale?
[Figure: reply networks of a discussion-based and a QA-based community]
● local structures are critical
at specific positions in the graph
● global properties roughly summarize the graph
Global, coarse properties of the whole network also matter. For example, these are the discussion patterns of two different kinds of communities: the center node is the person who posts, and everyone who replies to them gets a link. In QA-based communities such as Stack Overflow, a question usually gets a precise answer; in discussion-based communities, e.g. political debates, there is usually no correct answer, and the repliers may end up arguing with each other, so compared to QA-based communities there are more interlinks.
52. A Good Graph Kernel Should...
● Be able to detect local structures
and their relative positions in the graph
● Be able to capture global properties of the graph
● Be efficient to compute (this work → X)
A good graph kernel should be able to go down to low-level comparison of local structures and also scale up to high-level, coarse comparison, taking relative position into account as it scales up. This paper claims to be the first to achieve both. Finally, it should not take too long to compute in practice; this paper made it as fast as they could, but the time complexity is still very high.
53. Related Work: Spectral Graph Kernels
● Be able to detect local structures
and their relative positions in the graph (?)
● Be able to capture global properties of the graph
● Be efficient to compute (?)
Graph kernels fall into two broad families. One is spectral kernels, which take the spectrum of the whole graph's adjacency matrix or Laplacian matrix. These kernels only capture very rough comparisons, like the network's overall shape or the number of connected components and other global properties, and are not sensitive to local structure.
54. Related Work: Local Graph Kernels
● Be able to detect local structures
and their relative positions in the graph
● Be able to capture global properties of the graph
● Be efficient to compute (only the WL kernel)
The other family is bag-of-structures kernels: define several kinds of small structures, then count how many times each appears in the graph. This of course can detect local structures, but it cannot scale up to know where a local structure sits in the graph. And apart from WL, all the other bag-of-structures kernels run very slowly.
55. Problem Formulation
● How to define a graph kernel that can take
structure into account at multiple scales?
How do we define a kernel that can compare the structures of two graphs at multiple scales?
56. Main Idea
● “Compare graphs by subgraphs”, recursively
– two graphs are compared via their subgraphs
● two subgraphs are compared via smaller subgraphs
– … and so on
The main idea: judging how similar two graphs are can go down to comparing how similar their subgraphs are. So why not recurse? Comparisons between the subgraphs of the two graphs can themselves go down to comparing subgraphs of subgraphs.
57. Laplacian Graph Kernel
● Construct a Gaussian graphical model
They introduce a so-called Laplacian graph kernel: represent a graph as a Markov random field, and with the node potentials and edge potentials defined in this way, the joint distribution of these random variables…
58. Laplacian Graph Kernel
● Construct a Gaussian graphical model
…can be written as a Gaussian distribution centered at the origin whose covariance matrix is the inverse of the Laplacian matrix.
59. Laplacian Graph Kernel
● To compare two graphs G1, G2
So to compare the two graphs, I need to compare the two distributions.
60. Laplacian Graph Kernel
● To compare two graphs G1, G2
Use the Bhattacharyya kernel to compare the distributions
This paper uses a kernel called the Bhattacharyya kernel to compare the two distributions; it is computed as follows.
61. Laplacian Graph Kernel
● To compare two graphs G1, G2
Has a closed form for Gaussian distributions
When both distributions are Gaussian this can be written in closed form, but we know the Laplacian matrix always has a zero eigenvalue, so in the denominator, inverting the Laplacian matrix and taking the determinant blows up, which is why a regularizer is needed.
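As I reconstruct the closed form referred to above (for the zero-mean Gaussians N(0, Σ_i); the η-regularization is what keeps the determinants finite despite the Laplacian's zero eigenvalue):

```latex
k(G_1,G_2)
  = \int \sqrt{p_1(x)\,p_2(x)}\,dx
  = \frac{\bigl|\bigl(\tfrac{1}{2}\Sigma_1^{-1}+\tfrac{1}{2}\Sigma_2^{-1}\bigr)^{-1}\bigr|^{1/2}}
         {|\Sigma_1|^{1/4}\,|\Sigma_2|^{1/4}},
  \qquad \Sigma_i = (L_i + \eta I)^{-1}.
```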
65. Multiscale Laplacian Graph Kernel
● feature space Laplacian graph kernel [hint: node features]
– still captures only global structure;
how to compare structures at multiple scales?
So how do we improve this kernel so that it can take structure at different scales into account?
66. Multiscale Laplacian Graph Kernel
● feature space Laplacian graph kernel
– induce feature vectors from similarity scores
calculated on smaller neighborhoods
Recall the main idea: comparing G1 and G2 comes down to comparing their subgraphs, so why not let the vertex features U1, U2 encode the results of those subgraph comparisons? But a vertex feature has to be tied to a vertex; how do I link a subgraph-comparison result to its corresponding vertex? The intuitive choice is to define the subgraph as the vertex's neighborhood.
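The idea above can be compressed into a recursion (my notation, not the paper's: N(v) is the neighborhood subgraph of v, u_v its raw node feature, and k^{(l)} the level-l vertex kernel):

```latex
k^{(0)}(v,v') = \kappa\bigl(u_v, u_{v'}\bigr)
\quad\text{(base kernel on raw node features)},
\qquad
k^{(l+1)}(v,v') = k_{\mathrm{FLG}}\bigl(N(v),\,N(v')\bigr),
```

where the FLG comparison of N(v) and N(v') uses vertex features induced from the level-l Gram matrix of k^{(l)}.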
77. Speedup
● Recursive approach compares the same
subgraph pairs multiple times
Can we speed this up? The top-down recursive computation clearly contains a lot of repeated work; for instance, in these two kernel evaluations the top and bottom red graphs are the 2-hop neighborhoods of different vertices,
78. Speedup
● Recursive approach compares the same
subgraph pairs multiple times
● Can you speed up the computation?
but they share many vertices, so the computation that unfolds beneath them is duplicated. Anyone who has properly studied algorithms will have a sense of how to speed up this kind of problem.
79. Speedup
● Dynamic programming
– compute all 1st-level pairs, then let 2nd-level
comparisons look up 1st-level pairs, … and so on
Exactly: dynamic programming. Start from the bottom level; at each level directly compute the pairwise neighborhood similarities of all vertices, then work up to level L. The total number of kernel evaluations drops to Ln^2, and each kernel evaluation must decompose an inverse Laplacian matrix, which costs n^3, so the overall complexity is O(Ln^5).
80. Speedup
● Dynamic programming
– compute all 1st-level pairs, then let 2nd-level
comparisons look up 1st-level pairs, … and so on
– for a dataset with M graphs, requires O(LM^2 n^5)
● Still too slow; how to speed up further?
And that was only two graphs: if my dataset has M graphs, the complexity is O(LM^2 n^5), which is still very high for a graph kernel. With graphs of around 100 vertices on average it exceeds 10^10 and takes hours to compute. So can we speed it up further? Think carefully: there is still duplicated work.
81. Speedup
● At each level of DP,
– compute the Gram matrix between all vertices of all graphs
– eigendecompose the Gram matrix to get basis vectors
– then, project all vertices onto the same space
Their speedup will make you smile: at each level of comparison they want to construct the features of every vertex of every graph at once, so they compute one enormous Gram matrix containing the pairwise similarities between all vertices of all graphs in the dataset.
82. Speedup
● At each level of DP,
– compute the Gram matrix between all vertices of all graphs
– eigendecompose the Gram matrix to get basis vectors
– then, project all vertices onto the same space
● But decomposing the huge Gram matrix is costly,
and the # of basis vectors increases
But once the dataset grows even a little, say to 1000 graphs, this Gram matrix no longer fits in memory, let alone its eigenvalue decomposition. Compared with constructing the vertex features of only two graphs at a time, this huge Gram matrix contains many more eigenvectors, so the effect is not the same as pairwise comparison.
Their complexity is simply too extreme, though, so they set this issue aside and reduce the complexity first.
Any ideas?
83. Speedup
● At each level of DP,
– compute the Gram matrix between all vertices of all graphs
– then, project all vertices onto the same space
– sample only some vertices and select only the top basis vectors
Their approach: sample only N vertices, take the eigenvectors of the top-P eigenvalues as the basis, and project all vertices at once onto the space spanned by these P basis vectors.
84. Speedup
● At each level of DP,
– compute the Gram matrix between all vertices of all graphs
– then, project all vertices onto the same space
– sample only some vertices and select only the top basis vectors
● Time complexity for comparing M graphs
The final complexity is the dataset size, times the L levels, times the cube of the sample size (or of the number of basis vectors). Either way, for a dataset of 1000 graphs with a sample size of 100, it can still take hours.
85. Speedup
● At each level of DP,
– compute the Gram matrix between all vertices of all graphs
– then, project all vertices onto the same space
– sample only some vertices and select only the top basis vectors
● Time complexity for comparing M graphs
– speedup applies only when all graphs are known in advance
There are still a few problems. First, because this speedup constructs the features of all graphs at once, it cannot be used when comparing against unseen graphs: the vertex features already constructed for the graphs in the dataset would have to be recomputed.
86. Speedup
● At each level of DP,
– compute the Gram matrix between all vertices of all graphs
– then, project all vertices onto the same space
– sample only some vertices and select only the top basis vectors
● Time complexity for comparing M graphs
– speedup applies only when all graphs are known in advance
– large graphs might require a larger sample size and
more basis vectors
Second, it obviously does not scale: for larger graphs, the sample size used for the speedup has to grow, and there will also be more basis vectors.
87. Experiment
– setting
A quick look at the experiments: on small benchmark datasets, they report the performance of the sped-up method with 10 basis vectors and a few hundred sampled vertices. I really wanted to see the computation time, but they did not include it.
88. Conclusion
● Refreshingly novel approach, almost too dense to
fit into just 9 pages
It really is a novel method, full of twists and very satisfying to work through. Nine pages was too few for them: their presentation has not a single figure, and it took me some time to fully grasp the method.
89. Conclusion
● Refreshingly novel approach, almost too dense to
fit into just 9 pages
● However, it
– is hard to scale to large datasets,
– requires node features, and
– requires all graphs to be known in advance to be computationally feasible
But the complexity is too high, so it is hard to use in practice. Also, to compute pairwise vertex similarities they need extra node features, which we do not. And when the graph to be compared is unseen, the required computation time goes back up to the fifth power of the number of nodes.
90. Conclusion
● Refreshingly novel approach, almost too dense to
fit into just 9 pages
● Never mind: their solid theoretical contribution
makes up for its limited use in practice.
● However, it
– is hard to scale to large datasets,
– requires node features, and
– requires all graphs to be known in advance to be computationally feasible
But none of that matters: what the machine learning community appreciates is exactly this kind of theoretically solid method, so these practical shortcomings, and even the mistake in their related-work discussion, can all be forgiven.