The Multiscale Laplacian Graph Kernel
NIPS 2016
Risi Kondor
Dept. of CS and Statistics,
University of Chicago
Horace Pan
Dept. of CS,
University of Chicago
3 citations
This Work Focuses on Graph Kernels
Why Compare Graphs At Multiscale?
[Figure: the two C2H6O isomers drawn as structural formulas, dimethyl ether (CH3-O-CH3) and ethanol (CH3-CH2-OH)]
Q: which one of the following is ethanol?
Why Compare Graphs At Multiscale?
[Figure: the same two C2H6O structural formulas]
● local structures are critical
at specific positions of the graph
Why Compare Graphs At Multiscale?
[Figure: two program graphs, labeled trojan and phishing]
● local structures are critical
at specific positions of the graph
Why Compare Graphs At Multiscale?
[Figure: two reply networks, labeled QA-based and discussion-based]
● local structures are critical
at specific positions of the graph
● global property roughly summarizes the graph
A Good Graph Kernel Should...
● Be able to detect local structures
and their relative positions in the graph
● Be able to capture global properties of the graph
● Be efficient to compute (this work → X)
Related Work: Spectral Graph Kernels
● Be able to detect local structures
and their relative positions in the graph (?)
● Be able to capture global properties of the graph
● Be efficient to compute (?)
Related Work: Local Graph Kernels
● Be able to detect local structures
and their relative positions in the graph
● Be able to capture global properties of the graph
● Be efficient to compute (only the WL kernel)
Problem Formulation
● How to define a graph kernel that can take
structure into account at multiple scales?
Main Idea
● “Compare graphs by subgraphs”, recursively
– two graphs are compared by their subgraphs
● two subgraphs are compared by smaller subgraphs
– … and so on
Laplacian Graph Kernel
● Construct a Gaussian graphical model
Laplacian Graph Kernel
● Construct a Gaussian graphical model
G1 G2
● To compare two graphs
Laplacian Graph Kernel
Laplacian Graph Kernel
G1 G2
Use the Bhattacharyya kernel to compare the two distributions
● To compare two graphs
Laplacian Graph Kernel
G1 G2
Has a closed form for Gaussian distributions
● To compare two graphs
Laplacian Graph Kernel
● Laplacian graph kernel
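The slide shows the kernel only as an image; a LaTeX reconstruction from the definitions described in the talk (a zero-mean Gaussian per graph whose covariance is the inverse of the regularized Laplacian, compared via the Bhattacharyya kernel; η is the regularization constant discussed in the speaker notes):

$$
p_i(x) \propto \exp\!\big(-\tfrac{1}{2}\, x^\top (L_i + \eta I)\, x\big),
\qquad S_i = (L_i + \eta I)^{-1},
$$
$$
k_{LG}(G_1, G_2) = \int \sqrt{p_1(x)\, p_2(x)}\, dx
= \frac{\big|\tfrac{1}{2}S_1^{-1} + \tfrac{1}{2}S_2^{-1}\big|^{-1/2}}{|S_1|^{1/4}\, |S_2|^{1/4}}.
$$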
Laplacian Graph Kernel
● Laplacian graph kernel
– cannot compare graphs of different sizes
– sensitive to vertex ordering
How to overcome it? [if node features are given]
● feature space Laplacian graph kernel
Multiscale Laplacian Graph Kernel
still captures only global structure;
how do we compare structures at multiple scales?
● feature space Laplacian graph kernel
Multiscale Laplacian Graph Kernel
[hint: node features]
Multiscale Laplacian Graph Kernel
● feature space Laplacian graph kernel
induce feature vectors from similarity scores
computed on smaller neighborhoods
Multiscale Laplacian Graph Kernel
● Generalized FLG
Then, construct vertex features by
calculating the joint Gram matrix
Multiscale Laplacian Graph Kernel
eigendecompose
● Generalized FLG
Multiscale Laplacian Graph Kernel
used as the base kernel in larger-scale comparisons
● Generalized FLG
– captures the neighborhood similarity of the two vertices
– Each entry in the Gram matrix:
[Figure: recursion tree of kernel evaluations. The top-level comparison of G1 and G2, with n1 + n2 vertices, expands into comparisons GL(u) vs GL(v) of L-hop neighborhoods for every vertex pair; each of those expands into comparisons GL-1(uu) vs GL-1(vv) over the |GL(u)| + |GL(v)| vertices inside them, and so on down to base-kernel comparisons G0(uuu) vs G0(vvv) at the leaves]
Speedup
● Recursive approach compares the same
subgraph pairs multiple times
Speedup
● Recursive approach compares the same
subgraph pairs multiple times
● Can we speed up the computation?
Speedup
● Dynamic programming
– compute all 1st-level pairs, then the 2nd level
looks up the 1st-level results, … and so on
– requires O(Ln⁵) time for a pair of graphs
Speedup
● Dynamic programming
– compute all 1st-level pairs, then the 2nd level
looks up the 1st-level results, … and so on
– requires O(Ln⁵) time for a pair of graphs
– For a dataset of M graphs, requires O(LM²n⁵)
● Still too slow; how do we speed it up further?
Speedup
● At each level of DP,
– compute the Gram matrix between all vertices of all graphs
– then, project all vertices onto the same space
– eigendecompose the Gram matrix to get basis vectors
Speedup
● At each level of DP,
– compute the Gram matrix between all vertices of all graphs
– then, project all vertices onto the same space
● But decomposing the huge Gram matrix is costly,
and the number of basis vectors grows
– eigendecompose the Gram matrix to get basis vectors
Speedup
● At each level of DP,
– compute the Gram matrix between all vertices of all graphs
– then, project all vertices onto the same space
– sample only a subset of vertices and select top basis vectors
Speedup
● At each level of DP,
– compute the Gram matrix between all vertices of all graphs
– then, project all vertices onto the same space
– sample only a subset of vertices and select top basis vectors
● Time complexity for comparing M graphs
Speedup
● At each level of DP,
– compute the Gram matrix between all vertices of all graphs
– then, project all vertices onto the same space
– sample only a subset of vertices and select top basis vectors
● Time complexity for comparing M graphs
– speeds up only when all graphs are known in advance
Speedup
● At each level of DP,
– compute the Gram matrix between all vertices of all graphs
– then, project all vertices onto the same space
– sample only a subset of vertices and select top basis vectors
● Time complexity for comparing M graphs
– speeds up only when all graphs are known in advance
– large graphs might require a larger sample size
and more basis vectors
Experiment
– setting: small benchmark datasets; the speedup uses 10 basis vectors
and a few hundred sampled vertices
[Results table shown as an image in the slides]
Conclusion
● A refreshingly novel approach, so dense it barely
fits into just 9 pages
Conclusion
● A refreshingly novel approach, so dense it barely
fits into just 9 pages
● However, it
– is hard to scale to large datasets
– requires node features, and
– requires all graphs to be known for the computation to be feasible
Conclusion
● A refreshingly novel approach, so dense it barely
fits into just 9 pages
● Never mind: their solid theoretical contribution
makes up for the limited use in practice
● However, it
– is hard to scale to large datasets
– requires node features, and
– requires all graphs to be known for the computation to be feasible
Speaker Notes

The Multiscale Laplacian Graph Kernel (title slide)
A graph kernel takes two graphs and returns a similarity score that says how similar they are. This paper claims to be the first graph kernel able to compare the structure of two graphs at multiple scales, but that claim is wrong: as a reviewer also pointed out, the WL kernel presented earlier can already do multiscale comparison. Still, the method is interesting and the theory is solid, so it got into NIPS.
Why Compare Graphs At Multiscale?
Why would comparing two graphs require looking at different scales? Take a question that high-school chemistry exams love: both of the compounds below have formula C2H6O; which one is the alcohol?
Why Compare Graphs At Multiscale?
You can tell them apart just from the OH bond, or from whether the ether's oxygen sits at the center of symmetry. In other words, to compare two graphs, say for graph classification, you not only need to know what a local structure looks like, you may also need to know where that local structure sits in the graph, for example whether the OH appears at the end of the chain.
Why Compare Graphs At Multiscale?
Coarse global properties of the whole network also matter. These are the discussion patterns of two different kinds of communities: the central node is the original poster, and everyone who replies to them gets a link. In a QA-based community such as Stack Overflow, a question usually receives a precise answer; in a discussion-based one, say a political debate, there is usually no correct answer and the repliers may start arguing with one another, so there are more interlinks than in the QA-based case.
A Good Graph Kernel Should...
A good graph kernel should be able to go down to low-level comparison of local structures and also scale up to coarse high-level comparison, taking relative-position information into account as it scales up. This paper claims to be the first to achieve both. Finally, it must not take too long in practical use; this work is already as fast as the authors could make it, yet the time complexity is still very high.
Related Work: Spectral Graph Kernels
Graph kernels fall into two broad families. One is spectral kernels, which take the spectrum of the whole graph's adjacency or Laplacian matrix. These kernels only capture very coarse, global properties, such as the rough shape of the network or the number of connected components, and are not sensitive to local structure.
Related Work: Local Graph Kernels
The other family is bag-of-structures kernels: define several kinds of small structures, then count how many times each occurs in the graph. This of course detects local structures, but it cannot scale up to tell where a local structure sits in the graph. And apart from the WL kernel, bag-of-structures kernels all run slowly.
Problem Formulation
How do we define a kernel that can compare the structure of two graphs at multiple scales?
Main Idea
The main idea: how similar two graphs are can be reduced to how similar their subgraphs are. So why not recurse? Comparing two subgraphs can in turn be reduced to comparing their subgraphs' subgraphs, and so on.
Laplacian Graph Kernel
They introduce the Laplacian graph kernel. A graph is represented as a Markov random field, and with the node and edge potentials defined as on the slide, the joint distribution of the random variables can be written as a Gaussian centered at the origin whose covariance matrix is the inverse of the graph Laplacian.
Laplacian Graph Kernel
So to compare two graphs, we need to compare two distributions.
Laplacian Graph Kernel
The paper uses the Bhattacharyya kernel to compare the two distributions; it is computed as shown on the slide.
Laplacian Graph Kernel
When both distributions are Gaussian, the kernel has a closed form. But we know the Laplacian matrix always has a zero eigenvalue, so inverting the Laplacian and taking the determinant in the denominator is ill-defined.
Laplacian Graph Kernel
Their fix is to regularize the distribution by adding a constant multiple of the identity matrix to the covariance. With that, this is the Laplacian graph kernel.
Laplacian Graph Kernel
This is not multiscale yet; we will improve the kernel step by step. First let us fix one problem, which causes two limitations. The first limitation is that the two graphs being compared must have the same size: look at the numerator, where the inverses of the two Laplacians must have the same dimension to be added together. The second is that the kernel is not invariant under vertex reordering: a vertex ordering is only there so we can write down the adjacency matrix, yet swapping two vertices changes the corresponding Laplacian.
Multiscale Laplacian Graph Kernel
Feature-space Laplacian graph kernel: multiply the inverse of the Laplacian matrix on both sides by the node-feature matrix.
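A minimal NumPy sketch of this feature-space kernel, combining the covariance S_i = U_i^T (L_i + ηI)^{-1} U_i with the Bhattacharyya closed form above. The ridge term gamma and all names are our additions for the sketch, not necessarily the paper's exact regularization:

```python
import numpy as np

def flg_kernel(L1, U1, L2, U2, eta=0.1, gamma=0.1):
    """Feature-space Laplacian graph kernel (sketch).

    L_i is the n_i x n_i graph Laplacian of G_i and U_i its n_i x d
    node-feature matrix; eta regularizes the singular Laplacian and
    gamma keeps the d x d feature-space covariance invertible
    (both parameter names are ours, not the paper's).
    """
    d = U1.shape[1]
    S1 = U1.T @ np.linalg.inv(L1 + eta * np.eye(len(L1))) @ U1 + gamma * np.eye(d)
    S2 = U2.T @ np.linalg.inv(L2 + eta * np.eye(len(L2))) @ U2 + gamma * np.eye(d)
    # Bhattacharyya kernel between the zero-mean Gaussians N(0, S1) and N(0, S2)
    M = 0.5 * np.linalg.inv(S1) + 0.5 * np.linalg.inv(S2)
    return np.linalg.det(M) ** -0.5 / (np.linalg.det(S1) ** 0.25 * np.linalg.det(S2) ** 0.25)

# Example: compare a 3-vertex path graph with itself, using random 2-d features
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
U = np.random.default_rng(0).normal(size=(3, 2))
print(flg_kernel(L, U, L, U))
```

Because the comparison happens on the d x d feature-space covariances, the two graphs no longer need the same number of vertices.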
Multiscale Laplacian Graph Kernel
So how do we improve this kernel further, so that it takes structure at different scales into account? [hint: node features]
Multiscale Laplacian Graph Kernel
Recall the main idea: comparing G1 and G2 comes down to comparing the subgraphs of G1 and G2. So why not let the vertex features U1, U2 represent the results of those subgraph comparisons? A vertex feature must be tied to a vertex, though; how do we link a subgraph-comparison result to the corresponding vertex? The intuitive choice is to define the subgraph as the vertex's neighborhood.
Multiscale Laplacian Graph Kernel
Concretely, for all vertices of the two graphs, compute one joint Gram matrix in which every entry is the similarity of two vertices' neighborhoods, computed with another kernel.
Multiscale Laplacian Graph Kernel
Now we can construct the vertex features. Since the features must be aligned in the same space, eigendecompose the Gram matrix, and let Q be the eigenvectors of the top eigenvalues, each scaled by the square root of its eigenvalue. The vertex features of G1 are then the columns of Q corresponding to G1's vertices.
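In code, this feature-construction step is small. A minimal sketch, assuming the joint Gram matrix K over the n1 + n2 vertices has already been computed and p eigenpairs are kept (the function name vertex_features is ours):

```python
import numpy as np

def vertex_features(K, n1, p):
    """Split the joint Gram matrix of the two graphs' (n1 + n2) vertices
    into p-dimensional vertex features living in one shared space."""
    w, V = np.linalg.eigh(K)                             # eigenvalues in ascending order
    Q = V[:, -p:] * np.sqrt(np.clip(w[-p:], 0.0, None))  # top-p eigenvectors, sqrt-scaled
    return Q[:n1], Q[n1:]                                # features for G1 and for G2
```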
Multiscale Laplacian Graph Kernel
This FLG kernel realizes the paper's core idea: the comparison of two graphs G1 and G2 is determined by the results of comparing their subgraphs, and the FLG kernel itself can be used recursively to compute the subgraph Gram matrix. The remaining question: at each level of the multiscale comparison, how large a neighborhood should be compared?
Here is a top-down example of how MLG compares two graphs. Given G1 and G2, evaluating the FLG kernel between them first requires constructing the vertex features, which means obtaining the pairwise neighborhood similarities of all vertices in both graphs. How large a neighborhood? Suppose we go up to L = 2-hop neighborhoods.
So every Gram-matrix entry at this level is the FLG kernel evaluated on a pair of vertices' 2-hop neighborhoods. And computing the similarity of each pair of 2-hop neighborhoods...
...in turn requires first computing, for every pair of vertices inside those two 2-hop neighborhoods, the similarity of their 1-hop neighborhoods.
The expansion continues like this until, at the bottom level, two vertices are compared with the most basic kernel, say cosine similarity. As you can imagine, the time complexity of computing it this way is extremely high. How high?
Suppose G1 and G2 have n1 and n2 vertices respectively. The top level performs C(n1+n2, 2) comparisons of L-hop neighborhoods. How many evaluations does each pair of L-hop neighborhoods trigger below? It is not fixed, but for an upper bound a neighborhood may contain every vertex: if there is an edge between every two vertices, an L-hop neighborhood covers the whole graph, and the same holds at every level down to the bottom. So the overall complexity is O(n^{2L+2}): extremely slow, completely unusable in practice.
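A rough tally of the count just described (worst case in which every neighborhood spans the whole graph, so each level of the recursion multiplies the work by the O(n^2) vertex pairs it spawns):

$$
\underbrace{O(n^2)}_{\text{top-level pairs}} \times \underbrace{O(n^2) \times \cdots \times O(n^2)}_{L\ \text{levels of expansion}} \;=\; O\big(n^{2L+2}\big).
$$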
Speedup
Can we speed this up? The top-down recursive algorithm clearly contains a lot of repeated computation. Take these two kernel evaluations: the upper and lower graphs in red are the 2-hop neighborhoods of two different vertices...
Speedup
...but they share many vertices, so the computation expanded beneath them is duplicated. For this kind of problem, anyone who has properly studied algorithms will have a sense of how to speed it up.
Speedup
Exactly: dynamic programming. Start from the bottom level; at each level, directly compute the pairwise neighborhood similarities of all vertices, and work upward to level L. The number of kernel evaluations drops to Ln^2, and each evaluation has to decompose and invert a Laplacian matrix at n^3 cost, so the overall complexity is O(Ln^5).
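Below is a compact sketch of this bottom-up scheme for a single pair of graphs, reusing the flg_kernel sketch above. The fixed per-level radius, the dense BFS, and the helper names (laplacian, hop_neighborhood, mlg_kernel) are our simplifications; the paper's exact neighborhood growth and normalizations differ:

```python
import numpy as np

def laplacian(A):
    # unnormalized graph Laplacian D - A
    return np.diag(A.sum(axis=1)) - A

def hop_neighborhood(A, v, r):
    # indices of all vertices within r hops of v (dense BFS via matrix products)
    reach = np.zeros(len(A))
    reach[v] = 1.0
    for _ in range(r):
        reach = reach + reach @ (A != 0).astype(float)
    return np.flatnonzero(reach)

def mlg_kernel(A1, X1, A2, X2, levels=2, radius=1, p=8, eta=0.1, gamma=0.1):
    """Bottom-up MLG sketch for one pair of graphs: at every level, build the
    joint Gram matrix of FLG similarities between vertex neighborhoods, then
    eigendecompose it to obtain the next level's vertex features."""
    n1 = len(A1)
    adj = [A1, A2]

    def graph_of(i):
        # map a stacked global index to (graph id, local vertex index)
        return (0, i) if i < n1 else (1, i - n1)

    feats = np.vstack([X1, X2])                  # (n1 + n2) x d stacked node features
    for _ in range(levels):
        N = len(feats)
        K = np.zeros((N, N))
        for i in range(N):
            for j in range(i, N):
                (g, u), (h, v) = graph_of(i), graph_of(j)
                nb_u = hop_neighborhood(adj[g], u, radius)
                nb_v = hop_neighborhood(adj[h], v, radius)
                Lu = laplacian(adj[g][np.ix_(nb_u, nb_u)])
                Lv = laplacian(adj[h][np.ix_(nb_v, nb_v)])
                Uu = feats[nb_u + (0 if g == 0 else n1)]   # feature rows of those vertices
                Uv = feats[nb_v + (0 if h == 0 else n1)]
                K[i, j] = K[j, i] = flg_kernel(Lu, Uu, Lv, Uv, eta, gamma)
        # next-level features: top-p eigenvectors scaled by sqrt(eigenvalue)
        w, V = np.linalg.eigh(K)
        feats = V[:, -p:] * np.sqrt(np.clip(w[-p:], 0.0, None))
    # final comparison of the whole graphs with the level-L features
    return flg_kernel(laplacian(A1), feats[:n1], laplacian(A2), feats[n1:], eta, gamma)
```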
Speedup
And that was just two graphs. If the dataset has M graphs, the complexity is O(LM^2 n^5), which is still very high for a graph kernel: with an average of 100 vertices per graph it already exceeds 10^10 and takes hours to compute. So, can we speed it up further? Think carefully: there is still repeated work.
Speedup
Their speedup may make you smile. At each level of the comparison they construct the features of all vertices of all graphs at once, so they compute one enormous Gram matrix containing the pairwise similarities between the vertices of every graph in the dataset.
Speedup
But once the dataset grows a little, say to 1000 graphs, this Gram matrix no longer fits in memory, never mind eigendecomposing it. Also, compared with constructing vertex features for only two graphs at a time, this huge Gram matrix contains many more eigenvectors, so the effect is not the same as pairwise comparison. Their complexity was simply too extreme, though, so they set this issue aside and lowered the complexity first. Any ideas how?
Speedup
Their approach: sample only N vertices, take just the eigenvectors of the top P eigenvalues as a basis, and then project all vertices at once onto the space spanned by those P basis vectors.
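One standard way to realize this sample-then-project step is a Nystrom-type low-rank approximation. A sketch under that reading; the sampling scheme, scaling, and names below are our illustration, not necessarily the paper's exact construction:

```python
import numpy as np

def sampled_features(kernel_fn, items, n_sample, p, seed=0):
    """p-dimensional features for every item whose inner products
    approximate the full Gram matrix, built from a vertex subsample
    (Nystrom-type approximation; names are ours)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(items), size=min(n_sample, len(items)), replace=False)
    # small Gram matrix over the sampled items only
    Kss = np.array([[kernel_fn(items[i], items[j]) for j in idx] for i in idx])
    w, V = np.linalg.eigh(Kss)
    w, V = w[-p:], V[:, -p:]                        # top-p eigenpairs as the basis
    # cross-similarities between the sample and all items
    Ksa = np.array([[kernel_fn(items[i], x) for x in items] for i in idx])
    # project every item onto the span of the p basis vectors
    return ((V / np.sqrt(np.clip(w, 1e-12, None))).T @ Ksa).T   # N x p features
```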
Speedup
The final complexity is the dataset size, times the L levels, times the cube of the sample size (or the cube of the number of basis vectors). Either way, for a dataset of 1000 graphs with a sample size of 100, it may still take hours to compute.
Speedup
A few problems remain. First, because this speedup constructs the features of all graphs at once, it cannot be used when we want to compare a previously unseen graph: the vertex features already constructed for the graphs in the dataset would have to be recomputed.
Speedup
Second, it obviously does not scale: for larger graphs, the sample size used by the speedup must grow, and more basis vectors have to be selected.
Experiment
Let's take a quick look at the experiments anyway: performance on small benchmark datasets, using the speedup with 10 basis vectors and a few hundred sampled vertices. I would have liked to see the computation time, but they did not report it.
Conclusion
It really is a novel method, full of twists and turns and very satisfying to read; nine pages were too few for them, their presentation did not contain a single figure, and it took me some time to fully work the method out.
Conclusion
However, the complexity is too high for practical use. To compute pairwise vertex similarities they also need extra node features, which we do not. And when the graphs to be compared are unseen, the required computation time climbs back up to the fifth power of the number of vertices.
Conclusion
But never mind: what the machine learning community appreciates is exactly this kind of theoretically solid method, so these practical shortcomings, and even the mistake in their related-work discussion, can all be forgiven.
