We review our recent progress in the development of graph kernels. We discuss the hash graph kernel framework, which makes the computation of kernels for graphs with vertices and edges annotated with real-valued information feasible for large data sets. Moreover, we summarize our general investigation of the benefits of explicit graph feature maps in comparison to using the kernel trick. Our experimental studies on real-world data sets suggest that explicit feature maps often provide sufficient classification accuracy while being computed more efficiently. Finally, we describe how to construct valid kernels from optimal assignments to obtain new expressive graph kernels. These make use of the kernel trick to establish one-to-one correspondences. We conclude with a discussion of our results and their implications for the future development of graph kernels.
Recent Advances in Kernel-Based Graph Classification
1. Recent Advances in Kernel-Based Graph Classification
ECML PKDD 2017, Nectar Track
Nils Kriege, Christopher Morris
June 20, 2017
TU Dortmund University, Algorithm Engineering Group
4. Primer on Graph Kernels
Question
How similar are two graphs?
Definition (Graph Kernel)
Let 𝒢 be a non-empty set of graphs and let k: 𝒢 × 𝒢 → ℝ. Then k is
a graph kernel if there is a real Hilbert space ℋ and a feature map
𝜑: 𝒢 → ℋ such that k(G, H) = ⟨𝜑(G), 𝜑(H)⟩.
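As a minimal sketch of this definition, assume a toy graph representation in which a graph is just a dict from vertex ids to discrete labels (this representation and all function names are illustrative, not taken from the talk). The vertex label histogram then serves as an explicit feature map 𝜑, and the same kernel value can also be obtained without feature vectors by summing a Dirac kernel over all vertex pairs.

```python
from collections import Counter

def phi(labels):
    """Explicit feature map: sparse histogram of vertex labels."""
    return Counter(labels.values())

def k_explicit(g, h):
    """Inner product of the two label histograms."""
    pg, ph = phi(g), phi(h)
    return sum(count * ph[label] for label, count in pg.items())

def k_implicit(g, h):
    """Same kernel without feature vectors: Dirac kernel over all vertex pairs."""
    return sum(1 for lu in g.values() for lv in h.values() if lu == lv)

# Two toy graphs (edges omitted, since this kernel only looks at vertex labels).
G = {0: "C", 1: "C", 2: "O"}
H = {0: "C", 1: "O", 2: "O", 3: "N"}

print(k_explicit(G, H), k_implicit(G, H))  # both routes give the same value
```

The explicit route touches each graph once and then takes a sparse dot product, while the implicit route compares every vertex pair.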
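Explicit vs. Implicit
• Explicit (EX): map G and H to feature vectors 𝜑(G) and 𝜑(H), then compute their inner product to obtain k(G, H)
• Implicit (IM): evaluate a PSD function on G and H directly to obtain k(G, H)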
5. Talk Structure
1 Explicit vs. Implicit Graph Kernels, IEEE ICDM 2014
2 Fast Kernels for Graphs with Continuous Labels, IEEE ICDM 2016
3 Graph Kernels Based on Optimal Assignments, NIPS 2016
4 Outlook/What’s next?
6. Part I: Explicit vs. Implicit Graph Kernels
Challenge
Investigate the benefits of explicit and implicit graph kernels.
N. Kriege, M. Neumann, K. Kersting, and P. Mutzel. “Explicit versus
Implicit Graph Feature Maps: A Computational Phase Transition for
Walk Kernels”. In: IEEE International Conference on Data Mining.
2014, pp. 881–886
N. M. Kriege, M. Neumann, C. Morris, K. Kersting, and P. Mutzel. “A
Unifying View of Explicit and Implicit Feature Maps for Structured
Data: Systematic Studies of Graph Kernels”. In: CoRR abs/1703.00676
(2017). url: http://arxiv.org/abs/1703.00676
7. Part I: Explicit vs. Implicit Graph Kernels
Kernel                                          𝜑    Cont. Labels   Run time
Random Walk [Gärtner et al., 2003]              IM                  𝒪(n^{2𝜔})
Shortest-Path [Borgwardt et al., 2005]          IM                  𝒪(n^4)
Subgraph Matching [Kriege, Mutzel, 2012]        IM                  𝒪(k n^{2k+2})
GraphHopper [Feragen et al., 2013]              IM                  𝒪(n^2 m)
Graphlet [Shervashidze et al., 2009]            EX
NSPDK [Costa et al., 2010]                      EX
Weisfeiler-Lehman [Shervashidze et al., 2011]   EX                  𝒪(hm)
Propagation [Neumann et al., 2016]              EX
Implicit vs. Explicit
• Implicit Kernels: do not scale, extendable to continuous labels
• Explicit Kernels: do scale, only discrete labels
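To make the explicit, discrete-label case concrete, here is a hedged sketch of a Weisfeiler-Lehman subtree feature map, the kernel listed above with run time 𝒪(hm). The graph representation (adjacency dict plus label dict), the shared compression table, and the function names are assumptions made for illustration. This is not a reference implementation.

```python
from collections import Counter

def wl_feature_map(adj, labels, h, table):
    """Count vertex labels over h rounds of Weisfeiler-Lehman refinement.

    adj: dict vertex -> list of neighbours; labels: dict vertex -> label.
    table: dict shared by all graphs, compressing label signatures to ids.
    """
    features = Counter(labels.values())
    current = dict(labels)
    for _ in range(h):
        refined = {}
        for v in adj:
            # Signature: own label plus the sorted multiset of neighbour labels.
            signature = (current[v], tuple(sorted(current[u] for u in adj[v])))
            refined[v] = table.setdefault(signature, len(table))
        current = refined
        features.update(current.values())
    return features

def wl_kernel(g, other, h):
    """Dot product of the two WL feature maps; g and other are (adj, labels)."""
    table = {}
    fg = wl_feature_map(g[0], g[1], h, table)
    fo = wl_feature_map(other[0], other[1], h, table)
    return sum(count * fo[key] for key, count in fg.items())

# Path a-b-a and single edge a-b.
G = ({0: [1], 1: [0, 2], 2: [1]}, {0: "a", 1: "b", 2: "a"})
H = ({0: [1], 1: [0]}, {0: "a", 1: "b"})
print(wl_kernel(G, H, 1))
```

Because labels are compared by exact equality, this construction has no direct counterpart for real-valued labels, which is the limitation the second bullet refers to.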
12. Part I: Explicit vs. Implicit Graph Kernels
Challenge
Investigate the benefits of explicit and implicit graph kernels.
Contribution
• Conditions under which the computation of a
finite-dimensional explicit mapping is possible
• Explicit feature maps for convolution kernels
• Weighted vertex kernels: derived approximate finite-dimensional
explicit feature maps
• Validated theoretical results in experimental study
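The idea behind explicit feature maps for composite kernels can be sketched in a few lines. Assuming two base kernels with known finite-dimensional feature maps (the vectors and function names below are illustrative), the feature map of their sum is the concatenation and the feature map of their product is the flattened outer (tensor) product. Convolution kernels are built from sums and products of base kernels over parts, so their explicit maps can be assembled the same way, at the price of multiplying dimensions.

```python
def dot(u, v):
    """Standard inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))

def concat(u, v):
    """Feature map of the sum kernel k1 + k2."""
    return list(u) + list(v)

def tensor(u, v):
    """Feature map of the product kernel k1 * k2 (flattened outer product)."""
    return [a * b for a in u for b in v]

# Feature vectors of two objects x and y under two base feature maps.
phi1_x, phi1_y = [1, 2], [3, 1]
phi2_x, phi2_y = [0, 1, 2], [1, 1, 1]

k1 = dot(phi1_x, phi1_y)
k2 = dot(phi2_x, phi2_y)

sum_via_map = dot(concat(phi1_x, phi2_x), concat(phi1_y, phi2_y))
prod_via_map = dot(tensor(phi1_x, phi2_x), tensor(phi1_y, phi2_y))
print(k1 + k2, sum_via_map, k1 * k2, prod_via_map)
```

The product map has dimension 2 · 3 = 6 here; with many factors this growth is what limits the explicit approach.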
14. Part I: Explicit vs. Implicit Graph Kernels
[Figure: runtime in seconds (0 to 10) of implicit and explicit computation, plotted against data set size (100 to 300) and label diversity (0 to 60).]
Experimental Results
Discrete Labels: explicit feature maps outperform implicit kernels
(for most kernels and benchmark data sets)
Continuous Labels: approximation by explicit feature maps not
competitive for complex kernels
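One standard way to approximate a kernel on continuous labels by an explicit feature map is random Fourier features for the Gaussian kernel. The sketch below shows only this general technique; it is not claimed to be the exact construction used in the study, and the bandwidth, feature count, and function names are assumptions.

```python
import math
import random

def make_rff(dim, num_features, sigma, seed=0):
    """Return z such that z(x) . z(y) approximates the Gaussian kernel."""
    rng = random.Random(seed)
    ws = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
          for _ in range(num_features)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def z(x):
        return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(ws, bs)]
    return z

def gaussian(x, y, sigma):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

x, y = (1.2, 0.3), (1.6, 0.7)  # two continuous vertex labels
z = make_rff(dim=2, num_features=2000, sigma=1.0)
approx = sum(a * b for a, b in zip(z(x), z(y)))
exact = gaussian(x, y, 1.0)
print(exact, approx)
```

The approximation error shrinks only with the square root of the number of features, which is one reason such maps become expensive when they are nested inside a more complex graph kernel.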
15. Part II: Hash Graph Kernel Framework
Challenge
Design fast, explicit graph kernels that can handle continuous
labels.
C. Morris, N. M. Kriege, K. Kersting, and P. Mutzel. “Faster Kernels for
Graphs with Continuous Attributes via Hashing”. In: IEEE
International Conference on Data Mining. 2016, pp. 1095–1100
[Figure: example graph whose vertices carry continuous attribute vectors: (1.2, 0.3), (9.1, 0.9), (1.6, 0.7), (5.2, 1.0), (5.1, 0.2), (1.0, 0.2).]
16. Part II: Hash Graph Kernel Framework
Challenge
Design fast, explicit graph kernels that can handle continuous
labels.
Kernel                                          𝜑    Cont. Labels   Run time
Random Walk [Gärtner et al., 2003]              IM                  𝒪(n^{2𝜔})
Shortest-Path [Borgwardt et al., 2005]          IM                  𝒪(n^4)
Subgraph Matching [Kriege, Mutzel, 2012]        IM                  𝒪(k n^{2k+2})
GraphHopper [Feragen et al., 2013]              IM                  𝒪(n^2 m)
Graphlet [Shervashidze et al., 2009]            EX
NSPDK [Costa et al., 2010]                      EX
Weisfeiler-Lehman [Shervashidze et al., 2011]   EX                  𝒪(hm)
Propagation [Neumann et al., 2016]              EX
HGK Framework [Morris et al., 2016]             EX                  Linear in BK
22. Part II: Hash Graph Kernel Framework
Contribution: Hash Graph Kernel Framework
• Use explicit instead of implicit kernels, i.e., avoid kernel trick!
• Applicable to a wide range of graph kernel functions
(Weisfeiler-Lehman, Shortest-Path, Graphlet, ...)
• Theoretical approximation bounds
• State-of-the-art classification accuracies but orders of
magnitude faster than implicit kernels
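The following is a rough sketch of the hashing idea, under stated assumptions. Continuous vertex attributes are repeatedly mapped to discrete labels by a randomized hash function, an explicit discrete-label feature map is computed in each round, and the kernel is the average over rounds. The hash family (random projection plus bucketing), the bucket width, and the trivial base kernel (a label histogram that ignores edges) are illustrative choices; a structural kernel such as Weisfeiler-Lehman could take the place of the histogram.

```python
import math
import random
from collections import Counter

def make_hash(dim, width, rng):
    """Random projection followed by bucketing (a locality-sensitive hash)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, width)
    return lambda x: math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / width)

def hash_feature_maps(graphs, iterations=20, width=2.0, seed=0):
    """graphs: list of graphs, each given as a list of vertex attribute vectors."""
    rng = random.Random(seed)
    dim = len(graphs[0][0])
    maps = [Counter() for _ in graphs]
    for it in range(iterations):
        h = make_hash(dim, width, rng)  # one hash function shared by all graphs
        for fmap, attrs in zip(maps, graphs):
            # Base feature map: histogram of the hashed (discrete) labels.
            fmap.update((it, h(x)) for x in attrs)
    return maps

def kernel(fg, fh, iterations=20):
    """Average of the base kernel values over all hashing rounds."""
    return sum(c * fh[key] for key, c in fg.items()) / iterations

# Attribute values borrowed from the example figure; the split into two
# graphs is arbitrary.
G = [(1.2, 0.3), (9.1, 0.9), (1.6, 0.7)]
H = [(5.2, 1.0), (5.1, 0.2), (1.0, 0.2)]
fg, fh = hash_feature_maps([G, H])
print(kernel(fg, fh), kernel(fg, fg), kernel(fh, fh))
```

Each graph is processed once per round and never compared pairwise, so the cost stays linear in that of the base feature map.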
Question
Is there no benefit from employing the kernel trick at all on the
graph domain?
23. Part III: Positive-semidefinite Optimal Assignments
Challenge
Design a valid graph kernel that is based on optimal assignments.
[Figure: two sets X and Y whose elements carry the labels a, b, and c.]
N. M. Kriege, P.-L. Giscard, and R. C. Wilson. “On Valid Optimal
Assignment Kernels and Applications to Graph Classification”. In:
Advances in Neural Information Processing Systems. 2016,
pp. 1615–1623
24. Part III: Positive-semidefinite Optimal Assignments
Intuition
Optimal Assignments are a “natural” measure of similarity.
Definition (Optimal Assignment Kernel)
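Let $\mathcal{B}(X, Y)$ be the set of all bijections between $X, Y \in [\mathcal{S}]^n$. The optimal
assignment kernel on $[\mathcal{S}]^n$ is defined as
$$K_{\mathcal{B}}^{k}(X, Y) = \max_{B \in \mathcal{B}(X, Y)} W(B), \quad \text{where} \quad W(B) = \sum_{(x, y) \in B} k(x, y),$$
and $k$ is a base kernel on $\mathcal{S}$.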
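The definition can be evaluated literally by enumerating all bijections. The sketch below does exactly that, with k(x, y) = min(x, y) on non-negative numbers as an illustrative base kernel (an assumption for the example, not a choice made in the talk). Enumeration takes factorial time and is meant only to make the definition concrete; a practical implementation would solve the assignment problem with a dedicated algorithm.

```python
from itertools import permutations

def optimal_assignment_kernel(X, Y, k):
    """Maximum total base-kernel value over all bijections between X and Y."""
    if len(X) != len(Y):
        raise ValueError("X and Y must have the same number of elements")
    return max(
        sum(k(x, y) for x, y in zip(X, perm))  # W(B) for one bijection B
        for perm in permutations(Y)
    )

def min_kernel(x, y):
    """Illustrative base kernel on non-negative numbers."""
    return min(x, y)

X = [1, 4, 6]
Y = [2, 3, 7]
print(optimal_assignment_kernel(X, Y, min_kernel))
```

For this input the best bijection pairs the elements in sorted order: min(1, 2) + min(4, 3) + min(6, 7) = 10.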
25. Part III: Positive-semidefinite Optimal Assignments
Previous Work:
• Optimal assignment kernels for attributed molecular graphs
[Fröhlich, Wegner, Sieker, Zell, 2005], ICML
• The optimal assignment kernel is not positive definite
[Vert, 2008], CoRR, abs/0801.4061
Problem
Optimal assignments yield indefinite functions.
32. Part III: Positive-semidefinite Optimal Assignments
Definition (Strong Kernel)
A function k : 𝒳 × 𝒳 → ℝ is a strong kernel if for all x, y, z ∈ 𝒳
k(x, y) ≥ min{k(x, z), k(z, y)}.
    a  b  c
a   4  3  1
b   3  5  1
c   1  1  2
• every object is most similar to itself
• strong kernels are indeed PSD
• strong kernels give rise to hierarchies
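The example matrix can be checked against these properties directly. The sketch below tests the strong-kernel inequality for every triple, confirms that each object is most similar to itself, and verifies positive definiteness of this particular 3×3 matrix through its leading principal minors. Function names are illustrative, and the minor test covers only this example rather than the general claim.

```python
from itertools import product

objects = ["a", "b", "c"]
rows = {"a": [4, 3, 1], "b": [3, 5, 1], "c": [1, 1, 2]}
K = {(x, y): rows[x][j] for x in objects for j, y in enumerate(objects)}

def is_strong(k, objs):
    """Check k(x, y) >= min(k(x, z), k(z, y)) for all triples."""
    return all(k[x, y] >= min(k[x, z], k[z, y])
               for x, y, z in product(objs, repeat=3))

def self_similarity_is_maximal(k, objs):
    """Check that every object is most similar to itself."""
    return all(k[x, x] >= k[x, y] for x in objs for y in objs)

def leading_minors_3x3(k, objs):
    """Leading principal minors; all positive means positive definite."""
    a, b, c = objs
    m1 = k[a, a]
    m2 = k[a, a] * k[b, b] - k[a, b] * k[b, a]
    m3 = (k[a, a] * (k[b, b] * k[c, c] - k[b, c] * k[c, b])
          - k[a, b] * (k[b, a] * k[c, c] - k[b, c] * k[c, a])
          + k[a, c] * (k[b, a] * k[c, b] - k[b, b] * k[c, a]))
    return m1, m2, m3

# Raising k(a, c) to 2 breaks the inequality for (x, y, z) = (b, c, a).
K_bad = dict(K)
K_bad["a", "c"] = K_bad["c", "a"] = 2

print(is_strong(K, objects), is_strong(K_bad, objects))
print(leading_minors_3x3(K, objects))
```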
Contribution
• Strong base kernels that guarantee PSD optimal assignment
kernels
• Linear time computation of optimal assignment kernels
• Weisfeiler-Lehman optimal assignment kernels
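A sketch of how a hierarchy can lead to a linear-time computation, under assumptions: the tree below was derived by hand from the example matrix (node names and weights are illustrative), so that k(x, y) equals the total weight of the nodes shared by the root-to-leaf paths of x and y. The optimal assignment value is then obtained as a weighted histogram intersection over the tree nodes and compared with brute-force enumeration. Inputs are lists in which a repeated entry stands for distinct elements that the base kernel cannot tell apart.

```python
from collections import Counter
from itertools import permutations

# Root-to-leaf path of every object and the weight of every tree node.
PATHS = {"a": ["root", "ab", "a"], "b": ["root", "ab", "b"], "c": ["root", "c"]}
WEIGHT = {"root": 1, "ab": 2, "a": 1, "b": 2, "c": 1}

def base_kernel(x, y):
    """Total weight of the nodes shared by the paths of x and y."""
    return sum(WEIGHT[v] for v in PATHS[x] if v in PATHS[y])

def histogram(X):
    """Number of elements of X below each tree node."""
    hist = Counter()
    for x in X:
        hist.update(PATHS[x])
    return hist

def oa_histogram(X, Y):
    """Optimal assignment value as a weighted histogram intersection."""
    hx, hy = histogram(X), histogram(Y)
    return sum(WEIGHT[v] * min(c, hy[v]) for v, c in hx.items())

def oa_brute_force(X, Y):
    """Reference value: maximum over all bijections (factorial time)."""
    return max(sum(base_kernel(x, y) for x, y in zip(X, p))
               for p in permutations(Y))

X, Y = ["a", "b", "c"], ["a", "a", "c"]
print(oa_histogram(X, Y), oa_brute_force(X, Y))
```

Building the histograms takes time linear in the total path length, and no assignment problem has to be solved.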
35. Outlook/What’s next?
Classical Graph Kernels
C. Morris, K. Kersting, and P. Mutzel. “Glocalized Weisfeiler-Lehman
Graph Kernels: Global-Local Feature Maps of Graphs”. In: IEEE
International Conference on Data Mining. 2017
Optimization Based Graph Feature Maps
• Classical: Phase I (Feature Engineering), then Phase II (Classifier)
• End-to-End: Phase I + Phase II (Feature Engineering + Classifier), Optimize Parameters
36. Conclusion
1 Explicit vs. Implicit Kernels
2 Hash Graph Kernel Framework
3 Valid Kernels from Optimal Assignments
Collection of Graph Classification Benchmarks
graphkernels.cs.tu-dortmund.de
37. References I
Kriege, N. M., P.-L. Giscard, and R. C. Wilson. “On Valid Optimal
Assignment Kernels and Applications to Graph Classification”. In:
Advances in Neural Information Processing Systems. 2016,
pp. 1615–1623.
Kriege, N. M. et al. “A Unifying View of Explicit and Implicit Feature
Maps for Structured Data: Systematic Studies of Graph Kernels”. In:
CoRR abs/1703.00676 (2017). url:
http://arxiv.org/abs/1703.00676.
Kriege, N. et al. “Explicit versus Implicit Graph Feature Maps: A
Computational Phase Transition for Walk Kernels”. In: IEEE
International Conference on Data Mining. 2014, pp. 881–886.
Morris, C., K. Kersting, and P. Mutzel. “Glocalized Weisfeiler-Lehman
Graph Kernels: Global-Local Feature Maps of Graphs”. In: IEEE
International Conference on Data Mining. 2017.
38. References II
Morris, C. et al. “Faster Kernels for Graphs with Continuous Attributes
via Hashing”. In: IEEE International Conference on Data Mining.
2016, pp. 1095–1100.