Graph Kernels in Chemoinformatics
Luc Brun
Co authors: Benoit Ga¨uz`ere(MIVIA), Pierre Anthony Grenier(GREYC), Alessia Saggese(MIVIA), Didier
Villemin (LCMT)
Bordeaux,
November, 20 2014
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Plan
1 Introduction
2 Graph Kernels
3 Treelets kernel
4 Cyclic similarity
5 Conclusion
6 Extensions
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 1 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Plan
1 Introduction
2 Graph Kernels
3 Treelets kernel
4 Cyclic similarity
5 Conclusion
6 Extensions
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 1 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Motivation
Structural Pattern Recognition
Rich description of objects
Poor properties of graph’s space does not allow to readily
generalize/combine sets of graphs
Statistical Pattern Recognition
Global description of objects
Numerical spaces with many mathematical properties (metric,
vector space, . . . ).
Motivation
Analyse large famillies of structural and numerical objects using a unified
framework based on pairwise similarity.
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 2 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Chemoinformatics
Usages of computer science for chemistry
Prediction of molecular properties
Prediction of biological activities
Toxicity, cancerous . . .
Prediction of physico-chemical properties.
Boiling point, LogP, ability to capture CO2. . .
Screening :
Synthesis of molecule,
Test and validation of the synthesized
molecules
Virtual Screening :
Selection of a limited set of candidate
molecules,
Synthesis and test of the selected molecules,
Gain of Time and Money
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 3 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Molecular representation
Molecular Graph:
Vertice atoms.
Edges Atomic bonds.
H3
C
Cl O
O
O P
S
O
CH3
O CH3
Cl
P
S
O
C
C
C
C
C C
C
CC
C C
C
C C
O
O O
O
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 4 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Molecular representation
Molecular Graph:
Vertice atoms.
Edges Atomic bonds.
H3
C
Cl O
O
O P
S
O
CH3
O CH3
Cl
P
S
O
C
C
C
C
C C
C
CC
C C
C
C C
O
O O
O
Similarity principle [Johnson and Maggiora, 1990]
“Similar molecules have similar properties.”
Design of similarity measures between molecular graphs.
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 4 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Plan
1 Introduction
2 Graph Kernels
3 Treelets kernel
4 Cyclic similarity
5 Conclusion
6 Extensions
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 4 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Kernel Theory
Kernel k : X × X → R
Symmetric: k(xi , xj ) = k(xj , xi ),
For any dataset D = {x1, . . . , xn}, Gram’s matrix K ∈ Rn×n
is defined by
Ki,j = k(xi , xj ).
Semi-definite positive: ∀α ∈ Rn
, αT
Kα ≥ 0
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 5 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Kernel Theory
Kernel k : X × X → R
Symmetric: k(xi , xj ) = k(xj , xi ),
For any dataset D = {x1, . . . , xn}, Gram’s matrix K ∈ Rn×n
is defined by
Ki,j = k(xi , xj ).
Semi-definite positive: ∀α ∈ Rn
, αT
Kα ≥ 0
Connection with Hilbert spaces, reproducing kernel Hilbert
Spaces [Aronszajn, 1950]:
Relationship scalar product ⇔ kernel :
k(x, x ) = Φ(x), Φ(x ) H.
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 5 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Graph Kernels
Graph kernels in chemoinformatics
Graph kernel k : G × G → R.
Similarity measure between molecular graphs
Natural encoding of a molecule
Implicit embedding in H,
Machine learning methods (SVM, KRR, . . . ).
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 6 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Graph Kernels
Graph kernels in chemoinformatics
Graph kernel k : G × G → R.
Similarity measure between molecular graphs
Natural encoding of a molecule
Implicit embedding in H,
Machine learning methods (SVM, KRR, . . . ).
Natural connection between molecular graphs and machine learning methods
Definition of a similarity measure as a kernel
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 6 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Kernels based on bags of patterns
Graph kernels based on bags of patterns
(1) Extraction of a set of patterns from graphs,
(2) Comparison between patterns,
(3) Comparison between bags of patterns.
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 7 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Plan
1 Introduction
2 Graph Kernels
3 Treelets kernel
4 Cyclic similarity
5 Conclusion
6 Extensions
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 7 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Treelets
Set of labeled subtrees with at most 6 nodes :
Encode branching points
Take labels into account.
Pertinent from a chemical point of view
limited set of substructures
Linear complexity (in O(nd5
)).
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 8 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
enumaration
Labelling
Code encoding the labels,
Defined on each of the 14 structure of treelet,
Computation in constant time.
H3C
CH3
CH3
OOCC
CC
CC
CCC
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 9 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
enumaration
Labelling
Code encoding the labels,
Defined on each of the 14 structure of treelet,
Computation in constant time.
H3C
CH3
CH3
OOCC
CC
CC
CCC
Canonical code :
type of treelet,
canonical code : G9-C1C1C1O1C1C
Labels.
t t ⇔ code(t) = code(t )
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 9 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Definition of the Treelet kernel
Counting treelet’s occurences
ft(G) : Number of occurences of treelet t in G.
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 10 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Definition of the Treelet kernel
Counting treelet’s occurences
ft(G) : Number of occurences of treelet t in G.
Treelet kernel
kT (G, G ) =
t∈T (G)∩T (G )
k(ft(G), ft(G ))
T (G): Set of treelets of G.
k(., .): Kernel between real numbers.
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 10 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Definition of the Treelet kernel
Counting treelet’s occurences
ft(G) : Number of occurences of treelet t in G.
Treelet kernel
kT (G, G ) =
t∈T (G)∩T (G )
kt(G, G )
T (G) : Set of treelets of G.
kt(G, G ) = k(ft(G), ft(G ).
kt(., .) : Similarity according to t.
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 10 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Pattern selection
Prediction problems:
Different properties Different causes.
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 11 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Pattern selection
Prediction problems:
Different properties Different causes.
Chemist approach:
Pattern relevant for a precise property
A priori selection by a chemical expert
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 11 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Pattern selection
Prediction problems:
Different properties Different causes.
Chemist approach:
Pattern relevant for a precise property
A priori selection by a chemical expert
Machine learning approach:
Parcimonious set of relevant treelets
Selection a posteriori induced by data.
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 11 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Multiple kernel learning
Combination of kernels [Rakotomamonjy et al., 2008] :
k(G, G ) =
t∈T
kt(G, G )
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 12 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Multiple kernel learning
Optimal Combination of kernels [Rakotomamonjy et al., 2008] :
kW (G, G ) =
t∈T
wt kt(G, G )
T : Set of treelets extracted from the learning dataset.
wt ∈ [0, 1] : Measure the influence of t
Weighing of each sub kernel
Selection of relevant treelets
Weighs computed for a given property.
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 12 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Experiments
Prediction of the boiling point of acyclic molecules:
183 acyclic molecules :
Learning set: 80 %



10×Validation set: 10 %
Test set: 10 %
Selection of optimal parameters using a grid search
RMSE obtained on the 10 test sets.
OO
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 13 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Experiments
Prediction of the boiling point of acyclic molecules:
M´ethode RMSE (◦
C)
Descriptor based approach [Cherqaoui et al., 1994]* 5,10
Empirical embedding [Riesen, 2009] 10,15
Gaussian kernel based on the graph edit distance [Neuhaus and Bunke, 2007] 10,27
Kernel on paths [Ralaivola et al., 2005] 12,24
Kernel on random walks [Kashima et al., 2003] 18,72
Kernel on tree patterns [Mah´e and Vert, 2009] 11,02
Weisfeiler-Lehman Kernel[Shervaszide, 2012] 14,98
Treelet kernel 6,45
Treelet kernel with MKL 4,22
* leave-one-out
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 14 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Plan
1 Introduction
2 Graph Kernels
3 Treelets kernel
4 Cyclic similarity
5 Conclusion
6 Extensions
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 14 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Cyclic molecular information
Cycles of a molecule
Have a large influence on molecular properties.
Identifies some molecular’s families.
(a) Benz`ene
N
(b) Indole (c) Cyclohexane
Cyclic information must be taken into account.
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 15 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Relevant Cycles
Enumeration of all cycles is NP complete
Cycles of a graph form a vector space on Z
2Z .
HN
HN
O
O
OH
O
O
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 16 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Relevant Cycles
Enumeration of relevant cycles [Vismara, 1995]:
Union of all cyclic bases with a minimal size,
Unique for each graph,
Enumeration in polynomial time,
Kernel on relevant cycles [Horv´ath, 2005].
HN
HN
O
O
OH
O
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 17 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Relevant cycles Graph
Graph of relevant cycles GC(G) [Vismara, 1995] :
Representation of the cyclic system,
Vertices: Relevant cycles,
Edges: intersection between relevant cycles.
HN
HN
O
O
OH
O
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 18 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Relevant cycles Graph
Graph of relevant cycles GC(G) [Vismara, 1995] :
Representation of the cyclic system,
Vertices: Relevant cycles,
Edges: intersection between relevant cycles.
HN
HN
O
O
OH
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 18 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Relevant cycles Graph
Graph of relevant cycles GC(G) [Vismara, 1995] :
Representation of the cyclic system,
Vertices: Relevant cycles,
Edges: intersection between relevant cycles.
HN
HN
O
O
OH
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 18 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Relevant cycles Graph
Similarity between relevant cycle graphs
Enumeration of relevant cycle graph’s treelets.
Adjacency relationship between relevant cycles.
Extracted Patterns :
HN
HN
O
O
OH
Solely cyclic information.
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 19 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Relevant cycle Hypergraph
Connexion between cycles and acyclic parts
Idea : Molecular Representation:
Global,
Explicit cyclic Information
Addition of acyclic parts to the relevant cycle graph.
HN
HN
O
O
OH
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 20 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Relevant cycle Hypergraph
Connexion between cycles and acyclic parts
Idea : Molecular Representation:
Global,
Explicit cyclic Information
Addition of acyclic parts to the relevant cycle graph.
HN
HN
O
O
OH
O
OH
HN
HN
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 20 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Relevant cycle Hypergraph
Problem : An atomic bound can connect an atom to two cycles.
N
S
S
C
O
N
S
S
C
O
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 21 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Relevant cycle Hypergraph
Connexion between cycles and acyclic parts
hypergraph representation of the molecule.
Adjacency relationship: oriented hyper-edges.
N
S
S
C
O
N
S
S
C
O
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 22 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Relevant cycle Hypergraph
Connexion between cycles and acyclic parts
hypergraph representation of the molecule.
Adjacency relationship: oriented hyper-edges.
N
S
S
C
O
N
S
S
C
O
Hypergraph comparison.
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 22 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Treelets on relevant cycle hypergraphs
Adaptation of the treelet kernel:
(1) enumeration of the relevant cycle hypergraph’s treelets without hyper-edges.
N
S
S
C
O
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 23 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Treelets on relevant cycle hypergraphs
Adaptation of the treelet kernel:
(1) enumeration of the relevant cycle hypergraph’s treelets without hyper-edges.
N
S
S
C
O
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 23 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Treelets on relevant cycle hypergraphs
Adaptation of the treelet kernel:
(1) enumeration of the relevant cycle hypergraph’s treelets without hyper-edges.
N
S
S
C
O
S
N
S
C
N C
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 23 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Treelets on relevant cycle hypergraphs
Adaptation of the treelet kernel:
(1) enumeration of the relevant cycle hypergraph’s treelets without hyper-edges.
(2) Contraction of cycles incident to each hyper-edge.
Construction of the reduced relevant cycle hypergraph GRC (G).
N
S
S
C
O
N
S
S
C
O
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 23 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Treelets on relevant cycle hypergraphs
Adaptation of the treelet kernel:
(1) enumeration of the relevant cycle hypergraph’s treelets without hyper-edges.
(2) Contraction of cycles incident to each hyper-edge.
Construction of the reduced relevant cycle hypergraph GRC (G).
(3) Extraction of treelets containing an initial hyper-edge
N
S
S
C
O
N
S
O
S
O
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 23 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Kernel on relevant cycle hypergraphs
Bag of patterns: TCR (G) = T1 ∪ T2
T1: Set of treelets obtained from the hypergraph without hyper-edges.
T2 : Set of treelets extracted from GRC (G).
N
S
S
C
O
S
O
S
N C
O
N
S
C
N C
O
S
N C
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 24 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Kernel on relevant cycle hypergraphs
Bag of patterns: TCR (G) = T1 ∪ T2
T1: Set of treelets obtained from the hypergraph without hyper-edges.
T2 : Set of treelets extracted from GRC (G).
Kernel design:
kT (G, G ) =
t∈TCR (G)∩TCR (G )
wt kt(ft(G), ft(G ))
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 24 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Experiments
Predictive Toxicity Challenge
Molecular graph labelled and composed of cycles.
340 molecules for each dataset
4 classes of animals (MM, FM, MR, FR) :
Learning set 310 mol´ecules,
10×Test set 34 molecules,
Number of molecules correctly classified on the 10 test sets.
M´ethode MM FM MR FR
Treelet kernel (TK) 208 205 209 212
Relevant cycle kernel 209 207 202 228
TK on relevant cycle graph (TC) 211 210 203 232
TK on relevant cycle hypergraph
217 224 207 233
(TCH)
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 25 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Experiments
Predictive Toxicity Challenge
M´ethode MM FM MR FR
Treelet kernel (TK) 208 205 209 212
Relevant cycle kernel 209 207 202 228
TK on relevant cycle graph (TC) 211 210 203 232
TK on relevant cycle hypergraph
217 224 207 233
(TCH)
TK + MKL 218 224 224 250
TC + MKL 216 213 212 237
TCH + MKL 225 229 215 239
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 26 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Experiments
M´ethode MM FM MR FR
Treelet kernel (TK) 208 205 209 212
Relevant cycle kernel 209 207 202 228
TK on relevant cycle graph (TC) 211 210 203 232
TK on relevant cycle hypergraph
217 224 207 233
(TCH)
TK + MKL 218 224 224 250
TCH + MKL 225 229 215 239
Combo TK - TCH 225 230 224 252
Gaussian kernel on graph edit distance [Neuhaus and Bunke, 2007] 223 212 194 234
Laplacien kernel 207 186 209 203
Kernel on paths [Ralaivola et al., 2005]* 223 225 226 234
Weisfeiler-Lehman Kernel [Shervaszide, 2012] 138 154 221 242
* leave-one-out
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 27 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Plan
1 Introduction
2 Graph Kernels
3 Treelets kernel
4 Cyclic similarity
5 Conclusion
6 Extensions
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 27 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Conclusion
General problem:
Kernel between molecular graphs including cyclic and acyclic patterns in order to
predict molecular properties.
Extensions :
Cyclic Information:
Encoding the relative positioning of atoms
R2
R1
R3
R2
R1
R3
Taking into account stereoisometry.
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 28 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Plan
1 Introduction
2 Graph Kernels
3 Treelets kernel
4 Cyclic similarity
5 Conclusion
6 Extensions
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 28 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Kernel between shapes
Compute skeletons
Encode skeletons by graphs,
Kernel between bags of treelets.
a) le graphe
b) la segmentation
induite
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 29 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Kernel on strings
050100150200250300350
0
50
100
150
200
250
300
350
400
450
10
20
30
40
50
60
0 50 100 150 200 250 300 350
0
50
100
150
200
250
300
350
400
450
10
20
30
40
50
60
1
2 6
3
4
7
5 8
9
10
11
13
12
14
1516
17
18
19
20
21
22
23
2425
26
2728
29
30
Decompose the scene into zones,
Compute a string encoding the sequence of traversed zones,
Define a kernel between strings,
Cluster strings.
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 30 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Bibliographie I
Aronszajn, N. (1950).
Theory of reproducing kernels.
Transactions of the American Mathematical Society, 68(3):337–404.
Cherqaoui, D., Villemin, D., Mesbah, A., Cense, J. M., and Kvasnicka, V.
(1994).
Use of a Neural Network to Determine the Normal Boiling Points of Acyclic
Ethers, Peroxides, Acetals and their Sulfur Analogues.
J. Chem. Soc. Faraday Trans., 90:2015–2019.
Horv´ath, T. (2005).
Cyclic pattern kernels revisited.
In Springer-Verlag, editor, Proceedings of the 9th Pacific-Asia Conference on
Knowledge Discovery and Data Mining, volume 3518, pages 791 – 801.
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 30 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Bibliographie II
Johnson, M. A. and Maggiora, G. M., editors (1990).
Concepts and Applications of Molecular Similarity.
Wiley.
Kashima, H., Tsuda, K., and Inokuchi, A. (2003).
Marginalized Kernels Between Labeled Graphs.
Machine Learning.
Mah´e, P. and Vert, J.-p. (2009).
Graph kernels based on tree patterns for molecules.
Machine Learning, 75(1)(September 2008):3–35.
Neuhaus, M. and Bunke, H. (2007).
Bridging the gap between graph edit distance and kernel machines, volume 68
of Series in Machine Perception and Artificial Intelligence.
World Scientific Publishing.
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 31 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Bibliographie III
Rakotomamonjy, A., Bach, F., Canu, S., and Grandvalet, Y. (2008).
SimpleMKL.
Journal of Machine Learning Research, 9:2491–2521.
Ralaivola, L., Swamidass, S. J., Saigo, H., and Baldi, P. (2005).
Graph kernels for chemical informatics.
Neural networks : the official journal of the International Neural Network
Society, 18(8):1093–110.
Riesen, K. (2009).
Classification and Clustering of Vector Space Embedded Graphs.
PhD thesis, Institut f¨ur Informatik und angewandte Mathematik Universit¨at
Bern.
Shervaszide, N. (2012).
Scalable Graph Kernels.
PhD thesis, Universit¨at T¨ubingen.
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 32 / 32
Introduction
Graph Kernels
Treelets kernel
Cyclic similarity
Conclusion
Extensions
Bibliographie IV
Vismara, P. (1995).
Reconnaissance et repr´esentation d’´el´ements structuraux pour la description
d’objets complexes. Application `a l’´elaboration de strat´egies de synth`ese en
chimie organique.
PhD thesis, Universit´e Montpellier II.
L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 33 / 32

Graph kernels

  • 1.
    Graph Kernels inChemoinformatics Luc Brun Co authors: Benoit Ga¨uz`ere(MIVIA), Pierre Anthony Grenier(GREYC), Alessia Saggese(MIVIA), Didier Villemin (LCMT) Bordeaux, November, 20 2014
  • 2.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Plan 1 Introduction 2 Graph Kernels 3 Treelets kernel 4 Cyclic similarity 5 Conclusion 6 Extensions L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 1 / 32
  • 3.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Plan 1 Introduction 2 Graph Kernels 3 Treelets kernel 4 Cyclic similarity 5 Conclusion 6 Extensions L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 1 / 32
  • 4.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Motivation Structural Pattern Recognition Rich description of objects Poor properties of graph’s space does not allow to readily generalize/combine sets of graphs Statistical Pattern Recognition Global description of objects Numerical spaces with many mathematical properties (metric, vector space, . . . ). Motivation Analyse large famillies of structural and numerical objects using a unified framework based on pairwise similarity. L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 2 / 32
  • 5.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Chemoinformatics Usages of computer science for chemistry Prediction of molecular properties Prediction of biological activities Toxicity, cancerous . . . Prediction of physico-chemical properties. Boiling point, LogP, ability to capture CO2. . . Screening : Synthesis of molecule, Test and validation of the synthesized molecules Virtual Screening : Selection of a limited set of candidate molecules, Synthesis and test of the selected molecules, Gain of Time and Money L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 3 / 32
  • 6.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Molecular representation Molecular Graph: Vertice atoms. Edges Atomic bonds. H3 C Cl O O O P S O CH3 O CH3 Cl P S O C C C C C C C CC C C C C C O O O O L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 4 / 32
  • 7.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Molecular representation Molecular Graph: Vertice atoms. Edges Atomic bonds. H3 C Cl O O O P S O CH3 O CH3 Cl P S O C C C C C C C CC C C C C C O O O O Similarity principle [Johnson and Maggiora, 1990] “Similar molecules have similar properties.” Design of similarity measures between molecular graphs. L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 4 / 32
  • 8.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Plan 1 Introduction 2 Graph Kernels 3 Treelets kernel 4 Cyclic similarity 5 Conclusion 6 Extensions L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 4 / 32
  • 9.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Kernel Theory Kernel k : X × X → R Symmetric: k(xi , xj ) = k(xj , xi ), For any dataset D = {x1, . . . , xn}, Gram’s matrix K ∈ Rn×n is defined by Ki,j = k(xi , xj ). Semi-definite positive: ∀α ∈ Rn , αT Kα ≥ 0 L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 5 / 32
  • 10.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Kernel Theory Kernel k : X × X → R Symmetric: k(xi , xj ) = k(xj , xi ), For any dataset D = {x1, . . . , xn}, Gram’s matrix K ∈ Rn×n is defined by Ki,j = k(xi , xj ). Semi-definite positive: ∀α ∈ Rn , αT Kα ≥ 0 Connection with Hilbert spaces, reproducing kernel Hilbert Spaces [Aronszajn, 1950]: Relationship scalar product ⇔ kernel : k(x, x ) = Φ(x), Φ(x ) H. L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 5 / 32
  • 11.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Graph Kernels Graph kernels in chemoinformatics Graph kernel k : G × G → R. Similarity measure between molecular graphs Natural encoding of a molecule Implicit embedding in H, Machine learning methods (SVM, KRR, . . . ). L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 6 / 32
  • 12.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Graph Kernels Graph kernels in chemoinformatics Graph kernel k : G × G → R. Similarity measure between molecular graphs Natural encoding of a molecule Implicit embedding in H, Machine learning methods (SVM, KRR, . . . ). Natural connection between molecular graphs and machine learning methods Definition of a similarity measure as a kernel L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 6 / 32
  • 13.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Kernels based on bags of patterns Graph kernels based on bags of patterns (1) Extraction of a set of patterns from graphs, (2) Comparison between patterns, (3) Comparison between bags of patterns. L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 7 / 32
  • 14.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Plan 1 Introduction 2 Graph Kernels 3 Treelets kernel 4 Cyclic similarity 5 Conclusion 6 Extensions L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 7 / 32
  • 15.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Treelets Set of labeled subtrees with at most 6 nodes : Encode branching points Take labels into account. Pertinent from a chemical point of view limited set of substructures Linear complexity (in O(nd5 )). L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 8 / 32
  • 16.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions enumaration Labelling Code encoding the labels, Defined on each of the 14 structure of treelet, Computation in constant time. H3C CH3 CH3 OOCC CC CC CCC L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 9 / 32
  • 17.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions enumaration Labelling Code encoding the labels, Defined on each of the 14 structure of treelet, Computation in constant time. H3C CH3 CH3 OOCC CC CC CCC Canonical code : type of treelet, canonical code : G9-C1C1C1O1C1C Labels. t t ⇔ code(t) = code(t ) L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 9 / 32
  • 18.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Definition of the Treelet kernel Counting treelet’s occurences ft(G) : Number of occurences of treelet t in G. L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 10 / 32
  • 19.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Definition of the Treelet kernel Counting treelet’s occurences ft(G) : Number of occurences of treelet t in G. Treelet kernel kT (G, G ) = t∈T (G)∩T (G ) k(ft(G), ft(G )) T (G): Set of treelets of G. k(., .): Kernel between real numbers. L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 10 / 32
  • 20.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Definition of the Treelet kernel Counting treelet’s occurences ft(G) : Number of occurences of treelet t in G. Treelet kernel kT (G, G ) = t∈T (G)∩T (G ) kt(G, G ) T (G) : Set of treelets of G. kt(G, G ) = k(ft(G), ft(G ). kt(., .) : Similarity according to t. L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 10 / 32
  • 21.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Pattern selection Prediction problems: Different properties Different causes. L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 11 / 32
  • 22.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Pattern selection Prediction problems: Different properties Different causes. Chemist approach: Pattern relevant for a precise property A priori selection by a chemical expert L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 11 / 32
  • 23.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Pattern selection Prediction problems: Different properties Different causes. Chemist approach: Pattern relevant for a precise property A priori selection by a chemical expert Machine learning approach: Parcimonious set of relevant treelets Selection a posteriori induced by data. L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 11 / 32
  • 24.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Multiple kernel learning Combination of kernels [Rakotomamonjy et al., 2008] : k(G, G ) = t∈T kt(G, G ) L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 12 / 32
  • 25.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Multiple kernel learning Optimal Combination of kernels [Rakotomamonjy et al., 2008] : kW (G, G ) = t∈T wt kt(G, G ) T : Set of treelets extracted from the learning dataset. wt ∈ [0, 1] : Measure the influence of t Weighing of each sub kernel Selection of relevant treelets Weighs computed for a given property. L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 12 / 32
  • 26.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Experiments Prediction of the boiling point of acyclic molecules: 183 acyclic molecules : Learning set: 80 %    10×Validation set: 10 % Test set: 10 % Selection of optimal parameters using a grid search RMSE obtained on the 10 test sets. OO L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 13 / 32
  • 27.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Experiments Prediction of the boiling point of acyclic molecules: M´ethode RMSE (◦ C) Descriptor based approach [Cherqaoui et al., 1994]* 5,10 Empirical embedding [Riesen, 2009] 10,15 Gaussian kernel based on the graph edit distance [Neuhaus and Bunke, 2007] 10,27 Kernel on paths [Ralaivola et al., 2005] 12,24 Kernel on random walks [Kashima et al., 2003] 18,72 Kernel on tree patterns [Mah´e and Vert, 2009] 11,02 Weisfeiler-Lehman Kernel[Shervaszide, 2012] 14,98 Treelet kernel 6,45 Treelet kernel with MKL 4,22 * leave-one-out L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 14 / 32
  • 28.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Plan 1 Introduction 2 Graph Kernels 3 Treelets kernel 4 Cyclic similarity 5 Conclusion 6 Extensions L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 14 / 32
  • 29.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Cyclic molecular information Cycles of a molecule Have a large influence on molecular properties. Identifies some molecular’s families. (a) Benz`ene N (b) Indole (c) Cyclohexane Cyclic information must be taken into account. L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 15 / 32
  • 30.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Relevant Cycles Enumeration of all cycles is NP complete Cycles of a graph form a vector space on Z 2Z . HN HN O O OH O O L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 16 / 32
  • 31.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Relevant Cycles Enumeration of relevant cycles [Vismara, 1995]: Union of all cyclic bases with a minimal size, Unique for each graph, Enumeration in polynomial time, Kernel on relevant cycles [Horv´ath, 2005]. HN HN O O OH O L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 17 / 32
  • 32.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Relevant cycles Graph Graph of relevant cycles GC(G) [Vismara, 1995] : Representation of the cyclic system, Vertices: Relevant cycles, Edges: intersection between relevant cycles. HN HN O O OH O L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 18 / 32
  • 33.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Relevant cycles Graph Graph of relevant cycles GC(G) [Vismara, 1995] : Representation of the cyclic system, Vertices: Relevant cycles, Edges: intersection between relevant cycles. HN HN O O OH L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 18 / 32
  • 34.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Relevant cycles Graph Graph of relevant cycles GC(G) [Vismara, 1995] : Representation of the cyclic system, Vertices: Relevant cycles, Edges: intersection between relevant cycles. HN HN O O OH L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 18 / 32
  • 35.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Relevant cycles Graph Similarity between relevant cycle graphs Enumeration of relevant cycle graph’s treelets. Adjacency relationship between relevant cycles. Extracted Patterns : HN HN O O OH Solely cyclic information. L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 19 / 32
  • 36.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Relevant cycle Hypergraph Connexion between cycles and acyclic parts Idea : Molecular Representation: Global, Explicit cyclic Information Addition of acyclic parts to the relevant cycle graph. HN HN O O OH L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 20 / 32
  • 37.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Relevant cycle Hypergraph Connexion between cycles and acyclic parts Idea : Molecular Representation: Global, Explicit cyclic Information Addition of acyclic parts to the relevant cycle graph. HN HN O O OH O OH HN HN L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 20 / 32
  • 38.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Relevant cycle Hypergraph Problem : An atomic bound can connect an atom to two cycles. N S S C O N S S C O L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 21 / 32
  • 39.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Relevant cycle Hypergraph Connexion between cycles and acyclic parts hypergraph representation of the molecule. Adjacency relationship: oriented hyper-edges. N S S C O N S S C O L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 22 / 32
  • 40.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Relevant cycle Hypergraph Connexion between cycles and acyclic parts hypergraph representation of the molecule. Adjacency relationship: oriented hyper-edges. N S S C O N S S C O Hypergraph comparison. L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 22 / 32
  • 41.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Treelets on relevant cycle hypergraphs Adaptation of the treelet kernel: (1) enumeration of the relevant cycle hypergraph’s treelets without hyper-edges. N S S C O L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 23 / 32
  • 42.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Treelets on relevant cycle hypergraphs Adaptation of the treelet kernel: (1) enumeration of the relevant cycle hypergraph’s treelets without hyper-edges. N S S C O L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 23 / 32
  • 43.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Treelets on relevant cycle hypergraphs Adaptation of the treelet kernel: (1) enumeration of the relevant cycle hypergraph’s treelets without hyper-edges. N S S C O S N S C N C L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 23 / 32
  • 44.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Treelets on relevant cycle hypergraphs Adaptation of the treelet kernel: (1) enumeration of the relevant cycle hypergraph’s treelets without hyper-edges. (2) Contraction of cycles incident to each hyper-edge. Construction of the reduced relevant cycle hypergraph GRC (G). N S S C O N S S C O L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 23 / 32
  • 45.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Treelets on relevant cycle hypergraphs Adaptation of the treelet kernel: (1) enumeration of the relevant cycle hypergraph’s treelets without hyper-edges. (2) Contraction of cycles incident to each hyper-edge. Construction of the reduced relevant cycle hypergraph GRC (G). (3) Extraction of treelets containing an initial hyper-edge N S S C O N S O S O L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 23 / 32
  • 46.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Kernel on relevant cycle hypergraphs Bag of patterns: TCR (G) = T1 ∪ T2 T1: Set of treelets obtained from the hypergraph without hyper-edges. T2 : Set of treelets extracted from GRC (G). N S S C O S O S N C O N S C N C O S N C L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 24 / 32
  • 47.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Kernel on relevant cycle hypergraphs Bag of patterns: TCR (G) = T1 ∪ T2 T1: Set of treelets obtained from the hypergraph without hyper-edges. T2 : Set of treelets extracted from GRC (G). Kernel design: kT (G, G ) = t∈TCR (G)∩TCR (G ) wt kt(ft(G), ft(G )) L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 24 / 32
  • 48.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Experiments Predictive Toxicity Challenge Molecular graph labelled and composed of cycles. 340 molecules for each dataset 4 classes of animals (MM, FM, MR, FR) : Learning set 310 mol´ecules, 10×Test set 34 molecules, Number of molecules correctly classified on the 10 test sets. M´ethode MM FM MR FR Treelet kernel (TK) 208 205 209 212 Relevant cycle kernel 209 207 202 228 TK on relevant cycle graph (TC) 211 210 203 232 TK on relevant cycle hypergraph 217 224 207 233 (TCH) L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 25 / 32
  • 49.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Experiments Predictive Toxicity Challenge M´ethode MM FM MR FR Treelet kernel (TK) 208 205 209 212 Relevant cycle kernel 209 207 202 228 TK on relevant cycle graph (TC) 211 210 203 232 TK on relevant cycle hypergraph 217 224 207 233 (TCH) TK + MKL 218 224 224 250 TC + MKL 216 213 212 237 TCH + MKL 225 229 215 239 L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 26 / 32
  • 50.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Experiments M´ethode MM FM MR FR Treelet kernel (TK) 208 205 209 212 Relevant cycle kernel 209 207 202 228 TK on relevant cycle graph (TC) 211 210 203 232 TK on relevant cycle hypergraph 217 224 207 233 (TCH) TK + MKL 218 224 224 250 TCH + MKL 225 229 215 239 Combo TK - TCH 225 230 224 252 Gaussian kernel on graph edit distance [Neuhaus and Bunke, 2007] 223 212 194 234 Laplacien kernel 207 186 209 203 Kernel on paths [Ralaivola et al., 2005]* 223 225 226 234 Weisfeiler-Lehman Kernel [Shervaszide, 2012] 138 154 221 242 * leave-one-out L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 27 / 32
  • 51.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Plan 1 Introduction 2 Graph Kernels 3 Treelets kernel 4 Cyclic similarity 5 Conclusion 6 Extensions L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 27 / 32
  • 52.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Conclusion General problem: Kernel between molecular graphs including cyclic and acyclic patterns in order to predict molecular properties. Extensions : Cyclic Information: Encoding the relative positioning of atoms R2 R1 R3 R2 R1 R3 Taking into account stereoisometry. L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 28 / 32
  • 53.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Plan 1 Introduction 2 Graph Kernels 3 Treelets kernel 4 Cyclic similarity 5 Conclusion 6 Extensions L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 28 / 32
  • 54.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Kernel between shapes Compute skeletons Encode skeletons by graphs, Kernel between bags of treelets. a) le graphe b) la segmentation induite L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 29 / 32
  • 55.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Kernel on strings 050100150200250300350 0 50 100 150 200 250 300 350 400 450 10 20 30 40 50 60 0 50 100 150 200 250 300 350 0 50 100 150 200 250 300 350 400 450 10 20 30 40 50 60 1 2 6 3 4 7 5 8 9 10 11 13 12 14 1516 17 18 19 20 21 22 23 2425 26 2728 29 30 Decompose the scene into zones, Compute a string encoding the sequence of traversed zones, Define a kernel between strings, Cluster strings. L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 30 / 32
  • 56.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Bibliographie I Aronszajn, N. (1950). Theory of reproducing kernels. Transactions of the American Mathematical Society, 68(3):337–404. Cherqaoui, D., Villemin, D., Mesbah, A., Cense, J. M., and Kvasnicka, V. (1994). Use of a Neural Network to Determine the Normal Boiling Points of Acyclic Ethers, Peroxides, Acetals and their Sulfur Analogues. J. Chem. Soc. Faraday Trans., 90:2015–2019. Horv´ath, T. (2005). Cyclic pattern kernels revisited. In Springer-Verlag, editor, Proceedings of the 9th Pacific-Asia Conference on Knowledge Discovery and Data Mining, volume 3518, pages 791 – 801. L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 30 / 32
  • 57.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Bibliographie II Johnson, M. A. and Maggiora, G. M., editors (1990). Concepts and Applications of Molecular Similarity. Wiley. Kashima, H., Tsuda, K., and Inokuchi, A. (2003). Marginalized Kernels Between Labeled Graphs. Machine Learning. Mah´e, P. and Vert, J.-p. (2009). Graph kernels based on tree patterns for molecules. Machine Learning, 75(1)(September 2008):3–35. Neuhaus, M. and Bunke, H. (2007). Bridging the gap between graph edit distance and kernel machines, volume 68 of Series in Machine Perception and Artificial Intelligence. World Scientific Publishing. L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 31 / 32
  • 58.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Bibliographie III Rakotomamonjy, A., Bach, F., Canu, S., and Grandvalet, Y. (2008). SimpleMKL. Journal of Machine Learning Research, 9:2491–2521. Ralaivola, L., Swamidass, S. J., Saigo, H., and Baldi, P. (2005). Graph kernels for chemical informatics. Neural networks : the official journal of the International Neural Network Society, 18(8):1093–110. Riesen, K. (2009). Classification and Clustering of Vector Space Embedded Graphs. PhD thesis, Institut f¨ur Informatik und angewandte Mathematik Universit¨at Bern. Shervaszide, N. (2012). Scalable Graph Kernels. PhD thesis, Universit¨at T¨ubingen. L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 32 / 32
  • 59.
    Introduction Graph Kernels Treelets kernel Cyclicsimilarity Conclusion Extensions Bibliographie IV Vismara, P. (1995). Reconnaissance et repr´esentation d’´el´ements structuraux pour la description d’objets complexes. Application `a l’´elaboration de strat´egies de synth`ese en chimie organique. PhD thesis, Universit´e Montpellier II. L. Brun (GREYC) Seminar Bordeaux, November, 20 2014 33 / 32