This document discusses graph kernels: positive definite kernels defined on graphs that allow machine learning algorithms to be applied to graph-structured data such as molecules. It covers different types of graph kernels, including subgraph kernels, path kernels, and walk kernels. Walk kernels count the walks two graphs have in common and, unlike subgraph and path kernels, can be computed in polynomial time. The document also discusses using product graphs to compute walk kernels and presents results on classifying mutagenicity using random walk kernels. It concludes by proposing the use of graph kernels and product graphs to define data depth measures for labeled graph ensembles.
To describe the dynamics taking place in networks that structurally change over time, we propose an approach to search for attributes whose value changes impact the topology of the graph. In several applications, the variations of a group of attributes are often followed by structural changes in the graph that they may be assumed to generate. We formalize the triggering pattern discovery problem as a method jointly rooted in sequence mining and graph analysis. We apply our approach to three real-world dynamic graphs of different natures - a co-authoring network, an airline network, and a social bookmarking system - assessing the relevance of the triggering pattern mining approach.
Towards a stable definition of Algorithmic Randomness - Hector Zenil
Although information content is invariant up to an additive constant, the range of possible additive constants applicable to programming languages is so large that in practice it plays a major role in the actual evaluation of K(s), the Kolmogorov complexity of a string s. We present a summary of the approach we've developed to overcome the problem by calculating its algorithmic probability and evaluating the algorithmic complexity via the coding theorem, thereby providing a stable framework for Kolmogorov complexity even for short strings. We also show that reasonable formalisms produce reasonable complexity classifications.
In this paper, we introduce the notions of m-shadow graphs and n-splitting graphs, m ≥ 2, n ≥ 1. We prove that the m-shadow graphs for paths, complete bipartite graphs, and the symmetric product between paths and null graphs are odd graceful. In addition, we show that the n-splitting graphs for paths, stars, and the symmetric product between paths and null graphs are odd graceful. Finally, we present some examples to illustrate the proposed theories.
Information Content of Complex Networks - Hector Zenil
This short talk given in Stockholm, Sweden, explains how algorithmic complexity measures, notably Kolmogorov complexity approximated both by lossless compression algorithms and the Block Decomposition Method (BDM) are capable of characterizing graphs and networks by some of their group-theoretic and topological properties, notably graph automorphism group size and clustering coefficients of complex networks. The method distinguished between models of networks such as regular, random, small-world and scale-free.
Fractal dimension versus Computational Complexity - Hector Zenil
We investigate connections and tradeoffs between two important complexity measures: fractal dimension and computational (time) complexity. We report exciting results applied to space-time diagrams of small Turing machines with precise mathematical relations and formal conjectures connecting these measures. The preprint of the paper is available at: http://arxiv.org/abs/1309.1779
Fractal Dimension of Space-time Diagrams and the Runtime Complexity of Small ... - Hector Zenil
Complexity measures are designed to capture complex behaviour and to quantify how complex that particular behaviour is. If a certain phenomenon is genuinely complex, this means that it does not all of a sudden become simple just by translating the phenomenon to a different setting or framework with a different complexity value. It is in this sense that we expect different complexity measures from possibly entirely different fields to be related to each other. This talk presents our work on a beautiful connection between the fractal dimension of space-time diagrams of Turing machines and their time complexity. Presented at Machines, Computations and Universality (MCU) 2013, Zurich, Switzerland.
Kernelization algorithms for graph and other structure modification problems - Anthony Perez
Thesis defense on November 14th, 2011, in Montpellier.
Jury:
Stéphane Bessy, Bruno Durand, Frédéric Havet, Rolf Niedermeier, Christophe Paul & Ioan Todinca.
Fuzzy clustering algorithms cannot obtain a good clustering effect when the sample characteristics are not obvious, and they need the number of clusters to be determined in advance. For this reason, this paper proposes an adaptive fuzzy kernel clustering algorithm. The algorithm first uses an adaptive function of the clustering number to calculate the optimal number of clusters; the samples of the input space are then mapped to a high-dimensional feature space using a Gaussian kernel and clustered in that feature space. Matlab simulation results confirm that the algorithm performs considerably better than classical clustering algorithms, with faster convergence and more accurate clustering results.
A Numerical Method for the Evaluation of Kolmogorov Complexity, An alternativ... - Hector Zenil
We present a novel alternative method (other than using compression algorithms) to approximate the algorithmic complexity of a string by calculating its algorithmic probability and applying Chaitin-Levin's coding theorem.
High Speed Memory Efficient Multiplier-less 1-D 9/7 Wavelet Filters Based NED... - IJERA Editor
Conventional distributed arithmetic (DA) is popular in field programmable gate array (FPGA) design, and it features on-chip ROM to achieve high speed and regularity. In this paper, we describe a high-speed, area-efficient 1-D discrete wavelet transform (DWT) using a 9/7 filter based on the new efficient distributed arithmetic (NEDA) technique. Being an area-efficient architecture free of ROM, multiplication, and subtraction, NEDA can also expose the redundancy existing in the adder array consisting of entries of 0 and 1. This architecture supports any size of image pixel value and any level of decomposition. The parallel structure has 100% hardware utilization efficiency.
In this lecture, you will learn two of the most popular methods for classifying data points into a finite set of categories. Both methods are based on representing a classifier via its decision boundary, which is a hyperplane. The parameters of the hyperplane are learned from training data by minimizing a particular loss function.
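A minimal sketch of such a hyperplane classifier: the perceptron update rule is used here as one concrete way of fitting the hyperplane parameters from training data (the lecture's two specific methods and loss functions are not named in this summary, so this is an illustrative choice, not the lecture's exact algorithm):

```python
def train_perceptron(points, labels, epochs=100, lr=0.1):
    """Learn hyperplane parameters (w, b) so that sign(w.x + b) matches labels (+1/-1)."""
    dim = len(points[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            # Perceptron rule: adjust parameters only on misclassified points
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Linearly separable toy data: label +1 iff first coordinate exceeds the second
pts = [(2.0, 1.0), (3.0, 0.5), (1.0, 2.0), (0.5, 3.0)]
ys = [1, 1, -1, -1]
w, b = train_perceptron(pts, ys)
print([predict(w, b, p) for p in pts])  # [1, 1, -1, -1]
```

The decision boundary is the set of x with w.x + b = 0; both lecture methods share this representation and differ only in the loss being minimized.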
This paper presents an interesting idea of how to compute a consensus of several k-partitions of a set by means of finding an antichain in the concept lattice of an appropriate formal context.
THE RESULT FOR THE GRUNDY NUMBER ON P4-CLASSES - graphhoc
Our work is part of the general problem of the stability of ad hoc networks. Several works have addressed this problem; among them, we find the modelling of the ad hoc network as a graph, so that the coherence problem of the ad hoc network can be reduced to a frequency-allocation problem.
We study a new class of graphs, the fat-extended P4 graphs, and we give a polynomial-time algorithm to calculate the Grundy number of the graphs in this class. This result implies that the Grundy number can be found in polynomial time for many graphs.
R package 'bayesImageS': a case study in Bayesian computation using Rcpp and ... - Matt Moores
There are many approaches to Bayesian computation with intractable likelihoods, including the exchange algorithm, approximate Bayesian computation (ABC), thermodynamic integration, and composite likelihood. These approaches vary in accuracy as well as scalability for datasets of significant size. The Potts model is an example where such methods are required, due to its intractable normalising constant. This model is a type of Markov random field, which is commonly used for image segmentation. The dimension of its parameter space increases linearly with the number of pixels in the image, making this a challenging application for scalable Bayesian computation. My talk will introduce various algorithms in the context of the Potts model and describe their implementation in C++, using OpenMP for parallelism. I will also discuss the process of releasing this software as an open source R package on the CRAN repository.
Housekeeping Department with Other Departments - Tyrara Xieleen
- To understand the relationship between housekeeping and other departments.
- Housekeeping is an important part of the organization for generating income in any hotel/resort.
- How the departments contact each other.
We start with motivation and a few examples of uncertainties. Then we discretize an elliptic PDE with uncertain coefficients and apply the TT format to the permeability, the stochastic operator, and the solution. We compare the sparse multi-index set approach with full multi-index + TT. The Tensor Train format allows us to keep the whole multi-index set, without any multi-index set truncation.
Low rank tensor approximation of probability density and characteristic funct... - Alexander Litvinenko
Very often one has to deal with high-dimensional random variables (RVs). A high-dimensional RV can be described by its probability density function (pdf) and/or by the corresponding probability characteristic function (pcf), or by a function representation. Here the interest is mainly in computing characterisations like the entropy, relations between two distributions, like their Kullback-Leibler divergence, or more general measures such as $f$-divergences, among others. These are all computed from the pdf, which is often not available directly, and it is a computational challenge even to represent it in a numerically feasible fashion when the dimension $d$ is even moderately large. It is an even stronger numerical challenge to then actually compute said characterisations in the high-dimensional case. In this regard, in order to achieve a computationally feasible task, we propose to represent the density by a high-order tensor product, and to approximate this in a low-rank format.
A generalized class of normalized distance functions called Q-Metrics is described in this presentation. The Q-Metrics approach relies on a unique functional, using a single bounded parameter (Lambda), which characterizes the conventional distance functions in a normalized per-unit metric space. In addition to this coverage property, a distinguishing and extremely attractive characteristic of the Q-Metric function is its low computational complexity. Q-Metrics satisfy the standard metric axioms. Novel networks for classification and regression tasks are defined and constructed using Q-Metrics. These new networks are shown to outperform conventional feed forward back propagation networks with the same size when tested on real data sets.
Inria Tech Talk - Classifying complex data with MASSICCC - Stéphanie Roger
MASSICCC - A SaaS platform for the classification of complex, heterogeneous and incomplete data.
In this Tech Talk, come discover, test, and learn to master MASSICCC (Massive clustering in cloud computing), a user-oriented SaaS platform, along with its three families of classification algorithms, the fruit of the latest advances of the Modal & Celeste research teams at Inria, for analysing and learning from your "Big Data" (e.g. in real estate, predictive maintenance, health, open data, etc.).
MASSICCC also offers:
- Free access for testing and research at https://massiccc.lille.inria.fr
- A "one for all" approach to classification
- Highly interpretable results (with graphics)
- A SaaS mode that lets you track experiments (in progress or completed)
- Open-source algorithms that can be reused independently.
Distributed solution of stochastic optimal control problem on GPUs - Pantelis Sopasakis
Stochastic optimal control problems arise in many applications and are, in principle, large-scale, involving up to millions of decision variables. Their applicability in control applications is often limited by the availability of algorithms that can solve them efficiently and within the sampling time of the controlled system. In this paper we propose a dual accelerated proximal gradient algorithm which is amenable to parallelization, and demonstrate that its GPU implementation affords high speed-up values (with respect to a CPU implementation) and greatly outperforms well-established commercial optimizers such as Gurobi.
Talk by Michael Samet, entitled "Optimal Damping with Hierarchical Adaptive Quadrature for Efficient Fourier Pricing of Multi-Asset Options in Lévy Models", at the International Conference on Computational Finance (ICCF), Wuppertal, June 6-10, 2022.
Multidimensional integrals may be approximated by weighted averages of integrand values. Quasi-Monte Carlo (QMC) methods are more accurate than simple Monte Carlo methods because they carefully choose where to evaluate the integrand. This tutorial focuses on how quickly QMC methods converge to the correct answer as the number of integrand values increases. The answer may depend on the smoothness of the integrand and the sophistication of the QMC method. QMC error analysis may assume that the integrand belongs to a reproducing kernel Hilbert space, or may assume that the integrand is an instance of a stochastic process with known covariance structure. These two approaches have interesting parallels. This tutorial also explores how the computational cost of achieving a good approximation to the integral depends on the dimension of the domain of the integrand. Finally, this tutorial explores methods for determining how many integrand values are needed to satisfy the error tolerance. Relevant software is described.
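The MC-versus-QMC contrast the tutorial describes can be illustrated with a minimal sketch (not the tutorial's software): a van der Corput low-discrepancy sequence versus plain Monte Carlo sampling for a smooth 1-D integrand.

```python
import random

def van_der_corput(n, base=2):
    """n-th element of the van der Corput low-discrepancy sequence in [0, 1)."""
    q, bk = 0.0, 1.0 / base
    while n > 0:
        n, r = divmod(n, base)
        q += r * bk
        bk /= base
    return q

def estimate_integral(points):
    # Integrate f(x) = x^2 over [0, 1]; the exact value is 1/3
    return sum(x * x for x in points) / len(points)

N = 4096
qmc_points = [van_der_corput(i) for i in range(1, N + 1)]
random.seed(0)
mc_points = [random.random() for _ in range(N)]

print(abs(estimate_integral(qmc_points) - 1/3))  # QMC error, roughly O(log N / N)
print(abs(estimate_integral(mc_points) - 1/3))   # plain MC error, typically O(1/sqrt(N)), larger
```

The carefully spread-out QMC points cover [0, 1) far more evenly than random draws, which is exactly the mechanism behind the faster convergence discussed above.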
Unleashing the Power of Data: Choosing a Trusted Analytics Platform - Enterprise Wired
In this guide, we'll explore the key considerations and features to look for when choosing a Trusted analytics platform that meets your organization's needs and delivers actionable intelligence you can trust.
Techniques to optimize the pagerank algorithm usually fall into two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before pagerank computation to improve performance; final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the pagerank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in pagerank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
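As a toy sketch of the first optimization mentioned above, skipping vertices whose rank has already converged (a simplified heuristic, not the STICD implementation; assumes no dangling nodes):

```python
def pagerank_skip_converged(adj, d=0.85, eps=1e-8, max_iter=100):
    """Power-iteration PageRank that stops updating vertices whose rank has converged.
    adj: dict vertex -> list of out-neighbours (no dangling nodes assumed)."""
    verts = list(adj)
    n = len(verts)
    rank = {v: 1.0 / n for v in verts}
    # Precompute in-neighbours for pull-style updates
    inn = {v: [] for v in verts}
    for u, outs in adj.items():
        for v in outs:
            inn[v].append(u)
    active = set(verts)
    it = 0
    while active and it < max_iter:
        it += 1
        new = dict(rank)
        for v in list(active):
            r = (1 - d) / n + d * sum(rank[u] / len(adj[u]) for u in inn[v])
            if abs(r - rank[v]) < eps:
                active.discard(v)  # vertex converged: skip it in later iterations
            new[v] = r
        rank = new
    return rank

g = {0: [1], 1: [2], 2: [0]}   # a 3-cycle: by symmetry all ranks should be equal
r = pagerank_skip_converged(g)
print(r)  # each rank is approximately 1/3
```

Skipping a converged vertex is a heuristic: its rank could in principle still drift if its in-neighbours keep changing, which is why the text says it "has the potential" to save time rather than guaranteeing exactness.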
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag... - sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
Learn SQL from basic queries to advanced queries - manishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
Adjusting primitives for graphs: SHORT REPORT / NOTES - Subhajit Sahu
Graph algorithms, like PageRank, commonly operate on Compressed Sparse Row (CSR), an adjacency-list based graph representation.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Adjusting OpenMP PageRank: SHORT REPORT / NOTES - Subhajit Sahu
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take advantage of a shared-memory system with multiple CPUs, each with multiple cores, to accelerate pagerank computation. If the NUMA architecture of the system is properly taken into account with good vertex partitioning, the speedup can be significant. To take steps in this direction, experiments are conducted to implement pagerank in OpenMP using two different approaches, uniform and hybrid. The uniform approach runs all primitives required for pagerank in OpenMP mode (with multiple threads). On the other hand, the hybrid approach runs certain primitives in sequential mode (i.e., sumAt, multiply).
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, AI, big data, real-time systems, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf - GetInData
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source) Copilot?
How can we build one?
Architecture and evaluation
5. Chemical space
            Stars       Small Molecules
Existing    10^22       10^7
Virtual     0           10^60
Access      Difficult   "Easy"
(Slide from: Pierre Baldi, UC Irvine)
6. Formalization
Problem statement
Given a set of training instances (x1, y1), ..., (xn, yn), where the xi's are graphs and the yi's are continuous or discrete variables of interest, estimate a function
y = f(x),
where x is any graph.
7. Classical Approaches
Classical approaches:
1. Map each molecule to a vector of fixed dimension.
2. Apply an algorithm for regression or classification over vectors.
Example: 2D structural keys in chemoinformatics, then use a neural network, decision tree, least squares, etc.
(Slide from: Jean-Philippe Vert, ParisTech)
9. The kernel trick
Kernel
Let φ(x) be a vector representation of the graph x. The kernel between two graphs is defined by:
K(x, x′) = φ(x)^T φ(x′).
Many linear algorithms can be expressed only in terms of inner products between vectors. Often computing the kernel is more efficient than computing φ(x).
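The shortcut this slide describes can be made concrete with a polynomial kernel, where (x·x′)² equals the inner product of explicit degree-2 feature vectors (an illustrative kernel on plain vectors, not specific to graphs):

```python
def phi(x):
    """Explicit degree-2 feature map for a 2-D input: all products x_i * x_j."""
    return [x[i] * x[j] for i in range(2) for j in range(2)]

def k_explicit(x, xp):
    # Build both feature vectors and take their inner product
    return sum(a * b for a, b in zip(phi(x), phi(xp)))

def k_trick(x, xp):
    # Same value without ever building phi: (x . x')^2
    return (x[0] * xp[0] + x[1] * xp[1]) ** 2

x, xp = (1.0, 2.0), (3.0, 0.5)
print(k_explicit(x, xp), k_trick(x, xp))  # both 16.0
```

For degree-d features in n dimensions, φ has roughly n^d components, while the trick needs only one inner product and one power; this gap is exactly why "computing the kernel is more efficient than computing φ(x)".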
10. Kernel trick example: computing distances in the feature space
(Slide from: Jean-Philippe Vert, ParisTech)
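The identity behind this example can be written out: since K(x, x′) = ⟨φ(x), φ(x′)⟩, the feature-space distance satisfies ‖φ(x) − φ(x′)‖² = K(x, x) − 2 K(x, x′) + K(x′, x′), so distances are computable from kernel values alone. A minimal sketch, using a Gaussian kernel as an arbitrary illustrative choice:

```python
import math

def k(x, xp):
    # Any p.d. kernel works; a Gaussian (RBF) kernel is used here for illustration
    sq = sum((a - b) ** 2 for a, b in zip(x, xp))
    return math.exp(-sq / 2.0)

def feature_distance(x, xp):
    """Distance between phi(x) and phi(x') computed from kernel values only:
    ||phi(x) - phi(x')||^2 = K(x, x) - 2 K(x, x') + K(x', x')."""
    return math.sqrt(k(x, x) - 2.0 * k(x, xp) + k(xp, xp))

print(feature_distance((0.0, 0.0), (0.0, 0.0)))      # 0.0
print(feature_distance((0.0, 0.0), (1.0, 1.0)) > 0)  # True
```

Note that φ is never constructed; for the Gaussian kernel it would be infinite-dimensional, yet the distance is still a few kernel evaluations.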
11. Positive definite (p.d.) kernels
Definition
A positive definite (p.d.) kernel on a set χ is a function K : χ × χ → R that is symmetric and satisfies, for all N ∈ N, (x1, x2, ..., xN) ∈ χ^N and (a1, a2, ..., aN) ∈ R^N:
∑_{i=1}^{N} ∑_{j=1}^{N} ai aj K(xi, xj) ≥ 0
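The defining inequality can be probed numerically on a sample Gram matrix: draw coefficient vectors a at random and check that the quadratic form is never negative. A sketch assuming a 1-D Gaussian kernel (my choice for illustration; any p.d. kernel would pass):

```python
import math
import random

def gaussian_kernel(x, xp, sigma=1.0):
    return math.exp(-((x - xp) ** 2) / (2 * sigma ** 2))

# Build the Gram matrix K_ij = K(x_i, x_j) on a few sample points
xs = [0.0, 0.5, 1.3, 2.0]
K = [[gaussian_kernel(a, b) for b in xs] for a in xs]

# Positive definiteness: sum_i sum_j a_i a_j K_ij >= 0 for every coefficient vector a
random.seed(1)
ok = True
for _ in range(1000):
    a = [random.uniform(-1, 1) for _ in xs]
    q = sum(a[i] * a[j] * K[i][j] for i in range(len(xs)) for j in range(len(xs)))
    ok = ok and q >= -1e-12  # small slack for floating-point rounding
print(ok)  # True: the quadratic form never goes negative
```

Random sampling cannot prove positive definiteness, but a single negative quadratic form would disprove it, which makes this a quick sanity check for a candidate kernel.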
12. Positive definite kernels are inner products
Mercer's property
K is a p.d. kernel on the set χ if and only if there exists a Hilbert space H and a mapping
φ : χ → H,
such that, for any x, x′ in χ:
K(x, x′) = ⟨φ(x), φ(x′)⟩_H
(Slide from: Jean-Philippe Vert, ParisTech)
13. Graph kernels
Definition
A graph kernel K(x, x′) is a p.d. kernel over the set of (labeled) graphs. It is equivalent to an embedding φ : χ → H of the set of graphs into a Hilbert space through the relation:
K(x, x′) = φ(x)^T φ(x′).
(Slide from: Jean-Philippe Vert, ParisTech)
14. Clarification
Descriptors and kernels in chemoinformatics:
1D - SMILES strings
2D - Graph of chemical bonds
2.5D - Surfaces
3D - Atomic coordinates
4D - Temporal evolution
15. Outline
1 Introduction
2 Expressiveness versus complexity
3 Walk kernels
4 Conclusion and future directions
5 Data depth for labeled graph ensembles
17. Expressibility versus complexity
Definition: Complete graph kernels
A graph kernel is complete if it separates nonisomorphic graphs.
Graph isomorphism (Figure from: Wikipedia)
18. Expressibility versus complexity
Definition: Complete graph kernels
A graph kernel is complete if it separates nonisomorphic graphs.
Implication
If a graph kernel is not complete, then there are nonisomorphic graphs it cannot differentiate.
19. Expressibility versus complexity
Definition: Complete graph kernels
A graph kernel is complete if it separates nonisomorphic graphs.
Implication
If a graph kernel is not complete, then there are nonisomorphic graphs it cannot differentiate.
Tractability
Computing any complete graph kernel is at least as hard as the graph isomorphism problem (Gärtner et al., 2003).
21. Subgraph kernel
Definition: Subgraph
A subgraph of a graph (V, E) is a graph (V′, E′) with V′ ⊂ V and E′ ⊂ E.
Definition: Subgraph kernel
K_subgraph(G1, G2) = ∑_{H∈χ} λ_H φ_H(G1) φ_H(G2),
where H ranges over the set of graphs χ, λ_H is the weight associated with H, and φ_H(Gx) returns the number of occurrences of H in Gx.
22. Subgraph kernel
Definition: Subgraph
A subgraph of a graph (V , E) is a graph (V , E ) with V ⊂ V and
E ⊂ E.
Definition: Subgraph kernel
Ksubgraph(G1, G2) =
H∈χ
λHφH(G1)φH(G2).
where H ⊂ χ, λH is weight associated with H and φH(Gx ) returns
the number of occurrences of H in Gx .
Subgraph kernel complexity
Computing the subgraph kernel is NP hard (Gartner et.al. 2003)
22 / 48
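To see where the hardness comes from, a brute-force sketch of the feature φH (the helper name `count_subgraph` is mine) must enumerate injective vertex maps from H into G, and the number of such maps grows factorially with the size of G:

```python
from itertools import permutations

def count_subgraph(H_vertices, H_edges, G_vertices, G_edges):
    """Brute-force phi_H(G): number of injective vertex maps from H into G
    that send every edge of H to an edge of G. Exponential time, which is
    why the full subgraph kernel is intractable for large graphs."""
    G_edge_set = {frozenset(e) for e in G_edges}
    count = 0
    for mapping in permutations(G_vertices, len(H_vertices)):
        m = dict(zip(H_vertices, mapping))
        if all(frozenset((m[u], m[v])) in G_edge_set for u, v in H_edges):
            count += 1
    return count

# phi_H for H = a single edge, G = a triangle: 3 edges, each matched
# in 2 orientations, so 6 injective maps.
triangle = ([0, 1, 2], [(0, 1), (1, 2), (0, 2)])
edge = ([0, 1], [(0, 1)])
print(count_subgraph(*edge, *triangle))  # 6
```

The kernel itself then weights and sums these counts over all pattern graphs H, which is exactly what makes it expressive and expensive at the same time.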
23. Path kernels
Definition: Path
A path of a graph (V, E) is a sequence of distinct vertices such that consecutive vertices share an edge.
Definition: Path kernel
Kpath(G1, G2) = ∑_{H∈P} λH φH(G1) φH(G2),
where P ⊂ χ is the set of path graphs.
Path kernel complexity
Computing the path kernel is NP-hard (Gärtner et al., 2003).
26. Outline
1 Introduction
2 Expressiveness versus complexity
3 Walk kernels
4 Conclusion and future directions
5 Data depth for labeled graph ensembles
27. Walks
Definition
A walk of a graph (V, E) is a sequence of (not necessarily distinct) vertices such that consecutive vertices share an edge. Unlike a path, a walk may revisit vertices and edges.
Definition: Walk kernel
Kwalk(G1, G2) = ∑_{w∈S} λw φw(G1) φw(G2),
where S is the set of all walks and φw(G) returns the count of walk w in G.
29. Walk kernel examples
nth-order walk kernel
λG(w) = 1 if the length of w is n, 0 otherwise.
Geometric walk kernel
λG(w) = β^length(w), for β > 0.
Random walk kernel
λG(w) = PG(w), the probability of walk w under a random walk on G.
Fingerprint-based kernels
Dot product kernel
Tanimoto kernel
MinMax kernel
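The fingerprint-based kernels in this list operate directly on bit or count vectors. A minimal pure-Python sketch of the Tanimoto and MinMax kernels (function names are mine):

```python
def tanimoto(fp1, fp2):
    """Tanimoto kernel on binary fingerprints: |intersection| / |union|."""
    a = {i for i, v in enumerate(fp1) if v}
    b = {i for i, v in enumerate(fp2) if v}
    return len(a & b) / len(a | b) if a | b else 1.0

def minmax(c1, c2):
    """MinMax kernel on count fingerprints: sum(min) / sum(max).
    Reduces to Tanimoto when all counts are 0/1."""
    num = sum(min(x, y) for x, y in zip(c1, c2))
    den = sum(max(x, y) for x, y in zip(c1, c2))
    return num / den if den else 1.0

fp1 = [1, 0, 1, 1, 0]
fp2 = [1, 1, 1, 0, 0]
print(tanimoto(fp1, fp2))  # 0.5  (2 shared bits, 4 bits in the union)
print(minmax(fp1, fp2))    # 0.5  (same value on binary vectors)
```

Both are known to be p.d. kernels on fingerprints, which is what lets them plug into SVMs directly.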
33. Computation of walk kernels
Yay!
All the above walk kernels can be computed efficiently in
polynomial time.
34. Computation of n-th order walk kernel (1/2)
Product graphs
Let G1 = (V1, E1) and G2 = (V2, E2). Then the product graph G = G1 × G2 is the graph G = (V, E) with:
1 V = {(v1, v2) ∈ V1 × V2 : v1 and v2 have the same label},
2 E = {((v1, v2), (v1′, v2′)) ∈ V × V : (v1, v1′) ∈ E1 and (v2, v2′) ∈ E2}.
Slide from: Jean-Philippe Vert, ParisTech
35. Computation of n-th order walk kernel (2/2)
For the nth-order walk kernel we have λG1×G2(w) = 1 if the length of w is n, 0 otherwise.
Therefore:
Knth-order(G1, G2) = ∑_{w∈Sn(G1×G2)} 1 = ∑_{i,j} [A^n]_{i,j} = 1^T A^n 1,
where A is the adjacency matrix of the product graph G1 × G2.
Computation in O(n|G1||G2|d1d2), where di is the maximum degree of Gi.
Slide from: Jean-Philippe Vert, ParisTech
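The computation above can be sketched with numpy (adjacency-matrix inputs and helper names are my own): build the label-matched product graph, then evaluate 1^T A^n 1.

```python
import numpy as np

def product_adjacency(A1, labels1, A2, labels2):
    """Adjacency matrix of the (tensor) product graph: vertices are pairs
    with matching labels; an edge requires an edge in both factor graphs."""
    V = [(i, j) for i in range(len(labels1)) for j in range(len(labels2))
         if labels1[i] == labels2[j]]
    A = np.zeros((len(V), len(V)))
    for a, (i, j) in enumerate(V):
        for b, (k, l) in enumerate(V):
            if A1[i, k] and A2[j, l]:
                A[a, b] = 1
    return A

def nth_order_walk_kernel(A1, labels1, A2, labels2, n):
    """K(G1, G2) = 1^T A^n 1, where A is the product-graph adjacency:
    counts common label-matching walks of length n."""
    A = product_adjacency(A1, labels1, A2, labels2)
    ones = np.ones(len(A))
    return ones @ np.linalg.matrix_power(A, n) @ ones

# Two identically labeled triangles: the product graph has 9 vertices,
# each of degree 4, so the number of common walks of length 2 is 9*16.
A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
labels = ["C", "C", "C"]
print(nth_order_walk_kernel(A, labels, A, labels, 2))  # 144.0
```

The dense matrix power here is for clarity; exploiting sparsity of A recovers the O(n|G1||G2|d1d2) bound from the slide.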
36. Traditional molecular fingerprints
Bit vectors of size b (usually b = 512 or 1024).
Steps (as summarized in Ralaivola et al., 2005):
1 DFS exploration from each atom to get a set of walks.
2 Each path initializes a random number generator to form b integers.
3 The b integers are reduced modulo the fingerprint size and used to set the corresponding bits in the fingerprint vector.
Complexity O(nm) or O(nα^d), where n := # atoms, m := # edges, α := branching factor and d := depth of walk.
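The hashing steps above can be sketched as follows. This is a simplified illustration, not the exact scheme of any toolkit: each path here draws only a few integers (`bits_per_path`, my parameter) rather than b, to keep the example readable.

```python
import random

def fingerprint(paths, b=512, bits_per_path=4):
    """Hashed fingerprint sketch: each labeled path seeds an RNG that
    draws a few integers, reduced modulo b to set bits in the vector.
    'paths' stands in for the walks found by the DFS step."""
    fp = [0] * b
    for path in paths:
        rng = random.Random(path)          # the path string seeds the RNG
        for _ in range(bits_per_path):
            fp[rng.randrange(2**32) % b] = 1   # reduce modulo b, set bit
    return fp

# A few hypothetical labeled paths from a small molecule.
paths = ["C-C", "C-C-O", "C-O", "O-H"]
fp = fingerprint(paths)
print(sum(fp))  # at most 16 bits set; hash collisions may reduce this
```

Seeding the RNG with the path label makes the scheme deterministic: the same substructure always sets the same bits, while unrelated substructures may still clash, which is the information loss the next slide addresses.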
37. Generalized molecular fingerprints
Avoids clashes/information loss by reserving bit positions.
Let P(d) be the set of all atom-bond labeled paths containing at most d bonds.
Binary feature map given depth d:
φd(u) = (φpath(u))_{path∈P(d)}
Binary feature map given depth d and fixed vector size b:
φd(u) = (φγ(path)(u))_{path∈P(d)},
where γ : P(d) → {1, . . . , b}.
38. Fingerprint-based kernel (Ralaivola et al., 2005)
Complexity O(d(n1m1 + n2m2)) using a suffix tree data structure.
Slide from: Pierre Baldi, UC Irvine
39. Extensions for walk kernels
Label enrichment
Non-tottering walks
3D kernels
Mutual information in fingerprint construction
40. Results (Mahé et al., 2005; Ralaivola et al., 2005)
MUTAG Dataset
Collection of 188 compounds.
Classification of mutagenic activity: high (125) or none (63), as assayed in Salmonella typhimurium.
Method                   Accuracy
Progol (1D)              81.4%
Random walk kernel (2D)  91.2%
MinMax kernel (2D)       91.5%
41. Outline
1 Introduction
2 Expressiveness versus complexity
3 Walk kernels
4 Conclusion and future directions
5 Data depth for labeled graph ensembles
42. Conclusion
Summary
Extension of ML algorithms to graph data via positive definite kernels.
Two classes of 2D kernels for chemical molecule structures.
What next?
Can we use the graph kernel machinery for computing depth?
43. Outline
1 Introduction
2 Expressiveness versus complexity
3 Walk kernels
4 Conclusion and future directions
5 Data depth for labeled graph ensembles
44. Data depth
What is a depth function?
A depth function is designed to provide a P-based center-outward ordering (and thus a ranking) for an ensemble of data objects drawn from an arbitrary distribution P.
Taxonomy of data depth definitions (Mosler, 2012)
Distance-based depth functions
Simplex/halfspace-based depth
Weighted-mean-based depth
45. Band depth
Band depth: a type of simplex-based data depth method.
Many definitions exist for various kinds of data: functions, multivariate functions, paths on a graph.
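For functional data, the simplest variant forms bands from pairs of sample curves (the J = 2 case). A minimal sketch, assuming curves discretized on a common grid (all names are illustrative):

```python
from itertools import combinations

def band_depth(curves, f):
    """Band depth (J=2) sketch for discretized functions: the fraction of
    pairs of sample curves whose pointwise band fully contains f."""
    pairs = list(combinations(curves, 2))
    inside = 0
    for g, h in pairs:
        if all(min(gi, hi) <= fi <= max(gi, hi)
               for gi, hi, fi in zip(g, h, f)):
            inside += 1
    return inside / len(pairs)

curves = [[0, 0, 0], [1, 1, 1], [2, 2, 2], [0.5, 0.6, 0.4]]
# The middle curve sits inside most pairwise bands, so its depth is high.
print(band_depth(curves, [1, 1, 1]))  # 5/6
```

Ranking all curves by this score gives the center-outward ordering from the previous slide; the open question is how to form such bands when the objects are labeled graphs rather than functions.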
46. Band formed by graphs
When alignment is known..
Graphs → Adjacency matrices → Functions
Alignment: a mapping θ : VGx → VGy, where Gx and Gy are any two graphs.
When alignment is unknown..
????
47. Product graphs!
V = {(v1, v2) ∈ V1 × V2 : v1 and v2 have the same label}
E = {((v1, v2), (v1′, v2′)) ∈ V × V : (v1, v1′) ∈ E1 and (v2, v2′) ∈ E2}
Weak direct product (aka tensor product or Kronecker product)
E× = {((v1, v2), (v1′, v2′)) ∈ V × V : (v1, v1′) ∈ E1 and (v2, v2′) ∈ E2}
Strong product
E = E× ∪ E□, where
E□ = {((v1, v2), (v1′, v2′)) ∈ V × V : (v1 = v1′ and (v2, v2′) ∈ E2) or (v2 = v2′ and (v1, v1′) ∈ E1)}
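Both edge sets can be constructed directly. The sketch below (pure Python, helper names my own) builds the label-matched vertex set and compares tensor vs. strong product edge counts on a small example:

```python
def product_vertices(V1, labels1, V2, labels2):
    """Label-matched vertex set of the product graph."""
    return [(u, v) for u in V1 for v in V2 if labels1[u] == labels2[v]]

def tensor_edges(V, E1, E2):
    """Weak direct (tensor) product: step in both factor graphs at once."""
    E1s = {frozenset(e) for e in E1}
    E2s = {frozenset(e) for e in E2}
    return {frozenset(((u1, v1), (u2, v2)))
            for (u1, v1) in V for (u2, v2) in V
            if frozenset((u1, u2)) in E1s and frozenset((v1, v2)) in E2s}

def strong_edges(V, E1, E2):
    """Strong product: tensor edges plus 'stay put in one factor' edges."""
    E1s = {frozenset(e) for e in E1}
    E2s = {frozenset(e) for e in E2}
    extra = {frozenset(((u1, v1), (u2, v2)))
             for (u1, v1) in V for (u2, v2) in V
             if (u1 == u2 and frozenset((v1, v2)) in E2s)
             or (v1 == v2 and frozenset((u1, u2)) in E1s)}
    return tensor_edges(V, E1, E2) | extra

V1, E1 = [0, 1], [(0, 1)]              # a single edge
V2, E2 = [0, 1, 2], [(0, 1), (1, 2)]   # a path with two edges
labels1 = {0: "C", 1: "C"}
labels2 = {0: "C", 1: "C", 2: "C"}
V = product_vertices(V1, labels1, V2, labels2)
print(len(tensor_edges(V, E1, E2)), len(strong_edges(V, E1, E2)))  # 4 11
```

The strong product always contains the tensor product, since E□ only adds edges where one coordinate stays fixed.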
48. References
Ralaivola, Liva, Sanjay J. Swamidass, Hiroto Saigo, and Pierre Baldi. "Graph kernels for chemical informatics." Neural Networks 18, no. 8 (2005): 1093-1110.
Mahé, Pierre, et al. "Graph kernels for molecular structure-activity relationship analysis with support vector machines." Journal of Chemical Information and Modeling 45.4 (2005): 939-951.
Gärtner, Thomas, Peter Flach, and Stefan Wrobel. "On graph kernels: Hardness results and efficient alternatives." Learning Theory and Kernel Machines. Springer Berlin Heidelberg, 2003. 129-143.
http://videolectures.net/site/normal_dl/tag=9127/gbr07_vert_ckac_01.pdf
http://www.ics.uci.edu/~dock/upload/UCI_CHEM_05.ppt