Kernel Methods and Relational Learning in Computational Biology
ir. Michiel Stock
Faculty of Bioscience Engineering
Ghent University
November 2014
KERMIT
Michiel Stock (KERMIT) Kernels for Computational Biology November 2014 1 / 36
Outline
1 Introduction
2 Kernel methods
Theoretical overview
Dealing with sequences
Dealing with graphs
Other kernels
3 Learning relations
Kronecker kernels
Conditional ranking
4 Predicting enzyme function
Defining the problem
Results
5 Conclusions
Introduction
Introductory example: drug design
Strategy for curing Alzheimer's disease
Find compounds with good ADMET properties that selectively bind cholinesterase and amyloid precursor protein
Introduction
Labels: known protein-ligand interactions

[Figure: bipartite graph linking proteins (A-G) to ligands (T-Z), with edge weights between 0 and 1 indicating interaction strength]
Introduction
The targets: features for proteins
Possible representations:
amino acid sequence
3D structure
gene expression
cellular location
phylogenetic profiles
...
Introduction
The ligands: features for compounds
Possible representations:
SMILES format and other text-based representations
coloured graph representation
fingerprints based on physicochemical descriptors
...
Introduction
Computational biology deals with interesting problems
We deal with objects that are:
high-dimensional (e.g. microarray or proteomics data)
structured (e.g. gene sequences, small molecules, interaction networks, phylogenetic trees...)
heterogeneous (e.g. vectors, sequences and graphs describing the same protein)
available in large quantities (e.g. more than 10^6 known protein sequences)
noisy (e.g. many features are not relevant)
Introduction
Computational biology often deals with interactions
Relational learning
Predicting properties of pairs of objects, which can be of different types.
Kernel methods
Kernel methods Theoretical overview
A function k : X × X → R is a positive semi-definite kernel if it is
symmetric, that is, k(x, x') = k(x', x) for any two objects x, x' ∈ X, and
positive semi-definite, that is,

$$\sum_{i=1}^{N} \sum_{j=1}^{N} c_i c_j \, k(x_i, x_j) \geq 0$$

for any N > 0, any choice of N objects x_1, ..., x_N ∈ X, and any choice of real numbers c_1, ..., c_N ∈ R.
Kernels can be seen as generalized covariances.
Kernel methods Theoretical overview
Interpretation of kernels
Suppose an object x has an implicit feature representation φ(x) ∈ F.
A kernel function can be seen as a dot product in this feature space:

$$k(x, x') = \langle \phi(x), \phi(x') \rangle$$

Linear models in this feature space F can be made:

$$y(x) = \mathbf{w}^\top \phi(x) = \sum_n a_n \, k(x_n, x)$$
Kernel methods Theoretical overview
Many kernel methods exist
Examples of popular kernel methods:
Support vector machine (SVM)
Regularized least squares (RLS)
Kernel principal component analysis (KPCA)
The learning algorithm is independent of the kernel representation!
Kernel methods Dealing with sequences
Kernels using sequence alignment
sequence alignment optimises a score of how well the residues match
use this score as a kernel value (a similarity for sequences)
Kernel methods Dealing with sequences
Kernels using substrings
Spectrum kernel (SK)
The SK considers the number of k-mers m two sequences s_i and s_j have in common:

$$\mathrm{SK}_k(s_i, s_j) = \sum_{m \in \Sigma^k} N(m, s_i) \, N(m, s_j)$$

with N(m, s) the number of occurrences of k-mer m in sequence s.
Many modifications exist.
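The spectrum kernel above takes only a few lines of Python; a minimal sketch, where the toy amino acid sequences and the choice k = 3 are purely illustrative:

```python
from collections import Counter

def spectrum_kernel(s_i, s_j, k=3):
    """Spectrum kernel: sum over all k-mers of the product of their
    occurrence counts N(m, s_i) * N(m, s_j) in the two sequences."""
    counts_i = Counter(s_i[p:p + k] for p in range(len(s_i) - k + 1))
    counts_j = Counter(s_j[p:p + k] for p in range(len(s_j) - k + 1))
    # Counter returns 0 for absent k-mers, so only shared k-mers contribute.
    return sum(n * counts_j[m] for m, n in counts_i.items())

# The two sequences share exactly one 3-mer, "ACD".
print(spectrum_kernel("ACDEF", "ACDKL", k=3))  # prints 1
```

Because only k-mers that actually occur in a sequence are stored, the cost is linear in the sequence lengths rather than in the size of the full k-mer alphabet.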
Kernel methods Dealing with graphs
What is a graph?
Graph
A graph is a set of interconnected objects, called vertices (or nodes), that are connected through edges.
Graphs can show the structure of an object or interactions between different objects.
Graphs are important in bioinformatics!
Kernel methods Dealing with graphs
Comparing nodes within a graph
Diffusion kernel
Constructs a similarity between vertices within the same graph.
Based on performing a random walk on the graph.
Captures the long-range relationships between vertices.
Inspired by the heat equation: the kernel quantifies how quickly `heat' can spread from one node to another.
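A common concrete form of the diffusion kernel (Kondor and Lafferty's) is the matrix exponential of the negative graph Laplacian; a NumPy sketch under that assumption, with a toy three-node path graph as illustration:

```python
import numpy as np

def diffusion_kernel(A, beta=1.0):
    """Diffusion kernel K = exp(-beta * L), with L = D - A the Laplacian
    of the undirected graph given by adjacency matrix A."""
    L = np.diag(A.sum(axis=1)) - A       # graph Laplacian
    w, V = np.linalg.eigh(L)             # L is symmetric, so eigh applies
    # Matrix exponential via the eigendecomposition of L.
    return V @ np.diag(np.exp(-beta * w)) @ V.T

# Path graph 1-2-3: heat spreads from node 1 to its neighbour 2
# faster than to the distant node 3.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
K = diffusion_kernel(A, beta=0.5)
print(K[0, 1] > K[0, 2])  # prints True
```

The parameter beta plays the role of diffusion time: larger values let the `heat' spread further, smoothing the similarities over the graph.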
Kernel methods Dealing with graphs
Comparing two separate graphs
Graph kernel
Constructs a similarity between graphs.
Also based on performing random walks on both graphs and counting the number of matching walks.
Usually very computationally demanding!
Applications in chemoinformatics and in structural bioinformatics.
Kernel methods Other kernels
Kernels using fingerprints
Objects that can be described by a long binary vector x can be compared with the Tanimoto kernel:

$$K_{\mathrm{Tan}}(\mathbf{x}_m, \mathbf{x}_n) = \frac{\langle \mathbf{x}_m, \mathbf{x}_n \rangle}{\langle \mathbf{x}_m, \mathbf{x}_m \rangle + \langle \mathbf{x}_n, \mathbf{x}_n \rangle - \langle \mathbf{x}_m, \mathbf{x}_n \rangle}$$

[Figure: fingerprint representation of a molecule]
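For binary vectors the three inner products are just bit counts, so the Tanimoto kernel is the number of shared set bits over the number of bits set in either vector; a minimal sketch, with illustrative toy fingerprints:

```python
def tanimoto_kernel(x_m, x_n):
    """Tanimoto kernel for binary fingerprint vectors (sequences of 0/1)."""
    dot_mn = sum(a * b for a, b in zip(x_m, x_n))  # bits set in both
    dot_mm = sum(a * a for a in x_m)               # bits set in x_m
    dot_nn = sum(b * b for b in x_n)               # bits set in x_n
    return dot_mn / (dot_mm + dot_nn - dot_mn)

# Two fingerprints with 3 set bits each, 2 of which they share:
# 2 / (3 + 3 - 2) = 0.5
print(tanimoto_kernel([1, 1, 0, 1, 0], [1, 1, 1, 0, 0]))  # prints 0.5
```

The value is always in [0, 1], equalling 1 exactly when the two fingerprints are identical.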
Kernel methods Other kernels
Kernels for other objects
Kernels for texts: often based on word counts (example: medical papers)
Kernels for point clouds (example: using the 3D structure of proteins)
Fisher kernels: use information from a generative model (example: using a hidden Markov model)
Learning relations
Learning relations Kronecker kernels
Define the vectorization operator (stacking the columns):

$$\mathrm{vec}(A) = \begin{bmatrix} a_{11} \\ a_{21} \\ a_{12} \\ a_{22} \end{bmatrix}$$

and the Kronecker product:

$$A \otimes B = \begin{bmatrix}
a_{11}b_{11} & a_{11}b_{12} & a_{12}b_{11} & a_{12}b_{12} \\
a_{11}b_{21} & a_{11}b_{22} & a_{12}b_{21} & a_{12}b_{22} \\
a_{21}b_{11} & a_{21}b_{12} & a_{22}b_{11} & a_{22}b_{12} \\
a_{21}b_{21} & a_{21}b_{22} & a_{22}b_{21} & a_{22}b_{22}
\end{bmatrix}$$

Key equation: $(B^\top \otimes A)\,\mathrm{vec}(X) = \mathrm{vec}(AXB)$
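The key equation is easy to verify numerically; a small NumPy check, using the column-stacking vec (NumPy's `order="F"`) and random 2 × 2 matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))
X = rng.standard_normal((2, 2))
B = rng.standard_normal((2, 2))

def vec(M):
    # Column-stacking vectorization: stack the columns of M into one vector.
    return M.reshape(-1, order="F")

# Key equation: (B^T kron A) vec(X) == vec(A X B)
lhs = np.kron(B.T, A) @ vec(X)
rhs = vec(A @ X @ B)
print(np.allclose(lhs, rhs))  # prints True
```

This identity is what makes Kronecker kernels tractable: matrix-vector products with the large Kronecker matrix reduce to two small matrix multiplications.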
Learning relations Kronecker kernels
Kernels for pairs of objects

Pairwise kernel
Combine the kernel matrices of the individual objects to construct a kernel matrix for pairs of objects, e.g. the Kronecker kernel K⊗ = K ⊗ G, with K and G the kernel matrices of the two object types.

Introductory example (chemogenomics): modelling the binding interactions between a set of proteins and a database of ligands to aid the process of drug design. Kernel methods can be used to model pairwise relations between different types of objects.

By optimizing a ranking loss, the algorithms can also be used for conditional ranking. In short, the framework is ideally suited for bioinformatics challenges:
efficient learning
can handle complex objects (graphs, trees, sequences...)
ability to deal with information retrieval problems

(Slide based on a poster by Michiel Stock, Willem Waegeman and Bernard De Baets, KERMIT, Department of Mathematical Modelling, Statistics and Bioinformatics.)
Learning relations Kronecker kernels
Kernel ridge regression for relations

Setting:
N objects of type U (e.g. proteins)
M objects of type V (e.g. ligands)
Y: the N × M label matrix (e.g. molecular interactions)
K: the N × N kernel matrix for objects of type U
G: the M × M kernel matrix for objects of type V

Set y = vec(Y) and K⊗ = K ⊗ G. We can just use the usual kernel ridge regression:

$$\arg\min_{\mathbf{a}} \; (\mathbf{y} - K_\otimes \mathbf{a})^\top (\mathbf{y} - K_\otimes \mathbf{a}) + \lambda \, \mathbf{a}^\top K_\otimes \mathbf{a}$$

This is equivalent to solving the following linear system:

$$(K_\otimes + \lambda I_{NM}) \, \mathbf{a} = \mathbf{y}$$
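The linear system above can be sketched directly in NumPy. Note this naive version forms the NM × NM Kronecker matrix explicitly, which is only feasible for small data sets; efficient implementations exploit the Kronecker structure instead. The helper name and the toy labels are illustrative:

```python
import numpy as np

def pairwise_ridge(K, G, Y, lam=1.0):
    """Solve (K_kron + lam * I) a = vec(Y) with K_kron = K kron G.
    Row-major vec(Y) pairs entry (i, j) with np.kron(K, G)'s index i*M + j."""
    N, M = Y.shape
    K_kron = np.kron(K, G)                # (NM x NM) pairwise kernel matrix
    y = Y.reshape(-1)                     # vec(Y), row-major to match np.kron
    return np.linalg.solve(K_kron + lam * np.eye(N * M), y)

# Sanity check: with identity kernels the solution is simply y / (1 + lam).
Y = np.array([[0.2, 0.6],
              [0.8, 0.3]])
a = pairwise_ridge(np.eye(2), np.eye(2), Y, lam=1.0)
print(np.allclose(a, Y.reshape(-1) / 2.0))  # prints True
```

Predictions for all pairs are then obtained as K_kron @ a, reshaped back to an N × M matrix.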
Learning relations Conditional ranking
Conditional ranking

Motivation
Suppose one is not particularly interested in the exact value of the interaction, but in the order of the proteins for a given ligand.

[Figure: for each query, the database objects are ordered from more relevant to less relevant]
Learning relations Conditional ranking
Conditional ranking
Suppose e = (u, v) ∈ E = U × V.
Train the model

$$h(e) = \mathbf{w}^\top \Phi(e) = \sum_{\bar{e} \in E} a_{\bar{e}} \, K_\otimes(e, \bar{e})$$

by solving

$$A(T) = \arg\min_{h \in \mathcal{H}} \; \mathcal{L}(h, T) + \lambda \|h\|^2_{\mathcal{H}},$$

where we use a ranking loss:

$$\mathcal{L}(h, T) = \sum_{u, u' \in \mathcal{U}} \sum_{v, v' \in \mathcal{V}} \left( y_{u,v} - y_{u',v'} - h(u, v) + h(u', v') \right)^2.$$

[Figure: example of a preference multi-graph; reproduced from Pahikkala et al. (2010)]

The framework is based on the Kronecker product kernel, which yields implicit joint feature representations of queries and the sets of objects to be ranked. Exactly this kernel construction allows a straightforward extension of the existing framework to dyadic relations and multi-task learning.
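Writing d = vec(Y) − vec(H) for the residuals, the ranking loss above is the sum over all index pairs of (d_p − d_q)², which expands to 2n·Σd² − 2(Σd)², so it never needs the quartic double sum. A small NumPy sketch of this shortcut, with illustrative toy labels:

```python
import numpy as np

def ranking_loss(Y, H):
    """Squared ranking loss sum_{p,q} (y_p - y_q - h_p + h_q)^2 over all
    pairs of pairs, computed in O(n) via the residuals d = vec(Y) - vec(H):
    sum_{p,q} (d_p - d_q)^2 = 2*n*sum(d^2) - 2*(sum(d))^2."""
    d = (Y - H).reshape(-1)
    n = d.size
    return 2 * n * np.dot(d, d) - 2 * d.sum() ** 2

# Perfect predictions incur zero ranking loss.
Y = np.array([[0.2, 0.6],
              [0.8, 0.3]])
print(ranking_loss(Y, Y))  # prints 0.0
```

Note the loss only penalizes differences between residuals: adding a constant to all predictions leaves it unchanged, which is exactly what one wants when only the ordering matters.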
Predicting enzyme function
Predicting enzyme function
The data set
Data:
two data sets of ca. 1600 enzymes with 21 different functions
five different similarity measures of the active site

[Figure: active site of an enzyme]
Predicting enzyme function
The enzyme commission number
Predicting enzyme function Defining the problem
Conditional ranking of enzymes

Ranking enzymes
For an unannotated enzyme, rank the annotated enzymes so that the top has a similar function w.r.t. the query.

Minimize the ranking error: the number of switches needed for a perfect ranking.

Example: suppose one has an enzyme with unknown function (EC ?.?.?.?) and obtains the ranking:
1 EC 2.7.7.12
2 EC 2.7.7.12
3 EC 2.7.7.34
4 EC 2.7.1.12
5 EC 2.7.7.34
6 EC 4.2.3.90
7 EC 1.14.11
8 EC 4.6.1.11
⇒ predicted function: EC 2.7.7.12
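The ranking error above, the number of switches of adjacent items needed to reach a perfect ranking, is exactly the number of inversions in the returned list, which a bubble sort counts directly. A minimal sketch; the relevance scores (higher = functionally more similar to the query) are illustrative:

```python
def ranking_error(relevance):
    """Number of adjacent switches needed to sort a returned list by true
    relevance, highest first; i.e. the number of inversions in the list."""
    swaps = 0
    items = list(relevance)
    for i in range(len(items)):
        for j in range(len(items) - 1 - i):
            if items[j] < items[j + 1]:   # less relevant ranked above more relevant
                items[j], items[j + 1] = items[j + 1], items[j]
                swaps += 1
    return swaps

# One switch is needed: the item with relevance 2 sits above one with relevance 3.
print(ranking_error([3, 2, 3, 1, 1]))  # prints 1
```

A perfectly ordered list gives an error of 0, and a fully reversed list the maximum of n(n-1)/2.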
Predicting enzyme function Defining the problem
Learning the catalytic similarity

pair of enzymes: e = (v, v')
label y_e ∈ {0, 1, 2, 3, 4}: the catalytic similarity
five different structural similarities: K(v, v')

Catalytic similarity between enzymes:

     A  B  C  D  E  F  G
A    4  4  0  0  0
B    4  4  0  0  0
C    0  0  4  2  1
D    0  0  2  4  3
E    0  0  1  3  4
F
G
Predicting enzyme function Results
Qualitative improvement in the enzyme similarities
Example for the CavBase structural similarity:

[Figure: enzyme similarity heatmaps for the unsupervised method, the supervised method and the ground truth; lighter colour = higher similarity]
Predicting enzyme function Results
Five different structural similarity measures: unsupervised and supervised

[Figure: ROC curves (average true positive rate vs. false positive rate) for the five enzyme similarity measurements of data set I, unsupervised and supervised: CB, FP, LPCS, MCS and SW]

Improvement
Increase of AUC from ca. 0.7 to more than 0.8!
Conclusions
kernels can be used to work with structured objects...
... and can encode your prior knowledge
many problems in computational biology can be seen as `learning relations'
relations between objects can be learned elegantly and efficiently using Kronecker kernels