Kernel Methods and Relational Learning in 
Computational Biology 
ir. Michiel Stock 
Faculty of Bioscience Engineering 
Ghent University 
November 2014 
KERMIT 
Michiel Stock (KERMIT) Kernels for Computational Biology November 2014 1 / 36
Outline 
1 Introduction 
2 Kernel methods 
Theoretical overview 
Dealing with sequences 
Dealing with graphs 
Other kernels 
3 Learning relations 
Kronecker kernels 
Conditional ranking 
4 Predicting enzyme function 
Defining the problem 
Results 
5 Conclusions 
Introduction 
Introductory example: drug design 
Strategy for curing Alzheimer's disease 
Find compounds with good ADMET properties that selectively bind 
cholinesterase and amyloid precursor protein 
Labels: known protein-ligand interactions 
[Figure: proteins and ligands drawn as labelled nodes; known protein-ligand interactions are annotated with values ranging from 0.2 to 1.]
The targets: features for proteins 
Possible representations: 
amino acid sequence 
3D structure 
gene expression 
cellular location 
phylogenetic profiles 
... 
The ligands: features for compounds 
Possible representations: 
SMILES format and other text-based representations 
coloured graph representation 
fingerprints based on physicochemical descriptors 
... 
Computational biology deals with interesting problems 
We deal with objects that are: 
high-dimensional (e.g. microarrays or proteomics data) 
structured (e.g. gene sequences, small molecules, interaction networks, phylogenetic trees...) 
heterogeneous (e.g. vectors, sequences, graphs to describe the same protein) 
available in large quantities (e.g. more than $10^6$ known protein sequences) 
noisy (e.g. many features are not relevant) 
Computational biology often deals with interactions 
Relational learning 
Predicting properties of pairs of objects, which can be of different types. 
Kernel methods 
Theoretical overview 
Formal definition of a kernel 
Kernels are non-linear functions defined over objects $x \in \mathcal{X}$. 
Definition 
A function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is called a positive definite kernel if it is symmetric, that is, $k(x, x') = k(x', x)$ for any two objects $x, x' \in \mathcal{X}$, and positive semi-definite, that is, 
\[ \sum_{i=1}^{N} \sum_{j=1}^{N} c_i c_j k(x_i, x_j) \geq 0 \]
for any $N > 0$, any choice of $N$ objects $x_1, \ldots, x_N \in \mathcal{X}$, and any choice of real numbers $c_1, \ldots, c_N \in \mathbb{R}$. 
Kernels can be seen as generalized covariances. 
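As a numerical illustration of this definition, the sketch below builds the kernel matrix of a Gaussian kernel on random points and checks symmetry and positive semi-definiteness. The Gaussian kernel, the helper name `gaussian_gram` and the parameter `gamma` are assumptions chosen for illustration; they do not appear in the slides.

```python
import numpy as np

def gaussian_gram(X, gamma=1.0):
    """Kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2) (illustrative kernel)."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))      # 20 objects described by 3 features each
K = gaussian_gram(X)

# Symmetry: k(x, x') = k(x', x)
is_symmetric = np.allclose(K, K.T)

# Positive semi-definiteness: all eigenvalues of K are (numerically) non-negative,
# which is equivalent to c^T K c >= 0 for every real vector c.
min_eigenvalue = np.linalg.eigvalsh(K).min()
c = rng.normal(size=20)
quadratic_form = c @ K @ c
```

Checking one random vector `c` only illustrates the inequality; the eigenvalue check is what covers every choice of `c` for this particular set of objects.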
Interpretation of kernels 
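Suppose an object $x$ has an implicit feature representation $\phi(x) \in \mathcal{F}$. 
A kernel function can be seen as a dot product in this feature space: 
\[ k(x, x') = \langle \phi(x), \phi(x') \rangle \]
Linear models in this feature space $\mathcal{F}$ can be made: 
\[ y(x) = w^T \phi(x) = \sum_n a_n k(x_n, x) \]
[Figure: the feature map $\phi$ from the object space $\mathcal{X}$ to the feature space $\mathcal{F}$, where $k(x, x') = \langle \phi(x), \phi(x') \rangle$.]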
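The equivalence between the feature-space view and the kernel view can be checked on a small case. The sketch below assumes a homogeneous quadratic kernel on 2D vectors, for which the feature map is known explicitly; the kernel choice, the names `phi` and `k`, and the random data are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def phi(x):
    """Explicit feature map of the homogeneous quadratic kernel in 2D."""
    return np.array([x[0]**2, np.sqrt(2.0) * x[0] * x[1], x[1]**2])

def k(x, z):
    """Quadratic kernel k(x, z) = <x, z>^2, computed without the feature map."""
    return float(np.dot(x, z)) ** 2

rng = np.random.default_rng(1)
X_train = rng.normal(size=(5, 2))   # objects x_n
a = rng.normal(size=5)              # coefficients a_n
x_new = rng.normal(size=2)

# Feature-space form: w = sum_n a_n phi(x_n), y(x) = w^T phi(x)
w = sum(a_n * phi(x_n) for a_n, x_n in zip(a, X_train))
y_primal = float(w @ phi(x_new))

# Kernel form: y(x) = sum_n a_n k(x_n, x)
y_dual = sum(a_n * k(x_n, x_new) for a_n, x_n in zip(a, X_train))
```

Both forms give the same prediction up to round-off, but the kernel form never constructs the feature vector, which matters when the feature space is very large or only implicitly defined.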
Many kernel methods exist 
Examples of popular kernel methods: 
Support vector machine (SVM) 
Regularized least squares (RLS) 
Kernel principal component analysis (KPCA) 
The learning algorithm is independent of the kernel representation! 
[Figures: illustrations of SVM and KPCA.]
Dealing with sequences 
Kernels using sequence alignment 
Sequence alignment optimises a score that measures how well the residues of two sequences match. 
This score can be used as a kernel value (a similarity between sequences). 
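To make the idea of an alignment score concrete, here is a minimal sketch of a global alignment score (Needleman-Wunsch dynamic programming). The slides do not say which alignment algorithm or scoring scheme is used, so the function name and the `match`, `mismatch` and `gap` values are assumptions. Raw alignment scores are also not guaranteed to be positive semi-definite, so this is a similarity sketch rather than a validated kernel.

```python
def global_alignment_score(s, t, match=1, mismatch=-1, gap=-1):
    """Score of the best global alignment of s and t (Needleman-Wunsch)."""
    # prev[j]: best score for aligning the processed prefix of s with t[:j]
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        curr = [i * gap]  # aligning s[:i] with the empty string costs i gaps
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

score_same = global_alignment_score("GATTACA", "GATTACA")
score_diff = global_alignment_score("GATTACA", "GCATGCT")
```

With these scores, aligning a sequence with itself yields its length, and any other sequence of the same length scores lower.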
Kernels using substrings 
Spectrum kernel (SK) 
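The SK counts the $k$-mers $m$ that two sequences $s_i$ and $s_j$ have in common: 
\[ SK_k(s_i, s_j) = \sum_{m \in \Sigma^k} N(m, s_i) N(m, s_j) \]
with $N(m, s)$ the number of occurrences of the $k$-mer $m$ in sequence $s$ and $\Sigma^k$ the set of all $k$-mers over the sequence alphabet $\Sigma$. 
Many modifications exist. 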
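The spectrum kernel can be sketched directly from its definition. The function names `kmer_counts` and `spectrum_kernel` are illustrative, and the default `k=3` is an arbitrary assumption.

```python
from collections import Counter

def kmer_counts(s, k):
    """Count every k-mer (length-k substring) occurring in sequence s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))

def spectrum_kernel(s_i, s_j, k=3):
    """SK_k(s_i, s_j) = sum over k-mers m of N(m, s_i) * N(m, s_j)."""
    counts_i = kmer_counts(s_i, k)
    counts_j = kmer_counts(s_j, k)
    # Only k-mers present in both sequences contribute a non-zero term.
    return sum(n * counts_j[m] for m, n in counts_i.items() if m in counts_j)
```

For example, `spectrum_kernel("ABAB", "ABAB", k=2)` evaluates to 5: the 2-mer AB occurs twice and BA once, giving 2·2 + 1·1.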
Dealing with graphs 
What is a graph? 
Graph 
A graph is a set of objects, called vertices (or nodes), that are connected through edges. 
Graphs can show the structure of an object or interactions between different objects. 
Graphs are important in bioinformatics! 
Comparing nodes within a graph 
Diffusion kernel 
Constructing a similarity between vertices within the same graph. 
Based on performing a random walk on a graph. 
Captures the long-range relationships between vertices. 
Inspired by the heat equation. The kernel quantifies how quickly `heat' can spread from one node to another. 
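The slide gives no formula, so the sketch below assumes the common matrix-exponential form of the diffusion kernel, $K = \exp(-\beta L)$, with $L = D - A$ the graph Laplacian. The diffusion parameter `beta` and the example path graph are illustrative assumptions.

```python
import numpy as np

def diffusion_kernel(A, beta=0.5):
    """Diffusion kernel K = exp(-beta * L), with L = D - A the graph Laplacian."""
    A = np.asarray(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    # L is symmetric, so the matrix exponential follows from its eigendecomposition.
    eigvals, eigvecs = np.linalg.eigh(L)
    return (eigvecs * np.exp(-beta * eigvals)) @ eigvecs.T

# Path graph 0 - 1 - 2 - 3 (undirected, unweighted)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
K = diffusion_kernel(A)
```

On this path graph the similarity of vertex 0 to the other vertices decreases with graph distance, yet remains positive even for vertices that are not directly connected, which is the long-range behaviour described above.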
Comparing two separate graphs 
Graph kernel 
Constructing a similarity between graphs. 
Also based on performing a random walk on both graphs and counting the number of matching walks. 
Usually very computationally demanding! 
[Figures: example graphs in chemoinformatics and in structural bioinformatics.]
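One way to count matching walks is through the direct product graph, whose adjacency matrix is the Kronecker product of the two adjacency matrices. The sketch below is a simplified, truncated variant for unlabelled graphs; the `decay` weight, the `max_length` cut-off and the function name are assumptions, and practical graph kernels typically also match vertex and edge labels.

```python
import numpy as np

def random_walk_kernel(A1, A2, decay=0.5, max_length=3):
    """Weighted count of matching walks of length 0..max_length in two unlabelled graphs."""
    # A walk in the direct product graph is a pair of simultaneous walks,
    # one in each graph.
    A_prod = np.kron(np.asarray(A1, dtype=float), np.asarray(A2, dtype=float))
    walks = np.ones(A_prod.shape[0])   # walks[v]: number of walks of the current length starting in v
    total = 0.0
    for length in range(max_length + 1):
        total += decay**length * walks.sum()
        walks = A_prod @ walks         # extend every walk by one step
    return total

edge = np.array([[0, 1],
                 [1, 0]])
triangle = np.array([[0, 1, 1],
                     [1, 0, 1],
                     [1, 1, 0]])
k_edge_edge = random_walk_kernel(edge, edge)
k_edge_triangle = random_walk_kernel(edge, triangle)
```

The product graph has as many vertices as the product of the two graph sizes, which gives a feel for why such kernels become expensive on larger graphs.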
Other kernels 
Kernels for fingerprints 
Objects that can be described by a long binary vector $x$ can be compared using the Tanimoto kernel: 
\[ K_{Tan}(x_m, x_n) = \frac{\langle x_m, x_n \rangle}{\langle x_m, x_m \rangle + \langle x_n, x_n \rangle - \langle x_m, x_n \rangle}. \]
[Figure: fingerprint representation of a molecule.]
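A minimal sketch of the Tanimoto kernel for two binary fingerprints follows. The function name and the handling of two all-zero fingerprints, where the formula is undefined, are assumptions made for illustration.

```python
import numpy as np

def tanimoto_kernel(x_m, x_n):
    """Tanimoto kernel between two binary fingerprint vectors."""
    x_m = np.asarray(x_m, dtype=float)
    x_n = np.asarray(x_n, dtype=float)
    shared = x_m @ x_n                        # number of bits set in both
    denom = x_m @ x_m + x_n @ x_n - shared    # number of bits set in either
    # Two all-zero fingerprints: return 0 by convention (an assumption).
    return shared / denom if denom > 0 else 0.0

fp_a = [1, 1, 0, 1]
fp_b = [1, 0, 0, 1]
similarity = tanimoto_kernel(fp_a, fp_b)
```

For binary vectors the kernel is the number of shared bits divided by the number of bits set in at least one of the two fingerprints, here 2 out of 3.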
Kernels for other objects 
Kernels for texts: often based on word count (example: medical 
papers) 
Kernels for point clouds (example: using 3D structure of proteins) 
Fisher kernels: use information of a generative model (example: using 
a Hidden Markov Model) 
Learning relations 
Kronecker kernels 
A little math... 
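\[ A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}. \tag{1} \]
We define the vectorization operator, which stacks the columns of a matrix: 
\[ \mathrm{vec}(A) = \begin{bmatrix} a_{11} \\ a_{21} \\ a_{12} \\ a_{22} \end{bmatrix} \]
And the Kronecker product: 
\[ A \otimes B = \begin{bmatrix}
a_{11}b_{11} & a_{11}b_{12} & a_{12}b_{11} & a_{12}b_{12} \\
a_{11}b_{21} & a_{11}b_{22} & a_{12}b_{21} & a_{12}b_{22} \\
a_{21}b_{11} & a_{21}b_{12} & a_{22}b_{11} & a_{22}b_{12} \\
a_{21}b_{21} & a_{21}b_{22} & a_{22}b_{21} & a_{22}b_{22}
\end{bmatrix} \]
Key equation: $(B^T \otimes A)\,\mathrm{vec}(X) = \mathrm{vec}(AXB)$ 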
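The key equation is easy to verify numerically. The sketch below assumes that vec stacks the columns of a matrix (column-major order), which is the convention under which the identity holds in this form; the random test matrices are illustrative.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one long vector (column-major order)."""
    return M.flatten(order="F")

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 2))
B = rng.normal(size=(2, 2))
X = rng.normal(size=(2, 2))

lhs = np.kron(B.T, A) @ vec(X)   # (B^T kron A) vec(X)
rhs = vec(A @ X @ B)             # vec(A X B)
identity_holds = np.allclose(lhs, rhs)
```

The right-hand side never forms the Kronecker product, so it stays cheap even when the left-hand side would need a very large matrix.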
Kernels for pairs of objects 
Pairwise kernel 
Combine the kernel matrices of the individual objects to construct a kernel matrix for pairs of objects. 
Kronecker kernel: $K = K_U \otimes K_V$, with $K_U$ and $K_V$ the kernel matrices of the two types of objects. 
[Figure: poster excerpt (Michiel Stock, Willem Waegeman, Bernard De Baets; KERMIT, Department of Mathematical Modelling, Statistics and Bioinformatics) with chemogenomics as introductory example. Object kernels for proteins and ligands are combined into a pairwise kernel, which is passed to a learning algorithm (SVM, RLS, ...). Legible poster text: by optimizing a ranking loss, the algorithms can also be used for conditional ranking; the framework offers efficient learning, can handle complex objects (graphs, trees, sequences...) and can deal with information retrieval problems.]
Kernel ridge regression for relations 
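$N$ objects of type U (e.g. proteins) 
$M$ objects of type V (e.g. ligands) 
$Y$: $N \times M$ label matrix (e.g. molecular interaction) 
$K_U$: $N \times N$ kernel matrix for objects of type U 
$K_V$: $M \times M$ kernel matrix for objects of type V 
Set $y = \mathrm{vec}(Y)$ and $K = K_U \otimes K_V$ (the order of the Kronecker factors must match the stacking order used by vec). 
We can just use the usual kernel ridge regression: 
\[ \arg\min_{a} \; (y - Ka)^T (y - Ka) + \lambda\, a^T K a \]
This is equivalent to solving the following linear system: 
\[ (K + \lambda I_{NM \times NM})\, a = y \]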
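The linear system above can be sketched in a few lines. The code below uses row-major stacking of the label matrix (`Y.ravel()`), which is the ordering that matches `np.kron(K_U, K_V)` in NumPy. The names `K_U`, `K_V`, `lam`, the random data and the eigendecomposition shortcut in the second function are assumptions added for illustration; the shortcut is a standard consequence of the Kronecker structure and is not stated on the slide.

```python
import numpy as np

def kron_ridge_naive(K_U, K_V, Y, lam):
    """Solve (K + lam * I) a = y with the explicit NM x NM Kronecker kernel."""
    K = np.kron(K_U, K_V)
    y = Y.ravel()   # row-major stacking matches the ordering of kron(K_U, K_V)
    a = np.linalg.solve(K + lam * np.eye(K.shape[0]), y)
    return a.reshape(Y.shape)

def kron_ridge_eig(K_U, K_V, Y, lam):
    """Same solution from the eigendecompositions of the two small kernel matrices."""
    s_u, Q_u = np.linalg.eigh(K_U)
    s_v, Q_v = np.linalg.eigh(K_V)
    # In the joint eigenbasis the system is diagonal with entries s_u[i] * s_v[j] + lam.
    Y_tilde = Q_u.T @ Y @ Q_v
    A_tilde = Y_tilde / (np.outer(s_u, s_v) + lam)
    return Q_u @ A_tilde @ Q_v.T

rng = np.random.default_rng(0)
N, M = 6, 4
F_u = rng.normal(size=(N, 3))   # made-up features for the N objects of type U
F_v = rng.normal(size=(M, 2))   # made-up features for the M objects of type V
K_U = F_u @ F_u.T               # linear kernels, for illustration only
K_V = F_v @ F_v.T
Y = rng.normal(size=(N, M))

A_naive = kron_ridge_naive(K_U, K_V, Y, lam=0.1)
A_eig = kron_ridge_eig(K_U, K_V, Y, lam=0.1)
```

Both functions return the coefficients arranged as an N x M matrix. The naive version has to store and factorize an NM x NM matrix, whereas the second version only decomposes the N x N and M x M kernel matrices.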
Halmar dropshipping via API with DroFx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 

Kernel Methods and Relational Learning in Computational Biology

  • 1. Kernel Methods and Relational Learning in Computational Biology ir. Michiel Stock Faculty of Bioscience Engineering Ghent University November 2014 KERMIT Michiel Stock (KERMIT) Kernels for Computational Biology November 2014 1 / 36
  • 2. Outline. 1 Introduction. 2 Kernel methods: theoretical overview, dealing with sequences, dealing with graphs, other kernels. 3 Learning relations: Kronecker kernels, conditional ranking. 4 Predicting enzyme function: defining the problem, results. 5 Conclusions.
  • 4. Introduction
  • 5. Introduction. Introductory example: drug design. Strategy for curing Alzheimer's disease: find compounds with good ADMET properties that selectively bind cholinesterase and amyloid precursor protein.
  • 6. Introduction. Labels: known protein-ligand interactions. [Figure: graph of proteins and ligands (nodes labelled A to G and T to Z) joined by edges that carry known interaction values such as .2, .5, .6, .8 and 1.]
  • 7. Introduction. The targets: features for proteins. Possible representations: amino acid sequence, 3D structure, gene expression, cellular location, phylogenetic profiles, ...
  • 9. Introduction. The ligands: features for compounds. Possible representations: SMILES format and other text-based representations, coloured graph representation, fingerprints based on physicochemical descriptors, ...
  • 11. Introduction. Computational biology deals with interesting problems. We deal with objects that are: high-dimensional (e.g. microarrays or proteomics data); structured (e.g. gene sequences, small molecules, interaction networks, phylogenetic trees...); heterogeneous (e.g. vectors, sequences, graphs to describe the same protein); available in large quantities (e.g. more than $10^6$ known protein sequences); noisy (e.g. many features are not relevant).
  • 12. Introduction. Computational biology often deals with interactions. Relational learning: predicting properties of pairs of objects, which can be of different types.
  • 13. Kernel methods
  • 14. Kernel methods: Theoretical overview. Formal definition of a kernel. Kernels are non-linear functions defined over objects $x \in \mathcal{X}$. Definition: a function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is called a positive definite kernel if it is symmetric, that is, $k(x, x') = k(x', x)$ for any two objects $x, x' \in \mathcal{X}$, and positive semi-definite, that is, $\sum_{i=1}^{N} \sum_{j=1}^{N} c_i c_j k(x_i, x_j) \geq 0$ for any $N > 0$, any choice of $N$ objects $x_1, \ldots, x_N \in \mathcal{X}$, and any choice of real numbers $c_1, \ldots, c_N \in \mathbb{R}$. Kernels can be seen as generalized covariances.
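The two conditions in the definition can be checked numerically on a finite set of objects. The sketch below is only an illustration: the Gaussian RBF kernel, the toy data and all function names are assumptions that do not appear in the slides. It relies on the fact that, for a symmetric Gram matrix, the double sum is non-negative for every choice of coefficients exactly when the matrix has no negative eigenvalue.

```python
import numpy as np

def rbf_kernel(x, x_prime, gamma=0.5):
    """Gaussian RBF kernel, used here only as an example of a valid kernel."""
    return float(np.exp(-gamma * np.sum((x - x_prime) ** 2)))

def gram_matrix(kernel, objects):
    """Evaluate the kernel on every pair of objects."""
    n = len(objects)
    return np.array([[kernel(objects[i], objects[j]) for j in range(n)]
                     for i in range(n)])

def satisfies_kernel_conditions(K, tol=1e-10):
    """Check symmetry and positive semi-definiteness of a Gram matrix."""
    symmetric = np.allclose(K, K.T)
    # For symmetric K, c^T K c >= 0 for all c iff no eigenvalue is negative.
    smallest_eigenvalue = np.linalg.eigvalsh((K + K.T) / 2).min()
    return bool(symmetric and smallest_eigenvalue >= -tol)

rng = np.random.default_rng(0)
objects = rng.normal(size=(6, 3))   # six toy objects in R^3 (hypothetical data)
K = gram_matrix(rbf_kernel, objects)
c = rng.normal(size=6)              # arbitrary real coefficients c_1..c_N
quadratic_form = float(c @ K @ c)   # the double sum from the definition

# A symmetric matrix that is not a valid kernel matrix: eigenvalues -1 and +1.
not_a_kernel = np.array([[0.0, 1.0], [1.0, 0.0]])
```

A check on a finite sample can only refute the kernel property, never prove it for all possible objects.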
  • 20. Kernel methods: Theoretical overview. Interpretation of kernels. Suppose an object $x$ has an implicit feature representation $\phi(x) \in \mathcal{F}$. A kernel function can be seen as a dot product in this feature space: $k(x, x') = \langle \phi(x), \phi(x') \rangle$. Linear models in this feature space $\mathcal{F}$ can be made: $y(x) = w^T \phi(x) = \sum_n a_n k(x_n, x)$. [Figure: objects in $\mathcal{X}$ are mapped by $\phi$ into the feature space $\mathcal{F}$, where $k$ equals $\langle \phi(x), \phi(x') \rangle$.]
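As a small worked instance of the dot-product view, the sketch below uses the homogeneous quadratic kernel on two-dimensional vectors, for which the feature map is known in closed form. The choice of kernel, the vectors and the coefficients are assumptions made for illustration; they are not taken from the slides.

```python
import numpy as np

def quadratic_kernel(x, x_prime):
    """k(x, x') = (x . x')^2, evaluated without building any features."""
    return float(np.dot(x, x_prime) ** 2)

def feature_map(x):
    """Explicit feature map phi for the quadratic kernel on 2-D inputs."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

x = np.array([1.0, 2.0])
x_prime = np.array([3.0, -1.0])

implicit = quadratic_kernel(x, x_prime)                    # kernel evaluation
explicit = float(feature_map(x) @ feature_map(x_prime))    # <phi(x), phi(x')>

# The linear model y(x) = w^T phi(x) with w = sum_n a_n phi(x_n)
# equals the kernel expansion sum_n a_n k(x_n, x).
train = [np.array([1.0, 0.0]), np.array([0.5, -2.0])]      # hypothetical x_n
a = np.array([0.5, -1.0])                                  # hypothetical a_n
w = sum(a_n * feature_map(x_n) for a_n, x_n in zip(a, train))
primal = float(w @ feature_map(x))
dual = float(sum(a_n * quadratic_kernel(x_n, x) for a_n, x_n in zip(a, train)))
```

Both routes give the same number; the kernel route never needs the feature vectors, which is what makes it usable when the feature space is very large or implicit.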
  • 21. Kernel methods: Theoretical overview. Many kernel methods exist. Examples of popular kernel methods: support vector machine (SVM), regularized least squares (RLS), kernel principal component analysis (KPCA). The learning algorithm is independent of the kernel representation! [Figures: SVM and KPCA.]
  • 22. Kernel methods: Dealing with sequences. Kernels using sequence alignment. Sequence alignment optimises a score of how well the residues match; this score can be used as a kernel value (a similarity for sequences).
  • 23. Kernel methods: Dealing with sequences. Kernels using substrings. Spectrum kernel (SK): the SK considers the number of k-mers $m$ that two sequences $s_i$ and $s_j$ have in common: $SK_k(s_i, s_j) = \sum_{m \in \Sigma^k} N(m, s_i) N(m, s_j)$, with $\Sigma$ the alphabet and $N(m, s)$ the number of occurrences of the k-mer $m$ in sequence $s$. Many modifications exist.
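The spectrum kernel formula translates almost directly into code. The following is a minimal sketch; the function names and the toy sequences are illustrative assumptions, and only k-mers that actually occur in the first sequence need to be visited, since all others contribute zero to the sum.

```python
from collections import Counter

def kmer_counts(sequence, k):
    """Count every overlapping k-mer in the sequence: N(m, s) for all m."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

def spectrum_kernel(s_i, s_j, k):
    """Sum over k-mers m of N(m, s_i) * N(m, s_j)."""
    counts_i = kmer_counts(s_i, k)
    counts_j = kmer_counts(s_j, k)
    # A Counter returns 0 for k-mers that are absent from s_j.
    return sum(n * counts_j[m] for m, n in counts_i.items())

# "ABAB" contains AB twice and BA once; "ABA" contains AB once and BA once.
value = spectrum_kernel("ABAB", "ABA", 2)  # 2*1 + 1*1 = 3
```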
  • 25. Kernel methods: Dealing with graphs. What is a graph? A graph is a set of objects, called vertices (or nodes), that are connected through edges. Graphs can show the structure of an object or the interactions between different objects. Graphs are important in bioinformatics!
  • 26. Kernel methods: Dealing with graphs. Comparing nodes within a graph. Diffusion kernel: constructs a similarity between vertices within the same graph. It is based on performing a random walk on the graph and captures the long-range relationships between vertices. Inspired by the heat equation: the kernel quantifies how quickly 'heat' can spread from one node to another.
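The slides do not give a formula for the diffusion kernel. One common formulation, assumed here, is the matrix exponential $K = \exp(-\beta L)$ of the graph Laplacian $L = D - A$, with $\beta$ a diffusion-time parameter. The sketch below computes it through an eigendecomposition of $L$; the function name, the parameter `beta` and the four-node path graph are illustrative assumptions.

```python
import numpy as np

def diffusion_kernel(adjacency, beta=1.0):
    """Return exp(-beta * L) for the graph Laplacian L = D - A."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    # L is symmetric, so exp(-beta L) = Q diag(exp(-beta * lambda)) Q^T.
    eigenvalues, Q = np.linalg.eigh(L)
    return (Q * np.exp(-beta * eigenvalues)) @ Q.T

# Path graph 0 - 1 - 2 - 3 (hypothetical example).
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]])
K = diffusion_kernel(path, beta=1.0)
```

On this path graph the similarity to node 0 decays with distance along the path, yet it stays positive even for nodes that are not directly connected, which is the long-range behaviour the slide refers to.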
  • 28. Kernel methods: Dealing with graphs. Comparing two separate graphs. Graph kernel: constructs a similarity between graphs. It is also based on performing a random walk, here on both graphs, and counting the number of matching walks. Usually very computationally demanding! [Figures: examples from chemoinformatics and from structural bioinformatics.]
  • 29. Kernel methods: Other kernels. Kernels for fingerprints. Objects that can be described by a long binary vector $x$ can be compared with the Tanimoto kernel: $K_{Tan}(x_m, x_n) = \frac{\langle x_m, x_n \rangle}{\langle x_m, x_m \rangle + \langle x_n, x_n \rangle - \langle x_m, x_n \rangle}$. [Figure: fingerprint representation of a molecule.]
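For binary fingerprints the Tanimoto kernel is the number of shared set bits divided by the number of bits set in at least one of the two vectors. A minimal sketch follows; the function name, the toy fingerprints and the convention of returning 1 for two all-zero vectors are assumptions, not part of the slides.

```python
import numpy as np

def tanimoto_kernel(x_m, x_n):
    """<x_m, x_n> / (<x_m, x_m> + <x_n, x_n> - <x_m, x_n>)."""
    x_m = np.asarray(x_m, dtype=float)
    x_n = np.asarray(x_n, dtype=float)
    shared = x_m @ x_n
    denominator = x_m @ x_m + x_n @ x_n - shared
    if denominator == 0:
        # Both vectors are all zeros; treating them as identical is an assumption.
        return 1.0
    return float(shared / denominator)

fingerprint_a = [1, 1, 0, 1]   # hypothetical fingerprints
fingerprint_b = [1, 0, 0, 1]
similarity = tanimoto_kernel(fingerprint_a, fingerprint_b)  # 2 / (3 + 2 - 2)
```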
  • 31. Kernel methods: Other kernels. Kernels for other objects. Kernels for texts: often based on word counts (example: medical papers). Kernels for point clouds (example: using the 3D structure of proteins). Fisher kernels: use information from a generative model (example: using a hidden Markov model).
  • 32. Learning relations
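  • 33. Learning relations: Kronecker kernels. A little math... Let $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$ and $B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}$. (1) We define the vectorization operator, which stacks the columns of a matrix: $\mathrm{vec}(A) = (a_{11}, a_{21}, a_{12}, a_{22})^T$. And the Kronecker product: $A \otimes B = \begin{pmatrix} a_{11}b_{11} & a_{11}b_{12} & a_{12}b_{11} & a_{12}b_{12} \\ a_{11}b_{21} & a_{11}b_{22} & a_{12}b_{21} & a_{12}b_{22} \\ a_{21}b_{11} & a_{21}b_{12} & a_{22}b_{11} & a_{22}b_{12} \\ a_{21}b_{21} & a_{21}b_{22} & a_{22}b_{21} & a_{22}b_{22} \end{pmatrix}$. Key equation: $(B^T \otimes A)\,\mathrm{vec}(X) = \mathrm{vec}(AXB)$.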
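The key equation is easy to verify numerically. The sketch below uses NumPy's `np.kron` and takes `vec` to be column stacking, the convention under which the identity holds as written; the matrix sizes and random entries are arbitrary choices for illustration.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one long vector (column-major order)."""
    return np.asarray(M).reshape(-1, order="F")

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
X = rng.normal(size=(3, 2))
B = rng.normal(size=(2, 4))

left = np.kron(B.T, A) @ vec(X)   # (B^T kron A) vec(X), a 12-vector
right = vec(A @ X @ B)            # vec(AXB), also a 12-vector
```

The right-hand side never forms the large Kronecker matrix, which is the reason the identity matters once the factors are kernel matrices over many objects.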
  • 35. Learning relations: Kronecker kernels. Kernels for pairs of objects. Pairwise kernel: combine the kernel matrices of the individual objects to construct a kernel matrix for pairs of objects. Kronecker kernel: $K_{\otimes} = K_V \otimes K_U$, with $K_U$ and $K_V$ the kernel matrices of the two types of objects. [Figure: poster excerpt by Michiel Stock, Willem Waegeman and Bernard De Baets (KERMIT, Department of Mathematical Modelling, Statistics and Bioinformatics). Introductory example, chemogenomics: modelling binding interactions between a set of proteins and a database of ligands. Workflow: object kernels, pairwise kernel, learning algorithm (SVM, RLS, ...). By optimizing a ranking loss, the algorithms can also be used for conditional ranking. The framework offers an efficient learning process, can handle complex objects (graphs, trees, sequences...) and can deal with information retrieval problems.]
  • 36. Learning relations: Kronecker kernels. Kernel ridge regression for relations. Notation: $N$ objects of type $U$ (e.g. proteins); $M$ objects of type $V$ (e.g. ligands); $Y$: $N \times M$ label matrix (e.g. molecular interaction); $K_U$: $N \times N$ kernel matrix for objects of type $U$; $K_V$: $M \times M$ kernel matrix for objects of type $V$. Set $y = \mathrm{vec}(Y)$ and $K_{\otimes} = K_V \otimes K_U$. We can just use the usual kernel ridge regression: $\arg\min_a \, (y - K_{\otimes} a)^T (y - K_{\otimes} a) + \lambda a^T K_{\otimes} a$. This is equivalent to solving the following linear system: $(K_{\otimes} + \lambda I_{NM \times NM})\, a = y$.
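The linear system has $NM$ unknowns, but the Kronecker structure means it can be solved without ever building the $NM \times NM$ matrix. The sketch below uses a standard eigendecomposition shortcut; the slides do not say which solver was used, so this is an assumption, as are the names `K_u`, `K_v`, `lam` and the random toy data. Writing $a = \mathrm{vec}(A)$, the system becomes $K_U A K_V + \lambda A = Y$, which decouples in the eigenbases of the two kernel matrices.

```python
import numpy as np

def kronecker_ridge(K_u, K_v, Y, lam):
    """Solve (K_v kron K_u + lam * I) vec(A) = vec(Y) and return A (N x M)."""
    evals_u, Q_u = np.linalg.eigh(K_u)   # K_u = Q_u diag(evals_u) Q_u^T
    evals_v, Q_v = np.linalg.eigh(K_v)   # K_v = Q_v diag(evals_v) Q_v^T
    Y_rotated = Q_u.T @ Y @ Q_v
    # In the rotated basis each coefficient is scaled independently.
    A_rotated = Y_rotated / (np.outer(evals_u, evals_v) + lam)
    return Q_u @ A_rotated @ Q_v.T

rng = np.random.default_rng(1)
F_u = rng.normal(size=(4, 5))            # hypothetical features, N = 4
F_v = rng.normal(size=(3, 5))            # hypothetical features, M = 3
K_u, K_v = F_u @ F_u.T, F_v @ F_v.T      # linear kernel matrices
Y = rng.normal(size=(4, 3))              # toy label matrix
lam = 0.1

A = kronecker_ridge(K_u, K_v, Y, lam)
fitted = K_u @ A @ K_v                   # model outputs for all training pairs

# Reference solution that builds the full 12 x 12 system explicitly.
a_direct = np.linalg.solve(np.kron(K_v, K_u) + lam * np.eye(12),
                           Y.reshape(-1, order="F"))
```

The shortcut costs two small eigendecompositions instead of one solve on the full pairwise matrix, which is what makes the approach practical for many proteins and ligands.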
  • 37. Learning relations: Conditional ranking. Motivation: suppose one is not particularly interested in the exact value of the interaction but in the order of the proteins for a given ligand. [Figure: conditional ranking workflow in which database objects are ordered from less to more relevant for Query 1 and for Query 2.]
  • 38. Learning relations: Conditional ranking. Suppose $e = (u, v) \in E = (U \times V)$. Train the model $h(e) = w^T \phi(e) = \sum_{\bar{e} \in E} a_{\bar{e}} K(e, \bar{e})$ by solving $A(T) = \arg\min_{h \in \mathcal{H}} L(h, T) + \lambda \|h\|^2_{\mathcal{H}}$, where we use a ranking loss: $L(h, T) = \sum_{u, u' \in U} \sum_{v, v' \in V} (y_{u,v} - y_{u',v'} - h(u, v) + h(u', v'))^2$. [Figure: preference graph, an example of a multi-graph, reproduced from Pahikkala et al., 2010.]
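The ranking loss can be evaluated directly from a label matrix and a matrix of model outputs. The sketch below follows the double sum as written above, over all pairs of (u, v) combinations; the function names and toy matrices are assumptions. It also includes a closed form that follows from expanding the square with the residuals $r = y - h$: $\sum_{i,j} (r_i - r_j)^2 = 2n \sum_i r_i^2 - 2 (\sum_i r_i)^2$, with $n$ the number of pairs.

```python
import numpy as np

def ranking_loss_naive(Y, H):
    """Sum of (y_e - y_e' - h(e) + h(e'))^2 over all ordered pairs e, e'."""
    y = np.asarray(Y, dtype=float).ravel()
    h = np.asarray(H, dtype=float).ravel()
    loss = 0.0
    for i in range(y.size):
        for j in range(y.size):
            loss += (y[i] - y[j] - h[i] + h[j]) ** 2
    return loss

def ranking_loss(Y, H):
    """Same loss computed from the residuals r = y - h in closed form."""
    r = (np.asarray(Y, dtype=float) - np.asarray(H, dtype=float)).ravel()
    n = r.size
    return float(2 * n * np.sum(r ** 2) - 2 * np.sum(r) ** 2)

rng = np.random.default_rng(2)
Y = rng.integers(0, 5, size=(3, 4)).astype(float)   # toy labels
H = rng.normal(size=(3, 4))                         # toy model outputs
```

Because only differences of predictions enter the loss, adding a constant to every prediction leaves it unchanged: the loss rewards getting the order right rather than the exact values.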
  • 39. Predicting enzyme function
  • 40. Predicting enzyme function. The data set. Data: two data sets of ca. 1600 enzymes with 21 different functions; five different similarity measures of the active site. [Figure: active site of an enzyme.]
  • 42. Predicting enzyme function. The enzyme commission number.
  • 44. Predicting enzyme function: Defining the problem. Quantifying enzyme function similarity. [Figure: graph over the enzymes EC 2.7.7.12, EC 4.2.3.90, EC 2.7.7.34, EC 4.6.1.11, EC 2.7.1.12 and an unannotated enzyme EC ?.?.?.?, with edges labelled by similarity values between 0 and 3.]
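The slides do not spell out how the similarity between two EC numbers is computed. A reading that is consistent with the EC numbers and edge values recoverable from the figure, and which is assumed here, is the number of leading EC levels the two numbers share. The sketch below implements that assumption; the function names are hypothetical.

```python
def ec_levels(ec_number):
    """Split a string such as 'EC 2.7.7.12' into its levels."""
    return ec_number.replace("EC", "").strip().split(".")

def ec_similarity(ec_a, ec_b):
    """Count how many leading EC levels two enzymes have in common (assumed definition)."""
    shared = 0
    for level_a, level_b in zip(ec_levels(ec_a), ec_levels(ec_b)):
        if level_a != level_b:
            break           # levels are hierarchical: stop at the first mismatch
        shared += 1
    return shared

same_subsubclass = ec_similarity("EC 2.7.7.12", "EC 2.7.7.34")  # shares 2.7.7
same_subclass = ec_similarity("EC 2.7.7.12", "EC 2.7.1.12")     # shares 2.7
```

Note that the last level of EC 2.7.7.12 and EC 2.7.1.12 is the same, yet it does not count because the third level already differs.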
  • 46. Predicting enzyme function: Defining the problem. Conditional ranking of enzymes. Ranking enzymes: for an unannotated enzyme, rank the annotated enzymes so that those at the top have a function similar to that of the query. Minimize the ranking error: the number of switches needed for a perfect ranking. Example: suppose one has an enzyme with unknown function, EC ?.?.?.?, and obtains the ranking 1. EC 2.7.7.12; 2. EC 2.7.7.12; 3. EC 2.7.7.34; 4. EC 2.7.1.12; 5. EC 2.7.7.34; 6. EC 4.2.3.90; 7. EC 1.14.11; 8. EC 4.6.1.11. ⇒ EC 2.7.7.12.
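If a 'switch' is taken to mean swapping two neighbouring entries, the ranking error equals the number of pairs that appear in the wrong order, since each adjacent swap removes exactly one such pair. The sketch below counts them. The relevance values for the example are hypothetical: they assume the query's true function is EC 2.7.7.12 and that relevance is the number of shared leading EC levels, neither of which is stated in the slides.

```python
def ranking_error(relevance_in_ranked_order):
    """Count pairs where a less relevant item is ranked above a more relevant one."""
    rel = list(relevance_in_ranked_order)
    return sum(1
               for i in range(len(rel))
               for j in range(i + 1, len(rel))
               if rel[i] < rel[j])

# Assumed relevance of the eight ranked enzymes from the example above:
# 2.7.7.12, 2.7.7.12, 2.7.7.34, 2.7.1.12, 2.7.7.34, 4.2.3.90, 1.14.11, 4.6.1.11
example_relevance = [4, 4, 3, 2, 3, 0, 0, 0]
example_error = ranking_error(example_relevance)
```

Under these assumptions the only misordered pair is the enzyme at position 4 (relevance 2) ranked above the one at position 5 (relevance 3), so a single switch repairs the ranking.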
  • 48. Predicting enzyme function: Defining the problem. Learning the catalytic similarity. Pair of enzymes: $e = (v, v')$. Label $y_e \in \{0, 1, 2, 3, 4\}$: the catalytic similarity. Five different structural similarities: $K(v, v')$. Label matrix for enzymes A to G (the entries involving F and G are empty):
          A  B  C  D  E  F  G
       A  4  4  0  0  0
       B  4  4  0  0  0
       C  0  0  4  2  1
       D  0  0  2  4  3
       E  0  0  1  3  4
       F
       G
  • 50. Predicting enzyme function: Results. Qualitative improvement in the enzyme similarities. Example for the CavBase structural similarity. [Figure: unsupervised, supervised and ground-truth similarity matrices; lighter color = higher similarity.]
  • 51. Predicting enzyme function: Results. Improvement of the ROC curves. ROC curves for the five different structural similarity measures, unsupervised and supervised. [Figure: ROC curves for the different enzyme similarity measurements of data set I; x-axis: false positive rate; y-axis: average true positive rate; curves: CB, FP, LPCS, MCS and SW, each supervised (sup.) and unsupervised (unsup.).] Improvement: increase of AUC from ca. 0.7 to more than 0.8!
  • 53. Conclusions. Kernels can be used to work with structured objects... and can encode your prior knowledge. Many problems in computational biology can be seen as 'learning relations'. Relations between objects can be learned elegantly and efficiently using Kronecker kernels.
  • 54. Conclusions. Kernel Methods and Relational Learning in Computational Biology. ir. Michiel Stock, Faculty of Bioscience Engineering, Ghent University, November 2014. KERMIT.