Bioinformatics kernels relations
Presentation Transcript

  • Kernel Methods and Relational Learning in Bioinformatics. ir. Michiel Stock, Dr. Willem Waegeman, Prof. dr. Bernard De Baets. Faculty of Bioscience Engineering, Ghent University. KERMIT, November 2012.
  • Outline: 1. Introduction; 2. Kernel methods; 3. Learning relations; 4. Case studies (enzyme function prediction, protein-ligand interactions, microbial ecology); 5. Conclusions.
  • Introduction: Introductory example. Problem statement: predict protein-protein interactions from high-throughput data, based on a gold standard. Typical features that can be used: yeast two-hybrid, Pfam profile, phylogenetic profile, localization, PSI-BLAST, expression, ...
  • Introduction: Machine learning is widely used in bioinformatics. Figure 1 (Larrañaga et al.): classification of the topics where machine learning methods are applied.
  • Introduction: Bioinformatics deals with complex data. Bioinformatics data is typically: high-dimensional (e.g., microarray or proteomics data); structured (e.g., gene sequences, small molecules, interaction networks, phylogenetic trees, ...); heterogeneous (e.g., vectors, sequences and graphs describing the same protein); available in large quantities (e.g., more than 10^6 known protein sequences); noisy (e.g., many features are not relevant).
  • Kernel methods: Formal definition of a kernel. Kernels are non-linear functions defined over objects x ∈ X. Definition: a function k : X × X → R is called a positive definite kernel if it is symmetric, that is, k(x, x') = k(x', x) for any two objects x, x' ∈ X, and positive semi-definite, that is, ∑_{i=1}^N ∑_{j=1}^N c_i c_j k(x_i, x_j) ≥ 0 for any N > 0, any choice of N objects x_1, ..., x_N ∈ X and any choice of real numbers c_1, ..., c_N ∈ R. Kernels can be seen as generalized covariances.
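On a finite set of objects, the two defining properties can be checked directly on the Gram matrix K with K_ij = k(x_i, x_j): symmetry, plus non-negative eigenvalues. A minimal sketch in Python (the helper name is illustrative, not from the talk):

```python
import numpy as np

def is_positive_semidefinite(K: np.ndarray, tol: float = 1e-10) -> bool:
    """Check the two defining properties of a kernel on a finite Gram matrix."""
    symmetric = np.allclose(K, K.T)
    # sum_ij c_i c_j K_ij >= 0 for all c  <=>  all eigenvalues of K are >= 0
    psd = bool(np.all(np.linalg.eigvalsh(K) >= -tol))
    return symmetric and psd
```

A valid Gram matrix such as [[2, 1], [1, 2]] passes, while the symmetric matrix [[0, 1], [1, 0]] fails because it has eigenvalue -1.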
  • Kernel methods: Interpretation of kernels. Suppose an object x has an implicit feature representation φ(x) ∈ F. A kernel function can be seen as a dot product in this feature space: k(x, x') = ⟨φ(x), φ(x')⟩. Linear models in this feature space F can be made: y(x) = w^T φ(x) = ∑_n a_n k(x_n, x).
  • Kernel methods: Many kernel methods exist. Examples of popular kernel methods: support vector machine (SVM), regularized least squares (RLS), kernel principal component analysis (KPCA). The learning algorithm is independent of the kernel representation!
  • Kernel methods: Kernels for (protein) sequences. Spectrum kernel (SK): the SK counts the k-mers m that two sequences s_i and s_j have in common: SK_k(s_i, s_j) = ∑_{m ∈ Σ^k} N(m, s_i) · N(m, s_j), with N(m, s) the number of occurrences of k-mer m in sequence s. Used to predict the structure, function, ... of DNA, RNA or proteins. A discriminative alternative to hidden Markov models.
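The formula above translates almost directly into code; a small sketch under the same definitions (the function name is illustrative):

```python
from collections import Counter

def spectrum_kernel(s_i: str, s_j: str, k: int = 3) -> int:
    """SK_k(s_i, s_j) = sum over k-mers m of N(m, s_i) * N(m, s_j)."""
    def kmer_counts(s: str) -> Counter:
        return Counter(s[p:p + k] for p in range(len(s) - k + 1))
    c_i, c_j = kmer_counts(s_i), kmer_counts(s_j)
    # k-mers absent from either sequence contribute nothing to the sum
    return sum(count * c_j[m] for m, count in c_i.items())
```

For example, spectrum_kernel("ATGATG", "TGATG", k=2) sums over the shared 2-mers AT, TG and GA, and the kernel is symmetric by construction.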
  • Kernel methods: Kernels for graphs (1). Graph: a graph is a set of interconnected objects, called vertices (or nodes), that are connected through edges. Graphs can show the structure of an object or the interactions between different objects. Graphs are important in bioinformatics!
  • Kernel methods: Kernels for graphs (2). Graph kernel: constructs a similarity between graphs, used in chemoinformatics and in structural bioinformatics. Based on performing a random walk on both graphs and counting the number of matching walks. Usually very computationally demanding!
  • Kernel methods: Kernels for graphs (3). Diffusion kernel: constructs a similarity between vertices within the same graph. Also based on performing a random walk on a graph; captures the long-range relationships between vertices. Inspired by the heat equation: the kernel quantifies how quickly 'heat' can spread from one node to another.
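One common form of the diffusion kernel (following Kondor and Lafferty; the slide does not spell out the formula) is the matrix exponential of the negated graph Laplacian, K = exp(-βL). A sketch via the eigendecomposition of the symmetric Laplacian:

```python
import numpy as np

def diffusion_kernel(A: np.ndarray, beta: float = 1.0) -> np.ndarray:
    """K = expm(-beta * L), with L the Laplacian of the adjacency matrix A."""
    L = np.diag(A.sum(axis=1)) - A
    # L is symmetric, so the matrix exponential follows from its eigendecomposition
    w, V = np.linalg.eigh(L)
    return V @ np.diag(np.exp(-beta * w)) @ V.T
```

On a path graph 1-2-3, the kernel value between the adjacent vertices 1 and 2 exceeds that between the end points 1 and 3: heat spreads more easily to nearby nodes.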
  • Kernel methods: Kernels for fingerprints. Objects that can be described by a long binary vector x (a fingerprint representation of the object) can be compared with the Tanimoto kernel: K_Tan(x_m, x_n) = ⟨x_m, x_n⟩ / (⟨x_m, x_m⟩ + ⟨x_n, x_n⟩ − ⟨x_m, x_n⟩).
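For binary fingerprints this is the ratio of shared bits to the total number of bits set in either vector; a direct sketch of the formula above:

```python
import numpy as np

def tanimoto_kernel(x_m: np.ndarray, x_n: np.ndarray) -> float:
    """Tanimoto kernel on binary fingerprint vectors."""
    shared = float(x_m @ x_n)
    return shared / (float(x_m @ x_m) + float(x_n @ x_n) - shared)
```

Identical fingerprints give 1.0, and disjoint non-empty fingerprints give 0.0.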
  • Learning relations: Kernels for pairs of objects. Problem statement: predict the binding interaction between a given protein and a ligand (small molecule), as in molecular docking. The problem deals with two types of objects: proteins (graph kernel on the structure, sequence kernel, fingerprints, ...) and ligands (fingerprints, graph kernel, ...). The label belongs to a pair of objects.
  • Learning relations: Kernels for pairs of objects. Pairwise kernel: combine the kernel matrices of the individual objects to construct a kernel matrix for pairs of objects, starting from individual kernels for the proteins and the ligands. Introductory example from chemogenomics: predict the binding interactions between a set of proteins and a database of ligands to aid the process of drug design, with a statistical model based on a data set. Kernel methods allow us to model pairwise relations between different types of objects. By optimizing a ranking loss, the algorithms (SVM, RLS, ...) can also be used for conditional ranking. In short, the framework is well suited for bioinformatics challenges: an efficient learning process, the ability to handle complex objects (graphs, trees, sequences, ...) and the ability to deal with information retrieval problems.
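A standard way to combine the object kernels into a pairwise kernel is the Kronecker product construction: the kernel between pairs (p, l) and (p', l') is k_prot(p, p') · k_lig(l, l'). A sketch on precomputed Gram matrices (function name illustrative, not the authors' code):

```python
import numpy as np

def pairwise_kernel(K_prot: np.ndarray, K_lig: np.ndarray) -> np.ndarray:
    """Kronecker pairwise kernel between all (protein, ligand) pairs."""
    # entry [i * n_lig + a, j * n_lig + b] equals K_prot[i, j] * K_lig[a, b]
    return np.kron(K_prot, K_lig)
```

The resulting matrix is again symmetric and positive semi-definite whenever the two input Gram matrices are, so it can be plugged into any kernel method unchanged.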
  • Learning relations: Conditional ranking (1). Motivation: suppose one is not particularly interested in the exact value of the interaction, but in the order of the proteins for a given ligand. (Illustration: two queries, each inducing a ranking of the database objects from more relevant to less relevant.)
  • Learning relations: Conditional ranking (2). Based on a graph description, with e a pair of objects. Train the model h(e) = ⟨w, Φ(e)⟩ = ∑_{ē ∈ E} a_ē K^Φ(e, ē) using the algorithm A(T) = argmin_{h ∈ H} L(h, T) + λ‖h‖²_H, where we use a ranking loss L(h, T) = ∑_{v ∈ V} ∑_{e,ē ∈ E_v} (y_e − y_ē − h(e) + h(ē))². The framework is based on the Kronecker product kernel, which yields implicit joint feature representations of queries and the sets of objects; exactly this kernel construction allows a straightforward extension to dyadic relations and multi-task learning, and it has been proposed independently for modeling pairwise inputs in different application domains (Basilico et al. 2004, Ben-Hur et al. 2005). (Figure: example of a multi-graph, reproduced from Pahikkala et al., 2010.)
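The ranking loss above compares every pair of edges that share the same conditioning vertex v; a literal sketch of that double sum (names illustrative):

```python
import numpy as np

def ranking_loss(y: np.ndarray, h: np.ndarray, groups) -> float:
    """L(h, T): squared pairwise ranking loss over edges sharing a condition v."""
    loss = 0.0
    for edges in groups:          # indices of the edges in one E_v
        for e in edges:
            for e_bar in edges:
                loss += float((y[e] - y[e_bar] - h[e] + h[e_bar]) ** 2)
    return loss
```

Predictions that are correct up to a constant shift within each group incur zero loss, which is exactly why the loss targets the order of the objects rather than the exact interaction values.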
  • Case studies, Enzyme function prediction: Predicting enzyme function. Problem statement: predict the function (EC number) of an enzyme using structural information on the active site. Data: the active sites of 1730 enzymes with 21 different functions; four different structural similarities: CavBase, maximum common subgraph, labeled point cloud superposition, fingerprints.
  • Case studies, Enzyme function prediction: EC numbers. EC number: a functional label of an enzyme, based on the reaction that is catalyzed. Example: EC 2.7.6.1 = ribose-phosphate diphosphokinase.
  • Case studies, Enzyme function prediction: Defining catalytic similarity. Catalytic similarity: the number of successive equal digits in the EC numbers of two enzymes, starting from the first digit. Example: relative to EC 2.7.7.34, EC 2.7.7.12 has similarity 3, EC 2.7.1.12 has similarity 2, and EC 4.2.3.90 and EC 4.6.1.11 have similarity 0.
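This similarity is straightforward to compute from the dotted EC notation; a sketch (the function name is illustrative):

```python
def catalytic_similarity(ec_a: str, ec_b: str) -> int:
    """Number of successive equal EC digits, starting from the first one."""
    sim = 0
    for digit_a, digit_b in zip(ec_a.split("."), ec_b.split(".")):
        if digit_a != digit_b:
            break                 # stop at the first mismatching digit
        sim += 1
    return sim
```

The examples from the slide follow directly: EC 2.7.7.34 versus EC 2.7.7.12 gives 3, versus EC 2.7.1.12 gives 2, and versus EC 4.2.3.90 gives 0.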
  • Case studies, Enzyme function prediction: Data exploration. (Figures: kernel PCA plots, first versus second and third components, and hierarchical clusterings with complete linkage, for each of the cb, fp, mcs and lpcs data.)
  • Case studies, Enzyme function prediction: Ranking enzymes. For a query enzyme with unknown function, construct a ranking of a database of annotated enzymes, based on structure; the top of the ranking likely has the same function as the query. Unsupervised: for a given query enzyme with unknown function, rank the database according to the structural similarity with the query. Supervised: first a ranking model h(v, v') is constructed using an independent training set; subsequently, for a given query enzyme v with unknown function, the enzymes v_i from the database are ranked according to h(v, v_i).
  • Case studies, Enzyme function prediction: Results of ranking enzymes. Table II: the results obtained for unsupervised and supervised ranking. For each combination of cavity-based similarity and performance measure, the performance is averaged over the different folds and queries, with the standard deviation between parentheses.

    Unsupervised ranking:
      RA:   cb 0.9062 (0.0603), fp 0.8815 (0.0689), mcs 0.8923 (0.0692), lpcs 0.8877 (0.0607)
      MAP:  cb 0.9321 (0.1531), fp 0.7207 (0.235), mcs 0.8846 (0.1578), lpcs 0.7339 (0.2074)
      AUC:  cb 0.9636 (0.0795), fp 0.8655 (0.1387), mcs 0.9393 (0.0919), lpcs 0.8794 (0.1126)
      nDCG: cb 0.9922 (0.0329), fp 0.9349 (0.1424), mcs 0.9812 (0.0498), lpcs 0.9471 (0.1112)
    Supervised ranking:
      RA:   cb 0.9951 (0.017), fp 0.995 (0.015), mcs 0.9944 (0.0112), lpcs 0.9952 (0.0156)
      MAP:  cb 0.9991 (0.0092), fp 0.9954 (0.0432), mcs 0.9989 (0.0076), lpcs 0.9835 (0.0797)
      AUC:  cb 0.9976 (0.0005), fp 0.9967 (0.0184), mcs 0.9975 (0.0024), lpcs 0.9934 (0.0368)
      nDCG: cb 0.9968 (0.0171), fp 0.9942 (0.0424), mcs 0.987 (0.0398), lpcs 0.9812 (0.0673)
  • Case studies, Enzyme function prediction: Supervised ranking preserves hierarchies (1). (Figure.)
  • Case studies, Enzyme function prediction: Supervised ranking preserves hierarchies (2). (Figures: predictions versus catalytic similarity (0, 1, 2, 4) for the cb, fp, mcs and lpcs data, unsupervised in the top row and supervised in the bottom row.)
  • Case studies, Protein-ligand interactions: Predicting protein-ligand interactions. Problem statement: predict the binding interaction between a given protein and a ligand (small molecule), as in molecular docking. Training uses the Karaman dataset: 317 kinase targets and 38 kinase inhibitors; for each combination the dissociation constant Kd in nM is known.
  • Case studies, Protein-ligand interactions: The Karaman dataset. (Figure © 2008 Nature Publishing Group, http://www.nature.com/naturebiotechnology.)
  • Case studies, Protein-ligand interactions: Building a model. Features: CavBase similarity for the proteins; Tanimoto kernel on fingerprints derived from the ligands; virtual docking results. Model types: classification by specifying a cutoff value, using RLS; conditional ranking, using one type of object to construct a ranking of the other type according to binding energy.
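Kernel RLS, the learner used for the classification model here, has a closed-form dual solution a = (K + λI)^(-1) y. A minimal sketch on precomputed kernel matrices (function names illustrative, not the authors' code):

```python
import numpy as np

def rls_fit(K: np.ndarray, y: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Dual RLS coefficients a, solving (K + lam * I) a = y."""
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def rls_predict(K_test: np.ndarray, a: np.ndarray) -> np.ndarray:
    """Predictions y_hat = K_test @ a, with K_test the test-train kernel matrix."""
    return K_test @ a
```

Because only the Gram matrix enters the solution, the same two functions work unchanged with a sequence kernel, a Tanimoto kernel, or a pairwise kernel over (protein, ligand) pairs.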
  • Case studies, Protein-ligand interactions: Classification results.

    Test sampling  Cutoff [nM]  AUC
    new ligand     1000         0.621584 (0.104163)
    new ligand     10000        0.653330 (0.107727)
    new protein    1000         0.812184 (0.185627)
    new protein    10000        0.801310 (0.157205)

  The cutoff value hardly matters; generalizing to a new ligand is harder than to a new protein.
  • Case studies, Protein-ligand interactions: Ranking results. Testing scheme: a new query for the same database.

    Query type  Ranking error
    Ligand      0.324000 (0.129307)
    Protein     0.32799 (0.088344)

  The query type does not matter (much); using a protein as query is somewhat more reliable.
  • Case studies, Microbial ecology: Predicting microbial interactions. Problem statement: how do heterotrophic bacteria influence the growth of methanotrophic bacteria? Dataset: 10 methanotrophs and 27 heterotrophs; for each combination, a time series of their collective growth (optical density, OD) was measured over 14 days.
  • Case studies, Microbial ecology: Concept. Methanotrophs grow on methane, heterotrophs on carbon compounds; the compounds exchanged between them (vitamins? antibiotics?) are unknown. Features: the pairwise combination (⊗) of the descriptions of both organisms.
  • Case studies, Microbial ecology: Experimental setup. (Figure.)
  • Case studies, Microbial ecology: Optical density time series. (Figures: OD versus time in days for Meth_5 with Hetero_2 and for Meth_7 with Hetero_10, with the maximal OD and the maximal increase in OD marked.) Three types of labels were derived from these plots: the maximal optical density, the maximal increase in optical density, and the time of the maximal increase in optical density.
  • Case studies, Microbial ecology: Labels for bacterial combinations. (Figure: heat map of the logarithm of the maximal density, heterotrophs (H 1 to H 25 and NMS) versus methanotrophs (M 1 to M 9).)
  • Case studies, Microbial ecology: Regression results. Pairwise regression of the labels using support vector regression; testing is done by withholding each heterotroph in a leave-one-out scheme.

    Label               MSE/var  Spearman cor.
    Max. OD             0.8248   0.6875
    Max. incr. OD       0.7888   0.57708
    Time max. incr. OD  0.9694   0.3839

  This is a hard problem! The exact experimental conditions are very important!
  • Case studies, Microbial ecology: Extra feature selection. Idea: look for the genes in the heterotrophs that are most relevant for the interaction, using lasso regression (in combination with the LARS algorithm) or regularized random forests. For example, the maximal OD seems to be determined by genes related to methenyltetrahydrofolate. Take with a large grain of salt! (Figure: standardized coefficient paths of the LARS/LASSO fit.)
  • Conclusions: Take-home messages. Use kernels for complex structured data. Relations can be learned by treating a pair of objects as a special kind of structured object. Predicting a ranking is in many cases a more relevant answer to a research question. Posing the right research question is of vital importance when building models!
  • Conclusions: Further reading I.
    [1] A. Ben-Hur and W. S. Noble. Kernel methods for predicting protein-protein interactions. Bioinformatics, 21 Suppl 1:i38–46, June 2005.
    [2] S. Erdin, A. M. Lisewski, and O. Lichtarge. Protein function prediction: towards integration of similarity metrics. Current Opinion in Structural Biology, 21(2):180–8, Apr. 2011.
    [3] L. Jacob and J.-P. Vert. Protein-ligand interaction prediction: an improved chemogenomics approach. Bioinformatics, 24(19):2149–56, Oct. 2008.
    [4] T. Pahikkala, A. Airola, M. Stock, B. De Baets, and W. Waegeman. Efficient regularized least-squares algorithms for conditional ranking on relational data. Machine Learning, submitted, 2012.
    [5] B. Schölkopf, K. Tsuda, and J.-P. Vert. Kernel Methods in Computational Biology. 2004.
  • Conclusions: Further reading II.
    [6] M. Stock. Learning pairwise relations in bioinformatics: three case studies. Master's thesis, Ghent University, 2012.
    [7] J.-P. Vert, J. Qiu, and W. S. Noble. A new pairwise kernel for biological network inference with support vector machines. BMC Bioinformatics, 8(S-10), Jan. 2007.
    [8] W. Waegeman, T. Pahikkala, A. Airola, T. Salakoski, M. Stock, and B. De Baets. A kernel-based framework for learning graded relations from data. IEEE Transactions on Fuzzy Systems, 99:1, 2012.