This document discusses graph kernels: positive definite kernels defined on graphs that allow machine learning algorithms to be applied to graph-structured data such as molecules. It covers different types of graph kernels, including subgraph kernels, path kernels, and walk kernels. Walk kernels count the walks two graphs have in common and, unlike subgraph and path kernels, can be computed in polynomial time. The document also discusses using product graphs to compute walk kernels and presents results on classifying mutagenicity using random walk kernels. It concludes by proposing the use of graph kernels and product graphs to define data depth measures for labeled graph ensembles.
To describe the dynamics taking place in networks that structurally change over time, we propose an approach to search for attributes whose value changes impact the topology of the graph. In several applications, the variations of a group of attributes are often followed by structural changes in the graph that they may be assumed to generate. We formalize the triggering pattern discovery problem as a method jointly rooted in sequence mining and graph analysis. We apply our approach to three real-world dynamic graphs of different natures - a co-authoring network, an airline network, and a social bookmarking system - assessing the relevance of the triggering pattern mining approach.
Towards a stable definition of Algorithmic Randomness - Hector Zenil
Although information content is invariant up to an additive constant, the range of possible additive constants applicable to programming languages is so large that in practice it plays a major role in the actual evaluation of K(s), the Kolmogorov complexity of a string s. We present a summary of the approach we've developed to overcome the problem by calculating its algorithmic probability and evaluating the algorithmic complexity via the coding theorem, thereby providing a stable framework for Kolmogorov complexity even for short strings. We also show that reasonable formalisms produce reasonable complexity classifications.
In this paper, we introduce the notions of m-shadow graphs and n-splitting graphs, m ≥ 2, n ≥ 1. We prove that the m-shadow graphs for paths, complete bipartite graphs, and the symmetric product between paths and null graphs are odd graceful. In addition, we show that the n-splitting graphs for paths, stars, and the symmetric product between paths and null graphs are odd graceful. Finally, we present some examples to illustrate the proposed theories.
Information Content of Complex Networks - Hector Zenil
This short talk given in Stockholm, Sweden, explains how algorithmic complexity measures, notably Kolmogorov complexity approximated both by lossless compression algorithms and the Block Decomposition Method (BDM) are capable of characterizing graphs and networks by some of their group-theoretic and topological properties, notably graph automorphism group size and clustering coefficients of complex networks. The method distinguished between models of networks such as regular, random, small-world and scale-free.
Fractal dimension versus Computational Complexity - Hector Zenil
We investigate connections and tradeoffs between two important complexity measures: fractal dimension and computational (time) complexity. We report exciting results applied to space-time diagrams of small Turing machines with precise mathematical relations and formal conjectures connecting these measures. The preprint of the paper is available at: http://arxiv.org/abs/1309.1779
Fractal Dimension of Space-time Diagrams and the Runtime Complexity of Small ... - Hector Zenil
Complexity measures are designed to capture complex behaviour and to quantify how complex that particular behaviour is. If a certain phenomenon is genuinely complex, this means that it does not all of a sudden become simple just by translating the phenomenon to a different setting or framework with a different complexity value. It is in this sense that we expect different complexity measures from possibly entirely different fields to be related to each other. This talk presents our work on a beautiful connection between the fractal dimension of space-time diagrams of Turing machines and their time complexity. Presented at Machines, Computations and Universality (MCU) 2013, Zurich, Switzerland.
Kernelization algorithms for graph and other structure modification problems - Anthony Perez
Thesis defense on November 14th, 2011, in Montpellier.
Jury:
Stéphane Bessy, Bruno Durand, Frédéric Havet, Rolf Niedermeier, Christophe Paul & Ioan Todinca.
Fuzzy clustering algorithms cannot obtain a good clustering effect when the sample characteristics are not obvious, and they need the number of clusters to be determined in advance. For this reason, this paper proposes an adaptive fuzzy kernel clustering algorithm. The algorithm first uses an adaptive function of the clustering number to calculate the optimal number of clusters; the samples of the input space are then mapped to a high-dimensional feature space using a Gaussian kernel and clustered in that feature space. Matlab simulation results confirm that the algorithm performs considerably better than classical clustering algorithms, with faster convergence and more accurate clustering results.
A Numerical Method for the Evaluation of Kolmogorov Complexity, An alternativ... - Hector Zenil
We present a novel alternative method (other than using compression algorithms) to approximate the algorithmic complexity of a string by calculating its algorithmic probability and applying Chaitin-Levin's coding theorem.
High Speed Memory Efficient Multiplier-less 1-D 9/7 Wavelet Filters Based NED... - IJERA Editor
Conventional distributed arithmetic (DA) is popular in field programmable gate array (FPGA) design, and it features on-chip ROM to achieve high speed and regularity. In this paper, we describe a high-speed, area-efficient 1-D discrete wavelet transform (DWT) using a 9/7 filter based on the new efficient distributed arithmetic (NEDA) technique. Being an area-efficient architecture free of ROM, multiplication, and subtraction, NEDA can also expose the redundancy existing in the adder array consisting of entries of 0 and 1. This architecture supports any size of image pixel value and any level of decomposition. The parallel structure has 100% hardware utilization efficiency.
In this lecture, you will learn two of the most popular methods for classifying data points into a finite set of categories. Both methods are based on representing a classifier via its decision boundary, which is a hyperplane. The parameters of the hyperplane are learned from training data by minimizing a particular loss function.
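A minimal sketch of such a hyperplane classifier: the perceptron update rule is used here as one concrete way of fitting the hyperplane parameters from training data (the lecture's two specific methods and loss functions are not named in this summary, so this is an illustrative choice, not the lecture's exact algorithm):

```python
def train_perceptron(points, labels, epochs=100, lr=0.1):
    """Learn hyperplane parameters (w, b) so that sign(w.x + b) matches labels (+1/-1)."""
    dim = len(points[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            # Perceptron rule: adjust parameters only on misclassified points
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Linearly separable toy data: label +1 iff first coordinate exceeds the second
pts = [(2.0, 1.0), (3.0, 0.5), (1.0, 2.0), (0.5, 3.0)]
ys = [1, 1, -1, -1]
w, b = train_perceptron(pts, ys)
print([predict(w, b, p) for p in pts])  # [1, 1, -1, -1]
```

The decision boundary is the set of x with w.x + b = 0; both lecture methods share this representation and differ only in the loss being minimized.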
This paper presents an interesting idea of how to compute a consensus of several k-partitions of a set by means of finding an antichain in the concept lattice of an appropriate formal context.
THE RESULT FOR THE GRUNDY NUMBER ON P4-CLASSES - graphhoc
Our work is part of the general problem of the stability of ad hoc networks. Several works have addressed this problem; among them, we find the modelling of the ad hoc network as a graph, so that the coherence problem of the ad hoc network can be reduced to a frequency-allocation problem.
We study a new class of graphs, the fat-extended P4 graphs, and we give a polynomial-time algorithm to calculate the Grundy number of the graphs in this class. This result implies that the Grundy number can be found in polynomial time for many graphs.
R package 'bayesImageS': a case study in Bayesian computation using Rcpp and ... - Matt Moores
There are many approaches to Bayesian computation with intractable likelihoods, including the exchange algorithm, approximate Bayesian computation (ABC), thermodynamic integration, and composite likelihood. These approaches vary in accuracy as well as scalability for datasets of significant size. The Potts model is an example where such methods are required, due to its intractable normalising constant. This model is a type of Markov random field, which is commonly used for image segmentation. The dimension of its parameter space increases linearly with the number of pixels in the image, making this a challenging application for scalable Bayesian computation. My talk will introduce various algorithms in the context of the Potts model and describe their implementation in C++, using OpenMP for parallelism. I will also discuss the process of releasing this software as an open source R package on the CRAN repository.
Housekeeping Department with Other Departments - Tyrara Xieleen
- To understand the relationship between housekeeping and other departments.
- Housekeeping is an important part of the organization for generating income in any hotel/resort.
- How the departments contact each other.
We start with motivation and a few examples of uncertainties. Then we discretize an elliptic PDE with uncertain coefficients and apply the TT format to the permeability, the stochastic operator, and the solution. We compare the sparse multi-index set approach with full multi-index + TT. The Tensor Train format allows us to keep the whole multi-index set, without any multi-index set truncation.
Low rank tensor approximation of probability density and characteristic funct... - Alexander Litvinenko
Very often one has to deal with high-dimensional random variables (RVs). A high-dimensional RV can be described by its probability density function (pdf) and/or by the corresponding probability characteristic function (pcf), or by a function representation. Here the interest is mainly in computing characterisations like the entropy, relations between two distributions, like their Kullback-Leibler divergence, or more general measures such as $f$-divergences, among others. These are all computed from the pdf, which is often not available directly, and it is a computational challenge even to represent it in a numerically feasible fashion when the dimension $d$ is even moderately large. It is an even stronger numerical challenge to then actually compute said characterisations in the high-dimensional case. In this regard, in order to achieve a computationally feasible task, we propose to represent the density by a high-order tensor product, and to approximate this in a low-rank format.
A generalized class of normalized distance functions called Q-Metrics is described in this presentation. The Q-Metrics approach relies on a unique functional, using a single bounded parameter (Lambda), which characterizes the conventional distance functions in a normalized per-unit metric space. In addition to this coverage property, a distinguishing and extremely attractive characteristic of the Q-Metric function is its low computational complexity. Q-Metrics satisfy the standard metric axioms. Novel networks for classification and regression tasks are defined and constructed using Q-Metrics. These new networks are shown to outperform conventional feed forward back propagation networks with the same size when tested on real data sets.
Inria Tech Talk - Classifying complex data with MASSICCC - Stéphanie Roger
MASSICCC - A SaaS platform for the classification of complex, heterogeneous and incomplete data.
In this Tech Talk, come discover, test, and learn to master MASSICCC (Massive clustering in cloud computing), a user-oriented SaaS platform, along with its three families of classification algorithms, the fruit of the latest advances of the Modal & Celeste research teams at Inria, for analysing and learning from your "Big Data" (e.g. in real estate, predictive maintenance, health, open data, etc.).
MASSICCC also offers:
- Free access for testing and research at https://massiccc.lille.inria.fr
- A "one for all" approach to classification
- Highly interpretable results (with graphics)
- A SaaS mode that lets you track experiments (in progress or completed)
- Open-source algorithms that can be reused independently.
Distributed solution of stochastic optimal control problem on GPUs - Pantelis Sopasakis
Stochastic optimal control problems arise in many applications and are, in principle, large-scale, involving up to millions of decision variables. Their applicability in control applications is often limited by the availability of algorithms that can solve them efficiently and within the sampling time of the controlled system. In this paper we propose a dual accelerated proximal gradient algorithm which is amenable to parallelization, and demonstrate that its GPU implementation affords high speed-up values (with respect to a CPU implementation) and greatly outperforms well-established commercial optimizers such as Gurobi.
Talk by Michael Samet, entitled "Optimal Damping with Hierarchical Adaptive Quadrature for Efficient Fourier Pricing of Multi-Asset Options in Lévy Models", at the International Conference on Computational Finance (ICCF), Wuppertal, June 6-10, 2022.
Multidimensional integrals may be approximated by weighted averages of integrand values. Quasi-Monte Carlo (QMC) methods are more accurate than simple Monte Carlo methods because they carefully choose where to evaluate the integrand. This tutorial focuses on how quickly QMC methods converge to the correct answer as the number of integrand values increases. The answer may depend on the smoothness of the integrand and the sophistication of the QMC method. QMC error analysis may assume that the integrand belongs to a reproducing kernel Hilbert space, or may assume that the integrand is an instance of a stochastic process with known covariance structure. These two approaches have interesting parallels. This tutorial also explores how the computational cost of achieving a good approximation to the integral depends on the dimension of the domain of the integrand. Finally, this tutorial explores methods for determining how many integrand values are needed to satisfy the error tolerance. Relevant software is described.
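The MC-versus-QMC contrast the tutorial describes can be illustrated with a minimal sketch (not the tutorial's software): a van der Corput low-discrepancy sequence versus plain Monte Carlo sampling for a smooth 1-D integrand.

```python
import random

def van_der_corput(n, base=2):
    """n-th element of the van der Corput low-discrepancy sequence in [0, 1)."""
    q, bk = 0.0, 1.0 / base
    while n > 0:
        n, r = divmod(n, base)
        q += r * bk
        bk /= base
    return q

def estimate_integral(points):
    # Integrate f(x) = x^2 over [0, 1]; the exact value is 1/3
    return sum(x * x for x in points) / len(points)

N = 4096
qmc_points = [van_der_corput(i) for i in range(1, N + 1)]
random.seed(0)
mc_points = [random.random() for _ in range(N)]

print(abs(estimate_integral(qmc_points) - 1/3))  # QMC error, roughly O(log N / N)
print(abs(estimate_integral(mc_points) - 1/3))   # plain MC error, typically O(1/sqrt(N)), larger
```

The carefully spread-out QMC points cover [0, 1) far more evenly than random draws, which is exactly the mechanism behind the faster convergence discussed above.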
Unleashing the Power of Data: Choosing a Trusted Analytics Platform - Enterprise Wired
In this guide, we'll explore the key considerations and features to look for when choosing a Trusted analytics platform that meets your organization's needs and delivers actionable intelligence you can trust.
Techniques to optimize the pagerank algorithm usually fall into two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before pagerank computation to improve performance; final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the pagerank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in pagerank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
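As a toy sketch of the first optimization mentioned above, skipping vertices whose rank has already converged (a simplified heuristic, not the STICD implementation; assumes no dangling nodes):

```python
def pagerank_skip_converged(adj, d=0.85, eps=1e-8, max_iter=100):
    """Power-iteration PageRank that stops updating vertices whose rank has converged.
    adj: dict vertex -> list of out-neighbours (no dangling nodes assumed)."""
    verts = list(adj)
    n = len(verts)
    rank = {v: 1.0 / n for v in verts}
    # Precompute in-neighbours for pull-style updates
    inn = {v: [] for v in verts}
    for u, outs in adj.items():
        for v in outs:
            inn[v].append(u)
    active = set(verts)
    it = 0
    while active and it < max_iter:
        it += 1
        new = dict(rank)
        for v in list(active):
            r = (1 - d) / n + d * sum(rank[u] / len(adj[u]) for u in inn[v])
            if abs(r - rank[v]) < eps:
                active.discard(v)  # vertex converged: skip it in later iterations
            new[v] = r
        rank = new
    return rank

g = {0: [1], 1: [2], 2: [0]}   # a 3-cycle: by symmetry all ranks should be equal
r = pagerank_skip_converged(g)
print(r)  # each rank is approximately 1/3
```

Skipping a converged vertex is a heuristic: its rank could in principle still drift if its in-neighbours keep changing, which is why the text says it "has the potential" to save time rather than guaranteeing exactness.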
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag... - sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
Learn SQL from basic queries to advanced queries - manishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
Adjusting primitives for graphs: SHORT REPORT / NOTES - Subhajit Sahu
Graph algorithms, like PageRank, commonly operate on Compressed Sparse Row (CSR), an adjacency-list based graph representation.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Adjusting OpenMP PageRank: SHORT REPORT / NOTES - Subhajit Sahu
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take advantage of a shared-memory system with multiple CPUs, each with multiple cores, to accelerate pagerank computation. If the NUMA architecture of the system is properly taken into account with good vertex partitioning, the speedup can be significant. To take steps in this direction, experiments are conducted to implement pagerank in OpenMP using two different approaches, uniform and hybrid. The uniform approach runs all primitives required for pagerank in OpenMP mode (with multiple threads). On the other hand, the hybrid approach runs certain primitives in sequential mode (i.e., sumAt, multiply).
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, AI, big data, real-time systems, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf - GetInData
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source) Copilot?
How can we build one?
Architecture and evaluation
5. Chemical space
            Stars       Small Molecules
Existing    10^22       10^7
Virtual     0           10^60
Access      Difficult   "Easy"
(Slide from: Pierre Baldi, UC Irvine)
6. Formalization
Problem statement
Given a set of training instances (x1, y1), ..., (xn, yn), where the xi's are graphs and the yi's are continuous or discrete variables of interest, estimate a function
y = f(x),
where x is any graph.
7. Classical Approaches
Classical approaches:
1. Map each molecule to a vector of fixed dimension.
2. Apply an algorithm for regression or classification over vectors.
Example: 2D structural keys in chemoinformatics, then use a neural network, decision tree, least squares, etc.
(Slide from: Jean-Philippe Vert, ParisTech)
9. The kernel trick
Kernel
Let φ(x) be a vector representation of the graph x. The kernel between two graphs is defined by:
K(x, x′) = φ(x)^T φ(x′).
Many linear algorithms can be expressed only in terms of inner products between vectors. Often computing the kernel is more efficient than computing φ(x).
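The shortcut this slide describes can be made concrete with a polynomial kernel, where (x·x′)² equals the inner product of explicit degree-2 feature vectors (an illustrative kernel on plain vectors, not specific to graphs):

```python
def phi(x):
    """Explicit degree-2 feature map for a 2-D input: all products x_i * x_j."""
    return [x[i] * x[j] for i in range(2) for j in range(2)]

def k_explicit(x, xp):
    # Build both feature vectors and take their inner product
    return sum(a * b for a, b in zip(phi(x), phi(xp)))

def k_trick(x, xp):
    # Same value without ever building phi: (x . x')^2
    return (x[0] * xp[0] + x[1] * xp[1]) ** 2

x, xp = (1.0, 2.0), (3.0, 0.5)
print(k_explicit(x, xp), k_trick(x, xp))  # both 16.0
```

For degree-d features in n dimensions, φ has roughly n^d components, while the trick needs only one inner product and one power; this gap is exactly why "computing the kernel is more efficient than computing φ(x)".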
10. Kernel trick example: computing distances in the feature space
(Slide from: Jean-Philippe Vert, ParisTech)
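The identity behind this example can be written out: since K(x, x′) = ⟨φ(x), φ(x′)⟩, the feature-space distance satisfies ‖φ(x) − φ(x′)‖² = K(x, x) − 2 K(x, x′) + K(x′, x′), so distances are computable from kernel values alone. A minimal sketch, using a Gaussian kernel as an arbitrary illustrative choice:

```python
import math

def k(x, xp):
    # Any p.d. kernel works; a Gaussian (RBF) kernel is used here for illustration
    sq = sum((a - b) ** 2 for a, b in zip(x, xp))
    return math.exp(-sq / 2.0)

def feature_distance(x, xp):
    """Distance between phi(x) and phi(x') computed from kernel values only:
    ||phi(x) - phi(x')||^2 = K(x, x) - 2 K(x, x') + K(x', x')."""
    return math.sqrt(k(x, x) - 2.0 * k(x, xp) + k(xp, xp))

print(feature_distance((0.0, 0.0), (0.0, 0.0)))      # 0.0
print(feature_distance((0.0, 0.0), (1.0, 1.0)) > 0)  # True
```

Note that φ is never constructed; for the Gaussian kernel it would be infinite-dimensional, yet the distance is still a few kernel evaluations.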
11. Positive definite (p.d.) kernels
Definition
A positive definite (p.d.) kernel on a set χ is a function K : χ × χ → R that is symmetric and satisfies, for all N ∈ N, (x1, x2, ..., xN) ∈ χ^N and (a1, a2, ..., aN) ∈ R^N:
∑_{i=1}^{N} ∑_{j=1}^{N} ai aj K(xi, xj) ≥ 0
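The defining inequality can be probed numerically on a sample Gram matrix: draw coefficient vectors a at random and check that the quadratic form is never negative. A sketch assuming a 1-D Gaussian kernel (my choice for illustration; any p.d. kernel would pass):

```python
import math
import random

def gaussian_kernel(x, xp, sigma=1.0):
    return math.exp(-((x - xp) ** 2) / (2 * sigma ** 2))

# Build the Gram matrix K_ij = K(x_i, x_j) on a few sample points
xs = [0.0, 0.5, 1.3, 2.0]
K = [[gaussian_kernel(a, b) for b in xs] for a in xs]

# Positive definiteness: sum_i sum_j a_i a_j K_ij >= 0 for every coefficient vector a
random.seed(1)
ok = True
for _ in range(1000):
    a = [random.uniform(-1, 1) for _ in xs]
    q = sum(a[i] * a[j] * K[i][j] for i in range(len(xs)) for j in range(len(xs)))
    ok = ok and q >= -1e-12  # small slack for floating-point rounding
print(ok)  # True: the quadratic form never goes negative
```

Random sampling cannot prove positive definiteness, but a single negative quadratic form would disprove it, which makes this a quick sanity check for a candidate kernel.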
12. Positive definite kernels are inner products
Mercer's property
K is a p.d. kernel on the set χ if and only if there exists a Hilbert space H and a mapping
φ : χ → H,
such that, for any x, x′ in χ:
K(x, x′) = ⟨φ(x), φ(x′)⟩_H
(Slide from: Jean-Philippe Vert, ParisTech)
13. Graph kernels
Definition
A graph kernel K(x, x′) is a p.d. kernel over the set of (labeled) graphs. It is equivalent to an embedding φ : χ → H of the set of graphs into a Hilbert space through the relation:
K(x, x′) = φ(x)^T φ(x′).
(Slide from: Jean-Philippe Vert, ParisTech)
14. Clarification
Descriptors and kernels in chemoinformatics:
1D - SMILES strings
2D - Graph of chemical bonds
2.5D - Surfaces
3D - Atomic coordinates
4D - Temporal evolution
15. Outline
1 Introduction
2 Expressiveness versus complexity
3 Walk kernels
4 Conclusion and future directions
5 Data depth for labeled graph ensembles
17. Expressibility versus complexity
Definition: Complete graph kernels
A graph kernel is complete if it separates nonisomorphic graphs.
Graph isomorphism (Figure from: Wikipedia)
18. Expressibility versus complexity
Definition: Complete graph kernels
A graph kernel is complete if it separates nonisomorphic graphs.
Implication
If a graph kernel is not complete, then there are nonisomorphic graphs it cannot differentiate.
19. Expressibility versus complexity
Definition: Complete graph kernels
A graph kernel is complete if it separates nonisomorphic graphs.
Implication
If a graph kernel is not complete, then there are nonisomorphic graphs it cannot differentiate.
Tractability
Computing any complete graph kernel is at least as hard as the graph isomorphism problem (Gärtner et al., 2003).
21. Subgraph kernel
Definition: Subgraph
A subgraph of a graph (V, E) is a graph (V′, E′) with V′ ⊂ V and E′ ⊂ E.
Definition: Subgraph kernel
K_subgraph(G1, G2) = ∑_{H∈χ} λ_H φ_H(G1) φ_H(G2),
where H ranges over the set of graphs χ, λ_H is the weight associated with H, and φ_H(Gx) returns the number of occurrences of H in Gx.
22. Subgraph kernel
Definition: Subgraph
A subgraph of a graph (V , E) is a graph (V , E ) with V ⊂ V and
E ⊂ E.
Definition: Subgraph kernel
Ksubgraph(G1, G2) =
H∈χ
λHφH(G1)φH(G2).
where H ⊂ χ, λH is weight associated with H and φH(Gx ) returns
the number of occurrences of H in Gx .
Subgraph kernel complexity
Computing the subgraph kernel is NP hard (Gartner et.al. 2003)
22 / 48
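To see where the hardness comes from, a brute-force sketch of the feature φH (the helper name `count_subgraph` is mine) must enumerate injective vertex maps from H into G, and the number of such maps grows factorially with the size of G:

```python
from itertools import permutations

def count_subgraph(H_vertices, H_edges, G_vertices, G_edges):
    """Brute-force phi_H(G): number of injective vertex maps from H into G
    that send every edge of H to an edge of G. Exponential time, which is
    why the full subgraph kernel is intractable for large graphs."""
    G_edge_set = {frozenset(e) for e in G_edges}
    count = 0
    for mapping in permutations(G_vertices, len(H_vertices)):
        m = dict(zip(H_vertices, mapping))
        if all(frozenset((m[u], m[v])) in G_edge_set for u, v in H_edges):
            count += 1
    return count

# phi_H for H = a single edge, G = a triangle: 3 edges, each matched
# in 2 orientations, so 6 injective maps.
triangle = ([0, 1, 2], [(0, 1), (1, 2), (0, 2)])
edge = ([0, 1], [(0, 1)])
print(count_subgraph(*edge, *triangle))  # 6
```

The kernel itself then weights and sums these counts over all pattern graphs H, which is exactly what makes it expressive and expensive at the same time.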
23. Path kernels
Definition: Path
A path of a graph (V, E) is a sequence of distinct vertices such that consecutive vertices share an edge.
Definition: Path kernel
Kpath(G1, G2) = ∑_{H∈P} λH φH(G1) φH(G2),
where P ⊂ χ is the set of path graphs.
Path kernel complexity
Computing the path kernel is NP-hard (Gärtner et al., 2003).
26. Outline
1 Introduction
2 Expressiveness versus complexity
3 Walk kernels
4 Conclusion and future directions
5 Data depth for labeled graph ensembles
27. Walks
Definition
A walk of a graph (V, E) is a sequence of (not necessarily distinct) vertices such that consecutive vertices share an edge. Unlike a path, a walk may revisit vertices and edges.
Definition: Walk kernel
Kwalk(G1, G2) = ∑_{w∈S} λw φw(G1) φw(G2),
where S is the set of all walks and φw(G) returns the count of walk w in G.
29. Walk kernel examples
nth-order walk kernel
λG(w) = 1 if the length of w is n, 0 otherwise.
Geometric walk kernel
λG(w) = β^length(w), for β > 0.
Random walk kernel
λG(w) = PG(w), the probability of walk w under a random walk on G.
Fingerprint-based kernels
Dot product kernel
Tanimoto kernel
MinMax kernel
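The fingerprint-based kernels in this list operate directly on bit or count vectors. A minimal pure-Python sketch of the Tanimoto and MinMax kernels (function names are mine):

```python
def tanimoto(fp1, fp2):
    """Tanimoto kernel on binary fingerprints: |intersection| / |union|."""
    a = {i for i, v in enumerate(fp1) if v}
    b = {i for i, v in enumerate(fp2) if v}
    return len(a & b) / len(a | b) if a | b else 1.0

def minmax(c1, c2):
    """MinMax kernel on count fingerprints: sum(min) / sum(max).
    Reduces to Tanimoto when all counts are 0/1."""
    num = sum(min(x, y) for x, y in zip(c1, c2))
    den = sum(max(x, y) for x, y in zip(c1, c2))
    return num / den if den else 1.0

fp1 = [1, 0, 1, 1, 0]
fp2 = [1, 1, 1, 0, 0]
print(tanimoto(fp1, fp2))  # 0.5  (2 shared bits, 4 bits in the union)
print(minmax(fp1, fp2))    # 0.5  (same value on binary vectors)
```

Both are known to be p.d. kernels on fingerprints, which is what lets them plug into SVMs directly.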
33. Computation of walk kernels
Yay!
All the above walk kernels can be computed efficiently in
polynomial time.
34. Computation of n-th order walk kernel (1/2)
Product graphs
Let G1 = (V1, E1) and G2 = (V2, E2). Then the product graph G = G1 × G2 is the graph G = (V, E) with:
1 V = {(v1, v2) ∈ V1 × V2 : v1 and v2 have the same label},
2 E = {((v1, v2), (v1′, v2′)) ∈ V × V : (v1, v1′) ∈ E1 and (v2, v2′) ∈ E2}.
Slide from: Jean-Philippe Vert, ParisTech
35. Computation of n-th order walk kernel (2/2)
For the nth-order walk kernel we have λG1×G2(w) = 1 if the length of w is n, 0 otherwise.
Therefore:
Knth-order(G1, G2) = ∑_{w∈Sn(G1×G2)} 1 = ∑_{i,j} [A^n]_{i,j} = 1^T A^n 1,
where A is the adjacency matrix of the product graph G1 × G2.
Computation in O(n|G1||G2|d1d2), where di is the maximum degree of Gi.
Slide from: Jean-Philippe Vert, ParisTech
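The computation above can be sketched with numpy (adjacency-matrix inputs and helper names are my own): build the label-matched product graph, then evaluate 1^T A^n 1.

```python
import numpy as np

def product_adjacency(A1, labels1, A2, labels2):
    """Adjacency matrix of the (tensor) product graph: vertices are pairs
    with matching labels; an edge requires an edge in both factor graphs."""
    V = [(i, j) for i in range(len(labels1)) for j in range(len(labels2))
         if labels1[i] == labels2[j]]
    A = np.zeros((len(V), len(V)))
    for a, (i, j) in enumerate(V):
        for b, (k, l) in enumerate(V):
            if A1[i, k] and A2[j, l]:
                A[a, b] = 1
    return A

def nth_order_walk_kernel(A1, labels1, A2, labels2, n):
    """K(G1, G2) = 1^T A^n 1, where A is the product-graph adjacency:
    counts common label-matching walks of length n."""
    A = product_adjacency(A1, labels1, A2, labels2)
    ones = np.ones(len(A))
    return ones @ np.linalg.matrix_power(A, n) @ ones

# Two identically labeled triangles: the product graph has 9 vertices,
# each of degree 4, so the number of common walks of length 2 is 9*16.
A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
labels = ["C", "C", "C"]
print(nth_order_walk_kernel(A, labels, A, labels, 2))  # 144.0
```

The dense matrix power here is for clarity; exploiting sparsity of A recovers the O(n|G1||G2|d1d2) bound from the slide.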
36. Traditional molecular fingerprints
Bit vectors of size b (usually b = 512 or 1024).
Steps (as summarized in Ralaivola et al., 2005):
1 DFS exploration from each atom to get a set of walks.
2 Each path initializes a random number generator to form b integers.
3 The b integers are reduced modulo the fingerprint size and used to set the corresponding bits in the fingerprint vector.
Complexity O(nm) or O(nα^d), where n := # atoms, m := # edges, α := branching factor and d := depth of walk.
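The hashing steps above can be sketched as follows. This is a simplified illustration, not the exact scheme of any toolkit: each path here draws only a few integers (`bits_per_path`, my parameter) rather than b, to keep the example readable.

```python
import random

def fingerprint(paths, b=512, bits_per_path=4):
    """Hashed fingerprint sketch: each labeled path seeds an RNG that
    draws a few integers, reduced modulo b to set bits in the vector.
    'paths' stands in for the walks found by the DFS step."""
    fp = [0] * b
    for path in paths:
        rng = random.Random(path)          # the path string seeds the RNG
        for _ in range(bits_per_path):
            fp[rng.randrange(2**32) % b] = 1   # reduce modulo b, set bit
    return fp

# A few hypothetical labeled paths from a small molecule.
paths = ["C-C", "C-C-O", "C-O", "O-H"]
fp = fingerprint(paths)
print(sum(fp))  # at most 16 bits set; hash collisions may reduce this
```

Seeding the RNG with the path label makes the scheme deterministic: the same substructure always sets the same bits, while unrelated substructures may still clash, which is the information loss the next slide addresses.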
37. Generalized molecular fingerprints
Avoids clashes/information loss by reserving bit positions.
Let P(d) be the set of all atom-bond labeled paths containing at most d bonds.
Binary feature map given depth d:
φd(u) = (φpath(u))_{path∈P(d)}
Binary feature map given depth d and fixed vector size b:
φd(u) = (φγ(path)(u))_{path∈P(d)},
where γ : P(d) → {1, . . . , b}.
38. Fingerprint-based kernel (Ralaivola et al., 2005)
Complexity O(d(n1m1 + n2m2)) using a suffix tree data structure.
Slide from: Pierre Baldi, UC Irvine
39. Extensions for walk kernels
Label enrichment
Non-tottering walks
3D kernels
Mutual information in fingerprint construction
40. Results (Mahé et al., 2005; Ralaivola et al., 2005)
MUTAG Dataset
Collection of 188 compounds.
Classification of mutagenic activity: high (125) or none (63), as assayed in Salmonella typhimurium.
Method                   Accuracy
Progol (1D)              81.4%
Random walk kernel (2D)  91.2%
MinMax kernel (2D)       91.5%
41. Outline
1 Introduction
2 Expressiveness versus complexity
3 Walk kernels
4 Conclusion and future directions
5 Data depth for labeled graph ensembles
42. Conclusion
Summary
Extension of ML algorithms to graph data via positive definite kernels.
Two classes of 2D kernels for chemical molecule structures.
What next?
Can we use the graph kernel machinery for computing depth?
43. Outline
1 Introduction
2 Expressiveness versus complexity
3 Walk kernels
4 Conclusion and future directions
5 Data depth for labeled graph ensembles
44. Data depth
What is a depth function?
A depth function is designed to provide a P-based center-outward ordering (and thus a ranking) for an ensemble of data objects drawn from an arbitrary distribution P.
Taxonomy of data depth definitions (Mosler, 2012)
Distance-based depth functions
Simplex/halfspace-based depth
Weighted-mean-based depth
45. Band depth
Band depth: a type of simplex-based data depth method.
Many definitions exist for various kinds of data: functions, multivariate functions, paths on a graph.
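For functional data, the simplest variant forms bands from pairs of sample curves (the J = 2 case). A minimal sketch, assuming curves discretized on a common grid (all names are illustrative):

```python
from itertools import combinations

def band_depth(curves, f):
    """Band depth (J=2) sketch for discretized functions: the fraction of
    pairs of sample curves whose pointwise band fully contains f."""
    pairs = list(combinations(curves, 2))
    inside = 0
    for g, h in pairs:
        if all(min(gi, hi) <= fi <= max(gi, hi)
               for gi, hi, fi in zip(g, h, f)):
            inside += 1
    return inside / len(pairs)

curves = [[0, 0, 0], [1, 1, 1], [2, 2, 2], [0.5, 0.6, 0.4]]
# The middle curve sits inside most pairwise bands, so its depth is high.
print(band_depth(curves, [1, 1, 1]))  # 5/6
```

Ranking all curves by this score gives the center-outward ordering from the previous slide; the open question is how to form such bands when the objects are labeled graphs rather than functions.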
46. Band formed by graphs
When alignment is known..
Graphs → Adjacency matrices → Functions
Alignment: a mapping θ : VGx → VGy, where Gx and Gy are any two graphs.
When alignment is unknown..
????
47. Product graphs!
V = {(v1, v2) ∈ V1 × V2 : v1 and v2 have the same label}
E = {((v1, v2), (v1′, v2′)) ∈ V × V : (v1, v1′) ∈ E1 and (v2, v2′) ∈ E2}
Weak direct product (aka tensor product or Kronecker product)
E× = {((v1, v2), (v1′, v2′)) ∈ V × V : (v1, v1′) ∈ E1 and (v2, v2′) ∈ E2}
Strong product
E = E× ∪ E□, where
E□ = {((v1, v2), (v1′, v2′)) ∈ V × V : (v1 = v1′ and (v2, v2′) ∈ E2) or (v2 = v2′ and (v1, v1′) ∈ E1)}
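Both edge sets can be constructed directly. The sketch below (pure Python, helper names my own) builds the label-matched vertex set and compares tensor vs. strong product edge counts on a small example:

```python
def product_vertices(V1, labels1, V2, labels2):
    """Label-matched vertex set of the product graph."""
    return [(u, v) for u in V1 for v in V2 if labels1[u] == labels2[v]]

def tensor_edges(V, E1, E2):
    """Weak direct (tensor) product: step in both factor graphs at once."""
    E1s = {frozenset(e) for e in E1}
    E2s = {frozenset(e) for e in E2}
    return {frozenset(((u1, v1), (u2, v2)))
            for (u1, v1) in V for (u2, v2) in V
            if frozenset((u1, u2)) in E1s and frozenset((v1, v2)) in E2s}

def strong_edges(V, E1, E2):
    """Strong product: tensor edges plus 'stay put in one factor' edges."""
    E1s = {frozenset(e) for e in E1}
    E2s = {frozenset(e) for e in E2}
    extra = {frozenset(((u1, v1), (u2, v2)))
             for (u1, v1) in V for (u2, v2) in V
             if (u1 == u2 and frozenset((v1, v2)) in E2s)
             or (v1 == v2 and frozenset((u1, u2)) in E1s)}
    return tensor_edges(V, E1, E2) | extra

V1, E1 = [0, 1], [(0, 1)]              # a single edge
V2, E2 = [0, 1, 2], [(0, 1), (1, 2)]   # a path with two edges
labels1 = {0: "C", 1: "C"}
labels2 = {0: "C", 1: "C", 2: "C"}
V = product_vertices(V1, labels1, V2, labels2)
print(len(tensor_edges(V, E1, E2)), len(strong_edges(V, E1, E2)))  # 4 11
```

The strong product always contains the tensor product, since E□ only adds edges where one coordinate stays fixed.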
48. References
Ralaivola, Liva, Sanjay J. Swamidass, Hiroto Saigo, and Pierre Baldi. "Graph kernels for chemical informatics." Neural Networks 18, no. 8 (2005): 1093-1110.
Mahé, Pierre, et al. "Graph kernels for molecular structure-activity relationship analysis with support vector machines." Journal of Chemical Information and Modeling 45.4 (2005): 939-951.
Gärtner, Thomas, Peter Flach, and Stefan Wrobel. "On graph kernels: Hardness results and efficient alternatives." Learning Theory and Kernel Machines. Springer Berlin Heidelberg, 2003. 129-143.
http://videolectures.net/site/normal_dl/tag=9127/gbr07_vert_ckac_01.pdf
http://www.ics.uci.edu/~dock/upload/UCI_CHEM_05.ppt