Below are the important points I note from the 2020 paper by Martin Grohe:
- 1-WL distinguishes almost all graphs, in a probabilistic sense
- Classical WL is the two-dimensional Weisfeiler-Leman algorithm (2-WL)
- DeepWL is an "unlimited" version of the WL algorithm that still runs in polynomial time.
- Knowledge graphs are essentially graphs with vertex/edge attributes
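To make the 1-WL point concrete, here is a minimal sketch of colour refinement (1-WL) in Python. It is not code from the paper: the function names `refine` and `wl_distinguishes` and the adjacency-dict graph format are assumptions made for illustration. Both graphs are refined together as a disjoint union so that their colours are comparable.

```python
from collections import Counter

def refine(adj):
    """1-WL colour refinement on a graph given as {vertex: [neighbours]}."""
    colors = {v: 0 for v in adj}  # start with one uniform colour
    for _ in range(len(adj)):  # at most |V| rounds are ever needed
        # A vertex's signature: its own colour plus the sorted multiset
        # of its neighbours' colours.
        sigs = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                for v in adj}
        # Compress signatures back to small integer colours.
        palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        stable = len(palette) == len(set(colors.values()))
        colors = {v: palette[sigs[v]] for v in adj}
        if stable:  # the partition did not get finer, so stop
            break
    return colors

def wl_distinguishes(adj_a, adj_b):
    """True if 1-WL gives the two graphs different colour histograms."""
    # Hypothetical helper: tag vertices with "a"/"b" to form a disjoint union.
    union = {("a", v): [("a", u) for u in nbrs] for v, nbrs in adj_a.items()}
    union.update({("b", v): [("b", u) for u in nbrs]
                  for v, nbrs in adj_b.items()})
    colors = refine(union)
    hist_a = Counter(c for (side, _), c in colors.items() if side == "a")
    hist_b = Counter(c for (side, _), c in colors.items() if side == "b")
    return hist_a != hist_b

# A path and a triangle on three vertices are told apart.
path3 = {0: [1], 1: [0, 2], 2: [1]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(wl_distinguishes(path3, triangle))

# A 6-cycle and two disjoint triangles are both 2-regular, so 1-WL fails here.
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
print(wl_distinguishes(cycle6, two_triangles))
```

The second pair shows why "almost all" matters: regular graphs of the same degree and size are the standard case where 1-WL cannot separate non-isomorphic graphs, which is the motivation for the higher-dimensional variants.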
ABSTRACT:
Vector representations of graphs and relational structures, whether handcrafted feature vectors or learned representations, enable us to apply standard data analysis and machine learning techniques to the structures. A wide range of methods for generating such embeddings have been studied in the machine learning and knowledge representation literature. However, vector embeddings have received relatively little attention from a theoretical point of view.
Starting with a survey of embedding techniques that have been used in practice, in this paper we propose two theoretical approaches that we see as central for understanding the foundations of vector embeddings. We draw connections between the various approaches and suggest directions for future research.
This document discusses semantic visualization in design computing. It presents an approach for designing visualization schemes that leverage predefined semantics. The approach is based on a combination of cognitive linguistics models of metaphor and form-semantics-function categorization. It includes metaphor analysis, formalization, and evaluation. Examples are provided of visualizing collaborative design data and virtual worlds to illustrate the approach. The goal is to establish and preserve semantic links between form and function in visualization metaphors.
This document proposes methods for enhancing the visualization of concept lattices generated through formal concept analysis. It discusses extracting tree structures from concept lattices to improve readability. Various criteria are proposed for selecting parent concepts when transforming a lattice into a tree, including stability, support, shared attributes between concepts, and confidence. Visualization techniques like coloring nodes based on criteria values and sizing nodes by extent/intent ratios are also suggested to aid interpretation. The methods aim to make larger datasets more explorable by extracting simpler tree representations while preserving essential lattice features and structure.
Fuzzy formal concept analysis: Approaches, applications and issues (CSITiaesprime)
Formal concept analysis (FCA) is today regarded as a significant technique for knowledge extraction, representation, and analysis for applications in a variety of fields. Significant progress has been made in recent years to extend FCA theory to deal with uncertain and imperfect data. The computational complexity associated with the enormous number of formal concepts generated has been identified as an issue in various applications. In general, the generation of a concept lattice of sufficient complexity and size is one of the most fundamental challenges in FCA. The goal of this work is to provide an overview of research articles that assess and compare numerous fuzzy formal concept analysis techniques which have been suggested, as well as to explore the key techniques for reducing concept lattice size. We also present a review of research articles on using fuzzy formal concept analysis in ontology engineering, knowledge discovery in databases and data mining, and information retrieval.
Multimodal Searching and Semantic Spaces: ...or how to find images of Dalmati... (Jonathon Hare)
Presented at the "Reality of the Semantic Gap in Image Retrieval" tutorial at the first international conference on Semantics And digital Media Technology (SAMT 2006), 6th December 2006.
The spread and abundance of electronic documents requires automatic techniques for extracting useful information from the text they contain. The availability of conceptual taxonomies can be of great help, but manually building them is a complex and costly task. Building on previous work, we propose a technique to automatically extract conceptual graphs from text and reason with them. Since automated learning of taxonomies needs to be robust with respect to missing or partial knowledge and flexible with respect to noise, this work proposes a way to deal with these problems. The case of poor data/sparse concepts is tackled by finding generalizations among disjoint pieces of knowledge. Noise is handled by introducing soft relationships among concepts rather than hard ones, and applying a probabilistic inferential setting. In particular, we propose to reason on the extracted graph using different kinds of relationships among concepts, where each arc/relationship is associated with a number that represents its likelihood among all possible worlds, and to address the problem of sparse knowledge by using generalizations among distant concepts as bridges between disjoint portions of knowledge.
Extending the knowledge level of cognitive architectures with Conceptual Spac... (Antonio Lieto)
Extending the knowledge level of cognitive architectures with Conceptual Spaces (+ a case study with Dual-PECCS: a hybrid knowledge representation system for common sense reasoning). Talk given at Stockholm, September 2016.
This document discusses using convolutional neural networks and self-organized maps to visualize knowledge graphs. It presents a compositional model for embedding nodes and relationships in a knowledge graph into a vector space. A self-organized map is used to cluster the embeddings and extract semantic fingerprints. The fingerprints are useful for knowledge discovery and classification tasks. The technique is applied to a subset of the CTD knowledge graph containing compound-gene/protein interactions, and the results are comparable to structural models.
This document discusses probabilistic inference in relational conditional logic. It proposes two semantics - averaging and aggregating - for interpreting first-order probabilistic conditionals. It describes desirable properties for inductive inference in relational frameworks. It then presents two model-based inference operators based on maximum entropy that fulfill these properties in each semantic framework. The operators allow reasoning from incomplete first-order probabilistic knowledge bases in a precise but cautious manner.
Discovering Novel Information with sentence Level clustering From Multi-docu... (irjes)
The document presents a novel fuzzy clustering algorithm called FRECCA that clusters sentences from multi-documents to discover new information. FRECCA uses fuzzy relational eigenvector centrality to calculate page rank scores for sentences within clusters, treating the scores as likelihoods. It uses expectation maximization to optimize cluster membership values and mixing coefficients without a parameterized likelihood function. An evaluation shows FRECCA achieves superior performance to other clustering algorithms on a quotations dataset, identifying overlapping clusters of semantically related sentences.
Metrics for Evaluating Quality of Embeddings for Ontological Concepts (Saeedeh Shekarpour)
Although there is an emerging trend towards generating embeddings for primarily unstructured data and, recently, for structured data, no systematic suite for measuring the quality of embeddings has been proposed yet.
This deficiency is further sensed with respect to embeddings generated for structured data because there are no concrete evaluation metrics measuring the quality of the encoded structure as well as semantic patterns in the embedding space.
In this paper, we introduce a framework containing three distinct tasks concerned with the individual aspects of ontological concepts: (i) the categorization aspect, (ii) the hierarchical aspect, and (iii) the relational aspect.
Then, in the scope of each task, a number of intrinsic metrics are proposed for evaluating the quality of the embeddings.
Furthermore, w.r.t. this framework, multiple experimental studies were run to compare the quality of the available embedding models.
Employing this framework in future research can reduce misjudgment and provide greater insight about quality comparisons of embeddings for ontological concepts.
We positioned our sampled data and code at https://github.com/alshargi/Concept2vec under GNU General Public License v3.0.
This document presents a computational approach called "taskonomy" to model relationships and structure among visual tasks. It trains task-specific models on 26 visual tasks and computes transfer learning dependencies between tasks. This identifies which tasks provide useful information for other tasks. It finds the optimal transfer policy to solve tasks using limited labeled data, by leveraging relationships between tasks. For example, it shows 10 tasks can be solved using 2/3 less labeled data while maintaining performance, by transferring knowledge between related tasks.
1. The document discusses path compression, a technique used in disjoint set data structures that improves the running time of find operations from logarithmic to almost constant.
2. It explains how path compression works by redirecting parent pointers during find operations so that future finds take direct paths to the root.
3. The document also discusses some applications of set theory in mathematics like defining mathematical structures and relations, and serving as a foundation for other areas of math. Transferring files between computers on a network using disjoint sets is provided as an example algorithm.
A semantic framework and software design to enable the transparent integratio... (Patricia Tavares Boralli)
This document proposes a conceptual framework to unify representations of natural systems knowledge. The framework is based on separating the ontological nature of an object of study from the context of its observation. Each object is associated with a concept defined in an ontology and an observation context describing aspects like location and time. Models and data are treated as generic knowledge sources with a semantic type and observation context. This allows flexible integration and calculation of states across heterogeneous sources by composing their observation contexts and resolving semantic compatibility. The framework aims to simplify knowledge representation by abstracting away complexity related to data format and scale.
SVHsIEVs for Navigation in Virtual Urban Environment (csandit)
Many virtual reality applications, such as training, urban design or gaming, are based on a rich semantic description of the environment. This paper describes a new representation of semantic virtual worlds. Our model, called SVHsIEVs, should provide a consistent representation of the following aspects: the simulated environment, its structure, and the knowledge items using ontology, interactions and tasks that virtual humans can perform in the environment. Our first main contribution is to show the influence of semantic virtual objects on the environment. Our second main contribution is to use this semantic information to manage the tasks of each virtual object. We propose to define each task by a set of attributes and relationships, which determine the links between attributes in tasks, and links between other tasks. The architecture has been successfully tested in 3D dynamic environments for navigation in virtual urban environments.
Concurrent Inference of Topic Models and Distributed Vector Representations (Parang Saraf)
Abstract: Topic modeling techniques have been widely used to uncover dominant themes hidden inside an unstructured document collection. Though these techniques first originated in the probabilistic analysis of word distributions, many deep learning approaches have been adopted recently. In this paper, we propose a novel neural network based architecture that produces distributed representation of topics to capture topical themes in a dataset. Unlike many state-of-the-art techniques for generating distributed representation of words and documents that directly use neighboring words for training, we leverage the outcome of a sophisticated deep neural network to estimate the topic labels of each document. The networks, for topic modeling and generation of distributed representations, are trained concurrently in a cascaded style with better runtime without sacrificing the quality of the topics. Empirical studies reported in the paper show that the distributed representations of topics represent intuitive themes using smaller dimensions than conventional topic modeling approaches.
For more information, please visit: http://people.cs.vt.edu/parang/ or contact parang at firstname at cs vt edu
FUZZY STATISTICAL DATABASE AND ITS PHYSICAL ORGANIZATION (ijdms)
This document discusses fuzzy statistical databases and their physical organization. It introduces the concept of fuzzy cardinality for counting elements in a fuzzy set or fuzzy relation. A fuzzy statistical database is proposed to store fuzzy statistics generated from fuzzy relational databases. This fuzzy statistical database contains fuzzy statistical tables, which can be type-1 or type-2 depending on the complexity of the fuzzy attributes. The physical organization of these fuzzy statistical tables is discussed to facilitate efficient storage and retrieval of the imprecise statistical data.
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren... (ijaia)
Chinese discourse coherence modeling remains a challenging task in the Natural Language Processing field. Existing approaches mostly focus on feature engineering, adopting sophisticated features to capture the logical, syntactic or semantic relationships across sentences within a text. In this paper, we present an entity-driven recursive deep model for Chinese discourse coherence evaluation, based on a current English discourse coherence neural network model. Specifically, to overcome that model's shortcoming in identifying entity (noun) overlap across sentences, our combined model integrates entity information into the recursive neural network framework. Evaluation results on both sentence ordering and machine translation coherence rating tasks show the effectiveness of the proposed model, which significantly outperforms the existing strong baseline.
Cooperating Techniques for Extracting Conceptual Taxonomies from Text (Fulvio Rotella)
The document proposes a mixed approach using existing natural language processing techniques and novel techniques to automatically construct conceptual taxonomies from text. It identifies relevant concepts from text using keyword extraction, clustering, and computing relevance weights. It then generalizes similar concepts using WordNet to group concepts and disambiguate word senses. Preliminary evaluations show promising initial results.
The document proposes a mixed approach using existing natural language processing techniques and novel techniques to automatically construct conceptual taxonomies from text. Key steps include identifying relevant concepts and attributes from text, clustering similar concepts, computing relevance weights for concepts, and generalizing concepts using WordNet. Preliminary results suggest the approach shows promise for extending and improving automatic taxonomy construction.
The document presents an overview of searching in metric spaces. It discusses how similarity searching is needed for unstructured data like text, images, and audio, where exact matching is not possible. It describes how similarity is modeled using a distance function between objects in a metric space. The document surveys existing solutions from different fields that address proximity searching in metric spaces and vector spaces. It aims to provide a unified framework to analyze and categorize existing algorithms.
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS (ijseajournal)
ABSTRACT
In this paper we propose a novel method to cluster categorical data while retaining their context. Typically, clustering is performed on numerical data. However, it is often useful to cluster categorical data as well, especially when dealing with data in real-world contexts. Several methods exist which can cluster categorical data, but our approach is unique in that we use recent text-processing and machine learning advancements like GloVe and t-SNE to develop a context-aware clustering approach (using pre-trained word embeddings). We encode words or categorical data into numerical, context-aware vectors that we use to cluster the data points using common clustering algorithms like K-means.
[Emnlp] what is glo ve part ii - towards data science (Nikhil Jaiswal)
GloVe is a new model for learning word embeddings from co-occurrence matrices that combines elements of global matrix factorization and local context window methods. It trains on the nonzero elements in a word-word co-occurrence matrix rather than the entire sparse matrix or individual context windows. This allows it to efficiently leverage statistical information from the corpus. The model produces a vector space with meaningful structure, as shown by its performance of 75% on a word analogy task. It outperforms related models on similarity tasks and named entity recognition. The full paper describes GloVe's global log-bilinear regression model and how it addresses drawbacks of previous models to encode linear directions of meaning in the vector space.
For non-grid 3D images like point clouds and meshes, and inherently graph-based data.
Inherently graph-based data include for example brain connectivity analysis, scientific article citation networks, (social) network analysis, etc.
Alternative download link:
https://www.dropbox.com/s/2o3cofcd6d6e2qt/geometricGraph_deepLearning.pdf?dl=0
The document discusses clustering documents using a multi-viewpoint similarity measure. It begins with an introduction to document clustering and common similarity measures like cosine similarity. It then proposes a new multi-viewpoint similarity measure that calculates similarity between documents based on multiple reference points, rather than just the origin. This allows a more accurate assessment of similarity. The document outlines an optimization algorithm used to cluster documents by maximizing the new similarity measure. It compares the new approach to existing document clustering methods and similarity measures.
Conceptual Spaces for Cognitive Architectures: A Lingua Franca for Different ... (Antonio Lieto)
We claim that Conceptual Spaces offer a lingua franca that allows us to unify and generalize many aspects of the symbolic, sub-symbolic and diagrammatic approaches (by overcoming some of their typical problems) and to integrate them on a common ground. In doing so we extend and detail some of the arguments explored by Gärdenfors [23] for defending the need of a conceptual, intermediate representation level between the symbolic and the sub-symbolic one. Additionally, we argue that Conceptual Spaces could offer a unifying framework for interpreting many kinds of diagrammatic and analogical representations. As a consequence, their adoption could also favor the integration of diagrammatical representation and reasoning in Cognitive Architectures.
A Novel Clustering Method for Similarity Measuring in Text Documents (IJMER)
International Journal of Modern Engineering Research (IJMER) is Peer reviewed, online Journal. It serves as an international archival forum of scholarly research related to engineering and science education.
About TrueTime, Spanner, Clock synchronization, CAP theorem, Two-phase lockin... (Subhajit Sahu)
TrueTime is a service that enables the use of globally synchronized clocks, with bounded error. It returns a time interval that is guaranteed to contain the clock’s actual time for some time during the call’s execution. If two intervals do not overlap, then we know calls were definitely ordered in real time. In general, synchronized clocks can be used to avoid communication in a distributed system.
The underlying source of time is a combination of GPS receivers and atomic clocks. As there are “time masters” in every datacenter (redundantly), it is likely that both sides of a partition would continue to enjoy accurate time. Individual nodes however need network connectivity to the masters, and without it their clocks will drift. Thus, during a partition their intervals slowly grow wider over time, based on bounds on the rate of local clock drift. Operations depending on TrueTime, such as Paxos leader election or transaction commits, thus have to wait a little longer, but the operation still completes (assuming the 2PC and quorum communication are working).
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... (Subhajit Sahu)
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
More Related Content
Similar to word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings of Structured Data : NOTES
Discovering Novel Information with sentence Level clustering From Multi-docu...irjes
The document presents a novel fuzzy clustering algorithm called FRECCA that clusters sentences from multi-documents to discover new information. FRECCA uses fuzzy relational eigenvector centrality to calculate page rank scores for sentences within clusters, treating the scores as likelihoods. It uses expectation maximization to optimize cluster membership values and mixing coefficients without a parameterized likelihood function. An evaluation shows FRECCA achieves superior performance to other clustering algorithms on a quotations dataset, identifying overlapping clusters of semantically related sentences.
Metrics for Evaluating Quality of Embeddings for Ontological Concepts Saeedeh Shekarpour
Although there is an emerging trend towards generating embeddings for primarily unstructured data and, recently, for structured data, no systematic suite for measuring the quality of embeddings has been proposed yet.
This deficiency is further sensed with respect to embeddings generated for structured data because there are no concrete evaluation metrics measuring the quality of the encoded structure as well as semantic patterns in the embedding space.
In this paper, we introduce a framework containing three distinct tasks concerned with the individual aspects of ontological concepts: (i) the categorization aspect, (ii) the hierarchical aspect, and (iii) the relational aspect.
Then, in the scope of each task, a number of intrinsic metrics are proposed for evaluating the quality of the embeddings.
Furthermore, w.r.t. this framework, multiple experimental studies were run to compare the quality of the available embedding models.
Employing this framework in future research can reduce misjudgment and provide greater insight about quality comparisons of embeddings for ontological concepts.
We positioned our sampled data and code at https://github.com/alshargi/Concept2vec under GNU General Public License v3.0.
This document presents a computational approach called "taskonomy" to model relationships and structure among visual tasks. It trains task-specific models on 26 visual tasks and computes transfer learning dependencies between tasks. This identifies which tasks provide useful information for other tasks. It finds the optimal transfer policy to solve tasks using limited labeled data, by leveraging relationships between tasks. For example, it shows 10 tasks can be solved using 2/3 less labeled data while maintaining performance, by transferring knowledge between related tasks.
1. The document discusses path compression, a technique used in disjoint set data structures that improves the running time of find operations from logarithmic to almost constant.
2. It explains how path compression works by redirecting parent pointers during find operations so that future finds take direct paths to the root.
3. The document also discusses some applications of set theory in mathematics like defining mathematical structures and relations, and serving as a foundation for other areas of math. Transferring files between computers on a network using disjoint sets is provided as an example algorithm.
A semantic framework and software design to enable the transparent integratio...Patricia Tavares Boralli
This document proposes a conceptual framework to unify representations of natural systems knowledge. The framework is based on separating the ontological nature of an object of study from the context of its observation. Each object is associated with a concept defined in an ontology and an observation context describing aspects like location and time. Models and data are treated as generic knowledge sources with a semantic type and observation context. This allows flexible integration and calculation of states across heterogeneous sources by composing their observation contexts and resolving semantic compatibility. The framework aims to simplify knowledge representation by abstracting away complexity related to data format and scale.
SVHsIEVs for Navigation in Virtual Urban Environmentcsandit
Many virtual reality applications, such as training, urban design or gaming are based on a rich semantic description of the environment. This paper describes a new representation of semantic virtual worlds. Our model, called SVHsIEVs, should provide a consistent representation of the following aspects: the simulated environment, its structure, and the knowledge items using ontology, interactions and tasks that virtual humans can perform in the environment. Our first main contribution is to show the influence of semantic virtual objects on the environment. Our second main contribution is to use this semantic information to manage the tasks of each virtual object. We propose to define each task by a set of attributes and relationships, which determines the links between attributes in tasks, and links between other tasks. The architecture has been successfully tested in 3D dynamic environments for navigation in virtual urban environments.
Concurrent Inference of Topic Models and Distributed Vector RepresentationsParang Saraf
Abstract: Topic modeling techniques have been widely used to uncover dominant themes hidden inside an unstructured document collection. Though these techniques first originated in the probabilistic analysis of word distributions, many deep learning approaches have been adopted recently. In this paper, we propose a novel neural network based architecture that produces distributed representation of topics to capture topical themes in a dataset. Unlike many state-of-the-art techniques for generating distributed representation of words and documents that directly use neighboring words for training, we leverage the outcome of a sophisticated deep neural network to estimate the topic labels of each document. The networks, for topic modeling and generation of distributed representations, are trained concurrently in a cascaded style with better runtime without sacrificing the quality of the topics. Empirical studies reported in the paper show that the distributed representations of topics represent intuitive themes using smaller dimensions than conventional topic modeling approaches.
For more information, please visit: http://people.cs.vt.edu/parang/ or contact parang at firstname at cs vt edu
FUZZY STATISTICAL DATABASE AND ITS PHYSICAL ORGANIZATIONijdms
This document discusses fuzzy statistical databases and their physical organization. It introduces the concept of fuzzy cardinality for counting elements in a fuzzy set or fuzzy relation. A fuzzy statistical database is proposed to store fuzzy statistics generated from fuzzy relational databases. This fuzzy statistical database contains fuzzy statistical tables, which can be type-1 or type-2 depending on the complexity of the fuzzy attributes. The physical organization of these fuzzy statistical tables is discussed to facilitate efficient storage and retrieval of the imprecise statistical data.
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...ijaia
Chinese discourse coherence modeling remains a challenging task in the Natural Language Processing field. Existing approaches mostly focus on the need for feature engineering, which adopts sophisticated features to capture the logical, syntactic, or semantic relationships across sentences within a text. In this paper, we present an entity-driven recursive deep model for Chinese discourse coherence evaluation based on the current English discourse coherence neural network model. Specifically, to overcome the shortage of identifying the entity (noun) overlap across sentences in the current model, our combined model successfully integrates the entity information into the recursive neural network framework. Evaluation results on both sentence ordering and machine translation coherence rating tasks show the effectiveness of the proposed model, which significantly outperforms the existing strong baseline.
Cooperating Techniques for Extracting Conceptual Taxonomies from TextFulvio Rotella
The document proposes a mixed approach using existing natural language processing techniques and novel techniques to automatically construct conceptual taxonomies from text. It identifies relevant concepts from text using keyword extraction, clustering, and computing relevance weights. It then generalizes similar concepts using WordNet to group concepts and disambiguate word senses. Preliminary evaluations show promising initial results.
The document proposes a mixed approach using existing natural language processing techniques and novel techniques to automatically construct conceptual taxonomies from text. Key steps include identifying relevant concepts and attributes from text, clustering similar concepts, computing relevance weights for concepts, and generalizing concepts using WordNet. Preliminary results suggest the approach shows promise for extending and improving automatic taxonomy construction.
The document presents an overview of searching in metric spaces. It discusses how similarity searching is needed for unstructured data like text, images, and audio, where exact matching is not possible. It describes how similarity is modeled using a distance function between objects in a metric space. The document surveys existing solutions from different fields that address proximity searching in metric spaces and vector spaces. It aims to provide a unified framework to analyze and categorize existing algorithms.
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANSijseajournal
ABSTRACT
In this paper we propose a novel method to cluster categorical data while retaining their context. Typically, clustering is performed on numerical data. However it is often useful to cluster categorical data as well, especially when dealing with data in real-world contexts. Several methods exist which can cluster categorical data, but our approach is unique in that we use recent text-processing and machine learning advancements like GloVe and t-SNE to develop a context-aware clustering approach (using pre-trained word embeddings). We encode words or categorical data into numerical, context-aware, vectors that we use to cluster the data points using common clustering algorithms like K-means.
[Emnlp] what is glo ve part ii - towards data scienceNikhil Jaiswal
GloVe is a new model for learning word embeddings from co-occurrence matrices that combines elements of global matrix factorization and local context window methods. It trains on the nonzero elements in a word-word co-occurrence matrix rather than the entire sparse matrix or individual context windows. This allows it to efficiently leverage statistical information from the corpus. The model produces a vector space with meaningful structure, as shown by its performance of 75% on a word analogy task. It outperforms related models on similarity tasks and named entity recognition. The full paper describes GloVe's global log-bilinear regression model and how it addresses drawbacks of previous models to encode linear directions of meaning in the vector space.
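For reference, the "global log-bilinear regression model" mentioned above is usually written as the weighted least-squares objective below. The symbols are not in the summary and are introduced here as assumptions: $X_{ij}$ is the co-occurrence count of words $i$ and $j$, $w_i$ and $\tilde{w}_j$ are word and context vectors, $b_i$ and $\tilde{b}_j$ are biases, and $f$ is a weighting function with cutoff $x_{\max}$ and exponent $\alpha$.

```latex
J = \sum_{i,j=1}^{V} f(X_{ij})
    \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2,
\qquad
f(x) =
\begin{cases}
  (x / x_{\max})^{\alpha} & \text{if } x < x_{\max}, \\
  1 & \text{otherwise.}
\end{cases}
```

Because $f(0) = 0$, zero entries of the co-occurrence matrix contribute nothing to the sum, which is consistent with training only on the nonzero elements.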
For non-grid 3D images like point clouds and meshes, and inherently graph-based data.
Inherently graph-based data include for example brain connectivity analysis, scientific article citation networks, (social) network analysis, etc.
Alternative download link:
https://www.dropbox.com/s/2o3cofcd6d6e2qt/geometricGraph_deepLearning.pdf?dl=0
The document discusses clustering documents using a multi-viewpoint similarity measure. It begins with an introduction to document clustering and common similarity measures like cosine similarity. It then proposes a new multi-viewpoint similarity measure that calculates similarity between documents based on multiple reference points, rather than just the origin. This allows a more accurate assessment of similarity. The document outlines an optimization algorithm used to cluster documents by maximizing the new similarity measure. It compares the new approach to existing document clustering methods and similarity measures.
Conceptual Spaces for Cognitive Architectures: A Lingua Franca for Different ...Antonio Lieto
We claim that Conceptual Spaces offer a lingua franca that allows us to unify and generalize many aspects of the symbolic, sub-symbolic and diagrammatic approaches (by overcoming some of their typical problems) and to integrate them on a common ground. In doing so we extend and detail some of the arguments explored by Gardenfors [23] for defending the need of a conceptual, intermediate, representation level between the symbolic and the sub-symbolic one. Additionally, we argue that Conceptual Spaces could offer a unifying framework for interpreting many kinds of diagrammatic and analogical representations. As a consequence, their adoption could also favor the integration of diagrammatical representation and reasoning in Cognitive Architectures.
A Novel Clustering Method for Similarity Measuring in Text DocumentsIJMER
International Journal of Modern Engineering Research (IJMER) is Peer reviewed, online Journal. It serves as an international archival forum of scholarly research related to engineering and science education.
Similar to word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings of Structured Data : NOTES (20)
About TrueTime, Spanner, Clock synchronization, CAP theorem, Two-phase lockin...Subhajit Sahu
TrueTime is a service that enables the use of globally synchronized clocks, with bounded error. It returns a time interval that is guaranteed to contain the clock’s actual time for some time during the call’s execution. If two intervals do not overlap, then we know calls were definitely ordered in real time. In general, synchronized clocks can be used to avoid communication in a distributed system.
The underlying source of time is a combination of GPS receivers and atomic clocks. As there are “time masters” in every datacenter (redundantly), it is likely that both sides of a partition would continue to enjoy accurate time. Individual nodes however need network connectivity to the masters, and without it their clocks will drift. Thus, during a partition their intervals slowly grow wider over time, based on bounds on the rate of local clock drift. Operations depending on TrueTime, such as Paxos leader election or transaction commits, thus have to wait a little longer, but the operation still completes (assuming the 2PC and quorum communication are working).
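The interval reasoning above can be sketched in a few lines. This is only an illustration under stated assumptions: `Interval`, `definitely_before`, and `now_interval` are hypothetical names, not the TrueTime API, and the linear drift model is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A time reading with bounded error: the true time lies inside it."""
    earliest: float
    latest: float


def definitely_before(a, b):
    # If the intervals do not overlap, the two calls were ordered in real time.
    return a.latest < b.earliest


def now_interval(local_clock, base_error, drift_rate, since_sync):
    # Assumed model: the uncertainty grows linearly with the time elapsed
    # since the node last synchronized with a time master.
    eps = base_error + drift_rate * since_sync
    return Interval(local_clock - eps, local_clock + eps)
```

With a longer time since the last sync (for example during a partition), the interval gets wider, so an operation that must wait until an earlier interval has definitely passed waits longer but can still complete.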
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
Adjusting Bitset for graph : SHORT REPORT / NOTESSubhajit Sahu
Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is commonly used for efficient graph computations. Unfortunately, using CSR for dynamic graphs is impractical since addition/deletion of a single edge can require on average (N+M)/2 memory accesses, in order to update source-offsets and destination-indices. A common approach is therefore to store edge-lists/destination-indices as an array of arrays, where each edge-list is an array belonging to a vertex. While this is good enough for small graphs, it quickly becomes a bottleneck for large graphs. What causes this bottleneck depends on whether the edge-lists are sorted or unsorted. If they are sorted, checking for an edge requires about log(E) memory accesses, but adding an edge on average requires E/2 accesses, where E is the number of edges of a given vertex. Note that both addition and deletion of edges in a dynamic graph require checking for an existing edge, before adding or deleting it. If edge lists are unsorted, checking for an edge requires around E/2 memory accesses, but adding an edge requires only 1 memory access.
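The trade-off between sorted and unsorted edge-lists can be made concrete with a small sketch. The function names are hypothetical, and a Python list stands in for the per-vertex array of destination indices.

```python
from bisect import bisect_left


def has_edge_sorted(edges, v):
    # Binary search: about log(E) accesses for an edge-list of length E.
    i = bisect_left(edges, v)
    return i < len(edges) and edges[i] == v


def add_edge_sorted(edges, v):
    # The existence check is cheap, but the insert shifts about E/2
    # elements on average to keep the list sorted.
    i = bisect_left(edges, v)
    if i < len(edges) and edges[i] == v:
        return False
    edges.insert(i, v)
    return True


def add_edge_unsorted(edges, v):
    # The existence check is a linear scan (about E/2 accesses on a hit),
    # but the append itself touches only one slot.
    if v in edges:
        return False
    edges.append(v)
    return True
```

In both variants the existence check runs before the update, which is why neither layout makes edge addition cheap on its own.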
Techniques to optimize the pagerank algorithm usually fall in two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before pagerank computation to improve performance. Final ranks of chain nodes can be easily calculated. This could reduce both the iteration time, and the number of iterations. If a graph has no dangling nodes, pagerank of each strongly connected component can be computed in topological order. This could help reduce the iteration time, no. of iterations, and also enable multi-iteration concurrency in pagerank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms, like PageRank Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Experiments with Primitive operations : SHORT REPORT / NOTESSubhajit Sahu
This includes:
- Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
- Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
- Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
- Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Adjusting OpenMP PageRank : SHORT REPORT / NOTESSubhajit Sahu
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take
advantage of a shared memory system with multiple CPUs, each with multiple cores, to
accelerate pagerank computation. If the NUMA architecture of the system is properly taken
into account with good vertex partitioning, the speedup can be significant. To take steps in
this direction, experiments are conducted to implement pagerank in OpenMP using two
different approaches, uniform and hybrid. The uniform approach runs all primitives required
for pagerank in OpenMP mode (with multiple threads). On the other hand, the hybrid
approach runs certain primitives in sequential mode (i.e., sumAt, multiply).
DyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTESSubhajit Sahu
https://gist.github.com/wolfram77/54c4a14d9ea547183c6c7b3518bf9cd1
There exist a number of dynamic graph generators. The Barabási-Albert model iteratively attaches new vertices to pre-existing vertices in the graph using preferential attachment (edges to high degree vertices are more likely - rich get richer - Pareto principle). However, graph size increases monotonically, and density of the graph keeps increasing (sparsity decreasing).
Gorke's model uses a defined clustering to uniformly add vertices and edges. Purohit's model uses motifs (e.g. triangles) to mimic properties of existing dynamic graphs, such as growth rate, structure, and degree distribution. Kronecker graph generators are used to increase the size of a given graph, with power-law distribution.
To generate dynamic graphs, we must choose a metric to compare two graphs. Common metrics include diameter, clustering coefficient (modularity?), triangle counting (triangle density?), and degree distribution.
In this paper, the authors propose Dygraph, a dynamic graph generator that uses degree distribution as the only metric. The authors observe that many real-world graphs differ from the power-law distribution at the tail end. To address this issue, they propose binning, where the vertices beyond a certain degree (minDeg = min(deg) s.t. |V(deg)| < H, where H~10 is the number of vertices with a given degree below which are binned) are grouped into bins of degree-width binWidth, max-degree localMax, and number of degrees in bin with at least one vertex binSize (to keep track of sparsity). This helps the authors to generate graphs with a more realistic degree distribution.
The process of generating a dynamic graph is as follows. First the difference between the desired and the current degree distribution is calculated. The authors then create an edge-addition set where each vertex is present as many times as the number of additional incident edges it must receive. Edges are then created by connecting two vertices randomly from this set, and removing both from the set once connected. Currently, the authors reject self-loops and duplicate edges. Removal of edges is done in a similar fashion.
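The edge-addition step can be sketched as below. This is my own minimal reading of the description, not the authors' code: the function name `add_edges`, the per-vertex degree dictionaries, and the `max_tries` bound on rejected draws are all assumptions.

```python
import random


def add_edges(current_deg, desired_deg, edges, rng=None, max_tries=1000):
    """Add random edges until vertices reach their desired degrees.

    `edges` is a set of undirected edges stored as (min, max) tuples and is
    updated in place. Returns the list of edges that were added.
    """
    rng = rng or random.Random()
    # Edge-addition set: each vertex appears once per additional
    # incident edge it must receive.
    pool = [v for v in desired_deg
            for _ in range(max(0, desired_deg[v] - current_deg.get(v, 0)))]
    added = []
    tries = 0
    # max_tries is an assumed safeguard against endless rejection.
    while len(pool) >= 2 and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(len(pool)), 2)
        u, v = pool[i], pool[j]
        key = (min(u, v), max(u, v))
        if u == v or key in edges:
            continue  # reject self-loops and duplicate edges
        edges.add(key)
        added.append(key)
        # Remove both endpoints from the set once connected
        # (larger index first so the smaller index stays valid).
        for k in sorted((i, j), reverse=True):
            pool.pop(k)
    return added
```

Edge removal would follow the same pattern with a set built from vertices whose current degree exceeds the desired one.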
Authors observe that adding edges with power-law properties dominates the execution time, and consider parallelizing DyGraph as part of future work.
My notes on shared memory parallelism.
Shared memory is memory that may be simultaneously accessed by multiple programs with an intent to provide communication among them or avoid redundant copies. Shared memory is an efficient means of passing data between programs. Using memory for communication inside a single program, e.g. among its multiple threads, is also referred to as shared memory [REF].
A Dynamic Algorithm for Local Community Detection in Graphs : NOTESSubhajit Sahu
**Community detection methods** can be *global* or *local*. **Global community detection methods** divide the entire graph into groups. Existing global algorithms include:
- Random walk methods
- Spectral partitioning
- Label propagation
- Greedy agglomerative and divisive algorithms
- Clique percolation
https://gist.github.com/wolfram77/b4316609265b5b9f88027bbc491f80b6
There is a growing body of work in *detecting overlapping communities*. **Seed set expansion** is a **local community detection method** where relevant *seed vertices* of interest are picked and *expanded to form communities* surrounding them. The quality of each community is measured using a *fitness function*.
**Modularity** is a *fitness function* which compares the number of intra-community edges to the expected number in a random-null model. **Conductance** is another popular fitness score that measures the community cut or inter-community edges. Many *overlapping community detection* methods **use a modified ratio** of intra-community edges to all edges with at least one endpoint in the community.
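As a rough sketch of the two fitness scores, assuming an undirected, unweighted graph given as a list of `(u, v)` edges: the function names and the choice of the standard Newman-Girvan modularity and the usual cut-over-volume conductance are my assumptions, since the notes do not give formulas.

```python
from collections import defaultdict


def modularity(edges, community):
    """Sum over communities of (intra-edge fraction - expected fraction)."""
    m = len(edges)
    intra = defaultdict(int)   # edges with both endpoints in community c
    degree = defaultdict(int)  # total degree of the vertices in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            intra[cu] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def conductance(edges, members):
    """Cut edges divided by the smaller of the two side volumes."""
    members = set(members)
    cut = vol_in = vol_out = 0
    for u, v in edges:
        a, b = u in members, v in members
        vol_in += a + b                 # endpoints inside the community
        vol_out += (not a) + (not b)    # endpoints outside the community
        if a != b:
            cut += 1
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0
```

For two triangles joined by a single edge, splitting at that edge gives a modularity of 5/14 and a conductance of 1/7 for either triangle: higher modularity and lower conductance both indicate a better-separated community.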
Andersen et al. use a **Spectral PageRank-Nibble method** which minimizes conductance and is formed by adding vertices in order of decreasing PageRank values. Andersen and Lang develop a **random walk approach** in which some vertices in the seed set may not be placed in the final community. Clauset gives a **greedy method** that *starts from a single vertex* and then iteratively adds neighboring vertices *maximizing the local modularity score*. Riedy et al. **expand multiple vertices** via maximizing modularity.
Several algorithms for **detecting global, overlapping communities** use a *greedy*, *agglomerative approach* and run *multiple separate seed set expansions*. Lancichinetti et al. run **greedy seed set expansions**, each with a *single seed vertex*. Overlapping communities are produced by sequentially running expansions from a node not yet in a community. Lee et al. use **maximal cliques as seed sets**. Havemann et al. **greedily expand cliques**.
The authors of this paper discuss a dynamic approach for **community detection using seed set expansion**. Simply marking the neighbours of changed vertices is a **naive approach**, and has *severe shortcomings*. This is because *communities can split apart*. The simple updating method *may fail even when it outputs a valid community* in the graph.
Scalable Static and Dynamic Community Detection Using Grappolo : NOTESSubhajit Sahu
A **community** (in a network) is a subset of nodes which are _strongly connected among themselves_, but _weakly connected to others_. Neither the number of output communities nor their size distribution is known a priori. Community detection methods can be divisive or agglomerative. **Divisive methods** use _betweenness centrality_ to **identify and remove bridges** between communities. **Agglomerative methods** greedily **merge two communities** that provide maximum gain in _modularity_. Newman and Girvan have introduced the **modularity metric**. The problem of community detection is then reduced to the problem of modularity maximization which is **NP-complete**. The **Louvain method** is a variant of the _agglomerative strategy_, in that it is a _multi-level heuristic_.
https://gist.github.com/wolfram77/917a1a4a429e89a0f2a1911cea56314d
In this paper, the authors discuss **four heuristics** for Community detection using the _Louvain algorithm_ implemented upon recently developed **Grappolo**, which is a parallel variant of the Louvain algorithm. They are:
- Vertex following and Minimum label
- Data caching
- Graph coloring
- Threshold scaling
With the **Vertex following** heuristic, the _input is preprocessed_ and all single-degree vertices are merged with their corresponding neighbours. This helps reduce the number of vertices considered in each iteration, and also helps initial seeds of communities to form. With the **Minimum label heuristic**, when a vertex is deciding which community to move to and multiple communities provide the same modularity gain, the community with the smallest id is chosen. This helps _minimize or prevent community swaps_. With the **Data caching** heuristic, community information is stored in a vector instead of a map, and is reused in each iteration, but with some additional cost. With the **Vertex ordering via Graph coloring** heuristic, _distance-k coloring_ of graphs is performed in order to group vertices into colors. Then, each set of vertices (by color) is processed _concurrently_, and synchronization is performed after that. This enables us to mimic the behaviour of the serial algorithm. Finally, with the **Threshold scaling** heuristic, _successively smaller values of modularity threshold_ are used as the algorithm progresses. This allows the algorithm to converge faster, and a good modularity score has been observed as well.
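The Minimum label tie-break is small enough to sketch directly. This is an illustration only: `choose_community` and its `gains` dictionary (candidate community id mapped to modularity gain) are hypothetical, and how Grappolo computes the gains is not shown.

```python
def choose_community(gains, current):
    """Pick the community with the largest positive modularity gain.

    Ties between communities with the same gain are broken by taking the
    smallest community id. If no move improves modularity, stay put.
    """
    best_gain = max(gains.values(), default=0.0)
    if best_gain <= 0.0:
        return current
    return min(c for c, g in gains.items() if g == best_gain)
```

Because every thread resolves ties the same way, two neighbouring vertices with equal gains are less likely to keep swapping into each other's communities.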
From the results, it appears that _graph coloring_ and _threshold scaling_ heuristics do not always provide a speedup and this depends upon the nature of the graph. It would be interesting to compare the heuristics against baseline approaches. Future work can include _distributed memory implementations_, and _community detection on streaming graphs_.
Application Areas of Community Detection: A Review : NOTESSubhajit Sahu
This is a short review of Community detection methods (on graphs), and their applications. A **community** is a subset of a network whose members are *highly connected*, but *loosely connected* to others outside their community. Different community detection methods *can return differing communities* because these algorithms are **heuristic-based**. **Dynamic community detection** involves tracking the *evolution of community structure* over time.
https://gist.github.com/wolfram77/09e64d6ba3ef080db5558feb2d32fdc0
Communities can be of the following **types**:
- Disjoint
- Overlapping
- Hierarchical
- Local.
The following **static** community detection **methods** exist:
- Spectral-based
- Statistical inference
- Optimization
- Dynamics-based
The following **dynamic** community detection **methods** exist:
- Independent community detection and matching
- Dependent community detection (evolutionary)
- Simultaneous community detection on all snapshots
- Dynamic community detection on temporal networks
**Applications** of community detection include:
- Criminal identification
- Fraud detection
- Criminal activities detection
- Bot detection
- Dynamics of epidemic spreading (dynamic)
- Cancer/tumor detection
- Tissue/organ detection
- Evolution of influence (dynamic)
- Astroturfing
- Customer segmentation
- Recommendation systems
- Social network analysis (both)
- Network summarization
- Privacy, group segmentation
- Link prediction (both)
- Community evolution prediction (dynamic, hot field)
## References
- [Application Areas of Community Detection: A Review : PAPER](https://ieeexplore.ieee.org/document/8625349)
This paper discusses a GPU implementation of the Louvain community detection algorithm. The Louvain algorithm obtains hierarchical communities as a dendrogram through modularity optimization. Given an undirected weighted graph, all vertices are first considered to be their own communities. In the first phase, each vertex greedily decides to move to the community of one of its neighbours which gives the greatest increase in modularity. If moving to no neighbour's community leads to an increase in modularity, the vertex chooses to stay with its own community. This is done sequentially for all the vertices. If the total change in modularity is more than a certain threshold, this phase is repeated. Once this local moving phase is complete, all vertices have formed their first hierarchy of communities. The next phase is called the aggregation phase, where all the vertices belonging to a community are collapsed into a single super-vertex, such that edges between communities are represented as edges between respective super-vertices (edge weights are combined), and edges within each community are represented as self-loops in respective super-vertices (again, edge weights are combined). Together, the local moving and the aggregation phases constitute a stage. This super-vertex graph is then used as input for the next stage. This process continues until the increase in modularity is below a certain threshold. As a result from each stage, we have a hierarchy of community memberships for each vertex as a dendrogram.
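The aggregation phase can be sketched on the CPU as below. This is not the paper's GPU implementation: the function name `aggregate` and the edge-list representation are assumptions, and conventions for self-loop weights differ between implementations (here each intra-community edge is counted once).

```python
from collections import defaultdict


def aggregate(edges, community):
    """Collapse each community into a super-vertex.

    `edges` is a list of undirected weighted edges (u, v, w), and
    `community` maps each vertex to its community id. Returns a dict from
    (min_id, max_id) super-vertex pairs to the combined edge weight.
    """
    super_edges = defaultdict(float)
    for u, v, w in edges:
        cu, cv = community[u], community[v]
        # Intra-community edges become self-loops (cu == cv);
        # inter-community edges are merged by summing their weights.
        super_edges[(min(cu, cv), max(cu, cv))] += w
    return dict(super_edges)
```

The returned super-vertex graph is what the next stage would take as input, with every super-vertex again starting in its own community.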
Approaches to perform the Louvain algorithm can be divided into coarse-grained and fine-grained. Coarse-grained approaches process a set of vertices in parallel, while fine-grained approaches process all vertices in parallel. A coarse-grained hybrid-GPU algorithm using multiple GPUs has been implemented by Cheong et al., which grabbed my attention. In addition, their algorithm does not use hashing for the local moving phase, but instead sorts each neighbour list based on the community id of each vertex.
https://gist.github.com/wolfram77/7e72c9b8c18c18ab908ae76262099329
Survey for extra-child-process package : NOTESSubhajit Sahu
Useful additions to inbuilt child_process module.
📦 Node.js, 📜 Files, 📰 Docs.
Please see attached PDF for literature survey.
https://gist.github.com/wolfram77/d936da570d7bf73f95d1513d4368573e
word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings of Structured Data : NOTES
word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings of Structured Data
Martin Grohe
RWTH Aachen University
Vector representations of graphs and relational structures, whether handcrafted feature vectors or learned representations, enable us to apply standard data analysis and machine learning techniques to the structures. A wide range of methods for generating such embeddings have been studied in the machine learning and knowledge representation literature. However, vector embeddings have received relatively little attention from a theoretical point of view.
Starting with a survey of embedding techniques that have been used in practice, in this paper we propose two theoretical approaches that we see as central for understanding the foundations of vector embeddings. We draw connections between the various approaches and suggest directions for future research.
Figure 1: Embedding graphs into a vector space
arXiv:2003.12590v1 [cs.LG] 27 Mar 2020
Wow
1 Introduction
Typical machine learning algorithms operating on structured data require representations of the often symbolic data as numerical vectors. Vector representations of the data range from handcrafted feature vectors via automatically constructed graph kernels to learned representations, either computed by dedicated embedding algorithms or implicitly computed by learning architectures like graph neural networks. The performance of machine learning methods crucially depends on the quality of the vector representations. Therefore, there is a wealth of research proposing a wide range of vector-embedding methods for various applications. Most of this research is empirical and often geared towards specific application areas. Given the importance of the topic, there is surprisingly little theoretical work on vector embeddings, especially when it comes to representing structural information that goes beyond metric information (that is, distances in a graph).
The goal of this paper is to give an overview of the various embedding techniques for structured data that are used in practice and to introduce theoretical ideas that can be, and to some extent have been, used to understand and analyse them. The research landscape on vector embeddings is unwieldy, with several communities working largely independently on related questions, motivated by different application areas such as social network analysis, knowledge graphs, chemoinformatics, computational biology, etc. Therefore, we need to be selective, focussing on common ideas and connections where we see them.
Vector embeddings can bridge the gap between the “discrete” world of relational data
and the “differentiable” world of machine learning and for this reason have a great
potential for database research. Yet relatively little work has been done on embeddings
of relational data beyond the binary relations of knowledge graphs. Throughout the
paper, I will try to point out potential directions for database-related research questions
on vector embeddings.
A vector embedding for a class $\mathcal{X}$ of objects is a mapping $f$ from $\mathcal{X}$ into some vector space, called the latent space, which we usually assume to be a real vector space $\mathbb{R}^d$ of finite dimension $d$. The idea is to define a vector embedding in such a way that geometric relationships in the latent space reflect semantic relationships between the objects in $\mathcal{X}$. Most importantly, we want similar objects in $\mathcal{X}$ to be mapped to vectors close to one another with respect to some standard metric on the latent space (say, Euclidean). For example, in an embedding of words of a natural language we want words with similar meanings, like “shoe” and “boot”, to be mapped to vectors that are close to each other. Sometimes, we want further-reaching correspondences between properties of and relations between objects in $\mathcal{X}$ and the geometry of their images in latent space. For example, in an embedding $f$ of the entities of a knowledge base, among them Paris, France, Santiago, Chile, we may want $t := f(\text{Paris}) - f(\text{France})$ to be (approximately) equal to $f(\text{Santiago}) - f(\text{Chile})$, so that the relation is-capital-of corresponds to the translation by the vector $t$ in latent space.
Notes:
- Ternary?
- Similarity, like SimRank, or dot product
- There can be different kinds of similarity: are teeth and mouth more similar, or teeth and bone?
- Maybe need to consider multiple layers of relationships: multilayer graphs!
- Choice of relation in latent space?
A difficulty is that the semantic relationships and similarities between the objects in $\mathcal{X}$ can rarely be quantified precisely. They usually only have an intuitive meaning that, moreover, may be application dependent. However, this is not necessarily a problem, because we can learn vector representations in such a way that they yield good results when we use them to solve machine learning tasks (so-called downstream tasks). This way, we never have to make the semantic relationships explicit. As a simple example, we may use a nearest-neighbour based classification algorithm on the vectors our embedding gives us; if it performs well then the distance between vectors must be relevant for this classification task. This way, we can even use vector embeddings, trained to perform well on certain machine learning tasks, to define semantically meaningful distance measures on our original objects, that is, to define the distance $\mathrm{dist}_f(X, Y)$ between objects $X, Y \in \mathcal{X}$ to be $\|f(X) - f(Y)\|$. We call $\mathrm{dist}_f$ the distance measure induced by the embedding $f$.
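The induced distance and the nearest-neighbour example can be sketched in a few lines of Python. This is only an illustration: the two-dimensional coordinates, the labels, and the helper names `dist_f` and `nearest_neighbour_label` are made up here and do not come from the paper.

```python
import math

def dist_f(f, a, b):
    """Induced distance dist_f(a, b) = ||f(a) - f(b)|| (Euclidean norm)."""
    return math.dist(f[a], f[b])

def nearest_neighbour_label(f, labelled, query):
    """Predict the label of `query` as the label of its closest labelled object."""
    closest = min(labelled, key=lambda obj: dist_f(f, obj, query))
    return labelled[closest]

# Hypothetical 2-dimensional embedding of a few words (made-up coordinates).
f = {
    "shoe": (0.9, 0.1),
    "boot": (1.0, 0.2),
    "sandal": (0.8, 0.0),
    "apple": (0.1, 0.9),
    "pear": (0.2, 1.0),
}
labelled = {"shoe": "footwear", "boot": "footwear",
            "apple": "fruit", "pear": "fruit"}

print(dist_f(f, "shoe", "boot") < dist_f(f, "shoe", "apple"))  # True
print(nearest_neighbour_label(f, labelled, "sandal"))          # footwear
```

If such a classifier performs well on held-out objects, that is evidence that `dist_f` captures something relevant for the task, which is the point made in the paragraph above.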
In this paper, the objects $X \in \mathcal{X}$ we want to embed either are graphs, possibly labelled or weighted, or more generally relational structures, or they are nodes of a (presumably large) graph or more generally elements or tuples appearing in a relational structure. When we embed entire graphs or structures, we speak of graph embeddings or relational structure embeddings; when we embed only nodes or elements we speak of node embeddings. These two types of embeddings are related, but there are clear differences. Most importantly, in node embeddings there are explicit relations such as adjacency and derived relations such as distance between the objects of $\mathcal{X}$ (the nodes of a graph), whereas in graph embeddings all relations between objects are implicit or “semantic”, for example “having the same number of vertices” or “having the same girth” (see Figure 1).
The key theoretical questions we will ask about vector embeddings of objects in $\mathcal{X}$ are the following.
Expressivity: Which properties of objects $X \in \mathcal{X}$ are represented by the embedding? What is the meaning of the induced distance measure? Are there geometric properties of the latent space that represent meaningful relations on $\mathcal{X}$?
Complexity: What is the computational cost of computing the vector embedding? What are efficient embedding algorithms? How can we efficiently retrieve semantic information of the embedded data, for example, answer queries?
A third question that relates to both expressivity and complexity is what dimension to choose for the latent space. In general, we expect a trade-off between (high) expressivity and (low) dimension, but it may well be that there is an inherent dimension of the data set. It is an appealing idea (see, for example, [98]) to think of “natural” data sets appearing in practice as lying on a low dimensional manifold in high dimensional space. Then we can regard the dimension of this manifold as the inherent dimension of the data set.
Note: Cool, not just similarity.
Reasonably well understood from a theoretical point of view are node embeddings of graphs that aim to preserve distances between nodes, that is, embeddings $f : V(G) \to \mathbb{R}^d$ of the vertex set $V(G)$ of some graph $G$ such that $\mathrm{dist}_G(x, y) \approx \|f(x) - f(y)\|$, where $\mathrm{dist}_G$ is the shortest-path distance in $G$. There is a substantial theory of such metric embeddings (see [64]). In many applications of node embeddings, metric embeddings are indeed what we need.
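To make the requirement that the Euclidean distance approximates the shortest-path distance concrete, here is a small sketch that measures how well a given node embedding preserves the graph metric. The 4-cycle, its embedding as the corners of a unit square, and the function names are assumptions chosen for illustration; the graph is assumed to be connected and unweighted.

```python
import math
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path distances dist_G(source, .) in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def distance_ratios(adj, f):
    """Ratios ||f(x) - f(y)|| / dist_G(x, y) for all pairs x < y (connected graph assumed)."""
    ratios = {}
    for x in adj:
        d = bfs_distances(adj, x)
        for y in adj:
            if x < y:
                ratios[(x, y)] = math.dist(f[x], f[y]) / d[y]
    return ratios

# 4-cycle 0-1-2-3-0, embedded as the corners of a unit square (assumed embedding).
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
f = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0), 3: (0.0, 1.0)}

ratios = distance_ratios(adj, f)
print(min(ratios.values()), max(ratios.values()))
```

Adjacent nodes are embedded at exactly their graph distance, while opposite corners are at graph distance 2 but Euclidean distance sqrt(2), so the embedding contracts those pairs. The spread between the smallest and largest ratio is one simple way to quantify how far an embedding is from being metric-preserving.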
However, the metric is only one aspect of the information carried by a graph or relational structure, and arguably not the most important one from a database perspective. Moreover, if we consider graph embeddings rather than node embeddings, there is no metric to start with. In this paper, we are concerned with structural vector embeddings of graphs, relational structures, and their nodes. Two theoretical ideas that have been shown to help in understanding and even designing vector embeddings of structures are the Weisfeiler-Leman algorithm and various concepts in its context, and homomorphism vectors, which can be seen as a general framework for defining “structural” (as opposed to “metric”) embeddings. We will see that these theoretical concepts have a rich theory that connects them to the embedding techniques used in practice in various ways.
The rest of the paper is organised as follows. Section 2 is a very brief survey of some of the embedding techniques that can be found in the machine learning and knowledge representation literature. In Section 3, we introduce the Weisfeiler-Leman algorithm. This algorithm, originally a graph isomorphism test, turns out to be an important link between the embedding techniques described in Section 2 and the theory of homomorphism vectors, which will be discussed in detail in Section 4. Finally, Section 5 is devoted to a discussion of similarity measures for graphs and structures.
2 Embedding Techniques
In this section, we give a brief and selective overview of embedding techniques. More
thorough recent surveys are [50] (on node embeddings), [104] (on graph neural networks),
[102] (on knowledge graph embeddings), and [62] (on graph kernels).
2.1 From Metric Embeddings to Node Embeddings
Node embeddings can be traced back to the theory of embeddings of finite metric spaces and dimensionality reduction, which have been studied in geometry (e.g. [21, 55]) and algorithmic graph theory (e.g. [54, 64]). In statistics and data science, well-known traditional methods of metric embeddings and dimensionality reduction are multidimensional scaling [63], Isomap [98], and Laplacian eigenmap [11]. More recent related approaches are [1, 25, 85, 97]. The idea is always to embed the nodes of a graph in such a way that the distance (or similarity) between vectors approximates the distance (or similarity) between nodes. Sometimes, this can be viewed as a matrix factorisation. Suppose we have defined a similarity measure on the nodes of our graph $G = (V, E)$ that is represented by a similarity matrix $S \in \mathbb{R}^{V \times V}$. In the simplest version, we can just take $S$ to be the adjacency matrix of the graph; in the literature this is sometimes referred to as first-order proximity. Another common choice is $S = (S_{vw})$ with $S_{vw} := \exp(-c \, \mathrm{dist}_G(v, w))$, where $c > 0$ is a parameter. We describe our embedding of $V$ into $\mathbb{R}^d$ by a matrix $X \in \mathbb{R}^{V \times d}$ whose rows $x_v \in \mathbb{R}^d$ are the images of the nodes. If we measure the similarity between vectors $x, y$ by their normalised inner product $\frac{\langle x, y \rangle}{\|x\| \|y\|}$ (this is known as the cosine similarity), then our objective is to find a matrix $X$ with normalised rows that minimises $\|XX^\top - S\|_F$ with respect to the Frobenius norm $\|\cdot\|_F$ (or any other matrix norm, see Section 5). In the basic version with the Frobenius norm, the problem can be solved using the singular value decomposition of $S$. In a more general form, we compute a similarity matrix $\hat{S} \in \mathbb{R}^{V \times V}$ whose $(v, w)$-entry quantifies the similarity between vectors $x_v, x_w$ and minimise the distance between $S$ and $\hat{S}$, for example using stochastic gradient descent. In [50], this approach to learning node embeddings is described as an encoder-decoder framework.
Notes:
- OM course
- SimRank
Figure 2: Three node embeddings of a graph using (a) singular value decomposition of the adjacency matrix, (b) singular value decomposition of the similarity matrix with entries $S_{vw} = \exp(-2 \, \mathrm{dist}(v, w))$, (c) node2vec [48]
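The matrix-factorisation view can be tried out on a tiny graph. The sketch below builds the similarity matrix with entries exp(-c dist_G(v, w)) for a 4-node path and factorises it as X X^T using the top-d eigenpairs. Several things here are assumptions made for the sake of a short, dependency-free example: the choice of graph and of c = 1, the function names, the use of plain power iteration with deflation in place of a library SVD routine, and dropping the constraint that the rows of X are normalised. For a symmetric positive semidefinite S (which holds for this particular path-graph matrix) the eigendecomposition coincides with the singular value decomposition.

```python
import math
from collections import deque

def shortest_path_matrix(adj):
    """All-pairs shortest-path distances of a connected unweighted graph (BFS)."""
    n = len(adj)
    D = [[0] * n for _ in range(n)]
    for s in range(n):
        seen = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen[w] = seen[u] + 1
                    queue.append(w)
        for t, dist in seen.items():
            D[s][t] = dist
    return D

def top_eigenpairs(S, d, iters=500):
    """Top-d eigenpairs of a symmetric matrix: power iteration plus deflation."""
    n = len(S)
    M = [row[:] for row in S]
    pairs = []
    for k in range(d):
        v = [1.0 + 0.1 * ((i + k) % n) for i in range(n)]  # fixed start vector
        for _ in range(iters):
            w = [sum(m * x for m, x in zip(row, v)) for row in M]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        Mv = [sum(m * x for m, x in zip(row, v)) for row in M]
        lam = sum(a * b for a, b in zip(v, Mv))
        pairs.append((lam, v))
        for i in range(n):          # deflate: remove the component just found
            for j in range(n):
                M[i][j] -= lam * v[i] * v[j]
    return pairs

def embed(S, d):
    """Rows x_v of X such that X X^T is the rank-d approximation of S."""
    pairs = top_eigenpairs(S, d)
    return [[math.sqrt(max(lam, 0.0)) * vec[v] for lam, vec in pairs]
            for v in range(len(S))]

def frobenius_error(X, S):
    """|| X X^T - S ||_F."""
    n = len(S)
    return math.sqrt(sum(
        (sum(a * b for a, b in zip(X[i], X[j])) - S[i][j]) ** 2
        for i in range(n) for j in range(n)))

adj = [[1], [0, 2], [1, 3], [2]]   # path 0 - 1 - 2 - 3
c = 1.0
S = [[math.exp(-c * dij) for dij in row] for row in shortest_path_matrix(adj)]
for d in (1, 2, 4):
    print(d, frobenius_error(embed(S, d), S))
```

The printed error shrinks as the latent dimension d grows and is numerically zero for d = 4, which illustrates the trade-off between dimension and expressivity mentioned in the introduction.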
Note: Unarguably, this looks the best.
Learned word embeddings and in particular the word2vec algorithm [74] introduced new ideas that had huge impact in natural language processing. These ideas also inspired new approaches to node embeddings like DeepWalk [87] and node2vec [48] based on taking short random walks in a graph and interpreting the sequence of nodes seen on such random walks as if they were words appearing together in a sentence. These approaches can still be described in the matrix-similarity (or encoder-decoder) framework: as the similarity between nodes $v$ and $w$ we take the probability that a fixed-length random walk starting in $v$ ends in $w$. We can approximate this probability by sampling random walks. Note that, even in undirected graphs, this similarity measure is not necessarily symmetric.
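The remark about asymmetry is easy to check on a three-node path. The sketch below computes the random-walk similarity exactly and also estimates it by sampling walks; the graph, the walk length, the sample count, and the function names are illustrative assumptions, and the walk is the simple uniform random walk (not the biased walk of node2vec).

```python
import random

def walk_end_probabilities(adj, start, length):
    """Exact distribution of the end node of a `length`-step uniform random walk."""
    dist = {start: 1.0}
    for _ in range(length):
        nxt = {}
        for u, p in dist.items():
            for w in adj[u]:
                nxt[w] = nxt.get(w, 0.0) + p / len(adj[u])
        dist = nxt
    return dist

def sampled_similarity(adj, v, w, length, samples, rng):
    """Estimate the probability that a walk from v ends in w by sampling walks."""
    hits = 0
    for _ in range(samples):
        node = v
        for _ in range(length):
            node = rng.choice(adj[node])
        hits += node == w
    return hits / samples

adj = {0: [1], 1: [0, 2], 2: [1]}   # undirected path 0 - 1 - 2

print(walk_end_probabilities(adj, 0, 1).get(1, 0.0))  # 1.0
print(walk_end_probabilities(adj, 1, 1).get(0, 0.0))  # 0.5

rng = random.Random(0)
print(sampled_similarity(adj, 1, 0, 1, 10_000, rng))  # close to 0.5
```

A one-step walk from the end node 0 always reaches 1, but a one-step walk from 1 reaches 0 only half the time, so the similarity of (0, 1) differs from that of (1, 0) even though the graph is undirected.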
From a deep learning perspective, the embedding methods described so far are all
“shallow” in that they directly optimise the output vectors and there are no hidden
layers; computing the vector xv corresponding to a node v amounts to a table lookup.
There are also deep learning methods for computing node embeddings (e.g. [26, 49, 101]).
Before we discuss such approaches any further, let us introduce graph neural networks
as a general deep learning framework for graphs that has received a lot of attention in
recent years.
2.2 Graph Neural Networks
When trying to apply deep learning methods to graphs or relational structures, we face
two immediate difficulties: (1) we want the methods to scale across different graph sizes,
but standard feed-forward neural networks have a fixed input size; (2) we want the
methods to be isomorphism invariant and not depend on a specific representation of the
input graph. Both points show that it is problematic to just feed the adjacency matrix
(or any other standard representation of a graph) into a deep neural network in a generic
“end-to-end” learning architecture.
Graph neural networks (GNNs) are a deep learning framework for graphs that avoids
both of these difficulties, albeit at the price of limited expressiveness (see Section 3).
Early forms of GNNs were introduced in [90, 36]; the version we present here is based on
[58, 49, 88]. Intuitively, a GNN model can be thought of as a message passing network
over the input graph $G = (V, E)$. Each node $v$ has a state $x_v \in \mathbb{R}^d$. Nodes can exchange
messages along the edges of G and update their states. To specify the model, we need
to specify two functions: an aggregation function that takes the current states of the
neighbours of a node and aggregates them into a single vector, and an update function
that takes the aggregate value obtained from the neighbours and the current state of the
node as inputs and computes the new state of the node. In a simple form, we may take
the following functions:
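Aggregate:
$$a_v^{(t+1)} \leftarrow \sum_{w \in N(v)} W_{\mathrm{agg}} \cdot x_w^{(t)}, \tag{2.1}$$
Update:
$$x_v^{(t+1)} \leftarrow \sigma\left( W_{\mathrm{up}} \cdot \begin{pmatrix} x_v^{(t)} \\ a_v^{(t+1)} \end{pmatrix} \right), \tag{2.2}$$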
where $W_{\mathrm{agg}} \in \mathbb{R}^{c \times d}$ and $W_{\mathrm{up}} \in \mathbb{R}^{d \times (c+d)}$ are learned parameter matrices and $\sigma$ is a nonlinear “activation” function, for example the ReLU (rectified linear unit) function $\sigma(x) := \max\{0, x\}$ applied pointwise to a vector. It is important to note that the parameter matrices $W_{\mathrm{agg}}$ and $W_{\mathrm{up}}$ do not depend on the node $v$; they are shared across all nodes of a graph. This parameter sharing allows us to use the same GNN model for graphs of arbitrary sizes.
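One round of the aggregate and update rules (2.1) and (2.2) can be written out directly. The following is a minimal sketch with d = 2 and c = 1; the weight values are hand-picked for illustration rather than learned, and the names `gnn_step`, `W_agg`, and `W_up` are this example's own. The same two matrices are applied to a triangle and to a 4-node path to show the parameter sharing across graph sizes.

```python
def matvec(W, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(x):
    """ReLU applied pointwise to a vector."""
    return [max(0.0, xi) for xi in x]

def gnn_step(adj, states, W_agg, W_up):
    """One round of (2.1) and (2.2) for every node, with shared weights."""
    c = len(W_agg)
    new_states = {}
    for v in adj:
        # Aggregate (2.1): a_v = sum over neighbours w of W_agg . x_w
        a_v = [0.0] * c
        for w in adj[v]:
            a_v = [s + m for s, m in zip(a_v, matvec(W_agg, states[w]))]
        # Update (2.2): x_v = relu(W_up . (x_v stacked on top of a_v))
        new_states[v] = relu(matvec(W_up, states[v] + a_v))
    return new_states

# Hypothetical weights with d = 2 and c = 1 (hand-picked, not learned).
W_agg = [[1.0, 0.0]]             # shape c x d
W_up = [[1.0, 0.0, 1.0],         # shape d x (d + c), columns ordered as (x_v, a_v)
        [0.0, 1.0, -1.0]]

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

for adj in (triangle, path):
    init = {v: [1.0, 1.0] for v in adj}   # all-ones initial states
    print(gnn_step(adj, init, W_agg, W_up))
```

With these weights every triangle node ends in state [3.0, 0.0], while the path's end nodes end in [2.0, 0.0] and its inner nodes in [3.0, 0.0]: after one round the state only reflects the node's degree.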
Of course, we can also use more complicated aggregation and update functions. We only want these functions to be differentiable to be able to use gradient descent optimisation methods in the training phase, and we want the aggregation function to be symmetric in its arguments $x_w^{(t)}$ for $w \in N(v)$ to make sure that the GNN computes a function that is isomorphism invariant. For example, in [100] we use a linear aggregation function and an update function computed by an LSTM (long short-term memory, [51]), a specific recurrent neural network component that allows it to “remember” relevant information from the sequence $x_v^{(0)}, x_v^{(1)}, \ldots, x_v^{(t)}$.
The computation of such a GNN model starts from an initial configuration $(x_v^{(0)})_{v \in V}$ and proceeds through a fixed number $t$ of aggregation and update steps, resulting in a final configuration $(x_v^{(t)})_{v \in V}$. Note that this configuration gives us a node embedding $v \mapsto x_v^{(t)}$ of the input graph. We can also stack several such GNN layers, each with its own aggregation and activation function, on top of one another, using the final configuration of each (but the last) layer as the initial configuration of the following layer and the final configuration of the last layer as the node embedding. As initial states, we can take constant vectors like the all-ones vector for each node, or we can assign a random initial state to each node. We can also use the initial state to represent the node labels if the input graph is labelled.
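The role of the symmetric aggregation can be checked experimentally: running t rounds on a graph and on a relabelled copy of it should give the same node embedding up to the relabelling. The sketch below does this for a 4-node path with all-ones initial states. The weights, the permutation `pi`, and the function name `run_gnn` are assumptions made for this illustration; the update is the simple ReLU form of (2.1) and (2.2), not the LSTM variant.

```python
def run_gnn(adj, init, W_agg, W_up, t):
    """Run t rounds of sum-aggregation and ReLU update; return the final configuration."""
    states = dict(init)
    for _ in range(t):
        new_states = {}
        for v in adj:
            # Sum aggregation is symmetric: the order of the neighbours does not matter.
            a_v = [sum(sum(w * x for w, x in zip(row, states[u])) for u in adj[v])
                   for row in W_agg]
            stacked = states[v] + a_v
            new_states[v] = [max(0.0, sum(w * x for w, x in zip(row, stacked)))
                             for row in W_up]
        states = new_states
    return states

# Hypothetical hand-picked weights (d = 2, c = 1).
W_agg = [[0.5, 0.5]]
W_up = [[1.0, 0.0, 0.5],
        [0.0, 1.0, -0.5]]

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
pi = {0: 2, 1: 0, 2: 3, 3: 1}                        # a relabelling of the nodes
relabelled = {pi[v]: [pi[w] for w in path[v]] for v in path}

emb = run_gnn(path, {v: [1.0, 1.0] for v in path}, W_agg, W_up, t=3)
emb_pi = run_gnn(relabelled, {v: [1.0, 1.0] for v in relabelled}, W_agg, W_up, t=3)

# The embedding of node v in the original graph matches that of pi(v) in the copy.
print(all(emb[v] == emb_pi[pi[v]] for v in path))    # True
# The two end nodes of the path are structurally equivalent and get equal states.
print(emb[0] == emb[3])                              # True
```

The second check also hints at the price mentioned above: with constant initial states, nodes that look alike locally cannot be told apart by this kind of model.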
To train a GNN for computing a node-embedding, in principle we can use any of
the loss functions used by the embedding techniques described in Section 2.1. The
reader may wonder what advantage the complicated GNN architecture has over just
optimising the embedding matrix X (as the methods described in Section 2.1 do). The
main advantage is that the GNN method is inductive, whereas the previously described
methods are transductive. This means that a GNN represents a function that we can
apply to arbitrary graphs, not just to the graph it was originally trained on. So if the
graph changes over time and, for example, nodes are added, we do not have to re-train
the embedding, but just embed the new nodes using the GNN model we already have,
which is much more efficient. We can even apply the model to an entirely new graph
and still hope it gives us a reasonable embedding. The most prominent example of an
inductive node-embedding tool based on GNNs is GraphSage [49].
Let me close this section by remarking that GNNs are used for all kinds of machine
learning tasks on graphs and not only to compute node embeddings. For example, a GNN
based architecture for graph classification would plug the output of the GNN layer(s)
into a standard feedforward network (possibly consisting only of a single softmax layer).
2.3 Knowledge Graph and Relational Structure Embeddings
Node embeddings of knowledge graphs have also been studied quite intensely in recent years, remarkably by a community that seems almost disjoint from that involved in the node embedding techniques described in Section 2.1. What makes knowledge graphs somewhat special is that they come with labelled edges (or, equivalently, many different binary relations) as well as labelled nodes. It is not completely straightforward to adapt the methods of Section 2.1 to edge- and vertex-labelled graphs. Another important difference is in the objective function: the methods of Section 2.1 mainly focus on the graph metric (even though approaches based on random walks like node2vec are flexible and also incorporate structural criteria). However, shortest-path distance is less relevant in knowledge graphs.
Rather than focussing on distances, knowledge graph embeddings focus on establishing
a correspondence between the relations of the knowledge graph and geometric relationships
in the latent space. A very influential algorithm, TransE [18], aims to associate
a specific translation of the latent space with each relation. Recall the example of the
introduction, where the entities Paris, France, Santiago, Chile were supposed to be embedded
in such a way that $x_{\text{Paris}} - x_{\text{France}} \approx x_{\text{Santiago}} - x_{\text{Chile}}$, so that the relation
is-capital-of corresponds to the translation by $t := x_{\text{Paris}} - x_{\text{France}}$.
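The translation idea can be illustrated with a minimal sketch. The embedding values below are hand-picked toy numbers, not learned ones, and the helper name `translation_score` is hypothetical. The sign convention follows the text (the difference of the two entity vectors should be close to t); TransE itself is usually stated as head + translation ≈ tail, which is the same idea with the roles of the entities swapped.

```python
import math

def translation_score(x_v, x_w, t):
    """Distance between x_v - x_w and the translation t (0 means a perfect fit)."""
    return math.sqrt(sum((a - b - c) ** 2 for a, b, c in zip(x_v, x_w, t)))

# Hand-picked 2-dimensional toy embeddings (illustrative values, not learned).
emb = {
    "Paris": (1.0, 2.0), "France": (0.0, 0.0),
    "Santiago": (4.0, 5.0), "Chile": (3.0, 3.0),
}

# Translation for is-capital-of, as in the text: t := x_Paris - x_France.
t = tuple(p - f for p, f in zip(emb["Paris"], emb["France"]))

fit_true = translation_score(emb["Santiago"], emb["Chile"], t)    # a pair in the relation
fit_false = translation_score(emb["Santiago"], emb["France"], t)  # a pair not in the relation
```

In this toy setting the pair (Santiago, Chile) fits the translation exactly, whereas (Santiago, France) does not; a training procedure would adjust the vectors so that this holds for the observed facts.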
A different algorithm for mapping relations to geometric relationships is Rescal [83].
Here the idea is to associate a bilinear form $\beta_R$ with each relation $R$ in such a way
that for all entities $v, w$ it holds that $\beta_R(x_v, x_w) \approx 1$ if $(v, w) \in R$ and $\beta_R(x_v, x_w) \approx 0$
if $(v, w) \notin R$. We can represent such a bilinear form $\beta_R$ by a matrix $B_R$ such that
$\beta_R(x, y) = x^\top B_R y$. Then the objective is to minimise, simultaneously for all $R$, the
term $\|X B_R X^\top - A_R\|$, where $X$ is the embedding matrix with rows $x_v$ and $A_R$ is the
adjacency matrix of the relation $R$. Note that this is a multi-relational version of the
matrix-factorisation approach described in Section 2.1, with the additional twist that we
also need to find the matrix $B_R$ for each relation $R$.
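To make the objective concrete, here is a small sketch that evaluates the reconstruction error for one relation. It is only an evaluation of the loss, not the Rescal training procedure; the squared Frobenius norm is an assumption made for convenience, and the matrices are toy values chosen so that the fit is exact.

```python
def matmul(P, Q):
    """Plain matrix product of two lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def transpose(M):
    return [list(col) for col in zip(*M)]

def rescal_loss(X, B_R, A_R):
    """Squared Frobenius norm of X B_R X^T - A_R for a single relation R."""
    pred = matmul(matmul(X, B_R), transpose(X))
    return sum((p - a) ** 2 for p_row, a_row in zip(pred, A_R) for p, a in zip(p_row, a_row))

# Three entities with 2-dimensional embeddings (rows of X), toy values.
X = [[1, 0], [0, 1], [0, 1]]
# Bilinear form for R: relates the first latent coordinate to the second.
B_R = [[0, 1], [0, 0]]
# Adjacency matrix of R: entity 0 is related to entities 1 and 2.
A_R = [[0, 1, 1], [0, 0, 0], [0, 0, 0]]

loss_good = rescal_loss(X, B_R, A_R)              # exact reconstruction
loss_bad = rescal_loss(X, [[0, 0], [0, 0]], A_R)  # zero form misses both facts
```

For several relations one would sum this loss over all R while sharing the same X, which is what "simultaneously for all R" refers to.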
Completing our remarks on knowledge graph embeddings, we mention that it is fairly
straightforward to generalise the GNN based node embeddings to (vertex- and edge-)labelled
graphs and hence to knowledge graphs [91].
While there is a large body of work on embedding knowledge graphs, that is, binary
relational structures, not much is known about embedding relations of higher arities. Of
course one approach to embedding relational structures of higher arities is to transform
them into their binary incidence structures (see Section 4.2 for a definition) and then
embed these using any of the methods available for binary structures. Currently, I
am not aware of any empirical studies on the practical viability of this approach. An
alternative approach [16, 17] is based on the idea of treating the rows of a table, that is,
tuples in a relation, like sentences in natural language and then using word embeddings
to embed the entities.
2.4 Graph Kernels
There are machine learning methods that operate on vector representations of their
input objects, but only use these vector representations implicitly and never actually
access the vectors. All they need to access is the inner product between two vectors.
For reasons that will become apparent soon, such methods are known as kernel methods.
Among them are support vector machines for classification [29] and principal component
analysis as well as k-means clustering for unsupervised learning [92] (see [93, Chapter 16]
for background).
A kernel function for a set $X$ of objects is a binary function $K : X \times X \to \mathbb{R}$ that
is symmetric, that is, $K(x, y) = K(y, x)$ for all $x, y \in X$, and positive semidefinite, that
is, for all $n \ge 1$ and all $x_1, \ldots, x_n \in X$ the matrix $M$ with entries $M_{ij} := K(x_i, x_j)$ is
positive semidefinite. Recall that a symmetric matrix $M \in \mathbb{R}^{n \times n}$ is positive semidefinite
if all its eigenvalues are nonnegative, or equivalently, if $x^\top M x \ge 0$ for all $x \in \mathbb{R}^n$. It
can be shown that a symmetric function $K : X \times X \to \mathbb{R}$ is a kernel function if and only
if there is a vector embedding $f : X \to H$ of $X$ into some Hilbert space $H$ such that $K$
is the mapping induced by the inner product of $H$, that is, $K(x, y) = \langle f(x), f(y) \rangle$ for
all $x, y \in X$ (see [93, Lemma 16.2] for a proof). For our purposes, it suffices to know
that a Hilbert space is a potentially infinite dimensional real vector space $H$ with a
symmetric bilinear form $\langle \cdot, \cdot \rangle : H \times H \to \mathbb{R}$ satisfying $\langle x, x \rangle > 0$ for all $x \in H \setminus \{0\}$. A
difference between the embeddings underlying kernels and most other vector embeddings
is that the kernel embeddings usually embed into higher dimensional spaces (even infinite
dimensional Hilbert spaces). Dimension does not play a big role for kernels, because the
embeddings are only used implicitly. Nevertheless, it can sometimes be more efficient to
use the embeddings underlying kernels explicitly [61].
Kernel methods have dominated machine learning on graphs for a long time and,
despite the recent successes of GNNs, they are still competitive. Quoting [62], “It
remains a current challenge in research to develop neural techniques for graphs that are
able to learn feature representations that are clearly superior to the fixed feature spaces
used by graph kernels.”
The first dedicated graph kernels were the random walk graph kernels [37, 56] based on
counting walks of all lengths. In some sense, they are similar to the random walk based
node-embedding algorithms described in Section 2.1. Other graph kernels are based on
counting shortest paths, trees and cycles, and small subgraph patterns [20, 52, 89, 95].
The important Weisfeiler-Leman kernels [94], which we will describe in Section 3, are
based on aggregating local neighbourhood information. All these kernels are defined for
graphs with discrete labels. The techniques, all essentially based on counting certain
subgraphs, are not directly applicable to graphs with continuous labels such as edge
weights. An adaptation to continuous labels based on hashing is proposed in [77].
Let me remark that there are also node kernels defined on the nodes of a graph
[60, 82, 96]. They implicitly give a vector embedding of the nodes. However, compared
to the node embedding techniques discussed before, they only play a minor role. For
graph embeddings, we are in the opposite situation: kernels are the dominant technique.
However, there are a few other approaches.
2.5 Graph Embeddings
Graph2Vec [80] is a transductive approach to embedding graphs inspired by word2vec
and some of the node embedding techniques discussed in Section 2.1. As before, “transductive”
means that the embedding is computed for a fixed set of graphs in the form of an
embedding matrix (or look-up table) and thus it only yields an embedding for graphs
known at training time. For typical machine learning applications it is unusual to operate
on a set of graphs that is fixed in advance, so inductive embedding approaches are
clearly preferable.
GNNs can also be used to embed entire graphs, in the simplest form by just aggregating
the embeddings of the nodes computed by the GNN. Graph autoencoders [59, 86] give
a way to train such graph embeddings in an unsupervised manner. A more advanced
GNN architecture for learning graph embeddings, based on capsule neural networks, is
proposed in [105].
3 The Weisfeiler-Leman Algorithm
We slightly digress from our main theme and introduce the Weisfeiler-Leman algorithm,
a very efficient combinatorial partitioning algorithm that was originally introduced as a
fingerprinting technique for chemical molecules [75]. The algorithm plays an important
role in the graph isomorphism literature, both in theory (for example, [7, 41]) and
practice, where it appears as a subroutine in all competitive graph isomorphism tools
(see [73]). As we will see, the algorithm has interesting connections with the embedding
techniques discussed in the previous section.
3.1 1-Dimensional Weisfeiler-Leman
The Weisfeiler-Leman algorithm has a parameter k, its dimension. We start by de-
scribing the 1-dimensional version 1-WL, which is also known as colour refinement or
naive vertex classification. The algorithm computes a partition of the nodes of its input
graph. It is convenient to think of the classes of the partition as colours of the nodes.
A colouring (or partition) is stable if any two nodes v, w of the same colour c have the
same number of neighbours of any colour d. The algorithm computes a stable colouring
by iteratively refining an initial colouring as described in Algorithm 1. Figure 3 shows
an example run of 1-WL. The algorithm is very efficient; it can be implemented to run
in time O((n + m) log n), where n is the number of vertices and m the number of edges
of the input graph [27]. Under reasonable assumptions on the algorithms used, this is
best possible [12].
1-WL
Input: Graph G
Initialisation: All nodes get the same colour.
Refinement Round: For all colours c in the current colouring and all nodes v, w of
colour c, the nodes v and w get different colours in the new colouring if there is
some colour d such that v and w have different numbers of neighbours of colour
d.
The refinement is repeated until the colouring is stable, then the stable colouring is
returned.
Algorithm 1: The 1-dimensional WL algorithm
To use 1-WL as an isomorphism test, we note that the colouring computed by the
algorithm is isomorphism invariant, which means that if we run the algorithm on two
isomorphic graphs the resulting coloured graphs will still be isomorphic and in particular
[Figure 3: A run of 1-WL. Panels: (a) initial graph, (b) colouring after round 1, (c) colouring after round 2, (d) stable colouring after round 3.]
have the same numbers of nodes of each colour. Thus, if we run the algorithm on two
graphs and find that they have distinct numbers of vertices of some colour, we have
produced a certificate of non-isomorphism. If this is the case, we say that 1-WL distin-
guishes the two graphs. Unfortunately, 1-WL does not distinguish all non-isomorphic
graphs. For example, it does not distinguish a cycle of length 6 from the disjoint union
of two triangles. But, remarkably, 1-WL does distinguish almost all graphs, in a precise
probabilistic sense [8].
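The following sketch implements the refinement of Algorithm 1 in a direct way and uses it as an isomorphism test. It is a simple quadratic-style implementation, not the O((n + m) log n) algorithm of [27]; the function names and the trick of running the refinement on the disjoint union of the two graphs (so that colours are comparable) are choices made for this illustration.

```python
from collections import Counter

def colour_refinement(adj):
    """adj maps each node to its neighbours. Returns node -> colour id (stable colouring)."""
    colour = {v: 0 for v in adj}  # initialisation: all nodes get the same colour
    while True:
        # New colour = old colour plus the multiset of neighbour colours.
        sig = {v: (colour[v], tuple(sorted(colour[w] for w in adj[v]))) for v in adj}
        ids = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        new = {v: ids[sig[v]] for v in adj}
        # The new colouring refines the old one, so an equal class count means stable.
        if len(set(new.values())) == len(set(colour.values())):
            return new
        colour = new

def wl_distinguishes(adj_g, adj_h):
    """True if the two graphs have different numbers of nodes of some colour."""
    union = {("g", v): [("g", w) for w in ws] for v, ws in adj_g.items()}
    union.update({("h", v): [("h", w) for w in ws] for v, ws in adj_h.items()})
    colour = colour_refinement(union)
    hist_g = Counter(c for (side, _), c in colour.items() if side == "g")
    hist_h = Counter(c for (side, _), c in colour.items() if side == "h")
    return hist_g != hist_h

cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
path3 = {0: [1], 1: [0, 2], 2: [1]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

same = wl_distinguishes(cycle6, two_triangles)  # both graphs are 2-regular
diff = wl_distinguishes(path3, triangle)        # degrees already differ
```

The 6-cycle and the two triangles stay monochromatic forever because every node has exactly two neighbours, which is why the test cannot separate them.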
3.2 Variants of 1-WL
The version of 1-WL we have formulated is designed for undirected graphs. For directed
graphs it is better to consider in-neighbours and out-neighbours of nodes separately.
1-WL can easily be adapted to labelled graphs. If vertex labels are present, they can be
incorporated in the initial colouring: two vertices get the same initial colour if and only
if they have the same label(s). We can incorporate edge labels in the refinement rounds:
two nodes v and w get different colours in the new colouring if there is some colour d
and some edge label λ such that v and w have a different number of λ-neighbours of
colour d.
However, if the edge labels are real numbers, which we interpret as edge weights, or
more generally elements of an arbitrary commutative monoid, then we can also use the
following weighted version of 1-WL due to [44]. Instead of refining by the number of
edges into some colour, we refine by the sum of the edge weights into that colour. Thus
the refinement round of Algorithm 1 is modified as follows: for all colours c in the current
colouring and all nodes v, w of colour c, v and w get different colours in the new colouring
if there is some colour d such that
\[ \sum_{x \text{ of colour } d} \alpha(v, x) \;\neq\; \sum_{x \text{ of colour } d} \alpha(w, x), \tag{3.1} \]
where α(x, y) denotes the weight of the edge from x to y, and we set α(x, y) = 0
Can we use this property for almost isomorphic graphs?
[Figure 4: Stable colouring of a matrix (rows $v_1, v_2, v_3$, columns $w_1, \ldots, w_5$) and the corresponding weighted bipartite graph computed by matrix WL]
if there is no edge from $x$ to $y$. This idea also allows us to define 1-WL on matrices:
with a matrix $A \in \mathbb{R}^{m \times n}$ we associate a weighted bipartite graph with vertex set
$\{v_1, \ldots, v_m, w_1, \ldots, w_n\}$ and edge weights $\alpha(v_i, w_j) := A_{ij}$ and $\alpha(v_i, v_{i'}) = \alpha(w_j, w_{j'}) = 0$,
and run weighted 1-WL on this weighted graph with an initial colouring that distinguishes
the $v_i$ (rows) from the $w_j$ (columns). An example is shown in Figure 4. This matrix
version of WL was applied in [44] to design a dimension reduction technique that speeds
up the solving of linear programs with many symmetries (or regularities).
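A minimal sketch of the matrix version: rows and columns are coloured separately (which realises the initial colouring that distinguishes rows from columns), and each round refines by the sum of the entries going into each colour class, as in (3.1). The function name `matrix_wl` and the example matrix are assumptions for illustration; exact arithmetic with `Fraction` is used so that equal sums compare as equal.

```python
from fractions import Fraction

def matrix_wl(A):
    """Weighted 1-WL on a matrix. Returns (row colours, column colours)."""
    m, n = len(A), len(A[0])
    rows, cols = [0] * m, [0] * n  # separate colour ids for rows and for columns

    def refine(own, weight, other):
        sigs = []
        for i, c in enumerate(own):
            sums = {}  # colour d on the other side -> sum of weights into d
            for j, d in enumerate(other):
                sums[d] = sums.get(d, 0) + Fraction(weight(i, j))
            sigs.append((c, tuple(sorted(sums.items()))))
        ids = {s: k for k, s in enumerate(sorted(set(sigs)))}
        return [ids[s] for s in sigs]

    while True:
        new_rows = refine(rows, lambda i, j: A[i][j], cols)
        new_cols = refine(cols, lambda j, i: A[i][j], rows)
        # Refinement only splits classes, so unchanged class counts mean stability.
        if len(set(new_rows)) == len(set(rows)) and len(set(new_cols)) == len(set(cols)):
            return new_rows, new_cols
        rows, cols = new_rows, new_cols

A = [[1, 2],
     [2, 1],
     [3, 3]]
row_colours, col_colours = matrix_wl(A)
```

In this example the first two rows end up in one class (both have row sum 3) and the third row in another, while the two columns remain indistinguishable.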
3.3 Higher-Dimensional WL
For this paper, the 1-dimensional version of the Weisfeiler-Leman algorithm is the most
relevant, but let us briefly describe the higher dimensional versions. In fact, it is the
2-dimensional version, also referred to as classical WL, that was introduced by Weis-
feiler and Leman [103] in 1968 and gave the algorithm its name. The k-dimensional
Weisfeiler-Leman algorithm (k-WL) is based on the same iterative-refinement idea as 1-
WL. However, instead of vertices, k-WL colours k-tuples of vertices of a graph. Initially,
each k-tuple is “coloured” by the isomorphism type of the subgraph it induces. Then in
the refinement rounds, the colour information is propagated between “adjacent” tuples
that only differ in one coordinate (details can be found in [24]). If implemented using
similar ideas as for 1-WL, k-WL runs in time $O(n^{k+1} \log n)$ [53].
Higher-dimensional WL is much more powerful than 1-WL, but Cai, Fürer, and Immerman
[24] proved that for every $k$ there are non-isomorphic graphs $G_k, H_k$ that are
not distinguished by k-WL. These graphs, known as the CFI graphs, have size $O(k)$ and
are 3-regular.
DeepWL, a version of WL of unlimited dimension that can distinguish the CFI graphs
in polynomial time, was recently introduced in [47].
3.4 Logical and Algebraic Characterisations
The beauty of the WL algorithm lies in the fact that its expressiveness has several natural
and completely unrelated characterisations. Of these, we will see two in this section.
Later, we will see two more characterisations in terms of GNNs and homomorphism
numbers.
The logic $\mathsf{C}$ is the extension of first-order logic by counting quantifiers of the form
$\exists^{\ge p} x$ (“there exist at least $p$ elements $x$”). Every $\mathsf{C}$-formula is equivalent to a formula
of plain first-order logic. However, here we are interested in fragments of $\mathsf{C}$ obtained
by restricting the number of variables of formulas, and the translation from $\mathsf{C}$ to first-order
logic may increase the number of variables. For every $k \ge 1$, by $\mathsf{C}^k$ we denote the
fragment of $\mathsf{C}$ consisting of all formulas with at most $k$ (free or bound) variables. The
finite variable logics $\mathsf{C}^k$ play an important role in finite model theory (see, for example,
[40]). Cai, Fürer, and Immerman [24] have related these fragments to the WL algorithm.
Theorem 3.1 ([24]). Two graphs are $\mathsf{C}^{k+1}$-equivalent, that is, they satisfy the same
sentences of the logic $\mathsf{C}^{k+1}$, if and only if k-WL does not distinguish the graphs.
Let us now turn to an algebraic characterisation of WL. Our starting point is the
observation that two graphs $G, H$ with vertex sets $V, W$ and adjacency matrices $A \in \mathbb{R}^{V \times V}$,
$B \in \mathbb{R}^{W \times W}$ are isomorphic if and only if there is a permutation matrix $X \in \mathbb{R}^{V \times W}$
such that $X^\top A X = B$. Recall that a permutation matrix is a {0, 1}-matrix that has
exactly one 1-entry in each row and in each column. Since permutation matrices are
orthogonal (i.e., they satisfy $X^\top = X^{-1}$), we can rewrite this as $AX = XB$, which has
the advantage of being linear. This corresponds to the following linear equations in the
variables $X_{vw}$, for $v \in V$ and $w \in W$:
\[ \sum_{v' \in V} A_{vv'} X_{v'w} = \sum_{w' \in W} X_{vw'} B_{w'w} \quad \text{for all } v \in V, w \in W. \tag{3.2} \]
We can add equations expressing that the row and column sums of the matrix $X$ are 1,
which implies that $X$ is a permutation matrix if the $X_{vw}$ are nonnegative integers:
\[ \sum_{w' \in W} X_{vw'} = \sum_{v' \in V} X_{v'w} = 1 \quad \text{for all } v \in V, w \in W. \tag{3.3} \]
Obviously, equations (3.2) and (3.3) have a nonnegative integer solution if and only if
the graphs G and H are isomorphic. This does not help much from an algorithmic point
of view, because it is NP-hard to decide if a system of linear equations and inequalities
has an integer solution. But what about nonnegative rational solutions? We know that
we can compute them in polynomial time. A nonnegative rational solution to (3.2) and
(3.3), which can also be seen as a doubly stochastic matrix satisfying AX = XB, is
called a fractional isomorphism between G and H. If such a fractional isomorphism
exists, we say that G and H are fractionally isomorphic. Tinhofer [99] proved the
following theorem.
Theorem 3.2 ([99]). Graphs G and H are fractionally isomorphic if and only if 1-WL
does not distinguish G and H.
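As a small sanity check of this correspondence, the sketch below verifies that the uniform matrix with all entries 1/6 is a fractional isomorphism between the 6-cycle and the disjoint union of two triangles, the pair that 1-WL fails to distinguish. It only checks a given candidate X against (3.2) and (3.3); it does not search for a solution, and the helper names are hypothetical.

```python
from fractions import Fraction

def adjacency(n, edges):
    """Adjacency matrix of an undirected graph on nodes 0..n-1."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    return A

def matmul(P, Q):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def is_fractional_isomorphism(A, B, X):
    """Check that X is doubly stochastic (3.3) and satisfies AX = XB (3.2)."""
    doubly_stochastic = (
        all(x >= 0 for row in X for x in row)
        and all(sum(row) == 1 for row in X)
        and all(sum(col) == 1 for col in zip(*X))
    )
    return doubly_stochastic and matmul(A, X) == matmul(X, B)

cycle6 = adjacency(6, [(i, (i + 1) % 6) for i in range(6)])
two_triangles = adjacency(6, [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)])

uniform = [[Fraction(1, 6)] * 6 for _ in range(6)]
identity = [[Fraction(int(i == j)) for j in range(6)] for i in range(6)]

uniform_works = is_fractional_isomorphism(cycle6, two_triangles, uniform)
identity_works = is_fractional_isomorphism(cycle6, two_triangles, identity)
```

The uniform matrix works for any two regular graphs with the same degree and the same number of nodes, since both sides of AX = XB then have all entries equal to degree divided by n.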
[Figure 5: Viewing colours of WL as trees. The figure shows a graph $G$ and colours from $C_0$, $C_1$, $C_2$.]
A corresponding theorem also holds for the weighted and the matrix version of 1-WL
[44]. Moreover, Atserias and Maneva [5] proved a generalisation that relates k-WL
to the level-k Sherali-Adams relaxation of the system of equations and thus yields an
algebraic characterisation of k-WL indistinguishability (also see [45, 71] and [6, 84, 13, 39]
for related algebraic aspects of WL).
Note that to decide whether two graphs $G, H$ with adjacency matrices $A, B$ are fractionally
isomorphic, we can minimise the convex function $\|AX - XB\|_F$, where $X$ ranges
over the convex set of doubly stochastic matrices. To minimise this function, we can
use standard gradient descent techniques for convex minimisation. It was shown in [57]
that, surprisingly, the refinement rounds of 1-WL closely correspond to the iterations of
the Frank-Wolfe convex minimisation algorithm.
3.5 Weisfeiler-Leman Graph Kernels
The WL algorithm collects local structure information and propagates it along the edges
of a graph. We can define very effective graph kernels based on this local information.
For every $i \ge 0$, let $C_i$ be the set of colours that 1-WL assigns to the vertices of a graph
in the i-th round. Figure 5 illustrates that we can identify the colours in $C_i$ with rooted
trees of height $i$. For every graph $G$ and every colour $c \in C_i$, by $\mathrm{wl}(c, G)$ we denote the
number of vertices that receive colour $c$ in the i-th round of 1-WL.
Example 3.3. For the graph $G$ shown in Figure 5 we have
$\mathrm{wl}([\text{tree drawing}], G) = 2$ and $\mathrm{wl}([\text{tree drawing}], G) = 0$ for two different colours given as drawings of rooted trees.
For every $t \in \mathbb{N}$, the t-round WL-kernel is the mapping $K^{(t)}_{\mathrm{WL}}$ defined by
\[ K^{(t)}_{\mathrm{WL}}(G, H) := \sum_{i=0}^{t} \sum_{c \in C_i} \mathrm{wl}(c, G) \cdot \mathrm{wl}(c, H) \]
for all graphs G, H. It is easy to see that this mapping is symmetric and positive-
semidefinite and thus indeed a kernel mapping; the corresponding vector embedding
maps each graph G to the vector
\[ \Bigl( \mathrm{wl}(c, G) \Bigm| c \in \bigcup_{i=0}^{t} C_i \Bigr). \]
Note that formally, we are mapping $G$ to an infinite dimensional vector space, because
all the sets $C_i$ for $i \ge 1$ are infinite. However, for a graph $G$ of order $n$ the vector has at
most $tn + 1$ nonzero entries (one for round 0 and at most $n$ for each later round). We can
also define a version $K_{\mathrm{WL}}$ of the WL-kernel that
does not depend on a fixed number of rounds by letting
\[ K_{\mathrm{WL}}(G, H) := \sum_{i \ge 0} \frac{1}{2^i} \sum_{c \in C_i} \mathrm{wl}(c, G) \cdot \mathrm{wl}(c, H). \]
The WL-kernel was introduced by Shervashidze et al. [94] under the name Weisfeiler-
Leman subtree kernel. They also introduce variants such as a Weisfeiler-Leman shortest
path kernel. A great advantage the WL (subtree) kernel has over most of the graph
kernels discussed in Section 2.4 is its efficiency, while it performs at least as well as
other kernels on downstream tasks. Shervashidze et al. [94] report that in practice, t = 5
is a good number of rounds for the t-round WL-kernel.
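A compact sketch of the t-round WL-kernel follows. To make colours comparable across graphs, each colour is represented directly as a nested tuple (old colour, sorted neighbour colours), which mirrors the identification of colours with rooted trees. This representation grows quickly with t and is meant for illustration only; practical implementations compress colours, for example by hashing. The function names are hypothetical.

```python
from collections import Counter

def wl_features(adj, t):
    """Counter mapping (round i, colour c) to wl(c, G) for i = 0..t."""
    colour = {v: () for v in adj}  # round 0: a single colour for all vertices
    feats = Counter((0, c) for c in colour.values())
    for i in range(1, t + 1):
        colour = {v: (colour[v], tuple(sorted(colour[w] for w in adj[v]))) for v in adj}
        feats.update((i, c) for c in colour.values())
    return feats

def wl_kernel(adj_g, adj_h, t):
    """Inner product of the two WL feature vectors."""
    fg, fh = wl_features(adj_g, t), wl_features(adj_h, t)
    return sum(count * fh[key] for key, count in fg.items())

cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
path3 = {0: [1], 1: [0, 2], 2: [1]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

k_regular = wl_kernel(cycle6, two_triangles, 2)  # 3 rounds, each contributing 6 * 6
k_small = wl_kernel(path3, triangle, 1)          # 3 * 3 in round 0, 1 * 3 in round 1
```

In the second example only the middle vertex of the path shares its round-1 colour with the triangle vertices, so round 1 contributes 1 · 3 to the kernel value.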
There are also graph kernels based on the higher-dimensional WL algorithm [76].
3.6 Weisfeiler-Leman and GNNs
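Recall that a GNN computes a sequence $(x_v^{(t)})_{v \in V}$, for $t \ge 0$, of vector embeddings of a
graph $G = (V, E)$. In the most general form, it is recursively defined by
\[ x_v^{(t+1)} = f_{\mathrm{UP}}\Bigl( x_v^{(t)},\; f_{\mathrm{AGG}}\bigl( \{\!\{ x_w^{(t)} \mid w \in N(v) \}\!\} \bigr) \Bigr), \]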
where the aggregation function $f_{\mathrm{AGG}}$ is symmetric in its arguments. It has been observed
in several places [49, 78, 106] that this is very similar to the update process of 1-WL.
Indeed, it is easy to see that if the initial embedding $x_v^{(0)}$ is constant then for any two
vertices $v, w$, if 1-WL assigns the same colour to $v$ and $w$ then $x_v^{(t)} = x_w^{(t)}$. This implies
that two graphs that cannot be distinguished by 1-WL will give the same result for any
GNN applied to them; that is, GNNs are at most as expressive as 1-WL. It is shown in
[78] that a converse of this holds as well, even if the aggregation and update functions
of the GNN are of a very simple form (like (2.1) and (2.2) in Section 2.2). Based on
the connection between WL and logic, a more refined analysis of the expressiveness of
GNNs was carried out in [10]. However, the limitations of the expressiveness only hold
if the initial embedding $x_v^{(0)}$ is constant (or at least constant on all 1-WL colour classes).
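The upper bound on expressiveness can be observed on a toy example. The sketch below uses a one-dimensional GNN with sum aggregation and a ReLU update; the integer weights are arbitrary values picked for illustration (they are not the functions (2.1) and (2.2) of Section 2.2), and integers keep the arithmetic exact.

```python
def gnn_embed(adj, rounds, w_self=1, w_nbr=2, bias=-1):
    """Scalar GNN: x_v <- relu(w_self * x_v + w_nbr * sum of neighbour values + bias)."""
    x = {v: 1 for v in adj}  # constant initial embedding
    for _ in range(rounds):
        x = {v: max(0, w_self * x[v] + w_nbr * sum(x[w] for w in adj[v]) + bias)
             for v in adj}
    return x

cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
path3 = {0: [1], 1: [0, 2], 2: [1]}

out_cycle = sorted(gnn_embed(cycle6, 3).values())
out_triangles = sorted(gnn_embed(two_triangles, 3).values())
out_path = gnn_embed(path3, 2)
```

The 6-cycle and the two triangles, which 1-WL cannot distinguish, produce the same multiset of node embeddings, and the two end vertices of the path, which share a 1-WL colour, receive the same value.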
We can increase the expressiveness of GNNs by assigning random initial vectors $x_v^{(0)}$
to the vertices. The price we pay for this increased expressiveness is that the output
of a run of the GNN model is no longer isomorphism invariant. However, the whole
randomised process is still isomorphism invariant. More formally, the random variable
that associates an output $(x_v^{(t)})_{v \in V}$ with each graph $G$ is isomorphism invariant.
A fully invariant way to increase the expressiveness of GNNs is to build “higher-
dimensional” GNNs, inspired by the higher-dimensional WL algorithm. Instead of nodes
of the graphs, they operate on constant sized tuples or sets of vertices. A flexible
architecture for such higher-dimensional GNNs is proposed in [78].
4 Counting Homomorphisms
Most of the graph kernels and also some of the node embedding techniques are based
on counting occurrences of substructures like walks, cycles, or trees. There are differ-
ent ways of embedding substructures into a graph. For example, walks and paths are
the same structures, but we allow repeated vertices in a walk. Formally, “walks” are
homomorphic images of path graphs, whereas “paths” are embedded path graphs. It
turns out that homomorphisms and homomorphic images give us a very robust and flexible
“basis” for counting all kinds of substructures [30].
A homomorphism from a graph $F$ to a graph $G$ is a mapping $h$ from the nodes of
$F$ to the nodes of $G$ such that for all edges $uu'$ of $F$ the image $h(u)h(u')$ is an edge of
$G$. On labelled graphs, homomorphisms have to preserve vertex and edge labels, and on
directed graphs they have to preserve the edge direction. Of course we can generalise
homomorphisms to arbitrary relational structures, and we remind the reader of the close
connection between homomorphisms and conjunctive queries. We denote the number of
homomorphisms from F to G by hom(F, G).
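Example 4.1. For the graph $G$ shown in Figure 5 we have
$\hom([\text{graph drawing}], G) = 18$ and $\hom([\text{graph drawing}], G) = 114$ for two pattern graphs given as drawings.
To calculate these numbers, we observe that for the star $S_k$ (tree of height 1 with $k$
leaves) we have $\hom(S_k, G) = \sum_{v \in V(G)} \deg_G(v)^k$.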
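The star formula can be checked against a brute-force count. The sketch below enumerates all mappings from the pattern to the target graph, which is exponential in the size of the pattern and only suitable for tiny examples; the target graph is a path on three vertices chosen for illustration, not the graph of Figure 5.

```python
from itertools import product

def hom(f_nodes, f_edges, adj):
    """Number of homomorphisms from the pattern (f_nodes, f_edges) to the graph adj."""
    targets = list(adj)
    count = 0
    for image in product(targets, repeat=len(f_nodes)):
        h = dict(zip(f_nodes, image))
        # h is a homomorphism if every pattern edge is mapped to an edge of the target.
        if all(h[v] in adj[h[u]] for u, v in f_edges):
            count += 1
    return count

def star(k):
    """Star S_k: centre 0 joined to leaves 1..k."""
    return list(range(k + 1)), [(0, i) for i in range(1, k + 1)]

path3 = {0: {1}, 1: {0, 2}, 2: {1}}  # degrees 1, 2, 1

k = 3
star_nodes, star_edges = star(k)
brute_force = hom(star_nodes, star_edges, path3)
by_formula = sum(len(path3[v]) ** k for v in path3)  # 1 + 8 + 1
```

The formula holds because a homomorphism from the star is determined by the image v of the centre together with an independent choice of one of the deg(v) neighbours for each of the k leaves.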
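For every class $\mathcal{F}$ of graphs, the homomorphism counts $\hom(F, G)$ give a graph embedding
$\mathrm{Hom}_{\mathcal{F}}$ defined by
\[ \mathrm{Hom}_{\mathcal{F}}(G) := \bigl( \hom(F, G) \bigm| F \in \mathcal{F} \bigr) \]
for all graphs $G$. If $\mathcal{F}$ is infinite, the latent space $\mathbb{R}^{\mathcal{F}}$ of the embedding $\mathrm{Hom}_{\mathcal{F}}$ is an infinite
dimensional vector space. By suitably scaling the infinite series involved, we can define
an inner product on a subspace $H_{\mathcal{F}}$ of $\mathbb{R}^{\mathcal{F}}$ that includes the range of $\mathrm{Hom}_{\mathcal{F}}$. This also
gives us a graph kernel. One way of making this precise is as follows. For every $k$, we
let $\mathcal{F}_k$ be the set of all $F \in \mathcal{F}$ of order $|F| := |V(F)| = k$. Then we let
\[ K_{\mathcal{F}}(G, H) := \sum_{k=1}^{\infty} \frac{1}{|\mathcal{F}_k|} \sum_{F \in \mathcal{F}_k} \frac{1}{k^k} \hom(F, G) \cdot \hom(F, H). \tag{4.1} \]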
There are various other ways of doing this, for example, rather than looking at the
sum over all $F \in \mathcal{F}_k$ we may look at the maximum. In practice, one will simply cut
off the infinite series and only consider a finite subset of $\mathcal{F}$. A problem with using
homomorphism vectors as graph embeddings is that the homomorphism numbers quickly
get tremendously large. In practice, we take logarithms of these numbers, possibly
scaled by the size of the graphs from $\mathcal{F}$. So, a practically reasonable graph embedding
based on homomorphism vectors would take a finite class $\mathcal{F}$ of graphs and map each $G$
to the vector
\[ \Bigl( \frac{1}{|F|} \log \hom(F, G) \Bigm| F \in \mathcal{F} \Bigr). \]
Initial experiments show that this graph embedding performs very well on downstream
classification tasks even if we take F to be a small class (of size 20) of graphs consisting
of binary trees and cycles. This is a good indication that homomorphism vectors extract
relevant features from a graph. Note that the size of the class F is the dimension of the
feature space.
Apart from these practical considerations, homomorphism vectors have a beautiful
theory that links them to various natural notions of similarity between structures, including
indistinguishability by the Weisfeiler-Leman algorithm.
4.1 Homomorphism Indistinguishability
Two graphs $G$ and $H$ are homomorphism-indistinguishable over a class $\mathcal{F}$ of graphs if
$\mathrm{Hom}_{\mathcal{F}}(G) = \mathrm{Hom}_{\mathcal{F}}(H)$. Lovász proved that homomorphism indistinguishability over the
class $\mathcal{G}$ of all graphs corresponds to isomorphism.
Theorem 4.2 ([65]). For all graphs $G$ and $H$,
\[ \mathrm{Hom}_{\mathcal{G}}(G) = \mathrm{Hom}_{\mathcal{G}}(H) \iff G \text{ and } H \text{ are isomorphic.} \]
Proof. The backward direction is trivial. For the forward direction, suppose that
$\mathrm{Hom}_{\mathcal{G}}(G) = \mathrm{Hom}_{\mathcal{G}}(H)$, that is, for all graphs $F$ it holds that $\hom(F, G) = \hom(F, H)$.
We can decompose every homomorphism $h : F \to F'$ as $h = f \circ g$ such that for some
graph $F''$:
• $g : F \to F''$ is an epimorphism, that is, a homomorphism such that for every
$v \in V(F'')$ there is a $u \in V(F)$ with $g(u) = v$ and for every $vv' \in E(F'')$ there is
a $uu' \in E(F)$ with $g(u) = v$ and $g(u') = v'$;
• $f : F'' \to F'$ is an embedding (or monomorphism), that is, a homomorphism such
that $f(u) \neq f(u')$ for all $u \neq u'$.
Note that the graph $F''$ is isomorphic to the image $h(F)$ and thus unique up to isomorphism.
Moreover, there are precisely
\[ \mathrm{aut}(F'') = \text{number of automorphisms of } F'' \]
isomorphisms from $F''$ to $h(F)$. This means that there are $\mathrm{aut}(F'')$ pairs $(f, g)$ such that
$h = f \circ g$ and $g : F \to F''$ is an epimorphism and $f : F'' \to F'$ is an embedding. Thus
we can write
\[ \hom(F, F') = \sum_{F''} \frac{1}{\mathrm{aut}(F'')} \cdot \mathrm{epi}(F, F'') \cdot \mathrm{emb}(F'', F'), \tag{4.2} \]
where $\mathrm{epi}(F, F'')$ is the number of epimorphisms from $F$ onto $F''$, $\mathrm{emb}(F'', F')$ is the
number of embeddings of $F''$ into $F'$, and the sum ranges over all isomorphism types of
graphs $F''$. Furthermore, as $\mathrm{epi}(F, F'') = 0$ if $|F''| > |F|$, we can restrict the sum to $F''$
of order $|F''| \le |F|$.
[Figure 6: Co-spectral graphs $G$ and $H$]
Let $F_1, \ldots, F_m$ be an enumeration of all graphs of order at most $n := \max\{|G|, |H|\}$
such that each graph of order at most $n$ is isomorphic to exactly one graph in this list and
that $i \le j$ implies $|F_i| < |F_j|$, or $|F_i| = |F_j|$ and $\|F_i\| := |E(F_i)| \le \|F_j\|$. Let HOM be the
matrix with entries $\mathrm{HOM}_{ij} := \hom(F_i, F_j)$, $P$ the matrix with entries $P_{ij} := \mathrm{epi}(F_i, F_j)$,
$M$ the matrix with entries $M_{ij} := \mathrm{emb}(F_i, F_j)$, and $D$ the diagonal matrix with entries
$D_{ii} := 1/\mathrm{aut}(F_i)$. Then (4.2) yields the following matrix equation:
\[ \mathrm{HOM} = P \cdot D \cdot M. \tag{4.3} \]
The crucial observation is that $P$ is a lower triangular matrix, because $\mathrm{epi}(F_i, F_j) > 0$
implies $|F_i| \ge |F_j|$ and $\|F_i\| \ge \|F_j\|$. Moreover, $P$ has positive diagonal entries, because
$\mathrm{epi}(F_i, F_i) \ge 1$. Similarly, $M$ is an upper triangular matrix with positive diagonal entries.
Thus $P$ and $M$ are invertible. The diagonal matrix $D$ is invertible as well, because
$\mathrm{aut}(F_i) \ge 1$ and hence $D_{ii} = 1/\mathrm{aut}(F_i) > 0$. Thus the matrix HOM is invertible.
As $|G|, |H| \le n$ there are graphs $F_i, F_j$ such that $G$ is isomorphic to $F_i$ and $H$ is
isomorphic to $F_j$. Since $\hom(F_k, G) = \hom(F_k, H)$ for all $k$ by assumption,
the ith and jth columns of HOM are identical. As HOM is invertible, this implies
that $i = j$ and thus that $G$ and $H$ are isomorphic.
The theorem can be seen as the starting point for the theory of graph limits [19, 67,
68]. The graph embedding $\mathrm{Hom}_{\mathcal{G}}$ maps graphs into an infinite dimensional real vector
space, which can be turned into a Hilbert space by defining a suitable inner product.
This transformation enables us to analyse graphs with methods of linear algebra and
functional analysis and, for example, to consider convergent sequences of graphs and
their limits, called graphons (see [67]).
However, not only the “full” homomorphism vector $\mathrm{Hom}_{\mathcal{G}}(G)$ of a graph $G$, but also
its projections $\mathrm{Hom}_{\mathcal{F}}(G)$ to natural classes $\mathcal{F}$ capture very interesting information about
$G$. A first result worth mentioning is the following. This result is well-known, though
usually phrased differently. Two graphs are co-spectral if their adjacency matrices have
the same eigenvalues with the same multiplicities. Figure 6 shows two graphs that are
co-spectral, but not isomorphic.
Theorem 4.3 (Folklore). For all graphs $G$ and $H$,
\[ \mathrm{Hom}_{\mathcal{C}}(G) = \mathrm{Hom}_{\mathcal{C}}(H) \iff G \text{ and } H \text{ are co-spectral.} \]
Here $\mathcal{C}$ denotes the class of all cycles.
Proof sketch. Observe that for the cycle $C_k$ of length $k$ we have $\hom(C_k, G) = \mathrm{trace}(A^k)$,
where $A$ is the adjacency matrix of $G$. It is well-known that the trace of a symmetric real
matrix is the sum of its eigenvalues and that the eigenvalues of $A^k$ are the kth powers
of the eigenvalues of $A$. This immediately implies the backward direction. By a simple
linear-algebraic argument, it also implies the forward direction.
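The backward direction can be observed on a standard co-spectral pair, which is an assumption of this sketch and not necessarily the pair of Figure 6: the star with four leaves and the 4-cycle plus an isolated vertex both have eigenvalues 2, 0, 0, 0, −2. The code computes trace(A^k) by plain integer matrix multiplication.

```python
def adjacency(n, edges):
    """Adjacency matrix of an undirected graph on nodes 0..n-1."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    return A

def matmul(P, Q):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def trace_of_power(A, k):
    """trace(A^k), i.e. the number of closed walks of length k."""
    M = A
    for _ in range(k - 1):
        M = matmul(M, A)
    return sum(M[i][i] for i in range(len(A)))

star4 = adjacency(5, [(0, 1), (0, 2), (0, 3), (0, 4)])
cycle4_plus_point = adjacency(5, [(0, 1), (1, 2), (2, 3), (3, 0)])  # node 4 is isolated

traces_star = [trace_of_power(star4, k) for k in range(3, 7)]
traces_cycle = [trace_of_power(cycle4_plus_point, k) for k in range(3, 7)]
```

Both lists agree with 2^k + (−2)^k for k = 3, ..., 6, although the two graphs are clearly not isomorphic (one of them is disconnected).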
Dvořák [33] proved that homomorphism counts of trees and, more generally, of graphs
of bounded tree width link homomorphism vectors to the Weisfeiler-Leman algorithm.
Theorem 4.4 ([33]). For all graphs $G$ and $H$ and all $k \ge 1$,
\[ \mathrm{Hom}_{\mathcal{T}_k}(G) = \mathrm{Hom}_{\mathcal{T}_k}(H) \iff k\text{-WL does not distinguish } G \text{ and } H. \]
Here $\mathcal{T}_k$ denotes the class of all graphs of tree width at most $k$.
To prove the theorem, it suffices to consider connected graphs in $\mathcal{T}_k$, because for a
graph $F$ with connected components $F_1, \ldots, F_m$ we have $\hom(F, G) = \prod_{i=1}^{m} \hom(F_i, G)$.
The connected graphs of tree width 1 are the trees. We sketch a proof of the theorem
for trees in Section 4.4.
In combination with Theorem 3.2, Theorem 4.4 implies the following.
Corollary 4.5. For all graphs $G$ and $H$,
\[ \mathrm{Hom}_{\mathcal{T}}(G) = \mathrm{Hom}_{\mathcal{T}}(H) \iff G \text{ and } H \text{ are fractionally isomorphic,} \]
that is, equations (3.2) and (3.3) have a nonnegative rational solution.
Here $\mathcal{T}$ denotes the class of all trees.
Remarkably, for paths we obtain a similar characterisation that involves the same
equations, but drops the nonnegativity constraint.
Theorem 4.6 ([32]). For all graphs $G$ and $H$,
\[ \mathrm{Hom}_{\mathcal{P}}(G) = \mathrm{Hom}_{\mathcal{P}}(H) \iff \text{equations (3.2) and (3.3) have a rational solution.} \]
Here $\mathcal{P}$ denotes the class of all paths.
While the proof of Theorem 4.4 relies on techniques similar to the proof of Theorem 4.2,
the proof of Theorem 4.6 is based on spectral techniques similar to the proof of
Theorem 4.3.
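Example 4.7. For the co-spectral graphs $G, H$ shown in Figure 6 we have
$\hom([\text{path drawing}], G) = 20$ and $\hom([\text{path drawing}], H) = 16$ for a path given as a drawing.
Thus $\mathrm{Hom}_{\mathcal{P}}(G) \neq \mathrm{Hom}_{\mathcal{P}}(H)$.
[Figure 7: Two graphs $G$ and $H$ that are homomorphism-indistinguishable over the class of paths]
Example 4.8. Figure 7 shows graphs $G, H$ with
\[ \mathrm{Hom}_{\mathcal{P}}(G) = \mathrm{Hom}_{\mathcal{P}}(H). \]
Obviously, 1-WL distinguishes the two graphs. Thus $\mathrm{Hom}_{\mathcal{T}}(G) \neq \mathrm{Hom}_{\mathcal{T}}(H)$. It can also
be checked that the graphs are not co-spectral. Hence $\mathrm{Hom}_{\mathcal{C}}(G) \neq \mathrm{Hom}_{\mathcal{C}}(H)$.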
Combined with Theorem 3.1, Theorem 4.4 implies the following correspondence between
homomorphism counts of graphs of bounded tree width and the finite variable
fragments of the counting logic $\mathsf{C}$ introduced in Section 3.4.
Corollary 4.9. For all graphs $G$ and $H$ and all $k \ge 1$,
\[ \mathrm{Hom}_{\mathcal{T}_k}(G) = \mathrm{Hom}_{\mathcal{T}_k}(H) \iff G \text{ and } H \text{ are } \mathsf{C}^{k+1}\text{-equivalent.} \]
This is interesting, because it shows that the homomorphism vector $\mathrm{Hom}_{\mathcal{T}_k}(G)$ gives
us all the information necessary to answer queries expressed in the logic $\mathsf{C}^{k+1}$. Unfortunately,
the result does not tell us how to answer $\mathsf{C}^{k+1}$-queries algorithmically if we have
access to the vector $\mathrm{Hom}_{\mathcal{T}_k}(G)$. To make the question precise, suppose we have oracle
access that allows us to obtain, for every graph $F \in \mathcal{T}_k$, the entry $\hom(F, G)$ of the
homomorphism vector. Is it possible to answer a $\mathsf{C}^{k+1}$-query in polynomial time (either
with respect to data complexity or combined complexity)?
Arguably, from a logical perspective it is even more natural to restrict the quantifier
rank (maximum number of nested quantifiers) in a formula rather than the number of
variables. Let $\mathsf{C}_k$ be the fragment of $\mathsf{C}$ consisting of all formulas of quantifier rank at most
$k$. We obtain the following characterisation of $\mathsf{C}_k$-equivalence in terms of homomorphism
vectors over the class of graphs of tree depth at most $k$. Tree depth, introduced by Nešetřil
and Ossona de Mendez [81], is another structural graph parameter that has received a
lot of attention in recent years (e.g. [9, 23, 28, 35, 34]).
Theorem 4.10 ([42]). For all graphs $G$ and $H$ and all $k \ge 1$,
\[ \mathrm{Hom}_{\mathcal{TD}_k}(G) = \mathrm{Hom}_{\mathcal{TD}_k}(H) \iff G \text{ and } H \text{ are } \mathsf{C}_k\text{-equivalent.} \]
Here $\mathcal{TD}_k$ denotes the class of all graphs of tree depth at most $k$.
Another very remarkable result, due to Mančinska and Roberson [72], states that two
graphs are homomorphism-indistinguishable over the class of all planar graphs if and
only if they are quantum isomorphic. Quantum isomorphism, introduced in [4], is a
complicated notion that is based on similar systems of equations as (3.2) and (3.3) and
their generalisation characterising higher-dimensional Weisfeiler-Leman indistinguisha-
bility, but with non-commutative variables ranging over the elements of a C∗-algebra.
4.2 Beyond Undirected Graphs
So far, we have only considered homomorphism indistinguishability on undirected graphs.
A few results are known for directed graphs. In particular, Theorem 4.2 directly extends
to directed graphs. Actually, we have the following stronger result, also due to Lovász
[66] (also see [14]).
Theorem 4.11 ([66]). For all directed graphs G and H,
$$\mathrm{Hom}_{\mathcal{DA}}(G) = \mathrm{Hom}_{\mathcal{DA}}(H) \iff G \text{ and } H \text{ are isomorphic}.$$
Here $\mathcal{DA}$ denotes the class of all directed acyclic graphs.
It is straightforward to extend Theorem 4.2, Theorem 4.4 (for the natural generali-
sation of the WL algorithms to relational structures), and Theorem 4.10 to arbitrary
relational structures. This is very useful for binary relational structures such as knowl-
edge graphs. But for relations of higher arity one may consider another version based
on the incidence graph of a structure.
Let $\sigma = \{R_1, \ldots, R_m\}$ be a relational vocabulary, where $R_i$ is a $k_i$-ary relation symbol.
Let $k$ be the maximum of the $k_i$. We let $\sigma_I := \{E_1, \ldots, E_k, P_1, \ldots, P_m\}$, where the
$E_i$ are binary and the $P_j$ are unary relation symbols. With every $\sigma$-structure
$A = (V(A), R_1(A), \ldots, R_m(A))$ we associate a $\sigma_I$-structure $A_I$, called the incidence
structure of $A$, as follows:
• the universe of $A_I$ is
$$V(A_I) := V(A) \cup \bigcup_{i=1}^{m} \big\{ (R_i, v_1, \ldots, v_{k_i}) \;\big|\; (v_1, \ldots, v_{k_i}) \in R_i(A) \big\};$$
• for $1 \le j \le k$, the relation $E_j(A_I)$ consists of all pairs
$$\big( v_j, (R_i, v_1, \ldots, v_{k_i}) \big)$$
for $(R_i, v_1, \ldots, v_{k_i}) \in V(A_I)$ with $k_i \ge j$;
• for $1 \le i \le m$, the relation $P_i(A_I)$ consists of all $(R_i, v_1, \ldots, v_{k_i}) \in V(A_I)$.
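The construction of the incidence structure can be sketched in a few lines of Python. This is only an illustration and not part of the original text: the function name `incidence_structure` and the dictionary-based encoding of a structure are assumptions, and the maximum arity k is inferred from the tuples that actually occur rather than read off the vocabulary.

```python
def incidence_structure(universe, relations):
    """Build the incidence structure A_I of a relational structure A.

    universe  -- iterable with the elements of V(A)
    relations -- dict mapping a relation name R_i to a set of tuples R_i(A)
    Returns (nodes, E, P): E[j] is the binary relation E_j (j is 1-based),
    P[name] is the unary relation marking the tuple nodes of that relation.
    """
    nodes = set(universe)
    # Assumption: the maximum arity k is taken from the tuples present.
    max_arity = max((len(t) for ts in relations.values() for t in ts), default=0)
    E = {j: set() for j in range(1, max_arity + 1)}
    P = {name: set() for name in relations}
    for name, tuples in relations.items():
        for t in tuples:
            fact = (name, *t)            # new element (R_i, v_1, ..., v_{k_i})
            nodes.add(fact)
            P[name].add(fact)
            for j, v in enumerate(t, start=1):
                E[j].add((v, fact))      # v is the j-th entry of the tuple
    return nodes, E, P
```

For a structure with a ternary relation R = {(1, 2, 3)} and a binary relation S = {(1, 2), (2, 3)} over the universe {1, 2, 3}, the result has six elements: the three original ones and one new element per tuple.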
With this encoding of general structures as binary incidence structures we obtain the
following corollary.
Corollary 4.12. For all σ-structures A and B, the following are equivalent.
(1) $\mathrm{Hom}_{\mathcal{T}(\sigma_I)}(A_I) = \mathrm{Hom}_{\mathcal{T}(\sigma_I)}(B_I)$, where $\mathcal{T}(\sigma_I)$ denotes the class of all $\sigma_I$-structures
whose underlying (Gaifman) graph is a tree;
(2) $A_I$ and $B_I$ are not distinguished by 1-WL;
(3) $A_I$ and $B_I$ are $\mathsf{C}^2$-equivalent.
Böker [14] gave a generalisation of Theorem 4.4 to hypergraphs that is also based on
incidence graphs.
In a different direction, we can generalise the results to weighted graphs. Let us
consider undirected graphs with real-valued edge weights. We can also view them as
symmetric matrices over the reals. Recall that we denote the weight of an edge uv by
α(u, v) and that weighted 1-WL refines by sums of edge weights (instead of numbers of
edges). Let F be an unweighted graph and G a weighted graph. For every mapping
$h : V(F) \to V(G)$ we let
$$\mathrm{wt}(h) := \prod_{uu' \in E(F)} \alpha(h(u), h(u')).$$
As $\alpha(v, v') = 0$ if and only if $vv' \notin E(G)$, we have $\mathrm{wt}(h) \neq 0$ if and only if h is a
homomorphism from F to G; the weight of this homomorphism is the product of the
weights of the edges in its image. We let
$$\hom(F, G) := \sum_{h : V(F) \to V(G)} \mathrm{wt}(h).$$
In statistical physics, such sum-product functions are known as partition functions. For
a class $\mathcal{F}$ of graphs, we let
$$\mathrm{Hom}_{\mathcal{F}}(G) := \big( \hom(F, G) \;\big|\; F \in \mathcal{F} \big).$$
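The definition of the weighted homomorphism count can be made concrete by a brute-force evaluation of the sum over all mappings. The sketch below is an illustration under assumed conventions (the name `weighted_hom` and the encoding of α as a dictionary keyed by 2-element frozensets are not from the paper); it is exponential in |V(F)| and only meant for tiny examples.

```python
from itertools import product

def weighted_hom(F_nodes, F_edges, alpha, G_nodes):
    """Sum of wt(h) over all maps h: V(F) -> V(G).

    alpha maps frozenset({v, v2}) to the weight of the edge v v2 of G;
    pairs that are not listed (non-edges and loops) have weight 0.
    """
    F_nodes = list(F_nodes)
    total = 0.0
    for image in product(G_nodes, repeat=len(F_nodes)):
        h = dict(zip(F_nodes, image))
        wt = 1.0
        for u, u2 in F_edges:
            wt *= alpha.get(frozenset((h[u], h[u2])), 0.0)
        total += wt
    return total

# A weighted triangle on the vertices 0, 1, 2.
triangle = {frozenset((0, 1)): 2.0, frozenset((1, 2)): 3.0, frozenset((0, 2)): 0.5}
single_edge = weighted_hom(["a", "b"], [("a", "b")], triangle, [0, 1, 2])
```

With all weights equal to 1 the function returns the ordinary number of homomorphisms, which matches the remark that wt(h) is nonzero exactly for homomorphisms.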
Theorem 4.13 ([22]). For all weighted graphs G and H, the following are equivalent.
(1) HomT(G) = HomT(H) (recall that T denotes the class of all trees);
(2) G and H are not distinguished by weighted 1-WL;
(3) equations (3.2) and (3.3) have a nonnegative rational solution.
4.3 Complexity
In general, counting the number of homomorphisms from a graph F to a graph G is a
#P-hard problem. Dalmau and Jonsson [31] proved that, under the reasonable
complexity-theoretic assumption $\#\mathrm{W}[1] \neq \mathrm{FPT}$ from parameterised complexity theory, for all classes
F of graphs, computing hom(F, G) given a graph F ∈ F and an arbitrary graph G is
in polynomial time if and only if F has bounded tree width. This makes Theorem 4.4
even more interesting, because the entries of a homomorphism vector HomF(G) are
computable in polynomial time precisely for bounded tree width classes.
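Trees have tree width 1, so they fall on the tractable side of this dichotomy. As an illustration (a standard dynamic programme, not an algorithm taken from the paper; the names `hom_rooted` and `hom_count` are assumptions), hom(T, G) can be computed bottom-up over the tree, with running time roughly |V(T)| times the size of G.

```python
def hom_rooted(tree_adj, root, graph_adj):
    """Return {v: hom(T, G; root -> v)} for a tree T and a graph G.

    Both arguments are adjacency dicts mapping a node to a list of neighbours.
    """
    def count(t, parent):
        # c[v] = number of homomorphisms of the subtree below t that map t to v
        c = {v: 1 for v in graph_adj}
        for child in tree_adj[t]:
            if child == parent:
                continue
            sub = count(child, t)
            # The child must be mapped to some neighbour w of v.
            c = {v: c[v] * sum(sub[w] for w in graph_adj[v]) for v in graph_adj}
        return c
    return count(root, None)

def hom_count(tree_adj, root, graph_adj):
    """hom(T, G): sum the rooted counts over all images of the root."""
    return sum(hom_rooted(tree_adj, root, graph_adj).values())

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
p3 = {0: [1], 1: [0, 2], 2: [1]}          # path with three vertices
walks = hom_count(p3, 0, triangle)        # number of walks of length 2
```

The choice of root does not affect `hom_count`; it only determines which rooted counts the intermediate dictionary holds.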
However, the computational problem we are facing is not to compute individual entries
of a homomorphism vector, but to decide if two graphs have the same homomorphism
vector, that is, if they are homomorphism indistinguishable. The characterisation
theorems of Section 4.1 imply that homomorphism indistinguishability is polynomial-time
decidable over the classes P of paths, C of cycles, T of trees, Tk of graphs of tree width at
most k, and TDk of graphs of tree depth at most k. Moreover, homomorphism
indistinguishability over the class G of all graphs is decidable in quasi-polynomial time by
Babai’s [7] celebrated result that graph isomorphism is decidable in quasi-polynomial time.
It was proved in [15] that homomorphism indistinguishability over the class of complete
graphs is complete for the complexity class $\mathsf{C}_{=}\mathsf{P}$, which implies that it is co-NP-hard, and
that there is a polynomial-time decidable class F of graphs of bounded tree width such that
homomorphism indistinguishability over F is undecidable. Quite surprisingly, the fact that
quantum isomorphism is undecidable [4] implies that homomorphism indistinguishability
over the class of planar graphs is undecidable.
4.4 Homomorphism Node Embeddings
So far, we have defined graph and structure embeddings based on homomorphism
vectors. But we can also use homomorphism vectors to define node embeddings. A rooted
graph is a pair (G, v) where G is a graph and v ∈ V(G). For two rooted graphs (F, u)
and (G, v), by $\hom(F, G; u \mapsto v)$ we denote the number of homomorphisms h from F to
G with h(u) = v. For a class $\mathcal{F}^*$ of rooted graphs and a rooted graph (G, v), we let
$$\mathrm{Hom}_{\mathcal{F}^*}(G, v) := \big( \hom(F, G; u \mapsto v) \;\big|\; (F, u) \in \mathcal{F}^* \big).$$
If we keep the graph G fixed, this gives us an embedding of the nodes of G into an
infinite-dimensional vector space. Note that in the terminology of Section 2.1, this embedding is
“inductive” and not “transductive”, because it is not tied to a fixed graph. (Nevertheless,
the term “inductive” does not fit very well here, because the embedding is not learned.)
In the same way we defined graph kernels based on homomorphism vectors of graphs,
we can now define node kernels.
It is straightforward to generalise Theorem 4.2 to rooted graphs, showing that for all
rooted graphs (G, v) and (H, w) it holds that
$$\mathrm{Hom}_{\mathcal{G}^*}(G, v) = \mathrm{Hom}_{\mathcal{G}^*}(H, w) \iff \text{there is an isomorphism } f \text{ from } G \text{ to } H \text{ with } f(v) = w.$$
Here $\mathcal{G}^*$ denotes the class of all rooted graphs. Maybe the easiest way to prove this is by
a reduction to node-labelled graphs.
Another key result of Section 4.1 that can be adapted to the node setting is Theo-
rem 4.4. We only state the version for trees.
Theorem 4.14. For all graphs G, H and all v ∈ V (G), w ∈ V (H), the following are
equivalent.
(1) HomT∗ (G, v) = HomT∗ (H, w) for the class T∗ of all rooted trees;
(2) 1-WL assigns the same colour to v and w.
This result is implicit in the proof of Theorem 4.4 (see [33, 32]). In fact, it can be
viewed as the graph theoretic core of the proof. We sketch the proof here and also show
how to derive Theorem 4.4 (for trees) from it.
Proof sketch. Recall from Section 3.5 that we can view the colours assigned by 1-WL
as rooted trees (see Figure 5). For the kth round of 1-WL, this is a tree of height k, and
for the stable colouring we can view it as an infinite tree. Suppose now that the colour
of a vertex v of G is T. The crucial observation is that for every rooted tree (S, r) the
number $\hom(S, G; r \mapsto v)$ is precisely the number of mappings $h : V(S) \to V(T)$ that
map the root r of S to the root of T and, for each node s ∈ V(S), map the children of
s to the children of h(s) in T. Let us call such mappings rooted tree homomorphisms.
Implication (2) ⟹ (1) follows directly from this observation. Implication (1) ⟹ (2)
follows as well, because by an argument similar to that used in the proof of Theorem 4.2
it can be shown that for distinct rooted trees $T, T'$ there is a rooted tree that has distinct
numbers of rooted tree homomorphisms to $T$ and $T'$.
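Theorem 4.14 can be checked experimentally on small graphs. The following sketch is an illustration only: the helper names `wl_colours` and `rooted_hom` are assumptions, colours are represented as nested sorted tuples (one way of encoding the unfolding trees from Section 3.5), and only a handful of rooted trees is tested rather than the whole class.

```python
def wl_colours(adj, rounds):
    """1-WL colours after the given number of rounds, as nested sorted tuples.

    The tuples are canonical, so colours can be compared across graphs.
    """
    colour = {v: () for v in adj}
    for _ in range(rounds):
        colour = {v: tuple(sorted(colour[w] for w in adj[v])) for v in adj}
    return colour

def rooted_hom(tree_adj, root, graph_adj):
    """Return {v: hom(T, G; root -> v)} by dynamic programming over the tree."""
    def count(t, parent):
        c = {v: 1 for v in graph_adj}
        for child in tree_adj[t]:
            if child != parent:
                sub = count(child, t)
                c = {v: c[v] * sum(sub[w] for w in graph_adj[v]) for v in graph_adj}
        return c
    return count(root, None)

# A 6-cycle and two disjoint triangles: 1-WL gives all vertices the same colour.
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
# A few small rooted trees, each given as (adjacency dict, root).
trees = [
    ({0: [1], 1: [0]}, 0),                        # single edge
    ({0: [1, 2], 1: [0], 2: [0]}, 0),             # path rooted at its centre
    ({0: [1], 1: [0, 2], 2: [1]}, 0),             # path rooted at an end
    ({0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}, 0),  # star rooted at its centre
]
same_colour = wl_colours(cycle6, 6)[0] == wl_colours(triangles, 6)[0]
same_counts = all(
    rooted_hom(t, r, cycle6)[0] == rooted_hom(t, r, triangles)[0] for t, r in trees
)
```

Conversely, on a path with four vertices an end vertex and an inner vertex receive different colours, and already the single edge separates them, because its rooted homomorphism count is the degree of the target vertex.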
Corollary 4.15. For all graphs G, H and all v ∈ V (G), w ∈ V (H), the following are
equivalent.
(1) HomT∗ (G, v) = HomT∗ (H, w);
(2) for all formulas $\varphi(x)$ of the logic $\mathsf{C}^2$,
$$G \models \varphi(v) \iff H \models \varphi(w).$$
Note that the node embeddings based on homomorphism vectors are quite different
from the node embeddings described in Section 2.1. They are solely based on structural
properties and ignore the distance information. Results like Corollary 4.15 show that the
structural information captured by the homomorphism-based embeddings in principle
enables us to answer queries directly on the embedding, which may be more useful than
distance information in database applications.
We close this section by sketching how Theorem 4.4 (for trees) follows from Theo-
rem 4.14.
Proof sketch of Theorem 4.4 (for trees). Let G, H be graphs. We need to prove
HomT(G) = HomT(H) ⇐⇒ 1-WL does not distinguish G and H. (4.4)
Without loss of generality we assume that V (G) ∩ V (H) = ∅. For x, y ∈ V (G) ∪ V (H),
we write x ∼ y if 1-WL assigns the same colour to x and y. By Theorem 4.14, for
X, Y ∈ {G, H} and x ∈ V(X), y ∈ V(Y) we have x ∼ y if and only if
$\mathrm{Hom}_{\mathcal{T}^*}(X, x) = \mathrm{Hom}_{\mathcal{T}^*}(Y, y)$. Let $R_1, \ldots, R_n$ be the ∼-equivalence classes, and for every j, let
$P_j := R_j \cap V(G)$ and $Q_j := R_j \cap V(H)$. Furthermore, let $p_j := |P_j|$ and $q_j := |Q_j|$.
We first prove the backward direction of (4.4). Assume that 1-WL does not distinguish
G and H. Then $p_j = q_j$ for all j ∈ [n]. Let T be a tree, and let t ∈ V(T). Let
$h_j := \hom(T, X; t \mapsto x)$ for $x \in R_j$ and X ∈ {G, H} with x ∈ V(X); by Theorem 4.14,
this value does not depend on the choice of x. Then
$$\hom(T, G) = \sum_{v \in V(G)} \hom(T, G; t \mapsto v) = \sum_{j=1}^{n} p_j h_j = \sum_{j=1}^{n} q_j h_j = \sum_{w \in V(H)} \hom(T, H; t \mapsto w) = \hom(T, H).$$
Since T was arbitrary, this proves HomT(G) = HomT(H).
The proof of the forward direction of (4.4) is more complicated. Assume
$\mathrm{Hom}_{\mathcal{T}}(G) = \mathrm{Hom}_{\mathcal{T}}(H)$. There is a finite collection of $m \le \binom{n}{2}$ rooted trees
$(T_1, r_1), \ldots, (T_m, r_m)$ such that for all X, Y ∈ {G, H} and x ∈ V(X), y ∈ V(Y) we have
x ∼ y if and only if for all i ∈ [m],
$$\hom(T_i, X; r_i \mapsto x) = \hom(T_i, Y; r_i \mapsto y).$$
Let $a_{ij} := \hom(T_i, X; r_i \mapsto x)$ for $x \in R_j$ and X ∈ {G, H} with x ∈ V(X). Then for all
i we have
$$\sum_{j=1}^{n} a_{ij} p_j = \hom(T_i, G) = \hom(T_i, H) = \sum_{j=1}^{n} a_{ij} q_j.$$
Unfortunately, the matrix $A = (a_{ij})_{i \in [m], j \in [n]}$ is not necessarily invertible, so we cannot
directly conclude that $p_j = q_j$ for all j. All we know is that for any two columns of the
matrix there is a row such that the two columns have distinct values in that row. It turns
out that this is sufficient. For every vector $d = (d_1, \ldots, d_m)$ of nonnegative integers, let
$(T^{(d)}, r^{(d)})$ be the rooted tree obtained by taking the disjoint union of $d_i$ copies of $T_i$ for
all i and then identifying the roots of all these trees. It is easy to see that
$$\hom(T^{(d)}, X; r^{(d)} \mapsto x) = \prod_{i=1}^{m} \hom(T_i, X; r_i \mapsto x)^{d_i}.$$
Thus, letting $a^{(d)}_j := \prod_{i=1}^{m} a_{ij}^{d_i}$, we have
$$\sum_{j=1}^{n} a^{(d)}_j p_j = \hom(T^{(d)}, G) = \hom(T^{(d)}, H) = \sum_{j=1}^{n} a^{(d)}_j q_j.$$
Using these additional equations, it can be shown that $p_j = q_j$ for all j (see [43,
Lemma 4.2]). Thus 1-WL does not distinguish G and H.
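The product formula for the glued tree, in which the rooted count of each T_i enters with exponent d_i, is easy to test numerically. The sketch below is an illustration with assumed helper names (`rooted_hom`, `glue`); it builds the glued tree for two small rooted trees and compares both sides of the identity on a path with four vertices.

```python
def rooted_hom(tree_adj, root, graph_adj):
    """Return {x: hom(T, G; root -> x)} by dynamic programming over the tree."""
    def count(t, parent):
        c = {v: 1 for v in graph_adj}
        for child in tree_adj[t]:
            if child != parent:
                sub = count(child, t)
                c = {v: c[v] * sum(sub[w] for w in graph_adj[v]) for v in graph_adj}
        return c
    return count(root, None)

def glue(rooted_trees, d):
    """Take d[i] disjoint copies of rooted tree i and identify all their roots."""
    adj = {"r": []}
    for i, ((tree_adj, root), copies) in enumerate(zip(rooted_trees, d)):
        for c in range(copies):
            def name(u, i=i, c=c, root=root):
                # Every copy gets fresh node names; all roots become "r".
                return "r" if u == root else (i, c, u)
            for u, nbrs in tree_adj.items():
                adj.setdefault(name(u), []).extend(name(w) for w in nbrs)
    return adj, "r"

path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
edge = ({0: [1], 1: [0]}, 0)                 # T_1: single edge, rooted at an end
path = ({0: [1], 1: [0, 2], 2: [1]}, 0)      # T_2: path of length 2, rooted at an end
d = (2, 1)
glued, r = glue([edge, path], d)
lhs = rooted_hom(glued, r, path4)
a = [rooted_hom(t, root, path4) for t, root in (edge, path)]
rhs = {x: a[0][x] ** d[0] * a[1][x] ** d[1] for x in path4}
```

Both dictionaries agree on every vertex of the path, as the identity predicts.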
4.5 Homomorphisms and GNNs
We have a correspondence between homomorphism vectors and the Weisfeiler-Leman
algorithm (Theorems 4.4 and 4.14) and between the WL algorithm and GNNs (see
Section 3.6). This also establishes a correspondence between homomorphism vectors and
GNNs. More directly, the correspondence between GNNs and homomorphism counts is
also studied in [69].
5 Similarity
The results described in the previous section can be interpreted as results on the expres-
siveness of homomorphism-based embeddings of structures and their nodes. However,
all these results only show what it means that two objects are mapped to the same ho-
momorphism vector. More interesting is the similarity measure the vector embeddings
induce via an inner product or norm on the latent space (see (4.1)). We can speculate
that, given the nice results regarding equality of vectors, the similarity measure will have
similarly nice properties. Let me propose the following, admittedly vague, hypothesis.
For suitable classes F, the homomorphism embedding HomF combined with a
suitable inner product on the latent space induces a natural similarity measure
on graphs or relational structures.
From a practical perspective, we could support this hypothesis by showing that the
vector embeddings give good results when combined with similarity based downstream
tasks. As mentioned earlier, initial experiments show that homomorphism vectors in
combination with support vector machines perform well on standard graph classification
benchmarks. But a more thorough experimental study will be required to have conclusive
results.
From a theoretical perspective, we can compare the homomorphism-based similar-
ity measures with other similarity measures for graphs and discrete structures. If we
can prove that they coincide or are close to each other, then this would support our
hypothesis.
5.1 Similarity from Matrix Norms
A standard way of defining similarity measures on graphs is based on comparing their
adjacency matrices. Let us briefly review a few matrix norms. Recall the standard $\ell_p$-vector
norm $\|x\|_p := \big( \sum_i |x_i|^p \big)^{1/p}$.¹ The two best-known matrix norms are the Frobenius
norm $\|M\|_F := \sqrt{\sum_{i,j} M_{ij}^2}$ and the spectral norm $\|M\|_{\langle 2 \rangle} := \sup_{x \in \mathbb{R}^n, \|x\|_2 = 1} \|Mx\|_2$.
More generally, for every $p > 0$ we define
$$\|M\|_p := \Big( \sum_{i,j} |M_{ij}|^p \Big)^{1/p}$$
(so $\|M\|_F = \|M\|_2$) and
$$\|M\|_{\langle p \rangle} := \sup_{x \in \mathbb{R}^n, \|x\|_p = 1} \|Mx\|_p.$$
Thus both $\|M\|_p$ and $\|M\|_{\langle p \rangle}$ are derived from the $\ell_p$-vector norm. For $\|M\|_p$, the matrix
is simply flattened into a vector, and for $\|M\|_{\langle p \rangle}$ the matrix is viewed as a linear operator.
¹ Note that $\|x\|_2$ is just the Euclidean norm, which we denoted by $\|x\|$ earlier in this paper.
Another matrix norm that is interesting here is the cut norm defined by
$$\|M\|_\square := \max_{S,T} \Big| \sum_{i \in S, j \in T} M_{ij} \Big|,$$
where S, T range over all subsets of the index set of the matrix. Observe that for
$M \in \mathbb{R}^{n \times n}$ we have
$$\|M\|_\square \le \|M\|_1 \le n \|M\|_F,$$
where the second inequality follows from the Cauchy-Schwarz inequality. If we compare
matrices of different size, it can be reasonable to scale the norms by a factor depending
on n.
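For small matrices the norms above can be evaluated directly. The following pure-Python sketch is an illustration (the function names are assumptions, and the cut norm is computed by brute force over all row subsets, which is exponential in n); it uses the column-sum formula for the operator norm with p = 1.

```python
from itertools import product

def entrywise_norm(M, p):
    """The norm obtained by flattening M into a vector (p = 2 is Frobenius)."""
    return sum(abs(x) ** p for row in M for x in row) ** (1.0 / p)

def operator_1_norm(M):
    """Operator norm for p = 1: the maximum absolute column sum."""
    return max(sum(abs(row[j]) for row in M) for j in range(len(M[0])))

def cut_norm(M):
    """Brute-force cut norm: maximise |sum of M[i][j], i in S, j in T| over S, T."""
    n, m = len(M), len(M[0])
    best = 0
    for S in product([False, True], repeat=n):
        col = [sum(M[i][j] for i in range(n) if S[i]) for j in range(m)]
        # For fixed S, the best T collects all positive or all negative sums.
        positive = sum(max(c, 0) for c in col)
        negative = -sum(min(c, 0) for c in col)
        best = max(best, positive, negative)
    return best

M = [[1, -2], [3, 4]]
n = len(M)
# Expected to be non-decreasing: cut norm, entrywise 1-norm, n times Frobenius.
chain = [cut_norm(M), entrywise_norm(M, 1), n * entrywise_norm(M, 2)]
```

Fixing S and choosing T greedily is what keeps the enumeration at 2^n instead of 4^n subsets; it is still only practical for very small matrices.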
For technical reasons, we only consider matrix norms $\|\cdot\|$ that are invariant under
permutations of the rows and columns, that is,
$$\|M\| = \|MP\| = \|QM\| \quad \text{for all permutation matrices } P, Q. \tag{5.1}$$
It is easy to see that the norms discussed above have this property.
Now let G, H be graphs with vertex sets V, W and adjacency matrices $A \in \mathbb{R}^{V \times V}$,
$B \in \mathbb{R}^{W \times W}$. For convenience, let us assume that |G| = |H| =: n. Then both A, B are
n × n-matrices, and we can compare them using a matrix norm. However, it does not make
much sense to just consider $\|A - B\|$, because graphs do not have a unique adjacency
matrix, and even if G and H are isomorphic, $\|A - B\|$ may be large. Therefore, we align
the two matrices in an optimal way by permuting the rows and columns of A. For a
matrix norm $\|\cdot\|$, we define a graph distance measure $\mathrm{dist}_{\|\cdot\|}$ by
$$\mathrm{dist}_{\|\cdot\|}(G, H) := \min_{\substack{P \in \{0,1\}^{V \times W} \\ \text{permutation matrix}}} \|P^{\top} A P - B\|.$$
It follows from (5.1) that $\mathrm{dist}_{\|\cdot\|}$ is well-defined, that is, does not depend on the choice
of the particular adjacency matrices A, B. It also follows from (5.1) and the fact that
$P^{-1} = P^{\top}$ for permutation matrices that
$$\mathrm{dist}_{\|\cdot\|}(G, H) = \min_{\substack{P \in \{0,1\}^{V \times W} \\ \text{permutation matrix}}} \|AP - PB\|, \tag{5.2}$$
which is often easier to work with because the expression $AP - PB$ is linear in the
“variables” $P_{ij}$. To simplify the notation, we let $\mathrm{dist}_p := \mathrm{dist}_{\|\cdot\|_p}$ and
$\mathrm{dist}_{\langle p \rangle} := \mathrm{dist}_{\|\cdot\|_{\langle p \rangle}}$ for all p, and we let $\mathrm{dist}_\square := \mathrm{dist}_{\|\cdot\|_\square}$.
The distances defined from the $\ell_1$-norm have natural interpretations as edit distances.
$\mathrm{dist}_1(G, H)$ is twice the number of edges that need to be flipped to turn G into a graph
isomorphic to H, and $\mathrm{dist}_{\langle 1 \rangle}(G, H)$ is the maximum number of edges incident with a
single vertex we need to flip to turn G into a graph isomorphic to H. Formally,
$$\mathrm{dist}_1(G, H) = 2 \min_{\substack{f : V \to W \\ \text{bijection}}} \Big| \big\{ f(v)f(v') \;\big|\; vv' \in E(G) \big\} \,\triangle\, E(H) \Big|, \tag{5.3}$$
$$\mathrm{dist}_{\langle 1 \rangle}(G, H) = \min_{\substack{f : V \to W \\ \text{bijection}}} \max_{v \in V} \Big| \big\{ f(v') \;\big|\; v' \in N_G(v) \big\} \,\triangle\, \big\{ w' \;\big|\; w' \in N_H(f(v)) \big\} \Big|. \tag{5.4}$$
Here $\triangle$ denotes the symmetric difference. Equation (5.3) is obvious; the factor 2
comes from the fact that we regard (undirected) edges as 2-element sets and not as
ordered pairs. To prove (5.4), we observe that for every matrix $M \in \mathbb{R}^{n \times n}$ it holds that
$\|M\|_{\langle 1 \rangle} = \max_{j \in [n]} \sum_{i=1}^{n} |M_{ij}|$.
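The edit-distance reading of (5.3) and (5.4) can be verified on tiny graphs by enumerating all bijections. The sketch below is an illustration only (the names `adjacency` and `edit_distances` are assumptions); the enumeration over n! bijections is exactly the source of intractability for larger graphs.

```python
from itertools import permutations

def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix of an undirected graph on 0..n-1."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    return A

def edit_distances(A, B):
    """Return (dist_1, dist_<1>) by brute force over all bijections f."""
    n = len(A)
    best_1 = best_op = None
    for f in permutations(range(n)):
        # diff[i][j] = 1 iff the pair ij has to be flipped under alignment f.
        diff = [[abs(A[i][j] - B[f[i]][f[j]]) for j in range(n)] for i in range(n)]
        d1 = sum(map(sum, diff))                                     # entrywise 1-norm
        dop = max(sum(diff[i][j] for i in range(n)) for j in range(n))  # max column sum
        best_1 = d1 if best_1 is None else min(best_1, d1)
        best_op = dop if best_op is None else min(best_op, dop)
    return best_1, best_op

path4 = adjacency(4, [(0, 1), (1, 2), (2, 3)])
cycle4 = adjacency(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
d1, dop = edit_distances(path4, cycle4)
```

Turning the path into the cycle takes one edge flip, so the first value is 2 (each flip is counted twice) and the second is 1 (no vertex is incident with more than one flipped edge). The two minima are taken independently, since they need not be attained by the same bijection.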
Despite these intuitive interpretations, it is debatable how much “semantic relevance”
these distance measures have. How similar are two graphs that can be transformed into
each other by flipping, say, 5% of the edges? Again, the answer to this question may
depend on the application context.
A major disadvantage of the graph distance measures based on matrix norms is that
they are computationally highly intractable (see, for example, [3] and the references
therein). It is even NP-hard to compute the distance between two trees (see [46] for
the Frobenius distance and [38] for the distances based on operator norms), and the
distances are hard to approximate. The problem of computing these distances is related to the
maximisation version of the quadratic assignment problem (see [70, 79]), a notoriously
hard combinatorial optimisation problem. Better behaved is the cut distance $\mathrm{dist}_\square$; at
least it can be approximated within a factor of 2 [2].
The main source of hardness is the minimisation over the unwieldy set of all
permutations (or permutation matrices). To alleviate this hardness, we can relax the integrality
constraints and minimise over the convex set of all doubly stochastic matrices instead.
That is, we define a relaxed distance measure
$$\widetilde{\mathrm{dist}}_{\|\cdot\|}(G, H) = \min_{\substack{X \in [0,1]^{V \times W} \\ \text{doubly stochastic}}} \|AX - XB\|. \tag{5.5}$$
Note that $\widetilde{\mathrm{dist}}_{\|\cdot\|}$ is only a pseudo-metric: the distance between nonisomorphic graphs
may be 0. Indeed, it follows from Theorem 3.2 that $\widetilde{\mathrm{dist}}_{\|\cdot\|}(G, H) = 0$ if and only if G
and H are fractionally isomorphic. The advantage of these “relaxed” distances is that
for many norms $\|\cdot\|$, computing $\widetilde{\mathrm{dist}}_{\|\cdot\|}$ is a convex minimisation problem that can be
solved efficiently.
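A small example shows why the relaxed distance is only a pseudo-metric. The sketch below is an illustration under stated assumptions (helper names are made up, and exact rational arithmetic is used to avoid rounding): for the 6-cycle and two disjoint triangles, the uniform doubly stochastic matrix with all entries 1/6 satisfies AX = XB, so the relaxed distance is 0, while the exact distance over permutations is positive.

```python
from fractions import Fraction
from itertools import permutations

def adjacency(n, edges):
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    return A

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def norm_1(M):
    """Entrywise 1-norm."""
    return sum(abs(x) for row in M for x in row)

n = 6
A = adjacency(n, [(i, (i + 1) % n) for i in range(n)])                   # 6-cycle
B = adjacency(n, [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)])       # two triangles

# The uniform matrix is doubly stochastic: all row and column sums equal 1.
X = [[Fraction(1, n)] * n for _ in range(n)]
AX, XB = matmul(A, X), matmul(X, B)
relaxed_value = norm_1([[AX[i][j] - XB[i][j] for j in range(n)] for i in range(n)])

# Exact distance dist_1: minimise over all permutations instead.
exact_dist_1 = min(
    sum(abs(A[i][j] - B[f[i]][f[j]]) for i in range(n) for j in range(n))
    for f in permutations(range(n))
)
```

Here `relaxed_value` is 0 because both graphs are 2-regular, so every entry of AX and of XB equals 2/6, whereas `exact_dist_1` is positive because the graphs are not isomorphic.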
So far, we have only discussed distance measures for graphs of the same order. To
extend these distance measures to arbitrary graphs, we can replace vertices by sets of
identical vertices in both graphs to obtain two graphs whose order is the least common
multiple of the orders of the two initial graphs (see [67, Section 8.1] for details).
Note that these matrix based similarity measures are only defined for (possibly weighted)
graphs. In particular for the operator norms, it is not clear how to generalise them to
relational structures, and if such a generalisation would even be meaningful.
5.2 Comparing Homomorphism Distances and Matrix Distances
It would be very nice if we could establish a connection between graph distance mea-
sures based on homomorphism vectors and those based on matrix norms. At least one
important result in this direction exists: Lovász [67] proves an equivalence between the
cut-distance of graphs and a distance measure derived from a suitably scaled homomor-
phism vector HomG.
It is tempting to ask if a similar correspondence can be established between $\widetilde{\mathrm{dist}}_{\|\cdot\|}$
and $\mathrm{Hom}_{\mathcal{F}}$. There are many related questions that deserve further attention.
6 Concluding Remarks
In this paper, we gave an overview of embedding techniques for graphs and relational
structures. Then we discussed two related theoretical approaches, the Weisfeiler-Leman
algorithm with its various ramifications and homomorphism vectors. We saw that they
have a rich and beautiful theory that leads to new, generic families of vector embeddings
and helps us to get a better understanding of some of the techniques used in practice,
for example graph neural networks.
Yet we have also seen that we are only at the beginning and many questions remain
open, in particular when it comes to similarity measures defined on graphs and relational
structures.
From a database perspective, it will be important to generalise the embedding tech-
niques to relations of higher arities, which is not as trivial as it may seem (and where
surprisingly little has been done so far). A central question is then how to query the
embedded data. Which queries can we answer at all when we only see the vectors in
latent space? How do imprecisions and variations due to randomness affect the outcome
of such query answers? Probably, we can only answer queries approximately, but what
exactly is the semantics of such approximations? These are just a few questions that
need to be answered, and I believe they offer very exciting research opportunities for
both theoreticians and practitioners.
Acknowledgements
This paper was written in strange times during COVID-19 lockdown. I appreciate it
that some of my colleagues nevertheless took the time to answer various questions I
had on the topics covered here and to give valuable feedback on an earlier version of
this paper. In particular, I would like to thank Pablo Barceló, Neta Friedman, Benny
Kimelfeld, Christopher Morris, Petra Mutzel, Martin Ritzert, and Yufei Tao.
References
[1] A. Ahmed, N. Shervashidze, S. Narayanamurthy, V. Josifovski, and A.J. Smola.
Distributed large-scale natural graph factorization. In Proceedings of the 22nd
International World Wide Web Conference, pages 37–48, 2013.
[2] N. Alon and A. Naor. Approximating the cut-norm via Grothendieck’s inequality.
SIAM Journal on Computing, 35:787–803, 2006.
[3] V. Arvind, J. Köbler, S. Kuhnert, and Y. Vasudev. Approximate graph isomor-
phism. In B. Rovan, V. Sassone, and P. Widmayer, editors, Proceedings of the 37th
International Symposium on Mathematical Foundations of Computer Science, vol-
ume 7464 of Lecture Notes in Computer Science, pages 100–111. Springer Verlag,
2012.
[4] A. Atserias, Laura Mančinska, D.E. Roberson, R. Šámal, S. Severini, and
A. Varvitsiotis. Quantum and non-signalling graph isomorphisms. Journal of
Combinatorial Theory, Series B, 136:289–328, 2019.
[5] A. Atserias and E. Maneva. Sherali–Adams relaxations and indistinguishability in
counting logics. SIAM Journal on Computing, 42(1):112–137, 2013.
[6] A. Atserias and J. Ochremiak. Definable ellipsoid method, sums-of-squares proofs,
and the isomorphism problem. In Proceedings of the 33rd Annual ACM/IEEE
Symposium on Logic in Computer Science, pages 66–75, 2018.
[7] L. Babai. Graph isomorphism in quasipolynomial time. In Proceedings of the 48th
Annual ACM Symposium on Theory of Computing (STOC ’16), pages 684–697,
2016.
[8] L. Babai, P. Erdős, and S. Selkow. Random graph isomorphism. SIAM Journal
on Computing, 9:628–635, 1980.
[9] M. Bannach and T. Tantau. Parallel multivariate meta-theorems. In J. Guo and
D. Hermelin, editors, Proceedings of the 11th International Symposium on Pa-
rameterized and Exact Computation, volume 63 of LIPIcs, pages 4:1–4:17. Schloss
Dagstuhl - Leibniz-Zentrum für Informatik, 2016.
[10] P. Barceló, E.V. Kostylev, M. Monet, J. Pérez, J. Reutter, and J.P. Silva. The logi-
cal expressiveness of graph neural networks. In Proceedings of the 8th International
Conference on Learning Representations, 2020.
[11] M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and
data representation. Neural Computation, 15(6):1373–1396, 2003.
[12] C. Berkholz, P. Bonsma, and M. Grohe. Tight lower and upper bounds for the com-
plexity of canonical colour refinement. Theory of Computing Systems, 60(4):581–
614, 2017.
[13] C. Berkholz and M. Grohe. Limitations of algebraic approaches to graph isomor-
phism testing. In M.M. Halldórsson, K. Iwama, N. Kobayashi, and B. Speckmann,
editors, Proceedings of the 42nd International Colloquium on Automata, Languages
and Programming, Part I, volume 9134 of Lecture Notes in Computer Science,
pages 155–166. Springer Verlag, 2015.
[14] J. Böker. Color refinement, homomorphisms, and hypergraphs. In I. Sau and
D.M. Thilikos, editors, Proceedings of the 45th International Workshop on Graph-
Theoretic Concepts in Computer Science, volume 11789 of Lecture Notes in Com-
puter Science, pages 338–350. Springer, 2019.
[15] J. Böker, Y. Chen, M. Grohe, and G. Rattan. The complexity of homomorphism
indistinguishability. In P. Rossmanith, P. Heggernes, and J.-P. Katoen, editors,
Proceedings of the 44th International Symposium on Mathematical Foundations of
Computer Science, volume 138 of Leibniz International Proceedings in Informatics
(LIPIcs), pages 54:1–54:13. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik,
2019.
[16] R. Bordawekar and O. Shmueli. Using word embedding to enable semantic queries
in relational databases. In Proceedings of the Data Management for End to End
Learning Work Workshop, SIGMOD’17, 2017.
[17] R. Bordawekar and O. Shmueli. Exploiting latent information in relational
databases via word embedding and application to degrees of disclosure. In Pro-
ceedings of the 9th Biennial Conference on Innovative Data Systems Research,
2019.
[18] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko. Trans-
lating embeddings for modeling multi-relational data. In Advances in neural in-
formation processing systems, pages 2787–2795, 2013.
[19] C. Borgs, J. Chayes, L. Lovász, V. Sós, B. Szegedy, and K. Vesztergombi. Graph
limits and parameter testing. In Proceedings of the 38th Annual ACM Symposium
on Theory of Computing, pages 261–270, 2006.
[20] K.M. Borgwardt and H.-P. Kriegel. Shortest-path kernels on graphs. In Proceedings
of the 5th IEEE International Conference on Data Mining, pages 74–81, 2005.
[21] J. Bourgain. On Lipschitz embeddings of finite metric spaces in Hilbert spaces.
Israel Journal of Mathematics, 52(1-2):46–52, 1985.
[22] A. Bulatov, M. Grohe, and G. Rattan. In preparation.
[23] J. Bulian and A. Dawar. Graph isomorphism parameterized by elimination dis-
tance to bounded degree. In M. Cygan and P. Heggernes, editors, Proceedings of
the 9th International Symposium on Parameterized and Exact Computation, vol-
ume 8894 of Lecture Notes in Computer Science, pages 135–146. Springer Verlag,
2014.
[24] J. Cai, M. Fürer, and N. Immerman. An optimal lower bound on the number of
variables for graph identification. Combinatorica, 12:389–410, 1992.
[25] S. Cao, W. Lu, and Q. Xu. GraRep: Learning graph representations with global
structural information. In Proceedings of the 24th ACM International on Confer-
ence on Information and Knowledge Management, pages 891–900, 2015.
[26] S. Cao, W. Lu, and Q. Xu. Deep neural networks for learning graph representations.
In Proceedings of the 30th AAAI Conference on Artificial Intelligence, pages
1145–1152, 2016.
[27] A. Cardon and M. Crochemore. Partitioning a graph in O(|A| log2 |V |). Theoretical
Computer Science, 19(1):85 – 98, 1982.
[28] Y. Chen and J. Flum. Tree-depth, quantifier elimination, and quantifier rank.
In Proceedings of the 33rd Annual ACM/IEEE Symposium on Logic in Computer
Science, pages 225–234, 2018.
[29] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273–
297, 1995.
[30] R. Curticapean, H. Dell, and D. Marx. Homomorphisms are a good basis for count-
ing small subgraphs. In Proceedings of the 49th Annual ACM SIGACT Symposium
on Theory of Computing (STOC ’17), pages 210–223, 2017.
[31] V. Dalmau and P. Jonsson. The complexity of counting homomorphisms seen from
the other side. Theoretical Computer Science, 329(1-3):315–323, 2004.
[32] H. Dell, M. Grohe, and G. Rattan. Lovász meets Weisfeiler and Leman. In
I. Chatzigiannakis, C. Kaklamanis, D. Marx, and D. Sannella, editors, Proceedings
of the 45th International Colloquium on Automata, Languages and Programming
(Track A), volume 107 of LIPIcs, pages 40:1–40:14. Schloss Dagstuhl - Leibniz-
Zentrum für Informatik, 2018.
[33] Z. Dvořák. On recognizing graphs by numbers of homomorphisms. Journal of
Graph Theory, 64(4):330–342, 2010.
[34] M. Elberfeld, M. Grohe, and T. Tantau. Where first-order and monadic second-
order logic coincide. ACM Transaction on Computational Logic, 17(4), 2016. Ar-
ticle No. 25.
[35] M. Elberfeld, A. Jakoby, and T. Tantau. Algorithmic meta theorems for circuit
classes of constant and logarithmic depth. In C. Dürr and T. Wilke, editors, Pro-
ceedings of the 29th International Symposium on Theoretical Aspects of Computer
Science, volume 14 of LIPIcs, pages 66–77. Schloss Dagstuhl - Leibniz-Zentrum
fuer Informatik, 2012.
[36] C. Gallicchio and A. Micheli. Graph echo state networks. In Proceedings of the
IEEE International Joint Conference on Neural Networks, 2010.
[37] T. Gärtner, P. Flach, and S. Wrobel. On graph kernels: Hardness results and
efficient alternatives. In Learning theory and kernel machines, pages 129–143.
Springer Verlag, 2003.
[38] T. Gervens. Spectral graph similarity. Master Thesis at RWTH Aachen, 2018.
[39] E. Grädel, M. Grohe, B. Pago, and W. Pakusa. A finite-model-theoretic view on
propositional proof complexity. Logical Methods in Computer Science, 15(1):4:1–
4:53, 2019.
[40] E. Grädel, P.G. Kolaitis, L. Libkin, M. Marx, J. Spencer, M.Y. Vardi, Y. Venema,
and S. Weinstein. Finite Model Theory and Its Applications. Springer Verlag,
2007.
[41] M. Grohe. Descriptive Complexity, Canonisation, and Definable Graph Structure
Theory, volume 47 of Lecture Notes in Logic. Cambridge University Press, 2017.
[42] M. Grohe. Counting bounded tree depth homomorphisms. ArXiv,
arXiv:2003.08164 [cs.LO], 2020.
[43] M. Grohe. Counting bounded tree depth homomorphisms. In Submitted, 2020.
[44] M. Grohe, K. Kersting, M. Mladenov, and E. Selman. Dimension reduction via
colour refinement. In A. Schulz and D. Wagner, editors, Proceedings of the 22nd
Annual European Symposium on Algorithms, volume 8737 of Lecture Notes in
Computer Science, pages 505–516. Springer-Verlag, 2014.
[45] M. Grohe and M. Otto. Pebble games and linear equations. Journal of Symbolic
Logic, 80(3):797–844, 2015.
[46] M. Grohe, G. Rattan, and G. Woeginger. Graph similarity and approximate iso-
morphism. In I. Potapov, P.G. Spirakis, and J. Worrell, editors, Proceedings of the
43rd International Symposium on Mathematical Foundations of Computer Science,
volume 117 of LIPIcs, pages 20:1–20:16. Schloss Dagstuhl - Leibniz-Zentrum für
Informatik, 2018.
[47] M. Grohe, P. Schweitzer, and D. Wiebking. Deep Weisfeiler Leman. ArXiv,
arXiv:2003.10935 [cs.LO], 2020.
[48] A. Grover and J. Leskovec. node2vec: Scalable feature learning for networks. In
B. Krishnapuram, M. Shah, A.J. Smola, C.C. Aggarwal, D. Shen, and R. Rastogi,
editors, Proceedings of the 22nd ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, pages 855–864, 2016.
[49] W. Hamilton, R. Ying, and J. Leskovec. Inductive representation learning on
large graphs. In Proceedings of the 30th Annual Conference on Neural Information
Processing Systems, pages 1024–1034, 2017.
[50] W.L. Hamilton, R. Ying, and J. Leskovec. Representation learning on graphs:
methods and applications. ArXiv, arXiv:1709.05584 [cs.SI], 2017.
[51] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation,
9(8):1735–1780, 1997.
[52] T. Horváth, T. Gärtner, and S. Wrobel. Cyclic pattern kernels for predictive graph
mining. In Proceedings of the 10th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, pages 158–167, 2004.
[53] N. Immerman and E. Lander. Describing graphs: A first-order approach to graph
canonization. In A. Selman, editor, Complexity theory retrospective, pages 59–81.
Springer-Verlag, 1990.
[54] P. Indyk. Algorithmic applications of low-distortion geometric embeddings. In
Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science,
pages 10–33, 2001.
[55] W. Johnson and J. Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert
space. Contemporary Mathematics, 26:189–206, 1984.
[56] H. Kashima, K. Tsuda, and A. Inokuchi. Marginalized kernels between labeled
graphs. In Proceedings of the 20th International Conference on Machine Learning,
pages 321–328, 2003.
[57] K. Kersting, M. Mladenov, R. Garnett, and M. Grohe. Power iterated color refinement.
In C.E. Brodley and P. Stone, editors, Proceedings of the 28th AAAI Conference
on Artificial Intelligence, pages 1904–1910, 2014.
[58] T.N. Kipf and M. Welling. Semi-supervised classification with graph convolutional
networks. In Proceedings of the 5th International Conference on Learning
Representations, 2017.
[59] T.N. Kipf and M. Welling. Variational graph auto-encoders. ArXiv,
arXiv:1611.07308 [stat.ML], 2016.
[60] R. Kondor and J.D. Lafferty. Diffusion kernels on graphs and other discrete input
spaces. In Proceedings of the 19th International Conference on Machine Learning,
pages 315–322, 2002.
[61] N. Kriege, M. Neumann, C. Morris, K. Kersting, and P. Mutzel. A unifying view
of explicit and implicit feature maps of graph kernels. Data Mining and Knowledge
Discovery, 33(6):1505–1547, 2019.
[62] N.M. Kriege, F.D. Johansson, and C. Morris. A survey on graph kernels. ArXiv,
arXiv:1903.11835 [cs.LG], 2019.
[63] J.B. Kruskal. Multidimensional scaling by optimizing goodness of fit to a nonmetric
hypothesis. Psychometrika, 29(1):1–27, 1964.
[64] N. Linial, E. London, and Y. Rabinovich. The geometry of graphs and some of its
algorithmic applications. Combinatorica, 15:212–245, 1995.
[65] L. Lovász. Operations with structures. Acta Mathematica Hungarica, 18:321–328,
1967.
[66] L. Lovász. On the cancellation law among finite relational structures. Periodica
Mathematica Hungarica, 1(2):145–156, 1971.
[67] L. Lovász. Large Networks and Graph Limits. American Mathematical Society,
2012.
[68] L. Lovász and B. Szegedy. Limits of dense graph sequences. Journal of Combina-
torial Theory, Series B, 96(6):933–957, 2006.
[69] T. Maehara and H. NT. A simple proof of the universality of invariant/equivariant
graph neural networks. ArXiv, arXiv:1910.03802 [cs.LG], 2019.
[70] K. Makarychev, R. Manokaran, and M. Sviridenko. Maximum quadratic assignment
problem: Reduction from maximum label cover and LP-based approximation
algorithm. ACM Transactions on Algorithms, 10(4):18, 2014.
[71] P. Malkin. Sherali–Adams relaxations of graph isomorphism polytopes. Discrete
Optimization, 12:73–97, 2014.
[72] L. Mančinska and D.E. Roberson. Quantum isomorphism is equivalent to equality
of homomorphism counts from planar graphs. ArXiv, arXiv:1910.06958v2 [quant-ph],
2019.
[73] B.D. McKay and A. Piperno. Practical graph isomorphism, II. Journal of Symbolic
Computation, 60:94–112, 2014.
[74] T. Mikolov, I. Sutskever, K. Chen, G.S. Corrado, and J. Dean. Distributed repre-
sentations of words and phrases and their compositionality. In Proceedings of the
27th Annual Conference on Neural Information Processing Systems, pages 3111–
3119, 2013.
[75] H.L. Morgan. The generation of a unique machine description for chemical
structures—a technique developed at Chemical Abstracts Service. Journal of Chemical
Documentation, 5(2):107–113, 1965.
[76] C. Morris, K. Kersting, and P. Mutzel. Globalized Weisfeiler-Lehman graph ker-
nels: Global-local feature maps of graphs. In Proceedings of the 2017 IEEE Inter-
national Conference on Data Mining, pages 327–336, 2017.
[77] C. Morris, N.M. Kriege, K. Kersting, and P. Mutzel. Faster kernel for graphs with
continuous attributes via hashing. In Proceedings of the 16th IEEE International
Conference on Data Mining, pages 1095–1100, 2016.
[78] C. Morris, M. Ritzert, M. Fey, W. Hamilton, J.E. Lenssen, G. Rattan, and
M. Grohe. Weisfeiler and Leman go neural: Higher-order graph neural networks.
In Proceedings of the 33rd AAAI Conference on Artificial Intelligence, pages
4602–4609. AAAI Press, 2019.
[79] V. Nagarajan and M. Sviridenko. On the maximum quadratic assignment problem.
In Proceedings of the 20th Annual ACM-SIAM Symposium on Discrete
Algorithms, pages 516–524, 2009.
[80] A. Narayanan, M. Chandramohan, R. Venkatesan, L. Chen, Y. Liu, and S. Jaiswal.
graph2vec: Learning distributed representations of graphs. ArXiv,
arXiv:1707.05005 [cs.AI], 2017.
[81] J. Nešetřil and P. Ossona de Mendez. Linear time low tree-width partitions and
algorithmic consequences. In Proceedings of the 38th ACM Symposium on Theory
of Computing, pages 391–400, 2006.
[82] M. Neumann, R. Garnett, and K. Kersting. Coinciding walk kernels: Parallel
absorbing random walks for learning with graphs and few labels. In Proceedings
of the 5th Asian Conference on Machine Learning, pages 357–372, 2013.
[83] M. Nickel, V. Tresp, and H.-P. Kriegel. A three-way model for collective learning
on multi-relational data. In Proceedings of the 28th International Conference on
Machine Learning, pages 809–816, 2011.
[84] R. O’Donnell, J. Wright, C. Wu, and Y. Zhou. Hardness of robust graph isomor-
phism, Lasserre gaps, and asymmetry of random graphs. In Proceedings of the 25th
Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1659–1677, 2014.
[85] M. Ou, P. Cui, J. Pei, Z. Zhang, and W. Zhu. Asymmetric transitivity preserv-
ing graph embedding. In Proceedings of the 22nd ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, pages 1105–1114, 2016.
[86] S. Pan, R. Hu, G. Long, J. Jiang, L. Yao, and C. Zhang. Adversarially regularized
graph autoencoder for graph embedding. ArXiv, arXiv:1802.04407
[cs.LG], 2018.
[87] B. Perozzi, R. Al-Rfou, and S. Skiena. DeepWalk: Online learning of social representations.
In Proceedings of the 20th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining, pages 701–710, 2014.
[88] T. Pham, T. Tran, D. Phung, and S. Venkatesh. Column networks for collective
classification. In Proceedings of the 31st AAAI Conference on Artificial Intelli-
gence, pages 2485–2491, 2017.
[89] J. Ramon and T. Gärtner. Expressivity versus efficiency of graph kernels. In
Proceedings of the 1st International Workshop on Mining Graphs, Trees and Se-
quences, pages 65–74, 2003.
[90] F. Scarselli, M. Gori, A.C. Tsoi, M. Hagenbuchner, and G. Monfardini. The graph
neural network model. IEEE Transactions on Neural Networks, 20(1):61–80, 2009.
[91] M. Schlichtkrull, T.N. Kipf, P. Bloem, R. van den Berg, I. Titov, and M. Welling.
Modeling relational data with graph convolutional networks. In A. Gangemi,
R. Navigli, M.-E. Vidal, P. Hitzler, R. Troncy, L. Hollink, A. Tordai, and M. Alam,
editors, Proceedings of the European Semantic Web Conference, volume 10843 of
Lecture Notes in Computer Science, pages 593–607. Springer-Verlag, 2018.
[92] B. Schölkopf, A. Smola, and K.-R. Müller. Kernel principal component analysis. In
W. Gerstner, A. Germond, M. Hasler, and J.D. Nicoud, editors, Proceedings of the
International Conference on Artificial Neural Networks, volume 1327 of Lecture
Notes in Computer Science, pages 583–588. Springer-Verlag, 1997.
[93] S. Shalev-Shwartz and S. Ben-David. Understanding Machine Learning: From
Theory to Algorithms. Cambridge University Press, 2014.
[94] N. Shervashidze, P. Schweitzer, E.J. van Leeuwen, K. Mehlhorn, and K.M. Borg-
wardt. Weisfeiler-Lehman graph kernels. Journal of Machine Learning Research,
12:2539–2561, 2011.
[95] N. Shervashidze, S. Vishwanathan, T. Petri, K. Mehlhorn, and K. Borgwardt.
Efficient graphlet kernels for large graph comparison. In Artificial Intelligence and
Statistics, pages 488–495, 2009.
[96] A.J. Smola and R. Kondor. Kernels and regularization on graphs. In B. Schölkopf
and M.K. Warmuth, editors, Proceedings of the 16th Annual Conference on Com-
putational Learning Theory, volume 2777 of Lecture Notes in Computer Science,
pages 144–158. Springer-Verlag, 2003.
[97] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei. LINE: Large-scale
information network embedding. In Proceedings of the 24th International World
Wide Web Conference, pages 1067–1077, 2015.
[98] J. Tenenbaum, V. De Silva, and J. Langford. A global geometric framework for
nonlinear dimensionality reduction. Science, 290:2319–2323, 2000.
[99] G. Tinhofer. A note on compact graphs. Discrete Applied Mathematics, 30:253–
264, 1991.
[100] J. Tönshoff, M. Ritzert, H. Wolf, and M. Grohe. Graph neural networks for
maximum constraint satisfaction. ArXiv, arXiv:1909.08387 [cs.AI], 2019.
[101] D. Wang, P. Cui, and W. Zhu. Structural deep network embedding. In Proceedings
of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and
Data Mining, pages 1225–1234, 2016.
[102] Q. Wang, Z. Mao, B. Wang, and L. Guo. Knowledge graph embedding: A survey of
approaches and applications. IEEE Transactions on Knowledge and
Data Engineering, 29(12):2724–2743, 2017.
[103] B. Weisfeiler and A. Leman. The reduction of a graph to canonical form and
the algebra which appears therein. NTI, Series 2, 1968. English translation
by G. Ryabov available at https://www.iti.zcu.cz/wl2018/pdf/wl_paper_translation.pdf.
[104] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P.S. Yu. A comprehensive survey
on graph neural networks. ArXiv, arXiv:1901.00596 [cs.LG], 2019.
[105] Z. Xinyi and L. Chen. Capsule graph neural network. In Proceedings of the 7th
International Conference on Learning Representations. OpenReview.net, 2019.
[106] K. Xu, W. Hu, J. Leskovec, and S. Jegelka. How powerful are graph neural net-
works? In Proceedings of the 7th International Conference on Learning Represen-
tations, 2019.