Discusses the concept of language models in Natural Language Processing. N-gram models and Markov chains are covered, and smoothing techniques such as add-1 smoothing, interpolation, and discounting methods are addressed.
The document provides an overview of various language embedding models including static word embeddings like Word2Vec, dynamic context embeddings like BERT, ELMo, and GPT, graph embeddings like Node2Vec and Graph2Vec, and exponential family embeddings. It discusses the key techniques, algorithms, and architectures of these models for obtaining low-dimensional vector representations of words and graphs that encode semantic and syntactic relationships.
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017StampedeCon
Words are no longer sufficient in delivering the search results users are looking for, particularly in relation to image search. Text and languages pose many challenges in describing visual details and providing the necessary context for optimal results. Machine Learning technology opens a new world of search innovation that has yet to be applied by businesses.
In this session, Mike Ranzinger of Shutterstock will share a technical presentation detailing his research on composition aware search. He will also demonstrate how the research led to the launch of AI technology allowing users to more precisely find the image they need within Shutterstock’s collection of more than 150 million images. While the company released a number of AI search enabled tools in 2016, this new technology allows users to search for items in an image and specify where they should be located within the image. The research identifies the networks that localize and describe regions of an image as well as the relationships between things. The goal of this research was to improve the future of search using visual data, contextual search functions, and AI. A combination of multiple machine learning technologies led to this breakthrough.
The document discusses word embedding techniques used to represent words as vectors. It describes Word2Vec as a popular word embedding model that uses either the Continuous Bag of Words (CBOW) or Skip-gram architecture. CBOW predicts a target word based on surrounding context words, while Skip-gram predicts surrounding words given a target word. These models represent words as dense vectors that encode semantic and syntactic properties, allowing operations like word analogy questions.
"Sparse Graph Attention Networks", IEEE Transactions on Knowledge and Data En...ssuser2624f71
This document proposes sparse graph attention networks (SGATs) which integrate a sparse attention mechanism into graph attention networks. SGATs simplify GAT architectures by sharing attention coefficients across heads and layers. SGATs can identify and remove noisy edges from graphs to achieve similar or improved accuracy on classification tasks. The proposed method is tested on several graph datasets and is shown to learn more robust representations compared to GAT, especially on disassortative graphs where GAT fails. Future work involves applying SGATs to edge detection against adversarial attacks and unsupervised domain adaptation.
A Generalization of Transformer Networks to Graphs.pptxssuser2624f71
This document summarizes a research paper on Graph Transformers, which generalizes transformer networks to graph-structured data. It introduces the Graph Transformer model, which addresses two key challenges of applying transformers to graphs: sparsity and positional encodings. The model uses Laplacian eigenvectors to encode node positions and handles sparsity through restricted self-attention. Experiments show the Graph Transformer outperforms GNN baselines on molecular property prediction and node classification tasks. Future work may explore efficient training on large graphs and heterogeneous domains.
This paper aims to develop an effective sentence model using a dynamic convolutional neural network (DCNN) architecture. The DCNN applies 1D convolutions and dynamic k-max pooling to capture syntactic and semantic information from sentences with varying lengths. This allows the model to relate phrases far apart in the input sentence and draw together important features. Experiments show the DCNN approach achieves strong performance on tasks like sentiment analysis of movie reviews and question type classification.
Gspan is an algorithm for frequent subgraph mining that avoids two major costs of previous approaches. It represents graphs as depth-first search (DFS) codes and builds a DFS code tree to systematically explore the search space. Each node in the tree represents a unique graph. Gspan tests for graph isomorphism by comparing minimum DFS codes, allowing it to prune redundant portions of the search space. An experimental evaluation showed it has good performance and scales well compared to previous methods.
Gspan is an algorithm for frequent subgraph mining that avoids two major costs of previous approaches. It represents graphs as depth-first search (DFS) codes to compare graphs for isomorphism testing. The algorithm grows patterns by extending edges in lexicographic order, checking the anti-monotonic property to prune infrequent subgraphs. Gspan compares the minimum DFS codes of two graphs to determine isomorphism, allowing simple string comparison of graphs. This helps reduce the problem size versus subgraph isomorphism testing.
240415_Thuy_Labseminar[Simple and Asymmetric Graph Contrastive Learning witho...thanhdowork
1. GraphACL is a self-supervised contrastive learning method for graph-structured data that aims to capture both one-hop neighborhood context and two-hop monophily without relying on homophily assumptions.
2. It introduces an additional prediction objective to encourage the encoder to learn representations that can predict neighboring node features, implicitly capturing neighborhood context.
3. GraphACL minimizes an upper bound on a contrastive loss to push node representations away from each other and avoid collapsed representations. It performs well on both heterophilic and homophilic graphs for node classification.
240115_Thanh_LabSeminar[Don't walk, skip! online learning of multi-scale netw...thanhdowork
This document proposes a new graph embedding algorithm called Walklets that explicitly learns multi-scale network representations. Walklets uses a "skipping" mechanism during random walks to capture structural information at different scales. It learns representations by optimizing a loss function via stochastic gradient descent. Evaluation on social networks shows Walklets outperforms baselines by better modeling multi-scale effects and scales to large graphs through sampling approximations.
240325_JW_labseminar[node2vec: Scalable Feature Learning for Networks].pptxthanhdowork
This document describes the node2vec algorithm for feature learning in networks. Node2vec uses random walks to sample the neighborhood of nodes in a network. It learns feature representations that maximize the likelihood of preserving network neighborhoods in a low-dimensional space. The algorithm introduces two parameters, p and q, that allow it to flexibly explore node neighborhoods. Experiments on real-world networks show node2vec produces high quality feature representations that achieve strong performance on tasks like multi-label classification and link prediction.
Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks, arXiv e-...ssuser2624f71
This document summarizes research on k-dimensional graph neural networks (k-GNNs), which are a generalization of graph neural networks (GNNs) based on the k-dimensional Weisfeiler-Leman graph isomorphism test. It presents the theoretical basis for k-GNNs, describes the k-GNN model and a hierarchical variant, and reports the results of experimental studies comparing k-GNNs to GNNs and kernel methods on several benchmark datasets. The research found that k-GNNs outperformed GNNs and were able to match the performance of kernel methods, demonstrating their ability to learn graph properties beyond what GNNs can represent.
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016MLconf
Multi-algorithm Ensemble Learning at Scale: Software, Hardware and Algorithmic Approaches: Multi-algorithm ensemble machine learning methods are often used when the true prediction function is not easily approximated by a single algorithm. The Super Learner algorithm, also known as stacking, combines multiple, typically diverse, base learning algorithms into a single, powerful prediction function through a secondary learning process called metalearning. Although ensemble methods offer superior performance over their singleton counterparts, there is an implicit computational cost to ensembles, as it requires training and cross-validating multiple base learning algorithms.
We will demonstrate a variety of software- and hardware-based approaches that lead to more scalable ensemble learning software, including a highly scalable implementation of stacking called “H2O Ensemble”, built on top of the open source, distributed machine learning platform, H2O. H2O Ensemble scales across multi-node clusters and allows the user to create ensembles of deep neural networks, Gradient Boosting Machines, Random Forest, and others. As for algorithm-based approaches, we will present two algorithmic modifications to the original stacking algorithm that further reduce computation time — Subsemble algorithm and the Online Super Learner algorithm. This talk will also include benchmarks of the implementations of these new stacking variants.
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16MLconf
Multi-algorithm Ensemble Learning at Scale: Software, Hardware and Algorithmic Approaches: Multi-algorithm ensemble machine learning methods are often used when the true prediction function is not easily approximated by a single algorithm. The Super Learner algorithm, also known as stacking, combines multiple, typically diverse, base learning algorithms into a single, powerful prediction function through a secondary learning process called metalearning. Although ensemble methods offer superior performance over their singleton counterparts, there is an implicit computational cost to ensembles, as it requires training and cross-validating multiple base learning algorithms.
We will demonstrate a variety of software- and hardware-based approaches that lead to more scalable ensemble learning software, including a highly scalable implementation of stacking called “H2O Ensemble”, built on top of the open source, distributed machine learning platform, H2O. H2O Ensemble scales across multi-node clusters and allows the user to create ensembles of deep neural networks, Gradient Boosting Machines, Random Forest, and others. As for algorithm-based approaches, we will present two algorithmic modifications to the original stacking algorithm that further reduce computation time — Subsemble algorithm and the Online Super Learner algorithm. This talk will also include benchmarks of the implementations of these new stacking variants.
Generalized Linear Models in Spark MLlib and SparkR by Xiangrui MengSpark Summit
Generalized linear models (GLMs) are a class of models that include linear regression, logistic regression, and other forms. GLMs are implemented in both MLlib and SparkR in Spark. They support various solvers like gradient descent, L-BFGS, and iteratively re-weighted least squares. Performance is optimized through techniques like sparsity, tree aggregation, and avoiding unnecessary data copies. Future work includes better handling of categoricals, more model statistics, and model parallelism.
Generalized Linear Models in Spark MLlib and SparkRDatabricks
Generalized linear models (GLMs) unify various statistical models such as linear regression and logistic regression through the specification of a model family and link function. They are widely used in modeling, inference, and prediction with applications in numerous fields. In this talk, we will summarize recent community efforts in supporting GLMs in Spark MLlib and SparkR. We will review supported model families, link functions, and regularization types, as well as their use cases, e.g., logistic regression for classification and log-linear model for survival analysis. Then we discuss the choices of solvers and their pros and cons given training datasets of different sizes, and implementation details in order to match R’s model output and summary statistics. We will also demonstrate the APIs in MLlib and SparkR, including R model formula support, which make building linear models a simple task in Spark. This is a joint work with Eric Liang, Yanbo Liang, and some other Spark contributors.
Recurrent Neural Networks have been shown to be very powerful models as they can propagate context over several time steps. Due to this they can be applied effectively for addressing several problems in Natural Language Processing, such as Language Modelling, Tagging problems, Speech Recognition etc. In this presentation we introduce the basic RNN model and discuss the vanishing gradient problem. We describe LSTM (Long Short Term Memory) and Gated Recurrent Units (GRU). We also discuss Bidirectional RNN with an example. RNN architectures can be considered as deep learning systems where the number of time steps can be considered as the depth of the network. It is also possible to build the RNN with multiple hidden layers, each having recurrent connections from the previous time steps that represent the abstraction both in time and space.
The document summarizes research on locally densest subgraph discovery. It discusses limitations of prior work that focuses on finding only the single densest subgraph or top-k dense subgraphs through a greedy approach. This may fail to fully characterize the graph's dense regions. The paper proposes defining a locally densest subgraph as one that is maximally ρ-compact, meaning it is connected and removal of nodes removes at least ρ times as many edges, ensuring it is not contained within a better subgraph. This formal definition can better represent different dense regions for applications like community detection.
Graph Techniques for Natural Language ProcessingSujit Pal
Natural Language embodies the human ability to make “infinite use of finite means” (Humboldt, 1836; Chomsky, 1965). A relatively small number of words can be combined using a grammar in myriad different ways to convey all kinds of information. Languages model inter-relationships between their words, just like graphs model inter-relationships between their vertices. It is not surprising then, that graphs are a natural tool to study Natural Language and glean useful information from it, automatically, and at scale. This presentation will focus on NLP techniques to convert raw text to graphs, and present Graph Theory based solutions to some common NLP problems. Solutions presented will use Apache Spark or Neo4j depending on problem size and scale. Examples of Graph Theory solutions presented include PageRank for Document Summarization, Link Prediction from raw text for Knowledge Graph enhancement, Label Propagation for entity classification, and Random Walk techniques to find similar documents.
This document discusses Lempel-Ziv algorithms for lossless data compression. It introduces the LZ77 and LZ78 algorithms, which use adaptive dictionaries to encode repeated patterns in data. The document describes how LZ77 uses a sliding window to find the longest matching string and represents it with an offset-length pair. It also explains how LZ78 builds an explicit dictionary as it encodes. The document provides examples and discusses improvements made to these original Lempel-Ziv algorithms.
Machine Learning workshop by GDSC Amity University ChhattisgarhPoorabpatel
The document discusses various machine learning techniques for image classification, including clustering strategies, feature extraction, and classifiers. It provides examples of k-means clustering, agglomerative clustering, mean-shift clustering, spectral clustering, bag-of-features representations, nearest neighbor classification, linear and nonlinear support vector machines (SVMs). SVMs are discussed in more detail, covering how they can learn nonlinear decision boundaries using the kernel trick, common kernel functions for images, and pros and cons of SVMs for classification.
The document provides an overview of deep learning concepts and techniques for natural language processing tasks. It includes the following:
1. A schedule for a deep learning workshop covering fundamentals of deep learning for machine translation, word embeddings, neural language models, and neural machine translation.
2. Descriptions of neural networks, activation functions, backpropagation, and word embeddings.
3. Details about feedforward neural network language models, recurrent neural network language models, and how they are applied to tasks like language modeling and machine translation.
4. An explanation of attention-based encoder-decoder models for neural machine translation.
NS-CUK Seminar: V.T.Hoang, Review on "GOAT: A Global Transformer on Large-sca...ssuser4b1f48
This document presents GOAT, a scalable global transformer model for graph-structured data. GOAT uses a novel local attention module to absorb rich local information from node neighborhoods, in addition to a global attention mechanism that allows each node to attend to all other nodes. The document reports that GOAT achieves strong performance on large-scale homophilous and heterophilous node classification benchmarks, demonstrating its ability to leverage both local and global graph information for prediction tasks. Ablation studies on codebook size further indicate GOAT's effectiveness at modeling long-range interactions through its global attention.
Similar to NS-CUK Seminar: H.B.Kim, Review on "subgraph2vec: Learning Distributed Representations of Rooted Sub-graphs from Large Graphs", 2016
NS-CUK Seminar: H.B.Kim, Review on "Cluster-GCN: An Efficient Algorithm for ...ssuser4b1f48
This document summarizes the Cluster-GCN method for training graph convolutional networks (GCNs) in a memory-efficient and scalable way. The key contributions of Cluster-GCN are that it achieves the best memory usage for training GCNs on large graphs, especially deep GCNs, while maintaining training speed comparable to or faster than existing methods. Experimental results demonstrate that Cluster-GCN can efficiently train very deep GCNs on large graphs and achieve state-of-the-art performance.
This document summarizes a research paper on Gated Graph Sequence Neural Networks (GGSNN). GGSNN is a model that incorporates time dependencies and higher-order relationships in graphs using GRU-based methods. It generates an output sequence to allow for graph-level analysis. The model can be used for a wide range of tasks involving logical formulas. It uses GRUs and computes gradients via backpropagation through time, allowing it to capture long-term dependencies between output time steps. Node representations in GGSNN can be updated over time using label data, unlike previous graph neural networks.
NS-CUK Journal club: H.E.Lee, Review on " A biomedical knowledge graph-based ...ssuser4b1f48
1) The document proposes a deep learning framework called DeepLGF to predict drug-drug interactions by combining local and global feature extraction from biomedical knowledge graphs.
2) DeepLGF uses graph neural networks and knowledge graph embedding methods to extract local drug features from chemical structures and biological functions, and global features from the relationships between drugs and other biological entities.
3) Experimental results on prediction tasks using several drug interaction datasets demonstrate that DeepLGF outperforms other state-of-the-art models and has promising applications in drug development and clinical use.
NS-CUK Seminar: H.B.Kim, Review on "Inductive Representation Learning on Lar...ssuser4b1f48
1. The document summarizes the GraphSAGE framework for inductive node embedding proposed by Hamilton et al.
2. GraphSAGE leverages node features to learn an embedding function that generalizes to unseen nodes using a sample and aggregate approach.
3. Across citation, Reddit, and other datasets, GraphSAGE improves classification F1-scores by 51% on average compared to using node features alone and outperforms strong baselines.
NS-CUK Seminar: J.H.Lee, Review on "Relational Self-Supervised Learning on Gr...ssuser4b1f48
This document proposes a new self-supervised learning framework called Relational Graph Representation Learning (RGRL). RGRL aims to learn node representations that preserve relationships between nodes even after augmentation. It does this by focusing training on low-degree nodes and using both global and local contexts to sample anchor nodes. Experiments on 14 real-world datasets show RGRL outperforms previous methods on tasks like node classification and link prediction.
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
NS-CUK Seminar: H.B.Kim, Review on "subgraph2vec: Learning Distributed Representations of Rooted Sub-graphs from Large Graphs", 2016
1. Ho-Beom Kim
Network Science Lab
Dept. of Mathematics
The Catholic University of Korea
E-mail: hobeom2001@catholic.ac.kr
2023 / 07 / 03
Annamalai Narayanan, et al.
2. 2
Introduction
• Problem statements
• Contributions
Methodology
Related work
Experiments
Results
Discussion
Conclusion
3. 3
1. Introduction
Contributions
• Propose subgraph2vec, an unsupervised representation learning technique to learn latent
representations of rooted subgraphs present in large graphs
• Develop a modified version of the skipgram language model which is capable of modeling varying-length radial contexts around target subgraphs
• Subgraph2vec’s representation learning technique would help build the deep learning variant of the WL kernel
• Demonstrate that subgraph2vec could significantly outperform state-of-the-art approaches
4. 4
1. Introduction
Limitations of Existing Graph kernels
• (L1) Structural similarity
• Substructures that are used to compute the kernel matrix are not independent.
• (L2) Diagonal Dominance
• Since graph kernels regard these substructures as separate features, the dimensionality of the feature space often grows exponentially with the number of substructures.
• Only a few substructures will be common across graphs.
• This leads to diagonal dominance, that is, a given graph is similar to itself but not to any other graph in the dataset.
• This leads to poor classification/clustering accuracy.
5. 5
1. Introduction
Existing Solution
• DGK : Deep Graph Kernels
• K(G, G′) = Φ(G)^T M Φ(G′)
• M represents a |V| × |V| positive semi-definite matrix that encodes the relationships between substructures, where V represents the vocabulary of substructures obtained from the training data
• The M matrix respects the similarity of the substructure space
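As a rough illustration of the kernel computation above, here is a minimal numpy sketch; the substructure counts and the matrix M are toy values invented for this example, not the paper's data:

```python
import numpy as np

# Toy vocabulary of 4 substructures; Φ(G) counts how often each occurs in G.
phi_g1 = np.array([2.0, 0.0, 1.0, 3.0])   # substructure counts for graph G1
phi_g2 = np.array([1.0, 1.0, 0.0, 2.0])   # substructure counts for graph G2

# M is a |V| x |V| positive semi-definite matrix encoding substructure similarity.
# Built here as E^T E from random substructure embeddings E, which guarantees PSD.
E = np.random.default_rng(0).normal(size=(8, 4))
M = E.T @ E

# Deep graph kernel: K(G1, G2) = Φ(G1)^T M Φ(G2)
k = phi_g1 @ M @ phi_g2
print(k)
```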
6. 6
1. Introduction
Related Work
• DeepWalk and node2vec intend to learn node embeddings by generating random walks in a single graph.
• Both these works rely on existence of node labels for at least a small portion of nodes and take a
semi-supervised approach to learn node embeddings.
• Subgraph2vec learns subgraph embeddings in an unsupervised manner
• Graph kernels’ categories:
• Kernels for limited-size subgraphs
• Kernels based on subtree patterns
• Kernels based on walks and paths
• Subgraph2vec is complementary to these existing graph kernels where the substructures exhibit
reasonable similarities among them
7. 7
1. Introduction
Problem Statements
• Consider the problem of learning distributed representations of rooted subgraphs from a given set of
graphs
• G = (V, E, λ)
• sg = (V_sg, E_sg, λ_sg)
• sg is a subgraph of G iff there exists an injective mapping μ : V_sg → V s.t. (v1, v2) ∈ E_sg iff (μ(v1), μ(v2)) ∈ E
• 𝒢 = {G1, G2, …, Gn} : a set of graphs
• D : a positive integer (the maximum degree of the rooted subgraphs to be considered)
8. 8
4. Background : Language Models
Traditional language models
• Traditional language models determine the likelihood of a sequence of words appearing in a corpus.
• Pr(w_t | w_1, …, w_{t−1})
• Estimate the likelihood of observing the target word w_t given the n previous words (w_1, …, w_{t−1}) observed thus far
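To make the conditional factorization concrete, here is a minimal sketch that scores a sequence under a smoothed bigram model; the toy corpus and the add-alpha smoothing constant are illustrative assumptions, not taken from the slides:

```python
from collections import Counter

corpus = "the cat sat on the mat . the dog sat on the rug .".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
V = len(unigrams)

def bigram_prob(w_prev, w, alpha=1.0):
    # Add-alpha smoothed estimate of Pr(w_t | w_{t-1})
    return (bigrams[(w_prev, w)] + alpha) / (unigrams[w_prev] + alpha * V)

# Likelihood of a sequence as a product of conditional probabilities
sentence = "the cat sat on the rug".split()
p = 1.0
for w_prev, w in zip(sentence, sentence[1:]):
    p *= bigram_prob(w_prev, w)
print(p)
```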
9. 9
4. Background : Language Models
Neural language models
• The recently developed neural language models focus on learning distributed vector representations of words
• These models improve traditional n-gram models by using vector embeddings for words
• Neural language models exploit the notion of context, where a context is defined as a fixed number of words surrounding the target word
• Objective: Σ_{t=1}^{T} log Pr(w_t | w_1, …, w_{t−1})
• w_1, …, w_{t−1} are the context of the target word w_t
10. 10
4. Background : Language Models
Skip Gram
• The skipgram model maximizes co-occurrence probability among the words that appear within a given
context window.
• Given a context window of size c and the target word w_t, the skipgram model attempts to predict the words that appear in the context of the target word, (w_{t−c}, …, w_{t+c}).
• Objective: Σ_{t=1}^{T} log Pr(w_{t−c}, …, w_{t+c} | w_t)
• Pr(w_{t−c}, …, w_{t+c} | w_t) is computed as Π_{−c ≤ j ≤ c, j ≠ 0} Pr(w_{t+j} | w_t)
• Pr(w_{t+j} | w_t) = exp(Φ_{w_t}^T Φ′_{w_{t+j}}) / Σ_{w=1}^{|V|} exp(Φ_{w_t}^T Φ′_w)
• Φ_w and Φ′_w : the input and output vectors of word w
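A minimal numpy sketch of this softmax, using a toy vocabulary and random embeddings (purely illustrative values, not a trained model):

```python
import numpy as np

rng = np.random.default_rng(0)
V, dim = 10, 5                       # vocabulary size and embedding dimension
Phi = rng.normal(size=(V, dim))      # input vectors Φ_w
Phi_out = rng.normal(size=(V, dim))  # output vectors Φ'_w

def skipgram_prob(target, context):
    # Pr(w_context | w_target) = exp(Φ_target · Φ'_context) / Σ_w exp(Φ_target · Φ'_w)
    scores = Phi_out @ Phi[target]
    scores -= scores.max()           # subtract max for numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[context]

print(skipgram_prob(target=3, context=7))
```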
11. 11
4. Background : Language Models
Negative Sampling
• Negative sampling selects the words that are not in the context at random instead of considering all
words in the vocabulary.
• If a word w appears in the context of another word w′, then the vector embedding of w is closer to
that of w′ compared to any other randomly chosen word from the vocabulary.
• the learned word embeddings preserve semantics
• We can utilize word embedding models to learn dimensions of similarity between subgraphs,
• so that similar subgraphs will be close to each other in the embedding space.
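A hedged sketch of the per-pair negative-sampling loss in its standard logistic formulation (toy embeddings; negatives are drawn uniformly here, whereas real implementations typically use a frequency-based noise distribution):

```python
import numpy as np

rng = np.random.default_rng(1)
V, dim = 10, 5
Phi = rng.normal(size=(V, dim))      # input vectors
Phi_out = rng.normal(size=(V, dim))  # output vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_loss(target, context, k=3):
    # The positive pair should score high; k random non-context words should score low.
    candidates = [w for w in range(V) if w not in (target, context)]
    negatives = rng.choice(candidates, size=k, replace=False)
    loss = -np.log(sigmoid(Phi[target] @ Phi_out[context]))
    loss -= sum(np.log(sigmoid(-Phi[target] @ Phi_out[n])) for n in negatives)
    return loss

print(neg_sampling_loss(target=3, context=7))
```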
12. 12
5. Method
Learning Sub-Graph Representations
• Similar to the language modeling convention, the only required input is a corpus and a vocabulary of subgraphs for subgraph2vec to learn representations.
• Given a dataset of graphs, subgraph2vec considers all the neighbourhoods of rooted subgraphs around
every rooted subgraph as its corpus, and set of all rooted subgraphs around every node in every graph
as its vocabulary.
• Following the language model training process with the subgraphs and their contexts, subgraph2vec
learns the intended subgraph embeddings.
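In code, that setup might look like the following sketch. Here `sg(g, v, d)` is a hypothetical helper returning the degree-d rooted subgraph label around node v (one concrete realization via WL relabeling is sketched after slide 15), and graphs are assumed to be adjacency dicts:

```python
def build_vocab_and_corpus(graphs, sg, D):
    # graphs: list of adjacency dicts {node: [neighbours]} (assumed representation)
    # sg(g, v, d): hypothetical helper giving the degree-d rooted subgraph at node v
    vocab, corpus = set(), []
    for g in graphs:
        for v in g:
            for d in range(D + 1):
                target = sg(g, v, d)
                vocab.add(target)
                # Context: rooted subgraphs around v's neighbours (the radial context)
                context = [sg(g, u, d) for u in g[v]]
                corpus.append((target, context))
    return vocab, corpus
```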
13. 13
5. Method
Algorithm : subgraph2vec
• The algorithm consists of two main components
• First, a procedure to generate rooted subgraphs around every node in a given graph
• Second, a procedure to learn embeddings of those subgraphs
• Learn δ-dimensional embeddings of subgraphs (up to degree D) from all the graphs in dataset 𝒢 in e epochs. We begin by building a vocabulary of all the subgraphs.
14. 14
5. Method
Algorithm : subgraph2vec
• The algorithm consists of two main components
• First, a procedure to generate rooted subgraphs around every node in a given graph
• Second, a procedure to learn embeddings of those subgraphs
• Learn δ-dimensional embeddings of subgraphs (up to degree D) from all the graphs in dataset 𝒢 in e epochs. We begin by building a vocabulary of all the subgraphs.
• Then the embeddings for all subgraphs in the vocabulary (Φ) are initialized randomly
• We proceed with learning the embeddings over several epochs, iterating over the graphs in 𝒢.
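A simplified version of that training loop, as a sketch rather than the paper's reference implementation; only the positive skipgram update is shown here, with the negative-sampling update deferred to the sketch after slide 20:

```python
import numpy as np

def train_subgraph2vec(corpus, vocab, delta=32, epochs=5, lr=0.025, seed=0):
    # corpus: iterable of (target_subgraph, [context_subgraphs]) pairs
    rng = np.random.default_rng(seed)
    idx = {s: i for i, s in enumerate(sorted(vocab))}
    Phi = rng.normal(scale=0.1, size=(len(vocab), delta))      # random init
    Phi_out = rng.normal(scale=0.1, size=(len(vocab), delta))
    for _ in range(epochs):
        for target, context in corpus:
            t = idx[target]
            for c in (idx[s] for s in context):
                # One positive SGD step of the skipgram objective
                score = 1.0 / (1.0 + np.exp(-(Phi[t] @ Phi_out[c])))
                g = lr * (1.0 - score)
                phi_t = Phi[t].copy()
                Phi[t] += g * Phi_out[c]
                Phi_out[c] += g * phi_t
    return Phi, idx
```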
15. 15
5. Method
Extracting Rooted Subgraphs
• To extract these subgraphs, we follow the well-known WL relabeling process which lays the basis for the
WL kernel and WL test of graph isomorphism
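A minimal sketch of such a relabeling on an adjacency-dict graph; labels here are readable concatenations of sorted neighbour labels, where a real implementation would typically compress them (e.g., by hashing), as the WL process does:

```python
def wl_rooted_subgraphs(adj, labels, D):
    # adj: {node: [neighbours]}, labels: {node: initial node label}
    # Returns sg, where sg[d][v] is the degree-d rooted subgraph label at node v.
    sg = [dict(labels)]
    for d in range(1, D + 1):
        prev = sg[-1]
        sg.append({
            v: prev[v] + "~" + "|".join(sorted(prev[u] for u in adj[v]))
            for v in adj
        })
    return sg

# Toy triangle graph with initial node labels
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
labels = {0: "A", 1: "B", 2: "B"}
print(wl_rooted_subgraphs(adj, labels, D=2)[1])
```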
16. 16
5. Method
Radial Skipgram – Modeling the radial context
• Unlike words in a traditional text corpus, subgraphs do not have a linear co-occurrence relationship.
• We consider the breadth-first neighbours of the root node as its context, as this directly follows from the definition of the WL relabeling process.
• Define the context of a degree-d subgraph sg_v^(d) rooted at v as the multiset of subgraphs of degrees d−1, d, and d+1 rooted at each of the neighbors of v (lines 2-6 in Algorithm 3)
• Subgraphs of degrees d−1, d, and d+1 are considered to be in the context of a subgraph of degree d
• because a degree-d subgraph is likely to be rather similar to subgraphs of degrees that are closer to d
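Reusing `wl_rooted_subgraphs`, `adj`, and `labels` from the previous sketch, the radial context could be assembled as follows, with the degree offsets clipped at the boundaries 0 and D:

```python
def radial_context(sg, adj, v, d, D):
    # Multiset of subgraphs of degrees d-1, d, d+1 rooted at each neighbour of v
    context = []
    for u in adj[v]:
        for k in (d - 1, d, d + 1):
            if 0 <= k <= D:
                context.append(sg[k][u])
    return context

sg = wl_rooted_subgraphs(adj, labels, D=2)
print(radial_context(sg, adj, v=0, d=1, D=2))
```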
17. 17
5. Method
Radial Skipgram – Vanilla Skip Gram
• the vanilla skipgram language model captures fixed-length linear contexts over the words in a given
sentence.
• For learning a subgraph’s radial context, the vanilla skipgram model cannot be used.
18. 18
5. Method
Radial Skipgram – Modification
• The subgraph vocabulary could number in the several thousands/millions in the case of large graphs.
• Training such models would require a large amount of computational resources.
• To alleviate this bottleneck, we approximate the probability distribution using the negative sampling
approach.
19. 19
5. Method
Negative sampling
• Since sg_cont ∈ SG_vocab and SG_vocab is very large, calculating Pr(sg_cont | Φ(sg_v^(d))) is prohibitively expensive
• We follow the negative sampling strategy to calculate the above-mentioned posterior probability
• In every training cycle of Algorithm 3, we choose a fixed number of subgraphs (denoted negsamples) as negative samples and update their embeddings as well
• Negative samples adhere to the following conditions:
• If negsamples = {sg_neg1, sg_neg2, …}, then negsamples ⊂ SG_vocab, |negsamples| ≪ |SG_vocab|, and negsamples ∩ context_v^(d) = {}
• This makes Φ(sg_v^(d)) closer to the embeddings of all the subgraphs in its context and, at the same time, distances it from the embeddings of a fixed number of subgraphs that are not in its context
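Those conditions might be enforced with a sketch like this (hypothetical helper names; negatives drawn uniformly from the vocabulary while excluding the current context):

```python
import random

def sample_negatives(vocab, context, n, seed=0):
    # negsamples ⊂ SG_vocab, |negsamples| << |SG_vocab|, negsamples ∩ context = {}
    forbidden = set(context)
    candidates = [s for s in vocab if s not in forbidden]
    return random.Random(seed).sample(candidates, min(n, len(candidates)))
```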
20. 20
5. Method
Optimization
• Stochastic gradient descent (SGD) optimizer is used to optimize these parameters
• Derivatives are estimated using the back-propagation algorithm.
• The learning rate α is empirically tuned.
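The corresponding per-pair update, including the negative-sampling step omitted from the earlier training-loop sketch (a sketch of the logistic-loss gradients, with α as the empirically tuned learning rate):

```python
import numpy as np

def sgd_step(Phi, Phi_out, t, context_ids, negative_ids, alpha=0.025):
    # Pull Φ(sg_v^(d)) toward its context embeddings (label 1) and push it
    # away from the sampled negative embeddings (label 0).
    pairs = [(c, 1.0) for c in context_ids] + [(n, 0.0) for n in negative_ids]
    for c, label in pairs:
        score = 1.0 / (1.0 + np.exp(-(Phi[t] @ Phi_out[c])))
        g = alpha * (label - score)     # gradient of the logistic loss
        phi_t = Phi[t].copy()
        Phi[t] += g * Phi_out[c]
        Phi_out[c] += g * phi_t
```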
21. 21
5. Method
Relation to Deep WL kernel
• Each of the subgraphs in SG_vocab is obtained using the WL re-labelling strategy, and hence represents the WL neighbourhood labels of a node.
• Hence learning latent representations of such subgraphs amounts to learning representations of WL neighbourhood labels.
• Therefore, once the embeddings of all the subgraphs in SG_vocab are learnt using Algorithm 1, one could use them to build the deep learning variant of the WL kernel among the graphs in 𝒢.
24. 24
6. Evaluation
Results and Discussion
• Accuracy
• SVMs with subgraph2vec’s embeddings achieve better accuracy on 3 datasets and comparable
accuracy on the remaining 2 datasets
25. 25
6. Evaluation
Results and Discussion
• Efficiency
• It is important to note that classification on these benchmark datasets is much simpler than real-world classification tasks.
• By using trivial features such as the number of nodes in a graph, accuracies comparable to the SOTA graph kernels have been achieved.
26. 26
7. Conclusion
Evaluation
• Presented subgraph2vec, an unsupervised representation learning technique to learn embeddings of rooted subgraphs that exist in large graphs
• Through large-scale experiments involving benchmark and real-world graph classification and clustering datasets,
• We demonstrate that subgraph embeddings learnt by our approach could be used in conjunction with classifiers such as CNNs, SVMs, and relational data clustering algorithms to achieve significantly superior accuracies