
We present SEMAC, a new graph representation learning approach that jointly exploits fine-grained node features and the overall graph topology. In contrast to the SGNS and SVD methods used in previous representation-based studies, our model represents nodes via subgraph embeddings obtained through a form of convex matrix completion that iteratively reduces the rank and thereby more effectively eliminates noise in the representation. Subgraph embeddings and convex matrix completion are thus integrated into a novel link prediction framework.

Locally densest subgraph discovery

The document summarizes research on locally densest subgraph discovery. It discusses limitations of prior work that finds only the single densest subgraph, or top-k dense subgraphs via a greedy approach, which may fail to fully characterize a graph's dense regions. The paper defines a locally densest subgraph as one that is maximally ρ-compact: it is connected, removing any set of nodes removes at least ρ times that many edges, and it is not contained in any better subgraph. This formal definition can better represent distinct dense regions for applications like community detection.

240408_JW_labseminar[Asymmetric Transitivity Preserving Graph Embedding].pptx

This document presents a method for high-order proximity preserved embedding (HOPE) to learn embedding vectors that capture the asymmetric transitivity of directed graphs. HOPE derives a general formulation for commonly used high-order proximity measurements and approximates them using generalized SVD, enabling scalable embedding of large graphs. Experimental results on citation, social media, and synthetic networks demonstrate that HOPE more accurately approximates proximities and outperforms baselines in tasks like link prediction and vertex recommendation by preserving asymmetric transitivity in the embeddings.
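
To make the kind of high-order proximity HOPE preserves concrete, the Katz index can be computed as a truncated power series over the adjacency matrix; the plain-Python sketch below (dense list-of-lists matrix, illustrative β and iteration count, not the paper's generalized-SVD machinery) also shows why the measure is asymmetric on directed graphs:

```python
def katz(adj, beta=0.1, iters=10):
    """Truncated Katz proximity S = sum_{k>=1} beta^k A^k for a small
    directed graph given as a dense 0/1 adjacency matrix."""
    n = len(adj)
    S = [[0.0] * n for _ in range(n)]
    Ak = [row[:] for row in adj]          # current power A^k, starting at k = 1
    scale = beta
    for _ in range(iters):
        for i in range(n):
            for j in range(n):
                S[i][j] += scale * Ak[i][j]
        # advance the series: Ak <- Ak @ A, scale <- beta^(k+1)
        Ak = [[sum(Ak[i][m] * adj[m][j] for m in range(n)) for j in range(n)]
              for i in range(n)]
        scale *= beta
    return S

# Edge 0 -> 1 only: proximity from 0 to 1 is positive, 1 to 0 is zero.
S = katz([[0, 1], [0, 0]])
```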

Pinsage Stanford slides.pdf

This document summarizes a lecture on graph neural network models for recommendation systems. It discusses three graph neural network models: PinSage for item recommendation on Pinterest, Decagon for heterogeneous graphs, and GCPN for goal-directed generation. PinSage is described in detail, including how it uses graph convolutions to generate embeddings for items, importance sampling to select neighbors, training on positive and hard negative pairs, and its performance gains over baseline methods for related item recommendation.

1 chayes

This document summarizes a talk on algorithms that use locality to solve network problems efficiently. It discusses how limitations on network visibility require local algorithms that make sequential decisions using limited information. It presents local algorithms for preferential attachment networks and general graphs that solve problems like finding high-degree nodes and computing minimum dominating sets. It also describes how locality can enable sublinear-time algorithms for estimating PageRank values and solving influence maximization problems in viral marketing models. The talk outlines techniques like multiscale analysis and sparse matrix methods that allow computing PageRank summaries and influential nodes faster than previous methods.

240325_JW_labseminar[node2vec: Scalable Feature Learning for Networks].pptx

This document describes the node2vec algorithm for feature learning in networks. Node2vec uses random walks to sample the neighborhood of nodes in a network. It learns feature representations that maximize the likelihood of preserving network neighborhoods in a low-dimensional space. The algorithm introduces two parameters, p and q, that allow it to flexibly explore node neighborhoods. Experiments on real-world networks show node2vec produces high quality feature representations that achieve strong performance on tasks like multi-label classification and link prediction.
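
The p/q-biased second-order walk at the heart of node2vec can be sketched in a few lines of plain Python (the adjacency dict and parameter values are illustrative, and the Skip-Gram step that consumes the walks is omitted):

```python
import random

def node2vec_walk(adj, start, length, p, q, rng):
    """One second-order biased walk: weight 1/p for returning to the
    previous node, 1 for nodes adjacent to it, 1/q for moving outward
    (interpolating between BFS-like and DFS-like exploration)."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        if not adj[cur]:
            break
        if len(walk) == 1:
            walk.append(rng.choice(adj[cur]))   # first step is unbiased
            continue
        prev = walk[-2]
        weights = []
        for nxt in adj[cur]:
            if nxt == prev:
                weights.append(1.0 / p)         # return to previous node
            elif nxt in adj[prev]:
                weights.append(1.0)             # stays at distance 1 from prev
            else:
                weights.append(1.0 / q)         # moves to distance 2 from prev
        walk.append(rng.choices(adj[cur], weights=weights)[0])
    return walk
```

Setting p high and q low pushes the walk outward (DFS-like, capturing community structure); the reverse keeps it local (BFS-like, capturing structural roles).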

NS-CUK Seminar: H.B.Kim, Review on "subgraph2vec: Learning Distributed Repre...

NS-CUK Seminar: H.B.Kim, Review on "subgraph2vec: Learning Distributed Representations of Rooted Sub-graphs from Large Graphs", 2016

DeepWalk: Online Learning of Representations

DeepWalk is an approach to learn latent representations of graphs by treating short random walks as sentences in a language modeling framework. It learns the representations by predicting context vertices using the Skip-Gram model. DeepWalk represents each vertex in a graph as a d-dimensional feature vector learned from random walks in the graph. It scales to large graphs and outperforms other methods on network classification tasks using representations as features.
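
The corpus-generation half of DeepWalk is simple enough to sketch directly (plain Python; the adjacency dict and walk parameters are illustrative, and the Skip-Gram training that turns walks into d-dimensional vectors is left out):

```python
import random

def deepwalk_corpus(adj, walks_per_node, walk_len, seed=0):
    """Generate truncated random walks; each walk is later fed to
    Skip-Gram as if it were a sentence of vertex 'words'."""
    rng = random.Random(seed)
    corpus = []
    for _ in range(walks_per_node):
        order = list(adj)
        rng.shuffle(order)                # one shuffled pass over all vertices
        for start in order:
            walk = [start]
            while len(walk) < walk_len and adj[walk[-1]]:
                walk.append(rng.choice(adj[walk[-1]]))
            corpus.append([str(v) for v in walk])
    return corpus

adj = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
corpus = deepwalk_corpus(adj, walks_per_node=2, walk_len=5)
```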

Creating Community at WeWork through Graph Embeddings with node2vec - Karry Lu

This document discusses how WeWork is using graph embeddings and the node2vec algorithm to power member recommendations. It first describes WeWork's member knowledge graph that contains data on members' profiles, interactions, interests and skills. It then explains how node2vec can learn vector representations of each member node that capture similarities, which can be used for recommendations. WeWork runs node2vec on the social graph of each location to map members to vectors and identify the most similar members to power recommendations like onboarding suggestions and introductions between members.

NS-CUK Seminar:H.B.Kim, Review on "Asymmetric transitivity preserving graph ...

This document describes a method called High-Order Proximity preserved Embedding (HOPE) to preserve asymmetric transitivity in directed graph embedding. HOPE approximates high-order proximities using a scalable generalized SVD algorithm. Experiments on several datasets show HOPE more accurately reconstructs graphs compared to baselines and better preserves proximity measurements like Katz index and rooted PageRank. Future work will explore nonlinear models to capture more complex directed graph structures.

J. Park, J. Song, ICLR 2022, MLILAB, KAISTAI

This paper proposes GraphENS, a method to synthesize ego networks to address the neighbor memorization problem that causes GNN models to overfit to minor classes in class-imbalanced node classification tasks. GraphENS samples ego networks from minor and target classes, assigns neighbors through sampling, mixes node features based on saliency, and attaches the synthesized ego network to the original graph to construct a balanced graph. Experiments show GraphENS mitigates both node and neighbor memorization, outperforming baselines on citation and co-purchase networks.

Sigmod11 outsource shortest path

The document summarizes research on computing the shortest distance between nodes in a graph while outsourcing the graph to the cloud for computational power and maintaining privacy of the data. The researchers propose (1) transforming the original graph into 1-neighborhood-d-radius graphs for outsourcing, (2) using a greedy algorithm to perform the transformation to minimize overhead, and (3) allowing approximate distances to further reduce overhead. Experiments demonstrate their methods achieve the security and privacy goals with low overhead.

PageRank on an evolving graph - Yanzhao Yang : NOTES

Highlighted notes taken while researching with Prof. Dip Sankar Banerjee and Prof. Kishore Kothapalli:
PageRank on an evolving graph - Yanzhao Yang.
https://theory.utdallas.edu/seminar/G2S13/YY/Pagerank%20on%20evolving%20graph-Yanzhao%20Yang.pdf
PageRank on an evolving graph may always be imprecise, because the up-to-date, complete graph is never fully known: millions of hyperlinks and social links are modified each day. Which portions of the web should a crawler focus on most (the probing strategy)? Probing techniques discussed: random probing; round-robin probing; proportional probing (randomized, proportional to a node's PageRank); priority probing (deterministic, pick the node with the highest cumulative PageRank sum); hybrid probing (proportional + round-robin).
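
Under stated assumptions (a dict of current PageRank estimates; the function names and the credit-reset detail are mine), the proportional and priority strategies might look like:

```python
import random

def proportional_probe(pagerank, rng):
    """Randomized: probe one node with probability proportional to its
    current PageRank estimate."""
    nodes = list(pagerank)
    return rng.choices(nodes, weights=[pagerank[n] for n in nodes])[0]

def priority_probe(pagerank, credit):
    """Deterministic: accumulate each node's PageRank as 'credit' and
    probe the node with the largest cumulative sum, then reset it."""
    for n in pagerank:
        credit[n] = credit.get(n, 0.0) + pagerank[n]
    best = max(credit, key=credit.get)
    credit[best] = 0.0
    return best
```

A hybrid scheme would simply alternate between one of these and a round-robin sweep, so that low-rank nodes are still probed eventually.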

Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)

Universitat Politècnica de Catalunya

The document discusses content-based image retrieval. It begins with an overview of the problem of using a query image to retrieve similar images from a large dataset. Common techniques discussed include using SIFT features with bag-of-words models or convolutional neural network (CNN) features. The document outlines the classic SIFT retrieval pipeline and techniques for using features from pre-trained CNNs, such as max-pooling features from convolutional layers or encoding them with VLAD. It also discusses learning image representations specifically for retrieval using methods like the triplet loss to learn an embedding space that clusters similar images. The state-of-the-art methods achieve the best performance by learning global or regional image representations from CNNs trained on large, generated datasets.

Lens: Data exploration with Dask and Jupyter widgets

Lens is an open source Python library for automated data exploration of large datasets using Dask. It computes summary statistics and relationships between columns in a dataset. The results are serialized to JSON for interactive exploration through Jupyter widgets or a web UI. Dask allows the computations to run in parallel across a cluster for scalability. Lens integrates with the SherlockML platform to analyze all datasets uploaded.

Breadth First Search Algorithm In 10 Minutes | Artificial Intelligence Tutori...

YouTube Link: https://youtu.be/PbCl67GY1ck
** Machine Learning Engineer Masters Program: https://www.edureka.co/masters-program/machine-learning-engineer-training **
In this Edureka Session on Breadth-First Search Algorithm, we will discuss the logic behind graph traversal methods and use examples to understand the working of the Breadth-First Search algorithm.
Here’s a list of topics covered in this session:
1. Introduction To Graph Traversal
2. What is the Breadth-First Search?
3. Understanding the Breadth-First Search algorithm with an example
4. Breadth-First Search Algorithm Pseudocode
5. Applications Of Breadth-First Search
Follow us to never miss an update in the future.
YouTube: https://www.youtube.com/user/edurekaIN
Instagram: https://www.instagram.com/edureka_learning/
Facebook: https://www.facebook.com/edurekaIN/
Twitter: https://twitter.com/edurekain
LinkedIn: https://www.linkedin.com/company/edureka
Castbox: https://castbox.fm/networks/505?country=in
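
The pseudocode covered in the session boils down to a FIFO queue plus a visited set; a minimal Python version (the graph literal is illustrative):

```python
from collections import deque

def bfs(graph, source):
    """Return nodes in the order BFS visits them from `source`:
    level by level, nearest first."""
    visited = {source}
    order = []
    queue = deque([source])
    while queue:
        node = queue.popleft()          # FIFO: oldest frontier node first
        order.append(node)
        for nb in graph[node]:
            if nb not in visited:
                visited.add(nb)
                queue.append(nb)
    return order

g = {'A': ['B', 'C'], 'B': ['A', 'D'], 'C': ['A', 'D'], 'D': ['B', 'C']}
print(bfs(g, 'A'))   # → ['A', 'B', 'C', 'D']
```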

SVD and the Netflix Dataset

Short summary and explanation of LSI (SVD) and how it can be applied to recommendation systems and the Netflix dataset in particular.
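
A minimal sketch of the matrix-factorization flavour of SVD popularised by the Netflix Prize (often called "Funk SVD"), trained by SGD on observed ratings only; the hyperparameters and toy rating triples are illustrative, not tuned values:

```python
import random

def funk_svd(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02,
             epochs=500, seed=0):
    """Learn rank-k user and item factors from (user, item, rating)
    triples; a dot product of the factors predicts unseen ratings."""
    rng = random.Random(seed)
    U = [[rng.uniform(0.1, 0.9) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.uniform(0.1, 0.9) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(U[u][f] * V[i][f] for f in range(k))
            for f in range(k):
                uf, vf = U[u][f], V[i][f]
                U[u][f] += lr * (err * vf - reg * uf)   # L2-regularized SGD step
                V[i][f] += lr * (err * uf - reg * vf)
    return U, V

ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0), (1, 1, 2.0)]
U, V = funk_svd(ratings, n_users=2, n_items=2)
```

Unlike a true SVD, this never materializes the (mostly missing) full rating matrix, which is what made it practical for the Netflix data.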

Data Mining Seminar - Graph Mining and Social Network Analysis

Delivered a formal presentation on course material for the Data Mining (EECS 4412) course at York University, Canada, about graph mining. Graphs have become increasingly important in modeling sophisticated structures and their interactions, with broad applications including chemical informatics, bioinformatics, computer vision, video indexing, text retrieval, and Web analysis. The formal seminar was 50 to 60 minutes followed by 10 to 20 minutes for questions.
https://wiki.eecs.yorku.ca/course_archive/2014-15/F/4412
https://wiki.eecs.yorku.ca/course_archive/2014-15/F/4412/lectures

SCALABLE PATTERN MATCHING OVER COMPRESSED GRAPHS VIA DE-DENSIFICATION

This document proposes a technique called dedensification to improve the scalability of graph pattern matching queries over large graphs. Dedensification compresses graphs by reducing the number of connections to high-degree nodes through the use of compressor nodes. This lossless compression technique allows graph pattern matching queries to be rewritten and executed more efficiently on the compressed graph. Experimental results on real-world graphs demonstrate that dedensification can significantly improve query performance compared to executing queries on the original uncompressed graphs.
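
A toy version of the idea (my own simplification of the paper's scheme; compressor names like C0 are hypothetical): sources that point to the same set of high-degree nodes are rerouted through one shared compressor node, losslessly shrinking the edge list when enough sources overlap:

```python
from collections import defaultdict

def dedensify(edges, threshold):
    """Route edges into high-in-degree nodes (in-degree >= threshold)
    through compressor nodes shared by sources with identical targets."""
    indeg = defaultdict(int)
    for u, v in edges:
        indeg[v] += 1
    high = {v for v, d in indeg.items() if d >= threshold}
    groups = defaultdict(set)   # source -> set of high-degree targets
    kept = []
    for u, v in edges:
        if v in high:
            groups[u].add(v)
        else:
            kept.append((u, v))
    compressors = {}
    for u, targets in groups.items():
        key = frozenset(targets)
        comp = compressors.setdefault(key, "C%d" % len(compressors))
        kept.append((u, comp))             # one edge source -> compressor
    for key, comp in compressors.items():
        for v in key:
            kept.append((comp, v))         # one edge compressor -> target
    return kept
```

A real implementation only compresses groups where this actually saves edges, and rewrites pattern queries to run directly on the compressed form.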

cutnpeel_wsdm2022_slide.pdf

The document proposes a method called CutNPeel to efficiently find a set of near bi-cliques in dynamic graphs that satisfy criteria of being precise, exhaustive, and concise. CutNPeel works in two steps - first it partitions the graph, then peels good near bi-cliques from each partition using a top-down search approach that iteratively removes sparse nodes. The goal is to minimize a quality function measuring the total encoding cost of the near bi-cliques found.

Graph Sample and Hold: A Framework for Big Graph Analytics

Sampling is a standard approach in big-graph analytics; the goal is to efficiently estimate graph properties by consulting a sample of the whole population. A perfect sample is assumed to mirror every property of the whole population. Unfortunately, such a perfect sample is hard to collect in complex populations such as graphs (e.g. web graphs, social networks), where an underlying network connects the units of the population. Therefore, a good sample will be representative in the sense that graph properties of interest can be estimated with a known degree of accuracy. While previous work focused particularly on sampling schemes to estimate certain graph properties (e.g. triangle count), much less is known for the case when we need to estimate various graph properties with the same sampling scheme. In this paper, we propose a generic stream sampling framework for big-graph analytics, called Graph Sample and Hold (gSH), which samples from massive graphs sequentially in a single pass, one edge at a time, while maintaining a small state in memory. We use a Horvitz-Thompson construction in conjunction with a scheme that samples arriving edges without adjacencies to previously sampled edges with probability p and holds edges with adjacencies with probability q. Our sample-and-hold framework facilitates the accurate estimation of subgraph patterns by allowing the sampling probability to depend on the sample's history. Within our framework, we show how to produce statistically unbiased estimators for various graph properties from the sample. Given that the graph analytics will run on a sample instead of the whole population, the runtime complexity is kept under control. Moreover, given that the estimators are unbiased, the approximation error is also kept under control.
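
The sample/hold mechanism with its Horvitz-Thompson reweighting can be sketched for the simplest property, total edge count (a minimal sketch under my own simplifying assumptions, not the paper's full estimator family):

```python
import random

def gsh_edge_count(stream, p, q, seed=0):
    """Graph Sample-and-Hold sketch: sample an arriving edge with
    probability p if it touches no previously sampled edge, q otherwise;
    weight each kept edge by 1/Pr[kept] (Horvitz-Thompson) so the sum is
    an unbiased estimate of the number of edges in the stream."""
    rng = random.Random(seed)
    sampled_nodes = set()
    estimate = 0.0
    for u, v in stream:
        adjacent = u in sampled_nodes or v in sampled_nodes
        prob = q if adjacent else p
        if rng.random() < prob:
            sampled_nodes.update((u, v))
            estimate += 1.0 / prob    # inverse-probability weight
    return estimate
```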

2015 vancouver-vanbug

This document discusses new directions for the khmer bioinformatics platform, including developing semi-streaming algorithms for sequence analysis using k-mers. Digital normalization is presented as an initial approach that compresses sequencing data, though it discards information. Later work introduced a two-pass semi-streaming framework using saturation detection to enable error correction and variant calling using minimal memory. Current work includes developing a pair-HMM-based graph aligner and applying it to tasks like variant calling. The khmer platform provides implementations of these streaming algorithms to enable analysis of large genomic and metagenomic datasets.

Lecture 6-computer vision features descriptors matching

This document provides an overview of computer vision techniques for image matching and feature detection. It discusses how features can be detected using measures like the Harris operator that evaluate changes in pixel values when shifting a window over an image. It also covers how to make feature detection scale invariant by finding local maxima across image pyramids. Finally, it introduces the need for feature descriptors to uniquely match detected features between images in a way that is invariant to transformations like rotation, scale, and illumination changes. SIFT is presented as a state-of-the-art approach for building descriptors.

[ICDE 2012] On Top-k Structural Similarity Search

In this talk, we consider the following classic problem: given a node in a graph, how can we efficiently track the top-k nodes most similar to it, by checking only the graph's link structure? This talk accompanies the ICDE 2012 paper "On Top-k Structural Similarity Search", which can be found at http://www.cs.ubc.ca/~peil/research.html

An introduction to similarity search and k-nn graphs

Similarity search is an essential component of machine learning algorithms. However, performing efficient similarity search can be extremely challenging, especially if the dataset is distributed between multiple computers, and even more if the similarity measure is not a metric. With the rise of Big Data processing, these challenging datasets are actually more and more common. In this presentation we show how k nearest neighbors (k-nn) graphs can be used to perform similarity search, clustering and anomaly detection.
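
A brute-force construction makes the k-nn graph concrete (an O(n²) baseline in plain Python; the point of the talk is doing better than this on distributed or non-metric data):

```python
def knn_graph(points, k, dist):
    """Build a k-nn graph: each point gets directed edges to its k
    nearest other points under `dist` (which need not be a metric)."""
    graph = {}
    for i, p in enumerate(points):
        others = [(dist(p, q), j) for j, q in enumerate(points) if j != i]
        graph[i] = [j for _, j in sorted(others)[:k]]   # k smallest distances
    return graph

# Points on a line, Euclidean distance:
g = knn_graph([0.0, 1.0, 2.0, 10.0], k=1, dist=lambda a, b: abs(a - b))
```

Once built, the same graph supports similarity search (greedy walks toward the query), clustering (connected components of strong edges), and anomaly detection (points whose edges are unusually long).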

HalifaxNGGs

The document discusses N-gram graphs, which represent the proximity or co-occurrence of items in a text by modeling them as a graph. An N-gram graph is constructed by extracting n-grams from a text, determining their neighborhood based on a window size, and assigning edge weights based on co-occurrence frequencies. The document outlines the process for constructing N-gram graphs and describes their potential uses, including representing sets of items with a single graph, comparing graphs through clustering, and defining similarity measures between graphs. N-gram graphs aim to capture proximity information in a way that is domain-agnostic, allows different analysis levels, and can represent multiple texts with a single graph structure.
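
The construction step can be sketched as follows (character n-grams, a forward co-occurrence window, and Counter-based edge weights; all of these representation choices are illustrative):

```python
from collections import Counter

def ngram_graph(text, n=3, window=2):
    """Build an N-gram graph: undirected edges between n-grams that
    co-occur within `window` positions, weighted by co-occurrence count."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    edges = Counter()
    for i, g in enumerate(grams):
        for j in range(i + 1, min(i + 1 + window, len(grams))):
            edges[frozenset((g, grams[j]))] += 1   # frozenset: undirected edge
    return edges
```

Because two texts become two weighted graphs, similarity between texts reduces to graph comparison (e.g. overlap of edge sets weighted by their counts), with no language- or domain-specific features.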

Dagstuhl seminar talk on querying big graphs

The document describes techniques for querying large graph datasets. It discusses how graphs arise in many domains like social networks, biological networks, and knowledge graphs. It outlines some challenges in querying big graphs like heterogeneity, uncertainty, and massive scale. It then presents the author's work on approximate subgraph matching to enable flexible querying of heterogeneous graphs. The key ideas are converting graph structures to vectors to compute similarity, defining a cost function for subgraph matching, and using loopy belief propagation for inference. Experimental results on real datasets demonstrate the effectiveness of the proposed techniques.

Ranking systems

The document discusses PageRank and ranking systems. It presents a set of axioms that any valid ranking system should satisfy, including being independent of vertex names (isomorphism), handling self-edges, handling vote by committees, collapsing nodes, proxies, deletions and duplications. It then proves that PageRank is the only ranking system that satisfies all the axioms, making it a unique representation.
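
For reference, the ranking system being axiomatised is the familiar PageRank power iteration; a plain-Python sketch (damping factor and toy graph are illustrative):

```python
def pagerank(links, d=0.85, iters=100):
    """Power-iteration PageRank; `links` maps each node to its out-links."""
    nodes = list(links)
    n = len(nodes)
    pr = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        nxt = {u: (1.0 - d) / n for u in nodes}   # teleportation mass
        for u in nodes:
            out = links[u]
            if out:
                share = d * pr[u] / len(out)
                for v in out:
                    nxt[v] += share               # split rank over out-links
            else:
                for v in nodes:                   # dangling node: spread evenly
                    nxt[v] += d * pr[u] / n
        pr = nxt
    return pr
```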

How to Manage your Research

While traditional scholarship has tended to emphasize thorough reading, reflection, and learning, many researchers nowadays – both in academia and industry – find themselves in a fast-paced and demanding environment. A successful research career crucially depends on management-related skills, and devoting some time to such skills is likely to pay off very quickly. One important example is time and task management, which is critical when there are numerous conflicting demands and opportunities. Another example is being able to cope with challenges and failure. Researchers also need to be creative and bold in defending their ideas. This talk provides an overview of these and other skills that are vital in modern research environments.

Knowlywood: Mining Activity Knowledge from Hollywood Narratives

Knowlywood is a new knowledge graph mined from movies, TV series, and literature. It provides commonsense knowledge about human activities, e.g. participants, preceding and following activities, and so on.

NS-CUK Seminar:H.B.Kim, Review on "Asymmetric transitivity preserving graph ...

This document describes a method called High-Order Proximity preserved Embedding (HOPE) to preserve asymmetric transitivity in directed graph embedding. HOPE approximates high-order proximities using a scalable generalized SVD algorithm. Experiments on several datasets show HOPE more accurately reconstructs graphs compared to baselines and better preserves proximity measurements like Katz index and rooted PageRank. Future work will explore nonlinear models to capture more complex directed graph structures.

J. Park, J. Song, ICLR 2022, MLILAB, KAISTAI

This paper proposes GraphENS, a method to synthesize ego networks to address the neighbor memorization problem that causes GNN models to overfit to minor classes in class-imbalanced node classification tasks. GraphENS samples ego networks from minor and target classes, assigns neighbors through sampling, mixes node features based on saliency, and attaches the synthesized ego network to the original graph to construct a balanced graph. Experiments show GraphENS mitigates both node and neighbor memorization, outperforming baselines on citation and co-purchase networks.

Sigmod11 outsource shortest path

The document summarizes research on computing the shortest distance between nodes in a graph while outsourcing the graph to the cloud for computational power and maintaining privacy of the data. The researchers propose (1) transforming the original graph into 1-neighborhood-d-radius graphs for outsourcing, (2) using a greedy algorithm to perform the transformation to minimize overhead, and (3) allowing approximate distances to further reduce overhead. Experiments demonstrate their methods achieve the security and privacy goals with low overhead.

PageRank on an evolving graph - Yanzhao Yang : NOTES

Highlighted notes while research with Prof. Dip Sankar Banerjee, Prof. Kishore Kothapalli:
PageRank on an evolving graph - Yanzhao Yang.
https://theory.utdallas.edu/seminar/G2S13/YY/Pagerank%20on%20evolving%20graph-Yanzhao%20Yang.pdf
Pagerank may be always imprecise, due to lack of knowledge of up-to-date/complete graph. Millions of hyperlinks/social-links modified each day. Which portions of the web should a crawler focus most (probing strategy)? Probing techniques discussed are Random probing, Round-robin probing, Proportional probing (random, proportional to node's pagerank), Priority probing (deterministic, pick node with highest cumulative pagerank sum), Hybrid probing (proportional + round-robin).

Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)

Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)Universitat Politècnica de Catalunya

The document discusses content-based image retrieval. It begins with an overview of the problem of using a query image to retrieve similar images from a large dataset. Common techniques discussed include using SIFT features with bag-of-words models or convolutional neural network (CNN) features. The document outlines the classic SIFT retrieval pipeline and techniques for using features from pre-trained CNNs, such as max-pooling features from convolutional layers or encoding them with VLAD. It also discusses learning image representations specifically for retrieval using methods like the triplet loss to learn an embedding space that clusters similar images. The state-of-the-art methods achieve the best performance by learning global or regional image representations from CNNs trained on large, generated datasetsLens: Data exploration with Dask and Jupyter widgets

Lens is an open source Python library for automated data exploration of large datasets using Dask. It computes summary statistics and relationships between columns in a dataset. The results are serialized to JSON for interactive exploration through Jupyter widgets or a web UI. Dask allows the computations to run in parallel across a cluster for scalability. Lens integrates with the SherlockML platform to analyze all datasets uploaded.

Breadth First Search Algorithm In 10 Minutes | Artificial Intelligence Tutori...

YouTube Link: https://youtu.be/PbCl67GY1ck
** Machine Learning Engineer Masters Program: https://www.edureka.co/masters-program/machine-learning-engineer-training **
In this Edureka Session on Breadth-First Search Algorithm, we will discuss the logic behind graph traversal methods and use examples to understand the working of the Breadth-First Search algorithm.
Here’s a list of topics covered in this session:
1. Introduction To Graph Traversal
2. What is the Breadth-First Search?
3. Understanding the Breadth-First Search algorithm with an example
4. Breadth-First Search Algorithm Pseudocode
5. Applications Of Breadth-First Search
Follow us to never miss an update in the future.
YouTube: https://www.youtube.com/user/edurekaIN
Instagram: https://www.instagram.com/edureka_learning/
Facebook: https://www.facebook.com/edurekaIN/
Twitter: https://twitter.com/edurekain
LinkedIn: https://www.linkedin.com/company/edureka
Castbox: https://castbox.fm/networks/505?country=in

SVD and the Netflix Dataset

Short summary and explanation of LSI (SVD) and how it can be applied to recommendation systems and the Netflix dataset in particular.

Data Mining Seminar - Graph Mining and Social Network Analysis

Delivered a formal presentation on course material for the Data Mining (EECS 4412) course at York University, Canada, about graph mining. Graphs have become increasingly important in modeling sophisticated structures and their interactions, with broad applications including chemical informatics, bioinformatics, computer vision, video indexing, text retrieval, and Web analysis. The formal seminar was 50 to 60 minutes followed by 10 to 20 minutes for questions.
https://wiki.eecs.yorku.ca/course_archive/2014-15/F/4412
https://wiki.eecs.yorku.ca/course_archive/2014-15/F/4412/lectures

SCALABLE PATTERN MATCHING OVER COMPRESSED GRAPHS VIA DE-DENSIFICATION

This document proposes a technique called dedensification to improve the scalability of graph pattern matching queries over large graphs. Dedensification compresses graphs by reducing the number of connections to high-degree nodes through the use of compressor nodes. This lossless compression technique allows graph pattern matching queries to be rewritten and executed more efficiently on the compressed graph. Experimental results on real-world graphs demonstrate that dedensification can significantly improve query performance compared to executing queries on the original uncompressed graphs.

cutnpeel_wsdm2022_slide.pdf

The document proposes a method called CutNPeel to efficiently find a set of near bi-cliques in dynamic graphs that satisfy criteria of being precise, exhaustive, and concise. CutNPeel works in two steps - first it partitions the graph, then peels good near bi-cliques from each partition using a top-down search approach that iteratively removes sparse nodes. The goal is to minimize a quality function measuring the total encoding cost of the near bi-cliques found.

Graph Sample and Hold: A Framework for Big Graph Analytics

Sampling is a standard approach in big-graph analytics; the goal is to efficiently estimate graph properties by consulting a sample of the whole population. A perfect sample is assumed to mirror every property of the whole population. Unfortunately, such a perfect sample is hard to collect in complex populations such as graphs (e.g. web graphs, social networks), where an underlying network connects the units of the population. Therefore, a good sample will be representative in the sense that graph properties of interest can be estimated with a known degree of accuracy. While previous work focused particularly on sampling schemes to estimate certain graph properties (e.g. triangle count), much less is known for the case when we need to estimate various graph properties with the same sampling scheme. In this paper, we propose a generic stream sampling framework for big-graph analytics, called Graph Sample and Hold (gSH), which samples from massive graphs sequentially in a single pass, one edge at a time, while maintaining a small state in memory. We use a Horvitz-Thompson construction in conjunction with a scheme that samples arriving edges without adjacencies to previously sampled edges with probability p and holds edges with adjacencies with probability q. Our sample-and-hold framework facilitates the accurate estimation of subgraph patterns by allowing the sampling process to depend on previous history. Within our framework, we show how to produce statistically unbiased estimators for various graph properties from the sample. Given that the graph analytics will run on a sample instead of the whole population, the runtime complexity is kept under control. Moreover, given that the estimators are unbiased, the approximation error is also kept under control.
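The p/q rule and the Horvitz-Thompson estimator described above can be sketched in a few lines; the code is an illustrative simplification, not the paper's implementation:

```python
import random

def gsh_sample(edge_stream, p, q, seed=0):
    """One-pass sample-and-hold sketch: an arriving edge adjacent to an
    already-sampled edge is kept with probability q, any other edge with
    probability p. Returns each kept edge with its keep probability."""
    rng = random.Random(seed)
    sampled_nodes, sample = set(), []
    for u, v in edge_stream:
        prob = q if (u in sampled_nodes or v in sampled_nodes) else p
        if rng.random() < prob:
            sample.append(((u, v), prob))
            sampled_nodes.update((u, v))
    return sample

def ht_edge_count(sample):
    # Horvitz-Thompson estimator: each kept edge contributes 1/prob,
    # so the estimate of the total edge count is unbiased.
    return sum(1.0 / prob for _, prob in sample)
```

With p = q = 1 every edge is kept and the estimate equals the true edge count; with smaller probabilities the state stays small while the estimator remains unbiased in expectation.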

2015 vancouver-vanbug

This document discusses new directions for the khmer bioinformatics platform, including developing semi-streaming algorithms for sequence analysis using k-mers. Digital normalization is presented as an initial approach that compresses sequencing data, though it discards information. Later work introduced a two-pass semi-streaming framework using saturation detection to enable error correction and variant calling using minimal memory. Current work includes developing a pair-HMM-based graph aligner and applying it to tasks like variant calling. The khmer platform provides implementations of these streaming algorithms to enable analysis of large genomic and metagenomic datasets.

Lecture 6-computer vision features descriptors matching

This document provides an overview of computer vision techniques for image matching and feature detection. It discusses how features can be detected using measures like the Harris operator that evaluate changes in pixel values when shifting a window over an image. It also covers how to make feature detection scale invariant by finding local maxima across image pyramids. Finally, it introduces the need for feature descriptors to uniquely match detected features between images in a way that is invariant to transformations like rotation, scale, and illumination changes. SIFT is presented as a state-of-the-art approach for building descriptors.
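As a minimal illustration of the Harris operator mentioned above (a simplified sketch that uses a 3x3 box window in place of the usual Gaussian weighting):

```python
import numpy as np

def box3(a):
    # 3x3 box filter via zero padding and shifted sums.
    p = np.pad(a, 1)
    return sum(p[i:i + a.shape[0], j:j + a.shape[1]]
               for i in range(3) for j in range(3))

def harris_response(img, k=0.05):
    """Harris corner measure sketch: finite-difference gradients, a 3x3
    box window over the structure tensor, then R = det(M) - k*trace(M)^2.
    Corners yield large positive R, edges negative R, flat regions ~0."""
    Iy, Ix = np.gradient(img.astype(float))
    Sxx, Syy, Sxy = box3(Ix * Ix), box3(Iy * Iy), box3(Ix * Iy)
    det = Sxx * Syy - Sxy * Sxy
    tr = Sxx + Syy
    return det - k * tr * tr
```

On a white square against a black background, the response peaks at the square's corners, which is exactly the "change in both shift directions" criterion the Harris operator formalizes.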

[ICDE 2012] On Top-k Structural Similarity Search

In this talk, we consider the following classic problem: given a node in a graph, how can we efficiently track the top-k most similar nodes with respect to this node, simply by examining the graph's link structure? This talk accompanies the ICDE 2012 paper "On Top-k Structural Similarity Search", which can be found at http://www.cs.ubc.ca/~peil/research.html

An introduction to similarity search and k-nn graphs

Similarity search is an essential component of machine learning algorithms. However, performing efficient similarity search can be extremely challenging, especially if the dataset is distributed between multiple computers, and even more if the similarity measure is not a metric. With the rise of Big Data processing, these challenging datasets are actually more and more common. In this presentation we show how k nearest neighbors (k-nn) graphs can be used to perform similarity search, clustering and anomaly detection.
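For small datasets, a k-nn graph can be built by brute force; the distributed, approximate constructions the talk is concerned with avoid the quadratic cost, but this sketch conveys the structure being built:

```python
import numpy as np

def knn_graph(X, k):
    """Brute-force k-nn graph: for each point, the indices of its k
    nearest neighbours under Euclidean distance. O(n^2) in time and
    memory; large or distributed datasets need approximate methods."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude self-loops
    return np.argsort(d, axis=1)[:, :k]  # row i: neighbours of point i
```

Once built, the graph supports similarity search (greedy traversal towards the query), clustering (connected components after pruning weak edges), and anomaly detection (points with unusually distant neighbours).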

HalifaxNGGs

The document discusses N-gram graphs, which represent the proximity or co-occurrence of items in a text by modeling them as a graph. An N-gram graph is constructed by extracting n-grams from a text, determining their neighborhood based on a window size, and assigning edge weights based on co-occurrence frequencies. The document outlines the process for constructing N-gram graphs and describes their potential uses, including representing sets of items with a single graph, comparing graphs through clustering, and defining similarity measures between graphs. N-gram graphs aim to capture proximity information in a way that is domain-agnostic, allows different analysis levels, and can represent multiple texts with a single graph structure.
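The construction outlined above can be sketched in a few lines; the character-level n-grams and the parameter values here are illustrative choices, not taken from the slides:

```python
from collections import Counter

def ngram_graph(text, n=3, window=2):
    """N-gram graph sketch: extract character n-grams, then add a
    weighted, undirected edge between every pair of n-grams that
    co-occur within `window` consecutive positions. Edge weights
    count co-occurrence frequency."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    edges = Counter()
    for i, g in enumerate(grams):
        for h in grams[i + 1:i + 1 + window]:
            edges[frozenset((g, h))] += 1
    return edges
```

Two texts can then be compared by measuring the overlap of their edge sets and weights, which is what makes the representation domain-agnostic: no tokenizer or language model is required.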

Dagstuhl seminar talk on querying big graphs

The document describes techniques for querying large graph datasets. It discusses how graphs arise in many domains like social networks, biological networks, and knowledge graphs. It outlines some challenges in querying big graphs like heterogeneity, uncertainty, and massive scale. It then presents the author's work on approximate subgraph matching to enable flexible querying of heterogeneous graphs. The key ideas are converting graph structures to vectors to compute similarity, defining a cost function for subgraph matching, and using loopy belief propagation for inference. Experimental results on real datasets demonstrate the effectiveness of the proposed techniques.

Ranking systems

The document discusses PageRank and ranking systems. It presents a set of axioms that any valid ranking system should satisfy, including independence of vertex names (isomorphism) and consistent handling of self-edges, votes by committee, collapsing of nodes, proxies, deletions, and duplications. It then proves that PageRank is the only ranking system satisfying all of these axioms, giving it a unique axiomatic characterization.
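For reference, PageRank itself is computed by power iteration on the damped random-walk matrix; a minimal sketch (assuming every node has at least one out-link, so no dangling-node correction is needed):

```python
import numpy as np

def pagerank(adj, d=0.85, tol=1e-10):
    """Power-iteration PageRank. adj[i, j] = 1 if there is an edge
    j -> i; columns are normalised by out-degree, and d is the
    damping factor of the random surfer model."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    M = adj / adj.sum(axis=0)        # column-stochastic transitions
    r = np.full(n, 1.0 / n)          # start from the uniform vector
    while True:
        r_new = (1 - d) / n + d * M @ r
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
```

On a symmetric structure such as a 3-cycle, the ranks come out uniform, which is one easy consequence of the isomorphism axiom above.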

NS-CUK Seminar:H.B.Kim, Review on "Asymmetric transitivity preserving graph ...

J. Park, J. Song, ICLR 2022, MLILAB, KAISTAI

Sigmod11 outsource shortest path

PageRank on an evolving graph - Yanzhao Yang : NOTES

Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)

Lens: Data exploration with Dask and Jupyter widgets

Breadth First Search Algorithm In 10 Minutes | Artificial Intelligence Tutori...

How to Manage your Research

While traditional scholarship has tended to emphasize thorough reading, reflection, and learning, many researchers nowadays – both in academia and industry – find themselves in a fast-paced and demanding environment. A successful research career crucially depends on management-related skills, and devoting some time to such skills is likely to pay off very quickly. One important example is time and task management, which is critical when there are numerous conflicting demands and opportunities. Another example is being able to cope with challenges and failure. Researchers also need to be creative and bold in defending their ideas. This talk provides an overview of these and other skills that are vital in modern research environments.

Knowlywood: Mining Activity Knowledge from Hollywood Narratives

Knowlywood is a new knowledge graph mined from movies, TV series, and literature. It provides commonsense knowledge about human activities, e.g. participants, preceding and following activities, and so on.

Learning Multilingual Semantics from Big Data on the Web

This document summarizes Gerard de Melo's presentation on learning multilingual semantics from big data on the web. It discusses how lexical and taxonomic knowledge can be extracted at large scale from online resources like Wiktionary, Wikipedia, and WordNet. Methods are presented for merging structured data like knowledge graphs and integrating taxonomies across languages using techniques like linear program relaxation and belief propagation. The goal is to build large yet reasonably clean multilingual knowledge bases to power applications in areas like semantic search and the digital humanities.

From Big Data to Valuable Knowledge

Big Data is more than just hype. The vast quantities of data now available have led to two important challenges that are fundamentally changing the way we develop data-intensive systems. The first is at the data management level, where we are finally moving beyond vanilla MapReduce towards infrastructure that allows for more flexible data processing pipelines. The second challenge is transitioning from quantity to quality and distilling genuine knowledge from the raw data. For this, we still need innovative algorithms that facilitate data cleaning, unsupervised and semi-supervised learning, knowledge harvesting, and knowledge integration. Examples include data integration, large-scale knowledge bases such as UWN/MENTA, and collections of commonsense knowledge such as WebChild.

Scalable Learning Technologies for Big Data Mining

These are slides of a tutorial by Gerard de Melo and Aparna Varde presented at the DASFAA 2015 conference.
As data expands into big data, enhanced or entirely novel data mining algorithms often become necessary. The real value of big data is often only exposed when we can adequately mine and learn from it. We provide an overview of new scalable techniques for knowledge discovery. Our focus is on the areas of cloud data mining and machine learning, semi-supervised processing, and deep learning. We also give practical advice for choosing among different methods and discuss open research problems and concerns.

Searching the Web of Data (Tutorial)

These are slides of a tutorial at ECIR by Gerard de Melo and Katja Hose.
Search is currently undergoing a major paradigm shift away from the traditional document-centric “10 blue links” towards more explicit and actionable information. Recent advances in this area are Google’s Knowledge Graph, Virtual Personal Assistants such as Siri and Google Now, as well as the now ubiquitous entity-oriented vertical search results for places, products, etc. Apart from novel query understanding methods, these developments are largely driven by structured data that is blended into the Web Search experience. We discuss efficient indexing and query processing techniques to work with large amounts of structured data. Finally, we present query interpretation and understanding methods to map user queries to these structured data sources.

From Linked Data to Tightly Integrated Data

Invited Talk at the 3rd Workshop on Linked Data in Linguistics: Multilingual Knowledge Resources and Natural Language Processing. Reykjavik, Iceland, 27th May 2014
The ideas behind the Web of Linked Data have great allure. Apart from the prospect of large amounts of freely available data, we are also promised nearly effortless interoperability. Common data formats and protocols have indeed made it easier than ever to obtain and work with information from different sources simultaneously, opening up new opportunities in linguistics, library science, and many other areas.
In this talk, however, I argue that the true potential of Linked Data can only be appreciated when extensive cross-linkage and integration engenders an even higher degree of interconnectedness. This can take the form of shared identifiers, e.g. those based on Wikipedia and WordNet, which can be used to describe numerous forms of linguistic and commonsense knowledge. An alternative is to rely on sameAs and similarity links, which can automatically be discovered using scalable approaches like the LINDA algorithm but need to be interpreted with great care, as we have observed in experimental studies. A closer level of linkage is achieved when resources are also connected at the taxonomic level, as exemplified by the MENTA approach to taxonomic data integration. Such integration means that one can buy into ecosystems already carrying a range of valuable pre-existing assets. Even more tightly integrated resources like Lexvo.org combine triples from multiple sources into unified, coherent knowledge bases. Finally, I also comment on how to address some remaining challenges that are still impeding a more widespread adoption of Linked Data on the Web. In the long run, I believe that such steps will lead us to significantly more tightly integrated Linked Data.

Information Extraction from Web-Scale N-Gram Data

Search engines are increasingly relying on structured data to provide direct answers to certain types of queries. However, extracting such structured data from text is challenging, especially due to the scarcity of explicitly expressed knowledge. Even when relying on large document collections, pattern-based information extraction approaches typically expose only insufficient amounts of information. This paper evaluates to what extent n-gram statistics, derived from volumes of texts several orders of magnitude larger than typical corpora, can allow us to overcome this bottleneck. An extensive experimental evaluation is provided for three different binary relations, comparing different sources of n-gram data as well as different learning algorithms.

UWN: A Large Multilingual Lexical Knowledge Base

We present UWN, a large multilingual lexical knowledge base that describes the meanings and relationships of words in over 200 languages. This paper explains how link prediction, information integration and taxonomy induction methods have been used to build UWN based on WordNet and extend it with millions of named entities from Wikipedia. We additionally introduce extensions to cover lexical relationships, frame-semantic knowledge, and language data. An online interface provides human access to the data, while a software API enables applications to look up over 16 million words and names.

Multilingual Text Classification using Ontologies

In this paper, we investigate strategies for automatically classifying documents in different languages thematically, geographically or according to other criteria. A novel linguistically motivated text representation scheme is presented that can be used with machine learning algorithms in order to learn classifications from pre-classified examples and then automatically classify documents that might be provided in entirely different languages. Our approach makes use of ontologies and lexical resources but goes beyond a simple mapping from terms to concepts by fully exploiting the external knowledge manifested in such resources and mapping to entire regions of concepts. For this, a graph traversal algorithm is used to explore related concepts that might be relevant. Extensive testing has shown that our methods lead to significant improvements compared to existing approaches.

Extracting Sense-Disambiguated Example Sentences From Parallel Corpora

Example sentences provide an intuitive means of grasping the meaning of a word, and are frequently used to complement conventional word definitions. When a word has multiple meanings, it is useful to have example sentences for specific senses (and hence definitions) of that word rather than indiscriminately lumping all of them together. In this paper, we investigate to what extent such sense-specific example sentences can be extracted from parallel corpora using lexical knowledge bases for multiple languages as a sense index. We use word sense disambiguation heuristics and a cross-lingual measure of semantic similarity to link example sentences to specific word senses. From the sentences found for a given sense, an algorithm then selects a smaller subset that can be presented to end users, taking into account both representativeness and diversity. Preliminary results show that a precision of around 80% can be obtained for a reasonable number of word senses, and that the subset selection yields convincing results.

Towards a Universal Wordnet by Learning from Combined Evidence

Lexical databases are invaluable sources of knowledge about words and their meanings, with numerous applications in areas like NLP, IR, and AI. We propose a methodology for the automatic construction of a large-scale multilingual lexical database where words of many languages are hierarchically organized in terms of their meanings and their semantic relations to other words. This resource is bootstrapped from WordNet, a well-known English-language resource. Our approach extends WordNet with around 1.5 million meaning links for 800,000 words in over 200 languages, drawing on evidence extracted from a variety of resources including existing (monolingual) wordnets, (mostly bilingual) translation dictionaries, and parallel corpora. Graph-based scoring functions and statistical learning techniques are used to iteratively integrate this information and build an output graph. Experiments show that this wordnet has a high level of precision and coverage, and that it can be useful in applied tasks such as cross-lingual text classification.

Not Quite the Same: Identity Constraints for the Web of Linked Data

Linked Data is based on the idea that information from different sources can flexibly be connected to enable novel applications that individual datasets do not support on their own. This hinges upon the existence of links between datasets that would otherwise be isolated. The most notable form, sameAs links, are intended to express that two identifiers are equivalent in all respects. Unfortunately, many existing ones do not reflect such genuine identity. This study provides a novel method to analyse this phenomenon, based on a thorough theoretical analysis, as well as a novel graph-based method to resolve such issues to some extent. Our experiments on a representative Web-scale set of sameAs links from the Web of Data show that our method can identify and remove hundreds of thousands of constraint violations.

Good, Great, Excellent: Global Inference of Semantic Intensities

Adjectives like good, great, and excellent are similar in meaning, but differ in intensity. Intensity order information is very useful for language learners as well as in several NLP tasks, but is missing in most lexical resources (dictionaries, WordNet, and thesauri). In this paper, we present a primarily unsupervised approach that uses semantics from Web-scale data (e.g., phrases like good but not excellent) to rank words by assigning them positions on a continuous scale. We rely on Mixed Integer Linear Programming to jointly determine the ranks, such that individual decisions benefit from global information. When ranking English adjectives, our global algorithm achieves substantial improvements over previous work on both pairwise and rank correlation metrics (specifically, 70% pairwise accuracy as compared to only 56% by previous work). Moreover, our approach can incorporate external synonymy information (increasing its pairwise accuracy to 78%) and extends easily to new languages.

YAGO-SUMO: Integrating YAGO into the Suggested Upper Merged Ontology

The YAGO-SUMO integration incorporates millions of entities from YAGO, which is based on Wikipedia and WordNet, into the Suggested Upper Merged Ontology (SUMO), a highly axiomatized formal upper ontology. With the combined force of the two ontologies, an enormous, unprecedented corpus of formalized world knowledge is available for automated processing and reasoning, providing information about millions of entities such as people, cities, organizations, and companies.
Compared to the original YAGO, more advanced reasoning is possible due to the axiomatic knowledge delivered by SUMO. A reasoner can conclude e.g. that a child of a human must also be a human and cannot be born before its parents, or that two people sharing the same parents must be siblings.


Nordic Marketo Engage User Group_June 13_ 2024.pptx

Slides from event

5th LF Energy Power Grid Model Meet-up Slides

5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Microsoft Teams session or in person at TU/e, located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.

UI5 Controls simplified - UI5con2024 presentation

UI5con 2024 presentation

Programming Foundation Models with DSPy - Meetup Slides

Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.

Cosa hanno in comune un mattoncino Lego e la backdoor XZ?

ABSTRACT: At first glance, what a Lego brick and the XZ backdoor have in common might be that both are building blocks, or dependencies, of creative and software projects. In reality, a Lego brick and the XZ backdoor case have much more in common than that.
Join the presentation to dive into a story of interoperability, standards, and open formats, and then discuss the important role that contributors play in a sustainable open source community.
BIO: An advocate of free software and of standard, open formats. She has been an active member of the Fedora and openSUSE projects and co-founded the LibreItalia Association, where she was involved in several LibreOffice-related events, migrations, and training courses. She previously worked on LibreOffice migrations and training for several public administrations and private companies. Since January 2020 she has worked at SUSE as a Software Release Engineer for Uyuni and SUSE Manager, and when she is not following her passion for computers and for Geeko, she cultivates her curiosity about astronomy (the origin of her nickname, deneb_alpha).

Monitoring and Managing Anomaly Detection on OpenShift.pdf

Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.

How to use Firebase Data Connect For Flutter

This is how to use Data Connect in Flutter.

Your One-Stop Shop for Python Success: Top 10 US Python Development Providers

Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.

Mariano G Tinti - Decoding SpaceX

A project that aims to unveil some insights from SpaceX

Generating privacy-protected synthetic data using Secludy and Milvus

During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.

Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack

Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.

National Security Agency - NSA mobile device best practices

Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many features provide convenience and capability at the expense of security. This best practices guide outlines steps users can take to better protect personal devices and information.

Fueling AI with Great Data with Airbyte Webinar

This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.

Recommendation System using RAG Architecture

Concept of how to create a RAG arhcitecture

How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf

A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.

“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...

For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer's life and facilitate a rapid transition from concept to production-ready applications. He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.

Artificial Intelligence for XML Development

In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.

20240609 QFM020 Irresponsible AI Reading List May 2024

Everything I found interesting about the irresponsible use of machine intelligence in May 2024

WeTestAthens: Postman's AI & Automation Techniques

Postman's AI and Automation Techniques

Building Production Ready Search Pipelines with Spark and Milvus

Spark is a widely used ETL tool for processing, indexing, and ingesting data into serving stacks for search. Milvus is a production-ready open-source vector database. In this talk we will show how to use Spark to extract vector representations from unstructured data and push the vectors to the Milvus vector database for search serving.


- 1. Link Prediction via Subgraph Embedding-Based Convex Matrix Completion. Zhu Cao (Tsinghua University), Linlin Wang (Tsinghua University), Gerard de Melo (Rutgers University, http://gerard.demelo.org)
- 2. Social Networks. Image: CC-BY-SA by Tanja Cappell. https://www.flickr.com/photos/frauhoelle/8464661409
- 5. Traditional Neighbour Overlap Methods. Examples: ● Common Neighbours ● Adamic/Adar (frequency-weighted) ● Jaccard coefficient based on neighbours
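As a concrete illustration, the three neighbour-overlap scores listed above can be sketched in a few lines of Python over an adjacency dict; the toy graph data here is invented for illustration.

```python
import math

# Toy undirected graph as an adjacency dict (hypothetical example data).
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c"},
}

def common_neighbours(u, v):
    """Number of shared neighbours of u and v."""
    return len(adj[u] & adj[v])

def adamic_adar(u, v):
    """Shared neighbours, weighted inversely by log-degree (rarer neighbours count more)."""
    return sum(1.0 / math.log(len(adj[w]))
               for w in adj[u] & adj[v] if len(adj[w]) > 1)

def jaccard(u, v):
    """Overlap of the two neighbourhoods relative to their union."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0
```

All three score a candidate pair purely from local neighbourhood overlap, which is exactly the limitation the representation-based methods on the following slides address.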
- 7. Previous Work: Connectivity-Based Methods. Examples: ● PageRank ● SimRank ● Hitting Time ● Commute Time (roundtrip) ● Katz
- 8. Learning Representations. Image: Adapted from Perozzi et al. 2014, DeepWalk: Online Learning of Social Representations
- 9. SVD, applied e.g. to a graph's adjacency matrix. Image: Adapted from Tim Roughgarden & Gregory Valiant
- 10. SVD decomposes the matrix into left singular vectors, singular values, and right singular vectors. Image: Adapted from Tim Roughgarden & Gregory Valiant
- 11. Low-Rank Approximation via SVD: keep only the k largest singular values and the corresponding k left and right singular vectors. Image: Adapted from Tim Roughgarden & Gregory Valiant
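The low-rank approximation on this slide can be reproduced directly with NumPy; the small adjacency matrix below is a made-up example.

```python
import numpy as np

# Adjacency matrix of a small undirected graph (hypothetical example).
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# Full SVD: A = U @ diag(s) @ Vt, singular values sorted in decreasing order.
U, s, Vt = np.linalg.svd(A)

# Rank-k approximation: keep only the k largest singular values and the
# corresponding k left and right singular vectors.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Note that the SVD fits *all* entries of A, zeros included; SEMAC's
# matrix-completion step later drops exactly this assumption.
err = np.linalg.norm(A - A_k)  # Frobenius error of the approximation
```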
- 12. Word Vector Representations: word2vec. The skip-gram model learns a vector for each word (e.g. 0.02, 0.12, 0.04, ..., 0.03, 0.08) by predicting its context words (sparrows, population, local, ...) in large-scale text (e.g. "... the local population of sparrows ...").
- 13. Previous Work: DeepWalk. Random walks over the social network (e.g. n49, n729, n382, ...) serve as the "sentences" fed to the word2vec skip-gram model, which predicts nearby nodes and thereby learns node vectors. Perozzi et al. 2014, DeepWalk: Online Learning of Social Representations
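The walk-generation half of DeepWalk can be sketched as follows; the toy graph and the walk length/count are illustrative choices, and in the actual method the resulting sequences are fed to word2vec's skip-gram model.

```python
import random

# Toy social network as an adjacency dict (invented for illustration).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

def random_walk(start, length, rng):
    """Uniform random walk; the node sequence plays the role of a 'sentence'."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(rng.choice(adj[walk[-1]]))
    return walk

rng = random.Random(0)
# A few walks per node; in DeepWalk these sequences are passed to the
# skip-gram model to obtain one embedding vector per node.
corpus = [random_walk(v, 5, rng) for v in adj for _ in range(2)]
```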
- 14. Previous Work: node2vec, which uses biased random walks. Grover & Leskovec (2016), node2vec: Scalable Feature Learning for Networks
- 17. Our Approach: SEMAC. Step 1: Run breadth-first search for different depths d from each node v.
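Step 1 can be sketched as a standard breadth-first search truncated at depth d; the toy graph below is invented for illustration.

```python
from collections import deque

# Toy input graph (invented for illustration).
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}

def bfs_subgraph(v, depth):
    """Nodes reachable from v in at most `depth` hops: v's depth-d subgraph."""
    seen = {v}
    queue = deque([(v, 0)])
    while queue:
        node, d = queue.popleft()
        if d == depth:
            continue  # do not expand past the requested depth
        for nb in adj[node]:
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, d + 1))
    return seen

# One subgraph per (node, depth) pair.
subgraphs = {(v, d): bfs_subgraph(v, d) for v in adj for d in (1, 2)}
```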
- 22. Our Approach: SEMAC. Step 2: Create a new graph G' with edges between subgraphs.
- 23. Edges in G': a) same node, depth ± 1; b) same depth, neighbouring node.
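A minimal sketch of the Step 2 construction, assuming the two edge rules on the slides (same node at adjacent depths; neighbouring nodes at the same depth) and a toy input graph:

```python
# Toy input graph and the depths used in Step 1 (both invented for illustration).
adj = {0: [1, 2], 1: [0], 2: [0]}
depths = (1, 2, 3)

# Nodes of G' are (node, depth) identifiers, one per BFS subgraph from Step 1.
nodes = [(v, d) for v in adj for d in depths]

edges = set()
for v in adj:
    for d in depths:
        # Rule a) same node, adjacent depth.
        if d + 1 in depths:
            edges.add(((v, d), (v, d + 1)))
        # Rule b) same depth, neighbouring nodes (sorted to deduplicate u-v / v-u).
        for u in adj[v]:
            edges.add(tuple(sorted([(v, d), (u, d)])))
```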
- 26. Our Approach: SEMAC. Step 3: Learn subgraph embeddings using G'.
- 27. Nuclear norm minimization: find W that minimizes the nuclear norm.
- 28. The Frobenius-norm data-fit term compares only the non-zero (observed) entries, unlike SVD.
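The nuclear-norm objective on these slides can be sketched with a generic textbook solver, proximal gradient descent with singular value thresholding; this is not necessarily the optimizer used in the paper, and the toy matrix is invented. Unlike the SVD on the earlier slides, only the observed entries enter the data-fit term.

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: the proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def complete(M, observed, tau=0.1, step=1.0, iters=500):
    """Minimise 0.5 * ||P_obs(W - M)||_F^2 + tau * ||W||_* by proximal gradient."""
    W = np.zeros_like(M)
    for _ in range(iters):
        grad = np.where(observed, W - M, 0.0)  # gradient of the data-fit term
        W = svt(W - step * grad, tau * step)   # shrink singular values
    return W

# Toy rank-1 matrix with one unobserved entry (invented for illustration).
M = np.outer([1.0, 1.0], [1.0, 1.0, 1.0])
mask = np.ones_like(M, dtype=bool)
mask[1, 2] = False  # this entry is treated as missing
W = complete(M, mask)  # the low-rank prior fills in the missing entry
```

Each iteration trades off fitting the observed entries against reducing the singular values, which iteratively lowers the effective rank, the noise-reduction mechanism described in the abstract.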
- 29. Our Approach: SEMAC. Result of Step 3: an embedding for every subgraph (e.g. 0.32 ... 0.27, 0.81 ... 0.12).
- 30. Our Approach: SEMAC. Step 4: Create node embeddings by concatenating the embeddings of a node's subgraphs for the different depths d.
- 31. Link Prediction. Step 5: Link prediction via the vector cosine between the two node embeddings (e.g. 0.32, 0.14, 0.03, ..., 0.18, 0.09 vs. 0.28, 0.11, 0.08, ..., 0.24, 0.13).
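Step 5 reduces to a cosine between two concatenated node embeddings; the vectors below are invented for illustration.

```python
import numpy as np

# Hypothetical node embeddings, each a concatenation of that node's
# subgraph embeddings across depths (values invented for illustration).
emb = {
    "u": np.array([0.32, 0.14, 0.03, 0.18, 0.09]),
    "v": np.array([0.28, 0.11, 0.08, 0.24, 0.13]),
    "w": np.array([-0.30, 0.02, -0.11, -0.20, 0.05]),
}

def cosine(a, b):
    """Cosine similarity; higher values indicate a more likely link."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

score_uv = cosine(emb["u"], emb["v"])  # candidate link (u, v)
score_uw = cosine(emb["u"], emb["w"])  # candidate link (u, w)
```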
- 32. Experiments. Image: CC-BY by Marc Smith with NodeXL. https://www.flickr.com/photos/marc_smith/6871711979
- 34. Experiments: small connected-component subsets of Facebook (McAuley & Leskovec 2012)
- 35. Experiments: Wikipedia coauthorship (Leskovec & Krevl 2014)
- 36. Experiments: protein-protein interactions (Breitkreutz et al. 2008)
- 37. Experiments: AUC based on 5-fold cross-validation
- 38. Summary. Goal: state-of-the-art link prediction. Consider subgraphs: ► different depths ► graph of subgraphs with links to related subgraphs. Create representations: ► nuclear-norm minimization to better account for unobserved links ► concatenate, then compare with cosine. Get in touch: http://gerard.demelo.org, gdm@demelo.org. Thank you!
- 39. Acknowledgments. User icons by Freepik (CC-BY), https://www.freepik.com. Title image: CC-BY by Chris Potter, https://www.flickr.com/photos/865. Thank you!