SIGMOD 2018 GRADES-NDA workshop demo talk.
Gremlinator is a first-of-its-kind SPARQL-to-Gremlin traversal compiler built on the Apache TinkerPop framework. It allows users to query Property Graphs via SPARQL, sparing them the steep learning curve of a new Graph Query Language.
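A minimal sketch of the core idea follows. It translates a SPARQL basic graph pattern into a Gremlin `match()` traversal. This is not Gremlinator's code. The function `bgp_to_gremlin`, the tuple input format, and the mapping rule are hypothetical simplifications: predicates listed in `edge_labels` become outgoing edges, and every other predicate becomes a vertex property key. The sketch only emits Gremlin-Groovy text and does not execute anything.

```python
from typing import Iterable, Sequence, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object); variables start with "?"


def _is_var(term: str) -> bool:
    return term.startswith("?")


def _label(term: str) -> str:
    # Gremlin step labels are the SPARQL variable names without the leading "?".
    return term[1:]


def bgp_to_gremlin(bgp: Iterable[Triple], edge_labels: Set[str], projection: Sequence[str]) -> str:
    """Translate a simple SPARQL basic graph pattern into a Gremlin-Groovy match() traversal.

    Hypothetical conventions: predicates in `edge_labels` are outgoing edge labels,
    all other predicates are vertex property keys, and subjects must be variables.
    """
    clauses = []
    for s, p, o in bgp:
        if not _is_var(s):
            raise ValueError(f"constant subjects are not supported in this sketch: {s}")
        start = f"__.as('{_label(s)}')"
        if p in edge_labels:
            if not _is_var(o):
                raise ValueError(f"edge objects must be variables in this sketch: {o}")
            clauses.append(f"{start}.out('{p}').as('{_label(o)}')")
        elif _is_var(o):
            # Property predicate with a variable object: bind the property value.
            clauses.append(f"{start}.values('{p}').as('{_label(o)}')")
        else:
            # Property predicate with a literal object: filter vertices by that value.
            clauses.append(f"{start}.has('{p}', '{o}')")
    selected = ", ".join(f"'{_label(v)}'" for v in projection)
    return f"g.V().match({', '.join(clauses)}).select({selected})"


# Roughly: SELECT ?b ?age WHERE { ?a :knows ?b . ?a :name "marko" . ?b :age ?age }
print(bgp_to_gremlin(
    [("?a", "knows", "?b"), ("?a", "name", "marko"), ("?b", "age", "?age")],
    edge_labels={"knows"},
    projection=["?b", "?age"],
))
# g.V().match(__.as('a').out('knows').as('b'), __.as('a').has('name', 'marko'), __.as('b').values('age').as('age')).select('b', 'age')
```

Gremlin's `match()` step binds the same step labels across its sub-traversals. This makes it a natural target for SPARQL's shared-variable joins. FILTER, OPTIONAL, UNION and aggregation are outside the scope of this sketch.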
The document discusses how web search engines work. It covers text search and indexing, including inverted indexes and document preprocessing. It then discusses query processing, relevance ranking using the vector space model, and link analysis techniques like PageRank. It also covers challenges in ranking web pages and measures for evaluating search engine performance.
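The inverted index and vector-space ranking mentioned above can be sketched in a few lines. The example makes three simplifying assumptions: whitespace tokenization, lower-casing as the only preprocessing step, and plain tf-idf weights scored by cosine similarity. It builds an inverted index (term to postings of document id and term frequency) and ranks documents for a query while touching only the postings of the query terms. Production engines also add stemming, stop-word removal, index compression and link-based signals such as PageRank, none of which are shown here.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Postings = Dict[str, List[Tuple[str, int]]]  # term -> [(doc_id, term frequency), ...]


def tokenize(text: str) -> List[str]:
    # Minimal preprocessing: lower-case and split on whitespace (no stemming or stop words).
    return text.lower().split()


def build_index(docs: Dict[str, str]) -> Tuple[Postings, Dict[str, float], Dict[str, float]]:
    """Build an inverted index plus idf weights and tf-idf vector norms per document."""
    index: Postings = defaultdict(list)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term].append((doc_id, tf))
    n = len(docs)
    idf = {term: math.log(n / len(postings)) for term, postings in index.items()}
    squares: Dict[str, float] = defaultdict(float)
    for term, postings in index.items():
        for doc_id, tf in postings:
            squares[doc_id] += (tf * idf[term]) ** 2
    norms = {doc_id: math.sqrt(squares[doc_id]) for doc_id in docs}
    return dict(index), idf, norms


def search(query: str, index: Postings, idf: Dict[str, float],
           norms: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank documents by cosine similarity of tf-idf vectors, touching only matching postings."""
    scores: Dict[str, float] = defaultdict(float)
    for term, qtf in Counter(tokenize(query)).items():
        for doc_id, tf in index.get(term, []):
            scores[doc_id] += (qtf * idf[term]) * (tf * idf[term])
    # Dividing by the query norm is skipped: it is the same for every document.
    ranked = [(d, s / norms[d]) for d, s in scores.items() if norms[d] > 0]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

For example, `search("sparql federation", *build_index(docs))` returns only documents that share at least one query term. Shorter matching documents rank higher because of the cosine normalization.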
HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation (Muhammad Saleem)
Efficient federated query processing is of significant importance to tame the large amount of data available on the Web of Data. Previous works have focused on generating optimized query execution plans for fast result retrieval. However, devising source selection approaches beyond triple pattern-wise source selection has not received much attention. This work presents HiBISCuS, a novel hypergraph-based source selection approach to federated SPARQL querying. Our approach can be directly combined with existing SPARQL query federation engines to achieve the same recall while querying fewer data sources. We extend three well-known SPARQL query federation engines with HiBISCuS and compare our extensions with the original approaches on FedBench. Our evaluation shows that HiBISCuS can efficiently reduce the total number of sources selected without losing recall. Moreover, our approach significantly reduces the execution time of the selected engines on most of the benchmark queries.
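HiBISCuS models a query as a hypergraph and uses per-source summaries of URI authorities to prune sources at join vertices. The sketch below captures only that pruning intuition, under several simplifying assumptions. Predicates are bound. The summary `caps` (a hypothetical name and shape) maps each (source, predicate) pair to the authorities used by its subjects and its objects. The starting candidates come from ordinary triple-pattern-wise selection. It illustrates the idea and is not the published algorithm.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
# Hypothetical summary: caps[(source, predicate)] = (subject authorities, object authorities)
Capabilities = Dict[Tuple[str, str], Tuple[Set[str], Set[str]]]


def _authorities(caps: Capabilities, source: str, predicate: str, pos: int) -> Set[str]:
    # pos 0 = subject authorities, pos 1 = object authorities
    return caps.get((source, predicate), (set(), set()))[pos]


def prune_sources(bgp: List[Triple], candidates: List[Set[str]], caps: Capabilities) -> List[Set[str]]:
    """Drop candidate sources whose URI authorities cannot meet at a join variable."""
    pruned = [set(c) for c in candidates]
    variables = sorted({t for tp in bgp for t in (tp[0], tp[2]) if t.startswith("?")})
    changed = True
    while changed:  # repeat until no source is removed (a fixpoint)
        changed = False
        for var in variables:
            occ = [(i, 0) for i, tp in enumerate(bgp) if tp[0] == var]
            occ += [(i, 1) for i, tp in enumerate(bgp) if tp[2] == var]
            if len(occ) < 2:
                continue  # variable appears only once, so it is not a join vertex
            common = None
            for i, pos in occ:
                reach: Set[str] = set()
                for src in pruned[i]:
                    reach |= _authorities(caps, src, bgp[i][1], pos)
                common = reach if common is None else common & reach
            for i, pos in occ:
                keep = {s for s in pruned[i] if _authorities(caps, s, bgp[i][1], pos) & common}
                if keep != pruned[i]:
                    pruned[i], changed = keep, True
    return pruned
```

Here is an example with hypothetical source names. Suppose a `?drug db:target ?t . ?t up:name ?n` query has candidates DrugBank/KEGG for the first pattern and UniProt/DBpedia for the second. Only the sources whose authorities can agree on `?t` survive.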
SAFE: Policy Aware SPARQL Query Federation Over RDF Data Cubes (Muhammad Saleem)
This document describes SAFE (Policy Aware SPARQL Query Federation Over RDF Data Cubes), a system for securely querying distributed RDF data cubes. SAFE uses source selection, access policy filtering, and query rewriting to enable policy-aware querying over clinical data from multiple sources while preserving privacy. It selects relevant data sources for a query based on triple patterns and an index, filters sources based on access policies for the user, and rewrites the query to retrieve and integrate results from authorized sources only. Evaluation shows SAFE can efficiently perform source selection and query execution over large real-world datasets compared to existing federated query systems.
Efficient Source Selection for SPARQL Endpoint Federation (Muhammad Saleem)
Muhammad Saleem defended his PhD thesis on efficient source selection for SPARQL endpoint query federation. The thesis addressed five main research questions: (1) how to perform join-aware source selection while ensuring complete result sets, (2) how to perform duplicate-aware source selection, (3) how to perform policy-aware source selection, (4) how to perform data distribution-aware source selection, and (5) how to design comprehensive benchmarks for federated SPARQL queries and triple stores. The thesis proposed four source selection algorithms (HiBISCuS, DAW, SAFE, TopFed) and two benchmarking systems (LargeRDFBench, FEASIBLE) to address the identified
Debunking some “RDF vs. Property Graph” Alternative Facts (Neo4j)
The document provides a refresher on RDF and property graphs, comparing their models and query languages. It debunks some common misconceptions about RDF versus property graphs, noting that RDF does not impose a particular storage model and can be stored in graph databases. Semantics in RDF are just optional rules that are difficult to implement effectively. The nature of the data and the intended usage should be considered, rather than assuming that one model is inherently better for unstructured or semantic data.
Federated Query Formulation and Processing Through BioFed (Muhammad Saleem)
This document describes BioFed, a system for federated query processing over large biomedical datasets. It discusses how BioFed selects relevant data sources for query subpatterns and rewrites queries into a federated form using SPARQL 1.1's SERVICE clause. Source selection is done by identifying sources that contain predicate terms and then pruning based on subject/object bindings. Queries are rewritten by grouping subpatterns with the same source and using UNION and SERVICE for patterns with multiple sources. The document concludes by mentioning an evaluation of BioFed on a federated benchmark and providing a link to demo the system.
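The rewriting step described above can be sketched as follows. It assumes that source selection has already produced a set of endpoint URLs for each triple pattern. Patterns with a single relevant endpoint are grouped into one `SERVICE` block per endpoint. Each pattern with several relevant endpoints becomes a `UNION` of `SERVICE` blocks. The function name and input format are hypothetical, and BioFed's own rewriter may group and order patterns differently.

```python
from typing import Dict, List, Sequence, Set, Tuple

Triple = Tuple[str, str, str]


def _fmt(tp: Triple) -> str:
    return " ".join(tp) + " ."


def rewrite_federated(projection: Sequence[str], bgp: Sequence[Triple],
                      sources: Sequence[Set[str]]) -> str:
    """Rewrite a basic graph pattern into a SPARQL 1.1 query using SERVICE and UNION."""
    exclusive: Dict[str, List[Triple]] = {}  # endpoint -> patterns only it can answer
    shared: List[Tuple[Triple, List[str]]] = []  # patterns answerable by several endpoints
    for tp, srcs in zip(bgp, sources):
        if not srcs:
            raise ValueError(f"no relevant source for pattern {tp}")
        if len(srcs) == 1:
            exclusive.setdefault(next(iter(srcs)), []).append(tp)
        else:
            shared.append((tp, sorted(srcs)))

    blocks = []
    for endpoint, tps in exclusive.items():
        body = " ".join(_fmt(tp) for tp in tps)
        blocks.append(f"  SERVICE <{endpoint}> {{ {body} }}")
    for tp, endpoints in shared:
        alternatives = [f"{{ SERVICE <{ep}> {{ {_fmt(tp)} }} }}" for ep in endpoints]
        blocks.append("  " + " UNION ".join(alternatives))
    return f"SELECT {' '.join(projection)} WHERE {{\n" + "\n".join(blocks) + "\n}"
```

Grouping the exclusive patterns of one endpoint into a single `SERVICE` block is safe here. No other endpoint can contribute to those patterns, so that endpoint can evaluate their join locally.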
Federated SPARQL query processing over the Web of Data (Muhammad Saleem)
The document discusses approaches for federating SPARQL queries over the web of data. It describes SPARQL endpoint federation, linked data federation, and distributed hash table (DHT) approaches. It also discusses techniques for optimizing query federation, including query rewriting, source selection, join order selection, and join implementations. Source selection algorithms discussed include index-free approaches (using SPARQL ASK queries), index-only approaches (using data summaries), and hybrid approaches.
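Index-free source selection, as used by engines such as FedX, works as follows. The engine sends one SPARQL `ASK` query per triple pattern and endpoint, then keeps the endpoints that answer true. The sketch below leaves the transport to a caller-supplied `ask(endpoint, query)` function. This is a hypothetical hook: in practice an HTTP client or a library such as SPARQLWrapper would implement it. Answers are cached, so a repeated pattern is not probed twice.

```python
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple

Triple = Tuple[str, str, str]


def ask_query(tp: Triple) -> str:
    """Build the SPARQL ASK probe for one triple pattern."""
    return "ASK { " + " ".join(tp) + " }"


def select_sources(bgp: Sequence[Triple], endpoints: Sequence[str],
                   ask: Callable[[str, str], bool],
                   cache: Optional[Dict[Tuple[str, str], bool]] = None) -> List[Set[str]]:
    """Triple-pattern-wise source selection without an index.

    `ask(endpoint, query)` must send the ASK query and return its boolean answer.
    Passing the same `cache` dict across calls reuses answers between queries.
    """
    cache = {} if cache is None else cache
    selected = []
    for tp in bgp:
        q = ask_query(tp)
        relevant = set()
        for ep in endpoints:
            if (ep, q) not in cache:
                cache[(ep, q)] = ask(ep, q)
            if cache[(ep, q)]:
                relevant.add(ep)
        selected.append(relevant)
    return selected
```

The price of being index-free is one round trip per (pattern, endpoint) pair on a cold cache. Index-only and hybrid approaches try to avoid that cost by using precomputed data summaries.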
The document describes federated SPARQL query processing over the Web of Data. It discusses different approaches to SPARQL query federation including SPARQL endpoint federation, linked data federation, linked data fragments federation, and hybrid approaches. It also covers topics related to federated query optimization such as source selection, join order selection, and join implementations. Source selection algorithms discussed include index-free, index-only, and hybrid approaches.
The document provides an overview of the semantic web including its goals of making data meaningful and discoverable. It discusses approaches to building the semantic web such as RDF, RDFS, OWL, and SPARQL. It also covers microformats as a more practical approach and provides examples of using RDF, OWL, SPARQL, and various microformats.
FedX - Optimization Techniques for Federated Query Processing on Linked Data (aschwarte)
The final slides of our talk about FedX at the 10th International Semantic Web Conference in Bonn. For details about FedX see http://www.fluidops.com/fedx/
Visualising the Australian open data and research data landscape (Jonathan Yu)
"Visualising the Australian open data and research data landscape" at C3DIS May 2018 in Melbourne. In this talk, we presented work on visualising a survey of open government and research data in Australia. This features a first attempt at formalising a quantitative approach to measuring the data ecosystem in Australia.
Mapping Hierarchical Sources into RDF using the RML Mapping Language (andimou)
Incorporating structured data in the Linked Data cloud is still complicated, despite the numerous existing tools. In particular, hierarchical structured data (e.g., JSON) are underrepresented due to their processing complexity. A uniform mapping formalisation for data in different formats, which would enable reuse and exchange between tools and applied data, is missing. This paper describes a novel approach to mapping heterogeneous and hierarchical data sources into RDF using the RML mapping language, an extension of R2RML (the W3C standard for mapping relational databases into RDF). To facilitate those mappings, we present a toolset for producing RML mapping files using the Karma data modelling tool, and for consuming them using a prototype RML processor. A use case shows how RML facilitates the definition and execution of mapping rules to map several heterogeneous sources.
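RML adds several ingredients on top of R2RML for hierarchical sources: an iterator over the nested records, a subject template, and predicate-object maps that reference fields. The minimal plain-Python sketch below illustrates these ingredients. It is not the RML processor. A key-list iterator stands in for a JSONPath expression, only literal objects are produced, and IRIs built from templates are not percent-encoded. All names are hypothetical.

```python
import json
from typing import Any, Dict, Iterator, List, Sequence, Tuple


def iterate(doc: Any, path: Sequence[str]) -> Iterator[Dict[str, Any]]:
    """Walk a list of keys into nested JSON, flattening arrays on the way.

    This stands in (hypothetically) for the JSONPath iterator of an RML logical source.
    """
    nodes = [doc]
    for key in path:
        next_nodes = []
        for node in nodes:
            value = node[key]
            next_nodes.extend(value if isinstance(value, list) else [value])
        nodes = next_nodes
    yield from nodes


def map_json(doc: Any, iterator: Sequence[str], subject_template: str,
             predicate_objects: Sequence[Tuple[str, str]]) -> List[str]:
    """Produce N-Triples lines: one subject IRI per record, one literal per mapped field."""
    lines = []
    for record in iterate(doc, iterator):
        subject = subject_template.format(**record)  # e.g. "http://ex.org/person/{id}"
        for predicate, field in predicate_objects:
            if field in record:
                # json.dumps yields a quoted, escaped string that is also a valid N-Triples literal.
                literal = json.dumps(str(record[field]))
                lines.append(f"<{subject}> <{predicate}> {literal} .")
    return lines
```

Calling `map_json(doc, ["people"], "http://ex.org/person/{id}", [("http://xmlns.com/foaf/0.1/name", "name")])` on a document holding a `people` array yields one `foaf:name` triple per person.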
http://rml.io
https://github.com/mmlab/RMLProcessor
The document discusses the Semantic Web and Linked Data. It provides an overview of RDF syntaxes, storage and querying technologies for the Semantic Web. It also discusses issues around scalability and reasoning over large amounts of semantic data. Examples are provided to illustrate SPARQL querying of RDF data, including graph patterns, conjunctions, optional patterns and value testing.
Boston Globe investigative reporter Todd Wallack prepared this presentation on finding data-driven enterprise stories off your beat for journalists attending New England NewsTrain on Oct. 14, 2017. It is accompanied by a handout: Data-driven enterprise. NewsTrain is a training initiative of Associated Press Media Editors (APME). More info: http://bit.ly/NewsTrain
The document discusses querying live linked data from millions of diverse data sources on the web. It presents different approaches for source selection when querying over dynamic linked data, including using indexes, data summaries, and direct execution. Evaluation of the approaches shows that combining querying of static RDF stores and the live web through source selection dynamics can improve query time and return fresher results.
The document presents a method for clustering and exploring search results using timelines. It describes annotating documents with temporal metadata, constructing time outlines from the metadata to organize search results chronologically, and clustering documents based on time granularity. An evaluation using Amazon Mechanical Turk found the method improved search result relevance by adding temporal context and snippets.
The new RDA: resource description in libraries and beyond / Gordon Dunsire (CILIP MDG)
This document discusses the new RDA (Resource Description and Access), which provides data elements, guidelines and instructions for creating metadata for library and cultural heritage resources. Key points:
- The RDA Toolkit provides user-focused elements, guidelines and instructions. The RDA Registry provides infrastructure for well-formed, linked RDA data applications.
- There are now 13 entities and over 1700 elements in RDA. Elements are now the main unit of focus and have standard structure/layout.
- Recording methods have been extended to all elements and now make the linked data method explicit. More instructions are now optional, to accommodate local practice.
- Effective description requires choosing appropriate entities and elements based on an application profile.
SPARQL Querying of Property Graphs, Graph Day 2017 SF (Harsh Thakkar)
Knowledge graphs have become popular over the past decade and frequently rely on the Resource Description Framework (RDF) or property graph databases as data models. We present the first translator from SPARQL (the W3C-standardised query language for RDF) to Gremlin (a popular property graph traversal language). Gremlinator translates SPARQL queries to Gremlin path traversals for executing graph pattern matching queries over graph databases.
This allows users who are well versed in SPARQL to access and query a wide variety of Graph Data Management Systems (DMSs) without the steep learning curve of adapting to a new Graph Query Language (GQL). Gremlin is a traversal language that is agnostic of the underlying graph computing system (covering both OLTP graph databases and OLAP graph processors), which makes it a desirable choice for supporting interoperability when querying Graph DMSs.
Semantics 2017 - Trying Not to Die Benchmarking using LITMUS (Harsh Thakkar)
The document discusses developing an open benchmarking framework called LITMUS to evaluate diverse data management systems (DMSs) in a standardized way. It addresses key challenges in 1) converting between data models like RDF and property graphs, 2) translating queries between languages like SPARQL and Gremlin, and 3) selecting appropriate key performance indicators (KPIs). LITMUS is designed with modules for data integration, querying, profiling system performance, and analyzing results. Addressing the challenges of data conversion, query translation, and metrics selection is needed to realize LITMUS' goal of enabling automated, cross-domain benchmarking of different types of DMSs.
Introduction to property graphs and Gremlin (Harsh Thakkar)
The document provides an introduction to property graphs and the Gremlin traversal language. It discusses why graphs are useful for modeling real-world data and relationships. It describes where graphs are found in various domains. It then explains the differences between property graph and RDF graph data models, and how property graphs compactly represent data with nodes and edges having their own properties. Finally, it briefly mentions some graph query languages including Gremlin, SPARQL, and PGQL.
Data Integration & Disintegration: Managing SN SciGraph with SHACL and OWL (Tony Hammond)
A presentation on 23 October 2017 by Tony Hammond, Michele Pasin and Evangelos Theodoridis to the International Semantic Web Conference (ISWC) 2017 Industry Track on managing Springer Nature (SN) SciGraph with SHACL and OWL. See http://scigraph.com/ for more information on the project.
Adversarial and reinforcement learning-based approaches to information retrieval (Bhaskar Mitra)
Traditionally, machine-learning-based approaches to information retrieval have taken the form of supervised learning-to-rank models. Recently, other machine learning approaches, such as adversarial learning and reinforcement learning, have started to find interesting applications in retrieval systems. At Bing, we have been exploring some of these methods in the context of web search. In this talk, I will share a couple of our recent papers in this area that we presented at SIGIR 2018.
An increasing amount of valuable semi-structured data has become available online. In this talk, we give an overview of the state of the art in entity ranking over structured data ("linked data").
[SIGIR17] Learning to Rank Using Localized Geometric Mean Metrics (Yuxin Su)
Yuxin Su, Irwin King, and Michael Lyu from the Chinese University of Hong Kong propose a new localized geometric mean metric learning (L-GMML) algorithm for query-independent learning to rank. L-GMML learns multiple local metrics, which capture the similarity between query-document pairs better than a single global metric. Experiments on benchmark datasets show that L-GMML outperforms state-of-the-art ranking algorithms such as GBRT and λ-MART in accuracy, while also scaling better computationally to large datasets. The paper's main contribution is being the first to apply metric learning, and local metrics in particular, to query-independent learning to rank.
This document introduces VenmoPlus.com, a service that allows users to explore their Venmo network. It provides features such as fuzzy username searching, labeling relationships between payers and receivers, friend recommendations, searching transactions within a friend circle, and listing friends. It uses Redis to store the graph structure and Elasticsearch to store all other data. Algorithms such as breadth-first search and bidirectional search are used to find the distance between nodes in the dynamic graph in real time. This two-database approach, with Redis and Elasticsearch, lets the system support these features efficiently.
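Bidirectional search finds the hop distance between two users by growing breadth-first frontiers from both ends and stopping when they meet. On social graphs this usually explores far fewer nodes than a single BFS from one end. The minimal sketch below assumes an undirected graph held in an in-memory dictionary of adjacency sets, whereas the described service keeps the graph in Redis. The function names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Hashable, Iterable, Optional, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def build_graph(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Undirected adjacency sets, e.g. built from payer/receiver pairs."""
    graph: Graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def bidirectional_distance(graph: Graph, src: Hashable, dst: Hashable) -> Optional[int]:
    """Hop distance between src and dst, or None if they are not connected."""
    if src == dst:
        return 0
    dist_f, dist_b = {src: 0}, {dst: 0}
    frontier_f, frontier_b = deque([src]), deque([dst])
    while frontier_f and frontier_b:
        # Expand the smaller frontier by exactly one BFS level.
        if len(frontier_f) <= len(frontier_b):
            frontier, dist, other = frontier_f, dist_f, dist_b
        else:
            frontier, dist, other = frontier_b, dist_b, dist_f
        best: Optional[int] = None
        for _ in range(len(frontier)):
            node = frontier.popleft()
            for nb in graph.get(node, ()):
                if nb in other:  # the two searches meet on this edge
                    candidate = dist[node] + 1 + other[nb]
                    best = candidate if best is None else min(best, candidate)
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    frontier.append(nb)
        if best is not None:
            return best  # every meeting edge on this level has been compared
    return None
```

Always expanding the smaller frontier keeps the work near the cheaper side of the search. Finishing a whole level before returning ensures that the shortest meeting point is reported, not just the first one found.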
Linking Content Information with Bayesian Personalized Ranking via Multiple C... (Ladislav Peska)
In this paper, we propose a multiple content alignments extension to the Bayesian Personalized Ranking Matrix Factorization (BPR-MCA). The proposed method incorporates multiple sources of content information in the form of user-to-user or object-to-object similarity matrices and aligns users’ and items’ latent factors according to these similarities. During the training phase, BPR-MCA also learns the relevance weight of each similarity matrix.
BPR-MCA was evaluated on the MovieLens 1M dataset, extended by the content information from IMDB, DBTropes and ZIP code statistics. The experiment shows that BPR-MCA can help to significantly improve recommendation w.r.t. nDCG and AUPR over standard BPR under several cold-start scenarios.
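For reference, the standard BPR-MF update that BPR-MCA extends can be sketched as one stochastic gradient step on a (user, observed item, unobserved item) triple. The step ascends ln σ(x_uij) with x_uij = p_u·q_i - p_u·q_j under L2 regularization. This sketch covers plain BPR only. The content-alignment terms and the learned similarity-matrix weights of BPR-MCA are not shown, and the learning rate and regularization values are arbitrary assumptions.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def bpr_step(p_u: Vector, q_i: Vector, q_j: Vector, lr: float = 0.05, reg: float = 0.01) -> float:
    """One in-place SGD step of plain BPR-MF for a (user u, observed item i, unobserved item j) triple.

    Ascends ln sigmoid(x_uij) - (reg / 2) * ||params||^2 with x_uij = p_u.q_i - p_u.q_j.
    Returns x_uij as it was before the update.
    """
    x_uij = dot(p_u, q_i) - dot(p_u, q_j)
    # g = sigmoid(-x_uij) = d/dx ln sigmoid(x), computed without overflow for large |x_uij|.
    if x_uij >= 0:
        e = math.exp(-x_uij)
        g = e / (1.0 + e)
    else:
        g = 1.0 / (1.0 + math.exp(x_uij))
    for f in range(len(p_u)):
        pu, qi, qj = p_u[f], q_i[f], q_j[f]  # pre-update values for all three gradients
        p_u[f] += lr * (g * (qi - qj) - reg * pu)
        q_i[f] += lr * (g * pu - reg * qi)
        q_j[f] += lr * (-g * pu - reg * qj)
    return x_uij
```

In training, triples are sampled repeatedly from the observed interactions, and `bpr_step` pushes each observed item above the sampled unobserved one for that user.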
Open government data portals: from publishing to use and impact (Elena Simperl)
The document discusses open government data portals and their evolution from initial publishing of data to supporting reuse and impact. It describes the key stages in developing portals, including the first portal launched over 13 years ago and the current European data portal. The document outlines work done to support the entire data value chain, analyze portal usage, develop guidelines to make portals more user-centric, and measure their effectiveness in promoting reuse. Examples are provided for how portals can better organize data, promote reuse, and co-locate documentation to support users.
VenmoPlus.com is a service that allows users to explore their Venmo network through additional features like fuzzy username searching, labeling relationships between payers and receivers, friend recommendations, searching transactions within a friend circle, and listing friends. It uses Redis to store the graph structure and Elasticsearch to store all other data. Algorithms like breadth-first search and bidirectional search are used to find distances between nodes in the dynamic graph in real time. The system is optimized through the use of two databases, algorithm design, and query optimizations to support features like searching relationships and transactions within a user's friend circle.
- Engineering student at UC Berkeley interested in data science and AI positions
- Relevant technical skills include machine learning, optimization, mathematical modeling, and programming skills such as Python, SQL, R, and C
- Current projects include using machine learning for patent analysis, optimizing forest fire waste supply chains, and developing music recommendations using convolutional neural networks
1. Two for One -- Querying Property Graphs using SPARQL via GREMLINATOR
Harsh Thakkar, Dharmen Punjani, Jens Lehmann, Sören Auer
2. GRADES-NDA‘18 ⦿ Houston, TX, USA ⦿ June 10, 2018 Two for One - SPARQL Querying of PGs via Gremlinator ⦿ Harsh Thakkar ⦿ University of Bonn
4.
● Graph formalism → intuitive way of modeling complex, highly connected data
● The Resource Description Framework (RDF, W3C standard, 2004) data model and the Property Graph (PG) data model are the most popular graph data models.
● Various Graph Query Languages (GQLs) have been proposed to address:
○ Declarative style - Pattern Matching
○ Imperative style - Traversing
● SPARQL (W3C standard, 2008) for querying RDF databases, but ?? (no standard) for PG databases
● Lack of standardization → Vendor lock-in → Interoperability gap → many issues!
5.
● Gremlinator advantages:
○ Avoid a steep learning curve for users well versed in SPARQL for querying graph databases
○ Perform both OLTP and OLAP querying using SPARQL
○ Bridge the gap between the two graph data models (RDF & PG), and
■ between the Semantic Web and graph database communities
■ Best of both worlds ⇒ get “Two for One”
7. RDF Data Model
● RDF is a triple-based graph model (W3C'04), where:
○ Subject: URI or blank node
○ Predicate: URI (the property)
○ Object: URI, literal, or blank node
[Figure: example RDF graph with nodes ex:Person, ex:Event and ex:Houston and literals “2018”, “GRADES-NDA’18” and “20”, connected by ex:speaker, ex:name, ex:year, ex:place and ex:time]
URI = Uniform Resource Identifier (IRIs generalize URIs), analogous to an ISBN for books
Literals = data values
Blank nodes = descriptions of entities that don't need to be named
@prefix ex: <http://example.org/> .
ex:Person ex:speaker ex:Event .
ex:Person ex:name "Harsh" .
ex:Person ex:place ex:Bonn .
ex:Person ex:age "28" .
ex:Event ex:name "GRADES-NDA'18" .
ex:Event ex:year "2018" .
[Figure panels: representation (the triples above) vs. interpretation (the drawn graph), including ex:Person with ex:name “Harsh”, ex:age “28” and ex:place ex:Bonn]
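To make the triple model concrete, here is a minimal Python sketch (not part of Gremlinator) that stores the example triples as plain tuples and matches one SPARQL-style triple pattern against them. Prefixed names are left unexpanded and literals keep their quotes; `GRAPH`, `match_pattern` and the `?`-prefix convention for variables are assumptions made for illustration.

```python
from typing import Dict, Iterator, Set, Tuple

Triple = Tuple[str, str, str]

# The example triples from the slide; prefixed names stay unexpanded and
# literals keep their quotes so they cannot be confused with IRIs.
GRAPH: Set[Triple] = {
    ("ex:Person", "ex:speaker", "ex:Event"),
    ("ex:Person", "ex:name", '"Harsh"'),
    ("ex:Person", "ex:place", "ex:Bonn"),
    ("ex:Person", "ex:age", '"28"'),
    ("ex:Event", "ex:name", "\"GRADES-NDA'18\""),
    ("ex:Event", "ex:year", '"2018"'),
}


def is_var(term: str) -> bool:
    """Treat terms starting with '?' as SPARQL-style variables."""
    return term.startswith("?")


def match_pattern(pattern: Triple, graph: Set[Triple]) -> Iterator[Dict[str, str]]:
    """Yield one variable binding per triple that matches a single triple pattern."""
    for triple in sorted(graph):  # sorted only to make the output order stable
        binding: Dict[str, str] = {}
        for p_term, g_term in zip(pattern, triple):
            if is_var(p_term):
                # A variable used twice in one pattern must bind to the same term.
                if binding.setdefault(p_term, g_term) != g_term:
                    break
            elif p_term != g_term:
                break
        else:
            yield binding


for row in match_pattern(("?s", "ex:name", "?name"), GRAPH):
    print(row)
```

A SPARQL basic graph pattern is a conjunction of such triple patterns whose bindings are joined on shared variables; this sketch covers only the single-pattern case.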
8. RDF Graphs (RDFGs)
● Edge-labelled, directed multi-graphs (with entity URIs, blank nodes, and literals)
● Going from information to knowledge using OWL (description logics) and ontologies (RDFS, RDFa, etc.)
● Bulky
○ Everything is a node-edge-node triple (edges do not have properties)
○ More relationships per node → more triples in total!
■ Triple/dataset explosion
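The explosion can be seen by encoding a single property-graph edge that carries a property, such as the speaker edge with Time: 20 from the example, in plain RDF. The sketch below assumes standard RDF reification (rdf:Statement with rdf:subject, rdf:predicate and rdf:object); other encodings such as singleton properties, named graphs or RDF-star would give different counts. The statement IRI `ex:stmt1` and the function name are hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def edge_to_reified_rdf(subj: str, pred: str, obj: str, stmt: str,
                        edge_props: Dict[str, str]) -> List[Triple]:
    """Encode one edge plus its edge properties as RDF triples using standard
    RDF reification, since an RDF edge cannot carry properties itself."""
    triples: List[Triple] = [(subj, pred, obj)]  # the edge itself
    # Four bookkeeping triples that turn the edge into an addressable resource.
    triples += [
        (stmt, "rdf:type", "rdf:Statement"),
        (stmt, "rdf:subject", subj),
        (stmt, "rdf:predicate", pred),
        (stmt, "rdf:object", obj),
    ]
    # One further triple per edge property.
    triples += [(stmt, key, value) for key, value in edge_props.items()]
    return triples


triples = edge_to_reified_rdf("ex:Person", "ex:speaker", "ex:Event",
                              "ex:stmt1", {"ex:time": '"20"'})
print(len(triples))  # 6 triples for what a property graph stores as one edge
```

Under this encoding the four reification triples are paid once per edge and each edge property adds one more, so edge-heavy data grows quickly in RDF.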
9. Property Graph Data Model
● Edge-labelled, directed, attributed, multi-graph
● Vertices and edges both have properties
● Main components:
○ Vertices, edges (source, destination), properties (key-value pairs), labels (strings)
● Super neat (compact), super cute
● Easier to add weighted, reified edges
● Query languages: Cypher, Gremlin, PGQL, etc.
[Figure: Person vertex {Name: Harsh, Age: 28, Place: Bonn} --speaker {Time: 20}--> Event vertex {Name: GRADES-NDA’18, Year: 2018, Place: Houston}]
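A property graph maps naturally onto vertex and edge records that each carry a label and a key-value property map. The minimal Python sketch below rebuilds the example (a Person vertex, an Event vertex, and a speaker edge with Time: 20); the class names and numeric ids are hypothetical and do not reflect TinkerPop's API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Vertex:
    id: int
    label: str
    props: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    id: int
    label: str
    src: int  # id of the source vertex
    dst: int  # id of the destination vertex
    props: Dict[str, Any] = field(default_factory=dict)  # edges have properties too


@dataclass
class PropertyGraph:
    vertices: Dict[int, Vertex] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_vertex(self, v: Vertex) -> Vertex:
        self.vertices[v.id] = v
        return v

    def add_edge(self, e: Edge) -> Edge:
        # Multi-graph: parallel edges between the same pair of vertices are allowed.
        self.edges.append(e)
        return e

    def out_edges(self, vid: int, label: str) -> List[Edge]:
        """Outgoing edges of a vertex that carry the given label."""
        return [e for e in self.edges if e.src == vid and e.label == label]


g = PropertyGraph()
person = g.add_vertex(Vertex(1, "Person", {"Name": "Harsh", "Age": 28, "Place": "Bonn"}))
event = g.add_vertex(Vertex(2, "Event", {"Name": "GRADES-NDA'18", "Year": 2018, "Place": "Houston"}))
talk = g.add_edge(Edge(3, "speaker", person.id, event.id, {"Time": 20}))
print(g.vertices[talk.dst].props["Name"], talk.props["Time"])
```

The edge property lives directly on the edge record, which is the compactness the slide contrasts with RDF reification.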
11. Gremlin’s Multi-Graph Query Language (GQL) support
[Figure sources: http://www.datastax.com/wp-content/uploads/2015/09/many-to-many-mapping.png and http://www.datastax.com/wp-content/uploads/2015/09/gtm-dataflow.png]
12.
…
Multi-DMS & platform support
[Figure source: https://tinkerpop.apache.org/images/oltp-and-olap.png]
And thus… ➤
13.
Gremlinator
Me
Coffee
15.
• Gremlinator is a novel translation approach that maps SPARQL queries to Gremlin pattern-matching traversals [1, 2]
Talk@Graph Day 2017
[1] Thakkar, Harsh, Dharmen Punjani, et al. "Towards an Integrated Graph Algebra for Graph Pattern Matching with Gremlin." In Proceedings of DEXA 2017, pp. 81-91. Springer, Cham (2017).
[2] Thakkar, Harsh, Dharmen Punjani, et al. "A Stitch in Time Saves Nine--SPARQL querying of Property Graphs using Gremlin Traversals." Under review at the Semantic Web Journal (submitted Feb. 2018).
16.
[Figure: the graph traversal pattern* over the example property graph: Person {Name: Harsh, Age: 28, From: Bonn, DE} --speaker {Time: 20}--> Event {Name: GRADES-NDA’18, Year: 2018, Place: Houston}]
* Rodriguez, Marko A., and Peter Neubauer. "The graph traversal pattern." arXiv preprint arXiv:1004.1001 (2010).
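The graph traversal pattern treats a query as a pipeline of steps, each mapping the current set of traversers to a new one. The toy Python class below imitates that idea over the example graph; it is eager, single-threaded and only loosely modeled on Gremlin's V(), has(), out() and values() steps, and every name in it is an assumption rather than TinkerPop's API.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical in-memory copy of the slide's example property graph.
VERTICES: Dict[str, Dict[str, Any]] = {
    "person": {"label": "Person", "Name": "Harsh", "Age": 28, "From": "Bonn, DE"},
    "event": {"label": "Event", "Name": "GRADES-NDA'18", "Year": 2018, "Place": "Houston"},
}
# Edges as (source id, edge label, destination id, edge properties).
EDGES: List[Tuple[str, str, str, Dict[str, Any]]] = [
    ("person", "speaker", "event", {"Time": 20}),
]


class Traversal:
    """Toy, eager pipeline of Gremlin-like steps over VERTICES/EDGES.

    Each step turns the current list of traversers (vertex ids or
    property values) into a new list."""

    def __init__(self, traversers: List[Any]) -> None:
        self.traversers = traversers

    @classmethod
    def V(cls) -> "Traversal":
        # Start step: one traverser per vertex (sorted for a stable order).
        return cls(sorted(VERTICES))

    def has(self, key: str, value: Any) -> "Traversal":
        # Filter step: keep vertices whose property equals the value.
        return Traversal([v for v in self.traversers if VERTICES[v].get(key) == value])

    def out(self, label: str) -> "Traversal":
        # Flat-map step: follow outgoing edges with the given label.
        return Traversal([dst for v in self.traversers
                          for (src, lbl, dst, _props) in EDGES
                          if src == v and lbl == label])

    def values(self, key: str) -> "Traversal":
        # Map step: replace each vertex by one of its property values.
        return Traversal([VERTICES[v][key] for v in self.traversers if key in VERTICES[v]])

    def to_list(self) -> List[Any]:
        return list(self.traversers)


# Loosely analogous to g.V().has('Name', 'Harsh').out('speaker').values('Place')
print(Traversal.V().has("Name", "Harsh").out("speaker").values("Place").to_list())
```

Real Gremlin evaluates steps lazily and can distribute them for OLAP execution; the eager lists here only illustrate the step-by-step data flow.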
17. Mapping corresponding Gremlin operators
18. Example: Tinker Modern-Crew Graph
“Select only those persons who are 30 or younger and who created a piece of software collectively (together with a person they know).”
SELECT ?a ?b ?c WHERE {
?a v:label "person" .
?a e:knows ?b .
?a e:created ?c .
?b e:created ?c .
?a v:age ?d .
FILTER (?d <= 30)
}
19. Contd…
SELECT ?a ?b ?c WHERE {
?a v:label "person" .
?a e:knows ?b .
?a e:created ?c .
?b e:created ?c .
?a v:age ?d .
FILTER (?d <= 30)
}
[MatchStartStep@[a], HasStep([~label.eq(person)]), MatchEndStep]
[MatchStartStep@[a], VertexStep(OUT,[knows],vertex)@[b], MatchEndStep]
[MatchStartStep@[a], VertexStep(OUT,[created],vertex)@[c], MatchEndStep]
[MatchStartStep@[b], VertexStep(OUT,[created],vertex)@[c], MatchEndStep]
[MatchStartStep@[a], PropertiesStep([age],value)@[d], MatchEndStep]
[WhereTraversalStep([WhereStartStep(d), IsStep(leq(30))]), MatchEndStep]
BGPs
BGP (q) ➞ SST* ( )
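The per-pattern mapping above (each triple pattern becomes one single-step traversal inside match()) can be sketched as a small text-to-text translator. This is a simplified illustration, not Gremlinator's implementation: it assumes the e: prefix marks edge labels and v: marks vertex labels or properties, handles only the pattern shapes used in the example, and emits Gremlin-Groovy clause strings. The function names and the operator table are hypothetical.

```python
import re
from typing import Tuple

TriplePattern = Tuple[str, str, str]

# SPARQL comparison operators mapped to Gremlin predicate names.
OPS = {"<=": "lte", "<": "lt", ">=": "gte", ">": "gt", "=": "eq"}


def var(term: str) -> str:
    """Drop the leading '?' of a SPARQL variable to get a match() label."""
    if not term.startswith("?"):
        raise ValueError(f"expected a variable, got {term!r}")
    return term[1:]


def translate_pattern(tp: TriplePattern) -> str:
    """Translate one triple pattern into one Gremlin match() clause (as text)."""
    s, p, o = tp
    prefix, name = p.split(":", 1)
    value = o.strip('"')
    if prefix == "v" and name == "label":
        # ?a v:label "person"  ->  HasStep([~label.eq(person)])
        return f"__.as('{var(s)}').hasLabel('{value}')"
    if prefix == "e":
        # ?a e:knows ?b  ->  VertexStep(OUT,[knows],vertex)
        return f"__.as('{var(s)}').out('{name}').as('{var(o)}')"
    if prefix == "v" and o.startswith("?"):
        # ?a v:age ?d  ->  PropertiesStep([age],value)
        return f"__.as('{var(s)}').values('{name}').as('{var(o)}')"
    if prefix == "v":
        # ?a v:name "marko"  ->  property compared with a constant
        return f"__.as('{var(s)}').has('{name}', '{value}')"
    raise ValueError(f"unsupported predicate: {p}")


def translate_filter(expr: str) -> str:
    """Translate a FILTER of the form '?x op constant' into a filtering clause."""
    m = re.fullmatch(r"\s*\?(\w+)\s*(<=|>=|<|>|=)\s*(\S+)\s*", expr)
    if m is None:
        raise ValueError(f"unsupported filter: {expr!r}")
    variable, op, const = m.groups()
    return f"__.as('{variable}').is({OPS[op]}({const}))"


bgp = [
    ("?a", "v:label", '"person"'),
    ("?a", "e:knows", "?b"),
    ("?a", "e:created", "?c"),
    ("?b", "e:created", "?c"),
    ("?a", "v:age", "?d"),
]
for tp in bgp:
    print(translate_pattern(tp))
print(translate_filter("?d <= 30"))
```

Keeping one clause per triple pattern mirrors the one-to-one correspondence shown in the step listing, and leaves join ordering to Gremlin's match() step.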
20.
[GraphStep(vertex,[]), MatchStep(AND,[
[MatchStartStep(a), HasStep([~label.eq(person)]), MatchEndStep],
[MatchStartStep(a), VertexStep(OUT,[knows],vertex), MatchEndStep(b)],
[MatchStartStep(a), VertexStep(OUT,[created],vertex), MatchEndStep(c)],
[MatchStartStep(b), VertexStep(OUT,[created],vertex), MatchEndStep(c)],
[MatchStartStep(a), PropertiesStep([age],value), MatchEndStep(d)],
[MatchStartStep(d), WhereTraversalStep([WhereStartStep, IsStep(leq(30))]), MatchEndStep] ] ),
SelectStep([a, b, c])]
SELECT ?a ?b ?c WHERE {
?a v:label "person" .
?a e:knows ?b .
?a e:created ?c .
?b e:created ?c .
?a v:age ?d .
FILTER (?d <= 30)
}
Contd…
SPARQL Query
CGP (Q) ➞ Traversal ( )
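The complete traversal has three parts: a GraphStep (g.V()), one MatchStep holding all clauses, and a SelectStep for the projected variables. The Python helper below only assembles that text from already-translated clause strings. It is a hypothetical illustration of the overall shape; a compiler would more likely build traversal step objects, as the step listing above suggests.

```python
from typing import List


def assemble(clauses: List[str], projection: List[str]) -> str:
    """Wrap match() clauses into one traversal and project the SELECT variables,
    mirroring GraphStep -> MatchStep(AND, [...]) -> SelectStep."""
    body = ",\n    ".join(clauses)
    columns = ", ".join(f"'{v}'" for v in projection)
    return f"g.V().match(\n    {body}\n).select({columns})"


clauses = [
    "__.as('a').hasLabel('person')",
    "__.as('a').out('knows').as('b')",
    "__.as('a').out('created').as('c')",
    "__.as('b').out('created').as('c')",
    "__.as('a').values('age').as('d')",
    "__.as('d').is(lte(30))",
]
print(assemble(clauses, ["a", "b", "c"]))
```

The printed text is Gremlin-Groovy syntax; note that TinkerPop's match() step may reorder clauses at runtime, so clause order in the text need not be the execution order.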
21.
“Select only those persons who are 30 or younger and who created a piece of software collectively (together with a person they know).”
SELECT ?a ?b ?c WHERE {
?a v:label "person" .
?a e:knows ?b .
?a e:created ?c .
?b e:created ?c .
?a v:age ?d .
FILTER (?d <= 30)
}
{a=v[2], b=v[4], c=v[3]}
Contd…
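The answer can be cross-checked without a graph database by evaluating the query's basic graph pattern and FILTER by brute force. The Python sketch below assumes the standard TinkerPop "modern" toy graph and identifies vertices by name rather than by id, because the v[...] ids on the slide come from the demo's own graph loading; the dictionary layout and function name are hypothetical.

```python
from itertools import product
from typing import Dict, List

# TinkerPop "modern" toy graph, keyed by vertex name (assumed data layout).
PERSON_AGE = {"marko": 29, "vadas": 27, "josh": 32, "peter": 35}
SOFTWARE = {"lop", "ripple"}
KNOWS = {("marko", "vadas"), ("marko", "josh")}
CREATED = {("marko", "lop"), ("josh", "ripple"), ("josh", "lop"), ("peter", "lop")}


def answer() -> List[Dict[str, str]]:
    """Brute-force evaluation of the example query over all (a, b, c) triples."""
    vertices = sorted(set(PERSON_AGE) | SOFTWARE)
    rows: List[Dict[str, str]] = []
    for a, b, c in product(vertices, repeat=3):
        if (a in PERSON_AGE                 # ?a v:label "person"
                and (a, b) in KNOWS         # ?a e:knows ?b
                and (a, c) in CREATED       # ?a e:created ?c
                and (b, c) in CREATED       # ?b e:created ?c
                and PERSON_AGE[a] <= 30):   # ?a v:age ?d . FILTER (?d <= 30)
            rows.append({"a": a, "b": b, "c": c})
    return rows


print(answer())  # [{'a': 'marko', 'b': 'josh', 'c': 'lop'}]
```

A single row comes back: marko (29) is the only person aged 30 or younger who knows anyone, and of the two people he knows only josh also created lop.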
22.
https://sd.keepcalm-o-matic.co.uk/i-w600/keep-calm-it-is-demo-time.jpg
http://gremlinator.iai.uni-bonn.de:8080/Demo/
23. Team
Other Resources
Dharmen Punjani, Harsh Thakkar, Prof. Dr. Sören Auer, Prof. Dr. Jens Lehmann
24.
Demo: http://195.201.31.31:8080/Demo/ (OR) http://gremlinator.iai.uni-bonn.de:8080/Demo/
Harsh Thakkar
University of Bonn
Twitter: @harsh9t
LinkedIn: thakkarharsh
E-mail: harsh9t@gmail.com
Questions? Comments?
Insults? Injuries?
26.
● A first-of-its-kind SPARQL-to-Gremlin traversal compiler [1,2] based on the Apache TinkerPop framework.
[Figure: SPARQL language features supported by GREMLINATOR: FILTER, GROUP BY, LIMIT + OFFSET, UNION, OPTIONAL, COUNT]
● It allows querying property graphs via SPARQL
● It can query a wide variety of graph DBs using SPARQL
[1] Thakkar, Harsh, Dharmen Punjani, et al. "Towards an Integrated Graph Algebra for Graph Pattern Matching with Gremlin." In Proceedings of DEXA 2017, pp. 81-91. Springer, Cham (2017).
[2] Thakkar, Harsh, Dharmen Punjani, et al. "A Stitch in Time Saves Nine--SPARQL querying of Property Graphs using Gremlin Traversals." Under review at the Semantic Web Journal (submitted Feb. 2018).
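The slide lists LIMIT + OFFSET among the supported features without showing the Gremlin they compile to. As an assumption, one natural encoding is Gremlin's range(low, high) step, which keeps the traversers at 0-based positions low <= index < high, with -1 meaning no upper bound. The small Python helper below only computes that step text; its name is hypothetical.

```python
from typing import Optional


def limit_offset_to_range(limit: Optional[int], offset: int = 0) -> str:
    """Render SPARQL LIMIT/OFFSET as the text of a Gremlin range(low, high) step.

    range() keeps the traversers at positions low <= index < high (0-based);
    a high value of -1 means "no upper bound"."""
    if offset < 0 or (limit is not None and limit < 0):
        raise ValueError("LIMIT and OFFSET must be non-negative")
    high = -1 if limit is None else offset + limit
    return f"range({offset}, {high})"


print(limit_offset_to_range(10, offset=20))  # range(20, 30)
print(limit_offset_to_range(None, offset=5))  # range(5, -1)
```

A LIMIT without OFFSET corresponds to Gremlin's limit(n), which is shorthand for range(0, n).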
27.
○ WHERE
○ SELECT
28. GPM using Gremlin*
1. g.V().match(
     __.as('x').out('created').as('y')).
   select('x').dedup()
2. g.V(2).match(__.as('x').out('created').as('y')).dedup()
*In Gremlin, GPM (graph pattern matching) is executed by the match() step
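Query 1 can be mimicked on the standard TinkerPop "modern" graph to see why three vertices come back: match() yields one binding per created edge, select('x') keeps only x, and dedup() removes repeats. This Python sketch assumes the modern graph's created edges and uses vertex names instead of the slide's ids; in real Gremlin the result order depends on the graph's iteration order.

```python
from typing import Dict, List

# "created" edges of the TinkerPop modern toy graph, by vertex name (assumed).
CREATED = [("marko", "lop"), ("josh", "ripple"), ("josh", "lop"), ("peter", "lop")]


def creators() -> List[str]:
    """Emulate g.V().match(__.as('x').out('created').as('y')).select('x').dedup()."""
    # match(): one {x, y} binding per edge that fits the single pattern.
    bindings: List[Dict[str, str]] = [{"x": src, "y": dst} for src, dst in CREATED]
    seen, result = set(), []
    for b in bindings:
        x = b["x"]          # select('x')
        if x not in seen:   # dedup(), keeping first occurrences in order
            seen.add(x)
            result.append(x)
    return result


print(creators())  # ['marko', 'josh', 'peter']
```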
29.
Output (query 1):
==>x:v[4]
==>x:v[2]
==>x:v[5]
Output (query 2):
==>x:v[3]