The document describes an approach called Odyssey for optimizing federated SPARQL queries. It involves computing concise statistics about links between triple patterns, called characteristic sets (CS), at a single location. These CSs capture joins and are connected to each other through characteristic pairs (CP). The approach uses these statistics to efficiently optimize query execution plans through dynamic programming. This leads to significant improvements in optimization and execution times compared to existing federated query optimization techniques.
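To make the idea concrete (this is an illustrative sketch, not the authors' implementation), a characteristic set can be computed in one pass over the triples: group triples by subject, record the set of predicates each subject uses, and count subjects per distinct predicate set. These counts are the compact statistics used for cardinality estimation:

```python
from collections import defaultdict

def characteristic_sets(triples):
    """Group triples by subject; each subject's characteristic set is the
    set of predicates it appears with. Return counts per distinct set,
    which serve as compact cardinality statistics."""
    preds_by_subject = defaultdict(set)
    for s, p, o in triples:
        preds_by_subject[s].add(p)
    counts = defaultdict(int)
    for preds in preds_by_subject.values():
        counts[frozenset(preds)] += 1
    return dict(counts)

# Invented example data: two film subjects share one characteristic set.
triples = [
    ("film1", "dbo:director", "d1"),
    ("film1", "rdf:type", "dbo:Film"),
    ("film2", "dbo:director", "d2"),
    ("film2", "rdf:type", "dbo:Film"),
    ("movie1", "owl:sameAs", "film1"),
]
cs = characteristic_sets(triples)
```

Here two subjects share the set {dbo:director, rdf:type}, so a star join on those two predicates is estimated to touch two subjects.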
The document summarizes several papers presented at the SIGIR 2011 workshop on query representation and understanding.
1. One paper analyzed temporal queries using web snippets and query logs to identify queries with implicit temporal intent, finding that dates were more frequent in snippets than logs.
2. A second paper used complex network analysis to show that search queries have a kernel-periphery structure similar to natural language, with popular query segments in the kernel and rarer segments in the periphery.
3. A third paper investigated query refinement via topic analysis and learning, using latent Dirichlet allocation on query logs to identify topics to personalize candidate query suggestions for individual users.
PyGotham NY 2017: Natural Language Processing from Scratch (Noemi Derzsy)
This document discusses using natural language processing techniques to analyze over 32,000 open datasets from NASA. It describes preprocessing text data, generating word clouds, calculating term frequency-inverse document frequency, performing topic modeling with Latent Dirichlet Allocation and K-means clustering. The topic modeling helps evaluate how accurately descriptions and keywords capture dataset topics. Connections between NASA datasets and other US government collections are also examined.
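As a rough sketch of the TF-IDF step (the talk's actual pipeline and dataset fields are not reproduced here, and the example descriptions below are invented), term weights can be computed with nothing but the standard library:

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute TF-IDF weights for a small corpus of tokenized documents:
    tf = raw term count in the document, idf = log(N / document frequency)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return weights

# Invented stand-ins for NASA dataset descriptions.
docs = [
    "aerosol optical depth over ocean".split(),
    "sea surface temperature over ocean".split(),
    "lunar crater imagery".split(),
]
w = tfidf(docs)
```

Terms that occur in many descriptions ("ocean") are down-weighted relative to corpus-unique terms ("aerosol", "lunar"), which is what makes TF-IDF useful for surfacing distinctive keywords per dataset.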
Spatial Latent Dirichlet Allocation (SLDA) is an extension of LDA that incorporates spatial information to improve topic modeling of image data. SLDA treats each region of an image grid as a document and assigns visual words representing local image patches to the closest region. This allows it to capture co-occurrence relationships between visual words better than LDA. The paper demonstrates SLDA can outperform LDA on image classification tasks by incorporating spatial context between visual words.
Session 1.5 Supporting Virtual Integration of Linked Data with Just-in-Time... (semanticsconference)
This document discusses supporting virtual integration of Linked Data through just-in-time query recompilation. It presents a technique for compiling input queries into target SPARQL queries over individual data sources: microcompilers encode knowledge of data schemas, and query skeletons provide templates. Experiments show the recompilation overhead is modest, that the technique enables queries not otherwise possible, and that it is efficient when expanding queries. Future work includes optimizing overhead and investigating other languages and templating approaches.
The application of linked data concepts and technologies promises great benefits for the retrieval and access of library and cultural heritage resources, but requires new ways of thinking by professionals and end-users. This presentation will use linked data examples from RDA: resource description and access, and will discuss:
- A basic introduction to data structures: triples, chains, and clusters
- What is a linked data record?
- Global, multilingual linked data
- Linking data from multiple sources
- The Semantic Web: a paradigm shift?
The document describes Dedalo, a system that automatically explains clusters of data by traversing linked data to find explanations. It evaluates different heuristics for guiding the traversal, finding that entropy and conditional entropy outperform other measures by reducing redundancy and search time. Experiments on authorship clusters, publication clusters, and library book borrowings demonstrate Dedalo's ability to discover explanatory linked data patterns within a limited domain. Future work includes extending Dedalo to handle more complex datasets by addressing issues such as sameAs linking and use of literals.
The document summarizes three papers presented at the SIGIR 2011 workshop on query representation and understanding.
1. The first paper examines using web snippets and query logs to identify implicit temporal intents in queries by analyzing dates mentioned in snippets and previous queries. It finds snippets contain more temporal information than query logs.
2. The second paper analyzes web search query networks and finds a kernel-periphery structure, where high-degree "kernel" words differ from low-degree "peripheral" words. This structure is less pronounced than in natural language networks.
3. The third paper proposes a topic modeling approach to query refinement that generates candidate refinements, scores them based on topic relationships, and incorporates personalization
Leveraging Linked Data to Infer Semantic Relations within Structured Sources (Mohsen Taheriyan)
Information sources such as spreadsheets and databases contain a vast amount of structured data. Understanding the semantics of this information is essential to automate searching and integrating it. Semantic models capture the intended meaning of data sources by mapping them to the concepts and relationships defined by a domain ontology. Most of the effort to automatically build semantic models is focused on labeling the data fields with ontology classes and/or properties, e.g., annotating the first column of a table with dbpedia:Person and the second one with dbpedia:Film. However, a precise semantic model needs to explicitly represent the relationships too, e.g., stating that dbpedia:director is the relation between the first and second column. In this paper, we present a novel approach that leverages the small graph patterns occurring in the Linked Open Data (LOD) to automatically infer the semantic relations within a given data source assuming that the source attributes are already annotated with semantic labels. We evaluated our approach on a dataset of museum sources using the linked data published by Smithsonian American Art Museum as background knowledge. Mining only patterns of length one and two, our method achieves an average precision of 78% and recall of 70% in inferring the relationships included in the semantic models associated with data sources.
Building generic data queries using Python AST (Adrien Chauve)
This document discusses using Python's abstract syntax tree (AST) to build generic data queries. It explains that the AST can represent code as a tree structure that can then be walked and evaluated. By walking the AST, custom evaluators can be built to generate Pandas DataFrame columns or SQL queries from a formula string. This allows adding new computed columns to datasets in a generic way using only a few lines of code.
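The mechanism described above can be sketched with the standard `ast` module. This is a minimal evaluator in the spirit of the talk, not its actual code: it walks the parsed formula and computes a new column from existing ones, using plain dict-of-lists columns instead of a Pandas DataFrame (a DataFrame- or SQL-generating evaluator would walk the same nodes).

```python
import ast
import operator

# Map AST operator node types to Python operations.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def eval_formula(formula, columns):
    """Evaluate `formula` (e.g. "price * qty") element-wise over `columns`,
    a dict mapping column names to equal-length lists."""
    tree = ast.parse(formula, mode="eval").body

    def ev(node, i):
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](ev(node.left, i), ev(node.right, i))
        if isinstance(node, ast.Name):      # column reference
            return columns[node.id][i]
        if isinstance(node, ast.Constant):  # numeric literal
            return node.value
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    n = len(next(iter(columns.values())))
    return [ev(tree, i) for i in range(n)]

data = {"price": [2.0, 3.0], "qty": [10, 4]}
data["total"] = eval_formula("price * qty + 1", data)  # new computed column
```

Adding a new computed column is then a one-liner per formula string, which is the genericity the talk highlights.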
Wi2015 - Clustering of Linked Open Data - the LODeX tool (Laura Po)
Presentation of the tool LODeX (http://www.dbgroup.unimore.it/lodex2/testCluster) at the 2015 IEEE/WIC/ACM International Conference on Web Intelligence, Singapore, December 6-8, 2015
This document discusses adding semantic structure to real-time social data from Twitter through Twitter Annotations. It describes how Annotations can be mapped to existing Semantic Web vocabularies and linked to datasets to enable real-time semantic search over social and linked data. A system called TwitLogic is presented that captures Twitter data, converts it to RDF, and publishes it as linked streams to allow for continuous querying and integration with the live Semantic Web.
Text as Data: Processing the Hebrew Bible (Dirk Roorda)
The merits of stand-off markup (LAF) versus inline markup (TEI) for processing text as data. Ideas applied to work with the Hebrew Bible, resulting in tools for researchers and end-users.
This is Part II of the tutorial "Entity Linking and Retrieval for Semantic Search" given at WSDM 2014 (together with E. Meij and D. Odijk). For the complete tutorial material (including slides for the other parts) visit http://ejmeij.github.io/entity-linking-and-retrieval-tutorial/
The document discusses using RDFS and OWL reasoning to integrate heterogeneous linked data by addressing issues like terminology and naming heterogeneity. It presents an approach using a subset of OWL 2 RL rules to reason over a billion triple corpus in a scalable way, handling the TBox separately from the ABox to avoid quadratic inferences. It also describes augmenting the reasoning with annotations to track trustworthiness and using this to filter inferences, detect inconsistencies and perform a light repair of the data. Consolidation is discussed as rewriting URIs to canonical identifiers based on owl:sameAs relations. Performance results show the different techniques taking between 1-20 hours to run over the corpus distributed across 9 machines.
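The TBox/ABox separation can be illustrated with a toy forward-chaining sketch (far from the paper's full OWL 2 RL rule set, and all names below are invented): the subclass hierarchy is saturated first, then applied to instance data in a single pass, so ABox facts never need to join with each other and the quadratic blow-up is avoided.

```python
def rdfs_closure(tbox, abox):
    """Saturate the (small) subclass TBox to a fixpoint, then make one
    pass over the ABox applying type inheritance (RDFS rule rdfs9)."""
    sub = set(tbox)  # (subclass, superclass) pairs
    changed = True
    while changed:  # transitive closure of rdfs:subClassOf
        changed = False
        for a, b in list(sub):
            for c, d in list(sub):
                if b == c and (a, d) not in sub:
                    sub.add((a, d))
                    changed = True
    inferred = set(abox)
    for s, p, o in abox:  # single ABox pass: inherit types upward
        if p == "rdf:type":
            for sc, sup in sub:
                if o == sc:
                    inferred.add((s, "rdf:type", sup))
    return inferred

tbox = [("dbo:Film", "dbo:Work"), ("dbo:Work", "owl:Thing")]
abox = [("film1", "rdf:type", "dbo:Film")]
facts = rdfs_closure(tbox, abox)
```

The key property is that the quadratic work (the nested loop) runs only over the TBox, which is tiny compared to a billion-triple ABox.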
"EARL: Joint Entity and Relation Linking for Question Answering over Knowledge Graphs" as presented at the 17th International Semantic Web Conference (ISWC), 9th of October 2018, held in Monterey, California, USA
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite... (Lucidworks)
The document discusses a reference architecture for searching and querying knowledge graphs with Solr/SIREn. It describes challenges in indexing and searching knowledge graphs due to their complex relational structure and diversity of data. The proposed architecture aims to simplify the task by reducing custom code through standardized tools and enabling quick adaptation to changes in data schemas or requirements. Key components include using SPARQL to extract relevant graph subsets and map them to a simplified schema, generating JSON documents from the extracted subgraphs for indexing, and leveraging the SIREn plugin to support structured queries over nested and relational data.
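The extract-and-flatten step can be sketched as follows (hypothetical field names; the real architecture maps SPARQL SELECT results to its own simplified schema): rows returned by a SPARQL query are grouped into one JSON document per entity, ready for indexing.

```python
import json

def rows_to_documents(rows, key="film"):
    """Group flat SPARQL result rows (dicts) into one JSON document per
    entity identified by `key`, collecting multi-valued fields as lists."""
    docs = {}
    for row in rows:
        doc = docs.setdefault(row[key], {"id": row[key], "director": []})
        doc["director"].append(row["director"])
    return [json.dumps(d, sort_keys=True) for d in docs.values()]

# Invented bindings, as a SPARQL endpoint might return them.
rows = [
    {"film": "lmdb:film/42", "director": "Jane Doe"},
    {"film": "lmdb:film/42", "director": "John Roe"},
]
docs = rows_to_documents(rows)
```

Because the grouping logic lives in one small mapping function, a schema change in the source graph only requires adjusting the extraction query and this mapping, which is the adaptability the architecture aims for.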
This document discusses research into automatically discovering strong relationships between entities in Linked Data using genetic programming. The researchers aim to learn a cost function that can guide uninformed searches over Linked Data to find the most promising relationship paths. They experiment with different topological and semantic features as inputs to genetic programming to learn cost functions. The best-performing cost functions incorporate features like namespace variety, conditional node degree, and topics. This suggests specific, well-described paths through entities of different types are indicators of strong relationships in Linked Data.
The Department of Veterans Affairs (VA) invited Taylor Paschal, Knowledge & Information Management Consultant at Enterprise Knowledge, to speak at a Knowledge Management Lunch and Learn hosted on June 12, 2024. All Office of Administration staff were invited to attend and received professional development credit for participating in the voluntary event.
The objectives of the Lunch and Learn presentation were to:
- Review what KM ‘is’ and ‘isn’t’
- Understand the value of KM and the benefits of engaging
- Define and reflect on your “what’s in it for me?”
- Share actionable ways you can participate in Knowledge Capture & Transfer
Introduction of Cybersecurity with OSS at Code Europe 2024 (Hiroshi SHIBATA)
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
High performance Serverless Java on AWS - GoTo Amsterdam 2024 (Vadym Kazulkin)
Java has for many years been one of the most popular programming languages, but it has had a hard time in the Serverless community: it is known for high cold-start times and a high memory footprint compared to languages like Node.js and Python. In this talk I'll look at the general best practices and techniques we can use to decrease memory consumption and cold-start times for Java Serverless development on AWS, including GraalVM Native Image and AWS's own offering SnapStart, which is based on Firecracker microVM snapshot and restore and CRaC (Coordinated Restore at Checkpoint) runtime hooks. I'll also provide plenty of benchmarking on Lambda functions, trying out various deployment package sizes, Lambda memory settings, Java compilation options, and HTTP (a)synchronous clients, and measure their impact on cold and warm start times.
"Scaling RAG Applications to serve millions of users", Kevin Goedecke (Fwdays)
How we managed to grow and scale a RAG application from zero to thousands of users in 7 months. Lessons from technical challenges around managing high load for LLMs, RAGs and Vector databases.
"Choosing proper type of scaling", Olena Syrota (Fwdays)
Imagine an IoT processing system that is already quite mature and production-ready, whose client coverage keeps growing, and for which scaling and performance are life-and-death questions. The system has Redis, MongoDB, and stream processing based on ksqlDB. In this talk, we will first analyze scaling approaches and then select the proper ones for our system.
Fueling AI with Great Data with Airbyte Webinar (Zilliz)
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
The Microsoft 365 Migration Tutorial For Beginner.pptx (operationspcvita)
This presentation will help you understand the power of Microsoft 365. It covers every productivity app included in Office 365, common migration scenarios, and how we can help you.
You can also read: https://www.systoolsgroup.com/updates/office-365-tenant-to-tenant-migration-step-by-step-complete-guide/
Session 1 - Intro to Robotic Process Automation.pdf (UiPathCommunity)
👉 Check out our full 'Africa Series - Automation Student Developers (EN)' page to register for the full program:
https://bit.ly/Automation_Student_Kickstart
In this session, we shall introduce you to the world of automation, the UiPath Platform, and guide you on how to install and setup UiPath Studio on your Windows PC.
📕 Detailed agenda:
What is RPA? Benefits of RPA?
RPA Applications
The UiPath End-to-End Automation Platform
UiPath Studio CE Installation and Setup
💻 Extra training through UiPath Academy:
Introduction to Automation
UiPath Business Automation Platform
Explore automation development with UiPath Studio
👉 Register here for our upcoming Session 2 on June 20: Introduction to UiPath Studio Fundamentals: https://community.uipath.com/events/details/uipath-lagos-presents-session-2-introduction-to-uipath-studio-fundamentals/
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers (akankshawande)
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
Conversational agents, or chatbots, are increasingly used to access all sorts of services using natural language. While open-domain chatbots - like ChatGPT - can converse on any topic, task-oriented chatbots - the focus of this paper - are designed for specific tasks, like booking a flight, obtaining customer support, or setting an appointment. Like any other software, task-oriented chatbots need to be properly tested, usually by defining and executing test scenarios (i.e., sequences of user-chatbot interactions). However, there is currently a lack of methods to quantify the completeness and strength of such test scenarios, which can lead to low-quality tests, and hence to buggy chatbots.
To fill this gap, we propose adapting mutation testing (MuT) for task-oriented chatbots. To this end, we introduce a set of mutation operators that emulate faults in chatbot designs, an architecture that enables MuT on chatbots built using heterogeneous technologies, and a practical realisation as an Eclipse plugin. Moreover, we evaluate the applicability, effectiveness and efficiency of our approach on open-source chatbots, with promising results.
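The core mutation-testing loop can be illustrated in a few lines (the structures and operator below are invented for illustration and are not the paper's operators): a mutation operator produces faulty variants of a chatbot design, and the mutation score measures how many variants the test scenarios detect.

```python
import copy

def delete_phrase_mutants(chatbot):
    """Mutation operator: each mutant drops one training phrase
    from one intent, emulating an under-trained chatbot design."""
    mutants = []
    for intent, phrases in chatbot["intents"].items():
        for i in range(len(phrases)):
            m = copy.deepcopy(chatbot)
            del m["intents"][intent][i]
            mutants.append(m)
    return mutants

def mutation_score(mutants, tests):
    """A mutant is 'killed' if at least one test scenario fails on it."""
    killed = sum(1 for m in mutants if any(not t(m) for t in tests))
    return killed / len(mutants)

bot = {"intents": {"book_flight": ["book a flight", "i need a plane ticket"]}}
mutants = delete_phrase_mutants(bot)
# One toy scenario: passes iff the phrase it exercises is still recognised.
tests = [lambda m: "book a flight" in m["intents"]["book_flight"]]
score = mutation_score(mutants, tests)
```

A low score flags weak test scenarios: here only one of two mutants is killed, suggesting a scenario exercising the second phrase is missing.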
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc... (DanBrown980551)
This LF Energy webinar took place June 20, 2024. It featured:
-Alex Thornton, LF Energy
-Hallie Cramer, Google
-Daniel Roesler, UtilityAPI
-Henry Richardson, WattTime
In response to the urgency and scale required to effectively address climate change, open source solutions offer significant potential for driving innovation and progress. Currently, there is a growing demand for standardization and interoperability in energy data and modeling. Open source standards and specifications within the energy sector can also alleviate challenges associated with data fragmentation, transparency, and accessibility. At the same time, it is crucial to consider privacy and security concerns throughout the development of open source platforms.
This webinar will delve into the motivations behind establishing LF Energy’s Carbon Data Specification Consortium. It will provide an overview of the draft specifications and the ongoing progress made by the respective working groups.
Three primary specifications will be discussed:
-Discovery and client registration, emphasizing transparent processes and secure and private access
-Customer data, centering around customer tariffs, bills, energy usage, and full consumption disclosure
-Power systems data, focusing on grid data, inclusive of transmission and distribution networks, generation, intergrid power flows, and market settlement data
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk (Fwdays)
In this talk we will discuss DDoS protection tools and best practices, discuss network architectures, and look at what AWS has to offer. We will also examine one of the largest DDoS attacks on Ukrainian infrastructure, which happened in February 2022, and see what techniques helped keep web resources available for Ukrainians and how AWS improved DDoS protection for all customers based on the Ukraine experience.
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
Talk odysseyiswc2017
1. The Odyssey Approach for
Optimizing Federated SPARQL
Queries
Gabriela Montoya¹, Hala Skaf-Molli², and Katja Hose¹
¹Aalborg University, Denmark: {gmontoya,khose}@cs.aau.dk
²Nantes University, France: hala.skaf@univ-nantes.fr
October 25th, 2017
2. Optimizing Federated SPARQL Queries

SELECT DISTINCT * WHERE {
  ?film dbo:director ?director .              (tp1)
  ?film rdf:type dbo:Film .                   (tp2)
  ?movie owl:sameAs ?film .                   (tp3)
  ?movie dcterms:title ?title .               (tp4)
  ?movie movie:film_subject film_subject:444  (tp5)
}

Federation: DBpedia, Drugbank, LMDB, ChEBI, Geonames, NYTimes, Jamendo, SWDF, KEGG
3. Optimizing Federated SPARQL Queries

For the query above, source selection yields:

Subquery                                       Relevant Sources
?film dbo:director ?director .                 DBpedia
?film rdf:type dbo:Film                        DBpedia, Drugbank, LMDB, ChEBI, Geonames, NYTimes, Jamendo, SWDF, KEGG
?movie owl:sameAs ?film                        DBpedia, Drugbank, LMDB, Geonames, NYTimes, Jamendo, SWDF, KEGG
?movie dcterms:title ?title                    LMDB, SWDF
?movie movie:film_subject film_subject:444     LMDB
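A table like this can be approximated from a simple predicate-to-sources index. A minimal sketch (the `predicate_index` structure and function name are mine, for illustration only; real engines also use ASK probes or data summaries):

```python
def relevant_sources(triple_pattern, predicate_index):
    """A source is considered relevant for a triple pattern when it contains
    the pattern's predicate; a variable predicate matches every source.
    predicate_index maps each predicate to the set of sources using it."""
    s, p, o = triple_pattern
    if p.startswith("?"):  # unbound predicate: every source may contribute
        return set().union(*predicate_index.values())
    return set(predicate_index.get(p, ()))
```

Note that this purely predicate-based selection is what produces the long source lists above: it cannot see that the three ?movie patterns must all be satisfied by the same entity.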
4. Existing approaches

[Figure: query plans with per-triple-pattern source annotations. The FedX plan joins (tp1 ⋈ tp2) first and then tp3, tp5, tp4; the SemaGrow plan starts from (tp5 ⋈ tp3) and then joins tp4, tp2, tp1. tp1 targets DBpedia; tp2 and tp3 target eight to nine sources each; tp4 targets LMDB and SWDF; tp5 targets LMDB.]

                          FedX¹        SemaGrow²
Optimization Technique    Heuristics   Dynamic Programming
Optimization Time (s)     0.74         4.75
Execution Time (s)        142          6.93

¹ A. Schwarte et al. "FedX: Optimization Techniques for Federated Query Processing on Linked Data". In: ISWC'11.
² A. Charalambidis et al. "SemaGrow: optimizing federated SPARQL queries". In: SEMANTICS'15.
5. Considering joins between triple patterns improves the optimization

Subquery                                       Relevant Sources
?movie owl:sameAs ?film .                      LMDB
?movie dcterms:title ?title .
?movie movie:film_subject film_subject:444

Only entities that satisfy owl:sameAs, dcterms:title, and movie:film_subject are part of LMDB! Entities from all other sources that seem relevant for individual triple patterns will, in fact, not contribute to the result.
6. Contributions

Concise statistics describing links among triples, while guaranteeing result completeness.
A technique to compute such statistics in a federated setup.
Affordable optimization based on dynamic programming.
7. Statistics computation at one location

film:1129 movie:film_subject film_subject:444 .
film:1129 dcterms:title "Kate & Leopold" .
film:1129 owl:sameAs dbr:Kate_&_Leopold .
film:16189 movie:film_subject film_subject:444 .
film:16189 dcterms:title "Journey to the Center of Time" .
film:16189 owl:sameAs dbr:Journey_to_the_Center_of_Time .
dbr:Journey_to_the_Center_of_Time dbo:director dbr:David_L._Hewitt .
dbr:Journey_to_the_Center_of_Time rdf:type dbo:Film .
...

³ T. Neumann and G. Moerkotte. "Characteristic Sets: Accurate Cardinality Estimation for RDF Queries with Multiple Joins". In: ICDE'11.
⁴ A. Gubichev and T. Neumann. "Exploiting the query structure for efficient join ordering in SPARQL queries".
12. Other basic statistics are also stored

For the triples listed above:

Characteristic Sets (CS)³
CD,1 = { movie:film_subject, dcterms:title, owl:sameAs }
count(CD,1) = 2; occurrences(dcterms:title, CD,1) = 2
CD,2 = { dbo:director, rdf:type }; count(CD,2) = 1
15. CSs are connected to other CSs

CD,1 = { movie:film_subject, dcterms:title, owl:sameAs }; count(CD,1) = 2; occurrences(dcterms:title, CD,1) = 2
CD,2 = { dbo:director, rdf:type }; count(CD,2) = 1

dbr:Journey_to_the_Center_of_Time → owl:sameAs → CD,1

Characteristic Pairs (CP)⁴
(CD,1, CD,2, owl:sameAs); count((CD,1, CD,2, owl:sameAs)) = 1
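Both statistics can be computed in a single pass over the data: a subject's CS is its set of predicates, and a CP records a subject whose object is itself a subject with its own CS. A minimal sketch in Python (function and variable names are mine, not from the paper):

```python
from collections import defaultdict

def characteristic_sets(triples):
    """Group triples by subject; each subject's predicate set is its CS.
    Returns (subject -> predicate set, CS -> entity count,
    (predicate, CS) -> triple count)."""
    preds = defaultdict(set)
    for s, p, o in triples:
        preds[s].add(p)
    counts = defaultdict(int)
    for ps in preds.values():
        counts[frozenset(ps)] += 1
    occurrences = defaultdict(int)
    for s, p, o in triples:
        occurrences[(p, frozenset(preds[s]))] += 1
    return preds, counts, occurrences

def characteristic_pairs(triples, preds):
    """A CP (Ci, Cj, p) links the CS of a triple's subject to the CS of its
    object, whenever the object is itself a subject in the data."""
    cp_counts = defaultdict(int)
    for s, p, o in triples:
        if o in preds:
            cp_counts[(frozenset(preds[s]), frozenset(preds[o]), p)] += 1
    return cp_counts
```

On the slide's eight triples this reproduces count(CD,1) = 2, occurrences(dcterms:title, CD,1) = 2, and the single CP (CD,1, CD,2, owl:sameAs).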
17. Cardinality estimation

SELECT DISTINCT * WHERE {
  ?film dbo:director ?director .      (tp1)
  ?film rdf:type ?type .              (tp2)
  ?movie owl:sameAs ?film .           (tp3)
  ?movie dcterms:title ?title .       (tp4)
  ?movie movie:film_subject ?subject  (tp5)
}

estimatedCardinality((Pk, Pl, p)) =
    Σ_{Ci, Cj : Pk ⊆ Ci ∧ Pl ⊆ Cj}  count((Ci, Cj, p))
        × ∏_{pk ∈ Pk \ {p}} occurrences(pk, Ci) / count(Ci)
        × ∏_{pl ∈ Pl} occurrences(pl, Cj) / count(Cj)        (2)

Pk = {owl:sameAs, dcterms:title, movie:film_subject}
Pl = {dbo:director, rdf:type}
p = owl:sameAs
18. Cardinality estimation

SELECT DISTINCT * WHERE {
  ?film dbo:director ?director .              (tp1)
  ?film rdf:type dbo:Film .                   (tp2)
  ?movie owl:sameAs ?film .                   (tp3)
  ?movie dcterms:title ?title .               (tp4)
  ?movie movie:film_subject film_subject:444  (tp5)
}

estimatedCardinality((Pk, Pl, p)) =
    max_{Ci, Cj : Pk ⊆ Ci ∧ Pl ⊆ Cj}  count((Ci, Cj, p))
        × ∏_{pk ∈ Pk \ {p}} occurrences(pk, Ci) / count(Ci)
        × ∏_{pl ∈ Pl} occurrences(pl, Cj) / count(Cj)        (3)

Pk = {owl:sameAs, dcterms:title, movie:film_subject}
Pl = {dbo:director, rdf:type}
p = owl:sameAs
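Equations (2) and (3) differ only in the aggregation (sum over matching CS pairs vs. max, used when bound terms would make the sum overestimate). A sketch under that reading, reusing the CS/CP statistics; the dictionary layout is my assumption:

```python
def estimated_cardinality(Pk, Pl, p, cp_counts, cs_counts, occurrences,
                          use_max=False):
    """Estimate the cardinality of joining two star subqueries linked by
    predicate p. Aggregates over all CS pairs (Ci, Cj) whose predicate
    sets cover Pk and Pl: sum for Eq. (2), max for Eq. (3)."""
    estimates = []
    for (ci, cj, pred), cnt in cp_counts.items():
        if pred != p or not (Pk <= ci and Pl <= cj):
            continue
        est = float(cnt)
        for pk in Pk - {p}:                      # selectivity of the Pk star
            est *= occurrences.get((pk, ci), 0) / cs_counts[ci]
        for pl in Pl:                            # selectivity of the Pl star
            est *= occurrences.get((pl, cj), 0) / cs_counts[cj]
        estimates.append(est)
    if not estimates:
        return 0.0
    return max(estimates) if use_max else sum(estimates)
```

With the statistics from the earlier slides (count(CD,1) = 2, count(CD,2) = 1, one CP) both variants yield an estimate of 1, matching the deck.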
19. Federated computation of statistics

1. At each source, the CSs are computed.

film:1129 movie:film_subject film_subject:444 .
film:1129 dcterms:title "Kate & Leopold" .
film:1129 owl:sameAs dbr:Kate_&_Leopold .
film:16189 movie:film_subject film_subject:444 .
film:16189 dcterms:title "Journey to the Center of Time" .
film:16189 owl:sameAs dbr:Journey_to_the_Center_of_Time .
...
CLMDB,i = { movie:film_subject, owl:sameAs, dcterms:title, ... }
20. Federated computation of statistics

1. At each source, the CSs are computed.
2. At each source, local statistics are computed.

CLMDB,i = { movie:film_subject, owl:sameAs, dcterms:title, ... }
local_subjects_LMDB(CLMDB,i) = { film:1129, film:16189, ... }
local_objects_LMDB(owl:sameAs, CLMDB,i) = { dbr:Kate_&_Leopold, dbr:Journey_to_the_Center_of_Time, ... }
22. Federated computation of statistics

3. Local statistics and CSs are transferred to the federated query engine.
4. Set overlap is computed to estimate the statistics of federated CSs and CPs.

local_subjects_LMDB(CLMDB,i) = { film:1129, film:16189, ... }
local_objects_LMDB(owl:sameAs, CLMDB,i) = { ..., dbr:Journey_to_the_Center_of_Time, ... }
local_subjects_DBpedia(CDBpedia,j) = { ..., dbr:Journey_to_the_Center_of_Time, ... }
local_objects_DBpedia(dbo:director, CDBpedia,j) = { dbr:David_L._Hewitt, ... }
...
⇒ federated CP: (CLMDB,i, CDBpedia,j, owl:sameAs)
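Step 4 boils down to intersecting the object values one source emits through a predicate with the subject values another source hosts. A sketch of that overlap (the function name is mine; Odyssey can also work with approximate set summaries rather than full value sets):

```python
def federated_cp_count(local_objects, local_subjects):
    """Estimate count((C_src1, C_src2, p)) across two sources as the overlap
    between p's object values of C_src1 at source 1 and the subjects
    belonging to C_src2 at source 2."""
    return len(set(local_objects) & set(local_subjects))
```

For the example above, dbr:Journey_to_the_Center_of_Time is the single shared value, giving a federated CP count of 1.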
25. Odyssey's Query Optimization

1. Identify the star-shaped subqueries.
2. Identify the relevant CSs and estimate cardinality.

SELECT DISTINCT * WHERE {
  ?film dbo:director ?director .              (tp1)
  ?film rdf:type dbo:Film .                   (tp2)
  ?movie owl:sameAs ?film .                   (tp3)
  ?movie dcterms:title ?title .               (tp4)
  ?movie movie:film_subject film_subject:444  (tp5)
}

CDBpedia,j = { dbo:director, rdf:type, ... }
estimatedCardinality({dbo:director, rdf:type}) = 162
CLMDB,i = { movie:film_subject, owl:sameAs, dcterms:title, ... }
estimatedCardinality({movie:film_subject, owl:sameAs, dcterms:title}) = 4
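Step 1 needs nothing more than the subject of each triple pattern: patterns sharing a subject form one star. A minimal sketch (names are mine):

```python
from collections import defaultdict

def star_subqueries(triple_patterns):
    """Group triple patterns by their subject term; each group is a
    star-shaped subquery whose cardinality a single CS lookup can estimate."""
    stars = defaultdict(list)
    for tp in triple_patterns:
        subject = tp[0]
        stars[subject].append(tp)
    return dict(stars)
```

Applied to the query above, this yields the two stars {tp1, tp2} on ?film and {tp3, tp4, tp5} on ?movie.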
28. Odyssey's Query Optimization

3. Identify the relevant CPs and estimate cardinality.
4. A cost function selects the plan that transfers the fewest tuples.
5. If necessary, compute the join ordering using dynamic programming.

(CLMDB,i, CDBpedia,j, owl:sameAs); estimatedCardinality((CLMDB,i, CDBpedia,j, owl:sameAs)) = 1

[Figure: two candidate plans. Starting from (tp5 ⋈ tp3 ⋈ tp4) @LMDB and joining with (tp1 ⋈ tp2) @DBpedia costs 4 + 1 = 5 transferred tuples; starting from (tp1 ⋈ tp2) @DBpedia costs 162 + 1 = 163.]
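Step 5 can be sketched as textbook dynamic programming over subsets of star subqueries, using the cost model implied by the slide (ship the smaller input to the other endpoint, then ship the join result). This is my illustrative reconstruction, not Odyssey's exact optimizer; `join_card` stands in for the CP-based estimate:

```python
from itertools import combinations

def dp_join_order(star_cards, join_card):
    """star_cards: star name -> estimated cardinality of its result.
    join_card(left, right): estimated cardinality of joining two subplans.
    Returns (cost, nested plan) minimizing tuples transferred."""
    names = list(star_cards)
    best = {frozenset([n]): (0, n) for n in names}   # a lone star is free
    card = {frozenset([n]): star_cards[n] for n in names}
    for size in range(2, len(names) + 1):
        for subset in map(frozenset, combinations(names, size)):
            for k in range(1, size // 2 + 1):
                for left in map(frozenset, combinations(sorted(subset), k)):
                    right = subset - left
                    out = join_card(left, right)
                    # ship the smaller input, then the join result
                    cost = (best[left][0] + best[right][0]
                            + min(card[left], card[right]) + out)
                    if subset not in best or cost < best[subset][0]:
                        best[subset] = (cost, (best[left][1], best[right][1]))
                        card[subset] = out
    return best[frozenset(names)]
```

With the slide's estimates (4 for the ?movie star, 162 for the ?film star, join result 1), the cheapest plan costs 4 + 1 = 5, matching the figure.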
30. Odyssey provides a better plan

[Figure: plans for the running query. FedX⁵ and SemaGrow⁶ produce the plans shown earlier; Odyssey⁷ joins (tp5 ⋈ tp3 ⋈ tp4) @LMDB with (tp1 ⋈ tp2) @DBpedia.]

                        FedX⁵    SemaGrow⁶   Odyssey⁷
Optimization Time (OT)  0.74 s   4.75 s      0.22 s
Execution Time (ET)     142 s    6.93 s      1.30 s

⁵ A. Schwarte et al. "FedX: Optimization Techniques for Federated Query Processing on Linked Data". In: ISWC'11.
⁶ A. Charalambidis et al. "SemaGrow: optimizing federated SPARQL queries". In: SEMANTICS'15.
⁷ G. Montoya et al. "The Odyssey Approach for Optimizing Federated SPARQL Queries". In: ISWC'17.
31. Experimental setup

Queries and datasets from FedBench⁸.
Comparison with existing approaches: FedX⁹, SemaGrow¹⁰, SPLENDID¹¹, HiBISCuS¹².
A Virtuoso 7.2.4.2 endpoint was deployed for each dataset.
Plots present the average over nine executions with a timeout of 30 minutes.

⁸ M. Schmidt et al. "FedBench: A Benchmark Suite for Federated Semantic Data Query Processing". In: ISWC'11.
⁹ A. Schwarte et al. "FedX: Optimization Techniques for Federated Query Processing on Linked Data". In: ISWC'11.
¹⁰ A. Charalambidis et al. "SemaGrow: optimizing federated SPARQL queries". In: SEMANTICS'15.
¹¹ O. Görlitz and S. Staab. "SPLENDID: SPARQL Endpoint Federation Exploiting VOID Descriptions". In: COLD'11.
¹² M. Saleem and A. N. Ngomo. "HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation". In: ESWC'14.
32. Odyssey's Optimization Time is comparable with other approaches'

[Plots: optimization time (OT, in ms, log scale 10⁰–10⁴) per query for LD1–LD11, CD1–CD7, and LS1–LS7; systems: Odyssey, HiBISCuS (warm and cold), FedX (warm and cold), SemaGrow, SPLENDID.]
33. Odyssey's plans have fewer subqueries than other approaches' plans

[Plots: number of subqueries (NSQ, 0–20) per query for LD1–LD11, CD1–CD7, and LS1–LS7; systems: Odyssey, HiBISCuS (warm and cold), FedX (warm and cold), SemaGrow, SPLENDID.]
35. Conclusions

Odyssey uses cardinality estimations that allow for better optimizations.
Local statistics make it possible to discover connections between datasets in a federated setup.
Odyssey's plans are, in general, better than existing approaches' plans.
36. Future work

Combine Odyssey's optimization with runtime optimizations (e.g., ASK queries).
Reduce the computation time and size of the local statistics.
Provide efficient strategies to update the statistics.
37. Questions?