Ontological queries are evaluated against a knowledge base consisting of an extensional database and an ontology (i.e., a set of logical assertions and constraints that derive new intensional knowledge from the extensional database), rather than directly on the extensional database. The evaluation and optimization of such queries is an intriguing new problem for database research. In this article, we discuss two important aspects of this problem: query rewriting and query optimization. Query rewriting consists of the compilation of an ontological query into an equivalent first-order query against the underlying extensional database.
We present a novel query rewriting algorithm for rather general types of ontological constraints that is well suited for practical implementations. In particular, we show how a conjunctive query against a knowledge base, expressed using linear and sticky existential rules, that is, members of the recently introduced Datalog+/- family of ontology languages, can be compiled into a union of conjunctive queries (UCQ) against the underlying database. Ontological query optimization, in this context, attempts to improve this rewriting process so as to produce possibly small and cost-effective UCQ rewritings for an input query.
Alternative Infill Strategies for Expensive Multi-Objective Optimisation (Alma Rahat)
Many multi-objective optimisation problems incorporate computationally or financially expensive objective functions. State-of-the-art algorithms therefore construct surrogate model(s) of the mapping from parameter space to objective functions to guide the choice of the next solution to expensively evaluate. Starting from an initial set of solutions, an infill criterion — a surrogate-based indicator of quality — is extremised to determine which solution to evaluate next, until the budget of expensive evaluations is exhausted. Many successful infill criteria are dependent on multi-dimensional integration, which may result in infill criteria that are themselves impractically expensive. We propose a computationally cheap infill criterion based on the minimum probability of improvement over the estimated Pareto set. We also present a range of set-based scalarisation methods modelling hypervolume contribution, dominance ratio and distance measures. These permit the use of straightforward expected improvement as a cheap infill criterion. We investigated the performance of these novel strategies on standard multi-objective test problems, and compared them with the popular SMS-EGO and ParEGO methods. Unsurprisingly, our experiments show that the best strategy is problem dependent, but in many cases a cheaper strategy is at least as good as more expensive alternatives.
Preprint repository: https://ore.exeter.ac.uk/repository/handle/10871/27157
SAT-based planning for multi-agent systems (Ravi Kuril)
Multi-agent classical planning using a SAT approach. This document describes the approach and discusses the experiments and their results. State-of-the-art tools were used for comparison. Implementation code can be found on GitHub: https://github.com/ravikuril/SATbasedClassicalPlanning . For more information, contact ravikuril.du.or@gmail.com
OPAL: automated form understanding for the deep web - WWW 2012 (Giorgio Orsi)
Forms are our gates to the web. They enable us to access the deep content of web sites. Automatic form understanding unlocks this content for applications ranging from crawlers to meta-search engines and is essential for improving usability and accessibility of the web. Form understanding has received surprisingly little attention other than as a component in specific applications such as crawlers. No comprehensive approach to form understanding exists and previous works disagree even in the definition of the problem. In this paper, we present OPAL, the first comprehensive approach to form understanding. We identify form labeling and form interpretation as the two main tasks involved in form understanding. On both problems OPAL pushes the state of the art: For form labeling, it combines signals from the text, structure, and visual rendering of a web page, yielding robust characterisations of common design patterns. In extensive experiments on the ICQ and TEL-8 benchmarks and a set of 200 modern web forms OPAL outperforms previous approaches by a significant margin. For form interpretation, we introduce a template language to describe frequent form patterns. These two parts of OPAL combined yield form understanding with near perfect accuracy (> 98%).
Querying UML Class Diagrams - FoSSaCS 2012 (Giorgio Orsi)
UML Class Diagrams (UCDs) are the best known class-based formalism for conceptual modeling. They are used by software engineers to model the intensional structure of a system in terms of classes, attributes and operations, and to express constraints that must hold for every instance of the system. Reasoning over UCDs is of paramount importance in design, validation, maintenance and system analysis; however, for medium and large software projects, reasoning over UCDs may be impractical. Query answering, in particular, can be used to verify whether a (possibly incomplete) instance of the system modeled by the UCD, i.e., a snapshot, enjoys a certain property. In this work, we study the problem of querying UCD instances, and we relate it to query answering under guarded Datalog +/-, that is, a powerful Datalog-based language for ontological modeling. We present an expressive and meaningful class of UCDs, named UCDLog, under which conjunctive query answering is tractable in the size of the instances.
OPAL: a passe-partout for web forms - WWW 2012 Demonstration (Giorgio Orsi)
Web forms are the interfaces of the deep web. Though modern web browsers provide facilities to assist in form filling, this assistance is limited to prior form fillings or keyword matching. Automatic form understanding enables a broad range of applications, including crawlers, meta-search engines, and usability and accessibility support for enhanced web browsing. In this demonstration, we use a novel form understanding approach, OPAL, to assist in form filling even for complex, previously unknown forms. OPAL associates form labels to fields by analyzing structural properties in the HTML encoding and visual features of the page rendering. OPAL interprets this labeling and classifies the fields according to a given domain ontology. The combination of these two properties allows OPAL to deal effectively with many forms outside of the grasp of existing form filling techniques. In the UK real estate domain, OPAL achieves >99% accuracy in form understanding.
Search engines are the sinews of the web. These sinews have become strained, however: Where the web's function once was a mix of library and yellow pages, it has become the central marketplace for information of almost any kind. We search more and more for objects with specific characteristics, a car with a certain mileage, an affordable apartment close to a good school, or the latest accessory for our phones. Search engines all too often fail to provide reasonable answers, making us sift through dozens of websites with thousands of offers--never to be sure a better offer isn't just around the corner. What search engines are missing is understanding of the objects and their attributes published on websites.
Automatically identifying and extracting these objects is akin to alchemy: transforming unstructured web information into highly structured data with near perfect accuracy. With DIADEM we present a formula for this transformation, but at a price: DIADEM identifies and extracts data from a website with high accuracy. The price is that for this task we need to provide DIADEM with extensive knowledge about the ontology and phenomenology of the domain, i.e., about entities (and relations) and about the representation of these entities in the textual, structural, and visual language of a website of this domain. In this demonstration, we show with a first prototype of DIADEM that, in contrast to alchemists, DIADEM has developed a viable formula.
Intro to AI STRIPS Planning & Applications in Video-games Lecture3-Part1Stavros Vassos
This is a short course that aims to provide an introduction to the techniques currently used for the decision making of non-player characters (NPCs) in commercial video games, and show how a simple deliberation technique from academic artificial intelligence research can be employed to advance the state-of-the art.
For more information and downloading the supplementary material please use the following links:
http://stavros.lostre.org/2012/05/19/video-games-sapienza-roma-2012/
http://tinyurl.com/AI-NPC-LaSapienza
Köhler, Sven, Bertram Ludäscher, and Yannis Smaragdakis. 2012. “Declarative Datalog Debugging for Mere Mortals.” In Datalog in Academia and Industry, edited by Pablo Barceló and Reinhard Pichler, 111–22. Lecture Notes in Computer Science 7494. Springer Berlin Heidelberg. doi:10.1007/978-3-642-32925-8_12.
Abstract. Tracing why a “faulty” fact A is in the model M = P(I) of program P on input I quickly gets tedious, even for small examples. We propose a simple method for debugging and “logically profiling” P by generating a provenance-enriched rewriting P̂, which records rule firings according to the logical semantics. The resulting provenance graph can be easily queried and analyzed using a set of predefined and ad-hoc queries. We have prototypically implemented our approach for two different Datalog engines (DLV and LogicBlox), demonstrating the simplicity, effectiveness, and system-independent nature of our method.
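A toy illustration of the idea, recording rule firings as provenance during naive bottom-up Datalog evaluation (this engine and its fact encoding are my own sketch, not the paper's provenance-enriched rewriting P̂):

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def match(atom, fact, subst):
    """Extend subst so that atom maps onto fact, or return None."""
    (pred, args), (fpred, fargs) = atom, fact
    if pred != fpred or len(args) != len(fargs):
        return None
    s = dict(subst)
    for a, f in zip(args, fargs):
        if is_var(a):
            if s.setdefault(a, f) != f:
                return None
        elif a != f:
            return None
    return s

def all_matches(body, facts, subst=None, used=()):
    """Yield (substitution, tuple of body facts) for every way to satisfy body."""
    subst = {} if subst is None else subst
    if not body:
        yield subst, used
        return
    for f in facts:
        s = match(body[0], f, subst)
        if s is not None:
            yield from all_matches(body[1:], facts, s, used + (f,))

def eval_with_provenance(facts, rules):
    """Naive bottom-up evaluation; firings records (rule_id, body_facts, derived_fact)."""
    facts, firings = set(facts), []
    changed = True
    while changed:
        changed = False
        snapshot = set(facts)               # evaluate each pass on a snapshot
        for rid, (head, body) in enumerate(rules):
            for subst, used in all_matches(body, snapshot):
                hpred, hargs = head
                fact = (hpred, tuple(subst[a] if is_var(a) else a for a in hargs))
                if fact not in facts:
                    facts.add(fact)
                    firings.append((rid, used, fact))
                    changed = True
    return facts, firings

# Transitive closure: path(X,Y) <- edge(X,Y);  path(X,Z) <- edge(X,Y), path(Y,Z)
rules = [
    (('path', ('X', 'Y')), [('edge', ('X', 'Y'))]),
    (('path', ('X', 'Z')), [('edge', ('X', 'Y')), ('path', ('Y', 'Z'))]),
]
facts, firings = eval_with_provenance(
    {('edge', ('a', 'b')), ('edge', ('b', 'c'))}, rules)
print(('path', ('a', 'c')) in facts)  # True
```

Each entry of `firings` is an edge of the provenance graph described in the abstract, from the body facts used to the fact derived, so "why is path(a,c) in the model?" becomes a lookup over `firings`.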
ROSeAnn: Reconciling Opinions of Semantic Annotators - VLDB 2014 (Giorgio Orsi)
A growing number of resources are available for enriching documents with semantic annotations. While originally focused on a few standard classes of annotations, the ecosystem of annotators is now becoming increasingly diverse. Although annotators often have very different vocabularies, with both high-level and specialist concepts, they also have many semantic interconnections. We will show that both the overlap and the diversity in annotator vocabularies motivate the need for semantic annotation integration: middleware that produces a unified annotation on top of diverse semantic annotators. On the one hand, the diversity of vocabulary allows applications to benefit from the much richer vocabulary available in an integrated vocabulary. On the other hand, we present evidence that the most widely-used annotators on the web suffer from serious accuracy deficiencies: the overlap in vocabularies from individual annotators allows an integrated annotator to boost accuracy by exploiting inter-annotator agreement and disagreement.
The integration of semantic annotations leads to new challenges, both compared to usual data integration scenarios and to standard aggregation of machine learning tools. We overview an approach to these challenges that performs ontology-aware aggregation. We introduce an approach that requires no training data, making use of ideas from database repair. We experimentally compare this with a supervised approach, which adapts maximal entropy Markov models to the setting of ontology-based annotations. We further experimentally compare both these approaches with respect to ontology-unaware supervised approaches, and to individual annotators.
Heuristic Ranking in Tightly Coupled Probabilistic Description Logics (Giorgio Orsi)
The Semantic Web effort has steadily been gaining traction in the recent years. In particular, Web search companies are recently realizing that their products need to evolve towards having richer semantic search capabilities. Description logics (DLs) have been adopted as the formal underpinnings for Semantic Web languages used in describing ontologies. Reasoning under uncertainty has recently taken a leading role in this arena, given the nature of data found on the Web. In this paper, we present a probabilistic extension of the DL EL++ (which underlies the OWL2 EL profile) using Markov logic networks (MLNs) as probabilistic semantics. This extension is tightly coupled, meaning that probabilistic annotations in formulas can refer to objects in the ontology. We show that, even though the tightly coupled nature of our language means that many basic operations are data-intractable, we can leverage a sublanguage of MLNs that allows us to rank the atomic consequences of an ontology relative to their probability values (called ranking queries) even when these values are not fully computed. We present an anytime algorithm to answer ranking queries, and provide an upper bound on the error that it incurs, as well as a criterion to decide when results are guaranteed to be correct.
Wrapper induction faces a dilemma: To reach web scale, it requires automatically generated examples, but to produce accurate results, these examples must have the quality of human annotations. We resolve this conflict with AMBER, a system for fully automated data extraction from result pages. In contrast to previous approaches, AMBER employs domain specific gazetteers to discern basic domain attributes on a page, and leverages repeated occurrences of similar attributes to group related attributes into records rather than relying on the noisy structure of the DOM. With this approach AMBER is able to identify records and their attributes with almost perfect accuracy (>98%) on a large sample of websites. To make such an approach feasible at scale, AMBER automatically learns domain gazetteers from a small seed set. In this demonstration, we show how AMBER uses the repeated structure of records on deep web result pages to learn such gazetteers. This is only possible with a highly accurate extraction system. Depending on its parametrization, this learning process runs either fully automatically or with human interaction. We show how AMBER bootstraps a gazetteer for UK locations in 4 iterations: From a small seed sample we achieve 94.4% accuracy in recognizing UK locations in the 4th iteration.
Nyaya: Semantic data markets: a flexible environment for knowledge management... (Giorgio Orsi)
We present Nyaya, a flexible system for the management of Semantic-Web data which couples a general-purpose storage mechanism with efficient ontology reasoning and querying capabilities. Nyaya processes large Semantic-Web datasets, expressed in a variety of formalisms, by transforming them into a collection of Semantic Data Kiosks. Each kiosk exposes the native meta-data in a uniform fashion using Datalog±, a very general rule-based language for the representation of ontological constraints. The kiosks form a Semantic Data Market where the data in each kiosk can be uniformly accessed using conjunctive queries and where users can specify user-defined constraints over the data. Nyaya is easily extensible, robust to updates of both data and meta-data in the kiosks, and can readily adapt to different logical organizations of the persistent storage. The approach has been evaluated using well-known benchmarks and compared to state-of-the-art research prototypes and commercial systems.
Techniques to optimize the pagerank algorithm usually fall into two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, which share the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before pagerank computation to improve performance; final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the pagerank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in pagerank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
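The first of these ideas, skipping vertices that have already converged, can be sketched in a few lines of Python (a simplified heuristic variant of my own; it assumes the graph has no dangling vertices):

```python
def pagerank_skip(adj, d=0.85, tol=1e-10, max_iter=100):
    """PageRank that stops recomputing a vertex once its rank change
    falls below tol. adj[u] lists the out-neighbours of u."""
    n = len(adj)
    rank = [1.0 / n] * n
    incoming = [[] for _ in range(n)]          # build in-edge lists
    for u, outs in enumerate(adj):
        for v in outs:
            incoming[v].append(u)
    active = set(range(n))                     # vertices still being updated
    for _ in range(max_iter):
        new = {}
        for v in active:                       # converged vertices are skipped
            new[v] = (1 - d) / n + d * sum(rank[u] / len(adj[u])
                                           for u in incoming[v])
        converged = {v for v in active if abs(new[v] - rank[v]) < tol}
        for v, r in new.items():
            rank[v] = r
        active -= converged
        if not active:
            break
    return rank

# 3-cycle: every vertex ends up with rank 1/3
ranks = pagerank_skip([[1], [2], [0]])
```

Note that this is exactly the heuristic the paragraph describes: a vertex marked converged is never revisited, even if its in-neighbours later change, which is why pure skipping trades a little accuracy for iteration time.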
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... (Subhajit Sahu)
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components and processes them in topological order, one level at a time. This enables the calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition: the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
The Building Blocks of QuestDB, a Time Series Database (Javier Ramirez)
Talk delivered at Valencia Codes Meetup, June 2024.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review some of the changes we have made over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, and faster batch ingestion.
Adjusting OpenMP PageRank : SHORT REPORT / NOTES (Subhajit Sahu)
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take advantage of a shared memory system with multiple CPUs, each with multiple cores, to accelerate pagerank computation. If the NUMA architecture of the system is properly taken into account with good vertex partitioning, the speedup can be significant. To take steps in this direction, experiments are conducted to implement pagerank in OpenMP using two different approaches, uniform and hybrid. The uniform approach runs all primitives required for pagerank in OpenMP mode (with multiple threads). On the other hand, the hybrid approach runs certain primitives (i.e., sumAt, multiply) in sequential mode.
Adjusting primitives for graph : SHORT REPORT / NOTES (Subhajit Sahu)
Graph algorithms, like PageRank, commonly operate on Compressed Sparse Row (CSR), an adjacency-list based graph representation.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Learn SQL from basic queries to advanced queries (manishkhaire30)
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
Query Rewriting and Optimization for Ontological Databases
1. Query Rewriting and Optimization for Ontological Databases
Georg Gottlob, Giorgio Orsi, and Andreas Pieris
name.surname@cs.ox.ac.uk
Department of Computer Science, Oxford University
Dagstuhl 2014
References: Gottlob et al. Query Rewriting and Optimization for Ontological Databases. ACM TODS 2014.
3. Ontological query answering
[Figure: a query Q is posed against a theory Σ over a database D]
Chasing is an option, but we would like to use a DBMS to answer the query.
4. Our setting
The constraint language:
tuple-generating dependencies (TGDs): ∀X∀Y φ(X,Y) → ∃Z ψ(X,Z)
equality-generating dependencies (EGDs): ∀X φ(X) → Xi = Xj
negative constraints (NCs): ∀X φ(X) → ⊥
The query language:
conjunctive queries (CQs): Q : ∃Y φ(X,Y)
EGDs and NCs that are well-behaved (i.e., non-conflicting, separable) can be checked beforehand and then ignored during the rewriting.
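As an illustration of how a TGD fires during the chase, here is a minimal Python sketch (the relational encoding and helper names are my own, not from the talk): body atoms are matched homomorphically against the database, and each existentially quantified variable is replaced by a fresh labelled null.

```python
import itertools

fresh = (f"z{i}" for i in itertools.count())   # labelled nulls z0, z1, ...

def homomorphisms(body, db, partial=None):
    """Yield every assignment mapping all body atoms into the database."""
    partial = partial or {}
    if not body:
        yield partial
        return
    (pred, vars_), rest = body[0], body[1:]
    for p, args in db:
        if p != pred or len(args) != len(vars_):
            continue
        a = dict(partial)
        if all(a.setdefault(v, c) == c for v, c in zip(vars_, args)):
            yield from homomorphisms(rest, db, a)

def apply_tgd(db, body, head, exist_vars):
    """One chase step: derive head atoms, inventing a fresh null per ∃-variable."""
    derived = set()
    for h in homomorphisms(body, db):
        a = dict(h)
        a.update({v: next(fresh) for v in exist_vars})
        for pred, vars_ in head:
            atom = (pred, tuple(a[v] for v in vars_))
            if atom not in db:
                derived.add(atom)
    return derived

# σ1 from the later slides: project(X), inArea(X,Y) → ∃Z hasCollaborator(Z,Y,X)
db = {('project', ('a',)), ('inArea', ('a', 'db'))}
new = apply_tgd(db,
                body=[('project', ('X',)), ('inArea', ('X', 'Y'))],
                head=[('hasCollaborator', ('Z', 'Y', 'X'))],
                exist_vars=['Z'])
print(new)  # {('hasCollaborator', ('z0', 'db', 'a'))}
```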
5. FO-rewritability
Very well known in the DL area [Calvanese et al. JAR'07]
Rewrite Q using Σ into QΣ and evaluate it directly over D:
∀D : chase(D,Σ) ⊨ Q ⟺ D ⊨ QΣ
6. Concrete FO-rewritable classes
Linear TGDs [Calì et al. PODS'09]: one body atom (generalize inclusion dependencies), e.g., ∀X∀Y r(X,Y) → ∃Z ψ(X,Z)
Sticky TGDs [Calì et al. VLDB'10]: join variables stick to the inferred atoms, e.g., t(X,Y,Z) → ∃W s(Y,W)
Abstract classes: finite unification sets (FUS) [Baget et al. AIJ'11]
[Fig. 3: sticky property and propagation, using the rules r(X,Y), p(Y,Z) → ∃W t(X,Y,W) and t(X,Y,Z) → ∃W s(X,W) to contrast a set that satisfies stickiness (√) with one that violates it (×)]
7. Bounded derivation depth property [Calì et al. PODS'09]
BDDP ⇒ (positive) first-order rewritability
[Figure: to answer Q, only an initial portion of chase(D,Σ) is needed, whose depth depends on Q and Σ but not on D]
8. BDDP rewriting
σ1 : project(X), inArea(X,Y) → ∃Z hasCollaborator(Z,Y,X)
σ2 : hasCollaborator(X,Y,Z) → collaborator(X)
Q : ∃A hasCollaborator(A, db, B)
[Calì et al. PODS'09]: enumerate all the possible ancestor databases such that the input query maps to their chase.
9. BDDP rewriting
1 : project(X), inArea(X, Y ) ! 9Z hasCollaborator(Z, Y ,X)
2 : hasCollaborator(X, Y ,Z) ! collaborator(X)
Q : 9A hasCollaborator(A, db,B)
[Cali et Al. PODS’09]: enumerate all the possible ancestor databases such
that the input query maps to their chase.
D
hasCollaborator(a, db, b)
D⌃ hasCollaborator (a, db, b)
collaborator (a)
Q maps!
constants fromQ
frozen nulls
Q1 : 9AhasCollaborator(A, db,B)
10. BDDP rewriting
σ1 : project(X), inArea(X,Y) → ∃Z hasCollaborator(Z,Y,X)
σ2 : hasCollaborator(X,Y,Z) → collaborator(X)
Q : ∃A hasCollaborator(A, db, B)
[Calì et al. PODS'09]: enumerate all the possible ancestor databases such
that the input query maps to their chase.
Next ancestor database: D = {project(a), inArea(a, db)} (constants from Q,
frozen nulls). Its chase DΣ = {project(a), inArea(a, db),
hasCollaborator(z0, db, a), collaborator(z0)}; Q maps into it, yielding
Q2 : project(A), inArea(A, db)
alongside Q1 : ∃A hasCollaborator(A, db, B).
11. BDDP rewriting
σ1 : project(X), inArea(X,Y) → ∃Z hasCollaborator(Z,Y,X)
σ2 : hasCollaborator(X,Y,Z) → collaborator(X)
Q : ∃A hasCollaborator(A, db, B)
[Calì et al. PODS'09]: enumerate all the possible ancestor databases such
that the input query maps to their chase.
From D = {hasCollaborator(a, db, b)} we obtained Q1 : ∃A hasCollaborator(A, db, B);
from D = {project(a), inArea(a, db)}, whose chase also contains
hasCollaborator(z0, db, a) and collaborator(z0), we obtained
Q2 : project(A), inArea(A, db); and so on. The final rewriting is the union:
QΣ : Q1 ∨ Q2 ∨ ...
12. Backward resolution
[Calvanese et al. JAR'07, Calì et al. AMW'10]:
Construct a (perfect) rewriting via backward resolution.
σ1 : project(X), inArea(X,Y) → ∃Z hasCollaborator(Z,Y,X)
σ2 : hasCollaborator(X,Y,Z) → collaborator(X)
Q : ∃A hasCollaborator(A,B,C), collaborator(A)
13. Backward resolution
[Calvanese et al. JAR'07, Calì et al. AMW'10]:
Construct a (perfect) rewriting via backward resolution.
σ1 : project(X), inArea(X,Y) → ∃Z hasCollaborator(Z,Y,X)
σ2 : hasCollaborator(X,Y,Z) → collaborator(X)
Q : ∃A hasCollaborator(A,B,C), collaborator(A)
Step 1: rewriting. Resolve the atom collaborator(A) of Q against the head of σ2.
14. Backward resolution
[Calvanese et al. JAR'07, Calì et al. AMW'10]:
Construct a (perfect) rewriting via backward resolution.
σ1 : project(X), inArea(X,Y) → ∃Z hasCollaborator(Z,Y,X)
σ2 : hasCollaborator(X,Y,Z) → collaborator(X)
Q : ∃A hasCollaborator(A,B,C), collaborator(A)
Step 1: rewriting. Resolve collaborator(A) against the head of σ2 with
μ : {X ↦ A}
15. Backward resolution
[Calvanese et al. JAR'07, Calì et al. AMW'10]:
Construct a (perfect) rewriting via backward resolution.
σ1 : project(X), inArea(X,Y) → ∃Z hasCollaborator(Z,Y,X)
σ2 : hasCollaborator(X,Y,Z) → collaborator(X)
Q : ∃A hasCollaborator(A,B,C), collaborator(A)
Step 1: rewriting. Resolve collaborator(A) against the head of σ2 with
μ : {X ↦ A}, replacing it by the (renamed) body of σ2:
Q′ : ∃A∃V0∃V1 hasCollaborator(A,B,C), hasCollaborator(A,V0,V1)
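The resolution step above hinges on computing a most general unifier of two atoms. A textbook MGU for function-free atoms, in our own tuple encoding (uppercase strings are variables, lowercase are constants), can be sketched as:

```python
def unify(atom1, atom2):
    """Most general unifier of two function-free atoms, or None if they
    do not unify. Atoms are tuples (predicate, term, term, ...)."""
    if atom1[0] != atom2[0] or len(atom1) != len(atom2):
        return None
    sub = {}

    def walk(term):              # follow the substitution chain to its end
        while term in sub:
            term = sub[term]
        return term

    for s, t in zip(atom1[1:], atom2[1:]):
        s, t = walk(s), walk(t)
        if s == t:
            continue
        if s[0].isupper():       # s is a variable: bind it
            sub[s] = t
        elif t[0].isupper():     # t is a variable: bind it
            sub[t] = s
        else:                    # two distinct constants: clash
            return None
    return sub
```

On the slide's example, unifying collaborator(X) with collaborator(A) yields exactly μ : {X ↦ A}.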
16. Careful with MGUs!
σ1 : project(X), inArea(X,Y) → ∃Z hasCollaborator(Z,Y,X)
σ2 : hasCollaborator(X,Y,Z) → collaborator(X)
An unsound rewriting:
Q : hasCollaborator(a, b, C)
μ : {Z ↦ a, Y ↦ b, X ↦ C}
Q′ : project(C), inArea(C, b)
17. Careful with MGUs!
σ1 : project(X), inArea(X,Y) → ∃Z hasCollaborator(Z,Y,X)
σ2 : hasCollaborator(X,Y,Z) → collaborator(X)
An unsound rewriting:
Q : hasCollaborator(a, b, C)
μ : {Z ↦ a, Y ↦ b, X ↦ C}
Q′ : project(C), inArea(C, b)
Counterexample: D = {project(a), inArea(a, b)}; DΣ = {project(a), inArea(a, b),
hasCollaborator(z0, b, a)}. Q does not map into DΣ, but Q′ does!
Applicability: prevent MGUs that unify existential variables with constants
or with join variables (self-joins included).
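The applicability condition can be phrased directly over the unifier: resolve each existential head variable through the substitution and reject the unifier if it lands on a constant or on a variable with more than one occurrence in the query. This is a hypothetical sketch in our own encoding, not the paper's code:

```python
from collections import Counter

def is_applicable(exist_vars, mgu, query):
    """Soundness filter: reject a unifier that binds an existential head
    variable to a constant, or to a join variable (one occurring more
    than once across the query's atoms, self-joins included)."""
    occurrences = Counter(t for atom in query for t in atom[1:]
                          if t[0].isupper())
    for var in exist_vars:
        image = var
        while image in mgu:              # resolve through the substitution
            image = mgu[image]
        if image == var:
            continue
        if not image[0].isupper():       # bound to a constant: unsound
            return False
        if occurrences[image] > 1:       # bound to a join variable: unsound
            return False
    return True
```

On slide 16's unifier, the existential Z of σ1 resolves to the constant a, so the rewriting step is correctly rejected; on slide 9's query the same Z resolves to the lone variable A and the step goes through.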
19. Backward resolution
σ1 : project(X), inArea(X,Y) → ∃Z hasCollaborator(Z,Y,X)
σ2 : hasCollaborator(X,Y,Z) → collaborator(X)
Step 2: minimization
Q′ : ∃A∃V0∃V1 hasCollaborator(A,B,C), hasCollaborator(A,V0,V1)
Stuck! Applicability prevents us from rewriting, yet the rewriting is still incomplete.
20. Backward resolution
σ1 : project(X), inArea(X,Y) → ∃Z hasCollaborator(Z,Y,X)
σ2 : hasCollaborator(X,Y,Z) → collaborator(X)
Step 2: minimization
Q′ : ∃A∃V0∃V1 hasCollaborator(A,B,C), hasCollaborator(A,V0,V1)
Stuck! Applicability prevents us from rewriting, yet the rewriting is still incomplete.
Unify the two atoms within the query (a subquery unification, different from
Chandra-Merlin minimization): μ : {V0 ↦ B, V1 ↦ C}
Q′′ : ∃A hasCollaborator(A,B,C)
Now we can rewrite with σ2.
Loop until no more steps are applicable.
In general, this process might not terminate; each class of TGDs might
require a different stop condition (exploit linearity and stickiness).
21. Opt. 1: Factorization
Exhaustive minimisations are not necessary.
Only those which free shared existential variables are useful reductions.
σ : s(X) → ∃Y r(X,Y)
Q : ∃B∃C∃D r(A,B), r(C,B), r(B,D)
22. Opt. 1: Factorization
Exhaustive minimisations are not necessary.
Only those which free shared existential variables are useful reductions.
σ : s(X) → ∃Y r(X,Y)
Q : ∃B∃C∃D r(A,B), r(C,B), r(B,D)   μ : {A ↦ C}
Q′ : ∃B∃D r(C,B), r(B,D)   (B is still shared!)
23. Opt. 1: Factorization
Exhaustive minimisations are not necessary.
Only those which free shared existential variables are useful reductions.
σ : s(X) → ∃Y r(X,Y)
Q : ∃B∃C∃D r(A,B), r(C,B), r(B,D)   μ : {A ↦ C}
Q′ : ∃B∃D r(C,B), r(B,D)   (B is still shared!)
Piece unifiers, factorizations [Baget et al. ICCS'96, Gottlob et al. ICDE'11].
σ : s(X), r(X,Y) → ∃Z t(X,Y,Z)
Q1 : ∃B∃C t(a,A,C), t(B,a,C)
24. Opt. 1: Factorization
Exhaustive minimisations are not necessary.
σ : s(X) → ∃Y r(X,Y)
Q : ∃B∃C∃D r(A,B), r(C,B), r(B,D)   μ : {A ↦ C}
Q′ : ∃B∃D r(C,B), r(B,D)   (B is still shared!)
Backward resolution is still complete if we perform only those unifications
that free shared existential variables [Baget et al. ICCS'96].
Application in query rewriting: [Gottlob et al. ICDE'11, König et al. RR'12].
σ : s(X), r(X,Y) → ∃Z t(X,Y,Z)
Q1 : ∃B∃C t(a,A,C), t(B,a,C)
25. Opt. 1: Factorization
Exhaustive minimisations are not necessary.
σ : s(X) → ∃Y r(X,Y)
Q : ∃B∃C∃D r(A,B), r(C,B), r(B,D)   μ : {A ↦ C}
Q′ : ∃B∃D r(C,B), r(B,D)   (B is still shared!)
Backward resolution is still complete if we perform only those unifications
that free shared existential variables [Baget et al. ICCS'96].
Application in query rewriting: [Gottlob et al. ICDE'11, König et al. RR'12].
σ : s(X), r(X,Y) → ∃Z t(X,Y,Z)
Q1 : ∃B∃C t(a,A,C), t(B,a,C)
Q2 : ∃B∃C∃E s(C), t(A,B,C), t(A,E,C)
26. Opt. 1: Factorization
Exhaustive minimisations are not necessary.
σ : s(X) → ∃Y r(X,Y)
Q : ∃B∃C∃D r(A,B), r(C,B), r(B,D)   μ : {A ↦ C}
Q′ : ∃B∃D r(C,B), r(B,D)   (B is still shared!)
Backward resolution is still complete if we perform only those unifications
that free shared existential variables [Baget et al. ICCS'96].
Application in query rewriting: [Gottlob et al. ICDE'11, König et al. RR'12].
σ : s(X), r(X,Y) → ∃Z t(X,Y,Z)
Q1 : ∃B∃C t(a,A,C), t(B,a,C)
Q2 : ∃B∃C∃E s(C), t(A,B,C), t(A,E,C)
Q3 : ∃B∃C t(A,B,C), t(A,C,C)
28. Opt. 2: Atom coverage
Queries can be reduced during rewriting: we eliminate rewriting steps!
σ : t(X,Y) → ∃Z r(X,Y,Z)
Q : ∃B∃C t(A,B), r(A,B,C), s(A,B,B)
29. Opt. 2: Atom coverage
Queries can be reduced during rewriting: we eliminate rewriting steps!
σ : t(X,Y) → ∃Z r(X,Y,Z)
Q : ∃B∃C t(A,B), r(A,B,C), s(A,B,B)
Under σ, the atom r(A,B,C) can be eliminated, obtaining an (equivalent) query
Q′ : ∃B t(A,B), s(A,B,B)
Chase and Backchase [Deutsch et al. PODS'08, Meier VLDBJ'14].
But... the chase is of exponential size and has to be constructed for
(worst-case) exponentially many queries.
30. Opt. 2: Atom coverage
For linear TGDs (one body atom), it amounts to reachability in a graph
(but we lose some implications...).
σ : t(X,Y) → ∃Z r(X,Y,Z)
Precompute a propagation graph over predicate positions:
t[1] → r[1], t[2] → r[2] (r[3] receives the existential Z).
Roughly: check that it is possible to propagate terms from the covering atom
to the covered one using the same sequence of TGDs.
The elimination strategy is unique w.r.t. the size (number of atoms) of the
resulting query.
Q : ∃B∃C t(A,B), r(A,B,C), s(A,B,B)
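For linear TGDs the coverage test reduces to reachability over predicate positions. A minimal sketch of the idea (our own names and encoding; the paper's graph additionally labels edges with TGD sequences, which this version omits):

```python
from collections import defaultdict, deque

def propagation_graph(tgds):
    """Nodes are predicate positions (predicate, index); an edge links a
    body position to every head position carrying the same universal
    variable. Each TGD is a (body_atom, head_atom) pair of tuples."""
    graph = defaultdict(set)
    for body, head in tgds:
        for i, t in enumerate(body[1:], 1):
            if not t[0].isupper():       # constants do not propagate
                continue
            for j, u in enumerate(head[1:], 1):
                if u == t:
                    graph[(body[0], i)].add((head[0], j))
    return graph

def reaches(graph, src, dst):
    """BFS reachability; the cover graph of the next slide would
    precompute the transitive closure instead of searching."""
    seen, frontier = {src}, deque([src])
    while frontier:
        node = frontier.popleft()
        if node == dst:
            return True
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False
```

With σ : t(X,Y) → ∃Z r(X,Y,Z), positions t[1] and t[2] reach r[1] and r[2] respectively, while r[3] is only fed by the existential Z.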
31. Opt. 2: Atom coverage
Instead of testing reachability we can use a cover graph
(i.e., the transitive closure of the propagation graph).
Table IV. Propagation and cover graphs.
      Size (#nodes,#edges)        LP    Time (ms)         Memory
      P-GRAPH     C-GRAPH              P-GRAPH  C-GRAPH   P-GRAPH  C-GRAPH
V     (214,445)   (214,1194)      7    4        158       190Kb    4.7Mb
S     (41,103)    (41,405)        8    1        120       45Kb     4.4Mb
U     (86,189)    (86,416)        6    1        37        81Kb     4.4Mb
A     (135,319)   (135,1133)      10   43       708       154Kb    4.7Mb
P5    (15,32)     (15,43)         3    0        1         14Kb     4.3Mb
SF    (100,195)   (100,1050)      19   1        346       84Kb     4.8Mb
CLQ   (11,143)    N/A             N/A  2        N/A       39Kb     N/A
Differently from the query graph, the propagation and the cover graph depend
only on the input ontology and not on the input query. Table IV reports the
characteristics of both the propagation graph (P-GRAPH) and the cover graph
(C-GRAPH) constructed for each ontology: the size of the two structures in
terms of the number of nodes and edges, the time necessary to construct them,
and their memory footprint. For the cover graph, we also report the length of
the longest label on an edge (LP), corresponding to the longest tight sequence
of TGDs that we have to consider during the computation of cover sets.
The cover graph becomes unmanageable for large ontologies (Galen lite).
Either pay the reachability price on the propagation graph (use a graph DB).
34. Parallelization
The rewriting can be computed in parallel by decomposing the query.
σ1 : p(X,Y), s(Y,Z) → ∃W t(Y,X,W)
σ2 : t(X,Y,Z) → ∃W p(W,Z)
Existential positions: σ1 : {t[3], p[2]}, σ2 : {p[1], t[2]}.
Q : q(A,D) ← p(D,B), t(A,B,C), p(E,C)
Atoms that do not join on existential positions can be rewritten independently.
Q1 : q1(A,B) ← t(A,B,C), p(E,C)
Q2 : q2(D,B) ← p(D,B)
Bind the distinguished and cross-component join variables in the head of each
component; then unfold using the reconciliation rule:
Qρ : q(A,D) ← q1(A,B), q2(D,B)
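The decomposition can be sketched as a union-find over the query's atoms: two atoms end up in the same component only when a shared variable sits, in both atoms, at existential positions belonging to the same TGD. The function names and the `exist_pos_sets` encoding (one set of (predicate, index) pairs per TGD) are our assumptions, not the paper's API:

```python
def components(query, exist_pos_sets):
    """Group atoms that join on an existential position; atoms in
    different groups can be rewritten independently and in parallel."""
    parent = list(range(len(query)))

    def find(i):                 # union-find without path compression
        while parent[i] != i:
            i = parent[i]
        return i

    def positions(atom, var):
        return {(atom[0], k) for k, t in enumerate(atom[1:], 1) if t == var}

    for i, a in enumerate(query):
        for j in range(i + 1, len(query)):
            b = query[j]
            shared = {t for t in a[1:] if t[0].isupper()} & set(b[1:])
            # Merge only if a shared variable occupies, in both atoms,
            # existential positions of one and the same TGD.
            if any(positions(a, v) & s and positions(b, v) & s
                   for v in shared for s in exist_pos_sets):
                parent[find(j)] = find(i)
    groups = {}
    for i in range(len(query)):
        groups.setdefault(find(i), []).append(query[i])
    return list(groups.values())
```

On the slide's query, p(D,B) joins t(A,B,C) on B only across positions of different TGDs (p[2] in σ1's set, t[2] in σ2's), so it forms its own component, while t(A,B,C) and p(E,C) join on C at σ1's positions {t[3], p[2]} and stay together.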
36. A note on subsumption check
A classic (but expensive) optimization mechanism is the subsumption check:
Q1 subsumes Q2 if there exists μ such that μ(body(Q1)) ⊆ body(Q2) and
μ(head(Q1)) = head(Q2).
Remove the subsumed queries from the rewriting.
Three modes:
• TAIL, at the end of the rewriting procedure (guarantees minimality)
• intra-rewriting (IREW), after each rewriting step (guarantees minimality)
• Careful! the rewriting operator must be prunable [König et al. SWJ'14]
• intra-decomposition (IDEC), like TAIL but for each component (parallel version)
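The subsumption test above is a homomorphism search, which a small backtracking routine can illustrate. The (head_vars, body_atoms) query encoding is our own sketch, not the paper's data structure:

```python
def subsumes(q1, q2):
    """Q1 subsumes Q2 if some substitution maps body(Q1) into body(Q2)
    and sends head(Q1) onto head(Q2). Queries are (head_vars, body)."""
    head1, body1 = q1
    head2, body2 = q2
    if len(head1) != len(head2):
        return False

    def extend(sub, atom, fact):
        """Extend sub so that sub(atom) == fact, or return None."""
        if atom[0] != fact[0] or len(atom) != len(fact):
            return None
        new = dict(sub)
        for t, v in zip(atom[1:], fact[1:]):
            if t[0].isupper():
                if new.setdefault(t, v) != v:
                    return None
            elif t != v:
                return None
        return new

    def search(i, sub):
        if i == len(body1):      # all atoms mapped: check the head
            return all(sub.get(h, h) == g for h, g in zip(head1, head2))
        return any(
            (s2 := extend(sub, body1[i], fact)) is not None and search(i + 1, s2)
            for fact in body2)

    return search(0, {})
```

For instance, ∃B∃C hasCollaborator(A,B,C), collaborator(A) is subsumed by the shorter collaborator(A), so the longer disjunct can be dropped from the UCQ.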
40. Caching
Classical caching mechanisms can be applied during rewriting
Table VIII. The impact of caching (per-query averages).
                        V      S     U     A      P5    SF    CLQ
Factorization           0      1     0     37k    9k    0     0
  Same invocation (%)   0      0     0     0      0     0     0
  Distinct input        0      1     0     37k    9k    0     0
Atom coverage           12     18    12    642k   202k  10    N/A
  Same invocation (%)   0      12    11.2  86     63.4  0     N/A
  Distinct input        12     2     9     3k     2k    10    N/A
Homomorphism check      447    12    8     863k   145   351   1.4k
  Same invocation (%)   0      0     0     0      0     0     0
  Distinct input        447    12    8     863k   145   351   1.4k
MGU computation         457    97    25    294k   93k   500k  24k
  Same invocation (%)   72.3   50.4  34.8  98.2   79.6  77.6  98.3
  Distinct input        73     27    13    384    6k    92    345
Canonical renaming      5k     249   558   4.6M   63k   17k   6k
  Same invocation (%)   90.6   66.8  64.6  97.2   78.8  63.4  75.6
  Distinct input        372    69    106   76k    15k   8k    2k
Bottom line: caching canonical renamings and MGUs does most of the job.
Caching factorizations and homomorphism checks is useless, i.e., our
algorithm explores the rewriting space efficiently.
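Canonical renaming, the most effective cache key in the table, can be sketched as renaming variables to v0, v1, ... in order of first occurrence, so that queries equal up to variable names collide in the cache. This naive version (our own sketch) is still sensitive to atom order:

```python
def canonical(query):
    """Rename variables in order of first occurrence so syntactically
    isomorphic queries map to the same hashable key."""
    names = {}
    out = []
    for atom in query:
        terms = []
        for t in atom[1:]:
            if t[0].isupper():   # variable: assign the next canonical name
                t = names.setdefault(t, f"v{len(names)}")
            terms.append(t)      # constants pass through unchanged
        out.append((atom[0], *terms))
    return tuple(out)
```

With this key, a rewriting computed for r(X,Y) is reused verbatim for r(A,B), while r(X,X) correctly keys differently from r(X,Y).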
41. What is left to do…
Parallelization
Recursive query decomposition
Ontology decomposition
Execution
Map-reducing queries and rewritings
Exploiting storage constraints [De Virgilio et al. CIKM'11, Rodriguez-Muro
et al. DL'14]
Static analysis
Analyze the theory to determine which optimisations to apply
42. Full set of experiments and code at: https://bitbucket.org/giorsi/nyaya