SlideShare a Scribd company logo
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Adaptive Semantic Data Management
Techniques for Federations of Endpoints-
Half-Day Tutorial at ESWC 2012
Maribel Acosta1,2, Cosmin Basca3, Gabriela Montoya1
Edna Ruckhaus1,Maria-Esther Vidal1
1 Universidad Sim´on Bol´ıvar, Venezuela
{macosta,gmontoya,ruckhaus,mvidal}@ldc.usb.ve
2 Institute AIFB, Karlsruhe Institute of Technology, Germany
Maribel.Acosta@aifb.uni-karlsruhe.de
3 Department of Informatics, University of Zurich, Switzerland
basca@ifi.uzh.ch
http://www.ldc.usb.ve/˜mvidal/TUTORIAL2012
May 28th, 2012
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
Introduction
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
Motivation
Cloud of Linked Data: a large number of
datasets.
Data locally or remotely accessed by RDF
engines and SPARQL endpoints.
RDF engines rely performance on physical
access and storage structures that are locally
stored, but:
loading the data and their links is not
always feasible.
The optimize-then-execute paradigm is
usually followed, but:
lack of statistics.
unpredictable conditions of remote
queries.
changing characteristics.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
Motivation
Cloud of Linked Data: a large number of
datasets.
Data locally or remotely accessed by RDF
engines and SPARQL endpoints.
RDF engines rely performance on physical
access and storage structures that are locally
stored, but:
loading the data and their links is not
always feasible.
The optimize-then-execute paradigm is
usually followed, but:
lack of statistics.
unpredictable conditions of remote
queries.
changing characteristics.
Techniques are not adaptable to unpredictable data transfers or data availability.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
Motivating Example
Clinical trials and drugs used for Breast, Colorectal, Ovarian, and Lung Cancer.
PREFIX linkedct: <http://data.linkedct.org/resource/linkedct/>
PREFIX foaf:<http://xmlns.com/foaf/0.1/>
PREFIX rdf:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia:<http://dbpedia.org/property/>
PREFIX elements:<http://purl.org/dc/elements/1.1/>
SELECT DISTINCT ?fn1 ?fn2 ?fn3 ?fn4 ?I ?DB ?ID1 ?ID2 ?L
WHERE {
?A1 foaf:page ?fn1.?A1 linkedct:condition ?A2.
?A2 linkedct:condition name ”Colorectal Cancer”.
?A1 linkedct:intervention ?I.
?A3 foaf:page ?fn3.?A3 linkedct:condition ?A4.
?A4 linkedct:condition name ”Breast Cancer”.
?A3 linkedct:intervention ?I.
?A5 foaf:page ?fn5.?A5 linkedct:condition ?A6.
?A6 linkedct:condition name ”Ovarian Cancer”.
?A5 linkedct:intervention?I.
?A7 foaf:page?fn7.?A7 linkedct:condition ?A8.
?A8 linkedct:condition name ”Lung Cancer”.
?A7 linkedct:intervention ?I
?I linkedct:intervention type ”Drug”.
?I rdf:seeAlso ?DB. ?WK foaf:primaryTopic ?DB.
?WK dbpedia:revisionId ?ID1.
?WK dbpedia:pageId ?ID2.?WK elements:language ?L.}
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
Motivating Example
Clinical trials and drugs used for Breast, Colorectal, Ovarian, and Lung Cancer.
PREFIX linkedct: <http://data.linkedct.org/resource/linkedct/>
PREFIX foaf:<http://xmlns.com/foaf/0.1/>
PREFIX rdf:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia:<http://dbpedia.org/property/>
PREFIX elements:<http://purl.org/dc/elements/1.1/>
SELECT DISTINCT ?fn1 ?fn2 ?fn3 ?fn4 ?I ?DB ?ID1 ?ID2 ?L
WHERE {
?A1 foaf:page ?fn1.?A1 linkedct:condition ?A2.
?A2 linkedct:condition name ”Colorectal Cancer”.
?A1 linkedct:intervention ?I.
?A3 foaf:page ?fn3.?A3 linkedct:condition ?A4.
?A4 linkedct:condition name ”Breast Cancer”.
?A3 linkedct:intervention ?I.
?A5 foaf:page ?fn5.?A5 linkedct:condition ?A6.
?A6 linkedct:condition name ”Ovarian Cancer”.
?A5 linkedct:intervention?I.
?A7 foaf:page?fn7.?A7 linkedct:condition ?A8.
?A8 linkedct:condition name ”Lung Cancer”.
?A7 linkedct:intervention ?I
?I linkedct:intervention type ”Drug”.
?I rdf:seeAlso ?DB. ?WK foaf:primaryTopic ?DB.
?WK dbpedia:revisionId ?ID1.
?WK dbpedia:pageId ?ID2.?WK elements:language ?L.}
select ?fn1 ?I ?DB
?A1 foaf:page ?fn1.
?A1 linkedct:condition ?A2.
?A2 linkedct:condition_name "Breastl Cancer".
?A1 linkedct:intervention ?I.
?I rdf:seeAlso ?DB
select ?fn1 ?I ?DB
?A1 foaf:page ?fn1.
?A1 linkedct:condition ?A2.
?A2 linkedct:condition_name "Ovarian Cancer".
?A1 linkedct:intervention ?I.
?I rdf:seeAlso ?DB
select ?fn1 ?I ?DB
?A1 foaf:page ?fn1.
?A1 linkedct:condition ?A2.
?A2 linkedct:condition_name "Lung Cancer".
?A1 linkedct:intervention ?I.
?I rdf:seeAlso ?DB
select ?fn1 ?I ?DB
?A1 foaf:page ?fn1.
?A1 linkedct:condition ?A2.
?A2 linkedct:condition_name
"Colorectal Cancer".
?A1 linkedct:intervention ?I.
?I rdf:seeAlso ?DB
JOIN
JOIN
JOIN
select ?ID1 ?ID2 ?L
?WK foaf:primaryTopic t(?DB).
?WK dbpedia:revisionId ?ID1.
?WK dbpedia:pageId ?ID2.
?WK elements:language ?L.
JOIN
t(?DB}
A star-shaped plan; four star-shaped sub-queries against LinkedCT
endpoints and one against DBPedia endpoint.
Endpoints may time out before producing any answer.
Query too complex, and needs to be decomposed into simple sub-queries.
Adaptive query processing techniques need to be implemented to incrementally produce answers.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
Agenda
1 Basic Concepts and Background (20 minutes)
2 Adaptive Query Processing Techniques (30 minutes)
3 RDF Engines for Federations of Endpoints (30 minutes)
4 Questions and Discussion (5 minutes)
5 Coffee Break (30 minutes)
6 Hands-on (90 minutes)
Explanation of existing Benchmarks (10 minutes)
Evaluation (60 minutes)
Discussion of the Observed Results (15 minutes)
7 Summary and Closing (5 minutes)
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
Background
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
Introduction-Traditional Data Management
Systems[11, 16]
A Database Management System (DBMS) is a software designed
to:
consistently store and organize data,
Provide physical data structures to store large databases.
Stage large datasets between main memory and secondary
storage.
Protect data from inconsistencies.
and to provide a secure and concurrent access to the data.
Provide a query language.
Crash recovery.
Security and access control.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
The Optimize-Then-Execute Paradigm
Traditional Architectures
Query Processing is divided into three
mayor steps:
Statistics generation.
Query optimization.
Query Execution.
Database
Catalog
Statistics
Generator
Query
Optimizer
Query
Execution
Engine
Query
RunStatistics
Collected
Statistics
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
The Optimize-then-Execute Paradigm
Traditional Query Processing techniques:
Parse a declarative query.
Generate an intermediate representation of the query.
Produce an efficient logical and physical plan; minimize disk
I/O access.
Execute the query plan without making runtime decisions.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
Query Optimizer Architecture[11, 16, 22]
Planner
Rewriter
Algebraic
Space
Method-
Structure
Space
Cost Model
Size-
Distribution
Estimator
Dictionary/Catalog
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
Query Optimizer Architecture[11, 16, 22]
Planner
Rewriter
Algebraic
Space
Method-
Structure
Space
Cost Model
Size-
Distribution
Estimator
Dictionary/Catalog
Rewriter
Applies algebraic
transformations to the query to
produce a more efficient
equivalent query; some
transformations are: flatten out
queries, using views, etc.
Nested queries are decomposed
into query blocks.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
Query Optimizer Architecture[11, 16, 22]
Planner
Rewriter
Algebraic
Space
Method-
Structure
Space
Cost Model
Size-
Distribution
Estimator
Dictionary/Catalog
Planner
Traverses the space of possible
execution plans following a
particular search strategy, e.g.,
dynamic programming,
randomized algorithms.
Query blocks are optimized one at
a time.
All available access methods are
considered for each relation.
All the ways to join the relations
one-at-a-time; permutations of
relations and different join
methods are considered.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
Query Optimizer Architecture[11, 16, 22]
Planner
Rewriter
Algebraic
Space
Method-
Structure
Space
Cost Model
Size-
Distribution
Estimator
Dictionary/Catalog
Algebraic Space
Set of algebraic rules that
restrict the space of plans
transformations, and guide the
planner into the space of
efficient logical plans.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
Query Optimizer Architecture[11, 16, 22]
Planner
Rewriter
Algebraic
Space
Method-
Structure
Space
Cost Model
Size-
Distribution
Estimator
Dictionary/Catalog
Method-Structure Space
Set of rules that restrict the
space of physical
implementations of each logical
plan, and guide the planner
into the space of efficient
physical plans.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
Query Optimizer Architecture[11, 16, 22]
Planner
Rewriter
Algebraic
Space
Method-
Structure
Space
Cost Model
Size-
Distribution
Estimator
Dictionary/Catalog
Cost Model
Set of arithmetic rules or
statistical techniques, to
estimate the execution cost
and cardinality of logical and
physical plans.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
Query Optimizer Architecture[11, 16, 22]
Planner
Rewriter
Algebraic
Space
Method-
Structure
Space
Cost Model
Size-
Distribution
Estimator
Dictionary/Catalog
Size-Distribution Estimator
Set of arithmetic rules or
statistical techniques to
estimate the cardinalities of
dataset to be accessed as well
as its distributions, and the
selectivity of a query select
condition.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
Query Optimizer Architecture[11, 16, 22]
Planner
Rewriter
Algebraic
Space
Method-
Structure
Space
Cost Model
Size-
Distribution
Estimator
Dictionary/Catalog
Dictionary
Set of statistics that
characterize the dataset, e.g.,
number of different values of
an attribute, or different
subjects, properties, or values;
distributions, etc. Stored
statistics depend on the type of
size-distribution estimator.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
Join Ordering: Search Space
EMP DEPT
ACCOUNT
BANK
bno=bno
ano=ano
dno=dno
Left-linear plan
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
Join Ordering: Search Space
EMP DEPT
ACCOUNT
BANK
bno=bno
ano=ano
dno=dno
Left-linear plan
DEPT EMP
BANK ano=ano
dno=dno
bno=bno
ACCOUNT
Right-linear plan
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
Join Ordering: Search Space
EMP DEPT
ACCOUNT
BANK
bno=bno
ano=ano
dno=dno
Left-linear plan
DEPT EMP
BANK ano=ano
dno=dno
bno=bno
ACCOUNT
Right-linear plan
DEPTEMP BANK
ano=ano
dno=dno bno=bno
ACCOUNT
Bushy Plan
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
Index Nested Loop Join
Indices exist on the join attributes
of the inner relation of the join.
Cost depends on the type of index
and on the clustering:
To find matching tuples:
clustered index: 1 I/O,
unclustered: up to 1 I/O
per matching tuple.
SSN NAME
230
120
100
John Smith
Mary Williams
Jose Gomez
99 Peter Wigle
SSN COURSE GRADE
120 CS6019 B
100 CS6019 A
230 CS6019 B
100 CS6020 A
120 CS6020 B
230 CS6020 B
100 CS6021 A
120 CS6021 B
Index on SSN
Index Nested Loop Join by SSN
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
Sort Merge Join
Sorts input tables on
the join attributes;
Scans sorted tables;
Merges on join
attributes.
Cost is optimal in case
input tables are sorted.
SSN NAME
230
120
100
John Smith
Mary Williams
Jose Gomez
SSN COURSE GRADE
120 CS6019 B
100 CS6019 A
230 CS6019 B
100 CS6020 A
120 CS6020 B
230 CS6020 B
100 CS6021 A
120 CS6021 B
99 Peter Wigle
Sort Merge by SSN
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
Sort Merge Join
Sorts input tables on
the join attributes;
Scans sorted tables;
Merges on join
attributes.
Cost is optimal in case
input tables are sorted.
SSN NAME
230
120
100
John Smith
Mary Williams
Jose Gomez
SSN COURSE GRADE
120 CS6019 B
100 CS6019 A
230 CS6019 B
100 CS6020 A
120 CS6020 B
230 CS6020 B
100 CS6021 A
120 CS6021 B
99 Peter Wigle
Sort Merge by SSN
SSN NAME
230
120
100
John Smith
Mary Williams
Jose Gomez
SSN COURSE GRADE
120
CS6019
B
100 CS6019 A
230
CS6019 B
100 CS6020 A
120 CS6020 B
230 CS6020
B
100 CS6021 A
120 CS6021
B
99 Peter Wigle
Sorting by SSN
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
Sort Merge Join
Sorts input tables on
the join attributes;
Scans sorted tables;
Merges on join
attributes.
Cost is optimal in case
input tables are sorted.
SSN NAME
230
120
100
John Smith
Mary Williams
Jose Gomez
SSN COURSE GRADE
120 CS6019 B
100 CS6019 A
230 CS6019 B
100 CS6020 A
120 CS6020 B
230 CS6020 B
100 CS6021 A
120 CS6021 B
99 Peter Wigle
Sort Merge by SSN
SSN NAME
230
120
100
John Smith
Mary Williams
Jose Gomez
SSN COURSE GRADE
120
CS6019
B
100 CS6019 A
230
CS6019 B
100 CS6020 A
120 CS6020 B
230 CS6020
B
100 CS6021 A
120 CS6021
B
99 Peter Wigle
Sorting by SSN
SSN COURSE GRADE
120
CS6019
B
100 CS6019 A
230
CS6019 B
100 CS6020 A
120 CS6020 B
230 CS6020
B
100 CS6021 A
120 CS6021
B
NAME
Jose Gomez
Jose Gomez
Jose Gomez
Mary Williams
Mary Williams
Mary Williams
John Smith
John Smith
Merging by SSN
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
Hash Join
Two fold physical operator:
Partition: input tables are partitioned using a
hash function h1; only tuples in the
same partitions of two tables are
joined.
Probing: each partition of the outer table is
loaded in a hash table using a hash
function h2; the corresponding
partition of the inner table is scanned
to search for matches.
Tables
Partitions
A Join B
Partition
Phase
Probing
Phase
Hash Tables
Joined tuples
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
Example Hash Join
Original Relations
Disk
INPUT
OUTPUT
1
2
B-1
hash
function
h1
1
2
B-1
Disk
Partitions
... ...
Partition Phase
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
Example Hash Join
Original Relations
Disk
INPUT
OUTPUT
1
2
B-1
hash
function
h1
1
2
B-1
Disk
Partitions
... ...
Partition Phase
Partitions Input Tables
Disk
Output Buffer
hash
function
h2
Disk
Join Results
... ...
Input Buffer
...
Hash table for partition
h2
Probing Phase
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
Execution Engine-Blocking Operators
Require more than one pass over the input to create
the output.
Hold intermediate state to compute the output.
Some blocking operators:
Project: must sort and then merge the input.
Sort Merge Join: must sort the inputs and then merge
them.
Hash Join: must partition the inputs, temporary
store the input in a hash table, probe
hashed data many times. Input Data
Materialization
Intermediate
Structures
Blocking
Operator
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
Example-Blocking Operators
A B
D
Hash Join
Merge Join
Index Nested Loop Join
C
C.a Hashtable
SORT SORT
Materialization points
Blocking Operators
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
Execution Engine-Pipelined Plans
Executes all operators in the plan in parallel.
Operators or iterators, consume tuples from its children and
output produced tuples to their parents.
Tuples flow from the leaf nodes towards the root of the tree.
Iterators have the following functions:
Open prepares an operator for producing data.
Next produces an item.
Close performs final house-keeping.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
Examples of Iterator Functions [11]
Iterator Open Next Close
Scan open input call next on input close input
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
Examples of Iterator Functions [11]
Iterator Open Next Close
Scan open input call next on input close input
Select open input call next on input close input
until an item qualifies
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
Examples of Iterator Functions [11]
Iterator Open Next Close
Scan open input call next on input close input
Select open input call next on input close input
until an item qualifies
Hash Join allocate hash directory call next on probe close probe input
open left partition input input until a deallocate hash
build hash table match is found directory
next on partition input
close partition input
open right probe input
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
Examples of Iterator Functions [11]
Iterator Open Next Close
Scan open input call next on input close input
Select open input call next on input close input
until an item qualifies
Hash Join allocate hash directory call next on probe close probe input
open left partition input input until a deallocate hash
build hash table match is found directory
next on partition input
close partition input
open right probe input
Merge Join open both inputs get next item from close both inputs
input with smaller key
until a match is found
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
Limitations of the Optimize-Then-Execute Paradigm
Three Mayor Steps
Off-line statistics generation to
collect cardinalities, values
frequencies, and costs to access
data.
Cost and heuristic-based query
optimization to identify
”good-enough” plans.
Assumptions:
Different values of the attributes
Availability of the data
Cardinality of operators
Behavior of data sources
Query Execution:
pipelined operators are processed
at the time and output is produced by
its sub-operators.
materialized operators are
completely executed before firing its
parents for processing.
Blocked Iterator Model.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Introduction
Summary: The Optimize-Then-Execute Paradigm
Relies on knowledge about the cost and cardinality of the
accessed datasets.
Effectively achieves the generation of query execution plans
where the size of intermediate results is minimized.
Does not scale due to lack of statistics and unpredictable data
transfers or data availability.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
The Adaptive Query Processing Paradigm
The Adaptive Query
Processing Paradigm
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
The Adaptive Query Processing Paradigm
Adaptive Query Processing [4, 9, 18]
Adaptivity is needed:
Misestimated or missing statistics.
Unexpected correlations.
Unpredictable costs.
Dynamically changing data, workload and sources availability.
Rates at which tuples arrive from the sources, change.
Initial Delays.
Slow Delivery
Bursty Arrivals
Iterative environments.
Systems able to change their behavior by learning behavior of data providers.
Receive information from the environment.
Use up-to-date information to change their behavior.
Keep iterating over time to adapt their behavior based on the environment
conditions.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
The Adaptive Query Processing Paradigm
Adaptive Query Processing
Database
Catalog
Statistics
Generator
Query
Optimizer
Query
Execution
Engine
Query
RunStatistics
Collected
Statistics
Typical Optimize-Then-Execute
Architecture.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
The Adaptive Query Processing Paradigm
Adaptive Query Processing
Database
Catalog
Statistics
Generator
Query
Optimizer
Query
Execution
Engine
Query
RunStatistics
Collected
Statistics
Typical Optimize-Then-Execute
Architecture.
Database
Catalog
Statistics
Generator
Query
Optimizer
Query
Execution
Engine
Query
RunStatistics
ReOptimize
Collected
Statistics
Adaptive Query Processing Architecture.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
The Adaptive Query Processing Paradigm
Agenda
Types of Adaptation.
Adaptivity in Relational Databases
Adaptivity in the Web of Data
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
The Adaptive Query Processing Paradigm
Types of Adaptive-Based Solution
Adaptation Level
Source Selection searching strategies to select the best sources
for answering a query according to real-time source conditions.
Query Execution strategies to adapt query processing to
environment conditions.
Granularity of the Adaptation
Fine-grained adaptation of small processes, e.g., per-tuple or
source basis.
Coarse-grained is attempted for large processes, e.g., several
queries are executed under certain assumptions about the
conditions of the environment.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
The Adaptive Query Processing Paradigm
Adaptive Query Processing Strategies
Plan-based Systems adapt to environment conditions, by
extending the optimize-then-execute paradigm.
Adaptive Join Processing(Intra-Operator) adaptivity is performed
at tuple level and query operators are able to adapt
to the environment changes, even in the context of a
fixed query plan.
Non-Pipelined Query Execution (Inter-Operator Re-optimization)
sub-queries against remote data sources are
re-scheduled based on uncertainties in the execution
cost of the remote access, size of the materialized
intermediate results, and unexpected delays.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
The Adaptive Query Processing Paradigm
Inter-Operator Re-optimization[4, 17, 21, 27]
Ingress [30] interleaves sub-plan generation with plan execution; for each table
in a query, the smallest tables are selected first until plans of all the tuples and
tables are considered.
Dynamic Query Planning [7] off-line define a set of constraints that optimal
plans must meet; during execution time, plans that meet constraints are
generated based on dataset conditions.
DEC RDB [2] implements competition, an adaptive strategy which generates
and runs multiple plans in parallel for a while, and after a short period of time it
selects the most promising ones and discards the rest.
Query Scrambling [27, 28] starts with a plan and it may change it on-the-fly,
depending on data transfer delays or misestimations; operators can be ordered
differently, or not directly combined tables can be joined by a new operator
during re-optimization.
Routing techniques avoid traditional optimizers, and generate plans for each
tuple during runtime by sending tuples through the pool of eligible operators;
Eddy [3, 25], M-Join [29] and STAIRs [8] follow this adaptive strategy.
Fine-Grained adaptivity is achieved.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
The Adaptive Query Processing Paradigm
Inter-Operator Re-optimization[4, 17, 21, 27]
Detect and correct situations where optimizers select sub-optimal plans, which
could be improved if accurate statistics are considered.
Existing approaches extend statistics tracking with a log of estimates considered
by the optimizer to generate the query plan, and of system conditions during
query execution.
Physical operators are implemented to track statistics and fire re-optimizations.
Re-optimization must ensure that a new plan will not duplicate or miss outputs,
and reuse previous computations.
Should avoid thrashing, i.e., spend most of the time and resources, monitoring
operators.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
The Adaptive Query Processing Paradigm
Query Scrambling[27, 28]
Query Scrambling is an execution query
method that reacts to data arrival delays.
First, a plan is generated, and reacts during
query execution to misestimations.
Existing operators can be rescheduled,
retaining one part of the plan, and scheduling
another.
Query operators are divided into runnable and
blocked.
Maximal runnable sub-trees are identified, i.e.,
sub-tree where all the operators are runnable.
Executed sub-trees are materialized until the
delayed sources become available and the
original plan can resumed.
Query optimizer is invoked to identify
non-delayed sub-trees that can be
materialized first.
New operators can be introduced to join
tables that were not directly joined in the
original plan.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
The Adaptive Query Processing Paradigm
Query Scrambling[27, 28]
Query Scrambling is an execution query
method that reacts to data arrival delays.
First, a plan is generated, and reacts during
query execution to misestimations.
Existing operators can be rescheduled,
retaining one part of the plan, and scheduling
another.
Query operators are divided into runnable and
blocked.
Maximal runnable sub-trees are identified, i.e.,
sub-tree where all the operators are runnable.
Executed sub-trees are materialized until the
delayed sources become available and the
original plan can resumed.
Query optimizer is invoked to identify
non-delayed sub-trees that can be
materialized first.
New operators can be introduced to join
tables that were not directly joined in the
original plan.
A B C D E F
1
3
5
4
2
Source A is delayed.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
The Adaptive Query Processing Paradigm
Query Scrambling[27, 28]
Query Scrambling is an execution query
method that reacts to data arrival delays.
First, a plan is generated, and reacts during
query execution to misestimations.
Existing operators can be rescheduled,
retaining one part of the plan, and scheduling
another.
Query operators are divided into runnable and
blocked.
Maximal runnable sub-trees are identified, i.e.,
sub-tree where all the operators are runnable.
Executed sub-trees are materialized until the
delayed sources become available and the
original plan can resumed.
Query optimizer is invoked to identify
non-delayed sub-trees that can be
materialized first.
New operators can be introduced to join
tables that were not directly joined in the
original plan.
A B C
D E F
1
3
5
Sub-tree 4 is materialized.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
The Adaptive Query Processing Paradigm
Query Scrambling[27, 28]
Query Scrambling is an execution query
method that reacts to data arrival delays.
First, a plan is generated, and reacts during
query execution to misestimations.
Existing operators can be rescheduled,
retaining one part of the plan, and scheduling
another.
Query operators are divided into runnable and
blocked.
Maximal runnable sub-trees are identified, i.e.,
sub-tree where all the operators are runnable.
Executed sub-trees are materialized until the
delayed sources become available and the
original plan can resumed.
Query optimizer is invoked to identify
non-delayed sub-trees that can be
materialized first.
New operators can be introduced to join
tables that were not directly joined in the
original plan.
A B C
D E F
1 3
5
6
A new operator is created between node 3 and D, E and
F.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
The Adaptive Query Processing Paradigm
Query Scrambling[27, 28]
Query Scrambling is an execution query
method that reacts to data arrival delays.
First, a plan is generated, and reacts during
query execution to misestimations.
Existing operators can be rescheduled,
retaining one part of the plan, and scheduling
another.
Query operators are divided into runnable and
blocked.
Maximal runnable sub-trees are identified, i.e.,
sub-tree where all the operators are runnable.
Executed sub-trees are materialized until the
delayed sources become available and the
original plan can resumed.
Query optimizer is invoked to identify
non-delayed sub-trees that can be
materialized first.
New operators can be introduced to join
tables that were not directly joined in the
original plan.
A
B C D E F
1
5
The new operator is executed.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
The Adaptive Query Processing Paradigm
Intra-Operator Techniques
Adapts plans to environment changes, even in presence of a
fixed plan.
Incrementally produce results as they become available.
Symmetric Hash Join [9] and Xjoin [26] overcome Hash Join
blocking limitations, and hide delays while producing answers
even if both input data sources become blocked.
Fine-Grained adaptivity is achieved at each operator.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
The Adaptive Query Processing Paradigm
Example-Blocking Operators
A B
D
Hash Join
Merge Join
Index Nested Loop Join
C
C.a Hashtable
SORT SORT
Materialization points
Blocking Operators
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
The Adaptive Query Processing Paradigm
Symmetric Hash Join[9]
Tables
Partitions
A Join B
Partition
Phase
Probing
Phase
Hash Tables
Joined tuples
Traditional Hash Join blocks the probing
phase until the partition phase is
completely finished.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
The Adaptive Query Processing Paradigm
Symmetric Hash Join[9]
Tables
Partitions
A Join B
Partition
Phase
Probing
Phase
Hash Tables
Joined tuples
Traditional Hash Join blocks the probing
phase until the partition phase is
completely finished.
A B
Hash
Table
Hash
Table
Joined tuples
Partition Partition
Probe
Probe
Symmetric Hash Join builds hash tables
on both inputs; when a tuple arrives, it is
inserted in its hash table and probed
against the other table.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Adaptivity in the Web of Data
Adaptivity in the Web of
Data
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Adaptivity in the Web of Data
Adaptive Query Processing for the Web of Data
Adaptivity during Source Selection relies on source or endpoint
descriptions and statistics describe data stored at the
endpoints.
Adaptivity at Query Execution Level approaches implement intra-
and inter-operator query processing techniques to
adapt execution to no accurate statistics,
uncontrollable network conditions, query complexity,
dynamic data and workload.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Adaptivity in the Web of Data
Evolution of Adaptive Approaches in the Web of Data
Summary
Coarse-Grained Granularity Fine-Grained Granularity
Plan-based Techniques
i.e., ARQ, DARQ
Adaptive Source-Selection
Techniques
i.e., FedX, SPLENDID
Adaptive Query Processing
Techniques
i.e., ANAPSID, Avalanche,
Hartig et. al. 09, Ladwig and Tran 10
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Adaptivity in the Web of Data
Adaptivity during Source Selection
coarse-Grained Approaches
Harth et al.[12]: Qtree-based index structure to store data
source statistics collected in a pre-processing stage; these
statistics are used for source ranking and to determine the
best sources to answer a join query.
Li and Heflin[20] tree structure to support integration of data
from multiple heterogeneous sources; nodes are annotated
with source join selectivities and used to select the
corresponding datasets.
Finer granularity is achieved whenever the statistics are
re-computed frequently.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Adaptivity in the Web of Data
Fined-Grained Approaches to traverse
the Web of Data
Hartig et al.[13, 14, 15] source selection and link traversal are
interleaved during query execution time. A non-blocking
iterator model is used for traversing relevant links
relies on an asynchronous pipeline of iterators executed in an
order heuristically determined. e.g., the most selective iterators
are executed first [14].
query engine is able to adapt the execution to source
availability by detecting on-the-fly whenever a dereferenced
dataset stops responding; iterators can be cancelled, and other
requests are submitted to alive datasets.
Fine-Grained adaptivity is achieved during query selection and link
traversal.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Adaptivity in the Web of Data
Combined-Grained Approaches to traverse
the Web of Data
Ladwig and Tran[19]
coarse-grained granularity: aggregate indexes keep data
distributions and are used for source selection.
Fine-grained granularity: Symmetric Hash Joins incrementally
produce answers even when sources become
blocked.
Adaptivity is managed at different levels and granularity during link
traversal, but these approaches are not tailored to access data from
federations of endpoints.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Adaptive Query Processing for Federations of Endpoints
SPARQL Endpoints
Implement SPARQL protocol and enable users to
query particular datasets or to dereference linked data.
Developed for very light-weight use.
Executions against several endpoints can be
unsuccessful because of endpoint timeouts,
unpredictable data transfers and endpoint availability.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Adaptive Query Processing for Federations of Endpoints
SPLENDID: SPARQL Endpoint Federation Exploiting
VOID Descriptions[10]
Statistical Information stored in VOID Descriptions is used to
perform source selection first stage.
Contact Source during query execution is used to improve the
preliminar source selection.
Pattern grouping is performed for set of triples that share the
same unique endpoint, and for sameAs triple that shares an
unbound subject variable with other triples.
Cost-based join ordering using dynamic programming
determines the best plan, giving preference to bushy tree
plans.
Bind Joins that reduces network overhead.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Adaptive Query Processing for Federations of Endpoints
SPLENDID
Example: drugs and their components’ url and image
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX drugbank: <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/>
PREFIX purl:<http://purl.org/dc/elements/1.1/>
PREFIX bio2rdf:<http://bio2rdf.org/ns/bio2rdf#>
SELECT $drug $keggUrl $chebiImage WHERE {
$drug rdf:type drugbank:drugs .
$drug drugbank:keggCompoundId $keggDrug .
$drug drugbank:genericName $drugBankName .
$keggDrug bio2rdf:url $keggUrl .
$chebiDrug purl:title $drugBankName .
$chebiDrug bio2rdf:image $chebiImage .
}
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Adaptive Query Processing for Federations of Endpoints
SPLENDID
Example: drugs and their components’ url and image
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX drugbank: <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/>
PREFIX purl:<http://purl.org/dc/elements/1.1/>
PREFIX bio2rdf:<http://bio2rdf.org/ns/bio2rdf#>
SELECT $drug $keggUrl $chebiImage WHERE {
(1)
(2)
(3)


¨
©
$drug rdf:type drugbank:drugs .
$drug drugbank:keggCompoundId $keggDrug .
$drug drugbank:genericName $drugBankName .
Drugbank
(4)
£
¢
 
¡$keggDrug bio2rdf:url $keggUrl . KEGG, Chebi
(5)
£
¢
 
¡$chebiDrug purl:title $drugBankName . KEGG, Chebi, Jamendo, SW Dog Food
(6)
£
¢
 
¡$chebiDrug bio2rdf:image $chebiImage . KEGG, Chebi
}
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Adaptive Query Processing for Federations of Endpoints
SPLENDID
Example: drugs and their components’ url and image
PREFIX rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
PREFIX drugbank: http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/
PREFIX purl:http://purl.org/dc/elements/1.1/
PREFIX bio2rdf:http://bio2rdf.org/ns/bio2rdf#
SELECT $drug $keggUrl $chebiImage WHERE {
(1)
(2)
(3)


¨
©
$drug rdf:type drugbank:drugs .
$drug drugbank:keggCompoundId $keggDrug .
$drug drugbank:genericName $drugBankName .
Drugbank
(4)
£
¢
 
¡$keggDrug bio2rdf:url $keggUrl . KEGG, Chebi
(5)
£
¢
 
¡$chebiDrug purl:title $drugBankName . KEGG, Chebi, Jamendo, SW Dog Food
(6)
£
¢
 
¡$chebiDrug bio2rdf:image $chebiImage . KEGG, Chebi
}
Execution plan
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Adaptive Query Processing for Federations of Endpoints
SPLENDID
Example: drugs that interact with antibiotics, antiviral and antihypertensive agents
PREFIX drugbank: http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/
PREFIX dbcategory: http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugcategory/
PREFIX owl: http://www.w3.org/2002/07/owl#
PREFIX rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
PREFIX dbowl: http://dbpedia.org/ontology/
PREFIX kegg: http://bio2rdf.org/ns/kegg#
SELECT DISTINCT ?drug ?enzyme ?reaction Where {
?drug1 drugbank:drugCategory dbcategory:antibiotics .
?drug2 drugbank:drugCategory dbcategory:antiviralAgents .
?drug3 drugbank:drugCategory dbcategory:antihypertensiveAgents .
?I1 drugbank:interactionDrug2 ?drug1 .
?I1 drugbank:interactionDrug1 ?drug .
?I2 drugbank:interactionDrug2 ?drug2 .
?I2 drugbank:interactionDrug1 ?drug .
?I3 drugbank:interactionDrug2 ?drug3 .
?I3 drugbank:interactionDrug1 ?drug .
?drug owl:sameAs ?drug5 .
?drug drugbank:keggCompoundId ?cpd .
?enzyme kegg:xSubstrate ?cpd .
?enzyme rdf:type kegg:Enzyme .
?reaction kegg:xEnzyme ?enzyme .
?reaction kegg:equation ?equation .
?drug5 rdf:type dbowl:Drug .
}
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Adaptive Query Processing for Federations of Endpoints
SPLENDID
Example: drugs that interact with antibiotics, antiviral and antihypertensive agents
PREFIX drugbank: http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/
PREFIX dbcategory: http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugcategory/
PREFIX owl: http://www.w3.org/2002/07/owl#
PREFIX rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
PREFIX dbowl: http://dbpedia.org/ontology/
PREFIX kegg: http://bio2rdf.org/ns/kegg#
SELECT DISTINCT ?drug ?enzyme ?reaction Where {
(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
9
8
6
7
?drug1 drugbank:drugCategory dbcategory:antibiotics .
?drug2 drugbank:drugCategory dbcategory:antiviralAgents .
?drug3 drugbank:drugCategory dbcategory:antihypertensiveAgents .
?I1 drugbank:interactionDrug2 ?drug1 .
?I1 drugbank:interactionDrug1 ?drug .
?I2 drugbank:interactionDrug2 ?drug2 .
?I2 drugbank:interactionDrug1 ?drug .
?I3 drugbank:interactionDrug2 ?drug3 .
?I3 drugbank:interactionDrug1 ?drug .
?drug owl:sameAs ?drug5 .
?drug drugbank:keggCompoundId ?cpd .
Drugbank
(12)
(13)
(14)
(15)




?enzyme kegg:xSubstrate ?cpd .
?enzyme rdf:type kegg:Enzyme .
?reaction kegg:xEnzyme ?enzyme .
?reaction kegg:equation ?equation .
KEGG
(16)
£
¢
 
¡?drug5 rdf:type dbowl:Drug . DBpedia
}
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Adaptive Query Processing for Federations of Endpoints
SPLENDID
Execution plan
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Adaptive Query Processing for Federations of Endpoints
Adaptivity at Endpoint Selection Level
Fine-Grained Granularity
FedX [24]
no knowledge about mappings or statistics is maintained.
endpoints are consulted on-the-fly to determine if a predicates
can be answered.
cache is used to record endpoint predicates.
queries are divided into exclusive groups, i.e., triple patterns
that can be executed by only one endpoint.
heuristically groups are ordered.
Fine-grained granularity at source selection level while
coarse-grained granularity during plan generation.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Adaptive Query Processing for Federations of Endpoints
FedX
Example: drugs and their components’ url and image
PREFIX rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
PREFIX drugbank: http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/
PREFIX purl:http://purl.org/dc/elements/1.1/
PREFIX bio2rdf:http://bio2rdf.org/ns/bio2rdf#
SELECT $drug $keggUrl $chebiImage WHERE {
$drug rdf:type drugbank:drugs .
$drug drugbank:keggCompoundId $keggDrug .
$drug drugbank:genericName $drugBankName .
$keggDrug bio2rdf:url $keggUrl .
$chebiDrug purl:title $drugBankName .
$chebiDrug bio2rdf:image $chebiImage .
}
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Adaptive Query Processing for Federations of Endpoints
FedX
Example: drugs and their components’ url and image
PREFIX rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
PREFIX drugbank: http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/
PREFIX purl:http://purl.org/dc/elements/1.1/
PREFIX bio2rdf:http://bio2rdf.org/ns/bio2rdf#
SELECT $drug $keggUrl $chebiImage WHERE {
(1)
(2)
(3)


¨
©
$drug rdf:type drugbank:drugs .
$drug drugbank:keggCompoundId $keggDrug .
$drug drugbank:genericName $drugBankName .
Drugbank
(4)
£
¢
 
¡$keggDrug bio2rdf:url $keggUrl . KEGG, Chebi
(5)
£
¢
 
¡$chebiDrug purl:title $drugBankName . KEGG, Chebi, Jamendo, SW Dog Food
(6)
£
¢
 
¡$chebiDrug bio2rdf:image $chebiImage . KEGG, Chebi
}
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Adaptive Query Processing for Federations of Endpoints
FedX
Example: drugs and their components’ url and image
PREFIX rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
PREFIX drugbank: http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/
PREFIX purl:http://purl.org/dc/elements/1.1/
PREFIX bio2rdf:http://bio2rdf.org/ns/bio2rdf#
SELECT $drug $keggUrl $chebiImage WHERE {
(1)
(2)
(3)


¨
©
$drug rdf:type drugbank:drugs .
$drug drugbank:keggCompoundId $keggDrug .
$drug drugbank:genericName $drugBankName .
Drugbank
(4)
£
¢
 
¡$keggDrug bio2rdf:url $keggUrl . KEGG, Chebi
(5)
£
¢
 
¡$chebiDrug purl:title $drugBankName . KEGG, Chebi, Jamendo, SW Dog Food
(6)
£
¢
 
¡$chebiDrug bio2rdf:image $chebiImage . KEGG, Chebi
}
Execution plan
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Adaptive Query Processing for Federations of Endpoints
FedX
Example: drugs that interact with antibiotics, antiviral and antihypertensive agents
PREFIX drugbank: http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/
PREFIX dbcategory: http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugcategory/
PREFIX owl: http://www.w3.org/2002/07/owl#
PREFIX rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
PREFIX dbowl: http://dbpedia.org/ontology/
PREFIX kegg: http://bio2rdf.org/ns/kegg#
SELECT DISTINCT ?drug ?enzyme ?reaction Where {
?drug1 drugbank:drugCategory dbcategory:antibiotics .
?drug2 drugbank:drugCategory dbcategory:antiviralAgents .
?drug3 drugbank:drugCategory dbcategory:antihypertensiveAgents .
?I1 drugbank:interactionDrug2 ?drug1 .
?I1 drugbank:interactionDrug1 ?drug .
?I2 drugbank:interactionDrug2 ?drug2 .
?I2 drugbank:interactionDrug1 ?drug .
?I3 drugbank:interactionDrug2 ?drug3 .
?I3 drugbank:interactionDrug1 ?drug .
?drug drugbank:keggCompoundId ?cpd .
?enzyme kegg:xSubstrate ?cpd .
?enzyme rdf:type kegg:Enzyme .
?reaction kegg:xEnzyme ?enzyme .
?reaction kegg:equation ?equation .
?drug5 rdf:type dbowl:Drug .
?drug owl:sameAs ?drug5 .
}
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Adaptive Query Processing for Federations of Endpoints
FedX
Example: drugs that interact with antibiotics, antiviral and antihypertensive agents
PREFIX drugbank: http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/
PREFIX dbcategory: http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugcategory/
PREFIX owl: http://www.w3.org/2002/07/owl#
PREFIX rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
PREFIX dbowl: http://dbpedia.org/ontology/
PREFIX kegg: http://bio2rdf.org/ns/kegg#
SELECT DISTINCT ?drug ?enzyme ?reaction Where {
(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
9
8
6
7
?drug1 drugbank:drugCategory dbcategory:antibiotics .
?drug2 drugbank:drugCategory dbcategory:antiviralAgents .
?drug3 drugbank:drugCategory dbcategory:antihypertensiveAgents .
?I1 drugbank:interactionDrug2 ?drug1 .
?I1 drugbank:interactionDrug1 ?drug .
?I2 drugbank:interactionDrug2 ?drug2 .
?I2 drugbank:interactionDrug1 ?drug .
?I3 drugbank:interactionDrug2 ?drug3 .
?I3 drugbank:interactionDrug1 ?drug .
?drug drugbank:keggCompoundId ?cpd .
Drugbank
(11)
(12)
(13)
(14)




?enzyme kegg:xSubstrate ?cpd .
?enzyme rdf:type kegg:Enzyme .
?reaction kegg:xEnzyme ?enzyme .
?reaction kegg:equation ?equation .
KEGG
(15)
£
¢
 
¡?drug5 rdf:type dbowl:Drug . DBpedia
(16)
£
¢
 
¡?drug owl:sameAs ?drug5 . DBpedia, KEGG, Drugbank
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Adaptive Query Processing for Federations of Endpoints
Execution plan
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Adaptive Query Processing for Federations of Endpoints
Adaptivity at Query Execution Level
Coarse-Grained Granularity
SPARQL-DQP [6] and ARQ 1 exploit information on SPARQL
1.1 queries to decide where the subqueries of triple patterns
will be executed. Both rely on statistics or heuristics during
query optimization, reaching thus coarse-grained adaptation
at plan level.
FedX [24] no adaptation is performed during query processing.
Finer granularity is achieved whenever the statistics are
re-computed frequently.
1
http://jena.sourceforge.net/ARQ
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Adaptive Query Processing for Federations of Endpoints
Adaptivity at Query Execution Level
Fine-Grained Granularity
AVALANCHE [5]
inter-operator solution to heuristically select on-the-fly the
most promising plans.
statistics about cardinalities and data distribution are
considered to identify possibly good plans.
competition strategy top-k plans are executed in parallel and
the output of a query corresponds to the answers produced by
the most promising plan(s) in a given period of time.
Fine-grained granularity. Decision of completely executing top-k
plan(s) is taken on-the-fly, based on the current conditions of the
environment.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Adaptive Query Processing for Federations of Endpoints
Adaptivity at Query Execution Level
Fine-Grained Granularity
Avalanche – execution model
SELECT ?name ?title WHERE {
?paper dblp:has-title ?title .
?paper dblp:has-author ?author .
?author dblp:has-name ?name .
}
find
endpoints request
statistics
distributed
join
Endpoints Directory
or Search Engine
Avalanche
SPARQL endpoint
... on the WWW
Query
Borrows from:
p2p systems
the Web
No completeness
guarantees, first K results
are obtained iteratively
(plan-level) given
stopping criteria:
timeout
relative saturation
LIMIT modifier
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Adaptive Query Processing for Federations of Endpoints
Adaptivity at Query Execution Level
Fine-Grained Granularity
Avalanche – architecture
3] Query Planning and Execution phaseAVALANCHE endpoints Search Engine
i.e., http://void.rkbexplorer.com/
1] Source Discovery phase
Plans
Queue
Plan
Generator
Finished
Plans
Queue
Results
Queue
Query
Stopper
Executor
MaterializerExecutor
Executor
Executor
Materializer
Materializer
Materializer
Results
Statistics
Requester
Query
Query
Parser
2] Statistics Gathering phase
Source
Selector
optimization is concomitant with query execution
scheduling is dynamic between plans (highlighted) – results come in out-of-order
plan space: |Hosts| ∗ |SubQueries| ∗ PlanDepth
multi-path depth-first heuristic search based on parametric cost model
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Adaptive Query Processing for Federations of Endpoints
Adaptivity at Query Execution Level
Fine-Grained Granularity
Avalanche – plan execution
1) Join (s1, s2)
3) Send(R1)
10) Update (s2, s1)
12) Send(R2)host A
2) R1 = Execute (s1)
13) R1 = Filter (R1, R2)
?sameFilm
?sameFilm dbpedia:hasPhotoCollection
?photoCollection.
?sameFilm dbpedia:studio
ʻʻProducers Circleʼʼ.
SubQuery 1 - s1
?film dc:title ?title.
?film movie:actor ?actor.
?film owl:sameAs ?sameFilm.
SubQuery 2 - s2
?actor a foaf:Person.
?actor movie:actor_name ?name.
SubQuery 3 - s3
5) Join (s2, s3)
6) Send(FR2)
host C
7) FR3 = ExecuteFilter (R3)
9) Send(FR3)
8) Update (s3, s2)
host B
4) FR2 = ExecuteFilter (R1)
11) R2 = Filter (FR2)
?actor
DBPEDIA endpointsLinked Movie Database endpoints
left linear execution strategy
not adaptive
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Adaptive Query Processing for Federations of Endpoints
Adaptivity at Query Execution Level
Fine-Grained Granularity
Avalanche – plan execution
1) Join (s1, s2)
3) Send(R1)
10) Update (s2, s1)
12) Send(R2)host A
2) R1 = Execute (s1)
13) R1 = Filter (R1, R2)
?sameFilm
?sameFilm dbpedia:hasPhotoCollection
?photoCollection.
?sameFilm dbpedia:studio
ʻʻProducers Circleʼʼ.
SubQuery 1 - s1
?film dc:title ?title.
?film movie:actor ?actor.
?film owl:sameAs ?sameFilm.
SubQuery 2 - s2
?actor a foaf:Person.
?actor movie:actor_name ?name.
SubQuery 3 - s3
5) Join (s2, s3)
6) Send(FR2)
host C
7) FR3 = ExecuteFilter (R3)
9) Send(FR3)
8) Update (s3, s2)
host B
4) FR2 = ExecuteFilter (R1)
11) R2 = Filter (FR2)
?actor
DBPEDIA endpointsLinked Movie Database endpoints
left linear execution strategy
not adaptive
could benefit from adaptive execution . . .
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Adaptive Query Processing for Federations of Endpoints
ANAPSID: An Adaptive Query Processing Engine for
Federation of Endpoints [1]
ANAPSID – Architecture
Mediator
Query Planner
SPARQL 1.1 Query
Query
Optimizer
Query
Decomposer
Adaptive
Query
Engine
Query
Schema Alignments
Endpoint Endpoint
Adaptive Plan
SPARQL 1.0 Query
ANAPSID
Decomposer: rewrites the query into sub-queries.
Optimizer: generates an optimal bushy tree execution plan.
Query Engine: implements inter- and intra-operators
techniques to perform fine-grained adaptivity.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Adaptive Query Processing for Federations of Endpoints
ANAPSID: An Adaptive Query Processing Engine for
Federation of Endpoints [1]
Adaptivity is performed at both source
selection and query execution levels.
Endpoint descriptions used to select
endpoints that can answer a triple pattern;
triple patterns that can be executed by the
same endpoint are grouped together;
endpoints may be contacted on-the-fly to
decide if it is able to answer a triple pattern.
Bushy-tree plans are generated to reduce the
size of intermediate results and the number
of Cartesian products.
Non-blocking operators efficiently retrieve
data from endpoints and keep producing
new answers even when endpoints become
blocked; agJoin, agUnion and agOptional
provide adaptive implementations of
SPARQL logical operators.
Mediator
Query Planner
SPARQL 1.1 Query
Query
Optimizer
Query
Decomposer
Adaptive
Query
Engine
Query
Schema Alignments
Endpoint Endpoint
Adaptive Plan
SPARQL 1.0 Query
ANAPSID
Fine-grained granularity during query
execution and medium fine-grained
during endpoint selection.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Adaptive Query Processing for Federations of Endpoints
ANAPSID Adaptive Join Operator
Operators implement main
memory replacement policies to
flush previously computed matches
to secondary memory, ensuring no
duplicate generation.
Each operator maintains a data
structure called Resource Join
Tuple (RJT), that records for each
instantiation of the variable(s), the
tuples that have already matched.
(R, {TR1, TR2, ..., TRn}).
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Adaptive Query Processing for Federations of Endpoints
ANAPSID Adaptive Join Operator
Adaptive Join Operator works in three
stages:
First stage is performed while at
least one endpoint sends data. If
main memory becomes full,
Resource Join Tuples are flushed
to secondary memory.
Second stage is performed
whenever both are blocked.
Resource Join Tuples is main
memory are joined with Resource
Join Tuples in secondary memory
Third stage is fired when both
endpoints finish sending data.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Hands-On Section
Fedbench Datasets and Queries
Dataset Endpoint name #triples
NY Times News 314k
LinkedMDB Movies 6.14M
Jamendo Music 1.04M
Geonames
Geography 7.98M
SW Dog Food SW 84k
KEGG Chemicals 10.9M
Drugbank Drugs 517k
ChEBI Compounds 4.77M
SP2B-10M Bibliographic 10M
DBPedia subset
Infobox Types 5.49M
Infobox Properties 10.80M
Titles 7.33M
Articles Categories 10.91M
Images 3.88M
SKOS Categories 2.24M
Other 2.45M
Queries
Cross Domain CD1-CD7.
Linked Data LD1-LD11.
Life Science Linked Data
LSD1-LD7.
Complex Queries C1-C5.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Hands-On Section
Delays Configuration
Message Size: 16KB
No Delay: perfect network.
Gamma 1 : fast network with a gamma distribution (α = 1, β = 0.3) of
response latency resulting in an average latency of 0.3 seconds.
Gamma 2: medium fast network with a gamma distribution (α = 3, β = 1.0)
of response latency resulting in an average latency of 3 seconds.
Gamma 3: slow network with a gamma distribution (α = 3, β = 1.5) of
response latency resulting in an average latency of 4.5 seconds per message.
Probability Density Function for Gamma
0
0.5
1
0 2 4 6 8 10 12 14 16 18 20
Density
Gamma 1 (α = 1, β = 0.3)
Gamma 2 (α = 3, β = 1.0)
Gamma 3 (α = 3, β = 1.5)
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Hands-On Section
Evaluation Parameters
Exec Result
Time Completeness
Query
# patterns
instantiations
result size
Data
data distribution
data correlations
dataset size
Network
#endpoints
latency
data transfer distribution
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Hands-On Section
Evaluation
Objective: Study the Adaptive Capability of the Different
Engines to Network Variation.
Measurements:
Time first tuple: elapsed time between the
execution of the plan and the output of the first
answer.
Time all tuples: elapsed time between the
execution of the plan and the output of the
whole answer.
Number of Results: number of tuples
produced for each query.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
Summary and Future Directions
Summary and Future
Directions
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
References
[1] Maribel Acosta, Maria-Esther Vidal, Tomas Lampo, Julio
Castillo, and Edna Ruckhaus. ANAPSID: AN Adaptive query
ProcesSing engIne for sparql enDpoints. In Proceedings of the
International Semantic Web Conference (ISWC), 2011.
[2] Gennady Antoshenkov and Mohamed Ziauddin. Query
processing and optimization in oracle rdb. VLDB J.,
5(4):229–237, 1996.
[3] Ron Avnur and Joseph M. Hellerstein. Eddies: Continuously
adaptive query processing. In SIGMOD Conference, pages
261–272, 2000.
[4] Shivnath Babu and Pedro Bizarro. Adaptive query processing
in the looking glass. In CIDR, pages 238–249, 2005.
[5] Cosmin Basca and Abraham Bernstein. Avalanche: Putting
the Spirit of the Web back into Semantic Web Querying. In
SSWS2010 Workshop, Shanghai, China, 2010.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
References
[6] Carlos Buil-Aranda, Marcelo Arenas, and Oscar Corcho.
Semantics and optimization of the sparql 1.1 federation
extension. In ESWC (2), pages 1–15, 2011.
[7] Richard L. Cole and Goetz Graefe. Optimization of dynamic
query evaluation plans. In SIGMOD Conference, pages
150–160, 1994.
[8] Amol Deshpande and Joseph M. Hellerstein. Lifting the
burden of history from adaptive query processing. In VLDB,
pages 948–959, 2004.
[9] Amol Deshpande, Zachary G. Ives, and Vijayshankar Raman.
Adaptive query processing. Foundations and Trends in
Databases, 1(1):1–140, 2007.
[10] Olaf G¨orlitz and Steffen Staab. SPLENDID: SPARQL
Endpoint Federation Exploiting VOID Descriptions. In
Proceedings of the 2nd International Workshop on Consuming
Linked Data, Bonn, Germany, 2011.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
References
[11] Goetz Graefe. Query evaluation techniques for large
databases. ACM Comput. Surv., 25(2):73–170, 1993.
[12] Andreas Harth, Katja Hose, Marcel Karnstedt, Axel Polleres,
Kai-Uwe Sattler, and J¨urgen Umbrich. Data summaries for
on-demand queries over linked data. In WWW, 2010.
[13] Olaf Hartig. How caching improves efficiency and result
completeness for querying linked data. In the 4th Linked Data
on the Web (LDOW) Workshop at the World Wide Web
Conference (WWW), 2011.
[14] Olaf Hartig. Zero-knowledge query planning for an iterator
implementation of link traversal based query execution. In
Extended Semantic Web Conference, 2011.
[15] Olaf Hartig, Christian Bizer, and Johann Christoph Freytag.
Executing sparql queries over the web of linked data. In
International Semantic Web Conference, pages 293–309,
2009.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
References
[16] Yannis E. Ioannidis. Query optimization. ACM Comput.
Surv., 28(1):121–123, 1996.
[17] Navin Kabra and David J. DeWitt. Efficient mid-query
re-optimization of sub-optimal query execution plans. In
SIGMOD Conference, pages 106–117, 1998.
[18] Kamlesh Laddhad and S. Sudarshan. Adaptive query
processing. Technical Report 05329014, Kanwal Rekhi School
of Information Technology, Indian Institute of Technology,
Bombay, Mumbai, 2006.
[19] Gunter Ladwig and Thanh Tran. Linked data query processing
strategies. In International Semantic Web Conference
(ISWC), 2010.
[20] Yingjie Li and Jeff Heflin. Using reformulation trees to
optimize queries over distributed heterogeneous sources. In
ISWC, pages 502–517, 2010.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
References
[21] Volker Markl, Vijayshankar Raman, David E. Simmen, Guy M.
Lohman, and Hamid Pirahesh. Robust query processing
through progressive optimization. In SIGMOD Conference,
pages 659–670, 2004.
[22] Priti Mishra and Margaret H. Eich. Join processing in
relational databases. ACM Comput. Surv., 24(1):63–113,
1992.
[23] Bastian Quilitz and Ulf Leser. Querying distributed rdf data
sources with sparql. In ESWC, pages 524–538, 2008.
[24] Andreas Schwarte, Peter Haase, Katja Hose, Ralf Schenkel,
and Michael Schmidt. Fedx: Optimization techniques for
federated query processing on linked data. In International
Semantic Web Conference (1), pages 601–616, 2011.
[25] Feng Tian and David J. DeWitt. Tuple routing strategies for
distributed eddies. In VLDB, pages 333–344, 2003.
Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012
References
[26] Tolga Urhan and Michael J. Franklin. Xjoin: A
reactively-scheduled pipelined join operator. IEEE Data Eng.
Bull., 23(2):27–33, 2000.
[27] Tolga Urhan and Michael J. Franklin. Dynamic pipeline
scheduling for improving interactive query performance. In
VLDB, pages 501–510, 2001.
[28] Tolga Urhan, Michael J. Franklin, and Laurent Amsaleg. Cost
based query scrambling for initial delays. In SIGMOD
Conference, pages 130–141, 1998.
[29] Stratis Viglas, Jeffrey F. Naughton, and Josef Burger.
Maximizing the output rate of multi-way join queries over
streaming information sources. In VLDB, pages 285–296,
2003.
[30] Eugene Wong and Karel Youssefi. Decomposition-a strategy
for query processing. ACM Trans. on Database Systems, 1(3),
1976.

More Related Content

Similar to Adaptive Semantic Data Management Techniques for Federations of Endpoints

Innovate2010 jazz keynote
Innovate2010 jazz keynoteInnovate2010 jazz keynote
Innovate2010 jazz keynote
oslc
 
Sem tech 2011 v8
Sem tech 2011 v8Sem tech 2011 v8
Sem tech 2011 v8
dallemang
 
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXT
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXTDriving Enterprise Adoption: Tragedies, Triumphs and Our NEXT
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXT
DataWorks Summit
 
LIFT: A Legacy InFormation retrieval Tool
LIFT: A Legacy InFormation retrieval ToolLIFT: A Legacy InFormation retrieval Tool
LIFT: A Legacy InFormation retrieval Tool
Kellyton Brito
 
The Essentials Of Project Management
The Essentials Of Project ManagementThe Essentials Of Project Management
The Essentials Of Project Management
Laura Arrigo
 
“Semantic Technologies for Smart Services”
“Semantic Technologies for Smart Services” “Semantic Technologies for Smart Services”
“Semantic Technologies for Smart Services”
diannepatricia
 
PROJECT FOR CSE BY TUSHAR DHOOT
PROJECT FOR CSE BY TUSHAR DHOOTPROJECT FOR CSE BY TUSHAR DHOOT
PROJECT FOR CSE BY TUSHAR DHOOT
Tushar Dhoot
 
Ogce Workflow Suite
Ogce Workflow SuiteOgce Workflow Suite
Ogce Workflow Suitesmarru
 
So Long Computer Overlords
So Long Computer OverlordsSo Long Computer Overlords
So Long Computer Overlords
Ian Foster
 
IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...
IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...
IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...
IEEEFINALYEARSTUDENTPROJECTS
 
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
IEEEMEMTECHSTUDENTSPROJECTS
 
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
IEEEFINALYEARSTUDENTPROJECT
 
Linked Data and Semantic Web Application Development by Peter Haase
Linked Data and Semantic Web Application Development by Peter HaaseLinked Data and Semantic Web Application Development by Peter Haase
Linked Data and Semantic Web Application Development by Peter Haase
Laboratory of Information Science and Semantic Technologies
 
A Workflow-Driven Discovery and Training Ecosystem for Distributed Analysis o...
A Workflow-Driven Discovery and Training Ecosystem for Distributed Analysis o...A Workflow-Driven Discovery and Training Ecosystem for Distributed Analysis o...
A Workflow-Driven Discovery and Training Ecosystem for Distributed Analysis o...
Ilkay Altintas, Ph.D.
 
Wrangling RedCap_An Introduction and Inspiration
Wrangling RedCap_An Introduction and InspirationWrangling RedCap_An Introduction and Inspiration
Wrangling RedCap_An Introduction and InspirationJacqueline Stern
 
BrodtKerry_122016
BrodtKerry_122016BrodtKerry_122016
BrodtKerry_122016Kerry Brodt
 
Orbital presentation pt1_200112_v1
Orbital presentation pt1_200112_v1Orbital presentation pt1_200112_v1
Orbital presentation pt1_200112_v1
ensmjd
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
Marcel Kurovski
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
inovex GmbH
 

Similar to Adaptive Semantic Data Management Techniques for Federations of Endpoints (20)

Innovate2010 jazz keynote
Innovate2010 jazz keynoteInnovate2010 jazz keynote
Innovate2010 jazz keynote
 
Sem tech 2011 v8
Sem tech 2011 v8Sem tech 2011 v8
Sem tech 2011 v8
 
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXT
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXTDriving Enterprise Adoption: Tragedies, Triumphs and Our NEXT
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXT
 
LIFT: A Legacy InFormation retrieval Tool
LIFT: A Legacy InFormation retrieval ToolLIFT: A Legacy InFormation retrieval Tool
LIFT: A Legacy InFormation retrieval Tool
 
The Essentials Of Project Management
The Essentials Of Project ManagementThe Essentials Of Project Management
The Essentials Of Project Management
 
“Semantic Technologies for Smart Services”
“Semantic Technologies for Smart Services” “Semantic Technologies for Smart Services”
“Semantic Technologies for Smart Services”
 
PROJECT FOR CSE BY TUSHAR DHOOT
PROJECT FOR CSE BY TUSHAR DHOOTPROJECT FOR CSE BY TUSHAR DHOOT
PROJECT FOR CSE BY TUSHAR DHOOT
 
Ogce Workflow Suite
Ogce Workflow SuiteOgce Workflow Suite
Ogce Workflow Suite
 
So Long Computer Overlords
So Long Computer OverlordsSo Long Computer Overlords
So Long Computer Overlords
 
IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...
IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...
IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...
 
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
 
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
 
Linked Data and Semantic Web Application Development by Peter Haase
Linked Data and Semantic Web Application Development by Peter HaaseLinked Data and Semantic Web Application Development by Peter Haase
Linked Data and Semantic Web Application Development by Peter Haase
 
A Workflow-Driven Discovery and Training Ecosystem for Distributed Analysis o...
A Workflow-Driven Discovery and Training Ecosystem for Distributed Analysis o...A Workflow-Driven Discovery and Training Ecosystem for Distributed Analysis o...
A Workflow-Driven Discovery and Training Ecosystem for Distributed Analysis o...
 
Wrangling RedCap_An Introduction and Inspiration
Wrangling RedCap_An Introduction and InspirationWrangling RedCap_An Introduction and Inspiration
Wrangling RedCap_An Introduction and Inspiration
 
BrodtKerry_122016
BrodtKerry_122016BrodtKerry_122016
BrodtKerry_122016
 
Santosh_Nayak_CV
Santosh_Nayak_CVSantosh_Nayak_CV
Santosh_Nayak_CV
 
Orbital presentation pt1_200112_v1
Orbital presentation pt1_200112_v1Orbital presentation pt1_200112_v1
Orbital presentation pt1_200112_v1
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 

More from Maribel Acosta Deibe

A Closer Look at the Changing Dynamics of DBpedia Mappings
A Closer Look at the Changing Dynamics of DBpedia MappingsA Closer Look at the Changing Dynamics of DBpedia Mappings
A Closer Look at the Changing Dynamics of DBpedia Mappings
Maribel Acosta Deibe
 
Crowdsourcing the Quality of Knowledge Graphs: A DBpedia Study
Crowdsourcing the Quality of Knowledge Graphs:A DBpedia StudyCrowdsourcing the Quality of Knowledge Graphs:A DBpedia Study
Crowdsourcing the Quality of Knowledge Graphs: A DBpedia Study
Maribel Acosta Deibe
 
HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowd...
HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowd...HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowd...
HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowd...
Maribel Acosta Deibe
 
Diefficiency Metrics: Measuring the Continuous Efficiency of Query Processing...
Diefficiency Metrics: Measuring the Continuous Efficiency of Query Processing...Diefficiency Metrics: Measuring the Continuous Efficiency of Query Processing...
Diefficiency Metrics: Measuring the Continuous Efficiency of Query Processing...
Maribel Acosta Deibe
 
HARE: A Hybrid SPARQL Engine to Enhance Query Answers via Crowdsourcing
HARE: A Hybrid SPARQL Engine to Enhance Query Answers via CrowdsourcingHARE: A Hybrid SPARQL Engine to Enhance Query Answers via Crowdsourcing
HARE: A Hybrid SPARQL Engine to Enhance Query Answers via Crowdsourcing
Maribel Acosta Deibe
 
Semantic Data Management in Graph Databases: ESWC 2014 Tutorial
Semantic Data Management in Graph Databases: ESWC 2014 TutorialSemantic Data Management in Graph Databases: ESWC 2014 Tutorial
Semantic Data Management in Graph Databases: ESWC 2014 Tutorial
Maribel Acosta Deibe
 
Crowdsourcing Linked Data Quality Assessment
Crowdsourcing Linked Data Quality AssessmentCrowdsourcing Linked Data Quality Assessment
Crowdsourcing Linked Data Quality AssessmentMaribel Acosta Deibe
 
Semantic Data Management in Graph Databases
Semantic Data Management in Graph DatabasesSemantic Data Management in Graph Databases
Semantic Data Management in Graph Databases
Maribel Acosta Deibe
 

More from Maribel Acosta Deibe (8)

A Closer Look at the Changing Dynamics of DBpedia Mappings
A Closer Look at the Changing Dynamics of DBpedia MappingsA Closer Look at the Changing Dynamics of DBpedia Mappings
A Closer Look at the Changing Dynamics of DBpedia Mappings
 
Crowdsourcing the Quality of Knowledge Graphs: A DBpedia Study
Crowdsourcing the Quality of Knowledge Graphs:A DBpedia StudyCrowdsourcing the Quality of Knowledge Graphs:A DBpedia Study
Crowdsourcing the Quality of Knowledge Graphs: A DBpedia Study
 
HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowd...
HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowd...HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowd...
HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowd...
 
Diefficiency Metrics: Measuring the Continuous Efficiency of Query Processing...
Diefficiency Metrics: Measuring the Continuous Efficiency of Query Processing...Diefficiency Metrics: Measuring the Continuous Efficiency of Query Processing...
Diefficiency Metrics: Measuring the Continuous Efficiency of Query Processing...
 
HARE: A Hybrid SPARQL Engine to Enhance Query Answers via Crowdsourcing
HARE: A Hybrid SPARQL Engine to Enhance Query Answers via CrowdsourcingHARE: A Hybrid SPARQL Engine to Enhance Query Answers via Crowdsourcing
HARE: A Hybrid SPARQL Engine to Enhance Query Answers via Crowdsourcing
 
Semantic Data Management in Graph Databases: ESWC 2014 Tutorial
Semantic Data Management in Graph Databases: ESWC 2014 TutorialSemantic Data Management in Graph Databases: ESWC 2014 Tutorial
Semantic Data Management in Graph Databases: ESWC 2014 Tutorial
 
Crowdsourcing Linked Data Quality Assessment
Crowdsourcing Linked Data Quality AssessmentCrowdsourcing Linked Data Quality Assessment
Crowdsourcing Linked Data Quality Assessment
 
Semantic Data Management in Graph Databases
Semantic Data Management in Graph DatabasesSemantic Data Management in Graph Databases
Semantic Data Management in Graph Databases
 

Recently uploaded

一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Subhajit Sahu
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Enterprise Wired
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
AnirbanRoy608946
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 

Recently uploaded (20)

一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 

Adaptive Semantic Data Management Techniques for Federations of Endpoints

  • 1. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Maribel Acosta1,2, Cosmin Basca3, Gabriela Montoya1 Edna Ruckhaus1,Maria-Esther Vidal1 1 Universidad Sim´on Bol´ıvar, Venezuela {macosta,gmontoya,ruckhaus,mvidal}@ldc.usb.ve 2 Institute AIFB, Karlsruhe Institute of Technology, Germany Maribel.Acosta@aifb.uni-karlsruhe.de 3 Department of Informatics, University of Zurich, Switzerland basca@ifi.uzh.ch http://www.ldc.usb.ve/˜mvidal/TUTORIAL2012 May 28th, 2012
  • 2. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction Introduction
  • 3. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction Motivation Cloud of Linked Data: a large number of datasets. Data locally or remotely accessed by RDF engines and SPARQL endpoints. RDF engines rely performance on physical access and storage structures that are locally stored, but: loading the data and their links is not always feasible. The optimize-then-execute paradigm is usually followed, but: lack of statistics. unpredictable conditions of remote queries. changing characteristics.
  • 4. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction Motivation Cloud of Linked Data: a large number of datasets. Data locally or remotely accessed by RDF engines and SPARQL endpoints. RDF engines rely performance on physical access and storage structures that are locally stored, but: loading the data and their links is not always feasible. The optimize-then-execute paradigm is usually followed, but: lack of statistics. unpredictable conditions of remote queries. changing characteristics. Techniques are not adaptable to unpredictable data transfers or data availability.
  • 5. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction Motivating Example Clinical trials and drugs used for Breast, Colorectal, Ovarian, and Lung Cancer. PREFIX linkedct: <http://data.linkedct.org/resource/linkedct/> PREFIX foaf:<http://xmlns.com/foaf/0.1/> PREFIX rdf:<http://www.w3.org/2000/01/rdf-schema#> PREFIX dbpedia:<http://dbpedia.org/property/> PREFIX elements:<http://purl.org/dc/elements/1.1/> SELECT DISTINCT ?fn1 ?fn2 ?fn3 ?fn4 ?I ?DB ?ID1 ?ID2 ?L WHERE { ?A1 foaf:page ?fn1.?A1 linkedct:condition ?A2. ?A2 linkedct:condition name ”Colorectal Cancer”. ?A1 linkedct:intervention ?I. ?A3 foaf:page ?fn3.?A3 linkedct:condition ?A4. ?A4 linkedct:condition name ”Breast Cancer”. ?A3 linkedct:intervention ?I. ?A5 foaf:page ?fn5.?A5 linkedct:condition ?A6. ?A6 linkedct:condition name ”Ovarian Cancer”. ?A5 linkedct:intervention?I. ?A7 foaf:page?fn7.?A7 linkedct:condition ?A8. ?A8 linkedct:condition name ”Lung Cancer”. ?A7 linkedct:intervention ?I ?I linkedct:intervention type ”Drug”. ?I rdf:seeAlso ?DB. ?WK foaf:primaryTopic ?DB. ?WK dbpedia:revisionId ?ID1. ?WK dbpedia:pageId ?ID2.?WK elements:language ?L.}
  • 6. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction Motivating Example Clinical trials and drugs used for Breast, Colorectal, Ovarian, and Lung Cancer. PREFIX linkedct: <http://data.linkedct.org/resource/linkedct/> PREFIX foaf:<http://xmlns.com/foaf/0.1/> PREFIX rdf:<http://www.w3.org/2000/01/rdf-schema#> PREFIX dbpedia:<http://dbpedia.org/property/> PREFIX elements:<http://purl.org/dc/elements/1.1/> SELECT DISTINCT ?fn1 ?fn2 ?fn3 ?fn4 ?I ?DB ?ID1 ?ID2 ?L WHERE { ?A1 foaf:page ?fn1.?A1 linkedct:condition ?A2. ?A2 linkedct:condition name ”Colorectal Cancer”. ?A1 linkedct:intervention ?I. ?A3 foaf:page ?fn3.?A3 linkedct:condition ?A4. ?A4 linkedct:condition name ”Breast Cancer”. ?A3 linkedct:intervention ?I. ?A5 foaf:page ?fn5.?A5 linkedct:condition ?A6. ?A6 linkedct:condition name ”Ovarian Cancer”. ?A5 linkedct:intervention?I. ?A7 foaf:page?fn7.?A7 linkedct:condition ?A8. ?A8 linkedct:condition name ”Lung Cancer”. ?A7 linkedct:intervention ?I ?I linkedct:intervention type ”Drug”. ?I rdf:seeAlso ?DB. ?WK foaf:primaryTopic ?DB. ?WK dbpedia:revisionId ?ID1. ?WK dbpedia:pageId ?ID2.?WK elements:language ?L.} select ?fn1 ?I ?DB ?A1 foaf:page ?fn1. ?A1 linkedct:condition ?A2. ?A2 linkedct:condition_name "Breastl Cancer". ?A1 linkedct:intervention ?I. ?I rdf:seeAlso ?DB select ?fn1 ?I ?DB ?A1 foaf:page ?fn1. ?A1 linkedct:condition ?A2. ?A2 linkedct:condition_name "Ovarian Cancer". ?A1 linkedct:intervention ?I. ?I rdf:seeAlso ?DB select ?fn1 ?I ?DB ?A1 foaf:page ?fn1. ?A1 linkedct:condition ?A2. ?A2 linkedct:condition_name "Lung Cancer". ?A1 linkedct:intervention ?I. ?I rdf:seeAlso ?DB select ?fn1 ?I ?DB ?A1 foaf:page ?fn1. ?A1 linkedct:condition ?A2. ?A2 linkedct:condition_name "Colorectal Cancer". ?A1 linkedct:intervention ?I. ?I rdf:seeAlso ?DB JOIN JOIN JOIN select ?ID1 ?ID2 ?L ?WK foaf:primaryTopic t(?DB). ?WK dbpedia:revisionId ?ID1. ?WK dbpedia:pageId ?ID2. ?WK elements:language ?L. JOIN t(?DB} A star-shaped plan; four star-shaped sub-queries against LinkedCT endpoints and one against DBPedia endpoint. Endpoints may time out before producing any answer. Query too complex, and needs to be decomposed into simple sub-queries. Adaptive query processing techniques need to be implemented to incrementally produce answers.
  • 7. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction Agenda 1 Basic Concepts and Background (20 minutes) 2 Adaptive Query Processing Techniques (30 minutes) 3 RDF Engines for Federations of Endpoints (30 minutes) 4 Questions and Discussion (5 minutes) 5 Coffee Break (30 minutes) 6 Hands-on (90 minutes) Explanation of existing Benchmarks (10 minutes) Evaluation (60 minutes) Discussion of the Observed Results (15 minutes) 7 Summary and Closing (5 minutes)
  • 8. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction Background
  • 9. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction Introduction-Traditional Data Management Systems[11, 16] A Database Management System (DBMS) is a software designed to: consistently store and organize data, Provide physical data structures to store large databases. Stage large datasets between main memory and secondary storage. Protect data from inconsistencies. and to provide a secure and concurrent access to the data. Provide a query language. Crash recovery. Security and access control.
  • 10. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction The Optimize-Then-Execute Paradigm Traditional Architectures Query Processing is divided into three mayor steps: Statistics generation. Query optimization. Query Execution. Database Catalog Statistics Generator Query Optimizer Query Execution Engine Query RunStatistics Collected Statistics
  • 11. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction The Optimize-then-Execute Paradigm Traditional Query Processing techniques: Parse a declarative query. Generate an intermediate representation of the query. Produce an efficient logical and physical plan; minimize disk I/O access. Execute the query plan without making runtime decisions.
  • 12. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction Query Optimizer Architecture[11, 16, 22] Planner Rewriter Algebraic Space Method- Structure Space Cost Model Size- Distribution Estimator Dictionary/Catalog
  • 13. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction Query Optimizer Architecture[11, 16, 22] Planner Rewriter Algebraic Space Method- Structure Space Cost Model Size- Distribution Estimator Dictionary/Catalog Rewriter Applies algebraic transformations to the query to produce a more efficient equivalent query; some transformations are: flatten out queries, using views, etc. Nested queries are decomposed into query blocks.
  • 14. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction Query Optimizer Architecture[11, 16, 22] Planner Rewriter Algebraic Space Method- Structure Space Cost Model Size- Distribution Estimator Dictionary/Catalog Planner Traverses the space of possible execution plans following a particular search strategy, e.g., dynamic programming, randomized algorithms. Query blocks are optimized one at a time. All available access methods are considered for each relation. All the ways to join the relations one-at-a-time; permutations of relations and different join methods are considered.
  • 15. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction Query Optimizer Architecture[11, 16, 22] Planner Rewriter Algebraic Space Method- Structure Space Cost Model Size- Distribution Estimator Dictionary/Catalog Algebraic Space Set of algebraic rules that restrict the space of plans transformations, and guide the planner into the space of efficient logical plans.
  • 16. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction Query Optimizer Architecture[11, 16, 22] Planner Rewriter Algebraic Space Method- Structure Space Cost Model Size- Distribution Estimator Dictionary/Catalog Method-Structure Space Set of rules that restrict the space of physical implementations of each logical plan, and guide the planner into the space of efficient physical plans.
  • 17. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction Query Optimizer Architecture[11, 16, 22] Planner Rewriter Algebraic Space Method- Structure Space Cost Model Size- Distribution Estimator Dictionary/Catalog Cost Model Set of arithmetic rules or statistical techniques, to estimate the execution cost and cardinality of logical and physical plans.
  • 18. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction Query Optimizer Architecture[11, 16, 22] Planner Rewriter Algebraic Space Method- Structure Space Cost Model Size- Distribution Estimator Dictionary/Catalog Size-Distribution Estimator Set of arithmetic rules or statistical techniques to estimate the cardinalities of dataset to be accessed as well as its distributions, and the selectivity of a query select condition.
  • 19. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction Query Optimizer Architecture[11, 16, 22] Planner Rewriter Algebraic Space Method- Structure Space Cost Model Size- Distribution Estimator Dictionary/Catalog Dictionary Set of statistics that characterize the dataset, e.g., number of different values of an attribute, or different subjects, properties, or values; distributions, etc. Stored statistics depend on the type of size-distribution estimator.
  • 20. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction Join Ordering: Search Space EMP DEPT ACCOUNT BANK bno=bno ano=ano dno=dno Left-linear plan
  • 21. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction Join Ordering: Search Space EMP DEPT ACCOUNT BANK bno=bno ano=ano dno=dno Left-linear plan DEPT EMP BANK ano=ano dno=dno bno=bno ACCOUNT Right-linear plan
  • 22. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction Join Ordering: Search Space EMP DEPT ACCOUNT BANK bno=bno ano=ano dno=dno Left-linear plan DEPT EMP BANK ano=ano dno=dno bno=bno ACCOUNT Right-linear plan DEPTEMP BANK ano=ano dno=dno bno=bno ACCOUNT Bushy Plan
  • 23. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction Index Nested Loop Join Indices exist on the join attributes of the inner relation of the join. Cost depends on the type of index and on the clustering: To find matching tuples: clustered index: 1 I/O, unclustered: up to 1 I/O per matching tuple. SSN NAME 230 120 100 John Smith Mary Williams Jose Gomez 99 Peter Wigle SSN COURSE GRADE 120 CS6019 B 100 CS6019 A 230 CS6019 B 100 CS6020 A 120 CS6020 B 230 CS6020 B 100 CS6021 A 120 CS6021 B Index on SSN Index Nested Loop Join by SSN
  • 24. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction Sort Merge Join Sorts input tables on the join attributes; Scans sorted tables; Merges on join attributes. Cost is optimal in case input tables are sorted. SSN NAME 230 120 100 John Smith Mary Williams Jose Gomez SSN COURSE GRADE 120 CS6019 B 100 CS6019 A 230 CS6019 B 100 CS6020 A 120 CS6020 B 230 CS6020 B 100 CS6021 A 120 CS6021 B 99 Peter Wigle Sort Merge by SSN
  • 25. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction Sort Merge Join Sorts input tables on the join attributes; Scans sorted tables; Merges on join attributes. Cost is optimal in case input tables are sorted. SSN NAME 230 120 100 John Smith Mary Williams Jose Gomez SSN COURSE GRADE 120 CS6019 B 100 CS6019 A 230 CS6019 B 100 CS6020 A 120 CS6020 B 230 CS6020 B 100 CS6021 A 120 CS6021 B 99 Peter Wigle Sort Merge by SSN SSN NAME 230 120 100 John Smith Mary Williams Jose Gomez SSN COURSE GRADE 120 CS6019 B 100 CS6019 A 230 CS6019 B 100 CS6020 A 120 CS6020 B 230 CS6020 B 100 CS6021 A 120 CS6021 B 99 Peter Wigle Sorting by SSN
  • 26. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction Sort Merge Join Sorts input tables on the join attributes; Scans sorted tables; Merges on join attributes. Cost is optimal in case input tables are sorted. SSN NAME 230 120 100 John Smith Mary Williams Jose Gomez SSN COURSE GRADE 120 CS6019 B 100 CS6019 A 230 CS6019 B 100 CS6020 A 120 CS6020 B 230 CS6020 B 100 CS6021 A 120 CS6021 B 99 Peter Wigle Sort Merge by SSN SSN NAME 230 120 100 John Smith Mary Williams Jose Gomez SSN COURSE GRADE 120 CS6019 B 100 CS6019 A 230 CS6019 B 100 CS6020 A 120 CS6020 B 230 CS6020 B 100 CS6021 A 120 CS6021 B 99 Peter Wigle Sorting by SSN SSN COURSE GRADE 120 CS6019 B 100 CS6019 A 230 CS6019 B 100 CS6020 A 120 CS6020 B 230 CS6020 B 100 CS6021 A 120 CS6021 B NAME Jose Gomez Jose Gomez Jose Gomez Mary Williams Mary Williams Mary Williams John Smith John Smith Merging by SSN
  • 27. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction Hash Join Two fold physical operator: Partition: input tables are partitioned using a hash function h1; only tuples in the same partitions of two tables are joined. Probing: each partition of the outer table is loaded in a hash table using a hash function h2; the corresponding partition of the inner table is scanned to search for matches. Tables Partitions A Join B Partition Phase Probing Phase Hash Tables Joined tuples
  • 28. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction Example Hash Join Original Relations Disk INPUT OUTPUT 1 2 B-1 hash function h1 1 2 B-1 Disk Partitions ... ... Partition Phase
  • 29. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction Example Hash Join Original Relations Disk INPUT OUTPUT 1 2 B-1 hash function h1 1 2 B-1 Disk Partitions ... ... Partition Phase Partitions Input Tables Disk Output Buffer hash function h2 Disk Join Results ... ... Input Buffer ... Hash table for partition h2 Probing Phase
  • 30. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction Execution Engine-Blocking Operators Require more than one pass over the input to create the output. Hold intermediate state to compute the output. Some blocking operators: Project: must sort and then merge the input. Sort Merge Join: must sort the inputs and then merge them. Hash Join: must partition the inputs, temporary store the input in a hash table, probe hashed data many times. Input Data Materialization Intermediate Structures Blocking Operator
  • 31. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction Example-Blocking Operators A B D Hash Join Merge Join Index Nested Loop Join C C.a Hashtable SORT SORT Materialization points Blocking Operators
  • 32. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction Execution Engine-Pipelined Plans Executes all operators in the plan in parallel. Operators or iterators, consume tuples from its children and output produced tuples to their parents. Tuples flow from the leaf nodes towards the root of the tree. Iterators have the following functions: Open prepares an operator for producing data. Next produces an item. Close performs final house-keeping.
  • 33. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction Examples of Iterator Functions [11] Iterator Open Next Close Scan open input call next on input close input
  • 34. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction Examples of Iterator Functions [11] Iterator Open Next Close Scan open input call next on input close input Select open input call next on input close input until an item qualifies
  • 35. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction Examples of Iterator Functions [11] Iterator Open Next Close Scan open input call next on input close input Select open input call next on input close input until an item qualifies Hash Join allocate hash directory call next on probe close probe input open left partition input input until a deallocate hash build hash table match is found directory next on partition input close partition input open right probe input
  • 36. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction Examples of Iterator Functions [11] Iterator Open Next Close Scan open input call next on input close input Select open input call next on input close input until an item qualifies Hash Join allocate hash directory call next on probe close probe input open left partition input input until a deallocate hash build hash table match is found directory next on partition input close partition input open right probe input Merge Join open both inputs get next item from close both inputs input with smaller key until a match is found
  • 37. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction Limitations of the Optimize-Then-Execute Paradigm Three Mayor Steps Off-line statistics generation to collect cardinalities, values frequencies, and costs to access data. Cost and heuristic-based query optimization to identify ”good-enough” plans. Assumptions: Different values of the attributes Availability of the data Cardinality of operators Behavior of data sources Query Execution: pipelined operators are processed at the time and output is produced by its sub-operators. materialized operators are completely executed before firing its parents for processing. Blocked Iterator Model.
  • 38. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Introduction Summary: The Optimize-Then-Execute Paradigm Relies on knowledge about the cost and cardinality of the accessed datasets. Effectively achieves the generation of query execution plans where the size of intermediate results is minimized. Does not scale due to lack of statistics and unpredictable data transfers or data availability.
  • 39. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 The Adaptive Query Processing Paradigm The Adaptive Query Processing Paradigm
  • 40. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 The Adaptive Query Processing Paradigm Adaptive Query Processing [4, 9, 18] Adaptivity is needed: Misestimated or missing statistics. Unexpected correlations. Unpredictable costs. Dynamically changing data, workload and sources availability. Rates at which tuples arrive from the sources, change. Initial Delays. Slow Delivery Bursty Arrivals Iterative environments. Systems able to change their behavior by learning behavior of data providers. Receive information from the environment. Use up-to-date information to change their behavior. Keep iterating over time to adapt their behavior based on the environment conditions.
  • 41. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 The Adaptive Query Processing Paradigm Adaptive Query Processing Database Catalog Statistics Generator Query Optimizer Query Execution Engine Query RunStatistics Collected Statistics Typical Optimize-Then-Execute Architecture.
  • 42. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 The Adaptive Query Processing Paradigm Adaptive Query Processing Database Catalog Statistics Generator Query Optimizer Query Execution Engine Query RunStatistics Collected Statistics Typical Optimize-Then-Execute Architecture. Database Catalog Statistics Generator Query Optimizer Query Execution Engine Query RunStatistics ReOptimize Collected Statistics Adaptive Query Processing Architecture.
  • 43. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 The Adaptive Query Processing Paradigm Agenda Types of Adaptation. Adaptivity in Relational Databases Adaptivity in the Web of Data
  • 44. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 The Adaptive Query Processing Paradigm Types of Adaptive-Based Solution Adaptation Level Source Selection searching strategies to select the best sources for answering a query according to real-time source conditions. Query Execution strategies to adapt query processing to environment conditions. Granularity of the Adaptation Fine-grained adaptation of small processes, e.g., per-tuple or source basis. Coarse-grained is attempted for large processes, e.g., several queries are executed under certain assumptions about the conditions of the environment.
  • 45. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 The Adaptive Query Processing Paradigm Adaptive Query Processing Strategies Plan-based Systems adapt to environment conditions, by extending the optimize-then-execute paradigm. Adaptive Join Processing(Intra-Operator) adaptivity is performed at tuple level and query operators are able to adapt to the environment changes, even in the context of a fixed query plan. Non-Pipelined Query Execution (Inter-Operator Re-optimization) sub-queries against remote data sources are re-scheduled based on uncertainties in the execution cost of the remote access, size of the materialized intermediate results, and unexpected delays.
  • 46. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 The Adaptive Query Processing Paradigm Inter-Operator Re-optimization[4, 17, 21, 27] Ingress [30] interleaves sub-plan generation with plan execution; for each table in a query, the smallest tables are selected first until plans of all the tuples and tables are considered. Dynamic Query Planning [7] off-line define a set of constraints that optimal plans must meet; during execution time, plans that meet constraints are generated based on dataset conditions. DEC RDB [2] implements competition, an adaptive strategy which generates and runs multiple plans in parallel for a while, and after a short period of time it selects the most promising ones and discards the rest. Query Scrambling [27, 28] starts with a plan and it may change it on-the-fly, depending on data transfer delays or misestimations; operators can be ordered differently, or not directly combined tables can be joined by a new operator during re-optimization. Routing techniques avoid traditional optimizers, and generate plans for each tuple during runtime by sending tuples through the pool of eligible operators; Eddy [3, 25], M-Join [29] and STAIRs [8] follow this adaptive strategy. Fine-Grained adaptivity is achieved.
  • 47. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 The Adaptive Query Processing Paradigm Inter-Operator Re-optimization[4, 17, 21, 27] Detect and correct situations where optimizers select sub-optimal plans, which could be improved if accurate statistics are considered. Existing approaches extend statistics tracking with a log of estimates considered by the optimizer to generate the query plan, and of system conditions during query execution. Physical operators are implemented to track statistics and fire re-optimizations. Re-optimization must ensure that a new plan will not duplicate or miss outputs, and reuse previous computations. Should avoid thrashing, i.e., spend most of the time and resources, monitoring operators.
  • 48. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 The Adaptive Query Processing Paradigm Query Scrambling[27, 28] Query Scrambling is an execution query method that reacts to data arrival delays. First, a plan is generated, and reacts during query execution to misestimations. Existing operators can be rescheduled, retaining one part of the plan, and scheduling another. Query operators are divided into runnable and blocked. Maximal runnable sub-trees are identified, i.e., sub-tree where all the operators are runnable. Executed sub-trees are materialized until the delayed sources become available and the original plan can resumed. Query optimizer is invoked to identify non-delayed sub-trees that can be materialized first. New operators can be introduced to join tables that were not directly joined in the original plan.
  • 49. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 The Adaptive Query Processing Paradigm Query Scrambling[27, 28] Query Scrambling is an execution query method that reacts to data arrival delays. First, a plan is generated, and reacts during query execution to misestimations. Existing operators can be rescheduled, retaining one part of the plan, and scheduling another. Query operators are divided into runnable and blocked. Maximal runnable sub-trees are identified, i.e., sub-tree where all the operators are runnable. Executed sub-trees are materialized until the delayed sources become available and the original plan can resumed. Query optimizer is invoked to identify non-delayed sub-trees that can be materialized first. New operators can be introduced to join tables that were not directly joined in the original plan. A B C D E F 1 3 5 4 2 Source A is delayed.
  • 50. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 The Adaptive Query Processing Paradigm Query Scrambling[27, 28] Query Scrambling is an execution query method that reacts to data arrival delays. First, a plan is generated, and reacts during query execution to misestimations. Existing operators can be rescheduled, retaining one part of the plan, and scheduling another. Query operators are divided into runnable and blocked. Maximal runnable sub-trees are identified, i.e., sub-tree where all the operators are runnable. Executed sub-trees are materialized until the delayed sources become available and the original plan can resumed. Query optimizer is invoked to identify non-delayed sub-trees that can be materialized first. New operators can be introduced to join tables that were not directly joined in the original plan. A B C D E F 1 3 5 Sub-tree 4 is materialized.
  • 51. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 The Adaptive Query Processing Paradigm Query Scrambling[27, 28] Query Scrambling is an execution query method that reacts to data arrival delays. First, a plan is generated, and reacts during query execution to misestimations. Existing operators can be rescheduled, retaining one part of the plan, and scheduling another. Query operators are divided into runnable and blocked. Maximal runnable sub-trees are identified, i.e., sub-tree where all the operators are runnable. Executed sub-trees are materialized until the delayed sources become available and the original plan can resumed. Query optimizer is invoked to identify non-delayed sub-trees that can be materialized first. New operators can be introduced to join tables that were not directly joined in the original plan. A B C D E F 1 3 5 6 A new operator is created between node 3 and D, E and F.
  • 52. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 The Adaptive Query Processing Paradigm Query Scrambling[27, 28] Query Scrambling is an execution query method that reacts to data arrival delays. First, a plan is generated, and reacts during query execution to misestimations. Existing operators can be rescheduled, retaining one part of the plan, and scheduling another. Query operators are divided into runnable and blocked. Maximal runnable sub-trees are identified, i.e., sub-tree where all the operators are runnable. Executed sub-trees are materialized until the delayed sources become available and the original plan can resumed. Query optimizer is invoked to identify non-delayed sub-trees that can be materialized first. New operators can be introduced to join tables that were not directly joined in the original plan. A B C D E F 1 5 The new operator is executed.
  • 53. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 The Adaptive Query Processing Paradigm Intra-Operator Techniques Adapts plans to environment changes, even in presence of a fixed plan. Incrementally produce results as they become available. Symmetric Hash Join [9] and Xjoin [26] overcome Hash Join blocking limitations, and hide delays while producing answers even if both input data sources become blocked. Fine-Grained adaptivity is achieved at each operator.
  • 54. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 The Adaptive Query Processing Paradigm Example-Blocking Operators A B D Hash Join Merge Join Index Nested Loop Join C C.a Hashtable SORT SORT Materialization points Blocking Operators
  • 55. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 The Adaptive Query Processing Paradigm Symmetric Hash Join[9] Tables Partitions A Join B Partition Phase Probing Phase Hash Tables Joined tuples Traditional Hash Join blocks the probing phase until the partition phase is completely finished.
  • 56. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 The Adaptive Query Processing Paradigm Symmetric Hash Join[9] Tables Partitions A Join B Partition Phase Probing Phase Hash Tables Joined tuples Traditional Hash Join blocks the probing phase until the partition phase is completely finished. A B Hash Table Hash Table Joined tuples Partition Partition Probe Probe Symmetric Hash Join builds hash tables on both inputs; when a tuple arrives, it is inserted in its hash table and probed against the other table.
  • 57. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Adaptivity in the Web of Data Adaptivity in the Web of Data
  • 58. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Adaptivity in the Web of Data Adaptive Query Processing for the Web of Data Adaptivity during Source Selection relies on source or endpoint descriptions and statistics describe data stored at the endpoints. Adaptivity at Query Execution Level approaches implement intra- and inter-operator query processing techniques to adapt execution to no accurate statistics, uncontrollable network conditions, query complexity, dynamic data and workload.
  • 59. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Adaptivity in the Web of Data Evolution of Adaptive Approaches in the Web of Data Summary Coarse-Grained Granularity Fine-Grained Granularity Plan-based Techniques i.e., ARQ, DARQ Adaptive Source-Selection Techniques i.e., FedX, SPLENDID Adaptive Query Processing Techniques i.e., ANAPSID, Avalanche, Hartig et. al. 09, Ladwig and Tran 10
  • 60. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Adaptivity in the Web of Data Adaptivity during Source Selection coarse-Grained Approaches Harth et al.[12]: Qtree-based index structure to store data source statistics collected in a pre-processing stage; these statistics are used for source ranking and to determine the best sources to answer a join query. Li and Heflin[20] tree structure to support integration of data from multiple heterogeneous sources; nodes are annotated with source join selectivities and used to select the corresponding datasets. Finer granularity is achieved whenever the statistics are re-computed frequently.
  • 61. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Adaptivity in the Web of Data Fined-Grained Approaches to traverse the Web of Data Hartig et al.[13, 14, 15] source selection and link traversal are interleaved during query execution time. A non-blocking iterator model is used for traversing relevant links relies on an asynchronous pipeline of iterators executed in an order heuristically determined. e.g., the most selective iterators are executed first [14]. query engine is able to adapt the execution to source availability by detecting on-the-fly whenever a dereferenced dataset stops responding; iterators can be cancelled, and other requests are submitted to alive datasets. Fine-Grained adaptivity is achieved during query selection and link traversal.
  • 62. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Adaptivity in the Web of Data Combined-Grained Approaches to traverse the Web of Data Ladwig and Tran[19] coarse-grained granularity: aggregate indexes keep data distributions and are used for source selection. Fine-grained granularity: Symmetric Hash Joins incrementally produce answers even when sources become blocked. Adaptivity is managed at different levels and granularity during link traversal, but these approaches are not tailored to access data from federations of endpoints.
  • 63. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Adaptive Query Processing for Federations of Endpoints SPARQL Endpoints Implement SPARQL protocol and enable users to query particular datasets or to dereference linked data. Developed for very light-weight use. Executions against several endpoints can be unsuccessful because of endpoint timeouts, unpredictable data transfers and endpoint availability.
  • 64. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Adaptive Query Processing for Federations of Endpoints SPLENDID: SPARQL Endpoint Federation Exploiting VOID Descriptions[10] Statistical Information stored in VOID Descriptions is used to perform source selection first stage. Contact Source during query execution is used to improve the preliminar source selection. Pattern grouping is performed for set of triples that share the same unique endpoint, and for sameAs triple that shares an unbound subject variable with other triples. Cost-based join ordering using dynamic programming determines the best plan, giving preference to bushy tree plans. Bind Joins that reduces network overhead.
  • 65. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Adaptive Query Processing for Federations of Endpoints SPLENDID Example: drugs and their components’ url and image PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX drugbank: <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/> PREFIX purl:<http://purl.org/dc/elements/1.1/> PREFIX bio2rdf:<http://bio2rdf.org/ns/bio2rdf#> SELECT $drug $keggUrl $chebiImage WHERE { $drug rdf:type drugbank:drugs . $drug drugbank:keggCompoundId $keggDrug . $drug drugbank:genericName $drugBankName . $keggDrug bio2rdf:url $keggUrl . $chebiDrug purl:title $drugBankName . $chebiDrug bio2rdf:image $chebiImage . }
  • 66. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Adaptive Query Processing for Federations of Endpoints SPLENDID Example: drugs and their components’ url and image PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX drugbank: <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/> PREFIX purl:<http://purl.org/dc/elements/1.1/> PREFIX bio2rdf:<http://bio2rdf.org/ns/bio2rdf#> SELECT $drug $keggUrl $chebiImage WHERE { (1) (2) (3) ¨ © $drug rdf:type drugbank:drugs . $drug drugbank:keggCompoundId $keggDrug . $drug drugbank:genericName $drugBankName . Drugbank (4) £ ¢   ¡$keggDrug bio2rdf:url $keggUrl . KEGG, Chebi (5) £ ¢   ¡$chebiDrug purl:title $drugBankName . KEGG, Chebi, Jamendo, SW Dog Food (6) £ ¢   ¡$chebiDrug bio2rdf:image $chebiImage . KEGG, Chebi }
  • 67. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Adaptive Query Processing for Federations of Endpoints SPLENDID Example: drugs and their components’ url and image PREFIX rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns# PREFIX drugbank: http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/ PREFIX purl:http://purl.org/dc/elements/1.1/ PREFIX bio2rdf:http://bio2rdf.org/ns/bio2rdf# SELECT $drug $keggUrl $chebiImage WHERE { (1) (2) (3) ¨ © $drug rdf:type drugbank:drugs . $drug drugbank:keggCompoundId $keggDrug . $drug drugbank:genericName $drugBankName . Drugbank (4) £ ¢   ¡$keggDrug bio2rdf:url $keggUrl . KEGG, Chebi (5) £ ¢   ¡$chebiDrug purl:title $drugBankName . KEGG, Chebi, Jamendo, SW Dog Food (6) £ ¢   ¡$chebiDrug bio2rdf:image $chebiImage . KEGG, Chebi } Execution plan
  • 68. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Adaptive Query Processing for Federations of Endpoints SPLENDID Example: drugs that interact with antibiotics, antiviral and antihypertensive agents PREFIX drugbank: http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/ PREFIX dbcategory: http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugcategory/ PREFIX owl: http://www.w3.org/2002/07/owl# PREFIX rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns# PREFIX dbowl: http://dbpedia.org/ontology/ PREFIX kegg: http://bio2rdf.org/ns/kegg# SELECT DISTINCT ?drug ?enzyme ?reaction Where { ?drug1 drugbank:drugCategory dbcategory:antibiotics . ?drug2 drugbank:drugCategory dbcategory:antiviralAgents . ?drug3 drugbank:drugCategory dbcategory:antihypertensiveAgents . ?I1 drugbank:interactionDrug2 ?drug1 . ?I1 drugbank:interactionDrug1 ?drug . ?I2 drugbank:interactionDrug2 ?drug2 . ?I2 drugbank:interactionDrug1 ?drug . ?I3 drugbank:interactionDrug2 ?drug3 . ?I3 drugbank:interactionDrug1 ?drug . ?drug owl:sameAs ?drug5 . ?drug drugbank:keggCompoundId ?cpd . ?enzyme kegg:xSubstrate ?cpd . ?enzyme rdf:type kegg:Enzyme . ?reaction kegg:xEnzyme ?enzyme . ?reaction kegg:equation ?equation . ?drug5 rdf:type dbowl:Drug . }
  • 69. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Adaptive Query Processing for Federations of Endpoints SPLENDID Example: drugs that interact with antibiotics, antiviral and antihypertensive agents PREFIX drugbank: http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/ PREFIX dbcategory: http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugcategory/ PREFIX owl: http://www.w3.org/2002/07/owl# PREFIX rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns# PREFIX dbowl: http://dbpedia.org/ontology/ PREFIX kegg: http://bio2rdf.org/ns/kegg# SELECT DISTINCT ?drug ?enzyme ?reaction Where { (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) 9 8 6 7 ?drug1 drugbank:drugCategory dbcategory:antibiotics . ?drug2 drugbank:drugCategory dbcategory:antiviralAgents . ?drug3 drugbank:drugCategory dbcategory:antihypertensiveAgents . ?I1 drugbank:interactionDrug2 ?drug1 . ?I1 drugbank:interactionDrug1 ?drug . ?I2 drugbank:interactionDrug2 ?drug2 . ?I2 drugbank:interactionDrug1 ?drug . ?I3 drugbank:interactionDrug2 ?drug3 . ?I3 drugbank:interactionDrug1 ?drug . ?drug owl:sameAs ?drug5 . ?drug drugbank:keggCompoundId ?cpd . Drugbank (12) (13) (14) (15) ?enzyme kegg:xSubstrate ?cpd . ?enzyme rdf:type kegg:Enzyme . ?reaction kegg:xEnzyme ?enzyme . ?reaction kegg:equation ?equation . KEGG (16) £ ¢   ¡?drug5 rdf:type dbowl:Drug . DBpedia }
  • 70. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Adaptive Query Processing for Federations of Endpoints SPLENDID Execution plan
  • 71. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Adaptive Query Processing for Federations of Endpoints Adaptivity at Endpoint Selection Level Fine-Grained Granularity FedX [24] no knowledge about mappings or statistics is maintained. endpoints are consulted on-the-fly to determine if a predicates can be answered. cache is used to record endpoint predicates. queries are divided into exclusive groups, i.e., triple patterns that can be executed by only one endpoint. heuristically groups are ordered. Fine-grained granularity at source selection level while coarse-grained granularity during plan generation.
  • 72. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Adaptive Query Processing for Federations of Endpoints FedX Example: drugs and their components’ url and image PREFIX rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns# PREFIX drugbank: http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/ PREFIX purl:http://purl.org/dc/elements/1.1/ PREFIX bio2rdf:http://bio2rdf.org/ns/bio2rdf# SELECT $drug $keggUrl $chebiImage WHERE { $drug rdf:type drugbank:drugs . $drug drugbank:keggCompoundId $keggDrug . $drug drugbank:genericName $drugBankName . $keggDrug bio2rdf:url $keggUrl . $chebiDrug purl:title $drugBankName . $chebiDrug bio2rdf:image $chebiImage . }
  • 73. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Adaptive Query Processing for Federations of Endpoints FedX Example: drugs and their components’ url and image PREFIX rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns# PREFIX drugbank: http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/ PREFIX purl:http://purl.org/dc/elements/1.1/ PREFIX bio2rdf:http://bio2rdf.org/ns/bio2rdf# SELECT $drug $keggUrl $chebiImage WHERE { (1) (2) (3) ¨ © $drug rdf:type drugbank:drugs . $drug drugbank:keggCompoundId $keggDrug . $drug drugbank:genericName $drugBankName . Drugbank (4) £ ¢   ¡$keggDrug bio2rdf:url $keggUrl . KEGG, Chebi (5) £ ¢   ¡$chebiDrug purl:title $drugBankName . KEGG, Chebi, Jamendo, SW Dog Food (6) £ ¢   ¡$chebiDrug bio2rdf:image $chebiImage . KEGG, Chebi }
  • 74. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Adaptive Query Processing for Federations of Endpoints FedX Example: drugs and their components’ url and image PREFIX rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns# PREFIX drugbank: http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/ PREFIX purl:http://purl.org/dc/elements/1.1/ PREFIX bio2rdf:http://bio2rdf.org/ns/bio2rdf# SELECT $drug $keggUrl $chebiImage WHERE { (1) (2) (3) ¨ © $drug rdf:type drugbank:drugs . $drug drugbank:keggCompoundId $keggDrug . $drug drugbank:genericName $drugBankName . Drugbank (4) £ ¢   ¡$keggDrug bio2rdf:url $keggUrl . KEGG, Chebi (5) £ ¢   ¡$chebiDrug purl:title $drugBankName . KEGG, Chebi, Jamendo, SW Dog Food (6) £ ¢   ¡$chebiDrug bio2rdf:image $chebiImage . KEGG, Chebi } Execution plan
  • 75. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Adaptive Query Processing for Federations of Endpoints FedX Example: drugs that interact with antibiotics, antiviral and antihypertensive agents PREFIX drugbank: http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/ PREFIX dbcategory: http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugcategory/ PREFIX owl: http://www.w3.org/2002/07/owl# PREFIX rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns# PREFIX dbowl: http://dbpedia.org/ontology/ PREFIX kegg: http://bio2rdf.org/ns/kegg# SELECT DISTINCT ?drug ?enzyme ?reaction Where { ?drug1 drugbank:drugCategory dbcategory:antibiotics . ?drug2 drugbank:drugCategory dbcategory:antiviralAgents . ?drug3 drugbank:drugCategory dbcategory:antihypertensiveAgents . ?I1 drugbank:interactionDrug2 ?drug1 . ?I1 drugbank:interactionDrug1 ?drug . ?I2 drugbank:interactionDrug2 ?drug2 . ?I2 drugbank:interactionDrug1 ?drug . ?I3 drugbank:interactionDrug2 ?drug3 . ?I3 drugbank:interactionDrug1 ?drug . ?drug drugbank:keggCompoundId ?cpd . ?enzyme kegg:xSubstrate ?cpd . ?enzyme rdf:type kegg:Enzyme . ?reaction kegg:xEnzyme ?enzyme . ?reaction kegg:equation ?equation . ?drug5 rdf:type dbowl:Drug . ?drug owl:sameAs ?drug5 . }
  • 76. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Adaptive Query Processing for Federations of Endpoints FedX Example: drugs that interact with antibiotics, antiviral and antihypertensive agents PREFIX drugbank: http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/ PREFIX dbcategory: http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugcategory/ PREFIX owl: http://www.w3.org/2002/07/owl# PREFIX rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns# PREFIX dbowl: http://dbpedia.org/ontology/ PREFIX kegg: http://bio2rdf.org/ns/kegg# SELECT DISTINCT ?drug ?enzyme ?reaction Where { (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) 9 8 6 7 ?drug1 drugbank:drugCategory dbcategory:antibiotics . ?drug2 drugbank:drugCategory dbcategory:antiviralAgents . ?drug3 drugbank:drugCategory dbcategory:antihypertensiveAgents . ?I1 drugbank:interactionDrug2 ?drug1 . ?I1 drugbank:interactionDrug1 ?drug . ?I2 drugbank:interactionDrug2 ?drug2 . ?I2 drugbank:interactionDrug1 ?drug . ?I3 drugbank:interactionDrug2 ?drug3 . ?I3 drugbank:interactionDrug1 ?drug . ?drug drugbank:keggCompoundId ?cpd . Drugbank (11) (12) (13) (14) ?enzyme kegg:xSubstrate ?cpd . ?enzyme rdf:type kegg:Enzyme . ?reaction kegg:xEnzyme ?enzyme . ?reaction kegg:equation ?equation . KEGG (15) £ ¢   ¡?drug5 rdf:type dbowl:Drug . DBpedia (16) £ ¢   ¡?drug owl:sameAs ?drug5 . DBpedia, KEGG, Drugbank
  • 77. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Adaptive Query Processing for Federations of Endpoints Execution plan
  • 78. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Adaptive Query Processing for Federations of Endpoints Adaptivity at Query Execution Level Coarse-Grained Granularity SPARQL-DQP [6] and ARQ 1 exploit information on SPARQL 1.1 queries to decide where the subqueries of triple patterns will be executed. Both rely on statistics or heuristics during query optimization, reaching thus coarse-grained adaptation at plan level. FedX [24] no adaptation is performed during query processing. Finer granularity is achieved whenever the statistics are re-computed frequently. 1 http://jena.sourceforge.net/ARQ
  • 79. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Adaptive Query Processing for Federations of Endpoints Adaptivity at Query Execution Level Fine-Grained Granularity AVALANCHE [5] inter-operator solution to heuristically select on-the-fly the most promising plans. statistics about cardinalities and data distribution are considered to identify possibly good plans. competition strategy top-k plans are executed in parallel and the output of a query corresponds to the answers produced by the most promising plan(s) in a given period of time. Fine-grained granularity. Decision of completely executing top-k plan(s) is taken on-the-fly, based on the current conditions of the environment.
  • 80. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Adaptive Query Processing for Federations of Endpoints Adaptivity at Query Execution Level Fine-Grained Granularity Avalanche – execution model SELECT ?name ?title WHERE { ?paper dblp:has-title ?title . ?paper dblp:has-author ?author . ?author dblp:has-name ?name . } find endpoints request statistics distributed join Endpoints Directory or Search Engine Avalanche SPARQL endpoint ... on the WWW Query Borrows from: p2p systems the Web No completeness guarantees, first K results are obtained iteratively (plan-level) given stopping criteria: timeout relative saturation LIMIT modifier
  • 81. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Adaptive Query Processing for Federations of Endpoints Adaptivity at Query Execution Level Fine-Grained Granularity Avalanche – architecture 3] Query Planning and Execution phaseAVALANCHE endpoints Search Engine i.e., http://void.rkbexplorer.com/ 1] Source Discovery phase Plans Queue Plan Generator Finished Plans Queue Results Queue Query Stopper Executor MaterializerExecutor Executor Executor Materializer Materializer Materializer Results Statistics Requester Query Query Parser 2] Statistics Gathering phase Source Selector optimization is concomitant with query execution scheduling is dynamic between plans (highlighted) – results come in out-of-order plan space: |Hosts| ∗ |SubQueries| ∗ PlanDepth multi-path depth-first heuristic search based on parametric cost model
  • 82. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Adaptive Query Processing for Federations of Endpoints Adaptivity at Query Execution Level Fine-Grained Granularity Avalanche – plan execution 1) Join (s1, s2) 3) Send(R1) 10) Update (s2, s1) 12) Send(R2)host A 2) R1 = Execute (s1) 13) R1 = Filter (R1, R2) ?sameFilm ?sameFilm dbpedia:hasPhotoCollection ?photoCollection. ?sameFilm dbpedia:studio ʻʻProducers Circleʼʼ. SubQuery 1 - s1 ?film dc:title ?title. ?film movie:actor ?actor. ?film owl:sameAs ?sameFilm. SubQuery 2 - s2 ?actor a foaf:Person. ?actor movie:actor_name ?name. SubQuery 3 - s3 5) Join (s2, s3) 6) Send(FR2) host C 7) FR3 = ExecuteFilter (R3) 9) Send(FR3) 8) Update (s3, s2) host B 4) FR2 = ExecuteFilter (R1) 11) R2 = Filter (FR2) ?actor DBPEDIA endpointsLinked Movie Database endpoints left linear execution strategy not adaptive
  • 83. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Adaptive Query Processing for Federations of Endpoints Adaptivity at Query Execution Level Fine-Grained Granularity Avalanche – plan execution 1) Join (s1, s2) 3) Send(R1) 10) Update (s2, s1) 12) Send(R2)host A 2) R1 = Execute (s1) 13) R1 = Filter (R1, R2) ?sameFilm ?sameFilm dbpedia:hasPhotoCollection ?photoCollection. ?sameFilm dbpedia:studio ʻʻProducers Circleʼʼ. SubQuery 1 - s1 ?film dc:title ?title. ?film movie:actor ?actor. ?film owl:sameAs ?sameFilm. SubQuery 2 - s2 ?actor a foaf:Person. ?actor movie:actor_name ?name. SubQuery 3 - s3 5) Join (s2, s3) 6) Send(FR2) host C 7) FR3 = ExecuteFilter (R3) 9) Send(FR3) 8) Update (s3, s2) host B 4) FR2 = ExecuteFilter (R1) 11) R2 = Filter (FR2) ?actor DBPEDIA endpointsLinked Movie Database endpoints left linear execution strategy not adaptive could benefit from adaptive execution . . .
  • 84. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Adaptive Query Processing for Federations of Endpoints ANAPSID: An Adaptive Query Processing Engine for Federation of Endpoints [1] ANAPSID – Architecture Mediator Query Planner SPARQL 1.1 Query Query Optimizer Query Decomposer Adaptive Query Engine Query Schema Alignments Endpoint Endpoint Adaptive Plan SPARQL 1.0 Query ANAPSID Decomposer: rewrites the query into sub-queries. Optimizer: generates an optimal bushy tree execution plan. Query Engine: implements inter- and intra-operators techniques to perform fine-grained adaptivity.
  • 85. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Adaptive Query Processing for Federations of Endpoints ANAPSID: An Adaptive Query Processing Engine for Federation of Endpoints [1] Adaptivity is performed at both source selection and query execution levels. Endpoint descriptions used to select endpoints that can answer a triple pattern; triple patterns that can be executed by the same endpoint are grouped together; endpoints may be contacted on-the-fly to decide if it is able to answer a triple pattern. Bushy-tree plans are generated to reduce the size of intermediate results and the number of Cartesian products. Non-blocking operators efficiently retrieve data from endpoints and keep producing new answers even when endpoints become blocked; agJoin, agUnion and agOptional provide adaptive implementations of SPARQL logical operators. Mediator Query Planner SPARQL 1.1 Query Query Optimizer Query Decomposer Adaptive Query Engine Query Schema Alignments Endpoint Endpoint Adaptive Plan SPARQL 1.0 Query ANAPSID Fine-grained granularity during query execution and medium fine-grained during endpoint selection.
  • 86. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Adaptive Query Processing for Federations of Endpoints ANAPSID Adaptive Join Operator Operators implement main memory replacement policies to flush previously computed matches to secondary memory, ensuring no duplicate generation. Each operator maintains a data structure called Resource Join Tuple (RJT), that records for each instantiation of the variable(s), the tuples that have already matched. (R, {TR1, TR2, ..., TRn}).
  • 87. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Adaptive Query Processing for Federations of Endpoints ANAPSID Adaptive Join Operator Adaptive Join Operator works in three stages: First stage is performed while at least one endpoint sends data. If main memory becomes full, Resource Join Tuples are flushed to secondary memory. Second stage is performed whenever both are blocked. Resource Join Tuples is main memory are joined with Resource Join Tuples in secondary memory Third stage is fired when both endpoints finish sending data.
  • 88. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Hands-On Section Fedbench Datasets and Queries Dataset Endpoint name #triples NY Times News 314k LinkedMDB Movies 6.14M Jamendo Music 1.04M Geonames Geography 7.98M SW Dog Food SW 84k KEGG Chemicals 10.9M Drugbank Drugs 517k ChEBI Compounds 4.77M SP2B-10M Bibliographic 10M DBPedia subset Infobox Types 5.49M Infobox Properties 10.80M Titles 7.33M Articles Categories 10.91M Images 3.88M SKOS Categories 2.24M Other 2.45M Queries Cross Domain CD1-CD7. Linked Data LD1-LD11. Life Science Linked Data LSD1-LD7. Complex Queries C1-C5.
  • 89. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Hands-On Section Delays Configuration Message Size: 16KB No Delay: perfect network. Gamma 1 : fast network with a gamma distribution (α = 1, β = 0.3) of response latency resulting in an average latency of 0.3 seconds. Gamma 2: medium fast network with a gamma distribution (α = 3, β = 1.0) of response latency resulting in an average latency of 3 seconds. Gamma 3: slow network with a gamma distribution (α = 3, β = 1.5) of response latency resulting in an average latency of 4.5 seconds per message. Probability Density Function for Gamma 0 0.5 1 0 2 4 6 8 10 12 14 16 18 20 Density Gamma 1 (α = 1, β = 0.3) Gamma 2 (α = 3, β = 1.0) Gamma 3 (α = 3, β = 1.5)
  • 90. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Hands-On Section Evaluation Parameters Exec Result Time Completeness Query # patterns instantiations result size Data data distribution data correlations dataset size Network #endpoints latency data transfer distribution
  • 91. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Hands-On Section Evaluation Objective: Study the Adaptive Capability of the Different Engines to Network Variation. Measurements: Time first tuple: elapsed time between the execution of the plan and the output of the first answer. Time all tuples: elapsed time between the execution of the plan and the output of the whole answer. Number of Results: number of tuples produced for each query.
  • 92. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 Summary and Future Directions Summary and Future Directions
  • 93. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 References [1] Maribel Acosta, Maria-Esther Vidal, Tomas Lampo, Julio Castillo, and Edna Ruckhaus. ANAPSID: AN Adaptive query ProcesSing engIne for sparql enDpoints. In Proceedings of the International Semantic Web Conference (ISWC), 2011. [2] Gennady Antoshenkov and Mohamed Ziauddin. Query processing and optimization in oracle rdb. VLDB J., 5(4):229–237, 1996. [3] Ron Avnur and Joseph M. Hellerstein. Eddies: Continuously adaptive query processing. In SIGMOD Conference, pages 261–272, 2000. [4] Shivnath Babu and Pedro Bizarro. Adaptive query processing in the looking glass. In CIDR, pages 238–249, 2005. [5] Cosmin Basca and Abraham Bernstein. Avalanche: Putting the Spirit of the Web back into Semantic Web Querying. In SSWS2010 Workshop, Shanghai, China, 2010.
  • 94. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 References [6] Carlos Buil-Aranda, Marcelo Arenas, and Oscar Corcho. Semantics and optimization of the sparql 1.1 federation extension. In ESWC (2), pages 1–15, 2011. [7] Richard L. Cole and Goetz Graefe. Optimization of dynamic query evaluation plans. In SIGMOD Conference, pages 150–160, 1994. [8] Amol Deshpande and Joseph M. Hellerstein. Lifting the burden of history from adaptive query processing. In VLDB, pages 948–959, 2004. [9] Amol Deshpande, Zachary G. Ives, and Vijayshankar Raman. Adaptive query processing. Foundations and Trends in Databases, 1(1):1–140, 2007. [10] Olaf G¨orlitz and Steffen Staab. SPLENDID: SPARQL Endpoint Federation Exploiting VOID Descriptions. In Proceedings of the 2nd International Workshop on Consuming Linked Data, Bonn, Germany, 2011.
  • 95. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 References [11] Goetz Graefe. Query evaluation techniques for large databases. ACM Comput. Surv., 25(2):73–170, 1993. [12] Andreas Harth, Katja Hose, Marcel Karnstedt, Axel Polleres, Kai-Uwe Sattler, and J¨urgen Umbrich. Data summaries for on-demand queries over linked data. In WWW, 2010. [13] Olaf Hartig. How caching improves efficiency and result completeness for querying linked data. In the 4th Linked Data on the Web (LDOW) Workshop at the World Wide Web Conference (WWW), 2011. [14] Olaf Hartig. Zero-knowledge query planning for an iterator implementation of link traversal based query execution. In Extended Semantic Web Conference, 2011. [15] Olaf Hartig, Christian Bizer, and Johann Christoph Freytag. Executing sparql queries over the web of linked data. In International Semantic Web Conference, pages 293–309, 2009.
  • 96. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 References [16] Yannis E. Ioannidis. Query optimization. ACM Comput. Surv., 28(1):121–123, 1996. [17] Navin Kabra and David J. DeWitt. Efficient mid-query re-optimization of sub-optimal query execution plans. In SIGMOD Conference, pages 106–117, 1998. [18] Kamlesh Laddhad and S. Sudarshan. Adaptive query processing. Technical Report 05329014, Kanwal Rekhi School of Information Technology, Indian Institute of Technology, Bombay, Mumbai, 2006. [19] Gunter Ladwig and Thanh Tran. Linked data query processing strategies. In International Semantic Web Conference (ISWC), 2010. [20] Yingjie Li and Jeff Heflin. Using reformulation trees to optimize queries over distributed heterogeneous sources. In ISWC, pages 502–517, 2010.
  • 97. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 References [21] Volker Markl, Vijayshankar Raman, David E. Simmen, Guy M. Lohman, and Hamid Pirahesh. Robust query processing through progressive optimization. In SIGMOD Conference, pages 659–670, 2004. [22] Priti Mishra and Margaret H. Eich. Join processing in relational databases. ACM Comput. Surv., 24(1):63–113, 1992. [23] Bastian Quilitz and Ulf Leser. Querying distributed rdf data sources with sparql. In ESWC, pages 524–538, 2008. [24] Andreas Schwarte, Peter Haase, Katja Hose, Ralf Schenkel, and Michael Schmidt. Fedx: Optimization techniques for federated query processing on linked data. In International Semantic Web Conference (1), pages 601–616, 2011. [25] Feng Tian and David J. DeWitt. Tuple routing strategies for distributed eddies. In VLDB, pages 333–344, 2003.
  • 98. Adaptive Semantic Data Management Techniques for Federations of Endpoints- Half-Day Tutorial at ESWC 2012 References [26] Tolga Urhan and Michael J. Franklin. Xjoin: A reactively-scheduled pipelined join operator. IEEE Data Eng. Bull., 23(2):27–33, 2000. [27] Tolga Urhan and Michael J. Franklin. Dynamic pipeline scheduling for improving interactive query performance. In VLDB, pages 501–510, 2001. [28] Tolga Urhan, Michael J. Franklin, and Laurent Amsaleg. Cost based query scrambling for initial delays. In SIGMOD Conference, pages 130–141, 1998. [29] Stratis Viglas, Jeffrey F. Naughton, and Josef Burger. Maximizing the output rate of multi-way join queries over streaming information sources. In VLDB, pages 285–296, 2003. [30] Eugene Wong and Karel Youssefi. Decomposition-a strategy for query processing. ACM Trans. on Database Systems, 1(3), 1976.