The document discusses query optimization using case-based reasoning in ubiquitous environments. It proposes adapting case-based reasoning to provide optimal execution plans for new queries using knowledge acquired from past experiences. A case pairs a query (the problem description) with its execution plan (the solution); a new query is optimized by retrieving the most relevant past case and adapting its plan. Operations are grouped into families based on having the same condition applied to the same attributes, which allows past experiences to be reused to optimize similar new queries.
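To make the retrieve-and-reuse step concrete, here is a minimal sketch (with hypothetical names and a toy similarity measure, not the paper's implementation) of matching a new query to an operation family and reusing the stored plan:

```python
# Minimal case-based reasoning sketch: cases are indexed by operation
# "families" (same condition type applied to the same attributes), and a
# new query reuses the plan of the most similar stored case.
from dataclasses import dataclass

@dataclass
class Case:
    family: frozenset      # e.g. frozenset({("price", "<"), ("region", "=")})
    plan: str              # the execution plan stored as the solution

def family_of(query_predicates):
    """Reduce a query to its operation family: (attribute, operator) pairs."""
    return frozenset((attr, op) for attr, op, _ in query_predicates)

def retrieve(case_base, query_predicates):
    """Return the stored plan whose family overlaps most with the query's."""
    fam = family_of(query_predicates)
    best = max(case_base, key=lambda c: len(c.family & fam), default=None)
    return best.plan if best and best.family & fam else None

case_base = [Case(frozenset({("price", "<"), ("region", "=")}),
                  "index-scan(region) -> filter(price)")]
print(retrieve(case_base, [("price", "<", 100), ("region", "=", "EU")]))
```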
Column store decision tree classification of unseen attribute set (ijma)
A decision tree can be used for clustering of frequently used attributes to improve tuple reconstruction time in column-store databases. Due to the ad-hoc nature of queries, strongly correlated attributes are grouped together using a decision tree so they share a common minimum-support probability distribution. At the same time, in order to predict the cluster for an unseen attribute set, the decision tree can work as a classifier. In this paper we propose classification and clustering of unseen attribute sets using a decision tree to improve tuple reconstruction time.
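A small sketch of the dual use described above, under the assumption that attribute-usage vectors from past queries are labeled with cluster ids (data and column names are illustrative, not from the paper):

```python
# A fitted decision tree predicts the cluster for an unseen attribute set.
from sklearn.tree import DecisionTreeClassifier

columns = ["orderkey", "price", "discount", "shipdate"]
# Each row: which columns a past query touched (1) or not (0).
X = [[1, 1, 0, 0],
     [1, 1, 1, 0],
     [0, 0, 1, 1],
     [0, 1, 1, 1]]
y = [0, 0, 1, 1]  # cluster assignment of each access pattern

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
unseen = [[1, 0, 1, 0]]          # a query pattern never observed before
print(tree.predict(unseen))      # cluster whose projection should serve it
```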
WEIGHTED CONSTRAINT SATISFACTION AND GENETIC ALGORITHM TO SOLVE THE VIEW SELE... (ijdms)
The document summarizes a study that encodes the view selection problem in data warehousing as a weighted constraint satisfaction problem (WCSP) in order to find an optimal set of views to materialize. The researchers use the multiple view processing plan framework as the search space and propose a genetic algorithm to select the views to materialize. The experimental results show that the proposed algorithm can effectively select appropriate views for materialization.
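A toy sketch of this kind of encoding (all costs, sizes, and the penalty weight are invented; the WCSP flavor appears as a weighted penalty for violating the space constraint):

```python
# Toy genetic algorithm for view selection: each bit says whether view i is
# materialized; fitness adds query cost, maintenance cost, and a weighted
# penalty for the space constraint.
import random

SIZE   = [40, 25, 60, 15]            # size of each candidate view
MAINT  = [5, 3, 8, 2]                # maintenance cost if materialized
# cost of answering query q from view v; falling back to base tables = 100
USE    = [[20, 50, 15, 100], [30, 10, 100, 100], [100, 40, 25, 60]]
BUDGET = 80

def fitness(bits):
    query_cost = sum(min([USE[q][v] for v in range(4) if bits[v]] + [100])
                     for q in range(len(USE)))
    maintenance = sum(m for m, b in zip(MAINT, bits) if b)
    over = max(0, sum(s for s, b in zip(SIZE, bits) if b) - BUDGET)
    return query_cost + maintenance + 10 * over   # weighted constraint penalty

def evolve(pop_size=20, gens=50):
    pop = [[random.randint(0, 1) for _ in range(4)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness)
        parents = pop[:pop_size // 2]
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, 4)
            child = a[:cut] + b[cut:]            # one-point crossover
            if random.random() < 0.1:            # mutation
                i = random.randrange(4); child[i] ^= 1
            children.append(child)
        pop = parents + children
    return min(pop, key=fitness)

print(evolve())   # bit vector of views to materialize
```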
Decision tree clustering a columnstores tuple reconstruction (csandit)
Column-stores have gained market share as a promising physical storage alternative for analytical queries. However, for multi-attribute queries, column-stores pay performance penalties due to on-the-fly tuple reconstruction. This paper presents an adaptive approach for reducing tuple reconstruction time. The proposed approach exploits a decision tree algorithm to cluster attributes for each projection and also eliminates frequent database scanning. Experiments with TPC-H data show the effectiveness of the proposed approach.
In the present day, a huge amount of data is generated every minute and transferred frequently. Although the data is sometimes static, it is most commonly dynamic and transactional: newly generated data is constantly added to the existing data. One approach to discovering knowledge from such incremental data is to rerun the algorithm on the modified data set, which is time consuming. Proper analysis also requires an efficient classifier model, whose objective is to assign unlabeled data to the appropriate classes. This paper proposes a dimension reduction algorithm that can be applied in a dynamic environment to generate a reduced attribute set (a dynamic reduct), together with an optimization algorithm that uses the reduct to build the corresponding classification system. The method analyzes each new data set as it becomes available and modifies the reduct to fit the entire data, from which interesting optimal classification rule sets are generated. The concepts of discernibility relation, attribute dependency, and attribute significance from Rough Set Theory are integrated to generate the dynamic reduct set, and optimal classification rules are selected using PSO, which not only reduces complexity but also helps achieve higher accuracy of the decision system. The proposed method has been applied to benchmark data sets from the UCI repository: the dynamic reduct is computed and optimal classification rules are generated from it. Experimental results show the efficiency of the proposed method.
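The abstract does not give the PSO formulation; a generic binary PSO of the kind alluded to, with a made-up fitness trading rule-set accuracy against size, would look like this:

```python
# Generic binary PSO: each particle is a bit mask over candidate rules;
# sigmoid-thresholded velocities give the standard binary update.
import math, random

N_RULES = 12
ACC = [random.uniform(0.5, 0.95) for _ in range(N_RULES)]  # stand-in accuracies

def fitness(mask):
    if not any(mask):
        return 0.0
    chosen = [a for a, m in zip(ACC, mask) if m]
    return sum(chosen) / len(chosen) - 0.01 * len(chosen)  # accuracy vs. size

def binary_pso(n=15, iters=60, w=0.7, c1=1.5, c2=1.5):
    pos = [[random.randint(0, 1) for _ in range(N_RULES)] for _ in range(n)]
    vel = [[0.0] * N_RULES for _ in range(n)]
    pbest = [p[:] for p in pos]
    gbest = max(pos, key=fitness)[:]
    for _ in range(iters):
        for i in range(n):
            for d in range(N_RULES):
                vel[i][d] = (w * vel[i][d]
                             + c1 * random.random() * (pbest[i][d] - pos[i][d])
                             + c2 * random.random() * (gbest[d] - pos[i][d]))
                pos[i][d] = 1 if random.random() < 1 / (1 + math.exp(-vel[i][d])) else 0
            if fitness(pos[i]) > fitness(pbest[i]):
                pbest[i] = pos[i][:]
            if fitness(pos[i]) > fitness(gbest):
                gbest = pos[i][:]
    return gbest

print(binary_pso())  # selected rule subset as a bit mask
```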
Exploratory data analysis using xgboost package in R (Satoshi Kato)
A how-to guide to exploratory data analysis using xgboost (EDAXGB), covering feature importance, sensitivity analysis, feature contribution, and feature interaction. It is based entirely on the built-in predict() function of the R package.
All of the sample codes are available at: https://github.com/katokohaku/EDAxgboost
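The slides use the R package, but the same EDA hooks exist in the Python xgboost package: gain-based feature importance, and per-row feature contributions straight from predict(). Synthetic data for illustration:

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

dtrain = xgb.DMatrix(X, label=y, feature_names=["f0", "f1", "f2", "f3"])
bst = xgb.train({"objective": "binary:logistic", "max_depth": 3}, dtrain,
                num_boost_round=30)

print(bst.get_score(importance_type="gain"))        # feature importance
contrib = bst.predict(dtrain, pred_contribs=True)   # per-feature contributions
print(contrib[0])     # contributions of each feature + bias for row 0
```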
This document proposes a simple procedure for beginners to obtain reasonable results when using support vector machines (SVMs) for classification tasks. The procedure involves preprocessing data through scaling, using a radial basis function kernel, selecting model parameters through cross-validation grid search, and training the full model on the preprocessed data. The document provides examples applying this procedure to real-world datasets, demonstrating improved accuracy over approaches without careful preprocessing and parameter selection.
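The recipe above (scale, RBF kernel, cross-validated grid search over C and gamma, then train on all data) in scikit-learn, with a stand-in dataset:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
grid = {"svc__C": [0.1, 1, 10, 100], "svc__gamma": [1e-3, 1e-2, 1e-1, 1]}
search = GridSearchCV(pipe, grid, cv=5).fit(X, y)   # 5-fold CV grid search
print(search.best_params_, round(search.best_score_, 3))
```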
A PSO-Based Subtractive Data Clustering Algorithm (IJORCS)
There is a tremendous proliferation in the amount of information available on the largest shared information source, the World Wide Web. Fast and high-quality clustering algorithms play an important role in helping users to effectively navigate, summarize, and organize the information. Recent studies have shown that partitional clustering algorithms such as the k-means algorithm are the most popular algorithms for clustering large datasets. The major problem with partitional clustering algorithms is that they are sensitive to the selection of the initial partitions and are prone to premature convergence to local optima. Subtractive clustering is a fast, one-pass algorithm for estimating the number of clusters and cluster centers for any given set of data. The cluster estimates can be used to initialize iterative optimization-based clustering methods and model identification methods. In this paper, we present a hybrid Particle Swarm Optimization, Subtractive + (PSO) clustering algorithm that performs fast clustering. For comparison purposes, we applied the Subtractive + (PSO) clustering algorithm, PSO, and Subtractive clustering on three different datasets. The results illustrate that the Subtractive + (PSO) clustering algorithm generates the most compact clustering results compared to the other algorithms.
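A sketch of the subtractive-clustering step used to seed the swarm, following Chiu's usual radius choices (the data here is synthetic): every point's "potential" counts its neighbors, the highest-potential point becomes a center, nearby potential is subtracted, and the process repeats.

```python
import numpy as np

def subtractive_centers(X, ra=1.0, eps=0.15):
    alpha, beta = 4 / ra**2, 4 / (1.5 * ra)**2
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise sq. dists
    P = np.exp(-alpha * d2).sum(1)                        # point potentials
    centers, first = [], P.max()
    while P.max() > eps * first:
        c = P.argmax()
        centers.append(X[c])
        P -= P[c] * np.exp(-beta * d2[:, c])              # suppress neighbors
        P = np.maximum(P, 0)
    return np.array(centers)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, .3, (50, 2)), rng.normal(3, .3, (50, 2))])
print(subtractive_centers(X))   # centers that can initialize PSO clustering
```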
A NEW PERSPECTIVE OF PARAMODULATION COMPLEXITY BY SOLVING 100 SLIDING BLOCK P... (ijaia)
A sliding puzzle is a combination puzzle where a player slides pieces along specific routes on a board to reach a certain end configuration. In this paper, we propose a novel measurement of the complexity of 100 sliding puzzles with paramodulation, which is an inference method of automated reasoning. It turns out that by counting the number of clauses yielded with paramodulation, we can evaluate the difficulty of each puzzle. In the experiment, we generated 100 8-puzzles that passed a solvability check based on counting inversions. By doing this, we can distinguish the complexity of 8-puzzles by the number of clauses generated with paramodulation. For example, board [2,3,6,1,7,8,5,4, hole] is the easiest with score 3008 and board [6,5,8,7,4,3,2,1, hole] is the most difficult with score 48653. Besides, we succeeded in observing several layers of complexity (the number of clauses generated) in the 100 puzzles. We conclude that the proposed method provides a new perspective on paramodulation complexity concerning sliding block puzzles.
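The solvability check the authors mention is standard: for the 3x3 8-puzzle, a board is solvable iff its permutation (ignoring the hole) has an even number of inversions. The two example boards from the abstract both pass:

```python
def inversions(board):
    tiles = [t for t in board if t is not None]   # drop the hole
    return sum(1 for i in range(len(tiles))
                 for j in range(i + 1, len(tiles)) if tiles[i] > tiles[j])

def solvable_8puzzle(board):
    return inversions(board) % 2 == 0             # even inversions => solvable

easiest = [2, 3, 6, 1, 7, 8, 5, 4, None]
hardest = [6, 5, 8, 7, 4, 3, 2, 1, None]
print(solvable_8puzzle(easiest), solvable_8puzzle(hardest))   # True True
```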
This document discusses using fuzzy set theory and decision trees to predict student performance. It proposes using fuzzy sets to represent numeric student data like test scores and attendance to allow for imprecise values. A decision tree is generated on this fuzzy data set to classify students as passing or failing. The fuzzy decision tree achieves an accuracy of 81.5% compared to 76% for a non-fuzzy decision tree, indicating fuzzy sets improve predictive performance. Location, attendance, and prior academic performance were identified as important factors impacting student results.
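A sketch of the fuzzification step (the breakpoints are illustrative, not the paper's): a numeric mark is mapped to graded membership in "low"/"medium"/"high" with triangular membership functions, so the tree can reason over imprecise values.

```python
def triangular(x, a, b, c):
    """Membership rises linearly a->b, falls b->c, zero outside [a, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzify_score(score):
    return {"low":    triangular(score, -1, 0, 50),
            "medium": triangular(score, 30, 50, 70),
            "high":   triangular(score, 50, 100, 101)}

print(fuzzify_score(55))   # partly "medium" (0.75), partly "high" (0.1)
```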
Multimodal Biometrics Recognition by Dimensionality Diminution Method (IJERA Editor)
Multimodal biometric systems utilize two or more biometric modalities, e.g., face, ear, fingerprint, signature, and palmprint, to improve the recognition accuracy of conventional unimodal methods. We propose a new dimensionality reduction method called Dimension Diminish Projection (DDP) in this paper. DDP can not only preserve local information by capturing the intra-modal geometry, but also effectively extract between-class relevant structures for classification. Experimental results show that our proposed method performs better than other algorithms including PCA, LDA and MFA.
CATEGORY TREES – CLASSIFIERS THAT BRANCH ON CATEGORY (ijaia)
This paper presents a batch classifier that splits a dataset into tree branches depending on the category type. It improves on an earlier version and fixes a mistake in the earlier paper. Two important changes have been made. The first is to represent each category with a separate classifier. Each classifier then classifies its own subset of data rows, using batch input values to create the centroid and also to represent the category itself. If the classifier contains data from more than one category, however, it needs to create new classifiers for the incorrect data. The second change therefore is to allow the classifier to branch to new layers when there is a split in the data, and to create new classifiers there for the data rows that are incorrectly classified. Each layer can therefore branch like a tree - not on distinguishing features, but on distinguishing categories. The paper then suggests a further innovation, which is to represent some data columns with fixed value ranges, or bands. When considering features, it is shown that some of the data can be classified directly through fixed value ranges, while the rest must be classified using a classifier technique; this idea allows the paper to discuss a biological analogy with neurons and neuron links. Tests show that the method can successfully classify a diverse set of benchmark datasets better than the state of the art.
This document discusses genomic meta-analysis and summarization techniques. It introduces MetaQC for quality control, MetaDE for detecting differentially expressed genes through meta-analysis, and MetaPCA for integrative visualization of multiple genomic studies. MetaQC uses quality measures to determine inclusion/exclusion of studies in meta-analysis. MetaDE detects biomarkers statistically significant across studies using Fisher's and adaptive weighting methods. MetaPCA integrates multiple genomic datasets by finding a common principal component space.
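The Fisher combination mentioned for MetaDE is a one-liner: under the null, X² = −2·Σ log(p_k) follows a chi-square distribution with 2k degrees of freedom. The p-values below are invented for illustration.

```python
import math
from scipy.stats import chi2

def fisher_combined(pvals):
    stat = -2 * sum(math.log(p) for p in pvals)
    return chi2.sf(stat, df=2 * len(pvals))   # combined p-value

print(fisher_combined([0.04, 0.10, 0.03]))    # one gene across three studies
```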
This document summarizes an introduction to deep learning with MXNet and R. It discusses MXNet, an open source deep learning framework, and how to use it with R. It then provides an example of using MXNet and R to build a deep learning model to predict heart disease by analyzing MRI images. Specifically, it discusses loading MRI data, architecting a convolutional neural network model, training the model, and evaluating predictions against actual heart volume measurements. The document concludes by discussing additional ways the model could be explored and improved.
USE OF ADAPTIVE COLOURED PETRI NETWORK IN SUPPORT OF DECISION MAKING (csandit)
This work presents the use of Adaptive Coloured Petri Nets (ACPN) in support of decision making. An ACPN is an extension of the Coloured Petri Net (CPN) that allows the network topology to be changed. Usually, experts in a particular field can establish a set of rules for the proper functioning of a business or even a manufacturing process. On the other hand, the same specialist may have difficulty incorporating this set of rules into a CPN that describes and follows the operation of the enterprise while, at the same time, adhering to the rules of good performance. To incorporate the expert's rules into a CPN, the rule set is transformed from the IF-THEN format into the extended adaptive decision table format and dynamically incorporated into the ACPN. The contribution of this paper is the use of ACPN to establish a method that allows proven procedures from one area of knowledge (decision tables) to be used in another area of knowledge (Petri nets and workflows), making the adaptation of techniques possible and paving the way for new kinds of analysis.
VARIATIONS IN OUTCOME FOR THE SAME MAP REDUCE TRANSITIVE CLOSURE ALGORITHM IM... (ijcsit)
This document summarizes research that implemented the same transitive closure algorithm for entity resolution on three different Apache Hadoop distributions: a local HDFS cluster, Cloudera Enterprise, and Talend Big Data Sandbox. The algorithm was run on a synthetic dataset to discover entity clusters. While the local HDFS cluster produced consistent results matching the baseline, the Cloudera and Talend platforms had inconsistent results due to differences in configuration requirements, load balancing, and blocking behavior across nodes. The experiments highlighted scalability issues for entity resolution processes in distributed environments due to inconsistencies introduced by differences in platform implementations.
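A single-machine reference for what such a MapReduce job computes (not the Hadoop implementation itself): entity clusters as connected components, found by propagating the minimum label along matched-record edges until a fixed point. Each pass corresponds roughly to one MapReduce round.

```python
def connected_components(edges):
    label = {}
    for a, b in edges:
        label.setdefault(a, a)
        label.setdefault(b, b)
    changed = True
    while changed:                       # each pass = one MapReduce round
        changed = False
        for a, b in edges:
            lo = min(label[a], label[b])
            if label[a] != lo or label[b] != lo:
                label[a] = label[b] = lo
                changed = True
    return label

pairs = [("r1", "r2"), ("r2", "r3"), ("r5", "r6")]   # matched record pairs
print(connected_components(pairs))   # r1-r3 share a cluster, r5-r6 another
```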
1) Interval classifiers are machine learning algorithms that originated in artificial intelligence research but are now being applied to database mining. They generate decision trees to classify data into intervals based on attribute values.
2) The author implemented the IC interval classifier algorithm and tested it on small datasets, finding higher classification errors than reported in literature due to small training set sizes. Parameter testing showed accuracy improved with larger training sets and more restrictive interval definitions.
3) While efficiency couldn't be fully tested, results suggest interval classifiers may perform well for database applications if further tuned based on dataset characteristics. More research is still needed on algorithm modifications and dynamic training approaches.
Test case optimization in configuration testing using ripper algorithm (eSAT Journals)
Abstract
Software systems are highly configurable. Although there are many advantages to improving configurability, it is difficult to test for the unique errors hiding in configurations. To overcome this problem, combinatorial interaction testing (CIT) is used: it selects a strength and computes a covering array that includes all configuration option combinations. However, CIT identifies the effective configuration space poorly, so the cost of testing increases. This work combines a hierarchical clustering algorithm with the RIPPER algorithm. It captures high-strength interactions that can be missed by the CIT approach, and it identifies the effective configuration space. We evaluated and compared the coverage achieved by CIT and by RIPPER classification with hierarchical clustering. Using this approach we validate loop-based as well as statement-based configurations. Our results strongly suggest that proto-interactions formed by RIPPER classification with hierarchical clustering cover sets of configurations more effectively than traditional CIT.
Keywords: Configuration options, Hierarchical Clustering, RIPPER Algorithm
This document discusses enabling interoperability between finite element tools using the HDF5 data format. It describes extracting data like element connectivity, node coordinates, and material properties from an Abaqus output database and exporting it to an HDF5 file with metadata. This allows other tools to access and analyze the data in a standardized way. Future work includes fully transferring problem definitions and exporting images to HDF5 to enable greater integration between simulation programs.
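What the export step might look like with h5py; the dataset and attribute names below are invented, not the tool's actual schema.

```python
import numpy as np
import h5py

nodes = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # node coordinates
elements = np.array([[0, 1, 2]])                          # connectivity

with h5py.File("model.h5", "w") as f:
    grp = f.create_group("mesh")
    grp.create_dataset("node_coordinates", data=nodes)
    grp.create_dataset("element_connectivity", data=elements)
    grp.attrs["element_type"] = "TRI3"                    # metadata
    f.create_group("materials").attrs["youngs_modulus"] = 210e9

with h5py.File("model.h5") as f:                          # any tool can read it
    print(f["mesh/node_coordinates"][...], dict(f["mesh"].attrs))
```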
This document discusses the history and implementation of regression tree models. It begins by covering early tree models from the 1960s-1980s like CART and GUIDE. It then discusses more modern unified frameworks using modular packages in R like partykit and mob models. The document provides an example using a Bradley-Terry tree to model preferences from paired comparisons. It concludes by discussing potential extensions to deep learning methods.
An Automatic Medical Image Segmentation using Teaching Learning Based Optimiz... (idescitation)
Nature-inspired population-based evolutionary algorithms are very popular owing to their competitive solutions for a wide variety of applications. Teaching Learning Based Optimization (TLBO) is a very recent population-based evolutionary algorithm modeled on the teaching-learning process of a classroom. TLBO does not require any algorithm-specific parameters. This paper proposes an automatic grouping of pixels into different homogeneous regions using TLBO. The experimental results demonstrate the effectiveness of TLBO in image segmentation.
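A minimal TLBO loop showing the parameter-free teacher and learner phases; the objective here is a toy function, where the paper's image-segmentation fitness (homogeneity of pixel groups) would go.

```python
import numpy as np

def tlbo(f, dim=2, pop=20, iters=100, lo=-5.0, hi=5.0):
    rng = np.random.default_rng(0)
    X = rng.uniform(lo, hi, (pop, dim))
    for _ in range(iters):
        scores = np.apply_along_axis(f, 1, X)
        teacher = X[scores.argmin()]
        mean = X.mean(0)
        for i in range(pop):
            # Teacher phase: move toward the teacher, away from class mean.
            TF = rng.integers(1, 3)              # teaching factor in {1, 2}
            new = np.clip(X[i] + rng.random(dim) * (teacher - TF * mean), lo, hi)
            if f(new) < f(X[i]):
                X[i] = new
            # Learner phase: learn from a random classmate.
            j = rng.integers(pop)
            step = X[i] - X[j] if f(X[i]) < f(X[j]) else X[j] - X[i]
            new = np.clip(X[i] + rng.random(dim) * step, lo, hi)
            if f(new) < f(X[i]):
                X[i] = new
    return X[np.apply_along_axis(f, 1, X).argmin()]

print(tlbo(lambda x: ((x - 1) ** 2).sum()))      # converges near [1, 1]
```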
This document is a machine learning class assignment submitted by Trushita Redij to their supervisor Abhishek Kaushik at Dublin Business School. The assignment discusses data preprocessing techniques, decision trees, the Chinese Restaurant algorithm, and building supervised learning models. Specifically, linear regression and KNN classification models are implemented on population data from Ireland to predict total population and classify countries.
This document describes a new multi-objective evolutionary algorithm called MOSCA2. MOSCA2 improves upon an earlier algorithm called MOSCA by using subpopulations instead of clusters, truncation selection instead of random selection, adding a recombination operator, and adding a separate archive to store non-dominated solutions. The algorithm uses subpopulations, truncation selection, and a deleting procedure to maintain diversity without needing density information or niche methods. It also uses a separate archive that stores and periodically updates non-dominated solutions found, deleting some when the archive becomes full. The algorithm is capable of solving both constrained and unconstrained nonlinear multi-objective optimization problems.
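A sketch of the separate archive described above: keep only non-dominated solutions, and when the archive overflows, delete the most crowded entry. The crowding measure here (nearest-neighbor distance) is an assumption, since MOSCA2's exact deleting procedure isn't given in the summary.

```python
import math

def dominates(a, b):   # minimization: a dominates b
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def update_archive(archive, candidate, cap=10):
    if any(dominates(a, candidate) for a in archive):
        return archive                                # dominated: reject
    archive = [a for a in archive if not dominates(candidate, a)]
    archive.append(candidate)
    if len(archive) > cap:                            # full: drop most crowded
        def nn_dist(p):
            return min(math.dist(p, q) for q in archive if q is not p)
        archive.remove(min(archive, key=nn_dist))
    return archive

arch = []
for obj in [(1, 5), (2, 2), (5, 1), (3, 3), (0.5, 6)]:
    arch = update_archive(arch, obj)
print(arch)    # the non-dominated front; (3, 3) is rejected by (2, 2)
```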
This document presents a study evaluating the performance of machine learning algorithms for network intrusion detection systems (NIDS) using benchmark datasets. Specifically, it applies an AdaBoost-based machine learning algorithm to NIDS and tests its detection accuracy on the KDD Cup 99 and NSL-KDD intrusion detection datasets. The experimental results show that the AdaBoost-based NIDS performs better on the NSL-KDD dataset compared to the KDD Cup 99 dataset, achieving a higher detection rate and lower false alarm rate.
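The shape of that experiment with scikit-learn's AdaBoost; NSL-KDD is not bundled with sklearn, so a stand-in dataset illustrates the detection-rate / false-alarm-rate evaluation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, weights=[0.8, 0.2],
                           random_state=0)   # 1 = "attack", 0 = "normal"
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

clf = AdaBoostClassifier(n_estimators=100).fit(Xtr, ytr)
tn, fp, fn, tp = confusion_matrix(yte, clf.predict(Xte)).ravel()
print("detection rate:", tp / (tp + fn), "false alarm rate:", fp / (fp + tn))
```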
The document discusses three approaches to integrating data mining capabilities with database systems: DMQL, MSQL, and OLE DB for DM. All three aim to allow users to define, populate, and query data mining models through SQL-like interfaces. DMQL focuses on specifying rules to discover. MSQL provides an expressive language for generating and querying association rules. OLE DB for DM supports defining models, populating them from training data, and using the models to predict attributes via a prediction join. An ideal solution would support defining models, querying model results, and applying models to new data within the database system.
Review of Existing Methods in K-means Clustering Algorithm (IRJET Journal)
This document reviews existing methods for improving the K-means clustering algorithm. K-means is widely used but has limitations such as sensitivity to outliers and initial centroid selection. The document summarizes several proposed approaches, including using MapReduce to select initial centroids and form clusters for large datasets, reducing execution time by cutting off iterations, improving cluster quality by selecting centroids systematically, and using sampling techniques to reduce I/O and network costs. It concludes that improved algorithms address K-means limitations better than the traditional approach.
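One concrete version of "selecting centroids systematically" is k-means++ seeding (not necessarily one of the reviewed papers' methods): each new centroid is drawn with probability proportional to its squared distance from the nearest centroid chosen so far.

```python
import numpy as np

def kmeanspp_init(X, k, seed=0):
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
        probs = d2 / d2.sum()                 # D^2 weighting
        centers.append(X[rng.choice(len(X), p=probs)])
    return np.array(centers)

X = np.vstack([np.random.default_rng(1).normal(m, 0.5, (50, 2))
               for m in (0, 5, 10)])
print(kmeanspp_init(X, 3))    # well-spread starting centroids
```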
Using particle swarm optimization to solve test functions problems (riyaniaes)
In this paper, benchmark functions are used to evaluate and check the particle swarm optimization (PSO) algorithm. The functions used are two-dimensional, but they were selected with different difficulties and different models. To demonstrate the capability of PSO, it is compared with a genetic algorithm (GA); the two algorithms are compared in terms of objective function values and standard deviation. Multiple runs were performed to obtain convincing results, the parameters were chosen appropriately, and the Matlab software was used. The suggested algorithm can solve different engineering problems of different dimensions and outperforms the others in terms of accuracy and speed of convergence.
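A minimal global-best PSO on a 2-D benchmark function, mirroring the setup described above (the parameters are common defaults, not the paper's):

```python
import numpy as np

def pso(f, dim=2, particles=30, iters=200, w=0.7, c1=1.5, c2=1.5):
    rng = np.random.default_rng(0)
    x = rng.uniform(-5, 5, (particles, dim))
    v = np.zeros_like(x)
    pbest, pbest_val = x.copy(), np.apply_along_axis(f, 1, x)
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        vals = np.apply_along_axis(f, 1, x)
        better = vals < pbest_val
        pbest[better], pbest_val[better] = x[better], vals[better]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest

rosenbrock = lambda p: (1 - p[0]) ** 2 + 100 * (p[1] - p[0] ** 2) ** 2
print(pso(rosenbrock))    # approaches the optimum at [1, 1]
```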
The wellbeing associated with the garden, beyond serving a therapeutic function by relieving stress, strengthening positive thoughts, reducing the consumption of certain medications, and reawakening physical activity, concerns everyone who wishes to regain a dimension of personal and relational wellbeing.
This document presents a method to minimize the execution time of SuperSQL queries by decomposing them into multiple SQL queries when possible. It describes an algorithm to check if a SuperSQL query can be divided based on the relationships between attributes. If divisible, the query is broken into independent SQL queries that are executed separately and then combined. Experiments show this approach reduces execution time for some queries. Future work includes handling more query types and more testing.
A network of well-connected contacts and skills finds business opportunities inside a coworking space. Places thus become potential incubators of public-private projects.
Artificial intelligence (AI) is everywhere, promising self-driving cars, medical breakthroughs, and new ways of working. But how do you separate hype from reality? How can your company apply AI to solve real business problems?
Here’s what AI learnings your business should keep in mind for 2017.
Lecture 2 Basic Concepts of Optimal Design and Optimization Techniques final1... (Khalil Alhatab)
This document provides an overview of optimization techniques and the process of formulating an optimal design problem. It begins with introducing optimization and defining it as finding the best solution under given circumstances by optimizing an objective function. It then contrasts conventional versus optimal design processes and discusses classifying optimization problems based on constraints, variables, objectives, and other factors. The document concludes by outlining the five steps to formulate an optimal design problem: project description, data collection, defining design variables, establishing an optimization criterion or objective function, and specifying constraints.
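The five formulation steps applied to a classic toy problem (the numbers are arbitrary): design a cylindrical can, with radius and height as design variables, minimizing surface area subject to holding a required volume.

```python
import numpy as np
from scipy.optimize import minimize

V_REQUIRED = 330.0                       # cm^3: the "data collection" step

def surface_area(x):                     # the optimization criterion
    r, h = x
    return 2 * np.pi * r * (r + h)

volume_constraint = {"type": "eq",       # the constraint specification
                     "fun": lambda x: np.pi * x[0] ** 2 * x[1] - V_REQUIRED}

res = minimize(surface_area, x0=[3.0, 10.0],      # design variables r, h
               constraints=[volume_constraint],
               bounds=[(0.1, None), (0.1, None)])
print(res.x)    # optimal r, h; h = 2r at the optimum, as theory predicts
```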
Course material from my Object-Oriented Development course. This presentation covers the analysis phases and focuses on class discovery, domain modeling, activity diagrams, and sequence diagrams.
Transfer Learning for Improving Model Predictions in Highly Configurable Soft... (Pooyan Jamshidi)
Modern software systems are now being built to be used in dynamic environments utilizing configuration capabilities to adapt to changes and external uncertainties. In a self-adaptation context, we are often interested in reasoning about the performance of the systems under different configurations. Usually, we learn a black-box model based on real measurements to predict the performance of the system given a specific configuration. However, as modern systems become more complex, there are many configuration parameters that may interact and, therefore, we end up learning an exponentially large configuration space. Naturally, this does not scale when relying on real measurements in the actual changing environment. We propose a different solution: Instead of taking the measurements from the real system, we learn the model using samples from other sources, such as simulators that approximate performance of the real system at low cost.
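The idea in miniature, under the assumption of a simple linear transfer model (not necessarily the authors' choice): learn a performance model from cheap simulator samples, then correct it with only a handful of real measurements by regressing the real response on the simulator prediction.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
configs = rng.uniform(0, 1, (300, 5))                 # configuration space
sim_perf = configs @ np.array([3, 1, 0, 2, 0.5]) + rng.normal(0, 0.1, 300)
real = lambda c: 1.3 * (c @ np.array([3, 1, 0, 2, 0.5])) + 2   # shifted truth

source = RandomForestRegressor(random_state=0).fit(configs, sim_perf)

few = rng.uniform(0, 1, (10, 5))                      # only 10 real samples
transfer = LinearRegression().fit(
    source.predict(few).reshape(-1, 1), real(few))    # correct the source model

test = rng.uniform(0, 1, (5, 5))
pred = transfer.predict(source.predict(test).reshape(-1, 1))
print(np.abs(pred - real(test)).mean())               # small error, few samples
```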
Transfer Learning for Improving Model Predictions in Robotic Systems (Pooyan Jamshidi)
Modern software systems are now being built to be used in dynamic environments utilizing configuration capabilities to adapt to changes and external uncertainties. In a self-adaptation context, we are often interested in reasoning about the performance of the systems under different configurations. Usually, we learn a black-box model based on real measurements to predict the performance of the system given a specific configuration. However, as modern systems become more complex, there are many configuration parameters that may interact and, therefore, we end up learning an exponentially large configuration space. Naturally, this does not scale when relying on real measurements in the actual changing environment. We propose a different solution: Instead of taking the measurements from the real system, we learn the model using samples from other sources, such as simulators that approximate performance of the real system at low cost.
DEVELOPING A NOVEL MULTIDIMENSIONAL MULTIGRANULARITY DATA MINING APPROACH FOR... (cscpconf)
Data Mining is one of the most significant tools for discovering association patterns that are useful for many knowledge domains. Yet, there are some drawbacks in existing mining techniques. Three main weaknesses of current data-mining techniques are: 1) the entire database must be re-scanned whenever new attributes are added; 2) an association rule may be true at a certain granularity but fail at smaller ones, and vice versa; 3) current methods can only be used to find either frequent rules or infrequent rules, but not both at the same time. This research proposes a novel data schema and an algorithm that solve the above weaknesses while improving the efficiency and effectiveness of data mining strategies. Crucial mechanisms in each step are clarified in this paper. Finally, this paper presents experimental results regarding the efficiency, scalability, information loss, etc. of the proposed approach to prove its advantages.
Performance Evaluation: A Comparative Study of Various Classifiers (amreshkr19)
This document summarizes a study that evaluated the performance of various machine learning classifiers on a dataset. Six classifiers were tested using the Weka machine learning tool: SMO, REPTree, IBK, Logistic, Multilayer Perceptron, and DMNBText. Their performance was measured based on correctly classified instances, ROC area, and other metrics. Feature selection was also performed to identify the most important attributes and evaluate how classification performance changes after removing less important attributes. The Multilayer Perceptron classifier achieved 100% accuracy on the dataset both with and without feature selection.
This document provides an overview of machine learning and case-based reasoning. It begins with an introduction to machine learning that defines the term and discusses common approaches like decision trees, artificial neural networks, genetic algorithms, and reinforcement learning. It then provides a more in-depth discussion of case-based reasoning, covering how it draws from psychological models of human reasoning and problem-solving. The document discusses both the advantages and disadvantages of machine learning approaches.
This document provides an overview of a course on data structures and algorithm analysis. The course is worth 3+1 credit hours and is taught by Dr. Muhammad Anwar. The objective is for students to learn about different data structures, time/space complexity analysis, and implementing data structures in C++. Topics covered include arrays, linked lists, stacks, queues, trees, graphs, and sorting/searching algorithms. Student work is graded based on exams, practical assignments, quizzes, and projects.
The document discusses object-relational and extended relational databases. It covers how an ORDBMS supports both relational and object-oriented aspects by allowing objects, classes, inheritance and other OO concepts in database schemas and queries. It provides examples of using ADTs and structured types to store complex data like videos more efficiently compared to a traditional RDBMS. Query processing and optimization techniques for ORDBMS are also discussed, such as user-defined aggregates, method caching and pointer swizzling.
Multi-objective Optimization of PID Controller using Pareto-based Surrogate ... (IJECEIAES)
Most control engineering problems are characterized by several objectives which have to be satisfied simultaneously. Two widely used methods for finding the optimal solution to such problems are aggregating into a single criterion and using Pareto-optimal solutions. This paper proposes a Pareto-based Surrogate Modeling Algorithm (PSMA), combining Surrogate Modeling (SM) optimization with Pareto-optimal solutions to find a fixed-gain, discrete-time Proportional Integral Derivative (PID) controller for a Multi Input Multi Output (MIMO) Forced Circulation Evaporator (FCE) process plant. Experimental results show that a multi-objective PSMA search was able to give a good approximation to the optimum controller parameters in this case. The Non-dominated Sorting Genetic Algorithm II (NSGA-II) method was also used to optimize the controller parameters, as a comparison with PSMA.
The document discusses using a genetic algorithm to schedule tasks in a cloud computing environment. It aims to minimize task execution time and reduce computational costs compared to the traditional Round Robin scheduling algorithm. The proposed genetic algorithm mimics natural selection and genetics to evolve optimal task schedules. It was tested using the CloudSim simulation toolkit and results showed the genetic algorithm provided better performance than Round Robin scheduling.
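A sketch of that GA (task lengths and VM speeds are invented): a chromosome assigns each task to a VM, fitness is the makespan (finish time of the busiest VM), and the evolved schedule is compared against Round Robin.

```python
import random

TASKS = [random.randint(100, 1000) for _ in range(30)]   # task lengths (MI)
SPEEDS = [500, 750, 1000]                                # VM speeds (MIPS)

def makespan(assign):
    load = [0.0] * len(SPEEDS)
    for t, vm in zip(TASKS, assign):
        load[vm] += t / SPEEDS[vm]
    return max(load)

def ga(pop_size=40, gens=200, mut=0.05):
    pop = [[random.randrange(len(SPEEDS)) for _ in TASKS]
           for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=makespan)
        elite = pop[:pop_size // 2]                      # selection
        children = []
        for _ in range(pop_size - len(elite)):
            a, b = random.sample(elite, 2)
            cut = random.randrange(1, len(TASKS))
            child = a[:cut] + b[cut:]                    # crossover
            children.append([random.randrange(len(SPEEDS))
                             if random.random() < mut else g for g in child])
        pop = elite + children
    return min(pop, key=makespan)

round_robin = [i % len(SPEEDS) for i in range(len(TASKS))]
print("RR makespan:", makespan(round_robin), "GA makespan:", makespan(ga()))
```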
Applying Machine Learning to Software Clustering (butest)
This document discusses applying machine learning techniques to the problem of automatically clustering source code files into subsystems. Specifically, it formulates software clustering as a supervised machine learning problem, where a learner is trained on a subset of files that have been manually categorized and then aims to generalize that categorization to other files. The document tests two machine learning algorithms - Naive Bayes and Nearest Neighbor - on decompositions of three software systems, with the Nearest Neighbor algorithm achieving the best results.
Query optimization in oodbms identifying subquery for query management (ijdms)
This paper is based on a relatively new approach for query optimization in object databases, which uses query decomposition and cached query results to improve the execution of a query. The issues focused on here are fast retrieval and high reuse of cached queries, and the decomposition of complex queries into smaller subqueries for fast retrieval of results. We also address another open area of query caching: handling wider queries, by using parts of cached results to help answer other (wider) queries and by combining many cached queries while producing the result. Multiple experiments were performed to prove the productivity of this new way of optimizing a query. The limitation of this technique is that it is useful especially in scenarios where the data manipulation rate is very low compared to the data retrieval rate.
The paper presents a new language called UDITA for describing tests. UDITA is a Java-based language that includes non-deterministic choice operators and an interface for generating linked data structures. This allows for more efficient and effective test generation compared to previous approaches. The language aims to make test specification easier while generating tests that are faster, of higher quality, and less complex than traditional manually written or randomly generated tests.
TASK SCHEDULING USING AMALGAMATION OF METAHEURISTICS SWARM OPTIMIZATION ALGOR... (Journal For Research)
Cloud Computing is the latest networking technology and a popular archetype for hosting applications and delivering services over the network. The foremost technology of cloud computing is virtualization, which enables building applications, dynamically sharing resources, and providing diverse services to cloud users. With virtualization, a service provider can guarantee Quality of Service to the user while achieving higher server consumption and energy competence. One of the most important challenges in the cloud computing environment is the VM placement and task scheduling problem. This paper focuses on Metaheuristic Swarm Optimisation Algorithms (MSOA) to address the problem of VM placement and task scheduling in the cloud environment. MSOA is a simple parallel algorithm that can be applied in different ways to resolve task scheduling problems. The proposed algorithm is an amalgamation of the SO algorithm and the Cuckoo Search (CS) algorithm, called MSOACS. The proposed algorithm is evaluated using the Cloudsim simulator. The results prove that the proposed MSOACS algorithm reduces makespan and increases the utilization ratio compared with SOA algorithms and Randomised Allocation (RA).
Supervised Machine Learning: A Review of Classification ... (butest)
This document provides an overview of supervised machine learning classification techniques. It discusses 1) general issues in supervised learning such as data preprocessing, feature selection, and algorithm selection, 2) logical/symbolic techniques, 3) perceptron-based techniques, 4) statistical techniques, 5) instance-based learners, 6) support vector machines, and 7) directions for classifier selection. The goal is to describe various supervised machine learning algorithms and provide references for further research rather than provide a comprehensive review of all techniques.
FREE A4 Cyber Security Awareness Posters-Social Engineering part 3 (Data Hops)
Free A4 downloadable and printable cyber security and social engineering safety training posters. Promote security awareness in the home or workplace. Lock Them Out, from training providers datahops.com.
Taking AI to the Next Level in Manufacturing.pdf (ssuserfac0301)
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
Monitoring and Managing Anomaly Detection on OpenShift.pdf (Tosin Akinosho)
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframePrecisely
Inconsistent user experience and siloed data, high costs, and changing customer expectations – Citizens Bank was experiencing these challenges while it was attempting to deliver a superior digital banking experience for its clients. Its core banking applications run on the mainframe and Citizens was using legacy utilities to get the critical mainframe data to feed customer-facing channels, like call centers, web, and mobile. Ultimately, this led to higher operating costs (MIPS), delayed response times, and longer time to market.
Ever-changing customer expectations demand more modern digital experiences, and the bank needed to find a solution that could provide real-time data to its customer channels with low latency and operating costs. Join this session to learn how Citizens is leveraging Precisely to replicate mainframe data to its customer channels and deliver on their “modern digital bank” experiences.
5th LF Energy Power Grid Model Meet-up SlidesDanBrown980551
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Mircosoft Teams session or in person at TU/e located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid -Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...alexjohnson7307
Predictive maintenance is a proactive approach that anticipates equipment failures before they happen. At the forefront of this innovative strategy is Artificial Intelligence (AI), which brings unprecedented precision and efficiency. AI in predictive maintenance is transforming industries by reducing downtime, minimizing costs, and enhancing productivity.
Trusted Execution Environment for Decentralized Process MiningLucaBarbaro3
Presentation of the paper "Trusted Execution Environment for Decentralized Process Mining" given during the CAiSE 2024 Conference in Cyprus on June 7, 2024.
Digital Marketing Trends in 2024 | Guide for Staying AheadWask
https://www.wask.co/ebooks/digital-marketing-trends-in-2024
Feeling lost in the digital marketing whirlwind of 2024? Technology is changing, consumer habits are evolving, and staying ahead of the curve feels like a never-ending pursuit. This e-book is your compass. Dive into actionable insights to handle the complexities of modern marketing. From hyper-personalization to the power of user-generated content, learn how to build long-term relationships with your audience and unlock the secrets to success in the ever-shifting digital landscape.
Introduction of Cybersecurity with OSS at Code Europe 2024Hiroshi SHIBATA
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
A Comprehensive Guide to DeFi Development Services in 2024Intelisync
DeFi represents a paradigm shift in the financial industry. Instead of relying on traditional, centralized institutions like banks, DeFi leverages blockchain technology to create a decentralized network of financial services. This means that financial transactions can occur directly between parties, without intermediaries, using smart contracts on platforms like Ethereum.
In 2024, we are witnessing an explosion of new DeFi projects and protocols, each pushing the boundaries of what’s possible in finance.
In summary, DeFi in 2024 is not just a trend; it’s a revolution that democratizes finance, enhances security and transparency, and fosters continuous innovation. As we proceed through this presentation, we'll explore the various components and services of DeFi in detail, shedding light on how they are transforming the financial landscape.
At Intelisync, we specialize in providing comprehensive DeFi development services tailored to meet the unique needs of our clients. From smart contract development to dApp creation and security audits, we ensure that your DeFi project is built with innovation, security, and scalability in mind. Trust Intelisync to guide you through the intricate landscape of decentralized finance and unlock the full potential of blockchain technology.
Ready to take your DeFi project to the next level? Partner with Intelisync for expert DeFi development services today!
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
Skybuffer SAM4U tool for SAP license adoptionTatiana Kojar
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
Best 20 SEO Techniques To Improve Website Visibility In SERP
20110516_ria_ENC
1. Query Optimization Using
Case-based Reasoning in
Ubiquitous Environments
Lourdes Angelica Martinez-Medina
Christophe Bobineau
Jose-Luis Zechinelli-Martini
2009 Mexican International Conference on Computer Science (ENC '09)
2011/05/16 - Ria Mae Borromeo
2. Introduction
Query Optimization
Rely on cost models that are dependent on metadata (statistics,
cardinality estimates)
Typically restricted to execution time estimation
Problem
There are computational environments where metadata
acquisition and support are expensive.
i.e. Ubiquitous environments
Proposed Solution
Query Optimization technique based on learning, particularly
case-based reasoning
2
3. Ubiquitous Environment
Integrates information from different computational tools and applications
Characteristics
1. Heterogeneity
• extensive range of computational resources and electronic devices
• devices have different physical and logical characteristics
2. Dynamicity
• resources change continuously due to mobility
• communication network properties and the resources that interact with it vary
3
4. Ubiquitous Environment
3. Distribution
• resources are distributed within a physical space, thus the information used by these resources is also distributed
4. Autonomy
• resources can change their availability status at any time
5. Physical Constraints
• e.g. processing and storage capability, energy consumption, location
6. Lack of Metadata
• constant changes --> expensive maintenance --> no global schema
4
5. Classical Query Optimization
The classical query optimization process is composed of three phases: logical, global, and physical.
Logical and physical optimization phases are related to centralized environments. Global optimization is required in distributed environments.
Evaluation cost models used by most classical query optimization techniques are tightly tied to metadata.
Each phase requires different metadata types and has different optimization objectives.
Figure 1 illustrates the phases of the typical optimization process.
Figure 1. Phases of the optimization process
5
6. Classical Query Optimization
Logical Optimization
Aims to reduce the number of tuples combined as intermediate results
The appropriate order for applying selection, projection, and join operators must be decided
Uses heuristics and metadata
Result: algebraic query trees (Figure 2)
6
7. Classical Query Optimization
Global Optimization
Aims to minimize the communication cost related to interactions among resources and a set of views
Global optimizer: decides where to perform each part of the execution tree
Result: a new execution tree with communication operators
7
8. Classical Query Optimization
Physical Optimization
Aims to reduce disk access for retrieving requested data and to minimize the execution time of query plans
Metadata related to the execution context is required
Given: algebraic query trees (Figure 2). Result: query execution plan (Figure 3)
8
9. Contribution of the Paper
Proposes a query optimization technique for ubiquitous environments
Allows query optimization according to user requirements
Query optimization based on learning
Goal: improve or acquire new capabilities from experience related to some specific tasks
9
10. Query Optimization Based on Learning
Learn from past experience!
Experience: the knowledge gained from a problem resolution
Learning: the acquisition of knowledge in order to improve the behavior or to acquire new capabilities from previous experiences
Machine Learning: a sub-discipline of AI that is in charge of designing and developing methods that allow computers to automatically learn in order to improve or create specific capabilities
10
11. Case-based Reasoning
Proposes a reasoning process that aims to solve new problems using the experience gained when similar problems were solved
Case: the minimum unit of reasoning, consisting of
• a problem description
• a solution
• a set of annotations that describe how the solution was derived
11
12. Case-based Reasoning Process
Case-based reasoning has been formalized as a four-step process: retrieve, reuse, review, and retain [7].
(1) Get relevant cases
(2) Adjust the solution of the relevant case to the problem
(3) The new solution must be verified in the real world (simulation)
(4) Store as a new case in the memory
Figure 4. Case-based reasoning process
12
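To make the four steps concrete, here is a minimal Python sketch of one pass through this cycle. It is an illustration only, not the paper's implementation; the helpers similarity, adapt, verify, and random_plan are hypothetical callables assumed to be supplied by the caller.

    from dataclasses import dataclass
    from typing import Any, Callable, List

    @dataclass
    class Case:
        # A case: problem description, its solution, and annotations
        # (here, the consumption measures collected when it was solved).
        problem: Any
        solution: Any
        measures: dict

    def solve(problem: Any, case_base: List[Case],
              similarity: Callable[[Any, Any], float],
              adapt: Callable[[Any, Any], Any],
              verify: Callable[[Any, Any], dict],
              random_plan: Callable[[Any], Any]) -> Any:
        # (1) Retrieve: the most relevant stored case, if any.
        relevant = max(case_base, default=None,
                       key=lambda c: similarity(c.problem, problem))
        if relevant is None:
            solution = random_plan(problem)  # bootstrap an empty case base
        else:
            # (2) Reuse: adjust the relevant solution to the new problem.
            solution = adapt(relevant.solution, problem)
        # (3) Review: verify the plan (e.g. by simulation), collect measures.
        measures = verify(solution, problem)
        # (4) Retain: store the experience as a new case in the memory.
        case_base.append(Case(problem, solution, measures))
        return solution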
13. Case-based Reasoning Adaptation to Query Optimization
Adapts case-based reasoning to provide optimal execution plans for new queries
Uses the knowledge acquired from experience to optimize and execute similar queries
The solution is represented by the current execution plan. Four elements are involved:
1. Query
2. Problem
3. Case
4. Reasoning Process
13
14. 1. Query
The query: a modular part of knowledge in the definition of a problem and a case, and the most important piece of knowledge that links a problem with the existing cases.
Figure 5 illustrates the model that we propose for representing a query: a selectClause, a fromClause, and a whereClause.
The whereClause specifies the set of conditions (for data selection and data combination or join) that must be verified by the data to form part of the query result. In a query, selection and join operations are the most frequent.
Figure 5. Query representation (UML diagram)
14
15. 1. Query
Query operation types:
• Select: condition(attrexp, cnstexp)
• Join: condition(attrexp.a, attrexp.b)
Each operation involves a set of attributes and a specific condition.
Q = {O1, O2, O3, O4}
SELECT Resto.nom
FROM Resto, Ville, Region
WHERE Region.nom = ‘RA’ (O1)
AND Resto.spec = ‘IT’ (O2)
AND Resto.vil = Ville.nom (O3)
AND Ville.numDep = Region.numDep (O4)
15
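As an illustration (the encoding is ours, not the paper's), the example query can be written as a set of typed operations, each recording its type, attributes, comparison operator, and constant:

    # Each operation: (type, attributes, comparison operator, constant or None).
    O1 = ("select", ("Region.nom",), "Equal", "RA")
    O2 = ("select", ("Resto.spec",), "Equal", "IT")
    O3 = ("join", ("Resto.vil", "Ville.nom"), "Equal", None)
    O4 = ("join", ("Ville.numDep", "Region.numDep"), "Equal", None)
    Q = {O1, O2, O3, O4}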
16. 1. Query
Operation Family
We propose the concept of operation family in order to group operations that include the same condition applied to the same attributes (and, for this reason, to the same relations).
Two operations Ox and Oy are from the same operation family if they have:
• the same operation type (selection or join)
• the same attributes (each of them must pertain to the same data source)
All queries are associated to a set of operation families. An operation family is represented as follows:
(1) F(R.an) = { On | On = condition(R.an, value) }
The operation family is composed by the set of operations On with a condition on the attribute an that pertains to the relation R.
16
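A minimal sketch of this grouping, reusing the illustrative encoding above: a family is identified by the operation type and the attributes, and nothing else.

    from collections import defaultdict

    def family_key(op):
        # Per definition (1): the operator and the value do not matter;
        # only the operation type and the attribute(s) define the family.
        op_type, attrs, _operator, _value = op
        return (op_type, attrs)

    def group_into_families(operations):
        families = defaultdict(set)
        for op in operations:
            families[family_key(op)].add(op)
        return dict(families)

    # group_into_families(Q) yields one family per distinct (type, attributes),
    # e.g. ("select", ("Region.nom",)) -> {O1}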
17. 1. Query
The condition of an operation may use any of the possible comparison operators: Equal, EqualOrLower, Lower, GreaterOrEqual, Greater, and Different. All queries are associated to a set of operation families.
The whereClause of a query Q is defined by an operations set; these operations are members of different operation families. Equation (2) shows the operation families that are associated to each of the operations in Q:
(2) Q = { F(R1.a1), F(R2.a2), F(R1.a3,R2.a4), F(R2.a4,R3.a5) }
Each different combination of operation families conforms a class description, i.e. the class Cn defined by the operation families in (3). The queries are classified in a set of classes:
(3) Cn = { F(Rn.an), F(Rm.am), F(Rn.ap,Rm.aq), F(R2.a4,R3.a5) }
The class Cn is composed of all queries Qn that contain at least one operation that pertains to each of the specified families.
17
18. 1. Query
This means, a query Qn pertains to the class Cn if and only if, for every operation family F(Rn.an) that describes Cn, there exists an operation On in Qn such that this operation is of the form of the operation family, as defined in (4):
(4) Qn ∈ Cn iff (∀ F(Rn.an) ∈ Cn) ∃ ((On ∈ Qn) ∧ (On ∈ F(Rn.an)))
The operator and the attribute value are not important to determine the operation family to which a specific operation pertains; the important knowledge is related to the operation type and the attribute(s) included in the operation.
According to the query Q presented above, the selection operation O1 pertains to the operation family F(R3.a5), the selection operation O2 pertains to F(R2.a2), the join operation O3 pertains to F(R2.a4,R1.a3), and the join operation O4 pertains to F(R1.a1,R3.a6). The operation families described before make up a class:
a) C = { F(R3.a5), F(R2.a2), F(R2.a4,R1.a3), F(R1.a1,R3.a6) }
b) q ∈ C iff (∀ F(Rn.an) ∈ C) ∃ ((On ∈ q) ∧ (On ∈ F(Rn.an)))
Any query that is composed of operations that pertain to the families described before pertains to the same class!
(Figure: the relations Ville, Resto, and Region with their attributes)
18
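Under the same illustrative encoding, the membership test of (4) reduces to checking that every family describing the class is covered by at least one operation of the query (family_key is the helper from the earlier sketch):

    def pertains_to(query_ops, class_families):
        # Equation (4): q ∈ C iff for every family in C there exists an
        # operation of q that pertains to that family.
        covered = {family_key(op) for op in query_ops}
        return all(f in covered for f in class_families)

    # Example: Q pertains to the class built from its own families.
    # pertains_to(Q, set(group_into_families(Q))) -> True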
19. 2. Problem
A problem is composed of a query, an execution context representation, and an optimization objective.
It specifies the query to optimize, the optimization parameters, and measures related to the computational resources available for query execution.
It is also necessary to pay attention to the computational resources consumed by the query and those that are available at the moment the new query will be executed, as well as to the optimization objective, which can change each time the query is executed.
Figure 6 illustrates the components of a problem.
Figure 6. Problem representation (UML diagram)
19
20. 2. Problem
The context represents a measure of the computational resources available when the query is executed, e.g. CPU charge, available memory, and remaining energy, among others.
The set of tuples that represents the instance of context depicted in Figure 7 is: Context = { <memory, 400>, <CPU, 75>, <energy, 70> }
Finally, the optimization objective indicates the resource or set of resources whose consumption must be optimized, e.g. minimize energy consumption. Figure 7 shows an example.
Figure 7. An example of a problem instance
20
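A small sketch of how such a problem instance could be represented (our encoding, not the paper's, with a feasibility check reflecting the requirement that a retrieved plan must fit the resources available at execution time):

    context = {"memory": 400, "CPU": 75, "energy": 70}  # <attribute, value> tuples
    objective = "memory"  # F(memory): minimize memory consumption

    def feasible(plan_cost: dict, ctx: dict) -> bool:
        # A plan is only usable if every resource it needs is
        # available in the current execution context.
        return all(plan_cost.get(r, 0) <= available
                   for r, available in ctx.items())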
21. 3. Case
Typically, optimization means minimizing the utilization of these resources. According to our example, the optimization objective is to minimize the memory consumption, specified by F(memory).
A case is composed of a query, a solution (query plan), and a set of evaluation measures used to express the optimization objective of the query and the measures related to the computational resources that were consumed by the query execution.
Figure 8 illustrates the components of a case:
• query
• solution
• evaluation measures used to express the optimization objective
Figure 8. Case representation (UML diagram)
21
22. 3. Case
Query: the optimization target that has been evaluated and solved
Solution: the physical execution plan that solves the query, i.e. an ordered and pertinent sequence of selection, projection, sort, and join operations for accessing a set of data sources
Evaluation measures: the set of measures collected during the query execution. These measures are represented as couples of the form <attribute, value> and express the computational resources (e.g. memory, CPU, or energy) consumed by the query execution.
Figure 9 presents a simple instance of this model: the query Q, the query plan that solves it, and the set of tuples representing the resources consumed during the query evaluation when applying the proposed query plan.
Figure 9. An example of a case
22
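Continuing the illustrative sketches (Case and Q as defined earlier), a stored case pairs the evaluated query with its plan and the measured consumption; all values here are invented for illustration:

    case = Case(
        problem=Q,  # the evaluated query
        solution=("select(Resto.spec = 'IT')",
                  "join(Resto.vil, Ville.nom)",
                  "join(Ville.numDep, Region.numDep)",
                  "project(Resto.nom)"),  # a physical plan, schematically
        measures={"memory": 120, "CPU": 35, "energy": 10},  # consumed resources
    )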
23. 4. Reasoning Process
Case-based reasoning has been formalized as a four-step process: retrieve, reuse, review, and retain [7].
Retrieval
* Get relevant cases using a similarity function (query class, query plan)
* If there is no relevant case in the case base, a new query plan must be pseudo-randomly generated to increase the query optimizer knowledge
Reuse
* Adjust the solution of the relevant case to the problem
* The matching process depends on the cases’ similarity
Review
* The execution plan is verified
Retention
* The query plan and consumption measures are stored in the form of a case within the case base
Figure 4. Case-based reasoning process
23
24. Similarity Function
Inter-class similarity function
* used to define the membership of a query to a class
Intra-class similarity function
* used to retrieve the most relevant case within the class [10][11]
When the most relevant case is retrieved, a detailed comparison between the clauses of the new query and the relevant query (the query included in the relevant case) is carried out. This determines a similarity level between the two queries.
These functions are based on the contrast model of similarity proposed by Tversky [12], which allows us to determine the similarity between two objects by means of a feature-matching function: similarity increases with the most common features and decreases with the most distinctive features [13]. The formalization of the original definition is expressed as follows [12]:
(5) S(a, b) = θf(A ∩ B) - αf(A - B) - βf(B - A)
Similarity between a and b is defined in terms of the features common to both.
24
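A direct Python rendering of (5), assuming, as one simple choice, that the salience function f is the cardinality of the feature set:

    def tversky(A: set, B: set, theta=1.0, alpha=0.5, beta=0.5) -> float:
        # Contrast model (5): S(a, b) = θf(A ∩ B) − αf(A − B) − βf(B − A),
        # with f taken here as set cardinality.
        return (theta * len(A & B)
                - alpha * len(A - B)
                - beta * len(B - A))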
25. Inter-class Similarity
An increasing function of common operation families and a decreasing function of distinctive families, in other words, families that pertain to one query but not to the other.
The function can be applied to two classes, each one defined by a set of operation families, or applied to a query and a class. In the latter case, it is necessary to determine the operation families related to the involved operations.
The formalization of this definition in terms of the similarity between a query and a class is expressed as follows:
(6) S(C1, Q) = θ(C1 ∩ Q) - α(C1 - Q) - β(Q - C1)
Similarity between C1 and Q is defined in terms of the operation families common to C1 and Q (C1 ∩ Q), the features that pertain to C1 but not to Q (C1 - Q), and those that pertain to Q but not to C1 (Q - C1).
The function f refers particularly to operation families. According to the purpose of our work, these are the features that must be compared.
25
26. Inter-class Similarity
For practical purposes, suppose that we know the class c of the query q and the definitions of the classes c1 and c2:
q = {o1, o2, o3}    c = { F(R.a1), F(R.a2), F(R.a3,R.a4) }
c1 = { F(R.a1), F(R.a2), F(R.a3,R.a4) }
c2 = { F(R.a1), F(R.a2), F(x) }
Compute the intersections of c with c1 and c2:
S(c1, q) = (C1 ∩ Q) = { F(R.a1), F(R.a2), F(R.a3,R.a4) }
S(c2, q) = (C2 ∩ Q) = { F(R.a1), F(R.a2) }
From the intersections between the query class c that describes the query q and the classes c1 and c2, it is possible to state that the query class c is similar to c1.
26
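Running the tversky sketch from (5) on this example, with family names as plain strings, reproduces the conclusion:

    c = {"F(R.a1)", "F(R.a2)", "F(R.a3,R.a4)"}  # class of the new query q
    c1 = {"F(R.a1)", "F(R.a2)", "F(R.a3,R.a4)"}
    c2 = {"F(R.a1)", "F(R.a2)", "F(x)"}

    print(tversky(c1, c))  # 3.0: all three families in common
    print(tversky(c2, c))  # 1.0: two in common, one distinctive on each side
    # c is most similar to c1.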
27. Intra-class Similarity
Aims to find the most similar queries with respect to a new query, which is desired to be optimized, within the same class.
In this step, all compared queries are defined exactly by the same operations (operation type and involved attributes); the difference is related to the comparison operators, as well as to the attribute values.
Similarity between two queries Q1 and Q2 is defined as an increasing function of common operations (identical operations in terms of type, attributes, and operators). The formalization of this definition is as follows:
(7) S(Q1, Q2) = θo(Q1 ∩ Q2) - αo(Q1 - Q2) - βo(Q2 - Q1)
Operations that are common to Q1 and Q2, features that pertain to Q1 but not to Q2, and those that pertain to Q2 but not to Q1.
Find the query that contains the maximum number of operation mappings!
27
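The same contrast form applies at the operation level. A sketch that scores stored queries against the new one and keeps the best match (again an illustration, not the paper's code):

    def intra_class_similarity(Q1: set, Q2: set,
                               theta=1.0, alpha=0.5, beta=0.5) -> float:
        # Equation (7), over complete operations (type, attributes,
        # comparison operator, value) rather than operation families.
        return (theta * len(Q1 & Q2)
                - alpha * len(Q1 - Q2)
                - beta * len(Q2 - Q1))

    def most_similar(new_query: set, stored_queries):
        # Retrieve the stored query with the maximum number of operation mappings.
        return max(stored_queries,
                   key=lambda q: intra_class_similarity(new_query, q))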
28. Query Optimizer Architecture
The optimizer is composed of two main modules: the case-based reasoner and the execution plan generator. Figure 10 illustrates the optimizer architecture.
The case-based reasoner is in charge of adapting the solutions of similar queries to the new situation; it reutilizes the solutions related to queries that have already been solved.
The execution plan generator is in charge of generating new query plans in a pseudo-aleatory way; it generates new solutions.
The case-based reasoner is the most complex of the two modules but the smartest; the execution plan generator, on the other hand, is simpler and probably faster, although it does not apply machine learning techniques.
The adaptation is performed just over the Select and Where clauses: the attributes to be projected in the Select clause, and the comparison operators or values in the Where clause, can be modified. The From clause can not be changed because the queried data sources can not be changed. Table I illustrates the similarity levels (selectClause is expressed as SC, fromClause as FC, and whereClause as WC).
A. Case-based Reasoner
1. Smart Search Engine
2. Adapter
3. Execution Manager
4. Case Base Manager
B. Execution Plan Generator
Figure 10. Optimizer architecture
28
29. Case-based Reasoner
Adapts the solutions of similar queries to the new situation
1. Smart Search Engine
• retrieves relevant cases
• applies the inter- and intra-class similarity functions
• selects the query that minimizes the optimization parameters
2. Adapter
• adapts the query plan included in the relevant case to the query problem specifications
• the similarity levels are used to facilitate and minimize the cost of the adaptation process
29
30. Case-based Reasoner
3. Execution Engine
• tests the new query execution plan created by the adaptation module
4. Case Base Manager
• allows retaining new knowledge in the form of a case
• the similarity function is also used
30
This is basically the summary of the entire paper.
Result: an algebraic query tree that optimizes the order in which the operators must be applied. Tree A is not the best plan because the selection operations are applied only after the join. Tree B is the optimal algebraic plan because all selection and projection operations are applied as soon as possible.
In a ubiquitous environment, there are no global views because they are expensive!
Given: an algebraic tree (from logical optimization). Result: all corresponding execution plans that specify the implementation of each algebraic operator.
- Classical query optimization techniques typically generate execution plans that are optimized according to a single dimension: query execution time.
- Useful knowledge must be obtained from previously executed queries and be managed and exploited by means of automatic learning techniques.
- GOAL: improve or acquire new capabilities from experience related to some specific tasks.
- Query evaluation time is no longer the main optimization objective.
Given a new query Q, an existing query plan is retrieved if it can be adapted to Q. It is also required to verify whether it is possible to accomplish its execution with the computational resources available at the moment of query execution (memory, CPU, energy).
It is also necessary to pay attention to the computational resources consumed by the query and those that are available at the moment the new query will be executed, as well as to the optimization objective, which can change each time the query is executed.
The operator and the attribute value are not important to determine the operation family to which a specific operation pertains; the important knowledge is related to the operation type and the attribute(s) included in the operations.
Similarity of a and b is defined in terms of the features common to a and b, minus the features that pertain to a but not to b, and those that pertain to b but not to a. Theta, alpha, and beta are non-negative parameters that determine the relative weight of the three components of similarity; they provide flexibility for modifying the importance of similarities or differences according to the area of application.