07.Overview_of_Query_Processing.pdf

Distributed Database Systems
Autumn, 2008
Chapter 7
Overview of Query
Processing

SQL: Non-Procedural Language of RDB
 Tuple calculus
◦ { t | F(t) } where:
 t : tuple variable
 F(t) : well formed formula
 Example
◦ Get the No. and name of all managers
{ t[ENO, ENAME] | t ∈ EMP ∧ t[TITLE] = "MANAGER" }

 Domain calculus
◦ { x1, x2, …, xn | F(x1, x2, …, xn) } where:
 xi : domain variables
 F(x1, x2, …, xn) : well formed formula
 Example
◦ { x, y | E(x, y, "manager") }

Variables are position sensitive!
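
As a rough illustration (not part of the original slides), both calculus forms can be mimicked with Python comprehensions. The EMP rows below are made-up sample data. The tuple-calculus version picks attributes by name, while the domain-calculus version binds variables purely by position, which is why swapping two variables changes the answer.

```python
from collections import namedtuple

# Hypothetical sample data; the slides do not give the contents of EMP.
Emp = namedtuple("Emp", ["ENO", "ENAME", "TITLE"])
EMP = [
    Emp("E1", "J. Doe", "manager"),
    Emp("E2", "M. Smith", "analyst"),
    Emp("E3", "A. Lee", "manager"),
]

# Tuple calculus: t ranges over whole tuples of EMP,
# and attributes are selected by name.
tuple_result = {(t.ENO, t.ENAME) for t in EMP if t.TITLE == "manager"}

# Domain calculus: x, y, z range over attribute values
# and are bound by their position in E(x, y, z).
domain_result = {(x, y) for (x, y, z) in EMP if z == "manager"}

# Swapping the positions of x and y changes the meaning:
# this yields (ENAME, ENO) pairs instead of (ENO, ENAME).
swapped = {(x, y) for (y, x, z) in EMP if z == "manager"}
```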

 SQL is a tuple calculus language
SELECT ENO,ENAME
FROM EMP
WHERE TITLE = 'manager'
End users express queries in non-procedural
languages.

Query Processor
 Query processor transforms queries into
procedural operations to access data
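
A minimal sketch of this idea, assuming a small in-memory SQLite table with made-up rows: the SQL statement only says what is wanted, while the loop underneath spells out one possible sequence of procedural steps (scan, select, project) that produces the same answer. The loop is an illustration, not how any particular query processor works internally.

```python
import sqlite3

# Hypothetical in-memory EMP table; the rows are sample data only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (ENO TEXT, ENAME TEXT, TITLE TEXT)")
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?)",
    [("E1", "J. Doe", "manager"), ("E2", "M. Smith", "analyst")],
)

# Non-procedural: the user states WHAT is wanted.
declarative = conn.execute(
    "SELECT ENO, ENAME FROM EMP WHERE TITLE = 'manager'"
).fetchall()

# Procedural: one way the same answer could be computed.
procedural = []
for eno, ename, title in conn.execute("SELECT * FROM EMP"):  # scan
    if title == "manager":                                    # selection
        procedural.append((eno, ename))                       # projection
```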

Query Processor
 Distributed query processor has to deal
with
◦ query decomposition, and
◦ data localization

7.1 Query Processing Problems

 Centralized query processor must
◦ transform the calculus query into
algebra operations, and
◦ choose the best execution plan
 Example:
SELECT ENAME
FROM E,G
WHERE E.ENO = G.ENO
AND RESP = 'manager'

 Relational Algebra 1
◦ Π_ENAME( σ_{RESP="Manager" ∧ E.ENO=G.ENO} (E × G) )
 Relational Algebra 2
◦ Π_ENAME( E ⋈_ENO σ_{RESP="Manager"} (G) )

Execution plan 2 is better: it avoids the Cartesian
product and so consumes fewer resources.
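
To make the difference concrete, here is a small sketch that evaluates both plans on made-up sample relations and reports the size of the largest intermediate result. The data and the function names `plan1` and `plan2` are assumptions for illustration only.

```python
# Hypothetical sample relations: E(ENO, ENAME) and G(ENO, RESP).
E = [("E1", "J. Doe"), ("E2", "M. Smith"), ("E3", "A. Lee")]
G = [("E1", "Manager"), ("E2", "Analyst"),
     ("E3", "Manager"), ("E2", "Programmer")]

def plan1(E, G):
    """Cartesian product first, then selection, then projection."""
    product = [(e, g) for e in E for g in G]
    selected = [(e, g) for e, g in product
                if g[1] == "Manager" and e[0] == g[0]]
    names = {e[1] for e, _ in selected}
    return names, max(len(product), len(selected))

def plan2(E, G):
    """Selection on G first, then join on ENO, then projection."""
    managers = [g for g in G if g[1] == "Manager"]
    joined = [(e, g) for e in E for g in managers if e[0] == g[0]]
    names = {e[1] for e, _ in joined}
    return names, max(len(managers), len(joined))

names1, largest1 = plan1(E, G)
names2, largest2 = plan2(E, G)
```

Both plans return the same names, but plan 1 materializes |E| × |G| pairs before filtering, whereas plan 2 never holds more than the selected tuples of G.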

 In DDB, the query processor must
consider the communication cost and
select the best site!
 Same query as last example, but G and E
are distributed.
 Simple plan:
◦ Transport all fragments to the query site and
execute there. This causes too much network
traffic and is very costly.
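
The following sketch compares the communication cost of the simple plan with a plan that selects and joins at the data sites before shipping anything. Every number here (fragment sizes, number of matching tuples, cost per transferred tuple, site names) is an assumption chosen for illustration; none of them comes from the slides.

```python
# All numbers below are illustrative assumptions, not figures from the slides.
TRANSFER_COST = 10   # assumed cost units to ship one tuple between sites

# Assumed fragment sizes (in tuples) held at remote sites.
e_fragments = {"site1": 200, "site2": 200}   # fragments of E
g_fragments = {"site3": 500, "site4": 500}   # fragments of G
managers_per_g_fragment = 10                 # G tuples with RESP = "manager"

def simple_plan_cost():
    """Ship every fragment to the query site and evaluate there."""
    shipped = sum(e_fragments.values()) + sum(g_fragments.values())
    return shipped * TRANSFER_COST

def reduced_plan_cost():
    """Select at the G sites, join at the E sites, ship only the matches.

    Assumes each selected G tuple joins with exactly one E tuple.
    """
    selected = managers_per_g_fragment * len(g_fragments)  # G sites -> E sites
    results = selected                                      # E sites -> query site
    return (selected + results) * TRANSFER_COST
```

Under these assumptions the simple plan costs 14000 units and the reduced plan 400, which is the kind of gap that motivates choosing the execution sites carefully.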

 Distributed Query Example
◦ Distribution of E and G

◦ Query
Π_ENAME( E ⋈_ENO σ_{RESP="Manager"} (G) )

◦ Optimized Processing

7.2 Objectives of Query Processing

 Two-fold objectives:
◦ Transformation, and
◦ Optimization

 Cost to be considered for optimization:
◦CPU time
◦I/O time, and
◦Communication time
WAN: communication time is the dominant cost
LAN: all three costs carry comparable weight

7.3 Complexity of Relational Algebra Operations

 Complexity is measured in terms of the cardinality n,
with tuples sorted on the comparison attributes
◦ σ, Π (without duplicate elimination): O(n)
◦ Π (with duplicate elimination), GROUP: O(n log n)
◦ ⋈, ⋉, ÷, set operators: O(n log n)
◦ ×: O(n²)
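
As a sketch of where these bounds come from, the sort-merge join below sorts both inputs on the join attribute (the O(n log n) part) and then merges them in a single pass, while the Cartesian product has to pair every tuple with every other tuple. The function names and the tuple layout are assumptions for illustration. The merge stays close to linear only when join keys are mostly distinct, since the output itself can be large otherwise.

```python
def sort_merge_join(r, s, key_r=0, key_s=0):
    """Equi-join two lists of tuples by sorting both and merging."""
    r = sorted(r, key=lambda t: t[key_r])   # O(n log n)
    s = sorted(s, key=lambda t: t[key_s])   # O(n log n)
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        a, b = r[i][key_r], s[j][key_s]
        if a < b:
            i += 1
        elif a > b:
            j += 1
        else:
            # Find the run of s tuples that share this key value.
            j_end = j
            while j_end < len(s) and s[j_end][key_s] == a:
                j_end += 1
            # Pair every r tuple with that key against the whole run.
            while i < len(r) and r[i][key_r] == a:
                out.extend(r[i] + s[k] for k in range(j, j_end))
                i += 1
            j = j_end
    return out

def cartesian_product(r, s):
    """Every pairing of tuples: len(r) * len(s) results."""
    return [a + b for a in r for b in s]
```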





7.4 Characterization of Query Processor

7.4.1 Languages
 For users:
◦ calculus or algebra based languages.
 For query processor:
◦ map the input into internal form of
algebra augmented with
communication primitives.

7.4.2 Types of Optimization
 Exhaustive search
◦ Workable for small solution space
 Heuristics
◦ Perform σ and Π first, use semi-joins, etc.,
for a large solution space
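
The contrast can be sketched with a toy join-ordering problem. The cardinalities, selectivities, relation name `J`, and the cost model (sum of intermediate result sizes for a left-deep order) are all assumptions made for this example. Exhaustive search tries every permutation, which is only workable for a handful of relations. The greedy heuristic builds a single order step by step.

```python
from itertools import permutations

# Hypothetical cardinalities and join selectivities.
CARD = {"E": 400, "G": 1000, "J": 50}
SEL = {
    frozenset(("E", "G")): 0.0025,
    frozenset(("G", "J")): 0.01,
    frozenset(("E", "J")): 1.0,   # no join predicate: Cartesian product
}

def plan_cost(order):
    """Sum of intermediate result sizes for a left-deep join order."""
    joined, size, cost = [order[0]], CARD[order[0]], 0.0
    for rel in order[1:]:
        sel = 1.0
        for prev in joined:
            sel *= SEL[frozenset((prev, rel))]
        size = size * CARD[rel] * sel
        cost += size
        joined.append(rel)
    return cost

def exhaustive(relations):
    """Try every permutation: n! plans, so only workable for small n."""
    return min(permutations(relations), key=plan_cost)

def greedy(relations):
    """Heuristic: start with the smallest relation, then repeatedly
    append the relation that adds the least cost to the plan so far."""
    remaining = set(relations)
    order = [min(remaining, key=CARD.get)]
    remaining.remove(order[0])
    while remaining:
        nxt = min(remaining, key=lambda r: plan_cost(order + [r]))
        order.append(nxt)
        remaining.remove(nxt)
    return tuple(order)
```

On this tiny input the heuristic happens to reach a plan as cheap as the exhaustive one, but in general a heuristic only promises a reasonable plan, not the optimal one.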

7.4.3 Optimization Timing
 Static
◦ Done at compile time using statistics;
appropriate for exhaustive search, since the query
is optimized once but executed many times.
 Dynamic
◦ Done at execution time; accurate, but repeated
for every execution and therefore expensive.

7.4.4 Statistics
 Facts of
◦ Cardinalities
◦ Attribute value distribution
◦ Size of relation, etc.
 Provided to query optimizer and
periodically updated.

7.4.5 Decision Site
 For query optimization, it may be done by
◦ Single site – centralized approach, or
◦ All the sites involved – distributed, or
◦ Hybrid – one site makes the major decisions,
while the other sites make local
decisions

7.4.6 Exploitation of the Network Topology
 WAN
◦ communication cost is dominant
 LAN
◦ communication cost is comparable to I/O
cost. Broadcasting capability, star network,
satellite network should be considered.

7.4.7 Exploitation of Replicated Fragments
Use replications to minimize
communication costs.

7.4.8 Use of Semi-joins
Reduce the size of operand
relations to cut down
communication costs when
overhead is not significant.
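
A minimal sketch of the idea, assuming E is stored at one site and G at another, with made-up rows: instead of shipping all of E to G's site, only the join column of G is shipped to E's site, E is reduced to the tuples that will actually join, and just those are shipped back. The count of shipped attribute values is a stand-in for communication cost, and the function names are hypothetical.

```python
# Hypothetical relations: E(ENO, ENAME) at site A, G(ENO, RESP) at site B.
E = [("E1", "J. Doe"), ("E2", "M. Smith"), ("E3", "A. Lee"), ("E4", "R. Davis")]
G = [("E1", "Manager"), ("E3", "Manager")]

def join_by_shipping_whole_relation(E, G):
    """Ship all of E to G's site, then join there."""
    shipped_values = sum(len(t) for t in E)   # every attribute of every tuple
    result = [(e, g) for e in E for g in G if e[0] == g[0]]
    return result, shipped_values

def join_with_semijoin(E, G):
    """Ship G's join column to E's site, reduce E, ship the reduced E back."""
    g_enos = {g[0] for g in G}                        # projection of G on ENO
    shipped_values = len(g_enos)                      # site B -> site A
    reduced_E = [e for e in E if e[0] in g_enos]      # E semi-join G
    shipped_values += sum(len(t) for t in reduced_E)  # site A -> site B
    result = [(e, g) for e in reduced_E for g in G if e[0] == g[0]]
    return result, shipped_values
```

The semi-join pays for an extra transfer (the join column), so it only helps when the reduction of E outweighs that overhead, as it does with this sample data.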

7.5 Layers of Query Processing

Generic Layering Scheme
for Distributed Query
Processing

7.5.1 Query Decomposition
 Decompose calculus query into algebra
query using global conceptual schema
information.
Step 1 – calculus normalization
Step 2 – semantic analysis to reject
incorrect queries
Step 3 – simplification to eliminate
redundant components
Step 4 – translation of calculus query
into optimized algebra query.

7.5.2 Data Localization
Distributed query is mapped into
a fragment query and simplified
to produce a good one.
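
One way to picture this step, assuming E is horizontally fragmented on ENO into two hypothetical fragments: the relation is first replaced by the union of its fragments, and simplification then drops every fragment whose defining predicate contradicts the query's selection. The fragment names, predicates, and rows below are invented for the example.

```python
# Hypothetical horizontal fragmentation of E(ENO, ENAME) on ENO.
FRAGMENTS = {
    "frag1": {
        "predicate": lambda eno: eno <= "E3",
        "rows": [("E1", "J. Doe"), ("E2", "M. Smith"), ("E3", "A. Lee")],
    },
    "frag2": {
        "predicate": lambda eno: eno > "E3",
        "rows": [("E4", "R. Davis"), ("E5", "B. Casey")],
    },
}

def localize(eno):
    """Return the fragments needed for 'select from E where ENO = eno'.

    E is the union of its fragments, so the generic fragment query would
    scan all of them; a fragment is kept only if its defining predicate
    can be satisfied by the selected ENO value.
    """
    return [name for name, frag in FRAGMENTS.items() if frag["predicate"](eno)]

def run(eno):
    """Evaluate the simplified fragment query."""
    rows = []
    for name in localize(eno):
        rows.extend(r for r in FRAGMENTS[name]["rows"] if r[0] == eno)
    return rows
```

For example, `localize("E5")` keeps only `frag2`, so the site holding `frag1` is never contacted.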

7.5.3 Global Query Optimization
 Find an execution strategy close to
optimal.
 Find the best ordering of operations in
the fragment query, including
communication operations.
 A cost function, defined in terms of time, is required.
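
One common form of such a cost function is a weighted sum of CPU, I/O, and communication work. The sketch below assumes that form. The unit costs are placeholders rather than measurements, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CostModel:
    """Unit costs in seconds; default values are placeholders."""
    t_cpu: float = 1e-8   # per CPU instruction
    t_io: float = 1e-3    # per disk I/O
    t_msg: float = 1e-2   # fixed cost per message
    t_tr: float = 1e-6    # per byte transferred

    def total_time(self, insts, ios, msgs, nbytes):
        """Weighted sum of CPU, I/O, and communication work."""
        return (self.t_cpu * insts + self.t_io * ios
                + self.t_msg * msgs + self.t_tr * nbytes)

def best_plan(plans, model):
    """Pick the plan name whose resource counts give the lowest time."""
    return min(plans, key=lambda name: model.total_time(**plans[name]))

# Hypothetical resource counts for two candidate strategies.
plans = {
    "ship_all": dict(insts=10**6, ios=100, msgs=4, nbytes=10**6),
    "reduce_first": dict(insts=2 * 10**6, ios=200, msgs=8, nbytes=10**4),
}
```

With expensive communication (for example `CostModel(t_msg=0.1, t_tr=1e-4)`) the plan that ships less data wins; if communication is treated as free, the plan with less local work wins instead.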

7.5.4 Local Query Optimization
Centralized system algorithms
(to be discussed in chapter 9)

7.6 Conclusions

 Query processor – must be able to find a
good execution plan for a calculus query, such
that CPU time, I/O time, and communication
time are minimized.
 Method: layering into
◦ decomposition
◦ localization
◦ global query optimization
◦ local query optimization
