1. Lecture 3/2 Query Processing
• T. Connolly, and C. Begg, “Database Systems: A Practical Approach to Design, Implementation, and Management”, 5th edition,
Addison-Wesley, 2009. ISBN: 0-321-60110-6, ISBN-13: 978-0-321-60110-0 (International Edition).
• T. Connolly, and C. Begg, “Database Systems: A Practical Approach to Design, Implementation, and Management”, 4th edition,
Addison-Wesley, 2004. ISBN: 0-321-21025-5.
• R. Elmasri and S. B. Navathe, “Fundamentals of Database Systems”, 5th ed., Pearson, 2007, ISBN: 0-321-41506-X.
ITM661 – Database Systems
2. Query Processing 2
Objectives
Objectives of query processing and optimization.
Static versus dynamic query optimization.
How a query is decomposed and semantically analyzed.
How to create a R.A.T. to represent a query.
Rules of equivalence for RA operations.
How to apply heuristic transformation rules to improve
efficiency of a query.
How pipelining can be used to improve efficiency of
queries.
Difference between materialization and pipelining.
Advantages of left-deep trees.
(R.A.T = Relational Algebra Tree)
3. Query Processing 3
An Example (Branch and Staff Relations)
• A relation is a table with
columns and rows.
Only applies to logical structure of
the database, not the physical
structure.
• An attribute is a named
column of a relation.
• A domain is a set of
allowable values for one or
more attributes.
• A tuple is a row of a
relation.
• A degree is a number of
attributes in a relation.
• A cardinality is a number of
tuples in a relation.
• A relational database is a
collection of normalized
relations.
4. Query Processing 4
Relational Algebra and Calculus
Relational algebra and relational calculus are formal
languages associated with the relational model.
Informally,
Relational algebra is a (high-level) procedural
language and
Relational calculus a non-procedural language.
However, formally both are equivalent to one another.
A language that produces a relation that can be derived
using relational calculus is relationally complete.
5. Query Processing 5
Relational Algebra
There are 5 basic operations, in relational algebra, that performs
most of the data retrieval operations needed.
Selection
Projection
Cartesian Product
Union
Set Difference
Also operations that can be expressed by 5 basic operations.
Join
Intersection
Division
9. Query Processing 9
Query Processing
Activities involved in retrieving data from the
database.
Aims of QP:
transform query written in high-level language (e.g.
SQL), into correct and efficient execution strategy
expressed in low-level language (implementing
Relational Algebra);
execute the strategy to retrieve required data.
10. Query Processing 10
Query Optimization
Activity of choosing an efficient execution
strategy for processing query.
As there are many equivalent transformations of
same high-level query, aim of QO is to choose
one that minimizes resource usage.
Generally, reduce total execution time of query.
May also reduce response time of query.
Problem computationally intractable with large
number of relations, so strategy adopted is
reduced to finding near optimum solution.
11. Query Processing 11
Different Strategies
Find all Managers that work at a London branch.
SELECT *
FROM Staff s, Branch b
WHERE s.branchNo = b.branchNo AND
(s.position = ‘Manager’ AND b.city = ‘London’);
Three equivalent RA queries are:
(1) s(position='Manager’) (city='London') (staff.bno=branch.bno) (Staff Branch)
(2) s(position='Manager') (city='London') ( Staff |><| staff.bno=branch.bno Branch)
(3) (sposition='Manager’(Staff)) |><| staff.bno=branch.bno (s city='London' (Branch))
15. Query Processing 15
Cost Comparison in Different Strategies
Assume:
1000 tuples in Staff; 50 tuples in Branch;
50 Managers; 5 London branches;
No indexes or sort keys;
Results of any intermediate operations stored on disk;
Cost of the final write is ignored;
Tuples are accessed one at a time.
(1) s(position='Manager’) (city='London') (staff.bno=branch.bno) (Staff Branch)
(2) s(position='Manager') (city='London') ( Staff |><| staff.bno=branch.bno Branch)
(3) (sposition='Manager’(Staff)) |><| staff.bno=branch.bno (s city='London' (Branch))
(1) (1000 + 50) + 2*(1000 * 50) = 101,050
(2) 2*1000 + (1000 + 50) = 3,050
(3) 1000 + 2*50 + 50 + 2*(5) = 1,160
Cartesian product and
join operations are
much more expensive
than selection
17. Query Processing 17
Dynamic versus Static Optimization
Two choices when first three phases of QP can be
carried out:
1. dynamically every time query is run.
Advantages if dynamic QO arise from fact that
information is up-to-date.
Disadvantages are that performance of query is
affected, time may limit finding optimum strategy.
2. Statically when query is first submitted.
Advantages of static QO are removal of runtime
overhead more time to find optimum strategy.
Disadvantages arise from fact that chosen execution
strategy may no longer be optimal when query is
run.
Could use a hybrid approach to overcome this.
18. Query Processing 18
Query Decomposition
Aims are to transform high-level query into
RA query and check that query is syntactically
and semantically correct.
Typical stages are:
analysis
normalization
semantic analysis
simplification
query restructuring
19. Query Processing 19
Analysis (I)
Analyze query lexically and syntactically using compiler
techniques.
Verify relations and attributes exist.
Verify operations are appropriate for object type.
EX: SELECT staff_no
FROM Staff
WHERE position > 10;
This query would be rejected on two grounds:
Staff_no is not defined for Staff relation (should be
StaffNo).
Comparison ‘>10’ is incompatible with type ‘position’,
which is variable character string.
20. Query Processing 20
Analysis (II)
Finally, query transformed into some internal
representation more suitable for processing.
Some kind of query tree is typically chosen,
constructed as follows:
Leaf node created for each base relation.
Non-leaf node created for each intermediate
relation produced by RA operation.
Root of tree represents query result.
Sequence is directed from leaves to root.
22. Query Processing 22
Normalization
Converts query into a normalized form for easier
manipulation.
Predicate can be converted into one of two forms:
Conjunctive normal form:
(position = 'Manager' salary > 20000) (bno = 'B3')
Disjunctive normal form:
(position = 'Manager' bno = 'B3' )
(salary > 20000 bno = 'B3')
23. Query Processing 23
Semantic Analysis
Rejects normalized queries that are incorrectly
formulated or contradictory.
Query is incorrectly formulated if components do not
contribute to generation of result.
Query is contradictory if its predicate cannot be
satisfied by any tuple.
Algorithms to determine correctness exist only for
queries that do not contain disjunction and negation.
24. Query Processing 24
24
Semantic Analysis
For these queries, could construct:
A relation connection graph.
Normalized attribute connection graph.
Relation connection graph
Create node for each relation and node for result.
Create edges between two nodes that represent a
join, and edges between nodes that represent
projection.
If not connected, query is incorrectly formulated.
25. Query Processing 25
25
Normalized Attribute Connection Graph
Create node for each reference to an attribute, or
constant 0.
Create directed edge between nodes that represent a
join, and directed edge between attribute node and 0
node that represents selection.
Weight edges a b with value c, if it represents
inequality condition (a b + c); weight edges 0 a
with -c, if it represents inequality condition (a c).
If graph has cycle for which valuation sum is
negative, query is contradictory.
26. Query Processing 26
Checking Semantic Correctness
SELECT p.propertyNo, p.street
FROM Client c, Viewing v, PropertyForRent p
WHERE c.clientNo = v.clientNo AND
c.maxRent >= 500 AND
c.prefType = 'Flat' AND p.ownerNo = 'CO93';
Relation connection graph not fully connected, so query is not
correctly formulated. Omiting the join (v.propertyNo =
p.propertyNo).
SELECT p.propertyNo, p.street
FROM Client c, Viewing v, PropertyForRent p
WHERE c.maxRent > 500 AND
c.clientNo = v.clientNo AND
v.propertyNo = p.propertyNo AND
c.prefType = 'Flat' AND c.maxRent < 200;
Normalized attribute connection graph has cycle between
nodes R.maxRent and 0 with negative valuation sum, so query
is contradictory
28. Query Processing 28
Simplification
Simplification strategy:
Detects redundant qualifications,
Eliminates common sub-expressions,
Transforms query to semantically equivalent but more
easily and efficiently computed form.
Typically, access restrictions, view definitions, and integrity
constraints are considered.
Assuming user has appropriate access privileges, first apply
well-known idempotency rules of Boolean algebra.
Ex.: p p equivalent to p
p true equivalent to true
Next step is Query restructuring
29. Query Processing 29
Transformation Rules for RA Operation (I)
Conjunctive selection operations can cascade into individual
selection operations (vice versa).
sp q r(R) = sp(sq(sr(R)))
Sometimes referred to as cascade of selection.
sbranchNo='B003‘ salary>15000(Staff) = sbranchNo='B003'(ssalary>15000(Staff))
Commutativity of selection.
sp(sq(R)) = sq(sp(R))
For example:
sbranchNo='B003'(ssalary>15000(Staff)) = ssalary>15000(sbranchNo='B003'(Staff))
30. Query Processing 30
Transformation Rules for RA Operation (II)
In a sequence of projection operations, only the last in the
sequence is required.
PLPM … PN(R) = PL (R)
For example:
PlNamePlName, branchNo(Staff) = PlName (Staff)
Commutativity of selection and projection.
PAi, …, Am(sp(R)) = sp(PAi, …, Am(R)) where p {A1, A2, …, Am}
For example:
PfName,lName(slName='Beech'(Staff)) = slName='Beech'(PfName,lName(Staff))
31. Query Processing 31
Transformation Rules for RA Operation (III)
Commutativity of q-join (and Cartesian product).
R p S = S p R
R S = S R
Rule also applies to equi-join and natural join. For example:
Staff staff.branchNo=branch.branchNo Branch =
Branch staff.branchNo=branch.branchNo Staff
32. Query Processing 32
Transformation Rules for RA Operation (IV)
Commutativity of selection and theta-join (or Cartesian
product).
If selection predicate involves only attributes of one of join
relations, selection and join (or Cartesian product) operations
commute:
sp(R r S) = (sp(R)) r S
sp(R S) = (sp(R)) S
where p is a constraint on attributes of R
33. Query Processing 33
Transformation Rules for RA Operation (V)
If selection predicate is conjunctive predicate having form (p
ู q), where p only involves attributes of R, and q only
attributes of S, selection and theta-join operations commute as:
sp q(R r S) = (sp(R)) r (sq(S))
sp q(R S) = (sp(R)) (sq(S))
For example:
sposition='Manager’ city='London'(Staff staff.branchNo=branch.branchNo Branch) =
(sposition='Manager'(Staff)) staff.branchNo=branch.branchNo (scity='London' (Branch))
34. Query Processing 34
Transformation Rules for RA Operation (VI)
Commutativity of projection and theta-join
(or Cartesian product).
If projection list is of form L = L1 L2, where L1 only
involves attributes of R, and L2 only involves attributes of S,
provided join condition only contains attributes of L,
projection and theta-join operations commute as:
PL1 L2(R r S) = (PL1(R)) r (PL2(S))
35. Query Processing 35
Transformation Rules for RA Operation (VII)
If join condition contains additional attributes not in L, say
attributes M = M1 M2 where M1 only involves attributes of
R, and M2 only involves attributes of S, a final projection op.
is required:
PL1 L2(R r S) = PL1 L2( (PL1 M1(R)) r (PL2 M2(S)))
For example:
Pposition, city, branchNo(Staff staff.branchNo=branch.branchNo Branch) =
(Pposition, branchNo(Staff)) staff.branchNo=branch.branchNo (Pcity, branchNo (Branch))
and using the latter rule:
Pposition, city(Staff staff.branchNo=branch.branchNo Branch) =
Pposition, city ((Pposition, branchNo(Staff)) staff.branchNo=branch.branchNo ( Pcity, branchNo (Branch)))
36. Query Processing 36
Transformation Rules for RA Operation (VIII)
Commutativity of union and intersection (but not set
difference).
R S = S R
R S = S R
Commutativity of selection and set operations (union,
intersection, and set difference).
sp(R S) = sp(S) sp(R)
sp(R S) = sp(S) sp(R)
sp(R - S) = sp(S) - sp(R)
Commutativity of projection and union.
PL(R S) = PL(S) PL(R)
Associativity of union and intersection (but not set difference).
(R S) T = S (R T)
(R S) T = S (R T)
37. Query Processing 37
Transformation Rules for RA Operation (IX)
Associativity of q-join (and Cartesian product).
Cartesian product and natural join are always associative:
(R S) T = R (S T)
(R S) T = R (S T)
If join condition q involves attributes only from S and T, then
theta-join is associative as follows:
(R p S) q r T = R p r (S q T)
For example:
((Staff Staff.staffNo=PropertyForRent.staffNo PropertyForRent)
PropertyForRent.ownerNo=owner.ownerNo staff.lName=owner.lName Owner)
=
((Staff Staff.staffNo=PropertyForRent.staffNo Staff.lName=Owner.lName (PropertyForRent
PropertyForRent.ownerNo = Owner.ownerNo Owner)
38. Query Processing 38
Use of Transformation Rules (I)
For prospective renters of flats, find properties that match
requirements and owned by CO93.
SELECT p.propertyNo, p.street
FROM Client c, Viewing v, PropertyForRent p
WHERE c.prefType = ‘Flat’ AND
c.clientNo = v.clientNo AND
v.propertyNo = p.propertyNo AND
c.maxRent >= p.rent AND
c.prefType = p.type AND
p.ownerNo = ‘CO93’;
39. Query Processing 39
Use of Transformation Rules (II)
SELECT p.propertyNo, p.street
FROM Client c, Viewing v, PropertyForRent p
WHERE c.prefType = ‘Flat’ AND
c.clientNo = v.clientNo AND
v.propertyNo = p.propertyNo AND
c.maxRent >= p.rent AND
c.prefType = p.type AND p.ownerNo = ‘CO93’;
40. Query Processing 40
Use of Transformation Rules (III)
SELECT p.propertyNo, p.street
FROM Client c, Viewing v, PropertyForRent p
WHERE c.prefType = ‘Flat’ AND
c.clientNo = v.clientNo AND
v.propertyNo = p.propertyNo AND
c.maxRent >= p.rent AND
c.prefType = p.type AND p.ownerNo = ‘CO93’;
41. Query Processing 41
Use of Transformation Rules (IV)
SELECT p.propertyNo, p.street
FROM Client c, Viewing v, PropertyForRent p
WHERE c.prefType = ‘Flat’ AND
c.clientNo = v.clientNo AND
v.propertyNo = p.propertyNo AND
c.maxRent >= p.rent AND
c.prefType = p.type AND p.ownerNo = ‘CO93’;
42. Query Processing 42
Heuristical Processing Strategies
Perform selection operations as early as possible.
Keep predicates on same relation together.
Combine Cartesian product with subsequent selection whose
predicate represents join condition into a join operation.
Use associativity of binary operations to rearrange leaf nodes
so leaf nodes with most restrictive selection operations
executed first.
Perform projection as early as possible.
Keep projection attributes on same relation together.
Compute common expressions once.
If common expression appears more than once, and result
not too large, store result and reuse it when required.
It is useful when querying views, as same expression is used
to construct view each time.
43. Query Processing 43
Pipelining
Materialization - output of one operation is stored in
temporary relation for processing by next.
Could also pipeline results of one operation to
another without creating temporary relation.
Known as pipelining or on-the-fly processing.
Pipelining can save on cost of creating temporary
relations and reading results back in again.
Generally, pipeline is implemented as separate
process or thread.
45. Query Processing 45
Pipelining
With linear trees, relation on one side of each
operator is always a base relation.
However, as need to examine entire inner
relation for each tuple of outer relation, inner
relations must always be materialized.
This makes left-deep trees appealing as inner
relations are always base relations.
Reduces search space for optimum strategy,
and allows QO to use dynamic processing.
Not all execution strategies are considered.
46. Query Processing 46
Exercise I
Assume that there are 1000 tuples in Student, 40 tuples in
Course, 5000 tuples in TakeCourse, 40 tuples in Lecturer,
25 internal lecturers, 30 three-credit courses, 8 IT lecturers
and 200 IT students.
Assume that the following SQL statement is executed
SELECT *
FROM Lecturer L, Course C
WHERE C.LecturerID = L.LecturerID AND
L.InOrEx = ‘I’ AND C.Credit = 3
Compare the cost of disk accesses of three following equivalent relational
algebra queries corresponding to the above SQL statement.
(a) s(C.LecturerID = L.LecturerID) (C.Credit = 3) ( L.InOrEx = ‘I’ ) (Course X Lecturer)
(b) s(C.Credit = 3) ( L.InOrEx = ‘I’ ) ( Course |X|C.LecturerID = L.LecturerID Lecturer)
(c) (s(C.Credit = 3) (Course)) |X|C.LecturerID = L.LecturerID (s( L.InOrEx = ‘I’ ) (Lecturer))
47. Query Processing 47
Exercise II
Given the following SQL query, illustrate relational algebra trees based
on heuristical processing strategies.
SELECT S.SFName, S.SLName, L.LFName
FROM Student S, Course C, Lecturer L, TakeCourse T
WHERE C.LecturerID = L.LecturerID AND
C.CourseID = T.CourseID AND
S.StudentID = T.StudentID AND
C.Credit = 3 AND S.DeptID = ‘IT’ AND
L.LDeptID = S.DeptID;
Hint: The heuristics are
(1) Perform selection operations as early as possible
(2) Combine the Cartesian product with a subsequent selection operation whose predicate represents
a join condition into a join operation.
(3) Use associativity of binary operations to rearrange leaf nodes so that the leaf nodes with the most
restrictive selection operations are executed first.
(4) Perform projection as early as possible
(5) Compute common expressions once.