Query optimisation

LECTURE PLAN
⇒ Motivation for Query Optimisation
⇒ Phases of Query Processing
⇒ Query Trees
⇒ RA Transformation Rules
⇒ Heuristic Processing Strategies
⇒ Cost Estimation for RA Operations

Motivation for Query Optimisation
List all the managers that work in the sales department.
SELECT *
FROM emp, dept
WHERE emp.deptno = dept.deptno
AND emp.job = 'Manager'
AND dept.name = 'Sales';
There are at least three alternative ways of representing this query as a Relational Algebra expression:
1. σ_{(job = ‘Manager’) ∧ (name = ‘Sales’) ∧ (emp.deptno = dept.deptno)}(EMP X DEPT)
2. σ_{(job = ‘Manager’) ∧ (name = ‘Sales’)}(EMP ⋈_{emp.deptno = dept.deptno} DEPT)
3. (σ_{job = ‘Manager’}(EMP)) ⋈_{emp.deptno = dept.deptno} (σ_{name = ‘Sales’}(DEPT))

Metrics:
1000 tuples in the EMP relation
50 tuples in the DEPT relation
50 employees are Managers (one per department)
5 separate Sales departments (across the country)
Cost of processing the first alternative (selection over the Cartesian product):
Cartesian product of EMP and DEPT:
(1000 + 50) record I/Os to read the relations
+ (1000 * 50) record I/Os to create an intermediate relation to store the result
Selection on the result of the Cartesian product:
(1000 * 50) record I/Os to read the tuples and compare them against the predicate
Total cost of the query:
(1000 + 50) + 2*(1000 * 50) = 101,050 record I/Os.

Cost of processing the second alternative (selection over the join):
Join of EMP and DEPT over deptno:
(1000 + 50) record I/Os to read the relations
+ (1000) record I/Os to create an intermediate relation to store the join result
Selection on the result of the join:
(1000) record I/Os to read each tuple and compare it against the predicate
Total cost of the query:
(1000 + 50) + 2*(1000) = 3,050 record I/Os.

Cost of processing the third alternative (selections before the join):
Select ‘Managers’ in EMP:
(1000) record I/Os to read the relation
+ (50) record I/Os to create an intermediate relation to store the select result
Select ‘Sales’ in DEPT:
(50) record I/Os to read the relation
+ (5) record I/Os to create an intermediate relation to store the select result
Join of the previous two selections over deptno:
(50 + 5) record I/Os to read the two intermediate relations
Total cost of the query:
1000 + 2*(50) + 5 + (50 + 5) = 1,160 record I/Os.
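
The arithmetic for the three alternatives can be reproduced with a short script. This is only a sketch of the record-I/O counting used on these slides; the function and variable names are illustrative and not part of the original material.

```python
# Record-I/O estimates for the three equivalent expressions, using the
# metrics stated on the slides. All names here are illustrative.
EMP = 1000       # tuples in EMP
DEPT = 50        # tuples in DEPT
MANAGERS = 50    # EMP tuples with job = 'Manager'
SALES = 5        # DEPT tuples with name = 'Sales'


def cost_product_then_select() -> int:
    read_inputs = EMP + DEPT
    write_product = EMP * DEPT    # store the Cartesian product
    read_product = EMP * DEPT     # re-read it to apply the selection
    return read_inputs + write_product + read_product


def cost_join_then_select() -> int:
    read_inputs = EMP + DEPT
    write_join = EMP              # the join result holds 1000 tuples
    read_join = EMP               # re-read it to apply the selection
    return read_inputs + write_join + read_join


def cost_select_then_join() -> int:
    select_managers = EMP + MANAGERS   # read EMP, write 50 tuples
    select_sales = DEPT + SALES        # read DEPT, write 5 tuples
    read_for_join = MANAGERS + SALES   # read both intermediate relations
    return select_managers + select_sales + read_for_join


costs = {
    "product then select": cost_product_then_select(),
    "join then select": cost_join_then_select(),
    "select then join": cost_select_then_join(),
}
for name, cost in costs.items():
    print(f"{name}: {cost:,} record I/Os")
```

Under these counting assumptions the third alternative is roughly 87 times cheaper than the first, which is the motivation for optimising the query before executing it.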

Query Processing Stage - 1
• Cast the query into internal form
  - This involves the conversion of the original (SQL) query into some internal representation more suitable for machine manipulation.
  - The internal representation typically chosen is either some kind of ‘abstract syntax tree’, or a relational algebra ‘query tree’.

Relational Algebra Query Trees
A Relational Algebra query can be represented as a ‘query tree’. For example, the query to list all the managers that work in the sales department could be described as one of the following:
Root: σ_{(job = ‘Manager’) ∧ (name = ‘Sales’) ∧ (emp.deptno = dept.deptno)}
Intermediate operations: X (Cartesian product)
Leaves: EMP, DEPT

Alternative ‘query tree’ for the query to list all the managers that work in the sales department:
Root: σ_{(job = ‘Manager’) ∧ (name = ‘Sales’)}
Intermediate operations: ⋈_{emp.deptno = dept.deptno}
Leaves: EMP, DEPT

Alternative ‘query tree’ for the query to list all the managers that work in the sales department:
Root: ⋈_{emp.deptno = dept.deptno}
Intermediate operations: σ_{job = ‘Manager’} (over EMP), σ_{name = ‘Sales’} (over DEPT)
Leaves: EMP, DEPT
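
A query tree maps naturally onto a small recursive data structure. The sketch below is one possible in-memory representation, not the internal form of any particular DBMS; the `Node` class and the helper functions are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """One node of a relational algebra query tree."""
    label: str                       # operator text, or relation name for a leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def children(self) -> List["Node"]:
        return [c for c in (self.left, self.right) if c is not None]


def leaves(node: Node) -> List[str]:
    """Base relations of the tree, read left to right."""
    kids = node.children()
    if not kids:
        return [node.label]
    return [name for child in kids for name in leaves(child)]


def render(node: Node, depth: int = 0) -> str:
    """Indented text view: root first, leaves innermost."""
    lines = ["  " * depth + node.label]
    lines += [render(child, depth + 1) for child in node.children()]
    return "\n".join(lines)


# The tree with both selections applied before the join.
tree = Node(
    "⋈ emp.deptno = dept.deptno",
    Node("σ job = 'Manager'", Node("EMP")),
    Node("σ name = 'Sales'", Node("DEPT")),
)
print(render(tree))
```

Execution proceeds from the leaves upwards: an intermediate operation can run once its operands are available, and the root produces the final result.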

Query Processing Stage - 2
• Convert to canonical form
  - Find a more ‘efficient’ representation of the query by converting the internal representation into some equivalent (canonical) form through the application of a set of well-defined ‘transformation rules’.
  - The set of transformation rules to apply will generally be the result of the application of specific heuristic processing strategies associated with particular DBMSs.

Transformation Rules for RA Operations
1. Conjunctive selection operations can cascade into individual selection operations (and vice versa). Sometimes referred to as cascade of selection.
σ_{p ∧ q ∧ r}(R) = σ_p(σ_q(σ_r(R)))
Example:
σ_{deptno=10 ∧ sal>1000}(Emp) = σ_{deptno=10}(σ_{sal>1000}(Emp))

2. Commutativity of selection
σ_p(σ_q(R)) = σ_q(σ_p(R))
Example:
σ_{sal>1000}(σ_{deptno=10}(Emp)) = σ_{deptno=10}(σ_{sal>1000}(Emp))
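
Rules 1 and 2 can be checked on a small relation. The sketch below models a relation as a list of dictionaries and selection as a filter; the sample tuples and the `select` helper are made up for illustration.

```python
# A tiny Emp relation; the rows are invented sample data.
emp = [
    {"name": "Smith", "deptno": 10, "sal": 800},
    {"name": "Jones", "deptno": 10, "sal": 2975},
    {"name": "Blake", "deptno": 30, "sal": 2850},
    {"name": "Clark", "deptno": 10, "sal": 2450},
]


def select(pred, rel):
    """Relational selection: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]


def in_dept_10(t):
    return t["deptno"] == 10


def high_sal(t):
    return t["sal"] > 1000


# Rule 1: one conjunctive selection equals a cascade of selections.
conjunctive = select(lambda t: in_dept_10(t) and high_sal(t), emp)
cascaded = select(in_dept_10, select(high_sal, emp))

# Rule 2: the order of the cascaded selections does not matter.
swapped = select(high_sal, select(in_dept_10, emp))

print([t["name"] for t in conjunctive])
```

Although the results are identical, the amount of work is not: applying the more restrictive predicate first leaves fewer tuples for the second filter to examine.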

3. In a sequence of projection operations, only the last in the sequence (the outermost) is required.
Π_L(Π_M(… Π_N(R))) = Π_L(R)
where L ⊆ M ⊆ … ⊆ N
Example:
Π_{deptno}(Π_{deptno, name}(Dept)) = Π_{deptno}(Dept)

4. Commutativity of selection and projection.
Π_{A1,…,Am}(σ_p(R)) = σ_p(Π_{A1,…,Am}(R))
where the attributes used in p ⊆ {A1, A2, …, Am}
(the selection predicate p is only made up of projected attributes)
Example:
Π_{name, job}(σ_{name=‘Smith’}(Emp)) = σ_{name=‘Smith’}(Π_{name, job}(Emp))

5. Commutativity of theta-join (and Cartesian product).
R ⋈_p S = S ⋈_p R
R X S = S X R
Example:
EMP ⋈_{emp.deptno = dept.deptno} DEPT
= DEPT ⋈_{emp.deptno = dept.deptno} EMP
NOTE: Theta-join is a generalisation of both the equi-join and natural-join, so the rule applies to those joins as well.

6. Commutativity of selection and theta-join (or Cartesian product).
(σ_p(R)) ⋈_r S = σ_p(R ⋈_r S)
where the attributes used in p ⊆ {A1, A2, …, Am}, the attributes of R
(the selection predicate p is only made up of attributes of one of the relations being joined)
Example:
(σ_{emp.deptno=10}(EMP)) ⋈_{emp.deptno = dept.deptno} DEPT
= σ_{emp.deptno=10}(EMP ⋈_{emp.deptno = dept.deptno} DEPT)

7. Commutativity of projection and theta-join (or Cartesian product).
Π_L(R ⋈_r S) = (Π_{L1}(R)) ⋈_r (Π_{L2}(S))
Project attributes L = L1 ∪ L2, where L1 are attributes of R, and L2 are attributes of S. L will also contain the join attributes.
Example:
Π_{job, location, deptno}(EMP ⋈_{emp.deptno = dept.deptno} DEPT)
= (Π_{job, deptno}(EMP)) ⋈_{emp.deptno = dept.deptno} (Π_{location, deptno}(DEPT))

8. Commutativity of union and intersection
(but not set difference).
R ∪ S = S ∪ R
R ∩ S = S ∩ R

9. Commutativity of selection and set operations (union, intersection, and set difference).
Union
σ_p(R ∪ S) = σ_p(R) ∪ σ_p(S)
Intersection
σ_p(R ∩ S) = σ_p(R) ∩ σ_p(S)
Set Difference
σ_p(R - S) = σ_p(R) - σ_p(S)
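
Rule 9 can be verified with Python sets standing in for union-compatible relations. This is a minimal sketch with invented sample values; note that for set difference the operand order must be preserved, because set difference is not commutative.

```python
# Two union-compatible "relations" of single-attribute tuples (sample data).
R = {1, 2, 3, 4, 5, 6}
S = {4, 5, 6, 7, 8}


def select(pred, rel):
    """Relational selection over a set of tuples."""
    return {t for t in rel if pred(t)}


def is_even(t):
    return t % 2 == 0


union_ok = select(is_even, R | S) == select(is_even, R) | select(is_even, S)
intersection_ok = select(is_even, R & S) == select(is_even, R) & select(is_even, S)
difference_ok = select(is_even, R - S) == select(is_even, R) - select(is_even, S)

# Swapping the operands of the difference gives a different relation.
swapped_difference = select(is_even, S) - select(is_even, R)

print(union_ok, intersection_ok, difference_ok, swapped_difference)
```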

10. Commutativity of projection and union
Π_L(R ∪ S) = Π_L(R) ∪ Π_L(S)

11. Associativity of natural join (and Cartesian product)
Natural Join
(R ⋈ S) ⋈ T = R ⋈ (S ⋈ T)
Cartesian Product
(R X S) X T = R X (S X T)

12. Associativity of union and intersection (but not set difference)
Union
(R ∪ S) ∪ T = R ∪ (S ∪ T)
Intersection
(R ∩ S) ∩ T = R ∩ (S ∩ T)

Heuristic Processing Strategies
• Perform selection operations as early as possible.
• Translate a Cartesian product and subsequent selection (whose predicate represents a join condition) into a join operation.
• Use associativity of binary operations to ensure that the most restrictive selection operations are executed first.
• Perform projections as early as possible.
• Compute common expressions once.

Heuristic Processing - Example
[Figure: step-by-step transformation of the canonical query tree (a single conjunctive selection over EMP X DEPT) into the optimised tree, in which σ_{job = ‘Manager’} is applied to EMP and σ_{name = ‘Sales’} is applied to DEPT.]
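
The effect of the first two heuristics (early selection, and replacing a Cartesian product plus join-condition selection by a join) can be illustrated by evaluating both plans on a toy database. This is a sketch only: the sample rows, the function names, and the use of a dictionary lookup to implement the join are assumptions made for illustration.

```python
from itertools import product

# Invented sample data.
emp = [
    {"ename": "King", "job": "Manager", "deptno": 1},
    {"ename": "Ward", "job": "Clerk", "deptno": 1},
    {"ename": "Ford", "job": "Manager", "deptno": 2},
    {"ename": "Adams", "job": "Analyst", "deptno": 2},
]
dept = [
    {"deptno": 1, "name": "Sales"},
    {"deptno": 2, "name": "Research"},
]


def canonical_plan(emp, dept):
    """Selection applied after the full Cartesian product."""
    pairs = list(product(emp, dept))              # intermediate relation
    result = [
        (e, d) for e, d in pairs
        if e["job"] == "Manager"
        and d["name"] == "Sales"
        and e["deptno"] == d["deptno"]
    ]
    return result, len(pairs)


def optimised_plan(emp, dept):
    """Selections pushed down to the base relations, then a join."""
    managers = [e for e in emp if e["job"] == "Manager"]
    sales = [d for d in dept if d["name"] == "Sales"]
    by_deptno = {}
    for d in sales:
        by_deptno.setdefault(d["deptno"], []).append(d)
    result = [(e, d) for e in managers for d in by_deptno.get(e["deptno"], [])]
    return result, len(managers) + len(sales)     # intermediate tuples


canonical_result, canonical_size = canonical_plan(emp, dept)
optimised_result, optimised_size = optimised_plan(emp, dept)
print(canonical_size, optimised_size)
```

Both plans return the same answer, but the optimised plan materialises far fewer intermediate tuples, and the gap widens quickly as the relations grow.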

Query Processing Stage - 3
• Choose candidate low-level procedures
  - Consider the (optimised canonical) query as a series of low-level operations (join, restrict, etc.).
  - For each of these operations, generate alternative execution strategies and calculate the cost of such strategies on the basis of statistical information held about the database tables (files).

Query Processing Stage - 4
• Generate query plans and choose the cheapest
  - Construct a set of ‘candidate’ Query Execution Plans (QEPs).
  - Each QEP is constructed by selecting a candidate implementation procedure for each operation in the canonical query and then combining them to form a string of associated operations.
  - Each QEP will have an (estimated) cost associated with it: the sum of the cost of each of its operations.
  - Choose the QEP with the least cost.

Cost Based Optimisation
• Cost Based Optimisation (stages 3 & 4)
  - A good declarative query optimiser does not rely solely on heuristic processing strategies.
  - It chooses the QEP with the lowest estimated cost.
  - After heuristic rules are applied to a query, there still remain a number of alternative ways to execute it.
  - The Query Optimiser estimates the cost of executing each one (or at least a number) of these alternatives, and selects the cheapest one.
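
Plan selection can be sketched as an exhaustive search over one candidate procedure per operation. The cost figures below are hypothetical placeholders, not estimates from any real DBMS, and a real optimiser would usually prune the search rather than enumerate every combination.

```python
from itertools import product

# Hypothetical estimated costs (record I/Os) for each candidate procedure.
candidate_costs = {
    "restrict EMP": {"linear search": 1000, "secondary index": 60},
    "restrict DEPT": {"linear search": 50, "hash key": 5},
    "join": {"block nested loop": 300, "hash join": 120},
}


def enumerate_plans(candidates):
    """Yield (plan, cost) for every combination of candidate procedures."""
    operations = list(candidates)
    options = (candidates[op].items() for op in operations)
    for choice in product(*options):
        plan = {op: procedure for op, (procedure, _) in zip(operations, choice)}
        cost = sum(procedure_cost for _, procedure_cost in choice)
        yield plan, cost


plans = list(enumerate_plans(candidate_costs))
best_plan, best_cost = min(plans, key=lambda plan_cost: plan_cost[1])
print(best_plan, best_cost)
```

Because the cost of a QEP is taken here as the plain sum of its operation costs, picking the cheapest procedure for each operation independently would give the same answer; enumeration matters once the choice for one operation affects the cost of another, for example through sort order.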

Costs associated with query execution
• Secondary storage access costs:
  - Searching for data blocks on disk
  - Reading data blocks from disk
  - Writing data blocks to disk
• Storage costs
  - Cost of storing intermediate (temp) files
• Computation costs
  - Cost of CPU usage
• Main memory usage costs
  - Cost of buffering data
• Communication costs
  - Cost of moving data across

Database statistics used in cost estimation
Information held on each relation:
• number of tuples
• number of blocks
• blocking factor
• primary access method
• primary access attributes
• secondary indexes
• secondary indexing attributes
• number of levels for each index
• number of distinct values of each attribute

Physical Data Structures – File Types
• Heap (Sequential, Unordered)
  - no key columns
  - queries, other than appends, scan every page
  - rows are appended at the end
  - duplicate rows are allowed
• Ordered
  - physically sorted data file with no index
• Hash (Random, Direct)
  - data is located based on the (calculated) value of a hash field (key)
• Indexed Sequential (ISAM)
  - sorted data file with a primary index
• B+ Tree
  - dynamic multilevel index
  - reuses deleted space on associated data pages

Strategies for implementing the RESTRICT operation
Different access strategies are available, dependent upon the structure of the file in which the relation is stored and whether the predicate attribute(s) have been indexed/hashed. Each uses a different cost algorithm (which refers to specific database statistics).
• Linear Search (Heap)
• Binary Search (Ordered)
• Equality on Hash Key
• Equality condition on primary key
• Inequality condition on primary key
• Equality condition on secondary index
• Inequality condition on secondary B+ Tree index
If the selection predicate is a composite (AND & OR) then there are additional cost considerations!
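
As an illustration of how such cost algorithms draw on the database statistics, the sketch below estimates block accesses for the first two strategies. The formulas are the commonly quoted textbook approximations (a full scan, half a scan on average for an equality match on a key, and a logarithmic number of probes for binary search); the blocking factor of 20 is an assumed value, and the function names are hypothetical.

```python
import math


def n_blocks(n_tuples: int, blocking_factor: int) -> int:
    """Blocks needed to hold the relation: ceil(nTuples / bFactor)."""
    return math.ceil(n_tuples / blocking_factor)


def linear_search_cost(blocks: int, key_equality: bool = False) -> int:
    """Heap file: scan every block, or about half of them on average
    when searching for a single tuple by key."""
    return math.ceil(blocks / 2) if key_equality else blocks


def binary_search_cost(blocks: int) -> int:
    """Ordered file, equality on the ordering key: about log2(nBlocks)."""
    return max(1, math.ceil(math.log2(blocks)))


# EMP holds 1000 tuples; a blocking factor of 20 is assumed for illustration.
emp_blocks = n_blocks(1000, 20)
print(emp_blocks,
      linear_search_cost(emp_blocks),
      linear_search_cost(emp_blocks, key_equality=True),
      binary_search_cost(emp_blocks))
```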

Strategies for implementing the JOIN operation
Different access strategies are available, dependent upon the structure of the files in which the relations to be joined are stored and whether the join attributes have been indexed/hashed. Each uses its own cost algorithm (which refers to specific database statistics).
• Block nested loop join
• Indexed nested loop join
• Sort-merge join
• Hash join
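
Two of these strategies can be sketched in memory to show why their costs differ. The first function is a simplified tuple-at-a-time nested loop (the block variant reads a block of tuples per iteration to reduce I/O); the second is an in-memory hash join. Sample rows and function names are invented for illustration.

```python
from collections import defaultdict

# Invented sample data.
emp = [
    {"ename": "King", "deptno": 1},
    {"ename": "Ward", "deptno": 1},
    {"ename": "Ford", "deptno": 2},
    {"ename": "Adams", "deptno": 2},
]
dept = [
    {"deptno": 1, "name": "Sales"},
    {"deptno": 2, "name": "Research"},
]


def nested_loop_join(left, right, key):
    """Compare every left tuple with every right tuple."""
    return [(l, r) for l in left for r in right if l[key] == r[key]]


def hash_join(left, right, key):
    """Build a hash table on the right input, then probe it with the left."""
    table = defaultdict(list)
    for r in right:
        table[r[key]].append(r)
    return [(l, r) for l in left for r in table.get(l[key], [])]


nested = nested_loop_join(emp, dept, "deptno")
hashed = hash_join(emp, dept, "deptno")
print(len(nested), nested == hashed)
```

The nested loop performs len(left) * len(right) comparisons, while the hash join touches each input once plus the matching probes, which is why the optimiser needs the relation statistics to choose between them.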

Query Optimisation Summary
• The aims of query processing are to transform a query written in a high-level language (SQL) into a correct and efficient execution strategy expressed in a low-level language (Relational Algebra), and to execute the strategy to retrieve the required data.
• There are many equivalent transformations of the same high-level query; the DBMS has to choose the one that minimises resource usage.
• There are two main techniques for query optimisation. The first uses heuristic rules that order the operations in a query. The second compares different execution strategies for those operations, based on their relative costs, and selects the least resource-intensive (cheapest) ones.
