1
Chapter 4
Query Processing and Optimization
Query processing :
• In general, a query is a form of questioning, in a line of inquiry
• A query language is a language in which user requests
information from the database
 The aim of query processing is to find information in one or
more databases and deliver it to the user
Query optimization:
It is the process of choosing a suitable execution strategy for
processing a query
 Two internal representations of a query:
 Query Tree
 Query graph
Query Processing and Optimization
 Scanner: identify language components.
keywords, attribute, relation names
 Parser : check query system
 Validation: check attributes & relations
 Query tree (query graph) : internal
representation
 Execution strategy: plan
 Query optimization : choose a strategy
(reasonably efficient strategy)
3
Query Processing (cont…)
Query Processing (cont…)
 Query Processing can be divided into four main phases:
 Decomposition:
 Optimization
 Code generation, and
 Execution
 Decomposition: it is the process of transforming a high
level query into a relational algebra query, and to check
that the query is syntactically and semantically correct.
Query decomposition consists of parsing and validation
4
Query Processing (cont…)
 Typical stages in query decomposition are:
 Analysis: lexical and syntactical analysis of the query
(correctness).

Query tree will be built for the query containing leaf
node for base relations, one or many non-leaf
nodes for relations produced by relational algebra
operations and root node for the result of the query.
Sequence of operation is from the leaves to the
root.
 Normalization: convert the query into a normalized
form. The predicate WHERE will be converted to
Conjunctive () or Disjunctive () Normal form.
5
Query Processing (cont…)
 Typical stages in query decomposition(cont…)
 Semantic Analysis: to reject normalized queries that are not
correctly formulated or contradictory.

Incorrect if components do not contribute to generate
result.

Contradictory if the predicate can not be satisfied by any
tuple.
 Simplification: to detect redundant qualifications, eliminate
common sub-expressions, and transform the query to a
semantically equivalent but more easily and effectively
computed form.
 Query Restructuring When more than one translation is
possible, use transformation rules to re arranging nodes so
that the most restrictive condition will be executed first.
6
7
Transformation Rules for Relational Algebra
 Relational algebra operations:

Select, Project , Join, Union, Intersection, Cartesian Poduct
-
refer the text book for detail understanding
 General Transformation rules for relational
algebra
1. Cascade of s: A conjunctive selection condition can be
broken up into a cascade (sequence) of individual s
operations:

s c1 AND c2 AND ... AND cn(R) = sc1 (sc2 (...(scn(R))...) )
2. Commutativity of s:

The s operation is commutative:

sc1 (sc2(R)) = sc2 (sc1(R))
3. Cascade of p: In a cascade (sequence) of p operations, all
but the last one can be ignored:

pList1 (pList2 (...(pListn(R))...) ) = pList1(R)
8
Transformation Rules for Relational Algebra
(cont…)
4. Commuting s with p:
 If the selection condition c involves only the
attributes A1, ..., An in the projection list, the two
operations can be commuted:
 pA1, A2, ..., An (sc (R)) = sc (pA1, A2, ..., An (R))
5. Commuting of and X: both are commutative
R c S = S c R, RXS=SXR
9
Transformation Rules for Relational Algebra
(cont…)
6. Commuting p with : If projection list is L = {A1, ..., An,
B1, ..., Bm}, where A1, ..., An are attributes of R and
B1, ..., Bm are attributes of S and the join condition c
involves only attributes in L, the two operations can be
commuted as follows:

pL ( R C S ) = (pA1, ..., An (R)) C (p B1, ..., Bm (S))
7. Converting a (s, x) sequence into : If the condition c of
a s that follows a x Corresponds to a join condition,
convert the (s, x) sequence into a as follows:
(sC (R x S)) = (R C S)
Transformation Rules for Relational
Algebra (cont…)
 Reading Assignment
 8. Commutativity of THETA JOIN/Cartesian Product
R X S is equivalent to S X R
Also holds for Equi-Join and Natural-Join
(R c1S)= (S c1R)
 9. Commutativity of SELECTION with THETA JOIN
 If the predicate c1 involves only attributes of one of the
relations (R) being joined, then the Selection and Join
operations commute
10. Commuting SELECTION with SET OPERATIONS
11. Commuting PROJECTION with UNION
10
11
Query Processing
 Query Block:

It is the basic unit that can be translated into the
algebraic operators
 A query block contains a single SELECT-FROM-
WHERE expression, as well as GROUP BY and
HAVING clause if these are part of the block.
 Nested queries within a query are identified as
separate query blocks.

Example: Find the name of employees whose
salary is below the average salary of all employees
who are working at department number 5
12
Query Processing (cont…)
SELECT FNAME, LNAME
FROM EMPLOYEE
WHERE SALARY < C ( SELECT AVG (SALARY)
FROM EMPLOYEE
WHERE DNO = 5);
SELECT AVG (SALARY)
FROM EMPLOYEE
WHERE DNO = 5
SELECT FNAME, LNAME
FROM EMPLOYEE
WHERE SALARY < C
πFNAME, LNAME (σSALARY<C(EMPLOYEE)) ℱAVGSALARY (σDNO=5 (EMPLOYEE))
13
Basic algorithms for executing query
operations
 External sorting:
 Refers to sorting algorithms that are suitable for large
files of records stored on disk that do not fit entirely in
main memory, such as most database files
 External Sorting Uses Sort-Merge Strategy:
 Starts by sorting small sub files (runs) of the main file
and then merges the sorted runs, creating larger sorted
sub files that are merged in turn
 Sorting Phase:

Number of subfiles (runs) nR = (b/nB)
 Merging phase:

Degree of merging(dM) = Min (nB-1, nR); nP =
(logdM(nR))
14
Sort-Merge strategy (cont…)
 nR: number of initial runs;
 b: number of file blocks;
 nB: available buffer space;
 dM: degree of merging;
 P: number of passes
 The size of a run and number of initial run depends on the
number of file blocks (b) and available buffer space (nB)
 Example: if nB=5 blocks and size of the file=1024 blocks,
 nR=(b/nB)= (1024/5) =205 runs each of size 5 blocks
except the last run which will have 4 blocks.
 Hence, after the sort phase, 205 sorted runs are stored
as temporary subfiles on disk
15
Sort-Merge strategy (cont…)
 In the merging phase, the sorted runs are merged during
one or more passes.
 The degree of merging (dM) is the number of runs that
can be merged in each pass.
 dM=min((nB-1) and nR))
 The number of passes=[(logdM
(nR)
)]
 In each pass, one buffer block is needed to hold one block
from each of the runs being merged and one block is
needed for containing one block of the merge result
16
Sort-Merge strategy (cont…)
 In the above example, dM=4(four way merging)
 Hence, the 205 initial sorted runs would be merged into:
 52 at the end of the first pass
 13 at the end of the second pass
 4 at the end of the third pass
 1 at the end of the fourth pass
 The minimum dM of 2 gives the worst case performance
of the algorithm, which is

2*b +2(*(b*(log2
(nR)
)
 The first term represents the number of block access for
the sort phase and the second term represents the
number of block access for the merge phase
Use the following logical model to understand the
discussions in this chapter
17
Implementing SELECT operation
18
 There are many options for executing a SELECT
operation
 Some options depend on the file having specific
access paths and may apply only to certain types of
selection conditions
 Examples: Use the logical model provided in the
previous slide to understand the following operations:
(OP1) σSsn=12345689 (EMPLOYEE)
equality comparison on key attribute
(OP2) σDNUMBER > 5 (DEPARMENT)
nonequality comparison on key attribute
(OP3) σDNO=5 (EMPLOYEE)
equality comparison on non key attribute
(OP4) σDNO=5 AND SALARY >30000 AND SEX=F(EMPLOYEE)
conjunctive condition
(OP5) σSsn=123456789 AND PNO=10 (WORKS_ON)
conjunctive condition and composite key
19
Implementing the SELECT Operation (cont…)
 Search Methods for implementing Simple Selection:

S1 Linear search (brute force):

Retrieve every record in the file, and test whether its attribute
values satisfy the selection condition.

S2 Binary search:

If the selection condition involves an equality comparison on a
key attribute on which the file is ordered, binary search (which
is more efficient than linear search) can be used. An example
is OP1 if IDNo is the ordering attribute for EMPLOYEE file

S3 Using a primary index or hash key to retrieve a
single record:

If the selection condition involves an equality comparison on a
key attribute with a primary index (or a hash key), use the
primary index (or the hash key) to retrieve the record.

For Example, OP1 use primary index to retrieve the record
20
Implementing the SELECT Operation (cont…)
 Search Methods for implementing Simple Selection(cont…)
 S4 Using a primary index to retrieve multiple records:

If the comparison condition is >, ≥, <, or ≤ on a key
field with a primary index, use the index to find the
record satisfying the corresponding equality condition,
then retrieve all subsequent records in the (ordered)
file. (see OP2)

S5 Using a clustering index to retrieve multiple
records:

If the selection condition involves an equality
comparison on a non-key attribute with a clustering
index, use the clustering index to retrieve all the
records satisfying the selection condition. (See OP3)
21
Implementing the SELECT Operation
(cont…)

S6: using a secondary index on an equality
comparison:
• This search method can be used to retrieve a single record if
the indexing field is a key (has unique values) or to retrieve
multiple records if the indexing field is not a key
• Search Methods for implementing complex Selection

S7: Conjunctive selection using an individual index :

If an attribute involved in any single simple condition in
the conjunctive condition has an access path that
permits the use of one of the methods S2 to S5, use that
condition to retrieve the records and then check whether
each retrieved record satisfies the remaining simple
conditions in the conjunctive condition
22
Implementing the SELECT Operation (summarized)
 Whenever a single condition specifies the selection, we
can only check whether an access path exists on the
attribute involved in that condition.

If an access path exists, the method corresponding
to that access path is used;

Otherwise, the “brute force” linear search approach
of method S1 is used
 For conjunctive selection conditions, whenever
more than one of the attributes involved in the
conditions have an access path, query optimization
should be done to choose the access path that
retrieves the fewest records in the most efficient way
23
Implementing the SELECT Operation
(summarized)
 Disjunctive selection conditions: This is a situation where
simple conditions are connected by the OR logical
connective rather than AND
 Compared to conjunctive selection, It is much harder to
process and optimize
 Example:s DNO=5 OR SALARY>3000 OR SEX=‘F‘ (EMPLOYEE)
 Little optimization can be done because the records
satisfying the disjunctive condition are the union of the
records satisfying the individual conditions
 Hence, if any of the individual conditions does not have
an access path, we are compelled to use the brute force
approach
24
Implementing the JOIN Operation:
 The join operation is one of the most time
consuming operation in query processing
 Join

two–way join: a join on two files

e.g. R A=B S

multi-way joins: joins involving more than two files

e.g. R A=B S C=D T
 Examples

(OP6): EMPLOYEE DNO=DNUMBER DEPARTMENT

(OP7): DEPARTMENT MGRSSN=SSN EMPLOYEE
25
Implementing the JOIN Operation(cont…)
 Methods for implementing joins:
 J1 Nested-loop join (brute force):

For each record t in R (outer loop), retrieve every
record s from S (inner loop) and test whether the
two records satisfy the join condition t[A] = s[B]
 J2 Single-loop join (Using an access structure to
retrieve the matching records):

If an index (or hash key) exists for one of the two
join attributes — say, B of S — retrieve each record
t in R, one at a time, and then use the access
structure to retrieve directly all matching records s
from S that satisfy s[B] = t[A].
26
Algorithms for SELECT and JOIN
Operations (cont…)
 Implementing the JOIN Operation (cont...):
 Factors affecting JOIN performance
 Available buffer space
 Join selection factor
 Choice of inner VS outer relation
 Selectivity
 Is ratio of the number of records (tuples) that satisfy
the condition to the total number of records (tuples) in
the file (relation
Algorithms for SELECT and JOIN
Operations (cont…)
 is a number between zero and 1
 zero selectivity means no records satisfy the
condition and 1 means all the records satisfy the
condition.
 Although exact selectivities of all conditions may
not be available, estimates of selectivities are
often kept in the DBMS catalog and are used by
the optimizer
27
28
Algorithms for PROJECT and SET
Operations
 Algorithm for PROJECT operations ()
 <attribute list>(R)
1. If <attribute list> has a key of relation R, extract all tuples
from R with only the values for the attributes in <attribute
list>.
2. If <attribute list> does NOT include a key of relation R,
duplicated tuples must be removed from the results.
 Methods to remove duplicate tuples
1. Sorting
2. Hashing
Query Optimization
 Given the database structure, the challenge of query
optimization is to find the sequence of steps that produces
the answer to user request in the most efficient manner
 The performance of a query is affected by the tables or
queries that underlies the query and by the complexity of the
query.
 Given a request for data manipulation or retrieval, an
optimizer will choose an optimal plan for evaluating the
request from among the available alternative strategies.
 There are many ways (access paths) for accessing desired
file/record. The optimizer tries to select the most efficient
(cheapest) access path for accessing the data.
 DBMS is responsible to pick the best execution strategy
based on various considerations.
29
Approaches to Query Optimization
 Heuristics Approach
 The heuristic approach uses the knowledge of the
characteristics of the relational algebra operations and
the relationship between the operators to optimize the
query.
 Thus the heuristic approach of optimization will make
use of:

Properties of individual operators

Association between operators

Query Tree: a graphical representation of the
operators, relations, attributes and predicates and
processing sequence during query processing.
30
31
Using Heuristics in Query Optimization
 Process for heuristics optimization
1. The parser of a high-level query generates an initial
internal representation;
2. Apply heuristics rules to optimize the internal
representation.
3. A query execution plan is generated to execute groups of
operations based on the access paths available on the files
involved in the query
 The main heuristic is to apply first the operations that
reduce the size of intermediate results
 E.g., Apply SELECT and PROJECT operations before
applying the JOIN or other binary operations
32
Using Heuristics in Query Optimization
(cont…)
 Heuristic Optimization of Query Trees:
 The same query could correspond to many different
relational algebra expressions — and hence many different
query trees.
 The task of heuristic optimization of query trees is to find a
final query tree that is efficient to execute.
 Example: Find the name of all employees who are working
in AQUARIUS project and born before ‘1957-12-31’
Q: SELECT LNAME
FROM EMPLOYEE, WORKS_ON,
PROJECT
WHERE PNAME = ‘AQUARIUS’ AND
PNMUBER=PNO AND ESSN=SSN
AND BDATE < ‘1957-12-31’;
Steps in Typical Heuristic optimization
1. Deconstruct conjunctive selections into a sequence of
single selection operations (Equiv. rule 1.).
2. Move selection operations down the query tree for the
earliest possible execution
3. Execute first those selection and join operations that will
produce the smallest relations
4. Replace Cartesian product operations that are followed by
a selection condition by join operations
5. Deconstruct and move as far down the tree as possible
lists of projection attributes, creating new projections
where needed
33
34
Steps in converting a query tree during heuristic optimization:
Initial query tree for the query Q on slide 32
35
Steps in converting a query tree during heuristic optimization:
Moving the select operation down the tree
36
Steps in converting a query tree during heuristic optimization:
Applying the more restrictive select operation first
37
Steps in converting a query tree during heuristic optimization:
Replacing Cartesian product and select with join
38
Steps in converting a query tree during heuristic optimization:
Moving project operations down the query tree
39
Heuristics in Query Optimization (cont…)
 Summary of Heuristics for Algebraic Optimization:
1. The main heuristic is to apply first the operations that
reduce the size of intermediate results
2. Perform select operations as early as possible to reduce
the number of tuples and perform project operations as
early as possible to reduce the number of attributes. (This
is done by moving select and project operations as far
down the tree as possible.)
3. The select and join operations that are most restrictive
should be executed before other similar operations. (This
is done by reordering the leaf nodes of the tree among
themselves and adjusting the rest of the tree
appropriately.)
40
Using Selectivity and Cost Estimates in
Query Optimization
 Cost-based query optimization:
 Estimate and compare the costs of executing a query
using different execution strategies and choose the
strategy with the lowest cost estimate
 Cost Components for Query Execution
1. Access cost to secondary storage
2. Computation cost
3. Memory usage cost
4. Communication cost
 Note: Different database systems may focus on different
cost components.

41
Using Selectivity and Cost Estimates in
Query Optimization (cont…)
 Catalog Information Used in Cost Functions
 Information about the size of a file

number of records (tuples) (r),

record size (R),

number of blocks (b)

blocking factor (bfr)
 Information about indexes and indexing attributes of a file

Number of levels (x) of each multilevel index

Number of first-level index blocks (bI1)

Number of distinct values (d) of an attribute

Selectivity (sl) of an attribute
42
Semantic Query Optimization
 Semantic Query Optimization:
 Uses constraints specified on the database schema in order to
modify one query into another query that is more efficient to
execute
 Read more about:
 Sematic query optimization and
 Semantic Query Optimization (SQO) Techniques

Chapter 4 - Query Processing and Optimization.pptx

  • 1.
    1 Chapter 4 Query Processingand Optimization Query processing : • In general, a query is a form of questioning, in a line of inquiry • A query language is a language in which user requests information from the database  The aim of query processing is to find information in one or more databases and deliver it to the user Query optimization: It is the process of choosing a suitable execution strategy for processing a query  Two internal representations of a query:  Query Tree  Query graph
  • 2.
    Query Processing andOptimization  Scanner: identify language components. keywords, attribute, relation names  Parser : check query system  Validation: check attributes & relations  Query tree (query graph) : internal representation  Execution strategy: plan  Query optimization : choose a strategy (reasonably efficient strategy)
  • 3.
  • 4.
    Query Processing (cont…) Query Processing can be divided into four main phases:  Decomposition:  Optimization  Code generation, and  Execution  Decomposition: it is the process of transforming a high level query into a relational algebra query, and to check that the query is syntactically and semantically correct. Query decomposition consists of parsing and validation 4
  • 5.
    Query Processing (cont…) Typical stages in query decomposition are:  Analysis: lexical and syntactical analysis of the query (correctness).  Query tree will be built for the query containing leaf node for base relations, one or many non-leaf nodes for relations produced by relational algebra operations and root node for the result of the query. Sequence of operation is from the leaves to the root.  Normalization: convert the query into a normalized form. The predicate WHERE will be converted to Conjunctive () or Disjunctive () Normal form. 5
  • 6.
    Query Processing (cont…) Typical stages in query decomposition(cont…)  Semantic Analysis: to reject normalized queries that are not correctly formulated or contradictory.  Incorrect if components do not contribute to generate result.  Contradictory if the predicate can not be satisfied by any tuple.  Simplification: to detect redundant qualifications, eliminate common sub-expressions, and transform the query to a semantically equivalent but more easily and effectively computed form.  Query Restructuring When more than one translation is possible, use transformation rules to re arranging nodes so that the most restrictive condition will be executed first. 6
  • 7.
    7 Transformation Rules forRelational Algebra  Relational algebra operations:  Select, Project , Join, Union, Intersection, Cartesian Poduct - refer the text book for detail understanding  General Transformation rules for relational algebra 1. Cascade of s: A conjunctive selection condition can be broken up into a cascade (sequence) of individual s operations:  s c1 AND c2 AND ... AND cn(R) = sc1 (sc2 (...(scn(R))...) ) 2. Commutativity of s:  The s operation is commutative:  sc1 (sc2(R)) = sc2 (sc1(R)) 3. Cascade of p: In a cascade (sequence) of p operations, all but the last one can be ignored:  pList1 (pList2 (...(pListn(R))...) ) = pList1(R)
  • 8.
    8 Transformation Rules forRelational Algebra (cont…) 4. Commuting s with p:  If the selection condition c involves only the attributes A1, ..., An in the projection list, the two operations can be commuted:  pA1, A2, ..., An (sc (R)) = sc (pA1, A2, ..., An (R)) 5. Commuting of and X: both are commutative R c S = S c R, RXS=SXR
  • 9.
    9 Transformation Rules forRelational Algebra (cont…) 6. Commuting p with : If projection list is L = {A1, ..., An, B1, ..., Bm}, where A1, ..., An are attributes of R and B1, ..., Bm are attributes of S and the join condition c involves only attributes in L, the two operations can be commuted as follows:  pL ( R C S ) = (pA1, ..., An (R)) C (p B1, ..., Bm (S)) 7. Converting a (s, x) sequence into : If the condition c of a s that follows a x Corresponds to a join condition, convert the (s, x) sequence into a as follows: (sC (R x S)) = (R C S)
  • 10.
    Transformation Rules forRelational Algebra (cont…)  Reading Assignment  8. Commutativity of THETA JOIN/Cartesian Product R X S is equivalent to S X R Also holds for Equi-Join and Natural-Join (R c1S)= (S c1R)  9. Commutativity of SELECTION with THETA JOIN  If the predicate c1 involves only attributes of one of the relations (R) being joined, then the Selection and Join operations commute 10. Commuting SELECTION with SET OPERATIONS 11. Commuting PROJECTION with UNION 10
  • 11.
    11 Query Processing  QueryBlock:  It is the basic unit that can be translated into the algebraic operators  A query block contains a single SELECT-FROM- WHERE expression, as well as GROUP BY and HAVING clause if these are part of the block.  Nested queries within a query are identified as separate query blocks.  Example: Find the name of employees whose salary is below the average salary of all employees who are working at department number 5
  • 12.
    12 Query Processing (cont…) SELECTFNAME, LNAME FROM EMPLOYEE WHERE SALARY < C ( SELECT AVG (SALARY) FROM EMPLOYEE WHERE DNO = 5); SELECT AVG (SALARY) FROM EMPLOYEE WHERE DNO = 5 SELECT FNAME, LNAME FROM EMPLOYEE WHERE SALARY < C πFNAME, LNAME (σSALARY<C(EMPLOYEE)) ℱAVGSALARY (σDNO=5 (EMPLOYEE))
  • 13.
    13 Basic algorithms forexecuting query operations  External sorting:  Refers to sorting algorithms that are suitable for large files of records stored on disk that do not fit entirely in main memory, such as most database files  External Sorting Uses Sort-Merge Strategy:  Starts by sorting small sub files (runs) of the main file and then merges the sorted runs, creating larger sorted sub files that are merged in turn  Sorting Phase:  Number of subfiles (runs) nR = (b/nB)  Merging phase:  Degree of merging(dM) = Min (nB-1, nR); nP = (logdM(nR))
  • 14.
    14 Sort-Merge strategy (cont…) nR: number of initial runs;  b: number of file blocks;  nB: available buffer space;  dM: degree of merging;  P: number of passes  The size of a run and number of initial run depends on the number of file blocks (b) and available buffer space (nB)  Example: if nB=5 blocks and size of the file=1024 blocks,  nR=(b/nB)= (1024/5) =205 runs each of size 5 blocks except the last run which will have 4 blocks.  Hence, after the sort phase, 205 sorted runs are stored as temporary subfiles on disk
  • 15.
    15 Sort-Merge strategy (cont…) In the merging phase, the sorted runs are merged during one or more passes.  The degree of merging (dM) is the number of runs that can be merged in each pass.  dM=min((nB-1) and nR))  The number of passes=[(logdM (nR) )]  In each pass, one buffer block is needed to hold one block from each of the runs being merged and one block is needed for containing one block of the merge result
  • 16.
    16 Sort-Merge strategy (cont…) In the above example, dM=4(four way merging)  Hence, the 205 initial sorted runs would be merged into:  52 at the end of the first pass  13 at the end of the second pass  4 at the end of the third pass  1 at the end of the fourth pass  The minimum dM of 2 gives the worst case performance of the algorithm, which is  2*b +2(*(b*(log2 (nR) )  The first term represents the number of block access for the sort phase and the second term represents the number of block access for the merge phase
  • 17.
    Use the followinglogical model to understand the discussions in this chapter 17
  • 18.
    Implementing SELECT operation 18 There are many options for executing a SELECT operation  Some options depend on the file having specific access paths and may apply only to certain types of selection conditions  Examples: Use the logical model provided in the previous slide to understand the following operations: (OP1) σSsn=12345689 (EMPLOYEE) equality comparison on key attribute (OP2) σDNUMBER > 5 (DEPARMENT) nonequality comparison on key attribute (OP3) σDNO=5 (EMPLOYEE) equality comparison on non key attribute (OP4) σDNO=5 AND SALARY >30000 AND SEX=F(EMPLOYEE) conjunctive condition (OP5) σSsn=123456789 AND PNO=10 (WORKS_ON) conjunctive condition and composite key
  • 19.
    19 Implementing the SELECTOperation (cont…)  Search Methods for implementing Simple Selection:  S1 Linear search (brute force):  Retrieve every record in the file, and test whether its attribute values satisfy the selection condition.  S2 Binary search:  If the selection condition involves an equality comparison on a key attribute on which the file is ordered, binary search (which is more efficient than linear search) can be used. An example is OP1 if IDNo is the ordering attribute for EMPLOYEE file  S3 Using a primary index or hash key to retrieve a single record:  If the selection condition involves an equality comparison on a key attribute with a primary index (or a hash key), use the primary index (or the hash key) to retrieve the record.  For Example, OP1 use primary index to retrieve the record
  • 20.
    20 Implementing the SELECTOperation (cont…)  Search Methods for implementing Simple Selection(cont…)  S4 Using a primary index to retrieve multiple records:  If the comparison condition is >, ≥, <, or ≤ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition, then retrieve all subsequent records in the (ordered) file. (see OP2)  S5 Using a clustering index to retrieve multiple records:  If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition. (See OP3)
  • 21.
    21 Implementing the SELECTOperation (cont…)  S6: using a secondary index on an equality comparison: • This search method can be used to retrieve a single record if the indexing field is a key (has unique values) or to retrieve multiple records if the indexing field is not a key • Search Methods for implementing complex Selection  S7: Conjunctive selection using an individual index :  If an attribute involved in any single simple condition in the conjunctive condition has an access path that permits the use of one of the methods S2 to S5, use that condition to retrieve the records and then check whether each retrieved record satisfies the remaining simple conditions in the conjunctive condition
  • 22.
    22 Implementing the SELECTOperation (summarized)  Whenever a single condition specifies the selection, we can only check whether an access path exists on the attribute involved in that condition.  If an access path exists, the method corresponding to that access path is used;  Otherwise, the “brute force” linear search approach of method S1 is used  For conjunctive selection conditions, whenever more than one of the attributes involved in the conditions have an access path, query optimization should be done to choose the access path that retrieves the fewest records in the most efficient way
  • 23.
    23 Implementing the SELECTOperation (summarized)  Disjunctive selection conditions: This is a situation where simple conditions are connected by the OR logical connective rather than AND  Compared to conjunctive selection, It is much harder to process and optimize  Example:s DNO=5 OR SALARY>3000 OR SEX=‘F‘ (EMPLOYEE)  Little optimization can be done because the records satisfying the disjunctive condition are the union of the records satisfying the individual conditions  Hence, if any of the individual conditions does not have an access path, we are compelled to use the brute force approach
  • 24.
    24 Implementing the JOINOperation:  The join operation is one of the most time consuming operation in query processing  Join  two–way join: a join on two files  e.g. R A=B S  multi-way joins: joins involving more than two files  e.g. R A=B S C=D T  Examples  (OP6): EMPLOYEE DNO=DNUMBER DEPARTMENT  (OP7): DEPARTMENT MGRSSN=SSN EMPLOYEE
  • 25.
    25 Implementing the JOINOperation(cont…)  Methods for implementing joins:  J1 Nested-loop join (brute force):  For each record t in R (outer loop), retrieve every record s from S (inner loop) and test whether the two records satisfy the join condition t[A] = s[B]  J2 Single-loop join (Using an access structure to retrieve the matching records):  If an index (or hash key) exists for one of the two join attributes — say, B of S — retrieve each record t in R, one at a time, and then use the access structure to retrieve directly all matching records s from S that satisfy s[B] = t[A].
  • 26.
    26 Algorithms for SELECTand JOIN Operations (cont…)  Implementing the JOIN Operation (cont...):  Factors affecting JOIN performance  Available buffer space  Join selection factor  Choice of inner VS outer relation  Selectivity  Is ratio of the number of records (tuples) that satisfy the condition to the total number of records (tuples) in the file (relation
  • 27.
    Algorithms for SELECTand JOIN Operations (cont…)  is a number between zero and 1  zero selectivity means no records satisfy the condition and 1 means all the records satisfy the condition.  Although exact selectivities of all conditions may not be available, estimates of selectivities are often kept in the DBMS catalog and are used by the optimizer 27
  • 28.
    28 Algorithms for PROJECTand SET Operations  Algorithm for PROJECT operations ()  <attribute list>(R) 1. If <attribute list> has a key of relation R, extract all tuples from R with only the values for the attributes in <attribute list>. 2. If <attribute list> does NOT include a key of relation R, duplicated tuples must be removed from the results.  Methods to remove duplicate tuples 1. Sorting 2. Hashing
  • 29.
    Query Optimization  Giventhe database structure, the challenge of query optimization is to find the sequence of steps that produces the answer to user request in the most efficient manner  The performance of a query is affected by the tables or queries that underlies the query and by the complexity of the query.  Given a request for data manipulation or retrieval, an optimizer will choose an optimal plan for evaluating the request from among the available alternative strategies.  There are many ways (access paths) for accessing desired file/record. The optimizer tries to select the most efficient (cheapest) access path for accessing the data.  DBMS is responsible to pick the best execution strategy based on various considerations. 29
  • 30.
    Approaches to QueryOptimization  Heuristics Approach  The heuristic approach uses the knowledge of the characteristics of the relational algebra operations and the relationship between the operators to optimize the query.  Thus the heuristic approach of optimization will make use of:  Properties of individual operators  Association between operators  Query Tree: a graphical representation of the operators, relations, attributes and predicates and processing sequence during query processing. 30
  • 31.
    31 Using Heuristics inQuery Optimization  Process for heuristics optimization 1. The parser of a high-level query generates an initial internal representation; 2. Apply heuristics rules to optimize the internal representation. 3. A query execution plan is generated to execute groups of operations based on the access paths available on the files involved in the query  The main heuristic is to apply first the operations that reduce the size of intermediate results  E.g., Apply SELECT and PROJECT operations before applying the JOIN or other binary operations
  • 32.
    32 Using Heuristics inQuery Optimization (cont…)  Heuristic Optimization of Query Trees:  The same query could correspond to many different relational algebra expressions — and hence many different query trees.  The task of heuristic optimization of query trees is to find a final query tree that is efficient to execute.  Example: Find the name of all employees who are working in AQUARIUS project and born before ‘1957-12-31’ Q: SELECT LNAME FROM EMPLOYEE, WORKS_ON, PROJECT WHERE PNAME = ‘AQUARIUS’ AND PNMUBER=PNO AND ESSN=SSN AND BDATE < ‘1957-12-31’;
  • 33.
    Steps in TypicalHeuristic optimization 1. Deconstruct conjunctive selections into a sequence of single selection operations (Equiv. rule 1.). 2. Move selection operations down the query tree for the earliest possible execution 3. Execute first those selection and join operations that will produce the smallest relations 4. Replace Cartesian product operations that are followed by a selection condition by join operations 5. Deconstruct and move as far down the tree as possible lists of projection attributes, creating new projections where needed 33
  • 34.
    34 Steps in convertinga query tree during heuristic optimization: Initial query tree for the query Q on slide 32
  • 35.
    35 Steps in convertinga query tree during heuristic optimization: Moving the select operation down the tree
  • 36.
    36 Steps in convertinga query tree during heuristic optimization: Applying the more restrictive select operation first
  • 37.
    37 Steps in convertinga query tree during heuristic optimization: Replacing Cartesian product and select with join
  • 38.
    38 Steps in convertinga query tree during heuristic optimization: Moving project operations down the query tree
  • 39.
    39 Heuristics in QueryOptimization (cont…)  Summary of Heuristics for Algebraic Optimization: 1. The main heuristic is to apply first the operations that reduce the size of intermediate results 2. Perform select operations as early as possible to reduce the number of tuples and perform project operations as early as possible to reduce the number of attributes. (This is done by moving select and project operations as far down the tree as possible.) 3. The select and join operations that are most restrictive should be executed before other similar operations. (This is done by reordering the leaf nodes of the tree among themselves and adjusting the rest of the tree appropriately.)
  • 40.
    40 Using Selectivity andCost Estimates in Query Optimization  Cost-based query optimization:  Estimate and compare the costs of executing a query using different execution strategies and choose the strategy with the lowest cost estimate  Cost Components for Query Execution 1. Access cost to secondary storage 2. Computation cost 3. Memory usage cost 4. Communication cost  Note: Different database systems may focus on different cost components. 
  • 41.
    41 Using Selectivity andCost Estimates in Query Optimization (cont…)  Catalog Information Used in Cost Functions  Information about the size of a file  number of records (tuples) (r),  record size (R),  number of blocks (b)  blocking factor (bfr)  Information about indexes and indexing attributes of a file  Number of levels (x) of each multilevel index  Number of first-level index blocks (bI1)  Number of distinct values (d) of an attribute  Selectivity (sl) of an attribute
  • 42.
    42 Semantic Query Optimization Semantic Query Optimization:  Uses constraints specified on the database schema in order to modify one query into another query that is more efficient to execute  Read more about:  Sematic query optimization and  Semantic Query Optimization (SQO) Techniques