CHAPTER - 2
QUERY PROCESSING AND
OPTIMIZATION
1
Objectives of the chapter
At the end of the chapter : the student able to understand :-
 Introduction
 Query Processing steps
 Translating SQL Queries into Relational Algebra
 Heuristic Rules in Query Optimization
 Selectivity and Cost Estimates in Query Optimization
2
3
Introduction of Query
Processing
Overview of Query Processing
 A query can either be a request for data results from your database or for
action on the data, or for both.
 A query can give you an answer to a simple question, combine data from
different tables, add, change, or delete data from a database.
 A database query is the vehicle for instructing a DBMS to update or
retrieve specific data to/from the physically stored medium.
 Query processing refers to the range of activities involved in
extracting data from a database.
Cont’d
 The query processing activities includes transform a query written in a high-
level language, typically SQL into expressions that can be used at the physical
level of the file system, a variety of query-optimizing transformations, into
correct and efficient execution strategy expressed in a low-level language
(implementing the relational algebra), and to execute the strategy to retrieve the
required data.
 Examples of such operations for a relational DBMS can be relational algebra
operations such as project, join, select, Cartesian product, etc.
 Query Processing takes various steps for fetching the data from the database.
6
Query Processing Steps
Query Processing Steps
 There are three phases that a query passes through during the DBMS’
processing of that query:
 Decomposition (Parsing and translation)
 Optimization
 Evaluation
1. Query Decomposition
 Query decomposition is the first phase of query processing.
 The aims of query decomposition are to transform a high-level query into a
relational algebra query and to check whether the query is syntactically and
semantically correct.
 A query expressed in a high-level query language such as SQL must first be
scanned, parsed, and validated.
Cont’d
 Scanner - Identifies the language tokens—such as SQL keywords, attribute
names, and relation names—that appear in the text of the query.
 Parser - Checks the query syntax to determine whether it is formulated
according to the syntax rules (rules of grammar) of the query language.
 Validator - Validate by checking that all attribute and relation names are valid
and semantically meaningful names in the schema of the particular database
being queried.
Cont’d
 Parser extract the tokens from the raw string of characters and translate
them into the corresponding internal data elements (i.e. Relational
algebra operations and operands) and structures.
 An internal representation (Translator) of the query is then created, usually
as a tree data structure called a query tree or a graph data structure called a
query graph.
2. Query Optimization Process
 Optimization Definition from Dictionary:- is finding an alternative with the most cost
effective or highest achievable performance under the given constraints.
 In second stage, the query processor applies rules to the internal data structures of the query
to transform these structures into equivalent, but more efficient representations.
 Selecting the proper rules to apply, when to apply them and how they are applied is the
function of the query optimization engine.
 A query typically has many possible execution strategies, and the process of choosing a
suitable one for processing a query is known as query optimization.
3. Query Evaluation
 The final step in processing a query is the evaluation phase.
 The best evaluation plan candidates generated by the optimization engine is selected
and then executed.
 Code generator generates the code to execute that plan either in compiled or
interpreted mode.
 The runtime database processor has the task of running (executing) the query code,
whether in compiled or interpreted mode, to produce the query result.
Figure 1:- for Steps in Query Processing
Figure 2:- for Steps in Query Processing
15
Translating SQL Queries into
Relational Algebra (RA)
Translating SQL Queries into Relational
Algebra
 An SQL query is first translated into an equivalent extended relational algebra
expression—represented as a query tree data structure—that is then optimized.
 SQL queries are decomposed into query blocks, (Query block - basic units that can
be translated into the algebraic operators and optimized.
 Operations for a relational DBMS can be relational algebra operations such as
project, join, select, Cartesian product, etc.
 Nested(inner) queries within a query are identified as separate query block
Generating Execution Plan –Example 1
 SQL QUERY:- 1, SELECT title, price FROM book WHERE price >50
 This can be translated into either of the following RELATIONAL ALGEBRA
expressions:
1. σprice>50(Πtitle,price(book))
2. Πtitle,price (σprice>50 (book))
Generating Execution Plan –Example 1
 This can also be represented as either of the following query trees:
Generating Execution Plan –Example 2
 SQL QUERY:- 2, SELECT Fname, Lname FROM Employee WHERE age>25 AND
sex=female
 This can be translated into either of the following RELATIONAL ALGEBRA
expressions:
1. σage>25 AND σsex=female(ΠFname,Lname ( Employee))
2. ΠFname,Lname (σage>25 AND σsex=female (Employee))
Generating Execution Plan –Example 2
 This can also be represented as either of the following query trees:
σage>25 AND σsex=female ΠFname,Lname
ΠFname,Lname σage>25 AND σsex=female
Employee Employee
Generating Execution Plan –Example 3
 SQL QUERY:- 3, SELECT balance FROM account WHERE balance < 2500
 This can be translated into either of the following RELATIONAL ALGEBRA
expressions:
1. σbalance< 2500(Πbalance(account))
2. Πbalance(σbalance<2500(account))
Generating Execution Plan –Example 3
 This can also be represented as either of the following query trees:
Generating Execution Plan –Example 4
SQL QUERY:- 4, SELECT Lname, Fname
FROM EMPLOYEE
WHERE Salary> (SELECT MAX(Salary)
FROM EMPLOYEE
WHERE Dno=5 );
 This query retrieves the names of employees (from any department in the
company) who earn a salary that is greater than the highest salary in department 5.
 The query includes a nested subquery and hence would be decomposed into
two blocks.
Generating Execution Plan –Example 4
 The inner block is:
(SELECT MAX(Salary) FROM EMPLOYEE WHERE Dno=5 )
 This retrieves the highest salary in department 5.
 The outer query block is:
SELECT Lname, Fname FROM EMPLOYEE WHERE Salary >c
 where c represents the result returned from the inner block.
Generating Execution Plan –Example 4
The inner block could be translated into the following extended relational algebra
expression:
ΠMAX Salary(σDno=5(EMPLOYEE))
And the outer block into the expression:
ΠLname,Fname(σSalary>C(EMPLOYEE))
 The query optimizer would then choose an execution plan for each query block.
 Notice that in the above example, the inner block needs to be evaluated only once to
produce the maximum salary of employees in department 5, which is then used as the
constant c by the outer block.
26
Heuristics Approach to Query
Optimization
Heuristics Approach to Query Optimization
 Heuristic rules are used to modify the internal representation of a query which is
usually in the form of a query tree or a query graph data structure to improve its
expected performance.
 The rules typically reorder the operation in query tree.
 The scanner and parser of an SQL query first generate a data structure that
corresponds to an initial query representation, which is then optimized according to
heuristic rules.
 This leads to an optimized query representation, which corresponds to the query
execution strategy.
Cont’d
 One of the main heuristic rules is to apply SELECT and PROJECT operations
before applying the JOIN or other binary operations.
 Because the size of the file resulting from a binary operation such as JOIN is
usually a multiplicative function of the sizes of the input files.
 The SELECT and PROJECT operations reduce the size of a file and hence should
be applied before a join or other binary operation.
 Heuristic is rule that works well in most cases but is not guaranteed to work well in
every possible cases.
Notation for Query Trees
 Query tree - Tree data structure that corresponds to a relational algebra expression.
 Input relations of the query as leaf nodes of the tree
 Relational algebra operations as internal nodes.
 An execution of the query tree consists of executing an internal node operation
whenever its operands are available and then replacing that internal node by the
relation that results from executing the operation.
 The order of execution of operations starts at the leaf nodes, and ends at the root
node.
 The execution terminates when the root node operation is executed and produces
the result relation for the query.
Steps in Converting A Query Tree During
Heuristic Optimization
1. Initial (canonical) query tree for SQL query Q.
2. Moving SELECT operations down the query tree.
3. Applying the more restrictive SELECT operation first.
4. Replacing CARTESIAN PRODUCT and SELECT with JOIN operations.
5. Moving PROJECT operations down the query tree execute.
Exercise
CONSIDER THE FOLLOWING TABLE
Employee (Fname,Mname,Lname,SSN,Bdate,Address,Salary,Gender,Dno)
Project (Pname,Pnumber,Plocation,Dnum)
Woks_on (ESSN,Pno,Hours)
 QUERY
SELECT Lname
FROM Employee ,Works_on, Project
WHERE PName=‘Aquaris’AND PNumber=Pno AND
ESSN=SSN AND BDATE>’1977-12-21’ ;
Consider the following relations for the
examples provided here after
Exercise 1:
 List name of students, course taken by them and name of the
department giving the course for all whose cgpa>3.
 QUERY
SELECT Name, Cname, Dname
FROM Student ,Course, Department
WHERE Cid=Cno AND Did=Dno AND Cgpa>3 ;
Exercise 1:
 Relational Algebraic Expression
Exercise 1:
 Step 1 :Draw the Initial (canonical) form
Name,Cname,Dname
 Cid=Cno and Did=Dno and Cgpa>3
×
× Department
Student Course
Exercise 1:
 Step 2: Moving SELECT operations down the query tree
Exercise 1:
 Step 4: Replacing CARTESIAN PRODUCT and SELECT with JOIN operations
 NOTE:- We already jump step 3 due to the
reason we do not have more select operation.
Exercise 1:
 Step 5: Moving PROJECT operations down the query tree execute
39
Using Selectivity and Cost
Estimates in Query Optimization
Using Selectivity and Cost Estimates in Query
Optimization
 Heuristic is rule that works well in most cases but is not guaranteed to work well in
every possible cases.
 A query optimizer does not depend solely on heuristic rules; it also estimates and
compares the costs of executing a query using different execution strategies and
algorithms, and it then chooses the strategy with the lowest cost estimate.
 In addition, the optimizer must limit the number of execution strategies to be
considered; otherwise, too much time will be spent making cost estimates for the
many possible execution strategies.
Cost Components for Query Execution
 The cost of executing a query includes the following components:
1. Access cost to secondary storage: This is the cost of transferring (reading and
writing) data blocks between secondary disk storage and main memory buffers.
This is also known as disk I/O (input/output) cost.
2. Disk storage cost: This is the cost of storing on disk any intermediate files that are
generated by an execution strategy for the query.
Cont’d
3. Computation cost: This is the cost of performing in-memory operations on the records within
the data buffers during query execution.
 Such operations include searching for and sorting records, merging records for a join
or a sort operation, and performing computations on field values.
 This is also known as CPU (central processing unit) cost.
4. Memory usage cost: This is the cost relating to the number of main memory buffers needed
during query execution.
5. Communication cost: This is the cost that is associated with sending or communicating the
query and its results from one place to another. It also includes the cost of transferring the table
and results to the various sites during the process of query evaluation.
Individual Assignment
 Deadline 14/04/2015 E.C
 Algorithms for Executing query
 External sorting
 Sort strategy
 Merge strategy
 Algorithm for select operation
 Methods and Implementation of join operation
43
44

Ch-2-Query-Process.pptx advanced database

  • 1.
    CHAPTER - 2 QUERYPROCESSING AND OPTIMIZATION 1
  • 2.
    Objectives of thechapter At the end of the chapter : the student able to understand :-  Introduction  Query Processing steps  Translating SQL Queries into Relational Algebra  Heuristic Rules in Query Optimization  Selectivity and Cost Estimates in Query Optimization 2
  • 3.
  • 4.
    Overview of QueryProcessing  A query can either be a request for data results from your database or for action on the data, or for both.  A query can give you an answer to a simple question, combine data from different tables, add, change, or delete data from a database.  A database query is the vehicle for instructing a DBMS to update or retrieve specific data to/from the physically stored medium.  Query processing refers to the range of activities involved in extracting data from a database.
  • 5.
    Cont’d  The queryprocessing activities includes transform a query written in a high- level language, typically SQL into expressions that can be used at the physical level of the file system, a variety of query-optimizing transformations, into correct and efficient execution strategy expressed in a low-level language (implementing the relational algebra), and to execute the strategy to retrieve the required data.  Examples of such operations for a relational DBMS can be relational algebra operations such as project, join, select, Cartesian product, etc.  Query Processing takes various steps for fetching the data from the database.
  • 6.
  • 7.
    Query Processing Steps There are three phases that a query passes through during the DBMS’ processing of that query:  Decomposition (Parsing and translation)  Optimization  Evaluation
  • 8.
    1. Query Decomposition Query decomposition is the first phase of query processing.  The aims of query decomposition are to transform a high-level query into a relational algebra query and to check whether the query is syntactically and semantically correct.  A query expressed in a high-level query language such as SQL must first be scanned, parsed, and validated.
  • 9.
    Cont’d  Scanner -Identifies the language tokens—such as SQL keywords, attribute names, and relation names—that appear in the text of the query.  Parser - Checks the query syntax to determine whether it is formulated according to the syntax rules (rules of grammar) of the query language.  Validator - Validate by checking that all attribute and relation names are valid and semantically meaningful names in the schema of the particular database being queried.
  • 10.
    Cont’d  Parser extractthe tokens from the raw string of characters and translate them into the corresponding internal data elements (i.e. Relational algebra operations and operands) and structures.  An internal representation (Translator) of the query is then created, usually as a tree data structure called a query tree or a graph data structure called a query graph.
  • 11.
    2. Query OptimizationProcess  Optimization Definition from Dictionary:- is finding an alternative with the most cost effective or highest achievable performance under the given constraints.  In second stage, the query processor applies rules to the internal data structures of the query to transform these structures into equivalent, but more efficient representations.  Selecting the proper rules to apply, when to apply them and how they are applied is the function of the query optimization engine.  A query typically has many possible execution strategies, and the process of choosing a suitable one for processing a query is known as query optimization.
  • 12.
    3. Query Evaluation The final step in processing a query is the evaluation phase.  The best evaluation plan candidates generated by the optimization engine is selected and then executed.  Code generator generates the code to execute that plan either in compiled or interpreted mode.  The runtime database processor has the task of running (executing) the query code, whether in compiled or interpreted mode, to produce the query result.
  • 13.
    Figure 1:- forSteps in Query Processing
  • 14.
    Figure 2:- forSteps in Query Processing
  • 15.
    15 Translating SQL Queriesinto Relational Algebra (RA)
  • 16.
    Translating SQL Queriesinto Relational Algebra  An SQL query is first translated into an equivalent extended relational algebra expression—represented as a query tree data structure—that is then optimized.  SQL queries are decomposed into query blocks, (Query block - basic units that can be translated into the algebraic operators and optimized.  Operations for a relational DBMS can be relational algebra operations such as project, join, select, Cartesian product, etc.  Nested(inner) queries within a query are identified as separate query block
  • 17.
    Generating Execution Plan–Example 1  SQL QUERY:- 1, SELECT title, price FROM book WHERE price >50  This can be translated into either of the following RELATIONAL ALGEBRA expressions: 1. σprice>50(Πtitle,price(book)) 2. Πtitle,price (σprice>50 (book))
  • 18.
    Generating Execution Plan–Example 1  This can also be represented as either of the following query trees:
  • 19.
    Generating Execution Plan–Example 2  SQL QUERY:- 2, SELECT Fname, Lname FROM Employee WHERE age>25 AND sex=female  This can be translated into either of the following RELATIONAL ALGEBRA expressions: 1. σage>25 AND σsex=female(ΠFname,Lname ( Employee)) 2. ΠFname,Lname (σage>25 AND σsex=female (Employee))
  • 20.
    Generating Execution Plan–Example 2  This can also be represented as either of the following query trees: σage>25 AND σsex=female ΠFname,Lname ΠFname,Lname σage>25 AND σsex=female Employee Employee
  • 21.
    Generating Execution Plan–Example 3  SQL QUERY:- 3, SELECT balance FROM account WHERE balance < 2500  This can be translated into either of the following RELATIONAL ALGEBRA expressions: 1. σbalance< 2500(Πbalance(account)) 2. Πbalance(σbalance<2500(account))
  • 22.
    Generating Execution Plan–Example 3  This can also be represented as either of the following query trees:
  • 23.
    Generating Execution Plan–Example 4 SQL QUERY:- 4, SELECT Lname, Fname FROM EMPLOYEE WHERE Salary> (SELECT MAX(Salary) FROM EMPLOYEE WHERE Dno=5 );  This query retrieves the names of employees (from any department in the company) who earn a salary that is greater than the highest salary in department 5.  The query includes a nested subquery and hence would be decomposed into two blocks.
  • 24.
    Generating Execution Plan–Example 4  The inner block is: (SELECT MAX(Salary) FROM EMPLOYEE WHERE Dno=5 )  This retrieves the highest salary in department 5.  The outer query block is: SELECT Lname, Fname FROM EMPLOYEE WHERE Salary >c  where c represents the result returned from the inner block.
  • 25.
    Generating Execution Plan–Example 4 The inner block could be translated into the following extended relational algebra expression: ΠMAX Salary(σDno=5(EMPLOYEE)) And the outer block into the expression: ΠLname,Fname(σSalary>C(EMPLOYEE))  The query optimizer would then choose an execution plan for each query block.  Notice that in the above example, the inner block needs to be evaluated only once to produce the maximum salary of employees in department 5, which is then used as the constant c by the outer block.
  • 26.
    26 Heuristics Approach toQuery Optimization
  • 27.
    Heuristics Approach toQuery Optimization  Heuristic rules are used to modify the internal representation of a query which is usually in the form of a query tree or a query graph data structure to improve its expected performance.  The rules typically reorder the operation in query tree.  The scanner and parser of an SQL query first generate a data structure that corresponds to an initial query representation, which is then optimized according to heuristic rules.  This leads to an optimized query representation, which corresponds to the query execution strategy.
  • 28.
    Cont’d  One ofthe main heuristic rules is to apply SELECT and PROJECT operations before applying the JOIN or other binary operations.  Because the size of the file resulting from a binary operation such as JOIN is usually a multiplicative function of the sizes of the input files.  The SELECT and PROJECT operations reduce the size of a file and hence should be applied before a join or other binary operation.  Heuristic is rule that works well in most cases but is not guaranteed to work well in every possible cases.
  • 29.
    Notation for QueryTrees  Query tree - Tree data structure that corresponds to a relational algebra expression.  Input relations of the query as leaf nodes of the tree  Relational algebra operations as internal nodes.  An execution of the query tree consists of executing an internal node operation whenever its operands are available and then replacing that internal node by the relation that results from executing the operation.  The order of execution of operations starts at the leaf nodes, and ends at the root node.  The execution terminates when the root node operation is executed and produces the result relation for the query.
  • 30.
    Steps in ConvertingA Query Tree During Heuristic Optimization 1. Initial (canonical) query tree for SQL query Q. 2. Moving SELECT operations down the query tree. 3. Applying the more restrictive SELECT operation first. 4. Replacing CARTESIAN PRODUCT and SELECT with JOIN operations. 5. Moving PROJECT operations down the query tree execute.
  • 31.
    Exercise CONSIDER THE FOLLOWINGTABLE Employee (Fname,Mname,Lname,SSN,Bdate,Address,Salary,Gender,Dno) Project (Pname,Pnumber,Plocation,Dnum) Woks_on (ESSN,Pno,Hours)  QUERY SELECT Lname FROM Employee ,Works_on, Project WHERE PName=‘Aquaris’AND PNumber=Pno AND ESSN=SSN AND BDATE>’1977-12-21’ ;
  • 32.
    Consider the followingrelations for the examples provided here after
  • 33.
    Exercise 1:  Listname of students, course taken by them and name of the department giving the course for all whose cgpa>3.  QUERY SELECT Name, Cname, Dname FROM Student ,Course, Department WHERE Cid=Cno AND Did=Dno AND Cgpa>3 ;
  • 34.
    Exercise 1:  RelationalAlgebraic Expression
  • 35.
    Exercise 1:  Step1 :Draw the Initial (canonical) form Name,Cname,Dname  Cid=Cno and Did=Dno and Cgpa>3 × × Department Student Course
  • 36.
    Exercise 1:  Step2: Moving SELECT operations down the query tree
  • 37.
    Exercise 1:  Step4: Replacing CARTESIAN PRODUCT and SELECT with JOIN operations  NOTE:- We already jump step 3 due to the reason we do not have more select operation.
  • 38.
    Exercise 1:  Step5: Moving PROJECT operations down the query tree execute
  • 39.
    39 Using Selectivity andCost Estimates in Query Optimization
  • 40.
    Using Selectivity andCost Estimates in Query Optimization  Heuristic is rule that works well in most cases but is not guaranteed to work well in every possible cases.  A query optimizer does not depend solely on heuristic rules; it also estimates and compares the costs of executing a query using different execution strategies and algorithms, and it then chooses the strategy with the lowest cost estimate.  In addition, the optimizer must limit the number of execution strategies to be considered; otherwise, too much time will be spent making cost estimates for the many possible execution strategies.
  • 41.
    Cost Components forQuery Execution  The cost of executing a query includes the following components: 1. Access cost to secondary storage: This is the cost of transferring (reading and writing) data blocks between secondary disk storage and main memory buffers. This is also known as disk I/O (input/output) cost. 2. Disk storage cost: This is the cost of storing on disk any intermediate files that are generated by an execution strategy for the query.
  • 42.
    Cont’d 3. Computation cost:This is the cost of performing in-memory operations on the records within the data buffers during query execution.  Such operations include searching for and sorting records, merging records for a join or a sort operation, and performing computations on field values.  This is also known as CPU (central processing unit) cost. 4. Memory usage cost: This is the cost relating to the number of main memory buffers needed during query execution. 5. Communication cost: This is the cost that is associated with sending or communicating the query and its results from one place to another. It also includes the cost of transferring the table and results to the various sites during the process of query evaluation.
  • 43.
    Individual Assignment  Deadline14/04/2015 E.C  Algorithms for Executing query  External sorting  Sort strategy  Merge strategy  Algorithm for select operation  Methods and Implementation of join operation 43
  • 44.