Query optimization


Query Optimization process

Published in: Technology

  1. 1. Query Optimization
  2. 2. Outline <ul><li>Introduction </li></ul><ul><li>Steps in Cost-based query optimization- Query Flow </li></ul><ul><li>Projection Example </li></ul><ul><li>Query Interaction in DBMS </li></ul><ul><li>Cost-based query Optimization: Algebraic Expressions </li></ul>
  3. 3. Introduction <ul><li>What is Query Optimization? </li></ul><ul><li>Suppose you were given a chance to visit 15 pre-selected cities in Europe. The only constraint is ‘Time’. </li></ul><ul><li>-> Would you plan your route, or visit the cities in any order? </li></ul>
  4. 4. Europe
  5. 5. <ul><li>Plan: </li></ul><ul><li>-> Place the 15 cities in groups based on their proximity to each other. </li></ul><ul><li>-> Start with one group and move on to the next group. </li></ul><ul><li>The point is that you would visit the cities in a more organized manner, and so deal with the ‘Time’ constraint efficiently. </li></ul>
  6. 6. <ul><li>Query Optimization works in a similar way: </li></ul><ul><li>There can be many different ways to compute the answer to a given query. The result is the same in every case. </li></ul><ul><li>A DBMS strives to process the query in the most efficient way (in terms of ‘Time’) to produce the answer. </li></ul><ul><li>Cost = Time needed to get all answers </li></ul>
  7. 7. <ul><li>Starting with System R, most commercial DBMSs have used cost-based optimizers. </li></ul><ul><li>Cost estimation should be accurate and easy to compute. It should also be logically consistent: the plan with the least estimated cost should consistently be the plan that is cheapest to execute. </li></ul>
  8. 8. Steps in cost-based query optimization <ul><li>Parsing </li></ul><ul><li>Transformation </li></ul><ul><li>Implementation </li></ul><ul><li>Plan selection based on cost estimates </li></ul>
  9. 9. Query Flow: SQL -> Parser -> Optimizer -> Code Generator/Interpreter -> Processor
  10. 10. <ul><li>Query Parser – Verifies the validity of the SQL statement and translates the query into an internal structure using relational calculus. </li></ul><ul><li>Query Optimizer – Finds the best expression among the various equivalent algebraic expressions. The criterion used is ‘cheapness’. </li></ul><ul><li>Code Generator/Interpreter – Turns the optimizer's output into calls to the Query Processor. </li></ul><ul><li>Query Processor – Executes the calls obtained from the code generator. </li></ul>
  11. 11. <ul><li>The cost of a physical plan includes processor time and communication time. The most important factor is the number of disk I/Os, because disk access is the most time-consuming action. </li></ul><ul><li>Some other costs associated are: </li></ul><ul><li>- Operations (joins, unions, intersections). </li></ul><ul><li>- The order of operations. </li></ul><ul><li>Why? </li></ul>
  12. 12. <ul><li>Joins, unions, and intersections are associative and commutative, so the same result can be computed in many different orders, each with its own cost. </li></ul><ul><li> - How arguments are stored and passed between operations. </li></ul><ul><li>The factors mentioned above should be kept to a minimum when choosing the best physical plan. </li></ul>
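To illustrate why the order of operations matters, here is a minimal Python sketch that estimates intermediate result sizes for every left-deep order of a three-way join. The relation names `R`, `S`, `T`, their cardinalities, and the join selectivities are all hypothetical values chosen for illustration; they do not come from the slides. The size formula (product of cardinalities times selectivity) is a common textbook simplification.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical cardinalities (number of tuples) for three relations.
CARD = {"R": 1000, "S": 100, "T": 10}

# Hypothetical selectivity of the join predicate between each pair.
# A selectivity of 1 means there is no predicate (a cross product).
SEL = {
    frozenset({"R", "S"}): Fraction(1, 100),
    frozenset({"S", "T"}): Fraction(1, 10),
    frozenset({"R", "T"}): Fraction(1),
}


def left_deep_sizes(order):
    """Return (intermediate size, final size) for ((A join B) join C)."""
    a, b, c = order
    first = CARD[a] * CARD[b] * SEL[frozenset({a, b})]
    final = first * CARD[c] * SEL[frozenset({a, c})] * SEL[frozenset({b, c})]
    return first, final


results = {order: left_deep_sizes(order) for order in permutations(CARD)}

# Every order yields the same final size, but the intermediate sizes differ.
final_sizes = {final for _, final in results.values()}
best = min(results, key=lambda order: results[order][0])

for order, (first, final) in results.items():
    print(order, "intermediate:", int(first), "final:", int(final))
print("smallest intermediate result:", best)
```

Under these assumed numbers the final result has the same size for every order, while the intermediate result ranges from 100 to 10,000 tuples, which is the quantity an optimizer would try to keep small.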
  13. 13. Projection Example: <ul><li>A projection produces one result tuple for every argument tuple. </li></ul><ul><li>What is the change? </li></ul><ul><li>The output size changes only because the length of each tuple changes. </li></ul><ul><li>Let’s take a relation ‘R’ </li></ul><ul><li>Relation (20,000 tuples): R(a, b, c) </li></ul><ul><li>Each tuple (190 bytes): header = 24 bytes, a = 8 bytes, b = 8 bytes, c = 150 bytes </li></ul><ul><li>Each block (1024 bytes): header = 24 bytes, leaving 1,000 bytes for tuples </li></ul>
  14. 14. <ul><li>We can fit 5 tuples into 1 block </li></ul><ul><li>5 tuples * 190 bytes/tuple = 950 bytes can fit into 1 block </li></ul><ul><li>For 20,000 tuples, we would require 4,000 blocks (20,000 tuples / 5 tuples per block = 4,000 blocks) </li></ul><ul><li>With a projection that eliminates column c (150 bytes), we can estimate that each tuple shrinks to 40 bytes (190 – 150 bytes) </li></ul>
  15. 15. <ul><li>Now, the new estimate is 25 tuples in 1 block. </li></ul><ul><li>25 tuples * 40 bytes/tuple = 1,000 bytes fit into 1 block </li></ul><ul><li>With 20,000 tuples, the new estimate is 800 blocks (20,000 tuples / 25 tuples per block = 800 blocks) </li></ul><ul><li>The result is a reduction by a factor of 5 (4,000 blocks / 800 blocks) </li></ul>
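The arithmetic of the projection example can be reproduced with a short Python sketch. The function and variable names (`blocks_needed`, `tuple_size`, and so on) are made up for illustration, and the sketch assumes, as the slides do implicitly, that a tuple is never split across two blocks.

```python
TUPLE_HEADER = 24                           # bytes, from the example
COLUMN_BYTES = {"a": 8, "b": 8, "c": 150}   # bytes per column, from the example


def tuple_size(columns):
    """Size in bytes of a tuple that keeps only the given columns."""
    return TUPLE_HEADER + sum(COLUMN_BYTES[col] for col in columns)


def blocks_needed(num_tuples, tuple_bytes, block_bytes=1024, block_header=24):
    """Estimate the blocks required, assuming tuples do not span blocks."""
    usable = block_bytes - block_header
    tuples_per_block = usable // tuple_bytes
    if tuples_per_block == 0:
        raise ValueError("a tuple does not fit into one block")
    # Ceiling division: a partially filled last block still counts.
    return -(-num_tuples // tuples_per_block)


before = blocks_needed(20_000, tuple_size(["a", "b", "c"]))  # full relation R
after = blocks_needed(20_000, tuple_size(["a", "b"]))        # column c projected away

print(before, after, before // after)  # 4000 800 5
```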
  16. 16. Query interaction in DBMS <ul><li>How does a query interact with a DBMS? </li></ul><ul><li>- Interactive users </li></ul><ul><li>- Embedded queries in programs written in C, C++, etc. </li></ul><ul><li>What is the difference between these two? </li></ul>
  17. 17. <ul><li>Interactive Users: </li></ul><ul><li>- When there is an interactive user query, the query goes through the Query Parser, Query Optimizer, Code Generator, and Query Processor each time. </li></ul>
  18. 18. <ul><li>Embedded Query: </li></ul><ul><li>When there is an embedded query, the query does not have to go through the Query Parser, Query Optimizer, Code Generator, and the Query Processor each time. </li></ul>
  19. 19. <ul><li>In an embedded query, the calls generated by the code generator are stored in the database. Each time the query is reached within the program at run-time, the Query Processor invokes the stored calls in the database. </li></ul><ul><li>In embedded queries, optimization is therefore independent of execution: it does not have to be repeated each time the query runs. </li></ul>
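The difference between the two kinds of query can be sketched in Python as follows. `ToyDBMS` and all of its methods are hypothetical stand-ins invented for this illustration, not a real DBMS interface: `_compile` represents the parser, optimizer, and code generator together, and `_execute` represents the Query Processor.

```python
class ToyDBMS:
    """Hypothetical model of interactive versus embedded query handling."""

    def __init__(self):
        self.compile_count = 0   # how often parse + optimize + generate ran
        self.stored_calls = {}   # generated calls kept "in the database"

    def _compile(self, sql):
        """Stand-in for Query Parser, Query Optimizer, and Code Generator."""
        self.compile_count += 1
        return ("calls-for", sql.strip().lower())

    def _execute(self, calls):
        """Stand-in for the Query Processor."""
        return f"executed {calls[1]}"

    def run_interactive(self, sql):
        # An interactive query passes through every stage each time.
        return self._execute(self._compile(sql))

    def precompile(self, name, sql):
        # An embedded query is compiled once and its calls are stored.
        self.stored_calls[name] = self._compile(sql)

    def run_embedded(self, name):
        # At run time only the stored calls are invoked.
        return self._execute(self.stored_calls[name])


interactive_db = ToyDBMS()
for _ in range(3):
    interactive_db.run_interactive("SELECT pname FROM Patients")

embedded_db = ToyDBMS()
embedded_db.precompile("q1", "SELECT pname FROM Patients")
for _ in range(3):
    embedded_db.run_embedded("q1")

print(interactive_db.compile_count, embedded_db.compile_count)  # 3 1
```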
  20. 20. Cost-based query Optimization: Algebraic Expressions <ul><li>If we had the following query- </li></ul><ul><li>SELECT p.pname, d.dname </li></ul><ul><li>FROM Patients p, Doctors d </li></ul><ul><li>WHERE p.doctor = d.dname </li></ul><ul><li>AND d.dgender = 'M' </li></ul>
  21. 21. <ul><li>projection </li></ul><ul><li> filter </li></ul><ul><li> join </li></ul><ul><li>Scan (Patients) Scan (Doctors) </li></ul>
  22. 22. Cost-based query Optimization: Transformation <ul><li>Plan 1: projection(filter(join(Scan(Patients), Scan(Doctors)))) </li></ul><ul><li>Plan 2: projection(join(Scan(Patients), filter(Scan(Doctors)))) </li></ul>
  23. 23. Cost-based query Optimization: Implementation <ul><li>Plan 1: projection(filter(natural join(Scan(Patients), Scan(Doctors)))) </li></ul><ul><li>Plan 2: projection(hash join(Scan(Patients), filter(Scan(Doctors)))) </li></ul>
  24. 24. Cost-based query Optimization: Plan selection based on costs <ul><li>Plan 1: projection(filter(natural join(Scan(Patients), Scan(Doctors)))), Estimated Costs = 100ms </li></ul><ul><li>Plan 2: projection(hash join(Scan(Patients), filter(Scan(Doctors)))), Estimated Costs = 50ms </li></ul>
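The two plans for the Patients/Doctors query can be imitated on a few rows of made-up data, as in the Python sketch below. The rows, the function names, and the cost measure are assumptions for illustration: both plans use a simple nested-loop join rather than the natural join and hash join named in the slides, and cost is counted as the number of join comparisons instead of milliseconds.

```python
# Made-up sample data: Patients(pname, doctor) and Doctors(dname, dgender).
patients = [("Ann", "Lee"), ("Bob", "Kim"), ("Cy", "Lee"), ("Di", "Roy")]
doctors = [("Lee", "M"), ("Kim", "F"), ("Roy", "M")]


def plan_filter_after_join():
    """Join everything first, then filter on gender, then project."""
    comparisons = 0
    joined = []
    for pname, doctor in patients:
        for dname, dgender in doctors:
            comparisons += 1
            if doctor == dname:
                joined.append((pname, dname, dgender))
    rows = [(pname, dname) for pname, dname, dgender in joined if dgender == "M"]
    return sorted(rows), comparisons


def plan_filter_before_join():
    """Filter Doctors on gender first, then join, then project."""
    male_doctors = [dname for dname, dgender in doctors if dgender == "M"]
    comparisons = 0
    rows = []
    for pname, doctor in patients:
        for dname in male_doctors:
            comparisons += 1
            if doctor == dname:
                rows.append((pname, dname))
    return sorted(rows), comparisons


plans = {
    "filter after join": plan_filter_after_join,
    "filter before join": plan_filter_before_join,
}
outcomes = {name: plan() for name, plan in plans.items()}

# Plan selection: keep the plan with the lowest (here: measured) cost.
cheapest = min(outcomes, key=lambda name: outcomes[name][1])

for name, (rows, cost) in outcomes.items():
    print(name, "->", rows, "comparisons:", cost)
print("selected plan:", cheapest)
```

Both plans return the same rows, which is what makes the transformation legal; only the amount of work differs. A real optimizer would choose from estimated costs before running anything, whereas this sketch measures the work after the fact to keep the example small.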