Query Optimization

Query Optimization process

Published in: Technology

  1. 1. Query Optimization
  2. 2. Outline <ul><li>Introduction </li></ul><ul><li>Steps in Cost-based query optimization- Query Flow </li></ul><ul><li>Projection Example </li></ul><ul><li>Query Interaction in DBMS </li></ul><ul><li>Cost-based query Optimization: Algebraic Expressions </li></ul>
  3. 3. Introduction <ul><li>What is Query Optimization? </li></ul><ul><li>Suppose you were given a chance to visit 15 pre-selected cities in Europe. The only constraint is ‘Time’. </li></ul><ul><li>-> Would you plan your route, or visit the cities in any order? </li></ul>
  4. 4. Europe
  5. 5. <ul><li>Plan: </li></ul><ul><li>-> Place the 15 cities in groups based on their proximity to each other. </li></ul><ul><li>-> Start with one group and move on to the next group. </li></ul><ul><li>The important point is that you would visit the cities in a more organized manner, and so handle the ‘Time’ constraint mentioned earlier efficiently. </li></ul>
  6. 6. <ul><li>Query Optimization works in a similar way: </li></ul><ul><li>There can be many different ways to compute the answer to a given query. The result is the same in every case. </li></ul><ul><li>A DBMS strives to process the query in the most efficient way (in terms of ‘Time’) to produce the answer. </li></ul><ul><li>Cost = Time needed to get all answers </li></ul>
  7. 7. <ul><li>Starting with System-R, most commercial DBMSs use cost-based optimizers. </li></ul><ul><li>Cost estimates should be accurate and easy to compute. They also need to be logically consistent, so that the plan estimated as cheapest is consistently the cheapest. </li></ul>
  8. 8. Steps in a Cost-based query optimization <ul><li>Parsing </li></ul><ul><li>Transformation </li></ul><ul><li>Implementation </li></ul><ul><li>Plan selection based on cost estimates </li></ul>
  9. 9. Query Flow Parser Optimizer Code Generator/Interpreter Processor SQL
  10. 10. <ul><li>Query Parser – Verify the validity of the SQL statement. Translate the query into an internal structure using relational calculus. </li></ul><ul><li>Query Optimizer – Find the best among the various equivalent algebraic expressions. The criterion used is ‘Cheapness’. </li></ul><ul><li>Code Generator/Interpreter – Make calls for the Query Processor as a result of the work done by the optimizer. </li></ul><ul><li>Query Processor – Execute the calls obtained from the code generator. </li></ul>
  11. 11. <ul><li>The cost of a physical plan includes processor time and communication time. The most important factor to consider is the number of disk I/Os, because disk access is the most time-consuming action. </li></ul><ul><li>Some other associated costs are: </li></ul><ul><li>- Operations (joins, unions, intersections). </li></ul><ul><li>- The order of operations. </li></ul><ul><li>Why? </li></ul>
  12. 12. <ul><li>Joins, unions, and intersections are associative and commutative, so they can be reordered without changing the result, and different orders can have different costs. </li></ul><ul><li> - Management of how arguments are stored and passed between operations. </li></ul><ul><li>The factors mentioned above should be minimized when creating the best physical plan. </li></ul>
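To illustrate why the order of operations matters even though joins are associative, here is a minimal Python sketch. The relations R, S, T, their columns, and the `join` helper are hypothetical and are not taken from the slides. Both join orders return the same answer, but they build intermediate results of different sizes.

```python
def join(left, right, key):
    """Naive nested-loop equi-join on a column that both inputs share."""
    return [
        {**lrow, **rrow}
        for lrow in left
        for rrow in right
        if lrow[key] == rrow[key]
    ]

# Hypothetical toy relations R(a, b), S(b, c), T(c, d).
R = [{"a": i, "b": i % 2} for i in range(6)]
S = [{"b": 0, "c": 0}, {"b": 1, "c": 1}]
T = [{"c": 0, "d": "x"}]

rs = join(R, S, "b")        # intermediate result of (R join S)
plan_1 = join(rs, T, "c")   # (R join S) join T

st = join(S, T, "c")        # intermediate result of (S join T)
plan_2 = join(R, st, "b")   # R join (S join T)

# Same answer either way; only the size of the intermediate result differs.
same_answer = sorted(plan_1, key=lambda row: row["a"]) == sorted(
    plan_2, key=lambda row: row["a"]
)
print(same_answer, len(rs), len(st))
```

In this toy data the first order materializes six intermediate tuples and the second only one, which is the kind of difference a cost-based optimizer tries to estimate in advance.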
  13. 13. Projection Example: <ul><li>Projections produce a result tuple for every argument tuple. </li></ul><ul><li>What changes? </li></ul><ul><li>The change in the output size comes only from the change in the length of the tuples. </li></ul><ul><li>Let’s take a relation ‘R’ </li></ul><ul><li>Relation (20,000 tuples): R(a, b, c) </li></ul><ul><li>Each tuple (190 bytes): header = 24 bytes, a = 8 bytes, b = 8 bytes, c = 150 bytes </li></ul><ul><li>Each block (1024 bytes): header = 24 bytes, leaving 1,000 bytes for tuples </li></ul>
  14. 14. <ul><li>We can fit 5 tuples into 1 block </li></ul><ul><li>5 tuples * 190 bytes/tuple = 950 bytes can fit into 1 block </li></ul><ul><li>For 20,000 tuples, we would require 4,000 blocks (20,000 tuples / 5 tuples per block = 4,000 blocks) </li></ul><ul><li>With a projection that eliminates column c (150 bytes), we could estimate that each tuple would decrease to 40 bytes (190 – 150 bytes) </li></ul>
  15. 15. <ul><li>Now, the new estimate will be 25 tuples in 1 block. </li></ul><ul><li>25 tuples * 40 bytes/tuple = 1000 bytes will be able to fit into 1 block </li></ul><ul><li>With 20,000 tuples, the new estimate is 800 blocks (20,000 tuples / 25 tuples per block = 800 blocks) </li></ul><ul><li>The result is a reduction in the number of blocks by a factor of 5 </li></ul>
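The arithmetic of the projection example can be checked with a short Python sketch. The function name `blocks_needed` is an illustrative choice; the sizes are the ones given on the slides, and the sketch assumes tuples never span block boundaries.

```python
def blocks_needed(num_tuples, tuple_bytes, block_bytes=1024, block_header=24):
    """Estimate how many blocks a relation occupies, whole tuples per block."""
    usable = block_bytes - block_header      # 1000 bytes left for tuples
    per_block = usable // tuple_bytes        # whole tuples that fit in a block
    return -(-num_tuples // per_block)       # ceiling division

before = blocks_needed(20_000, 190)          # full tuples of R(a, b, c)
after = blocks_needed(20_000, 190 - 150)     # after projecting out column c
print(before, after, before // after)
```

With the slide's numbers this gives 4,000 blocks before the projection and 800 after it, a factor of 5.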
  16. 16. Query interaction in DBMS <ul><li>How does a query interact with a DBMS? </li></ul><ul><li>- Interactive users </li></ul><ul><li>- Embedded queries in programs written in C, C++, etc. </li></ul><ul><li>What is the difference between these two ? </li></ul>
  17. 17. <ul><li>Interactive Users: </li></ul><ul><li>- When there is an interactive user query, the query goes through the Query Parser, Query Optimizer, Code Generator, and Query Processor each time. </li></ul>
  18. 18. <ul><li>Embedded Query: </li></ul><ul><li>When there is an embedded query, the query does not have to go through the Query Parser, Query Optimizer, Code Generator, and the Query Processor each time. </li></ul>
  19. 19. <ul><li>In an embedded query, the calls generated by the code generator are stored in the database. Each time the query is reached within the program at run-time, the Query Processor invokes the stored calls in the database. </li></ul><ul><li>In embedded queries, optimization is independent of execution: it is not repeated each time the query runs. </li></ul>
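The difference between the two modes can be sketched in Python as "compile once, run many times". This is only an illustration: `StoredPlans` and `toy_compile` are hypothetical names, and `toy_compile` merely stands in for the parser, optimizer, and code generator.

```python
class StoredPlans:
    """Keeps the generated calls for each query so they can be reused."""

    def __init__(self, compile_query):
        self._compile_query = compile_query
        self._stored = {}
        self.compilations = 0

    def execute(self, sql):
        # Parse, optimize, and generate calls only the first time.
        if sql not in self._stored:
            self._stored[sql] = self._compile_query(sql)
            self.compilations += 1
        # Every run, including the first, invokes the stored calls.
        return self._stored[sql]()


def toy_compile(sql):
    # Stand-in for parser + optimizer + code generator (hypothetical).
    return lambda: f"ran plan for: {sql}"


db = StoredPlans(toy_compile)
for _ in range(3):
    result = db.execute("SELECT pname FROM Patients")
print(db.compilations)
```

An interactive query would correspond to calling `toy_compile` on every run instead of looking up the stored calls.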
  20. 20. Cost-based query Optimization: Algebraic Expressions <ul><li>Consider the following query: </li></ul><ul><li>SELECT p.pname, d.dname </li></ul><ul><li>FROM Patients p, Doctors d </li></ul><ul><li>WHERE p.doctor = d.dname </li></ul><ul><li>AND d.dgender = 'M' </li></ul>
  21. 21. <ul><li>Plan tree, read from the root down: projection( filter( join( Scan(Patients), Scan(Doctors) ) ) ) </li></ul>
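Read bottom-up, the plan tree for this query corresponds to a relational algebra expression. One way to write it, using the column names from the query and assuming the join condition is p.doctor = d.dname:

```latex
\[
\pi_{pname,\,dname}\Bigl(\sigma_{dgender = \text{'M'}}\bigl(Patients \bowtie_{doctor = dname} Doctors\bigr)\Bigr)
\]
```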
  22. 22. Cost-based query Optimization : Transformation <ul><li>Original plan: projection( filter( join( Scan(Patients), Scan(Doctors) ) ) ) </li></ul><ul><li>Transformed plan: projection( join( Scan(Patients), filter( Scan(Doctors) ) ) ) </li></ul>
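The transformation step moves the filter on Doctors below the join. A small Python sketch with made-up rows (the sample data and function names are hypothetical) suggests why this is allowed: both plans return the same rows, but the transformed plan feeds fewer Doctors tuples into the join.

```python
# Hypothetical sample data; only the column names come from the query.
patients = [
    {"pname": "Ann", "doctor": "Adams"},
    {"pname": "Bob", "doctor": "Baker"},
    {"pname": "Cal", "doctor": "Adams"},
]
doctors = [
    {"dname": "Adams", "dgender": "M"},
    {"dname": "Baker", "dgender": "F"},
]


def join(ps, ds):
    """Nested-loop join on p.doctor = d.dname."""
    return [(p, d) for p in ps for d in ds if p["doctor"] == d["dname"]]


def plan_filter_after_join(ps, ds):
    joined = join(ps, ds)
    filtered = [(p, d) for p, d in joined if d["dgender"] == "M"]
    return [(p["pname"], d["dname"]) for p, d in filtered]


def plan_filter_before_join(ps, ds):
    male_doctors = [d for d in ds if d["dgender"] == "M"]
    return [(p["pname"], d["dname"]) for p, d in join(ps, male_doctors)]


result_a = plan_filter_after_join(patients, doctors)
result_b = plan_filter_before_join(patients, doctors)
print(result_a == result_b, result_a)
```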
  23. 23. Cost-based query Optimization: Implementation <ul><li>First plan: projection( filter( natural join( Scan(Patients), Scan(Doctors) ) ) ) </li></ul><ul><li>Second plan: projection( hash join( Scan(Patients), filter( Scan(Doctors) ) ) ) </li></ul>
  24. 24. Cost-based query Optimization: Plan selection based on costs <ul><li>First plan: projection( filter( natural join( Scan(Patients), Scan(Doctors) ) ) ), Estimated Costs = 100ms </li></ul><ul><li>Second plan: projection( hash join( Scan(Patients), filter( Scan(Doctors) ) ) ), Estimated Costs = 50ms </li></ul>
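Plan selection then reduces to picking the candidate with the lowest estimated cost. A minimal Python sketch follows. It assumes, from the order on the slide, that the 100 ms estimate belongs to the plan that filters after the natural join and the 50 ms estimate to the plan that filters before the hash join; the plan labels are illustrative.

```python
# Estimated costs in milliseconds, as given on the slide.
# The pairing of cost to plan follows the slide's left-to-right order (assumed).
estimated_cost_ms = {
    "filter after natural join": 100,
    "filter before hash join": 50,
}

# The optimizer keeps the plan with the smallest estimate.
best_plan = min(estimated_cost_ms, key=estimated_cost_ms.get)
print(best_plan, estimated_cost_ms[best_plan])
```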
