Overview of query evaluation


Published in: Technology

  1. 1. Overview of Query Evaluation <ul><li>The system catalog is used to find the best way to evaluate a query </li></ul><ul><li>SQL queries are translated into an extended form of relational algebra </li></ul><ul><li>Queries are composed of several operators, and the algorithms for the individual operators can be combined in many ways to evaluate a query </li></ul><ul><li>System catalog in Oracle </li></ul><ul><ul><li>Called the data dictionary </li></ul></ul><ul><ul><li>Access is allowed through views </li></ul></ul><ul><ul><li>Categories (used as a prefix) </li></ul></ul><ul><li>» USER </li></ul><ul><li>» ALL </li></ul><ul><li>» DBA </li></ul><ul><li>» Tables </li></ul><ul><li>– ALL_CATALOG </li></ul><ul><li>– _TAB_COLUMNS </li></ul><ul><li>– _TABLES </li></ul><ul><li>– _INDEXES </li></ul><ul><li>– _VIEWS </li></ul>
  2. 2. Examples of system catalog queries <ul><li>SELECT * FROM all_catalog </li></ul><ul><li>WHERE owner = 'SMITH'; </li></ul><ul><li>SELECT table_name, column_name </li></ul><ul><li>FROM user_tab_columns WHERE table_name = 'EMPLOYEE'; </li></ul><ul><li>SELECT num_rows, blocks, empty_blocks </li></ul><ul><li>FROM user_tables </li></ul><ul><li>WHERE table_name = 'EMPLOYEE'; </li></ul><ul><li>SELECT view_name, text </li></ul><ul><li>FROM user_views; </li></ul><ul><li>SELECT * FROM user_constraints; </li></ul><ul><li>SELECT constraint_type FROM user_constraints WHERE table_name = 'STUD'; </li></ul>
  3. 3. Query optimization <ul><li>A strength of relational query languages is the wide variety of ways in which a user can express a query and the system can evaluate it </li></ul><ul><li>Because queries can be written so flexibly, good or bad performance depends greatly on the quality of the query optimizer </li></ul><ul><li>Queries are parsed and then presented to the query optimizer, which is responsible for identifying an efficient execution plan </li></ul><ul><li>The optimizer generates alternative plans and chooses the plan with the least estimated cost; the query is essentially treated as a σ-π-join (select-project-join) algebra expression, with the remaining operations carried out on the result of that expression </li></ul><ul><li>Query optimization is the process of identifying the access plan with the minimum cost </li></ul><ul><ul><li>Cost = time taken to get all the answers </li></ul></ul><ul><li>Starting with System R, most DBMSs use the same algorithm </li></ul><ul><ul><li>Generate most of the access plans and select the cheapest one </li></ul></ul><ul><li>First, how do we determine the cost of a plan? </li></ul><ul><li>Then, how long is this process going to take and how do we make it faster? </li></ul>
  4. 4. Query evaluation <ul><li>Alternative ways of evaluating a given query </li></ul><ul><ul><li>Equivalent expressions </li></ul></ul><ul><ul><li>Different algorithms for each operation </li></ul></ul>
  5. 5. Query execution cost <ul><li>Query execution cost is usually a weighted sum of the I/O cost (# disk accesses) and CPU cost (msec) </li></ul><ul><ul><li>w * IO_COST + CPU_COST </li></ul></ul><ul><li>Basic Idea: </li></ul><ul><ul><li>Cost of an operator depends on input data size, data distribution, physical layout </li></ul></ul><ul><ul><li>The optimizer uses statistics about the relations to estimate the cost </li></ul></ul><ul><ul><li>Need statistics on base relations and intermediate results </li></ul></ul>
  6. 6. CPU costing model for query <ul><li>Platform: Oracle, DB version: 9.2 </li></ul><ul><li>The formula for the cost of a query (using the CPU costing model) is: Cost = (#SRds * sreadtim + #MRds * mreadtim + #CPUCycles / cpuspeed) / sreadtim </li></ul><ul><li>where: </li></ul><ul><ul><li>#SRds = number of single-block reads </li></ul></ul><ul><ul><li>#MRds = number of multiblock reads </li></ul></ul><ul><ul><li>#CPUCycles = number of CPU cycles </li></ul></ul><ul><ul><li>sreadtim = single-block read time </li></ul></ul><ul><ul><li>mreadtim = multiblock read time </li></ul></ul><ul><ul><li>cpuspeed = standard 'Oracle' CPU cycles per second </li></ul></ul><ul><li>In words: the cost is the time spent on single-block reads, plus the time spent on multiblock reads, plus the CPU time required, all divided by the time it takes to do a single-block read. The cost of a query is therefore the PREDICTED EXECUTION TIME, and its unit of measure is the single-block read time. </li></ul>
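The cost formula above can be tried out with a few numbers. The following is a minimal sketch, not Oracle's implementation: the function name and all input values are hypothetical, and it assumes that both read times and the CPU speed are expressed per millisecond so that every term is a time in the same unit.

```python
def cpu_costing_model(srds, mrds, cpu_cycles, sreadtim, mreadtim, cpuspeed):
    """Cost expressed in units of one single-block read time.

    Assumption for this sketch: sreadtim and mreadtim are in milliseconds and
    cpuspeed is in CPU cycles per millisecond, so all terms are milliseconds.
    """
    io_time = srds * sreadtim + mrds * mreadtim   # time spent reading blocks
    cpu_time = cpu_cycles / cpuspeed              # time spent on the CPU
    return (io_time + cpu_time) / sreadtim


# Hypothetical workload: 100 single-block reads, 20 multiblock reads.
cost = cpu_costing_model(
    srds=100, mrds=20, cpu_cycles=5_000_000,
    sreadtim=10.0, mreadtim=30.0, cpuspeed=500_000.0,
)
print(cost)  # predicted execution time, counted in single-block reads
```

With these made-up inputs the I/O time is 1600 ms and the CPU time is 10 ms, so the predicted cost is the equivalent of 161 single-block reads.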
  7. 7. Query evaluation plan <ul><li>It consists of an extended relational algebra tree, with information at each node indicating the access method to use for each table and the implementation method to use for each relational operator </li></ul><ul><li>Consider the query: </li></ul><ul><ul><li>SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5; </li></ul></ul><ul><ul><li>In relational algebra it can be expressed as </li></ul></ul><ul><ul><ul><li>π sname (σ bid=100 ∧ rating>5 (Reserves ⋈ sid=sid Sailors)) </li></ul></ul></ul><ul><ul><li>(draw diag.) </li></ul></ul>
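To make the idea of alternative plans for the query above concrete, here is a small sketch that evaluates it in two equivalent ways over in-memory lists. The sample rows and the helper name `equijoin` are made up for illustration; a real DBMS works on pages and indexes, not Python lists.

```python
# Hypothetical sample data for the Sailors and Reserves tables.
sailors = [
    {"sid": 1, "sname": "Dustin", "rating": 7},
    {"sid": 2, "sname": "Lubber", "rating": 8},
    {"sid": 3, "sname": "Horatio", "rating": 4},
]
reserves = [
    {"sid": 1, "bid": 100},
    {"sid": 2, "bid": 101},
    {"sid": 3, "bid": 100},
]


def equijoin(left, right, key):
    """Simple nested-loops equijoin on a column present in both inputs."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


# Plan A: join first, then apply both selections, then project sname.
joined_a = equijoin(reserves, sailors, "sid")
plan_a = [t["sname"] for t in joined_a if t["bid"] == 100 and t["rating"] > 5]

# Plan B: apply each selection to its own table first, then join and project.
reserves_100 = [r for r in reserves if r["bid"] == 100]
good_sailors = [s for s in sailors if s["rating"] > 5]
joined_b = equijoin(reserves_100, good_sailors, "sid")
plan_b = [t["sname"] for t in joined_b]

print(plan_a, plan_b)                 # both plans return the same names
print(len(joined_a), len(joined_b))   # but the intermediate results differ in size
```

Both plans give the same answer, yet Plan B carries fewer tuples through the join. That difference in intermediate result size is the kind of thing an optimizer's cost estimate tries to capture.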
  8. 8. Query processing <ul><li>A query is processed in 3 phases: </li></ul><ul><ul><li>Parsing: the DBMS parses the SQL query and chooses the most efficient access/execution plan </li></ul></ul><ul><ul><li>Execution: the DBMS executes the SQL query using the chosen execution plan </li></ul></ul><ul><ul><li>Fetching: the DBMS fetches the data and sends the result set back to the client </li></ul></ul><ul><ul><li>The processing of DDL is different from that of DML </li></ul></ul><ul><ul><li>For DDL, the DBMS actually updates the data dictionary tables (system catalog), while DML manipulates end-user data </li></ul></ul>
  9. 9. SQL parsing phase <ul><li>The optimization process includes parsing the query, breaking it down into smaller units, and transforming the original query into a slightly different version of the original SQL code </li></ul><ul><li>The transformed SQL query must be fully equivalent and more efficient </li></ul><ul><li>Fully equivalent means the optimized query's results are always the same as those of the original query </li></ul><ul><li>More efficient means the optimized query is expected to execute faster than the original query </li></ul><ul><li>Parsing activities are performed by the query optimizer. The query is: </li></ul><ul><ul><li>Validated for syntax compliance </li></ul></ul><ul><ul><li>Validated against the data dictionary to ensure that table and column names are correct </li></ul></ul><ul><ul><li>Validated against the data dictionary to ensure that the user has proper access permissions </li></ul></ul><ul><ul><li>Analyzed and decomposed into more atomic components </li></ul></ul><ul><ul><li>Prepared for execution by determining the most efficient execution plan </li></ul></ul>
  10. 10. SQL parsing example <ul><li>The following operations are performed during parsing. </li></ul><ul><li>Validate the syntax of the statement: is the query a valid SQL statement? SQL> select nothing where 1=2; select nothing where 1=2                * ERROR at line 1: ORA-00923: FROM keyword not found where expected </li></ul><ul><li>Validate the semantics of the statement: are the objects valid? Is there any ambiguity? Does the constant fit into the column? SQL> select col from not_existent_table; select col from not_existent_table                 * ERROR at line 1: ORA-00942: table or view does not exist </li></ul><ul><li>Search in the shared pool: </li></ul><ul><ul><li>Is the query text already known (search among all the query texts)? If not, continue with a hard parse (the remaining steps) </li></ul></ul><ul><ul><li>Does the query reference the same objects (search among all versions of the query)? If not, continue with a hard parse </li></ul></ul><ul><ul><li>Is the execution environment identical (same search)? If yes, execute the query. </li></ul></ul><ul><li>Allocate memory in the shared pool to store the data about the query </li></ul><ul><li>Get the values of the bind variables and check that all values fit in the columns </li></ul>
  11. 11. Parsing example (contd.) <ul><li>SQL> var v varchar2(20); SQL> exec :v := '12345678901' PL/SQL procedure successfully completed. SQL> insert into michel.t values (:v); insert into michel.t values (:v)                               * ERROR at line 1: ORA-12899: value too large for column &quot;MICHEL&quot;.&quot;T&quot;.&quot;COL&quot; (actual: 11, maximum: 10) </li></ul><ul><li>Optimize the query execution </li></ul><ul><li>Build the parse tree and the execution plan in a format that the SQL engine can use; this is called row source generation </li></ul><ul><li>Store the parse tree and the execution plan in the shared pool. </li></ul>
  12. 12. Parsing and execution <ul><ul><li>Once the SQL statement is transformed, the DBMS creates what is commonly known as an access/execution plan </li></ul></ul><ul><ul><li>The access/execution plan contains the series of steps the DBMS will use to execute the query and return the result set in the most efficient way </li></ul></ul><ul><ul><li>SQL execution: all I/O operations indicated in the access plan are executed. When the execution plan is run, the proper locks are acquired for the data to be accessed, and the data are then retrieved from the data files and placed in the DBMS data cache </li></ul></ul><ul><ul><li>SQL fetching: after the parsing and execution phases are completed, all rows that match the specified conditions are retrieved, sorted, and grouped and/or aggregated </li></ul></ul><ul><ul><li>In the fetching phase, the rows of the query result set are returned to the client. During this phase, the DBMS may use temporary table space to store temporary data </li></ul></ul>
  13. 13. Query evaluation plan <ul><li>An evaluation plan defines exactly what algorithm is used for each operation, and how the execution of the operations is coordinated </li></ul>
  14. 14. cost-based query optimization <ul><li>Cost difference between evaluation plans for a query can be enormous </li></ul><ul><ul><li>E.g. seconds vs. days in some cases </li></ul></ul><ul><li>Steps in cost-based query optimization </li></ul><ul><ul><li>Generate logically equivalent expressions using equivalence rules </li></ul></ul><ul><ul><li>Annotate resultant expressions to get alternative query plans </li></ul></ul><ul><ul><li>Choose the cheapest plan based on estimated cost </li></ul></ul><ul><li>Estimation of plan cost based on: </li></ul><ul><ul><li>Statistical information about relations. Examples: </li></ul></ul><ul><ul><ul><li>number of tuples, number of distinct values for an attribute </li></ul></ul></ul><ul><ul><li>Statistics estimation for intermediate results </li></ul></ul><ul><ul><ul><li>to compute cost of complex expressions </li></ul></ul></ul><ul><ul><li>Cost formulae for algorithms, computed using statistics </li></ul></ul>
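The last step of cost-based optimization, choosing the cheapest plan by estimated cost, can be sketched as follows. This is only an illustration: the `Plan` class, the plan names, the cost numbers and the weight `w` are all hypothetical, and the cost function is the weighted sum w * IO_COST + CPU_COST from the earlier slide on query execution cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """A candidate evaluation plan with already-estimated costs (made-up numbers)."""
    name: str
    io_cost: float   # estimated number of disk accesses
    cpu_cost: float  # estimated CPU cost


def estimated_cost(plan, w=10.0):
    """Weighted sum of I/O and CPU cost: w * IO_COST + CPU_COST."""
    return w * plan.io_cost + plan.cpu_cost


def choose_cheapest(plans, w=10.0):
    """Return the plan with the lowest estimated cost."""
    return min(plans, key=lambda p: estimated_cost(p, w))


candidates = [
    Plan("file scan + nested loops join", io_cost=1500, cpu_cost=400),
    Plan("index on bid + index nested loops", io_cost=120, cpu_cost=90),
    Plan("sort-merge join", io_cost=600, cpu_cost=250),
]

best = choose_cheapest(candidates)
print(best.name, estimated_cost(best))
```

In a real optimizer the `io_cost` and `cpu_cost` values would come from cost formulae applied to catalog statistics rather than being typed in by hand.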
  15. 15. Optimization <ul><li>EXPLAIN PLAN FOR SELECT * FROM table_nm WHERE v_nm LIKE 'b%' ORDER BY col_nm; (table_nm and col_nm stand for a real table and column) </li></ul><ul><li>Output: Explained. </li></ul><ul><li>SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY); </li></ul><ul><li>Output: PLAN_TABLE_OUTPUT </li></ul><ul><li>Predicate information </li></ul><ul><li>Note </li></ul><ul><li>Number of rows selected </li></ul>
  16. 16. Optimization (contd.) <ul><li>ANALYZE TABLE table_nm COMPUTE STATISTICS; </li></ul><ul><li>EXPLAIN PLAN FOR SELECT * FROM table_nm WHERE …. </li></ul><ul><li>SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY); </li></ul><ul><li>Predicate information (identified by operation id) </li></ul><ul><li>Note: CPU costing is off </li></ul>
  17. 17. Query graph and query plan <ul><li>A query graph is a single graph corresponding to each query. It does not specify the order in which operations are performed. </li></ul><ul><li>A query plan (see the previous diagram) specifies a particular order of operations for executing a query. </li></ul><ul><li>It is an ordered set of steps used to access or modify data in a SQL RDBMS. Since SQL is declarative, there are typically a large number of alternative ways to execute a given query, with widely varying performance. </li></ul><ul><li>When a query is submitted to the database, the query optimizer evaluates some of the different, correct possible plans for executing the query and returns what it considers the best alternative </li></ul><ul><li>The SQL query is first analysed and parsed into a query graph </li></ul>
  18. 18. System catalog <ul><li>System catalog </li></ul><ul><ul><li>The collection of files corresponding to users' tables and indexes represents the data in the database </li></ul></ul><ul><ul><li>A relational DBMS maintains information about every table and index that it contains </li></ul></ul><ul><ul><li>This descriptive information is stored in a collection of special tables called catalog tables </li></ul></ul><ul><ul><li>The catalog tables are also known as the data dictionary or system catalog </li></ul></ul>
  19. 19. Information in the catalog <ul><li>The catalog holds system-wide information such as the size of the buffer pool and the page size, plus the following information about tables, indexes and views </li></ul><ul><ul><li>For each table </li></ul></ul><ul><ul><ul><li>Its name, the file name, and the structure of the file in which it is stored </li></ul></ul></ul><ul><ul><ul><li>The name and type of each attribute </li></ul></ul></ul><ul><ul><ul><li>The index name of each index on the table </li></ul></ul></ul><ul><ul><ul><li>Integrity constraints </li></ul></ul></ul><ul><ul><li>For each index </li></ul></ul><ul><ul><ul><li>The index name and the structure of the index </li></ul></ul></ul><ul><ul><ul><li>The search key attributes </li></ul></ul></ul><ul><ul><li>For each view </li></ul></ul><ul><ul><ul><li>Its view name and definition </li></ul></ul></ul>
  20. 20. Statistics in the system catalog <ul><li>(i) Cardinality: the number of tuples, Ntuples(R), for each table R </li></ul><ul><li>(ii) Size: the number of pages, Npages(R), for each table R </li></ul><ul><li>(iii) Index cardinality: the number of distinct key values, Nkeys(I), for each index I </li></ul><ul><li>(iv) Index size: the number of pages for each index I </li></ul><ul><li>(v) Index height: the number of non-leaf levels for each tree index I </li></ul><ul><li>(vi) Index range: the minimum present key value and the maximum present key value for each index I </li></ul>
  21. 21. Common techniques for operator evaluation <ul><li>Indexing: if a selection or join condition is specified, use an index to examine only the tuples that satisfy the condition </li></ul><ul><li>Iteration: examine all tuples in an input table, one after the other. </li></ul><ul><li>Partitioning: partition the tuples on a sort key. Sorting and hashing are the two commonly used partitioning techniques. </li></ul>
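The three techniques can be contrasted on a toy selection. This is a rough in-memory sketch under obvious simplifications: the sample tuples are invented, a Python dictionary stands in for a hash index, and a modulo function stands in for the partitioning hash.

```python
from collections import defaultdict

# Hypothetical Reserves tuples as (sid, bid) pairs.
reserves = [(1, 100), (2, 101), (3, 100), (4, 102)]

# Iteration: examine every tuple in turn and keep the ones with bid = 100.
scan_result = [t for t in reserves if t[1] == 100]

# Indexing: build a hash index on bid once, then look up only matching tuples.
index_on_bid = defaultdict(list)
for t in reserves:
    index_on_bid[t[1]].append(t)
index_result = index_on_bid.get(100, [])

# Partitioning: split the tuples into 2 partitions by hashing sid, so that
# tuples with equal sid values always land in the same partition.
partitions = defaultdict(list)
for t in reserves:
    partitions[t[0] % 2].append(t)

print(scan_result, index_result)
print(dict(partitions))
```

Iteration and indexing return the same tuples; the index simply avoids touching tuples that cannot match. Partitioning does not answer the selection by itself; it breaks one large operation (for example a join or duplicate elimination) into smaller ones on each partition.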
  22. 22. Access paths and cost model <ul><li>The selectivity of an access path is the number of pages retrieved (index pages plus data pages) if we use that access path to retrieve all desired tuples </li></ul><ul><li>If a table has an index that matches a given selection, there are at least 2 access paths: </li></ul><ul><ul><li>The index </li></ul></ul><ul><ul><li>A scan of the data file </li></ul></ul><ul><li>The most selective access path is the one that retrieves the fewest pages; </li></ul><ul><li>using the most selective access path minimizes the cost of data retrieval </li></ul>
  23. 23. <ul><li>The selectivity of an access path depends on the primary conjuncts in the selection condition </li></ul><ul><li>Each conjunct acts as a filter on the table </li></ul><ul><li>The fraction of the tuples that satisfy a conjunct is called the reduction factor </li></ul><ul><li>Example: we have a hash index H on sailors with search key (rname, bid, sid), and the selection condition is rname='joe' and bid=5 and sid=3 </li></ul><ul><li>The index can be used to retrieve the tuples that satisfy all three conjuncts </li></ul>
  24. 24. <ul><li>The catalog contains the number of distinct key values, Nkeys(H), in the hash index, as well as the number of pages, Npages, in the sailors table. </li></ul><ul><li>The estimated number of pages satisfying the primary conjuncts is Npages(sailors) * 1/Nkeys(H), where 1/Nkeys(H) is the reduction factor </li></ul><ul><li>Selection, projection and join </li></ul><ul><li>Selection: has the form σ R.attr op value (R) </li></ul><ul><li>Projection: the expensive part is eliminating duplicates, which can be done by partitioning </li></ul><ul><li>Join: combines tuples from two relations </li></ul>
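The page estimate Npages(sailors) * 1/Nkeys(H) can be worked through with numbers. The sketch below uses made-up catalog values and a hypothetical helper name, and it ignores the index pages themselves to keep the arithmetic visible. It then picks the most selective of two access paths.

```python
def estimated_pages_via_index(npages, nkeys):
    """Estimated pages retrieved: Npages * (1 / Nkeys), the reduction factor."""
    reduction_factor = 1 / nkeys
    return npages * reduction_factor


# Hypothetical catalog statistics for the sailors table and hash index H.
NPAGES_SAILORS = 500
NKEYS_H = 100

access_paths = {
    "file scan": NPAGES_SAILORS,  # a scan reads every page of the table
    "hash index H": estimated_pages_via_index(NPAGES_SAILORS, NKEYS_H),
}

# The most selective access path is the one that retrieves the fewest pages.
most_selective = min(access_paths, key=access_paths.get)
print(access_paths, most_selective)
```

With these numbers the index path is estimated at about 5 pages against 500 for the scan, so the hash index is the more selective access path.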
  25. 25. Pipelined evaluation <ul><li>When a query is composed of several operators, the result of one operator can be pipelined to the next operator without creating a temporary table to hold the intermediate result </li></ul><ul><li>If the output of an operator is saved in a temporary table for processing by the next operator, it is said to be materialized </li></ul><ul><li>Pipelined evaluation has lower overhead costs than materialization, because materialization has to write and then re-read a new temporary table (p. 407) </li></ul>
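The difference between pipelining and materialization can be sketched with Python generators, which hand one row at a time to the next operator. The operator names (`scan`, `select`, `project`) and the sample rows are assumptions for illustration only; a list stands in for the temporary table.

```python
def scan(table):
    """Produce the rows of a table one at a time."""
    for row in table:
        yield row


def select(rows, predicate):
    """Pass on only the rows that satisfy the predicate."""
    for row in rows:
        if predicate(row):
            yield row


def project(rows, columns):
    """Keep only the requested columns of each row."""
    for row in rows:
        yield tuple(row[c] for c in columns)


# Hypothetical sample data.
sailors = [
    {"sid": 1, "sname": "Dustin", "rating": 7},
    {"sid": 2, "sname": "Lubber", "rating": 8},
    {"sid": 3, "sname": "Horatio", "rating": 4},
]

# Pipelined: each row flows through select and project as soon as it is
# scanned; no intermediate result is stored.
pipeline = project(select(scan(sailors), lambda r: r["rating"] > 5), ["sname"])
pipelined_result = list(pipeline)

# Materialized: the selection output is stored in full (the "temporary table")
# before the projection starts reading it.
temp_table = [r for r in sailors if r["rating"] > 5]
materialized_result = [(r["sname"],) for r in temp_table]

print(pipelined_result, materialized_result)
```

Both approaches produce the same rows. The pipelined version never holds the intermediate selection result in full, which is where the lower overhead comes from.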