Class Note



  1. 1. Chapter 6: Database Evolution <ul><li>Title: AutoAdmin “What-if” Index Analysis Utility </li></ul><ul><li>Authors: Surajit Chaudhuri, Vivek Narasayya </li></ul><ul><li>ACM SIGMOD 1998 </li></ul>
  2. 2. AutoAdmin “What-if” Index Analysis Utility <ul><li>Problem </li></ul><ul><ul><li>Problem Statement </li></ul></ul><ul><ul><li>Why is this problem important? </li></ul></ul><ul><ul><li>Why is this problem hard? </li></ul></ul><ul><li>Approaches </li></ul><ul><ul><li>Approach description, key concepts </li></ul></ul><ul><ul><li>Contributions (novelty, improved) </li></ul></ul><ul><ul><li>Assumptions </li></ul></ul>
  3. 3. Problem Statement – Index Selection <ul><li>Given </li></ul><ul><ul><li>A database, i.e. a set of tables </li></ul></ul><ul><ul><li>A database management system, e.g. SQL Server </li></ul></ul><ul><li>Find </li></ul><ul><ul><li>Index selection tool to compare alternative indexes </li></ul></ul><ul><ul><ul><li>by estimated processing cost of workload queries </li></ul></ul></ul><ul><ul><li>Tools to define workload queries, alternative indexes, … </li></ul></ul><ul><li>Objectives </li></ul><ul><ul><li>Assist database administrators, i.e. reduce manual effort </li></ul></ul><ul><li>Constraints </li></ul><ul><ul><li>Avoid physical alteration of the database </li></ul></ul><ul><ul><li>Avoid scanning large tables to create statistics for the query optimizer </li></ul></ul>
  4. 4. Why is this problem important? <ul><li>Total cost of ownership </li></ul><ul><ul><li>Database administration is a significant component </li></ul></ul><ul><ul><li>Annual DBA salaries are comparable to the cost of software and hardware </li></ul></ul><ul><li>Thus, DBA time is valuable </li></ul><ul><ul><li>Tools that assist DBAs can produce significant savings </li></ul></ul><ul><li>DBAs perform many services </li></ul><ul><ul><li>Performance tuning via index selection </li></ul></ul><ul><ul><li>Others, e.g. recovery, upgrade, … </li></ul></ul>
  5. 5. Why is this problem hard? <ul><li>Index selection is intrinsically a hard search problem. </li></ul><ul><ul><li>A large number of possible single- and multi-column indexes </li></ul></ul><ul><ul><li>Various types of indexes </li></ul></ul><ul><ul><li>Variety of usage by the query optimizer </li></ul></ul><ul><ul><ul><li>e.g., index-only access </li></ul></ul></ul><ul><li>Trade-off between read and update queries </li></ul><ul><ul><li>Cost of update, insert, delete, and bulk load may go up! </li></ul></ul><ul><ul><li>Impact analysis is required </li></ul></ul>
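To give a feel for the size of the search space mentioned on this slide, the sketch below counts candidate indexes on a single table. It assumes that an index is an ordered list of distinct columns and that we cap the index width. Both the function name and the numbers (10 columns, width up to 3) are illustrative and do not come from the paper.

```python
from math import perm


def candidate_index_count(num_columns: int, max_width: int) -> int:
    """Count ordered lists of distinct columns of length 1..max_width.

    Column order matters for a multi-column index, so each width
    contributes a number of permutations, not combinations.
    """
    return sum(perm(num_columns, width) for width in range(1, max_width + 1))


# Illustrative numbers only: one table with 10 columns, indexes up to 3 columns wide.
count = candidate_index_count(10, 3)
print(count)  # 10 + 90 + 720 = 820
```

A configuration is a set of such indexes, so even this one small table would allow 2^820 configurations before other tables or index types are considered.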
  6. 6. Novelty of Contribution <ul><li>Limitations of Related Work </li></ul><ul><ul><li>Lack of impact analysis </li></ul></ul><ul><ul><li>Reliance on large physical changes to databases </li></ul></ul><ul><ul><ul><li>Ex. ‘views’ in simulating hypothetical database [Stonebraker] </li></ul></ul></ul><ul><ul><ul><li>Very computation-intensive </li></ul></ul></ul><ul><li>Contributions </li></ul><ul><ul><li>Built ‘index selection’ & ‘index analysis’ utilities. </li></ul></ul><ul><ul><li>(This paper focuses mainly on the index analysis utility.) </li></ul></ul><ul><ul><li>Hypothetical index structures </li></ul></ul><ul><ul><li>Enables a large class of analyses at low cost. </li></ul></ul><ul><ul><ul><li>Query optimizer to evaluate indexes </li></ul></ul></ul><ul><ul><ul><li>Sampling to collect statistics </li></ul></ul></ul>
  7. 7. Basic Concepts <ul><li>Ideas for modeling </li></ul><ul><ul><li>‘Workload’ = a set of SQL statements </li></ul></ul><ul><ul><li>‘Configuration’ = a set of indexes </li></ul></ul><ul><ul><li>Hypothetical Configuration Analysis (HCA) </li></ul></ul><ul><ul><ul><li>Summary analysis on simulation results </li></ul></ul></ul><ul><li>Ideas for efficient implementation </li></ul><ul><ul><li>Simulate a hypothetical configuration </li></ul></ul><ul><ul><ul><li>estimate the cost of queries in the workload </li></ul></ul></ul><ul><ul><li>Sampling to estimate statistics needed by optimizer </li></ul></ul>
  8. 8. Key Concepts - Simulating Configurations <ul><li>Simulating hypothetical configurations to estimate </li></ul><ul><ul><li>(a) the cost of queries in the workload: reported relative to the total cost of the workload in the current configuration. </li></ul></ul><ul><ul><li>(b) the usage of indexes: the indexes that the server is expected to use to answer each query if the hypothetical configuration existed. </li></ul></ul>
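The two estimates on this slide can be sketched as follows. This is a minimal illustration, not the utility's actual interface: `analyze`, `toy_estimate`, the query labels, the index names, and all cost numbers are hypothetical, and the toy estimator merely stands in for a real query optimizer that would cost each query against a hypothetical configuration.

```python
from collections import Counter
from typing import Callable, FrozenSet, List, Tuple

Configuration = FrozenSet[str]            # a configuration is a set of index names
Estimate = Tuple[float, FrozenSet[str]]   # (estimated cost, indexes the plan uses)


def analyze(
    workload: List[str],
    current: Configuration,
    hypothetical: Configuration,
    estimate: Callable[[str, Configuration], Estimate],
) -> Tuple[float, Counter]:
    """Return (relative workload cost, per-index usage counts)."""
    current_total = sum(estimate(query, current)[0] for query in workload)
    hypothetical_total = 0.0
    usage: Counter = Counter()
    for query in workload:
        cost, used = estimate(query, hypothetical)
        hypothetical_total += cost
        usage.update(used)  # one count per query that uses the index
    return hypothetical_total / current_total, usage


# Made-up costs per access path; a real system would ask the optimizer instead.
TOY_COSTS = {
    "Q1": {"scan": 100.0, "idx_orders_date": 20.0},
    "Q2": {"scan": 80.0, "idx_orders_date": 40.0, "idx_cust_name": 10.0},
    "Q3": {"scan": 50.0},
}


def toy_estimate(query: str, config: Configuration) -> Estimate:
    """Pick the cheapest access path among a table scan and the available indexes."""
    options = TOY_COSTS[query]
    available = [path for path in options if path == "scan" or path in config]
    best = min(available, key=options.get)
    used = frozenset() if best == "scan" else frozenset({best})
    return options[best], used


relative_cost, index_usage = analyze(
    ["Q1", "Q2", "Q3"],
    current=frozenset(),
    hypothetical=frozenset({"idx_orders_date", "idx_cust_name"}),
    estimate=toy_estimate,
)
print(f"{relative_cost:.2f}", dict(index_usage))
```

With these made-up numbers the hypothetical configuration costs 80 against 230 for the current one, so the relative cost is about 0.35, and each of the two indexes is used by one query.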
  9. 9. Key Concepts - Sampling Strategy <ul><li>Used for creating hypothetical indexes. </li></ul><ul><ul><li>Accuracy – Figure 6 </li></ul></ul><ul><ul><li>Cost – Figure 5 </li></ul></ul><ul><li>Details - adaptive page-level sampling algorithm </li></ul><ul><ul><li>Pseudo-code (Figure 3) </li></ul></ul>
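The general idea of adaptive page-level sampling can be sketched as below. This is not the paper's Figure 3 pseudo-code, which the slides do not reproduce. It is a simplified illustration that assumes an in-memory table stored as a non-empty list of non-empty pages, an equi-depth histogram as the statistic, and a stopping rule that cross-validates the histogram against a fresh batch of pages. All names and default parameters are hypothetical.

```python
import random
from bisect import bisect_right
from typing import List, Sequence, Tuple


def quantiles(values: Sequence[float], buckets: int) -> List[float]:
    """Equi-depth bucket boundaries taken from the sorted sample."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * i) // buckets] for i in range(1, buckets)]


def max_bucket_error(boundaries: List[float], batch: Sequence[float], buckets: int) -> float:
    """Largest deviation of a bucket's share of the batch from the ideal 1/buckets."""
    counts = [0] * buckets
    for value in batch:
        counts[bisect_right(boundaries, value)] += 1
    return max(abs(c / len(batch) - 1 / buckets) for c in counts)


def adaptive_page_sample(
    pages: Sequence[Sequence[float]],
    buckets: int = 4,
    tolerance: float = 0.05,
    initial_pages: int = 2,
    seed: int = 0,
) -> Tuple[List[float], int]:
    """Return (histogram boundaries, number of pages read)."""
    rng = random.Random(seed)
    order = list(range(len(pages)))
    rng.shuffle(order)  # visit pages in random order, without replacement
    used = min(initial_pages, len(order))
    sample = [v for p in order[:used] for v in pages[p]]
    while used < len(order):
        boundaries = quantiles(sample, buckets)
        # Draw a fresh batch as large as everything read so far (doubling).
        batch_pages = order[used:used + used]
        batch = [v for p in batch_pages for v in pages[p]]
        used += len(batch_pages)
        # Cross-validate the current histogram against the unseen batch.
        error = max_bucket_error(boundaries, batch, buckets)
        sample.extend(batch)
        if error <= tolerance:
            break
    return quantiles(sample, buckets), used


# Illustrative data: 1000 distinct values spread randomly over 100 pages.
values = list(range(1000))
random.Random(1).shuffle(values)
table_pages = [values[i:i + 10] for i in range(0, 1000, 10)]
boundaries, pages_read = adaptive_page_sample(table_pages)
print(boundaries, pages_read)
```

Reading whole pages keeps I/O low compared with row-level sampling, but values stored on the same page may be correlated. The cross-validation step is one simple way to decide when the sample is large enough.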
  10. 10. Key Concepts - Architecture
  11. 11. Validation Methodology <ul><li>Prototyped index analysis utility </li></ul><ul><ul><li>for Microsoft SQL Server 7.0 </li></ul></ul><ul><li>Case Study </li></ul><ul><ul><li>Workload – TPC-D – Decision Support benchmark </li></ul></ul><ul><ul><li>Usage analysis </li></ul></ul><ul><ul><ul><li>Figure 10 – Usage of each index in the configuration for the workload </li></ul></ul></ul><ul><ul><ul><li>Figure 11 – Distribution of selection conditions on a given table </li></ul></ul></ul><ul><ul><li>Comparison of two configurations by cost </li></ul></ul><ul><ul><ul><li>Figure 13 – For the 10 most expensive queries </li></ul></ul></ul><ul><ul><ul><li>Figure 14 – By SQL statement type </li></ul></ul></ul>
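The two cost comparisons listed on this slide can be sketched as simple summaries over per-query cost estimates. The sketch assumes the estimates are already available as dictionaries. The function names, query IDs, SQL text, and cost numbers are hypothetical and are not results from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def top_k_comparison(
    costs_a: Dict[str, float], costs_b: Dict[str, float], k: int
) -> List[Tuple[str, float, float]]:
    """Return (query, cost under A, cost under B) for the k queries most expensive under A."""
    ranked = sorted(costs_a, key=costs_a.get, reverse=True)[:k]
    return [(query, costs_a[query], costs_b[query]) for query in ranked]


def cost_by_statement_type(
    workload: Dict[str, str], costs: Dict[str, float]
) -> Dict[str, float]:
    """Sum estimated cost per SQL statement type (the first keyword of each statement)."""
    totals: Dict[str, float] = defaultdict(float)
    for query_id, sql in workload.items():
        statement_type = sql.split(None, 1)[0].upper()
        totals[statement_type] += costs[query_id]
    return dict(totals)


# Made-up workload and cost estimates for two configurations, A and B.
workload = {
    "q1": "SELECT * FROM orders WHERE o_orderdate < '1995-01-01'",
    "q2": "UPDATE orders SET o_comment = '' WHERE o_orderkey = 7",
    "q3": "SELECT COUNT(*) FROM customer",
}
costs_a = {"q1": 120.0, "q2": 15.0, "q3": 60.0}
costs_b = {"q1": 30.0, "q2": 25.0, "q3": 55.0}

print(top_k_comparison(costs_a, costs_b, 2))
print(cost_by_statement_type(workload, costs_b))
```

In this toy data, configuration B speeds up the SELECT statements but makes the UPDATE more expensive, which is the read/update trade-off that such a breakdown is meant to expose.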
  12. 12. Examples of Summary Analysis <ul><li>Distribution of workload by SQL type </li></ul><ul><li>Distribution of conditions over tables </li></ul>
  13. 13. Summary <ul><li>Paper’s focus </li></ul><ul><ul><li>Index Analysis Utility </li></ul></ul><ul><li>Ideas </li></ul><ul><ul><li>Simulating hypothetical indexes makes it possible to estimate the cost of queries and the usage of indexes. </li></ul></ul><ul><li>Contributions </li></ul><ul><ul><li>The first low-cost index analysis utility of its kind </li></ul></ul><ul><ul><li>Presented an efficient mechanism for implementation </li></ul></ul><ul><li>Analytical Validation </li></ul><ul><ul><li>Case study with Microsoft SQL Server 7.0 </li></ul></ul>
  14. 14. Assumptions, Rewrite today <ul><li>Assumptions </li></ul><ul><ul><li>The simulation represents the real behavior of the database system. </li></ul></ul><ul><li>Rewrite today </li></ul><ul><ul><li>Compare with newer methods </li></ul></ul><ul><ul><ul><li>DB2 Design Advisor </li></ul></ul></ul><ul><ul><li>Experimental evaluation </li></ul></ul><ul><ul><ul><li>To measure the efficiency of the index analysis utility </li></ul></ul></ul>