SQL Indexing
Transcript

  • 1. Overview
Oracle has a wealth of features that allow the DBA and developer to fetch data from an Oracle database with a minimum of trips to the database. Ever since direct access storage devices (DASD) were introduced a half-century ago, computer scientists have recognized that disks have addresses, and that indexes can be built on any disk file. The indexed sequential access method (ISAM) was implemented by creating a new sorted file that contains the symbolic key and the disk address. As the first large-scale commercial databases were introduced (IMS, IDMS), vendors started offering a choice of indexing methods. These early databases allowed the DBA to create tree indexes and to control the size of the index nodes.
While the concept of an index is straightforward, the implementation of an intelligent Oracle indexing strategy can be extremely complex. Oracle Database 11g offers many types of indexes, but index usage depends upon shifting SQL workloads. The vast majority of OLTP shops experience vastly different workloads for daytime transaction processing, evening batch jobs, and end-of-period reporting, and it is these workloads that drive index usage. Oracle does not report on wasted indexing resources, and there are many perils to both over-indexing and under-indexing for any given SQL workload. Just as a missing index causes unnecessary large-table full-table scans, maintaining an unused index wastes a huge amount of computing resources. In many cases, indexes that are used only periodically are expensive and inefficient to maintain while idle, and it is far more cost-effective to create and destroy indexes as needed, based on the needs of each specific SQL workload.
Just as manufacturing has implemented just-in-time delivery of raw materials, an intelligent DBA strategy allows for specific indexes, optimized instance parameter settings, and customized optimizer statistics to be delivered immediately before a shift in the workload. Using data from real-world production databases, this presentation presents a proven approach for intelligent indexing:
1. Understand the costs and benefits of indexes: The DBA needs to understand the relative merits and costs of sorted hash clusters, index-organized tables, b-tree indexes, bitmap indexes, and multi-column indexes. After reviewing these, the DBA learns how to match the appropriate index type to a specific SQL workload.
2. Intelligent index enhancement: The Oracle DBA needs to know how to add human intelligence to tip off the SQL optimizer. This includes optimizer parameters such as optimizer_index_caching, intelligently creating column histograms, and deploying Oracle
  • 2. Database 11g extended optimizer statistics, tools a smart DBA can use to improve the efficiency of the SQL optimizer.
3. Identify repeating workloads: The DBA will use SQL execution plan feedback from AWR and STATSPACK to identify distinct repeating workloads. These workloads are then named (e.g. day_mode, night_mode, eow_mode, eom_mode), and a profile of index usage is gathered from historical data.
4. Profile index usage for each workload: Next, we determine the optimal indexing strategy for each workload, counting the number of times per minute that each index is invoked and plotting usage patterns for multi-column indexes.
5. Intelligently schedule instance customization: Based on the usage of each index, we will identify duplicate indexes, unused indexes, and those indexes that are unique to each workload. We can then use the dbms_scheduler utility to customize the Oracle instance to meet the demands of each distinct workload.
This presentation also reveals secret scripts to track which indexes are the most popular over time and how often each index is invoked by SQL. We also explore how to track which columns of a multi-column index are referenced by your SQL. In sum, this presentation is indispensable for any DBA who must manage a mission-critical corporate database with limited computing resources.
Oracle & Indexing
As Oracle came to dominate the data processing world, it began to offer a wealth of indexing strategies. We need to start by distinguishing between "logical" and "physical" indexing structures.
Note: Oracle indexes can cause "extra" I/O operations, and the choice to use indexes is dependent upon your choice of optimizer_mode.
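As a quick illustration of that note, the optimizer_mode can be set at the session level, so a batch session can be biased toward throughput while OLTP sessions stay tuned for response time. A minimal sketch:

```sql
-- Batch session: optimize for total throughput
-- (full scans and hash joins become more attractive than index access)
alter session set optimizer_mode = all_rows;

-- OLTP session: optimize to return the first 10 rows as fast as possible
-- (index access becomes more attractive)
alter session set optimizer_mode = first_rows_10;
```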
Not all database transactions will benefit from index usage, and sometimes you want your SQL optimizer to optimize for fast throughput instead of fast response time:
All_rows – The all_rows optimizer_mode is designed to maximize throughput.
First_rows_n – The first_rows_n optimizer_mode is designed to minimize response time.
The logical indexing strategies are deployed at the analytical level and do not involve any knowledge of the underlying physical database structures.
Logical indexes:
Function-based indexes
Multi-column indexes
Virtual column indexes
STAR transformation indexing
Oracle Physical Indexes
Oracle is the world's most flexible and robust database because it offers almost every conceivable algorithm for speeding up access to data:
  • 3. Physical index structures:
B-tree indexes
Bitmap indexes
Bitmap join indexes
Hash key indexes (sorted hash clusters)
Hash partitioned indexes
Index-organized tables (IOT)
Con*Text indexes
While b-tree indexes are great for simple queries, they are not very good for the following situations:
Low-cardinality columns: Columns with only a tiny number of distinct values do not have the selectivity that is required in order to benefit from standard b-tree index structures.
No support for SQL functions: B-tree indexes are not able to support SQL queries using Oracle's built-in functions.
Bitmap Indexes
Bitmap indexes are great for low-cardinality columns, but they have a huge overhead for DML, and most shops drop and rebuild bitmaps during off-hours processing. Hence, bitmap indexes are not for active OLTP databases; they are used primarily for read-only DSS and data warehouse systems.
Bitmaps, while not very useful by themselves, are great when combined together. For example, assume there is a motor vehicle database with numerous low-cardinality columns such as car_color, car_make, car_model and car_year. Each column contains fewer than 100 distinct values by itself, and a b-tree index would be fairly useless in a database of 20 million vehicles. However, combining these indexes together in a query can provide blistering response times, far faster than the traditional method of reading each of the 20 million rows in the base table. For example, assume we wanted to find old blue Toyota Corollas manufactured in 1981:

select
   license_plate_nbr
from
   vehicle
where
   color = 'blue'
and
   make = 'toyota'
and
   year = 1981;
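For a query like the one above to qualify for a bitmap merge, each low-cardinality column needs its own bitmap index. A minimal sketch using the example's table and column names:

```sql
-- One bitmap index per low-cardinality column; each is tiny,
-- and the optimizer can combine any subset of them at run time
create bitmap index vehicle_color_idx on vehicle (color);
create bitmap index vehicle_make_idx  on vehicle (make);
create bitmap index vehicle_year_idx  on vehicle (year);
```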
  • 4. Oracle uses a specialized optimizer method called a bitmap index merge to service this query. In a bitmap index merge, each Row-ID (RID) list is built independently by using the bitmaps, and a special merge routine is used in order to compare the RID lists and find the intersecting values. Using this methodology, Oracle can provide sub-second response time when working against multiple low-cardinality columns.
[Figure: Oracle bitmap merge join]
Does Size Matter?
To meet the requirements of all operating systems from a mainframe to a Mac, Oracle has chosen to implement the index node size based upon the DBA's choice of database block size. An index built in a tablespace with a 32k block size (cached in db_32k_cache_size) will contain more keys per index node than the same index created with a 2k block size. While the vast majority of indexes will reside in your database's default db_cache_size, there are exceptions, and some large OLTP databases have shown great performance improvements when indexes are moved into a block size that matches the I/O characteristics of the index. For example, during a nested loop join, one index may receive multi-block reads (index range scans) while the other index is doing index unique probes. In these cases, moving the index that experiences multi-block reads into a large block size allows more keys to be read in a single data buffer get.
Workload-driven index usage
It's backwards, but at implementation time the DBA may not be aware of the SQL workload that will be run against their tables, and this lack of a priori knowledge of the SQL can lead to the phenomenon of "missing indexes", especially function-based indexes. Hence, savvy DBAs adopt a workload-based approach, examining their workloads for I/O signatures and implementing indexing strategies that match the workload.
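Returning to index block sizes: moving an index into a larger block size takes two steps, allocating a buffer cache for that block size and rebuilding the index into a tablespace created with it. A sketch assuming a 32k block size; the file path, sizes, and index name are hypothetical:

```sql
-- One-time setup: a buffer pool and a tablespace for 32k blocks
alter system set db_32k_cache_size = 256m;

create tablespace idx_32k
   datafile '/u01/oradata/prod/idx_32k_01.dbf' size 2g
   blocksize 32k;

-- Rebuild the range-scanned index into the large-block tablespace,
-- so each buffer get pulls in many more index keys
alter index order_date_idx rebuild tablespace idx_32k;
```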
Most production systems have at least two workload modes: a "day mode" characterized by small OLTP SQL optimized for fast response time, and a "night mode" characterized by large batch jobs that need to be optimized for fast throughput.
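Switching instance settings at the day/night boundary can be automated with dbms_scheduler. A sketch, with hypothetical job names and switch times, that flips optimizer_mode at each mode change:

```sql
begin
   -- 6:00 a.m.: bias the instance toward fast OLTP response time
   dbms_scheduler.create_job(
      job_name        => 'day_mode_switch',
      job_type        => 'PLSQL_BLOCK',
      job_action      => 'begin execute immediate
                            ''alter system set optimizer_mode = first_rows_10''; end;',
      repeat_interval => 'FREQ=DAILY; BYHOUR=6',
      enabled         => TRUE);

   -- 7:00 p.m.: bias the instance toward batch throughput
   dbms_scheduler.create_job(
      job_name        => 'night_mode_switch',
      job_type        => 'PLSQL_BLOCK',
      job_action      => 'begin execute immediate
                            ''alter system set optimizer_mode = all_rows''; end;',
      repeat_interval => 'FREQ=DAILY; BYHOUR=19',
      enabled         => TRUE);
end;
/
```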
  • 5. Typical workload modes:

                    Day Mode            Night Mode       EOM Mode
    Data Warehouse  read-only           batch updates    summary reports
    OLTP            small transactions  batch reports    summary reports

An iterative process of intelligent SQL workload tuning involves these steps:
1. Analyze the workload: Use monitoring tools to identify repeating processing signatures.
2. Codify and "name" the workload signatures (day mode, night mode, EOW mode, EOM mode).
3. Identify optimal settings for each workload (init.ora parameters, SGA regions, CBO statistics, custom indexes).
4. Schedule just-in-time morphing of Oracle to accommodate the changing workload.
End of Period Workloads
In addition to the normal day mode and night mode workloads, many systems have special end-of-week, end-of-month, and end-of-quarter processing that requires an entirely new SGA and database configuration.
PGA adjustments: You can override the governor that allows any one job to grab only 5% of the PGA, and you can set very large values for sort_area_size and hash_area_size to allow a single-threaded batch job to consume all of the PGA, if necessary.
Optimizer parameters: Certain parameters like optimizer_mode and optimizer_index_caching can change the propensity of Oracle to use indexes.
Statistics: Customized EOM CBO statistics can be collected, saved, and imported when you switch to EOM mode. This is especially important for cases where specialized "weekend-only" indexes are used and optimizer histograms are required.
SGA adjustments: When you have a batch mode where only a few concurrent jobs run simultaneously, you will adjust the "default governors" in Oracle to allow the batch jobs to consume all available computing resources.
B-tree indexes: Some indexes are only used for EOW processing, and it's smart to build them on Friday night and nuke the index on Sunday night.
Bitmap indexes: Many EOM reports are read-only, and building temporary bitmap indexes can greatly improve the throughput of jobs that may run more than 10 hours.
Con*Text indexes: Many EOY reports need to "look inside" CLOB columns, and a Friday-night build of a word index on a CLOB is perfect for read-only weekend reports. The index is destroyed on Sunday night.
Proactive workload analysis and predictive modeling
  • 6. Statistically proven predictive analytic procedures come with the Oracle Data Mining (ODM) package, but DBAs are only now learning that these same principles can be applied to their own production SQL workloads. It is possible to fix problems before they affect your users! All you need is a way to interrogate your historical data from STATSPACK or AWR and identify repeating "signatures", statistically significant repeating patterns of usage that can be used to predict future events!
"Those who forget the past are condemned to repeat it" – George Santayana
"The farther that you can look into the past, the farther you can look into the future" – Winston Churchill
"Predictions are difficult, especially about the future" – Yogi Berra
If you closely study your past workloads, it's easy to develop a proactive approach that will intelligently anticipate new repeating workloads and change the SGA to accommodate the changes before they impact the end users. Oracle offers a similar one-size-fits-all approach called Automatic Memory Management (AMM), but AMM is reactive and waits until a problem is detected before fixing it.
SQL workload
Starting in Oracle 10g, Oracle finally recognized that it is the SQL workload that determines the optimal index placement.
Intelligent index management:
Identifying missing indexes
Identifying unused indexes
Interrogating index usage within a workload
Building custom indexes for workloads
Using function-based indexes
Interrogating index usage within a workload
index_usage_hr.sql

col c1 heading 'Begin|Interval|time'  format a20
col c2 heading 'Search Columns'       format 999
col c3 heading 'Invocation|Count'     format 99,999,999
break on c1 skip 2
  • 7.
accept idxname char prompt 'Enter Index Name: '
ttitle 'Invocation Counts for index|&idxname'

select
   to_char(sn.begin_interval_time,'yy-mm-dd hh24') c1,
   p.search_columns                                c2,
   count(*)                                        c3
from
   dba_hist_snapshot sn,
   dba_hist_sql_plan p,
   dba_hist_sqlstat  st
where
   st.sql_id = p.sql_id
and
   sn.snap_id = st.snap_id
and
   p.object_name = '&idxname'
group by
   begin_interval_time, search_columns;

This will produce output like the listing below, showing a summary invocation count for the specified index during each snapshot interval. This can be compared to the number of times that the table was invoked from SQL. Here is a sample of the output from this script:

Begin
Interval                     Invocation
time         Search Columns       Count
------------ -------------- ----------
04-10-21 15               1          3
04-10-10 16               0          1
04-10-10 19               1          1
04-10-11 02               0          2
04-10-11 04               2          1
04-10-11 06               3          1
04-10-11 11               0          1
04-10-11 12               0          2
04-10-11 13               2          1
04-10-11 15               0          3
04-10-11 17               0         14

Using function-based indexes
Here we see a function-based index (FBI) on a large VARCHAR column, based on the credit card number that it contains. The database will create this index, but even with a hint the optimizer will not use it, and it is expensive to maintain:

create index card_idx on dumbtbl
   (regexp_substr(lots_data, '(([0-9]{4})([[:space:]])){3}[0-9]{4}'));
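By contrast, a function-based index that the optimizer will happily use matches an expression that actually appears in the workload's predicates. A common sketch (table, column, and index names are hypothetical):

```sql
-- Index the exact expression used in WHERE clauses
create index cust_upper_name_idx on customer (upper(last_name));

-- This predicate now qualifies for an index range scan
-- instead of a full-table scan
select
   customer_id
from
   customer
where
   upper(last_name) = 'JONES';
```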
  • 8. Table row re-sequencing and index usage
The original relational manifesto by Codd and Date stated that the physical ordering of rows on disk should not matter, but in the real world, row placement matters a lot, especially for tables whose rows are always accessed in the same key order. The dba_indexes view has a clustering_factor column that tells the optimizer how "synchronized" an index is with the table: values range from near the number of table blocks (synchronized) to near num_rows (not synchronized). At the unsynchronized extreme, when clustering_factor approaches num_rows, range scan operations may require hundreds more I/Os to complete a transaction.
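A quick way to compare clustering_factor against the two extremes is to join dba_indexes to dba_tables; a sketch (the schema name is hypothetical):

```sql
-- clustering_factor near t.blocks   : rows stored in index key order (good for range scans)
-- clustering_factor near t.num_rows : rows scattered (range scans do extra I/O)
select
   i.index_name,
   i.clustering_factor,
   t.blocks,
   t.num_rows
from
   dba_indexes i,
   dba_tables  t
where
   i.table_name  = t.table_name
and
   i.table_owner = t.owner
and
   t.owner = 'SCOTT';
```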
  • 9. Many DBAs will search the workload and seek tables that are frequently accessed by the same index, using an index range scan. Here is an unclustered index situation, where a transaction displays all customers in a city:
Only 10 rows fit per data block
Clustering factor is "poor" for the primary index
Lots of "unnecessary" disk reads to fetch all customers for a city
Below is a clustered index situation: 400 rows fit onto a data block.
  • 10. As we see, the decision to reorganize a table into the same order as an index depends on several factors:
The percentage of total accesses in which the index is used to reach the table rows. Remember, a table can only be re-sequenced according to one index, and it must be the most popular index.
Whether the index is used with an index full scan, index fast full scan, or index range scan. Remember, some indexes are used only for "unique" index probes, and these will see no benefit from reorganization.
The database block size (db_nn_cache_size)
The average row length (dba_tables.avg_row_len)
Typical cases might include:
Order items: An order processing system displays all items for an order 1,000 times per hour with an average of 40 consistent gets per execution. System-wide, the item_idx is used 95% of the time to show the items within an order. Because there are about 70 items per order, and the item rows are 80 bytes each, reorganizing the item table by item_idx will reduce I/O by a factor of at least 40x.
Student classes: Assume a university database with a 2k block size that displays all classes for a student 10 times per second. The class table is using the student_semester_idx 97% of the time with index range scans, and a student takes an average of 4 classes per semester. What factors should we consider when reorganizing this table?
Oracle provides several methods for re-sequencing table rows into the same order as the most popular index:
Sorted hash cluster tables
CTAS with order by
Index-organized tables
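Of the three methods, CTAS with order by is the simplest to sketch; here the table, column, and index names follow the order-items example above and the table is briefly unavailable during the swap:

```sql
-- Rebuild the item table in item_idx key order
-- (order_id, item_seq are assumed to be the item_idx key columns)
create table item_sorted
   nologging
as
   select *
   from   item
   order by order_id, item_seq;

-- Swap in the sorted copy; note that indexes, constraints, grants,
-- and statistics on the original table must be recreated afterward
drop table item;
rename item_sorted to item;
```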
  • 11. Oracle MOSC Note 223117.1 has some great advice for tuning down "db file sequential read" waits by reorganizing a table in row order:
If index range scans are involved, more blocks than necessary could be being visited if the index is unselective: by forcing or enabling the use of a more selective index, we can access the same table data by visiting fewer index blocks (and doing fewer physical I/Os).
If the index being used has a large clustering factor, then more table data blocks have to be visited in order to get the rows in each index block: by rebuilding the table with its rows sorted by the particular index columns we can reduce the clustering factor and hence the number of table data blocks that we have to visit for each index block.
Conclusion
Even though Oracle has introduced many automated features for locating missing indexes, no artificial tool will ever be able to replace the intuition of a human expert. Those who forget the past are condemned to repeat it, and it is the job of every database administrator to analyze their historical performance data to identify workload-based indexing opportunities. By applying an intelligent approach to Oracle indexing, you can ensure that your indexes match the data access patterns and prevent wasting computing resources on maintaining unnecessary indexes.
