• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
Indexing for Performance for SQL Server 2005
 

Indexing for Performance for SQL Server 2005

on

  • 703 views

 

Statistics

Views

Total Views
703
Views on SlideShare
703
Embed Views
0

Actions

Likes
0
Downloads
51
Comments
0

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment
  • I’ve been tuning SQL databases for over 15 years. Since being introduced to I3, when I’m called in for a short-term performance and tuning gig, the first thing I do is ask if I can send a tool in ahead of me to monitor the database while I’m in transit, so that I’ve got information to start with when I arrive. The success rate has been phenomenal; so far, I’ve had a 100% hit rate with DBAs wanting to buy it after they’ve seen it in action on their own platforms.
  • It was easy to quantify day 1 vs. day 2 performance
  • Note: This is from a different engagement. When I went on that last engagement, I hadn’t been planning on demonstrating this technique to anybody other than the client who was looking over my shoulder at the time.
  • Note that the highlighted plan on the left corresponds with tables and columns on the right, and can highlight the need for index addition or changes.
  • When the vast majority of elapsed time is not in the database, you have evidence that it’s not your fault. In this case, it was the fault of significant requests to an SSL layer; it turned out that some of the screens were taking 5-6 seconds to encrypt due to the quantity of dynamic data displayed.

Indexing for Performance for SQL Server 2005 Indexing for Performance for SQL Server 2005 Presentation Transcript

  • Indexing for Performance for SQL Server 2005 Single - Table Optimization Chapter Four Jeff Garbus – [email_address] Tony Cannizzo – [email_address]
  • Who We Are, What We Do
    • 18 Years of Solving Customer Problems Across The Database Life Cycle
    • People
      • Noted Authors
      • Hands-on Instructors
      • Experienced Consultants
    • Solutions
      • Business Continuity
      • Application Availability
      • Performance Management
    • Core Competencies
      • Performance and Tuning
      • Database Administration
      • Development Best Practices
    • Deliverables
      • Performance Assessments
      • Architectural Review
      • P&T Training
      • Emergency Triage
      • Troubleshooting
      • Implementation Assurance
      • Hardware Validation
      • Contract DBA
    • NEW!
      • Remote Monitoring
  • Acknowledgements © Soaring Eagle Consulting, Inc Page 4 -
    • Microsoft SQL Server and Microsoft SQL Server Management Studio are trademarks of Microsoft Inc.
    • This presentation is copyrighted.
    • This presentation is not for re-sale
    • This presentation shall not be used or modified without express written consent of Soaring Eagle Consulting, Inc.
  • Topics
    • Examine detailed topics in query optimization
    • Indexes with SARGs
    • Improvised SARGs
    • Clustered vs. nonclustered indexes
    • Queries with OR
    • Index covering
    • Forcing index selection
    © Soaring Eagle Consulting, Inc Page 4 -
  • SQL Server 2005 Search Techniques
    • SQL Server 2005 uses three basic search techniques for query resolution
      • Table Scans
      • Index Searches
      • Covered Index Searches
    © Soaring Eagle Consulting, Inc Page 4 -
  • Table Scans
    • If SQL Server 2005 can’t resolve a query any other way, it does a table scan
      • Scans are expensive
      • Table scans may be the best way to resolve a query
      • If there is a clustered index on the table, SQL Server will try and use it instead of performing a table scan
    • Table Scan Search
    © Soaring Eagle Consulting, Inc Page 4 - select * from pt_tx where id = 1
  • Table Scans (Cont’d)
    • Query Plan
    • Verify table scans with:
      • set statistics io on
    © Soaring Eagle Consulting, Inc Page 4 - Table 'pt_tx'. Scan count 1, logical reads 38, physical reads 0, read-ahead reads 0
  • Table Scan Output: Update
      • showplan
    © Soaring Eagle Consulting, Inc Page 4 - update pt_tx set id = id + 1
  • Index Selection
    • Topics
      • Optimizer selection criteria
      • When indexes slow access
      • When indexes cause deadlocks
      • Index statistics and usage
    © Soaring Eagle Consulting, Inc Page 4 -
  • Optimizer Selection Criteria
    • During the index selection phase of optimization the optimizer decides which (if any) indexes best resolve the query
    • Identify which indexes match the where and join clauses
    • Estimate rows to be returned
    • Estimate page reads
    © Soaring Eagle Consulting, Inc Page 4 -
  • SARG Matching
    • Indexes usually correspond with SARGs
    • Useful indexes will specify a row or rows or set bounds for the result set
    • An index may be used if any column of the index matches the SARG
    © Soaring Eagle Consulting, Inc Page 4 - where dob between '3/3/1941' and '4/4/65' create unique index nci on authors (au_lname, au_fname)
  • SARG Matching (Cont’d)
    • Which of the following queries (if any) could be helped by the index?
    • If there are not enough rows in the table, indexes that look useful may never be used
    © Soaring Eagle Consulting, Inc Page 4 - select * from authors where au_lname = 'Smith' or au_fname = 'Jim' select * from authors where au_fname = 'Jim' select * from authors where au_fname = 'Jim' and au_lname = 'Smith' create unique index nci on authors (au_lname, au_fname)
  • Index Selection
    • Review of index types
    • Optimizer selection criteria
    • When indexes slow access
    • When indexes cause deadlocks
    • Index statistics and usage
    © Soaring Eagle Consulting, Inc Page 4 - Topics
  • Index Types
    • SQL Server provides three types of indexes
      • Clustered
      • Nonclustered
      • Full text
    • One clustered index per table
      • Data is maintained in clustered index order
    • 248 nonclustered indexes per table
      • Nonclustered indexes maintain pointers to rows
    • Full text is beyond scope
    © Soaring Eagle Consulting, Inc Page 4 -
  • Clustered Index Mechanism
    • With a clustered index, there will be one entry on the last intermediate index level page for each data page
    • The data page is the leaf or bottom level of the index
    • (Assume a clustered index on last name)
    © Soaring Eagle Consulting, Inc Page 4 -
  • Nonclustered Index Mechanism
    • The nonclustered index has an extra, leaf level for page / row pointers
    • Data placement is not affected by non-clustered indexes
    • (Assume an NCI on first name)
    © Soaring Eagle Consulting, Inc Page 4 -
  • Clustered vs. Nonclustered
    • A clustered index tends to be 1 I/O faster than a nonclustered index for a single-row lookup
    • Clustered indexes are excellent for retrieving ranges of data
    • Clustered indexes are excellent for queries with order by
    • Nonclustered indexes are a bit slower, take up much more disk space, but are the next best alternative to a table scan
    • Nonclustered indexes may cover the query for maximal retrieval speed
    • For some queries; covered queries, nonclustered indexes can be faster
    • When creating a clustered index, you need free space in your database approximately equal to 120% of the total table size
    © Soaring Eagle Consulting, Inc Page 4 -
  • Using Indexes
    • Clustered Index Indications
    • Columns searched by range of values
    • Columns by which the data is frequently sorted (order by or group by)
    • Sequentially accessed columns
    • Static columns
    • Join columns (if other than the primary key)
    • Nonclustered Index Indications
    • NCI selection tends to be much more effective if less than about 20% of the data is to be accessed
    • NCIs help sorts, joins, group by clauses, etc., if other column(s) must be used for the CI
    • Index covering
    © Soaring Eagle Consulting, Inc Page 4 -
  • Other Index Limitations
    • Maximum 16 columns
    • Maximum 900 bytes column width
    © Soaring Eagle Consulting, Inc Page 4 -
  • Primary Key vs. Clustering vs. Nonclustering
    • A primary key is a logical concept, not a physical concept
    • Indexes are physical concepts, not logical concepts
    • There is a strong correlation between the logical concept of a key and the physical concept of an index
    • By default, when you define relationships as part of table design, you will build indexes to support the joins / lookups
    • By default, when you define a primary key, you will create a unique clustered index on the table
      • Unique is good, clustered isn’t always good
    • When you define a clustered index, the server automatically appends the key column(s) (plus a unique identifier, if necessary) to the nonclustered indexes
    © Soaring Eagle Consulting, Inc Page 4 -
  • Key / index features (New in SQL Server 2005)
    • Columns that are not part of the index key can be included in nonclustered indexes. Including the nonkey columns in the index can speed queries (Index covering) and can exceed the current index size limitations of a maximum of 16 key columns and a maximum index key size of 900 bytes
    • The new ALLOW_ROW_LOCKS and ALLOW_PAGE_LOCKS options in CREATE INDEX and ALTER INDEX can be used to control the level at which locking occurs for the index
    • The query optimizer can match more queries to indexed views than in previous versions, including queries that contain scalar expressions, scalar aggregate and user-defined functions, interval expressions, and equivalency conditions
    • Indexed view definitions can also now contain scalar aggregate and user-defined functions with certain restrictions.
      • (More in “Views”)
    © Soaring Eagle Consulting, Inc Page 4 -
  • Optimizer Selection Criteria
    • During the index selection phase of optimization the optimizer decides which (if any) indexes best resolve the query
      • Identify which indexes match the clauses
      • Estimate rows to be returned
      • Estimate page reads
    © Soaring Eagle Consulting, Inc Page 4 -
  • Index Selection Examples
    • 1. What index will optimize this query?
    • 2. What indexes optimize these queries?
    • 3. In the second query, what would the net effect be of changing the range to this?
    © Soaring Eagle Consulting, Inc Page 4 - select title from titles where title = ‘Alleviating VDT Eye Strain’ select title from titles where price between $5. and $10. between $500 and $600
  • CI vs. NCI
    • Table facts:
        • 2,000,000 titles (= 14492 pages)
        • 138 rows / page
        • 1 million rows in the range
    © Soaring Eagle Consulting, Inc Page 4 - select title from titles where price between $5. and $10.
  • CI vs. NCI
    • It is feasible, occasionally likely, that a table scan is faster than using a nonclustered index for specific queries
    • The server evaluates all options at optimization time and selects the least expensive query
    © Soaring Eagle Consulting, Inc Page 4 -
  • Or Indexing
    • Questions
      • What indexes should (could) be used?
      • Will a compound index help?
      • Which column(s) should be indexed?
    © Soaring Eagle Consulting, Inc Page 4 - select title from titles where price between $5. and $10. or type = 'computing'
  • Or Indexing (Cont’d)
      • How is the following query different (from a processing standpoint)?
      • What is a useful index for?
    © Soaring Eagle Consulting, Inc Page 4 - select title from titles where price between $5. and $10. and type = 'computing' select * from authors where au_fname in ( ' Fred ' , ' Sally ' )
  • Or Clauses
    • Format
    © Soaring Eagle Consulting, Inc Page 4 - select * from authors where au_lname = 'Smith' or au_fname = 'Fred' select * from authors where au_lname in ('Smith', 'Jones', 'N/A' )
    • (How many indexes may be useful?)
    SARG or SARG
  • Or Strategy
    • An or clause may be resolved via a table scan, a multiple match index or using or strategy
    • Table Scan
    • Each row is read, and criteria applied
    • Matching rows are returned in the result set
    • The cost of all the index accesses is greater than the cost of a table scan
    • At least one of the clauses names a column that is not indexed, so the only way to resolve the clause is to perform a table scan
    © Soaring Eagle Consulting, Inc Page 4 -
  • Or Strategy (Cont’d)
    • Multiple match index
    • Using each part of the or clause, select an index and retrieve the row
    • Only used if the results sets can not return duplicate rows
    • Rows are returned to the user as they are processed
    © Soaring Eagle Consulting, Inc Page 4 -
  • Or: Query Plan
    • select company, street2 from pt_sample
        • where id = 2017 or id = 2163
    © Soaring Eagle Consulting, Inc Page 4 - Query Execution Plan
  • Index Selection and the Select List
    • Questions
      • What is the best index?
      • Do the columns being selected have a bearing on the index?
    © Soaring Eagle Consulting, Inc Page 4 -
      • select * from publishers where pub_id = 'BB1111'
  • Index Selection and the Select List
    • Question
      • Should there be a difference between the utilization of the following two indexes?
    © Soaring Eagle Consulting, Inc Page 4 - select royalty from titles where price between $10 and $20 create index idx1 on titles (price) /* or */ create index idx2 on titles (price, royalty)
  • Index Covering
    • The server can use the leaf level of a nonclustered index the way it usually reads the data pages of a table: this is index covering
    • The server can skip reading data pages
    • The server can walk leaf page pointers
    • A nonclustered index will be faster than a clustered index if the index covers the query for a range of data (why?)
    • Adding columns to nonclustered indexes is a common method of reducing query time
    • This has particular benefits with aggregates
    © Soaring Eagle Consulting, Inc Page 4 -
  • Index Covering (Cont’d)
    • Beware making the index too wide; As index width approaches row width, the benefit of covering is reduced
      • # of levels in the index increases
      • Index scan time approaches table scan time
    • Remember that changes to data will cascade into indexes
    © Soaring Eagle Consulting, Inc Page 4 -
  • Composite Indexes
    • Composite (compound) indexes may be selected by the server if the first column of the index is specified in a where clause, or if it is a clustered index
    © Soaring Eagle Consulting, Inc Page 4 - create index idx1 on employee (minit, job_id , job_lvl)
  • Composite Indexes (Cont’d)
    • Which queries may use the index?
    © Soaring Eagle Consulting, Inc Page 4 - select * from employee where minit = 'A' and job_id != 4 and job_lvl = 135 select * from employee where job_id != 4 and job_lvl = 135 select * from employee where minit = 'A' and job_lvl = 135 create index idx1 on employee (minit, job_id , job_lvl)
  • Composite vs. Many Indexes
    • Each additional index impacts update performance
    • In order to select appropriate indexes, we need to know how many indexes the optimizer will use, and how many rows are represented by the where clause
    © Soaring Eagle Consulting, Inc Page 4 - select pub_id, title, notes from titles where type = 'Computer' and price > $15.
  • Options
    • CI or NCI on type
    • CI or NCI on price
    • One index on each of type & price
    • Composite on type, price
    • Composite on price, type
    • CI or NCI on type, price, pub_id, title, notes
    • Which are the best options in which circumstances?
    © Soaring Eagle Consulting, Inc Page 4 - select pub_id, title, notes from titles where type = 'Computer' and price > $15.
  • Index Usefulness
    • It is imperative to be able to estimate rows returned for an index. Therefore, the server will estimate rows returned before index assignation
    • If statistics are available (When would they not be?) the server estimates number of rows using distribution steps or index density
    • SQL Server 2005 automatically generates statistics about index key distributions using efficient sampling algorithms
    • If you have an equality join on a unique index, the server knows only one row will match and doesn't need to use statistics
    • The query analyzer index analyzer can analyze a query and recommend indexes
    • The more selective an index is, the more useful the index
    © Soaring Eagle Consulting, Inc Page 4 -
  • Data Distribution
    • You have a 1,000,000 row table. The unique key has a range (and random distribution) of 0 to 10,000,000
    • Question
    • How many rows will be returned by the following query?
    • How does the optimizer know whether to use an index or table scan?
    © Soaring Eagle Consulting, Inc Page 4 - select * from table where key between 1000000 and 2000000
  • Index Statistics
    • SQL Server keeps distribution information about indexes in a “statblob” column in the sysindexes table
    • There is distribution for every index
    • The optimizer uses this information to estimate the number of rows returned for a query
    • The distribution information is built at index creation time and maintained by the server if set to automatically do so
    © Soaring Eagle Consulting, Inc Page 4 -
  • Distribution Steps
    • The server creates the statistics by walking the index, and storing appropriate key values at each step increment
    © Soaring Eagle Consulting, Inc Page 4 - 10,000,000 rows have an integer key. 1 page has (2005 bytes / 4 bytes + 2 between) =~ 500 steps 10,000,000 rows / 500 steps = 20,000 rows / step
  • Distribution Steps
    • The optimizer will walk the index, storing the key value every 20,000 rows
    • When a query is executed
    • The number of keys in the range * 20,000 rows / key is the approximate number of rows affected
    © Soaring Eagle Consulting, Inc Page 4 - select * from table where key between 1000000 and 2005000
  • Viewing Index Statistics
    • Viewed with the dbcc show_statistics
    © Soaring Eagle Consulting, Inc Page 4 - Continued next page dbcc show_statistics (table_name,index_name)
  • Viewing Index Statistics (Cont’d) © Soaring Eagle Consulting, Inc Page 4 - Continued next page
  • Explaining DBCC Show Statistics
    • Updated date and time: When the statistics were last updated
    • Rows: Number of rows in the table
    • Rows Sampled: Number of rows sampled for statistics information
    • Density: Selectivity of the index
    • Average key length: Average length of an index row
    • All density: Selectivity of the specified column prefix in the index
    • Columns: Name of the index column prefix for which the all density is displayed
    • Steps: Number of histogram values in the current distribution statistics for the specified target on the specified table
    © Soaring Eagle Consulting, Inc Page 4 -
  • Estimating Logical Page I/O
    • If there is no index, there will be a table scan, and the estimate will be the number of pages in the table
    • If there is a clustered index, estimate will be the number of index levels plus the number of pages to scan
    • For a nonclustered index, estimate will be index levels + number of leaf pages + number of qualifying rows (which will correspond to the number of physical pages to read)
    • For a unique index and an equality join, the estimate will be 1 plus the number of index levels
    © Soaring Eagle Consulting, Inc Page 4 -
  • When to Force Index Selection
    • Don't Do it
      • With every release of the server, the optimizer gets better at selecting optimal query paths
      • Forcing the optimizer to behave in a specific manner does not allow it the freedom to change selection as data skews
      • It also does not permit the optimizer to take advantage of new strategies as advances are made in the server software
    © Soaring Eagle Consulting, Inc Page 4 -
  • When to Force Index Selection (Cont’d)
    • Exceptions
      • When you (the developer) have information about a table that SQL Server 2005 will not have at the time the query is processed (i.e., using a temp table in a nested stored procedure)
      • Occasions when you've proven the optimizer wrong
    © Soaring Eagle Consulting, Inc Page 4 -
  • How to Force Index Selection
    • How to Force Index Selection
    • To force the server to use a specific index for a specific table, you must first know the index id of the index you want to use
    • In this example, the titles index with the id of 2 will be used for the titles table, and the publishers index with an id of 1 will be used for publishers
    © Soaring Eagle Consulting, Inc Page 4 - select * from titles (2), publishers (1) where titles.pub_id = publishers.pub_id
  • When to Force Index Selection (Cont’d)
    • The following SQL will list all table names and their corresponding index ids
    • Allowing you to use the following syntax to force indexes
    • Instead, identify why the optimizer picked incorrectly
    © Soaring Eagle Consulting, Inc Page 4 - select * from titles (index(titleind)), publishers (index( UPKCL_pubind) ) where titles.pub_id = publishers.pub_id select ' table ' =o.name, ' index ' =i.name, indid from sysindexes i, sysobjects o where i.id = o.id
  • Summary
    • The optimizer uses indexes to improve query performance when possible
    • Watch out for improvised SARGs
    • Queries with OR may require a table scan
    • Try to take advantage of covered queries
    • Be careful when forcing an index
    © Soaring Eagle Consulting, Inc Page 4 -
  • GOT BLAME?
    • Database
    • Network
    • Application
    • The Blame Game: “My piece is fine..”
    • Storage
  • Insight: Drill into the Problem Tier
  • Case Study #1 – Problem in the Database
    • Pilot of a newly developed internet pharmacy application
      • SQL Server running on a 2-CPU system
      • currently supporting 120 users
    • System crashing frequently
      • CPU (per MS Perf Mon) was pegging the box at 100%
    • Solution after 6 weeks?
      • add 2 more CPUs to double server capacity
      • performance was “acceptable”
      • but CPU was still at 95%
    • Panic had set in
      • going to 800+ users on initial production roll out
      • how much more hardware would they need?
      • so far, this increased their software license costs by $70K
    • Here is what we found Monday at 10AM
    • Time to Value – 1 Day
    • Monday
    • 95% of 4 CPU’s
    • Tuesday
    • 7% of 2 CPU’s
  • Drill Down: Instance> Database> Statement
  • Drilling down into root causes
  • Further drill down
    • Click Here
  • Statement-level information
    • Significant io wait
  • i3 traps, tracks, and stores all query plans…
  • … And will tell you when & why the plans change
  • Drilling Down into Stored Procedures
    • REPLACE WITH MORE RECENT PICTURE
  • Who is Doing the Work?
  • What tables are being accessed the most? Drill down and identify the queries hitting them.
  • You’ve found the high-activity table; how is it being accessed?
  • Ever wonder which indexes may be safely removed?
  • New Wait States
  • Procedure Cache
  • Actual Plan
  • Case Study #2: Not in the Database
    • 4-processor SQL Server in an Application Service Provider (ASP) environment
    • Key queries (stored procedures) were rewritten to run in under a half second, from 5-20 seconds.
    • Further follow-up found that application response time was still in the 5-second + category
    • We knew the issue was not in the database; but where was it?
  • How to demonstrate that it’s not the database
  • Where is the application spending its time?
    • When the vast majority of elapsed time is not in the database, you have evidence that it’s not your fault.
    • In this case, I had been brought in to tune the database, knocked all the queries down from 5-10 second range to significantly subsecond. Performance was still in the 3-6 second range for much of the application. I knew it was not a database issue, but what was it?
    • In this case, it was the result of significant requests to an SSL layer; it turned out that some of the screens were taking 5-6 seconds to encrypt due to the quantity of dynamic data displayed.
  • Why APM is Important
    • 70% - studies have shown that 70%+ of all application performance issues are directly attributable to the source code, NOT INFRASTRUCTURE
      • It’s critical to monitor and track performance of the application code components
    • 25% - studies have shown that only 25% of performance issues identified in production could have been anticipated and resolved in test/dev/QA
      • Monitoring production performance is critical to application availability and performance
    • 60% of i3’s customer’s cancel or defer hardware upgrade purchases within the first year of ownership
      • Multi-tier web-based applications are extremely complex with great interdependency among hardware/software/application components. This makes it virtually impossible to determine root cause in a timely fashion – a typical response to the frustration is to upgrade expensive hardware. I3 streamlines your application components.
    • 100% - i3’s customers have found that their staff can double their IT responsibility without additional workload
    • 3 – i3 has an average payback of 3 months, 1.8 months at one large Federal agency
    • 1-3% - i3 has an average overhead of 1-3% in production environments
    • 24% of IT staff time is devoted to addressing application performance delays