Microsoft SQL Server Filtered Indexes & Sparse Columns Feb 2011


Speaker: Don Vilen, Chief Scientist, Buysight & former Microsoft SQL Server Team Member

This session covers the basics of Filtered Indexes and Sparse Columns, and then dives into the areas where they work well, and not so well, both together and separately. Don will give demos of how they work and when they work well.




    Presentation Transcript

    • Microsoft SQL Server Filtered Indexes and Sparse Columns: Together, Separately. Speaker: Don Vilen, Chief Scientist, Buysight. February 2011. Mark Ginnebaugh, User Group Leader, www.bayareasql.org
    • 15 Feb 2011. Filtered Indexes and Sparse Columns: Together, Separately. Don Vilen, Chief Scientist, Buysight, DVilen@buysight.com
    • Agenda
      ◦ Filtered Indexes
      ◦ Filtered Statistics
      ◦ Wide Tables
      ◦ Sparse Columns
      ◦ Together …
      ◦ … and Separately
      ◦ Everything is SQL Server 2008 (and later), in all editions
    • The Scenario
      ◦ 100,000 rows in the table
        ▪ 99,500 rows are historical, remaining 500 rows are current
        ▪ Indicated by NULL EndDate column or IsActive bit, etc.
      ◦ All queries on current data use index
      ◦ But why index all the historical 99.5% of the table?
      ◦ 1,000 columns in a table
      ◦ BikeColor column is relevant only if ItemType is ‘Bicycle’
        ▪ For 0.5% of the rows; remainder are NULL
      ◦ But why index all the rows regardless of ItemType value?
    • Filtered Indexes
      ◦ Indexes only rows with values that match WHERE clause
        ▪ CREATE INDEX xyz ON table(columns, …)
        ▪ WHERE EndDate IS NULL
        ▪ WHERE IsActive = 1
        ▪ WHERE ItemType = ‘Bicycle’
      ◦ Uses:
        ▪ Ranges of values for smaller portion of large table
        ▪ Avoid the common 80-90% of data where the index wouldn’t be helpful
        ▪ For categories of row data
        ▪ Index on Column120 and Column121 only useful when C1 = 37
        ▪ Table partitions, where index is needed only on the ‘current’ partition(s)
        ▪ Each partition will have the index structure, but only ‘current’ partitions will have any rows in the index
      ◦ Benefits
        ▪ Better query performance
        ▪ Reduction in storage costs
        ▪ Reduction in maintenance cost/time
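A minimal T-SQL sketch of the scenario above, where only rows with a NULL EndDate are "current". The table dbo.Orders, its columns, and the index name are hypothetical names chosen for illustration; they do not come from the slides.

```sql
-- Hypothetical table: most rows are historical (EndDate set), few are current (EndDate NULL).
CREATE TABLE dbo.Orders (
    OrderID    int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    CustomerID int  NOT NULL,
    StartDate  date NOT NULL,
    EndDate    date NULL
);

-- Filtered nonclustered index: only rows matching the WHERE clause get index entries.
CREATE NONCLUSTERED INDEX IX_Orders_Current
    ON dbo.Orders (CustomerID)
    INCLUDE (StartDate)
    WHERE EndDate IS NULL;

-- A query that repeats the filter predicate is a candidate for the filtered index.
SELECT CustomerID, StartDate
FROM dbo.Orders
WHERE EndDate IS NULL
  AND CustomerID = 42;
```

Whether the optimizer actually picks the index depends on data volume and statistics; on a tiny table it may simply scan the clustered index.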
    • Filtered Index – Allowed Syntax
      ◦ WHERE <filter_predicate> [from BOL: CREATE INDEX]
        ▪ <filter_predicate> ::= <conjunct> [ AND <conjunct> ]
        ▪ <conjunct> ::= <disjunct> | <comparison>
        ▪ <disjunct> ::= column_name IN (constant ,…)
        ▪ <comparison> ::= column_name <comparison_op> constant
        ▪ <comparison_op> ::= { IS | IS NOT | = | <> | != | > | >= | !> | < | <= | !< }
      ◦ No BETWEEN, no LIKE, no subquery, no variables
      ◦ So must be simple and deterministic
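The grammar above can be illustrated with a short sketch. The table dbo.Tickets and its columns are assumptions made up for this example; the rejected forms are left as comments because, per the slide, they are not accepted in a filter predicate.

```sql
-- Hypothetical table for illustration only.
CREATE TABLE dbo.Tickets (
    TicketID int NOT NULL PRIMARY KEY,
    Status   int NOT NULL,
    Amount   int NOT NULL,
    Code     varchar(20) NULL
);

-- Allowed: an IN list of constants combined with a simple comparison via AND.
CREATE INDEX IX_Tickets_Open
    ON dbo.Tickets (Amount)
    WHERE Status IN (1, 2) AND Amount >= 100;

-- BETWEEN is not allowed, but the same range can be written as two comparisons.
CREATE INDEX IX_Tickets_MidRange
    ON dbo.Tickets (Status)
    WHERE Amount >= 100 AND Amount <= 500;

-- Not allowed in a filter predicate (shown as comments on purpose):
--   WHERE Amount BETWEEN 100 AND 500
--   WHERE Code LIKE 'A%'
--   WHERE Status = @SomeVariable
```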
    • Filtered Indexes – Requirements
      ◦ A comparison is always involved, so every session must agree on how the operations work; this requires the standard SET options
        ▪ ON for ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS, ARITHABORT, CONCAT_NULL_YIELDS_NULL, QUOTED_IDENTIFIER
        ▪ OFF for NUMERIC_ROUNDABORT
      ◦ Else:
        ▪ If not set when index is created, won’t create the index
        ▪ If not set when INSERT, UPDATE, DELETE, MERGE affects the data, gives error and rolls back
        ▪ If not set when the index might be used to optimize the query, it will not be considered
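As a sketch of the requirement above, a session can set the options explicitly before creating or modifying data under a filtered index, and inspect them with SESSIONPROPERTY. The table dbo.Accounts and the index name are hypothetical.

```sql
-- Set the options the slide lists; most client libraries already default to these.
SET ANSI_NULLS ON;
SET ANSI_PADDING ON;
SET ANSI_WARNINGS ON;
SET ARITHABORT ON;
SET CONCAT_NULL_YIELDS_NULL ON;
SET QUOTED_IDENTIFIER ON;
SET NUMERIC_ROUNDABORT OFF;

-- Inspect the current session: 1 means ON, 0 means OFF.
SELECT SESSIONPROPERTY('ANSI_NULLS')         AS ansi_nulls,
       SESSIONPROPERTY('ARITHABORT')         AS arithabort,
       SESSIONPROPERTY('QUOTED_IDENTIFIER')  AS quoted_identifier,
       SESSIONPROPERTY('NUMERIC_ROUNDABORT') AS numeric_roundabort;

-- Hypothetical table and filtered index created under those settings.
CREATE TABLE dbo.Accounts (
    AccountID int NOT NULL PRIMARY KEY,
    IsActive  bit NOT NULL
);

CREATE INDEX IX_Accounts_Active
    ON dbo.Accounts (AccountID)
    WHERE IsActive = 1;
```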
    • Filtered Indexes – Applicability
      ◦ Non-clustered indexes only (rather obviously)
      ◦ For UNIQUE indexes, only the indexed rows must have unique index values
        ▪ Duplicates in the non-indexed rows are not checked, but be careful that an update to a qualifying column doesn’t cause a duplicate to occur
        ▪ CREATE UNIQUE INDEX ix1 ON xyz (c3) WHERE c2 = 10
        ▪ So now there is a way to create a unique index on column with multiple NULL values; create index WHERE ColY IS NOT NULL
      ◦ Filtered indexes do not apply to:
        ▪ XML indexes
        ▪ Full-text indexes
        ▪ Spatial indexes
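The "unique index with multiple NULLs" point can be sketched as follows. The table dbo.Employees and the BadgeNumber column are hypothetical names used only for illustration.

```sql
-- Hypothetical table: BadgeNumber is optional, but must be unique when present.
CREATE TABLE dbo.Employees (
    EmployeeID  int NOT NULL PRIMARY KEY,
    BadgeNumber varchar(20) NULL
);

-- Uniqueness is enforced only over the rows that satisfy the filter.
CREATE UNIQUE NONCLUSTERED INDEX UX_Employees_BadgeNumber
    ON dbo.Employees (BadgeNumber)
    WHERE BadgeNumber IS NOT NULL;

-- Several NULLs are accepted because those rows are not in the index.
INSERT INTO dbo.Employees (EmployeeID, BadgeNumber)
VALUES (1, NULL), (2, NULL), (3, 'A100');

-- This insert is expected to fail with a duplicate-key error,
-- because 'A100' already exists among the indexed rows.
INSERT INTO dbo.Employees (EmployeeID, BadgeNumber)
VALUES (4, 'A100');
```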
    • Filtered Indexes – Getting Them Used 1
      ◦ QO can only use the index when it knows the index will match the conditions in the query’s WHERE clause
      ◦ Assume Column120 and Column121 useful only when C1 = 37
        ▪ So CREATE INDEX i1 on dbo.t1 (Column120, Column121) WHERE C1 = 37
        ▪ SELECT Column121 FROM dbo.t1 WHERE Column120 = 13 cannot use the index, even if Column120 and Column121 only appear for C1 = 37
        ▪ As far as the QO knows, there may be other Column120 or Column121 values that are not in the index
      ◦ Help the QO by adding more limiting predicates to WHERE clause
        ▪ Make it WHERE Column120 = 13 AND C1 = 37
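The slide's example can be written out as a runnable sketch. The index and column names come from the slide; the rest of the definition of dbo.t1 (the ID key and the data types) is an assumption.

```sql
-- Table shape is assumed; only C1, Column120 and Column121 appear on the slide.
CREATE TABLE dbo.t1 (
    ID        int NOT NULL PRIMARY KEY,
    C1        int NOT NULL,
    Column120 int NULL,
    Column121 int NULL
);

CREATE INDEX i1
    ON dbo.t1 (Column120, Column121)
    WHERE C1 = 37;

-- Not eligible for i1: nothing tells the optimizer that all matching rows have C1 = 37.
SELECT Column121
FROM dbo.t1
WHERE Column120 = 13;

-- Eligible for i1: the query predicate now implies the index filter.
SELECT Column121
FROM dbo.t1
WHERE Column120 = 13
  AND C1 = 37;
```

Comparing the two actual execution plans in SSMS is a simple way to confirm which query can reach the filtered index.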
    • Filtered Indexes – Getting Them Used 2
      ◦ WHERE with a variable rather than a literal
      ◦ Assume index is on WHERE IsActive > 0
        ▪ DECLARE @IsActive int; SET @IsActive = 1;
        ▪ SELECT xyz FROM table WHERE IsActive = @IsActive
      ◦ QO doesn’t know value of variable, so doesn’t know if index fits
        ▪ So shouldn’t use variables as if they were constants
      ◦ Again, help the QO by adding more limiting predicates to WHERE clause
        ▪ Make it WHERE IsActive = @IsActive AND IsActive > 0
        ▪ But perhaps that doesn’t really make sense here
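A sketch of the variable case, assuming a hypothetical table dbo.Tasks with an int IsActive column (the slide gives only the predicate and the variable).

```sql
-- Hypothetical table; IsActive is int here so that the filter IsActive > 0 is natural.
CREATE TABLE dbo.Tasks (
    TaskID   int NOT NULL PRIMARY KEY,
    IsActive int NOT NULL,
    Title    varchar(50) NOT NULL
);

CREATE INDEX IX_Tasks_Active
    ON dbo.Tasks (IsActive)
    INCLUDE (Title)
    WHERE IsActive > 0;

DECLARE @IsActive int;
SET @IsActive = 1;

-- The variable's value is unknown at optimization time,
-- so the optimizer cannot prove the rows fall inside the filter.
SELECT Title
FROM dbo.Tasks
WHERE IsActive = @IsActive;

-- Repeating the filter as a literal predicate makes the index eligible.
SELECT Title
FROM dbo.Tasks
WHERE IsActive = @IsActive
  AND IsActive > 0;
```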
    • Filtered Indexes – Getting Them Used 3
      ◦ WHERE with a function or conversion on the filter predicate
        ▪ Obvious: WHERE ABS(C1) = 37
        ▪ Cannot use index on WHERE C1 = 37
        ▪ Could change it to WHERE C1 = ABS(37) if same meaning .. but not in this case
      ◦ Implicit conversions:
        ▪ Assume index is WHERE c3 > 100
        ▪ DECLARE @varR real; SET @varR = 1000.5;
        ▪ SELECT * FROM tv2 WHERE c3 = @varR
        ▪ Requires conversion of c3 to real before comparison, so can’t use index
        ▪ SELECT * FROM tv2 WHERE c3 = cast(@varR as int)
        ▪ At least it requires no conversion of c3, but is unknown value at optimization time, so can’t use index
        ▪ So add a limiting predicate … assuming you know it will always be right
        ▪ SELECT * FROM tv2 WHERE c3 = cast(@varR as int) AND c3 > 100
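The implicit-conversion case above, written out as a sketch. The table name tv2 and column c3 are from the slide; the table definition and index name are assumptions.

```sql
-- Assumed shape for tv2: c3 is an int column.
CREATE TABLE dbo.tv2 (
    ID int NOT NULL PRIMARY KEY,
    c3 int NOT NULL
);

CREATE INDEX IX_tv2_c3
    ON dbo.tv2 (c3)
    WHERE c3 > 100;

DECLARE @varR real;
SET @varR = 1000.5;

-- c3 must be converted to real for the comparison, so the filtered index does not apply.
SELECT * FROM dbo.tv2 WHERE c3 = @varR;

-- No conversion on c3 now, but the value is still unknown at optimization time.
SELECT * FROM dbo.tv2 WHERE c3 = CAST(@varR AS int);

-- The extra literal predicate lets the optimizer match the index filter.
-- Only safe if the value is known to always exceed 100.
SELECT * FROM dbo.tv2 WHERE c3 = CAST(@varR AS int) AND c3 > 100;
```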
    • A Mis-Application of Filtered Indexes
      ◦ Create a filtered index on c and b with WHERE on c
      ◦ Attempt to use the index as a validation table
      ◦ In code, use the index in a hint and expect to get no row back for a b where c is a match, but it gets an error instead, because the hint prevents a plan from being created
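A sketch of how this mis-application can surface, under assumed names (dbo.ValidPairs and its index are hypothetical, and the slide does not show the original code). When the query predicate does not imply the index filter, a forced index hint leaves the optimizer with no valid plan; SQL Server typically reports this as error 8622.

```sql
-- Hypothetical table and filtered index on (c, b), filtered on c.
CREATE TABLE dbo.ValidPairs (
    ID int NOT NULL PRIMARY KEY,
    c  int NOT NULL,
    b  int NOT NULL
);

CREATE INDEX IX_ValidPairs_c_b
    ON dbo.ValidPairs (c, b)
    WHERE c = 1;

-- Works: the predicate matches the filter, so the hinted index is usable.
SELECT b
FROM dbo.ValidPairs WITH (INDEX (IX_ValidPairs_c_b))
WHERE c = 1 AND b = 5;

-- Expected to raise an error rather than return an empty result:
-- rows with c = 2 are not in the index, and the hint forbids any other access path.
SELECT b
FROM dbo.ValidPairs WITH (INDEX (IX_ValidPairs_c_b))
WHERE c = 2 AND b = 5;
```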
    • Filtered Indexes – And Views
      ◦ Cannot create a filtered index on a view, not even a non-clustered index on an indexed view
        ▪ But a filtered index can be chosen by the QO for the query formed from a view .. or function
    • Filtered Indexes – Considerations 1
      ◦ Storage size differences
        ▪ Fewer index rows take less space
        ▪ Less IO, more information fits in memory
        ▪ 4,000 pages vs. 1 page
      ◦ Limits auto-parameterization
        ▪ QO will not auto-parameterize if predicate is used in a filtered index (“in most cases”, per BOL)
        ▪ Otherwise would inhibit use of filtered index
        ▪ So can affect plan reuse
      ◦ Index maintenance – same rebuild and reorganize as regular index
        ▪ But hopefully much less work to do
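One way to observe the storage difference is to compare page and row counts per index. This is a sketch that assumes a table named dbo.Orders carrying both regular and filtered indexes; substitute any table of interest.

```sql
-- Page and row counts per index on one table; filtered indexes should show
-- far fewer rows and pages than unfiltered indexes on the same table.
SELECT i.name                    AS index_name,
       i.has_filter,
       i.filter_definition,
       SUM(ps.used_page_count)   AS used_pages,
       SUM(ps.row_count)         AS index_rows
FROM sys.indexes AS i
JOIN sys.dm_db_partition_stats AS ps
    ON ps.object_id = i.object_id
   AND ps.index_id  = i.index_id
WHERE i.object_id = OBJECT_ID(N'dbo.Orders')   -- hypothetical table name
GROUP BY i.name, i.has_filter, i.filter_definition
ORDER BY used_pages DESC;
```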
    • Filtered Indexes – Considerations 2
      ◦ Covering index
        ▪ Consider INCLUDEing other columns so more likely to be selected by QO
      ◦ DTA can suggest a filtered index
        ▪ ColX IS NOT NULL – only of this form
        ▪ But the missing-indexes functionality does not flag them as missing
      ◦ When not to use:
        ▪ When non-filtered index already exists, or another access path is likely better or adequate
        ▪ Avoid the extra index maintenance
    • Filtered Statistics
      ◦ CREATE STATISTICS stats1 ON table (cols) WHERE <condition>
      ◦ Uses:
        ▪ Can create filtered statistics on skewed data to assist QO
        ▪ Filtered Statistics will likely be more precise because they cover only the data in the filtered subset (or filtered index)
        ▪ Table partitions, where statistics are needed only on ‘current’ partition(s)
      ◦ Cannot reference a computed column, a UDT column, a spatial data type column, or a hierarchyID data type column
      ◦ AutoCreateStats will create statistics on Filtered Index key columns
      ◦ AutoCreateStats will not create filtered statistics on other columns
        ▪ You have to create them yourself
      ◦ AutoUpdateStats will keep them updated once they are created
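A minimal sketch of manually created filtered statistics. The table dbo.Shipments, its columns, and the statistics name are hypothetical; WITH FULLSCAN is an optional choice made here, not something the slide prescribes.

```sql
-- Hypothetical table with skew: only a small share of rows has EndDate NULL.
CREATE TABLE dbo.Shipments (
    ShipmentID int  NOT NULL PRIMARY KEY,
    CustomerID int  NOT NULL,
    EndDate    date NULL
);

-- Statistics built only over the 'current' subset of rows.
CREATE STATISTICS Stats_Shipments_Current_CustomerID
    ON dbo.Shipments (CustomerID)
    WHERE EndDate IS NULL
    WITH FULLSCAN;

-- Inspect the header, density vector and histogram of the new statistics.
DBCC SHOW_STATISTICS ('dbo.Shipments', Stats_Shipments_Current_CustomerID);
```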
    • Metadata for Indexes, Statistics
      ◦ sys.indexes
        ▪ has_filter, filter_definition
      ◦ sys.stats
        ▪ has_filter, filter_definition
      ◦ SSMS
        ▪ Indexes and Statistics Properties have a Filter tab
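The catalog columns named above can be queried directly. This sketch lists every filtered index and every filtered statistics object in the current database.

```sql
-- Filtered indexes in the current database.
SELECT OBJECT_NAME(object_id) AS table_name,
       name                   AS index_name,
       filter_definition
FROM sys.indexes
WHERE has_filter = 1;

-- Filtered statistics in the current database.
SELECT OBJECT_NAME(object_id) AS table_name,
       name                   AS stats_name,
       filter_definition
FROM sys.stats
WHERE has_filter = 1;
```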
    • Questions on Filtered Indexes, Statistics
      ◦ Any questions?
      ◦ Now we’ll move on to Wide Tables, Sparse Columns
    • Wide Tables
      ◦ Up to 30,000 Columns
        ▪ Great for SharePoint-like “a row is an object, some attributes depend on other attributes”
      ◦ Some limits:
        ▪ Columns per non-wide table: 1,024
        ▪ Columns per wide table: 30,000
        ▪ Columns per SELECT statement: 4,096
        ▪ Columns per INSERT statement: 4,096
        ▪ Indexes per table: 1,000
        ▪ Statistics per table: 30,000
        ▪ BOL: Maximum Capacity Specifications for SQL Server
    • Wide Table
      ◦ A wide table has defined a column set, using sparse columns
        ▪ New row structure for sparse columns
        ▪ {column, value}, {column, value} …
        ▪ Can create flexible schemas within an application
        ▪ Can add or drop columns whenever you want without having to touch each row
      ◦ The maximum size of a wide table row is 8,018 bytes, so most of the data in a row has to be NULL
        ▪ Or has to be varchar-type columns so it can overflow to another page
      ◦ Limit is still 1,024 for number of non-sparse columns plus computed columns, even in a wide table
    • Wide Tables – Performance Impact
      ◦ Performance considerations:
        ▪ Increased run-time and compile-time memory requirements
        ▪ Wide tables can have up to 30,000 columns defined; this can increase compile time
        ▪ There can be up to 1,000 indexes on a wide table, which increases the index maintenance time
        ▪ Nonclustered indexes should be filtered indexes to minimize their impact
        ▪ For more information, see BOL: Performance Considerations for Wide Tables
    • Sparse Columns
      ◦ CREATE TABLE … (…, c1 int SPARSE NULL, …)
      ◦ New row format for sparse columns
      ◦ Column:
        ▪ Must be NULLable
        ▪ Cannot be part of a clustered index
        ▪ Cannot be part of a primary key index
        ▪ Cannot have a DEFAULT
        ▪ Cannot be a computed column
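A sketch of declaring sparse columns, using the BikeColor idea from the scenario slide. The table dbo.Items and the column definitions are assumptions for illustration.

```sql
-- Hypothetical table: BikeColor only has a value for the few rows that are bicycles.
CREATE TABLE dbo.Items (
    ItemID    int         NOT NULL PRIMARY KEY,   -- key columns cannot be sparse
    ItemType  varchar(20) NOT NULL,
    BikeColor varchar(20) SPARSE NULL             -- sparse columns must allow NULL
);

-- Sparse columns can also be added later.
ALTER TABLE dbo.Items
    ADD FrameSize int SPARSE NULL;

-- They are read and written like any other column.
INSERT INTO dbo.Items (ItemID, ItemType, BikeColor)
VALUES (1, 'Bicycle', 'Red'),
       (2, 'Helmet',  NULL);

SELECT ItemID, ItemType, BikeColor, FrameSize
FROM dbo.Items;
```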
    • Sparse Columns – Some More Cannots
      ◦ Some types cannot be sparse:
        ▪ geography
        ▪ geometry
        ▪ image
        ▪ ntext
        ▪ text
        ▪ timestamp
        ▪ User-defined data types
      ◦ Some attributes cannot be on sparse columns
        ▪ No Filestream
        ▪ Not Identity
        ▪ Not RowGuidCol
    • Sparse Columns – Types and Size
      ◦ Size impact
        ▪ An important consideration but not the only one
      ◦ At what percentage of NULLs does a sparse column take less space than a non-sparse column?
        Type      Non-Sparse    Sparse           Null Estimate
        BIT       1/8th byte    4 1/8th bytes    98%
        BIGINT    8 bytes       12 bytes         52%
        ▪ See BOL: Using Sparse Columns for a complete table of types
    • Column Sets
      ◦ How do you know which columns ‘exist’ for a row?
      ◦ You could just SELECT them; those that don’t exist are NULL
      ◦ Can define a “Column set”
        ▪ Optional, only one per table
      ◦ Include a column:
        ▪ MyColSet XML COLUMN_SET FOR ALL_SPARSE_COLUMNS
      ◦ Selecting from MyColSet returns an XML description of the sparse columns in that row
        ▪ <c25>ABC</c25><c34>599</c34>
      ◦ Can INSERT / UPDATE sparse columns by
        ▪ Referring to them by name as usual, or
        ▪ Specifying the XML for the Column_Set column
        ▪ See BOL: Using Column Sets for more details
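The column-set behaviour above can be tried with a small sketch. The column names c25, c34 and MyColSet follow the slide; the table dbo.Products and its key column are assumptions.

```sql
-- Hypothetical table with two sparse columns and a column set over them.
CREATE TABLE dbo.Products (
    ProductID int NOT NULL PRIMARY KEY,
    c25       varchar(10) SPARSE NULL,
    c34       int         SPARSE NULL,
    MyColSet  xml COLUMN_SET FOR ALL_SPARSE_COLUMNS
);

-- Write sparse columns by name ...
INSERT INTO dbo.Products (ProductID, c25, c34)
VALUES (1, 'ABC', 599);

-- ... or by supplying XML for the column set.
INSERT INTO dbo.Products (ProductID, MyColSet)
VALUES (2, '<c25>XYZ</c25>');

-- The column set shows only the sparse columns that have a value in each row.
SELECT ProductID, MyColSet
FROM dbo.Products;

-- Sparse columns remain individually addressable by name.
SELECT ProductID, c25, c34
FROM dbo.Products;
```

Note that once a column set exists, SELECT * returns the column set in place of the individual sparse columns, so queries that need specific sparse columns should name them.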
    • Feature / Technology Support
      ◦ Sparse columns and column sets are not fully supported by some SQL Server technologies
      ◦ Sparse Columns not supported by:
        ▪ Merge Replication
      ◦ Column Sets not supported by:
        ▪ Replication, Distributed Query, Change Data Capture
        ▪ See BOL: Using Column Sets for more details
    • Meta Data for Sparse Columns
      ◦ sys.columns – is_sparse, is_column_set
        ▪ And in:
        ▪ sys.system_columns
        ▪ sys.all_columns
        ▪ sys.computed_columns
        ▪ sys.identity_columns
      ◦ Do not confuse with sparse files as used for Database Snapshots
        ▪ The is_sparse in sys.database_files, sys.master_files
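A short sketch using the sys.columns flags named above to list every sparse column and column set in the current database.

```sql
-- Sparse columns and column sets defined in the current database.
SELECT OBJECT_NAME(object_id) AS table_name,
       name                   AS column_name,
       is_sparse,
       is_column_set
FROM sys.columns
WHERE is_sparse = 1
   OR is_column_set = 1
ORDER BY table_name, column_id;
```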
    • Together
      ◦ Sparse Columns together with Filtered Index
      ◦ On Sparse column, filtered index with xx IS NOT NULL avoids indexing all the rows with no value
      ◦ Makes a lot of sense, and likely the driving force behind filtered indexes
      ◦ But not needed on every sparse column
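Combining the two features might look like the sketch below. The table dbo.CatalogItems and the index name are hypothetical; the IS NOT NULL filter is the form the slide describes.

```sql
-- Hypothetical table: BikeColor is sparse because few rows are bicycles.
CREATE TABLE dbo.CatalogItems (
    ItemID    int         NOT NULL PRIMARY KEY,
    ItemType  varchar(20) NOT NULL,
    BikeColor varchar(20) SPARSE NULL
);

-- Index only the rows where the sparse column actually has a value.
CREATE NONCLUSTERED INDEX IX_CatalogItems_BikeColor
    ON dbo.CatalogItems (BikeColor)
    WHERE BikeColor IS NOT NULL;

-- An equality predicate on BikeColor excludes NULLs,
-- so this query should be a candidate for the filtered index.
SELECT ItemID, BikeColor
FROM dbo.CatalogItems
WHERE BikeColor = 'Red';
```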
    • Separately
      ◦ Filtered Index without Sparse Column
        ▪ Filtered indexes on skewed data
        ▪ Filtered statistics on skewed data
      ◦ Sparse Column without Filtered Index
        ▪ Sparse columns on sparse data, perhaps no index to go with it
    • Summary
      ◦ Filtered Indexes
      ◦ Filtered Statistics
      ◦ Wide Tables
      ◦ Sparse Columns
      ◦ Together …
      ◦ … and Separately
      ◦ Don Vilen
        ▪ Chief Scientist, Buysight
        ▪ DVilen@buysight.com
    • To learn more or inquire about speaking opportunities, please contact: Mark Ginnebaugh, User Group Leader, mark@designmind.com