Introduction to SQL Server Indexing

  1. SQL Server Data Indexing
  2. Clustered Tables vs Heap Tables
     • Heap tables: a table that has no indexes, or only nonclustered indexes, is called a heap. An age-old question is whether a table must have a clustered index. The answer is no, but in most cases it is a good idea to have one so that the data is stored in a specific order.
     • Clustered tables: as the name suggests, these tables have a clustered index. Data is stored in a specific order based on the clustered index key.
  3. Clustered Tables vs Heap Tables: Heap
     • Data is not stored in any particular order.
     • Specific data cannot be retrieved quickly unless there are also nonclustered indexes.
     • Data pages are not linked, so sequential access needs to refer back to the index allocation map (IAM) pages.
     • Since there is no clustered index, no additional time is needed to maintain one.
     • Since there is no clustered index, no additional space is needed to store a clustered index tree.
     • These tables have an index_id value of 0 in the sys.indexes catalog view.
  4. Clustered Tables vs Heap Tables: Clustered Table
     • Data is stored in order based on the clustered index key.
     • Data can be retrieved quickly based on the clustered index key, if the query uses the indexed columns.
     • Data pages are linked for faster sequential access.
     • Additional time is needed to maintain the clustered index on INSERTs, UPDATEs and DELETEs.
     • Additional space is needed to store the clustered index tree.
     • These tables have an index_id value of 1 in the sys.indexes catalog view.
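
The index_id values on slides 3 and 4 can be checked directly. The query below is a minimal sketch that lists each user table in the current database as a heap or a clustered table; it relies only on the sys.indexes and sys.tables catalog views, and the column aliases are illustrative choices.

```sql
-- List every user table with its base storage structure.
-- index_id = 0 marks a heap; index_id = 1 marks a clustered index.
SELECT
    OBJECT_SCHEMA_NAME(i.object_id) AS schema_name,
    OBJECT_NAME(i.object_id)        AS table_name,
    i.index_id,
    i.type_desc                     -- 'HEAP' or 'CLUSTERED' for rowstore tables
FROM sys.indexes AS i
JOIN sys.tables  AS t
    ON t.object_id = i.object_id
WHERE i.index_id IN (0, 1)
ORDER BY schema_name, table_name;
```

Each table returns exactly one row, because a table is either a heap or has a single clustered index.
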
  5. Types of Indexes
     • Clustered index
     • Nonclustered index
     • Unique index
     • Filtered index
  6. Types of Indexes (continued)
     • Covering index
     • Columnstore index
     • Index with non-key (included) columns
     • Implied indexes, created by some constraints
       i. Primary key
       ii. Unique
  7. Types of Indexes (continued)
     • Full-text index: a special type of token-based functional index that is built and maintained by the Microsoft Full-Text Engine for SQL Server. It provides efficient support for sophisticated word searches in character string data.
     • Spatial index: provides the ability to perform certain operations more efficiently on spatial objects (spatial data) in a column of the geometry data type.
  8. SQL Server Index Basics
  9. Clustered Index
     • The top-most node of the tree is called the "root node".
     • The bottom level of nodes is called the "leaf nodes".
     • Any index level between the root node and the leaf nodes is called an "intermediate level".
     • In a clustered index, the leaf nodes contain the data pages of the table.
     • The root and intermediate nodes contain index pages holding index rows.
     • Each index row contains a key value and a pointer to either an intermediate-level page of the B-tree or a leaf-level page of the index.
     • The pages in each level of the index are linked in a doubly-linked list.
  10. Clustered Index
     [Figure: B-tree diagram with a root node above a combined "database and leaf node" level, using the sample names Abby through Dave.]
     • A clustered index sorts and stores the data rows of the table or view in order based on the clustered index key.
     • The clustered index is implemented as a B-tree index structure that supports fast retrieval of the rows, based on their clustered index key values.
     • The basic syntax to create a clustered index is:
       CREATE CLUSTERED INDEX Index_Name ON Schema.TableName(Column);
     • A clustered index stores the data for the table based on the columns defined in the CREATE INDEX statement. As such, only one clustered index can be defined per table, because the data can only be stored and sorted one way.
  11. Nonclustered Index
     • Figure topic: index leaf nodes and the corresponding table data.
     • Each index entry consists of the indexed columns (the key, column 2) and refers to the corresponding table row (via ROWID or RID).
     • When the table is a heap, its data, unlike the index, is not sorted at all.
     • There is neither a relationship between the rows stored in the same table block nor any connection between the blocks.
  12. Nonclustered Index
     [Figure: B-tree diagram with a root node and leaf nodes holding the sample names in sorted order, pointing to unsorted rows in the database.]
     • A nonclustered index can be defined on a table or view with a clustered index, or on a heap.
     • Each index row in the nonclustered index contains the nonclustered key value and a row locator.
     • The basic syntax for a nonclustered index is:
       CREATE INDEX Index_Name ON Schema.TableName(Column);
     • SQL Server supports up to 999 nonclustered indexes per table.
  13. CLUSTERED VS. NONCLUSTERED INDEXES
     • Clustered index: a SQL Server index that sorts and stores the data rows of a table based on key values.
     • Nonclustered index: a SQL Server index that contains a key value and a pointer to the data in the heap or clustered index.
     • The difference between clustered and nonclustered SQL Server indexes:
       – A clustered index controls the physical order of the data pages.
       – The data pages of a clustered index always include all the columns in the table, even if you create the index on only one column.
       – The column(s) you specify as key columns affect how the pages are stored in the B-tree index structure.
       – A nonclustered index does not affect the ordering and storing of the data.
  14. Clustered and Nonclustered Indexes Interact
     • Clustered index keys are always unique internally
       – If you don’t specify UNIQUE when creating the index, SQL Server may add a “uniquifier” to the index key
         • Only used when there actually is a duplicate
         • Adds 4 bytes to the key
     • The clustering key is used in nonclustered indexes
       – This allows SQL Server to go directly to the record from the nonclustered index
       – If there is no clustered index, a record identifier (RID) is used instead
     Leaf node of a clustered index on EmployeeID:
       1   Jones     John
       2   Smith     Mary
       3   Adams     Mark
       4   Douglas   Susan
     Leaf node of a nonclustered index on LastName:
       Adams     3
       Douglas   4
       Jones     1
       Smith     2
  15. Clustered and Nonclustered Indexes Interact (continued)
     • Another reason to keep the clustering key small!
     • Consider the following query:
       SELECT LastName, FirstName FROM Employee WHERE LastName = 'Douglas'
     • When SQL Server uses the nonclustered index, it
       – Traverses the nonclustered index until it finds the desired key
       – Picks up the associated clustering key
       – Traverses the clustered index to find the data
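
The interaction described on slides 14 and 15 can be reproduced with a small script. This is a sketch under assumptions: the table dbo.Employee, its column sizes, and the index names are hypothetical and chosen only to match the slide's figure.

```sql
-- Hypothetical table matching the figure on slide 14.
CREATE TABLE dbo.Employee
(
    EmployeeID int         NOT NULL,
    LastName   varchar(50) NOT NULL,
    FirstName  varchar(50) NOT NULL
);

-- Clustered index: the leaf level holds the full rows, ordered by EmployeeID.
CREATE UNIQUE CLUSTERED INDEX CIX_Employee_EmployeeID
    ON dbo.Employee (EmployeeID);

-- Nonclustered index: the leaf level holds LastName plus the clustering key.
CREATE NONCLUSTERED INDEX IX_Employee_LastName
    ON dbo.Employee (LastName);

INSERT INTO dbo.Employee (EmployeeID, LastName, FirstName)
VALUES (1, 'Jones', 'John'),
       (2, 'Smith', 'Mary'),
       (3, 'Adams', 'Mark'),
       (4, 'Douglas', 'Susan');

-- FirstName is not stored in IX_Employee_LastName, so if the optimizer
-- chooses that index it must follow the clustering key (EmployeeID)
-- into the clustered index to fetch FirstName.
SELECT LastName, FirstName
FROM dbo.Employee
WHERE LastName = 'Douglas';
```

On a table this small the optimizer may simply scan the clustered index; the seek-plus-lookup plan becomes likely only once the table is large enough for the nonclustered index to pay off.
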
  16. Deciding what indexes go where?
     • Indexes speed up access, but are costly to maintain
       – Almost every update to a table requires altering both the data pages and the affected indexes
         • All inserts and deletions affect all indexes
         • Many updates will affect nonclustered indexes
     • Sometimes less is more
       – Not creating an index may sometimes be best
     • Questions to ask: does the transaction code have a WHERE clause? Which columns does it use? Is a sort required?
  17. Deciding what indexes go where? (continued)
     • Selectivity
       – Indexes, particularly nonclustered indexes, are primarily beneficial where there is a reasonably high level of selectivity within the index
         • Selectivity is the percentage of values in the column that are unique
         • The higher the percentage of unique values, the higher the selectivity
       – If 80% of parts are either ‘red’ or ‘green’, an index on color is not very selective
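
Selectivity as defined on slide 17 can be measured before an index is created. The two queries below are a rough sketch; the table dbo.Parts and its Color column are hypothetical stand-ins for the slide's "parts" example.

```sql
-- Overall selectivity: distinct values divided by total rows.
-- A result near 1.0 means almost every value is unique.
SELECT
    COUNT(DISTINCT Color)                              AS distinct_values,
    COUNT(*)                                           AS total_rows,
    COUNT(DISTINCT Color) * 1.0 / NULLIF(COUNT(*), 0)  AS selectivity
FROM dbo.Parts;

-- Value distribution: shows skew such as "80% of parts are red or green".
SELECT
    Color,
    COUNT(*)                                  AS row_count,
    100.0 * COUNT(*) / SUM(COUNT(*)) OVER ()  AS pct_of_rows
FROM dbo.Parts
GROUP BY Color
ORDER BY row_count DESC;
```

The second query matters because a column can have many distinct values overall and still be dominated by one or two of them.
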
  18. Choosing Clustered Index
     • Only one per table! Choose wisely
     • By default, a primary key creates a clustered index
       – Do you really want your primary key to be the clustered index?
       – Option: CREATE TABLE myfooExample (column1 int IDENTITY PRIMARY KEY NONCLUSTERED, column2 … )
       – Changing the clustered index can be costly
         • How long will it take? Do I have enough space?
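
The option on slide 18 can be written out in full as follows. This is only a sketch: the table dbo.OrderExample, its columns, and the choice of OrderDate as the clustering key are assumptions made for illustration, not a recommendation from the slides.

```sql
-- Hypothetical table: the primary key is enforced by a NONCLUSTERED
-- unique index, leaving the single clustered index free for another column.
CREATE TABLE dbo.OrderExample
(
    OrderID    int IDENTITY(1,1) NOT NULL,
    OrderDate  date              NOT NULL,
    CustomerID int               NOT NULL,
    CONSTRAINT PK_OrderExample PRIMARY KEY NONCLUSTERED (OrderID)
);

-- Cluster on the column that range queries are expected to use.
CREATE CLUSTERED INDEX CIX_OrderExample_OrderDate
    ON dbo.OrderExample (OrderDate);
```

Because OrderDate is not unique, SQL Server adds a uniquifier to duplicate key values, which is part of the cost to weigh when moving the clustered index off the primary key.
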
  19. Clustered Indexes Pros & Cons
     • Pros
       – Clustered indexes are best for queries where the columns in question are frequently the subject of
         • A range query (e.g., BETWEEN)
         • GROUP BY with MAX, MIN, COUNT
       – The search can go straight to a particular point in the data and keep reading sequentially from there
       – Clustered indexes help with ORDER BY on the clustering key
  20. Clustered Indexes Pros & Cons
     • Cons: two situations
       – Don’t put the clustered index on a column just because it seems the thing to do (e.g., the primary key default)
       – Lots of inserts in non-sequential order
         • Constant page splits, in the data pages as well as the index pages
         • Choose a clustering key that will be inserted sequentially
         • Or perhaps don’t use a clustered index at all
  21. Limits to Indexes
     There are a few limits to indexes. These are limits, not goals. Every index you create takes up space in your database and must be modified when inserts, updates, and deletes are performed. This leads to CPU and disk overhead, so craft indexes carefully and test them thoroughly.
     • There can be only one clustered index per table.
     • SQL Server supports up to 999 nonclustered indexes per table.
     • An index key, clustered or nonclustered, can have a maximum of 16 columns and 900 bytes. (SQL Server 2016 and later raise this to 32 key columns, and to 1,700 bytes for nonclustered index keys.)
  22. PRIMARY KEY AS A CLUSTERED INDEX
     • Primary key: a constraint that enforces uniqueness in a table. The primary key columns cannot hold NULL values.
     • In SQL Server, when you create a primary key on a table, a unique clustered index is created to enforce the constraint, unless a clustered index already exists or you specify a nonclustered index.
     • However, there is no guarantee that this is the best choice of clustered index for that table.
     • Make sure you consider this carefully in your indexing strategy.
  23. Unique Index
     • An index that ensures the uniqueness of each value in the indexed column.
     • If the index is a composite, uniqueness is enforced across the columns as a whole, not on the individual columns.
       – For example, if you create a unique index on the FirstName and LastName columns of a table, the names together must be unique, but the individual names can be duplicated.
     • A unique index is automatically created when you define a primary key or unique constraint:
       – Primary key: SQL Server automatically creates a unique clustered index if a clustered index does not already exist on the table or view. You can override the default and define a unique nonclustered index on the primary key.
       – Unique: SQL Server automatically creates a unique nonclustered index. You can specify that a unique clustered index be created if a clustered index does not already exist on the table.
     • A unique index ensures that the index key contains no duplicate values. Both clustered and nonclustered indexes can be unique.
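
The composite-uniqueness rule on slide 23 is easiest to see with data. The script below is an illustrative sketch; dbo.Person and the index name are hypothetical.

```sql
-- Hypothetical table with a composite unique index on (FirstName, LastName).
CREATE TABLE dbo.Person
(
    PersonID  int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    FirstName varchar(50)       NOT NULL,
    LastName  varchar(50)       NOT NULL
);

CREATE UNIQUE NONCLUSTERED INDEX UX_Person_FirstName_LastName
    ON dbo.Person (FirstName, LastName);

-- Allowed: each individual name repeats, but every pair is distinct.
INSERT INTO dbo.Person (FirstName, LastName)
VALUES ('Ada', 'Smith'),
       ('Ada', 'Jones'),
       ('Bob', 'Smith');

-- Rejected if uncommented: the pair ('Ada', 'Smith') already exists,
-- so the insert fails with a duplicate-key error.
-- INSERT INTO dbo.Person (FirstName, LastName) VALUES ('Ada', 'Smith');
```
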
  24. Filtered index
     • An optimized nonclustered index, especially suited to cover queries that select from a well-defined subset of data.
     • SQL Server 2008 introduced filtered indexes: indexes defined with a WHERE clause.
     • Filtered indexes can provide the following advantages over full-table indexes:
       – Improved query performance and plan quality: a well-designed filtered index is smaller than a full-table nonclustered index and has filtered statistics.
       – Reduced index maintenance costs: an index is maintained only when data manipulation language (DML) statements affect the data in the index. A filtered index is smaller and is only maintained when the data it contains changes.
       – Reduced index storage costs: a filtered index can reduce disk storage for nonclustered indexes when a full-table index is not necessary.
  25. Filtered index Design Considerations
     • When a column has only a small number of values that are relevant to queries, you can create a filtered index on that subset of values. For example, when the values in a column are mostly NULL and the query selects only from the non-NULL values, you can create a filtered index for the non-NULL data rows. The resulting index will be smaller and cost less to maintain than a full-table nonclustered index defined on the same key columns.
     • When a table has heterogeneous data rows, you can create a filtered index for one or more categories of data. This can improve the performance of queries on these rows by narrowing the focus of a query to a specific area of the table. Again, the resulting index will be smaller and cost less to maintain than a full-table nonclustered index.
     • Example filtered index:
       CREATE NONCLUSTERED INDEX FIBillOfMaterialsWithEndDate ON Production.BillOfMaterials (ComponentID, StartDate) WHERE EndDate IS NOT NULL;
     • To ensure that a filtered index is used in a SQL query, add an index hint:
       SELECT ComponentID, StartDate FROM Production.BillOfMaterials WITH ( INDEX ( FIBillOfMaterialsWithEndDate ) ) WHERE EndDate IN ('20000825', '20000908', '20000918');
  26. Covering Indexes
     • When a nonclustered index includes all the data requested in a query (both the items in the SELECT list and the WHERE clause), it is called a covering index.
     • With a covering index, there is no need to access the actual data pages
       – Only the leaf nodes of the nonclustered index are accessed
       – For example, your query might retrieve the FirstName, LastName and DOB columns from a table, based on a value in the ContactID column. You can create a covering index keyed on ContactID that also includes all three columns.
     • Because the leaf node of a clustered index is the data itself, a clustered index covers all queries.
     Leaf node of a nonclustered index on LastName, FirstName, Birthdate:
       LastName   FirstName   Birthdate    EmployeeID
       Adams      Mark        1/14/1956    3
       Douglas    Susan       12/12/1947   4
       Jones      John        4/15/1967    1
       Smith      Mary        7/14/1970    2
     The last column is EmployeeID. Remember that the clustering key is always included in a nonclustered index.
  27. Non-Key Index Columns
     • SQL Server 2005 and later allow you to include columns in a nonclustered index that are not part of the key
       – Allows the index to cover more queries
       – Included columns appear only in the leaf level of the index
       – Up to 1,023 additional columns
       – Can include data types that cannot be key columns
         • Except the text, ntext, and image data types
     • Syntax:
       CREATE [ UNIQUE ] NONCLUSTERED INDEX index_name ON <object> ( column [ ASC | DESC ] [ ,...n ] ) [ INCLUDE ( column_name [ ,...n ] ) ]
     • Example:
       CREATE NONCLUSTERED INDEX NameRegion_IDX ON Employees(LastName) INCLUDE (Region)
  28. KEY VS. NONKEY COLUMNS
     • Key columns: the columns specified to create a clustered or nonclustered index.
     • Nonkey columns: columns added to the INCLUDE clause of a nonclustered index.
     • The basic syntax to create a nonclustered index with nonkey columns is:
       CREATE INDEX Index_Name ON Schema.TableName(Column) INCLUDE (ColumnA, ColumnB);
     • A column cannot be both a key and a nonkey column. It is either a key column or a nonkey, included column.
     • The difference lies in where the data about the column is stored in the B-tree. Clustered and nonclustered key columns are stored at every level of the index: they appear on the leaf and all intermediate levels. A nonkey column is stored only at the leaf level.
     • There are benefits to using nonkey columns:
       – Columns can be accessed with an index scan.
       – Data types not allowed in key columns are allowed in nonkey columns. All data types except text, ntext, and image are allowed.
       – Included columns do not count against the 900-byte index key limit enforced by SQL Server.
  29. COVERING INDEXES EXAMPLES
     • The query we want to use is:
       SELECT ProductID, Name, ProductNumber, Color FROM dbo.Products WHERE Color = 'Black';
     • The first index is nonclustered, with two key columns:
       CREATE INDEX IX_Products_Name_ProductNumber ON dbo.Products(Name, ProductNumber);
     • The second is also nonclustered, with two key columns and three nonkey columns:
       CREATE INDEX IX_Products_Name_ProductNumber_ColorClassStyle ON dbo.Products(Name, ProductNumber) INCLUDE (Color, Class, Style);
     • The first index is not a covering index for that query, because it does not contain Color. The second index is a covering index for that specific query, assuming ProductID is the clustering key and is therefore carried in every nonclustered index.
  30. Column Store Index Basics
     • There are two types of storage available in the database: RowStore and ColumnStore.
     • In RowStore, data rows are placed sequentially on a page, while in ColumnStore, values from a single column but from multiple rows are stored adjacently. A ColumnStore index works using ColumnStore storage.
     • In SQL Server 2012, where this feature was introduced, a table with a ColumnStore index becomes read-only: DML (INSERT, UPDATE, DELETE) operations cannot be performed on it. Later versions lift this restriction (updatable clustered columnstore indexes in SQL Server 2014, updatable nonclustered ones in SQL Server 2016).
     • The feature is therefore best suited to a data warehouse, where most operations are read-only.
  31. Creating Column Store Index
     • Creating a ColumnStore index is the same as creating a nonclustered index, except that we add the COLUMNSTORE keyword.
     • The syntax of a ColumnStore index is:
       CREATE NONCLUSTERED COLUMNSTORE INDEX Index_Name ON Table_Name (Column1, Column2, ... ColumnN)
     • Example, creating a nonclustered ColumnStore index on 3 columns:
       CREATE NONCLUSTERED COLUMNSTORE INDEX [ColumnStore__Test_Person] ON [dbo].[Test_Person] ([FirstName], [MiddleName], [LastName])
     • The query cost with the ColumnStore index was reported as roughly four times lower than with a traditional nonclustered index; the actual gain depends on the data and the query.
  32. Fill Factor
     • When you create an index, the fill factor option indicates how full the leaf-level pages are when the index is created or rebuilt.
     • Valid values are 0 to 100.
     • A fill factor of 0 (the default) is treated the same as 100: the leaf-level pages are filled to capacity.
     • If data is always inserted at the end of the table, the fill factor can be between 90 and 100 percent, since data will never be inserted into the middle of a page.
     • If data can be inserted anywhere in the table, a fill factor of 60 to 80 percent could be appropriate, depending on the INSERT, UPDATE and DELETE activity.
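
Fill factor is set with the FILLFACTOR index option at create or rebuild time. The script below is a minimal sketch; the table dbo.FillFactorDemo and the index name are hypothetical, and the values 80 and 90 simply follow the ranges suggested on the slide.

```sql
-- Hypothetical table where rows can land anywhere in LastName order.
CREATE TABLE dbo.FillFactorDemo
(
    DemoID   int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    LastName varchar(50)       NOT NULL
);

-- Leave 20% free space in each leaf page for future inserts.
CREATE NONCLUSTERED INDEX IX_FillFactorDemo_LastName
    ON dbo.FillFactorDemo (LastName)
    WITH (FILLFACTOR = 80);

-- The fill factor is applied again only when the index is rebuilt.
ALTER INDEX IX_FillFactorDemo_LastName ON dbo.FillFactorDemo
    REBUILD WITH (FILLFACTOR = 90);

-- Inspect the stored setting (0 means the server default is in effect).
SELECT name, fill_factor
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.FillFactorDemo');
```

The primary key index in this sketch keeps the default, which suits an IDENTITY key because new rows are always appended at the end.
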
  33. How SQL Server Indexes Work
  34. B-Tree Index Data Structure
     • SQL Server indexes are based on B-trees
       – Special records called nodes allow keyed access to data
       – Two kinds of nodes are special
         • Root
         • Leaf
     [Figure: B-tree with a root node, intermediate nodes, leaf nodes, and data pages, keyed on sample letters.]
     • If there are enough records, intermediate levels may be added as well.
     • Clustered index leaf-level pages contain the data in the table.
     • Nonclustered index leaf-level pages contain the key value and a pointer to the data row in the clustered index or heap.
  35. SQL Server B-Tree Rules
     • Root and intermediate nodes point only to other nodes.
     • Only leaf nodes point to data.
     • The number of nodes between the root and any leaf is the same for all leaves.
     • A B+tree node can hold more than one key. In fact, a node typically stores thousands of keys, so the branching factor of a B+tree is very large.
     • B-trees are always sorted.
     • The tree is maintained during insertion, deletion, and updating so that these rules are met
       – When records are inserted or updated, nodes may split
       – When records are deleted, nodes may be collapsed
     • B+trees have all the key values in their leaf nodes. All the leaf nodes of a B+tree are at the same height, which implies that every index lookup takes the same number of B+tree lookups to find a value.
     • Within a B+tree all leaf nodes are linked together in a linked list, left to right, and since the values at the leaf nodes are sorted, range lookups are very efficient.
  36. What Is a Node?
     • A page that contains key and pointer pairs
     [Figure: a node drawn as a row of key/pointer pairs.]
  37. Splitting a B-Tree Node
     [Figure: a three-level B-tree, Root (Level 0), Node (Level 1), Leaf (Level 2), above the database rows (DB), holding the sample names Abby through Dave.]
  38. Let’s Add Alice
     • Step 1: Split the leaf node
     [Figure: Alice is added to the database rows and to the leaf node that holds Ada, Alan, Amanda, Amy.]
  39. Adding Alice
     • Step 2: Split the next level up
     [Figure: the node above the split leaf is split in turn.]
  40. Adding Alice (continued)
     • Step 3: Split the root
     [Figure: the root node is split.]
  41. Adding Alice (continued)
     • When the root splits, the tree grows another level
     [Figure: the tree now has four levels: Root (Level 0), Node (Level 1), Node (Level 2), Leaf (Level 3), above the database rows (DB).]
  42. Page splits cause fragmentation
     • Two types of fragmentation
       – Data pages in a clustered table
       – Index pages in all indexes
     • Fragmentation happens because these pages must be kept in order
     • Data page fragmentation happens when a new record must be added to a page that is full
       – Consider an Employee table with a clustered index on LastName, FirstName
       – A new employee, Peter Dent, is hired
     [Figure: one extent with pages holding Adams, Carol; Ally, Kent; Baccus, Mary; David, Sue; Dulles, Kelly; Edom, Mike; Farly, Lee; Frank, Joe; Ollen, Carol; Oppus, Larry; ...]
  43. Data Page Fragmentation
     [Figure: after the split, Dent, Peter follows David, Sue in the original extent, while Dulles, Kelly and Edom, Mike have moved to a page in a second extent.]
  44. Index Fragmentation
     • Index page fragmentation occurs when a new key-pointer pair must be added to an index page that is full
       – Consider an Employee table with a nonclustered index on Social Security Number
       – Employee 048-12-9875 is added
     [Figure: one extent with index pages holding key-pointer pairs from 036-11-9987 through 125-87-8373.]
  45. Index Fragmentation (continued)
     [Figure: after the split, 048-12-9875 follows 046-11-9987 in the original extent, while 048-33-9874 and 052-87-8373 have moved to a page in a second extent.]
  46. How B+tree Indexes Impact Performance
  47. Why use B+tree?
     • B+trees are used for an obvious reason: speed.
     • Memory is limited, so not all of the data can reside in memory, and the majority of it has to be saved on disk.
     • Disk is a lot slower than memory because it has moving parts.
     • Without a tree structure for lookups, the DBMS would have to do a sequential scan of all the records to find a value.
     • With a data size of a billion rows, a sequential scan is clearly going to take very long.
     • With a B+tree, it is possible to store a billion key values (with pointers to a billion rows) at a height of 3, 4 or 5, so every key lookup takes 3, 4 or 5 disk accesses, which is a huge saving.
  48. How is B+tree structured?
     • B+trees are normally structured so that the size of a node matches the page size.
     • Why? Because whenever data is accessed on disk, a whole page is read rather than a few bits, because that is much cheaper.
     • As an example, consider InnoDB, whose page size is 16 KB, and suppose we have an index on an integer column of size 4 bytes. (SQL Server uses 8 KB pages, but the reasoning is the same.)
     • Ignoring pointer and header overhead, a node can then contain at most 16 * 1024 / 4 = 4096 keys and have at most 4097 children.
     • So for a B+tree of height 1, the root node has 4096 keys and the nodes at height 1 (the leaf nodes) hold 4096 * 4097 = 16,781,312 key values.
     • This shows the effectiveness of a B+tree index: more than 16 million key values can be stored in a B+tree of height 1, and every key value can be accessed in exactly 2 lookups.
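
The arithmetic on slide 48 can be rerun with other page or key sizes. The batch below is a small sketch of that calculation; the variable names are illustrative, and like the slide it ignores pointer and page-header overhead.

```sql
-- Fan-out arithmetic from slide 48 (overhead ignored, as on the slide).
DECLARE @page_bytes int = 16 * 1024;  -- 16 KB page, the slide's InnoDB example
DECLARE @key_bytes  int = 4;          -- 4-byte integer key

DECLARE @keys_per_node     int = @page_bytes / @key_bytes;  -- 4096
DECLARE @children_per_node int = @keys_per_node + 1;        -- 4097

SELECT
    @keys_per_node                      AS keys_per_node,
    @children_per_node                  AS children_per_node,
    @keys_per_node * @children_per_node AS leaf_key_values;  -- 16781312
```

Changing @key_bytes to a wider key shows how quickly the number of keys per node, and with it the capacity at a given height, drops.
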
  49. How important is the size of the index values?
     As the example above shows, the size of the index values plays a very important role, for the following reasons:
     • The longer the index key, the fewer values fit in a node, and hence the greater the height of the B+tree.
     • The greater the height of the tree, the more disk accesses are needed.
     • The more disk accesses, the lower the performance.
     • So the size of the index values has a direct bearing on performance!
  50. Index Design
     Index design should take into account a number of considerations.
     • For tables that are heavily updated, use as few columns as possible in the index, and don’t over-index the tables.
     • If a table contains a lot of data but data modifications are low, use as many indexes as necessary to improve query performance.
     • For clustered indexes, try to keep the length of the indexed columns as short as possible. Ideally, implement your clustered indexes on unique columns that do not permit null values.
     • The uniqueness of values in a column affects index performance. In general, the more duplicate values you have in a column, the more poorly the index performs.
  51. Index Design (continued)
     • Indexes are automatically updated when the data rows themselves are updated, which can lead to additional overhead and can affect performance.
     • Because the clustered index determines how the data is stored and sorted, choose its key column carefully.
     • The number of columns in a clustered (or nonclustered) index can have significant performance implications with heavy INSERT, UPDATE and DELETE activity in your database.
     • For composite indexes, take into consideration the order of the columns in the index definition. Columns that will be used in comparison expressions in the WHERE clause (such as WHERE FirstName = 'Charlie') should be listed first.
     • You can also index computed columns if they meet certain requirements. For example, the expression used to generate the values must be deterministic (which means it always returns the same result for a specified set of inputs).
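
The last two points on slide 51 can be illustrated together. The script is a sketch under assumptions: dbo.OrderLineDemo, its columns, and the index names are hypothetical, and the choice of CustomerID as the leading column assumes queries filter on it with an equality predicate.

```sql
-- Hypothetical table with a deterministic computed column.
CREATE TABLE dbo.OrderLineDemo
(
    OrderLineID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    CustomerID  int               NOT NULL,
    OrderDate   date              NOT NULL,
    Quantity    int               NOT NULL,
    UnitPrice   decimal(10, 2)    NOT NULL,
    -- Always yields the same result for the same inputs, so it can be indexed.
    LineTotal AS (Quantity * UnitPrice) PERSISTED
);

-- Composite index: the column compared in the WHERE clause comes first,
-- e.g. WHERE CustomerID = 42 AND OrderDate >= '20240101'.
CREATE NONCLUSTERED INDEX IX_OrderLineDemo_Customer_Date
    ON dbo.OrderLineDemo (CustomerID, OrderDate);

-- Index on the computed column.
CREATE NONCLUSTERED INDEX IX_OrderLineDemo_LineTotal
    ON dbo.OrderLineDemo (LineTotal);
```

An expression that calls a non-deterministic function such as GETDATE() could not be indexed this way.
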
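  52. Identifying Fragmentation vs. Page Splits
     • DBCC SHOWCONTIG reports fragmentation for a table or index. It is deprecated in current SQL Server versions; the sys.dm_db_index_physical_stats dynamic management function is its replacement.
     Page 283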
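
On versions where DBCC SHOWCONTIG is deprecated, the same information comes from sys.dm_db_index_physical_stats. The query below is a minimal sketch that reports fragmentation for every index in the current database; the 'LIMITED' scan mode is chosen here only because it is the cheapest.

```sql
-- Fragmentation per index in the current database.
-- Arguments: database_id, object_id, index_id, partition_number, mode.
SELECT
    OBJECT_NAME(ips.object_id)        AS table_name,
    i.name                            AS index_name,   -- NULL for a heap
    ips.index_type_desc,
    ips.avg_fragmentation_in_percent,
    ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
    ON  i.object_id = ips.object_id
    AND i.index_id  = ips.index_id
ORDER BY ips.avg_fragmentation_in_percent DESC;
```

Very small indexes often show high fragmentation percentages that do not matter in practice, so page_count is worth reading alongside the percentage.
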
  53. Resolving Fragmentation
     Heap tables:
     • For heap tables this is not as easy. The options for resolving the fragmentation are:
       – Create a clustered index
       – Create a new table and insert data from the heap table into the new table based on some sort order
       – Export the data, truncate the table and import the data back into the table
     Clustered tables:
     • Resolving the fragmentation for a clustered table can be done easily by rebuilding or reorganizing your clustered index. This was shown in a previous tip: SQL Server 2000 to 2005 Crosswalk - Index Rebuilds.
     • Commands:
       DBCC DBREINDEX
       DBCC INDEXDEFRAG ( { database_name | database_id | 0 } , { table_name | table_id } , { index_name | index_id } )
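
In SQL Server 2005 and later, the two DBCC commands on slide 53 correspond to ALTER INDEX ... REBUILD and ALTER INDEX ... REORGANIZE. The script below is a sketch; the table dbo.DefragDemo and the index name are hypothetical and exist only so the statements have something to act on.

```sql
-- Hypothetical table and index to act on.
CREATE TABLE dbo.DefragDemo
(
    DemoID   int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    LastName varchar(50)       NOT NULL
);

CREATE NONCLUSTERED INDEX IX_DefragDemo_LastName
    ON dbo.DefragDemo (LastName);

-- Counterpart of DBCC INDEXDEFRAG: reorders leaf pages in place.
ALTER INDEX IX_DefragDemo_LastName ON dbo.DefragDemo REORGANIZE;

-- Counterpart of DBCC DBREINDEX: rebuilds every index on the table,
-- including the clustered index created by the primary key.
ALTER INDEX ALL ON dbo.DefragDemo REBUILD;
```

Reorganizing is the lighter operation, while a rebuild creates the index afresh and is the point at which a fill factor is applied.
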
  54. Mahabubur Rahaman, Senior Database Architect, Orion Informatics Ltd