3. The fundamental business problem was performance and data availability
SQL Server does not need to have indexes on a table to retrieve data.
A table can simply be scanned to find the piece of data that is requested.
However, the amount of time to find a piece of data is directly proportional
to the amount of data in the table.
Indexes are designed to improve the performance of data retrieval
operations.
An index is useful only if it can provide a means to find data very quickly
regardless of the volume of data that is stored.
You can define an index by using one or more columns in the table, called
the index key.
You can define an index with a maximum of 16 columns.
The maximum size of the index key is 900 bytes.
4. The structure that SQL Server uses to build and maintain indexes is called a
balanced tree (B-tree)
5. A balanced tree (B-tree) is balanced in depth: every leaf page is the
same number of levels below the root, so every lookup reads the same
number of pages.
When a query is issued against an indexed column, the query
engine starts at the root node to determine which page to reference
in the top intermediate level.
It then navigates down through the intermediate nodes, with each layer
of the intermediate level more granular than the one above, until it
reaches the leaf node.
The leaf node contains either the entire row of data or a pointer to
that row, depending on whether the index is clustered or nonclustered.
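The descent from root to leaf described above can be sketched in Python. This is a toy model, not SQL Server's on-disk format: the Page class, the three-level layout, and the key values are all assumptions made for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass, field

@dataclass
class Page:
    # Hypothetical in-memory stand-in for an index page.
    keys: list                                    # separator keys (non-leaf) or stored keys (leaf)
    children: list = field(default_factory=list)  # empty for leaf pages

def seek(root, key):
    """Walk root -> intermediate -> leaf; return (found, pages_read)."""
    page, pages_read = root, 1
    while page.children:
        # Keys below keys[0] go to child 0; keys >= keys[i] go to child i + 1.
        page = page.children[bisect_right(page.keys, key)]
        pages_read += 1
    return key in page.keys, pages_read

leaves = [Page([1, 2]), Page([3, 4]), Page([5, 6]), Page([7, 8])]
intermediate = [Page([3], leaves[:2]), Page([7], leaves[2:])]
root = Page([5], intermediate)

print(seek(root, 6))  # (True, 3)
print(seek(root, 9))  # (False, 3)
```

Every lookup reads three pages here, whether or not the key exists, because all leaf pages sit at the same depth.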
6. Page Splits:
A page is 8 KB of storage that holds either index rows or data rows.
If the length of a row increases (a longer data value), SQL Server moves the
other rows in the page to accommodate the change.
If the page is too small to hold all of these rows, SQL Server allocates
a new page and moves roughly half of the rows from the full page onto
it. This is called a page split.
A page split also occurs when there is not enough free space left on a
page to perform an INSERT.
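A minimal Python sketch of the split behaviour described above. It assumes a made-up capacity of four rows per page and models pages as sorted lists; real pages are measured in bytes, not row counts.

```python
from bisect import bisect_right, insort

PAGE_CAPACITY = 4  # hypothetical rows per page; real capacity depends on row size

def insert(pages, key):
    """Insert key into a sorted list of sorted pages; return True if a page split occurred."""
    # Pick the last page whose first key is <= key (or the first page).
    firsts = [p[0] for p in pages]
    i = max(bisect_right(firsts, key) - 1, 0)
    page = pages[i]
    if len(page) < PAGE_CAPACITY:
        insort(page, key)
        return False
    # Page is full: move the upper half of its rows to a new page, then insert.
    mid = len(page) // 2
    new_page = page[mid:]
    del page[mid:]
    pages.insert(i + 1, new_page)
    insort(new_page if key >= new_page[0] else page, key)
    return True

pages = [[10, 20, 30, 40], [50, 60]]
print(insert(pages, 55), pages)  # False [[10, 20, 30, 40], [50, 55, 60]]
print(insert(pages, 25), pages)  # True [[10, 20, 25], [30, 40], [50, 55, 60]]
```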
Index Entry Storage
Pages in SQL Server can store up to 8,060 bytes of data. The number of
values that fit on an index page is based on the number of bytes required
to store an index key, which is determined by the data type.
An index created on a column with an INT data type can store about 2,015
(8,060 / 4 bytes) values on a single page within the index, whereas an
index based on a column with a datetime2 data type (8 bytes) can store only
about half as many, or 1,007 values per page.
The amount of time needed to locate data also depends upon writing efficient
queries.
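The per-page figures quoted above follow from a single integer division. The sketch below only reproduces that arithmetic; it is an upper bound that ignores per-row overhead and row locators, and the function name is hypothetical.

```python
PAGE_DATA_BYTES = 8060  # usable bytes per page, as quoted above

def keys_per_page(key_bytes):
    # Upper bound only: ignores per-row overhead and the row locator.
    return PAGE_DATA_BYTES // key_bytes

print(keys_per_page(4))  # INT, 4 bytes -> 2015
print(keys_per_page(8))  # datetime2 at 8 bytes -> 1007
```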
7. Page Splits (continued)
The factors that affect the number of page splits are:
1. Number of users
2. Level of user activity
3. The frequency of the rebuild of indexes
4. The primary key being a clustered index
5. The performance of your I/O subsystem
6. Read or write operations
7. The fill-factor used in table indexes
Solutions to reduce the number of page splits:
1. Lower the fill factor on your indexes so that more free space is left on each page.
2. Rebuild your indexes more often.
3. Add clustered indexes to your monotonically increasing primary keys.
4. Get a faster I/O subsystem.
8. Clustered Indexes
The column(s) defined for the clustered index are referred to as the
clustering key
A clustered index stores the actual data rows at the leaf level of the
index.
A clustered index is special because it causes SQL Server to arrange
the data in the table according to the clustering key
Because a table cannot be sorted more than one way (ascending or
descending) you can define ONLY one clustered index on a table.
Data in a table is sorted only if a clustered index has been defined on
a table.
A table that has a clustered index is referred to as a clustered table.
A table that has no clustered index is referred to as a heap. The pages
of a heap are not stored in any sorted order.
Index type
9. Clustered Indexes
A clustered index does not physically store the data on disk in a
sorted order.
A clustered index ensures that the page chain of the index is sorted
logically.
The general syntax for creating a relational index is as follows:
CREATE [ UNIQUE ] [ CLUSTERED | NONCLUSTERED ] INDEX index_name
ON <object> ( column [ ASC | DESC ] [ ,...n ] )
[ INCLUDE ( column_name [ ,...n ] ) ]
[ WHERE <filter_predicate> ]
[ WITH ( <relational_index_option> [ ,...n ] ) ]
[ ON { partition_scheme_name ( column_name ) | filegroup_name | default } ]
[ FILESTREAM_ON { filestream_filegroup_name | partition_scheme_name | "NULL" } ]
[ ; ]
10. Non Clustered Indexes
A Non Clustered index can be defined on a table or view with a clustered
index or on a heap.
Each index row in the Non Clustered index contains the Non Clustered
key value and a row locator.
This locator points to the data row in the clustered index or heap having
the key value.
The rows in the index are stored in the order of the index key values, but
the data rows are not guaranteed to be in any particular order.
A table is limited to a maximum of 999 Non Clustered indexes.
The leaf level of a Non Clustered index contains a pointer to the data you
require.
If a clustered index exists on the table, the leaf level of the NonClustered
index points at the clustering key.
If a clustered index does not exist on the table, the leaf level of the
NonClustered index points at the row of data in the table.
Both CLUSTERED and NONCLUSTERED indexes can be designated as
UNIQUE.
11. Covering Indexes
When an index is built, every value in the index key is loaded into the
index. In effect, each index is a mini-table containing all the values
corresponding to just the columns in the index key.
It is possible for a query to be entirely satisfied by using the data in the
index.
An index that is constructed such that SQL Server can completely
satisfy queries by reading only the index is called a covering index.
Included Columns
Indexes can be created using the optional INCLUDE clause.
Included columns become part of the index at only the leaf level.
Values from included columns do not appear in the root or
intermediate levels of an index and do not count against the 900-
byte limit for an index.
This way you can construct covering indexes that can have
more than 16 columns and 900 bytes by using the INCLUDE
clause.
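The difference between a covered query and one that must follow the row locator can be sketched with a toy Python model. The table, the column names, and the single included column are all hypothetical; the lookups counter stands in for trips from the nonclustered index back to the clustered index.

```python
# Toy model: a clustered table keyed by EmployeeID and a nonclustered index on LastName.
clustered = {  # clustering key -> full data row
    1: {"LastName": "Marx", "FirstName": "Groucho", "City": "New York"},
    2: {"LastName": "Marx", "FirstName": "Harpo", "City": "New York"},
    3: {"LastName": "Smith", "FirstName": "Ann", "City": "Boston"},
}
# Leaf entries: (index key, clustering key as row locator, included columns).
ix_lastname = sorted(
    (row["LastName"], emp_id, {"FirstName": row["FirstName"]})
    for emp_id, row in clustered.items()
)

def query(last_name, columns):
    """Return (rows, lookups); lookups counts trips back to the clustered index."""
    rows, lookups = [], 0
    for key, emp_id, included in ix_lastname:
        if key != last_name:
            continue
        available = {"LastName": key, "EmployeeID": emp_id, **included}
        if not set(columns).issubset(available):
            # The index does not cover the query: follow the row locator.
            available = {"EmployeeID": emp_id, **clustered[emp_id]}
            lookups += 1
        rows.append(tuple(available[c] for c in columns))
    return rows, lookups

print(query("Marx", ["LastName", "FirstName"]))  # covered: 0 lookups
print(query("Marx", ["FirstName", "City"]))      # not covered: 2 lookups
```

Adding City to the included columns in this model would make the second query covered as well, which is the effect the INCLUDE clause is meant to achieve.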
12. Ways to create statistics in SQL Server 2008:
The optimizer automatically creates single-column statistics as needed
as a side effect of optimizing SELECT, INSERT, UPDATE, DELETE,
and MERGE statements if AUTO_CREATE_STATISTICS is enabled,
which is the default setting.
Note: the optimizer only creates nonfiltered statistics in these cases.
When an index is created, SQL Server generates a structure called a
histogram that stores information about the relative distribution of data
values within a column.
The degree to which values in the column allow you to locate small sets of
data is referred to as the selectivity of the index.
As the number of unique values within a column increases, the selectivity
of an index increases.
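One simple way to put a number on selectivity is the fraction of distinct values in the column. The sketch below assumes that definition and uses made-up column data; it is not how SQL Server builds its histogram.

```python
def selectivity(values):
    """Fraction of distinct values: 1.0 means every value is unique."""
    return len(set(values)) / len(values)

gender = ["M", "F", "M", "F", "M", "F", "M", "F"]
employee_id = [101, 102, 103, 104, 105, 106, 107, 108]

print(selectivity(gender))       # 0.25
print(selectivity(employee_id))  # 1.0
```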
13. (continued)
There are several ways to create statistics or indexes. Ultimately, though,
each issues one of two commands: CREATE STATISTICS or CREATE INDEX.
Use DBCC DBREINDEX to rebuild one or more indexes for a table in the
specified database.
DBCC DBREINDEX (<table name>, <index name>, <fill factor>);
A single index on table Employee:
DBCC DBREINDEX ('Employee', 'PK_EmployeeID', 80);
All indexes on table Employee:
DBCC DBREINDEX ('Employee', '', 80);
Use sys.sp_createstats to create statistics for all eligible columns (all
except XML columns) for all user tables in the current database. A
new statistics object will not be created for columns that already have
a statistics object.
In SQL Server Management Studio, expand the folder under a Table
object, right click the Statistics folder, and choose New Statistics.
Use the Database Engine Tuning Advisor to create indexes.
14. CREATE STATISTICS
CREATE STATISTICS FirstLast2 ON Person.Contact (FirstName, LastName)
WITH SAMPLE 50 PERCENT
The auto update statistics feature may be turned off at different
levels:
On the database level, disable auto update statistics by using command
ALTER DATABASE dbname SET AUTO_UPDATE_STATISTICS OFF
At the table level, disable auto update statistics using the NORECOMPUTE
option of the UPDATE STATISTICS command or CREATE STATISTICS
command.
Use sp_autostats to display and change the auto update statistics setting for
a table, index, or statistics object.
Re-enabling the automatic updating of statistics can be done similarly using
ALTER DATABASE, UPDATE STATISTICS, or sp_autostats.
15. The FILLFACTOR option for an index determines how full each leaf-level
page is made, and therefore how much free space is reserved on it, when an
index is created or rebuilt.
The free space reserved leaves room on the page for additional values to
be added, thereby reducing the rate at which page splits occur.
By leaving space on the leaf level, you can write a small number of rows to
a leaf-level page before a page split is required, thereby slowing the rate of
fragmentation for an index.
The FILLFACTOR is represented as a percentage full.
For example, FILLFACTOR = 75 means that 25% of the space on each
leaf-level page is left empty to accommodate future values.
FILLFACTOR applies to:
The leaf level of the index.
The intermediate-level page(s) and the root page of an index, but only
when the PAD_INDEX option is also specified.
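The effect of FILLFACTOR on page splits can be illustrated with a small Python simulation. It assumes a made-up capacity of four rows per leaf page and a handful of hypothetical keys; only the leaf level is modelled.

```python
from bisect import bisect_right, insort

PAGE_CAPACITY = 4  # hypothetical rows per leaf page

def build_leaf_level(sorted_keys, fillfactor):
    """Pack keys into pages, filling each page to fillfactor percent of capacity."""
    per_page = max(1, PAGE_CAPACITY * fillfactor // 100)
    return [sorted_keys[i:i + per_page] for i in range(0, len(sorted_keys), per_page)]

def count_splits(pages, new_keys):
    """Insert new_keys one by one and count how many page splits they cause."""
    splits = 0
    for key in new_keys:
        i = max(bisect_right([p[0] for p in pages], key) - 1, 0)
        page = pages[i]
        if len(page) >= PAGE_CAPACITY:
            # Full page: move the upper half to a new page.
            mid = len(page) // 2
            pages.insert(i + 1, page[mid:])
            del page[mid:]
            splits += 1
            if key >= pages[i + 1][0]:
                page = pages[i + 1]
        insort(page, key)
    return splits

existing = list(range(0, 120, 10))  # 12 keys: 0, 10, ..., 110
incoming = [5, 45, 85]              # three later inserts

print(count_splits(build_leaf_level(existing, 100), incoming))  # 3
print(count_splits(build_leaf_level(existing, 75), incoming))   # 0
```

With every page built full, each of the three inserts forces a split; with one slot in four left free, the same inserts fit without any split, at the cost of one extra page up front.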
16. During the creation of an index, all the data values for the index key are
read. Then SQL Server creates a series of internal work tables to sort the
values prior to building the B-tree structure.
By default, the work tables are created in the same database as the index.
If you do not want to consume space in the database where the index is
created, you can specify the SORT_IN_TEMPDB option, which causes the
work tables for sort operations to be generated in the tempdb database.
The IGNORE_DUP_KEY option specifies the error response when an insert
operation attempts to insert duplicate key values into a unique index. It
applies only to insert operations after the index is created or rebuilt. The
default is OFF.
CREATE UNIQUE INDEX UI_marxBrothers ON marxBrothers (name)
WITH (IGNORE_DUP_KEY = ON);
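The two responses can be modelled in Python, with a set standing in for the unique index. This is a sketch of the documented behaviour, not of SQL Server internals: with the option OFF the whole multi-row insert fails, and with it ON only the duplicate rows are discarded. The function name and the sample names are hypothetical.

```python
def insert_rows(index_keys, new_keys, ignore_dup_key=False):
    """Model one multi-row INSERT into a unique index (a Python set here)."""
    staged = set(index_keys)
    skipped = []
    for key in new_keys:
        if key in staged:
            if not ignore_dup_key:
                # OFF: the whole statement fails and nothing is inserted.
                raise ValueError(f"duplicate key: {key!r}")
            skipped.append(key)  # ON: warn and drop only the duplicate row
            continue
        staged.add(key)
    index_keys |= staged  # commit the surviving rows in place
    return skipped

names = {"Groucho", "Harpo"}
print(insert_rows(names, ["Chico", "Harpo", "Zeppo"], ignore_dup_key=True))  # ['Harpo']
print(sorted(names))  # ['Chico', 'Groucho', 'Harpo', 'Zeppo']
```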
17. ALTER INDEX { index_name | ALL }
ON <object>
{ REBUILD
[ [ WITH ( <rebuild_index_option> [ ,...n ] ) ]
| [ PARTITION = partition_number
[ WITH ( <single_partition_rebuild_index_option> [ ,...n ] ) ] ] ]
| DISABLE
| REORGANIZE
[ PARTITION = partition_number ]
[ WITH ( LOB_COMPACTION = { ON | OFF } ) ]
| SET ( <set_index_option> [ ,...n ] ) } [ ; ]
When you defragment an index, you can use either the REBUILD or
REORGANIZE option.
18. The REBUILD option rebuilds all levels of the index and leaves all
pages filled according to the FILLFACTOR setting of an index.
The rebuild of an index effectively re-creates the entire B-tree
structure, so unless you specify the ONLINE option, a shared table
lock is acquired, preventing any changes until the rebuild
operation completes.
The REORGANIZE option removes fragmentation only at the leaf
level.
Intermediate-level pages and the root page are not defragmented
during a reorganize.
REORGANIZE is always an online operation that does not incur
any long-term blocking.
19. An index can be disabled by using the ALTER INDEX statement as
follows:
ALTER INDEX { index_name | ALL }
ON <object>
DISABLE [ ; ]
When an index is disabled, the definition remains in the system
catalog but is no longer used. SQL Server does not maintain the
index as data in the table changes, and the index cannot be used
to satisfy queries.
If a clustered index is disabled, the entire table becomes
inaccessible.
To enable an index, it must be rebuilt to regenerate and populate
the B-tree structure.
ALTER INDEX { index_name | ALL }
ON <object>
REBUILD [ ; ]
20. Full text indexes can be created against CHAR / VARCHAR, XML, and
VARBINARY columns.
When you full text index a VARBINARY column, you must specify the filter
to be used by the word breaker to interpret the document content.
Thesaurus files allow you to specify a list of synonyms or word
replacements for search terms.
Stop lists exclude a list of words from search arguments and a full text
index.
Only one FullText index is allowed per table or indexed view.
21. The first step in building a full text index is to create a storage structure.
Unlike relational indexes, full text indexes have a unique internal structure
that is maintained within a separate storage format called a full text
catalog.
Each full text catalog contains one or more full text indexes.
The generic syntax for creating a full text catalog is :
CREATE FULLTEXT CATALOG catalog_name
[ ON FILEGROUP filegroup ]
[ IN PATH 'rootpath' ]
[ WITH <catalog_option> ]
[ AS DEFAULT ]
[ AUTHORIZATION owner_name ]
< catalog_option> ::=
ACCENT_SENSITIVITY = { ON|OFF }
22. FREETEXT( ) :
Is a predicate used to search columns containing character-based data
types.
It matches on the meaning of the words in the search condition rather
than on the exact wording.
When FREETEXT is used, the full-text query engine internally
performs the following actions on the freetext_string, assigns each
term a weight, and then finds the matches.
• Separates the string into individual words based on word
boundaries (word-breaking).
• Generates inflectional forms of the words (stemming).
• Identifies a list of expansions or replacements for the terms
based on matches in the thesaurus.
SELECT BusinessEntityID, JobTitle
FROM Employee
WHERE FREETEXT(*, 'Marketing Assistant');
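The three actions listed above (word-breaking, stemming, thesaurus expansion) can be sketched as a toy Python pipeline. The two lookup tables are hypothetical stand-ins for the language-specific stemmer and thesaurus file, and weighting is left out entirely.

```python
import re

# Toy stand-ins: real word breakers, stemmers and thesaurus files are language-specific.
INFLECTIONS = {"drive": ["drive", "drives", "drove", "driven", "driving"]}
THESAURUS = {"bike": ["bicycle"]}

def expand_freetext(freetext_string):
    """Return {term: set of forms to match} for each word in the search string."""
    words = re.findall(r"\w+", freetext_string.lower())  # word-breaking
    expanded = {}
    for word in words:
        forms = set(INFLECTIONS.get(word, [word]))        # inflectional forms
        forms.update(THESAURUS.get(word, []))             # thesaurus expansion
        expanded[word] = forms
    return expanded

for term, forms in expand_freetext("Drive bike").items():
    print(term, sorted(forms))
```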
23. CONTAINS ( ) :
Is similar to FREETEXT, except that it takes a single keyword to match
against the records. To combine other words in the search, you must join
them with AND or OR; otherwise the query throws an error.
SELECT EntityID,JobTitle
FROM Employee
WHERE CONTAINS(JobTitle, 'Marketing OR Assistant');
SELECT EntityID,JobTitle
FROM Employee
WHERE CONTAINS(JobTitle, 'Marketing AND Assistant');
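The difference between the OR and AND forms above comes down to set union versus set intersection over the rows that contain each word. A minimal Python sketch, using a hypothetical handful of job titles and a hand-built inverted index:

```python
# Hypothetical job titles keyed by EntityID.
job_titles = {
    1: "Marketing Assistant",
    2: "Marketing Manager",
    3: "Research Assistant",
    4: "Design Engineer",
}

# Inverted index: word -> set of row ids containing it.
inverted = {}
for entity_id, title in job_titles.items():
    for word in title.lower().split():
        inverted.setdefault(word, set()).add(entity_id)

def rows_with(word):
    return inverted.get(word.lower(), set())

print(sorted(rows_with("Marketing") | rows_with("Assistant")))  # OR  -> [1, 2, 3]
print(sorted(rows_with("Marketing") & rows_with("Assistant")))  # AND -> [1]
```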
24. SELECT ProductDescriptionID, Description FROM Production
WHERE FREETEXT(Description, N'bike')
GO
All search terms used with full text are Unicode strings. If you
pass in a non-Unicode string, the query still works, but it is
much less efficient because the optimizer cannot use
parameter sniffing to evaluate distribution statistics on the full
text index.
Make certain that all terms you pass in for full text search are
always typed as Unicode for maximum performance.
25. The CHANGE_TRACKING option for a full text index determines
how SQL Server maintains the index when the underlying data
changes.
When set to AUTO, SQL Server automatically updates the full text
index as the data is modified.
When set to MANUAL, you are responsible for periodically
propagating the changes into the full text index.
SQL Server uses stemmers to allow a full text index to search on
all inflectional forms of a search term, such as drive, drove, driven,
and driving.
Stemming is language-specific. Although you could employ a
German word breaker to tokenize English, the German stemmer cannot
process English.
26. A thesaurus file exists for each supported language.
All thesaurus files are XML files stored in the FTDATA directory
underneath your default SQL Server installation path.
The thesaurus files are not populated, so to perform synonym
searches, you need to populate the thesaurus files.
Stop lists are used to exclude words that you do not want
included in a full text index.
CREATE FULLTEXT STOPLIST ProductStopList;
GO
ALTER FULLTEXT STOPLIST ProductStopList ADD 'bike'
LANGUAGE 1033;
GO
ALTER FULLTEXT INDEX ON ProductStop SET STOPLIST
ProductStopList;
GO
27. A stopword can be a word with meaning in a specific language,
or it can be a token that does not have linguistic meaning. For
example, in the English language, words such as "a," "an," "is,"
and "the" are left out of the full-text index since they are known
to be useless to a search.
Consider the phrase, "Instructions are applicable to these
Adventure Works Cycles models". The following table depicts
the position of the words in the phrase:
Word          Position
Instructions  1
are           2
applicable    3
to            4
these         5
Adventure     6
Works         7
Cycles        8
models        9
The stopwords "are", "to", and "these" that are in positions 2,
4, and 5 are left out of the full-text index. However, their
positional information is maintained, thereby leaving
the position of the other words in the phrase unaffected.
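The position-preserving behaviour can be reproduced in a few lines of Python: number every word first, then drop the stopwords. The stop list below contains only the three stopwords named in the example, and splitting on whitespace is a simplification of real word-breaking.

```python
STOPWORDS = {"are", "to", "these"}  # the stopwords named in the example

phrase = "Instructions are applicable to these Adventure Works Cycles models"

# Number every word first, then drop stopwords so the survivors keep their positions.
indexed = [
    (position, word)
    for position, word in enumerate(phrase.split(), start=1)
    if word.lower() not in STOPWORDS
]
print(indexed)
# [(1, 'Instructions'), (3, 'applicable'), (6, 'Adventure'), (7, 'Works'), (8, 'Cycles'), (9, 'models')]
```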
28. Review
1 - You are the database administrator at your company. You need to enable
the sales support team to perform fuzzy searches on product
descriptions. Which actions do you need to perform to satisfy user needs
with the least amount of effort? (Choose two. Each forms part of the
correct answer.)
A. Create a full text catalog specifying the filegroup for backup
purposes and the root path to store the contents of the catalog on
the file system.
B. Create a full text catalog and specify the filegroup to store the
contents of the catalog.
C. Create a full text index on the table of product descriptions for the
description column and specify NO POPULATION.
D. Create a full text index on the table of product descriptions for the
description column and specify CHANGE_TRACKING AUTO.
29. Review (continued)
2 - You want to configure your full text indexes such that SQL Server
migrates changes into the index as quickly as possible with the minimum
amount of administrator effort. Which command should you execute?
A. ALTER FULLTEXT INDEX ON <table_name> START FULL
POPULATION .
B. ALTER FULLTEXT INDEX ON <table_name> START
INCREMENTAL POPULATION .
C. ALTER FULLTEXT INDEX ON <table_name> SET
CHANGE_TRACKING AUTO .
D. ALTER FULLTEXT INDEX ON <table_name> START UPDATE
POPULATION .
30. Review (continued)
3 - You want to search for two terms based on proximity within a row. Which
full text predicates can be used to perform proximity searches? (Choose
two. Each forms a separate answer.)
A. CONTAINS
B. FREETEXT
C. CONTAINSTABLE
D. FREETEXTTABLE
4 - You want to perform a proximity search based on a weighting value for
the search arguments. Which options for the CONTAINSTABLE
predicate should you use?
A. FORMSOF with the THESAURUS keyword
B. FORMSOF with the INFLECTIONAL keyword
C. ISABOUT
D. ISABOUT with the WEIGHT keyword
31. Review (continued)
5 - You have a list of words that should be excluded from search arguments.
Which action should you perform in SQL Server 2008 to meet your
requirements with the least amount of effort?
A. Create a stop list and associate the stop list to the full text index.
B. Create a noise word file and associate the noise word file to the
full text index.
C. Populate a thesaurus file and associate the thesaurus file to the
full text index.
D. Parse the inbound query and remove any common words from
the search arguments.
32. Answers
1 - Correct Answers: B and D
A. Incorrect: SQL Server 2005 stored full text indexes on the file
system. SQL Server 2008 stores full text indexes within a
filegroup in the database.
B. Correct: Full text catalogs contain full text indexes and the
contents of the indexes are stored within the database in SQL
Server 2008.
C. Incorrect: The NO POPULATION option enables SQL Server to
create the full text index but does not populate the index.
Therefore, searches do not return any results.
D. Correct: The CHANGE_TRACKING AUTO option enables SQL Server
to populate the full text index upon initial creation and to
automatically migrate changes to the underlying data into the index.
33. Answers (continued)
2 - Correct Answer: C
A. Incorrect: The START {FULL | INCREMENTAL | UPDATE}
POPULATION argument executes a population run for the full text
index, but it must either be executed manually or configured to
run in a SQL Server Agent job.
B. Incorrect: The START {FULL | INCREMENTAL | UPDATE}
POPULATION argument executes a population run for the full text
index, but it must either be executed manually or configured to
run in a SQL Server Agent job.
C. Correct: When the CHANGE_TRACKING argument is set to
AUTO, SQL Server automatically updates the full text index as
changes to underlying data occur. In AUTO mode, no
administrator intervention is required either manually or via a
scheduled job.
D. Incorrect: The START {FULL | INCREMENTAL | UPDATE}
POPULATION argument executes a population run for the full text
index, but it must either be executed manually or configured to
run in a SQL Server Agent job.
34. Answers (continued)
3 - Correct Answers: A and C
A. Correct: CONTAINS allows proximity searches by using the
NEAR keyword.
B. Incorrect: FREETEXT does not allow proximity searches.
C. Correct: CONTAINSTABLE allows proximity searches by using
the NEAR keyword.
D. Incorrect: FREETEXTTABLE does not allow proximity searches.
4 - Correct Answer: D
A. Incorrect: The FORMSOF argument allows you to search based
on a thesaurus or inflectional forms of a search term but does
not perform proximity searches.
B. Incorrect: The FORMSOF argument allows you to search based
on a thesaurus or inflectional forms of a search term but does
not perform proximity searches.
C. Incorrect: ISABOUT performs proximity searches but does not
apply weighting unless the WEIGHT keyword and weighting value
are also supplied.
D. Correct: ISABOUT performs proximity searches and it also
applies weighting if the WEIGHT keyword and weighting value are
supplied.
35. Answers (continued)
5 - Correct Answer: A
A. Correct: Stop lists are created in SQL Server 2008 to exclude
words from a full text index as well as search arguments. After
the stop list is associated to a full text index, any queries that use
the index automatically have any stop words removed from the
search arguments.
B. Incorrect: Noise word files were used in SQL Server 2005. SQL
Server 2008 uses stop lists for the purpose of excluding words
from search arguments.
C. Incorrect: A thesaurus allows terms to be replaced such as
common abbreviations or mis-spellings but does not exclude
words from being searched upon.
D. Incorrect: Although you could alter your application to remove
search arguments, you would require more effort than creating,
populating, and managing a stop list.