Published on

Published in: Education, Technology

  1. 1. INDEXING Davood Pour Yousefian Barfeh
  2. 2. INDEXING <ul><li>What is an index? </li></ul><ul><li>Why is it needed? </li></ul><ul><li>When should it be used? </li></ul><ul><li>Types of indexes </li></ul>
  3. 3. What is an index? <ul><li>a data structure </li></ul><ul><li>a way of sorting the records on a field </li></ul><ul><li>holds the field value and a pointer to the record it relates to </li></ul>
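The idea of "field value plus record pointer" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `records` list, the `build_index` and `lookup` helpers, and the sample names are hypothetical, and a list position stands in for the disk address a real DBMS would store.

```python
from bisect import bisect_left

# Hypothetical table: the list position plays the role of the record pointer.
records = [
    {"id": 1, "firstName": "Maria"},
    {"id": 2, "firstName": "Ali"},
    {"id": 3, "firstName": "Chen"},
    {"id": 4, "firstName": "Ali"},
]

def build_index(rows, field):
    """Return a sorted list of (field value, record pointer) pairs."""
    return sorted((row[field], pos) for pos, row in enumerate(rows))

def lookup(index, rows, value):
    """Binary-search the index, then follow each pointer to its record."""
    i = bisect_left(index, (value, -1))  # -1 sorts before any real pointer
    matches = []
    while i < len(index) and index[i][0] == value:
        matches.append(rows[index[i][1]])
        i += 1
    return matches

first_name_index = build_index(records, "firstName")
ali_ids = [row["id"] for row in lookup(first_name_index, records, "Ali")]
```

The table itself stays in insertion order; only the index is sorted, which is why a lookup can use binary search without reorganizing the records.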
  4. 4. Why is an index needed? (Advantage) <ul><li>speeds up retrieval of data </li></ul><ul><li>without an index: linear search, where N = number of records </li></ul><ul><li>key field (unique value): N/2 comparisons on average </li></ul><ul><li>non-key field: N comparisons </li></ul><ul><li>using an index: binary search </li></ul><ul><li>about log₂ N comparisons </li></ul>
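The gap between N and log₂ N is easy to see by counting probes. The sketch below is an illustration only: the function names and the test data (one million sorted integer keys) are assumptions, and each loop iteration is counted as one comparison.

```python
import math

def linear_search(values, target):
    """Scan left to right; return (position, comparisons made)."""
    for comparisons, value in enumerate(values, start=1):
        if value == target:
            return comparisons - 1, comparisons
    return -1, len(values)

def binary_search(sorted_values, target):
    """Halve the search range each step; return (position, comparisons made)."""
    low, high, comparisons = 0, len(sorted_values) - 1, 0
    while low <= high:
        mid = (low + high) // 2
        comparisons += 1
        if sorted_values[mid] == target:
            return mid, comparisons
        if sorted_values[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1, comparisons

n = 1_000_000
keys = list(range(n))                         # unique, sorted key values
_, linear_cost = linear_search(keys, n - 1)   # worst case: the last record
_, binary_cost = binary_search(keys, n - 1)
probe_bound = math.floor(math.log2(n)) + 1    # upper bound on binary-search probes
```

For one million keys the linear scan touches every element in the worst case, while the binary search never needs more than `probe_bound` probes.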
  5. 5. Indexing (Disadvantage) <ul><li>Additional space on the disk </li></ul><ul><li>Slows down inserting, deleting, and updating data, because the index must be updated as well </li></ul>
  6. 6. Ex. Without Indexing <ul><li>Field name | Data type | Size on disk </li></ul><ul><li>id (primary key) | Unsigned INT | 4 bytes </li></ul><ul><li>firstName | Char(50) | 50 bytes </li></ul><ul><li>lastName | Char(50) | 50 bytes </li></ul><ul><li>emailAddress | Char(100) | 100 bytes </li></ul><ul><li>* CHAR was used in place of VARCHAR to allow for an accurate size-on-disk value </li></ul><ul><li>* the table contains five million rows and is unindexed </li></ul><ul><li>r = 5,000,000 records, record length R = 204 bytes, block size B = 1,024 bytes </li></ul><ul><li>bfr = ⌊B/R⌋ = ⌊1024/204⌋ = 5 records per disk block </li></ul><ul><li>total number of blocks required: N = r/bfr = 5,000,000 / 5 = 1,000,000 blocks </li></ul><ul><li>linear search on a key field: N/2 = 500,000 block accesses on average; if the table is sorted on the key, a binary search reduces this to log₂ N = 19.93 → 20 block accesses </li></ul><ul><li>linear search on a non-key field: N = 1,000,000 block accesses </li></ul>
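The arithmetic on this slide can be checked with a short script. The variable names are chosen for illustration; the numbers are the ones given on the slide.

```python
import math

record_count = 5_000_000               # r
record_length = 4 + 50 + 50 + 100      # R = 204 bytes (id + two names + email)
block_size = 1024                      # B

# Records never span blocks here, so the blocking factor is rounded down.
blocking_factor = block_size // record_length               # bfr = 5
table_blocks = math.ceil(record_count / blocking_factor)    # N = 1,000,000

linear_key_cost = table_blocks // 2        # average, unique key field
linear_non_key_cost = table_blocks         # every block must be read
binary_key_cost = math.ceil(math.log2(table_blocks))  # table sorted on the key
```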
  7. 7. Ex. Using Indexing <ul><li>Field name | Data type | Size on disk </li></ul><ul><li>firstName | Char(50) | 50 bytes </li></ul><ul><li>(record pointer) | Special | 4 bytes </li></ul><ul><li>* pointers in MySQL are 2, 3, 4 or 5 bytes in length, depending on the size of the table </li></ul><ul><li>r = 5,000,000 records, index record length R = 54 bytes, block size B = 1,024 bytes </li></ul><ul><li>bfr = ⌊B/R⌋ = ⌊1024/54⌋ = 18 records per disk block </li></ul><ul><li>total number of blocks required to hold the index: N = r/bfr = 5,000,000 / 18 → 277,778 blocks </li></ul><ul><li>binary search on the index: log₂ N = log₂ 277,778 = 18.08 → 19 block accesses, plus one more to fetch the record itself </li></ul><ul><li>compare with N = 1,000,000 block accesses for a linear search on the unindexed, non-key firstName field </li></ul>
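The same calculation for the index can be sketched as follows. Variable names are illustrative, and the figure of 1,000,000 table blocks is taken from the unindexed example on the previous slide.

```python
import math

record_count = 5_000_000            # r
index_entry_length = 50 + 4         # R = firstName (50 bytes) + pointer (4 bytes)
block_size = 1024                   # B

blocking_factor = block_size // index_entry_length          # bfr = 18
index_blocks = math.ceil(record_count / blocking_factor)    # N = 277,778

index_search_cost = math.ceil(math.log2(index_blocks))      # binary search on index
total_cost = index_search_cost + 1       # one extra access to fetch the record

unindexed_cost = 1_000_000               # linear scan on a non-key field
speedup = unindexed_cost / total_cost
```

Under these assumptions the lookup drops from a million block accesses to about twenty, at the price of storing the extra index blocks.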
  8. 8. When should (can) indexing be used? <ul><li>General rule: index anything that limits the number of results you are trying to find. </li></ul><ul><li>to speed up finding data </li></ul><ul><li>on columns with high cardinality </li></ul><ul><li>on tables that reference other tables </li></ul>
  9. 9. When should indexing be used? <ul><li>indexes speed up finding data </li></ul><ul><li>but slow down inserting, deleting, or updating data </li></ul><ul><li>not only the table but also the index must be updated </li></ul><ul><li>an index on a bank account number is better than one on the balance, since the balance changes frequently </li></ul>
  10. 10. When should indexing be used? <ul><li>Cardinality: the number of distinct values in a column </li></ul><ul><li>high cardinality → binary search on the index </li></ul><ul><li>low cardinality → linear search (full scan) </li></ul>
  11. 11. When should indexing be used? <ul><li>Cardinality </li></ul><ul><li>Index selectivity = number of distinct values / number of records </li></ul><ul><li>Ex. good selectivity: a table has 100,000 records and one of its indexed columns has 88,000 distinct values, so the selectivity of this index is 88,000 / 100,000 = 88% </li></ul><ul><li>Ex. bad selectivity: a table of 100,000 records has only 200 distinct values in the indexed column, so the selectivity of the index is 200 / 100,000 = 0.2% </li></ul><ul><li>Number of records in each group = 100,000 / 200 = 500 </li></ul><ul><li>here a full table scan is more efficient than using the index, because much more I/O is needed to scan the index and the table repeatedly </li></ul>
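The selectivity formula is simple enough to compute directly. The helper name `index_selectivity` is hypothetical; the inputs are the two examples from the slide. Note that 100,000 records spread over 200 distinct values gives 500 records per value on average.

```python
def index_selectivity(distinct_values, total_records):
    """Selectivity = number of distinct values / number of records."""
    return distinct_values / total_records

good = index_selectivity(88_000, 100_000)   # high selectivity: index is useful
bad = index_selectivity(200, 100_000)       # low selectivity: prefer a full scan

# With only 200 distinct values, each value matches many rows on average.
rows_per_value = 100_000 // 200
```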
  12. 12. When should indexing be used? <ul><li>tables that reference other tables (joins) </li></ul><ul><li>Ex. </li></ul>CREATE TABLE newsitem ( newsid INT PRIMARY KEY, newstitle VARCHAR(255), newscontent TEXT, authorid INT, newsdate TIMESTAMP ); CREATE TABLE authors ( authorid INT PRIMARY KEY, username VARCHAR(255), firstname VARCHAR(255), lastname VARCHAR(255) ); SELECT newstitle, firstname, lastname FROM newsitem n, authors a WHERE n.authorid=a.authorid; CREATE INDEX newsitem_authorid ON newsitem(authorid); General rule: any fields involved in a table join should be indexed.
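The join example can be run end to end with Python's built-in `sqlite3` module. This is a sketch under assumptions: SQLite stands in for MySQL (the slide's statements happen to be valid in both), and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (
  authorid INT PRIMARY KEY,
  username VARCHAR(255),
  firstname VARCHAR(255),
  lastname VARCHAR(255)
);
CREATE TABLE newsitem (
  newsid INT PRIMARY KEY,
  newstitle VARCHAR(255),
  newscontent TEXT,
  authorid INT,
  newsdate TIMESTAMP
);
-- Index the join column on the referencing table.
CREATE INDEX newsitem_authorid ON newsitem(authorid);

-- Hypothetical sample data.
INSERT INTO authors VALUES (1, 'jdoe', 'Jane', 'Doe'), (2, 'rroe', 'Richard', 'Roe');
INSERT INTO newsitem VALUES
  (10, 'Index basics', '...', 1, '2024-01-01'),
  (11, 'Join tuning', '...', 2, '2024-01-02');
""")

rows = conn.execute("""
SELECT newstitle, firstname, lastname
FROM newsitem n, authors a
WHERE n.authorid = a.authorid
ORDER BY n.newsid
""").fetchall()

# PRAGMA index_list is SQLite-specific; column 1 of each row is the index name.
index_names = [row[1] for row in conn.execute("PRAGMA index_list('newsitem')")]
```

Whether the planner actually picks `newsitem_authorid` depends on the engine and the data, so the sketch only confirms that the index exists and the join returns the expected rows.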
  13. 13. When should indexing be used? Ex. CREATE TABLE newsitem ( newsid INT PRIMARY KEY, newstitle VARCHAR(255), newscontent TEXT, authorid INT, newsdate TIMESTAMP ); CREATE TABLE newsitem_categories ( newsid INT, categoryid INT ); CREATE TABLE categories ( categoryid INT PRIMARY KEY, categoryname VARCHAR(255) ); SELECT n.newstitle, c.categoryname FROM categories c, newsitem_categories nc, newsitem n WHERE c.categoryid=nc.categoryid AND nc.newsid=n.newsid; These fields should be indexed: newsitem.newsid, newsitem_categories.newsid, newsitem_categories.categoryid, categories.categoryid. The two primary keys are indexed already, so only the link table needs new indexes: CREATE INDEX newscat_news ON newsitem_categories(newsid); CREATE INDEX newscat_cats ON newsitem_categories(categoryid);
  14. 14. Combined (multi-column) indexes CREATE INDEX newscat_news ON newsitem_categories(newsid); CREATE INDEX newscat_cats ON newsitem_categories(categoryid); CREATE INDEX news_cats ON newsitem_categories(newsid, categoryid); Can the columns be combined into one index? Yes, but with limitations.
  15. 15. Conjunctions in combined indexes CREATE TABLE example ( a int, b int, c int ); CREATE INDEX example_index ON example(a,b,c); <ul><li>It will be used when you check against ‘a’. </li></ul><ul><li>It will be used when you check against ‘a’ and ‘b’. </li></ul><ul><li>It will be used when you check against ‘a’, ‘b’ and ‘c’. </li></ul><ul><li>It will not be used if you check against ‘b’ and ‘c’, or if you check only ‘b’ or only ‘c’. </li></ul><ul><li>It will be used when you check against ‘a’ and ‘c’, but only for the ‘a’ column; it will not be used to check the ‘c’ column as well. </li></ul><ul><li>A query against ‘a’ OR ‘b’ like this: </li></ul><ul><li>SELECT a,b,c FROM example WHERE a=1 OR b=2; </li></ul><ul><li>can use the index only for the ‘a’ condition; the ‘b’ condition cannot use it, so finding the rows with b=2 still requires a scan. </li></ul>
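The leftmost-prefix rule above can be modelled in a few lines. The `usable_prefix` helper is hypothetical and deliberately simplified: it only covers equality conditions joined with AND, and says nothing about how a real optimizer costs its plans.

```python
def usable_prefix(index_columns, equality_columns):
    """Return the leading index columns that an AND-ed equality filter can use.

    The index can be used only up to the first index column that the
    query does not constrain.
    """
    prefix = []
    for column in index_columns:
        if column not in equality_columns:
            break
        prefix.append(column)
    return prefix

example_index = ["a", "b", "c"]

only_a = usable_prefix(example_index, {"a"})            # index used for a
a_and_b = usable_prefix(example_index, {"a", "b"})      # index used for a, b
all_three = usable_prefix(example_index, {"a", "b", "c"})
b_and_c = usable_prefix(example_index, {"b", "c"})      # index not usable
a_and_c = usable_prefix(example_index, {"a", "c"})      # index used for a only
```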
  16. 16. Types of indexes (1) <ul><li>Clustered indexes: the order of the rows in the data pages corresponds to the order of the rows in the index </li></ul><ul><li>Only one per table, typically the primary key </li></ul><ul><li>Faster to read than non-clustered indexes, as data is physically stored in index order </li></ul><ul><li>Non-clustered indexes: the order of the rows is not important </li></ul><ul><li>Can be used many times per table </li></ul><ul><li>Quicker for insert, delete, and update operations than a clustered index </li></ul>
  17. 17. Types of indexes (2) <ul><li>Unique indexes: help maintain data integrity by ensuring that no two rows of data in a table have identical key values; uniqueness is enforced </li></ul><ul><li>Non-unique indexes: improve query performance by maintaining a sorted order of data values that are used frequently </li></ul>
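How a unique index enforces integrity can be seen with a small `sqlite3` session. This is a sketch under assumptions: SQLite stands in for MySQL, and the `users` table, the `users_email` index name, and the sample address are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, emailAddress TEXT)")
conn.execute("CREATE UNIQUE INDEX users_email ON users(emailAddress)")

conn.execute("INSERT INTO users (emailAddress) VALUES ('a@example.com')")
try:
    # A second row with the same key value violates the unique index.
    conn.execute("INSERT INTO users (emailAddress) VALUES ('a@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

With a non-unique index on the same column, both inserts would succeed and the index would serve only to speed up lookups.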
  18. 18. Types of indexes (3) <ul><li>Bitmap index - stores the bulk of its data as bit arrays </li></ul><ul><li>useful when the values of a variable repeat very frequently </li></ul><ul><li>Dense index - an index record appears for every search key value in the file. This record contains the search key value and a pointer to the actual record </li></ul><ul><li>Sparse index - index records are created only for some of the records </li></ul><ul><li>typical use: primary key </li></ul><ul><li>Reverse index - reverses the key value before entering it in the index </li></ul><ul><li>useful for sequence numbers, where new key values monotonically increase </li></ul>
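A bitmap index is perhaps the least intuitive of these, so here is a minimal sketch. It assumes Python integers as the bit arrays; the helper names and the two sample columns are hypothetical. Bit i of a bitmap is set when row i holds the value.

```python
def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set if row i holds it."""
    bitmaps = {}
    for row, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row)
    return bitmaps

def rows_of(bitmap):
    """List the row numbers whose bits are set in the bitmap."""
    return [row for row in range(bitmap.bit_length()) if (bitmap >> row) & 1]

# Two low-cardinality columns: few distinct values, repeated often.
gender = ["F", "M", "F", "F", "M"]
status = ["active", "active", "closed", "active", "closed"]

gender_index = build_bitmap_index(gender)
status_index = build_bitmap_index(status)

# Combining conditions is a single bitwise AND over the two bitmaps.
female_and_active = rows_of(gender_index["F"] & status_index["active"])
```

This is why bitmap indexes suit columns whose values repeat frequently: each distinct value costs one bit per row, and multi-condition filters reduce to cheap bitwise operations.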
  19. 19. Types of indexes (4) <ul><li>Fulltext - the search engine examines all of the words in every stored document as it tries to match the search words supplied by the user </li></ul><ul><li>supports many other types of search: </li></ul><ul><li>Two words near each other </li></ul><ul><li>Any word derived from a particular root (for example run, ran, or running) </li></ul><ul><li>Multiple words with distinct weightings </li></ul><ul><li>A word or phrase close to the search word or phrase </li></ul><ul><li>Spatial - allows users to treat data within a data store as existing within a two-dimensional context </li></ul><ul><li>an extended index that allows you to index a spatial column, that is, a table column that contains data of a spatial data type, such as geometry or geography </li></ul>
  20. 20. Syntax of Index (1) <ul><li>Creation: </li></ul><ul><li>CREATE [UNIQUE|FULLTEXT|SPATIAL] INDEX index_name </li></ul><ul><li>[ index_type ] </li></ul><ul><li>ON tbl_name ( index_col_name ,...) </li></ul><ul><li>[ index_type ] </li></ul><ul><li>index_col_name : </li></ul><ul><li>col_name [( length )] [ASC | DESC] </li></ul><ul><li>index_type : </li></ul><ul><li>USING {BTREE | HASH} </li></ul>
  21. 21. Access Method <ul><li>BTree, suitable when: </li></ul><ul><li>Keys have some locality of reference </li></ul><ul><li>They can be sorted well </li></ul><ul><li>Neighborhood: a query for a given key will likely be followed by a query for one of its neighbors </li></ul><ul><li>Hash, suitable when: </li></ul><ul><li>Dataset is extremely large </li></ul>
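The practical difference between the two access methods can be illustrated with plain Python containers. This is a sketch under assumptions, not a real B-tree: a sorted list searched with `bisect` stands in for the ordered structure, a `dict` stands in for the hash index, and the keys are made up.

```python
from bisect import bisect_left, bisect_right

keys = [17, 3, 42, 8, 23, 15]   # key values in table (insertion) order

# Hash-style index: key -> row position; supports exact-match lookups only.
hash_index = {key: pos for pos, key in enumerate(keys)}

# Ordered index: (key, row position) pairs kept sorted by key.
sorted_index = sorted((key, pos) for pos, key in enumerate(keys))
sorted_keys = [key for key, _ in sorted_index]

def range_lookup(low, high):
    """Return row positions for all keys with low <= key <= high."""
    start = bisect_left(sorted_keys, low)
    end = bisect_right(sorted_keys, high)
    return [pos for _, pos in sorted_index[start:end]]

exact_match = hash_index[42]        # one hashed probe
neighbours = range_lookup(10, 25)   # neighbouring keys sit next to each other
```

Because neighbouring keys are adjacent in the ordered structure, range and "next key" queries are cheap there, while the hash structure has no notion of order at all.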
  22. 22. Syntax of Index(2) <ul><li>Displaying Index Information: </li></ul><ul><li>SHOW INDEX FROM table_name </li></ul><ul><li>Deletion: </li></ul><ul><li>DROP INDEX index_name ON table_name </li></ul>
  23. 23. Summary <ul><li>What is an index? - a data structure that keeps field values sorted, with pointers to the records </li></ul><ul><li>Why is it needed? - advantages and disadvantages </li></ul><ul><li>When should it be used? - to speed up finding data: high cardinality, joins </li></ul><ul><li>Types of indexes - clustered and non-clustered, unique and non-unique </li></ul><ul><li>Syntax - creation, display, deletion </li></ul>