CS 542 Database Index Structures



  1. 1. CS 542 Database Management Systems<br />Index Structures<br />J Singh <br />February 7, 2011<br />
  2. 2. Today’s Topic: Index Structures<br />Search performance is the primary driver of the utility of databases<br />And the principal factor limiting the continuing expansion of db size<br />Index structures are primary drivers of search performance<br />Continuing innovation required for special applications<br />Topics<br />Single-Attribute Indexes: Overview<br />Order-based Indexes<br />(adapted from Prof. Babu’s slides, Duke Univ)<br />B-Trees<br />Hash-based Indexes<br />(adapted from Prof. Ramakrishnan’s slides, U. Wisconsin)<br />Extensible Hashing<br />Linear Hashing<br />Multi-Attribute Indexes (Chapter 14 GMUW, may cover in future)<br />
  3. 3. Approach to today’s discussion<br />Choose a few representative tables<br />For each indexing solution<br />The data structures involved<br />How queries are handled<br />Insertion Operations<br />Deletion Operations are generally omitted from the discussion<br />Generally the reverse of insertion<br />New algorithmic details but no new insights<br />On average, tables grow<br />Often not worthwhile trying to collapse data structures<br />Often equally effective to replace a deleted record with a tombstone record<br />
  4. 4. Representative Tables<br />Table S<br />100,000 rows<br />100 bytes per row<br />10 MB total data<br />Table M<br />10 million rows<br />1 KB per row<br />10 GB total data<br />Table W<br />1 million rows<br />10 MB per row (images?)<br />10 TB total data<br />Table T<br />625 million rows<br />160 bytes per row (tweets?)<br />100 GB total data<br />
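The totals for the four representative tables follow from rows × bytes per row. A quick check, assuming decimal units (1 KB = 1,000 bytes) as the round figures on the slide suggest; the dictionary layout is only for illustration:

```python
# (rows, bytes per row) for each representative table, taken from the slide.
# Decimal units are an assumption: 1 KB = 1_000 bytes, 1 MB = 1_000_000 bytes.
tables = {
    "S": (100_000, 100),
    "M": (10_000_000, 1_000),
    "W": (1_000_000, 10_000_000),
    "T": (625_000_000, 160),
}

total_bytes = {name: rows * row_bytes for name, (rows, row_bytes) in tables.items()}

for name, size in total_bytes.items():
    print(f"Table {name}: {size / 1e9:,.2f} GB")
```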
  5. 5. Single Attribute Index (p1)<br />[Figure: an index on attribute A over a table with columns A and B (rows 1, 2, …, i, …, n); queries shown: A = val, and A > low, A < high]<br />
  6. 6. Where does the data live?<br />Index files for a relation R can occur in three forms:<br />(1) Data entries store the actual data for relation R.<br />Index file provides both indexing and storage.<br />(2) Data entries store pairs <k, rid>:<br />k – value for a search key.<br />rid – rid of record having search key value k.<br />Actual data record is stored somewhere else<br />(3) Data entries store pairs <k, rid-list>:<br />k – value for a search key<br />rid-list – list of rids for all records having search key value k<br />Actual data record is stored somewhere else<br />
  7. 7. Clustered vs Unclustered Index<br />Index is said to be clustered if<br />Data records in the file are organized in the same order as the data entries in the index<br />If data is stored in the index, then the index is clustered by definition. This is option (1) from previous slide.<br />Otherwise, data file must be sorted in order to match index organization.<br />Un-clustered index<br />Organization of data entries in index is independent from organization of data records.<br />These are options (2) and (3)<br />File storing a relation R can only have 1 clustered index, but many un-clustered indices<br />Why?<br />
  8. 8. Single Attribute Index (p2)<br />Sparse Indexes<br />Require an index entry for every n tuples (comprising a block)<br />Require each block to be laid out in tuple order<br />Dense Indexes<br />Require an index entry for every tuple<br />Memory-resident indexes win<br />[Figure: index on attribute A with entries a1, a2, …, ai, …, an; queries shown: A = val, and A > low, A < high]<br />
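A minimal sketch of the sparse/dense distinction, assuming a tiny sorted data file with n = 3 tuples per block. The block contents and the names `sparse_lookup` and `find` are made up for illustration:

```python
from bisect import bisect_right

# Hypothetical sorted data file: each block holds n = 3 tuples in key order.
blocks = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]

# Dense index: one entry per tuple, mapping key -> block number.
dense_index = {key: b for b, block in enumerate(blocks) for key in block}

# Sparse index: one entry per block, holding that block's first key.
sparse_keys = [block[0] for block in blocks]

def sparse_lookup(key):
    """Block number that could hold key, or None if key precedes every block."""
    pos = bisect_right(sparse_keys, key) - 1
    return pos if pos >= 0 else None

def find(key):
    """The sparse index only narrows the search to one block; the block is then scanned."""
    b = sparse_lookup(key)
    return b is not None and key in blocks[b]
```

The sparse index here has 3 entries against 9 for the dense one, which is why it only works when each block is laid out in key order.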
  9. 9. B-Trees – B doesn’t mean binary<br />[Figure: a binary search tree over the keys 16, 32, 54, 71, 74, 83, 92]<br />Binary Trees<br />Store records of type <key,lPtr,dataPtr,rightPtr><br />32 bytes per key, 32 bytes per ptr ⇒ 128 bytes per record<br />Table S: <br />12.8 MB for indexes vs 10 MB data<br />Average 17 lookups (2^17 > 100K)<br />How many of the 17 lookups from disk?<br />
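The binary-tree figures for Table S can be reproduced with a few lines of arithmetic. This assumes one tree record per row and a perfectly balanced tree; the variable names are illustrative:

```python
import math

ROWS_S = 100_000      # Table S row count (from the slides)
KEY_BYTES = 32        # from the slide
PTR_BYTES = 32        # from the slide

# One record = key + left pointer + data pointer + right pointer.
record_bytes = KEY_BYTES + 3 * PTR_BYTES

# Assumption: one tree record per row of Table S.
index_bytes = ROWS_S * record_bytes

# Levels needed by a perfectly balanced binary tree over ROWS_S keys.
depth = math.ceil(math.log2(ROWS_S))

print(record_bytes, index_bytes / 1e6, depth)
```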
  10. 10. B-Trees – B means balanced<br />B-Trees here are, technically, B+ Trees<br />Support range predicates (and equality)<br />B means balanced<br />
  11. 11. B-Tree Example<br />[Figure: a three-level B-tree. Root node: 63. Intermediate nodes: 36 and 84, 91. Leaf node keys: 15, 36, 57, 63, 76, 87, 92, 100, each pointing to its data record (one pointer is null)]<br />
  12. 12. Meaning of Internal Node<br />Internal node with keys 84 and 91 has three child pointers:<br />key < 84<br />84 ≤ key < 91<br />91 ≤ key<br />
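Choosing which child pointer to follow is a binary search over the node's keys. A small sketch using Python's standard `bisect_right`; the function name `child_index` is made up for illustration:

```python
from bisect import bisect_right

def child_index(node_keys, search_key):
    """Index of the child pointer to follow in an internal node.

    With keys [84, 91]: child 0 covers key < 84, child 1 covers
    84 <= key < 91, and child 2 covers 91 <= key.
    """
    return bisect_right(node_keys, search_key)

keys = [84, 91]
for k in (83, 84, 90, 91, 100):
    print(k, "-> child", child_index(keys, k))
```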
  13. 13. Meaning of Leaf Nodes<br />Leaf node with keys 63 and 76 holds:<br />pointer to record 63<br />pointer to record 76<br />pointer to next leaf<br />
  14. 14. B-Trees in Action<br />B+ Tree Java Applet<br />
  15. 15. Equality Predicates (slides 15–18)<br />key = 87<br />[Figure, repeated over four frames: the search for key = 87 on the example B-tree from slide 11]<br />
  19. 19. Range Predicates (slides 19–24)<br />57 ≤ key < 95<br />[Figure, repeated over six frames: the search for 57 ≤ key < 95 on the example B-tree from slide 11]<br />
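A sketch of both searches on the example tree from slide 11. The class names and the grouping of keys into leaves are assumptions (the transcript lists only the keys), and record pointers are omitted:

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, keys):
        self.keys = keys      # sorted keys; record pointers omitted
        self.next = None      # next-leaf pointer

class Internal:
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children   # len(children) == len(keys) + 1

def find_leaf(node, key):
    """Descend from the root to the leaf whose key range covers key."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node

def equality_search(root, key):
    return key in find_leaf(root, key).keys

def range_search(root, low, high):
    """Keys k with low <= k < high, read by walking the leaf chain."""
    leaf, out = find_leaf(root, low), []
    while leaf is not None:
        for k in leaf.keys:
            if k >= high:
                return out
            if k >= low:
                out.append(k)
        leaf = leaf.next
    return out

# Assumed leaf grouping for the keys shown on slide 11.
leaves = [Leaf([15]), Leaf([36, 57]), Leaf([63, 76]), Leaf([87]), Leaf([92, 100])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Internal([63], [Internal([36], leaves[0:2]),
                       Internal([84, 91], leaves[2:5])])

print(equality_search(root, 87))
print(range_search(root, 57, 95))
```

The range search descends the tree once, for the lower bound, and then follows next-leaf pointers, which is why leaves are chained.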
  25. 25. General B-Trees<br />Number of keys: n<br />Number of pointers: n + 1<br />All leaves at same depth<br />All (key, record pointer) pairs in leaves<br />Node size should be at least:<br />Root: 2 pointers<br />Internal nodes: (n+1)/2 pointers<br />Leaf nodes: (n+1)/2 pointers to data<br />[Figure: examples of minimally and maximally filled internal and leaf nodes]<br />Definitions:<br />Order: (n+1)/2<br />Fanout: average # of pointers out<br />Order < Fanout < 2 × Order<br />
  26. 26. B-Tree Construction<br />[Figure: the input keys 75, 97, 21, 41, 57, 15, 11, 13, 48, 34, 62, 81 are sorted into 11, 13, 15, 21, 34, 41, 48, 57, 62, 75, 81, 97 and then scanned]<br />
  27. 27. B-Tree Construction<br />[Figure: a scan of the sorted keys 11, 13, 15, 21, 34, 41, 48, 57, 62, 75, 81, 97 fills the leaf level; the parent node holds the keys 21, 48, 75]<br />Why is sort-based construction better than insertion-based construction?<br />
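A sketch of sort-based construction for one parent level, assuming three keys per leaf and that all separators fit in a single parent node. The function name `bulk_load` is made up for illustration:

```python
def bulk_load(keys, keys_per_leaf=3):
    """Sort the keys, pack leaves left to right, and derive the parent's keys.

    Each leaf is filled and written once, in key order; the parent key for a
    leaf (other than the first) is that leaf's smallest key.
    """
    ordered = sorted(keys)
    leaves = [ordered[i:i + keys_per_leaf]
              for i in range(0, len(ordered), keys_per_leaf)]
    separators = [leaf[0] for leaf in leaves[1:]]
    return leaves, separators

leaves, separators = bulk_load([75, 97, 21, 41, 57, 15, 11, 13, 48, 34, 62, 81])
print(leaves)
print(separators)
```

With the keys from the slides this yields four leaves and the parent keys 21, 48, 75, matching the figure.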
  28. 28. Cost of B-Tree Operations<br />Height of B-Tree: H<br />Assume no duplicates<br />Assume no blocks in memory<br />What is the random I/O cost of:<br />Insertion:<br />Deletion:<br />Equality search:<br />Range Search: <br />Assume root and intermediate nodes in memory<br />But not leaf nodes and data blocks<br />What are the I/O costs?<br />
  29. 29. B+ Trees in Practice<br />Typical order: 100. Typical fill-factor: 67%.<br />average fanout = 133<br />Typical capacities:<br />Height 2: 133^2 = 17,689 entries<br />Height 3: 133^3 = 2,352,637 entries<br />Height 4: 133^4 = 312,900,721 entries<br />Can often hold top levels in buffer pool:<br />Level 1 = 1 page = 8 Kbytes<br />Level 2 = 133 pages ≈ 1 Mbyte<br />Level 3 = 17,689 pages ≈ 133 MBytes<br />
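The capacities are powers of the average fanout. A short check, taking the fanout of 133 and the 8 KB page size from the slide as given; the function names are illustrative, and the slide's byte totals are rounded:

```python
FANOUT = 133      # average fanout quoted on the slide
PAGE_KB = 8       # page size quoted on the slide

def capacity(height):
    """Entries reachable through `height` levels of fanout-133 nodes."""
    return FANOUT ** height

def pages_at_level(level):
    """Pages at a given level, counting the root as level 1."""
    return FANOUT ** (level - 1)

for height in (2, 3, 4):
    print(f"height {height}: {capacity(height):,} entries")
for level in (1, 2, 3):
    pages = pages_at_level(level)
    print(f"level {level}: {pages:,} pages = {pages * PAGE_KB:,} KB")
```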
  30. 30. Revisit Representative Tables<br />Table S<br />100,000 rows<br />100 bytes per row<br />10 MB total data<br />B+ Tree<br />Height = 3<br />Leaf Nodes<br />100,000 / 133 = 752 pgs<br />Level 2 nodes<br />752 / 133 = 6 pgs<br />Memory budget for nodes<br />759 pgs = 6 MB<br />Table M<br />Homework<br />Table W<br />1 million rows<br />10 MB per row (images?)<br />10 TB total data<br />B+ Tree<br />Height = 3<br />Leaf Nodes = 7519 pgs<br />Level 2 nodes = 57 pgs<br />Memory budget<br />60.6 MB<br />Table T<br />Homework<br />
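The page counts for Tables S and W can be derived level by level. This assumes 133 entries per page at every level and 8 KB pages, as on the previous slide; `index_pages` is a made-up helper name:

```python
import math

FANOUT = 133      # entries per page, assumed equal to the average fanout
PAGE_KB = 8       # page size from the previous slide

def index_pages(rows):
    """Pages per level, leaf level first, ending with the single root page."""
    levels = [math.ceil(rows / FANOUT)]
    while levels[-1] > 1:
        levels.append(math.ceil(levels[-1] / FANOUT))
    return levels

for name, rows in (("S", 100_000), ("W", 1_000_000)):
    levels = index_pages(rows)
    total = sum(levels)
    print(f"Table {name}: levels {levels}, {total} pages, {total * PAGE_KB / 1000:.1f} MB")
```

The same function can be applied to the row counts of Tables M and T.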
  31. 31. CS 542: Database Management Systems<br />Hash-based Indexes<br />
  32. 32. Indexing Problem (recap)<br />[Figure: index entries a1, a2, …, ai, …, an on attribute A, each pointing to a datum; queries shown: A = val, and A > low, A < high]<br />
  33. 33. Hash-Based Indexes<br />Adaptation of main memory hash tables<br />Support equality searches<br />No range searches<br />
  34. 34. Static Hashing<br />Hashing function: h(k) mod N = bucket to which the datum with key k belongs<br /># primary pages fixed (N = # of buckets)<br />allocated sequentially<br />never de-allocated<br />overflow pages if needed.<br />[Figure: key → h → h(key) mod N selects one of the primary bucket pages 0 … N-1; overflow pages are chained to primary pages]<br />
  35. 35. Hashing Function<br />Hashing function: h(k) mod N = bucket to which the datum with key k belongs<br /># primary pages fixed (N = # of buckets)<br />Hash function works on search key of record r.<br />h() must distribute values over range [0 ... N-1].<br />For example, h(key) = (a * key + b)<br />a and b are constants<br />A lot is known about how to tune h.<br />
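A minimal sketch of static hashing with overflow chains, using h(key) = a * key + b. The constants, the page capacity and the inserted keys are hypothetical and chosen so that one bucket overflows:

```python
N = 4               # number of primary buckets (fixed for the life of the file)
PAGE_CAPACITY = 2   # entries per page; hypothetical, tiny for illustration
A, B = 31, 7        # hypothetical constants for h(key) = a * key + b

def h(key):
    return A * key + B

# Each bucket is a chain of pages: the primary page, then any overflow pages.
buckets = [[[]] for _ in range(N)]

def insert(key):
    chain = buckets[h(key) % N]
    if len(chain[-1]) == PAGE_CAPACITY:
        chain.append([])          # allocate an overflow page
    chain[-1].append(key)

def lookup(key):
    """Return (found, pages_read); a miss reads every page in the chain."""
    chain = buckets[h(key) % N]
    for pages_read, page in enumerate(chain, start=1):
        if key in page:
            return True, pages_read
    return False, len(chain)

# Multiples of 4 all land in the same bucket with these constants.
for k in [0, 4, 8, 12, 16, 1]:
    insert(k)

print(lookup(16), lookup(1), lookup(20))
```

Because N never changes, a skewed key set keeps lengthening one chain, and each lookup in that bucket reads more pages.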
  36. 36. Issues with Static Hashing<br />Primary pages fixed space ⇒ static structure.<br />Fixed # of buckets is the problem:<br />Rehashing can be done ⇒ not good for search.<br />In practice, instead use overflow chains.<br />Long overflow chains can develop and degrade performance.<br />Solution: Employ dynamic techniques to fix this problem:<br />Extensible hashing, or<br />Linear Hashing<br />
  37. 37. Extensible Hash Index (p1)<br />What to do when a bucket (primary page) becomes full?<br />Why not re-organize file by doubling # of buckets?<br />Because reading and writing all pages is expensive<br />Use directory of pointers to buckets instead of buckets<br />double # of buckets by doubling the directory<br />split just the bucket that overflowed<br />Discussion:<br />+ Directory much smaller than file, so doubling is cheaper. <br />+ Only one page of data entries is split. <br />+ No overflow pages, ever <br />Trick lies in how hash function is adjusted!<br />
  38. 38. Extensible Hash Index (p2)<br />
  39. 39. Extensible Hash Index (p3)<br />
  40. 40. Extensible Hash Index (p4)<br />
  41. 41. Extensible Hash Index (p5)<br />
  42. 42. Extensible Hash Index (p6)<br />
  43. 43. Extensible Hash Index (p7)<br />
  44. 44. Extensible Hash Index Summary<br />Lookup:<br />Global depth:<br /># of bits needed to tell which bucket a datum belongs to<br />Local depth:<br /># of bits needed to tell if a datum belongs in the bucket<br />Insertion:<br />If a bucket has room, add the hash key (seen when adding 6*)<br />If no room, <br />May be able to add a new page and link it in (when adding 9*)<br />May need to double the directory (when adding 20*)<br />How to tell if doubling is necessary?<br />Doubling is necessary if Global Depth = Local Depth of the full bucket<br />
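A sketch of the insertion rule, under stated assumptions: the hash of a key is the key itself, the low-order bits select the directory slot, and buckets hold four keys. The class names and the key sequence are illustrative and not taken from the slides' figures:

```python
class Bucket:
    def __init__(self, depth):
        self.local_depth = depth
        self.keys = []

class ExtendibleHash:
    def __init__(self, capacity=4):
        self.capacity = capacity          # keys per bucket (assumed)
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]

    def _slot(self, key):
        # Assumption: hash(key) == key; the low global_depth bits pick the slot.
        return key & ((1 << self.global_depth) - 1)

    def lookup(self, key):
        return key in self.directory[self._slot(key)].keys

    def insert(self, key):
        # Note: this loops forever if more than `capacity` identical hashes arrive.
        bucket = self.directory[self._slot(key)]
        while len(bucket.keys) >= self.capacity:
            if bucket.local_depth == self.global_depth:
                # Doubling is needed only when local depth == global depth.
                self.directory = self.directory + self.directory
                self.global_depth += 1
            self._split(bucket)
            bucket = self.directory[self._slot(key)]
        bucket.keys.append(key)

    def _split(self, bucket):
        bit = 1 << bucket.local_depth     # the newly significant hash bit
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth)
        image.keys = [k for k in bucket.keys if k & bit]
        bucket.keys = [k for k in bucket.keys if not k & bit]
        # Redirect the directory slots whose new bit is set to the split image.
        for i, b in enumerate(self.directory):
            if b is bucket and i & bit:
                self.directory[i] = image

index = ExtendibleHash()
for k in [4, 12, 32, 16, 1, 5, 21, 13, 10, 15, 7, 19, 20]:
    index.insert(k)
print(index.global_depth, len(index.directory))
```

Only the overflowing bucket is split; every other directory slot keeps pointing at its old bucket, so doubling copies pointers rather than data pages.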
  45. 45. Extensible Hash – Size Considerations<br />Each record in a bucket is<br /><key, pointer> = 8 bytes<br />1000 hashes max in an 8 KB bucket.<br />Assuming 50% bucket occupancy, 500 hashes/bucket<br />Table S<br />200 buckets<br />Directory could be 256 rows, negligible space<br />But only if the hash function distributes evenly<br />Bucket space = 1600 KB = 1.6 MB<br />Chances are buckets will also fit in memory<br />If directory fits in memory, <br />Equality searches answered with one disk access, else two<br />Why is this a misleading statement?<br />
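The Table S figures can be reproduced as follows, assuming decimal kilobytes (8 KB = 8,000 bytes, which gives the slide's 1000 entries per bucket) and a directory size rounded up to a power of two:

```python
import math

ROWS_S = 100_000     # Table S row count
ENTRY_BYTES = 8      # <key, pointer> size from the slide
BUCKET_KB = 8        # bucket (page) size from the slide

# Assumption: decimal KB, so an 8 KB bucket holds 8_000 / 8 = 1000 entries.
entries_max = BUCKET_KB * 1000 // ENTRY_BYTES
entries_avg = entries_max // 2                      # 50% occupancy

buckets = math.ceil(ROWS_S / entries_avg)
# Directory size is a power of two; take the smallest one covering all buckets.
directory_rows = 1 << math.ceil(math.log2(buckets))
bucket_space_kb = buckets * BUCKET_KB

print(entries_max, entries_avg, buckets, directory_rows, bucket_space_kb)
```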
  46. 46. Summary<br />Hash-based indexes: best for equality searches, cannot support range searches.<br />Static Hashing can lead to long overflow chains.<br />Extendible Hashing avoids overflow pages by splitting a full bucket when new data is to be added<br />Directory keeps track of buckets, doubles periodically<br />
  47. 47. Next meeting<br />February 14<br />Parallel and Distributed Databases, Chapter 20.1 – 20.4<br />Selected Papers<br />