
Published in: Business, Technology

  1. 1. CHAPTER 6 Index Structures for Files
  2. 2. Indexes <ul><li>Additional auxiliary access structures which are used to speed up the retrieval of records in response to certain conditions. </li></ul><ul><li>Provide secondary access paths </li></ul>
  3. 3. INDEX – consists of keys and addresses (physical disk locations).
  4. 4. TYPES OF SINGLE-LEVEL ORDERED INDEXES <ul><li>Primary index – The index that controls the current processing order of a file. It is an index on the primary key, which is also the field the file is ordered on. </li></ul><ul><li>Clustering index – Determines how rows are physically ordered (clustered) in a table space. It provides significant performance advantages in some operations, particularly those that involve many records. </li></ul><ul><li>Secondary index – If the search key of a secondary index is not a candidate key, it is not enough to point to just the first record with each search-key value, because the remaining records with the same search-key value could be anywhere in the file. Therefore, a secondary index must contain pointers to all of the records. </li></ul>
  5. 5. PRIMARY INDEXES <ul><li>An ordered file whose records are of fixed length with two fields: a primary key value K(i) and a block pointer P(i). </li></ul><ul><li><K(i), P(i)> </li></ul><ul><li><K(1) = (Aaron, Ed), P(1) = address of block 1> </li></ul><ul><li><K(2) = (Adams, John), P(2) = address of block 2> </li></ul><ul><li><K(3) = (Alexander, Ed), P(3) = address of block 3> </li></ul>
  6. 6. PRIMARY INDEX (figure) <ul><li>DATA FILE – fields NAME (primary key field), SSN, BDATE, JOB, SALARY; names shown: Aaron, Ed; Abbott, Diane; Acosta, Marc; Adams, John; Adams, Robin; Akers, John; Alexander, Ed; Alfred, Bob; Allen, Sam </li></ul><ul><li>INDEX FILE (<K(i), P(i)> entries) – columns BLOCK ANCHOR PRIMARY KEY VALUE and BLOCK POINTER; block anchors shown: Aaron, Ed; Adams, John; Alexander, Ed </li></ul>
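A lookup through a primary index can be sketched as a binary search over the block anchors followed by a scan of a single data block. This is a minimal illustration, not the slides' own code: the names come from the figure, but the grouping of three records per block, the use of list positions as block addresses, and the function name `lookup` are assumptions.

```python
from bisect import bisect_right

# Hypothetical data file: each block holds records sorted on the primary key (NAME).
data_blocks = [
    ["Aaron, Ed", "Abbott, Diane", "Acosta, Marc"],
    ["Adams, John", "Adams, Robin", "Akers, John"],
    ["Alexander, Ed", "Alfred, Bob", "Allen, Sam"],
]

# Index entries <K(i), P(i)>: K(i) is the block anchor (first key of block i),
# P(i) is the block address (assumed here to be the position in data_blocks).
index = [(block[0], addr) for addr, block in enumerate(data_blocks)]
anchors = [k for k, _ in index]


def lookup(key):
    """Return (block address, found) using binary search on the block anchors."""
    i = bisect_right(anchors, key) - 1   # last anchor that is <= key
    if i < 0:
        return None, False               # key sorts before the first anchor
    addr = index[i][1]
    return addr, key in data_blocks[addr]
```

For instance, `lookup("Akers, John")` lands on the block anchored by "Adams, John", because that is the last anchor not greater than the search key. Only one index entry per block is needed, which is what makes this index nondense.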
  7. 7. <ul><li>Indexes can be characterized as: </li></ul><ul><li>DENSE – An index record appears for every search key value in the file. This record contains the search key value and a pointer to the actual record. </li></ul><ul><li>SPARSE (NON-DENSE) – Index records are created only for some of the search key values. </li></ul>
  10. 10. The index file for a primary index needs substantially fewer blocks than does the data file, for two reasons. <ul><li>There are fewer index entries than there are records in the data file. </li></ul><ul><li>Each index entry is typically smaller in size than a data record. </li></ul>
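The effect of these two reasons can be made concrete with a small calculation. All numbers below (block size, record count, record size, key and pointer sizes) are hypothetical values chosen for illustration, not figures from the slides.

```python
from math import ceil, log2

# Hypothetical parameters (assumptions for illustration only).
B = 1024             # block size in bytes
r = 30_000           # number of records in the data file
R = 100              # record size in bytes
entry_size = 9 + 6   # index entry: 9-byte key + 6-byte block pointer

# Data file: records per block, number of blocks, binary-search cost.
bfr = B // R                         # 10 records per block
b = ceil(r / bfr)                    # 3000 data blocks
data_search = ceil(log2(b))          # 12 block accesses

# Primary index: one entry per data block, and entries are small.
bfr_index = B // entry_size          # 68 entries per index block
index_blocks = ceil(b / bfr_index)   # 45 index blocks
index_search = ceil(log2(index_blocks)) + 1   # 6 index accesses + 1 data block = 7
```

Under these assumptions the index occupies 45 blocks against 3000 for the data file, and a search costs 7 block accesses instead of 12.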
  11. 11. CLUSTERING INDEXES <ul><li>An ordered file with two fields; the first field is of the same type as the clustering field of the data file, and the second field is a block pointer. </li></ul>
  13. 13. SECONDARY INDEXES <ul><li>The first field is of the same data type as some non-ordering field of the data file that is an indexing field. </li></ul><ul><li>The second field is either a block pointer or a record pointer. </li></ul>
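A secondary index on a nonkey field can be sketched as a sorted mapping from each field value to the list of record pointers that carry it. The record contents, the JOB values, and the helper name `build_secondary_index` are assumptions made for this sketch; record pointers are modelled as positions in a Python list.

```python
from collections import defaultdict

# Hypothetical data file, NOT ordered on the indexing field "job".
records = [
    {"name": "Adams, John", "job": "Clerk"},
    {"name": "Aaron, Ed", "job": "Engineer"},
    {"name": "Allen, Sam", "job": "Clerk"},
    {"name": "Akers, John", "job": "Analyst"},
]


def build_secondary_index(records, field):
    """Map each distinct field value to the positions of ALL matching records."""
    index = defaultdict(list)
    for pos, rec in enumerate(records):
        index[rec[field]].append(pos)      # pos plays the role of a record pointer
    return dict(sorted(index.items()))     # keep the index ordered on the field


job_index = build_secondary_index(records, "job")
clerks = [records[p]["name"] for p in job_index["Clerk"]]
```

Because the data file is not ordered on the indexing field, the index has to keep a pointer for every matching record rather than just one per block.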
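  15. 15. TYPES OF INDEXES <ul><li>Ordering field, key field: Primary index </li></ul><ul><li>Ordering field, nonkey field: Clustering index </li></ul><ul><li>Nonordering field, key field: Secondary index (key) </li></ul><ul><li>Nonordering field, nonkey field: Secondary index (nonkey) </li></ul>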
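  16. 16. PROPERTIES OF INDEX TYPES <ul><li>Primary – number of (first-level) index entries: number of blocks in data file; dense or nondense: nondense; block anchoring on the data file: yes </li></ul><ul><li>Clustering – number of (first-level) index entries: number of distinct index field values; dense or nondense: nondense; block anchoring on the data file: yes/no </li></ul><ul><li>Secondary (key) – number of (first-level) index entries: number of records in data file; dense or nondense: dense; block anchoring on the data file: no </li></ul><ul><li>Secondary (nonkey) – number of (first-level) index entries: number of records or number of distinct index field values; dense or nondense: dense or nondense; block anchoring on the data file: no </li></ul>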
  17. 17. MULTILEVEL INDEXES <ul><li>Multilevel indexes can be constructed to improve the efficiency of searching an index. </li></ul><ul><li>When an index is too large to search efficiently, it is split into a number of separate indexes, and a further index is built over these indexes. This can be repeated, so several levels of index may exist. </li></ul>
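  18. 18. MULTILEVEL INDEX (figure: three index levels of (Address(Index), High Key) pairs above a lowest level of (Address, Key) pairs) <ul><li>Top level: (1, 1000), (2, 1500), (3, 2000), (4, 2500), (5, 3000) </li></ul><ul><li>Second level: (10, 2100), (11, 2200), (12, 2300), (13, 2400), (14, 2500) </li></ul><ul><li>Third level: (60, 2310), (61, 2320), (62, 2330), (63, 2340), (64, 2350), (65, 2360), (66, 2370), (67, 2380), (68, 2390), (69, 2400) </li></ul><ul><li>Lowest level (Address, Key): (781040, 2351), (781041, 2352), (781042, 2353), (781043, 2354), (781044, 2355), (781045, 2356), (781046, 2357), (781047, 2358), (781048, 2359), (781049, 2360) </li></ul>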
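The high-key scheme in the figure can be sketched as follows: every index entry stores the highest key reachable through it, and a search follows the first entry whose high key is not smaller than the search key. This is a simplified in-memory sketch; the function names, the fan-out of 10, and the key range 1 to 1000 are assumptions, and block addresses are modelled as list positions.

```python
from bisect import bisect_left


def build_multilevel_index(keys, fanout):
    """Return (data_blocks, levels) for sorted keys; levels[0] is the top level."""
    if not keys or fanout < 2:
        raise ValueError("need at least one key and a fan-out of 2 or more")
    data_blocks = [keys[i:i + fanout] for i in range(0, len(keys), fanout)]
    # First-level entries: (high key of the data block, data block position).
    entries = [(block[-1], pos) for pos, block in enumerate(data_blocks)]
    levels = []
    while True:
        index_blocks = [entries[i:i + fanout] for i in range(0, len(entries), fanout)]
        levels.append(index_blocks)
        if len(index_blocks) == 1:        # a single block: this is the top level
            break
        # Next level up: (high key of the index block, index block position).
        entries = [(block[-1][0], pos) for pos, block in enumerate(index_blocks)]
    levels.reverse()
    return data_blocks, levels


def search(key, data_blocks, levels):
    """Return (found, number of blocks read)."""
    pos, reads = 0, 0
    for level in levels:
        block = level[pos]
        reads += 1
        i = bisect_left([high for high, _ in block], key)  # first high key >= key
        if i == len(block):
            return False, reads           # key exceeds every high key
        pos = block[i][1]
    reads += 1                            # finally read the data block
    return key in data_blocks[pos], reads


data_blocks, levels = build_multilevel_index(list(range(1, 1001)), fanout=10)
```

With 1000 keys and a fan-out of 10 there are two index levels, so `search(357, data_blocks, levels)` reads two index blocks and one data block.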
  19. 19. INDEXED SEQUENTIAL FILE <ul><li>Files are ordered sequentially on some search key, and a primary index is associated with it. </li></ul><ul><li>Indexed sequential files are important for applications where data needs to be accessed in both of the following ways: </li></ul><ul><li>1. sequentially </li></ul><ul><li>2. randomly, using the index. </li></ul><ul><li>An indexed sequential file allows fast access to a specific record. </li></ul>
  20. 20. EXAMPLE: A company may store details about its employees as an indexed sequential file. Sometimes the file is accessed… <ul><li>Sequentially. For example, when the whole of the file is processed to produce pay slips at the end of the month. </li></ul><ul><li>Randomly. For example, when an employee changes address, or a female employee gets married and changes her surname. </li></ul>
  21. 21. DYNAMIC MULTILEVEL INDEXES USING B-TREES AND B+ TREES <ul><li>B-trees and B+ trees are special cases of the well-known tree data structure. </li></ul><ul><li>Rudolf Bayer and Ed McCreight – the B-tree’s creators </li></ul><ul><li>B is often said to stand for balanced, as all the leaf nodes are at the same level in the tree. </li></ul><ul><li>B may also stand for Bayer, Branching Tree, or Boeing, because the creators were working for Boeing Scientific Research Labs at that time. </li></ul>
  22. 22. <ul><li>TREE </li></ul><ul><li>ROOT </li></ul><ul><li>LEAF NODE </li></ul><ul><li>INTERNAL NODE </li></ul>
  23. 23. (Figure) A tree data structure that shows an unbalanced tree with nodes A through K: root node A at level 0, with further nodes at levels 1, 2, and 3. The subtree for node B is marked. Nodes E, J, C, G, H, and K are leaf nodes of the tree.
  24. 24. SEARCH TREES AND B-TREES <ul><li>SEARCH TREE – a special type of tree that is used to guide the search for a record, given the value of one of the record’s fields. </li></ul>
  25. 25. <ul><li>A binary search tree of size 9 and depth 3, with root 8 and leaves 1, 4, 7 and 13 </li></ul><ul><li>Binary search tree ( BST ) is a binary tree data structure which has the following properties: </li></ul><ul><li>each node (item in the tree) has a value; </li></ul><ul><li>a total order (linear order) is defined on these values; </li></ul><ul><li>the left subtree of a node contains only values less than the node's value; </li></ul><ul><li>the right subtree of a node contains only values greater than or equal to the node's value. </li></ul>
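The properties listed above translate almost directly into code. The following is a minimal sketch, not a production implementation; the insertion order is an assumption chosen so that the resulting tree matches the caption (size 9, depth 3, root 8, leaves 1, 4, 7 and 13).

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def insert(root, value):
    """Insert value; smaller values go left, greater-or-equal values go right."""
    if root is None:
        return Node(value)
    if value < root.value:
        root.left = insert(root.left, value)
    else:
        root.right = insert(root.right, value)
    return root


def contains(root, value):
    """Follow one root-to-leaf path, choosing a side at each node."""
    while root is not None:
        if value == root.value:
            return True
        root = root.left if value < root.value else root.right
    return False


def leaves(root):
    """Leaf values from left to right."""
    if root is None:
        return []
    if root.left is None and root.right is None:
        return [root.value]
    return leaves(root.left) + leaves(root.right)


def depth(root):
    """Number of edges on the longest root-to-leaf path (-1 for an empty tree)."""
    if root is None:
        return -1
    return 1 + max(depth(root.left), depth(root.right))


root = None
for v in [8, 3, 10, 1, 6, 14, 4, 7, 13]:   # assumed insertion order
    root = insert(root, v)
```

Note that nothing here keeps the tree balanced: inserting the same values in sorted order would produce a chain of depth 8, which is the weakness that B-trees address.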
  26. 26. B-TREE <ul><li>A B-tree is a tree data structure that keeps data sorted and allows searches, insertions, and deletions in logarithmic time. It is most commonly used in databases and filesystems. </li></ul>
  27. 27. A simple B tree example.
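Searching a B-tree can be sketched as a binary search inside each node followed by a descent into one child. The tree below is built by hand and is purely illustrative (its keys are assumptions, not the tree from the slide's figure); insertion and node splitting are left out to keep the sketch short.

```python
from bisect import bisect_left


class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                  # sorted keys stored in this node
        self.children = children or []    # empty list for a leaf node


def btree_search(node, key):
    """Return (found, number of nodes visited)."""
    visited = 0
    while node is not None:
        visited += 1
        i = bisect_left(node.keys, key)   # position of the first key >= search key
        if i < len(node.keys) and node.keys[i] == key:
            return True, visited
        if not node.children:             # reached a leaf without a match
            return False, visited
        node = node.children[i]           # subtree between keys[i-1] and keys[i]
    return False, visited


# Hand-built example: all leaves are at the same level.
root = BTreeNode(
    [10, 20],
    [BTreeNode([3, 7]), BTreeNode([13, 17]), BTreeNode([25, 30])],
)
```

Each node visited corresponds to one block read, so the search cost is bounded by the height of the tree.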
  28. 28. B+ TREES <ul><li>A B+ tree is a type of tree which represents sorted data in a way that allows for efficient insertion, retrieval and removal of records, each of which is identified by a key. </li></ul><ul><li>It is a dynamic, multilevel index, with maximum and minimum bounds on the number of keys in each index segment (usually called a 'block' or 'node'). </li></ul>
  29. 29. A simple B+ tree example linking the keys 1-7 to data values d1-d7. Note the linked list (red) allowing rapid in-order traversal.
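The role of the linked list of leaves can be sketched with a range scan: descend once to the leaf that could hold the lower bound, then follow the `next` pointers. The keys 1-7 and values d1-d7 follow the caption; how the keys are split across leaves, the root keys `[3, 5]`, and the class and function names are assumptions for this sketch.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys, values):
        self.keys, self.values = keys, values
        self.next = None                  # link to the next leaf in key order


class Internal:
    def __init__(self, keys, children):
        self.keys, self.children = keys, children


def find_leaf(node, key):
    """Descend from the root to the leaf that could contain key."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node


def range_scan(root, lo, hi):
    """Collect the values for all keys in [lo, hi] by walking the leaf chain."""
    leaf, out = find_leaf(root, lo), []
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > hi:
                return out
            if k >= lo:
                out.append(v)
        leaf = leaf.next
    return out


# Assumed layout: three leaves under a single root node.
l1, l2, l3 = Leaf([1, 2], ["d1", "d2"]), Leaf([3, 4], ["d3", "d4"]), Leaf([5, 6, 7], ["d5", "d6", "d7"])
l1.next, l2.next = l2, l3
root = Internal([3, 5], [l1, l2, l3])
```

Only the first leaf is located through the index; the rest of the range is read sequentially, which is why data values live only in the leaves of a B+ tree.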
  30. 30. INDEXES ON MULTIPLE KEYS <ul><li>So far, the primary or secondary keys on which files were accessed have been single attributes (fields). </li></ul><ul><li>In many retrieval and update requests, multiple attributes are involved. If a certain combination of attributes is used very frequently, it is advantageous to set up an access structure to provide efficient access by a key value that is a combination of those attributes. </li></ul>
  31. 31. PARTITIONED HASHING <ul><li>An extension of static external hashing that allows access on multiple keys. </li></ul><ul><li>It is suitable only for equality comparisons. </li></ul>
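One way to picture partitioned hashing: each key component is hashed to a few bits, and the bit groups are concatenated into the bucket address. The attribute names (`dno`, `age`), the bit widths, and the toy hash function below are assumptions for illustration only.

```python
from itertools import product

BITS = (3, 2)   # hypothetical: 3 address bits for dno, 2 address bits for age


def part_hash(value, bits):
    """Toy deterministic hash of one key component to the given number of bits."""
    return sum(str(value).encode()) % (1 << bits)


def bucket_address(dno, age):
    """Concatenate the two partial hash values into one bucket address."""
    return (part_hash(dno, BITS[0]) << BITS[1]) | part_hash(age, BITS[1])


def candidate_buckets(dno=None, age=None):
    """Buckets to examine for an equality condition on one or both components."""
    first = [part_hash(dno, BITS[0])] if dno is not None else range(1 << BITS[0])
    second = [part_hash(age, BITS[1])] if age is not None else range(1 << BITS[1])
    return [(f << BITS[1]) | s for f, s in product(first, second)]
```

An equality condition on both components pins down a single bucket, while a condition on `dno` alone still narrows the search to 4 of the 32 buckets. Range conditions gain nothing, because hashing does not preserve key order.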
  32. 32. GRID FILES <ul><li>A grid file is a multidimensional array, normally held on disk, and used as an index into a database of information. </li></ul><ul><li>Grid files perform well at reducing access time when records are looked up on multiple keys. </li></ul>
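A two-dimensional grid file can be sketched with one linear scale per attribute and a grid array whose cells point to buckets. The attributes (`dno`, `age`), the scale boundaries, and the in-memory lists standing in for disk buckets are all assumptions for this sketch.

```python
from bisect import bisect_right

# Hypothetical linear scales: boundaries that split each attribute into ranges.
dno_scale = [3, 5, 7]       # ranges: below 3, 3-4, 5-6, 7 and above
age_scale = [30, 40, 50]    # ranges: below 30, 30-39, 40-49, 50 and above

# Grid array: one bucket (here a plain list) per combination of ranges.
grid = [[[] for _ in range(len(age_scale) + 1)] for _ in range(len(dno_scale) + 1)]


def cell(dno, age):
    """Map a (dno, age) pair to its grid cell using the linear scales."""
    return bisect_right(dno_scale, dno), bisect_right(age_scale, age)


def insert(record):
    i, j = cell(record["dno"], record["age"])
    grid[i][j].append(record)


def lookup(dno, age):
    """Read only the one bucket that can hold matching records."""
    i, j = cell(dno, age)
    return [r for r in grid[i][j] if r["dno"] == dno and r["age"] == age]


for rec in [{"dno": 4, "age": 59}, {"dno": 4, "age": 25}, {"dno": 8, "age": 59}]:
    insert(rec)
```

Unlike partitioned hashing, the scales preserve order, so a range condition on either attribute maps to a contiguous strip of cells.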
  33. 33. OTHER TYPES OF INDEXES <ul><li>Using hashing and other data structures as indexes </li></ul><ul><li>Logical versus physical indexes </li></ul>
  34. 34. <ul><li>An index is often called an access structure. </li></ul><ul><li>A secondary index provides an additional access path without requiring the records in the data file on disk to be physically ordered on the indexing field. </li></ul>
  35. 35. <ul><li>Fully inverted file – a file that has a secondary index on every one of its fields </li></ul><ul><li>Virtual Storage Access Method (VSAM) – IBM file organization that is similar to the B+ tree structure. </li></ul>
  36. 36. <ul><li>Thank You! </li></ul><ul><li>Reported by: </li></ul><ul><li>Myrtle P. Bautista </li></ul><ul><li>BIT07B1 </li></ul>