Indexes are additional auxiliary access structures used to speed up the retrieval of records in response to certain search conditions.
They provide secondary access paths to the records.
INDEX – consists of keys and addresses (physical disk locations).
TYPES OF SINGLE-LEVEL ORDERED INDEXES
Primary index – The index that controls the current processing order of a file. It is built on the primary key, which is also the field the file is ordered on.
Clustering index – determines how rows are physically ordered (clustered) in a table space. It provides significant performance advantages in some operations, particularly those that involve many records.
Secondary index – If the search key of a secondary index is not a candidate key, it is not enough to point to just the first record with each search-key value, because the remaining records with the same search-key value could be anywhere in the file. Therefore, a secondary index must contain pointers to all of the records.
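The point about non-candidate keys can be illustrated with a minimal Python sketch. It assumes an in-memory list stands in for the data file and that a list position stands in for a record address. The JOB values and the helper name build_secondary_index are hypothetical.

```python
from collections import defaultdict

# Hypothetical data file: the list position stands in for a record address.
records = [
    {"name": "Aaron, Ed", "job": "Clerk"},
    {"name": "Adams, John", "job": "Engineer"},
    {"name": "Akers, John", "job": "Clerk"},
    {"name": "Allen, Sam", "job": "Engineer"},
    {"name": "Alfred, Bob", "job": "Clerk"},
]


def build_secondary_index(rows, field):
    """Map each search-key value to the addresses of ALL matching records."""
    index = defaultdict(list)
    for address, row in enumerate(rows):
        index[row[field]].append(address)
    return dict(index)


# JOB is not a candidate key, so one key value leads to several records
# scattered through the file.
job_index = build_secondary_index(records, "job")
clerks = [records[address]["name"] for address in job_index["Clerk"]]
print(clerks)
```

Pointing only at the first "Clerk" record would miss the other two, because the file is not ordered on JOB.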
A primary index is an ordered file whose records are of fixed length with two fields: a key value K(i) and a block pointer P(i).
<K(1) = (Aaron, Ed), P(1) = address of block 1>
<K(2) = (Adams, John), P(2) = address of block 2>
<K(3) = (Alexander, Ed), P(3) = address of block 3>
[Figure: PRIMARY INDEX. The data file is ordered on NAME (primary key field) and also holds SSN, BDATE, JOB and SALARY. The index file holds <K(i), P(i)> entries: a block anchor primary key value (Aaron, Ed; Adams, John; Alexander, Ed; ...) and a block pointer.]
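A lookup through a primary index of this kind can be sketched in Python as follows. This is only an illustration: the names come from the figure, but the grouping into blocks of three records and the function names find_block and lookup are assumptions.

```python
from bisect import bisect_right

# Data file ordered on NAME; three records per block is an assumed block size.
blocks = [
    ["Aaron, Ed", "Abbott, Diane", "Acosta, Marc"],
    ["Adams, John", "Adams, Robin", "Akers, John"],
    ["Alexander, Ed", "Alfred, Bob", "Allen, Sam"],
]

# One <K(i), P(i)> entry per block: the anchor key; its position is the block number.
anchors = [block[0] for block in blocks]


def find_block(key):
    """Return the number of the only block that can hold `key`, or None."""
    i = bisect_right(anchors, key) - 1  # last anchor that is <= key
    return i if i >= 0 else None


def lookup(key):
    """Binary-search the index, then search inside a single data block."""
    block_no = find_block(key)
    return block_no is not None and key in blocks[block_no]


print(find_block("Akers, John"), lookup("Akers, John"))
```

Only the index and one data block are examined, regardless of how many blocks the data file has.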
Indexes can be characterized as:
DENSE – An index record appears for every search-key value in the file. This record contains the search-key value and a pointer to the actual record.
SPARSE (NON-DENSE) – Index records are created only for some of the records.
SPARSE OR NON-DENSE INDEX
The index file for a primary index needs substantially fewer blocks than does the data file, for two reasons.
There are fewer index entries than there are records in the data file.
Each index entry is typically smaller in size than a data record.
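The two reasons can be checked with a short worked calculation. All sizes below (block size, record size, key and pointer sizes, number of records) are assumed figures chosen for illustration, not values from these notes.

```python
import math

# Assumed, illustrative sizes in bytes.
BLOCK_SIZE = 1024
NUM_RECORDS = 30_000
RECORD_SIZE = 100
KEY_SIZE = 9
POINTER_SIZE = 6

records_per_block = BLOCK_SIZE // RECORD_SIZE              # 10
data_blocks = math.ceil(NUM_RECORDS / records_per_block)   # 3000

# Reason 1: one index entry per data block, not per record.
index_entries = data_blocks
# Reason 2: each entry (key + pointer) is much smaller than a record.
entries_per_block = BLOCK_SIZE // (KEY_SIZE + POINTER_SIZE)  # 68
index_blocks = math.ceil(index_entries / entries_per_block)  # 45

# Block accesses for a binary search on the data file alone,
# versus a binary search on the index plus one data block access.
data_only = math.ceil(math.log2(data_blocks))        # 12
with_index = math.ceil(math.log2(index_blocks)) + 1  # 7

print(data_blocks, index_blocks, data_only, with_index)
```

Under these assumptions the index occupies 45 blocks against 3000 for the data file, and a lookup drops from 12 block accesses to 7.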
A clustering index is an ordered file with two fields; the first field is of the same type as the clustering field of the data file, and the second field is a block pointer.
In a secondary index, the first field is of the same data type as some non-ordering field of the data file that is an indexing field.
The second field is either a block pointer or a record pointer.
TYPES OF INDEXES
                  ORDERING FIELD      NON-ORDERING FIELD
Key field         Primary index       Secondary index (key)
Nonkey field      Clustering index    Secondary index (nonkey)
PROPERTIES OF INDEX TYPES
TYPE OF INDEX        NUMBER OF (FIRST-LEVEL) INDEX ENTRIES                         DENSE OR NONDENSE    BLOCK ANCHORING ON THE DATA FILE
Primary              Number of blocks in data file                                 Nondense             Yes
Clustering           Number of distinct index field values                         Nondense             Yes/No
Secondary (key)      Number of records in data file                                Dense                No
Secondary (nonkey)   Number of records or number of distinct index field values    Dense or nondense    No
Multilevel indexes can be constructed to improve the efficiency of searching an index.
When the index is too large to search efficiently, it is split into a number of separate indexes, and a further index is built over these indexes. This can be repeated, so a number of different levels of index may exist.
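The idea of an index over an index can be sketched in Python. This is a simplified in-memory model, assuming a non-empty set of keys and a fixed number of entries per index block. The names chunk, build_levels, search and fan_out are hypothetical.

```python
from bisect import bisect_right


def chunk(entries, size):
    """Split a sorted list of entries into index blocks of at most `size` entries."""
    return [entries[i:i + size] for i in range(0, len(entries), size)]


def build_levels(keys, fan_out):
    """Return all index levels; levels[0] is the first level, levels[-1] the top."""
    levels = [chunk(sorted(keys), fan_out)]
    while len(levels[-1]) > 1:
        # The next level holds one entry (the first key) per block below it.
        anchors = [block[0] for block in levels[-1]]
        levels.append(chunk(anchors, fan_out))
    return levels


def search(levels, key, fan_out):
    """Walk from the top level down; return (index blocks read, found)."""
    block_no, blocks_read = 0, 0
    for level in reversed(levels):
        block = level[block_no]
        blocks_read += 1
        slot = bisect_right(block, key) - 1  # last entry that is <= key
        if slot < 0:
            return blocks_read, False
        if level is levels[0]:
            return blocks_read, block[slot] == key
        # Entry number within this level is the block number one level down.
        block_no = block_no * fan_out + slot
    return blocks_read, False


levels = build_levels(range(0, 200, 2), fan_out=4)
print(len(levels), search(levels, 86, fan_out=4))
```

With 100 keys and 4 entries per block, the sketch builds 4 levels and reads one block per level, instead of binary-searching 25 first-level blocks.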
Files are ordered sequentially on some search key, and a primary index is associated with it.
Indexed sequential files are important for applications where data needs to be accessed:
1. sequentially.
2. randomly using the index.
An indexed sequential file allows fast access to a specific record.
EXAMPLE: A company may store details about its employees as an indexed sequential file. Sometimes the file is accessed…
Sequentially. For example, when the whole of the file is processed to produce pay slips at the end of the month.
Randomly. For example, when an employee changes address, or changes surname after getting married, and only that employee's record is updated.
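Both access patterns in the employee example can be sketched in Python. This is a minimal in-memory illustration and not a real file organization: the class name IndexedSequentialFile, the field names and all employee data are hypothetical.

```python
from bisect import bisect_left


class IndexedSequentialFile:
    """Records kept in key order, plus an index over the key values."""

    def __init__(self, records, key):
        self.records = sorted(records, key=lambda r: r[key])
        # Dense index on the ordering key (assumed small enough to keep in memory).
        self.keys = [r[key] for r in self.records]

    def scan(self):
        """Sequential access: yield every record in key order."""
        yield from self.records

    def get(self, value):
        """Random access: binary-search the index, then fetch one record."""
        i = bisect_left(self.keys, value)
        if i < len(self.keys) and self.keys[i] == value:
            return self.records[i]
        return None


employees = [
    {"emp_no": 103, "name": "Adams, Robin", "salary": 4100, "address": "12 Elm St"},
    {"emp_no": 101, "name": "Aaron, Ed", "salary": 3900, "address": "3 Main St"},
    {"emp_no": 102, "name": "Akers, John", "salary": 4500, "address": "9 Hill Rd"},
]
emp_file = IndexedSequentialFile(employees, "emp_no")

# Sequential: process the whole file for the monthly pay slips.
pay_slips = [(r["name"], r["salary"]) for r in emp_file.scan()]

# Random: one employee changes address.
emp_file.get(102)["address"] = "7 Oak Ave"
```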
DYNAMIC MULTILEVEL INDEXES USING B-TREES AND B+ TREES
B-trees and B+ trees are special cases of the well-known tree data structure.
Rudolf Bayer and Ed McCreight – B-tree’s creators
B stands for balanced, as all the leaf nodes are at the same level in the tree.
B may also stand for Bayer, for Branching Tree, or for Boeing, because the creators were working for Boeing Scientific Research Labs at that time.
[Figure: A tree data structure that shows an unbalanced tree. Root node A at level 0, with further nodes at levels 1, 2 and 3; the subtree for node B is marked. Nodes E, J, C, G, H, and K are leaf nodes of the tree.]
SEARCH TREES AND B-TREES
SEARCH TREE – a special type of tree that is used to guide the search for a record, given the value of one of the record's fields.
A binary search tree of size 9 and depth 3, with root 8 and leaves 1, 4, 7 and 13
A binary search tree (BST) is a binary tree data structure which has the following properties:
each node (item in the tree) has a value;
a total order (linear order) is defined on these values;
the left subtree of a node contains only values less than the node's value;
the right subtree of a node contains only values greater than or equal to the node's value.
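The four properties translate directly into code. The Python sketch below is a minimal, unbalanced BST. The insertion order 8, 3, 10, 1, 6, 14, 4, 7, 13 is an assumption chosen so that the result matches the tree described above (size 9, depth 3, root 8, leaves 1, 4, 7 and 13).

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def insert(root, value):
    """Insert `value`, keeping smaller values left and greater-or-equal values right."""
    if root is None:
        return Node(value)
    if value < root.value:
        root.left = insert(root.left, value)
    else:
        root.right = insert(root.right, value)
    return root


def contains(root, value):
    """Follow one root-to-leaf path, guided by the ordering property."""
    while root is not None:
        if value == root.value:
            return True
        root = root.left if value < root.value else root.right
    return False


def leaves(root):
    """Leaf values from left to right."""
    if root is None:
        return []
    if root.left is None and root.right is None:
        return [root.value]
    return leaves(root.left) + leaves(root.right)


def depth(root):
    """Number of edges on the longest root-to-leaf path (-1 for an empty tree)."""
    if root is None:
        return -1
    return 1 + max(depth(root.left), depth(root.right))


root = None
for v in [8, 3, 10, 1, 6, 14, 4, 7, 13]:
    root = insert(root, v)

print(root.value, leaves(root), depth(root))
```

Nothing in this sketch rebalances the tree, so inserting keys in sorted order would degrade it to a linked list; this is the weakness that B-trees address.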
A B-tree is a tree data structure that keeps data sorted and allows searches, insertions, and deletions in logarithmic amortized time. It is most commonly used in databases and filesystems.
A simple B tree example.
A B+ tree is a type of tree which represents sorted data in a way that allows for efficient insertion, retrieval and removal of records, each of which is identified by a key.
It is a dynamic, multilevel index, with maximum and minimum bounds on the number of keys in each index segment (usually called a 'block' or 'node').
A simple B+ tree example linking the keys 1-7 to data values d1-d7. Note the linked list (red) allowing rapid in-order traversal.
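Searching such a tree and following the leaf links can be sketched in Python. The tree below is built by hand for keys 1-7 and values d1-d7; the split of keys across nodes (root keys 3 and 5) is an assumption. Insertion, node splitting and deletion are deliberately left out, so this covers the lookup path only.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys, values):
        self.keys, self.values = keys, values
        self.next = None  # link to the next leaf, for in-order traversal


class Internal:
    def __init__(self, keys, children):
        # children[i] holds keys below keys[i]; the last child holds the rest.
        self.keys, self.children = keys, children


def find_leaf(node, key):
    """Descend from the root to the leaf that would contain `key`."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node


def get(root, key):
    leaf = find_leaf(root, key)
    if key in leaf.keys:
        return leaf.values[leaf.keys.index(key)]
    return None


def range_scan(root, low, high):
    """Collect values with low <= key <= high by walking the leaf chain."""
    leaf, out = find_leaf(root, low), []
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > high:
                return out
            if k >= low:
                out.append(v)
        leaf = leaf.next
    return out


# Hand-built tree: all data values sit in the leaves, which are chained together.
l1 = Leaf([1, 2], ["d1", "d2"])
l2 = Leaf([3, 4], ["d3", "d4"])
l3 = Leaf([5, 6, 7], ["d5", "d6", "d7"])
l1.next, l2.next = l2, l3
root = Internal([3, 5], [l1, l2, l3])

print(get(root, 4), range_scan(root, 2, 5))
```

The range scan descends the tree once and then moves sideways through the leaves, which is what makes in-order traversal fast.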
INDEXES ON MULTIPLE KEYS
So far, the primary or secondary keys on which files were accessed have been single attributes (fields).
In many retrieval and update requests, multiple attributes are involved. If a certain combination of attributes is used very frequently, it is advantageous to set up an access structure to provide efficient access by a key value that is a combination of those attributes.
Partitioned hashing – an extension of static external hashing that allows access on multiple keys.
It is suitable only for equality comparisons.
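A minimal Python sketch of the idea: each component of the key is hashed separately and the partial addresses are concatenated into one bucket address. The attribute names (dno, age), the bit widths and the modulo hash functions are all assumptions made for illustration.

```python
from collections import defaultdict

DNO_BITS, AGE_BITS = 3, 5  # assumed split of an 8-bit bucket address


def h_dno(dno):
    return dno % (1 << DNO_BITS)


def h_age(age):
    return age % (1 << AGE_BITS)


def bucket_address(dno, age):
    """Concatenate the two partial hash addresses into one bucket address."""
    return (h_dno(dno) << AGE_BITS) | h_age(age)


buckets = defaultdict(list)


def insert(record):
    buckets[bucket_address(record["dno"], record["age"])].append(record)


def search(dno=None, age=None):
    """Equality search on dno, on age, or on both."""
    # A known attribute fixes its part of the address; an unknown one
    # means every value of that part has to be tried.
    dno_parts = [h_dno(dno)] if dno is not None else range(1 << DNO_BITS)
    age_parts = [h_age(age)] if age is not None else range(1 << AGE_BITS)
    hits = []
    for d in dno_parts:
        for a in age_parts:
            for rec in buckets.get((d << AGE_BITS) | a, []):
                if (dno is None or rec["dno"] == dno) and (age is None or rec["age"] == age):
                    hits.append(rec)
    return hits


for rec in [
    {"name": "A", "dno": 4, "age": 59},
    {"name": "B", "dno": 4, "age": 30},
    {"name": "C", "dno": 5, "age": 59},
    {"name": "D", "dno": 12, "age": 59},  # collides with dno 4 in the dno part
]:
    insert(rec)

print(sorted(r["name"] for r in search(dno=4)))
```

Hashing scatters neighbouring values across buckets, so a range condition such as age > 50 cannot be narrowed to a few bucket addresses; this is why the method suits equality comparisons only.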
A grid file is a multidimensional array, normally held on disk, and used as an index into a database of information.
Grid files perform well in terms of reduction in time for multiple key access.
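The structure can be sketched in Python as a simplified in-memory model. A real grid file lives on disk and adjusts its partitioning as data grows; here the split points are fixed up front. The class name GridFile, the split points and the (dno, age) data are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from itertools import product


class GridFile:
    def __init__(self, scales):
        self.scales = scales            # one sorted list of split points per dimension
        self.cells = defaultdict(list)  # cell coordinates -> bucket of (point, record)

    def _cell(self, point):
        """Map each attribute value to its interval number on that dimension's scale."""
        return tuple(bisect_right(s, v) for s, v in zip(self.scales, point))

    def insert(self, point, record):
        self.cells[self._cell(point)].append((point, record))

    def lookup(self, point):
        """Exact match on all attributes: inspect a single cell."""
        return [r for p, r in self.cells.get(self._cell(point), []) if p == point]

    def range_query(self, low, high):
        """Inspect only the cells overlapping the box low..high (inclusive)."""
        lo, hi = self._cell(low), self._cell(high)
        hits = []
        for cell in product(*(range(a, b + 1) for a, b in zip(lo, hi))):
            for p, r in self.cells.get(cell, []):
                if all(l <= v <= h for l, v, h in zip(low, p, high)):
                    hits.append(r)
        return hits


# Two dimensions: (dno, age), with assumed split points.
grid = GridFile(scales=[[3, 5], [30, 40, 50]])
grid.insert((4, 59), "A")
grid.insert((4, 30), "B")
grid.insert((5, 45), "C")
grid.insert((1, 25), "D")

print(grid.lookup((4, 59)), grid.range_query((4, 30), (5, 50)))
```

Unlike partitioned hashing, the scales preserve order, so range conditions on several attributes narrow the search to a block of neighbouring cells.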
OTHER TYPES OF INDEXES
Using hashing and other data structures as indexes
Logical versus physical indexes
An index is often called an access structure.
A secondary index provides an access path on a field without requiring the records in the data file to be physically ordered on that field on disk.
Fully inverted file – a file that has a secondary index on every one of its fields
Virtual Storage Access Method (VSAM) – IBM file organization that is similar to the B+ tree structure.