Indexing and Hashing
Md. Golam Moazzam, Dept. of CSE, JU, 7/24/2017
OUTLINE
 Indexing: Basic Concepts
 Evaluation Factors
 Ordered Indices: Primary and Secondary
 Dense and Sparse indices
 Multilevel Indexing
 B+ Tree Index Files
 B-Tree Index Files
 Hashing
 Hash File Organization
 Handling of Bucket Overflows
 Open and Closed hashing
 Hash Indices
 Database Index
A data structure that improves the speed of data retrieval operations on a
database table at the cost of slower writes and the use of more storage
space.
 Basic Concept
An index for a file in a database system works in much the same way as the
index of a textbook. If we want to learn about a particular topic, we can
search for the topic in the index at the back of the book, find the pages
where it occurs, and then read the pages to find the information we are
looking for. The words in the index are in sorted order, making it easy to
find the word we are looking for. Moreover, the index is much smaller than
the book, further reducing the effort needed to find the words we are
looking for.
 Types of Indices
There are two basic types of indices:
– Ordered Indices
– Hash Indices
Ordered Indices: Based on a sorted ordering of the values.
Hash Indices: Based on a uniform distribution of values across a range of buckets.
The bucket to which a value is assigned is determined by a function called a hash
function.
 Evaluation Factors
There are several techniques for both ordered indexing and hashing. No one
technique is the best. Rather, each technique is best suited to particular database
applications. Each technique must be evaluated on the basis of the following
factors:
 Access Types: Access types can include finding records with a
specified attribute value and finding records whose attribute values fall
in a specified range.
 Access Time: The time it takes to find a particular data item, or set of
items, using the technique in question.
 Insertion Time: The time it takes to insert a new data item. This value
includes the time it takes to find the correct place to insert the new data
item, as well as the time it takes to update the index structure.
 Deletion Time: The time it takes to delete a data item. This value
includes the time it takes to find the item to be deleted, as well as the
time it takes to update the index structure.
 Space Overhead: The additional space occupied by an index structure.
Provided that the amount of additional space is moderate, it is usually
worthwhile to sacrifice the space to achieve improved performance.
 Search Key
An attribute or set of attributes used to look up records in a file is called a
search key.
 Ordered Indices
 To gain fast random access to records in a file, we can use an index
structure.
 Each index structure is associated with a particular search key.
 An ordered index stores the values of the search keys in sorted order,
and associates with each search key the records that contain it.
 A file may have several indices, on different search keys.
 Ordered Indices
 Primary Index
 Secondary Index
 Primary Index: If the file containing the records is sequentially
ordered, a primary index is an index whose search key also defines the
sequential order of the file.
 Primary indices are also called clustering indices.
 Types: Dense and Sparse
 Dense Index
A dense index in databases is a file with pairs of keys and pointers for
every record in the data file. Every key in this file is associated with a
particular pointer to a record in the sorted data file.
 An index record appears for every search-key value in the file.
 In a dense primary index, the index record contains the search-key
value and a pointer to the first data record with that search-key value.
 The remaining records with the same search-key value are stored
sequentially after the first record, because in a primary index the
records are sorted on that search key.
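The dense primary index lookup described above can be sketched as follows. This is only a minimal sketch: it assumes the data file is a Python list of (key, value) pairs sorted by search key, and the names `build_dense_index`, `dense_lookup` and the sample records are hypothetical.

```python
import bisect


def build_dense_index(records):
    """Build one index entry per distinct search-key value.

    `records` is a list of (key, value) pairs sorted by key, standing in
    for a sequentially ordered data file. Each entry points at the first
    record holding that key.
    """
    keys, ptrs = [], []
    for pos, (key, _) in enumerate(records):
        if not keys or keys[-1] != key:
            keys.append(key)
            ptrs.append(pos)
    return keys, ptrs


def dense_lookup(records, index, key):
    """Find the index entry, then read the run of records with that key."""
    keys, ptrs = index
    i = bisect.bisect_left(keys, key)
    if i == len(keys) or keys[i] != key:
        return []
    pos = ptrs[i]
    found = []
    while pos < len(records) and records[pos][0] == key:
        found.append(records[pos])
        pos += 1
    return found


# Hypothetical sample data, sorted on the search key.
records = [("Brighton", 750), ("Downtown", 500), ("Downtown", 600), ("Mianus", 700)]
index = build_dense_index(records)
```

Because the file is sorted on the same key, only the first matching record needs an index pointer; the rest are reached by reading forward.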
 Sparse Index
 An index record appears for only some of the search-key values.
 Each index record contains a search-key value and a pointer to the first
data record with that search-key value.
 To locate a record, we find the index entry with the largest search-key
value that is less than or equal to the search-key value for which we are
looking.
 We start at the record pointed to by that index entry, and follow the
pointers in the file until we find the desired record.
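The sparse index lookup steps above can be sketched as follows. The sketch assumes distinct search keys, a sorted in-memory list in place of the data file, and one index entry for every `every` records; these choices and the function names are illustrative assumptions, not part of the slides.

```python
import bisect


def build_sparse_index(records, every):
    """Index only the first record of each group of `every` records."""
    keys, ptrs = [], []
    for pos in range(0, len(records), every):
        keys.append(records[pos][0])
        ptrs.append(pos)
    return keys, ptrs


def sparse_lookup(records, index, key):
    """Return the record with `key`, or None if it is not in the file."""
    keys, ptrs = index
    # Largest indexed search-key value that is <= the key we want.
    i = bisect.bisect_right(keys, key) - 1
    if i < 0:
        return None
    # Follow the file sequentially from the record the entry points to.
    pos = ptrs[i]
    while pos < len(records) and records[pos][0] <= key:
        if records[pos][0] == key:
            return records[pos]
        pos += 1
    return None


records = [(k, f"rec{k}") for k in (2, 3, 5, 7, 11, 17, 19, 23, 29, 31)]
index = build_sparse_index(records, every=3)
```

With `every=3` the index holds only the keys 2, 7, 19 and 31, so it is smaller than a dense index but each lookup may scan a few records.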
 Dense VS Sparse Indices
 It is generally faster to locate a record if we have a dense index rather
than a sparse index.
 However, sparse indices have advantages over dense indices in that
they require less space and they impose less maintenance overhead for
insertions and deletions.
 There is a trade-off that the system designer must make between access
time and space overhead.
 Multi-Level Indices
 If the primary index does not fit in memory, access becomes expensive.
 Solution: treat the primary index kept on disk as a sequential file and
construct a sparse index on it.
- Outer index – a sparse index of the primary index
- Inner index – the primary index file
 If even the outer index is too large to fit in main memory, yet another level
of index can be created, and so on.
 Indices at all levels must be updated on insertion or deletion from the
file.
 Multi-Level Indices: An Example
 Consider a file of 100,000 records stored 10 per block, giving 10,000
blocks. With one index record per block, that is 10,000 index records.
Even if we can fit 100 index records per block, the index occupies 100
blocks. If the index is too large to be kept in main memory, a search
results in several disk reads.
 For very large files, additional levels of indexing may be required.
 Indices must be updated at all levels when insertions or deletions
require it.
 Frequently, each level of index corresponds to a unit of physical
storage.
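The arithmetic in this example can be checked with a short sketch. It assumes one index entry per block of the level below and keeps adding levels until one level fits in a single block; that stopping rule and the name `index_levels` are assumptions made for illustration.

```python
import math


def index_levels(num_records, records_per_block, entries_per_block):
    """Return (data blocks, index blocks per level, innermost level first)."""
    data_blocks = math.ceil(num_records / records_per_block)
    entries = data_blocks          # one index entry per data block
    levels = []
    while True:
        blocks = math.ceil(entries / entries_per_block)
        levels.append(blocks)
        if blocks == 1:            # this level fits in a single block
            break
        entries = blocks           # next level indexes the blocks of this one
    return data_blocks, levels


result = index_levels(100_000, 10, 100)
```

For the numbers on the slide, the inner index needs 100 blocks and a single outer block is enough to index it.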
 Secondary Index
– Indices whose search key specifies an order different from the
sequential order of the file are called secondary indices, or non-
clustering indices.
– Secondary indices must be dense with an index entry for every search-
key value, and a pointer to every record in the file.
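A dense secondary index can be sketched as a mapping from each search-key value to a bucket of pointers. In this minimal sketch the file is a list of dicts ordered by account number, the secondary search key is the balance, and the field names and sample rows are hypothetical.

```python
from collections import defaultdict


def build_secondary_index(records, field):
    """Map every value of `field` to the positions of all records holding it."""
    buckets = defaultdict(list)
    for pos, rec in enumerate(records):
        buckets[rec[field]].append(pos)
    # Keep the index entries in sorted key order, as in an ordered index.
    return dict(sorted(buckets.items()))


# The file is ordered on "acct", not on "balance".
records = [
    {"acct": "A-101", "balance": 500},
    {"acct": "A-110", "balance": 600},
    {"acct": "A-215", "balance": 700},
    {"acct": "A-217", "balance": 500},
]
by_balance = build_secondary_index(records, "balance")
```

Records with the same balance are scattered through the file, which is why the index needs a pointer to every record rather than only to the first one.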
 Primary VS Secondary Indices
 A sequential scan in primary index order is efficient because records in
the file are stored physically in the same order as the index order.
 Secondary indices improve the performance of queries that use keys
other than the search key of the primary index. However, they impose a
significant overhead on modification of the database. The designer of a
database decides which secondary indices are desirable on the basis of
an estimate of the relative frequency of queries and modifications.
 The primary index is on the field which specifies the sequential order
of the data file.
 There can be only one primary index while there can be many
secondary indices.
 B+ Tree Index Files
 The main disadvantage of the index-sequential file organization is that
performance degrades as the file grows, both for index lookups and for
sequential scans through the data. To overcome this deficiency, we use
a B+ tree index.
 The B+ tree index structure is the most widely used of several index
structures that maintain their efficiency despite insertion and deletion of
data.
 This is a balanced tree in which every path from the root of the tree to
a leaf of the tree is of the same length.
 A B+ tree index is a multilevel index. A typical node of a B+ tree holds
pointers and search-key values in the layout
P1 | K1 | P2 | . . . | Pn−1 | Kn−1 | Pn.
 Each node that is not a root or a leaf has between ⌈n/2⌉ and n children.
 A leaf node has between ⌈(n–1)/2⌉ and n–1 values.
 Special cases:
- If the root is not a leaf, it has at least 2 children.
- If the root is a leaf (that is, there are no other nodes in the tree),
it can have between 0 and (n–1) values.
 A node contains up to n − 1 search-key values K1, K2, . . ., Kn−1, and n
pointers P1, P2, . . . , Pn.
 The search-keys in a node are ordered: K1 < K2 < K3 < . . . < Kn–1
 For leaf nodes, for i = 1, 2, . . . , n − 1, pointer Pi points to either a file
record with search-key value Ki or to a bucket of pointers, each of
which points to a file record with search-key value Ki.
 A non-leaf node may hold up to n pointers, and must hold at least ⌈n/2⌉
pointers.
 The number of pointers in a node is called the fanout of the node.
 The root node can hold fewer than ⌈n/2⌉ pointers. However, it must
hold at least two pointers unless it is the only node in the tree.
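The occupancy rules for B+ tree nodes can be tabulated for a given n with a small helper. This is a sketch only; the function name `bplus_bounds` and the dictionary keys are hypothetical.

```python
import math


def bplus_bounds(n):
    """Return (min, max) occupancy for each kind of node, for n pointers per node."""
    return {
        "internal_children": (math.ceil(n / 2), n),
        "leaf_values": (math.ceil((n - 1) / 2), n - 1),
        "root_children_if_not_leaf": (2, n),
        "root_values_if_leaf": (0, n - 1),
    }


bounds_4 = bplus_bounds(4)
bounds_6 = bplus_bounds(6)
```

For n=4 a leaf holds 2 or 3 values and an internal node has 2 to 4 children; for n=6 a leaf holds 3 to 5 values and an internal node has 3 to 6 children.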
 Construct a B+ tree for the following set of key values:
(2, 3, 5, 7, 11, 17, 19, 23, 29, 31) for n=4 and n=6.
Solution: Construction of B+ tree for order n=4.
Each node holds up to 3 search-key values and 4 pointers.
Insert key value 2: [2]
Insert key value 3: [2 3]
Insert key value 5: [2 3 5]
Insert key value 7: Split the node.
Root: [5]
Leaves: [2 3] [5 7]
Insert key value 11:
Root: [5]
Leaves: [2 3] [5 7 11]
Insert key value 17: Split the node.
Root: [5 11]
Leaves: [2 3] [5 7] [11 17]
Insert key value 19:
Root: [5 11]
Leaves: [2 3] [5 7] [11 17 19]
Insert key value 23: Split the node.
Root: [5 11 19]
Leaves: [2 3] [5 7] [11 17] [19 23]
Insert key value 29:
Root: [5 11 19]
Leaves: [2 3] [5 7] [11 17] [19 23 29]
Insert key value 31: Split the leaf, then split the root.
Root: [19]
Internal nodes: [5 11] [29]
Leaves: [2 3] [5 7] [11 17] [19 23] [29 31]
For n=6: Each node holds up to 5 search-key values and 6 pointers.
Root: [7 19]
Leaves: [2 3 5] [7 11 17] [19 23 29 31]
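The B+ tree construction for n=4 and n=6 can be replayed with a short insertion sketch. It is an illustration under stated assumptions rather than a full implementation: leaf-to-leaf sibling pointers, record pointers and deletion are omitted, an overflowing leaf keeps ⌈n/2⌉ keys on the left, and the names `Node`, `insert`, `build` and `levels` are hypothetical.

```python
import bisect


class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []  # stays empty in leaf nodes


def _insert(node, key, n):
    """Insert into the subtree; return (separator, new right node) on a split."""
    if node.leaf:
        bisect.insort(node.keys, key)
        if len(node.keys) <= n - 1:
            return None
        mid = (len(node.keys) + 1) // 2      # ceil(n/2) keys stay on the left
        right = Node(leaf=True)
        right.keys = node.keys[mid:]
        node.keys = node.keys[:mid]
        return right.keys[0], right          # leaf split: separator is copied up
    i = bisect.bisect_right(node.keys, key)
    split = _insert(node.children[i], key, n)
    if split is None:
        return None
    sep, right = split
    node.keys.insert(i, sep)
    node.children.insert(i + 1, right)
    if len(node.children) <= n:
        return None
    mid = len(node.keys) // 2
    up = node.keys[mid]                      # internal split: separator moves up
    new = Node(leaf=False)
    new.keys = node.keys[mid + 1:]
    new.children = node.children[mid + 1:]
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return up, new


def insert(root, key, n):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key, n)
    if split is None:
        return root
    sep, right = split
    new_root = Node(leaf=False)
    new_root.keys = [sep]
    new_root.children = [root, right]
    return new_root


def build(keys, n):
    root = Node(leaf=True)
    for k in keys:
        root = insert(root, k, n)
    return root


def levels(root):
    """List the keys of every node, level by level from the root down."""
    out, level = [], [root]
    while level:
        out.append([node.keys for node in level])
        level = [c for node in level for c in node.children]
    return out


KEYS = [2, 3, 5, 7, 11, 17, 19, 23, 29, 31]
tree_4 = levels(build(KEYS, 4))
tree_6 = levels(build(KEYS, 6))
```

Under these split rules, `tree_4` has root [19] over [5 11] and [29], and `tree_6` has root [7 19] over three leaves, matching the trees worked out by hand above.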
 B-Tree Index Files
– B-tree indices are similar to B+ tree indices. The primary distinction
between the two approaches is that a B-tree eliminates the redundant
storage of search-key values.
– A B-tree allows search-key values to appear only once. Thus, it is
necessary to include an additional pointer field for each search key in a
nonleaf node. These additional pointers point to either file records or
buckets for the associated search key
– A generalized B-tree leaf node and a non-leaf node appear in Fig. (a)
and Fig. (b) respectively.
 Leaf nodes are the same as in B+ trees. In nonleaf nodes, the pointers Pi
are the tree pointers that we used also for B+ trees, while the pointers
Bi are bucket or file-record pointers. In the generalized B-tree in the
figure, there are n – 1 keys in the leaf node, but there are m − 1 keys in
the nonleaf node. This discrepancy occurs because nonleaf nodes must
include pointers Bi, thus reducing the number of search keys that can be
held in these nodes.
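The effect of the extra Bi pointers on fan-out can be illustrated with a small calculation. The block size and the key and pointer sizes below are hypothetical values chosen only for the example, and the function names are not from the slides.

```python
def bplus_fanout(block, key, ptr):
    """Largest n such that n pointers and n - 1 keys fit in one block."""
    # n * ptr + (n - 1) * key <= block
    return (block + key) // (ptr + key)


def btree_nonleaf_fanout(block, key, ptr, rec_ptr):
    """Largest m such that m tree pointers, m - 1 keys and m - 1 Bi pointers fit."""
    # m * ptr + (m - 1) * (key + rec_ptr) <= block
    return (block + key + rec_ptr) // (ptr + key + rec_ptr)


# Assumed sizes in bytes: 4096-byte block, 8-byte keys and pointers.
n = bplus_fanout(4096, 8, 8)
m = btree_nonleaf_fanout(4096, 8, 8, 8)
```

With these assumed sizes a B+ tree nonleaf node has room for 256 pointers while a B-tree nonleaf node has room for only 171 tree pointers, which is the discrepancy between n and m described above.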
 Advantages of B-Tree indices
 May use fewer tree nodes than a corresponding B+ tree.
 It is sometimes possible to find a search-key value before reaching a leaf node.
 Disadvantages of B-Tree indices
 Only a small fraction of all search-key values are found early.
 Non-leaf nodes are larger, so fan-out is reduced. Thus, B-trees
typically have greater depth than the corresponding B+ tree.
 Insertion and deletion are more complicated than in B+ trees.
 Implementation is harder than for B+ trees.
 Construct a B-tree for the following set of key values:
(2, 3, 5, 7, 11, 17, 19, 23, 29, 31) for n=4 and n=6.
Solution: Construction of B-tree for order n=4.
Each node holds up to 3 search-key values and 4 pointers.
Insert key value 2: [2]
Insert key value 3: [2 3]
Insert key value 5: [2 3 5]
Insert key value 7: Split the node; 5 moves up.
Root: [5]
Leaves: [2 3] [7]
Insert key value 11:
Root: [5]
Leaves: [2 3] [7 11]
Insert key value 17:
Root: [5]
Leaves: [2 3] [7 11 17]
Insert key value 19: Split the node; 17 moves up.
Root: [5 17]
Leaves: [2 3] [7 11] [19]
Insert key value 23:
Root: [5 17]
Leaves: [2 3] [7 11] [19 23]
Insert key value 29:
Root: [5 17]
Leaves: [2 3] [7 11] [19 23 29]
Insert key value 31: Split the node; 29 moves up.
(Figure: resulting B-tree with keys 5, 17 and 29 in the nonleaf nodes and
leaves [2 3] [7 11] [19 23] [31].)
 Hashing
 One disadvantage of sequential file organization is that we must use an
index structure to locate data. File organizations based on the technique
of hashing allow us to avoid accessing an index structure. Hashing also
provides a way of constructing indices.
 File organizations based on hashing allow us to find the address of a
data item directly by computing a function on the search-key value of
the desired record.
 Hash File Organization
 In a hash file organization, we obtain the address of the disk block (also
called the bucket) containing a desired record directly by computing a
function on the search-key value of the record.
 Let K denote the set of all search-key values, and let B denote the set of
all bucket addresses. A hash function h is a function from K to B.
 To insert a record with search key Ki, we compute h(Ki), which gives
the address of the bucket for that record. Assume for now that there is
space in the bucket to store the record. Then, the record is stored in that
bucket.
 To perform a lookup on a search-key value Ki, we simply compute
h(Ki), then search the bucket with that address. Suppose that two search
keys, K5 and K7, have the same hash value; that is, h(K5) = h(K7). If we
perform a lookup on K5, the bucket h(K5) contains records with search-
key values K5 and records with search key values K7. Thus, we have to
check the search-key value of every record in the bucket to verify that
the record is one that we want.
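Insertion and lookup in a hash file organization can be sketched as follows. This is a minimal sketch that assumes buckets are Python lists with unlimited space and uses a toy hash function (sum of character codes modulo the bucket count); the function names and sample keys are hypothetical.

```python
def h(key, num_buckets):
    """Toy hash function: sum of character codes modulo the bucket count."""
    return sum(ord(c) for c in key) % num_buckets


def insert(buckets, key, record):
    """Store the record in the bucket whose address is h(key)."""
    buckets[h(key, len(buckets))].append((key, record))


def lookup(buckets, key):
    """Search bucket h(key), checking the search key of every record in it."""
    bucket = buckets[h(key, len(buckets))]
    return [record for k, record in bucket if k == key]


buckets = [[] for _ in range(10)]
insert(buckets, "ab", "record for ab")
insert(buckets, "ba", "record for ba")   # collides with "ab" under this h
```

The two keys hash to the same bucket, so `lookup` has to compare keys inside the bucket to return only the record that was asked for.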
 Hash File Organization: An Example
– Let us choose a hash function for the account file using the search key
branch_name.
– Suppose we have 26 buckets and we define a hash function that maps
names beginning with the ith letter of the alphabet to the ith bucket.
– This hash function has the virtue of simplicity, but it fails to provide a
uniform distribution, since we expect more branch names to begin with
such letters as B and R than Q and X.
– Instead, we consider 10 buckets and a hash function that computes the
sum of the binary representations of the characters of a key, then
returns the sum modulo the number of buckets.
– For branch name ‘Perryridge’
Bucket no=h(Perryridge) = 5
– For branch name ‘Round Hill’
Bucket no=h(Round Hill) = 3
– For branch name ‘Brighton’
Bucket no=h(Brighton) = 3
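The bucket numbers above can be reproduced with a short sketch. The slide does not say how a character is turned into an integer; the sketch assumes the i-th letter of the alphabet counts as the integer i and that spaces are ignored, which is an assumption that happens to give the three values listed.

```python
def h(name, num_buckets=10):
    """Sum the alphabet positions of the letters in `name`, modulo the bucket count."""
    total = 0
    for c in name.lower():
        if "a" <= c <= "z":                  # skip spaces and other characters
            total += ord(c) - ord("a") + 1   # assumed mapping: a=1, ..., z=26
    return total % num_buckets


bucket_numbers = {name: h(name) for name in ("Perryridge", "Round Hill", "Brighton")}
```

Round Hill and Brighton land in the same bucket, so even this more uniform hash function produces collisions that the bucket must absorb.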
 Handling of Bucket Overflows
 In case of insertion, if the bucket does not have enough space, a bucket
overflow is said to occur. Bucket overflow can occur mainly for two
reasons:
 Insufficient buckets. The number of buckets n_B must be chosen such
that n_B > n_r / f_r, where n_r denotes the total number of records that will be
stored and f_r denotes the number of records that will fit in a bucket.
 Skew. Some buckets are assigned more records than are others, so a
bucket may overflow even when other buckets still have space. This
situation is called bucket skew. Skew can occur for two reasons:
– Multiple records may have the same search key.
– The chosen hash function may result in non-uniform distribution of
search keys.
Solution:
 If a record must be inserted into a bucket b, and b is already full, the
system provides an overflow bucket for b, and inserts the record into
the overflow bucket. If the overflow bucket is also full, the system
provides another overflow bucket, and so on. All the overflow buckets
of a given bucket are chained together in a linked list.
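Overflow chaining can be sketched with a bucket that links to further buckets once it is full. The class and function names are hypothetical, and `min_buckets` simply evaluates the smallest integer satisfying the bucket-count condition n_B > n_r / f_r for integer inputs.

```python
class Bucket:
    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []
        self.overflow = None   # next bucket in the overflow chain, if any

    def insert(self, record):
        """Insert into the first bucket of the chain that has space."""
        b = self
        while len(b.records) >= b.capacity:
            if b.overflow is None:
                b.overflow = Bucket(b.capacity)   # provide a new overflow bucket
            b = b.overflow
        b.records.append(record)

    def chain_length(self):
        """Number of overflow buckets chained to this bucket."""
        count, b = 0, self.overflow
        while b is not None:
            count, b = count + 1, b.overflow
        return count


def min_buckets(n_r, f_r):
    """Smallest integer n_B with n_B > n_r / f_r (n_r and f_r are integers)."""
    return n_r // f_r + 1


bucket = Bucket(capacity=2)
for record in range(5):
    bucket.insert(record)
```

Five records in a bucket of capacity 2 end up as two records in the primary bucket and a chain of two overflow buckets, so a lookup has to walk the whole chain.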
 Difference between open and closed hashing
Closed Hashing:
 Closed hashing always places keys with the same hash-function value in the
same bucket or in its overflow buckets.
 If a bucket is full, the system inserts records in overflow buckets.
 Different buckets can be of different sizes.
 Overflow buckets are linked together.
Open Hashing:
 Open hashing places keys with the same hash-function value in a different
bucket if the bucket is full.
 The set of buckets is fixed; there are no overflow chains.
 Deletion is difficult in open hashing.
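One common policy for open hashing, as the term is used here, is linear probing: use the next bucket in cyclic order that has space. The slides do not name a policy, so the following is a sketch under that assumption, with hypothetical names and integer keys hashed by a caller-supplied function.

```python
def insert(buckets, capacity, h, key):
    """Place key in bucket h(key), or in the next bucket in cyclic order with space.

    Returns the number of the bucket that received the key.
    """
    start = h(key) % len(buckets)
    for step in range(len(buckets)):
        i = (start + step) % len(buckets)
        if len(buckets[i]) < capacity:
            buckets[i].append(key)
            return i
    raise OverflowError("all buckets are full")


buckets = [[] for _ in range(4)]
# Keys 1, 5, 9 and 13 all hash to bucket 1 when h(k) = k and there are 4 buckets.
placed = [insert(buckets, 1, lambda k: k, key) for key in (1, 5, 9, 13)]
```

Because a key may sit in a bucket other than h(key), removing a record can break the probe sequence for later keys, which is one reason deletion is awkward under this scheme.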
 Hash Indices
 Hashing can be used not only for file organization, but also for index-
structure creation.
 We construct a hash index as follows. We apply a hash function on a
search key to identify a bucket, and store the key and its associated
pointers in the bucket.

More Related Content

What's hot

Sql fundamentals
Sql fundamentalsSql fundamentals
Sql fundamentals
Ravinder Kamboj
 
Intro to trigger and constraint
Intro to trigger and constraintIntro to trigger and constraint
Intro to trigger and constraintLearningTech
 
directory structure and file system mounting
directory structure and file system mountingdirectory structure and file system mounting
directory structure and file system mounting
rajshreemuthiah
 
File system vs DBMS
File system vs DBMSFile system vs DBMS
File system vs DBMS
BHARATH KUMAR
 
Indexing structure for files
Indexing structure for filesIndexing structure for files
Indexing structure for files
Zainab Almugbel
 
File organization 1
File organization 1File organization 1
File organization 1
Rupali Rana
 
14. Query Optimization in DBMS
14. Query Optimization in DBMS14. Query Optimization in DBMS
14. Query Optimization in DBMSkoolkampus
 
Integrity Constraints
Integrity ConstraintsIntegrity Constraints
Integrity Constraints
madhav bansal
 
Query optimization
Query optimizationQuery optimization
Query optimization
Pooja Dixit
 
Parallel processing
Parallel processingParallel processing
Parallel processing
rajshreemuthiah
 
ER-Model-ER Diagram
ER-Model-ER DiagramER-Model-ER Diagram
ER-Model-ER Diagram
Saranya Natarajan
 
Structure of dbms
Structure of dbmsStructure of dbms
Structure of dbms
Megha yadav
 
Concurrency control
Concurrency controlConcurrency control
Concurrency control
Soumyajit Dutta
 
Lock based protocols
Lock based protocolsLock based protocols
Lock based protocols
ChethanMp7
 
Information retrieval-systems notes
Information retrieval-systems notesInformation retrieval-systems notes
Information retrieval-systems notes
BAIRAVI T
 
Advance computer architecture
Advance computer architecture Advance computer architecture
Advance computer architecture
SabthamiS1
 
Relational Database Management System
Relational Database Management SystemRelational Database Management System
Relational Database Management System
Free Open Source Software Technology Lab
 
2. Entity Relationship Model in DBMS
2. Entity Relationship Model in DBMS2. Entity Relationship Model in DBMS
2. Entity Relationship Model in DBMSkoolkampus
 
Relational Data Model Introduction
Relational Data Model IntroductionRelational Data Model Introduction
Relational Data Model Introduction
Nishant Munjal
 

What's hot (20)

Sql fundamentals
Sql fundamentalsSql fundamentals
Sql fundamentals
 
Intro to trigger and constraint
Intro to trigger and constraintIntro to trigger and constraint
Intro to trigger and constraint
 
directory structure and file system mounting
directory structure and file system mountingdirectory structure and file system mounting
directory structure and file system mounting
 
File system vs DBMS
File system vs DBMSFile system vs DBMS
File system vs DBMS
 
Indexing structure for files
Indexing structure for filesIndexing structure for files
Indexing structure for files
 
File organization 1
File organization 1File organization 1
File organization 1
 
14. Query Optimization in DBMS
14. Query Optimization in DBMS14. Query Optimization in DBMS
14. Query Optimization in DBMS
 
Integrity Constraints
Integrity ConstraintsIntegrity Constraints
Integrity Constraints
 
Query optimization
Query optimizationQuery optimization
Query optimization
 
Parallel processing
Parallel processingParallel processing
Parallel processing
 
ER-Model-ER Diagram
ER-Model-ER DiagramER-Model-ER Diagram
ER-Model-ER Diagram
 
Structure of dbms
Structure of dbmsStructure of dbms
Structure of dbms
 
Concurrency control
Concurrency controlConcurrency control
Concurrency control
 
Lock based protocols
Lock based protocolsLock based protocols
Lock based protocols
 
Information retrieval-systems notes
Information retrieval-systems notesInformation retrieval-systems notes
Information retrieval-systems notes
 
Advance computer architecture
Advance computer architecture Advance computer architecture
Advance computer architecture
 
Relational Database Management System
Relational Database Management SystemRelational Database Management System
Relational Database Management System
 
2. Entity Relationship Model in DBMS
2. Entity Relationship Model in DBMS2. Entity Relationship Model in DBMS
2. Entity Relationship Model in DBMS
 
Huffman codes
Huffman codesHuffman codes
Huffman codes
 
Relational Data Model Introduction
Relational Data Model IntroductionRelational Data Model Introduction
Relational Data Model Introduction
 

Similar to indexing and hashing

Lec 1 indexing and hashing
Lec 1 indexing and hashing Lec 1 indexing and hashing
Lec 1 indexing and hashing
Md. Mashiur Rahman
 
DBMS 8 | Memory Hierarchy and Indexing
DBMS 8 | Memory Hierarchy and IndexingDBMS 8 | Memory Hierarchy and Indexing
DBMS 8 | Memory Hierarchy and Indexing
Mohammad Imam Hossain
 
Database management system session 6
Database management system session 6Database management system session 6
Database management system session 6
Infinity Tech Solutions
 
DBMS (UNIT 5)
DBMS (UNIT 5)DBMS (UNIT 5)
DBMS (UNIT 5)
SURBHI SAROHA
 
DMBS Indexes.pptx
DMBS Indexes.pptxDMBS Indexes.pptx
DMBS Indexes.pptx
husainsadikarvy
 
Indexing and hashing
Indexing and hashingIndexing and hashing
Indexing and hashing
Abdul mannan Karim
 
Data indexing presentation
Data indexing presentationData indexing presentation
Data indexing presentation
gmbmanikandan
 
Index Structures.pptx
Index Structures.pptxIndex Structures.pptx
Index Structures.pptx
MBablu1
 
Cs437 lecture 14_15
Cs437 lecture 14_15Cs437 lecture 14_15
Cs437 lecture 14_15
Aneeb_Khawar
 
lecture 2 notes indexing in application of database systems.pptx
lecture 2 notes indexing in application of database systems.pptxlecture 2 notes indexing in application of database systems.pptx
lecture 2 notes indexing in application of database systems.pptx
peter1097
 
Database and Research Matrix.pptx
Database and Research Matrix.pptxDatabase and Research Matrix.pptx
Database and Research Matrix.pptx
RahulRoshan37
 
DBMS-Unit5-PPT.pptx important for revision
DBMS-Unit5-PPT.pptx important for revisionDBMS-Unit5-PPT.pptx important for revision
DBMS-Unit5-PPT.pptx important for revision
yuvivarmaa
 
Overview of Storage and Indexing ...
Overview of Storage and Indexing                                             ...Overview of Storage and Indexing                                             ...
Overview of Storage and Indexing ...
Javed Khan
 
Ch12
Ch12Ch12
Context Based Web Indexing For Semantic Web
Context Based Web Indexing For Semantic WebContext Based Web Indexing For Semantic Web
Context Based Web Indexing For Semantic Web
IOSR Journals
 
Db lec 08_new
Db lec 08_newDb lec 08_new
Db lec 08_new
Ramadan Babers, PhD
 
Indexing and hashing
Indexing and hashingIndexing and hashing
Indexing and hashingJeet Poria
 
Access Methods - Lecture 9 - Introduction to Databases (1007156ANR)
Access Methods - Lecture 9 - Introduction to Databases (1007156ANR)Access Methods - Lecture 9 - Introduction to Databases (1007156ANR)
Access Methods - Lecture 9 - Introduction to Databases (1007156ANR)
Beat Signer
 

Similar to indexing and hashing (20)

Lec 1 indexing and hashing
Lec 1 indexing and hashing Lec 1 indexing and hashing
Lec 1 indexing and hashing
 
DBMS 8 | Memory Hierarchy and Indexing
DBMS 8 | Memory Hierarchy and IndexingDBMS 8 | Memory Hierarchy and Indexing
DBMS 8 | Memory Hierarchy and Indexing
 
Database management system session 6
Database management system session 6Database management system session 6
Database management system session 6
 
DBMS (UNIT 5)
DBMS (UNIT 5)DBMS (UNIT 5)
DBMS (UNIT 5)
 
DMBS Indexes.pptx
DMBS Indexes.pptxDMBS Indexes.pptx
DMBS Indexes.pptx
 
Indexing and hashing
Indexing and hashingIndexing and hashing
Indexing and hashing
 
Data indexing presentation
Data indexing presentationData indexing presentation
Data indexing presentation
 
Unit08 dbms
Unit08 dbmsUnit08 dbms
Unit08 dbms
 
Index Structures.pptx
Index Structures.pptxIndex Structures.pptx
Index Structures.pptx
 
Cs437 lecture 14_15
Cs437 lecture 14_15Cs437 lecture 14_15
Cs437 lecture 14_15
 
lecture 2 notes indexing in application of database systems.pptx
lecture 2 notes indexing in application of database systems.pptxlecture 2 notes indexing in application of database systems.pptx
lecture 2 notes indexing in application of database systems.pptx
 
Database and Research Matrix.pptx
Database and Research Matrix.pptxDatabase and Research Matrix.pptx
Database and Research Matrix.pptx
 
DBMS-Unit5-PPT.pptx important for revision
DBMS-Unit5-PPT.pptx important for revisionDBMS-Unit5-PPT.pptx important for revision
DBMS-Unit5-PPT.pptx important for revision
 
Overview of Storage and Indexing ...
Overview of Storage and Indexing                                             ...Overview of Storage and Indexing                                             ...
Overview of Storage and Indexing ...
 
Indexing Process.pptx
Indexing Process.pptxIndexing Process.pptx
Indexing Process.pptx
 
Ch12
Ch12Ch12
Ch12
 
Context Based Web Indexing For Semantic Web
Context Based Web Indexing For Semantic WebContext Based Web Indexing For Semantic Web
Context Based Web Indexing For Semantic Web
 
Db lec 08_new
Db lec 08_newDb lec 08_new
Db lec 08_new
 
Indexing and hashing
Indexing and hashingIndexing and hashing
Indexing and hashing
 
Access Methods - Lecture 9 - Introduction to Databases (1007156ANR)
Access Methods - Lecture 9 - Introduction to Databases (1007156ANR)Access Methods - Lecture 9 - Introduction to Databases (1007156ANR)
Access Methods - Lecture 9 - Introduction to Databases (1007156ANR)
 

indexing and hashing

  • 1. Indexing and Hashing. Md. Golam Moazzam, Dept. of CSE, JU. 7/24/2017
  • 2. Outline
      - Indexing: Basic Concepts
      - Evaluation Factors
      - Ordered Indices: Primary and Secondary
      - Dense and Sparse Indices
      - Multilevel Indexing
      - B+ Tree Index Files
      - B-Tree Index Files
      - Hashing
      - Hash File Organization
      - Handling of Bucket Overflows
      - Open and Closed Hashing
      - Hash Indices
  • 3. Database Index
      A database index is a data structure that improves the speed of data retrieval operations on a database table, at the cost of slower writes and more storage space.
      Basic Concept
      An index for a file in a database system works in much the same way as the index in a textbook. If we want to learn about a particular topic, we can search for the topic in the index at the back of the book, find the pages where it occurs, and then read those pages to find the information we are looking for. The words in the index are in sorted order, making it easy to find the word we are looking for. Moreover, the index is much smaller than the book, further reducing the effort needed to find a word.
  • 4. Types of Indices
      There are two basic types of indices:
      - Ordered indices: based on a sorted ordering of the values.
      - Hash indices: based on a uniform distribution of values across a range of buckets. The bucket to which a value is assigned is determined by a function called a hash function.
      Evaluation Factors
      There are several techniques for both ordered indexing and hashing. No one technique is the best; rather, each technique is best suited to particular database applications. Each technique must be evaluated on the basis of the following factors:
  • 5. Evaluation Factors
      - Access types: the kinds of access supported, which can include finding records with a specified attribute value and finding records whose attribute values fall in a specified range.
      - Access time: the time it takes to find a particular data item, or set of items, using the technique in question.
      - Insertion time: the time it takes to insert a new data item. This value includes the time it takes to find the correct place to insert the new data item, as well as the time it takes to update the index structure.
  • 6. Evaluation Factors (continued)
      - Deletion time: the time it takes to delete a data item. This value includes the time it takes to find the item to be deleted, as well as the time it takes to update the index structure.
      - Space overhead: the additional space occupied by an index structure. Provided that the amount of additional space is moderate, it is usually worthwhile to sacrifice the space to achieve improved performance.
  • 7. Search Key
      An attribute or set of attributes used to look up records in a file is called a search key.
      Ordered Indices
      - To gain fast random access to records in a file, we can use an index structure.
      - Each index structure is associated with a particular search key.
      - An ordered index stores the values of the search keys in sorted order, and associates with each search key the records that contain it.
      - A file may have several indices, on different search keys.
  • 8. Ordered Indices
      There are two kinds of ordered index: primary and secondary.
      - Primary index: if the file containing the records is sequentially ordered, a primary index is an index whose search key also defines the sequential order of the file.
      - Primary indices are also called clustering indices.
      - Types: dense and sparse.
  • 9. Dense Index
      A dense index is a file of key-pointer pairs that covers every record in the data file. Every key in the index is associated with a pointer to a record in the sorted data file.
      - An index record appears for every search-key value in the file.
      - In a dense primary index, the index record contains the search-key value and a pointer to the first data record with that search-key value.
      - The remaining records with the same search-key value are stored sequentially after the first record: because the index is a primary one, the records are sorted on that search key.
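A dense primary index lookup can be sketched roughly as follows. This is an illustrative sketch, not code from the slides: the sample records, the in-memory list standing in for the data file, and the `lookup` helper are all assumptions.

```python
# Data file sorted on the search key (hypothetical sample records).
records = [("Brighton", 750), ("Downtown", 500), ("Downtown", 600),
           ("Mianus", 700), ("Perryridge", 400), ("Perryridge", 900)]

# Dense index: one entry per distinct search-key value, pointing at the
# position of the FIRST record with that value.
dense_index = {}
for pos, (key, _) in enumerate(records):
    dense_index.setdefault(key, pos)

def lookup(key):
    """Return all records whose search key equals `key`."""
    pos = dense_index.get(key)
    if pos is None:
        return []                      # no index entry means no such record
    out = []
    # Records with the same key follow the first one sequentially.
    while pos < len(records) and records[pos][0] == key:
        out.append(records[pos])
        pos += 1
    return out
```

Because every search-key value has an entry, a missing key is detected from the index alone, without touching the data file.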
  • 10. Dense Index (figure)
  • 11. Sparse Index
      - An index record appears for only some of the search-key values.
      - Each index record contains a search-key value and a pointer to the first data record with that search-key value.
      - To locate a record, we find the index entry with the largest search-key value that is less than or equal to the search-key value for which we are looking.
      - We start at the record pointed to by that index entry, and follow the pointers in the file until we find the desired record.
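The sparse lookup rule (largest index key less than or equal to the target, then a sequential scan) might be sketched as below. The sample records and the choice of indexing every second distinct key are assumptions made for illustration.

```python
import bisect

# Data file sorted on the search key (hypothetical sample records).
records = [("Brighton", 750), ("Downtown", 500), ("Downtown", 600),
           ("Mianus", 700), ("Perryridge", 400), ("Perryridge", 900)]

# Position of the first record for each distinct key (file order is sorted).
first_pos = {}
for pos, (key, _) in enumerate(records):
    first_pos.setdefault(key, pos)

# Sparse index: entries for only some keys (here every second distinct key).
index_keys = list(first_pos)[::2]
index_ptrs = [first_pos[k] for k in index_keys]

def lookup(key):
    """Return all records whose search key equals `key`."""
    # Largest index entry whose key is <= the key we are looking for.
    i = bisect.bisect_right(index_keys, key) - 1
    if i < 0:
        return []                      # key sorts before the first index entry
    pos, out = index_ptrs[i], []
    # Scan forward in file order until we pass the target key.
    while pos < len(records) and records[pos][0] <= key:
        if records[pos][0] == key:
            out.append(records[pos])
        pos += 1
    return out
```

Here "Downtown" has no index entry of its own, so the search starts at the "Brighton" entry and scans forward.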
  • 12. Sparse Index (figure)
  • 13. Dense vs. Sparse Indices
      - It is generally faster to locate a record if we have a dense index rather than a sparse index.
      - However, sparse indices have advantages over dense indices: they require less space and they impose less maintenance overhead for insertions and deletions.
      - The system designer must make a trade-off between access time and space overhead.
  • 14. Multi-Level Indices
      - If the primary index does not fit in memory, access becomes expensive.
      - Solution: treat the primary index kept on disk as a sequential file and construct a sparse index on it.
        - Outer index: a sparse index of the primary index
        - Inner index: the primary index file
      - If even the outer index is too large to fit in main memory, yet another level of index can be created, and so on.
      - Indices at all levels must be updated on insertion or deletion from the file.
  • 15. Multi-Level Indices: An Example
      - Consider 100,000 records stored 10 per block. With one index record per block, that gives 10,000 index records. Even if we can fit 100 index records per block, the index occupies 100 blocks. If the index is too large to be kept in main memory, a search results in several disk reads.
      - For very large files, additional levels of indexing may be required.
      - Indices must be updated at all levels when insertions or deletions require it.
      - Frequently, each level of index corresponds to a unit of physical storage.
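The arithmetic in this example can be checked with a short calculation. The function name and parameters below are hypothetical; the sketch assumes a sparse inner index with one entry per data block and one outer entry per index block, and it needs more than one entry per index block to terminate.

```python
from math import ceil

def index_levels(n_records, records_per_block, entries_per_index_block):
    """Return (entries, blocks) for each index level, innermost first."""
    data_blocks = ceil(n_records / records_per_block)
    levels = []
    entries = data_blocks              # inner index: one entry per data block
    while True:
        blocks = ceil(entries / entries_per_index_block)
        levels.append((entries, blocks))
        if blocks == 1:                # this level fits in a single block
            break
        entries = blocks               # next level: one entry per index block
    return levels

print(index_levels(100_000, 10, 100))  # [(10000, 100), (100, 1)]
```

With the numbers from the example, the inner index has 10,000 entries in 100 blocks, and a single-block outer index with 100 entries is enough to cover it.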
  • 16. Multi-Level Indices: An Example (figure)
  • 17. Secondary Index
      - Indices whose search key specifies an order different from the sequential order of the file are called secondary indices, or non-clustering indices.
      - Secondary indices must be dense, with an index entry for every search-key value and a pointer to every record in the file.
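A secondary index can be sketched as a mapping from each search-key value to a bucket of pointers, one per matching record. The sample data and helper names are assumptions; the file is ordered by branch name while the index is on balance.

```python
import bisect
from collections import defaultdict

# Data file ordered by branch name (hypothetical sample records).
records = [("Brighton", 750), ("Downtown", 500), ("Mianus", 700),
           ("Perryridge", 500), ("Redwood", 700)]

# Secondary index on balance: dense, with a pointer to EVERY record,
# because equal balances are not stored next to each other in the file.
buckets = defaultdict(list)
for pos, (_, balance) in enumerate(records):
    buckets[balance].append(pos)
sorted_keys = sorted(buckets)          # ordered index: keys kept sorted

def find_by_balance(balance):
    return [records[p] for p in buckets.get(balance, [])]

def find_in_range(low, high):
    """All records with low <= balance <= high, in index (balance) order."""
    lo = bisect.bisect_left(sorted_keys, low)
    hi = bisect.bisect_right(sorted_keys, high)
    return [records[p] for k in sorted_keys[lo:hi] for p in buckets[k]]
```

Each pointer followed may land in a different part of the file, which is why scans in secondary-index order tend to be more expensive than scans in primary-index order.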
  • 18. Secondary Index (figure)
  • 19. Primary vs. Secondary Indices
      - A sequential scan in primary index order is efficient because records in the file are stored physically in the same order as the index order.
      - Secondary indices improve the performance of queries that use keys other than the search key of the primary index. However, they impose a significant overhead on modification of the database. The designer of a database decides which secondary indices are desirable on the basis of an estimate of the relative frequency of queries and modifications.
      - The primary index is on the field that specifies the sequential order of the data file.
      - There can be only one primary index, while there can be many secondary indices.
  • 20. B+ Tree Index Files
      - The main disadvantage of the index-sequential file organization is that performance degrades as the file grows, both for index lookups and for sequential scans through the data. To overcome this deficiency, we use a B+ tree index.
      - The B+ tree index structure is the most widely used of several index structures that maintain their efficiency despite insertion and deletion of data.
      - It is a balanced tree in which every path from the root of the tree to a leaf of the tree is of the same length.
      - A B+ tree index is a multilevel index.
  • 21. B+ Tree Index Files: Node Occupancy
      - Each node that is not a root or a leaf has between ⌈n/2⌉ and n children.
      - A leaf node has between ⌈(n − 1)/2⌉ and n − 1 values.
      - Special cases:
        - If the root is not a leaf, it has at least 2 children.
        - If the root is a leaf (that is, there are no other nodes in the tree), it can have between 0 and n − 1 values.
  • 22. B+ Tree Index Files: Node Structure
      - A node contains up to n − 1 search-key values K1, K2, ..., Kn−1, and n pointers P1, P2, ..., Pn.
      - The search keys in a node are ordered: K1 < K2 < K3 < ... < Kn−1.
      - For leaf nodes, for i = 1, 2, ..., n − 1, pointer Pi points either to a file record with search-key value Ki or to a bucket of pointers, each of which points to a file record with search-key value Ki.
  • 23. B+ Tree Index Files: Fanout
      - A non-leaf node may hold up to n pointers, and must hold at least ⌈n/2⌉ pointers.
      - The number of pointers in a node is called the fanout of the node.
      - The root node can hold fewer than ⌈n/2⌉ pointers. However, it must hold at least two pointers unless it is the only node in the tree.
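The occupancy rules can be tabulated for a given n with a small helper. The function name and dictionary keys are hypothetical. The sketch assumes the bounds n/2 and (n − 1)/2 are rounded up, which is the usual textbook convention.

```python
from math import ceil

def bplus_bounds(n):
    """(min, max) occupancy for each node kind in a B+ tree of order n."""
    return {
        "nonleaf_children": (ceil(n / 2), n),
        "leaf_values": (ceil((n - 1) / 2), n - 1),
        "root_children_if_not_leaf": (2, n),
        "root_values_if_leaf": (0, n - 1),
    }

for n in (4, 6):
    print(n, bplus_bounds(n))
```

For n=4 a leaf holds 2 to 3 values and an interior node 2 to 4 children; for n=6 the ranges are 3 to 5 values and 3 to 6 children.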
  • 24. B+ Tree Construction Example
      Construct a B+ tree for the following set of key values: (2, 3, 5, 7, 11, 17, 19, 23, 29, 31), for n=4 and n=6.
      Solution for order n=4: each node holds up to 3 search-key values and 4 pointers.
      - Insert key value 2: root (leaf) [2]
      - Insert key value 3: root (leaf) [2 3]
  • 25. B+ Tree Construction, n=4 (continued)
      - Insert key value 5: root (leaf) [2 3 5]
      - Insert key value 7: split the node. Root [5]; leaves [2 3] [5 7]
  • 26. B+ Tree Construction, n=4 (continued)
      - Insert key value 11: root [5]; leaves [2 3] [5 7 11]
      - Insert key value 17: split the node. Root [5 11]; leaves [2 3] [5 7] [11 17]
  • 27. B+ Tree Construction, n=4 (continued)
      - Insert key value 19: root [5 11]; leaves [2 3] [5 7] [11 17 19]
      - Insert key value 23: split the node. Root [5 11 19]; leaves [2 3] [5 7] [11 17] [19 23]
  • 28. B+ Tree Construction, n=4 (continued)
      - Insert key value 29: root [5 11 19]; leaves [2 3] [5 7] [11 17] [19 23 29]
  • 29. B+ Tree Construction, n=4 (continued)
      - Insert key value 31: root [19]; non-leaf nodes [5 11] [29]; leaves [2 3] [5 7] [11 17] [19 23] [29 31]
  • 30. B+ Tree Construction, n=6
      - Final tree for n=6: root [7 19]; leaves [2 3 5] [7 11 17] [19 23 29 31]
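The construction steps above can be reproduced with a small insertion-only sketch. It is not the slides' code: the class names, the `levels` helper, and the exact split rule (the left leaf keeps ⌈n/2⌉ keys, the left non-leaf keeps ⌈(n+1)/2⌉ pointers) are assumptions chosen because they match the trees shown. Deletion, duplicate keys, and the links between sibling leaves are left out.

```python
import bisect
from math import ceil

class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []             # child nodes; stays empty in leaves

class BPlusTree:
    def __init__(self, n):
        self.n = n                     # max pointers per node, so max n-1 keys
        self.root = Node(leaf=True)

    def insert(self, key):
        split = self._insert(self.root, key)
        if split:                      # root was split: tree grows one level
            sep, right = split
            new_root = Node(leaf=False)
            new_root.keys = [sep]
            new_root.children = [self.root, right]
            self.root = new_root

    def _insert(self, node, key):
        """Insert key below node; return (separator, new right node) on split."""
        if node.leaf:
            bisect.insort(node.keys, key)
            if len(node.keys) < self.n:
                return None            # still at most n-1 keys
            mid = ceil(self.n / 2)
            right = Node(leaf=True)
            right.keys = node.keys[mid:]
            node.keys = node.keys[:mid]
            return right.keys[0], right    # separator is COPIED up from the leaf
        i = bisect.bisect_right(node.keys, key)
        split = self._insert(node.children[i], key)
        if split is None:
            return None
        sep, child = split
        node.keys.insert(i, sep)
        node.children.insert(i + 1, child)
        if len(node.children) <= self.n:
            return None
        mid = ceil((self.n + 1) / 2)   # pointers kept in the left node
        up = node.keys[mid - 1]        # separator is MOVED up from a non-leaf
        right = Node(leaf=False)
        right.keys = node.keys[mid:]
        right.children = node.children[mid:]
        node.keys = node.keys[:mid - 1]
        node.children = node.children[:mid]
        return up, right

    def levels(self):
        """Keys of every node, grouped by level from the root down."""
        out, level = [], [self.root]
        while level:
            out.append([node.keys for node in level])
            level = [c for node in level for c in node.children]
        return out

tree = BPlusTree(4)
for k in (2, 3, 5, 7, 11, 17, 19, 23, 29, 31):
    tree.insert(k)
print(tree.levels())
```

Running the same loop with `BPlusTree(6)` gives a two-level tree with root keys 7 and 19, in line with the n=6 result.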
  • 31. B-Tree Index Files
      - B-tree indices are similar to B+ tree indices. The primary distinction between the two approaches is that a B-tree eliminates the redundant storage of search-key values.
      - A B-tree allows search-key values to appear only once. Thus, it is necessary to include an additional pointer field for each search key in a nonleaf node. These additional pointers point to either file records or buckets for the associated search key.
  • 32. B-Tree Index Files (figure)
      - A generalized B-tree leaf node and a nonleaf node appear in Fig. (a) and Fig. (b) respectively.
  • 33. B-Tree Index Files (continued)
      - Leaf nodes are the same as in B+ trees. In nonleaf nodes, the pointers Pi are the tree pointers that we also used for B+ trees, while the pointers Bi are bucket or file-record pointers. In the generalized B-tree in the figure, there are n − 1 keys in the leaf node, but there are m − 1 keys in the nonleaf node. This discrepancy occurs because nonleaf nodes must include the pointers Bi, which reduces the number of search keys that can be held in these nodes.
      Advantages of B-Tree Indices
      - May use fewer tree nodes than a corresponding B+ tree.
      - It is sometimes possible to find a search-key value before reaching a leaf node.
  • 34. Disadvantages of B-Tree Indices
      - Only a small fraction of all search-key values are found early.
      - Nonleaf nodes are larger, so fanout is reduced. Thus, B-trees typically have greater depth than the corresponding B+ tree.
      - Insertion and deletion are more complicated than in B+ trees.
      - Implementation is harder than for B+ trees.
  • 35. B-Tree Construction Example
      Construct a B-tree for the following set of key values: (2, 3, 5, 7, 11, 17, 19, 23, 29, 31), for n=4 and n=6.
      Solution for order n=4: each node holds up to 3 search-key values and 4 pointers.
      - Insert key value 2: root (leaf) [2]
      - Insert key value 3: root (leaf) [2 3]
  • 36. B-Tree Construction, n=4 (continued)
      - Insert key value 5: root (leaf) [2 3 5]
      - Insert key value 7: root [5]; leaves [2 3] [7]
  • 37. B-Tree Construction, n=4 (continued)
      - Insert key value 11: root [5]; leaves [2 3] [7 11]
      - Insert key value 17: root [5]; leaves [2 3] [7 11 17]
  • 38. B-Tree Construction, n=4 (continued)
      - Insert key value 19: root [5 17]; leaves [2 3] [7 11] [19]
      - Insert key value 23: root [5 17]; leaves [2 3] [7 11] [19 23]
  • 39. B-Tree Construction, n=4 (continued)
      - Insert key value 29: root [5 17]; leaves [2 3] [7 11] [19 23 29]
  • 40. B-Tree Construction, n=4 (continued)
      - Insert key value 31: root [5 17 29]; leaves [2 3] [7 11] [19 23] [31]
  • 41. Hashing
      - One disadvantage of sequential file organization is that we must use an index structure to locate data. File organizations based on the technique of hashing allow us to avoid accessing an index structure. Hashing also provides a way of constructing indices.
      - File organizations based on hashing allow us to find the address of a data item directly by computing a function on the search-key value of the desired record.
  • 42. Hash File Organization
      - In a hash file organization, we obtain the address of the disk block (also called the bucket) containing a desired record directly by computing a function on the search-key value of the record.
      - Let K denote the set of all search-key values, and let B denote the set of all bucket addresses. A hash function h is a function from K to B.
      - To insert a record with search key Ki, we compute h(Ki), which gives the address of the bucket for that record. Assume for now that there is space in the bucket to store the record. Then, the record is stored in that bucket.
  • 43. Hash File Organization (continued)
      - To perform a lookup on a search-key value Ki, we simply compute h(Ki), then search the bucket with that address. Suppose that two search keys, K5 and K7, have the same hash value; that is, h(K5) = h(K7). If we perform a lookup on K5, the bucket h(K5) contains records with search-key value K5 and records with search-key value K7. Thus, we have to check the search-key value of every record in the bucket to verify that the record is one that we want.
  • 44. Hash File Organization: An Example
      - Let us choose a hash function for the account file using the search key branch_name.
      - Suppose we have 26 buckets and we define a hash function that maps names beginning with the ith letter of the alphabet to the ith bucket.
      - This hash function has the virtue of simplicity, but it fails to provide a uniform distribution, since we expect more branch names to begin with such letters as B and R than Q and X.
  • 45. Hash File Organization: An Example (continued)
      - Instead, we consider 10 buckets and a hash function that computes the sum of the binary representations of the characters of a key, then returns the sum modulo the number of buckets.
      - For branch name 'Perryridge': bucket no. = h(Perryridge) = 5
      - For branch name 'Round Hill': bucket no. = h(Round Hill) = 3
      - For branch name 'Brighton': bucket no. = h(Brighton) = 3
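The three bucket numbers above are reproduced if each letter is encoded as its position in the alphabet (a=1, ..., z=26), ignoring case and non-letter characters. The slide only says "binary representations", so that encoding is an assumption of this sketch, as is the function name `h`.

```python
def h(branch_name, n_buckets=10):
    """Sum of alphabet positions of the letters, modulo the bucket count."""
    total = sum(ord(c) - ord("a") + 1
                for c in branch_name.lower()
                if "a" <= c <= "z")    # skip spaces and other non-letters
    return total % n_buckets

for name in ("Perryridge", "Round Hill", "Brighton"):
    print(name, h(name))               # buckets 5, 3 and 3
```

'Round Hill' and 'Brighton' land in the same bucket, which is the collision case that forces a lookup to check the search key of every record in the bucket.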
  • 46. Hash File Organization: An Example (figure)
  • 47. Handling of Bucket Overflows
      On insertion, if the bucket does not have enough space, a bucket overflow is said to occur. Bucket overflow can occur mainly for two reasons:
      - Insufficient buckets. The number of buckets nB must be chosen such that nB > nr/fr, where nr denotes the total number of records that will be stored and fr denotes the number of records that will fit in a bucket.
      - Skew. Some buckets are assigned more records than others, so a bucket may overflow even when other buckets still have space. This situation is called bucket skew. Skew can occur for two reasons:
        - Multiple records may have the same search key.
        - The chosen hash function may result in a non-uniform distribution of search keys.
  • 48. Handling of Bucket Overflows: Solution
      - If a record must be inserted into a bucket b, and b is already full, the system provides an overflow bucket for b and inserts the record into the overflow bucket. If the overflow bucket is also full, the system provides another overflow bucket, and so on. All the overflow buckets of a given bucket are chained together in a linked list.
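Overflow chaining can be sketched as below. The class and method names are hypothetical, buckets are in-memory lists rather than disk blocks, and the hash function is passed in so that the behaviour is deterministic.

```python
class Bucket:
    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []              # (key, record) pairs
        self.overflow = None           # next bucket in the overflow chain

class HashFile:
    def __init__(self, n_buckets, capacity, hash_fn=hash):
        self.capacity = capacity
        self.hash_fn = hash_fn
        self.buckets = [Bucket(capacity) for _ in range(n_buckets)]

    def _home(self, key):
        return self.buckets[self.hash_fn(key) % len(self.buckets)]

    def insert(self, key, record):
        b = self._home(key)
        # Walk the chain; add a new overflow bucket when every one is full.
        while len(b.records) >= b.capacity:
            if b.overflow is None:
                b.overflow = Bucket(self.capacity)
            b = b.overflow
        b.records.append((key, record))

    def lookup(self, key):
        out, b = [], self._home(key)
        while b is not None:           # the whole chain must be searched
            out.extend(r for k, r in b.records if k == key)
            b = b.overflow
        return out

    def chain_length(self, key):
        n, b = 0, self._home(key)
        while b is not None:
            n, b = n + 1, b.overflow
        return n

# Five keys that all hash to bucket 1 of 4, with room for 2 records each.
f = HashFile(n_buckets=4, capacity=2, hash_fn=lambda k: k)
for k in (1, 5, 9, 13, 17):
    f.insert(k, f"r{k}")
print(f.lookup(9), f.chain_length(1))
```

The skewed keys in the usage example build a chain of three buckets while the other three home buckets stay empty, which is the bucket-skew situation described above.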
  • 49. Handling of Bucket Overflows (figure)
  • 50. Difference between Open and Closed Hashing: Closed Hashing
      - Closed hashing always places keys with the same hash function value in the same bucket (including its overflow buckets).
      - If a bucket is full, the system inserts records in overflow buckets.
      - Different buckets can be of different sizes.
      - Overflow buckets are linked together.
  • 51. Difference between Open and Closed Hashing: Open Hashing
      - Open hashing places keys with the same hash function value in a different bucket if the home bucket is full.
      - The set of buckets is fixed, and there are no overflow chains.
      - Deletion is difficult in open hashing.
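One way to picture open hashing, in the sense used above, is linear probing: when the home bucket is full, try the next bucket in cyclic order. The slides do not name a probing policy, so linear probing, the class name, and the methods below are all assumptions of this sketch.

```python
class OpenHashFile:
    def __init__(self, n_buckets, capacity, hash_fn=hash):
        self.capacity = capacity
        self.hash_fn = hash_fn
        self.buckets = [[] for _ in range(n_buckets)]   # fixed set of buckets

    def insert(self, key, record):
        """Store the record and return the bucket number it ended up in."""
        n = len(self.buckets)
        start = self.hash_fn(key) % n
        for step in range(n):
            i = (start + step) % n     # next bucket in cyclic order
            if len(self.buckets[i]) < self.capacity:
                self.buckets[i].append((key, record))
                return i
        raise OverflowError("all buckets are full")

    def lookup(self, key):
        n = len(self.buckets)
        start = self.hash_fn(key) % n
        out = []
        for step in range(n):
            b = self.buckets[(start + step) % n]
            out.extend(r for k, r in b if k == key)
            # With no deletions, a bucket that still has room ends the probe:
            # no record for this key could have been pushed past it.
            if len(b) < self.capacity:
                break
        return out

g = OpenHashFile(n_buckets=4, capacity=1, hash_fn=lambda k: k)
print(g.insert(1, "a"), g.insert(5, "b"))   # 1 2
```

The early exit in `lookup` is only valid while nothing is ever removed. Freeing a slot in the middle of a probe sequence would hide the records stored beyond it, which is one way to see why deletion is difficult in open hashing.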
  • 52. Hash Indices
      - Hashing can be used not only for file organization, but also for index-structure creation.
      - We construct a hash index as follows: we apply a hash function on a search key to identify a bucket, and store the key and its associated pointers in the bucket.
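A hash index differs from a hash file organization in that the buckets hold keys and pointers rather than the records themselves. A minimal sketch follows, in which the sample records, the bucket count, and the byte-sum hash function are illustrative assumptions.

```python
# Data file in arbitrary order (hypothetical sample records).
records = [("Perryridge", 400), ("Brighton", 750), ("Round Hill", 350),
           ("Perryridge", 900), ("Mianus", 700)]

N_BUCKETS = 4

def h(key):
    # Illustrative hash: sum of the UTF-8 bytes modulo the bucket count.
    return sum(key.encode()) % N_BUCKETS

# Each index bucket stores (search key, pointer) pairs; the pointer here is
# simply the record's position in the data file.
index = [[] for _ in range(N_BUCKETS)]
for pos, (name, _) in enumerate(records):
    index[h(name)].append((name, pos))

def lookup(name):
    # Check the key of every entry: other keys may share the bucket.
    return [records[p] for k, p in index[h(name)] if k == name]
```

Since the data file is not ordered on the search key, every matching record needs its own pointer, as with a secondary index.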
  • 53. Hash Indices (figure)