Ds 8
- 1. Introduction to Hashing Techniques
Objectives
In this lesson, you will learn to:
Randomly access data by using a hash index
Implement a hashing function
Define different hashing techniques
Define collision and how collisions are handled
Create a hash index and use it to randomly access
data from a file by using the key field
Introduction to Hashing Techniques/Lesson 8/Slide 1 of 21
©NIIT
- 2. Introduction to Hashing Techniques
Hashing
Hashing means converting a key to an address in order to
retrieve a record.
Given a key, the offset of the record can be calculated
with the following formula:
Offset = Key * Record length
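The formula above can be sketched in Python as follows. This is only an illustration: the 12-byte record length, the in-memory file, the sample records, and the function names are assumptions, and keys are assumed to be numbered from 0.

```python
import io

RECORD_LENGTH = 12  # assumed fixed record size in bytes


def record_offset(key: int, record_length: int = RECORD_LENGTH) -> int:
    """Offset = Key * Record length (keys assumed to start at 0)."""
    return key * record_length


def read_record(f, key: int, record_length: int = RECORD_LENGTH) -> bytes:
    """Jump straight to the record for `key` without scanning the file."""
    f.seek(record_offset(key, record_length))
    return f.read(record_length)


# Three made-up 12-byte records stored back to back in an in-memory "file".
data = io.BytesIO(b"record-00000" b"record-00001" b"record-00002")
second = read_record(data, 1)
```

Because every record has the same length, the offset is a single multiplication and no search is needed.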
- 3. Introduction to Hashing Techniques
Hashing Functions
Given a key, the hash function converts it into a hash
value (location) within the range 1 to n, where n is the
size of the storage (address) space that has been
allocated for the records.
The record is then retrieved from the location generated.
Division is one of the most commonly used hashing
functions.
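A minimal sketch of a division-based hashing function is given below. It assumes integer keys and takes the remainder of the key divided by n, then adds 1 so that the result falls in the range 1 to n. The function name and the sample values are hypothetical.

```python
def division_hash(key: int, n: int) -> int:
    """Map an integer key to a location in the range 1..n.

    The remainder key % n lies in 0..n-1, so 1 is added to shift
    it into the 1..n range used in this lesson.
    """
    if n <= 0:
        raise ValueError("n must be a positive storage size")
    return key % n + 1


location = division_hash(1234, 10)
```

Two different keys can produce the same location (for example, 1234 and 44 with n = 10), which is why collision handling is needed.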
- 4. Introduction to Hashing Techniques
Hashing Techniques
Two hashing techniques commonly employed are:
Hash indexes
Hash tables
Both techniques share the same goal:
To identify the location of a data record in a file
by using a key-to-address transformation.
- 5. Introduction to Hashing Techniques
Hash Indexes
An index created by placing keys in locations
calculated using a hashing function is called a hash
index file.
It contains a key-offset pair corresponding to each
record in the data file.
The hash index technique uses the following two
files:
A file containing the data records, and
A hash index file.
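The two-file arrangement can be sketched as follows. This is an illustration under several assumptions: the data file is simulated in memory, records are padded to 10 bytes, the index has 7 slots, the hashing function is division, and collisions are simply rejected. All names and sample records are hypothetical.

```python
import io

INDEX_SLOTS = 7      # assumed size of the hash index
RECORD_LENGTH = 10   # assumed fixed record length in bytes


def slot_for(key: int) -> int:
    """Division hashing: position of the key in the index."""
    return key % INDEX_SLOTS


def build_index(data_file, records):
    """Append each record to the data file and store a (key, offset) pair."""
    index = [None] * INDEX_SLOTS
    for key, payload in records:
        slot = slot_for(key)
        if index[slot] is not None:
            raise ValueError(f"collision at slot {slot}; not handled here")
        data_file.seek(0, io.SEEK_END)
        offset = data_file.tell()
        data_file.write(payload.ljust(RECORD_LENGTH))
        index[slot] = (key, offset)
    return index


def lookup(data_file, index, key):
    """Hash the key, read the offset from the index, then fetch the record."""
    entry = index[slot_for(key)]
    if entry is None or entry[0] != key:
        return None
    data_file.seek(entry[1])
    return data_file.read(RECORD_LENGTH).rstrip()


data_file = io.BytesIO()
index = build_index(data_file, [(15, b"Delhi"), (23, b"Mumbai"), (40, b"Chennai")])
found = lookup(data_file, index, 23)
```

The data records stay in arrival order in the data file; only the small index is organized by hash value.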
- 6. Introduction to Hashing Techniques
Hash Tables
Hash tables make use of the data file only.
This technique involves calculating a location based on
the value of a key.
In this method, the whole record is inserted at the
calculated position in the data file, i.e. the hash table.
- 7. Introduction to Hashing Techniques
Collisions
An attempt to store two keys at the same position is
known as a collision.
Collisions can occur irrespective of the hashing function used.
- 8. Introduction to Hashing Techniques
Collision Processing
Rehashing
This method involves using a secondary hash
function, called a rehash function, on the hash
value of the key.
The rehash function is applied successively until an
empty position is found.
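Rehashing can be sketched as shown below. The rehash function used here (move to the next position, wrapping around at the end) is only one possible choice and is an assumption, as are the table size of 7, the division hashing function, and the sample keys.

```python
TABLE_SIZE = 7  # assumed number of positions


def hash_key(key: int) -> int:
    """Primary hashing function (division)."""
    return key % TABLE_SIZE


def rehash(position: int) -> int:
    """Assumed rehash function, applied to the previous hash value."""
    return (position + 1) % TABLE_SIZE


def insert(table, key: int) -> int:
    """Apply the rehash function successively until an empty position is found."""
    position = hash_key(key)
    for _ in range(TABLE_SIZE):
        if table[position] is None:
            table[position] = key
            return position
        position = rehash(position)
    raise OverflowError("table is full")


table = [None] * TABLE_SIZE
# 10, 17 and 24 all hash to position 3, so the last two must be rehashed.
positions = [insert(table, key) for key in (10, 17, 24)]
```

Retrieval follows the same sequence of positions until the key or an empty position is found.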
- 9. Introduction to Hashing Techniques
Collision Processing (Contd..)
Chaining
This method uses links (pointers) to resolve hash
clashes.
Two chaining techniques are:
Coalesced Chaining
Separate Chaining
- 10. Introduction to Hashing Techniques
Coalesced Chaining
It is designed so that a record collides at most once,
even when several keys share the same hash value: the
colliding record is stored in an overflow area and
linked to its home position.
It requires the storage area to be divided into two
parts:
A prime hash area
An overflow area
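The two-area layout can be sketched as follows. This is a simplified illustration, not a definitive implementation: the area sizes, the division hashing function, the [key, next] slot layout, and the function names are all assumptions.

```python
PRIME_SIZE = 5     # assumed size of the prime hash area
OVERFLOW_SIZE = 5  # assumed size of the overflow area


def make_storage():
    """Slots 0..PRIME_SIZE-1 are the prime area; the rest are overflow.

    Each slot is [key, next], where next is the index of the next slot
    in the chain or None.
    """
    return [[None, None] for _ in range(PRIME_SIZE + OVERFLOW_SIZE)]


def insert(storage, key: int) -> int:
    home = key % PRIME_SIZE
    if storage[home][0] is None:
        storage[home][0] = key
        return home
    # Collision: take the first free slot in the overflow area.
    for free in range(PRIME_SIZE, PRIME_SIZE + OVERFLOW_SIZE):
        if storage[free][0] is None:
            break
    else:
        raise OverflowError("overflow area is full")
    # Link the new slot to the end of the chain that starts at home.
    tail = home
    while storage[tail][1] is not None:
        tail = storage[tail][1]
    storage[free][0] = key
    storage[tail][1] = free
    return free


def find(storage, key: int):
    position = key % PRIME_SIZE
    while position is not None and storage[position][0] is not None:
        if storage[position][0] == key:
            return position
        position = storage[position][1]
    return None


storage = make_storage()
positions = [insert(storage, key) for key in (12, 22, 32)]
```

Keys 12, 22 and 32 share the same home position, so the first stays in the prime area and the other two are chained through the overflow area.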
- 11. Introduction to Hashing Techniques
Separate Chaining
In this method, an array of header nodes is used.
Each element in the array is a pointer, which stores
the address of a distinct linked list.
Each linked list is a list of records whose keys have
the same hash value.
When a record has to be retrieved, the hashing
function converts the given key to yield a position
(subscript) in the array. The linked list at that
position is then searched for the record.
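Separate chaining can be sketched as follows. The number of header nodes, the division hashing function, the Node class, and the sample records are assumptions made for illustration.

```python
NUM_HEADERS = 5  # assumed size of the array of header nodes


class Node:
    """One record in a linked list."""

    def __init__(self, key, record, next_node=None):
        self.key = key
        self.record = record
        self.next = next_node


def insert(headers, key, record):
    """Push the record onto the front of the list for its hash value."""
    slot = key % NUM_HEADERS
    headers[slot] = Node(key, record, headers[slot])


def find(headers, key):
    """Hash to a subscript, then walk that one linked list."""
    node = headers[key % NUM_HEADERS]
    while node is not None:
        if node.key == key:
            return node.record
        node = node.next
    return None


headers = [None] * NUM_HEADERS
# 3, 8 and 13 all hash to subscript 3, so they share one linked list.
for key, record in [(3, "Agra"), (8, "Bhopal"), (13, "Cochin")]:
    insert(headers, key, record)
```

Only the list at the hashed subscript is searched, so records with other hash values are never examined.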
- 12. Introduction to Hashing Techniques
Bucket Hashing
The hashing of a key yields the position of a storage
area in which several key entries can be stored. This
storage area is called a bucket.
The file is divided into a number of such buckets.
Each bucket has enough space to store multiple
values.
When a record has to be retrieved, its key is hashed
to give an offset. This offset is a bucket offset. Then
the bucket is read into internal memory and searched
sequentially.
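The retrieval steps above can be sketched as follows. The entry layout (a 4-byte key followed by a 10-byte name), the bucket and slot counts, the use of -1 to mark an empty slot, and the in-memory file are all assumptions for illustration.

```python
import io
import struct

NUM_BUCKETS = 4
SLOTS_PER_BUCKET = 3
ENTRY = struct.Struct("<i10s")  # assumed layout: 4-byte key, 10-byte name
BUCKET_SIZE = ENTRY.size * SLOTS_PER_BUCKET
EMPTY_KEY = -1                  # assumed marker for an unused slot


def bucket_offset(key: int) -> int:
    """Hash the key to a bucket number, then convert it to a file offset."""
    return (key % NUM_BUCKETS) * BUCKET_SIZE


def create_file():
    """Divide the file into buckets filled with empty entries."""
    empty_entry = ENTRY.pack(EMPTY_KEY, b"")
    return io.BytesIO(empty_entry * SLOTS_PER_BUCKET * NUM_BUCKETS)


def read_bucket(f, key: int):
    """Read the whole bucket into memory as a list of (key, name) pairs."""
    f.seek(bucket_offset(key))
    raw = f.read(BUCKET_SIZE)
    return [ENTRY.unpack_from(raw, i * ENTRY.size) for i in range(SLOTS_PER_BUCKET)]


def insert(f, key: int, name: bytes):
    for i, (stored_key, _) in enumerate(read_bucket(f, key)):
        if stored_key == EMPTY_KEY:
            f.seek(bucket_offset(key) + i * ENTRY.size)
            f.write(ENTRY.pack(key, name))
            return
    raise OverflowError("bucket is full")


def find(f, key: int):
    """Search the bucket sequentially once it is in memory."""
    for stored_key, name in read_bucket(f, key):
        if stored_key == key:
            return name.rstrip(b"\x00")
    return None


bucket_file = create_file()
insert(bucket_file, 5, b"Pune")
insert(bucket_file, 9, b"Surat")  # 5 and 9 hash to the same bucket
```

A collision costs nothing extra here as long as the bucket still has a free slot; a full bucket needs a separate overflow strategy, which this sketch does not cover.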
- 13. Introduction to Hashing Techniques
Hash Indexes Vs Hash Tables
The choice of hashing method depends on the
following factors:
Data Organization
Access Speed
Disk Space Requirement
- 14. Introduction to Hashing Techniques
An Example to Illustrate the Use of A Hash Table
The assumptions made in this example are:
The key is an alphanumeric field, the first byte
of which is a letter of the alphabet.
Only one key exists for a particular hash value.
The problem of collisions is not being
addressed.
The file structure assumed is shown below:
Field       Length  Type
City        10      String
Population  2       Integer
- 15. Introduction to Hashing Techniques
An Example to Illustrate the Use of A Hash Table
(Contd..)
The hashing algorithm used is as follows: the first
letter from the alphabetic key is extracted and the
position of this letter in the alphabet is used as the
hash value. If the first letter in the key is C, then
the hash value is 3. Thus, it is obvious that the
number of positions (or buckets) that a key might
hash to is 26, which is the number of letters in the
alphabet.
The processing required to create the hash table
involves the following steps:
Creating file space
- 16. Introduction to Hashing Techniques
An Example to Illustrate the Use of A Hash Table
(Contd..)
Accepting Data
Find the correct bucket
Writing to the hash table
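The four steps can be sketched as follows, under the assumptions stated for this example (one record per bucket, no collision handling). The in-memory file, the byte layout chosen for the 10-byte city and 2-byte population fields, the function names, and the sample population figure are assumptions made for illustration.

```python
import io
import struct

RECORD = struct.Struct("<10sH")  # city: 10 bytes, population: 2-byte unsigned integer
NUM_BUCKETS = 26                 # one bucket per letter of the alphabet


def hash_value(city: str) -> int:
    """Position of the first letter in the alphabet: A -> 1, C -> 3, Z -> 26."""
    first = city[:1].upper()
    if not ("A" <= first <= "Z"):
        raise ValueError("key must start with a letter")
    return ord(first) - ord("A") + 1


def create_file_space():
    """Step 1: reserve space for 26 empty records."""
    return io.BytesIO(bytes(RECORD.size * NUM_BUCKETS))


def write_record(table, city: str, population: int):
    """Steps 3 and 4: find the correct bucket and write the record there."""
    table.seek((hash_value(city) - 1) * RECORD.size)
    table.write(RECORD.pack(city.encode("ascii"), population))


def read_record(table, city: str):
    table.seek((hash_value(city) - 1) * RECORD.size)
    name, population = RECORD.unpack(table.read(RECORD.size))
    return name.rstrip(b"\x00").decode("ascii"), population


table = create_file_space()
# Step 2: accept data (a made-up sample record).
write_record(table, "Calcutta", 450)
record = read_record(table, "Calcutta")
```

Writing a second city that starts with the same letter would overwrite the first record, which is exactly the collision problem this example sets aside.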
- 17. Introduction to Hashing Techniques
Problem Statement 8.D.1
Create a hash table for records having structure given
below:
Field       Size
City        10
Population  2
The hashing algorithm used is as follows: the first
letter of the alphabetic key is extracted and its
position in the alphabet is used as the hash value. If
the first letter in the key is C, then the hash value is
3. Thus, the number of positions (or buckets) that a key
might hash to is 26, which is the number of letters in
the alphabet.
- 18. Introduction to Hashing Techniques
Problem Statement 8.D.1 (Contd..)
Only one key exists for a particular hash value. In
other words, there is only one record per bucket.
The problem of collisions is not being addressed.
The key is an alphanumeric field, the first byte of
which is a letter of the alphabet.
- 19. Introduction to Hashing Techniques
Summary
In this lesson, you learned that:
Hashing is a technique used to access data stored in
files
Using hashing techniques, it is possible to calculate
the position of a record in a data file from its key field
value
The main purpose of hashing is to eliminate
unnecessary searching by using the method of direct
access to retrieve a record. This is done by
transforming the key to yield the offset of the record
An algorithm, called a hashing function, is used to
perform the key-to-address transformation
- 20. Introduction to Hashing Techniques
Summary (Contd..)
It often happens that more than one key hashes to
the same hash value, resulting in a collision. As a result,
an attempt is made to store two records in one
location. Collisions are processed by using various
algorithms, three of which are:
Rehashing
Linked list collision processing
Bucket hashing
- 21. Introduction to Hashing Techniques
Summary (Contd..)
Several hashing strategies can be employed, two of
which are:
Hash indexing
Hash tables
Editor's Notes
- Lower Bound and Upper Bound to denote the first element number and the last element number respectively