Ds 8

Introduction to Hashing Techniques

Objectives
In this lesson, you will learn to:
Randomly access data by using a hash index
Implement a hashing function
Define different hashing techniques
Define collisions and explain how they are handled
Create a hash index and use it to randomly access data from a file by using the key field

Introduction to Hashing Techniques/Lesson 8/Slide 1 of 21
©NIIT


Hashing
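Hashing means converting a key into the address at which a record is stored, so that the record can be retrieved directly.
In the simplest case, where the key is a record number starting at 0 and all records have the same length, the offset of the record can be calculated with the following formula:
Offset = Key * Record length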
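A minimal sketch of the formula above, assuming fixed-length 12-byte records numbered from 0 and an in-memory `io.BytesIO` object standing in for the data file. The helper names `record_offset` and `read_record` are hypothetical. The program seeks straight to the computed offset instead of reading the records before it.

```python
import io

RECORD_LENGTH = 12  # assumed fixed record length in bytes


def record_offset(key: int) -> int:
    """Byte offset of the record whose key (record number, from 0) is `key`."""
    return key * RECORD_LENGTH


def read_record(f, key: int) -> bytes:
    """Seek directly to the record and read exactly one record's worth of bytes."""
    f.seek(record_offset(key))
    return f.read(RECORD_LENGTH)


# Build a small data file holding three fixed-length records (keys 0, 1, 2).
data = io.BytesIO()
for name in (b"alpha", b"bravo", b"charlie"):
    data.write(name.ljust(RECORD_LENGTH, b" "))

print(record_offset(2))              # 24
print(read_record(data, 1).strip())  # b'bravo'
```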



Hashing Functions
Given a key, the hash function converts it into a hash value (location) in the range 1 to n, where n is the size of the storage (address) space allocated for the records.
The record is then retrieved from the generated location.
Division (taking the remainder when the key is divided by the size of the address space) is one of the most commonly used hashing functions.
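A sketch of the division method, assuming non-negative integer keys. Because `key % n` lies in 0 to n-1, the hypothetical helper `division_hash` adds 1 so that the result falls in the 1 to n range used above. The table size of 11 is an assumption; a prime is often chosen.

```python
def division_hash(key: int, n: int) -> int:
    """Division method: map a non-negative integer key to a location in 1..n.

    key % n is in 0..n-1, so 1 is added to match the 1..n range in the text.
    """
    return key % n + 1


n = 11  # assumed size of the address space
for key in (25, 37, 100):
    print(key, "->", division_hash(key, n))
# 25 -> 4
# 37 -> 5
# 100 -> 2
```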



Hashing Techniques
Two hashing techniques commonly employed are:
Hash indexes
Hash tables
The goal of both techniques is the same:
To identify the location of a data record in a file through a key-to-address transformation.



Hash Indexes
An index created by placing keys in locations calculated by a hashing function is called a hash index file.
It contains a key-offset pair for each record in the data file.
The technique that employs a hash index uses the following two files:
A file containing the data records, and
A hash index file.
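A sketch of the two-file arrangement, assuming integer keys, a division hash over 10 index slots, and fixed 16-byte records. `io.BytesIO` stands in for the data file and a Python list stands in for the hash index file; all names are hypothetical. Records are appended to the data file in arrival order, and only the key-offset pair is placed at the hashed position in the index. Collisions are not handled in this sketch.

```python
import io

N_SLOTS = 10        # assumed size of the hash index
RECORD_LENGTH = 16  # assumed fixed record length in bytes (payloads must fit)

data_file = io.BytesIO()      # stands in for the data file
hash_index = [None] * N_SLOTS  # stands in for the hash index file


def h(key: int) -> int:
    """Division hash giving an index slot in 0..N_SLOTS-1."""
    return key % N_SLOTS


def add_record(key: int, payload: bytes) -> None:
    """Append the record to the data file and store its key-offset pair in the index."""
    slot = h(key)
    if hash_index[slot] is not None:
        raise ValueError("collision: index slot already in use")
    data_file.seek(0, io.SEEK_END)
    offset = data_file.tell()
    data_file.write(payload.ljust(RECORD_LENGTH, b" "))
    hash_index[slot] = (key, offset)


def find_record(key: int):
    """Hash the key to its index entry, then read the record at the stored offset."""
    entry = hash_index[h(key)]
    if entry is None or entry[0] != key:
        return None
    data_file.seek(entry[1])
    return data_file.read(RECORD_LENGTH).rstrip()


add_record(42, b"Delhi")
add_record(17, b"Mumbai")
print(hash_index[2], hash_index[7])  # (42, 0) (17, 16)
print(find_record(17))               # b'Mumbai'
```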



Hash Tables
Hash tables use only the data file.
A location is calculated from the value of the key.
The whole record is then inserted at the calculated position in the data file, which serves as the hash table.
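By contrast with a hash index, a hash table keeps no separate index: the data file is pre-allocated with fixed-length positions and each whole record is written at its hashed position. This is a sketch under assumed choices (a 16-byte record layout of in-use flag, integer key and 11-byte name; 10 positions; division hashing), with `io.BytesIO` standing in for the file. Names are hypothetical, and collisions simply raise an error here.

```python
import io
import struct

N_SLOTS = 10
RECORD = struct.Struct("<?i11s")  # assumed layout: in-use flag, integer key, 11-byte name
# The data file itself is the hash table: N_SLOTS fixed-length positions, all empty.
table = io.BytesIO(bytes(N_SLOTS * RECORD.size))


def h(key: int) -> int:
    return key % N_SLOTS


def insert(key: int, name: bytes) -> None:
    pos = h(key) * RECORD.size
    table.seek(pos)
    in_use, _, _ = RECORD.unpack(table.read(RECORD.size))
    if in_use:
        raise ValueError("collision: position already occupied")  # not handled here
    table.seek(pos)
    table.write(RECORD.pack(True, key, name))  # the whole record goes into the table


def lookup(key: int):
    table.seek(h(key) * RECORD.size)
    in_use, stored_key, name = RECORD.unpack(table.read(RECORD.size))
    if in_use and stored_key == key:
        return name.rstrip(b"\0")
    return None


insert(42, b"Delhi")
insert(17, b"Mumbai")
print(lookup(17), lookup(27))  # b'Mumbai' None
```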



Collisions
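An attempt to store two keys at the same position is known as a collision.
Collisions can occur irrespective of the hashing function used, because the number of possible keys is usually much larger than the number of available positions.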
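A small illustration, assuming the division hash and 10 positions: hashing 11 distinct keys must produce at least one collision (the pigeonhole principle), whatever function is used. The helper `find_collision` is hypothetical and reports the first pair of keys that land on the same position.

```python
def division_hash(key: int, n: int) -> int:
    return key % n + 1


def find_collision(keys, n):
    """Return (first_key, second_key, position) for the first pair that hash alike."""
    seen = {}  # position -> key already stored there
    for key in keys:
        pos = division_hash(key, n)
        if pos in seen:
            return seen[pos], key, pos
        seen[pos] = key
    return None


# Eleven keys, ten positions: at least two keys must share a position.
print(find_collision(range(100, 111), 10))  # (100, 110, 1)
```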



Collision Processing
Rehashing
This method applies a secondary hash function, called a rehash function, to the hash value of the key.
The rehash function is applied successively until an empty position is found.
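A sketch of rehashing, assuming positions numbered 1 to N, a division hash, and a rehash function `rh` that simply moves to the next position and wraps from N back to 1. This particular rehash function is an assumption chosen for simplicity (it amounts to linear probing); other rehash functions are possible. Each failed attempt applies `rh` to the previous hash value, and a search stops at the first empty position.

```python
N = 7  # assumed table size; positions are numbered 1..N as in the text


def h(key: int) -> int:
    return key % N + 1


def rh(pos: int) -> int:
    """Assumed rehash function: next position, wrapping from N back to 1."""
    return pos % N + 1


table = [None] * (N + 1)  # index 0 unused so positions run 1..N


def insert(key: int) -> int:
    pos = h(key)
    for _ in range(N):
        if table[pos] is None:
            table[pos] = key
            return pos
        pos = rh(pos)  # apply the rehash function to the previous hash value
    raise RuntimeError("table is full")


def search(key: int):
    pos = h(key)
    for _ in range(N):
        if table[pos] is None:
            return None  # an empty position ends the search
        if table[pos] == key:
            return pos
        pos = rh(pos)
    return None


positions = [insert(k) for k in (10, 17, 24, 13, 20)]
print(positions)  # [4, 5, 6, 7, 1]
```

Keys 10, 17 and 24 all hash to position 4, so the second and third are placed by one and two applications of `rh`.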



Collision Processing (Contd..)
Chaining
This method uses links (pointers) to resolve hash
clashes.
Two chaining techniques are:
Coalesced Chaining
Separate Chaining



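Coalesced Chaining
It eliminates the possibility of a record colliding more than once, even when several keys share the same hash value: a record whose home position is already occupied goes straight to the overflow area and is linked into the chain for its hash value.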
It requires the storage area to be divided into two
parts:
A prime hash area
An overflow area
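A sketch of the prime-area/overflow-area layout described above, assuming integer keys, a division hash over a prime area of 7 slots, and an overflow area of 5 slots that colliding records take in order. Sizes and names are hypothetical. Each slot stores a key and a link to the next slot in the same chain, so a colliding record is placed once, in the next free overflow slot, and never probes further.

```python
PRIME_SIZE = 7     # assumed size of the prime hash area (slots 0..6)
OVERFLOW_SIZE = 5  # assumed size of the overflow area (slots 7..11)

# Each used slot holds [key, link]; link is the index of the next slot in the
# same chain, or None at the end of the chain.
slots = [None] * (PRIME_SIZE + OVERFLOW_SIZE)
next_free = PRIME_SIZE  # next unused slot in the overflow area


def h(key: int) -> int:
    return key % PRIME_SIZE


def insert(key: int) -> int:
    global next_free
    home = h(key)
    if slots[home] is None:  # no collision: store in the prime area
        slots[home] = [key, None]
        return home
    if next_free == len(slots):
        raise RuntimeError("overflow area is full")
    pos = home  # collision: find the end of this chain
    while slots[pos][1] is not None:
        pos = slots[pos][1]
    new = next_free  # the record goes straight to the overflow area
    next_free += 1
    slots[new] = [key, None]
    slots[pos][1] = new  # link it into the chain
    return new


def search(key: int):
    pos = h(key)
    while pos is not None and slots[pos] is not None:
        if slots[pos][0] == key:
            return pos
        pos = slots[pos][1]
    return None


positions = [insert(k) for k in (10, 17, 24, 5)]
print(positions)  # [3, 7, 8, 5]
```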



Separate Chaining
In this method, an array of header nodes is used.
Each element in the array is a pointer that stores the address of a distinct linked list.
Each linked list contains the records whose keys have the same hash value.
When a record has to be retrieved, the hashing function converts the given key into a position (subscript) in the array, and the linked list at that position is searched for the key.
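A sketch of separate chaining, assuming integer keys, a division hash over 5 header entries, and new records pushed onto the front of their list. The `Node` class and helper names are hypothetical. Python object references play the role of the pointers in the header array.

```python
N = 5  # assumed number of header entries


class Node:
    """One record in a linked list: key, value and a pointer to the next node."""

    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


headers = [None] * N  # array of header pointers, one linked list per hash value


def h(key: int) -> int:
    return key % N


def insert(key: int, value) -> None:
    i = h(key)
    headers[i] = Node(key, value, headers[i])  # push onto the front of list i


def search(key: int):
    node = headers[h(key)]  # the hash gives the subscript into the header array
    while node is not None:
        if node.key == key:
            return node.value
        node = node.next
    return None


def chain_keys(i: int) -> list:
    """Keys stored in the linked list at header position i, front to back."""
    keys, node = [], headers[i]
    while node is not None:
        keys.append(node.key)
        node = node.next
    return keys


for key, city in ((12, "Agra"), (7, "Pune"), (22, "Goa")):
    insert(key, city)
print(chain_keys(2), search(7))  # [22, 7, 12] Pune
```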



Bucket Hashing
Hashing a key yields the position of a storage area in which several key entries can be stored. This storage area is called a bucket.
The file is divided into a number of such buckets.
Each bucket has enough space to store multiple entries.
When a record has to be retrieved, its key is hashed to give an offset, which is the offset of a bucket. The bucket is then read into internal memory and searched sequentially.
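A sketch of bucket hashing, assuming integer keys, 4 buckets of 3 entries each, a 5-byte entry (in-use flag plus integer key), and `io.BytesIO` standing in for the file. All sizes and names are hypothetical. The hash gives a bucket offset, the whole bucket is read in one operation, and the bucket is then searched sequentially in memory.

```python
import io
import struct

N_BUCKETS = 4                  # assumed number of buckets in the file
SLOTS_PER_BUCKET = 3           # assumed bucket capacity
ENTRY = struct.Struct("<?i")   # assumed entry layout: in-use flag, integer key
BUCKET_BYTES = SLOTS_PER_BUCKET * ENTRY.size

f = io.BytesIO(bytes(N_BUCKETS * BUCKET_BYTES))  # pre-allocated, empty buckets


def bucket_offset(key: int) -> int:
    """Hash the key to the byte offset of its bucket."""
    return (key % N_BUCKETS) * BUCKET_BYTES


def read_bucket(key: int) -> list:
    f.seek(bucket_offset(key))
    raw = f.read(BUCKET_BYTES)  # one read brings the whole bucket into memory
    return [ENTRY.unpack_from(raw, i * ENTRY.size) for i in range(SLOTS_PER_BUCKET)]


def insert(key: int) -> None:
    for i, (in_use, _) in enumerate(read_bucket(key)):
        if not in_use:
            f.seek(bucket_offset(key) + i * ENTRY.size)
            f.write(ENTRY.pack(True, key))
            return
    raise RuntimeError("bucket is full")


def search(key: int) -> bool:
    # Sequential search of the bucket held in memory.
    return any(in_use and k == key for in_use, k in read_bucket(key))


for key in (1, 5, 9):  # all three keys hash to bucket 1
    insert(key)
print(search(5), search(13))  # True False
```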



Hash Indexes Vs Hash Tables
The choice between a hash index and a hash table depends on the following factors:
Data Organization
Access Speed
Disk Space Requirement



An Example to Illustrate the Use of a Hash Table
The assumptions made in this example are:
The key is an alphanumeric field, the first byte of which is a letter of the alphabet.
Only one key exists for a particular hash value.
Collisions are not handled.
The file structure assumed is shown below:
Field        Length (bytes)   Type
City         10               String
Population   2                Integer



An Example to Illustrate the Use of a Hash Table (Contd..)
The hashing algorithm used is as follows: the first letter of the alphabetic key is extracted, and the position of this letter in the alphabet is used as the hash value. For example, if the first letter of the key is C, the hash value is 3. The number of positions (or buckets) to which a key can hash is therefore 26, the number of letters in the alphabet.
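The first-letter hashing algorithm can be sketched as follows, assuming ASCII keys and treating upper- and lowercase letters alike. The function name `first_letter_hash` is hypothetical.

```python
def first_letter_hash(key: str) -> int:
    """Alphabet position (1..26) of the key's first letter: A -> 1, C -> 3, Z -> 26."""
    first = key[0].upper()
    if not ("A" <= first <= "Z"):
        raise ValueError("key must start with a letter")
    return ord(first) - ord("A") + 1


print(first_letter_hash("Chennai"))  # 3
```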
The processing required to create the hash table
involves the following steps:
Creating file space


Problem Statement 8.D.1
Create a hash table for records having the structure given below:
Field        Size (bytes)
City         10
Population   2
The hashing algorithm used is as follows: the position in the alphabet of the first letter of the key is used as the hash value. If the first letter of the key is C, the hash value is 3. The number of positions (or buckets) to which a key can hash is therefore 26, the number of letters in the alphabet.


Problem Statement 8.D.1 (Contd..)
Only one key exists for a particular hash value. In other words, there is only one record per bucket.
Collisions are not handled.
The key is an alphanumeric field, the first byte of which is a letter of the alphabet.
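One possible solution sketch, not the lesson's reference solution. It assumes a 12-byte record packed as a 10-byte ASCII city name plus a 2-byte signed integer population (so population values must fit in -32768 to 32767), with `io.BytesIO` standing in for the hash file; all names are hypothetical. File space for 26 empty records is created first, and each record is then written at offset (hash value - 1) * 12, since hash values start at 1. Because collisions are not handled, a lookup also compares the stored city name.

```python
import io
import struct

RECORD = struct.Struct("<10sh")  # City: 10-byte string, Population: 2-byte integer
N_POSITIONS = 26                 # one position (bucket) per letter of the alphabet


def first_letter_hash(key: str) -> int:
    return ord(key[0].upper()) - ord("A") + 1


# Create file space: one empty 12-byte record for every possible hash value.
hash_file = io.BytesIO(bytes(N_POSITIONS * RECORD.size))


def write_record(city: str, population: int) -> None:
    offset = (first_letter_hash(city) - 1) * RECORD.size  # hash values start at 1
    hash_file.seek(offset)
    hash_file.write(RECORD.pack(city.encode("ascii"), population))


def read_record(city: str):
    hash_file.seek((first_letter_hash(city) - 1) * RECORD.size)
    name, population = RECORD.unpack(hash_file.read(RECORD.size))
    stored = name.rstrip(b"\0").decode("ascii")
    if stored != city[:10]:
        return None  # empty position, or another city with the same first letter
    return stored, population


write_record("Chennai", 4681)
write_record("Mumbai", 12442)
print(read_record("Chennai"), read_record("Delhi"))  # ('Chennai', 4681) None
```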



Summary
In this lesson, you learned that:
Hashing is a technique used to access data stored in
files
Using hashing techniques, it is possible to calculate
the position of a record in a data file from its key field
value
The main purpose of hashing is to eliminate unnecessary searching by retrieving a record through direct access. This is done by transforming the key to yield the offset of the record
An algorithm, called a hashing function, is used to perform the key-to-address transformation



Summary (Contd..)
More than one key often hashes to the same hash value, resulting in a collision. As a result, an attempt is made to store two records in one location. Collisions are processed by using various algorithms, three of which are:
Rehashing
Linked list collision processing
Bucket hashing

