Introduction to Hashing Techniques

Objectives
In this lesson, you will learn to:
 Randomly access data by using a hash index
 Implement a hashing function
 Define different hashing techniques
 Define collisions and describe how they are handled
 Create a hash index and use it to randomly access
  data from a file by using the key field




                       Introduction to Hashing Techniques/Lesson 8/Slide 1 of 21
  ©NIIT

 Hashing
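  Hashing means converting a key into the address at
   which a record is stored, so that the record can be
   retrieved directly.
  In the simplest case, where the key is itself a record
   number counting from 0 and all records have the same
   length, the offset of the record is:
      Offset = Key * Record length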
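A minimal Python sketch of this formula, assuming fixed-length records of 12 bytes and keys that are 0-based record numbers (both are assumptions for illustration). An in-memory io.BytesIO object stands in for the data file; the names RECORD_LENGTH, record_offset and read_record are hypothetical.

```python
import io

RECORD_LENGTH = 12  # hypothetical fixed record length in bytes


def record_offset(key: int, record_length: int = RECORD_LENGTH) -> int:
    """Byte offset of a record whose key is its 0-based record number."""
    return key * record_length


def read_record(f: io.BytesIO, key: int, record_length: int = RECORD_LENGTH) -> bytes:
    """Seek straight to the record and read it; no searching is involved."""
    f.seek(record_offset(key, record_length))
    return f.read(record_length)


# Demonstration file: three records, each space-padded to RECORD_LENGTH bytes.
names = ["alpha", "bravo", "charlie"]
f = io.BytesIO(b"".join(n.ljust(RECORD_LENGTH).encode("ascii") for n in names))

print(record_offset(2))            # 24
print(read_record(f, 1).rstrip())  # b'bravo'
```

The formula only works when keys are small, dense integers; for arbitrary keys, a hashing function is needed to map them into the address space first.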

 Hashing Functions
  Given a key, the hash function converts it into a hash
   value (location) in the range 1 to n, where n is the
   size of the storage (address) space that has been
   allocated for the records.
  The record is then stored at, or retrieved from, the
   location generated.
  Division is one of the most commonly used hashing
   functions: the key is divided by n and the remainder
   is used to compute the location.
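A minimal sketch of the division method in Python, assuming non-negative integer keys. Because key % n lies in 0 to n-1, the sketch adds 1 so that locations fall in the range 1 to n described above. The function name division_hash and the address-space size 11 are illustrative choices.

```python
def division_hash(key: int, n: int) -> int:
    """Division method: map a non-negative integer key to a location in 1..n."""
    return key % n + 1  # remainder is 0..n-1; shift it to 1..n


n = 11  # size of the address space; a prime size is a common choice
for key in (1234, 5679, 42):
    print(key, "->", division_hash(key, n))
# 1234 -> 3
# 5679 -> 4
# 42 -> 10
```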

 Hashing Techniques
  Two hashing techniques commonly employed are:
       Hash indexes
       Hash tables
  Both techniques have the same goal:
       To identify the location of a data record in a file
        by means of a key-to-address transformation.

 Hash Indexes
  An index file built by placing keys at locations
   calculated by a hashing function is called a hash
   index file.
  It contains a key-offset pair for each record in the
   data file; the offset gives the position of the record
   in the data file.
  The technique uses two files:
       A data file containing the records, and
       A hash index file.
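A minimal sketch of the two-file arrangement in Python, assuming string keys of at most 10 ASCII characters, a 2-byte integer field, and a toy hashing function (sum of character codes modulo the index size). The hash index is an in-memory list of (key, offset) slots and io.BytesIO stands in for the data file. All names and sizes are hypothetical, and a collision simply raises an error because collision handling is a separate topic.

```python
import io
import struct

SLOTS = 11                       # hypothetical number of slots in the hash index
RECORD = struct.Struct("<10sh")  # hypothetical record: 10-byte key, 2-byte integer

index = [None] * SLOTS           # hash index: each slot holds (key, offset) or None
data_file = io.BytesIO()         # data file: records are appended one after another


def h(key: str) -> int:
    """Toy hashing function: sum of character codes, division method, slot 0..SLOTS-1."""
    return sum(key.encode("ascii")) % SLOTS


def insert(key: str, value: int) -> None:
    slot = h(key)
    if index[slot] is not None:
        raise KeyError(f"collision at slot {slot}")  # not handled in this sketch
    offset = data_file.seek(0, io.SEEK_END)           # next record goes at the end
    data_file.write(RECORD.pack(key.encode("ascii"), value))
    index[slot] = (key, offset)                       # store the key-offset pair


def lookup(key: str):
    entry = index[h(key)]
    if entry is None or entry[0] != key:
        return None
    data_file.seek(entry[1])                          # jump straight to the record
    name, value = RECORD.unpack(data_file.read(RECORD.size))
    return name.rstrip(b"\0").decode("ascii"), value


insert("Boston", 100)
insert("Denver", 200)
insert("Austin", 300)
print(index[h("Denver")])  # ('Denver', 12)
print(lookup("Denver"))    # ('Denver', 200)
print(lookup("Paris"))     # None
```

Because the index holds only key-offset pairs, the data file itself can keep its records in the order in which they arrived.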

 Hash Tables
  Hash tables use only a data file; no separate index
   file is needed.
  The location of a record is calculated from the value
   of its key.
  The whole record is then written at the calculated
   position in the data file, which is the hash table
   itself.

 Collisions
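  An attempt to store two keys at the same position is
   known as a collision.
  Collisions cannot be ruled out whatever hashing
   function is used, because the number of possible keys
   is usually far larger than the number of available
   locations.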
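A small Python illustration using the division method with locations 1 to n; the keys and n = 7 are arbitrary choices. It groups keys by hash value, reports the locations that more than one key maps to, and shows that eight distinct keys cannot fit into seven locations without a collision.

```python
from collections import defaultdict


def division_hash(key: int, n: int) -> int:
    return key % n + 1  # location in 1..n


n = 7
keys = [15, 22, 9, 30, 12]

by_location = defaultdict(list)
for key in keys:
    by_location[division_hash(key, n)].append(key)

collisions = {loc: ks for loc, ks in by_location.items() if len(ks) > 1}
print(collisions)  # {2: [15, 22], 3: [9, 30]}

# Pigeonhole principle: n + 1 distinct keys can occupy at most n locations.
print(len({division_hash(k, n) for k in range(n + 1)}))  # 7
```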

  Collision Processing
  Rehashing
   This method applies a secondary hash function, called
    a rehash function, to the hash value of the key.
   The rehash function is applied repeatedly, each time
    to its own previous result, until an empty position
    is found.
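A minimal sketch of rehashing in Python. The slides do not fix a particular rehash function, so this sketch assumes the simplest one, rh(i) = i mod N + 1, which moves to the next location and wraps from N back to 1 (often called linear probing). Locations run from 1 to N; the table size and all names are illustrative.

```python
N = 7                       # size of the address space: locations 1..N
table = [None] * (N + 1)    # index 0 unused so that locations run 1..N


def hash_fn(key: int) -> int:
    return key % N + 1      # primary hash: division method


def rehash(loc: int) -> int:
    return loc % N + 1      # assumed rehash function: next location, N wraps to 1


def insert(key: int) -> int:
    loc = hash_fn(key)
    for _ in range(N):      # N probes visit every location exactly once
        if table[loc] is None:
            table[loc] = key
            return loc
        loc = rehash(loc)   # apply the rehash function to the previous location
    raise OverflowError("table is full")


def search(key: int):
    loc = hash_fn(key)
    for _ in range(N):
        if table[loc] is None:
            return None     # an empty location means the key is absent
        if table[loc] == key:
            return loc
        loc = rehash(loc)
    return None


print([insert(k) for k in (15, 22, 9)])  # [2, 3, 4]
print(search(22), search(29))            # 3 None
```

Keys 15 and 22 both hash to location 2, so 22 is rehashed once to location 3; key 9 hashes to 3, finds it taken, and is rehashed to 4.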

 Chaining
  This method uses links (pointers) to resolve hash
   clashes.
  Two chaining techniques are:
       Coalesced Chaining
       Separate Chaining

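 Coalesced Chaining
  A record that collides is not moved through further
   positions of the prime hash area; instead, it is placed
   in a free slot of the overflow area and linked into the
   chain for its hash value, so no record suffers more
   than one collision.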
  It requires the storage area to be divided into two
   parts:
       A prime hash area
       An overflow area
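A minimal sketch of chaining with a prime hash area and an overflow area, in Python. Both areas live in one array of [key, next] slots; a colliding key is placed in the next free overflow slot and linked to the end of the chain for its hash value. The area sizes, the division-method hash, and all names are illustrative assumptions.

```python
PRIME_SIZE = 7        # prime hash area: slots 0..6 (illustrative size)
OVERFLOW_SIZE = 4     # overflow area: slots 7..10 (illustrative size)
NIL = -1              # end-of-chain marker

# Each slot holds [key, index of next slot in the chain].
slots = [[None, NIL] for _ in range(PRIME_SIZE + OVERFLOW_SIZE)]
next_free = PRIME_SIZE  # first unused slot in the overflow area


def h(key: int) -> int:
    return key % PRIME_SIZE  # division method into the prime hash area


def insert(key: int) -> int:
    global next_free
    home = h(key)
    if slots[home][0] is None:  # home slot is empty: no collision
        slots[home][0] = key
        return home
    last = home                 # one collision: walk to the end of the chain
    while slots[last][1] != NIL:
        last = slots[last][1]
    if next_free == len(slots):
        raise OverflowError("overflow area is full")
    slots[next_free][0] = key   # store the record in the overflow area
    slots[last][1] = next_free  # link it onto the chain
    next_free += 1
    return next_free - 1


def search(key: int):
    i = h(key)
    while i != NIL:             # follow the chain that starts at the home slot
        if slots[i][0] == key:
            return i
        i = slots[i][1]
    return None


print([insert(k) for k in (15, 22, 29, 3)])  # [1, 7, 8, 3]
print(search(29), search(36))               # 8 None
```

This sketch only ever places colliding records in the overflow area, so chains for different hash values stay separate. In the general coalesced scheme, free slots in the prime area are also used once the overflow area fills up, which is how chains for different hash values can merge; that case is omitted here.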

 Separate Chaining
  In this method, an array of header nodes is used.
  Each element of the array is a pointer that stores
   the address of a separate linked list.
  Each linked list holds the records whose keys have
   the same hash value.
  When a record has to be retrieved, the hashing
   function converts the given key into a position
   (subscript) in the array, and the linked list at that
   position is searched for the key.
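A minimal sketch of separate chaining in Python, assuming integer keys and a division-method hash into array subscripts 0 to SIZE-1. Each header entry points to a singly linked list of Node objects; new records are pushed onto the front of their list, which is one common choice rather than something the slide prescribes. Names and sizes are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    key: int
    record: str
    next: Optional["Node"] = None


SIZE = 5                                      # illustrative number of header nodes
heads: List[Optional[Node]] = [None] * SIZE   # array of pointers to linked lists


def h(key: int) -> int:
    return key % SIZE                         # subscript in the header array


def insert(key: int, record: str) -> None:
    i = h(key)
    heads[i] = Node(key, record, heads[i])    # push onto the front of list i


def search(key: int) -> Optional[str]:
    node = heads[h(key)]
    while node is not None:                   # only the list for this hash value is walked
        if node.key == key:
            return node.record
        node = node.next
    return None


insert(12, "alpha")
insert(17, "bravo")   # 17 % 5 == 12 % 5 == 2: same list as 12
insert(3, "charlie")
print(search(17))     # bravo
print(search(7))      # None
```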

 Bucket Hashing
  Hashing a key yields the position of a storage area
   in which several key entries (records) can be stored.
   This storage area is called a bucket.
  The file is divided into a number of such buckets,
   each with enough space to store multiple records.
  When a record has to be retrieved, its key is hashed
   to give the offset of a bucket. The bucket is then
   read into internal memory and searched sequentially.
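A minimal sketch of bucket hashing in Python, with io.BytesIO standing in for the file. It assumes fixed-length 12-byte entries (an ASCII key of at most 10 characters plus a 2-byte integer), three entries per bucket, four buckets, and a toy hash (sum of character codes modulo the number of buckets). All names and sizes are illustrative.

```python
import io
import struct

BUCKETS = 4                       # illustrative number of buckets in the file
PER_BUCKET = 3                    # illustrative number of entries per bucket
ENTRY = struct.Struct("<10sh")    # entry: 10-byte key + 2-byte integer
BUCKET_SIZE = ENTRY.size * PER_BUCKET
EMPTY_KEY = b"\0" * 10

f = io.BytesIO(b"\0" * (BUCKETS * BUCKET_SIZE))  # file space: every entry empty


def bucket_offset(key: str) -> int:
    """Hash the key to the byte offset of its bucket."""
    return (sum(key.encode("ascii")) % BUCKETS) * BUCKET_SIZE


def read_bucket(offset: int) -> list:
    """Read the whole bucket into memory with one read and unpack its entries."""
    f.seek(offset)
    raw = f.read(BUCKET_SIZE)
    return [ENTRY.unpack_from(raw, i * ENTRY.size) for i in range(PER_BUCKET)]


def insert(key: str, value: int) -> None:
    offset = bucket_offset(key)
    for i, (k, _) in enumerate(read_bucket(offset)):
        if k == EMPTY_KEY:                       # first free entry in the bucket
            f.seek(offset + i * ENTRY.size)
            f.write(ENTRY.pack(key.encode("ascii"), value))
            return
    raise OverflowError("bucket is full")


def search(key: str):
    target = key.encode("ascii").ljust(10, b"\0")
    for k, value in read_bucket(bucket_offset(key)):  # sequential search in memory
        if k == target:
            return value
    return None


insert("Austin", 300)
insert("Denver", 200)    # hashes to the same bucket as "Austin"
insert("Boston", 100)
print(search("Denver"))  # 200
print(search("Paris"))   # None
```

Two keys that hash to the same bucket do not cause a collision until the bucket is full, which is why bucket hashing is also listed as a collision-processing method.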

 Hash Indexes Vs Hash Tables
  The choice of hashing method depends on the
   following factors:
       Data Organization
       Access Speed
       Disk Space Requirement

 An Example to Illustrate the Use of A Hash Table
 The assumptions made in this example are:
         The key is an alphanumeric field, the first byte
          of which is a letter.
         Only one key exists for a particular hash value.
         The problem of collisions is not addressed.
         The file structure assumed is shown below:
          Field         Length (bytes)   Type
          City          10               String
          Population    2                Integer
       The hashing algorithm is as follows: the first
        letter of the alphabetic key is extracted, and the
        position of this letter in the alphabet is used as the
        hash value. For example, if the first letter of the key
        is C, the hash value is 3. Hence, a key can hash to
        one of 26 positions (or buckets), one for each letter
        of the alphabet.
  The processing required to create the hash table
   involves the following steps:
       Creating file space
       Accepting data
       Finding the correct bucket
       Writing to the hash table
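The four steps can be sketched in Python as follows. This is a minimal sketch, assuming little-endian records packed as a 10-byte ASCII city name followed by a signed 2-byte integer population (12 bytes per record), city names of at most 10 characters, and io.BytesIO standing in for the disk file. Function names are hypothetical and the sample records are made up. As in the example, collisions are not handled: a second city with the same first letter would overwrite the first.

```python
import io
import struct

RECORD = struct.Struct("<10sh")  # City: 10-byte string, Population: 2-byte integer
BUCKETS = 26                     # one bucket per letter of the alphabet


def hash_value(city: str) -> int:
    """Position of the first letter in the alphabet: A -> 1, ..., Z -> 26."""
    first = city[0].upper()
    if not "A" <= first <= "Z":
        raise ValueError("key must start with a letter")
    return ord(first) - ord("A") + 1


def create_file_space() -> io.BytesIO:
    """Step 1: reserve 26 empty fixed-length records (all zero bytes)."""
    return io.BytesIO(b"\0" * (BUCKETS * RECORD.size))


def bucket_offset(city: str) -> int:
    """Step 3: hash values run 1..26, so bucket n starts at (n - 1) * record length."""
    return (hash_value(city) - 1) * RECORD.size


def write_record(f: io.BytesIO, city: str, population: int) -> None:
    """Step 4: write the whole record into its bucket (collisions not handled)."""
    f.seek(bucket_offset(city))
    f.write(RECORD.pack(city.encode("ascii"), population))


def read_record(f: io.BytesIO, city: str):
    """Random access by key: hash, seek, read a single record."""
    f.seek(bucket_offset(city))
    name, population = RECORD.unpack(f.read(RECORD.size))
    name = name.rstrip(b"\0").decode("ascii")
    return (name, population) if name == city else None


f = create_file_space()
# Step 2, accepting data: a fixed list of made-up records stands in for user input.
for city, population in [("Avalon", 120), ("Camelot", 250), ("Eldorado", 40)]:
    write_record(f, city, population)

print(hash_value("Camelot"))        # 3
print(read_record(f, "Camelot"))    # ('Camelot', 250)
print(read_record(f, "Brigadoon"))  # None: bucket 2 is still empty
```

Multiplying (hash value - 1) by the record length is the same key-to-offset calculation as on the Hashing slide, applied to the 1-based hash value instead of the raw key.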

 Problem Statement 8.D.1
  Create a hash table for records having the structure
   given below:
      Field         Size (bytes)
      City          10
      Population    2
      The hashing algorithm is as follows: the position in
      the alphabet of the first letter of the alphabetic key
      is used as the hash value. For example, if the first
      letter of the key is C, the hash value is 3. Hence, a
      key can hash to one of 26 positions (or buckets), one
      for each letter of the alphabet.
          Only one key exists for a particular hash value. In
          other words, there is only one record per bucket.
          The problem of collisions is not addressed.
          The key is an alphanumeric field, the first byte of
          which is a letter.

Summary
In this lesson, you learned that:
 Hashing is a technique used to access data stored in
  files.
 Using hashing techniques, it is possible to calculate
  the position of a record in a data file from the value
  of its key field.
 The main purpose of hashing is to eliminate
  unnecessary searching by accessing a record directly.
  This is done by transforming the key to yield the
  offset of the record.
 An algorithm, called a hashing function, is used to
  perform the key-to-address transformation.
 It often happens that more than one key hashes to
  the same hash value, resulting in a collision: an
  attempt is made to store two records in one
  location. Collisions are processed by using various
  algorithms, three of which are:
     Rehashing
     Linked list collision processing
     Bucket hashing
 Several hashing strategies can be employed, two of
  which are:
    Hash indexing
    Hash tables