Hash Table
• A hash table is a data structure in which keys are
mapped to array positions by a hash function.
• In a hash table, an element with key k is stored at
index h(k), not at index k.
• That is, a hash function h is used to calculate the
index at which the element with key k will be
stored.
• The process of mapping keys to appropriate
locations in a hash table is called hashing.
• Each key from the set K of possible keys is
mapped to a location generated by the
hash function.
• When two or more keys map to the same memory
location, this is known as a collision.
Hash Function
• A hash function is simply a mathematical formula
that, when applied to a key, produces an integer
that can be used as the index for the key in the
hash table.
• A hash function should spread keys evenly over the
table so that collisions are rare.
• In practice, no hash function eliminates
collisions completely.
• A good hash function can only minimize the
number of collisions.
Different Hash Functions
• Division Method
• It is the simplest method of hashing an
integer k.
• The method divides k by M and uses the
remainder as the hash value.
h(k) = k mod M
• Here k is the key and M is the largest prime
number not exceeding the hash table size.
Example
• Calculate the hash value of key 1234, given a
hash table of size 100.
h(1234) = 1234 mod 97 = 70
• 97 is the largest prime number not exceeding the
hash table size of 100.
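The division method above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the helper names is_prime, largest_prime_at_most and division_hash are made up for this example, and the prime search uses plain trial division, which is only reasonable for small table sizes.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for small table sizes."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def largest_prime_at_most(n: int) -> int:
    """Return the largest prime p with p <= n (hypothetical helper)."""
    for candidate in range(n, 1, -1):
        if is_prime(candidate):
            return candidate
    raise ValueError("there is no prime <= n")


def division_hash(k: int, table_size: int) -> int:
    """h(k) = k mod M, with M the largest prime not exceeding table_size."""
    M = largest_prime_at_most(table_size)
    return k % M


print(largest_prime_at_most(100))  # 97
print(division_hash(1234, 100))    # 70
```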
Multiplication Method
1. Choose a constant A such that 0 < A < 1.
2. Multiply the key k by A.
3. Extract the fractional part of kA.
4. Multiply the result of step 3 by m, the size of
the hash table, and take the floor.
h(k) = ⌊m (kA mod 1)⌋
Example
• Given a hash table of size 1000, map the key
12345 to an appropriate location in the hash
table.
• Suppose we use A = 0.618033, m = 1000 and
k = 12345.
h(12345) = ⌊1000 (12345 × 0.618033 mod 1)⌋
h(12345) = ⌊1000 (7629.617385 mod 1)⌋
h(12345) = ⌊1000 (0.617385)⌋
h(12345) = ⌊617.385⌋
h(12345) = 617
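The four steps of the multiplication method can be sketched in Python as below. The function name multiplication_hash is an assumption for this example, and the default A = 0.618033 is simply the constant used in the slide. Floating-point arithmetic is used, so results for very large keys may be affected by rounding.

```python
import math


def multiplication_hash(k: int, m: int, A: float = 0.618033) -> int:
    """h(k) = floor(m * (k*A mod 1)) for a constant 0 < A < 1."""
    fractional = (k * A) % 1.0       # steps 2 and 3: fractional part of k*A
    return math.floor(m * fractional)  # step 4: scale by m and take the floor


print(multiplication_hash(12345, 1000))  # 617
```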
Mid-Square Method
1. Square the value of the key, i.e. compute k².
2. Extract the middle r digits of the result
obtained in step 1.
h(k) = s, where s is the middle r digits of k²
Example
• Calculate the hash value for key 5642 using
the mid-square method. The hash table has
100 memory locations.
• We have k = 5642 and the hash table has 100
memory locations (indices 0 to 99), so only two
digits are needed to map the key to a location
in the hash table. Hence r = 2.
k² = 5642 × 5642 = 31832164
h(5642) = 32 (the middle two digits of 31832164)
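A small Python sketch of the mid-square method follows. It works on decimal digits, as the example does. The function name mid_square_hash is made up for this illustration, and one detail is an assumption not stated in the slides: when the square cannot be centred exactly (for instance an odd number of digits with r = 2), the window is taken one position to the left of centre.

```python
def mid_square_hash(k: int, r: int) -> int:
    """Return the middle r decimal digits of k squared."""
    digits = str(k * k).zfill(r)     # pad so that at least r digits exist
    start = (len(digits) - r) // 2   # assumption: lean left if not centred
    return int(digits[start:start + r])


print(mid_square_hash(5642, 2))  # 32
```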
Folding method
1. Divide the key value into a number of parts.
That is, divide k into parts k1, k2, ..., kn,
where each part has the same number of
digits except the last part, which may have
fewer digits than the other parts.
2. Add the individual parts. That is, obtain the
sum k1 + k2 + ... + kn.
3. The hash value is produced by ignoring the
last carry, if any.
Example
• Given a hash table of 100 locations, calculate
the hash value of key 34567 using the folding
method.
• Key parts = 34, 56 and 7
• Sum of key parts = 34 + 56 + 7 = 97
• Hash value = 97 (the sum produces no carry to ignore)
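The folding steps can be sketched in Python as below. The function name folding_hash and the parameter part_len (digits per part) are assumptions for this example. "Ignoring the last carry" is interpreted here as keeping only the last part_len digits of the sum.

```python
def folding_hash(k: int, part_len: int) -> int:
    """Split k into parts of part_len digits, add them, drop the carry."""
    digits = str(k)
    parts = [int(digits[i:i + part_len])
             for i in range(0, len(digits), part_len)]
    total = sum(parts)
    return total % (10 ** part_len)  # keep part_len digits, ignoring any carry


print(folding_hash(34567, 2))  # 97  (34 + 56 + 7)
print(folding_hash(5678, 2))   # 34  (56 + 78 = 134, carry 1 ignored)
```

The second call uses a key that is not in the slides, chosen only to show a case where a carry is actually dropped.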
Collisions
• A collision occurs when the hash function maps two
different keys to the same location.
• Two records cannot be stored in the same location.
• A method used to solve the problem of collisions is
called a collision resolution technique.
• The two most popular methods of resolving
collisions are:
1. Open addressing and
2. Chaining
Collision Resolution by Open Addressing
• Once a collision takes place, open addressing
computes new positions using a probe sequence,
and the record is stored in the first free position found.
• The hash table contains two types of values:
a sentinel value (-1) and data values.
• The presence of a sentinel value indicates that
the location contains no data value at present
but can be used to hold one.
• Open addressing can be implemented
using linear probing, quadratic probing and
double hashing.
Linear Probing
• The simplest approach to resolving a collision is
linear probing.
• The following hash function is used to resolve the collision:
h(k, i) = [h’(k) + i] mod m
where m is the size of the hash table,
h’(k) = k mod m, and
i is the probe number, which varies
from 0 to m-1.
Example
• Consider a hash table of size 10. Using linear probing,
insert the keys 72, 24, 63, 81 and 92.
• Initially, the hash table can be given as:
0 1 2 3 4 5 6 7 8 9
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1
• Step 1 : Key = 72
h(72,0) = (72 mod 10 + 0) mod 10
h(72,0) = (2) mod 10
h(72,0) = 2
Since T[2] is vacant, insert key 72 at this location.
0 1 2 3 4 5 6 7 8 9
-1 -1 72 -1 -1 -1 -1 -1 -1 -1
Example
• Step 2 : Key = 24
h(24,0) = (24 mod 10 + 0 ) mod 10
h(24,0) = (4) mod 10
h(24,0) = 4
Since T[4] is vacant, insert key 24 at this location.
0 1 2 3 4 5 6 7 8 9
-1 -1 72 -1 24 -1 -1 -1 -1 -1
Example
• Step 3 : Key = 63
h(63,0) = (63 mod 10 + 0 ) mod 10
h(63,0) = (3) mod 10
h(63,0) = 3
Since T[3] is vacant, insert key 63 at this location.
0 1 2 3 4 5 6 7 8 9
-1 -1 72 63 24 -1 -1 -1 -1 -1
Example
• Step 4 : Key = 81
h(81,0) = (81 mod 10 + 0 ) mod 10
h(81,0) = (1) mod 10
h(81,0) = 1
Since T[1] is vacant, insert key 81 at this location.
0 1 2 3 4 5 6 7 8 9
-1 81 72 63 24 -1 -1 -1 -1 -1
Example
• Step 5 : Key = 92
h(92,0) = (92 mod 10 + 0 ) mod 10
h(92,0) = (2) mod 10
h(92,0) = 2
Since T[2] is occupied, we cannot store the key
92 in T[2]. Therefore, try the next location
with probe number i = 1.
0 1 2 3 4 5 6 7 8 9
-1 81 72 63 24 -1 -1 -1 -1 -1
Example
• Step 5 (continued) : Key = 92, probe i = 1
h(92,1) = (92 mod 10 + 1) mod 10
h(92,1) = (2+1) mod 10
h(92,1) = 3
Since T[3] is occupied, we cannot store the key
92 in T[3]. Therefore, try the next location
with probe number i = 2.
0 1 2 3 4 5 6 7 8 9
-1 81 72 63 24 -1 -1 -1 -1 -1
Example
• Step 5 (continued) : Key = 92, probe i = 2
h(92,2) = (92 mod 10 + 2) mod 10
h(92,2) = (2+2) mod 10
h(92,2) = 4
Since T[4] is occupied, we cannot store the key
92 in T[4]. Therefore, try the next location
with probe number i = 3.
0 1 2 3 4 5 6 7 8 9
-1 81 72 63 24 -1 -1 -1 -1 -1
Example
• Step 5 (continued) : Key = 92, probe i = 3
h(92,3) = (92 mod 10 + 3) mod 10
h(92,3) = (2+3) mod 10
h(92,3) = 5
Since T[5] is vacant, insert key 92 at this location.
0 1 2 3 4 5 6 7 8 9
-1 81 72 63 24 92 -1 -1 -1 -1
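The whole walkthrough can be reproduced with a short Python sketch. The function name linear_probe_insert and the constant name EMPTY are assumptions for this example. The sentinel -1 follows the slides, which means this sketch cannot store -1 itself as a key.

```python
EMPTY = -1  # sentinel value marking a vacant location, as in the slides


def linear_probe_insert(table: list, key: int) -> int:
    """Insert key using h(k, i) = (k mod m + i) mod m; return the index used."""
    m = len(table)
    for i in range(m):                # probe number i = 0 .. m-1
        index = (key % m + i) % m
        if table[index] == EMPTY:
            table[index] = key
            return index
    raise OverflowError("hash table is full")


table = [EMPTY] * 10
for key in (72, 24, 63, 81, 92):
    linear_probe_insert(table, key)

print(table)  # [-1, 81, 72, 63, 24, 92, -1, -1, -1, -1]
```

Key 92 ends up at index 5 after probing indices 2, 3 and 4, which matches Step 5.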
Merits & Demerits
• Linear probing enables good memory caching, because
successive probes touch adjacent locations.
• As the number of collisions grows, occupied
locations form long contiguous runs, so more
probes are required to find a free location
and performance degrades. This effect is
known as primary clustering.
Quadratic Probing
• The following hash function is used to resolve the collision:
h(k, i) = [h’(k) + c1·i + c2·i²] mod m
where m is the size of the hash table,
h’(k) = k mod m,
i is the probe number, which varies
from 0 to m-1,
and c1 and c2 are constants such
that c1 ≠ 0 and c2 ≠ 0.
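As a rough sketch, quadratic probing can be written in Python as below. The constants c1 = 1 and c2 = 3 are illustrative choices, not values given in the slides, and the names quadratic_probe_insert and EMPTY are assumptions. The keys are the ones from the linear probing example, reused here for comparison. Note that a quadratic probe sequence is not guaranteed to visit every location, so the sketch gives up after m probes.

```python
EMPTY = -1  # sentinel value marking a vacant location


def quadratic_probe_insert(table: list, key: int, c1: int = 1, c2: int = 3) -> int:
    """Insert key using h(k, i) = (k mod m + c1*i + c2*i*i) mod m."""
    m = len(table)
    home = key % m
    for i in range(m):                # probe number i = 0 .. m-1
        index = (home + c1 * i + c2 * i * i) % m
        if table[index] == EMPTY:
            table[index] = key
            return index
    raise OverflowError("no free location found within m probes")


table = [EMPTY] * 10
for key in (72, 24, 63, 81, 92):
    quadratic_probe_insert(table, key)

print(table)  # [-1, 81, 72, 63, 24, -1, 92, -1, -1, -1]
```

With these constants, key 92 collides at index 2 and is placed at index (2 + 1 + 3) mod 10 = 6 on the first retry, instead of walking through indices 3 and 4.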
Merits & Demerits
• Quadratic probing eliminates primary
clustering while still providing good memory caching.
• If two keys hash to the same initial location,
the same probe sequence will be followed for
both. This effect is called secondary
clustering.
Double Hashing
• In double hashing, we use two hash functions rather
than a single function.
• The hash function in the case of double hashing
can be given as:
h(k, i) = [h1(k) + i·h2(k)] mod m
where m is the size of the hash table,
h1(k) and h2(k) are two hash functions given as
h1(k) = k mod m and h2(k) = k mod m’,
i is the probe number, which varies from 0 to m-1,
and m’ is chosen to be less than m, e.g. m’ = m-1
or m-2.
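A minimal Python sketch of double hashing is given below, using the formula exactly as stated with m’ = m-2. The names double_hash_insert and EMPTY are assumptions, and the keys are borrowed from the linear probing example. One caveat the sketch makes visible: with h2(k) = k mod m’, a key that is a multiple of m’ has a step of 0 and never moves past its first location, which is why a common variant uses 1 + (k mod m’) instead.

```python
EMPTY = -1  # sentinel value marking a vacant location


def double_hash_insert(table: list, key: int) -> int:
    """Insert key using h(k, i) = (h1(k) + i*h2(k)) mod m."""
    m = len(table)
    m_prime = m - 2                   # assumption: the m' = m-2 option
    h1 = key % m
    h2 = key % m_prime                # as in the slides; may be 0 for some keys
    for i in range(m):                # probe number i = 0 .. m-1
        index = (h1 + i * h2) % m
        if table[index] == EMPTY:
            table[index] = key
            return index
    raise OverflowError("no free location found within m probes")


table = [EMPTY] * 10
for key in (72, 24, 63, 81, 92):
    double_hash_insert(table, key)

print(table)  # [-1, 81, 72, 63, 24, -1, 92, -1, -1, -1]
```

For key 92, h1 = 2 and h2 = 92 mod 8 = 4, so the first retry lands on index 6.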
Merits & demerits
• Double hashing uses two hash functions
rather than a single function, so keys that collide
at one location usually follow different probe sequences.
• The performance of double hashing is very
close to the performance of the ideal scheme
of uniform hashing.
• It minimizes repeated collisions.
Collision Resolution by Chaining
• In chaining, each location in the hash table
stores a pointer to a linked list that contains all
the key values that were hashed to the same
location.
Example
• Load the keys 23, 13, 21, 14, 7, 8 and 15, in this order,
into a hash table of size 7 using chaining with the hash
function h(key) = key % 7.
h(23) = 23 % 7 = 2
h(13) = 13 % 7 = 6
h(21) = 21 % 7 = 0
h(14) = 14 % 7 = 0 (collision with 21)
h(7) = 7 % 7 = 0 (collision with 21 and 14)
h(8) = 8 % 7 = 1
h(15) = 15 % 7 = 1 (collision with 8)
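The resulting chains can be seen with a short Python sketch. For brevity it uses Python lists in place of linked lists, and it appends each colliding key at the end of its chain. Both choices are assumptions, since the slides do not say where in the list a new key is placed. The function name build_chained_table is made up for this example.

```python
def build_chained_table(keys, size: int) -> list:
    """Return a table of chains, where chain i holds the keys with key % size == i."""
    table = [[] for _ in range(size)]    # one empty chain per location
    for key in keys:
        table[key % size].append(key)    # colliding keys share a chain
    return table


table = build_chained_table([23, 13, 21, 14, 7, 8, 15], 7)
for index, chain in enumerate(table):
    print(index, chain)
# 0 [21, 14, 7]
# 1 [8, 15]
# 2 [23]
# 3 []
# 4 []
# 5 []
# 6 [13]
```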
Merits
• The number of keys that can be stored is not limited
by the table size, so you don’t need to expand the
table and rehash the keys when it fills up.
• Collision handling is simple: just insert
colliding records into a list.
Demerits
• Using pointers slows the algorithm: time
is required to allocate new nodes.
• The main cost of chaining is the extra space
required for the linked lists.
• There is the overhead of maintaining multiple linked lists.
