Sanjivani Rural Education Society’s
Sanjivani College of Engineering, Kopargaon-423 603
(An Autonomous Institute, Affiliated to Savitribai Phule Pune University, Pune)
NACC ‘A’ Grade Accredited, ISO 9001:2015 Certified
Department of Computer Engineering
(NBA Accredited)
Subject- Data Structures-I(CO203)
Unit III- Double Hashing
Double Hashing
• This method uses two hashing function h1(key) and h2(key). Problem of unequal
distribution of keys can be handled through double hashing
• Function h1(key) is primary hash function and h2(key) is computed. If the address
obtained by h1(key) is already occupied by another key.
• Second hash function h2(key) is used to compute the increment to be added to
the address obtained by 1st hash function h1(key) in case of collision.
DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 2
Common definitions for h2 are:
1. h2(key)=1+key%(tablesize) OR
2. h2(key)=M- (key%M) where M is a prime or less than table size OR
3. h2(key)=M*(key%M) where M is a prime or less than table size
• A popular second hash function is : hash2(key) = PRIME – (key % PRIME) where PRIME
is a prime smaller than the TABLE_SIZE.
Once we get h2, double hashing is done by
(hash1(key) + i * hash2(key)) % TABLE_SIZE (We repeat by increasing i when collision
occurs)
DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 3
• A good second Hash function is:
• It must never evaluate to zero
• Must make sure that all cells can be probed
Performance of Double Hashing:
1. Much better than linear probing because it eliminates both primary and
secondary clustering.
2. But requires a computation of second hash function h2
DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 4
• Example: load the keys 18,26,35,9,64,47,96,36 and 70 in this order, in an empty
hash table of size 13.
• h1(key)=key%13
• h2(key)= 7- key%7
H(key)=[h1(key)+i*h2(key)]%13
DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 5
DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 6
Key h1 h2
18 5
26 0
35 9
9 9 5
64 12
47 8
96 5 2
36 10
70 5 7
26
9
18
70
96
47
35
36
64
0
1
2
3
4
5
6
7
8
9
10
11
12
H(key)=(9+1*5)%13= 1
H(key)=(5+1*2)%13= 7
H(key)=(5+1*7)%13= 12
H(key)=(5+2*7)%13= 6
e.g. 4371,1323,6173,4199,4344,9699,1889
h1(x)=key %10
h2(x)= 7 – (x %7)
h(key)=H(key)=[h1(key)+i*h2(key)]%10
DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 7
0
1
2
3
4
5
6
7
8
9
1889
4371
9699
1323
6173
4344
4199
6173=3
h2(key)= 7-6=1
H(key)= (3+1*1)%10= 4
4344=4
h2(key)= 7- key % 7 =43
H(key)= (3+1*3)%10= 6
9699=9
h2(key)= 7- key % 7 =3
H(key)= (9+1*3)%10=2
1889=9
h2(key)= 7- key % 7 =1
H(key)= (9+1*1)%10=0
// function to insert key into hash table
void insertHash(int key)
{
// if hash table is full
if (isFull())
return;
// get index from first hash
int index = hash1(key);
// if collision occurs
if (ht[index] != -1) {
// get index2 from second hash
int step= hash2(key);
int i = 1;
while (1) {
DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 8
// get newIndex
int newIndex = (index + i * step) %
TABLE_SIZE;
// if no collision occurs, store the key
if (ht[newIndex] == -1) {
ht[newIndex] = key;
break;
}
i++;
}
}
// if no collision occurs
else
ht[index] = key;
}
// function to search key in hash table
void search(int key)
{
int index1 = hash1(key);
int step = hash2(key);
int i = 0;
while (ht[(index1 + i * step) % TABLE_SIZE] != key) {
if (ht[(index1 + i * step) % TABLE_SIZE] == -1) {
cout << key << " does not exist" << endl;
return; }
i++;
}
cout << key << " found" << endl; }
DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 9
Quadratic Probing
• Suppose key is mapped to the location j and the cell j is already occupied.
• In quadratic probing, the location j,(j+1),(j+22),(j+32),…… are examined to find the
1st empty cell where the key is to be examined.
• This method reduces clustering
• It does not ensure that all cells in the table will be examined to find an empty cell.
• It may be possible that key will not be inserted even if there is an empty cell in
the table.
DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 10
• In quadratic probing, empty location is searched using the formula,
h(key)=[h(key) + i2] % table_size
• Let hash(x) be the slot index computed using the hash function.
• If the slot hash(x) % S is full, then we try (hash(x) + 1*1) % S.
• If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S.
• If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S.
• This process is repeated for all the values of i until an empty slot is found.
DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 11
DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 12
For example: Let us consider a simple hash function as “key mod 7” and sequence of keys as 50, 700, 76, 85,
92, 73, 101.
DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 13
Rehashing
• Basically, when the load factor increases to more than its pre-defined value
(default value of load factor is 0.75), the complexity increases.
• So to overcome this, the size of the table is increased (doubled) and all the values
are hashed again and stored in the new double sized table to maintain a low load
factor and low complexity.
• Rehashing is a technique, in which table size is resized, it means size of table is
doubled by creating a new table.
• It is preferable if the total size of table is prime number.
DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 14
DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 15
How Rehashing is done?
Rehashing can be done as follows:
• For each addition of a new entry to the map, check the load factor. If it’s greater
than its pre-defined value (or default value of 0.75 if not given), then Rehash.
• For Rehash, make a new table of double the previous size and make it the new
hash table.
• Then traverse to each element in the old table and call the insert() for each so as
to insert it into the new larger hash table.
• Example insert element 37,90,55,22,17,49 ,4 and 87. table size is 10
DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 16
Hash
function=key%table_size
H(37)=37%10=7
α = 0.1
H(90)=90%10=0
α = 0.2
H(55)=55%10=5
α = 0.3
H(22)=22%10=2
α = 0.4
H(17)=17%10=7
α = 0.5
H(49)=49%10=9
α = 0.6
H(4)=4%10=4
α = 0.7
0
1
2
3
4
5
6
7
8
9
90
4
55
37
17
49
0
.
.
3
9
14
17
18
21
22
.
.
49
.
.
55
.
.
37
.
.
17
87
.
.
90
22
Rehashing by
doubling the table
size
• When to Rehash:
1. When table is half Full
2. When insertion fails
3. Load factor is > 0.75
• Advantages:
1. Provide flexibility to enlarge table size.
2. Avoids occurrence of collisions
• Disadvantages:
1. It is costly and happens frequently if table size is small.
2. Time required for rehashing is O(N)
DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 17
• Calculate avg. cost and no. of comparisons for following data use linear probing
technique. no. of buckets are 0 to 9 and each bucket has one slot.
12,1,4,3,7,8,10,2,5,14,6,28
No. of comparisions= 1+1+1+1+1+1+1+4+2+6+10+10=39 buckets get examined
Avg. no. of buckets get examined per key= (total no. of comparisons/no. of keys)
= 39/12=3.25
Avg. no. of key comparisions = (total no. of comparisons/no. of buckets)
= 39/10=1.2
DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 18
Avg. cost can be computed by calculating successful(Sn) and unsuccessful
search(Un)
Sn = ½( 1+ [1/(1- α )])
= 1/2 ( 1+[1/ (1- 1.2)])
= -2
Un =1/2(1+[1/(1-α)2])
= ½ (1+(1/0.04))
=13
Total cost= Sn + Un
= -2+13=11
DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 19

Double Hashing.pptx

  • 1.
    Sanjivani Rural EducationSociety’s Sanjivani College of Engineering, Kopargaon-423 603 (An Autonomous Institute, Affiliated to Savitribai Phule Pune University, Pune) NACC ‘A’ Grade Accredited, ISO 9001:2015 Certified Department of Computer Engineering (NBA Accredited) Subject- Data Structures-I(CO203) Unit III- Double Hashing
  • 2.
    Double Hashing • Thismethod uses two hashing function h1(key) and h2(key). Problem of unequal distribution of keys can be handled through double hashing • Function h1(key) is primary hash function and h2(key) is computed. If the address obtained by h1(key) is already occupied by another key. • Second hash function h2(key) is used to compute the increment to be added to the address obtained by 1st hash function h1(key) in case of collision. DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 2
  • 3.
    Common definitions forh2 are: 1. h2(key)=1+key%(tablesize) OR 2. h2(key)=M- (key%M) where M is a prime or less than table size OR 3. h2(key)=M*(key%M) where M is a prime or less than table size • A popular second hash function is : hash2(key) = PRIME – (key % PRIME) where PRIME is a prime smaller than the TABLE_SIZE. Once we get h2, double hashing is done by (hash1(key) + i * hash2(key)) % TABLE_SIZE (We repeat by increasing i when collision occurs) DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 3
  • 4.
    • A goodsecond Hash function is: • It must never evaluate to zero • Must make sure that all cells can be probed Performance of Double Hashing: 1. Much better than linear probing because it eliminates both primary and secondary clustering. 2. But requires a computation of second hash function h2 DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 4
  • 5.
    • Example: loadthe keys 18,26,35,9,64,47,96,36 and 70 in this order, in an empty hash table of size 13. • h1(key)=key%13 • h2(key)= 7- key%7 H(key)=[h1(key)+i*h2(key)]%13 DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 5
  • 6.
    DEPARTMENT OF COMPUTERENGINEERING, Sanjivani COE, Kopargaon 6 Key h1 h2 18 5 26 0 35 9 9 9 5 64 12 47 8 96 5 2 36 10 70 5 7 26 9 18 70 96 47 35 36 64 0 1 2 3 4 5 6 7 8 9 10 11 12 H(key)=(9+1*5)%13= 1 H(key)=(5+1*2)%13= 7 H(key)=(5+1*7)%13= 12 H(key)=(5+2*7)%13= 6
  • 7.
    e.g. 4371,1323,6173,4199,4344,9699,1889 h1(x)=key %10 h2(x)=7 – (x %7) h(key)=H(key)=[h1(key)+i*h2(key)]%10 DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 7 0 1 2 3 4 5 6 7 8 9 1889 4371 9699 1323 6173 4344 4199 6173=3 h2(key)= 7-6=1 H(key)= (3+1*1)%10= 4 4344=4 h2(key)= 7- key % 7 =43 H(key)= (3+1*3)%10= 6 9699=9 h2(key)= 7- key % 7 =3 H(key)= (9+1*3)%10=2 1889=9 h2(key)= 7- key % 7 =1 H(key)= (9+1*1)%10=0
  • 8.
    // function toinsert key into hash table void insertHash(int key) { // if hash table is full if (isFull()) return; // get index from first hash int index = hash1(key); // if collision occurs if (ht[index] != -1) { // get index2 from second hash int step= hash2(key); int i = 1; while (1) { DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 8 // get newIndex int newIndex = (index + i * step) % TABLE_SIZE; // if no collision occurs, store the key if (ht[newIndex] == -1) { ht[newIndex] = key; break; } i++; } } // if no collision occurs else ht[index] = key; }
  • 9.
    // function tosearch key in hash table void search(int key) { int index1 = hash1(key); int step = hash2(key); int i = 0; while (ht[(index1 + i * step) % TABLE_SIZE] != key) { if (ht[(index1 + i * step) % TABLE_SIZE] == -1) { cout << key << " does not exist" << endl; return; } i++; } cout << key << " found" << endl; } DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 9
  • 10.
    Quadratic Probing • Supposekey is mapped to the location j and the cell j is already occupied. • In quadratic probing, the location j,(j+1),(j+22),(j+32),…… are examined to find the 1st empty cell where the key is to be examined. • This method reduces clustering • It does not ensure that all cells in the table will be examined to find an empty cell. • It may be possible that key will not be inserted even if there is an empty cell in the table. DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 10
  • 11.
    • In quadraticprobing, empty location is searched using the formula, h(key)=[h(key) + i2] % table_size • Let hash(x) be the slot index computed using the hash function. • If the slot hash(x) % S is full, then we try (hash(x) + 1*1) % S. • If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S. • If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S. • This process is repeated for all the values of i until an empty slot is found. DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 11
  • 12.
    DEPARTMENT OF COMPUTERENGINEERING, Sanjivani COE, Kopargaon 12 For example: Let us consider a simple hash function as “key mod 7” and sequence of keys as 50, 700, 76, 85, 92, 73, 101.
  • 13.
    DEPARTMENT OF COMPUTERENGINEERING, Sanjivani COE, Kopargaon 13
  • 14.
    Rehashing • Basically, whenthe load factor increases to more than its pre-defined value (default value of load factor is 0.75), the complexity increases. • So to overcome this, the size of the table is increased (doubled) and all the values are hashed again and stored in the new double sized table to maintain a low load factor and low complexity. • Rehashing is a technique, in which table size is resized, it means size of table is doubled by creating a new table. • It is preferable if the total size of table is prime number. DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 14
  • 15.
    DEPARTMENT OF COMPUTERENGINEERING, Sanjivani COE, Kopargaon 15 How Rehashing is done? Rehashing can be done as follows: • For each addition of a new entry to the map, check the load factor. If it’s greater than its pre-defined value (or default value of 0.75 if not given), then Rehash. • For Rehash, make a new table of double the previous size and make it the new hash table. • Then traverse to each element in the old table and call the insert() for each so as to insert it into the new larger hash table.
  • 16.
    • Example insertelement 37,90,55,22,17,49 ,4 and 87. table size is 10 DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 16 Hash function=key%table_size H(37)=37%10=7 α = 0.1 H(90)=90%10=0 α = 0.2 H(55)=55%10=5 α = 0.3 H(22)=22%10=2 α = 0.4 H(17)=17%10=7 α = 0.5 H(49)=49%10=9 α = 0.6 H(4)=4%10=4 α = 0.7 0 1 2 3 4 5 6 7 8 9 90 4 55 37 17 49 0 . . 3 9 14 17 18 21 22 . . 49 . . 55 . . 37 . . 17 87 . . 90 22 Rehashing by doubling the table size
  • 17.
    • When toRehash: 1. When table is half Full 2. When insertion fails 3. Load factor is > 0.75 • Advantages: 1. Provide flexibility to enlarge table size. 2. Avoids occurrence of collisions • Disadvantages: 1. It is costly and happens frequently if table size is small. 2. Time required for rehashing is O(N) DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 17
  • 18.
    • Calculate avg.cost and no. of comparisons for following data use linear probing technique. no. of buckets are 0 to 9 and each bucket has one slot. 12,1,4,3,7,8,10,2,5,14,6,28 No. of comparisions= 1+1+1+1+1+1+1+4+2+6+10+10=39 buckets get examined Avg. no. of buckets get examined per key= (total no. of comparisons/no. of keys) = 39/12=3.25 Avg. no. of key comparisions = (total no. of comparisons/no. of buckets) = 39/10=1.2 DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 18
  • 19.
    Avg. cost canbe computed by calculating successful(Sn) and unsuccessful search(Un) Sn = ½( 1+ [1/(1- α )]) = 1/2 ( 1+[1/ (1- 1.2)]) = -2 Un =1/2(1+[1/(1-α)2]) = ½ (1+(1/0.04)) =13 Total cost= Sn + Un = -2+13=11 DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 19