Hashing

-V.Dinesh
III/IV CSE-B
ANITS

Outline
 11.1 Introduction
– 11.1.1 What is Hashing
– 11.1.2 Collisions
 11.2 A Simple Hashing Algorithm
 11.3 Hashing Functions and Record Distributions
– 11.3.1 Distributing Records among Addresses
– 11.3.2 Some Other Hashing Methods
– 11.3.3 Predicting the Distribution of Records
– 11.3.4 Predicting Collisions for a Full File

11.1.1 What is Hashing
 Hashing is the transformation of a string of characters into a
usually shorter fixed-length value that represents the original
string.
 Hashing is used to index and retrieve items in a database.
 Lookup is fast: when collisions are rare, searching for an element
takes O(1) time on average.

How is hashing done?
 It uses a hash function, which:
– Takes a key (K) as its argument.
– Returns an address (the home address).
 Hash table
– It is a data structure similar to an array.
– The key is stored at its home address in the hash table.

Hash Function
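/* Toy hash: multiply the codes of the first two characters and keep
   the last three decimal digits. Assumes the key has at least two
   characters. */
int MyHash(char *key)
{
    int d;
    d = (key[0] * key[1]) % 1000;   /* home address in the range 0..999 */
    return d;
}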

Searching, Insertion and Deletion
Let us consider a hash table H[1000].
 Inserting:
– H[MyHash(key)] = key;
 Deleting:
– H[MyHash(key)] = NULL; (marks the slot as empty)
 Searching:
– H[MyHash(key)]; (the key stored at the home address, if any)
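
The three operations above can be sketched in C roughly as follows. This is a minimal illustration, not the deck's own implementation: the names ht_insert, ht_search and ht_delete are hypothetical, the table stores pointers to strings, NULL is assumed to mark an empty slot, and a collision is simply refused rather than resolved.

```c
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 1000

/* Same toy hash as above; the casts keep the product non-negative. */
static int MyHash(const char *key)
{
    return ((unsigned char)key[0] * (unsigned char)key[1]) % TABLE_SIZE;
}

/* The table holds pointers to keys; NULL means "slot is empty" (assumption). */
static const char *H[TABLE_SIZE];

/* Store key at its home address. Returns 1 on success, 0 if the slot is taken. */
static int ht_insert(const char *key)
{
    int a = MyHash(key);
    if (H[a] != NULL)
        return 0;               /* collision: not handled in this sketch */
    H[a] = key;
    return 1;
}

/* Returns 1 if key is stored at its home address, 0 otherwise. */
static int ht_search(const char *key)
{
    int a = MyHash(key);
    return H[a] != NULL && strcmp(H[a], key) == 0;
}

/* Empty the home slot of key, but only if that slot really holds key. */
static void ht_delete(const char *key)
{
    if (ht_search(key))
        H[MyHash(key)] = NULL;
}
```

Each operation touches exactly one slot, which is where the constant-time behaviour comes from; comparing the stored key in ht_search guards against reporting a different key that happens to share the same address.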

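Insertion
Key = “DINESH”
MyHash(Key) = ('D' * 'I') % 1000 = (68 * 73) % 1000 = 4964 % 1000 = 964
[Figure: hash table slots 960 to 966, with DINESH stored in slot 964]
Name     Home Address
DINESH   964
BALL     290
TREE     888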

11.1.2 Collisions
 Now consider the key “IDIOT”. Its home address is 964, the same as
the home address of “DINESH”, because MyHash looks only at the first
two characters and 'I' * 'D' = 'D' * 'I'.
 When two different keys compete for the same address, it is called
a collision.
 Keys that hash to the same address are called synonyms.
– E.g. here the keys “DINESH” and “IDIOT” are synonyms.
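
A short C sketch can confirm that the two keys collide. It reuses the MyHash function from the earlier slide; the helper are_synonyms is a hypothetical name introduced only for this illustration.

```c
#include <string.h>

/* Toy hash from the slides: product of the first two character codes, mod 1000. */
int MyHash(const char *key)
{
    return ((unsigned char)key[0] * (unsigned char)key[1]) % 1000;
}

/* Two different keys are synonyms when they share a home address. */
int are_synonyms(const char *a, const char *b)
{
    return strcmp(a, b) != 0 && MyHash(a) == MyHash(b);
}
```

Calling are_synonyms("DINESH", "IDIOT") returns 1, while are_synonyms("DINESH", "BALL") returns 0. Any two keys whose first two characters are the same pair, in either order, collide under this hash.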

Collisions (Contd..)
 Collisions cause problems because we cannot store more than one
key at a single address.
 Ideally we would use a hashing algorithm that produces no
collisions at all.
 Such an algorithm is called a perfect hashing algorithm.
In practice it is very hard to find one.

Avoidance of collisions
 Since collisions can rarely be eliminated, they are reduced in 3 ways.
– Spread out the records
– Use extra memory
– Put more than one record at a single address

11.2 A Simple Hashing Algorithm
 It consists of three steps.
– Step 1 : Represent the key in numerical form.
– Step 2 : Fold and add
– Step 3 : Divide by the size of the address space.

1. Represent the Key in Numerical Form
 If the key is already a number, we can skip this step.
 If it is a string, use the ASCII code of each character.
– Let us consider the key = “LOWELL”, padded with blanks to 12
characters.
L  O  W  E  L  L  (6 blank spaces)
76 79 87 69 76 76 32 32 32 32 32 32

2. Fold and Add
 It means chopping the number into pieces and adding them
together.
L O W E L L
7679 | 8769 | 7676 | 3232 | 3232 | 3232
This process is chopping: each piece holds the codes of two
characters. The pieces are added in the next substep.

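Fold and Add (contd.)
 While adding, we have to check that the sum does not exceed the
range of the data type.
 Let the upper limit be 32767 (the largest int on a 16-bit compiler).
The running sum must never exceed this limit.
 So after each addition, take the sum modulo a prime number such as
19937. Why 19937? For keys made of uppercase letters and blanks, the
largest possible piece is 9090 (“ZZ”), and 19937 + 9090 = 29027, which
is still below 32767.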

Adding…
L O W E L L
7679 | 8769 | 7676 | 3232 | 3232 | 3232
7679+8769=16448 -> 16448%19937=16448
16448+7676=24124 -> 24124%19937=4187
4187+3232=7419 -> 7419%19937=7419
7419+3232=10651 -> 10651%19937=10651
10651+3232=13883 -> 13883%19937=13883
Finally, Sum=13883.

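3. Divide by the Size of the Address Space
 a = s mod n
– where a = home address
– s = sum from step 2
– n = number of addresses in the file.
 The remainders tend to be spread more evenly when the divisor is a
prime number, so choose a prime close to n. For example, for a file
of about 100 addresses, 101 is a suitable choice.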

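Hash Function
/* Fold-and-add hash for a 12-character key, padded with blanks. */
int Hash(char key[12], int maxAddress)
{
    int sum = 0;
    for (int j = 0; j < 12; j += 2)
        /* fold two characters into one piece, add it, reduce mod 19937 */
        sum = (sum + 100 * key[j] + key[j + 1]) % 19937;
    return sum % maxAddress;   /* step 3: address in 0..maxAddress-1 */
}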
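
As a usage sketch, the function above can be applied to “LOWELL” once the key has been padded to 12 characters. The helpers pad_key and home_address are hypothetical names added for this illustration, and the address space of 101 is an assumed example value, not one given in the slides.

```c
#include <string.h>

#define KEY_LEN 12

/* Fold-and-add hash, as in the slide above. */
int Hash(char key[KEY_LEN], int maxAddress)
{
    int sum = 0;
    for (int j = 0; j < KEY_LEN; j += 2)
        sum = (sum + 100 * key[j] + key[j + 1]) % 19937;
    return sum % maxAddress;
}

/* Hypothetical helper: copy at most 12 characters and pad with blanks. */
void pad_key(const char *src, char dst[KEY_LEN])
{
    size_t len = strlen(src);
    if (len > KEY_LEN)
        len = KEY_LEN;
    memset(dst, ' ', KEY_LEN);
    memcpy(dst, src, len);
}

/* Hypothetical wrapper: pad the name, then hash it. */
int home_address(const char *name, int maxAddress)
{
    char key[KEY_LEN];
    pad_key(name, key);
    return Hash(key, maxAddress);
}
```

With these definitions, home_address("LOWELL", 101) evaluates to 46, since the folded sum is 13883 and 13883 mod 101 = 46. Padding matters: hashing the unpadded 6-character string would read past its end.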

11.3.1 Distributing Records among Addresses
[Figure: three ways of mapping records A to G onto addresses 1 to 10.
Panels: Uniform distribution, All synonyms, A few synonyms.]

11.3.2 Some other Hashing Methods
 Examine Keys for a pattern
 Fold parts of the key
 Divide the key by a number
 Square the key and take the middle
 Radix transformation
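
As one illustration from the list above, the "square the key and take the middle" method might be sketched like this. The function name mid_square is hypothetical, and the sketch assumes a numeric key whose square has exactly six decimal digits (keys 317 to 999), from which the two middle digits are taken.

```c
/* Mid-square sketch: square the key and keep the two middle digits.
   Assumes the square has six decimal digits (keys 317..999). */
int mid_square(int key)
{
    long square = (long)key * key;       /* e.g. 453 * 453 = 205209 */
    return (int)((square / 100) % 100);  /* drop last two digits, keep next two */
}
```

For the key 453 the square is 205209 and the middle digits are 52, so mid_square(453) returns 52, an address in the range 0 to 99.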

11.3.3 Predicting the Distribution of Records
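 We cannot know in advance exactly how the records will be
distributed, but we can predict the distribution if we assume the
hash function spreads the keys randomly.
 One prediction method uses the Poisson distribution.
– Let b be the probability that a given key hashes to a particular
address (event B) and a the probability that it does not (event A).
For r keys, the probability of one particular sequence of x B's and
(r-x) A's is a^(r-x) * b^x
– The number of ways that x B's and (r-x) A's can be arranged is
C = r! / ((r-x)! * x!)
– So the probability that exactly x of the r keys hash to the address
is p(x) = C * a^(r-x) * b^x

 Substituting a = 1 - 1/N and b = 1/N, where N is the number of
addresses, and taking N and r large, this is approximated by the
Poisson function
p(x) = ((r/N)^x * e^(-r/N)) / x!
 In general, if there are N addresses, the expected number of
addresses with x records assigned to them is
N * p(x)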

11.3.4 Predicting Collisions for a Full File
 Suppose a file contains 10000 records in 10000 addresses.
 Here r = 10000 and N = 10000, so r/N = 1.
Substituting in p(x) = ((r/N)^x * e^(-r/N)) / x!:
p(0) = (1^0 * e^-1) / 0! = 0.3679
The number of addresses with no records assigned is
N * p(0) = 10000 * 0.3679 = 3679

 Similarly, the numbers of addresses with one, two, and three
records assigned are, respectively,
 10000 * p(1) = 3679
 10000 * p(2) = 1839
 10000 * p(3) = 613
 So the 1839 addresses with two records each produce one overflow
record (1839 overflows), and the 613 addresses with three records
each produce two (2 * 613 = 1226 overflows).
