We would like to build a data structure for which both the insertion and find operations are O(1) in the worst case.
If we cannot guarantee O(1) performance in the worst case, then we make it our design objective to achieve O(1) performance in the average case.
In order to meet the performance objective of constant-time insert and find operations, we need a way to do them without performing a search. That is, given an item x, we need to be able to determine directly from x the array position where it is to be stored.
Hashing-The Basic Idea
Hash tables are widely used data structures because insertion, deletion, and search can all be implemented to run in constant average time.
The general model of a hash table is:
Items in a hash table have two parts: a key used for indexing and one
or more data fields. Typically, one or more data fields are used to
The table has TableSize cells, numbered 0 to TableSize-1. Note that the
table might be empty. The number of items is the actual number of cells
occupied.
As with arrays, each item within the table is indexed by a number.
The index number is obtained using a mapping function known as the
hash function, which ideally should provide a fast method for
computing a unique index for each key in the table.
Must return a valid table location .
Easy to implement.
Should ideally be a 1-to-1 mapping (to avoid collisions):
If key1 != key2 then hash(key1) != hash(key2).
A collision occurs when two distinct keys hash to the same location in the array.
Should distribute the keys evenly
Any key value k is equally likely to hash to any of the m array locations.
Standard Hash Function
hashValue = key mod TableSize
4112041 : 12041 mod 1000 = 41
4163490 : 63490 mod 1000 = 490
TableSize should be a prime number for even distribution
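The two examples above can be checked with a short sketch. The function name mod_hash is an assumption made for illustration; the keys and the table size of 1000 are the ones used in the examples.

```python
def mod_hash(key: int, table_size: int) -> int:
    """Map a non-negative integer key to an index in 0..table_size-1."""
    return key % table_size


print(mod_hash(4112041, 1000))  # 41
print(mod_hash(4163490, 1000))  # 490
```

With a table size of 1000 only the last three digits of the key matter, which is one reason a prime table size tends to spread keys more evenly.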
Hash Function Examples
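Typically the keys are strings, encoded in either ASCII or Unicode.
Here is a typical hash function:
public static int Hash(string key, int tableSize)
hash = (k0 + 27 * (k1 + 27 * k2)) mod TableSize
where k0, k1 and k2 are the character codes of the first three characters of key.
public static int HashFig53(string key, int tableSize)
{
    // Same formula expanded (27 * 27 = 729); assumes key has at least three characters.
    int aNumber = key[0] + 27 * key[1] + 729 * key[2];
    return aNumber % tableSize;
}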
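The same formula can be tried out in a few lines of Python. This is only a sketch: the function name hash_first_three and the table size 10007 are assumptions chosen for illustration, and the key is assumed to have at least three characters.

```python
def hash_first_three(key: str, table_size: int) -> int:
    """Hash a string using only its first three characters.

    Assumes len(key) >= 3; shorter keys raise IndexError.
    """
    k0, k1, k2 = ord(key[0]), ord(key[1]), ord(key[2])
    # Horner form of k0 + 27*k1 + 729*k2.
    return (k0 + 27 * (k1 + 27 * k2)) % table_size


print(hash_first_three("abc", 10007))  # 4865
```

Because only three characters are examined, any two keys that share their first three characters collide (for example "abcd" and "abcx"), so this function is fast but does not distribute long keys well.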
We wish to implement a searchable container which will be used to contain character strings from the set of strings K.
Suppose we define a function h as given by the table:
Then, we can implement a searchable container using
a table of length n = 12. To insert item x, we simply store
it at position h(x)-1 of the table. Similarly, to locate item x,
we simply check to see if it is found at position h(x)-1.
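A minimal sketch of this container follows. The three keys and their h values are invented stand-ins (so n is 3 here rather than 12); the only property carried over from the text is that h assigns each key a distinct number from 1 to n.

```python
# Hypothetical stand-in for the function h: each key gets a distinct
# number in 1..n. These keys and values are invented for illustration.
h = {"red": 1, "green": 2, "blue": 3}
n = len(h)
table = [None] * n


def insert(x: str) -> None:
    """Store x at position h(x) - 1; no search is needed."""
    table[h[x] - 1] = x


def find(x: str) -> bool:
    """Check the single position where x would have to be stored."""
    return x in h and table[h[x] - 1] == x


insert("green")
print(find("green"), find("red"))  # True False
```

Both operations touch exactly one table cell, which is what makes them O(1).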
When an element is inserted, if it hashes to the same value as an already inserted element, then we have a collision .
Collision resolving techniques
Separate chaining is a technique used to resolve collisions. The idea is to store the items that hash to the same value in a sorted linked list. This is a nice example of a data structure that is implemented as a combination of two others: a hash table and a set of sorted linked lists. The operations are therefore implemented in terms of those data structures.
For example, to find an item x in the table:
1. Compute the index i = hash(x) and go to the ith linked list in the table.
2. Traverse that list to find the element.
Assuming that hash(x) = x mod(10),
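Under that assumption (hash(x) = x mod 10), the two find steps might be sketched as follows. The class name and the sample keys are assumptions, and sorted Python lists stand in for the sorted linked lists.

```python
import bisect


class ChainedHashTable:
    """Separate chaining with one sorted list per table cell."""

    def __init__(self, table_size: int = 10) -> None:
        self.table_size = table_size
        self.lists = [[] for _ in range(table_size)]

    def _index(self, x: int) -> int:
        return x % self.table_size

    def insert(self, x: int) -> None:
        chain = self.lists[self._index(x)]
        if x not in chain:
            bisect.insort(chain, x)  # keep the chain sorted

    def find(self, x: int) -> bool:
        # Step 1: pick the list by index. Step 2: traverse it.
        return x in self.lists[self._index(x)]


t = ChainedHashTable()
for key in (81, 1, 64, 4, 25):
    t.insert(key)
print(t.lists[1], t.find(64), t.find(7))  # [1, 81] True False
```

Keys 81 and 1 both hash to index 1, so they end up in the same list instead of competing for one cell.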
Load factor λ = number of elements / table size
Average length of a list = λ
A successful search costs about 1 + (λ/2) link traversals
The cost depends on λ, not on the table size
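As a quick numeric check, writing the load factor as λ, the average list length is λ and a successful search follows about 1 + λ/2 links. The element count and table size below are made-up values.

```python
def load_factor(num_elements: int, table_size: int) -> float:
    """Load factor: number of elements divided by table size."""
    return num_elements / table_size


lam = load_factor(20, 10)       # 20 items spread over 10 lists
successful_search = 1 + lam / 2  # approximate links traversed
print(lam, successful_search)    # 2.0 2.0
```

Doubling both the number of elements and the table size leaves λ, and therefore the expected cost, unchanged.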
Open addressing: no linked lists. All items are stored in the array itself.
If a collision occurs, alternative locations are tried until an empty cell is found:
try h_0(x), h_1(x), h_2(x), …
h_i(x) = (hash(x) + f(i)) mod TableSize
f(i) is the collision resolution strategy, with f(0) = 0
Requires a bigger table: the load factor should be below 0.5
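The probe sequence can be written once and parameterised by the strategy f. This is a sketch under stated assumptions: hash(x) is taken to be x mod TableSize, and the function name and the max_probes parameter are invented for illustration.

```python
def probe_sequence(x, table_size, f, max_probes=None):
    """Yield the cells h_0(x), h_1(x), ... for strategy f, where f(0) == 0."""
    if max_probes is None:
        max_probes = table_size
    for i in range(max_probes):
        yield (x % table_size + f(i)) % table_size


print(list(probe_sequence(49, 10, lambda i: i, 4)))      # [9, 0, 1, 2]
print(list(probe_sequence(49, 10, lambda i: i * i, 4)))  # [9, 0, 3, 8]
```

Swapping f changes the order in which alternative cells are tried without touching the rest of the table code.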
If a collision occurs, try the next cell sequentially
f(i) = i
h_i(x) = (hash(x) + i) mod TableSize
Try hash(x) mod TableSize, (hash(x) + 1) mod TableSize, (hash(x) + 2) mod TableSize, (hash(x) + 3) mod TableSize, . . .
Linear Probing Insert: 89, 18, 49, 58, 69 (TableSize = 10, hash(x) = x mod 10)
89 is inserted directly into cell 9
18 is inserted directly into cell 8
49 collides at cell 9 and is finally put into cell 0
58 collides at cells 8, 9 and 0 and is finally put into cell 1
69 collides at cells 9, 0 and 1 and is finally put into cell 2
Cell:   0   1   2   3   4   5   6   7   8   9
Value:  49  58  69  -   -   -   -   -   18  89
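The insertions above can be replayed with a short sketch. The function name is an assumption; the table size of 10 and hash(x) = x mod 10 follow from the cell numbers in the example.

```python
def linear_probe_insert(table, x):
    """Insert x with linear probing, f(i) = i; return the cell used."""
    size = len(table)
    for i in range(size):
        cell = (x % size + i) % size
        if table[cell] is None:
            table[cell] = x
            return cell
    raise RuntimeError("table is full")


table = [None] * 10
for key in (89, 18, 49, 58, 69):
    linear_probe_insert(table, key)
print(table)  # [49, 58, 69, None, None, None, None, None, 18, 89]
```

The occupied cells 8, 9, 0, 1, 2 form one contiguous block once the wrap-around is taken into account.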
Primary clustering: blocks of occupied cells (called clusters) form in the table.
A collision occurs whenever a key hashes to any cell inside a cluster. Several probes may then be needed before a free cell is found, and the new item is added to the cluster, making it larger.
Linear Probing: Problems
Normal deletion cannot be performed:
some later find operations would fail because the chain of
probes that leads to the data is cut. Use lazy deletion instead.
Insertion cost = number of probes to find an empty cell
≈ 1/(fraction of empty cells)
= 1/(1 - λ), where λ is the load factor (an estimate that ignores clustering)
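Lazy deletion can be sketched by marking a removed cell with a special marker rather than emptying it, so that later searches keep probing past it. The names EMPTY, DELETED, find_cell and lazy_delete are assumptions, and the table contents reuse the linear probing example with hash(x) = x mod 10.

```python
EMPTY = None
DELETED = object()  # marker for a lazily deleted cell


def find_cell(table, x):
    """Return the cell holding x, or None if x is not in the table."""
    size = len(table)
    for i in range(size):
        cell = (x % size + i) % size
        if table[cell] is EMPTY:
            return None  # a never-used cell ends the probe chain
        if table[cell] is not DELETED and table[cell] == x:
            return cell
    return None


def lazy_delete(table, x):
    """Mark x as deleted without breaking the probe chain."""
    cell = find_cell(table, x)
    if cell is not None:
        table[cell] = DELETED


table = [49, 58, 69, EMPTY, EMPTY, EMPTY, EMPTY, EMPTY, 18, 89]
lazy_delete(table, 49)
print(find_cell(table, 58))  # 1
```

Had cell 0 simply been reset to EMPTY, the search for 58 would have stopped there and wrongly reported that 58 is absent.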
Eliminates primary clustering
f(i) = i^2
h_i(x) = (hash(x) + i^2) mod TableSize
Try hash(x) mod TableSize, (hash(x) + 1^2) mod TableSize,
(hash(x) + 2^2) mod TableSize, (hash(x) + 3^2) mod TableSize, . . .
The table must be at most half full and the table size must be prime; otherwise an insertion may fail (every probe hits an occupied cell).
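A minimal sketch of quadratic probing follows, assuming hash(x) = x mod TableSize and a prime table size of 11 chosen for illustration. The function name is an assumption, and the keys are borrowed from the linear probing example.

```python
def quadratic_probe_insert(table, x):
    """Insert x with quadratic probing, f(i) = i*i; return the cell used."""
    size = len(table)
    for i in range(size):
        cell = (x % size + i * i) % size
        if table[cell] is None:
            table[cell] = x
            return cell
    raise RuntimeError("no empty cell found")


table = [None] * 11  # prime table size; 5 keys keeps it under half full
cells = [quadratic_probe_insert(table, k) for k in (89, 18, 49, 58, 69)]
print(cells)  # [1, 7, 5, 3, 4]
```

Here 58 and 69 both hash to cell 3, and 69 is placed at (3 + 1^2) mod 11 = 4 on its second probe.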