Hashing


  1. 1. Hashing, by Abdul Ghaffar Khan
  2. 2. Contents <ul><li>Basic Concepts </li></ul><ul><li>Hashing Functions </li></ul><ul><li>Collision resolution techniques </li></ul>
  3. 3. Hashing-The Basic Idea <ul><li>We would like to build a data structure for which both the insert and find operations are O(1) in the worst case. </li></ul><ul><li>If we cannot guarantee O(1) performance in the worst case, then we make it our design objective to achieve O(1) performance in the average case. </li></ul><ul><li>To meet the performance objective of constant-time insert and find operations, we need a way to do them without performing a search. That is, given an item x, we need to be able to determine directly from x the array position where it is to be stored. </li></ul>
  4. 4. Hashing-The Basic Idea <ul><li>Hash tables are widely used data structures because insertion, deletion, and search can all be implemented to run in constant average time. </li></ul><ul><li>The general model of a hash table is: </li></ul>
  5. 5. Hashing-The Basic Idea <ul><li>Items in a hash table have two parts: a key used for indexing and one or more data fields. Typically, one or more of the data fields are used to create the key. </li></ul><ul><li>The number of cells in the table is TableSize. Note that the table might be empty. The number of items is the actual number of cells being used. </li></ul><ul><li>As with arrays, each cell within the table is indexed by a number, 0…TableSize-1. </li></ul><ul><li>The index number is obtained using a mapping function known as the hash function, which ideally should provide a fast method for computing a unique index for each key. </li></ul>
  6. 6. Hash function <ul><li>Must return a valid table location. </li></ul><ul><li>Should be easy to implement. </li></ul><ul><li>Ideally a 1-to-1 mapping (to avoid collisions). </li></ul><ul><ul><li>If key1 != key2 then hash(key1) != hash(key2) </li></ul></ul><ul><ul><li>A collision occurs when two distinct keys hash to the same location in the array </li></ul></ul><ul><li>Should distribute the keys evenly </li></ul><ul><ul><li>Any key value k is equally likely to hash to any of the m array locations. </li></ul></ul>
  7. 7. Standard Hash Function <ul><li>hashValue = key mod TableSize </li></ul><ul><li>Example: </li></ul><ul><ul><li>4112041 : 12041 mod 1000 = 41 </li></ul></ul><ul><ul><li>4163490 : 63490 mod 1000 = 490 </li></ul></ul><ul><li>TableSize should be a prime number for even distribution </li></ul>
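
The modular hash above can be sketched in a few lines of C#. The class name `ModHash` is a hypothetical name chosen for illustration. The sign correction is an addition not found in the slides; it is there because the C# `%` operator returns a negative remainder for negative keys.

```csharp
using System;

public static class ModHash
{
    // Maps an integer key to a cell index in the range 0..tableSize-1.
    public static int Hash(int key, int tableSize)
    {
        if (tableSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(tableSize));

        int remainder = key % tableSize;
        // In C#, % keeps the sign of the dividend, so shift negative
        // remainders back into the valid index range.
        return remainder < 0 ? remainder + tableSize : remainder;
    }
}
```

With the keys from the slide, `ModHash.Hash(4112041, 1000)` gives 41 and `ModHash.Hash(4163490, 1000)` gives 490.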
  8. 8. Hash Function Examples <ul><li>Keys are typically strings, encoded in either ASCII or Unicode. </li></ul><ul><li>Here is a typical hash function: </li></ul><ul><li>public static int Hash(string key, int tableSize) </li></ul><ul><li>{ </li></ul><ul><li>int hashVal = 0; </li></ul><ul><li>char c; </li></ul><ul><li>for (int i = 0; i < key.Length; i++) </li></ul><ul><li>{ </li></ul><ul><li>c = key[i]; </li></ul><ul><li>hashVal += (int) c; </li></ul><ul><li>} </li></ul><ul><li>return hashVal % tableSize; </li></ul><ul><li>} </li></ul>
  9. 9. Hash Function Examples <ul><li> hash = (k<sub>0</sub> + 27k<sub>1</sub> + 27<sup>2</sup>k<sub>2</sub> + . . . ) mod TableSize </li></ul><ul><li>Example: 3-character key </li></ul><ul><ul><li>hash = (k<sub>0</sub> + 27k<sub>1</sub> + 27<sup>2</sup>k<sub>2</sub>) mod TableSize </li></ul></ul><ul><ul><li>hash = (k<sub>0</sub> + 27 * (k<sub>1</sub> + 27 * k<sub>2</sub>)) mod TableSize </li></ul></ul><ul><ul><li>public static int HashFig53(string key, int tableSize) </li></ul></ul><ul><ul><li>{ </li></ul></ul><ul><ul><li>int aNumber; </li></ul></ul><ul><ul><li>aNumber = key[0] + 27*key[1] + 729*key[2]; </li></ul></ul><ul><ul><li>aNumber = aNumber % tableSize; </li></ul></ul><ul><ul><li>return aNumber; </li></ul></ul><ul><ul><li>} </li></ul></ul>
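
The nested form of the formula is Horner's rule, and it extends naturally to keys of any length. The following is a minimal sketch under that assumption. `PolynomialHash` is a hypothetical name, and reducing modulo `tableSize` at every step is an addition of this sketch (it keeps intermediate values small for long keys). It does not change the result.

```csharp
public static class PolynomialHash
{
    // Computes (k0 + 27*k1 + 27^2*k2 + ...) mod tableSize with Horner's rule,
    // working from the last character back to the first.
    public static int Hash(string key, int tableSize)
    {
        long hashVal = 0;
        for (int i = key.Length - 1; i >= 0; i--)
        {
            // hashVal stays below tableSize, so this cannot overflow a long.
            hashVal = (hashVal * 27 + key[i]) % tableSize;
        }
        return (int)hashVal;
    }
}
```

For a 3-character key this returns the same value as `HashFig53`. Unlike the additive hash on the previous slide, it also separates keys that contain the same characters in a different order: with a table size of 10007, "abc" hashes to 4865 and "cba" to 3409.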
  10. 10. Hash Function Examples <ul><ul><li>Example: </li></ul></ul><ul><ul><li>We wish to implement a searchable container which will be used to contain character strings from the set of strings K. </li></ul></ul><ul><ul><li>Suppose we define a function h as given by the table: </li></ul></ul><ul><ul><li>Then we can implement a searchable container using a table of length n = 12. To insert item x, we simply store it at position h(x)-1 of the table. Similarly, to locate item x, we simply check whether it is found at position h(x)-1. </li></ul></ul>
  11. 11. Collision <ul><li>When an element is inserted, if it hashes to the same value as an already inserted element, then we have a collision. </li></ul><ul><li>Collision resolution techniques </li></ul><ul><ul><li>Separate Chaining </li></ul></ul><ul><ul><li>Open Addressing </li></ul></ul><ul><ul><ul><li>Linear Probing, </li></ul></ul></ul><ul><ul><ul><li>Quadratic Probing, </li></ul></ul></ul><ul><ul><ul><li>Double Hashing </li></ul></ul></ul>
  12. 12. Separate Chaining <ul><li>This is a technique used to resolve collisions. The idea is to store the items that hash to the same value in a sorted list. It is a good example of a data structure implemented as a combination of two others: a hash table and a set of sorted linked lists. The operations are therefore implemented in terms of those data structures. </li></ul><ul><li>For example, to find an item in the table: </li></ul><ul><li>1. Use the item's hash value as the index to find the corresponding linked list in the table </li></ul><ul><li>2. Traverse the list to find the element. </li></ul><ul><li>Assuming that hash(x) = x mod 10, </li></ul>
  13. 13. Separate Chaining <ul><li>Load factor λ = number of elements / table size </li></ul><ul><li>Average length of a list = λ </li></ul><ul><li>A successful search costs about 1 + (λ/2) link traversals </li></ul><ul><li>The cost depends on λ </li></ul>
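
A minimal sketch of separate chaining for non-negative integer keys, assuming hash(x) = x mod TableSize as in the slides. `ChainedHashTable` and its members are hypothetical names. The lists are left unsorted here for brevity, whereas the slides describe sorted lists.

```csharp
using System;
using System.Collections.Generic;

public class ChainedHashTable
{
    // One linked list per cell; all keys that hash to a cell share its list.
    private readonly LinkedList<int>[] lists;
    private int count;

    public ChainedHashTable(int tableSize)
    {
        lists = new LinkedList<int>[tableSize];
        for (int i = 0; i < tableSize; i++)
            lists[i] = new LinkedList<int>();
    }

    // Assumes non-negative keys.
    private int Hash(int key) => key % lists.Length;

    // Step 1: pick the list by hash value. Step 2: traverse it.
    public bool Contains(int key) => lists[Hash(key)].Contains(key);

    public bool Insert(int key)
    {
        LinkedList<int> list = lists[Hash(key)];
        if (list.Contains(key)) return false; // ignore duplicates
        list.AddFirst(key);
        count++;
        return true;
    }

    public bool Remove(int key)
    {
        if (!lists[Hash(key)].Remove(key)) return false;
        count--;
        return true;
    }

    // Load factor = number of elements / table size.
    public double LoadFactor => (double)count / lists.Length;
}
```

Because every cell owns its own list, the load factor may exceed 1 and deletion needs no special handling. Search cost grows with the average list length, which is the load factor.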
  14. 14. Open Addressing <ul><li>No linked lists: all items are stored in the array itself </li></ul><ul><li>If a collision occurs, alternative locations are tried until an empty cell is found </li></ul><ul><ul><li>try h<sub>0</sub>(x), h<sub>1</sub>(x), h<sub>2</sub>(x), … </li></ul></ul><ul><ul><li>h<sub>i</sub>(x) = (hash(x) + f(i)) mod TableSize </li></ul></ul><ul><ul><li>f(i) is the collision resolution strategy </li></ul></ul><ul><li>Requires a bigger table: the load factor λ should be below 0.5 </li></ul>
  15. 15. Linear Probing <ul><li>If a collision occurs, try the next cell sequentially </li></ul><ul><li>f(i) = i </li></ul><ul><li>h<sub>i</sub>(x) = (hash(x) + i) mod TableSize </li></ul><ul><li>Try hash(x) mod TableSize, (hash(x) + 1) mod TableSize, (hash(x) + 2) mod TableSize, (hash(x) + 3) mod TableSize, . . . </li></ul>
  16. 16. Linear Probing Insert: 89, 18, 49, 58, 69 (hash(x) = x mod 10, TableSize = 10) <ul><li>89 is directly inserted into cell 9 </li></ul><ul><li>18 is directly inserted into cell 8 </li></ul><ul><li>49 has a collision at cell 9 and is finally put into cell 0 </li></ul><ul><li>58 has collisions at cells 8, 9, 0 and is finally put into cell 1 </li></ul><ul><li>69 has collisions at cells 9, 0, 1 and is finally put into cell 2 </li></ul><ul><li>Resulting table (cell: key): 0: 49, 1: 58, 2: 69, 8: 18, 9: 89; cells 3 to 7 are empty </li></ul>
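
The insertions on this slide can be reproduced with a small sketch of linear probing. It assumes non-negative integer keys and hash(x) = x mod TableSize. `LinearProbingTable` is a hypothetical name, and deletion is left out.

```csharp
using System;

public class LinearProbingTable
{
    // A null entry marks an empty cell.
    private readonly int?[] cells;

    public LinearProbingTable(int tableSize)
    {
        cells = new int?[tableSize];
    }

    // Probes (hash(x) + i) mod TableSize for i = 0, 1, 2, ...
    // Returns the cell where the key ends up, or -1 if the table is full.
    public int Insert(int key)
    {
        int home = key % cells.Length;
        for (int i = 0; i < cells.Length; i++)
        {
            int pos = (home + i) % cells.Length;
            if (cells[pos] == null || cells[pos] == key)
            {
                cells[pos] = key;
                return pos;
            }
        }
        return -1;
    }

    // Follows the same probe sequence; an empty cell ends the search.
    public bool Contains(int key)
    {
        int home = key % cells.Length;
        for (int i = 0; i < cells.Length; i++)
        {
            int pos = (home + i) % cells.Length;
            if (cells[pos] == null) return false;
            if (cells[pos] == key) return true;
        }
        return false;
    }
}
```

With a table of size 10, inserting 89, 18, 49, 58, 69 in that order returns cells 9, 8, 0, 1, 2, matching the slide.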
  17. 17. Primary Clustering <ul><li>Blocks of occupied cells (called clusters) form in the table. </li></ul><ul><li>A collision occurs if a key hashes to any cell within a cluster. Several attempts may then be needed to resolve the collision before a free cell is found, and the new item is added to the cluster, making it larger. </li></ul>
  18. 18. Linear Probing (Problems) <ul><li>Primary clustering </li></ul><ul><li>Normal deletion cannot be performed: </li></ul><ul><ul><li>Some later find operations would fail because the chain of probes that leads to the data is cut. Use lazy deletion instead (mark the cell as deleted rather than emptying it). </li></ul></ul><ul><li>Insertion cost = number of probes to find an empty cell </li></ul><ul><li>= 1/(fraction of empty cells) </li></ul><ul><li>= 1/(1 - λ) </li></ul>
  19. 19. Quadratic Probing <ul><li>Eliminates primary clustering </li></ul><ul><li>f(i) = i<sup>2</sup> </li></ul><ul><li>h<sub>i</sub>(x) = (hash(x) + i<sup>2</sup>) mod TableSize </li></ul><ul><li>Try hash(x) mod TableSize, (hash(x) + 1<sup>2</sup>) mod TableSize, (hash(x) + 2<sup>2</sup>) mod TableSize, (hash(x) + 3<sup>2</sup>) mod TableSize, . . . </li></ul><ul><li>The table must be at most half full and the table size must be prime; otherwise insertion may fail (every probe hits a collision) </li></ul>
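  20. 20. Quadratic Probing Insert: 89, 18, 49, 58, 69 <ul><li>Insert 89, try cell 9 </li></ul><ul><li>Insert 18, try cell 8 </li></ul><ul><li>Insert 49, try cells 9, 0 </li></ul><ul><li>Insert 58, try cells 8, 9, 2 </li></ul><ul><li>Insert 69, try cells 9, 0, 3 </li></ul><ul><li>Resulting table (cell: key): 0: 49, 2: 58, 3: 69, 8: 18, 9: 89; cells 1 and 4 to 7 are empty </li></ul>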
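  21. 21. Quadratic Probing Insert: 10, 20, 30, 40, 50, 60, 70 <ul><li>Insert 10, try cell 0 </li></ul><ul><li>Insert 20, try cells 0, 1 </li></ul><ul><li>Insert 30, try cells 0, 1, 4 </li></ul><ul><li>Insert 40, try cells 0, 1, 4, 9 </li></ul><ul><li>Insert 50, try cells 0, 1, 4, 9, 6 (16) </li></ul><ul><li>Insert 60, try cells 0, 1, 4, 9, 6 (16), 5 (25) </li></ul><ul><li>Insert 70, try cells 0, 1, 4, 9, 6 (16), 5 (25), 6 (36), 9 (49), 4 (64), 1 (81), 0 (100), 1 (121), 4 (144), 9 (169), 6 (196), . . . The probe sequence only ever visits cells 0, 1, 4, 5, 6 and 9, which are all occupied, so 70 cannot be inserted even though four cells are empty. </li></ul><ul><li>Resulting table (cell: key): 0: 10, 1: 20, 4: 30, 5: 60, 6: 50, 9: 40; cells 2, 3, 7 and 8 are empty </li></ul>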
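
Both quadratic probing examples, including the failed insertion of 70, can be reproduced with the sketch below. It assumes non-negative integer keys and hash(x) = x mod TableSize. `QuadraticProbingTable` is a hypothetical name, and duplicate keys are not checked for.

```csharp
public class QuadraticProbingTable
{
    // A null entry marks an empty cell.
    private readonly int?[] cells;

    public QuadraticProbingTable(int tableSize)
    {
        cells = new int?[tableSize];
    }

    // Probes (hash(x) + i^2) mod TableSize for i = 0, 1, 2, ...
    // The sequence repeats after TableSize probes, because
    // (i + TableSize)^2 and i^2 leave the same remainder, so the loop
    // stops there and returns -1 if no empty cell was reached.
    public int Insert(int key)
    {
        int size = cells.Length;
        int home = key % size;
        for (int i = 0; i < size; i++)
        {
            int pos = (int)((home + (long)i * i) % size);
            if (cells[pos] == null)
            {
                cells[pos] = key;
                return pos;
            }
        }
        return -1;
    }
}
```

With a table of size 10, inserting 89, 18, 49, 58, 69 returns cells 9, 8, 0, 2, 3. In a fresh table of size 10, inserting 10, 20, 30, 40, 50, 60 returns cells 0, 1, 4, 9, 6, 5, and inserting 70 then returns -1.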
  22. 22. Quadratic Probing <ul><li>Secondary clustering </li></ul><ul><ul><li>Elements that hash to the same position probe the same alternative cells and are put into the next available one, forming a cluster. </li></ul></ul><ul><ul><li>In the first example, inserting 89, 49, 69 forms a secondary cluster. Inserting 18, 58 forms another secondary cluster. </li></ul></ul>
  23. 23. Double Hashing <ul><li>f(i) = i * hash<sub>2</sub>(x) </li></ul><ul><li>h<sub>i</sub>(x) = (hash(x) + i * hash<sub>2</sub>(x)) mod TableSize </li></ul><ul><li>Try hash(x) mod TableSize, (hash(x) + hash<sub>2</sub>(x)) mod TableSize, (hash(x) + 2*hash<sub>2</sub>(x)) mod TableSize, . . . </li></ul><ul><li>Example: hash<sub>2</sub>(x) = R - (x mod R) </li></ul><ul><ul><li>R is a prime number smaller than TableSize </li></ul></ul>
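  24. 24. Double Hashing Insert: 89, 18, 49, 58, 69, 23 (R = 7) <ul><li>hash<sub>2</sub>(49) = 7 - (49 mod 7) = 7 </li></ul><ul><li>hash<sub>2</sub>(58) = 7 - (58 mod 7) = 5 </li></ul><ul><li>hash<sub>2</sub>(69) = 7 - (69 mod 7) = 1 </li></ul><ul><li>hash<sub>2</sub>(23) = 7 - (23 mod 7) = 5 </li></ul><ul><li>Insert 49, try 9, (9 + 7) mod 10 = 6 </li></ul><ul><li>Insert 58, try 8, (8 + 5) mod 10 = 3 </li></ul><ul><li>Insert 69, try 9, (9 + 1) mod 10 = 0 </li></ul><ul><li>Insert 23, try 3, (3 + 5) mod 10 = 8, (3 + 10) mod 10 = 3, (3 + 15) mod 10 = 8, . . . The probes alternate between the occupied cells 3 and 8, so 23 cannot be inserted. </li></ul><ul><li>Resulting table (cell: key): 0: 69, 3: 58, 6: 49, 8: 18, 9: 89; cells 1, 2, 4, 5 and 7 are empty </li></ul>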
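
The double hashing example can be reproduced with the sketch below, which uses hash(x) = x mod TableSize and hash<sub>2</sub>(x) = R - (x mod R) from the previous slide. `DoubleHashingTable` is a hypothetical name. Keys are assumed to be non-negative, and duplicates are not checked for.

```csharp
public class DoubleHashingTable
{
    // A null entry marks an empty cell.
    private readonly int?[] cells;
    private readonly int r; // R: a prime smaller than the table size

    public DoubleHashingTable(int tableSize, int r)
    {
        cells = new int?[tableSize];
        this.r = r;
    }

    // Second hash function; its result is always in the range 1..R.
    private int Hash2(int key) => r - (key % r);

    // Probes (hash(x) + i * hash2(x)) mod TableSize for i = 0, 1, 2, ...
    // The sequence repeats after TableSize probes, so the loop stops
    // there and returns -1 if no empty cell was reached.
    public int Insert(int key)
    {
        int size = cells.Length;
        int home = key % size;
        int step = Hash2(key);
        for (int i = 0; i < size; i++)
        {
            int pos = (int)((home + (long)i * step) % size);
            if (cells[pos] == null)
            {
                cells[pos] = key;
                return pos;
            }
        }
        return -1;
    }
}
```

With `new DoubleHashingTable(10, 7)`, inserting 89, 18, 49, 58, 69 returns cells 9, 8, 6, 3, 0, and inserting 23 returns -1. The failure arises because the step 5 shares a factor with the table size 10, which is one reason the slides recommend a prime table size.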
  25. 25. Rehashing <ul><li>When the table is too full, create a new table at least twice as big (with a prime size), compute the new hash value of each element, and insert it into the new table. </li></ul><ul><li>Rehash when the table is half full, or when an insertion fails, or when a certain load factor is reached. </li></ul><ul><li>Because of lazy deletion, deleted cells are also counted when the load factor is calculated. </li></ul><ul><li>Rehashing time is O(N), but the cost is shared by the preceding N/2 insertions, so it adds only a constant cost to each insertion. </li></ul>
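
A minimal sketch of rehashing on top of linear probing. The trigger used here (rehash when an insertion would push the load factor above 0.5) is one of the options listed above, chosen for illustration. `RehashingTable` and its helper methods are hypothetical names. Keys are assumed to be non-negative, and deletion is left out.

```csharp
using System;

public class RehashingTable
{
    private int?[] cells; // a null entry marks an empty cell
    private int count;

    public RehashingTable(int initialSize)
    {
        cells = new int?[initialSize];
    }

    public int TableSize => cells.Length;
    public int Count => count;

    public void Insert(int key)
    {
        if (Contains(key)) return;
        // Grow first if this insertion would push the load factor above 0.5.
        if (2 * (count + 1) > cells.Length) Rehash();
        Place(cells, key);
        count++;
    }

    public bool Contains(int key)
    {
        int size = cells.Length;
        for (int i = 0; i < size; i++)
        {
            int pos = (key % size + i) % size;
            if (cells[pos] == null) return false;
            if (cells[pos] == key) return true;
        }
        return false;
    }

    // Linear probing; the table is never full here, so the loop terminates.
    private static void Place(int?[] table, int key)
    {
        int pos = key % table.Length;
        while (table[pos] != null) pos = (pos + 1) % table.Length;
        table[pos] = key;
    }

    // Allocates a table at least twice as big, with a prime size,
    // and re-inserts every element using its new hash value.
    private void Rehash()
    {
        int?[] old = cells;
        cells = new int?[NextPrime(2 * old.Length)];
        foreach (int? item in old)
            if (item != null) Place(cells, item.Value);
    }

    private static int NextPrime(int n)
    {
        if (n < 2) return 2;
        while (!IsPrime(n)) n++;
        return n;
    }

    private static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int d = 2; (long)d * d <= n; d++)
            if (n % d == 0) return false;
        return true;
    }
}
```

Starting from a table of size 11, the keys 89, 18, 49, 58, 69 fit without rehashing. Inserting 23 as a sixth key triggers a rehash into a table of size 23, the first prime at or above 22.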
  26. 26. Rehashing
