# Hashing

### Hashing

1. Hashing. Department of Computer Science, Islamia College University Peshawar. Fall 2012 Semester, BCS course: CS 00 Analysis of Algorithms. Course Instructor: Mr. Zahid. 12/30/13, Lecture #9. Adapted from slides by Dr Onaiza Maqbol.
2. Dictionary
    - A dictionary T holds n records
    - What data structure should be used to implement T?

    Wednesday, March 18, 2009
3. Hashing
4. Direct Addressing
    - Assumptions
        - The keys are drawn from the universe U = {0, 1, …, u-1}
        - Keys are distinct
    - Create a table T[0..u-1]
    - Benefit
        - Each operation takes constant time
    - Drawbacks
        - The range of keys can be large, so T may need far more slots than there are keys actually stored
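
The direct-address table described above can be sketched in a few lines. This is a minimal illustration, assuming integer keys in the range 0 to u-1; the class name `DirectAddressTable` and the sample record are hypothetical and not part of the slides.

```python
class DirectAddressTable:
    """Direct-address table: the record with key k lives in slot k."""

    def __init__(self, u):
        # One slot per possible key in the universe {0, 1, ..., u-1}.
        self.slots = [None] * u

    def insert(self, key, record):
        # Constant time: the key itself is the index.
        self.slots[key] = record

    def search(self, key):
        # Returns None when no record with this key is stored.
        return self.slots[key]

    def delete(self, key):
        self.slots[key] = None


table = DirectAddressTable(u=10)          # hypothetical universe size
table.insert(3, "record for key 3")
```

Every operation is a single array access, which is the constant-time benefit; the drawback is that the list holds u slots even when only one key is stored.
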
5. Hashing
    - Solution
        - Use a hash function h to map the universe U of all keys into {0, 1, …, m-1}
6. Hash Table
    - The mapped keys are stored in a table called a hash table
    - The table consists of m cells
    - A hash table typically requires much less storage than a direct-address table, because m can be much smaller than |U|
    - With direct addressing, an element with key k is stored in slot k; with hashing, this element is stored in slot h(k)
    - So the hash function is h : U → {0, 1, …, m-1}
    - h(k) is also called the hash value of key k
7. Hashing Functions - Modulo Function
    - Several functions can be used to map keys into a set of integers. The choice is based on the computation time required and the simplicity of the computational steps. A common choice is the modulo function h(k) = k mod m, where k is the key, m is some positive integer, and mod denotes the modulus operator, which computes the remainder of key k divided by m.
    - It follows that the hash function h maps the set of keys {k1, k2, k3, …, kn} into the set of integers {0, 1, 2, …, m-1}
    - In essence, the modulo function is used to create a hash table of size m
8. Modulo Function (contd…)
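
As a small illustration of the modulo function h(k) = k mod m, the sketch below hashes a handful of keys into a table of size m. The table size 7 and the sample keys are assumptions chosen for the example, not values from the slides.

```python
def h(k, m):
    """Modulo hash: the remainder of key k divided by table size m."""
    return k % m


m = 7                                    # hypothetical table size
keys = [10, 22, 37, 40, 60, 70, 75]      # hypothetical keys
slots = {k: h(k, m) for k in keys}
# 40 and 75 both leave remainder 5, so they land in the same slot.
```

Every hash value falls in {0, 1, …, m-1}, and two distinct keys can share a slot, which is the collision problem taken up later in the lecture.
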
9. Hashing Functions - Multiplication Method
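
A minimal sketch, assuming the multiplication method here is the usual textbook form h(k) = ⌊m · (kA mod 1)⌋ with a constant 0 < A < 1. The constant A = (√5 - 1)/2 and the sample key and table size are illustrative choices, not taken from the slides.

```python
import math

# Assumed constant: the commonly suggested value (sqrt(5) - 1) / 2.
A = (math.sqrt(5) - 1) / 2


def h_mul(k, m, a=A):
    """Multiplication-method hash: floor(m * fractional part of k * a)."""
    fractional = (k * a) % 1.0
    return math.floor(m * fractional)


slot = h_mul(123456, 16384)   # 67 for these illustrative values
```

Unlike the modulo function, the result depends on the fractional part of k·A, so the choice of m is less critical; a power of two is a common choice.
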
10. Hashing of Strings
11. ASCII Sum Method
12. Radix Method
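
The two string-hashing methods named above can be sketched as follows. This assumes the common definitions: the ASCII sum method adds the character codes and reduces modulo m, and the radix method treats the string as a number in some base (128 here) and reduces that modulo m. The function names, the base, and the table size are hypothetical.

```python
def ascii_sum_hash(s, m):
    """Add up the character codes, then reduce modulo the table size."""
    return sum(ord(c) for c in s) % m


def radix_hash(s, m, radix=128):
    """Treat s as a base-`radix` number, reduced modulo m (Horner's rule)."""
    value = 0
    for c in s:
        # Reducing at every step keeps the intermediate value small.
        value = (value * radix + ord(c)) % m
    return value


m = 101                                  # hypothetical table size
same = ascii_sum_hash("ab", m) == ascii_sum_hash("ba", m)   # anagrams collide
differ = radix_hash("ab", m) != radix_hash("ba", m)         # position matters
```

The ASCII sum ignores character order, so anagrams always collide; the radix form weights each position differently and separates them in this example.
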
13. Universal Hashing
14. Universal Hashing (contd…)
    - The function h_{a,b}(k) = ((ak + b) mod p) mod m, where p is a prime large enough that every possible key k lies in the range 0 to p-1 inclusive, 0 < a < p, and 0 ≤ b < p, belongs to a universal family of hash functions
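
The family h_{a,b}(k) = ((ak + b) mod p) mod m can be sketched as below. The helper names `h_ab` and `make_universal_hash` and the sample values p = 17, m = 6, a = 3, b = 4 are assumptions made for illustration.

```python
import random


def h_ab(k, a, b, p, m):
    """One member of the family: ((a*k + b) mod p) mod m."""
    return ((a * k + b) % p) % m


def make_universal_hash(p, m, rng=random):
    """Pick a and b at random, giving one hash function from the family."""
    a = rng.randrange(1, p)    # 0 < a < p
    b = rng.randrange(0, p)    # 0 <= b < p
    return lambda k: h_ab(k, a, b, p, m)


example = h_ab(8, a=3, b=4, p=17, m=6)             # (28 mod 17) mod 6 = 5
h = make_universal_hash(p=101, m=9, rng=random.Random(0))
```

Choosing a and b at random when the table is created means no fixed set of keys is bad for every function in the family.
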
15. Perfect Hashing
16. Perfect Hashing
    - Using perfect hashing to store {10, 22, 37, 40, 60, 70, 75}; the outer hash function is h_{a,b}(k) = ((ak + b) mod p) mod m, where a = 3, b = 42, p = 101, and m = 9
    - For example, h(75) = 2, so 75 belongs to the secondary hash table S2, which has parameters m2 = 4, a2 = 10, b2 = 18 and holds the keys 60 and 75. Since h2(75) = 1, 75 is stored in slot 1 of the secondary hash table
    - [Figure: primary table with slots 0 to 8; slot 2 points to secondary table S2]
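
The numbers in the perfect hashing example can be checked with a short script. The parameters are the ones given on the slide; the helper name `h_ab` and the `buckets` dictionary are hypothetical.

```python
def h_ab(k, a, b, p, m):
    """((a*k + b) mod p) mod m, as used for both hashing levels."""
    return ((a * k + b) % p) % m


keys = [10, 22, 37, 40, 60, 70, 75]

# First level: a = 3, b = 42, p = 101, m = 9.
buckets = {}
for k in keys:
    buckets.setdefault(h_ab(k, 3, 42, 101, 9), []).append(k)

# Second level for slot 2: m2 = 4, a2 = 10, b2 = 18 (same p = 101).
secondary_slots = {k: h_ab(k, 10, 18, 101, 4) for k in buckets[2]}
```

Slot 2 of the outer table receives 60 and 75, and the secondary function sends them to different slots, so a lookup never has to walk a chain.
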
17. Collisions
    - Two or more keys may hash to the same slot
    - When a record to be inserted maps to an already occupied slot in T, a collision occurs
    - Can we avoid collisions altogether?
        - Not if |U| > m: by the pigeonhole principle, at least two keys in U must then share a slot
    - We need a method to resolve the collisions that occur
18. Collisions
19. Collision Resolution
    - The two basic approaches to collision resolution are chained hashing and open address hashing
    - Chained Hashing: the elements of the hash table are stored in a set of linked lists
        - All elements that hash to the same slot are kept in one linked list
        - The list head pointers are usually stored in an array
        - Chained hashing is also known as open hashing
    - Open Address Hashing: the hashed keys are stored in the hash table itself
        - The colliding keys are allocated distinct cells in the table
        - Open address hashing is also referred to as closed hashing
20. Collision Resolution by Chaining
    - Records in the same slot are linked into a list
21. Collision Resolution by Chaining (contd…)
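
Chaining can be sketched as an array of per-slot lists. This is a minimal illustration, assuming integer keys and the modulo hash function; Python lists stand in for the linked lists, and the class name `ChainedHashTable` is hypothetical.

```python
class ChainedHashTable:
    """Hash table with chaining: each slot holds a list of (key, value) pairs."""

    def __init__(self, m):
        self.m = m
        self.slots = [[] for _ in range(m)]   # one chain per slot

    def _h(self, key):
        return key % self.m                   # assumed hash function

    def insert(self, key, value):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:                      # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.insert(0, (key, value))         # new keys go to the front

    def search(self, key):
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None

    def load_factor(self):
        return sum(len(chain) for chain in self.slots) / self.m


t = ChainedHashTable(5)
t.insert(1, "a")
t.insert(6, "b")     # 6 mod 5 == 1, so it collides with key 1
```

Both colliding keys stay retrievable because they share the chain in slot 1, with the most recently inserted key at the front.
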
22. Analysis of Hashing with Chaining
    - How long does it take to search for an element with a given key?
    - Let n be the number of keys in the table, and let m be the number of slots
    - Define the load factor of T to be α = n/m, the average number of keys per slot
    - The analysis is in terms of α, which can be less than, equal to, or greater than 1
23. Worst Hashing - Searching
    - All keys hash to a single list
    - This situation may be referred to as the worst distribution of hash keys
    - In practice this extreme situation is unlikely to arise, but the possibility does exist
    - The worst-case time for searching is thus Θ(n), plus the time to compute the hash function
    - The best search time is Θ(1), when the key is found in the front node
    - On average, half the list is examined, so the average search time is also Θ(n)
24. Worst Hashing - Insertion
    - The worst-case running time for the insertion step itself is Θ(1), assuming that the key is not already present in the table
    - To check presence, a search for the key is required; as just mentioned, the worst-case time of searching is Θ(n)
    - Thus the worst-case running time of insertion, including the check, is Θ(n)
    - The average running time of insertion is also Θ(n)
25. Simple Uniform Hashing - Searching
    - The keys are uniformly distributed among all the linked lists, i.e. any given element is assumed equally likely to hash into any of the m slots
    - Denote the length of the list T[j], for j = 0, 1, …, m-1, by n_j, so that n = n_0 + n_1 + … + n_{m-1} and the average value of n_j is E[n_j] = α = n/m
    - We assume that the hash value h(k) can be computed in O(1) time
    - So the time required to search for an element with key k depends linearly on the length n_{h(k)} of the list T[h(k)]
26. Simple Uniform Hashing - Searching
    - Two cases
        - Unsuccessful search
        - Successful search
    - Unsuccessful search
        - The expected time to search unsuccessfully for a key k is the expected time to search to the end of list T[h(k)], which has expected length E[n_{h(k)}] = α
        - Thus the total time required, including the time to compute h(k), is Θ(1 + α)
27. Simple Uniform Hashing - Insertion
    - To find the average time for inserting a key, consider the moment when the kth key is inserted. At that stage the table already holds k-1 keys distributed uniformly over the m linked lists. Thus, prior to insertion of the kth key, the average length of each list is (k-1)/m, as shown in the diagram
28. Simple Uniform Hashing - Insertion
    - Inserting a new key requires probing (k-1)/m keys on average, plus the cost of adding the new key
    - Thus the overall cost of inserting the kth key is 1 + (k-1)/m, assuming that each operation takes unit time
    - The expected cost of inserting a key is obtained by averaging over all values of k from 1 to n. Thus the expected cost I is given by $I = \frac{1}{n}\sum_{k=1}^{n}\left(1 + \frac{k-1}{m}\right) = 1 + \frac{n-1}{2m}$
    - The average cost of inserting a key is 1 + α/2 - 1/(2m) = Θ(1 + α)
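
The closed form for the average insertion cost can be checked exactly by averaging the per-key cost 1 + (k-1)/m over k = 1, …, n. The sketch below uses exact rational arithmetic; the function names and the sample values of n and m are hypothetical.

```python
from fractions import Fraction


def average_insert_cost(n, m):
    """Average of 1 + (k-1)/m over k = 1..n, computed exactly."""
    total = sum(1 + Fraction(k - 1, m) for k in range(1, n + 1))
    return total / n


def closed_form(n, m):
    """1 + alpha/2 - 1/(2m), with alpha = n/m."""
    alpha = Fraction(n, m)
    return 1 + alpha / 2 - Fraction(1, 2 * m)


cost = average_insert_cost(10, 4)   # Fraction(17, 8)
```

The two functions agree for every n and m tried, which is consistent with the Θ(1 + α) bound on the slide.
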
29. Simple Uniform Hashing - Searching
    - Successful search
        - We assume that the element x being searched for is equally likely to be any of the n elements stored in the table
        - The number of elements examined is one more than the number of elements that appear before x in x's list
        - Elements before x in the list were all inserted after x was inserted, because new elements are placed at the front of the list
        - The total time required for a successful search is 1 + α/2 - α/(2n) = Θ(1 + α)
    - If n = O(m), then α = n/m = O(m)/m = O(1)
    - Thus searching takes constant time on average
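
The successful-search bound can be derived in a few steps. This is a sketch of the standard argument, assuming simple uniform hashing and insertion at the front of each list; the index i, which numbers the elements in order of insertion, is notation introduced here. The i-th inserted element has, in expectation, 1/m of each later element ahead of it in its list, so averaging over all n elements gives:

```latex
\frac{1}{n}\sum_{i=1}^{n}\left(1+ \sum_{j=i+1}^{n}\frac{1}{m}\right)
  = 1 + \frac{1}{nm}\sum_{i=1}^{n}(n - i)
  = 1 + \frac{n-1}{2m}
  = 1 + \frac{\alpha}{2} - \frac{\alpha}{2n}
```

The last step uses α = n/m, so that n/(2m) = α/2 and 1/(2m) = α/(2n).
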
30. Open Addressing
    - All elements are stored in the hash table itself
    - The hash table can fill up, so that no further insertions can be made
    - The load factor α can never exceed 1
    - The advantage is that open addressing avoids pointers altogether
    - The memory freed by not storing pointers gives the hash table a larger number of slots for the same amount of memory
31. Insertion
    - We successively examine, or probe, the hash table until we find an empty slot in which to put the key
    - The sequence of positions probed depends on the key being inserted
    - To determine which slots to probe, we extend the hash function to include the probe number as a second input. Thus the hash function becomes h : U × {0, 1, …, m-1} → {0, 1, …, m-1}
32. Pseudo code

        HASH-INSERT(T, k)
          i ← 0
          repeat
              j ← h(k, i)
              if T[j] = NIL
                  then T[j] ← k
                       return j
                  else i ← i + 1
          until i = m
          error "Table full"
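
HASH-INSERT translates almost line for line into Python. This is a sketch, assuming the table is a list with `None` playing the role of NIL and the probe function is passed in as `h(k, i)`; the probe function and the keys in the demo are hypothetical.

```python
def hash_insert(table, k, h):
    """Probe slots h(k, 0), h(k, 1), ... and store k in the first empty one."""
    m = len(table)
    for i in range(m):
        j = h(k, i)
        if table[j] is None:       # empty slot found
            table[j] = k
            return j
    raise OverflowError("Table full")


m = 5
probe = lambda k, i: (k % m + i) % m     # hypothetical probe function
table = [None] * m
positions = [hash_insert(table, k, probe) for k in (7, 12, 3)]
# 7 -> slot 2; 12 also hashes to 2, so it moves on to 3; 3 is pushed to 4
```

The `for` loop over the m probe numbers replaces the repeat/until pair, and the error case is reached only after all m probes have failed.
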
33. Linear Probing
    - In linear probing, the slot index computed from the key is incremented by an integer on each successive probe. In general the hash function is defined as h(k, i) = (h'(k) + i) mod m, where h'(k) is an auxiliary hash function, i = 0, 1, …, m-1 is the probe number, and m is the table size.
34. Linear Probing (contd…)
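
The probe sequence produced by h(k, i) = (h'(k) + i) mod m can be listed directly. In this sketch the auxiliary function h'(k) = k is an assumption for illustration, as are the key and the table size.

```python
def linear_probe(k, i, m, h_aux=lambda k: k):
    """h(k, i) = (h'(k) + i) mod m, with h'(k) = k assumed by default."""
    return (h_aux(k) + i) % m


def probe_sequence(k, m):
    """All m slots in the order linear probing would examine them."""
    return [linear_probe(k, i, m) for i in range(m)]


sequence = probe_sequence(12, 5)   # [2, 3, 4, 0, 1]
```

The sequence starts at h'(k) mod m, walks forward one slot at a time, and wraps around, so it visits every slot exactly once.
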
35. Searching

        HASH-SEARCH(T, k)
          i ← 0
          repeat
              j ← h(k, i)
              if T[j] = k
                  then return j
              i ← i + 1
          until T[j] = NIL or i = m
          return NIL
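
HASH-SEARCH follows the same probe sequence as insertion and stops at the first empty slot. This is a sketch, assuming `None` stands for NIL; the probe function and the pre-filled table are hypothetical.

```python
def hash_search(table, k, h):
    """Return the slot holding k, or None if k is not in the table."""
    m = len(table)
    for i in range(m):
        j = h(k, i)
        if table[j] == k:
            return j
        if table[j] is None:   # an empty slot means k was never inserted
            break
    return None


m = 5
probe = lambda k, i: (k % m + i) % m     # hypothetical probe function
table = [None, None, 7, 12, 3]           # hypothetical table contents
found = hash_search(table, 3, probe)     # probes slot 3, then finds 3 in slot 4
missing = hash_search(table, 17, probe)  # probes 2, 3, 4, then hits empty slot 0
```

Stopping at an empty slot is valid only if keys are never removed by simply clearing their slot, since a cleared slot would cut the probe sequence short.
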
36. Quadratic Probing
37. Quadratic Probing
38. Quadratic Probing
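
A minimal sketch of quadratic probing, assuming the usual textbook form h(k, i) = (h'(k) + c1·i + c2·i²) mod m. The choices c1 = c2 = 1/2, h'(k) = k, and a power-of-two table size are assumptions made for this example and are not taken from the slides.

```python
def quadratic_probe(k, i, m):
    """h(k, i) = (h'(k) + i/2 + i*i/2) mod m, with h'(k) = k assumed."""
    # (i + i*i) is always even, so the integer division is exact.
    return (k + (i + i * i) // 2) % m


def probe_sequence(k, m):
    """Slots examined for key k, in probe order."""
    return [quadratic_probe(k, i, m) for i in range(m)]


sequence = probe_sequence(3, 8)   # [3, 4, 6, 1, 5, 2, 0, 7]
```

With these constants and m a power of two, the offsets are the triangular numbers 0, 1, 3, 6, … and the sequence reaches every slot; other constants may cover only part of the table.
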