Computer notes - Binary Search


    1. Class No.32 Data Structures http://ecomputernotes.com
    2. Tables and Dictionaries
    3. Tables: rows & columns of information
       - A table has several fields (types of information)
         - A telephone book may have fields name, address, phone number
         - A user account table may have fields user id, password, home folder
       | Name          | Address                               | Phone    |
       |---------------|---------------------------------------|----------|
       | Sohail Aslam  | 50 Zahoor Elahi Rd, Gulberg-4, Lahore | 576-3205 |
       | Imran Ahmad   | 30-T Phase-IV, LCCHS, Lahore          | 572-4409 |
       | Salman Akhtar | 131-D Model Town, Lahore              | 784-3753 |
    4. Tables: rows & columns of information
       - To find an entry in the table, you only need to know the contents of one of the fields (not all of them).
       - This field is the key
         - In a telephone book, the key is usually “name”
         - In a user account table, the key is usually “user id”
    5. Tables: rows & columns of information
       - Ideally, a key uniquely identifies an entry
         - If the key is “name” and no two entries in the telephone book have the same name, the key uniquely identifies the entries
       | Name          | Address                               | Phone    |
       |---------------|---------------------------------------|----------|
       | Sohail Aslam  | 50 Zahoor Elahi Rd, Gulberg-4, Lahore | 576-3205 |
       | Imran Ahmad   | 30-T Phase-IV, LCCHS, Lahore          | 572-4409 |
       | Salman Akhtar | 131-D Model Town, Lahore              | 784-3753 |
    6. The Table ADT: operations
       - insert: given a key and an entry, inserts the entry into the table
       - find: given a key, finds the entry associated with the key
       - remove: given a key, finds the entry associated with the key, and removes it
    7. How should we implement a table?
       Our choice of representation for the Table ADT depends on the answers to the following questions:
       - How often are entries inserted and removed?
       - How many of the possible key values are likely to be used?
       - What is the likely pattern of searching for keys? For example, will most of the accesses be to just one or two key values?
       - Is the table small enough to fit into memory?
       - How long will the table exist?
    8. TableNode: a key and its entry
       - For searching purposes, it is best to store the key and the entry separately (even though the key’s value may be inside the entry)
       [Figure: two TableNodes, each pairing a key with its entry]
       | key      | entry                                           |
       |----------|-------------------------------------------------|
       | “Saleem” | “Saleem”, “124 Hawkers Lane”, “9675846”         |
       | “Yunus”  | “Yunus”, “1 Apple Crescent”, “0044 1970 622455” |
    9. Implementation 1: unsorted sequential array
       - An array in which TableNodes are stored consecutively in any order
       - insert: add to back of array; O(1)
       - find: search through the keys one at a time, potentially all of the keys; O(n)
       - remove: find + replace removed node with last node; O(n)
       [Figure: array cells 0, 1, 2, 3, and so on, each holding a (key, entry) TableNode]
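The three operations of Implementation 1 can be sketched in C++ roughly as follows. This is a minimal sketch, not code from the notes: it assumes string keys and entries and C++17 (for `std::optional`), and the names `TableNode`, `UnsortedTable`, `insert`, `find`, `remove` and `size` are illustrative. As on the slide, `remove` overwrites the removed node with the last node.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical TableNode: the key is stored separately from its entry.
struct TableNode {
    std::string key;
    std::string entry;
};

// Implementation 1 (illustrative): TableNodes stored consecutively, in any order.
class UnsortedTable {
public:
    // O(1) amortized: add to the back of the array.
    void insert(const std::string& key, const std::string& entry) {
        nodes_.push_back(TableNode{key, entry});
    }

    // O(n): examine the keys one at a time, potentially all of them.
    std::optional<std::string> find(const std::string& key) const {
        for (const TableNode& node : nodes_)
            if (node.key == key) return node.entry;
        return std::nullopt;
    }

    // O(n): find the node, overwrite it with the last node, then shrink by one.
    bool remove(const std::string& key) {
        for (std::size_t i = 0; i < nodes_.size(); ++i) {
            if (nodes_[i].key == key) {
                nodes_[i] = nodes_.back();
                nodes_.pop_back();
                return true;
            }
        }
        return false;
    }

    std::size_t size() const { return nodes_.size(); }

private:
    std::vector<TableNode> nodes_;
};
```

For example, after inserting "Saleem" and then "Yunus", removing "Saleem" moves the "Yunus" node into slot 0, which is why removal needs no shuffling.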
    10. Implementation 2: sorted sequential array
        - An array in which TableNodes are stored consecutively, sorted by key
        - insert: add in sorted order; O(n)
        - find: binary search; O(log n)
        - remove: find, remove node and shuffle down; O(n)
        We can use binary search because the array elements are sorted.
        [Figure: array cells 0, 1, 2, 3, and so on, each holding a (key, entry) TableNode]
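A comparable sketch of the sorted array keeps the nodes ordered by key and uses binary search to locate a key's slot. The names `TableNode`, `SortedTable`, `lowerBound` and `keyAt` are hypothetical, C++17 is assumed, and replacing the entry when a key is inserted twice is an assumption the slides do not address. The O(n) cost of insert and remove comes from `std::vector::insert` and `std::vector::erase` shifting the later nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical TableNode: a key and its entry.
struct TableNode {
    std::string key;
    std::string entry;
};

// Implementation 2 (illustrative): TableNodes stored consecutively, sorted by key.
class SortedTable {
public:
    // O(n): binary-search for the slot, then shuffle later nodes up by one.
    void insert(const std::string& key, const std::string& entry) {
        std::size_t i = lowerBound(key);
        if (i < nodes_.size() && nodes_[i].key == key) {
            nodes_[i].entry = entry;   // assumption: keys are unique, so replace
            return;
        }
        nodes_.insert(nodes_.begin() + static_cast<std::ptrdiff_t>(i), TableNode{key, entry});
    }

    // O(log n): binary search.
    std::optional<std::string> find(const std::string& key) const {
        std::size_t i = lowerBound(key);
        if (i < nodes_.size() && nodes_[i].key == key) return nodes_[i].entry;
        return std::nullopt;
    }

    // O(n): find, remove the node and shuffle later nodes down.
    bool remove(const std::string& key) {
        std::size_t i = lowerBound(key);
        if (i == nodes_.size() || nodes_[i].key != key) return false;
        nodes_.erase(nodes_.begin() + static_cast<std::ptrdiff_t>(i));
        return true;
    }

    const std::string& keyAt(std::size_t i) const {
        assert(i < nodes_.size());
        return nodes_[i].key;
    }

private:
    // Index of the first node whose key is not less than `key`.
    std::size_t lowerBound(const std::string& key) const {
        std::size_t low = 0, high = nodes_.size();   // half-open range [low, high)
        while (low < high) {
            std::size_t mid = low + (high - low) / 2;
            if (nodes_[mid].key < key) low = mid + 1;
            else high = mid;
        }
        return low;
    }

    std::vector<TableNode> nodes_;
};
```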
    11. Searching an Array: Binary Search
        - Binary search is like looking up a phone number or a word in the dictionary
          - Start in the middle of the book
          - If the name you're looking for comes before the names on the page, look in the first half
          - Otherwise, look in the second half
    12. Binary Search
            if ( value == middle element )
                value is found
            else if ( value < middle element )
                search left-half of list with the same method
            else
                search right-half of list with the same method
    13. Binary Search, Case 1: val == a[mid]
        val = 10; low = 0, high = 8; mid = (0 + 8) / 2 = 4
        index:  0  1  2  3  4  5  6  7  8
        a:      1  5  7  9 10 13 17 19 27
        a[4] = 10 == val, so the value is found.
    14. Binary Search, Example 2, Case 2: val > a[mid]
        val = 19; low = 0, high = 8; mid = (0 + 8) / 2 = 4
        index:  0  1  2  3  4  5  6  7  8
        a:      1  5  7  9 10 13 17 19 27
        a[4] = 10 < 19, so new low = mid + 1 = 5, and the search continues in 13 17 19 27 (indices 5 to 8).
    15. Binary Search, Example 3, Case 3: val < a[mid]
        val = 7; low = 0, high = 8; mid = (0 + 8) / 2 = 4
        index:  0  1  2  3  4  5  6  7  8
        a:      1  5  7  9 10 13 17 19 27
        a[4] = 10 > 7, so new high = mid - 1 = 3, and the search continues in 1 5 7 9 (indices 0 to 3).
    16. Binary Search, Example 3 (cont.): val = 7
        [Figure: three further snapshots of array a (1 5 7 9 10 13 17 19 27 at indices 0 to 8) as the search for 7 narrows]
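The slides show only the first step of each case. The sketch below is a variant of the isPresent routine (the names `Probe` and `traceSearch` are hypothetical) that records every (low, mid, high) triple it examines, so the rest of Example 3 can be followed. For val = 7 on the array from the slides it examines mid = 4, then mid = 1 (a[1] = 5 < 7, so low = 2), then mid = 2, where a[2] = 7 is found.

```cpp
#include <cassert>
#include <vector>

// One step of the search: the current bounds and the middle index examined.
struct Probe {
    int low, mid, high;
};

// Binary search that appends every (low, mid, high) it examines to `trace`.
// Returns the index of val, or -1 if val is not in the sorted array a.
int traceSearch(const std::vector<int>& a, int val, std::vector<Probe>& trace) {
    int low = 0;
    int high = static_cast<int>(a.size()) - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;
        trace.push_back(Probe{low, mid, high});
        if (a[mid] == val) return mid;     // Case 1: found
        if (a[mid] < val) low = mid + 1;   // Case 2: continue in the right half
        else high = mid - 1;               // Case 3: continue in the left half
    }
    return -1;
}
```

The same routine shows Example 2 finishing at index 7 after three probes (mid = 4, 6, 7), and an absent value such as 8 ending with low > high after four probes.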
    17. Binary Search: C++ Code
            int isPresent(const int *arr, int val, int N)
            {
                int low = 0;
                int high = N - 1;
                int mid;
                while ( low <= high ) {
                    mid = low + ( high - low ) / 2;   // same midpoint as (low + high) / 2, but cannot overflow
                    if ( arr[mid] == val )
                        return 1;                      // found!
                    else if ( arr[mid] < val )
                        low = mid + 1;                 // val can only be in the right half
                    else
                        high = mid - 1;                // val can only be in the left half
                }
                return 0;                              // not found
            }
    18. Binary Search: binary tree
        - The search repeatedly divides a list into two smaller sub-lists until a sub-list can no longer be divided.
        [Figure: an entire sorted list split into a first half and a second half, each of which is split again into its own first and second halves]
    19. Binary Search Efficiency
        - After 1 bisection: $N/2$ items
        - After 2 bisections: $N/4 = N/2^2$ items
        - . . .
        - After $i$ bisections: $N/2^i = 1$ item
        - Hence $i = \log_2 N$
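To put numbers on $i = \log_2 N$, here is a small sketch (the function name `bisections` is hypothetical) that halves N, rounding up so an odd-sized list keeps its larger half, until one item is left. Under that worst-case rounding it returns $\lceil \log_2 N \rceil$.

```cpp
#include <cassert>

// Number of bisections needed to shrink n items to a single item,
// following the N/2, N/4, ..., N/2^i model and keeping the larger half each time.
int bisections(long long n) {
    assert(n >= 1);
    int steps = 0;
    while (n > 1) {
        n = (n + 1) / 2;   // ceil(n / 2): the larger of the two halves
        ++steps;
    }
    return steps;
}
```

For example, N = 1024 takes 10 bisections and N = 1,000,000 takes 20, so a sorted table of a million keys needs at most about 20 halvings.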
    20. Implementation 3: linked list
        - TableNodes are again stored in sequence (unsorted or sorted)
        - insert: add to front; O(1), or O(n) for a sorted list
        - find: search through potentially all the keys, one at a time; O(n) for an unsorted or a sorted list
        - remove: find, then remove using pointer alterations; O(n)
        [Figure: a linked list of (key, entry) nodes, and so on]
    21. Implementation 4: Skip List
        - Overcome basic limitations of previous lists
          - Search and update require linear time
        - Fast searching of a sorted chain
        - Provide an alternative to BSTs (binary search trees) and related tree structures, where balancing can be expensive.
        - Relatively recent data structure: Bill Pugh proposed it in 1990.
    22. Skip List Representation
        - Can do better than n comparisons to find an element in a chain of length n
        [Figure: sorted chain head → 20 → 30 → 40 → 50 → 60 → tail]
    23. Skip List Representation
        - Example: n/2 + 1 comparisons if we keep a pointer to the middle element
        [Figure: the chain head → 20 → 30 → 40 → 50 → 60 → tail, with an extra pointer to the middle element]
    24. Higher Level Chains
        - For general n, the level 0 chain includes all elements
        - level 1 chain every other element, level 2 chain every fourth, etc.
        - level $i$ chain, every $2^i$-th element
        [Figure: head, 20, 26, 30, 40, 50, 57, 60, tail, with the level 1 and level 2 chains drawn above the level 0 chain]
    25. Higher Level Chains
        - A skip list contains a hierarchy of chains
        - In general, level $i$ contains a subset of the elements in level $i-1$
        [Figure: head, 20, 26, 30, 40, 50, 57, 60, tail, with the level 1 and level 2 chains drawn above the level 0 chain]
    26. Skip List: formally
        - A skip list for a set $S$ of distinct (key, element) items is a series of lists $S_0, S_1, \ldots, S_h$ such that
          - Each list $S_i$ contains the special keys $-\infty$ and $+\infty$
          - List $S_0$ contains the keys of $S$ in nondecreasing order
          - Each list is a subsequence of the previous one, i.e., $S_0 \supseteq S_1 \supseteq \cdots \supseteq S_h$
          - List $S_h$ contains only the two special keys
    27. Lecture No.38 Data Structure Dr. Sohail Aslam
    28. Skip List: formally (example)
        S3: −∞                                      +∞
        S2: −∞              31                      +∞
        S1: −∞      23      31  34          64      +∞
        S0: −∞  12  23  26  31  34  44  56  64  78  +∞
    29. Skip List: Search
        - We search for a key $x$ as follows:
          - We start at the first position of the top list
          - At the current position $p$, we compare $x$ with $y \leftarrow key(after(p))$
          - $x = y$: we return $element(after(p))$
          - $x > y$: we “scan forward”
          - $x < y$: we “drop down”
          - If we try to drop down past the bottom list, we return NO_SUCH_KEY
    30. Skip List: Search
        - Example: search for 78
        S3: −∞                                      +∞
        S2: −∞              31                      +∞
        S1: −∞      23      31  34          64      +∞
        S0: −∞  12  23  26  31  34  44  56  64  78  +∞
    31. Skip List: Insertion
        - To insert an item $(x, o)$ into a skip list, we use a randomized algorithm:
          - We repeatedly toss a coin until we get tails, and we denote with $i$ the number of times the coin came up heads
          - If $i \geq h$, we add to the skip list new lists $S_{h+1}, \ldots, S_{i+1}$, each containing only the two special keys
    32. Skip List: Insertion
        - To insert an item $(x, o)$ into a skip list, we use a randomized algorithm (cont.):
          - We search for $x$ in the skip list and find the positions $p_0, p_1, \ldots, p_i$ of the items with largest key less than $x$ in each list $S_0, S_1, \ldots, S_i$
          - For $j \leftarrow 0, \ldots, i$, we insert item $(x, o)$ into list $S_j$ after position $p_j$
    33. Skip List: Insertion
        - Example: insert key 15, with $i = 2$
        Before (p0 is at 10 in S0; p1 and p2 are at −∞ in S1 and S2):
        S2: −∞              +∞
        S1: −∞      23      +∞
        S0: −∞  10  23  36  +∞
        After:
        S3: −∞                  +∞
        S2: −∞      15          +∞
        S1: −∞      15  23      +∞
        S0: −∞  10  15  23  36  +∞
    34. Randomized Algorithms
        - A randomized algorithm performs coin tosses (i.e., uses random bits) to control its execution
        - It contains statements of the type
              b ← random()
              if b <= 0.5      // head
                  do A …
              else             // tail
                  do B …
        - Its running time depends on the outcomes of the coin tosses, i.e., head or tail
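A C++ rendering of the coin toss used for skip-list insertion, offered as a sketch: `std::mt19937` stands in for `random()`, `std::bernoulli_distribution` with probability 0.5 is the fair coin, and the `maxLevel` cap (in the spirit of MAXLEVEL on the TowerNode slide) is an assumption. The function name `randomLevel` is hypothetical.

```cpp
#include <cassert>
#include <random>

// Toss a fair coin until it comes up tails and return the number of heads,
// capped at maxLevel (an assumed limit so towers cannot grow without bound).
int randomLevel(std::mt19937& gen, int maxLevel) {
    assert(maxLevel >= 0);
    std::bernoulli_distribution heads(0.5);   // true ("head") with probability 0.5
    int i = 0;
    while (i < maxLevel && heads(gen))
        ++i;
    return i;
}
```

Because each toss continues with probability 1/2, the returned level is 0 about half the time and about 1 on average (slightly less once capped), which is why each higher list holds roughly half the items of the one below it.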
    35. Skip List: Deletion
        - To remove an item with key $x$ from a skip list, we proceed as follows:
          - We search for $x$ in the skip list and find the positions $p_0, p_1, \ldots, p_i$ of the items with key $x$, where position $p_j$ is in list $S_j$
          - We remove positions $p_0, p_1, \ldots, p_i$ from the lists $S_0, S_1, \ldots, S_i$
          - We remove all but one list containing only the two special keys
    36. Skip List: Deletion
        - Example: remove key 34
        Before (p0, p1, p2 are the positions of 34 in S0, S1, S2):
        S3: −∞                  +∞
        S2: −∞          34      +∞
        S1: −∞      23  34      +∞
        S0: −∞  12  23  34  45  +∞
        After:
        S2: −∞                  +∞
        S1: −∞      23          +∞
        S0: −∞  12  23      45  +∞
    37. Skip List: Implementation
        S3: −∞                  +∞
        S2: −∞          34      +∞
        S1: −∞      23  34      +∞
        S0: −∞  12  23  34  45  +∞
    38. Implementation: TowerNode
        - A TowerNode will have an array of next pointers.
        - The actual number of next pointers will be decided by the random procedure.
        - Define MAXLEVEL as an upper limit on the number of levels in a node.
        [Figure: tower nodes for 20, 26, 30, 40, 50, 57, 60 between head and tail]
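A compact C++ sketch of the tower-node design, under stated assumptions: keys are distinct `int`s with no separate element, the head sentinel plays the role of $-\infty$ and a null pointer plays $+\infty$, the coin is `std::bernoulli_distribution` over `std::mt19937`, and MAXLEVEL is set to 16 arbitrarily. The names `SkipList`, `findPredecessors`, `randomLevel` and `level` are illustrative, not from the notes. Search, insertion and deletion follow the scan-forward / drop-down rules on the earlier slides.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

constexpr int MAXLEVEL = 16;   // assumed upper limit on the levels in a tower

// A tower node: one key plus one next pointer per list S_j it belongs to.
struct TowerNode {
    int key;
    std::vector<TowerNode*> next;   // next[j] = successor in list S_j
    TowerNode(int k, int levels) : key(k), next(static_cast<std::size_t>(levels), nullptr) {}
};

class SkipList {
public:
    explicit SkipList(unsigned seed = 1) : head_(0, MAXLEVEL + 1), gen_(seed) {}
    ~SkipList() {
        TowerNode* p = head_.next[0];
        while (p != nullptr) { TowerNode* q = p->next[0]; delete p; p = q; }
    }
    SkipList(const SkipList&) = delete;
    SkipList& operator=(const SkipList&) = delete;

    // Start at the top list; scan forward while the next key is < x, else drop down.
    bool find(int x) const {
        const TowerNode* p = &head_;
        for (int j = level_; j >= 0; --j)
            while (p->next[j] != nullptr && p->next[j]->key < x) p = p->next[j];
        p = p->next[0];
        return p != nullptr && p->key == x;
    }

    // Record p_j (last node with key < x) in each list, toss coins, then splice in.
    bool insert(int x) {
        TowerNode* update[MAXLEVEL + 1];
        TowerNode* p = findPredecessors(x, update);
        if (p->next[0] != nullptr && p->next[0]->key == x) return false;   // keys are distinct
        int i = randomLevel();
        for (int j = level_ + 1; j <= i; ++j) update[j] = &head_;           // newly used lists
        if (i > level_) level_ = i;
        TowerNode* node = new TowerNode(x, i + 1);
        for (int j = 0; j <= i; ++j) {
            node->next[j] = update[j]->next[j];
            update[j]->next[j] = node;
        }
        return true;
    }

    // Unlink x's tower from every list it appears in, then drop empty top lists.
    bool remove(int x) {
        TowerNode* update[MAXLEVEL + 1];
        TowerNode* target = findPredecessors(x, update)->next[0];
        if (target == nullptr || target->key != x) return false;
        for (std::size_t j = 0; j < target->next.size(); ++j)
            update[j]->next[j] = target->next[j];
        delete target;
        while (level_ > 0 && head_.next[static_cast<std::size_t>(level_)] == nullptr) --level_;
        return true;
    }

    int level() const { return level_; }

private:
    // Fills update[j] with the last node whose key is < x in list S_j; returns update[0].
    TowerNode* findPredecessors(int x, TowerNode* update[]) {
        TowerNode* p = &head_;
        for (int j = level_; j >= 0; --j) {
            while (p->next[j] != nullptr && p->next[j]->key < x) p = p->next[j];
            update[j] = p;
        }
        return p;
    }

    // Toss a fair coin until tails; the number of heads (capped) is the new tower's top level.
    int randomLevel() {
        std::bernoulli_distribution heads(0.5);
        int i = 0;
        while (i < MAXLEVEL && heads(gen_)) ++i;
        return i;
    }

    TowerNode head_;   // sentinel playing the role of -infinity; nullptr acts as +infinity
    int level_ = 0;    // index of the highest list currently in use
    std::mt19937 gen_;
};
```

Unlike the formal definition, this sketch does not keep an explicit empty top list $S_h$; `level_` simply tracks the highest list that still holds a key, which has the same effect as removing all but one empty list.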
    39. Implementation: QuadNode
        - A quad-node stores:
          - item
          - link to the node before
          - link to the node after
          - link to the node below
          - link to the node above
        - This will require copying the key (item) at different levels
        [Figure: a quad-node x]
    40. Skip Lists with Quad Nodes
        S3: −∞                                      +∞
        S2: −∞              31                      +∞
        S1: −∞      23      31  34          64      +∞
        S0: −∞  12  23  26  31  34  44  56  64  78  +∞
    41. Performance of Skip Lists
        - In a skip list with n items
          - The expected space used is proportional to n.
          - The expected search, insertion and deletion time is proportional to log n.
        - Skip lists are fast and simple to implement in practice
    42. Implementation 5: AVL tree
        - An AVL tree, ordered by key
        - insert: a standard insert; O(log n)
        - find: a standard find (without removing, of course); O(log n)
        - remove: a standard remove; O(log n)
        [Figure: an AVL tree whose nodes each hold a (key, entry) pair, and so on]
    43. Anything better?
        - So far we have find, remove and insert where the time varies between constant and log n.
        - It would be nice to have all three as constant time operations!
    44. Implementation 6: Hashing
        - An array in which TableNodes are not stored consecutively
        - Their place of storage is calculated using the key and a hash function
        - Keys and entries are scattered throughout the array.
        [Figure: key → hash function → array index; (key, entry) nodes stored at scattered indices such as 4, 10 and 123]
    45. Hashing
        - insert: calculate place of storage, insert TableNode; O(1)
        - find: calculate place of storage, retrieve entry; O(1)
        - remove: calculate place of storage, set it to null; O(1)
        [Figure: (key, entry) nodes at array indices 4, 10 and 123]
        All are constant time, O(1), as long as no two keys are assigned the same place of storage!
    46. Hashing
        - We use an array of some fixed size T to hold the data. T is typically prime.
        - Each key is mapped into some number in the range 0 to T-1 using a hash function, which ideally should be efficient to compute.
    47. Example: fruits
        - Suppose our hash function gave us the following values:
          - hashCode("apple") = 5
          - hashCode("watermelon") = 3
          - hashCode("grapes") = 8
          - hashCode("cantaloupe") = 7
          - hashCode("kiwi") = 0
          - hashCode("strawberry") = 9
          - hashCode("mango") = 6
          - hashCode("banana") = 2
        Table: 0 kiwi | 1 (empty) | 2 banana | 3 watermelon | 4 (empty) | 5 apple | 6 mango | 7 cantaloupe | 8 grapes | 9 strawberry
    48. Example
        - Store data in a table array:
          - table[5] = "apple"
          - table[3] = "watermelon"
          - table[8] = "grapes"
          - table[7] = "cantaloupe"
          - table[0] = "kiwi"
          - table[9] = "strawberry"
          - table[6] = "mango"
          - table[2] = "banana"
        Table: 0 kiwi | 1 (empty) | 2 banana | 3 watermelon | 4 (empty) | 5 apple | 6 mango | 7 cantaloupe | 8 grapes | 9 strawberry
    49. Example
        - Associative array:
          - table["apple"]
          - table["watermelon"]
          - table["grapes"]
          - table["cantaloupe"]
          - table["kiwi"]
          - table["strawberry"]
          - table["mango"]
          - table["banana"]
        Table: 0 kiwi | 1 (empty) | 2 banana | 3 watermelon | 4 (empty) | 5 apple | 6 mango | 7 cantaloupe | 8 grapes | 9 strawberry
    50. Example Hash Functions
        - If the keys are strings, the hash function is some function of the characters in the strings.
        - One possibility is to simply add the ASCII values of the characters:
          $h(str) = \left( \sum_{i=0}^{\text{length}-1} str[i] \right) \bmod \text{TableSize}$
          Example: $h(ABC) = (65 + 66 + 67) \bmod \text{TableSize}$
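    51. Finding the hash function
            #include <cstring>   // std::strlen

            int hashCode( const char* s )
            {
                int sum = 0;
                int len = (int) std::strlen( s );          // compute the length once, not on every iteration
                for ( int i = 0; i < len; i++ )
                    sum = sum + (unsigned char) s[i];     // ASCII value of the character
                return sum % TABLESIZE;                   // TABLESIZE (the table size T) is assumed defined elsewhere
            }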
    52. Example Hash Functions
        - Another possibility is to convert the string into some number in some arbitrary base b (b also might be a prime number):
          $h(str) = \left( \sum_{i=0}^{\text{length}-1} str[i] \cdot b^{i} \right) \bmod T$
          Example: $h(ABC) = (65 b^0 + 66 b^1 + 67 b^2) \bmod T$
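A sketch of the base-b hash, assuming character codes are read as unsigned 8-bit values and that T is below $2^{32}$ so intermediate products fit in 64 bits; the function name `baseHash` is hypothetical. Each term is reduced mod T as it is added, which gives the same result as computing the full sum first and taking it mod T once.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// h(str) = (sum over i of str[i] * b^i) mod T, with every partial result kept below T.
std::uint64_t baseHash(const std::string& str, std::uint64_t b, std::uint64_t T) {
    assert(T > 0 && T <= 0xFFFFFFFFull);   // assumption: products of two values < T fit in 64 bits
    std::uint64_t sum = 0;
    std::uint64_t power = 1 % T;            // b^0 mod T
    for (unsigned char c : str) {
        sum = (sum + c * power) % T;        // add str[i] * b^i
        power = (power * (b % T)) % T;      // advance to b^(i+1) mod T
    }
    return sum;
}
```

With b = 1 it reduces to the ASCII sum of slide 50: `baseHash("ABC", 1, 101)` is 198 % 101 = 97, while `baseHash("ABC", 31, 101)` is 66498 % 101 = 40.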
    53. Example Hash Functions
        - If the keys are integers, then key % T is generally a good hash function, unless the data has some undesirable features.
        - For example, if T = 10 and all keys end in zeros, then key % T = 0 for all keys.
        - In general, to avoid situations like this, T should be a prime number.
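The effect of T = 10 versus a prime T can be checked directly. In this sketch the helper `slotsUsed` is hypothetical and assumes non-negative keys (for negative keys, `%` in C++ can return a negative remainder).

```cpp
#include <cassert>
#include <set>
#include <vector>

// Count how many distinct slots the keys occupy under h(key) = key % T.
int slotsUsed(const std::vector<int>& keys, int T) {
    assert(T > 0);
    std::set<int> slots;
    for (int key : keys) slots.insert(key % T);
    return static_cast<int>(slots.size());
}
```

For the keys 10, 20, …, 100, all ten land in slot 0 when T = 10, but they occupy ten different slots (10, 9, 8, …, 1) when T = 11.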
    54. Collision
        - Suppose our hash function gave us the following values:
          - hash("apple") = 5
          - hash("watermelon") = 3
          - hash("grapes") = 8
          - hash("cantaloupe") = 7
          - hash("kiwi") = 0
          - hash("strawberry") = 9
          - hash("mango") = 6
          - hash("banana") = 2
        - Now what? hash("honeydew") = 6, but slot 6 already holds mango.
        Table: 0 kiwi | 1 (empty) | 2 banana | 3 watermelon | 4 (empty) | 5 apple | 6 mango | 7 cantaloupe | 8 grapes | 9 strawberry
    55. Collision
        - When two values hash to the same array location, this is called a collision
        - Collisions are normally treated as “first come, first served”: the first value that hashes to the location gets it
        - We have to find something to do with the second and subsequent values that hash to this same location.
    56. Solution for Handling collisions
        - Solution #1: Search from there for an empty location
          - Can stop searching when we find the value or an empty location.
          - The search must wrap around at the end of the array.
    57. Solution for Handling collisions
        - Solution #2: Use a second hash function
          - ...and a third, and a fourth, and a fifth, ...
    58. Solution for Handling collisions
        - Solution #3: Use the array location as the header of a linked list of values that hash to this location
    59. Solution 1: Open Addressing
        - This approach of handling collisions is called open addressing; it is also known as closed hashing.
        - More formally, cells at $h_0(x), h_1(x), h_2(x), \ldots$ are tried in succession, where $h_i(x) = (hash(x) + f(i)) \bmod TableSize$, with $f(0) = 0$.
        - The function $f$ is the collision resolution strategy.
    60. Linear Probing
        - We use $f(i) = i$, i.e., $f$ is a linear function of $i$. Thus $location(x) = (hash(x) + i) \bmod TableSize$
        - The collision resolution strategy is called linear probing because it scans the array sequentially (with wrap-around) in search of an empty cell.
    61. Linear Probing: insert
        - Suppose we want to add seagull to this hash table
        - Also suppose:
          - hashCode(“seagull”) = 143
          - table[143] is not empty
          - table[143] != seagull
          - table[144] is not empty
          - table[144] != seagull
          - table[145] is empty
        - Therefore, put seagull at location 145
        [Figure: table cells 141 to 148 holding robin, sparrow, hawk, bluejay and owl, with seagull being added]
    62. Linear Probing: insert
        - Suppose you want to add hawk to this hash table
        - Also suppose:
          - hashCode(“hawk”) = 143
          - table[143] is not empty
          - table[143] != hawk
          - table[144] is not empty
          - table[144] == hawk
        - hawk is already in the table, so do nothing.
        [Figure: table cells 141 to 148 holding robin, sparrow, hawk, seagull, bluejay and owl]
    63. Linear Probing: insert
        - Suppose:
          - You want to add cardinal to this hash table
          - hashCode(“cardinal”) = 147
          - The last location is 148
          - 147 and 148 are occupied
        - Solution:
          - Treat the table as circular; after 148 comes 0
          - Hence, cardinal goes in location 0 (or 1, or 2, or ..., whichever is the first empty cell)
        [Figure: table cells 141 to 148 holding robin, sparrow, hawk, seagull, bluejay and owl]
    64. Linear Probing: find
        - Suppose we want to find hawk in this hash table
        - We proceed as follows:
          - hashCode(“hawk”) = 143
          - table[143] is not empty
          - table[143] != hawk
          - table[144] is not empty
          - table[144] == hawk (found!)
        - We use the same procedure for looking things up in the table as we do for inserting them
        [Figure: table cells 141 to 148 holding robin, sparrow, hawk, seagull, bluejay and owl]
    65. Linear Probing and Deletion
        - Suppose an item is placed in array[hash(key)+4], and then the item just before it is deleted
        - How will a probe determine that this “hole” does not mean the item is absent from the array?
        - Have three states for each location
          - Occupied
          - Empty (never used)
          - Deleted (previously used)
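The insert, find and delete rules for linear probing can be combined into one small table. This is a sketch under stated assumptions: it stores string keys only (no separate entries), uses the ASCII-sum hash of slide 50, and the class name `LinearProbingTable` is hypothetical. Deleting marks a cell Deleted rather than Empty, so a later find keeps probing past the hole instead of stopping there.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative linear-probing table with the three cell states Occupied, Empty, Deleted.
class LinearProbingTable {
public:
    explicit LinearProbingTable(std::size_t size) : slots_(size) { assert(size > 0); }

    // Returns false if the key is already present or no free cell exists.
    bool insert(const std::string& key) {
        const std::size_t n = slots_.size(), h = hash(key);
        std::size_t target = n;                        // n means "no free cell seen yet"
        for (std::size_t i = 0; i < n; ++i) {
            std::size_t pos = (h + i) % n;             // f(i) = i, wrapping around
            const Slot& s = slots_[pos];
            if (s.state == State::Occupied) {
                if (s.key == key) return false;        // already in the table: do nothing
            } else {
                if (target == n) target = pos;         // first Empty or Deleted cell
                if (s.state == State::Empty) break;    // key cannot appear further on
            }
        }
        if (target == n) return false;
        slots_[target] = Slot{State::Occupied, key};
        return true;
    }

    bool find(const std::string& key) const { return locate(key) != slots_.size(); }

    bool remove(const std::string& key) {
        std::size_t pos = locate(key);
        if (pos == slots_.size()) return false;
        slots_[pos].state = State::Deleted;            // a marker, not Empty
        return true;
    }

private:
    enum class State { Empty, Occupied, Deleted };
    struct Slot {
        State state = State::Empty;
        std::string key;
    };

    // Assumed hash: sum of character codes, mod the table size.
    std::size_t hash(const std::string& key) const {
        std::size_t sum = 0;
        for (unsigned char c : key) sum += c;
        return sum % slots_.size();
    }

    // Position of key, or slots_.size() if absent; Deleted cells are probed past.
    std::size_t locate(const std::string& key) const {
        const std::size_t n = slots_.size(), h = hash(key);
        for (std::size_t i = 0; i < n; ++i) {
            std::size_t pos = (h + i) % n;
            if (slots_[pos].state == State::Empty) break;   // never used: key is absent
            if (slots_[pos].state == State::Occupied && slots_[pos].key == key) return pos;
        }
        return n;
    }

    std::vector<Slot> slots_;
};
```

For example, in a table of size 7 the keys "ab" and "ba" both hash to 6 (97 + 98 = 195, and 195 % 7 = 6), so "ba" wraps around to cell 0; "ca" hashes to 0 and moves on to cell 1. After "ba" is removed, "ca" is still found because cell 0 is Deleted, not Empty.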
    66. Clustering
        - One problem with the linear probing technique is the tendency to form “clusters”.
        - A cluster is a group of consecutive occupied cells, with no open slots among them
        - The bigger a cluster gets, the more likely it is that new values will hash into the cluster and make it ever bigger.
        - Clusters cause efficiency to degrade.
    67. Quadratic Probing
        - Quadratic probing uses a different formula:
          - Use $f(i) = i^2$ to resolve collisions
          - If the hash function resolves to H and a search in cell H is inconclusive, try $H + 1^2, H + 2^2, H + 3^2, \ldots$
        - Probe array[hash(key) + $1^2$], then array[hash(key) + $2^2$], then array[hash(key) + $3^2$], and so on
          - Virtually eliminates primary clusters
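The probe sequence is easy to generate; this sketch (the function name `quadraticProbes` is hypothetical) lists the first cells tried, taking each index mod the table size as in the open-addressing formula.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// First `count` cells tried by quadratic probing: (H + i*i) mod tableSize, for i = 0, 1, 2, ...
std::vector<std::size_t> quadraticProbes(std::size_t H, std::size_t tableSize, std::size_t count) {
    assert(tableSize > 0);
    std::vector<std::size_t> cells;
    for (std::size_t i = 0; i < count; ++i)
        cells.push_back((H + i * i) % tableSize);
    return cells;
}
```

For H = 3 and a table of size 11 the first six probes are cells 3, 4, 7, 1, 8, 6, whereas linear probing would try 3, 4, 5, 6, 7, 8. The quadratic jumps spread the probes out, which is why adjacent clusters stop merging; when the table size is prime, the first ⌈T/2⌉ probes are guaranteed to be distinct cells.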
    68. Collision resolution: chaining
        - Each table position is a linked list
        - Add the keys and entries anywhere in the list (front easiest)
        [Figure: array positions 4, 10 and 123, each the head of a linked list of (key, entry) nodes]
        No need to change position!
    69. Collision resolution: chaining
        - Advantages over open addressing:
          - Simpler insertion and removal
          - Array size is not a limitation
        - Disadvantage
          - Memory overhead is large if entries are small.
        [Figure: array positions 4, 10 and 123, each the head of a linked list of (key, entry) nodes]
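A sketch of chaining with `std::list` buckets. It assumes string keys and entries, an ASCII-sum hash mod the number of buckets, and C++17 (for `std::optional`); the class name `ChainedTable` and the helper `chainLength` are hypothetical, and replacing the entry of a key that is inserted twice is an assumption. Colliding keys simply share a list, so nothing ever moves to a different position and removal is a plain list erase.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Illustrative separate-chaining table: each bucket is a linked list of (key, entry) pairs.
class ChainedTable {
public:
    explicit ChainedTable(std::size_t size) : buckets_(size) { assert(size > 0); }

    void insert(const std::string& key, const std::string& entry) {
        auto& chain = buckets_[hash(key)];
        for (auto& node : chain)
            if (node.first == key) { node.second = entry; return; }   // assumption: one entry per key
        chain.push_front({key, entry});   // the front is the easiest place to add
    }

    std::optional<std::string> find(const std::string& key) const {
        for (const auto& node : buckets_[hash(key)])
            if (node.first == key) return node.second;
        return std::nullopt;
    }

    bool remove(const std::string& key) {
        auto& chain = buckets_[hash(key)];
        for (auto it = chain.begin(); it != chain.end(); ++it)
            if (it->first == key) { chain.erase(it); return true; }
        return false;
    }

    // Number of nodes in the bucket that `key` hashes to.
    std::size_t chainLength(const std::string& key) const { return buckets_[hash(key)].size(); }

private:
    // Assumed hash: sum of character codes, mod the number of buckets.
    std::size_t hash(const std::string& key) const {
        std::size_t sum = 0;
        for (unsigned char c : key) sum += c;
        return sum % buckets_.size();
    }

    std::vector<std::list<std::pair<std::string, std::string>>> buckets_;
};
```

With 7 buckets, "ab" and "ba" both hash to bucket 6 and end up in the same two-node list; removing one leaves the other in place.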
    70. Applications of Hashing
        - Compilers use hash tables to keep track of declared variables (symbol table).
        - A hash table can be used for on-line spelling checkers: if misspelling detection (rather than correction) is important, an entire dictionary can be hashed and words checked in constant time.
    71. Applications of Hashing
        - Game playing programs use hash tables to store seen positions, thereby saving computation time if the position is encountered again.
        - Hash functions can be used to quickly check for inequality: if two elements hash to different values, they must be different.
    72. When is hashing suitable?
        - Hash tables are very good if there is a need for many searches in a reasonably stable table.
        - Hash tables are not so good if there are many insertions and deletions, or if table traversals are needed; in these cases, AVL trees are better.
        - Also, hashing is very slow for any operations which require the entries to be sorted
          - e.g., find the minimum key
