Hashing in DS
Published in: Education

  • This lecture illustrates hash tables, using open addressing.
    Before this lecture, students should have seen other forms of a Dictionary, where a collection of data is stored, and each data item has a key associated with it.
  • This lecture introduces hash tables, which are an array-based method for implementing a Dictionary. You should recall that we have seen dictionaries implemented in other ways, for example with a binary search tree. The abstract properties of a dictionary remain the same: We can insert items in the dictionary, and each item has a key associated with it. When we want to retrieve an item, we specify only the key, and the retrieval process finds the associated data.
    What we do now is use an array to implement the dictionary. The array is an array of records. In this example, we could store up to 701 records in the array.
  • Each record in the array contains two parts. The first part is a number that we'll use for the key of the item. We could use something else for the keys, such as a string. But for a hash table, numbers make the most convenient keys.
  • The numbers might be identification numbers of some sort, and the rest of the record contains information about a person. So the pattern that you see here is the same pattern that you've seen in other dictionaries: Each entry in the dictionary has a key (in this case an identifying number) and some associated data.
  • When a hash table is being used as a dictionary, some of the array locations are in use, and other spots are "empty", waiting for a new entry to come along.
    Oftentimes, the empty spots are identified by a special key. For example, if all our identification numbers are positive, then we could use 0 as the Number that indicates an empty spot.
    With this drawing, locations [0], [3], [5], and maybe some others would all have Number=0.
  • In order to insert a new entry, the key of the entry must somehow be converted to an index in the array. For our example, we must convert the key number into an index between 0 and 700. The conversion process is called hashing and the index is called the hash value of the key.
  • There are many ways to create hash values. Here is a typical approach.
    a. Take the key mod 701 (which could be anywhere from 0 to 700).
    So, quick, what is (580,625,685 mod 701) ?
  • Three.
  • So, this new item will be placed at location [3] of the array.
  • The hash value is always used to find the location for the record.
  • Sometimes, two different records might end up with the same hash value.
  • This is called a collision.
    When a collision occurs, the insertion process will move forward through the array until an empty spot is found. Sometimes you will have a second collision...
  • ...and a third collision...
  • But if there are any empty spots, eventually you will reach an empty spot, and the new item is inserted here.
  • The new record is always placed in the first available empty spot at or after the location given by the hash value.
  • It is fairly easy to search for a particular item based on its key.
  • Start by computing the hash value, which is 2 in this case. Then check location 2. If location 2 has a different key than the one you are looking for, then move forward...
  • ...if the next location is not the one we are looking for, then keep moving forward...
  • Keep moving forward until you find the sought-after key...
  • In this case we find the key at location [5].
  • The data from location [5] can then be copied to provide the result of the search function.
    What happens if a search reaches an empty spot? In that case, it can halt and indicate that the key was not in the hash table.
  • Records can be deleted from a hash table...
  • But the spot of the deleted record cannot be left as an ordinary empty spot, since that would interfere with searches. (Remember that a search can stop when it reaches an empty spot.)
  • Instead we must somehow mark the location as "a location that used to have something here, but no longer does."
    We might do this by using some other special value for the Number field of the record.
    In any case, a search cannot stop when it reaches "a location that used to have something here". A search can only stop when it finds the key or reaches a true empty spot.
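The insert, search, and delete steps described in these notes can be sketched as a small open-addressing table. This is a minimal sketch, not the lecture's own code: the class name `HashTable`, the marker values `EMPTY` and `DELETED`, the wrap-around from the last slot back to [0], and the choice to reuse deleted slots on insertion are all assumptions made for illustration.

```python
EMPTY = 0      # marks a slot that has never been used (keys are assumed positive)
DELETED = -1   # hypothetical marker for "used to have something here"


class HashTable:
    """Open addressing with linear probing, keyed by positive integers."""

    def __init__(self, capacity=701):
        self.capacity = capacity
        self.keys = [EMPTY] * capacity
        self.data = [None] * capacity

    def _hash(self, key):
        # The hash value is the key mod the table size.
        return key % self.capacity

    def insert(self, key, value):
        index = self._hash(key)
        target = None
        for _ in range(self.capacity):
            current = self.keys[index]
            if current == key:            # key already present: overwrite it
                target = index
                break
            if current == DELETED and target is None:
                target = index            # assumption: deleted slots may be reused
            elif current == EMPTY:
                if target is None:
                    target = index
                break
            index = (index + 1) % self.capacity   # move forward, wrapping around
        if target is None:
            raise OverflowError("hash table is full")
        self.keys[target] = key
        self.data[target] = value
        return target

    def search(self, key):
        index = self._hash(key)
        for _ in range(self.capacity):
            current = self.keys[index]
            if current == key:
                return self.data[index]
            if current == EMPTY:          # a true empty spot ends the search
                return None
            index = (index + 1) % self.capacity   # DELETED slots are skipped
        return None

    def delete(self, key):
        index = self._hash(key)
        for _ in range(self.capacity):
            current = self.keys[index]
            if current == key:
                self.keys[index] = DELETED
                self.data[index] = None
                return True
            if current == EMPTY:
                return False
            index = (index + 1) % self.capacity
        return False


table = HashTable()
for number in (506643548, 233667136, 281942902, 155778322):
    table.insert(number, "record")
table.insert(580625685, "new record")        # hash value 3, stored at [3]
table.insert(701466868, "colliding record")  # hash value 2, ends up at [5]
```

With the key values from the slides, deleting 506643548 (stored at [4]) leaves a `DELETED` marker, so a later search for 701466868 still walks past [4] and finds the record at [5].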
  • Transcript

    • 1. SEMINAR ON HASHING. PRESENTED TO: MS. MANISHA. PRESENTED BY: RUCHIKA (6), RITIKA (11), MCA 1. S.S.D WOMEN'S INSTITUTE OF TECHNOLOGY, BATHINDA (AFFILIATED TO PUNJABI UNIVERSITY, PATIALA)
    • 2. CONTENTS: What is hashing? Inserting, deleting and searching a record. Hash functions and their methods. Collision. Two ways of collision resolution: 1. open addressing, 2. chaining.
    • 3. Hashing: Hashing is a common approach to the storing/searching problem. A collection of data is stored, and each data item has a key associated with it.
    • 4. What is a Hash Table? The simplest kind of hash table is an array of records. This example has 701 records. [Diagram: an array of records with slots [0], [1], [2], [3], [4], [5], ..., [700]]
    • 5. What is a Hash Table? Each record has a special field, called its key. In this example, the key is a long integer field called Number. [Diagram: slot [4] holds Number 506643548]
    • 6. What is a Hash Table? The number might be a person's identification number, and the rest of the record has information about the person. [Diagram: slot [4] holds Number 506643548]
    • 7. What is a Hash Table? When a hash table is in use, some spots contain valid records, and other spots are "empty". [Diagram: slots [0] to [700]; the occupied slots hold Numbers 506643548, 233667136, 281942902, and 155778322]
    • 8. The general idea of using the key to determine the address of a record is an excellent idea, but it must be modified so that a great deal of space is not wasted. This modification takes the form of a function H from the set K of keys into the set L of memory addresses: H : K → L (hash function). Chop is a technique in which the pieces of the key k are combined to form the hash address H(k). [Diagram: key split into key1 and key2; key1 split into Key1.1 and Key1.2]
    • 9. Methods of Hash Functions. Division method: Choose a number m larger than the number n of keys in K. The hash function H is defined by H(k) = k (mod m) or H(k) = k (mod m) + 1, where k (mod m) denotes the remainder when k is divided by m. The second formula is used when we want the hash addresses to range from 1 to m rather than from 0 to m-1.
    • 10. Example of division method: Each of 68 employees is assigned a unique 4-digit employee number. Suppose L consists of 100 two-digit addresses: 00, 01, 02, ..., 99. We apply the division method to each of the employee numbers 3205, 7148, 2345. 1. Choose a prime number m close to 99, such as m = 97. Then H(3205) = 4, H(7148) = 67, H(2345) = 17. 2. With the second formula, H(k) = k (mod m) + 1: H(3205) = 4+1 = 5, H(7148) = 67+1 = 68, H(2345) = 17+1 = 18.
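The arithmetic in this example can be checked with a short sketch. The function name `division_hash` and the `one_based` flag are hypothetical names introduced here; only m = 97 and the three employee numbers come from the slide.

```python
def division_hash(key, m, one_based=False):
    """Division method: H(k) = k mod m, or k mod m + 1 for addresses 1..m."""
    remainder = key % m
    return remainder + 1 if one_based else remainder


for employee_number in (3205, 7148, 2345):
    zero_based = division_hash(employee_number, 97)
    shifted = division_hash(employee_number, 97, one_based=True)
    print(employee_number, zero_based, shifted)
# prints:
# 3205 4 5
# 7148 67 68
# 2345 17 18
```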
    • 11. Mid square method: The key k is squared. Then the hash function H is defined by H(k) = l, where l is obtained by deleting digits from both ends of k*k. Example: k = 3205, k*k = 10 272 025, H(k) = 72; k = 7148, k*k = 51 093 904, H(k) = 93; k = 2345, k*k = 5 499 025, H(k) = 99.
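A small sketch reproduces the three values. The slide does not say which digits are deleted; keeping the fourth and fifth digits counted from the right of k*k is an assumption, chosen because it matches all three results shown. The name `mid_square_hash` is hypothetical.

```python
def mid_square_hash(key):
    """Mid-square method, assuming the 4th and 5th digits from the right are kept."""
    squared = key * key
    # Drop the three rightmost digits, then keep the next two.
    return (squared // 1000) % 100


for key in (3205, 7148, 2345):
    print(key, key * key, mid_square_hash(key))
# prints:
# 3205 10272025 72
# 7148 51093904 93
# 2345 5499025 99
```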
    • 12. Folding method: The key k is partitioned into a number of parts, k1, k2, ..., kr. The parts are added together, ignoring the last carry: H(k) = k1 + k2 + ... + kr. Example: H(3205) = 32+05 = 37. H(7148) = 71+48 = 19 (the leading digit 1 of the sum 119 is ignored). H(2345) = 23+45 = 68.
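The folding example can be sketched as follows, assuming two-digit parts as in the slide. The function name `folding_hash` and the `part_digits` parameter are hypothetical.

```python
def folding_hash(key, part_digits=2):
    """Folding method: split the key into parts, add them, drop the last carry."""
    base = 10 ** part_digits
    total = 0
    while key > 0:
        total += key % base   # take the rightmost part
        key //= base          # and remove it from the key
    return total % base       # ignoring the carry keeps only part_digits digits


for key in (3205, 7148, 2345):
    print(key, folding_hash(key))
# prints:
# 3205 37
# 7148 19
# 2345 68
```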
    • 13. Inserting a New Record: In order to insert a new record, the key must somehow be converted to an array index. The index is called the hash value of the key. [Diagram: the array as before, with new record Number 580625685]
    • 14. Inserting a New Record: Typical way to create a hash value: (Number mod 701). What is (580625685 mod 701)? [Diagram: the array as before, with new record Number 580625685]
    • 15. Inserting a New Record: Typical way to create a hash value: (Number mod 701). What is (580625685 mod 701)? 3. [Diagram: the array as before, with new record Number 580625685]
    • 16. Inserting a New Record: The hash value is used for the location of the new record. [Diagram: new record Number 580625685 pointing at slot [3]]
    • 17. Inserting a New Record: The hash value is used for the location of the new record. [Diagram: Number 580625685 now stored in the array]
    • 18. Collisions: Here is another new record to insert, with a hash value of 2. [Diagram: new record Number 701466868, saying "My hash value is [2]."]
    • 19. Collisions: This is called a collision, because there is already another valid record at [2]. When a collision occurs, move forward until you find an empty spot. [Diagram: new record Number 701466868 moving along the array]
    • 20. Collisions: This is called a collision, because there is already another valid record at [2]. When a collision occurs, move forward until you find an empty spot. [Diagram: new record Number 701466868 moving along the array]
    • 21. Collisions: This is called a collision, because there is already another valid record at [2]. When a collision occurs, move forward until you find an empty spot. [Diagram: new record Number 701466868 moving along the array]
    • 22. Collisions: This is called a collision, because there is already another valid record at [2]. The new record goes in the empty spot. [Diagram: Number 701466868 stored in the array]
    • 23. Searching for a Key: The data that's attached to a key can be found fairly quickly. [Diagram: the array, with search key Number 701466868]
    • 24. Searching for a Key: Calculate the hash value. Check that location of the array for the key. [Diagram: search key Number 701466868, "My hash value is [2]."; the slot being checked answers "Not me."]
    • 25. Searching for a Key: Keep moving forward until you find the key, or you reach an empty spot. [Diagram: search key Number 701466868; the slot being checked answers "Not me."]
    • 26. Searching for a Key: Keep moving forward until you find the key, or you reach an empty spot. [Diagram: search key Number 701466868; the slot being checked answers "Not me."]
    • 27. Searching for a Key: Keep moving forward until you find the key, or you reach an empty spot. [Diagram: search key Number 701466868; the slot being checked answers "Yes!"]
    • 28. Searching for a Key: When the item is found, the information can be copied to the necessary location. [Diagram: search key Number 701466868; the matching slot answers "Yes!"]
    • 29. Deleting a Record: Records may also be deleted from a hash table. [Diagram: one record in the array says "Please delete me."]
    • 30. Deleting a Record: Records may also be deleted from a hash table. But the location must not be left as an ordinary "empty spot" since that could interfere with searches. [Diagram: the array with Number 506643548 removed]
    • 31. Deleting a Record: Records may also be deleted from a hash table. But the location must not be left as an ordinary "empty spot". The location must be marked in some special way so that a search can tell that the spot used to have something in it. [Diagram: the array with Number 506643548 removed]
    • 32. COLLISION RESOLUTION: There are two general ways of resolving collisions. The particular procedure that one chooses depends on many factors. One important factor is the ratio of the number n of keys in K (which is the number of records in F) to the number m of hash addresses in L. This ratio, λ = n/m, is called the load factor. n = number of filled records. m = total number of memory locations.
    • 33. The efficiency of a hash function with a collision resolution procedure is measured by the average number of probes (key comparisons) needed to find the location of the record with a given key k. The efficiency depends mainly on the load factor λ. Specifically, we are interested in the following two quantities: S(λ) = average number of probes for a successful search. U(λ) = average number of probes for an unsuccessful search.
    • 34. Open addressing: If the home slot for the record that is being inserted is already occupied, then simply choose a different location within the table. But how do we choose this alternate location? The technique must be reproducible, and on average be cheap.
    • 35. Linear Probing: Linear probing involves simply walking down the table until an empty slot is found.
    • 36. Drawbacks of linear probing: The major drawback of linear probing is that, as the table becomes about half full, there is a tendency toward clustering. The sequential searches needed to find an empty position become longer and longer. Example: if a new insertion hashes to location b, then it will go there, but if it hashes to location a, which is already occupied, then it will also go into b. [Diagram: consecutive locations a, b, c, d, e]
    • 37. The problem of clustering is essentially one of instability. If a few keys happen randomly to be near each other, then it becomes more and more likely that other keys will join them, and the distribution will become progressively more unbalanced.
    • 38. Avoid the problem of clustering. Quadratic probing: Suppose a record R with key k has the hash address H(k) = h. Then, instead of searching the locations with addresses h, h+1, h+2, ..., we linearly search the locations with addresses h, h+1, h+4, h+9, h+16, ..., h+i^2, ... If the number m of locations in the table T is a prime number, then the above sequence will access half of the locations in T.
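The "half of the locations" claim can be checked with a short sketch. Addresses are taken mod m so the sequence wraps around the table, which the slide leaves implicit; the function name `quadratic_probe_sequence` is hypothetical. For a prime m the sequence reaches exactly (m + 1) / 2 distinct locations, which is roughly half of the table.

```python
def quadratic_probe_sequence(h, m):
    """Addresses h, h+1, h+4, h+9, ..., h+i^2, taken mod m, for i = 0..m-1."""
    return [(h + i * i) % m for i in range(m)]


m = 11                                   # a small prime table size
sequence = quadratic_probe_sequence(3, m)
distinct = sorted(set(sequence))
print(distinct)                          # the locations that can be reached
print(len(distinct), "of", m)            # prints: 6 of 11
```

Because only about half of the slots are reachable, an insertion can fail to find an empty spot even when the table is not full.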
    • 39. Double Hashing: A second hash function H' is used for resolving a collision. Suppose a record R with key k has the hash addresses H(k) = h and H'(k) = h' ≠ m. Then we linearly search the locations with addresses h, h+h', h+2h', h+3h', ... If m is a prime number, then the above sequence will access all the locations in the table T.
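A minimal sketch of the double-hashing probe sequence follows. The slide does not define H', so the second hash function used here, H'(k) = 1 + (k mod (m - 1)), is an assumption picked only because it always yields a step between 1 and m-1; the function names are hypothetical as well.

```python
def second_hash(key, m):
    """Hypothetical H': a step size in the range 1..m-1 (never 0, never m)."""
    return 1 + key % (m - 1)


def double_hash_sequence(key, m):
    """Addresses h, h+h', h+2h', ..., taken mod m, for one full pass."""
    h = key % m
    step = second_hash(key, m)
    return [(h + i * step) % m for i in range(m)]


m = 701                                   # prime table size from the slides
sequence = double_hash_sequence(701466868, m)
print(sequence[0])                        # prints: 2 (the home address)
print(len(set(sequence)) == m)            # prints: True (every location is visited)
```

With a prime m, any step that is not a multiple of m cycles through every slot before repeating, which is why the sequence covers the whole table.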
    • 40. Chaining: Design the table so that each slot is actually a container that can hold multiple records. Here, the "chains" are linked lists which could hold any number of colliding records. Alternatively, each table slot could be large enough to store several records directly; in that case the slot may overflow, requiring a fallback.
    • 41. EXAMPLE [Diagram: chaining example with a LIST array and parallel INFO and LINK arrays. LIST entries: 8, 3, 0, 5, 7, 0, 0, 2, 0, 0, 6. INFO/LINK pairs: A/0, B/0, C/0, D/0, E/1, X/4, Y/0, Z/0.]
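Chaining can be sketched in a few lines. This is a minimal illustration, not the layout from the example slide: each slot holds a Python list standing in for the linked list, and the class name `ChainedHashTable` and its method names are hypothetical.

```python
class ChainedHashTable:
    """Chaining: every slot is a container holding all records that hash there."""

    def __init__(self, capacity=701):
        self.capacity = capacity
        # One chain per slot; a Python list stands in for a linked list.
        self.slots = [[] for _ in range(capacity)]

    def insert(self, key, value):
        chain = self.slots[key % self.capacity]
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:       # key already present: overwrite it
                chain[i] = (key, value)
                return
        chain.append((key, value))        # colliding records share the chain

    def search(self, key):
        for existing_key, value in self.slots[key % self.capacity]:
            if existing_key == key:
                return value
        return None

    def delete(self, key):
        chain = self.slots[key % self.capacity]
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                del chain[i]              # no special "deleted" marker is needed
                return True
        return False


table = ChainedHashTable()
table.insert(233667136, "first record at [2]")
table.insert(701466868, "second record at [2]")
print(len(table.slots[2]))                # prints: 2
```

Unlike open addressing, deletion here simply removes the record from its chain, because a search never depends on neighbouring slots.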