Application of Hashing in Better Algorithm Design, Tanmay Sinha


Presented at Student Seminar Series, VIT University

Published in: Education, Technology

  1. Tanmay Sinha, Student Seminar Series, VIT University
  2. AGENDA
     - Dictionaries, symbol tables and their implementation
     - What is hashing? Why hashing?
     - Components
     - Comparison of techniques
     - Time complexity
     - Examples
  3. DICTIONARIES
     Real-world examples of dictionaries:
     - Spelling checkers
     - Symbol tables generated by assemblers and compilers
     - Routing tables used in networking components (for DNS lookup)
  4. SYMBOL TABLE: A MODIFIED DICTIONARY
     - Data structure that associates a value with a key
     - Basic operations allowed
     - Implemented using:
       1) Arrays (unordered/ordered): O(n) unordered; O(n)/O(lg n) ordered
       2) Linked lists (ordered/unordered): O(n)
       3) Binary search trees: O(lg n)
       4) Hashing
     - The "dreaded" tag of the time complexity of an algorithm!
  5. UNDERSTANDING HASHING
     - Arrays -> hash table
     - Example: design an algorithm that prints the 1st repeated character of a string, if it contains duplicate characters
     - Possible solutions: from the brute force approach to a better solution
     - If arrays are there, why hashing? To map keys to locations.
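
The "1st repeated character" exercise above can be sketched with a hash-based set. This is a minimal illustration, not the presenter's code: the function name `first_repeated_char` is hypothetical, and it assumes "1st repeated" means the first character that is met a second time while scanning left to right.

```python
def first_repeated_char(text):
    """Return the first character seen a second time, or None if there is none."""
    seen = set()  # hash-based set: membership test is O(1) on average
    for ch in text:
        if ch in seen:
            return ch
        seen.add(ch)
    return None


if __name__ == "__main__":
    print(first_repeated_char("abzddab"))
```

A brute force version would compare every character with all earlier ones, which costs O(n^2); the set brings the expected cost down to O(n).
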
  6. COMPONENTS IN HASHING
     Hash table:
     1) Generalization of an array
     2) Direct addressing
     3) Problem: fewer locations and more possible keys (analogous to the virtual memory concept)
     Basically, a hash table is a data structure that stores keys and their associated values.
  7. COMPONENTS IN HASHING (CONTD.)
     Hash function:
     1) Transforms the key to an index, 'k' to 'h(k)', thereby reducing the range of array indices
     2) Characteristics of a good hash function:
        - Minimize collisions
        - Be quick and easy to compute
        - Distribute key values evenly in the hash table
        - Use all the information provided in the key
        - Have a high load factor for a given set of keys
  8. COMPONENTS IN HASHING (CONTD.): DEFINING TERMS
     1. Load factor = number of elements in the hash table / hash table size = n/m
     2. Collision: 2 records mapped to the same memory location
     What if the keys are non-integers? Treat the characters of the key as coefficients of a polynomial in a constant x. A choice of x = 33, 37, 39 or 41 gives at most 6 collisions on a vocabulary of 50000 English words.
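
The two terms above, and the multiplier x mentioned for non-integer keys, can be illustrated as follows. This is a sketch under assumptions: the slide does not spell out the hash formula, so the polynomial form below (Horner's rule over character codes) and the names `polynomial_hash` and `load_factor` are illustrative choices.

```python
def polynomial_hash(key, table_size, x=33):
    """Map a string to a slot index by evaluating a polynomial in x (assumed form)."""
    h = 0
    for ch in key:
        # Horner's rule, reduced modulo the table size at every step
        h = (h * x + ord(ch)) % table_size
    return h


def load_factor(num_elements, table_size):
    """Load factor n/m as defined on the slide."""
    return num_elements / table_size


if __name__ == "__main__":
    print(polynomial_hash("ab", 13))
    print(load_factor(8, 13))
```
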
  9. COLLISION RESOLUTION TECHNIQUES
     Process of finding an alternate location.
     - Direct chaining (array of linked lists): separate chaining
     - Open addressing (array based): linear probing, quadratic probing, double hashing
  10. CHAINING
      - Slot 'x' contains a pointer (reference) to the head of the list of all stored elements that hash to 'x'
      - Analogous to the adjacency list representation of graphs
      - A doubly linked list is preferable: given the node's address, it helps to delete quickly (deletion takes an input element 'x' and not its key 'k')
      - Worst case behaviour is terrible: all 'n' keys hash to the same slot, creating a list of length 'n'
      - Average case behaviour is good if we assume that any given element is equally likely to hash into any of the table slots: SIMPLE UNIFORM HASHING
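
A minimal sketch of separate chaining, assuming Python lists stand in for the linked lists described on the slide (so deletion here scans the chain instead of unlinking a known node). The class name `ChainedHashTable` and its methods are hypothetical.

```python
class ChainedHashTable:
    """Hash table with separate chaining; each slot holds a list of (key, value) pairs."""

    def __init__(self, size=13):
        self.slots = [[] for _ in range(size)]

    def _chain(self, key):
        # All keys that hash to the same slot share one chain
        return self.slots[hash(key) % len(self.slots)]

    def put(self, key, value):
        chain = self._chain(key)
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, value)  # overwrite an existing key
                return
        chain.append((key, value))

    def get(self, key, default=None):
        for existing_key, value in self._chain(key):
            if existing_key == key:
                return value
        return default

    def delete(self, key):
        chain = self._chain(key)
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                del chain[i]
                return True
        return False
```

With 13 slots, the integer keys 18 and 44 both land in slot 5 and end up in the same chain, which is the collision case the slide describes.
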
  11. LINEAR PROBING
      - Search sequentially: if a location is occupied, check the next location
      - Restriction: number of elements inserted into the table < table size
      - Function for rehashing: H(key) = (n + 1) % tablesize, where n is the location just probed
      - Problem: clustering
      - Importance of table size: it should be prime and should not be a power of 2
      - Problem in deletion -> use of tombstones
  12. EXAMPLE (linear probing), H(key) = key % 13
      Insertions:
      - 18 % 13 = 5
      - 41 % 13 = 2
      - 22 % 13 = 9
      - 44 % 13 = 5 -> 5+1 = 6
      - 59 % 13 = 7
      - 32 % 13 = 6 -> 6+1 -> 6+1+1 = 8
      - 31 % 13 = 5 -> 5+1+1+1+1+1 = 10
      - 73 % 13 = 8 -> 8+1+1+1 = 11
      Resulting table (slot: key):
      0: empty, 1: empty, 2: 41, 3: empty, 4: empty, 5: 18, 6: 44, 7: 59, 8: 32, 9: 22, 10: 31, 11: 73, 12: empty
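
The linear probing example can be replayed with a short sketch. The class name `LinearProbingTable` and the `TOMBSTONE` marker are hypothetical, and the code assumes distinct non-negative integer keys with H(key) = key % size, as in the example.

```python
EMPTY = None
TOMBSTONE = object()  # marks a deleted slot so that later searches keep probing


class LinearProbingTable:
    def __init__(self, size=13):
        self.slots = [EMPTY] * size

    def _probe(self, key):
        # Visit key % size, then each following slot, wrapping around
        size = len(self.slots)
        start = key % size
        for step in range(size):
            yield (start + step) % size

    def insert(self, key):
        for index in self._probe(key):
            if self.slots[index] is EMPTY or self.slots[index] is TOMBSTONE:
                self.slots[index] = key
                return index
        raise RuntimeError("hash table is full")

    def search(self, key):
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is EMPTY:
                return None  # a never-used slot ends the probe sequence
            if slot is not TOMBSTONE and slot == key:
                return index
        return None

    def delete(self, key):
        index = self.search(key)
        if index is not None:
            self.slots[index] = TOMBSTONE
        return index


if __name__ == "__main__":
    table = LinearProbingTable(13)
    for key in [18, 41, 22, 44, 59, 32, 31, 73]:
        table.insert(key)
    print(table.slots)
```

Without the tombstone, deleting 44 from slot 6 would leave an empty slot in the middle of a cluster, and a later search for 32 (which starts at slot 6) would stop too early.
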
  13. QUADRATIC PROBING
      - Our main requirement now is to eliminate the (primary) clustering problem
      - Instead of step size 1, if location i is occupied, check locations i+1^2, i+2^2, ...
      - Function for rehashing: H(key) = (n + k^2) % tablesize, where n is the home location and k = 1, 2, 3, ...
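  14. EXAMPLE (quadratic probing), H(key) = (key + k^2) % 11
      Insertions:
      - 31 % 11 = 9
      - 19 % 11 = 8
      - 2 % 11 = 2
      - 13 % 11 = 2 -> 14 % 11 = 3
      - 25 % 11 = 3 -> 26 % 11 = 4
      - 24 % 11 = 2 -> 25 % 11 = 3 -> 28 % 11 = 6
      - 21 % 11 = 10
      - 9 % 11 = 9 -> 9+1^2, 9+2^2, 9+3^2 -> 18 % 11 = 7
      Resulting table (slot: key):
      0: empty, 1: empty, 2: 2, 3: 13, 4: 25, 5: 5, 6: 24, 7: 9, 8: 19, 9: 31, 10: 21
      (Key 5 sits in its home slot, 5 % 11 = 5, without a collision.)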
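
The quadratic probing example can be checked with a small sketch. The function name `quadratic_probe_insert` is hypothetical; the code assumes non-negative integer keys and probes (key % size + k^2) % size for k = 0, 1, 2, ...

```python
def quadratic_probe_insert(table, key):
    """Insert key using quadratic probing and return the slot that was used."""
    size = len(table)
    home = key % size
    for k in range(size):
        index = (home + k * k) % size
        if table[index] is None:
            table[index] = key
            return index
    # Quadratic probing does not visit every slot, so this can happen
    # even when the table still has free locations.
    raise RuntimeError("no free slot found along the probe sequence")


if __name__ == "__main__":
    table = [None] * 11
    for key in [31, 19, 2, 13, 25, 5, 24, 21, 9]:
        quadratic_probe_insert(table, key)
    print(table)
```

Key 5 is inserted here after 25; its position in the insertion order is an assumption, and it does not affect the result because no other key probes slot 5.
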
  15. DOUBLE HASHING
      - Reduces clustering in a better way
      - Uses a 2nd hash function h2 (the offset), such that h2 != 0 and h2 != h1
      - Concept: first probe at location h1. If it is occupied, probe at location (h1 + k*offset): (h1 + h2), (h1 + 2*h2), ...
      - Linear probing is the special case where the offset is 1
      - If the size of the table is prime, the technique ensures that we look at all table locations
  16. EXAMPLE (double hashing), H1(key) = key % 11, H2(key) = 7 - (key % 7)
      Insertions:
      - 58 % 11 = 3
      - 14 % 11 = 3 -> 3+7 = 10
      - 91 % 11 = 3 -> 3+7 -> (3 + 2*7) % 11 = 6
      - 25 % 11 = 3 -> 3+3 -> 3 + 2*3 = 9
      (key % 7) lies between 0 and 6, so h2 always lies between 1 and 7.
      Resulting table (slot: key):
      0: empty, 1: empty, 2: empty, 3: 58, 4: empty, 5: empty, 6: 91, 7: empty, 8: empty, 9: 25, 10: 14
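
The double hashing example can be replayed with the two hash functions given on the slide. The function name `double_hash_insert` is hypothetical, and the sketch assumes non-negative integer keys and a table of 11 slots.

```python
def double_hash_insert(table, key):
    """Insert key using double hashing with H1 = key % size and H2 = 7 - (key % 7)."""
    size = len(table)
    h1 = key % size
    h2 = 7 - (key % 7)  # always between 1 and 7, so never 0
    for k in range(size):
        index = (h1 + k * h2) % size
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("hash table is full")


if __name__ == "__main__":
    table = [None] * 11
    for key in [58, 14, 91, 25]:
        double_hash_insert(table, key)
    print(table)
```

All four keys share the home slot 3, yet 25 follows a different probe sequence from 14 and 91 because its offset is 3 instead of 7.
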
  17. COMPARISON
      Linear Probing | Quadratic Probing | Double Hashing
      Fastest amongst the three | Easier to implement and deploy | Makes more efficient use of memory
      Uses few probes | Uses extra memory for links + does not probe all table locations | Uses few probes but takes more time
      Problem of primary clustering | Problem of secondary clustering | More complicated to implement
      Interval between probes is fixed, often at 1 | Interval between probes increases proportional to the hash value | Interval between probes is computed by another hash function
  18. HOW DOES HASHING GET O(1) COMPLEXITY?
      - Each block (which may be a linked list) stores, on average, no more elements than the load factor (lf)
      - Generally the load factor is kept constant, so the searching time becomes constant
      - Rehash the elements into a bigger hash table if the average number of elements per block exceeds the chosen load factor
      - Access time of the table depends on the load factor and on how evenly the hash function spreads the keys
      - Unsuccessful/successful search for chaining: total time = O(1 + lf), including the time required to compute h(k)
      - Unsuccessful/successful search for probing: total time = O(1/(1 - lf)) for lf < 1, including the time required to compute h(k)
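
The rehashing step above can be sketched as a chained set of integer keys that grows whenever the load factor passes a threshold. The class name `ResizingChainedSet`, the threshold of 1.0 and the growth rule (2 * size + 1) are assumptions made for illustration; the slides only say that a bigger table is used.

```python
class ResizingChainedSet:
    """Set of non-negative integers with chaining and rehashing on high load."""

    def __init__(self, size=5, max_load=1.0):
        self.buckets = [[] for _ in range(size)]
        self.count = 0
        self.max_load = max_load

    def load_factor(self):
        return self.count / len(self.buckets)

    def add(self, key):
        bucket = self.buckets[key % len(self.buckets)]
        if key in bucket:
            return
        bucket.append(key)
        self.count += 1
        if self.load_factor() > self.max_load:
            self._rehash(2 * len(self.buckets) + 1)  # assumed growth rule

    def _rehash(self, new_size):
        # Every key must be re-inserted because key % size changes with the size
        old_keys = [key for bucket in self.buckets for key in bucket]
        self.buckets = [[] for _ in range(new_size)]
        for key in old_keys:
            self.buckets[key % new_size].append(key)

    def __contains__(self, key):
        return key in self.buckets[key % len(self.buckets)]
```

Keeping the load factor bounded keeps the average chain length bounded, which is where the O(1 + lf) search time for chaining comes from.
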
  19. EXTRA POINTS
      - Static hashing: the data is static, the set of keys is fixed. Examples: the set of reserved words in a programming language, the set of file names on a CD-ROM
      - Dynamic hashing: keys can change dynamically. Examples: cache design, hash functions in cryptography
  20. A one-way hash function takes variable-length input (in this case, a message of any length, even thousands or millions of bits) and produces a fixed-length output, say 160 bits (the message digest).
      [Figure: plaintext -> hash function -> message digest -> digest signed with private key -> plaintext + signature; the private key is used for signing]
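
The fixed-length property can be seen with Python's standard `hashlib` module. The slide does not name an algorithm; SHA-1 is used here only because its digest is 160 bits long, and it is not a recommendation for new security designs. The helper name `message_digest` is hypothetical, and the signing step from the figure is left out.

```python
import hashlib


def message_digest(message: bytes) -> bytes:
    """Return a 160-bit (20-byte) digest of a message of any length."""
    return hashlib.sha1(message).digest()


if __name__ == "__main__":
    short = message_digest(b"hi")
    long = message_digest(b"x" * 1_000_000)
    # Both digests have the same length regardless of the input size
    print(len(short) * 8, len(long) * 8)
```
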
  21. PROBLEM 1
      Can you give an algorithm for finding the 1st non-repeated character in a string? For example, the 1st non-repeated character in the string "abzddab" is 'z'.
      Brute force approach:
      - For each character in the string, scan the rest of the string (all other positions). If that character doesn't appear again, we're done with the solution; else we move to the next character.
      - O(n^2)
      Improvement using hash tables:
      - Create a hash table by reading all characters in the input string and keeping their counts.
      - After creating the hash table, scan the string again in order and look up each count to find the first character whose count = 1.
      - O(n)
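
The hash table approach to Problem 1 can be sketched with `collections.Counter`, which is a dictionary of counts. The function name `first_non_repeated_char` is hypothetical.

```python
from collections import Counter


def first_non_repeated_char(text):
    """Return the first character that occurs exactly once, or None."""
    counts = Counter(text)  # pass 1: count every character
    for ch in text:         # pass 2: first character, in order, with count 1
        if counts[ch] == 1:
            return ch
    return None


if __name__ == "__main__":
    print(first_non_repeated_char("abzddab"))
```

The second pass walks the string, not the table, so the answer respects the original character order.
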
  22. PROBLEM 2
      Given an array of 'n' elements, find 2 elements in the array whose sum is equal to a given element 'K'.
      Brute force: O(n^2)
      Improving time complexity, O(n lg n):
      - Sort the array.
      - Maintain 2 indices, 'low = 0' and 'high = n-1'.
      - Compute A[low] + A[high].
      - If the sum is < K, increment 'low'; if the sum is > K, decrement 'high'.
      - If the sum = K, that's the solution. BINGO!
      Alternative approach (hashing):
      - Objective: A[x] + A[y] = K
      - For each element A[x], check whether K - A[x] already exists in the hash table.
      - Existence of such a number means that we are able to find the indices.
      - Else, insert A[x] into the hash table and proceed to the next input element.
      - O(n)
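
The hashing approach to Problem 2 can be sketched with a dictionary that maps each value seen so far to its index. The function name `two_sum_indices` is hypothetical. The lookup is done before the current element is stored, so an element is never paired with itself.

```python
def two_sum_indices(values, target):
    """Return indices (x, y), x < y, with values[x] + values[y] == target, or None."""
    seen = {}  # value -> index of its first occurrence
    for y, value in enumerate(values):
        x = seen.get(target - value)
        if x is not None:
            return x, y
        seen.setdefault(value, y)
    return None


if __name__ == "__main__":
    print(two_sum_indices([1, 4, 45, 6, 10, 8], 16))
```
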
  23. THANK YOU FOR PATIENT LISTENING!