AGENDA
Dictionaries, symbol tables and their implementation
What is hashing? Why hashing?
Components
Comparison of techniques
Time complexity
Examples
Series_VIT University TanmaySinha_Student Seminar
DICTIONARIES
Real-world examples of dictionaries:
Spelling checker
Symbol tables generated by assemblers and compilers
Routing tables used in networking components (for DNS lookup)
SYMBOL TABLE: A MODIFIED DICTIONARY
Data structure that associates a value with a key
Basic operations allowed: insert, search, delete
Implemented using:
1) Arrays (unordered/ordered): unordered O(n); ordered O(n) insertion, O(lg n) search
2) Linked lists (ordered/unordered): O(n)
3) Binary search trees: O(lg n) on average
4) HASHING!
The "dreaded" tag of an algorithm: its TIME COMPLEXITY!
UNDERSTANDING HASHING
Arrays and hash tables
Example: design an algorithm for printing the 1st repeated character in a string, if the string contains duplicate characters
Possible solutions: from the brute-force approach to a better solution
If arrays are there, why hashing?
Hashing maps keys to locations!
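A minimal sketch of the "better solution" for the example above, assuming that "1st repeated character" means the first character that is seen a second time while reading the string left to right. The function name `first_repeated` is illustrative and not from the slides; a Python `set` stands in for the hash table.

```python
def first_repeated(s):
    """Return the first character seen a second time, or None if there is none."""
    seen = set()              # hash-based set: average O(1) lookup and insert
    for ch in s:
        if ch in seen:        # ch was already read earlier in the string
            return ch
        seen.add(ch)
    return None


if __name__ == "__main__":
    print(first_repeated("abzddab"))
```

The brute-force version compares every character with every other character in O(n^2); the set reduces this to one pass over the string.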
COMPONENTS IN HASHING
Hash Table
1) Generalization of an array
2) Direct addressing
3) Problem: fewer locations and more possible keys (analogous to the VIRTUAL MEMORY concept)
Basically, a hash table is a data structure that stores keys and their associated values.
COMPONENTS IN HASHING (CONTD.)
Hash Function
1) Transforms the key into an index, 'k' to 'h(k)', thereby reducing the range of array indices
2) Characteristics of a good hash function:
   Minimize collisions
   Be quick and easy to compute
   Distribute key values evenly in the hash table
   Use all the information provided in the key
   Have a high load factor for a given set of keys
COMPONENTS IN HASHING (CONTD.)
DEFINING TERMS
1. Load factor: no. of elements in the hash table / hash table size = n/m
2. Collision: 2 records hash to the same memory location
What if the keys are non-integers (e.g. strings)? Treat the characters of the key as the coefficients of a polynomial in x and use its value as the hash code.
A choice of x = 33, 37, 39 or 41 gives at most 6 collisions on a vocabulary of 50,000 English words!
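One possible reading of the "choice of x" above is a polynomial hash code over the characters of the key. The sketch below assumes that reading; the function name `polynomial_hash` and the parameter names are illustrative, and it does not attempt to reproduce the collision count quoted on the slide.

```python
def polynomial_hash(word, x=33, table_size=11):
    """Hash a string by evaluating its character codes as a polynomial in x."""
    h = 0
    for ch in word:
        h = h * x + ord(ch)   # Horner's rule: one multiply and one add per character
    return h % table_size     # compress the hash code into a table index


if __name__ == "__main__":
    for word in ("hash", "table", "probe"):
        print(word, polynomial_hash(word))
```

Because every character and its position contribute to the result, this follows the advice to use all the information provided in the key.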
COLLISION RESOLUTION TECHNIQUES
Collision resolution: the process of finding an alternate location
Direct chaining (array of linked lists): separate chaining
Open addressing (array based): linear probing, quadratic probing, double hashing
CHAINING
Slot 'x' contains a pointer (reference) to the head of the list of all stored elements that hash to 'x'
Analogous to the adjacency list representation of graphs
Doubly linked list preferable: given a node's address, it allows quick deletion (delete takes an input element 'x' and not its key 'k')
Worst-case behaviour is terrible: all 'n' keys hash to the same slot, creating a list of length 'n'
Average-case behaviour is much better if we assume that any given element is equally likely to hash into any of the table slots: SIMPLE UNIFORM HASHING
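A minimal sketch of separate chaining, under the assumption that a Python list per slot is an acceptable stand-in for the linked list described on the slide (so deletion here is not the O(1) doubly-linked-list deletion). The class name `ChainedHashTable` and its methods are illustrative.

```python
class ChainedHashTable:
    """Hash table where each slot holds a chain of (key, value) pairs."""

    def __init__(self, size=11):
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        # Slot h(k) holds every stored element whose key hashes to it.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: overwrite its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

    def delete(self, key):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return True
        return False


if __name__ == "__main__":
    table = ChainedHashTable()
    table.put(3, "first")
    table.put(14, "second")              # 14 % 11 == 3, so it shares slot 3
    print(table.get(14), len(table.buckets[3]))
```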
LINEAR PROBING
Search sequentially: if the location is occupied, check the next location
Restriction: no. of elements inserted into the table < table size
Function for rehashing: H(Key) = (n + 1) % tablesize, where n is the location just probed
Problem: clustering!
Importance of table size: should be prime, should not be a power of 2
Problem in deletion: solved by the use of tombstones
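The sketch below illustrates linear probing with tombstones for deletion. It is an assumption-laden example rather than the presenter's implementation: the class name `LinearProbingSet`, the sentinel objects and the choice to store keys only are all illustrative.

```python
EMPTY = object()       # slot that has never held a key
TOMBSTONE = object()   # slot whose key was deleted; searches must continue past it


class LinearProbingSet:
    def __init__(self, size=11):
        self.size = size
        self.slots = [EMPTY] * size
        self.count = 0

    def _probe(self, key):
        i = hash(key) % self.size
        for _ in range(self.size):
            yield i
            i = (i + 1) % self.size      # rehash: (n + 1) % tablesize

    def _find(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:            # a never-used slot ends the search
                return None
            if slot is not TOMBSTONE and slot == key:
                return i
        return None

    def contains(self, key):
        return self._find(key) is not None

    def insert(self, key):
        if self.contains(key):
            return
        if self.count >= self.size - 1:  # keep no. of elements < table size
            raise OverflowError("table is full")
        for i in self._probe(key):
            if self.slots[i] is EMPTY or self.slots[i] is TOMBSTONE:
                self.slots[i] = key      # tombstones may be reused on insert
                self.count += 1
                return

    def delete(self, key):
        i = self._find(key)
        if i is None:
            return False
        self.slots[i] = TOMBSTONE        # do not reset to EMPTY: later keys stay reachable
        self.count -= 1
        return True
```

Without the tombstone, deleting a key in the middle of a cluster would leave an empty slot that cuts off every key stored after it in the same probe sequence.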
QUADRATIC PROBING
Our main requirement now is to eliminate the CLUSTERING problem
Instead of step size 1, if location i is occupied, check locations i + 1^2, i + 2^2, ...
Function for rehashing: H(Key) = (n + k^2) % tablesize
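To make the probe sequence concrete, here is a small sketch that lists the locations visited by quadratic probing. The function name `quadratic_probe_sequence` and the sample home location and table size are illustrative assumptions.

```python
def quadratic_probe_sequence(home, table_size, probes):
    """Return the first `probes` locations: (home + k^2) % table_size for k = 0, 1, 2, ..."""
    return [(home + k * k) % table_size for k in range(probes)]


if __name__ == "__main__":
    print(quadratic_probe_sequence(3, 11, 5))
    # Number of distinct locations reached in a table of size 11:
    print(len(set(quadratic_probe_sequence(3, 11, 11))))
```

For a home location of 3 in a table of size 11, only 6 of the 11 locations are ever reached, which is why quadratic probing is not guaranteed to probe the whole table.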
DOUBLE HASHING
Reduces clustering in a better way
Uses a 2nd hash function h2 (the offset), such that h2 != 0 and h2 != h1
Concept: first probe at location h1; if it is occupied, probe at (h1 + k*offset) % tablesize, i.e. (h1 + h2), (h1 + 2*h2), ...
Linear probing is the special case where the offset is 1
If the size of the table is prime, the technique ensures we look at all table locations
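EXAMPLE
H1(key) = key % 11
H2(key) = 7 - (key % 7)
(key % 7) lies between 0 and 6, so that h2 always lies between 1 and 7
Insert 58: 58 % 11 = 3, stored at location 3
Insert 14: 14 % 11 = 3 (occupied); 3 + 7 = 10, stored at location 10
Insert 91: 91 % 11 = 3 (occupied); 3 + 7 = 10 (occupied); (3 + 2*7) % 11 = 6, stored at location 6
Insert 25: 25 % 11 = 3 (occupied); 3 + 3 = 6 (occupied); 3 + 2*3 = 9, stored at location 9
Resulting table:
Location | 0 | 1 | 2 | 3  | 4 | 5 | 6  | 7 | 8 | 9  | 10
Key      |   |   |   | 58 |   |   | 91 |   |   | 25 | 14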
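The worked example can be replayed with a short sketch. The hash functions are the ones given on the slide; the helper name `insert` and the use of `None` for an empty location are illustrative assumptions.

```python
TABLE_SIZE = 11


def h1(key):
    return key % TABLE_SIZE


def h2(key):
    return 7 - (key % 7)      # always between 1 and 7, so never 0


def insert(table, key):
    """Place key using double hashing and return the location used."""
    home, offset = h1(key), h2(key)
    for k in range(len(table)):
        location = (home + k * offset) % len(table)
        if table[location] is None:
            table[location] = key
            return location
    raise OverflowError("table is full")


if __name__ == "__main__":
    table = [None] * TABLE_SIZE
    for key in (58, 14, 91, 25):
        print(key, "->", insert(table, key))
```

Because 11 is prime and the offset is never a multiple of 11, the loop over k visits every location once before giving up.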
COMPARISON
Linear Probing | Quadratic Probing | Double Hashing
-------------- | ----------------- | --------------
Fastest amongst the three | Easier to implement and deploy | Makes more efficient use of memory
Uses few probes | Does not probe all table locations | Uses few probes but takes more time
Problem of primary clustering | Problem of secondary clustering | More complicated to implement
Interval between probes is fixed, often at 1 | Interval between probes grows with each probe (quadratically) | Interval between probes is computed by another hash function
HOW DOES HASHING GET O(1) COMPLEXITY?
Each block (may be a linked list) stores, on average, at most about "Load Factor (lf)" elements
Generally the load factor is kept constant, so searching time becomes constant
Rehash the elements into a bigger hash table if the avg. no. of elements per block exceeds the chosen load factor
Access time of the table depends on the load factor and on how evenly the hash function distributes the keys
Unsuccessful/successful search for chaining: total time = O(1 + lf), including the time required to compute h(k)
Unsuccessful/successful search for probing: total time = O(1/(1 - lf)) for lf < 1, including the time required to compute h(k)
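The rehashing step can be sketched as follows for a chained table. The threshold of 1.0 and the growth rule (2m + 1 slots) are assumptions chosen for illustration, not values from the slides, and the function names are hypothetical.

```python
def load_factor(buckets):
    """lf = n / m: stored elements divided by the number of slots."""
    n = sum(len(bucket) for bucket in buckets)
    return n / len(buckets)


def grow_if_needed(buckets, max_load=1.0):
    """Rehash every key into a bigger table once the load factor exceeds max_load."""
    if load_factor(buckets) <= max_load:
        return buckets
    new_buckets = [[] for _ in range(2 * len(buckets) + 1)]   # assumed growth rule
    for bucket in buckets:
        for key in bucket:
            # Every key must be rehashed, because h(k) depends on the table size.
            new_buckets[hash(key) % len(new_buckets)].append(key)
    return new_buckets


if __name__ == "__main__":
    crowded = [[0, 2, 4], [1, 3]]          # 5 keys in 2 slots: lf = 2.5
    roomy = grow_if_needed(crowded)
    print(load_factor(crowded), load_factor(roomy))
```

Keeping lf below a fixed bound is what keeps the O(1 + lf) chaining cost constant as the table fills up.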
EXTRA POINTS
Static hashing: data is static, the set of keys is fixed
Example: set of reserved words in a programming language, set of file names on a CD-ROM
Dynamic hashing: keys can change dynamically
Example: cache design, hash functions in cryptography
A one-way hash function takes variable-length input (in this case, a message of any length, even thousands or millions of bits) and produces a fixed-length output, say 160 bits (the message digest).
[Diagram: plaintext -> hash function -> message digest -> digest signed with private key -> plaintext + signature. The private key is used for signing.]
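As a small illustration of the fixed-length output, the sketch below uses SHA-1 from Python's standard `hashlib` module. SHA-1 is an assumption chosen only because it produces a 160-bit digest (the slide does not name an algorithm), and it is no longer recommended for new signature schemes. The signing step with the private key is not shown.

```python
import hashlib


def digest_bits(message: bytes) -> int:
    """Return the size in bits of the SHA-1 digest of message."""
    return len(hashlib.sha1(message).digest()) * 8


if __name__ == "__main__":
    print(digest_bits(b"short message"))
    print(digest_bits(b"x" * 1_000_000))   # a much longer message, same digest size
```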
PROBLEM 1
Can you give an algorithm for finding the 1st non-repeated character in a string? E.g. the 1st non-repeated character in the string "abzddab" is 'z'.
Brute-force approach: O(n^2)
   For each character in the string, scan the rest of the string. If that character doesn't appear anywhere else, we're done with the solution; else we move to the next character.
Improvement using hash tables: O(n)
   Create a hash table by reading all characters in the input string and keeping their counts.
   After creating the hash table, scan the string again and look up each character's count to find the first one with count = 1.
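The hash table approach can be sketched in a few lines. The function name `first_non_repeated` is illustrative, and `collections.Counter` (a dictionary subclass) stands in for the hash table of counts.

```python
from collections import Counter


def first_non_repeated(s):
    """Return the first character that occurs exactly once in s, or None."""
    counts = Counter(s)          # pass 1: count every character, O(n)
    for ch in s:                 # pass 2: keep the original order of the string
        if counts[ch] == 1:
            return ch
    return None


if __name__ == "__main__":
    print(first_non_repeated("abzddab"))
```

The second pass walks the string rather than the table so that the answer is the first such character in string order.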
PROBLEM 2
Given an array of 'n' elements, find 2 elements in the array whose sum is equal to a given element 'K'.
Objective: A[x] + A[y] = K
Brute force: O(n^2)
Improving time complexity: O(n lg n)
   Sort the array. Maintain 2 indices, 'low = 0' and 'high = n-1'.
   Compute A[low] + A[high].
   If sum = K, that's the solution. BINGO!
   If sum < K, increment 'low'; else decrement 'high'.
Alternative approach (hashing): O(n)
   Before inserting A[x] into the hash table, check whether K - A[x] already exists in it.
   Existence of such a number means that we are able to find the indices.
   Else, insert A[x] and proceed to the next input element.
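Both improved approaches can be sketched as below. The function names are illustrative; the hash version returns indices into the original array, while the sorted version returns the two values, since sorting changes the positions.

```python
def two_sum_hash(a, k):
    """O(n) on average: return indices (x, y) with a[x] + a[y] == k, or None."""
    seen = {}                                # value -> index where it was seen
    for y, value in enumerate(a):
        x = seen.get(k - value)
        if x is not None:                    # the complement was read earlier
            return x, y
        seen[value] = y                      # insert only after the check
    return None


def two_sum_sorted(a, k):
    """O(n lg n): sort, then move two indices towards each other."""
    values = sorted(a)
    low, high = 0, len(values) - 1
    while low < high:
        total = values[low] + values[high]
        if total == k:
            return values[low], values[high]
        if total < k:
            low += 1                         # need a larger sum
        else:
            high -= 1                        # need a smaller sum
    return None


if __name__ == "__main__":
    data = [8, 3, 5, 1]
    print(two_sum_hash(data, 9), two_sum_sorted(data, 9))
```

Checking for K - A[x] before inserting A[x] prevents an element from being paired with itself when K is twice its value.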
THANK YOU FOR PATIENT LISTENING!