Application of Hashing in Better Algorithm Design

Tanmay Sinha, Student Seminar Series, VIT University

AGENDA
• Dictionaries, symbol tables and their implementation
• What is hashing? Why hashing?
• Components
• Comparison of techniques
• Time complexity
• Examples

DICTIONARIES
• Real-world examples of dictionaries:
  • Spelling checkers
  • Symbol tables generated by assemblers and compilers
  • Routing tables used in networking components (e.g., for DNS lookup)

SYMBOL TABLE: A MODIFIED DICTIONARY
• A data structure that associates a value with a key
• Basic operations allowed: insert, search and delete
• Implemented using:
  1) Arrays (unordered/ordered): O(n) search when unordered; O(n) insertion / O(lg n) search when ordered
  2) Linked lists (ordered/unordered): O(n)
  3) Binary search trees: O(lg n) on average, O(n) in the worst case
  4) HASHING!
• The "dreaded" tag of TIME COMPLEXITY of an algorithm!
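For comparison with hashing, here is a minimal Python sketch of the ordered-array option. It assumes comparable keys, and the class name `OrderedArraySymbolTable` is hypothetical. Search uses binary search from the standard `bisect` module (O(lg n)), while insertion and deletion must shift elements to keep the array sorted (O(n)).

```python
from bisect import bisect_left


class OrderedArraySymbolTable:
    """Symbol table backed by two parallel sorted lists (hypothetical illustration)."""

    def __init__(self):
        self.keys = []    # kept in sorted order
        self.values = []  # values[i] belongs to keys[i]

    def search(self, key):
        # Binary search: O(lg n) comparisons.
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None

    def insert(self, key, value):
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.values[i] = value  # key already present: overwrite its value
        else:
            # list.insert shifts every later element: O(n) in the worst case.
            self.keys.insert(i, key)
            self.values.insert(i, value)

    def delete(self, key):
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            del self.keys[i]    # also O(n) because of shifting
            del self.values[i]
            return True
        return False


table = OrderedArraySymbolTable()
for k, v in [(30, "c"), (10, "a"), (20, "b")]:
    table.insert(k, v)
print(table.keys)        # [10, 20, 30]
print(table.search(20))  # b
```

The O(n) cost of keeping the array ordered is exactly what a hash table avoids.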

UNDERSTANDING HASHING
• Arrays → hash table
• Example: design an algorithm for printing the first repeated character in a string, if there are duplicate elements in it!
• Possible solutions: from a brute-force approach to a better solution
• IF ARRAYS ARE THERE... WHY HASHING?
• Map keys to locations!
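A minimal Python sketch of the two ends of that range, from brute force to hashing. It assumes "first repeated character" means the character whose second occurrence appears earliest in a left-to-right scan. The function names are hypothetical, and Python's built-in `set` plays the role of the hash table.

```python
def first_repeated_brute_force(s):
    """O(n^2): for each position j, scan everything before it."""
    for j in range(len(s)):
        if s[j] in s[:j]:  # linear scan of the prefix
            return s[j]
    return None


def first_repeated_hashing(s):
    """O(n) expected: remember the characters seen so far in a hash set."""
    seen = set()
    for ch in s:
        if ch in seen:  # expected O(1) membership test
            return ch
        seen.add(ch)
    return None


print(first_repeated_brute_force("abzddab"))  # d
print(first_repeated_hashing("abzddab"))      # d
```

Both functions return the same answer. The hashing version simply replaces the inner linear scan with an expected constant-time lookup.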

COMPONENTS IN HASHING
• Hash table
  1) Generalization of an array
  2) Direct addressing
  3) Problem: fewer locations than possible keys (analogous to the VIRTUAL MEMORY concept)
• Basically, a hash table is a data structure that stores keys and their associated values!
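Direct addressing, the starting point above, can be sketched in a few lines of Python. This assumes integer keys drawn from a small universe 0..U-1. The class name `DirectAddressTable` is hypothetical. Every operation is O(1), but the table needs one slot per possible key, and that is the problem hashing addresses.

```python
class DirectAddressTable:
    """Direct addressing: key k is stored in slot k (hypothetical illustration).

    Works only for integer keys in 0..universe_size-1; memory grows with the
    size of the key universe, not with the number of keys actually stored.
    """

    def __init__(self, universe_size):
        self.slots = [None] * universe_size

    def insert(self, key, value):
        self.slots[key] = value  # O(1)

    def search(self, key):
        return self.slots[key]   # O(1)

    def delete(self, key):
        self.slots[key] = None   # O(1)


t = DirectAddressTable(100)  # keys must lie in 0..99
t.insert(42, "answer")
print(t.search(42))  # answer
print(t.search(7))   # None
```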

COMPONENTS IN HASHING... CONTD
• Hash function
  1) Transforms the key k into an index h(k), thereby reducing the range of array indices
  2) Characteristics of a good hash function:
     • Minimize collisions
     • Be quick and easy to compute
     • Distribute key values evenly in the hash table
     • Use all the information provided in the key
     • Have a high load factor for a given set of keys

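COMPONENTS IN HASHING... CONTD
• Defining terms
  1. Load factor: number of elements in the hash table / hash table size = n/m
  2. Collision: two records hash to the same memory location
• What if the keys are non-integers? Convert them to integers first, e.g., with a polynomial hash code that uses the character codes as the coefficients of a polynomial evaluated at a constant x.
• Choosing x = 33, 37, 39 or 41 gives at most 6 collisions on a vocabulary of 50,000 English words!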
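A common way to turn a string key into an integer is a polynomial hash code. The x = 33, 37, 39, 41 figures are usually quoted for this kind of code. Below is a minimal Python sketch, assuming that scheme and character codes from `ord`. The function names are hypothetical. The code is evaluated with Horner's rule and then compressed into a table index with `% table_size`.

```python
def polynomial_hash_code(s, x=33):
    """Polynomial hash code via Horner's rule:
    ord(s[0])*x**(n-1) + ord(s[1])*x**(n-2) + ... + ord(s[n-1])."""
    code = 0
    for ch in s:
        code = code * x + ord(ch)
    return code


def bucket_index(s, table_size, x=33):
    """Compress the (possibly huge) hash code into 0..table_size-1."""
    return polynomial_hash_code(s, x) % table_size


print(polynomial_hash_code("ab"))  # 3299 = 97*33 + 98
print(polynomial_hash_code("ba"))  # 3331 = 98*33 + 97
print(bucket_index("ab", 13))      # 10
```

Because the character positions are weighted by powers of x, anagrams such as "ab" and "ba" get different codes. Python integers never overflow. In a language with fixed-width integers, one would typically reduce modulo the table size at every step instead.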

COLLISION RESOLUTION TECHNIQUES
• Process of finding an alternate location when a collision occurs
• Direct chaining: array of linked lists (separate chaining)
• Open addressing: array based (linear probing, quadratic probing, double hashing)

CHAINING
• Slot 'x' contains a pointer (reference) to the head of the list of all stored elements that hash to 'x'
• Analogous to the adjacency-list representation of graphs
• A doubly linked list is preferable: given the node's address, it lets us delete quickly, because deletion takes an input element 'x' (its node) and not its key 'k', so the node can be unlinked without searching the list
• Worst-case behaviour is terrible: all 'n' keys hash to the same slot, creating a list of length 'n'
• Average-case behaviour can be improved if we assume that any given element is equally likely to hash into any of the table slots: SIMPLE UNIFORM HASHING!
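A minimal Python sketch of separate chaining. It assumes Python's built-in `hash` reduced modulo the table size as the hash function, and the class name `ChainedHashTable` is hypothetical. Plain Python lists stand in for the linked lists, so deleting an entry costs O(length of its chain) rather than the O(1) a doubly linked list gives when the node is already known.

```python
class ChainedHashTable:
    """Separate chaining: slot i holds the (key, value) pairs whose key hashes to i."""

    def __init__(self, size=11):
        self.size = size
        self.slots = [[] for _ in range(size)]

    def _index(self, key):
        return hash(key) % self.size

    def insert(self, key, value):
        chain = self.slots[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)  # update an existing key
                return
        chain.append((key, value))

    def search(self, key):
        for k, v in self.slots[self._index(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self.slots[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False


table = ChainedHashTable(size=5)
for key in [3, 8, 13, 4]:   # 3, 8 and 13 all land in slot 3 (mod 5)
    table.insert(key, f"v{key}")
print(table.slots[3])       # [(3, 'v3'), (8, 'v8'), (13, 'v13')]
print(table.search(8))      # v8
table.delete(8)
print(table.search(8))      # None
```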

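LINEAR PROBING
• Search sequentially: if a location is occupied, check the next location
• Restriction: number of elements inserted into the table < table size
• Function for rehashing: $H(\text{key}) = (n + 1) \bmod \text{tablesize}$, where $n$ is the location just probed
• Problem: (primary) clustering!
• Importance of table size: it should be prime and should not be a power of 2
• PROBLEM IN DELETION: emptying a slot would cut off the probe sequences of keys stored after it, so deleted slots are marked with tombstones!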

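EXAMPLE
H(key) = key % 13, inserting 18, 41, 22, 44, 59, 32, 31, 73 in that order:
• 18 % 13 = 5 → slot 5
• 41 % 13 = 2 → slot 2
• 22 % 13 = 9 → slot 9
• 44 % 13 = 5 (occupied) → 5 + 1 = 6
• 59 % 13 = 7 → slot 7
• 32 % 13 = 6 (occupied) → 6 + 1 + 1 = 8
• 31 % 13 = 5 (occupied) → 5 + 1 + 1 + 1 + 1 + 1 = 10
• 73 % 13 = 8 (occupied) → 8 + 1 + 1 + 1 = 11
Final table (slots 0 to 12): 0: empty, 1: empty, 2: 41, 3: empty, 4: empty, 5: 18, 6: 44, 7: 59, 8: 32, 9: 22, 10: 31, 11: 73, 12: empty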
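The linear-probing steps and the tombstone idea can be combined in one minimal Python sketch. It assumes non-negative integer keys, H(key) = key % tablesize as in the example, and no duplicate insertions. The class and sentinel names are hypothetical. Inserting the example's keys reproduces the final table above. After 44 is deleted, a search for 31 still succeeds because it probes past the tombstone in slot 6 instead of stopping there.

```python
EMPTY = None
TOMBSTONE = object()  # sentinel for a deleted slot


class LinearProbingTable:
    """Open addressing with linear probing and tombstone deletion (sketch)."""

    def __init__(self, size=13):
        self.size = size
        self.slots = [EMPTY] * size

    def _probe(self, key):
        """Yield H(key) = key % size, then (n + 1) % size, ... visiting every slot once."""
        n = key % self.size
        for _ in range(self.size):
            yield n
            n = (n + 1) % self.size

    def insert(self, key):
        """Store key in the first empty or tombstoned slot; assumes key is not present."""
        for idx in self._probe(key):
            if self.slots[idx] is EMPTY or self.slots[idx] is TOMBSTONE:
                self.slots[idx] = key
                return idx
        raise RuntimeError("table is full")

    def search(self, key):
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                return None  # a never-used slot ends the probe sequence
            if slot is not TOMBSTONE and slot == key:
                return idx
        return None  # probed every slot without finding key

    def delete(self, key):
        idx = self.search(key)
        if idx is not None:
            # Marking (not emptying) keeps later keys in the cluster reachable.
            self.slots[idx] = TOMBSTONE
        return idx


table = LinearProbingTable(13)
for key in [18, 41, 22, 44, 59, 32, 31, 73]:
    table.insert(key)
print(table.slots)
# [None, None, 41, None, None, 18, 44, 59, 32, 22, 31, 73, None]

table.delete(44)          # slot 6 now holds a tombstone
print(table.search(31))   # 10: probing 5, 6, 7, 8, 9, 10 skips over the tombstone
```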

QUADRATIC PROBING
• Our main requirement now is to eliminate the (primary) CLUSTERING problem
• Instead of step size 1, if location $i$ is occupied, check locations $i + 1^2$, $i + 2^2$, ...
• Function for rehashing: $H_k(\text{key}) = (h(\text{key}) + k^2) \bmod \text{tablesize}$, for $k = 0, 1, 2, \dots$

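EXAMPLE
$H_k(\text{key}) = (\text{key} + k^2) \bmod 11$, for k = 0, 1, 2, ...
• 31 % 11 = 9 → slot 9
• 19 % 11 = 8 → slot 8
• 2 % 11 = 2 → slot 2
• 13 % 11 = 2 (occupied); (13 + 1²) % 11 = 14 % 11 = 3 → slot 3
• 25 % 11 = 3 (occupied); (25 + 1²) % 11 = 26 % 11 = 4 → slot 4
• 24 % 11 = 2 (occupied); (24 + 1²) % 11 = 3 (occupied); (24 + 2²) % 11 = 28 % 11 = 6 → slot 6
• 21 % 11 = 10 → slot 10
• 9 % 11 = 9 (occupied); 9 + 1² = 10 (occupied); (9 + 2²) % 11 = 2 (occupied); (9 + 3²) % 11 = 18 % 11 = 7 → slot 7
Final table (slots 0 to 10): 0: empty, 1: empty, 2: 2, 3: 13, 4: 25, 5: 5, 6: 24, 7: 9, 8: 19, 9: 31, 10: 21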
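A minimal Python sketch that replays the quadratic-probing example, assuming integer keys and an 11-slot list with `None` for empty slots. The function name is hypothetical. Key 5 appears in slot 5 of the example table, but its computation is not shown. It is appended last here, and it lands in slot 5 whenever it is inserted, because no other key probes slot 5. Quadratic probing can miss free slots, so the sketch gives up after `len(table)` attempts. With a prime table size that is at most half full, a free slot is always found.

```python
def quadratic_probe_insert(table, key):
    """Insert key at the first free slot among (key + k*k) % len(table), k = 0, 1, 2, ...

    Returns the slot used; gives up after len(table) attempts (a bound chosen for this sketch).
    """
    m = len(table)
    for k in range(m):
        idx = (key + k * k) % m
        if table[idx] is None:
            table[idx] = key
            return idx
    raise RuntimeError(f"no free slot found for {key}")


table = [None] * 11
for key in [31, 19, 2, 13, 25, 24, 21, 9, 5]:
    quadratic_probe_insert(table, key)
print(table)
# [None, None, 2, 13, 25, 5, 24, 9, 19, 31, 21]
```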

DOUBLE HASHING
• Reduces clustering in a better way
• Uses a second hash function h2 (the offset), such that $h_2 \neq 0$ and $h_2 \neq h_1$
• Concept:
  • First probe at location h1
  • If it's occupied, probe at locations $(h_1 + k \cdot h_2) \bmod \text{tablesize}$ for $k = 1, 2, \dots$, i.e., $(h_1 + h_2)$, $(h_1 + 2h_2)$, ...
  • Linear probing is the special case where the offset is 1
  • If the size of the table is prime (and $1 \le h_2 <$ table size), the technique ensures we look at all table locations

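EXAMPLE
H1(key) = key % 11
H2(key) = 7 - (key % 7)
• 58 % 11 = 3 → slot 3
• 14 % 11 = 3 (occupied); H2 = 7 - (14 % 7) = 7; 3 + 7 = 10 → slot 10
• 91 % 11 = 3 (occupied); H2 = 7 - (91 % 7) = 7; 3 + 7 = 10 (occupied); (3 + 2*7) % 11 = 6 → slot 6
• 25 % 11 = 3 (occupied); H2 = 7 - (25 % 7) = 3; 3 + 3 = 6 (occupied); 3 + 2*3 = 9 → slot 9
• (key % 7) lies between 0 and 6, so H2 always lies between 1 and 7
Final table (slots 0 to 10): 3: 58, 6: 91, 9: 25, 10: 14; all other slots empty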
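A minimal Python sketch of double hashing with the example's two functions, H1(key) = key % 11 and H2(key) = 7 - (key % 7). It assumes an 11-slot list with `None` for empty slots, and the function names are illustrative. Because 11 is prime and H2 lies in 1..7, the probe sequence for k = 0..10 visits every slot, so failing to find a free slot really does mean the table is full.

```python
def h1(key):
    return key % 11  # uses the example's table size, 11


def h2(key):
    # key % 7 lies in 0..6, so h2 lies in 1..7 and is never 0.
    return 7 - (key % 7)


def double_hash_insert(table, key):
    """Probe (h1 + k*h2) % m for k = 0, 1, 2, ... and store key in the first free slot."""
    m = len(table)
    for k in range(m):
        idx = (h1(key) + k * h2(key)) % m
        if table[idx] is None:
            table[idx] = key
            return idx
    raise RuntimeError("table is full")


table = [None] * 11
for key in [58, 14, 91, 25]:
    print(key, "->", double_hash_insert(table, key))
# prints 58 -> 3, then 14 -> 10, 91 -> 6, 25 -> 9 (one per line)
```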

COMPARISON
| | Linear probing | Quadratic probing | Double hashing |
|---|---|---|---|
| Strength | Fastest among the three | Easier to implement and deploy | Makes more efficient use of memory |
| Probes | Uses few probes | Does not necessarily probe all table locations | Uses few probes but takes more time |
| Drawback | Primary clustering | Secondary clustering | More complicated to implement |
| Interval between probes | Fixed, often 1 | Grows with the probe number (offsets 1², 2², 3², ...) | Computed by a second hash function |

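HOW DOES HASHING GET O(1) COMPLEXITY?
• With chaining, each block (e.g., a linked list) stores on average lf = n/m elements, where lf is the load factor. If the load factor is kept bounded by a constant, the expected search time becomes constant.
• If the average number of elements per block grows beyond the chosen load factor, rehash all elements into a bigger hash table.
• The access time of the table depends on the load factor and on how evenly the hash function distributes the keys.
• Chaining, unsuccessful or successful search: expected total time $O(1 + \text{lf})$, including the time required to compute h(k) (assuming simple uniform hashing).
• Open addressing (probing) with lf < 1: expected total time $O\left(\frac{1}{1 - \text{lf}}\right)$ for an unsuccessful search and $O\left(\frac{1}{\text{lf}} \ln \frac{1}{1 - \text{lf}}\right)$ for a successful search, including the time required to compute h(k) (assuming uniform hashing).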
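Rehashing into a larger table is what keeps the load factor, and therefore the expected chain length, bounded. A minimal Python sketch, assuming separate chaining, Python's built-in `hash`, a doubling policy, and a threshold `max_load = 1.0`. The policy, the threshold and the class name are hypothetical choices, not taken from the slides.

```python
class ResizingChainedTable:
    """Separate chaining that rehashes into a table twice as large whenever
    the load factor n/m would exceed max_load (hypothetical threshold)."""

    def __init__(self, size=4, max_load=1.0):
        self.size = size
        self.max_load = max_load
        self.count = 0
        self.slots = [[] for _ in range(size)]

    def load_factor(self):
        return self.count / self.size

    def _rehash(self, new_size):
        old_items = [pair for chain in self.slots for pair in chain]
        self.size = new_size
        self.slots = [[] for _ in range(new_size)]
        for key, value in old_items:  # every key gets a new index hash(k) % new_size
            self.slots[hash(key) % new_size].append((key, value))

    def insert(self, key, value):
        chain = self.slots[hash(key) % self.size]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)  # existing key: no growth needed
                return
        if (self.count + 1) / self.size > self.max_load:
            self._rehash(2 * self.size)
            chain = self.slots[hash(key) % self.size]
        chain.append((key, value))
        self.count += 1

    def search(self, key):
        for k, v in self.slots[hash(key) % self.size]:
            if k == key:
                return v
        return None


t = ResizingChainedTable(size=4, max_load=1.0)
for key in range(10):
    t.insert(key, key * key)
print(t.size, t.load_factor())  # 16 0.625
print(t.search(7))              # 49
```

Doubling means each key is rehashed only O(1) times on average over a long sequence of insertions, so the amortized insertion cost stays constant.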

EXTRA POINTS
• Static hashing: the data is static, i.e., the set of keys is fixed
  Example: the set of reserved words in a programming language, the set of file names on a CD-ROM
• Dynamic hashing: keys can change dynamically
  Example: cache design, hash functions in cryptography

A one-way hash function takes variable-length input (in this case, a message of any length, even thousands or millions of bits) and produces a fixed-length output, say, 160 bits (the message digest).

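[Figure: signing with a one-way hash function. The plaintext passes through the hash function, the message digest is signed with the private key (used for signing), and the plaintext is sent together with the signature.]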
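The fixed-length-output property is easy to check with Python's standard `hashlib` module. The slides do not name a specific algorithm. SHA-1 is used in this sketch only because its digest is 160 bits. It is no longer considered collision-resistant, and SHA-256 is the usual choice for new signature schemes.

```python
import hashlib

# SHA-1 produces a 160-bit (20-byte) digest whatever the input length.
short_msg = b"hello"
long_msg = b"x" * 1_000_000   # one million bytes

for msg in (short_msg, long_msg):
    digest = hashlib.sha1(msg).digest()
    print(len(msg), "bytes in ->", len(digest) * 8, "bits out")
# 5 bytes in -> 160 bits out
# 1000000 bytes in -> 160 bits out
```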

PROBLEM 1
Can you give an algorithm for finding the first non-repeated character in a string? For example, the first non-repeated character in the string "abzddab" is 'z'.
• Brute-force approach: for each character in the string, scan the rest of the string (all other positions). If that character doesn't appear again, we're done; otherwise, move to the next character.
  • O(n²)
• Improvement using hash tables: create a hash table by reading all characters of the input string and keeping their counts. After creating the hash table, scan the string again, in order, and return the first character whose count = 1.
  • O(n)
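A minimal Python sketch of both approaches. It uses `collections.Counter`, a dict subclass, as the hash table of counts. The function names are hypothetical. The second pass walks the string itself rather than the table, so the answer respects the original character order.

```python
from collections import Counter


def first_non_repeated_brute_force(s):
    """O(n^2): for each character, scan all other positions for another occurrence."""
    for i, ch in enumerate(s):
        if all(s[j] != ch for j in range(len(s)) if j != i):
            return ch
    return None


def first_non_repeated_hashing(s):
    """O(n) expected: count characters in a hash table, then scan s again in order."""
    counts = Counter(s)   # character -> number of occurrences
    for ch in s:          # second pass over s preserves the original order
        if counts[ch] == 1:
            return ch
    return None


print(first_non_repeated_brute_force("abzddab"))  # z
print(first_non_repeated_hashing("abzddab"))      # z
```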

PROBLEM 2
Given an array A of 'n' elements, find 2 elements in the array whose sum is equal to a given element 'K'.
• Objective: find indices x ≠ y such that A[x] + A[y] = K
• Brute force: check every pair, O(n²)
• Improving the time complexity, O(n lg n):
  • Sort the array, then maintain 2 indices, 'low = 0' and 'high = n - 1'
  • Compute A[low] + A[high]
  • If the sum is < K, increment 'low'; if it is > K, decrement 'high'
  • If the sum = K, that's the solution... BINGO!
• Alternative approach using hashing, O(n):
  • For each element A[x], check whether K - A[x] already exists in the hash table
  • Existence of such a number means that we are able to find the indices
  • Otherwise, insert A[x] into the hash table and proceed to the next input element
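A minimal Python sketch of the two improved approaches, assuming a list of integers. The function names are hypothetical. The sorting version works on a sorted copy, so it returns the two values rather than their original indices. The hashing version stores value → index in a dict and checks for K - A[x] before inserting A[x], so an element is never paired with itself.

```python
def pair_with_sum_sorting(a, k):
    """O(n lg n): sort a copy, then move two indices toward each other."""
    b = sorted(a)
    low, high = 0, len(b) - 1
    while low < high:
        s = b[low] + b[high]
        if s == k:
            return b[low], b[high]
        if s < k:
            low += 1    # sum too small: move to a larger element
        else:
            high -= 1   # sum too large: move to a smaller element
    return None


def pair_with_sum_hashing(a, k):
    """O(n) expected: for each element, check whether k - a[x] was seen earlier."""
    seen = {}                      # value -> index
    for x, value in enumerate(a):
        if k - value in seen:      # check before inserting a[x]
            return seen[k - value], x
        seen[value] = x
    return None


a = [8, 1, 5, 3, 9]
print(pair_with_sum_sorting(a, 12))  # (3, 9)
print(pair_with_sum_hashing(a, 12))  # (3, 4): a[3] + a[4] = 3 + 9
```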

THANK YOU FOR LISTENING PATIENTLY!
