1.
Hashing
Department of Computer Science
Islamia College University Peshawar
Fall 2012 Semester
BCS course: CS 00 Analysis of Algorithms
Course Instructor: Mr. Zahid
12/30/13
Lecture #9, adapted from slides by Dr Onaiza Maqbol
2.
Dictionary
A dictionary T holds n records, each identified by a key
What data structure should be used to implement T?
Wednesday, March 18, 2009
3.
Hashing
4.
Direct Addressing
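Assumptions
The keys are drawn from a universe U = {0, 1, …, u-1}
Keys are distinct
Create a table T[0..u-1], in which slot k holds the record with key k (or NIL if there is none)
Benefit
Each operation (insert, search, delete) takes constant time
Drawbacks
The range of keys can be large, so a table of size u may be impractical, and most of its slots are wasted when only a few keys are actually stored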
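A minimal Python sketch of a direct-address table, assuming integer keys in 0..u-1 and using None for NIL. The class and method names are illustrative, not taken from the slides.

```python
from typing import Any, List, Optional


class DirectAddressTable:
    """Direct addressing: the record with key k lives in slot k of T[0..u-1]."""

    def __init__(self, u: int) -> None:
        # One slot per possible key; None plays the role of NIL.
        self.T: List[Optional[Any]] = [None] * u

    def insert(self, key: int, record: Any) -> None:
        self.T[key] = record      # O(1): write straight into slot `key`

    def search(self, key: int) -> Optional[Any]:
        return self.T[key]        # O(1): None means "no record with this key"

    def delete(self, key: int) -> None:
        self.T[key] = None        # O(1)


table = DirectAddressTable(u=10)
table.insert(3, "record for key 3")
table.insert(7, "record for key 7")
table.delete(7)
```

Every operation is a single array access, but the table occupies u slots however few keys are stored, which is the drawback noted above.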
5.
Hashing
Solution
Use a hash function h to map the universe U of all keys into {0, 1, …, m–1}
6.
Hash Table
The mapped keys are stored in a table called a hash table
The table consists of m cells (slots)
When the number of keys actually stored is much smaller than |U|, a hash table requires much less storage than a direct-address table
With direct addressing, an element with key k is stored in slot k; with hashing, this element is stored in slot h(k)
So the hash function is h : U → {0, 1, …, m–1}
h(k) is called the hash value of key k
7.
Hashing Functions - Modulo Function
Several functions can be used to map keys into a set of integers. The
choice is made on the basis of the computation time required and the
simplicity of the computational steps. A common choice is the modulo
(division) function h(k), defined as:
h(k) = k mod m
where k is the key, m is a positive integer, and mod denotes the
modulus operator, which computes the remainder when k is divided by m.
It follows that the hash function h(k) maps the set of keys {k1, k2, …, kn}
into the set of integers {0, 1, …, m-1}.
In essence, the modulo function is used to create a hash table of size m.
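A short Python illustration of h(k) = k mod m. The table size m = 7 and the sample keys are arbitrary choices for the example; a prime m not too close to a power of 2 is a common recommendation.

```python
def mod_hash(k: int, m: int) -> int:
    """Modulo (division) method: h(k) = k mod m, a slot index in 0..m-1."""
    return k % m


m = 7                                  # illustrative table size (a prime)
keys = [10, 22, 37, 40, 60, 70, 75]    # sample keys
slots = {k: mod_hash(k, m) for k in keys}
print(slots)
# {10: 3, 22: 1, 37: 2, 40: 5, 60: 4, 70: 0, 75: 5}  (40 and 75 collide in slot 5)
```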
8.
Modulo Function (contd…)
9.
Hashing Functions - Multiplication Method
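The body of this slide did not survive extraction. As a hedged sketch, the textbook form of the multiplication method is h(k) = ⌊m (kA mod 1)⌋ for a constant 0 < A < 1. The choice A = (√5 − 1)/2 is Knuth's commonly cited suggestion and is an assumption here, not necessarily what the original slide used; floating point is used only for brevity.

```python
import math

# Knuth's suggested constant A = (sqrt(5) - 1) / 2, roughly 0.618034
A = (math.sqrt(5) - 1) / 2


def mult_hash(k: int, m: int, a: float = A) -> int:
    """Multiplication method: h(k) = floor(m * (k*A mod 1)), with 0 < A < 1."""
    frac = (k * a) % 1.0          # fractional part of k*A, in [0, 1)
    return math.floor(m * frac)   # scale up to a slot in 0..m-1


print(mult_hash(123456, 10000))   # 41
```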
10.
Hashing of Strings
11.
ASCII Sum Method
12.
Radix Method
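The string-hashing slides lost their bodies in extraction. The sketch below assumes the usual readings of the two names: the ASCII sum method adds the character codes and reduces mod m, and the radix method reads the string as a number in base 128 (the base is an assumption) and reduces mod m. The table size m = 101 is arbitrary. The anagrams "stop" and "pots" collide under the sum but not under the radix method.

```python
def ascii_sum_hash(s: str, m: int) -> int:
    """ASCII sum method: add up the character codes, then reduce mod m."""
    return sum(ord(c) for c in s) % m


def radix_hash(s: str, m: int, radix: int = 128) -> int:
    """Radix method: read s as a base-`radix` number, reduced mod m.

    Horner's rule with a reduction at every step keeps the intermediate
    values small; the result equals (the full base-`radix` number) mod m.
    """
    h = 0
    for c in s:
        h = (h * radix + ord(c)) % m
    return h


m = 101
print(ascii_sum_hash("stop", m), ascii_sum_hash("pots", m))  # 50 50
print(radix_hash("stop", m), radix_hash("pots", m))          # 39 2
```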
13.
Universal Hashing
14.
Universal Hashing (contd…)
h_{a,b}(k) = ((ak + b) mod p) mod m, where p is a prime large enough that every possible key k is in the range 0
to p-1, inclusive, 0 < a < p and 0 ≤ b < p
The collection of all such functions h_{a,b} forms a universal family of hash functions
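A small Python sketch, with p = 17 and m = 6 chosen purely for illustration. It draws a random member of the family and, by brute force over every (a, b), checks the defining property of universality: for any two distinct keys, at most |H|/m of the functions make them collide. The helper names are hypothetical.

```python
import random


def h_ab(k: int, a: int, b: int, p: int, m: int) -> int:
    """One member of the family: h_{a,b}(k) = ((a*k + b) mod p) mod m."""
    return ((a * k + b) % p) % m


def random_universal_hash(p: int, m: int, rng: random.Random):
    """Draw a in {1..p-1} and b in {0..p-1}; p must be a prime > every key."""
    a = rng.randrange(1, p)
    b = rng.randrange(p)
    return lambda k: h_ab(k, a, b, p, m)


def colliding_functions(x: int, y: int, p: int, m: int) -> int:
    """Count the (a, b) pairs, i.e. family members, for which h(x) == h(y)."""
    return sum(1 for a in range(1, p) for b in range(p)
               if h_ab(x, a, b, p, m) == h_ab(y, a, b, p, m))


p, m = 17, 6                      # small illustrative prime and table size
family_size = (p - 1) * p         # number of distinct (a, b) choices
worst = max(colliding_functions(x, y, p, m)
            for x in range(p) for y in range(x + 1, p))
print(worst, "<=", family_size / m)   # universality: at most |H|/m collide

h = random_universal_hash(p, m, random.Random(2012))
print([h(k) for k in range(p)])
```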
15.
Perfect Hashing
16.
Perfect Hashing
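[Figure: two-level perfect hashing table with outer slots 0 to 8; slot 2 points to a secondary table S2 with m2 = 4, a2 = 10, b2 = 18, holding keys 60 and 75]
Using perfect hashing to store {10, 22, 37, 40, 60, 70, 75}: the outer hash function
is h_{a,b}(k) = ((ak + b) mod p) mod m, where a = 3, b = 42, p = 101 and m = 9. For example,
h(75) = 2, so 75 belongs to secondary table S2, whose hash function is h2(k) = ((a2 k + b2) mod p) mod m2.
Since h2(75) = 1, 75 is stored in slot 1 of secondary hash table S2.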
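The example can be checked with a short Python sketch. The outer parameters come from the caption; the secondary parameters m2 = 4, a2 = 10, b2 = 18 are read from the figure and, like the secondary function h2(k) = ((a2 k + b2) mod p) mod m2, should be treated as an assumption.

```python
def h_ab(k: int, a: int, b: int, p: int, m: int) -> int:
    """h_{a,b}(k) = ((a*k + b) mod p) mod m."""
    return ((a * k + b) % p) % m


P = 101
keys = [10, 22, 37, 40, 60, 70, 75]

# Level 1: outer hash with a = 3, b = 42, m = 9 groups the keys by slot.
buckets = {}
for k in keys:
    buckets.setdefault(h_ab(k, 3, 42, P, 9), []).append(k)
print(buckets)  # {0: [10], 7: [22, 37, 40], 2: [60, 75], 5: [70]}

# Level 2 for slot 2: n2 = 2 keys, so m2 = 2**2 = 4; a2 = 10, b2 = 18.
S2 = [None] * 4
for k in buckets[2]:
    j = h_ab(k, 10, 18, P, 4)
    assert S2[j] is None, "secondary collision: would need new a2, b2"
    S2[j] = k
print(S2)       # [60, 75, None, None]
```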
17.
Collisions
Two or more keys may hash to the same slot
When a record to be inserted maps to an already occupied slot in
T, a collision occurs
Can we avoid collisions altogether?
Not if |U| > m: by the pigeonhole principle, at least two keys in U must then hash to the same slot
We need a method to resolve the collisions that occur
18.
Collisions
19.
Collision Resolution
Two basic approaches to collision resolution are called chained
hashing and open address hashing
Chained Hashing: In chained hashing the elements of a hash
table are stored in a set of linked lists.
All colliding elements are kept in one linked list.
The list head pointers are usually stored in an array.
Chained hashing is also known as open hashing
Open Address Hashing: In open address hashing, the hashed
keys are stored in the hash table itself.
The colliding keys are allocated distinct cells in the table.
Open address hashing is also referred to as closed hashing
20.
Collision Resolution by Chaining
Records in the same slot are linked into a list
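A minimal Python sketch of chaining, assuming the modulo hash h(k) = k mod m. A deque stands in for each linked list so that insertion at the head is O(1); the class name is illustrative.

```python
from collections import deque
from typing import List


class ChainedHashTable:
    """Collision resolution by chaining: slot j holds every key with h(k) = j."""

    def __init__(self, m: int) -> None:
        self.m = m
        self.slots: List[deque] = [deque() for _ in range(m)]

    def h(self, k: int) -> int:
        return k % self.m                    # assumed hash: modulo method

    def insert(self, k: int) -> None:
        self.slots[self.h(k)].appendleft(k)  # O(1) insert at the head of the list

    def search(self, k: int) -> bool:
        return k in self.slots[self.h(k)]    # scans only the list T[h(k)]

    def delete(self, k: int) -> None:
        chain = self.slots[self.h(k)]
        if k in chain:
            chain.remove(k)                  # removes the first occurrence


t = ChainedHashTable(m=7)
for k in [10, 22, 37, 40, 60, 70, 75]:
    t.insert(k)
print(list(t.slots[5]))  # [75, 40]: both keys hash to slot 5
```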
21.
Collision Resolution by Chaining (contd…)
22.
Analysis of Hashing with Chaining
How long does it take to search for an element with a given key?
Let n be the number of keys in the table, and let m be the number
of slots
Define the load factor of T to be α = n/m = average number of
keys per slot
Analysis is in terms of α, which can be less than, equal to, or
greater than 1
23.
Worst Hashing - Searching
All keys are hashed to a single list.
This situation may be referred to as the worst distribution of hash keys
In practice this extreme situation may not arise, but the possibility
nevertheless exists
The worst-case time for searching is thus Θ(n), plus the time to compute the hash
function
The best-case search time is Θ(1), when the key is found in the first node
On average, half the list is examined, so the average search time is still Θ(n)
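A contrived but concrete worst case, assuming h(k) = k mod m with m = 10 and keys that are all multiples of 10, so every key lands in one list. Counting the nodes examined per successful search reproduces the "half the list on average" figure.

```python
m = 10
keys = [10 * i for i in range(100)]   # every key is a multiple of m
chains = [[] for _ in range(m)]
for k in keys:
    chains[k % m].append(k)           # h(k) = k mod m sends all of them to slot 0

print([len(c) for c in chains])       # [100, 0, 0, 0, 0, 0, 0, 0, 0, 0]


def comparisons(k: int) -> int:
    """Number of list nodes examined to find k in its chain."""
    return chains[k % m].index(k) + 1


print(comparisons(0), comparisons(990))                 # 1 100
print(sum(comparisons(k) for k in keys) / len(keys))    # 50.5
```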
24.
Worst Hashing - Insertion
Inserting a key at the head of its list takes Θ(1) time in the worst case
This assumes that the key is not already present in the table
To check for presence, a search for the key is required; as just
mentioned, the worst-case search time is Θ(n)
Thus the worst-case running time of insertion with a presence check is Θ(n)
The average running time of insertion is also Θ(n)
25.
Simple Uniform Hashing - Searching
The keys are uniformly distributed among all the linked lists, i.e. it is
assumed that any given element is equally likely to hash into any of the
m slots
Denote the length of list T[j], for j = 0, 1, …, m-1, by n_j, so that
n = n_0 + n_1 + … + n_{m-1}, and the expected value of n_j is E[n_j] = α = n/m
We assume that the hash value h(k) can be computed in O(1) time
So the time required to search for an element with key k depends linearly on
the length n_{h(k)} of the list T[h(k)]
26.
Simple Uniform Hashing - Searching
Two cases
Unsuccessful search
Successful search
Unsuccessful search
The expected time to search unsuccessfully for a key k is the expected time to search to
the end of list T[h(k)], which has expected length E[n_{h(k)}] = α
Adding the O(1) time to compute h(k), the total time required is Θ(1 + α)
27.
Simple Uniform Hashing - Insertion
To find the average time for inserting a key, consider the case
when the kth key is inserted. At that stage, the table already holds k-1 keys
distributed uniformly over the m linked lists. Thus, prior to insertion of the kth
key, the average length of each list is (k-1)/m, as shown in the diagram
28.
Simple Uniform Hashing - Insertion
Inserting the new key requires probing (k-1)/m keys on average, plus the cost of
adding the new key.
Thus, the overall cost of inserting the kth key is 1 + (k-1)/m, assuming that each
operation takes unit time.
The expected cost of inserting a key is obtained by averaging over all possible
values of k = 1, 2, …, n. Thus, the expected cost I is given by
I = (1/n) Σ_{k=1}^{n} [1 + (k-1)/m] = 1 + (n-1)/(2m)
The average cost of inserting a key is therefore I = 1 + α/2 - 1/(2m) = Θ(1 + α)
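The closed form can be sanity-checked in exact arithmetic: averaging the per-insertion cost 1 + (k-1)/m over k = 1..n should equal 1 + α/2 - 1/(2m). A small Python sketch using `fractions.Fraction`; the function names are hypothetical.

```python
from fractions import Fraction


def average_insert_cost(n: int, m: int) -> Fraction:
    """(1/n) * sum over k = 1..n of (1 + (k-1)/m), in exact arithmetic."""
    return sum(1 + Fraction(k - 1, m) for k in range(1, n + 1)) / n


def closed_form(n: int, m: int) -> Fraction:
    """1 + alpha/2 - 1/(2m), with alpha = n/m."""
    alpha = Fraction(n, m)
    return 1 + alpha / 2 - Fraction(1, 2 * m)


for n, m in [(1, 1), (10, 4), (100, 37), (500, 1000)]:
    assert average_insert_cost(n, m) == closed_form(n, m)
print(average_insert_cost(10, 4))  # 17/8
```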
29.
Simple Uniform Hashing - Searching
Successful search
We assume that the element x being searched for is equally likely to be any
of the n elements stored in the table
The number of elements examined is one more than the number of
elements that appear before x in x's list
Since new elements are inserted at the head of a list, the elements before x in the list were all inserted after x was
The total expected number of elements examined in a successful search is 1 + α/2 - α/(2n) = Θ(1 + α)
If n = O(m), then α = n/m = O(m)/m = O(1)
Thus searching takes constant time on average
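The successful-search figure can be derived as follows, sketched in the standard textbook style under simple uniform hashing with insertion at the head of each list. Here x_i is the ith element inserted, k_i its key, and X_{ij} the indicator that h(k_i) = h(k_j), so E[X_{ij}] = 1/m; these symbols are introduced only for this derivation.

```latex
\begin{aligned}
E\left[\frac{1}{n}\sum_{i=1}^{n}\Bigl(1+\sum_{j=i+1}^{n}X_{ij}\Bigr)\right]
  &= \frac{1}{n}\sum_{i=1}^{n}\Bigl(1+\sum_{j=i+1}^{n}\frac{1}{m}\Bigr)
   = 1+\frac{1}{nm}\sum_{i=1}^{n}(n-i) \\
  &= 1+\frac{1}{nm}\cdot\frac{n(n-1)}{2}
   = 1+\frac{n-1}{2m}
   = 1+\frac{\alpha}{2}-\frac{\alpha}{2n}.
\end{aligned}
```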
30.
Open Addressing
All elements are stored in the hash table itself
In open addressing, the hash table can fill up, so that no further
insertions can be made
The load factor α can never exceed 1
An advantage is that open addressing avoids pointers altogether
The memory saved on pointers can instead provide the hash table with a larger number of
slots for the same amount of memory
31.
Insertion
We successively examine, or probe, the hash table until we find an
empty slot in which to put the key
The sequence of positions probed depends upon the key being
inserted
To determine which slots to probe, we extend the hash function to
include the probe number as a second input. Thus the hash function
becomes:
h : U × {0, 1, …, m-1} → {0, 1, …, m-1}
For every key k, the probe sequence ⟨h(k,0), h(k,1), …, h(k,m-1)⟩ should be a permutation of ⟨0, 1, …, m-1⟩, so that every slot is eventually considered
32.
Pseudo code
HASH-INSERT(T, k)
i ← 0
repeat j ← h(k, i)
    if T[j] = NIL
        then T[j] ← k
             return j
        else i ← i + 1
until i = m
error "Table full"
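A direct Python transcription of HASH-INSERT, with None standing in for NIL and the probe function passed in as a parameter. The demo assumes linear probing with h'(k) = k mod m and m = 7; those choices and the function names are illustrative.

```python
from typing import Callable, List, Optional


def hash_insert(T: List[Optional[int]], k: int,
                h: Callable[[int, int], int]) -> int:
    """Open-addressing insert following HASH-INSERT; None stands for NIL."""
    m = len(T)
    for i in range(m):            # probe numbers i = 0, 1, ..., m-1
        j = h(k, i)
        if T[j] is None:          # empty slot found: store the key here
            T[j] = k
            return j
    raise RuntimeError("Table full")  # all m probes hit occupied slots


m = 7
T: List[Optional[int]] = [None] * m


def linear(k: int, i: int) -> int:
    """Assumed probe function: linear probing with h'(k) = k mod m."""
    return (k % m + i) % m


for key in [10, 22, 37, 40, 60, 70, 75]:
    hash_insert(T, key, linear)
print(T)  # [70, 22, 37, 10, 60, 40, 75]
```

Key 75 first probes slot 5, already taken by 40, and settles in slot 6; after seven inserts the table is full, so an eighth insert raises the error.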
33.
Linear Probing
In linear probing, successive probes examine consecutive slots: the auxiliary
hash value is incremented by i on the ith probe, wrapping around the table.
The hash function is defined as
h(k,i) = (h’(k) + i) mod m,
where h’(k) is an auxiliary hash function and m is the table size.
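A small Python sketch of the linear-probing function, assuming h'(k) = k mod m when no auxiliary hash is supplied. The probe sequence for any key is a rotation of 0..m-1, so every slot is eventually probed.

```python
from typing import Callable, Optional


def make_linear_probe(m: int,
                      aux: Optional[Callable[[int], int]] = None
                      ) -> Callable[[int, int], int]:
    """Return h(k, i) = (h'(k) + i) mod m; h' defaults to k mod m (an assumption)."""
    h1 = aux if aux is not None else (lambda k: k % m)

    def h(k: int, i: int) -> int:
        return (h1(k) + i) % m

    return h


h = make_linear_probe(7)
print([h(75, i) for i in range(7)])  # [5, 6, 0, 1, 2, 3, 4]
```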
34.
Linear Probing (contd…)
35.
Searching
HASH-SEARCH(T, k)
i ← 0
repeat j ← h(k, i)
    if T[j] = k
        then return j
    i ← i + 1
until T[j] = NIL or i = m
return NIL
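A Python transcription of HASH-SEARCH. The search stops at the first empty slot, because an insertion of k would have used that slot. The demo table and the linear-probing function with m = 7 are assumptions chosen so that three keys share h'(k) = 5.

```python
from typing import Callable, List, Optional


def hash_search(T: List[Optional[int]], k: int,
                h: Callable[[int, int], int]) -> Optional[int]:
    """Open-addressing search following HASH-SEARCH; returns a slot index or None."""
    m = len(T)
    for i in range(m):
        j = h(k, i)
        if T[j] == k:             # found the key
            return j
        if T[j] is None:          # empty slot: k would have been stored here
            return None
    return None                   # probed all m slots without success


def linear(k: int, i: int, m: int = 7) -> int:
    """Assumed probe function: linear probing with h'(k) = k mod m."""
    return (k % m + i) % m


# 40, 75 and 12 all have h'(k) = 5, so they occupy slots 5, 6 and 0.
T = [12, None, None, None, None, 40, 75]
print(hash_search(T, 12, linear))  # 0
print(hash_search(T, 19, linear))  # None
```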
36.
Quadratic Probing
37.
Quadratic Probing
38.
Quadratic Probing
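The quadratic probing slides lost their bodies in extraction. As a hedged sketch, the textbook form is h(k, i) = (h'(k) + c1·i + c2·i²) mod m with c2 ≠ 0; here h'(k) = k mod m is assumed. The example contrasts c1 = 0, c2 = 1 with m = 8, which reaches only three slots, against c1 = c2 = 1/2 (offsets i(i+1)/2), which reaches every slot when m is a power of 2.

```python
def quadratic_probe(k: int, i: int, m: int, c1: int, c2: int) -> int:
    """h(k, i) = (h'(k) + c1*i + c2*i*i) mod m, assuming h'(k) = k mod m."""
    return (k % m + c1 * i + c2 * i * i) % m


def triangular_probe(k: int, i: int, m: int) -> int:
    """The case c1 = c2 = 1/2: offsets i(i+1)/2, computed in integers."""
    return (k % m + i * (i + 1) // 2) % m


m = 8  # a power of 2
plain = {quadratic_probe(5, i, m, c1=0, c2=1) for i in range(m)}
tri = [triangular_probe(5, i, m) for i in range(m)]
print(sorted(plain))  # [1, 5, 6]: only 3 of the 8 slots are ever probed
print(sorted(tri))    # [0, 1, 2, 3, 4, 5, 6, 7]: every slot is probed
```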