Sunawar Khan
Islamia University of Bahawalpur
Bahawalnagar Campus
Analysis o f Algorithm
‫خان‬ ‫سنور‬ Algorithm Analysis
Collision
Hash Table
Direct Address Table
Hash Functions
Collision Resolution
Contents
‫خان‬ ‫سنور‬ Algorithm Analysis
Dictionary
• Dictionary:
• Dynamic-set data structure for storing items indexed using keys.
• Supports operations Insert, Search, and Delete.
• Applications:
• Symbol table of a compiler.
• Memory-management tables in operating systems.
• Large-scale distributed systems.
• Hash Tables:
• Effective way of implementing dictionaries.
• Generalization of ordinary arrays.
‫خان‬ ‫سنور‬ Algorithm Analysis
Direct-address Tables
• Direct-address Tables are ordinary arrays.
• Facilitate direct addressing.
• Element whose key is k is obtained by indexing into the kth position of the array.
• Applicable when we can afford to allocate an array with one position for
every possible key.
• i.e. when the universe of keys U is small.
• Dictionary operations can be implemented to take O(1) time.
• Details in Sec. 11.1.
‫خان‬ ‫سنور‬ Algorithm Analysis
Hash Tables
• Notation:
• U – Universe of all possible keys.
• K – Set of keys actually stored in the dictionary.
• |K| = n.
• When U is very large,
• Arrays are not practical.
• |K| << |U|.
• Use a table of size proportional to |K| – The hash tables.
• However, we lose the direct-addressing ability.
• Define functions that map keys to slots of the hash table.
‫خان‬ ‫سنور‬ Algorithm Analysis
Hashing
• A range of key values can be transformed into a range of array
index values.
• A simple array can be used where each record occupies one cell
of the array and the index number of the cell is the key value for
that record.
• But keys may not be well arranged.
• In such a situation hash tables can be used.
‫خان‬ ‫سنور‬ Algorithm Analysis
Hashing
• Hash function h: Mapping from U to the slots of a hash table T[0..m–1].
h : U  {0,1,…, m–1}
• With arrays, key k maps to slot A[k].
• With hash tables, key k maps or “hashes” to slot T[h[k]].
• h[k] is the hash value of key k.
‫خان‬ ‫سنور‬ Algorithm Analysis
Hashing
0
m–1
h(k1)
h(k4)
h(k2)=h(k5)
h(k3)
U
(universe of keys)
K
(actual
keys)
k1
k2
k3
k5
k4
collision
‫خان‬ ‫سنور‬ Algorithm Analysis
Issues with Hashing
• Multiple keys can hash to the same slot – collisions are possible.
• Design hash functions such that collisions are minimized.
• But avoiding collisions is impossible.
• Design collision-resolution techniques.
• Search will cost Ө(n) time in the worst case.
• However, all operations can be made to have an expected complexity of Ө(1).
‫خان‬ ‫سنور‬ Algorithm Analysis
What is a Hash Table ?
• The simplest kind of hash
table is an array of records.
• This example has 701
records.
[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ]
An array of records
. . .
[ 700]
‫خان‬ ‫سنور‬ Algorithm Analysis
What is a Hash Table ?
• Each record has a special
field, called its key.
• In this example, the key is
a long integer field called
Number.
[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ]
. . .
[ 700]
[ 4 ]
Number 506643548
‫خان‬ ‫سنور‬ Algorithm Analysis
What is a Hash Table ?
• The number might be a
person's identification
number, and the rest of the
record has information
about the person.
[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ]
. . .
[ 700]
[ 4 ]
Number 506643548
‫خان‬ ‫سنور‬ Algorithm Analysis
What is a Hash Table ?
• When a hash table is in use,
some spots contain valid
records, and other spots are
"empty".
[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]
Number 506643548Number 233667136Number 281942902
Number 155778322
. . .
‫خان‬ ‫سنور‬ Algorithm Analysis
Inserting a New Record
• In order to insert a new
record, the key must
somehow be converted to an
array index.
• The index is called the hash
value of the key.
[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]
Number 506643548Number 233667136Number 281942902
Number 155778322
. . .
Number 580625685
‫خان‬ ‫سنور‬ Algorithm Analysis
Inserting a New Record
• Typical way create a hash
value:
[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]
Number 506643548Number 233667136Number 281942902
Number 155778322
. . .
Number 580625685
(Number mod 701)
What is (580625685 mod 701) ?
‫خان‬ ‫سنور‬ Algorithm Analysis
Inserting a New Record
• Typical way to create a hash
value:
[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]
Number 506643548Number 233667136Number 281942902
Number 155778322
. . .
Number 580625685
(Number mod 701)
What is (580625685 mod 701) ?
3
‫خان‬ ‫سنور‬ Algorithm Analysis
Inserting a New Record
• The hash value is used for
the location of the new
record.
Number 580625685
[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]
Number 506643548Number 233667136Number 281942902
Number 155778322
. . .
[3]
‫خان‬ ‫سنور‬ Algorithm Analysis
Writing a hash function
• If we write a hash table that can store objects, we need a hash function
for the objects, so that we know what index to store them
• We want a hash function to:
1. Be simple/fast to compute
2. Map equal elements to the same index
3. Map different elements to different indexes
4. Have keys distributed evenly among indexes
‫خان‬ ‫سنور‬ Algorithm Analysis
Hash Function
• Need to compress the huge range of numbers.
• arrayIndex = hugenumber % smallRange;
• This is a hash function.
• It hashes a number in a large range into a number in a smaller range,
corresponding to the index numbers in an array.
• An array into which data is inserted using a hash function later is called a
hash table.
‫خان‬ ‫سنور‬ Algorithm Analysis
Hash Functions
• A good hash function is simple and can be computed quickly.
• Speed degrades if hash function is slow.
• Purpose is to transform a range of key values into index values such that
the key values are distributed randomly across all the indices of the hash
table.
• Keys may be completely random or not so random.
‫خان‬ ‫سنور‬ Algorithm Analysis
Hash Functions
Hash Functions
• Direct Hashing
• Subtraction Method
• Modulo Division
• Digit Extraction Method
• Mid-Square Method
• Folding Method
• Rotation Method
• Pseudo Random Methods
‫خان‬ ‫سنور‬ Algorithm Analysis
Direct Hashing
• No algorithm Manipulation
• No Collision
• Limited
• Not Suitable for large key Values
KEY H(K) Address
‫خان‬ ‫سنور‬ Algorithm Analysis
Modulo Division Method
• Map a key k into one of the m slots by taking the remainder of k divided by
m. That is,
h(k) = k mod m where m is the list size
• Example: m = 31 and k = 78  h(k) = 16.
• Advantage: Fast, since requires just one division operation.
• Disadvantage: Have to avoid certain values of m.
• Don’t pick certain values, such as m=2p
• Or hash won’t depend on all bits of k.
• Good choice for m:
• Primes, not too close to power of 2 (or 10) are good.
‫خان‬ ‫سنور‬ Algorithm Analysis
Mid Square Hashing
• Def:- In mid-square hashing the key is squared and the address is selected
from the middle of the square number
• Address = (Middle Digits of Key)2
• Middle of Square
• Key is squared and the address is selected from the middle of the square
number
• Problem:- Non- Uniform distribution of the key.
Size of the key is too large
Variation:- Use only a portion of the key
‫خان‬ ‫سنور‬ Algorithm Analysis
Example
•Key = 9452 * 9452 = 89340304 = 3403
•Variation Explanation:-
•379452 :- 379 * 379 = 143641 = 364
‫خان‬ ‫سنور‬ Algorithm Analysis
Digit Extraction
• Using digit Extraction selected digits are extracted from the key and used
as the address.
:- Address = Selected digits from the key
Example:-
Using SIX digit employee number to hash to a three digit address, we
could select the first, third and fourth digits(from the left) and use them as
the address.
• The keys are
• 379452 = 394
• 121267 = 112
• 378845 = 388
• 045128 = 051
051 045128
112 121267
388 378845
394 379452
‫خان‬ ‫سنور‬ Algorithm Analysis
Folding Method for Hashing
• Break the key into groups of digits and add the groups.
• The number of digits in a group should correspond to the size of the array.
Fold-Shift Hashing Fold-Boundary Hashing
Key value is divided into parts whose size
matches the size required address
Left and right number are folded on a
fixed boundary between them the center
no.
K = 123456789{3}
123 + 456 + 789
123
456
789
1368
K = 123456789{3}
123 + 456 + 789
321
456
987
1764
H(k) = 368 H(k) = 764
‫خان‬ ‫سنور‬ Algorithm Analysis
Pseudo-Random Method
Key
Pseudo Random
Number Generator
Random Number Modulo Division Address
Key = y = ax + b
For maximum efficacy a and c should be prime numbers
Where
• Y is random number
• a is chosen co-efficient
• c is constant
H(k) = Y MOD n
Where n is list size
‫خان‬ ‫سنور‬ Algorithm Analysis
Random Keys
• If the world were perfect, Evenly distributed NOT!
• A perfect hash function maps every key into a different table location.
• In most cases large number of keys are compressed into a smaller range of
index numbers.
• Distribution of key values in a particular database determines what the
hash function needs to be.
• For random keys: index = key % arraySize;
‫خان‬ ‫سنور‬ Algorithm Analysis
Example
•Example:-
Key = 121267 a = 17 c = 7 listSize = 307
Address = ((17 * 121267) + 7 MOD 307 + 1
= (2061539 + 7) MOD 307 + 1
= 41 + 1
= 42
‫خان‬ ‫سنور‬ Algorithm Analysis
Subtraction Hashing
• Subtract a fixed number from key
• H(k) = k – c
• Where c is fixed number
• Like Direct Hashing
• No Collision
• Suitable for Small List
‫خان‬ ‫سنور‬ Algorithm Analysis
Non-random Keys
• Consider a number of the form 030-050-20-92-7.
• Every digit serves a purpose. The last 3 digits are redundant for error
checking.
• These digits shouldn’t be considered.
• Every part of the remaining key should contribute to the data.
• Use a prime number for the modulo base.
‫خان‬ ‫سنور‬ Algorithm Analysis
0
1 21
2
3
4 34
5
6
7 7
8 18
9
• Collision: the event that two hash table elements map into
the same slot in the array
• Example: add 41, 34, 7, 18, then 21
• 21 hashes into the same slot as 41!
• 21 should not replace 41 in the hash table;
they should both be there
Collision Resolution: a strategy for fixing collisions in a hash
table
Hash collisions
‫خان‬ ‫سنور‬ Algorithm Analysis
Collisions
• Two words can hash to the same array index, resulting in collision.
•Open Addressing: Search the array in some systematic way for an
empty cell and insert the new item there if collision occurs.
•Separate chaining: Create an array of linked list of words, so that the
item can be inserted into the linked list if collision occurs.
‫خان‬ ‫سنور‬ Algorithm Analysis
Collision Resolution Methods
Collision
Resolution
Open
Addressing
Linear
Probing
Quadratic
Probing
Pseudo
Random
Methods
Key Offset
Linked List Buckets
‫خان‬ ‫سنور‬ Algorithm Analysis
Open Addressing
• Three methods to find next vacant cell:
• Linear Probing :- Search sequentially for vacant cells, incrementing the
index until an empty cell is found.
• Clustering is a problem occurring in linear probing.
• As the array gets full, clusters grow larger, resulting in very long probe
lengths.
• Array can be expanded if it becomes too full.
‫خان‬ ‫سنور‬ Algorithm Analysis
Open Addressing
• Open Addressing is
• a collision resolution strategy
• on a collision, look for another empty spot in the array
• previous discussed examples are all examples of open addressing
• Linear probing
• Quadratic probing
• Double hashing
• Look-up for open addressing scheme must continue looking for item until it finds it
or an empty slot.
‫خان‬ ‫سنور‬ Algorithm Analysis
0
1 41
2 21
3
4 34
5
6
7 7
8 18
9 57
• linear probing: resolving collisions in slot i by putting the
colliding element into the next available slot (i+1, i+2, ...)
• add 41, 34, 7, 18, then 21, then 57
• 21 collides (41 is already there), so we search ahead until we find
empty slot 2
• 57 collides (7 is already there), so we search ahead twice until we find
empty slot 9
• lookup algorithm becomes slightly modified; we have to loop
now until we find the element or an empty slot
• what happens when the table gets mostly full?
Linear probing
‫خان‬ ‫سنور‬ Algorithm Analysis
0 49
1 58
2 9
3
4
5
6
7
8 18
9 89
• Clustering: nodes being placed close together by probing,
which degrades hash table's performance
• add 89, 18, 49, 58, 9
• now searching for the value 28 will have to check half the hash
table! no longer constant time...
Clustering problem
‫خان‬ ‫سنور‬ Algorithm Analysis
Quadratic Probing
• load factor = nItems / arraySize;
• If load factor isn’t high, clusters can form.
• In quadratic probing more widely separated cells are probed.
• The step is the square of the step number.
• If index is x, the probe goes to x+1, x+4, x+9, x+16 and so on.
• Eliminates primary clustering, but all the keys that hash to a particular cell
follow the same sequence in trying to find a vacant cell (secondary
clustering).
‫خان‬ ‫سنور‬ Algorithm Analysis
0 49
1
2 58
3 9
4
5
6
7
8 18
9 89
• Quadratic probing: resolving collisions on slot i by putting
the colliding element into slot i+1, i+4, i+9, i+16, ...
• add 89, 18, 49, 58, 9
• 49 collides (89 is already there), so we search ahead by +1 to empty slot
0
• 58 collides (18 is already there), so we search ahead by +1 to occupied
slot 9, then +4 to empty slot 2
• 9 collides (89 is already there), so we search ahead by +1 to occupied
slot 0, then +4 to empty slot 3
• clustering is reduced
• what is the lookup algorithm?
Quadratic probing
‫خان‬ ‫سنور‬ Algorithm Analysis
Double Hashing
• Better solution.
• Generate probe sequences that depend on the key instead of being the same for every
key.
• Hash the key a second time using a different hash function and use the result as the
step size.
• Step size remains constant throughout a probe, but its different for different keys.
• Secondary hash function should not be the same as primary hash function.
• It must never output a zero.
• stepSize = constant – (key % constant);
• Requires that size of hash table is a prime number.
‫خان‬ ‫سنور‬ Algorithm Analysis
0
1
2
3 49
4
5
6
7
8 18
9 89
• Double hashing:
• Pick a secondary hash function hash2().
• when hashing item x, resolving collisions on slot i by putting the
colliding element into slot i+hash2(x), i+2*hash2(x),
i+3*hash2(x), i+4*hash2(x), ...
• Suppose hash2(x) = (x / 10) % 10.
• add 89, 18, 49, 58: What happens?
• 49 collides (89 is already there); hash2(x) = 4, so check location i + 4
next; put 49 in slot 3.
• 58 collides (18 is already there); hash2(x) = 5, so check location i + 5
next; still occupied, check location i+2*5; still occupied!
• will remain still occupied forever!
• Fix this particular problem by using a prime # for your table size. Then
will visit all array entries eventually during probing.
• what is the lookup algorithm?
Double Hashing
‫خان‬ ‫سنور‬ Algorithm Analysis
Separate Chaining
• No need to search for empty cells.
• The load factor can be 1 or greater.
• If there are more items on the lists access time is reduced.
• Deletion poses no problems.
• Table size is not a prime number.
• Arrays (buckets) can be used at each location in a hash table instead of a
linked list.
‫خان‬ ‫سنور‬ Algorithm Analysis
0
1
2
3
4
5
6
7
8
9
10
107
22 12 42
• Chaining: All keys that map to the same hash value are kept in a linked list
Chaining
‫خان‬ ‫سنور‬ Algorithm Analysis
Hashing Efficiency
• Quadratic probing and Double Hashing share their performance
equations.
• For successful hashing : -log2(1 - loadFactor) / loadFactor
• For an unsuccessful search :- 1 / (1 - loadFactor)
• Searching for separate chaining :- 1 + loadFactor /2
• For unsuccessful search :- 1 + loadFactor
• For insertion :- 1 +loadfactor ?2 for ordered lists and 1 for unordered lists.
‫خان‬ ‫سنور‬ Algorithm Analysis
Open Addressing vs. Separate Chaining
• If open addressing is to be used, double hashing is preferred over
quadratic probing.
• If plenty of memory is available and the data won’t expand, then linear
probing is simpler to implement.
• If number of items to be inserted in hash table isn’t known, separate
chaining is preferable to open addressing.
• When in doubt use separate chaining
‫خان‬ ‫سنور‬ Algorithm Analysis
External Storage
• Hash table can be stored in main memory.
• If it is too large it can be stored externally on disk, with only part of it
being read into main memory at a time.
• In external hashing its important that the blocks do not become full.
• Even with a good hash function, the block might become full.
• This situation can be handled using variations of the collision-resolution
schemes.
‫خان‬ ‫سنور‬ Algorithm Analysis
• main operation: lookup of item in table
• What is worst-case cost of finding an item?
• assuming hash table e hash table has n items in it
• Is the worst-case cost different for chaining, and the various open
addressing schemes?
• Worst-case analysis doesn’t make sense for hash tables, look at average
case cost
• Cost highly depend on the load factor
Analysis of hash tables
‫خان‬ ‫سنور‬ Algorithm Analysis
• Load: the load  of a hash table is the ratio:
 no. of elements
 array size
• Average case analysis of search
• Assume hashCode distributes entries uniformly at random into various indices.
• Using chaining implementation
• What is the average list size?
• What does this imply about search times?
M
N
Analysis of hash table search
‫خان‬ ‫سنور‬ Algorithm Analysis51
• Average case analysis of search, with chaining:
• Count number of link traversals necessary.
• unsuccessful: 
(the average length of a list at hash(i))
• successful: 1 + (/2)
(one node, plus half the avg. length of a list)
• Analysis of open addressing schemes:
• Are more lookups or less lookups required for open addressing, on average?
Analysis of hash table search
‫خان‬ ‫سنور‬ Algorithm Analysis52
• Average case analysis of search, with linear probing:
• Number of lookups worse than chaining
• Complicated to analyze; done by Knuth [1962]
• unsuccessful: 
• successful: 
Analysis of hash table search







 2
)1(
1
1
2
1









1
1
1
2
1
‫خان‬ ‫سنور‬ Algorithm Analysis
Hashing Efficiency
• Insertion and searching can approach O(1) time.
• If collision occurs, access time depends on the resulting probe lengths.
• Individual insert or search time is proportional to the length of the probe.
This is in addition to a constant time for hash function.
• Relationship between probe length (P) and load factor (L) for linear
probing : P = (1+1 / (1 – L2)) / 2 for successful search and
P = (1 + 1 / (1 – L))/ 2
‫خان‬ ‫سنور‬ Algorithm Analysis
Exam Related Questions
• 11.1-1
Suppose that a dynamic set S is represented by a direct-address table T
of length m. Describe a procedure that finds the maximum element of S.
What is the worst-case performance of your procedure?
• Answer
Starting from the first index in T, keep track of the highest index so far
that has anon NIL entry. This takes time O(m).
‫خان‬ ‫سنور‬ Algorithm Analysis
Exam Related Questions
11.1-2
A bit vector is simply an array of bits(0sand 1s). A bit vector of length m takes much
less space than an array of m pointers. Describe how to use a bit vector to represent a
dynamic set of distinct elements with no satellite data. Dictionary operations should
run in O.1/ time.
Answer
Start with a bit vector b which contains a 1 in position k if k is in the dynamic set, and
a 0 otherwise. To search, we return true if b[x]==1. To insert x, set b[x]=1. To delete
x, set b[x]=0. Each of the set takes O(1)time.
‫خان‬ ‫سنور‬ Algorithm Analysis
Exam Related Questions
11.2-2
Demonstrate what happens when we insert the keys 5, 28, 19, 15, 20, 33, 12, 17, 10
into a hash table with collisions resolved by chaining. Let the table have 9 slots, and
let the hash function be h(k) = k MOD 9.
Answer
Label the slots of our table be 0, 1, 2, . . . , 8. numbers
which appear to the left in the table have been inserted
later.
‫خان‬ ‫سنور‬ Algorithm Analysis
Exam Related Questions
11.3-1
Suppose we wish to search a linked list of length n, where each element contains a
key k along with a hash value h(k). Each key is a long character string. How might we
take advantage of the hash values when searching the list for an element with a given
key?
Answer
If every element also contained a hash of the long character string, when we are
searching for the desired element, we’ll first check if the hash value of the node in the
linked list, and move on if it disagrees. This can increase the runtime by a factor
proportional to the length of the long character strings.
‫خان‬ ‫سنور‬ Algorithm Analysis
Exam Related Questions
11.3-2
Suppose that we has a string of r characters into m slots by treating it as a radix-128
number and then using the division method. We can easily represent the number m
as a 32-bit computer word, but the string of r characters, treated as a radix-128
number, takes many words. How can we apply the division method to compute the
hash value of the character string without using more than a constant number of
words of storage outside the string itself?
Answer
Compute the value of the first character mod m, add the value of the second
character mod m, add the value of the third character mod m to that, and so on, until
all r characters have been taken care of.
‫خان‬ ‫سنور‬ Algorithm Analysis
Exam Related Questions
11.4-2
Write pseudo code for HASH-DELETE as outlined in the text, and modify HASH-
INSERT to handle the special value DELETED.
Answer
i = HASH SEARCH(T;k)
T[i]= DELETED
We modify INSERT by changing line 4 to: if T[j]== NIL or T[j]== DELETED.
Algorithm1 HASH-DELETE(T,k)

Hash tables

  • 1.
    Sunawar Khan Islamia Universityof Bahawalpur Bahawalnagar Campus Analysis o f Algorithm
  • 2.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Collision Hash Table Direct Address Table Hash Functions Collision Resolution Contents
  • 3.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Dictionary • Dictionary: • Dynamic-set data structure for storing items indexed using keys. • Supports operations Insert, Search, and Delete. • Applications: • Symbol table of a compiler. • Memory-management tables in operating systems. • Large-scale distributed systems. • Hash Tables: • Effective way of implementing dictionaries. • Generalization of ordinary arrays.
  • 4.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Direct-address Tables • Direct-address Tables are ordinary arrays. • Facilitate direct addressing. • Element whose key is k is obtained by indexing into the kth position of the array. • Applicable when we can afford to allocate an array with one position for every possible key. • i.e. when the universe of keys U is small. • Dictionary operations can be implemented to take O(1) time. • Details in Sec. 11.1.
  • 5.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Hash Tables • Notation: • U – Universe of all possible keys. • K – Set of keys actually stored in the dictionary. • |K| = n. • When U is very large, • Arrays are not practical. • |K| << |U|. • Use a table of size proportional to |K| – The hash tables. • However, we lose the direct-addressing ability. • Define functions that map keys to slots of the hash table.
  • 6.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Hashing • A range of key values can be transformed into a range of array index values. • A simple array can be used where each record occupies one cell of the array and the index number of the cell is the key value for that record. • But keys may not be well arranged. • In such a situation hash tables can be used.
  • 7.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Hashing • Hash function h: Mapping from U to the slots of a hash table T[0..m–1]. h : U  {0,1,…, m–1} • With arrays, key k maps to slot A[k]. • With hash tables, key k maps or “hashes” to slot T[h[k]]. • h[k] is the hash value of key k.
  • 8.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Hashing 0 m–1 h(k1) h(k4) h(k2)=h(k5) h(k3) U (universe of keys) K (actual keys) k1 k2 k3 k5 k4 collision
  • 9.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Issues with Hashing • Multiple keys can hash to the same slot – collisions are possible. • Design hash functions such that collisions are minimized. • But avoiding collisions is impossible. • Design collision-resolution techniques. • Search will cost Ө(n) time in the worst case. • However, all operations can be made to have an expected complexity of Ө(1).
  • 10.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis What is a Hash Table ? • The simplest kind of hash table is an array of records. • This example has 701 records. [ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] An array of records . . . [ 700]
  • 11.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis What is a Hash Table ? • Each record has a special field, called its key. • In this example, the key is a long integer field called Number. [ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] . . . [ 700] [ 4 ] Number 506643548
  • 12.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis What is a Hash Table ? • The number might be a person's identification number, and the rest of the record has information about the person. [ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] . . . [ 700] [ 4 ] Number 506643548
  • 13.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis What is a Hash Table ? • When a hash table is in use, some spots contain valid records, and other spots are "empty". [ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700] Number 506643548Number 233667136Number 281942902 Number 155778322 . . .
  • 14.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Inserting a New Record • In order to insert a new record, the key must somehow be converted to an array index. • The index is called the hash value of the key. [ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700] Number 506643548Number 233667136Number 281942902 Number 155778322 . . . Number 580625685
  • 15.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Inserting a New Record • Typical way create a hash value: [ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700] Number 506643548Number 233667136Number 281942902 Number 155778322 . . . Number 580625685 (Number mod 701) What is (580625685 mod 701) ?
  • 16.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Inserting a New Record • Typical way to create a hash value: [ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700] Number 506643548Number 233667136Number 281942902 Number 155778322 . . . Number 580625685 (Number mod 701) What is (580625685 mod 701) ? 3
  • 17.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Inserting a New Record • The hash value is used for the location of the new record. Number 580625685 [ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700] Number 506643548Number 233667136Number 281942902 Number 155778322 . . . [3]
  • 18.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Writing a hash function • If we write a hash table that can store objects, we need a hash function for the objects, so that we know what index to store them • We want a hash function to: 1. Be simple/fast to compute 2. Map equal elements to the same index 3. Map different elements to different indexes 4. Have keys distributed evenly among indexes
  • 19.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Hash Function • Need to compress the huge range of numbers. • arrayIndex = hugenumber % smallRange; • This is a hash function. • It hashes a number in a large range into a number in a smaller range, corresponding to the index numbers in an array. • An array into which data is inserted using a hash function later is called a hash table.
  • 20.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Hash Functions • A good hash function is simple and can be computed quickly. • Speed degrades if hash function is slow. • Purpose is to transform a range of key values into index values such that the key values are distributed randomly across all the indices of the hash table. • Keys may be completely random or not so random.
  • 21.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Hash Functions Hash Functions • Direct Hashing • Subtraction Method • Modulo Division • Digit Extraction Method • Mid-Square Method • Folding Method • Rotation Method • Pseudo Random Methods
  • 22.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Direct Hashing • No algorithm Manipulation • No Collision • Limited • Not Suitable for large key Values KEY H(K) Address
  • 23.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Modulo Division Method • Map a key k into one of the m slots by taking the remainder of k divided by m. That is, h(k) = k mod m where m is the list size • Example: m = 31 and k = 78  h(k) = 16. • Advantage: Fast, since requires just one division operation. • Disadvantage: Have to avoid certain values of m. • Don’t pick certain values, such as m=2p • Or hash won’t depend on all bits of k. • Good choice for m: • Primes, not too close to power of 2 (or 10) are good.
  • 24.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Mid Square Hashing • Def:- In mid-square hashing the key is squared and the address is selected from the middle of the square number • Address = (Middle Digits of Key)2 • Middle of Square • Key is squared and the address is selected from the middle of the square number • Problem:- Non- Uniform distribution of the key. Size of the key is too large Variation:- Use only a portion of the key
  • 25.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Example •Key = 9452 * 9452 = 89340304 = 3403 •Variation Explanation:- •379452 :- 379 * 379 = 143641 = 364
  • 26.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Digit Extraction • Using digit Extraction selected digits are extracted from the key and used as the address. :- Address = Selected digits from the key Example:- Using SIX digit employee number to hash to a three digit address, we could select the first, third and fourth digits(from the left) and use them as the address. • The keys are • 379452 = 394 • 121267 = 112 • 378845 = 388 • 045128 = 051 051 045128 112 121267 388 378845 394 379452
  • 27.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Folding Method for Hashing • Break the key into groups of digits and add the groups. • The number of digits in a group should correspond to the size of the array. Fold-Shift Hashing Fold-Boundary Hashing Key value is divided into parts whose size matches the size required address Left and right number are folded on a fixed boundary between them the center no. K = 123456789{3} 123 + 456 + 789 123 456 789 1368 K = 123456789{3} 123 + 456 + 789 321 456 987 1764 H(k) = 368 H(k) = 764
  • 28.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Pseudo-Random Method Key Pseudo Random Number Generator Random Number Modulo Division Address Key = y = ax + b For maximum efficacy a and c should be prime numbers Where • Y is random number • a is chosen co-efficient • c is constant H(k) = Y MOD n Where n is list size
  • 29.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Random Keys • If the world were perfect, Evenly distributed NOT! • A perfect hash function maps every key into a different table location. • In most cases large number of keys are compressed into a smaller range of index numbers. • Distribution of key values in a particular database determines what the hash function needs to be. • For random keys: index = key % arraySize;
  • 30.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Example •Example:- Key = 121267 a = 17 c = 7 listSize = 307 Address = ((17 * 121267) + 7 MOD 307 + 1 = (2061539 + 7) MOD 307 + 1 = 41 + 1 = 42
  • 31.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Subtraction Hashing • Subtract a fixed number from key • H(k) = k – c • Where c is fixed number • Like Direct Hashing • No Collision • Suitable for Small List
  • 32.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Non-random Keys • Consider a number of the form 030-050-20-92-7. • Every digit serves a purpose. The last 3 digits are redundant for error checking. • These digits shouldn’t be considered. • Every part of the remaining key should contribute to the data. • Use a prime number for the modulo base.
  • 33.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis 0 1 21 2 3 4 34 5 6 7 7 8 18 9 • Collision: the event that two hash table elements map into the same slot in the array • Example: add 41, 34, 7, 18, then 21 • 21 hashes into the same slot as 41! • 21 should not replace 41 in the hash table; they should both be there Collision Resolution: a strategy for fixing collisions in a hash table Hash collisions
  • 34.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Collisions • Two words can hash to the same array index, resulting in collision. •Open Addressing: Search the array in some systematic way for an empty cell and insert the new item there if collision occurs. •Separate chaining: Create an array of linked list of words, so that the item can be inserted into the linked list if collision occurs.
  • 35.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Collision Resolution Methods Collision Resolution Open Addressing Linear Probing Quadratic Probing Pseudo Random Methods Key Offset Linked List Buckets
  • 36.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Open Addressing • Three methods to find next vacant cell: • Linear Probing :- Search sequentially for vacant cells, incrementing the index until an empty cell is found. • Clustering is a problem occurring in linear probing. • As the array gets full, clusters grow larger, resulting in very long probe lengths. • Array can be expanded if it becomes too full.
  • 37.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Open Addressing • Open Addressing is • a collision resolution strategy • on a collision, look for another empty spot in the array • previous discussed examples are all examples of open addressing • Linear probing • Quadratic probing • Double hashing • Look-up for open addressing scheme must continue looking for item until it finds it or an empty slot.
  • 38.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis 0 1 41 2 21 3 4 34 5 6 7 7 8 18 9 57 • linear probing: resolving collisions in slot i by putting the colliding element into the next available slot (i+1, i+2, ...) • add 41, 34, 7, 18, then 21, then 57 • 21 collides (41 is already there), so we search ahead until we find empty slot 2 • 57 collides (7 is already there), so we search ahead twice until we find empty slot 9 • lookup algorithm becomes slightly modified; we have to loop now until we find the element or an empty slot • what happens when the table gets mostly full? Linear probing
  • 39.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis 0 49 1 58 2 9 3 4 5 6 7 8 18 9 89 • Clustering: nodes being placed close together by probing, which degrades hash table's performance • add 89, 18, 49, 58, 9 • now searching for the value 28 will have to check half the hash table! no longer constant time... Clustering problem
  • 40.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Quadratic Probing • load factor = nItems / arraySize; • If load factor isn’t high, clusters can form. • In quadratic probing more widely separated cells are probed. • The step is the square of the step number. • If index is x, the probe goes to x+1, x+4, x+9, x+16 and so on. • Eliminates primary clustering, but all the keys that hash to a particular cell follow the same sequence in trying to find a vacant cell (secondary clustering).
  • 41.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis 0 49 1 2 58 3 9 4 5 6 7 8 18 9 89 • Quadratic probing: resolving collisions on slot i by putting the colliding element into slot i+1, i+4, i+9, i+16, ... • add 89, 18, 49, 58, 9 • 49 collides (89 is already there), so we search ahead by +1 to empty slot 0 • 58 collides (18 is already there), so we search ahead by +1 to occupied slot 9, then +4 to empty slot 2 • 9 collides (89 is already there), so we search ahead by +1 to occupied slot 0, then +4 to empty slot 3 • clustering is reduced • what is the lookup algorithm? Quadratic probing
  • 42.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Double Hashing • Better solution. • Generate probe sequences that depend on the key instead of being the same for every key. • Hash the key a second time using a different hash function and use the result as the step size. • Step size remains constant throughout a probe, but its different for different keys. • Secondary hash function should not be the same as primary hash function. • It must never output a zero. • stepSize = constant – (key % constant); • Requires that size of hash table is a prime number.
  • 43.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis 0 1 2 3 49 4 5 6 7 8 18 9 89 • Double hashing: • Pick a secondary hash function hash2(). • when hashing item x, resolving collisions on slot i by putting the colliding element into slot i+hash2(x), i+2*hash2(x), i+3*hash2(x), i+4*hash2(x), ... • Suppose hash2(x) = (x / 10) % 10. • add 89, 18, 49, 58: What happens? • 49 collides (89 is already there); hash2(x) = 4, so check location i + 4 next; put 49 in slot 3. • 58 collides (18 is already there); hash2(x) = 5, so check location i + 5 next; still occupied, check location i+2*5; still occupied! • will remain still occupied forever! • Fix this particular problem by using a prime # for your table size. Then will visit all array entries eventually during probing. • what is the lookup algorithm? Double Hashing
  • 44.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Separate Chaining • No need to search for empty cells. • The load factor can be 1 or greater. • If there are more items on the lists access time is reduced. • Deletion poses no problems. • Table size is not a prime number. • Arrays (buckets) can be used at each location in a hash table instead of a linked list.
  • 45.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis 0 1 2 3 4 5 6 7 8 9 10 107 22 12 42 • Chaining: All keys that map to the same hash value are kept in a linked list Chaining
  • 46.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Hashing Efficiency • Quadratic probing and Double Hashing share their performance equations. • For successful hashing : -log2(1 - loadFactor) / loadFactor • For an unsuccessful search :- 1 / (1 - loadFactor) • Searching for separate chaining :- 1 + loadFactor /2 • For unsuccessful search :- 1 + loadFactor • For insertion :- 1 +loadfactor ?2 for ordered lists and 1 for unordered lists.
  • 47.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Open Addressing vs. Separate Chaining • If open addressing is to be used, double hashing is preferred over quadratic probing. • If plenty of memory is available and the data won’t expand, then linear probing is simpler to implement. • If number of items to be inserted in hash table isn’t known, separate chaining is preferable to open addressing. • When in doubt use separate chaining
  • 48.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis External Storage • Hash table can be stored in main memory. • If it is too large it can be stored externally on disk, with only part of it being read into main memory at a time. • In external hashing its important that the blocks do not become full. • Even with a good hash function, the block might become full. • This situation can be handled using variations of the collision-resolution schemes.
  • 49.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis • main operation: lookup of item in table • What is worst-case cost of finding an item? • assuming hash table e hash table has n items in it • Is the worst-case cost different for chaining, and the various open addressing schemes? • Worst-case analysis doesn’t make sense for hash tables, look at average case cost • Cost highly depend on the load factor Analysis of hash tables
  • 50.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis • Load: the load  of a hash table is the ratio:  no. of elements  array size • Average case analysis of search • Assume hashCode distributes entries uniformly at random into various indices. • Using chaining implementation • What is the average list size? • What does this imply about search times? M N Analysis of hash table search
  • 51.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis51 • Average case analysis of search, with chaining: • Count number of link traversals necessary. • unsuccessful:  (the average length of a list at hash(i)) • successful: 1 + (/2) (one node, plus half the avg. length of a list) • Analysis of open addressing schemes: • Are more lookups or less lookups required for open addressing, on average? Analysis of hash table search
  • 52.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis52 • Average case analysis of search, with linear probing: • Number of lookups worse than chaining • Complicated to analyze; done by Knuth [1962] • unsuccessful:  • successful:  Analysis of hash table search         2 )1( 1 1 2 1          1 1 1 2 1
  • 53.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Hashing Efficiency • Insertion and searching can approach O(1) time. • If collision occurs, access time depends on the resulting probe lengths. • Individual insert or search time is proportional to the length of the probe. This is in addition to a constant time for hash function. • Relationship between probe length (P) and load factor (L) for linear probing : P = (1+1 / (1 – L2)) / 2 for successful search and P = (1 + 1 / (1 – L))/ 2
  • 54.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Exam Related Questions • 11.1-1 Suppose that a dynamic set S is represented by a direct-address table T of length m. Describe a procedure that finds the maximum element of S. What is the worst-case performance of your procedure? • Answer Starting from the first index in T, keep track of the highest index so far that has anon NIL entry. This takes time O(m).
  • 55.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Exam Related Questions 11.1-2 A bit vector is simply an array of bits(0sand 1s). A bit vector of length m takes much less space than an array of m pointers. Describe how to use a bit vector to represent a dynamic set of distinct elements with no satellite data. Dictionary operations should run in O.1/ time. Answer Start with a bit vector b which contains a 1 in position k if k is in the dynamic set, and a 0 otherwise. To search, we return true if b[x]==1. To insert x, set b[x]=1. To delete x, set b[x]=0. Each of the set takes O(1)time.
  • 56.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Exam Related Questions 11.2-2 Demonstrate what happens when we insert the keys 5, 28, 19, 15, 20, 33, 12, 17, 10 into a hash table with collisions resolved by chaining. Let the table have 9 slots, and let the hash function be h(k) = k MOD 9. Answer Label the slots of our table be 0, 1, 2, . . . , 8. numbers which appear to the left in the table have been inserted later.
  • 57.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Exam Related Questions 11.3-1 Suppose we wish to search a linked list of length n, where each element contains a key k along with a hash value h(k). Each key is a long character string. How might we take advantage of the hash values when searching the list for an element with a given key? Answer If every element also contained a hash of the long character string, when we are searching for the desired element, we’ll first check if the hash value of the node in the linked list, and move on if it disagrees. This can increase the runtime by a factor proportional to the length of the long character strings.
  • 58.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Exam Related Questions 11.3-2 Suppose that we has a string of r characters into m slots by treating it as a radix-128 number and then using the division method. We can easily represent the number m as a 32-bit computer word, but the string of r characters, treated as a radix-128 number, takes many words. How can we apply the division method to compute the hash value of the character string without using more than a constant number of words of storage outside the string itself? Answer Compute the value of the first character mod m, add the value of the second character mod m, add the value of the third character mod m to that, and so on, until all r characters have been taken care of.
  • 59.
    ‫خان‬ ‫سنور‬ AlgorithmAnalysis Exam Related Questions 11.4-2 Write pseudo code for HASH-DELETE as outlined in the text, and modify HASH- INSERT to handle the special value DELETED. Answer i = HASH SEARCH(T;k) T[i]= DELETED We modify INSERT by changing line 4 to: if T[j]== NIL or T[j]== DELETED. Algorithm1 HASH-DELETE(T,k)