3. Space-time tradeoff:
A space-time (or time-memory) tradeoff is a situation where memory use can be reduced at the cost of slower program execution (or, conversely, execution time can be reduced at the cost of more memory).
The relative costs of CPU cycles, RAM space, and hard drive space change over time: hard drive space has for some time been getting cheaper at a much faster rate than other components of computers.
4. Space-for-time tradeoffs
Two varieties of space-for-time algorithms:
input enhancement: preprocess the input (or part of it) to store some information to be used later in solving the problem
• counting sorts
• string searching algorithms
prestructuring: preprocess the input to make accessing its elements easier
• hashing
• indexing schemes (e.g., B-trees)
5. Direct Addressing:
Suppose:
• The range of keys is 0..m-1
• Keys are distinct
The idea:
• Set up an array T[0..m-1] in which
– T[i] = x if x ∈ T and key[x] = i
– T[i] = NULL otherwise
• This is called a direct-address table
– Operations take O(1) time!
– So what's the problem?
7. Direct-Address Tables
Let U = {0, ..., m-1} be the set of possible keys.
Use an array T[0..m-1] as a direct-address table.
There is a one-to-one correspondence between keys and slots.
Direct-Address-Search(T, k)
    return T[k]
Direct-Address-Insert(T, x)
    T[key[x]] ← x
Direct-Address-Delete(T, x)
    T[key[x]] ← NIL
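The three operations above can be sketched in Python. This is a minimal illustration, not a reference implementation: the class name `DirectAddressTable` and its method names are made up for this example, and `None` stands in for NIL.

```python
class DirectAddressTable:
    """Direct-address table for distinct integer keys in the range 0..m-1."""

    def __init__(self, m):
        # One slot per possible key; None plays the role of NIL.
        self.slots = [None] * m

    def insert(self, key, value):
        # T[key[x]] <- x
        self.slots[key] = value

    def search(self, key):
        # return T[k]
        return self.slots[key]

    def delete(self, key):
        # T[key[x]] <- NIL
        self.slots[key] = None


table = DirectAddressTable(10)
table.insert(3, "three")
table.insert(7, "seven")
table.delete(7)
print(table.search(3), table.search(7))
```

Every operation is a single array access, but the array needs one slot for every possible key, whether or not that key is ever stored.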
8. Hashing
A very efficient method for implementing a dictionary, i.e., a set with the operations:
– find
– insert
– delete
Based on space-for-time tradeoff ideas.
Important applications:
– symbol tables
– databases (extendible hashing)
The idea of hashing is to map keys of a given file of size n into a table of size m, called the hash table, by using a predefined function, called the hash function.
9. Hashing, hash functions
The idea: somehow we map every element into some index in the array ("hash" it); this is the one and only place where it should go
• add, remove, contains all become O(1)!
For now, let's look at integers (int)
• a "hash function" h for int is trivial: store int i at index i (a direct mapping)
– if i >= array.length, store i at index (i % array.length)
• h(i) = i % array.length
Generally, a hash function should:
• be easy to compute
• distribute keys about evenly throughout the hash table
10. Hashing, hash functions (cont.)
The idea: somehow we map every element into some index in the array ("hash" it); this is the one and only place where it should go
• Lookup becomes constant-time: simply look at that one slot again later to see if the element is there
• add, remove, contains all become O(1)!
11. Hash function example
elements = Integers
h(i) = i % 10
add 41, 34, 7, and 18
index:  0   1   2   3   4   5   6   7   8   9
value:  -   41  -   -   34  -   -   7   18  -
constant-time lookup:
• just look at i % 10 again later
We lose all ordering information:
• getMin, getMax, removeMin, removeMax
• the various ordered traversals
• printing items in sorted order
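The example on this slide can be reproduced with a few lines of Python. This is only a sketch: the helper name `contains` is chosen for illustration, `None` marks an empty slot, and collisions are deliberately ignored because none occur for these four values.

```python
def h(i, table_size=10):
    """Hash an integer to a slot index: h(i) = i % table_size."""
    return i % table_size


table = [None] * 10
for value in (41, 34, 7, 18):
    table[h(value)] = value  # each value goes to its one and only slot


def contains(table, value):
    """Constant-time lookup: look at slot h(value) again."""
    return table[h(value, len(table))] == value


print(table)
print(contains(table, 34), contains(table, 24))
```

Note that the table keeps no trace of the order 7 < 18 < 34 < 41; finding the minimum would require scanning every slot.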
12. Hash collisions
collision: the event that two hash table elements map into the same slot in the array
example: add 41, 34, 7, 18, then 21
• 21 hashes into the same slot as 41!
• 21 should not replace 41 in the hash table; they should both be there
index:  0   1   2   3   4   5   6   7   8   9
value:  -   21  -   -   34  -   -   7   18  -
(21 has overwritten 41 in slot 1)
collision resolution: means for fixing collisions in a hash table
13. Collisions
If h(K1) = h(K2) for two distinct keys K1 and K2, there is a collision
Good hash functions result in fewer collisions, but some collisions should be expected (birthday paradox).
Two principal hashing schemes handle collisions differently:
• Open hashing
– each cell is a header of a linked list of all keys hashed to it
• Closed hashing
– one key per cell
– in case of collision, finds another cell by
  – linear probing: use next free bucket
  – double hashing: use second hash function to compute increment
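The birthday paradox mentioned above can be made concrete with a short calculation. The sketch below assumes an idealized hash function that sends each key to one of m slots independently and uniformly at random; the function name `collision_probability` is made up for this example.

```python
def collision_probability(n, m):
    """Probability that at least two of n keys land in the same slot,
    assuming each key is hashed independently and uniformly into m slots."""
    if n > m:
        return 1.0  # pigeonhole: more keys than slots forces a collision
    p_no_collision = 1.0
    for i in range(n):
        # The (i+1)-th key must avoid the i slots that are already taken.
        p_no_collision *= (m - i) / m
    return 1.0 - p_no_collision


print(round(collision_probability(23, 365), 3))  # the classic birthday setting
print(round(collision_probability(5, 10), 4))    # 5 keys in a 10-slot table
```

Even with only 5 keys in a 10-slot table, a collision is more likely than not under this assumption, which is why a collision resolution scheme is needed regardless of how good the hash function is.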
15. Closed hashing - Linear probing
linear probing: resolving collisions in slot i by putting the colliding element into the next available slot (i+1, i+2, ...), wrapping around to slot 0 after the last slot
• add 41, 34, 7, 18, then 21, then 57
– 21 collides (41 is already there), so we search ahead until we find empty slot 2
– 57 collides (7 is already there), so we search ahead twice until we find empty slot 9
• lookup algorithm becomes slightly modified; we now have to loop until we find the element or an empty slot
index:  0   1   2   3   4   5   6   7   8   9
value:  -   41  21  -   34  -   -   7   18  57
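A minimal Python sketch of linear probing, reproducing the example above. The function names `insert` and `contains` are illustrative; the sketch assumes non-negative integer keys, does not check for duplicates, and does not support deletion.

```python
EMPTY = None


def insert(table, value):
    """Place value in the first free slot at or after h(value); return that slot."""
    m = len(table)
    i = value % m
    for _ in range(m):  # visit each slot at most once
        if table[i] is EMPTY:
            table[i] = value
            return i
        i = (i + 1) % m  # try the next slot, wrapping around
    raise RuntimeError("hash table is full")


def contains(table, value):
    """Probe from h(value) until the value or an empty slot is found."""
    m = len(table)
    i = value % m
    for _ in range(m):
        if table[i] is EMPTY:
            return False
        if table[i] == value:
            return True
        i = (i + 1) % m
    return False


table = [EMPTY] * 10
slots = [insert(table, v) for v in (41, 34, 7, 18, 21, 57)]
print(slots)
print(table)
```

Looking up 57 has to pass over slots 7 and 8 before finding it in slot 9, while looking up 31 stops at the empty slot 3 after checking slots 1 and 2.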
16. Clustering problem
clustering: nodes being placed close together by probing, which degrades the hash table's performance
• add 89, 18, 49, 58, 9
index:  0   1   2   3   4   5   6   7   8   9
value:  49  58  9   -   -   -   -   -   18  89
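To see how clustering degrades performance, the sketch below counts how many slots each insertion has to inspect. It assumes h(i) = i % 10 with linear probing, as in the earlier slides; the function name `insert_counting_probes` is made up for this example.

```python
def insert_counting_probes(table, value):
    """Insert with linear probing and return the number of slots inspected."""
    m = len(table)
    i = value % m
    probes = 1
    while table[i] is not None:
        i = (i + 1) % m  # move to the next slot, wrapping around
        probes += 1
        if probes > m:
            raise RuntimeError("hash table is full")
    table[i] = value
    return probes


table = [None] * 10
probe_counts = {v: insert_counting_probes(table, v) for v in (89, 18, 49, 58, 9)}
print(probe_counts)
print(table)
```

The occupied slots 8, 9, 0, 1, 2 form one contiguous cluster, so any new key hashing to slot 8 or 9 has to walk through the whole run before it finds a free slot.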
17. Hash function for strings
elements = Strings
let's view a string by its letters:
• String s : s_0, s_1, s_2, ..., s_(n-1)
how do we map a string into an integer index? (how do we "hash" it?)
one possible hash function:
• treat first character as an int, and hash on that
– h(s) = s_0 % array.length
18. Better string hash functions
another possible hash function:
• treat each character as an int, sum them, and hash on that
– h(s) = (\sum_{i=0}^{n-1} s_i) % array.length
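Both string hash functions can be sketched in Python. The function names are illustrative, characters are converted to integers with `ord` (an assumption; the slides only say "treat each character as an int"), and the strings are assumed to be non-empty.

```python
def hash_first_char(s, table_size):
    """h(s) = s_0 % table_size: only the first character matters."""
    return ord(s[0]) % table_size


def hash_char_sum(s, table_size):
    """h(s) = (s_0 + s_1 + ... + s_(n-1)) % table_size."""
    return sum(ord(c) for c in s) % table_size


for word in ("stop", "step", "spot"):
    print(word, hash_first_char(word, 13), hash_char_sum(word, 13))
```

With the first-character hash, every word starting with the same letter collides. Summing the characters separates "stop" from "step", but anagrams such as "stop" and "spot" still collide because addition ignores character order.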
19. Open hashing (Separate chaining)
Keys are stored in linked lists outside a hash table whose elements serve as the lists' headers.
Example: A, FOOL, AND, HIS, MONEY, ARE, SOON, PARTED
h(K) = sum of K's letters' positions in the alphabet MOD 13
Key    A   FOOL  AND  HIS  MONEY  ARE  SOON  PARTED
h(K)   1   9     6    10   7      11   11    12
slot:  0  1  2  3  4  5  6    7      8  9     10   11           12
list:  -  A  -  -  -  -  AND  MONEY  -  FOOL  HIS  ARE -> SOON  PARTED
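The example above can be reproduced with a short Python sketch. Python lists stand in for the linked lists, the function names `build_chained_table` and `find` are made up for this illustration, and keys are assumed to consist of uppercase letters A to Z only.

```python
def h(word, m=13):
    """Sum of the letters' positions in the alphabet (A=1, ..., Z=26), mod m."""
    return sum(ord(c) - ord("A") + 1 for c in word) % m


def build_chained_table(words, m=13):
    """Each slot holds the chain of all keys that hash to it."""
    table = [[] for _ in range(m)]
    for word in words:
        table[h(word, m)].append(word)
    return table


def find(table, word):
    """Search only the chain in slot h(word)."""
    return word in table[h(word, len(table))]


words = ["A", "FOOL", "AND", "HIS", "MONEY", "ARE", "SOON", "PARTED"]
table = build_chained_table(words)
for slot, chain in enumerate(table):
    print(slot, chain)
```

ARE and SOON both hash to 11 and end up in the same chain, so searching for SOON compares against ARE first.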
20. Open hashing (cont.):
Let n = # keys stored in the table and m = # slots in the table.
Load factor α = n/m = average # keys per slot.
Worst case: all n keys hash to the same slot, giving Θ(n) per operation.
Expected performance with a hash function that spreads keys uniformly is Θ(1 + α) per operation:
the 1 comes from applying the hash function and accessing the slot,
whereas the α comes from searching the list.
22. Closed hashing (Open addressing)
Keys are stored inside a hash table.
Key    A   FOOL  AND  HIS  MONEY  ARE  SOON  PARTED
h(K)   1   9     6    10   7      11   11    12
Table contents after each insertion (one row per inserted key):
0      1      2      3      4      5      6      7      8      9      10     11     12
-      A      -      -      -      -      -      -      -      -      -      -      -
-      A      -      -      -      -      -      -      -      FOOL   -      -      -
-      A      -      -      -      -      AND    -      -      FOOL   -      -      -
-      A      -      -      -      -      AND    -      -      FOOL   HIS    -      -
-      A      -      -      -      -      AND    MONEY  -      FOOL   HIS    -      -
-      A      -      -      -      -      AND    MONEY  -      FOOL   HIS    ARE    -
-      A      -      -      -      -      AND    MONEY  -      FOOL   HIS    ARE    SOON
PARTED A      -      -      -      -      AND    MONEY  -      FOOL   HIS    ARE    SOON
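The final table on this slide is consistent with linear probing: SOON hashes to the occupied slot 11 and moves to 12, and PARTED hashes to the occupied slot 12 and wraps around to 0. A short Python sketch under that assumption (the function names are illustrative, and keys are assumed to be uppercase A to Z):

```python
def h(word, m=13):
    """Sum of the letters' positions in the alphabet (A=1, ..., Z=26), mod m."""
    return sum(ord(c) - ord("A") + 1 for c in word) % m


def insert(table, word):
    """Store word in the first free slot at or after h(word), wrapping around."""
    m = len(table)
    i = h(word, m)
    for _ in range(m):
        if table[i] is None:
            table[i] = word
            return i
        i = (i + 1) % m
    raise RuntimeError("hash table is full")


table = [None] * 13
for word in ["A", "FOOL", "AND", "HIS", "MONEY", "ARE", "SOON", "PARTED"]:
    insert(table, word)
    print(table)  # one snapshot per insertion
```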
23. Closed hashing (cont.)
Does not work if n > m
Avoids pointers
Deletions are not straightforward
Number of probes to find/insert/delete a key depends on load factor α = n/m (hash table density) and collision resolution strategy. For linear probing:
S ≈ (½)(1 + 1/(1 − α)) and U ≈ (½)(1 + 1/(1 − α)²)
where S and U are the average numbers of probes in successful and unsuccessful searches.
As the table gets filled (α approaches 1), number of probes in linear probing increases dramatically.
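Evaluating the two linear probing formulas at a few load factors shows how quickly the probe counts grow. The sketch below simply plugs numbers into the formulas from this slide; the function names and the chosen load factors are illustrative.

```python
def successful_probes(alpha):
    """S = (1/2)(1 + 1/(1 - alpha)) for linear probing."""
    return 0.5 * (1 + 1 / (1 - alpha))


def unsuccessful_probes(alpha):
    """U = (1/2)(1 + 1/(1 - alpha)^2) for linear probing."""
    return 0.5 * (1 + 1 / (1 - alpha) ** 2)


print("alpha      S      U")
for alpha in (0.5, 0.75, 0.9):
    s = successful_probes(alpha)
    u = unsuccessful_probes(alpha)
    print(f"{alpha:5.2f}  {s:5.1f}  {u:5.1f}")
```

At half full the table needs about 1.5 and 2.5 probes; at 90% full the estimates rise to about 5.5 and 50.5.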
25. Analysis:
Open addressing for n items in a table of size m has expected cost of ≤ 1/(1 − α) per operation, where α = n/m (< 1), assuming uniform hashing.
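A sketch of where the bound comes from, under the uniform hashing assumption: a search needs more than i probes only if its first i probes all hit occupied slots, which happens with probability at most α^i. Summing over i gives a geometric series:

```latex
\begin{align*}
E[\text{number of probes}]
  &\le 1 + \alpha + \alpha^2 + \alpha^3 + \cdots \\
  &= \sum_{i=0}^{\infty} \alpha^i
   = \frac{1}{1-\alpha}, \qquad 0 \le \alpha < 1 .
\end{align*}
```

For example, a half-full table (α = 0.5) gives a bound of 2 probes, and a table that is 90% full (α = 0.9) gives a bound of 10 probes.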
26. Open hashing (cont.)
If the hash function distributes keys uniformly, the average length of a linked list will be α = n/m. This ratio is called the load factor.
Average number of probes in successful, S, and unsuccessful, U, searches:
S ≈ 1 + α/2, U = α
Load α is typically kept small (ideally, about 1)
Open hashing still works if n > m