COLLISION RESOLUTION
Collision Avoidance
With h(x) = x % TableSize, the number of collisions depends on
• the ints inserted
• TableSize
A larger table size tends to help, but not always
• Example: 70, 24, 56, 43, 10 with TableSize = 10 and TableSize = 60
Technique: Pick the table size to be prime. Why?
• Real-life data tends to have a pattern
• "Multiples of 61" are probably less likely than "multiples of 60"
• Some collision strategies do better with a prime table size
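The example keys can be checked with a short sketch. This is only an illustration, assuming non-negative int keys and h(x) = x % TableSize; the class and method names (CollisionCount, countCollisions) are hypothetical and not part of the slides.

```java
import java.util.HashSet;
import java.util.Set;

class CollisionCount {
    // Counts keys that land on an index already taken by an earlier key.
    // Assumption: keys are non-negative, so key % tableSize is a valid index.
    static int countCollisions(int[] keys, int tableSize) {
        Set<Integer> usedIndices = new HashSet<>();
        int collisions = 0;
        for (int key : keys) {
            int index = key % tableSize;
            if (!usedIndices.add(index)) { // add returns false if already present
                collisions++;
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        int[] keys = {70, 24, 56, 43, 10};
        for (int size : new int[] {10, 60, 61}) {
            System.out.println("TableSize " + size + ": "
                    + countCollisions(keys, size) + " collision(s)");
        }
    }
}
```

With these keys, 70 and 10 collide for both TableSize = 10 and TableSize = 60, while the prime size 61 gives no collision.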
Collision Resolution
Collision: when two keys map to the same location in the hash table
We try to avoid collisions, but the number of possible keys almost always exceeds the table size
Ergo, hash tables generally must support some form of collision resolution
Flavors of Collision Resolution
Separate Chaining
Open Addressing
• Linear Probing
• Quadratic Probing
• Double Hashing
Terminology Warning
We and the book use the terms
• "chaining" or "separate chaining"
• "open addressing"
Very confusingly, others use the terms
• "open hashing" for "chaining"
• "closed hashing" for "open addressing"
We also do trees upside-down
Separate Chaining
All keys that map to the same table location are kept in a linked list (a.k.a. a "chain" or "bucket")
As easy as it sounds
Example: insert 10, 22, 86, 12, 42 with h(x) = x % 10
• insert 10: bucket 0 becomes 10 /
• insert 22: bucket 2 becomes 22 /
• insert 86: bucket 6 becomes 86 /
• insert 12: bucket 2 becomes 12 → 22 /
• insert 42: bucket 2 becomes 42 → 12 → 22 /
Final table (/ marks the end of a chain or an empty bucket):
0  10 /
1  /
2  42 → 12 → 22 /
3  /
4  /
5  /
6  86 /
7  /
8  /
9  /
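A minimal sketch of the example above, assuming int keys, h(x) = x % TableSize, and insertion at the front of each chain (which is what the diagrams show). The class name ChainedIntSet and its methods are hypothetical, chosen only for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedIntSet {
    // One linked list ("chain" or "bucket") per table location.
    private final List<LinkedList<Integer>> buckets = new ArrayList<>();

    ChainedIntSet(int tableSize) {
        for (int i = 0; i < tableSize; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    private LinkedList<Integer> bucketFor(int key) {
        // floorMod keeps the index non-negative even for negative keys.
        return buckets.get(Math.floorMod(key, buckets.size()));
    }

    boolean contains(int key) {
        return bucketFor(key).contains(key);
    }

    void insert(int key) {
        LinkedList<Integer> chain = bucketFor(key);
        if (!chain.contains(key)) {
            chain.addFirst(key); // insert at the front of the chain
        }
    }

    boolean delete(int key) {
        // Integer.valueOf selects remove(Object), not remove(int index).
        return bucketFor(key).remove(Integer.valueOf(key));
    }

    String chain(int index) {
        return buckets.get(index).toString();
    }
}
```

Inserting 10, 22, 86, 12, 42 into a table of size 10 leaves bucket 2 holding 42, 12, 22 in that order, matching the final diagram.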
Thoughts on Separate Chaining
Worst-case time for find?
• Linear
• But only with really bad luck or a bad hash function
• Not worth avoiding (e.g., with balanced trees at each bucket)
  – Keep a small number of items in each bucket
  – Overhead of tree balancing is not worthwhile for small n
Beyond asymptotic complexity, some "data-structure engineering" can improve constant factors
• Linked list, array, or a hybrid
• Insert at end or beginning of list
• Sorting the lists gains and loses performance
• Splay-like: always move item to front of list
Rigorous Separate Chaining Analysis
The load factor, λ, of a hash table is calculated as
$\lambda = \frac{n}{\text{TableSize}}$
where n is the number of items currently in the table
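Load Factor?
Table with TableSize = 10 holding 5 items:
0  10 /
1  /
2  42 → 12 → 22 /
3  /
4  /
5  /
6  86 /
7  /
8  /
9  /
$\lambda = \frac{n}{\text{TableSize}} = \frac{5}{10} = 0.5$
Load Factor?
Table with TableSize = 10 holding 21 items:
0  10 /
1  71 → 2 → 31 /
2  42 → 12 → 22 /
3  63 → 73 /
4  /
5  75 → 5 → 65 → 95 /
6  86 /
7  27 → 47 /
8  88 → 18 → 38 → 98 /
9  99 /
$\lambda = \frac{n}{\text{TableSize}} = \frac{21}{10} = 2.1$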
Rigorous Separate Chaining Analysis
Under chaining, the average number of elements per bucket is λ
So if some inserts are followed by random finds, then on average:
• Each unsuccessful find compares against λ items
• Each successful find compares against about λ/2 items before reaching its match
How big should TableSize be?
• If λ is low, find and insert are likely to be O(1)
• We like to keep λ around 1 for separate chaining
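The claim about the average bucket size follows from a one-line calculation. Here $\ell_i$ is a symbol introduced only for this note and denotes the length of the chain in bucket $i$; every item sits in exactly one chain, so the lengths sum to $n$.

```latex
\[
\text{average chain length}
  = \frac{1}{\text{TableSize}} \sum_{i=0}^{\text{TableSize}-1} \ell_i
  = \frac{n}{\text{TableSize}}
  = \lambda
\]
```

For the two tables shown earlier this gives 5/10 = 0.5 and 21/10 = 2.1 items per bucket on average.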
Separate Chaining Deletion
Not too bad and quite easy
• Find in table
• Delete from bucket
Similar run-time as insert
• Sensitive to underlying bucket structure
Open Addressing: Linear Probing
Separate chaining does not use all the space in the table. Why not use it?
• Store directly in the array cell (no linked lists or buckets)
How to deal with collisions?
If h(key) is already full,
try (h(key) + 1) % TableSize. If full,
try (h(key) + 2) % TableSize. If full,
try (h(key) + 3) % TableSize. If full…
Example: insert 38, 19, 8, 79, 10 with TableSize = 10
• insert 38: 38 % 10 = 8, cell 8 is free
• insert 19: 19 % 10 = 9, cell 9 is free
• insert 8: cell 8 full, cell 9 full, cell 0 is free
• insert 79: cell 9 full, cell 0 full, cell 1 is free
• insert 10: cell 0 full, cell 1 full, cell 2 is free
Final table:
0  8
1  79
2  10
3
4
5
6
7
8  38
9  19
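The probing rule above can be sketched as follows. This is an illustration only: it assumes non-negative int keys, uses -1 as a made-up marker for an empty cell, and the names LinearProbing, EMPTY and insert are hypothetical.

```java
import java.util.Arrays;

class LinearProbing {
    static final int EMPTY = -1; // assumption: real keys are non-negative

    // Stores key in the first free cell along the probe sequence
    // (h(key) + i) % TableSize and returns that cell's index, or -1 if the table is full.
    static int insert(int[] table, int key) {
        int home = key % table.length;
        for (int i = 0; i < table.length; i++) {
            int index = (home + i) % table.length;
            if (table[index] == EMPTY) {
                table[index] = key;
                return index;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] table = new int[10];
        Arrays.fill(table, EMPTY);
        for (int key : new int[] {38, 19, 8, 79, 10}) {
            System.out.println(key + " stored in cell " + insert(table, key));
        }
    }
}
```

Run on the example keys, this places 38, 19, 8, 79, 10 in cells 8, 9, 0, 1, 2 respectively.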
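Load Factor?
Linear probing table with TableSize = 10 and 5 full cells (0: 8, 1: 79, 2: 10, 8: 38, 9: 19):
$\lambda = \frac{n}{\text{TableSize}} = \frac{5}{10} = 0.5$
Can the load factor when using linear probing ever exceed 1.0?
Nope!! Each cell holds at most one item, so n ≤ TableSize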
Open Addressing in General
This is one example of open addressing
Open addressing means resolving collisions by trying a sequence of other positions in the table
Trying the next spot is called probing
• We just did linear probing: the ith probe is (h(key) + i) % TableSize
• In general, have some probe function f and use (h(key) + f(i)) % TableSize
Open addressing does poorly with a high load factor λ
• So we want larger tables
• Too many probes means we lose our O(1)
Open Addressing: Other Operations
insert finds an open table position using a probe function
What about find?
• Must use the same probe function to "retrace the trail" for the data
• Unsuccessful search when we reach an empty position
What about delete?
• Must use "lazy" deletion. Why? Emptying a cell would cut the trail to any key that was inserted after probing past it
• Marker indicates "data was here, keep on probing"
Example (✗ marks a lazily deleted cell, / an empty one): 10 ✗ / 23 / / 16 ✗ 26
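A sketch of find and lazy deletion under linear probing, to make the "keep on probing" marker concrete. It assumes non-negative int keys; the sentinel values EMPTY and DELETED and the class name LazyDeleteTable are hypothetical choices for this illustration.

```java
import java.util.Arrays;

class LazyDeleteTable {
    private static final int EMPTY = -1;   // assumption: real keys are non-negative
    private static final int DELETED = -2; // marker: "data was here, keep on probing"
    private final int[] cells;

    LazyDeleteTable(int tableSize) {
        cells = new int[tableSize];
        Arrays.fill(cells, EMPTY);
    }

    // Linear probing: index of the i-th probe for key.
    private int probe(int key, int i) {
        return (key % cells.length + i) % cells.length;
    }

    // Retraces the probe trail; stops at the first truly empty cell.
    int find(int key) {
        for (int i = 0; i < cells.length; i++) {
            int index = probe(key, i);
            if (cells[index] == EMPTY) {
                return -1;
            }
            if (cells[index] == key) {
                return index;
            }
        }
        return -1;
    }

    boolean insert(int key) {
        if (find(key) >= 0) {
            return true; // already present
        }
        for (int i = 0; i < cells.length; i++) {
            int index = probe(key, i);
            if (cells[index] == EMPTY || cells[index] == DELETED) {
                cells[index] = key; // deleted cells may be reused
                return true;
            }
        }
        return false; // table is full
    }

    boolean delete(int key) {
        int index = find(key);
        if (index < 0) {
            return false;
        }
        cells[index] = DELETED; // leave a marker instead of emptying the cell
        return true;
    }
}
```

After inserting 38, 19, 8 into a table of size 10 and deleting 19, find(8) still succeeds because the marker in cell 9 keeps the search going to cell 0.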
Primary Clustering
It turns out linear probing is a bad idea, even though the probe function is quick to compute (which is a good thing)
• This tends to produce clusters, which lead to long probe sequences
• This is called primary clustering
• We saw the start of a cluster in our linear probing example
[R. Sedgewick]
Analysis of Linear Probing
Trivial fact: For any λ < 1, linear probing will find an empty slot
• We are safe from an infinite loop unless the table is full
Non-trivial facts (we won't prove these): average number of probes given load factor λ
• For an unsuccessful search as TableSize → ∞:
  $\frac{1}{2}\left(1 + \frac{1}{(1-\lambda)^2}\right)$
• For a successful search as TableSize → ∞:
  $\frac{1}{2}\left(1 + \frac{1}{1-\lambda}\right)$
Analysis in Chart Form
Linear-probing performance degrades rapidly as the table gets full
• The formula assumes a "large table", but the point remains
Note that separate chaining performance is linear in λ and has no trouble with λ > 1
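In place of the chart, the two large-table formulas for linear probing can be evaluated directly. This is a small sketch; the class and method names are hypothetical, and the numbers are estimates from the formulas, not measurements.

```java
class LinearProbingEstimates {
    // Average probes for an unsuccessful search: (1/2) * (1 + 1 / (1 - lambda)^2)
    static double unsuccessful(double lambda) {
        double free = 1.0 - lambda;
        return 0.5 * (1.0 + 1.0 / (free * free));
    }

    // Average probes for a successful search: (1/2) * (1 + 1 / (1 - lambda))
    static double successful(double lambda) {
        return 0.5 * (1.0 + 1.0 / (1.0 - lambda));
    }

    public static void main(String[] args) {
        for (double lambda : new double[] {0.25, 0.5, 0.75, 0.9}) {
            System.out.printf("lambda=%.2f unsuccessful=%.2f successful=%.2f%n",
                    lambda, unsuccessful(lambda), successful(lambda));
        }
    }
}
```

At λ = 0.5 the formulas give 2.5 and 1.5 probes; at λ = 0.9 they give about 50.5 and 5.5, which is the rapid degradation the slide describes.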
Open Addressing: Quadratic Probing
We can avoid primary clustering by changing the probe function from just i to f(i)
(h(key) + f(i)) % TableSize
For quadratic probing, f(i) = i^2:
0th probe: (h(key) + 0) % TableSize
1st probe: (h(key) + 1) % TableSize
2nd probe: (h(key) + 4) % TableSize
3rd probe: (h(key) + 9) % TableSize
…
ith probe: (h(key) + i^2) % TableSize
Intuition: Probes quickly "leave the neighborhood"
Quadratic Probing Example
TableSize = 10
insert(89): 89 % 10 = 9
insert(18): 18 % 10 = 8
insert(49): 49 % 10 = 9 collision!
  (49 + 1) % 10 = 0
insert(58): 58 % 10 = 8 collision!
  (58 + 1) % 10 = 9 collision!
  (58 + 4) % 10 = 2
insert(79): 79 % 10 = 9 collision!
  (79 + 1) % 10 = 0 collision!
  (79 + 4) % 10 = 3
Final table:
0  49
1
2  58
3  79
4
5
6
7
8  18
9  89
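The insertions above can be reproduced with a short sketch. It assumes non-negative int keys, uses -1 as a made-up empty marker, and gives up after TableSize probes; the names QuadraticProbing, EMPTY and insert are hypothetical.

```java
class QuadraticProbing {
    static final int EMPTY = -1; // assumption: real keys are non-negative

    // Tries probes i = 0 .. TableSize - 1 at (h(key) + i*i) % TableSize.
    // Returns the cell used, or -1 if none of those probes found a free cell.
    static int insert(int[] table, int key) {
        int home = key % table.length;
        for (int i = 0; i < table.length; i++) {
            int index = (home + i * i) % table.length;
            if (table[index] == EMPTY) {
                table[index] = key;
                return index;
            }
        }
        return -1;
    }
}
```

For TableSize = 10, inserting 89, 18, 49, 58, 79 returns cells 9, 8, 0, 2, 3, matching the final table. The probe limit matters: with an unlucky table size the sequence can revisit the same few cells forever.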
Another Quadratic Probing Example
TableSize = 7
Insert:
76 (76 % 7 = 6): cell 6
40 (40 % 7 = 5): cell 5
48 (48 % 7 = 6): collision, (48 + 1) % 7 = 0: cell 0
5 (5 % 7 = 5): collision, (5 + 1) % 7 = 6 collision, (5 + 4) % 7 = 2: cell 2
55 (55 % 7 = 6): collision, (55 + 1) % 7 = 0 collision, (55 + 4) % 7 = 3: cell 3
47 (47 % 7 = 5): collision!
  (47 + 1) % 7 = 6 collision!
  (47 + 4) % 7 = 2 collision!
  (47 + 9) % 7 = 0 collision!
  (47 + 16) % 7 = 0 collision!
  (47 + 25) % 7 = 2 collision!
Will we ever get a 1 or 4?!?
Table before insert(47):
0  48
1
2  5
3  55
4
5  40
6  76
Another Quadratic Probing Example
insert(47) will always fail here. Why?
For all n, (5 + n^2) % 7 is 0, 2, 5, or 6
Proof uses induction and (5 + n^2) % 7 = (5 + (n - 7)^2) % 7
In fact, for all c and k, (c + n^2) % k = (c + (n - k)^2) % k
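The set of reachable cells can be enumerated rather than proved. Because the probe sequence repeats with period k, checking n = 0 .. k - 1 is enough. The class and method names below are hypothetical.

```java
import java.util.TreeSet;

class QuadraticReach {
    // Distinct cells visited by (home + n*n) % tableSize for n = 0 .. tableSize - 1.
    static TreeSet<Integer> reachable(int home, int tableSize) {
        TreeSet<Integer> seen = new TreeSet<>();
        for (int n = 0; n < tableSize; n++) {
            seen.add((home + n * n) % tableSize);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(reachable(5, 7));
    }
}
```

For home cell 5 and TableSize 7 the reachable set is {0, 2, 5, 6}, so cells 1 and 4 are never probed.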
From Bad News to Good News
After TableSize quadratic probes, we cycle through the same indices
The good news:
• For prime T and 0 ≤ i, j ≤ T/2 where i ≠ j, (h(key) + i^2) % T ≠ (h(key) + j^2) % T
• If TableSize is prime and λ < ½, quadratic probing will find an empty slot in at most TableSize/2 probes
• If you keep λ < ½, no need to detect cycles as we just saw
Clustering Reconsidered
Quadratic probing does not suffer from primary clustering, as the quadratic nature quickly escapes the neighborhood
But it is no help if keys initially hash to the same index
• Any 2 keys that hash to the same value will have the same series of moves after that
• Called secondary clustering
We can avoid secondary clustering with a probe function that depends on the key: double hashing
Open Addressing: Double Hashing
Idea:
Given two good hash functions h and g, it is very
unlikely that for some key, h(key) == g(key)
Ergo, why not probe using g(key)?
For double hashing, f(i) = i ⋅ g(key):
0th probe: (h(key) + 0 ⋅ g(key)) % TableSize
1st probe: (h(key) + 1 ⋅ g(key)) % TableSize
2nd probe: (h(key) + 2 ⋅ g(key)) % TableSize
…
ith probe: (h(key) + i ⋅ g(key)) % TableSize
Crucial Detail:
We must make sure that g(key) cannot be 0
Double Hashing
Insert these values into the hash table in this order. Resolve any collisions with double hashing:
13, 28, 33, 147, 43
T = 10 (TableSize)
Hash Functions:
h(key) = key mod T
g(key) = 1 + ((key/T) mod (T-1)), where key/T is integer division
13: h(13) = 3, cell 3 is free
28: h(28) = 8, cell 8 is free
33: h(33) = 3 collision; g(33) = 1 + 3 mod 9 = 4; (3 + 4) mod 10 = 7, cell 7 is free
147: h(147) = 7 collision; g(147) = 1 + 14 mod 9 = 6; (7 + 6) mod 10 = 3 collision; (7 + 12) mod 10 = 9, cell 9 is free
43: h(43) = 3 collision; g(43) = 1 + 4 mod 9 = 5
We have a problem:
3 + 0 = 3, 3 + 5 = 8, 3 + 10 = 13, 3 + 15 = 18, 3 + 20 = 23
Every probe for 43 lands on cell 3 or cell 8, and both are full
Table after inserting 13, 28, 33, 147:
0
1
2
3  13
4
5
6
7  33
8  28
9  147
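The example can be replayed with a sketch that uses the slide's h and g and stops after TableSize probes instead of looping forever. It assumes non-negative int keys and a made-up EMPTY marker of -1; the class name DoubleHashing is hypothetical.

```java
class DoubleHashing {
    static final int EMPTY = -1; // assumption: real keys are non-negative

    static int h(int key, int tableSize) {
        return key % tableSize;
    }

    // Second hash from the slide; key / tableSize is integer division, and g is never 0.
    static int g(int key, int tableSize) {
        return 1 + ((key / tableSize) % (tableSize - 1));
    }

    // i-th probe is (h(key) + i * g(key)) % TableSize.
    // Returns the cell used, or -1 if TableSize probes found nothing free.
    static int insert(int[] table, int key) {
        int size = table.length;
        for (int i = 0; i < size; i++) {
            int index = (h(key, size) + i * g(key, size)) % size;
            if (table[index] == EMPTY) {
                table[index] = key;
                return index;
            }
        }
        return -1;
    }
}
```

With T = 10, the keys 13, 28, 33, 147 end up in cells 3, 8, 7, 9, and inserting 43 fails because its step of 5 only ever reaches cells 3 and 8.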
Double Hashing Analysis
Because each probe is "jumping" by g(key) each
time, we should ideally "leave the neighborhood" and
"go different places from the same initial collision"
But, as in quadratic probing, we could still have a
problem where we are not "safe" due to an infinite
loop despite room in table
This cannot happen in at least one case:
For primes p and q such that 2 < q < p
h(key) = key % p
g(key) = q – (key % q)
Summarizing Collision Resolution
Separate chaining is easy
• find, delete proportional to load factor on average
• insert can be constant if we just push on the front of the list
Open addressing uses probing and has clustering issues as it gets full, but there are still reasons to use it:
• Easier data representation
• Less memory allocation
• Avoids the run-time overhead of list nodes (though chaining with array-based buckets also reduces that overhead)
REHASHING
When you make hash from hash leftovers…
Rehashing
As with array-based stacks/queues/lists
• If the table gets too full, create a bigger table and copy everything
• Less helpful to shrink a table that is underfull
With chaining, we get to decide what "too full" means
• Keep load factor reasonable (e.g., < 1)?
• Consider average or max size of non-empty chains
For open addressing, half-full is a good rule of thumb
What size should we choose?
• Twice-as-big?
• Except that won't be prime!
We go about twice-as-big but guarantee a prime size
• Implement by hard coding a list of prime numbers
• You probably will not grow more than 20-30 times and can then calculate after that if necessary
Can we copy all data to the same indices in the new table?
• Will not work; we calculated the index based on TableSize
Rehash algorithm:
• Go through old table
• Do standard insert for each item into new table
Resize is an O(n) operation
• Iterate over old table: O(n)
• n inserts / calls to the hash function: n ⋅ O(1) = O(n)
Is there some way to avoid all those hash function calls?
• Space/time tradeoff: could store h(key) with each data item
• Growing the table is still O(n); only helps by a constant factor
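A minimal sketch of the rehash algorithm for an open-addressing table with linear probing. Note one deliberate difference from the slide: instead of a hard-coded list of primes, this sketch computes the next prime at or above twice the old size by trial division, which is an assumption made to keep the example short. All names (Rehash, nextPrime, EMPTY) are hypothetical.

```java
import java.util.Arrays;

class Rehash {
    static final int EMPTY = -1; // assumption: real keys are non-negative

    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    // Smallest prime >= n (computed here rather than looked up in a table).
    static int nextPrime(int n) {
        while (!isPrime(n)) {
            n++;
        }
        return n;
    }

    // Standard linear-probing insert.
    static void insert(int[] table, int key) {
        for (int i = 0; i < table.length; i++) {
            int index = (key % table.length + i) % table.length;
            if (table[index] == EMPTY) {
                table[index] = key;
                return;
            }
        }
        throw new IllegalStateException("table is full");
    }

    // Go through the old table and do a standard insert of each item into the new one.
    static int[] rehash(int[] old) {
        int[] bigger = new int[nextPrime(2 * old.length)];
        Arrays.fill(bigger, EMPTY);
        for (int key : old) {
            if (key != EMPTY) {
                insert(bigger, key); // index is recomputed with the new TableSize
            }
        }
        return bigger;
    }
}
```

Rehashing a table of size 7 gives a table of size 17, and each key moves to key % 17 (plus probing), not to its old index.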
EXTENDIBLE HASHING
When you make hash from hash leftovers…
Extendible Hashing
• Hashing technique for huge data sets
– optimizes to reduce disk accesses
– each hash bucket fits on one disk block
– better than B-Trees if order is not important
• Table contains
– buckets, each fitting in one disk block, with the
data
– a directory that fits in one disk block used to hash
to the correct bucket
Extendible Hash Table
• Directory: entries labeled by k bits, each with a pointer to the bucket holding all keys starting with its bits
• Each block contains keys & data matching on the first j ≤ k bits
Directory for k = 3: 000 001 010 011 100 101 110 111
Buckets (j shown in parentheses):
000, 001 → (2) 00001 00011 00100 00110
010, 011 → (2) 01001 01011 01100
100      → (3) 10001 10011
101      → (3) 10101 10110 10111
110, 111 → (2) 11001 11011 11100 11110
Inserting (easy case)
insert(11011)
The bucket for 110 and 111 still has room, so the key is simply added to it
Before: 110, 111 → (2) 11001 11100 11110
After:  110, 111 → (2) 11001 11011 11100 11110
The directory and all other buckets are unchanged
Splitting
insert(11000)
The bucket for 110 and 111 is full, so it splits on the third bit; both halves now match on j = 3 bits
Before: 110, 111 → (2) 11001 11011 11100 11110
After:  110 → (3) 11000 11001 11011
        111 → (3) 11100 11110
The directory (k = 3) and all other buckets are unchanged
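The easy case and the split can be sketched in a few lines. This is an in-memory illustration only: it assumes 5-bit keys, k = 3, and a bucket capacity of 4 (inferred from the diagrams, not stated on the slides), it does not model disk blocks, and it does not handle the case where a bucket with j = k overflows and the directory itself must grow. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ExtendibleSketch {
    static final int KEY_BITS = 5;        // keys are 5-bit values, as in the diagrams
    static final int K = 3;               // the directory is indexed by the first k = 3 bits
    static final int BUCKET_CAPACITY = 4; // assumption: inferred from the fullest bucket shown

    static class Bucket {
        final int depth; // j: number of leading bits shared by every key in this bucket
        final List<Integer> keys = new ArrayList<>();

        Bucket(int depth) {
            this.depth = depth;
        }
    }

    // The caller is expected to fill every directory entry before inserting.
    final Bucket[] directory = new Bucket[1 << K];

    static int directoryIndex(int key) {
        return key >>> (KEY_BITS - K); // first k bits of the key
    }

    void insert(int key) {
        Bucket target = directory[directoryIndex(key)];
        if (target.keys.size() < BUCKET_CAPACITY) {
            target.keys.add(key); // easy case: the bucket has room
            return;
        }
        if (target.depth == K) {
            throw new IllegalStateException("growing the directory is not sketched here");
        }
        // Split on the next bit after the ones the bucket already shares.
        Bucket zero = new Bucket(target.depth + 1);
        Bucket one = new Bucket(target.depth + 1);
        int keyBit = KEY_BITS - target.depth - 1;
        for (int existing : target.keys) {
            if (((existing >>> keyBit) & 1) == 0) {
                zero.keys.add(existing);
            } else {
                one.keys.add(existing);
            }
        }
        int directoryBit = K - target.depth - 1;
        for (int i = 0; i < directory.length; i++) {
            if (directory[i] == target) {
                directory[i] = ((i >>> directoryBit) & 1) == 0 ? zero : one;
            }
        }
        insert(key); // retry now that the bucket has been split
    }
}
```

Starting from a full bucket shared by entries 110 and 111, insert(11000) leaves 11001, 11011, 11000 under 110 and 11100, 11110 under 111, each with j = 3. The sketch keeps keys in insertion order rather than sorted as in the diagrams.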
IMPLEMENTING HASHING
Reality is never as clean-cut as theory
Hashing and Comparing
Our use of int keys can lead us to overlook a critical detail
• We perform the initial hash on a key of type E
• While chaining/probing, we compare against keys of type E, which requires equality testing (compare == 0)
A hash table needs a hash function and a comparator
• In Project 2, you will use two function objects
• The Java library uses a more object-oriented approach: each object has an equals method and a hashCode method:
class Object {
    boolean equals(Object o) {…}
    int hashCode() {…}
    …
}
Equal Objects Must Hash the Same
The Java library (and your project hash table) make a very important assumption that clients must satisfy
Object-oriented way of saying it:
If a.equals(b), then we must require a.hashCode() == b.hashCode()
Function object way of saying it:
If c.compare(a,b) == 0, then we must require h.hash(a) == h.hash(b)
If you ever override equals
• You need to override hashCode also in a consistent way
• See CoreJava book, Chapter 5 for other "gotchas" with equals
Comparable/Comparator Rules
We have not emphasized important "rules" about comparison for:
• all our dictionaries
• sorting (next major topic)
Comparison must impose a consistent, total ordering:
For all a, b, and c:
• If compare(a,b) < 0, then compare(b,a) > 0
• If compare(a,b) == 0, then compare(b,a) == 0
• If compare(a,b) < 0 and compare(b,c) < 0, then compare(a,c) < 0
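One common way to break these rules by accident is to compare ints by subtraction, which can overflow. The sketch below checks the third rule on three specific values; the class name, field names and the helper transitiveOn are hypothetical, while Comparator and Integer.compare are from the Java library.

```java
import java.util.Comparator;

class ComparatorRules {
    // Tempting but broken: a - b can overflow for values that are far apart.
    static final Comparator<Integer> BY_SUBTRACTION = (a, b) -> a - b;

    // Integer.compare never overflows.
    static final Comparator<Integer> BY_COMPARE = Integer::compare;

    // Checks: if compare(x,y) < 0 and compare(y,z) < 0, then compare(x,z) < 0.
    static boolean transitiveOn(Comparator<Integer> c, int x, int y, int z) {
        if (c.compare(x, y) < 0 && c.compare(y, z) < 0) {
            return c.compare(x, z) < 0;
        }
        return true; // the rule makes no claim for this triple
    }

    public static void main(String[] args) {
        int x = -2_000_000_000, y = 0, z = 2_000_000_000;
        System.out.println("subtraction: " + transitiveOn(BY_SUBTRACTION, x, y, z));
        System.out.println("Integer.compare: " + transitiveOn(BY_COMPARE, x, y, z));
    }
}
```

For x = -2,000,000,000, y = 0, z = 2,000,000,000 the subtraction comparator says x < y and y < z but x > z, because x - z overflows to a positive int.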
A Generally Good hashCode()
int result = 17; // start at a prime
foreach field f
    int fieldHashcode =
        boolean: (f ? 1 : 0)
        byte, char, short, int: (int) f
        long: (int) (f ^ (f >>> 32))
        float: Float.floatToIntBits(f)
        double: Double.doubleToLongBits(f), then hash the resulting long as above
        Object: f.hashCode()
    result = 31 * result + fieldHashcode;
return result;
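Applied to a concrete class, the recipe might look like the sketch below. LabeledPoint and its three fields are hypothetical, chosen to exercise the int, long and Object cases; equals and hashCode use the same fields so that equal objects hash the same.

```java
class LabeledPoint {
    private final int x;
    private final long id;
    private final String label; // assumption: never null

    LabeledPoint(int x, long id, String label) {
        this.x = x;
        this.id = id;
        this.label = label;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof LabeledPoint)) {
            return false;
        }
        LabeledPoint other = (LabeledPoint) o;
        return x == other.x && id == other.id && label.equals(other.label);
    }

    @Override
    public int hashCode() {
        int result = 17;                                 // start at a prime
        result = 31 * result + x;                        // int field
        result = 31 * result + (int) (id ^ (id >>> 32)); // long field
        result = 31 * result + label.hashCode();         // Object field
        return result;
    }
}
```

Two LabeledPoint objects built from the same values are equal and have the same hash code, so a HashSet that contains one will report that it contains the other.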
Final Word on Hashing
The hash table is one of the most important data structures
• Efficient find, insert, and delete
• Operations based on sorted order are not so efficient
• Useful in many, many real-world applications
• Popular topic for job interview questions
Important to use a good hash function
• Good distribution of key hashes
• Not overly expensive to calculate (bit shifts good!)
Important to keep hash table at a good size
• Keep TableSize a prime number
• Set a preferable λ depending on type of hash table

Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 

Recently uploaded (20)

(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...
(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...
(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
High Profile Call Girls Dahisar Arpita 9907093804 Independent Escort Service ...
High Profile Call Girls Dahisar Arpita 9907093804 Independent Escort Service ...High Profile Call Girls Dahisar Arpita 9907093804 Independent Escort Service ...
High Profile Call Girls Dahisar Arpita 9907093804 Independent Escort Service ...
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 

Collision in Hashing.pptx

  • 2. Collision Avoidance: With (x % TableSize), the number of collisions depends on (a) the ints inserted and (b) TableSize. A larger table size tends to help, but not always. Example: 70, 24, 56, 43, 10 with TableSize = 10 and TableSize = 60. Technique: pick the table size to be prime. Why? Real-life data tends to have a pattern, and "multiples of 61" are probably less likely than "multiples of 60". Some collision strategies also do better with a prime size.
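The slide's example can be checked with a short sketch. The class and method names below are hypothetical, and keys are assumed to be non-negative. For the five keys on the slide, table sizes 10 and 60 each produce one collision (70 and 10 share a slot), while the prime size 61 produces none.

```java
import java.util.HashSet;
import java.util.Set;

final class CollisionCount {
    // Counts keys that land on an index some earlier key already hashed to.
    static int countCollisions(int[] keys, int tableSize) {
        Set<Integer> usedIndices = new HashSet<>();
        int collisions = 0;
        for (int key : keys) {
            int index = key % tableSize;      // assumes non-negative keys
            if (!usedIndices.add(index)) {    // add() returns false if already present
                collisions++;
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        int[] keys = {70, 24, 56, 43, 10};
        for (int size : new int[] {10, 60, 61}) {
            System.out.println("TableSize " + size + ": "
                    + countCollisions(keys, size) + " collision(s)");
        }
    }
}
```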
  • 3. Collision Resolution: A collision occurs when two keys map to the same location in the hash table. We try to avoid collisions, but the number of possible keys almost always exceeds the table size. Ergo, hash tables generally must support some form of collision resolution.
  • 4. Flavors of Collision Resolution Separate Chaining Open Addressing  Linear Probing  Quadratic Probing  Double Hashing 4
  • 5. Terminology Warning We and the book use the terms  "chaining" or "separate chaining"  "open addressing" Very confusingly, others use the terms  "open hashing" for "chaining"  "closed hashing" for "open addressing" We also do trees upside-down 5
  • 6. Separate Chaining 6 0 / 1 / 2 / 3 / 4 / 5 / 6 / 7 / 8 / 9 / All keys that map to the same table location are kept in a linked list (a.k.a. a "chain" or "bucket") As easy as it sounds Example: insert 10, 22, 86, 12, 42 with h(x) = x % 10
  • 7. Separate Chaining 7 0 1 / 2 / 3 / 4 / 5 / 6 / 7 / 8 / 9 / All keys that map to the same table location are kept in a linked list (a.k.a. a "chain" or "bucket") As easy as it sounds Example: insert 10, 22, 86, 12, 42 with h(x) = x % 10 10 /
  • 8. Separate Chaining 8 0 1 / 2 3 / 4 / 5 / 6 / 7 / 8 / 9 / All keys that map to the same table location are kept in a linked list (a.k.a. a "chain" or "bucket") As easy as it sounds Example: insert 10, 22, 86, 12, 42 with h(x) = x % 10 10 / 22 /
  • 9. Separate Chaining 9 0 1 / 2 3 / 4 / 5 / 6 7 / 8 / 9 / All keys that map to the same table location are kept in a linked list (a.k.a. a "chain" or "bucket") As easy as it sounds Example: insert 10, 22, 86, 12, 42 with h(x) = x % 10 10 / 22 / 86 /
  • 10. Separate Chaining 10 0 1 / 2 3 / 4 / 5 / 6 7 / 8 / 9 / All keys that map to the same table location are kept in a linked list (a.k.a. a "chain" or "bucket") As easy as it sounds Example: insert 10, 22, 86, 12, 42 with h(x) = x % 10 10 / 12 86 / 22 /
  • 11. Separate Chaining 11 0 1 / 2 3 / 4 / 5 / 6 7 / 8 / 9 / All keys that map to the same table location are kept in a linked list (a.k.a. a "chain" or "bucket") As easy as it sounds Example: insert 10, 22, 86, 12, 42 with h(x) = x % 10 10 / 42 86 / 12 22 /
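The insertion sequence above can be reproduced with a minimal separate-chaining sketch. The class name ChainedIntTable and its methods are hypothetical. It assumes int keys, h(x) = x % TableSize, and insertion at the front of each chain, which matches the order the slides show in bucket 2.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

final class ChainedIntTable {
    private final List<LinkedList<Integer>> buckets = new ArrayList<>();

    ChainedIntTable(int tableSize) {
        for (int i = 0; i < tableSize; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    private int indexOf(int key) {
        return Math.floorMod(key, buckets.size());
    }

    // Inserts at the front of the chain; duplicates are ignored.
    void insert(int key) {
        LinkedList<Integer> bucket = buckets.get(indexOf(key));
        if (!bucket.contains(key)) {
            bucket.addFirst(key);
        }
    }

    boolean find(int key) {
        return buckets.get(indexOf(key)).contains(key);
    }

    // Removes the key from its chain; returns false if it was absent.
    boolean delete(int key) {
        return buckets.get(indexOf(key)).remove(Integer.valueOf(key));
    }

    // Copy of one chain, front to back, for inspection.
    List<Integer> chain(int index) {
        return new ArrayList<>(buckets.get(index));
    }
}
```

After inserting 10, 22, 86, 12, 42 into a table of size 10, chain(2) holds 42, 12, 22 in that order, and chain(0) and chain(6) each hold one key.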
  • 12. Thoughts on Separate Chaining: Worst-case time for find? Linear, but only with really bad luck or a bad hash function. It is not worth avoiding (e.g., with balanced trees at each bucket): keep a small number of items in each bucket, and the overhead of tree balancing is not worthwhile for small n. Beyond asymptotic complexity, some "data-structure engineering" can improve constant factors: linked list, array, or a hybrid; insert at end or beginning of list; sorting the lists gains and loses performance; splay-like: always move item to front of list.
  • 13. Rigorous Separate Chaining Analysis: The load factor, λ, of a hash table is calculated as λ = n / TableSize, where n is the number of items currently in the table.
  • 14. Load Factor? Chains: bucket 0: 10; bucket 2: 42, 12, 22; bucket 6: 86; all other buckets empty. λ = n / TableSize = 5 / 10 = 0.5
  • 15. Load Factor? Bucket contents (by x % 10): 0: 10; 1: 71, 31; 2: 42, 12, 22, 2; 3: 63, 73; 4: empty; 5: 75, 5, 65, 95; 6: 86; 7: 27, 47; 8: 88, 18, 38, 98; 9: 99. λ = n / TableSize = 21 / 10 = 2.1
  • 16. Rigorous Separate Chaining Analysis: The load factor, λ, of a hash table is calculated as λ = n / TableSize, where n is the number of items currently in the table. Under chaining, the average number of elements per bucket is ___. So if some inserts are followed by random finds, then on average: each unsuccessful find compares against ___ items; each successful find compares against ___ items. How big should TableSize be?
  • 17. Rigorous Separate Chaining Analysis: The load factor, λ, of a hash table is calculated as λ = n / TableSize, where n is the number of items currently in the table. Under chaining, the average number of elements per bucket is λ. So if some inserts are followed by random finds, then on average: each unsuccessful find compares against λ items; each successful find compares against about λ/2 items. If λ is low, find and insert are likely to be O(1). We like to keep λ around 1 for separate chaining.
  • 18. Separate Chaining Deletion Not too bad and quite easy  Find in table  Delete from bucket Similar run-time as insert  Sensitive to underlying bucket structure 18 0 1 / 2 3 / 4 / 5 / 6 7 / 8 / 9 / 10 / 42 86 / 12 22 /
  • 19. Open Addressing: Linear Probing Separate chaining does not use all the space in the table. Why not use it?  Store directly in the array cell  No linked lists or buckets How to deal with collisions? If h(key) is already full, try (h(key) + 1) % TableSize. If full, try (h(key) + 2) % TableSize. If full, try (h(key) + 3) % TableSize. If full… Example: insert 38, 19, 8, 79, 10 0 1 2 3 4 5 6 7 8 9 19
  • 20. Open Addressing: Linear Probing Separate chaining does not use all the space in the table. Why not use it?  Store directly in the array cell (no linked list or buckets) How to deal with collisions? If h(key) is already full, try (h(key) + 1) % TableSize. If full, try (h(key) + 2) % TableSize. If full, try (h(key) + 3) % TableSize. If full… Example: insert 38, 19, 8, 79, 10 0 1 2 3 4 5 6 7 8 38 9 20
  • 21. Open Addressing: Linear Probing Separate chaining does not use all the space in the table. Why not use it?  Store directly in the array cell (no linked list or buckets) How to deal with collisions? If h(key) is already full, try (h(key) + 1) % TableSize. If full, try (h(key) + 2) % TableSize. If full, try (h(key) + 3) % TableSize. If full… Example: insert 38, 19, 8, 79, 10 0 1 2 3 4 5 6 7 8 38 9 19 21
  • 22. Open Addressing: Linear Probing Separate chaining does not use all the space in the table. Why not use it?  Store directly in the array cell (no linked list or buckets) How to deal with collisions? If h(key) is already full, try (h(key) + 1) % TableSize. If full, try (h(key) + 2) % TableSize. If full, try (h(key) + 3) % TableSize. If full… Example: insert 38, 19, 8, 79, 10 0 8 1 2 3 4 5 6 7 8 38 9 19 22
  • 23. Open Addressing: Linear Probing Separate chaining does not use all the space in the table. Why not use it?  Store directly in the array cell (no linked list or buckets) How to deal with collisions? If h(key) is already full, try (h(key) + 1) % TableSize. If full, try (h(key) + 2) % TableSize. If full, try (h(key) + 3) % TableSize. If full… Example: insert 38, 19, 8, 79, 10 0 8 1 79 2 3 4 5 6 7 8 38 9 19 23
  • 24. Open Addressing: Linear Probing Separate chaining does not use all the space in the table. Why not use it?  Store directly in the array cell (no linked list or buckets) How to deal with collisions? If h(key) is already full, try (h(key) + 1) % TableSize. If full, try (h(key) + 2) % TableSize. If full, try (h(key) + 3) % TableSize. If full… Example: insert 38, 19, 8, 79, 10 0 8 1 79 2 10 3 4 5 6 7 8 38 9 19 24
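The probing rule in this example can be sketched in a few lines. The class name LinearProbingTable is hypothetical, and the sketch assumes int keys with h(key) = key % TableSize and a null entry for an empty cell.

```java
final class LinearProbingTable {
    private final Integer[] slots;   // null marks an empty cell

    LinearProbingTable(int tableSize) {
        slots = new Integer[tableSize];
    }

    // Tries (h(key) + i) % TableSize for i = 0, 1, 2, ...
    // Returns the index used, or -1 if every cell is occupied.
    int insert(int key) {
        int home = Math.floorMod(key, slots.length);
        for (int i = 0; i < slots.length; i++) {
            int index = (home + i) % slots.length;
            if (slots[index] == null) {
                slots[index] = key;
                return index;
            }
        }
        return -1;
    }

    Integer at(int index) {
        return slots[index];
    }
}
```

Inserting 38, 19, 8, 79, 10 into a table of size 10 places them at indices 8, 9, 0, 1, 2, which is the final table on the slide.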
  • 25. Load Factor? Table: 0: 8; 1: 79; 2: 10; 8: 38; 9: 19; cells 3 to 7 are empty. λ = n / TableSize = 5 / 10 = 0.5. Can the load factor ever exceed 1.0 when using linear probing? Nope!
  • 26. Open Addressing in General: This is one example of open addressing. Open addressing means resolving collisions by trying a sequence of other positions in the table. Trying the next spot is called probing. We just did linear probing: (h(key) + i) % TableSize. In general, we have some probe function f and use (h(key) + f(i)) % TableSize. Open addressing does poorly with a high load factor λ, so we want larger tables; too many probes means we lose our O(1).
  • 27. Open Addressing: Other Operations: insert finds an open table position using a probe function. What about find? It must use the same probe function to "retrace the trail" for the data; the search is unsuccessful when it reaches an empty position. What about delete? It must use "lazy" deletion. Why? A marker indicates "data was here, keep on probing".
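A small sketch shows why the marker matters. The names here (LazyDeleteTable and the EMPTY/FULL/DELETED states) are hypothetical, and the sketch assumes linear probing, int keys, and that callers do not insert a key that is already present.

```java
final class LazyDeleteTable {
    private static final int EMPTY = 0;     // never used
    private static final int FULL = 1;      // holds a live key
    private static final int DELETED = 2;   // "data was here, keep on probing"

    private final int[] keys;
    private final int[] state;              // int arrays start as all EMPTY (0)

    LazyDeleteTable(int tableSize) {
        keys = new int[tableSize];
        state = new int[tableSize];
    }

    private int home(int key) {
        return Math.floorMod(key, keys.length);
    }

    // Empty cells and deleted markers can both be reused by insert.
    boolean insert(int key) {
        for (int i = 0; i < keys.length; i++) {
            int index = (home(key) + i) % keys.length;
            if (state[index] != FULL) {
                keys[index] = key;
                state[index] = FULL;
                return true;
            }
        }
        return false;
    }

    // Retraces the probe trail; stops only at a never-used cell.
    int find(int key) {
        for (int i = 0; i < keys.length; i++) {
            int index = (home(key) + i) % keys.length;
            if (state[index] == EMPTY) {
                return -1;
            }
            if (state[index] == FULL && keys[index] == key) {
                return index;
            }
        }
        return -1;
    }

    boolean delete(int key) {
        int index = find(key);
        if (index < 0) {
            return false;
        }
        state[index] = DELETED;   // leave a marker instead of emptying the cell
        return true;
    }
}
```

With 38, 19, 8 inserted into a table of size 10, key 8 sits at index 0 because indices 8 and 9 were taken. Deleting 19 leaves a marker at index 9, so find(8) still walks past it and reaches index 0. If the cell were simply emptied, the search would stop at index 9 and report 8 as missing.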
  • 28. Primary Clustering It turns out linear probing is a bad idea, even though the probe function is quick to compute (which is a good thing)  This tends to produce clusters, which lead to long probe sequences  This is called primary clustering  We saw the start of a cluster in our linear probing example 28 [R. Sedgewick]
  • 29. Analysis of Linear Probing: Trivial fact: for any λ < 1, linear probing will find an empty slot, so we are safe from an infinite loop unless the table is full. Non-trivial facts (we won't prove these): the average number of probes given load factor λ is, for an unsuccessful search as TableSize → ∞, $\frac{1}{2}\left(1 + \frac{1}{(1-\lambda)^2}\right)$, and for a successful search as TableSize → ∞, $\frac{1}{2}\left(1 + \frac{1}{1-\lambda}\right)$.
  • 30. Analysis in Chart Form: Linear-probing performance degrades rapidly as the table gets full. The formula does assume a "large table", but the point remains. Note that separate chaining performance is linear in λ and has no trouble with λ > 1.
  • 31. Open Addressing: Quadratic Probing: We can avoid primary clustering by changing the probe function from just i to f(i): (h(key) + f(i)) % TableSize. For quadratic probing, f(i) = i^2: 0th probe: (h(key) + 0) % TableSize; 1st probe: (h(key) + 1) % TableSize; 2nd probe: (h(key) + 4) % TableSize; 3rd probe: (h(key) + 9) % TableSize; … ith probe: (h(key) + i^2) % TableSize. Intuition: probes quickly "leave the neighborhood".
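To get a feel for these two formulas, the sketch below simply evaluates them for a few load factors. The class and method names are hypothetical, and the values are the large-table estimates quoted on the slide, not measurements.

```java
final class LinearProbingCost {
    // Expected probes for an unsuccessful search: 1/2 * (1 + 1/(1 - lambda)^2)
    static double unsuccessful(double lambda) {
        double gap = 1.0 - lambda;
        return 0.5 * (1.0 + 1.0 / (gap * gap));
    }

    // Expected probes for a successful search: 1/2 * (1 + 1/(1 - lambda))
    static double successful(double lambda) {
        return 0.5 * (1.0 + 1.0 / (1.0 - lambda));
    }

    public static void main(String[] args) {
        for (double lambda : new double[] {0.25, 0.5, 0.75, 0.9}) {
            System.out.printf("lambda=%.2f unsuccessful=%.2f successful=%.2f%n",
                    lambda, unsuccessful(lambda), successful(lambda));
        }
    }
}
```

At λ = 0.5 the estimates are 2.5 and 1.5 probes; at λ = 0.9 they rise to about 50.5 and 5.5, which is the rapid degradation the next slide describes.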
  • 33. Quadratic Probing Example 33 0 1 2 3 4 5 6 7 8 9 89 TableSize = 10 insert(89) insert(18)
  • 34. Quadratic Probing Example TableSize = 10 insert(89) insert(18) insert(49) 34 0 1 2 3 4 5 6 7 8 18 9 89
  • 35. Quadratic Probing Example TableSize = 10 insert(89) insert(18) insert(49) 49 % 10 = 9 collision! (49 + 1) % 10 = 0 insert(58) 35 0 49 1 2 3 4 5 6 7 8 18 9 89
  • 36. Quadratic Probing Example TableSize = 10 insert(89) insert(18) insert(49) insert(58) 58 % 10 = 8 collision! (58 + 1) % 10 = 9 collision! (58 + 4) % 10 = 2 insert(79) 36 0 49 1 2 58 3 4 5 6 7 8 18 9 89
  • 37. Quadratic Probing Example TableSize = 10 insert(89) insert(18) insert(49) insert(58) insert(79) 79 % 10 = 9 collision! (79 + 1) % 10 = 0 collision! (79 + 4) % 10 = 3 37 0 49 1 2 58 3 79 4 5 6 7 8 18 9 89
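The example above can be replayed with a minimal quadratic-probing sketch. The class name QuadraticProbingTable is hypothetical. It assumes int keys, h(key) = key % TableSize, and gives up after TableSize probes (an assumed cut-off, since later probes only repeat earlier indices).

```java
final class QuadraticProbingTable {
    private final Integer[] slots;   // null marks an empty cell

    QuadraticProbingTable(int tableSize) {
        slots = new Integer[tableSize];
    }

    // Tries (h(key) + i*i) % TableSize for i = 0 .. TableSize - 1.
    // Returns the index used, or -1 if none of those probes found room.
    int insert(int key) {
        int home = Math.floorMod(key, slots.length);
        for (int i = 0; i < slots.length; i++) {
            int index = (home + i * i) % slots.length;
            if (slots[index] == null) {
                slots[index] = key;
                return index;
            }
        }
        return -1;
    }
}
```

With TableSize = 10, inserting 89, 18, 49, 58, 79 returns indices 9, 8, 0, 2, 3, the same placement as the slides.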
  • 38. Another Quadratic Probing Example 38 0 1 2 3 4 5 6 TableSize = 7 Insert: 76 (76 % 7 = 6) 40 (40 % 7 = 5) 48 (48 % 7 = 6) 5 (5 % 7 = 5) 55 (55 % 7 = 6) 47 (47 % 7 = 5)
  • 39. Another Quadratic Probing Example 39 0 1 2 3 4 5 6 76 TableSize = 7 Insert: 76 (76 % 7 = 6) 40 (40 % 7 = 5) 48 (48 % 7 = 6) 5 (5 % 7 = 5) 55 (55 % 7 = 6) 47 (47 % 7 = 5)
  • 40. Another Quadratic Probing Example 40 0 1 2 3 4 5 40 6 76 TableSize = 7 Insert: 76 (76 % 7 = 6) 40 (40 % 7 = 5) 48 (48 % 7 = 6) 5 (5 % 7 = 5) 55 (55 % 7 = 6) 47 (47 % 7 = 5)
  • 41. Another Quadratic Probing Example 41 0 48 1 2 3 4 5 40 6 76 TableSize = 7 Insert: 76 (76 % 7 = 6) 40 (40 % 7 = 5) 48 (48 % 7 = 6) 5 (5 % 7 = 5) 55 (55 % 7 = 6) 47 (47 % 7 = 5)
  • 42. Another Quadratic Probing Example 42 0 48 1 2 5 3 4 5 40 6 76 TableSize = 7 Insert: 76 (76 % 7 = 6) 40 (40 % 7 = 5) 48 (48 % 7 = 6) 5 (5 % 7 = 5) 55 (55 % 7 = 6) 47 (47 % 7 = 5)
  • 43. Another Quadratic Probing Example 43 0 48 1 2 5 3 55 4 5 40 6 76 TableSize = 7 Insert: 76 (76 % 7 = 6) 40 (40 % 7 = 5) 48 (48 % 7 = 6) 5 (5 % 7 = 5) 55 (55 % 7 = 6) 47 (47 % 7 = 5)
  • 44. Another Quadratic Probing Example 44 0 48 1 2 5 3 55 4 5 40 6 76 TableSize = 7 Insert: 76 (76 % 7 = 6) 40 (40 % 7 = 5) 48 (48 % 7 = 6) 5 (5 % 7 = 5) 55 (55 % 7 = 6) 47 (47 % 7 = 5) (47 + 1) % 7 = 6 collision! (47 + 4) % 7 = 2 collision! (47 + 9) % 7 = 0 collision! (47 + 16) % 7 = 0 collision! (47 + 25) % 7 = 2 collision! Will we ever get a 1 or 4?!?
  • 45. Another Quadratic Probing Example: Table: 0: 48; 2: 5; 3: 55; 5: 40; 6: 76; cells 1 and 4 are empty. insert(47) will always fail here. Why? For all n, (5 + n^2) % 7 is 0, 2, 5, or 6. The proof uses induction and (5 + n^2) % 7 = (5 + (n - 7)^2) % 7. In fact, for all c and k, (c + n^2) % k = (c + (n - k)^2) % k.
  • 46. From Bad News to Good News: After TableSize quadratic probes, we cycle through the same indices. The good news: for prime T and 0 ≤ i, j ≤ T/2 where i ≠ j, (h(key) + i^2) % T ≠ (h(key) + j^2) % T. If TableSize is prime and λ < ½, quadratic probing will find an empty slot in at most TableSize/2 probes. If you keep λ < ½, there is no need to detect cycles like the one we just saw.
  • 47. Clustering Reconsidered: Quadratic probing does not suffer from primary clustering, as the quadratic nature quickly escapes the neighborhood. But it is no help if keys initially hash to the same index: any 2 keys that hash to the same value will have the same series of moves after that. This is called secondary clustering. We can avoid secondary clustering with a probe function that depends on the key: double hashing.
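The claim about reachable indices is easy to check by brute force. The helper below is a hypothetical sketch: it collects every index that quadratic probing can reach from a given home position, using only the first TableSize probes because of the periodicity identity on the slide.

```java
import java.util.TreeSet;

final class QuadraticReach {
    // All distinct values of (home + i*i) % tableSize.
    // Probes with i >= tableSize repeat earlier ones, so i < tableSize suffices.
    static TreeSet<Integer> reachable(int home, int tableSize) {
        TreeSet<Integer> seen = new TreeSet<>();
        for (int i = 0; i < tableSize; i++) {
            seen.add((home + i * i) % tableSize);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(reachable(5, 7));
    }
}
```

For home = 5 and TableSize = 7 the reachable set is {0, 2, 5, 6}, so the empty cells 1 and 4 are never probed.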
  • 48. Open Addressing: Double Hashing Idea: Given two good hash functions h and g, it is very unlikely that for some key, h(key) == g(key) Ergo, why not probe using g(key)? For double hashing, f(i) = i ⋅ g(key): 0th probe: (h(key) + 0 ⋅ g(key)) % TableSize 1st probe: (h(key) + 1 ⋅ g(key)) % TableSize 2nd probe: (h(key) + 2 ⋅ g(key)) % TableSize … ith probe: (h(key) + i ⋅ g(key)) % TableSize Crucial Detail: We must make sure that g(key) cannot be 0 48
  • 49. Double Hashing Insert these values into the hash table in this order. Resolve any collisions with double hashing: 13 28 33 147 43 T = 10 (TableSize) Hash Functions: h(key) = key mod T g(key) = 1 + ((key/T) mod (T-1)) 49 0 1 2 3 4 5 6 7 8 9
  • 50. Double Hashing Insert these values into the hash table in this order. Resolve any collisions with double hashing: 13 28 33 147 43 T = 10 (TableSize) Hash Functions: h(key) = key mod T g(key) = 1 + ((key/T) mod (T-1)) 50 0 1 2 3 13 4 5 6 7 8 9
  • 51. Double Hashing Insert these values into the hash table in this order. Resolve any collisions with double hashing: 13 28 33 147 43 T = 10 (TableSize) Hash Functions: h(key) = key mod T g(key) = 1 + ((key/T) mod (T-1)) 51 0 1 2 3 13 4 5 6 7 8 28 9
  • 52. Double Hashing Insert these values into the hash table in this order. Resolve any collisions with double hashing: 13 28 33  g(33) = 1 + 3 mod 9 = 4 147 43 T = 10 (TableSize) Hash Functions: h(key) = key mod T g(key) = 1 + ((key/T) mod (T-1)) 52 0 1 2 3 13 4 5 6 7 33 8 28 9
  • 53. Double Hashing Insert these values into the hash table in this order. Resolve any collisions with double hashing: 13 28 33 147  g(147) = 1 + 14 mod 9 = 6 43 T = 10 (TableSize) Hash Functions: h(key) = key mod T g(key) = 1 + ((key/T) mod (T-1)) 53 0 1 2 3 13 4 5 6 7 33 8 28 9 147
  • 54. Double Hashing Insert these values into the hash table in this order. Resolve any collisions with double hashing: 13 28 33 147  g(147) = 1 + 14 mod 9 = 6 43  g(43) = 1 + 4 mod 9 = 5 T = 10 (TableSize) Hash Functions: h(key) = key mod T g(key) = 1 + ((key/T) mod (T-1)) 54 0 1 2 3 13 4 5 6 7 33 8 28 9 147 We have a problem: 3 + 0 = 3 3 + 5 = 8 3 + 10 = 13 3 + 15 = 18 3 + 20 = 23
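The worked example, including the failure on 43, can be replayed with the sketch below. The class name DoubleHashingTable is hypothetical. It uses the slide's h and g with integer division, assumes non-negative keys, and stops after TableSize probes (an assumed cut-off).

```java
final class DoubleHashingTable {
    private final Integer[] slots;   // null marks an empty cell

    DoubleHashingTable(int tableSize) {
        slots = new Integer[tableSize];
    }

    // h(key) = key mod T
    int h(int key) {
        return key % slots.length;
    }

    // g(key) = 1 + ((key / T) mod (T - 1)), never 0
    int g(int key) {
        return 1 + ((key / slots.length) % (slots.length - 1));
    }

    // Tries (h(key) + i * g(key)) % T for i = 0 .. T - 1.
    // Returns the index used, or -1 if none of those probes found room.
    int insert(int key) {
        for (int i = 0; i < slots.length; i++) {
            int index = (h(key) + i * g(key)) % slots.length;
            if (slots[index] == null) {
                slots[index] = key;
                return index;
            }
        }
        return -1;
    }
}
```

With T = 10, the keys 13, 28, 33, 147 land at indices 3, 8, 7, 9. For 43, g(43) = 5 shares a factor with T = 10, so the probes alternate between the occupied cells 3 and 8 and the insert fails even though six cells are free.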
  • 55. Double Hashing Analysis Because each probe is "jumping" by g(key) each time, we should ideally "leave the neighborhood" and "go different places from the same initial collision" But, as in quadratic probing, we could still have a problem where we are not "safe" due to an infinite loop despite room in table This cannot happen in at least one case: For primes p and q such that 2 < q < p h(key) = key % p g(key) = q – (key % q) 55
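The safe case can be checked empirically. The sketch below is hypothetical and assumes the table size equals p. Because g(key) always lies between 1 and q, and q < p with p prime, the step size shares no factor with the table size, so the first p probes should touch every cell.

```java
final class PrimeDoubleHash {
    // Number of distinct cells visited by the first p probes for this key,
    // with h(key) = key % p and g(key) = q - (key % q). Assumes key >= 0.
    static int slotsVisited(int key, int p, int q) {
        boolean[] seen = new boolean[p];
        int h = key % p;
        int g = q - (key % q);   // always in 1..q, so never 0
        int count = 0;
        for (int i = 0; i < p; i++) {
            int index = (h + i * g) % p;
            if (!seen[index]) {
                seen[index] = true;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(slotsVisited(43, 11, 7));
    }
}
```

For p = 11 and q = 7, every key from 0 to 999 visits all 11 cells, so an insert cannot loop forever while the table still has room.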
  • 56. Summarizing Collision Resolution: Separate chaining is easy: find and delete take time proportional to the load factor on average, and insert can be constant time if it just pushes onto the front of the list. Open addressing uses probing and has clustering issues as the table gets full, but there are still reasons to use it: an easier data representation, less memory allocation, and no run-time overhead for list nodes (though an array-based chain implementation could be faster).
  • 57. REHASHING When you make hash from hash leftovers… 57
  • 58. Rehashing As with array-based stacks/queues/lists  If table gets too full, create a bigger table and copy everything  Less helpful to shrink a table that is underfull With chaining, we get to decide what "too full" means  Keep load factor reasonable (e.g., < 1)?  Consider average or max size of non-empty chains For open addressing, half-full is a good rule of thumb 58
  • 59. Rehashing What size should we choose?  Twice-as-big?  Except that won’t be prime! We go twice-as-big but guarantee prime  Implement by hard coding a list of prime numbers  You probably will not grow more than 20-30 times and can then calculate after that if necessary 59
  • 60. Rehashing Can we copy all data to the same indices in the new table?  Will not work; we calculated the index based on TableSize Rehash Algorithm: Go through old table Do standard insert for each item into new table Resize is an O(n) operation,  Iterate over old table: O(n)  n inserts / calls to the hash function: n ⋅ O(1) = O(n) Is there some way to avoid all those hash function calls?  Space/time tradeoff: Could store h(key) with each data item  Growing the table is still O(n); only helps by a constant factor 60
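The rehash algorithm can be sketched as follows. The names are hypothetical, and the sketch makes several assumptions: a linear-probing table of Integer cells, a next-prime search in place of the hard-coded prime list the slides suggest, and a target table that always has at least one empty cell.

```java
final class Rehasher {
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    // Smallest prime >= n.
    static int nextPrime(int n) {
        while (!isPrime(n)) {
            n++;
        }
        return n;
    }

    // Standard linear-probing insert; assumes the table has an empty cell.
    static void insert(Integer[] table, int key) {
        int index = Math.floorMod(key, table.length);
        while (table[index] != null) {
            index = (index + 1) % table.length;
        }
        table[index] = key;
    }

    // Go through the old table and do a standard insert into the new one.
    static Integer[] rehash(Integer[] old) {
        Integer[] bigger = new Integer[nextPrime(2 * old.length)];
        for (Integer key : old) {
            if (key != null) {
                insert(bigger, key);
            }
        }
        return bigger;
    }
}
```

Starting from a table of size 7 holding 76, 40, 48 at indices 6, 5, 0, the new table has size 17 and the keys move to indices 8, 6, 14. This is why copying entries to the same indices would not work.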
  • 61. EXTENSIBLE HASHING When you make hash from hash leftovers… 61
  • 62. Extendible Hashing • Hashing technique for huge data sets – optimizes to reduce disk accesses – each hash bucket fits on one disk block – better than B-Trees if order is not important • Table contains – buckets, each fitting in one disk block, with the data – a directory that fits in one disk block used to hash to the correct bucket 62
  • 63. Extendible Hash Table: The directory has entries labeled by k bits, each with a pointer to the bucket holding all keys that start with those bits. Each block contains keys and data matching on the first j ≤ k bits. Directory for k = 3: entries 000 through 111. Buckets (j in parentheses): (2) 00001 00011 00100 00110; (2) 01001 01011 01100; (3) 10001 10011; (3) 10101 10110 10111; (2) 11001 11011 11100 11110.
  • 64. Inserting (easy case): insert(11011). Before: the last bucket is (2) 11001 11100 11110. After: it is (2) 11001 11011 11100 11110. The directory and the other buckets are unchanged.
  • 65. Splitting: insert(11000). Before: the last bucket is (2) 11001 11011 11100 11110 and is full. After: it splits into (3) 11000 11001 11011 and (3) 11100 11110. The directory (k = 3) and the other buckets are unchanged.
  • 66. IMPLEMENTING HASHING Reality is never as clean-cut as theory 66
  • 67. Hashing and Comparing: Our use of int keys can lead us to overlook a critical detail. We do perform the initial hash on E, but while chaining/probing we compare to E, which requires equality testing (compare == 0). A hash table needs a hash function and a comparator. In Project 2, you will use two function objects. The Java library uses a more object-oriented approach: each object has an equals method and a hashCode method: class Object { boolean equals(Object o) {…} int hashCode() {…} … }
  • 68. Equal Objects Must Hash the Same: The Java library (and your project hash table) makes a very important assumption that clients must satisfy. Object-oriented way of saying it: if a.equals(b), then we must require a.hashCode() == b.hashCode(). Function-object way of saying it: if c.compare(a,b) == 0, then we must require h.hash(a) == h.hash(b). If you ever override equals, you need to override hashCode also in a consistent way. See the CoreJava book, Chapter 5, for other "gotchas" with equals.
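A minimal sketch of a class that keeps the two methods consistent is shown below. The Point class is a hypothetical example. It uses java.util.Objects.hash for brevity, which is one convenient way to derive the hash from exactly the fields that equals compares.

```java
import java.util.Objects;

final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof Point)) {
            return false;
        }
        Point other = (Point) o;
        return x == other.x && y == other.y;
    }

    // Built from the same fields equals compares, so equal points hash the same.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}
```

With both methods overridden, a HashSet that contains new Point(1, 2) also reports that it contains a second, separately constructed new Point(1, 2). If hashCode were left at the default, that lookup would usually miss.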
  • 69. Comparable/Comparator Rules We have not emphasized important "rules" about comparison for:  all our dictionaries  sorting (next major topic) Comparison must impose a consistent, total ordering: For all a, b, and c:  If compare(a,b) < 0, then compare(b,a) > 0  If compare(a,b) == 0, then compare(b,a) == 0  If compare(a,b) < 0 and compare(b,c) < 0, then compare(a,c) < 0 69
  • 70. A Generally Good hashCode(): int result = 17; // start at a prime. For each field f, compute int fieldHashcode as follows: boolean: (f ? 1 : 0); byte, char, short, int: (int) f; long: (int) (f ^ (f >>> 32)); float: Float.floatToIntBits(f); double: Double.doubleToLongBits(f), then as for long; Object: f.hashCode(). Then set result = 31 * result + fieldHashcode. Finally, return result.
  • 71. Final Word on Hashing: The hash table is one of the most important data structures: efficient find, insert, and delete; operations based on sorted order are not so efficient; useful in many, many real-world applications; a popular topic for job interview questions. It is important to use a good hash function: good distribution of key hashes; not overly expensive to calculate (bit shifts are good!). It is important to keep the hash table at a good size: keep TableSize a prime number; set a preferable λ depending on the type of hash table.
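The recipe can be written out for a concrete class. The Reading class and its fields below are hypothetical, chosen to exercise one field of each kind, and label is assumed to be non-null.

```java
final class Reading {
    private final boolean valid;
    private final int id;
    private final long timestamp;
    private final double value;
    private final String label;   // assumed non-null

    Reading(boolean valid, int id, long timestamp, double value, String label) {
        this.valid = valid;
        this.id = id;
        this.timestamp = timestamp;
        this.value = value;
        this.label = label;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof Reading)) {
            return false;
        }
        Reading other = (Reading) o;
        return valid == other.valid
                && id == other.id
                && timestamp == other.timestamp
                && Double.compare(value, other.value) == 0
                && label.equals(other.label);
    }

    @Override
    public int hashCode() {
        int result = 17;                                   // start at a prime
        result = 31 * result + (valid ? 1 : 0);            // boolean
        result = 31 * result + id;                         // int
        result = 31 * result + (int) (timestamp ^ (timestamp >>> 32));   // long
        long bits = Double.doubleToLongBits(value);        // double, then as long
        result = 31 * result + (int) (bits ^ (bits >>> 32));
        result = 31 * result + label.hashCode();           // Object
        return result;
    }
}
```

equals uses Double.compare so that it agrees with the bit-based hash for values such as NaN and -0.0.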