Advanced Algorithms – COMS31900
Hashing part one
Chaining, true randomness and universal hashing
(based on slides by Markus Jalsenius)
Benjamin Sach
Dictionaries

In a dictionary data structure we store (key, value)-pairs
such that for any key there is at most one pair (key, value) in the dictionary.

Often we want to perform the following three operations:

add(x, v) Add the pair (x, v).
lookup(x) Return v if (x, v) is in the dictionary, or NULL otherwise.
delete(x) Remove the pair (x, v) (assuming (x, v) is in the dictionary).

There are many data structures that will do this job, e.g.:

Linked lists
Binary search trees
(2,3,4)-trees
Red-black trees
Skip lists
van Emde Boas trees (later in this course)

These data structures all support extra operations beyond the three above,
but none of them takes O(1) worst case time for all operations. . .
so maybe there is room for improvement?
Hash tables

We want to store n elements from the universe, U, in a dictionary.
The universe U contains u keys; typically u = |U| is much, much larger than n.

We use an array T of size m, called a hash table.

A hash function h : U → [m] maps a key to a position in T.
We write [m] to denote the set {0, . . . , m − 1}.

We want to avoid collisions, i.e. h(x) = h(y) for x ≠ y.

Collisions can be resolved with chaining, i.e. a linked list per position:
in the slide's figure, keys x, y, z and w all hash to the same position,
whose chain holds the pairs (x, vx), (y, vy), (z, vz), (w, vw).
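As an illustration (not part of the original slides), a chained hash table supporting the three dictionary operations can be sketched in Python. The class and method names are mine, and `_h` is a placeholder standing in for a hash function h : U → [m]:

```python
class ChainedHashTable:
    """A hash table T of size m; each position holds a chain (list) of (key, value) pairs."""

    def __init__(self, m):
        self.m = m
        self.T = [[] for _ in range(m)]

    def _h(self, x):
        # Placeholder for h : U -> [m]; Python's built-in hash reduced mod m.
        return hash(x) % self.m

    def add(self, x, v):
        chain = self.T[self._h(x)]
        for i, (key, _) in enumerate(chain):
            if key == x:          # keep at most one pair per key
                chain[i] = (x, v)
                return
        chain.append((x, v))

    def lookup(self, x):
        # May scan the whole chain containing x: O(length of chain).
        for key, v in self.T[self._h(x)]:
            if key == x:
                return v
        return None               # NULL in the slides

    def delete(self, x):
        # O(1) to unlink, but finding x first costs O(length of chain).
        chain = self.T[self._h(x)]
        for i, (key, _) in enumerate(chain):
            if key == x:
                del chain[i]
                return
```

The slides count add as O(1); that corresponds to appending without the duplicate scan above, which is valid whenever the key is known to be absent.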
Time complexity

We cannot avoid collisions entirely: since u ≫ m,
some keys from the universe are bound to be mapped to the same position.

By building a hash table with chaining, we get the following time complexities
(remember u is the size of the universe and m is the size of the table):

Operation   Worst case time                    Comment
add(x, v)   O(1)                               Simply add the item to the list, creating a link if necessary.
lookup(x)   O(length of chain containing x)    We might have to search through the whole list containing x.
delete(x)   O(length of chain containing x)    Only O(1) to perform the actual delete. . . but you have to find x first.

So how long are these chains?
True randomness

THEOREM
Pick h uniformly at random from the set of all functions U → [m].
Consider any n fixed inputs to the hash table (which has size m),
i.e. any sequence of n add/lookup/delete operations.
The expected run-time per operation is O(1 + n/m), or simply O(1) if m ≥ n.

PROOF
Let x, y be two distinct keys from U.
Let the indicator r.v. Ix,y be 1 iff h(x) = h(y). (iff means if and only if.)
Since h(x) and h(y) are chosen uniformly and independently from [m],
we have that Pr(h(x) = h(y)) = 1/m,
and therefore E(Ix,y) = Pr(Ix,y = 1) = Pr(h(x) = h(y)) = 1/m.

Let Nx be the number of keys stored in T that are hashed to h(x),
so, in the worst case, it takes Nx time to look up x in T.
Observe that Nx = Σ_{y ∈ T} Ix,y, where the sum ranges over the keys in T.
Finally, by linearity of expectation,

    E(Nx) = E(Σ_{y ∈ T} Ix,y) = Σ_{y ∈ T} E(Ix,y) = n · 1/m = n/m.
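The theorem can be checked empirically with a small simulation (my own sketch, not from the slides): draw an independent uniform value in [m] for each stored key, which is exactly a truly random h restricted to those keys, then measure the chain length Nx seen by a fresh lookup key x. Its mean should be close to n/m. All parameter values below are arbitrary.

```python
import random

def avg_chain_length(n, m, trials=2000, seed=1):
    """Estimate E(N_x): the expected number of stored keys hashed to h(x)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        counts = [0] * m
        for _ in range(n):                  # h(y) for each of the n stored keys y
            counts[rng.randrange(m)] += 1
        total += counts[rng.randrange(m)]   # chain length at h(x) for a fresh key x
    return total / trials
```

With n = m the estimate should hover around n/m = 1, and around 2 when n = 2m, matching the O(1 + n/m) bound.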
Specifying the hash function

Problem: how do we specify an arbitrary (e.g. a truly random) hash function?

For each key in U we need to specify an arbitrary position in T;
this is a number in [m], so requires log2 m bits.
So in total we need u log2 m bits, which is a ridiculous amount of space!
(in particular, it’s much bigger than the table :s)

Why not pick the hash function as we go?
Couldn’t we generate h(x) when we first see x?
Wouldn’t we only use n log2 m bits? (one per key we actually store)

The problem with this approach is recalling h(x) the next time we see x:
essentially we’d need to build a dictionary to solve the dictionary problem!
This has become rather cyclic. . . let’s try something else!
Specifying the hash function

Instead, we define a set, or family, of hash functions: H = {h1, h2, . . . }.

As part of initialising the hash table,
we choose the hash function h from H at random.

How should we specify the hash functions in H, and how do we pick one at random?
Weakly universal hashing

A set H of hash functions is weakly universal if for any two distinct keys x, y ∈ U,

    Pr(h(x) = h(y)) ≤ 1/m,

where h is chosen uniformly at random from H.

OBSERVE
The randomness here comes from the fact that h is picked randomly.

THEOREM
Pick h uniformly at random from a weakly universal set H of hash functions.
Consider any n fixed inputs to the hash table (which has size m),
i.e. any sequence of n add/lookup/delete operations.
The expected run-time per operation is O(1) if m ≥ n.

PROOF
The proof we used for true randomness works here too (which is nice):
it only ever uses the fact that Pr(h(x) = h(y)) ≤ 1/m.
Constructing a weakly universal family of hash functions

Suppose U = [u], i.e. the keys in the universe are the integers 0 to u − 1.
Let p be any prime bigger than u. For a, b ∈ [p], let

    ha,b(x) = ((ax + b) mod p) mod m,
    Hp,m = {ha,b | a ∈ {1, . . . , p − 1}, b ∈ {0, . . . , p − 1}}.

THEOREM
Hp,m is a weakly universal set of hash functions.

PROOF
See CLRS, Theorem 11.5 (page 267 in the 3rd edition).

OBSERVE
ax + b is a linear transformation which “spreads the keys” over p values when
taken modulo p. This does not cause any collisions;
only when taken modulo m do we get collisions.
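The family Hp,m can be sketched in a few lines of Python (my own sketch, not from the slides). It assumes integer keys below p; the choice p = 2^31 − 1, a Mersenne prime, is one convenient prime covering any universe with u ≤ p.

```python
import random

P = 2**31 - 1  # a prime; any prime p > u works for a universe U = [u]

def random_hash(m, rng=random):
    """Pick h_{a,b} uniformly from H_{p,m}: a in {1,...,p-1}, b in {0,...,p-1}."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    # h_{a,b}(x) = ((a*x + b) mod p) mod m
    return lambda x: ((a * x + b) % P) % m
```

Weak universality promises that, over the random choice of h at table-initialisation time, any two fixed distinct keys collide with probability at most 1/m.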
True randomness vs. weakly universal hashing

For both
true randomness
(h is picked uniformly from the set of all possible hash functions)
and weakly universal hashing
(h is picked uniformly from a weakly universal set of hash functions),
we have seen that when m ≥ n,
the expected lookup time in the hash table is O(1).

Since constructing a weakly universal set of hash functions seems much easier
than obtaining true randomness, this is all good news! (isn’t it?)

What about the length of the longest chain? (the longest linked list)
If it is very long, some lookups could take a very long time. . .
Longest chain – true randomness

LEMMA
If h is selected uniformly at random from all functions U → [m] then,
over m fixed inputs,

    Pr(any chain has length ≥ 3 log m) ≤ 1/m.

OBSERVE
In this lemma we insert m keys, i.e. n = m.

PROOF
The problem is equivalent to showing that if we randomly throw m balls into m bins,
the probability of having a bin with at least 3 log m balls is at most 1/m. · · ·
Longest chain – true randomness
PROOF
continued. . .
Let X1 be the number of balls in the first bin.
Pr(X1 k)
m
k
·
1
mk
1
k!
.
Now we set k = 3 log m and observe that
m
k!
1
m
for m 2,
and we are done.
Pr(at least one bin receives at least k balls) m · Pr(X1 k)
m
k!
.
By using the union bound again, we have that
Number of subsets of size k.
Choose any k of the m balls (we’ll pick k in a bit)
the probability at all of these k balls go into the first bin is 1
mk .
So, the union bound gives us
Why is m
k!
1
m ? (when k = 3 log m)
k! = k × (k − 1) × (k − 2) . . . × 2 × 1
k terms
k! > 2 × 2 × 2 . . . × 2 × 1 = 2k−1
Let k = 3 log m . . .
k! > 2(3 log m−1) 22 log m = (2log m)2 = m2
so m
k!
m
m2 = 1
m
Longest chain – true randomness
If h is selected uniformly at random from all functions U → [m] then,
Pr (any chain has length 3 log m )
1
m
.
LEMMA
OBSERVE
In this lemma we insert m keys, i.e. n = m.
PROOF
The problem is equivalent to showing that if we randomly throw m balls into m bins, the
probability of having a bin with at least 3 log m balls is at most 1
m .
· · ·
over m fixed inputs,
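The balls-and-bins bound is easy to check empirically. The sketch below is an illustration, not part of the lecture: the function name and trial count are arbitrary choices. It throws m balls into m bins uniformly at random and estimates how often some bin receives at least 3 log m balls; the lemma says this should happen with probability at most 1/m.

```python
import math
import random

def fraction_overloaded(m, trials=200):
    """Estimate the probability that, after throwing m balls into m bins
    uniformly at random, some bin receives at least 3*log2(m) balls."""
    threshold = 3 * math.log2(m)
    bad = 0
    for _ in range(trials):
        bins = [0] * m
        for _ in range(m):                  # n = m balls, one per key
            bins[random.randrange(m)] += 1
        if max(bins) >= threshold:
            bad += 1
    return bad / trials

# The lemma bounds this probability by 1/m; in practice it is far smaller.
print(fraction_overloaded(256))
```

In practice the estimate is essentially always 0, since the typical maximum load is much closer to log m / log log m than to 3 log m.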
Longest chain – weakly universal hashing

The conclusion from the previous slides is that with true randomness,
the longest chain is very short (at most 3 log m) with high probability.

LEMMA
If h is picked uniformly at random from a weakly universal set of hash functions then,
over m fixed inputs,

    Pr(any chain has length ≥ 1 + √(2m))  ≤  1/2.

OBSERVE
This rubbish upper bound of 1/2 does not necessarily rule out the possibility that the
tightest upper bound is indeed very small. However, the upper bound of 1/2 is in fact tight!

PROOF
For any two keys x, y, let indicator r.v. I_{x,y} be 1 iff h(x) = h(y).
Let r.v. C be the total number of collisions: C = Σ_{x,y∈T, x<y} I_{x,y}.

Using linearity of expectation and E(I_{x,y}) = 1/m (h is weakly universal),

    E(C) = E( Σ_{x,y∈T, x<y} I_{x,y} ) = Σ_{x,y∈T, x<y} E(I_{x,y}) = (m choose 2) · 1/m ≤ m/2.

So, by Markov's inequality, Pr(C ≥ m) ≤ E(C)/m ≤ 1/2.

Let r.v. L be the length of the longest chain. Then C ≥ (L choose 2),
because a chain of length L alone causes (L choose 2) collisions.

Now,

    Pr( (L−1)^2 / 2 ≥ m )  ≤  Pr( (L choose 2) ≥ m )  ≤  Pr(C ≥ m)  ≤  1/2,

since

    (L choose 2) = L! / (2!(L−2)!) = L·(L−1)/2 ≥ (L−1)^2 / 2.

By rearranging, we have that Pr(L ≥ 1 + √(2m)) ≤ 1/2, and we are done.
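To see how this plays out in practice, one can sample h from a concrete weakly universal family and measure the longest chain directly. The sketch below uses the classic Carter–Wegman family h(x) = ((a·x + b) mod p) mod m; the particular prime p and the test keys are illustrative assumptions, not taken from the slides.

```python
import random

P = 2_147_483_647            # a prime larger than any key below (2^31 - 1)

def random_hash(m):
    """Sample h from the weakly universal family
    h(x) = ((a*x + b) mod p) mod m, with a != 0."""
    a = random.randrange(1, P)
    b = random.randrange(P)
    return lambda x: ((a * x + b) % P) % m

def longest_chain(keys, m):
    """Hash the given keys into m slots and return the longest chain."""
    h = random_hash(m)
    bins = [0] * m
    for x in keys:
        bins[h(x)] += 1
    return max(bins)

# m fixed (arbitrary, distinct) keys hashed into m slots:
m = 1024
keys = [i * i + 7 for i in range(m)]
print(longest_chain(keys, m), 1 + (2 * m) ** 0.5)
```

The lemma only guarantees Pr(L ≥ 1 + √(2m)) ≤ 1/2, but for a family like this the observed longest chain is usually far below the √(2m) bound.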
Conclusions

For both
    true randomness (h is picked uniformly from the set of all possible hash functions)
and
    weakly universal hashing (h is picked uniformly from a weakly universal set of hash functions),
we have seen that when m ≥ n, the expected lookup time in a hash table with chaining is O(1).

LEMMA
If h is selected uniformly at random from all functions U → [m] then,

    Pr(any chain has length ≥ 3 log m)  ≤  1/m.

LEMMA
If h is picked uniformly at random from a weakly universal set of hash functions,

    Pr(any chain has length ≥ 1 + √(2m))  ≤  1/2.

(Both Lemmas hold for any m fixed inputs.)
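Putting the pieces together, a dictionary with chaining and a randomly drawn hash function might be sketched as follows. This is an illustrative implementation, not the lecture's code: the weakly universal family, the prime constant and the class name are all assumptions made here for the example.

```python
import random

class ChainedDict:
    """A minimal dictionary with chaining.  The hash function is drawn at
    random from the weakly universal family ((a*x + b) mod p) mod m."""
    P = 2_147_483_647                     # prime larger than any expected key

    def __init__(self, m):
        self.m = m
        self.a = random.randrange(1, self.P)
        self.b = random.randrange(self.P)
        self.table = [[] for _ in range(m)]

    def _h(self, x):
        return ((self.a * x + self.b) % self.P) % self.m

    def add(self, x, v):
        """Add the pair (x, v), overwriting any existing pair with key x
        so that there is at most one pair per key."""
        chain = self.table[self._h(x)]
        for i, (k, _) in enumerate(chain):
            if k == x:
                chain[i] = (x, v)
                return
        chain.append((x, v))

    def lookup(self, x):
        """Return v if (x, v) is in the dictionary, or None otherwise."""
        for k, v in self.table[self._h(x)]:
            if k == x:
                return v
        return None

    def delete(self, x):
        """Remove the pair with key x, if present."""
        i = self._h(x)
        self.table[i] = [(k, v) for k, v in self.table[i] if k != x]

d = ChainedDict(8)
d.add(42, "answer")
print(d.lookup(42))    # answer
d.delete(42)
print(d.lookup(42))    # None
```

With m ≥ n slots, each chain has expected length O(1), so all three operations run in O(1) expected time.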
Paris 2024 Olympic Geographies - an activity
 

Hashing Part One

  • 1. Advanced Algorithms – COMS31900
    Hashing part one: chaining, true randomness and universal hashing
    Benjamin Sach (based on slides by Markus Jalsenius)
  • 2–6. Dictionaries
    In a dictionary data structure we store (key, value)-pairs such that for any key there is at most one pair (key, value) in the dictionary. Often we want to perform the following three operations:
    add(x, v): add the pair (x, v).
    lookup(x): return v if (x, v) is in the dictionary, or NULL otherwise.
    delete(x): remove the pair (x, v) (assuming (x, v) is in the dictionary).
    There are many data structures that will do this job, e.g. linked lists, binary search trees, (2,3,4)-trees, red-black trees, skip lists and van Emde Boas trees (later in this course). These data structures all support extra operations beyond the three above, but none of them takes O(1) worst case time for all operations. . . so maybe there is room for improvement?
  • 7–13. Hash tables
    We want to store n elements from a universe U of u keys in a dictionary. Typically u = |U| is much, much larger than n.
    We use an array T of size m, called a hash table. A hash function h : U → [m] maps a key to a position in T, where [m] denotes the set {0, . . . , m − 1}; the pair (x, vx) is stored at position h(x).
    We want to avoid collisions, i.e. h(x) = h(y) for x ≠ y. Collisions can be resolved with chaining, i.e. each position of T holds a linked list of the pairs hashed to it.
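    The chaining scheme above can be sketched as follows (an illustrative Python sketch, not part of the original slides; Python's built-in hash reduced modulo m stands in for an arbitrary hash function h):

```python
class ChainedHashTable:
    """Hash table with chaining: each slot of T holds a list of (key, value) pairs."""

    def __init__(self, m):
        self.m = m
        self.T = [[] for _ in range(m)]  # m empty chains

    def _h(self, x):
        # stand-in for a hash function h : U -> [m]
        return hash(x) % self.m

    def add(self, x, v):
        chain = self.T[self._h(x)]
        for i, (key, _) in enumerate(chain):
            if key == x:            # at most one pair per key
                chain[i] = (x, v)
                return
        chain.append((x, v))

    def lookup(self, x):
        for key, v in self.T[self._h(x)]:
            if key == x:
                return v
        return None                  # NULL

    def delete(self, x):
        chain = self.T[self._h(x)]
        for i, (key, _) in enumerate(chain):
            if key == x:
                del chain[i]
                return
```

    With m = 8, the integer keys 1 and 9 hash to the same slot, so they end up in the same chain and are told apart by the linear scan.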
  • 14–15. Time complexity
    We cannot avoid collisions entirely since u ≫ m; some keys from the universe are bound to be mapped to the same position. By building a hash table with chaining, we get the following worst case time complexities (remember u is the size of the universe and m is the size of the table):
    add(x, v): O(1). Simply add the item to the list, with a link if necessary.
    lookup(x): O(length of chain containing x). We might have to search through the whole list containing x.
    delete(x): O(length of chain containing x). Only O(1) to perform the actual delete. . . but you have to find x first.
    So how long are these chains?
  • 16–32. True randomness
    THEOREM. Pick h uniformly at random from the set of all functions U → [m]. Consider any n fixed inputs to the hash table (which has size m), i.e. any sequence of n add/lookup/delete operations. The expected run-time per operation is O(1 + n/m), or simply O(1) if m ≥ n.
    PROOF. Let x, y be two distinct keys from U. Let the indicator r.v. I_{x,y} be 1 iff h(x) = h(y) (iff means if and only if). We have that Pr[h(x) = h(y)] = 1/m; this is because h(x) and h(y) are chosen uniformly and independently from [m]. Therefore,
        E(I_{x,y}) = Pr(I_{x,y} = 1) = Pr[h(x) = h(y)] = 1/m.
    Let N_x be the number of keys stored in T that are hashed to h(x); so, in the worst case it takes N_x time to look up x in T. Observe that N_x = Σ_{y ∈ T} I_{x,y}, where the sum ranges over the keys stored in T. Finally, by linearity of expectation, we have that
        E(N_x) = E(Σ_{y ∈ T} I_{x,y}) = Σ_{y ∈ T} E(I_{x,y}) = n · (1/m) = n/m.
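    The bound E(N_x) = n/m is easy to check empirically (an illustrative sketch, not from the slides): draw a truly random slot for a fixed key x and for each of n stored keys, and average the number of collisions with x over many trials.

```python
import random

def expected_chain_length(n, m, trials=2000):
    """Estimate E(N_x): the expected number of stored keys colliding with a fixed key x."""
    total = 0
    for _ in range(trials):
        # a truly random h: every key gets an independent uniform slot in [m]
        hx = random.randrange(m)                           # h(x)
        stored = [random.randrange(m) for _ in range(n)]   # h(y) for the n stored keys
        total += sum(1 for hy in stored if hy == hx)       # N_x for this trial
    return total / trials

# with n = m the theorem predicts E(N_x) = n/m = 1
```

    With n = m = 64 the estimate comes out close to 1, matching the n/m prediction.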
  • 33–42. Specifying the hash function
    Problem: how do we specify an arbitrary (e.g. a truly random) hash function? For each key in U we need to specify an arbitrary position in T; this is a number in [m], so it requires log2 m bits. So in total we need u · log2 m bits, which is a ridiculous amount of space! (In particular, it's much bigger than the table itself.)
    Why not pick the hash function as we go? Couldn't we generate h(x) when we first see x? Wouldn't we only use n · log2 m bits (one value per key we actually store)? The problem with this approach is recalling h(x) the next time we see x: essentially we'd need to build a dictionary to solve the dictionary problem! This has become rather cyclic... let's try something else!
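    To put the u · log2 m figure in perspective, here is the arithmetic for some illustrative (not from the slides) parameters: 32-bit keys and a table of m = 2^20 slots.

```python
u = 2 ** 32    # universe size: all 32-bit keys
m = 2 ** 20    # table size
log2_m = 20    # bits needed to record one position in [m]

bits = u * log2_m            # total bits to tabulate a truly random h explicitly
gib = bits / 8 / 2 ** 30     # convert bits -> bytes -> GiB
print(f"{gib} GiB")          # prints "10.0 GiB": vastly bigger than the table itself
```

    The table itself needs only on the order of m entries, so explicitly tabulating h costs thousands of times more space than the data structure it serves.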
  • 43–46. Specifying the hash function, continued
    Instead, we define a set, or family, of hash functions: H = {h1, h2, . . . }. As part of initialising the hash table, we choose the hash function h from H randomly. How should we specify the hash functions in H, and how do we pick one at random?
  • 47–50. Weakly universal hashing
    A set H of hash functions is weakly universal if for any two distinct keys x, y ∈ U,
        Pr[h(x) = h(y)] ≤ 1/m,
    where h is chosen uniformly at random from H.
    OBSERVE. The randomness here comes from the fact that h is picked randomly.
    THEOREM. Pick h uniformly at random from a weakly universal set H of hash functions. Consider any n fixed inputs to the hash table (which has size m), i.e. any sequence of n add/lookup/delete operations. The expected run-time per operation is O(1) if m ≥ n.
    PROOF. The proof we used for true randomness works here too (which is nice): it only uses the fact that Pr[h(x) = h(y)] ≤ 1/m.
  • 51–54. Constructing a weakly universal family of hash functions
    Suppose U = [u], i.e. the keys in the universe are the integers 0 to u − 1. Let p be any prime bigger than u. For a, b ∈ [p], let
        h_{a,b}(x) = ((ax + b) mod p) mod m,
    and define
        H_{p,m} = {h_{a,b} | a ∈ {1, . . . , p − 1}, b ∈ {0, . . . , p − 1}}.
    THEOREM. H_{p,m} is a weakly universal set of hash functions.
    PROOF. See CLRS, Theorem 11.5 (page 267 in the 3rd edition).
    OBSERVE. ax + b is a linear transformation which "spreads the keys" over p values when taken modulo p. This does not cause any collisions. Only when taken modulo m do we get collisions.
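    The family H_{p,m} is small enough to check exhaustively for small parameters. The sketch below (illustrative parameters p = 101, m = 10, chosen by me rather than taken from the slides) enumerates every (a, b) pair and computes the exact collision probability for one fixed pair of distinct keys:

```python
from fractions import Fraction

def h(a, b, x, p, m):
    """h_{a,b}(x) = ((a*x + b) mod p) mod m"""
    return ((a * x + b) % p) % m

def collision_probability(x, y, p, m):
    """Exact Pr over a uniform choice of (a, b) in H_{p,m} that h(x) == h(y)."""
    collisions = sum(
        1
        for a in range(1, p)       # a in {1, ..., p-1}
        for b in range(p)          # b in {0, ..., p-1}
        if h(a, b, x, p, m) == h(a, b, y, p, m)
    )
    return Fraction(collisions, (p - 1) * p)

# weak universality: for any distinct x, y, this probability is at most 1/m
```

    For p = 101 and m = 10 the probability comes out strictly below 1/10 for distinct keys, as the theorem guarantees.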
  • 55–59. True randomness vs. weakly universal hashing
    For both true randomness (h is picked uniformly from the set of all possible hash functions) and weakly universal hashing (h is picked uniformly from a weakly universal set of hash functions), we have seen that when m ≥ n the expected lookup time in the hash table is O(1).
    Since constructing a weakly universal set of hash functions seems much easier than obtaining true randomness, this is all good news! Isn't it?
    What about the length of the longest chain (the longest linked list)? If it is very long, some lookups could take a very long time. . .
  • 60–63. Longest chain – true randomness
    LEMMA. If h is selected uniformly at random from all functions U → [m] then, over m fixed inputs,
        Pr(any chain has length ≥ 3 log m) ≤ 1/m.
    OBSERVE. In this lemma we insert m keys, i.e. n = m.
    PROOF. The problem is equivalent to showing that if we randomly throw m balls into m bins, the probability of having a bin with at least 3 log m balls is at most 1/m.
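    The balls-into-bins view is easy to simulate (an illustrative sketch, not from the slides): throw m balls into m bins uniformly at random and record the maximum bin load, which plays the role of the longest chain.

```python
import math
import random

def max_load(m):
    """Throw m balls into m bins uniformly at random; return the fullest bin's load."""
    bins = [0] * m
    for _ in range(m):
        bins[random.randrange(m)] += 1
    return max(bins)

# the lemma says the max load reaches 3 log m only with probability at most 1/m,
# so for, say, m = 1024 a single run should land far below 3 * log2(1024) = 30
```

    In practice the observed maximum is far below the 3 log m threshold: the true typical value is known to grow only like log m / log log m.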
  • 64–78. Longest chain – true randomness
    PROOF continued. . . Let X1 be the number of balls in the first bin. Choose any k of the m balls (we'll pick k in a bit); the probability that all of these k balls go into the first bin is 1/m^k. So the union bound gives us
        Pr(X1 ≥ k) ≤ (m choose k) · (1/m^k) ≤ 1/k!,
    where (m choose k) is the number of subsets of size k.
    (Union bound: let V1, . . . , Vq be q events; then Pr(V1 ∪ · · · ∪ Vq) ≤ Pr(V1) + · · · + Pr(Vq).)
    The second inequality holds because
        (m choose k) = m! / (k!(m − k)!)
                     = m · (m − 1) · (m − 2) · . . . · (m − k + 1) · (m − k)! / (k!(m − k)!)
                     ≤ (m · m · m · . . . · m) / k!   (k factors of m)
                     = m^k / k!,
    so (m choose k) · (1/m^k) ≤ 1/k!.
    By using the union bound again, we have that
  • 79. Longest chain – true randomness PROOF continued. . . Let X1 be the number of balls in the first bin. Pr(X1 k) m k · 1 mk 1 k! . Pr(at least one bin receives at least k balls) m · Pr(X1 k) m k! . By using the union bound again, we have that Number of subsets of size k. Choose any k of the m balls (we’ll pick k in a bit) the probability at all of these k balls go into the first bin is 1 mk . So, the union bound gives us
  • 80. Longest chain – true randomness PROOF continued. . . Let X1 be the number of balls in the first bin. Pr(X1 k) m k · 1 mk 1 k! . Now we set k = 3 log m and observe that m k! 1 m for m 2, and we are done. Pr(at least one bin receives at least k balls) m · Pr(X1 k) m k! . By using the union bound again, we have that Number of subsets of size k. Choose any k of the m balls (we’ll pick k in a bit) the probability at all of these k balls go into the first bin is 1 mk . So, the union bound gives us
  • 81. Longest chain – true randomness PROOF continued. . . Let X1 be the number of balls in the first bin. Pr(X1 k) m k · 1 mk 1 k! . Now we set k = 3 log m and observe that m k! 1 m for m 2, and we are done. Pr(at least one bin receives at least k balls) m · Pr(X1 k) m k! . By using the union bound again, we have that Number of subsets of size k. Choose any k of the m balls (we’ll pick k in a bit) the probability at all of these k balls go into the first bin is 1 mk . So, the union bound gives us Why is m k! 1 m ? (when k = 3 log m) k! = k × (k − 1) × (k − 2) . . . × 2 × 1 k terms k! > 2 × 2 × 2 . . . × 2 × 1 = 2k−1 Let k = 3 log m . . . k! > 2(3 log m−1) 22 log m = (2log m)2 = m2 so m k! m m2 = 1 m
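The balls-in-bins argument above is easy to check empirically. The following is a minimal simulation sketch (not part of the slides): it throws m balls into m bins uniformly at random and records how often the fullest bin reaches 3 log m balls; the lemma says each trial triggers this event with probability at most 1/m. The function name and trial count are illustrative choices, and log base 2 is used to match the k! > 2^(k−1) step.

```python
import math
import random

def fullest_bin_exceeds(m, trials=200):
    """Fraction of trials in which, after throwing m balls into m bins
    uniformly at random, some bin holds at least 3*log2(m) balls."""
    threshold = 3 * math.log2(m)   # k = 3 log m from the lemma
    bad = 0
    for _ in range(trials):
        bins = [0] * m
        for _ in range(m):         # throw m balls, each into a uniform bin
            bins[random.randrange(m)] += 1
        if max(bins) >= threshold:
            bad += 1
    return bad / trials

# The lemma bounds the per-trial probability of this event by 1/m,
# so the reported fraction should be very small.
print(fullest_bin_exceeds(128))
```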
  • 82. Longest chain – true randomness LEMMA If h is selected uniformly at random from all functions U → [m] then, over m fixed inputs, Pr(any chain has length ≥ 3 log m) ≤ 1/m. OBSERVE In this lemma we insert m keys, i.e. n = m. PROOF The problem is equivalent to showing that if we randomly throw m balls into m bins, the probability of having a bin with at least 3 log m balls is at most 1/m.
  • 83–85. Longest chain – weakly universal hashing The conclusion from the previous slides is that with true randomness, the longest chain is very short (at most 3 log m) with high probability. LEMMA If h is picked uniformly at random from a weakly universal set of hash functions then, over m fixed inputs, Pr(any chain has length ≥ 1 + √(2m)) ≤ 1/2. OBSERVE This rubbish upper bound of 1/2 does not necessarily rule out the possibility that the tightest upper bound is indeed very small. However, the upper bound of 1/2 is in fact tight!
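The weak universality assumption behind the lemma — Pr(h(x) = h(y)) ≤ 1/m for any distinct keys x, y — can be verified exactly for a concrete family. The sketch below uses the classic Carter–Wegman family h_{a,b}(x) = ((a·x + b) mod p) mod m, which is not specified in the slides but is a standard weakly universal family; it enumerates all (a, b) pairs and counts collisions.

```python
from itertools import product

def collision_prob(x, y, p, m):
    """Exact collision probability of distinct keys x, y over the
    Carter-Wegman family ((a*x + b) mod p) mod m, with a in [1, p-1],
    b in [0, p-1], keys in [0, p-1] and p prime."""
    hits = sum(1 for a, b in product(range(1, p), range(p))
               if ((a * x + b) % p) % m == ((a * y + b) % p) % m)
    return hits / ((p - 1) * p)    # the family has (p-1)*p members

# Small prime p and table size m, chosen only to keep the
# enumeration fast; any keys in [0, p-1] give the same bound.
p, m = 31, 5
print(collision_prob(3, 17, p, m), "<=", 1 / m)
```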
  • 86–97. Longest chain – weakly universal hashing PROOF For any two keys x, y, let the indicator r.v. Ix,y be 1 iff h(x) = h(y). Let the r.v. C be the total number of collisions: C = Σ_{x,y∈T, x<y} Ix,y. Using linearity of expectation and E(Ix,y) ≤ 1/m (h is weakly universal), E(C) = E( Σ_{x,y∈T, x<y} Ix,y ) = Σ_{x,y∈T, x<y} E(Ix,y) ≤ (m choose 2) · 1/m ≤ m/2. So by Markov’s inequality, Pr(C ≥ m) ≤ E(C)/m ≤ 1/2. Let the r.v. L be the length of the longest chain. Then C ≥ (L choose 2). This is because a chain of length L alone causes (L choose 2) collisions! Now, Pr( (L−1)^2/2 ≥ m ) ≤ Pr( (L choose 2) ≥ m ) ≤ Pr(C ≥ m) ≤ 1/2. This is because (L choose 2) = L! / (2!(L − 2)!) = L · (L − 1) / 2 ≥ (L − 1)^2 / 2. By rearranging, we have that Pr(L ≥ 1 + √(2m)) ≤ 1/2, and we are done.
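The quantity L in the proof is easy to measure. Below is a hedged sketch (the hash family is an assumed concrete choice, the Carter–Wegman family, not something the slides fix): it draws h at random from the family, hashes m distinct keys into m slots, and reports the longest chain, which by the lemma exceeds 1 + √(2m) with probability at most 1/2.

```python
import random

def sample_hash(p, m):
    """Draw h uniformly from the Carter-Wegman family ((a*x+b) mod p) mod m."""
    a = random.randrange(1, p)
    b = random.randrange(p)
    return lambda x: ((a * x + b) % p) % m

def longest_chain(m, p=2**31 - 1):   # 2^31 - 1 is a (Mersenne) prime
    """Hash the m keys 0..m-1 into m slots; return the longest chain length."""
    h = sample_hash(p, m)
    chains = [0] * m
    for x in range(m):
        chains[h(x)] += 1
    return max(chains)

m = 100
print(longest_chain(m), "vs bound", 1 + (2 * m) ** 0.5)
```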
  • 98. Conclusions For both true randomness (h is picked uniformly from the set of all possible hash functions) and weakly universal hashing (h is picked uniformly from a weakly universal set of hash functions), we have seen that when m ≥ n, the expected lookup time in a hash table with chaining is O(1). LEMMA If h is selected uniformly at random from all functions U → [m] then, Pr(any chain has length ≥ 3 log m) ≤ 1/m. LEMMA If h is picked uniformly at random from a weakly universal set of hash functions, Pr(any chain has length ≥ 1 + √(2m)) ≤ 1/2. (Both lemmas hold for m fixed inputs.)
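Tying the conclusions back to the dictionary operations from the start of the lecture, here is a minimal sketch (an illustration, not the lecturer's code) of a hash table with chaining whose hash function is drawn from the Carter–Wegman family; add, lookup and delete each scan a single chain, so with m ≥ n they run in expected O(1) time.

```python
import random

class ChainedHashTable:
    """Dictionary with chaining; h drawn from ((a*x + b) mod p) mod m."""

    def __init__(self, m, p=2**31 - 1):
        self.m, self.p = m, p
        self.a = random.randrange(1, p)
        self.b = random.randrange(p)
        self.table = [[] for _ in range(m)]

    def _h(self, x):
        return ((self.a * x + self.b) % self.p) % self.m

    def add(self, x, v):
        chain = self.table[self._h(x)]
        for i, (key, _) in enumerate(chain):
            if key == x:               # at most one pair per key
                chain[i] = (x, v)
                return
        chain.append((x, v))

    def lookup(self, x):
        for key, v in self.table[self._h(x)]:
            if key == x:
                return v
        return None                    # NULL in the slides

    def delete(self, x):
        idx = self._h(x)
        self.table[idx] = [(k, v) for k, v in self.table[idx] if k != x]

d = ChainedHashTable(8)
d.add(42, "answer")
print(d.lookup(42))   # -> answer
d.delete(42)
print(d.lookup(42))   # -> None
```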