Analysis of Algorithms
Disjoint Set Representation
Andres Mendez-Vazquez
November 8, 2015
Outline
1 Disjoint Set Representation
Definition of the Problem
Operations
2 Union-Find Problem
The Main Problem
Applications
3 Implementations
First Attempt: Circular List
Operations and Cost
Still we have a Problem
Weighted-Union Heuristic
Operations
Still a Problem
Heuristic Union by Rank
4 Balanced Union
Path compression
Time Complexity
Ackermann’s Function
Bounds
The Rank Observation
Proof of Complexity
Theorem for Union by Rank and Path Compression
Disjoint Set Representation
Problem
1 Items are drawn from the finite universe U = {1, 2, ..., n} for some fixed n.
2 We want to maintain a partition of U as a collection of disjoint sets.
3 In addition, we want to uniquely name each set by one of its items, called its representative item.
These disjoint sets are maintained under the following operations
1 MakeSet(x)
2 Union(A,B)
3 Find(x)
Operations
MakeSet(x)
Given x ∈ U currently not belonging to any set in the collection, create a new singleton set {x} and name it x.
This is usually done at the start, once per item, to create the initial trivial partition.
Union(A,B)
It changes the current partition by replacing its sets A and B with A ∪ B. Name the resulting set A or B.
The operation may choose either one of the two representatives as the new representative.
Find(x)
It returns the name of the set that currently contains item x.
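The three operations above can be made concrete with a deliberately naive sketch: a dict maps each item to the name of its set, so Union must scan every item. The class and method names here are illustrative, not from the slides; efficient representations follow later.

```python
# Minimal sketch of MakeSet/Union/Find semantics.  NOT efficient:
# union scans all items; it only illustrates the operation contract.
class NaiveDisjointSets:
    def __init__(self):
        self.set_of = {}          # item -> name (representative) of its set

    def make_set(self, x):
        self.set_of[x] = x        # create the singleton {x}, named x

    def find(self, x):
        return self.set_of[x]    # name of the set currently containing x

    def union(self, a, b):
        # Replace the sets named a and b with their union; keep the name a.
        for item, rep in self.set_of.items():
            if rep == b:
                self.set_of[item] = a

ds = NaiveDisjointSets()
for x in range(1, 5):
    ds.make_set(x)
ds.union(1, 2)                    # sets: {1, 2} named 1, {3}, {4}
ds.union(3, 4)                    # sets: {1, 2} named 1, {3, 4} named 3
```

Here Union keeps the first argument's name; the slides note that either representative may be chosen.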
Example
for x = 1 to 9 do MakeSet(x)
[Figure: nine singleton sets {1}, {2}, ..., {9}]
Then, you do a Union(1, 2)
[Figure: sets {1, 2}, {3}, {4}, {5}, {6}, {7}, {8}, {9}]
Now, Union(3, 4); Union(5, 8); Union(6, 9)
[Figure: sets {1, 2}, {3, 4}, {5, 8}, {6, 9}, {7}]
Example
Now, Union(1, 5); Union(7, 4)
[Figure: sets {1, 2, 5, 8}, {3, 4, 7}, {6, 9}]
Then, if we do the following operations
Find(1) returns 5
Find(9) returns 9
Finally, Union(5, 9)
[Figure: sets {1, 2, 5, 6, 8, 9}, {3, 4, 7}]
Then Find(9) returns 5
Union-Find Problem
Problem
Let S be a sequence of m = |S| MakeSet, Union and Find operations (intermixed in arbitrary order):
n of which are MakeSet.
At most n − 1 are Union.
The rest are Finds.
Cost(S) = total computational time to execute sequence S.
Goal: Find an implementation that, for every m and n, minimizes the amortized cost per operation:

Cost(S) / |S|   (1)

for any arbitrary sequence S.
Applications
Examples
1 Maintaining partitions and equivalence classes.
2 Graph connectivity under edge insertion.
3 Minimum spanning trees (e.g. Kruskal’s algorithm).
4 Random maze construction.
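The second application above (graph connectivity under edge insertion) can be sketched directly: each connected component is a disjoint set, and inserting an edge is a Union of the endpoints' sets. The function names `insert_edge` and `same_component` are chosen here for illustration; the naive dict representation keeps the sketch short.

```python
# Incremental graph connectivity via union-find (naive representation).
# same_component(u, v) is True exactly when u and v are joined by the
# edges inserted so far.
rep = {}

def make_set(x):
    rep[x] = x

def find(x):
    return rep[x]

def insert_edge(u, v):
    ru, rv = find(u), find(v)
    if ru != rv:                       # edge joins two components: merge them
        for w in rep:
            if rep[w] == rv:
                rep[w] = ru

def same_component(u, v):
    return find(u) == find(v)

for v in range(1, 7):
    make_set(v)
insert_edge(1, 2)
insert_edge(2, 3)
insert_edge(5, 6)
```

This is also the skeleton of Kruskal's algorithm: an edge is added to the spanning tree only when its endpoints are in different components.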
Circular lists
We use the following structures
Data structure: Two arrays Set[1..n] and next[1..n].
Set[x] returns the name of the set that contains item x.
A is a set if and only if Set[A] = A.
next[x] returns the next item on the list of the set that contains item x.
Circular lists
Example: n = 16,
Partition: {{1, 2, 8, 9} , {4, 3, 10, 13, 14, 15, 16} , {7, 6, 5, 11, 12}}

x:    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
Set:  1  1  4  4  7  7  7  1  1  4  7  7  4  4  4  4
next: 2  8 10  3 12  5  6  9  1 13  7 11 14 15 16  4

Set Position 1
[Figure: circular list 1 → 2 → 8 → 9 → 1]
Circular lists
Set Position 7
[Figure: circular list 7 → 6 → 5 → 12 → 11 → 7]
Set Position 4
[Figure: circular list 4 → 3 → 10 → 13 → 14 → 15 → 16 → 4]
Operations and Cost
MakeSet(x)
1 Set[x] = x
2 next[x] = x
Complexity
O(1) Time
Find(x)
1 return Set[x]
Complexity
O(1) Time
Operations and Cost
For the union
We are assuming Set[A] = A ≠ B = Set[B]
Union1(A, B)
1 Set[B] = A
2 x = next[B]
3 while (x ≠ B)
4     Set[x] = A /* Rename set B to A */
5     x = next[x]
6 x = next[B] /* Splice lists A and B */
7 next[B] = next[A]
8 next[A] = x
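Union1 can be transcribed almost literally. A sketch with the two arrays, using 1-based indexing (slot 0 is wasted) and `nxt` for the array the slides call next (since `next` is a Python builtin):

```python
# Circular-list union, following the slides' Union1 pseudocode.
n = 9
Set = list(range(n + 1))    # MakeSet(x): Set[x] = x ...
nxt = list(range(n + 1))    # ... and next[x] = x

def union1(A, B):
    # Assumes A and B are distinct set names: Set[A] == A != B == Set[B].
    Set[B] = A
    x = nxt[B]
    while x != B:            # rename every member of set B
        Set[x] = A
        x = nxt[x]
    x = nxt[B]               # splice the two circular lists ...
    nxt[B] = nxt[A]          # ... by swapping next[A] and next[B]
    nxt[A] = x

union1(1, 2)                 # {1, 2} named 1
union1(3, 4)                 # {3, 4} named 3
union1(1, 3)                 # {1, 2, 3, 4} named 1
```

After the three unions, walking the circular list from 1 visits exactly the members {1, 2, 3, 4}, and every member's Set entry is 1.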
Operations and Cost
Thus, we have in the splice part
[Figure: the circular lists of A and B spliced into one list through x]
We have a Problem
Complexity
O(|B|) Time
Not only that, if we have the following sequence of operations
1 for x = 1 to n
2     MakeSet(x)
3 for x = 1 to n − 1
4     Union1(x + 1, x)
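This bad sequence can be instrumented to watch the cost blow up. The sketch below counts every rewrite of a Set[] entry performed by Union1; since Union1(x + 1, x) always renames the growing set, the count is 1 + 2 + ... + (n − 1) = n(n − 1)/2.

```python
# The slides' bad sequence, instrumented: `renames` counts Set[] rewrites.
n = 50
Set = list(range(n + 1))
nxt = list(range(n + 1))
renames = 0

def union1(A, B):
    global renames
    Set[B] = A; renames += 1
    x = nxt[B]
    while x != B:
        Set[x] = A; renames += 1   # every member of B is renamed
        x = nxt[x]
    x = nxt[B]
    nxt[B] = nxt[A]
    nxt[A] = x

for x in range(1, n):              # Union1(x + 1, x) for x = 1 .. n - 1
    union1(x + 1, x)               # B = x holds all items 1..x: renames x items
```

For n = 50 this gives 50 · 49 / 2 = 1225 renames, quadratic in n.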
Thus
Thus, we have the following number of aggregated steps

n + Σ_{i=1}^{n−1} i = n + n(n − 1)/2 = n + (n² − n)/2 = n²/2 + n/2 = Θ(n²)
Aggregate Time
Thus, the aggregate time is as follows
Aggregate Time = Θ(n²)
Therefore
Amortized Time per operation = Θ(n)
This is not exactly good
Thus, we need to have something better
We will try now the Weighted-Union Heuristic!!!
Implementation 2: Weighted-Union Heuristic Lists
We extend the previous data structure
Data structure: Three arrays Set[1..n], next[1..n], size[1..n].
size[A] returns the number of items in set A if A == Set[A] (otherwise, we do not care).
Operations
MakeSet(x)
1 Set[x] = x
2 next[x] = x
3 size[x] = 1
Complexity
O (1) time
Find(x)
1 return Set[x]
Complexity
O (1) time
Operations
Union2(A, B)
1 if size[Set[A]] > size[Set[B]]
2     size[Set[A]] = size[Set[A]] + size[Set[B]]
3     Union1(A, B)
4 else
5     size[Set[B]] = size[Set[A]] + size[Set[B]]
6     Union1(B, A)
Note: Weight Balanced Union: merge the smaller set into the larger set.
Complexity
O(min {|A| , |B|}) time.
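A sketch of Union2 on top of the circular-list Union1: the size array is maintained only at set names, and Union1 is always called so that the smaller list is the one renamed.

```python
# Weighted-union heuristic on circular lists.
n = 8
Set  = list(range(n + 1))
nxt  = list(range(n + 1))
size = [1] * (n + 1)           # MakeSet also sets size[x] = 1

def union1(A, B):              # as before: rename B's list, then splice
    Set[B] = A
    x = nxt[B]
    while x != B:
        Set[x] = A
        x = nxt[x]
    x = nxt[B]
    nxt[B] = nxt[A]
    nxt[A] = x

def union2(A, B):
    if size[Set[A]] > size[Set[B]]:
        size[Set[A]] += size[Set[B]]
        union1(A, B)           # B is smaller: rename B into A
    else:
        size[Set[B]] += size[Set[A]]
        union1(B, A)           # A is no larger: rename A into B

union2(1, 2)                   # {1, 2} named 2 (tie goes to B)
union2(3, 4)                   # {3, 4} named 4
union2(2, 4)                   # {1, 2, 3, 4} named 4
```

Only the smaller set's items ever get their Set entry rewritten, which is what caps the cost at O(min{|A|, |B|}).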
What about the operations eliciting the worst behavior
Remember
1 for x = 1 to n
2     MakeSet(x)
3 for x = 1 to n − 1
4     Union2(x + 1, x)
We have then

n + Σ_{i=1}^{n−1} 1 = n + n − 1 = 2n − 1 = Θ(n)

IMPORTANT: This is not the worst sequence!!!
For this, notice the following worst sequence
Worst Sequence S
MakeSet(x), for x = 1, ..., n. Then do n − 1 Unions in round-robin manner.
Within each round, the sets have roughly equal size.
Starting round: each set has size 1.
Next round: each set has size 2.
Next: ... size 4.
...
We claim the following
Aggregate time = Θ(n log n)
Amortized time per operation = Θ(log n)
For this, notice the following worst sequence
Example n = 16
Round 0: {1} {2} {3} {4} {5} {6} {7} {8} {9} {10} {11} {12} {13} {14} {15} {16}
Round 1: {1, 2} {3, 4} {5, 6} {7, 8} {9, 10} {11, 12} {13, 14} {15, 16}
Round 2: {1, 2, 3, 4} {5, 6, 7, 8} {9, 10, 11, 12} {13, 14, 15, 16}
Round 3: {1, 2, 3, 4, 5, 6, 7, 8} {9, 10, 11, 12, 13, 14, 15, 16}
Round 4: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}
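The rounds above can be simulated to count the cost directly. Each round moves n/2 items in total (every union moves the smaller, equal-sized half), and there are log₂ n rounds, so the unions alone move (n/2) · log₂ n items — 32 for n = 16.

```python
# Simulate the round-robin worst case for n = 16, counting moved items.
n = 16
sets = [{x} for x in range(1, n + 1)]
moves = 0
while len(sets) > 1:
    merged = []
    for i in range(0, len(sets), 2):          # pair up adjacent sets
        a, b = sets[i], sets[i + 1]
        moves += min(len(a), len(b))           # weighted union moves the smaller set
        merged.append(a | b)
    sets = merged                              # next round: half as many sets
```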
Now
Given the previous worst case
What is the complexity of this implementation?
Now, the Amortized Costs of this implementation
Claim 1: Amortized time per operation is O(log n)
For this, we have the following theorem!!!
Theorem 1
Using the linked-list representation of disjoint sets and the Weighted-Union heuristic, a sequence of m MakeSet, Union, and FindSet operations, n of which are MakeSet operations, takes O(m + n log n) time.
Proof
Because each Union operation unites two disjoint sets
We perform at most n − 1 Union operations overall.
We now bound the total time taken by these Union operations
We start by determining, for each object, an upper bound on the number of times the object's pointer back to its set object is updated.
Consider a particular object x.
We know that each time x's pointer was updated, x must have started in the smaller set.
The first time x's pointer was updated, therefore, the resulting set must have had at least 2 members.
Similarly, the next time x's pointer was updated, the resulting set must have had at least 4 members.
Proof
Continuing on
We observe that for any k ≤ n, after x's pointer has been updated ⌈log k⌉ times,
the resulting set must have at least k members.
Thus
Since the largest set has at most n members, each object's pointer is updated at most ⌈log n⌉ times over all the Union operations.
Then
The total time spent updating object pointers over all Union operations is O(n log n).
Proof
We must also account for updating the tail pointers and the list lengths
This takes only O(1) time per Union operation.
Therefore
The total time spent in all Union operations is thus O(n log n).
The time for the entire sequence of m operations follows easily
Each MakeSet and FindSet operation takes O(1) time, and there are O(m) of them.
Proof
Therefore
The total time for the entire sequence is thus O (m + n log n).
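The key step of the proof — no item's pointer is updated more than ⌈log n⌉ times under weighted union — can be checked empirically. The sketch below performs random weighted unions over n = 64 items and records, per item, how many times its representative pointer is rewritten; the doubling argument guarantees the maximum never exceeds log₂ 64 = 6, whatever the union order.

```python
# Empirical check of the log-n pointer-update bound under weighted union.
import random

random.seed(1)                                  # any seed satisfies the bound
n = 64
rep     = {x: x for x in range(1, n + 1)}       # item -> set name
members = {x: [x] for x in range(1, n + 1)}     # set name -> member list
updates = {x: 0 for x in range(1, n + 1)}       # item -> pointer rewrites

roots = list(range(1, n + 1))
while len(roots) > 1:
    a, b = random.sample(roots, 2)              # two distinct set names
    if len(members[a]) < len(members[b]):
        a, b = b, a                             # a is the larger set
    for x in members[b]:                        # move the smaller set b into a
        rep[x] = a
        updates[x] += 1                         # x's set at least doubled
    members[a] += members[b]
    del members[b]
    roots.remove(b)
```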
Amortized Cost: Aggregate Analysis
Aggregate cost O(m + n log n). Amortized cost per operation O(log n):

O(m + n log n) / m = O(1 + log n) = O(log n)   (2)
There are other ways of analyzing the amortized cost
It is possible to use
1 Accounting Method.
2 Potential Method.
Amortized Costs: Accounting Method
Accounting method
MakeSet(x): Charge (1 + log n): 1 to do the operation, log n stored as credit with item x.
Find(x): Charge 1, and use it to do the operation.
Union(A, B): Charge 0 and use 1 stored credit from each item in the smaller set to move it.
Amortized Costs: Accounting Method
Credit invariant
Total stored credit is Σ_S |S| log(n / |S|), where the summation is taken over the collection S of all disjoint sets of the current partition.
Amortized Costs: Potential Method
Potential function method
Exercise:
Define a regular potential function and use it to do the amortized analysis.
Can you make the Union amortized cost O(log n), and the MakeSet and Find costs O(1)?
Improving over the heuristic using union by rank
Union by Rank
Instead of using the number of nodes in each tree to make a decision, we maintain a rank, an upper bound on the height of the tree.
We have the following data structure to support this:
We maintain a parent array p[1..n].
A is a set if and only if A = p[A] (a tree root).
x ∈ A if and only if x is in the tree rooted at A.
43 / 114
Forest of Up-Trees: Operations without union by rank or
weight
MakeSet(x)
1 p[x] = x
Complexity
O (1) time
Union(A, B)
1 p[B] = A
Note: We are assuming that A and B are distinct roots: p[A] == A, p[B] == B, and A ≠ B. This is the reason we need a Find operation!!!
44 / 114
Example
Remember, we are doing the unions without caring about hitting the worst case
[Figure: Union(A, B) hangs tree B under root A]
45 / 114
Forest of Up-Trees: Operations without union by rank or
weight
Find(x)
1 if x == p[x]
2 return x
3 return Find(p[x])
Example
46 / 114
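The plain forest of up-trees can be sketched directly from the pseudocode above. This is a minimal Python sketch (dictionary-based, no balancing or compression yet):

```python
# A minimal forest-of-up-trees sketch: no union by rank/weight, no compression.
parent = {}

def make_set(x):
    parent[x] = x              # a root is its own parent

def find(x):
    while x != parent[x]:      # walk up until we reach the root
        x = parent[x]
    return x

def union(a, b):
    # assumes a and b are distinct roots: parent[a] == a, parent[b] == b
    parent[b] = a

make_set(1); make_set(2); make_set(3)
union(find(1), find(2))
assert find(2) == find(1)      # 1 and 2 now share a representative
assert find(3) != find(1)      # 3 is still in its own set
```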
Forest of Up-Trees: Operations without union by rank or
weight
Still, I can give you a horrible case
Sequence of operations
1 for x = 1 to n
2 MakeSet(x)
3 for x = 1 to n − 1
4 Union(x + 1, x)
5 for x = 1 to n − 1
6 Find(1)
47 / 114
Forest of Up-Trees: Operations without union by rank or
weight
We finish with this data structure
[Figure: a chain with root n and item 1 at depth n − 1]
Thus, the last part of the sequence gives us a total time of
Aggregate Time Θ(n²)
Amortized Analysis per operation Θ(n)
48 / 114
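The quadratic behavior of this degenerate sequence is easy to measure. A sketch, assuming the unions build the chain 1 → 2 → · · · → n as described:

```python
# The degenerate sequence: unions build a chain with item 1 at depth n-1,
# so each Find(1) walks n-1 links -> Theta(n^2) aggregate cost.
def build_chain(n):
    parent = list(range(n + 1))        # parent[x] == x means root
    for x in range(1, n):
        parent[x] = x + 1              # Union(x+1, x): hang x under x+1
    return parent

def find_steps(parent, x):
    steps = 0
    while parent[x] != x:              # count links traversed up to the root
        x = parent[x]
        steps += 1
    return steps

n = 1000
parent = build_chain(n)
total = sum(find_steps(parent, 1) for _ in range(n - 1))
assert find_steps(parent, 1) == n - 1  # item 1 sits at depth n-1
assert total == (n - 1) ** 2           # quadratic aggregate cost
```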
Self-Adjusting forest of Up-Trees
How do we avoid this problem?
Use the following heuristics together!!!
1 Balanced Union.
By tree weight (i.e., size)
By tree rank (i.e., height)
2 Find with path compression
Observations
Each single improvement (1 or 2) by itself results in logarithmic amortized cost per operation.
The two improvements combined result in an amortized cost per operation very close to O(1).
49 / 114
Balanced Union by Size
Using size for Balanced Union
We can use the size of each set to obtain what we want
50 / 114
We have then
MakeSet(x)
1 p[x] = x
2 size[x] = 1
Note: Complexity O (1) time
Union(A, B)
Input: assume that A and B are distinct roots: p[A] == A, p[B] == B
1 if size[A] > size[B]
2 size[A] = size[A] + size[B]
3 p[B] = A
4 else
5 size[B] = size[A] + size[B]
6 p[A] = B
Note: Complexity O (1) time
51 / 114
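The union-by-size pseudocode above translates directly. A minimal sketch:

```python
# Union by size, following the slide's pseudocode.
parent, size = {}, {}

def make_set(x):
    parent[x] = x
    size[x] = 1

def union(a, b):
    # assumes a and b are distinct roots
    if size[a] > size[b]:
        size[a] += size[b]
        parent[b] = a          # smaller tree B hangs under A
        return a
    size[b] += size[a]
    parent[a] = b              # ties and smaller A go under B
    return b

for x in range(8):
    make_set(x)
r = union(0, 1)                # equal sizes: 0 hangs under 1
assert r == 1 and size[1] == 2
r = union(1, 2)                # size 2 beats size 1: 2 hangs under 1
assert r == 1 and size[1] == 3
```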
Example
Now, we use the size for the union
[Figure: size[A] > size[B], so tree B is hung under root A]
52 / 114
Nevertheless
Union by size can make the analysis too complex
People would rather use the rank
Rank
It is defined as the height of the tree
Because
The use of the rank simplifies the amortized analysis for the data structure!!!
53 / 114
Thus, we use the balanced union by rank
MakeSet(x)
1 p[x] = x
2 rank[x] = 0
Note: Complexity O (1) time
Union(A, B)
Input: assume that A and B are distinct roots: p[A] == A, p[B] == B
1 if rank[A] > rank[B]
2 p[B] = A
3 else
4 p[A] = B
5 if rank[A] == rank[B]
6 rank[B] = rank[B] + 1
Note: Complexity O (1) time
54 / 114
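The union-by-rank pseudocode above can be sketched the same way; note that only an equal-rank union increases a rank:

```python
# Union by rank, following the slide's pseudocode.
parent, rank = {}, {}

def make_set(x):
    parent[x] = x
    rank[x] = 0

def union(a, b):
    # assumes a and b are distinct roots
    if rank[a] > rank[b]:
        parent[b] = a
        return a
    parent[a] = b
    if rank[a] == rank[b]:
        rank[b] += 1           # only an equal-rank union grows the rank
    return b

for x in range(4):
    make_set(x)
assert union(0, 1) == 1 and rank[1] == 1   # equal ranks: winner's rank grows
assert union(1, 2) == 1 and rank[1] == 1   # rank 1 beats rank 0: no growth
```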
Example
Now
We use the rank for the union
Case I
The rank of A is larger than the rank of B
[Figure: rank[A] > rank[B], so tree B is hung under root A]
55 / 114
Example
Case II
The rank of B is larger than the rank of A
[Figure: rank[B] > rank[A], so tree A is hung under root B]
56 / 114
Here is the new heuristic to improve overall performance:
Path Compression
Find(x)
1 if x ≠ p[x]
2 p[x] = Find(p[x])
3 return p[x]
Complexity
O (depth (x)) time
58 / 114
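The recursive Find with path compression above, sketched in Python; after one call, every node on the search path points straight at the root:

```python
# Find with path compression: every node on the search path is
# re-hung directly under the root.
parent = {}

def make_set(x):
    parent[x] = x

def find(x):
    if x != parent[x]:
        parent[x] = find(parent[x])   # point x straight at the root
    return parent[x]

# Build the chain 0 <- 1 <- 2 <- 3 by hand, then compress it.
for x in range(4):
    make_set(x)
parent[3], parent[2], parent[1] = 2, 1, 0
assert find(3) == 0
assert parent[3] == 0 and parent[2] == 0  # the whole path now hangs off 0
```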
Example
We have the following structure
[Figures: the recursive calls of Find(p[x]) walk the path from x up to the root]
59 / 114
Path compression
Find(x) should traverse the path from x up to its root.
It might as well create shortcuts along the way to improve the efficiency of future operations.
[Figure: before and after Find(2), every node on the path from 2 to the root is re-hung directly under the root]
62 / 114
Time complexity
Tight upper bound on time complexity
An amortized time of O(mα(m, n)) for m operations.
Where α(m, n) is the inverse of Ackermann’s function (almost a constant).
This bound, for a slightly different definition of α than the one given here, is shown in Cormen’s book.
64 / 114
Ackermann’s Function
Definition
A(1, j) = 2^j where j ≥ 1
A(i, 1) = A(i − 1, 2) where i ≥ 2
A(i, j) = A(i − 1, A(i, j − 1)) where i, j ≥ 2
Note:
This is one of several inequivalent but similar definitions of Ackermann’s function found in the literature.
Cormen’s book authors give a different definition, although they never really call theirs Ackermann’s function.
Property
Ackermann’s function grows very fast; thus its inverse grows very slowly.
65 / 114
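The definition above can be evaluated directly for small arguments; anything much larger is astronomically big. A sketch of this variant (A(1, j) = 2^j):

```python
from functools import lru_cache

# This lecture's variant of Ackermann's function; only tiny arguments
# are computable, everything else overflows any realistic budget.
@lru_cache(maxsize=None)
def A(i, j):
    if i == 1:
        return 2 ** j                  # A(1, j) = 2^j
    if j == 1:
        return A(i - 1, 2)             # A(i, 1) = A(i-1, 2)
    return A(i - 1, A(i, j - 1))       # A(i, j) = A(i-1, A(i, j-1))

assert A(1, 3) == 8
assert A(2, 1) == 4            # A(1, 2)
assert A(2, 2) == 16           # A(1, A(2, 1)) = 2^4
assert A(2, 3) == 65536        # A(1, 16) = 2^16
assert A(3, 1) == 16           # A(2, 2)
```

Note how A(2, j) already builds towers of 2’s: A(2, 3) = 65536 and A(2, 4) = 2^65536.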
Ackermann’s Function
Example A(3, 4)
A(3, 4) is an iterated tower of 2’s: a tower of 2’s whose height is itself a tower of 2’s whose height is again a tower of 2’s ending in 16.
Notation: 2^2^···^2 annotated with 10 means a tower of ten 2’s, i.e., 2^2^2^2^2^2^2^2^2^2.
66 / 114
Inverse of Ackermann’s function
Definition
α(m, n) = min { i ≥ 1 | A(i, ⌊m/n⌋) > log n } (3)
Note: This is not a true mathematical inverse.
Intuition: It grows about as slowly as Ackermann’s function grows fast.
How slowly?
Let k = ⌊m/n⌋; then m ≥ n → k ≥ 1
67 / 114
Thus
First
We can show that A(i, k) ≥ A(i, 1) for all i ≥ 1.
This is left to you...
For Example
Consider i = 4; then A(i, k) ≥ A(4, 1) = A(2, 16), a tower of 2’s of height 17, which is far larger than 10^80.
Finally
If log n < 10^80, i.e., if n < 2^(10^80), then α(m, n) ≤ 4
68 / 114
Instead of Using the Ackermann Inverse
We define the following function
log* n = min { i ≥ 0 | log^(i) n ≤ 1 } (4)
Here log^(i) n means log applied i times: log log · · · log n
Then
We will establish O(m log* n) as an upper bound.
69 / 114
In particular
Something Notable
In particular, the log* of a tower of 2’s (2^2^···^2) of height k equals k
For Example
log* 2^65536 = log* (2^2^2^2^2) = 5 (5)
Therefore
We have that log* n ≤ 5 for all practical purposes.
70 / 114
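The iterated logarithm is easy to compute; a short sketch that counts how many times log2 must be applied before the value drops to 1 or below:

```python
import math

def log_star(n):
    """Iterated logarithm: number of log2 applications until the value is <= 1."""
    i = 0
    while n > 1:
        n = math.log2(n)
        i += 1
    return i

assert log_star(2) == 1
assert log_star(16) == 3            # 16 -> 4 -> 2 -> 1
assert log_star(65536) == 4         # 65536 -> 16 -> 4 -> 2 -> 1
assert log_star(2 ** 65536) == 5    # even this tower of five 2's gives only 5
```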
The Rank Observation
Something Notable
Once a node becomes a child of another node, its rank does not change under any subsequent operation.
71 / 114
For Example
The number to the right of each node is its height (rank)
MakeSet(1),MakeSet(2),MakeSet(3), ..., MakeSet(10)
1/0 2/0 3/0 4/0 5/0
6/0 7/0 8/0 9/0 10/0
72 / 114
Example
Now, we do
Union(6, 1), Union(7, 2), ..., Union(10, 5)
1/1 2/1 3/1 4/1 5/1
6/0 7/0 8/0 9/0 10/0
73 / 114
Example
Next - Assuming that you are using FindSet to get the name of each set
Union(1, 2)
3/1 4/1 5/1
8/0 9/0 10/0
2/2
7/01/1
6/0
74 / 114
Example
Next
Union(3, 4)
2/2
7/01/1
6/0
3/1
4/2
8/0
9/0
5/1
10/0
75 / 114
Example
Next
Union(2, 4)
2/2
1/1
6/0
7/0
3/1
4/3
8/0
9/0
5/1
10/0
76 / 114
Example
Now you give a FindSet(8)
2/2
1/1
6/0
7/0
4/3
9/0
5/1
10/03/1 8/0
77 / 114
Example
Now you give a Union(4, 5)
2/2
1/1
6/0
7/0
4/3
9/03/1 8/0 5/1
10/0
78 / 114
Properties of ranks
Lemma 1 (About the Rank Properties)
1 ∀x, rank[x] ≤ rank[p[x]].
2 ∀x with x ≠ p[x], rank[x] < rank[p[x]].
3 rank[x] is initially 0.
4 rank[x] does not decrease.
5 Once x ≠ p[x] holds, rank[x] does not change.
6 rank[p[x]] is a monotonically increasing function of time.
Proof
By induction on the number of operations...
79 / 114
For Example
Imagine a MakeSet(x)
Then, rank [x] ≤ rank [p [x]]
Thus, it is true after n operations.
Then we get the (n + 1)-th operation, which can be:
Case I - FindSet.
Case II - Union.
The rest are for you to prove
It is a good mental exercise!!!
80 / 114
The Number of Nodes in a Tree
Lemma 2
For all tree roots x, size(x) ≥ 2^rank[x]
Note: size(x) = number of nodes in the tree rooted at x
Proof
By induction on the number of link operations:
Basis Step
Before first link, all ranks are 0 and each tree contains one node.
Inductive Step
Consider linking x and y (Link (x, y))
Assume the lemma holds before this operation; we show that it holds after.
81 / 114
Case 1: rank[x] ≠ rank[y]
Assume rank[x] < rank[y], so y remains the root; primed values denote values after the link.
Note:
rank′[x] == rank[x] and rank′[y] == rank[y]
82 / 114
Therefore
We have that
size′(y) = size(x) + size(y)
 ≥ 2^rank[x] + 2^rank[y]
 ≥ 2^rank[y]
 = 2^rank′[y]
83 / 114
Case 2: rank[x] == rank[y]
Assume rank[x] == rank[y]; y becomes the root and its rank grows by one. Primed values denote values after the link.
Note:
rank′[x] == rank[x] and rank′[y] == rank[y] + 1
84 / 114
Therefore
We have that
size′(y) = size(x) + size(y)
 ≥ 2^rank[x] + 2^rank[y]
 = 2^(rank[y]+1)
 = 2^rank′[y]
Note: In the worst case rank[x] == rank[y] == 0
85 / 114
The number of nodes at a certain rank
Lemma 3
For any integer r ≥ 0, there are at most n/2^r nodes of rank r.
Proof
First, fix r.
When rank r is assigned to some node x, imagine that you label each node in the tree rooted at x by “x.”
By Lemma 2, 2^r or more nodes are labeled each time this happens.
By Lemma 1, each node is labeled at most once, when its root is first assigned rank r.
If there were more than n/2^r nodes of rank r,
then more than 2^r · (n/2^r) = n nodes would be labeled by a node of rank r, a contradiction.
86 / 114
Corollary 1
Corollary 1
Every node has rank at most ⌊log n⌋.
Proof
If there were a rank r with r > log n, there would be n/2^r < 1 nodes of rank r, a contradiction.
87 / 114
Providing the time bound
Lemma 4 (Lemma 21.7)
Suppose we convert a sequence S of m MakeSet, Union and FindSet operations into a sequence S′ of m′ MakeSet, Link, and FindSet operations by turning each Union into two FindSet operations followed by a Link.
Then, if sequence S′ runs in O(m′ log∗ n) time, sequence S runs in O(m log∗ n) time.
88 / 114
Proof:
The proof is quite easy
1 Since each Union operation in sequence S is converted into three operations in S′:
m ≤ m′ ≤ 3m (6)
2 We have that m′ = O(m).
3 Then, if the new sequence S′ runs in O(m′ log∗ n), the old sequence S runs in O(m log∗ n).
89 / 114
Theorem for Union by Rank and Path Compression
Theorem
Any sequence of m MakeSet, Link, and FindSet operations, n of which are MakeSet operations, is performed in worst-case time O(m log∗ n).
Proof
First, MakeSet and Link take O(1) time.
The key of the analysis is to accurately charge FindSet.
90 / 114
For this, we have the following
We can do the following
Partition ranks into blocks.
Put each rank r into block log∗ r, for r = 0, 1, ..., ⌊log n⌋ (Corollary 1).
The highest-numbered block is log∗ (log n) = (log∗ n) − 1.
In addition, the cost of FindSet pays for the following situations
1 The FindSet pays for the cost of the root and its child.
2 A bill is given to every node whose parent changes in the path compression.
91 / 114
Now, define the Block function
Define the following Upper Bound Function

B(j) = { −1                              if j = −1
       { 1                               if j = 0
       { 2                               if j = 1
       { 2^2^···^2 (a tower of j twos)   if j ≥ 2

92 / 114
First
Something Notable
These are going to be the upper bounds for the blocks of ranks
Where
For j = 0, 1, ..., log∗ n − 1, block j consists of the set of ranks:
B(j − 1) + 1, B(j − 1) + 2, ..., B(j)   (elements in block j) (7)
93 / 114
For Example
We have that
B(−1) = −1
B(0) = 1
B(1) = 2
B(2) = 2^2 = 4
B(3) = 2^2^2 = 2^4 = 16
B(4) = 2^2^2^2 = 2^16 = 65536
94 / 114
For Example
Thus, we have

Block j   Set of Ranks
0         0, 1
1         2
2         3, 4
3         5, ..., 16
4         17, ..., 65536
...       ...

Note B(j) = 2^B(j−1) for j > 0.
95 / 114
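The tower function and the block boundaries in the table can be reproduced directly. A small sketch (the helper names `B` and `block_of` are illustrative, not from the slides):

```python
def B(j):
    # Tower function: B(-1) = -1, B(0) = 1, and B(j) = 2^B(j-1) for j > 0.
    if j == -1:
        return -1
    if j == 0:
        return 1
    return 2 ** B(j - 1)

def block_of(r):
    # Block j holds ranks B(j-1)+1 .. B(j); equivalently, rank r goes to block log* r.
    j = 0
    while r > B(j):
        j += 1
    return j

assert [B(j) for j in range(5)] == [1, 2, 4, 16, 65536]
assert block_of(0) == 0 and block_of(1) == 0    # block 0: ranks 0, 1
assert block_of(2) == 1                         # block 1: rank 2
assert block_of(3) == 2 and block_of(4) == 2    # block 2: ranks 3, 4
assert block_of(5) == 3 and block_of(16) == 3   # block 3: ranks 5..16
assert block_of(17) == 4 and block_of(65536) == 4
```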
Example
Now you give a Union(4, 5)
[Figure: forest of nodes labeled element/rank — 1/1, 2/2, 3/1, 4/3, 5/1, 6/0, 7/0, 8/0, 9/0, 10/0 — with the ranks grouped into Block 0, Block 1, and Block 2.]
96 / 114
Finally
Given our Bound in the Ranks
Thus, all the blocks from B(0) to B(log∗ n − 1) will be used for storing the rank elements.
97 / 114
Charging for FindSets
Two types of charges for FindSet(x0)
Block charges and Path charges.
Charge each node as either: 1) Block Charge, or 2) Path Charge.
98 / 114
Charging for FindSets
Thus, for find sets
The find operation pays for the work done for the root and its
immediate child.
It also pays for all the nodes which are not in the same block as
their parents.
99 / 114
Then
First
1 All these nodes are children of some other nodes, so their ranks will not change, and they are bound to stay in the same block until the end of the computation.
2 If a node is in the same block as its parent, it will be charged for the work done in the FindSet operation.
100 / 114
Thus
We have the following charges
Block Charge:
For j = 0, 1, ..., log∗ n − 1, give one block charge to the last node with rank in block j on the path x0, x1, ..., xl.
Also give one block charge to the child of the root, i.e., xl−1, and to the root itself, i.e., xl.
Path Charge:
Give each node in x0, ..., xl a path charge until it comes to point to a parent whose rank lies in a block different from its own.
101 / 114
Next
Something Notable
The number of nodes on the path whose parents are in different blocks is limited by (log∗ n) − 1.
This upper-bounds the charges for the last node with rank in each block j.
2 charges for the root and its child.
Thus
The cost of the Block Charges for a FindSet operation is upper bounded by:
log∗ n − 1 + 2 = log∗ n + 1. (8)
103 / 114
Claim
Claim
Once a node other than a root or its child is given a Block Charge (B.C.),
it will never be given a Path Charge (P.C.)
104 / 114
Proof
Proof
Given a node x, we know that:
rank[p[x]] − rank[x] is monotonically increasing ⇒ log∗ rank[p[x]] − log∗ rank[x] is monotonically increasing.
Thus, once x and p[x] are in different blocks, they will always be in different blocks, because:
The rank of the parent can only increase.
And the child’s rank stays the same.
Thus, the node x will be billed in the first FindSet operation a path charge, and a block charge if necessary.
After that, the node x will never be charged a path charge again, because it already points to a node whose rank is in a different block.
105 / 114
Remaining Goal
The Total cost of the FindSet’s Operations
Total cost of FindSet’s = Total Block Charges + Total Path Charges.
We want to show
Total Block Charges + Total Path Charges = O(m log∗ n)
106 / 114
Bounding Block Charges
This part is easy
Block numbers range over 0, ..., log∗ n − 1.
The number of Block Charges per FindSet is ≤ log∗ n + 1.
The total number of FindSet’s is ≤ m.
The total number of Block Charges is ≤ m(log∗ n + 1).
107 / 114
Bounding Path Charges
Claim
Let N(j) be the number of nodes whose ranks are in block j. Then, for all j ≥ 0, N(j) ≤ 3n/(2B(j)).
Proof
By Lemma 3, summing over all possible ranks in block j:

N(j) ≤ Σ_{r=B(j−1)+1}^{B(j)} n/2^r

For j = 0:

N(0) ≤ n/2^0 + n/2^1 = 3n/2 = 3n/(2B(0))

108 / 114
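The claim can be sanity-checked numerically by summing the Lemma 3 bound over the ranks of each block. A sketch (the helper `B` and the choice of test values are illustrative assumptions):

```python
def B(j):
    # Tower function from the block definition: B(-1) = -1, B(0) = 1, B(j) = 2^B(j-1).
    if j == -1:
        return -1
    if j == 0:
        return 1
    return 2 ** B(j - 1)

n = 10**6
for j in range(4):
    # Sum of the Lemma 3 bound n/2^r over the ranks B(j-1)+1 .. B(j) of block j.
    total = sum(n / 2**r for r in range(B(j - 1) + 1, B(j) + 1))
    # Claim: N(j) <= 3n / (2 B(j)); equality holds at j = 0.
    assert total <= 3 * n / (2 * B(j))
```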
Proof of claim
For j ≥ 1

N(j) ≤ (n/2^{B(j−1)+1}) Σ_{r=0}^{B(j)−(B(j−1)+1)} 1/2^r
     < (n/2^{B(j−1)+1}) Σ_{r=0}^{∞} 1/2^r
     = n/2^{B(j−1)}
     = n/B(j)          ← this is where the fact that B(j) = 2^{B(j−1)} is used
     < 3n/(2B(j))

109 / 114
Bounding Path Charges
We have the following
Let P(n) denote the overall number of path charges. Then:

P(n) ≤ Σ_{j=0}^{log∗ n − 1} αj · βj  (9)

αj is the max number of nodes with ranks in Block j.
βj is the max number of path charges per node of Block j.
110 / 114
Then, we have the following
Upper Bounds
By the claim, αj is upper-bounded by 3n/(2B(j)).
In addition, we need to bound βj, the maximum number of path charges for a node x in block j.
Note: Any node in Block j that is given a P.C. will remain in Block j through all m operations.
111 / 114
Now, we bound βj
So, every time x is assessed a Path Charge, it gets a new parent with increased rank.
Note: x’s rank is not changed by path compression.
Suppose x has a rank in Block j
Repeated Path Charges to x will ultimately result in x’s parent having a rank in a block higher than j.
From that point onward, x is given Block Charges, not Path Charges.
Therefore, the Worst Case
x has the lowest rank in Block j, i.e., B(j − 1) + 1, and x’s parent’s rank successively takes on the values:
B(j − 1) + 2, B(j − 1) + 3, ..., B(j)
112 / 114
Finally
Hence, x can be given at most B(j) − B(j − 1) − 1 Path Charges.
Therefore:

P(n) ≤ Σ_{j=0}^{log∗ n − 1} (3n/(2B(j))) (B(j) − B(j − 1) − 1)
P(n) ≤ Σ_{j=0}^{log∗ n − 1} (3n/(2B(j))) B(j)
P(n) ≤ (3/2) n log∗ n

113 / 114
Thus
FindSet operations contribute

O(m(log∗ n + 1) + n log∗ n) = O(m log∗ n)  (10)

MakeSet and Link contribute O(n).
The entire sequence takes O(m log∗ n).
114 / 114
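Putting it all together, the structure the theorem analyzes — union by rank plus path compression — can be sketched as follows (an illustrative Python implementation, not code from the slides; the even tighter known bound for this structure is O(m α(n))):

```python
class DisjointSet:
    """Union-Find with union by rank and path compression: any sequence of
    m operations, n of which are MakeSet, runs in O(m log* n) time."""

    def __init__(self):
        self.parent = {}
        self.rank = {}

    def make_set(self, x):
        self.parent[x] = x
        self.rank[x] = 0

    def find_set(self, x):
        # Two-pass path compression: find the root, then hang every node
        # on the path directly off the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        # One Union = two FindSets followed by a Link (as in Lemma 4).
        rx, ry = self.find_set(x), self.find_set(y)
        if rx == ry:
            return
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx              # link by rank
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1

ds = DisjointSet()
for x in range(1, 10):
    ds.make_set(x)
ds.union(1, 2); ds.union(3, 4); ds.union(5, 8); ds.union(6, 9)
assert ds.find_set(8) == ds.find_set(5)
assert ds.find_set(1) != ds.find_set(3)
```

The unions mirror the running example from the start of the lecture: after Union(1, 2), Union(3, 4), Union(5, 8), and Union(6, 9), elements 5 and 8 share a representative while 1 and 3 do not.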

17 Disjoint Set Representation

  • 1.
    Analysis of Algorithm DisjointSet Representation Andres Mendez-Vazquez November 8, 2015 1 / 114
  • 2.
    Outline 1 Disjoint SetRepresentation Definition of the Problem Operations 2 Union-Find Problem The Main Problem Applications 3 Implementations First Attempt: Circular List Operations and Cost Still we have a Problem Weighted-Union Heuristic Operations Still a Problem Heuristic Union by Rank 4 Balanced Union Path compression Time Complexity Ackermann’s Function Bounds The Rank Observation Proof of Complexity Theorem for Union by Rank and Path Compression) 2 / 114
  • 3.
    Outline 1 Disjoint SetRepresentation Definition of the Problem Operations 2 Union-Find Problem The Main Problem Applications 3 Implementations First Attempt: Circular List Operations and Cost Still we have a Problem Weighted-Union Heuristic Operations Still a Problem Heuristic Union by Rank 4 Balanced Union Path compression Time Complexity Ackermann’s Function Bounds The Rank Observation Proof of Complexity Theorem for Union by Rank and Path Compression) 3 / 114
  • 4.
    Disjoint Set Representation Problem 1Items are drawn from the finite universe U = 1, 2, ..., n for some fixed n. 2 We want to maintain a partition of U as a collection of disjoint sets. 3 In addition, we want to uniquely name each set by one of its items called its representative item. These disjoint sets are maintained under the following operations 1 MakeSet(x) 2 Union(A,B) 3 Find(x) 4 / 114
  • 5.
    Disjoint Set Representation Problem 1Items are drawn from the finite universe U = 1, 2, ..., n for some fixed n. 2 We want to maintain a partition of U as a collection of disjoint sets. 3 In addition, we want to uniquely name each set by one of its items called its representative item. These disjoint sets are maintained under the following operations 1 MakeSet(x) 2 Union(A,B) 3 Find(x) 4 / 114
  • 6.
    Disjoint Set Representation Problem 1Items are drawn from the finite universe U = 1, 2, ..., n for some fixed n. 2 We want to maintain a partition of U as a collection of disjoint sets. 3 In addition, we want to uniquely name each set by one of its items called its representative item. These disjoint sets are maintained under the following operations 1 MakeSet(x) 2 Union(A,B) 3 Find(x) 4 / 114
  • 7.
    Disjoint Set Representation Problem 1Items are drawn from the finite universe U = 1, 2, ..., n for some fixed n. 2 We want to maintain a partition of U as a collection of disjoint sets. 3 In addition, we want to uniquely name each set by one of its items called its representative item. These disjoint sets are maintained under the following operations 1 MakeSet(x) 2 Union(A,B) 3 Find(x) 4 / 114
  • 8.
    Disjoint Set Representation Problem 1Items are drawn from the finite universe U = 1, 2, ..., n for some fixed n. 2 We want to maintain a partition of U as a collection of disjoint sets. 3 In addition, we want to uniquely name each set by one of its items called its representative item. These disjoint sets are maintained under the following operations 1 MakeSet(x) 2 Union(A,B) 3 Find(x) 4 / 114
  • 9.
    Disjoint Set Representation Problem 1Items are drawn from the finite universe U = 1, 2, ..., n for some fixed n. 2 We want to maintain a partition of U as a collection of disjoint sets. 3 In addition, we want to uniquely name each set by one of its items called its representative item. These disjoint sets are maintained under the following operations 1 MakeSet(x) 2 Union(A,B) 3 Find(x) 4 / 114
  • 10.
    Outline 1 Disjoint SetRepresentation Definition of the Problem Operations 2 Union-Find Problem The Main Problem Applications 3 Implementations First Attempt: Circular List Operations and Cost Still we have a Problem Weighted-Union Heuristic Operations Still a Problem Heuristic Union by Rank 4 Balanced Union Path compression Time Complexity Ackermann’s Function Bounds The Rank Observation Proof of Complexity Theorem for Union by Rank and Path Compression) 5 / 114
  • 11.
    Operations MakeSet(x) Given x ∈U currently not belonging to any set in the collection, create a new singleton set {x} and name it x. This is usually done at start, once per item, to create the initial trivial partition. Union(A,B) It changes the current partition by replacing its sets A and B with A ∪ B. Name the set A or B. The operation may choose either one of the two representatives as the new representatives. Find(x) It returns the name of the set that currently contains item x. 6 / 114
  • 12.
    Operations MakeSet(x) Given x ∈U currently not belonging to any set in the collection, create a new singleton set {x} and name it x. This is usually done at start, once per item, to create the initial trivial partition. Union(A,B) It changes the current partition by replacing its sets A and B with A ∪ B. Name the set A or B. The operation may choose either one of the two representatives as the new representatives. Find(x) It returns the name of the set that currently contains item x. 6 / 114
  • 13.
    Operations MakeSet(x) Given x ∈U currently not belonging to any set in the collection, create a new singleton set {x} and name it x. This is usually done at start, once per item, to create the initial trivial partition. Union(A,B) It changes the current partition by replacing its sets A and B with A ∪ B. Name the set A or B. The operation may choose either one of the two representatives as the new representatives. Find(x) It returns the name of the set that currently contains item x. 6 / 114
  • 14.
    Example for x =1 to 9 do MakeSet(x) 98621 2 3 4 5 6 8 97 Then, you do a Union(1, 2) Now, Union(3, 4); Union(5, 8); Union(6, 9) 7 / 114
  • 15.
    Example for x =1 to 9 do MakeSet(x) 98621 2 3 4 5 6 8 97 Then, you do a Union(1, 2) 9863 4 5 6 8 9721 2 Now, Union(3, 4); Union(5, 8); Union(6, 9) 7 / 114
  • 16.
    Example for x =1 to 9 do MakeSet(x) 98621 2 3 4 5 6 8 97 Then, you do a Union(1, 2) 9863 4 5 6 8 9721 2 Now, Union(3, 4); Union(5, 8); Union(6, 9) 21 2 3 4 5 678 9 7 / 114
  • 17.
    Example Now, Union(1, 5);Union(7, 4) 21 2 3 45 678 9 Then, if we do the following operations Find(1) returns 5 Find(9) returns 9 Finally, Union(5, 9) Then Find(9) returns 5 8 / 114
Union-Find Problem

Problem: Let S be a sequence of m = |S| MakeSet, Union and Find operations (intermixed in arbitrary order):
n of which are MakeSet.
At most n − 1 are Union.
The rest are Finds.
Cost(S) = total computational time to execute the sequence S.
Goal: Find an implementation that, for every m and n, minimizes the amortized cost per operation:

Cost(S) / |S|   (1)

for any arbitrary sequence S.

10 / 114
Applications

Examples:
1 Maintaining partitions and equivalence classes.
2 Graph connectivity under edge insertion.
3 Minimum spanning trees (e.g., Kruskal's algorithm).
4 Random maze construction.

12 / 114
Circular lists

We use the following structures.
Data structure: Two arrays Set[1..n] and next[1..n].
Set[x] returns the name of the set that contains item x.
A is a set if and only if Set[A] = A.
next[x] returns the next item on the list of the set that contains item x.

14 / 114
Circular lists

Example: n = 16, Partition: {{1, 2, 8, 9}, {4, 3, 10, 13, 14, 15, 16}, {7, 6, 5, 11, 12}}

position:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
Set:       1  1  4  4  7  7  7  1  1  4  7  7  4  4  4  4
next:      2  8 10  3 12  5  6  9  1 13  7 11 14 15 16  4

For instance, starting at position 1 and following next traverses the circular list 1 → 2 → 8 → 9 → 1.

15 / 114
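The example table can be encoded directly (a sketch; `next` is renamed `nxt` because `next` is a Python builtin, and index 0 is unused so items run 1..n as in the slides):

```python
n = 16
Set = [0, 1, 1, 4, 4, 7, 7, 7, 1, 1, 4, 7, 7, 4, 4, 4, 4]
nxt = [0, 2, 8, 10, 3, 12, 5, 6, 9, 1, 13, 7, 11, 14, 15, 16, 4]

def members(a):
    """Walk the circular list of set a and collect its items."""
    items = [a]
    x = nxt[a]
    while x != a:
        items.append(x)
        x = nxt[x]
    return items
```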
Operations and Cost

MakeSet(x)
1 Set[x] = x
2 next[x] = x
Complexity: O(1) time.

Find(x)
1 return Set[x]
Complexity: O(1) time.

17 / 114
Operations and Cost

For the union, we are assuming Set[A] = A and Set[B] = B with A ≠ B.

Union1(A, B)
1 Set[B] = A
2 x = next[B]
3 while (x ≠ B)
4     Set[x] = A    /* Rename set B to A */
5     x = next[x]
6 x = next[B]       /* Splice lists A and B */
7 next[B] = next[A]
8 next[A] = x

18 / 114
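The pseudocode translates almost line for line into Python (a sketch; arrays passed in as lists, index 0 unused):

```python
def union1(A, B, Set, nxt):
    """Circular-list Union1: rename every member of B to A, then
    splice the two circular lists.
    Assumes Set[A] == A and Set[B] == B with A != B."""
    Set[B] = A
    x = nxt[B]
    while x != B:          # rename the rest of B's members: O(|B|)
        Set[x] = A
        x = nxt[x]
    x = nxt[B]             # splice: exchange the successors of A and B
    nxt[B] = nxt[A]
    nxt[A] = x
```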
Operations and Cost

Thus, in the splice part, the circular lists of A and B are joined into a single circular list.

19 / 114
We have a Problem

Complexity: O(|B|) time.
Not only that, if we have the following sequence of operations:
1 for x = 1 to n
2     MakeSet(x)
3 for x = 1 to n − 1
4     Union1(x + 1, x)

20 / 114
Thus

Thus, we have the following number of aggregated steps:

n + Σ_{i=1}^{n−1} i = n + n(n − 1)/2 = n + (n² − n)/2 = n²/2 + n/2 = Θ(n²)

21 / 114
Aggregate Time

Thus, the aggregate time is as follows:
Aggregate Time = Θ(n²)
Therefore:
Amortized Time per operation = Θ(n)

22 / 114
This is not exactly good

Thus, we need to have something better.
We will try now the Weighted-Union Heuristic!!!

23 / 114
Implementation 2: Weighted-Union Heuristic Lists

We extend the previous data structure.
Data structure: Three arrays Set[1..n], next[1..n], size[1..n].
size[A] returns the number of items in set A if A == Set[A] (otherwise, we do not care).

25 / 114
Operations

MakeSet(x)
1 Set[x] = x
2 next[x] = x
3 size[x] = 1
Complexity: O(1) time.

Find(x)
1 return Set[x]
Complexity: O(1) time.

26 / 114
Operations

Union2(A, B)
1 if size[Set[A]] > size[Set[B]]
2     size[Set[A]] = size[Set[A]] + size[Set[B]]
3     Union1(A, B)
4 else
5     size[Set[B]] = size[Set[A]] + size[Set[B]]
6     Union1(B, A)

Note: Weight Balanced Union: merge the smaller set into the larger set.
Complexity: O(min{|A|, |B|}) time.

27 / 114
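A Python sketch of Union2 with the weighted-union heuristic (it carries its own copy of Union1; `size` is only meaningful at representatives):

```python
def union1(A, B, Set, nxt):
    """Rename B's members to A, then splice the circular lists."""
    Set[B] = A
    x = nxt[B]
    while x != B:
        Set[x] = A
        x = nxt[x]
    nxt[A], nxt[B] = nxt[B], nxt[A]   # splice = swap successors

def union2(A, B, Set, nxt, size):
    """Weighted union: merge the smaller list into the larger one,
    so each call costs O(min(|A|, |B|)). Returns the new name."""
    if size[A] > size[B]:
        size[A] += size[B]
        union1(A, B, Set, nxt)
        return A
    size[B] += size[A]
    union1(B, A, Set, nxt)
    return B
```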
What about the operations eliciting the worst behavior?

Remember:
1 for x = 1 to n
2     MakeSet(x)
3 for x = 1 to n − 1
4     Union2(x + 1, x)

We have then:

n + Σ_{i=1}^{n−1} 1 = n + n − 1 = 2n − 1 = Θ(n)

IMPORTANT: This is not the worst sequence!!!

28 / 114
For this, notice the following worst sequence

Worst Sequence S:
MakeSet(x), for x = 1, ..., n.
Then do n − 1 Unions in a round-robin manner. Within each round, the sets have roughly equal size:
Starting round: each set has size 1.
Next round: each set has size 2.
Next: ... size 4. ...

We claim the following:
Aggregate time = Θ(n log n)
Amortized time per operation = Θ(log n)

29 / 114
For this, notice the following worst sequence

Example n = 16:
Round 0: {1} {2} {3} {4} {5} {6} {7} {8} {9} {10} {11} {12} {13} {14} {15} {16}
Round 1: {1, 2} {3, 4} {5, 6} {7, 8} {9, 10} {11, 12} {13, 14} {15, 16}
Round 2: {1, 2, 3, 4} {5, 6, 7, 8} {9, 10, 11, 12} {13, 14, 15, 16}
Round 3: {1, 2, 3, 4, 5, 6, 7, 8} {9, 10, 11, 12, 13, 14, 15, 16}
Round 4: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}

30 / 114
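The Θ(n log n) claim can be checked by counting pointer rewrites: each Union2 renames min(|A|, |B|) pointers. This small simulation (my own sketch, not from the slides) tallies that cost over the round-robin schedule; for n a power of two every round costs n/2 rewrites, over log₂ n rounds:

```python
def pointer_updates_round_robin(n):
    """Count the Set-pointer rewrites the weighted-union heuristic
    performs when n singletons are merged round-robin: in every round,
    adjacent pairs of (roughly) equal-sized sets are united."""
    sizes = [1] * n      # sizes of the current sets
    updates = 0
    while len(sizes) > 1:
        merged = []
        for i in range(0, len(sizes) - 1, 2):
            updates += min(sizes[i], sizes[i + 1])  # smaller list renamed
            merged.append(sizes[i] + sizes[i + 1])
        if len(sizes) % 2:                          # odd set out this round
            merged.append(sizes[-1])
        sizes = merged
    return updates
```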
Now

Given the previous worst case:
What is the complexity of this implementation?

31 / 114
Now, the Amortized Costs of this implementation

Claim 1: Amortized time per operation is O(log n).
For this, we have the following theorem!!!

Theorem 1: Using the linked-list representation of disjoint sets and the weighted-union heuristic, a sequence of m MakeSet, Union, and FindSet operations, n of which are MakeSet operations, takes O(m + n log n) time.

32 / 114
Proof

Because each Union operation unites two disjoint sets, we perform at most n − 1 Union operations over all.
We now bound the total time taken by these Union operations.
We start by determining, for each object, an upper bound on the number of times the object's pointer back to its set object is updated.
Consider a particular object x. We know that each time x's pointer was updated, x must have started in the smaller set.
The first time x's pointer was updated, therefore, the resulting set must have had at least 2 members.
Similarly, the next time x's pointer was updated, the resulting set must have had at least 4 members.

33 / 114
Proof

Continuing on, we observe that for any k ≤ n, after x's pointer has been updated ⌈log k⌉ times, the resulting set must have at least k members.
Thus, since the largest set has at most n members, each object's pointer is updated at most ⌈log n⌉ times over all the Union operations.
Then the total time spent updating object pointers over all Union operations is O(n log n).

34 / 114
Proof

We must also account for updating the tail pointers and the list lengths, which takes only O(1) time per Union operation.
Therefore, the total time spent in all Union operations is thus O(n log n).
The time for the entire sequence of m operations follows easily: each MakeSet and FindSet operation takes O(1) time, and there are O(m) of them.

35 / 114
Proof

Therefore, the total time for the entire sequence is thus O(m + n log n).

36 / 114
Amortized Cost: Aggregate Analysis

Aggregate cost: O(m + n log n).
Amortized cost per operation (using m ≥ n):

O(m + n log n) / m = O(1 + log n) = O(log n)   (2)

37 / 114
There are other ways of analyzing the amortized cost

It is possible to use:
1 Accounting Method.
2 Potential Method.

38 / 114
Amortized Costs: Accounting Method

Accounting method:
MakeSet(x): Charge 1 + log n; 1 to do the operation, log n stored as credit with item x.
Find(x): Charge 1, and use it to do the operation.
Union(A, B): Charge 0 and use 1 stored credit from each item in the smaller set to move it.

39 / 114
Amortized Costs: Accounting Method

Credit invariant: the total stored credit is

Σ_{S ∈ S} |S| log(n / |S|)

where the summation is taken over the collection S of all disjoint sets of the current partition.

40 / 114
Amortized Costs: Potential Method

Potential function method.
Exercise: Define a regular potential function and use it to do the amortized analysis.
Can you make the Union amortized cost O(log n), and the MakeSet and Find costs O(1)?

41 / 114
Improving over the heuristic using union by rank

Union by Rank: Instead of using the number of nodes in each tree to make a decision, we maintain a rank, an upper bound on the height of the tree.
We have the following data structure to support this:
We maintain a parent array p[1..n].
A is a set if and only if A = p[A] (a tree root).
x ∈ A if and only if x is in the tree rooted at A.

43 / 114
Forest of Up-Trees: Operations without union by rank or weight

MakeSet(x)
1 p[x] = x
Complexity: O(1) time.

Union(A, B)
1 p[B] = A
Note: We are assuming that p[A] == A ≠ B == p[B]. This is the reason we need a find operation!!!

44 / 114
Example

Remember, we are doing the joins without caring about getting the worst case: tree B is simply hung under the root of tree A.

45 / 114
Forest of Up-Trees: Operations without union by rank or weight

Find(x)
1 if x == p[x]
2     return x
3 return Find(p[x])

46 / 114
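The plain forest operations transcribe directly into Python (a sketch; `p` is the parent array, here a dict so items need not be 0-based):

```python
def make_set(x, p):
    """Create the singleton tree rooted at x."""
    p[x] = x

def union(A, B, p):
    """Hang tree B under tree A. Assumes A and B are roots:
    p[A] == A and p[B] == B."""
    p[B] = A

def find(x, p):
    """Follow parent pointers up to the root; O(height) per call."""
    if x == p[x]:
        return x
    return find(p[x], p)
```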
Forest of Up-Trees: Operations without union by rank or weight

Still, I can give you a horrible case. Sequence of operations:
1 for x = 1 to n
2     MakeSet(x)
3 for x = 1 to n − 1
4     Union(x + 1, x)
5 for x = 1 to n − 1
6     Find(1)

47 / 114
Forest of Up-Trees: Operations without union by rank or weight

We finish with a data structure that is a single path 1 → 2 → · · · → n − 1 → n. Thus, the last part of the sequence gives us a total time of:
Aggregate Time: Θ(n²)
Amortized Analysis per operation: Θ(n)

48 / 114
Self-Adjusting Forest of Up-Trees

How do we avoid this problem? Use together the following heuristics!!!
1 Balanced Union.
   By tree weight (i.e., size)
   By tree rank (i.e., height)
2 Find with path compression.

Observations:
Each single improvement (1 or 2) by itself will result in logarithmic amortized cost per operation.
The two improvements combined will result in amortized cost per operation approaching very close to O(1).

49 / 114
Balanced Union by Size. Using size for Balanced Union: we can use the size of each set to obtain what we want. 50 / 114
We have then: MakeSet(x) 1 p[x] = x 2 size[x] = 1 Note: Complexity O(1) time. Union(A, B) Input: assume that p[A] = A and p[B] = B (both are roots) 1 if size[A] > size[B] 2 size[A] = size[A] + size[B] 3 p[B] = A 4 else 5 size[B] = size[A] + size[B] 6 p[A] = B Note: Complexity O(1) time. 51 / 114
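The pseudocode above translates directly; this is a minimal sketch in Python (dictionary-based, with the names `p` and `size` taken from the slides):

```python
# Union by size: the root of the smaller tree becomes a child of the
# root of the larger tree, which keeps tree heights O(log n).
p = {}
size = {}

def make_set(x):           # O(1)
    p[x] = x
    size[x] = 1

def union(a, b):           # O(1); a and b are assumed to be roots
    if size[a] > size[b]:
        size[a] += size[b]
        p[b] = a
    else:
        size[b] += size[a]
        p[a] = b

make_set('A'); make_set('B'); make_set('C')
union('A', 'B')            # equal sizes: A hangs under B
union('B', 'C')            # size[B] = 2 > size[C] = 1: C hangs under B
```
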
Example. Now, we use the size for the union: when size[A] > size[B], the root B becomes a child of the root A. 52 / 114
Nevertheless, union by size can make the analysis too complex, so people would rather use the rank. Rank: it is defined as the height of the tree. Because the use of the rank simplifies the amortized analysis of the data structure!!! 53 / 114
Thus, we use the balanced union by rank: MakeSet(x) 1 p[x] = x 2 rank[x] = 0 Note: Complexity O(1) time. Union(A, B) Input: assume that p[A] = A and p[B] = B (both are roots) 1 if rank[A] > rank[B] 2 p[B] = A 3 else 4 p[A] = B 5 if rank[A] == rank[B] 6 rank[B] = rank[B] + 1 Note: Complexity O(1) time. 54 / 114
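Again as a sketch in Python, the union-by-rank pseudocode becomes:

```python
# Union by rank: rank is the tree height; only when two roots of equal
# rank are linked does the new root's rank grow by one.
p = {}
rank = {}

def make_set(x):           # O(1)
    p[x] = x
    rank[x] = 0

def union(a, b):           # O(1); a and b are assumed to be roots
    if rank[a] > rank[b]:
        p[b] = a
    else:
        p[a] = b
        if rank[a] == rank[b]:
            rank[b] += 1

make_set('A'); make_set('B'); make_set('C')
union('A', 'B')   # equal ranks: A under B, rank[B] becomes 1
union('C', 'B')   # rank[B] = 1 > rank[C] = 0: C under B, ranks unchanged
```
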
Example. Now we use the rank for the union. Case I: the rank of A is larger than that of B, i.e., rank[A] > rank[B], so B becomes a child of A. 55 / 114
Example. Case II: the rank of B is larger than that of A, i.e., rank[B] > rank[A], so A becomes a child of B. 56 / 114
Here is the new heuristic to improve overall performance: Path Compression. Find(x) 1 if x ≠ p[x] 2 p[x] = Find(p[x]) 3 return p[x] Complexity: O(depth(x)) time. 58 / 114
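The recursive Find above compresses the whole path in one pass; a small Python sketch (the chain is built by hand just for illustration):

```python
# Find with path compression: after Find(x), every node on the path
# from x to the root points directly at the root.
p = {}

def make_set(x):
    p[x] = x

def find(x):
    if x != p[x]:
        p[x] = find(p[x])  # recursively compress the whole path
    return p[x]

# Build the chain 1 -> 2 -> 3 -> 4 by hand, then compress it.
for x in range(1, 5):
    make_set(x)
p[1], p[2], p[3] = 2, 3, 4
root = find(1)
# Now 1, 2, and 3 all point directly at the root 4.
```
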
Example. We have the following structure. 59 / 114
Path compression. Find(x) should traverse the path from x up to its root; it might as well create shortcuts along the way to improve the efficiency of future operations. Example: Find(2) makes every node on the path from 2 up to the root a direct child of the root. 62 / 114
Time complexity. Tight upper bound on time complexity: an amortized time of O(m α(m, n)) for m operations, where α(m, n) is the inverse of Ackermann's function (almost a constant). This bound, for a slightly different definition of α than the one given here, is shown in Cormen's book. 64 / 114
Ackermann's Function. Definition: A(1, j) = 2^j for j ≥ 1; A(i, 1) = A(i − 1, 2) for i ≥ 2; A(i, j) = A(i − 1, A(i, j − 1)) for i, j ≥ 2. Note: this is one of several inequivalent but similar definitions of Ackermann's function found in the literature. Cormen's book gives a different definition, although its authors never really call theirs Ackermann's function. Property: Ackermann's function grows very fast, thus its inverse grows very slowly. 65 / 114
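The definition above can be typed in directly; even tiny arguments explode, so this Python sketch only evaluates very small (i, j):

```python
from functools import lru_cache

# The slides' variant of Ackermann's function. Memoization keeps the
# recursion from recomputing small values.
@lru_cache(maxsize=None)
def A(i, j):
    if i == 1:
        return 2 ** j             # A(1, j) = 2^j,             j >= 1
    if j == 1:
        return A(i - 1, 2)        # A(i, 1) = A(i - 1, 2),     i >= 2
    return A(i - 1, A(i, j - 1))  # A(i, j) = A(i - 1, A(i, j - 1))

# A(2, j) is already an exponential tower of j + 1 twos:
values = [A(2, 1), A(2, 2), A(2, 3)]   # 4, 16, 65536
# A(3, 1) = A(2, 2) = 16; anything much larger is astronomically big.
```
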
Ackermann's Function. Example: A(3, 4) is an exponential tower of 2s whose height is itself given by a tower of 2s, with the innermost tower of height 16. Notation: a tower 2^2^···^2 annotated with 10 means ten stacked 2s. 66 / 114
Inverse of Ackermann's function. Definition: α(m, n) = min {i ≥ 1 | A(i, ⌊m/n⌋) > log n} (3). Note: this is not a true mathematical inverse. Intuition: it grows about as slowly as Ackermann's function grows fast. How slowly? Let ⌊m/n⌋ = k; then m ≥ n → k ≥ 1. 67 / 114
Thus. First: we can show that A(i, k) ≥ A(i, 1) for all i ≥ 1. This is left to you... For example, consider i = 4; then A(i, k) ≥ A(4, 1), an exponential tower of 2s far exceeding 10^80. Finally, if log n < 10^80, i.e., if n < 2^(10^80), then α(m, n) ≤ 4. 68 / 114
Instead of Using the Ackermann Inverse. We define the following function: log* n = min {i ≥ 0 | log^(i) n ≤ 1} (4). The (i) means log applied i times: log log · · · log n. Then we will establish O(m log* n) as the upper bound. 69 / 114
In particular. Something notable: log* of an exponential tower of 2s grows by one with each additional level. For example, log* 2^65536 = 5, since 2^65536 is a tower of five 2s (5). Therefore, we have that log* n ≤ 5 for all practical purposes. 70 / 114
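The iterated logarithm is easy to compute exactly in Python; a float-based `math.log2` would overflow on 2^65536, so this sketch takes floor logs via `int.bit_length` and reproduces the slide's value log* 2^65536 = 5:

```python
def ilog2(n):
    # Floor of log2 for positive integers; exact even for huge n.
    return n.bit_length() - 1

def log_star(n):
    # log* n = number of times log2 must be iterated until the value
    # drops to at most 1.
    i = 0
    while n > 1:
        n = ilog2(n)
        i += 1
    return i

# 65536 -> 16 -> 4 -> 2 -> 1: four iterations, so log* 65536 = 4,
# and log*(2^65536) = 5.
```
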
The Rank Observation. Something notable: once a node becomes a child of another node, its rank does not change under any posterior operation. 71 / 114
For example (the number to the right of each node is its rank, i.e., height): MakeSet(1), MakeSet(2), MakeSet(3), ..., MakeSet(10) gives the singleton roots 1/0, 2/0, 3/0, 4/0, 5/0, 6/0, 7/0, 8/0, 9/0, 10/0. 72 / 114
Example. Now, we do Union(6, 1), Union(7, 2), ..., Union(10, 5): the roots 1/1, 2/1, 3/1, 4/1, 5/1 each gain one child 6/0, 7/0, 8/0, 9/0, 10/0. 73 / 114
Example. Next, assuming that you use a FindSet to get the set names, Union(1, 2) links root 1/1 under root 2, which becomes 2/2 (children 1/1 → 6/0, and 7/0); the trees 3/1, 4/1, 5/1 (with 8/0, 9/0, 10/0) are unchanged. 74 / 114
Example. Now you issue FindSet(8): path compression makes every node on the path from 8 up to the root 4/3 a direct child of that root. 77 / 114
Example. Now you issue Union(4, 5): since rank[4] = 3 > rank[5] = 1, the root 5/1 (with child 10/0) becomes a child of 4/3. 78 / 114
Properties of ranks. Lemma 1 (About the Rank Properties): 1 ∀x, rank[x] ≤ rank[p[x]]. 2 ∀x with x ≠ p[x], rank[x] < rank[p[x]]. 3 rank[x] is initially 0. 4 rank[x] does not decrease. 5 Once x ≠ p[x] holds, rank[x] does not change. 6 rank[p[x]] is a monotonically increasing function of time. Proof: by induction on the number of operations... 79 / 114
For example, imagine a MakeSet(x): then rank[x] ≤ rank[p[x]], since p[x] = x. Thus, assume the claims hold after n operations; the (n + 1)-st operation can be: Case I - FindSet. Case II - Union. The rest is for you to prove: it is a good mental exercise!!! 80 / 114
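Lemma 1 can also be checked empirically; this Python sketch runs random unions (by rank) and finds (with path compression) and then verifies that rank[x] ≤ rank[p[x]], strictly for non-roots:

```python
import random

# Empirical check of Lemma 1 on a random workload.
p, rank = {}, {}

def make_set(x):
    p[x] = x
    rank[x] = 0

def find(x):
    if x != p[x]:
        p[x] = find(p[x])   # path compression
    return p[x]

def union(x, y):
    a, b = find(x), find(y)
    if a == b:
        return
    if rank[a] > rank[b]:
        p[b] = a
    else:
        p[a] = b
        if rank[a] == rank[b]:
            rank[b] += 1

random.seed(0)
n = 200
for x in range(n):
    make_set(x)
for _ in range(500):
    union(random.randrange(n), random.randrange(n))
    find(random.randrange(n))

ok = all(rank[x] <= rank[p[x]] and (x == p[x] or rank[x] < rank[p[x]])
         for x in range(n))
```
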
The Number of Nodes in a Tree. Lemma 2: for all tree roots x, size(x) ≥ 2^rank[x]. Note: size(x) = number of nodes in the tree rooted at x. Proof: by induction on the number of link operations. Basis step: before the first link, all ranks are 0 and each tree contains one node. Inductive step: consider linking x and y (Link(x, y)); assume the lemma holds before this operation; we show that it holds after. 81 / 114
Case 1: rank[x] ≠ rank[y]. Assume rank[x] < rank[y], so x is linked under y. Note: writing rank′ for the rank after the link, rank′[x] == rank[x] and rank′[y] == rank[y]. 82 / 114
Therefore, we have that size′(y) = size(x) + size(y) ≥ 2^rank[x] + 2^rank[y] ≥ 2^rank[y] = 2^rank′[y]. 83 / 114
Case 2: rank[x] == rank[y]. Assume rank[x] == rank[y], so x is linked under y and y's rank grows. Note: rank′[x] == rank[x] and rank′[y] == rank[y] + 1. 84 / 114
Therefore, we have that size′(y) = size(x) + size(y) ≥ 2^rank[x] + 2^rank[y] = 2^(rank[y]+1) = 2^rank′[y]. Note: in the worst case rank[x] == rank[y] == 0. 85 / 114
The number of nodes at a certain rank. Lemma 3: for any integer r ≥ 0, there are at most n/2^r nodes of rank r. Proof: first fix r. When rank r is assigned to some node x, imagine that you label each node in the tree rooted at x by “x.” By Lemma 2, 2^r or more nodes are labeled each time. By Lemma 1(5), each node is labeled at most once, when its root is first assigned rank r. If there were more than n/2^r nodes of rank r, then more than 2^r · (n/2^r) = n nodes would be labeled, a contradiction. 86 / 114
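Lemmas 2 and 3 can be observed on a concrete forest; this Python sketch repeatedly pairs equal-rank trees (plain find, no compression, so tree shapes stay easy to measure) and checks size(x) ≥ 2^rank[x] at each root and at most n/2^r nodes of each rank r:

```python
from collections import Counter

# Empirical check of Lemmas 2 and 3 under union by rank.
p, rank = {}, {}

def make_set(x):
    p[x] = x
    rank[x] = 0

def find(x):                 # plain find, no path compression
    while x != p[x]:
        x = p[x]
    return x

def union(x, y):
    a, b = find(x), find(y)
    if a == b:
        return
    if rank[a] > rank[b]:
        p[b] = a
    else:
        p[a] = b
        if rank[a] == rank[b]:
            rank[b] += 1

n = 256
for x in range(n):
    make_set(x)
for step in (1, 2, 4, 8, 16, 32, 64, 128):
    for x in range(0, n, 2 * step):
        union(x, x + step)   # pairs up trees of equal rank

sizes = Counter(find(x) for x in range(n))          # size per root
lemma2 = all(sizes[r] >= 2 ** rank[r] for r in sizes)
lemma3 = all(c <= n / 2 ** r for r, c in Counter(rank.values()).items())
```
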
Corollary 1. Corollary 1: every node has rank at most ⌊log n⌋. Proof: if there were a rank r with r > log n, there would be n/2^r < 1 nodes of rank r, a contradiction. 87 / 114
Providing the time bound. Lemma 4 (Lemma 21.7): suppose we convert a sequence S of m MakeSet, Union, and FindSet operations into a sequence S′ of m′ MakeSet, Link, and FindSet operations by turning each Union into two FindSet operations followed by a Link. Then, if sequence S′ runs in O(m′ log* n) time, sequence S runs in O(m log* n) time. 88 / 114
Proof: the proof is quite easy. 1 Since each Union operation in sequence S is converted into three operations in S′, m ≤ m′ ≤ 3m (6). 2 We have that m′ = O(m). 3 Then, if the new sequence S′ runs in O(m′ log* n), this implies that the old sequence S runs in O(m log* n). 89 / 114
Theorem for Union by Rank and Path Compression. Theorem: any sequence of m MakeSet, Link, and FindSet operations, n of which are MakeSet operations, is performed in worst-case time O(m log* n). Proof: first, MakeSet and Link take O(1) time. The key of the analysis is to accurately charge FindSet. 90 / 114
For this, we have the following. We can do the following: partition ranks into blocks; put each rank r into block log* r for r = 0, 1, ..., ⌊log n⌋ (Corollary 1). The highest-numbered block is log* (log n) = (log* n) − 1. In addition, the cost of a FindSet pays for the following situations: 1 the FindSet pays for the cost of the root and its child; 2 a bill is given to every node whose parent changes in the path compression!!! 91 / 114
Now, define the Block function. Define the following upper-bound function: B(j) = −1 if j = −1; 1 if j = 0; 2 if j = 1; 2^B(j−1) (an exponential tower of j 2s) if j ≥ 2. 92 / 114
First. Something notable: these are going to be the upper bounds for the blocks of ranks. Where, for j = 0, 1, ..., log* n − 1, block j consists of the set of ranks B(j − 1) + 1, B(j − 1) + 2, ..., B(j), the elements in block j (7). 93 / 114
  • 238.
    First Something Notable These aregoing to be the upper bounds for blocks in the ranks Where For j = 0, 1, ..., log∗ n − 1, block j consist of the set of ranks: B(j − 1) + 1, B(j − 1) + 2, ..., B(j) Elements in Block j (7) 93 / 114
  • 239.
For Example
We have that:
B(−1) = −1
B(0) = 1
B(1) = 2
B(2) = 2^2 = 4
B(3) = 2^{2^2} = 2^4 = 16
B(4) = 2^{2^{2^2}} = 2^{16} = 65536
94 / 114
For Example
Thus, we have:

Block j | Set of Ranks
--------|------------------
   0    | 0, 1
   1    | 2
   2    | 3, 4
   3    | 5, . . . , 16
   4    | 17, . . . , 65536
  ...   | ...

Note: B(j) = 2^{B(j−1)} for j > 0.
95 / 114
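As an illustration (a sketch, not from the slides), the tower function B(j) and the rank-to-block map can be written in Python; the map reproduces the table above:

```python
def B(j):
    """Tower-of-twos bound for block j: B(-1) = -1, B(0) = 1,
    and B(j) = 2 ** B(j - 1) for j >= 1."""
    if j == -1:
        return -1
    value = 1
    for _ in range(j):
        value = 2 ** value
    return value

def block_of_rank(r):
    """Block holding rank r: the smallest j with r <= B(j)."""
    j = 0
    while r > B(j):
        j += 1
    return j

# Block j covers ranks B(j-1)+1 .. B(j); e.g. block 3 covers 5..16.
```

For example, block_of_rank(17) returns 4, matching the row "4 | 17, . . . , 65536".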
Example
Now you give a Union(4, 5).
[Figure: two union-by-rank trees with nodes labeled value/rank (e.g., 4/3, 2/2, 5/1), grouped into Block 0, Block 1, and Block 2.]
96 / 114
Finally
Given our bound on the ranks:
All the blocks from B(0) to B(log∗ n − 1) will be used for storing the rank elements.
97 / 114
Charging for FindSets
Two types of charges for FindSet(x0): Block Charges and Path Charges.
Charge each node as either:
1 A Block Charge
2 A Path Charge
98 / 114
Charging for FindSets
Thus, for FindSets:
The find operation pays for the work done for the root and its immediate child.
It also pays for all the nodes that are not in the same block as their parents.
99 / 114
Then
First
1 All these nodes are children of some other nodes, so their ranks will not change, and they are bound to stay in the same block until the end of the computation.
2 If a node is in the same block as its parent, it will be charged for the work done in the FindSet operation!!!
100 / 114
Thus
We have the following charges:
Block Charge: for j = 0, 1, . . . , log∗ n − 1, give one block charge to the last node with rank in block j on the path x0, x1, . . . , xl. Also give one block charge to the child of the root, i.e., xl−1, and to the root itself, i.e., xl.
Path Charge: give the nodes in x0, . . . , xl a path charge until they are moved to point to an element whose rank lies in a block different from their own.
101 / 114
Charging for FindSets
Two types of charges for FindSet(x0): Block Charges and Path Charges.
Charge each node as either:
1 A Block Charge
2 A Path Charge
102 / 114
Next
Something Notable:
The number of nodes whose parents are in different blocks is limited by (log∗ n) − 1, making it an upper bound on the charges for the last node with rank in each block.
Plus 2 charges for the root and its child.
Thus, the cost of the Block Charges for the FindSet operation is upper bounded by:
$$\log^* n - 1 + 2 = \log^* n + 1 \quad (8)$$
103 / 114
Claim
Claim: once a node other than a root or its child is given a Block Charge (B.C.), it will never be given a Path Charge (P.C.).
104 / 114
Proof
Given a node x, we know that rank[p[x]] − rank[x] is monotonically increasing, hence log∗ rank[p[x]] − log∗ rank[x] is monotonically increasing.
Thus, once x and p[x] are in different blocks, they will always be in different blocks, because:
The rank of the parent can only increase.
And the child’s rank stays the same.
Thus, node x is billed in the first FindSet operation a path charge, and a block charge if necessary.
After that, node x is never charged a path charge again, because it already points to a parent whose rank is in a different block.
105 / 114
Remaining Goal
The total cost of the FindSet operations:
Total cost of FindSets = Total Block Charges + Total Path Charges.
We want to show:
Total Block Charges + Total Path Charges = O(m log∗ n)
106 / 114
Bounding Block Charges
This part is easy:
Block numbers range over 0, . . . , log∗ n − 1.
The number of Block Charges per FindSet is ≤ log∗ n + 1.
The total number of FindSets is ≤ m.
The total number of Block Charges is ≤ m(log∗ n + 1).
107 / 114
Bounding Path Charges
Claim: let N(j) be the number of nodes whose ranks are in block j. Then, for all j ≥ 0,
$$N(j) \le \frac{3n}{2B(j)}$$
Proof: by Lemma 3, summing over all possible ranks in block j,
$$N(j) \le \sum_{r=B(j-1)+1}^{B(j)} \frac{n}{2^r}$$
For j = 0:
$$N(0) \le \frac{n}{2^0} + \frac{n}{2} = \frac{3n}{2} = \frac{3n}{2B(0)}$$
108 / 114
Proof of claim
For j ≥ 1:
$$N(j) \le \frac{n}{2^{B(j-1)+1}} \sum_{r=0}^{B(j)-(B(j-1)+1)} \frac{1}{2^r} < \frac{n}{2^{B(j-1)+1}} \sum_{r=0}^{\infty} \frac{1}{2^r} = \frac{n}{2^{B(j-1)}} = \frac{n}{B(j)} < \frac{3n}{2B(j)}$$
This is where the fact that B(j) = 2^{B(j−1)} is used.
109 / 114
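The claim can be sanity-checked numerically: the sum of n/2^r over the ranks of block j never exceeds 3n/(2B(j)). A hypothetical check (a sketch, restating the tower function B):

```python
def B(j):
    # Tower bound: B(-1) = -1, B(0) = 1, B(j) = 2 ** B(j - 1).
    if j == -1:
        return -1
    value = 1
    for _ in range(j):
        value = 2 ** value
    return value

def block_mass(j, n=1.0):
    """Upper bound on N(j): sum of n / 2**r over the ranks
    B(j-1)+1, ..., B(j) of block j."""
    return sum(n / 2**r for r in range(B(j - 1) + 1, B(j) + 1))

# Each block's mass stays within the claimed 3n / (2 B(j)).
for j in range(4):
    assert block_mass(j) <= 3 * 1.0 / (2 * B(j))
```

For j = 0 the bound is tight: block_mass(0) is exactly 3n/2, matching the base case above.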
Bounding Path Charges
We have the following:
Let P(n) denote the overall number of path charges. Then:
$$P(n) \le \sum_{j=0}^{\log^* n - 1} \alpha_j \cdot \beta_j \quad (9)$$
where αj is the maximum number of nodes with ranks in Block j, and βj is the maximum number of path charges per node of Block j.
110 / 114
Then, we have the following
Upper Bounds:
By the claim, αj is upper-bounded by 3n/(2B(j)).
In addition, we need to bound βj, the maximum number of path charges for a node x in block j.
Note: any node in Block j that is given a P.C. will still be in Block j after all m operations.
111 / 114
Now, we bound βj
Every time x is assessed a Path Charge, it gets a new parent with increased rank.
Note: x’s rank is not changed by path compression.
Suppose x has a rank in Block j:
Repeated Path Charges to x will ultimately result in x’s parent having a rank in a block higher than j.
From that point onward, x is given Block Charges, not Path Charges.
Therefore, in the worst case:
x has the lowest rank in Block j, i.e., B(j − 1) + 1, and x’s parents’ ranks successively take on the values
B(j − 1) + 2, B(j − 1) + 3, . . . , B(j)
112 / 114
Finally
Hence, x can be given at most B(j) − B(j − 1) − 1 Path Charges. Therefore:
$$P(n) \le \sum_{j=0}^{\log^* n - 1} \frac{3n}{2B(j)} \left( B(j) - B(j-1) - 1 \right) \le \sum_{j=0}^{\log^* n - 1} \frac{3n}{2B(j)} B(j) = \frac{3}{2} n \log^* n$$
113 / 114
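Since each summand (3n/(2B(j))) · B(j) collapses to 3n/2, the bound is just 3n/2 added log∗ n times. A quick numeric illustration (a sketch, with the tower function B restated; names are hypothetical):

```python
def B(j):
    # Tower bound: B(-1) = -1, B(0) = 1, B(j) = 2 ** B(j - 1).
    if j == -1:
        return -1
    value = 1
    for _ in range(j):
        value = 2 ** value
    return value

def path_charge_bound(n, log_star_n):
    """Evaluate sum_{j=0}^{log* n - 1} (3n / (2 B(j))) * B(j),
    which should equal (3/2) * n * log* n."""
    return sum(3 * n / (2 * B(j)) * B(j) for j in range(log_star_n))
```

For n = 65536 (so log∗ n = 4), path_charge_bound(65536, 4) evaluates to (3/2) · 65536 · 4.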
Thus
FindSet operations contribute
$$O(m(\log^* n + 1) + n \log^* n) = O(m \log^* n) \quad (10)$$
MakeSet and Link contribute O(n).
The entire sequence takes O(m log∗ n).
114 / 114