Dec. 09, 2015
Wei Li
Zehao Cai
Ishan Sharma
Time Complexity of Union Find
Wei/Zehao/Ishan, CSCI 6212/Arora/Fall 2015
Algorithm Definition
A disjoint-set data structure keeps track of a set of elements
partitioned into a number of disjoint (non-overlapping) subsets.
The union-find algorithm supports three operations on a set of elements:
• MAKE-SET(x). Create a new set containing only element x.
• FIND(x). Return a canonical element in the set containing x.
• UNION(x, y). Merge the sets containing x and y.
Implementation: linked list or, more often, a tree.
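As a minimal sketch of the three operations, assume the simplest possible representation, in which every element stores the canonical element of its set directly. The class name `QuickFind` and the `leader` map are illustrative choices, not taken from the slides.

```python
class QuickFind:
    """Every element stores the canonical element of its set directly."""

    def __init__(self):
        self.leader = {}

    def make_set(self, x):
        # A new element is the canonical element of its own singleton set.
        self.leader[x] = x

    def find(self, x):
        # A single lookup returns the canonical element.
        return self.leader[x]

    def union(self, x, y):
        old, new = self.leader[x], self.leader[y]
        if old == new:
            return
        # Relabel every member of x's set: a linear scan over all elements.
        for v in self.leader:
            if self.leader[v] == old:
                self.leader[v] = new
```

In this representation FIND is a constant-time lookup, while each UNION scans all n elements.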
Before Union: b → h → c, so Find(b) = c; d → f, so Find(d) = f.
After Union: b → h → c → f, so Find(b) = f.
Quick-find & Quick-union
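The chains in this example can be reproduced with a short quick-union sketch. The dictionary `parent` and the helper `path_to_root` are hypothetical names introduced only for illustration, and the order of the union calls is an assumption chosen to produce the chains shown on the slide.

```python
parent = {}

def make_set(x):
    parent[x] = x

def find(x):
    # Follow parent pointers until a root (a node that is its own parent).
    while parent[x] != x:
        x = parent[x]
    return x

def union(x, y):
    # Link the root of x's tree under the root of y's tree.
    root_x, root_y = find(x), find(y)
    if root_x != root_y:
        parent[root_x] = root_y

def path_to_root(x):
    # Illustrative helper: list the nodes visited by find(x).
    path = [x]
    while parent[x] != x:
        x = parent[x]
        path.append(x)
    return path

for v in "bhcdf":
    make_set(v)
union("b", "h")
union("h", "c")
union("d", "f")
before = path_to_root("b")   # ['b', 'h', 'c']
union("b", "d")
after = path_to_root("b")    # ['b', 'h', 'c', 'f']
```

Without any balancing, such chains can grow to length n, which is where the worst-case cost of quick-union comes from.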
Definition: The rank of a node x is an upper bound on the height of the
subtree rooted at x.
When performing the operation Union(x, y) on two roots x and y, we
compare rank(x) and rank(y):
• If rank(x) < rank(y), make y the parent of x.
• If rank(x) > rank(y), make x the parent of y.
• If rank(x) = rank(y), make y the parent of x and increase the rank of
y by one.
First Optimization: Union By Rank Heuristic
Note. In this case, rank = height.
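The three rules can be sketched as follows, assuming ranks start at 0 and that no path compression is used, so that rank equals height. The names `parent`, `rank` and `height` are illustrative.

```python
parent, rank = {}, {}

def make_set(x):
    parent[x] = x
    rank[x] = 0

def find(x):
    while parent[x] != x:
        x = parent[x]
    return x

def union(x, y):
    rx, ry = find(x), find(y)
    if rx == ry:
        return
    if rank[rx] > rank[ry]:
        rx, ry = ry, rx          # now rank[rx] <= rank[ry]
    parent[rx] = ry              # lower-rank root goes under higher-rank root
    if rank[rx] == rank[ry]:
        rank[ry] += 1            # equal ranks: the new root's rank grows by one

def height():
    # Longest path (in edges) from any node to its root.
    def depth(v):
        d = 0
        while parent[v] != v:
            v, d = parent[v], d + 1
        return d
    return max(depth(v) for v in parent)

for v in range(8):
    make_set(v)
for a, b in [(0, 1), (2, 3), (0, 2), (4, 5), (6, 7), (4, 6), (0, 4)]:
    union(a, b)
```

Merging eight singletons pairwise in this order yields one tree whose root has rank 3 and whose height is also 3.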
During the execution of Find(e), e and all intermediate vertices on
the path from e to the root are made children of the root x.
Second Optimization: Path Compression
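A two-pass version of Find with path compression might look like the sketch below. The chain e → d → c → b → x is a hypothetical example built by hand; only the root name x and the start node e come from the slide.

```python
# Hand-built chain: e -> d -> c -> b -> x, with x as the root.
parent = {"x": "x", "b": "x", "c": "b", "d": "c", "e": "d"}

def find(v):
    # First pass: locate the root.
    root = v
    while parent[root] != root:
        root = parent[root]
    # Second pass: re-point every node on the path directly at the root.
    while parent[v] != root:
        nxt = parent[v]
        parent[v] = root
        v = nxt
    return root

root = find("e")
```

After the call, e, d, c and b are all children of x, so later finds on any of them follow a single link.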
Union by Rank & Path Compression
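With path compression, a Find can shorten a tree without updating any
rank, so the rank of a root is only an upper bound on the height of its tree.
This is why we call it “union by rank” rather than “union by height”.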
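The gap between rank and height can be observed directly. The sketch below combines both optimizations (the names `parent`, `rank` and `height` are illustrative) and compares the root's rank with the actual tree height after every element has been looked up once.

```python
n = 8
parent = list(range(n))
rank = [0] * n

def find(v):
    root = v
    while parent[root] != root:
        root = parent[root]
    while parent[v] != root:          # path compression
        nxt = parent[v]
        parent[v] = root
        v = nxt
    return root

def union(x, y):
    rx, ry = find(x), find(y)
    if rx == ry:
        return
    if rank[rx] > rank[ry]:
        rx, ry = ry, rx
    parent[rx] = ry                   # union by rank
    if rank[rx] == rank[ry]:
        rank[ry] += 1

def height():
    # Longest path (in edges) to a root, measured without compressing.
    best = 0
    for v in range(n):
        d = 0
        while parent[v] != v:
            v, d = parent[v], d + 1
        best = max(best, d)
    return best

for a, b in [(0, 1), (2, 3), (0, 2), (4, 5), (6, 7), (4, 6), (0, 4)]:
    union(a, b)
for v in range(n):
    find(v)                           # compress every path

root_rank = rank[find(0)]
tree_height = height()
```

Here the root keeps rank 3 while the tree has been flattened to height 1.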
Worst-case time for m union-find operations on a set of n objects:

Algorithm                                Worst-case time
Quick-find                               $mn$
Quick-union                              $mn$
QU + union by rank                       $n + m \log n$
QU + path compression                    $n + m \log n$
QU + union by rank + path compression    $n + m \log^* n$
Time Complexity
Lemma 1: As Find follows the path to the root, the ranks of the nodes
it encounters are strictly increasing.
Union: a tree with smaller rank is attached to a tree with greater
rank, never the reverse.
Find: every node visited along the path is attached to the root,
which has larger rank than its children.
Lemma 2: A node u that is the root of a subtree with
rank r has at least $2^r$ nodes in that subtree.
Proof: Initially, when each node is the root of its own tree (rank 0, one node),
the claim is trivially true. Assume that a node u with rank r has at least $2^r$
nodes. Then when two trees with rank r are merged by Union by Rank and form a
tree with rank r + 1, the new root has at least $2^r + 2^r = 2^{r+1}$ nodes.
Lemma 3: The number of nodes of rank r is at most $n/2^r$.
Proof: From Lemma 2, we know that a node u that is the root of a subtree
with rank r has at least $2^r$ nodes, and subtrees rooted at distinct nodes of
rank r do not overlap. The number of nodes of rank r is therefore largest when
each of them is the root of a tree with exactly $2^r$ nodes. In this case, the
number of nodes of rank r is $n/2^r$.
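Lemmas 2 and 3 can be checked empirically. The sketch below is an illustration rather than a proof: it runs random unions with union by rank only (no path compression, so each recorded subtree size stays accurate), using a fixed seed and hypothetical variable names.

```python
import random
from collections import Counter

n = 1000
parent = list(range(n))
rank = [0] * n
size = [1] * n                      # size[v] = nodes in the subtree rooted at v

def find(x):
    while parent[x] != x:
        x = parent[x]
    return x

def union(x, y):
    rx, ry = find(x), find(y)
    if rx == ry:
        return
    if rank[rx] > rank[ry]:
        rx, ry = ry, rx
    parent[rx] = ry
    size[ry] += size[rx]
    if rank[rx] == rank[ry]:
        rank[ry] += 1

rng = random.Random(0)              # fixed seed, chosen arbitrarily
for _ in range(5000):
    union(rng.randrange(n), rng.randrange(n))

# Lemma 2: a node of rank r roots a subtree with at least 2^r nodes.
lemma2_holds = all(size[v] >= 2 ** rank[v] for v in range(n))

# Lemma 3: at most n / 2^r nodes have rank r.
rank_counts = Counter(rank)
lemma3_holds = all(count <= n / 2 ** r for r, count in rank_counts.items())
```

Both flags come out true for any sequence of unions, since the two lemmas are invariants of union by rank.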
We define a “bucket” here: a bucket is a set of vertices whose ranks fall
in a particular range of the form $[B, 2^B - 1]$.
Proof
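$\log^* n$
Definition: For every non-negative integer n, $\log^* n$ is defined as

$$\log^* n := \begin{cases} 0 & \text{if } n \le 1 \\ 1 + \log^*(\log n) & \text{if } n > 1 \end{cases}$$

where the logarithm is base 2.
We have $\log^* n \le 5$ unless n exceeds the number of atoms in the universe.

$\log^* 2^1 = 1 + \log^* 2^0 = 1$
$\log^* 4 = \log^* 2^2 = 1 + \log^* 2^1 = 2$
$\log^* 16 = \log^* 2^{2^2} = 1 + \log^* 2^2 = 3$
$\log^* 65536 = \log^* 2^{2^{2^2}} = 1 + \log^* 2^{2^2} = 4$
$\log^* 2^{65536} = \log^* 2^{2^{2^{2^2}}} = 1 + \log^* 2^{2^{2^2}} = 5$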
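The recursive definition translates into a short loop. This is a sketch assuming base-2 logarithms, as in the worked values above; the function name `log_star` is an illustrative choice.

```python
import math

def log_star(n):
    """Number of times log2 must be applied to n before the result is <= 1."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count

values = [log_star(k) for k in (1, 2, 4, 16, 65536)]   # [0, 1, 2, 3, 4]
```

Since `math.log2` accepts arbitrarily large Python integers, `log_star(2 ** 65536)` can be evaluated as well and gives 5.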
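We can make two observations about the buckets.
The total number of buckets is $O(\log^* n)$.
Proof: When we go from one bucket to the next, we add one more two
to the power; that is, the bucket after $[B, 2^B - 1]$ is $[2^B, 2^{2^B} - 1]$.
Ranks never exceed $\log n$ (Lemma 2), so the buckets $[0, 0], [1, 1], [2, 3], [4, 15], [16, 65535], \dots$
needed to cover all ranks number at most $\log^* n + 1$.
The maximum number of elements in bucket $[B, 2^B - 1]$ is at
most n.
Proof: By Lemma 3, the number of elements in bucket $[B, 2^B - 1]$ is at most

$$\frac{n}{2^B} + \frac{n}{2^{B+1}} + \frac{n}{2^{B+2}} + \cdots + \frac{n}{2^{2^B - 1}} \le \frac{2n}{2^B} \le n \quad (B \ge 1),$$

and the bucket $[0, 0]$ trivially holds at most n elements.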
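To make the bucket boundaries concrete, the sketch below lists the buckets needed to cover ranks 0 through a given maximum. The function name `rank_buckets` is hypothetical.

```python
def rank_buckets(max_rank):
    """Return the buckets [B, 2^B - 1] needed to cover ranks 0..max_rank."""
    buckets = []
    low = 0
    while low <= max_rank:
        high = 2 ** low - 1
        buckets.append((low, high))
        low = high + 1            # the next bucket starts at 2^B
    return buckets

# With n = 65536 elements, ranks are at most log2(n) = 16.
buckets = rank_buckets(16)
```

For ranks up to 16 this gives (0, 0), (1, 1), (2, 3), (4, 15) and (16, 65535): the bucket widths grow as a tower of twos, so only a handful of buckets is ever needed.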
Proof
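Let F represent the list of "find" operations performed, and let
T1 = the number of links followed, over all finds in F, that lead directly to a root,
T2 = the number of links followed whose two endpoints lie in different buckets,
T3 = the number of links followed whose two endpoints lie in the same bucket.
Then the total cost of m finds is T = T1 + T2 + T3.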
Proof
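T1: each find follows one link into the root, a constant cost (1) per operation, so $T_1 = O(m)$.
T2: along one find path the bucket can change at most as many times as there are buckets, so $T_2 = O(m \log^* n)$.
T3: sum over all buckets, and over all nodes in one bucket. By Lemma 1 and path compression, every such step gives the node a new parent of strictly larger rank, so a node in bucket $[B, 2^B - 1]$ is charged at most $2^B - 1 - B$ times before its parent lies in a higher bucket. By Lemma 3, a bucket holds at most $\sum_{i=B}^{2^B - 1} n/2^i \le 2n/2^B$ nodes. Hence

$$T_3 \le \sum_{\text{buckets}} \left(2^B - 1 - B\right) \frac{2n}{2^B} \le \sum_{\text{buckets}} 2^B \cdot \frac{2n}{2^B} = O(n \log^* n)$$

$T = T_1 + T_2 + T_3 = O(m) + O(m \log^* n) + O(n \log^* n)$
$m \ge n \Rightarrow T = O(m \log^* n)$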
Algorithm & Time Complexity
• Simple data structure; the algorithm is easy to implement.
• The time complexity is hard to prove. (Proved by Robert Endre Tarjan
in 1975.)
• The time complexity is near linear.
Applications
• Keep track of the connected components of an undirected
graph.
• Find a minimum spanning tree of a graph.
Conclusions
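As an illustration of the first application, the sketch below counts the connected components of an undirected graph using union by rank and path compression. The function name `count_components` and the vertex numbering 0..n-1 are assumptions made for the example.

```python
def count_components(n, edges):
    """Count connected components of an undirected graph on vertices 0..n-1."""
    parent = list(range(n))
    rank = [0] * n

    def find(v):
        root = v
        while parent[root] != root:
            root = parent[root]
        while parent[v] != root:      # path compression
            nxt = parent[v]
            parent[v] = root
            v = nxt
        return root

    components = n
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            continue                  # already connected: edge closes a cycle
        if rank[ra] > rank[rb]:
            ra, rb = rb, ra
        parent[ra] = rb               # union by rank
        if rank[ra] == rank[rb]:
            rank[rb] += 1
        components -= 1               # two components merged into one
    return components

result = count_components(6, [(0, 1), (1, 2), (3, 4)])   # 3
```

The same test, whether an edge joins two different components, is the step a minimum-spanning-tree algorithm such as Kruskal's relies on to decide which edges to keep.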