2. Main Contributions
• A new conditional entropy for trees, called the
tree degree entropy
• A succinct tree representation whose size matches
this entropy (ultra-succinct data structure)
• Compressed DFUDS and its applications
3. Background
• Succinct data structures
• Ordered Trees
• Succinct Representations of Ordered Trees
– BP (balanced parentheses)
– DFUDS (depth first unary degree sequence)
4. Succinct Data Structures (copied)
• Succinct data structures = succinct representation
of data + succinct index
• Examples
– sets
– trees, graphs
– strings
– permutations, functions
5. Bit Vectors (copied)
• B: 0,1 vector of length n, B[0]B[1]…B[n−1]
• lower bound of size = log 2^n = n bits
• queries
– rank(B, x): number of ones in B[0..x]=B[0]B[1]…B[x]
– select(B, i): position of the i-th 1 from the head (i ≥ 1)
• basics of all succinct data structures
• naïve data structure
– store all the answers in arrays
– 2n words (2n log n bits)
– O(1) time queries
• Example: B = 1001010001000000 (n = 16), with ones at positions 0, 3, 5, 9
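The naive structure above can be sketched in a few lines. This is only an illustration of "store all the answers in arrays", not a succinct structure; the helper names `build_tables`, `rank`, and `select` are chosen here for the example and do not come from the slides.

```python
def build_tables(bits):
    """Precompute every rank and select answer for a 0/1 string."""
    rank_table = []        # rank_table[x] = number of ones in bits[0..x]
    select_table = [None]  # select_table[i] = position of the i-th one (i >= 1)
    ones = 0
    for pos, b in enumerate(bits):
        if b == "1":
            ones += 1
            select_table.append(pos)
        rank_table.append(ones)
    return rank_table, select_table


B = "1001010001000000"  # the example bit vector from the slide, n = 16
rank_table, select_table = build_tables(B)


def rank(x):
    """Number of ones in B[0..x], answered by one array lookup."""
    return rank_table[x]


def select(i):
    """Position of the i-th one (i >= 1), answered by one array lookup."""
    return select_table[i]


print(rank(5))    # 3: B[0..5] = 100101 contains three ones
print(select(4))  # 9: the fourth one sits at position 9
```

Both queries are single lookups, which is the O(1) time claimed on the slide; the cost is the two tables of word-sized entries.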
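6. Ordered Trees (copied)
[Figure: example ordered tree with 8 nodes numbered in preorder; root 1 has children 2 and 6, node 2 has children 3, 4, 5, and node 6 has children 7, 8]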
• ordered tree / ordinal tree
• Rooted trees
• Children of each node are ordered
• No labels
– With edge labels → cardinal tree
Succinct Representations of Ordered Trees
7. BP Representation (copied)
[Figure: the 8-node example tree and its BP sequence P = ((()()())(()())), with node numbers 1 to 8 marked on P]
• Each node is represented by a pair of matching
open and close parentheses
• 2n bits for n nodes
• The size matches the information-theoretic lower bound of 2n − Θ(log n) bits up to lower-order terms
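A minimal sketch of the BP encoding, assuming the tree is given as a dictionary from a node to its ordered list of children (a representation chosen here for illustration). The tree used is the 8-node example from the figure: root 1 with children 2 and 6, node 2 with children 3, 4, 5, and node 6 with children 7, 8.

```python
def bp_encode(children, root):
    """Write '(' on entering a node and ')' on leaving it, in depth-first order."""
    out = []

    def visit(v):
        out.append("(")
        for c in children.get(v, []):
            visit(c)
        out.append(")")

    visit(root)
    return "".join(out)


# Hypothetical adjacency representation of the slide's example tree.
tree = {1: [2, 6], 2: [3, 4, 5], 6: [7, 8]}
P = bp_encode(tree, 1)
print(P)       # ((()()())(()()))
print(len(P))  # 16, i.e. 2n for n = 8 nodes
```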
8. Basic Operations on BP (copied)
• A node is represented by the position of (
• findclose(P, i): returns the position of the ) matching the ( at P[i]
• enclose(P, i): returns the position of the ( that most closely encloses the ( at P[i]
[Figure: an 11-node example tree and its BP sequence P = (()((()())())(()())()), with node numbers 1 to 11 marked on P, illustrating findclose and enclose]
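The two operations can be illustrated with plain linear scans. This sketch assumes 0-based positions and takes time proportional to the distance scanned; the constant-time versions rely on auxiliary indexes that are not shown here.

```python
def findclose(P, i):
    """Position of the ')' that matches the '(' at P[i]."""
    depth = 0
    for j in range(i, len(P)):
        depth += 1 if P[j] == "(" else -1
        if depth == 0:
            return j
    raise ValueError("unbalanced sequence")


def enclose(P, i):
    """Position of the '(' of the parent of the node at P[i], or None for the root."""
    depth = 0
    for j in range(i - 1, -1, -1):
        # Scanning leftward: ')' opens a finished sibling subtree, '(' closes it.
        depth += 1 if P[j] == "(" else -1
        if depth == 1:
            return j
    return None


P = "(()((()())())(()())())"  # BP of the 11-node example tree
print(findclose(P, 3))  # 12: node 3 opens at position 3 and closes at 12
print(enclose(P, 7))    # 4: node 6 (position 7) is enclosed by node 4 (position 4)
```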
9. DFUDS Representation (copied)
• It encodes the degrees of nodes in unary codes in
depth-first order
(DFUDS = Depth First Unary Degree Sequence)
• Degree d ⇒ d open parentheses ( followed by one close parenthesis )
• Add a dummy ( at the beginning
• 2n bits
[Figure: the 8-node example tree and its DFUDS sequence U = ((()((())))(())), with node numbers 1 to 8 marked on U]
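A minimal sketch of the DFUDS encoding described above, assuming the same dictionary-of-children representation of the 8-node example tree (root 1 with children 2 and 6, node 2 with children 3, 4, 5, node 6 with children 7, 8). The function name `dfuds_encode` is chosen here for illustration.

```python
def dfuds_encode(children, root):
    """Dummy '(' first, then each node's degree in unary, in depth-first order."""
    out = ["("]  # the dummy open parenthesis
    stack = [root]
    while stack:
        v = stack.pop()
        kids = children.get(v, [])
        out.append("(" * len(kids) + ")")  # degree d => d '(' then one ')'
        stack.extend(reversed(kids))       # leftmost child is visited next
    return "".join(out)


tree = {1: [2, 6], 2: [3, 4, 5], 6: [7, 8]}
U = dfuds_encode(tree, 1)
print(U)       # ((()((())))(()))
print(len(U))  # 16, i.e. 2n for n = 8 nodes
```

Each node contributes one ) and each non-root node contributes one ( in its parent's code, so together with the dummy ( the sequence has exactly 2n symbols.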
11. Tree Degree Entropy
• For an ordered tree T with n nodes, of which $n_i$ nodes have i children:
$H^*(T) = \sum_i \frac{n_i}{n} \log \frac{n}{n_i}$
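As a worked example of the definition, the sketch below evaluates the tree degree entropy for the 8-node example tree, whose node degrees in depth-first order are 2, 3, 0, 0, 0, 2, 0, 0. It assumes logarithms to base 2, so that the result is measured in bits; the function name is chosen here for illustration.

```python
from collections import Counter
from math import log2


def tree_degree_entropy(degrees):
    """H*(T) = sum over i of (n_i / n) * log2(n / n_i), given one degree per node."""
    n = len(degrees)
    counts = Counter(degrees)  # counts[i] = n_i, the number of nodes with i children
    return sum((c / n) * log2(n / c) for c in counts.values())


degrees = [2, 3, 0, 0, 0, 2, 0, 0]  # n_0 = 5, n_2 = 2, n_3 = 1
H = tree_degree_entropy(degrees)
print(round(H, 4))                 # about 1.2988 bits per node
print(round(len(degrees) * H, 2))  # about 10.39 bits, versus 2n = 16 for BP or DFUDS
```

The entropy is small when most nodes share the same degree, which is the case the compressed representation exploits.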
12. Main Results: Theorem 1
• For any rooted ordered tree T with n nodes,
there exists a data structure using
$nH^*(T) + O(n \log\log n / \log n)$
bits such that any consecutive log n bits of the
DFUDS of T can be computed in constant time
on the word RAM.
13. Main Results: Theorem 2
• The lowest common ancestor of any two given
nodes, and the depth and the level-ancestor of
a given node, can be computed in constant time
on the DFUDS using auxiliary data structures of
$O(n (\log\log n)^2 / \log n)$ bits
14. LCA on DFUDS (copied)
• Can be computed by almost the same
operation as for BP
• lca(x, y) = parent(RMQ_E(x, y−1) + 1)
• Leftmost minimum is used
[Figure: the 8-node example tree with both encodings and their excess arrays]
– DFUDS: U = ((()((())))(())), E = 1232345432123210
– BP: P = ((()()())(()())), E = 1232323212323210
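The formula can be tried out on the example sequence U = ((()((())))(())) with linear scans in place of the constant-time indexes. Several things in this sketch are assumptions rather than statements from the slides: positions are 0-based with the dummy ( at position 0, a node is identified by the start of its unary degree code, `parent` follows the usual DFUDS rule (match the ) just before the node, then move to the start of the code containing the match), and the explicit ancestor check is added here so the sketch covers that case too.

```python
def excess(U):
    """E[i] = (number of '(' in U[0..i]) - (number of ')' in U[0..i])."""
    E, e = [], 0
    for ch in U:
        e += 1 if ch == "(" else -1
        E.append(e)
    return E


def findopen(U, i):
    """Position of the '(' matching the ')' at U[i] (linear scan)."""
    depth = 0
    for j in range(i, -1, -1):
        depth += 1 if U[j] == ")" else -1
        if depth == 0:
            return j
    raise ValueError("unbalanced sequence")


def node_start(U, p):
    """Start of the unary degree code that contains position p (p >= 1)."""
    j = p - 1
    while j > 0 and U[j] != ")":
        j -= 1
    return j + 1


def parent(U, x):
    """Parent of the node whose code starts at x; None for the root at position 1."""
    if x == 1:
        return None
    return node_start(U, findopen(U, x - 1))


def lca(U, x, y):
    """lca(x, y) = parent(RMQ_E(x, y-1) + 1), using the leftmost minimum."""
    if x > y:
        x, y = y, x
    if x == y:
        return x
    E = excess(U)
    # Assumption added for this sketch: handle "x is an ancestor of y" directly.
    end = x
    while E[end] != E[x - 1] - 1:  # the subtree of x ends where the excess first drops
        end += 1
    if y <= end:
        return x
    window = E[x:y]                      # E[x..y-1]
    m = x + window.index(min(window))    # list.index returns the leftmost minimum
    return parent(U, m + 1)


U = "((()((())))(()))"
# Node starts in U: node 1 -> 1, 2 -> 4, 3 -> 8, 4 -> 9, 5 -> 10, 6 -> 11, 7 -> 14, 8 -> 15
print(lca(U, 8, 10))  # 4: nodes 3 and 5 meet at node 2
print(lca(U, 9, 15))  # 1: nodes 4 and 8 meet at the root
```

In the second query the minimum excess value 1 occurs at positions 10 and 14. Taking the leftmost one gives parent(11), the root; the rightmost would give parent(15), which is node 6 and not an ancestor of node 4.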
15. LCA on DFUDS
• E[i] = (number of ( in U[0..i]) − (number of ) in
U[0..i])
$E[r_i] = E[r_{i-1}] - 1 = E[r_0] - i \quad (1 \le i \le k)$
$E[j] > E[r_i] \quad (l_i \le j < r_i)$
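16.
Let $E[i] = \mathrm{rank}_{(}(U, i) - \mathrm{rank}_{)}(U, i)$
Let $T_1, T_2, \ldots, T_k$ denote the subtrees of v,
let the DFUDS of v be $U[l_0..r_0]$ with $E[r_0] = d$,
and let the DFUDS of $T_i$ be $U[l_i..r_i]$.
Lemma: $E[r_i] = E[r_{i-1}] - 1 = d - i \quad (1 \le i \le k)$
$E[j] > E[r_i] \quad (l_i \le j < r_i)$
[Figure: 9-node example tree with U = (((()(())))((()))) and E = 123434543212343210; node v, the value d, and positions r0, r1, r2, r3 are marked]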
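The lemma can be checked numerically on the example sequence U = (((()(())))((()))). This sketch assumes 0-based positions with the dummy ( at position 0 and takes v to be the root, whose code starts at position 1; that choice of v is an assumption about the figure. The subtree boundaries are found by counting nodes, independently of E, so the assertions are a genuine check.

```python
def excess(U):
    """E[i] = (number of '(' in U[0..i]) - (number of ')' in U[0..i])."""
    E, e = [], 0
    for ch in U:
        e += 1 if ch == "(" else -1
        E.append(e)
    return E


def subtree_end(U, l):
    """Last position of the DFUDS of the subtree whose root's code starts at l."""
    pending = 1  # number of nodes whose codes still have to be read
    p = l
    while pending:
        deg = 0
        while U[p] == "(":
            deg += 1
            p += 1
        p += 1               # skip the ')' that ends this node's code
        pending += deg - 1   # one node consumed, deg children announced
    return p - 1


U = "(((()(())))((())))"
E = excess(U)

l0 = 1                  # v = root (assumption)
r0 = U.index(")", l0)   # end of v's own degree code
d, k = E[r0], r0 - l0   # d = E[r0]; k = degree of v

r = [r0]
for i in range(1, k + 1):
    li = r[-1] + 1
    ri = subtree_end(U, li)
    assert E[ri] == E[r[-1]] - 1 == d - i             # first part of the lemma
    assert all(E[j] > E[ri] for j in range(li, ri))   # second part of the lemma
    r.append(ri)

print(r)  # [4, 9, 10, 17]
print(d)  # 3
```

The strict inequality inside each subtree, together with the decreasing values at the r_i, is what makes the leftmost minimum of E land on the boundary just before the subtree containing the later node.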