2. Introduction
• “Textbook” data structures are sufficient for
many tasks, but not all
• Rarely need to create new data structures
• Often sufficient to “augment” an existing
data structure with additional information
and operations
• Not always straightforward - must be able to
maintain added information with existing
operations
3. §15.1 Dynamic Order Statistics
• Recall from Chapter 10 that we can retrieve
the ith order statistic from an unordered set
of n elements in O(n) time
• Red-black trees can be augmented to allow
for fast retrieval of order statistics
• We shall also allow for quick determination
of the rank of an element
4. Order Statistic Trees
• Standard red-black tree
with an additional
size field (the bottom
number in each node)
• size contains # of
nodes in subtree rooted
at x, including x
• If nil->size = 0, then:
x->size = x->left->size +
x->right->size + 1
[Figure: an order-statistic tree of 20 nodes — a red-black tree with the key
on top and the size field beneath each node; the root 26 has size 20, and
its children 17 and 41 have sizes 12 and 7.]
5. Retrieving Elements of a Given Rank
ostree::Select(node *x, int i)
{
int r = x->left->size + 1;
if ( i == r )
return x;
else if ( i < r )
return Select(x->left, i);
else
return Select(x->right, i-r);
}
• x->left->size contains the
number of nodes that come
before x in an inorder tree walk
– x’s rank within its subtree is
therefore x->left->size + 1
• Recursive selection similar to the
algorithms we saw in Chapter 10
– If x is the correct order statistic, return x
– If x’s rank r > i, recurse left
– If x’s rank r < i, recurse right, looking for
the (i-r)th order statistic in the right
subtree
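The selection procedure above can be sketched as compilable C++. Everything here is illustrative: the `node` layout, the `size_of` helper (which stands in for the nil sentinel with size 0), and the tiny hand-built tree are assumptions for the sketch, not the deck’s actual `ostree` class.

```cpp
#include <cassert>
#include <cstddef>

// Minimal illustrative node; field names follow the slides, but this
// is a sketch, not a full red-black tree implementation.
struct node {
    int key;
    int size;              // # of nodes in the subtree rooted here
    node *left, *right;
};

// Treat a null child as the nil sentinel with size 0.
static int size_of(const node *x) { return x ? x->size : 0; }

// Return the node holding the ith smallest key (1-based), as on the slide.
node *os_select(node *x, int i) {
    int r = size_of(x->left) + 1;      // x's rank within its own subtree
    if (i == r) return x;
    if (i < r)  return os_select(x->left, i);
    return os_select(x->right, i - r); // (i-r)th statistic on the right
}

// Tiny hand-built demo tree: 20 at the root, 10 and 30 as children.
static node n10{10, 1, nullptr, nullptr};
static node n30{30, 1, nullptr, nullptr};
static node n20{20, 3, &n10, &n30};
static node *demo_root = &n20;
```

Each call either returns or descends one level, matching the O(h) analysis on the next slide.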
7. Analysis of ostree::Select
• Each level of recursion descends one
level of the OS tree
– Therefore, ostree::Select is at worst O(h),
where h is the height of the tree
– Since the height of the tree is known to be
O(lg n), ostree::Select has running time
O(lg n)
8. Determining the Rank of an Element
ostree::Rank(node *x)
{
int r = x->left->size + 1;
node *y = x;
while ( y != root )
{
if ( y == y->parent->right)
r += y->parent->left->size + 1;
y = y->parent;
}
return r;
}
• The rank of a node x =
the # of nodes that precede
it in an inorder walk, + 1 for itself
• r is maintained as the
rank of x in the subtree
rooted at y - which
denotes our position in
the tree
– To start, r is the rank of x
in its subtree
9. Determining the Rank of an Element
ostree::Rank(node *x)
{
int r = x->left->size + 1;
node *y = x;
while ( y != root )
{
if ( y == y->parent->right)
r += y->parent->left->size + 1;
y = y->parent;
}
return r;
}
• Each loop ascends the
tree, and calculates x’s
rank in that subtree
– If y is a left child, the
rank is unchanged
– If y is a right child, r
increases by the size of the
parent’s left subtree, plus 1
for the parent node
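The upward walk can be sketched in the same style. The node layout, the `size_of` helper, and the three-node demo tree are assumptions for illustration; only the loop body mirrors the slide’s `ostree::Rank`.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative node with a parent pointer, so a node can walk upward.
struct node {
    int key;
    int size;                        // subtree size, including the node itself
    node *left, *right, *parent;
};

static int size_of(const node *x) { return x ? x->size : 0; }

// Rank of x in the tree rooted at `root`, following the slide's loop:
// whenever y is a right child, everything under the parent's left child,
// plus the parent itself, precedes y in an inorder walk.
int os_rank(const node *root, const node *x) {
    int r = size_of(x->left) + 1;    // rank of x within its own subtree
    for (const node *y = x; y != root; y = y->parent)
        if (y == y->parent->right)
            r += size_of(y->parent->left) + 1;
    return r;
}

// Build a three-node demo tree (static, so the pointers stay valid).
node *demo_root() {
    static node n10{10, 1, nullptr, nullptr, nullptr};
    static node n30{30, 1, nullptr, nullptr, nullptr};
    static node n20{20, 3, &n10, &n30, nullptr};
    n10.parent = &n20;
    n30.parent = &n20;
    return &n20;
}
```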
11. Analysis of ostree::Rank
• Each loop ascends one level of the OS
tree
– Therefore, ostree::Rank is at worst O(h),
where h is the height of the tree
– Since the height of the tree is known to be
O(lg n), ostree::Rank has running time
O(lg n)
12. Maintaining Subtree Sizes
• ostree::Select & ostree::Rank are only
useful if we can efficiently maintain the
size field
• To be truly efficient, these fields must
be maintained through the basic
maintenance operations of the tree
13. Maintaining Subtree Sizes
• Insertion
– Recall the two operations: bst::Insert, and performing
rotations
• bst::Insert
– As we descend the tree to perform insertion, increment size
field of all traversed nodes
• Rotation
– Only the size fields of the rotated nodes are affected
– The new parent node simply assumes the size of the old
parent node
– The rotated node must then recalculate its size as
the sum of its children’s sizes, plus one
14. Maintaining Subtree Sizes
[Figure: the same subtree before and after RightRotate(y) / LeftRotate(x),
between x (key 42) and y (key 93) over subtrees of sizes 6, 4, and 7; only
the two rotated nodes’ size fields change — the new root keeps the old
root’s size (19), and the demoted node recomputes its own (11 vs. 12).]
Size Maintenance Through Rotation:
y->size = x->size;
x->size = x->left->size + x->right->size + 1;
What is total added cost to rotation?
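The two-line size fix can be dropped straight into an ordinary left rotation, sketched below. The minimal `node` type (no colors, no parent pointers) and the `size_of` helper are assumptions for the sketch; the rotation itself follows the usual binary-search-tree pattern.

```cpp
#include <cassert>
#include <cstddef>

struct node {
    int key, size;
    node *left, *right;
};

static int size_of(const node *x) { return x ? x->size : 0; }

// LeftRotate(x) with the O(1) size maintenance from the slide:
// the new parent y takes over x's old size, and x recomputes its
// size from its (possibly new) children. Returns the new subtree root.
node *left_rotate(node *x) {
    node *y = x->right;
    x->right = y->left;       // y's left subtree becomes x's right subtree
    y->left = x;
    y->size = x->size;        // new parent assumes the old parent's size
    x->size = size_of(x->left) + size_of(x->right) + 1;
    return y;
}
```

Note that the size maintenance adds only the two assignments at the end, i.e., O(1) per rotation.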
15. Maintaining Subtree Sizes
• Deletion
– Also has two phases: one to delete the node, the
other to maintain the tree with at most three
rotations
– We already know the added cost of rotation
– When we splice out a node, we can traverse up the
tree, decrementing the size field of every node
along the path
• This requires O(lg n) additional time
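The upward decrement is a one-line loop. The node layout with a parent pointer and the helper name are assumptions for the sketch; pass it the spliced-out node’s old parent.

```cpp
#include <cassert>
#include <cstddef>

struct node {
    int key, size;
    node *left, *right, *parent;
};

// After splicing out a node, walk from its old parent up to the root,
// taking one off every size field on the path: O(h) = O(lg n) extra work.
void decrement_path(node *y) {
    for (; y != nullptr; y = y->parent)
        y->size -= 1;
}
```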
16. Maintaining Subtree Sizes
• Analysis:
– Insertion is changed by at most O(1)
– Deletion is changed by at most O(lg n)
– Thus, the total asymptotic running time of
insertion and deletion is unchanged at
O(lg n)
17. §15.2 How To Augment A Data Structure
• Four steps:
– Choosing an underlying data structure;
– Determining additional information to be
maintained in the underlying data structure;
– Verifying that the additional information can be
maintained by the basic modifying operations on
the underlying data structure; and
– Developing new operations
• Note: this isn’t a “formula”, but a good
starting point
18. Augmenting Red-Black Trees for
Order Statistics
• Step 1:
– We chose red-black trees as the underlying data structure
due to efficient support of other dynamic-set operations
• Step 2:
– Augmented nodes with the size field, to allow the desired
operations to be more efficient
• Step 3:
– We ensured that insert and delete can maintain the new field
and still operate in O(lg n) time
• Step 4:
– We developed ostree::Select and ostree::Rank
19. Why Augment Red-Black Trees?
• From Theorem 15.1:
– If the new field can be computed and
maintained using only the information in
nodes x, x->left, and x->right, then we can
maintain the values of the new field in all
nodes during insertion and deletion
without asymptotically affecting the O(lg n)
performance of these operations
20. §15.3 Interval Trees
• An interval is a pair of real numbers used to specify a
range of values
– A closed interval [t1, t2] specifies a range that includes the
endpoints
– An open interval (t1, t2) specifies a range that excludes the
endpoints
– A half-open interval [t1, t2) or (t1, t2] excludes one of the
endpoints
• E.g., consider a log that stores events sorted by time
– We may want to query the log to find out what happened during a
given time interval
21. Intervals
• Assume intervals are represented as
structs with two fields: lo and hi
• Consider two intervals x and y
– Any two intervals must satisfy the interval
trichotomy:
•x and y overlap (x.lo <= y.hi && y.lo <= x.hi)
•x lies completely to the left of y (x.hi < y.lo)
•x lies completely to the right of y (y.hi < x.lo)
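The trichotomy translates directly into three small predicates. The `interval` struct with `lo`/`hi` matches the slide; the function names are assumptions for the sketch.

```cpp
#include <cassert>

// Interval as a struct with lo and hi fields, as on the slide.
struct interval { double lo, hi; };

// Closed intervals overlap exactly when each one starts no later than
// the other ends; otherwise one lies strictly to one side of the other.
bool overlap(const interval &x, const interval &y) {
    return x.lo <= y.hi && y.lo <= x.hi;
}
bool left_of(const interval &x, const interval &y)  { return x.hi < y.lo; }
bool right_of(const interval &x, const interval &y) { return y.hi < x.lo; }
```

For any pair of closed intervals, exactly one of the three predicates holds.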
22. Interval Trees
• Interval Tree is a red-black tree that maintains
a dynamic set of nodes
– Each node contains an interval
• Support these operations:
– Insertion - adds an element to the tree
– Deletion - removes an element from the tree
– Search - searches for an interval that overlaps the
requested interval
24. Interval Trees
• The interval tree stores intervals, and is
sorted by the low endpoint
• Each node contains an additional field, max,
which is the maximum value of any interval
endpoint stored in its subtree
– Maintained through insertion/deletion with this
O(1) statement:
x->max = max(x->interval->hi, x->left->max, x->right->max)
– What about through rotations? The same
operation applies, to the new parent node
25. Interval Trees: New Operations
• The only new operation is the Search operation:
intervalTree::Search(interval i)
{
    node *x = root;
    while ( x != NULL && !Overlap(x->interval, i) )
    {
        if ( x->left != NULL &&
             x->left->max >= i->lo )
            x = x->left;
        else
            x = x->right;
    }
    return x;   // NULL if no overlapping interval exists
}
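The search loop above can be made runnable with a small amount of scaffolding. The `inode` layout with an explicit `max` field, the `it_search` name, and the hand-built three-node tree are assumptions for the sketch; the loop body mirrors the slide’s Search.

```cpp
#include <cassert>
#include <cstddef>

struct interval { double lo, hi; };

bool overlap(const interval &x, const interval &y) {
    return x.lo <= y.hi && y.lo <= x.hi;
}

// Interval-tree node: the tree is sorted by iv.lo, and max holds the
// largest endpoint stored anywhere in this node's subtree.
struct inode {
    interval iv;
    double max;
    inode *left, *right;
};

// Iterative search from the slide: at each step, go left only when the
// left subtree could still contain an overlap (its max reaches i.lo).
inode *it_search(inode *root, interval i) {
    inode *x = root;
    while (x != nullptr && !overlap(x->iv, i)) {
        if (x->left != nullptr && x->left->max >= i.lo)
            x = x->left;
        else
            x = x->right;
    }
    return x;               // nullptr if no interval overlaps i
}

// Hand-built demo tree sorted by lo: [0,3] <- [5,8] -> [6,10]
static inode a{{0, 3}, 3, nullptr, nullptr};
static inode c{{6, 10}, 10, nullptr, nullptr};
static inode b{{5, 8}, 10, &a, &c};
static inode *demo_iroot = &b;
```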
27. Interval Tree Search
• Interval tree search algorithm finds the
first overlapping interval
• How could we find all overlapping
intervals?
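One possible answer (not from the slides): an inorder walk that prunes subtrees which provably contain no overlap, collecting every hit. The `inode` layout, the `find_all` name, and the demo tree are assumptions for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct interval { double lo, hi; };

bool overlap(const interval &x, const interval &y) {
    return x.lo <= y.hi && y.lo <= x.hi;
}

struct inode {
    interval iv;
    double max;             // largest endpoint in this subtree
    inode *left, *right;
};

// Pruned inorder walk: skip a subtree when its max endpoint is below
// i.lo, and skip the right subtree when every lo there (all >= x's lo)
// already lies beyond i.hi. Appends overlapping nodes in lo-sorted order.
void find_all(inode *x, interval i, std::vector<inode *> &out) {
    if (x == nullptr || x->max < i.lo)
        return;                        // nothing in this subtree reaches i
    find_all(x->left, i, out);
    if (overlap(x->iv, i))
        out.push_back(x);
    if (i.hi >= x->iv.lo)              // right-subtree lo's are all >= x's lo
        find_all(x->right, i, out);
}

// Demo tree sorted by lo: [0,3] <- [5,8] -> [6,10]
static inode fa{{0, 3}, 3, nullptr, nullptr};
static inode fc{{6, 10}, 10, nullptr, nullptr};
static inode fb{{5, 8}, 10, &fa, &fc};
```

An alternative is to repeat the single search, deleting each reported interval and reinserting them afterward, at O(k lg n) for k overlaps.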
28. Why Interval Tree Search Works
• Recall this part of the interval tree search algorithm:
if ( x->left != NULL && x->left->max >= i->lo )
x = x->left;
else
x = x->right;
• If the else is executed, then either the left branch is NULL, or the lo
end of the interval we’re searching for is to the right of the highest
hi endpoint in the left subtree - so if an overlapping interval exists, it
must be in the right subtree
• If the first branch is executed, then the max value in the left
subtree is at least the lo value of the interval we’re searching
for - so there may be an overlapping interval in the left subtree
29. Why Interval Tree Search Works
• Why aren’t there any in the right subtree?
– Assume the search went left and found no overlapping interval
– The tree is sorted by the lo endpoint, so all nodes in the right
subtree have lo endpoints >= all lo endpoints in the left subtree
– Since x->left->max >= i->lo, some node j in the left subtree has
hi endpoint = x->left->max, and j’s lo endpoint <= every lo
endpoint in the right subtree
– Since i doesn’t overlap j even though i->lo <= j->hi, it follows
that i->hi < j->lo - so i lies entirely to the left of every interval
in the right subtree, and no overlap can exist there either