1. Deletion <ul><li>Delete a node x as in an ordinary binary search tree. Note that the last node deleted is a leaf. </li></ul><ul><li>Then trace the path from the new leaf towards the root. </li></ul><ul><li>For each node x encountered, check whether the heights of left(x) and right(x) differ by at most 1. If yes, proceed to parent(x). If not, perform an appropriate rotation at x. There are 4 cases, as in the case of insertion. </li></ul><ul><li>For deletion, after we perform a rotation at x, we may still have to perform a rotation at some ancestor of x. Thus, we must continue to trace the path until we reach the root. </li></ul>
2. Deletion <ul><li>On closer examination: the single rotations for deletion can be divided into 4 cases (instead of 2 cases) </li></ul><ul><ul><li>Two cases for rotate with left child </li></ul></ul><ul><ul><li>Two cases for rotate with right child </li></ul></ul>
3. Single rotations in deletion: rotate with left child. In both figures, a node is deleted in subtree C, causing the height to drop to h. The height of y is h+2. When the height of subtree A is h+1, the height of B can be h or h+1. Fortunately, the same single rotation can correct both cases.
4. Single rotations in deletion: rotate with right child. In both figures, a node is deleted in subtree A, causing the height to drop to h. The height of y is h+2. When the height of subtree C is h+1, the height of B can be h or h+1. A single rotation can correct both cases.
5. Rotations in deletion <ul><li>There are 4 cases for single rotations, but we do not need to distinguish among them. </li></ul><ul><li>There are exactly two cases for double rotations (as in the case of insertion). </li></ul><ul><li>Therefore, we can reuse exactly the same procedure as for insertion to determine which rotation to perform. </li></ul>
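The rule above can be sketched as a single rebalancing routine that is called at every node on the path back to the root. This is a minimal sketch, not the textbook's code: the names `Node`, `height`, and `rebalance` are assumptions, an empty subtree is given height -1, and the rotation is chosen by comparing subtree heights. The `>=` comparison is what lets one single rotation cover both the "B has height h" and "B has height h+1" cases.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.height = 1 + max(height(left), height(right))


def height(t):
    # Assumption: an empty subtree has height -1, so a single leaf has height 0.
    return t.height if t is not None else -1


def _update(t):
    t.height = 1 + max(height(t.left), height(t.right))


def rotate_with_left_child(k2):
    k1 = k2.left
    k2.left, k1.right = k1.right, k2
    _update(k2)
    _update(k1)
    return k1


def rotate_with_right_child(k1):
    k2 = k1.right
    k1.right, k2.left = k2.left, k1
    _update(k1)
    _update(k2)
    return k2


def rebalance(t):
    """Restore the AVL property at t and return the new subtree root."""
    if t is None:
        return None
    _update(t)
    if height(t.left) - height(t.right) > 1:
        # ">=" also covers the deletion-only case of equal heights.
        if height(t.left.left) >= height(t.left.right):
            t = rotate_with_left_child(t)              # single rotation
        else:
            t.left = rotate_with_right_child(t.left)   # double rotation
            t = rotate_with_left_child(t)
    elif height(t.right) - height(t.left) > 1:
        if height(t.right.right) >= height(t.right.left):
            t = rotate_with_right_child(t)             # single rotation
        else:
            t.right = rotate_with_left_child(t.right)  # double rotation
            t = rotate_with_right_child(t)
    return t
```

A recursive delete would call `rebalance` on each node as the recursion unwinds, which is how the path is traced all the way to the root.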
6. B+-Trees
7. Dictionary for Secondary storage <ul><li>The AVL tree is an excellent dictionary structure when the entire structure can fit into the main memory. </li></ul><ul><ul><li>Following or updating a pointer only requires a memory cycle. </li></ul></ul><ul><li>When the size of the data becomes so large that it cannot fit into the main memory, the performance of the AVL tree may deteriorate rapidly. </li></ul><ul><ul><li>Following a pointer or updating a pointer requires accessing the disk once. </li></ul></ul><ul><ul><li>Traversing from the root to a leaf may need to access the disk log_2 n times. </li></ul></ul><ul><ul><ul><li>When n = 1048576 = 2^20, we need 20 disk accesses. For a disk spinning at 7200 rpm (about 8.3 ms per revolution, taking one revolution per access), this will take roughly 0.166 seconds. 10 searches will take more than 1 second! This is way too slow. </li></ul></ul></ul>
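The 0.166-second figure can be reproduced with a few lines of arithmetic. The sketch below assumes, as the slide appears to, that each disk access costs about one full revolution of the platter. The variable names are illustrative only.

```python
import math

RPM = 7200
n = 2 ** 20                                # 1048576 keys

seconds_per_access = 60 / RPM              # one revolution: about 8.33 ms
accesses = math.ceil(math.log2(n))         # root-to-leaf path in a balanced binary tree
search_time = accesses * seconds_per_access

print(accesses, round(search_time, 3), round(10 * search_time, 2))
```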
8. B+ Tree <ul><li>Since the processor is much faster than the disk, it is worth performing more CPU instructions in order to minimize the number of disk accesses. </li></ul><ul><li>Idea: allow a node in a tree to have many children. </li></ul><ul><li>If each internal node in the tree has M children, the height of the tree would be log_M n instead of log_2 n. </li></ul><ul><ul><li>For example, if M = 20, then log_20 2^20 < 5. </li></ul></ul><ul><li>Thus, we can speed up the search significantly. </li></ul>
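To see the effect of the branching factor, the following sketch counts how many levels of M-way branching are needed before a tree can distinguish n items. `levels_needed` is a hypothetical helper, and it uses integer arithmetic to avoid floating-point rounding in the logarithm.

```python
def levels_needed(n, m):
    """Smallest h such that m**h >= n, i.e. the ceiling of log_m(n)."""
    h, capacity = 0, 1
    while capacity < n:
        capacity *= m
        h += 1
    return h


n = 2 ** 20
print(levels_needed(n, 2))    # binary branching
print(levels_needed(n, 20))   # 20-way branching
```

With n = 2^20, binary branching needs 20 levels while 20-way branching needs 5, since 20^4 = 160000 is below n and 20^5 = 3200000 is above it.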
9. B+ Tree <ul><li>In practice, it is impossible to keep the same number of children per internal node. </li></ul><ul><li>A B+-tree of order M ≥ 3 is an M-ary tree with the following properties: </li></ul><ul><ul><li>Each internal node has at most M children </li></ul></ul><ul><ul><li>Each internal node, except the root, has between ⌈M/2⌉ - 1 and M-1 keys </li></ul></ul><ul><ul><ul><li>this guarantees that the tree does not degenerate into a binary tree </li></ul></ul></ul><ul><ul><li>The keys at each node are ordered </li></ul></ul><ul><ul><li>The root is either a leaf or has between 1 and M-1 keys </li></ul></ul><ul><ul><li>The data items are stored at the leaves. All leaves are at the same depth. Each leaf has between ⌈L/2⌉ - 1 and L-1 data items, for some L (usually L << M, but we will assume M=L in most examples) </li></ul></ul>
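The occupancy rules can be written down as a small checker. This sketch reads the lower bound as the ceiling of M/2 minus 1, which is the usual B+-tree definition. The function names are hypothetical.

```python
def key_bounds(order):
    """(min, max) number of keys for a non-root node of the given order."""
    return ((order + 1) // 2 - 1, order - 1)   # (order + 1) // 2 == ceil(order / 2)


def is_valid_node(keys, order, is_root=False):
    """Check ordering and occupancy of one node's key list."""
    lo, hi = key_bounds(order)
    if is_root:
        lo = 1                                 # an internal root only needs one key
    in_order = all(a < b for a, b in zip(keys, keys[1:]))
    return in_order and lo <= len(keys) <= hi


print(key_bounds(5), key_bounds(4))
```

For M = 5 a non-root node holds 2 to 4 keys, and for M = 4 it holds 1 to 3 keys.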
10. Example <ul><li>Here, M=L=5 </li></ul><ul><li>Records are stored at the leaves, but we only show the keys here </li></ul><ul><li>At the internal nodes, only keys (and pointers to children) are stored (also called separating keys ) </li></ul>
11. A B+ tree with M=L=4 <ul><li>We can still talk about left and right child pointers </li></ul><ul><li>E.g. the left child pointer of N is the same as the right child pointer of J </li></ul><ul><li>We can also talk about the left subtree and right subtree of a key in internal nodes </li></ul>
12. B+ Tree <ul><li>Which keys are stored at the internal nodes? </li></ul><ul><li>There are several ways to do it. Different books adopt different conventions. </li></ul><ul><li>We will adopt the following convention: </li></ul><ul><ul><li>key i in an internal node is the smallest key in its (i+1)th subtree (i.e. the right subtree of key i) </li></ul></ul><ul><li>Even following this convention, there is no unique B+-tree for the same set of records. </li></ul>
13. B+ tree <ul><li>Each internal node/leaf is designed to fit into one I/O block of data. An I/O block can usually hold quite a lot of data. Hence, an internal node can keep a lot of keys, i.e., M is large. This implies that the tree has only a few levels, so only a few disk accesses are needed to accomplish a search, insertion, or deletion. </li></ul><ul><li>The B+-tree is a popular structure used in commercial databases. To further speed up the search, the first one or two levels of the B+-tree are usually kept in main memory. </li></ul><ul><li>The disadvantage of the B+-tree is that most nodes will have fewer than M-1 keys most of the time. This could lead to severe space wastage. Thus, it is not a good dictionary structure for data in main memory. </li></ul><ul><li>The textbook calls the tree B-tree instead of B+-tree. In some other textbooks, B-tree refers to the variant where the actual records are kept at internal nodes as well as the leaves. Such a scheme is not practical: keeping actual records at the internal nodes limits the number of keys stored there, and thus increases the number of tree levels. </li></ul>
14. Searching <ul><li>Suppose that we want to search for the key K. The path traversed is shown in bold. </li></ul>
15. Searching <ul><li>Let x be the input search key. </li></ul><ul><li>Start the search at the root </li></ul><ul><li>If we encounter an internal node v, search (linear search or binary search) for x among the keys stored at v </li></ul><ul><ul><li>If x < K_min at v, follow the left child pointer of K_min </li></ul></ul><ul><ul><li>If K_i ≤ x < K_(i+1) for two consecutive keys K_i and K_(i+1) at v, follow the left child pointer of K_(i+1) </li></ul></ul><ul><ul><li>If x ≥ K_max at v, follow the right child pointer of K_max </li></ul></ul><ul><li>If we encounter a leaf v, we search (linear search or binary search) for x among the keys stored at v. If found, we return the entire record; otherwise, report not found. </li></ul>
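The search procedure can be sketched as follows. The `Internal` and `Leaf` classes are assumptions made for illustration: an internal node with m keys holds m+1 child pointers, and a leaf holds a sorted key list with a parallel list of records. The three branching rules collapse into one binary search, since `bisect_right` returns the number of keys that are ≤ x, which is exactly the index of the child to follow.

```python
from bisect import bisect_left, bisect_right


class Internal:
    def __init__(self, keys, children):
        self.keys = keys              # sorted separating keys
        self.children = children      # len(children) == len(keys) + 1


class Leaf:
    def __init__(self, keys, records):
        self.keys = keys              # sorted keys
        self.records = records        # records[i] belongs to keys[i]


def search(node, x):
    """Return the record stored under key x, or None if x is absent."""
    while isinstance(node, Internal):
        # x < K_min gives index 0; x >= K_max gives the last child.
        node = node.children[bisect_right(node.keys, x)]
    i = bisect_left(node.keys, x)     # binary search inside the leaf
    if i < len(node.keys) and node.keys[i] == x:
        return node.records[i]
    return None


leaves = [
    Leaf([1, 4], ["r1", "r4"]),
    Leaf([8, 10], ["r8", "r10"]),
    Leaf([15, 20], ["r15", "r20"]),
]
root = Internal([8, 15], leaves)
print(search(root, 10), search(root, 9))
```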
16. Insertion <ul><li>Suppose that we want to insert a key K and its associated record. </li></ul><ul><li>Search for the key K using the search procedure </li></ul><ul><li>This will bring us to a leaf x. </li></ul><ul><li>Insert K into x </li></ul><ul><ul><li>Splitting of nodes (instead of the rotations used in AVL trees) is used to maintain the properties of B+-trees [next slide] </li></ul></ul>
17. Insertion into a leaf <ul><li>If leaf x contains < M-1 keys, then insert K into x (at the correct position in node x) </li></ul><ul><li>If x is already full (i.e. it contains M-1 keys), split x: </li></ul><ul><ul><li>Cut x off its parent </li></ul></ul><ul><ul><li>Insert K into x, pretending x has space for K. Now x has M keys. </li></ul></ul><ul><ul><li>After inserting K, split x into 2 new leaves x_L and x_R, with x_L containing the ⌊M/2⌋ smallest keys, and x_R containing the remaining ⌈M/2⌉ keys. Let J be the minimum key in x_R </li></ul></ul><ul><ul><li>Make a copy of J to be the parent of x_L and x_R, and insert the copy together with its child pointers into the old parent of x. </li></ul></ul>
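A leaf insertion with splitting might look like the sketch below. It works on a plain sorted list of keys (records omitted), assumes M = L, and reads the split sizes as floor(M/2) keys on the left and the rest on the right. `insert_into_leaf` is a hypothetical name. The caller would insert the returned separator J into the old parent.

```python
from bisect import insort


def insert_into_leaf(keys, k, M):
    """Insert k into a leaf's sorted key list.

    Returns (left_keys, right_keys, J). right_keys and J are None when
    no split is needed.
    """
    keys = list(keys)
    insort(keys, k)                  # pretend the leaf has room for k
    if len(keys) <= M - 1:
        return keys, None, None
    mid = M // 2                     # floor(M/2) smallest keys stay on the left
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]     # J is a copy of the minimum key in x_R


print(insert_into_leaf([1, 4, 8], 6, 5))       # room left, no split
print(insert_into_leaf([1, 4, 8, 10], 6, 5))   # full leaf, split
```

Note that J is copied, not moved: it stays in the right leaf because the records live only at the leaves.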
18. Inserting into a non-full leaf
19. Splitting a leaf: inserting T
20. Cont’d
21. <ul><li>Two disk accesses to write the two leaves, one disk access to update the parent </li></ul><ul><li>For L=32, two leaves with 16 and 17 items are created. We can perform 15 more insertions without another split </li></ul>
22. Another example:
23. Cont’d => Need to split the internal node
24. Splitting an internal node <ul><li>To insert a key K into a full internal node x: </li></ul><ul><li>Cut x off from its parent </li></ul><ul><li>Insert K and its left and right child pointers into x, pretending there is space. Now x has M keys. </li></ul><ul><li>Split x into 2 new internal nodes x_L and x_R, with x_L containing the (⌈M/2⌉ - 1) smallest keys, and x_R containing the ⌊M/2⌋ largest keys. Note that the ⌈M/2⌉th key J is not placed in x_L or x_R </li></ul><ul><li>Make J the parent of x_L and x_R, and insert J together with its child pointers into the old parent of x. </li></ul>
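The internal split differs from the leaf split in that the middle key J moves up instead of being copied. A minimal sketch, assuming the overfull node is given as a list of M keys plus M+1 child pointers, and reading the split point as the ceil(M/2)-th key (`split_internal` is a hypothetical name):

```python
def split_internal(keys, children, M):
    """Split an overfull internal node holding M keys and M+1 children.

    Returns ((left_keys, left_children), J, (right_keys, right_children)).
    """
    assert len(keys) == M and len(children) == M + 1
    j = (M + 1) // 2 - 1                       # 0-based index of the ceil(M/2)-th key
    left = (keys[:j], children[:j + 1])
    right = (keys[j + 1:], children[j + 1:])
    return left, keys[j], right                # J goes to the parent, not to x_L or x_R


print(split_internal([10, 20, 30, 40], ["a", "b", "c", "d", "e"], 4))
```

Every child pointer ends up in exactly one of the two halves, and both halves keep one more child than they have keys.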
25. Example: splitting internal node
26. Cont’d
27. Termination <ul><li>Splitting will continue as long as we encounter full internal nodes </li></ul><ul><li>If the split internal node x does not have a parent (i.e. x is a root ), then create a new root containing the key J and its two children </li></ul>
28. Deletion <ul><li>To delete a key target, we find it at a leaf x, and remove it. </li></ul><ul><li>Two situations to worry about: </li></ul><ul><ul><li>(1) target is a key in some internal node (needs to be replaced, according to our convention) </li></ul></ul><ul><ul><li>(2) After deleting target from leaf x, x contains fewer than ⌈M/2⌉ - 1 keys (needs to borrow from a sibling or merge nodes) </li></ul></ul>
29. Situation (1) <ul><li>By our convention, target can appear in at most one ancestor y of x as a key. Moreover, we must have visited node y and seen target in it when we searched down the tree. So after deleting from node x, we can access y directly and replace target by the new smallest key in x. </li></ul>
30. Situation (2): handling leaves with too few keys <ul><li>Suppose we delete the record with key target from a leaf. </li></ul><ul><li>Let u be the leaf that has ⌈M/2⌉ - 2 keys (too few) </li></ul><ul><li>Let v be a sibling of u </li></ul><ul><li>Let k be the key in the parent of u and v that separates the pointers to u and v. </li></ul><ul><li>There are two cases </li></ul>
31. Handling leaves with too few keys <ul><li>Case 1: v contains ⌈M/2⌉ keys or more and v is the right sibling of u </li></ul><ul><ul><li>Move the leftmost record from v to u </li></ul></ul><ul><ul><li>Set the key in the parent of u that separates u and v to be the new smallest key in v </li></ul></ul><ul><li>Case 2: v contains ⌈M/2⌉ keys or more and v is the left sibling of u </li></ul><ul><ul><li>Move the rightmost record from v to u </li></ul></ul><ul><ul><li>Set the key in the parent of u that separates u and v to be the new smallest key in u </li></ul></ul>
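The two borrowing cases can be sketched on a parent whose children are all leaves. The representation is an assumption: `parent_keys[i]` is the separator between `leaves[i]` and `leaves[i + 1]`, each leaf is a sorted list of keys, and the threshold is read as ceil(M/2). `borrow_for_leaf` is a hypothetical name.

```python
def borrow_for_leaf(parent_keys, leaves, i, M):
    """Try to fix the underfull leaf leaves[i] by borrowing one key.

    Returns True if a sibling could spare a key, False if a merge is needed.
    """
    need = (M + 1) // 2                          # ceil(M/2): v must hold at least this many
    u = leaves[i]
    if i + 1 < len(leaves) and len(leaves[i + 1]) >= need:
        v = leaves[i + 1]                        # Case 1: right sibling
        u.append(v.pop(0))                       # move the leftmost key of v to u
        parent_keys[i] = v[0]                    # separator = new smallest key in v
        return True
    if i > 0 and len(leaves[i - 1]) >= need:
        v = leaves[i - 1]                        # Case 2: left sibling
        u.insert(0, v.pop())                     # move the rightmost key of v to u
        parent_keys[i - 1] = u[0]                # separator = new smallest key in u
        return True
    return False


keys = [10, 20]
leaves = [[1, 5], [10], [20, 22, 25]]            # M = 5, so leaves[1] is too small
print(borrow_for_leaf(keys, leaves, 1, 5), keys, leaves)
```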
32. Deletion example Want to delete 15
33. Want to delete 9
34. Want to delete 10
37. Merging two leaves <ul><li>If no sibling leaf with at least ⌈M/2⌉ keys exists, then merge two leaves. </li></ul><ul><li>Case (1): Suppose that the right sibling v of u contains exactly ⌈M/2⌉ - 1 keys. Merge u and v </li></ul><ul><ul><li>Move the keys in u to v </li></ul></ul><ul><ul><li>Remove the pointer to u at the parent </li></ul></ul><ul><ul><li>Delete the separating key between u and v from the parent of u </li></ul></ul>
38. Merging two leaves <ul><li>Case (2): Suppose that the left sibling v of u contains exactly ⌈M/2⌉ - 1 keys. Merge u and v </li></ul><ul><ul><li>Move the keys in u to v </li></ul></ul><ul><ul><li>Remove the pointer to u at the parent </li></ul></ul><ul><ul><li>Delete the separating key between u and v from the parent of u </li></ul></ul>
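Both merging cases can be sketched with the same list-based representation: `parent_keys[i]` separates `leaves[i]` from `leaves[i + 1]`. The name `merge_leaf` is hypothetical, and the sketch assumes u still holds at least one key, so the separator to the left of u remains valid under the smallest-key convention.

```python
def merge_leaf(parent_keys, leaves, i):
    """Merge the underfull leaf leaves[i] into a sibling and return that sibling."""
    u = leaves[i]
    if i + 1 < len(leaves):
        v = leaves[i + 1]            # Case (1): merge into the right sibling
        v[:0] = u                    # keys of u are smaller, so they go in front
        sep = i
    else:
        v = leaves[i - 1]            # Case (2): merge into the left sibling
        v.extend(u)                  # keys of u are larger, so they go at the end
        sep = i - 1
    del leaves[i]                    # remove the pointer to u at the parent
    del parent_keys[sep]             # delete the separating key between u and v
    return v


keys = [10, 20]
leaves = [[1, 5], [10], [20, 22]]    # M = 5: no sibling can spare a key
merge_leaf(keys, leaves, 1)
print(keys, leaves)
```

The parent loses one key in the process, so the caller then has to check whether the parent itself now has too few keys.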
39. Example Want to delete 12
40. Cont’d u v
41. Cont’d
42. Cont’d too few keys! …
43. Deleting a key in an internal node <ul><li>Suppose we remove a key from an internal node u, and u has fewer than ⌈M/2⌉ - 1 keys afterwards. </li></ul><ul><li>Case (1): u is the root </li></ul><ul><ul><li>If u is empty, then remove u and make its child the new root </li></ul></ul>
44. Deleting a key in an internal node <ul><li>Case (2a): the right sibling v of u has ⌈M/2⌉ keys or more </li></ul><ul><ul><li>Move the separating key between u and v in the parent of u and v down to u. </li></ul></ul><ul><ul><li>Make the leftmost child of v the rightmost child of u </li></ul></ul><ul><ul><li>Move the leftmost key in v to become the separating key between u and v in the parent of u and v. </li></ul></ul><ul><li>Case (2b): the left sibling v of u has ⌈M/2⌉ keys or more </li></ul><ul><ul><li>Move the separating key between u and v in the parent of u and v down to u. </li></ul></ul><ul><ul><li>Make the rightmost child of v the leftmost child of u </li></ul></ul><ul><ul><li>Move the rightmost key in v to become the separating key between u and v in the parent of u and v. </li></ul></ul>
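Borrowing between internal nodes rotates a key through the parent. The sketch below assumes a small `Internal` class with `keys` and `children` lists, with `parent.keys[i]` separating `parent.children[i]` from `parent.children[i + 1]`. The names are hypothetical, and strings stand in for the grandchildren.

```python
class Internal:
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children


def borrow_for_internal(parent, i, M):
    """Fix the underfull internal node parent.children[i] by borrowing.

    Returns True on success, False if neither sibling can spare a key.
    """
    need = (M + 1) // 2                               # ceil(M/2)
    u = parent.children[i]
    if i + 1 < len(parent.children) and len(parent.children[i + 1].keys) >= need:
        v = parent.children[i + 1]                    # right sibling
        u.keys.append(parent.keys[i])                 # separator moves down to u
        u.children.append(v.children.pop(0))          # leftmost child of v moves over
        parent.keys[i] = v.keys.pop(0)                # leftmost key of v moves up
        return True
    if i > 0 and len(parent.children[i - 1].keys) >= need:
        v = parent.children[i - 1]                    # left sibling
        u.keys.insert(0, parent.keys[i - 1])          # separator moves down to u
        u.children.insert(0, v.children.pop())        # rightmost child of v moves over
        parent.keys[i - 1] = v.keys.pop()             # rightmost key of v moves up
        return True
    return False


u = Internal([50], ["c1", "c2"])
v = Internal([70, 80, 90], ["d1", "d2", "d3", "d4"])
parent = Internal([60], [u, v])
print(borrow_for_internal(parent, 0, 5), parent.keys, u.keys, v.keys)
```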
45. … continue from previous example u v case 2
46. Cont’d
47. <ul><li>Case (3): every sibling of u contains exactly ⌈M/2⌉ - 1 keys. Let v be one of these siblings and merge u into v: </li></ul><ul><ul><li>Move the separating key between u and v in the parent of u and v down to u. </li></ul></ul><ul><ul><li>Move the keys and child pointers in u to v </li></ul></ul><ul><ul><li>Remove the pointer to u at the parent. </li></ul></ul><ul><li>The parent has now lost a key, so repeat the check at the parent. </li></ul>
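The merge of two internal nodes, together with the root case (1), can be sketched as below. The `Internal` class, `merge_internal`, and `shrink_root` are hypothetical names, and `parent.keys[i]` is assumed to separate `parent.children[i]` from `parent.children[i + 1]`.

```python
class Internal:
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children


def merge_internal(parent, i):
    """Merge the underfull node parent.children[i] into a sibling v and return v."""
    u = parent.children[i]
    if i + 1 < len(parent.children):
        v = parent.children[i + 1]               # merge into the right sibling
        sep = parent.keys.pop(i)                 # separating key moves down
        v.keys[:0] = u.keys + [sep]
        v.children[:0] = u.children
    else:
        v = parent.children[i - 1]               # merge into the left sibling
        sep = parent.keys.pop(i - 1)             # separating key moves down
        v.keys.extend([sep] + u.keys)
        v.children.extend(u.children)
    del parent.children[i]                       # remove the pointer to u
    return v


def shrink_root(root):
    """Case (1): an empty root is replaced by its only child."""
    return root.children[0] if not root.keys else root


u = Internal([50], ["a", "b"])
v = Internal([70, 80], ["c", "d", "e"])
root = Internal([60], [u, v])
merge_internal(root, 0)
root = shrink_root(root)
print(root.keys, root.children)
```

In this example the merge empties the root, so the tree loses one level. This is the only way the height of a B+-tree decreases.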