2. Definition
A B-tree is a tree data structure that keeps data sorted and allows
searches, sequential access, insertions, and deletions in
logarithmic time. The B-tree is a generalization of
a binary search tree in which a node can have more than two
children.
A B-tree of order m (the maximum number of children for each
node) is a tree which satisfies the following properties:
Every node has at most m children.
Every node (except root and leaves) has at least ⌈m/2⌉ children.
The root has at least two children if it is not a leaf node.
All leaves appear in the same level, and carry information.
A non-leaf node with k children contains k−1 keys.
Declaration in C:
typedef struct {
    int Count;        // number of keys stored in the current node
    ItemType Key[3];  // array to hold up to 3 keys
    long Branch[4];   // array of fake pointers (record numbers), one per child
} NodeType;
3. Order & Key of a B-Tree
The following is an example of a B-tree of order 5. This means
that (other than the root node) all internal nodes have at least 3
children (and hence at least 2 keys). Of course, the maximum
number of children that a node can have is 5 (so that 4 is the
maximum number of keys). In practice B-trees usually have
orders a lot bigger than 5. The first row in each node shows the
keys, while the second row shows the pointers to the child nodes.
4. Height of B-Tree
If n ≥ 1, then for any n-key B-tree T of height h (with the root at
height 0) and minimum degree t ≥ 2, $h \le \log_t \frac{n+1}{2}$.
The height of a B-tree with n keys is important because it bounds
the number of disk accesses.
The height of the tree is maximum when each node has the
minimum number of subtree pointers, q = ⌈m/2⌉.
Note: If the number of keys in a B-tree equals 2,000,000 (2
million) and m = 200, then the maximum height of the B-tree is 3,
whereas a balanced binary tree would have a height of about 20.
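The note above can be checked numerically. The following is a minimal sketch, assuming the convention that the root is at height 0 and that a tree of height h holds at least 2t^h − 1 keys; the function name `btree_max_height` is hypothetical and not part of the slides.

```c
#include <assert.h>

/* Largest height h (root at height 0) that an n-key B-tree of minimum
   degree t can have, i.e. the largest h with 2*t^h - 1 <= n.
   Hypothetical helper for illustration; integer arithmetic only. */
int btree_max_height(unsigned long long n, unsigned long long t)
{
    assert(n >= 1 && t >= 2);
    int h = 0;
    unsigned long long min_keys_next = 2 * t - 1;  /* fewest keys at height 1 */
    while (min_keys_next <= n) {
        h++;
        /* fewest keys one level taller: 2*t^(h+1) - 1 = t*(prev + 1) - 1 */
        min_keys_next = t * (min_keys_next + 1) - 1;
    }
    return h;
}
```

With m = 200 the minimum degree is t = 100, so `btree_max_height(2000000, 100)` corresponds to the example in the note.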
5. Search in a B-Tree
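Search in a B-tree is similar to search in a BST, except that in a
B-tree we make a multiway branching decision at each node instead of
the binary branching decision of a BST.
Example tree:
Root:    [25 62]
Level 2: [12 19]  [32 39]  [73 84]
Leaves:  [3 5] [15 17] [21 23]  [30 31] [34 37] [45 51]  [69 71] [75 79] [90 94]
Search key 71: 71 > 62, so take the third branch of the root to [73 84];
71 < 73, so take its first branch to the leaf [69 71], where 71 is found.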
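The multiway branching decision can be sketched in C as follows. This is a minimal in-memory illustration, assuming integer keys and ordinary pointers instead of the record numbers used in the declaration earlier; `Node`, `MAX_KEYS`, and `btree_search` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_KEYS 4   /* order 5: an assumption for this sketch */

typedef struct Node {
    int count;                         /* number of keys in use */
    int key[MAX_KEYS];                 /* keys in ascending order */
    struct Node *child[MAX_KEYS + 1];  /* all NULL in a leaf */
} Node;

/* Returns the node that holds target (its index is stored in *pos),
   or NULL if the key is not in the tree. */
const Node *btree_search(const Node *node, int target, int *pos)
{
    while (node != NULL) {
        int i = 0;
        /* multiway branching: skip every key smaller than target */
        while (i < node->count && target > node->key[i])
            i++;
        if (i < node->count && target == node->key[i]) {
            if (pos != NULL)
                *pos = i;
            return node;
        }
        node = node->child[i];  /* NULL below a leaf: search fails */
    }
    return NULL;
}
```

A linear scan inside the node keeps the sketch short; with large nodes a binary search over `key[]` is a common alternative.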
6. B-Tree Insert Operation
Insertion in a B-tree is more complicated than in a BST.
In a BST, keys are added in top-down fashion,
which can result in an unbalanced tree.
A B-tree is built bottom-up: keys are added to a
leaf node; if the leaf node is full, another node is
created, the keys are evenly distributed, and the middle key
is promoted to the parent. If the parent is full, the
process is repeated.
A B-tree can also be built in top-down fashion using the
pre-splitting technique.
7. Basic Idea : Insertion
Find position for the key in the appropriate leaf node.
Is node full?
No: Insert key in order and adjust pointer.
Yes: Split node:
• Create a new node
• Move half of the keys from the full node to
the new node and adjust pointers
• Promote the median key (before split)
to the parent
If parent is full, split the parent in the same way.
Split guarantees that each node has at least ⌈m/2⌉ − 1
keys.
8. Cases in B-Tree Insert Operation
In B-tree insertion we have the following
cases:
◦ Case 1: The leaf node has room for the new
key.
◦ Case 2: The leaf in which key is to be placed is
full.
This case can lead to an increase in tree height.
9. B-Tree Insert Operation
Case 1: The leaf node has room for the new key.
Insert 3
Find the appropriate leaf node for key 3.
Root:   [10 25]
Leaves: [5 8]  [14 19 20 23]  [32 38]
Insert 3 in order: the first leaf becomes [3 5 8].
10. B-Tree Insert Operation
Case 2: The leaf in which key is to be placed is full.
Insert 16
Find the appropriate leaf node for key 16.
Root:   [10 25]
Leaves: [3 5 8]  [14 19 20 23]  [32 38]
No room for key 16 in the leaf node [14 19 20 23].
Split node: create a new node and move keys to the new
node, giving [14 16] and [20 23].
Move median key 19 up and insert it in the parent node in order.
Root:   [10 19 25]
Leaves: [3 5 8]  [14 16]  [20 23]  [32 38]
11. B-Tree Insert Operation
Case 2: The leaf in which the key is to be placed is full,
and this leads to an increase in tree height.
12. B-Tree Insert Operation
Case 2: The height of the tree increases.
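Insert 16
Root:    [45 55 67 81]
Level 2: [13 27 33 38]  [48 52]  [57 61]  [72 77]  [86 92]
Leaves under [13 27 33 38]: [9 12]  [14 19 20 23]  [29 31]  [35 36]  [41 42]
(leaves of the other level-2 nodes omitted)
• No room for key 16 in the leaf [14 19 20 23]: move median key 19 up and
split the node into [14 16] and [20 23].
• Insert 19 in the parent node in order. No room for 19 in the parent
[13 27 33 38]: split the parent node into [13 19] and [33 38]; the median
key 27 moves up.
• Insert 27 in the parent in order. No room for 27 in the parent (the root
[45 55 67 81]): split the node into [27 45] and [67 81]; the median key 55
becomes the new root, so the height of the tree increases by one.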
13. B-Tree Delete Operation
Deletion is analogous to insertion, but a
little more complicated.
Two major cases
◦ Case 1: Deletion from leaf node
◦ Case 2: Deletion from non-leaf node
Apply the delete-by-copy technique used in BST; this
will reduce this case to case 1.
In delete by copy, the key to be deleted is
replaced by the smallest key in its right subtree or the
largest key in its left subtree (which is always in a leaf).
14. B-Tree Delete Operation
Leaf node deletion cases:
◦ After deletion node is at least half full.
◦ After deletion underflow occurs:
Redistribute if the number of keys in a sibling > ⌈m/2⌉ − 1.
Merge nodes if the number of keys in both siblings = ⌈m/2⌉ − 1.
Merging can lead to a decrease in tree height.
15. B-Tree Delete Operation
After deletion node is at least half full. (inverse of insertion
case 1)
Delete 3
Search key 3.
Root:   [10 25]
Leaves: [3 5 8]  [14 19]  [32 38 40 45]
Key found, delete key 3.
Move the other keys in the node to eliminate
the gap: the leaf becomes [5 8].
16. B-Tree Delete Operation
Underflow occurs: evenly redistribute the keys if the left or right
sibling has more than ⌈m/2⌉ − 1 keys.
Delete 14
Search key 14.
Root:   [10 25]
Leaves: [5 8]  [14 19]  [32 38 40 45]
Underflow occurs: evenly redistribute the keys
in the underflow node, in its sibling,
and the separator key.
17. B-Tree Delete Operation
Underflow occurs and the left and right siblings each have
⌈m/2⌉ − 1 keys: merge the underflow node and a sibling.
Delete 25
Root:   [10 32]
Leaves: [5 8]  [19 25]  [38 40]
Underflow occurs, merge nodes:
• Move the separator key down.
• Move the keys to the underflow node and discard the sibling.
18. B-Tree Delete Operation
Underflow occurs, height decreases after merging.
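Delete 21
Root:    [70]
Level 2: [8 32]  [79 85]
Leaves:  [3 5] [21 27] [47 66]  [73 75 78] [81 83] [88 90 92]
Underflow occurs in the leaf: merge nodes by moving the separator
key and the keys in the sibling node to the underflow node.
Underflow then occurs in the parent: merge nodes again. The root's
only key moves down, so the height of the tree decreases.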
19. B-Tree V/s Binary Tree
Advantages
• Efficient in real-life problems where the number of records is very large (i.e. large datasets)
• Frees up RAM, as all nodes are located on secondary memory
• B-tree reduces the depth of the tree; hence, the desired record is located faster
Disadvantages
• Decision process at each node is more complicated in a B-tree
• A sophisticated program is required to execute the operations in a B-tree
Fig. Comparison of linear growth rate vs.
logarithmic growth rate
21. Insert Algorithm
Insert
• Cannot just create a new leaf node and insert it
– the resulting tree is not a B-tree
• Insert the new key into an existing leaf node
• If the leaf node is full
– Split the full node y (with 2t-1 keys) around its median
key key_t[y] into two nodes, each having t-1 keys.
– Move the median key into y's parent.
– If the parent is full, recursively split, all the way to the
root node if necessary.
– If the root is full, split the root; the height of the tree
increases by one.
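The steps above can be sketched in C. Note that this sketch uses the single-pass pre-splitting variant mentioned earlier (full nodes are split on the way down) rather than the bottom-up recursion in the list, because it needs no parent pointers. The minimum degree `T`, the `Node` layout, and all function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define T 3                      /* minimum degree: assumption for this sketch */
#define MAX_KEYS (2 * T - 1)

typedef struct Node {
    int n;                       /* number of keys in use */
    int leaf;                    /* 1 if the node is a leaf */
    int key[MAX_KEYS];
    struct Node *child[MAX_KEYS + 1];
} Node;

static Node *new_node(int leaf)
{
    Node *x = calloc(1, sizeof *x);
    assert(x != NULL);
    x->leaf = leaf;
    return x;
}

/* Split the full child y = x->child[i] around its median key. */
static void split_child(Node *x, int i)
{
    Node *y = x->child[i];
    Node *z = new_node(y->leaf);
    int j;

    z->n = T - 1;
    for (j = 0; j < T - 1; j++)            /* upper t-1 keys go to z */
        z->key[j] = y->key[j + T];
    if (!y->leaf)
        for (j = 0; j < T; j++)            /* and the upper t children */
            z->child[j] = y->child[j + T];
    y->n = T - 1;

    for (j = x->n; j > i; j--)             /* make room in the parent */
        x->child[j + 1] = x->child[j];
    x->child[i + 1] = z;
    for (j = x->n - 1; j >= i; j--)
        x->key[j + 1] = x->key[j];
    x->key[i] = y->key[T - 1];             /* median key moves up */
    x->n++;
}

/* Insert k below x, where x is known not to be full. */
static void insert_nonfull(Node *x, int k)
{
    int i = x->n - 1;
    if (x->leaf) {
        while (i >= 0 && k < x->key[i]) {  /* shift larger keys right */
            x->key[i + 1] = x->key[i];
            i--;
        }
        x->key[i + 1] = k;
        x->n++;
        return;
    }
    while (i >= 0 && k < x->key[i])
        i--;
    i++;                                   /* child that should receive k */
    if (x->child[i]->n == MAX_KEYS) {
        split_child(x, i);
        if (k > x->key[i])
            i++;
    }
    insert_nonfull(x->child[i], k);
}

/* Insert k and return the (possibly new) root. */
Node *btree_insert(Node *root, int k)
{
    if (root == NULL)
        root = new_node(1);
    if (root->n == MAX_KEYS) {             /* full root: height grows by one */
        Node *s = new_node(0);
        s->child[0] = root;
        split_child(s, 0);
        root = s;
    }
    insert_nonfull(root, k);
    return root;
}
```

Starting from `Node *root = NULL;` and calling `root = btree_insert(root, k);` for each key builds the tree; the only place the height changes is the root split in `btree_insert`.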
22. Delete Algorithm
• If k is in an internal node, swap k with its inorder
successor (in a leaf node) then delete k from the
leaf node.
• Deleting k from a leaf x may cause n[x] < t-1.
– If the left sibling has more than t-1 elements, we can
transfer an element from there to retain the property
n[x] ≥ t-1. To retain the order of the elements, this is
done by moving the largest element in the left sibling
to the parent and moving the parent's separator element
to the leftmost position in x.
23. Delete Algorithm
– Else, if the right sibling has more than t-1 elements,
transfer from the right sibling through the parent.
– Else, merge x with the left sibling. One pointer
from the parent needs to be removed in this
case. This is done by moving the parent
element into the new merged node. If the parent
now has fewer than t-1 elements, recurse on the
parent.
• Height of the tree may be reduced by 1 if
root contains no element after delete.
• Can also do delete in one pass down, similar
to insert (see textbook).
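The transfer from the left sibling described in the delete algorithm can be sketched as follows. This shows only that one step, not a full delete; the `Node` layout, the minimum degree `T`, and the name `borrow_from_left` are assumptions for illustration.

```c
#include <assert.h>

#define T 3                      /* minimum degree: assumption for this sketch */
#define MAX_KEYS (2 * T - 1)

typedef struct Node {
    int n;                       /* number of keys in use */
    int leaf;                    /* 1 if the node is a leaf */
    int key[MAX_KEYS];
    struct Node *child[MAX_KEYS + 1];
} Node;

/* Child i of parent p has underflowed and its left sibling (child i-1)
   has more than t-1 keys. The separator in p moves down to the leftmost
   position of the underflow node, and the largest key of the left
   sibling moves up to replace it. */
void borrow_from_left(Node *p, int i)
{
    assert(i >= 1 && i <= p->n);
    Node *x = p->child[i];
    Node *left = p->child[i - 1];
    assert(left->n > T - 1 && x->n < MAX_KEYS);
    int j;

    for (j = x->n - 1; j >= 0; j--)          /* open the leftmost key slot */
        x->key[j + 1] = x->key[j];
    if (!x->leaf) {
        for (j = x->n; j >= 0; j--)          /* open the leftmost child slot */
            x->child[j + 1] = x->child[j];
        x->child[0] = left->child[left->n];  /* last subtree of left moves over */
    }
    x->key[0] = p->key[i - 1];               /* separator moves down */
    x->n++;
    p->key[i - 1] = left->key[left->n - 1];  /* largest left key moves up */
    left->n--;
}
```

The transfer from the right sibling is the mirror image: the separator moves down to the rightmost position of x and the smallest key of the right sibling moves up.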
26. Height of B-Tree
The height of a B-tree is maximum if all nodes have the minimum
number of keys:
1 key in the root + 2(q−1) keys on the second level + … + 2q^(h−2)(q−1) keys in
the leaves (level h).
$$n \ge 1 + 2(q-1) + 2q(q-1) + \dots + 2q^{h-2}(q-1) = 1 + 2(q-1)\sum_{i=0}^{h-2} q^i$$
Applying the formula of geometric progression:
$$1 + 2(q-1)\,\frac{q^{h-1}-1}{q-1} = 2q^{h-1} - 1$$
Thus, the number of keys in a B-tree of height h is given as:
$$n \ge 2q^{h-1} - 1 \quad\Longrightarrow\quad h \le \log_q \frac{n+1}{2} + 1$$
27. Height of B-Tree
The height of a B-tree is minimum if all nodes are full; thus we have
m−1 keys in the root + m(m−1) keys on the second level + … + m^(h−1)(m−1) keys in the leaf nodes.
$$n \le (m-1) + m(m-1) + m^2(m-1) + \dots + m^{h-1}(m-1) = \sum_{i=0}^{h-1} (m-1)m^i = (m-1)\sum_{i=0}^{h-1} m^i$$
Applying the formula of geometric progression:
$$(m-1)\,\frac{m^h-1}{m-1} = m^h - 1$$
Thus, the number of keys in a B-tree of height h is given as:
$$n \le m^h - 1 \quad\Longrightarrow\quad h \ge \log_m (n+1)$$
Combining this with the bound for the maximum height:
$$\log_m (n+1) \le h \le \log_q \frac{n+1}{2} + 1$$
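The two sums above can be checked by adding up the keys level by level. This is a small sketch, assuming that the height h counts levels with the root on level 1, as in these two slides; `min_keys` and `max_keys` are hypothetical helper names.

```c
#include <assert.h>

/* Fewest keys in a B-tree with h >= 1 levels, where q = ceil(m/2):
   1 key in the root, 2 nodes on level 2, and q times as many nodes
   on each following level, each holding q-1 keys. */
unsigned long long min_keys(unsigned long long q, int h)
{
    assert(q >= 2 && h >= 1);
    unsigned long long total = 1, nodes = 2;
    for (int level = 2; level <= h; level++) {
        total += nodes * (q - 1);
        nodes *= q;
    }
    return total;                 /* closed form: 2*q^(h-1) - 1 */
}

/* Most keys in a B-tree of order m with h >= 1 levels:
   every node is full and holds m-1 keys. */
unsigned long long max_keys(unsigned long long m, int h)
{
    assert(m >= 2 && h >= 1);
    unsigned long long total = 0, nodes = 1;
    for (int level = 1; level <= h; level++) {
        total += nodes * (m - 1);
        nodes *= m;
    }
    return total;                 /* closed form: m^h - 1 */
}
```

For example, with q = 3 and h = 3 the level sums are 1 + 4 + 12 = 17 = 2·3² − 1, and with m = 5 and h = 3 they are 4 + 20 + 100 = 124 = 5³ − 1.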
28. Height of B-Tree
Note: Order m is chosen so that B-tree node size is
nearly equal to the disk block size.
Editor's Notes
A B-tree is a specialized multiway tree designed especially for use on disk. In a B-tree each node may contain a large number of keys. The number of subtrees of each node, then, may also be large. A B-tree is designed to branch out in this large number of directions and to contain a lot of keys in each node so that the height of the tree is relatively small. This means that only a small number of nodes must be read from disk to retrieve an item. The goal is to get fast access to the data, and with disk drives this means reading a very small number of records. Note that a large node size (with lots of keys in the node) also fits with the fact that with a disk drive one can usually read a fair amount of data at once.
Insert the following letters into what is originally an empty B-tree of order 5: C N G A H E K Q M F W L T Z D P R X Y S. Order 5 means that a node can have a maximum of 5 children and 4 keys. All nodes other than the root must have a minimum of 2 keys.

The first 4 letters get inserted into the same node, which then holds A C G N.

When we try to insert the H, we find no room in this node, so we split it into 2 nodes, moving the median item G up into a new root node. Note that in practice we just leave the A and C in the current node and place the H and N into a new node to the right of the old one.

Inserting E, K, and Q proceeds without requiring any splits.

Inserting M requires a split. Note that M happens to be the median key and so is moved up into the parent node.
In the B-tree as we left it at the end of the last section, delete H. Of course, we first do a lookup to find H. Since H is in a leaf and the leaf has more than the minimum number of keys, this is easy. We move the K over where the H had been and the L over where the K had been.

Next, delete the T. Since T is not in a leaf, we find its successor (the next item in ascending order), which happens to be W, and move W up to replace the T. That way, what we really have to do is to delete W from the leaf, which we already know how to do, since this leaf has extra keys. In ALL cases we reduce deletion to a deletion in a leaf, by using this method.

Next, delete R. Although R is in a leaf, this leaf does not have an extra key; the deletion results in a node with only one key, which is not acceptable for a B-tree of order 5. If the sibling node to the immediate left or right has an extra key, we can then borrow a key from the parent and move a key up from this sibling. In our specific case, the sibling to the right has an extra key. So, the successor W of S (the last key in the node where the deletion occurred) is moved down from the parent, and the X is moved up. (Of course, the S is moved over so that the W can be inserted in its proper place.)

Finally, let's delete E. This one causes lots of problems. Although E is in a leaf, the leaf has no extra keys, nor do the siblings to the immediate right or left. In such a case the leaf has to be combined with one of these two siblings. This includes moving down the parent's key that was between those of these two leaves. In our example, let's combine the leaf containing F with the leaf containing A C. We also move down the D.