1.1
CAS CS 460/660
Introduction to Database Systems
Tree Based Indexing: B+-tree
Slides from UC Berkeley
1.2
How to Build Tree-Structured Indexes
 Tree-structured indexing techniques support both
range searches and equality searches.
 Two examples:
 ISAM: static structure; early index technology.
 B+ tree: dynamic, adjusts gracefully under inserts and deletes.
1.3
Indexed Sequential Access Method (ISAM)
 ISAM is an old-fashioned idea
 B+ trees are usually better, as we’ll see
 Though not always
 But, it’s a good place to start
 Simpler than B+ tree, but many of the same ideas
1.4
Range Searches
 “Find all students with gpa > 3.0”
 If data is in a sorted file, do binary search to find the first such student, then scan to find the others.
 Cost of binary search on disk is still quite high. Why?
 Simple idea: Create an ‘index’ file.
 Can do binary search on the (smaller) index file!
[Figure: an index file holding keys k1, k2, ..., kN, one per page of the data file (Page 1, Page 2, Page 3, ..., Page N).]
But what if the index doesn’t fit easily in memory?
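A minimal sketch of the index-file idea, assuming a toy in-memory "data file" whose pages are short sorted lists. The names `data_pages`, `index_keys` and `range_search_greater` are made up for illustration and are not from the slides.

```python
from bisect import bisect_right

# Hypothetical data file: each inner list is one sorted page of gpa values.
data_pages = [[1.8, 2.1, 2.5], [2.7, 2.9, 3.0], [3.1, 3.4, 3.6], [3.7, 3.9, 4.0]]
# Index file: the first key on each page (k1 ... kN).
index_keys = [page[0] for page in data_pages]

def range_search_greater(threshold):
    """Return all values > threshold, using the index to pick the start page."""
    # Last page whose first key is <= threshold; earlier pages cannot qualify.
    start = max(bisect_right(index_keys, threshold) - 1, 0)
    result = []
    for page in data_pages[start:]:      # sequential scan from that page on
        result.extend(v for v in page if v > threshold)
    return result
```

The binary search touches only the small `index_keys` list; the data pages are read sequentially from the chosen start page.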
1.5
ISAM
We can apply the idea repeatedly!
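[Figure: an index entry has the form P0, K1, P1, K2, P2, ..., Km, Pm (keys K separated by page pointers P). Non-leaf pages sit above the leaf pages; the leaves are primary pages, with overflow pages chained to them.]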
1.6
Example ISAM Tree
 Index entries: <search key value, page id>; they direct search to data entries in leaves.
 Example where each node can hold 2 entries:
[Figure: Root 40; index pages below it: 20 33 | 51 63.
Leaves: 10* 15* | 20* 27* | 33* 37* | 40* 46* | 51* 55* | 63* 97*]
1.7
ISAM has a STATIC Index Structure
File creation:
1. Allocate leaf (data) pages sequentially
2. Sort records by search key
3. Allocate and fill index pages (now the structure is ready for use)
4. Allocate overflow pages as needed
Static tree structure: inserts/deletes affect only leaf pages.
[Figure: ISAM file layout: data pages, then index pages, then overflow pages.]
1.8
ISAM (continued)
Search: Start at root; use key comparisons to navigate to leaf.
Cost = log_F N, where F = # entries/pg (i.e., fanout) and N = # leaf pgs
 No need for ‘next-leaf-page’ pointers. (Why?)
Insert: Find the leaf that the data entry belongs to, and put it there. Use an overflow page if necessary.
Delete: Find; remove from leaf; if empty, de-allocate.
[Figure: ISAM file layout: data pages, index pages, overflow pages.]
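The three ISAM operations can be sketched as a toy class. This is an illustration under simplifying assumptions, not a real ISAM implementation: the two index levels are flattened into one list of separators, each overflow chain is a plain list, and page de-allocation is not modeled. `IsamFile` and its members are hypothetical names.

```python
from bisect import bisect_right

class IsamFile:
    """Toy ISAM file: separator keys are fixed at creation time."""

    def __init__(self, sorted_keys, page_size=2):
        self.page_size = page_size
        # Primary leaf pages, filled sequentially from the sorted keys.
        self.primary = [sorted_keys[i:i + page_size]
                        for i in range(0, len(sorted_keys), page_size)]
        # One overflow chain per primary page (flattened into a list here).
        self.overflow = [[] for _ in self.primary]
        # Static index: first key of every page except the first.
        self.separators = [page[0] for page in self.primary[1:]]

    def _leaf(self, key):
        return bisect_right(self.separators, key)

    def search(self, key):
        i = self._leaf(key)
        return key in self.primary[i] or key in self.overflow[i]

    def insert(self, key):
        i = self._leaf(key)
        if len(self.primary[i]) < self.page_size:
            self.primary[i].append(key)
            self.primary[i].sort()
        else:
            self.overflow[i].append(key)   # index pages are never touched

    def delete(self, key):
        i = self._leaf(key)
        for page in (self.primary[i], self.overflow[i]):
            if key in page:
                page.remove(key)           # separators stay as they are
                return True
        return False

# The running example: 12 data entries, 2 per page.
f = IsamFile([10, 15, 20, 27, 33, 37, 40, 46, 51, 55, 63, 97])
for k in (23, 48, 41, 42):
    f.insert(k)
for k in (42, 51, 97):
    f.delete(k)
```

After these calls the key 51 is still a separator in the index although 51* is gone from the leaves, because deletes never modify the static index.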
1.9
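Example: Insert 23*, 48*, 41*, 42*
[Figure: the tree after the inserts.
Index pages: root 40; below it 20 33 | 51 63.
Primary leaf pages: 10* 15* | 20* 27* | 33* 37* | 40* 46* | 51* 55* | 63* 97*
Overflow pages: 23* chained to leaf 20* 27*; 48* 41* chained to leaf 40* 46*, followed by a second overflow page holding 42*.]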
1.10
... then Deleting 42*, 51*, 97*
[Figure: the same tree (root 40; index pages 20 33 | 51 63; overflow pages 23*, 48* 41*, 42*), from which 42*, 51* and 97* are removed.]
 Note that 51 still appears in the index levels, but 51* is no longer in a leaf!
1.11
ISAM ---- Issues?
 Pros
 ????
 Cons
 ????
1.12
B+ Tree: The Most Widely Used Index
Insert/delete at log_F N cost; keep tree height-balanced.
(F = fanout, N = # leaf pages)
[Figure: index entries (direct search) sit above the data entries (the "sequence set").]
• Each node (except for root) contains m entries, where d <= m <= 2d.
• “d” is called the order of the tree. (This maintains 50% min occupancy.)
• Supports equality and range-searches efficiently.
• As in ISAM, all searches go from root to leaves, but structure is dynamic.
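A small sketch of the two rules on this slide: the occupancy bound d <= m <= 2d and the log_F N cost. The function names are made up for illustration, and the root is assumed to need only one entry.

```python
def occupancy_ok(num_entries, d, is_root=False):
    """True if a node with num_entries entries satisfies d <= m <= 2d.

    The root is exempt from the lower bound (assumed here to need >= 1 entry).
    """
    low = 1 if is_root else d
    return low <= num_entries <= 2 * d

def levels_above_leaves(num_leaf_pages, fanout):
    """Smallest h with fanout**h >= num_leaf_pages, i.e. ceil(log_F N).

    Computed with integers to avoid floating-point log rounding.
    """
    h, reach = 0, 1
    while reach < num_leaf_pages:
        reach *= fanout
        h += 1
    return h
```

For example, `levels_above_leaves(17689, 133)` is 2, since 133 * 133 = 17,689 leaf pages are reachable through two index levels.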
1.13
Example B+ Tree
 Search begins at root page, and key comparisons direct it to a leaf (as in ISAM).
 Search for 5*, 15*, all data entries >= 24* ...
 Based on the search for 15*, we know it is not in the tree!
[Figure: Root with keys 13 17 24 30.
Leaves: 2* 3* 5* 7* | 14* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*]
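The three searches on this slide can be traced with a short sketch. It assumes the example tree has a root with keys 13 17 24 30 and five leaves split at those keys; the variable and function names are illustrative only.

```python
from bisect import bisect_right

root_keys = [13, 17, 24, 30]
leaves = [[2, 3, 5, 7], [14, 16], [19, 20, 22], [24, 27, 29], [33, 34, 38, 39]]

def find_leaf(key):
    # Keys < 13 go to child 0, 13 <= key < 17 to child 1, and so on.
    return bisect_right(root_keys, key)

def lookup(key):
    """Equality search: one root-to-leaf descent, then look inside the leaf."""
    return key in leaves[find_leaf(key)]

def range_from(low):
    """All data entries >= low: find the first leaf, then follow the leaf sequence."""
    start = find_leaf(low)
    return [k for leaf in leaves[start:] for k in leaf if k >= low]
```

`lookup(15)` lands in the leaf 14* 16* and fails there, which is enough to conclude that 15* is not in the tree.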
1.14
A Note on Terminology
 The “+” in B+Tree indicates a special kind of “B Tree”
in which all the data entries reside in leaf pages.
 In a vanilla “B Tree”, data entries are sprinkled throughout the tree.
 B+Trees are simpler to implement than B Trees.
 And since we have a large fanout, the upper levels comprise only a tiny fraction
of the total storage space in the tree.
 To confuse matters, most database people (like me)
call B+Trees “B Trees”!!! (sorry!)
1.15
B+Tree Pages
Question: How big should the B+Tree pages
(i.e., nodes) be?
Hint 1: we want them to be fairly large (to get
high fanout).
Hint 2: they are typically stored in files on
disk.
Hint 3: they are typically read from disk into
buffer pool frames.
Hint 4: when updated, we eventually write
them from the buffer pool back to disk.
Hint 5: we call them “pages”.
1.16
B+ Trees in Practice
 Remember: index nodes are disk pages
 e.g., fixed length unit of communication with disk
 Typical order: 100. Typical fill-factor: 67%.
 average fanout = 133
 Typical capacities:
 Height 3: 133^3 = 2,352,637 entries
 Height 4: 133^4 = 312,900,721 entries
 Can often hold top levels in buffer pool:
 Level 1 = 1 page = 8 Kbytes
 Level 2 = 133 pages ≈ 1 Mbyte
 Level 3 = 17,689 pages ≈ 138 MBytes
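The arithmetic on this slide can be reproduced directly. The sketch assumes the 67% fill factor is treated as 2/3 and that pages are 8 KB; the names are illustrative.

```python
order = 100                      # d; a full node holds 2*d entries
fanout = (2 * order * 2) // 3    # ~67% fill factor, taken as 2/3 of 200 entries
page_kb = 8                      # page size assumed on the slide

def entries_at_height(h):
    """Number of entries reachable through h levels of average fanout."""
    return fanout ** h

def level_size_kb(level):
    """Level 1 is the root; level k holds fanout**(k-1) pages."""
    return fanout ** (level - 1) * page_kb

for h in (3, 4):
    print(f"height {h}: {entries_at_height(h):,} entries")
for lvl in (1, 2, 3):
    print(f"level {lvl}: {level_size_kb(lvl):,} KB")
```

Computed exactly, 133^4 is 312,900,721, and level 3 occupies 17,689 pages * 8 KB = 141,512 KB.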
1.17
Inserting a Data Entry into a B+ Tree
 Find correct leaf L.
 Put data entry onto L.
 If L has enough space, done!
 Else, must split L (into L and a new node L2)
 Redistribute entries evenly, copy up middle key.
 Insert index entry pointing to L2 into parent of L.
 This can happen recursively
 To split index node, redistribute entries evenly, but push up middle
key. (Contrast with leaf splits.)
 Splits “grow” tree; root split increases height.
 Tree growth: gets wider or one level taller at top.
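The difference between the two kinds of split can be sketched as follows. This is a simplified illustration, assuming nodes are plain Python lists and an overfull node holds exactly 2d+1 entries; `split_leaf` and `split_index` are hypothetical helper names.

```python
def split_leaf(entries, d):
    """Split an overfull leaf (2d+1 sorted entries).

    The middle key is COPIED up: it is returned for the parent and also
    stays in the right-hand leaf.
    """
    left, right = entries[:d], entries[d:]
    return left, right, right[0]

def split_index(keys, children, d):
    """Split an overfull index node (2d+1 keys, 2d+2 child pointers).

    The middle key is PUSHED up: it is returned for the parent and appears
    in neither half.
    """
    middle = keys[d]
    left = (keys[:d], children[:d + 1])
    right = (keys[d + 1:], children[d + 1:])
    return left, right, middle
```

With d = 2, `split_leaf([2, 3, 5, 7, 8], 2)` gives the leaves 2 3 and 5 7 8 with 5 copied up, and `split_index([5, 13, 17, 24, 30], ...)` gives 5 13 and 24 30 with 17 pushed up. Both halves keep at least d entries in each case.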
1.18
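Example B+ Tree – Inserting 23*
[Figure: Root with keys 13 17 24 30.
Leaves: 2* 3* 5* 7* | 14* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*
23* belongs in the leaf 19* 20* 22*, which has room, so no split is needed.]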
1.19
Example B+ Tree - Inserting 8*
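 Notice that root was split, leading to increase in height.
 In this example, we could avoid split by re-distributing entries; however, this is not done in practice.
[Figure, before: Root 13 17 24 30; leaves 2* 3* 5* 7* | 14* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*.
Inserting 8* overfills the first leaf (2* 3* 5* 7* 8*), which splits into 2* 3* and 5* 7* 8*; 5 is copied up.
The root (5 13 17 24 30) then overflows and splits; 17 is pushed up.
Figure, after: Root 17; index nodes 5 13 | 24 30; leaves 2* 3* | 5* 7* 8* | 14* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*.]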
1.20
Leaf vs. Index Page Split
(from previous example of inserting “8”)
 Observe how minimum occupancy is guaranteed in both leaf and index pg splits.
 Note difference between copy-up and push-up; be sure you understand the reasons for this.
[Figure: Leaf page split. 2* 3* 5* 7* 8* becomes 2* 3* and 5* 7* 8*; the entry to be inserted in the parent node is 5. (Note that 5 is copied up and continues to appear in the leaf.)]
[Figure: Index page split. 5 13 17 24 30 becomes 5 13 and 24 30; the entry to be inserted in the parent node is 17. (Note that 17 is pushed up and only appears once in the index. Contrast this with a leaf split.)]
1.21
Deleting a Data Entry from a B+ Tree
 Start at root, find leaf L where entry belongs.
 Remove the entry.
 If L is at least half-full, done!
 If L has only d-1 entries,
 Try to re-distribute, borrowing from sibling
(adjacent node with same parent as L).
 If re-distribution fails, merge L and sibling.
 If merge occurred, must delete entry (pointing
to L or sibling) from parent of L.
 Merge could propagate to root, decreasing
height.
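The redistribute-or-merge decision can be sketched for one case: a leaf that underflows and has a right sibling under the same parent. This is a simplified illustration with leaves as plain lists; `fix_underflow` is a hypothetical name, and borrowing from a left sibling is not shown.

```python
def fix_underflow(leaf, sibling, d):
    """Repair a leaf left with d-1 entries, using its right sibling.

    Returns (action, new_leaf, new_sibling, new_separator). The separator is
    the key in the parent that divides the two leaves.
    """
    if len(sibling) > d:
        # Redistribute: borrow the sibling's smallest entry.
        leaf = leaf + [sibling[0]]
        sibling = sibling[1:]
        return "redistribute", leaf, sibling, sibling[0]
    # Sibling holds exactly d entries: merge ((d-1) + d entries fit on one page).
    # The parent must then drop the entry that pointed to the sibling.
    return "merge", leaf + sibling, None, None
```

With d = 2, a leaf holding only 22* next to 24* 27* 29* is repaired by redistribution (new separator 27), while the same leaf next to 27* 29* has to merge.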
1.22
Example Tree - Delete 19*
[Figure: Root 17; index nodes 5 13 | 24 30; leaves under the right index node: 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*.]
1.23
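Example Tree - Delete 19*
[Figure: after deleting 19*. Root 17; index nodes 5 13 | 24 30; leaves under the right index node: 20* 22* | 24* 27* 29* | 33* 34* 38* 39*. The leaf 20* 22* is still half-full.]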
1.24
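Example Tree – Now, Delete 20*
[Figure: Root 17; index nodes 5 13 | 24 30; leaves under the right index node: 20* 22* | 24* 27* 29* | 33* 34* 38* 39*. Removing 20* would leave only 22* in its leaf, so: Redistribute.]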
1.25
Example Tree – Delete 20*
[Figure: after redistribution. Root 17; index nodes 5 13 | 27 30; leaves under the right index node: 22* 24* | 27* 29* | 33* 34* 38* 39*.]
1.26
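Example Tree – Then Delete 24*
[Figure: Root 17; index nodes 5 13 | 27 30; leaves under the right index node: 22* | 27* 29* | 33* 34* 38* 39*. Removing 24* causes underflow! The sibling 27* 29* has no entry to spare: can’t redistribute, must merge.]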
1.27
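Example Tree – Delete 24*
[Figure: after the leaf merge. Root 17; index nodes 5 13 | 30; leaves under the right index node: 22* 27* 29* | 33* 34* 38* 39*. The index node holding only 30 now underflows!]
[Figure: after merging the two index nodes and pulling down 17. Root 5 13 17 30; leaves 2* 3* | 5* 7* 8* | 14* 16* | 22* 27* 29* | 33* 34* 38* 39*.]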
1.28
Example of Non-leaf Re-distribution
 Tree is shown below during deletion of 24*. (What
could be a possible initial tree?)
 In contrast to previous example, can re-distribute
entry from left child of root to right child.
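[Figure: Root 22; left index node 5 13 17 20; right index node 30.
Leaves under the left node: 2* 3* | 5* 7* 8* | 14* 16* | 17* 18* | 20* 21*
Leaves under the right node: 22* 27* 29* | 33* 34* 38* 39*]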
1.29
After Re-distribution
 Intuitively, entries are re-distributed by ‘pushing through’ the splitting entry in the parent node.
 It suffices to re-distribute index entry with key 20; we’ve re-distributed 17 as well for illustration.
[Figure: Root 17; left index node 5 13; right index node 20 22 30.
Leaves under the left node: 2* 3* | 5* 7* 8* | 14* 16*
Leaves under the right node: 17* 18* | 20* 21* | 22* 27* 29* | 33* 34* 38* 39*]
1.30
A Note on ‘Order’
 Order (d) concept replaced by physical space criterion in practice (‘at least half-full’).
 Index pages can typically hold many more entries than leaf pages.
 Variable sized records and search keys mean different nodes will
contain different numbers of entries.
 Even with fixed length fields, multiple records with the same search
key value (duplicates) can lead to variable-sized data entries (if we
use Alternative (3)).
 Many real systems are even sloppier than this --- only reclaim
space when a page is completely empty.
1.31
Prefix Key Compression
 Important to increase fan-out. (Why?)
 Key values in index entries only ‘direct traffic’; can often compress them.
 E.g., if we have adjacent index entries with search key values “Dannon Yogurt”, “David Smith” and “Devarakonda Murthy”, we can abbreviate “David Smith” to “Dav”. (The other keys can be compressed too ...)
 Is this correct? Not quite! What if there is a data entry “Davey Jones”? (Can only compress “David Smith” to “Davi”.)
 In general, while compressing, must leave each index entry
greater than every key value (in any subtree) to its left.
 Insert/delete must be suitably modified.
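The compression rule can be sketched as a search for the shortest usable prefix. This assumes plain string comparison and that the largest key in the left subtree is known and is smaller than the key being compressed; `shortest_separator` is a hypothetical name.

```python
def shortest_separator(largest_left, smallest_right):
    """Shortest prefix p of smallest_right with largest_left < p.

    Every prefix of smallest_right is <= smallest_right, so p still routes
    all keys in the right subtree to the right. Assumes
    largest_left < smallest_right.
    """
    for n in range(1, len(smallest_right) + 1):
        prefix = smallest_right[:n]
        if prefix > largest_left:
            return prefix
    return smallest_right

print(shortest_separator("Dannon Yogurt", "David Smith"))   # left subtree ends at Dannon Yogurt
print(shortest_separator("Davey Jones", "David Smith"))     # left subtree contains Davey Jones
```

The first call yields `Dav`; the second must keep one more character and yields `Davi`, because `Dav` is not greater than `Davey Jones`.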
1.32
Bulk Loading of a B+ Tree
 If we have a large collection of records, and we want to create a B+ tree
on some field, doing so by repeatedly inserting records is very slow.
 Also leads to minimal leaf utilization --- why?
 Bulk Loading can be done much more efficiently.
 Initialization: Sort all data entries, insert pointer to first (leaf) page in a new (root) page.
[Figure: a new, empty root page pointing to the first of the sorted pages of data entries, which are not yet in the B+ tree:
3* 4* | 6* 9* | 10* 11* | 12* 13* | 20* 22* | 23* 31* | 35* 36* | 38* 41* | 44*]
1.33
Bulk Loading (Contd.)
 Index entries for leaf pages always entered into right-most index page just above leaf level. When this fills up, it splits. (Split may go up right-most path to the root.)
 Much faster than repeated inserts, especially when one considers locking!
[Figure: Root 10 20; index pages below it: 6 | 12 | 23 35. Data entry pages 3* 4* | 6* 9* | 10* 11* | 12* 13* | 20* 22* | 23* 31* | 35* 36* are in the tree; 38* 41* | 44* are not yet in the B+ tree.]
[Figure: after adding the next leaf page. Root 20; second level 10 | 35; index pages just above the leaves: 6 | 12 | 23 | 38. Data entry page 44* is not yet in the B+ tree.]
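A simplified sketch of bulk loading. Note that it builds each index level in one pass over the level below, rather than inserting into the right-most index page and splitting it as the slide describes; the result is the same kind of bottom-up tree over sequentially packed leaves. The function name and the tuple layout are illustrative, and the last node of a level may end up under-filled.

```python
def bulk_load(sorted_keys, leaf_capacity=2, fanout=3):
    """Return the tree as a list of levels, leaves first.

    Each node is a pair (low_key, payload): payload is the data entries for a
    leaf, or the separator keys for an index node.
    """
    # 1. Pack the sorted data entries into leaf pages, left to right.
    level = [(sorted_keys[i], sorted_keys[i:i + leaf_capacity])
             for i in range(0, len(sorted_keys), leaf_capacity)]
    levels = [level]
    # 2. Build index levels bottom-up until a single root remains.
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            # One separator per child except the first: that child's lowest key.
            separators = [low for low, _ in group[1:]]
            parents.append((group[0][0], separators))
        level = parents
        levels.append(level)
    return levels

keys = [3, 4, 6, 9, 10, 11, 12, 13, 20, 22, 23, 31, 35, 36, 38, 41, 44]
tree = bulk_load(keys)
```

Every leaf is written once and in key order, which is why the leaves end up stored sequentially and why far fewer I/Os are needed than with repeated inserts.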
1.34
Summary of Bulk Loading
 Option 1: multiple inserts.
 Slow.
 Does not give sequential storage of leaves.
 Option 2: Bulk Loading
 Has advantages for concurrency control.
 Fewer I/Os during build.
 Leaves will be stored sequentially (and linked, of course).
 Can control “fill factor” on pages.
1.35
Summary
 Tree-structured indexes are ideal for range-searches, also good for equality
searches.
 ISAM is a static structure.
 Only leaf pages modified; overflow pages needed.
 Overflow chains can degrade performance unless size of data set and data
distribution stay constant.
 B+ tree is a dynamic structure.
 Inserts/deletes leave tree height-balanced; log_F N cost.
 High fanout (F) means depth rarely more than 3 or 4.
 Almost always better than maintaining a sorted file.
1.36
Summary (Contd.)
 Typically, 67% occupancy on average.
 Usually preferable to ISAM; adjusts to growth gracefully.
 If data entries are records, splits can change rids!
 Other topics:
 Key compression increases fanout, reduces height.
 Bulk loading can be much faster than repeated inserts for creating a B+ tree on a
large data set.
 Most widely used index in database management systems because of its
versatility.
 One of the most optimized components of a DBMS.

  • 1. 1.1 CAS CS 460/660 Introduction to Database Systems Tree Based Indexing: B+-tree Slides from UC Berkeley
  • 2. 1.2 How to Build Tree-Structured Indexes  Tree-structured indexing techniques support both range searches and equality searches.  Two examples:  ISAM: static structure; early index technology.  B+ tree: dynamic, adjusts gracefully under inserts and deletes.
  • 3. 1.3 Indexed Sequential Access Method  ISAM is an old-fashioned idea  B+ trees are usually better, as we’ll see  Though not always  But, it’s a good place to start  Simpler than B+ tree, but many of the same ideas
  • 4. 1.4 Range Searches  ``Find all students with gpa > 3.0’’  If data is in sorted file, do binary search to find first such student, then scan to find others.  Cost of binary search on disk is still quite high. Why?  Simple idea: Create an `index’ file.  Can do binary search on (smaller) index file! Page 1 Page 2 Page N Page 3 Data File k2 kN k1 Index File But what if index doesn’t fit easily in memory?
  • 5. 1.5 ISAM We can apply the idea repeatedly! P 0 K 1 P 1 K 2 P 2 K m P m index entry Non-leaf Pages Pages Overflow page Primary pages Leaf
  • 6. 1.6 Example ISAM Tree  Index entries:<search key value, page id> they direct search to data entries in leaves.  Example where each node can hold 2 entries; 10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97* 20 33 51 63 40 Root
  • 7. 1.7 Data Pages ISAM has a STATIC Index Structure File creation: 1. Allocate leaf (data) pages sequentially 2. Sort records by search key 3. Allocate and fill index pages (now the structure is ready for use) 4. Allocate and overflow pages as needed Static tree structure: inserts/deletes affect only leaf pages. ISAM File Layout Index Pages Overflow pages
  • 8. 1.8 ISAM (continued) Search: Start at root; use key comparisons to navigate to leaf. Cost = log F N F = # entries/pg (i.e., fanout) N = # leaf pgs  no need for `next-leaf-page’ pointers. (Why?) Insert: Find leaf that data entry belongs to, and put it there. Overflow page if necessary. Delete: Find; remove from leaf; if empty de- allocate. Data Pages Index Pages Overflow pages
  • 9. 1.9 Example: Insert 23*,48*,41*,42* 48* 10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97* 20 33 51 63 40 Root Overflow Pages Leaf Index Pages Pages Primary 23* 41* 42*
  • 10. 1.10 48* 10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97* 20 33 51 63 40 Root Overflow Pages Leaf Index Pages Pages Primary 23* 41* 42* ... then Deleting 42*, 51*, 97*  Note that 51* appears in index levels, but not in leaf!
  • 11. 1.11 ISAM ---- Issues?  Pros  ????  Cons  ????
  • 12. 1.12 B+ Tree: The Most Widely Used Index Insert/delete at log F N cost; keep tree height-balanced. N = # leaf pages Index Entries Data Entries ("Sequence set") (Direct search) • Each node (except for root) contains m entries: d <= m <= 2d entries. • “d” is called the order of the tree. (maintain 50% min occupancy) • Supports equality and range-searches efficiently. • As in ISAM, all searches go from root to leaves, but structure is dynamic.
  • 13. 1.13 Example B+ Tree: Search begins at the root page, and key comparisons direct it to a leaf (as in ISAM). Search for 5*, 15*, all data entries >= 24* ... Based on the search for 15*, we know it is not in the tree! [Figure: root with keys 13, 17, 24, 30; leaves 2* 3* 5* 7* | 14* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*.]
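The three searches on this slide can be traced with a minimal Python sketch. The `Leaf` and `Index` classes and the `next` pointer between leaves are illustrative assumptions rather than code from the course; the tree itself is the one in the example figure.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, entries):
        self.entries = entries  # sorted data entries
        self.next = None        # link to the next leaf, used by range scans

class Index:
    def __init__(self, keys, children):
        self.keys = keys          # separator keys
        self.children = children  # always len(keys) + 1 children

def find_leaf(node, key):
    """Follow key comparisons from the root down to the leaf that could hold key."""
    while isinstance(node, Index):
        node = node.children[bisect_right(node.keys, key)]
    return node

def search(root, key):
    return key in find_leaf(root, key).entries

def range_from(root, low):
    """All data entries >= low: descend once, then walk the leaf chain."""
    leaf = find_leaf(root, low)
    result = []
    while leaf is not None:
        result.extend(k for k in leaf.entries if k >= low)
        leaf = leaf.next
    return result

# The example tree: one root with keys 13, 17, 24, 30 over five leaves.
leaves = [Leaf([2, 3, 5, 7]), Leaf([14, 16]), Leaf([19, 20, 22]),
          Leaf([24, 27, 29]), Leaf([33, 34, 38, 39])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Index([13, 17, 24, 30], leaves)
```

Here `search(root, 15)` lands in the leaf holding 14* and 16*, finds nothing, and can stop: no other leaf could contain 15*.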
  • 14. 1.14 A Note on Terminology  The “+” in B+Tree indicates a special kind of “B Tree” in which all the data entries reside in leaf pages.  In a vanilla “B Tree”, data entries are sprinkled throughout the tree.  B+Trees are simpler to implement than B Trees.  And since we have a large fanout, the upper levels comprise only a tiny fraction of the total storage space in the tree.  To confuse matters, most database people (like me) call B+Trees “B Trees”!!! (sorry!)
  • 15. 1.15 B+Tree Pages Question: How big should the B+Tree pages (i.e., nodes) be? Hint 1: we want them to be fairly large (to get high fanout). Hint 2: they are typically stored in files on disk. Hint 3: they are typically read from disk into buffer pool frames. Hint 4: when updated, we eventually write them from the buffer pool back to disk. Hint 5: we call them “pages”.
  • 16. 1.16 B+ Trees in Practice: Remember: index nodes are disk pages, i.e., the fixed-length unit of communication with disk. Typical order: 100. Typical fill-factor: 67%. Average fanout = 133. Typical capacities: Height 3: 133^3 = 2,352,637 entries; Height 4: 133^4 = 312,900,721 entries. Can often hold the top levels in the buffer pool: Level 1 = 1 page = 8 Kbytes; Level 2 = 133 pages ≈ 1 Mbyte; Level 3 = 17,689 pages ≈ 138 MBytes.
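The capacity figures follow directly from powers of the fanout, which a short Python check makes explicit. The helper names are hypothetical; the fanout of 133 and the 8 Kbyte page size are the values quoted on the slide.

```python
PAGE_BYTES = 8 * 1024  # 8 Kbyte pages, as on the slide
FANOUT = 133           # average fanout quoted on the slide (about 2 * 100 * 0.67)

def entries_at_height(fanout, height):
    """Entries reachable through a tree with `height` levels of index fanout."""
    return fanout ** height

def pages_at_level(fanout, level):
    """Number of pages on a level, counting the root as level 1."""
    return fanout ** (level - 1)

for height in (3, 4):
    print(f"height {height}: {entries_at_height(FANOUT, height):,} entries")

for level in (1, 2, 3):
    pages = pages_at_level(FANOUT, level)
    mib = pages * PAGE_BYTES / 2**20
    print(f"level {level}: {pages:,} pages, {mib:.1f} MiB")
```

The point of the arithmetic is that the top three levels together need well under 200 MBytes, so they can usually stay in the buffer pool.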
  • 17. 1.17 Inserting a Data Entry into a B+ Tree: Find the correct leaf L. Put the data entry onto L. If L has enough space, done! Else, must split L (into L and a new node L2): redistribute entries evenly, copy up the middle key, and insert an index entry pointing to L2 into the parent of L. This can happen recursively: to split an index node, redistribute entries evenly, but push up the middle key. (Contrast with leaf splits.) Splits “grow” the tree; a root split increases its height. Tree growth: the tree gets wider or one level taller at the top.
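The insertion steps above can be sketched as a short recursive routine. This is a simplified illustration, not a production implementation: the `Node` class and function names are hypothetical, the order is fixed at d = 2 to match the examples, and sibling pointers between leaves are not maintained.

```python
from bisect import bisect_right, insort

D = 2  # order of the tree: non-root nodes hold between D and 2*D entries

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys          # data entries in a leaf, separator keys in an index node
        self.children = children  # None marks a leaf

def _insert(node, key):
    """Insert key below node. Returns (separator, new_right_node) if node split, else None."""
    if node.children is None:
        insort(node.keys, key)
        if len(node.keys) <= 2 * D:
            return None
        right = Node(node.keys[D:])  # leaf split: the new leaf gets D + 1 entries
        node.keys = node.keys[:D]
        return right.keys[0], right  # copy up: the key also stays in the leaf

    i = bisect_right(node.keys, key)
    split = _insert(node.children[i], key)
    if split is None:
        return None
    separator, right_child = split
    node.keys.insert(i, separator)
    node.children.insert(i + 1, right_child)
    if len(node.keys) <= 2 * D:
        return None
    middle = node.keys[D]            # index split: the middle key is pushed up
    right = Node(node.keys[D + 1:], node.children[D + 1:])
    node.keys = node.keys[:D]
    node.children = node.children[:D + 1]
    return middle, right             # the middle key no longer appears in either half

def insert(root, key):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key)
    if split is None:
        return root
    separator, right = split
    return Node([separator], [root, right])  # root split: the tree grows one level
```

Starting from a root with keys 13, 17, 24, 30 over the five example leaves, inserting 23 fits without a split, while inserting 8 splits the first leaf (5 is copied up) and then the root (17 is pushed up).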
  • 18. 1.18 Example B+ Tree - Inserting 23*: [Figure: root with keys 13, 17, 24, 30; leaves 2* 3* 5* 7* | 14* 16* | 19* 20* 22* 23* | 24* 27* 29* | 33* 34* 38* 39*. The new entry 23* fits in the third leaf.]
  • 19. 1.19 Example B+ Tree - Inserting 8*: Notice that the root was split, leading to an increase in height. In this example, we could avoid the split by re-distributing entries; however, this is not done in practice. [Figure: the leaf 2* 3* 5* 7* overflows when 8* arrives and splits into 2* 3* | 5* 7* 8*. The root would then hold 5, 13, 17, 24, 30, so it splits too: the new root holds 17, with index nodes (5, 13) and (24, 30) below it. Leaves: 2* 3* | 5* 7* 8* | 14* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*.]
  • 20. 1.20 Leaf vs. Index Page Split (from the previous example of inserting “8”): Observe how minimum occupancy is guaranteed in both leaf and index page splits. Note the difference between copy-up and push-up; be sure you understand the reasons for this. Leaf page split: 2* 3* 5* 7* 8* becomes 2* 3* | 5* 7* 8*; the entry to be inserted in the parent node is 5. (Note that 5 is copied up and continues to appear in the leaf.) Index page split: 5 13 17 24 30 becomes 5 13 | 24 30; the entry to be inserted in the parent node is 17. (Note that 17 is pushed up and only appears once in the index. Contrast this with a leaf split.)
  • 21. 1.21 Deleting a Data Entry from a B+ Tree: Start at the root, find the leaf L where the entry belongs. Remove the entry. If L is at least half-full, done! If L has only d-1 entries: try to re-distribute, borrowing from a sibling (an adjacent node with the same parent as L); if re-distribution fails, merge L and the sibling. If a merge occurred, must delete the entry (pointing to L or the sibling) from the parent of L. A merge could propagate to the root, decreasing height.
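The leaf-level part of deletion (remove, then redistribute or merge) can be sketched as follows. The sketch makes several assumptions that are not in the slides: leaves are plain Python lists under a single parent, the function name `delete_from_leaf` is hypothetical, the right sibling is preferred when one exists, and propagating an underflow into the parent is left to the caller.

```python
D = 2  # order of the tree: a non-root leaf needs at least D entries

def delete_from_leaf(parent_keys, leaves, i, key):
    """Delete key from leaves[i], where parent_keys[j] separates leaves[j] and leaves[j + 1].

    Mutates both lists in place and returns "ok", "redistributed" or "merged".
    After "merged" the caller must check whether the parent itself underflows.
    """
    leaves[i].remove(key)
    if len(leaves[i]) >= D or len(leaves) == 1:
        return "ok"

    # Pick an adjacent sibling under the same parent (right one if available).
    j = i + 1 if i + 1 < len(leaves) else i - 1
    left, right = min(i, j), max(i, j)

    if len(leaves[j]) > D:
        # Sibling can spare entries: spread them evenly and fix the separator.
        combined = leaves[left] + leaves[right]
        half = len(combined) // 2
        leaves[left], leaves[right] = combined[:half], combined[half:]
        parent_keys[left] = leaves[right][0]
        return "redistributed"

    # Sibling is at minimum occupancy: merge and drop the separator.
    leaves[left] = leaves[left] + leaves[right]
    del leaves[right]
    del parent_keys[left]
    return "merged"
```

Applied to an index node with keys 24, 30 over leaves 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*, deleting 19 needs no repair, deleting 20 redistributes and changes the separator to 27, and deleting 24 forces a merge that leaves the parent with the single key 30.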
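  • 22. 1.22 Example Tree - Delete 19*: [Figure: root 17; index nodes (5, 13) and (24, 30); the leaves under (24, 30) are 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*; the leaves under (5, 13) are elided.]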
  • 23. 1.23 Example Tree - Delete 19*: [Figure: after the deletion, root 17; index nodes (5, 13) and (24, 30); leaves 20* 22* | 24* 27* 29* | 33* 34* 38* 39*. The first leaf is still half-full.]
  • 24. 1.24 Example Tree - Now, Delete 20*: Redistribute. [Figure: root 17; index nodes (5, 13) and (24, 30); leaves 20* 22* | 24* 27* 29* | 33* 34* 38* 39*.]
  • 25. 1.25 Example Tree - Delete 20*: [Figure: after redistribution, root 17; index nodes (5, 13) and (27, 30); leaves 22* 24* | 27* 29* | 33* 34* 38* 39*.]
  • 26. 1.26 Example Tree - Then Delete 24*: Underflow! Can’t redistribute, must merge… [Figure: root 17; index nodes (5, 13) and (27, 30); leaves 22* | 27* 29* | 33* 34* 38* 39*, with 24* removed from the first leaf.]
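  • 27. 1.27 Example Tree - Delete 24*: [Figure, top: after the leaf merge, the right index node holds only 30, over leaves 22* 27* 29* | 33* 34* 38* 39*. Underflow! Figure, bottom: the final tree has a single root with keys 5, 13, 17, 30 over leaves 2* 3* | 5* 7* 8* | 14* 16* | 22* 27* 29* | 33* 34* 38* 39*.]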
  • 28. 1.28 Example of Non-leaf Re-distribution: The tree is shown below during deletion of 24*. (What could be a possible initial tree?) In contrast to the previous example, we can re-distribute an entry from the left child of the root to the right child. [Figure: root 22; left child with keys 5, 13, 17, 20; right child with key 30; leaves 2* 3* | 5* 7* 8* | 14* 16* | 17* 18* | 20* 21* | 22* 27* 29* | 33* 34* 38* 39*.]
  • 29. 1.29 After Re-distribution: Intuitively, entries are re-distributed by `pushing through’ the splitting entry in the parent node. It suffices to re-distribute the index entry with key 20; we’ve re-distributed 17 as well for illustration. [Figure: root 17; left child with keys 5, 13; right child with keys 20, 22, 30; leaves 2* 3* | 5* 7* 8* | 14* 16* | 17* 18* | 20* 21* | 22* 27* 29* | 33* 34* 38* 39*.]
  • 30. 1.30 A Note on `Order’  Order (d) concept replaced by physical space criterion in practice (`at least half-full’).  Index pages can typically hold many more entries than leaf pages.  Variable sized records and search keys mean different nodes will contain different numbers of entries.  Even with fixed length fields, multiple records with the same search key value (duplicates) can lead to variable-sized data entries (if we use Alternative (3)).  Many real systems are even sloppier than this --- only reclaim space when a page is completely empty.
  • 31. 1.31 Prefix Key Compression  Important to increase fan-out. (Why?)  Key values in index entries only `direct traffic’; can often compress them.  E.g., If we have adjacent index entries with search key values Dannon Yogurt, David Smith and Devarakonda Murthy, we can abbreviate David Smith to Dav. (The other keys can be compressed too ...)  Is this correct? Not quite! What if there is a data entry Davey Jones? (Can only compress David Smith to Davi)  In general, while compressing, must leave each index entry greater than every key value (in any subtree) to its left.  Insert/delete must be suitably modified.
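The compression rule on this slide (each index entry must stay greater than every key to its left) can be expressed as a small Python function. The name `shortest_separator` is hypothetical, and the sketch assumes plain string comparison and that the caller knows the largest key in the left subtree.

```python
def shortest_separator(left_max, key):
    """Shortest prefix of key that is still strictly greater than left_max.

    left_max is the largest key value anywhere in the subtree to the left;
    key is the uncompressed index entry. Any prefix of key is <= key, so the
    result still bounds the right subtree from below.
    """
    if not left_max < key:
        raise ValueError("index entry must be greater than every key to its left")
    for n in range(1, len(key)):
        if key[:n] > left_max:
            return key[:n]
    return key  # no proper prefix works; keep the full key

# The slide's example: the answer depends on what is actually in the left subtree.
print(shortest_separator("Dannon Yogurt", "David Smith"))
print(shortest_separator("Davey Jones", "David Smith"))
```

With only Dannon Yogurt to the left, David Smith shrinks to Dav; once Davey Jones is present, the shortest safe prefix becomes Davi.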
  • 32. 1.32 Bulk Loading of a B+ Tree: If we have a large collection of records, and we want to create a B+ tree on some field, doing so by repeatedly inserting records is very slow. It also leads to minimal leaf utilization --- why? Bulk loading can be done much more efficiently. Initialization: sort all data entries, and insert a pointer to the first (leaf) page in a new (root) page. [Figure: sorted pages of data entries, not yet in the B+ tree: 3* 4* | 6* 9* | 10* 11* | 12* 13* | 20* 22* | 23* 31* | 35* 36* | 38* 41* | 44*; the new root page points to the first leaf page.]
  • 33. 1.33 Bulk Loading (Contd.): Index entries for leaf pages are always entered into the right-most index page just above the leaf level. When this fills up, it splits. (The split may go up the right-most path to the root.) Much faster than repeated inserts, especially when one considers locking! [Figure: two snapshots of the build over leaves 3* 4* | 6* 9* | 10* 11* | 12* 13* | 20* 22* | 23* 31* | 35* 36* | 38* 41* | 44*. The first shows index keys 6, 10, 12, 20, 23, 35; the second shows index keys 6, 10, 12, 20, 23, 35, 38. In both, the remaining data entry pages on the right are not yet in the B+ tree.]
  • 34. 1.34 Summary of Bulk Loading  Option 1: multiple inserts.  Slow.  Does not give sequential storage of leaves.  Option 2: Bulk Loading  Has advantages for concurrency control.  Fewer I/Os during build.  Leaves will be stored sequentially (and linked, of course).  Can control “fill factor” on pages.
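A bottom-up build can be sketched in Python as below. Note that this is a deliberately simplified variant, not the procedure on the slides: instead of inserting into the right-most index page and splitting it, it packs the sorted entries into leaves and then builds each index level in one pass. The function name, the dict-based node layout and the `fill` parameter are assumptions for illustration, and the last node on a level may end up under-full.

```python
def bulk_load(sorted_entries, leaf_capacity, fanout, fill=1.0):
    """Build a B+ tree bottom-up from already sorted data entries.

    Nodes are plain dicts. `fill` is the fraction of each leaf to use,
    which is how a fill factor can be controlled during the build.
    """
    if not sorted_entries:
        raise ValueError("need at least one data entry")
    per_leaf = max(1, int(leaf_capacity * fill))

    # Leaves are filled left to right, so they come out in sequential order.
    level = []
    for i in range(0, len(sorted_entries), per_leaf):
        chunk = sorted_entries[i:i + per_leaf]
        level.append((chunk[0], {"leaf": True, "entries": chunk}))

    # Each pass groups up to `fanout` children under one new index node.
    # Every item in `level` is (smallest key in subtree, node).
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            node = {
                "leaf": False,
                "keys": [low for low, _ in group[1:]],  # one separator per extra child
                "children": [child for _, child in group],
            }
            parents.append((group[0][0], node))
        level = parents
    return level[0][1]

entries = [3, 4, 6, 9, 10, 11, 12, 13, 20, 22, 23, 31, 35, 36, 38, 41, 44]
root = bulk_load(entries, leaf_capacity=2, fanout=3)
```

Because every page is written once and never revisited, the build needs far fewer I/Os than repeated inserts, which is the advantage the summary slide lists.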
  • 35. 1.35 Summary: Tree-structured indexes are ideal for range searches, and also good for equality searches. ISAM is a static structure: only leaf pages are modified, and overflow pages are needed; overflow chains can degrade performance unless the size of the data set and the data distribution stay constant. B+ tree is a dynamic structure: inserts/deletes leave the tree height-balanced, at log_F N cost; high fanout (F) means depth is rarely more than 3 or 4; almost always better than maintaining a sorted file.
  • 36. 1.36 Summary (Contd.)  Typically, 67% occupancy on average.  Usually preferable to ISAM; adjusts to growth gracefully.  If data entries are records, splits can change rids!  Other topics:  Key compression increases fanout, reduces height.  Bulk loading can be much faster than repeated inserts for creating a B+ tree on a large data set.  Most widely used index in database management systems because of its versatility.  One of the most optimized components of a DBMS.