The document discusses different database indexing strategies and their performance for query processing. It compares the performance of naive file scanning, sorted file scanning, clustered and unclustered B+ tree indexes, and hash indexes. It also analyzes the performance of processing queries with conjunctions and disjunctions using different indexes.
4. Naive Approach (Heap File)
◦ 1000 pages
Sorted File (on rname)
◦ ???
Clustered Index
◦ ???
Unclustered Index
◦ ???
σ_{R.attr op value}(R)
5. M:= Size of R on Disk (in pages)
X:= # of tuples satisfying “R.attr op value”
Y:= # of tuples per page
No Index, Unsorted Data
◦ File Scan
O(M)
No Index, Sorted Data
◦ Sorted-file Scan
Scan following a Binary Search
O(log₂ M) + O(X/Y)
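The two costs above can be compared with a small calculation. This is a minimal sketch: M = 1000 matches the heap file size used earlier, while the values of X and Y are made up for illustration, and the function names are hypothetical.

```python
import math

def file_scan_cost(M: int) -> int:
    """Unsorted heap file: every page is read, so the cost is M page I/Os."""
    return M

def sorted_scan_cost(M: int, X: int, Y: int) -> int:
    """Sorted file: binary search for the first match, then read the matching pages."""
    search = math.ceil(math.log2(M)) if M > 1 else 1
    matching_pages = math.ceil(X / Y)
    return search + matching_pages

# X and Y are illustrative values, not taken from the slides.
M, X, Y = 1000, 200, 100
print(file_scan_cost(M))          # 1000
print(sorted_scan_cost(M, X, Y))  # 12 (10 for the search + 2 matching pages)
```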
σ_{R.attr op value}(R)
6. B+ Tree Index (Clustered)
◦ Alternative 1 (Data Entry == Data Record)
O(1) + O(X/Y)
◦ Alternative 2,3
O(1) + O(X/Z) + O(X/Y)
Z:= # of Data Entries per Page
Recall Z is much larger than Y
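The clustered B+ tree costs can be sketched the same way. The `descent` parameter stands in for the O(1) term (the root-to-leaf traversal, treated here as a small constant); its default of 3 and the sample values of X, Y and Z are assumptions chosen only for illustration.

```python
import math

def clustered_alt1_cost(X: int, Y: int, descent: int = 3) -> int:
    """Alternative 1: leaves hold the records, so only matching record pages are read."""
    return descent + math.ceil(X / Y)

def clustered_alt23_cost(X: int, Y: int, Z: int, descent: int = 3) -> int:
    """Alternatives 2 and 3: read the matching data-entry pages, then the record pages."""
    return descent + math.ceil(X / Z) + math.ceil(X / Y)

# Illustrative values: Z is larger than Y because data entries are smaller than records.
X, Y, Z = 200, 100, 400
print(clustered_alt1_cost(X, Y))      # 5
print(clustered_alt23_cost(X, Y, Z))  # 6
```

Because Z is much larger than Y, the extra O(X/Z) term adds little to the total.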
σ_{R.attr op value}(R)
7. B+ Tree Index (Unclustered)
◦ Alternative 1 (Data Entry == Data Record)
???
◦ Alternative 2,3
O(1) + O(X/Z) + O(X)
X can be much larger than M
Smart Approach
Sort Data Entries on the basis of page-id or their rids
Any Gain???
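One way to explore the question is to count page fetches for a list of rids before and after sorting. This is a minimal sketch under a deliberately pessimistic assumption: only the most recently fetched page stays in memory. The rids, written as (page-id, slot) pairs, are invented for the example.

```python
def page_fetches(rids):
    """Count page fetches when records are read in the given rid order,
    assuming only the most recently fetched page is kept in memory."""
    fetches = 0
    current_page = None
    for page_id, _slot in rids:
        if page_id != current_page:
            fetches += 1
            current_page = page_id
    return fetches

# Hypothetical rids in the order an unclustered index might return them.
rids = [(7, 1), (2, 0), (7, 3), (2, 4), (5, 2), (7, 0)]
print(page_fetches(rids))          # 6: one fetch per tuple
print(page_fetches(sorted(rids)))  # 3: one fetch per distinct page
```

After sorting, the cost is bounded by the number of distinct pages that hold matching tuples rather than by X.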
σ_{R.attr op value}(R)
8. Hash Index (Only for Equality Selection)
◦ X is typically small
◦ X == 1 implies ???
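The restriction to equality selections can be illustrated with an in-memory dictionary standing in for a hash index. This is only a sketch: the records are invented, the rid is simplified to the record's list position, and `build_hash_index` is a hypothetical helper.

```python
from collections import defaultdict

def build_hash_index(records, attr):
    """Map each value of attr to the list of rids holding that value."""
    index = defaultdict(list)
    for rid, record in enumerate(records):
        index[record[attr]].append(rid)
    return index

records = [{"rname": "Joe"}, {"rname": "Ann"}, {"rname": "Joe"}]
index = build_hash_index(records, "rname")

# Equality selection: a single bucket lookup.
print(index.get("Joe", []))  # [0, 2]

# Range selection: hashing does not preserve order, so every key must be examined.
in_range = sorted(rid for key, rids in index.items() if "A" <= key <= "K" for rid in rids)
print(in_range)  # [0, 1, 2]
```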
σ_{R.attr op value}(R)
10. CNF
◦ Collection of Conjuncts connected through the ∧ Operator
Conjunct
◦ One or more terms connected through the ∨ Operator
◦ If a conjunct contains ∨ then it is called Disjunctive or containing Disjunction
◦ Primary Conjunct if an Index is available
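The CNF vocabulary above can be made concrete with a small data structure. This is a sketch, not a definitive model: the attribute names `bid` and `sid` are hypothetical, and `is_primary` encodes one possible reading of the last bullet, namely that a conjunct counts as primary when it is a single term on an indexed attribute.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Term:
    attr: str
    op: str
    value: object

# A CNF predicate is a list of conjuncts (joined by AND);
# each conjunct is a list of terms (joined by OR).

def is_disjunctive(conjunct):
    """A conjunct with more than one term contains a disjunction."""
    return len(conjunct) > 1

def is_primary(conjunct, indexed_attrs):
    """Assumption: primary means a single term whose attribute has an index."""
    return len(conjunct) == 1 and conjunct[0].attr in indexed_attrs

# (rname = 'Joe' OR bid = 5) AND sid = 3
predicate = [
    [Term("rname", "=", "Joe"), Term("bid", "=", 5)],
    [Term("sid", "=", 3)],
]
indexed_attrs = {"sid"}  # hypothetical: an index exists only on sid

print([is_disjunctive(c) for c in predicate])             # [True, False]
print([is_primary(c, indexed_attrs) for c in predicate])  # [False, True]
```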