• Like
Index management in shallow depth
Upcoming SlideShare
Loading in...5
×

Index management in shallow depth

  • 425 views
Uploaded on

These slides afford in shallow depth the index management question. There are some example on how your choice can change your relation in terms of I/O accesses

These slides afford in shallow depth the index management question. There are some example on how your choice can change your relation in terms of I/O accesses

More in: Technology
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
425
On Slideshare
0
From Embeds
0
Number of Embeds
1

Actions

Shares
Downloads
5
Comments
0
Likes
2

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. Index management in shallow depth Andrea Giuliano @bit_shark
  • 2. Architecture of a DBMS Andrea Giuliano @bit_shark
  • 3. Disk manager The disk manager provides the following commands for a page • • • • allocate a page deallocate a page read a page write a page The size of a page is chosen to be the size of a disk block and pages are stored as disk blocks so that reading or writing a page can be done in one disk I/O Andrea Giuliano @bit_shark
  • 4. Buffer manager is the software layer responsible for bringing pages from disk to main memory as needed and manages the available main memory by partitioning it into a collection of pages the buffer pool Andrea Giuliano @bit_shark
  • 5. Mysql indexes are stored! in B-trees
  • 6. B-tree structure data entry: record stored in an index file data record: record stored in a database file Andrea Giuliano @bit_shark
  • 7. Clustered index data entries and data records have the same (or close) order Andrea Giuliano @bit_shark
  • 8. Unclustered index data entries and data records have the different order criteria Andrea Giuliano @bit_shark
  • 9. Dense index An index is dense if every value of the search key that appears in the data file appears also in at least one data entry of the index Andrea Giuliano @bit_shark
  • 10. Sparse index An index is sparse if every value of the search key that appears in the data entry points to a page of the data record Andrea Giuliano @bit_shark
  • 11. Search in a b-tree fan-out n leaves Cost: logF n Andrea Giuliano @bit_shark
  • 12. Insert and delete Insert and delete operations must keep the tree balanced towards split, redistribution and coalesce techniques. ..too deep Andrea Giuliano @bit_shark
  • 13. How can I compute I/O accesses? Andrea Giuliano @bit_shark
  • 14. Book case BOOK code author cost publisher • • • • • 2.000.000 records (tuples) 200.000 pages 10 data record in a page 200 records with the same value of the attribute cost (on average) dense non-clustering B+-tree index with search key cost Query: ask for code, author, publisher af all books with a given cost how many page accesses do we need to answer to the query? Andrea Giuliano @bit_shark
  • 15. Book case BOOK code author cost publisher • • • • • 2.000.000 records (tuples) 200.000 pages 10 data record in a page 200 records with the same value of the attribute cost (on average) dense non-clustering B+-tree index with search key cost Let’s build the index structure ! • • • we know that each tuple has 4 field so in each page there are 40 fields we can infer that 20 data entries fit in one leaf page of the index so we have a fan-out of 20 Andrea Giuliano @bit_shark
  • 16. Book case fan-out: 20 … We know there is an occupancy factor of 67% we have 13 data entries in the leaves (each of which can contain 20 data entries) How many leaves are there in the tree?
 2.000.000/13 = 153.846 leaves Andrea Giuliano @bit_shark
  • 17. Book case fan-out: 20 … There are 153.846 leaves In order to go to the leaves we need log20 (153.846) = 4 I/O page accesses Andrea Giuliano @bit_shark
  • 18. Book case fan-out: 20 … Remember, we have on average 200 records with the same value of the attribute cost therefore 200/13 = 15 pages (on average)
 We need to visit these leaves because the index is dense and for each tuple we have to access the 200 data record in order to obtain the other attributes Andrea Giuliano @bit_shark
  • 19. Book case fan-out: 20 … The total cost is: 
 4 + 15 + 200 = 219 I/O accesses ~ 3 sec Andrea Giuliano @bit_shark
  • 20. Book case BOOK code author cost publisher • • • • • 2.000.000 records (tuples) 200.000 pages 10 data record in a page 200 records with the same value of the attribute cost (on average) sparse clustering B+-tree index with search key cost Query: ask for code, author, publisher af all books with a given cost how many page accesses do we need to answer to the query? Andrea Giuliano @bit_shark
  • 21. Book case BOOK code author cost publisher • • • • • 2.000.000 records (tuples) 200.000 pages 10 data record in a page 200 records with the same value of the attribute cost (on average) sparse clustering B+-tree index with search key cost Let’s build the index structure ! • • • we know that each tuple has 4 field so in each page there are 40 fields we can infer that 20 data entries fit in one leaf page of the index so we have a fan-out of 20 Andrea Giuliano @bit_shark
  • 22. Book case fan-out: 20 … We know there is an occupancy factor of 67% we have 13 data entries in the leaves (each of which can contain 20 data entries)
 BUT each data entry points to a data record page (and not to a tuple) How many data record pages do we have?
 2.000.000/10 = 200.000 data record pages Andrea Giuliano @bit_shark
  • 23. Book case fan-out: 20 … We know there is an occupancy factor of 67% we have 13 data entries in the leaves (each of which can contain 20 data entries)
 BUT each data entry points to a data record (and not to a tuple) How many leaves there are in the tree?
 200.000/13 = 15.384 leaves Andrea Giuliano @bit_shark
  • 24. Book case fan-out: 20 … There are 15.384 leaves In order to go to the leaves we need log20 (15.384) = 3 I/O pages accesses Remember, we have on average 200 data records with the same value of the attribute cost therefore 200/10 = 20 data record pages to visit Andrea Giuliano @bit_shark
  • 25. Book case fan-out: 20 … The total cost is: 
 3 + 20 = 23 I/O accesses ~ 0.3 sec Andrea Giuliano @bit_shark
  • 26. Book case And what if the attributes we want
 were part of the search key? Andrea Giuliano @bit_shark
  • 27. Book case without index In the worst case we have to visit all the 2.000.000 tuples ~ 50 min Andrea Giuliano @bit_shark
  • 28. Ο λογος δηλοι οτι Think before doing
  • 29. ? Thanks! Andrea Giuliano @bit_shark
  • 30. References: https://www.flickr.com/photos/james_wheeler/9340597900/sizes/o/in/photolist-fep1ko-bQByHkduQ4Qr-82aKA9-82aL6y-8Tn6uc-iPzADZ-99etoQ-cZy6e9-jyqdnW-bxHjLf-8gP59X-cZDq3h-cZDq9dcZDq8N-cZDq7d-cZDqwy-cZDqym-cZDqCf-cZDqAo-cZDqsS-cZDqnQ-cZDqey-cZDqkA-cZDqkJcZDqLh-7Dg8pp-a7f1QC-a7c8rK-7Dg7n6-gCbBVr-9FZ4J1-e6XCpX-aZnsGv-ecTv5D-atFACMgjXozL-9LBjtC-knoEf8-8LGGqw-a8Hw3M-gvL3bp-a7gmG6-aju6p2brQ76S-7Ckbm1-85XaXe-8JBcwN-9oYU3p-a3VsvR-atFAup/ http://www.woking.gov.uk/images/instances/00004A290FD4.C0A801BA.000079A7.0015.jpg http://assets.20bits.com/20080513/b-tree.png, http://dblab.cs.toronto.edu/courses/443/2014/basic-index/dense-index.png http://dblab.cs.toronto.edu/courses/443/2014/basic-index/sparse-index.png http://www.geeky-gadgets.com/wp-content/uploads/2008/10/insert-delete_cufflinks.jpg, http:// www.geeky-gadgets.com/wp-content/uploads/2008/10/insert-delete_cufflinks.jpg http://webhostinggeeks.com/blog/wp-content/uploads/2012/07/611157_small.jpg https://farm7.staticflickr.com/6237/6230474283_50d1f0f4ac_b.jpg, https://www.flickr.com/photos/ javiercosio/6230474283/sizes/l/in/photolistauyNWp-9GRjnM-9GRjmZ-9GRjFz-9GRjvV-9GUcPq-9GRjz8-9GRjm4-9GUcTb-9GRjCt-9GUcQC-9GUcY m-9GRjZD-9GRk1Z-9GUcMU-9GUcGh-9GRjsg-9GRjYZ-9GRjTF-9GRjGe-9GRjNe-9GRjBz-9GUcNs-9GU d25-9GUcKS-9GRjPn-9GRjRg-9GUcBh-9GUcVf-9GUcxj-9GUcuu-brLe7G-e8s4Cwfyi4Rj-83LyYW-83HuFg-83LyLm-83LzUY-83Htrv-83Hv2H-83LBBb-83LAg9-83LBhQ-83Hw8t-83HtKD-83H sYk-afT6uk-cwVhL1-ceVgGC-8tFezr-8SeW9d/