My File Structure Btrees Project Report


This presentation gives a brief description of a B-Tree implementation using File Structures concepts.

Published in: Education, Technology

  1. 1. <ul><li>Project by: Usman Sait A.K. </li></ul>
  2. 2. <ul><li>A file structure is a combination of a representation for data in files and of operations for accessing that data. </li></ul><ul><li>A B-Tree is a balanced, multi-way search tree. It is a method of placing and locating records (identified by keys) in a database. All insertions are made at the leaf level, and the tree grows bottom-up. </li></ul><ul><li>Why B-Trees are used: </li></ul><ul><li>When working with large sets of data, it is often not possible or desirable to maintain the entire structure in primary storage (RAM). </li></ul><ul><li>Instead, a relatively small portion of the data structure is maintained in primary storage, and additional data is read from secondary storage as needed. </li></ul><ul><li>Unfortunately, a magnetic disk, the most common form of secondary storage, is significantly slower than random access memory (RAM). </li></ul><ul><li>B-Trees are balanced trees that are optimized for situations where part or all of the tree must be maintained in secondary storage such as a magnetic disk. </li></ul>
  3. 3. <ul><li>The project consists of: </li></ul><ul><li>Part-1: </li></ul><ul><li>The objective of this part of the project is to create a class STUDENT with variable-length fields and fixed-length records. </li></ul><ul><li>Implementing this part gives us a student database in which the details of students can be stored and retrieved. </li></ul><ul><li>Operations in Part-1: </li></ul><ul><li>Insert records into the file. </li></ul><ul><li>Delete a record from the file. </li></ul><ul><li>Pack: write from the object file to the buffer. </li></ul><ul><li>Unpack: write from the buffer to the object file. </li></ul><ul><li>Update: modify the contents of a record. </li></ul><ul><li>Display the contents of the file. </li></ul><ul><li>Search for a particular record. </li></ul><ul><li>Part-2: </li></ul><ul><li>The objective of this part is to add B-Tree indexes to the data files created in Part-1. </li></ul><ul><li>Operations in Part-2: </li></ul><ul><li>Display the records using B-Trees. </li></ul><ul><li>Display the average space utilization. </li></ul>
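The Pack and Unpack operations listed for Part-1 can be illustrated with a minimal Python sketch. It is not the project's code: the record size, the `|` field delimiter, the `#` padding character, and the names `pack`, `unpack`, and `read_record` are all assumptions made for illustration. Variable-length fields are joined with a delimiter, and the result is padded so that every record occupies the same number of bytes.

```python
RECORD_SIZE = 64  # assumed fixed record length in bytes
DELIM = "|"       # assumed field delimiter
PAD = "#"         # assumed padding character


def pack(fields):
    """Join variable-length fields and pad them into one fixed-length record."""
    for field in fields:
        if DELIM in field or PAD in field:
            raise ValueError("field contains a reserved character")
    body = DELIM.join(fields) + DELIM
    if len(body) > RECORD_SIZE:
        raise ValueError("record does not fit in RECORD_SIZE bytes")
    return body.ljust(RECORD_SIZE, PAD).encode("ascii")


def unpack(record):
    """Strip the padding and split a fixed-length record back into fields."""
    text = record.decode("ascii").rstrip(PAD)
    return text.split(DELIM)[:-1]  # drop the empty piece after the last delimiter


def read_record(data, rrn):
    """Return the fields of record number rrn (0-based) from a byte buffer."""
    start = rrn * RECORD_SIZE
    return unpack(data[start:start + RECORD_SIZE])
```

Because every record has the same length, the record at position `rrn` starts at byte `rrn * RECORD_SIZE`, which is what makes direct access by record number possible.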
  4. 4. <ul><li>Consider the sequence: C D S T A M P I B W N G U R K E H O L J Y Q Z F X V </li></ul><ul><li>A maximum of four key-reference pairs can be stored per node, so this is an order-four B-Tree. </li></ul><ul><li>C, S, D and T are inserted into the initial node. </li></ul><ul><li>When the 5th key, A, is added, the original node is split and the tree grows by one level as a new root is created (split operation). The keys in the root are the largest key in the left leaf, D, and the largest key in the right leaf, T. </li></ul><ul><li>After inserting M P I B W N G U, the B-Tree looks as shown below. The root is now full. </li></ul>[Figures: (1) initial node (C D S T); (2) after inserting A: root (D T), leaves (A C D) and (S T); (3) after inserting M P I B W N G U: root (D M P W), leaves (A B C D), (G I M), (N P), (S T U W).]
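  5. 5. <ul><li>Inserting R splits the rightmost leaf node. Inserting the new leaf's key into the full root splits the root as well, and the tree grows to three levels (recursive split). </li></ul><ul><li>Inserting K E H O L J Y Q Z F X V results in the B-Tree shown below. </li></ul>[Figures: (1) after inserting R: root (P W); second level (D M P) and (T W); leaves (A B C D), (G I M), (N P), (R S T), (U W). (2) after inserting K E H O L J Y Q Z F X V: root (I P Z); second level (D G I), (M P), (T X Z); leaves (A B C D), (E F G), (H I), (J K L M), (N O P), (Q R S T), (U V W X), (Y Z).]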
  6. 6. <ul><li>Insertion operation: </li></ul><ul><li>To perform an insertion on a B-Tree, the appropriate node for the key must first be located. </li></ul><ul><li>Next, the key must be inserted into the node. </li></ul><ul><li>If the node is not full prior to the insertion, no special action is required. However, if the node is full, it must be split to make room for the new key. The split leaves three keys in the left node and two keys in the right node. The parent node holds the largest key of each of the two nodes. If the parent node is itself full, another split operation is required. This process may repeat all the way up to the root and may require splitting the root node. </li></ul><ul><li>Search operation: </li></ul><ul><li>The correct child is chosen by performing a linear search of the values in the node. </li></ul><ul><li>After finding the first value greater than or equal to the desired value, the child pointer to the immediate left of that value is followed. </li></ul><ul><li>If all values are less than the desired value, the rightmost child pointer is followed. </li></ul><ul><li>The search can be terminated as soon as the desired node is found. </li></ul>
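The insertion rule on this slide can be sketched in Python as follows. This is an illustrative sketch, not the project's implementation: it assumes that each key in an interior node is the largest key of the child it refers to (as in the worked example on slides 4 and 5) and that an overflowing node splits three keys to the left and two to the right. The names `Node`, `BTree`, `levels`, and `ORDER` are hypothetical.

```python
ORDER = 4  # maximum number of keys per node, as on the slides


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys          # sorted list of keys
        self.children = children  # None for a leaf, else one child per key


def _insert(node, key):
    """Insert key under node; return the new right sibling if node split."""
    if node.children is None:
        if key not in node.keys:
            node.keys.append(key)
            node.keys.sort()
    else:
        # Linear scan for the first key >= the new key; default to rightmost.
        i = len(node.keys) - 1
        for j, k in enumerate(node.keys):
            if key <= k:
                i = j
                break
        child = node.children[i]
        sibling = _insert(child, key)
        node.keys[i] = child.keys[-1]  # the child's largest key may have changed
        if sibling is not None:
            node.keys.insert(i + 1, sibling.keys[-1])
            node.children.insert(i + 1, sibling)
    if len(node.keys) <= ORDER:
        return None
    # Overflow (five keys): the left node keeps three, the right node gets two.
    cut = (len(node.keys) + 1) // 2
    right_children = None if node.children is None else node.children[cut:]
    right = Node(node.keys[cut:], right_children)
    node.keys = node.keys[:cut]
    if node.children is not None:
        node.children = node.children[:cut]
    return right


class BTree:
    def __init__(self):
        self.root = Node([])

    def insert(self, key):
        right = _insert(self.root, key)
        if right is not None:  # the root split, so the tree grows by one level
            left = self.root
            self.root = Node([left.keys[-1], right.keys[-1]], [left, right])


def levels(tree):
    """Return the keys of every node, level by level (string keys only)."""
    out, row = [], [tree.root]
    while row:
        out.append(["".join(n.keys) for n in row])
        row = [c for n in row if n.children is not None for c in n.children]
    return out


tree = BTree()
for k in "CSDTAMPIBWNGU":
    tree.insert(k)
print(levels(tree))  # [['DMPW'], ['ABCD', 'GIM', 'NP', 'STUW']]
```

The printed result matches the full-root tree described on slide 4. Inserting R next splits both the rightmost leaf and the root, as slide 5 describes.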
  7. 7. <ul><li>The linear search technique is used for the search operation. </li></ul><ul><li>Searching for S traverses the B-Tree from the root down to the leaf node (R S T) in the right subtree. </li></ul><ul><li>Searching for a key that is not present also traverses the B-Tree, guided by the keys in each parent node. </li></ul>[Figure: root (P W); second level (D M P) and (T W); leaves (A B C D), (G I M), (N P), (R S T), (U W).]
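The search described on slides 6 and 7 can be sketched in Python on the tree from this slide. The tuple representation, the helper `leaf`, and the function `search` are assumptions made for illustration. Each interior key is paired with the child whose largest key it is, so the child belonging to the first key greater than or equal to the target is followed, and the rightmost child is followed when every key is smaller.

```python
def leaf(keys):
    """Build a leaf node: (list of keys, no children)."""
    return (list(keys), None)


# The three-level tree from slide 7; each node is (keys, children).
TREE = (["P", "W"], [
    (["D", "M", "P"], [leaf("ABCD"), leaf("GIM"), leaf("NP")]),
    (["T", "W"], [leaf("RST"), leaf("UW")]),
])


def search(node, key, path=None):
    """Return (found, path); path lists the keys of every node visited."""
    path = [] if path is None else path
    keys, children = node
    path.append("".join(keys))
    if children is None:
        return key in keys, path
    for k, child in zip(keys, children):  # linear search within the node
        if key <= k:
            return search(child, key, path)
    return search(children[-1], key, path)  # all keys smaller: go rightmost


print(search(TREE, "S"))  # (True, ['PW', 'TW', 'RST'])
print(search(TREE, "H"))  # (False, ['PW', 'DMP', 'GIM'])
```

The second call shows that a key missing from the tree is still traced down to the leaf where it would belong before the search reports a miss.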
  8. 8. <ul><li>Rules for deleting a key k from a node n: </li></ul><ul><li>If n has more than the minimum number of keys and k is not the largest key in n, simply delete k from n. </li></ul><ul><li>If n has more than the minimum number of keys and k is the largest key in n, delete k and modify the higher-level indexes to reflect the new largest key in n. </li></ul><ul><li>If n has exactly the minimum number of keys and one of the siblings of n has few enough keys, merge n with its sibling and delete a key from the parent node. </li></ul><ul><li>If n has exactly the minimum number of keys and one of the siblings of n has extra keys, redistribute by moving some keys from a sibling to n, and modify the higher-level indexes to reflect the new largest keys in the affected nodes. </li></ul><ul><li>Redistribution: </li></ul><ul><li>Redistribution is a new idea that can restore the B-Tree properties by moving one key from a sibling into the node that has underflowed, even if the distribution of the keys between the pages is very uneven. </li></ul><ul><li>Redistribution during insertion is a way to avoid, or at least postpone, the creation of new nodes. </li></ul>
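  9. 9. <ul><li>Removing a key that is not the largest in its leaf node changes only that leaf; the higher levels are unaffected. </li></ul><ul><li>Deleting P: P changes to O in the second level and in the root. </li></ul><ul><li>Removing H causes an underflow, which results in the merging of two leaf nodes. </li></ul>[Figures: (1) after deleting P: root (I O Z); second level (D G I), (M O), (T X Z); leaves (A B C D), (E F G), (H I), (J K L M), (N O), (Q R S T), (U V W X), (Y Z). (2) after deleting H: root (I O Z); second level (D I), (M O), (T X Z); leaves (A B C D), (E F G I), (J K L M), (N O), (Q R S T), (U V W X), (Y Z).]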
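The four deletion rules from slide 8 can be expressed as a small decision function. This Python sketch only names the rule that applies; it does not modify a tree. The function name `deletion_rule` is hypothetical, and the minimum of two keys per node is an assumption (half of the order-four maximum), since the slides do not state the minimum explicitly. Merge is tested before redistribution, following the order in which the slides list the rules.

```python
MAX_KEYS = 4              # order of the tree on the slides
MIN_KEYS = MAX_KEYS // 2  # assumed minimum occupancy; not stated on the slides


def deletion_rule(node, key, siblings):
    """Name the rule that applies when key is deleted from node.

    node is a sorted list of keys; siblings is a list of the key lists of
    the adjacent sibling nodes (empty for the root).
    """
    if key not in node:
        raise KeyError(key)
    if not siblings:
        return "delete"  # the root has no siblings and no parent to update
    if len(node) > MIN_KEYS:
        if key == node[-1]:
            return "delete and update parent"
        return "delete"
    remaining = len(node) - 1
    if any(len(s) + remaining <= MAX_KEYS for s in siblings):
        return "merge"
    return "redistribute"  # every sibling is too full to merge with


print(deletion_rule(list("NOP"), "P", [list("JKLM")]))  # delete and update parent
print(deletion_rule(list("HI"), "H", [list("EFG")]))    # merge
```

The two calls correspond to the deletions of P and H on slide 9: removing P changes the largest key of its leaf, and removing H leaves a single key that fits into the sibling (E F G).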
  10. 10. <ul><li>A database is a collection of data organized in a fashion that facilitates updating, retrieving, and managing the data. The data can consist of anything, including, but not limited to, names, addresses, pictures, and numbers. Databases are commonplace and are used every day. </li></ul><ul><li>For example, an airline reservation system might maintain a database of available flights, customers, and tickets issued. A teacher might maintain a database of student names and grades. </li></ul><ul><li>For a database to be useful and usable, it must support the desired operations, such as retrieval and storage, quickly. Because databases typically cannot be maintained entirely in memory, B-Trees are often used to index the data and to provide fast access. </li></ul><ul><li>For example, searching an unindexed and unsorted database containing n key values has a worst-case running time of O(n); if the same data is indexed with a B-Tree, the same search operation runs in O(log n). </li></ul>
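To make the O(n) versus O(log n) comparison concrete, the following Python sketch counts how many levels an index needs for n keys. It is a rough illustration under a stated assumption: every node is completely full and, as in the example tree, each interior key refers to one child. The function name `index_levels` and the fan-out values are hypothetical.

```python
def index_levels(n, fanout):
    """Levels needed to index n keys when every node holds `fanout` keys."""
    levels, capacity = 1, fanout
    while capacity < n:
        capacity *= fanout  # each extra level multiplies the reachable keys
        levels += 1
    return levels


n = 1_000_000
print(n)                     # worst-case records examined by a linear scan
print(index_levels(n, 4))    # 10 levels with four keys per node
print(index_levels(n, 100))  # 3 levels with one hundred keys per node
```

Each level costs one node read, so the number of levels approximates the number of disk accesses per search. Larger nodes reduce the number of levels, which is why B-Tree nodes are usually sized to match a disk page.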