Indexing Data Structure

Published in: Education, Technology, Business

  1. 1. Vivek Kantariya (09bce020) Guided by: Prof. Vibha Patel
  2. 2. <ul><li>Manage large data </li></ul><ul><li>Provide faster access </li></ul><ul><li>Easy search </li></ul><ul><li>Reduce unwanted memory access </li></ul><ul><li>Proper memory allocation </li></ul><ul><li>Increase efficiency </li></ul>
  3. 3. <ul><li>An index entry contains a search key and a pointer. </li></ul><ul><li>Search key - an attribute or set of attributes used to look up records in a file. </li></ul><ul><li>Pointer - the address at which the data is stored in memory. </li></ul>
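The search key and pointer pairing can be sketched as follows. This is a minimal illustration, not a definitive implementation: the `records` list, the `roll_no` field, and the `lookup` helper are hypothetical names chosen for the example, and list positions stand in for memory addresses.

```python
import bisect

# Hypothetical data file: list positions stand in for addresses in memory.
records = [
    {"roll_no": 42, "name": "Asha"},
    {"roll_no": 7, "name": "Ravi"},
    {"roll_no": 19, "name": "Meera"},
]

# Each index entry is a (search key, pointer) pair; entries are sorted by key.
index = sorted((rec["roll_no"], pos) for pos, rec in enumerate(records))
keys = [key for key, _ in index]

def lookup(search_key):
    """Binary-search the index, then follow the pointer to the record."""
    i = bisect.bisect_left(keys, search_key)
    if i < len(keys) and keys[i] == search_key:
        _, pointer = index[i]
        return records[pointer]
    return None  # key not present in the index

print(lookup(19))
```

Only the small sorted index is searched; the data itself is touched once, through the pointer.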
  4. 4. <ul><li>Five Factors involved when choosing the indexing technique: </li></ul><ul><li>access type </li></ul><ul><li>access time </li></ul><ul><li>insertion time </li></ul><ul><li>deletion time </li></ul><ul><li>space overhead </li></ul>
  5. 5. <ul><li>Access type - the kind of access supported, such as finding records with a specific value or within a range of values. </li></ul><ul><li>Access time - time required to locate the data. </li></ul><ul><li>Insertion time - time required to insert new data. </li></ul><ul><li>Deletion time - time required to delete data. </li></ul><ul><li>Space overhead - the additional space occupied by the index structure. </li></ul>
  6. 6. <ul><li>Spatial indexing is for multi-dimensional data. </li></ul><ul><li>Used to describe 2D or 3D objects. </li></ul><ul><li>Real world usage. </li></ul><ul><li>Examples are: </li></ul><ul><li>R tree, R+ tree, KD tree, A tree, Hilbert tree, etc. </li></ul>
  7. 7. <ul><li>Computer Aided Design (CAD) </li></ul><ul><li>Geographic applications (like maps) </li></ul><ul><li>Multimedia Applications (like X-rays) </li></ul><ul><li>Biological Databases </li></ul>
  8. 8. <ul><li>Any Type of Geometry </li></ul><ul><ul><li>Point </li></ul></ul><ul><ul><ul><li>City </li></ul></ul></ul><ul><ul><li>Line </li></ul></ul><ul><ul><ul><li>Trail </li></ul></ul></ul><ul><ul><li>Polygon </li></ul></ul><ul><ul><ul><li>Border </li></ul></ul></ul><ul><ul><li>A Collection of Geometries </li></ul></ul><ul><ul><ul><li>Ski Resort Trails </li></ul></ul></ul><ul><li>Any Coordinate System </li></ul><ul><ul><li>Meters </li></ul></ul><ul><ul><li>Pixels </li></ul></ul><ul><ul><li>WGS84 (GPS) </li></ul></ul>
  9. 10. <ul><li>Proposed by </li></ul><ul><ul><li>Antonin Guttman </li></ul></ul><ul><ul><li>UC Berkeley </li></ul></ul><ul><li>All Spatial Data Enveloped </li></ul><ul><ul><li>Minimum Bounding Rectangle (MBR) </li></ul></ul><ul><li>Stored and Indexed According to MBR </li></ul><ul><li>Structure Resembles B+-tree </li></ul><ul><ul><li>Height Balanced </li></ul></ul>
  10. 11. <ul><li>For an index record <I, tuple-identifier> </li></ul><ul><ul><li>I = (I_0, I_1, …, I_{n-1}) </li></ul></ul><ul><ul><ul><li>n = Number of Dimensions in the Geometry </li></ul></ul></ul><ul><ul><ul><li>Each I_i is a closed interval [a, b] describing the extent of the rectangle along dimension i </li></ul></ul></ul><ul><ul><ul><ul><li>a or b can be equal to infinity </li></ul></ul></ul></ul><ul><ul><li>Tuple-identifier points to a record </li></ul></ul><ul><li>Non-leaf nodes contain entries of the form: </li></ul><ul><li><I, child-pointer> </li></ul>
  11. 12. <ul><ul><li>M is the maximum number of entries in one node </li></ul></ul><ul><ul><li>m specifies the minimum number of entries in a node, where m ≤ M/2 </li></ul></ul><ul><ul><li>Properties: </li></ul></ul><ul><ul><li>Every leaf node contains between m and M index records unless it is the root. </li></ul></ul><ul><ul><li>For each index record <I, tuple-identifier> in a leaf node, I is the smallest rectangle that spatially contains the n-dimensional data object. </li></ul></ul>
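One way to picture the rectangle I is as a tuple of per-dimension intervals. The sketch below assumes that representation; the helper names `contains_point` and `mbr`, the sample geometries, and the identifier `"tuple-17"` are all hypothetical.

```python
import math

# A rectangle I is a tuple of (a, b) intervals, one per dimension.
def contains_point(rect, point):
    """True if the point lies inside the rectangle in every dimension."""
    return all(a <= x <= b for (a, b), x in zip(rect, point))

def mbr(rects):
    """Smallest rectangle that encloses every rectangle in rects."""
    dims = len(rects[0])
    return tuple(
        (min(r[d][0] for r in rects), max(r[d][1] for r in rects))
        for d in range(dims)
    )

city = ((2.0, 2.0), (5.0, 5.0))    # a point: degenerate interval per dimension
trail = ((0.0, 4.0), (1.0, 3.0))   # bounding box of a line
half_plane = ((0.0, math.inf), (-math.inf, math.inf))  # endpoints may be infinite

leaf_record = (trail, "tuple-17")  # <I, tuple-identifier>
print(mbr([city, trail]))
```

A non-leaf entry would pair the same kind of rectangle with a child node instead of a tuple identifier.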
  12. 13. <ul><li>Every non-leaf node has between m and M children unless it is the root. </li></ul><ul><li>For each entry <I, child-pointer> in a non-leaf node, I is the smallest rectangle that spatially contains the rectangles in the child nodes. </li></ul><ul><li>The root node has at least two children unless it is a leaf. </li></ul><ul><li>All leaves appear on the same level. </li></ul>
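The occupancy and balance properties listed on these two slides can be checked mechanically. The following is a sketch under assumptions: the `Node` class and the function names are invented for illustration, and the check covers only entry counts and leaf depth, not the tightness of the bounding rectangles.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    leaf: bool
    # Each entry is (rect, tuple_id) in a leaf or (rect, child Node) otherwise.
    entries: list = field(default_factory=list)

def leaf_depths(node, depth=0):
    """Collect the set of depths at which leaves occur."""
    if node.leaf:
        return {depth}
    depths = set()
    for _, child in node.entries:
        depths |= leaf_depths(child, depth + 1)
    return depths

def occupancy_ok(node, m, M, is_root=True):
    """Check the m..M entry bounds, with the relaxed rule for the root."""
    n = len(node.entries)
    if is_root:
        ok = n <= M and (node.leaf or n >= 2)
    else:
        ok = m <= n <= M
    if node.leaf:
        return ok
    return ok and all(occupancy_ok(c, m, M, False) for _, c in node.entries)

def is_valid(root, m, M):
    """All nodes respect the bounds and all leaves sit on one level."""
    return occupancy_ok(root, m, M) and len(leaf_depths(root)) == 1

leaf_a = Node(True, [(((0, 1), (0, 1)), "t1"), (((1, 2), (0, 1)), "t2")])
leaf_b = Node(True, [(((8, 9), (8, 9)), "t3"), (((9, 10), (8, 9)), "t4")])
root = Node(False, [(((0, 2), (0, 1)), leaf_a), (((8, 10), (8, 9)), leaf_b)])
print(is_valid(root, m=2, M=4))
```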
  13. 15. <ul><li>Search </li></ul><ul><li>Insert </li></ul><ul><li>Delete </li></ul><ul><li>Nearest Neighbor </li></ul>
  14. 16. <ul><li>Given an R-tree with root T, find all index records whose rectangles overlap a search rectangle S. </li></ul><ul><li>If T is not a leaf, check each entry E to determine whether its rectangle E.I overlaps S. </li></ul><ul><li>For every overlapping entry, invoke search on the subtree whose root is the node pointed to by E.p. </li></ul><ul><li>If T is a leaf, check each entry E. If E.I overlaps S, output it. </li></ul>
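The search procedure above can be sketched recursively. This is an illustrative sketch, assuming nodes are plain dictionaries with a `leaf` flag and a list of `(rect, ref)` entries; that layout and the sample tree are assumptions, not something the slides specify.

```python
def overlaps(r, s):
    """Two rectangles overlap if their intervals overlap in every dimension."""
    return all(a1 <= b2 and a2 <= b1 for (a1, b1), (a2, b2) in zip(r, s))

def search(node, s):
    """Return tuple identifiers of all leaf entries whose rectangle overlaps s."""
    results = []
    for rect, ref in node["entries"]:
        if not overlaps(rect, s):
            continue  # prune this entry and everything beneath it
        if node["leaf"]:
            results.append(ref)             # ref is a tuple identifier
        else:
            results.extend(search(ref, s))  # ref is a child node
    return results

leaf1 = {"leaf": True, "entries": [(((0, 2), (0, 2)), "a"), (((3, 4), (1, 2)), "b")]}
leaf2 = {"leaf": True, "entries": [(((6, 8), (6, 8)), "c"), (((7, 9), (2, 3)), "d")]}
root = {"leaf": False, "entries": [(((0, 4), (0, 2)), leaf1), (((6, 9), (2, 8)), leaf2)]}

print(search(root, ((1, 7), (1, 2.5))))
```

In this example the search rectangle overlaps both root entries, so both subtrees are visited. A B+-tree lookup follows one path; an R-tree search may follow several.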
  15. 17. <ul><li>Start at the root node </li></ul><ul><li>Select the child that needs the least enlargement in order to fit the new geometry. </li></ul><ul><li>Repeat until at a leaf node. </li></ul><ul><li>If leaf node has available space then insert. </li></ul>
  16. 18. <ul><li>Else split the full leaf node into two nodes. </li></ul><ul><ul><li>Update parent nodes </li></ul></ul><ul><ul><li>Update the entry that pointed to the node with a new MBR [ Minimum Bounding Rectangle ]. </li></ul></ul><ul><ul><li>Add a new entry for the second new node </li></ul></ul><ul><li>If there is no space in the parent node, split it and repeat. </li></ul>
  17. 19. <ul><li>Split nodes so that the resulting bounding rectangles cover the smallest possible area. </li></ul><ul><li>A split should minimize average search time. </li></ul>[Figure: a good split compared with a bad split]
  18. 20. <ul><li>Remove index record E from the R-tree. </li></ul><ul><li>Find the leaf node containing the record. </li></ul><ul><li>Remove E. </li></ul><ul><li>If the node now contains fewer than m entries, remove the node and add it to a queue. </li></ul><ul><li>Move up toward the root and do the same at each level, shrinking covering rectangles. </li></ul><ul><li>Reinsert all entries from the nodes in the queue. </li></ul>
  19. 21. <ul><ul><li>R+-tree: split entries in the tree so that there is no overlap between MBRs </li></ul></ul><ul><ul><li>No more multiple paths to reach a solution </li></ul></ul><ul><ul><li>Pointers may be duplicated within the tree </li></ul></ul>[Figure: R-tree MBRs compared with R+-tree MBRs]
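The descent step of insertion, choosing the child that needs the least enlargement, might look like the sketch below. It covers only the choice of leaf; node splitting and MBR updates are omitted. The dictionary node layout, the helper names, and the tie-break on smaller area are assumptions made for the example.

```python
def area(rect):
    """Product of the interval lengths across all dimensions."""
    result = 1.0
    for a, b in rect:
        result *= b - a
    return result

def union(r, s):
    """Smallest rectangle covering both r and s."""
    return tuple((min(a1, a2), max(b1, b2)) for (a1, b1), (a2, b2) in zip(r, s))

def enlargement(rect, new):
    """Extra area rect would need in order to also cover new."""
    return area(union(rect, new)) - area(rect)

def choose_leaf(node, new):
    """Descend from the root, always taking the least-enlargement child."""
    while not node["leaf"]:
        # Ties are broken by the smaller current area (an assumption here).
        _, node = min(
            node["entries"],
            key=lambda e: (enlargement(e[0], new), area(e[0])),
        )
    return node

leaf1 = {"leaf": True, "entries": [(((0, 2), (0, 2)), "a"), (((3, 4), (1, 2)), "b")]}
leaf2 = {"leaf": True, "entries": [(((6, 8), (6, 8)), "c"), (((7, 9), (2, 3)), "d")]}
root = {"leaf": False, "entries": [(((0, 4), (0, 2)), leaf1), (((6, 9), (2, 8)), leaf2)]}

new_rect = ((5, 5.5), (1, 1.5))
target = choose_leaf(root, new_rect)
print(target is leaf1)
```

Here the first child's MBR would grow by 3 area units and the second by 10, so the new rectangle is routed to the first leaf.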
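To make the area criterion concrete, the sketch below tries every way of dividing a small set of rectangles into two groups and keeps the division with the smallest total MBR area. Exhaustive search is used here only because it is easy to verify; it grows exponentially with node size, and the function names and sample rectangles are hypothetical.

```python
from itertools import combinations

def area(rect):
    result = 1.0
    for a, b in rect:
        result *= b - a
    return result

def mbr(rects):
    """Smallest rectangle enclosing all rectangles in rects."""
    dims = len(rects[0])
    return tuple(
        (min(r[d][0] for r in rects), max(r[d][1] for r in rects))
        for d in range(dims)
    )

def best_split(rects, m):
    """Return (total_area, group1, group2) with at least m entries per group."""
    n = len(rects)
    best = None
    for k in range(m, n - m + 1):
        for group in combinations(range(n), k):
            if 0 not in group:
                continue  # count each partition once, not its mirror image
            rest = [i for i in range(n) if i not in group]
            cost = area(mbr([rects[i] for i in group])) + area(mbr([rects[i] for i in rest]))
            if best is None or cost < best[0]:
                best = (cost, list(group), rest)
    return best

rects = [((0, 1), (0, 1)), ((1, 2), (0, 1)), ((8, 9), (8, 9)), ((9, 10), (8, 9))]
print(best_split(rects, m=2))
```

Grouping the two nearby pairs gives a total area of 4, while either mixed grouping gives 162, which is the difference between a good split and a bad one.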
  20. 22. <ul><ul><li>Do not split nodes on insert </li></ul></ul><ul><ul><li>Take entries from the overfull node and reinsert them into the tree </li></ul></ul><ul><ul><ul><li>Changes MBRs </li></ul></ul></ul><ul><ul><li>Saves time and possibly rebalances the tree </li></ul></ul>
  21. 23. <ul><li>www.ieeexplore.ieee.org </li></ul><ul><ul><li>A NEW APPROACH TO CREATING SPATIAL INDEX WITH R-TREE by </li></ul></ul><ul><ul><li>Ze-Bao Zhang, Jian-Pei Zhang, Jing Yang, Yue Yang </li></ul></ul><ul><ul><li>A NEW VARIATION OF R-TREE FOR INDEXING SPACIAL DATA IN GIS by </li></ul></ul><ul><ul><li>Chen Yongkang , Zhou Xintie , Shi Tailai , Feng Xiaoming </li></ul></ul><ul><li>http://wikipedia.org/wiki/R_tree </li></ul>
