  1. Chap 2 – Dynamic Versions of R-trees
     R-Trees: Theory and Applications
     Advisor: Kun-Ta Chuang
     Student: Bo-Heng Chen
  2. Abstract
     • Introduction
     • Assorted R-tree Variants
     • Node Splitting
     • Branch Grafting
     • Compact R-trees & cR-trees
     • Deviating Variations
     • Summary
  3. Introduction
     • Mentor
       – Volker Gaede
         • http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/g/Gaede:Volker.html
       – Oliver Günther, U. Potsdam
         • http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/g/G=uuml=nther:Oliver.html
  4. (cont.)
     • Dynamic versions of the R-tree
       – The objects are inserted on a one-by-one basis
     • Focus: how the assorted R-tree variants handle
       – dynamic insertions
       – dynamic splits
  5. R+-tree
     • The original R-tree has two important disadvantages
       – The execution of a point location query in an R-tree may lead to the investigation of several paths from the root to the leaf level
         • This characteristic may lead to performance deterioration, specifically when the overlap of the MBRs is significant
       – A few large rectangles may increase the degree of overlap significantly
         • Leading to performance degradation during range query execution, due to empty space
  6. (cont.)
     • R+-trees were proposed as a structure that
       – avoids visiting multiple paths during point location queries
         • Thus, the retrieval performance could be improved
       – avoids MBR overlapping of internal nodes
         • R+-trees do not allow overlapping of MBRs at the same tree level
         • To achieve this, a specific object's entries may be duplicated and redundantly stored in several nodes
  7. (cont.) Object d is stored in two leaf nodes, B and C!
  8. (cont.)
     • R+-tree – Insert
       [Figure: Scenario 1, insertion of object m. Callouts: "Call SplitNode(B)!", "Insert(m, A.ptr)", "Call Insert(m, D.ptr)!", "Perform appropriate tree reorganization to reflect changes"]
  9. (cont.)
     • R+-tree – Delete
       – All copies of an object's MBR must be removed from the corresponding leaf nodes
       [Figure: deletion of object m, stored under entries A and D. Callouts: "Call Delete(m, A.ptr)!", "Call Delete(m, D.ptr)!", "Remove m from A.ptr and D.ptr", "Calculate the new MBR of the node", "Adjust the MBR of the parent node accordingly"]
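To make the duplication concrete, here is a minimal Python sketch. It assumes a single level of leaves whose regions are fixed and disjoint. The tree above the leaves, node splitting, and downward propagation are all omitted. The names `Leaf`, `insert`, `delete`, and `point_query` are hypothetical. The sketch mirrors the object d example: an MBR that straddles two leaf regions is stored in both, a point query examines only one leaf, and deletion must visit every overlapping leaf to remove all copies.

```python
from dataclasses import dataclass, field

Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: Rect, b: Rect) -> bool:
    """True if the closed rectangles a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains_point(r: Rect, x: float, y: float) -> bool:
    return r[0] <= x <= r[2] and r[1] <= y <= r[3]


@dataclass
class Leaf:
    region: Rect  # assumed disjoint from every other leaf's region
    entries: dict[str, Rect] = field(default_factory=dict)  # oid -> object MBR


def insert(leaves: list[Leaf], oid: str, mbr: Rect) -> None:
    # R+-tree style: store a copy in EVERY leaf whose region the MBR overlaps.
    for leaf in leaves:
        if intersects(leaf.region, mbr):
            leaf.entries[oid] = mbr


def delete(leaves: list[Leaf], oid: str, mbr: Rect) -> None:
    # Every copy must go, so visit every leaf that the MBR overlaps.
    for leaf in leaves:
        if intersects(leaf.region, mbr):
            leaf.entries.pop(oid, None)


def point_query(leaves: list[Leaf], x: float, y: float) -> list[str]:
    # Regions do not overlap, so (away from shared borders) only one leaf,
    # i.e. a single root-to-leaf path, is examined.
    for leaf in leaves:
        if contains_point(leaf.region, x, y):
            return [oid for oid, r in leaf.entries.items() if contains_point(r, x, y)]
    return []


B = Leaf((0, 0, 5, 10))
C = Leaf((5, 0, 10, 10))
insert([B, C], "d", (4, 2, 6, 3))          # d straddles the B/C border
print("d" in B.entries, "d" in C.entries)  # True True
```

Storing copies is the price of disjoint regions. Updates touch more leaves, but a point query never has to try a second path.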
  10. (cont.)
      • Main difference between the R+-tree splitting algorithm and that of the R-tree
        – In the R-tree, upward propagation is sufficient to guarantee the structure's integrity
        – In the R+-tree, downward propagation may be necessary, in addition to the upward propagation (splitting an internal node along a line can cut child regions that cross that line, and those children must then be split as well)
  11. R*-tree
      • As already discussed, the R-tree is based solely on the area minimization of each MBR
      • The criteria considered by the R*-tree are the following
        – Minimization of the area covered by each MBR
        – Minimization of overlap between MBRs
        – Minimization of MBR margins (perimeters)
        – Maximization of storage utilization
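The four R*-tree criteria reduce to a few simple measurements on rectangles. The minimal Python sketch below assumes 2-D rectangles stored as `(xmin, ymin, xmax, ymax)` tuples. The helper names are illustrative and do not come from the source. It shows that two rectangles of equal area can still differ in margin, which is why the R*-tree tracks margin separately and favors square-like MBRs.

```python
Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def margin(r: Rect) -> float:
    # Perimeter of a 2-D rectangle (sum of its edge lengths).
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def overlap(a: Rect, b: Rect) -> float:
    # Area of the intersection of a and b (0 if they are disjoint or only touch).
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0


def utilization(num_entries: int, num_nodes: int, capacity: int) -> float:
    # Fraction of the available entry slots that are actually used.
    return num_entries / (num_nodes * capacity)


square, sliver = (0, 0, 2, 2), (0, 0, 1, 4)
print(area(square), area(sliver))      # 4 4   (area cannot tell them apart)
print(margin(square), margin(sliver))  # 8 10  (margin prefers the square)
```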
  12. (cont.)
      • R*-tree – Insert
        – For the insertion of a new entry, we have to decide which branch to follow at each level of the tree
          • This algorithm is called ChooseSubtree [19]
        – In general, choose the entry whose MBR needs the least area enlargement to cover the new entry k
        – When the candidate children are leaf nodes, ChooseSubtree considers the overlap minimization criterion instead, because of the experimental results in [19]
      [19] N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger: "The R∗-tree: an Efficient and Robust Method for Points and Rectangles", Proceedings ACM SIGMOD Conference on Management of Data, pp. 322-331, Atlantic City, NJ, 1990.
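Here is a minimal Python sketch of ChooseSubtree. It assumes 2-D `(xmin, ymin, xmax, ymax)` tuples and the tie-breaking order described in [19]. When the children are leaves, it compares overlap enlargement, then area enlargement, then area. Otherwise it compares area enlargement, then area. `choose_subtree` and its parameters are hypothetical names. The example is constructed so that the two criteria disagree.

```python
Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def overlap(a: Rect, b: Rect) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0


def choose_subtree(children: list[Rect], new: Rect, children_are_leaves: bool) -> int:
    """Return the index of the child entry that the new rectangle descends into."""

    def area_enlargement(i: int) -> float:
        return area(union(children[i], new)) - area(children[i])

    def overlap_enlargement(i: int) -> float:
        # Growth of child i's total overlap with its siblings if it absorbs `new`.
        grown = union(children[i], new)
        others = [c for j, c in enumerate(children) if j != i]
        before = sum(overlap(children[i], c) for c in others)
        return sum(overlap(grown, c) for c in others) - before

    def key(i: int):
        if children_are_leaves:
            return (overlap_enlargement(i), area_enlargement(i), area(children[i]))
        return (area_enlargement(i), area(children[i]))

    return min(range(len(children)), key=key)


children = [(0, 0, 4, 4), (6, 0, 10, 4), (4, 0, 5.5, 1)]
new = (5, 2, 6, 3)
print(choose_subtree(children, new, children_are_leaves=False))  # 1 (least area growth)
print(choose_subtree(children, new, children_are_leaves=True))   # 2 (no new overlap)
```

As written, computing the overlap enlargement for every child costs O(n²) rectangle comparisons per node, while the area criterion costs O(n).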
  13. (cont.)
      • R*-tree – Insert
        – If ChooseSubtree selects a leaf that cannot accommodate the new entry, the R∗-tree does not immediately resort to node splitting; it first tries to reinsert some of the node's entries
        – Example: assume that leaf N1 has overflowed. Entry b is selected for reinsertion, as its centroid is the farthest from the centroid of N1
        – Reinsertion is a costly operation. Therefore, only one application of reinsertion is permitted for each level of the tree
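The reinsertion choice can be sketched in Python as follows. The sketch assumes 2-D `(xmin, ymin, xmax, ymax)` tuples and takes the node's centroid to be the center of its MBR. The number of entries removed, `p`, is a parameter. [19] reports that about 30% of the node capacity works well, but treat that as a starting point rather than a rule. `pick_for_reinsert` and `overflow_treatment` are hypothetical names. The sketch also leaves out the root, which is split rather than reinserted.

```python
import math

Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def center(r: Rect) -> tuple[float, float]:
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)


def bbox(rects: list[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def pick_for_reinsert(entries: list[Rect], p: int) -> tuple[list[Rect], list[Rect]]:
    """Divide the M+1 entries of an overflowing node into (kept, to_reinsert)."""
    cx, cy = center(bbox(entries))

    def distance(r: Rect) -> float:
        x, y = center(r)
        return math.hypot(x - cx, y - cy)

    # Farthest entries first; the first p are removed and reinserted.
    ordered = sorted(entries, key=distance, reverse=True)
    return ordered[p:], ordered[:p]


def overflow_treatment(level: int, reinserted_levels: set[int]) -> str:
    # Reinsertion is costly, so during one insertion it is tried only once per
    # level; any further overflow on that level is handled by splitting.
    if level not in reinserted_levels:
        reinserted_levels.add(level)
        return "reinsert"
    return "split"


n1 = [(0, 0, 6, 10), (1, 1, 5, 9), (2, 2, 4, 8), (9, 9, 10, 10)]
kept, to_reinsert = pick_for_reinsert(n1, p=1)
print(to_reinsert)  # [(9, 9, 10, 10)]
```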
  14. (cont.)
      • When overflow cannot be handled by reinsertion, node splitting is performed
        – The entries are sorted along a split axis (the R*-tree picks the axis whose candidate divisions have the smallest total margin)
        – It considers every division of the sorted list that ensures that each node is at least 40% full
        – The final division is the one with the minimum overlap between the MBRs of the resulting nodes
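Here is a compact Python sketch of the split. It assumes 2-D `(xmin, ymin, xmax, ymax)` tuples and the two-phase procedure of [19]. First, it chooses the axis whose candidate divisions have the smallest total margin. Then, on that axis, it takes the division with the least overlap, breaking ties by total area. `rstar_split` and `min_fill` are hypothetical names, and the default of 0.4 reflects the 40% minimum fill above.

```python
import math

Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def bbox(rects: list[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def margin(r: Rect) -> float:
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def overlap(a: Rect, b: Rect) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0


def rstar_split(entries: list[Rect], min_fill: float = 0.4) -> tuple[list[Rect], list[Rect]]:
    n = len(entries)                                    # overflowing node: M + 1 entries
    m = max(1, math.ceil(min_fill * (n - 1) - 1e-9))    # minimum entries per node
    best_axis, best_margin, candidates = 0, math.inf, {}
    for axis in (0, 1):                                 # 0 = x, 1 = y
        dists = []
        # Sort by lower coordinate, then by upper coordinate, along this axis.
        for lo, hi in ((axis, axis + 2), (axis + 2, axis)):
            order = sorted(entries, key=lambda r: (r[lo], r[hi]))
            for k in range(m, n - m + 1):               # each side keeps >= m entries
                dists.append((order[:k], order[k:]))
        candidates[axis] = dists
        margin_sum = sum(margin(bbox(g1)) + margin(bbox(g2)) for g1, g2 in dists)
        if margin_sum < best_margin:                    # ChooseSplitAxis
            best_axis, best_margin = axis, margin_sum

    def cost(d):
        b1, b2 = bbox(d[0]), bbox(d[1])
        return (overlap(b1, b2), area(b1) + area(b2))   # overlap first, then area

    return min(candidates[best_axis], key=cost)         # ChooseSplitIndex


left = [(0, 0, 1, 1), (1, 0, 2, 1), (0, 1, 1, 2)]
right = [(8, 0, 9, 1), (9, 1, 10, 2), (8, 1, 9, 2)]
g1, g2 = rstar_split(left + right)
print(sorted(g1) == sorted(left), sorted(g2) == sorted(right))  # True True
```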
  15. (cont.)
      • R*-tree – Delete
        – Deletion in the R∗-tree is performed with the deletion algorithm of the original R-tree
  16. The Hilbert R-tree
      • Hybrid structure based on the R-tree and the B+-tree
      • An entry e of an internal node is a triplet (mbr, H, p)
        – mbr is the MBR that encloses all the objects in the corresponding subtree
        – H is the maximum Hilbert value of the subtree
        – p is the pointer to the next level
      • Entries in leaf nodes are exactly the same as in R-trees, R+-trees, and R*-trees and are of the form (mbr, oid)
        – mbr is the MBR of the object
        – oid is the corresponding object identifier
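How H is obtained can be sketched in Python. The sketch assumes the objects live on an n-by-n integer grid, with n a power of two, and that an MBR's Hilbert value is the Hilbert value of its center. `hilbert_d` is the common iterative cell-to-distance conversion. `hilbert_value` and `internal_entry` are hypothetical helpers: they show how a leaf's entries are ordered and how the parent's (mbr, H, p) triplet is formed.

```python
Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def hilbert_d(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along the Hilbert curve of an n-by-n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_value(mbr: Rect, n: int) -> int:
    # Hilbert value of the MBR's center, snapped to the grid (coordinates in [0, n]).
    cx = min(n - 1, int((mbr[0] + mbr[2]) / 2))
    cy = min(n - 1, int((mbr[1] + mbr[3]) / 2))
    return hilbert_d(n, cx, cy)


def internal_entry(leaf_entries: list[tuple[Rect, str]], n: int, ptr: object):
    """Build the (mbr, H, p) triplet that a parent stores for one leaf."""
    rects = [mbr for mbr, _oid in leaf_entries]
    mbr = (min(r[0] for r in rects), min(r[1] for r in rects),
           max(r[2] for r in rects), max(r[3] for r in rects))
    H = max(hilbert_value(r, n) for r in rects)  # largest Hilbert value in the subtree
    return (mbr, H, ptr)


leaf = [((0, 0, 1, 1), "a"), ((2, 0, 4, 1), "b"), ((2, 1, 3, 2), "c")]
leaf.sort(key=lambda e: hilbert_value(e[0], 4))  # entries kept in Hilbert order
print([oid for _mbr, oid in leaf])               # ['a', 'c', 'b']
print(internal_entry(leaf, 4, ptr="leaf-1"))     # ((0, 0, 4, 2), 15, 'leaf-1')
```

Because H is the largest Hilbert value below an entry, a parent's entries can be searched like B+-tree keys when locating where a new object belongs in the Hilbert order.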
  17. (cont.)
      • Hilbert R-tree – Insert
  18. (cont.)
      • An important characteristic of the Hilbert R-tree that is missing from other variants
        – There exists an order of the nodes at each tree level, respecting the Hilbert order of the MBRs
      • Instead of splitting a node immediately after its capacity has been exceeded, we try to store some entries in sibling nodes
        – A split takes place only if all siblings are also full
        – This unique property of the Hilbert R-tree considerably increases storage utilization and avoids unnecessary split operations
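The deferred split can be sketched on Hilbert values alone. The minimal Python version below assumes that a node and its cooperating siblings are adjacent in Hilbert order and that entries are spread evenly after each insertion. The Hilbert R-tree paper calls this an s-to-(s+1) split policy, but the exact redistribution rule used here is an assumption. `insert_with_siblings` is a hypothetical name. Choosing which leaf receives the new entry is omitted.

```python
def insert_with_siblings(nodes: list[list[int]], h: int, capacity: int) -> list[list[int]]:
    """
    nodes: a node plus its cooperating siblings, adjacent in Hilbert order; each
    node is a sorted list of its entries' Hilbert values. Returns the new contents.
    """
    values = sorted([v for node in nodes for v in node] + [h])
    count = len(nodes)
    if len(values) > count * capacity:  # every cooperating node is full: split
        count += 1                      # s nodes become s + 1
    # Spread the entries evenly, keeping the Hilbert order across the nodes.
    base, extra = divmod(len(values), count)
    result, start = [], 0
    for j in range(count):
        size = base + (1 if j < extra else 0)
        result.append(values[start:start + size])
        start += size
    return result


print(insert_with_siblings([[1, 2, 3], [7, 8]], 4, capacity=3))     # [[1, 2, 3], [4, 7, 8]]
print(insert_with_siblings([[1, 2, 3], [4, 7, 8]], 5, capacity=3))  # [[1, 2, 3], [4, 5], [7, 8]]
```

In the first call, the full node lends space through its sibling and no node is created. In the second call, both nodes are full, so two nodes become three.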
  19. (cont.)
  20. Linear Node Splitting
      • Criteria of this algorithm
        – distribute the objects between the two nodes as evenly as possible
        – minimize the overlap between them
        – minimize the total coverage
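The slide lists only the criteria. One way to meet them in linear time is sketched below. It loosely follows the linear split of Ang and Tan, which is an assumption because the slide does not spell out the procedure. For each axis, every entry is assigned to the node border it lies closer to. The axis that gives the most even split wins, with overlap and then total coverage as tie-breakers. Rectangles are 2-D `(xmin, ymin, xmax, ymax)` tuples, and `linear_split` is a hypothetical name.

```python
import math

Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def bbox(rects: list[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def overlap(a: Rect, b: Rect) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0


def linear_split(entries: list[Rect]) -> tuple[list[Rect], list[Rect]]:
    node = bbox(entries)
    left, right, bottom, top = [], [], [], []
    for r in entries:
        # One pass: per axis, put each entry with the node border it is closer to.
        (left if r[0] - node[0] < node[2] - r[2] else right).append(r)
        (bottom if r[1] - node[1] < node[3] - r[3] else top).append(r)

    def cost(split):
        g1, g2 = split
        if not g1 or not g2:
            # One-sided split: ranked last. (If both axes are one-sided, a real
            # implementation needs a fallback, which this sketch omits.)
            return (len(g1) + len(g2), math.inf, math.inf)
        b1, b2 = bbox(g1), bbox(g2)
        # 1) evenness, 2) overlap, 3) total coverage
        return (max(len(g1), len(g2)), overlap(b1, b2), area(b1) + area(b2))

    return min([(left, right), (bottom, top)], key=cost)


entries = [(0, 0, 1, 1), (0, 9, 1, 10), (2, 0, 3, 1), (2, 9, 3, 10)]
print(linear_split(entries))
# ([(0, 0, 1, 1), (2, 0, 3, 1)], [(0, 9, 1, 10), (2, 9, 3, 10)])
```

Here both axes split the entries 2/2 with no overlap, so total coverage decides: the horizontal strips cover 6 units of area against 20 for the vertical ones.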
  21. Optimal Node Splitting
      • Three node splitting algorithms were proposed by Guttman to handle a node overflow
        – linear algorithm
          • More time-efficient, but does not guarantee an optimal rectangle bipartition
        – exponential algorithm
          • Achieves the optimal bipartition of the rectangles, at the expense of increased splitting cost
        – quadratic algorithm
          • The best compromise between efficiency and bipartition optimality
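Guttman's quadratic algorithm is short enough to sketch in full. The Python version below assumes 2-D `(xmin, ymin, xmax, ymax)` tuples. `quadratic_split`, `enlargement`, and the parameter `m` (minimum entries per node) are illustrative names. PickSeeds compares all pairs of entries, which is where the quadratic cost comes from. PickNext then repeatedly assigns the entry with the strongest preference for one group.

```python
Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(b: Rect, r: Rect) -> float:
    return area(union(b, r)) - area(b)


def quadratic_split(entries: list[Rect], m: int) -> tuple[list[Rect], list[Rect]]:
    # PickSeeds: the pair that would waste the most area if placed in one node.
    seeds, worst = (0, 1), -float("inf")
    for i in range(len(entries)):
        for j in range(i + 1, len(entries)):
            waste = area(union(entries[i], entries[j])) - area(entries[i]) - area(entries[j])
            if waste > worst:
                seeds, worst = (i, j), waste
    g1, g2 = [entries[seeds[0]]], [entries[seeds[1]]]
    b1, b2 = g1[0], g2[0]
    rest = [e for k, e in enumerate(entries) if k not in seeds]

    while rest:
        # If a group needs all remaining entries to reach m, it gets them.
        if len(g1) + len(rest) == m:
            g1 += rest
            break
        if len(g2) + len(rest) == m:
            g2 += rest
            break
        # PickNext: the entry with the largest difference in enlargement.
        e = max(rest, key=lambda r: abs(enlargement(b1, r) - enlargement(b2, r)))
        rest.remove(e)
        d1, d2 = enlargement(b1, e), enlargement(b2, e)
        # Least enlargement wins; ties go to the smaller area, then to fewer entries.
        if (d1, area(b1), len(g1)) <= (d2, area(b2), len(g2)):
            g1.append(e)
            b1 = union(b1, e)
        else:
            g2.append(e)
            b2 = union(b2, e)
    return g1, g2


node = [(0, 0, 1, 1), (1, 0, 2, 1), (9, 9, 10, 10), (8, 9, 9, 10), (0, 1, 1, 2)]
print(quadratic_split(node, m=2))
# ([(0, 0, 1, 1), (1, 0, 2, 1), (0, 1, 1, 2)], [(9, 9, 10, 10), (8, 9, 9, 10)])
```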
  22. Branch Grafting
      • Objectives
        – to achieve better-shaped R-trees and to reduce the total number of nodes
          • Both these factors can improve performance during query processing
  23. (cont.)
  24. (cont.)
      • Example
        [Figure: branch grafting example; node labels C and H]
  25. Compact R-trees
      • Motivation
        – R-trees, R+-trees, and R*-trees suffer from a storage utilization problem: utilization is around 70% in the average case
        – Therefore, the authors modify the insertion mechanism of R-trees to obtain a more compact R-tree structure, with no penalty on query performance
  26. (cont.)
      • Among the M+1 entries of an overflowing node during insertion, a set of M entries is selected to remain in this node, under the constraint that the resulting MBR is the minimum possible
      • The remaining entry is then inserted into a sibling that
        – has available space, and
        – whose MBR is enlarged as little as possible
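Here is a minimal Python sketch of this overflow rule. It assumes 2-D `(xmin, ymin, xmax, ymax)` tuples and non-empty siblings, and `compact_overflow` is a hypothetical name. The slide does not cover what happens when no sibling has room. The sketch therefore only reports that case and leaves the fallback, presumably an ordinary split, to the caller.

```python
Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def bbox(rects: list[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def compact_overflow(entries: list[Rect], siblings: list[list[Rect]], capacity: int):
    """
    entries: the M+1 entries of the overflowing node (M == capacity).
    Returns (kept, moved, sibling_index); sibling_index is None if no sibling has room.
    """
    # Keep the M entries with the smallest MBR, i.e. evict the one entry
    # whose removal shrinks the node's MBR the most.
    def remaining_area(i: int) -> float:
        return area(bbox(entries[:i] + entries[i + 1:]))

    out = min(range(len(entries)), key=remaining_area)
    kept, moved = entries[:out] + entries[out + 1:], entries[out]

    # Among siblings with free space, pick the one whose MBR grows the least.
    open_siblings = [s for s, node in enumerate(siblings) if len(node) < capacity]
    if not open_siblings:
        return kept, moved, None

    def growth(s: int) -> float:
        b = bbox(siblings[s])
        return area(union(b, moved)) - area(b)

    return kept, moved, min(open_siblings, key=growth)


entries = [(0, 0, 1, 1), (1, 0, 2, 1), (0, 1, 1, 2), (6, 6, 7, 7)]
siblings = [[(20, 20, 21, 21), (21, 20, 22, 21), (20, 21, 21, 22)],  # full
            [(7, 7, 8, 8)],
            [(0, 10, 1, 11)]]
kept, moved, target = compact_overflow(entries, siblings, capacity=3)
print(moved, target)  # (6, 6, 7, 7) 1
```

Unlike a split, this keeps the overflowing node full and fills a sibling's spare slot, which is how the approach drives utilization up.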
  27. (cont.)
      • Performance evaluation results have shown that the storage utilization of the new heuristic is between 97% and 99%
      • A direct impact of the storage utilization improvement is the fact that fewer tree nodes are required to index a given dataset
      • Moreover, less time is required to build the tree by individual insertions, because of the reduced number of split operations required
  28. cR-trees
      • The empirical studies provided in the paper illustrate that the cR-tree query performance was competitive with the R*-tree and was much better than that of the R-tree
  29. Deviating Variations
      • Sphere-tree
        – uses minimum bounding spheres instead of MBRs
      • Cell-tree
        – uses minimum bounding polygons designed to accommodate arbitrarily shaped objects
      • P-tree (Polyhedral tree)
        – uses minimum bounding polygons instead of MBRs
      • QR-tree
        – hybrid access method composed of a Quadtree and a forest of R-trees
  30. PR-trees
      • A provably asymptotically optimal variation of the R-tree
      • Height-balanced tree
        – i.e., all its leaves are at the same level
      • Query performance
        – real data
          • PR-trees perform similarly to existing R-tree variants
        – extreme data (very skewed data)
          • PR-trees outperform all other variants, due to their guaranteed worst-case performance
  31. LR-trees
      • The LR-tree is an index structure based on the logarithmic dynamization method
      • Example
        – base B = 2, capacity c = 4
        – 11 items
  32. (cont.)
      • 12 = (1100)₂, 11 = (1011)₂
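The binary numbers above are the heart of the logarithmic method. With base 2, the structure keeps one static index per 1-bit of the item count. So 11 items, (1011)₂, live in indexes of 8, 2, and 1 items. Inserting a 12th item merges the 1-item and 2-item indexes with the new item into a single 4-item index, giving (1100)₂. The Python sketch below shows only this counter behavior. It uses sorted lists as stand-ins for the static R-trees of the LR-tree and does not model the capacity c = 4 from the example. `LogarithmicIndex` is a hypothetical name.

```python
class LogarithmicIndex:
    """Base-2 logarithmic dynamization: slot i is empty or holds exactly 2**i items."""

    def __init__(self):
        self.slots = []  # each slot: None or a static structure (here: a sorted list)

    @staticmethod
    def build(items):
        # Stand-in for bulk-loading a static structure (the LR-tree uses static R-trees).
        return sorted(items)

    def insert(self, item):
        carry = [item]
        i = 0
        # Like adding 1 to a binary counter: full slots are merged into the carry
        # and cleared until an empty slot receives the rebuilt structure.
        while i < len(self.slots) and self.slots[i] is not None:
            carry += self.slots[i]
            self.slots[i] = None
            i += 1
        if i == len(self.slots):
            self.slots.append(None)
        self.slots[i] = self.build(carry)

    def query(self, predicate):
        # A query has to visit every non-empty static structure.
        return [x for s in self.slots if s is not None for x in s if predicate(x)]

    def bits(self):
        return "".join("0" if s is None else "1" for s in reversed(self.slots))


idx = LogarithmicIndex()
for k in range(11):
    idx.insert(k)
print(idx.bits())  # 1011  (structures of 8, 2 and 1 items)
idx.insert(11)
print(idx.bits())  # 1100  (structures of 8 and 4 items)
```

Each item takes part in at most a logarithmic number of rebuilds, which keeps insertion cheap on average. The price is that a query must search all of the static structures.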
  33. Summary
      • Evidently, the original R-tree, proposed by Guttman, has influenced all the forthcoming variations of dynamic R-tree structures
      • The R*-tree followed an engineering approach and evaluated several factors that affect the performance of the R-tree
        – it is considered the most robust variant and has found numerous applications, in both research and commercial systems
  34. (cont.)
      • The empirical study has shown that the Hilbert R-tree can perform better than the other variants in some cases
      • The PR-tree is the first approach that offers guaranteed worst-case performance and overcomes the degenerate cases in which almost the entire tree has to be traversed
        – Despite its more complex building algorithm, it has to be considered the best variant reported so far
  35. English
      • In this chapter, we are further focusing on the family of R-trees by enlightening the similarities and differences, advantages and disadvantages of the variations in a more exhaustive manner. (P.15)
      • We presented dynamic versions of the R-tree, where the objects are inserted on a one-by-one basis, as opposed to the case where a special packing technique can be applied to insert an a priori known static set of objects into the structure by optimizing the storage overhead and retrieval performance. (P.15)
  36. (cont.)
      • As already discussed, the R-tree is based solely on the area minimization of each MBR. (P.18)
      • The Hilbert R-tree [105] is a hybrid structure based on the R-tree and the B+-tree. (P.20)
      • Instead of splitting a node immediately after its capacity has been exceeded, we try to store some entries in sibling nodes. (P.22)
  37. (cont.)
      • Therefore, the best compromise between efficiency and bipartition optimality is the quadratic algorithm. (P.24)
      • In particular, the objects of an overflowing node are optimally separated in two sets.
      • The motivation behind the proposed approach is that R-trees, R+-trees, and R*-trees suffer from the storage utilization problem
  38. (cont.)
      • It is worth mentioning that the PR-tree, although a variant that deviates from other existing ones, is the first approach that offers guaranteed worst-case performance and overcomes the degenerated cases when almost the entire tree has to be traversed. (P.34)
  39. (cont.)
      • Therefore, despite its more complex building algorithm, it has to be considered the best variant reported so far.
