R-Trees and Geospatial Data Structures

1,900 views
1,668 views

Published on

R-Trees are an excellent data structure for managing geo-spatial data. Commonly used by mapping applications and any other applications that use the location to customize content. Minimum Bounding Rectangle (MBR) is a commonly used concept in R-trees, which are a modified form of B-trees.

Published in: Education
0 Comments
7 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
1,900
On SlideShare
0
From Embeds
0
Number of Embeds
14
Actions
Shares
0
Downloads
11
Comments
0
Likes
7
Embeds 0
No embeds

No notes for slide

R-Trees and Geospatial Data Structures

  1. 1. CS 6213 – Advanced Data Structures – Lecture 7
  2. 2.  Instructor Prof. Amrinder Arora amrinder@gwu.edu Please copy TA on emails Please feel free to call as well  TA Iswarya Parupudi iswarya2291@gwmail.gwu.edu L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 2
  3. 3. CS 6213 Basics Record / Struct / Arrays / LLs Stacks / Queues Graphs / Trees / BSTs Heaps and PQs Advanced Trie, B-Tree Splay Trees R-Tree Union Find Applications Databases Spatial String In Memory L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 3
  4. 4.  Antonin Guttman, U. C. Berkeley  K. A. Mohamed L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 4
  5. 5. 5  Spatial Data  R-Tree Structure  Operations  Searching  Insertion  Deletion  Variants  Applications L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  6. 6. Given a city map, „index‟ all university buildings in an efficient structure for quick topological search. 6L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  7. 7. 7 “Index” buildings in an efficient structure for quick search Spatial object: Contour (outline) of the area around the building(s). Minimum bounding region (MBR) of the object. L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  8. 8. 8 MBR of the city neighbourhoods. MBR of the city defining the overall search region. L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  9. 9. Mostly involves 2D regions.  Need to support 2D range queries.  Multiple return values desired: Answering a query region by reporting all spatial objects that are fully-contained-in or overlapping the query region (Spatial-Access Method – SAM). In general:  Spatial data objects often cover areas in multidimensional spaces.  Spatial data objects are not well-represented by point-location.  An „index‟ based on an object‟s spatial location is desirable. 9L7 - R-TreesCS 6213 - Advanced Data Structures - Arora Problem Summary: To retrieve data items quickly and efficiently according to their spatial locations.
  10. 10.  A B-Tree is an ordered, dynamic, multi-way structure of order m (i.e. each node has at most m children).  The keys and the subtrees are arranged in the fashion of a search tree.  Each node may contain a large number of keys, and the number of subtrees in each node, then, may also be large.  The B-Tree is designed (among other objectives):  to branch out this large number of directions, and  to contain a lot of keys in each node so that the height of the tree is relatively short. 10 M P T X B D F G K L N O Q S V W Y ZI E H L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  11. 11.  A height-balanced tree, similar to a B-Tree.  Index records in the leaf nodes contain pointers to the actual spatial-objects (entries) they represent.  Each entry has a unique identifier that points to one spatial object, and its MBR; i.e., entry = (MBR, pointer).  Spatial searching requires visiting only a small number of nodes.  The index is completely dynamic: inserts and deletes can be intermixed with searches. (No periodic reorganization is required.) 11L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  12. 12.  Let M be the maximum number of entries that will fit in one node.  Let m ≤ M/2 be a parameter specifying the minimum number of entries in one node. Then an R-Tree must satisfy the following properties: 1. Every leaf node contains between m and M index records, unless it is the root. 2. For each index-record Entry (I, tuple-identifier) in a leaf node, I is the MBR that spatially contains the n-dimensional data object represented by the tuple-identifier. 3. Every non-leaf node has between m and M children, unless it is the root. 4. For each Entry (I, child-pointer) in a non-leaf node, I is the MBR that spatially contains the regions in the child node. 5. The root has two children unless it is a leaf. 6. All leaves appear on the same level. 12L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  13. 13.  An entry E in a leaf node is defined as: E = (I, tuple-identifier)  Where I refers to the smallest binding n-dimensional region (MBR) that encompasses the spatial data pointed to by its tuple- identifier.  I is a series of closed-intervals that make up each dimension of the binding region.  Example. In 2D, I = (Ix, Iy), where Ix = [xa, xb], and Iy = [ya, yb]. 13L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  14. 14. [Not limited to 2D – higher dimensions are certainly possible.]  In general I = (I0, I1, …, In-1) for n-dimensions, and that Ik = [ka, kb].  If either ka or kb (or both) are equal to , this means that the spatial object extends outward indefinitely along that dimension. 14L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  15. 15.  An entry E in a non-leaf node is defined as: E = (I, child-pointer)  Where the child-pointer points to the child of this node, and I is the MBR that encompasses all the regions in the child-node‟s pointer‟s entries. 15 I(A) I(B) … I(M) I(a) I(b) I(c) I(d) B a b c d L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  16. 16. L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 16
  17. 17. a b c d e f g h i j k l m n o p L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 17
  18. 18. a b c d m a b cd e f g h i j k l m n o p L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 18
  19. 19. a b c d m e f n a b cd e f g h i j k l m n o p L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 19
  20. 20. a b c d m e f n h g i o p a b cd e f g h i j k l m n o p L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 20
  21. 21. 21 Typical query: Find and report all university building sites that are within 5km of the city centre. Approach: i.Build the R-Tree using rectangular regions a, b, … i. ii.Formulate the query range Q. iii.Query the R- Tree and report all regions overlapping Q. L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  22. 22. Let Q be the query region. Let T be the root of the R-Tree. Search all entry-records whose regions overlaps Q. Search sub-trees:  If T is not leaf, then apply Search on ever child-node entry E whose I overlaps Q. Search leaf nodes:  If T is leaf, then check each entry E in the leaf and return E if E.I overlaps Q. 22L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  23. 23. 23 r2 e r5 r8 r3 r4r1 r7r0 ic gf hba d @ r6 @ r2 @ r5 @ r8 @ r0 @ r1 @ r7 @ r3 @ r4 R-Tree settings: M = m = L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  24. 24. 24  The search algorithm descends the tree from the root in a manner similar to a B-Tree.  More than one subtree under a node visited may need to be searched.  Cannot guarantee good worst-case performance.  Countered by the algorithms during insertion, deletion, and update that maintain the tree in a form that allows the search algorithm to eliminate irrelevant regions of the indexed space.  So that only data near the search area need to be examined.  Emphasis is on the optimal placement of spatial objects with respect to the spatial location of other objects in the structure. L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  25. 25.  A Node-Overflow happens when a new Entry is added to a fully packed node, causing the resulting number of entries in the node to exceed the upper-bound M.  The „overflow‟ node must be split, and all its current entries, as well as the new one, consolidated for local optimum arrangement.  A Node-Underflow happens when one or more Entries are removed from a node, causing the remaining number of entries in that node to fall below the lower-bound m.  The underflow node must be condensed, and its entries dispersed for global optimum arrangement. 25L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  26. 26. 26  New index entry-records are added to the leaves.  Nodes that overflow are split, and splits propagate up the tree.  A split-propagation may cause the tree to grow in height. The main Insert routine  Let E = (I, tuple-identifier) be the new entry to be inserted.  Let T be the root of the R-Tree.  [Ins_1] Locate a leaf L starting from T to insert E.  [Ins_2] Add E to L. If L is already full (overflow), split L into L and L‟.  [Ins_3] Propagate MBR changes (enlarged or reduced) upwards.  [Ins_4] Grow tree taller if node split propagation causes T to split. L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  27. 27.  Similar to insertion into B+-tree but may insert into any leaf; leaf splits in case capacity exceeded.  Which leaf to insert into? (Choose Leaf)  How to split a node? (Node Split) L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 27
  28. 28. m n o p L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 28
  29. 29. 29 [Ins_1] Locate a leaf L starting from T to insert E = (I, tuple-identifier).  Notion (i): Select the path that would require the least enlargement to include E.I.  Notion (ii): Resolve ties by choosing the child-node with the smallest MBR.  Invoke: L = ChooseLeaf (E, T). A B C @rN A C B E.I rN L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  30. 30. 30 Algorithm: ChooseLeaf (E, N) Inputs: (i) Entry E = (I, tuple-identifier), (ii) A valid R-Tree node N. Output: The leaf L where E should be inserted.  If N is leaf Then Return N  Let FS be the set of current entries in the node N  Let F = (I, child-pointer) FS, so that F.I satisfies the Insertion- Notions  Return ChooseLeaf (E, F.child-pointer) L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  31. 31. 31 [Ins_2] Add E to L.  Notion (i): If L has room for another entry, install E.  Notion (ii): Otherwise split L to obtain L and L‟, which between them, will contain all previous entries in L and the new E (consolidated for local optima). [Ins_3] Propagate MBR changes upwards by invoking AdjustTree (L, L‟).  Notion (i): Ascend from leaf L to the root T while adjusting the covering rectangles MBR.  Notion (ii): If L‟ exists, propagate node splits as necessary; i.e. attempt to install a new entry in the parent of L to point to L‟. L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  32. 32. 32 Example. Found L = @Y to insert new E = e. R-Tree settings: M = 3, m = 1. K @G a b c @Y X Y Z @K L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  33. 33. 33 Algorithm: AdjustTree (N, N’) Inputs: (i) A node N that has had its contents modified, (ii) The resultant split node N‟, if not NULL, that accompanies N. Outputs: (i) N as above, (ii) N‟ as above.  If N is the root Then Return {(i) N, (ii) N‟}  Let PN be the parent node of N.  Let EN = (I_N, child-pointer_N) in PN, where child-pointer_N points to N.  Adjust I_N so that it tightly encloses all entry regions in N. L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  34. 34. 34  If N‟ is Not NULL Then  If number of entries in PN < M-1 Then  Create a new Entry EN‟ = (I_N’, child-pointer_N’)  Install EN‟ in PN  Return AdjustTree (PN, NULL)  Else  Set {PN, PN‟} = SplitNode (PN, EN‟)  Return AdjustTree (PN, PN‟)  End If  Else  Return AdjustTree (PN, NULL)  End If L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  35. 35. [Ins_4] Grow Tree taller.  Notion: If the recursive node split propagation causes the root to split, then create a new root whose children are the two resulting nodes. 35 A B C @T (root) E F @C G H @C’ L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  36. 36. 36  The height of the R-Tree containing n entry-records is at most logm n – 1, because the branching factor of each node is at least m.  The maximum number of nodes is:  Worst case space utilisation for all nodes except the root is:  Nodes will tend to have more than m entries, and this will: L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  37. 37. 37  Current index entry-records are removed from the leaves.  Nodes that underflow are condensed, and its contents redistributed appropriately throughout the tree.  A condense propagation may cause the tree to shorten in height. The main Delete routine  Let E = (I, tuple-identifier) be a current entry to be removed.  Let T be the root of the R-Tree.  [Del_1] Find the leaf L starting from T that contains E.  [Ins_2] Remove E from L, and condense „underflow‟ nodes.  [Ins_3] Propagate MBR changes upwards.  [Ins_4] Shorten tree if T contains only 1 entry after condense propagation. L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  38. 38.  [Del_1] Find the leaf L starting from T that contains E.  Algorithm: FindLeaf (E, N)  Inputs: (i) Entry E = (I, tuple-identifier), (ii) A valid R-Tree node N.  Output: The leaf L containing E.  If N is leaf Then  If N contains E Then Return N  Else Return NULL  Else  Let FS be the set of current entries in N.  For each F = (I, child-pointer) FS where F.I overlaps E.I Do  Set L = FindLeaf (E, F.child-pointer)  If L is not NULL Then Return L  Next F  Return NULL  End If 38L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  39. 39. [Del_2] Remove E from L, and condense „underflow‟ nodes. [Del_3] Propagate MBR changes upwards.  Notion (i): Ascend from leaf L to root T while adjusting covering rectangles MBR.  Notion (ii): If after removing the entry E in L and the number of entries in L becomes fewer than m, then the node L has to be eliminated and its remaining contents relocated. 39L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  40. 40.  Propagate these notions upwards by invoking CondenseTree (N, QS), where N is an R-Tree node whose entries have been modified, and QS is the set of eliminated nodes.  Start the propagation by setting N = L, and QS = .  Re-insert the entries from the eliminated nodes in QS back into the tree.  Entries from eliminated leaf nodes are re-inserted as new entries using the Insert routine discussed earlier.  Entries from higher-level nodes must be placed higher in the tree so that leaves of their dependent subtrees will be on the same level as the leaves on the main tree. 40L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  41. 41.  Example: Delete the index entry-record b. R-Tree settings: M = 4, m = 2.  Spatial constraint: a.I will form smallest MBR with r4. 41 r2 r6 @ r7 a b @ r0 r0 r1 @ r2 r3 r4 r5 @ r6 c d e @ r1 f g h @ r3 i j @ r4 k l m @ r5 n L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  42. 42. 42 Algorithm: CondenseTree (N, QS) Inputs: (i) A node N whose entries have been modified, (ii) A set of eliminated nodes QS.  If N is NOT the root Then  Let PN be the parent node of N.  Let EN = (I_N, child-pointer_N) in PN.  If N.entries < m Then  Delete EN from PN  Add N to QS  Else  Adjust I_N so that it tightly encloses all entry regions in N.  End If  CondenseTree (PN, QS) L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  43. 43. 43  Else If N is root AND Q is NOT Then  For each Q QS Do  For each E Q Do  If Q is leaf Then Insert (E)  Else Insert (E) as a node entry at the same node level as Q  End If  Next E  Next Q  End If L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  44. 44. Why ‘re-insert’ orphaned entries?  Alternatively, like the delete routine in B-Tree (Rosenberg & Snyder, 1981), an „underflow‟ node can be merged with whichever adjacent sibling that will have its area increased the least, or its entries re-distributed among sibling nodes.  Both methods can cause the nodes to split.  Eventually all changes need to be propagated upwards, anyway. Re-insertion accomplishes the same thing, and:  It is simpler to implement (and at comparable efficiency).  It incrementally refines the spatial structure of the tree.  It prevents gradual deterioration if each entry was located permanently under the same parent node. 44L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  45. 45. 45  A high value of m, nearer to M, is useful when the underlying database represented by the R-Tree is mostly used for search inquiries with very few updates.  The height of the tree will be kept to a minimum.  High search performance is maintained.  However, the risk of overflow and underflow is also high.  A small value of m is good when frequent updates and modifications of the underlying database is required.  The nodes are less dense.  Maintenance is less costly. L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  46. 46.  Avoids multiple paths during searching.  Objects may be stored in multiple nodes  MBRs of nodes at same tree level do not overlap  On insertion/deletion the tree may change downward or upward in order to maintain the structure L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 46 R-TreeVariants
  47. 47. http://perso.enst.fr/~saglio/bdas/EPFL0525/sld041.htm L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 47 R-TreeVariants
  48. 48.  Similar to other R-Trees except that the Hilbert value of its rectangle centroid is calculated.  That key is used to guide the insertion  On an overflow, evenly divide between two nodes  Experiments has shown that this scheme significantly improves performance and decreases insertion complexity.  Hilbert R-tree achieves up to 28% saving in the number of pages touched compared to R*-tree. L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 48 R-TreeVariants
  49. 49.  The Hilbert value of an object is found by interleaving the bits of its x and y coordinates, and then chopping the binary string into 2- bit strings.  Then, for every 2-bit string, if the value is 0, we replace every 1 in the original string with a 3, and vice-versa.  If the value of the 2-bit string is 3, we replace all 2‟s and 0‟s in a similar fashion.  After this is done, you put all the 2-bit strings back together and compute the decimal value of the binary string;  This is the Hilbert value of the object.  http://www-users.cs.umn.edu/research/shashi- group/CS8715/exercise_ans.doc L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 49 R-TreeVariants
  50. 50.  Proposed by Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger in 1990  Same algorithm as the regular R-tree for query and delete operations.  When inserting, the R*-tree uses a combined strategy.  For leaf nodes, overlap is minimized  For inner nodes, enlargement and area are minimized.  When splitting, the R*-tree uses a topological split that chooses a split axis based on perimeter, then minimizes overlap.  In addition to an improved split strategy, the R*-tree also tries to avoid splits by reinserting objects and subtrees into the tree, inspired by the concept of balancing a B-tree. L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 50 R-TreeVariants
  51. 51.  MBR: Minimum Bounding Rectangle  R-Trees are an extremely compelling data structure for spatial data.  Largely based on B-Tree (Can be considered a generalization of B-Tree)  Can support more than two dimensions  Support same basic operations (deletion, searching, insertion, update, etc.)  Many variants of R-Trees are available L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 51

×