Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our User Agreement and Privacy Policy.

Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our Privacy Policy and User Agreement for details.

Like this presentation? Why not share!

- R-trees (data structure) by mahesamrin 1426 views
- B-tree & R-tree by Shakil Ahmed 2233 views
- RTree Spatial Indexing with MongoDB... by Nicholas Knize, P... 10843 views
- R-trees – adapting out-of-core tech... by Олег Андреев 1954 views
- High Dimensional Indexing using Mon... by Nicholas Knize, P... 2305 views
- MongoDB for Spatio-Behavioral Data ... by MongoDB 1910 views

1,900 views

1,668 views

1,668 views

Published on

Published in:
Education

No Downloads

Total views

1,900

On SlideShare

0

From Embeds

0

Number of Embeds

14

Shares

0

Downloads

11

Comments

0

Likes

7

No embeds

No notes for slide

- 1. CS 6213 – Advanced Data Structures – Lecture 7
- 2. Instructor Prof. Amrinder Arora amrinder@gwu.edu Please copy TA on emails Please feel free to call as well TA Iswarya Parupudi iswarya2291@gwmail.gwu.edu L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 2
- 3. CS 6213 Basics Record / Struct / Arrays / LLs Stacks / Queues Graphs / Trees / BSTs Heaps and PQs Advanced Trie, B-Tree Splay Trees R-Tree Union Find Applications Databases Spatial String In Memory L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 3
- 4. Antonin Guttman, U. C. Berkeley K. A. Mohamed L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 4
- 5. 5 Spatial Data R-Tree Structure Operations Searching Insertion Deletion Variants Applications L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
- 6. Given a city map, „index‟ all university buildings in an efficient structure for quick topological search. 6L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
- 7. 7 “Index” buildings in an efficient structure for quick search Spatial object: Contour (outline) of the area around the building(s). Minimum bounding region (MBR) of the object. L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
- 8. 8 MBR of the city neighbourhoods. MBR of the city defining the overall search region. L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
- 9. Mostly involves 2D regions. Need to support 2D range queries. Multiple return values desired: Answering a query region by reporting all spatial objects that are fully-contained-in or overlapping the query region (Spatial-Access Method – SAM). In general: Spatial data objects often cover areas in multidimensional spaces. Spatial data objects are not well-represented by point-location. An „index‟ based on an object‟s spatial location is desirable. 9L7 - R-TreesCS 6213 - Advanced Data Structures - Arora Problem Summary: To retrieve data items quickly and efficiently according to their spatial locations.
- 10. A B-Tree is an ordered, dynamic, multi-way structure of order m (i.e. each node has at most m children). The keys and the subtrees are arranged in the fashion of a search tree. Each node may contain a large number of keys, and the number of subtrees in each node, then, may also be large. The B-Tree is designed (among other objectives): to branch out this large number of directions, and to contain a lot of keys in each node so that the height of the tree is relatively short. 10 M P T X B D F G K L N O Q S V W Y ZI E H L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
- 11. A height-balanced tree, similar to a B-Tree. Index records in the leaf nodes contain pointers to the actual spatial-objects (entries) they represent. Each entry has a unique identifier that points to one spatial object, and its MBR; i.e., entry = (MBR, pointer). Spatial searching requires visiting only a small number of nodes. The index is completely dynamic: inserts and deletes can be intermixed with searches. (No periodic reorganization is required.) 11L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
- 12. Let M be the maximum number of entries that will fit in one node. Let m ≤ M/2 be a parameter specifying the minimum number of entries in one node. Then an R-Tree must satisfy the following properties: 1. Every leaf node contains between m and M index records, unless it is the root. 2. For each index-record Entry (I, tuple-identifier) in a leaf node, I is the MBR that spatially contains the n-dimensional data object represented by the tuple-identifier. 3. Every non-leaf node has between m and M children, unless it is the root. 4. For each Entry (I, child-pointer) in a non-leaf node, I is the MBR that spatially contains the regions in the child node. 5. The root has two children unless it is a leaf. 6. All leaves appear on the same level. 12L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
- 13. An entry E in a leaf node is defined as: E = (I, tuple-identifier) Where I refers to the smallest binding n-dimensional region (MBR) that encompasses the spatial data pointed to by its tuple- identifier. I is a series of closed-intervals that make up each dimension of the binding region. Example. In 2D, I = (Ix, Iy), where Ix = [xa, xb], and Iy = [ya, yb]. 13L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
- 14. [Not limited to 2D – higher dimensions are certainly possible.] In general I = (I0, I1, …, In-1) for n-dimensions, and that Ik = [ka, kb]. If either ka or kb (or both) are equal to , this means that the spatial object extends outward indefinitely along that dimension. 14L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
- 15. An entry E in a non-leaf node is defined as: E = (I, child-pointer) Where the child-pointer points to the child of this node, and I is the MBR that encompasses all the regions in the child-node‟s pointer‟s entries. 15 I(A) I(B) … I(M) I(a) I(b) I(c) I(d) B a b c d L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
- 16. L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 16
- 17. a b c d e f g h i j k l m n o p L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 17
- 18. a b c d m a b cd e f g h i j k l m n o p L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 18
- 19. a b c d m e f n a b cd e f g h i j k l m n o p L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 19
- 20. a b c d m e f n h g i o p a b cd e f g h i j k l m n o p L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 20
- 21. 21 Typical query: Find and report all university building sites that are within 5km of the city centre. Approach: i.Build the R-Tree using rectangular regions a, b, … i. ii.Formulate the query range Q. iii.Query the R- Tree and report all regions overlapping Q. L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
- 22. Let Q be the query region. Let T be the root of the R-Tree. Search all entry-records whose regions overlaps Q. Search sub-trees: If T is not leaf, then apply Search on ever child-node entry E whose I overlaps Q. Search leaf nodes: If T is leaf, then check each entry E in the leaf and return E if E.I overlaps Q. 22L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
- 23. 23 r2 e r5 r8 r3 r4r1 r7r0 ic gf hba d @ r6 @ r2 @ r5 @ r8 @ r0 @ r1 @ r7 @ r3 @ r4 R-Tree settings: M = m = L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
- 24. 24 The search algorithm descends the tree from the root in a manner similar to a B-Tree. More than one subtree under a node visited may need to be searched. Cannot guarantee good worst-case performance. Countered by the algorithms during insertion, deletion, and update that maintain the tree in a form that allows the search algorithm to eliminate irrelevant regions of the indexed space. So that only data near the search area need to be examined. Emphasis is on the optimal placement of spatial objects with respect to the spatial location of other objects in the structure. L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
- 25. A Node-Overflow happens when a new Entry is added to a fully packed node, causing the resulting number of entries in the node to exceed the upper-bound M. The „overflow‟ node must be split, and all its current entries, as well as the new one, consolidated for local optimum arrangement. A Node-Underflow happens when one or more Entries are removed from a node, causing the remaining number of entries in that node to fall below the lower-bound m. The underflow node must be condensed, and its entries dispersed for global optimum arrangement. 25L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
- 26. 26 New index entry-records are added to the leaves. Nodes that overflow are split, and splits propagate up the tree. A split-propagation may cause the tree to grow in height. The main Insert routine Let E = (I, tuple-identifier) be the new entry to be inserted. Let T be the root of the R-Tree. [Ins_1] Locate a leaf L starting from T to insert E. [Ins_2] Add E to L. If L is already full (overflow), split L into L and L‟. [Ins_3] Propagate MBR changes (enlarged or reduced) upwards. [Ins_4] Grow tree taller if node split propagation causes T to split. L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
- 27. Similar to insertion into B+-tree but may insert into any leaf; leaf splits in case capacity exceeded. Which leaf to insert into? (Choose Leaf) How to split a node? (Node Split) L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 27
- 28. m n o p L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 28
- 29. 29 [Ins_1] Locate a leaf L starting from T to insert E = (I, tuple-identifier). Notion (i): Select the path that would require the least enlargement to include E.I. Notion (ii): Resolve ties by choosing the child-node with the smallest MBR. Invoke: L = ChooseLeaf (E, T). A B C @rN A C B E.I rN L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
- 30. 30 Algorithm: ChooseLeaf (E, N) Inputs: (i) Entry E = (I, tuple-identifier), (ii) A valid R-Tree node N. Output: The leaf L where E should be inserted. If N is leaf Then Return N Let FS be the set of current entries in the node N Let F = (I, child-pointer) FS, so that F.I satisfies the Insertion- Notions Return ChooseLeaf (E, F.child-pointer) L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
- 31. 31 [Ins_2] Add E to L. Notion (i): If L has room for another entry, install E. Notion (ii): Otherwise split L to obtain L and L‟, which between them, will contain all previous entries in L and the new E (consolidated for local optima). [Ins_3] Propagate MBR changes upwards by invoking AdjustTree (L, L‟). Notion (i): Ascend from leaf L to the root T while adjusting the covering rectangles MBR. Notion (ii): If L‟ exists, propagate node splits as necessary; i.e. attempt to install a new entry in the parent of L to point to L‟. L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
- 32. 32 Example. Found L = @Y to insert new E = e. R-Tree settings: M = 3, m = 1. K @G a b c @Y X Y Z @K L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
- 33. 33 Algorithm: AdjustTree (N, N’) Inputs: (i) A node N that has had its contents modified, (ii) The resultant split node N‟, if not NULL, that accompanies N. Outputs: (i) N as above, (ii) N‟ as above. If N is the root Then Return {(i) N, (ii) N‟} Let PN be the parent node of N. Let EN = (I_N, child-pointer_N) in PN, where child-pointer_N points to N. Adjust I_N so that it tightly encloses all entry regions in N. L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
- 34. 34 If N‟ is Not NULL Then If number of entries in PN < M-1 Then Create a new Entry EN‟ = (I_N’, child-pointer_N’) Install EN‟ in PN Return AdjustTree (PN, NULL) Else Set {PN, PN‟} = SplitNode (PN, EN‟) Return AdjustTree (PN, PN‟) End If Else Return AdjustTree (PN, NULL) End If L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
- 35. [Ins_4] Grow Tree taller. Notion: If the recursive node split propagation causes the root to split, then create a new root whose children are the two resulting nodes. 35 A B C @T (root) E F @C G H @C’ L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
- 36. 36 The height of the R-Tree containing n entry-records is at most logm n – 1, because the branching factor of each node is at least m. The maximum number of nodes is: Worst case space utilisation for all nodes except the root is: Nodes will tend to have more than m entries, and this will: L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
- 37. 37 Current index entry-records are removed from the leaves. Nodes that underflow are condensed, and its contents redistributed appropriately throughout the tree. A condense propagation may cause the tree to shorten in height. The main Delete routine Let E = (I, tuple-identifier) be a current entry to be removed. Let T be the root of the R-Tree. [Del_1] Find the leaf L starting from T that contains E. [Ins_2] Remove E from L, and condense „underflow‟ nodes. [Ins_3] Propagate MBR changes upwards. [Ins_4] Shorten tree if T contains only 1 entry after condense propagation. L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
- 38. [Del_1] Find the leaf L starting from T that contains E. Algorithm: FindLeaf (E, N) Inputs: (i) Entry E = (I, tuple-identifier), (ii) A valid R-Tree node N. Output: The leaf L containing E. If N is leaf Then If N contains E Then Return N Else Return NULL Else Let FS be the set of current entries in N. For each F = (I, child-pointer) FS where F.I overlaps E.I Do Set L = FindLeaf (E, F.child-pointer) If L is not NULL Then Return L Next F Return NULL End If 38L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
- 39. [Del_2] Remove E from L, and condense „underflow‟ nodes. [Del_3] Propagate MBR changes upwards. Notion (i): Ascend from leaf L to root T while adjusting covering rectangles MBR. Notion (ii): If after removing the entry E in L and the number of entries in L becomes fewer than m, then the node L has to be eliminated and its remaining contents relocated. 39L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
- 40. Propagate these notions upwards by invoking CondenseTree (N, QS), where N is an R-Tree node whose entries have been modified, and QS is the set of eliminated nodes. Start the propagation by setting N = L, and QS = . Re-insert the entries from the eliminated nodes in QS back into the tree. Entries from eliminated leaf nodes are re-inserted as new entries using the Insert routine discussed earlier. Entries from higher-level nodes must be placed higher in the tree so that leaves of their dependent subtrees will be on the same level as the leaves on the main tree. 40L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
- 41. Example: Delete the index entry-record b. R-Tree settings: M = 4, m = 2. Spatial constraint: a.I will form smallest MBR with r4. 41 r2 r6 @ r7 a b @ r0 r0 r1 @ r2 r3 r4 r5 @ r6 c d e @ r1 f g h @ r3 i j @ r4 k l m @ r5 n L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
- 42. 42 Algorithm: CondenseTree (N, QS) Inputs: (i) A node N whose entries have been modified, (ii) A set of eliminated nodes QS. If N is NOT the root Then Let PN be the parent node of N. Let EN = (I_N, child-pointer_N) in PN. If N.entries < m Then Delete EN from PN Add N to QS Else Adjust I_N so that it tightly encloses all entry regions in N. End If CondenseTree (PN, QS) L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
- 43. 43 Else If N is root AND Q is NOT Then For each Q QS Do For each E Q Do If Q is leaf Then Insert (E) Else Insert (E) as a node entry at the same node level as Q End If Next E Next Q End If L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
- 44. Why ‘re-insert’ orphaned entries? Alternatively, like the delete routine in B-Tree (Rosenberg & Snyder, 1981), an „underflow‟ node can be merged with whichever adjacent sibling that will have its area increased the least, or its entries re-distributed among sibling nodes. Both methods can cause the nodes to split. Eventually all changes need to be propagated upwards, anyway. Re-insertion accomplishes the same thing, and: It is simpler to implement (and at comparable efficiency). It incrementally refines the spatial structure of the tree. It prevents gradual deterioration if each entry was located permanently under the same parent node. 44L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
- 45. 45 A high value of m, nearer to M, is useful when the underlying database represented by the R-Tree is mostly used for search inquiries with very few updates. The height of the tree will be kept to a minimum. High search performance is maintained. However, the risk of overflow and underflow is also high. A small value of m is good when frequent updates and modifications of the underlying database is required. The nodes are less dense. Maintenance is less costly. L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
- 46. Avoids multiple paths during searching. Objects may be stored in multiple nodes MBRs of nodes at same tree level do not overlap On insertion/deletion the tree may change downward or upward in order to maintain the structure L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 46 R-TreeVariants
- 47. http://perso.enst.fr/~saglio/bdas/EPFL0525/sld041.htm L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 47 R-TreeVariants
- 48. Similar to other R-Trees except that the Hilbert value of its rectangle centroid is calculated. That key is used to guide the insertion On an overflow, evenly divide between two nodes Experiments has shown that this scheme significantly improves performance and decreases insertion complexity. Hilbert R-tree achieves up to 28% saving in the number of pages touched compared to R*-tree. L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 48 R-TreeVariants
- 49. The Hilbert value of an object is found by interleaving the bits of its x and y coordinates, and then chopping the binary string into 2- bit strings. Then, for every 2-bit string, if the value is 0, we replace every 1 in the original string with a 3, and vice-versa. If the value of the 2-bit string is 3, we replace all 2‟s and 0‟s in a similar fashion. After this is done, you put all the 2-bit strings back together and compute the decimal value of the binary string; This is the Hilbert value of the object. http://www-users.cs.umn.edu/research/shashi- group/CS8715/exercise_ans.doc L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 49 R-TreeVariants
- 50. Proposed by Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger in 1990 Same algorithm as the regular R-tree for query and delete operations. When inserting, the R*-tree uses a combined strategy. For leaf nodes, overlap is minimized For inner nodes, enlargement and area are minimized. When splitting, the R*-tree uses a topological split that chooses a split axis based on perimeter, then minimizes overlap. In addition to an improved split strategy, the R*-tree also tries to avoid splits by reinserting objects and subtrees into the tree, inspired by the concept of balancing a B-tree. L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 50 R-TreeVariants
- 51. MBR: Minimum Bounding Rectangle R-Trees are an extremely compelling data structure for spatial data. Largely based on B-Tree (Can be considered a generalization of B-Tree) Can support more than two dimensions Support same basic operations (deletion, searching, insertion, update, etc.) Many variants of R-Trees are available L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 51

No public clipboards found for this slide

×
### Save the most important slides with Clipping

Clipping is a handy way to collect and organize the most important slides from a presentation. You can keep your great finds in clipboards organized around topics.

Be the first to comment