2IL05 Data Structures Spring 2010


  1. 1. 2IL05 Data Structures Spring 2010 Lecture 10: Range Searching
  2. 2. Augmenting data structures <ul><li>Methodology for augmenting a data structure </li></ul><ul><li>Choose an underlying data structure. </li></ul><ul><li>Determine additional information to maintain. </li></ul><ul><li>Verify that we can maintain additional information for existing data structure operations. </li></ul><ul><li>Develop new operations. </li></ul><ul><li>You don’t need to do these steps in strict order! </li></ul><ul><li>Red-black trees are very well suited to augmentation … </li></ul>
  3. 3. Augmenting red-black trees <ul><li>Theorem Augment a red-black tree with a field f , where f [ x ] depends only on information in x , left[ x ], and right[ x ] (including f [left[ x ]] and f [right[ x ]]). Then we can maintain the values of f in all nodes during insert and delete without affecting the O(log n) performance. </li></ul><ul><li>When we alter information in x, changes propagate only upward on the search path for x … </li></ul><ul><li>Examples </li></ul><ul><ul><li>OS-tree new operations: OS-Select and OS-Rank </li></ul></ul><ul><ul><li>Interval-tree new operation: Interval-Search </li></ul></ul>
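A minimal sketch of the theorem's idea, assuming the extra field is the subtree size used by an OS-tree. The names `Node`, `size`, `update`, `rotate_left`, and `os_select` are illustrative and not taken from the slides, and red-black colouring is left out. Because the field depends only on a node and its two children, a rotation can repair it in O(1) time.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        # Augmented field: number of nodes in the subtree rooted here.
        self.size = 1 + size(left) + size(right)


def size(node):
    """Size of a possibly empty subtree."""
    return node.size if node is not None else 0


def update(node):
    """Recompute the field from the node's children only."""
    node.size = 1 + size(node.left) + size(node.right)


def rotate_left(x):
    """Left-rotate around x and return the new subtree root."""
    y = x.right
    x.right = y.left
    y.left = x
    update(x)  # x is now the lower node, so repair it first
    update(y)
    return y


def os_select(node, i):
    """Return the node with the i-th smallest key (1-based) in the subtree."""
    rank = size(node.left) + 1
    if i == rank:
        return node
    if i < rank:
        return os_select(node.left, i)
    return os_select(node.right, i - rank)


root = Node(10, Node(5), Node(20, Node(15), Node(30)))
root = rotate_left(root)
print([os_select(root, i).key for i in range(1, 6)])
```

Only the two nodes touched by the rotation need their field recomputed, which is why rebalancing keeps its O(log n) bound.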
  4. 4. Range Searching
  5. 5. Application: Database queries <ul><li>Example : Database for personnel administration (name, address, date of birth, salary, …) </li></ul><ul><li>Query : Report all employees born between 1950 and 1955 who earn between $3000 and $4000 per month. </li></ul><ul><li>More parameters? Report all employees born between 1950 and 1955 who earn between $3000 and $4000 per month and have between two and four children. ➨ more dimensions </li></ul>
  6. 6. Application: Database queries <ul><li>“Report all employees born between 1950 and 1955 who earn between $3000 and $4000 per month and have between two and four children.” </li></ul><ul><li>Such a query is a rectangular range query, also called an orthogonal range query. </li></ul>
  7. 7. 1-Dimensional range searching <ul><li>P = { p_1 , p_2 , …, p_n } set of points on the real line </li></ul><ul><li>Query : given a query interval [x : x’] report all p_i ∈ P with p_i ∈ [x : x’]. </li></ul><ul><li>Solution : Use a balanced binary search tree T . </li></ul><ul><ul><ul><li>leaves of T store the points p_i </li></ul></ul></ul><ul><ul><ul><li>internal nodes store splitting values (node v stores value x_v ) </li></ul></ul></ul>
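The tree described on this slide could be built as in the sketch below. It assumes the input is already sorted and contains distinct values, and it assumes that the splitting value of a node is the largest value in its left subtree (the slide only says "splitting values"). The class and function names are illustrative.

```python
class Leaf:
    def __init__(self, point):
        self.point = point


class Internal:
    def __init__(self, split, left, right):
        self.split = split  # splitting value x_v
        self.left = left
        self.right = right


def build_tree(sorted_points):
    """Build a balanced tree over a non-empty sorted list of distinct numbers."""
    if len(sorted_points) == 1:
        return Leaf(sorted_points[0])
    mid = (len(sorted_points) + 1) // 2  # the left half gets the extra point
    left = build_tree(sorted_points[:mid])
    right = build_tree(sorted_points[mid:])
    # Assumption: x_v is the largest value stored in the left subtree.
    return Internal(sorted_points[mid - 1], left, right)


def leaves(node):
    """Points at the leaves, from left to right."""
    if isinstance(node, Leaf):
        return [node.point]
    return leaves(node.left) + leaves(node.right)


def height(node):
    if isinstance(node, Leaf):
        return 0
    return 1 + max(height(node.left), height(node.right))


points = [3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 98]
tree = build_tree(points)
print(tree.split, height(tree))
```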
  8. 8. 1-Dimensional range searching <ul><li>Query [x : x’] ➨ search with x and x’ ➨ end in two leaves μ and μ’ </li></ul><ul><li>Report 1. all leaves between μ and μ’ </li></ul><ul><li>2. possibly points stored at μ and μ’ </li></ul>[Figure: balanced binary search tree on the points 3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 98, with the search paths for the query [18 : 77] ending in the leaves μ and μ’]
  9. 9. 1-Dimensional range searching <ul><li>Query [x : x’] ➨ search with x and x’ ➨ end in two leaves μ and μ’ </li></ul><ul><li>Report 1. all leaves between μ and μ’ </li></ul><ul><li>2. possibly points stored at μ and μ’ </li></ul>[Figure: the same tree and the query [18 : 77]] How do we find all leaves between μ and μ’?
  10. 10. 1-Dimensional range searching <ul><li>How do we find all leaves between μ and μ’ ? </li></ul><ul><li>Solution : They are the leaves of the subtrees rooted at nodes v in between the two search paths whose parents are on the search paths. </li></ul><ul><li>➨ we need to find the node v_split where the search paths split </li></ul>
  11. 11. 1-Dimensional range searching <ul><li>FindSplitNode (T, x, x’) </li></ul><ul><li>► Input . A tree T and two values x and x’ with x ≤ x’ </li></ul><ul><li>► Output . The node v where the paths to x and x’ split, or the leaf where both paths end. </li></ul><ul><li>v = root(T) </li></ul><ul><li>while v is not a leaf and (x’ ≤ x_v or x > x_v ) </li></ul><ul><li>do if x’ ≤ x_v </li></ul><ul><li>then v = left(v) </li></ul><ul><li>else v = right(v) </li></ul><ul><li>return v </li></ul>
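FindSplitNode translates almost line by line into Python. The sketch below assumes a tree whose internal nodes store the largest value of their left subtree as splitting value; `Node`, `build`, and `find_split_node` are illustrative names.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value  # the point at a leaf, the splitting value x_v otherwise
        self.left = left
        self.right = right

    def is_leaf(self):
        return self.left is None


def build(pts):
    """Balanced tree over a non-empty sorted list of distinct numbers."""
    if len(pts) == 1:
        return Node(pts[0])
    mid = (len(pts) + 1) // 2
    return Node(pts[mid - 1], build(pts[:mid]), build(pts[mid:]))


def find_split_node(root, x, x2):
    """Return the node where the search paths to x and x2 split (x <= x2)."""
    v = root
    while not v.is_leaf() and (x2 <= v.value or x > v.value):
        # Both paths continue into the same child.
        v = v.left if x2 <= v.value else v.right
    return v


tree = build([3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 98])
print(find_split_node(tree, 18, 77).value)
```

For the query [18 : 77] from the earlier slides the paths already split at the root, because 18 and 77 lie on different sides of the root's splitting value.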
  12. 12. 1-Dimensional range searching <ul><li>Starting from v_split follow the search path to x. </li></ul><ul><ul><li>when the path goes left, report all leaves in the right subtree </li></ul></ul><ul><ul><li>check if µ ∈ [x : x’] </li></ul></ul><ul><li>Starting from v_split follow the search path to x’. </li></ul><ul><ul><li>when the path goes right, report all leaves in the left subtree </li></ul></ul><ul><ul><li>check if µ’ ∈ [x : x’] </li></ul></ul>
  13. 13. 1-Dimensional range searching <ul><li>1DRangeQuery (T, [x : x’]) </li></ul><ul><li>► Input . A binary search tree T and a range [x : x’]. </li></ul><ul><li>► Output . All points stored in T that lie in the range. </li></ul><ul><li>v_split = FindSplitNode (T, x, x’) </li></ul><ul><li>if v_split is a leaf </li></ul><ul><li>then Check if the point stored at v_split must be reported. </li></ul><ul><li>else ( Follow the path to x and report the points in subtrees right of the path ) </li></ul><ul><li>v = left(v_split) </li></ul><ul><li>while v is not a leaf </li></ul><ul><li>do if x ≤ x_v </li></ul><ul><li>then ReportSubtree (right(v)) </li></ul><ul><li>v = left(v) </li></ul><ul><li>else v = right(v) </li></ul><ul><li>Check if the point stored at the leaf v must be reported. </li></ul><ul><li>Similarly, follow the path to x’, report the points in subtrees left of the path, and check if the point stored at the leaf where the path ends must be reported. </li></ul>
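The complete query, including the "similarly" part for the path to x’, might look like the following sketch. It again assumes that each splitting value is the largest value of the left subtree, which is what makes the test `x2 > v.value` the mirror image of `x <= v.value`. All names are illustrative, and the result is sorted only to make it easy to compare.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value  # the point at a leaf, the splitting value x_v otherwise
        self.left = left
        self.right = right

    def is_leaf(self):
        return self.left is None


def build(pts):
    """Balanced tree over a non-empty sorted list of distinct numbers."""
    if len(pts) == 1:
        return Node(pts[0])
    mid = (len(pts) + 1) // 2
    return Node(pts[mid - 1], build(pts[:mid]), build(pts[mid:]))


def find_split_node(root, x, x2):
    v = root
    while not v.is_leaf() and (x2 <= v.value or x > v.value):
        v = v.left if x2 <= v.value else v.right
    return v


def report_subtree(v, out):
    """Append every point stored in the leaves below v."""
    if v.is_leaf():
        out.append(v.value)
    else:
        report_subtree(v.left, out)
        report_subtree(v.right, out)


def range_query_1d(root, x, x2):
    out = []
    split = find_split_node(root, x, x2)
    if split.is_leaf():
        if x <= split.value <= x2:
            out.append(split.value)
        return out
    # Follow the path to x; subtrees hanging off to the right are in range.
    v = split.left
    while not v.is_leaf():
        if x <= v.value:
            report_subtree(v.right, out)
            v = v.left
        else:
            v = v.right
    if x <= v.value <= x2:
        out.append(v.value)
    # Follow the path to x2; subtrees hanging off to the left are in range.
    v = split.right
    while not v.is_leaf():
        if x2 > v.value:
            report_subtree(v.left, out)
            v = v.right
        else:
            v = v.left
    if x <= v.value <= x2:
        out.append(v.value)
    return sorted(out)


points = [3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 98]
print(range_query_1d(build(points), 18, 77))
```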
  14. 14. 1-Dimensional range searching <ul><li>Correctness ? </li></ul><ul><li>Need to show two things for 1DRangeQuery: </li></ul><ul><li>every reported point lies in the query range </li></ul><ul><li>every point in the query range is reported. </li></ul>
  15. 15. 1-Dimensional range searching <ul><li>Query time ? </li></ul><ul><li>ReportSubtree = O(1 + reported points) </li></ul><ul><li>➨ total query time = O(log n + reported points) </li></ul><ul><li>Storage ? </li></ul><ul><li>O(n) </li></ul>
  16. 16. 2-Dimensional range searching <ul><li>P = { p_1 , p_2 , …, p_n } set of points in the plane </li></ul><ul><li>Query : given a query rectangle [x : x’] × [y : y’] report all p_i ∈ P with p_i ∈ [x : x’] × [y : y’] , that is, p_x ∈ [x : x’] and p_y ∈ [y : y’] </li></ul><ul><li>➨ a 2-dimensional range query is composed of two 1-dimensional sub-queries </li></ul><ul><li>How can we generalize our 1-dimensional solution to 2 dimensions? </li></ul><ul><li>For now: no two points have the same x-coordinate, no two points have the same y-coordinate </li></ul>
  17. 17. Back to one dimension … [Figure: the points 3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 98 on the real line and the balanced binary search tree storing them]
  18. 18. Back to one dimension … [Figure: the same points and binary search tree]
  19. 19. And now in two dimensions … <ul><li>Split alternating on x- and y-coordinate </li></ul>[Figure: 2-dimensional kd-tree on the points p_1, …, p_10 with splitting lines l_1, …, l_9, shown both as a subdivision of the plane and as a tree]
  20. 20. Kd-trees <ul><li>BuildKdTree (P, depth) </li></ul><ul><li>► Input . A set of points P and the current depth. </li></ul><ul><li>► Output . The root of a kd-tree storing P. </li></ul><ul><li>if P contains only one point </li></ul><ul><li>then return a leaf storing this point </li></ul><ul><li>else if depth is even </li></ul><ul><li>then split P into two subsets with a vertical line l through the median x-coordinate of the points in P. Let P_1 be the set of points to the left of or on l, and let P_2 be the set of points to the right of l. </li></ul><ul><li>else split P into two subsets with a horizontal line l through the median y-coordinate of the points in P. Let P_1 be the set of points below or on l, and let P_2 be the set of points above l. </li></ul><ul><li>v_left = BuildKdTree (P_1 , depth+1) </li></ul><ul><li>v_right = BuildKdTree (P_2 , depth+1) </li></ul><ul><li>Create a node v storing l, make v_left the left child of v, and make v_right the right child of v </li></ul><ul><li>return v </li></ul>
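A compact Python sketch of BuildKdTree, assuming points are `(x, y)` tuples with distinct x- and distinct y-coordinates. For brevity it sorts at every level, which costs O(n log² n) overall instead of the presorted variant the slides have in mind. `KdNode` and the helper names are illustrative.

```python
class KdNode:
    def __init__(self, point=None, axis=None, split=None, left=None, right=None):
        self.point = point  # set only at leaves
        self.axis = axis    # 0: vertical splitting line, 1: horizontal
        self.split = split  # coordinate of the splitting line l
        self.left = left
        self.right = right


def build_kd_tree(points, depth=0):
    """Build a kd-tree on a non-empty list of (x, y) tuples."""
    if len(points) == 1:
        return KdNode(point=points[0])
    axis = depth % 2  # even depth: split on x, odd depth: split on y
    pts = sorted(points, key=lambda p: p[axis])
    mid = (len(pts) + 1) // 2
    split = pts[mid - 1][axis]  # median; points on the line go to P_1
    return KdNode(axis=axis, split=split,
                  left=build_kd_tree(pts[:mid], depth + 1),
                  right=build_kd_tree(pts[mid:], depth + 1))


def leaf_points(v):
    if v.point is not None:
        return [v.point]
    return leaf_points(v.left) + leaf_points(v.right)


def height(v):
    if v.point is not None:
        return 0
    return 1 + max(height(v.left), height(v.right))


pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
root = build_kd_tree(pts)
print(root.axis, root.split, height(root))
```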
  21. 21. Kd-trees <ul><li>Running time of BuildKdTree ? </li></ul><ul><li>$T(n) = \begin{cases} O(1) & \text{if } n = 1 \\ O(n) + 2T(n/2) & \text{if } n > 1 \end{cases}$ </li></ul><ul><li>➨ T(n) = O(n log n) </li></ul><ul><li>presort to avoid linear time median finding … </li></ul>
  22. 22. Kd-trees <ul><li>Storage of the kd-tree built by BuildKdTree ? </li></ul><ul><li>O(n) </li></ul>
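One way to see why the recurrence solves to O(n log n) is to unroll it. The derivation below assumes that n is a power of two and writes the O(n) term as cn for some constant c; both are simplifications made for this sketch.

```latex
\begin{align*}
T(n) &\le cn + 2\,T(n/2) \\
     &\le cn + 2\bigl(c\,\tfrac{n}{2} + 2\,T(n/4)\bigr) = 2cn + 4\,T(n/4) \\
     &\le \dots \le i \cdot cn + 2^{i}\,T(n/2^{i}) \\
     &= cn \log_2 n + n\,T(1) = O(n \log n) \qquad (i = \log_2 n)
\end{align*}
```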
  23. 23. Querying a kd-tree <ul><li>Each node v corresponds to a region region(v). </li></ul><ul><li>All points which are stored in the subtree rooted at v lie in region(v). </li></ul><ul><li>➨ 1. if a region is contained in query rectangle, report all points in region. </li></ul><ul><li>2. if region is disjoint from query rectangle, report nothing. </li></ul><ul><li>3. if region intersects query rectangle, refine search (test children or point stored in region if no children) </li></ul>
  24. 24. Querying a kd-tree [Figure: kd-tree on the points p_1, …, p_13, shown as a subdivision of the plane and as a tree] Disclaimer: This tree cannot have been constructed by BuildKdTree …
  25. 25. Querying a kd-tree <ul><li>SearchKdTree (v, R) </li></ul><ul><li>► Input . The root of (a subtree of) a kd-tree, and a range R. </li></ul><ul><li>► Output . All points at leaves below v that lie in the range. </li></ul><ul><li>if v is a leaf </li></ul><ul><li>then report the point stored at v if it lies in R </li></ul><ul><li>else if region(left(v)) is fully contained in R </li></ul><ul><li>then ReportSubtree (left(v)) </li></ul><ul><li>else if region(left(v)) intersects R </li></ul><ul><li>then SearchKdTree (left(v), R) </li></ul><ul><li>if region(right(v)) is fully contained in R </li></ul><ul><li>then ReportSubtree (right(v)) </li></ul><ul><li>else if region(right(v)) intersects R </li></ul><ul><li>then SearchKdTree (right(v), R) </li></ul><ul><li>Query time ? </li></ul>
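The query could be sketched in Python as follows. Regions are not stored in the nodes here; as an assumption of this sketch they are computed on the way down from the splitting lines, and the right child's region is treated as closed at the splitting line, which is slightly larger than the true region and therefore still safe. Points are `(x, y)` tuples with distinct coordinates, and a range is written `((x, x2), (y, y2))`. All names are illustrative.

```python
import math


class KdNode:
    def __init__(self, point=None, axis=None, split=None, left=None, right=None):
        self.point, self.axis, self.split = point, axis, split
        self.left, self.right = left, right


def build_kd_tree(points, depth=0):
    if len(points) == 1:
        return KdNode(point=points[0])
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = (len(pts) + 1) // 2
    return KdNode(axis=axis, split=pts[mid - 1][axis],
                  left=build_kd_tree(pts[:mid], depth + 1),
                  right=build_kd_tree(pts[mid:], depth + 1))


def contained(region, R):
    """True if region lies completely inside the range R."""
    return all(R[a][0] <= region[a][0] and region[a][1] <= R[a][1] for a in (0, 1))


def intersects(region, R):
    return all(region[a][0] <= R[a][1] and R[a][0] <= region[a][1] for a in (0, 1))


def child_regions(v, region):
    """Cut region(v) along the splitting line stored at v."""
    left, right = list(region), list(region)
    lo, hi = region[v.axis]
    left[v.axis] = (lo, v.split)
    right[v.axis] = (v.split, hi)  # closed at the line: a safe over-estimate
    return tuple(left), tuple(right)


def report_subtree(v, out):
    if v.point is not None:
        out.append(v.point)
    else:
        report_subtree(v.left, out)
        report_subtree(v.right, out)


WHOLE_PLANE = ((-math.inf, math.inf), (-math.inf, math.inf))


def search_kd_tree(v, R, region=WHOLE_PLANE, out=None):
    """Collect all points below v that lie in the range R."""
    if out is None:
        out = []
    if v.point is not None:
        if all(R[a][0] <= v.point[a] <= R[a][1] for a in (0, 1)):
            out.append(v.point)
        return out
    for child, reg in zip((v.left, v.right), child_regions(v, region)):
        if contained(reg, R):
            report_subtree(child, out)
        elif intersects(reg, R):
            search_kd_tree(child, R, reg, out)
    return out


pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
print(sorted(search_kd_tree(build_kd_tree(pts), ((4, 8), (1, 5)))))
```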
  26. 26. Querying a kd-tree: Analysis <ul><li>Time to traverse subtree and report points stored in leaves is linear in number of leaves </li></ul><ul><li>➨ ReportSubtree takes O( k ) time, k total number of reported points </li></ul><ul><li>need to bound number of nodes visited that are not in the traversed subtrees (grey nodes) </li></ul><ul><li>the query range properly intersects the region of each such node </li></ul>We are only interested in an upper bound … How many regions can a vertical line intersect?
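  27. 27. Querying a kd-tree: Analysis <ul><li>Question : How many regions can a vertical line intersect? </li></ul><ul><li>Q(n): number of intersected regions </li></ul><ul><li>Answer 1 : Q(n) = 1 + Q(n/2). This is too optimistic: at a node that splits on x the line intersects the region of only one child, but at a node that splits on y it can intersect the regions of both children. </li></ul><ul><li>Answer 2 : look two levels down at a time: $Q(n) = \begin{cases} O(1) & \text{if } n = 1 \\ 2 + 2Q(n/4) & \text{if } n > 1 \end{cases}$ </li></ul><ul><li>Master theorem ➨ Q(n) = O( √ n) </li></ul>[Figure: kd-tree subdivision on the points p_1, …, p_13]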
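Besides the master theorem, the bound can be checked by unrolling the recurrence Q(n) = 2 + 2Q(n/4). The derivation assumes that n is a power of four, which is a simplification made for this sketch.

```latex
\begin{align*}
Q(n) &= 2 + 2\,Q(n/4) \\
     &= 2 + 4 + 4\,Q(n/16) \\
     &= \sum_{j=1}^{i} 2^{j} + 2^{i}\,Q(n/4^{i}) \\
     &= (2^{i+1} - 2) + 2^{i}\,Q(1) \qquad (i = \log_4 n,\ 2^{i} = \sqrt{n}) \\
     &= 2\sqrt{n} - 2 + \sqrt{n}\cdot O(1) = O(\sqrt{n})
\end{align*}
```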
  28. 28. KD-trees <ul><li>Theorem A kd-tree for a set of n points in the plane uses O( n ) storage and can be built in O( n log n ) time. A rectangular range query on the kd-tree takes O( √ n + k ) time, where k is the number of reported points. </li></ul>If the number k of reported points is small, then the query time O( √ n + k ) is relatively high. Can we do better? Trade storage for query time …
  29. 29. Back to 1 dimension … (again) <ul><li>A 1DRangeQuery with [x : x’] gives us all points whose x-coordinates lie in the x-interval [x : x’] of the query range [x : x’] × [y : y’]. </li></ul><ul><li>These points are stored in O(log n) subtrees. </li></ul><ul><li>Canonical subset of node v: the points stored in the leaves of the subtree rooted at v </li></ul><ul><li>Idea : store each canonical subset in a binary search tree on the y-coordinate </li></ul>
  30. 30. Range trees <ul><li>Range tree </li></ul><ul><ul><li>The main tree is a balanced binary search tree T built on the x-coordinate of the points in P. </li></ul></ul><ul><ul><li>For any internal or leaf node ν in T , the canonical subset P(ν) is stored in a balanced binary search tree T_assoc(ν) on the y-coordinate of the points. The node ν stores a pointer to the root of T_assoc(ν), which is called the associated structure of ν. </li></ul></ul>
  31. 31. Range trees <ul><li>Build2DRangeTree (P) </li></ul><ul><li>► Input . A set P of points in the plane. </li></ul><ul><li>► Output . The root of a 2-dimensional range tree. </li></ul><ul><li>Construct the associated structure: Build a binary search tree T_assoc on the set P_y of y-coordinates of the points in P. Store at the leaves of T_assoc not just the y-coordinates of the points in P_y , but the points themselves. </li></ul><ul><li>if P contains only one point </li></ul><ul><li>then create a leaf v storing this point, and make T_assoc the associated structure of v </li></ul><ul><li>else split P into two subsets; one subset P_left contains the points with x-coordinate less than or equal to x_mid , the median x-coordinate, and the other subset P_right contains the points with x-coordinate larger than x_mid </li></ul><ul><li>v_left = Build2DRangeTree (P_left ) </li></ul><ul><li>v_right = Build2DRangeTree (P_right ) </li></ul><ul><li>create a node v storing x_mid , make v_left the left child of v, make v_right the right child of v, and make T_assoc the associated structure of v </li></ul><ul><li>return v </li></ul>
  32. 32. Range trees <ul><li>Running time of Build2DRangeTree ? </li></ul><ul><li>presort to build the binary search trees in linear time … </li></ul><ul><li>O(n log n) </li></ul>
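A sketch of Build2DRangeTree in Python. As a deliberate simplification, the associated structure is a list of the points sorted by y-coordinate rather than a balanced binary search tree; a sorted list supports the same 1-dimensional queries through binary search. Points are `(x, y)` tuples with distinct x-coordinates, sorting at every level is slower than the presorted construction, and all names are illustrative.

```python
class RangeNode:
    def __init__(self, assoc, point=None, x_mid=None, left=None, right=None):
        self.assoc = assoc  # stand-in for T_assoc: canonical subset sorted by y
        self.point = point  # set only at leaves
        self.x_mid = x_mid  # splitting value at internal nodes
        self.left = left
        self.right = right


def build_2d_range_tree(points):
    """Build a range tree on a non-empty list of (x, y) tuples."""
    assoc = sorted(points, key=lambda p: p[1])
    if len(points) == 1:
        return RangeNode(assoc, point=points[0])
    pts = sorted(points)  # sort by x-coordinate
    mid = (len(pts) + 1) // 2
    x_mid = pts[mid - 1][0]  # median x; points with x <= x_mid go left
    return RangeNode(assoc, x_mid=x_mid,
                     left=build_2d_range_tree(pts[:mid]),
                     right=build_2d_range_tree(pts[mid:]))


def total_stored(v):
    """Number of points stored over all associated structures."""
    if v.point is not None:
        return len(v.assoc)
    return len(v.assoc) + total_stored(v.left) + total_stored(v.right)


pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
root = build_2d_range_tree(pts)
print(root.x_mid, total_stored(root))
```

`total_stored` counts every copy of every point, so it grows like n log n, which matches the storage bound of the range tree.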
  33. 33. Range trees: Storage <ul><li>Lemma A range tree on a set of n points in the plane requires O( n log n ) storage. </li></ul><ul><li>Proof : </li></ul><ul><ul><li>each point is stored only once per level </li></ul></ul><ul><ul><li>storage for associated structures is linear in number of points </li></ul></ul><ul><ul><li>there are O(log n) levels </li></ul></ul>
  34. 34. Querying a range tree <ul><li>2DRangeQuery (T, [x : x’] × [y : y’]) </li></ul><ul><li>► Input . A 2-dimensional range tree T and a range [x : x’] × [y : y’]. </li></ul><ul><li>► Output . All points in T that lie in the range. </li></ul><ul><li>v_split = FindSplitNode (T, x, x’) </li></ul><ul><li>if v_split is a leaf </li></ul><ul><li>then check if the point stored at v_split must be reported </li></ul><ul><li>else ( Follow the path to x and call 1DRangeQuery on the subtrees right of the path ) </li></ul><ul><li>v = left(v_split) </li></ul><ul><li>while v is not a leaf </li></ul><ul><li>do if x ≤ x_v </li></ul><ul><li>then 1DRangeQuery (T_assoc(right(v)), [y : y’]) </li></ul><ul><li>v = left(v) </li></ul><ul><li>else v = right(v) </li></ul><ul><li>Check if the point stored at v must be reported </li></ul><ul><li>Similarly, follow the path from right(v_split) to x’, call 1DRangeQuery with the range [y : y’] on the associated structures of subtrees left of the path, and check if the point stored at the leaf where the path ends must be reported. </li></ul>
  35. 35. Querying a range tree <ul><li>Running time of 2DRangeQuery ? </li></ul><ul><li>O(log² n + k ) </li></ul>
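Putting the pieces together, a 2-dimensional query might be sketched as follows. As an assumption of this sketch, each associated structure is a list sorted by y-coordinate and the 1-dimensional query on it uses `bisect` from the standard library instead of a balanced binary search tree. Points are `(x, y)` tuples with distinct x-coordinates, and all names are illustrative.

```python
from bisect import bisect_left, bisect_right


class RangeNode:
    def __init__(self, points, point=None, x_mid=None, left=None, right=None):
        self.assoc = sorted(points, key=lambda p: p[1])  # stand-in for T_assoc
        self.ys = [p[1] for p in self.assoc]
        self.point, self.x_mid = point, x_mid
        self.left, self.right = left, right

    def is_leaf(self):
        return self.point is not None


def build(points):
    if len(points) == 1:
        return RangeNode(points, point=points[0])
    pts = sorted(points)
    mid = (len(pts) + 1) // 2
    return RangeNode(pts, x_mid=pts[mid - 1][0],
                     left=build(pts[:mid]), right=build(pts[mid:]))


def query_assoc(v, y, y2, out):
    """1-dimensional query on the associated structure of v."""
    out.extend(v.assoc[bisect_left(v.ys, y):bisect_right(v.ys, y2)])


def check_leaf(v, x, x2, y, y2, out):
    px, py = v.point
    if x <= px <= x2 and y <= py <= y2:
        out.append(v.point)


def range_query_2d(root, x, x2, y, y2):
    out = []
    split = root  # FindSplitNode on the x-coordinate
    while not split.is_leaf() and (x2 <= split.x_mid or x > split.x_mid):
        split = split.left if x2 <= split.x_mid else split.right
    if split.is_leaf():
        check_leaf(split, x, x2, y, y2, out)
        return out
    v = split.left  # path to x: query subtrees to the right of the path
    while not v.is_leaf():
        if x <= v.x_mid:
            query_assoc(v.right, y, y2, out)
            v = v.left
        else:
            v = v.right
    check_leaf(v, x, x2, y, y2, out)
    v = split.right  # path to x2: query subtrees to the left of the path
    while not v.is_leaf():
        if x2 > v.x_mid:
            query_assoc(v.left, y, y2, out)
            v = v.right
        else:
            v = v.left
    check_leaf(v, x, x2, y, y2, out)
    return sorted(out)


pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
print(range_query_2d(build(pts), 4, 8, 1, 5))
```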
  36. 36. Range trees <ul><li>Theorem A range tree for a set of n points in the plane uses O( n log n ) storage and can be built in O( n log n ) time. A rectangular range query on the range tree takes O( log² n + k ) time, where k is the number of reported points. </li></ul><ul><li>Conclusion </li></ul><ul><ul><li>kd-tree: O(n) storage, O(√n + k) query time </li></ul></ul><ul><ul><li>Range tree: O(n log n) storage, O(log² n + k) query time </li></ul></ul>
  37. 37. Tutorials this week <ul><li>Small tutorials on Tuesday 1+2. </li></ul><ul><li>Thursday 3+4 big tutorial. </li></ul>
