IO-Efficient Point Location and Map Overlay in Low-Density Subdivisions

Talk given at Aarhus Universitet, 2007

Published in: Technology

  1. IO-Efficient Point Location and Map Overlay in Low-Density Subdivisions. Shripad Thite (sthite@win.tue.nl), Department of Computer Science, Technische Universiteit Eindhoven, The Netherlands. Joint work with Mark de Berg, Herman Haverkort, and Laura Toma. 19-Feb-2007 @ Århus Universitet.
  2. Point location. Map = polygonal subdivision of the plane. Given a point in the plane, identified by its coordinates, find the region of the map that contains the point. IO-Efficient Point Location and Map Overlay / Shripad Thite
  3. Map overlay. Combine various attributes of data from different maps or map layers to compute the interaction of these attributes. Given two polygonal subdivisions of the plane, red and blue, compute all intersections between a red edge and a blue edge.
  4. Geographic Information System (GIS). A GIS is a spatial database with algorithms for managing, analyzing, and displaying geographic information. Applications with tremendous environmental, social, and economic impact: infrastructure planning, social engineering, facility location, agriculture. These require algorithms for fundamental problems well studied in Computational Geometry (adjacency, containment, proximity, ...) with a twist: geographic data is huge!
  5. Geometric algorithms for GIS. Conventional analysis of algorithms accounts for worst-case behavior, often for inputs that do not occur in practice. Complex algorithms are too hard to implement and make little impact on applications. Simplifying assumptions about the computational model are not valid or hold only approximately. Goal: Design theoretically efficient practical algorithms accompanied by an analysis of the algorithm complexity on a refined model of computation for realistic inputs.
  6. Massive data. Practical inputs have gigabytes and terabytes of data. We need algorithms whose performance scales well for increasingly large input data sets encountered in practice. Traditional algorithms suffer from poor memory usage. Poor cache behavior causes thrashing, where excessive time is spent transferring data in and out of memory cache.
  7. External-memory algorithms. The cost of data transfer significantly influences the real cost of an algorithm, often dominating CPU operations. External-memory algorithms seek to minimize data transfer by utilizing locality of reference. Goal: Develop external-memory algorithms and data structures for geometric problems, where it is often harder to exploit locality.
  8. External-memory model. Model of computation where memory is organized in two levels, internal and external memory [Aggarwal & Vitter]. CPU operations can take place only on data in internal memory, which is limited in size to M words. External memory is large enough for input, working space, and output. [Figure: CPU, internal memory (cache), external memory (disk)]
  9. External-memory model. Both internal and external memory are organized in blocks of B words each. One input/output operation (one IO) transfers one block of B words between internal and external memory. The IO-cost of an algorithm is the number of IO-operations it performs. IO-complexity accurately models the cost of data transfer between disk and main memory, as a function of the memory architecture parameters B and M.
  10. External-memory algorithms. Designed to minimize Input/Output (IO) operations between slow but large external memory and fast but small internal memory. Each IO operation reads or writes B words stored in a block; internal memory of size M holds M/B blocks. The two-level memory model introduced by Aggarwal and Vitter has become a popular design and analysis tool. Lots of IO-efficient algorithms have been developed and proved useful in practice.
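
The bounds quoted in the following slides use the shorthand scan(n) and sort(n). As a rough illustration, the sketch below evaluates the standard external-memory expressions scan(n) = n/B and sort(n) = (n/B) log_{M/B}(n/B) with all constant factors dropped. The function names `scan_ios` and `sort_ios` are hypothetical and not taken from the talk.

```python
import math

def scan_ios(n: int, B: int) -> int:
    """Block transfers needed to read n words sequentially: ceil(n / B)."""
    return -(-n // B)  # integer ceiling division

def sort_ios(n: int, M: int, B: int) -> float:
    """Order-of-magnitude sorting bound (n/B) * log_{M/B}(n/B).

    Constant factors are dropped, so this is only an estimate of growth,
    not an exact IO count for any particular sorting algorithm.
    """
    blocks = n / B
    if blocks <= 1:
        return 1.0
    # At least one pass over the data is always needed.
    return blocks * max(1.0, math.log(blocks, M / B))

if __name__ == "__main__":
    n, M, B = 10**6, 10**4, 100
    print("scan:", scan_ios(n, B))     # 10**4 blocks
    print("sort:", sort_ios(n, M, B))  # about twice the scan cost here
```

With these (made-up) parameters the logarithmic factor is about 2, which is why sorting is often only a small multiple of scanning in practice.
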
  11. Remember ... Map = polygonal subdivision of the plane. Point location: Given a point in the plane, identified by its coordinates, find the region of the map that contains the point. Map overlay: Given two polygonal subdivisions of the plane, red and blue, compute all intersections between a red edge and a blue edge.
  12. Previous work. "External-Memory Algorithms for Processing Line Segments in Geographic Information Systems", Arge, Vengroff, and Vitter, ESA’95: overlay two maps in O(sort(n) + t/B) IOs, which is optimal, where t = number of intersections; batched point location in O(((n + k)/B) log_{M/B}(n/B)) IOs, where k = number of query points, using Θ(n log_{M/B}(n/B)) blocks of storage (???). We improve on space usage as well as query time, for low-density maps, at the expense of O(sort(n)) preprocessing; our algorithms are simpler to implement.
  13. Challenges. Creating a linear-size index supporting queries in logarithmic time: usual hierarchical decompositions support O(log n) query time but use O(n log n) space. Supporting efficient batched queries on the index, to answer k queries presented in a batch more efficiently than k individual queries. Can we overlay two maps in O(scan(n)) IOs? Existing solutions are too complicated and/or not IO-optimal.
  14. Quadtree. [Figure]
  15. Z-curve. Space-filling curve visits points in order of their Z-index (a.k.a. Morton block index), that is, in bit-interleaved order. [Figure: Z-curve on 2x2 and 4x4 grids; rows and columns labeled 0, 1 and 00, 01, 10, 11; cells labeled with their bit-interleaved indices 00 to 11 and 0000 to 1111]
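
A minimal sketch of the bit-interleaved Z-index follows. It assumes non-negative integer grid coordinates of `bits` bits each and puts the x bit before the y bit at every level; which coordinate goes first is a convention, and the function name `z_index` is hypothetical.

```python
def z_index(x: int, y: int, bits: int) -> int:
    """Interleave the bits of x and y, most significant bit first.

    Assumed convention: at each level the x bit precedes the y bit,
    so the result is x_{b-1} y_{b-1} ... x_0 y_0 read as an integer.
    """
    z = 0
    for i in range(bits - 1, -1, -1):
        x_bit = (x >> i) & 1
        y_bit = (y >> i) & 1
        z = (z << 2) | (x_bit << 1) | y_bit
    return z

if __name__ == "__main__":
    # Visit the points of a 4x4 grid in Z-order.
    points = [(x, y) for x in range(4) for y in range(4)]
    points.sort(key=lambda p: z_index(p[0], p[1], 2))
    print(points[:4])  # the four points of the first quadrant come first
```

Sorting by this key is what places all points of one quadrant next to each other, at every level of the recursion.
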
  16. Quadtree meets Z-curve. The Z-curve visits every quadtree cell in a contiguous interval. The leaves of a quadtree define a subdivision of the Z-curve. Two quadtree cells are either disjoint or nested, so the Z-intervals of two quadtree cells are either disjoint or nested.
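
The disjoint-or-nested property can be checked directly on a small grid. In the sketch below a quadtree cell is identified by its depth and by the first 2*depth bits of the Z-indices it contains; this `(prefix, depth)` encoding and the helper names are assumptions made for illustration.

```python
from typing import Tuple

Interval = Tuple[int, int]

def cell_interval(prefix: int, depth: int, bits: int) -> Interval:
    """Inclusive Z-interval of the cell whose Z-indices start with `prefix`.

    `bits` is the number of bits per coordinate, so Z-indices have
    2 * bits bits and a cell at `depth` fixes the first 2 * depth of them.
    """
    shift = 2 * (bits - depth)
    lo = prefix << shift
    return lo, lo + (1 << shift) - 1

def relation(a: Interval, b: Interval) -> str:
    """Classify two inclusive intervals."""
    if a[1] < b[0] or b[1] < a[0]:
        return "disjoint"
    if (a[0] <= b[0] and b[1] <= a[1]) or (b[0] <= a[0] and a[1] <= b[1]):
        return "nested"
    return "overlapping"  # should never happen for two quadtree cells

if __name__ == "__main__":
    quadrant = cell_interval(0b01, 1, 2)     # one quadrant of a 4x4 grid
    leaf = cell_interval(0b0110, 2, 2)       # a unit cell inside it
    other = cell_interval(0b10, 1, 2)        # a different quadrant
    print(relation(quadrant, leaf), relation(quadrant, other))
```
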
  17. Example. [Figure: subdivision with cells numbered 0 to 24]
  18. I. Fat Triangulations
  19. Fat triangulation. A δ-fat triangulation is one whose minimum angle is at least δ > 0. Our input is a triangulation with fatness δ; its maximum vertex degree is 2π/δ. We assume B = Ω(1/δ) and M = Ω(1/δ^3).
  20. Linear quadtree. Our data structure is a linear quadtree: a linear quadtree stores only leaves (no pointers); internal nodes are represented implicitly and can be computed as required. We store quadtree leaves in Z-order.
  21. Linear quadtree. Recursively partition the bounding box into four quadrants. Novel stopping condition: Stop splitting a quadtree cell when all edges intersecting the cell are incident on a common vertex. Lemma: The quadtree contains O(n/δ^2) cells, each cell intersected by at most 2π/δ triangles; the total number of triangle-cell intersections is O(n/δ^2).
  22. Building local quadtrees. A top-down recursive algorithm to build the quadtree is not IO-efficient: the quadtree may have depth Θ(n), hence the IO-cost is O(n^2/B). Instead, for each vertex v, build a local quadtree for the triangles incident on v. Since the vertex degree is at most 2π/δ, a local quadtree can be built entirely in internal memory.
  23. Building local quadtrees. Lemma: The union of all local quadtrees is identical to the global quadtree. We need to show that every cell in the global quadtree appears in some local quadtree. Proof: Every triangle T intersects a cell C of the global quadtree if and only if C belongs to the local quadtree of at least one of the vertices of T.
  24. Example. [Figure]
  25. Example. [Figure: subdivision with cells numbered 0 to 24]
  26. Building an index. Each triangle is stored with every quadtree cell that it intersects. The Z-index of a cell is its order along the space-filling Z-curve. Whenever triangle T intersects cell C, the pair (T, C) is stored with associated key equal to the Z-index of C.
  27. Indexing triangles. Sort the O(n/δ^2) cell-triangle pairs in Z-order of cells = O(sort(n/δ^2)) IOs. Build a cache-oblivious B-tree on the set of cell-triangle pairs sorted by key (Z-index of cell). The B-tree has size O(n/δ^2) and depth O(log_B(n/δ^2)).
  28. How to locate a single point. Search the B-tree from root to leaf with the Z-index of p for the quadtree cell containing point p = O(log_B(n/δ^2)) IOs. Check p against all triangles intersecting the cell (at most 2π/δ) in internal memory; all these triangles have the same key and are stored together.
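
The query can be sketched in memory as a search on Z-index followed by a brute-force test against the few triangles stored with the cell. This is only an illustration under stated assumptions: a sorted Python list with `bisect` stands in for the B-tree, the four leaf cells and two triangles are a hand-made toy input (not produced by the stopping condition of the talk), and the names `z_index`, `in_triangle`, and `locate` are hypothetical.

```python
from bisect import bisect_right

def z_index(x, y, bits):
    """Bit-interleave x and y, x bit first (an assumed convention)."""
    z = 0
    for i in range(bits - 1, -1, -1):
        z = (z << 2) | (((x >> i) & 1) << 1) | ((y >> i) & 1)
    return z

def orient(a, b, c):
    """Twice the signed area of triangle abc."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def in_triangle(p, tri):
    """True if p lies inside or on the boundary of tri."""
    a, b, c = tri
    d = (orient(a, b, p), orient(b, c, p), orient(c, a, p))
    return not (min(d) < 0 and max(d) > 0)

def locate(p, bits, leaf_starts, leaf_triangles):
    """Find the leaf whose Z-interval contains p, then test its triangles."""
    i = bisect_right(leaf_starts, z_index(p[0], p[1], bits)) - 1
    for name, tri in leaf_triangles[i]:
        if in_triangle(p, tri):
            return name
    return None

# Toy input: the square [0,4) x [0,4) cut by its diagonal into A and B,
# with the four quadrants as leaf cells (first Z-index of each quadrant).
A = ("A", ((0, 0), (4, 0), (4, 4)))
B = ("B", ((0, 0), (4, 4), (0, 4)))
LEAF_STARTS = [0, 4, 8, 12]
LEAF_TRIANGLES = [[A, B], [B], [A], [A, B]]

if __name__ == "__main__":
    print(locate((3, 1), 2, LEAF_STARTS, LEAF_TRIANGLES))
    print(locate((1, 3), 2, LEAF_STARTS, LEAF_TRIANGLES))
```

In the external-memory setting the binary search would be replaced by a root-to-leaf traversal of the B-tree, which is where the logarithmic IO bound comes from.
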
  29. How to locate a batch of k points. Sort the k query points by Z-index = O(sort(k)) IOs. Merge the sorted query points and the sorted leaf cells by scanning in parallel = O(scan(n/δ^2 + k)) IOs.
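
The parallel scan can be sketched as a single merge pass over two sorted sequences. The sketch assumes leaf cells are given by the first Z-index of each leaf, that the leaves cover the whole Z-range starting at 0, and that queries are already reduced to Z-indices; `batched_locate` is a hypothetical name.

```python
def batched_locate(leaf_starts, query_keys):
    """Assign each query Z-index to the leaf whose interval contains it.

    leaf_starts: sorted first Z-indices of consecutive leaves, starting at 0.
    query_keys:  query Z-indices in any order.
    Returns (query_key, leaf_position) pairs in sorted query order.
    """
    result = []
    leaf = 0
    for q in sorted(query_keys):  # stands in for the external sort step
        # Advance the leaf pointer; it never moves backwards, so the
        # two sequences are each scanned once.
        while leaf + 1 < len(leaf_starts) and leaf_starts[leaf + 1] <= q:
            leaf += 1
        result.append((q, leaf))
    return result

if __name__ == "__main__":
    print(batched_locate([0, 4, 8, 12], [11, 2, 7, 12]))
```

Each located query would then be tested against the triangles stored with its leaf, as in the single-point case.
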
  30. How to overlay two triangulations. Quadtree leaves subdivide the Z-curve into disjoint intervals. Since quadtree leaves are sorted in Z-order, the intervals are in sorted order. Merge the two sorted sets of intervals, corresponding to the quadtrees of the two triangulations = O(scan(n/δ^2)) IOs.
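
A minimal sketch of the merge step follows, assuming each quadtree is given as a sorted list of inclusive Z-intervals and that both lists partition the same Z-range. It reports which red leaf meets which blue leaf; testing the red edges against the blue edges inside each reported pair is left out. The name `overlay_leaves` is hypothetical.

```python
def overlay_leaves(red, blue):
    """Pair up intersecting leaves of two Z-ordered interval partitions.

    red, blue: sorted lists of inclusive (lo, hi) Z-intervals, each list
    partitioning the same range. Returns index pairs (i, j) such that
    red[i] and blue[j] share at least one Z-index.
    """
    pairs = []
    i = j = 0
    while i < len(red) and j < len(blue):
        pairs.append((i, j))
        # Advance whichever interval ends first; advance both on a tie.
        if red[i][1] < blue[j][1]:
            i += 1
        elif red[i][1] > blue[j][1]:
            j += 1
        else:
            i += 1
            j += 1
    return pairs

if __name__ == "__main__":
    red = [(0, 3), (4, 7), (8, 15)]
    blue = [(0, 7), (8, 11), (12, 15)]
    print(overlay_leaves(red, blue))
```

Because the intervals are disjoint or nested, every reported pair consists of one leaf contained in the other, and the number of pairs is at most the total number of leaves.
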
  31. How to support updates. Each of the following operations affects O(1/δ^4) entries in the B-tree: insert/delete a vertex; flip an edge. Each update affects a local quadtree; perform the corresponding changes to the global quadtree = O((1/δ^4) log_B(n/δ^2)) IOs per update.
  32. Summary: fat triangulations. We build a linear quadtree, from local quadtrees of small neighborhoods, using a novel stopping condition. The quadtree leaves are stored in a cache-oblivious B-tree, indexed by their order along the Z-order space-filling curve. The B-tree has linear size and logarithmic depth, thus supporting efficient queries and updates. Two such quadtrees can be overlaid by scanning; the two indexes are merged in the process.
  33. II. Low-Density Maps
  34. Low-density maps. The density of a set S of objects is the smallest number λ such that every disk D intersects at most λ objects of S whose diameter is at least the diameter of D. The density of a planar map is the density of its edge set. Our input is a map with density λ. We assume B = Ω(λ). A δ-fat triangulation has density λ = O(1/δ).
  35. Compressed quadtree. An annulus is the set-theoretic difference of two ordinary nested cells. An annulus can be represented by two nested Z-intervals. [Figure: an ordinary cell is compressed into an annulus cell]
  36. Compressed linear quadtree. We introduce compressed linear quadtrees: a compressed quadtree has many fewer nodes than an ordinary quadtree; a compressed quadtree has more complicated cells (annuli), and our storage scheme handles such cells.
  37. Quadtree of guarding points. Build a compressed quadtree of guarding points of edges. Guarding points of an edge = vertices of the axis-aligned bounding square. Stopping condition: Stop splitting a quadtree cell when it contains only one guarding point. Lemma [de Berg et al.]: A square containing g guarding points intersects at most g + 4λ edges.
  38. Example. [Figure]
  39. How to build a quadtree of points. Sort the guarding points in Z-order. For each consecutive pair of points, output their local quadtree: their canonical bounding square and its four children. Sort all cells and remove duplicates. Result: Compressed quadtree of guarding points in O(sort(n)) IOs, where leaf cells are sorted in Z-order.
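
The three steps can be sketched in memory as follows. The sketch assumes points are already given as distinct Z-indices with `bits` bits per coordinate, identifies a cell by `(depth, prefix)` where `prefix` is the leading 2*depth bits of its Z-indices, and returns all generated cells rather than only the leaves. It is not IO-efficient, and the names `canonical_cell` and `build_cells` are hypothetical.

```python
def canonical_cell(za, zb, bits):
    """Smallest quadtree cell containing both Z-indices, as (depth, prefix).

    Found by shortening the common prefix two bits (one level) at a time.
    """
    depth = bits
    while (za >> (2 * (bits - depth))) != (zb >> (2 * (bits - depth))):
        depth -= 1
    return depth, za >> (2 * (bits - depth))

def build_cells(z_points, bits):
    """Cells generated from consecutive pairs of Z-sorted points."""
    zs = sorted(set(z_points))                   # step 1: sort in Z-order
    cells = set()                                # a set removes duplicates
    for za, zb in zip(zs, zs[1:]):               # step 2: consecutive pairs
        depth, prefix = canonical_cell(za, zb, bits)
        cells.add((depth, prefix))
        for child in range(4):                   # the four children
            cells.add((depth + 1, (prefix << 2) | child))
    # Step 3: sort by first Z-index of the cell, larger cells first.
    return sorted(cells, key=lambda c: (c[1] << (2 * (bits - c[0])), c[0]))

if __name__ == "__main__":
    for depth, prefix in build_cells([1, 2, 12], bits=2):
        print(depth, format(prefix, "b").zfill(2 * depth) or "root")
```

Since the points in a pair are distinct, their canonical cell is never a unit cell, so its children always exist within the grid.
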
  40. Example. [Figure]
  41. Computing cell-edge intersections. We distribute the edges of the subdivision among the quadtree leaf cells. For each edge e, we compute the quadtree cells that it intersects in a batched filtering step (use cache-oblivious distribution sweeping?). A quadtree leaf cell not intersected by any edge is repeatedly merged with a predecessor or successor cell in Z-order.
  42. Small-size quadtree. Lemma: The compressed quadtree of guarding points contains O(n) leaf cells, each leaf intersected by at most O(λ) faces; the total number of face-cell intersections is O(nλ). Build a B-tree on the set of cell-edge pairs sorted by key (Z-index of cell). The B-tree has O(n) leaves and depth O(log_B n).
  43. How to locate a single point. Search the B-tree from root to leaf with the Z-index of p for the quadtree cell containing point p = O(log_B n) IOs. Check p against all O(λ) faces intersecting the cell, in internal memory; all these faces have the same key and are stored together.
  44. How to locate a batch of k points. Sort the k query points by Z-index = O(sort(k)) IOs. Merge the sorted query points and the sorted leaf cells by scanning in parallel = O(scan(n + k)) IOs.
  45. How to overlay two maps. Quadtree leaves subdivide the Z-curve into disjoint intervals. Since quadtree leaves are sorted in Z-order, the intervals are in sorted order. Merge the two sorted sets of intervals, corresponding to the quadtrees of the two maps = O(scan(n)) IOs.
  46. Summary: low-density maps. We introduce compressed linear quadtrees. We build a compressed linear quadtree of the set of O(n) guarding points for the edges of the subdivision. We store the quadtree leaves (only) in sorted order along the Z-order space-filling curve. We build a 1D index, a B-tree of linear size, on the quadtree leaves, supporting efficient queries. Making the construction and update algorithms cache-oblivious remains an open problem.
  47. Implementation. The Z-order of a point is its bit-interleaved order: Z(x_0 x_1 ... x_b, y_0 y_1 ... y_b) = x_0 y_0 x_1 y_1 ... x_b y_b, a 2b-bit integer. The canonical bounding box of two points is computed from the longest common prefix of the bitstrings representing their coordinates. Several optimizations are described in our paper.
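
One way to compute the canonical bounding box from the longest common prefix is with XOR on the coordinates, sketched below. It assumes non-negative integer coordinates on a grid whose cells have power-of-two side lengths; the function name `canonical_box` and the `(x0, y0, side)` return format are hypothetical, and this is not necessarily how the paper implements it.

```python
def canonical_box(p, q):
    """Smallest quadtree cell containing grid points p and q.

    Returns (x0, y0, side): the lower-left corner and side length of the
    half-open square [x0, x0 + side) x [y0, y0 + side).
    """
    (px, py), (qx, qy) = p, q
    # Bits where either coordinate differs; everything above the highest
    # differing bit is the common prefix shared by both points.
    diff = (px ^ qx) | (py ^ qy)
    k = diff.bit_length()        # number of low bits not in the prefix
    side = 1 << k
    x0 = (px >> k) << k          # clear the low k bits
    y0 = (py >> k) << k
    return x0, y0, side

if __name__ == "__main__":
    print(canonical_box((5, 1), (6, 3)))  # a square of side 4 at (4, 0)
```

Taking the OR of the two XORs keeps the box square: the prefix length is limited by whichever coordinate diverges first.
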
  48. Summary. We preprocess a fat triangulation or low-density subdivision in O(sort(n)) IOs so we can: answer k batched point location queries in O(scan(n) + sort(k)) IOs; overlay two maps in O(scan(n)) IOs. We give simple, practical, implementable, fast, scalable algorithms! Our algorithms for triangulations are cache-oblivious.
  49. To read more ... "I/O-Efficient Map Overlay and Point Location in Low-Density Subdivisions", Mark de Berg, Herman Haverkort, ST, Laura Toma. http://www.win.tue.nl/~sthite/pubs/ Condensed version to appear at EuroCG 2007. Thanks to Sariel Har-Peled for valuable discussions.
  50. Future work. Implementation (in TPIE?). IO-efficient range searching in low-density subdivisions. IO-efficient overlay of general subdivisions, not assuming fatness or low density.
  51. Tak! (Thanks!)
