
GeoWave: Scaling Complex (Not Just Geo) Data

This presentation will discuss the successes of GeoWave applied to the spatiotemporal domain, and focus on how those successes can be generalized to a diverse set of complex data structures. The intent is to draw parallels to the data challenges of the audience.

Fast indexed access to massive datasets fundamentally involves highly optimized range scans within key-value stores. If your reaction is "that's easier said than done," then you've had the prerequisite experiences to attend this talk. The intent of the software is to make these use cases as seamless as possible for downstream consumers of the framework. Briefly, a GeoWave "dimension" is simply a function that applies a sort order to real-world values. The constructs for defining these "dimensions" and many more details will be discussed in this presentation.
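
The idea of a "dimension" as a function that imposes sort order can be illustrated with a minimal sketch. This is not GeoWave's API: the `Dimension` class, its `to_bin` method, and the `range_scan` helper are hypothetical names chosen for illustration, and a sorted Python list stands in for a sorted key-value store.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    """Hypothetical helper: maps a real-world value to a sortable integer bin."""
    lo: float
    hi: float
    bits: int  # resolution: the value range is split into 2**bits bins

    def to_bin(self, value: float) -> int:
        cells = 1 << self.bits
        clamped = min(max(value, self.lo), self.hi)
        idx = int((clamped - self.lo) / (self.hi - self.lo) * cells)
        return min(idx, cells - 1)  # keep the upper bound inside the last bin


def range_scan(sorted_rows, start_key, end_key):
    """Return all (key, value) rows with start_key <= key <= end_key."""
    keys = [k for k, _ in sorted_rows]
    left = bisect.bisect_left(keys, start_key)
    right = bisect.bisect_right(keys, end_key)
    return sorted_rows[left:right]


# Assumed example data: longitudes binned at 8 bits of resolution.
longitude = Dimension(lo=-180.0, hi=180.0, bits=8)
rows = sorted(
    (longitude.to_bin(lon), name)
    for lon, name in [(-77.04, "Washington"), (2.35, "Paris"), (139.69, "Tokyo")]
)

# A query on real-world values becomes a range scan over sorted keys.
hits = range_scan(rows, longitude.to_bin(-10.0), longitude.to_bin(30.0))
```

Because the bins preserve the ordering of the original values, a query over a range of longitudes turns into one contiguous scan of the sorted keys.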

At the core of GeoWave is a capability to store, retrieve, and analyze multi-dimensional data structures within distributed key-value stores. Fundamentally, spatio-temporal data serves as a special case for which GeoWave provides tailored extensions. The software is intended to be easily pluggable into any sorted key-value store, with current implementations available for Apache HBase, Apache Accumulo, Apache Cassandra, Apache Kudu, Redis, RocksDB, Google BigTable, and Amazon DynamoDB. The datastore support is truly provided as an extension that is discoverable at runtime. As a result, use of any GeoWave programmatic API, command-line tool, or service is not tied to any particular key-value store. Furthermore, there are optimized data transfer utilities across supported stores. This approach has proven to provide seamless transitions of scale, from embedded applications, through external in-memory services, all the way up to its primary applications within highly distributed ecosystems.
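
The pluggable-store idea can be sketched as follows. This is only an illustration of the general pattern, not how GeoWave implements it: `SortedKeyValueStore`, `STORE_REGISTRY`, `register_store`, `open_store`, and `InMemoryStore` are all hypothetical names, and a simple dictionary registry stands in for whatever runtime discovery mechanism the framework actually uses.

```python
import bisect
from abc import ABC, abstractmethod


class SortedKeyValueStore(ABC):
    """Hypothetical minimal contract a backing store would have to satisfy."""

    @abstractmethod
    def put(self, key: bytes, value: bytes) -> None: ...

    @abstractmethod
    def scan(self, start: bytes, end: bytes) -> list: ...


STORE_REGISTRY = {}  # store name -> implementation class


def register_store(name):
    """Class decorator that makes an implementation discoverable by name."""
    def decorator(cls):
        STORE_REGISTRY[name] = cls
        return cls
    return decorator


@register_store("memory")
class InMemoryStore(SortedKeyValueStore):
    """Embedded stand-in; a distributed store would expose the same interface."""

    def __init__(self):
        self._keys = []    # kept sorted so scans are contiguous slices
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start, end):
        left = bisect.bisect_left(self._keys, start)
        right = bisect.bisect_right(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[left:right]]


def open_store(name):
    """Caller code depends only on the store name, not on the implementation."""
    return STORE_REGISTRY[name]()


store = open_store("memory")
for k in (b"\x00\x05", b"\x00\x01", b"\x00\x09"):
    store.put(k, b"row-" + k.hex().encode())
result = store.scan(b"\x00\x02", b"\x00\x09")
```

Code written against `SortedKeyValueStore` would not change if `"memory"` were swapped for another registered store name, which is the property the paragraph above describes.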



  1. Barry Bragg, Maxar Technologies; Rich Fecher, Maxar Technologies
  2. An open source framework that leverages the scalability of key-value stores for effective storage, retrieval, and analysis of massive geospatial datasets
  3. At its core, GeoWave handles spatial and spatiotemporal indexing within distributed key-value stores, with natural integrations for various popular frameworks. GeoWave bridges the gap between popular geospatial platforms and distributed processing frameworks.
  4. Hosted
  5. Use a Space Filling Curve (SFC) to impose a sort order on multi-dimensional data.
  6. Space-filling curve quality measures (Haverkort, Walderveen, "Locality and Bounding-Box Quality of Two-Dimensional Space-Filling Curves", 2008, arXiv:0806.4787v2):

            Z-Order  Hilbert  H-order  Peano  AR2W2  BΩ
     WL∞    ∞        6        4        8      5.40   5.00
     WL2    ∞        6        4        8      6.04   5.00
     WL1    ∞        9        8        10.66  12.00  9.00
     WBA    ∞        2.40     3.00     2.00   3.05   2.22
     ABA    2.86     1.41     1.69     1.42   1.47   1.40

     WL∞, WL2, WL1: Worst Case Dilation. WBA: Worst Case Bounding Box Area Ratio. ABA: Average Total Bounding Box Area.
  7. ● What about data with extents such as lines/polys or time ranges?
        ○ We need to represent multiple resolutions...
     ● What about unbounded dimensions?
        ○ We can define a periodicity to bound a single SFC. We end up with an SFC per period (or combination of periods).
     ● What about queries?
        ○ Bounding hyperrectangles are discontinuous on the space filling curve
  8. From Massive Scale in the Cloud to GeoWave Embedded in the Client. With a single interface, you can use both! An example analysis tool requiring GeoWave multi-dimensional indexing for map, timeline, and graph search and visualization of massive datasets
  9. Use a Space Filling Curve (SFC) to impose a sort order on multi-dimensional data.
  10. Made Missed
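
Two points from the slides can be made concrete with a small sketch: a query box is discontinuous on a space-filling curve, and an unbounded dimension can be bounded by a period. The code uses a Z-order (Morton) curve because it is the simplest of the curves listed; the slides do not say which curve GeoWave uses. The function names `interleave`, `query_ranges`, and `period_bin` are hypothetical and are not taken from GeoWave.

```python
def interleave(x: int, y: int, bits: int) -> int:
    """Z-order index: bit i of x goes to bit 2i, bit i of y to bit 2i+1."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def query_ranges(xmin, xmax, ymin, ymax, bits):
    """Decompose an inclusive box of grid cells into merged Z-order ranges."""
    ranges = []

    def visit(x0, y0, size):
        # This quadrant covers cells [x0, x0 + size) x [y0, y0 + size).
        x1, y1 = x0 + size - 1, y0 + size - 1
        if x1 < xmin or x0 > xmax or y1 < ymin or y0 > ymax:
            return  # quadrant lies entirely outside the query box
        if xmin <= x0 and x1 <= xmax and ymin <= y0 and y1 <= ymax:
            # An aligned quadrant is one contiguous run on the curve.
            start = interleave(x0, y0, bits)
            ranges.append((start, start + size * size - 1))
            return
        half = size // 2
        for dy in (0, half):        # children visited in curve order
            for dx in (0, half):
                visit(x0 + dx, y0 + dy, half)

    visit(0, 0, 1 << bits)

    merged = []
    for start, end in ranges:
        if merged and merged[-1][1] + 1 == start:
            merged[-1] = (merged[-1][0], end)  # extend the previous run
        else:
            merged.append((start, end))
    return merged


def period_bin(t: float, period: float):
    """Split an unbounded value into (period index, offset within period)."""
    index = int(t // period)
    return index, t - index * period


# A 2x2 box in the middle of a 4x4 grid needs four separate range scans.
center = query_ranges(1, 2, 1, 2, bits=2)
# A box aligned with the curve collapses into a single range.
bottom = query_ranges(0, 3, 0, 1, bits=2)
```

The centered box touches four cells that are far apart on the curve, so it needs four scans, while the aligned box merges into one. Choosing how finely to decompose a query trades the number of range scans against the number of extra rows read.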