GeoMesa: Scalable Geospatial Analytics

Published in: Technology

  1. 1. GeoMesa: Scalable Geospatial Analytics Chris Eichelberger christopher.eichelberger@ccri.com
  2. 2. terms • GeoMesa: an open-source project organized under LocationTech • scalable: if you can continue to solve problems as N >> 1 with no more change than adding hardware and minor tweaks, you scale • geospatial: data that contain a geographic reference, a date/time, and zero or more additional attributes • analytics: formally, a logical decomposition via truth-preserving transformations; informally, any useful derivation (whether deductive or inductive)
  3. 3. outline • part 1: why? (3 minutes) • part 2: how? (10 minutes) • part 3: what? (10 minutes) • part 4: who? (2 minutes)
  4. 4. part 1: why?
  5. 5. [why] which X (points) are close to location Y? • hundreds: PostgreSQL and brute force – full table scan • hundreds of thousands: PostgreSQL and PostGIS – GeoTools API – GiST (think R-trees) • hundreds of millions: a funny thing happens as you collect much more data...
  6. 6. [why] dissolution of large-volume data
  7. 7. [why] perhaps SQL is the bottleneck? • NoSQL databases, such as Apache Accumulo • trade ACID for distributed processing, storage • but there’s no PostGIS for Accumulo, so how does the canonical diagram of an Accumulo (key, value) pair help us answer some simple questions...
  8. 8. [why] questions that ought to be easy for an index to answer • easy question: Which comes first, “Ontario” or “Quebec”?
  9. 9. [why] questions that ought to be easy for an index to answer • easy question: Which comes first, “Ontario” or “Quebec”? • similar question: Which comes first, [image] or [image]?
  10. 10. [why] questions that ought to be easy for an index to answer • easy question: Which comes first, “Ontario” or “Quebec”? • similar question: Which comes first, [image] or [image]? • simplify, and think only of representative cities, and think of them strictly as points
  11. 11. [why] geohashing
  12. 12. [why] geohashing
  13. 13. [why] geohashing
      City                              Coordinates (courtesy Wikipedia)   Geohash
      Ottawa                            45°25′15″N 75°41′24″W              f244m
      Montréal                          45°30′N 73°34′W                    f25dv
      Charlottesville (Virginia, USA)   38°1′48″N 78°28′44″W               dqb0q
      ● Two unique orders:
        ○ Order by name: Charlottesville, Montréal, Ottawa
        ○ Order by longitude or latitude or geohash: Charlottesville, Ottawa, Montréal
      ● Lexicoding location -> geohash provides a deterministic, repeatable ordering
        ○ with this, we can index, store, and query points by lexicographic ranges
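The geohash table on slide 13 can be reproduced with a few lines of code. What follows is a minimal sketch of the standard public geohash encoding (alternating longitude and latitude bisection, five bits per base-32 character), not GeoMesa's own implementation. The helper names `dms`, `geohash`, and `prefix_scan` are made up for this illustration.

```python
import bisect

# Standard geohash base-32 alphabet (no a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def dms(degrees, minutes=0, seconds=0, negative=False):
    """Convert degrees/minutes/seconds to decimal degrees (negative for W or S)."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if negative else value


def geohash(lat, lon, length=5):
    """Encode a point by repeatedly halving the lon/lat ranges, longitude first."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, nbits, use_lon = [], 0, 0, True
    while len(chars) < length:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = 1 if lon >= mid else 0
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = 1 if lat >= mid else 0
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        bits = (bits << 1) | bit
        nbits += 1
        use_lon = not use_lon
        if nbits == 5:  # every five bits become one base-32 character
            chars.append(BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)


def prefix_scan(sorted_keys, prefix):
    """Return every key that starts with `prefix` via one lexicographic range."""
    lo = bisect.bisect_left(sorted_keys, prefix)
    hi = bisect.bisect_left(sorted_keys, prefix + "\uffff")
    return sorted_keys[lo:hi]


# Coordinates copied from the slide.
cities = {
    "Ottawa": (dms(45, 25, 15), dms(75, 41, 24, negative=True)),
    "Montréal": (dms(45, 30), dms(73, 34, negative=True)),
    "Charlottesville": (dms(38, 1, 48), dms(78, 28, 44, negative=True)),
}
hashes = {name: geohash(lat, lon) for name, (lat, lon) in cities.items()}
by_geohash = sorted(hashes, key=hashes.get)
```

Sorting the names by their geohash gives Charlottesville, Ottawa, Montréal, and `prefix_scan(sorted(hashes.values()), "f2")` returns only the two Canadian cities. That is the "query points by lexicographic ranges" idea in miniature.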
  14. 14. [why] build-versus-buy remorse • PostgreSQL+PostGIS has some nice functions – geometric predicates – secondary indexes – standard GeoTools API • some of our data are (multi) lines, (multi) polygons • time is often more than a secondary consideration • sometimes, analysis work needn’t be done on the same old client – distributed across the tablet servers? – using tools like Spark? – streaming?
  15. 15. [why] synthesis
  16. 16. part 2: how?
  17. 17. [how] GeoMesa features • GeoTools API • sharding distributes queries uniformly • flexible SFC can incorporate time • supports (multi) point, (multi) line, (multi) polygon geometries • secondary indexes and a multi-stage query planner • burgeoning raster support via WCS • GeoServer as a plugin-based GUI • WPS standards for computation (and function chaining)
  18. 18. [how] GeoTools API
  19. 19. [how] sharding
  20. 20. [how] space-filling curve progression %~#s%3#r%0,3#gh%yyyyMM#d::%~#s%3,2#gh::%~#s%5,2#gh%HHmm#d%id
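One possible reading of the format string on slide 20 is sketched below. The slide does not explain it, so the interpretation is an assumption: `::` splits the key into row, column family, and column qualifier; `%~#s` sets `~` as the separator; `%3#r` is a small shard number; `%a,b#gh` takes `b` geohash characters starting at offset `a`; and `%...#d` formats the date. The function `compose_key`, the CRC-based shard choice (used here instead of a random number so the output is repeatable), the shard count of 4, and the sample inputs are all hypothetical.

```python
import zlib
from datetime import datetime


def compose_key(geohash, when, feature_id, shards=4, sep="~"):
    """Build (row, column family, column qualifier) from a 7-character geohash.

    Assumed layout, following the slide's format string:
      row              = shard ~ geohash[0:3] ~ yyyyMM
      column family    = geohash[3:5]
      column qualifier = geohash[5:7] ~ HHmm ~ id
    """
    # Assumption: a stable hash of the id stands in for the random shard.
    shard = zlib.crc32(feature_id.encode("utf-8")) % shards
    row = sep.join([str(shard), geohash[0:3], when.strftime("%Y%m")])
    column_family = geohash[3:5]
    column_qualifier = sep.join([geohash[5:7], when.strftime("%H%M"), feature_id])
    return row, column_family, column_qualifier


# Illustrative inputs only: the last two geohash characters, the timestamp,
# and the feature id are invented for this example.
row, cf, cq = compose_key("f244mxy", datetime(2014, 9, 15, 13, 45), "track-001")
```

Under this reading, coarse space and coarse time sit in the row, so a query over a region and a month touches a small number of contiguous row ranges per shard, while finer space and time are resolved inside the column family and qualifier.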
  21. 21. [how] multi-step query planning
  22. 22. [how] multi-step query planning
  23. 23. [how] non-point geometries
  24. 24. [how] rasters + GeoWave integration
  25. 25. [how] supporting other frameworks
  26. 26. [how] GeoServer as a plug-in GUI
  27. 27. [how] Web Processing Service • WPS is another OGC standard • Think of it as an abstract function definition, mapping input types to output types, and defining the computation that occurs between the two. • WPS processes can be chained. • This provides for a natural extension mechanism to GeoMesa.
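The "abstract function definition" view of WPS on slide 27 can be illustrated with plain function composition. This is only an in-process analogy, not the WPS protocol or any GeoServer or GeoMesa API. The `Process` class, `chain`, and the two sample processes are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Process:
    """A named computation with a declared input type and output type."""
    name: str
    input_type: type
    output_type: type
    run: Callable[[Any], Any]


def chain(first, second):
    """Compose two processes, refusing pairs whose declared types do not line up."""
    if not issubclass(first.output_type, second.input_type):
        raise TypeError(
            f"{first.name} outputs {first.output_type.__name__}, "
            f"but {second.name} expects {second.input_type.__name__}"
        )
    return Process(
        name=f"{first.name}|{second.name}",
        input_type=first.input_type,
        output_type=second.output_type,
        run=lambda value: second.run(first.run(value)),
    )


# Two toy processes over (lon, lat) tuples; the bounding box is arbitrary.
within_bbox = Process(
    "within_bbox", list, list,
    lambda pts: [(x, y) for (x, y) in pts if -80 <= x <= -70 and 40 <= y <= 50],
)
count = Process("count", list, int, len)

points = [(-75.69, 45.42), (-73.57, 45.50), (-78.48, 38.03)]
count_in_bbox = chain(within_bbox, count)
result = count_in_bbox.run(points)
```

Because each process declares what it consumes and produces, a chain can be checked before anything runs, which is what makes chaining a natural extension mechanism.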
  28. 28. [how] synthesis Those are merely the highlights of some of GeoMesa’s current features… … so what?
  29. 29. part 3: what?
  30. 30. [what] distributing computation
  31. 31. [what] queries that interpolate both position and time
  32. 32. [what] K-nearest neighbor
  33. 33. [what] clustering (DBSCAN)
  34. 34. [what] near-real-time streaming track analytics with web sockets
  35. 35. [what] track viewer utility
  36. 36. part 4: who?
  37. 37. [who] LocationTech and the greater community
  38. 38. [who] synthesis
  39. 39. questions • For extended questions: geomesa-user@locationtech.org, geomesa@ccri.com, christopher.eichelberger@geomesa.org • For additional reading: geomesa.org • For code: github.com/locationtech/geomesa