2. terms
• GeoMesa: an open-source project organized under LocationTech
• scalable: if you can continue to solve problems as N >> 1 with no more change than
adding hardware and minor tweaks, you scale
• geospatial: data that contain a geographic reference, a date/time, and zero
or more additional attributes
• analytics: formally, a logical decomposition via truth-preserving transformations;
informally, any useful derivation (whether deductive or inductive)
3. outline
• part 1: why? (3 minutes)
• part 2: how? (10 minutes)
• part 3: what? (10 minutes)
• part 4: who? (2 minutes)
5. [why] which X (points) are close to location Y?
• hundreds: PostgreSQL and brute force
– full table scan
• hundreds of thousands: PostgreSQL and PostGIS
– GeoTools API
– GiST (think R-trees)
• hundreds of millions: a funny thing happens as you collect much more data...
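The brute-force case for a few hundred points can be sketched as follows. This is a minimal illustration in plain Python rather than SQL; the function names, the sample points, and the choice of haversine distance are all assumptions made for the example, not something the slides prescribe.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def points_within(points, center, radius_km):
    """Full scan: test every point against the query location."""
    clat, clon = center
    return [name for name, lat, lon in points
            if haversine_km(lat, lon, clat, clon) <= radius_km]

# Hypothetical sample data: (name, latitude, longitude) in decimal degrees.
POINTS = [
    ("Ottawa", 45.4208, -75.69),
    ("Montréal", 45.5, -73.5667),
    ("Charlottesville", 38.03, -78.4789),
]

nearby = points_within(POINTS, (45.4208, -75.69), 200)
```

Every query touches every point, so the cost grows linearly with N. That is fine for hundreds of rows and is the reason an index becomes necessary long before hundreds of millions.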
7. [why] perhaps SQL is the bottleneck?
• NoSQL databases, such as Apache Accumulo
• trade ACID for distributed processing, storage
• but there’s no PostGIS for Accumulo, so how does the canonical diagram of an Accumulo (key,
value) pair help us answer some simple questions...
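As a rough mental model, an Accumulo table is a map kept sorted by key, where a key includes a row ID, a column family, and a column qualifier (the real key also carries a visibility label and a timestamp, omitted here). The sketch below is a toy stand-in written for illustration only; the class and method names are hypothetical and are not the Accumulo client API.

```python
import bisect

class SortedKV:
    """Toy stand-in for a sorted key-value table (not the Accumulo API)."""

    def __init__(self):
        self._keys = []    # sorted list of (row, family, qualifier) tuples
        self._values = {}  # key tuple -> value

    def put(self, row, family, qualifier, value):
        key = (row, family, qualifier)
        if key not in self._values:
            bisect.insort(self._keys, key)  # keep keys in sorted order
        self._values[key] = value

    def scan(self, start_row, end_row):
        """Yield entries with start_row <= row < end_row, in key order."""
        lo = bisect.bisect_left(self._keys, (start_row,))
        hi = bisect.bisect_left(self._keys, (end_row,))
        for key in self._keys[lo:hi]:
            yield key, self._values[key]

table = SortedKV()
table.put("Quebec", "attr", "capital", "Québec City")
table.put("Alberta", "attr", "capital", "Edmonton")
table.put("Ontario", "attr", "capital", "Toronto")

rows = [key[0] for key, _ in table.scan("O", "R")]
```

The only cheap operation such a store offers is a scan over a contiguous range of sorted keys, so any spatial index built on top of it has to turn locations into keys whose sort order is meaningful.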
10. [why] questions that ought to be easy for an index to answer
• easy question: Which comes first, “Ontario” or “Quebec”?
• similar question: Which comes first, [image] or [image]?
• simplify, and think only of representative cities, and think of them strictly as points
13. [why] geohashing
City                            | Coordinates (courtesy Wikipedia) | Geohash
Ottawa                          | 45°25′15″N 75°41′24″W            | f244m
Montréal                        | 45°30′N 73°34′W                  | f25dv
Charlottesville (Virginia, USA) | 38°1′48″N 78°28′44″W             | dqb0q
• Two distinct orderings:
– by name: Charlottesville, Montréal, Ottawa
– by longitude, latitude, or geohash: Charlottesville, Ottawa, Montréal
• Lexicoding location -> geohash provides a deterministic, repeatable ordering
– with this, we can index, store, and query points by lexicographic ranges
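A minimal sketch of the idea, assuming the standard public geohash encoding (bits alternate between longitude and latitude, starting with longitude, and every five bits become one base-32 character). The function names and the prefix-scan helper are illustrative assumptions, not GeoMesa code; the coordinates are the ones from the table, converted to decimal degrees.

```python
import bisect

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def geohash(lat, lon, precision=5):
    """Encode a point by repeatedly halving the lon/lat ranges."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars, bits, nbits, use_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1  # upper half: emit 1
            rng[0] = mid
        else:
            bits <<= 1              # lower half: emit 0
            rng[1] = mid
        use_lon = not use_lon
        nbits += 1
        if nbits == 5:              # five bits make one base-32 character
            chars.append(BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)

CITIES = [
    ("Ottawa", 45 + 25 / 60 + 15 / 3600, -(75 + 41 / 60 + 24 / 3600)),
    ("Montréal", 45 + 30 / 60, -(73 + 34 / 60)),
    ("Charlottesville", 38 + 1 / 60 + 48 / 3600, -(78 + 28 / 60 + 44 / 3600)),
]

# The "index": (geohash, name) pairs kept in lexicographic order.
INDEX = sorted((geohash(lat, lon), name) for name, lat, lon in CITIES)

def prefix_scan(index, prefix):
    """Return names whose geohash starts with prefix, via one range scan."""
    lo = bisect.bisect_left(index, (prefix,))
    hi = bisect.bisect_left(index, (prefix + "~",))  # "~" sorts after every base-32 character
    return [name for _, name in index[lo:hi]]
```

Here `prefix_scan(INDEX, "f2")` covers the cell shared by Ottawa and Montréal with a single contiguous range, which is exactly the kind of query a sorted key-value store answers cheaply.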
14. [why] build-versus-buy remorse
• PostgreSQL+PostGIS has some nice functions
– geometric predicates
– secondary indexes
– standard GeoTools API
• some of our data are (multi) lines, (multi) polygons
• time is often more than a secondary consideration
• sometimes, analysis work needn’t be done on the same old client
– distributed across the tablet servers?
– using tools like Spark?
– streaming?
17. [how] GeoMesa features
• GeoTools API
• sharding distributes queries uniformly
• flexible SFC can incorporate time
• supports (multi) point, (multi) line, (multi) polygon geometries
• secondary indexes and a multi-stage query planner
• burgeoning raster support via WCS
• GeoServer as a plugin-based GUI
• WPS standards for computation (and function chaining)
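To make the sharding and space-filling-curve bullets concrete, here is a hedged sketch of one way a row key could combine a shard prefix with a Z-order value that interleaves longitude, latitude, and time. This is not GeoMesa's actual key layout; the function names, bit widths, shard count, and key format are all assumptions chosen for illustration.

```python
import zlib

def quantize(value, lo, hi, bits):
    """Map value in [lo, hi] to an integer cell index in [0, 2**bits - 1]."""
    cells = 1 << bits
    idx = int((value - lo) / (hi - lo) * cells)
    return min(max(idx, 0), cells - 1)

def interleave3(x, y, t, bits):
    """Interleave the bits of x, y, t (most significant first) into one integer."""
    z = 0
    for i in range(bits - 1, -1, -1):
        z = (z << 3) | (((x >> i) & 1) << 2) | (((y >> i) & 1) << 1) | ((t >> i) & 1)
    return z

def row_key(feature_id, lat, lon, epoch_s, t_lo, t_hi, bits=10, shards=4):
    """Hypothetical key: two-digit shard, underscore, hex-encoded Z-order value."""
    # Hashing the feature ID spreads writes and scans across shards.
    shard = zlib.crc32(feature_id.encode("utf-8")) % shards
    z = interleave3(
        quantize(lon, -180.0, 180.0, bits),
        quantize(lat, -90.0, 90.0, bits),
        quantize(epoch_s, t_lo, t_hi, bits),
        bits,
    )
    width = (3 * bits + 3) // 4  # hex digits needed for 3 * bits bits
    return f"{shard:02d}_{z:0{width}x}"

key = row_key("feature-1", 45.42, -75.69, 50, 0, 100)
```

Because time is interleaved at the same granularity as space, a query bounded in both space and time maps to a set of key ranges, and the shard prefix means each of those ranges is scanned once per shard.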
27. [how] Web Processing Service
• WPS is another OGC standard
• Think of it as an abstract function definition, mapping input types to output types, and defining
the computation that occurs between the two.
• WPS processes can be chained.
• This provides for a natural extension mechanism to GeoMesa.
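The "abstract function" view of chaining can be sketched in a few lines. The example below only models the idea of processes whose outputs feed the next process's inputs; it does not use any WPS or GeoServer API, and the process names and sample features are hypothetical.

```python
from functools import reduce

def chain(*processes):
    """Compose processes left to right: each output feeds the next input."""
    return lambda data: reduce(lambda acc, proc: proc(acc), processes, data)

def within_bbox(min_lon, min_lat, max_lon, max_lat):
    """Build a process mapping a feature list to the features inside the box."""
    def process(features):
        return [f for f in features
                if min_lon <= f["lon"] <= max_lon and min_lat <= f["lat"] <= max_lat]
    return process

def count(features):
    """A process mapping a feature list to an integer."""
    return len(features)

# Hypothetical sample features.
FEATURES = [
    {"name": "Ottawa", "lat": 45.42, "lon": -75.69},
    {"name": "Montréal", "lat": 45.5, "lon": -73.57},
    {"name": "Charlottesville", "lat": 38.03, "lon": -78.48},
]

pipeline = chain(within_bbox(-80.0, 40.0, -70.0, 50.0), count)
result = pipeline(FEATURES)
```

Chaining works whenever the output type of one process matches the input type of the next, which is what lets new computations be added without changing the existing ones.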
28. [how] synthesis
Those are merely the highlights of some of GeoMesa’s current features…
… so what?
39. questions
For extended questions:
geomesa-user@locationtech.org
geomesa@ccri.com
christopher.eichelberger@geomesa.org
For additional reading:
geomesa.org
For code:
github.com/locationtech/geomesa