This talk introduces GeoMesa and discusses how it can be used to store and analyze massive amounts of movement data.
Talk recording: https://av.tib.eu/media/42874
2. WHAT WE DO
Research results
(Diagram: Movement data; Spatial context data; Evaluation & tuning; Model application; Algorithms; Trained models; Exploration & hypothesis formulation; Model building & training)
3. ANALYZING MASSIVE MOVEMENT DATA
#1 Taxis in Vienna
> 3.5 billion records since 2005
#2 Automatic Identification System (AIS) Data
500 million records per day
#3 Mobile phone network data
3 billion records per day (one big Austrian provider)
4. WHY DO WE BOTHER?
Too much waiting
Not enough time for data exploration & method development
5. OPEN & SPATIAL & SCALABLE
https://projects.eclipse.org/wg/locationtech/projects
6. WHAT IS GEOMESA?
Short answer:
GeoMesa is to Accumulo* what PostGIS is to PostgreSQL
* and other big data stores
7. WHAT IS GEOMESA?
Source: Constantin Stanca “High Performance and Scalable Geospatial Analytics on Cloud with Open Source”
8. GEOMESA
Features
Store gigabytes to petabytes of spatial data (tens of billions of points or more)
Serve up tens of millions of points in seconds
Ingest data faster than 10,000 records per second per node
Scale horizontally easily (add more servers to add more capacity)
Support Spark analytics
Drive a map through GeoServer or other OGC Clients
29/08/2019
http://www.geomesa.org/documentation/user/introduction.html#what-is-geomesa
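A back-of-envelope check of what the ingest figure in the features list means for the daily volumes on slide 3. This is a sketch under loudly stated assumptions: records arrive evenly over 24 hours, and the headline rate of 10,000 records per second per node holds for this data (real throughput depends on hardware, schema, and the number of indexes written).

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
# Assumption: the headline figure from the features list, not a measured rate.
NODE_INGEST_RATE = 10_000       # records per second per node


def required_nodes(records_per_day: int,
                   rate_per_node: int = NODE_INGEST_RATE) -> tuple[float, int]:
    """Return (average records per second, minimum node count) for a daily volume.

    Assumes records are spread evenly over the day; peaks would need more nodes.
    """
    per_second = records_per_day / SECONDS_PER_DAY
    return per_second, math.ceil(per_second / rate_per_node)


if __name__ == "__main__":
    for name, per_day in [("AIS", 500_000_000), ("Mobile phone network", 3_000_000_000)]:
        rate, nodes = required_nodes(per_day)
        print(f"{name}: {rate:,.0f} records/s on average -> at least {nodes} node(s)")
```

Averages understate the problem: arrivals are bursty, and every extra index table multiplies the writes per record.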
12. SPACE-FILLING CURVES
… make 2D/3D data sortable
Fox, A., Eichelberger, C., Hughes, J., & Lyon, S. (2013, October). Spatio-temporal indexing in non-relational distributed databases. In 2013 IEEE International Conference on Big Data (pp. 291-299). IEEE.
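To make the idea concrete, here is a minimal Z-order (Morton) curve sketch: longitude and latitude are each normalized onto an integer grid and their bits are interleaved into one integer, which a store that sorts by a single key can order. The 16 bits per dimension and the function names are illustrative assumptions; GeoMesa's actual index keys differ in precision and detail.

```python
BITS = 16  # bits per dimension (illustrative assumption; real indexes use more)


def normalize(value: float, lo: float, hi: float, bits: int = BITS) -> int:
    """Map a coordinate in [lo, hi] onto an integer cell in [0, 2**bits - 1]."""
    max_cell = (1 << bits) - 1
    cell = int((value - lo) / (hi - lo) * max_cell)
    return min(max(cell, 0), max_cell)


def interleave2(x: int, y: int, bits: int = BITS) -> int:
    """Interleave bits: x fills even bit positions, y fills odd ones."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z2_key(lon: float, lat: float) -> int:
    """One sortable integer key for a lon/lat point (hypothetical helper)."""
    return interleave2(normalize(lon, -180.0, 180.0), normalize(lat, -90.0, 90.0))
```

Usage: `z2_key(16.37, 48.21)` gives a single integer for a point in Vienna. Points that are close in space tend to get close keys, so sorting rows by such keys lets a one-dimensional store answer many spatial queries with a few range scans.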
25. PRACTICAL ASPECTS
Large technology stack
Only specific versions work together
Challenging to set up & manage
26. FIRST STEPS
Setup on one machine for experimental purposes
GeoDocker
https://github.com/geodocker/geodocker-geomesa
CCRI’s cloud-local
https://github.com/ccri/cloud-local
Common workflow, ex: training ML models for
Travel mode detection
Prediction of traffic states
Very iterative, esp. with new data sources
Assessing data potential
Identifying limitations
Proof of concept
First contact 2015
Large area + high temporal resolution
Travel times
Anomaly detection
Movement prediction
Ex: How long does it take a cargo ship from Hamburg to Shanghai?
Timestamp Hamburg < Shanghai
No intermediate stops at other harbors
No observation gap
…
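The criteria above can be sketched as a filter over one ship's position reports. This is a hypothetical illustration, not the authors' code: the `Report` record layout, the port labels, and the 6-hour threshold for an "observation gap" are assumptions. Travel time is measured from the last report in the origin port to the first report in the destination port.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Report:
    """Hypothetical record: one position report, tagged with a port if inside one."""
    time: datetime
    port: Optional[str]  # None while the ship is at sea


MAX_GAP = timedelta(hours=6)  # assumed threshold for an observation gap


def travel_times(reports: list[Report], origin: str, destination: str,
                 max_gap: timedelta = MAX_GAP) -> list[timedelta]:
    """Durations of direct origin -> destination voyages in one ship's track."""
    reports = sorted(reports, key=lambda r: r.time)  # ensures origin time < destination time
    results = []
    start = None  # time of the latest report in the origin port
    prev = None
    for r in reports:
        if prev is not None and r.time - prev.time > max_gap:
            start = None  # observation gap: discard the current voyage candidate
        if r.port == origin:
            start = r.time  # keep updating while the ship is still in port
        elif r.port == destination and start is not None:
            results.append(r.time - start)
            start = None
        elif r.port is not None:
            start = None  # intermediate stop at another harbor
        prev = r
    return results
```

Run per ship, e.g. `travel_times(track, "Hamburg", "Shanghai")`; over billions of AIS records this per-track step is exactly the kind of work that needs a scalable backend.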
Not as bad as week-long training in deep learning, but time is ticking!
Change → wait until next day → check if it worked → evaluate results → repeat
LocationTech (strong North American focus + more professional)
OSGeo (more global + more volunteers)
Famous LocationTech projects:
JTS (Java Topology Suite)
Proj4J
Strong collaboration with OSGeo Java projects
This description used to be on the geomesa.org homepage
("OK, what's Accumulo?" More on that later; for now, think of it as something like a DB)
Many similarities but no feature parity
Not one piece of software but modular stack
(Slides from GeoMesa devs at CCRI)
The following slides show options for the four main areas:
Streaming
Persisting
Managing
Analyzing
Our focus so far: persisting & analyzing
Promises:
Scalable storage
Fast I/O
Parallel processing
For geodata!
OGC standards compliant: a key feature! It ensures GIS interoperability for exploration
Standard stack
= Key components that GeoMesa builds on / enables
Modular, e.g. HBase / Accumulo
Support for Simple Features as values
+ Spatial index in key
Not your usual spatial index (e.g. R-tree in PostGIS)
Too computationally expensive / hard to distribute
Accumulo / HBase sort rows by a single key, i.e. one dimension only!
LocationTech SFCurve project
Even beyond 3D
Papers by GeoMesa dev Anthony Fox
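Adding time as a third dimension follows the same pattern. A minimal sketch, assuming time is split into weekly bins and only the position within the week is interleaved with x and y, so the key is (bin, z). GeoMesa's Z3 index works along broadly similar lines (a time-period bin plus a 3D curve value), but the bit width, epoch handling, and names here are simplified assumptions.

```python
from datetime import datetime, timezone

BITS = 21  # assumed bits per dimension; 3 * 21 = 63 bits fits a signed 64-bit key
WEEK_SECONDS = 7 * 24 * 3600
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_cell(value: float, lo: float, hi: float, bits: int = BITS) -> int:
    """Map value in [lo, hi] onto an integer cell in [0, 2**bits - 1]."""
    max_cell = (1 << bits) - 1
    return min(max(int((value - lo) / (hi - lo) * max_cell), 0), max_cell)


def interleave3(x: int, y: int, t: int, bits: int = BITS) -> int:
    """Bit i of x, y, t goes to bit positions 3i, 3i+1, 3i+2."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (3 * i)
        z |= ((y >> i) & 1) << (3 * i + 1)
        z |= ((t >> i) & 1) << (3 * i + 2)
    return z


def z3_key(lon: float, lat: float, when: datetime) -> tuple[int, int]:
    """Return (week bin, z value); rows sort by bin first, then by z.

    `when` must be timezone-aware (hypothetical helper, UTC assumed).
    """
    seconds = (when - EPOCH).total_seconds()
    week_bin = int(seconds // WEEK_SECONDS)
    offset = seconds - week_bin * WEEK_SECONDS  # seconds into that week
    z = interleave3(to_cell(lon, -180, 180), to_cell(lat, -90, 90),
                    to_cell(offset, 0, WEEK_SECONDS))
    return week_bin, z
```

Binning keeps the time dimension bounded, so 21 bits still give fine temporal resolution within each week, and a query for one month only touches a handful of bins.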
Multiple tables
Different indexes
Full copy of SimpleFeatures (default in GeoMesa 2)
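The multi-table layout can be pictured as fanning each feature out to several sorted tables, each keyed for a different kind of query and each holding a full copy of the feature. A toy in-memory sketch: the table names, the coarse grid key, and the `Feature` type are hypothetical stand-ins, not GeoMesa's API.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical stand-in for a SimpleFeature."""
    fid: str
    lon: float
    lat: float
    epoch_seconds: int
    ship_type: str


def grid_key(f: Feature) -> int:
    """Stand-in spatial key: 1-degree cell number (a real index uses a space-filling curve)."""
    row = min(int(f.lat + 90), 179)
    col = min(int(f.lon + 180), 359)
    return row * 360 + col


class MultiIndexStore:
    """Each 'table' is a sorted list of (key, fid, feature) rows; fids assumed unique."""

    def __init__(self) -> None:
        self.key_functions = {
            "id": lambda f: f.fid,
            "spatial": grid_key,
            "attr_ship_type": lambda f: f.ship_type,
        }
        self.tables = {name: [] for name in self.key_functions}

    def write(self, f: Feature) -> None:
        # One logical write becomes one row per table, each with the full feature.
        for name, key_fn in self.key_functions.items():
            bisect.insort(self.tables[name], (key_fn(f), f.fid, f))

    def scan(self, table: str, lo, hi) -> list[Feature]:
        """Range scan lo <= key <= hi over one table's sorted keys."""
        rows = self.tables[table]
        i = bisect.bisect_left(rows, (lo,))
        out = []
        while i < len(rows) and rows[i][0] <= hi:
            out.append(rows[i][2])
            i += 1
        return out
```

The trade-off is storage for speed: because every table holds the full feature, any query is answered by a range scan on one table without a second lookup.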
10 million records per day
1.5 years (1/2017 to 6/2018)
Count(*) is slow
Much faster with spatial filter
414 ships between Gothenburg and Quebec
Even faster with spatiotemporal filter
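Why a spatial filter beats `count(*)`: with rows sorted by a Z-order key, every point inside a bounding box has a key between the keys of the box's lower-left and upper-right corners, because bit interleaving is monotone in each coordinate. The store can therefore scan that key range and post-filter instead of reading everything. A minimal sketch under simplifying assumptions (integer grid coordinates, one contiguous range); production indexes typically split a box into many tighter ranges.

```python
import bisect


def interleave2(x: int, y: int, bits: int = 16) -> int:
    """Z-order key: x bits in even positions, y bits in odd positions."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def build_index(points: list[tuple[int, int]]) -> list[tuple[int, int, int]]:
    """Sort points by Z key, as a key-sorted store like Accumulo/HBase would."""
    return sorted((interleave2(x, y), x, y) for x, y in points)


def bbox_query(index, xmin: int, ymin: int, xmax: int, ymax: int):
    """Scan only keys in [z(min corner), z(max corner)], then drop false positives.

    Returns (matching points, number of rows scanned).
    """
    lo = bisect.bisect_left(index, (interleave2(xmin, ymin),))
    hi = bisect.bisect_right(index, (interleave2(xmax, ymax), float("inf"), float("inf")))
    scanned = index[lo:hi]
    hits = [(x, y) for _, x, y in scanned if xmin <= x <= xmax and ymin <= y <= ymax]
    return hits, len(scanned)
```

On a 16 x 16 grid of points, the box (1, 1)-(2, 2) is answered by scanning 10 of 256 rows; adding a time filter narrows the scanned ranges further.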
Spark alternative: GeoMesa processes run directly on the datastore
Ex: find cargo vessels along a given route
Very fast – less overhead than Spark
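The predicate behind "cargo vessels along a given route" can be sketched client-side: keep vessels of the requested type with at least one position within a buffer distance of the route polyline. Hypothetical illustration only: it uses planar distance on projected coordinates (units assumed consistent, e.g. kilometres) and a made-up `(vessel_id, ship_type, (x, y))` record layout, whereas the GeoMesa processes mentioned above run inside the datastore.

```python
import math


def point_segment_distance(p, a, b) -> float:
    """Euclidean distance from point p to segment ab; all are (x, y) tuples."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)  # degenerate segment
    # Project p onto the segment and clamp to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def vessels_along_route(positions, route, buffer: float, ship_type: str = "cargo") -> set:
    """IDs of vessels of `ship_type` with a position within `buffer` of the route."""
    segments = list(zip(route, route[1:]))
    return {
        vessel_id
        for vessel_id, kind, xy in positions
        if kind == ship_type
        and any(point_segment_distance(xy, a, b) <= buffer for a, b in segments)
    }
```

In practice a bounding-box filter on the indexed geometry would run first, so the exact distance test only touches candidates near the route.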
Viz!
Straightforward installation
Configuration of datastores + layers like any DB
Great performance
No nice GUI for time in GeoServer preview
Add TIME parameter to request
Animation
Two WMS-T layers
On the fly
Real time
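Since the GeoServer layer preview offers no time control, the TIME dimension goes into the WMS request itself, and an animation is simply a sequence of such requests. A sketch that builds GetMap URLs, one per time window; the server URL, layer name, bounding box, and daily step are placeholder assumptions, and the layer must have its time dimension enabled in GeoServer.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode

WMS_URL = "http://localhost:8080/geoserver/wms"  # hypothetical GeoServer endpoint
LAYER = "workspace:ais_positions"                # hypothetical layer name


def getmap_url(time_value: str, bbox=(-10.0, 35.0, 30.0, 70.0),
               width: int = 800, height: int = 600) -> str:
    """WMS 1.1.1 GetMap URL with a TIME parameter (instant or start/end interval)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": LAYER,
        "STYLES": "",
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),  # minx, miny, maxx, maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
        "TIME": time_value,
    }
    return f"{WMS_URL}?{urlencode(params)}"


def animation_frames(start: datetime, end: datetime, step: timedelta) -> list[str]:
    """One GetMap URL per window [t, t + step); naive datetimes are treated as UTC."""
    urls = []
    t = start
    while t < end:
        interval = f"{t:%Y-%m-%dT%H:%M:%SZ}/{t + step:%Y-%m-%dT%H:%M:%SZ}"
        urls.append(getmap_url(interval))
        t += step
    return urls
```

Usage: `animation_frames(datetime(2017, 1, 1), datetime(2017, 1, 8), timedelta(days=1))` yields seven frame URLs; each is rendered on the fly from the datastore, so no pre-computed tiles are needed.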
Many components
+ versions important
+ whole cluster needs to be in sync
GeoDocker: pretty alpha in 2017
GeoMesa devs (on Gitter) recommend and maintain cloud-local: all components on one machine
Scripts can be adjusted to a cluster setup
Downside: config must be copied manually to new nodes to expand the cluster or to do updates