BIG SPATIAL(!)
DATA PROCESSING WITH GEOMESA
Anita Graser
Center for Mobility Systems, AIT Austrian Institute of Technology
WHAT WE DO
Workflow diagram: movement data + spatial context data; exploration & hypothesis formulation; model building & training; evaluation & tuning; model application
Research results: algorithms, trained models
#1 Taxis in Vienna
> 3.5 billion records since 2005
#2 Automatic Identification System (AIS) Data
500 million records per day
#3 Mobile phone network data
3 billion records per day (one big Austrian provider)
ANALYZING MASSIVE MOVEMENT DATA
WHY DO WE BOTHER?
Too much waiting
→ Not enough time for data exploration & method development
OPEN & SPATIAL & SCALABLE
https://projects.eclipse.org/wg/locationtech/projects
WHAT IS GEOMESA?
Short answer:
GeoMesa is to Accumulo* what PostGIS is to PostgreSQL
* and other big data stores
WHAT IS GEOMESA?
Source: Constantin Stanca “High Performance and Scalable Geospatial Analytics on Cloud with Open Source”
Features
• Store gigabytes to petabytes of spatial data (tens of billions of points or more)
• Serve up tens of millions of points in seconds
• Ingest data faster than 10,000 records per second per node
• Scale horizontally easily (add more servers to add more capacity)
• Support Spark analytics
• Drive a map through GeoServer or other OGC clients
GEOMESA
http://www.geomesa.org/documentation/user/introduction.html#what-is-geomesa
GEOMESA
https://www.geomesa.org/documentation/user/architecture.html
Spatial extension for Accumulo
• Distributed
• Spatially indexed
GEOMESA
ZooKeeper
Hadoop
GEOMESA – SPATIAL INDEX
SPACE-FILLING CURVES
… make 2D/3D data sortable
Fox, A., Eichelberger, C., Hughes, J., & Lyon, S. (2013, October). Spatio-temporal indexing in non-relational distributed databases.
In Big Data, 2013 IEEE International Conference on (pp. 291-299). IEEE.
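A minimal sketch of the idea behind a Z-order space-filling curve, assuming a simple bit-interleaving scheme: each coordinate is quantized to an integer cell index, and the bits of the two indices are interleaved into a single sortable key. The function names, the 16-bit default resolution, and the quantization are illustrative assumptions, not GeoMesa's actual implementation.

```python
def quantize(value, lo, hi, bits):
    """Map a value in [lo, hi] to an integer cell index in [0, 2**bits - 1]."""
    cells = 1 << bits
    idx = int((value - lo) / (hi - lo) * cells)
    return min(max(idx, 0), cells - 1)  # clamp so that value == hi stays in range


def interleave(x, y, bits):
    """Interleave the lowest `bits` bits of x and y (x on even, y on odd positions)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z2(lon, lat, bits=16):
    """Hypothetical helper: one-dimensional Z-order key for a lon/lat point."""
    x = quantize(lon, -180.0, 180.0, bits)
    y = quantize(lat, -90.0, 90.0, bits)
    return interleave(x, y, bits)
```

Sorting points by `z2(lon, lat)` gives a one-dimensional ordering that a key-value store can sort on. Keys that share a long common prefix belong to the same grid cell at a coarser resolution.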
geomesa export -c geomesa.gdelt -f gdelt -u root -p GisPwd \
  -q "CONTAINS(POLYGON ((0 0, 0 90, 90 90, 90 0, 0 0)),geom)" -m 3
Using GEOMESA_ACCUMULO_HOME = /opt/geomesa
id,globalEventId:String,...,dtg:Date,*geom:Point:srid=4326
139...,671713129,...,2017-07-10T00:00:00.000Z,POINT (5.43827 5.35886)
9e8...,671928676,...,2017-07-10T00:00:00.000Z,POINT (5.43827 5.35886)
d6c...,671817380,...,2017-07-09T00:00:00.000Z,POINT (5.43827 5.35886)
More complex queries & analyses → Spark(SQL)!
SPATIAL QUERIES
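The filter passed with -q in the export command above is a text expression that combines a WKT polygon with the name of the geometry attribute. As a rough sketch, such a filter string could be assembled for an arbitrary bounding box as follows. The helper names are hypothetical and only reproduce the string format shown on the slide.

```python
def bbox_polygon_wkt(min_x, min_y, max_x, max_y):
    """Return a closed WKT polygon ring for a bounding box."""
    ring = [
        (min_x, min_y),
        (min_x, max_y),
        (max_x, max_y),
        (max_x, min_y),
        (min_x, min_y),  # repeat the first vertex to close the ring
    ]
    coords = ", ".join(f"{x:.10g} {y:.10g}" for x, y in ring)
    return f"POLYGON (({coords}))"


def contains_filter(polygon_wkt, geom_attr="geom"):
    """Build a CONTAINS filter in the form used in the export example."""
    return f"CONTAINS({polygon_wkt},{geom_attr})"


# Reproduces the filter string from the slide.
query = contains_filter(bbox_polygon_wkt(0, 0, 90, 90))
```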
GEOMESA
Source: Constantin Stanca “High Performance and Scalable Geospatial Analytics on Cloud with Open Source”
http://www.geomesa.org/documentation/user/spark/sparksql_functions.html
Geometry Constructors
• st_geometryFromText
• st_makeBBOX
• st_makeLine
• st_makePoint
• st_makePolygon
• …
Geometry Accessors
• st_geometryN
• st_isValid
• st_pointN
• st_x
• …
Geometry Outputs
• st_asGeoJSON
• st_asText
• …
Spatial Relationships
• st_area
• st_centroid
• st_closestPoint
• st_contains
• st_covers
• st_crosses
• st_disjoint
• st_distance
• st_distanceSphere
• st_distanceSpheroid
• st_equals
• st_intersects
• st_length
• st_lengthSphere
• st_lengthSpheroid
• st_overlaps
• st_relate
• st_touches
• st_within
Geometry Processing
• st_bufferPoint
• st_convexHull
• …
GEOMESA-SPARK-SQL MODULE
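The function list contains both st_distance and st_distanceSphere. For longitude/latitude data, the difference matters: a planar distance comes out in degrees, while a distance on a sphere can be expressed in meters. The sketch below illustrates this with the haversine formula and a mean Earth radius. The Python function names, the formula choice, and the radius value are assumptions for illustration; GeoMesa's own functions may compute these values differently.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius in meters


def distance_planar(lon1, lat1, lon2, lat2):
    """Euclidean distance in coordinate units (degrees for lon/lat input)."""
    return math.hypot(lon2 - lon1, lat2 - lat1)


def distance_sphere(lon1, lat1, lon2, lat2, radius=EARTH_RADIUS_M):
    """Great-circle distance in meters on a sphere (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Guard against rounding slightly above 1 for near-antipodal points.
    return 2 * radius * math.asin(min(1.0, math.sqrt(a)))
```

One degree of latitude gives a planar distance of 1 (degree) but a spherical distance of roughly 111 km, which is why the spherical variant is usually the better fit for unprojected data.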
Plugin for GeoServer
→ GeoMesa data store
GEOMESA & GEOSERVER
WMS-T
http://10.101.21.11:8080/geoserver/geomesa/wms?service=WMS&version=1.1.0&request=GetMap
&layers=geomesa:aisdk&styles=point&bbox=-180.0,-90.0,180.0,90.0&width=1500&height=780
&srs=EPSG:4326&format=application/openlayers
&TIME=2017-06-01T06:00:00.000Z/2017-06-01T06:30:00.000Z
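The request above is a regular WMS GetMap call with an additional TIME parameter that holds a start/end interval. A small sketch of how such a URL could be assembled in Python is shown below. The function name and the localhost base URL are assumptions for illustration; the parameter names and values are taken from the request on the slide.

```python
from urllib.parse import urlencode


def wms_t_getmap_url(base_url, layer, start, end, style="point",
                     bbox=(-180.0, -90.0, 180.0, 90.0), width=1500, height=780):
    """Build a WMS GetMap URL restricted to the time interval start/end."""
    params = {
        "service": "WMS",
        "version": "1.1.0",
        "request": "GetMap",
        "layers": layer,
        "styles": style,
        "bbox": ",".join(str(v) for v in bbox),
        "width": width,
        "height": height,
        "srs": "EPSG:4326",
        "format": "application/openlayers",
        "TIME": f"{start}/{end}",  # time interval as start/end
    }
    # Keep ':', '/' and ',' unescaped so the URL stays readable.
    return f"{base_url}?{urlencode(params, safe=':/,')}"


url = wms_t_getmap_url(
    "http://localhost:8080/geoserver/geomesa/wms",  # assumed host, adjust as needed
    "geomesa:aisdk",
    "2017-06-01T06:00:00.000Z",
    "2017-06-01T06:30:00.000Z",
)
```

Stepping the start and end values forward in a loop yields a sequence of requests, one per animation frame.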
WMS-T & QGIS TIME MANAGER
Large technology stack
• Only specific versions work together
• Challenging to set up & manage
PRACTICAL ASPECTS
Setup on one machine for experimental purposes
• GeoDocker
https://github.com/geodocker/geodocker-geomesa
• CCRI's cloud-local
https://github.com/ccri/cloud-local
FIRST STEPS
CONTACT
Anita Graser
anita.graser@ait.ac.at
@underdarkGIS
anitagraser.com

Big Spatial(!) Data Processing mit GeoMesa. AGIT 2019, Salzburg, Austria.

Editor's Notes

  • #3 Common workflow, e.g. training ML models for travel mode detection or prediction of traffic states. Very iterative, especially with new data sources: assessing data potential, identifying limitations, proof of concept.
  • #4 First contact: 2015. Large area + high temporal resolution. Use cases: travel times, anomaly detection, movement prediction. Example: How long does a cargo ship take from Hamburg to Shanghai? Conditions: timestamp Hamburg < Shanghai; no intermediate stops at other harbors; no observation gap; …
  • #5 Not as bad as week-long training in deep learning, but time is ticking! Change → wait until next day → check if it worked → evaluate results → repeat
  • #6 LocationTech (strong North America focus + more professional) vs. OSGeo (more global + more volunteers). Famous LocationTech projects: JTS (Java Topology Suite), Proj4J. Strong collaboration with OSGeo Java projects.
  • #7 This description used to be on the geomesa.org homepage ("OK, what's Accumulo?" → later / something like a DB). Many similarities but no feature parity. Not one piece of software but a modular stack.
  • #8 (Slides from GeoMesa devs at CCRI) The following slides show options for the four main areas: streaming, persisting, managing, analyzing. Our focus so far: persisting & analyzing.
  • #9 Promises: scalable storage, fast I/O, parallel processing. For geodata! OGC standards compliant! → key feature! → ensures GIS interoperability for exploration
  • #10 Standard stack = key components that GeoMesa builds on / enables. Modular, e.g. HBase / Accumulo.
  • #11 Support for Simple Features as values + Spatial index in key
  • #12 Not your usual spatial index (e.g. R-tree in PostGIS): too computationally expensive / hard to distribute.
  • #13 Accumulo / HBase sort by one dimension only! LocationTech SFCurve project. Even beyond 3D. Papers by GeoMesa dev Anthony Fox.
  • #17 Multiple tables. Different indexes. Full copy of SimpleFeatures (default in GeoMesa 2).
  • #18 10 million records per day. 1.5 years (1/2017 to 6/2018). Count(*) is slow.
  • #19 Much faster with spatial filter. 414 ships between Gothenburg and Quebec.
  • #20 Even faster with spatiotemporal filter
  • #21 Spark alternative: GeoMesa processes work directly on the data store. Example: find cargo vessels along a given route. Very fast, less overhead than Spark.
  • #22 https://www.geomesa.org/documentation/user/process.html#routesearch-process
  • #23 Viz! Straightforward installation. Configuration of data stores + layers like any DB. Great performance.
  • #24 No nice GUI for time in GeoServer preview → add TIME parameter to request
  • #25 Animation: two WMS-T layers, on the fly, real time.
  • #26 Many components + versions important + whole cluster needs to be in sync
  • #27 GeoDocker: pretty alpha in 2017. GeoMesa devs (Gitter) recommend / maintain cloud-local = all components on one machine. Scripts can be adjusted to a cluster setup. Downside: need to manually copy config to new nodes to expand the cluster / do updates.