As “big data” has risen and matured, the geospatial analytics landscape has grown and evolved with it. We are past analyzing only static maps: geospatial data now streams from devices, sensors, infrastructure systems, and social media, and our applications and use cases must scale dynamically to meet the increased demand.
The cloud can provide cost-effective storage and the ephemeral bursts of resources needed for fast, low-latency processing, so that the immediate value of fresh geospatial data can be monetized. Geospatial analytics requires optimized spatial data types and algorithms to distill data into knowledge. Such processing, especially under strict latency requirements, has always been a challenge.
We propose an open source big data stack for geospatial analytics in the cloud, based on Apache NiFi, Apache Spark, and LocationTech GeoMesa. GeoMesa is a geospatial framework that runs on a modern big data platform and provides a scalable, low-latency solution for indexing large volumes of historical data, generating live views, and running streaming geospatial analytics.
Presentation and live demo performed at DataWorks 2018 Conference - San Jose: https://bit.ly/2xthAGD
Summary
⬢ Loading geospatial data into the cloud and GeoTools datastores never seems as easy as
it should be. There are sensor networks, GPS devices, Twitter streams, FTP servers, and all
sorts of other data that you need to parse, convert to SimpleFeatures, and then ingest.
⬢ GeoMesa, NiFi, and Spark provide a fully open source solution to ease the pain of
ingesting and analyzing data using ANY GeoTools data store.
⬢ DataPlane Services Cloud Manager (powered by Cloudbreak) helps you to deploy
ephemeral geospatial analytics clusters to support increased computation
requirements, all decoupled from storage.
⬢ We will show how real-time streaming data such as satellite AIS can be ingested and
managed in real time with NiFi. We will also show how geospatial data stored in S3, HDFS,
or HBase, in formats such as ORC or Parquet, can be queried at scale using GeoMesa,
Spark, and Zeppelin.
Cloudbreak
⬢ Cloudbreak can be utilized to address
Geospatial computational capacity needs
⬢ Easily spin up auto-scalable clusters for
different workloads and purposes, whether
it is a Geospatial Ingest cluster with NiFi and
GeoMesa, or a Geospatial Analytics cluster
with Spark and GeoMesa.
⬢ Data can reside in your object store or even
in a persistent data store.
⬢ These ephemeral clusters can be scheduled
for a period of time or only until the job is
done, so you pay only for what you use.
How does HDP/HDF + GeoMesa stream data?
⬢ The GeoMesa Kafka DataStore allows data producers to write CRUD messages to a
Kafka topic.
⬢ Consumers of that topic build up an in-memory representation of the current state of
the world.
⬢ This allows for
– live maps,
– real time analytics, and
– complex event processing.
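The producer/consumer pattern above can be sketched in plain Python, with no Kafka dependency. This is a minimal illustration, assuming a hypothetical `Message` type with `upsert`, `delete`, and `clear` operations; it is not GeoMesa's actual message format or consumer code. A consumer folds the ordered message log into the latest known position per feature. The vessel IDs and coordinates are made-up sample data.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

Position = Tuple[float, float]  # (lon, lat)


@dataclass(frozen=True)
class Message:
    """Hypothetical change message; not GeoMesa's wire format."""
    op: str                              # "upsert", "delete", or "clear"
    feature_id: str = ""
    position: Optional[Position] = None  # only used by "upsert"


def replay(messages: Iterable[Message]) -> Dict[str, Position]:
    """Fold an ordered message log into the latest position per feature."""
    state: Dict[str, Position] = {}
    for msg in messages:
        if msg.op == "upsert":
            if msg.position is None:
                raise ValueError("upsert requires a position")
            state[msg.feature_id] = msg.position  # create or overwrite
        elif msg.op == "delete":
            state.pop(msg.feature_id, None)       # ignore unknown ids
        elif msg.op == "clear":
            state.clear()
        else:
            raise ValueError(f"unknown op: {msg.op}")
    return state


# Made-up sample log: two vessels report, one moves, one is removed.
log = [
    Message("upsert", "vessel-1", (-122.4, 37.8)),
    Message("upsert", "vessel-2", (-118.2, 33.7)),
    Message("upsert", "vessel-1", (-122.5, 37.9)),
    Message("delete", "vessel-2"),
]
current = replay(log)
print(current)
```

Because the state is derived only from the log, any new consumer can rebuild the same view by reading the topic from the beginning, which is what makes live maps and real-time analytics possible from the same stream.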
How does HDP/HDF + GeoMesa persist data?
GeoMesa integrates with HBase and Accumulo:
⬢ Key structures use space filling curves
⬢ Complex geospatial filters and processing can be
‘pushed down’ using Filters, Coprocessors, and Iterators
GeoMesa’s File System Datastore provides the ability to
store spatio-temporally indexed data in a cloud object
store such as S3, using storage formats like ORC or Parquet.
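To illustrate the space-filling-curve keys mentioned above, here is a minimal sketch of a two-dimensional Z-order (Morton) curve, one common space-filling curve. The slide does not say which curves or resolutions are used, so the function names (`interleave_bits`, `normalize`, `z2_key`) and the 16-bit resolution are assumptions made for illustration only, not GeoMesa's implementation.

```python
def interleave_bits(x: int, y: int, bits: int) -> int:
    """Interleave the low `bits` bits of x and y; x takes the even positions."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def normalize(value: float, lo: float, hi: float, bits: int) -> int:
    """Map a value in [lo, hi] to an integer cell index in [0, 2**bits - 1]."""
    cells = 1 << bits
    idx = int((value - lo) / (hi - lo) * cells)
    return min(max(idx, 0), cells - 1)  # clamp so hi falls in the last cell


def z2_key(lon: float, lat: float, bits: int = 16) -> int:
    """Turn a lon/lat pair into a single sortable integer key."""
    return interleave_bits(
        normalize(lon, -180.0, 180.0, bits),
        normalize(lat, -90.0, 90.0, bits),
        bits,
    )


# Made-up sample points; sorting by key tends to group nearby points together.
points = {
    "san-francisco": (-122.4, 37.8),
    "oakland": (-122.3, 37.8),
    "sydney": (151.2, -33.9),
}
for name, (lon, lat) in sorted(points.items(), key=lambda kv: z2_key(*kv[1])):
    print(name, z2_key(lon, lat))
```

The value of such a key in a sorted store like HBase or Accumulo is that a spatial bounding box maps to a small number of contiguous key ranges, so a query becomes a handful of range scans instead of a full table scan.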
GeoMesa NiFi
⬢ GeoMesa-NiFi allows you to ingest data into GeoMesa straight from NiFi by
leveraging custom processors.
⬢ NiFi allows you to ingest data into GeoMesa from every source GeoMesa supports
and more.
[Diagram: Data and a SimpleFeatureType schema feed the GeoMesa NiFi processors, which write to enabled datastores]
GeoMesa NiFi Processors
⬢ PutGeoMesaAccumulo: Ingest data into a GeoMesa Accumulo datastore with a
GeoMesa converter or from GeoAvro
⬢ PutGeoMesaHBase: Ingest data into a GeoMesa HBase datastore with a GeoMesa
converter or from GeoAvro
⬢ PutGeoMesaFileSystem: Ingest data into a GeoMesa File System datastore with a
GeoMesa converter or from GeoAvro
⬢ PutGeoMesaKafka: Ingest data into a GeoMesa Kafka datastore with a GeoMesa
converter or from GeoAvro
⬢ PutGeoTools: Ingest data into an arbitrary GeoTools datastore using a GeoMesa
converter or Avro
⬢ ConvertToGeoAvro: Use a GeoMesa converter to create GeoAvro
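Conceptually, a converter pairs raw records with a schema and emits typed features. The sketch below shows that idea in plain Python, assuming a schema written in the `name:Type` style of a GeoTools SimpleFeatureType spec string. The `parse_spec` and `to_feature` helpers, the `lon`/`lat` column names, and the sample AIS-like record are hypothetical and are not the GeoMesa converter API or its configuration format.

```python
import csv
import io
from datetime import datetime
from typing import Any, Dict, List, Tuple

# Hypothetical schema; "*" marks the default geometry attribute.
SPEC = "mmsi:String,dtg:Date,speed:Double,*geom:Point:srid=4326"


def parse_spec(spec: str) -> List[Tuple[str, str]]:
    """Split a spec string into (attribute name, type name) pairs."""
    fields = []
    for part in spec.split(","):
        name, type_name = part.split(":")[:2]  # drop options such as srid
        fields.append((name.lstrip("*"), type_name))
    return fields


def to_feature(row: Dict[str, str], fields: List[Tuple[str, str]]) -> Dict[str, Any]:
    """Convert one parsed CSV row into a typed, feature-like dict."""
    feature: Dict[str, Any] = {}
    for name, type_name in fields:
        if type_name == "Point":
            # Assumes the raw data carries the point as lon/lat columns.
            feature[name] = f"POINT ({float(row['lon'])} {float(row['lat'])})"
        elif type_name == "Date":
            feature[name] = datetime.strptime(row[name], "%Y-%m-%dT%H:%M:%SZ")
        elif type_name == "Double":
            feature[name] = float(row[name])
        else:
            feature[name] = row[name]
    return feature


# Made-up sample record, not real AIS data.
RAW = "mmsi,dtg,speed,lon,lat\n367000001,2018-06-20T12:00:00Z,12.5,-122.4,37.8\n"

fields = parse_spec(SPEC)
features = [to_feature(row, fields) for row in csv.DictReader(io.StringIO(RAW))]
print(features[0])
```

Keeping the schema separate from the parsing logic is what lets one set of processors serve many data sources: only the schema and the field mapping change per feed.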
How does HDP + GeoMesa analyze geospatial data?
GeoMesa integrates deeply with Spark to:
⬢ create spatial User Defined Types and User Defined Functions
– (based on LocationTech JTS, a geometry library)
⬢ optimize spatial queries against GeoMesa DataSources
⬢ persist output data back to GeoMesa
⬢ leverage Zeppelin notebooks to allow for rapid innovation and creativity
⬢ Zeppelin allows analysts to visualize results easily
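A spatial user-defined function is, at its core, a geometric predicate evaluated once per row. The sketch below shows one such predicate in plain Python with no Spark or JTS dependency: a ray-casting point-in-polygon test used to filter rows. The `contains` function, the `AREA` polygon, and the sample rows are hypothetical stand-ins for illustration; in the stack described here the equivalent logic would come from JTS-backed functions rather than hand-written code.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (lon, lat)


def contains(polygon: Sequence[Point], point: Point) -> bool:
    """Ray-casting test for a simple polygon ring.

    Points exactly on the boundary are not handled specially.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


# Made-up query area (a rectangle around San Francisco Bay) and sample rows.
AREA: List[Point] = [(-123.0, 37.0), (-121.0, 37.0), (-121.0, 38.5), (-123.0, 38.5)]
rows = [
    ("vessel-1", (-122.4, 37.8)),
    ("vessel-2", (-118.2, 33.7)),
]
hits = [feature_id for feature_id, position in rows if contains(AREA, position)]
print(hits)
```

Evaluating the predicate on every row is the naive approach. The query optimization mentioned above matters because it lets the datastore's spatial index discard most rows before a predicate like this ever runs.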