Feature Geo Analytics and
Big Data Processing:
Hybrid Approaches for Earth Science
and Real-Time Decision Making
Mansour Raad, Erik Hoel, Michael Park, Adam Mollenkopf, Dawn J. Wright
Environmental Systems Research Institute (aka Esri)
IN12A-01 (Invited)
AGU Fall Meeting, 12 December 2016
What is Feature Geo Analytics?
A new way of processing spatiotemporal data designed for WEB-BASED
big data by leveraging distributed analytics and storage
• Works with existing GIS data and tabular data
• Designed to perform both spatial and temporal analysis
• Uses familiar workflows to complete complex analyses
• “Hybridity” - integrating open-source frameworks on clusters to run analytics
Feature Geo Analytics
[Diagram labels: Geoprocessing; Distributed analytics and storage; Feature Geo Analytics; Portal; Web GIS Layers; new; more; extends]
Solve New Problems
Run analytics:
• against data too big for a single desktop machine
- Buffer 8.2 million points or thousands of polygons in a little over a minute
- billions of observations of ship movements ingested via GeoEvent
• designed to gain insight into both spatial and temporal patterns
• against massive collections in a scalable manner
• and meet time constraints
[Timeline: months → weeks → days → hours → minutes]
Geo Analytics Architectural Overview
• Register large data stores, then distribute spatial analysis across a cluster of machines for parallel processing
• Store and/or deploy to web
• Web GIS layers via Pro, Portal, Python Notebooks, or the REST API
[Diagram labels: Portal; Web GIS Layers; New Web GIS Layers; Un-Managed Data; Managed Data; Relational Data Store; Spatiotemporal Data Store; Files; Delimited Files; Enterprise; Shapefiles; Big Data Stores; Server Cluster]
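The "register, then distribute across a cluster for parallel processing" idea can be sketched on a single machine. This is only an illustration, not the ArcGIS implementation: a thread pool stands in for the server cluster, the buffer is a planar circle approximation, and every name (`buffer_point`, `partition`, `distributed_buffer`) is hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def buffer_point(x, y, radius, segments=16):
    """Approximate a circular buffer around (x, y) as a closed ring of vertices."""
    ring = [
        (x + radius * math.cos(2 * math.pi * i / segments),
         y + radius * math.sin(2 * math.pi * i / segments))
        for i in range(segments)
    ]
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring


def partition(items, n_parts):
    """Split a list into at most n_parts contiguous chunks of similar size."""
    size = max(1, math.ceil(len(items) / n_parts))
    return [items[i:i + size] for i in range(0, len(items), size)]


def distributed_buffer(points, radius, workers=4):
    """Buffer each chunk independently, then concatenate the partial results."""
    chunks = partition(points, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in chunk order, so output order matches input order
        partial = pool.map(
            lambda chunk: [buffer_point(x, y, radius) for x, y in chunk], chunks
        )
        return [ring for chunk in partial for ring in chunk]


# Illustrative input only: 1,000 synthetic points
points = [(float(i), float(i % 10)) for i in range(1000)]
buffers = distributed_buffer(points, radius=0.5)
print(len(buffers), "buffers,", len(buffers[0]), "vertices each")
```

Because each point is buffered independently, the work splits cleanly across partitions; operations that compare features with each other (joins, hot spots) need a spatial partitioning scheme rather than arbitrary chunks.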
Rich Collection of (Web) Analysis Tools
• Summarize Data
- Aggregate Points
- Summarize Nearby
- Summarize Within
- Reconstruct Tracks
- Join Features
• Find Locations
- Find Existing Locations
- Find Similar Locations
• Analyze Patterns
- Calculate Density
- Find Hot Spots
- Create Space Time Cube
• Use Proximity
- Create Buffers
• Manage Data
- Extract Data
* Temporally aware tools
Aggregate Points
Summarize Nearby
Summarize Within
Find Existing Locations
Find Similar Locations
Calculate Density
Find Hot Spots
Create Buffers
Extract Data
Analytical Overview: Aggregating and Summarizing
• Spatial Joins
• Space-time slices
• Spatiotemporal joins
[Figure panels: Target Features; Join Features; Intermediate Result; Final Result]
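A spatiotemporal join can be read as a spatial join followed by a temporal filter. The sketch below is one plausible reading of the figure, not the tool's actual algorithm: it assumes point features, planar distances, and a symmetric time window, and the function and feature names are hypothetical.

```python
import math
from datetime import datetime, timedelta


def spatiotemporal_join(targets, joins, max_dist, max_gap):
    """Match join features to target features that are near in space AND time.

    targets, joins: dict of id -> (x, y, timestamp)
    Returns (spatial_only, spatiotemporal): dicts of target id -> matched join ids.
    """
    spatial_only, spatiotemporal = {}, {}
    for t_id, (tx, ty, tt) in targets.items():
        # Step 1 (intermediate result): keep join features within max_dist
        near = [
            j_id for j_id, (jx, jy, _) in joins.items()
            if math.hypot(tx - jx, ty - jy) <= max_dist
        ]
        # Step 2 (final result): of those, keep the ones within the time window
        spatial_only[t_id] = near
        spatiotemporal[t_id] = [
            j_id for j_id in near if abs(tt - joins[j_id][2]) <= max_gap
        ]
    return spatial_only, spatiotemporal


# Illustrative features only
targets = {"A": (0.0, 0.0, datetime(2016, 12, 12, 9, 0))}
joins = {
    "j1": (1.0, 1.0, datetime(2016, 12, 12, 9, 30)),   # near in space and time
    "j2": (1.0, 1.0, datetime(2016, 12, 12, 15, 0)),   # near in space only
    "j3": (50.0, 50.0, datetime(2016, 12, 12, 9, 10)), # near in time only
}
intermediate, final = spatiotemporal_join(
    targets, joins, max_dist=2.0, max_gap=timedelta(hours=1)
)
print(intermediate, final)
```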
Temporal Relationships on Intervals
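The slide's figure is not recoverable from the text. A common vocabulary for relationships between time intervals is Allen's interval algebra, so the sketch below assumes that is what is meant. It classifies a pair of intervals, assuming each interval has start < end; the function name is hypothetical.

```python
def interval_relation(a, b):
    """Return the Allen relation of interval a to interval b.

    a, b: (start, end) tuples with start < end; any comparable type works
    (numbers, datetimes).
    """
    (a0, a1), (b0, b1) = a, b
    if a1 < b0:
        return "before"
    if b1 < a0:
        return "after"
    if a1 == b0:
        return "meets"
    if b1 == a0:
        return "met-by"
    if a0 == b0 and a1 == b1:
        return "equals"
    if a0 == b0:
        return "starts" if a1 < b1 else "started-by"
    if a1 == b1:
        return "finishes" if a0 > b0 else "finished-by"
    if b0 < a0 and a1 < b1:
        return "during"
    if a0 < b0 and b1 < a1:
        return "contains"
    # Remaining case: partial overlap with no shared endpoints
    return "overlaps" if a0 < b0 else "overlapped-by"


for a, b in [((1, 3), (4, 6)), ((1, 4), (4, 6)), ((1, 5), (4, 6)), ((4, 5), (3, 6))]:
    print(a, interval_relation(a, b), b)
```

A temporal join condition then becomes a set of accepted relations, for example "during" or "overlaps".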
• Points into Bins
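Aggregating points into bins can be sketched as integer division of coordinates by a cell size. This minimal version assumes square cells on planar coordinates; the function name and sample points are illustrative only.

```python
import math
from collections import Counter


def bin_points(points, cell_size):
    """Count points per square cell; a cell is keyed by its (column, row) index."""
    counts = Counter()
    for x, y in points:
        # floor() keeps negative coordinates in the correct cell
        counts[(math.floor(x / cell_size), math.floor(y / cell_size))] += 1
    return counts


sample = [(0.5, 0.5), (0.9, 0.1), (1.5, 0.2), (-0.1, 0.0)]
bins = bin_points(sample, cell_size=1.0)
print(dict(bins))
```

Each point maps to its cell independently, so per-partition counts can be computed in parallel and summed afterwards.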
Aggregation – Polygons vs Cells
[Figure panels: Aggregation By Polygons; Aggregation By Cells]
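Aggregation by polygons differs from aggregation by cells in that each point needs a point-in-polygon test instead of simple coordinate arithmetic. The sketch below uses the standard ray-casting test on simple polygons without holes; it assumes planar coordinates and non-overlapping polygons, and the polygon names are made up for illustration.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the polygon given by ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def aggregate_by_polygons(points, polygons):
    """Count points per polygon; points outside every polygon are ignored."""
    counts = {name: 0 for name in polygons}
    for x, y in points:
        for name, ring in polygons.items():
            if point_in_polygon(x, y, ring):
                counts[name] += 1
                break  # polygons are assumed not to overlap
    return counts


polygons = {
    "west": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "east": [(3, 0), (5, 0), (5, 3)],
}
sample = [(1, 1), (0.5, 1.5), (4.5, 1), (10, 10)]
totals = aggregate_by_polygons(sample, polygons)
print(totals)
```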
• Reconstruct Tracks
- Summarize time-enabled points into tracks
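Reconstructing tracks amounts to grouping time-enabled points by an identifier and ordering each group by time. The sketch below assumes each observation is a `(track_id, timestamp, x, y)` tuple, which is a layout chosen for illustration; the function name is hypothetical.

```python
from collections import defaultdict


def reconstruct_tracks(observations):
    """Group observations by track id and return time-ordered vertex lists."""
    grouped = defaultdict(list)
    for track_id, timestamp, x, y in observations:
        grouped[track_id].append((timestamp, x, y))
    # Sorting the tuples orders each group by timestamp first
    return {
        track_id: [(x, y) for _, x, y in sorted(obs)]
        for track_id, obs in grouped.items()
    }


# Illustrative observations, deliberately out of order
observations = [
    ("ship-1", 3, 2.0, 2.0),
    ("ship-2", 1, 9.0, 9.0),
    ("ship-1", 1, 0.0, 0.0),
    ("ship-1", 2, 1.0, 1.0),
    ("ship-2", 2, 8.0, 9.5),
]
tracks = reconstruct_tracks(observations)
print(tracks)
```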
Use Case: Hurricane Tracks
• Hurricane dataset
- 120,000 points, ~100 years
- Each point has:
  - ID number
  - Location
  - Date
  - Wind speed and pressure attributes
- Problems?
  - Difficult to visualize that many points
  - Difficult to visualize hurricane path
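One way to address both problems is to collapse each storm's points into a single summarized track. The sketch below assumes each point is a dict with `id`, `lon`, `lat`, `date` (ISO string), `wind`, and `pressure` keys; those field names and the sample values are illustrative, not the actual hurricane dataset schema.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def summarize_storms(points):
    """Reduce many points per storm to one summary record per storm."""
    by_storm = defaultdict(list)
    for p in points:
        by_storm[p["id"]].append(p)
    summary = {}
    for storm_id, obs in by_storm.items():
        obs.sort(key=lambda p: p["date"])  # ISO dates sort chronologically
        summary[storm_id] = {
            "points": len(obs),
            "start": obs[0]["date"],
            "end": obs[-1]["date"],
            "max_wind": max(p["wind"] for p in obs),
            "min_pressure": min(p["pressure"] for p in obs),
            "length_km": sum(
                haversine_km(a["lon"], a["lat"], b["lon"], b["lat"])
                for a, b in zip(obs, obs[1:])
            ),
        }
    return summary


# Made-up sample points for illustration
sample = [
    {"id": "S1", "lon": 1.0, "lat": 0.0, "date": "1990-08-02", "wind": 60, "pressure": 990},
    {"id": "S1", "lon": 0.0, "lat": 0.0, "date": "1990-08-01", "wind": 40, "pressure": 1000},
]
storms = summarize_storms(sample)
print(storms["S1"])
```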
“Hybridity” for Distributed Computation
See also www.esri.com/software/open
Real-Time GIS Performance
• Ingest high-velocity real-time data
• Observations in a Big Data Store
• Visualize high-velocity, high-volume data
- as an AGGREGATION,
- as discrete FEATURES,
- live & HISTORICALLY
• Visualizations CAN scale
[Diagram labels: ArcGIS 10.4; 10s of thousands of e/s; ArcGIS Spatiotemporal Big Data Store; Desktop, Web, Device; ArcGIS Server; 4,000 e/s; Ingestion; GeoEvent; 4,000 e/s; Visualization; Live and Historic Aggregates & Features; Enhanced Map and Feature Service; Stream Service; Stream Layer; 3,000 e/s; Live Features; Geo Analytics Performance; Spatiotemporal Big Data Store]
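To put the events-per-second (e/s) figures in context, a quick calculation shows how fast observations accumulate. It assumes a constant, sustained rate, which real feeds rarely have; the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def events_per_day(rate_per_second):
    """Events accumulated in one day at a constant rate."""
    return rate_per_second * SECONDS_PER_DAY


def days_to_reach(total_events, rate_per_second):
    """Days needed to accumulate total_events at a constant rate."""
    return total_events / events_per_day(rate_per_second)


for rate in (3_000, 4_000):
    print(f"{rate:,} e/s -> {events_per_day(rate):,} events/day")
print(f"1 billion events at 4,000 e/s: {days_to_reach(1_000_000_000, 4_000):.1f} days")
```

At 4,000 e/s a feed passes one billion stored observations in under three days, which is why aggregation matters for both storage and visualization.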
Discussion groups at geonet.esri.com
Step 1. Click orange “Join in” button to create your account.
Step 2. Join the Big Data or Sciences groups
Step 3. Contribute to AGU conversations!
Mansour Raad, Esri Big Data Team
mraad@esri.com
thunderheadxpler.blogspot.com
github.com/mraad
@mraad
For Questions/Discussion


Editor's Notes

  • #3 “hybrid” in that ArcGIS Server integrates open-source big data frameworks such as Apache Hadoop and Apache Spark on the cluster in order to run the analytics
  • #4 Building blocks of this approach
  • #5 Buffer 8.2 million points or thousands of polygons in a little over a minute. Meet time constraints, especially against the next NSF proposal deadlines.
  • #6 These “feature geo analytics” tools run in both batch and streaming spatial analysis modes as distributed computations across a cluster of servers on typical “big” data sets, where static data exist in traditional geospatial formats (e.g., shapefile) locally on a disk or file share, attached as static spatiotemporal big data stores, or streamed in near-real-time. In other words, the approach registers large datasets or data stores with ArcGIS Enterprise (Server), then distributes analysis across a cluster of machines for parallel processing. Many frameworks and technologies exist for distributing computation, e.g., Hadoop, MapReduce, and Spark. Spark processes distributed data in memory, supports the MapReduce programming model, and includes additional framework-level distributed algorithms. ArcGIS Server integrates these technologies on a cluster to solve analytic problems.
  • #8 Due to lack of time, will focus on Aggregation and Summarizing
  • #15 Many frameworks and technologies exist for distributing computation, e.g., Hadoop, MapReduce, and Spark. Spark processes distributed data in memory, supports the MapReduce programming model, and includes additional framework-level distributed algorithms. ArcGIS Server integrates these technologies on a cluster to solve analytic problems.
  • #16 For fast, dynamic queries, integrate Cloudera Impala, an open-source query engine that runs on Apache Hadoop (Hadoop Distributed File System, HDFS). It delivers fast SQL processing on HDFS. Read/write data in HDFS using Impala. Write code in Python, Java, or Scala (like C, “scalable language”). ArcPy helps you perform geographic data analysis in Python. By the way, you’ll need at least 8 CPU cores, 16 GB RAM (32 GB is better), and a 512 GB solid-state drive (1 TB is better).
  • #17 e/s = events per second. We aim to register large data stores or data sets with ArcGIS Server, then distribute analysis across a cluster of machines for parallel processing. Performance example: buffer 8.2 million points or thousands of polygons in a little over a minute. Coming: ~250,000 writes to disk per second across 5 nodes. Many frameworks and technologies exist for distributing computation, e.g., Hadoop, MapReduce, and Spark. Spark processes distributed data in memory, supports the MapReduce programming model, and includes additional framework-level distributed algorithms. ArcGIS Server integrates these technologies on a cluster to solve analytic problems.