1
High Performance and Scalable Geospatial
Analytics on Cloud with Open Source
James Hughes – CCRI
Constantin Stanca – Hortonworks
3
Summary
• Loading geospatial data into the cloud and GeoTools datastores never seems as easy as
it should be. There are sensor networks, GPS devices, Twitter streams, FTP servers and all
sorts of other data that you need to parse, convert to SimpleFeatures, and then ingest.
• GeoMesa, NiFi and Spark provide a fully open source solution to ease the pain of
ingesting and analyzing data using ANY GeoTools data store.
• DataPlane Services Cloud Manager (powered by Cloudbreak) helps you to deploy
ephemeral geospatial analytics clusters to support increased computation
requirements, all decoupled from storage.
• We will show how real-time streaming data such as satellite AIS can be ingested and
managed in real time with NiFi. We will also show how geospatial data stored in S3, HDFS, or
HBase, or in formats such as ORC or Parquet, can be queried at scale using GeoMesa, Spark and Zeppelin.
4
Geospatial Analytics
Challenges
5
Data Movement & System Complexity with Added Pressure of Big Data
[Diagram: several "Acquire Data" and "Store Data" stages linked by "Data Flow" into a single "Process and Analyze Data" stage]
6
If That Was Not Enough …
Spatial Data Types
• Points: locations, events, instantaneous positions
• Lines: road networks, voyages, trips, trajectories
• Polygons: administrative regions, airspaces
7
If That Was Not Enough …
Spatial Data Relationships
equals
disjoint
intersects
touches
crosses
within
contains
overlaps
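A few of the relationships above can be sketched for the simplest possible geometry, an axis-aligned rectangle. This is only an illustration: the `Box` type and function names are hypothetical, the checks are boundary-inclusive, and real geometry libraries such as JTS evaluate these predicates on arbitrary shapes with stricter definitions.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle; a simplified stand-in for a real geometry."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def intersects(a: Box, b: Box) -> bool:
    # True when the two boxes share at least one point (edges included).
    return (a.min_x <= b.max_x and b.min_x <= a.max_x
            and a.min_y <= b.max_y and b.min_y <= a.max_y)


def disjoint(a: Box, b: Box) -> bool:
    # Disjoint is the negation of intersects.
    return not intersects(a, b)


def contains(a: Box, b: Box) -> bool:
    # True when b lies entirely inside a (assumes boxes with non-zero area).
    return (a.min_x <= b.min_x and b.max_x <= a.max_x
            and a.min_y <= b.min_y and b.max_y <= a.max_y)


def within(a: Box, b: Box) -> bool:
    # within(a, b) is contains(b, a) with the arguments swapped.
    return contains(b, a)
```

Even in this reduced form, each predicate needs its own comparison logic, which hints at why the general case adds complexity to an analytics platform.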
8
If That Was Not Enough …
Topology Operations
Algorithms
Convex Hull
Buffer
Validation
Dissolve
Polygonization
Simplification
Triangulation
Voronoi
Linear Referencing
and more...
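As one concrete example from this list, a convex hull can be computed with Andrew's monotone chain algorithm. The sketch below is a plain-Python illustration working on `(x, y)` tuples; it is not the implementation used by JTS or GeoMesa, and the function name is only illustrative.

```python
def convex_hull(points):
    """Return the convex hull of 2D points in counter-clockwise order.

    Uses Andrew's monotone chain algorithm; collinear points on the
    hull boundary are dropped.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Z component of the cross product of vectors OA and OB.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    # Build the lower hull from left to right.
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    # Build the upper hull from right to left.
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each half is the first point of the other half.
    return lower[:-1] + upper[:-1]
```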
9
Requirements for a High
Performance Geospatial
Analytics Platform
10
Traditional Approach
• GIS, data crunching and web serving were three very separate worlds.
• If a web app wanted access to the analysis there was a long process of ETL, DB work,
imports and exports, and bribing various network and storage people for the resources
you needed.
11
Requirements for a High Performance Geospatial Analytics Platform
• IoT sensors present an opportunity to understand the world right now
• A map of the current state of the world enables faster reactions
• The variety of sensors and data sources presents data management challenges
• Adding new, varied data sources must be easy
• Big data requires distributed storage / computation and scalable infrastructure
• The data layer has to scale
• Analysis has to be easy
12
Scalable Geospatial
Analytics on the Cloud
13
How Cloud Helps to Address Geospatial Big Data Challenges
• Challenges:
• Big data problem (derive insights from all data)
• Compute resources when they are needed (easy scale, easy access to data)
• Solution:
• The cloud elastically provides the needed compute resources, all decoupled from the storage, whether
that is an object store, file system or NoSQL database.
14
Importance for Geospatial Analytics
• Spatial streaming visualizations and analytics can present near real-time insights
• Decision makers can respond more rapidly when they see live data feeds on a map
• Spatial batch analytics can fuse multiple data sources together to understand a region
• Patterns of life emerge
• Advertisers can plan their next campaigns
• Businesses can locate their new store sites
15
Cloudbreak
• Cloudbreak can be used to address
geospatial computational capacity needs.
• Easily spin up auto-scaling clusters for
different workloads and purposes, whether
it is a geospatial ingest cluster with NiFi and
GeoMesa, or a geospatial analytics cluster
with Spark and GeoMesa.
• Data can reside in your object store or even
in a persistent data store.
• These ephemeral clusters can be scheduled
for a period of time or only until the job is
done, so you pay only for what you use.
16
LocationTech GeoMesa
17
How GeoMesa Helps with Geospatial Data Type Challenges
• Challenges:
• Vector & raster data
• Geospatial data types
• Solution:
• GeoMesa tools for streaming, persisting, managing, and analyzing spatio-temporal data at scale
18
What Is GeoMesa?
A suite of tools for streaming, persisting, managing, and
analyzing spatio-temporal data at scale
23
Proposed Reference Architecture
24
How Does HDP/HDF + GeoMesa Stream Data?
• The GeoMesa Kafka DataStore allows data producers to write CRUD messages to a Kafka
topic.
• Consumers of that topic build up an in-memory representation of the current state of
the world.
• This allows for
• live maps,
• real time analytics, and
• complex event processing.
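The consumer side of this pattern can be sketched without any Kafka client: replay a stream of create/update/delete messages into a dictionary keyed by feature ID. The message layout below (`op`, `id`, `feature`) is a hypothetical stand-in chosen for illustration, not the wire format the GeoMesa Kafka DataStore actually uses.

```python
def apply_message(state, message):
    """Apply one create/update/delete message to the in-memory state."""
    op = message["op"]
    fid = message["id"]
    if op in ("create", "update"):
        # The latest write for a feature ID wins.
        state[fid] = message["feature"]
    elif op == "delete":
        # Deleting an unknown ID is treated as a no-op.
        state.pop(fid, None)
    else:
        raise ValueError(f"unknown operation: {op}")
    return state


def replay(messages):
    """Build the current state of the world from an ordered message log."""
    state = {}
    for message in messages:
        apply_message(state, message)
    return state
```

Because the state is rebuilt purely from the ordered log, a new consumer can start from the beginning of the topic and arrive at the same current view.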
25
How Does HDP/HDF + GeoMesa Persist Data?
GeoMesa integrates with HBase and Accumulo:
• Key structures use space filling curves
• Complex geospatial filters and processing can be
‘pushed down’ using Filters, Coprocessors, and Iterators
GeoMesa’s File System Datastore provides the ability to
store spatio-temporally indexed data in a cloud object
store such as S3, using storage formats like ORC or Parquet.
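To illustrate the idea behind space-filling-curve keys, the sketch below computes a Z-order (Morton) value by interleaving the bits of a discretized longitude and latitude. It is a simplified illustration under assumed parameters (16 bits per dimension, hypothetical function names), not GeoMesa's actual key layout.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order value."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return z


def z2_key(lon: float, lat: float, bits: int = 16) -> int:
    """Map a lon/lat pair to a Z-order key (illustrative resolution)."""
    cells = (1 << bits) - 1
    # Scale each coordinate onto the integer grid [0, cells].
    x = int((lon + 180.0) / 360.0 * cells)
    y = int((lat + 90.0) / 180.0 * cells)
    return interleave_bits(x, y, bits)
```

Points that are close in space tend to receive numerically close keys, so a bounding-box query can be turned into a small number of key-range scans in a sorted key-value store.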
26
Geospatial Data Flow Transformation
with NiFi and GeoMesa
27
Geospatial Data Flow Transformation with NiFi and GeoMesa
[Diagram: satellite AIS and other spatial data move from the edge (edge geo data, edge analytics) through geo data in motion and geo data at rest, both on-premises and in the cloud, feeding closed-loop analytics, machine learning, and deep historical analysis]
28
GeoMesa NiFi
• GeoMesa-NiFi allows you to ingest data into GeoMesa straight from NiFi by leveraging
custom processors.
• NiFi allows you to ingest data into GeoMesa from every source GeoMesa supports and
more.
[Diagram: data and a SimpleFeatureType schema pass through the GeoMesa NiFi processors into the enabled datastores]
29
GeoMesa NiFi Processors
• PutGeoMesaAccumulo: Ingest data into a GeoMesa Accumulo datastore with a
GeoMesa converter or from GeoAvro
• PutGeoMesaHBase: Ingest data into a GeoMesa HBase datastore with a GeoMesa
converter or from GeoAvro
• PutGeoMesaFileSystem: Ingest data into a GeoMesa File System datastore with a
GeoMesa converter or from GeoAvro
• PutGeoMesaKafka: Ingest data into a GeoMesa Kafka datastore with a GeoMesa
converter or from GeoAvro
• PutGeoTools: Ingest data into an arbitrary GeoTools datastore using a GeoMesa
converter or Avro
• ConvertToGeoAvro: Use a GeoMesa converter to create GeoAvro
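Conceptually, a converter turns raw records into features with an ID, typed attributes, and a geometry. The sketch below shows that idea in plain Python for a CSV input. The column names (`mmsi`, `timestamp`, `lon`, `lat`) and the output dictionary layout are assumptions made for illustration; GeoMesa converters are defined through their own configuration, which is not shown here.

```python
import csv
import io
from datetime import datetime


def csv_to_features(text):
    """Parse CSV text into simple feature-like dictionaries.

    Assumes hypothetical columns: mmsi, timestamp (ISO 8601), lon, lat.
    """
    features = []
    for row in csv.DictReader(io.StringIO(text)):
        lon = float(row["lon"])
        lat = float(row["lat"])
        features.append({
            # Feature ID built from vessel identifier and timestamp.
            "id": f'{row["mmsi"]}-{row["timestamp"]}',
            "attributes": {
                "mmsi": row["mmsi"],
                "dtg": datetime.fromisoformat(row["timestamp"]),
            },
            # Geometry expressed as WKT, with x = longitude, y = latitude.
            "geom": f"POINT ({lon} {lat})",
        })
    return features
```

A NiFi flow would apply this kind of conversion to each incoming file or message before handing the resulting features to a datastore.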
30
Analyze Geospatial Data with
GeoMesa and Spark
31
How Does HDP + GeoMesa Analyze Geospatial Data?
• GeoMesa integrates deeply with Spark to:
• create spatial User Defined Types and User Defined Functions
• (based on LocationTech JTS, a geometry library)
• optimize spatial queries against GeoMesa DataSources
• persist output data back to GeoMesa
• leverage Zeppelin notebooks to allow for rapid innovation and creativity
• Zeppelin allows analysts to visualize results easily
32
DEMO
Data Ingest and Interactive Insights with
GeoMesa, NiFi, Spark and Zeppelin
33
Demo
• Introduce EE dataset
• Data management / NiFi overview
• Real-time view + historical recall
• Spark Analysis
34
NiFi-GeoMesa Data Flow
35
Satellite AIS
36
Setup
● Import GeoMesa dependency
● Create dataframe backed by GeoMesa relation
● Create SQL temporary view so we can query it
37
Sub-select Data
● Create rough sub-selection of data
■ Bound by time
■ Bound by bounding box roughly around the Gulf of Mexico
● Create a new temporary view from this sub-selection
● Cache the data (pull into memory)
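The sub-selection step amounts to a combined time-window and bounding-box filter. The sketch below expresses that filter over plain Python records rather than a Spark DataFrame. The record fields (`lon`, `lat`, `dtg`) and the bounding box values are rough, illustrative assumptions, not the values used in the demo notebook.

```python
from datetime import datetime

# Rough, illustrative bounding box around the Gulf of Mexico:
# (min_lon, min_lat, max_lon, max_lat). Not the demo's actual values.
GULF_BBOX = (-98.0, 18.0, -80.0, 31.0)


def in_bbox(lon, lat, bbox):
    """True when the point falls inside the bounding box (edges included)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def sub_select(records, bbox, start, end):
    """Keep records inside bbox with start <= dtg < end."""
    return [
        r for r in records
        if in_bbox(r["lon"], r["lat"], bbox) and start <= r["dtg"] < end
    ]
```

Holding the returned list in memory plays the same role as caching the temporary view: later queries run against the smaller subset instead of the full dataset.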
38
Data Exploration
● Query for tankers in the Gulf
● Get counts for each type of tanker
● Group the counts by day
● Graph counts to see trends
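The grouping described above is a count per day and vessel type. A minimal plain-Python sketch of that aggregation follows; the field names `dtg` and `vessel_type` are hypothetical, and in the demo this step would be a SQL `GROUP BY` over the cached view.

```python
from collections import Counter
from datetime import datetime


def daily_counts(records):
    """Count records per (day, vessel type), sorted by day then type."""
    counts = Counter()
    for r in records:
        counts[(r["dtg"].date(), r["vessel_type"])] += 1
    return dict(sorted(counts.items()))
```

The resulting `(day, type) -> count` mapping is the shape a notebook chart needs to plot one trend line per tanker type.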
39
Data Exploration
● Restrict our search to just Trinity Bay
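Restricting a search to an irregular area comes down to a point-in-polygon test. The sketch below uses the standard ray-casting technique on a list of `(x, y)` vertices. It is a plain-Python illustration of what a spatial "contains" predicate does for points; the polygon used with it would be a placeholder, since the actual Trinity Bay boundary is not given in the slides.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True when (x, y) is inside the polygon.

    `polygon` is a list of (x, y) vertices; the ring is closed implicitly.
    Points exactly on an edge may be classified either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # X coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside
```

Counting how many polygon edges a ray from the point crosses, and checking whether that count is odd, is what makes the test work for concave outlines as well as convex ones.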
40
Data Exploration
● Create a new temporary view of the number of ships in Trinity Bay
41
Extra Data
● Pull in gas price data
○ Acquired from EIA.gov
○ Two gas price indexes
■ NYH: New York Harbor
■ GC: Gulf Coast
● Create temporary view so we can analyze with SQL
42
Data Exploration
● Graph data over time period of Harvey
● Notice we don’t have daily values
43
Data Exploration
● Create temporary view of gas price data around our time of interest
44
Data Exploration
● Backfill the price data with the last value to give us day-continuous data
● Min/max normalize gas and ship counts
● Graph gas prices and ship counts together
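The two preparation steps above can be sketched in plain Python: carry the last known price forward to fill missing days, then rescale each series to the range 0 to 1 so prices and ship counts can share one chart. Function names are hypothetical, and any values passed in would be made-up sample data rather than EIA figures.

```python
from datetime import date, timedelta


def fill_with_last_value(series, start, end):
    """Return a day-continuous series, carrying the last known value forward.

    `series` maps date -> value. Days before the first observation are None.
    """
    filled = {}
    last = None
    day = start
    while day <= end:
        if day in series:
            last = series[day]
        filled[day] = last
        day += timedelta(days=1)
    return filled


def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series has no spread; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

Normalizing both series removes their units, so the chart compares the shape of each trend rather than the absolute magnitudes.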
45
Resources
46
Resources
• GeoMesa Project: http://www.geomesa.org/
• GeoMesa-NiFi: http://www.geomesa.org/documentation/user/nifi.html
• GeoMesa-Spark: http://www.geomesa.org/documentation/user/spark/index.html
• Articles:
• http://www.ccri.com/2017/03/20/new-geomesa-spark-sql-zeppelin-notebooks-support/
• http://www.ccri.com/2018/02/26/interactive-insights-hurricane-harveys-impact-energy-production-geomesa-jupyter-notebooks/
47
Thank you

More Related Content

What's hot

MapR on Azure: Getting Value from Big Data in the Cloud -
MapR on Azure: Getting Value from Big Data in the Cloud -MapR on Azure: Getting Value from Big Data in the Cloud -
MapR on Azure: Getting Value from Big Data in the Cloud -
MapR Technologies
 
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test ResultsUncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
DataWorks Summit
 
Big Data Analytics from Edge to Core
Big Data Analytics from Edge to CoreBig Data Analytics from Edge to Core
Big Data Analytics from Edge to Core
DataWorks Summit
 
GDPR compliance application architecture and implementation using Hadoop and ...
GDPR compliance application architecture and implementation using Hadoop and ...GDPR compliance application architecture and implementation using Hadoop and ...
GDPR compliance application architecture and implementation using Hadoop and ...
DataWorks Summit
 
The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...
The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...
The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...
Big Data Spain
 
The key to unlocking the Value in the IoT? Managing the Data!
The key to unlocking the Value in the IoT? Managing the Data!The key to unlocking the Value in the IoT? Managing the Data!
The key to unlocking the Value in the IoT? Managing the Data!
DataWorks Summit/Hadoop Summit
 
Evolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLEvolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQL
MapR Technologies
 
Protect your Private Data in your Hadoop Clusters with ORC Column Encryption
Protect your Private Data in your Hadoop Clusters with ORC Column EncryptionProtect your Private Data in your Hadoop Clusters with ORC Column Encryption
Protect your Private Data in your Hadoop Clusters with ORC Column Encryption
DataWorks Summit
 
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
DataWorks Summit/Hadoop Summit
 
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
DataWorks Summit/Hadoop Summit
 
A Study Review of Common Big Data Architecture for Small-Medium Enterprise
A Study Review of Common Big Data Architecture for Small-Medium EnterpriseA Study Review of Common Big Data Architecture for Small-Medium Enterprise
A Study Review of Common Big Data Architecture for Small-Medium Enterprise
Ridwan Fadjar
 
Big Telco - Yousun Jeong
Big Telco - Yousun JeongBig Telco - Yousun Jeong
Big Telco - Yousun Jeong
Spark Summit
 
Kudu as Storage Layer to Digitize Credit Processes
Kudu as Storage Layer to Digitize Credit ProcessesKudu as Storage Layer to Digitize Credit Processes
Kudu as Storage Layer to Digitize Credit Processes
DataWorks Summit
 
Log I am your father
Log I am your fatherLog I am your father
Log I am your father
DataWorks Summit/Hadoop Summit
 
Evolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainEvolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and Rain
MapR Technologies
 
"Who Moved my Data? - Why tracking changes and sources of data is critical to...
"Who Moved my Data? - Why tracking changes and sources of data is critical to..."Who Moved my Data? - Why tracking changes and sources of data is critical to...
"Who Moved my Data? - Why tracking changes and sources of data is critical to...
Cask Data
 
Containerized Hadoop beyond Kubernetes
Containerized Hadoop beyond KubernetesContainerized Hadoop beyond Kubernetes
Containerized Hadoop beyond Kubernetes
DataWorks Summit
 
Large-scaled telematics analytics
Large-scaled telematics analyticsLarge-scaled telematics analytics
Large-scaled telematics analytics
DataWorks Summit
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
DataWorks Summit/Hadoop Summit
 
About CDAP
About CDAPAbout CDAP
About CDAP
Cask Data
 

What's hot (20)

MapR on Azure: Getting Value from Big Data in the Cloud -
MapR on Azure: Getting Value from Big Data in the Cloud -MapR on Azure: Getting Value from Big Data in the Cloud -
MapR on Azure: Getting Value from Big Data in the Cloud -
 
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test ResultsUncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
 
Big Data Analytics from Edge to Core
Big Data Analytics from Edge to CoreBig Data Analytics from Edge to Core
Big Data Analytics from Edge to Core
 
GDPR compliance application architecture and implementation using Hadoop and ...
GDPR compliance application architecture and implementation using Hadoop and ...GDPR compliance application architecture and implementation using Hadoop and ...
GDPR compliance application architecture and implementation using Hadoop and ...
 
The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...
The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...
The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...
 
The key to unlocking the Value in the IoT? Managing the Data!
The key to unlocking the Value in the IoT? Managing the Data!The key to unlocking the Value in the IoT? Managing the Data!
The key to unlocking the Value in the IoT? Managing the Data!
 
Evolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLEvolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQL
 
Protect your Private Data in your Hadoop Clusters with ORC Column Encryption
Protect your Private Data in your Hadoop Clusters with ORC Column EncryptionProtect your Private Data in your Hadoop Clusters with ORC Column Encryption
Protect your Private Data in your Hadoop Clusters with ORC Column Encryption
 
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
 
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
 
A Study Review of Common Big Data Architecture for Small-Medium Enterprise
A Study Review of Common Big Data Architecture for Small-Medium EnterpriseA Study Review of Common Big Data Architecture for Small-Medium Enterprise
A Study Review of Common Big Data Architecture for Small-Medium Enterprise
 
Big Telco - Yousun Jeong
Big Telco - Yousun JeongBig Telco - Yousun Jeong
Big Telco - Yousun Jeong
 
Kudu as Storage Layer to Digitize Credit Processes
Kudu as Storage Layer to Digitize Credit ProcessesKudu as Storage Layer to Digitize Credit Processes
Kudu as Storage Layer to Digitize Credit Processes
 
Log I am your father
Log I am your fatherLog I am your father
Log I am your father
 
Evolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainEvolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and Rain
 
"Who Moved my Data? - Why tracking changes and sources of data is critical to...
"Who Moved my Data? - Why tracking changes and sources of data is critical to..."Who Moved my Data? - Why tracking changes and sources of data is critical to...
"Who Moved my Data? - Why tracking changes and sources of data is critical to...
 
Containerized Hadoop beyond Kubernetes
Containerized Hadoop beyond KubernetesContainerized Hadoop beyond Kubernetes
Containerized Hadoop beyond Kubernetes
 
Large-scaled telematics analytics
Large-scaled telematics analyticsLarge-scaled telematics analytics
Large-scaled telematics analytics
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
About CDAP
About CDAPAbout CDAP
About CDAP
 

Similar to High Performance and Scalable Geospatial Analytics on Cloud with Open Source

Geo Analytics Canada Overview - May 2020
Geo Analytics Canada Overview - May 2020Geo Analytics Canada Overview - May 2020
Geo Analytics Canada Overview - May 2020
GEO Analytics Canada
 
High Performance and Scalable Geospatial Analytics on Cloud with Open Source
High Performance and Scalable Geospatial Analytics on Cloud with Open SourceHigh Performance and Scalable Geospatial Analytics on Cloud with Open Source
High Performance and Scalable Geospatial Analytics on Cloud with Open Source
"Constantin \"Cristi\"" Stanca
 
GEO Analytics Canada Overview April 2020
GEO Analytics Canada Overview April 2020GEO Analytics Canada Overview April 2020
GEO Analytics Canada Overview April 2020
GEO Analytics Canada
 
Geonetwork for Spatial Data
Geonetwork for Spatial DataGeonetwork for Spatial Data
Geonetwork for Spatial Data
Nizam GIS
 
Big Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and RoadmapBig Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and RoadmapSrinath Perera
 
Harness the power of Data in a Big Data Lake
Harness the power of Data in a Big Data LakeHarness the power of Data in a Big Data Lake
Harness the power of Data in a Big Data Lake
Saurabh K. Gupta
 
Colorado Springs Open Source Hadoop/MySQL
Colorado Springs Open Source Hadoop/MySQL Colorado Springs Open Source Hadoop/MySQL
Colorado Springs Open Source Hadoop/MySQL David Smelker
 
Making Sense of Remote Sensing
Making Sense of Remote SensingMaking Sense of Remote Sensing
Making Sense of Remote Sensing
Amazon Web Services
 
Operationalizing Machine Learning Using GPU-accelerated, In-database Analytics
Operationalizing Machine Learning Using GPU-accelerated, In-database AnalyticsOperationalizing Machine Learning Using GPU-accelerated, In-database Analytics
Operationalizing Machine Learning Using GPU-accelerated, In-database Analytics
Kinetica
 
CLIM Program: Remote Sensing Workshop, Distributed Access and Analysis: NASA ...
CLIM Program: Remote Sensing Workshop, Distributed Access and Analysis: NASA ...CLIM Program: Remote Sensing Workshop, Distributed Access and Analysis: NASA ...
CLIM Program: Remote Sensing Workshop, Distributed Access and Analysis: NASA ...
The Statistical and Applied Mathematical Sciences Institute
 
Big data analytics and machine intelligence v5.0
Big data analytics and machine intelligence   v5.0Big data analytics and machine intelligence   v5.0
Big data analytics and machine intelligence v5.0
Amr Kamel Deklel
 
True Reusable Code - DevSum2016
True Reusable Code - DevSum2016True Reusable Code - DevSum2016
True Reusable Code - DevSum2016
Eduard Lazar
 
Building your big data solution
Building your big data solution Building your big data solution
Building your big data solution WSO2
 
Big data – can it deliver speed and accuracy v1
Big data – can it deliver speed and accuracy v1Big data – can it deliver speed and accuracy v1
Big data – can it deliver speed and accuracy v1GurinderG
 
Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...
Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...
Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...
Accumulo Summit
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoopMohit Tare
 
Suburface 2021 IBM Cloud Data Lake
Suburface 2021 IBM Cloud Data LakeSuburface 2021 IBM Cloud Data Lake
Suburface 2021 IBM Cloud Data Lake
Torsten Steinbach
 
afternoon3.pdf
afternoon3.pdfafternoon3.pdf
afternoon3.pdf
WinnieChu21
 
Bring Satellite and Drone Imagery into your Data Science Workflows
Bring Satellite and Drone Imagery into your Data Science WorkflowsBring Satellite and Drone Imagery into your Data Science Workflows
Bring Satellite and Drone Imagery into your Data Science Workflows
Databricks
 

Similar to High Performance and Scalable Geospatial Analytics on Cloud with Open Source (20)

Geo Analytics Canada Overview - May 2020
Geo Analytics Canada Overview - May 2020Geo Analytics Canada Overview - May 2020
Geo Analytics Canada Overview - May 2020
 
High Performance and Scalable Geospatial Analytics on Cloud with Open Source
High Performance and Scalable Geospatial Analytics on Cloud with Open SourceHigh Performance and Scalable Geospatial Analytics on Cloud with Open Source
High Performance and Scalable Geospatial Analytics on Cloud with Open Source
 
GEO Analytics Canada Overview April 2020
GEO Analytics Canada Overview April 2020GEO Analytics Canada Overview April 2020
GEO Analytics Canada Overview April 2020
 
Geoservices Activities at EDINA
Geoservices Activities at EDINAGeoservices Activities at EDINA
Geoservices Activities at EDINA
 
Geonetwork for Spatial Data
Geonetwork for Spatial DataGeonetwork for Spatial Data
Geonetwork for Spatial Data
 
Big Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and RoadmapBig Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and Roadmap
 
Harness the power of Data in a Big Data Lake
Harness the power of Data in a Big Data LakeHarness the power of Data in a Big Data Lake
Harness the power of Data in a Big Data Lake
 
Colorado Springs Open Source Hadoop/MySQL
Colorado Springs Open Source Hadoop/MySQL Colorado Springs Open Source Hadoop/MySQL
Colorado Springs Open Source Hadoop/MySQL
 
Making Sense of Remote Sensing
Making Sense of Remote SensingMaking Sense of Remote Sensing
Making Sense of Remote Sensing
 
Operationalizing Machine Learning Using GPU-accelerated, In-database Analytics
Operationalizing Machine Learning Using GPU-accelerated, In-database AnalyticsOperationalizing Machine Learning Using GPU-accelerated, In-database Analytics
Operationalizing Machine Learning Using GPU-accelerated, In-database Analytics
 
CLIM Program: Remote Sensing Workshop, Distributed Access and Analysis: NASA ...
CLIM Program: Remote Sensing Workshop, Distributed Access and Analysis: NASA ...CLIM Program: Remote Sensing Workshop, Distributed Access and Analysis: NASA ...
CLIM Program: Remote Sensing Workshop, Distributed Access and Analysis: NASA ...
 
Big data analytics and machine intelligence v5.0
Big data analytics and machine intelligence   v5.0Big data analytics and machine intelligence   v5.0
Big data analytics and machine intelligence v5.0
 
True Reusable Code - DevSum2016
True Reusable Code - DevSum2016True Reusable Code - DevSum2016
True Reusable Code - DevSum2016
 
Building your big data solution
Building your big data solution Building your big data solution
Building your big data solution
 
Big data – can it deliver speed and accuracy v1
Big data – can it deliver speed and accuracy v1Big data – can it deliver speed and accuracy v1
Big data – can it deliver speed and accuracy v1
 
Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...
Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...
Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
Suburface 2021 IBM Cloud Data Lake
Suburface 2021 IBM Cloud Data LakeSuburface 2021 IBM Cloud Data Lake
Suburface 2021 IBM Cloud Data Lake
 
afternoon3.pdf
afternoon3.pdfafternoon3.pdf
afternoon3.pdf
 
Bring Satellite and Drone Imagery into your Data Science Workflows
Bring Satellite and Drone Imagery into your Data Science WorkflowsBring Satellite and Drone Imagery into your Data Science Workflows
Bring Satellite and Drone Imagery into your Data Science Workflows
 

More from DataWorks Summit

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
DataWorks Summit
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
DataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 




High Performance and Scalable Geospatial Analytics on Cloud with Open Source

  • 1. High Performance and Scalable Geospatial Analytics on Cloud with Open Source
      James Hughes – CCRI
      Constantin Stanca – Hortonworks
  • 2. Summary
      • Loading geospatial data into the cloud and GeoTools data stores never seems as easy as it should be. There are sensor networks, GPS devices, Twitter streams, FTP servers, and all sorts of other data sources that you need to parse, convert to SimpleFeatures, and then ingest.
      • GeoMesa, NiFi, and Spark provide a fully open source solution that eases the pain of ingesting and analyzing data using any GeoTools data store.
      • DataPlane Services Cloud Manager (powered by Cloudbreak) helps you deploy ephemeral geospatial analytics clusters to support increased computation requirements, all decoupled from storage.
      • We will show how real-time streaming data such as satellite AIS can be ingested and managed in real time with NiFi. We will also show how geospatial data stored in S3, HDFS, or HBase, or as ORC or Parquet files, can be queried at scale using GeoMesa, Spark, and Zeppelin.
  • 4. Data Movement & System Complexity with Added Pressure of Big Data (diagram labels: Acquire Data, Store Data, Process and Analyze Data, Data Flow)
  • 5. If That Was Not Enough … Spatial Data Types
      • Points: locations, events, instantaneous positions
      • Lines: road networks, voyages, trips, trajectories
      • Polygons: administrative regions, airspaces
  • 6. If That Was Not Enough … Spatial Data Relationships: equals, disjoint, intersects, touches, crosses, within, contains, overlaps
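A few of the relationships listed above can be sketched for the simplest possible geometry, an axis-aligned bounding box. This is only an illustration in plain Python: the `Box` class, the function names, and the coordinates are hypothetical, and the checks ignore the interior/boundary distinctions that the formal definitions of these predicates rely on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle (hypothetical helper, not a GeoMesa or JTS type)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a: Box, b: Box) -> bool:
    # The boxes share at least one point (touching edges count).
    return (a.xmin <= b.xmax and b.xmin <= a.xmax and
            a.ymin <= b.ymax and b.ymin <= a.ymax)


def disjoint(a: Box, b: Box) -> bool:
    # Disjoint is the negation of intersects.
    return not intersects(a, b)


def contains(a: Box, b: Box) -> bool:
    # Every point of b lies inside a (simplified: boundaries are allowed to coincide).
    return (a.xmin <= b.xmin and a.ymin <= b.ymin and
            a.xmax >= b.xmax and a.ymax >= b.ymax)


def within(a: Box, b: Box) -> bool:
    # within is contains with the arguments swapped.
    return contains(b, a)


# Made-up, approximate extents used only for the example.
region = Box(-98.0, 18.0, -80.0, 31.0)
bay = Box(-94.9, 29.6, -94.6, 29.8)
far_away = Box(0.0, 0.0, 1.0, 1.0)
```

With these values, `contains(region, bay)` and `within(bay, region)` hold, while `far_away` is disjoint from both. Real geometries (lines, polygons with holes) need a geometry library rather than box arithmetic.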
  • 7. If That Was Not Enough … Topology Operations
      • Algorithms: convex hull, buffer, validation, dissolve, polygonization, simplification, triangulation, Voronoi, linear referencing, and more...
  • 8. Requirements for a High Performance Geospatial Analytics Platform
  • 9. Traditional Approach
      • GIS, data crunching, and web serving were three very separate worlds.
      • If a web app wanted access to the analysis, there was a long process of ETL, DB work, imports and exports, and bribing various network and storage people for the resources you needed.
  • 10. Requirements for a High Performance Geospatial Analytics Platform
      • IoT sensors present an opportunity to understand the world right now
      • A map of the current state of the world enables faster reactions
      • The variety of sensors and data sources presents data management challenges
      • Adding new, varied data sources must be easy
      • Big data requires distributed storage / computation and scalable infrastructure
      • The data layer has to scale
      • Analysis has to be easy
  • 12. How Cloud Helps to Address Geospatial Big Data Challenges
      • Challenges:
          • Big data problem (derive insights from all data)
          • Compute resources when they are needed (easy scale, easy access to data)
      • Solution:
          • The cloud elastically provides the needed compute resources, all decoupled from the storage, whether that is an object store, a file system, or a NoSQL store.
  • 13. Importance for Geospatial Analytics
      • Spatial streaming visualizations and analytics can present near real-time insights
      • Decision makers can respond more rapidly when they see live data feeds on a map
      • Spatial batch analytics can fuse multiple data sources together to understand a region
      • Patterns of life emerge
      • Advertisers can plan their next campaigns
      • Businesses can locate their new store sites
  • 14. Cloudbreak
      • Cloudbreak can be used to address geospatial computational capacity needs
      • Easily spin up auto-scalable clusters for different workloads and purposes, whether that is a geospatial ingest cluster with NiFi and GeoMesa or a geospatial analytics cluster with Spark and GeoMesa.
      • Data can reside in your object store or even in a persistent data store.
      • These ephemeral clusters can be scheduled for a period of time or run only until the job is done, so you pay only for what you use.
  • 16. How GeoMesa Helps with Geospatial Data Type Challenges
      • Challenges:
          • Vector & raster data
          • Geospatial data types
      • Solution:
          • GeoMesa tools for streaming, persisting, managing, and analyzing spatio-temporal data at scale
  • 17. What Is GeoMesa? A suite of tools for streaming, persisting, managing, and analyzing spatio-temporal data at scale
  • 23. How Does HDP/HDF + GeoMesa Stream Data?
      • The GeoMesa Kafka DataStore allows data producers to write CRUD messages to a Kafka topic.
      • Consumers of that topic build up an in-memory representation of the current state of the world.
      • This allows for live maps, real-time analytics, and complex event processing.
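The consumer side of this pattern can be sketched without Kafka at all: replaying a stream of create/update/delete messages into a dictionary yields the current state of the world. The message shape `(operation, feature_id, attributes)` and the function name below are assumptions made for illustration; they are not the GeoMesa Kafka DataStore's actual wire format or API.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical message shape: (operation, feature id, attributes or None).
Message = Tuple[str, str, Optional[dict]]


def apply_messages(messages: Iterable[Message]) -> Dict[str, dict]:
    """Replay CRUD messages in order and return the latest state per feature id."""
    state: Dict[str, dict] = {}
    for op, feature_id, attributes in messages:
        if op in ("create", "update"):
            # The newest message for a feature replaces whatever was known before.
            state[feature_id] = dict(attributes or {})
        elif op == "delete":
            # Deleting an unknown feature is treated as a no-op.
            state.pop(feature_id, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return state


# Made-up vessel positions used only for the example.
messages = [
    ("create", "ship-1", {"lon": -94.7, "lat": 29.7}),
    ("create", "ship-2", {"lon": -90.1, "lat": 28.9}),
    ("update", "ship-1", {"lon": -94.6, "lat": 29.6}),
    ("delete", "ship-2", None),
]
current_state = apply_messages(messages)
```

After the replay, `current_state` holds only `ship-1` at its most recent position, which is the kind of snapshot a live map would render.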
  • 24. How Does HDP/HDF + GeoMesa Persist Data?
      • GeoMesa integrates with HBase and Accumulo:
          • Key structures use space-filling curves
          • Complex geospatial filters and processing can be ‘pushed down’ using Filters, Coprocessors, and Iterators
      • GeoMesa’s File System Datastore provides the ability to store spatio-temporally indexed data in an S3 cloud object store, using storage formats like ORC or Parquet.
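To make the space-filling curve idea concrete, here is a minimal sketch of a two-dimensional Z-order (Morton) key in plain Python. The slide says only that keys use space-filling curves; the choice of a Z-order curve, the 16-bit resolution, and the function names are assumptions for illustration and do not reproduce GeoMesa's actual key layout.

```python
def normalize(value: float, lo: float, hi: float, bits: int) -> int:
    """Map a value in [lo, hi] to an integer cell index in [0, 2**bits - 1]."""
    max_cell = (1 << bits) - 1
    return int((value - lo) / (hi - lo) * max_cell)


def interleave(x: int, y: int, bits: int) -> int:
    """Interleave the low `bits` bits of x and y: x takes even positions, y odd."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z2(lon: float, lat: float, bits: int = 16) -> int:
    """Collapse a (lon, lat) pair into one sortable integer key."""
    x = normalize(lon, -180.0, 180.0, bits)
    y = normalize(lat, -90.0, 90.0, bits)
    return interleave(x, y, bits)
```

Because the bits of both dimensions are interleaved, points that are close together on the map often end up with numerically close keys, so a sorted key-value store can answer a bounding-box query with a small number of range scans instead of a full table scan.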
  • 25. Geospatial Data Flow Transformation with NiFi and GeoMesa
  • 26. Geospatial Data Flow Transformation with NiFi and GeoMesa (diagram labels: Edge Geo Data, Edge Analytics, Geo Data in Motion (on-premises and Cloud), Geo Data at Rest (on-premises and Cloud), Closed Loop Analytics, Machine Learning, Deep Historical Analysis, Satellite AIS, Spatial Data, On-Prem, Cloud)
  • 27. GeoMesa NiFi
      • GeoMesa-NiFi allows you to ingest data into GeoMesa straight from NiFi by leveraging custom processors.
      • NiFi allows you to ingest data into GeoMesa from every source GeoMesa supports and more.
      (diagram labels: Data, SimpleFeatureType Schema, GeoMesa NiFi Processors, enabled datastores)
  • 28. GeoMesa NiFi Processors
      • PutGeoMesaAccumulo: Ingest data into a GeoMesa Accumulo datastore with a GeoMesa converter or from geoavro
      • PutGeoMesaHBase: Ingest data into a GeoMesa HBase datastore with a GeoMesa converter or from geoavro
      • PutGeoMesaFileSystem: Ingest data into a GeoMesa File System datastore with a GeoMesa converter or from geoavro
      • PutGeoMesaKafka: Ingest data into a GeoMesa Kafka datastore with a GeoMesa converter or from geoavro
      • PutGeoTools: Ingest data into an arbitrary GeoTools datastore using a GeoMesa converter or avro
      • ConvertToGeoAvro: Use a GeoMesa converter to create geoavro
  • 29. Analyze Geospatial Data with GeoMesa and Spark
  • 30. How does HDP + GeoMesa analyze geospatial data?
      • GeoMesa integrates deeply with Spark to:
          • create spatial User Defined Types and User Defined Functions (based on LocationTech JTS, a geometry library)
          • optimize spatial queries against GeoMesa DataSources
          • persist output data back to GeoMesa
          • leverage Zeppelin notebooks to allow for rapid innovation and creativity
      • Zeppelin allows analysts to visualize results easily
  • 31. DEMO: Data Ingest and Interactive Insights with GeoMesa, NiFi, Spark and Zeppelin
  • 32. Demo
      • Introduce EE dataset
      • Data management / NiFi overview
      • Real-time view + historical recall
      • Spark Analysis
  • 35. Setup
      ● Import GeoMesa dependency
      ● Create dataframe backed by GeoMesa relation
      ● Create SQL temporary view so we can query it
  • 36. Sub-select Data
      ● Create rough sub-selection of data
          ■ Bound by time
          ■ Bound by bounding box roughly around the Gulf of Mexico
      ● Create a new temporary view from this sub-selection
      ● Cache the data (pull into memory)
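The sub-selection step amounts to a time filter combined with a bounding-box filter. The demo does this in Spark SQL against GeoMesa; the sketch below shows the same logic on a plain Python list so the predicate is easy to see. The record fields (`dtg`, `lon`, `lat`), the bounding-box numbers, and the dates are all hypothetical values chosen for the example.

```python
from datetime import datetime


def in_bbox(lon: float, lat: float, bbox: tuple) -> bool:
    """bbox is (xmin, ymin, xmax, ymax) in degrees."""
    xmin, ymin, xmax, ymax = bbox
    return xmin <= lon <= xmax and ymin <= lat <= ymax


def sub_select(records: list, bbox: tuple, start: datetime, end: datetime) -> list:
    """Keep records inside the box whose timestamp falls in [start, end)."""
    return [
        r for r in records
        if start <= r["dtg"] < end and in_bbox(r["lon"], r["lat"], bbox)
    ]


# Rough, made-up box around the Gulf of Mexico and a made-up time window.
GULF_BBOX = (-98.0, 18.0, -80.0, 31.0)
START = datetime(2017, 8, 1)
END = datetime(2017, 10, 1)

records = [
    {"id": "a", "dtg": datetime(2017, 8, 26, 12), "lon": -94.7, "lat": 29.7},  # kept
    {"id": "b", "dtg": datetime(2017, 8, 26, 12), "lon": 4.3, "lat": 52.0},    # outside box
    {"id": "c", "dtg": datetime(2017, 12, 1, 0), "lon": -94.7, "lat": 29.7},   # outside window
]
selected = sub_select(records, GULF_BBOX, START, END)
```

Only record `a` survives both filters. Holding the filtered result in a variable plays the role of caching the sub-selection so that later queries do not rescan the full dataset.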
  • 37. Data Exploration
      ● Query for Tankers in the Gulf
      ● Get counts for each type of Tanker
      ● Group the counts by day
      ● Graph counts to see trends
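The counting step is a group-by on day and vessel type. As a rough sketch of that aggregation outside Spark, the snippet below uses a `Counter` keyed by `(day, type)`. The `vessel_type` field, its values, and the substring test for tankers are assumptions for illustration and do not reflect the real dataset's schema.

```python
from collections import Counter
from datetime import datetime


def daily_tanker_counts(records: list) -> Counter:
    """Count tanker records per (calendar day, vessel type)."""
    counts: Counter = Counter()
    for r in records:
        # Hypothetical rule: any type whose name mentions "tanker" qualifies.
        if "tanker" in r["vessel_type"].lower():
            counts[(r["dtg"].date(), r["vessel_type"])] += 1
    return counts


# Made-up observations used only for the example.
observations = [
    {"dtg": datetime(2017, 8, 24, 3), "vessel_type": "Oil Tanker"},
    {"dtg": datetime(2017, 8, 24, 9), "vessel_type": "Oil Tanker"},
    {"dtg": datetime(2017, 8, 24, 9), "vessel_type": "Chemical Tanker"},
    {"dtg": datetime(2017, 8, 25, 1), "vessel_type": "Oil Tanker"},
    {"dtg": datetime(2017, 8, 25, 2), "vessel_type": "Cargo"},
]
tanker_counts = daily_tanker_counts(observations)
```

Sorting the keys of `tanker_counts` by day gives one series per tanker type, which is the shape a notebook chart needs to show trends over time.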
  • 38. Data Exploration
      ● Restrict our search to just Trinity Bay
  • 39. Data Exploration
      ● Create a new temporary view of the number of ships in Trinity Bay
  • 40. Extra Data
      ● Pull in gas price data
          ○ Acquired from EIA.gov
          ○ Two gas price indexes
              ■ NYH: New York Harbor
              ■ GC: Gulf Coast
      ● Create temporary view so we can analyze with SQL
  • 41. Data Exploration
      ● Graph data over the time period of Hurricane Harvey
      ● Notice we don’t have daily values
  • 42. Data Exploration
      ● Create temporary view of gas price data around our time of interest
  • 43. Data Exploration
      ● Backfill the price data with the last value to give us day-continuous data
      ● Min/max normalize gas prices and ship counts
      ● Graph gas prices and ship counts together
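Both preparation steps are small enough to sketch in plain Python. The first function fills days without a price by carrying the last observed value forward; the second rescales a series to the range 0 to 1 so that prices and ship counts can share one axis. The function names and the price values are made up for illustration, and the demo itself may have done this in Spark SQL.

```python
from datetime import date, timedelta


def backfill(prices: dict, start: date, end: date) -> dict:
    """Return one value per day in [start, end], reusing the last known price."""
    filled = {}
    last = None
    day = start
    while day <= end:
        if day in prices:
            last = prices[day]
        if last is not None:  # days before the first observation stay absent
            filled[day] = last
        day += timedelta(days=1)
    return filled


def min_max_normalize(values: list) -> list:
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # a constant series has no spread to scale
    return [(v - lo) / (hi - lo) for v in values]


# Made-up prices with gaps on the 26th and 27th.
sparse_prices = {date(2017, 8, 25): 1.6, date(2017, 8, 28): 1.8}
daily_prices = backfill(sparse_prices, date(2017, 8, 25), date(2017, 8, 29))
scaled = min_max_normalize([2.0, 4.0, 6.0])
```

After normalization, series with different units become comparable in shape, which is what makes plotting gas prices and ship counts on the same chart meaningful.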
  • 45. Resources
      • GeoMesa Project: http://www.geomesa.org/
      • GeoMesa-NiFi: http://www.geomesa.org/documentation/user/nifi.html
      • GeoMesa-Spark: http://www.geomesa.org/documentation/user/spark/index.html
      • Articles:
          • http://www.ccri.com/2017/03/20/new-geomesa-spark-sql-zeppelin-notebooks-support/
          • http://www.ccri.com/2018/02/26/interactive-insights-hurricane-harveys-impact-energy-production-geomesa-jupyter-notebooks/