GeoMesa: Using
Accumulo for optimized
spatio-temporal
processing
Dr. James Hughes, CCRi
james.hughes@ccri.com
GeoMesa is
● A collection of libraries and modules which can be used to
solve Big Geo Data problems
○ Great for managing billions to trillions of vector features
○ Great for streaming vector data
● Open sourced through Eclipse’s LocationTech working group; it has
graduated from incubation
● Built on top of great open source libraries
GeoMesa Background
Such architectures allow for live views and near-real time processing (speed layer)
while persisting the data for historic queries and batch analysis (batch layer).
Client access to both layers can be handled by GeoServer.
GeoMesa enables Lambda architectures
Suppose we wish to monitor and understand a group of GPS-enabled and
internet-enabled devices (ex: sensors, vehicles).
● GeoMesa’s ETL / converter library aids in re-usable data modeling.
● GeoMesa’s NiFi support will let us move Flow Files around easily and ingest
into Accumulo and Kafka topics.
● Leveraging GeoMesa’s Kafka DataStore, one can implement CEP such as
1) geo-fencing, 2) location trackers, and 3) complex alerting rules.
● Effective storage in Accumulo allows for fast query returns.
● End-to-end visualization and analysis support allows aggregations to be pushed
down to the Accumulo tablet servers.
● GeoMesa’s Spark + Jupyter support allows for quick prototyping, ad hoc
interactive analysis and data discovery.
All of this adds up to “Speed! Speed! Speed!” whether you are looking at
a live view of the data or pulling back an analysis product.
Example Use Case: Managing Internet-Aware Devices
Making visualization and analysis fast has been a journey, and this talk covers our
steps so far:
1. Space-filling curves and storing spatio-temporal data
2. Improvements to GeoMesa’s use and implementation of Accumulo Iterators
3. Spark and MapReduce for distributed computation
Not in this talk
1. Storm / NiFi - Streaming Ingest
2. Live views and online processing with Kafka
3. Command line tools
4. ETL / parser library
5. Machine learning / Deep Analytics
Talk Outline
● Accumulo Key Design
● Space Filling Curves 101
● Indices for Points with Time
● Indices for Lines and Polygons
● Lessons Learned
GeoMesa's
evolution of
Accumulo
schemas
In a traditional stack, the application
issues queries to a database which is
responsible for query planning.
Overview of query planning in Accumulo
With Accumulo, the query planning is
handled by library code in the
application.
● Goal: Index 2+ dimensional data
● Approach: Use Space Filling Curves
● First, ‘grid’ the data space into bins.
● Next, order the grid cells with a space
filling curve.
○ Label the grid cells by the order
that the curve visits them.
○ Associate the data in that grid cell
with a byte representation of the
label.
● We prefer “good” space filling curves:
○ Want recursive curves and locality.
● Space filling curves have higher
dimensional analogs.
Space Filling Curves (in one slide!)
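The gridding and labeling steps above can be sketched with a Z-order (Morton) curve. This is a minimal illustration, not GeoMesa's implementation: the function names, the 16-bit resolution, and the choice of putting x in the even bit positions are all assumptions made for the example.

```python
def interleave2(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y (x in even positions, y in odd)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def cell_of(lon: float, lat: float, bits: int = 16):
    """'Grid' the lon/lat space into 2^bits x 2^bits bins and return the bin indices."""
    n = 1 << bits
    cx = min(int((lon + 180.0) / 360.0 * n), n - 1)
    cy = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return cx, cy


def z2_label(lon: float, lat: float, bits: int = 16) -> bytes:
    """Byte representation of the grid cell's position along the curve."""
    cx, cy = cell_of(lon, lat, bits)
    z = interleave2(cx, cy, bits)
    # Big-endian bytes sort in the same order as the integer labels.
    return z.to_bytes((2 * bits + 7) // 8, "big")
```

Big-endian encoding is chosen here so that a lexicographically sorted key/value store visits cells in curve order, which is what makes range scans over the labels meaningful.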
To query for points in the grey rectangle, the
query planner enumerates a collection of index
ranges which cover the area.
Note: Most queries won’t line up perfectly with the
gridding strategy.
Further filtering can be run on the Accumulo
tablet servers with Iterators (next section)
or we can return ‘loose’ bounding box results
(likely more quickly).
Query planning with Space Filling Curves
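As a rough sketch of what "enumerating index ranges" means, the code below covers a rectangle of grid cells with contiguous runs of Z-order labels. It is a deliberately naive version (it visits every cell in the rectangle); a real planner would decompose the query recursively and cap the number of ranges. The function names and the 8-bit grid are assumptions for illustration only.

```python
from typing import List, Tuple


def interleave2(x: int, y: int, bits: int = 8) -> int:
    """Z-order label of cell (x, y): x bits in even positions, y bits in odd."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def covering_ranges(xmin: int, ymin: int, xmax: int, ymax: int,
                    bits: int = 8) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) label ranges covering the cell rectangle."""
    zs = sorted(interleave2(x, y, bits)
                for x in range(xmin, xmax + 1)
                for y in range(ymin, ymax + 1))
    ranges = []
    start = prev = zs[0]
    for z in zs[1:]:
        if z == prev + 1:          # still contiguous along the curve
            prev = z
        else:                      # gap: close the current range, start a new one
            ranges.append((start, prev))
            start = prev = z
    ranges.append((start, prev))
    return ranges
```

A query aligned with the curve's recursive structure collapses into a single range, while a misaligned one of the same size fragments into several; that is why the number of ranges per query can grow quickly.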
GeoMesa has several tables; each optimized for a particular use case.
The Z3 table is used with and optimized for temporal point data. (Think sensor
observations, track reports, or other events which happen at a particular location.)
GeoMesa Key Structure for the ‘Z3’ table
Key:
  Row: Shard (1 byte) | Epoch Week (2 bytes) | Z3(x,y,t) (8 bytes)
  Column Family: ‘F’
  Column Qualifier: (no value shown)
Value: Record
Here and now:
(38.9864985, -76.9561856)
10:15am, Tuesday, Oct. 11th, 2016
Epoch Week: 2440
X value: 1275689
Y value: 151972
T value: 2097151
Z3 (as a long):
6430470637115132837
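To make the row layout concrete, here is a small sketch that packs a shard byte, a two-byte epoch week, and an eight-byte Z3 value into an 11-byte row. It assumes the epoch week is the number of whole 7-day blocks since 1970-01-01 UTC (which agrees with the 2440 shown above) and treats the Z3 long from the slide as an opaque number; the function names and the shard value are hypothetical, and GeoMesa's actual encoding may differ.

```python
import struct
from datetime import datetime, timezone


def epoch_week(ts: datetime) -> int:
    """Whole 7-day blocks elapsed since 1970-01-01 UTC (an assumed binning)."""
    days = int(ts.timestamp()) // 86400
    return days // 7


def z3_row_key(shard: int, week: int, z3: int) -> bytes:
    """Pack shard (1 byte), epoch week (2 bytes), Z3 (8 bytes), big-endian."""
    return struct.pack(">BHQ", shard, week, z3)


when = datetime(2016, 10, 11, 10, 15, tzinfo=timezone.utc)
row = z3_row_key(0, epoch_week(when), 6430470637115132837)
```

Putting the shard first spreads writes across tablets, and putting the week before the Z3 value keeps each week's data contiguous within a shard.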
Most approaches to indexing non-point
geometries involve covering the
geometry with a number of grid cells
and storing a copy with each index.
This means that the client has to
deduplicate results which is expensive.
Böhm, Klump, and Kriegel describe an
indexing strategy that allows such
geometries to be stored once.
GeoMesa has implemented this
strategy in XZ2 (spatial-only) and XZ3
(spatio-temporal) tables.
The key is to store data by resolution,
separate geometries by size, and then
index them by their lower left corner.
This does require consideration on the
query planning side, but avoiding
deduplication is worth the trade-off.
Indexing non-point geometries: New XZ Index
For more details, see Böhm, Klump, and Kriegel. “XZ-ordering: a space-filling curve for objects with spatial
extension.” 6th. Int. Symposium on Large Spatial Databases (SSD), 1999, Hong Kong, China.
(http://www.dbs.ifi.lmu.de/Publikationen/Boehm/Ordering_99.pdf)
● Accumulo Iterator Overview
● GeoMesa Iterators for Analysis
and Visualization
● Iterator Lessons Learned
GeoMesa's use
of Accumulo
Iterators
“Iterators provide a modular mechanism for adding functionality to be executed by
TabletServers when scanning or compacting data. This allows users to efficiently
summarize, filter, and aggregate data.” -- Accumulo 1.7 documentation
Part of the modularity is that the iterators can be stacked:
the output of one can be wired into the next.
Example: The first iterator might read from disk, the second could filter with
Authorizations, and a final iterator could filter by column family.
Other notes:
● Iterators provide a sorted view of the key/values.
● Iterator code can be loaded from HDFS and namespaced!
Accumulo Iterators
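The stacking idea can be illustrated with plain Python generators. This is only an analogy for how each layer consumes the sorted output of the one below it; it is not the Accumulo iterator API. The key layout (row, family, visibility) and the single-label visibility check are simplifying assumptions.

```python
from typing import Iterable, Iterator, Set, Tuple

Key = Tuple[str, str, str]            # (row, column family, visibility label)
Entry = Tuple[Key, bytes]


def source(entries: Iterable[Entry]) -> Iterator[Entry]:
    """Bottom of the stack: yields key/values in sorted key order."""
    yield from sorted(entries)


def visibility_filter(it: Iterator[Entry], auths: Set[str]) -> Iterator[Entry]:
    """Keep entries whose visibility label is in the caller's authorizations."""
    for key, value in it:
        if key[2] in auths:
            yield key, value


def family_filter(it: Iterator[Entry], family: str) -> Iterator[Entry]:
    """Keep entries from a single column family."""
    for key, value in it:
        if key[1] == family:
            yield key, value


def scan(entries: Iterable[Entry], auths: Set[str], family: str) -> Iterator[Entry]:
    """Wire the three layers together: source -> visibility -> family."""
    return family_filter(visibility_filter(source(entries), auths), family)
```

Because every layer only filters, the sorted order established by the source is preserved all the way up the stack.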
Visualization Example: Heatmaps
Without powerful visualization options,
big data is big nonsense.
Consider this view of shipping in the
Mediterranean Sea.
Heatmaps help show patterns and
they can be accelerated with
GeoMesa
(Diagram labels: Heatmap Request, HeatMap WPS, Query Hints)
A request to GeoMesa consists of two broad pieces:
1. A filter restricting the data to act on, e.g.:
a. Records in Maryland with ‘Accumulo’ in the text field.
b. Records during the first week of 2016.
2. A request for ‘how’ to return the data, e.g.:
a. Return the full records
b. Return a subset of the record (either a projection or ‘bin’ file format)
c. Return a histogram
d. Return a heatmap / kernel density
Generally, a filter can be handled partially by selecting which ranges to scan; the
remainder can be handled by an Iterator.
Modifications to selected data can also be handled by a GeoMesa Iterator.
GeoMesa Data Requests
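A heatmap request is a good example of a modification that can be pushed down: each server bins only its own points into a small grid, and the client sums the partial grids. The sketch below shows that shape in plain Python; the function names, the bounding-box tuple layout, and the grid size are hypothetical and stand in for whatever the server-side iterator actually does.

```python
from collections import Counter
from typing import Iterable, Tuple

BBox = Tuple[float, float, float, float]   # (xmin, ymin, xmax, ymax)


def partial_density(points: Iterable[Tuple[float, float]], bbox: BBox,
                    width: int, height: int) -> Counter:
    """Server side: count the points falling in each cell of a width x height grid."""
    xmin, ymin, xmax, ymax = bbox
    grid = Counter()
    for x, y in points:
        if not (xmin <= x <= xmax and ymin <= y <= ymax):
            continue                     # outside the requested view
        col = min(int((x - xmin) / (xmax - xmin) * width), width - 1)
        row = min(int((y - ymin) / (ymax - ymin) * height), height - 1)
        grid[(col, row)] += 1
    return grid


def merge_density(partials: Iterable[Counter]) -> Counter:
    """Client side: add up the partial grids returned by each server."""
    total = Counter()
    for part in partials:
        total.update(part)
    return total
```

The payoff is in what crosses the network: a grid of counts per server instead of every matching record.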
The first pass of GeoMesa iterators separated concerns into separate iterators.
The GeoMesa query planner assembled a stack of iterators to achieve the desired
result.
Initial GeoMesa Iterator design
Image from “Spatio-temporal Indexing in Non-relational Distributed Databases” by
Anthony Fox, Chris Eichelberger, James Hughes, Skylar Lyon
The key benefit to having decomposed iterators is that they are easier to
understand and re-mix.
In terms of performance, each one needs to understand the bytes in the Key and
Value. In many cases, this will lead to additional serialization/deserialization.
Now, we prefer to write Iterators which handle transforming the underlying data
into what the client code is expecting in one go.
Second GeoMesa Iterator design
1. Using fewer iterators in the stack can be beneficial.
2. Using lazy evaluation / deserialization for filtering Values can power speed
improvements.
3. Iterators take in Sorted Keys + Values and *must* produce Sorted Keys and
Values.
4. Accumulo 1.8.0 has an Iterator Test Harness!
https://accumulo.apache.org/release_notes/1.8.0#iterator-test-harness
https://accumulo.apache.org/1.8/accumulo_user_manual.html#_iterator_testing
Lessons learned about Iterators
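Lesson 2 can be sketched as follows: if a serialized Value carries an offset table, a filter can decode just the attribute it tests and leave the rest as raw bytes. The byte layout below (a count byte, two-byte offsets, then UTF-8 attributes) is invented for this example and is not GeoMesa's serialization format.

```python
import struct
from typing import Dict, List


def encode(attrs: List[str]) -> bytes:
    """Serialize as: count (1 byte), count+1 offsets (2 bytes each), then the data."""
    blobs = [a.encode("utf-8") for a in attrs]
    offsets, pos = [], 0
    for blob in blobs:
        offsets.append(pos)
        pos += len(blob)
    offsets.append(pos)                              # end sentinel
    header = struct.pack(">B", len(blobs)) + struct.pack(f">{len(offsets)}H", *offsets)
    return header + b"".join(blobs)


class LazyRecord:
    """Decodes an attribute only when it is first requested."""

    def __init__(self, raw: bytes):
        self.raw = raw
        self.count = raw[0]
        self.base = 1 + 2 * (self.count + 1)         # where attribute data starts
        self.decoded: Dict[int, str] = {}

    def get(self, i: int) -> str:
        if i not in self.decoded:
            start, end = struct.unpack_from(">2H", self.raw, 1 + 2 * i)
            self.decoded[i] = self.raw[self.base + start:self.base + end].decode("utf-8")
        return self.decoded[i]
```

A filter such as Who = 'Bierce' would call `get(0)` and reject most records without ever touching the geometry or date bytes.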
Through our use of a) space filling curves, b) a cost-based query optimizer, and
c) carefully configured iterators, the GeoMesa query planner has a lot going on.
The GeoMesa query explainer logs 1) which index was used, 2) which ranges
were scanned, 3) Iterator configuration, etc.
Putting it all together: the GeoMesa Query Explainer
geomesa> geomesa explain -u USER -p PASS -i INSTANCE -c geomesa -z zoo1,zoo2,zoo3 -f AccumuloQuickStart -q "Who = 'Bierce'"
Planning 'AccumuloQuickStart' Who = 'Bierce'
Original filter: Who = 'Bierce'
Hints: density[false] bin[false] stats[false] map-aggregate[false] sampling[none]
Sort: none
Transforms: None
Strategy selection:
Query processing took 69ms and produced 1 options
Filter plan: FilterPlan[ATTRIBUTE[Who = 'Bierce'][None]]
Strategy selection took 8ms for 1 options
Strategy 1 of 1: AttributeIdxStrategy
Strategy filter: ATTRIBUTE[Who = 'Bierce'][None]
Plan: org.locationtech.geomesa.accumulo.index.BatchScanPlan
Table: geomesa_attr
Deduplicate: false
Column Families: all
Ranges (1): [%01;%00;%00;Bierce%00;::%01;%00;%00;Bierce%01;)
Iterators (0):
Query planning took 119ms
Verify hints
Inspect strategies considered
See table and ranges to be scanned
Quantify planning time
● GeoMesa + Spark Setup
● GeoMesa + Spark Analytics
● GeoMesa powered notebooks
(Jupyter and Zeppelin)
GeoMesa’s
Spark Support:
Data Analysis
and Discovery
Using Accumulo Iterators, we’ve seen how one can easily
perform simple ‘MapReduce’ style jobs without needing more
infrastructure.
NB: Those tasks are limited. One can filter inputs,
transform/map records and aggregate partial results on each
tablet server.
To implement more complex processes, we look to
MapReduce and Spark.
Accumulo implements the MapReduce InputFormat interface.
Spark provides a way to change InputFormats into RDDs.
So with a little glue code and Spark classpath/environment
management, GeoMesa has Spark support!
GeoMesa MapReduce and Spark Support
GeoMesa Spark Example 1: Time Series
Step 1: Get an RDD[SimpleFeature]
Step 2: Calculate the time series
Step 3: Plot the time series in R.
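Step 2 amounts to a map to a time bucket followed by a count per bucket. The sketch below does this in plain Python as a stand-in for the equivalent RDD operations; representing features as dictionaries and naming the date attribute `dtg` are assumptions made for the example.

```python
from collections import Counter
from datetime import date, datetime
from typing import Dict, Iterable, List, Tuple


def daily_counts(features: Iterable[Dict]) -> List[Tuple[date, int]]:
    """Bucket each feature by calendar day and count features per day."""
    counts = Counter(f["dtg"].date() for f in features)
    return sorted(counts.items())        # chronological order, ready to plot


events = [
    {"dtg": datetime(2016, 1, 1, 8, 0)},
    {"dtg": datetime(2016, 1, 1, 17, 30)},
    {"dtg": datetime(2016, 1, 3, 9, 15)},
]
series = daily_counts(events)
```

On a cluster the same logic would be expressed as a map to (day, 1) pairs followed by a reduce by key, with only the small per-day totals collected for plotting.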
Using one dataset (country boundaries) to group another (here, GDELT) is
effectively a join.
Our summer intern, Atallah, worked out the details of doing this analysis in Spark
and created a tutorial and blog post.
This picture shows ‘stability’ of a region from GDELT Goldstein values
GeoMesa Spark Example 2: Aggregating by Regions
http://www.ccri.com/2016/08/17/new-geomesa-tutorial-aggregating-visualizing-data/
http://www.geomesa.org/documentation/tutorials/shallow-join.html
GeoMesa Spark Example 3: Aggregating Tweets about #traffic
Virginia Polygon CQL
GeoMesa RDD
Aggregate by County
Calculate ratio of #traffic
Store back to GeoMesa
GeoMesa Spark Example 3: Aggregating Tweets about #traffic
#traffic by Virginia county
Darker blue has a higher count
Problem: Another developer came by and mentioned that his Spark job using
GeoMesa had quite a few tasks (far more than expected).
Around the same time, Eugene Cheipesh (Azavea / GeoTrellis) wrote in to the
Accumulo user list…
In Accumulo 1.6.x, each range in the Accumulo InputFormat becomes a Split.
With space filling curves, it is easy to enumerate plenty of ranges for a query.
Solution: The short term solution was to create a custom InputFormat which
produces Splits which contain more than one range.
A small bump in the road…
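The idea behind the fix can be sketched as a simple grouping step: instead of one Split (and so one task) per range, pack the ranges into a bounded number of groups. This is an illustration of the idea only; the function name is hypothetical, and a real InputFormat would likely also group ranges by the tablet server that hosts them.

```python
from typing import List, Sequence, Tuple

Range = Tuple[int, int]


def group_ranges(ranges: Sequence[Range], max_splits: int) -> List[List[Range]]:
    """Pack ranges, in order, into at most `max_splits` groups of near-equal size."""
    if max_splits < 1:
        raise ValueError("max_splits must be at least 1")
    if not ranges:
        return []
    size = -(-len(ranges) // max_splits)             # ceiling division
    return [list(ranges[i:i + size]) for i in range(0, len(ranges), size)]
```

With ten ranges and `max_splits=3`, this yields groups of 4, 4, and 2 ranges, so the job runs three tasks rather than ten.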
Interactive Data Discovery at Scale in GeoMesa Notebooks
Writing (and debugging!) MapReduce /
Spark jobs is slow and requires
expertise.
A long development cycle for an
analytic saps energy and creativity.
The answer to both is interactive
‘notebook’ servers like Apache
Zeppelin and Jupyter (formerly
IPython Notebook).
There are two big things to work out:
1. Getting the right libraries on the
classpath.
2. Wiring up visualizations.
Interactive Data Discovery at Scale in GeoMesa Notebooks
GeoMesa Notebook Roadmap:
● Improved JavaScript integration
● D3.js and other visualization
libraries
● OpenLayers and Leaflet
● Python Bindings
Questions?
Find out more at http://geomesa.org
Connect with us on Gitter:
https://gitter.im/locationtech/geomesa
See applications at CCRi’s blog:
http://www.ccri.com/blog/
Backup slides
http://www.eichelberger.org/sfseize/index.html
Talk filling curves
GeoMesa Converter Library
The Converter library is used in
1. The GeoMesa command line tools
2. GeoMesa’s NiFi processors
Configurations support XML, CSV, TSV, JSON, Avro, and more!
Examples are available for GeoNames, GDELT, OSM-GPX, Twitter, and others.
Live view with the GeoMesa Kafka DataStore
Q: How did you get billions of points?
A: Data is streaming in continually.
Examples come from IoT related
applications:
10 thousand sensors reporting
every 5 seconds generate 1.2 billion
records in a week.
In these cases, we want to see where
things are right now.
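The 1.2 billion figure follows directly from the reporting rate, as this short calculation shows (the function name is just for illustration):

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60      # 604,800 seconds


def records_per_week(sensors: int, report_interval_s: int) -> int:
    """Each sensor emits one record per interval, for a full week."""
    return sensors * (SECONDS_PER_WEEK // report_interval_s)


weekly = records_per_week(10_000, 5)     # 10,000 * 120,960 reports each
```

At that rate a billion-record table is reached in under six days, which is why the live view and the historical store are treated as separate concerns.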
GeoMesa Kafka DataStore Architecture
We have two issues to address:
1. In-memory index of
SimpleFeatures
2. Durable message passing system
For indexing, we use a combination of
Guava and CQEngine (efficient Java
collections).
Kafka serves as the message passing
system.
Consumer KDSes can be run in Storm
(for event processing), GeoServer (OGC
access), etc.
Z-Order Hilbert
Around 100 years ago, mathematicians asked the question,
“Is there a continuous function from the unit interval to the unit square
which covers it?”
Space Filling Curves: The Math
Row-Major
Streaming Data Architecture; Part 1
Continuous ingest:
GeoMesa-NiFi
leverages the
GeoMesa converter
library

Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data
 

Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal Processing

  • 1. GeoMesa: Using Accumulo for optimized spatio-temporal processing Dr. James Hughes, CCRi james.hughes@ccri.com
  • 2. GeoMesa is ● A collection of libraries and modules which can be used to solve Big Geo Data problems ○ Great for managing billions to trillions of vector data ○ Great for streaming vector data ● Open sourced through Eclipse’s LocationTech working group and has graduated incubation ● Built on top of great open source libraries GeoMesa Background
  • 3. Lambda architectures allow for live views and near-real-time processing (speed layer) while persisting the data for historical queries and batch analysis (batch layer). Client access to both layers can be handled by GeoServer. GeoMesa enables Lambda architectures
  • 5. Suppose we wish to monitor and understand a group of GPS-enabled and internet-enabled devices (e.g., sensors, vehicles). ● GeoMesa’s ETL / converter library aids in re-usable data modeling. ● GeoMesa’s NiFi support will let us move Flow Files around easily and ingest into Accumulo and Kafka topics. ● Leveraging GeoMesa’s Kafka DataStore, one can implement CEP such as 1) geo-fencing, 2) location trackers, and 3) complex alerting rules. ● Effective storage in Accumulo allows for fast query returns. ● End-to-end visualization and analysis support allows aggregations to be pushed down to the Accumulo tablet servers. ● GeoMesa’s Spark + Jupyter support allows for quick prototyping, ad hoc interactive analysis, and data discovery. All of this adds up to “Speed! Speed! Speed!” whether you are looking at a live view of the data or pulling back an analysis product. Example Use Case: Managing Internet-Aware Devices
  • 8. Making visualization and analysis quick has been a journey, and this talk is about our steps so far: 1. Space-filling curves and storing spatio-temporal data 2. Improvements to GeoMesa’s use and implementation of Accumulo Iterators 3. Spark and MapReduce for distributed computation. Not in this talk: 1. Storm / NiFi - Streaming Ingest 2. Live views and online processing with Kafka 3. Command line tools 4. ETL / parser library 5. Machine learning / Deep Analytics. Talk Outline
  • 9. ● Accumulo Key Design ● Space Filling Curves 101 ● Indices for Points with Time ● Indices for Lines and Polygons ● Lessons Learned GeoMesa's evolution of Accumulo schemas
  • 11. In a traditional stack, the application issues queries to a database which is responsible for query planning. With Accumulo, the query planning is handled by library code in the application. Overview of query planning in Accumulo
  • 16. ● Goal: Index 2+ dimensional data ● Approach: Use Space Filling Curves ● First, ‘grid’ the data space into bins. ● Next, order the grid cells with a space filling curve. ○ Label the grid cells by the order in which the curve visits them. ○ Associate the data in that grid cell with a byte representation of the label. ● We prefer “good” space filling curves: ○ Want recursive curves and locality. ● Space filling curves have higher dimensional analogs. Space Filling Curves (in one slide!)
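The gridding and labeling steps on this slide can be illustrated with a minimal Python sketch. It assumes a Z-order (Morton) curve, a 16-bit grid per dimension, and a big-endian byte label; the function names are hypothetical and this is not GeoMesa's implementation.

```python
from typing import Tuple


def grid_cell(lon: float, lat: float, bits: int = 16) -> Tuple[int, int]:
    """Bin a lon/lat pair onto a 2^bits x 2^bits grid (assumed gridding)."""
    cells = 1 << bits
    x = min(int((lon + 180.0) / 360.0 * cells), cells - 1)
    y = min(int((lat + 90.0) / 180.0 * cells), cells - 1)
    return x, y


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Label a grid cell by the order in which a Z-order curve visits it."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # x bits land on even positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # y bits land on odd positions
    return z


def z2_label(lon: float, lat: float, bits: int = 16) -> bytes:
    """Byte representation of the label; big-endian so bytes sort like the integers."""
    x, y = grid_cell(lon, lat, bits)
    return interleave_bits(x, y, bits).to_bytes((2 * bits + 7) // 8, "big")
```

On a 2 x 2 grid this visits (0,0), (1,0), (0,1), (1,1) in that order, which is the familiar "Z" shape; nearby cells usually receive nearby labels, which is the locality property the slide asks for.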
  • 17. To query for points in the grey rectangle, the query planner enumerates a collection of index ranges which cover the area. Note: Most queries won’t line up perfectly with the gridding strategy. Further filtering can be run on the Accumulo tablet servers with Iterators (next section) or we can return ‘loose’ bounding box results (likely more quickly). Query planning with Space Filling Curves
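As a rough illustration of range enumeration, the Python sketch below covers a grid-aligned query box with Z-order (bit-interleaved) index ranges. It is a brute-force version under stated assumptions: it visits every cell in the box and merges consecutive indices, whereas a production planner would more likely decompose the box recursively. The function names are hypothetical.

```python
def z_index(x: int, y: int, bits: int = 16) -> int:
    """Z-order index of a grid cell: x bits on even positions, y bits on odd."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def covering_ranges(xmin: int, ymin: int, xmax: int, ymax: int):
    """Return sorted, merged (lo, hi) index ranges covering an inclusive cell box."""
    zs = sorted(
        z_index(x, y)
        for x in range(xmin, xmax + 1)
        for y in range(ymin, ymax + 1)
    )
    ranges = []
    for z in zs:
        if ranges and z == ranges[-1][1] + 1:
            ranges[-1][1] = z          # extend the current contiguous range
        else:
            ranges.append([z, z])      # start a new range
    return [tuple(r) for r in ranges]
```

A box aligned with the curve collapses into one range, while a shifted box of the same size fragments into several. That fragmentation is why a single query can produce many ranges.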
  • 18. GeoMesa has several tables, each optimized for a particular use case. The Z3 table is used with and optimized for temporal point data. (Think sensor observations, track reports, or other events which happen at a particular location.) GeoMesa Key Structure for the ‘Z3’ table. Key: Row = Shard (1 byte) | Epoch Week (2 bytes) | Z3(x,y,t) (8 bytes); Column Family = ‘F’; Column Qualifier. Value: Record. Here and now: (38.9864985, -76.9561856), 10:15am, Tuesday, Oct. 11th, 2016. Epoch Week: 2440. X value: 1275689. Y value: 151972. T value: 2097151. Z3 (as a long): 6430470637115132837
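The row layout on this slide (1-byte shard, 2-byte epoch week, 8-byte Z3 value) can be sketched in Python as follows. Two things here are assumptions rather than facts from the slide: that the epoch week is the number of whole weeks since 1970-01-01 (which does reproduce the slide's value of 2440), and that the fields are packed big-endian and unsigned. The shard value 0 and the function names are purely illustrative.

```python
import struct
from datetime import date

EPOCH = date(1970, 1, 1)


def epoch_week(day: date) -> int:
    """Whole weeks elapsed since 1970-01-01 (assumed definition of 'epoch week')."""
    return (day - EPOCH).days // 7


def z3_row_key(shard: int, week: int, z3: int) -> bytes:
    """Pack shard (1 byte), epoch week (2 bytes), and Z3 (8 bytes) into a row key."""
    # ">" = big-endian with no padding, B = uint8, H = uint16, Q = uint64
    return struct.pack(">BHQ", shard, week, z3)


week = epoch_week(date(2016, 10, 11))
row = z3_row_key(0, week, 6430470637115132837)
```

Putting the week ahead of the Z3 value keeps each week's data contiguous in the sorted key space, so a time-bounded query only needs to touch the weeks it overlaps.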
  • 20. Most approaches to indexing non-point geometries involve covering the geometry with a number of grid cells and storing a copy with each index. This means that the client has to deduplicate results, which is expensive. Böhm, Klump, and Kriegel describe an indexing strategy that allows such geometries to be stored once. GeoMesa has implemented this strategy in XZ2 (spatial-only) and XZ3 (spatio-temporal) tables. The key is to store data by resolution, separate geometries by size, and then index them by their lower left corner. This does require consideration on the query planning side, but avoiding deduplication is worth the trade-off. For more details, see Böhm, Klump, and Kriegel, “XZ-ordering: a space-filling curve for objects with spatial extension,” 6th Int. Symposium on Large Spatial Databases (SSD), 1999, Hong Kong, China (http://www.dbs.ifi.lmu.de/Publikationen/Boehm/Ordering_99.pdf). Indexing non-point geometries: New XZ Index
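To make "store data by resolution" and "index by lower left corner" more concrete, here is a simplified Python sketch of the resolution choice. It assumes coordinates normalized to [0, 1] and follows the XZ-ordering idea of enlarged cells: at each level the cell containing the lower-left corner is doubled in width and height, and the geometry is assigned to the finest level whose enlarged cell still contains its bounding box. The function name and `max_level` default are hypothetical, and this is not GeoMesa's code.

```python
import math


def xz_level(xmin: float, ymin: float, xmax: float, ymax: float,
             max_level: int = 12) -> int:
    """Finest level whose enlarged lower-left cell contains the bounding box."""
    for level in range(max_level, -1, -1):
        w = 0.5 ** level                      # cell width at this level
        x0 = math.floor(xmin / w) * w         # cell origin under the lower-left corner
        y0 = math.floor(ymin / w) * w
        if xmax <= x0 + 2 * w and ymax <= y0 + 2 * w:
            return level                      # box fits in the doubled cell
    return 0
```

Small geometries land at fine levels and large ones at coarse levels, so each geometry gets exactly one index entry and the client never has to deduplicate.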
  • 21. ● Accumulo Iterator Overview ● GeoMesa Iterators for Analysis and Visualization ● Iterator Lessons Learned GeoMesa's use of Accumulo Iterators
  • 22. “Iterators provide a modular mechanism for adding functionality to be executed by TabletServers when scanning or compacting data. This allows users to efficiently summarize, filter, and aggregate data.” -- Accumulo 1.7 documentation. Part of the modularity is that the iterators can be stacked: the output of one can be wired into the next. Example: The first iterator might read from disk, the second could filter with Authorizations, and a final iterator could filter by column family. Other notes: ● Iterators provide a sorted view of the key/values. ● Iterator code can be loaded from HDFS and namespaced! Accumulo Iterators
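The stacking idea can be modeled with plain Python generators. This is only an analogy, not the Accumulo iterator API: keys are simplified to hypothetical `(row, family, visibility)` tuples, and a visibility is a single label rather than a full visibility expression.

```python
def source(entries):
    """Bottom of the stack: yield (key, value) pairs in sorted key order."""
    yield from sorted(entries)


def visibility_filter(upstream, authorizations):
    """Pass through only entries whose visibility label the scanner is authorized for."""
    for key, value in upstream:
        if key[2] in authorizations:
            yield key, value


def family_filter(upstream, family):
    """Pass through only entries in the requested column family."""
    for key, value in upstream:
        if key[1] == family:
            yield key, value


entries = [
    (("r2", "F", "public"), "b"),
    (("r1", "F", "secret"), "a"),
    (("r3", "G", "public"), "c"),
    (("r0", "F", "public"), "d"),
]
# Wire the output of each stage into the next, as in the slide's example.
stack = family_filter(visibility_filter(source(entries), {"public"}), "F")
result = list(stack)
```

Because each stage only drops entries and never reorders them, the sorted order produced at the bottom of the stack survives all the way to the top.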
  • 25. Visualization Example: Heatmaps. Without powerful visualization options, big data is big nonsense. Consider this view of shipping in the Mediterranean Sea. Heatmaps help show patterns, and they can be accelerated with GeoMesa. Heatmap Request HeatMap WPS Query Hints
  • 26. A request to GeoMesa consists of two broad pieces: 1. A filter restricting the data to act on, e.g.: a. Records in Maryland with ‘Accumulo’ in the text field. b. Records during the first week of 2016. 2. A request for ‘how’ to return the data, e.g.: a. Return the full records b. Return a subset of the record (either a projection or ‘bin’ file format) c. Return a histogram d. Return a heatmap / kernel density Generally, a filter can be handled partially by selecting which ranges to scan; the remainder can be handled by an Iterator. Modifications to selected data can also be handled by a GeoMesa Iterator. GeoMesa Data Requests
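A heatmap request is a good example of the "how to return the data" half of a request. The Python sketch below shows the general shape under stated assumptions: each tablet server reduces its own records to a small grid of counts, and the client only merges those partial grids. The function names, the `(lon, lat)` record shape, and the grid parameters are hypothetical; this is not GeoMesa's density iterator.

```python
from collections import Counter


def partial_density(records, bbox, width, height):
    """Server side: count the points falling in each cell of a width x height grid."""
    xmin, ymin, xmax, ymax = bbox
    grid = Counter()
    for lon, lat in records:
        if not (xmin <= lon < xmax and ymin <= lat < ymax):
            continue  # the filter part of the request
        col = int((lon - xmin) / (xmax - xmin) * width)
        row = int((lat - ymin) / (ymax - ymin) * height)
        grid[(col, row)] += 1
    return grid


def merge_partials(partials):
    """Client side: sum the per-server grids into one heatmap."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


bbox = (0.0, 0.0, 10.0, 10.0)
server_a = partial_density([(1, 1), (2, 2), (6, 1)], bbox, 2, 2)
server_b = partial_density([(1, 3), (9, 9), (11, 5)], bbox, 2, 2)
heatmap = merge_partials([server_a, server_b])
```

The amount of data sent back depends on the grid size rather than on the number of matching records, which is where the speed-up comes from.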
  • 27. The first pass of GeoMesa iterators separated concerns into separate iterators. The GeoMesa query planner assembled a stack of iterators to achieve the desired result. Initial GeoMesa Iterator design Image from “Spatio-temporal Indexing in Non-relational Distributed Databases” by Anthony Fox, Chris Eichelberger, James Hughes, Skylar Lyon
  • 28. The key benefit to having decomposed iterators is that they are easier to understand and re-mix. In terms of performance, each one needs to understand the bytes in the Key and Value. In many cases, this will lead to additional serialization/deserialization. Now, we prefer to write Iterators which handle transforming the underlying data into what the client code is expecting in one go. Second GeoMesa Iterator design
  • 29. 1. Using fewer iterators in the stack can be beneficial 2. Using lazy evaluation / deserialization for filtering Values can power speed improvements. 3. Iterators take in Sorted Keys + Values and *must* produce Sorted Keys and Values. 4. Accumulo 1.8.0 has an Iterator Test Harness! https://accumulo.apache.org/release_notes/1.8.0#iterator-test-harness https://accumulo.apache.org/1.8/accumulo_user_manual.html#_iterator_testing Lessons learned about Iterators
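Lesson 2 can be illustrated with a small Python sketch of lazy deserialization. The value layout is hypothetical (an 8-byte big-endian timestamp followed by a JSON payload) and is not GeoMesa's serialization format; the point is only that the filter reads a fixed-offset field and the full record is decoded solely for entries that pass.

```python
import json
import struct


def encode(timestamp: int, attributes: dict) -> bytes:
    """Hypothetical layout: 8-byte timestamp header, then a JSON payload."""
    return struct.pack(">q", timestamp) + json.dumps(attributes).encode("utf-8")


def timestamp_of(value: bytes) -> int:
    """Read just the header; the payload bytes are never touched."""
    return struct.unpack_from(">q", value, 0)[0]


def decode(value: bytes):
    """Full (expensive) deserialization of one value."""
    return timestamp_of(value), json.loads(value[8:].decode("utf-8"))


def scan(values, start: int, end: int):
    """Filter on the cheap header and decode only the survivors."""
    for value in values:
        if start <= timestamp_of(value) < end:
            yield decode(value)


values = [encode(10, {"id": "a"}), encode(20, {"id": "b"}), encode(30, {"id": "c"})]
matches = list(scan(values, 15, 35))
```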
  • 30. Through our use of a) space filling curves, b) a cost-based query optimizer, and c) carefully configured iterators, the GeoMesa query planner has a lot going on. The GeoMesa query explainer logs 1) which index was used, 2) which ranges were scanned, 3) the Iterator configuration, etc. The output lets you verify hints, inspect the strategies considered, see the table and ranges to be scanned, and quantify planning time. Putting it all together: the GeoMesa Query Explainer
    geomesa> geomesa explain -u USER -p PASS -i INSTANCE -c geomesa -z zoo1,zoo2,zoo3 -f AccumuloQuickStart -q "Who = 'Bierce'"
    Planning 'AccumuloQuickStart' Who = 'Bierce'
    Original filter: Who = 'Bierce'
    Hints: density[false] bin[false] stats[false] map-aggregate[false] sampling[none]
    Sort: none
    Transforms: None
    Strategy selection:
    Query processing took 69ms and produced 1 options
    Filter plan: FilterPlan[ATTRIBUTE[Who = 'Bierce'][None]]
    Strategy selection took 8ms for 1 options
    Strategy 1 of 1: AttributeIdxStrategy
    Strategy filter: ATTRIBUTE[Who = 'Bierce'][None]
    Plan: org.locationtech.geomesa.accumulo.index.BatchScanPlan
    Table: geomesa_attr
    Deduplicate: false
    Column Families: all
    Ranges (1): [%01;%00;%00;Bierce%00;::%01;%00;%00;Bierce%01;)
    Iterators (0):
    Query planning took 119ms
  • 31. ● GeoMesa + Spark Setup ● GeoMesa + Spark Analytics ● GeoMesa powered notebooks (Jupyter and Zeppelin) GeoMesa’s Spark Support: Data Analysis and Discovery
  • 35. GeoMesa MapReduce and Spark Support. Using Accumulo Iterators, we’ve seen how one can easily perform simple ‘MapReduce’ style jobs without needing more infrastructure. NB: Those tasks are limited. One can filter inputs, transform/map records, and aggregate partial results on each tablet server. To implement more complex processes, we look to MapReduce and Spark. Accumulo implements the MapReduce InputFormat interface. Spark provides a way to turn InputFormats into RDDs. So with a little glue code and Spark classpath/environment management, GeoMesa has Spark support!
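The filter / map / partial-aggregate pattern described on this slide can be sketched in plain Python. This is only an illustration of the data flow, not Accumulo's Iterator API: the tablet names, the record fields (`who`, `year`), and the helper functions are all hypothetical.

```python
from collections import Counter

# Hypothetical records, keyed by the tablet server that stores them.
tablets = {
    "tserver-1": [
        {"who": "Bierce", "year": 2010},
        {"who": "Addams", "year": 2011},
        {"who": "Bierce", "year": 2011},
    ],
    "tserver-2": [
        {"who": "Bierce", "year": 2011},
        {"who": "Clemens", "year": 2009},
    ],
}


def scan_tablet(records, keep, key):
    """Stand-in for server-side work: filter, map to a key, count locally."""
    partial = Counter()
    for record in records:
        if keep(record):
            partial[key(record)] += 1
    return partial


def merge_partials(partials):
    """Stand-in for the client: combine the partial results from each server."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


partials = [
    scan_tablet(records, keep=lambda r: r["year"] >= 2010, key=lambda r: r["who"])
    for records in tablets.values()
]
totals = merge_partials(partials)
print(dict(totals))
```

Only the small per-server counters cross the network in this sketch, which is the appeal of pushing aggregation down. Anything that needs data from several servers at once (a join, for example) does not fit this shape, which is where MapReduce and Spark come in.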
  • 36. GeoMesa Spark Example 1: Time Series Step 1: Get an RDD[SimpleFeature] Step 2: Calculate the time series Step 3: Plot the time series in R.
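Step 2 amounts to bucketing feature timestamps and counting per bucket. The sketch below shows that calculation in plain Python on a handful of made-up timestamps; the daily bucket size and the sample data are assumptions, and in Spark the same shape would typically be a map to (day, 1) pairs followed by a reduce by key.

```python
from collections import Counter
from datetime import datetime


def daily_counts(timestamps):
    """Count events per calendar day and return (day, count) pairs in date order."""
    counts = Counter(ts.date().isoformat() for ts in timestamps)
    return sorted(counts.items())


# Hypothetical timestamps standing in for the date attribute of each feature.
sample = [
    datetime(2016, 8, 1, 9, 30),
    datetime(2016, 8, 1, 17, 5),
    datetime(2016, 8, 2, 8, 0),
]
series = daily_counts(sample)
print(series)
```

The resulting list of (day, count) pairs is small enough to collect to the driver and hand to a plotting tool.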
  • 37. Using one dataset (country boundaries) to group another (here, GDELT) is effectively a join. Our summer intern, Atallah, worked out the details of doing this analysis in Spark and created a tutorial and blog post. This picture shows ‘stability’ of a region from GDELT Goldstein values GeoMesa Spark Example 2: Aggregating by Regions http://www.ccri.com/2016/08/17/new-geomesa-tutorial-aggregating-visualizing-data/ http://www.geomesa.org/documentation/tutorials/shallow-join.html
  • 38. GeoMesa Spark Example 3: Aggregating Tweets about #traffic. Workflow: Virginia polygon CQL → GeoMesa RDD → aggregate by county → calculate ratio of #traffic → store back to GeoMesa.
  • 39. GeoMesa Spark Example 3: Aggregating Tweets about #traffic. Map: #traffic by Virginia county; darker blue indicates a higher count.
  • 40. A small bump in the road… Problem: Another developer came by and mentioned that his Spark job using GeoMesa had quite a few tasks (far more than expected). Around the same time, Eugene Cheipesh (Azavea / GeoTrellis) wrote in to the Accumulo user list… In Accumulo 1.6.x, each range in the Accumulo InputFormat becomes a Split. With space-filling curves, it is easy to enumerate plenty of ranges for a query. Solution: The short-term solution was to create a custom InputFormat that produces Splits containing more than one range.
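The idea behind the fix can be illustrated without any Accumulo code: instead of one split (and hence one Spark task) per range, pack the ranges into a bounded number of groups. The function below is a hypothetical sketch of that grouping only; how the actual custom InputFormat chooses its groups (for example, by tablet location) is not described in the slides.

```python
def group_ranges(ranges, max_splits):
    """Pack ranges into at most max_splits contiguous groups of near-equal size."""
    if max_splits < 1:
        raise ValueError("max_splits must be at least 1")
    n = min(max_splits, len(ranges))
    if n == 0:
        return []
    size, extra = divmod(len(ranges), n)
    splits, start = [], 0
    for i in range(n):
        # The first `extra` groups take one additional range each.
        end = start + size + (1 if i < extra else 0)
        splits.append(ranges[start:end])
        start = end
    return splits


# Ten hypothetical (start, end) row ranges produced by a query plan.
ranges = [(i * 10, i * 10 + 9) for i in range(10)]
splits = group_ranges(ranges, max_splits=3)
print([len(s) for s in splits])
```

With one range per split this query would launch ten tasks; grouped, it launches three, and every range is still scanned exactly once.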
  • 42. Interactive Data Discovery at Scale in GeoMesa Notebooks. Writing (and debugging!) MapReduce / Spark jobs is slow and requires expertise. A long development cycle for an analytic saps energy and creativity. The answer to both is interactive ‘notebook’ servers like Apache Zeppelin and Jupyter (formerly IPython Notebook). There are two big things to work out: 1. Getting the right libraries on the classpath. 2. Wiring up visualizations.
  • 43. Interactive Data Discovery at Scale in GeoMesa Notebooks GeoMesa Notebook Roadmap: ● Improved JavaScript integration ● D3.js and other visualization libraries ● OpenLayers and Leaflet ● Python Bindings
  • 44. Questions? Find out more at http://geomesa.org Connect with us on Gitter: https://gitter.im/locationtech/geomesa See applications at CCRi’s blog: http://www.ccri.com/blog/
  • 47. GeoMesa Converter Library. The Converter library is used in 1. the GeoMesa command line tools and 2. GeoMesa’s NiFi processors. Configurations support XML, CSV, TSV, JSON, Avro, and more! Examples are available for GeoNames, GDELT, OSM-GPX, Twitter, and others.
  • 48. Live view with the GeoMesa Kafka DataStore Q: How did you get billions of points? A: Data is streaming in continually. Examples come from IoT related applications: 10 thousand sensors reporting every 5 seconds generate 1.2 billion records in a week. In these cases, we want to see where things are right now.
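The 1.2 billion figure follows directly from the reporting rate. A quick check of the arithmetic, using only the numbers given on the slide:

```python
sensors = 10_000
report_interval_s = 5                 # each sensor reports every 5 seconds
seconds_per_week = 7 * 24 * 60 * 60   # 604,800 seconds

reports_per_sensor = seconds_per_week // report_interval_s
records_per_week = sensors * reports_per_sensor
print(f"{records_per_week:,}")
```

Each sensor reports 120,960 times in a week, so 10,000 sensors produce roughly 1.2 billion records.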
  • 49. GeoMesa Kafka DataStore Architecture We have two issues to address: 1. In-memory index of SimpleFeatures 2. Durable message passing system For indexing, we use a combination of Guava and CQEngine (efficient Java collections). Kafka serves as the message passing system. Consumer KDSes can be run in Storm (for event processing), GeoServer (OGC access), etc.
  • 50. Space Filling Curves: The Math. Around 100 years ago, mathematicians asked the question, “Is there a continuous function from the unit interval to the unit square which covers it?” (Figure panels: Row-Major, Z-Order, Hilbert.)
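Two of the orderings named on this slide are easy to write down for integer grid cells. The sketch below compares row-major order with Z-order (bit interleaving). It assumes coordinates have already been discretized to non-negative integers, and the function names are illustrative rather than taken from GeoMesa.

```python
def row_major(x, y, width):
    """Row-major index: walk each row left to right, then move to the next row."""
    return y * width + x


def z_order(x, y, bits=16):
    """Z-order (Morton) index: interleave the bits of x (even) and y (odd)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


# Visit the cells of a 4x4 grid in Z-order.
cells = [(x, y) for y in range(4) for x in range(4)]
by_z = sorted(cells, key=lambda c: z_order(*c))
print(by_z[:8])
```

In Z-order the first four cells visited form the 2x2 block at the origin, so cells that are close in two dimensions tend to stay close along the curve. Row-major order keeps neighbours within a row adjacent but places vertical neighbours a full row width apart.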
  • 51. Streaming Data Architecture; Part 1 Continuous ingest: GeoMesa-NiFi leverages the GeoMesa converter library