Anthony Fox

Director, Data Science and System Architecture
Commonwealth Computer Research, Inc
anthonyfox@ccri.com
What is this talk about?
Indexing, querying, visualizing, and analyzing
spatio-temporal data at scale.
Using open-source.
Why?

●  Volume of spatio-temporal data is increasing exponentially
●  Traditional multi-dimensional indexing techniques are
straining to keep up
How?

•  Storage - leverage distributed databases like Accumulo.
•  Compute - parallelize spatio-temporal queries and analytics using MapReduce.

GeoMesa enables geospatial analytics within
the Hadoop ecosystem.
What is GeoMesa?

•  A flexible spatio-temporal index built on Accumulo.
•  An implementation of GeoTools interfaces to make integration seamless.
•  A set of GeoServer plugins for OGC compliant access to data.
Integration
What is Accumulo?
“The Accumulo sorted distributed key/value store is a
robust, high performance data storage and retrieval
system”
http://accumulo.apache.org
Based on Google BigTable.
Adds cell-level security and a server-side programming model in the form of composable iterators.
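
As a rough illustration of the ideas on this slide, here is a minimal in-memory sketch of a sorted key/value store with range scans, a visibility check, and stacked filters. It is not Accumulo's API: the class SortedKV, the functions visibility_filter and value_filter, and the single-label visibility rule are all assumptions made for the example (real Accumulo visibilities are boolean expressions evaluated on the server).

```python
import bisect


class SortedKV:
    """Toy stand-in for a sorted key/value store (hypothetical, not Accumulo)."""

    def __init__(self):
        self._keys = []   # keys kept in lexicographic order
        self._cells = {}  # key -> (value, visibility label)

    def put(self, key, value, visibility=""):
        if key not in self._cells:
            bisect.insort(self._keys, key)
        self._cells[key] = (value, visibility)

    def scan(self, start, end):
        """Yield (key, value, visibility) for start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            value, visibility = self._cells[key]
            yield key, value, visibility


def visibility_filter(entries, authorizations):
    """Drop cells whose label the reader is not authorized to see.

    Simplification: one label per cell instead of a boolean expression.
    """
    for key, value, visibility in entries:
        if visibility == "" or visibility in authorizations:
            yield key, value, visibility


def value_filter(entries, predicate):
    """Keep only cells whose value satisfies the predicate."""
    for key, value, visibility in entries:
        if predicate(value):
            yield key, value, visibility


store = SortedKV()
store.put("dqb0", 10, "public")
store.put("dqcj", 25, "secret")
store.put("dqcm", 40, "public")
store.put("9q8y", 5, "public")

# Stack the filters on top of a range scan, the way iterators compose.
entries = store.scan("dq", "dr")
entries = visibility_filter(entries, {"public"})
entries = value_filter(entries, lambda v: v > 20)
result = [key for key, _, _ in entries]
print(result)
```

Each filter wraps the one beneath it and sees only what that layer lets through, which is the sense in which iterators are composable.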
http://accumulo.apache.org/1.4/user_manual/Accumulo_Design.html
How Do We Store Multi-Dimensional Data in a Dictionary?

•  Space Filling Curves project multiple dimensions into a single dimension
•  Base32 encoding induces an Accumulo friendly lexicographic ordering
•  Recursive nesting facilitates storing different resolutions of data
•  GeoHashes are common in web services

http://blog.notdot.net/2009/11/Damn-Cool-Algorithms-Spatialindexing-with-Quadtrees-and-Hilbert-Curves
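
The points above can be made concrete with a short sketch of the standard GeoHash encoding: longitude and latitude bits are interleaved (a Z-order space filling curve) and every five bits become one Base32 character. The function name geohash_encode and its precision parameter are choices made for this example, not part of GeoMesa.

```python
# GeoHash Base32 alphabet (digits and lowercase letters, omitting a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=7):
    """Encode a point as a GeoHash string with `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits = bits << 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits = bits << 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


coarse = geohash_encode(57.64911, 10.40744, precision=3)
fine = geohash_encode(57.64911, 10.40744, precision=6)
print(coarse, fine)
```

Because each extra character only subdivides the current cell, the coarse hash is always a prefix of the finer one. That recursive nesting is what lets a prefix scan over sorted keys retrieve everything inside a cell.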
How Does GeoMesa’s Index Work?
Constructs a key beginning with a
shard id for horizontal
scalability.
Uses Space Filling Curves to
encode spatio-temporal data in
Accumulo keys.
Stacks server side iterators to
apply (E)CQL standard queries in
parallel at scan time.
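
A minimal sketch of the key idea, under stated assumptions: the layout below (shard, GeoHash, hour, feature id joined by "~"), the shard count, and the helper names shard_for, index_key, and scan_ranges are all hypothetical and chosen for illustration. GeoMesa's actual key schema is not given on these slides.

```python
import zlib
from datetime import datetime, timezone

NUM_SHARDS = 4  # hypothetical shard count


def shard_for(feature_id, num_shards=NUM_SHARDS):
    """Pick a shard from a stable hash so writes spread across servers."""
    return zlib.crc32(feature_id.encode("utf-8")) % num_shards


def index_key(feature_id, geohash, when):
    """Build an illustrative row key: shard ~ geohash ~ yyyymmddhh ~ id."""
    shard = shard_for(feature_id)
    return "{:02d}~{}~{}~{}".format(
        shard, geohash, when.strftime("%Y%m%d%H"), feature_id
    )


def scan_ranges(geohash_prefix, num_shards=NUM_SHARDS):
    """Return one [start, end) key range per shard for a GeoHash prefix.

    A spatial query must visit every shard, so the ranges can be scanned
    in parallel.
    """
    ranges = []
    for shard in range(num_shards):
        start = "{:02d}~{}".format(shard, geohash_prefix)
        end = start + "\uffff"  # sorts after any key sharing the prefix
        ranges.append((start, end))
    return ranges


when = datetime(2013, 11, 1, 12, tzinfo=timezone.utc)
key = index_key("event-1", "dqcjq", when)
print(key)
print(scan_ranges("dqc"))
```

The shard prefix trades one range scan for NUM_SHARDS smaller ones. In exchange, a geographic hot spot no longer lands on a single server.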
What is the GeoMesa Model?
How Does GeoMesa Perform?
GDELT - Global Database of Events, Language, and Tone
Leetaru, Kalev and Schrodt, Philip. (2013). GDELT: Global Data on Events, Language, and Tone, 1979-2012. International Studies Association Annual Conference, April 2013. San Diego, CA. http://gdelt.utdallas.edu/about.html

220 million geocoded events from 1979 to the present.
Exhibits pathologies common in spatio-temporal data sets:
•  Hot spots
•  Bad geocoding
GDELT
GDELT assigns an Event Code to each event.
Codes are based on CAMEO (Conflict and Mediation Event Observations).
There are 20 top level CAMEO codes.
John Beieler developed a visualization of every protest (one of the top level categories) on the planet since 1979.

http://www.foreignpolicy.com/articles/2013/08/22/mapped_what_every_protest_in_the_last_34_years_looks_like
GDELT
http://geomesa.github.io/gdelt.html
How?
Using Open Source
Storage, querying, filtering
Aggregation and analysis
Visualization
Distributed Spatial Computations
●  Scalding greatly simplifies Map/Reduce
●  AccumuloSource is an implementation of a Scalding source/sink
●  GeoMesa allows developers to work with SimpleFeatures in a Map/Reduce job
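
To show the shape of such a job without a cluster, here is a small single-process sketch of a map/reduce aggregation that counts events per GeoHash cell. It does not use Scalding, AccumuloSource, or GeoTools: plain dictionaries stand in for SimpleFeatures, and map_phase, shuffle, and reduce_phase are hypothetical names for the three stages.

```python
from collections import defaultdict


def map_phase(features, cell_chars=3):
    """Emit (cell, 1) per feature; the cell is a GeoHash prefix."""
    for feature in features:
        yield feature["geohash"][:cell_chars], 1


def shuffle(pairs):
    """Group values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the counts for each cell."""
    return {key: sum(values) for key, values in groups.items()}


# Dictionaries standing in for SimpleFeatures (assumption for this sketch).
features = [
    {"id": "a", "geohash": "dqcjq"},
    {"id": "b", "geohash": "dqcm0"},
    {"id": "c", "geohash": "9q8yy"},
]
counts = reduce_phase(shuffle(map_phase(features)))
print(counts)
```

In a real job the map and reduce functions would run on many nodes and the framework would do the grouping. The per-record logic keeps the same shape.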
Performance
PostGIS: 1000 responses in > 30 seconds
GeoMesa: 1000 responses in < 1 second
Roadmap

•  Enhance integration with cell level security
•  Build statistical index and query optimization
   o  Bring Your Own Space Filling Curve
   o  “VACUUM ANALYZE”

•  Integrate GeoWebCache and Hadoop
•  Ease developer on-ramping
•  Grow community through LocationTech

JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 

GeoMesa LocationTech DC

  • 1. Anthony Fox Director, Data Science and System Architecture Commonwealth Computer Research, Inc anthonyfox@ccri.com
  • 2. What is this talk about? Indexing, querying, visualizing, and analyzing spatio-temporal data at scale. Using open-source.
  • 4. Why? ●  Volume of spatio-temporal data is increasing exponentially ●  Traditional multi-dimensional indexing techniques are straining to keep up
  • 5. How? • Storage - leverage distributed databases like Accumulo. • Compute - parallelize spatio-temporal queries and analytics using MapReduce. GeoMesa enables geospatial analytics within the Hadoop ecosystem.
  • 6. What is GeoMesa? • A flexible spatio-temporal index built on Accumulo. • An implementation of GeoTools interfaces to make integration seamless. • A set of GeoServer plugins for OGC compliant access to data.
  • 8. What is Accumulo? “The Accumulo sorted distributed key/value store is a robust, high performance data storage and retrieval system” http://accumulo.apache.org
  • 9. What is Accumulo? “The Accumulo sorted distributed key/value store is a robust, high performance data storage and retrieval system” http://accumulo.apache.org Based on Google BigTable Adds cell-level security and server side programming model in the form of composable iterators
  • 10. What is Accumulo? “The Accumulo sorted distributed key/value store is a robust, high performance data storage and retrieval system” http://accumulo.apache.org http://accumulo.apache.org/1.4/user_manual/Accumulo_Design.html
  • 11. What is Accumulo? “The Accumulo sorted distributed key/value store is a robust, high performance data storage and retrieval system” http://accumulo.apache.org http://accumulo.apache.org/1.4/user_manual/Accumulo_Design.html
  • 12. How Do We Store Multi-Dimensional Data in a Dictionary? • Space Filling Curves project multiple dimensions into a single dimension • Base32 encoding induces an Accumulo friendly lexicographic ordering • Recursive nesting facilitates storing different resolutions of data • GeoHashes are common in web services http://blog.notdot.net/2009/11/Damn-Cool-Algorithms-Spatialindexing-with-Quadtrees-and-Hilbert-Curves
  • 13. How Does GeoMesa’s Index Work? Constructs a key beginning with a shard id for horizontal scalability.
  • 16. How Does GeoMesa’s Index Work? Constructs a key beginning with a shard id for horizontal scalability. Uses Space Filling Curves to encode spatio-temporal data in Accumulo keys.
  • 17. How Does GeoMesa’s Index Work? Constructs a key beginning with a shard id for horizontal scalability. Uses Space Filling Curves to encode spatio-temporal data in Accumulo keys. Stacks server side iterators to apply (E)CQL standard queries in parallel at scan time.
  • 18. What is the GeoMesa Model?
  • 19. How Does GeoMesa Perform? GDELT - Global Database of Events, Language, and Tone. Leetaru, Kalev and Schrodt, Philip. (2013). GDELT: Global Data on Events, Language, and Tone, 1979-2012. International Studies Association Annual Conference, April 2013. San Diego, CA. (See http://gdelt.utdallas.edu/about.html) 220 million geocoded events from 1979 to the present. Exhibits pathologies common in spatio-temporal data sets: hot spots and bad geocoding.
  • 20. GDELT: GDELT assigns an Event Code to each event. Codes are based on CAMEO (Conflict and Mediation Event Observations). There are 20 top level CAMEO codes. John Beieler developed a visualization of every protest (one of the top level categories) on the planet since 1979. http://www.foreignpolicy.com/articles/2013/08/22/mapped_what_every_protest_in_the_last_34_years_looks_like
  • 23. Distributed Spatial Computations ●  Scalding greatly simplifies Map/Reduce ●  AccumuloSource is an implementation of a Scalding source/sink ●  GeoMesa allows developers to work with SimpleFeatures in a Map/Reduce job
  • 24. Performance: PostGIS - 1000 responses in > 30 seconds. GeoMesa - 1000 responses in < 1 second.
  • 25. Roadmap • Enhance integration with cell level security • Build statistical index and query optimization (Bring Your Own Space Filling Curve; “VACUUM ANALYZE”) • Integrate GeoWebCache and Hadoop • Ease developer on-ramping • Grow community through LocationTech
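
The indexing idea on slides 12 to 16 (a space filling curve, Base32 encoding, and a key that begins with a shard id) can be illustrated with a minimal Python sketch. It uses the standard GeoHash algorithm, which interleaves longitude and latitude bits along a Z-order curve. The function names, the `~` separator, the CRC-based shard choice, and the hourly time suffix are assumptions made for illustration only; this is not GeoMesa's actual key schema or API.

```python
import zlib
from datetime import datetime, timezone

# Standard GeoHash Base32 alphabet. Its characters are in ascending byte
# order, so encoded cells sort lexicographically in a sorted key/value store.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash(lon, lat, precision=7):
    """Interleave lon/lat bits (Z-order curve) and Base32-encode them."""
    lon_rng, lat_rng = [-180.0, 180.0], [-90.0, 90.0]
    chars, bits, nbits, use_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, val = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits = (bits << 1) | 1  # upper half of the current interval
            rng[0] = mid
        else:
            bits <<= 1              # lower half of the current interval
            rng[1] = mid
        use_lon = not use_lon
        nbits += 1
        if nbits == 5:              # every 5 bits become one Base32 character
            chars.append(BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)


def index_key(feature_id, lon, lat, when, num_shards=4):
    """Hypothetical row key layout: shard id, then GeoHash, then hour bucket."""
    # A stable hash of the feature id spreads writes across shards.
    shard = zlib.crc32(feature_id.encode("utf-8")) % num_shards
    return f"{shard:02d}~{geohash(lon, lat)}~{when:%Y%m%d%H}"


if __name__ == "__main__":
    ts = datetime(2013, 8, 22, 14, 30, tzinfo=timezone.utc)
    print(geohash(10.40744, 57.64911, 5))
    print(index_key("event-1", 10.40744, 57.64911, ts))
```

Because a longer GeoHash always begins with the shorter GeoHash of its enclosing cell, a coarse cell can be queried as a single prefix range scan within each shard. This is the recursive nesting property the slide mentions.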