SlideShare a Scribd company logo
1 of 25
Download to read offline
Anthony Fox

Director, Data Science and System Architecture
Commonwealth Computer Research, Inc
anthonyfox@ccri.com
What is this talk about?
Indexing, querying, visualizing, and analyzing
spatio-temporal data at scale.
Using open-source.
Why?
Why?

●  Volume of spatio-temporal data is increasing exponentially
●  Traditional multi-dimensional indexing techniques are
straining to keep up
How?

•  Storage - leverage distributed databases
• 

like Accumulo.
Compute - parallelize spatio-temporal
queries and analytics using MapReduce.

GeoMesa enables geospatial analytics within
the Hadoop ecosystem.
What is GeoMesa?

•  A flexible spatio-temporal
• 
• 

index built on Accumulo.
An implementation of
GeoTools interfaces to make
integration seamless.
A set of GeoServer plugins
for OGC compliant access to
data.
Integration
What is Accumulo?
“The Accumulo sorted distributed key/value store is a
robust, high performance data storage and retrieval
system”
http://accumulo.apache.org
What is Accumulo?
“The Accumulo sorted distributed key/value store is a
robust, high performance data storage and retrieval
system”
http://accumulo.apache.org

Based on Google BigTable
Adds cell-level security and server side
programming model in the form of
composable iterators
What is Accumulo?
“The Accumulo sorted distributed key/value store is a
robust, high performance data storage and retrieval
system”
http://accumulo.apache.org

h"p://accumulo.apache.org/1.4/user_manual/
Accumulo_Design.html	
  	
  
What is Accumulo?
“The Accumulo sorted distributed key/value store is a
robust, high performance data storage and retrieval
system”
http://accumulo.apache.org

h"p://accumulo.apache.org/1.4/user_manual/Accumulo_Design.html	
  	
  
How Do We Store Multi-Dimensional Data in a
Dictionary?

• 
• 
• 
• 

Space Filling Curves project
multiple dimensions into a single
dimension
Base32 encoding induces an
Accumulo friendly lexicographic
ordering
Recursive nesting facilitates storing
different resolutions of data
GeoHashes are common in web
services

http://blog.notdot.net/2009/11/Damn-Cool-Algorithms-Spatialindexing-with-Quadtrees-and-Hilbert-Curves
How Does GeoMesa’s Index Work?
Constructs a key beginning with a
shard id for horizontal
scalability.
How Does GeoMesa’s Index Work?
Constructs a key beginning with a
shard id for horizontal
scalability.
How Does GeoMesa’s Index Work?
Constructs a key beginning with a
shard id for horizontal
scalability.
How Does GeoMesa’s Index Work?
Constructs a key beginning with a
shard id for horizontal
scalability.
Uses Space Filling Curves to
encode spatio-temporal data in
Accumulo keys.
How Does GeoMesa’s Index Work?
Constructs a key beginning with a
shard id for horizontal
scalability.
Uses Space Filling Curves to
encode spatio-temporal data in
Accumulo keys.
Stacks server side iterators to
apply (E)CQL standard queries in
parallel at scan time.
What is the GeoMesa Model?
How Does GeoMesa Perform?
GDELT - Global Database of Events, Language, and Tone
Leetaru, Kalev and Schrodt, Philip. (2013). GDELT: Global Data on Events, Language, and Tone, 1979-2012. International Studies Association Annual
Conference, April 2013. San Diego, CA. - See more at: http://gdelt.utdallas.edu/about.html

220 million geocoded events from 1979 until current.
Exhibits pathologies common in spatio-temporal data sets

Hot spots
Bad geocoding
GDELT
GDELT assigns an Event Code
to each event.
Codes are based on CAMEO Conflict Mediation and
Event Observation.
There are 20 top level
CAMEO codes.
John Beieler developed a
visualization of every
protest (one of the top
level categories) on the
planet since 1979.

http://www.foreignpolicy.com/articles/2013/08/22/mapped_what_every_protest_in_the_last_34_years_looks_like
GDELT
http://geomesa.github.io/gdelt.html
How?
Using Open Source
Storage,
Querying,
Filtering
Aggregation
and analysis

Visualization
Distributed Spatial Computations
● 

Scalding greatly simplifies
Map/Reduce

● 

AccumuloSource is an
implementation of a Scalding
source/sink

● 

GeoMesa allows developers
to work with SimpleFeatures
in a Map/Reduce job
Performance
PostGIS
1000 responses
in > 30 seconds

GeoMesa
1000 responses
in < 1 second
Roadmap

•  Enhance integration with cell level security
•  Build statistical index and query optimization
o 

Bring Your Own Space Filling Curve

o 

“VACUUM ANALYZE”

•  Integrate GeoWebCache and Hadoop
•  Ease developer on-ramping
•  Grow community through LocationTech

More Related Content

What's hot

2021 Dask Summit - Using STAC to catalog SpatioTemporal datasets
2021 Dask Summit - Using STAC to catalog SpatioTemporal datasets2021 Dask Summit - Using STAC to catalog SpatioTemporal datasets
2021 Dask Summit - Using STAC to catalog SpatioTemporal datasetsRob Emanuele
 
Open Source Databases And Gis
Open Source Databases And GisOpen Source Databases And Gis
Open Source Databases And GisKudos S.A.S
 
FOSDEM 2015: Distributed Tile Processing with GeoTrellis and Spark
FOSDEM 2015: Distributed Tile Processing with GeoTrellis and SparkFOSDEM 2015: Distributed Tile Processing with GeoTrellis and Spark
FOSDEM 2015: Distributed Tile Processing with GeoTrellis and SparkRob Emanuele
 
Using python to analyze spatial data
Using python to analyze spatial dataUsing python to analyze spatial data
Using python to analyze spatial dataKudos S.A.S
 
Working with OpenStreetMap using Apache Spark and Geotrellis
Working with OpenStreetMap using Apache Spark and GeotrellisWorking with OpenStreetMap using Apache Spark and Geotrellis
Working with OpenStreetMap using Apache Spark and GeotrellisRob Emanuele
 
Taste Java In The Clouds
Taste Java In The CloudsTaste Java In The Clouds
Taste Java In The CloudsJacky Chu
 
Introduction of open source gis
Introduction of open source gisIntroduction of open source gis
Introduction of open source gisHiroaki Sengoku
 
Locality Sensitive Hashing By Spark
Locality Sensitive Hashing By SparkLocality Sensitive Hashing By Spark
Locality Sensitive Hashing By SparkSpark Summit
 
Big Linked Data Federation - ExtremeEarth Open Workshop
Big Linked Data Federation - ExtremeEarth Open WorkshopBig Linked Data Federation - ExtremeEarth Open Workshop
Big Linked Data Federation - ExtremeEarth Open WorkshopExtremeEarth
 
ESIP 2018 - The Case for Archives of Convenience
ESIP 2018 - The Case for Archives of ConvenienceESIP 2018 - The Case for Archives of Convenience
ESIP 2018 - The Case for Archives of ConvenienceDan Pilone
 
Introducting RasterFrames
Introducting RasterFramesIntroducting RasterFrames
Introducting RasterFramesSimeon Fitch
 
Snow cover assessment tool using Python
Snow cover assessment tool using PythonSnow cover assessment tool using Python
Snow cover assessment tool using PythonPrasun Kumar Gupta
 
AWS Earth and Space 2018 - Element 84 Processing and Streaming GOES-16 Data...
AWS Earth and Space 2018 -   Element 84 Processing and Streaming GOES-16 Data...AWS Earth and Space 2018 -   Element 84 Processing and Streaming GOES-16 Data...
AWS Earth and Space 2018 - Element 84 Processing and Streaming GOES-16 Data...Dan Pilone
 
Building a geospatial processing pipeline using Hadoop and HBase and how Mons...
Building a geospatial processing pipeline using Hadoop and HBase and how Mons...Building a geospatial processing pipeline using Hadoop and HBase and how Mons...
Building a geospatial processing pipeline using Hadoop and HBase and how Mons...DataWorks Summit
 
Big Linked Data Querying - ExtremeEarth Open Workshop
Big Linked Data Querying - ExtremeEarth Open WorkshopBig Linked Data Querying - ExtremeEarth Open Workshop
Big Linked Data Querying - ExtremeEarth Open WorkshopExtremeEarth
 
Big linked geospatial data tools in ExtremeEarth-phiweek19
Big linked geospatial data tools in ExtremeEarth-phiweek19Big linked geospatial data tools in ExtremeEarth-phiweek19
Big linked geospatial data tools in ExtremeEarth-phiweek19ExtremeEarth
 

What's hot (20)

2021 Dask Summit - Using STAC to catalog SpatioTemporal datasets
2021 Dask Summit - Using STAC to catalog SpatioTemporal datasets2021 Dask Summit - Using STAC to catalog SpatioTemporal datasets
2021 Dask Summit - Using STAC to catalog SpatioTemporal datasets
 
Open Source Databases And Gis
Open Source Databases And GisOpen Source Databases And Gis
Open Source Databases And Gis
 
FOSDEM 2015: Distributed Tile Processing with GeoTrellis and Spark
FOSDEM 2015: Distributed Tile Processing with GeoTrellis and SparkFOSDEM 2015: Distributed Tile Processing with GeoTrellis and Spark
FOSDEM 2015: Distributed Tile Processing with GeoTrellis and Spark
 
Using python to analyze spatial data
Using python to analyze spatial dataUsing python to analyze spatial data
Using python to analyze spatial data
 
Working with OpenStreetMap using Apache Spark and Geotrellis
Working with OpenStreetMap using Apache Spark and GeotrellisWorking with OpenStreetMap using Apache Spark and Geotrellis
Working with OpenStreetMap using Apache Spark and Geotrellis
 
Taste Java In The Clouds
Taste Java In The CloudsTaste Java In The Clouds
Taste Java In The Clouds
 
Introduction of open source gis
Introduction of open source gisIntroduction of open source gis
Introduction of open source gis
 
Locality Sensitive Hashing By Spark
Locality Sensitive Hashing By SparkLocality Sensitive Hashing By Spark
Locality Sensitive Hashing By Spark
 
Big Linked Data Federation - ExtremeEarth Open Workshop
Big Linked Data Federation - ExtremeEarth Open WorkshopBig Linked Data Federation - ExtremeEarth Open Workshop
Big Linked Data Federation - ExtremeEarth Open Workshop
 
ESIP 2018 - The Case for Archives of Convenience
ESIP 2018 - The Case for Archives of ConvenienceESIP 2018 - The Case for Archives of Convenience
ESIP 2018 - The Case for Archives of Convenience
 
Introducting RasterFrames
Introducting RasterFramesIntroducting RasterFrames
Introducting RasterFrames
 
design_doc
design_docdesign_doc
design_doc
 
Snow cover assessment tool using Python
Snow cover assessment tool using PythonSnow cover assessment tool using Python
Snow cover assessment tool using Python
 
AWS Earth and Space 2018 - Element 84 Processing and Streaming GOES-16 Data...
AWS Earth and Space 2018 -   Element 84 Processing and Streaming GOES-16 Data...AWS Earth and Space 2018 -   Element 84 Processing and Streaming GOES-16 Data...
AWS Earth and Space 2018 - Element 84 Processing and Streaming GOES-16 Data...
 
ArcGIS and Multi-D: Tools & Roadmap
ArcGIS and Multi-D: Tools & RoadmapArcGIS and Multi-D: Tools & Roadmap
ArcGIS and Multi-D: Tools & Roadmap
 
Building a geospatial processing pipeline using Hadoop and HBase and how Mons...
Building a geospatial processing pipeline using Hadoop and HBase and how Mons...Building a geospatial processing pipeline using Hadoop and HBase and how Mons...
Building a geospatial processing pipeline using Hadoop and HBase and how Mons...
 
SPD and KEA: HDF5 based file formats for Earth Observation
SPD and KEA: HDF5 based file formats for Earth ObservationSPD and KEA: HDF5 based file formats for Earth Observation
SPD and KEA: HDF5 based file formats for Earth Observation
 
Big Linked Data Querying - ExtremeEarth Open Workshop
Big Linked Data Querying - ExtremeEarth Open WorkshopBig Linked Data Querying - ExtremeEarth Open Workshop
Big Linked Data Querying - ExtremeEarth Open Workshop
 
Big linked geospatial data tools in ExtremeEarth-phiweek19
Big linked geospatial data tools in ExtremeEarth-phiweek19Big linked geospatial data tools in ExtremeEarth-phiweek19
Big linked geospatial data tools in ExtremeEarth-phiweek19
 
Gdal introduction
Gdal introductionGdal introduction
Gdal introduction
 

Viewers also liked

Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...huguk
 
GeoMesa – Spatio-Temporal Indexing in Accumulo
GeoMesa – Spatio-Temporal Indexing in AccumuloGeoMesa – Spatio-Temporal Indexing in Accumulo
GeoMesa – Spatio-Temporal Indexing in AccumuloCvilleDataScience
 
Foundation Comparison
Foundation ComparisonFoundation Comparison
Foundation ComparisonJody Garnett
 
Sqrrl real time_big_data_20130411
Sqrrl real time_big_data_20130411Sqrrl real time_big_data_20130411
Sqrrl real time_big_data_20130411Sqrrl
 
Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...
Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...
Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...Accumulo Summit
 
Apache Accumulo Overview
Apache Accumulo OverviewApache Accumulo Overview
Apache Accumulo OverviewBill Havanki
 
Redis adaptor for Apache Geode
Redis adaptor for Apache GeodeRedis adaptor for Apache Geode
Redis adaptor for Apache GeodeSwapnil Bawaskar
 
Big Data in The Cloud: Architecting a Better Platform
Big Data in The Cloud: Architecting a Better PlatformBig Data in The Cloud: Architecting a Better Platform
Big Data in The Cloud: Architecting a Better PlatformAmazon Web Services
 
Introduction to Apache Accumulo
Introduction to Apache AccumuloIntroduction to Apache Accumulo
Introduction to Apache AccumuloJared Winick
 
Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, TrifactaData Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, Trifactahuguk
 
Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...
Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...
Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...Viet-Trung TRAN
 

Viewers also liked (12)

Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
 
GeoMesa – Spatio-Temporal Indexing in Accumulo
GeoMesa – Spatio-Temporal Indexing in AccumuloGeoMesa – Spatio-Temporal Indexing in Accumulo
GeoMesa – Spatio-Temporal Indexing in Accumulo
 
Foundation Comparison
Foundation ComparisonFoundation Comparison
Foundation Comparison
 
Sqrrl real time_big_data_20130411
Sqrrl real time_big_data_20130411Sqrrl real time_big_data_20130411
Sqrrl real time_big_data_20130411
 
Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...
Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...
Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...
 
Apache Accumulo Overview
Apache Accumulo OverviewApache Accumulo Overview
Apache Accumulo Overview
 
Redis adaptor for Apache Geode
Redis adaptor for Apache GeodeRedis adaptor for Apache Geode
Redis adaptor for Apache Geode
 
Big Data in The Cloud: Architecting a Better Platform
Big Data in The Cloud: Architecting a Better PlatformBig Data in The Cloud: Architecting a Better Platform
Big Data in The Cloud: Architecting a Better Platform
 
Introduction to Apache Accumulo
Introduction to Apache AccumuloIntroduction to Apache Accumulo
Introduction to Apache Accumulo
 
Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, TrifactaData Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
 
Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...
Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...
Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...
 
Searching for effective farming policies in Gloucestershire
Searching for effective farming policies in GloucestershireSearching for effective farming policies in Gloucestershire
Searching for effective farming policies in Gloucestershire
 

Similar to GeoMesa LocationTech DC

07 data structures_and_representations
07 data structures_and_representations07 data structures_and_representations
07 data structures_and_representationsMarco Quartulli
 
Intro to-technologies-Green-City-Hackathon-Athens
Intro to-technologies-Green-City-Hackathon-AthensIntro to-technologies-Green-City-Hackathon-Athens
Intro to-technologies-Green-City-Hackathon-AthensStoitsis Giannis
 
CouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big DataCouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big DataDebajani Mohanty
 
True Reusable Code - DevSum2016
True Reusable Code - DevSum2016True Reusable Code - DevSum2016
True Reusable Code - DevSum2016Eduard Lazar
 
History of NoSQL and Azure Documentdb feature set
History of NoSQL and Azure Documentdb feature setHistory of NoSQL and Azure Documentdb feature set
History of NoSQL and Azure Documentdb feature setSoner Altin
 
Implementing an Open Source Spatiotemporal Search Platform for Spatial Data I...
Implementing an Open Source Spatiotemporal Search Platform for Spatial Data I...Implementing an Open Source Spatiotemporal Search Platform for Spatial Data I...
Implementing an Open Source Spatiotemporal Search Platform for Spatial Data I...Paolo Corti
 
re:Invent 2013-foster-madduri
re:Invent 2013-foster-maddurire:Invent 2013-foster-madduri
re:Invent 2013-foster-madduriRavi Madduri
 
The pulse of cloud computing with bioinformatics as an example
The pulse of cloud computing with bioinformatics as an exampleThe pulse of cloud computing with bioinformatics as an example
The pulse of cloud computing with bioinformatics as an exampleEnis Afgan
 
GeoKettle: A powerful open source spatial ETL tool
GeoKettle: A powerful open source spatial ETL toolGeoKettle: A powerful open source spatial ETL tool
GeoKettle: A powerful open source spatial ETL toolThierry Badard
 
Elastic in oil and gas
Elastic in oil and gasElastic in oil and gas
Elastic in oil and gasDiego Escobar
 
The State of Big Data for Geo - ESRI Big Data Meetup
The State of Big Data for Geo - ESRI Big Data MeetupThe State of Big Data for Geo - ESRI Big Data Meetup
The State of Big Data for Geo - ESRI Big Data Meetupseagor
 
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWSExperiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWSEd Dodds
 
Data-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and CloudData-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and CloudOla Spjuth
 
DEMETER at OGC Agriculture Session
DEMETER at OGC Agriculture SessionDEMETER at OGC Agriculture Session
DEMETER at OGC Agriculture SessionH2020 DEMETER
 
Foss4G 2009 Scenz Grid
Foss4G 2009 Scenz GridFoss4G 2009 Scenz Grid
Foss4G 2009 Scenz Gridnoho
 
Serving Ireland's Geospatial Information as Linked Data
Serving Ireland's Geospatial Information as Linked DataServing Ireland's Geospatial Information as Linked Data
Serving Ireland's Geospatial Information as Linked DataChristophe Debruyne
 
Geospatial Metadata and Spatial Data: It's all Greek to me!
Geospatial Metadata and Spatial Data: It's all Greek to me!Geospatial Metadata and Spatial Data: It's all Greek to me!
Geospatial Metadata and Spatial Data: It's all Greek to me!EDINA, University of Edinburgh
 
From Research to Innovation: Linked Open Data and Gamification to Design Inte...
From Research to Innovation: Linked Open Data and Gamification to Design Inte...From Research to Innovation: Linked Open Data and Gamification to Design Inte...
From Research to Innovation: Linked Open Data and Gamification to Design Inte...Ig Bittencourt
 

Similar to GeoMesa LocationTech DC (20)

07 data structures_and_representations
07 data structures_and_representations07 data structures_and_representations
07 data structures_and_representations
 
Intro to-technologies-Green-City-Hackathon-Athens
Intro to-technologies-Green-City-Hackathon-AthensIntro to-technologies-Green-City-Hackathon-Athens
Intro to-technologies-Green-City-Hackathon-Athens
 
CouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big DataCouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big Data
 
True Reusable Code - DevSum2016
True Reusable Code - DevSum2016True Reusable Code - DevSum2016
True Reusable Code - DevSum2016
 
History of NoSQL and Azure Documentdb feature set
History of NoSQL and Azure Documentdb feature setHistory of NoSQL and Azure Documentdb feature set
History of NoSQL and Azure Documentdb feature set
 
Implementing an Open Source Spatiotemporal Search Platform for Spatial Data I...
Implementing an Open Source Spatiotemporal Search Platform for Spatial Data I...Implementing an Open Source Spatiotemporal Search Platform for Spatial Data I...
Implementing an Open Source Spatiotemporal Search Platform for Spatial Data I...
 
re:Invent 2013-foster-madduri
re:Invent 2013-foster-maddurire:Invent 2013-foster-madduri
re:Invent 2013-foster-madduri
 
The pulse of cloud computing with bioinformatics as an example
The pulse of cloud computing with bioinformatics as an exampleThe pulse of cloud computing with bioinformatics as an example
The pulse of cloud computing with bioinformatics as an example
 
Big data
Big dataBig data
Big data
 
GeoKettle: A powerful open source spatial ETL tool
GeoKettle: A powerful open source spatial ETL toolGeoKettle: A powerful open source spatial ETL tool
GeoKettle: A powerful open source spatial ETL tool
 
Elastic in oil and gas
Elastic in oil and gasElastic in oil and gas
Elastic in oil and gas
 
CBS CEDAR Presentation
CBS CEDAR PresentationCBS CEDAR Presentation
CBS CEDAR Presentation
 
The State of Big Data for Geo - ESRI Big Data Meetup
The State of Big Data for Geo - ESRI Big Data MeetupThe State of Big Data for Geo - ESRI Big Data Meetup
The State of Big Data for Geo - ESRI Big Data Meetup
 
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWSExperiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
 
Data-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and CloudData-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and Cloud
 
DEMETER at OGC Agriculture Session
DEMETER at OGC Agriculture SessionDEMETER at OGC Agriculture Session
DEMETER at OGC Agriculture Session
 
Foss4G 2009 Scenz Grid
Foss4G 2009 Scenz GridFoss4G 2009 Scenz Grid
Foss4G 2009 Scenz Grid
 
Serving Ireland's Geospatial Information as Linked Data
Serving Ireland's Geospatial Information as Linked DataServing Ireland's Geospatial Information as Linked Data
Serving Ireland's Geospatial Information as Linked Data
 
Geospatial Metadata and Spatial Data: It's all Greek to me!
Geospatial Metadata and Spatial Data: It's all Greek to me!Geospatial Metadata and Spatial Data: It's all Greek to me!
Geospatial Metadata and Spatial Data: It's all Greek to me!
 
From Research to Innovation: Linked Open Data and Gamification to Design Inte...
From Research to Innovation: Linked Open Data and Gamification to Design Inte...From Research to Innovation: Linked Open Data and Gamification to Design Inte...
From Research to Innovation: Linked Open Data and Gamification to Design Inte...
 

Recently uploaded

IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfDaniel Santiago Silva Capera
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarPrecisely
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Will Schroeder
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URLRuncy Oommen
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-pyJamie (Taka) Wang
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Adtran
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxUdaiappa Ramachandran
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaborationbruanjhuli
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxGDSC PJATK
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationIES VE
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDELiveplex
 

Recently uploaded (20)

IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity Webinar
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-py
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptx
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
 

GeoMesa LocationTech DC

  • 1. Anthony Fox Director, Data Science and System Architecture Commonwealth Computer Research, Inc anthonyfox@ccri.com
  • 2. What is this talk about? Indexing, querying, visualizing, and analyzing spatio-temporal data at scale. Using open-source.
  • 4. Why? ●  Volume of spatio-temporal data is increasing exponentially ●  Traditional multi-dimensional indexing techniques are straining to keep up
  • 5. How? •  Storage - leverage distributed databases •  like Accumulo. Compute - parallelize spatio-temporal queries and analytics using MapReduce. GeoMesa enables geospatial analytics within the Hadoop ecosystem.
  • 6. What is GeoMesa? •  A flexible spatio-temporal •  •  index built on Accumulo. An implementation of GeoTools interfaces to make integration seamless. A set of GeoServer plugins for OGC compliant access to data.
  • 8. What is Accumulo? “The Accumulo sorted distributed key/value store is a robust, high performance data storage and retrieval system” http://accumulo.apache.org
  • 9. What is Accumulo? “The Accumulo sorted distributed key/value store is a robust, high performance data storage and retrieval system” http://accumulo.apache.org Based on Google BigTable Adds cell-level security and server side programming model in the form of composable iterators
  • 10. What is Accumulo? “The Accumulo sorted distributed key/value store is a robust, high performance data storage and retrieval system” http://accumulo.apache.org h"p://accumulo.apache.org/1.4/user_manual/ Accumulo_Design.html    
  • 11. What is Accumulo? “The Accumulo sorted distributed key/value store is a robust, high performance data storage and retrieval system” http://accumulo.apache.org h"p://accumulo.apache.org/1.4/user_manual/Accumulo_Design.html    
  • 12. How Do We Store Multi-Dimensional Data in a Dictionary? •  •  •  •  Space Filling Curves project multiple dimensions into a single dimension Base32 encoding induces an Accumulo friendly lexicographic ordering Recursive nesting facilitates storing different resolutions of data GeoHashes are common in web services http://blog.notdot.net/2009/11/Damn-Cool-Algorithms-Spatialindexing-with-Quadtrees-and-Hilbert-Curves
  • 13. How Does GeoMesa’s Index Work? Constructs a key beginning with a shard id for horizontal scalability.
  • 14. How Does GeoMesa’s Index Work? Constructs a key beginning with a shard id for horizontal scalability.
  • 15. How Does GeoMesa’s Index Work? Constructs a key beginning with a shard id for horizontal scalability.
  • 16. How Does GeoMesa’s Index Work? Constructs a key beginning with a shard id for horizontal scalability. Uses Space Filling Curves to encode spatio-temporal data in Accumulo keys.
  • 17. How Does GeoMesa’s Index Work? Constructs a key beginning with a shard id for horizontal scalability. Uses Space Filling Curves to encode spatio-temporal data in Accumulo keys. Stacks server side iterators to apply (E)CQL standard queries in parallel at scan time.
  • 18. What is the GeoMesa Model?
  • 19. How Does GeoMesa Perform? GDELT - Global Database of Events, Language, and Tone Leetaru, Kalev and Schrodt, Philip. (2013). GDELT: Global Data on Events, Language, and Tone, 1979-2012. International Studies Association Annual Conference, April 2013. San Diego, CA. - See more at: http://gdelt.utdallas.edu/about.html 220 million geocoded events from 1979 until current. Exhibits pathologies common in spatio-temporal data sets Hot spots Bad geocoding
  • 20. GDELT GDELT assigns an Event Code to each event. Codes are based on CAMEO Conflict Mediation and Event Observation. There are 20 top level CAMEO codes. John Beieler developed a visualization of every protest (one of the top level categories) on the planet since 1979. http://www.foreignpolicy.com/articles/2013/08/22/mapped_what_every_protest_in_the_last_34_years_looks_like
  • 23. Distributed Spatial Computations ●  Scalding greatly simplifies Map/Reduce ●  AccumuloSource is an implementation of a Scalding source/sink ●  GeoMesa allows developers to work with SimpleFeatures in a Map/Reduce job
  • 24. Performance PostGIS 1000 responses in > 30 seconds GeoMesa 1000 responses in < 1 second
  • 25. Roadmap •  Enhance integration with cell level security •  Build statistical index and query optimization o  Bring Your Own Space Filling Curve o  “VACUUM ANALYZE” •  Integrate GeoWebCache and Hadoop •  Ease developer on-ramping •  Grow community through LocationTech