Anthony Fox

Director, Data Science and System Architecture
Commonwealth Computer Research, Inc
anthonyfox@ccri.com

GeoMesa LocationTech DC

GeoMesa presentation from the LocationTech Tour, DC, November 14, 2013. Presented by Anthony Fox (@algoriffic) of CCRi.

GeoMesa is an open source project providing spatio-temporal indexing, querying, and visualizing capabilities to Accumulo. Learn more at http://geomesa.github.io/

What is this talk about?

Indexing, querying, visualizing, and analyzing spatio-temporal data at scale.
Using open-source.

Why?

●  Volume of spatio-temporal data is increasing exponentially
●  Traditional multi-dimensional indexing techniques are straining to keep up

How?

•  Storage - leverage distributed databases like Accumulo.
•  Compute - parallelize spatio-temporal queries and analytics using MapReduce.

GeoMesa enables geospatial analytics within the Hadoop ecosystem.

What is GeoMesa?

•  A flexible spatio-temporal index built on Accumulo.
•  An implementation of GeoTools interfaces to make integration seamless.
•  A set of GeoServer plugins for OGC-compliant access to data.

Integration

What is Accumulo?

“The Accumulo sorted distributed key/value store is a robust, high performance data storage and retrieval system” (http://accumulo.apache.org)

Based on Google BigTable. Adds cell-level security and a server-side programming model in the form of composable iterators.

Design overview: http://accumulo.apache.org/1.4/user_manual/Accumulo_Design.html

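To make "sorted key/value store" and "composable iterators" concrete, here is a toy, single-process sketch in Python. It is not the Accumulo client API: keys are plain strings (a real Accumulo key also carries column family, column qualifier, visibility, and timestamp), and the names `SortedKV`, `value_contains`, `first`, and `stack` are hypothetical. It only mimics the idea that a scan over a sorted key range can be piped through a stack of filtering stages.

```python
import bisect
import itertools
from typing import Callable, Iterator, List, Tuple

Entry = Tuple[str, str]  # (key, value): simplified stand-in for an Accumulo Key/Value pair
Stage = Callable[[Iterator[Entry]], Iterator[Entry]]  # one "iterator" in the stack


class SortedKV:
    """Toy in-memory store that keeps keys in lexicographic order."""

    def __init__(self) -> None:
        self._keys: List[str] = []
        self._vals: List[str] = []

    def put(self, key: str, value: str) -> None:
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._vals[i] = value  # overwrite an existing key
        else:
            self._keys.insert(i, key)
            self._vals.insert(i, value)

    def scan(self, start: str, end: str) -> Iterator[Entry]:
        """Yield entries with start <= key < end, in sorted key order."""
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < end:
            yield self._keys[i], self._vals[i]
            i += 1


def value_contains(text: str) -> Stage:
    """Filter stage: keep only entries whose value contains `text`."""
    def stage(entries: Iterator[Entry]) -> Iterator[Entry]:
        return (e for e in entries if text in e[1])
    return stage


def first(n: int) -> Stage:
    """Stage that stops after n entries."""
    def stage(entries: Iterator[Entry]) -> Iterator[Entry]:
        return itertools.islice(entries, n)
    return stage


def stack(source: Iterator[Entry], *stages: Stage) -> Iterator[Entry]:
    """Compose stages so each one consumes the output of the previous one."""
    for stage in stages:
        source = stage(source)
    return source


store = SortedKV()
for key, value in [("row3", "beta"), ("row1", "alpha"), ("row2", "alphabet"), ("row9", "alpha")]:
    store.put(key, value)

hits = list(stack(store.scan("row1", "row5"), value_contains("alpha")))
print(hits)  # [('row1', 'alpha'), ('row2', 'alphabet')]
```

Because each stage is a lazy generator, filtering happens while the key range is being scanned. In Accumulo the analogous work runs inside the tablet servers, so entries that a stage rejects never leave the server.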
How Do We Store Multi-Dimensional Data in a Dictionary?

•  Space Filling Curves project multiple dimensions into a single dimension
•  Base32 encoding induces an Accumulo-friendly lexicographic ordering
•  Recursive nesting facilitates storing different resolutions of data
•  GeoHashes are common in web services

http://blog.notdot.net/2009/11/Damn-Cool-Algorithms-Spatial-indexing-with-Quadtrees-and-Hilbert-Curves

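A geohash can be sketched in a few lines. This is a minimal Python version of the standard algorithm (alternate longitude and latitude bisection bits, starting with longitude, packed five bits per Base32 character); it is illustrative, not GeoMesa's own code. Two properties from the slide fall out directly: a shorter geohash is a prefix of a longer one for the same point (recursive nesting), and because the alphabet is in ASCII order, sorting equal-length geohash strings sorts cells along the curve.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o), in ASCII order


def geohash_encode(lat: float, lon: float, precision: int = 7) -> str:
    """Encode a point as a geohash by interleaving longitude/latitude bisection bits."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0        # bits accumulated for the current character
    n_bits = 0      # how many bits are in `bits`
    use_lon = True  # geohash starts with a longitude bit, then alternates
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = 1 if lon >= mid else 0
            if bit:
                lon_lo = mid
            else:
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = 1 if lat >= mid else 0
            if bit:
                lat_lo = mid
            else:
                lat_hi = mid
        bits = (bits << 1) | bit
        n_bits += 1
        use_lon = not use_lon
        if n_bits == 5:  # five bits per Base32 character
            chars.append(BASE32[bits])
            bits, n_bits = 0, 0
    return "".join(chars)


print(geohash_encode(42.6, -5.6, 5))  # ezs42
```

Truncating `geohash_encode(42.6, -5.6, 9)` to five characters gives the same `ezs42` cell, which is what lets one index hold data at several resolutions.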
How Does GeoMesa’s Index Work?

Constructs a key beginning with a shard id for horizontal scalability.
Uses Space Filling Curves to encode spatio-temporal data in Accumulo keys.
Stacks server-side iterators to apply standard (E)CQL queries in parallel at scan time.

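A minimal sketch of how such a key might be assembled. Everything specific here is a hypothetical assumption rather than GeoMesa's actual schema: the `~` separator, the two-digit shard prefix, CRC32-based shard assignment, the hour-resolution date bucket, and the helper names `shard_of`, `build_row`, and `query_ranges`. It illustrates why the shard id comes first. Events piled onto one hot-spot geohash still spread across shards, and a query for a geohash cell becomes one small contiguous range per shard, which the shards can scan in parallel.

```python
import zlib
from datetime import datetime
from typing import List, Tuple

NUM_SHARDS = 4  # hypothetical shard count
SEP = "~"       # hypothetical field separator


def shard_of(feature_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Deterministic shard assignment from a CRC32 hash of the feature id."""
    return zlib.crc32(feature_id.encode("utf-8")) % num_shards


def build_row(feature_id: str, geohash: str, dtg: datetime) -> str:
    """Hypothetical row layout: shard, geohash, hour bucket, feature id."""
    return SEP.join([f"{shard_of(feature_id):02d}", geohash, dtg.strftime("%Y%m%d%H"), feature_id])


def query_ranges(geohash_prefix: str, num_shards: int = NUM_SHARDS) -> List[Tuple[str, str]]:
    """One [start, end) range per shard covering every row whose geohash starts with the prefix.

    The end key appends chr(0x7f), which sorts after every Base32 character and after SEP,
    so all rows sharing the prefix fall before it.
    """
    ranges = []
    for shard in range(num_shards):
        start = f"{shard:02d}{SEP}{geohash_prefix}"
        ranges.append((start, start + "\x7f"))
    return ranges


# A hot spot: 100 events at the same place and hour still land in several shards.
hot_spot_rows = [build_row(f"evt-{i}", "ezs42", datetime(2013, 11, 14, 9)) for i in range(100)]
print(sorted({row[:2] for row in hot_spot_rows}))
print(query_ranges("ezs4"))
```

Filtering on the date bucket or on other attributes would then be the job of the stacked iterators applied within each range.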
What is the GeoMesa Model?

How Does GeoMesa Perform?

GDELT - Global Database of Events, Language, and Tone

Leetaru, Kalev and Schrodt, Philip. (2013). GDELT: Global Data on Events, Language, and Tone, 1979-2012. International Studies Association Annual Conference, April 2013. San Diego, CA. See more at: http://gdelt.utdallas.edu/about.html

220 million geocoded events from 1979 to the present.
Exhibits pathologies common in spatio-temporal data sets:
•  Hot spots
•  Bad geocoding

GDELT

GDELT assigns an Event Code to each event. Codes are based on CAMEO (Conflict and Mediation Event Observations). There are 20 top-level CAMEO codes.

John Beieler developed a visualization of every protest (one of the top-level categories) on the planet since 1979.
http://www.foreignpolicy.com/articles/2013/08/22/mapped_what_every_protest_in_the_last_34_years_looks_like

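Because CAMEO codes are hierarchical (the first two digits name one of the 20 top-level categories, and longer codes refine it), selecting "every protest" amounts to a prefix test on the event code; top-level category 14 is Protest. A minimal sketch, assuming simplified records with hypothetical `event_code` and `year` fields rather than GDELT's actual column layout:

```python
from collections import Counter
from typing import Dict, Iterable

PROTEST = "14"  # CAMEO top-level category 14: Protest


def root_code(event_code: str) -> str:
    """Top-level CAMEO category: the first two digits of the hierarchical event code."""
    return event_code[:2]


def protests_per_year(events: Iterable[Dict[str, str]]) -> Dict[str, int]:
    """Count events whose CAMEO root code is 14 (Protest), grouped by year."""
    return dict(Counter(e["year"] for e in events if root_code(e["event_code"]) == PROTEST))


# Hypothetical records for illustration only; these are not real GDELT rows.
events = [
    {"event_code": "1411", "year": "1979"},  # a sub-code under 14
    {"event_code": "141", "year": "2013"},
    {"event_code": "14", "year": "2013"},
    {"event_code": "0211", "year": "2013"},  # root 02, not a protest
]
print(protests_per_year(events))  # {'1979': 1, '2013': 2}
```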
GDELT

http://geomesa.github.io/gdelt.html

How? Using Open Source

•  Storage, Querying, Filtering
•  Aggregation and analysis
•  Visualization

Distributed Spatial Computations

●  Scalding greatly simplifies Map/Reduce
●  AccumuloSource is an implementation of a Scalding source/sink
●  GeoMesa allows developers to work with SimpleFeatures in a Map/Reduce job

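The shape of such a job can be sketched without Hadoop. The Python below is a stand-in, not Scalding's API or AccumuloSource: each input split (a hypothetical slice of the table) is mapped independently to `(cell, 1)` pairs keyed by a geohash prefix, and the reduce step sums the pairs per cell, producing a coarse density grid. Features are modeled as plain dicts with a hypothetical `geohash` attribute in place of GeoTools SimpleFeatures.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

Feature = Dict[str, str]  # hypothetical stand-in for a GeoTools SimpleFeature


def map_split(features: Iterable[Feature], cell_chars: int) -> Iterator[Tuple[str, int]]:
    """Map: emit (geohash cell, 1) for every feature in one input split."""
    for feature in features:
        yield feature["geohash"][:cell_chars], 1


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Shuffle and reduce collapsed into one step: sum the counts for each cell."""
    totals: Dict[str, int] = defaultdict(int)
    for cell, n in pairs:
        totals[cell] += n
    return dict(totals)


def density(splits: List[List[Feature]], cell_chars: int = 3) -> Dict[str, int]:
    """Map every split (sequentially here; Hadoop would run them in parallel), then reduce."""
    return reduce_counts(chain.from_iterable(map_split(s, cell_chars) for s in splits))


splits = [
    [{"geohash": "ezs42"}, {"geohash": "ezs4b"}],
    [{"geohash": "u4pru"}, {"geohash": "ezs9q"}],
]
print(density(splits))  # {'ezs': 3, 'u4p': 1}
```

Keying the reduce on a geohash prefix means a coarser or finer grid is just a different `cell_chars`, mirroring the recursive nesting of the index itself.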
Performance

PostGIS: 1000 responses in > 30 seconds
GeoMesa: 1000 responses in < 1 second

Roadmap

•  Enhance integration with cell-level security
•  Build statistical index and query optimization
   o  Bring Your Own Space Filling Curve
   o  “VACUUM ANALYZE”
•  Integrate GeoWebCache and Hadoop
•  Ease developer on-ramping
•  Grow community through LocationTech