Geo data analytics

5,001 views

Geo data analytics,
Analytics of geospatial big data, best practices, tips and tricks

Published in: Data & Analytics

  1. Geo Data Analytics
  2. @dmarcous
     ● DBA (@IDF)
     ● Big Data Professional (@IDF)
     ● Data Wizard - Magic with Data (@Google - Waze)
  3. ● Pure professional
     ● Best practices
     ● Tools
     ● Tips & Tricks
     ● Free Advice!
  4. Agenda
     ● Why?
     ● Common Language
     ● Problems at scale
     ● Solutions at scale
     ● Tips & Tricks for scientists (/Wizards)
     ● Art
     ● Keep an eye out for…
     ● Dog Pictures
  5. Why Does Geo Data Matter?
  6. ● C/C++, GEOS: http://trac.osgeo.org/geos
     ● C#, NTS: http://code.google.com/p/nettopologysuite/
     ● Java, JTS:
       ○ http://tsusiatsoftware.net/jts/main.html
       ○ http://www.vividsolutions.com/jts/JTSHome.htm
     ● Python, shapely: https://github.com/Toblerity/Shapely
     ● Ruby, ffi-geos: https://github.com/dark-panda/ffi-geos
     ● Javascript, JSTS: http://github.com/bjornharrtell/jsts
  7. Geometry Object Model
  8. Geospatial Operations
  9. Formats
     ● WKT / WKB - Well-Known Text / Well-Known Binary, the standard text and binary encodings of geometries
       ○ POLYGON((34.807841777801514 32.164333053441936,34.81168270111084 32.164859820966136,34.81337785720825 32.1613540349589,34.80865716934204 32.16046394346568,34.807841777801514 32.164333053441936))
       ○ http://arthur-e.github.io/Wicket/sandbox-gmaps3.html
     ● GeoJSON
       ○ { "type": "FeatureCollection", "features": [{ "type": "Feature",
           "properties": { "Name": "Verint", "Guest": "dmarcous", "Accomodations": "Beer; Pizza" },
           "geometry": { "type": "Polygon", "coordinates": [ [
             [ 34.807841777801514, 32.164333053441936 ],
             [ 34.81168270111084, 32.164859820966136 ],
             [ 34.81337785720825, 32.1613540349589 ],
             [ 34.80865716934204, 32.16046394346568 ],
             [ 34.807841777801514, 32.164333053441936 ]]]}}]}
       ○ http://geojson.io/#map=17/32.16267/34.81061
     ● Shape Files - ESRI vector format
     ● GML - The Geography Markup Language (GML) is an XML grammar for expressing geographical features.
     ● Raster - Georeferenced grid of cells (pixels), as opposed to the vector formats above
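The WKT polygon and the GeoJSON geometry on this slide describe the same ring of longitude/latitude pairs. A minimal Python sketch of that correspondence follows, using only the standard library. The function name `wkt_polygon_to_geojson` is hypothetical, and the parser deliberately handles only a single-ring `POLYGON`; for real work a library such as shapely (listed on slide 6) is the safer choice.

```python
import re


def wkt_polygon_to_geojson(wkt):
    """Convert a single-ring WKT POLYGON string to a GeoJSON geometry dict.

    Hypothetical helper for illustration: holes, Z values and other
    geometry types are not handled.
    """
    match = re.fullmatch(r"\s*POLYGON\s*\(\((.*)\)\)\s*", wkt,
                         flags=re.IGNORECASE | re.DOTALL)
    if match is None or "(" in match.group(1):
        raise ValueError("only single-ring POLYGON WKT is handled in this sketch")

    ring = []
    for pair in match.group(1).split(","):
        x, y = pair.split()          # WKT order is "x y", i.e. "longitude latitude"
        ring.append([float(x), float(y)])

    if ring[0] != ring[-1]:
        raise ValueError("polygon ring must be closed (first point == last point)")

    # GeoJSON wraps each ring in an outer list of rings.
    return {"type": "Polygon", "coordinates": [ring]}


geometry = wkt_polygon_to_geojson(
    "POLYGON((34.8078 32.1643,34.8117 32.1649,34.8134 32.1614,34.8078 32.1643))"
)
```

Both formats put longitude first, which is why no coordinate swap is needed here.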
  10. Databases
      ● RDBMS
        ○ Postgres (PostGIS)
        ○ MS-SQL / DB2 / Oracle
      ● NoSQL
        ○ MongoDB
        ○ IBM Cloudant
        ○ Lucene spatial module (Elasticsearch / Solr)
      ● Pure Geospatial Database
        ○ CartoDB (Open Source / Hosted)
        ○ GeoMesa (Accumulo)
          ■ GeoTrellis - Scala framework for processing raster data
  11. GIS Systems
      List of most popular ones -
      http://en.wikipedia.org/wiki/List_of_geographic_information_systems_software
      QGIS, TileMill, GRASS
  12. Problem?
      ● Non-scalar data types
        ○ Aggregating
        ○ Sharding
        ○ Unordered
      ● Speed & Accuracy
        ○ The physical world is non-Euclidean
      http://www.jandrewrogers.com/2015/03/02/geospatial-databases-are-hard/
  13. Solution
  14. Data Structures
      ● R-Tree (PostGIS, actually R+Tree)
      ● Quad Tree (DB2)
      ● Hyperdimensional Hashing
      ● Space Filling Curves
        ○ Z Order Curve (MS-SQL)
        ○ Hilbert Curve
  15. The Curse of Dimensionality
  16. Dimension Reduction
      ● GeoHash - The mainstream way
        ○ Linear (non-tangent) projection; up to 5x difference in cell area
        ○ Same prefix - close areas (sort of…)
        ○ http://geohash.org/
        ○ https://github.com/google/open-location-code/blob/master/docs/comparison.adoc
      ● S2 - The Google way
        ○ Quadratic projection; cells at the same level have similar areas
        ○ The Earth is projected onto the faces of a cube; each face is divided into levels by a quad-tree, and cells are referenced by their position on the face along a Hilbert curve
        ○ https://code.google.com/p/s2-geometry-library/
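To make the GeoHash bullets concrete, here is a small Python sketch of the standard geohash encoding: longitude and latitude are bisected alternately, and every five bits become one base-32 character. The function name `geohash_encode` and the `precision` parameter are illustrative choices, not taken from the slides.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(latitude, longitude, precision=7):
    """Encode a point as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0            # bits collected for the current character
    bit_count = 0
    use_longitude = True  # bits alternate, starting with longitude

    while len(chars) < precision:
        if use_longitude:
            mid = (lon_lo + lon_hi) / 2
            if longitude >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if latitude >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_longitude = not use_longitude
        bit_count += 1
        if bit_count == 5:          # five bits make one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0

    return "".join(chars)


# Two nearby points share a prefix; a shorter hash is simply a coarser cell.
near_a = geohash_encode(32.1643, 34.8078)
near_b = geohash_encode(32.1644, 34.8079)
```

Because the bisection is done in degrees, a cell of a given hash length covers less ground near the poles than at the equator, which is the area difference the slide refers to. Points on either side of a cell boundary can also be close together without sharing a prefix, hence "sort of".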
  17. Near Real Time Answers
      ● MongoDB Geospatial Indexing
      ● Elasticsearch / Solr spatial indexing
      ● GeoMesa
      ● Build your own - Store the bytes in a fast key-value store with reduced keys (HBase / Cassandra)
  18. Big Processing - It’s a UDF World
      ● ESRI - Hive UDFs - https://github.com/Esri/spatial-framework-for-hadoop/wiki/UDF-Documentation
      ● Pigeon - Pig UDFs - https://github.com/aseldawy/pigeon
      ● Spark -
        ○ SpatialSpark
        ○ GeoTrellis
  19. Graph Representation
      ● Use Cases
        ○ Routing
        ○ Supply Chains
        ○ User Networks
      ● Tools
        ○ GraphX (Spark!) / Giraph (MR)
        ○ Dato SGraph (formerly known as GraphLab)
        ○ Gephi (on small parts, for exploration)
      ● Algorithms
        ○ Shortest Path - Dijkstra / A*
        ○ Communities - Triangle Counting
        ○ Importance - Centrality / PageRank
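As an illustration of the routing use case, the following is a minimal Python sketch of Dijkstra's shortest-path algorithm on a tiny weighted road graph. The graph, its node names and its edge weights are made up for the example; at scale the slide's tools (GraphX, Giraph) would take the place of this single-machine version.

```python
import heapq


def dijkstra(graph, source):
    """Return the shortest distance from `source` to every reachable node.

    `graph` maps each node to a dict of {neighbor: non-negative edge weight}.
    """
    distances = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > distances.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = dist + weight
            if candidate < distances.get(neighbor, float("inf")):
                distances[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return distances


# Hypothetical road segments with travel times in minutes.
roads = {
    "A": {"B": 4.0, "C": 1.0},
    "B": {"D": 3.0},
    "C": {"B": 2.0, "D": 7.0},
    "D": {},
}
travel_times = dijkstra(roads, "A")
```

A* follows the same loop but orders the heap by distance so far plus a heuristic estimate to the target, for example the great-circle distance between coordinates.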
  20. Tips & Tricks
  21. Approximation
  22. Timezones
      ● tz_world
        ○ http://efele.net/maps/tz/world/
        ○ What do we do with shapefiles?
      ● APIs
        ○ Geonames
        ○ http://www.earthtools.org/
        ○ Google Timezone API
      ● UDFs?
        ○ Hive - from_utc_timestamp(timestamp, string timezone)
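  23. Good Old Word Count
      // Word Count (sc is the SparkContext)
      val textFile = sc.textFile("hdfs://...")
      val counts = textFile.flatMap(line => line.split(" "))
                           .map(word => (word, 1))
                           .reduceByKey(_ + _)
      counts.saveAsTextFile("hdfs://...")

      // Modified Word Count: count points per S2 cell instead of words
      // (assumes CSV columns 1 and 2 hold longitude and latitude)
      val points = sc.textFile("hdfs://...")
      val cellCounts = points.map(line => line.split(","))
                             .map(p => (coord2S2Cell(p(1).toDouble, p(2).toDouble), 1))
                             .reduceByKey(_ + _)
      cellCounts.saveAsTextFile("hdfs://...")

      // Take that from a library! (assumed here: the S2 Java library)
      import com.google.common.geometry.{S2CellId, S2LatLng}
      // S2 cell ids are 64-bit, so the key is a Long rather than an Int
      def coord2S2Cell(longitude: Double, latitude: Double, lvl: Int = 14): Long =
        S2CellId.fromLatLng(S2LatLng.fromDegrees(latitude, longitude)).parent(lvl).id()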
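The same "word count over cells" idea can be tried on a laptop before moving to Spark. The Python sketch below swaps the S2 cell for a plain fixed-size degree grid, which is an assumption made to keep the example dependency-free; `coord_to_grid_cell`, `count_points_per_cell` and the 0.01 degree cell size are hypothetical names and values, and the input layout (id, longitude, latitude) mirrors the column positions used on the slide.

```python
import math
from collections import Counter


def coord_to_grid_cell(longitude, latitude, cell_deg=0.01):
    """Map a point to a (column, row) key on a fixed-size degree grid.

    Stand-in for an S2 cell id: simple, but cell area shrinks toward the poles.
    """
    return (math.floor(longitude / cell_deg), math.floor(latitude / cell_deg))


def count_points_per_cell(lines, cell_deg=0.01):
    """Count CSV rows of the form 'id,longitude,latitude' per grid cell."""
    counts = Counter()
    for line in lines:
        fields = line.split(",")
        cell = coord_to_grid_cell(float(fields[1]), float(fields[2]), cell_deg)
        counts[cell] += 1  # the local equivalent of reduceByKey(_ + _)
    return counts


rows = [
    "u1,34.8078,32.1643",
    "u2,34.8079,32.1644",
    "u3,34.8134,32.1614",
]
cell_counts = count_points_per_cell(rows)
```

The first two rows fall into the same cell and the third into its eastern neighbour, so the counter ends up with two keys.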
  24. Advanced - Precision is of the Essence
      ● Density Based Clustering
        ○ DBSCAN
          ■ Minimum cluster size (> Noise)
          ■ Epsilon (Spatial Radius)
        ○ R - MASS - kde2d
          ■ RGoogleMaps for the map
          ■ http://www.everydayanalytics.ca/2014/04/heatmap-of-toronto-traffic-signals.html
  25. rJava
      ● Wrap geospatial functions of your choice
      ● Call them from R
      ● Use apply on an entire Dataframe!
      ● Use as features!
      ● Visualize??? (in 5 minutes)
  26. R Packs for Geospatial Analysis
      ● geonames
        ○ Timezone
        ○ Weather
        ○ Nearby places
      ● RGoogleMaps
        ○ download+paint Maps
        ○ getGeoCode
      ● sp / maps / maptools
        ○ OGC object abstractions
        ○ Manipulate / display geo data
      ● rgdal - spTransform
        ○ Convert formats / coordinate systems
      ● geosphere - distances / circles / centroids
      ● fpc - DBSCAN
      ● Coverage -
        ○ http://cran.r-project.org/web/views/Spatial.html
  27. Engineered Geo features
      ● LOCAL
        ○ time
        ○ is_early / is_late
        ○ day of week
        ○ is_workday / is_weekend
        ○ is_day_light (sunrise / sunset, tz_world)
      ● Weather
        ○ Temperature
        ○ is_rain / is_fog / is_hail / is_snow
      ● Squared (s2cell / geohash) statistics
        ○ Probability of users in square to predict X
      ● Address - is_residence / is_business
      ● News - GDELT
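The LOCAL features depend on converting an event's UTC timestamp to local time first, which is where the timezone lookup from slide 22 comes in. The Python sketch below assumes that lookup has already produced a UTC offset in hours; the function name `local_time_features`, the hour thresholds for `is_early` / `is_late`, and the default weekend days are all hypothetical and should be adapted to the region and the problem.

```python
from datetime import datetime, timedelta, timezone


def local_time_features(utc_timestamp, utc_offset_hours, weekend_days=(5, 6)):
    """Derive simple local-time features from a UTC epoch timestamp.

    `weekend_days` uses Python's convention (Monday = 0); pass (4, 5) for a
    Friday/Saturday weekend. The 07:00 and 22:00 thresholds are illustrative.
    """
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.fromtimestamp(utc_timestamp, tz=timezone.utc).astimezone(local_zone)
    is_weekend = local.weekday() in weekend_days
    return {
        "local_hour": local.hour,
        "day_of_week": local.weekday(),
        "is_early": local.hour < 7,
        "is_late": local.hour >= 22,
        "is_weekend": is_weekend,
        "is_workday": not is_weekend,
    }


# 1970-01-03 00:00 UTC is a Saturday; one hour west it is still Friday 23:00.
features = local_time_features(172800, utc_offset_hours=-1)
```

A fixed offset ignores daylight saving time, so a production version would carry the timezone name from the lookup instead of a number of hours.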
  28. WOW!
  29. Data Art
  30. Google Sheets
  31. Frontend = Javascript?
      ● Google Maps API
        ○ https://developers.google.com/maps/documentation/javascript/examples/layer-heatmap
      ● Leaflet
  32. R for Visualisation
      ● ggplot2 + geospatial packs
        ○ http://uce.uniovi.es/mundor/howtoplotashapemap.html
        ○ http://stackoverflow.com/questions/9558040/ggplot-map-with-l
        ○ http://spatial.ly/2012/02/great-maps-ggplot2/
      ● RGoogleMaps
        ○ http://rforwork.info/tag/rgooglemaps/
  33. R For Interactive
      ● Shiny
        ○ Leaflet
          ■ http://rstudio.github.io/leaflet/
          ■ http://shiny.rstudio.com/gallery/superzip-example.html
          ■ http://shiny.rstudio.com/gallery/bus-dashboard.html
        ○ Globe
          ■ https://github.com/trestletech/shinyGlobe
  34. R Animation
      ● http://rmaps.github.io/blog/posts/animated-choropleths/
  35. @aaronkoblin
  36. Keep an Eye Out!
      https://locationtech.org/list-of-projects
  37. Contact
      ● Daniel Marcous
      ● dmarcous@gmail.com
