It’s All About Location,
Location, Location
Corinne Hutchinson
July 8, 2015 -- PuPPy
Highlights of Tonight’s Talk
● Reasons for using location data in a web
application
● Overview of GeoDjango & setting up a basic
web application with geospatial support
● Scaling a system reliant on geospatial queries
Basic GeoSpatial Questions
● Where is A located? (i.e. point-in-polygon,
mappability)
● What is the distance between A & B?
● What is the shortest path between A & B? (i.e.
route planning)
● What’s the elevation change between A & B?
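The distance question is plain geometry on a sphere. A minimal sketch, assuming a spherical Earth with a mean radius of 6371 km (spatial databases can use more accurate ellipsoidal models), uses the haversine formula to get the great-circle distance between two latitude/longitude points. The function name `haversine_km` is hypothetical and not part of GeoDjango.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude along the equator on this sphere:
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 1))  # 111.2
```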
GeoDjango
● Included in standard installations of Django since 1.0
● Allows importing of geospatial data from essentially any
vector data source (e.g. KML, shapefiles); raster data
(bitmaps, etc.) are not supported
● Provides familiar ORM interface for geospatial queries
● Straightforward to learn: excellent tutorial on main
Django site
Starting a Geospatial Application: DB
● Pick your database and install any needed
extensions
● Per-DB installation tutorials in the GeoDjango docs;
supported options are PostgreSQL/PostGIS, MySQL,
SQLite (via SpatiaLite), or Oracle
● PostgreSQL/PostGIS and Oracle Spatial are
generally considered the most mature spatial
database options
Starting a Geospatial Application: Models
● Define geomodels
● Add admin interface
Adding Data? Lots of Free Geospatial Data Sources
● US Census TIGER (Topologically Integrated Geographic Encoding and
Referencing): political boundaries e.g. states, counties, metro areas
● Natural Earth: natural features
● OpenStreetMap: map tiles, land use, etc.
● NASA’s Socioeconomic Data and Applications Center (SEDAC): data about
human-environment interactions e.g. land use, poverty, climate
● OpenTopography: topographic data for most of the world
Simple Data Import Tools
Modifying/Updating Polygons
● GeoDjango admin provides drag-and-drop
editing tools
Using Your App: Making Queries
● Geospatial lookups through ORM
o distance
o point-in-polygon
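In GeoDjango these are ordinary queryset lookups (for example `contains` for point-in-polygon and `distance_lte` for distance), and the spatial database does the geometry. To make the point-in-polygon test concrete, here is a standalone sketch of the classic ray-casting algorithm on planar (x, y) coordinates, so longitude/latitude would simply be treated as x/y. It illustrates the idea only; it is not how PostGIS or GeoDjango implement the lookup, and `point_in_polygon` is a hypothetical name.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside `polygon`, a list of (x, y) vertices?

    A ray is cast from the point toward +x; each polygon edge it crosses flips
    the inside/outside state, so an odd number of crossings means "inside".
    Points lying exactly on an edge are not handled specially in this sketch.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the ring
        if (y1 > y) != (y2 > y):  # edge spans the ray's y, so it is not horizontal
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # crossing lies to the right of the point
                inside = not inside
    return inside


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(point_in_polygon(5, 5, square), point_in_polygon(15, 5, square))  # True False
```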
This is great! Can we scale it?
● Minimize direct database hits
● Re-route duplicated database calls to a cache
(e.g. Redis, Memcached)
● Reformat our data for cacheability: geohashes
Geohashes
● Developed by Gustavo Niemeyer, entered into
public domain in 2008
● Method of sequentially subdividing the globe
into spatial buckets
● Buckets represented as binary strings, encoded as base32 text
(e.g. 0010110101011100011000110001101111000111 -> 5pf666y7)
● Allows for very fast point-in-polygon lookups
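The subdivision can be sketched in a few lines of pure Python, following the standard geohash scheme: repeatedly halve the longitude and latitude ranges (longitude first), record one bit per halving, and encode every 5 bits as one base32 character. The function names here are hypothetical; the python-geohash package listed at the end of this deck provides real encode/decode functions. The sketch reproduces the bit-string example above and shows how cell size depends on precision.

```python
# Geohash alphabet: base32 without the letters a, i, l and o
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def bits_to_geohash(bits):
    """Map a bit string (length a multiple of 5) to geohash characters, 5 bits each."""
    return "".join(BASE32[int(bits[i:i + 5], 2)] for i in range(0, len(bits), 5))


def geohash_bits(lat, lon, precision):
    """Interleaved bits for a point: longitude, latitude, longitude, ...

    Each bit says whether the point lies in the upper (1) or lower (0) half of
    the current range; that half then becomes the range for the next bit.
    """
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    bits = []
    for i in range(5 * precision):
        rng, value = (lon_range, lon) if i % 2 == 0 else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits.append("1")
            rng[0] = mid  # keep the upper half
        else:
            bits.append("0")
            rng[1] = mid  # keep the lower half
    return "".join(bits)


def encode(lat, lon, precision=5):
    """Geohash string with `precision` characters for a (lat, lon) point."""
    return bits_to_geohash(geohash_bits(lat, lon, precision))


def cell_size_deg(precision):
    """(height, width) in degrees of a geohash cell with `precision` characters."""
    n_bits = 5 * precision
    lon_bits = (n_bits + 1) // 2  # longitude gets the extra bit when n_bits is odd
    lat_bits = n_bits // 2
    return 180.0 / 2 ** lat_bits, 360.0 / 2 ** lon_bits


print(bits_to_geohash("0010110101011100011000110001101111000111"))  # 5pf666y7
print(encode(47.66, -122.37, 5))  # c22zp
print(cell_size_deg(5))  # (0.0439453125, 0.0439453125)
```

At the equator a 5-character cell is therefore roughly 4.9 km on a side, and a 6-character cell roughly 1.2 km wide by 0.6 km tall; east-west widths shrink toward the poles.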
Examples
Adding Geohashes to Point-in-Polygon Lookups
● Choose level of precision (5 or 6 characters are likely good)
● Convert point to a geohash, then extract the center of that
geohash
● DB lookup to determine containing polygon
● Finally, cache the geohash-to-polygon mapping (e.g. set
the key ‘c22zp’ to the value ‘Seattle’)
● On subsequent lookups, check the cache for an existing key
matching the geohash before conducting a DB lookup
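The steps above can be sketched as a cache-aside lookup. Assumptions: a plain dict stands in for Redis or Memcached, and `db_lookup` is a hypothetical stand-in for the real point-in-polygon query (for example a GeoDjango `contains` lookup). The geohash helpers are re-implemented so the block runs on its own, and all names are illustrative.

```python
# Geohash alphabet: base32 without the letters a, i, l and o
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def encode(lat, lon, precision):
    """Geohash of a point: halve lon/lat ranges alternately, 5 bits per character."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    chars, value = [], 0
    for i in range(5 * precision):
        rng, coord = (lon_rng, lon) if i % 2 == 0 else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        bit = 1 if coord >= mid else 0
        rng[1 - bit] = mid  # bit 1 keeps the upper half, bit 0 the lower half
        value = (value << 1) | bit
        if i % 5 == 4:  # every 5 bits form one character
            chars.append(BASE32[value])
            value = 0
    return "".join(chars)


def decode_center(geohash):
    """(lat, lon) at the center of a geohash cell."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    is_lon = True  # the first bit refines longitude
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # most significant bit first
            rng = lon_rng if is_lon else lat_rng
            bit = (value >> shift) & 1
            rng[1 - bit] = (rng[0] + rng[1]) / 2
            is_lon = not is_lon
    return (lat_rng[0] + lat_rng[1]) / 2, (lon_rng[0] + lon_rng[1]) / 2


def lookup_region(lat, lon, cache, db_lookup, precision=5):
    """Cache-aside point-in-polygon lookup keyed by geohash.

    `cache` is any dict-like store (standing in for Redis/Memcached);
    `db_lookup(lat, lon)` is a hypothetical function running the real query.
    """
    key = encode(lat, lon, precision)
    if key in cache:  # subsequent lookups never touch the database
        return cache[key]
    center_lat, center_lon = decode_center(key)
    region = db_lookup(center_lat, center_lon)  # query with the cell center
    cache[key] = region
    return region


# Usage with a fake database function that records each call
db_calls = []


def fake_db_lookup(lat, lon):
    db_calls.append((lat, lon))
    return "Seattle"


cache = {}
lookup_region(47.66, -122.37, cache, fake_db_lookup)    # miss: queries the "DB"
lookup_region(47.661, -122.371, cache, fake_db_lookup)  # hit: same cell c22zp
print(cache, len(db_calls))  # {'c22zp': 'Seattle'} 1
```

Because the database is queried with the cell's center rather than the original point, every point in the same cell gets the same answer, and a cell that straddles a boundary inherits whichever region contains its center. Longer geohashes shrink that error at the cost of more keys to cache.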
Example:
Take-Aways
● GeoDjango is simple to use
● Geohashes are a good tool to help scale
geospatial lookups
More Awesome GeoSpatial Libraries
● OGR/GDAL: interacting with geospatial data formats, e.g. opening files, etc.
● PyShp: ESRI shapefile handling in pure Python (https://pypi.python.org/pypi/pyshp)
● PySAL: spatial analysis functions (https://github.com/pysal/pysal)
● PyQGIS: essentially anything you might want to do with GIS data
(http://docs.qgis.org/testing/en/docs/pyqgis_developer_cookbook/intro.html)
● geopy: geocoding; integration with OpenStreetMap, Google Geocoding API, Baidu Maps, and
many more (https://github.com/geopy/geopy)
● python-geohash: encoding/decoding points to geohashes, looking up geohash neighbors
● descartes: plotting geometric objects in matplotlib (https://pypi.python.org/pypi/descartes)
● NumPy: data wrangling (http://www.numpy.org/)
● pandas: data wrangling (http://pandas.pydata.org/)

Corinne Hutchinson's 7/8/2015 PuPPy Presentation on GeoDjango

Editor's Notes

  • #2 about me
  • #4 Takeaway: these questions are all just basic questions about geometry
  • #5 Now that we have a clearer idea of the questions, what are the tools? In this talk I wanted to focus on getting a web application up and running that can do geospatial lookups. There are a LOT of other tools that can be used for geospatial data (not just on the web); I’ve listed a number of them on the last slide of this deck but won’t be talking about them in any greater depth tonight. Tonight I’m just going to talk about using GeoDjango, which is a pretty fantastic tool. Essentially, the difference is that vector data breaks the data down into geometric features, whereas raster data breaks it down into a grid. Remembering back to the questions we outlined before, it’s not terribly surprising that vector data will allow us to answer our questions.
  • #6 Not going to talk at great length about DB choice; that could very easily be a whole talk on its own.
  • #8 ESRI (Environmental Systems Research Institute, a titan of GIS data: a California-based company that started off as land-use consultants back in the ’60s) shapefiles? Do you want to start off with any areas already defined? Do you want to, say, know the location of NYC, or do you want to custom-define regions yourself? If you want predefined areas, let’s check out some options on the next slide.
  • #12 Er, sounds super, but what exactly does that mean? We don’t want to kill our database doing a thousand lookups a second. However, we can easily throw many times that many lookups at a caching layer and have no problem. Great! So how can we do this? A relational DB has its advantages (structured data store, etc.), and a cache is going to be super fast, but there’s no caching system that currently provides support for complex geospatial data. Caches like Redis use a basic key-value store, so we can cache a key (i.e. a unique string) and know that when we go to pull it out of our cache, we’ll get whatever it was that we stored on the other end (a list, a string, etc.). But we *can’t* do awesome queries like we saw in the Django ORM. So . . . how can we do this? We need to restructure! Enter . . . next slide.
  • #13 Who here has heard of geohashes? Who here has worked with geohashes? Skip this slide if *everyone* raises their hand. The system we’ve been talking about up until now is pretty intuitive, very similar to how we naturally tend to think of data. A city is a polygon; you can draw it on a map, and if you want to know whether something is inside of it, you just find the dot and check whether the dot is inside that polygon. Super! Geohashes are also pretty intuitive, but allow for a bit of abstraction on the level of precision.
  • #15 The exact level of precision doesn’t really matter: bigger cells are less precise, of course, but with smaller cells you just have way more to manage. Consistency *does* matter, though.