2. WHO?
• Patch.com is a local news and information service
• We have editorial staff in each of our 44 communities and are
expanding to cover hundreds of communities across the country this
year
• Our goal is to be the equivalent of a local paper in each town, and we
cover everything that goes on there, including everything from the
school board meeting to restaurant reviews and obituaries
• We also have a significant listings-gathering process that collects
information on every business in town
• In general, all of our data is connected to geographic information
5. WHEN?
• We started considering this project after seeing Paul Smith’s
“Take Control of Your Maps” article on A List Apart in April,
2008
• The project really took shape in August, 2009 and launched in
November
6. WHY?
• Our painful design process left us with a custom page, and a
map that looks just like everybody else’s
• Map data at the highly local level still kinda sucks
• You can’t integrate data directly into a Google (etc.) map
7. THE STACK
Apache (mod_wsgi)
TileCache
Mapnik
PostGIS (with OSM planet)
8. APACHE CONFIG
In httpd.conf:
LoadModule wsgi_module modules/mod_wsgi.so
In your sites-available/config:
WSGIDaemonProcess patch processes=25 threads=1 display-name=%{GROUP}
WSGIProcessGroup patch
WSGIScriptAlias /tilecache /data/servers/patch_maps-fe-apache/wsgi/tilecache.wsgi
Now you can restart the WSGI daemon processes simply by
touching the tilecache.wsgi script file
9. TILECACHE WSGI CONFIG
This is the application that is executed with each request:
#!/opt/bcs/python2.6
#
# In Apache2's config:
# WSGIScriptAlias /tilecache /var/www/mapserver/tilecache/tilecache.wsgi
#
import os, sys
tilecachepath = '/srv/data/servers/patch_maps-fe-apache/tilecache'
sys.path.append(tilecachepath)
from TileCache.Service import Service, wsgiHandler
cfgfiles = (os.path.join(tilecachepath, 'tilecache.cfg'))
theService = None
def application(environ, start_response):
    global theService
    cfgs = cfgfiles
    if not theService:
        theService = Service.load(cfgs)
    return wsgiHandler(environ, start_response, theService)
11. MAPNIK
• Mapnik requires a mapfile telling it how to render tiles as
requests come in
• Ours was built using Cascadenik, with help from Stamen
Design, and we compiled the file to the mapfile format
• That stylesheet contains all of the queries to make against the
database in order to render each layer, along with information
on how to style them
• Cascadenik lets you write this in a CSS/HTML-like format,
where the style is separated from the content
12. OPTIMIZING QUERIES
From this:
(SELECT way, name FROM osm_polygon WHERE amenity IN ('school',
'college', 'university', 'bus_station', 'ferry_terminal',
'hospital', 'kindergarten', 'place_of_worship',
'public_building', 'townhall') ORDER BY z_order ASC, way_area
DESC) AS civic
To this:
(SELECT way, name FROM osm_polygon_civic_areas_mv ORDER BY
z_order ASC, way_area DESC) AS civic
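The idea behind the second query, keeping a small pre-filtered table up to date with triggers so the renderer never has to run the expensive WHERE clause, can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses SQLite from Python's standard library instead of PostGIS, leaves out the geometry column (way), shortens the amenity list, and the trigger names are made up for the example.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database whose civic table is maintained by triggers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE osm_polygon (
            osm_id INTEGER PRIMARY KEY, name TEXT, amenity TEXT,
            z_order INTEGER, way_area REAL
        );
        -- Plain table standing in for the materialized view.
        CREATE TABLE osm_polygon_civic_areas_mv (
            osm_id INTEGER PRIMARY KEY, name TEXT,
            z_order INTEGER, way_area REAL
        );
        -- Copy matching rows into the "view" as they are inserted.
        CREATE TRIGGER civic_insert AFTER INSERT ON osm_polygon
        WHEN NEW.amenity IN ('school', 'college', 'university',
                             'hospital', 'townhall')
        BEGIN
            INSERT INTO osm_polygon_civic_areas_mv
            VALUES (NEW.osm_id, NEW.name, NEW.z_order, NEW.way_area);
        END;
        -- Keep the "view" in sync when source rows are deleted.
        CREATE TRIGGER civic_delete AFTER DELETE ON osm_polygon
        BEGIN
            DELETE FROM osm_polygon_civic_areas_mv WHERE osm_id = OLD.osm_id;
        END;
    """)
    return conn


def civic_names(conn):
    """Run the cheap query: no WHERE clause, just the pre-filtered table."""
    rows = conn.execute(
        "SELECT name FROM osm_polygon_civic_areas_mv "
        "ORDER BY z_order ASC, way_area DESC"
    ).fetchall()
    return [name for (name,) in rows]


conn = build_demo_db()
conn.executemany(
    "INSERT INTO osm_polygon VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Lincoln School", "school", 0, 500.0),
        (2, "Corner Cafe", "cafe", 0, 80.0),
        (3, "Town Hall", "townhall", 1, 900.0),
    ],
)
print(civic_names(conn))
```

The filtering cost is paid once per row at write time, so each tile render only reads the small table.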
13. BUILDING THE DATABASE
• We use the full planet.osm, though we’re only focused on
North America at the moment
• The initial import was done by downloading the planet file
and importing it using osm2pgsql. This took four days!!!
• We have a replicated slave database, as well as a static copy
that we back up to once per month
14. TILE_FLIP
http://github.com/aub/tile_flip
• Uses the TileCache API to provide a simple
interface for managing tiles
• Seeding
• Killing
• Finding
18. BACKGROUND TASKS
• Apply updates to the OSM data once per minute
• Expire cached tiles that were affected by the data update
• Publication updating and seeding
• Seeding low zoom levels
• Trimming the cache to a reasonable size
19. APPLYING MINUTELY UPDATES
http://wiki.openstreetmap.org/wiki/Minutely_Mapnik
We have a Python script that runs every minute via cron and executes:
/opt/bcs/bin/osmosis -q --read-replication-interval \
  workingDirectory=/srv/data/osm/osmosis/replication \
  --write-xml-change /dev/stdout |
/opt/bcs/bin/osm2pgsql --append --database=osm \
  --username=polarmaps_rw --host=patchbe-d03.ihost.aol.com \
  --port=5432 --merc --prefix=osm --slim \
  --style /opt/bcs/packages/osm2pgsql-1.0.0/osm2pgsql.style \
  --expire-tiles=18 \
  --expire-output=/srv/data/osm/expiration_lists/tmpvj1Glb \
  - 2>> /srv/data/osm/log/osm2pgsql.log
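A wrapper script for this pipeline might look something like the sketch below. It is not the script from the talk: the function names, the bare `osmosis` and `osm2pgsql` executable names, and the reduced set of flags are assumptions for illustration, and only flags that appear in the command above are used.

```python
import os
import subprocess
import tempfile


def build_commands(replication_dir, database, expire_path, expire_zoom=18):
    """Return the two argument lists for the osmosis | osm2pgsql pipeline."""
    osmosis_cmd = [
        "osmosis", "-q",
        "--read-replication-interval", "workingDirectory=" + replication_dir,
        "--write-xml-change", "/dev/stdout",
    ]
    osm2pgsql_cmd = [
        "osm2pgsql", "--append", "--slim",
        "--database=" + database,
        "--expire-tiles=%d" % expire_zoom,
        "--expire-output=" + expire_path,
        "-",  # read the change file from stdin
    ]
    return osmosis_cmd, osm2pgsql_cmd


def apply_update(replication_dir, database, expire_dir):
    """Run one update and return the path of the expiration list it wrote."""
    # A fresh temporary file per run keeps expiration lists from colliding.
    fd, expire_path = tempfile.mkstemp(dir=expire_dir)
    os.close(fd)
    osmosis_cmd, osm2pgsql_cmd = build_commands(
        replication_dir, database, expire_path)
    producer = subprocess.Popen(osmosis_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(osm2pgsql_cmd, stdin=producer.stdout)
    producer.stdout.close()  # let osmosis see a broken pipe if osm2pgsql exits
    consumer_status = consumer.wait()
    producer_status = producer.wait()
    if producer_status != 0:
        raise subprocess.CalledProcessError(producer_status, osmosis_cmd)
    if consumer_status != 0:
        raise subprocess.CalledProcessError(consumer_status, osm2pgsql_cmd)
    return expire_path
```

Passing argument lists rather than a shell string avoids quoting problems, and checking both exit codes catches a failure on either side of the pipe.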
20. APPLYING MINUTELY UPDATES
osm2pgsql produces an expiration list that looks like:
18/42005/91478
18/42005/91479
18/42006/91478
Our script then parses this file and adds each tile and its
ancestors to a table in the database. A separate script
then crawls that table and uses tile_flip to expire the tiles
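The "each tile and its ancestors" step relies on the standard tile pyramid, where the parent of tile z/x/y is (z-1)/(x div 2)/(y div 2). A minimal sketch of that step is below. The function names are made up for this example, and it collects the tiles in a set rather than writing them to a database table as our script does.

```python
def parse_expired_tile(line):
    """Parse one 'zoom/x/y' line from the expiration list."""
    z, x, y = (int(part) for part in line.strip().split("/"))
    return z, x, y


def tile_and_ancestors(z, x, y, min_zoom=0):
    """Return the tile plus every tile above it, down to min_zoom."""
    tiles = []
    while z >= min_zoom:
        tiles.append((z, x, y))
        # Each parent tile covers a 2x2 block of tiles one level down.
        z, x, y = z - 1, x // 2, y // 2
    return tiles


def tiles_to_expire(lines, min_zoom=0):
    """Collect the de-duplicated set of tiles touched by an update."""
    expired = set()
    for line in lines:
        if line.strip():
            z, x, y = parse_expired_tile(line)
            expired.update(tile_and_ancestors(z, x, y, min_zoom=min_zoom))
    return expired


sample = ["18/42005/91478", "18/42005/91479", "18/42006/91478"]
for tile in sorted(tiles_to_expire(sample, min_zoom=16)):
    print("%d/%d/%d" % tile)
```

Using a set matters because neighbouring tiles share ancestors, so the same low-zoom tile would otherwise be queued many times per update.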
21. PUBLICATION UPDATING AND SEEDING
• We have a set of publications, and almost all of our tile
requests are for areas around them
• The publications have a bounding box, and we have an API for
getting the publication data along with its location information
• Another script running via cron then uses tile_flip to
automatically seed an area around each publication at all zoom
levels
22. LOW ZOOM LEVEL SEEDING
• The low zoom levels are the slowest to render
• So, we have another script that walks levels 10-14 for the
entire USA and pre-seeds those tiles using tile_flip
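Walking a set of zoom levels for a region comes down to converting the bounding box corners to tile coordinates with the usual Web Mercator (slippy map) formula and looping over the range. The sketch below only enumerates the tiles and does not call tile_flip. The function names and the rough continental USA bounding box in the usage lines are assumptions for illustration.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) tile containing a longitude/latitude at a zoom level."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the map edge stay inside the valid tile range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_range(bbox, zoom):
    """bbox is (west, south, east, north) in degrees."""
    west, south, east, north = bbox
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # tile y grows southwards
    x_max, y_max = lonlat_to_tile(east, south, zoom)
    return x_min, y_min, x_max, y_max


def walk_tiles(bbox, zooms):
    """Yield every (zoom, x, y) tile covering the bounding box."""
    for zoom in zooms:
        x_min, y_min, x_max, y_max = tile_range(bbox, zoom)
        for x in range(x_min, x_max + 1):
            for y in range(y_min, y_max + 1):
                yield zoom, x, y


# Hypothetical bounding box, roughly the continental USA.
USA_BBOX = (-125.0, 24.0, -66.0, 50.0)
for zoom in range(10, 15):
    x_min, y_min, x_max, y_max = tile_range(USA_BBOX, zoom)
    print(zoom, (x_max - x_min + 1) * (y_max - y_min + 1), "tiles")
```

Each extra zoom level roughly quadruples the tile count, which is one reason to stop the blanket seeding at a fixed level and seed higher zooms only around publications.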
23. STATIC MAPS
http://github.com/aub/static_maps/
• Once you have all of this set up, it’s amazingly simple to add
other services, like a static map renderer
• Mapnik has an excellent Python API you can use
• Setting up other services is as simple as writing a Python script
and adding another WSGI service to Apache
• Our static map service has two endpoints, one for rendering a
map centered around a set of points, and the second for
returning a JSON response with the pixel positions of those
points
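The second endpoint is mostly projection arithmetic: project each point to Web Mercator pixel space at the chosen zoom, then offset by the map center. The sketch below shows one way that could work. It is not the static_maps code: the JSON shape, the function names, and the choice to center the map on the average of the projected points are all assumptions.

```python
import json
import math

TILE_SIZE = 256  # standard tile edge length in pixels


def lonlat_to_world_pixel(lon, lat, zoom):
    """Project a longitude/latitude to global pixel coordinates at a zoom."""
    scale = TILE_SIZE * 2 ** zoom
    x = (lon + 180.0) / 360.0 * scale
    y = (1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * scale
    return x, y


def point_positions(points, zoom, width, height):
    """Return JSON with the pixel position of each (lon, lat) point in the image."""
    world = [lonlat_to_world_pixel(lon, lat, zoom) for lon, lat in points]
    # Assumption: the image is centered on the average of the projected points.
    center_x = sum(x for x, _ in world) / len(world)
    center_y = sum(y for _, y in world) / len(world)
    positions = [
        {
            "x": int(round(x - center_x + width / 2.0)),
            "y": int(round(y - center_y + height / 2.0)),
        }
        for x, y in world
    ]
    return json.dumps({"points": positions})


print(point_positions([(-74.01, 40.71), (-73.95, 40.78)], 12, 400, 300))
```

A client can use the returned positions to overlay its own markers on the rendered image.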
Configures TileCache to use a disk-based cache
Expire sets the expires header in the response for client-side image caching
Setting the extension to png256 compresses the tiles more efficiently
Issues with meta-tiling across servers
In the cascadenik download, you can find example working OSM stylesheets
This first query is great.
Unfortunately, there are 10,561,333 rows in osm_polygon, and the WHERE clause can make it morbidly slow
We used triggers that fire as data is added to simulate materialized views and then did our queries against those views, which is much faster
This article explains how to do the basic setup
The command:
* uses osmosis to pull the latest data
* pipes the output, as xml, to osm2pgsql, which appends it to our database
* writes an expiration list of all tiles that would have been affected by the update