CensusGIV - Geographic Information Visualisation of Census Data
1. UCL DEPARTMENT OF GEOGRAPHY
CensusGIV
Geographic Information Visualisation of Census Data
CASA Seminar
9 December 2009
Pablo Mateos
Oliver O’Brien
Department of Geography
University College London
www.censusprofiler.org
2. UCL DEPARTMENT OF GEOGRAPHY
Contents
• Context & Justification
• CensusGIV Aims &
Objectives
• Design Considerations
• System Architecture
• Demo
4. UCL DEPARTMENT OF GEOGRAPHY
The Generation
• Those born after 1993 have only known life with
the Internet
– A generation whose first port of call for knowledge is the
internet through Google’s search engine, as opposed to
books, libraries or traditional (off-line) information
sources
(CIBER, 2008)
8. UCL DEPARTMENT OF GEOGRAPHY
Geoweb 2.0 in Teaching
UCL Geography
undergraduate field
course in London
9. UCL DEPARTMENT OF GEOGRAPHY
Geovisualisation (GVis)
• Refers to the visual representation of spatial data.
• GVis as a research tool is used for:
– Hypotheses generation, knowledge discovery, analysis,
presentation and evaluation (Buckley, 2000)
• Increasing realisation of the potential for ‘geography’ to
provide the primary basis for innovative visualisation and
knowledge exploration
(Dodge, McDerby and Turner, 2006)
• Recognised potential of GVis
– To make sense of increasingly large datasets
– Produce alternative representations of space
10. UCL DEPARTMENT OF GEOGRAPHY
Current Census Thematic Maps by ONS
• Neighbourhood
Statistics
(NeSS)
– 11 steps to
view a census
thematic map!
• Mapping in
CASWEB
– not present
15. UCL DEPARTMENT OF GEOGRAPHY
CensusGIV: Objectives
1. Develop a prototype to provide innovative
geographical visualisation of the Census small
area statistics datasets.
2. Provide an extensive technical evaluation of the
different technological alternatives.
3. Propose scaling up to a full service in 2011.
4. Promote the use of innovative geographic
visualisation of population datasets using
mapping mashups.
16. UCL DEPARTMENT OF GEOGRAPHY
CensusGIV: Plan
• ESRC Census Development grant £80,000
• Timeframe: 15 months (2009/10)
• Develop a Geovisualisation prototype of the UK
2011 Census using “Geoweb 2.0” technologies
• Mapping mashups based on data feeds from an
ONS “Census hypercube” or NeSS data stream
17. UCL DEPARTMENT OF GEOGRAPHY
People
• UCL Geography
– Pablo Mateos (P.I.)
– Paul Longley (co-P.I.)
– Oliver O’Brien
• UCL CASA
– Mike Batty (co-P.I.)
– Richard Milton (consultant)
• User Panel
– Jointly with EDINA DIaD project
18. UCL DEPARTMENT OF GEOGRAPHY
CensusGIV: Requirements and Issues
• User not faced with queries or complex questions!
– Start with a map (e.g. population density)
– Automatic scale-determined geographical units
– Base map backdrop
• Available to the general public & “mashable”
• Issues:
– Intellectual Property Rights
• Geographic boundaries & Census datasets
– Data size: Over 3,000 Census variables x 300k geog units
– Managing a large number of concurrent users
19. UCL DEPARTMENT OF GEOGRAPHY
Evaluation Criteria for Final Solution
• Scalability
• Response time
• Maximum number of concurrent users
• Data storage and retrieval
• Flexibility of geovisualisation options
• Ease of use and simplicity
• Intellectual Property Rights (IPR) issues
• Cost of development and implementation
20. UCL DEPARTMENT OF GEOGRAPHY
Geovisualisation Prototypes
• Different technologies have been explored:
– WMS/ WFS
– Adobe Flash (Flex) vector maps
– SVG vector maps
– KML vector maps with Google Maps API
– Raster maps with OpenLayers
21. UCL DEPARTMENT OF GEOGRAPHY
CensusGIV: Timeline
• October 2008 – February 2009
– Evaluation phase (completed)
• October 2009 – June 2010
– Developing prototype
• Trade-offs to be made between:
– response time, storage space, concurrent users, IPR protection,
ease of navigability, flexible visualisation, back-end/front-end
solutions, cost
• First version of prototype to be tested this month
• ONS / Census Programme to decide full implementation
for 2011 Census
23. UCL DEPARTMENT OF GEOGRAPHY
Fundamental Design Decisions
• Server-based rasters
– Faster on the client side
– Fast enough on the server side
– Not delivering restricted data to the client
• Open Source software
– Leverage the powerful OpenLayers mapping API
– More powerful than Google Maps API
– An active development community
– Full access to the source – can do “cool stuff”
• “Slippy” map
– Intuitive
– Encourages exploration
24. UCL DEPARTMENT OF GEOGRAPHY
Maps of Population Data
• Cartograms
– Fairer
representation
– Multiple variables
can be shown
together
• Choropleth Maps
– Easier to relate to
• Surface mapping
– Interpolation
25. UCL DEPARTMENT OF GEOGRAPHY
Accessing the Census Data
• Neighbourhood Statistics
– Hunter vs Gatherer
– NeSS Data API (SOAP)
– CSV Downloads
• Still tedious – for each UV:
– Download files for each GOR
– Stitch them together (has been automated)
– Create corresponding tables in the database, add data
– Add ranking scores
– Add metadata
– NeSS Data API (REST) coming February 2010
• CASWEB
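The stitching and ranking steps listed for the NeSS CSV downloads could be sketched roughly as follows. This is a minimal illustration, not the project's own script: the column names (`area_code`, `value`), the sample rows and the function names are all hypothetical, and a real run would read the per-GOR CSV files rather than in-memory strings.

```python
import csv
import io

def stitch_regions(csv_texts):
    """Concatenate per-region CSV extracts (same header) into one list of rows."""
    rows = []
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            rows.append({"area_code": row["area_code"], "value": float(row["value"])})
    return rows

def add_rank(rows):
    """Add a 1-based rank by descending value; ties keep their input order."""
    ordered = sorted(rows, key=lambda r: r["value"], reverse=True)
    for rank, row in enumerate(ordered, start=1):
        row["rank"] = rank
    return ordered

# Hypothetical extracts for two regions (made-up codes and values).
region_a = "area_code,value\nA1,12.5\nA2,40.0\n"
region_b = "area_code,value\nB1,27.0\n"

ranked = add_rank(stitch_regions([region_a, region_b]))
for row in ranked:
    print(row["rank"], row["area_code"], row["value"])
```

The ranked rows would then be loaded into the database table for that variable, alongside its metadata.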
26. UCL DEPARTMENT OF GEOGRAPHY
Structure of The Web-App
• OpenLayers “slippy” map
– Fully opaque grey base layer
• Could be switched for aerial
imagery from Google/Microsoft
– Opaque choropleth overlay
• Variable translucency if aerial
imagery underneath
– Context overlay
• Points, lines and names
• Sea area in lighter grey
• Otherwise transparent
– POIs
• e.g. schools, hospitals
• SVG vectors rather than tiles
• “Clickable”
28. UCL DEPARTMENT OF GEOGRAPHY
Why a Custom Context Layer?
• Having full control is a definite
advantage
– Underlay
• Google colours/features can clash with
choropleths
• Lose the context if choropleth is fully
opaque
– Overlay
• Google labels can obscure information
• Google’s cartography recently changed (for
the better)
• But no control over future changes
29. UCL DEPARTMENT OF GEOGRAPHY
Cartography of the Context Layer
• Difficult to get right
– Urban vs Rural
• Strictly Black & White
• Few point features
– Hospitals, airports, place names
• Fewer areal features
– Lakes, sea
• Mainly a network of roads/rivers/railways
• Less is more
30. UCL DEPARTMENT OF GEOGRAPHY
Creating the Context Layer
• PostGIS database
– Using the OpenStreetMap dataset for the UK
– Relatively slow to create the images from the data
• ~50 database queries for each image tile
• Higher zoom levels have tiles with smaller extent, but we include more
detail at these levels, which cancels out the speed increase
– Render on demand “unimportant” tiles at zoom levels 16-18
– Pre-render everything else
• Painter’s Algorithm
31. UCL DEPARTMENT OF GEOGRAPHY
Painter’s Algorithm
• Two hierarchies of layering
– Feature-based layering
• Land, water, road/railway casings & cores, place names
– Intra-feature level z-ordering
• Complex road junctions
• Railway/road crossing
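One reading of the two hierarchies is a two-level sort: features are drawn back to front by feature class, and within a class by a z-order value. The sketch below assumes that reading. The feature records, the `z` field and the `LAYER_ORDER` names are hypothetical, and a real renderer would draw geometries rather than print names.

```python
# Feature classes from back to front, following the list on the slide.
LAYER_ORDER = ["land", "water", "casing", "core", "place_name"]

def paint_order(features):
    """Return features sorted back to front: by feature class, then by z-order."""
    return sorted(features, key=lambda f: (LAYER_ORDER.index(f["layer"]), f["z"]))

# Hypothetical features: a road bridge (z=1) crossing a railway (z=0).
features = [
    {"name": "bridge core", "layer": "core", "z": 1},
    {"name": "town label", "layer": "place_name", "z": 0},
    {"name": "railway core", "layer": "core", "z": 0},
    {"name": "lake", "layer": "water", "z": 0},
    {"name": "bridge casing", "layer": "casing", "z": 1},
    {"name": "land", "layer": "land", "z": 0},
]

for feature in paint_order(features):
    print(feature["name"])  # each later feature is painted over the earlier ones
```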
32. UCL DEPARTMENT OF GEOGRAPHY
Pre-rendering of Context Layer
• Rendered on “gibin”, a quad-core computer running Linux
• Utilising the Python “threading” module – 4 tiles created at once
• The image “tiles” are PNGs with an alpha layer
• Bounding box: 10.7° W to 1.8° E, 49.8° N to 60.9° N (all of the UK)
Zoom Level  Scale       No of Tiles  Size    Detail                               Time
6-9         < 1:1M      790          5 MB    Cities, motorways                    < 1 min
10          1:600,000   2,146        15 MB   + towns, trunk roads, lakes          1 min
11          1:300,000   8,208        40 MB   + main roads, rivers, airfields      2 min
12          1:150,000   32,318       156 MB  + minor roads, railways, villages    7 min
13          1:72,000    128,250      500 MB  + main road, water & area names      24 min
14          1:36,000    510,962      1.4 GB  + paths                              1 h 28 min
15          1:18,000    2,041,572    4.4 GB  + minor road names                   5 h 34 min
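Tile counts of this order follow from the standard slippy-map tiling of the bounding box, and the four-at-a-time rendering maps naturally onto a pool of four workers. The sketch below is an illustration under assumptions: it uses `concurrent.futures.ThreadPoolExecutor` as a stand-in for the threading code mentioned on the slide, `render_tile` is a hypothetical placeholder, and the counts it produces may differ slightly from the table depending on how the box edges are rounded.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def lonlat_to_tile(lon, lat, zoom):
    """Standard slippy-map (spherical Mercator) tile indices for a point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the eastern or southern edge stay in range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)

def tiles_in_bbox(west, south, east, north, zoom):
    """List every (zoom, x, y) tile touching the bounding box."""
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # y grows southwards
    x_max, y_max = lonlat_to_tile(east, south, zoom)
    return [(zoom, x, y)
            for x in range(x_min, x_max + 1)
            for y in range(y_min, y_max + 1)]

def render_tile(tile):
    """Hypothetical placeholder for the renderer; returns the tile's file path."""
    zoom, x, y = tile
    return f"{zoom}/{x}/{y}.png"

# Bounding box from the slide; zoom 6 keeps the example small.
tiles = tiles_in_bbox(-10.7, 49.8, 1.8, 60.9, 6)

# Four workers, mirroring "4 tiles created at once" on a quad-core machine.
with ThreadPoolExecutor(max_workers=4) as pool:
    paths = list(pool.map(render_tile, tiles))

print(len(tiles), "tiles, first:", paths[0])
```

Each extra zoom level roughly quadruples the number of tiles, which matches the growth in the tile counts and rendering times listed for levels 10 to 15.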
34. UCL DEPARTMENT OF GEOGRAPHY
The Context Layer (Levels 12-17)
[Figure: context layer tiles at zoom levels 12-17; two of the panels are labelled “On Demand”]
35. UCL DEPARTMENT OF GEOGRAPHY
Creating the Choropleth Layers
• PostGIS database of census data
• Would never want to pre-render all the choropleths at all zoom levels
– 1000+ metrics × 10 groupings × 30 colour schemes × 2 colour orders × 13
zoom levels × 1,000s of tiles per zoom level
– Makes sense to cache most popular zoom levels, metrics, colours
– Most people will never “explore” the map at a greater zoom level - usage
decreases exponentially with the number of clicks in a web app.
• Specially crafted URL
– Boundary Table, Data Table, Metric
– Bounding Box, Zoom
– Colour Scheme, No of Groups
– Range Type, Range Attributes
• Min/Max
• Average/Deviation
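A tile URL carrying the parameters listed above might be built and parsed along these lines. This is a sketch only: the host, path, parameter names and example values are hypothetical and are not taken from the prototype.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def build_tile_url(base, **params):
    """Encode the tile request parameters as a query string."""
    return base + "?" + urlencode(sorted(params.items()))

def parse_tile_url(url):
    """Recover the parameters on the server side (all values are strings)."""
    query = parse_qs(urlparse(url).query)
    return {key: values[0] for key, values in query.items()}

url = build_tile_url(
    "http://tiler.example.org/choropleth",             # hypothetical host and path
    boundary="oa", table="uv_example", metric="m1",    # boundary table, data table, metric
    bbox="-0.2,51.4,0.0,51.6", zoom=12,                # bounding box, zoom
    scheme="Blues", groups=5,                          # colour scheme, number of groups
    range_type="minmax", range_min=0, range_max=100,   # range type and its attributes
)
print(url)

params = parse_tile_url(url)
print(params["metric"], params["zoom"], params["bbox"])
```

Putting every styling choice in the URL means each distinct URL identifies exactly one image, which is what makes caching the popular combinations straightforward.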
36. UCL DEPARTMENT OF GEOGRAPHY
The Modifiable Areal Unit Problem
Boundary Type  Level      Average No of Vertices (Simplified)  Number
MSOA           6, 7, 8    145                                  7,196 (Eng & Wal)
LSOA           9, 10, 11  62                                   40,884 (not N.I.)
OA             12 - 18    26                                   223,131
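Assuming the Level column refers to map zoom levels, the automatic choice of geographical unit reduces to a small lookup. The function name and error handling below are illustrative only.

```python
# Zoom-level ranges per boundary type, as listed in the table.
BOUNDARY_BY_ZOOM = [
    (range(6, 9), "MSOA"),    # levels 6-8
    (range(9, 12), "LSOA"),   # levels 9-11
    (range(12, 19), "OA"),    # levels 12-18
]

def boundary_for_zoom(zoom):
    """Return the boundary type to draw at a given zoom level."""
    for zooms, boundary in BOUNDARY_BY_ZOOM:
        if zoom in zooms:
            return boundary
    raise ValueError(f"no boundary type defined for zoom level {zoom}")

for zoom in (6, 10, 15):
    print(zoom, boundary_for_zoom(zoom))
```

Coarser units with more vertices per polygon are used when zoomed out, and finer units with fewer vertices when zoomed in, so the amount of geometry per tile stays manageable.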
41. UCL DEPARTMENT OF GEOGRAPHY
Colour Theory
• “practical guidance to colour mixing and the visual
impacts of specific colour combinations”
• Formal considerations
– Colour harmony (complementary colours – pink vs blue)
– Colour context (bright colours beside subdued colours)
– Colour blindness
• Very subjective
42. UCL DEPARTMENT OF GEOGRAPHY
Colour Considerations
• Colour should relate to data type:
– Sequential
– Diverging
– Qualitative
• The “most of the UK is countryside” problem
– Try not to use bright colours for the countryside.
• Colours carry connotations: Hot, Bad, High, Cold, Natural, Good, Neutral, Girls, Boys
44. UCL DEPARTMENT OF GEOGRAPHY
ColorBrewer
• Cynthia Brewer’s colorbrewer2.com
– Provides a set of “good” colour schemes which can be
incorporated easily into Python scripts, ArcMap, etc.
– Generally vary by hue and/or lightness
• Sequential
– Lightness should be varied, use analogous colours if varying hue
– Plenty of “good” maps that don’t follow this rule
• Diverging
– Mid-point should be a light colour
– Extremes should have darker colours with complementary hues
• Qualitative
– Hues should vary
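As a sketch of how a sequential scheme can be applied in a Python script, the block below splits a min/max range into equal-interval groups and maps each value to a colour. The hex values are intended to be ColorBrewer's 5-class "Blues" scheme and should be checked against colorbrewer2.com. The function names are hypothetical, and equal intervals are only one possible grouping.

```python
from bisect import bisect_right

# 5-class sequential "Blues" scheme, light to dark (verify on colorbrewer2.com).
BLUES_5 = ["#eff3ff", "#bdd7e7", "#6baed6", "#3182bd", "#08519c"]

def equal_interval_breaks(minimum, maximum, groups):
    """Break points that split [minimum, maximum] into equal-width groups."""
    step = (maximum - minimum) / groups
    return [minimum + step * i for i in range(1, groups)]

def colour_for(value, breaks, colours):
    """Pick the colour of the group that the value falls into."""
    return colours[bisect_right(breaks, value)]

breaks = equal_interval_breaks(0.0, 100.0, len(BLUES_5))
for value in (0, 20, 55, 100):
    print(value, colour_for(value, breaks, BLUES_5))
```

Reversing the list (`BLUES_5[::-1]`) gives the second colour order for the same scheme.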
45. UCL DEPARTMENT OF GEOGRAPHY
Aerial Imagery
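// OpenLayers 2 layer backed by Google satellite tiles (G_SATELLITE_MAP is a Google Maps API v2 constant)
layerAerial = new OpenLayers.Layer.Google("Aerial Imagery",
{numZoomLevels: 16, type: G_SATELLITE_MAP, sphericalMercator: true});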
• Very easy!
• OpenLayers
– Google Maps imagery layer
– Microsoft Virtual Earth layer
• Only useful when zoomed in
• Need to be mindful that
colour imagery interferes
with choropleth colours
• No longer self-contained
46. UCL DEPARTMENT OF GEOGRAPHY
Points of Interest (POIs)
• PostgreSQL (or MySQL) database
• Can be a completely separate server
• Client’s OpenLayers does the work
• Aim is to provide even more context
• School names & performance indicators
47. UCL DEPARTMENT OF GEOGRAPHY
User Interface
Less is more / Choice is good
• How do you get people to explore the maps?
– Maptube “visual directory”
– Hierarchical drop-down lists
– Tag cloud of keywords, maybe with a hierarchy
48. UCL DEPARTMENT OF GEOGRAPHY
User Interface – Tag Cloud
• Useful for exploring if you don’t know what you want
• More structured alternative needed for specific research
50. UCL DEPARTMENT OF GEOGRAPHY
A Note on Python
• If you don’t use it already, you will!
– ArcGIS 9.4
• “Python is now integrated directly into ArcMap [9.4]. I say it
every year, but if you are an ArcGIS Desktop user, you need to
take a close look at python as your scripting language.” - James
Fee
• The best thing about Python is:
– Tidy scripts!
51. UCL DEPARTMENT OF GEOGRAPHY
Servers
[Diagram: servers in the server room (dev, tba, tiler1, tiler2, tiler3, blog, pois, tiles1, tiles2, tiles3, www) serving web browsers]
52. UCL DEPARTMENT OF GEOGRAPHY
System Architecture – Website
[Diagram: Web browser and Apache (www)]
53. UCL DEPARTMENT OF GEOGRAPHY
System Architecture – Context
[Diagram: Web browser, Apache (www) and Apache (tiles2); Apache (tiles2) checks “Tile exists?”: Yes returns the tile, No returns a 404]
• No Python involved
– Less strain on the server
• Web browser may have to request image twice
– Slow for the client
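The trade-off on this slide can be sketched as a two-step fetch: a plain file lookup that answers 404 when a tile has not been pre-rendered, followed by a second request. Everything here is illustrative. The function names are hypothetical, the servers are simulated with local functions, and sending the second request to an on-demand renderer is an assumption about what happens after the 404.

```python
import os
import tempfile

def static_lookup(cache_dir, tile_path):
    """Mimic the static tile server: 200 with bytes if the file exists, else 404."""
    full_path = os.path.join(cache_dir, tile_path)
    if os.path.isfile(full_path):
        with open(full_path, "rb") as handle:
            return 200, handle.read()
    return 404, None

def render_on_demand(tile_path):
    """Stand-in for an on-demand renderer; returns placeholder bytes."""
    return b"rendered:" + tile_path.encode("utf-8")

def fetch_tile(cache_dir, tile_path):
    """Client view: try the static server first, then fall back after a 404."""
    status, body = static_lookup(cache_dir, tile_path)
    if status == 200:
        return body, 1                        # served in one request
    return render_on_demand(tile_path), 2     # second request after the 404

with tempfile.TemporaryDirectory() as cache_dir:
    os.makedirs(os.path.join(cache_dir, "10", "500"))
    with open(os.path.join(cache_dir, "10", "500", "340.png"), "wb") as handle:
        handle.write(b"pre-rendered")
    hit = fetch_tile(cache_dir, os.path.join("10", "500", "340.png"))
    miss = fetch_tile(cache_dir, os.path.join("17", "1", "1.png"))

print(hit, miss)
```

Pre-rendered tiles cost the server only a file read, while tiles that are missing cost the client an extra round trip.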
54. UCL DEPARTMENT OF GEOGRAPHY
System Architecture – Context XML
[Diagram: Web browser, Apache (www) and Apache (tiler2) with mod_python running Python (gen_tile.py, renderer.py); other labels: XML, Cache]
55. UCL DEPARTMENT OF GEOGRAPHY
System Architecture – Choropleth
[Diagram: Web browser, Apache (www) and Apache (tiler3) with mod_python running Python (gen_tile.py, renderer.py, Colorbrewer); Cache with a “Tile exists?” Yes/No check; other label: “Tile (low zoom)”]
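The choropleth path can be read as a render-or-reuse cache keyed on the request parameters. The sketch below assumes that only low-zoom tiles are written to the cache, which is one interpretation of the “Tile (low zoom)” label. The threshold, the function names and the placeholder renderer are all hypothetical.

```python
import hashlib
import os
import tempfile

CACHE_MAX_ZOOM = 11  # assumption: only tiles at or below this zoom are cached

def cache_key(params):
    """Stable file name derived from the full set of request parameters."""
    canonical = "&".join(f"{key}={params[key]}" for key in sorted(params))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest() + ".png"

def render(params):
    """Stand-in for the renderer; a real one would draw the choropleth tile."""
    return ("tile for " + params["metric"]).encode("utf-8")

def get_tile(params, cache_dir):
    """Return (tile bytes, came_from_cache), caching low-zoom tiles on first render."""
    path = os.path.join(cache_dir, cache_key(params))
    if os.path.isfile(path):
        with open(path, "rb") as handle:
            return handle.read(), True
    data = render(params)
    if int(params["zoom"]) <= CACHE_MAX_ZOOM:
        with open(path, "wb") as handle:
            handle.write(data)
    return data, False

with tempfile.TemporaryDirectory() as cache_dir:
    low = {"metric": "m1", "zoom": "8", "x": "120", "y": "80"}
    high = {"metric": "m1", "zoom": "15", "x": "16000", "y": "10000"}
    first = get_tile(low, cache_dir)[1]
    second = get_tile(low, cache_dir)[1]
    high_twice = (get_tile(high, cache_dir)[1], get_tile(high, cache_dir)[1])

print(first, second, high_twice)
```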
56. UCL DEPARTMENT OF GEOGRAPHY
Scalability
• OpenLayers allows
multiple servers to be
specified for retrieving
image tiles
• Different servers for
different tasks
• Random server chosen per-
tile
• So should scale?
• Process is still processor
intensive if generating the
tiles at the same time
• Stress testing needed
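Spreading tile requests over several servers can be sketched as below. Note the swap: the slide describes a random choice per tile, whereas this sketch picks the server from a checksum of the tile path. That spreads load in a similar way but always sends the same tile to the same server, which may suit per-server caches. The hostnames and function name are hypothetical, and this does not reproduce OpenLayers' own selection logic.

```python
import zlib

# Hypothetical hostnames for three tile servers.
TILE_SERVERS = [
    "http://tiles1.example.org",
    "http://tiles2.example.org",
    "http://tiles3.example.org",
]

def pick_server(tile_path, servers=TILE_SERVERS):
    """Choose a server from a checksum of the tile path, so the choice is repeatable."""
    return servers[zlib.crc32(tile_path.encode("utf-8")) % len(servers)]

for path in ("12/2046/1361.png", "12/2047/1361.png", "12/2048/1361.png"):
    print(path, "->", pick_server(path))
```

Either way, the rendering work per tile is unchanged, so stress testing is still needed to see how many simultaneous tile generations each server can sustain.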
57. UCL DEPARTMENT OF GEOGRAPHY
Prototype: Current State & Next Steps
Current state:
• On-demand tile generation
• Fast (enough)
• Will scale (hopefully!)
• OSM not quite “complete” but getting there
• Context layer finished
• Some data added
Next steps:
× Legend
× Automated data updates
× Tag cloud
× Improve cartography
× Internet Explorer 6 (www.savethedevelopers.org)
× Other census & ONS data
× Interactive data combination
× Scotland & Northern Ireland
× Points of Interest
Running until June 2010
58. UCL DEPARTMENT OF GEOGRAPHY
Live Demo
• http://www.censusprofiler.org/prototype/
60. UCL DEPARTMENT OF GEOGRAPHY
Q&A
www.censusprofiler.org
www.oliverobrien.co.uk
Google Street View and Google Earth POI data is Copyright Google. Google Maps mapping data is Copyright Tele Atlas. Google aerial
imagery is Copyright Digital Globe, Infoterra Ltd, Bluesky, GeoEye, Getmapping plc, The Geoinformation Group. OpenStreetMap data is
CC-BY-SA OpenStreetMap and contributors. Logos depicted are generally Copyright of their respective organisations. Some image tiles
include boundary information supplied by EDINA’s UKBORDERS service. The Census data is supplied by the Office for National Statistics.
The Word Cloud was produced with Wordle. The colour wheel diagrams are from worqx.com. The Painter’s Algorithm picture and the
HSL colour diagram are from Wikipedia. The cartogram was produced by James Cheshire. The corresponding choropleth was produced
by the BBC.
The following references were used in the first part of this presentation:
CIBER (2008) Information Behaviour of the Researcher of the Future. A report commissioned by the British Library and JISC, 11 January
2008. http://www.bl.uk/news/pdf/googlegen.pdf
Goodchild (2007) Citizens as Sensors: The World of Volunteered Geography. Workshop on Volunteered Geographic Information, Santa
Barbara, CA, December 13-14, 2007. http://www.ncgia.ucsb.edu/projects/vgi/docs/position/Goodchild_VGI2007.pdf
O’Reilly, T (2005) What Is Web 2.0: Design Patterns and Business Models for the Next Generation of Software.
http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/whatisWeb20.html
Turner, A (2007) Introduction to Neogeography. O’Reilly Media Short Cuts.