Geoservices activities at EDINA (or Why the Elephant is your Friend)
About EDINA National Data Centre
• A designated National Data Centre for Tertiary Education since 1995
• Based at The University of Edinburgh
• Our mission: to enhance the productivity of research, learning and teaching in UK higher and further education by delivering access to a range of online data services through a UK academic infrastructure, as well as supporting knowledge exchange and ICT capacity building, nationally and internationally
• Focus is on service, but we also undertake R&D
• History
  – first online GI service, UKBORDERS, launched in 1994
  – flagship Digimap service now a teenager!
  – substantial experience in handling geospatial data on a large scale (large databases; large user base)
The Geoservices Team
• Largest team within EDINA
• Highly experienced and skilled team
  – provides advice nationally and internationally
  – active in standards development and policy
  – active in the GI community nationally and internationally
• The demands of the services offered mean the team has been at the leading edge of GI service development in the UK
[Figure: the team's balance of Projects and Services, from 1999 to today]
Our service requirements
• Fast servicing of requests
• Scalable and extensible: accommodates steady or increasing demand
• Robust (our SLA aspires to 99% uptime!)
• Maintainable
• Standardised: components can easily be substituted for repair, upgrade, etc.
• Rapid prototyping and rollout
• All of the above on a tight budget!
What do we use Postgres/PostGIS for?
• Service operation and management
• Map creation
  – data store for vector-based maps
  – indexing service for raster-based maps
  – source for WMS GetFeatureInfo queries
• Data delivery
  – data store for vector products
• Searching/querying
  – advanced place-name searching
… for service operation and management
• Store service-critical metadata
• User data
• Control user access
• Log activity
Case study: Digimap
• Approx. 50,000 active users at any point in time
• Academic year 2010/11 statistics:
  – c. 400,000 logins
  – over 10 million maps created
  – 240,000 high-quality print maps generated
  – 100,000 data download requests
  – over 1 million data files downloaded
… as a ‘Data Store’ for mapping
• From the (very) large
• Ordnance Survey’s MasterMap (in EDINA’s map schema)

  Layer      Data rows     Data size   Index size
  Area       107,293,931   49 GB       13 GB
  Lines      278,110,576   73 GB       24 GB
  Boundary       535,039   321 MB      46 MB
  Points       3,984,140   668 MB      399 MB
  Symbols      2,793,680   522 MB      236 MB
  Text        21,004,729   4 GB        1.7 GB
… as a ‘Data Store’ for mapping
• … via the small but cartographically complex
• Ordnance Survey’s Strategi
  – only 778,000 rows
  – range of geometries
  – strict layer draw order
  – over 50 layers, many drawn multiple times
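A strict draw order in which some layers are drawn more than once can be modelled as a list of render passes sorted by a draw-order key. The minimal sketch below uses hypothetical layer names, styles and order values. It illustrates the idea only and is not EDINA's Strategi configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RenderPass:
    layer: str       # hypothetical source layer name
    style: str       # symbolisation applied in this pass
    draw_order: int  # lower values are drawn first, i.e. further down the stack


def draw_sequence(passes):
    """Return (layer, style) pairs in strict draw order.

    sorted() is stable, so passes that share a draw_order keep their listed order.
    """
    return [(p.layer, p.style) for p in sorted(passes, key=lambda p: p.draw_order)]


# Roads are drawn twice: a wide casing first, then a narrower fill on top of it.
passes = [
    RenderPass("roads", "fill", 30),
    RenderPass("land", "area", 0),
    RenderPass("roads", "casing", 20),
    RenderPass("rivers", "line", 10),
]

for layer, style in draw_sequence(passes):
    print(layer, style)
```

Keeping the draw order as explicit data makes a 50-plus-layer stack easier to check and change. The alternative is relying on the order in which layers happen to be defined.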
… as a ‘Data Store’ for mapping
• … to the complex data schema
• SeaZone’s Hydrospatial
  – large range of features
  – complex feature relationships
  – scale control for individual layers
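Per-layer scale control means each layer is drawn only within a range of scale denominators. This is similar in spirit to MapServer's MINSCALEDENOM/MAXSCALEDENOM layer settings. The minimal Python sketch below makes three assumptions:
- the Hydrospatial-style layer names are hypothetical;
- the thresholds are hypothetical;
- the range is inclusive (MapServer's exact boundary handling may differ).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScaleRange:
    name: str
    min_denom: Optional[float] = None  # hide when zoomed in beyond 1:min_denom
    max_denom: Optional[float] = None  # hide when zoomed out beyond 1:max_denom


def visible_layers(layers, scale_denominator):
    """Names of layers whose (inclusive) scale range contains 1:scale_denominator."""
    visible = []
    for layer in layers:
        if layer.min_denom is not None and scale_denominator < layer.min_denom:
            continue  # map is too detailed (zoomed in) for this layer
        if layer.max_denom is not None and scale_denominator > layer.max_denom:
            continue  # map is too coarse (zoomed out) for this layer
        visible.append(layer.name)
    return visible


layers = [
    ScaleRange("coastline"),                            # drawn at every scale
    ScaleRange("depth_contours", max_denom=250_000),
    ScaleRange("wrecks", max_denom=50_000),
    ScaleRange("overview_shading", min_denom=500_000),
]
print(visible_layers(layers, 1_000_000))  # ['coastline', 'overview_shading']
print(visible_layers(layers, 25_000))     # ['coastline', 'depth_contours', 'wrecks']
```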
… as a ‘Spatial Indexing’ system
• Spatial index for 1.4 million historical maps of Great Britain
• Covers the late 1840s to the early 1990s
• Complex file structure, reflecting
  – the original capture: counties, towns, editions, scale
  – and the digitisation process
• … but not, critically, TIME
• However, for historical data the temporal availability was critical
• Using date information in addition to the spatial index allows maps to be placed in the correct time slot
  – publication date was used because survey-date metadata is missing
  – an example of a MapServer layer definition for 1890s maps:

    area from (
      select * from historic.ancient_roam_tiles b,
        (select county, max(edition) as edition2, a.sheet_no
         from historic.ancient_roam_tiles a,
           (select max(version) as max_version, sheet_no
            from historic.ancient_roam_tiles
            where (1890 between (cast((substr(cast(publish_year_start as varchar),1,3)) as int)*10)
                            and (cast((substr(cast(publish_year_end as varchar),1,3)) as int)*10))
              and (scale = 10000 or scale = 10560)
              and (version = 'ng' or version = 'cs_ng')
              and st_setsrid(!BOX!, 27700) && area
            group by sheet_no) as selection
         where a.version = selection.max_version
           and a.sheet_no = selection.sheet_no
           and (1890 between (cast((substr(cast(publish_year_start as varchar),1,3)) as int)*10)
                         and (cast((substr(cast(publish_year_end as varchar),1,3)) as int)*10))
           and (scale = 10000 or scale = 10560)
         group by a.sheet_no, county) as sheet_group
      where b.sheet_no = sheet_group.sheet_no
        and b.county = sheet_group.county
        and (1890 between (cast((substr(cast(publish_year_start as varchar),1,3)) as int)*10)
                      and (cast((substr(cast(publish_year_end as varchar),1,3)) as int)*10))
        and (scale = 10000 or scale = 10560)
        and b.edition = sheet_group.edition2
    ) as subq using unique id using SRID=27700
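The nested query above does three things:
1. It keeps tiles whose publication-decade range includes 1890.
2. Among tiles intersecting the map view, it picks the highest version per sheet.
3. It then keeps the highest edition per sheet and county.

The minimal Python sketch below mirrors that selection for illustration. It assumes made-up tile records named after the query's columns, and it treats ng and cs_ng as string values. It is not EDINA's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    """One scanned map tile; field names follow the columns used in the query."""
    id: int
    sheet_no: str
    county: str
    edition: int
    version: str
    scale: int
    publish_year_start: int
    publish_year_end: int
    bbox: tuple  # (minx, miny, maxx, maxy) in British National Grid metres


def decade(year):
    """Same as cast(substr(cast(year as varchar),1,3) as int)*10, e.g. 1893 -> 1890."""
    return int(str(year)[:3]) * 10


def in_decade(tile, target):
    return decade(tile.publish_year_start) <= target <= decade(tile.publish_year_end)


def right_scale(tile):
    return tile.scale in (10000, 10560)


def overlaps(a, b):
    """Bounding-box intersection, the test PostGIS's && operator performs."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def tiles_for_decade(tiles, target, view_box):
    # 1. Highest version per sheet, among candidate tiles that intersect the view.
    max_version = {}
    for t in tiles:
        if (in_decade(t, target) and right_scale(t)
                and t.version in ("ng", "cs_ng") and overlaps(t.bbox, view_box)):
            max_version[t.sheet_no] = max(max_version.get(t.sheet_no, t.version), t.version)
    # 2. Highest edition per (sheet, county) among tiles of that version.
    max_edition = {}
    for t in tiles:
        if max_version.get(t.sheet_no) == t.version and in_decade(t, target) and right_scale(t):
            key = (t.sheet_no, t.county)
            max_edition[key] = max(max_edition.get(key, t.edition), t.edition)
    # 3. All tiles of that sheet, county and edition in the target decade.
    return [t for t in tiles
            if max_edition.get((t.sheet_no, t.county)) == t.edition
            and in_decade(t, target) and right_scale(t)]


# Made-up records: sheet 12 has 1870s and 1890s editions; sheet 40 lies outside the view.
tiles = [
    Tile(1, "12", "Kent", 1, "cs_ng", 10560, 1871, 1875, (0, 0, 10, 10)),
    Tile(2, "12", "Kent", 2, "cs_ng", 10560, 1893, 1896, (0, 0, 10, 10)),
    Tile(3, "12", "Kent", 3, "cs_ng", 10560, 1897, 1899, (0, 0, 10, 10)),
    Tile(4, "40", "Kent", 1, "cs_ng", 10560, 1890, 1892, (50, 50, 60, 60)),
]
view = (0, 0, 20, 20)
print([t.id for t in tiles_for_decade(tiles, 1890, view)])  # [3]
```

How the example resolves:
- Sheet 12 has two 1890s editions, and only the later one (tile 3) is kept.
- The 1870s tile falls outside the target decade and is dropped.
- The tile outside the view box is dropped.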
• Easy to use with a range of map rendering software, for example:
  – OS Strategi (Cadcorp GeognoSIS)
  – OS Open Data: Panorama and VectorMap District products, plus grid lines and labels (MapServer)
… for WMS GetFeatureInfo
• Easy to provide information about the selected feature
• Allows additional search parameters, for example proximity to the point clicked
• Access to additional metadata tables for search (especially useful for point data)
[Figures: example of proximity information; map sheet information stored in metadata tables; bedrock information with the selected area highlighted]
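Proximity to the clicked point can be expressed as "features within a tolerance, nearest first". In PostGIS this is commonly written with ST_DWithin and an ORDER BY on ST_Distance. The sketch below shows the same logic in plain Python, using hypothetical point features with British National Grid coordinates in metres.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PointFeature:
    name: str
    x: float  # easting, metres
    y: float  # northing, metres


def features_near_click(features, click_x, click_y, tolerance_m, limit=5):
    """(distance, name) for point features within tolerance_m of the click, nearest first."""
    hits = []
    for f in features:
        d = math.hypot(f.x - click_x, f.y - click_y)
        if d <= tolerance_m:
            hits.append((round(d, 1), f.name))
    hits.sort()
    return hits[:limit]


features = [
    PointFeature("Borehole A", 325100, 673200),
    PointFeature("Borehole B", 325400, 673600),
    PointFeature("Quarry", 326900, 674900),
]
print(features_near_click(features, 325000, 673000, tolerance_m=1000))
```

With a 1,000 m tolerance, the two boreholes are returned at about 224 m and 721 m. The quarry, roughly 2.7 km away, is left out.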
… to update interfaces to reflect the current map
• Legend shows only the rock types in the area (over 1,000 in the full legend)
• Timeline highlights the selected decade as well as other available decades
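Filtering the legend and timeline to the current map amounts to collecting the rock types, or decades, present in the view. The minimal sketch below assumes hypothetical rock-type codes and uses simple bounding boxes in place of real geometries.

```python
def overlaps(a, b):
    """True if boxes (minx, miny, maxx, maxy) intersect, as PostGIS's && tests."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def legend_for_extent(features, legend, view_box):
    """Legend entries, in legend order, for rock types with a feature in view_box."""
    present = {code for code, bbox in features if overlaps(bbox, view_box)}
    return [(code, label) for code, label in legend if code in present]


def timeline(available_decades, selected):
    """Sorted (decade, is_selected) pairs for drawing a decade timeline."""
    return [(d, d == selected) for d in sorted(set(available_decades))]


# Hypothetical rock-type codes and bounding boxes.
features = [("SST", (0, 0, 10, 10)), ("LST", (5, 5, 15, 15)), ("GRN", (100, 100, 110, 110))]
legend = [("GRN", "Granite"), ("LST", "Limestone"), ("MDST", "Mudstone"), ("SST", "Sandstone")]

print(legend_for_extent(features, legend, (0, 0, 20, 20)))
print(timeline([1890, 1870, 1890, 1910], selected=1890))
```

Walking the full legend in order, rather than the features, keeps the filtered legend in the same order as the full one.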
… as a ‘Data Store’ for download
• UKBORDERS provides bespoke extraction of vector boundary data in custom formats (Shapefile, MIF, KML, DXF)
• Real-time extraction uses GeoServer over PostGIS as a WFS, piped through FME
• Metamodel built around PostGIS (formerly Oracle); the migration resulted in a more scalable system (multiple dev/live/failover instances) with easier desktop prototyping
• OpenBoundaries: same engine, different data (all derived from OS Open Data) and skin
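Real-time extraction like this starts with a WFS GetFeature request for the user's area of interest; FME then converts the response. The sketch below builds such a request with the standard WFS 1.1.0 key-value parameters. The endpoint and layer name are hypothetical, and the FME step is not shown.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def wfs_getfeature_url(base_url, type_name, bbox, srs="EPSG:27700", output_format=None):
    """Build a WFS 1.1.0 GetFeature KVP request restricted to a bounding box.

    bbox is (minx, miny, maxx, maxy) in the units of srs. If output_format is None
    the parameter is omitted and the server returns its default (GML for WFS 1.1.0).
    """
    minx, miny, maxx, maxy = bbox
    params = {
        "service": "WFS",
        "version": "1.1.0",
        "request": "GetFeature",
        "typeName": type_name,
        "srsName": srs,
        # In WFS 1.1.0 the CRS of the box may be appended as a fifth value.
        "bbox": f"{minx},{miny},{maxx},{maxy},{srs}",
    }
    if output_format is not None:
        params["outputFormat"] = output_format
    return f"{base_url}?{urlencode(params)}"


url = wfs_getfeature_url(
    "https://example.org/geoserver/wfs",  # hypothetical endpoint
    "ukborders:district_boundaries",      # hypothetical layer name
    (400000, 300000, 410000, 310000),
)
query = parse_qs(urlparse(url).query)
print(url)
print(query["bbox"])
```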
… for querying
• Unlock provides an Application Programming Interface (API) for querying over 11 million geographic names across a variety of gazetteers:
  – GeoNames (world coverage)
  – Pleiades ancient place names (world coverage)
  – Natural Earth (world coverage)
  – OS products (UK coverage): 1:50,000 Placename Gazetteer, Meridian 2, Boundary-Line, BN Grid references
• Placename outlines and attribution extracted from mapping data or published gazetteers
• Outlines are a unique service feature, enabling further spatial data extraction and analysis
• Unlock Places makes extensive use of stored database procedures for:
  – writing dynamic queries
  – complex data filtering and parsing
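Dynamic queries of this kind typically build a WHERE clause from whichever filters the caller supplied, while keeping values as bound parameters. Unlock does this in stored procedures; the Python sketch below only illustrates the pattern. The table gazetteer_names, its columns and build_name_query are all hypothetical, and the %s placeholders follow the psycopg2 convention.

```python
def build_name_query(name_pattern, gazetteers=None, feature_type=None, bbox=None, limit=20):
    """Assemble a parameterised place-name query from optional filters.

    Returns (sql, params). Only fixed SQL fragments are concatenated; every
    caller-supplied value goes into params rather than into the SQL text.
    """
    clauses = ["name ILIKE %s"]
    params = [name_pattern]
    if gazetteers:
        clauses.append("gazetteer = ANY(%s)")  # psycopg2 adapts a Python list to an array
        params.append(list(gazetteers))
    if feature_type:
        clauses.append("feature_type = %s")
        params.append(feature_type)
    if bbox:
        # (minx, miny, maxx, maxy) in WGS84; && is PostGIS's bounding-box overlap test.
        clauses.append("geom && ST_MakeEnvelope(%s, %s, %s, %s, 4326)")
        params.extend(bbox)
    sql = (
        "SELECT name, gazetteer, feature_type FROM gazetteer_names WHERE "
        + " AND ".join(clauses)
        + " ORDER BY name LIMIT %s"
    )
    params.append(limit)
    return sql, params


sql, params = build_name_query("Southampton%", gazetteers=["geonames", "os_50k"])
print(sql)
print(params)
```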
Outline of Southampton returned by Unlock Places
How do we use Postgres/PostGIS to best effect?
• Ensure data schemas are determined by functionality
  – do NOT accept defaults from loaders
  – use INTs for primary selection attributes
• Tailor data processing to the task
  – for mapping, do NOT include non-mapped features or attributes
• Indexes are your friend
  – ensure all search attributes are indexed
• Clustered indexes are your best pal
  – critical for our mapping schemas
• Bad or unnecessary indexes are your worst enemy
  – they can cause severe slowdown, resulting in a bad user experience
  – make use of EXPLAIN
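Checking whether a query actually uses an index is what EXPLAIN is for. To keep the example runnable with only the Python standard library, it uses SQLite's EXPLAIN QUERY PLAN. SQLite has a different planner from PostgreSQL, and its output format differs from PostgreSQL's EXPLAIN. The table and index names are hypothetical.

In PostgreSQL the equivalent check is to look for an Index Scan rather than a Seq Scan in the EXPLAIN output. PostgreSQL's CLUSTER command physically reorders a table by an index.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN for sql."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
# theme is an INTEGER code, following the "use INTs for selection attributes" advice.
conn.execute("CREATE TABLE features (id INTEGER PRIMARY KEY, theme INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO features (theme, name) VALUES (?, ?)",
    [(i % 50, f"feature {i}") for i in range(10_000)],
)

query = "SELECT name FROM features WHERE theme = ?"
before = plan(conn, query, (7,))  # no index yet: expect a full-table SCAN
conn.execute("CREATE INDEX features_theme_idx ON features (theme)")
after = plan(conn, query, (7,))   # expect a SEARCH using features_theme_idx

print("before:", before)
print("after: ", after)
```

The exact wording of the plan text varies between SQLite versions. What matters is whether the named index appears in it.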
• Hide internal complexity behind database views; this makes applications more portable
• Use schemas to roll out data updates (just set the search path to look in the new default schema); this also makes rolling back to a previous data version easy
• Take advantage of stored procedures
  – if SQL is hidden in application code, it may be impossible to roll out changes instantly, because the application must be recompiled and redeployed, and downtime might be required
  – by storing SQL within procedures, changes become immediate and more seamless
• Use built-in data replication per instance: feel more protected from bad luck!
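The schema-based rollout relies on how PostgreSQL resolves unqualified table names: the first schema on search_path that contains the table wins. The toy model below mimics that rule in Python; the schema names os_2011 and os_2012 are hypothetical. In PostgreSQL itself the switch would be a statement such as SET search_path TO os_2012, public. For a persistent default, ALTER ROLE with SET search_path can be used instead.

```python
class Catalog:
    """Toy model of PostgreSQL's rule for unqualified table names: the first
    schema on the search path that contains the table is used."""

    def __init__(self):
        self.schemas = {}      # schema name -> {table name -> contents}
        self.search_path = []  # ordered list of schema names

    def create_table(self, schema, table, contents):
        self.schemas.setdefault(schema, {})[table] = contents

    def resolve(self, table):
        """Return (qualified name, contents) for an unqualified table name."""
        for schema in self.search_path:
            tables = self.schemas.get(schema, {})
            if table in tables:
                return f"{schema}.{table}", tables[table]
        raise LookupError(f"table {table!r} not found on search path {self.search_path}")


catalog = Catalog()
catalog.create_table("os_2011", "roads", "2011 release")
catalog.create_table("os_2012", "roads", "2012 release")
catalog.create_table("public", "users", "service data")

catalog.search_path = ["os_2012", "public"]  # roll out the new release
print(catalog.resolve("roads"))               # ('os_2012.roads', '2012 release')
print(catalog.resolve("users"))               # ('public.users', 'service data')

catalog.search_path = ["os_2011", "public"]  # roll back
print(catalog.resolve("roads"))               # ('os_2011.roads', '2011 release')
```

The application keeps querying plain "roads" throughout. Only the search path changes, which is why both rollout and rollback need no code change.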
What we like about Postgres/PostGIS
• Reliable
• Performant
• Scalable
• Easier replication
• Standards compliant
• Comes with good tools
• Superb third-party support
… and the elephants …
The future: what are we planning?
• Migrating to Postgres 9.1
  – currently we have a mix of 8.3 and 8.4 installs
  – take advantage of new functionality and bug fixes
• Exploring the new functionality in PostGIS 2.0 to enhance existing services and possibly create new ones
  – raster capabilities
  – topology
  – generalisation with topological consistency constraints: all input features are present after generalisation, with no overlaps or new slivers introduced
[Figure: highly generalised Census 2001 OAs (Output Areas) in Nottingham]
Conclusion
• Postgres and PostGIS have been used to power EDINA geo-services for over 8 years
• In late 2011 the last major service was migrated
• All geo-services (and some non-geo ones!) at EDINA rely on Postgres/PostGIS as either the sole or the principal database
• It will continue to form the core of our services for the foreseeable future
• The elephant is our friend; it could certainly be yours!