Open spatial data: sources and tools

Stuart Macdonald
EDINA & Data Library
University of Edinburgh

stuart.macdonald@ed.ac.uk



School of Informatics Data Hack for ILW – 18 Feb. 2012
EDINA & Data Library

EDINA and University Data Library (EDL)
together are a division within Information
Services of the University of Edinburgh.

EDINA is a JISC-funded National Data Centre
providing national online data resources for
education and research.

The Data Library assists Edinburgh University
users in the discovery, access, use and
management of research datasets.
Digimap started as a project under the eLib (Electronic Libraries)
Programme in 1996, offering Ordnance Survey maps to 6 trial
universities. The full service was launched in 2000.

Scoping exercise finding: 80% of maps are used by non-geographers.

The UK’s National Geospatial Data Framework (NGDF) estimates
that approximately 80%* of information collected in the UK today
is geo-referenced.

* Reid, J., 2002. geoXwalk – A Gazetteer Server and Service for UK Academia. IASSIST Quarterly, Vol. 26, Issue 3 - http://iassistdata.org/publications/iq/iq26/iqvol263reid.pdf
Slide courtesy of James Reid (EDINA, 2010) - http://prezi.com/n8ui3umrjxfh/survive-or-thrive/
“Open data is the idea that certain data
should be freely available to everyone
to use and republish as they wish,
without restrictions from copyright,
patents or other mechanisms of
control. The goals of the open data
movement are similar to those of other
"Open" movements such as open
source, open hardware, open content,
and open access.” – Wikipedia
(13/2/13)

This file is made available under the Creative Commons CC0 1.0 Universal Public Domain Dedication.

“‘Open knowledge’ is any content,
information or data that people are free
to use, re-use and redistribute —
without any legal, technological or
social restriction.” – Open Knowledge
Foundation (15/2/13)
Open Knowledge Foundation (OKF)


The OKF is an internationally
renowned non-profit organisation
(founded 2004) dedicated to promoting open
data and open content – including
government data, publicly funded
research and public domain cultural
content.

OKF builds tools, projects and
communities with a network of
international partners, each focused
on different aspects of open
knowledge but united by common
concerns and goals.
The Comprehensive Knowledge Archive
Network (CKAN) project is a web-based
system for the storage and distribution of
data, such as spreadsheets and the
contents of databases.

The system is used both as a public
platform on thedatahub.org and in various
government / regional data catalogues,
such as the UK's data.gov.uk and the
European Commission Open Data Portal.

CKAN source code is available from
GitHub: https://github.com


URL: http://ckan.org/
CKAN provides a rich RESTful JSON API for querying and
accessing dataset information. The API provides:

    •  Full querying / searching
    •  Dataset listings by publisher, or by theme, etc
    •  Recent activity and additions (also available via
    RSS/Atom feed)
    •  Statistics on dataset usage
    •  RDF version of the catalogue (via an rdf extension)
    •  CSV & JSON dumps of entire catalogue

The API is fully documented at http://docs.ckan.org/.
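
The querying / searching feature above can be sketched in a few lines of Python. This is a minimal sketch, assuming a CKAN site that exposes the version 3 Action API with its package_search action; the site address and dataset names used below are hypothetical placeholders, not taken from the slides.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def package_search_url(site_url, query, rows=5):
    """Build a URL for CKAN's package_search action (Action API v3 assumed)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{site_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_text):
    """Return (name, title) pairs from a package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported a failed request")
    return [(d["name"], d.get("title", "")) for d in payload["result"]["results"]]


def search(site_url, query, rows=5):
    """Run the search over HTTP; needs network access to a live CKAN site."""
    with urlopen(package_search_url(site_url, query, rows)) as resp:
        return dataset_titles(resp.read().decode("utf-8"))
```

For example, search("https://ckan.example.org", "spatial") would return the names and titles of the first five matching datasets, provided the (hypothetical) site is reachable. Splitting URL building and response parsing from the network call keeps both easy to check offline.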
CKAN has advanced geospatial features:

Data Preview: Where structured data with location
information is loaded into CKAN’s DataStore, CKAN can
plot the data on an interactive map.

Data Search: CKAN can understand a location associated
with a dataset, and use this to offer geospatial search
capabilities via the API e.g. by specifying a bounding box.

Data Discovery: To facilitate interoperability, CKAN
includes tools to import geo-coded metadata in a number
of formats and make it queryable (‘discoverable’)
according to the INSPIRE standard.

For further geospatial capabilities see:
http://docs.ckan.org/en/latest/geospatial.html
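
A bounding-box search might look like the sketch below. It assumes the CKAN site has the spatial extension installed and that the extension accepts an ext_bbox parameter (min lon, min lat, max lon, max lat) on package_search; the site address is a hypothetical placeholder. The boxes_intersect helper is only a local illustration of what a bounding-box match means, not CKAN's own implementation.

```python
from urllib.parse import urlencode


def bbox_search_url(site_url, min_lon, min_lat, max_lon, max_lat, query=""):
    """Build a package_search URL restricted to a bounding box.

    The ext_bbox parameter is an assumption about the site's spatial
    extension; check the documentation of the catalogue you are using.
    """
    if min_lon > max_lon or min_lat > max_lat:
        raise ValueError("bounding box minimums must not exceed maximums")
    params = {"ext_bbox": f"{min_lon},{min_lat},{max_lon},{max_lat}"}
    if query:
        params["q"] = query
    # Keep commas unescaped so the box stays readable in the URL.
    encoded = urlencode(params, safe=",")
    return f"{site_url.rstrip('/')}/api/3/action/package_search?{encoded}"


def boxes_intersect(a, b):
    """True if two (min_lon, min_lat, max_lon, max_lat) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]
```

A dataset whose extent overlaps the query box is a candidate match; for instance, a box around Edinburgh overlaps a box covering the whole of Scotland, but not one around London.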
Data Licensing
When sharing data, it is important to consider how you want your data to
be reused. Applying an explicit licence removes any ambiguity over what
users can and cannot do with your data. Lawyers can craft licences to
meet specific criteria, but there are a number of open licences developed
for use on the web that anyone can apply.

Licences designed for one type of subject matter aren’t always best suited
to licensing another type of subject matter because of differences in how
copyright law applies.

Creative Commons (CC) licences were designed for 'generic' digital
content and may not be best suited to licensing specific types of subject
matter which have different intellectual property rights.

Indeed, Creative Commons themselves have recommended against using
their licences (other than CC Zero - CC0, or "no rights reserved") for data
and databases.
Open Data Commons
Open Data Commons (ODC) have prepared a set of licences suitable for
data that are conformant with the principles set forth in the Open
Knowledge Definition. Each licence is accompanied by a statement
which can be placed with your data on a webpage that points to your
data.
Research Data Repositories
http://databib.org/ - a searchable catalog / registry / directory of
research data repositories

http://www.re3data.org/ - a global registry of research data repositories
from different academic disciplines




http://datashare.is.ed.ac.uk/

An online digital repository of
multi-disciplinary research datasets
produced at the University of
Edinburgh, hosted by the Data
Library on a DSpace platform.
Map or spatial mash-up ‘resource discovery tool’ - there are 2659 spatial
mashups that utilise a whole range of Web services, and 459 mapping APIs
(Feb. 2013) - http://www.programmableweb.com/tag/mapping

Open Mapping Utilities:
GeoCommons - http://geocommons.com/
OpenStreetMap - http://www.openstreetmap.org/
Google MapMaker - http://www.google.com/mapmaker
Platial - http://www.platial.com/
UCL Centre for Advanced Spatial Analysis - http://www.casa.ucl.ac.uk/
MapTube - http://www.maptube.org/ - a free resource for viewing,
sharing, mixing and mashing maps created with the GMapCreator
software, released by CASA.
EDINA & Open Spatial Data - APIs/Developer tools:

Unlock is a set of web services intended to help researchers
and developers unlock the ‘spatial’ potential in digital
resources:


Unlock Places – An API that helps developers find the locations and
shapes of places and re-use them in their applications.

• Unique UK location database compiled from OS Gazetteers

• An open access, worldwide coverage location database based
on open data from geonames.org


Unlock Geocodes – Convert UK postcodes or grid references to
co-ordinates

Unlock Text – Extract place names from text or metadata to find
their location using a geo-parser
http://unlock.edina.ac.uk/home/
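
To give a feel for what converting a grid reference to co-ordinates involves, here is a minimal local sketch of the arithmetic for Ordnance Survey National Grid references. It does not call the Unlock Geocodes service and is not its implementation; it only turns a reference such as "NT 257 735" into an easting and northing in metres (the south-west corner of the referenced square). The function name is made up for this example.

```python
def grid_ref_to_easting_northing(grid_ref):
    """Convert an OS National Grid reference to (easting, northing) in metres."""
    ref = grid_ref.replace(" ", "").upper()
    letters, digits = ref[:2], ref[2:]
    if len(letters) != 2 or not all("A" <= ch <= "Z" and ch != "I" for ch in letters):
        raise ValueError("grid reference must start with two letters (not I)")
    if (digits and not digits.isdigit()) or len(digits) % 2 or len(digits) > 10:
        raise ValueError("grid reference needs an even number of digits (max 10)")

    def letter_index(ch):
        # The grid alphabet runs A..Z without the letter I.
        i = ord(ch) - ord("A")
        return i - 1 if i > 7 else i

    l1, l2 = letter_index(letters[0]), letter_index(letters[1])
    # First letter picks a 500 km square, second a 100 km square within it;
    # the false origin is the south-west corner of square SV.
    e100km = ((l1 - 2) % 5) * 5 + (l2 % 5)
    n100km = (19 - (l1 // 5) * 5) - (l2 // 5)
    if not (0 <= e100km <= 6 and 0 <= n100km <= 12):
        raise ValueError("letters fall outside the National Grid")

    half = len(digits) // 2
    # Pad each half to 5 digits so that "257" means 25700 m within the square.
    easting = e100km * 100000 + int(digits[:half].ljust(5, "0"))
    northing = n100km * 100000 + int(digits[half:].ljust(5, "0"))
    return easting, northing
```

Turning the easting and northing into latitude and longitude needs a further datum transformation, which is left out of this sketch.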
ShareGeo Open – A repository of free
and reusable data sets deposited by
researchers and research institutions

•Find – search for user-contributed
datasets
•Re-use – download datasets for research,
teaching and learning
•Share – contribute your own datasets for
others to use
•Open – there are Open and Digimap
licensed versions for datasets with different
origins

http://www.sharegeo.ac.uk/
The Digimap OpenStream service provides access to a Web Map
Service (WMS) offering Ordnance Survey OpenData products, including:
GB Overview
Miniscale
1:250000 Colour Raster
VectorMap District Raster
OS Streetview

Use the Digimap OpenStream API to do things like:
• Create mashups, combining OS OpenData with maps and data from other
sources.
• Add OS OpenData to Google Earth.
• Embed maps in your website.
• Provide OS mapping in your own applications.

Free for academic use. Registration required.

See URL: http://openstream.edina.ac.uk/registration/
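
Because OpenStream is a Web Map Service, a map image is requested with a standard WMS GetMap call. The sketch below builds such a request using the WMS 1.1.1 parameter names; the endpoint address, layer name and token parameter are hypothetical placeholders (the real values come with registration), and EPSG:27700 (British National Grid) is assumed as the co-ordinate system.

```python
from urllib.parse import urlencode


def wms_getmap_url(endpoint, layers, bbox, width, height,
                   srs="EPSG:27700", image_format="image/png",
                   extra_params=None):
    """Build a WMS 1.1.1 GetMap URL for a bounding box (min_x, min_y, max_x, max_y)."""
    min_x, min_y, max_x, max_y = bbox
    if min_x >= max_x or min_y >= max_y:
        raise ValueError("bounding box must have positive width and height")
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty value requests the server's default style
        "SRS": srs,
        "BBOX": f"{min_x},{min_y},{max_x},{max_y}",
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    # Service-specific extras, e.g. an access token (name is an assumption).
    params.update(extra_params or {})
    separator = "&" if "?" in endpoint else "?"
    return endpoint + separator + urlencode(params, safe=",:/")


# A 10 km square around central Edinburgh, in National Grid metres.
example_url = wms_getmap_url(
    "https://wms.example.org/wms",      # hypothetical endpoint
    ["hypothetical-layer"],             # hypothetical layer name
    (320000, 670000, 330000, 680000),
    512, 512,
    extra_params={"token": "YOUR-KEY"},  # hypothetical parameter
)
```

The resulting URL can be pasted into a browser, used as an image source in a web page, or handed to any WMS-aware client.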
Third last slide…



Gogeo – an online resource discovery tool for geospatial data created by UK
researchers: http://www.gogeo.ac.uk/metadata/search/

Users can also create, publish and export metadata records using the Geodoc tool




AddressingHistory has an API onto historic Post Office Directory data from
Edinburgh, Glasgow and Aberdeen - see URL:
http://addressinghistory.edina.ac.uk/api/.

The code for the POD Parser, used to convert Post Office Directory OCR into
structured data for AddressingHistory, is also available on GitHub:
https://github.com/gmh04/podparser/.
Final comments

• There’s a generally accepted assertion that 80% of all
  information has a spatial reference (implicit or otherwise) –
  exploit!

• If you’re creating open data products then license it!

• EDINA is a great place to start looking for spatial data services
  and tools including APIs!
FIN! - THANK YOU

 Credits:
 Image by aroid - http://www.flickr.com/photos/selago/34843234/ - CC BY 2.0
 Image by konqui - http://www.flickr.com/photos/konqui/2301314089/ - CC BY-NC 2.0
 Image by mosilager - http://www.flickr.com/photos/mosilager/2260598271/ - CC BY-NC-SA 2.0
 Image by racoles - http://www.flickr.com/photos/racoles/5719938981/ - CC BY-NC 2.0
 Image by James Bowe - http://www.flickr.com/photos/jamesrbowe/3351247547/ - CC BY 2.0
 Image by yelnoc - http://www.flickr.com/photos/yelnoc/361303918/ - CC BY-NC-SA 2.0
 Image by epSos.de - http://www.flickr.com/photos/epsos/3384297473/ - CC BY 2.0
 Image by bek30 - http://www.flickr.com/photos/bek30/6107854810/ - CC BY-NC 2.0
 Image by karen horton - http://www.flickr.com/photos/karenhorton/3261277303/ - CC BY-NC 2.0
 Image by lofaesofa - http://www.flickr.com/photos/lofaesofa/227019975/ - CC BY 2.0
 Image by Psycho Delia - http://www.flickr.com/photos/24557420@N05/5588473657/ - CC BY-NC 2.0
 Image by wdj(0) - http://www.flickr.com/photos/davidjoyner/534893725/ - CC BY-SA 2.0
 Image by Symic - http://www.flickr.com/photos/symic/2870349309/ - CC BY-SA 2.0
 Image by ~milj - http://www.flickr.com/photos/21989292@N07/4938052014/ - CC BY-NC-SA 2.0
 Image by giniger - http://www.flickr.com/photos/7304492@N06/417304290/ - CC BY-NC-SA 2.0
 Image by Libraryman - http://www.flickr.com/photos/libraryman/78337046/ - CC BY-NC-ND 2.0
 Image by Dru! - http://www.flickr.com/photos/druclimb/470572647/ - CC BY-NC 2.0
 Image by Muffet - http://www.flickr.com/photos/calliope/7102418379/ - CC BY 2.0

Editor's Notes

  • #6 One could argue that the ethos of open-ness is an idealistic notion. One could also argue that it is a cause worth pursuing
  • #7 The School of Data teaches journalists, researchers, analysts and others how to use data to pursue their mission – free online courses
  • #8 Austria, Brazil, Netherlands – national, regional and local open data portals
  • #12 The GNU General Public License (GNU GPL or GPL) is the most widely used free software license, which guarantees end users (individuals, organizations, companies) the freedoms to use, share (copy), and modify the software.
  • #13 44,000 historical maps of Scotland – county maps, town plans, admiralty charts (coastline), military maps, historic OS series. Plus 600 of Edinburgh and its environs. Images, OCR text. Creative Commons licences - IPR free - Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland. Internet Archive team based at the National Library of Scotland for scanning the Scottish Post Office Directories used in the project.
  • #14 GitHub, Freepository, Source Forge