1) Search the supermarkets – the data catalogues & data stores
In order to unlock the potential of digital public sector information, developers and other prospective users must be able to find datasets they are interested in reusing.
There are a growing number of data catalogues that bring together listings of published open data (and there are also now data marketplaces that can help you find commercially licensed data as well – so be sure to check the details of the data you find). Data catalogues often have a particular focus – and no one catalogue can tell you about all the data out there.
Guardian World Data Store makes it easy to search across a range of different government open data catalogues – browsing data by country and format. At World Government Data you can:
• Search government data sites from the UK, USA, Australia, New Zealand and London (this comes under United Kingdom, if you want to browse) in one place and download the data (more sites to come)
• Help us find the best dataset by ranking them
• Collect similar datasets together from around the world
• Browse all datasets by each country
It's all been put together with the help of developer Ben Firshman and is the culmination of our year-long project to make data widely available to everyone. And, even better, we have an API available.
Even though all of these government data sites have enormous quantities of data, they are not in the same formats. What we have done is put them into a unified form, meaning developers have the opportunity to write applications that compare data between different countries. If you want the data in Atom or JSON just change the "/search" to "/search.atom" or "/search.json" in the URL. There will be full documentation on this soon. Watch this space.
The whole project is only going to increase in size and scope. As Ben Fry has said: "This is only going one way: there is no trend towards less data".
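The "/search" to "/search.json" switch described above can be sketched in a few lines of Python. This is only an illustration of the URL rewrite: the host and path in the example (example.org/world-government-data) are placeholders, not the real address of the service, and the function name with_format is made up for this sketch.

```python
from urllib.parse import urlsplit, urlunsplit

ALLOWED_FORMATS = ("atom", "json")


def with_format(search_url: str, fmt: str) -> str:
    """Return search_url with a path ending in "/search" changed to "/search.<fmt>".

    Only the URL is rewritten; nothing is fetched.
    """
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    parts = urlsplit(search_url)
    path = parts.path
    # Accept a plain "/search" as well as one that already carries a format.
    for suffix in ("/search.atom", "/search.json", "/search"):
        if path.endswith(suffix):
            path = path[: -len(suffix)] + "/search." + fmt
            break
    else:
        raise ValueError("URL path does not end in /search")
    # The query string (the search terms) is carried over unchanged.
    return urlunsplit(parts._replace(path=path))


# Placeholder URL, used purely for illustration.
EXAMPLE_URL = "https://example.org/world-government-data/search?q=crime"
JSON_URL = with_format(EXAMPLE_URL, "json")
ATOM_URL = with_format(EXAMPLE_URL, "atom")
```

The resulting URL could then be passed to any HTTP client; what the response looks like depends on the service and is not assumed here.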
Publicdata.eu is a new catalogue bringing together data from right across Europe.
Information about European public datasets is currently scattered across many different data catalogues, portals and websites in many different languages, implemented using many different technologies. PublicData.eu will provide a single point of access to open, freely reusable datasets from numerous national, regional and local public bodies throughout Europe. The kinds of information stored about public datasets may vary from country to country, and from registry to registry. PublicData.eu harvests and federates this information to enable users to search, query, process, cache and perform other automated tasks on the data from a single place. This helps to solve the “discoverability problem” of finding interesting data across many different government websites, at many different levels of government, and across the many governments in Europe.
Co-production - In addition to providing access to official information about datasets from public bodies, PublicData.eu will capture (proposed) edits, annotations, comments and uploads from the broader community of public data users. In this way, PublicData.eu will harness the social aspect of working with data to create opportunities for mass collaboration. For example, a web developer might download a dataset, convert it into a new format, upload it and add a link to the new version of the dataset for others to use. From fixing broken URLs or typos in descriptions to substantive comments or supplementary documentation about using the datasets, PublicData.eu will provide up to date information for data users, by data users.
opendatasearch.org is a global version of the publicdata.eu site: a meta search engine for open data. It is an aggregator for datasets, providing a simple and unified search interface to all of the catalogues contained: Sunlight Foundation’s National Data Catalog (and with it a large number of US-based data sources), the World Bank data catalogue, Sweden’s DCat-enabled OpenGov.se and Nexedi’s Data Publica portal. By harvesting a growing set of existing dataset descriptions, we hope to gather a comprehensive picture of the dataset properties that are widely used and that should be represented in a common format. Our goal with this is to establish some degree of interoperability between different data catalogues. Next steps: adding support for more filter options, including licenses (and their compliance to open data principles), languages used in metadata and the data itself and geographic scopes of the collected information.
Specialist independents – data stores
Where the supermarkets are stacking the datasets high, and sharing them free – there might be a specialist in your area of interest – working hard to source and bring together the finest data they can. Fortunately, most of them provide the data for free too.
Timetric.com is an aggregator of public statistical data, and they are specialists when it comes to time series data. If you can plot it on a graph over time, chances are they’ve taken the dataset, tidied it up, and provided ways to search and browse for it – with CSV spreadsheet downloads of the raw data. All of their services are built on top of the Timetric Platform, a proprietary service for publishing, analysing, and performing calculations on very large quantities of time-varying statistical data.
API is the main guts of it – interfaces with the guts of Scape Toad
Allows upload and download of input and output files
Website uses a wizard to guide user through creation process
Citizen Driven Catalogues: In addition to data catalogues created by government, there are a growing number of citizen-driven catalogues. Often these are created by open government data advocates to map out what information is available and how open it is (both legally and technically). There are several other independent catalogue projects around the world:
• Datadotgc.ca in Canada, which includes nearly 700 datasets added by Canadian open government data advocates;
• The Offene Daten project in Germany, which contains over 300 datasets added by the Open Data Network, a German open government data organisation;
• National Data Catalog in the US, being developed by the non-governmental organisation the Sunlight Foundation.
Drivers: This catalogue emerged from an increasing frustration that there was no strategy for, or suitable, open government data catalogue similar to those in the US and UK. Members of the public may have more innovative ideas on how to present these datasets to users. For example, public bodies may not want citizens to edit, comment on, or add to official registries, but web services with these features may be more desirable for user communities, especially given current efforts to link government data to other sources of information (e.g. from international sources, research bodies, and so on), providing translation of metadata where necessary.
3) Foraging – searching for the data
If the data you want isn’t available pre-packaged and catalogued, you might need to head out foraging across the Internet. There is a lot of open data in the wild – you just need to know how to spot it.
With increasing amounts of data available, it can still be hard to:
- find the data you want;
- query a datasource to return just the data you want;
- get the data from a datasource in a particular format;
- convert data from one format to another (Excel to RDF, for example, or CSV to JSON);
- get data into a representation that means it can be easily visualised using a pre-existing tool.
GetTheData.org makes a great first port of call to see if other data-foragers have already found a good spot to get the data you are after. It’s a community website full of requests for data, and conversations about good places to find it. Plus, if your own foraging doesn’t turn up anything, you can come back and pose your question to the community here later.
Search – Try searching the web for the topic you are interested in. Perhaps add ‘data’ as an extra key word. When you read news articles or web pages that appear to be based on data, take note of the names of the data sources they mention and plug those back into a search. Often that will lead you to some data you might be able to use. Think-tank websites, academic researcher web pages and even newspaper sites can all host lots of datasets. Just make sure you find out all you can about the provenance of the information before you use it!
Deep searching – You can use a standard Google Search to look for data published in common office formats hosted on a particular web domain: your local council or university, for example. All you need are two handy operators:
- The ‘site:’ operator on Google restricts searches to only show results from a particular domain;
- The ‘filetype:’ operator only returns files of a particular type.
Using those together you can construct searches like ‘filetype:xls site:oxford.gov.uk’ to find all the Excel spreadsheets that Google has indexed on the Oxford City Council website.
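If you run the same kind of deep search against several domains or file types, it can help to build the query string in code. The sketch below only assembles the search text and a Google search URL; the function names are made up for illustration, and nothing is fetched.

```python
from urllib.parse import urlencode


def deep_search_query(domain: str, filetype: str, terms: str = "") -> str:
    """Combine optional search terms with the 'filetype:' and 'site:' operators."""
    parts = [terms.strip(), f"filetype:{filetype}", f"site:{domain}"]
    # Drop the terms slot when no extra keywords were given.
    return " ".join(part for part in parts if part)


def google_search_url(query: str) -> str:
    """Return a Google search URL for the query, with the query text URL-encoded."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# The example from the text: Excel spreadsheets on the Oxford City Council site.
QUERY = deep_search_query("oxford.gov.uk", "xls")
URL = google_search_url(QUERY)
```

Swapping "xls" for "csv" or "pdf", or adding a keyword through terms, gives variations on the same search.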
It’s not uncommon to find the data you need… only it’s just out of reach.
Perhaps it’s in a table on a web page when you want it in the sort of table you can load into a spreadsheet to sort and chart.
Or it might be spread across lots of different web pages and files.
Or as a PDF or an image-only PDF.
Civic society developers and others seeking to reuse government information have developed various methods to extract structured data (or information in a database-like form that a computer can read) from sources which are more or less unstructured (such as government websites, PDF documents, and scanned documents). This involves identifying patterns in the unstructured sources (such as columns and rows in a budget document) and writing a computer program to reconstruct the underlying data sets on the basis of these patterns. This process, known as 'screen scraping', can be time-consuming and may often require a degree of technical ingenuity.
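As a rough illustration of the pattern-spotting described above, the sketch below treats the rows and cells of an HTML table as the pattern and rebuilds them as CSV that a spreadsheet can open. It uses only the Python standard library. The sample table is made up for the example, and real pages usually need more care (nested tables, merged cells, and so on).

```python
import csv
import io
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_csv(html: str) -> str:
    """Return the table cells found in html as CSV text."""
    scraper = TableScraper()
    scraper.feed(html)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(scraper.rows)
    return out.getvalue()


# Made-up sample page fragment, for illustration only.
SAMPLE_HTML = """<table>
<tr><th>Department</th><th>Budget</th></tr>
<tr><td>Parks</td><td> 1200 </td></tr>
<tr><td>Libraries</td><td>800</td></tr>
</table>"""

CSV_TEXT = table_to_csv(SAMPLE_HTML)
```

The same idea carries over to data spread across many pages: run the scraper over each page and append the rows to one file.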
There are several projects that aim to make screen scraping more accessible to people without a technical background. An example is the ScraperWiki project. ScraperWiki is a web platform for collecting and publishing public data. It was started in 2008 by civic web developer Julian Todd, who has worked on various projects aiming to open up the activities of official bodies, including They Work for You, Public Whip, and UNdemocracy. ScraperWiki allows users to extract useful structured data from a wide variety of different kinds of online sources, from non-machine-readable PDF files, web pages, and web interfaces to large databases. ScraperWiki aims to make it easier for users to collaborate on the creation and maintenance of screen-scraping scripts, thus helping to ensure that clever pieces of code are shared and that scraped data is accurate and up-to-date.
Special order – FOI
Perhaps you have found that no-one stocks the data you need – not even in places you can forage or scrump for it. If the data comes from a public body, then it might be time to explore putting in a special request for it using the Freedom of Information Act.
WhatDoTheyKnow.com is a service that makes it easy to submit a Freedom of Information Act request to a local authority, government department or other public body. You have a right to ask authorities for a copy of the information and data they hold, and you can ask for it to be returned as raw data. Search WhatDoTheyKnow to see if anyone has requested the data you want already, and if not, put in your request. (Often if data is available on WhatDoTheyKnow it will be locked up in PDFs. You might need to crowd-source the process of turning it into structured raw data, although there are a few tools and approaches that might help turn PDFs into data programmatically.)
IsItOpenData.org provides a useful tool for asking non-public bodies to share their data as open data, or to clarify the licensing. Open data is data that can be used, reused and redistributed without restriction other than (perhaps) the requirement to attribute or share alike. A full definition of openness is available at http://www.opendefinition.org/1.0. Data is closed if, for example, it requires additional permission or payment for its reuse. In practice it is often unclear whether the data on publishers' websites is openly available, i.e. accessible and freely re-usable without additional permission. In some cases, publishers may even restrict access to data by adding specific terms and conditions. The first method is to look on the publisher's website for their terms and conditions regarding use of their data. This may categorically state that they reserve rights to all data; alternatively they may clearly apply an open knowledge license to their data (see http://www.opendefinition.org/licenses/ for a list of such licenses). If the situation is ambiguous or unclear, somebody may have requested clarification in the past (please check the archive of previous requests before sending a new one). If not, an enquiry can be made through the website.
6) Home grown – research and crowdsourcing
If you reflect on the two earlier examples, Dr Snow’s map and Zanesville in Ohio, in both cases people collected their own data. Some data simply doesn’t exist yet – but you can create a raw dataset through research, and through crowd-sourcing, inviting others to help you research.
Simple spreadsheets - if you are systematically working through a research task, keep your results in a spreadsheet. See the section on raw data for ideas about how to structure it well.
Google Forms - available through http://docs.google.com, this allows you to create an online form that anyone can fill in, with all the responses going direct into a spreadsheet for you to use. You might be able to get supporters to research for you and collaboratively build up a useful dataset.
The following post is from Friedrich Lindenberg, who is a developer at the Open Knowledge Foundation working on CKAN, PublicData.eu and Open Spending. Recently, there has hardly been a week in which there hasn’t been an announcement of a new local, regional or national open data initiative – including ever more extensive catalogues of data that is being opened up (CKAN alone now runs in 20 or more places). While this is great news for those of us interested in re-using the data, it also means it becomes increasingly hard to keep a good overview of what kind of data are available for which places. To get a better overview we’ve now started a meta search engine for open data, opendatasearch.org. opendatasearch.org is a global version of the prototype publicdata.eu site we announced in January: it’s an aggregator for datasets, providing a simple and unified search interface to all of the catalogues contained. At the moment, this includes all known instances of the CKAN software, the Sunlight Foundation’s National Data Catalog (and with it a large number of US-based data sources), the World Bank data catalogue, Sweden’s DCat-enabled OpenGov.se and Nexedi’s Data Publica portal. We’ve also put up search.ckan.net which provides access to the combined index of all CKANs only. Behind the scenes, opendatasearch.org is a web spider with a twist: all collected data is converted to DCat, DERI/W3C’s RDF-based ontology for dataset descriptions. While this convention is still in early development, it’s interesting to see how well different kinds of catalogues can be expressed in it already (the harvested data can be found here). By harvesting a growing set of existing dataset descriptions, we hope to gather a comprehensive picture of the dataset properties that are widely used and that should be represented in a common format.
Our goal with this is to establish some degree of interoperability between different data catalogues, leading into a federated catalogue architecture for Europe and perhaps beyond. These standardization concerns aside, we want to make opendatasearch.org useful on its own. For the immediate future this means adding support for more filter options, including licenses (and their compliance to open data principles), languages used in metadata and the data itself and geographic scopes of the collected information. This, of course, is an open source development effort and we’d be glad to welcome those interested in contributing comments, catalogue data or functionality on the ckan-discuss mailing list!
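The idea of harvesting records from differently shaped catalogues into one common description, and then filtering on properties such as licence or language, can be sketched as below. This is a plain-Python stand-in, not DCat or RDF, and not how opendatasearch.org is implemented. The catalogue names, field names and sample records are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """A common, catalogue-independent description of a dataset."""
    title: str
    licence: str
    language: str
    source: str


# Hypothetical mapping from each catalogue's own field names to the common ones.
FIELD_MAPS = {
    "catalogue_a": {"title": "title", "licence": "license_id", "language": "lang"},
    "catalogue_b": {"title": "name", "licence": "rights", "language": "language"},
}


def normalise(source: str, record: dict) -> Dataset:
    """Convert one harvested record into the common Dataset shape."""
    mapping = FIELD_MAPS[source]
    return Dataset(
        title=record.get(mapping["title"], ""),
        licence=record.get(mapping["licence"], "unknown"),
        language=record.get(mapping["language"], "unknown"),
        source=source,
    )


def filter_datasets(datasets, licence=None, language=None):
    """Keep only the datasets matching every filter that was given."""
    return [
        d for d in datasets
        if (licence is None or d.licence == licence)
        and (language is None or d.language == language)
    ]


# Made-up harvested records, for illustration only.
HARVESTED = [
    ("catalogue_a", {"title": "Road traffic counts", "license_id": "ogl", "lang": "en"}),
    ("catalogue_b", {"name": "Skolstatistik", "rights": "cc-by", "language": "sv"}),
    ("catalogue_b", {"name": "Budget 2011", "language": "sv"}),
]

DATASETS = [normalise(source, record) for source, record in HARVESTED]
SWEDISH = filter_datasets(DATASETS, language="sv")
OPEN_SWEDISH = filter_datasets(DATASETS, licence="cc-by", language="sv")
```

Records that lack a licence field end up as "unknown", which makes gaps in the harvested metadata visible rather than silently dropping them.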
Openness isn’t an end in itself; it only represents the first step. Data has no value if it isn’t used!
Global challenges facing us – we need to be more sustainable
Encourage external innovation, utilising geographical data, for the benefit of both the economy and society
Encourage collaboration
Promote the use of Ordnance Survey data
Promote the brand of Ordnance Survey
Understand new uses of our data and new markets
GeoVation: innovation challenges from Ordnance Survey
Proven track record. Started in 2009; 6 challenges to date; over 1700 have registered and 556 ideas submitted.
First an open challenge, then focused on specific needs within communities, e.g. how can Britain feed itself, transforming neighbourhoods, business environmental performance.
GeoVation also enables us to collaborate with other government departments who partner us in challenges, e.g. DfT, Technology Strategy Board, Welsh Government and the Environment Agency. Funding for winners can be from Ordnance Survey and/or partners.
Pre-launch we run a problem-identifying workshop (powwow) with people with expertise in the theme of the challenge, and the results of this frame the challenge.
Ideas are invited to solve the problems associated with the theme. Runs for 6 – 8 weeks.
Independent judging panel selects finalists to attend Camp to help develop ideas further.
Winners are selected to receive funding to help them develop and launch their ideas.
GeoVation Camps have been held at Ordnance Survey and in Cardiff
For offenders who are sentenced to Community Payback. Using OS MasterMap® Topography layer. Members of the public can nominate locations with pinpoint accuracy on a mobile phone. The app allows geo-tagged photographs to be taken and members of the public can send these to the Trust with additional information regarding sites they would like offenders to work on. Back in the office, The Trust pinpoints this location using OS MasterMap Topography data to build up a picture of potential sites, looking at offender locations and identifying the most appropriate site to develop.
Winners
Carbon Prophet: mapping soil carbon to improve environmental performance. Carbon Prophet also won the Community Award.
GeoCraft: enabling schools and local businesses to work together to encourage learning about sustainability through Minecraft. Using Ordnance Survey data, it will stimulate children to think about environmental challenges and ideas to solve these, which can be fed back to the local business to implement.
Element Green Recycling: using mapping to show the location of business, local courier collection options and reprocessing companies, the idea will help business to achieve early separation of recyclable materials and realise the value of this in revenue.
Streetkleen Bio Project: a practical solution based around the anaerobic digestion of dog waste to create usable energy (methane). Using the Streetkleen app, incidents of dog fouling can be photographed and reported. The waste can then be collected and taken to an anaerobic digester, where it will create methane to provide energy.
There’s ongoing networking to engage new challenge partners. Building on the success of GeoVation, we want to continue to improve the process and experience to make it even better, such as starting to work on sponsorship packages and how we better support winners post funding to launch. For example, we want to introduce new ‘mentoring’ type events for GeoVation participants to help connect them with organisations who offer help to start-ups.
8661 data sets
Publisher: ONS 649
Creates and maintains the ‘master map’ of Great Britain from which others derive benefit
Manages complete national large scale digital data down to building level detail
Maintains a database of 460 million features with approximately 5,000 changes made daily
In 2009/10, 99.9% of real world features were represented in the database within six months of completion on the ground
There are 27 million addresses in the database and we check 36 000 new addresses every month.
Road Routing Information, such as bridge heights, weights and widths, mini roundabouts, turn restrictions, mandatory turns, one ways, vehicle and time-based restrictions
Launch of OS OpenData
Top line = Rasters: Miniscale, 250 000 Raster, OS VectorMap District, OS StreetView
Middle = Vectors: Meridian 2, Strategi, Boundary Line, Land-Form PANORAMA
Bottom = Point referencing: Code-Point OPEN over StreetView, 1:50 000 Gazetteer over VMD, New OS Terrain 50.
Free is not always open
Open is not always free
Analysis is not always easy
Open Data is not always good
10k registered users
OS OpenSpace community tools
Os open data masterclass november 2013 v3.1
OS Open Data Masterclass
Ian Holt (@IanHolt)
Developer Programme Manager,
9.30 – Introduction
9.35 - OpenData presentation – Ian Holt, Developer Engagement Manager
10.00 – Introduction to OpenData Exercise
10.30 – OS OpenData exercises – based on a “Health and Wellbeing”
12.30 – Lunch
13.00 – Field Trip GB App
13.05 - Styling OS OpenData – A Cartography Masterclass
15.00 – OS OpenSpace introduction and exercises
16.30 - Close
To Create More Services
• Ian Holt
• Luke Hampson
• Jon Field
• Charlie Glynn
Today is all about:
• Hands-on use of Open Data.
• Appreciating the role geography can play in using Open
• Meeting and learning from each other.
Global ‘Open’ movement
Open Source; Open Access; Open
Data; Open Innovation
• Governments opening up; transparency and
accountability; better value in public services;
opportunities to realise significant economic benefits
• Internet business models based on ‘free’ and ‘open’
• Open systems are interoperable
• Open access publishing is anti-publisher monopoly
“In order to unlock the potential of digital public sector information,
developers and other prospective users must be able to find datasets
they are interested in reusing”
Sources of Open Data…
Sources based on information from Open Data Cookbook 2010
Citizen Driven Catalogues
(1) The Data Supermarkets ...
• Guardian World Data Store
• A project to make data widely
available to everyone.
• Data Catalogue:
• Search government data sites
• Ranking datasets
• Collect similar datasets together
from around the world.
• Browse all datasets by each
• API available
• A catalogue bringing together data from
right across Europe.
• A single point of access to open, freely
• Search, query, process, cache and
perform other automated tasks.
• A meta search engine for open data.
• Sunlight Foundation’s National Data
Catalog (and with it a large number of
US-based data sources), the World Bank
data catalogue and Sweden’s DCat-enabled OpenGov.se
• Degree of interoperability
between different data catalogues.
UK Open Data – Local
Local data that impacts most of
us more directly.
New items of Local government
spending over £500 – council by
council from Jan 2011.
Openly Local is a new project
to develop an open and unified
way of accessing Local
Challenges - Information
• Ever more extensive catalogues of
data that are being opened up.
• A meta search engine for open
• A simple and unified search
interface to all the catalogues
Open Data is only the first step ...
Build it and they will come
“There is no real guarantee of openness and transparency in the mere fact
that some data are or became, somehow, available to the general public”
Create incentives to use the
• Global competitions
• Create innovative software applications for
development using WB data.
• Aim is to bring together software developers and
• Cash prizes available
“Opportunities: Government as a Platform”
Government collects, validates and releases
Citizens reformat, add value and create services
GeoVation® runs challenges to address specific
needs within communities, which may be satisfied in
part through the use of geography.
Innovation challenges from Ordnance Survey
6 challenges to date
GeoVation – Camp
67 teams have attended a GeoVation Camp
Innovation = Problem x Solution x Execution
Challenge 4 GeoVation winner:
Community Payback Visibility App
Staffordshire and West Midlands Probation Trust
An app to help public nominate sites for
Community Payback projects
Challenge 5 GeoVation winner:
Real Food Wales – Mapkin.co.uk
Challenge 6: How can we help British Business
improve environmental performance?
6 March to 1 May 2013
10 invited to GeoVation
Camp - 21 -23 June
4 winners – share £101K
in funding to get started.
Element Green Recycling
Minecraft Map of Great Britain
Open Government Licence (OGL)
• In the past, licence variations were a significant barrier to
data re-publication - the lowest common denominator
• The existing Click-Use Licence for central government
("Crown") works is now being replaced by the OGL which
is based on the world-leading Creative Commons family of
• OGL allows anyone - businesses, individuals, charities and
community groups - to re-use public sector information
without having to pay or get permission
Data Formats and Frequency
data.gov.uk is not a data store; it does not hold datasets but aims to provide:
• A pointer to the resources
• A rich and consistent resource description framework to facilitate
data discovery and, more important, linkage
Available download formats
• Statistical reports (.doc/.rtf/.pdf)
• Data tables with raw or aggregate area statistics (.xls/.csv)
• Linked data, customised datasets created on-the-fly (.xml)
Normally only the latest edition of datasets/statistics is available so
regular revisits to the data sites are necessary for time-series data
Working with UK’s open geospatial data
Locate open map and administrative datasets
• Boundary and point data
• Road traffic counts
• Crime report figures
• School performance statistics
Mapping socio-economic attributes using GIS
tools e.g. ArcGIS, Quantum GIS
Visualising multiple area attributes spatially in
web mapping GeoCommons services
Change… Encourage Open Innovation
• As of April 2010 a range of Ordnance Survey mapping data
available for free
• Two aims: to foster innovation and encourage government
• Free for commercial and non-commercial use for anyone
wanting to build applications underpinned by geography
What is OS OpenData?
• OS Street View
• 1: 250 000 Colour Raster
• Meridian 2
• OS VectorMap District
• 1: 50 000 Gazetteer
• OS Locator
• Code-Point Open
• Terrain 50
- Provides a precise geographic location for each postcode unit in Great Britain.
- Contains postcodes, grid references, NHS® health & regional health authority
codes; administrative ward, district, county and country area codes.
- There are 1.7 million approx. postcode units in England, Scotland and Wales.
- Each postcode unit, e.g. SO16 4GU, contains an average of 15 addresses.
- a vector digital mapping product that is a complete set of local government
administrative boundaries and electoral boundaries used in local and general election
- specifically designed to show the area of each administrative or electoral boundary.
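Because a postcode-to-location product such as the one described above pairs each postcode unit with a grid reference, a lookup table can be built from its CSV files in a few lines. The sketch below is an assumption-laden illustration: the column positions (postcode first, easting third, northing fourth) are assumed defaults that should be checked against the product documentation, and the sample row uses made-up coordinates.

```python
import csv
import io


def _key(postcode: str) -> str:
    """Normalise a postcode so that spacing and case do not matter."""
    return postcode.replace(" ", "").upper()


def load_postcode_grid_refs(csv_file, postcode_col=0, easting_col=2, northing_col=3):
    """Build a postcode -> (easting, northing) lookup from a header-less CSV.

    The default column positions are an assumption, not a documented layout.
    """
    lookup = {}
    for row in csv.reader(csv_file):
        if not row:
            continue  # skip blank lines
        lookup[_key(row[postcode_col])] = (int(row[easting_col]), int(row[northing_col]))
    return lookup


def grid_ref_for(lookup, postcode):
    """Return the (easting, northing) pair for a postcode, or None if unknown."""
    return lookup.get(_key(postcode))


# Made-up sample data: the coordinates are illustrative, not real values.
SAMPLE_CSV = '"SO16 4GU",10,437000,115000\n'
LOOKUP = load_postcode_grid_refs(io.StringIO(SAMPLE_CSV))
LOCATION = grid_ref_for(LOOKUP, "so164gu")
```

With a real file, the StringIO object would be replaced by an opened CSV file; the grid references could then be plotted in a GIS tool or joined to other area statistics.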
What is OS OpenSpace?
• It’s a free service that provides free access to mapping data
• Embed maps in any web site or web application
• All applications are free to consumers
• API based on OpenLayers and can be used to access OnDemand Service
• OS Street View
• OS VectorMap District
• 1:50 000 Scale Colour Raster
• 1:250 000 Scale Colour Raster
• Mini Scale (1:1 000 000)
• Overview of Great Britain
• Outline of Great Britain
• OS OpenSpace Boundaries
• Simplified version of Boundary-Line
• 1:50 000 Scale Gazetteer
• Place names
• Map centre function
• Code-Point Open
• Postcode lookup service
• Map centre
The Ordnance Survey API
controls for :
- Data delivery
- Map location
- Gazetteer searches
- Pop-up / information
boxes to hold any
content or media
- Reading in data
formats / sources
In any symbol style
OS OpenSpace community tools