1-5 stars: Metadata on the Openness Level of Open Data Sets in Europe

Authors: Sébastien Martin, Muriel Foulonneau, Slim Turki
Paris VIII University, France
Public Research Centre Henri Tudor, Luxembourg
http://link.springer.com/chapter/10.1007%2F978-3-319-03437-9_24

Presented during MTSR 2013 / 7th Metadata and Semantics Research Conference
http://mtsr2013.teithe.gr/

Abstract. The development of open data requires better reusability of data. Catalogues that list data dispersed across different countries therefore play a crucial role. However, the degree of openness is also a key success factor for open data. In this paper, we study the PublicData.eu catalogue, which gives access to open datasets from European countries, and analyse the metadata recorded for each dataset. The objectives are to (i) assess the quality of a sample of metadata properties that are critical to enabling data reuse and (ii) study the stated level of data openness. The study uses Tim Berners-Lee’s five-star evaluation scale.

  • The one star openness level depends upon data licenses. Licensing information can be found in 10 distinct metadata properties, i.e., licence, License, licence_url, License_details, License_ID, License_summary, License_title, License_uri, License_url, and mandate.
  • The two star level depends upon the format in which the data is made available.
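The note on licensing lists ten property names that differ only in case or spelling, which makes it awkward to tell whether a dataset carries any license indication at all. The following is a minimal sketch of how such properties might be gathered from one dataset description. The case-insensitive matching, the helper names `normalize_key` and `license_indications`, and the example record are all assumptions made for illustration; they are not taken from the study.

```python
# Property names as listed in the slide note, lower-cased.
# Treating them case-insensitively is an assumption of this sketch.
LICENSE_PROPERTIES = {
    "licence", "license", "licence_url", "license_details", "license_id",
    "license_summary", "license_title", "license_uri", "license_url", "mandate",
}


def normalize_key(name: str) -> str:
    """Lower-case a property name and map spaces and hyphens to underscores."""
    return name.strip().lower().replace(" ", "_").replace("-", "_")


def license_indications(record: dict) -> dict:
    """Return the non-empty licensing values found in one dataset record."""
    found = {}
    for key, value in record.items():
        norm = normalize_key(key)
        if norm in LICENSE_PROPERTIES and value not in (None, ""):
            found[norm] = value
    return found


# Hypothetical record, invented for illustration only.
example = {
    "Title": "Bus stops",
    "License ID": "cc-by",
    "licence_url": "",
    "Mandate": "Act 12",
}
print(license_indications(example))
```

A dataset would then count as having "at least one license indication" when the returned dictionary is non-empty. Deciding whether that license is actually open is a separate step that this sketch does not attempt.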

    1. 1-5 stars: Metadata on the Openness Level of Open Data Sets in Europe. Sébastien Martin, Muriel Foulonneau, Slim Turki
    2. Context & Objectives
       • Level of reuse of open data is still disappointing.
       • Development of open data requires a better reusability of data.
       • Degree of openness is a key success factor.
       • Catalogs listing data have a crucial role.
       Analyse PublicData.eu catalogue: (i) identify the quality of a sample of metadata properties, which are critical to enable data reuse; (ii) study the stated level of data openness.
       21/11/2013 1-5 stars: Metadata on the Openness Level of Open Data Sets in Europe
    3. PublicData.eu
       • Many local and national portals provide access to public sector open datasets: 114 EU catalogues on datacatalogs.org.
       • Gather datasets across geographic and institutional boundaries.
       PublicData.eu:
       • Pan-European catalogue launched under the FP7 LOD2 project.
       • Aggregates data from CKAN open data catalogues all over Europe.
       • Collects data from 26 sources.
       • 1st to be published in Europe in 2011.
       • Includes data beyond the European Union, e.g., Serbian datasets.
       • Not exhaustive, but it represents a unique aggregation of European datasets.
       • 17,027 datasets.
       • UK: largest provider.
    4. Methodology
       Descriptions of datasets collected in May 2013.
       236 distinct dataset properties identified, a number partially due to:
       • linguistic diversity: some providers adapt property names to their language;
       • problems of consistency in naming (upper / lower case, spaces / underscores for a single field).
       Understanding the content of PublicData.eu is a major challenge.
       Data collected and analysed to identify the information made available on data openness and reusability, in particular the licensing conditions and the data formats.
    5. Tim Berners-Lee’s evaluation scale
       ★ Available on the web (whatever format) but with an open license, to be Open Data
       ★★ Available as machine-readable structured data
       ★★★ 2 + non-proprietary format
       ★★★★ 3 + Use open standards from W3C (RDF and SPARQL) to identify things
       ★★★★★ 4 + Link your data to other people’s data to provide context
    6. ★ Data Licences
       • 13,535 of 17,027 datasets have at least one license indication.
       • 12,470 datasets can be considered to have some form of open license (73.24%).
       • 769 datasets have a Creative Commons license.
       • A significant number of datasets have a national license:
         • apie v2, to publish information created by French public authorities;
         • UK-crown, which “covers material created by civil servants, ministers and government departments and agencies” in the UK;
         • UK Open Government License.
       • 128 datasets have an explicitly closed license.
    7. ★★ Machine readable format
       • Facilitates data reusability.
       • 4,051 of 17,027 datasets with content_TYPE.
       • 11,285 with at least one indication about format.
       • 56 datasets in RDF.
       • Spreadsheet-type formats dominate.
       [Figure: Distribution of formats]
       • 40% not in a machine readable format.
       • 34% of datasets available in a machine readable format.
       Machine readability is a condition for openness levels of 2★ and above.
    8. ★★★ Use of non-proprietary formats
       Creates ambiguities, as the openness of a format can be debated in some cases:
       • Certain formats are proprietary but their specifications are open.
       • Some formats were open at a certain point in time, but additions and further evolutions remain proprietary.
       In many cases, the value of the property was too vague to determine whether the format was proprietary or not.
       It was possible to identify:
       • a non-proprietary format for 49% of the datasets;
       • a proprietary format for 21%.
       Use of proprietary formats is a critical issue for improving the level of openness of datasets.
    9. ★★★★ Use of open standards from W3C
       Including HTML, XML, and RDF in particular.
       • XML-based formats may be entirely independent from W3C (e.g. KML).
       Availability in W3C standards: 9.5% of datasets.
       Availability in XML-based formats: 10%.
       Information remains unknown in most cases.
    10. ★★★★★ Linked data
       Linked data is mentioned in the description of only a single dataset (Brandweer Amsterdam-Amstelland Uitrukberichten), for which the format is described as “linked data api, rdf json”.
       58 datasets mention RDF (or RDFa) as a format or content type, i.e., 0.34%.
    11. Level of openness (1/2)
       6,891 of 17,027 datasets carry at least one indication of their degree of openness. All come from Data.gov.uk (8,689 datasets).
       For a majority of datasets, the level of openness is unknown.
       • Coherent with the lack of licensing information, without which it is impossible to conclude on even the ★ openness level.
       [Figure: Distribution of openness levels in UK datasets]
    12. Level of openness (2/2)
       Approximate level of openness derived from licensing and format properties:
       • 73.24% of the datasets should have ★ or above.
       • Reference to 5★ should take linkages into consideration, which cannot be inferred from dataset metadata.
       [Figure: Level of openness according to Format and License related properties]
       Data openness is mainly related to the 1st level of compliance: the licensing issue.
       • Data providers have clearly not focused on publication of data in reusable formats.
    13. Conclusion
       • Limited openness of datasets advertised as open data.
       • Heterogeneity of associated metadata.
       Result: difficulty for reusers to (i) discover datasets, despite the creation of large catalogues of datasets, and to (ii) effectively reuse machine readable and contextualized data.
       ★ may be sufficient to ensure transparency of government action, but facilitating reuse of data through services is not served below 2★.
       Confirmed risks regarding major challenges that data providers have to face: (i) language barrier and (ii) lack of consistency of metadata.
       Harmonization of practices, training and tools are necessary to ensure that datasets are available in relevant formats.
    14. 1-5 stars: Metadata on the Openness Level of Open Data Sets in Europe. Sébastien Martin, Muriel Foulonneau, Slim Turki. Contact: muriel.foulonneau@tudor.lu
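Slide 12 describes deriving an approximate openness level from licensing and format properties. A rough sketch of such a derivation is shown below. The format sets are assumptions chosen for illustration (the slides do not give the classification actually used), and the function name `approximate_stars` is hypothetical. In line with the slides, the sketch stops at four stars, since the fifth star depends on links that cannot be inferred from dataset metadata.

```python
from typing import Optional

# Illustrative classifications only; these sets are assumptions, not the study's.
MACHINE_READABLE = {"csv", "xls", "xml", "json", "rdf", "kml"}
NON_PROPRIETARY = {"csv", "xml", "json", "rdf", "kml"}
W3C_STANDARD = {"xml", "rdf"}  # KML is XML-based but not a W3C standard


def approximate_stars(has_open_license: bool, fmt: Optional[str]) -> int:
    """Approximate a 0-4 star level from a license flag and a format label."""
    if not has_open_license:
        return 0  # without an open license even one star cannot be claimed
    label = (fmt or "").strip().lower()
    if label not in MACHINE_READABLE:
        return 1  # open license, but format unknown or not machine readable
    if label not in NON_PROPRIETARY:
        return 2  # machine readable, proprietary format
    if label not in W3C_STANDARD:
        return 3  # non-proprietary, but not a W3C standard
    return 4  # five stars would require evidence of links to other data


for fmt in ("pdf", "xls", "csv", "kml", "rdf"):
    print(fmt, approximate_stars(True, fmt))
```

Each level requires the one before it, which mirrors the cumulative wording of the scale ("2 + non-proprietary format", "3 + open standards"). Real format values in the catalogue are often vague, so any such mapping would leave many datasets at an unknown or lower level.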