
OSFair2017 Workshop | OpenDataMonitor


Dimitris Skoutas presents the OpenDataMonitor

Workshop title: Open Science Monitor

Workshop overview:
Which are the measurable components of Open Science? How do we build a trustworthy, global open science monitor? This workshop will discuss a potential framework to measure Open Science, including the path from the publishing of an open policy (registries of policies and how these are represented or machine read), to the use of open methodologies, and the opening up of research results, their recording and measurement.


Published in: Science

  1. OpenDataMonitor: Monitoring, Analysis and Visualisation of Open Data Catalogues, Hubs and Repositories. Collaborative Project, FP7-ICT-2013.4.3 SME initiative on analytics. Dimitris Skoutas, IMIS, R.C. Athena. Open Science Fair 2017, 7/9/2017, Athens
  2. The OpenDataMonitor Consortium
  3. Landscape & Challenges. In a nutshell: Numerous organisations and public bodies already publish open data or are starting to do so, but they use different systems and vocabularies. Open data of relevant stakeholders is stored at a local, regional, national or pan-European level. Entered metadata is often incomplete or inaccurate. At all levels the open data situation appears to be very fragmented and hard to monitor. This leads to planning problems and raises questions such as: ▪ Which catalogues/datasets are available? ▪ What is the quality of the available resources? ▪ Which gaps reduce re-use of open data?
  4. Stakeholders Survey (OpenDataMonitor 1st Review Meeting). Research question: • What roles do stakeholders in the open data ecosystem play and what are their interests in open data? Participant selection: • Two-fold strategy: • contact information extracted from open data portals across Europe: 1. downloaded the metadata of more than 10,000 open datasets stored in about 50 open data repositories in 18 European countries; 2. extracted email addresses from these metadata; 3. used those addresses to invite their owners to the survey • partners forwarded the survey invitation through their mailing lists and newsletters, and posted Facebook messages and tweets via the OpenDataMonitor project
  5. Participant Statistics. Participant types: • 63% of all participants came from Germany, Spain and the United Kingdom
  6. Survey Results. Q: "Please indicate the extent to which each of the following issues influence your company’s decision to use open data."
  7. Survey Results. Differences in regard to what content interests stakeholders: • “Transport and Traffic”, “Environment and Climate”, “Finance and Budget” attract the highest interest • Stakeholders distinguish noticeably between the different kinds of open data
  8. Survey Results. Topical interests vary between stakeholder groups: • self-identified activists are more concerned with data that is consistent with the FOI/transparency tradition (e.g. Politics and Elections data, Public Sector data) • stakeholders in public administration favour data that is less politicised
  9. Survey Results. Stakeholders at the policy level: • Businesses, politicians and public managers are perceived as least supportive • Activists rank highest as supporters
  10. Survey Results. “We take data from over 330 different publishers [...] not one of them does the same thing as the next one and most of them don't do the same thing month to month. I’ve got 170 councils in our dataset [...] Some publish virtually nothing, some publish a lot. The variance in quality of the data is incredibly difficult. Data quality is a big issue.” - Ian Makgill, Spend Network
  11. Survey Results. “..If there’s too much [data], you can’t find it. There [are] different places to find different bits, you've got, you've got all these different websites, all these different agencies. They’ve all grown organically and separate from each other and I know everybody would love one enormous data place where you go to get all your data.” - Rod Plummer, Shoothill
  12. Statistics of Monitored Catalogues: • 31 Countries • 173 Data Catalogues • 213,730 Datasets Harvested • 158,165 Unique Datasets • 588,303 Total Distributions • 1,400+ GB Total Size Distribution • 12,523 Unique Publishers (Organisations and Aut
  13. Qualitative Metrics • Open licence: total count of open licences over total count of distributions with a licence • Machine readable: the portion of datasets provided in a machine-readable format • Open formats: the portion of dataset distributions with a non-proprietary format • Metadata completeness: the frequency of missing metadata for each attribute • Availability: the portion of datasets without broken links • Discoverability: an estimate of how important a catalogue is on the web, based on two traffic ranking systems: Google and Alexa
  14. Quantitative Metrics • Number of datasets: the total number of available datasets in a catalogue • Total distribution size: the total size of all resources, regardless of their format, for every dataset in a catalogue • Number of distributions: the average number of distributions per dataset • Number of unique publishers: the total number of unique publishing organisations of a specific catalogue • Number of catalogues: the number of catalogues harvested per country
  15. Results
  16. Technical Perspective - The ODM System. The ODM system consists of two main parts: Metadata collection and processing: • collects metadata from open data catalogues • performs metadata cleaning and harmonization • computes metrics and provides results via an API. Demonstration site: • visualises results for monitoring • allows for search and browsing • produces charts and reports • includes information about methodology and usage
  17. Indicative Statistics: Catalogues covered per type of harvester
  18. Thank you! Questions
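
The Qualitative Metrics slide defines its measures as simple ratios over harvested metadata, so a small worked example may help make them concrete. The following is only a sketch under stated assumptions, not the OpenDataMonitor implementation: the `Dataset` and `Distribution` classes, the `quality_metrics` function and the three lookup sets are hypothetical, and "machine readable" is read here as "the dataset has at least one distribution in a machine-readable format". Discoverability is left out because it depends on external traffic rankings.

```python
from dataclasses import dataclass, field
from typing import Optional

# Illustrative lookup sets (assumptions); the slides do not list the
# licences or formats the project actually treats as open.
OPEN_LICENCES = {"cc-by", "cc0", "odc-by", "ogl"}
MACHINE_READABLE = {"csv", "json", "xml", "rdf"}
OPEN_FORMATS = {"csv", "json", "xml", "rdf", "txt"}


@dataclass
class Distribution:
    licence: Optional[str]  # None when no licence is stated
    fmt: Optional[str]      # lower-case format label, None when missing
    link_ok: bool           # outcome of a link check performed elsewhere


@dataclass
class Dataset:
    metadata: dict          # attribute name -> value ("" or None if missing)
    distributions: list = field(default_factory=list)


def ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 for an empty denominator."""
    return numerator / denominator if denominator else 0.0


def quality_metrics(datasets, attributes):
    """Compute the ratio-based qualitative metrics for one catalogue."""
    dists = [d for ds in datasets for d in ds.distributions]
    licensed = [d for d in dists if d.licence]
    return {
        # open licences over distributions that state a licence
        "open_licence": ratio(
            sum(d.licence in OPEN_LICENCES for d in licensed), len(licensed)),
        # datasets with at least one machine-readable distribution
        "machine_readable": ratio(
            sum(any(d.fmt in MACHINE_READABLE for d in ds.distributions)
                for ds in datasets), len(datasets)),
        # distributions in a non-proprietary format
        "open_formats": ratio(
            sum(d.fmt in OPEN_FORMATS for d in dists), len(dists)),
        # share of datasets missing each metadata attribute
        "missing_metadata": {
            a: ratio(sum(not ds.metadata.get(a) for ds in datasets),
                     len(datasets))
            for a in attributes},
        # datasets whose distributions all have working links
        "availability": ratio(
            sum(all(d.link_ok for d in ds.distributions) for ds in datasets),
            len(datasets)),
    }


# Two made-up datasets, purely for illustration.
ds1 = Dataset({"title": "Bus stops", "description": ""},
              [Distribution("cc-by", "csv", True),
               Distribution(None, "pdf", True)])
ds2 = Dataset({"title": "Budget", "description": "2016 budget"},
              [Distribution("proprietary", "xls", False)])

metrics = quality_metrics([ds1, ds2], ["title", "description"])
print(metrics)
```

Keeping every metric as a plain ratio over the same harmonised records means the values stay comparable across catalogues, which is the point of the monitoring described in the slides.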