Deirdre Lee
@deirdrelee
@derilinx
www.derilinx.com
Complying with the EC Open Data Directive
24th March 2021
About
Derilinx leads Linked and Open Data initiatives,
providing fully hosted and managed data
solutions.
Directive on Open Data and
Re-Use of Public Sector Information
• Stimulate the publication of dynamic data and the uptake of Application Programming Interfaces (APIs);
• Limit the exceptions that now enable public bodies to charge more than the marginal costs of dissemination for data re-use;
• Extend the scope of the directive to include data held by public undertakings, under a specific set of rules, and research data resulting from public funding;
• Strengthen the transparency requirements for agreements involving public sector information between public and private parties, thereby avoiding exclusive deals; and
• Require the publication of a list of high-value datasets to be provided free of charge.
Where do I start?
1. What data do we manage?
2. Which data should we publish as Open Data?
3. How can we make data available as Open Data?
What data do we manage?
• "We don’t manage any data"
• "It’s already published on our website"
• "It’s too big"
• "The data is of poor quality"
• "It’s not very interesting"
• "The data is domain-specific"
• "People will misinterpret the data"
• "It’s available in reports"
https://www.tusla.ie/publications/
Identify Datasets through a Data Audit
A Data Audit helps Public Bodies compile a comprehensive inventory of the datasets they
manage, and classify whether these datasets are suitable for data sharing or publication as
Open Data.
(Also useful for GDPR compliance and the Public Service Data Catalogue)
https://datacatalogue.gov.ie/
Identify Datasets through a Data Audit
• Name of the dataset
• Description of the dataset
• Dataset manager(s)
• Link to dataset
• Classification
• Format
• Publication frequency
• Demand for this dataset
• General Comments
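The audit fields above map naturally onto one record per dataset. The following is a minimal sketch of such a record and of writing the inventory out as CSV. The class name, field names, classification labels and the sample entry are all illustrative assumptions, not the structure used by the Open Data Audit Tool.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class DatasetRecord:
    """One row of a data inventory, mirroring the audit fields listed above."""
    name: str
    description: str
    managers: str                # dataset manager(s), e.g. a team or role
    link: str
    classification: str          # assumed labels, e.g. "open", "shareable", "restricted"
    data_format: str
    publication_frequency: str
    demand: str
    comments: str = ""


def inventory_to_csv(records):
    """Serialise the inventory to CSV text so it can be shared or reviewed."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(DatasetRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Hypothetical sample entry, for illustration only.
inventory = [
    DatasetRecord(
        name="Service locations",
        description="Addresses and opening hours of public offices",
        managers="Customer Services team",
        link="https://example.org/service-locations.csv",
        classification="open",
        data_format="CSV",
        publication_frequency="quarterly",
        demand="Frequent requests from the general public",
    ),
]
print(inventory_to_csv(inventory))
```

Keeping the inventory itself in a machine-readable format means it can later be filtered, for example by classification or by demand.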
Open Data Audit Tool
http://audit.data.gov.ie/
A Data Audit results in a data inventory
Which data should we publish as Open Data?
• High-Value
• Sustainable
• Low-Hanging Fruit
High-Value Datasets
• Defined as documents the re-use of which is associated with important benefits for society and the economy.
• They have high commercial potential and can speed up the emergence of value-added EU-wide information products. They will also serve as key data sources for the development of Artificial Intelligence.
• They are subject to a separate set of rules ensuring their availability free of charge, in machine-readable formats, provided via APIs and, where relevant, as bulk download.
High-Value Datasets (HVDs)
• Geospatial
– e.g. national and local maps, postcodes
• Statistics
– e.g. demographic and economic indicators
• Companies and company ownership
– e.g. business registers and registration identifiers
• Mobility
– e.g. road signs and inland waterways
• Earth observation and environment
– e.g. energy consumption and satellite images
• Meteorological
– e.g. in situ data from instruments and weather forecasts
HVDs – Is there a demand for this information?
➢ From colleagues
➢ From other Public Bodies
➢ From researchers
➢ From the general public
➢ Via Freedom of Information (FOI) requests
➢ Via Parliamentary Questions (PQs)
Sustainable Dataset
Can we continue to support the publication of this dataset on an ongoing basis?
• Are individuals a bottleneck?
• Is the publication process repeatable?
• Can publishing be automated?
• Can a harvester be built to automate publication?
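One way to make the publication process repeatable is to script the export and only republish when the content has changed. The sketch below assumes the data can be read as a list of rows. The function names and sample rows are hypothetical, and the upload step itself is left out.

```python
import csv
import hashlib
import io


def export_csv(rows, fieldnames):
    """Render rows as CSV text in a fixed column order, so reruns are reproducible."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def needs_publishing(csv_text, last_published_digest):
    """Return (changed, digest); publish only when the content has changed."""
    digest = hashlib.sha256(csv_text.encode("utf-8")).hexdigest()
    return digest != last_published_digest, digest


# Hypothetical rows; in practice these would be read from the source system.
rows = [
    {"office": "Dublin", "open_days": 5},
    {"office": "Galway", "open_days": 4},
]
csv_text = export_csv(rows, ["office", "open_days"])

# First run: nothing published yet, so the export should be published.
changed, digest = needs_publishing(csv_text, last_published_digest=None)
# Second run with identical data: nothing to do.
unchanged, _ = needs_publishing(csv_text, last_published_digest=digest)
print(changed, unchanged)
```

Because the script does not depend on any one person remembering the steps, it can be run on a schedule.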
Low-Hanging Fruit
Start by publishing the easier, non-controversial datasets,
e.g. service information or data that is already available in non-machine-readable
formats.
• "People will misinterpret the data"
• "I don’t mind, but someone else might"
• "We’ll get spam"
• "The data is of poor quality"
Which data should we publish as Open Data?
• High-Value
• Sustainable
Data Inventory
How can we make data available as Open Data?
People need raw data to integrate, visualise, analyse, plan, decide, innovate, build, etc.
For example, the publication of COVID-19 datasets:
https://data.gov.ie/dataset?q=covid
https://covid19ireland-geohive.hub.arcgis.com/
https://public.flourish.studio/story/279636/
https://twitter.com/seanmulvany/status/1316466663282405377
https://twitter.com/DrRuairiOReilly/status/1331998933246234629
https://www.researchgate.net/publication/342866515_Analysis_of_Possible_Excess_COVID-19_Deaths_in_Ireland
https://www.irishtimes.com/business/technology/coronavirus-pandemic-moves-data-analytics-front-and-centre-1.4430982?mode=amp
Where is your data currently stored/managed?
• Spreadsheets on your computer
• On a shared workspace
• On a file server
• In a specific system
• On your website
Make data available online in open, machine-readable formats
General
• CSV
• JSON
• XML
• ODF
• RDF
• TSV
Geospatial
• GeoJSON
• GML
• KML
• WKT
• LAS
• IFC
• Shapefile
• WMS
Domain-Specific
• PX
• JSON-stat
• NetCDF
• BUFR
• Datex II
• GTFS
• HDF5
• GRIB
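As an illustration of moving between the formats listed above, the sketch below converts a small CSV table into JSON records and a GeoJSON FeatureCollection, using only the Python standard library. The sample data and the column names (`name`, `latitude`, `longitude`) are assumptions made for the example.

```python
import csv
import io
import json

# Hypothetical sample; column names are illustrative only.
CSV_TEXT = """name,latitude,longitude
Office A,53.35,-6.26
Office B,53.27,-9.05
"""


def csv_to_records(csv_text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def records_to_geojson(records):
    """Build a GeoJSON FeatureCollection of points (coordinates are [lon, lat])."""
    features = []
    for rec in records:
        properties = {k: v for k, v in rec.items() if k not in ("latitude", "longitude")}
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                "coordinates": [float(rec["longitude"]), float(rec["latitude"])],
            },
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


records = csv_to_records(CSV_TEXT)
geojson = records_to_geojson(records)
print(json.dumps(records, indent=2))
print(json.dumps(geojson, indent=2))
```

Note that GeoJSON stores coordinates in longitude, latitude order, which is a common source of errors when converting from tabular data.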
Source Systems
Publish Raw Data Online
• Website
• Data Catalogue
• File Server
• API
Publish Metadata on the Open Data portal
Building an Application Programming Interface (API)
✓ Integration
✓ Flexibility
✓ Best option for dynamic data
✓ Application development
❖ Upfront effort
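To give a sense of the upfront effort, here is a minimal sketch of a read-only JSON API built with Python's standard library. The `/readings` endpoint, the `station` query parameter and the sample records are hypothetical. A production API would also need paging, documentation and monitoring.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

# Hypothetical records; a real API would query the source system.
DATASET = [
    {"station": "A", "reading": 12.5},
    {"station": "B", "reading": 9.1},
]


def handle_request(path):
    """Map a request path to (status_code, JSON-serialisable body)."""
    parsed = urlparse(path)
    if parsed.path != "/readings":
        return 404, {"error": "not found"}
    # Optional filter, e.g. /readings?station=B
    station = parse_qs(parsed.query).get("station", [None])[0]
    rows = [r for r in DATASET if station is None or r["station"] == station]
    return 200, {"count": len(rows), "results": rows}


class ApiHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = handle_request(self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8000):
    """Start the server; call this explicitly, it blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), ApiHandler).serve_forever()
```

Calling `serve()` and requesting `http://127.0.0.1:8000/readings?station=B` would return only the matching records. Keeping the routing logic in `handle_request` makes it testable without starting a server.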
How to be discoverable on the Open Data portal?
Manual Publication of Data
✓ Good for once-off publication
✓ Suitable for linking to an API endpoint
❖ Not sustainable in the long term
Be harvested by the Open Data portal
✓ Automated
✓ Keeps data up-to-date
✓ Harvester built by Derilinx
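Harvesting depends on the publisher exposing machine-readable metadata about each dataset. The sketch below builds one such description as JSON-LD using terms from the W3C DCAT and Dublin Core vocabularies. Which metadata format the harvester mentioned above actually consumes is not stated here, so treat the choice of DCAT, the function name and the sample values as assumptions.

```python
import json


def dcat_dataset(title, description, keywords, download_url, media_type):
    """Describe one dataset and one downloadable file using DCAT terms."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dcat:keyword": list(keywords),
        "dcat:distribution": [{
            "@type": "dcat:Distribution",
            "dcat:downloadURL": {"@id": download_url},
            "dcat:mediaType": media_type,
        }],
    }


# Hypothetical dataset, for illustration only.
entry = dcat_dataset(
    title="Service locations",
    description="Addresses and opening hours of public offices",
    keywords=["services", "locations"],
    download_url="https://example.org/service-locations.csv",
    media_type="text/csv",
)
print(json.dumps(entry, indent=2))
```

If a file like this is regenerated whenever the data changes, a harvester can pick up new and updated datasets without manual re-entry.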
Where do I start?
1. What data do we manage?
– Carry out a data audit
2. Which data should we publish as Open Data?
– High-value / sustainable / low-hanging fruit
3. How can we make data available as Open Data?
– Publish online in open, machine-readable formats
4. How to be discoverable on the Open Data portal?
– Manually / via harvester
