SlideShare a Scribd company logo
1 of 19
Download to read offline
Johann Höchtl, Danube University Krems, Austria
Institutionalising Open Data Quality:
Processes, Standards, Tools
Open Data Quality: from Theory to Practice
2
1. Assess data quality
3
What is Data Quality?
http://opendata.stackexchange.com/questions/613/what-are-the-data-quality-measures-for-open-data
4
A: Measures towards Trust
1.Establish quantitative measures
2.Provide statistics
3.Show-case lighthouse projects and business use
Trust
Quantity research
Evaluation Community involvement /
management
5
2. Solve current problems
6
● Inconsistent encoding
– Microsoft Excel caused data problems even when used […]
UTF-8
– Data contaminated with characters incomprehensible to
UTF-8; ill-formatted following UTF-8; flipped erratically
between other character formats; used US ASCII standard,
ISO-8859 standard and a similar non-ISO encoding
● Inconsistent dates, file names, data fields
– Data were regularly formatted with commas; changed its
filename convention; omitted or added data fields; changed
the way it formatted dates
http://www.computerweekly.com/news/2240227682/Poor-data-quality-hindering-government-open-data-transparency-programme
Mundane problems -
Encodings & Formats
7
Mundane problems –
Broken Links
http://thomaslevine.com/!/data-catalog-dead-links/
http://openstate.eu/2014/06/nederlands-nauwelijks-nieuwe-Datasets-op-data-overheid-nl/
City of Vienna – Resource check
8
B: Measures towards Open Data
Quality: Process Domain
● Data publication must be made an integral, well- defined
and standardized part of daily procedures and routines
– A. Zuiderwijk, M. Janssen, S. Choenni, and R. Meijer, “Design principles for improving the process of publishing open data,”
Transforming Government: People, Process and Policy, vol. 8, no. 2, pp. 185–204, 2014.
● Process model in which open data serves as a facilitator
towards open government
– G. Lee and Y. H. Kwak, “An Open Government Implementation Model: Moving to Increased Public Engagement,” IBM Center
for The Business of Government, Jan. 2011 [Online]. Available:
http://www.businessofgovernment.org/sites/default/files/An%20Open%20Government%20Implementation%20Model.pdf
● Establish a Chief Data Officer
– Y. Lee, “A cubic framework for the chief data officer : succeeding in a world of big data,” 2014.
9
B: Measures towards Open Data
Quality: Standards Domain
● Data on the Web
– Data on the Web Best Practices Working Group Charter
http://www.w3.org/2013/05/odbp-charter.html
– Encodings: UTF8
● File formats
– CSV: CSV on the Web Working Group
http://www.w3.org/2013/csvw/wiki/Main_Page
– Frictionless open Data: CSV Files (OKFN guidance document)
http://data.okfn.org/doc/csv
● Data entities
– Geo-Data: Spatial Data on the Web Working Group Charter
http://www.w3.org/2015/spatial/charter
– Date & Time: ISO 8601 http://www.w3.org/TR/NOTE-datetime
10
B: Measures towards Open Data
Quality: Tools Domain
● Identify Problems
● Curate File Formats & Encodings
https://github.com/ckan/ideas-and-roadmap/issues/65
T. Levine, “How can we figure out what is inside thousands of spreadsheets?,”
CEUR workshop proceedings, vol. 1209, pp. 34–38, Jul. 2014.
http://ceur-ws.org/Vol-1209/paper_12.pdf
11
Measures Towards Open Data Quality
Processes
Tools
ISOStandards
12
Open Data Quality at the
European Open Data Portal
● A.6. Mechanisms for probing broken links
The portal infrastructure will include a mechanism for
systematically probing for broken links. […] The contractor will
define and implement a communication protocol to alert the
owner of the resource.
● A.8. Mechanism allowing data linking
When RDF, * record a link between datasets that use the same
URIs; * propose a mapping between URIs that are likely to
denote the same entities
● B.6. User feedback mechanism
Allowing visitors […] suggestions for improvements in the data
quality
13
Open Data Quality in Austria
● Cooperation OGD Austria represents administration
open data portal operators
– Defines standards and procedures
– Aligned with International, European and D-A-CH efforts
● Institutionalising effort by
Sub-Working Group of Cooperation OGD Austria
Licenses
Metadata
Open Documents
Quality
Linked Data
Cooperation
14
2
check
Portal betrieben von Provider
Data producer
Data consumer
publishes
obtains
Data
references
produces
Monitor
3
checks
checks
1
4
provides
Community-Portal
Data portal
operated by
improves
5
delivers
improvesinformes
Open Data Quality
Integration Framework
1.Quality processes and
procedure models to assess
and publish data
2.Contributions of the Open
Data users
3.Quality checks when entering
(meta-)data descriptions at
the data portal
4.Monitoring of data quality
over time
5.Community-driven data portal
with user-generated content,
e.g. enrich metadata,
alternative data formats, etc.
Partners
Project duration: 30 months
o Start: October 2015
o End: March 2018
Semantic Web Company Danube University Krems Vienna University of Economics and
Business
Improving Data Quality in Open Data
ADEQUATe
• Data contaminated
with characters
incomprehensible to
UTF-8; flipped
erratically between
other character
formats;
• Data were regularly
formatted with
commas; changed its
filename convention;
A D E Q U A T e
D a t a Q u a l i t y D o m a i n s & C h a l l e n g e s
Address Data Quality in
three domains:
1. Tools
2. Standards
3. Processes
HowWhy
Easy to check
Availability, conformance,
processability, timeliness,
representational-consistency;
interlinking, conformance to
vocabularies, provenance
(Linked Data)
Hard to assess
Completeness, consistency,
accuracy, credibility, relevance
What
ADEQUATe: GOALS
Improving Data Quality on Open Data
1. Quality measures
2. Evolution monitor
3. Quality improvement through
o Algorithms
o Data linkage
o Crowdsourcing
Use cases
ADEQUATe: GOALS
M24
Refinements
M18
Quality improvements Use case connection
M12
Quality monitoring framework Data linkage
M8
Architecture Blueprint
M6
Quality metrics Requirements
ADEQUATe: GOALSADEQUATe: Milestones
19
Donau-Universität Krems.
Die Universität für Weiterbildung.
Johann Höchtl
Center for E-Governance
Johann.hoechtl@donau-uni.ac.at
@myprivate42
at.linkedin.com/in/johannhoechtl
CC-BY 3.0

More Related Content

Viewers also liked

Martin Bardsley: Quality In Austerity-Indicators of Quality
Martin Bardsley: Quality In Austerity-Indicators of QualityMartin Bardsley: Quality In Austerity-Indicators of Quality
Martin Bardsley: Quality In Austerity-Indicators of Quality
Nuffield Trust
 
Linking Open Data with Drupal
Linking Open Data with DrupalLinking Open Data with Drupal
Linking Open Data with Drupal
emmanuel_jamin
 
Rigor and relevance ppt
Rigor and relevance pptRigor and relevance ppt
Rigor and relevance ppt
deborahsutton
 

Viewers also liked (20)

Enterprise Data World Webinars: Data Quality for Data Modelers
Enterprise Data World Webinars: Data Quality for Data ModelersEnterprise Data World Webinars: Data Quality for Data Modelers
Enterprise Data World Webinars: Data Quality for Data Modelers
 
Applied semantic technology and linked data
Applied semantic technology and linked dataApplied semantic technology and linked data
Applied semantic technology and linked data
 
Martin Bardsley: Quality In Austerity-Indicators of Quality
Martin Bardsley: Quality In Austerity-Indicators of QualityMartin Bardsley: Quality In Austerity-Indicators of Quality
Martin Bardsley: Quality In Austerity-Indicators of Quality
 
Query-Driven Management of Linked Data Quality
Query-Driven Management of Linked Data QualityQuery-Driven Management of Linked Data Quality
Query-Driven Management of Linked Data Quality
 
Linking Open Data with Drupal
Linking Open Data with DrupalLinking Open Data with Drupal
Linking Open Data with Drupal
 
Sieve - Data Quality and Fusion - LWDM2012
Sieve - Data Quality and Fusion - LWDM2012Sieve - Data Quality and Fusion - LWDM2012
Sieve - Data Quality and Fusion - LWDM2012
 
Linked Data Quality Assessment – daQ and Luzzu
Linked Data Quality Assessment – daQ and LuzzuLinked Data Quality Assessment – daQ and Luzzu
Linked Data Quality Assessment – daQ and Luzzu
 
Open Data for Social Good
Open Data for Social GoodOpen Data for Social Good
Open Data for Social Good
 
Linked Data in Libraries
Linked Data in LibrariesLinked Data in Libraries
Linked Data in Libraries
 
Quality Metrics for Linked Open Data
Quality Metrics for  Linked Open Data Quality Metrics for  Linked Open Data
Quality Metrics for Linked Open Data
 
Rigor and relevance ppt
Rigor and relevance pptRigor and relevance ppt
Rigor and relevance ppt
 
Data Quality: A Raising Data Warehousing Concern
Data Quality: A Raising Data Warehousing ConcernData Quality: A Raising Data Warehousing Concern
Data Quality: A Raising Data Warehousing Concern
 
Linked Open Data for Libraries
Linked Open Data for LibrariesLinked Open Data for Libraries
Linked Open Data for Libraries
 
Linked Data Quality Assessment: A Survey
Linked Data Quality Assessment: A SurveyLinked Data Quality Assessment: A Survey
Linked Data Quality Assessment: A Survey
 
Linked Open Data
Linked Open DataLinked Open Data
Linked Open Data
 
Ensuring data quality
Ensuring data qualityEnsuring data quality
Ensuring data quality
 
Open data quality
Open data qualityOpen data quality
Open data quality
 
Data Quality Presentation
Data Quality PresentationData Quality Presentation
Data Quality Presentation
 
Linked Open Data Principles, Technologies and Examples
Linked Open Data Principles, Technologies and ExamplesLinked Open Data Principles, Technologies and Examples
Linked Open Data Principles, Technologies and Examples
 
Data, Information And Knowledge Management Framework And The Data Management ...
Data, Information And Knowledge Management Framework And The Data Management ...Data, Information And Knowledge Management Framework And The Data Management ...
Data, Information And Knowledge Management Framework And The Data Management ...
 

Similar to Institutionalising open data quality - Processes Standards, Tools

Prcn 2019 stage 1264-question-presentation_poster file_id-15
Prcn 2019 stage 1264-question-presentation_poster file_id-15Prcn 2019 stage 1264-question-presentation_poster file_id-15
Prcn 2019 stage 1264-question-presentation_poster file_id-15
madynav
 
U mass data-project proposal-noapp
U mass data-project proposal-noappU mass data-project proposal-noapp
U mass data-project proposal-noapp
Daisy LaFlamme
 

Similar to Institutionalising open data quality - Processes Standards, Tools (20)

ESSnet Big Data WP8 Methodology (+ Quality, +IT)
ESSnet Big Data WP8 Methodology (+ Quality, +IT)ESSnet Big Data WP8 Methodology (+ Quality, +IT)
ESSnet Big Data WP8 Methodology (+ Quality, +IT)
 
Prcn 2019 stage 1264-question-presentation_poster file_id-15
Prcn 2019 stage 1264-question-presentation_poster file_id-15Prcn 2019 stage 1264-question-presentation_poster file_id-15
Prcn 2019 stage 1264-question-presentation_poster file_id-15
 
Allotrope Foundation & OSTHUS at SmartLab Exchange 2015: Update on the Allotr...
Allotrope Foundation & OSTHUS at SmartLab Exchange 2015: Update on the Allotr...Allotrope Foundation & OSTHUS at SmartLab Exchange 2015: Update on the Allotr...
Allotrope Foundation & OSTHUS at SmartLab Exchange 2015: Update on the Allotr...
 
ALIGNED Data Curation Methods and Tools
ALIGNED Data Curation Methods and ToolsALIGNED Data Curation Methods and Tools
ALIGNED Data Curation Methods and Tools
 
Presentation ADEQUATe Project: Workshop on Quality Assessment and Improvement...
Presentation ADEQUATe Project: Workshop on Quality Assessment and Improvement...Presentation ADEQUATe Project: Workshop on Quality Assessment and Improvement...
Presentation ADEQUATe Project: Workshop on Quality Assessment and Improvement...
 
U mass data-project proposal-noapp
U mass data-project proposal-noappU mass data-project proposal-noapp
U mass data-project proposal-noapp
 
EUDAT & OpenAIRE Webinar: How to write a Data Management Plan - July 14, 2016...
EUDAT & OpenAIRE Webinar: How to write a Data Management Plan - July 14, 2016...EUDAT & OpenAIRE Webinar: How to write a Data Management Plan - July 14, 2016...
EUDAT & OpenAIRE Webinar: How to write a Data Management Plan - July 14, 2016...
 
WEBINAR: Open Research Data in Horizon 2020
WEBINAR: Open Research Data in Horizon 2020WEBINAR: Open Research Data in Horizon 2020
WEBINAR: Open Research Data in Horizon 2020
 
H2020 Open Data Pilot
H2020 Open Data PilotH2020 Open Data Pilot
H2020 Open Data Pilot
 
EUDAT & OpenAIRE Webinar: How to write a Data Management Plan - July 7, 2016|...
EUDAT & OpenAIRE Webinar: How to write a Data Management Plan - July 7, 2016|...EUDAT & OpenAIRE Webinar: How to write a Data Management Plan - July 7, 2016|...
EUDAT & OpenAIRE Webinar: How to write a Data Management Plan - July 7, 2016|...
 
Open Research Data & H2020
Open Research Data & H2020Open Research Data & H2020
Open Research Data & H2020
 
HKU Data Curation MLIM7350 Student Project: Data Curation Workshop
HKU Data Curation MLIM7350 Student Project: Data Curation WorkshopHKU Data Curation MLIM7350 Student Project: Data Curation Workshop
HKU Data Curation MLIM7350 Student Project: Data Curation Workshop
 
H2020 data pilot openaire
H2020 data pilot openaireH2020 data pilot openaire
H2020 data pilot openaire
 
The Horizon 2020 Open Data Pilot - OpenAIRE webinar (Oct. 21 2014) by Sarah J...
The Horizon 2020 Open Data Pilot - OpenAIRE webinar (Oct. 21 2014) by Sarah J...The Horizon 2020 Open Data Pilot - OpenAIRE webinar (Oct. 21 2014) by Sarah J...
The Horizon 2020 Open Data Pilot - OpenAIRE webinar (Oct. 21 2014) by Sarah J...
 
How links can make your open data even greater
How links can make your open data even greaterHow links can make your open data even greater
How links can make your open data even greater
 
How to make your data count webinar, 26 Nov 2018
How to make your data count webinar, 26 Nov 2018How to make your data count webinar, 26 Nov 2018
How to make your data count webinar, 26 Nov 2018
 
Experimental transformation of ABS data into Data Cube Vocabulary (DCV) form...
Experimental transformation of  ABS data into Data Cube Vocabulary (DCV) form...Experimental transformation of  ABS data into Data Cube Vocabulary (DCV) form...
Experimental transformation of ABS data into Data Cube Vocabulary (DCV) form...
 
COMSODE networking session at ICT Lisbon 2015
COMSODE networking session at ICT Lisbon 2015COMSODE networking session at ICT Lisbon 2015
COMSODE networking session at ICT Lisbon 2015
 
Palantir learning for begineers to understand basics
Palantir learning for begineers to understand basicsPalantir learning for begineers to understand basics
Palantir learning for begineers to understand basics
 
Research Data Management: An Introductory Webinar from OpenAIRE and EUDAT
Research Data Management: An Introductory Webinar from OpenAIRE and EUDATResearch Data Management: An Introductory Webinar from OpenAIRE and EUDAT
Research Data Management: An Introductory Webinar from OpenAIRE and EUDAT
 

More from Johann Höchtl

Smart Cities, Smart Regions and the Role of Open Data
Smart Cities, Smart Regions and the Role of Open DataSmart Cities, Smart Regions and the Role of Open Data
Smart Cities, Smart Regions and the Role of Open Data
Johann Höchtl
 
DCAT-Application Profile for Data Providers
DCAT-Application Profile for Data ProvidersDCAT-Application Profile for Data Providers
DCAT-Application Profile for Data Providers
Johann Höchtl
 

More from Johann Höchtl (20)

Homomorphic encryption on Blockchain Principles
Homomorphic encryption on Blockchain PrinciplesHomomorphic encryption on Blockchain Principles
Homomorphic encryption on Blockchain Principles
 
Performance-indicator based policy-making in Austria
Performance-indicator based policy-making in AustriaPerformance-indicator based policy-making in Austria
Performance-indicator based policy-making in Austria
 
Datenqualität auf Offenen Datenportalen
Datenqualität auf Offenen DatenportalenDatenqualität auf Offenen Datenportalen
Datenqualität auf Offenen Datenportalen
 
ADV FIWARE Workshop starring Docker and Virtualisation
ADV FIWARE Workshop starring Docker and VirtualisationADV FIWARE Workshop starring Docker and Virtualisation
ADV FIWARE Workshop starring Docker and Virtualisation
 
Yound Coders Festival
Yound Coders FestivalYound Coders Festival
Yound Coders Festival
 
Sind wir schon da?!
Sind wir schon da?!Sind wir schon da?!
Sind wir schon da?!
 
Offener Haushalt – Transparenz in öffentlichen Haushalten
Offener Haushalt – Transparenz in öffentlichen HaushaltenOffener Haushalt – Transparenz in öffentlichen Haushalten
Offener Haushalt – Transparenz in öffentlichen Haushalten
 
Datenqualität von Datenportalen
Datenqualität von DatenportalenDatenqualität von Datenportalen
Datenqualität von Datenportalen
 
Open Government Data & offene Wirtschaftsdaten - Two of a Kind?
Open Government Data & offene Wirtschaftsdaten - Two of a Kind?Open Government Data & offene Wirtschaftsdaten - Two of a Kind?
Open Government Data & offene Wirtschaftsdaten - Two of a Kind?
 
Elektronische Literaturverwaltung mit Zotero
Elektronische Literaturverwaltung mit ZoteroElektronische Literaturverwaltung mit Zotero
Elektronische Literaturverwaltung mit Zotero
 
The Case of opendataportal.at
The Case of opendataportal.atThe Case of opendataportal.at
The Case of opendataportal.at
 
From E-Government to Open Government
From E-Government to Open GovernmentFrom E-Government to Open Government
From E-Government to Open Government
 
Smart Cities and Smart ICT
Smart Cities and Smart ICTSmart Cities and Smart ICT
Smart Cities and Smart ICT
 
Evaluation of Open Government Data Implementation of City of Vienna
Evaluation of Open Government Data Implementation of City of ViennaEvaluation of Open Government Data Implementation of City of Vienna
Evaluation of Open Government Data Implementation of City of Vienna
 
Costs of Closed Science
Costs of Closed ScienceCosts of Closed Science
Costs of Closed Science
 
Smart Cities, Smart Regions and the Role of Open Data
Smart Cities, Smart Regions and the Role of Open DataSmart Cities, Smart Regions and the Role of Open Data
Smart Cities, Smart Regions and the Role of Open Data
 
OGD for Culture and Art
OGD for Culture and ArtOGD for Culture and Art
OGD for Culture and Art
 
Evaluierung der Open Government Data Umsetzung der Stadt Wien - Auszug
Evaluierung der Open Government Data Umsetzung der Stadt Wien - AuszugEvaluierung der Open Government Data Umsetzung der Stadt Wien - Auszug
Evaluierung der Open Government Data Umsetzung der Stadt Wien - Auszug
 
Open Government Data DCAT Application Profile
Open Government Data DCAT Application ProfileOpen Government Data DCAT Application Profile
Open Government Data DCAT Application Profile
 
DCAT-Application Profile for Data Providers
DCAT-Application Profile for Data ProvidersDCAT-Application Profile for Data Providers
DCAT-Application Profile for Data Providers
 

Recently uploaded

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps Productivity
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data Science
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
How to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfHow to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cf
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
 
JavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuideJavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate Guide
 

Institutionalising open data quality - Processes Standards, Tools

  • 1. Johann Höchtl, Danube University Krems, Austria Institutionalising Open Data Quality: Processes, Standards, Tools Open Data Quality: from Theory to Practice
  • 3. 3 What is Data Quality? http://opendata.stackexchange.com/questions/613/what-are-the-data-quality-measures-for-open-data
  • 4. 4 A: Measures towards Trust 1.Establish quantitative measures 2.Provide statistics 3.Show-case lighthouse projects and business use Trust Quantity research Evaluation Community involvement / management
  • 6. 6 ● Inconsistent encoding – Microsoft Excel caused data problems even when used […] UTF-8 – Data contaminated with characters incomprehensible to UTF-8; ill-formatted following UTF-8; flipped erratically between other character formats; used US ASCII standard, ISO-8859 standard and a similar non-ISO encoding ● Inconsistent dates, file names, data fields – Data were regularly formatted with commas; changed its filename convention; omitted or added data fields; changed the way it formatted dates http://www.computerweekly.com/news/2240227682/Poor-data-quality-hindering-government-open-data-transparency-programme Mundane problems - Encodings & Formats
  • 7. 7 Mundane problems – Broken Links http://thomaslevine.com/!/data-catalog-dead-links/ http://openstate.eu/2014/06/nederlands-nauwelijks-nieuwe-Datasets-op-data-overheid-nl/ City of Vienna – Resource check
  • 8. 8 B: Measures towards Open Data Quality: Process Domain ● Data publication must be made an integral, well- defined and standardized part of daily procedures and routines – A. Zuiderwijk, M. Janssen, S. Choenni, and R. Meijer, “Design principles for improving the process of publishing open data,” Transforming Government: People, Process and Policy, vol. 8, no. 2, pp. 185–204, 2014. ● Process model in which open data serves as a facilitator towards open government – G. Lee and Y. H. Kwak, “An Open Government Implementation Model: Moving to Increased Public Engagement,” IBM Center for The Business of Government, Jan. 2011 [Online]. Available: http://www.businessofgovernment.org/sites/default/files/An%20Open%20Government%20Implementation%20Model.pdf ● Establish a Chief Data Officer – Y. Lee, “A cubic framework for the chief data officer : succeeding in a world of big data,” 2014.
  • 9. 9 B: Measures towards Open Data Quality: Standards Domain ● Data on the Web – Data on the Web Best Practices Working Group Charter http://www.w3.org/2013/05/odbp-charter.html – Encodings: UTF8 ● File formats – CSV: CSV on the Web Working Group http://www.w3.org/2013/csvw/wiki/Main_Page – Frictionless open Data: CSV Files (OKFN guidance document) http://data.okfn.org/doc/csv ● Data entities – Geo-Data: Spatial Data on the Web Working Group Charter http://www.w3.org/2015/spatial/charter – Date & Time: ISO 8601 http://www.w3.org/TR/NOTE-datetime
  • 10. 10 B: Measures towards Open Data Quality: Tools Domain ● Identify Problems ● Curate File Formats & Encodings https://github.com/ckan/ideas-and-roadmap/issues/65 T. Levine, “How can we figure out what is inside thousands of spreadsheets?,” CEUR workshop proceedings, vol. 1209, pp. 34–38, Jul. 2014. http://ceur-ws.org/Vol-1209/paper_12.pdf
  • 11. 11 Measures Towards Open Data Quality Processes Tools ISOStandards
  • 12. 12 Open Data Quality at the European Open Data Portal ● A.6. Mechanisms for probing broken links The portal infrastructure will include a mechanism for systematically probing for broken links. […] The contractor will define and implement a communication protocol to alert the owner of the resource. ● A.8. Mechanism allowing data linking When RDF, * record a link between datasets that use the same URIs; * propose a mapping between URIs that are likely to denote the same entities ● B.6. User feedback mechanism Allowing visitors […] suggestions for improvements in the data quality
  • 13. 13 Open Data Quality in Austria ● Cooperation OGD Austria represents administration open data portal operators – Defines standards and procedures – Aligned with International, European and D-A-CH efforts ● Institutionalising effort by Sub-Working Group of Cooperation OGD Austria Licenses Metadata Open Documents Quality Linked Data Cooperation
  • 14. 14 2 check Portal betrieben von Provider Data producer Data consumer publishes obtains Data references produces Monitor 3 checks checks 1 4 provides Community-Portal Data portal operated by improves 5 delivers improvesinformes Open Data Quality Integration Framework 1.Quality processes and procedure models to assess and publish data 2.Contributions of the Open Data users 3.Quality checks when entering (meta-)data descriptions at the data portal 4.Monitoring of data quality over time 5.Community-driven data portal with user-generated content, e.g. enrich metadata, alternative data formats, etc.
  • 15. Partners Project duration: 30 months o Start: October 2015 o End: March 2018 Semantic Web Company Danube University Krems Vienna University of Economics and Business Improving Data Quality in Open Data ADEQUATe
  • 16. • Data contaminated with characters incomprehensible to UTF-8; flipped erratically between other character formats; • Data were regularly formatted with commas; changed its filename convention; A D E Q U A T e D a t a Q u a l i t y D o m a i n s & C h a l l e n g e s Address Data Quality in three domains: 1. Tools 2. Standards 3. Processes HowWhy Easy to check Availability, conformance, processability, timeliness, representational-consistency; interlinking, conformance to vocabularies, provenance (Linked Data) Hard to assess Completeness, consistency, accuracy, credibility, relevance What
  • 17. ADEQUATe: GOALS Improving Data Quality on Open Data 1. Quality measures 2. Evolution monitor 3. Quality improvement through o Algorithms o Data linkage o Crowdsourcing Use cases ADEQUATe: GOALS
  • 18. M24 Refinements M18 Quality improvements Use case connection M12 Quality monitoring framework Data linkage M8 Architecture Blueprint M6 Quality metrics Requirements ADEQUATe: GOALSADEQUATe: Milestones
  • 19. 19 Donau-Universität Krems. Die Universität für Weiterbildung. Johann Höchtl Center for E-Governance Johann.hoechtl@donau-uni.ac.at @myprivate42 at.linkedin.com/in/johannhoechtl CC-BY 3.0