Data Quality
Jeremy Debattista
ADAPT Centre, Trinity College Dublin
This research has received funding from the Irish Research Council Government of Ireland Postdoctoral Fellowship award (GOIPD/2017/1204)
and the ADAPT Centre for Digital Content Technology, funded under the SFI Research Centres Programme (Grant 13/RC/2106) and co-funded by
the European Regional Development Fund.
www.adaptcentre.ie
How many of you...
… check product reviews before purchasing?
Image and Reviews taken from
https://www.amazon.co.uk/Echo-Dot-Smart-Speaker-Alexa/dp/B0792KWK57/
How many of you...
… check TripAdvisor to find the right restaurant?
Images taken from TripAdvisor.com
Quality: A definition from a Personal Perspective
Crowd Image by James Cridland, taken from https://www.flickr.com/photos/jamescridland/613445810/. Licensed under CC-BY 2.0
What does quality mean to you?
Quality: A definition
Robert Pirsig
Joseph Juran
Philip Crosby
Quality: A definition – Pirsig’s Perspective
Robert Pirsig
… the result of care
Zen and the Art of Motorcycle Maintenance (1974)
Photo taken from: https://www.goodreads.com
Quality: A definition – Juran’s Perspective
… fitness for use
Quality Control Handbook (1974)
Joseph Juran
Photo taken from: https://www.toolshero.com
Quality: A definition – Crosby’s Perspective
… conformance to requirements
Quality is Free: The Art of Making Quality Certain. Mentor book (1979)
Philip Crosby
Photo taken from: https://ceopedia.org
Data Quality – What is data quality?
What characterised good quality for the datasets you needed to perform a task?
Data Quality Definition
Quality in terms of data is:
• A multi-dimensional concept
• Characterised with respect to a particular task
• Assessed with a variety of quality measures, subjective or objective, for different tasks (e.g. Accessibility, Trustworthiness, Consistency)
High-quality data = data that is fit for its intended use.
Data Quality – Why is it important?
DATA
Data Quality – A Strategy for Organisations
• Data Quality is expensive
• Data Quality is not just about assessing but also about improving.
Figure from Ismael Caballero, Jorge Merino, Manuel Serrano, Mario Piattini, Data Quality for Big Data: Addressing Veracity and Value, 2016
Data Quality – Identify problems early!
A simplistic view of the semantic publishing process
(Un/semi-)structured data sources
Processing/Uplifting
Schemas
Mapping
Transform
Fusion
Semantic (Knowledge) Graph
(Un/semi-)structured data sources:
• Potentially external data
• No structure and context to the data
• Certification of quality?
Schemas:
• Gives context to raw data
• Drives the resulting knowledge graphs
• Should be free of contradictions and incorrect definitions
Mapping:
• Incorrect/Incomplete mappings (e.g. typos)
• Catch errors here, as otherwise errors in your KG will multiply
Fusion:
• Are external data sources fit for the task at hand?
Semantic (Knowledge) Graph:
• Any quality issues not dealt with earlier will definitely show up here
• With big data, cleaning is more time-consuming and more expensive
Linked Data Quality Metrics
Figure from: A. J. Zaveri, A. Rula, A. Maurino, R. Pietrobon, J. Lehmann, and S. Auer. Quality Assessment for Linked Data: A survey.
Linked Data Quality Metrics - Accessibility
Are Linked Data resources readily available to be re-used in different applications/contexts?
Example Metrics:
• Availability of SPARQL endpoints and RDF Data Dumps
• Dereferenceability of resources
• Indication of machine/human readable license
• Links to external datasets
• Correct usage of hash/slash URIs
Linked Data Quality Metrics - Intrinsic
Metrics related to the correctness and coherence of the data, independent of the user’s context.
Example Metrics:
• Syntactically valid dataset
• Incorrect datatype specification (e.g. “23.42”^^xsd:integer)
• Outlier detection
• Correct domain and range definition
• Data conciseness
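The incorrect-datatype metric above can be illustrated with a small check. This is a minimal sketch, not the implementation used in any of the cited frameworks: the function names are hypothetical, the literals are assumed to arrive as (lexical form, datatype IRI) pairs, and only three XSD datatypes are covered with simplified patterns.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Simplified lexical-form patterns for a few XSD datatypes (assumption:
# only these three are checked; everything else is left unchecked).
LEXICAL_PATTERNS = {
    XSD + "integer": re.compile(r"[+-]?\d+"),
    XSD + "decimal": re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)"),
    XSD + "boolean": re.compile(r"true|false|1|0"),
}


def has_valid_lexical_form(lexical: str, datatype_iri: str) -> bool:
    """Return True if the lexical form matches its declared datatype.

    Datatypes without a known pattern are treated as valid (not checked).
    """
    pattern = LEXICAL_PATTERNS.get(datatype_iri)
    if pattern is None:
        return True
    return pattern.fullmatch(lexical) is not None


def datatype_conformance(literals) -> float:
    """Fraction of (lexical, datatype) pairs whose lexical form is valid."""
    literals = list(literals)
    if not literals:
        return 1.0
    valid = sum(has_valid_lexical_form(lex, dt) for lex, dt in literals)
    return valid / len(literals)


# The slide's example: "23.42" is not a valid xsd:integer.
example = [("23.42", XSD + "integer"), ("7", XSD + "integer")]
score = datatype_conformance(example)
```

Here `score` is the share of well-typed literals in the example, so a dataset with no typed-literal errors would score 1.0.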
Linked Data Quality Metrics - Contextual
Metrics that depend on the task at hand.
Example Metrics:
• Trustworthiness of data
• Identification of timely data
• Provenance information
Linked Data Quality Metrics - Representational
How well is the data represented in terms of common best practices and guidelines?
Example Metrics:
• Re-using existing vocabularies
• Usage of undefined classes/properties
• Provide different serialisation formats for the data
• Use of multiple languages
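As an illustration of the "usage of undefined classes/properties" metric, the sketch below collects every property and every class (object of rdf:type) used in a set of triples and reports those missing from a set of known vocabulary terms. It assumes triples are plain (subject, predicate, object) tuples of IRI strings and that the defined terms have already been gathered from the vocabularies; the function name and the example.org IRIs are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def undefined_terms(triples, defined_terms):
    """Return properties and classes used in `triples` but not defined."""
    used = set()
    for _subject, predicate, obj in triples:
        used.add(predicate)          # every predicate is a property in use
        if predicate == RDF_TYPE:
            used.add(obj)            # the object of rdf:type is a class in use
    return used - set(defined_terms)


# Hypothetical vocabulary and data; "Persn" is a typo for "Person".
defined = {
    RDF_TYPE,
    "http://example.org/vocab#Person",
    "http://example.org/vocab#name",
}
data = [
    ("http://example.org/alice", RDF_TYPE, "http://example.org/vocab#Persn"),
    ("http://example.org/alice", "http://example.org/vocab#name", "Alice"),
]
problems = undefined_terms(data, defined)
```

A typo in a mapping, as mentioned for the Mapping stage earlier, would surface here as an undefined term.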
ISO/IEC 25012 Standard
• Every metric identified in the research was mapped to the ISO/IEC 25012 Model:
§ The Inherent Category – measures intrinsic quality characteristics.
§ The System Category – measures the degree of quality when the system is used.
§ The Inherent-System Category – which includes metrics covering both aspects.
http://iso25000.com/index.php/en/iso-25000-standards/iso-25012
Problems with Assessing the Quality of Big Datasets
• Metrics classified in Zaveri et al. did not take time and space complexity into consideration
• Some quality metrics are impractical to compute on big datasets; how can they be computed efficiently?
• Solving intractable problems?
• Trade-off? Faster computation time against the precision of the metric’s value
Probabilistic Techniques for Assessing Datasets
• Sampling
• Reservoir sampling
• Stratified sampling
• Bloom Filters
• Random Walks/Markov Chains
• Clustering
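Of the techniques listed above, reservoir sampling is the simplest to sketch: it keeps a fixed-size uniform sample of a stream of unknown length, so a metric can be estimated from the sample instead of the full dataset. The code below is a minimal illustration using the classic Algorithm R, not the code of the cited work; `estimate_ratio` and its `predicate` parameter are hypothetical names introduced here.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return a uniform random sample of up to k items from an iterable."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new item replaces an old one
            # only if the slot falls inside the reservoir.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


def estimate_ratio(stream, predicate, k, rng=None):
    """Estimate the share of items satisfying `predicate` from a sample."""
    sample = reservoir_sample(stream, k, rng)
    if not sample:
        return 0.0
    return sum(1 for item in sample if predicate(item)) / len(sample)


sample = reservoir_sample(range(1000), 10, random.Random(0))
```

The estimate trades precision for speed: a larger `k` gives a value closer to the exact metric at the cost of more memory and work per sample.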
Quality Assessment – A Conceptual Methodology
1. Identify Quality Measures for the task at hand
• What are the important characteristics of my task?
2. Re-use or define quality metrics
3. Prepare the quality assessment
a) Access point of dataset in question
b) External Resources such as gold standard
4. Run the quality assessment
5. Represent the assessment results
a) Immediate use
b) Mid-to-long term use
Linked Data Quality Frameworks – Over the Years
Framework       | Scalability | Extensibility  | Quality Metadata | Quality Report | Collaboration | Cleaning Support | Last Update
Flemming        | X           | X              | X                | HTML           | X             | X                | 2010
LinkQA          | ✓           | Java           | X                | HTML           | X             | X                | 2011
Sieve           | ✓           | XML            | ✓ (Optional)     | X              | X             | ✓                | 2014
RDFUnit         | ✓           | SPARQL         | ✓ (DQV)          | HTML or RDF    | X             | X                | 2017
TripleCheckMate | N/A         | X              | X                | X              | ✓             | X                | 2013
LiQuate         | N/A         | Bayesian Rules | X                | X              | X             | X                | 2014
TRELLIS         | N/A         | X              | X                | X              | ✓             | X                | 2005
tRDF/tSPARQL    | ✓           | tSPARQL Rules  | X                | X              | X             | X                | 2014
WIQA            | N/A         | WIQA PL        | X                | X              | X             | X                | 2009
Luzzu           | ✓           | Java or LQML   | ✓ (daQ)          | RDF            | X             | X                | 2018
Luzzu – A Quality Assessment Framework for Linked Data
• Four Principles:
1. Extensibility
2. Scalability
3. Interoperability
4. Customisability
[Figure: Luzzu architecture. Components shown: Dataset / SPARQL Endpoint, Stream Processing of <s,p,o> triples, Metrics Identification, List Metrics Impl. Library (Metric 1 … Metric n), Thread Pool. Outputs: Quality Metadata, Quality Problem Report.]
Try it out:
http://www.github.com/Luzzu/Framework
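The stream-processing idea can be sketched as follows: each triple is read once and handed to every selected metric, and each metric keeps only running counters. This is an illustrative sketch and not Luzzu's actual API (Luzzu metrics are written in Java or LQML); the class and method names are hypothetical, triples are assumed to be tuples of strings with literals written in quotes, and the thread pool is omitted so the metrics run sequentially.

```python
class LiteralRatioMetric:
    """Hypothetical metric: share of triples whose object is a literal."""

    def __init__(self):
        self.total = 0
        self.literals = 0

    def compute(self, triple):
        # Update running counters from a single <s,p,o> triple.
        _subject, _predicate, obj = triple
        self.total += 1
        if obj.startswith('"'):
            self.literals += 1

    def metric_value(self):
        return self.literals / self.total if self.total else 0.0


def assess(triples, metrics):
    """Stream every triple through every metric, then collect the values."""
    for triple in triples:
        for metric in metrics:
            metric.compute(triple)
    return {type(m).__name__: m.metric_value() for m in metrics}


triples = [
    ("http://example.org/a", "http://example.org/name", '"Alice"'),
    ("http://example.org/a", "http://example.org/knows", "http://example.org/b"),
]
report = assess(triples, [LiteralRatioMetric()])
```

Because each metric only sees one triple at a time, memory use does not grow with the size of the dataset, which is the point of processing the data as a stream.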
W3C Data Quality Vocabulary (DQV)
• Policies: Express the policies or agreements a dataset follows, as governed by data quality concerns
• Annotations: Provide ratings, certificates, feedback, etc.
• Feedback: Comments from data consumers on a dataset (imagine comments on TripAdvisor)
https://www.w3.org/TR/vocab-dqv/
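Besides the items above, DQV also defines terms for recording the value a metric produced for a dataset (`dqv:QualityMeasurement`, `dqv:computedOn`, `dqv:isMeasurementOf`, `dqv:value`). The sketch below builds such a record as a Turtle string. The helper name and all example.org IRIs are hypothetical, and the IRIs are assumed to need no escaping.

```python
def dqv_measurement(measurement_iri, dataset_iri, metric_iri, value):
    """Serialise one DQV quality measurement as a Turtle document."""
    return (
        "@prefix dqv: <http://www.w3.org/ns/dqv#> .\n"
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
        "\n"
        f"<{measurement_iri}> a dqv:QualityMeasurement ;\n"
        f"    dqv:computedOn <{dataset_iri}> ;\n"
        f"    dqv:isMeasurementOf <{metric_iri}> ;\n"
        f'    dqv:value "{float(value)}"^^xsd:double .\n'
    )


# Hypothetical identifiers, for illustration only.
turtle = dqv_measurement(
    "http://example.org/measurement/1",
    "http://example.org/dataset",
    "http://example.org/metric/dereferenceability",
    0.75,
)
```

Publishing measurements in this form lets consumers compare datasets by quality without re-running the assessment.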
Web of Data Quality - Aggregated
Dataset (http://)                | Aggregated Quality Score | Pos
zbw.eu                           | 84.72%                   | 1st
id.sgcb.mcu.es                   | 83.91%                   | 2nd
kdata.kr                         | 82.22%                   | 3rd
morelab.deusto.es                | 80.12%                   | 4th
mapasinteractivos.didactalia.net | 74.18%                   | 5th
...
citeseer.rkbexplorer.com         | 48.31%                   | 126th
prefix.cc                        | 46.64%                   | 127th
kent.zpr.fer.hr                  | 46.61%                   | 128th
transport.data.gov.uk            | 45.09%                   | 129th
lingvoj.org                      | 41.41%                   | 130th
Web of Data Quality – Accessibility Category
Accessibility Category:
Examples: Availability of Resources, Licensing, Server Performance
Lessons Learned:
• Average Conformance: 30%
• Standard Deviation: 19%
• Low usage of Machine-Readable Licences (17 out of 131 datasets) and Human-Readable Licences (11 out of 131 datasets)
Web of Data Quality – Contextual Category
Contextual Category:
Examples: Provenance of Data, Human Comprehensibility
Lessons Learned:
• Average Conformance: 13%
• Standard Deviation: 13%
• Poor conformance w.r.t. basic provenance information (e.g. creator of dataset), and traceability of data (predicates defining origin of data)
• Publishers should put more effort into human-readable labelling and description of resources
Web of Data Quality – Intrinsic Category
Intrinsic Category:
Examples: Syntactic Validity, Consistency, Conciseness
Lessons Learned:
• Average Conformance: 77%
• Standard Deviation: 13%
• Overall high conformance for almost all metrics
• Conformance towards the usage of correct domain or range datatypes should be improved (average conformance ≈ 60%)
Web of Data Quality – Representational Category
Representational Category:
Examples: Interoperability, Versatility, Interpretability, Data Representation
Lessons Learned:
• Average Conformance: 63%
• Standard Deviation: 14%
• Data publishers should re-use more existing terms (average conformance ≈ 34%)
Linked Open Data Cloud – A Dataset Portal
Dataset Portal: http://luzzu.adaptcentre.ie
Conclusions
• Quality is different for everyone
• Cost vs need for assessment
• Detect quality issues earlier!
• SoTA evolved to meet the consumers’ need to characterise fitness for intended use
• The quality of the Web of Data is not bad – but needs to improve
References
• J. Debattista, S. Auer, C. Lange. Luzzu - A Methodology and Framework for Linked Data Quality Assessment. In ACM Journal of Data and Information Quality, 8(1), November 2016
• J. Debattista, S. Londoño, C. Lange, S. Auer. Quality Assessment of Linked Datasets using Probabilistic Approximation. In 12th European Semantic Web Conference Proceedings 2015, 221-236, Springer
• J. Debattista. Scalable Quality Assessment of Linked Data. (Thesis) Universitäts- und Landesbibliothek Bonn, 2017
• A. J. Zaveri, A. Rula, A. Maurino, R. Pietrobon, J. Lehmann, and S. Auer. Quality Assessment for Linked Data: A survey. Semantic Web Journal, 2015
• J. Debattista, C. Lange, S. Auer. Representing dataset quality metadata using multi-dimensional views. In Proceedings of the 10th International Conference on Semantic Systems (SEMANTiCS ’14), 92-99, ACM
• S. McGurk, J. Debattista, C. Abela. Towards Ontology Quality Assessment. 4th Workshop on Linked Data Quality (LDQ)
Data Quality
@jerdeb
jeremy.debattista@adaptcentre.ie
Question Time!
