7. LDQ Dimensions & Metrics
Quality assessment for linked data: A survey. A Zaveri, A Rula, A Maurino, R Pietrobon, J Lehmann, S Auer. Semantic Web 7 (1), 63-93
300+ citations
@amrapaliz
8. LDQ Dimensions & Metrics
• Data Quality: commonly conceived as a multidimensional construct, popularly defined as ‘fitness for use’*.
• Dimension: a characteristic of a dataset.
• Metric (or indicator): a procedure for measuring an information quality dimension.
*Juran et al., The Quality Control Handbook, 1974
9. LDQ Assessment Goal
Fix data quality issues in given sets of (semantic) data
Such quality issues may
• be in source datasets (e.g., inaccurate or wrong data items, outdated data items)
• result from imperfections of a data integration process (e.g., data items that have been incorrectly linked with each other)
• reveal themselves only after the data integration (e.g., duplicates, inconsistencies)
Data cleaning may be relevant both for original datasets, before combining or integrating them, and for datasets resulting from an integration.
Source: http://www.ida.liu.se/research/semanticweb/events/SemDataMgmtTutorial-Part7-Cleaning.pdf
11. LDQ Dimensions - Accessibility dimensions & metrics
• Availability - extent to which data (or some portion of it) is present, obtainable and ready for use
  • accessibility of the SPARQL endpoint and the server
  • dereferenceability of the URI
• Interlinking - degree to which entities that represent the same concept are linked to each other, be it within or between two or more data sources
  • detection of the existence and usage of external URIs
  • detection of all local in-links or back-links: all triples from a dataset that have the resource’s URI as the object
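The two interlinking metrics above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: triples are plain `(subject, predicate, object)` tuples of strings rather than the output of an RDF parser, "external" simply means "hosted on a different host name", and the function names and `example.org` data are hypothetical.

```python
from typing import List, Tuple
from urllib.parse import urlparse

# Assumption: a triple is a (subject, predicate, object) tuple of strings.
Triple = Tuple[str, str, str]


def is_external(uri: str, local_host: str) -> bool:
    """True if `uri` is an HTTP(S) URI whose host differs from `local_host`."""
    parsed = urlparse(uri)
    return parsed.scheme in ("http", "https") and parsed.netloc != local_host


def external_links(triples: List[Triple], local_host: str) -> List[Triple]:
    """Triples whose object is a URI hosted by another data source."""
    return [t for t in triples if is_external(t[2], local_host)]


def in_links(triples: List[Triple], resource: str) -> List[Triple]:
    """Triples that have `resource` as their object (back-links)."""
    return [t for t in triples if t[2] == resource]


# Hypothetical sample data.
triples = [
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/knows", "http://example.org/bob"),
    ("http://example.org/bob", "http://www.w3.org/2002/07/owl#sameAs", "http://other.example/bob"),
    ("http://example.org/carol", "http://xmlns.com/foaf/0.1/knows", "http://example.org/bob"),
]

print(len(external_links(triples, "example.org")))       # 1
print(len(in_links(triples, "http://example.org/bob")))  # 2
```

Comparing host names is a deliberately crude notion of "external"; a dataset spread over several hosts would need a list of local namespaces instead.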
12. LDQ Dimensions - Intrinsic dimensions & metrics
• Syntactic Validity - degree to which an RDF document conforms to the specification of the serialization format
  • detecting syntax errors using (i) validators, (ii) crowdsourcing
  • detecting syntactically inaccurate values by (i) an explicit definition of the allowed values for a datatype, (ii) syntactic rules (type of characters allowed and/or the pattern of literal values)
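As a rough illustration of the "syntactic rules" idea, the sketch below checks literal values against one regular expression per datatype and reports the fraction that pass. The rule table, the function names and the sample literals are assumptions made for this example; the patterns only check the shape of a value (a date such as 2015-13-40 would still pass), so this is not a full XSD validator.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical rule table: one pattern per datatype URI (shape checks only).
PATTERNS = {
    XSD + "integer": re.compile(r"[+-]?[0-9]+"),
    XSD + "boolean": re.compile(r"true|false|1|0"),
    XSD + "date": re.compile(r"-?[0-9]{4,}-[0-9]{2}-[0-9]{2}(Z|[+-][0-9]{2}:[0-9]{2})?"),
}


def is_syntactically_valid(lexical: str, datatype: str) -> bool:
    """Check a literal's lexical form against the rule for its datatype."""
    pattern = PATTERNS.get(datatype)
    if pattern is None:
        return True  # no rule known for this datatype, so nothing to flag
    return pattern.fullmatch(lexical) is not None


def syntactic_validity(literals) -> float:
    """Fraction of (lexical form, datatype) pairs that pass their rule."""
    if not literals:
        return 1.0
    valid = sum(is_syntactically_valid(lex, dt) for lex, dt in literals)
    return valid / len(literals)


# Hypothetical sample literals.
literals = [
    ("42", XSD + "integer"),
    ("4 2", XSD + "integer"),     # space is not an allowed character
    ("2015-10-11", XSD + "date"),
    ("yes", XSD + "boolean"),     # not in the allowed value set
]

print(syntactic_validity(literals))  # 0.5
```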
13. LDQ Dimensions - Intrinsic dimensions & metrics
• Completeness
  • Schema - ontology completeness
  • Property - missing values for a specific property
  • Population - % of all real-world objects of a particular type that are represented in the dataset
  • Interlinking - degree to which instances in the dataset are interlinked
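Property and population completeness reduce to simple ratios, which the following sketch makes concrete. It assumes that instances are available as Python dictionaries and that a reference list of real-world objects (a gold standard) exists for the population case; the function names and sample data are hypothetical.

```python
def property_completeness(records, prop) -> float:
    """Share of records that have a non-empty value for `prop`."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(prop) not in (None, "", []))
    return filled / len(records)


def population_completeness(dataset_ids, reference_ids) -> float:
    """Share of reference (real-world) objects that appear in the dataset."""
    reference = set(reference_ids)
    if not reference:
        return 0.0
    return len(reference & set(dataset_ids)) / len(reference)


# Hypothetical sample data.
people = [
    {"name": "A", "birthDate": "1990-01-01"},
    {"name": "B"},                    # property missing
    {"name": "C", "birthDate": ""},   # property present but empty
    {"name": "D", "birthDate": "1985-05-05"},
]

print(property_completeness(people, "birthDate"))                     # 0.5
print(population_completeness({"a", "b", "c"}, {"a", "b", "c", "d"}))  # 0.75
```

In practice the hard part is obtaining the reference set, since the ratio is only as trustworthy as the gold standard it is computed against.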
15. RDFUnit: RDF Unit-Testing Suite
http://aksw.org/Projects/RDFUnit.html
[Figure labels, truncated in source: Syntactic, Semantic, Consiste]
16. Crowdsourcing Linked Data Quality Assessment
Crowdsourcing linked data quality assessment. M Acosta, A Zaveri, E Simperl, D Kontokostas, S Auer, J Lehmann. ISWC 2013
17. Luzzu: QA for LOD
http://eis-bonn.github.io/Luzzu/index.html
[Figure: numbered steps, labels truncated in source: 1 Metric, 2 Asses, 3 Clean, 4 Store, 5 Rank]
19. LDQ Beyond Data — Mapping Quality
Dimou et al. Assessing and Refining Mappings to RDF to Improve Dataset Quality. ISWC 2015.
https://github.com/RMLio/RML-Validator