
LDQ 2014 DQ Methodology

"Methodology for Assessment of Linked Data Quality: A Framework" at Workshop on Linked Data Quality
Paper: https://dl.dropboxusercontent.com/u/2265375/LDQ/ldq2014_submission_3.pdf

Published in: Data & Analytics

  1. A Methodology for Assessment of Linked Data Quality
     Anisa Rula, Amrapali Zaveri
  2. Outline
     ➢ Linked Data Quality
       ○ Current State
       ○ Limitations
     ➢ Quality Assessment Methodology
       ○ 3 phases, 6 steps
     ➢ Conclusion
       ○ Future Work
  3. Linked Data Quality
     ● ca. 50 billion facts in the Linked Data Cloud
     ● But what about the quality?
     ● Data is only as good as its quality!
  4. Linked Data Quality
     ➢ 30 approaches, 18 dimensions, 69 metrics*
     ➢ 12 tools
       ○ Automated
       ○ Semi-automated
     ➢ No generalized methodology
     ➢ Not taking into account the actual use case/user requirements
     ➢ Only assessment, no improvement
     * http://www.semantic-web-journal.net/content/quality-assessment-linked-data-survey
  5. Quality Assessment Methodology for Linked Data
     ➢ 3 phases
     ➢ 6 steps
  6. Phase I: Requirement Analysis
     Step I: Use Case Analysis
     Description that best illustrates the intended usage of the dataset(s)
     Two types of users
     ➢ Consumers
     ➢ Potential consumers
  7. Phase II: Quality Assessment
     Step II: Identification of quality issues
     ➢ Based on the use case
     ➢ Checklist-based approach
     ➢ Yes - 1, No - 0
     ➢ List of quality dimensions
  8. Phase II: Quality Assessment
     Step III: Statistics and Low-level Analysis
     ➢ Generic statistics
     ➢ Example
       ○ Interlinking degree
       ○ Blank nodes
  9. Phase II: Quality Assessment
     Step IV: Advanced Analysis
     ➢ High-level metrics
     ➢ Example
       ○ Accuracy
       ○ Completeness
     ➢ Requires (i) input and (ii) target dataset
  10. Data Quality Score
      ➢ Ratio
        ○ DQscore = 1 - (V / T)
          ■ V - total no. of instances that violate a DQ rule
          ■ T - total no. of relevant instances
          ■ for each property
        ○ DQweightedscore = DQscore * wi / W
          ■ wi - weight
          ■ W - sum of all weighted factors of the properties
          ■ for quality of overall properties
  11. Phase III: Quality Improvement
      Step V: Root Cause Analysis
      ➢ Analyze the cause of each quality issue
      ➢ Helps the user interpret the results
      ➢ Detect whether the problem occurs in the original dataset
      ➢ If the original dataset is unavailable, analyze the available dataset to determine the cause
  12. Phase III: Quality Improvement
      Step VI: Fixing Quality Problems
      ➢ Semi-automatic
        ○ Consistency
        ○ Completeness
        ○ Syntactic validity
      ➢ Crowdsourcing*
        ○ Semantic accuracy
        ○ Datatypes
        ○ Interlinks
      * Acosta et al., Crowdsourcing Linked Data Quality Assessment. ISWC 2013.
  13. Conclusion and Future Work
      ➢ Assessment methodology - 3 phases, 6 steps
      ➢ Focus on use case
      ➢ Improvement phase!
      Future Work
      ➢ Application to an actual use case
      ➢ Build a tool
  14. Thank you
      Questions, Suggestions, Comments
      @AnisaRula @amrapaliz
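
The Data Quality Score slide (slide 10) can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the function names, the example property names, the violation counts and the weights are all hypothetical. It also assumes that the overall score is the sum of the per-property terms DQscore * wi / W, with W taken as the sum of the weights; the slide does not spell out the aggregation.

```python
from typing import Mapping, Tuple


def dq_score(violations: int, relevant: int) -> float:
    """Per-property score from the slide: DQscore = 1 - (V / T)."""
    if relevant <= 0:
        raise ValueError("T (relevant instances) must be positive")
    if not 0 <= violations <= relevant:
        raise ValueError("V (violations) must lie between 0 and T")
    return 1 - violations / relevant


def weighted_dq_score(properties: Mapping[str, Tuple[int, int, float]]) -> float:
    """Overall score across properties.

    Each entry maps a property name to (V, T, w_i).
    Assumption: the overall score sums DQscore_i * w_i / W over all
    properties, where W is the sum of the weights w_i.
    """
    total_weight = sum(weight for _, _, weight in properties.values())
    if total_weight <= 0:
        raise ValueError("W (sum of weights) must be positive")
    return sum(
        dq_score(violations, relevant) * weight / total_weight
        for violations, relevant, weight in properties.values()
    )


# Hypothetical example: two properties with made-up counts and weights.
example = {
    "dbo:birthDate": (5, 100, 2.0),  # 5 of 100 instances violate a rule
    "rdfs:label": (10, 40, 1.0),     # 10 of 40 instances violate a rule
}

if __name__ == "__main__":
    for name, (v, t, _) in example.items():
        print(name, round(dq_score(v, t), 3))
    print("overall", round(weighted_dq_score(example), 3))
```

With weights chosen per use case (Phase I), a property that matters more to the consumer pulls the overall score more strongly toward its own DQscore.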
