
Data-driven Joint Debugging of the DBpedia Mappings and Ontology



DBpedia is a large-scale, cross-domain knowledge graph extracted from Wikipedia. For the extraction, crowd-sourced mappings from Wikipedia infoboxes to the DBpedia ontology are utilized. In this process, different problems may arise: users may create wrong and/or inconsistent mappings, use the ontology in an unforeseen way, or change the ontology without considering all possible consequences. In this paper, we present an approach to discover problems in the mappings as well as in the ontology and its usage in a joint, data-driven process. We show both quantitative and qualitative results about the problems identified, and derive proposals for altering mappings and refactoring the DBpedia ontology.



  1. 06/01/17 Heiko Paulheim — Data-driven Joint Debugging of the DBpedia Mappings and Ontology: Towards Addressing the Causes instead of the Symptoms of Data Quality in DBpedia
  2. Motivation
     • Various works on finding errors in knowledge graphs
       – A 2017 survey lists 17 approaches
       – 15 of 17 are evaluated on DBpedia
     • Question: how does DBpedia benefit from those works?
     H. Paulheim: Knowledge Graph Refinement – A Survey of Approaches and Evaluation Methods. Semantic Web Journal 8(3), 2017
  3. Motivation
     • What comes out of those research works:
       – A list of (possibly) wrong statements
       – Source code for finding erroneous statements
       – ...
  4. Motivation
     • Possible option 1: remove erroneous triples from DBpedia with a post filter
     • Challenges
       – May remove correct statements; may need thresholding
       – Needs to be repeated for each release
       – Needs to be materialized on all of DBpedia
     (Diagram: Wikipedia → DBpedia Extraction Framework → Post Filter, with the DBpedia Mappings Wiki feeding the extraction)
  5. Motivation
     • Materialized on the full DBpedia: 8 of 15 approaches
  6. Motivation
     • Possible option 2: integrate the filtering into the DBpedia Extraction Framework
     • Challenges
       – Development workload
       – Some approaches are not fully automated (technically or conceptually)
       – Scalability
     (Diagram: Wikipedia and the DBpedia Mappings Wiki feed the DBpedia Extraction Framework plus a filter module)
  7. Motivation
     • Scalability analyzed: 6 of 15 approaches
     • Disclaimer: being analyzed does not imply that an approach is actually scalable!
  8. Motivation
     • Do we have a third option?
       – Paulheim & Gangemi (2015): >95% of all inconsistencies in DBpedia boil down to 40 common root causes
       – Disclaimer: "inconsistencies" are not equivalent to "wrong statements"
     (Diagram: Wikipedia → DBpedia Mappings Wiki → DBpedia Extraction Framework → Inconsistency Detection → Identification of suspicious mappings and ontology constructs)
     H. Paulheim, A. Gangemi: Serving DBpedia with DOLCE – More than Just Adding a Cherry on Top (ISWC 2015)
  9. Approach
     • Running example (Obama-free!):
       – dbr:Agua_Caliente_Airport dbo:operator dbr:San_Diego_County,_California .
       – dbr:Agua_Caliente_Airport foaf:name "Agua Caliente Airport" .
       – The subject has rdf:type dbo:Airport ⊑ dbo:Infrastructure ⊑ dbo:ArchitecturalStructure ⊑ dbo:Place
       – The object has rdf:type dbo:Settlement ⊑ dbo:PopulatedPlace ⊑ dbo:Place
       – dbo:operator has rdfs:range dbo:Organisation ⊑ dbo:Agent, and dbo:Agent is owl:disjointWith dbo:Place
       – Hence the extracted statement is inconsistent
  10. Approach
     • Find inconsistencies in extracted statements
       – Using DBpedia and DOLCE as the top-level ontology
     • Trace them back to mappings
       – In the example, there are three candidates:
         • The property mapping to the predicate dbo:operator
         • The class mapping (subject) to dbo:Airport
         • The class mapping (object) to dbo:Settlement
     • Unfortunately, provenance information for DBpedia is not that fine-grained
       – i.e., we do not know which mapping was responsible for which statement in the end
       – First step: heuristic reconstruction
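The inconsistency check on the running example can be sketched as follows. The ontology fragment below is hand-coded from the example slide, and the reasoning is a deliberately simplified stand-in for a full OWL reasoner: it only checks whether the object's types clash with the property's range via a disjointness axiom.

```python
# Tiny ontology fragment from the running example (assumption: hand-coded,
# not loaded from the real DBpedia/DOLCE ontologies).
SUBCLASS = {  # child -> parent
    "dbo:Airport": "dbo:Infrastructure",
    "dbo:Infrastructure": "dbo:ArchitecturalStructure",
    "dbo:ArchitecturalStructure": "dbo:Place",
    "dbo:Settlement": "dbo:PopulatedPlace",
    "dbo:PopulatedPlace": "dbo:Place",
    "dbo:Organisation": "dbo:Agent",
}
DISJOINT = {("dbo:Agent", "dbo:Place"), ("dbo:Place", "dbo:Agent")}
RANGE = {"dbo:operator": "dbo:Organisation"}

def superclasses(cls):
    """All classes a resource of type `cls` also belongs to (incl. cls)."""
    out = [cls]
    while cls in SUBCLASS:
        cls = SUBCLASS[cls]
        out.append(cls)
    return out

def inconsistent(prop, object_type):
    """Does typing the object with `object_type` clash with prop's range?"""
    if prop not in RANGE:
        return False
    range_sups = set(superclasses(RANGE[prop]))
    type_sups = set(superclasses(object_type))
    return any((a, b) in DISJOINT for a in range_sups for b in type_sups)

# dbr:Agua_Caliente_Airport dbo:operator dbr:San_Diego_County,_California .
# The county is a dbo:Settlement (hence a dbo:Place), but dbo:operator
# expects a dbo:Organisation (hence a dbo:Agent), and dbo:Agent is
# disjoint with dbo:Place -> inconsistency.
print(inconsistent("dbo:operator", "dbo:Settlement"))    # True
print(inconsistent("dbo:operator", "dbo:Organisation"))  # False
```

In the actual pipeline such clashes are found by a reasoner over the full ontologies; the sketch only illustrates why the example statement is flagged.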
  11. Approach: Identifying Mapping Elements
     • We use the RML representation of the Mappings Wiki contents [1]
     • Here: identifying the Wikipedia page and the corresponding DBpedia resource
     [1] Dimou et al.: DBpedia Mappings Quality Assessment (ISWC Poster 2016); cf. https://www.w3.org/TR/r2rml/
  12. Approach: Identifying Mapping Elements
     • We use the RML representation of the Mappings Wiki contents [1]
     • Here: identifying the DBpedia ontology class a mapping assigns
     [1] Dimou et al.: DBpedia Mappings Quality Assessment (ISWC Poster 2016); cf. https://www.w3.org/TR/r2rml/
  13. Approach: Identifying Mapping Elements
     • We use the RML representation of the Mappings Wiki contents [1]
     • Here: identifying the DBpedia ontology property a mapping assigns
     [1] Dimou et al.: DBpedia Mappings Quality Assessment (ISWC Poster 2016); cf. https://www.w3.org/TR/r2rml/
  14. Approach (ctd.)
     • After heuristically reconstructing the mappings, we can determine:
       – How often is a mapping element involved in an inconsistency?
       – How often is a mapping element used, but not involved in an inconsistency?
  15. Approach (ctd.)
     • Using the two counters c_m (statements produced using mapping element m) and i_m (those statements involved in an inconsistency), we can compute two scores for the hypothesis that m is problematic
     • Borrowed from association rule mining (support and confidence):
       – support(m) = i_m / N
       – confidence(m) = i_m / c_m
     • N is the total number of statements in DBpedia
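The two scores can be sketched directly from the association-rule reading of the slide, assuming c_m counts the statements produced via mapping element m, i_m counts those involved in an inconsistency, and N is the total number of statements. The toy numbers below are illustrative, not from the paper.

```python
def support(i_m, N):
    """Fraction of all statements that involve m and are inconsistent."""
    return i_m / N

def confidence(i_m, c_m):
    """Fraction of m's statements that are involved in an inconsistency."""
    return i_m / c_m if c_m else 0.0

# Toy numbers (assumption, for illustration only): mapping element m
# produced 12,000 statements, 9,000 of them inconsistent, out of
# 400 million statements overall.
print(confidence(9000, 12000))       # 0.75
print(support(9000, 400_000_000))    # tiny: ~2e-05
```

The gap between the two printed magnitudes already shows the scale problem the next slide addresses: confidence lives on [0, 1], while support is vanishingly small for any single mapping element.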
  16. Identifying Interesting Problems
     • Hypothesis: mapping elements with high support and high confidence hint at problems worth investigating
       – High support: fixing the issue would fix a lot of individual statements
       – High confidence: this mapping element actually hints at the root cause, i.e., fixing it does not break many other things
     • Unfortunately, the two scores come at different scales
       – Difficult to combine via average, harmonic mean, or the like
       – Support: μ = 0.0002, σ = 0.003
       – Confidence: μ = 0.114, σ = 0.260
     • Fix: use logarithmic support instead
       – LogSupport: μ = 0.179, σ = 0.139
  17. Identifying Interesting Problems (ctd.)
     • Inspect mappings that have a high harmonic mean of confidence and log support
     (Plot: mapping elements by confidence and log support, with harmonic-mean contour lines at 0.25, 0.5, and 0.75; elements towards the upper right are more interesting)
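The ranking step can be sketched as follows. The exact normalization of log support is not given on the slides, so the `log1p`-based scaling to [0, 1] below is an assumption; only the harmonic-mean combination is stated in the deck.

```python
import math

def log_support(i_m, N):
    # Assumption: scale log counts into [0, 1]; the paper's exact
    # logarithmic-support formula may differ.
    return math.log1p(i_m) / math.log1p(N)

def score(i_m, c_m, N):
    """Harmonic mean of confidence and (normalized) log support."""
    conf = i_m / c_m if c_m else 0.0
    ls = log_support(i_m, N)
    return 2 * conf * ls / (conf + ls) if (conf + ls) else 0.0

# A mapping element with many, mostly inconsistent statements should
# outrank one that is rarely involved in inconsistencies.
print(score(9000, 12000, 400_000_000))   # high-support, high-confidence
print(score(10, 10000, 400_000_000))     # low on both counts
```

The harmonic mean punishes elements that score high on only one dimension, which matches the slide's intuition: a fix should repair many statements and point at the actual root cause.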
  18. Example Findings
     • Case 1: mapping to a wrong property
     • Example:
       – branch in the infobox military unit is mapped to dbo:militaryBranch
         • but dbo:militaryBranch has dbo:Person as its domain
       – Correction: dbo:commandStructure
       – Overall score: 0.721
       – Affects 12,172 statements (31% of all dbo:militaryBranch statements)
  19. Example Findings
     • Case 2: mappings that should be removed
     • Example: dbo:picture
       – Most of the extracted statements are inconsistent (64.5% places, 23.0% persons)
       – Reason: statements are extracted from the picture caption
         • dbr:Brixton_Academy dbo:picture dbr:Brixton .
         • dbr:Justify_My_Love dbo:picture dbr:Madonna_(entertainer) .
  20. Example Findings
     • Case 3: ontology problems (domain/range)
     • Example 1:
       – Populated places (e.g., cities) are used both as places and as organizations
       – For some properties, the range is either one of the two, e.g., dbo:operator (see the introductory example)
       – This polysemy should be reflected in the ontology
     • Example 2:
       – dbo:architect, dbo:designer, dbo:engineer, etc. have dbo:Person as their range
       – Significant fractions of their statements (8.6%, 7.6%, 58.4%, resp.) have a dbo:Organisation as object
       – The range should be broadened
  21. Example Findings
     • Case 4: missing properties
     • Example 1:
       – dbo:president links an organization to its president
       – Majority use (8,354 statements, or 76.2%): linking a person to the president s/he served for (Obama example alert!)
     • Example 2:
       – dbo:instrument links an artist to the instrument s/he plays
       – Prominent alternative use (3,828 statements, or 7.2%): linking a genre to its characteristic instrument
  22. Future Work
     • Classify ontology, mapping, and other errors automatically
       – Currently ongoing: using different language editions of DBpedia
     • Heuristic:
       – Problem present in many languages → ontology problem
       – Problem present in only one language → mapping problem
     • From post-processing to live processing
       – e.g., on-the-fly validation in the DBpedia Mappings Wiki
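The language-edition heuristic above can be sketched as a simple threshold rule; the threshold value and the two-way classification are assumptions for illustration, since the slide only states the two extremes (many languages vs. one).

```python
def classify(problem_langs, threshold=3):
    """Classify a detected problem by how many DBpedia language editions
    exhibit it. problem_langs: set of language codes where it shows up.
    threshold is a hypothetical cut-off, not from the paper."""
    if len(problem_langs) >= threshold:
        # The ontology is shared across editions, so a problem appearing
        # in many languages likely stems from the ontology itself.
        return "ontology problem"
    # Mappings are per-language, so an isolated problem points at them.
    return "mapping problem"

print(classify({"en", "de", "fr", "es"}))  # ontology problem
print(classify({"en"}))                    # mapping problem
```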
  23. Takeaways
     • Fixing bugs in knowledge graphs is nice
       – But often a one-time solution
       – Preserving the efforts is hard
     • Proposed solution
       – Identify and address the root problem
       – A scoring mechanism helps identify interesting problems
       – Preserve the efforts by eliminating the root causes
     • Provenance matters!
       – The more we know about how a statement gets into a knowledge graph, the better we can automate the error analysis
  24. Data-driven Joint Debugging of the DBpedia Mappings and Ontology: Towards Addressing the Causes instead of the Symptoms of Data Quality in DBpedia — Heiko Paulheim
