Declarative Repairing Policies for Curated KBs



Presented by Giorgos Flouris (FORTH) at the 10th Hellenic Data Management Symposium (HDMS-11) on the 18th of June 2011, Athens, Greece.


Curated ontologies and semantic annotations are increasingly being used in e-science to reflect the current terminology and conceptualization of scientific domains. Such curated Knowledge Bases (KBs) are usually backed by relational databases using adequate schemas (generic or application/domain-specific) and may satisfy a wide range of integrity constraints. As curated KBs continuously evolve, such constraints are often violated and thus KBs need to be frequently repaired. Motivated by the fact that consistency is mostly enforced manually by the scientists acting as curators, we propose a generic and personalized repairing framework for assisting them in this arduous task. Our framework supports a variety of useful integrity constraints using Disjunctive Embedded Dependencies (DEDs) as well as complex curator preferences over interesting features of the resulting repairs (e.g., their size and type) that can capture diverse notions of minimality in repairs. Moreover, we propose a novel exhaustive repair-finding algorithm which, unlike existing greedy frameworks, is not sensitive to the resolution order and syntax of violated constraints and can correctly compute globally optimal repairs for different kinds of constraints and preferences. Despite its exponential nature, the performance and memory requirements of the exhaustive algorithm are experimentally demonstrated to be satisfactory for real-world curation cases, thanks to a series of optimizations.

  1. Declarative Repairing Policies for Curated KBs
     Yannis Roussakis, Giorgos Flouris, Vassilis Christophides
     HDMS-11
  2. Curated KBs
     • Heavily used in e-science to reflect current knowledge, understanding, terminology, etc.
     • Published using Semantic Web (SW) languages (e.g., RDF/S)
     • Integrity constraints are necessary
       – Enforce/describe domain- or application-specific requirements
     • DED constraints
       – Transitivity, cardinality, acyclicity, functional dependencies, …
     18/06/2011 Giorgos Flouris, HDMS-11
  3. Repairing Curated KBs
     • Constraints are often violated
       – Dynamic KBs (new experiments, observations, etc.)
       – Specifications (i.e., integrity constraints) may change
     • Two basic approaches
       – Consistent query answering
       – Repairing
         • Value of curated KBs: quality of content
         • Curator has complete control
     • Our work applies to relational data
       – Examples, implementation, and experiments for RDF/S (represented using a relational schema)
  4. Repair Process (Single Constraint)
     • Integrity constraint: subsumption relationships should be between defined classes
       ∀x,y CIsA(x,y) → CS(x) ∧ CS(y)
     • KB: {CIsA(B,A), CS(B)}, which violates the constraint because A is not a defined class
     • Repaired KB under the incomplete knowledge assumption: {CIsA(B,A), CS(B), CS(A)}
     • Repaired KB under the complete knowledge assumption: {CS(B)}
  5. Repair Process (Multiple Constraints)
     • Integrity constraints:
       – Subsumption relationships should be between defined classes
       – Properties should have a defined domain and range
       – The domain and range of a property should be defined classes
     • Several possible repairs
       – Several resolutions per violation
       – A resolution might violate or repair other constraints
     (Figure: example KB with classes A, B and property P, and several alternative repairs)
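To see how one resolution can trigger or fix other violations, the sketch below checks the three constraints over a hypothetical tuple encoding (adding `("Prop", p)`, `("Domain", p, c)` and `("Range", p, c)` facts). The exact formalization of the constraints is our assumption. Adding Range(P, A) repairs the missing range but introduces a new violation because A is not a class; adding CS(A) then repairs both that new violation and the original subsumption violation.

```python
# Hypothetical encoding: ("CS", c) declares a class, ("CIsA", sub, sup) a subsumption,
# ("Prop", p) a property, and ("Domain", p, c) / ("Range", p, c) its domain and range.

def violations(kb):
    """Return sorted (reason, fact) pairs for the three constraints listed on the slide."""
    classes = {f[1] for f in kb if f[0] == "CS"}
    found = []
    for f in kb:
        if f[0] == "CIsA" and not {f[1], f[2]} <= classes:
            found.append(("subsumption between undefined classes", f))
        elif f[0] == "Prop":
            for kind in ("Domain", "Range"):
                if not any(g[0] == kind and g[1] == f[1] for g in kb):
                    found.append((f"property without {kind.lower()}", f))
        elif f[0] in ("Domain", "Range") and f[2] not in classes:
            found.append((f"{f[0].lower()} is not a defined class", f))
    return sorted(found)

kb = frozenset({("CS", "B"), ("CIsA", "B", "A"), ("Prop", "P"), ("Domain", "P", "B")})
step1 = kb | {("Range", "P", "A")}   # fixes the missing range, but A is not a class
step2 = step1 | {("CS", "A")}        # fixes both remaining violations at once

for label, k in (("initial", kb), ("after Range(P,A)", step1), ("after CS(A)", step2)):
    print(label, [reason for reason, _ in violations(k)])
```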
  6. Problem Statement
     • Given an inconsistent KB, produce the optimal consistent one
     • Goal: a systematic and personalized framework for repairing inconsistent KBs (focus on RDF/S)
       – STEP 1: systematically search all possible repairs, given an expressive language of constraints
       – STEP 2: allow curators to state their personalized preferences, which determine the optimal result (repair)
  7. Resolution Tree Creation (GO)
     • Complete step 1 (find all possible repairs), then find the optimal ones (step 2)
     • Globally optimal (GO) strategy
     • Process
       – Find all possible resolutions for one violation
       – Explore them all
       – Repeat recursively until consistent
       – Return the optimal leaf (leaves)
     (Figure: resolution tree; the optimal repairs are returned)
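A minimal Python sketch of the GO strategy, assuming the single constraint of slide 4, a Min(Size) preference (number of added plus deleted facts) and a hypothetical tuple encoding of facts. The example KB has three violations; adding CS(A) and CS(B) repairs all of them with two changes. The search terminates because, for this constraint, each resolution removes the violation it targets and never creates a new one.

```python
# Hypothetical encoding; constraint: forall x,y CIsA(x,y) -> CS(x) and CS(y)
KB = frozenset({("CIsA", "B", "A"), ("CIsA", "C", "A"), ("CIsA", "D", "B"),
                ("CS", "C"), ("CS", "D")})

def violations(kb):
    return sorted(f for f in kb
                  if f[0] == "CIsA" and not {("CS", f[1]), ("CS", f[2])} <= kb)

def resolutions(kb, v):
    """Alternative fixes for one violation: delete the CIsA fact or add the missing CS facts."""
    missing = {("CS", v[1]), ("CS", v[2])} - kb
    return [kb - {v}, kb | missing]

def size(repaired, original):
    """Size feature: number of added plus deleted facts."""
    return len(repaired ^ original)

def go_repairs(kb):
    """Build the whole resolution tree, then return the leaves with the smallest delta."""
    leaves = set()

    def explore(k):
        vs = violations(k)
        if not vs:                            # consistent: this node is a repair
            leaves.add(k)
            return
        for child in resolutions(k, vs[0]):   # explore every resolution
            explore(child)

    explore(kb)
    best = min(size(leaf, kb) for leaf in leaves)
    return {leaf for leaf in leaves if size(leaf, kb) == best}

for repair in go_repairs(KB):
    print(sorted(repair ^ KB))   # the delta: [('CS', 'A'), ('CS', 'B')]
```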
  8. Resolution Tree Creation (LO)
     • Interleave steps 1 and 2 per violation: find "local" repairs, keep only the optimal ones, repeat
     • Locally optimal (LO) strategy
     • Process
       – Find all possible resolutions for one violation
       – Explore the optimal one(s)
       – Repeat recursively until consistent
       – Return all remaining leaves
     (Figure: resolution tree; the optimal repair is returned)
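The LO strategy can be sketched by keeping, at each node, only the resolutions whose own step is cheapest under Min(Size). The KB, the encoding and the constraint (the one from slide 4) are hypothetical choices for illustration. LO first deletes CIsA(B, A), because one deletion is locally cheaper than adding two classes; every repair it returns then needs three changes, whereas adding CS(A) and CS(B) would have needed only two.

```python
# Hypothetical encoding; constraint: forall x,y CIsA(x,y) -> CS(x) and CS(y)
KB = frozenset({("CIsA", "B", "A"), ("CIsA", "C", "A"), ("CIsA", "D", "B"),
                ("CS", "C"), ("CS", "D")})

def violations(kb):
    return sorted(f for f in kb
                  if f[0] == "CIsA" and not {("CS", f[1]), ("CS", f[2])} <= kb)

def resolutions(kb, v):
    """Delete the violating CIsA fact, or add the CS facts it is missing."""
    return [kb - {v}, kb | ({("CS", v[1]), ("CS", v[2])} - kb)]

def lo_repairs(kb):
    """Keep only the locally cheapest resolution(s) at each step; return all remaining leaves."""
    leaves = set()

    def explore(k):
        vs = violations(k)
        if not vs:
            leaves.add(k)
            return
        children = resolutions(k, vs[0])
        cheapest = min(len(c ^ k) for c in children)   # local Min(Size): this step only
        for c in children:
            if len(c ^ k) == cheapest:                 # ties are all explored
                explore(c)

    explore(kb)
    return leaves

print(sorted(len(r ^ KB) for r in lo_repairs(KB)))  # → [3, 3, 3, 3]
```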
  9. Comparison (GO versus LO)
     • Characteristics of GO
       – Exhaustive
       – Provably optimal (better-quality repairs)
       – Less efficient: large resolution trees
       – Insensitive to constraint syntax
       – Deterministic (does not depend on resolution order)
     • Characteristics of LO
       – Greedy
       – Not always optimal (lower-quality repairs)
       – More efficient (on average): small resolution trees
       – Sensitive to constraint syntax
       – Non-deterministic (depends on resolution order)
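The resolution-order row can be checked with a small Python sketch in which one search routine runs either exhaustively (GO) or greedily (LO), and a hypothetical `pick` function decides which violation is resolved next. The KB, encoding, constraint (the one from slide 4) and Min(Size) preference are our assumptions; the sketch illustrates order sensitivity only, not syntax sensitivity.

```python
# Hypothetical encoding; constraint: forall x,y CIsA(x,y) -> CS(x) and CS(y); preference Min(Size)
KB = frozenset({("CIsA", "B", "A"), ("CIsA", "C", "A"), ("CIsA", "D", "B"),
                ("CS", "C"), ("CS", "D")})

def violations(kb):
    return sorted(f for f in kb
                  if f[0] == "CIsA" and not {("CS", f[1]), ("CS", f[2])} <= kb)

def resolutions(kb, v):
    """Delete the violating CIsA fact, or add the CS facts it is missing."""
    return [kb - {v}, kb | ({("CS", v[1]), ("CS", v[2])} - kb)]

def search(kb, pick, greedy):
    """GO if greedy is False, LO if True; `pick` chooses the next violation to resolve."""
    leaves = set()

    def explore(k):
        vs = violations(k)
        if not vs:
            leaves.add(k)
            return
        children = resolutions(k, pick(vs))
        if greedy:  # LO: keep only the locally cheapest step(s)
            cheapest = min(len(c ^ k) for c in children)
            children = [c for c in children if len(c ^ k) == cheapest]
        for c in children:
            explore(c)

    explore(kb)
    return leaves

def optimal(leaves, kb):
    """Leaves with the smallest delta (Min(Size))."""
    best = min(len(leaf ^ kb) for leaf in leaves)
    return {leaf for leaf in leaves if len(leaf ^ kb) == best}

def first(vs):   # resolve violations in sorted order
    return vs[0]

def last(vs):    # resolve violations in reverse sorted order
    return vs[-1]

for name, greedy in (("GO", False), ("LO", True)):
    sizes = [min(len(r ^ KB) for r in search(KB, pick, greedy)) for pick in (first, last)]
    print(name, sizes)   # prints "GO [2, 2]", then "LO [3, 2]"
```

GO finds the same two-change repair under both orders, while the best repair LO returns depends on which violation it happens to resolve first.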
  10. Determining Optimal Repairs
      • Optimal repair
        – Application and domain dependent
        – Curator describes ideal repair steps
        – Curator-defined specifications for "preferred repairs"
      • Preference models from database theory
      • Features, aggregate functions and composition operators
        – Applicable on deltas δ (the update steps that lead from K to Kr)
        – Example: Min(Size) & Min(Additions)
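A minimal sketch of how such a preference could be evaluated on deltas, using the two candidate repairs of slide 4. Reading `&` as prioritized composition (compare on the first preference, use the second only to break ties) is our assumption, as are all function names. Both candidates have size 1, so Min(Additions) selects the deletion and Max(Additions) selects the addition.

```python
# Hypothetical sketch: features are computed on the delta between K and a candidate Kr.

def delta(k, kr):
    """Update steps from K to Kr: (facts added, facts deleted)."""
    return kr - k, k - kr

def size(d):        # feature: total number of update steps
    return len(d[0]) + len(d[1])

def additions(d):   # feature: number of added facts
    return len(d[0])

def minimize(feature):
    return feature

def maximize(feature):
    return lambda d: -feature(d)

def prioritized(*prefs):
    """'P1 & P2' read as prioritized composition: compare on P1, break ties with P2."""
    return lambda d: tuple(p(d) for p in prefs)

def optimal(k, candidates, preference):
    """Candidates whose delta scores best (lowest key) under the preference."""
    scores = [preference(delta(k, c)) for c in candidates]
    best = min(scores)
    return [c for c, s in zip(candidates, scores) if s == best]

K = frozenset({("CIsA", "B", "A"), ("CS", "B")})
add_class = K | {("CS", "A")}            # repair under incomplete knowledge
drop_isa = K - {("CIsA", "B", "A")}      # repair under complete knowledge
candidates = [add_class, drop_isa]

print(optimal(K, candidates, prioritized(minimize(size), minimize(additions))) == [drop_isa])   # True
print(optimal(K, candidates, prioritized(minimize(size), maximize(additions))) == [add_class])  # True
```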
  11. Algorithm: Optimizations
      • Optimizations
        – Wildcards (labeled nulls)
        – Tree pruning (in GO)
        – Sparse diagnosis
        – Heuristics
        – Setting-specific optimizations
          • When preferences, constraints, etc. are known at design time
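Tree pruning in GO can be read as branch-and-bound. The sketch below is one plausible form of it, not necessarily the authors': it assumes a Min(Size) preference and the constraint from slide 4, whose resolution steps never undo earlier ones, so the size of the partial delta only grows along a branch. An inconsistent node whose delta is already at least as large as the best repair found so far is then skipped, since every further step adds at least one change. The KB and encoding are hypothetical.

```python
# Hypothetical encoding; constraint: forall x,y CIsA(x,y) -> CS(x) and CS(y)
KB = frozenset({("CIsA", "B", "A"), ("CIsA", "C", "A"), ("CIsA", "D", "B"),
                ("CS", "C"), ("CS", "D")})

def violations(kb):
    return sorted(f for f in kb
                  if f[0] == "CIsA" and not {("CS", f[1]), ("CS", f[2])} <= kb)

def resolutions(kb, v):
    """Add the missing CS facts (tried first) or delete the violating CIsA fact."""
    return [kb | ({("CS", v[1]), ("CS", v[2])} - kb), kb - {v}]

def go_pruned(kb):
    """Exhaustive GO search with branch-and-bound pruning under Min(Size)."""
    best_cost, best_leaves, expanded = float("inf"), set(), 0

    def explore(k):
        nonlocal best_cost, best_leaves, expanded
        cost = len(k ^ kb)                 # size of the partial delta on this branch
        vs = violations(k)
        if not vs:                         # consistent leaf: keep it if it ties or wins
            if cost < best_cost:
                best_cost, best_leaves = cost, {k}
            elif cost == best_cost:
                best_leaves.add(k)
            return
        if cost >= best_cost:              # every further step adds >= 1 change: prune
            return
        expanded += 1
        for child in resolutions(k, vs[0]):
            explore(child)

    explore(kb)
    return best_leaves, expanded

leaves, expanded = go_pruned(KB)
print([sorted(leaf ^ KB) for leaf in leaves], expanded)   # [[('CS', 'A'), ('CS', 'B')]] 2
```

On this KB the full resolution tree has four inner nodes; the pruned search expands only two of them and still returns the optimal repair.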
  12. Algorithm: Complexity and Performance
      • Detailed computational complexity analysis
      • Complexity depends on:
        – Type of constraints
        – Features and preferences used
        – Generally: exponential
      • Performance mainly determined by tree size
        – Number of violations
        – Types of violated constraints
        – Interdependencies between constraints
        – Preference (for LO)
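A back-of-the-envelope count shows why tree size dominates. Assuming, for illustration only, n violations that do not interact and r alternative resolutions for each, the GO tree contains every combination of choices:

```latex
\mathrm{leaves}(n, r) = r^{n}, \qquad \mathrm{leaves}(20, 2) = 2^{20} = 1\,048\,576
```

Interdependencies change this count in both directions: one resolution may repair several violations (shrinking the tree) or introduce new ones (growing it), which is why the number and types of violated constraints matter as much as the preference.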
  13. Experimental Evaluation: Setting
      • Created synthetic RDF/S KBs using PowerGen
        – Class-centric with/without instances (CCD/CC)
        – Property-centric with/without instances (PCD/PC)
      • Used a set of standard constraints for RDF/S (Serfiotis et al. 2005)
      • Added violations randomly
        – Up to 20 per RDF/S KB
        – Standard in related literature
        – Realistic (for dynamic consistent KBs)
  14. Experiments (GO)
  15. Experiments (LO): Min(Additions)
  16. Experiments (Repair Quality of LO)
      (Figure panels: many false positives (CCD, Min(Additions)); many false negatives (CCD, Max(Additions)))
  17. Generality Results
      • The framework is very general
        – Preferences, combined with features, can express practically any policy (current or future)
      • Most approaches in the literature can be expressed using our model
        – The task is reduced to finding an adequate preference
        – Usually LO policies
  18. Conclusions
      • Developed a declarative and intuitive framework for repairing inconsistencies
        – Generic (DED constraints)
        – Customizable and flexible (preferences)
        – Able to integrate other repair policies
        – Automatic (no curator input at run time)
      • Two approaches (GO/LO)
        – Exhaustive versus greedy
      • Future work
        – Anytime algorithm
  20. Modifications
      • Modify constants in tuples
        ∀x,y,z,w R(x,y,w) ∧ R(x,z,w) → (y=z)
        – Less disruptive change
        – Violates unique name assumption
      • Philosophical issues
        – Local versus global modifications
        – Which constant to modify (x, y, or z), and how
        – Modifying tuples versus adding/deleting tuples
      • Technical issues
        – Dependence on constraint syntax
        – Dependence on constraint resolution order