Fast Approximate A-box Consistency Checking using Machine Learning

Ontology reasoning is typically a computationally intensive operation. While soundness and completeness of results are required in some use cases, for many others a sensible trade-off between computation effort and correctness of results makes more sense. In this paper, we show that it is possible to approximate a central task in reasoning, i.e., A-box consistency checking, by training a machine learning model which approximates the behavior of the reasoner for a specific ontology. On four different datasets, we show that such learned models consistently achieve an accuracy above 95% at less than 2% of the runtime of a reasoner, using a decision tree with no more than 20 inner nodes. For example, this allows for validating 293M Microdata documents against the schema.org ontology in less than 90 minutes, compared to the 18 days required by a state-of-the-art ontology reasoner.

Fast Approximate A-box Consistency Checking using Machine Learning

  1. Fast Approximate A-box Consistency Checking using Machine Learning
     Heiko Paulheim, Heiner Stuckenschmidt (05/31/16)
     Warning! Highly Debated Paper!
  2. How to Play Jazz Improvisation on the Piano
     Heiko Paulheim, Heiner Stuckenschmidt
  3. Introduction
     • Learning Improvisation
       – Music comes with chord progressions
       – Different scales fit different chords
       – Scales are comprised of notes
     • Playing Improvisation
       – Inspect the chord progression
       – Identify matching scale(s)
       – Play notes from those scales
  4. Introduction
     • Essentially, the model is an ontology
       [diagram: Scale, Chord, Note, Major, Minor, connected by “consists of” and “matches” relations]
     • ...and improvisation playing is a reasoning task:
       – Given a chord progression, infer matching notes
  5. Introduction
     • Challenge
       – Reasoning is often slow
       – But results are needed in real time
     • Solution
       – Use a fast approximate model
       – Tolerate errors (we call them “blue notes” ;-))
     [cartoon: “Guys, can you play a bit more slowly? My reasoner is still running!” / “Thank God I don't need reasoning”]
  6. Introduction
     • How you really learn improvisation playing
       [diagram: Scale, Chord, Note, Major, Minor, connected by “consists of” and “matches” relations]
       1) Learn about chords, scales, etc.
       2) Play, play, play
       3) Forget about chords, scales, etc.
  7. Introduction
     • Can we learn to imitate a reasoner?
       – By learning a simpler, approximate model
     • Setup
       – Observe what a reasoner is doing
       – Use its inputs and outputs as training signals
       – Train a machine learning model
       – Use the (fast) machine learning model instead
     • In this paper
       – Restricted to A-box consistency checking
       – i.e., a binary classification task (consistent/inconsistent)
  8. Motivation
     • Key assumptions
       – We have to classify many A-boxes for the same T-box
         • i.e., a T-box-specific reasoning model makes sense
       – 100% accuracy is often not needed where reasoning is used
         • e.g., information retrieval, recommender systems
       – On the other hand, real-time results are desirable
       – Larger parts of the ontology and of the language expressivity may be negligible
         • see, e.g., our findings on schema.org*
     * Meusel, Bizer, and Paulheim (2015): A web-scale study of the adoption and evolution of the schema.org vocabulary over time
  9. Approach
     • Obtain labeled training data using an actual reasoner
     • Feed training data and labels into a learning algorithm
       – Requires propositional form (i.e., feature vectors)
     • Use the learned model as an approximation
       – Validate against ground truth (i.e., the actual reasoner)
     [pipeline diagram: T-box + A-box → Ontology Reasoner → consistency labels (training data); instances → Feature Vector Builder → features → ML Classifier → A-box consistency (prediction) on application data]
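A minimal end-to-end sketch of this pipeline in Python with scikit-learn, purely for illustration: the feature matrix, the labels, and the held-out example below are made up, and the paper itself used RapidMiner and Weka rather than scikit-learn.

```python
# Minimal sketch of the approximation pipeline (illustrative only; the paper
# used RapidMiner + Weka). X is a 0/1 matrix of root-path features per
# instance, y holds the reasoner's verdicts (1 = consistent, 0 = inconsistent).
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# Toy training data; in the paper, the labels come from running an actual
# reasoner (HermiT) on sampled A-boxes for the given T-box.
X_train = np.array([
    [1, 1, 1, 1, 1, 0],
    [1, 0, 0, 0, 0, 1],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
])
y_train = np.array([1, 1, 0, 0])

# Train a small decision tree as the fast approximation of the reasoner.
clf = DecisionTreeClassifier(max_depth=5, random_state=0)
clf.fit(X_train, y_train)

# Validate against the ground truth (the actual reasoner) on held-out data ...
X_test = np.array([[1, 1, 1, 0, 0, 0]])
y_test = np.array([1])  # what the reasoner would say
print("accuracy vs. reasoner:", accuracy_score(y_test, clf.predict(X_test)))

# ... then use the tree instead of the reasoner on new, unseen A-boxes.
print("predicted consistency:", clf.predict(X_test)[0])
```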
  10. Approach
     • Turning RDF graphs into propositional features
       – We use root path kernels (aka walk kernels)*
       – Variation: literals are represented by their datatype, not the actual value
     • Example graph: movie _:1 (“Red Tails”) with a director (“Anthony Hemingway”) and an actor (“Cuba Gooding Jr.”), both typed schema:Person, as blank nodes _:2 and _:3
     • Resulting path features:
       f1: schema:name → xsd:string
       f2: schema:director → schema:name → xsd:string
       f3: schema:actor → schema:name → xsd:string
       f4: schema:director → rdf:type → schema:Person
       f5: schema:actor → rdf:type → schema:Person
       f6: rdf:type → schema:Person
     • Resulting feature vectors:
              f1 f2 f3 f4 f5 f6
         _:1   1  1  1  1  1  0
         _:2   1  0  0  0  0  1
         _:3   1  0  0  0  0  1
     * Lösch et al. (2012): Graph kernels for RDF data
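An illustrative sketch of how such root-path features might be extracted with rdflib; the two-hop depth limit, the toy Turtle snippet, and the `extract_paths` helper are my own assumptions, not the authors' kernel implementation (the paper builds on the graph kernels of Lösch et al.).

```python
# Illustrative sketch of root-path (walk) feature extraction, not the paper's
# exact kernel code. Literals are replaced by their datatype, as on the slide.
from rdflib import BNode, Graph, Literal, URIRef
from rdflib.namespace import XSD

def extract_paths(graph, root, max_depth=2):
    """Collect outgoing property paths of length <= max_depth starting at root."""
    paths = set()

    def walk(node, prefix, depth):
        if depth == 0:
            return
        for p, o in graph.predicate_objects(node):
            if isinstance(o, Literal):
                # Represent literals by their datatype, not their value.
                dtype = o.datatype if o.datatype is not None else XSD.string
                paths.add(prefix + (str(p), str(dtype)))
            elif isinstance(o, BNode):
                # Blank nodes are not features themselves; walk through them.
                walk(o, prefix + (str(p),), depth - 1)
            else:
                paths.add(prefix + (str(p), str(o)))
                walk(o, prefix + (str(p),), depth - 1)

    walk(root, (), max_depth)
    return paths

g = Graph()
g.parse(data="""
@prefix schema: <http://schema.org/> .
_:m schema:name "Red Tails" ;
    schema:director _:d .
_:d a schema:Person ;
    schema:name "Anthony Hemingway" .
""", format="turtle")

# One binary feature per distinct path across all instances; here just one root.
root = next(g.subjects(URIRef("http://schema.org/director"), None))
for path in sorted(extract_paths(g, root)):
    print(" -> ".join(path))
```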
  11. Evaluation
     • Four cases
       – Validating individual relation assertions in DBpedia+DOLCE*
         • A relation assertion plus the subject's and object's types
         • DBpedia + DOLCE Ultra Lite ontology
       – Validating individuals in YAGO+DOLCE
         • An individual plus its types
         • Using a mapping from YAGO to the DOLCE Ultra Lite ontology
       – Validating entire RDFa documents against the GoodRelations ontology
         • From WebDataCommons
       – Validating entire Microdata documents against the schema.org ontology
         • From WebDataCommons
         • Top-level class disjointnesses added to the schema.org ontology**
     * Setting as in Paulheim and Gangemi (2015): Serving DBpedia with DOLCE - more than just adding a cherry on top
     ** Using the OWL version from http://topbraid.org/schema/
  12. Evaluation
     • DBpedia and DOLCE Ultra Lite
       – Mapping included since release 3.9 (2013)
       – Adds a few top-level disjointnesses
       – Can help detect inconsistent assignments
     [diagram: DBpedia instances Tim Berners-Lee and Royal Society Award; DBpedia ontology terms (award, range Description, Organisation, Person) mapped to the DOLCE ontology (Social Agent, Social Person) via equivalent-class, subclass-of, and disjoint-with relations]
  13. Evaluation
     • Disjointness axioms added for schema.org
       – Between nine top-level classes
       – Human judgement based on intensional disjointness
     [figure: WebDataCommons]
  14. Evaluation
     • Three variants for each case
       – 11, 111, and 1,111 A-boxes
       – Rationale: training on 10, 100, and 1,000 A-boxes with 10-fold cross-validation
     • Comparison and ground truth system: HermiT
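For obtaining such ground-truth labels, here is a hedged sketch of how one might drive HermiT from Python today via owlready2, which bundles that reasoner; the `example.owl` file and the `label_abox`/`populate_abox` helpers are hypothetical, and this is not the setup used in the paper.

```python
# Hedged sketch of labeling A-boxes with HermiT via owlready2 (not the paper's setup).
# Assumes example.owl contains the T-box; the A-box assertions are added per instance.
from owlready2 import OwlReadyInconsistentOntologyError, get_ontology, sync_reasoner

def label_abox(populate_abox):
    """Return 1 if the populated ontology is consistent according to HermiT, else 0."""
    onto = get_ontology("file://example.owl").load()  # hypothetical T-box file
    with onto:
        populate_abox(onto)  # caller adds the individuals/assertions to be checked
    try:
        sync_reasoner()      # owlready2 runs HermiT under the hood
        return 1
    except OwlReadyInconsistentOntologyError:
        return 0
```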
  15. Results
     • Using four well-known classifiers (in RapidMiner + Weka)
       – In standard settings
     • General finding: Decision Trees work best
       – And are surprisingly small (<20 nodes)
  16. Results
     • Example Decision Tree (DBpedia)
     [figure: learned decision tree for the DBpedia case]
  17. Results
     • Scalability considerations: how long would it take to validate
       – All statements in DBpedia (~15M)
       – All instances in YAGO (~2.9M)
       – All RDFa documents using GoodRelations (~356k)
       – All Microdata documents using schema.org (~293M)
     • Formula for extrapolation: [shown on the slide; see the sketch below]
     • Results: [table shown on the slide]
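The extrapolation formula itself is an image on the slide and did not survive in this transcript. A plausible reading, assuming a simple linear extrapolation from the measured per-A-box runtimes (the symbols below are my own, not the slide's), would be:

```latex
% Hedged reconstruction, not the slide's exact formula:
% total time = one-off setup time + number of A-boxes times the average
% per-A-box time, computed once for the reasoner and once for the
% feature-extraction-plus-classification pipeline, then compared.
t_{\mathrm{total}} \approx t_{\mathrm{setup}} + N \cdot \bar{t}_{\mathrm{Abox}}
```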
  18. Summary
     • Ontology reasoning can be well approximated
     • Decision trees reach >95% accuracy with 1,000 labeled examples
     • Decision trees are small (<20 nodes)
       – Interesting to apply in computationally limited settings, e.g., smart devices or sensors
     • The approximation is highly scalable
       – Validating all schema.org Microdata documents: 90 minutes vs. 18 days
     • Why does it work so well?
       – Learned models concentrate on relevant fragments
       – Reasons for inconsistency are not equally distributed → learned models adapt to the most frequent inconsistencies
  19. Outlook on Future Work
     • Faster application of learned models possible
       – Think: translate a 20-node decision tree into a set of SPARQL queries (see the sketch after this slide)
     • Ontology summarization
       – Learned models (e.g., trees) point to relevant concepts
     • Uncovering inconsistency explanations
       – e.g., structured prediction
     • More effective and efficient reasoner usage
       – Example selection by active learning
     • Systematically investigate the interplay between
       – Feature representation
       – Ontology complexity
       – Learning paradigm
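To make the first bullet concrete, here is a hypothetical sketch of how a single root-to-leaf path of the learned tree might be compiled into a SPARQL ASK query; the tree path, the query, and the `predict_consistent` helper are invented for illustration and are not from the paper.

```python
# Hypothetical sketch: compile one decision-tree path into a SPARQL ASK query.
# The path "schema:director -> rdf:type -> schema:Place  =>  inconsistent" is
# invented for illustration; it is not a rule learned in the paper.
from rdflib import Graph

TREE_PATH_QUERY = """
PREFIX rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX schema: <http://schema.org/>

ASK {
  ?s schema:director ?d .
  ?d rdf:type schema:Place .
}
"""

def predict_consistent(graph):
    """1 = predicted consistent, 0 = predicted inconsistent.
    A small fixed set of such ASK queries would replace the ~20-node tree."""
    return 0 if bool(graph.query(TREE_PATH_QUERY).askAnswer) else 1

g = Graph().parse(data="""
@prefix schema: <http://schema.org/> .
_:m schema:director _:d .
_:d a schema:Place .
""", format="turtle")
print(predict_consistent(g))  # -> 0 (the path matches, so predicted inconsistent)
```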
  20. Thank you!
     Q/A session for a highly debated paper, as imagined by the speaker. Details may vary.
  21. Fast Approximate A-box Consistency Checking using Machine Learning
     Heiko Paulheim, Heiner Stuckenschmidt
