Presentation at JIST 2012. I forgot to add a link to http://en.wikipedia.org/wiki/Knowledge_extraction, which I mentioned during the presentation because some of the output described there would be compatible with SPARQL.
Improving the Performance of the DL-Learner SPARQL Component for Semantic Web Applications
Creating Knowledge out of Interlinked Data
JIST 2012 – Page 1 http://lod2.eu
Improving the Performance of the DL-Learner SPARQL Component for Semantic Web Applications
Didier Cherix, Sebastian Hellmann, Jens Lehmann
AKSW, Universität Leipzig
http://slideshare.net/kurzum http://dl-learner.org http://lod2.eu
Motivation: 2007 - 2012
• DL-Learner has been developed in parallel to DBpedia at Universität Leipzig since 2007.
• DL-Learner is a tool for learning concepts in Description Logics (DLs) from user-provided examples.
• It worked very well for small to medium-sized data sets, e.g. Carcinogenesis and other ML problems from the UCI ML repository.
• The limit is the capacity of current OWL-DL reasoners.
• The challenge was (and is) to do reasoning-based, supervised machine learning on the DBpedia dataset (> 200 million triples) or larger datasets.
Introduction DL-Learner
DL-Learner relies heavily on instance checks for machine learning, so the OWL reasoner is the bottleneck.
Underlying idea: select only the data relevant to the machine learning problem, based on the user-given examples
→ Reduces the number of triples that have to be given to the reasoner
→ Reduces the complexity and size of the OWL schema
Brute-force approach: load all data into the OWL reasoner, then do instance checks
→ Infeasible for DBpedia
Iterative approach (old component): iterate over all instances and fetch the data recursively
→ Inefficient even with caching
Introduction DL-Learner
Challenge: What is the most efficient way to retrieve such a fragment, i.e. only the data relevant to the user-given examples?
Improvements of the New Component
• Step 1: Indexing the T-Box
  • Download the OWL schema and index it in memory
  • Either via SPARQL or from an OWL file
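Step 1 can be illustrated with a minimal sketch. It assumes the schema has already been downloaded and is available as a list of (subject, predicate, object) triples. The function name `build_tbox_index`, the index layout, and the toy schema are hypothetical and are not taken from the DL-Learner source code.

```python
from collections import defaultdict

RDFS_SUBCLASS_OF = "rdfs:subClassOf"
OWL_EQUIVALENT_CLASS = "owl:equivalentClass"


def build_tbox_index(schema_triples):
    """Index schema axioms by class so later lookups need no endpoint round trip."""
    index = {
        "superclasses": defaultdict(set),
        "equivalents": defaultdict(set),
    }
    for s, p, o in schema_triples:
        if p == RDFS_SUBCLASS_OF:
            index["superclasses"][s].add(o)
        elif p == OWL_EQUIVALENT_CLASS:
            # Equivalence is symmetric, so record both directions.
            index["equivalents"][s].add(o)
            index["equivalents"][o].add(s)
    return index


# Toy schema for illustration only (hypothetical data).
SCHEMA = [
    ("dbo:Writer", RDFS_SUBCLASS_OF, "dbo:Artist"),
    ("dbo:Artist", RDFS_SUBCLASS_OF, "dbo:Person"),
    ("dbo:Person", OWL_EQUIVALENT_CLASS, "foaf:Person"),
]

tbox_index = build_tbox_index(SCHEMA)
print(sorted(tbox_index["superclasses"]["dbo:Writer"]))
```

Building the index once up front means that later steps can answer superclass lookups from memory instead of querying the endpoint again.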
Improvements of the New Component
• Step 2: A-Box Queries
Parameter recursion depth: retrieve newly discovered bindings to ?o until a certain depth is reached.
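The recursion-depth parameter can be sketched as a breadth-first retrieval. The query text on the slide is not part of this transcript, so the SPARQL string built below is only an assumed shape, and `build_abox_query`, `retrieve_fragment`, and the in-memory `fetch_from_memory` stand-in for an endpoint are hypothetical names. For simplicity, the toy data contains only resources as objects, no literals.

```python
def build_abox_query(subjects):
    """Build one SPARQL 1.1 query for all outgoing triples of the given full IRIs."""
    values = " ".join(f"<{s}>" for s in sorted(subjects))
    return f"SELECT ?s ?p ?o WHERE {{ VALUES ?s {{ {values} }} ?s ?p ?o }}"


def retrieve_fragment(examples, fetch_triples, depth):
    """Follow newly discovered ?o bindings breadth-first, for at most `depth` rounds."""
    fragment = set()
    seen = set(examples)
    frontier = set(examples)
    for _ in range(depth):
        if not frontier:
            break
        triples = fetch_triples(frontier)
        fragment.update(triples)
        # Only resources that were never queried before go into the next round.
        discovered = {o for _, _, o in triples if o not in seen}
        seen |= discovered
        frontier = discovered
    return fragment


# In-memory stand-in for a SPARQL endpoint (hypothetical data).
DATA = [
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:b", "ex:knows", "ex:c"),
    ("ex:c", "ex:knows", "ex:d"),
]


def fetch_from_memory(subjects):
    return {t for t in DATA if t[0] in subjects}


fragment_depth_2 = retrieve_fragment({"ex:a"}, fetch_from_memory, depth=2)
print(sorted(fragment_depth_2))
print(build_abox_query({"http://example.org/a"}))
```

Querying a whole frontier per round, rather than one instance at a time, keeps the number of requests proportional to the depth and not to the number of instances.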
Improvements of the New Component
• Step 3: Typing the retrieved instances
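A minimal sketch of the typing step, assuming that "typing" means adding rdf:type triples for every resource that occurs in the retrieved fragment. The helper names and the toy data are hypothetical. A real component would presumably ask the endpoint for the types in batches rather than once per resource.

```python
RDF_TYPE = "rdf:type"


def type_instances(fragment, fetch_types):
    """Add rdf:type triples for every resource that occurs in the fragment."""
    resources = {s for s, _, _ in fragment} | {o for _, _, o in fragment}
    type_triples = {
        (resource, RDF_TYPE, cls)
        for resource in resources
        for cls in fetch_types(resource)
    }
    return set(fragment) | type_triples


# In-memory stand-in for the endpoint's type information (hypothetical data).
TYPES = {
    "ex:a": {"dbo:Writer"},
    "ex:b": {"dbo:Painter", "dbo:Person"},
}


def fetch_types_from_memory(resource):
    return TYPES.get(resource, set())


FRAGMENT = {("ex:a", "ex:knows", "ex:b")}
typed_fragment = type_instances(FRAGMENT, fetch_types_from_memory)
print(sorted(typed_fragment))
```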
Improvements of the New Component
• Step 4: T-Box Index: All "relevant" T-Box information is added to the fragment via the index. For each class already in the fragment, all superclasses and their equivalentClass axioms are added.
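The lookup in Step 4 can be sketched as a transitive walk over an in-memory index. This is an illustration under assumptions: the index is taken to be two plain dictionaries, `expand_with_tbox` is a hypothetical name, and equivalentClass axioms are added for every visited class without following the equivalent classes any further.

```python
def expand_with_tbox(classes, superclasses, equivalents):
    """Collect subclass and equivalentClass axioms for the classes and all their superclasses."""
    axioms = set()
    todo = list(classes)
    visited = set()
    while todo:
        cls = todo.pop()
        if cls in visited:
            continue
        visited.add(cls)
        for sup in superclasses.get(cls, ()):
            axioms.add((cls, "rdfs:subClassOf", sup))
            todo.append(sup)  # walk up the hierarchy transitively
        for eq in equivalents.get(cls, ()):
            axioms.add((cls, "owl:equivalentClass", eq))
    return axioms


# Toy index for illustration only (hypothetical data).
SUPERCLASSES = {
    "dbo:Writer": {"dbo:Artist"},
    "dbo:Artist": {"dbo:Person"},
}
EQUIVALENTS = {"dbo:Person": {"foaf:Person"}}

tbox_axioms = expand_with_tbox({"dbo:Writer"}, SUPERCLASSES, EQUIVALENTS)
print(sorted(tbox_axioms))
```

The `visited` set guards against cycles in the class hierarchy, which can occur in real schemas.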
Benchmarking - Speed
For each class in the DBpedia Ontology:
- 30 instances as positives
- 30 negatives from a sister class
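The selection of benchmark examples might look like the sketch below. It assumes that a "sister class" is a class sharing the same direct superclass, that examples are drawn at random, and that negatives must not also be instances of the target class. None of this is stated on the slide, and all names and data are hypothetical.

```python
import random


def sister_classes(cls, parent_of):
    """Return the classes that share the direct superclass of `cls`."""
    parent = parent_of.get(cls)
    if parent is None:
        return []
    return sorted(c for c, p in parent_of.items() if p == parent and c != cls)


def pick_examples(cls, parent_of, instances_of, n=30, seed=0):
    """Pick n positives from `cls` and n negatives from one sister class, or None."""
    rng = random.Random(seed)
    own = instances_of.get(cls, [])
    if len(own) < n:
        return None
    own_set = set(own)
    for sister in sister_classes(cls, parent_of):
        # Skip instances that also belong to the target class.
        candidates = [i for i in instances_of.get(sister, []) if i not in own_set]
        if len(candidates) >= n:
            return rng.sample(own, n), rng.sample(candidates, n)
    return None


# Toy class hierarchy and instance lists (hypothetical data).
PARENT_OF = {"dbo:Writer": "dbo:Artist", "dbo:Painter": "dbo:Artist"}
INSTANCES_OF = {
    "dbo:Writer": [f"ex:writer{i}" for i in range(40)],
    "dbo:Painter": [f"ex:painter{i}" for i in range(40)],
}

positives, negatives = pick_examples("dbo:Writer", PARENT_OF, INSTANCES_OF)
print(len(positives), len(negatives))
```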
Benchmarking - F-Measure on the training data
70% of the results for each class had an F-measure of 90-100% on the training data.
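For reference, an F-measure on the training data can be computed as in the sketch below. The slide does not say which variant was used, so the balanced F1 score (harmonic mean of precision and recall) is an assumption, and the function name and example counts are made up for illustration.

```python
def f_measure(covered, positives, negatives):
    """Balanced F1 of a learned concept, given the instances it covers."""
    covered = set(covered)
    true_pos = len(covered & set(positives))
    false_pos = len(covered & set(negatives))
    false_neg = len(set(positives) - covered)
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical result: the learned concept covers 27 of 30 positives and 3 of 30 negatives.
POSITIVES = [f"ex:pos{i}" for i in range(30)]
NEGATIVES = [f"ex:neg{i}" for i in range(30)]
COVERED = POSITIVES[:27] + NEGATIVES[:3]

score = f_measure(COVERED, POSITIVES, NEGATIVES)
print(round(score, 3))
```

In this made-up case precision and recall are both 27/30, so the score falls on the lower edge of the 90-100% band mentioned above.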
SPARQL Retrieval Component Impact
• DL-Learner - http://dl-learner.org
• DBpedia Navigator
• Tiger Corpus Navigator
• AutoSPARQL - http://autosparql.dl-learner.org/
• HANNE - http://hanne.aksw.org
• ORE - http://aksw.org/Projects/ORE
Sebastian Hellmann, Jens Lehmann and Sören Auer: Learning of OWL Class Descriptions on Very Large Knowledge Bases. In: International Journal on Semantic Web and Information Systems, 2009
Web Applications
Active Learning → User Interaction and Feedback
Future Work
• Research paper in Session 4b (tomorrow at 15:10): Navigation-induced Knowledge Engineering by Example
• Caching + more sophisticated options
• Large-scale learning problems
http://slideshare.net/kurzum
Homepage: http://dl-learner.org
Source code: http://sourceforge.net/projects/dl-learner/
Example
Sebastian Hellmann, Jens Lehmann and Sören Auer: Learning of OWL Class Descriptions on Very Large Knowledge Bases. In: International Journal on Semantic Web and Information Systems, 2009