Invited Talk at the 8th International Conference on Scalable Uncertainty Management (SUM)
The talk outlines applications of supervised structured machine learning and presents a specific refinement-operator-based approach for RDF/OWL. It also outlines how similar ideas can be applied to other formal languages, in particular link specifications.
Machine Learning Methods for Analysing and Linking RDF Data
1. Machine Learning Methods for Analysing and Linking RDF Data
Jens Lehmann
September 16, 2014
Jens Lehmann (AKSW, Uni Leipzig) Analysing and Linking RDF Data September 16, 2014 1 / 35
3. Structured Machine Learning
How to analyse structured data?
4. Detecting Crime Patterns: Series Finder
Constructs the modus operandi of criminals; identified 9 new crime patterns in Cambridge, MA, USA
Wang, Tong, et al. "Detecting Patterns of Crime with Series Finder." AAAI 2013.
5. Discovery of Laws of Physics
Background data generated using experiments
Mathematical functions on input variables form hypothesis space
Schmidt, Lipson. "Distilling free-form natural laws from experimental data." Science 2009.
6. Protein Interaction
Rules learned via Inductive Logic Programming (ProGolem): understandable by experts and competitive with statistical learners
Possibly better drug design and reduction of side effects
Santos et al. "Automated identification of protein-ligand interaction features using Inductive Logic Programming: a hexose binding case study." BMC Bioinformatics 2012.
7. Background Knowledge
11. RDF and the Linked Data Principles
RDF Triple, example:
Subject: http://cs.ox.ac.uk/John
Predicate: http://cs.ox.ac.uk/studies
Object: http://cs.ox.ac.uk/CS
The term Linked Data refers to a set of best practices for publishing and interlinking structured data on the Web.
Linked Data principles (simplified version):
1. Use RDF and URLs as identifiers
2. Include links to other datasets
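As a small illustration of the subject/predicate/object structure, the example triple can be held as a plain record and written out as one N-Triples line. This is only a sketch: the `Triple` class and its `to_ntriples` method are hypothetical helpers, not part of the talk or of any RDF library, and the URLs are the ones read from the slide.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement; every position holds a URL used as an identifier."""
    subject: str
    predicate: str
    object: str

    def to_ntriples(self) -> str:
        # N-Triples wraps each IRI in angle brackets and ends the statement with " ."
        return f"<{self.subject}> <{self.predicate}> <{self.object}> ."


# The example from the slide: John studies CS.
studies = Triple(
    "http://cs.ox.ac.uk/John",
    "http://cs.ox.ac.uk/studies",
    "http://cs.ox.ac.uk/CS",
)
line = studies.to_ntriples()
```

Linking to another dataset, the second principle, would simply mean that the object (or subject) URL points into a different dataset's namespace.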
14. OWL Ontologies
Web Ontology Language (OWL) builds on RDF and Description Logics
Objects
Specific resources (constants)
Examples: MARIA, LEIPZIG
Classes
Sets of objects (unary predicates)
Examples: Student, Car, Country
Properties
Connections between objects (binary predicates)
Examples: hasChild, partOf
Can be combined into complex concepts (OWL Class Expressions), e.g.:
$\mathrm{Child} \sqcap \exists \mathrm{hasParent}.\mathrm{Professor}$
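To make the reading of such a class expression concrete, the sketch below evaluates "Child and some hasParent Professor" over a tiny hand-made dataset. Everything in it is an assumption for illustration: the individuals, the three expression classes, and the `instances` function are hypothetical, and the lookup is a simple closed-world one, whereas OWL reasoners work under open-world semantics.

```python
from dataclasses import dataclass

# Toy data (illustrative only): class memberships and property assertions.
CLASS_MEMBERS = {
    "Child": {"anna", "ben"},
    "Professor": {"carl"},
}
PROPERTY_PAIRS = {
    "hasParent": {("anna", "carl"), ("ben", "dora")},
}


@dataclass(frozen=True)
class Named:
    """An atomic class such as Child."""
    name: str


@dataclass(frozen=True)
class And:
    """Intersection of two class expressions."""
    left: object
    right: object


@dataclass(frozen=True)
class Some:
    """Existential restriction: has at least one prop-successor in filler."""
    prop: str
    filler: object


def instances(expr):
    """Return the individuals that satisfy a class expression in the toy data."""
    if isinstance(expr, Named):
        return set(CLASS_MEMBERS.get(expr.name, set()))
    if isinstance(expr, And):
        return instances(expr.left) & instances(expr.right)
    if isinstance(expr, Some):
        fillers = instances(expr.filler)
        return {s for (s, o) in PROPERTY_PAIRS.get(expr.prop, set()) if o in fillers}
    raise TypeError(f"unsupported expression: {expr!r}")


child_of_professor = And(Named("Child"), Some("hasParent", Named("Professor")))
result = instances(child_of_professor)  # only anna has a parent who is a Professor
```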
15. Learning OWL Class Expressions - Definition
Given:
Background Knowledge (OWL ontologies and RDF datasets)
Positive and negative examples (objects in datasets)
Goal:
Find OWL class expression describing positive but not negative examples
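The goal can be read as a scoring problem: a candidate class expression is good if it covers the positive examples and excludes the negative ones. The following sketch assumes that the set of individuals covered by each candidate is already known; the `accuracy` function, the example names, and the candidate labels are hypothetical, and accuracy is only one of several possible quality measures.

```python
def accuracy(covered, positives, negatives):
    """Fraction of examples classified correctly by a candidate expression."""
    true_positives = len(covered & positives)
    true_negatives = len(negatives - covered)
    return (true_positives + true_negatives) / (len(positives) + len(negatives))


positives = {"p1", "p2", "p3"}
negatives = {"n1", "n2"}

# Hypothetical candidates with the individuals each one covers.
candidates = {
    "Person": {"p1", "p2", "p3", "n1", "n2"},
    "Person and attends some Talk": {"p1", "p2", "p3"},
    "Car": set(),
}

scores = {name: accuracy(cov, positives, negatives) for name, cov in candidates.items()}
best = max(scores, key=scores.get)
```

Here the second candidate covers all positives and no negatives, so it is the one a learner would prefer.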
16. Application Example: Therapy Response Prediction
0.5-1% of population affected by Rheumatoid Arthritis
Anti-TNF not effective for several million persons for unknown reasons
17. Learning OWL Class Expressions - Approaches
Least common subsumers
Cohen et al. Computing least common subsumers in description logics. AAAI 1992
Terminological decision trees
Fanizzi et al. Induction of concepts in web ontologies through terminological decision trees. ECML PKDD 2010
Rule-based
Fanizzi et al. DL-FOIL concept learning in description logics. ILP 2008
Genetic Programming
Lehmann, Jens. Hybrid learning of ontology classes. MLDM 2007
Refinement operators
Lehmann et al. Concept learning in description logics using refinement operators. ML 2010
Iannone et al. An algorithm based on counterfactuals for concept learning in the semantic web. AI 2007
18. Refinement Operators - Definitions
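Given a DL $\mathcal{L}$, consider the quasi-ordered space $\langle C(\mathcal{L}), \sqsubseteq_T \rangle$ over concepts of $\mathcal{L}$
$\rho : C(\mathcal{L}) \to 2^{C(\mathcal{L})}$ is a downward $\mathcal{L}$ refinement operator if for any $C \in C(\mathcal{L})$: $D \in \rho(C)$ implies $D \sqsubseteq_T C$
Notation: Write $C \rightsquigarrow_\rho D$ instead of $D \in \rho(C)$
Example refinement chain:
$\mathrm{Person} \rightsquigarrow_\rho \mathrm{Man} \rightsquigarrow_\rho \mathrm{Man} \sqcap \exists \mathrm{hasChild}.\top$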
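A minimal sketch of the definition, restricted to atomic classes: the operator maps a concept to a set of concepts, and it is downward if every result is subsumed by its input. The class hierarchy, `rho`, `is_subsumed_by`, and `is_downward` are all hypothetical names for illustration; a real operator also builds complex expressions, and subsumption would be decided by a reasoner rather than by walking a hand-written hierarchy.

```python
# Toy subclass hierarchy: each class maps to its direct subclasses.
SUBCLASSES = {
    "Thing": ["Person", "Car"],
    "Person": ["Man", "Woman"],
    "Man": [],
    "Woman": [],
    "Car": [],
}


def rho(concept):
    """Toy downward refinement: the direct subclasses of an atomic class."""
    return set(SUBCLASSES.get(concept, []))


def is_subsumed_by(d, c):
    """True if d is subsumed by c in the toy hierarchy (reflexive and transitive)."""
    if d == c:
        return True
    return any(is_subsumed_by(d, child) for child in SUBCLASSES.get(c, []))


def is_downward(operator, concepts):
    """Check the defining condition: every refinement of C is subsumed by C."""
    return all(is_subsumed_by(d, c) for c in concepts for d in operator(c))


person_refinements = rho("Person")
downward = is_downward(rho, SUBCLASSES)
```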
22. Learning using Refinement Operators
[Figure: search tree. Root $\top$ (0.45); its refinements Car (too weak) and Person (0.73); Person is refined to $\mathrm{Person} \sqcap \exists \mathrm{attends}.\top$ (0.78), which is refined to $\mathrm{Person} \sqcap \exists \mathrm{attends}.\mathrm{Talk}$ (0.97); further branches elided.]
Start with most general concept (top down)
Heuristic evaluates using pos/neg examples
Operator specialises
Continue until termination criterion met
= Learning Algorithm
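The loop described on this slide (start at the most general concept, score candidates on the examples, specialise the most promising one, stop on a termination criterion) can be sketched as a best-first search. This is an illustrative toy, not the DL-Learner implementation: the refinement table, the coverage sets, and the rule that a concept is "too weak" when it covers no positive example are assumptions made to keep the example self-contained.

```python
import heapq

# Hypothetical refinement steps and instance coverage for a toy search space.
REFINEMENTS = {
    "Thing": ["Car", "Person"],
    "Person": ["Person and attends some Thing"],
    "Person and attends some Thing": ["Person and attends some Talk"],
}
COVERAGE = {
    "Thing": {"p1", "p2", "n1", "n2"},
    "Car": {"n2"},
    "Person": {"p1", "p2", "n1"},
    "Person and attends some Thing": {"p1", "p2", "n1"},
    "Person and attends some Talk": {"p1", "p2"},
}


def learn(positives, negatives, start="Thing", max_steps=20):
    """Top-down best-first search over the toy refinement table."""

    def score(concept):
        covered = COVERAGE[concept]
        correct = len(covered & positives) + len(negatives - covered)
        return correct / (len(positives) + len(negatives))

    best = start
    frontier = [(-score(start), start)]  # max-heap via negated scores
    seen = {start}
    for _ in range(max_steps):
        # Termination criterion: nothing left to expand or a perfect concept found.
        if not frontier or score(best) == 1.0:
            break
        _, concept = heapq.heappop(frontier)
        for child in REFINEMENTS.get(concept, []):
            if child in seen:
                continue
            seen.add(child)
            # Too weak: refinements only specialise, so a concept covering no
            # positive example cannot lead to one that does.
            if not COVERAGE[child] & positives:
                continue
            if score(child) > score(best):
                best = child
            heapq.heappush(frontier, (-score(child), child))
    return best, score(best)


learned, learned_score = learn({"p1", "p2"}, {"n1", "n2"})
```

In this toy run the search discards Car, passes through Person, and ends at the concept that covers both positives and neither negative.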
26. Properties of Refinement Operators
An $\mathcal{L}$ downward refinement operator $\rho$ is called
Finite iff $\rho(C)$ is finite for any concept $C \in C(\mathcal{L})$
Redundant iff there exist two different $\rho$ refinement chains from a concept $C$ to a concept $D$.
Proper iff for $C, D \in C(\mathcal{L})$, $C \rightsquigarrow_\rho D$ implies $C \not\equiv_T D$
Complete iff for $C, D \in C(\mathcal{L})$ with $D \sqsubset_T C$ there is a concept $E$ with $E \equiv_T D$ and a refinement chain $C \rightsquigarrow_\rho \cdots \rightsquigarrow_\rho E$
Weakly complete iff for any concept $C$ with $C \sqsubset_T \top$ we can reach a concept $E$ with $E \equiv_T C$ from $\top$ by $\rho$.
[Figures: small tree diagrams illustrating the finite, redundant, proper, and complete cases.]
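On a small, finite, acyclic refinement graph, two of these properties can be checked directly. The sketch below is hedged accordingly: the graphs are invented, `count_chains`, `is_redundant`, and `is_proper` are hypothetical helpers, and `canonical` is a purely syntactic stand-in for equivalence with respect to the TBox, which in practice requires a reasoner.

```python
def count_chains(graph, start, goal):
    """Number of distinct refinement chains from start to goal (acyclic graph)."""
    if start == goal:
        return 1
    return sum(count_chains(graph, nxt, goal) for nxt in graph.get(start, ()))


def is_redundant(graph):
    """Redundant: some concept is reachable from another by two different chains."""
    nodes = set(graph) | {d for targets in graph.values() for d in targets}
    return any(
        count_chains(graph, c, d) > 1 for c in nodes for d in nodes if c != d
    )


def is_proper(graph, canonical):
    """Proper: no refinement step leads to an equivalent concept."""
    return all(
        canonical(c) != canonical(d) for c, targets in graph.items() for d in targets
    )


def canonical(concept):
    """Syntactic stand-in for equivalence: a conjunction as a set of conjuncts."""
    return frozenset(concept.split(" and "))


# Two chains lead from Person to "Man and Parent": redundant.
diamond = {
    "Person": ["Man", "Parent"],
    "Man": ["Man and Parent"],
    "Parent": ["Man and Parent"],
}
# Every concept is reached by exactly one chain: non-redundant.
tree = {"Person": ["Man", "Woman"], "Man": ["Man and Parent"]}
# "Man and Man" is equivalent to Man: improper.
improper = {"Man": ["Man and Man"]}
```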
27. Properties of Refinement Operators
Properties indicate how suitable a refinement operator is for solving the learning problem:
Incomplete operators may miss solutions
Redundant operators may lead to duplicate concepts in the search tree
Improper operators may produce equivalent concepts (which cover the same examples)
For infinite operators it may not be possible to compute all refinements of a given concept
Key question: Which properties can be combined?
28. Theorem: Properties of $\mathcal{L}$ Refinement Operators
Theorem
Maximal sets of combinable properties of $\mathcal{L}$ refinement operators for $\mathcal{L} \in \{\mathcal{ALC}, \mathcal{ALCN}, \mathcal{SHOIN}, \mathcal{SROIQ}\}$ are:
1. {weakly complete, complete, finite}
2. {weakly complete, complete, proper}
3. {weakly complete, non-redundant, finite}
4. {weakly complete, non-redundant, proper}
5. {non-redundant, finite, proper}
Concept Learning in Description Logics Using Refinement Operators, Lehmann, Hitzler, Machine Learning journal, 2010
Foundations of Refinement Operators for Description Logics; Lehmann, Hitzler, ILP conference, 2008
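29. Definition of $\rho_\downarrow$
$$
\rho_\downarrow(C) =
\begin{cases}
\{\bot\} \cup \rho_{\downarrow \top}(C) & \text{if } C = \top \\
\rho_{\downarrow \top}(C) & \text{otherwise}
\end{cases}
$$
$$
\rho_{\downarrow B}(C) =
\begin{cases}
\emptyset & \text{if } C = \bot \\
\{C_1 \sqcup \dots \sqcup C_n \mid C_i \in M_B\ (1 \le i \le n)\} & \text{if } C = \top \\
\{A' \mid A' \in sh_\downarrow(A)\} \cup \{A \sqcap D \mid D \in \rho_{\downarrow B}(\top)\} & \text{if } C = A\ (A \in N_C) \\
\{\neg A' \mid A' \in sh_\uparrow(A)\} \cup \{\neg A \sqcap D \mid D \in \rho_{\downarrow B}(\top)\} & \text{if } C = \neg A\ (A \in N_C) \\
\{\exists r.E \mid A = ar(r),\ E \in \rho_{\downarrow A}(D)\} \cup \{\exists r.D \sqcap E \mid E \in \rho_{\downarrow B}(\top)\} \cup \{\exists s.D \mid s \in sh_\downarrow(r)\} & \text{if } C = \exists r.D \\
\{\forall r.E \mid A = ar(r),\ E \in \rho_{\downarrow A}(D)\} \cup \{\forall r.D \sqcap E \mid E \in \rho_{\downarrow B}(\top)\} \\
\quad \cup\ \{\forall r.\bot \mid D = A \in N_C \text{ and } sh_\downarrow(A) = \emptyset\} \cup \{\forall s.D \mid s \in sh_\downarrow(r)\} & \text{if } C = \forall r.D \\
\{C_1 \sqcap \dots \sqcap C_{i-1} \sqcap D \sqcap C_{i+1} \sqcap \dots \sqcap C_n \mid D \in \rho_{\downarrow B}(C_i),\ 1 \le i \le n\} & \text{if } C = C_1 \sqcap \dots \sqcap C_n\ (n \ge 2) \\
\{C_1 \sqcup \dots \sqcup C_{i-1} \sqcup D \sqcup C_{i+1} \sqcup \dots \sqcup C_n \mid D \in \rho_{\downarrow B}(C_i),\ 1 \le i \le n\} \\
\quad \cup\ \{(C_1 \sqcup \dots \sqcup C_n) \sqcap D \mid D \in \rho_{\downarrow B}(\top)\} & \text{if } C = C_1 \sqcup \dots \sqcup C_n\ (n \ge 2)
\end{cases}
$$
Base Operator (Excerpt)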
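33. Definition of $\rho_{\downarrow}$
$$
\rho_{B}(C) = \{\exists r.E \mid A = ar(r),\ E \in \rho_{A}(D)\} \cup \{\exists r.D \sqcap E \mid E \in \rho_{B}(\top)\} \cup \{\exists s.D \mid s \in sh_{\downarrow}(r)\} \quad \text{if } C = \exists r.D
$$
Examples:
∃takesPartIn.SocialEvent
∃takesPartIn.Meeting
Student ⊓ ∃takesPartIn.SocialEvent
∃leads.SocialEvent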
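The three kinds of refinement in the existential case can be sketched in Python. This is a minimal sketch and not DL-Learner's implementation: the toy class and role hierarchies (Meeting below SocialEvent, leads below takesPartIn), the helper name `refine_exists`, the string encoding of concepts, and the fixed list standing in for the refinements of the top concept are all assumptions made for illustration.

```python
# Toy hierarchies (assumed for illustration): direct sub-classes and sub-roles.
SUB_CLASSES = {"SocialEvent": ["Meeting"], "Meeting": []}
SUB_ROLES = {"takesPartIn": ["leads"], "leads": []}
# Stand-in for the refinements of the top concept: classes usable as a conjunct.
TOP_REFINEMENTS = ["Student"]


def refine_exists(role, filler):
    """Downward refinements of 'EXISTS role.filler' for an atomic filler."""
    results = []
    # 1) Refine the filler: EXISTS r.E with E a direct sub-class of D.
    for sub_class in SUB_CLASSES.get(filler, []):
        results.append(f"∃{role}.{sub_class}")
    # 2) Add a conjunct taken from the refinements of the top concept.
    for atom in TOP_REFINEMENTS:
        results.append(f"{atom} ⊓ ∃{role}.{filler}")
    # 3) Replace the role by a direct sub-role: EXISTS s.D.
    for sub_role in SUB_ROLES.get(role, []):
        results.append(f"∃{sub_role}.{filler}")
    return results


refinements = refine_exists("takesPartIn", "SocialEvent")
for concept in refinements:
    print(concept)
```

Under these assumed hierarchies, the sketch yields one refinement per component of the definition, matching the three example refinements of ∃takesPartIn.SocialEvent listed on the slide.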
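34. Properties of $\rho_{\downarrow}$
$\rho_{\downarrow}$ is complete
$\rho_{\downarrow}$ is infinite, e.g. there are infinitely many refinement steps of the form:
$\top \rightsquigarrow_{\rho_{\downarrow}} C_1 \sqcup C_2 \sqcup C_3 \sqcup \dots$
$\rho_{\downarrow}^{cl}$ is proper
$\rho_{\downarrow}$ is redundant, since two different refinement chains reach the same concept:
$\forall r_1.A_1 \sqcup \forall r_2.A_1 \rightsquigarrow \forall r_1.(A_1 \sqcap A_2) \sqcup \forall r_2.A_1 \rightsquigarrow \forall r_1.(A_1 \sqcap A_2) \sqcup \forall r_2.(A_1 \sqcap A_2)$
$\forall r_1.A_1 \sqcup \forall r_2.A_1 \rightsquigarrow \forall r_1.A_1 \sqcup \forall r_2.(A_1 \sqcap A_2) \rightsquigarrow \forall r_1.(A_1 \sqcap A_2) \sqcup \forall r_2.(A_1 \sqcap A_2)$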
“DL-Learner: Learning Concepts in Description Logics”,
Jens Lehmann, Journal of Machine Learning Research (JMLR), 2009
35. Learning using Refinement Operators
[Figure: search tree over class expressions; each node is annotated with a score and a bracketed expansion value. Nodes shown: Car (too weak), Person, Person ⊓ ∃attends.⊤, Person ⊓ ∃attends.Talk, . . .]
Redundancy elimination technique with polynomial complexity w.r.t. search tree size
Length of children limited by expansion value
Infinite operator applicable; expansion value (he) used by heuristic (bias towards short concepts, Occam's Razor)
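The search idea on this slide can be illustrated with a small best-first search in Python. This is a simplified sketch under stated assumptions, not the DL-Learner algorithm: concepts are restricted to conjunctions of atomic classes, the toy individuals and the `INSTANCES` table are made up, the atom `AttendsTalk` stands in for ∃attends.Talk, and `length_penalty` is a hypothetical parameter that encodes the bias towards short concepts.

```python
import heapq

# Toy data (assumed): which individuals belong to which atomic class.
INSTANCES = {
    "Person": {"anna", "bob", "carl"},
    "Car": {"vw"},
    "AttendsTalk": {"anna", "bob", "robot"},  # stands in for EXISTS attends.Talk
}
ALL = {"anna", "bob", "carl", "vw", "robot"}
POSITIVES = {"anna", "bob"}
NEGATIVES = {"carl", "vw", "robot"}


def covered(concept):
    """Individuals covered by a conjunction of atoms (empty conjunction = top)."""
    result = set(ALL)
    for atom in concept:
        result &= INSTANCES[atom]
    return result


def accuracy(concept):
    cov = covered(concept)
    correct = len(POSITIVES & cov) + len(NEGATIVES - cov)
    return correct / (len(POSITIVES) + len(NEGATIVES))


def refine(concept):
    """Downward refinement: add one atomic class not yet in the conjunction."""
    return [concept | {atom} for atom in sorted(INSTANCES) if atom not in concept]


def learn(length_penalty=0.05, max_steps=50):
    start = frozenset()
    best = start
    seen = {start}  # redundancy elimination: each concept is expanded at most once
    queue = [(-accuracy(start), 0, start)]
    counter = 1  # tie-breaker so the heap never compares frozensets
    for _ in range(max_steps):
        if not queue:
            break
        _, _, node = heapq.heappop(queue)
        if accuracy(node) > accuracy(best):
            best = node
        for child in refine(node):
            if child in seen:
                continue
            seen.add(child)
            if not POSITIVES <= covered(child):
                continue  # "too weak": misses a positive, and so will every refinement
            score = accuracy(child) - length_penalty * len(child)
            heapq.heappush(queue, (-score, counter, child))
            counter += 1
    return best


result = learn()
print(sorted(result))
```

With this toy data, Car is discarded as too weak and the search ends at the conjunction of Person and AttendsTalk, which mirrors the path Person, Person ⊓ ∃attends.Talk in the slide's tree.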
46. Scalability
Refinement operator should build coherent concepts
Inference:
Complete and sound vs. approximation
Open World Assumption (OWA) vs. Closed World Assumption (CWA)
Stochastic coverage computation
Pick random example → perform instance check → compute confidence interval (e.g. via Wald method) w.r.t. objective function (e.g. F-measure)
Up to 99% fewer instance checks in test examples
Low influence on accuracy shown for 380 learning tasks using 7 ontologies (0.2% ± 0.4% F-measure difference)
Fragment extraction for application on large knowledge bases
Class Expression Learning for Ontology Engineering; Jens Lehmann, Sören Auer, Lorenz
Bühmann, Sebastian Tramp; Journal of Web Semantics (JWS), 2011
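The sampling loop behind stochastic coverage computation can be sketched as follows. This is a simplified illustration and not the procedure from the cited paper: it bounds a plain coverage proportion rather than the F-measure, and the function names, `max_width`, `min_samples`, and the stand-in instance check are assumptions. The example set is assumed to be non-empty.

```python
import math
import random


def wald_interval(successes, trials, z=1.96):
    """Wald confidence interval for a proportion (z = 1.96 is roughly 95%)."""
    p = successes / trials
    half_width = z * math.sqrt(p * (1 - p) / trials)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def estimate_coverage(examples, instance_check, max_width=0.1, min_samples=30, seed=0):
    """Check examples in random order until the interval is narrow enough."""
    rng = random.Random(seed)
    order = list(examples)
    rng.shuffle(order)
    hits = 0
    for checked, example in enumerate(order, start=1):
        if instance_check(example):
            hits += 1
        # min_samples guards against stopping on a tiny, unrepresentative sample.
        if checked >= min_samples:
            low, high = wald_interval(hits, checked)
            if high - low <= max_width:
                return hits / checked, (low, high), checked
    low, high = wald_interval(hits, len(order))
    return hits / len(order), (low, high), len(order)


# Stand-in for a reasoner call: 90% of the examples are instances of the concept.
examples = range(10_000)
estimate, interval, checks = estimate_coverage(examples, lambda x: x % 10 != 0)
print(f"estimated coverage {estimate:.2f} after {checks} of {len(examples)} checks")
```

Because the Wald interval narrows with the square root of the sample size, the loop stops after a few hundred checks at most for this width, regardless of how many examples exist. That is the source of the savings in instance checks.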
47. Carcinogenesis
Goal: predict whether substance causes cancer
Why:
Each year 1000 new substances are developed
Substances can often only be validated using time-consuming and
expensive experiments with mice → prioritise those with high risk
Background knowledge:
Database of the US National Toxicology Program (NTP)
“Obtaining accurate structural alerts for the causes of chemical cancers is
a problem of great scientific and humanitarian value.” (A. Srinivasan, R.D.
King, S.H. Muggleton, M.J.E. Sternberg 1997)
48. Knowledge Base Enrichment
Pattern Based Knowledge Base Enrichment; Lorenz Bühmann, Jens Lehmann; International
Semantic Web Conference (ISWC) 2013
Universal OWL Axiom Enrichment for Large Knowledge Bases; Lorenz Bühmann, Jens
Lehmann; Knowledge Engineering and Knowledge Management (EKAW) 2012
49. Protégé Plugin
Support for ontology creation and maintenance
50. Ontology Debugging: ORE
ORE - A Tool for Repairing and Enriching Knowledge Bases; Lehmann, Bühmann; International
Semantic Web Conference (ISWC) 2010
51. Data Quality Measurement: RDFUnit
Test-driven Evaluation of Linked Data Quality; World Wide Web Conference (WWW),
ACM, 2014; Dimitris Kontokostas, Patrick Westphal, Sören Auer, Sebastian Hellmann, Jens
Lehmann, Roland Cornelissen, Amrapali J. Zaveri
52. Robot Scientists Adam and Eve
Abduction to form hypotheses and 1,000 experiments per day
12 new scientific discoveries regarding functions of genes in yeast
King, Ross D et al. The automation of science. Science 324 (2009): 85-89.
54. Link Discovery - Motivation
Links are backbone of traditional WWW and Data Web
Links are central for data integration, deduplication, cross-ontology
question answering, reasoning, federated queries . . .
Central problem for many large IT companies
Automated tools (LIMES, SILK) can create a high number of links
between RDF resources by using heuristics
59. Link Discovery - Definition
Definition (Link Discovery)
Given sets S and T of resources and relation R (often owl:sameAs)
Common approach: Find $M = \{(s, t) \in S \times T : \delta(s, t) \le \theta\}$
S: DBpedia
rdfs:label: African Elephant
T: BBC Wildlife
dc:title: African Bush Elephant
dbpedia:AfricanElephant owl:sameAs bbc:hfzw82929 ?
$\delta$ = levenshtein(S.rdfs:label, T.dc:title)
$\delta$(dbpedia:AfricanElephant, bbc:hfzw82929) = 5
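The distance-based approach can be sketched in Python using the two labels from the slide. This is a naive illustration that compares every pair, not how LIMES or SILK compute links. The helper names and the threshold value of 5 are assumptions, since the slide does not state a threshold.

```python
def levenshtein(a, b):
    """Edit distance via the classic dynamic programme, one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def discover_links(source, target, threshold):
    """Return all pairs (s, t) whose label distance is at most the threshold."""
    return [(s, t)
            for s, s_label in source.items()
            for t, t_label in target.items()
            if levenshtein(s_label, t_label) <= threshold]


S = {"dbpedia:AfricanElephant": "African Elephant"}   # rdfs:label values
T = {"bbc:hfzw82929": "African Bush Elephant"}        # dc:title values

distance = levenshtein("African Elephant", "African Bush Elephant")
links = discover_links(S, T, threshold=5)  # threshold chosen for illustration
print(distance, links)
```

The distance is 5 because turning the first label into the second requires inserting the five characters of "Bush " (including the space), so the pair is accepted at threshold 5 and rejected at threshold 4.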
60. Example: Link Specification
f(trigrams(:name, :label), 0.5) ⊔ f(edit(:socId, :socId), 0.5)
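61. Link Specification Syntax and Semantics
$$
\begin{array}{ll}
LS & [[LS]] \\
\hline
f(m, \theta, M) & \{(s, t, r) \mid (s, t, r) \in M \wedge (m(s, t) \ge \theta)\} \\
LS_1 \sqcap LS_2 & \{(s, t, r) \mid (s, t, r_1) \in [[LS_1]] \wedge (s, t, r_2) \in [[LS_2]] \wedge r = \min(r_1, r_2)\} \\
LS_1 \sqcup LS_2 & \left\{ (s, t, r) \,\middle|\,
\begin{cases}
r = r_1 & \text{if } \exists (s, t, r_1) \in [[LS_1]] \wedge \neg(\exists r_2 : (s, t, r_2) \in [[LS_2]]), \\
r = r_2 & \text{if } \exists (s, t, r_2) \in [[LS_2]] \wedge \neg(\exists r_1 : (s, t, r_1) \in [[LS_1]]), \\
r = \max(r_1, r_2) & \text{if } (s, t, r_1) \in [[LS_1]] \wedge (s, t, r_2) \in [[LS_2]]
\end{cases}
\right\}
\end{array}
$$
Syntax and semantics allow an ordering similar to subsumption to be defined
(more specific specifications generate fewer links)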
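The three semantic rules can be sketched in Python by storing a mapping as a dictionary from (s, t) pairs to the value r. This is a minimal sketch, not the LIMES implementation. The function names, the resource identifiers, and the similarity tables are hypothetical and made up for the example.

```python
def atomic(measure, threshold, mapping):
    """Atomic filter: keep the triples of M whose measure value reaches the threshold."""
    return {pair: r for pair, r in mapping.items() if measure(*pair) >= threshold}


def intersection(left, right):
    """Conjunction: pairs present in both results, with r = min(r1, r2)."""
    return {pair: min(r, right[pair]) for pair, r in left.items() if pair in right}


def union(left, right):
    """Disjunction: pairs present in either result, with r = max(r1, r2) if in both."""
    result = dict(left)
    for pair, r in right.items():
        result[pair] = max(result[pair], r) if pair in result else r
    return result


# Hypothetical similarity values and input mapping M, made up for this example.
NAME_SIM = {("s1", "t1"): 0.9, ("s1", "t2"): 0.2, ("s2", "t2"): 0.6}
ID_SIM = {("s1", "t1"): 1.0, ("s1", "t2"): 0.0, ("s2", "t2"): 0.3}
M = {("s1", "t1"): 0.8, ("s1", "t2"): 0.5, ("s2", "t2"): 0.7}

by_name = atomic(lambda s, t: NAME_SIM[(s, t)], 0.5, M)
by_id = atomic(lambda s, t: ID_SIM[(s, t)], 0.5, M)
both = intersection(by_name, by_id)
either = union(by_name, by_id)
print(len(both), len(by_name), len(either))
```

The conjunction never returns more links than either operand and the disjunction never returns fewer, which is the ordering the slide compares to subsumption.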
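62. Link Specification Refinement Operator
$$
\rho_{\downarrow}(LS) =
\begin{cases}
\{ f(m_1, 1, \top) \sqcap \dots \sqcap f(m_n, 1, \top) \mid m_i \in SM,\ 1 \le i \le n,\ n \le 2^{|SM|} \} & \text{if } LS = \bot \\
f(m, dt(\theta), M) \cup LS \sqcup f(m', 1, M) \quad (m \in SM,\ m \ne m') & \text{if } LS = f(m, \theta, M) \text{ (atomic)} \\
LS_1 \sqcap \dots \sqcap LS_{i-1} \sqcap LS' \sqcap LS_{i+1} \sqcap \dots \sqcap LS_n \text{ with } LS' \in \rho_{\downarrow}(LS_i) & \text{if } LS = LS_1 \sqcap \dots \sqcap LS_n\ (n \ge 2) \\
LS_1 \sqcup \dots \sqcup LS_{i-1} \sqcup LS' \sqcup LS_{i+1} \sqcup \dots \sqcup LS_n \text{ with } LS' \in \rho_{\downarrow}(LS_i) \cup LS \sqcup f(m, 1, M) \quad (m \in SM,\ m \text{ not used in } LS) & \text{if } LS = LS_1 \sqcup \dots \sqcup LS_n\ (n \ge 2)
\end{cases}
$$
Upward refinement operator
Positive: weakly complete, finite
Negative: not complete, redundant, not proper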
66. Refinement Chain Example
f(edit(:socId, :socId), 1.0)
f(edit(:socId, :socId), 0.5)
f(trigrams(:name, :label), 1.0) ⊔ f(edit(:socId, :socId), 0.5)
f(trigrams(:name, :label), 0.5) ⊔ f(edit(:socId, :socId), 0.5)
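The chain can be reproduced with a small Python sketch of the atomic and disjunction cases of the operator. Several parts are assumptions rather than details from the talk: specifications are encoded as nested tuples, the threshold decrease `dt` lowers the threshold by a fixed step of 0.5 and never reaches 0, and the bottom and conjunction cases are omitted. A new disjunct is placed first to match the slide's notation. The order of disjuncts carries no meaning.

```python
# The set SM of similarity measures, taken from the example on the slides.
MEASURES = ["edit(:socId, :socId)", "trigrams(:name, :label)"]


def dt(theta, step=0.5):
    """Assumed threshold decrease: lower theta by a fixed step, staying above 0."""
    lowered = round(theta - step, 2)
    return lowered if lowered > 0 else None


def measures_in(spec):
    """All measures used anywhere inside a specification."""
    if spec[0] == "f":
        return {spec[1]}
    return set().union(*(measures_in(part) for part in spec[1:]))


def refine(spec):
    """Upward refinements of an atomic spec ("f", m, theta) or a union ("or", ...)."""
    results = []
    if spec[0] == "f":
        _, measure, theta = spec
        lowered = dt(theta)
        if lowered is not None:
            results.append(("f", measure, lowered))  # weaken the threshold
        for other in MEASURES:
            if other != measure:
                results.append(("or", ("f", other, 1.0), spec))  # add a disjunct
    elif spec[0] == "or":
        parts = spec[1:]
        for i, part in enumerate(parts):
            for refined in refine(part):  # refine one disjunct at a time
                results.append(("or",) + parts[:i] + (refined,) + parts[i + 1:])
        for other in MEASURES:
            if other not in measures_in(spec):
                results.append(("or", ("f", other, 1.0)) + parts)
    return results


step1 = ("f", "edit(:socId, :socId)", 1.0)
step2 = ("f", "edit(:socId, :socId)", 0.5)
step3 = ("or", ("f", "trigrams(:name, :label)", 1.0), step2)
step4 = ("or", ("f", "trigrams(:name, :label)", 0.5), step2)
print(step2 in refine(step1), step3 in refine(step2), step4 in refine(step3))
```

Each step of the chain is one application of `refine` to the previous specification. Refining a union in this sketch also produces nested and repeated disjuncts, which fits the slides' description of the operator as redundant and not proper.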
67. Projects: DL-Learner and LIMES
DL-Learner
Open-source project: http://dl-learner.org
Extensible platform for concept learning algorithms
Supports all RDF/OWL serialisations and major reasoners
Several thousand downloads
LIMES (http://aksw.org/Projects/LIMES.html)
Highly scalable engine (fastest RDF link discovery tool)
Several machine learning approaches integrated (including the one
presented)
“DL-Learner: Learning Concepts in Description Logics”,
Jens Lehmann, Journal of Machine Learning Research (JMLR), 2009
68. Summary and Conclusions
Many interesting applications of structured machine learning (therapy
response prediction, disease prediction, protein folding, data quality
measurement, ontology debugging)
Still few machine learning tools for working with RDF/OWL, although
more and more data is available
Refinement operators make it possible to apply supervised machine learning to
complex background knowledge
Can be applied to other languages such as link specifications