Machine Learning Methods for Analysing and Linking RDF Data

Invited Talk at the 8th International Conference on Scalable Uncertainty Management (SUM)

The talk outlines applications of supervised structured machine learning and presents a specific refinement operator based approach for RDF/OWL. It also outlines how similar ideas can be used in other (formal) languages, in particular link specifications.

  1. Machine Learning Methods for Analysing and Linking RDF Data. Jens Lehmann, September 16, 2014. Jens Lehmann (AKSW, Uni Leipzig), Analysing and Linking RDF Data, September 16, 2014, 1 / 35
  3. Structured Machine Learning. How to analyse structured data?
  4. Detecting Crime Patterns: Series Finder. Constructs the "modus operandi" of criminals; identified 9 new crime patterns in Cambridge, MA, USA. Wang, Tong, et al. "Detecting Patterns of Crime with Series Finder." AAAI 2013.
  5. Discovery of Laws of Physics. Background data is generated using experiments. Mathematical functions on the input variables form the hypothesis space. Schmidt, Lipson. "Distilling free-form natural laws from experimental data." Science 2009.
  6. Protein Interaction. Rules learned via Inductive Logic Programming (ProGolem) are understandable by experts and competitive with statistical learners. Possibly better drug design and reduction of side effects. Santos et al. "Automated identification of protein-ligand interaction features using Inductive Logic Programming: a hexose binding case study." BMC Bioinformatics 2012.
  7. Background Knowledge
  11. RDF and the Linked Data Principles. RDF triple, example: subject http://cs.ox.ac.uk/John, predicate http://cs.ox.ac.uk/studies, object http://cs.ox.ac.uk/CS. The term Linked Data refers to a set of best practices for publishing and interlinking structured data on the Web. Linked Data principles (simplified version): (1) use RDF and URLs as identifiers; (2) include links to other datasets.
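As a minimal sketch of the triple structure on this slide, the example can be held as a three-field record and written out as one N-Triples line. The `Triple` type and the `to_ntriples` helper are illustrative names, not part of the talk, and the sketch assumes all three terms are IRIs (no literals or blank nodes).

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object (all IRIs in this sketch)."""
    subject: str
    predicate: str
    obj: str


def to_ntriples(t: Triple) -> str:
    """Serialise an all-IRI triple as a single N-Triples line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


# The example from the slide: John studies CS.
john_studies_cs = Triple(
    "http://cs.ox.ac.uk/John",
    "http://cs.ox.ac.uk/studies",
    "http://cs.ox.ac.uk/CS",
)
line = to_ntriples(john_studies_cs)
print(line)
```

Because subject and object are URLs, another dataset can reuse them as identifiers, which is what the second Linked Data principle relies on.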
  14. OWL Ontologies. The Web Ontology Language (OWL) builds on RDF and Description Logics. Objects: specific resources (constants), e.g. MARIA, LEIPZIG. Classes: sets of objects (unary predicates), e.g. Student, Car, Country. Properties: connections between objects (binary predicates), e.g. hasChild, partOf. These can be combined into complex concepts (OWL class expressions), e.g. $\mathrm{Child} \sqcap \exists \mathrm{hasParent}.\mathrm{Professor}$.
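To make the reading of such a class expression concrete, here is a small sketch that evaluates "Child and some hasParent Professor" over a toy set of assertions. The individuals, the dictionaries and the helper names are invented for illustration. The evaluation simply looks up asserted facts (a closed-world simplification), whereas an OWL reasoner would work under the open world assumption.

```python
# Toy assertions; all individuals are hypothetical.
classes = {
    "Child": {"anna", "ben"},
    "Professor": {"carla"},
}
properties = {
    "hasParent": {("anna", "carla"), ("ben", "dave")},
}


def atomic(name):
    """Extension of a named class: the set of its asserted members."""
    return set(classes.get(name, set()))


def exists(prop, filler):
    """Existential restriction: individuals with a `prop` edge into `filler`."""
    return {s for (s, o) in properties.get(prop, set()) if o in filler}


def intersection(left, right):
    """Conjunction of two class expressions."""
    return left & right


# Child AND (hasParent SOME Professor)
result = intersection(atomic("Child"), exists("hasParent", atomic("Professor")))
print(result)
```

Only `anna` qualifies here: `ben` is a child, but his asserted parent is not a professor.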
  15. Learning OWL Class Expressions - Definition. Given: background knowledge (OWL ontologies and RDF datasets), and positive and negative examples (objects in the datasets). Goal: find an OWL class expression that describes the positive but not the negative examples.
  16. Application Example: Therapy Response Prediction. 0.5-1% of the population is affected by rheumatoid arthritis. Anti-TNF is not effective for several million persons, for unknown reasons.
  17. Learning OWL Class Expressions - Approaches. Least common subsumers: Cohen et al. Computing least common subsumers in description logics. AAAI 1992. Terminological decision trees: Fanizzi et al. Induction of concepts in web ontologies through terminological decision trees. ECML PKDD 2010. Rule-based: Fanizzi et al. DL-FOIL concept learning in description logics. ILP 2008. Genetic programming: Lehmann, Jens. Hybrid learning of ontology classes. MLDM 2007. Refinement operators: Lehmann et al. Concept learning in description logics using refinement operators. ML 2010; Iannone et al. An algorithm based on counterfactuals for concept learning in the semantic web. AI 2007.
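  18. Refinement Operators - Definitions. Given a DL $\mathcal{L}$, consider the quasi-ordered space $\langle \mathcal{C}(\mathcal{L}), \sqsubseteq_T \rangle$ over concepts of $\mathcal{L}$. $\rho : \mathcal{C}(\mathcal{L}) \to 2^{\mathcal{C}(\mathcal{L})}$ is a downward $\mathcal{L}$ refinement operator if for any $C \in \mathcal{C}(\mathcal{L})$: $D \in \rho(C)$ implies $D \sqsubseteq_T C$. Notation: write $C \rightsquigarrow_\rho D$ instead of $D \in \rho(C)$. Example refinement chain: $\mathrm{Person} \rightsquigarrow_\rho \mathrm{Man} \rightsquigarrow_\rho \mathrm{Man} \sqcap \exists \mathrm{hasChild}.\top$.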
  22. Learning using Refinement Operators. Search tree with heuristic scores: $\top$ (0.45); Car (too weak); Person (0.73); $\mathrm{Person} \sqcap \exists \mathrm{attends}.\top$ (0.78); $\mathrm{Person} \sqcap \exists \mathrm{attends}.\mathrm{Talk}$ (0.97); ... Start with the most general concept (top down). The heuristic evaluates candidates using the positive and negative examples. The operator specialises. Continue until the termination criterion is met = learning algorithm.
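The loop described on this slide can be sketched as a best-first search. Everything concrete below is an assumption made for illustration: the refinement operator is a hand-written table rather than a real operator, coverage is a lookup instead of a reasoner call, the heuristic is plain accuracy, and "too weak" is read as "misses at least one positive example". It is not DL-Learner's implementation.

```python
import heapq

# Hypothetical coverage of each candidate concept over five individuals.
COVERAGE = {
    "Top": {"p1", "p2", "n1", "n2", "c1"},
    "Car": {"c1"},
    "Person": {"p1", "p2", "n1", "n2"},
    "Person and attends some Top": {"p1", "p2", "n1"},
    "Person and attends some Talk": {"p1", "p2"},
}
# Toy downward refinement operator, given as an explicit table.
REFINEMENTS = {
    "Top": ["Car", "Person"],
    "Person": ["Person and attends some Top"],
    "Person and attends some Top": ["Person and attends some Talk"],
}


def accuracy(concept, pos, neg):
    """Fraction of examples classified correctly by `concept`."""
    covered = COVERAGE[concept]
    correct = len(pos & covered) + len(neg - covered)
    return correct / (len(pos) + len(neg))


def learn(pos, neg, max_steps=20):
    """Top-down best-first search from the most general concept."""
    best = "Top"
    frontier = [(-accuracy("Top", pos, neg), "Top")]
    seen = {"Top"}
    for _ in range(max_steps):
        if not frontier:
            break
        neg_score, concept = heapq.heappop(frontier)
        if -neg_score > accuracy(best, pos, neg):
            best = concept
        if -neg_score == 1.0:  # termination criterion: perfect on the examples
            break
        for child in REFINEMENTS.get(concept, []):
            if child in seen:
                continue
            seen.add(child)
            # Too weak: specialising further can never regain a lost positive.
            if not pos <= COVERAGE[child]:
                continue
            heapq.heappush(frontier, (-accuracy(child, pos, neg), child))
    return best


learned = learn({"p1", "p2"}, {"n1", "n2"})
print(learned)
```

With these toy tables the search discards `Car` as too weak and follows the chain through `Person` down to the concept that separates the positive from the negative examples.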
  26. Properties of Refinement Operators. An $\mathcal{L}$ downward refinement operator $\rho$ is called: Finite iff $\rho(C)$ is finite for any concept $C \in \mathcal{C}(\mathcal{L})$. Redundant iff there exist two different refinement chains from a concept $C$ to a concept $D$. Proper iff for $C, D \in \mathcal{C}(\mathcal{L})$, $C \rightsquigarrow_\rho D$ implies $C \not\equiv_T D$. Complete iff for $C, D \in \mathcal{C}(\mathcal{L})$ with $D \sqsubset_T C$ there is a concept $E$ with $E \equiv_T D$ and a refinement chain $C \rightsquigarrow_\rho \cdots \rightsquigarrow_\rho E$. Weakly complete iff for any concept $C$ with $C \sqsubset_T \top$ we can reach a concept $E$ with $E \equiv_T C$ from $\top$ by $\rho$.
  27. Properties of Refinement Operators. The properties indicate how suitable a refinement operator is for solving the learning problem. Incomplete operators may miss solutions. Redundant operators may lead to duplicate concepts in the search tree. Improper operators may produce equivalent concepts (which cover the same examples). For infinite operators it may not be possible to compute all refinements of a given concept. Key question: which properties can be combined?
  28. Theorem: Properties of $\mathcal{L}$ Refinement Operators. Maximum sets of combinable properties of $\mathcal{L}$ refinement operators for $\mathcal{L} \in \{\mathcal{ALC}, \mathcal{ALCN}, \mathcal{SHOIN}, \mathcal{SROIQ}\}$ are: (1) {weakly complete, complete, finite}; (2) {weakly complete, complete, proper}; (3) {weakly complete, non-redundant, finite}; (4) {weakly complete, non-redundant, proper}; (5) {non-redundant, finite, proper}. Concept Learning in Description Logics Using Refinement Operators; Lehmann, Hitzler; Machine Learning journal, 2010. Foundations of Refinement Operators for Description Logics; Lehmann, Hitzler; ILP conference, 2008.
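  30. Definition of $\rho^{\downarrow}$.

      $$\rho^{\downarrow}(C) = \begin{cases} \{\bot\} \cup \rho^{\downarrow}_{\top}(C) & \text{if } C = \top \\ \rho^{\downarrow}_{\top}(C) & \text{otherwise} \end{cases}$$

      $$\rho^{\downarrow}_{B}(C) = \begin{cases}
      \emptyset & \text{if } C = \bot \\
      \{C_1 \sqcup \dots \sqcup C_n \mid C_i \in M_B\ (1 \le i \le n)\} & \text{if } C = \top \\
      \{A' \mid A' \in sh_{\downarrow}(A)\} \cup \{A \sqcap D \mid D \in \rho^{\downarrow}_{B}(\top)\} & \text{if } C = A\ (A \in N_C) \\
      \{\neg A' \mid A' \in sh_{\uparrow}(A)\} \cup \{\neg A \sqcap D \mid D \in \rho^{\downarrow}_{B}(\top)\} & \text{if } C = \neg A\ (A \in N_C) \\
      \{\exists r.E \mid A = ar(r),\ E \in \rho^{\downarrow}_{A}(D)\} \cup \{\exists r.D \sqcap E \mid E \in \rho^{\downarrow}_{B}(\top)\} \cup \{\exists s.D \mid s \in sh_{\downarrow}(r)\} & \text{if } C = \exists r.D \\
      \{\forall r.E \mid A = ar(r),\ E \in \rho^{\downarrow}_{A}(D)\} \cup \{\forall r.D \sqcap E \mid E \in \rho^{\downarrow}_{B}(\top)\} \cup \{\forall r.\bot \mid D = A \in N_C \text{ and } sh_{\downarrow}(A) = \emptyset\} \cup \{\forall s.D \mid s \in sh_{\downarrow}(r)\} & \text{if } C = \forall r.D \\
      \{C_1 \sqcap \dots \sqcap C_{i-1} \sqcap D \sqcap C_{i+1} \sqcap \dots \sqcap C_n \mid D \in \rho^{\downarrow}_{B}(C_i),\ 1 \le i \le n\} & \text{if } C = C_1 \sqcap \dots \sqcap C_n\ (n \ge 2) \\
      \{C_1 \sqcup \dots \sqcup C_{i-1} \sqcup D \sqcup C_{i+1} \sqcup \dots \sqcup C_n \mid D \in \rho^{\downarrow}_{B}(C_i),\ 1 \le i \le n\} \cup \{(C_1 \sqcup \dots \sqcup C_n) \sqcap D \mid D \in \rho^{\downarrow}_{B}(\top)\} & \text{if } C = C_1 \sqcup \dots \sqcup C_n\ (n \ge 2)
      \end{cases}$$

      Base operator (excerpt).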
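  33. Definition of $\rho^{\downarrow}$, case $C = \exists r.D$: $\{\exists r.E \mid A = ar(r),\ E \in \rho^{\downarrow}_{A}(D)\} \cup \{\exists r.D \sqcap E \mid E \in \rho^{\downarrow}_{B}(\top)\} \cup \{\exists s.D \mid s \in sh_{\downarrow}(r)\}$. Examples: $\exists \mathrm{takesPartIn}.\mathrm{SocialEvent}$ can be refined to $\exists \mathrm{takesPartIn}.\mathrm{Meeting}$, to $\mathrm{Student} \sqcap \exists \mathrm{takesPartIn}.\mathrm{SocialEvent}$, and to $\exists \mathrm{leads}.\mathrm{SocialEvent}$.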
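The three components of the existential case can be sketched as follows. This is a strong simplification under stated assumptions: the filler is an atomic class, the recursive refinement of the filler is replaced by a direct-subclass lookup (the role range `ar(r)` is ignored), and the refinements of the top concept are a fixed hypothetical list. The hierarchy tables and `refine_exists` are illustrative names only.

```python
# Hypothetical hierarchies standing in for the subclass and subproperty lookups.
SUBCLASSES = {"SocialEvent": ["Meeting"]}
SUBPROPERTIES = {"takesPartIn": ["leads"]}
# Stand-in for the refinements of the top concept.
TOP_REFINEMENTS = ["Student"]


def refine_exists(role, filler):
    """Refinements of 'role SOME filler' for an atomic filler (simplified)."""
    out = []
    # 1. Specialise the filler: role SOME subclass.
    out += [f"{role} some {sub}" for sub in SUBCLASSES.get(filler, [])]
    # 2. Conjoin a refinement of the top concept.
    out += [f"({role} some {filler}) and {e}" for e in TOP_REFINEMENTS]
    # 3. Replace the role by a direct subproperty.
    out += [f"{sub} some {filler}" for sub in SUBPROPERTIES.get(role, [])]
    return out


refinements = refine_exists("takesPartIn", "SocialEvent")
for r in refinements:
    print(r)
```

Each of the three printed expressions corresponds to one of the three example refinements on the slide.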
  34. Properties of $\rho^{\downarrow}$. $\rho^{\downarrow}$ is complete. $\rho^{\downarrow}$ is infinite, e.g. there are infinitely many refinement steps of the form $\top \rightsquigarrow_{\rho^{\downarrow}} C_1 \sqcup C_2 \sqcup C_3 \sqcup \dots$. $\rho^{\downarrow}_{cl}$ is proper. $\rho^{\downarrow}$ is redundant: $\forall r_1.A_1 \sqcup \forall r_2.A_1$ refines both to $\forall r_1.(A_1 \sqcap A_2) \sqcup \forall r_2.A_1$ and to $\forall r_1.A_1 \sqcup \forall r_2.(A_1 \sqcap A_2)$, and each of these refines to $\forall r_1.(A_1 \sqcap A_2) \sqcup \forall r_2.(A_1 \sqcap A_2)$. "DL-Learner: Learning Concepts in Description Logics", Jens Lehmann, Journal of Machine Learning Research (JMLR), 2009.
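The redundancy example on this slide can be checked mechanically by counting refinement chains between two concepts. The sketch below hard-codes the four concepts of the example as an acyclic edge table (an assumption: a real operator would generate the edges). `count_chains` is an illustrative helper, not something from the talk.

```python
from functools import lru_cache

TOP = "all r1.A1 or all r2.A1"
LEFT = "all r1.(A1 and A2) or all r2.A1"
RIGHT = "all r1.A1 or all r2.(A1 and A2)"
BOTTOM = "all r1.(A1 and A2) or all r2.(A1 and A2)"

# Refinement steps of the example, written out by hand.
EDGES = {
    TOP: (LEFT, RIGHT),
    LEFT: (BOTTOM,),
    RIGHT: (BOTTOM,),
}


def count_chains(start, goal):
    """Number of distinct refinement chains from start to goal (acyclic graph)."""

    @lru_cache(maxsize=None)
    def walk(node):
        if node == goal:
            return 1
        return sum(walk(nxt) for nxt in EDGES.get(node, ()))

    return walk(start)


chains = count_chains(TOP, BOTTOM)
print(chains)
```

A count above one means the same concept is reachable along two different chains, which is exactly the redundancy property defined earlier in the talk.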
  42. Learning using Refinement Operators. [Figure: search tree over $\top$, Car (too weak), Person, $\mathrm{Person} \sqcap \exists \mathrm{attends}.\top$ and $\mathrm{Person} \sqcap \exists \mathrm{attends}.\mathrm{Talk}$, annotated with heuristic scores and bracketed expansion values.] Redundancy elimination technique with polynomial complexity w.r.t. search tree size. The length of children is limited by the expansion value, so an infinite operator becomes applicable. The expansion value is used by the heuristic (bias towards short concepts, Occam's razor).
  46. Scalability. The refinement operator should build coherent concepts. Inference: complete and sound vs. approximation; Open World Assumption (OWA) vs. Closed World Assumption (CWA). Stochastic coverage computation: pick a random example → perform an instance check → compute a confidence interval (e.g. via the Wald method) w.r.t. the objective function (e.g. F-measure). Up to 99% fewer instance checks in test examples. Low influence on accuracy, shown for 380 learning tasks using 7 ontologies (0.2% ± 0.4% F-measure difference). Fragment extraction for application on large knowledge bases. Class Expression Learning for Ontology Engineering; Jens Lehmann, Sören Auer, Lorenz Bühmann, Sebastian Tramp; Journal of Web Semantics (JWS), 2011.
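The stochastic coverage idea can be sketched as sampling examples in random order and stopping once a Wald interval around the estimated coverage is narrow enough. This is a simplified illustration, not the method of the cited paper: it bounds a plain coverage proportion rather than F-measure, and `max_width`, `min_checks`, the 95% `z` value and the `instance_check` callback are all assumed parameters.

```python
import math
import random


def wald_interval(successes, n, z=1.96):
    """Wald confidence interval for a proportion, clipped to [0, 1]."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)


def estimate_coverage(examples, instance_check, max_width=0.1, min_checks=30, seed=0):
    """Estimate the covered fraction of `examples` with few instance checks.

    Returns (estimate, (low, high), number_of_checks_performed).
    """
    order = list(examples)
    if not order:
        raise ValueError("need at least one example")
    random.Random(seed).shuffle(order)  # pick examples in random order
    hits = 0
    for n, example in enumerate(order, start=1):
        hits += bool(instance_check(example))  # the expensive reasoner call
        if n >= min_checks:
            low, high = wald_interval(hits, n)
            if high - low <= max_width:  # interval tight enough: stop early
                return hits / n, (low, high), n
    n = len(order)
    return hits / n, wald_interval(hits, n), n


# Usage with a stand-in instance check that covers every example.
estimate, interval, checks = estimate_coverage(range(1000), lambda e: True)
print(estimate, interval, checks)
```

The saving comes from the early return: when the interval is already narrow, the remaining instance checks are skipped. The Wald interval is known to be unreliable near 0 and 1, which is one reason the `min_checks` floor is included here.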
  47. Carcinogenesis. Goal: predict whether a substance causes cancer. Why: each year 1000 new substances are developed. Substances can often only be validated using time-consuming and expensive experiments with mice → prioritise those with high risk. Background knowledge: database of the US National Toxicology Program (NTP). "Obtaining accurate structural alerts for the causes of chemical cancers is a problem of great scientific and humanitarian value." (A. Srinivasan, R.D. King, S.H. Muggleton, M.J.E. Sternberg 1997)
  48. Knowledge Base Enrichment. Pattern Based Knowledge Base Enrichment; Lorenz Bühmann, Jens Lehmann; International Semantic Web Conference (ISWC) 2013. Universal OWL Axiom Enrichment for Large Knowledge Bases; Lorenz Bühmann, Jens Lehmann; Knowledge Engineering and Knowledge Management (EKAW) 2012.
  49. Protégé Plugin. Support for ontology creation and maintenance.
  50. Ontology Debugging: ORE. ORE - A Tool for Repairing and Enriching Knowledge Bases; Lehmann, Bühmann; International Semantic Web Conference (ISWC) 2010.
  51. Data Quality Measurement: RDFUnit. Test-driven Evaluation of Linked Data Quality; World Wide Web Conference (WWW), ACM, 2014; Dimitris Kontokostas, Patrick Westphal, Sören Auer, Sebastian Hellmann, Jens Lehmann, Roland Cornelissen, Amrapali J. Zaveri.
  52. Robot Scientists Adam & Eve. Abduction to form hypotheses and 1 000 experiments per day. 12 new scientific discoveries regarding functions of genes in yeast. King, Ross D et al. The automation of science. Science 324 (2009): 85-89.
  54. Link Discovery - Motivation. Links are the backbone of the traditional WWW and the Data Web. Links are central for data integration, deduplication, cross-ontology question answering, reasoning, federated queries, ... A central problem for many large IT companies. Automated tools (LIMES, SILK) can create a high number of links between RDF resources by using heuristics.
  59. 59. Link Discovery - Definition. Definition (Link Discovery): Given sets $S$ and $T$ of resources and a relation $R$ (often owl:sameAs). Common approach: find $M = \{(s, t) \in S \times T : \delta(s, t) \le \theta\}$. Example: S: DBpedia, rdfs:label: African Elephant; T: BBC Wildlife, dc:title: African Bush Elephant. Does dbpedia:AfricanElephant owl:sameAs bbc:hfzw82929 hold? With $\delta$ = levenshtein(S.rdfs:label, T.dc:title), $\delta$(dbpedia:AfricanElephant, bbc:hfzw82929) = 5.
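The definition above can be sketched in a few lines of Python. This is a minimal illustration, not how LIMES or SILK work internally: it compares every pair naively, the helper names `levenshtein` and `discover_links` are chosen here, and the second target resource (`bbc:hypothetical-lion`) is invented purely to show a pair that fails the threshold.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def discover_links(source: dict, target: dict, theta: int) -> set:
    """Return M = {(s, t) in S x T : levenshtein(label(s), title(t)) <= theta}."""
    return {
        (s, t)
        for s, s_label in source.items()
        for t, t_title in target.items()
        if levenshtein(s_label, t_title) <= theta
    }


# S: resource -> rdfs:label, T: resource -> dc:title
source = {"dbpedia:AfricanElephant": "African Elephant"}
target = {
    "bbc:hfzw82929": "African Bush Elephant",
    "bbc:hypothetical-lion": "Lion",  # invented resource, for contrast only
}

links = discover_links(source, target, theta=5)
print(levenshtein("African Elephant", "African Bush Elephant"))  # 5
print(links)
```

With a threshold of 5 the elephant pair is accepted, because the two labels differ only by the inserted "Bush ". The quadratic pairwise comparison is what scalable link discovery engines are designed to avoid.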
  60. 60. Example: Link Specification. $f(\mathrm{trigrams}(\text{:name}, \text{:label}), 0.5) \sqcup f(\mathrm{edit}(\text{:socId}, \text{:socId}), 0.5)$
  61. 61. Link Specification Syntax and Semantics. Each link specification $LS$ denotes a set $[[LS]]$ of triples $(s, t, r)$:
      Atomic: $[[f(m, \theta, M)]] = \{(s, t, r) \mid (s, t, r) \in M \land m(s, t) \ge \theta\}$
      Conjunction: $[[LS_1 \sqcap LS_2]] = \{(s, t, r) \mid (s, t, r_1) \in [[LS_1]] \land (s, t, r_2) \in [[LS_2]] \land r = \min(r_1, r_2)\}$
      Disjunction: $[[LS_1 \sqcup LS_2]] = \{(s, t, r)\}$ with
      $r = \begin{cases} r_1 & \text{if } \exists (s, t, r_1) \in [[LS_1]] \land \neg(\exists r_2 : (s, t, r_2) \in [[LS_2]]), \\ r_2 & \text{if } \exists (s, t, r_2) \in [[LS_2]] \land \neg(\exists r_1 : (s, t, r_1) \in [[LS_1]]), \\ \max(r_1, r_2) & \text{if } (s, t, r_1) \in [[LS_1]] \land (s, t, r_2) \in [[LS_2]]. \end{cases}$
      The syntax and semantics make it possible to define an ordering similar to subsumption (more specific specifications generate fewer links).
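The three semantic rules can be made concrete with a small Python sketch. It is only an illustration under assumptions: a mapping is modelled as a dictionary from (s, t) pairs to a score r, the function names `atomic`, `conj` and `disj` are chosen here, and the toy scores are invented. For simplicity the score stored in each input mapping equals the value of the corresponding measure.

```python
from typing import Callable, Dict, Tuple

Pair = Tuple[str, str]
Mapping = Dict[Pair, float]  # (s, t) -> r


def atomic(measure: Callable[[str, str], float], theta: float, mapping: Mapping) -> Mapping:
    """[[f(m, theta, M)]]: keep the triples of M whose pair satisfies m(s, t) >= theta."""
    return {(s, t): r for (s, t), r in mapping.items() if measure(s, t) >= theta}


def conj(a: Mapping, b: Mapping) -> Mapping:
    """[[LS1 AND LS2]]: pairs present in both results, scored with min(r1, r2)."""
    return {pair: min(r, b[pair]) for pair, r in a.items() if pair in b}


def disj(a: Mapping, b: Mapping) -> Mapping:
    """[[LS1 OR LS2]]: pairs present in either result; max(r1, r2) when in both."""
    out = dict(a)
    for pair, r in b.items():
        out[pair] = max(r, out[pair]) if pair in out else r
    return out


# Invented toy scores for two hypothetical measures.
M_NAME: Mapping = {("s1", "t1"): 0.8, ("s1", "t2"): 0.3, ("s2", "t2"): 0.6}
M_ID: Mapping = {("s1", "t1"): 0.4, ("s1", "t2"): 0.9, ("s2", "t2"): 0.7}


def name_sim(s: str, t: str) -> float:
    return M_NAME.get((s, t), 0.0)


def id_sim(s: str, t: str) -> float:
    return M_ID.get((s, t), 0.0)


by_name = atomic(name_sim, 0.5, M_NAME)  # {(s1,t1): 0.8, (s2,t2): 0.6}
by_id = atomic(id_sim, 0.5, M_ID)        # {(s1,t2): 0.9, (s2,t2): 0.7}

print(conj(by_name, by_id))  # {('s2', 't2'): 0.6}
print(disj(by_name, by_id))
```

The conjunction returns a subset of each operand and the disjunction a superset, which is the intuition behind the subsumption-like ordering: the more specific specification yields fewer links.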
  62. 62. Link Specification Refinement Operator.
      $\rho(LS) = \begin{cases}
      \{ f(m_1, 1,\ ) \sqcap \cdots \sqcap f(m_n, 1,\ ) \mid m_i \in SM,\ 1 \le i \le n,\ n \le 2^{|SM|} \} & \text{if } LS = \bot \\
      f(m, dt(\theta), M) \cup LS \sqcup f(m', 1, M) \quad (m \in SM,\ m \ne m') & \text{if } LS = f(m, \theta, M) \text{ (atomic)} \\
      LS_1 \sqcap \cdots \sqcap LS_{i-1} \sqcap LS' \sqcap LS_{i+1} \sqcap \cdots \sqcap LS_n \text{ with } LS' \in \rho(LS_i) & \text{if } LS = LS_1 \sqcap \cdots \sqcap LS_n\ (n \ge 2) \\
      LS_1 \sqcup \cdots \sqcup LS_{i-1} \sqcup LS' \sqcup LS_{i+1} \sqcup \cdots \sqcup LS_n \text{ with } LS' \in \rho(LS_i) \\ \quad \cup\ LS \sqcup f(m, 1, M) \quad (m \in SM,\ m \text{ not used in } LS) & \text{if } LS = LS_1 \sqcup \cdots \sqcup LS_n\ (n \ge 2)
      \end{cases}$
      Upward refinement operator. Positive: weakly complete, finite. Negative: not complete, redundant, not proper.
  66. 66. Refinement Chain Example.
      Step 1: $f(\mathrm{edit}(\text{:socId}, \text{:socId}), 1.0)$
      Step 2: $f(\mathrm{edit}(\text{:socId}, \text{:socId}), 0.5)$
      Step 3: $f(\mathrm{trigrams}(\text{:name}, \text{:label}), 1.0) \sqcup f(\mathrm{edit}(\text{:socId}, \text{:socId}), 0.5)$
      Step 4: $f(\mathrm{trigrams}(\text{:name}, \text{:label}), 0.5) \sqcup f(\mathrm{edit}(\text{:socId}, \text{:socId}), 0.5)$
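The chain above can be reproduced with a heavily simplified sketch of an upward refinement step. It covers only the atomic and disjunctive cases, and several choices are assumptions made here for illustration: the threshold decrement `dt` lowers a threshold by a fixed step of 0.5 and keeps it positive, disjunctions are kept flat, and the class and function names (`Atomic`, `Or`, `refine`) are not taken from LIMES.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set, Union

# SM: the set of available similarity measures (taken from the example slides).
MEASURES = ("edit(:socId, :socId)", "trigrams(:name, :label)")
STEP = 0.5  # assumed step size for dt(theta)


@dataclass(frozen=True)
class Atomic:
    measure: str
    theta: float


@dataclass(frozen=True)
class Or:
    operands: FrozenSet[Atomic]  # flat disjunction of atomic specifications


Spec = Union[Atomic, Or]


def dt(theta: float) -> Optional[float]:
    """Assumed threshold decrement: lower by STEP, or None if no positive value is left."""
    lowered = round(theta - STEP, 10)
    return lowered if lowered > 0 else None


def measures_used(spec: Spec) -> Set[str]:
    if isinstance(spec, Atomic):
        return {spec.measure}
    return {op.measure for op in spec.operands}


def refine(spec: Spec) -> Set[Spec]:
    """One upward refinement step: each result is at least as general as spec."""
    out: Set[Spec] = set()
    if isinstance(spec, Atomic):
        lowered = dt(spec.theta)
        if lowered is not None:
            out.add(Atomic(spec.measure, lowered))  # f(m, dt(theta), M)
        for m in MEASURES:
            if m != spec.measure:                   # LS OR f(m', 1, M)
                out.add(Or(frozenset({spec, Atomic(m, 1.0)})))
        return out
    for op in spec.operands:
        lowered = dt(op.theta)
        if lowered is not None:                     # refine a single operand
            out.add(Or((spec.operands - {op}) | {Atomic(op.measure, lowered)}))
    for m in MEASURES:
        if m not in measures_used(spec):            # add a measure not used yet
            out.add(Or(spec.operands | {Atomic(m, 1.0)}))
    return out


EDIT, TRI = MEASURES
chain = [
    Atomic(EDIT, 1.0),
    Atomic(EDIT, 0.5),
    Or(frozenset({Atomic(TRI, 1.0), Atomic(EDIT, 0.5)})),
    Or(frozenset({Atomic(TRI, 0.5), Atomic(EDIT, 0.5)})),
]
for current, successor in zip(chain, chain[1:]):
    print(successor in refine(current))  # True for every step of the chain
```

Each step either relaxes a threshold or adds a disjunct, so the set of generated links can only grow along the chain.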
  67. 67. Projects: DL-Learner and LIMES. DL-Learner: open-source project (http://dl-learner.org); extensible platform for concept learning algorithms; supports all RDF/OWL serialisations and major reasoners; several thousand downloads. LIMES (http://aksw.org/Projects/LIMES.html): highly scalable engine (fastest RDF link discovery tool); several machine learning approaches integrated (including the one presented). Reference: "DL-Learner: Learning Concepts in Description Logics", Jens Lehmann, Journal of Machine Learning Research (JMLR), 2009.
  68. 68. Summary and Conclusions. There are many interesting applications of structured machine learning (therapy response prediction, disease prediction, protein folding, data quality measurement, ontology debugging). There are still few machine learning tools for working with RDF/OWL, although more and more data is available. Refinement operators make it possible to apply supervised machine learning to complex background knowledge. They can also be applied to other languages, such as link specifications.
