This document discusses inducing concepts in web ontologies through terminological decision trees (TDTs). It introduces TDTs, which extend first-order logical decision trees by allowing description logic (DL) concept descriptions as node tests. The document outlines how TDTs are induced from examples, how they classify individuals, and how they are converted into concept definitions expressed in standard Semantic Web representations based on DLs. It evaluates the approach on several ontologies from standard repositories and concludes that TDTs provide an effective and robust means for automated concept learning in ontologies.
1. Induction of Concepts in Web Ontologies
through Terminological Decision Trees
Nicola Fanizzi Claudia d’Amato Floriana Esposito
LACAM – Dipartimento di Informatica
Università degli Studi di Bari “Aldo Moro”
ECML/PKDD 2010 – Barcelona, Spain
2. Preliminaries Motivation
Context
In the context of the Semantic Web
Next-Generation Knowledge Bases expressed as Ontologies
Problem with building ontologies:
Burdensome task
Domain Expert = Knowledge Engineer
Then
Automated Methods for learning concepts
expressed in standard SW representations
founded on Description Logics
Fanizzi, d’Amato, Esposito UniBa.IT Induction of Terminological Decision Trees ECML/PKDD 2010 2 / 32
3. Preliminaries State of the Art
Related Work
Early works
focused on learnability and LCS operators for the CLASSIC family (ancestors
of the DL languages) [Cohen et al., 1992, Cohen and Hirsh, 1994]
KLUSTER: conceptual clustering in BACK [Kietz and Morik, 1994]
approaches based on refinement operators
ALER refinement operators [Badea and Nienhuys-Cheng, 2000]
YINYANG: downward operator based on the notion of
counterfactuals; examples expressed as most specific concepts:
complex concept definitions [Iannone et al., 2007]
DL-LEARNER: top-down GP algorithm, based on new downward
operators, heuristics that favor definitions of limited complexity
[Lehmann and Hitzler, 2008, Lehmann and Hitzler, 2010]
DL-FOIL adapts FOIL to DL representations [Fanizzi et al., 2008]
Other approaches: hybrid languages
[Rouveirol and Ventos, 2000, Kietz, 2002, Lisi and Esposito, 2008]
4. Preliminaries State of the Art
In this work
Introduce Terminological Decision Trees
Induction, Classification, Conversion
Evaluation
5. Preliminaries State of the Art
Outline
1 DL: Representation & Inference
Syntax & Semantics
DL Knowledge Bases
Inference
2 Learning Concepts through TDTs
Learning Problem
Terminological Decision Trees
Induction
Classification
Conversion
3 Evaluation
Setup
Results
4 Conclusions
7. DL: Representation & Inference Syntax & Semantics
DLs Preliminaries I
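In DLs,
axioms are inductively defined on a vocabulary of
$N_C$: set of primitive concept names
$N_R$: set of primitive role names
$N_I$: set of individual names
and of syntax constructors
Set-theoretic semantics defined by interpretations $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$
$\Delta^{\mathcal{I}}$: domain of the interpretation (non-empty)
$\cdot^{\mathcal{I}}$: interpretation function that maps names to extensions:
each $A \in N_C$ to a set $A^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}$,
each $R \in N_R$ to $R^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$ and
each $a \in N_I$ to an element $a^{\mathcal{I}} \in \Delta^{\mathcal{I}}$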
8. DL: Representation & Inference Syntax & Semantics
DLs Preliminaries II
ALC Syntax
grammar rule     name                        example
C, D → ⊤         top concept
     | ⊥         bottom concept
     | A         primitive concept           Animal
     | ¬C        (full) concept negation     ¬Parent
     | C ⊓ D     concept conjunction         Person ⊓ Male
     | C ⊔ D     concept disjunction         Male ⊔ Female
     | ∃R.C      existential restriction     ∃hasChild.Male
     | ∀R.C      universal restriction       ∀hasChild.Female
9. DL: Representation & Inference Syntax & Semantics
DLs Preliminaries III
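ALC Semantics
interpretation | OWL
$\top^{\mathcal{I}} = \Delta^{\mathcal{I}}$ | owl:Thing
$\bot^{\mathcal{I}} = \emptyset$ | owl:Nothing
$(\neg C)^{\mathcal{I}} = \Delta^{\mathcal{I}} \setminus C^{\mathcal{I}}$ | owl:complementOf
$(C \sqcap D)^{\mathcal{I}} = C^{\mathcal{I}} \cap D^{\mathcal{I}}$ | owl:intersectionOf
$(C \sqcup D)^{\mathcal{I}} = C^{\mathcal{I}} \cup D^{\mathcal{I}}$ | owl:unionOf
$(\exists R.C)^{\mathcal{I}} = \{x \in \Delta^{\mathcal{I}} \mid \exists y : (x, y) \in R^{\mathcal{I}} \wedge y \in C^{\mathcal{I}}\}$ | owl:someValuesFrom
$(\forall R.C)^{\mathcal{I}} = \{x \in \Delta^{\mathcal{I}} \mid \forall y : (x, y) \in R^{\mathcal{I}} \rightarrow y \in C^{\mathcal{I}}\}$ | owl:allValuesFrom
In SW/DL reasoning the open-world assumption (OWA) is made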
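To see the semantics table at work, one can compute extensions clause by clause over a single finite interpretation. The sketch below assumes a toy Python encoding (concepts as nested tuples; `Interpretation`, `ext` and the tag names are illustrative, not part of any OWL library). It only evaluates concepts in one fully specified interpretation (model checking); it does not decide entailment under the OWA, which requires reasoning over all models of a knowledge base.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple, Union

# Assumed encoding: a primitive concept name (str), or a tuple headed by a tag:
# ("top",), ("bottom",), ("not", C), ("and", C, D), ("or", C, D),
# ("some", R, C) for ∃R.C and ("all", R, C) for ∀R.C.
Concept = Union[str, tuple]


class Interpretation:
    """A finite interpretation I = (Δ^I, ·^I)."""

    def __init__(self, domain: Iterable[str],
                 concepts: Dict[str, Set[str]],
                 roles: Dict[str, Set[Tuple[str, str]]]):
        self.domain = frozenset(domain)   # Δ^I (assumed non-empty)
        self.concepts = concepts          # A ↦ A^I ⊆ Δ^I
        self.roles = roles                # R ↦ R^I ⊆ Δ^I × Δ^I

    def successors(self, role: str, x: str) -> Set[str]:
        return {y for (z, y) in self.roles.get(role, set()) if z == x}

    def ext(self, c: Concept) -> FrozenSet[str]:
        """Extension C^I, following the ALC semantics table."""
        if isinstance(c, str):
            return frozenset(self.concepts.get(c, set()))
        tag = c[0]
        if tag == "top":
            return self.domain
        if tag == "bottom":
            return frozenset()
        if tag == "not":
            return self.domain - self.ext(c[1])
        if tag == "and":
            return self.ext(c[1]) & self.ext(c[2])
        if tag == "or":
            return self.ext(c[1]) | self.ext(c[2])
        if tag == "some":    # ∃R.C: at least one R-successor in C^I
            filler = self.ext(c[2])
            return frozenset(x for x in self.domain
                             if self.successors(c[1], x) & filler)
        if tag == "all":     # ∀R.C: every R-successor in C^I (vacuous if none)
            filler = self.ext(c[2])
            return frozenset(x for x in self.domain
                             if self.successors(c[1], x) <= filler)
        raise ValueError(f"unknown constructor: {tag!r}")


interp = Interpretation(
    domain={"ann", "bob", "carl"},
    concepts={"Male": {"bob", "carl"}, "Female": {"ann"}},
    roles={"hasChild": {("ann", "bob"), ("ann", "carl")}},
)
print(sorted(interp.ext(("some", "hasChild", "Male"))))   # ['ann']
print(sorted(interp.ext(("all", "hasChild", "Female"))))  # ['bob', 'carl']
```

The second query shows the vacuous reading of ∀R.C: individuals without children satisfy ∀hasChild.Female.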
10. DL: Representation & Inference DL Knowledge Bases
Knowledge Bases I
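A knowledge base $\mathcal{K} = (\mathcal{T}, \mathcal{A})$ contains
TBox $\mathcal{T}$: set of axioms $C \sqsubseteq D$ (resp. $C \equiv D$),
meaning $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$ (resp. $C^{\mathcal{I}} = D^{\mathcal{I}}$),
where $C$ is atomic and $D$ is a concept description
ABox $\mathcal{A}$: set of assertions like $C(a)$ and $R(a, b)$,
meaning that $a^{\mathcal{I}} \in C^{\mathcal{I}}$ and $(a^{\mathcal{I}}, b^{\mathcal{I}}) \in R^{\mathcal{I}}$
$\mathrm{Ind}(\mathcal{A})$: set of individuals occurring in $\mathcal{A}$
Interpretations of interest (models) satisfy all axioms in $\mathcal{K}$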
12. DL: Representation & Inference Inference
Inference & OWA
Q = ∃hasChild.(Parricide ⊓ ∃hasChild.¬Parricide)
(class of individuals with a child who is a parricide and who, in turn, has a child who
is not a parricide)
Does $\mathcal{K} \models Q(\mathrm{JOCASTA})$ hold?
Problem: incomplete knowledge about the truth of
α = Parricide(POLYNEIKES)
Under the OWA (reasoning on all possible models) the answer is yes,
dividing the interpretations (models of K) into two classes:
1 models of α and
2 models of ¬α
In both cases JOCASTA satisfies Q (J-P-T: JOCASTA, POLYNEIKES, THERSANDROS in models of α; J-O-P: JOCASTA, OEDIPUS, POLYNEIKES in models of ¬α)
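The reasoning by cases can be mimicked in a few lines. This is a sketch, not a DL reasoner: it assumes the ABox of the classic Oedipus example (the slide only names the individuals through their initials), takes the role assertions as given, and splits only over the unknown atom α. Since Q is existential, extra hasChild edges in other models could only add witnesses, so the split over α is what matters here.

```python
from itertools import product
from typing import Dict, List, Tuple

# Assumed ABox of the classic "Oedipus" example.
HAS_CHILD = {("JOCASTA", "OEDIPUS"), ("JOCASTA", "POLYNEIKES"),
             ("OEDIPUS", "POLYNEIKES"), ("POLYNEIKES", "THERSANDROS")}
KNOWN = {"OEDIPUS": True, "THERSANDROS": False}   # Parricide(O), ¬Parricide(T)
UNKNOWN = ["POLYNEIKES"]                            # α = Parricide(POLYNEIKES)


def children(x: str) -> List[str]:
    return sorted(y for (z, y) in HAS_CHILD if z == x)


def satisfies_q(x: str, parricide: Dict[str, bool]) -> bool:
    """x ∈ ∃hasChild.(Parricide ⊓ ∃hasChild.¬Parricide) in one completion."""
    return any(parricide[y] and any(not parricide[z] for z in children(y))
               for y in children(x))


def holds_in_all_cases(x: str) -> Tuple[bool, List[bool]]:
    """Evaluate Q(x) in every completion of the unknown Parricide atoms."""
    outcomes = []
    for values in product([True, False], repeat=len(UNKNOWN)):
        parricide = {**KNOWN, **dict(zip(UNKNOWN, values))}
        outcomes.append(satisfies_q(x, parricide))
    return all(outcomes), outcomes


print(holds_in_all_cases("JOCASTA"))   # (True, [True, True])
print(holds_in_all_cases("OEDIPUS"))   # (False, [True, False])
```

The witness differs between the two cases (POLYNEIKES when α holds, OEDIPUS when it does not), which is why the conclusion for JOCASTA needs reasoning over both classes of models.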
13. Learning Concepts through TDTs
14. Learning Concepts through TDTs Learning Problem
Concept Induction I
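Let $\mathcal{K} = (\mathcal{T}, \mathcal{A})$ be a DL knowledge base (acting as background knowledge, BK)
Definition (DL concept learning problem)
Given
a target concept name $C$;
a set of positive and negative examples for $C$:
$S_C^+(\mathcal{A}) = \{a \in \mathrm{Ind}(\mathcal{A}) \mid \mathcal{K} \models C(a)\}$ and
$S_C^-(\mathcal{A}) = \{b \in \mathrm{Ind}(\mathcal{A}) \mid \mathcal{K} \models \neg C(b)\}$
Find a concept description $D$ that satisfies
$\mathcal{K} \models D(a) \quad \forall a \in S_C^+(\mathcal{A})$ and
$\mathcal{K} \models \neg D(b) \quad \forall b \in S_C^-(\mathcal{A})$
Then the induced axiom $C \equiv D$ can be added to $\mathcal{K}$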
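The definition can be phrased as two small checks against a reasoner. In the sketch below, `entails(X, a)` is a hypothetical stand-in for the test "K ⊨ X(a)" (here just a lookup in a toy set of entailed assertions, with negation written as the string "¬(…)"); the function names are illustrative. Under the OWA some individuals fall in neither example set; they end up unlabeled.

```python
from typing import Callable, Iterable, Set, Tuple

# Hypothetical reasoner stand-in: entails(X, a) answers "does K ⊨ X(a)?".
Entails = Callable[[str, str], bool]


def neg(c: str) -> str:
    return f"¬({c})"


def example_sets(entails: Entails, c: str,
                 individuals: Iterable[str]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split Ind(A) into S_C^+(A), S_C^-(A) and the unlabeled individuals."""
    pos, negs, unlabeled = set(), set(), set()
    for a in individuals:
        if entails(c, a):
            pos.add(a)
        elif entails(neg(c), a):
            negs.add(a)
        else:                      # OWA: neither C(a) nor ¬C(a) is entailed
            unlabeled.add(a)
    return pos, negs, unlabeled


def solves(entails: Entails, d: str, pos: Set[str], negs: Set[str]) -> bool:
    """K ⊨ D(a) for every positive a and K ⊨ ¬D(b) for every negative b."""
    return (all(entails(d, a) for a in pos)
            and all(entails(neg(d), b) for b in negs))


# Toy set of entailed assertions (made up for illustration).
FACTS = {("C", "a"), ("¬(C)", "b"), ("D", "a"), ("¬(D)", "b"), ("E", "a")}


def entails(x: str, a: str) -> bool:
    return (x, a) in FACTS


pos, negs, unl = example_sets(entails, "C", ["a", "b", "u"])
print(pos, negs, unl)                                       # {'a'} {'b'} {'u'}
print(solves(entails, "D", pos, negs), solves(entails, "E", pos, negs))  # True False
```

E covers the positive example but fails because ¬E(b) is not entailed, as the second condition of the definition requires.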
15. Learning Concepts through TDTs Learning Problem
Concept Induction II
Example (car checking [Blockeel and De Raedt, 1997])
T = { Gear ⊑ Replaceable,
      Chain ⊑ Replaceable,
      Engine ⊑ ¬Replaceable,
      Wheel ⊑ ¬Replaceable,
      SendBack ⊑ ¬(Fix ⊔ Ok),
      Fix ⊑ ¬(Ok ⊔ SendBack),
      Ok ⊑ ¬(SendBack ⊔ Fix) }
16. Learning Concepts through TDTs Learning Problem
Concept Induction III
Example (cont’d)
The original examples can be encoded as assertions:
{ Machine(M1), hasPart(M1, G1), Gear(G1), Worn(G1),
  hasPart(M1, C1), Chain(C1), Worn(C1),
  Machine(M2), hasPart(M2, E2), Engine(E2), Worn(E2),
  hasPart(M2, C2), Chain(C2), Worn(C2),
  Machine(M3), hasPart(M3, W3), Wheel(W3), Worn(W3),
  Machine(M4) } ⊆ A
Given this KB and the example sets
$S_C^+(\mathcal{A}) = \{\mathrm{M2}, \mathrm{M3}\}$ and $S_C^-(\mathcal{A}) = \{\mathrm{M1}, \mathrm{M4}\}$
(M2 and M3 each have a worn part that is not replaceable),
a good definition for C = SendBack may be:
SendBack ≡ Machine ⊓ ∃hasPart.(Worn ⊓ ¬Replaceable)
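A quick way to check which machines satisfy the proposed definition is to evaluate it on the asserted facts. This Python sketch is a closed-world approximation (it only inspects the listed parts and the part types given in the TBox, with machine 3's wheel written as W3); a reasoner working under the OWA could not, for instance, conclude that M4 has no worn, non-replaceable part.

```python
# Car-checking ABox (machine 3's wheel written as W3 throughout).
MACHINES = {"M1", "M2", "M3", "M4"}
HAS_PART = {("M1", "G1"), ("M1", "C1"), ("M2", "E2"), ("M2", "C2"), ("M3", "W3")}
PART_TYPE = {"G1": "Gear", "C1": "Chain", "E2": "Engine", "C2": "Chain", "W3": "Wheel"}
WORN = {"G1", "C1", "E2", "C2", "W3"}

# TBox consequences: Gear, Chain ⊑ Replaceable; Engine, Wheel ⊑ ¬Replaceable.
REPLACEABLE = {"Gear": True, "Chain": True, "Engine": False, "Wheel": False}


def satisfies_definition(m: str) -> bool:
    """Machine ⊓ ∃hasPart.(Worn ⊓ ¬Replaceable), checked on the listed parts only."""
    return m in MACHINES and any(
        p in WORN and not REPLACEABLE[PART_TYPE[p]]
        for (x, p) in HAS_PART if x == m)


print({m: satisfies_definition(m) for m in sorted(MACHINES)})
# {'M1': False, 'M2': True, 'M3': True, 'M4': False}
```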
17. Learning Concepts through TDTs Terminological Decision Trees
Terminological Decision Trees I
First-order logical decision trees (FOLDTs) are defined
[Blockeel and De Raedt, 1998] as binary decision trees in which
1 the nodes contain tests in the form of FOL formulae;
2 the left and right branches correspond, respectively, to the test
evaluating to true and to false;
3 different nodes may share variables,
with some limitations
Terminological decision trees (TDTs) extend this definition,
allowing DL concept descriptions as (variable-free) node tests
18. Learning Concepts through TDTs Terminological Decision Trees
Terminological Decision Trees II
A TDT providing the definition of the SendBack concept:
∃hasPart.⊤
├─ true: ∃hasPart.Worn
│  ├─ true: ∃hasPart.(Worn ⊓ ¬Replaceable)
│  │  ├─ true: SendBack
│  │  └─ false: ¬SendBack (Fix)
│  └─ false: ¬SendBack (Ok)
└─ false: ¬SendBack (Machine)
19. Learning Concepts through TDTs Induction
Induction of TDTs – base case
function INDUCETDTREE(C, D, Ps, Ns, Us): TDT;
C: concept name;
D: current description;
Ps, Ns, Us: sets of (positive, negative, unlabeled) training individuals;
const θ: purity threshold
begin
Initialize new TDT T;
if |Ps| = 0 and |Ns| = 0 then
begin
if Pr+ ≥ Pr− then T.root ← C else T.root ← ¬C;
return T;
end
if |Ns| = 0 and |Ps|/(|Ps| + |Us|) > θ then
begin T.root ← C; return T; end
if |Ps| = 0 and |Ns|/(|Ns| + |Us|) > θ then
begin T.root ← ¬C; return T; end
{ ... }
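The base cases amount to a decision on the counts at the current node. A Python sketch follows; the name `leaf_label` is hypothetical, and since the slide does not say how Pr+ and Pr− are obtained, they are simply passed in as arguments (estimating them from the class proportions of the whole training set would be one possible choice, an assumption). The default θ mirrors the default threshold reported in the experimental setup.

```python
from typing import Optional


def leaf_label(concept: str, n_pos: int, n_neg: int, n_unl: int,
               prior_pos: float, prior_neg: float,
               theta: float = 0.05) -> Optional[str]:
    """Base cases of INDUCETDTREE: a leaf label (C or ¬C), or None to keep splitting."""
    if n_pos == 0 and n_neg == 0:
        # No labeled examples left: fall back on the priors Pr+ and Pr-.
        return concept if prior_pos >= prior_neg else "¬" + concept
    if n_neg == 0 and n_pos / (n_pos + n_unl) > theta:
        return concept                     # only positives (and unlabeled) left
    if n_pos == 0 and n_neg / (n_neg + n_unl) > theta:
        return "¬" + concept               # only negatives (and unlabeled) left
    return None                            # mixed node: recursive case applies


print(leaf_label("SendBack", 0, 0, 4, prior_pos=0.4, prior_neg=0.6))  # ¬SendBack
print(leaf_label("SendBack", 3, 2, 1, prior_pos=0.5, prior_neg=0.5))  # None
```

The divisions are safe: the first test already handles the case where both labeled counts are zero.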
20. Learning Concepts through TDTs Induction
Induction of TDTs – recursive case
{ ... }
Specs ← GENERATENEWCONCEPTS(D, Ps, Ns);
Dbest ← SELECTBESTCONCEPT(Specs, Ps, Ns, Us);
((P_l, N_l, U_l), (P_r, N_r, U_r)) ← SPLIT(Dbest, Ps, Ns, Us);
T.root ← Dbest;
T.left ← INDUCETDTREE(C, D ⊓ Dbest, P_l, N_l, U_l);
T.right ← INDUCETDTREE(C, D ⊓ ¬Dbest, P_r, N_r, U_r);
return T;
end
The (im)purity measure used to select Dbest is based on the Gini index
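The slide states only that the impurity measure is based on the Gini index; the exact TermiTIS measure, including how unlabeled individuals contribute, is not given here. The sketch below therefore uses the textbook Gini impurity on labeled counts, weighted by branch size, and ignores unlabeled individuals. `holds(d, a)` is a hypothetical stand-in for the reasoner test K ⊨ D(a); individuals for which it is false are simply sent right (a simplification).

```python
from typing import Callable, Sequence, Tuple


def gini(p: int, n: int) -> float:
    """Gini impurity 1 - (p/t)^2 - (n/t)^2 of a node with p positives, n negatives."""
    t = p + n
    if t == 0:
        return 0.0
    return 1.0 - (p / t) ** 2 - (n / t) ** 2


def split_impurity(left: Tuple[int, int], right: Tuple[int, int]) -> float:
    """Size-weighted Gini impurity of a binary split (unlabeled individuals ignored)."""
    total = sum(left) + sum(right)
    if total == 0:
        return 0.0
    return sum(sum(b) / total * gini(*b) for b in (left, right))


def select_best(candidates: Sequence[str], pos: Sequence[str], neg: Sequence[str],
                holds: Callable[[str, str], bool]) -> str:
    """Return the candidate test whose split of pos/neg is least impure."""
    def score(d: str) -> float:
        lp = sum(holds(d, a) for a in pos)      # positives sent left
        ln = sum(holds(d, b) for b in neg)      # negatives sent left
        return split_impurity((lp, ln), (len(pos) - lp, len(neg) - ln))
    return min(candidates, key=score)


# Toy data: D1 lets a negative through, D2 separates the classes perfectly.
H = {("D1", "a1"), ("D1", "a2"), ("D1", "b1"), ("D2", "a1"), ("D2", "a2")}


def holds(d: str, x: str) -> bool:
    return (d, x) in H


print(select_best(["D1", "D2"], ["a1", "a2"], ["b1", "b2"], holds))  # D2
```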
21. Learning Concepts through TDTs Classification
TDTs – Classification of individuals
function CLASSIFY(a: individual, T: TDT, K: KB): concept;
begin
1 N ← ROOT(T);
2 while ¬LEAF(N, T) do
  2.1 (D, Tleft, Tright) ← INODE(N);
  2.2 if K ⊨ D(a) then N ← ROOT(Tleft)
  2.3 elseif K ⊨ ¬D(a) then N ← ROOT(Tright)
  2.4 else return
3 (D, ·, ·) ← INODE(N);
4 return D;
end
Observation: to avoid unknown answers due to the OWA (test failing on
both branches), use the weaker right-branch test in step 2.3: K ⊭ D(a)
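CLASSIFY can be sketched in Python on the SendBack tree from the TDT slide. `entails(X, a)` is a hypothetical reasoner stand-in (a lookup in a toy set of entailed assertions), and the `weak` flag switches step 2.3 to the weaker test K ⊭ D(a). The sketch assumes every internal node has both children.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Entails = Callable[[str, str], bool]   # hypothetical: does K ⊨ X(a)?


@dataclass
class Node:
    test: str                        # concept D (internal node) or class label (leaf)
    left: Optional["Node"] = None    # followed when K ⊨ D(a)
    right: Optional["Node"] = None   # followed when K ⊨ ¬D(a), or K ⊭ D(a) if weak

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def negate(d: str) -> str:
    return f"¬({d})"


def classify(a: str, tree: Node, entails: Entails, weak: bool = False) -> Optional[str]:
    """Walk the TDT; return the leaf label, or None if a test is undecided."""
    node = tree
    while not node.is_leaf():
        if entails(node.test, a):
            node = node.left
        elif weak or entails(negate(node.test), a):
            node = node.right          # weak mode: K ⊭ D(a) suffices
        else:
            return None                # neither D(a) nor ¬D(a) entailed (OWA)
    return node.test


TREE = Node("∃hasPart.⊤",
            Node("∃hasPart.Worn",
                 Node("∃hasPart.(Worn ⊓ ¬Replaceable)",
                      Node("SendBack"), Node("¬SendBack")),
                 Node("¬SendBack")),
            Node("¬SendBack"))

# Toy entailments: M2 satisfies every test; u is known to have worn parts only.
FACTS = {("∃hasPart.⊤", "M2"), ("∃hasPart.Worn", "M2"),
         ("∃hasPart.(Worn ⊓ ¬Replaceable)", "M2"),
         ("∃hasPart.⊤", "u"), ("∃hasPart.Worn", "u")}


def entails(x: str, a: str) -> bool:
    return (x, a) in FACTS


print(classify("M2", TREE, entails))             # SendBack
print(classify("u", TREE, entails))              # None
print(classify("u", TREE, entails, weak=True))   # ¬SendBack
```

With the strict test, u gets no answer because neither the third test nor its negation is entailed; the weak variant trades that for a default to the right branch.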
22. Learning Concepts through TDTs Conversion
Conversion – TDTs to DL Concepts I
function DERIVEDEFINITION(C, T): concept description;
C: concept name;
T: TDT;
begin
1 S ← ASSOCIATE(C, T, ⊤);
2 return $\bigsqcup_{D \in S} D$;
end
23. Learning Concepts through TDTs Conversion
Conversion – TDTs to DL Concepts II
function ASSOCIATE(C, T, Dc): set of descriptions;
C: concept name;
T: TDT;
Dc: current concept description
begin
1 N ← ROOT(T);
2 (Dn, Tleft, Tright) ← INODE(N);
3 if LEAF(N, T)
  then
    1 if Dn = C then return {Dc} else return ∅;
  else
    1 Sleft ← ASSOCIATE(C, Tleft, Dc ⊓ Dn);
    2 Sright ← ASSOCIATE(C, Tright, Dc ⊓ ¬Dn);
    3 return Sleft ∪ Sright;
end
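DERIVEDEFINITION and ASSOCIATE can be sketched in Python as a walk that collects, for each leaf labelled C, the conjunction of the tests on its path (negated on right branches) and then joins those conjunctions with ⊔. Concepts are plain strings here, an assumption made for readability; the leading ⊤ of the pseudocode is dropped from non-empty paths since ⊤ ⊓ X ≡ X, and the node tests in this tree are atomic or restrictions, so prefixing "¬" needs no extra parentheses.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    test: str                        # concept at an internal node, class label at a leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def associate(c: str, node: Node, path: List[str]) -> List[List[str]]:
    """ASSOCIATE: conjunct lists of the paths that reach a leaf labelled c."""
    if node.left is None and node.right is None:
        return [path] if node.test == c else []
    return (associate(c, node.left, path + [node.test])
            + associate(c, node.right, path + ["¬" + node.test]))


def derive_definition(c: str, tree: Node) -> str:
    """DERIVEDEFINITION: disjunction of path conjunctions (⊤ = empty path, ⊥ = none)."""
    paths = associate(c, tree, [])
    if not paths:
        return "⊥"
    return " ⊔ ".join("(" + " ⊓ ".join(p) + ")" if p else "⊤" for p in paths)


TREE = Node("∃hasPart.⊤",
            Node("∃hasPart.Worn",
                 Node("∃hasPart.(Worn ⊓ ¬Replaceable)",
                      Node("SendBack"), Node("¬SendBack")),
                 Node("¬SendBack")),
            Node("¬SendBack"))

print(derive_definition("SendBack", TREE))
# (∃hasPart.⊤ ⊓ ∃hasPart.Worn ⊓ ∃hasPart.(Worn ⊓ ¬Replaceable))
```

The derived conjunction is equivalent to ∃hasPart.(Worn ⊓ ¬Replaceable), since that restriction implies the two tests above it; such simplification is not attempted here.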
24. Evaluation
25. Evaluation Setup
Evaluation – Setup
System TermiTIS applied to classification problems
50 random queries per ontology, generated by composing 2
through 8 concepts by means of ALC constructors
.632 bootstrap strategy
DL reasoner PELLET ver. 2 employed to decide the actual
class membership w.r.t. the queries
Default threshold (θ = .05)
OWL ontologies selected from standard repositories
ontology | DL language | #concepts | #object props | #datatype props | #individuals
FSM | SOF(D) | 20 | 10 | 7 | 37
MDM0.73 | ALCHOF(D) | 196 | 22 | 3 | 112
WINES | ALCOF(D) | 75 | 12 | 1 | 161
BIOPAX | ALCIF(D) | 74 | 70 | 40 | 323
HDISEASE | ALCIF(D) | 1498 | 10 | 15 | 639
NTN | SHIF(D) | 47 | 27 | 8 | 676
FINANCIAL | ALCIF | 60 | 16 | 0 | 1000
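The setup names the .632 bootstrap strategy without further details. Below is a generic sketch of one common variant of Efron's .632 estimator, which per resample combines the resubstitution error and the out-of-bag error as 0.368·err_train + 0.632·err_oob and averages over rounds. `fit`, `error`, the number of rounds and the seed are placeholders, not TermiTIS settings.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def bootstrap_632(data: Sequence[T],
                  fit: Callable[[List[T]], object],
                  error: Callable[[object, List[T]], float],
                  n_rounds: int = 30, seed: int = 0) -> float:
    """Average .632 bootstrap error estimate over n_rounds resamples."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]    # n draws with replacement
        chosen = set(idx)
        train = [data[i] for i in idx]
        test = [data[i] for i in range(n) if i not in chosen]   # out-of-bag items
        if not test:
            continue                                 # nothing left out this round
        model = fit(train)
        estimates.append(0.368 * error(model, train) + 0.632 * error(model, test))
    if not estimates:
        raise ValueError("no round produced out-of-bag items")
    return sum(estimates) / len(estimates)


# Usage: a majority-label "classifier" on toy labels.
labels = [1] * 15 + [0] * 5


def fit_majority(train: List[int]) -> int:
    return max(set(train), key=train.count)


def error_rate(model: int, items: List[int]) -> float:
    return sum(y != model for y in items) / len(items)


print(round(bootstrap_632(labels, fit_majority, error_rate), 3))
```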
26. Evaluation Results
Performance
Compare the classification of the test individuals by the induced
trees (inductive) with the one decided by the reasoner (deductive);
+1, −1 and 0 denote membership, non-membership and unknown
inductive vs. deductive classification
match case: −1 vs. −1, 0 vs. 0, +1 vs. +1;
omission error case: 0 vs. −1, 0 vs. +1;
commission error case: −1 vs. +1, +1 vs. −1;
induction case: −1 vs. 0, +1 vs. 0;
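The four cases map each (inductive, deductive) pair to one outcome, and the rates are the percentages of test individuals in each case. A minimal Python sketch of that bookkeeping (function names are illustrative):

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Labels: +1 = member, -1 = non-member, 0 = unknown.


def outcome(inductive: int, deductive: int) -> str:
    """Map an (inductive, deductive) pair to the case names on the slide."""
    if inductive == deductive:
        return "match"
    if inductive == 0:
        return "omission"      # tree says unknown, reasoner decides
    if deductive == 0:
        return "induction"     # tree decides, reasoner cannot
    return "commission"        # opposite decisions


def rates(pairs: Iterable[Tuple[int, int]]) -> Dict[str, float]:
    """Percentage of test individuals falling in each case."""
    counts = Counter(outcome(i, d) for i, d in pairs)
    total = sum(counts.values())
    return {case: 100.0 * counts[case] / total
            for case in ("match", "commission", "omission", "induction")}


print(rates([(+1, +1), (0, -1), (-1, +1), (+1, 0)]))
# {'match': 25.0, 'commission': 25.0, 'omission': 25.0, 'induction': 25.0}
```

Induction cases are not counted as errors: the tree commits to an answer where the reasoner, under the OWA, cannot.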
27. Evaluation Results
Results I
ontology | match rate (%) | commission rate (%) | omission rate (%) | induction rate (%)
FSM | 96.68 ± 1.98 | 0.99 ± 1.35 | 0.02 ± 0.18 | 2.31 ± 0.51
MDM0.73 | 93.96 ± 5.44 | 0.39 ± 0.61 | 3.50 ± 4.16 | 2.15 ± 1.47
WINES | 74.36 ± 25.63 | 0.67 ± 4.63 | 12.46 ± 14.28 | 12.13 ± 23.49
BIOPAX | 96.51 ± 6.03 | 1.30 ± 5.72 | 2.19 ± 0.51 | 0.00 ± 0.00
HDISEASE | 78.60 ± 39.79 | 0.02 ± 0.10 | 1.54 ± 6.01 | 19.82 ± 39.17
NTN | 91.65 ± 15.89 | 0.01 ± 0.09 | 0.36 ± 1.58 | 7.98 ± 14.60
FINANCIAL | 96.21 ± 10.48 | 2.14 ± 10.07 | 0.16 ± 0.55 | 1.49 ± 0.16
28. Evaluation Results
Results II
Examples of induced concepts and original queries
BIOPAX
induced: (Or (And physicalEntity protein) dataSource)
original:
(Or (And (And dataSource externalReferenceUtilityClass)
(ForAll ORGANISM (ForAll CONTROLLED physicalInteraction)))
protein)
NTN
induced: (Or EvilSupernaturalBeing (Not God))
original: (Not God)
FINANCIAL
induced: (Or (Not Finished) NotPaidFinishedLoan Weekly)
original: (Or LoanPayment (Not NoProblemsFinishedLoan))
29. Conclusions
30. Conclusions
Conclusions & Ongoing Work
Conclusions:
Introduced terminological decision trees + a new method for learning concepts in DLs that support the standard SW ontology languages
TermiTIS system currently supports
top-down tree induction (adaptation of standard tree-induction methods)
classification
conversion
Experiments made on various ontologies prove the method effective and robust (high match rate, few commission errors)
Ongoing work:
Experiments with domain experts (ontology population)
More expressive DLs (+ new refinement operators)
KBs represented with expressive DLs, but concepts built with ALCQ constructors using concept names as atoms
Impurity indices to exploit the uncertainty related to the unlabeled individuals
Derive new hierarchical clustering algorithms
31. time for questions
Many thanks for attending this talk
comments / questions ?
(also, meet me @ Poster Session)
Offline
Nicola Fanizzi fanizzi@di.uniba.it
Claudia d’Amato claudia.damato@di.uniba.it
Floriana Esposito esposito@di.uniba.it
32. References
Badea, L. and Nienhuys-Cheng, S.-H. (2000).
A refinement operator for description logics.
In Cussens, J. and Frisch, A., editors, Proceedings of the 10th International Conference on Inductive Logic Programming,
volume 1866 of LNAI, pages 40–59. Springer.
Blockeel, H. and De Raedt, L. (1997).
Experiments with top-down induction of first order decision trees.
Technical Report CW 247, Dept. of Computer Science, K.U. Leuven.
Blockeel, H. and De Raedt, L. (1998).
Top-down induction of first-order logical decision trees.
Artificial Intelligence, 101(1-2):285–297.
Cohen, W., Borgida, A., and Hirsh, H. (1992).
Computing the least common subsumers in description logic.
In Swartout, W., editor, Proceedings of the 10th National Conference on Artificial Intelligence, pages 754–760. MIT Press.
Cohen, W. and Hirsh, H. (1994).
Learning the CLASSIC description logic.
In Torasso, P. et al., editors, Proceedings of the 4th International Conference on the Principles of Knowledge
Representation and Reasoning, pages 121–133. Morgan Kaufmann.
Fanizzi, N., d’Amato, C., and Esposito, F. (2008).
DL-FOIL: Concept learning in Description Logics.
In Železný, F. and Lavrač, N., editors, Proceedings of the 18th International Conference on Inductive Logic Programming,
ILP2008, volume 5194 of LNAI, pages 107–121. Springer.
Iannone, L., Palmisano, I., and Fanizzi, N. (2007).
An algorithm based on counterfactuals for concept learning in the semantic web.
Applied Intelligence, 26(2):139–159.
Kietz, J.-U. (2002).
Learnability of description logic programs.
In Matwin, S. and Sammut, C., editors, Proceedings of the 12th International Conference on Inductive Logic Programming,
volume 2583 of LNAI, pages 117–132, Sydney. Springer.
Kietz, J.-U. and Morik, K. (1994).
A polynomial approach to the constructive induction of structural knowledge.
Machine Learning, 14(2):193–218.
Lehmann, J. and Hitzler, P. (2008).
Foundations of refinement operators for description logics.
In Blockeel, H. et al., editors, Proceedings of the 17th International Conference on Inductive Logic Programming,
ILP2007, volume 4894 of LNCS, pages 161–174. Springer.
Lehmann, J. and Hitzler, P. (2010).
Concept learning in description logics using refinement operators.
Machine Learning, 78(1-2):203–250.
Lisi, F. and Esposito, F. (2008).
Foundations of onto-relational learning.
In Železný, F. and Lavrač, N., editors, Proceedings of the 18th International Conference on Inductive Logic Programming,
ILP2008, volume 5194 of LNAI, pages 158–175.
Rouveirol, C. and Ventos, V. (2000).
Towards learning in CARIN-ALN.
In Cussens, J. and Frisch, A., editors, Proceedings of the 10th International Conference on Inductive Logic Programming,
volume 1866 of LNAI, pages 191–208. Springer.