Towards Evidence Terminological Decision Trees
15th International Conference on Information Processing and
Management of Uncertainty in Knowledge-Based Systems
Giuseppe Rizzo, Claudia d’Amato, Nicola Fanizzi and Floriana Esposito
Dipartimento di Informatica
Università degli Studi di Bari “Aldo Moro”, Bari, Italy
July 15 - 19, 2014
G.Rizzo et al. (DIB- Univ. Aldo Moro) Evidence Terminological Decision Trees July 15 - 19, 2014 1 / 17

Outline
1 Introduction & Motivation
2 The approach
3 Evaluation
4 Conclusions

Introduction & Motivation
Introduction
In the context of the Web of Data, machine learning algorithms can
support:
ontology completion
the development of new non-standard inference services
by exploiting regularities in a knowledge base
Lack of disjointness axioms in ontologies
Under the Open World Assumption, the membership of an individual
w.r.t. a query concept (or its complement) cannot always be assessed deductively

Introduction
Some techniques proposed in the literature are inspired by Inductive
Logic Programming
E.g.: the Terminological Decision Tree (TDT) induction algorithm
In these methods, uncertainty is not considered explicitly

Terminological Decision Trees
Let K = (T, A) be a knowledge base; a Terminological Decision Tree (TDT) is a binary tree
where:
each node contains a conjunctive concept description D;
each departing edge is the result of a class-membership test w.r.t. D,
i.e., given an individual a, K |= D(a)?
if the node with E is the parent of the node with D, then D is obtained
from E by a refinement operator, and one of the following conditions
holds:
D introduces a new concept name,
D is an existential restriction,
D is a universal restriction of any of its ancestors.
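A minimal Python sketch of this tree structure, under our own assumptions: concept descriptions are plain strings, and the instance check is a hypothetical three-valued oracle `check(D, a)` returning True (K |= D(a)), False (K |= ¬D(a)) or None (neither is entailed). When a test is undecided, this sketch simply returns 0 (unknown); that is an illustrative choice, not necessarily how the original TDT classifier behaves.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union

# Hypothetical three-valued instance check: True if K |= D(a),
# False if K |= ¬D(a), None if neither is entailed (Open World Assumption).
InstanceCheck = Callable[[str, str], Optional[bool]]


@dataclass
class Leaf:
    label: int  # +1 (member of the target concept) or -1 (member of its complement)


@dataclass
class Node:
    concept: str   # conjunctive concept description D, kept as a string here
    pos: "Tree"    # subtree followed when K |= D(a)
    neg: "Tree"    # subtree followed when K |= ¬D(a)


Tree = Union[Leaf, Node]


def classify(tree: Tree, a: str, check: InstanceCheck) -> int:
    """Descend from the root; return the leaf label, or 0 if some test is undecided."""
    while isinstance(tree, Node):
        result = check(tree.concept, a)
        if result is None:
            return 0  # neither K |= D(a) nor K |= ¬D(a): this sketch stops here
        tree = tree.pos if result else tree.neg
    return tree.label


# Usage with a toy oracle backed by a dictionary of entailment results.
FACTS = {("Person", "alice"): True, ("Person", "rock"): False}


def toy_check(concept: str, a: str) -> Optional[bool]:
    return FACTS.get((concept, a))


tree = Node("Person", pos=Leaf(+1), neg=Leaf(-1))
print(classify(tree, "alice", toy_check))  # 1
print(classify(tree, "rock", toy_check))   # -1
print(classify(tree, "bob", toy_check))    # 0
```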

Motivations
When predicting the membership w.r.t. either a concept or its
complement, a TDT may be unable to determine the result of an
intermediate test
due to the way missing values are treated in DTs
What happens when neither the membership w.r.t. ∃hasPart. nor w.r.t. its
complement can be decided?

The approach
The approach
By extending Terminological Decision Trees with Dempster-Shafer
Theory, the problem can be overcome
Dempster-Shafer Theory (DST) is a more suitable framework than
the probabilistic one because it can explicitly represent the
ignorance related to the Open World Assumption
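To make the point concrete, the following Python sketch (our own illustration; the labels "D" and "notD" and the mass values are hypothetical) represents a BBA over Ω = {D, ¬D} as a dictionary from focal sets to masses. The mass m(Ω) assigned to the whole frame expresses ignorance, and here the gap Pl({D}) − Bel({D}) equals m(Ω); a probability distribution has no such term, so Bel and Pl would coincide.

```python
from typing import Dict, FrozenSet

# Frame of discernment Ω = {D, ¬D}; "notD" stands for ¬D.
OMEGA: FrozenSet[str] = frozenset({"D", "notD"})
BBA = Dict[FrozenSet[str], float]


def bel(m: BBA, A: FrozenSet[str]) -> float:
    """Belief: total mass of the focal sets contained in A."""
    return sum(v for B, v in m.items() if B <= A)


def pl(m: BBA, A: FrozenSet[str]) -> float:
    """Plausibility: total mass of the focal sets that intersect A."""
    return sum(v for B, v in m.items() if B & A)


# Mass 0.5 supports D, 0.25 supports ¬D, 0.25 is left on Ω (ignorance).
m = {frozenset({"D"}): 0.5, frozenset({"notD"}): 0.25, OMEGA: 0.25}
D = frozenset({"D"})
print(bel(m, D), pl(m, D))  # 0.5 0.75
```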

The approach
Semantic Web Knowledge bases
Semantic Web knowledge bases can be modeled by using specific
fragments of Description Logics (DL)
A domain is modeled through primitive concepts (classes) and roles
(relations)
A knowledge base is a pair K = (T, A) where
T (TBox) contains axioms concerning concepts and roles
A (ABox) contains factual knowledge (assertions C(a) and R(a, b))
The set of individuals occurring in A is denoted by Ind(A)
Various reasoning services are available
instance checking: deciding whether an individual is an instance of
a concept
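As a toy illustration of instance checking under the Open World Assumption, the Python sketch below handles only a hypothetical KB made of atomic subsumptions, disjointness axioms and atomic concept assertions (all names are made up; a real system would delegate this to a DL reasoner). The answer may be "unknown" (None), and the negative answer for Plant(tom) exists only because a disjointness axiom is present.

```python
from typing import Optional, Set

# Hypothetical toy KB (no roles, no complex concepts).
SUBSUMPTIONS = {("Cat", "Animal")}            # TBox: Cat ⊑ Animal
DISJOINT = {frozenset({"Animal", "Plant"})}   # TBox: Animal ⊓ Plant ⊑ ⊥
ABOX = {("Cat", "tom"), ("Animal", "rex"), ("Plant", "fern")}  # assertions C(a)


def superclasses(c: str) -> Set[str]:
    """Reflexive-transitive closure of the atomic subsumptions starting from c."""
    result, frontier = {c}, [c]
    while frontier:
        x = frontier.pop()
        for sub, sup in SUBSUMPTIONS:
            if sub == x and sup not in result:
                result.add(sup)
                frontier.append(sup)
    return result


def instance_check(c: str, a: str) -> Optional[bool]:
    """True if K |= C(a), False if K |= ¬C(a), None if neither is entailed."""
    types: Set[str] = set()
    for concept, ind in ABOX:
        if ind == a:
            types |= superclasses(concept)
    if c in types:
        return True
    if any(frozenset({s, t}) in DISJOINT for s in superclasses(c) for t in types):
        return False
    return None


print(instance_check("Animal", "tom"))  # True
print(instance_check("Plant", "tom"))   # False
print(instance_check("Cat", "rex"))     # None (unknown under the OWA)
```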

The approach
The problem
Given:
a knowledge base K = (T, A)
a target concept C,
the sets of positive and negative examples for C:
Ps = {a ∈ Ind(A) | K |= C(a)} and Ns = {a ∈ Ind(A) | K |= ¬C(a)}
Obtain:
a concept description D for C (C D), such that:
K |= D(a) ∀a ∈ Ps
K |= ¬D(a) ∀a ∈ Ns
The intensional definition should be general enough to predict the
membership of future instances.
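The learning problem can be sketched in Python as follows, assuming a hypothetical three-valued oracle `check(concept, individual)` that stands in for a reasoner (True, False, or None when neither C(a) nor ¬C(a) is entailed). Individuals with undecided membership end up in neither Ps nor Ns; a candidate D solves the problem when D(a) is entailed for every positive and ¬D(a) for every negative.

```python
from typing import Callable, Iterable, Optional, Set, Tuple

# Hypothetical oracle: (concept, individual) -> True (entailed), False (negation entailed), None.
Check = Callable[[str, str], Optional[bool]]


def examples(c: str, individuals: Iterable[str], check: Check) -> Tuple[Set[str], Set[str]]:
    """Ps = {a | K |= C(a)} and Ns = {a | K |= ¬C(a)}; undecided individuals are skipped."""
    ps: Set[str] = set()
    ns: Set[str] = set()
    for a in individuals:
        result = check(c, a)
        if result is True:
            ps.add(a)
        elif result is False:
            ns.add(a)
    return ps, ns


def solves(d: str, ps: Set[str], ns: Set[str], check: Check) -> bool:
    """K |= D(a) for every a in Ps and K |= ¬D(a) for every a in Ns."""
    return all(check(d, a) is True for a in ps) and all(check(d, a) is False for a in ns)


FACTS = {("C", "a"): True, ("C", "b"): False, ("D", "a"): True, ("D", "b"): False}


def toy_check(concept: str, ind: str) -> Optional[bool]:
    return FACTS.get((concept, ind))


ps, ns = examples("C", ["a", "b", "u"], toy_check)
print(sorted(ps), sorted(ns), solves("D", ps, ns, toy_check))  # ['a'] ['b'] True
```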

The approach
DST-Terminological Decision Trees
Let K = (T, A) be a knowledge base; a DST-Terminological Decision Tree (DST-TDT) is a
binary tree where:
each node contains a conjunctive concept description D and a Basic
Belief Assignment (BBA) m obtained by counting the positive,
negative and uncertain-membership instances;
each departing edge is the result of a class-membership test w.r.t. D,
i.e., given an individual a, K |= D(a)?
if the node with concept description E is the parent of the node
with concept description D, then D is obtained from E by the same
refinement operator used for Terminological Decision Trees
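A possible Python representation of a DST-TDT node is sketched below. The mapping from counts to masses (relative frequencies: positives to {D}, negatives to {¬D}, uncertain-membership instances to Ω) is our assumption about how "counting" yields the BBA, as is the vacuous BBA used when no instance reaches a node; all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional

OMEGA: FrozenSet[str] = frozenset({"D", "notD"})  # Ω = {D, ¬D}; "notD" stands for ¬D
BBA = Dict[FrozenSet[str], float]


def bba_from_counts(n_pos: int, n_neg: int, n_unc: int) -> BBA:
    """Relative frequencies (assumption): positives -> {D}, negatives -> {¬D}, uncertain -> Ω."""
    total = n_pos + n_neg + n_unc
    if total == 0:
        return {OMEGA: 1.0}  # no evidence at all: vacuous BBA (assumption)
    return {
        frozenset({"D"}): n_pos / total,
        frozenset({"notD"}): n_neg / total,
        OMEGA: n_unc / total,
    }


@dataclass
class DSTNode:
    concept: str                       # conjunctive concept description D
    m: BBA                             # BBA estimated from the instances reaching the node
    pos: Optional["DSTNode"] = None    # subtree for K |= D(a)
    neg: Optional["DSTNode"] = None    # subtree for K |= ¬D(a)


node = DSTNode("Person", bba_from_counts(n_pos=2, n_neg=1, n_unc=1))
print(node.m[frozenset({"D"})], node.m[OMEGA])  # 0.5 0.25
```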

The approach
DST- Terminological Decision Tree
Training
Starting from the root, the method refines the concept description
installed in the current node
Various pairs (D, m) are generated (m is defined over the frame of
discernment Ω = {D, ¬D} by counting the positive, negative and
uncertain-membership instances that reached the node)
Best concept: the one having the most definite membership (i.e. the
smallest number of uncertain-membership instances)
$\text{non-specificity}(D) = \sum_{A \subseteq \Omega_D} m(A)\,\log |A|$
Split the instances according to the results of the instance-check test
Stop conditions:
the node is pure w.r.t. the membership
the non-specificity measure goes beyond a threshold ν
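The selection and stopping steps can be sketched in Python as follows. Assumptions, all ours: the logarithm is taken in base 2 (the usual choice for non-specificity, which with |Ω| = 2 makes the measure equal to m(Ω)), "pure" is read as all instances reaching the node sharing a single membership (positive, negative or uncertain), BBAs are dictionaries from focal sets to masses, and the function names are hypothetical.

```python
import math
from typing import Dict, FrozenSet, List, Tuple

BBA = Dict[FrozenSet[str], float]


def non_specificity(m: BBA) -> float:
    """Sum over the non-empty focal sets A of m(A) * log2|A| (singletons add 0)."""
    return sum(v * math.log2(len(A)) for A, v in m.items() if A)


def best_candidate(candidates: List[Tuple[str, BBA]]) -> Tuple[str, BBA]:
    """Pick the pair (D, m) whose BBA is least non-specific, i.e. most definite."""
    return min(candidates, key=lambda pair: non_specificity(pair[1]))


def should_stop(m: BBA, n_pos: int, n_neg: int, n_unc: int, nu: float) -> bool:
    """Stop when the node is pure or its non-specificity goes beyond the threshold nu."""
    pure = sum(1 for n in (n_pos, n_neg, n_unc) if n > 0) <= 1
    return pure or non_specificity(m) > nu


OMEGA = frozenset({"D", "notD"})
m1 = {frozenset({"D"}): 0.5, frozenset({"notD"}): 0.25, OMEGA: 0.25}
m2 = {frozenset({"D"}): 0.75, frozenset({"notD"}): 0.25}
print(non_specificity(m1), best_candidate([("D1", m1), ("D2", m2)])[0])  # 0.25 D2
print(should_stop(m1, n_pos=2, n_neg=1, n_unc=1, nu=0.5))              # False
```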

The approach
DST - Terminological Decision Tree
How to use DST-TDTs
The membership of a new individual a is predicted by exploring
the tree according to the instance-check tests K |= D(a) and
K |= ¬D(a), where D is the concept description installed in the current node
if neither test returns a positive answer
(due to the OWA), both branches are followed
all the BBAs associated with the reached leaves are pooled according to a
DST combination rule
the predicted class is the one that maximizes the confirmation function
$\forall A \subseteq \Omega:\ \mathrm{Conf}(A) = \mathrm{Bel}(A) + \mathrm{Pl}(A) - 1$
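The prediction procedure can be sketched in Python as below, pooling with the Dubois-Prade rule used in the evaluation (the conflicting mass m1(B)·m2(C) with B ∩ C = ∅ is moved to B ∪ C instead of being renormalized). Assumptions, all ours: every leaf BBA is expressed on a common frame {+, −} of membership in the target concept (otherwise the BBAs could not be pooled), the tree is a full binary tree, the oracle `check` is hypothetical, and since Conf(Ω) = 1 for any normalized BBA only the two singletons are compared, with a tie reported as 0 (uncertain membership).

```python
from dataclasses import dataclass
from functools import reduce
from itertools import product
from typing import Callable, Dict, FrozenSet, List, Optional

BBA = Dict[FrozenSet[str], float]
Check = Callable[[str, str], Optional[bool]]
POS, NEG = frozenset({"+"}), frozenset({"-"})
OMEGA = POS | NEG  # common frame: member / non-member of the target concept


@dataclass
class DSTNode:
    concept: str
    m: BBA
    pos: Optional["DSTNode"] = None
    neg: Optional["DSTNode"] = None


def reached_leaves(node: DSTNode, a: str, check: Check) -> List[BBA]:
    """Follow the decided branch, or both branches when the test is undecided (OWA)."""
    if node.pos is None or node.neg is None:  # leaf (full binary tree assumed)
        return [node.m]
    result = check(node.concept, a)
    if result is True:
        branches = [node.pos]
    elif result is False:
        branches = [node.neg]
    else:
        branches = [node.pos, node.neg]
    return [m for b in branches for m in reached_leaves(b, a, check)]


def dubois_prade(m1: BBA, m2: BBA) -> BBA:
    """Product mass goes to B ∩ C, or to B ∪ C when the intersection is empty."""
    out: BBA = {}
    for (b, v1), (c, v2) in product(m1.items(), m2.items()):
        target = (b & c) or (b | c)
        out[target] = out.get(target, 0.0) + v1 * v2
    return out


def conf(m: BBA, A: FrozenSet[str]) -> float:
    """Conf(A) = Bel(A) + Pl(A) - 1."""
    bel = sum(v for B, v in m.items() if B <= A)
    pl = sum(v for B, v in m.items() if B & A)
    return bel + pl - 1


def predict(root: DSTNode, a: str, check: Check) -> int:
    pooled = reduce(dubois_prade, reached_leaves(root, a, check))
    c_pos, c_neg = conf(pooled, POS), conf(pooled, NEG)
    if c_pos != c_neg:
        return +1 if c_pos > c_neg else -1
    return 0  # tie: membership left undetermined (assumption)


left = DSTNode("", {POS: 0.5, NEG: 0.25, OMEGA: 0.25})
right = DSTNode("", {POS: 0.25, NEG: 0.5, OMEGA: 0.25})
root = DSTNode("Person", {OMEGA: 1.0}, pos=left, neg=right)
print(predict(root, "alice", lambda d, a: True))  # 1
print(predict(root, "bob", lambda d, a: None))    # 0 (both leaves pooled)
```

In the undecided case of this toy run, the pooled mass on Ω grows from 0.25 in each leaf to 0.375, a small instance of how Dubois-Prade pooling can shift mass towards ignorance.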

Evaluation
Evaluation
Setting
30 randomly generated queries
0.632 bootstrap
growth control threshold ν = 0.5
pooling by means of the Dubois-Prade combination rule
Using a reasoner to decide the ground truth:
match: rate of the test cases (individuals) for which the inductive
model and a reasoner predict the same membership (i.e. +1 | +1,
−1 | −1, 0 | 0);
commission: rate of the cases for which predictions are opposite (i.e.
+1 | −1, −1 | +1);
omission: rate of test cases for which the inductive method cannot
determine a definite membership (−1 or +1) while the reasoner can;
induction: rate of cases where the inductive method can predict a
membership while it is not logically derivable.
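The four rates can be computed from (inductive prediction, reasoner answer) pairs with the Python sketch below; values are +1, −1 or 0 (undetermined), and the function names are hypothetical. Each pair falls into exactly one category, so the four percentages sum to 100.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

CATEGORIES = ("match", "commission", "omission", "induction")


def outcome(pred: int, truth: int) -> str:
    """Categorize one (inductive prediction, reasoner answer) pair, values in {+1, -1, 0}."""
    if pred == truth:
        return "match"         # +1|+1, -1|-1, 0|0
    if pred != 0 and truth != 0:
        return "commission"    # opposite definite answers
    if pred == 0:
        return "omission"      # the reasoner decides, the inductive model does not
    return "induction"         # the model decides, membership is not derivable


def rates(pairs: Iterable[Tuple[int, int]]) -> Dict[str, float]:
    """Percentage of test cases in each category."""
    counts = Counter(outcome(p, t) for p, t in pairs)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no test cases")
    return {k: 100.0 * counts[k] / n for k in CATEGORIES}


print(rates([(1, 1), (-1, 1), (0, -1), (1, 0)]))
# {'match': 25.0, 'commission': 25.0, 'omission': 25.0, 'induction': 25.0}
```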

Evaluation
Evaluation
Results
Comparison between the original terminological decision trees (DLTree) and
DST-TDTs, both with the growth-control threshold (DSTG) and without it (DSTTree)
Rates (%), mean ± standard deviation
Ontology  Index       DLTree        DSTTree       DSTG
FSM       match       95.34±04.94   93.22±07.33   86.16±10.48
          commission  01.81±02.18   01.67±03.05   02.07±03.19
          omission    00.74±02.15   02.57±04.09   04.98±05.99
          induction   02.11±04.42   02.54±01.89   01.16±01.26
Leo       match       95.53±10.07   97.07±04.55   94.61±06.75
          commission  00.48±00.57   00.41±00.86   00.41±01.00
          omission    03.42±09.84   01.94±04.38   00.58±00.51
          induction   00.57±03.13   00.58±00.51   00.00±00.00
LUBM      match       20.78±00.11   79.23±00.11   79.22±00.12
          commission  00.00±00.00   00.00±00.00   00.00±00.00
          omission    00.00±00.00   20.77±00.11   20.78±00.12
          induction   79.22±00.11   00.00±00.00   00.00±00.00
BioPax    match       96.87±07.35   85.76±21.60   82.15±21.10
          commission  01.63±06.44   11.81±19.96   12.32±19.90
          omission    00.30±00.98   01.54±03.02   04.88±03.03
          induction   01.21±00.56   00.89±00.53   00.26±00.27
NTN       match       27.02±01.91   18.97±19.01   87.63±00.19
          commission  00.00±00.00   00.39±01.08   00.00±00.00
          omission    00.22±00.26   02.09±03.00   12.37±00.19
          induction   72.77±01.51   78.54±17.34   00.00±00.00

Evaluation
Discussion
The preliminary evaluation showed that the method did not outperform the
original version
Large number of omission cases: a more conservative behavior
probably due to the combination rule, which returned pooled BBAs
assigning greater mass to ignorance
the threshold employed to control the growth reduced the number of induction
cases
Results are not stable yet
large standard deviation

Conclusions
Conclusion & Extensions
An extension of TDT based on Dempster-Shafer Theory has been
proposed and a preliminary evaluation has been made.
The results are not yet satisfactory
Extensions:
Tackle the complexity of the model
Pruning algorithms
Consider other measures for the selection of the best candidates
Further datasets should be considered
Linked Data datasets

Conclusions
Thank you!
Questions?
