Approximating Numeric Role Fillers via Predictive Clustering Trees for Knowledge Base Enrichment in the Web of Data
1. Approximating Numeric Role Fillers via Predictive Clustering Trees for Knowledge Base Enrichment in the Web of Data
Giuseppe Rizzo, Claudia d’Amato, Nicola Fanizzi, Floriana Esposito
Discovery Science 2016, Bari, 19th October 2016
G.Rizzo et al. (Univ. of Bari) 19th October 2016 1 / 16
2. Outline
1 The Context and Motivations
2 Basics
3 The approach
4 Empirical Evaluation
5 Conclusion & Further Extensions
3. The Context and Motivations
• Goal: determine the values of numeric properties (used as attributes) of a resource in a Web of Data knowledge base
• Web of Data: many knowledge bases exposed in standard formats (RDF, OWL)
• Two resources, or a resource and a literal (e.g. a string or a numeric value), are linked through properties
• Inference services may fail to determine the value due to the Open World Assumption
• Solution: cast the task as a multi-target regression problem
• Predictive Clustering Trees (PCTs) for Web of Data representations (e.g. DLs)
• to predict the most plausible values
• to elicit rules (e.g. SWRL rules) for enriching the schema of a knowledge base
4. Description Logics
Syntax & Semantics
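• Atomic concepts (classes), $N_C$
• Roles (binary relations), $N_R$
• Concrete domains: string, boolean, numeric values
• Operators to build complex concept descriptions
• Semantics defined through interpretations $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$
• $\Delta^{\mathcal{I}}$: domain of the interpretation
• $\cdot^{\mathcal{I}}$: interpretation function
• for each concept $C \in N_C$, $C^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}$
• for each role $R \in N_R$, $R^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$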
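To make the set-theoretic semantics concrete, here is a minimal sketch (not part of the presented work) of a finite interpretation: the domain and the extensions of concept and role names are plain Python sets, and a few constructors (complement, conjunction, existential and universal restriction) are evaluated as set operations. The tuple encoding of concepts and the names Comedy, Actor and starring are hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical tuple encoding of concept descriptions:
#   "A"               atomic concept name A
#   ("not", C)        complement of C
#   ("and", C, D)     conjunction of C and D
#   ("some", R, C)    existential restriction, some R-successor in C
#   ("all", R, C)     universal restriction, all R-successors in C
Concept = Union[str, tuple]


class Interpretation:
    """A finite interpretation I = (Δ^I, ·^I) with explicit extensions."""

    def __init__(self, domain: Set[str],
                 concepts: Dict[str, Set[str]],
                 roles: Dict[str, Set[Tuple[str, str]]]):
        # Enforce C^I ⊆ Δ^I and R^I ⊆ Δ^I × Δ^I for the given names
        if not all(ext <= domain for ext in concepts.values()):
            raise ValueError("a concept extension leaves the domain")
        if not all(a in domain and b in domain
                   for ext in roles.values() for a, b in ext):
            raise ValueError("a role extension leaves Δ^I × Δ^I")
        self.domain, self.concepts, self.roles = domain, concepts, roles

    def ext(self, c: Concept) -> Set[str]:
        """Return the extension c^I of a concept description c."""
        if isinstance(c, str):
            return set(self.concepts.get(c, set()))
        op = c[0]
        if op == "not":
            return self.domain - self.ext(c[1])
        if op == "and":
            return self.ext(c[1]) & self.ext(c[2])
        if op in ("some", "all"):
            pairs = self.roles.get(c[1], set())
            filler = self.ext(c[2])
            if op == "some":  # x has at least one R-successor in C^I
                return {x for x in self.domain
                        if any(a == x and b in filler for a, b in pairs)}
            # "all": x has no R-successor outside C^I (vacuously true if none)
            return {x for x in self.domain
                    if all(b in filler for a, b in pairs if a == x)}
        raise ValueError(f"unknown constructor {op!r}")


interp = Interpretation(
    domain={"m1", "m2", "p1"},
    concepts={"Comedy": {"m1", "m2"}, "Actor": {"p1"}},
    roles={"starring": {("m1", "p1")}},
)
print(sorted(interp.ext(("and", "Comedy", ("some", "starring", "Actor")))))  # ['m1']
```

Note that the universal restriction also holds for individuals with no successors at all, which is why m2 and p1 belong to the extension of "all starring are Actors".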
5. Description Logics
Knowledge bases
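• Knowledge base: a pair $\mathcal{K} = (\mathcal{T}, \mathcal{A})$ where
• $\mathcal{T}$ (TBox): axioms concerning concepts/roles
• Subsumption axioms $C \sqsubseteq D$: hold iff $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$ for every interpretation $\mathcal{I}$
• Equivalence axioms $C \equiv D$: hold iff $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$ and $D^{\mathcal{I}} \subseteq C^{\mathcal{I}}$ for every interpretation $\mathcal{I}$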
• $\mathcal{A}$ (ABox): assertions about individuals; the set of individuals is denoted by $Ind(\mathcal{A})$
• class assertions, $C(a)$
• role assertions, $R(a, b)$ ($b$ is called a role filler)
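• Reasoning services:
• subsumption: checking whether a concept is more general than a given one
• satisfiability: given a concept description $C$, checking whether there is an interpretation $\mathcal{I}$ (model of $\mathcal{K}$) such that $C^{\mathcal{I}} \neq \emptyset$
• instance checking: checking whether $\mathcal{I} \models C(a)$ for every interpretation $\mathcal{I}$ (model of $\mathcal{K}$), i.e. $\mathcal{K} \models C(a)$ ($a$ is an instance of $C$)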
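Reasoning services quantify over all models, which a DL reasoner handles symbolically rather than by enumeration. As a hedged illustration only, the sketch below checks whether one finite interpretation (each individual name interpreted as itself, atomic concepts only) is a model of a tiny knowledge base; the concept names and data are hypothetical. Exhibiting two models that disagree on an assertion shows why instance checking can give no answer under the Open World Assumption.

```python
from typing import Dict, List, Set, Tuple


def is_model(concepts: Dict[str, Set[str]],
             roles: Dict[str, Set[Tuple[str, str]]],
             subsumptions: List[Tuple[str, str]],
             class_assertions: List[Tuple[str, str]],
             role_assertions: List[Tuple[str, str, str]]) -> bool:
    """Check whether a finite interpretation satisfies every axiom and assertion.

    Assumption: atomic concepts only; an equivalence C ≡ D is passed as the two
    subsumptions (C, D) and (D, C).
    """
    def ext(c: str) -> Set[str]:
        return concepts.get(c, set())

    tbox_ok = all(ext(c) <= ext(d) for c, d in subsumptions)              # C^I ⊆ D^I
    class_ok = all(a in ext(c) for c, a in class_assertions)              # a ∈ C^I
    role_ok = all((a, b) in roles.get(r, set()) for r, a, b in role_assertions)
    return tbox_ok and class_ok and role_ok


tbox = [("AmericanFootballPlayer", "Athlete"), ("Athlete", "Person")]
abox = [("AmericanFootballPlayer", "a1"), ("Athlete", "a2")]

# Model 1: a2 is not a player
m1 = {"AmericanFootballPlayer": {"a1"}, "Athlete": {"a1", "a2"}, "Person": {"a1", "a2"}}
# Model 2: a2 is also a player
m2 = dict(m1, AmericanFootballPlayer={"a1", "a2"})

print(is_model(m1, {}, tbox, abox, []))  # True
print(is_model(m2, {}, tbox, abox, []))  # True
# Both are models, so neither AmericanFootballPlayer(a2) nor its negation is entailed.
```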
6. Semantic Web Rules Language (SWRL)
• Datalog-like representation language
• Adds expressiveness to DLs
• Syntax:
• term: a (universally quantified) variable $x$ or a constant $c$
• atom: unary or binary predicate $C(t_1)$ and $R(t_1, t_2)$ (predicate symbols are concept and role names), where the $t_i$ are terms
• Rule: implication between an antecedent (body) and a consequent (head)
$B_1 \wedge \dots \wedge B_n \rightarrow H_1 \wedge \dots \wedge H_m$
We are interested in safe rules (each variable in the head must also occur in the body)
• Open-World Assumption holds
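The safety condition is purely syntactic, so it can be checked without a reasoner. A minimal sketch, assuming a hypothetical representation of atoms as (predicate, terms) tuples in which variables are strings starting with "?" (as in the SWRL human-readable syntax) and every other term is a constant:

```python
from typing import List, Set, Tuple

# Hypothetical atom encoding: (predicate name, tuple of terms)
Atom = Tuple[str, Tuple[object, ...]]


def variables(atoms: List[Atom]) -> Set[str]:
    """Collect the variables (strings starting with '?') occurring in the atoms."""
    return {t for _, terms in atoms for t in terms
            if isinstance(t, str) and t.startswith("?")}


def is_safe(body: List[Atom], head: List[Atom]) -> bool:
    """A rule is safe if every variable in the head also occurs in the body."""
    return variables(head) <= variables(body)


body = [("Person", ("?x",)), ("Athlete", ("?x",)), ("AmericanFootballPlayer", ("?x",))]
print(is_safe(body, [("height", ("?x", 195.4))]))  # True
print(is_safe(body, [("height", ("?y", 195.4))]))  # False: ?y is never bound in the body
```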
7. The model for multi-target regression
• PCT for multi-target regression: a binary tree where
• inner nodes: DL conjunctive concept descriptions
• leaf nodes: vectors with the approximated target property values
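[Figure: example PCT whose inner nodes test concepts such as Comedy, Comedy ⊓ ∃starring.Actor, ¬Comedy and ¬Horror, and whose leaves hold the target vectors p = (8.45, 9810666), p = (5.38, 4200000), p = (4.7, 4200000) and p = (8.6, 4930000)]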
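The binary tree described above can be encoded with two node types. This is a minimal sketch, not the authors' implementation, assuming concept descriptions are kept as plain strings (a real system would use concept objects from an OWL library): inner nodes store the test concept D and the subtrees for D and ¬D, and leaves store the vector of approximated target values. The helper `depth` counts levels (the quantity bounded by the maximum-depth setting), and `n_targets` checks that every leaf predicts the same number of target properties. The toy tree and its values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Tuple, Union


@dataclass
class Leaf:
    prediction: Tuple[float, ...]  # one approximated value per target property


@dataclass
class Inner:
    concept: str  # DL conjunctive concept description D (a string here, by assumption)
    left: Tree    # subtree for the instances of D
    right: Tree   # subtree for the instances of ¬D


Tree = Union[Leaf, Inner]


def depth(t: Tree) -> int:
    """Number of inner-node levels on the longest root-to-leaf path."""
    if isinstance(t, Leaf):
        return 0
    return 1 + max(depth(t.left), depth(t.right))


def n_targets(t: Tree) -> int:
    """Return the length of the leaf vectors, checking that all leaves agree."""
    if isinstance(t, Leaf):
        return len(t.prediction)
    k_left, k_right = n_targets(t.left), n_targets(t.right)
    if k_left != k_right:
        raise ValueError("leaves must predict the same number of targets")
    return k_left


# Toy tree with made-up values
toy = Inner("Comedy", Leaf((1.0, 10.0)),
            Inner("Horror", Leaf((2.0, 20.0)), Leaf((3.0, 30.0))))
print(depth(toy), n_targets(toy))  # 2 2
```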
8. Learning PCTs in DLs
• Divide-and-conquer approach
• Training set: individuals whose target property values are known
• partitioning according to membership w.r.t. a new concept
• Refinement operator for generating the candidate concepts
• by introducing a new concept name (or its complement)
• by replacing a sub-description with an existential restriction
• by replacing a sub-description with a universal restriction
• Best concept: the one minimizing the RMSE of the standardized values of the target properties
• Stop conditions: maximum number of levels reached, or training (sub)set too small
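The divide-and-conquer loop above can be sketched as follows. This is a simplified, hypothetical rendering: individuals are reduced to the set of concept names they are known to belong to, candidates come from a fixed list instead of a DL refinement operator, membership is a set lookup instead of a reasoner call, and individuals not known to belong to D go to the ¬D side (a closed-world simplification). The split score is the RMSE of the targets, z-standardized once on the whole training set, when each branch predicts its own mean vector; the original implementation may differ in these details.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union

Vector = Tuple[float, ...]


@dataclass
class Leaf:
    prediction: Vector


@dataclass
class Inner:
    concept: str
    left: "Union[Leaf, Inner]"
    right: "Union[Leaf, Inner]"


def mean_vector(ys: Sequence[Vector]) -> Vector:
    """Component-wise mean (the prototype of a set of target vectors)."""
    return tuple(sum(col) / len(ys) for col in zip(*ys))


def standardize(ys: Sequence[Vector]) -> List[Vector]:
    """z-score each target so that properties on different scales weigh equally."""
    mu = mean_vector(ys)
    sd = [math.sqrt(sum((y[j] - mu[j]) ** 2 for y in ys) / len(ys)) or 1.0
          for j in range(len(mu))]
    return [tuple((y[j] - mu[j]) / sd[j] for j in range(len(mu))) for y in ys]


def split_rmse(left: Sequence[Vector], right: Sequence[Vector]) -> float:
    """RMSE of the standardized targets when each branch predicts its own mean."""
    sq, n = 0.0, 0
    for part in (left, right):
        m = mean_vector(part)
        sq += sum((a - b) ** 2 for z in part for a, b in zip(z, m))
        n += sum(len(z) for z in part)
    return math.sqrt(sq / n)


def grow(data, candidates, max_depth, min_size, depth=0):
    """data: list of (known concept names, raw targets, standardized targets)."""
    raw = [y for _, y, _ in data]
    if depth >= max_depth or len(data) < min_size:  # stop conditions
        return Leaf(mean_vector(raw))
    best = None
    for c in candidates:
        left = [d for d in data if c in d[0]]       # stand-in for K ⊨ C(a)
        right = [d for d in data if c not in d[0]]  # closed-world simplification
        if not left or not right:
            continue  # c does not split the examples
        score = split_rmse([d[2] for d in left], [d[2] for d in right])
        if best is None or score < best[0]:
            best = (score, c, left, right)
    if best is None:
        return Leaf(mean_vector(raw))
    _, c, left, right = best
    return Inner(c, grow(left, candidates, max_depth, min_size, depth + 1),
                 grow(right, candidates, max_depth, min_size, depth + 1))


def fit(examples, candidates, max_depth=10, min_size=2):
    """examples: list of (set of concept names, target vector)."""
    zs = standardize([y for _, y in examples])
    data = [(m, y, z) for (m, y), z in zip(examples, zs)]
    return grow(data, candidates, max_depth, min_size)


examples = [  # toy data, not taken from the experiments
    ({"Athlete", "AmericanFootballPlayer"}, (190.0, 110.0)),
    ({"Athlete", "AmericanFootballPlayer"}, (194.0, 120.0)),
    ({"Athlete"}, (180.0, 80.0)),
    ({"Athlete"}, (186.0, 90.0)),
]
tree = fit(examples, ["Athlete", "AmericanFootballPlayer"])
print(tree.concept, tree.left.prediction, tree.right.prediction)
# AmericanFootballPlayer (192.0, 115.0) (183.0, 85.0)
```

Standardizing before scoring keeps a property with large values (e.g. a population count) from dominating the choice of the split.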
9. Exploiting PCTs
• Prediction: given an individual $a$, the algorithm traverses the tree according to instance checks w.r.t. the inner concepts $D$
• if $\mathcal{K} \models D(a)$, the left branch is followed
• if $\mathcal{K} \models \neg D(a)$, the right branch is followed
• otherwise, a default model is returned
• Eliciting SWRL rules: recursively traversing the tree and collecting the intermediate concepts along each branch
• Body: the intermediate concept descriptions as predicate names
• Head: each target property name as the predicate name
• the approximated value as a term
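Both uses of a learned tree can be sketched on top of a three-valued instance check. In this hypothetical sketch, `check` returns True when $\mathcal{K} \models D(a)$, False when $\mathcal{K} \models \neg D(a)$, and None when neither is entailed; it is faked with sets of known positive and negative memberships, standing in for a DL reasoner. The default model is assumed to be a vector supplied by the caller (e.g. the training mean), since the slide does not specify it. Rules are emitted as SWRL-like strings, one per leaf and target property, with atomic concept names only.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Set, Tuple, Union

Vector = Tuple[float, ...]


@dataclass
class Leaf:
    prediction: Vector


@dataclass
class Inner:
    concept: str                # test concept D (atomic name, by assumption)
    left: "Union[Leaf, Inner]"  # followed when K ⊨ D(a)
    right: "Union[Leaf, Inner]" # followed when K ⊨ ¬D(a)


def check(pos: Set[str], neg: Set[str], concept: str) -> Optional[bool]:
    """Hypothetical three-valued stand-in for instance checking under the OWA."""
    if concept in pos:
        return True
    if concept in neg:
        return False
    return None  # neither D(a) nor ¬D(a) is entailed


def predict(tree, pos: Set[str], neg: Set[str], default: Vector) -> Vector:
    while isinstance(tree, Inner):
        answer = check(pos, neg, tree.concept)
        if answer is None:
            return default  # unknown membership: fall back to the default model
        tree = tree.left if answer else tree.right
    return tree.prediction


def rules(tree, targets: Sequence[str], body: Tuple[str, ...] = ()) -> List[str]:
    """One rule per (leaf, target): the body collects the tests along the branch."""
    if isinstance(tree, Leaf):
        antecedent = " ∧ ".join(f"{c}(?x)" for c in body)
        return [f"{antecedent} → {t}(?x, {v})" for t, v in zip(targets, tree.prediction)]
    return (rules(tree.left, targets, body + (tree.concept,))
            + rules(tree.right, targets, body + ("¬" + tree.concept,)))


tree = Inner("AmericanFootballPlayer", Leaf((192.0, 115.0)), Leaf((183.0, 85.0)))  # toy tree
print(predict(tree, {"AmericanFootballPlayer"}, set(), default=(187.5, 100.0)))  # (192.0, 115.0)
print(predict(tree, set(), set(), default=(187.5, 100.0)))  # (187.5, 100.0)
for r in rules(tree, ["height", "weight"]):
    print(r)
# AmericanFootballPlayer(?x) → height(?x, 192.0)
# AmericanFootballPlayer(?x) → weight(?x, 115.0)
# ¬AmericanFootballPlayer(?x) → height(?x, 183.0)
# ¬AmericanFootballPlayer(?x) → weight(?x, 85.0)
```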
10. Experiments
Small ontologies: Settings
• Maximum depth for PCTs: 10, 15, 20
• Comparison w.r.t. terminological regression trees (TRTs), a multi-target k-NN regressor (with $k = \sqrt{|Tr|}$) and a multi-target linear regression model (LR)
• atomic concepts as the feature set for the k-NN regressor and the multi-target linear regression model
• 0.632 bootstrap
• performance in terms of RMSE
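The 0.632 bootstrap mixes the apparent error (train and test on all the data) with the out-of-bag error: $Err_{.632} = 0.368 \cdot Err_{apparent} + 0.632 \cdot Err_{oob}$. The sketch below applies a simple per-replication variant of this estimator to a multi-target k-NN regressor with $k = \sqrt{|Tr|}$ over atomic-concept features, as in the baseline. Rounding $k$ to the nearest integer, the Hamming distance on concept sets, and the number of replications are hypothetical choices, not the settings used in the experiments.

```python
import math
import random
from typing import List, Sequence, Set, Tuple

Vector = Tuple[float, ...]
Example = Tuple[Set[str], Vector]  # (atomic concepts of the individual, target values)


def knn_predict(train: Sequence[Example], feats: Set[str], k: int) -> Vector:
    """Mean target vector of the k nearest individuals (Hamming distance, an assumption)."""
    nearest = sorted(train, key=lambda ex: len(ex[0] ^ feats))[:k]
    return tuple(sum(col) / len(nearest) for col in zip(*(y for _, y in nearest)))


def knn_rmse(train: Sequence[Example], test: Sequence[Example]) -> float:
    """RMSE over all targets, with k = sqrt(|Tr|) rounded (rounding is an assumption)."""
    k = max(1, round(math.sqrt(len(train))))
    sq, n = 0.0, 0
    for feats, y in test:
        p = knn_predict(train, feats, k)
        sq += sum((a - b) ** 2 for a, b in zip(y, p))
        n += len(y)
    return math.sqrt(sq / n)


def bootstrap_632(data: List[Example], replications: int = 30, seed: int = 0) -> float:
    """0.368 * apparent error + 0.632 * mean out-of-bag error over the replications."""
    rng = random.Random(seed)
    apparent = knn_rmse(data, data)  # trained and tested on all the data
    oob_errors = []
    for _ in range(replications):
        drawn = [rng.randrange(len(data)) for _ in data]  # sample with replacement
        chosen = set(drawn)
        oob = [data[i] for i in range(len(data)) if i not in chosen]
        if oob:  # skip the rare replications with no out-of-bag individual
            oob_errors.append(knn_rmse([data[i] for i in drawn], oob))
    return 0.368 * apparent + 0.632 * sum(oob_errors) / len(oob_errors)
```

Because each bootstrap sample leaves out roughly 36.8% of the individuals, the out-of-bag error is pessimistic and the apparent error optimistic; the weights balance the two.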
11. Experiments
Linked Data datasets: Settings
• Ontologies extracted from DBpedia via crawling
• Maximum depth for PCTs: 10, 15, 20
• Comparison w.r.t. TRTs, k-NN (with $k = \sqrt{|Tr|}$) and LR
• 10-fold cross-validation
• performance in terms of RRMSE (relative RMSE)
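RRMSE normalizes the RMSE of a model by that of the trivial predictor that always returns the training mean, so values below 1 mean the model beats the mean, and errors become comparable across properties on very different scales (e.g. populationTotal vs. elevation). The slides do not give the formula; this sketch assumes the usual definition $RRMSE = \sqrt{\sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2}$ for a single target, with $\bar{y}$ the mean on the training fold, plus a plain shuffled 10-fold split. Averaging the per-target values for several targets would be a further assumption.

```python
import math
import random
from typing import List, Sequence, Tuple


def rrmse(y_true: Sequence[float], y_pred: Sequence[float], train_mean: float) -> float:
    """RMSE of the model divided by the RMSE of always predicting the training mean."""
    num = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    den = sum((t - train_mean) ** 2 for t in y_true)
    if den == 0:
        return 0.0 if num == 0 else math.inf
    return math.sqrt(num / den)


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle 0..n-1 and return one (train, test) pair of index lists per fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [([j for f in folds[:i] + folds[i + 1:] for j in f], folds[i])
            for i in range(k)]


print(rrmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], train_mean=2.0))  # 0.0
print(rrmse([1.0, 2.0, 3.0], [2.0, 2.0, 2.0], train_mean=2.0))  # 1.0
splits = k_fold_indices(25, k=10)
print(len(splits), sorted(len(test) for _, test in splits))
# 10 [2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
```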
12. Experiments
Small ontologies: Outcomes
RMSE averaged over the replications (± standard deviation)
Ontology PCT TRT k-NN LR
BCO 0.0277 ± 0.01 0.0356 ± 0.01 0.0472 ± 0.01 0.0554 ± 0.01
BioPax 132 ± 11.0 145 ± 12.0 186 ± 7.00 195 ± 8.85
geopolitical 0.0284 ± 0.01 0.03561 ± 0.03 0.057 ± 0.03 0.06 ± 0.02
monetary 7.52 ± 0.15 8.46 ± 0.07 7.53 ± 0.17 7.78 ± 0.34
mutagenesis 0.0445 ± 0.07 0.0637 ± 0.03 0.0547 ± 0.02 0.0647 ± 0.05
13. Experiments
Linked Data datasets: Outcomes
Table: RRMSE averaged over the runs
Datasets PCT TRT k-NN LR
Fragm. #1 0.42 ± 0.05 0.63 ± 0.05 0.65 ± 0.02 0.73 ± 0.02
Fragm. #2 0.25 ± 0.001 0.43 ± 0.02 0.53 ± 0.00 0.43 ± 0.02
Fragm. #3 0.24 ± 0.05 0.36 ± 0.2 0.67 ± 0.10 0.73 ± 0.05
Table: Comparison in terms of elapsed times (secs); per-target times are reported for TRT only, and the TRT total is their sum
Dataset    Target           PCT     TRT      k-NN    LR
Fragm. #1  elevation        -       2454.3   -       -
           populationTotal  -       2353.0   -       -
           total            2432    4807.3   547.6   234.5
Fragm. #2  areaTotal        -       2256.0   -       -
           areaUrban        -       2345.0   -       -
           areaMetro        -       2345.2   -       -
           total            2456    6946.2   546.2   235.7
Fragm. #3  height           -       743.5    -       -
           weight           -       743.4    -       -
           total            743.3   1486.9   372.3   123.5
14. Discussion
• PCTs outperform TRTs
• the different heuristic allows choosing more promising concepts
• standardization mitigates the effect of abnormal values
• PCTs outperform k-NN
• k-NN suffers from the curse of dimensionality
• k-NN outperforms LR in most cases
• only the nearest individuals contribute to the local model, so spurious individuals are excluded
• PCTs are more efficient than TRTs (lower elapsed times)
15. Examples of discovered rules
According to the discovered rules, an American football player is taller and heavier than an athlete who does not play American football
Person(x) ∧ Athlete(x) ∧ AmericanFootballPlayer(x) →
height(x, 195.4)
Person(x) ∧ Athlete(x) ∧ AmericanFootballPlayer(x) →
weight(x, 113.5)
Person(x) ∧ Athlete(x) ∧ ¬AmericanFootballPlayer(x) →
height(x, 187)
Person(x) ∧ Athlete(x) ∧ ¬AmericanFootballPlayer(x) →
weight(x, 87.5)
16. Conclusion and Further Outlooks
• We proposed an extension of predictive clustering trees compliant with DL representation languages, for predicting numeric datatype property values and discovering rules
• The outcomes are promising: PCTs obtained the lowest average error on all the considered datasets, with lower elapsed times than TRTs
• Further extensions
• New refinement operators
• Further heuristics
• Linear models at leaf nodes