Inducing Predictive Clustering Trees for Datatype properties Values

Giuseppe Rizzo, Claudia d’Amato, Nicola Fanizzi, Floriana Esposito
Semantic Machine Learning, 10th July 2016
G. Rizzo et al. (Univ. of Bari), 10th July 2016

Outline
1 The Context and Motivations
2 Basics
3 The approach
4 Empirical Evaluation
5 Conclusion & Further Extensions

The Context and Motivations
• Goal: approximating the (numerical) datatype property values
through regression models in the Web of Data
• Web of data: a large number of knowledge bases, datasets and
vocabularies exposed in a standard format (RDF, OWL)
• (numerical) property values can hardly be derived by using
reasoning services
• Open World Assumption
• a large amount of missing information
• The information gap can be filled by using regression models

The Context and Motivations
• Solving a regression problem
• two or more property values may be related (e.g. crime rate and
population of a place)
• exploiting such correlations should improve predictive performance
• Predicting several numerical values at once (multi-target
regression) through Predictive Clustering approaches
• Predictive Clustering Trees (PCTs) as a generalization of decision
trees
• PCTs compliant with the representation languages for the Web of
Data (e.g. Description Logics)
• target values: the numeric role fillers for the properties

Description Logics
Syntax & Semantics
• Atomic concepts (classes), N_C, and roles (relations), N_R, to model
domains
• Operators to build complex concept descriptions
• Concrete domains: string, boolean, numeric values
• Semantics defined through interpretations I = (Δ^I, ·^I)
• Δ^I: domain of the interpretation
• ·^I: interpretation function
• for each concept C ∈ N_C, C^I ⊆ Δ^I
• for each role R ∈ N_R, R^I ⊆ Δ^I × Δ^I
ALC operators
Name                     Syntax  Semantics
Top concept              ⊤       Δ^I
Bottom concept           ⊥       ∅
Concept                  C       C^I ⊆ Δ^I
Full complement          ¬C      Δ^I \ C^I
Intersection             C ⊓ D   C^I ∩ D^I
Disjunction              C ⊔ D   C^I ∪ D^I
Universal restriction    ∀R.D    {x ∈ Δ^I | ∀y ∈ Δ^I ((x, y) ∈ R^I → y ∈ D^I)}
Existential restriction  ∃R.D    {x ∈ Δ^I | ∃y ∈ Δ^I ((x, y) ∈ R^I ∧ y ∈ D^I)}
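The set-based semantics above can be illustrated on a small finite interpretation. The following is a minimal sketch, not part of the original slides: the tuple encoding of concepts, the function name `extension`, and the toy domain are all assumptions made for illustration.

```python
def extension(concept, domain, concepts, roles):
    """Return the extension C^I of an ALC concept over a finite interpretation.

    Concepts are nested tuples (a hypothetical encoding):
    ("top",), ("bottom",), ("atom", name), ("not", C),
    ("and", C, D), ("or", C, D), ("all", role, D), ("some", role, D).
    """
    op = concept[0]
    if op == "top":
        return set(domain)
    if op == "bottom":
        return set()
    if op == "atom":
        return set(concepts.get(concept[1], set()))
    if op == "not":
        return set(domain) - extension(concept[1], domain, concepts, roles)
    if op in ("and", "or"):
        left = extension(concept[1], domain, concepts, roles)
        right = extension(concept[2], domain, concepts, roles)
        return left & right if op == "and" else left | right
    if op in ("all", "some"):
        pairs = roles.get(concept[1], set())
        filler = extension(concept[2], domain, concepts, roles)
        result = set()
        for x in domain:
            successors = {y for (s, y) in pairs if s == x}
            if op == "all" and successors <= filler:
                result.add(x)  # every R-successor is in D (vacuously true if none)
            if op == "some" and successors & filler:
                result.add(x)  # at least one R-successor is in D
        return result
    raise ValueError(f"unknown operator: {op}")


# Toy interpretation (illustrative only).
domain = {"film1", "film2", "alice"}
concepts = {"Comedy": {"film1"}, "Actor": {"alice"}}
roles = {"starring": {("film1", "alice")}}

some_actor = extension(("some", "starring", ("atom", "Actor")), domain, concepts, roles)
all_actor = extension(("all", "starring", ("atom", "Actor")), domain, concepts, roles)
not_comedy = extension(("not", ("atom", "Comedy")), domain, concepts, roles)
```

Note that the universal restriction also holds for elements with no `starring` successor at all, which follows directly from the implication in its definition.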

Description Logics
Knowledge bases
• Knowledge base: a pair K = (T, A) where
• T (TBox): axioms concerning concepts/roles
• Subsumption axioms C ⊑ D: iff for every interpretation I,
C^I ⊆ D^I holds
• Equivalence axioms C ≡ D: iff for every interpretation I,
C^I ⊆ D^I and D^I ⊆ C^I hold
• A (ABox): class assertions C(a) and role assertions R(a, b) about
a set of individuals, denoted by Ind(A)
• Reasoning services:
• subsumption: checking whether a concept is more general than a given one
• satisfiability: given a concept description C and an interpretation
I, C^I ≠ ∅
• instance checking: for every interpretation I, I ⊨ C(a) holds (a is an
instance of C)

The problem
Given:
• a knowledge base K = (T , A);
• the target functional roles R_i, 1 ≤ i ≤ t, ranging on the domains
D_i, whose analytic forms are unknown;
• a training set Tr ⊆ Ind(A) for which the numeric fillers are
known,
Tr = {a ∈ Ind(A) | R_i(a, v_i) ∈ A, v_i ∈ D_i, 1 ≤ i ≤ t}
Build a regression model for {R_i}, i = 1, ..., t, i.e. a function
h : Ind(A) → D_1 × ··· × D_t that minimizes a loss function over
Tr. A possible loss function may be based on the mean squared error.
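One possible instantiation of such a loss, written out as a sketch: the squared error averaged over the training individuals and the t targets. Here h_i(a) denotes the i-th component of h(a) and v_i(a) the known filler of R_i for a; both symbols are notation assumed for this illustration and do not appear in the slides.

```latex
\[
\mathcal{L}(h) \;=\; \frac{1}{|Tr|} \sum_{a \in Tr} \frac{1}{t} \sum_{i=1}^{t}
\bigl( h_i(a) - v_i(a) \bigr)^2
\]
```

Since the targets may live on very different scales, a loss of this form is usually computed on standardized values, which is consistent with the standardization step mentioned in the learning procedure.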

The proposed solution
• Predictive Clustering
• objects are clustered according to a homogeneity criterion
• for each cluster a predictive model is determined (e.g. vector
containing predictions)
[Figure: (a) clustering, (b) predictive models, (c) predictive clustering]

The model for multi-target regression
• Given a knowledge base K, a PCT for multi-target regression is a
binary tree where
• intermediate nodes: DL concept descriptions
• leaf nodes: vectors containing the predictions w.r.t. the target
properties
[Figure: example PCT with root concept "Comedy", inner nodes "Comedy starring.Actor" and "¬Comedy ¬Horror", and leaf prediction vectors p = (8.45, 9810666), p = (5.38, 4200000), p = (4.7, 4200000), p = (8.6, 4930000)]

Learning PCTs
• Divide-and-conquer strategy
• For the current node:
• the refinement operator generates the candidate concepts
• The most promising concept E* is selected by maximizing the
homogeneity w.r.t. the target variables simultaneously.
• Best concept: the one minimizing the RMSE of the standardized
target property values
• Stop conditions:
• maximum number of levels
• size of the training (sub)set
• Leaf: the i-th component contains the average value for the i-th
target property over the instances sorted to the node
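The selection step and the leaf model can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: membership in each candidate concept is given as a plain boolean mask (a real system would ask a reasoner), and the score of a split is taken to be the size-weighted RMSE of the standardized targets around each partition's mean. All function names are hypothetical.

```python
import math


def column_means(rows):
    """Mean of each target column."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def standardize(rows):
    """Column-wise z-scores; constant columns are mapped to 0."""
    means = column_means(rows)
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / len(rows))
            for col, m in zip(zip(*rows), means)]
    return [[(v - m) / s if s > 0 else 0.0
             for v, m, s in zip(row, means, stds)] for row in rows]


def rmse_around_mean(rows):
    """RMSE of all target values w.r.t. the per-target mean of `rows`."""
    if not rows:
        return 0.0
    means = column_means(rows)
    squared = sum((v - m) ** 2 for row in rows for v, m in zip(row, means))
    return math.sqrt(squared / (len(rows) * len(means)))


def split_score(rows, mask):
    """Size-weighted RMSE of the two partitions induced by a candidate concept."""
    inside = [r for r, m in zip(rows, mask) if m]
    outside = [r for r, m in zip(rows, mask) if not m]
    return (len(inside) * rmse_around_mean(inside)
            + len(outside) * rmse_around_mean(outside)) / len(rows)


def best_concept(targets, memberships):
    """Pick the candidate whose split minimizes the standardized RMSE."""
    z = standardize(targets)
    return min(memberships, key=lambda c: split_score(z, memberships[c]))


def leaf_prediction(rows):
    """Leaf model: the i-th component is the average of the i-th target."""
    return tuple(column_means(rows))


# Toy data (illustrative only): two targets per individual.
targets = [(8.0, 100.0), (8.2, 110.0), (4.0, 10.0), (4.2, 12.0)]
memberships = {
    "Comedy": [True, True, False, False],
    "Horror": [True, False, True, False],
}
chosen = best_concept(targets, memberships)
leaf = leaf_prediction(targets[:2])
```

Standardizing before scoring keeps a target with a large numeric range from dominating the choice of the split.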

Installing new DL concepts as inner nodes
• The candidate concept descriptions are generated by using a
refinement operator
• A quasi-ordering relation over the space of the concept
descriptions
• The subsumption between concepts in Description Logics
• Downward refinement operator ρ(·) to obtain specializations E of
a concept description D (E ⊑ D)
• Each concept can be obtained:
• by introducing a new concept name (or its complement) as a
conjunct
• by replacing a sub-description in the scope of an existential
restriction
• by replacing a sub-description in the scope of a universal
restriction
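The three refinement rules above can be sketched as a generator over a tuple encoding of concepts. This is a hypothetical, heavily simplified operator for illustration: the encoding, the name `refine`, and the decision to generate every concept name at every step are assumptions, and the operator used by the authors may differ.

```python
def refine(concept, concept_names):
    """Yield specializations of `concept` (a simplified downward refinement).

    Concepts are nested tuples (hypothetical encoding):
    ("top",), ("atom", name), ("not", C), ("and", C, D),
    ("some", role, D), ("all", role, D).
    """
    # Rule 1: add a concept name, or its complement, as a conjunct.
    for name in concept_names:
        yield ("and", concept, ("atom", name))
        yield ("and", concept, ("not", ("atom", name)))

    op = concept[0]
    if op in ("some", "all"):
        # Rules 2 and 3: specialize the sub-description in the scope of
        # an existential or universal restriction.
        for sub in refine(concept[2], concept_names):
            yield (op, concept[1], sub)
    elif op == "and":
        # Specializing either conjunct specializes the whole conjunction.
        for i in (1, 2):
            for sub in refine(concept[i], concept_names):
                parts = list(concept)
                parts[i] = sub
                yield tuple(parts)


start = ("some", "starring", ("top",))
candidates = list(refine(start, ["Actor"]))
```

The sketch never recurses under a negation, because specializing a negated sub-description would generalize the enclosing concept rather than specialize it.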

Prediction
• Given an unseen individual a, the property values are
determined by traversing the tree structure
• Given a test concept D:
• if K |= D(a) the left branch is followed
• if K |= ¬D(a) the right branch is followed
• otherwise, a default model is returned
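The traversal can be sketched with a three-valued membership test, reflecting the Open World Assumption: the knowledge base may entail D(a), entail ¬D(a), or neither. In this sketch the `membership` callable is a hypothetical stand-in for a reasoner, returning `True`, `False`, or `None`; the `Node` class, the toy tree, and the default vector are likewise assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vector = Tuple[float, ...]


@dataclass
class Node:
    concept: Optional[str] = None        # test concept D (inner nodes only)
    left: Optional["Node"] = None        # followed when K |= D(a)
    right: Optional["Node"] = None       # followed when K |= not D(a)
    prediction: Optional[Vector] = None  # prediction vector (leaves only)


def predict(root, individual, membership, default):
    """Traverse the tree; fall back to `default` when membership is unknown."""
    node = root
    while node.prediction is None:
        answer = membership(node.concept, individual)
        if answer is True:
            node = node.left
        elif answer is False:
            node = node.right
        else:
            return default  # neither D(a) nor its negation is entailed
    return node.prediction


# Toy stand-in for a reasoner: a lookup table with missing entries.
known = {("Comedy", "film1"): True, ("Comedy", "film2"): False}


def toy_membership(concept, individual):
    return known.get((concept, individual))


tree = Node(concept="Comedy",
            left=Node(prediction=(8.45, 9810666.0)),
            right=Node(prediction=(4.7, 4200000.0)))
default_model = (6.0, 5000000.0)  # placeholder values, e.g. a global average

p1 = predict(tree, "film1", toy_membership, default_model)
p2 = predict(tree, "film2", toy_membership, default_model)
p3 = predict(tree, "film3", toy_membership, default_model)
```

What the default model should contain is not stated in the slides; a vector of training-set averages is one plausible choice.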

Experiments
Settings
• Ontologies extracted from DBpedia via crawling
• Maximum depth for PCTs: 10, 15, 20
• Comparison w.r.t. terminological regression trees (TRT), a
multi-target k-NN regressor (with k = √|Tr|) and a multi-target
linear regression model
• atomic concepts as feature set for the k-NN regressor and the
multi-target linear regression model
• 10-fold cross validation
• performance in terms of RRMSE
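The slides do not spell out the metric. Assuming RRMSE stands for the relative root mean squared error commonly used in multi-target regression, a minimal sketch is given below: the error of the model is divided by the error of a baseline that always predicts the mean, and the per-target scores are averaged. The function names are hypothetical, and using the mean of the evaluated values as the baseline (rather than the training-set mean) is a simplifying assumption.

```python
import math


def rrmse(y_true, y_pred):
    """Relative RMSE for one target: model error over mean-predictor error.

    Raises ZeroDivisionError if all true values are identical.
    """
    mean = sum(y_true) / len(y_true)
    model_error = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    baseline_error = sum((y - mean) ** 2 for y in y_true)
    return math.sqrt(model_error / baseline_error)


def average_rrmse(true_rows, pred_rows):
    """Average of the per-target RRMSE values over all targets."""
    t = len(true_rows[0])
    scores = [rrmse([row[j] for row in true_rows],
                    [row[j] for row in pred_rows]) for j in range(t)]
    return sum(scores) / t


# Toy check: a perfect model and a mean-only model on two targets.
truth = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]
perfect = list(truth)
mean_only = [(2.0, 20.0)] * 3
```

Under this definition a value below 1 means the model does better than predicting the mean, which is the sense in which lower values are better in the outcome tables.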

Table: Datasets extracted from DBpedia
Dataset    Expr.  #axioms  #classes  #properties  #ind.
Fragm. #1  ALCO   17222    990       255          12053
Fragm. #2  ALCO   20456    425       255          14400
Fragm. #3  ALCO   9070     370       106          4499

Table: Target properties, their ranges, and number of individuals employed in the
learning problem
Dataset    Property         Range             |Tr|
Fragm. #1  elevation        [-654.14, 19.00]  10000
           populationTotal  [0.0, 2255]
Fragm. #2  areaTotal        [0, 16980.1]      10000
           areaUrban        [0.0, 6740.74]
           areaMetro        [0, 652874]
Fragm. #3  height           [0, 251.6]        2256
           weight           [-63.12, 304.25]

Outcomes
Table: RRMSE averaged over the runs
Dataset    PCT           TRT          k-NN         LR
Fragm. #1  0.42 ± 0.05   0.63 ± 0.05  0.65 ± 0.02  0.73 ± 0.02
Fragm. #2  0.25 ± 0.001  0.43 ± 0.02  0.53 ± 0.00  0.43 ± 0.02
Fragm. #3  0.24 ± 0.05   0.36 ± 0.2   0.67 ± 0.10  0.73 ± 0.05

Table: Comparison in terms of elapsed times (secs)
Dataset    Target           PCT    TRT     k-NN   LR
Fragm. #1  elevation               2454.3
           populationTotal         2353.0
           total            2432   4807.3  547.6  234.5
Fragm. #2  areaTotal               2256.0
           areaUrban               2345.0
           areaMetro               2345.2
           total            2456   6946.2  546.2  235.7
Fragm. #3  height                  743.5
           weight                  743.4
           total            743.3  1486.9  372.3  123.5

Discussion
• PCTs outperform TRTs
• the different heuristic allows choosing more promising concepts
• standardization mitigated the effect of abnormal values that increase the error
• PCTs outperform k-NN
• curse of dimensionality
• k-NN outperforms LR
• spurious individuals were excluded when determining the local model
• PCTs are more efficient than TRTs

Conclusion and Further Outlooks
• We proposed an extension of predictive clustering trees compliant
with DL representation languages for solving the problem of
predicting datatype properties
• Further extensions
• New refinement operators
• Further heuristics
• linear models at leaf nodes

Questions?
