Inducing Predictive Clustering Trees for
Datatype properties Values
Giuseppe Rizzo, Claudia d’Amato, Nicola Fanizzi, Floriana Esposito
Semantic Machine Learning, 10th July 2016
G.Rizzo et al. (Univ. of Bari) 10th July 2016 1 / 18
Outline
1 The Context and Motivations
2 Basics
3 The approach
4 Empirical Evaluation
5 Conclusion & Further Extensions
The Context and Motivations
• Goal: approximating the values of (numerical) datatype properties
through regression models in the Web of Data
• Web of Data: a large number of knowledge bases, datasets and
vocabularies exposed in standard formats (RDF, OWL)
• (numerical) property values can hardly be derived using
reasoning services
• Open World Assumption
• a large amount of missing information
• This information gap can be filled using regression models
The Context and Motivations
• Solving a regression problem
• two or more property values may be related (e.g. crime rate and
population of a place)
• exploiting these correlations should improve predictive performance
• Predicting several numerical values at once (multi-target
regression) through Predictive Clustering approaches
• Predictive Clustering Trees (PCTs) as a generalization of decision
trees
• PCTs compliant with the representation languages for the Web of
Data (e.g. Description Logics)
• target values: the numeric role fillers for the properties
Description Logics
Syntax & Semantics
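• Atomic concepts (classes) $N_C$ and roles (relations) $N_R$ to model
domains
• Operators to build complex concept descriptions
• Concrete domains: string, boolean, numeric values
• Semantics defined through interpretations $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$
• $\Delta^{\mathcal{I}}$: domain of the interpretation
• $\cdot^{\mathcal{I}}$: interpretation function
• for each concept $C \in N_C$, $C^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}$
• for each role $R \in N_R$, $R^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$
ALC operators
| Operator | Syntax | Semantics |
|---|---|---|
| Top concept | $\top$ | $\Delta^{\mathcal{I}}$ |
| Bottom concept | $\bot$ | $\emptyset$ |
| Concept | $C$ | $C^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}$ |
| Full complement | $\neg C$ | $\Delta^{\mathcal{I}} \setminus C^{\mathcal{I}}$ |
| Intersection | $C \sqcap D$ | $C^{\mathcal{I}} \cap D^{\mathcal{I}}$ |
| Disjunction | $C \sqcup D$ | $C^{\mathcal{I}} \cup D^{\mathcal{I}}$ |
| Universal restriction | $\forall R.D$ | $\{x \in \Delta^{\mathcal{I}} \mid \forall y \in \Delta^{\mathcal{I}}: (x, y) \in R^{\mathcal{I}} \rightarrow y \in D^{\mathcal{I}}\}$ |
| Existential restriction | $\exists R.D$ | $\{x \in \Delta^{\mathcal{I}} \mid \exists y \in \Delta^{\mathcal{I}}: (x, y) \in R^{\mathcal{I}} \wedge y \in D^{\mathcal{I}}\}$ |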
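The semantics of the ALC operators can be made concrete with a small Python sketch that computes the extension $C^{\mathcal{I}}$ of a concept in a single finite interpretation. The tuple encoding of concepts and the names `extension`, `concepts` and `roles` are hypothetical choices for illustration. Reasoning services consider all models of a knowledge base, whereas this function evaluates just one interpretation.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical encoding of ALC concepts as nested tuples:
# ("top",), ("bottom",), ("atom", A), ("not", C), ("and", C, D), ("or", C, D),
# ("all", R, C) for ∀R.C and ("some", R, C) for ∃R.C
Concept = tuple

def extension(c: Concept, domain: FrozenSet[str],
              concepts: Dict[str, Set[str]],
              roles: Dict[str, Set[Tuple[str, str]]]) -> Set[str]:
    """Return C^I for one finite interpretation (domain, concepts, roles)."""
    ext = lambda sub: extension(sub, domain, concepts, roles)
    op = c[0]
    if op == "top":
        return set(domain)
    if op == "bottom":
        return set()
    if op == "atom":
        return set(concepts.get(c[1], set()))
    if op == "not":
        return set(domain) - ext(c[1])
    if op == "and":
        return ext(c[1]) & ext(c[2])
    if op == "or":
        return ext(c[1]) | ext(c[2])
    if op in ("all", "some"):
        pairs, filler = roles.get(c[1], set()), ext(c[2])
        result = set()
        for x in domain:
            successors = {y for (s, y) in pairs if s == x}
            # ∀: every R-successor is in the filler (vacuously true if none);
            # ∃: at least one R-successor is in the filler
            if (op == "all" and successors <= filler) or (op == "some" and successors & filler):
                result.add(x)
        return result
    raise ValueError(f"unknown constructor: {op}")

domain = frozenset({"film1", "film2", "alice"})
concepts = {"Comedy": {"film1"}, "Actor": {"alice"}}
roles = {"starring": {("film1", "alice")}}
print(extension(("and", ("atom", "Comedy"), ("some", "starring", ("atom", "Actor"))),
                domain, concepts, roles))  # {'film1'}
```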
Description Logics
Knowledge bases
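• Knowledge base: a pair $\mathcal{K} = (\mathcal{T}, \mathcal{A})$ where
• $\mathcal{T}$ (TBox): axioms concerning concepts/roles
• Subsumption axioms $C \sqsubseteq D$: hold iff for every interpretation $\mathcal{I}$,
$C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$
• Equivalence axioms $C \equiv D$: hold iff for every interpretation $\mathcal{I}$,
$C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$ and $D^{\mathcal{I}} \subseteq C^{\mathcal{I}}$
• $\mathcal{A}$ (ABox): class assertions $C(a)$ and role assertions $R(a, b)$ about
individuals; the set of individuals is denoted by $\mathrm{Ind}(\mathcal{A})$
• Reasoning services:
• subsumption: deciding whether a concept is more general than a given one
• satisfiability: given a concept description $C$, deciding whether there is an
interpretation $\mathcal{I}$ such that $C^{\mathcal{I}} \neq \emptyset$
• instance checking: deciding whether $\mathcal{I} \models C(a)$ for every model $\mathcal{I}$ of $\mathcal{K}$ ($a$ is an
instance of $C$)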
The problem
Given:
• a knowledge base $\mathcal{K} = (\mathcal{T}, \mathcal{A})$;
• the target functional roles $R_i$, $1 \le i \le t$, ranging on the domains
$D_i$, whose analytic forms are unknown;
• a training set $\mathrm{Tr} \subseteq \mathrm{Ind}(\mathcal{A})$ for which the numeric fillers are known,
$\mathrm{Tr} = \{a \in \mathrm{Ind}(\mathcal{A}) \mid R_i(a, v_i) \in \mathcal{A},\ v_i \in D_i,\ 1 \le i \le t\}$
Build a regression model for $\{R_i\}_{i=1}^{t}$, i.e. a function
$h : \mathrm{Ind}(\mathcal{A}) \rightarrow D_1 \times \cdots \times D_t$ that minimizes a loss function over
$\mathrm{Tr}$. A possible loss function is based on the mean squared error.
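As a concrete reading of the mean-squared-error loss, the following minimal Python sketch computes the error of a multi-target model $h$ over $\mathrm{Tr}$, averaging over all individuals and all $t$ targets. The names `mse_loss`, `h` and `targets` are hypothetical, and averaging over both individuals and targets is one possible choice, not necessarily the one used by the authors.

```python
from typing import Callable, Dict, Sequence

def mse_loss(h: Callable[[str], Sequence[float]],
             targets: Dict[str, Sequence[float]]) -> float:
    """Mean squared error of h over Tr, averaged over individuals and targets."""
    total, count = 0.0, 0
    for individual, true_values in targets.items():
        predicted = h(individual)
        if len(predicted) != len(true_values):
            raise ValueError("h must return one value per target property")
        for p, v in zip(predicted, true_values):
            total += (p - v) ** 2
            count += 1
    return total / count

# Toy training set with t = 2 targets per individual (illustrative values only)
tr = {"a1": (1.0, 10.0), "a2": (3.0, 14.0)}
constant_h = lambda a: (2.0, 12.0)  # predicts the per-target mean for everyone
print(mse_loss(constant_h, tr))  # 2.5
```

Because raw target values can differ by orders of magnitude (compare the ranges of areaTotal and areaMetro in the experiments), an unnormalized error like this one tends to be dominated by the largest-range property.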
The proposed solution
• Predictive Clustering
• objects are clustered according to a homogeneity criterion
• for each cluster a predictive model is determined (e.g. a vector
containing the predictions)
[Figure: (a) clustering, (b) predictive models, (c) predictive clustering]
The model for multi-target regression
• Given a knowledge base $\mathcal{K}$, a PCT for multi-target regression is a binary tree where
• intermediate nodes: DL concept descriptions
• leaf nodes: vectors containing the predictions w.r.t. the target properties
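[Figure: example PCT with root test $\mathrm{Comedy}$ and inner tests $\mathrm{Comedy} \sqcap \exists \mathrm{starring}.\mathrm{Actor}$ and $\neg\mathrm{Comedy} \sqcap \neg\mathrm{Horror}$; leaf prediction vectors p = (8.45, 9810666), p = (5.38, 4200000), p = (4.7, 4200000), p = (8.6, 4930000)]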
Learning PCTs
• Divide-and-conquer strategy
• For the current node:
• the refinement operator generates the candidate concepts
• The most promising concept $E^*$ is selected by maximizing the
homogeneity w.r.t. all the target variables simultaneously
• Best concept: the one minimizing the RMSE of the standardized
target property values
• Stop conditions:
• maximum number of levels
• size of the training (sub)set
• Leaf: the i-th component contains the average value for the i-th
target property over the instances sorted to the node
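The divide-and-conquer procedure can be sketched in Python under simplifying assumptions that do not come from the slides: the refinement operator is replaced by a fixed dictionary `candidates` mapping hypothetical concept names to the training individuals assumed to be their instances (so no reasoner is called), and the split score is taken to be the size-weighted RMSE of the z-scored targets in the two branches, which is one plausible reading of the heuristic. As on the slide, the stop conditions are a maximum depth and a minimum (sub)set size, and each leaf holds the per-target average.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Set, Tuple

Vector = Tuple[float, ...]

@dataclass
class Node:
    concept: Optional[str] = None        # test concept (None at leaves)
    left: Optional["Node"] = None        # individuals that belong to the concept
    right: Optional["Node"] = None       # individuals that do not
    prediction: Optional[Vector] = None  # per-target average (stored at every node)

def mean_vector(rows: Sequence[Vector]) -> Vector:
    return tuple(sum(col) / len(rows) for col in zip(*rows))

def standardize(y: Dict[str, Vector]) -> Dict[str, Vector]:
    """z-score each target so that large-range properties do not dominate."""
    rows = list(y.values())
    mu = mean_vector(rows)
    sd = tuple(math.sqrt(sum((r[i] - mu[i]) ** 2 for r in rows) / len(rows)) or 1.0
               for i in range(len(mu)))
    return {a: tuple((v[i] - mu[i]) / sd[i] for i in range(len(v))) for a, v in y.items()}

def rmse_around_mean(rows: List[Vector]) -> float:
    mu = mean_vector(rows)
    sq = sum((v - m) ** 2 for r in rows for v, m in zip(r, mu))
    return math.sqrt(sq / (len(rows) * len(mu)))

def split_score(members: Set[str], z: Dict[str, Vector], inds: List[str]) -> float:
    """Size-weighted RMSE of the standardized targets in both branches (assumed heuristic)."""
    pos = [z[a] for a in inds if a in members]
    neg = [z[a] for a in inds if a not in members]
    if not pos or not neg:
        return math.inf  # the concept does not split this node
    return (len(pos) * rmse_around_mean(pos) + len(neg) * rmse_around_mean(neg)) / len(inds)

def learn_pct(inds: List[str], y: Dict[str, Vector], z: Dict[str, Vector],
              candidates: Dict[str, Set[str]], depth: int = 0,
              max_depth: int = 10, min_size: int = 2) -> Node:
    node = Node(prediction=mean_vector([y[a] for a in inds]))
    if depth >= max_depth or len(inds) < min_size:  # stop conditions
        return node
    best, best_score = None, math.inf
    for name, members in candidates.items():
        score = split_score(members, z, inds)
        if score < best_score:
            best, best_score = name, score
    if best is None:  # no candidate splits the individuals
        return node
    pos = [a for a in inds if a in candidates[best]]
    neg = [a for a in inds if a not in candidates[best]]
    node.concept = best
    node.left = learn_pct(pos, y, z, candidates, depth + 1, max_depth, min_size)
    node.right = learn_pct(neg, y, z, candidates, depth + 1, max_depth, min_size)
    return node

# Toy data: two targets per individual (illustrative values only)
y = {"f1": (8.0, 100.0), "f2": (9.0, 120.0), "f3": (4.0, 400.0), "f4": (5.0, 420.0)}
candidates = {"Comedy": {"f1", "f2"}, "Horror": {"f1", "f3"}}
tree = learn_pct(list(y), y, standardize(y), candidates, max_depth=1)
print(tree.concept, tree.left.prediction, tree.right.prediction)
# Comedy (8.5, 110.0) (4.5, 410.0)
```

Storing the average at inner nodes as well is a design choice of this sketch; such a vector could serve as a fallback when membership in a test concept cannot be decided.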
Installing new DL concepts as inner nodes
• The candidate concept descriptions are generated by using a
refinement operator
• A quasi-ordering relation over the space of concept
descriptions
• the subsumption relation $\sqsubseteq$ between concepts in Description Logics
• Downward refinement operator $\rho(\cdot)$ to obtain specializations $E$ of
a concept description $D$ ($E \sqsubseteq D$)
• Each specialization can be obtained:
• by introducing a new concept name (or its complement) as a
conjunct
• by replacing a sub-description in the scope of an existential
restriction
• by replacing a sub-description in the scope of a universal
restriction
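The three ways of building a specialization can be sketched in Python as follows. This is a simplified illustration, not the operator used in the paper: concepts use a hypothetical tuple encoding, `atoms` stands for the concept names $N_C$, and `depth` is an assumed bound on how deeply restriction fillers are refined, added only to keep the output finite. Each rule yields a concept subsumed by its input, because $\sqcap$, $\exists$ and $\forall$ are monotone in their concept arguments.

```python
from typing import Iterator, List

# Hypothetical tuple encoding of concepts: ("top",), ("atom", A), ("not", C),
# ("and", C, D), ("some", R, C) for ∃R.C and ("all", R, C) for ∀R.C.

def refine(d: tuple, atoms: List[str], depth: int = 1) -> Iterator[tuple]:
    """Yield specializations E of d (E ⊑ d) using the three rules from the slide."""
    for a in atoms:
        # rule 1: add a concept name, or its complement, as a new conjunct
        yield ("and", d, ("atom", a))
        yield ("and", d, ("not", ("atom", a)))
    if depth > 0:
        yield from _refine_in_scope(d, atoms, depth)

def _refine_in_scope(d: tuple, atoms: List[str], depth: int) -> Iterator[tuple]:
    if d[0] in ("some", "all"):
        # rules 2 and 3: specialize the filler of ∃R.C or ∀R.C
        for filler in refine(d[2], atoms, depth - 1):
            yield (d[0], d[1], filler)
    elif d[0] == "and":
        # look for restrictions inside either conjunct
        for left in _refine_in_scope(d[1], atoms, depth):
            yield ("and", left, d[2])
        for right in _refine_in_scope(d[2], atoms, depth):
            yield ("and", d[1], right)

def show(c: tuple) -> str:
    """Render a concept in DL notation."""
    op = c[0]
    if op == "top":
        return "⊤"
    if op == "atom":
        return c[1]
    if op == "not":
        return "¬" + show(c[1])
    if op == "and":
        return f"({show(c[1])} ⊓ {show(c[2])})"
    if op in ("some", "all"):
        return ("∃" if op == "some" else "∀") + f"{c[1]}.{show(c[2])}"
    raise ValueError(f"unknown constructor: {op}")

for e in refine(("some", "starring", ("top",)), ["Actor"]):
    print(show(e))
# (∃starring.⊤ ⊓ Actor)
# (∃starring.⊤ ⊓ ¬Actor)
# ∃starring.(⊤ ⊓ Actor)
# ∃starring.(⊤ ⊓ ¬Actor)
```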
Prediction
• Given an unseen individual $a$, the property values are
determined by traversing the tree structure
• Given a test concept $D$:
• if $\mathcal{K} \models D(a)$ the left branch is followed
• if $\mathcal{K} \models \neg D(a)$ the right branch is followed
• otherwise, a default model is returned
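The three-way branching rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the reasoner is replaced by a hypothetical oracle `entails(D, a)` that returns `True` when $\mathcal{K} \models D(a)$, `False` when $\mathcal{K} \models \neg D(a)$, and `None` when neither is entailed. The slides do not specify the default model, so the sketch assumes a vector stored at each inner node (for instance, the average over the training individuals that reached it).

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Vector = Tuple[float, ...]
# Hypothetical reasoner interface: True if K |= D(a), False if K |= ¬D(a),
# None if neither is entailed (Open World Assumption).
Oracle = Callable[[str, str], Optional[bool]]

@dataclass
class PCTNode:
    default: Vector                      # leaf prediction, or assumed fallback at inner nodes
    concept: Optional[str] = None        # test concept D; None at leaves
    left: Optional["PCTNode"] = None     # followed when K |= D(a)
    right: Optional["PCTNode"] = None    # followed when K |= ¬D(a)

def predict(node: PCTNode, a: str, entails: Oracle) -> Vector:
    """Traverse the tree for individual a and return a prediction vector."""
    while node.concept is not None:
        answer = entails(node.concept, a)
        if answer is True:
            node = node.left
        elif answer is False:
            node = node.right
        else:
            return node.default  # membership unknown: fall back to the default model
    return node.default

# Toy tree and toy facts (illustrative values, not taken from the experiments)
tree = PCTNode(default=(6.0, 250.0), concept="Comedy",
               left=PCTNode(default=(8.0, 100.0)),
               right=PCTNode(default=(4.0, 400.0)))
facts = {("Comedy", "film1"): True, ("Comedy", "film2"): False}
oracle = lambda concept, a: facts.get((concept, a))
print(predict(tree, "film1", oracle))  # (8.0, 100.0)
print(predict(tree, "film3", oracle))  # (6.0, 250.0): nothing entailed about film3
```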
Experiments
Settings
• Ontologies extracted from DBpedia via crawling
• Maximum depth for PCTs: 10, 15, 20
• Comparison w.r.t. Terminological Regression Trees (TRT),
a multi-target k-NN regressor (with $k = \sqrt{|\mathrm{Tr}|}$) and a multi-target
linear regression model (LR)
• atomic concepts used as the feature set for the k-NN regressor and the multi-target
linear regression model
• 10-fold cross-validation
• performance measured in terms of RRMSE (relative root mean squared error)
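The slides report RRMSE without spelling out the formula. A common definition, assumed here, divides the RMSE of the model by the RMSE of a constant predictor that always returns the mean of the true values, so values below 1 mean the model beats that baseline. How per-target scores are combined is not stated either; the sketch simply averages them, which is an assumption. For reference, $|\mathrm{Tr}| = 10000$ gives $k = \sqrt{10000} = 100$ neighbours for the k-NN baseline.

```python
import math
from typing import Sequence

def rrmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """RMSE of the predictions divided by the RMSE of always predicting the mean."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    mean = sum(y_true) / len(y_true)
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - mean) ** 2 for t in y_true)
    if sst == 0:
        raise ValueError("RRMSE is undefined when all true values are equal")
    return math.sqrt(sse / sst)

def multi_target_rrmse(Y_true: Sequence[Sequence[float]],
                       Y_pred: Sequence[Sequence[float]]) -> float:
    """Average of the per-target RRMSE values (one possible aggregation)."""
    t = len(Y_true[0])
    per_target = [rrmse([row[i] for row in Y_true], [row[i] for row in Y_pred])
                  for i in range(t)]
    return sum(per_target) / t

print(rrmse([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 1.0: no better than the mean
```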
Table: Datasets extracted from DBpedia
| Dataset | Expressivity | #Axioms | #Classes | #Properties | #Individuals |
|---|---|---|---|---|---|
| Fragm. #1 | ALCO | 17222 | 990 | 255 | 12053 |
| Fragm. #2 | ALCO | 20456 | 425 | 255 | 14400 |
| Fragm. #3 | ALCO | 9070 | 370 | 106 | 4499 |
Table: Target property ranges and number of individuals employed in the learning problem
| Dataset | Property | Range | Training individuals ($\lvert\mathrm{Tr}\rvert$) |
|---|---|---|---|
| Fragm. #1 | elevation | [-654.14, 19.00] | 10000 |
| | populationTotal | [0.0, 2255] | |
| Fragm. #2 | areaTotal | [0, 16980.1] | 10000 |
| | areaUrban | [0.0, 6740.74] | |
| | areaMetro | [0, 652874] | |
| Fragm. #3 | height | [0, 251.6] | 2256 |
| | weight | [-63.12, 304.25] | |
Outcomes
Table: RRMSE averaged over the runs
| Dataset | PCT | TRT | k-NN | LR |
|---|---|---|---|---|
| Fragm. #1 | 0.42 ± 0.05 | 0.63 ± 0.05 | 0.65 ± 0.02 | 0.73 ± 0.02 |
| Fragm. #2 | 0.25 ± 0.001 | 0.43 ± 0.02 | 0.53 ± 0.00 | 0.43 ± 0.02 |
| Fragm. #3 | 0.24 ± 0.05 | 0.36 ± 0.2 | 0.67 ± 0.10 | 0.73 ± 0.05 |
Table: Comparison in terms of elapsed time (s)
| Dataset | Property | PCT | TRT | k-NN | LR |
|---|---|---|---|---|---|
| Fragm. #1 | elevation | | 2454.3 | | |
| | populationTotal | | 2353.0 | | |
| | total | 2432 | 4807.3 | 547.6 | 234.5 |
| Fragm. #2 | areaTotal | | 2256.0 | | |
| | areaUrban | | 2345.0 | | |
| | areaMetro | | 2345.2 | | |
| | total | 2456 | 6946.2 | 546.2 | 235.7 |
| Fragm. #3 | height | | 743.5 | | |
| | weight | | 743.4 | | |
| | total | 743.3 | 1486.9 | 372.3 | 123.5 |
Discussion
• PCTs outperform TRTs
• the different heuristic allows more promising concepts to be chosen
• standardization mitigated the effect of abnormal values, which would otherwise increase the error
• PCTs outperform k-NN
• k-NN suffers from the curse of dimensionality (one feature per atomic concept)
• k-NN outperforms LR on Fragm. #1 and #3
• spurious individuals were excluded when determining the local model
• PCTs are more efficient than TRTs, which require one model per target property
Conclusion and Further Outlooks
• We proposed an extension of predictive clustering trees compliant
with DL representation languages for solving the problem of
predicting datatype property values
• Further extensions
• New refinement operators
• Further heuristics
• linear models at leaf nodes
Questions?
G.Rizzo et al. (Univ. of Bari) 10th July 2016 18 / 18

More Related Content

What's hot

Assignment 3 push down automata final
Assignment 3 push down automata finalAssignment 3 push down automata final
Assignment 3 push down automata final
Pawan Goel
 
ForecastCombinations package
ForecastCombinations packageForecastCombinations package
ForecastCombinations package
eraviv
 
Neural Semi-supervised Learning under Domain Shift
Neural Semi-supervised Learning under Domain ShiftNeural Semi-supervised Learning under Domain Shift
Neural Semi-supervised Learning under Domain Shift
Sebastian Ruder
 
[AAAI-16] Tiebreaking Strategies for A* Search: How to Explore the Final Fron...
[AAAI-16] Tiebreaking Strategies for A* Search: How to Explore the Final Fron...[AAAI-16] Tiebreaking Strategies for A* Search: How to Explore the Final Fron...
[AAAI-16] Tiebreaking Strategies for A* Search: How to Explore the Final Fron...
Asai Masataro
 
Probabilistic Retrieval TFIDF
Probabilistic Retrieval TFIDFProbabilistic Retrieval TFIDF
Probabilistic Retrieval TFIDF
DKALab
 
Solving problems by searching Informed (heuristics) Search
Solving problems by searching Informed (heuristics) SearchSolving problems by searching Informed (heuristics) Search
Solving problems by searching Informed (heuristics) Search
matele41
 
Variational inference using implicit distributions
Variational inference using implicit distributionsVariational inference using implicit distributions
Variational inference using implicit distributions
Tomasz Kusmierczyk
 
Asymmetric Tri-training for Unsupervised Domain Adaptation
Asymmetric Tri-training for Unsupervised Domain AdaptationAsymmetric Tri-training for Unsupervised Domain Adaptation
Asymmetric Tri-training for Unsupervised Domain Adaptation
Yoshitaka Ushiku
 
DL-Foil:Class Expression Learning Revisited
DL-Foil:Class Expression Learning RevisitedDL-Foil:Class Expression Learning Revisited
DL-Foil:Class Expression Learning Revisited
Giuseppe Rizzo
 
Dr Chris Drovandi (QUT) - Bayesian Indirect Inference Using a Parametric Auxi...
Dr Chris Drovandi (QUT) - Bayesian Indirect Inference Using a Parametric Auxi...Dr Chris Drovandi (QUT) - Bayesian Indirect Inference Using a Parametric Auxi...
Dr Chris Drovandi (QUT) - Bayesian Indirect Inference Using a Parametric Auxi...
QUT_SEF
 
Lecture 17 Iterative Deepening a star algorithm
Lecture 17 Iterative Deepening a star algorithmLecture 17 Iterative Deepening a star algorithm
Lecture 17 Iterative Deepening a star algorithm
Hema Kashyap
 
Deep Domain Adaptation using Adversarial Learning and GAN
Deep Domain Adaptation using Adversarial Learning and GAN Deep Domain Adaptation using Adversarial Learning and GAN
Deep Domain Adaptation using Adversarial Learning and GAN
RishirajChakraborty4
 

What's hot (12)

Assignment 3 push down automata final
Assignment 3 push down automata finalAssignment 3 push down automata final
Assignment 3 push down automata final
 
ForecastCombinations package
ForecastCombinations packageForecastCombinations package
ForecastCombinations package
 
Neural Semi-supervised Learning under Domain Shift
Neural Semi-supervised Learning under Domain ShiftNeural Semi-supervised Learning under Domain Shift
Neural Semi-supervised Learning under Domain Shift
 
[AAAI-16] Tiebreaking Strategies for A* Search: How to Explore the Final Fron...
[AAAI-16] Tiebreaking Strategies for A* Search: How to Explore the Final Fron...[AAAI-16] Tiebreaking Strategies for A* Search: How to Explore the Final Fron...
[AAAI-16] Tiebreaking Strategies for A* Search: How to Explore the Final Fron...
 
Probabilistic Retrieval TFIDF
Probabilistic Retrieval TFIDFProbabilistic Retrieval TFIDF
Probabilistic Retrieval TFIDF
 
Solving problems by searching Informed (heuristics) Search
Solving problems by searching Informed (heuristics) SearchSolving problems by searching Informed (heuristics) Search
Solving problems by searching Informed (heuristics) Search
 
Variational inference using implicit distributions
Variational inference using implicit distributionsVariational inference using implicit distributions
Variational inference using implicit distributions
 
Asymmetric Tri-training for Unsupervised Domain Adaptation
Asymmetric Tri-training for Unsupervised Domain AdaptationAsymmetric Tri-training for Unsupervised Domain Adaptation
Asymmetric Tri-training for Unsupervised Domain Adaptation
 
DL-Foil:Class Expression Learning Revisited
DL-Foil:Class Expression Learning RevisitedDL-Foil:Class Expression Learning Revisited
DL-Foil:Class Expression Learning Revisited
 
Dr Chris Drovandi (QUT) - Bayesian Indirect Inference Using a Parametric Auxi...
Dr Chris Drovandi (QUT) - Bayesian Indirect Inference Using a Parametric Auxi...Dr Chris Drovandi (QUT) - Bayesian Indirect Inference Using a Parametric Auxi...
Dr Chris Drovandi (QUT) - Bayesian Indirect Inference Using a Parametric Auxi...
 
Lecture 17 Iterative Deepening a star algorithm
Lecture 17 Iterative Deepening a star algorithmLecture 17 Iterative Deepening a star algorithm
Lecture 17 Iterative Deepening a star algorithm
 
Deep Domain Adaptation using Adversarial Learning and GAN
Deep Domain Adaptation using Adversarial Learning and GAN Deep Domain Adaptation using Adversarial Learning and GAN
Deep Domain Adaptation using Adversarial Learning and GAN
 

Similar to Inducing Predictive Clustering Trees for Datatype properties Values

Approximating Numeric Role Fillers via Predictive Clustering Trees for Know...
Approximating Numeric Role Fillers via Predictive Clustering Trees  for  Know...Approximating Numeric Role Fillers via Predictive Clustering Trees  for  Know...
Approximating Numeric Role Fillers via Predictive Clustering Trees for Know...
Giuseppe Rizzo
 
On the Effectiveness of Evidence-based Terminological Decision Trees
On the Effectiveness of Evidence-based Terminological Decision TreesOn the Effectiveness of Evidence-based Terminological Decision Trees
On the Effectiveness of Evidence-based Terminological Decision Trees
Giuseppe Rizzo
 
Towards Evidence Terminological Decision Tree
Towards Evidence Terminological Decision TreeTowards Evidence Terminological Decision Tree
Towards Evidence Terminological Decision Tree
Giuseppe Rizzo
 
LDA on social bookmarking systems
LDA on social bookmarking systemsLDA on social bookmarking systems
LDA on social bookmarking systems
Denis Parra Santander
 
A Visual Exploration of Distance, Documents, and Distributions
A Visual Exploration of Distance, Documents, and DistributionsA Visual Exploration of Distance, Documents, and Distributions
A Visual Exploration of Distance, Documents, and Distributions
Rebecca Bilbro
 
Words in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroWords in Space - Rebecca Bilbro
Words in Space - Rebecca Bilbro
PyData
 
Cluster
ClusterCluster
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
Sean Golliher
 
Propagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data FlowsPropagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data Flows
Enrico Daga
 
Geotagging Social Media Content with a Refined Language Modelling Approach
Geotagging Social Media Content with a Refined Language Modelling ApproachGeotagging Social Media Content with a Refined Language Modelling Approach
Geotagging Social Media Content with a Refined Language Modelling Approach
REVEAL - Social Media Verification
 
Geotagging Social Media Content with a Refined Language Modelling Approach
Geotagging Social Media Content with a Refined Language Modelling ApproachGeotagging Social Media Content with a Refined Language Modelling Approach
Geotagging Social Media Content with a Refined Language Modelling Approach
Symeon Papadopoulos
 
Explainable AI
Explainable AIExplainable AI
Explainable AI
Arithmer Inc.
 
EDBT 2015: Summer School Overview
EDBT 2015: Summer School OverviewEDBT 2015: Summer School Overview
EDBT 2015: Summer School Overview
dgarijo
 
clustering.ppt
clustering.pptclustering.ppt
clustering.ppt
VivekKumar898803
 
Applying tensor decompositions to author name disambiguation of common Japane...
Applying tensor decompositions to author name disambiguation of common Japane...Applying tensor decompositions to author name disambiguation of common Japane...
Applying tensor decompositions to author name disambiguation of common Japane...
National Institute of Informatics
 
Finding the Extreme Values with some Application of Derivatives
Finding the Extreme Values with some Application of DerivativesFinding the Extreme Values with some Application of Derivatives
Finding the Extreme Values with some Application of Derivatives
ijtsrd
 
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTION
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTIONHOL, GDCT AND LDCT FOR PEDESTRIAN DETECTION
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTION
csandit
 
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTION
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTIONHOL, GDCT AND LDCT FOR PEDESTRIAN DETECTION
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTION
cscpconf
 
ICCES 2017 - Crowd Density Estimation Method using Regression Analysis
ICCES 2017 - Crowd Density Estimation Method using Regression AnalysisICCES 2017 - Crowd Density Estimation Method using Regression Analysis
ICCES 2017 - Crowd Density Estimation Method using Regression Analysis
Ahmed Gad
 
Teaching & Learning with Technology TLT 2016
Teaching & Learning with Technology TLT 2016Teaching & Learning with Technology TLT 2016
Teaching & Learning with Technology TLT 2016
Roy Clariana
 

Similar to Inducing Predictive Clustering Trees for Datatype properties Values (20)

Approximating Numeric Role Fillers via Predictive Clustering Trees for Know...
Approximating Numeric Role Fillers via Predictive Clustering Trees  for  Know...Approximating Numeric Role Fillers via Predictive Clustering Trees  for  Know...
Approximating Numeric Role Fillers via Predictive Clustering Trees for Know...
 
On the Effectiveness of Evidence-based Terminological Decision Trees
On the Effectiveness of Evidence-based Terminological Decision TreesOn the Effectiveness of Evidence-based Terminological Decision Trees
On the Effectiveness of Evidence-based Terminological Decision Trees
 
Towards Evidence Terminological Decision Tree
Towards Evidence Terminological Decision TreeTowards Evidence Terminological Decision Tree
Towards Evidence Terminological Decision Tree
 
LDA on social bookmarking systems
LDA on social bookmarking systemsLDA on social bookmarking systems
LDA on social bookmarking systems
 
A Visual Exploration of Distance, Documents, and Distributions
A Visual Exploration of Distance, Documents, and DistributionsA Visual Exploration of Distance, Documents, and Distributions
A Visual Exploration of Distance, Documents, and Distributions
 
Words in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroWords in Space - Rebecca Bilbro
Words in Space - Rebecca Bilbro
 
Cluster
ClusterCluster
Cluster
 
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
 
Propagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data FlowsPropagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data Flows
 
Geotagging Social Media Content with a Refined Language Modelling Approach
Geotagging Social Media Content with a Refined Language Modelling ApproachGeotagging Social Media Content with a Refined Language Modelling Approach
Geotagging Social Media Content with a Refined Language Modelling Approach
 
Geotagging Social Media Content with a Refined Language Modelling Approach
Geotagging Social Media Content with a Refined Language Modelling ApproachGeotagging Social Media Content with a Refined Language Modelling Approach
Geotagging Social Media Content with a Refined Language Modelling Approach
 
Explainable AI
Explainable AIExplainable AI
Explainable AI
 
EDBT 2015: Summer School Overview
EDBT 2015: Summer School OverviewEDBT 2015: Summer School Overview
EDBT 2015: Summer School Overview
 
clustering.ppt
clustering.pptclustering.ppt
clustering.ppt
 
Applying tensor decompositions to author name disambiguation of common Japane...
Applying tensor decompositions to author name disambiguation of common Japane...Applying tensor decompositions to author name disambiguation of common Japane...
Applying tensor decompositions to author name disambiguation of common Japane...
 
Finding the Extreme Values with some Application of Derivatives
Finding the Extreme Values with some Application of DerivativesFinding the Extreme Values with some Application of Derivatives
Finding the Extreme Values with some Application of Derivatives
 
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTION
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTIONHOL, GDCT AND LDCT FOR PEDESTRIAN DETECTION
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTION
 
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTION
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTIONHOL, GDCT AND LDCT FOR PEDESTRIAN DETECTION
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTION
 
ICCES 2017 - Crowd Density Estimation Method using Regression Analysis
ICCES 2017 - Crowd Density Estimation Method using Regression AnalysisICCES 2017 - Crowd Density Estimation Method using Regression Analysis
ICCES 2017 - Crowd Density Estimation Method using Regression Analysis
 
Teaching & Learning with Technology TLT 2016
Teaching & Learning with Technology TLT 2016Teaching & Learning with Technology TLT 2016
Teaching & Learning with Technology TLT 2016
 

Recently uploaded

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
xclpvhuk
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
ElizabethGarrettChri
 
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
y3i0qsdzb
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
ihavuls
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
AlessioFois2
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
SaffaIbrahim1
 
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
mkkikqvo
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
a9qfiubqu
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
AndrzejJarynowski
 

Recently uploaded (20)

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
 
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
 
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 

Inducing Predictive Clustering Trees for Datatype properties Values

  • 1. Inducing Predictive Clustering Trees for Datatype properties Values Giuseppe Rizzo, Claudia d’Amato, Nicola Fanizzi, Floriana Esposito Semantic Machine Learning, 10th July 2016 G.Rizzo et al. (Univ. of Bari) 10th July 2016 1 / 18
  • 2. Outline 1 The Context and Motivations 2 Basics 3 The approach 4 Empirical Evaluation 5 Conclusion & Further Extensions G.Rizzo et al. (Univ. of Bari) 10th July 2016 2 / 18
  • 3. The Context and Motivations • Goal: approximating the (numerical) datatype property values through regression models in the Web of Data • Web of data: a large number of knowledge bases, datasets and vocabularies exposed in a standard format (RDF, OWL) • (numerical) property values can hardly be derived by using reasoning services • Open World Assumption • a large number of missing information • The informative gap can be filled by using regression models G.Rizzo et al. (Univ. of Bari) 10th July 2016 3 / 18
  • 4. The context and Motivations • Solving a regression problem • two or more property values may be related (e.g. crime rate and population of a place) • correlations should improve the predictiveness • Predicting more numerical values at once (multi-target regression) through Predictive Clustering approaches • Predictive Clustering Trees (PCTs) as a generalization of decision trees • PCTs compliant to the representation languages for the Web of Data (e.g. Description Logics) • target values: the numeric role fillers for the properties G.Rizzo et al. (Univ. of Bari) 10th July 2016 4 / 18
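  • 5. Description Logics: Syntax & Semantics
      • Atomic concepts (classes) $N_C$ and roles (relations) $N_R$ to model domains
      • Operators to build complex concept descriptions
      • Concrete domains: string, boolean, numeric values
      • Semantics defined through interpretations $\mathcal{I} = (\Delta^\mathcal{I}, \cdot^\mathcal{I})$:
          • $\Delta^\mathcal{I}$: domain of the interpretation
          • $\cdot^\mathcal{I}$: interpretation function
          • for each concept $C \in N_C$, $C^\mathcal{I} \subseteq \Delta^\mathcal{I}$
          • for each role $R \in N_R$, $R^\mathcal{I} \subseteq \Delta^\mathcal{I} \times \Delta^\mathcal{I}$
      • ALC operators (name, syntax: semantics):
          • Top concept, $\top$: $\Delta^\mathcal{I}$
          • Bottom concept, $\bot$: $\emptyset$
          • Concept, $C$: $C^\mathcal{I} \subseteq \Delta^\mathcal{I}$
          • Full complement, $\neg C$: $\Delta^\mathcal{I} \setminus C^\mathcal{I}$
          • Intersection, $C \sqcap D$: $C^\mathcal{I} \cap D^\mathcal{I}$
          • Disjunction, $C \sqcup D$: $C^\mathcal{I} \cup D^\mathcal{I}$
          • Universal restriction, $\forall R.D$: $\{x \in \Delta^\mathcal{I} \mid \forall y \in \Delta^\mathcal{I}: (x, y) \in R^\mathcal{I} \rightarrow y \in D^\mathcal{I}\}$
          • Existential restriction, $\exists R.D$: $\{x \in \Delta^\mathcal{I} \mid \exists y \in \Delta^\mathcal{I}: (x, y) \in R^\mathcal{I} \wedge y \in D^\mathcal{I}\}$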
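The semantics table above can be checked mechanically on a finite interpretation. The following Python sketch computes the extension $C^\mathcal{I}$ of an ALC concept in one given interpretation; the tuple encoding of concepts, the class `Interpretation` and the toy movie domain are hypothetical illustrations, and evaluating a single interpretation is not the same as reasoning over all models of a knowledge base.

```python
from typing import Dict, FrozenSet, Set, Tuple, Union

# Hypothetical tuple encoding of ALC concepts:
#   "TOP", "BOTTOM", ("atom", A), ("not", C), ("and", C, D), ("or", C, D),
#   ("forall", R, C), ("exists", R, C)
Concept = Union[str, Tuple]


class Interpretation:
    """A finite interpretation I = (Delta^I, .^I)."""

    def __init__(self, domain: Set[str], concepts: Dict[str, Set[str]],
                 roles: Dict[str, Set[Tuple[str, str]]]):
        self.domain = frozenset(domain)
        self.concepts = concepts  # atomic concept name -> extension A^I
        self.roles = roles        # role name -> R^I as a set of (x, y) pairs

    def ext(self, c: Concept) -> FrozenSet[str]:
        """Compute the extension C^I following the ALC semantics table."""
        if c == "TOP":
            return self.domain
        if c == "BOTTOM":
            return frozenset()
        op = c[0]
        if op == "atom":
            return frozenset(self.concepts.get(c[1], set()))
        if op == "not":
            return self.domain - self.ext(c[1])
        if op == "and":
            return self.ext(c[1]) & self.ext(c[2])
        if op == "or":
            return self.ext(c[1]) | self.ext(c[2])
        if op in ("forall", "exists"):
            pairs = self.roles.get(c[1], set())
            filler = self.ext(c[2])
            result = set()
            for x in self.domain:
                successors = {y for (s, y) in pairs if s == x}
                if op == "forall" and successors <= filler:
                    result.add(x)  # every R-successor lies in the filler
                if op == "exists" and successors & filler:
                    result.add(x)  # some R-successor lies in the filler
            return frozenset(result)
        raise ValueError(f"unknown constructor: {op!r}")


interp = Interpretation({"m1", "m2", "p1"},
                        {"Film": {"m1", "m2"}, "Actor": {"p1"}},
                        {"starring": {("m1", "p1")}})
print(sorted(interp.ext(("exists", "starring", ("atom", "Actor")))))  # ['m1']
print(sorted(interp.ext(("and", ("atom", "Film"),
                         ("not", ("exists", "starring", "TOP"))))))  # ['m2']
```

Note the vacuous case of the universal restriction: an element with no R-successors (here m2 and p1) belongs to $\forall R.D$ for every $D$.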
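  • 6. Description Logics: Knowledge bases
      • Knowledge base: a pair $\mathcal{K} = (\mathcal{T}, \mathcal{A})$, where
          • $\mathcal{T}$ (TBox): axioms concerning concepts/roles
              • subsumption axioms $C \sqsubseteq D$: satisfied by an interpretation $\mathcal{I}$ iff $C^\mathcal{I} \subseteq D^\mathcal{I}$
              • equivalence axioms $C \equiv D$: satisfied by an interpretation $\mathcal{I}$ iff $C^\mathcal{I} \subseteq D^\mathcal{I}$ and $D^\mathcal{I} \subseteq C^\mathcal{I}$
          • $\mathcal{A}$ (ABox): class assertions $C(a)$ and role assertions $R(a, b)$; the set of individuals occurring in $\mathcal{A}$ is denoted by $\mathrm{Ind}(\mathcal{A})$
      • Reasoning services:
          • subsumption: deciding whether a concept is more general than a given one
          • satisfiability: deciding whether, for a concept description $C$, there is an interpretation $\mathcal{I}$ such that $C^\mathcal{I} \neq \emptyset$
          • instance checking: deciding whether $\mathcal{I} \models C(a)$ for every model $\mathcal{I}$ of $\mathcal{K}$, i.e. $\mathcal{K} \models C(a)$ ($a$ is an instance of $C$)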
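Full ALC reasoning needs a tableau-based reasoner, but in the restricted case of a TBox containing only atomic axioms $A \sqsubseteq B$ and an ABox with only atomic class assertions, subsumption and instance checking reduce to reachability in the told hierarchy. The Python sketch below assumes exactly that fragment; the function names and the pair encoding of axioms and assertions are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple


def subsumers(tbox: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each atomic concept A to all B such that K |= A ⊑ B.

    Assumes a TBox of atomic axioms only, given as (A, B) pairs for A ⊑ B;
    an equivalence A ≡ B is encoded as both (A, B) and (B, A).
    """
    graph: Dict[str, Set[str]] = defaultdict(set)
    names: Set[str] = set()
    for sub, sup in tbox:
        graph[sub].add(sup)
        names.update((sub, sup))
    closure = {}
    for start in names:
        seen = {start}  # reflexivity: A ⊑ A
        queue = deque([start])
        while queue:
            for nxt in graph[queue.popleft()]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        closure[start] = seen
    return closure


def is_instance(individual: str, concept: str,
                abox: Iterable[Tuple[str, str]],
                closure: Dict[str, Set[str]]) -> bool:
    """Decide K |= concept(individual) from (Concept, individual) assertions.

    False means "not entailed", not that ¬concept(individual) holds (OWA).
    """
    return any(concept in closure.get(asserted, {asserted})
               for asserted, ind in abox if ind == individual)


closure = subsumers([("Comedy", "Film"), ("Film", "Work")])
abox = {("Comedy", "m1"), ("Actor", "p1")}
print(sorted(closure["Comedy"]))                 # ['Comedy', 'Film', 'Work']
print(is_instance("m1", "Work", abox, closure))  # True
print(is_instance("p1", "Film", abox, closure))  # False
```

A `False` answer only means that the assertion is not entailed: under the Open World Assumption, p1 might still be a Film in some model.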
  • 7. The problem
      Given:
      • a knowledge base $\mathcal{K} = (\mathcal{T}, \mathcal{A})$;
      • the target functional roles $R_i$, $1 \le i \le t$, ranging over the domains $D_i$, whose analytic forms are unknown;
      • a training set $Tr \subseteq \mathrm{Ind}(\mathcal{A})$ of individuals whose numeric fillers are known:
        $Tr = \{a \in \mathrm{Ind}(\mathcal{A}) \mid R_i(a, v_i) \in \mathcal{A},\ v_i \in D_i,\ 1 \le i \le t\}$
      Build a regression model for $\{R_i\}_{i=1}^{t}$, i.e. a function $h : \mathrm{Ind}(\mathcal{A}) \to D_1 \times \cdots \times D_t$ that minimizes a loss function over $Tr$. A possible loss function is based on the mean squared error.
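One possible instantiation of that loss, assuming the error is averaged over both individuals and targets, is $L(h) = \frac{1}{t\,|Tr|} \sum_{a \in Tr} \sum_{i=1}^{t} (h_i(a) - v_i(a))^2$, where $v_i(a)$ denotes the known filler of $R_i$ for $a$. The Python sketch below computes it; representing $Tr$ as a dict from individual names to value tuples, and the constant model used in the example, are assumptions for illustration.

```python
from typing import Callable, Dict, Sequence

Vector = Sequence[float]


def multi_target_mse(h: Callable[[str], Vector],
                     fillers: Dict[str, Vector]) -> float:
    """Mean squared error of h over Tr, averaged over individuals and targets.

    `fillers` maps each individual a in Tr to its known values
    (v_1, ..., v_t) for the target roles R_1, ..., R_t.
    """
    total, count = 0.0, 0
    for individual, truth in fillers.items():
        pred = h(individual)
        if len(pred) != len(truth):
            raise ValueError("prediction and target vectors differ in length")
        for p, v in zip(pred, truth):
            total += (p - v) ** 2
            count += 1
    return total / count


def constant_h(individual: str) -> Vector:
    """Hypothetical model that predicts the same vector for everyone."""
    return (2.0, 3.0)


fillers = {"a": (1.0, 2.0), "b": (3.0, 4.0)}
print(multi_target_mse(constant_h, fillers))  # 1.0
```

Because the errors are summed on raw scales, a property such as a population count would dominate one measured in much smaller units; standardizing the targets, as the learning heuristic does, reduces that imbalance.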
  • 8. The proposed solution
      • Predictive Clustering
          • objects are clustered according to a homogeneity criterion
          • for each cluster, a predictive model is determined (e.g. a vector containing the predictions)
      [Figure: (a) clustering, (b) predictive models, (c) predictive clustering]
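The two ingredients of predictive clustering can be sketched in a few lines of Python: a homogeneity measure for a cluster and a per-cluster predictive model. Here the model is the prototype (component-wise mean) vector and homogeneity is measured as the within-cluster sum of squares; both are common choices assumed for illustration, and the cluster assignment itself is taken as given.

```python
from statistics import fmean
from typing import Dict, List, Sequence

Vector = Sequence[float]


def prototype(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of the cluster: its prediction vector."""
    return [fmean(column) for column in zip(*vectors)]


def within_cluster_ss(vectors: List[Vector]) -> float:
    """Sum of squared distances to the prototype (lower = more homogeneous)."""
    center = prototype(vectors)
    return sum((x - c) ** 2 for v in vectors for x, c in zip(v, center))


def cluster_models(clusters: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Attach a predictive model (the prototype) to every cluster."""
    return {name: prototype(members) for name, members in clusters.items()}


clusters = {"c1": [(1.0, 10.0), (3.0, 30.0)], "c2": [(5.0, 50.0)]}
print(cluster_models(clusters))           # {'c1': [2.0, 20.0], 'c2': [5.0, 50.0]}
print(within_cluster_ss(clusters["c1"]))  # 202.0
```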
  • 9. The model for multi-target regression
      • Given a knowledge base $\mathcal{K}$, a PCT for multi-target regression is a binary tree where
          • intermediate nodes contain DL concept descriptions
          • leaf nodes contain vectors with the predictions for the target properties
      [Figure: example PCT over a movie domain; the inner nodes test Comedy, a refinement of Comedy involving starring.Actor, and a refinement of $\neg$Comedy involving $\neg$Horror; the leaves hold the prediction vectors p = (8.45, 9810666), p = (5.38, 4200000), p = (4.7, 4200000) and p = (8.6, 4930000)]
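A minimal Python representation of such a tree uses two node types. Following the branching convention of slide 12 (left when the test is entailed, right when its negation is), each leaf corresponds to the conjunction of the tests along its path, so the tree can be read as a set of concept-to-prediction rules. The class names, the string encoding of concepts and the concept names A and B are hypothetical; they do not reproduce the tree in the figure.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    prediction: Tuple[float, ...]  # one component per target property


@dataclass
class Node:
    test: str     # DL concept description used as the test
    left: "Tree"  # followed when K |= test(a)
    right: "Tree" # followed when K |= ¬test(a)


Tree = Union[Leaf, Node]


def leaf_rules(tree: Tree,
               path: Tuple[str, ...] = ()) -> List[Tuple[str, Tuple[float, ...]]]:
    """Return one (conjunctive concept, prediction) pair per leaf."""
    if isinstance(tree, Leaf):
        return [(" ⊓ ".join(path) if path else "⊤", tree.prediction)]
    # parenthesize compound tests before negating them
    negated = f"¬({tree.test})" if " " in tree.test else f"¬{tree.test}"
    return (leaf_rules(tree.left, path + (tree.test,))
            + leaf_rules(tree.right, path + (negated,)))


tree = Node("A", Node("B", Leaf((1.0, 2.0)), Leaf((3.0, 4.0))), Leaf((5.0, 6.0)))
for concept, prediction in leaf_rules(tree):
    print(concept, prediction)
# A ⊓ B (1.0, 2.0)
# A ⊓ ¬B (3.0, 4.0)
# ¬A (5.0, 6.0)
```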
  • 10. Learning PCTs
      • Divide-and-conquer strategy
      • For the current node:
          • the refinement operator generates the candidate concepts
          • the most promising concept $E^*$ is selected by maximizing the homogeneity w.r.t. all the target variables simultaneously
          • best concept: the one minimizing the RMSE of the standardized target property values
      • Stop conditions:
          • maximum number of levels
          • size of the training (sub)set
      • Leaf: the $i$-th component contains the average value of the $i$-th target property over the instances sorted to the node
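The divide-and-conquer loop can be sketched as follows, under several simplifying assumptions: candidate concepts come from a fixed list rather than from the refinement operator, membership is a two-valued oracle (so open-world instance checking is abstracted away), and the score of a split is the size-weighted average of each branch's RMSE on z-score standardized targets. That score is one plausible reading of "minimizing the RMSE of the standardized target property values", not necessarily the authors' exact heuristic; `learn`, `member` and the toy data are hypothetical.

```python
import math
from statistics import fmean, pstdev
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def standardize(y: Dict[str, Vector]) -> Dict[str, List[float]]:
    """z-score every target column so no property dominates the error."""
    columns = list(zip(*y.values()))
    means = [fmean(c) for c in columns]
    stds = [pstdev(c) or 1.0 for c in columns]  # guard constant columns
    return {a: [(v - m) / s for v, m, s in zip(vec, means, stds)]
            for a, vec in y.items()}


def rmse_to_mean(vectors: List[List[float]]) -> float:
    """RMSE of a branch's vectors w.r.t. their own mean vector."""
    center = [fmean(c) for c in zip(*vectors)]
    return math.sqrt(fmean([(x - c) ** 2 for v in vectors
                            for x, c in zip(v, center)]))


def split_score(pos: List[str], neg: List[str],
                z: Dict[str, List[float]]) -> float:
    """Size-weighted RMSE of the standardized targets over both branches."""
    n = len(pos) + len(neg)
    return sum(len(b) / n * rmse_to_mean([z[a] for a in b]) for b in (pos, neg))


def learn(inds: List[str], y: Dict[str, Vector], candidates: List[str],
          member: Callable[[str, str], bool], depth: int = 0,
          max_depth: int = 10, min_size: int = 2) -> dict:
    """Divide and conquer; returns {"test", "left", "right"} or {"leaf"}."""
    leaf = {"leaf": [fmean(c) for c in zip(*(y[a] for a in inds))]}
    if depth >= max_depth or len(inds) < min_size:  # stop conditions
        return leaf
    z = standardize({a: y[a] for a in inds})
    best = None
    for concept in candidates:
        pos = [a for a in inds if member(a, concept)]
        neg = [a for a in inds if not member(a, concept)]
        if pos and neg:  # skip splits that leave a branch empty
            score = split_score(pos, neg, z)
            if best is None or score < best[0]:
                best = (score, concept, pos, neg)
    if best is None:
        return leaf
    _, concept, pos, neg = best
    rec = dict(candidates=candidates, member=member, depth=depth + 1,
               max_depth=max_depth, min_size=min_size)
    return {"test": concept, "left": learn(pos, y, **rec),
            "right": learn(neg, y, **rec)}


y = {"a1": (1.0, 100.0), "a2": (1.2, 110.0), "a3": (5.0, 500.0), "a4": (5.2, 510.0)}
ext = {"Small": {"a1", "a2"}, "Odd": {"a1", "a3"}}  # hypothetical extensions
tree = learn(list(y), y, list(ext), lambda a, c: a in ext[c], max_depth=1)
print(tree["test"])  # Small
```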
  • 11. Installing new DL concepts as inner nodes
      • The candidate concept descriptions are generated by a refinement operator
          • a quasi-ordering relation over the space of concept descriptions: subsumption between concepts in Description Logics
          • a downward refinement operator $\rho(\cdot)$ yields specializations $E$ of a concept description $D$ ($E \sqsubseteq D$)
      • Each specialization can be obtained:
          • by introducing a new concept name (or its complement) as a conjunct
          • by replacing a sub-description in the scope of an existential restriction
          • by replacing a sub-description in the scope of a universal restriction
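The three specialization steps listed above can be sketched as a Python function over a hypothetical encoding in which a concept is a conjunction (a frozenset) of atomic literals and restrictions. Each result $E$ satisfies $E \sqsubseteq D$, because adding a conjunct can only shrink the extension and both $\exists R.\cdot$ and $\forall R.\cdot$ are monotone in their filler. This is an illustrative operator, not necessarily the authors' one; it performs no redundancy or satisfiability checks.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical encoding: a concept is a conjunction, i.e. a frozenset of
# conjuncts ("atom", A), ("not", A), ("exists", R, Conj), ("forall", R, Conj);
# the empty frozenset stands for the top concept.
Conj = FrozenSet[Tuple]


def refine(d: Conj, atoms: List[str]) -> List[Conj]:
    """Return one-step specializations E of D, each satisfying E ⊑ D."""
    out: List[Conj] = []
    used = {c[1] for c in d if c[0] in ("atom", "not")}
    # 1. add a new concept name, or its complement, as a conjunct
    for a in atoms:
        if a not in used:
            out.append(d | {("atom", a)})
            out.append(d | {("not", a)})
    # 2./3. specialize the filler in the scope of an ∃ or ∀ restriction
    for c in d:
        if c[0] in ("exists", "forall"):
            for filler in refine(c[2], atoms):
                out.append((d - {c}) | {(c[0], c[1], filler)})
    return out


top: Conj = frozenset()
print(len(refine(top, ["Comedy", "Horror"])))  # 4
film = frozenset({("atom", "Film"), ("exists", "starring", top)})
starring_actor = ("exists", "starring", frozenset({("atom", "Actor")}))
print(any(starring_actor in e for e in refine(film, ["Actor"])))  # True
```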
  • 12. Prediction
      • Given an unseen individual $a$, its property values are determined by traversing the tree
      • Given a test concept $D$:
          • if $\mathcal{K} \models D(a)$, the left branch is followed
          • if $\mathcal{K} \models \neg D(a)$, the right branch is followed
          • otherwise, a default model is returned
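Under the Open World Assumption the instance check has three outcomes, which the traversal below makes explicit. The Python sketch assumes a hypothetical oracle `entails(concept, individual)` returning True when $\mathcal{K} \models D(a)$, False when $\mathcal{K} \models \neg D(a)$, and None otherwise; since the slide does not specify the default model, it also assumes that it is the mean target vector stored at the node where the traversal stops.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple, Union

Vector = Tuple[float, ...]


@dataclass
class Leaf:
    prediction: Vector


@dataclass
class Node:
    test: str        # test concept D
    left: "Tree"     # followed when K |= D(a)
    right: "Tree"    # followed when K |= ¬D(a)
    default: Vector  # assumed default model: mean target vector at this node


Tree = Union[Leaf, Node]
# hypothetical oracle: True (K |= D(a)), False (K |= ¬D(a)) or None (unknown)
Oracle = Callable[[str, str], Optional[bool]]


def predict(tree: Tree, individual: str, entails: Oracle) -> Vector:
    """Follow the tests down the PCT; stop at the default when undecided."""
    while isinstance(tree, Node):
        answer = entails(tree.test, individual)
        if answer is True:
            tree = tree.left
        elif answer is False:
            tree = tree.right
        else:
            return tree.default  # neither D(a) nor ¬D(a) is entailed
    return tree.prediction


known = {("Comedy", "m1"): True, ("Comedy", "m2"): False}


def entails(concept: str, individual: str) -> Optional[bool]:
    return known.get((concept, individual))


tree = Node("Comedy", Leaf((8.0, 1.0)), Leaf((5.0, 2.0)), default=(6.5, 1.5))
print(predict(tree, "m1", entails))  # (8.0, 1.0)
print(predict(tree, "m3", entails))  # (6.5, 1.5)
```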
  • 13. Experiments: Settings
      • Ontologies extracted from DBpedia via crawling
      • Maximum depth for PCTs: 10, 15, 20
      • Comparison w.r.t. terminological regression trees (TRT), a multi-target k-NN regressor (with $k = \sqrt{|Tr|}$) and a multi-target linear regression model (LR)
          • atomic concepts as the feature set for the k-NN regressor and the multi-target linear regression model
      • 10-fold cross-validation
      • Performance measured in terms of RRMSE (relative root mean squared error)
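The k-NN baseline can be sketched as below, assuming each individual is described by a binary vector of memberships in atomic concepts (1 for a known instance) and compared with the Hamming distance. The distance, the rounding of $\sqrt{|Tr|}$ and the tie-breaking rule are assumptions, and all data in the example are made up.

```python
import math
from statistics import fmean
from typing import Dict, List, Sequence


def knn_predict(query: Sequence[int],
                train_x: Dict[str, Sequence[int]],
                train_y: Dict[str, Sequence[float]]) -> List[float]:
    """Multi-target k-NN: average the target vectors of the k nearest."""
    k = max(1, round(math.sqrt(len(train_x))))  # k = sqrt(|Tr|), rounded

    def hamming(u: Sequence[int], v: Sequence[int]) -> int:
        return sum(a != b for a, b in zip(u, v))

    # sort by distance, breaking ties by individual name for determinism
    nearest = sorted(train_x, key=lambda a: (hamming(query, train_x[a]), a))[:k]
    return [fmean(col) for col in zip(*(train_y[a] for a in nearest))]


# features: membership in three hypothetical atomic concepts
train_x = {"a": (1, 0, 0), "b": (1, 1, 0), "c": (0, 0, 1), "d": (0, 1, 1)}
train_y = {"a": (1.0, 10.0), "b": (3.0, 30.0), "c": (100.0, 0.0), "d": (200.0, 0.0)}
print(knn_predict((1, 0, 0), train_x, train_y))  # [2.0, 20.0]
```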
  • 14. Table: Datasets extracted from DBpedia
      Dataset | Expressiveness | #Axioms | #Classes | #Properties | #Individuals
      Fragm. #1 | ALCO | 17222 | 990 | 255 | 12053
      Fragm. #2 | ALCO | 20456 | 425 | 255 | 14400
      Fragm. #3 | ALCO | 9070 | 370 | 106 | 4499
      Table: Target properties, their ranges, and number of individuals employed in the learning problem (|Tr|)
      Dataset | Property | Range | Training individuals
      Fragm. #1 | elevation | [-654.14, 19.00] | 10000
                | populationTotal | [0.0, 2255] |
      Fragm. #2 | areaTotal | [0, 16980.1] | 10000
                | areaUrban | [0.0, 6740.74] |
                | areaMetro | [0, 652874] |
      Fragm. #3 | height | [0, 251.6] | 2256
                | weight | [-63.12, 304.25] |
  • 15. Outcomes
      Table: RRMSE averaged over the runs
      Dataset | PCT | TRT | k-NN | LR
      Fragm. #1 | 0.42 ± 0.05 | 0.63 ± 0.05 | 0.65 ± 0.02 | 0.73 ± 0.02
      Fragm. #2 | 0.25 ± 0.001 | 0.43 ± 0.02 | 0.53 ± 0.00 | 0.43 ± 0.02
      Fragm. #3 | 0.24 ± 0.05 | 0.36 ± 0.2 | 0.67 ± 0.10 | 0.73 ± 0.05
      Table: Comparison in terms of elapsed time (s)
      Dataset | Target | PCT | TRT | k-NN | LR
      Fragm. #1 | elevation | – | 2454.3 | – | –
                | populationTotal | – | 2353.0 | – | –
                | total | 2432 | 4807.3 | 547.6 | 234.5
      Fragm. #2 | areaTotal | – | 2256.0 | – | –
                | areaUrban | – | 2345.0 | – | –
                | areaMetro | – | 2345.2 | – | –
                | total | 2456 | 6946.2 | 546.2 | 235.7
      Fragm. #3 | height | – | 743.5 | – | –
                | weight | – | 743.4 | – | –
                | total | 743.3 | 1486.9 | 372.3 | 123.5
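The slides do not define RRMSE; a common definition, assumed here, divides the RMSE of the model by the RMSE of a default model that always predicts the training mean, so that values below 1 improve on that baseline: $\mathrm{RRMSE} = \sqrt{\sum_j (y_j - \hat{y}_j)^2 / \sum_j (y_j - \bar{y})^2}$. The Python sketch below computes it per target and, as a further assumption, averages it over the targets; the numbers in the example are made up.

```python
import math
from statistics import fmean
from typing import List, Sequence


def rrmse(y_true: Sequence[float], y_pred: Sequence[float],
          train_mean: float) -> float:
    """RMSE of the predictions relative to always predicting train_mean."""
    num = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    den = sum((t - train_mean) ** 2 for t in y_true)
    return math.sqrt(num / den)  # the 1/n factors cancel out


def multi_target_rrmse(y_true: List[Sequence[float]],
                       y_pred: List[Sequence[float]],
                       train_means: Sequence[float]) -> float:
    """Average the per-target RRMSE over the t target properties."""
    per_target = [
        rrmse([row[i] for row in y_true], [row[i] for row in y_pred], m)
        for i, m in enumerate(train_means)
    ]
    return fmean(per_target)


print(rrmse([0.0, 4.0], [1.0, 3.0], train_mean=2.0))  # 0.5
print(multi_target_rrmse([(0.0, 1.0), (4.0, 3.0)],
                         [(1.0, 1.0), (3.0, 3.0)], (2.0, 2.0)))  # 0.25
```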
  • 16. Discussion
      • PCTs outperform TRTs
          • the different heuristic allows more promising concepts to be chosen
          • standardization mitigated the effect of abnormal values that would otherwise increase the error
      • PCTs outperform k-NN
          • k-NN suffers from the curse of dimensionality
      • k-NN outperforms LR
          • spurious individuals were excluded when determining the local model
      • PCTs are more efficient than TRTs
  • 17. Conclusion and Further Outlook
      • We proposed an extension of predictive clustering trees, compliant with DL representation languages, for predicting datatype property values
      • Further extensions:
          • new refinement operators
          • further heuristics
          • linear models at leaf nodes
  • 18. Questions?