Approximating Numeric Role Fillers via
Predictive Clustering Trees for Knowledge
Base Enrichment in the Web of data
Giuseppe Rizzo, Claudia d’Amato, Nicola Fanizzi, Floriana Esposito
Discovery Science 2016, Bari, 19th October 2016
G.Rizzo et al. (Univ. of Bari) 19th October 2016 1 / 16
Outline
1 The Context and Motivations
2 Basics
3 The approach
4 Empirical Evaluation
5 Conclusion & Further Extensions
The Context and Motivations
• Goal: determine the values of numeric properties (used as attributes) for a resource in a Web of Data knowledge base
• Web of Data: many knowledge bases exposed in standard formats (RDF, OWL)
• Properties link two resources, or a resource and a literal (e.g. a string or a numeric value)
• Inference services may fail to determine such values because of the Open World Assumption: a missing value is unknown, not false
• Solution: cast the task as a multi-target regression problem
• Predictive Clustering Trees (PCTs) for Web of Data representations (e.g. DLs)
  • to predict the most plausible values
  • to elicit rules (e.g. SWRL rules) for enriching the schema of a knowledge base
Description Logics
Syntax & Semantics
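• Atomic concepts (classes), $N_C$
• Roles (binary relations), $N_R$
• Concrete domains: string, boolean, numeric values
• Operators to build complex concept descriptions
• Semantics defined through interpretations $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$
  • $\Delta^{\mathcal{I}}$: domain of the interpretation
  • $\cdot^{\mathcal{I}}$: interpretation function
    • for each concept $C \in N_C$, $C^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}$
    • for each role $R \in N_R$, $R^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$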
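To make the set-theoretic semantics concrete, here is a minimal Python sketch of one finite interpretation and of how the extension of a complex concept description follows from the extensions of its parts. It is only an illustration under stated assumptions: the domain, the names `Comedy`, `Movie`, `Actor` and `starring`, and the tuple encoding of concept descriptions are all hypothetical, and DL reasoners do not work by evaluating a single interpretation like this.

```python
from typing import Dict, Set, Tuple, Union

# A concept description is an atomic name or a tuple (hypothetical encoding):
# ("not", C), ("and", C, D), ("some", R, C), ("all", R, C)
Concept = Union[str, tuple]

# Toy finite interpretation I = (Delta, .^I); every name here is hypothetical.
DELTA: Set[str] = {"m1", "m2", "a1", "a2"}
CONCEPT_EXT: Dict[str, Set[str]] = {
    "Comedy": {"m1"},
    "Movie": {"m1", "m2"},
    "Actor": {"a1"},
}
ROLE_EXT: Dict[str, Set[Tuple[str, str]]] = {
    "starring": {("m1", "a1"), ("m2", "a2")},
}


def ext(c: Concept) -> Set[str]:
    """Return the extension C^I of concept description c in the toy interpretation."""
    if isinstance(c, str):                      # atomic concept: C^I ⊆ Delta is given
        return CONCEPT_EXT.get(c, set())
    op = c[0]
    if op == "not":                             # (¬C)^I = Delta \ C^I
        return DELTA - ext(c[1])
    if op == "and":                             # (C ⊓ D)^I = C^I ∩ D^I
        return ext(c[1]) & ext(c[2])
    if op in ("some", "all"):
        _, role, filler = c
        pairs = ROLE_EXT.get(role, set())       # R^I ⊆ Delta × Delta
        fill = ext(filler)
        if op == "some":                        # (∃R.C)^I: at least one R-successor in C^I
            return {x for x in DELTA if any(s == x and y in fill for s, y in pairs)}
        # (∀R.C)^I: every R-successor (possibly none) lies in C^I
        return {x for x in DELTA if all(y in fill for s, y in pairs if s == x)}
    raise ValueError(f"unknown operator: {op!r}")


print(sorted(ext(("some", "starring", "Actor"))))          # ['m1']
print(sorted(ext(("and", "Movie", ("not", "Comedy")))))    # ['m2']
```

Note that `ext(("all", "starring", "Actor"))` also contains `a1` and `a2`: they have no `starring` successors, so the universal restriction holds vacuously for them.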
Description Logics
Knowledge bases
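• Knowledge base: a pair $\mathcal{K} = (\mathcal{T}, \mathcal{A})$ where
  • $\mathcal{T}$ (TBox): axioms concerning concepts/roles
    • Subsumption axioms $C \sqsubseteq D$: an interpretation $\mathcal{I}$ satisfies the axiom iff $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$
    • Equivalence axioms $C \equiv D$: an interpretation $\mathcal{I}$ satisfies the axiom iff $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$ and $D^{\mathcal{I}} \subseteq C^{\mathcal{I}}$, i.e. $C^{\mathcal{I}} = D^{\mathcal{I}}$
  • $\mathcal{A}$ (ABox): assertions about a set of individuals, denoted by $\mathrm{Ind}(\mathcal{A})$
    • class assertions, $C(a)$
    • role assertions, $R(a, b)$ ($b$ is called the role filler)
• Reasoning services:
  • subsumption: deciding whether a concept $D$ is more general than a given concept $C$, i.e. $\mathcal{K} \models C \sqsubseteq D$
  • satisfiability: given a concept description $C$, deciding whether there is an interpretation (a model of $\mathcal{K}$) $\mathcal{I}$ such that $C^{\mathcal{I}} \neq \emptyset$
  • instance checking: deciding whether $\mathcal{K} \models C(a)$, i.e. whether $a^{\mathcal{I}} \in C^{\mathcal{I}}$ holds in every model $\mathcal{I}$ of $\mathcal{K}$ ($a$ is an instance of $C$)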
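Because the Open World Assumption holds, an instance check has three possible outcomes: $\mathcal{K} \models C(a)$, $\mathcal{K} \models \neg C(a)$, or neither. The Python sketch below illustrates this for a deliberately restricted case, atomic subsumption axioms plus (possibly negated) atomic class assertions. It is not a DL reasoner, and the axioms, assertions and individuals (`bob`, `carl`) are hypothetical.

```python
from typing import Optional, Set, Tuple

# Hypothetical TBox: atomic subsumption axioms C ⊑ D stored as (C, D) pairs.
TBOX: Set[Tuple[str, str]] = {
    ("AmericanFootballPlayer", "Athlete"),
    ("Athlete", "Person"),
}
# Hypothetical ABox: class assertions C(a) and negated ones ¬C(a), as (C, a) pairs.
POS: Set[Tuple[str, str]] = {("AmericanFootballPlayer", "bob")}
NEG: Set[Tuple[str, str]] = {("Athlete", "carl")}


def superclasses(c: str) -> Set[str]:
    """Atomic concepts D such that C ⊑ D follows from the TBox by transitivity (C included)."""
    seen, todo = {c}, [c]
    while todo:
        current = todo.pop()
        for sub, sup in TBOX:
            if sub == current and sup not in seen:
                seen.add(sup)
                todo.append(sup)
    return seen


def instance_check(c: str, a: str) -> Optional[bool]:
    """True if K ⊨ C(a), False if K ⊨ ¬C(a), None if neither follows (Open World Assumption)."""
    # C(a) follows if a is asserted to belong to some D with D ⊑ C.
    if any(c in superclasses(d) for d, ind in POS if ind == a):
        return True
    # ¬C(a) follows if ¬D(a) is asserted and C ⊑ D (contrapositive of the subsumption).
    if any(d in superclasses(c) for d, ind in NEG if ind == a):
        return False
    return None


print(instance_check("Person", "bob"))                    # True
print(instance_check("AmericanFootballPlayer", "carl"))   # False
print(instance_check("Person", "carl"))                   # None
```

The last call shows the open-world behaviour: knowing that `carl` is not an athlete says nothing about whether `carl` is a person. The sketch also assumes the ABox is consistent.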
Semantic Web Rules Language (SWRL)
• Datalog-like representation language
• Adds expressiveness to DLs
• Syntax:
  • term: a (universally quantified) variable $x$ or a constant $c$
  • atom: a unary or binary predicate, $C(t_1)$ or $R(t_1, t_2)$, where the predicate symbols are concept and role names and the $t_i$ are terms
  • rule: an implication between an antecedent (body) and a consequent (head)
    $B_1 \wedge \dots \wedge B_n \rightarrow H_1 \wedge \dots \wedge H_m$
We are interested in safe rules (each variable occurring in the head must also occur in the body)
• Open-World Assumption holds
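The safety condition and the effect of applying a rule can be illustrated with a small Python sketch. The encoding is hypothetical (atoms as `(predicate, terms)` pairs, variables prefixed with `?` as in SWRL's human-readable syntax), only positive atoms are supported, and rules are applied by naive grounding over the asserted facts rather than by an OWL reasoner.

```python
from itertools import product
from typing import List, Set, Tuple

# An atom is (predicate, terms); terms starting with "?" are variables (hypothetical encoding).
Atom = Tuple[str, Tuple[str, ...]]


def variables(atoms: List[Atom]) -> Set[str]:
    """Collect the variables occurring in a list of atoms."""
    return {t for _, terms in atoms for t in terms if t.startswith("?")}


def is_safe(body: List[Atom], head: List[Atom]) -> bool:
    """Safe rule: every variable occurring in the head also occurs in the body."""
    return variables(head) <= variables(body)


def ground(atom: Atom, theta: dict) -> Atom:
    """Apply the substitution theta to the terms of an atom."""
    pred, terms = atom
    return pred, tuple(theta.get(t, t) for t in terms)


def apply_rule(body: List[Atom], head: List[Atom], facts: Set[Atom]) -> Set[Atom]:
    """Derive head atoms for every binding of the body variables to known constants
    under which all body atoms are asserted facts."""
    if not is_safe(body, head):
        raise ValueError("unsafe rule: a head variable does not occur in the body")
    constants = sorted({t for _, terms in facts for t in terms})
    body_vars = sorted(variables(body))
    derived: Set[Atom] = set()
    for values in product(constants, repeat=len(body_vars)):
        theta = dict(zip(body_vars, values))
        if all(ground(b, theta) in facts for b in body):
            derived |= {ground(h, theta) for h in head}
    return derived


# Hypothetical facts and rule.
facts = {("Athlete", ("bob",)), ("AmericanFootballPlayer", ("bob",)), ("Athlete", ("carl",))}
body = [("Athlete", ("?x",)), ("AmericanFootballPlayer", ("?x",))]
head = [("height", ("?x", "195.4"))]
print(apply_rule(body, head, facts))                   # {('height', ('bob', '195.4'))}
print(is_safe(body, [("height", ("?y", "195.4"))]))    # False
```

The rule does not fire for `carl`: `AmericanFootballPlayer(carl)` is simply not asserted, and under the Open World Assumption its absence licenses no conclusion.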
The model for multi-target regression
• PCT for multi-target regression: a binary tree where
  • inner nodes: DL conjunctive concept descriptions
  • leaf nodes: vectors with the approximated values of the target properties
[Figure: example PCT. Inner nodes include Comedy, Comedy ⊓ ∃starring.Actor and ¬Comedy ⊓ ¬Horror; leaves hold target vectors p = (8.45, 9810666), p = (5.38, 4200000), p = (4.7, 4200000), p = (8.6, 4930000)]
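The tree can be represented with two node types. The following Python sketch is a hypothetical encoding, not the authors' implementation: concepts are kept as plain string labels, and the example tree's shape and leaf values are illustrative rather than read off the figure.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    values: Tuple[float, ...]   # approximated value for each target property


@dataclass
class Inner:
    concept: str                # DL conjunctive concept description D (string label)
    left: "Node"                # subtree for instances of D
    right: "Node"               # subtree for instances of ¬D


Node = Union[Leaf, Inner]


def leaf_vectors(node: Node) -> List[Tuple[float, ...]]:
    """Collect the leaf vectors from left to right."""
    if isinstance(node, Leaf):
        return [node.values]
    return leaf_vectors(node.left) + leaf_vectors(node.right)


def depth(node: Node) -> int:
    """Number of inner-node levels on the longest root-to-leaf path."""
    if isinstance(node, Leaf):
        return 0
    return 1 + max(depth(node.left), depth(node.right))


# Illustrative two-target tree (hypothetical values).
tree = Inner("Comedy",
             Inner("Comedy ⊓ ∃starring.Actor", Leaf((8.0, 9.0e6)), Leaf((5.0, 4.0e6))),
             Leaf((6.5, 4.5e6)))
print(depth(tree))          # 2
print(leaf_vectors(tree))   # [(8.0, 9000000.0), (5.0, 4000000.0), (6.5, 4500000.0)]
```

Keeping one vector per leaf, rather than one tree per target, is what lets a single PCT approximate all target properties at once.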
Learning PCTs in DLs
• Divide-and-conquer approach
  • Training set: individuals whose target property values are known
  • partitioning according to membership w.r.t. a newly generated concept
• Refinement operator for generating the concepts
  • by introducing a new concept name (or its complement)
  • by replacing a sub-description with an existential restriction
  • by replacing a sub-description with a universal restriction
• Best concept: the one minimizing the RMSE of the standardized values of the target properties
• Stop conditions: maximum number of levels or minimum size of the training (sub)set
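The divide-and-conquer loop above can be sketched in Python as follows. This is a simplified illustration under explicit assumptions: the refinement operator is replaced by a hypothetical `candidates` callback returning fixed `(label, membership test)` pairs, membership is treated as Boolean on the training set, targets are z-scored at each node, and each leaf stores the mean of the unstandardized targets. None of these choices is confirmed by the slides.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Vec = Tuple[float, ...]
# Hypothetical stand-in for the refinement operator: given the current individuals,
# return candidate concepts as (label, membership test) pairs.
Candidates = Callable[[List[str]], List[Tuple[str, Callable[[str], bool]]]]


@dataclass
class Leaf:
    values: Vec


@dataclass
class Inner:
    concept: str
    left: object
    right: object


def mean_vec(rows: List[Vec]) -> Vec:
    return tuple(sum(col) / len(rows) for col in zip(*rows))


def standardize(rows: List[Vec]) -> List[Vec]:
    """z-score each target property so that large-scale targets do not dominate."""
    mu = mean_vec(rows)
    sd = [math.sqrt(sum((r[j] - mu[j]) ** 2 for r in rows) / len(rows)) or 1.0
          for j in range(len(mu))]
    return [tuple((r[j] - mu[j]) / sd[j] for j in range(len(mu))) for r in rows]


def split_rmse(z: List[Vec], mask: List[bool]) -> float:
    """RMSE over all standardized targets when each side of the split predicts its own mean."""
    sq_err, count = 0.0, 0
    for side in (True, False):
        part = [v for v, m in zip(z, mask) if m == side]
        if part:
            centre = mean_vec(part)
            sq_err += sum((x - c) ** 2 for v in part for x, c in zip(v, centre))
            count += len(part) * len(centre)
    return math.sqrt(sq_err / count)


def learn(inds: List[str], targets: Dict[str, Vec], candidates: Candidates,
          depth: int = 0, max_depth: int = 10, min_size: int = 5):
    rows = [targets[i] for i in inds]
    if depth >= max_depth or len(inds) < min_size:        # stop conditions
        return Leaf(mean_vec(rows))
    z = standardize(rows)
    best = None
    for label, is_instance in candidates(inds):
        mask = [is_instance(i) for i in inds]
        if all(mask) or not any(mask):                    # concept does not split the set
            continue
        score = split_rmse(z, mask)
        if best is None or score < best[0]:
            best = (score, label, mask)
    if best is None:
        return Leaf(mean_vec(rows))
    _, label, mask = best
    pos = [i for i, m in zip(inds, mask) if m]
    neg = [i for i, m in zip(inds, mask) if not m]
    return Inner(label,
                 learn(pos, targets, candidates, depth + 1, max_depth, min_size),
                 learn(neg, targets, candidates, depth + 1, max_depth, min_size))


# Hypothetical training data: (height, weight) for six individuals.
targets = {"p1": (195.0, 113.0), "p2": (196.0, 114.0), "p3": (194.0, 112.0),
           "p4": (187.0, 88.0), "p5": (186.0, 87.0), "p6": (188.0, 89.0)}
afp = {"p1", "p2", "p3"}
other = {"p1", "p4"}
cands = lambda inds: [("AmericanFootballPlayer", lambda i: i in afp),
                      ("SomeOtherConcept", lambda i: i in other)]
tree = learn(list(targets), targets, cands, max_depth=1, min_size=2)
print(tree.concept, tree.left.values, tree.right.values)
# AmericanFootballPlayer (195.0, 113.0) (187.0, 88.0)
```

Standardizing before scoring means that a target measured in millions (e.g. a population) cannot dominate the split choice over a target measured in units.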
Exploiting PCTs
• Prediction: given an individual $a$, the algorithm traverses the tree according to instance checks w.r.t. the inner concepts $D$
  • if $\mathcal{K} \models D(a)$, the left branch is followed
  • if $\mathcal{K} \models \neg D(a)$, the right branch is followed
  • otherwise, a default model is returned
• Eliciting SWRL rules: traversing the tree recursively and collecting the intermediate concepts along each branch
  • Body: the intermediate concept descriptions, used as predicates
  • Head: each target property name as the predicate name, with the approximated value as a term
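Both uses of a learned tree can be sketched in Python. Assumptions, all hypothetical: the instance check is passed in as a function returning `True`, `False` or `None` (unknown), the "default model" is a fixed vector supplied by the caller, and the `Person`, `Athlete` body prefix is given explicitly so that the output mirrors the rules on the "Examples of discovered rules" slide.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple, Union


@dataclass
class Leaf:
    values: Tuple[float, ...]


@dataclass
class Inner:
    concept: str
    left: "Node"
    right: "Node"


Node = Union[Leaf, Inner]
# check(D, a): True if K ⊨ D(a), False if K ⊨ ¬D(a), None if neither is entailed.
Check = Callable[[str, str], Optional[bool]]


def predict(node: Node, a: str, check: Check, default: Tuple[float, ...]) -> Tuple[float, ...]:
    """Follow instance checks down the tree; fall back to the default model when unknown."""
    while isinstance(node, Inner):
        answer = check(node.concept, a)
        if answer is True:
            node = node.left
        elif answer is False:
            node = node.right
        else:                                   # membership undecided under the OWA
            return default
    return node.values


def extract_rules(node: Node, targets: List[str], body: Tuple[str, ...] = ()) -> List[str]:
    """One rule per (leaf, target property): path concepts as body, leaf value in the head."""
    if isinstance(node, Leaf):
        lhs = " ∧ ".join(f"{c}(x)" for c in body)
        return [f"{lhs} → {t}(x, {v})" for t, v in zip(targets, node.values)]
    return (extract_rules(node.left, targets, body + (node.concept,))
            + extract_rules(node.right, targets, body + ("¬" + node.concept,)))


# Hypothetical knowledge about individuals and a hypothetical default model output.
known = {("AmericanFootballPlayer", "bob"): True, ("AmericanFootballPlayer", "carl"): False}
check = lambda concept, a: known.get((concept, a))
DEFAULT = (191.0, 100.0)
tree = Inner("AmericanFootballPlayer", Leaf((195.4, 113.5)), Leaf((187.0, 87.5)))

print(predict(tree, "bob", check, DEFAULT))    # (195.4, 113.5)
print(predict(tree, "dave", check, DEFAULT))   # (191.0, 100.0)
for rule in extract_rules(tree, ["height", "weight"], ("Person", "Athlete")):
    print(rule)
```

Each leaf yields one rule per target property, which is why the four example rules come in height/weight pairs.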
Experiments
Small ontologies: Settings
• Maximum depth for PCTs: 10, 15, 20
• Comparison w.r.t. Terminological Regression Trees (TRTs), a multi-target k-NN regressor (with $k = \sqrt{|Tr|}$) and a multi-target linear regression (LR) model
  • atomic concepts as the feature set for the k-NN regressor and the LR model
• 0.632 bootstrap
• performance in terms of RMSE
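The two numeric settings above can be made concrete with a short Python sketch. The 0.632 bootstrap weights the (optimistic) resubstitution error by 0.368 and the (pessimistic) out-of-bag error by 0.632. The `error` callback, the number of replications, the seed and the rounding of $\sqrt{|Tr|}$ down to an integer are assumptions, not details given on the slides.

```python
import math
import random
from typing import Callable, List

# Hypothetical callback: RMSE of a model fitted on train_idx and evaluated on eval_idx.
ErrorFn = Callable[[List[int], List[int]], float]


def bootstrap_632(n: int, error: ErrorFn, reps: int = 10, seed: int = 0) -> float:
    """0.632 bootstrap estimate averaged over `reps` replications."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        train = [rng.randrange(n) for _ in range(n)]   # draw n indices with replacement
        oob = sorted(set(range(n)) - set(train))        # never-drawn (out-of-bag) indices
        if not oob:
            continue
        e_resub = error(train, train)                   # optimistic resubstitution error
        e_oob = error(train, oob)                       # pessimistic out-of-bag error
        estimates.append(0.368 * e_resub + 0.632 * e_oob)
    if not estimates:
        raise ValueError("no replication produced out-of-bag individuals")
    return sum(estimates) / len(estimates)


def knn_k(train_size: int) -> int:
    """k = sqrt(|Tr|), rounded down (the rounding rule is an assumption)."""
    return max(1, math.isqrt(train_size))


# Usage with a hypothetical single-target mean predictor.
y = [1.0, 2.0, 2.5, 3.0, 4.0, 5.5, 6.0, 7.0]


def mean_predictor_rmse(train: List[int], evaluate: List[int]) -> float:
    mu = sum(y[i] for i in train) / len(train)
    return math.sqrt(sum((y[i] - mu) ** 2 for i in evaluate) / len(evaluate))


print(round(bootstrap_632(len(y), mean_predictor_rmse), 3))
print(knn_k(10000), knn_k(2256))   # 100 47
```

With the training-set sizes listed in the appendix table ($|Tr|$ = 10000 and 2256), this rounding gives $k$ = 100 and $k$ = 47.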
Experiments
Linked Data datasets: Settings
• Ontologies extracted from DBpedia via crawling
• Maximum depth for PCTs: 10, 15, 20
• Comparison w.r.t. TRTs, k-NN (with $k = \sqrt{|Tr|}$) and LR
• 10-fold cross-validation
• performance in terms of RRMSE (relative RMSE)
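The slides do not spell out the RRMSE formula. A common definition divides the model's RMSE by the RMSE of a baseline that always predicts a constant (typically the training-set mean), so values below 1 mean the model beats the baseline. The Python sketch below assumes that definition, averages it over the target properties, and adds a plain shuffled 10-fold split; function names and the seed are hypothetical.

```python
import math
import random
from typing import List, Sequence


def rrmse(y_true: Sequence[float], y_pred: Sequence[float], baseline: float) -> float:
    """RMSE of the model divided by the RMSE of always predicting `baseline`."""
    num = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    den = sum((t - baseline) ** 2 for t in y_true)
    return math.sqrt(num / den)


def multi_target_rrmse(Y_true: List[Sequence[float]], Y_pred: List[Sequence[float]],
                       baselines: Sequence[float]) -> float:
    """Average the per-target RRMSE over all target properties."""
    scores = [rrmse([row[j] for row in Y_true], [row[j] for row in Y_pred], b)
              for j, b in enumerate(baselines)]
    return sum(scores) / len(scores)


def k_folds(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


print(rrmse([1.0, 2.0, 3.0], [2.0, 2.0, 2.0], baseline=2.0))   # 1.0
print(rrmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], baseline=2.0))   # 0.0
```

A model that only reproduces the baseline scores exactly 1.0, which makes RRMSE values comparable across targets with very different ranges (e.g. `areaMetro` versus `areaUrban`).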
Experiments
Small ontologies: Outcomes
RMSE averaged over the replications (± standard deviation)
Ontology PCT TRT k-NN LR
BCO 0.0277 ± 0.01 0.0356 ± 0.01 0.0472 ± 0.01 0.0554 ± 0.01
BioPax 132 ± 11.0 145 ± 12.0 186 ± 7.00 195 ± 8.85
geopolitical 0.0284 ± 0.01 0.03561 ± 0.03 0.057 ± 0.03 0.06 ± 0.02
monetary 7.52 ± 0.15 8.46 ± 0.07 7.53 ± 0.17 7.78 ± 0.34
mutagenesis 0.0445 ± 0.07 0.0637 ± 0.03 0.0547 ± 0.02 0.0647 ± 0.05
Experiments
Linked Data datasets: Outcomes
Table: RRMSE averaged over the runs
Datasets PCT TRT k-NN LR
Fragm. #1 0.42 ± 0.05 0.63 ± 0.05 0.65 ± 0.02 0.73 ± 0.02
Fragm. #2 0.25 ± 0.001 0.43 ± 0.02 0.53 ± 0.00 0.43 ± 0.02
Fragm. #3 0.24 ± 0.05 0.36 ± 0.2 0.67 ± 0.10 0.73 ± 0.05
Table: Comparison in terms of elapsed times (secs)
Dataset    Property          PCT     TRT      k-NN    LR
Fragm. #1  elevation                 2454.3
           populationTotal           2353.0
           total             2432    4807.3   547.6   234.5
Fragm. #2  areaTotal                 2256.0
           areaUrban                 2345.0
           areaMetro                 2345.2
           total             2456    6946.2   546.2   235.7
Fragm. #3  height                    743.5
           weight                    743.4
           total             743.3   1486.9   372.3   123.5
Discussion
• PCTs outperform TRTs
  • the different heuristic allows more promising concepts to be chosen
  • standardization mitigates the effect of abnormal values
• PCTs outperform k-NN
  • curse of dimensionality
• k-NN generally outperforms LR
  • spurious individuals are excluded when determining the local model
• PCTs are more efficient than TRTs
Examples of discovered rules
According to the discovered rules, an American football player is taller and heavier than an athlete who does not play American football:
Person(x) ∧ Athlete(x) ∧ AmericanFootballPlayer(x) → height(x, 195.4)
Person(x) ∧ Athlete(x) ∧ AmericanFootballPlayer(x) → weight(x, 113.5)
Person(x) ∧ Athlete(x) ∧ ¬AmericanFootballPlayer(x) → height(x, 187)
Person(x) ∧ Athlete(x) ∧ ¬AmericanFootballPlayer(x) → weight(x, 87.5)
Conclusion and Further Outlooks
• We proposed an extension of predictive clustering trees, compliant with DL representation languages, for predicting datatype property values and discovering rules
• The outcomes are promising
• Further extensions
  • new refinement operators
  • further heuristics
  • linear models at leaf nodes
Questions?
Table: Datasets extracted from DBpedia
Dataset    Expressiveness  #Axioms  #Classes  #Properties  #Individuals
Fragm. #1  ALCO            17222    990       255          12053
Fragm. #2  ALCO            20456    425       255          14400
Fragm. #3  ALCO            9070     370       106          4499
Table: Target property ranges and number of individuals employed in the learning problem (|Tr|)
Dataset    Property         Range                |Tr|
Fragm. #1  elevation        [-654.14, 19.00]     10000
           populationTotal  [0.0, 2255]
Fragm. #2  areaTotal        [0, 16980.1]         10000
           areaUrban        [0.0, 6740.74]
           areaMetro        [0, 652874]
Fragm. #3  height           [0, 251.6]           2256
           weight           [-63.12, 304.25]

More Related Content

Similar to Approximating Numeric Role Fillers via Predictive Clustering Trees for Knowledge Base Enrichmenent in the Web of Data

Inducing Predictive Clustering Trees for Datatype properties Values
Inducing Predictive Clustering Trees for Datatype properties ValuesInducing Predictive Clustering Trees for Datatype properties Values
Inducing Predictive Clustering Trees for Datatype properties ValuesGiuseppe Rizzo
 
Propagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data FlowsPropagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data FlowsEnrico Daga
 
The lifecycle of reproducible science data and what provenance has got to do ...
The lifecycle of reproducible science data and what provenance has got to do ...The lifecycle of reproducible science data and what provenance has got to do ...
The lifecycle of reproducible science data and what provenance has got to do ...Paolo Missier
 
Data curation issues for repositories
Data curation issues for repositoriesData curation issues for repositories
Data curation issues for repositoriesChris Rusbridge
 
Teaching & Learning with Technology TLT 2016
Teaching & Learning with Technology TLT 2016Teaching & Learning with Technology TLT 2016
Teaching & Learning with Technology TLT 2016Roy Clariana
 
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...Armin Haller
 
Franz et al ice 2016 addressing the name meaning drift challenge in open ende...
Franz et al ice 2016 addressing the name meaning drift challenge in open ende...Franz et al ice 2016 addressing the name meaning drift challenge in open ende...
Franz et al ice 2016 addressing the name meaning drift challenge in open ende...taxonbytes
 
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494Sean Golliher
 
The science behind predictive analytics a text mining perspective
The science behind predictive analytics  a text mining perspectiveThe science behind predictive analytics  a text mining perspective
The science behind predictive analytics a text mining perspectiveankurpandeyinfo
 
Albert Merono-Penuela: Understanding Change in Versioned Web-Knowledge Organi...
Albert Merono-Penuela: Understanding Change in Versioned Web-Knowledge Organi...Albert Merono-Penuela: Understanding Change in Versioned Web-Knowledge Organi...
Albert Merono-Penuela: Understanding Change in Versioned Web-Knowledge Organi...COST Action TD1210
 
Applying tensor decompositions to author name disambiguation of common Japane...
Applying tensor decompositions to author name disambiguation of common Japane...Applying tensor decompositions to author name disambiguation of common Japane...
Applying tensor decompositions to author name disambiguation of common Japane...National Institute of Informatics
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and KnowledgeIan Foster
 
Interpretation, Context, and Metadata: Examples from Open Context
Interpretation, Context, and Metadata: Examples from Open ContextInterpretation, Context, and Metadata: Examples from Open Context
Interpretation, Context, and Metadata: Examples from Open ContextEric Kansa
 
A Visual Exploration of Distance, Documents, and Distributions
A Visual Exploration of Distance, Documents, and DistributionsA Visual Exploration of Distance, Documents, and Distributions
A Visual Exploration of Distance, Documents, and DistributionsRebecca Bilbro
 
Words in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroWords in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroPyData
 
Data Tactics Data Science Brown Bag (April 2014)
Data Tactics Data Science Brown Bag (April 2014)Data Tactics Data Science Brown Bag (April 2014)
Data Tactics Data Science Brown Bag (April 2014)Rich Heimann
 
Data Science and Analytics Brown Bag
Data Science and Analytics Brown BagData Science and Analytics Brown Bag
Data Science and Analytics Brown BagDataTactics
 

Similar to Approximating Numeric Role Fillers via Predictive Clustering Trees for Knowledge Base Enrichmenent in the Web of Data (20)

Inducing Predictive Clustering Trees for Datatype properties Values
Inducing Predictive Clustering Trees for Datatype properties ValuesInducing Predictive Clustering Trees for Datatype properties Values
Inducing Predictive Clustering Trees for Datatype properties Values
 
Propagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data FlowsPropagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data Flows
 
The lifecycle of reproducible science data and what provenance has got to do ...
The lifecycle of reproducible science data and what provenance has got to do ...The lifecycle of reproducible science data and what provenance has got to do ...
The lifecycle of reproducible science data and what provenance has got to do ...
 
Data curation issues for repositories
Data curation issues for repositoriesData curation issues for repositories
Data curation issues for repositories
 
Teaching & Learning with Technology TLT 2016
Teaching & Learning with Technology TLT 2016Teaching & Learning with Technology TLT 2016
Teaching & Learning with Technology TLT 2016
 
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
 
Franz et al ice 2016 addressing the name meaning drift challenge in open ende...
Franz et al ice 2016 addressing the name meaning drift challenge in open ende...Franz et al ice 2016 addressing the name meaning drift challenge in open ende...
Franz et al ice 2016 addressing the name meaning drift challenge in open ende...
 
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
 
The science behind predictive analytics a text mining perspective
The science behind predictive analytics  a text mining perspectiveThe science behind predictive analytics  a text mining perspective
The science behind predictive analytics a text mining perspective
 
Kdd by Mr.Sameer Kumar Das
Kdd by Mr.Sameer Kumar DasKdd by Mr.Sameer Kumar Das
Kdd by Mr.Sameer Kumar Das
 
LDA on social bookmarking systems
LDA on social bookmarking systemsLDA on social bookmarking systems
LDA on social bookmarking systems
 
Albert Merono-Penuela: Understanding Change in Versioned Web-Knowledge Organi...
Albert Merono-Penuela: Understanding Change in Versioned Web-Knowledge Organi...Albert Merono-Penuela: Understanding Change in Versioned Web-Knowledge Organi...
Albert Merono-Penuela: Understanding Change in Versioned Web-Knowledge Organi...
 
Applying tensor decompositions to author name disambiguation of common Japane...
Applying tensor decompositions to author name disambiguation of common Japane...Applying tensor decompositions to author name disambiguation of common Japane...
Applying tensor decompositions to author name disambiguation of common Japane...
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and Knowledge
 
Probabilistic Topic models
Probabilistic Topic modelsProbabilistic Topic models
Probabilistic Topic models
 
Interpretation, Context, and Metadata: Examples from Open Context
Interpretation, Context, and Metadata: Examples from Open ContextInterpretation, Context, and Metadata: Examples from Open Context
Interpretation, Context, and Metadata: Examples from Open Context
 
A Visual Exploration of Distance, Documents, and Distributions
A Visual Exploration of Distance, Documents, and DistributionsA Visual Exploration of Distance, Documents, and Distributions
A Visual Exploration of Distance, Documents, and Distributions
 
Words in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroWords in Space - Rebecca Bilbro
Words in Space - Rebecca Bilbro
 
Data Tactics Data Science Brown Bag (April 2014)
Data Tactics Data Science Brown Bag (April 2014)Data Tactics Data Science Brown Bag (April 2014)
Data Tactics Data Science Brown Bag (April 2014)
 
Data Science and Analytics Brown Bag
Data Science and Analytics Brown BagData Science and Analytics Brown Bag
Data Science and Analytics Brown Bag
 

More from Giuseppe Rizzo

Boosting dl concept learners
Boosting dl concept learners Boosting dl concept learners
Boosting dl concept learners Giuseppe Rizzo
 
DL-Foil:Class Expression Learning Revisited
DL-Foil:Class Expression Learning RevisitedDL-Foil:Class Expression Learning Revisited
DL-Foil:Class Expression Learning RevisitedGiuseppe Rizzo
 
A framework for Tackling myopia in concept learning on the Web of Data
A framework for Tackling myopia in concept learning on the Web of DataA framework for Tackling myopia in concept learning on the Web of Data
A framework for Tackling myopia in concept learning on the Web of DataGiuseppe Rizzo
 
Terminological cluster trees for Disjointness Axiom Discovery
Terminological cluster trees for Disjointness Axiom DiscoveryTerminological cluster trees for Disjointness Axiom Discovery
Terminological cluster trees for Disjointness Axiom DiscoveryGiuseppe Rizzo
 
On the Effectiveness of Evidence-based Terminological Decision Trees
On the Effectiveness of Evidence-based Terminological Decision TreesOn the Effectiveness of Evidence-based Terminological Decision Trees
On the Effectiveness of Evidence-based Terminological Decision TreesGiuseppe Rizzo
 
Inductive Classification through Evidence-based Models and Their Ensemble
Inductive Classification through Evidence-based Models and Their EnsembleInductive Classification through Evidence-based Models and Their Ensemble
Inductive Classification through Evidence-based Models and Their EnsembleGiuseppe Rizzo
 
Tackling the Class Imbalance Learning Problem in Semantic Web Knowledge bases
Tackling the Class Imbalance Learning Problem in Semantic Web Knowledge basesTackling the Class Imbalance Learning Problem in Semantic Web Knowledge bases
Tackling the Class Imbalance Learning Problem in Semantic Web Knowledge basesGiuseppe Rizzo
 
Towards Evidence Terminological Decision Tree
Towards Evidence Terminological Decision TreeTowards Evidence Terminological Decision Tree
Towards Evidence Terminological Decision TreeGiuseppe Rizzo
 

More from Giuseppe Rizzo (8)

Boosting dl concept learners
Boosting dl concept learners Boosting dl concept learners
Boosting dl concept learners
 
DL-Foil:Class Expression Learning Revisited
DL-Foil:Class Expression Learning RevisitedDL-Foil:Class Expression Learning Revisited
DL-Foil:Class Expression Learning Revisited
 
A framework for Tackling myopia in concept learning on the Web of Data
A framework for Tackling myopia in concept learning on the Web of DataA framework for Tackling myopia in concept learning on the Web of Data
A framework for Tackling myopia in concept learning on the Web of Data
 
Terminological cluster trees for Disjointness Axiom Discovery
Terminological cluster trees for Disjointness Axiom DiscoveryTerminological cluster trees for Disjointness Axiom Discovery
Terminological cluster trees for Disjointness Axiom Discovery
 
On the Effectiveness of Evidence-based Terminological Decision Trees
On the Effectiveness of Evidence-based Terminological Decision TreesOn the Effectiveness of Evidence-based Terminological Decision Trees
On the Effectiveness of Evidence-based Terminological Decision Trees
 
Inductive Classification through Evidence-based Models and Their Ensemble
Inductive Classification through Evidence-based Models and Their EnsembleInductive Classification through Evidence-based Models and Their Ensemble
Inductive Classification through Evidence-based Models and Their Ensemble
 
Tackling the Class Imbalance Learning Problem in Semantic Web Knowledge bases
Tackling the Class Imbalance Learning Problem in Semantic Web Knowledge basesTackling the Class Imbalance Learning Problem in Semantic Web Knowledge bases
Tackling the Class Imbalance Learning Problem in Semantic Web Knowledge bases
 
Towards Evidence Terminological Decision Tree
Towards Evidence Terminological Decision TreeTowards Evidence Terminological Decision Tree
Towards Evidence Terminological Decision Tree
 

Recently uploaded

Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bSérgio Sacani
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.Nitya salvi
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...ssifa0344
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡anilsa9823
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSSLeenakshiTyagi
 

Recently uploaded (20)

Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSS
 

Approximating Numeric Role Fillers via Predictive Clustering Trees for Knowledge Base Enrichmenent in the Web of Data

  • 1. Approximating Numeric Role Fillers via Predictive Clustering Trees for Knowledge Base Enrichment in the Web of data Giuseppe Rizzo, Claudia d’Amato, Nicola Fanizzi, Floriana Esposito Discovery Science 2016, Bari, 19th October 2016 G.Rizzo et al. (Univ. of Bari) 19th October 2016 1 / 16
  • 2. Outline 1 The Context and Motivations 2 Basics 3 The approach 4 Empirical Evaluation 5 Conclusion & Further Extensions G.Rizzo et al. (Univ. of Bari) 19th October 2016 2 / 16
  • 3. The Context and Motivations • Goal: determine the values of numerical properties (used as attributes) for a resource in a Web of Data knowledge base • Web of Data: many knowledge bases exposed in standard formats (RDF, OWL) • Properties link two resources, or a resource and a literal (string, numerical value) • Inference services may fail to determine the value due to the Open World Assumption • Solution: solve a multi-target regression problem • Predictive Clustering Trees (PCTs) adapted to Web of Data representations (e.g. Description Logics, DLs) • to predict the most plausible value • to elicit rules (e.g. SWRL rules) for enriching the schema of a knowledge base
  • 4. Description Logics Syntax & Semantics • Atomic concepts (classes), $N_C$ • Roles (binary relations), $N_R$ • Concrete domains: string, boolean, numeric values • Operators to build complex concept descriptions • Semantics defined through interpretations $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$ • $\Delta^{\mathcal{I}}$: domain of the interpretation • $\cdot^{\mathcal{I}}$: interpretation function • for each concept $C \in N_C$, $C^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}$ • for each role $R \in N_R$, $R^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$
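The set-theoretic semantics above can be made concrete with a small sketch that evaluates the extension $C^{\mathcal{I}}$ of a concept description in one fixed, finite interpretation. The tuple encoding of concepts (`"not"`, `"and"`, `"some"`, `"all"`) and the `Interpretation` class are hypothetical illustrations, not part of the authors' system. Note that a single interpretation is not a reasoner: entailment under the Open World Assumption has to consider every model.

```python
from dataclasses import dataclass

# Hypothetical tuple syntax for concept descriptions (illustration only):
#   "Comedy"                    atomic concept
#   ("not", C)                  complement
#   ("and", C, D)               conjunction
#   ("some", "starring", C)     existential restriction
#   ("all", "starring", C)      universal restriction


@dataclass
class Interpretation:
    domain: set      # Delta^I
    concepts: dict   # concept name -> subset of the domain (C^I)
    roles: dict      # role name -> set of (x, y) pairs (R^I)

    def ext(self, c):
        """Return the extension C^I of the concept description c."""
        if isinstance(c, str):
            return self.concepts.get(c, set())
        op = c[0]
        if op == "not":
            return self.domain - self.ext(c[1])
        if op == "and":
            return self.ext(c[1]) & self.ext(c[2])
        if op in ("some", "all"):
            pairs = self.roles.get(c[1], set())
            filler = self.ext(c[2])
            if op == "some":
                # x has at least one R-successor in C^I
                return {x for (x, y) in pairs if y in filler}
            # every R-successor of x is in C^I (vacuously true if none)
            return {x for x in self.domain
                    if all(y in filler for (z, y) in pairs if z == x)}
        raise ValueError(f"unknown constructor: {op}")


I = Interpretation(
    domain={"m1", "m2", "a1"},
    concepts={"Comedy": {"m1"}, "Actor": {"a1"}},
    roles={"starring": {("m1", "a1")}},
)
print(I.ext(("and", "Comedy", ("some", "starring", "Actor"))))  # {'m1'}
```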
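  • 5. Description Logics Knowledge bases • Knowledge base: a pair $\mathcal{K} = (\mathcal{T}, \mathcal{A})$ where • $\mathcal{T}$ (TBox): axioms concerning concepts/roles • Subsumption axioms $C \sqsubseteq D$: satisfied by an interpretation $\mathcal{I}$ iff $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$ • Equivalence axioms $C \equiv D$: satisfied by $\mathcal{I}$ iff $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$ and $D^{\mathcal{I}} \subseteq C^{\mathcal{I}}$ • $\mathcal{A}$ (ABox): assertions about individuals; the set of individuals is denoted by $\mathrm{Ind}(\mathcal{A})$ • class assertions, $C(a)$ • role assertions, $R(a, b)$ ($b$ is called the role filler) • Reasoning services: • subsumption: deciding whether a concept is more general than a given one • satisfiability: given a concept description $C$, deciding whether there is an interpretation (model of $\mathcal{K}$) $\mathcal{I}$ such that $C^{\mathcal{I}} \neq \emptyset$ • instance checking: deciding whether $\mathcal{K} \models C(a)$, i.e. whether $a^{\mathcal{I}} \in C^{\mathcal{I}}$ for every model $\mathcal{I}$ of $\mathcal{K}$ ($a$ is then an instance of $C$)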
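To illustrate what it means for an interpretation to satisfy TBox axioms and ABox assertions, the following sketch checks a finite interpretation against a list of axioms over atomic concepts. The axiom encoding, the function names and the simplification $a^{\mathcal{I}} = a$ are assumptions made for illustration. Such a check only shows whether one interpretation is a model; deciding $\mathcal{K} \models C(a)$ requires a DL reasoner that accounts for all models.

```python
def satisfies(concepts, roles, axiom):
    """Check one axiom in a finite interpretation.

    concepts: concept name -> set of individuals (C^I)
    roles:    role name -> set of (a, b) pairs (R^I)
    Individuals are interpreted as themselves (a^I = a), a simplifying assumption.
    """
    kind = axiom[0]
    if kind == "sub":        # C ⊑ D
        _, c, d = axiom
        return concepts.get(c, set()) <= concepts.get(d, set())
    if kind == "equiv":      # C ≡ D
        _, c, d = axiom
        return concepts.get(c, set()) == concepts.get(d, set())
    if kind == "concept":    # class assertion C(a)
        _, c, a = axiom
        return a in concepts.get(c, set())
    if kind == "role":       # role assertion R(a, b)
        _, r, a, b = axiom
        return (a, b) in roles.get(r, set())
    raise ValueError(f"unknown axiom kind: {kind}")


def is_model(concepts, roles, axioms):
    """An interpretation is a model if it satisfies every TBox and ABox axiom."""
    return all(satisfies(concepts, roles, ax) for ax in axioms)


concepts = {
    "Person": {"p1", "p2"},
    "Athlete": {"p1", "p2"},
    "AmericanFootballPlayer": {"p1"},
}
roles = {"knows": {("p1", "p2")}}
kb = [
    ("sub", "AmericanFootballPlayer", "Athlete"),
    ("sub", "Athlete", "Person"),
    ("concept", "AmericanFootballPlayer", "p1"),
    ("role", "knows", "p1", "p2"),
]
print(is_model(concepts, roles, kb))  # True
```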
  • 6. Semantic Web Rule Language (SWRL) • Datalog-like representation language • Adds expressiveness to DLs • Syntax: • term: a (universally quantified) variable $x$ or a constant $c$ • atom: unary or binary predicate $C(t_1)$ or $R(t_1, t_2)$ (predicate symbols are concept and role names), where the $t_i$ are terms • Rule: implication between an antecedent (body) and a consequent (head), $B_1 \wedge \cdots \wedge B_n \rightarrow H_1 \wedge \cdots \wedge H_m$ • We are interested in safe rules (each variable in the head must also occur in the body) • The Open-World Assumption holds
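The safety condition can be checked mechanically. The sketch below encodes atoms as `(predicate, args)` pairs and, by assumption, marks variables with a leading `?` (as in SWRL's human-readable syntax); every other term is treated as a constant. The function names are hypothetical.

```python
def variables(atoms):
    """Collect the variables (strings starting with '?') used in a list of atoms."""
    return {
        t
        for _, args in atoms
        for t in args
        if isinstance(t, str) and t.startswith("?")
    }


def is_safe(body, head):
    """A rule body -> head is safe if every head variable also occurs in the body."""
    return variables(head) <= variables(body)


body = [("Person", ("?x",)), ("Athlete", ("?x",)), ("AmericanFootballPlayer", ("?x",))]
print(is_safe(body, [("height", ("?x", 195.4))]))  # True: ?x is bound by the body
print(is_safe(body, [("knows", ("?x", "?y"))]))    # False: ?y never occurs in the body
```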
  • 7. The model for multi-target regression • PCT for multi-target regression: a binary tree where • inner nodes: DL conjunctive concept descriptions • leaf nodes: vectors with the approximated values of the target properties • Example tree (figure): root Comedy; left child Comedy ⊓ ∃starring.Actor with leaves p = (8.45, 9810666) and p = (5.38, 4200000); right child ¬Comedy ⊓ ¬Horror with leaves p = (4.7, 4200000) and p = (8.6, 4930000)
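A PCT of this kind can be sketched as a small recursive data structure, together with the three-way traversal used for prediction (slide 9). Concept descriptions are kept as opaque labels, and instance checking is delegated to a caller-supplied function `check` (a hypothetical stand-in for a DL reasoner) that returns `True`, `False` or `None` for unknown. The slides do not say what the default model is, so the caller passes it in. The tree copies the labels and leaf vectors of the example figure; how the leaves attach to the inner nodes is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union


@dataclass
class Leaf:
    values: tuple   # approximated values of the target properties


@dataclass
class Node:
    concept: str    # inner-node concept description D, kept as an opaque label
    left: "Tree"    # followed when K |= D(a)
    right: "Tree"   # followed when K |= ¬D(a)


Tree = Union[Leaf, Node]
Check = Callable[[str, str], Optional[bool]]


def predict(tree: Tree, a: str, check: Check, default):
    """Walk the tree for individual a; return `default` when membership is unknown."""
    while isinstance(tree, Node):
        answer = check(tree.concept, a)
        if answer is None:  # neither D(a) nor ¬D(a) is entailed (OWA)
            return default
        tree = tree.left if answer else tree.right
    return tree.values


tree = Node(
    "Comedy",
    Node("Comedy ⊓ ∃starring.Actor", Leaf((8.45, 9810666)), Leaf((5.38, 4200000))),
    Node("¬Comedy ⊓ ¬Horror", Leaf((4.7, 4200000)), Leaf((8.6, 4930000))),
)

# Hypothetical reasoner answers for two movies, m1 and m2.
answers = {("Comedy", "m1"): True, ("Comedy ⊓ ∃starring.Actor", "m1"): False}


def check(concept, a):
    return answers.get((concept, a))


print(predict(tree, "m1", check, default=None))  # (5.38, 4200000)
print(predict(tree, "m2", check, default=None))  # None: membership in Comedy is unknown
```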
  • 8. Learning PCTs in DLs • Divide-and-conquer approach • Training set: individuals whose target property values are known • partitioning according to membership w.r.t. a new concept • Refinement operator for generating the concepts • by introducing a new concept name (or its complement) • by replacing a sub-description with an existential restriction • by replacing a sub-description with a universal restriction • Best concept: the one minimizing the RMSE of the standardized values of the target properties • Stop conditions: maximum number of levels or size of the training (sub)set
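The split heuristic can be sketched as follows: standardize each target property (z-scores over the training set), split the individuals by membership in a candidate concept, predict each side by its mean vector, and keep the candidate with the lowest RMSE. How the RMSE is aggregated over individuals and targets, and the assumption that every individual falls on one side of the split, are illustrative choices rather than the authors' exact formula; `members` is a hypothetical stand-in for the instance checks a reasoner would perform.

```python
import math


def standardize(rows):
    """Z-score each target column (population std; constant columns map to 0)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) for c, m in zip(cols, means)]
    return [tuple((v - m) / s if s > 0 else 0.0 for v, m, s in zip(r, means, stds))
            for r in rows]


def split_rmse(pos, neg):
    """RMSE over all values when each side of the split is predicted by its mean vector."""
    sq, n = 0.0, 0
    for part in (pos, neg):
        if not part:
            continue
        mean = [sum(c) / len(part) for c in zip(*part)]
        for r in part:
            sq += sum((v - m) ** 2 for v, m in zip(r, mean))
            n += len(r)
    return math.sqrt(sq / n)


def best_concept(candidates, rows, members):
    """Pick the candidate concept whose split minimizes the standardized RMSE.

    members[c] is the set of row indices that are instances of concept c.
    """
    z = standardize(rows)

    def score(c):
        pos = [row for i, row in enumerate(z) if i in members[c]]
        neg = [row for i, row in enumerate(z) if i not in members[c]]
        return split_rmse(pos, neg)

    return min(candidates, key=score)


rows = [(1.0, 10.0), (1.2, 11.0), (5.0, 50.0), (5.2, 52.0)]  # toy target vectors
members = {"A": {0, 1}, "B": {0, 2}}
print(best_concept(["A", "B"], rows, members))  # A
```

Standardizing first keeps a target with a large range (e.g. a population count) from dominating the score.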
  • 9. Exploiting PCTs • Prediction: given an individual $a$, the algorithm traverses the tree according to instance checks w.r.t. the inner concepts $D$ • if $\mathcal{K} \models D(a)$, the left branch is followed • if $\mathcal{K} \models \neg D(a)$, the right branch is followed • otherwise, a default model is returned • Eliciting SWRL rules: traversing the tree recursively and collecting the intermediate concepts along each branch • Body: intermediate concept descriptions as predicate names • Head: each target property name as a predicate name • the approximated value as a term
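Rule elicitation can be sketched as a recursive walk that collects the concept of each inner node (negated on the right branch) and, at every leaf, emits one rule per target property. The dict-based tree, the function names and the `path` argument are hypothetical; `path` is used here to start from a sub-tree whose branch has already collected `Person` and `Athlete`, and with that prefix the sketch reproduces the rules shown on slide 15.

```python
def negate(concept):
    """Complement of a concept label; complex descriptions are parenthesized."""
    return f"¬{concept}" if concept.isidentifier() else f"¬({concept})"


def elicit_rules(tree, targets, path=()):
    """Yield one SWRL-style rule per (leaf, target property) pair."""
    if "values" in tree:  # leaf: the body is the conjunction of the collected concepts
        body = " ∧ ".join(f"{c}(x)" for c in path)
        for prop, value in zip(targets, tree["values"]):
            yield f"{body} → {prop}(x, {value})"
        return
    d = tree["concept"]
    yield from elicit_rules(tree["left"], targets, path + (d,))
    yield from elicit_rules(tree["right"], targets, path + (negate(d),))


subtree = {
    "concept": "AmericanFootballPlayer",
    "left": {"values": (195.4, 113.5)},
    "right": {"values": (187, 87.5)},
}
for rule in elicit_rules(subtree, ("height", "weight"), path=("Person", "Athlete")):
    print(rule)
```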
  • 10. Experiments Small ontologies: Settings • Maximum depth for PCTs: 10, 15, 20 • Comparison w.r.t. Terminological Regression Trees (TRTs), a multi-target k-NN regressor (with $k = \sqrt{|Tr|}$) and a multi-target linear regression model (LR) • atomic concepts as the feature set for the k-NN regressor and the multi-target linear regression model • 0.632 bootstrap • performance in terms of RMSE
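The 0.632 bootstrap combines the resubstitution error with the out-of-bag error as $0.368\,\mathrm{err}_{train} + 0.632\,\mathrm{err}_{oob}$. The sketch below applies it to RMSE for a single target and a generic learner `fit(xs, ys) -> predict`. Averaging the out-of-bag RMSE per replication, the number of replications and the rounding of $k = \sqrt{|Tr|}$ are assumptions, since the slides do not specify them; `fit_mean` is a toy learner.

```python
import math
import random


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def bootstrap_632(xs, ys, fit, n_reps=30, seed=0):
    """0.632 bootstrap estimate of RMSE for a learner fit(xs, ys) -> predict(x)."""
    rng = random.Random(seed)
    n = len(xs)
    model = fit(xs, ys)
    err_train = rmse(ys, [model(x) for x in xs])  # resubstitution error on the full set
    oob_errs = []
    for _ in range(n_reps):
        idx = [rng.randrange(n) for _ in range(n)]       # sample with replacement
        chosen = set(idx)
        oob = [i for i in range(n) if i not in chosen]   # out-of-bag individuals
        if not oob:
            continue
        m = fit([xs[i] for i in idx], [ys[i] for i in idx])
        oob_errs.append(rmse([ys[i] for i in oob], [m(xs[i]) for i in oob]))
    if not oob_errs:
        raise ValueError("no out-of-bag samples; increase n_reps")
    err_oob = sum(oob_errs) / len(oob_errs)
    return 0.368 * err_train + 0.632 * err_oob


def knn_k(n_train):
    """k = sqrt(|Tr|), rounded to the nearest integer (rounding is an assumption)."""
    return max(1, round(math.sqrt(n_train)))


def fit_mean(xs, ys):
    """Toy learner that always predicts the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m


print(bootstrap_632([1, 2, 3, 4], [5.0, 5.0, 5.0, 5.0], fit_mean))  # 0.0
print(knn_k(100))  # 10
```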
  • 11. Experiments Linked Data datasets: Settings • Ontologies extracted from DBpedia via crawling • Maximum depth for PCTs: 10, 15, 20 • Comparison w.r.t. TRTs, k-NN (with $k = \sqrt{|Tr|}$) and LR • 10-fold cross-validation • performance in terms of relative RMSE (RRMSE)
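RRMSE is commonly defined as the RMSE of the model divided by the RMSE of a baseline that always predicts the mean of the target, so values below 1 mean the model beats that baseline. The sketch assumes this definition, with the mean taken over the evaluated individuals and a plain average across target properties; the slides do not state the exact variant used.

```python
import math


def rrmse(y_true, y_pred):
    """Relative RMSE for one target: model RMSE over the RMSE of predicting the mean."""
    mean = sum(y_true) / len(y_true)
    num = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    den = sum((t - mean) ** 2 for t in y_true)
    return math.sqrt(num / den)


def multi_target_rrmse(Y_true, Y_pred):
    """Average RRMSE over target properties; rows are individuals, columns targets."""
    cols_t = list(zip(*Y_true))
    cols_p = list(zip(*Y_pred))
    return sum(rrmse(t, p) for t, p in zip(cols_t, cols_p)) / len(cols_t)


print(rrmse([1, 2, 3], [2, 2, 2]))  # 1.0: no better than predicting the mean
print(multi_target_rrmse([(1, 10), (2, 20), (3, 30)],
                         [(2, 10), (2, 20), (2, 30)]))  # 0.5
```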
  • 12. Experiments Small ontologies: Outcomes
    Table: RMSE averaged over the replications (± standard deviation)
    Ontology | PCT | TRT | k-NN | LR
    BCO | 0.0277 ± 0.01 | 0.0356 ± 0.01 | 0.0472 ± 0.01 | 0.0554 ± 0.01
    BioPax | 132 ± 11.0 | 145 ± 12.0 | 186 ± 7.00 | 195 ± 8.85
    geopolitical | 0.0284 ± 0.01 | 0.03561 ± 0.03 | 0.057 ± 0.03 | 0.06 ± 0.02
    monetary | 7.52 ± 0.15 | 8.46 ± 0.07 | 7.53 ± 0.17 | 7.78 ± 0.34
    mutagenesis | 0.0445 ± 0.07 | 0.0637 ± 0.03 | 0.0547 ± 0.02 | 0.0647 ± 0.05
  • 13. Experiments Linked Data datasets: Outcomes
    Table: RRMSE averaged over the runs
    Dataset | PCT | TRT | k-NN | LR
    Fragm. #1 | 0.42 ± 0.05 | 0.63 ± 0.05 | 0.65 ± 0.02 | 0.73 ± 0.02
    Fragm. #2 | 0.25 ± 0.001 | 0.43 ± 0.02 | 0.53 ± 0.00 | 0.43 ± 0.02
    Fragm. #3 | 0.24 ± 0.05 | 0.36 ± 0.2 | 0.67 ± 0.10 | 0.73 ± 0.05
    Table: Comparison in terms of elapsed times (secs); TRT times are given per target property, followed by their total
    Dataset | PCT | TRT | k-NN | LR
    Fragm. #1 | 2432 | elevation 2454.3, populationTotal 2353.0 (total 4807.3) | 547.6 | 234.5
    Fragm. #2 | 2456 | areaTotal 2256.0, areaUrban 2345.0, areaMetro 2345.2 (total 6946.2) | 546.2 | 235.7
    Fragm. #3 | 743.3 | height 743.5, weight 743.4 (total 1486.9) | 372.3 | 123.5
  • 14. Discussion • PCTs outperform TRTs • the different heuristic allows choosing more promising concepts • standardization mitigated the effect of abnormal values • PCTs outperform k-NN • k-NN suffers from the curse of dimensionality • k-NN generally outperforms LR • spurious individuals were excluded when determining the local model • PCTs are more efficient than TRTs (one tree for all target properties rather than one tree per property)
  • 15. Examples of discovered rules According to the discovered rules, an American football player is taller and heavier than an athlete who does not play American football • Person(x) ∧ Athlete(x) ∧ AmericanFootballPlayer(x) → height(x, 195.4) • Person(x) ∧ Athlete(x) ∧ AmericanFootballPlayer(x) → weight(x, 113.5) • Person(x) ∧ Athlete(x) ∧ ¬AmericanFootballPlayer(x) → height(x, 187) • Person(x) ∧ Athlete(x) ∧ ¬AmericanFootballPlayer(x) → weight(x, 87.5)
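Applying these rules to a new individual also has to respect the Open World Assumption: a rule with ¬AmericanFootballPlayer(x) in its body should fire only when non-membership is actually entailed, not merely when membership is absent. The sketch below encodes the four rules above and fires them given two hypothetical sets, the concepts an individual is known to belong to and those it is known not to belong to, which stand in for a reasoner's answers.

```python
# The four rules above: (required concepts, required complements, property, value).
RULES = [
    ({"Person", "Athlete", "AmericanFootballPlayer"}, set(), "height", 195.4),
    ({"Person", "Athlete", "AmericanFootballPlayer"}, set(), "weight", 113.5),
    ({"Person", "Athlete"}, {"AmericanFootballPlayer"}, "height", 187),
    ({"Person", "Athlete"}, {"AmericanFootballPlayer"}, "weight", 87.5),
]


def fire(known_in, known_not_in, rules=RULES):
    """Return the property values derived by every rule whose body is entailed."""
    derived = {}
    for required, complements, prop, value in rules:
        if required <= known_in and complements <= known_not_in:
            derived[prop] = value
    return derived


print(fire({"Person", "Athlete"}, {"AmericanFootballPlayer"}))  # {'height': 187, 'weight': 87.5}
print(fire({"Person", "Athlete"}, set()))  # {}: no rule body is entailed
```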
  • 16. Conclusion and Further Outlooks • We proposed an extension of predictive clustering trees compliant with DL representation languages, for predicting datatype property values and discovering rules • The outcomes are promising • Further extensions • New refinement operators • Further heuristics • Linear models at leaf nodes
  • 17. Questions?
  • 18. Table: Datasets extracted from DBpedia
    Dataset | Expr. | #Axioms | #Classes | #Properties | #Ind.
    Fragm. #1 | ALCO | 17222 | 990 | 255 | 12053
    Fragm. #2 | ALCO | 20456 | 425 | 255 | 14400
    Fragm. #3 | ALCO | 9070 | 370 | 106 | 4499
    Table: Target property ranges and number of individuals employed in the learning problem (|Tr|)
    Fragm. #1 (|Tr| = 10000): elevation [-654.14, 19.00]; populationTotal [0.0, 2255]
    Fragm. #2 (|Tr| = 10000): areaTotal [0, 16980.1]; areaUrban [0.0, 6740.74]; areaMetro [0, 652874]
    Fragm. #3 (|Tr| = 2256): height [0, 251.6]; weight [-63.12, 304.25]