SlideShare a Scribd company logo
1 of 21
Download to read offline
Inductive Classification through Evidence-based Models
and Their Ensembles
Giuseppe Rizzo, Claudia d’Amato, Nicola Fanizzi and Floriana Esposito
Dipartimento di Informatica
Universit`a degli Studi di Bari ”Aldo Moro”, Bari, Italy
ESWC 2015
June 3rd, 2015
G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 1 / 20
Outline
1 Introduction & Motivations
2 DS & Evidential Terminological Decision Trees
3 The framework
4 Experiments
5 Future Works
G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 2 / 20
Introduction & Motivations
Motivations
AIM: predicting the membership of an individual w.r.t. a query concept
typically based on automated reasoning techniques
Inferences are affected by the incompleteness of the Semantic Web
decided using models induced by Machine learning methods
The quality depends on the training data distribution
Given a query concept, generally, many uncertain-membership
examples than individuals with a definite membership
We are assuming a ternary classification problem
G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 3 / 20
Introduction & Motivations
Motivations
Previous solutions and the current limits
We started to investigate the imbalance learning problem by resorting
to a solution which combines (under-)sampling methods and
ensemble learning models
for overcoming the loss of information due to the discarded instances
Terminological Decision Tree (TDT): a DL-based Decision Tree for
concept learning and assertion prediction problems
combined to obtain Terminological Random Forests (TRF)
Some limits:
predictions made according to simple majority vote procedure (no
conflicts, no uncertainty are considered)
misclassifications mainly due to evenly distributed votes
Further rules (meta-learner) required
G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 4 / 20
Introduction & Motivations
Introduction & Motivations
Underlying idea
Using soft predictions (predictions with a confidence measure for each
class value) obtained by each tree for weighting the votes
TDTs return only hard predictions (i,.e. predicted class without any
information)
Dempster-Shafer Theory (DS) operators for information fusion
Solution: Resort and modify the Evidential TDTs (ETDTs)
G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 5 / 20
DS & Evidential Terminological Decision Trees
The Dempster-Shafer Theory (DS)
Frame of discernement Ω
a set of hypotheses for a domain, e.g. the membership values for an
individual given a concept Ω = {−1, +1}
Basic Belief Assignement (BBA) m : 2Ω → [0, 1]
the amount of belief exactly committed to A ⊆ Ω
Belief function: ∀A, B ∈ 2Ω Bel(A) = B⊆A m(B)
Plausibility function: ∀A, B ∈ 2Ω Pl(A) = B∩A=∅ m(B)
G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 6 / 20
DS & Evidential Terminological Decision Trees
The Dempster-Shafer Theory (DS)
Combination rules: used for pooling evidences for the same frame of
discernment coming from various sources of information
Dempster’s rule
∀A, B, C ⊆ Ω m12(A) = m1 ⊕ m2 = 1
1−c B∩C=A m1(B)m2(C)
Dubois-Prade’s rule
∀A, B, C ⊆ Ω m12(A) = B∪C=A m1(B)m2(C)
G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 7 / 20
DS & Evidential Terminological Decision Trees
Evidential TDTs
An ETDT is a binary tree where:
each node contains a conjunctive concept description D and a BBA m
obtained by counting the positive, negative and uncertain instances;
each departing edge is the result of instance-check test w.r.t. D, i.e.,
given an individual a, K |= D(a)?
a child node with the concept description D is obtained using a
refinement operator
The model can be used for returning soft prediction
G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 8 / 20
DS & Evidential Terminological Decision Trees
An example of ETDT
∃hasPart.
m= (∅: 0, {+1}:0.30,{-1}:0.36,
{-1,+1}: 0.34)
∃hasPart.Worn
m=(∅: 0.00, {+1}:0.50,{-1}:0.36,
{-1,+1}: 0.14)
∃hasPart.(Worn ¬Replaceable)
m=(∅: 0.00, {+1}:0.50,{-1}:0.36,
{-1,+1}:0.00)
SendBack
m= (∅: 0.00, {+1}:1.00,{-1}:0.00,
{-1,+1}:0.00)
¬SendBack
m=(∅: 0.00, {+1}:0.00,{-1}:1.00,
{-1,+1}:0.00)
¬SendBack
m=(∅: 0.00, {+1}:0.00,{-1}:0.13,
{-1,+1}:0.87)
¬SendBack
m=(∅: 0.0, {+1}:0.00,{-1}:0.00,
{-1,+1}: 1.0)
Ω = {−1, +1}
{+1} ↔ K |= D(a) ∀a ∈ Ind(A)
{−1} ↔ K |= ¬D(b) ∀b ∈ Ind(A)
{−1, +1} otherwise
G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 9 / 20
The framework
Evidential Terminological Random Forests
In order to tackle the imbalance learning problem, we propose
Evidential Terminological Random Forest (ETRF), where
each ETDT returns a soft prediction in the form of BBA
the meta-learner is a combination rule
G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 10 / 20
The framework
Learning Evidential Terminological Random Forests
Given:
a target concept C
the number of trees n
a training set Tr = Ps, Ns, Us
Ps = {a ∈ Ind(A)|K |= C(a)}
Ns = {b ∈ Ind(A)|K |= ¬C(b)}
Us = {c ∈ Ind(A)|K |= C(c) ∧ K |= ¬C(c)}
the algorithm can be summarized as follows:
build a n bootstrap samples with a balanced distribution
for each sample learn an ETDT model
G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 11 / 20
The framework
Learning ETRF
Building bootstrap samples
1 a stratified sampling with replacement procedure is employed in order
to represent the minority class instances in the bootstrap sample.
2 the majority class instances (either positive, negative and
uncertain-membership instances) are discarded.
G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 12 / 20
The framework
Learning ETRF
Learning ETDTs
Divide-and-conquer algorithm for learning an ETDT [Rizzo et
al.@IPMU, 2014]
Steps:
1 refinement of the concept description installed into the current node
2 Random selection of a subset of candidates
3 A BBA for each selected description
4 The concept having the most definite membership (and its BBA)
installed into the new node.
Stop conditions: the node is pure w.r.t. the membership
G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 13 / 20
The framework
Predicting membership for unseen individuals
Given a forest F and a new individual a, the algorithm collects BBAs
returned by each ETDT
The BBA returned by an ETDT is decided by following a path
according to the instance check test result.
For a concept description installed as node D
if K |= D(a) the left branch is followed
if K |= ¬D(a) the right branch is followed
otherwise both branches are followed
Various leaves can be reached and the corresponding BBAs are pooled
according to the combination rule
G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 14 / 20
The framework
Predicting membership for unseen individuals
The set of BBAs returning from all the ETDTs are combined through
the combination rule
After a pooled BBA m is obtained, Bel (resp. Pl) function is derived
Final membership assignement: hypothesis which maximizes belief
(resp. plausibility) function
Bel and Pl function are monotonic : uncertain-memberhip is more
probable
Return the uncertain-membership value when the belief for the
positive- and negative-membership are approximately equal
G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 15 / 20
Experiments
Experiments
15 query concepts randomly generated
10-fold cross validation
number of candidates randomly selected: |ρ(·)|
Comparison w.r.t. TDTs, ETDTs, TRFs
Forest sizes: 10, 20, 30 trees
Stratified Sampling rates: 50%, 70 %, 80 %
Metrics:
match: individuals for which the inductive model and a reasoner
predict the same membership
commission: cases of opposite predictions
omission: individuals having a definite membership that cannot be
predicted inductively;
induction: predictions that are not logically derivable.
G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 16 / 20
Experiments
Some results...
Ontology index TDT ETDTs
Bco
M% 80.44 ± 11.01 90.31 ± 14.79
C% 07.56 ± 08.08 01.86 ± 02.61
O% 05.04 ± 04.28 00.00 ± 00.00
I% 06.96 ± 05.97 07.83 ± 15.35
Biopax
M% 66.63 ± 14.60 87.00 ± 07.15
C% 31.03 ± 12.95 11.57 ± 02.62
O% 00.39 ± 00.61 00.00 ± 00.00
I% 01.95 ± 07.13 01.43 ± 08.32
NTN
M% 68.85 ± 13.23 23.87 ± 26.18
C% 00.37 ± 00.30 00.00 ± 00.00
O% 09.51 ± 07.06 00.00 ± 00.00
I% 21.27 ± 08.73 75.13 ± 26.18
HD
M% 58.31 ± 14.06 10.69 ± 01.47
C% 00.44 ± 00.47 00.07 ± 00.17
O% 05.51 ± 01.81 00.00 ± 00.00
I% 35.74 ± 15.90 89.24 ± 01.46
G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 17 / 20
Experiments
Some results...
Ontology index
Sampling rate 50 %
TRF ETRF
10 trees 10 trees
Bco
M% 86.27 ± 15.79 91.31 ± 06.35
C% 02.47 ± 03.70 02.91 ± 02.45
O% 01.90 ± 07.30 00.00 ± 00.00
I% 09.36 ± 13.96 05.88 ± 06.49
Biopax
M% 75.30 ± 16.23 96.92 ± 08.07
C% 18.74 ± 17.80 00.79 ± 01.22
O% 00.00 ± 00.00 00.00 ± 00.00
I% 01.97 ± 07.16 02.29 ± 08.13
NTN
M% 83.41 ± 07.85 05.38 ± 07.38
C% 00.02 ± 00.04 06.58 ± 07.51
O% 13.40 ± 10.17 00.00 ± 00.00
I% 03.17 ± 04.65 88.05 ± 08.50
HD
M% 68.00 ± 16.98 10.29 ± 00.00
C% 00.02 ± 00.05 00.26 ± 00.26
O% 06.38 ± 02.03 00.00 ± 00.00
I% 25.59 ± 18.98 89.24 ± 00.26
G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 18 / 20
Experiments
Discussion
improved performance of ETRFs w.r.t. the other models
higher match rate and induction rate
a lower standard deviation
smallest changes of performance w.r.t. the forest size
weak diversification(overlapping) between trees by increasing the
number of trees
refinement operator is a bottleneck for learning phase
G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 19 / 20
Future Works
Conclusions and Further Extensions
We proposed an ensemble solution based on DS to improve the
predictiveness of the models for class-membership prediction with
imbalanced training data distribution
Extensions:
Development and reuse of refinement operators
Further ensemble techniques and combination rules
Experiments with larger ontologies
Parallelization of the current implementation
G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 20 / 20
End
Thank you!
Questions?
G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 20 / 20

More Related Content

Similar to Inductive Classification through Evidence-based Models and Their Ensemble

Tackling the Class Imbalance Learning Problem in Semantic Web Knowledge bases
Tackling the Class Imbalance Learning Problem in Semantic Web Knowledge basesTackling the Class Imbalance Learning Problem in Semantic Web Knowledge bases
Tackling the Class Imbalance Learning Problem in Semantic Web Knowledge basesGiuseppe Rizzo
 
Towards Evidence Terminological Decision Tree
Towards Evidence Terminological Decision TreeTowards Evidence Terminological Decision Tree
Towards Evidence Terminological Decision TreeGiuseppe Rizzo
 
Crystallization classification semisupervised
Crystallization classification semisupervisedCrystallization classification semisupervised
Crystallization classification semisupervisedMadhav Sigdel
 
Internal examination 3rd semester disaster
Internal examination 3rd semester disasterInternal examination 3rd semester disaster
Internal examination 3rd semester disasterMahendra Poudel
 
Explanations in Data Systems
Explanations in Data SystemsExplanations in Data Systems
Explanations in Data SystemsFotis Savva
 
腸内細菌叢のメタゲノム解析に関する調査 / A survey on metagenomic analysis for gut microbiota
腸内細菌叢のメタゲノム解析に関する調査 / A survey on metagenomic analysis for gut microbiota腸内細菌叢のメタゲノム解析に関する調査 / A survey on metagenomic analysis for gut microbiota
腸内細菌叢のメタゲノム解析に関する調査 / A survey on metagenomic analysis for gut microbiotaKazumasa Kaneko
 
Topic_6
Topic_6Topic_6
Topic_6butest
 
Terminological cluster trees for Disjointness Axiom Discovery
Terminological cluster trees for Disjointness Axiom DiscoveryTerminological cluster trees for Disjointness Axiom Discovery
Terminological cluster trees for Disjointness Axiom DiscoveryGiuseppe Rizzo
 
SPSS statistics - get help using SPSS
SPSS statistics - get help using SPSSSPSS statistics - get help using SPSS
SPSS statistics - get help using SPSScsula its training
 
Statistics pres 10 27 2015 roy sabo
Statistics pres 10 27 2015   roy saboStatistics pres 10 27 2015   roy sabo
Statistics pres 10 27 2015 roy sabotjcarter
 
Investigations of certain estimators for modeling panel data under violations...
Investigations of certain estimators for modeling panel data under violations...Investigations of certain estimators for modeling panel data under violations...
Investigations of certain estimators for modeling panel data under violations...Alexander Decker
 
Combining Dynamic Predictions from Joint Models using Bayesian Model Averaging
Combining Dynamic Predictions from Joint Models using Bayesian Model AveragingCombining Dynamic Predictions from Joint Models using Bayesian Model Averaging
Combining Dynamic Predictions from Joint Models using Bayesian Model AveragingDimitris Rizopoulos
 
Neural Network based Supervised Self Organizing Maps for Face Recognition
Neural Network based Supervised Self Organizing Maps for Face Recognition  Neural Network based Supervised Self Organizing Maps for Face Recognition
Neural Network based Supervised Self Organizing Maps for Face Recognition ijsc
 
NEURAL NETWORK BASED SUPERVISED SELF ORGANIZING MAPS FOR FACE RECOGNITION
NEURAL NETWORK BASED SUPERVISED SELF ORGANIZING MAPS FOR FACE RECOGNITIONNEURAL NETWORK BASED SUPERVISED SELF ORGANIZING MAPS FOR FACE RECOGNITION
NEURAL NETWORK BASED SUPERVISED SELF ORGANIZING MAPS FOR FACE RECOGNITIONijsc
 
M.E.Bontempi-Panel data: Models, estimation,and the role of attrition and Mea...
M.E.Bontempi-Panel data: Models, estimation,and the role of attrition and Mea...M.E.Bontempi-Panel data: Models, estimation,and the role of attrition and Mea...
M.E.Bontempi-Panel data: Models, estimation,and the role of attrition and Mea...Istituto nazionale di statistica
 
On cascading small decision trees
On cascading small decision treesOn cascading small decision trees
On cascading small decision treesJulià Minguillón
 
A bluetooth-low-energy-dataset-for-the-analysis-of-social-inte 2020-data-in-
A bluetooth-low-energy-dataset-for-the-analysis-of-social-inte 2020-data-in-A bluetooth-low-energy-dataset-for-the-analysis-of-social-inte 2020-data-in-
A bluetooth-low-energy-dataset-for-the-analysis-of-social-inte 2020-data-in-Tony Vilchez Yarihuaman
 

Similar to Inductive Classification through Evidence-based Models and Their Ensemble (20)

Tackling the Class Imbalance Learning Problem in Semantic Web Knowledge bases
Tackling the Class Imbalance Learning Problem in Semantic Web Knowledge basesTackling the Class Imbalance Learning Problem in Semantic Web Knowledge bases
Tackling the Class Imbalance Learning Problem in Semantic Web Knowledge bases
 
Towards Evidence Terminological Decision Tree
Towards Evidence Terminological Decision TreeTowards Evidence Terminological Decision Tree
Towards Evidence Terminological Decision Tree
 
Crystallization classification semisupervised
Crystallization classification semisupervisedCrystallization classification semisupervised
Crystallization classification semisupervised
 
Internal examination 3rd semester disaster
Internal examination 3rd semester disasterInternal examination 3rd semester disaster
Internal examination 3rd semester disaster
 
Explanations in Data Systems
Explanations in Data SystemsExplanations in Data Systems
Explanations in Data Systems
 
腸内細菌叢のメタゲノム解析に関する調査 / A survey on metagenomic analysis for gut microbiota
腸内細菌叢のメタゲノム解析に関する調査 / A survey on metagenomic analysis for gut microbiota腸内細菌叢のメタゲノム解析に関する調査 / A survey on metagenomic analysis for gut microbiota
腸内細菌叢のメタゲノム解析に関する調査 / A survey on metagenomic analysis for gut microbiota
 
Topic_6
Topic_6Topic_6
Topic_6
 
Sampling
 Sampling Sampling
Sampling
 
Terminological cluster trees for Disjointness Axiom Discovery
Terminological cluster trees for Disjointness Axiom DiscoveryTerminological cluster trees for Disjointness Axiom Discovery
Terminological cluster trees for Disjointness Axiom Discovery
 
SPSS statistics - get help using SPSS
SPSS statistics - get help using SPSSSPSS statistics - get help using SPSS
SPSS statistics - get help using SPSS
 
Statistics pres 10 27 2015 roy sabo
Statistics pres 10 27 2015   roy saboStatistics pres 10 27 2015   roy sabo
Statistics pres 10 27 2015 roy sabo
 
Investigations of certain estimators for modeling panel data under violations...
Investigations of certain estimators for modeling panel data under violations...Investigations of certain estimators for modeling panel data under violations...
Investigations of certain estimators for modeling panel data under violations...
 
The t Test for Two Related Samples
The t Test for Two Related SamplesThe t Test for Two Related Samples
The t Test for Two Related Samples
 
Combining Dynamic Predictions from Joint Models using Bayesian Model Averaging
Combining Dynamic Predictions from Joint Models using Bayesian Model AveragingCombining Dynamic Predictions from Joint Models using Bayesian Model Averaging
Combining Dynamic Predictions from Joint Models using Bayesian Model Averaging
 
Neural Network based Supervised Self Organizing Maps for Face Recognition
Neural Network based Supervised Self Organizing Maps for Face Recognition  Neural Network based Supervised Self Organizing Maps for Face Recognition
Neural Network based Supervised Self Organizing Maps for Face Recognition
 
NEURAL NETWORK BASED SUPERVISED SELF ORGANIZING MAPS FOR FACE RECOGNITION
NEURAL NETWORK BASED SUPERVISED SELF ORGANIZING MAPS FOR FACE RECOGNITIONNEURAL NETWORK BASED SUPERVISED SELF ORGANIZING MAPS FOR FACE RECOGNITION
NEURAL NETWORK BASED SUPERVISED SELF ORGANIZING MAPS FOR FACE RECOGNITION
 
M.E.Bontempi-Panel data: Models, estimation,and the role of attrition and Mea...
M.E.Bontempi-Panel data: Models, estimation,and the role of attrition and Mea...M.E.Bontempi-Panel data: Models, estimation,and the role of attrition and Mea...
M.E.Bontempi-Panel data: Models, estimation,and the role of attrition and Mea...
 
On cascading small decision trees
On cascading small decision treesOn cascading small decision trees
On cascading small decision trees
 
report
reportreport
report
 
A bluetooth-low-energy-dataset-for-the-analysis-of-social-inte 2020-data-in-
A bluetooth-low-energy-dataset-for-the-analysis-of-social-inte 2020-data-in-A bluetooth-low-energy-dataset-for-the-analysis-of-social-inte 2020-data-in-
A bluetooth-low-energy-dataset-for-the-analysis-of-social-inte 2020-data-in-
 

Recently uploaded

Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAbdelrhman abooda
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 

Recently uploaded (20)

Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 

Inductive Classification through Evidence-based Models and Their Ensemble

  • 1. Inductive Classification through Evidence-based Models and Their Ensembles Giuseppe Rizzo, Claudia d’Amato, Nicola Fanizzi and Floriana Esposito Dipartimento di Informatica Universit`a degli Studi di Bari ”Aldo Moro”, Bari, Italy ESWC 2015 June 3rd, 2015 G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 1 / 20
  • 2. Outline 1 Introduction & Motivations 2 DS & Evidential Terminological Decision Trees 3 The framework 4 Experiments 5 Future Works G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 2 / 20
  • 3. Introduction & Motivations Motivations AIM: predicting the membership of an individual w.r.t. a query concept typically based on automated reasoning techniques Inferences are affected by the incompleteness of the Semantic Web decided using models induced by Machine learning methods The quality depends on the training data distribution Given a query concept, generally, many uncertain-membership examples than individuals with a definite membership We are assuming a ternary classification problem G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 3 / 20
  • 4. Introduction & Motivations Motivations Previous solutions and the current limits We started to investigate the imbalance learning problem by resorting to a solution which combines (under-)sampling methods and ensemble learning models for overcoming the loss of information due to the discarded instances Terminological Decision Tree (TDT): a DL-based Decision Tree for concept learning and assertion prediction problems combined to obtain Terminological Random Forests (TRF) Some limits: predictions made according to simple majority vote procedure (no conflicts, no uncertainty are considered) misclassifications mainly due to evenly distributed votes Further rules (meta-learner) required G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 4 / 20
  • 5. Introduction & Motivations Introduction & Motivations Underlying idea Using soft predictions (predictions with a confidence measure for each class value) obtained by each tree for weighting the votes TDTs return only hard predictions (i,.e. predicted class without any information) Dempster-Shafer Theory (DS) operators for information fusion Solution: Resort and modify the Evidential TDTs (ETDTs) G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 5 / 20
  • 6. DS & Evidential Terminological Decision Trees The Dempster-Shafer Theory (DS) Frame of discernement Ω a set of hypotheses for a domain, e.g. the membership values for an individual given a concept Ω = {−1, +1} Basic Belief Assignement (BBA) m : 2Ω → [0, 1] the amount of belief exactly committed to A ⊆ Ω Belief function: ∀A, B ∈ 2Ω Bel(A) = B⊆A m(B) Plausibility function: ∀A, B ∈ 2Ω Pl(A) = B∩A=∅ m(B) G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 6 / 20
  • 7. DS & Evidential Terminological Decision Trees The Dempster-Shafer Theory (DS) Combination rules: used for pooling evidences for the same frame of discernment coming from various sources of information Dempster’s rule ∀A, B, C ⊆ Ω m12(A) = m1 ⊕ m2 = 1 1−c B∩C=A m1(B)m2(C) Dubois-Prade’s rule ∀A, B, C ⊆ Ω m12(A) = B∪C=A m1(B)m2(C) G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 7 / 20
  • 8. DS & Evidential Terminological Decision Trees Evidential TDTs An ETDT is a binary tree where: each node contains a conjunctive concept description D and a BBA m obtained by counting the positive, negative and uncertain instances; each departing edge is the result of instance-check test w.r.t. D, i.e., given an individual a, K |= D(a)? a child node with the concept description D is obtained using a refinement operator The model can be used for returning soft prediction G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 8 / 20
  • 9. DS & Evidential Terminological Decision Trees An example of ETDT ∃hasPart. m= (∅: 0, {+1}:0.30,{-1}:0.36, {-1,+1}: 0.34) ∃hasPart.Worn m=(∅: 0.00, {+1}:0.50,{-1}:0.36, {-1,+1}: 0.14) ∃hasPart.(Worn ¬Replaceable) m=(∅: 0.00, {+1}:0.50,{-1}:0.36, {-1,+1}:0.00) SendBack m= (∅: 0.00, {+1}:1.00,{-1}:0.00, {-1,+1}:0.00) ¬SendBack m=(∅: 0.00, {+1}:0.00,{-1}:1.00, {-1,+1}:0.00) ¬SendBack m=(∅: 0.00, {+1}:0.00,{-1}:0.13, {-1,+1}:0.87) ¬SendBack m=(∅: 0.0, {+1}:0.00,{-1}:0.00, {-1,+1}: 1.0) Ω = {−1, +1} {+1} ↔ K |= D(a) ∀a ∈ Ind(A) {−1} ↔ K |= ¬D(b) ∀b ∈ Ind(A) {−1, +1} otherwise G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 9 / 20
  • 10. The framework Evidential Terminological Random Forests In order to tackle the imbalance learning problem, we propose Evidential Terminological Random Forest (ETRF), where each ETDT returns a soft prediction in the form of BBA the meta-learner is a combination rule G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 10 / 20
  • 11. The framework Learning Evidential Terminological Random Forests Given: a target concept C the number of trees n a training set Tr = Ps, Ns, Us Ps = {a ∈ Ind(A)|K |= C(a)} Ns = {b ∈ Ind(A)|K |= ¬C(b)} Us = {c ∈ Ind(A)|K |= C(c) ∧ K |= ¬C(c)} the algorithm can be summarized as follows: build a n bootstrap samples with a balanced distribution for each sample learn an ETDT model G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 11 / 20
  • 12. The framework Learning ETRF Building bootstrap samples 1 a stratified sampling with replacement procedure is employed in order to represent the minority class instances in the bootstrap sample. 2 the majority class instances (either positive, negative and uncertain-membership instances) are discarded. G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 12 / 20
  • 13. The framework Learning ETRF Learning ETDTs Divide-and-conquer algorithm for learning an ETDT [Rizzo et al.@IPMU, 2014] Steps: 1 refinement of the concept description installed into the current node 2 Random selection of a subset of candidates 3 A BBA for each selected description 4 The concept having the most definite membership (and its BBA) installed into the new node. Stop conditions: the node is pure w.r.t. the membership G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 13 / 20
  • 14. The framework Predicting membership for unseen individuals Given a forest F and a new individual a, the algorithm collects BBAs returned by each ETDT The BBA returned by an ETDT is decided by following a path according to the instance check test result. For a concept description installed as node D if K |= D(a) the left branch is followed if K |= ¬D(a) the right branch is followed otherwise both branches are followed Various leaves can be reached and the corresponding BBAs are pooled according to the combination rule G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 14 / 20
  • 15. The framework Predicting membership for unseen individuals The set of BBAs returning from all the ETDTs are combined through the combination rule After a pooled BBA m is obtained, Bel (resp. Pl) function is derived Final membership assignement: hypothesis which maximizes belief (resp. plausibility) function Bel and Pl function are monotonic : uncertain-memberhip is more probable Return the uncertain-membership value when the belief for the positive- and negative-membership are approximately equal G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 15 / 20
  • 16. Experiments Experiments 15 query concepts randomly generated 10-fold cross validation number of candidates randomly selected: |ρ(·)| Comparison w.r.t. TDTs, ETDTs, TRFs Forest sizes: 10, 20, 30 trees Stratified Sampling rates: 50%, 70 %, 80 % Metrics: match: individuals for which the inductive model and a reasoner predict the same membership commission: cases of opposite predictions omission: individuals having a definite membership that cannot be predicted inductively; induction: predictions that are not logically derivable. G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 16 / 20
  • 17. Experiments Some results... Ontology index TDT ETDTs Bco M% 80.44 ± 11.01 90.31 ± 14.79 C% 07.56 ± 08.08 01.86 ± 02.61 O% 05.04 ± 04.28 00.00 ± 00.00 I% 06.96 ± 05.97 07.83 ± 15.35 Biopax M% 66.63 ± 14.60 87.00 ± 07.15 C% 31.03 ± 12.95 11.57 ± 02.62 O% 00.39 ± 00.61 00.00 ± 00.00 I% 01.95 ± 07.13 01.43 ± 08.32 NTN M% 68.85 ± 13.23 23.87 ± 26.18 C% 00.37 ± 00.30 00.00 ± 00.00 O% 09.51 ± 07.06 00.00 ± 00.00 I% 21.27 ± 08.73 75.13 ± 26.18 HD M% 58.31 ± 14.06 10.69 ± 01.47 C% 00.44 ± 00.47 00.07 ± 00.17 O% 05.51 ± 01.81 00.00 ± 00.00 I% 35.74 ± 15.90 89.24 ± 01.46 G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 17 / 20
  • 18. Experiments Some results... Ontology index Sampling rate 50 % TRF ETRF 10 trees 10 trees Bco M% 86.27 ± 15.79 91.31 ± 06.35 C% 02.47 ± 03.70 02.91 ± 02.45 O% 01.90 ± 07.30 00.00 ± 00.00 I% 09.36 ± 13.96 05.88 ± 06.49 Biopax M% 75.30 ± 16.23 96.92 ± 08.07 C% 18.74 ± 17.80 00.79 ± 01.22 O% 00.00 ± 00.00 00.00 ± 00.00 I% 01.97 ± 07.16 02.29 ± 08.13 NTN M% 83.41 ± 07.85 05.38 ± 07.38 C% 00.02 ± 00.04 06.58 ± 07.51 O% 13.40 ± 10.17 00.00 ± 00.00 I% 03.17 ± 04.65 88.05 ± 08.50 HD M% 68.00 ± 16.98 10.29 ± 00.00 C% 00.02 ± 00.05 00.26 ± 00.26 O% 06.38 ± 02.03 00.00 ± 00.00 I% 25.59 ± 18.98 89.24 ± 00.26 G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 18 / 20
  • 19. Experiments Discussion improved performance of ETRFs w.r.t. the other models higher match rate and induction rate a lower standard deviation smallest changes of performance w.r.t. the forest size weak diversification(overlapping) between trees by increasing the number of trees refinement operator is a bottleneck for learning phase G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 19 / 20
  • 20. Future Works Conclusions and Further Extensions We proposed an ensemble solution based on DS to improve the predictiveness of the models for class-membership prediction with imbalanced training data distribution Extensions: Development and reuse of refinement operators Further ensemble techniques and combination rules Experiments with larger ontologies Parallelization of the current implementation G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 20 / 20
  • 21. End Thank you! Questions? G.Rizzo et al. (DIB - Univ. Aldo Moro) ESWC 2015 ESWC 2015 June 3rd, 2015 20 / 20