Bagging Decision Trees on Data Sets
with Classification Noise
Joaquín Abellán and Andrés R. Masegosa
Department of Computer Science and Artificial Intelligence
University of Granada
Sofia, February 2010
6th International Symposium on Foundations of Information and Knowledge Systems
Part I
Introduction
Ensembles of Decision Trees (DT)
Features
They usually build different DT for different samples of the training dataset.
The final prediction is a combination of the individual predictions of each tree.
Take advantage of the inherent instability of DT.
Bagging, AdaBoost and Randomization are the best-known approaches.
Classification Noise (CN) in the class values
Definition
The class values of the samples given to the learning algorithm have some
errors.
Random classification noise: the label of each example is flipped randomly
and independently with some fixed probability called the noise rate.
Causes
It is mainly due to errors in the data capture process.
Very common in real world applications: surveys, biological or medical
information...
Effects on ensembles of decision trees
The presence of classification noise degrades the performance of any
classification inducer.
AdaBoost is known to be very affected by the presence of classification noise.
Bagging is the ensemble approach with the best response to classification
noise, especially with C4.5 [21] as the decision tree inducer.
Motivation of this study
Previous Works [7]
Simple decision trees (no continuous attributes, no missing values, no
post-pruning) built with different split criteria were considered in a Bagging
scheme.
Classic split criteria (InfoGain, InfoGain Ratio and Gini Index) and a new split
criterion based on imprecise probabilities were analyzed.
Imprecise Info-Gain generates the most robust Bagging ensembles in data
sets with classification noise.
Contributions
An extension of Bagging ensembles of credal decision trees to deal with:
Continuous Variables
Missing Data
Post-Pruning Process
Evaluate the performance on data sets with different rates of random
classification noise.
Experimental comparison with Bagging ensembles built with the C4.5R8 decision
tree inducer.
Part II
Previous Knowledge
Decision Trees
Description
Attributes are placed at the nodes.
Class value predictions are placed at
the leaves.
Each leaf corresponds to a
classification decision rule.
Learning
Split Criteria selects the attribute to place at each branching node.
Stop Criteria decides when to fix a leaf and stop the branching.
Post-Pruning simplifies the tree by pruning those branches with low support in
their associated decision rules.
C4.5 Tree Inducer
Description
It is the most famous tree inducer, introduced by Quinlan in 1993 [21].
Eight different releases have been proposed; in this work, we consider the last
one, C4.5R8.
The most influential data mining algorithm (IEEE ICDM06).
Features
Split Criteria: the Info-Gain Ratio, i.e. the quotient between the information gain of
an attribute and its entropy (a minimal sketch follows this list).
Numeric Attributes: It looks for the optimal binary split point in terms of
information gain.
Missing Values: assumes the Missing at Random hypothesis and marginalizes over
the missing variable when making predictions.
Post-Pruning: It employs the pessimistic error pruning: computes an upper
bound of the estimated error and when the bound of a leaf is higher than the
bound of its ancestor, this leaf is pruned.
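As a rough illustration of the Info-Gain Ratio score mentioned above, here is a minimal Python sketch for a discrete attribute. The function names are illustrative; this is not part of C4.5's actual implementation.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (base 2) of an array of labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def info_gain_ratio(attribute, labels):
    """Info-Gain Ratio for a discrete attribute: information gain divided by
    the entropy of the attribute itself (the split information)."""
    attribute, labels = np.asarray(attribute), np.asarray(labels)
    values, counts = np.unique(attribute, return_counts=True)
    weights = counts / counts.sum()
    conditional = sum(w * entropy(labels[attribute == v])
                      for v, w in zip(values, weights))
    gain = entropy(labels) - conditional
    split_info = entropy(attribute)
    return gain / split_info if split_info > 0 else 0.0
```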
Bagging Decision Trees
Procedure
Ti samples are generated by
random sampling with
replacement from the initial training
dataset.
From each Ti sample, a simple
decision tree is built using a given
split criterion.
The final prediction is made by a
majority voting criterion (see the sketch below).
Motivation
Employing different decision tree inducers (C4.5, CART,...) we get different
Bagging ensembles.
Bagging ensembles have been reported to be one of the best classification
models when there are noisy data samples [10,13,18].
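A minimal sketch of the Bagging procedure described on this slide, using scikit-learn's DecisionTreeClassifier as a stand-in base inducer (the experiments in this work use C4.5 and credal trees instead). X and y are assumed to be NumPy arrays with integer-coded class labels.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_trees=100, seed=0):
    """Train n_trees decision trees, each on a bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))   # sampling with replacement
        trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return trees

def bagging_predict(trees, X):
    """Combine the individual predictions by majority vote."""
    votes = np.array([t.predict(X) for t in trees])  # shape: (n_trees, n_samples)
    return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])
```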
Part III
Bagging Credal Decision Trees
Credal Decision Trees (CDT) Inducer
New Features
Handles numeric attributes and the presence of
missing values in the data set.
Adaptation of C4.5 methods.
Strong simplification of the algorithm, with far fewer
parameters.
Takes advantage of the good properties of the Imprecise
Information Gain [2] measure against overfitting.
Imprecise Information Gain (Imprecise-IG) measure
Maximum Entropy Function
Probability intervals for multinomial
variables are computed from the
data set using Walley’s IDM [22].
We then obtain a credal set of
probability distributions for the
class variable, K(C).
The maximum entropy S(K) of a credal set
is efficiently computed using
Abellán and Moral's method [2].
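The following sketch illustrates, under the IDM with parameter s, one way to obtain the maximum-entropy distribution of the credal set K(C): start from the interval lower bounds n_i/(N+s) and pour the remaining mass s/(N+s) onto the least frequent classes until they level up. This is a water-filling reading of the procedure; the actual algorithm in [2] is stated differently.

```python
import numpy as np

def max_entropy_idm(counts, s=1.0):
    """Maximum-entropy distribution in the IDM credal set for class counts.
    Lower bounds are n_i/(N+s); the free mass s/(N+s) is poured onto the
    least frequent classes until they level up (water filling)."""
    counts = np.asarray(counts, dtype=float)
    probs = counts / (counts.sum() + s)          # interval lower bounds
    mass = s / (counts.sum() + s)                # mass left to distribute
    order = np.argsort(probs)
    i = 0
    while mass > 1e-12 and i < len(order) - 1:
        gap = probs[order[i + 1]] - probs[order[i]]
        need = gap * (i + 1)                     # mass needed to level the bottom group up
        if need >= mass:
            probs[order[:i + 1]] += mass / (i + 1)
            mass = 0.0
        else:
            probs[order[:i + 1]] += gap
            mass -= need
            i += 1
    if mass > 1e-12:                             # every class already level
        probs += mass / len(probs)
    return probs

def max_entropy_value(counts, s=1.0):
    """S(K(C)): entropy (base 2) of the maximum-entropy distribution."""
    p = max_entropy_idm(counts, s)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))
```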
Imprecise-IG Split criteria
Imprecise Info-Gain for each variable X is defined as:
IIG(X, C) = S(K(C)) − Σ_i p(x_i) · S(K(C | X = x_i))
It was successfully applied to build simple decision trees in [3].
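Reusing max_entropy_value from the sketch above, the Imprecise-IG score can be computed as follows; variable and function names are illustrative.

```python
import numpy as np

def imprecise_info_gain(attribute, labels, classes, s=1.0):
    """IIG(X, C) = S(K(C)) - sum_i p(x_i) * S(K(C | X = x_i))."""
    attribute, labels = np.asarray(attribute), np.asarray(labels)
    root_counts = np.array([(labels == c).sum() for c in classes])
    iig = max_entropy_value(root_counts, s)
    for v in np.unique(attribute):
        mask = attribute == v
        branch_counts = np.array([(labels[mask] == c).sum() for c in classes])
        iig -= mask.mean() * max_entropy_value(branch_counts, s)
    return iig     # can be negative, which is what the stop criterion exploits
```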
Split Criteria
Credal DT
The attribute with maximum Imprecise-IG score is
selected as split attribute at each branching node.
C4.5R8
There are multiple conditions and heuristics:
Maximum Info-Gain Ratio score.
IG score higher than the average value of the valid split
attributes.
Valid split attributes are those whose number of values is
smaller than 30% of the number of samples in that branch.
Stop Criteria
Credal DT
All split attributes have negative
Imprecise-IG.
Minimum number of instances in a
leaf higher than 2.
C4.5R8
The Info-Gain measure is always positive, so it cannot trigger a stop by itself.
There are multiple conditions and heuristics:
Minimum number of instances in a leaf higher than 2.
When there is no valid split attribute.
Numeric Attributes
Credal DT
Each possible split point is evaluated; the one which generates the
bi-partition with the highest Imprecise-IG score is selected (see the sketch at the end of this slide).
C4.5R8
Same approach: split point with the maximum Info-Gain.
There are extra restrictions and heuristics:
Minimum n. of instances: 10% of the ratio between the number of
instances in this branch and the cardinality of the class variable.
Info-Gain is corrected by subtracting the logarithm of the number of
evaluated split points divided by the number of instances in this branch.
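A sketch of the binary split-point search referenced above: candidate thresholds are midpoints between consecutive distinct values, and the threshold whose bi-partition gets the highest score is kept. The imprecise_info_gain sketch from the previous slide is assumed as the score; C4.5 would plug in Info-Gain plus its extra corrections.

```python
import numpy as np

def best_binary_split(values, labels, classes, score=imprecise_info_gain):
    """Evaluate every midpoint between consecutive distinct values of a
    numeric attribute and return the threshold with the highest score."""
    v, y = np.asarray(values, dtype=float), np.asarray(labels)
    distinct = np.unique(v)                       # sorted distinct values
    best_threshold, best_score = None, -np.inf
    for a, b in zip(distinct[:-1], distinct[1:]):
        threshold = (a + b) / 2.0
        partition = (v > threshold).astype(int)   # induced two-valued attribute
        s = score(partition, y, classes)
        if s > best_score:
            best_threshold, best_score = threshold, s
    return best_threshold, best_score
```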
Post-Pruning
Credal DT
Reduced Error Pruning [20].
The simplest pruning process.
2 folds are used to build the tree and 1 fold to estimate the test error.
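A toy sketch of Reduced Error Pruning on a nested-dict tree representation (the node format, field names and helper functions are assumptions of this sketch, not of the paper): subtrees are replaced bottom-up by majority-class leaves whenever that does not increase the error on the held-out pruning fold. Internal nodes look like {'attr', 'children', 'majority'}, leaves like {'label'}; y_prune is a NumPy array, and every branch value seen in the pruning fold is assumed to exist in the tree.

```python
import numpy as np

def predict_one(node, x):
    """Walk the nested-dict tree until a leaf is reached."""
    while 'label' not in node:                 # internal node
        node = node['children'][x[node['attr']]]
    return node['label']

def n_errors(node, X, y):
    return sum(predict_one(node, xi) != yi for xi, yi in zip(X, y))

def reduced_error_pruning(node, X_prune, y_prune):
    """Bottom-up REP: replace a subtree by a majority-class leaf whenever this
    does not increase the error on the held-out pruning fold."""
    if 'label' in node or len(y_prune) == 0:   # leaf, or no data reaches here
        return node
    for value, child in node['children'].items():
        mask = np.array([x[node['attr']] == value for x in X_prune])
        node['children'][value] = reduced_error_pruning(
            child, [x for x, m in zip(X_prune, mask) if m], y_prune[mask])
    leaf = {'label': node['majority']}         # majority class seen at training time
    if n_errors(leaf, X_prune, y_prune) <= n_errors(node, X_prune, y_prune):
        return leaf
    return node
```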
Part IV
Experiments
Experimental Set-up
Benchmark
25 UCI data sets with very different features.
Bagging ensembles of 100 trees.
Bagging-CDT versus Bagging-C4.5R8.
Different noise rates were applied to the training data sets (not to the test data sets):
0%, 5%, 10%, 20% and 30% (see the sketch below).
10-fold cross-validation repeated 10 times was used to estimate the
classification accuracy.
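A sketch of the random classification noise used to corrupt the training sets: each label is replaced, independently and with probability equal to the noise rate, by a different class chosen uniformly at random. The exact protocol in the paper may differ in details.

```python
import numpy as np

def add_classification_noise(y, noise_rate, seed=0):
    """Return a copy of y where each label is replaced, with probability
    noise_rate, by a different class drawn uniformly at random."""
    rng = np.random.default_rng(seed)
    y_noisy = np.asarray(y).copy()
    classes = np.unique(y_noisy)
    flip = rng.random(len(y_noisy)) < noise_rate
    for i in np.flatnonzero(flip):
        others = classes[classes != y_noisy[i]]
        y_noisy[i] = rng.choice(others)
    return y_noisy

# e.g. train_labels_10 = add_classification_noise(train_labels, 0.10)
```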
Statistical Tests [12,24]
Corrected Paired T-test.
Wilcoxon Signed-Ranks Test.
Sign Test.
Friedman Test.
Nemenyi Test.
Performance Evaluation without tree post-pruning
Analysis
There are no statistically significant differences at low noise levels.
B-CDT outperforms B-C4.5 ensembles at high noise levels.
B-CDT always induces simpler decision trees (lower number of nodes).
Performance Evaluation with tree post-pruning
Analysis
Post-pruning methods help to improve the performance when there is noise in
the data.
The performance of B-C4.5 does not degrade as quickly.
B-CDT also has better performance at high noise levels.
B-CDT also induces simpler trees.
Bias-Variance (BV) Error Analysis
Bias-Variance Decomposition of 0-1 Loss Functions [17]
Error = Bias² + Variance
Bias: Error component due to the incapacity of the predictor to model the
underlying distribution.
Variance: Error component that stems from the particularities of the training data
set (i.e. measure of overfitting).
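A rough sketch of how such a bias-variance decomposition of the 0-1 loss can be estimated empirically: train the classifier on many bootstrap training sets, take the majority ("main") prediction per test point, and read bias from the main prediction's error and variance from the disagreement with it. This follows the spirit of Kohavi-Wolpert-style estimators; the exact estimator used in [17] may differ. The fit_predict callable and integer-coded labels are assumptions of this sketch.

```python
import numpy as np

def bias_variance_01(fit_predict, X_train, y_train, X_test, y_test,
                     n_rounds=50, seed=0):
    """Estimate 0-1 loss bias/variance terms over bootstrap training sets.
    fit_predict(Xtr, ytr, Xte) -> predicted labels is assumed by this sketch."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_rounds):
        idx = rng.integers(0, len(X_train), size=len(X_train))
        preds.append(fit_predict(X_train[idx], y_train[idx], X_test))
    preds = np.array(preds)                          # shape: (n_rounds, n_test)
    # main prediction: majority vote over training rounds for each test point
    main = np.array([np.bincount(col.astype(int)).argmax() for col in preds.T])
    bias = np.mean(main != y_test)                   # main prediction is wrong
    variance = np.mean(preds != main[None, :])       # disagreement with the main prediction
    return bias, variance
```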
BV Error Differences: No pruning
Analysis
Bias differences remain stable across the different noise levels.
Variance differences increase with the noise rate.
B-CDT does not overfit as much as B-C4.5 (lower variance error) at high noise levels.
The large number of heuristics in C4.5 may not help when there is
spurious data.
Part V
Conclusions and Future Works
Conclusions
We have presented an interesting application of information-based
uncertainty measures to a challenging data-mining problem.
A very simple decision tree inducer that handles numeric attributes and deals
with missing values has been proposed.
An extensive experimental evaluation has been carried out to analyze the effect
of classification noise on Bagging ensembles.
Bagging with decision trees induced by the Imprecise-IG measure has better
performance and less overfitting at medium-to-high noise levels.
The Imprecise-IG is a robust split criterion for building Bagging ensembles of
decision trees.
Future Works
Develop a new pruning method based on imprecise probabilities.
Extend these methods to carry out credal classification.
Apply new imprecise models such as the NPI model.
Thanks for your attention!!
Questions?