Split Criterions for
Variable Selection
using Decision Trees
J. Abellán, A. R. Masegosa
Department of Computer Science and A.I.
University of Granada
Spain
Outline
1. Introduction
2. Previous knowledge
3. Experimentation
4. Conclusions & future work
Introduction
Information from a database
Attribute variables: Calcium, Tumor, Coma, Migraine; class variable: Cancer
Database:
Calcium   Tumor   Coma      Migraine   Cancer
normal    a1      absent    absent     absent
high      a1      present   absent     present
normal    a1      absent    absent     absent
normal    a1      absent    absent     absent
high      a0      present   present    absent
......    ......  ......    ......     ......
Introduction
Classification tree (decision tree)
[Tree diagram: internal nodes test Tumor and Calcium; each leaf gives a
classification of Cancer as "absent" or "present"]
 Node = attribute variable
 Leaf = case of the class variable
 SPLIT CRITERION
 STOP CRITERION
 1 LEAF = 1 RULE
Introduction
Classification tree. New observation
 Observation: (high, a1, absent, present)
 Variables: [Calcium, Tumor, Coma, Migraine]
 Classification: Cancer present
[Tree diagram: following the branches Tumor = a1 and Calcium = high leads to the
leaf "Classification: present"; the other leaves are labelled "Classification:
absent"] (a small code sketch of this classification step follows)
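To make the classification step concrete, here is a minimal sketch of how such a tree labels the new observation. The extracted slide leaves the node order ambiguous, so the sketch assumes Tumor at the root and Calcium on the second level; the observation (Calcium = high, Tumor = a1) reaches the "present" leaf either way.

def classify(observation):
    """observation: dict with keys Calcium, Tumor, Coma, Migraine (discrete values)."""
    if observation["Tumor"] == "a0":            # leaf: Cancer absent
        return "absent"
    if observation["Calcium"] == "normal":      # Tumor = a1, leaf: Cancer absent
        return "absent"
    return "present"                            # Tumor = a1 and Calcium = high

obs = {"Calcium": "high", "Tumor": "a1", "Coma": "absent", "Migraine": "present"}
print(classify(obs))   # -> present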
Introduction
Main problems for classifiers
 Redundant attribute variables
 Irrelevant attribute variables
 Excessive number of variables
Variable selection methods
 Filter methods
 Wrapper methods (classifier dependent)
M. A. Hall and G. Holmes, Benchmarking Attribute Selection Techniques for
Discrete Class Data Mining, IEEE TKDE (2003)
Introduction
Variable selection with classification trees
[Tree diagram: Xa at the root; Xb, Xc, Xd on the second level; Xe, Xf, Xg, Xh,
Xi, Xj, Xk on deeper levels]
 Full variable set: {Xa, Xb, ..., Xk}
 Selected variables (first levels): {Xa, Xb, Xc, Xd}
FIRST LEVELS → MOST SIGNIFICANT VARIABLES
Introduction
Variable selection with classification trees
[Diagram: from the database, m training sets are drawn; a decision tree is built
on each training set, yielding the variable sets SET1, SET2, ..., SETm; the
final selection is the union SET1 ∪ ... ∪ SETm]
Introduction
Variable selection with classification trees
[Diagram: as above, m training sets and m trees produce SET1, SET2, ..., SETm
and their union SET1 ∪ ... ∪ SETm; these are used to obtain an informative
order for the root node (Abellán & Masegosa, 2007)]
Introduction
Approach of the work presented
 Establish the most suitable split criterion for building the
decision trees that serve as the base of the composite
VARIABLE SELECTION methods above.
 The variables in the first levels of a single decision tree
are extracted.
 The performance of these variables is evaluated with a
Naive Bayes classifier (see the pipeline sketch below).
 We carry out EXPERIMENTS on a large set of databases
using well-known split criteria (InfoGain, IGRatio and
GiniIndex) and one based on imprecise probabilities
(Abellán & Moral, 2003): Imprecise InfoGain.
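As a rough, hedged sketch of this pipeline (not the authors' implementation): grow one decision tree, keep the attribute variables that appear in its first levels, and score a Naive Bayes classifier restricted to them. The scikit-learn estimators and the helper names (variables_in_first_levels, max_level, evaluate_selection) are illustrative; scikit-learn only offers the InfoGain ("entropy") and Gini split criteria, which stand in here for the criteria compared in the paper.

from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

def variables_in_first_levels(tree, max_level=3):
    """Indices of the features tested by nodes in the first `max_level` levels."""
    selected, stack = set(), [(0, 0)]                 # (node id, depth), root = 0
    left, right, feat = tree.children_left, tree.children_right, tree.feature
    while stack:
        node, depth = stack.pop()
        if left[node] == -1 or depth >= max_level:    # leaf, or below the cut level
            continue
        selected.add(feat[node])
        stack += [(left[node], depth + 1), (right[node], depth + 1)]
    return sorted(selected)

def evaluate_selection(X, y, criterion="entropy", levels=3):
    """X: numeric feature matrix (numpy array), y: class labels."""
    dt = DecisionTreeClassifier(criterion=criterion).fit(X, y)
    cols = variables_in_first_levels(dt.tree_, levels)
    return cols, cross_val_score(GaussianNB(), X[:, cols], y, cv=10).mean()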
Outline
1. Introduction
2. Previous knowledge
3. Experimentation
4. Conclusions & future work
Previous knowledge
Naive Bayes (Duda & Hart, 1973)
 Attribute variables {Xi | i = 1, ..., r}
 Class variable C with states in {c1, ..., ck}
 Select the state of C:
arg max_ci (P(ci | X))
 Assumption of independence of the attributes given the
class variable:
arg max_ci (P(ci) ∏_{j=1..r} P(xj | ci))
[Graphical structure: C is the parent of X1, X2, ..., Xr]
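A minimal sketch of this rule for discrete attributes, estimating P(c) and P(xj | c) by relative frequencies from a training sample; the data layout (rows of attribute tuples plus a parallel list of class labels) and the function names are illustrative, and no smoothing is applied.

from collections import Counter, defaultdict

def train_nb(rows, labels):
    class_counts = Counter(labels)                      # counts for P(c)
    cond_counts = defaultdict(Counter)                  # (attribute j, class c) -> value counts
    for row, c in zip(rows, labels):
        for j, value in enumerate(row):
            cond_counts[(j, c)][value] += 1
    return class_counts, cond_counts

def classify_nb(row, class_counts, cond_counts):
    n = sum(class_counts.values())
    best, best_score = None, -1.0
    for c, n_c in class_counts.items():
        score = n_c / n                                 # P(c)
        for j, value in enumerate(row):
            score *= cond_counts[(j, c)][value] / n_c   # P(x_j | c); 0 if value unseen for c
        if score > best_score:
            best, best_score = c, score
    return best                                         # arg max_c P(c) * prod_j P(x_j | c)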
Previous knowledge
Split criteria for decision trees:
Info-Gain (Quinlan, 1986)
 Selects the attribute variable with the highest positive
value of IG(Xi,C) = H(C) - H(C|Xi)
 SHANNON ENTROPY: H(C) = -∑j P(cj) log P(cj)
 H(C|Xi) = -∑t P(xi^t) ∑j P(cj|xi^t) log P(cj|xi^t)
ID3
 Works only with discrete databases
 Tends to select variables with a large number of cases
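A short sketch of the Info-Gain computation just defined, for one discrete attribute column x and the class column c given as plain Python lists of equal length (names are illustrative).

from collections import Counter
from math import log2

def entropy(values):
    n = len(values)
    return -sum(k / n * log2(k / n) for k in Counter(values).values())

def info_gain(x, c):
    n = len(c)
    h_cond = 0.0
    for value in set(x):
        subset = [cj for xj, cj in zip(x, c) if xj == value]
        h_cond += len(subset) / n * entropy(subset)   # P(x = value) * H(C | x = value)
    return entropy(c) - h_cond                        # IG(X, C) = H(C) - H(C | X)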
Previous knowledge
Split criteria for decision trees:
Info-Gain Ratio (Quinlan, 1993)
 Selects the attribute variable with the highest positive
value of IGR(Xi,C) = IG(Xi,C) / H(Xi)
C4.5
 Works with continuous databases
 Applies a post-pruning process
 Penalizes variables with a higher number of cases
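The ratio is a one-line extension of the previous sketch, reusing entropy() and info_gain(); the guard against a zero split entropy is ours.

def info_gain_ratio(x, c):
    h_x = entropy(x)                                  # split entropy H(X) of the attribute
    return info_gain(x, c) / h_x if h_x > 0 else 0.0  # IGR(X, C) = IG(X, C) / H(X)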
Previous knowledge
Split criteria for decision trees:
Gini index (Breiman et al., 1984)
 Selects the attribute variable with the highest positive
value of GIx(Xi,C) = gini(C|Xi) - gini(C)
 gini(C) = ∑j (1 - P(cj))^2
 gini(C|Xi) = ∑t P(xi^t) gini(C|xi^t)
GINI INDEX
Quantifies the degree of impurity of a partition
(a "pure" partition has values in only one case of C)
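A sketch of the Gini score with the definitions given on this slide; for a fixed set of classes, this gini(C|Xi) - gini(C) form is algebraically equal to the usual impurity-reduction Gini gain, so attributes are ranked identically. Function and variable names are illustrative.

from collections import Counter

def gini(values, classes):
    """gini(C) = sum_j (1 - P(cj))^2, summed over every class in `classes`."""
    n = len(values)
    counts = Counter(values)
    return sum((1 - counts[c] / n) ** 2 for c in classes)

def gini_score(x, c):
    classes, n = set(c), len(c)
    g_cond = 0.0
    for value in set(x):
        subset = [cj for xj, cj in zip(x, c) if xj == value]
        g_cond += len(subset) / n * gini(subset, classes)   # P(x = value) * gini(C | x = value)
    return g_cond - gini(c, classes)                        # GIx(X, C)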
Previous knowledge
Split criteria for decision trees:
Imprecise Info-Gain (Abellán & Moral, 2003)
 Representing the information from a database:
Imprecise Dirichlet Model (IDM) probability estimation
P(cj) ∈ [ n_cj / (N + s) , (n_cj + s) / (N + s) ] ≡ I_cj
Credal sets
K(C) = { q | q(cj) ∈ I_cj }
K(C | Xi = xi) = { q | q(cj) ∈ I_{cj, xi} }
Previous knowledge
Split criteria for decision trees:
Imprecise Info-Gain (Abellán & Moral, 2003)
 Selects the attribute variable with the highest positive
value of:
IGI(Xi,C) = S(K(C)) - ∑t P(xi^t) S(K(C | Xi = xi^t))
with S the maximum entropy function of a credal set.
 Global uncertainty measure ⊃ conflict & non-specificity
 Conflict favours ramification (branching).
 Non-specificity tends to reduce ramification.
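A hedged sketch of the Imprecise Info-Gain score, assuming the usual IDM hyper-parameter s = 1. With s = 1 and integer counts, the maximum-entropy distribution of the credal set is obtained by sharing the extra mass s among the classes with the lowest observed count; S(K(·)) is the Shannon entropy of that distribution. The names and the s = 1 choice are ours, not taken from the slide.

from collections import Counter
from math import log2

S_IDM = 1  # IDM hyper-parameter s (assumed)

def max_entropy_idm(values, classes):
    """S(K(C)): entropy of the maximum-entropy distribution in the IDM credal set."""
    counts, n = Counter(values), len(values)
    low = min(counts[c] for c in classes)
    ties = [c for c in classes if counts[c] == low]
    q = {c: (counts[c] + (S_IDM / len(ties) if c in ties else 0)) / (n + S_IDM)
         for c in classes}
    return -sum(p * log2(p) for p in q.values() if p > 0)

def imprecise_info_gain(x, c):
    classes, n = set(c), len(c)
    cond = 0.0
    for value in set(x):
        subset = [cj for xj, cj in zip(x, c) if xj == value]
        cond += len(subset) / n * max_entropy_idm(subset, classes)   # P(x^t) * S(K(C | x^t))
    return max_entropy_idm(c, classes) - cond                        # IGI(X, C)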
Outline
1. Introduction
2. Previous knowledge
3. Experimentation
4. Conclusions & future work
Experimentation
Data bases
 Preprocessing:
- Imputation of missing data (average & mode)
- Discretization of continuous values
 Application of the selection methods
 Application of NB to the original databases, restricted to
the set of selected variables (see the preprocessing sketch
below)
Evaluation:
• Percentage of correct classifications of NB before and
after the selection process
• Number of variables selected
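A hedged sketch of this preprocessing with pandas as a stand-in for the original tooling: mean/mode imputation followed by a simple equal-width discretization (the slide does not state which discretization method was actually used, so n_bins and the binning scheme are illustrative).

import pandas as pd

def preprocess(df, n_bins=5):
    df = df.copy()
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(df[col].mean())            # fill with the average
            df[col] = pd.cut(df[col], bins=n_bins).astype(str)  # discretize into intervals
        else:
            df[col] = df[col].fillna(df[col].mode().iloc[0])    # fill with the mode
    return df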
Experimentation
Results with 3 levels. Correct classifications
 NB comparison:
 Accumulated comparison:
10-fold cross-validation repeated 10 times. Corrected paired t-test at the 5%
significance level (a sketch of this test follows).
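The "corrected paired t-test" used with 10 x 10-fold cross-validation is, presumably, the corrected resampled t-test (the Nadeau & Bengio variance correction popularised by Weka); a hedged sketch of that statistic follows, with the 10-fold test/train ratio as a default. The function name and arguments are illustrative.

from statistics import mean, variance
from scipy.stats import t as t_dist

def corrected_paired_ttest(diffs, test_train_ratio=1.0 / 9.0, alpha=0.05):
    """diffs: per-fold accuracy differences between two classifiers (e.g. 100 values)."""
    k = len(diffs)                                   # 10 folds x 10 repetitions = 100
    d_bar, var_d = mean(diffs), variance(diffs)      # sample mean and variance
    t_stat = d_bar / ((1.0 / k + test_train_ratio) * var_d) ** 0.5
    p_value = 2 * t_dist.sf(abs(t_stat), df=k - 1)   # two-sided p-value
    return t_stat, p_value, p_value < alpha          # significant at the 5% level?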
Experimentation
Results with 3 levels. Number of variables
 Accumulated Comparison:
Experimentation
Results with 4 levels
 Comparison of correct classifications:
 Comparison of the number of variables:
Experimentation
Results analysis
1. Using only one tree, all the procedures obtain
good results with a small number of variables.
2. The improvement from 3 to 4 levels is not very
significant, except for IGR.
3. IGR penalizes variables with a high number of
cases too strongly (Audiology, Optdigits, ...).
4. With 3 levels, IGI obtains better results than the
other criteria, and its advantage grows with 4 levels.
Outline
1. Introduction
2. Previous knowledge
3. Experimentation
4. Conclusions & future work
Conclusions & future work
 Experiments over 27 databases show IGI to be the
best-performing split criterion when considering the
trade-off between accuracy and number of variables.
 Apply the IGI criterion, and others based on Bayesian
scores, within the composite methods explained in the
introduction.
 Study the use of combined criteria, i.e. using one
criterion or another depending on the characteristics of
the database (size, number of variables, number of
cases, etc.) and on the level of the tree.