In the field of attribute mining, several feature selection methods have recently appeared indicating that sets of decision trees learnt from a data set can be a useful tool for selecting variables that are relevant and informative with respect to a main class variable. With this aim, in this study we claim that the use of a new split criterion to build decision trees outperforms other classic split criteria for variable selection purposes. We present an experimental study on a wide and varied set of databases, using only one decision tree with each split criterion to select variables for the Naive Bayes classifier.
Split Criterions for Variable Selection Using Decision Trees
1. Split Criterions for Variable Selection Using Decision Trees
J. Abellán, A. R. Masegosa
Department of Computer Science and A.I.
University of Granada
Spain
3. Introduction
Information from a database
Database
Attribute variables: Calcium, Tumor, Coma, Migraine    Class variable: Cancer

Calcium   Tumor   Coma      Migraine   Cancer
normal    a1      absent    absent     absent
high      a1      present   absent     present
normal    a1      absent    absent     absent
normal    a1      absent    absent     absent
high      a0      present   present    absent
...       ...     ...       ...        ...
4. Introduction
Classification tree (decision tree)
[Figure: a classification tree. The root node tests Tumor; one branch leads to a node testing Calcium; the leaves give the classifications "absent" or "present".]
Node = attribute variable; Leaf = case of the class variable
SPLIT CRITERION
STOP CRITERION
1 LEAF = 1 RULE
5. Introduction
Classification tree. New observation
Observation: (high, a1, absent, present)
Variables: [Calcium, Tumor, Coma, Migraine]
Classification: Cancer present
[Figure: the tree traversed for the new observation. Root node Calcium: "normal" leads to the leaf "Classification: absent"; "high" leads to the node Tumor. Tumor: "a0" leads to "Classification: absent"; "a1" leads to "Classification: present".]
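To make the traversal concrete, here is a minimal sketch of the slides' tree encoded as nested dicts; the encoding and function names are illustrative, not part of the original presentation.

```python
# Minimal sketch: the slides' tree as nested dicts (illustrative encoding).
tree = {
    "attribute": "Calcium",
    "branches": {
        "normal": {"leaf": "absent"},
        "high": {
            "attribute": "Tumor",
            "branches": {"a0": {"leaf": "absent"}, "a1": {"leaf": "present"}},
        },
    },
}

def classify(node, observation):
    """Follow branches until a leaf is reached; each leaf is one rule."""
    while "leaf" not in node:
        node = node["branches"][observation[node["attribute"]]]
    return node["leaf"]

obs = {"Calcium": "high", "Tumor": "a1", "Coma": "absent", "Migraine": "present"}
print(classify(tree, obs))  # -> "present" (Coma and Migraine are never tested)
```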
6. Introduction
Principal problems for classifiers
Redundant attribute variables
Irrelevant attribute variables
Excessive number of variables
Variable Selection Methods
Filter methods
Wrapper methods (classifier-dependent)
Mark A. Hall and G. Holmes, Benchmarking Attribute Selection Techniques for Discrete Class Data Mining, IEEE TKDE (2003)
7. Introduction
Variable selection with classification trees
[Figure: a decision tree with root Xa; second level Xb, Xc, Xd; third level Xe, Xf, Xg, Xh, Xi, Xj, Xk; further levels omitted. Taking the first two levels selects {Xa, Xb, Xc, Xd}; taking the whole tree selects {Xa, Xb, ..., Xk}.]
THE FIRST LEVELS CONTAIN THE MOST SIGNIFICANT VARIABLES (a code sketch follows)
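As a concrete illustration of extracting the first levels, the following hedged sketch collects the attributes tested in the first `levels` levels of a fitted decision tree; scikit-learn and its `tree_` internals are our choice, as the slides do not name an implementation.

```python
# Sketch: variables appearing in the first `levels` levels of a fitted tree
# (scikit-learn is our assumption; the slides do not name an implementation).
from sklearn.tree import DecisionTreeClassifier

def first_level_variables(clf: DecisionTreeClassifier, levels: int) -> set:
    t = clf.tree_
    selected, stack = set(), [(0, 0)]            # (node id, depth); root is 0
    while stack:
        node, depth = stack.pop()
        left, right = t.children_left[node], t.children_right[node]
        if left == right or depth >= levels:     # leaf (-1 == -1) or too deep
            continue
        selected.add(t.feature[node])            # feature index tested here
        stack.append((left, depth + 1))
        stack.append((right, depth + 1))
    return selected
```

The returned feature indices would then be the reduced variable set fed to the Naive Bayes classifier.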
8. Introduction
Variable selection with classification trees
[Figure: the database DB is resampled into m training sets; a decision tree is built on each one and a variable set is extracted from it (SET1, SET2, ..., SETm); the final selection is the union SET1 U ... U SETm.]
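A hedged sketch of this composed scheme, reusing `first_level_variables` from the previous sketch; the slide does not specify how the training sets are drawn, so bootstrap resampling is our assumption.

```python
# Sketch: one tree per resample, union of the variables in their first levels.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def composed_selection(X, y, m=10, levels=3, seed=0):
    rng = np.random.default_rng(seed)
    selected = set()
    for _ in range(m):
        idx = rng.integers(0, len(X), size=len(X))      # bootstrap resample
        clf = DecisionTreeClassifier().fit(X[idx], y[idx])
        selected |= first_level_variables(clf, levels)  # SET1 U ... U SETm
    return selected
```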
9. Introduction
Variable selection with classification trees
[Figure: the same resampling scheme; from the m trees, the variables are combined into an informative order for the root node (Abellán & Masegosa, 2007).]
10. Introduction
Approach of the work presented
Establish the most suitable split criterion for building decision trees, to be used as the base of the composed methods for VARIABLE SELECTION described above.
The variables in the first levels of a single decision tree are extracted.
The performance of these variables is evaluated with a Naive Bayes classifier.
We carry out EXPERIMENTS on a large set of databases using the well-known split criteria InfoGain, IGRatio and GiniIndex, and another one based on imprecise probabilities (Abellán & Moral, 2003): Imprecise InfoGain.
12. Previous knowledge
Naive Bayes (Duda & Hart, 1973)
Attribute variables {Xi | i=1,..,r}
Class variable C with states in
{c1,..,ck}
Select the state of C: argmax_{ci} P(ci | X).
Assuming independence of the attributes given the class variable:
argmax_{ci} P(ci) ∏_{j=1}^{r} P(xj | ci)
[Figure: graphical structure of Naive Bayes: the class node C points to each attribute X1, X2, ..., Xr.]
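A from-scratch sketch of this classifier for discrete attributes follows; the Laplace smoothing parameter `alpha` is our addition to avoid zero probabilities and is not mentioned on the slide.

```python
# Sketch of Naive Bayes for discrete attributes (illustrative; Laplace
# smoothing added by us, not part of the original presentation).
import math
from collections import Counter, defaultdict

def fit_nb(rows, labels, alpha=1.0):
    priors = Counter(labels)                    # class frequencies
    cond = defaultdict(Counter)                 # (attribute j, class) -> value counts
    n_values = [len({r[j] for r in rows}) for j in range(len(rows[0]))]
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            cond[(j, c)][v] += 1
    return priors, cond, n_values, alpha

def predict_nb(model, row):
    """argmax_ci P(ci) * prod_j P(xj | ci), computed in log space."""
    priors, cond, n_values, alpha = model
    n = sum(priors.values())
    def log_score(c):
        s = math.log(priors[c] / n)             # log P(ci)
        for j, v in enumerate(row):
            cnt = cond[(j, c)]
            s += math.log((cnt[v] + alpha) /
                          (sum(cnt.values()) + alpha * n_values[j]))
        return s
    return max(priors, key=log_score)
```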
13. Previous knowledge
Split criteria for decision trees:
Info-Gain (Quinlan, 1986)
Selects the attribute variable with the highest positive value of IG(Xi, C) = H(C) - H(C|Xi)
H(C) = -∑_j P(cj) log P(cj)   (Shannon entropy)
H(C|Xi) = -∑_t P(xi^t) ∑_j P(cj | xi^t) log P(cj | xi^t)
Used in ID3
Works only with discrete databases
Has a tendency to select variables with a large number of states
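A plain-Python sketch of this criterion from empirical frequencies (our illustration, not the software used in the paper's experiments):

```python
# Sketch: Info-Gain from empirical frequencies.
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(C) of a sequence of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())

def info_gain(column, labels):
    """IG(Xi, C) = H(C) - sum_t P(xi^t) H(C | xi^t)."""
    n = len(labels)
    h_cond = 0.0
    for v in set(column):
        sub = [c for x, c in zip(column, labels) if x == v]
        h_cond += (len(sub) / n) * entropy(sub)
    return entropy(labels) - h_cond
```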
14. Previous knowledge
Split criteria for decision trees:
Info-Gain Ratio (Quinlan, 1993)
Selects the attribute variable with the highest positive value of IGR(Xi, C) = IG(Xi, C) / H(Xi)
Used in C4.5
Works with continuous databases
Applies a posterior pruning process
Penalizes variables with a higher number of states
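Reusing the helpers from the Info-Gain sketch, the ratio is a one-line normalization; guarding against H(Xi) = 0 is our addition.

```python
# Sketch: IGR(Xi, C) = IG(Xi, C) / H(Xi); dividing by the attribute's own
# entropy penalizes attributes that split the data into many values.
def info_gain_ratio(column, labels):
    h_attr = entropy(column)                 # H(Xi)
    return info_gain(column, labels) / h_attr if h_attr > 0 else 0.0
```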
15. Previous knowledge
Split criteria for decision trees:
Gini index (Breiman et al., 1984)
Selects the attribute variable with the highest positive value of GIx(Xi, C) = gini(C|Xi) - gini(C)
gini(C) = ∑_j (1 - P(cj))^2
gini(C|Xi) = ∑_t P(xi^t) gini(C | xi^t)
GINI INDEX: quantifies the impurity degree of a partition (a "pure partition" has all its values in only one case of C)
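A sketch of the criterion exactly as defined on the slide; note that this form, gini(C) = ∑_j (1 - P(cj))^2, grows as the partition gets purer, so the difference gini(C|Xi) - gini(C) is positive for informative splits (the more common impurity form is 1 - ∑_j P(cj)^2).

```python
# Sketch: Gini-based split score as the slides define it.
from collections import Counter

def gini(labels):
    n = len(labels)
    return sum((1 - k / n) ** 2 for k in Counter(labels).values())

def gini_gain(column, labels):
    """GIx(Xi, C) = gini(C|Xi) - gini(C); higher means purer children."""
    n = len(labels)
    g_cond = 0.0
    for v in set(column):
        sub = [c for x, c in zip(column, labels) if x == v]
        g_cond += (len(sub) / n) * gini(sub)
    return g_cond - gini(labels)
```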
16. Previous knowledge
Split criteria for decision trees:
Imprecise Info-Gain (Abellán & Moral, 2003)
Representing the information from a database
Imprecise Dirichlet Model (IDM) probability estimation:
P(cj) ∈ I_cj = [ n_cj / (N + s) , (n_cj + s) / (N + s) ]
(n_cj: frequency of cj in the database; N: sample size; s: IDM parameter)
Credal sets:
K(C) = { q | q(cj) ∈ I_cj }
K(C | Xi = xi^t) = { q | q(cj) ∈ I_cj^{xi^t} }
(the conditional intervals I_cj^{xi^t} are computed from the subsample where Xi = xi^t)
17. Previous knowledge
Split criteria for decision trees:
Imprecise Info-Gain (Abellán & Moral, 2003)
Selects the attribute variable with the highest positive value of:
IGI(Xi, C) = S(K(C)) - ∑_t P(xi^t) S(K(C | Xi = xi^t))
with S the maximum entropy function on a credal set.
S is a global uncertainty measure, comprising both conflict and non-specificity.
Conflict favors ramification; non-specificity tends to reduce ramification.
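A sketch of both pieces: the maximum entropy of an IDM credal set, computed by spreading the free mass s over the smallest class counts (following Abellán & Moral's procedure), and the resulting IGI score. The value s = 1 is a common choice for the IDM parameter and is assumed here.

```python
# Sketch: maximum entropy of the IDM credal set and the IGI criterion
# (s = 1 assumed; the slides do not fix the IDM parameter).
import math
from collections import Counter

def max_entropy_idm(counts, s=1.0):
    """S(K(C)): water-fill the free mass s over the smallest counts,
    then take the Shannon entropy of the resulting distribution."""
    n, mass, eps = list(map(float, counts)), s, 1e-9
    while mass > eps:
        lo = min(n)
        idx = [i for i, v in enumerate(n) if v < lo + eps]
        higher = [v for v in n if v >= lo + eps]
        step = min(min(higher) - lo if higher else mass, mass / len(idx))
        for i in idx:
            n[i] += step
        mass -= step * len(idx)
    total = sum(counts) + s
    return -sum((v / total) * math.log2(v / total) for v in n if v > 0)

def imprecise_info_gain(column, labels, s=1.0):
    """IGI(Xi, C) = S(K(C)) - sum_t P(xi^t) S(K(C | Xi = xi^t))."""
    classes, n = sorted(set(labels)), len(labels)
    def S(subset):
        counts = Counter(subset)
        return max_entropy_idm([counts[c] for c in classes], s)
    score = S(labels)
    for v in set(column):
        sub = [c for x, c in zip(column, labels) if x == v]
        score -= (len(sub) / n) * S(sub)
    return score
```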
19. Experimentation
Databases
Preprocessing:
- Filling in missing data (mean & mode)
- Discretization of continuous values (see the sketch after this list)
Application of the selection methods
Application of NB on the original databases with the set of selected variables
Measures:
• Percentage of correct classifications of NB before and after the selection process
• Number of variables selected
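A hedged sketch of this pipeline; the libraries, the imputation split by attribute type, and the 5-bin quantile discretization are our choices, as the slides do not specify them.

```python
# Sketch of the described preprocessing (tools and bin count are our choice).
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import KBinsDiscretizer

def preprocess(X_continuous, X_discrete):
    X_c = SimpleImputer(strategy="mean").fit_transform(X_continuous)
    X_d = SimpleImputer(strategy="most_frequent").fit_transform(X_discrete)
    X_c = KBinsDiscretizer(n_bins=5, encode="ordinal",
                           strategy="quantile").fit_transform(X_c)
    return np.hstack([X_c, X_d])
```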
20. Experimentation
Results with 3 levels. Correct classifications
[Table: NB comparison and accumulated comparison of the criteria on the test databases.]
Validation: 10-fold cross-validation repeated 10 times; corrected paired t-test at the 5% significance level.
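The "corrected paired t-test" is commonly the Nadeau & Bengio (2003) resampled variant; below is a sketch under that assumption, where the variance of the score differences is inflated by the train/test overlap term n_test/n_train.

```python
# Sketch: corrected paired t-test for repeated cross-validation results
# (Nadeau & Bengio, 2003 variant assumed; not stated on the slide).
import math
from scipy import stats

def corrected_paired_ttest(scores_a, scores_b, n_train, n_test):
    d = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(d)                                    # 100 runs for 10 x 10-fold CV
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)
    t = mean / math.sqrt((1.0 / n + n_test / n_train) * var)
    p_value = 2 * stats.t.sf(abs(t), df=n - 1)    # two-sided; compare to 0.05
    return t, p_value
```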
23. Experimentation
Results Analysis
1. Using only one tree, all the procedures obtain good results with a small number of variables.
2. The improvement from 3 to 4 levels is not very significant, except for IGR.
3. IGR excessively penalizes variables with a high number of states (Audiology, Optdigits, ...).
4. Using 3 levels, IGI obtains better results than the other criteria; the advantage is larger with 4 levels.
25. Conclusions & future work
Experiments over 27 databases show IGI to be an outperforming split criterion, considering the trade-off between accuracy and number of variables.
Apply the IGI criterion, and other criteria based on Bayesian scores, to the composed methods explained in the introduction.
Study the use of combined criteria, i.e. using one criterion or another depending on the characteristics of the database (size, number of variables, number of states, etc.) and on the level of the tree being split.