In the field of attribute mining, several feature selection methods have recently appeared indicating that sets of decision trees learnt from a data set can be a useful tool for selecting variables that are relevant and informative with respect to a main class variable. With this aim, in this study we claim that the use of a new split criterion to build decision trees outperforms other classic split criteria for variable selection purposes. We present an experimental study on a wide and varied set of databases, using only one decision tree with each split criterion to select variables for the Naive Bayes classifier.
Split Criterions for Variable Selection Using Decision Trees
1. Split Criterions for Variable Selection Using Decision Trees
J. Abellán, A. R. Masegosa
Department of Computer Science and A.I.
University of Granada
Spain
3. Introduction
Information from a database
Database
Attribute variables: Calcium, Tumor, Coma, Migraine    Class variable: Cancer

Calcium   Tumor   Coma      Migraine   Cancer
normal    a1      absent    absent     absent
high      a1      present   absent     present
normal    a1      absent    absent     absent
normal    a1      absent    absent     absent
high      a0      present   present    absent
...       ...     ...       ...        ...
4. Introduction
Classification tree (decision tree)
[Figure: a classification tree. The root node tests Tumor; one branch leads to a node testing Calcium; the leaves give the classifications "absent" or "present".]
Node = attribute variable; Leaf = case of the class variable
SPLIT CRITERION
STOP CRITERION
1 LEAF = 1 RULE
5. Introduction
Classification tree. New observation
Observation: (high, a1, absent, present)
Variables: [Calcium, Tumor, Coma, Migraine]
Classification: Cancer present
[Figure: the tree traversed for the new observation. Root node Calcium: "normal" leads to the leaf "Classification: absent"; "high" leads to the node Tumor. Tumor: "a0" leads to "Classification: absent"; "a1" leads to "Classification: present".]
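To make the traversal concrete, here is a minimal sketch of the slides' tree encoded as nested dicts; the encoding and function names are illustrative, not part of the original presentation.

```python
# Minimal sketch: the slides' tree as nested dicts (illustrative encoding).
tree = {
    "attribute": "Calcium",
    "branches": {
        "normal": {"leaf": "absent"},
        "high": {
            "attribute": "Tumor",
            "branches": {"a0": {"leaf": "absent"}, "a1": {"leaf": "present"}},
        },
    },
}

def classify(node, observation):
    """Follow branches until a leaf is reached; each leaf is one rule."""
    while "leaf" not in node:
        node = node["branches"][observation[node["attribute"]]]
    return node["leaf"]

obs = {"Calcium": "high", "Tumor": "a1", "Coma": "absent", "Migraine": "present"}
print(classify(tree, obs))  # -> "present" (Coma and Migraine are never tested)
```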
6. Introduction
Principal problems for classifiers
Redundant attribute variables
Irrelevant attribute variables
Excessive number of variables
Variable Selection Methods
Filter methods
Wrapper methods (classifier-dependent)
Mark A. Hall and G. Holmes, Benchmarking Attribute Selection Techniques for Discrete Class Data Mining, IEEE TKDE (2003)
7. Introduction
Variable selection with classification trees
[Figure: a decision tree with root Xa; second level Xb, Xc, Xd; third level Xe, Xf, Xg, Xh, Xi, Xj, Xk; further levels omitted. Taking the first two levels selects {Xa, Xb, Xc, Xd}; taking the whole tree selects {Xa, Xb, ..., Xk}.]
THE FIRST LEVELS CONTAIN THE MOST SIGNIFICANT VARIABLES (a code sketch follows)
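As a concrete illustration of extracting the first levels, the following hedged sketch collects the attributes tested in the first `levels` levels of a fitted decision tree; scikit-learn and its `tree_` internals are our choice, as the slides do not name an implementation.

```python
# Sketch: variables appearing in the first `levels` levels of a fitted tree
# (scikit-learn is our assumption; the slides do not name an implementation).
from sklearn.tree import DecisionTreeClassifier

def first_level_variables(clf: DecisionTreeClassifier, levels: int) -> set:
    t = clf.tree_
    selected, stack = set(), [(0, 0)]            # (node id, depth); root is 0
    while stack:
        node, depth = stack.pop()
        left, right = t.children_left[node], t.children_right[node]
        if left == right or depth >= levels:     # leaf (-1 == -1) or too deep
            continue
        selected.add(t.feature[node])            # feature index tested here
        stack.append((left, depth + 1))
        stack.append((right, depth + 1))
    return selected
```

The returned feature indices would then be the reduced variable set fed to the Naive Bayes classifier.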
8. Introduction
Variable selection with classification trees
[Figure: the database DB is resampled into m training sets; a decision tree is built on each one and a variable set is extracted from it (SET1, SET2, ..., SETm); the final selection is the union SET1 U ... U SETm.]
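A hedged sketch of this composed scheme, reusing `first_level_variables` from the previous sketch; the slide does not specify how the training sets are drawn, so bootstrap resampling is our assumption.

```python
# Sketch: one tree per resample, union of the variables in their first levels.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def composed_selection(X, y, m=10, levels=3, seed=0):
    rng = np.random.default_rng(seed)
    selected = set()
    for _ in range(m):
        idx = rng.integers(0, len(X), size=len(X))      # bootstrap resample
        clf = DecisionTreeClassifier().fit(X[idx], y[idx])
        selected |= first_level_variables(clf, levels)  # SET1 U ... U SETm
    return selected
```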
9. Introduction
Variable selection with classification trees
[Figure: the same resampling scheme; from the m trees, the variables are combined into an informative order for the root node (Abellán & Masegosa, 2007).]
10. Introduction
Approach of the work presented
Establish the most suitable split criterion for building decision trees, to be used as the base of the composed methods for VARIABLE SELECTION described above.
The variables in the first levels of a single decision tree are extracted.
The performance of these variables is evaluated with a Naive Bayes classifier.
We carry out EXPERIMENTS on a large set of databases using the well-known split criteria InfoGain, IGRatio and GiniIndex, and another one based on imprecise probabilities (Abellán & Moral, 2003): Imprecise InfoGain.
12. Previous knowledge
Naive Bayes (Duda & Hart, 1973)
Attribute variables {Xi | i=1,..,r}
Class variable C with states in
{c1,..,ck}
Select the state of C: argmax_{ci} P(ci | X).
Assuming independence of the attributes given the class variable:
argmax_{ci} P(ci) ∏_{j=1}^{r} P(xj | ci)
[Figure: graphical structure of Naive Bayes: the class node C points to each attribute X1, X2, ..., Xr.]
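A from-scratch sketch of this classifier for discrete attributes follows; the Laplace smoothing parameter `alpha` is our addition to avoid zero probabilities and is not mentioned on the slide.

```python
# Sketch of Naive Bayes for discrete attributes (illustrative; Laplace
# smoothing added by us, not part of the original presentation).
import math
from collections import Counter, defaultdict

def fit_nb(rows, labels, alpha=1.0):
    priors = Counter(labels)                    # class frequencies
    cond = defaultdict(Counter)                 # (attribute j, class) -> value counts
    n_values = [len({r[j] for r in rows}) for j in range(len(rows[0]))]
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            cond[(j, c)][v] += 1
    return priors, cond, n_values, alpha

def predict_nb(model, row):
    """argmax_ci P(ci) * prod_j P(xj | ci), computed in log space."""
    priors, cond, n_values, alpha = model
    n = sum(priors.values())
    def log_score(c):
        s = math.log(priors[c] / n)             # log P(ci)
        for j, v in enumerate(row):
            cnt = cond[(j, c)]
            s += math.log((cnt[v] + alpha) /
                          (sum(cnt.values()) + alpha * n_values[j]))
        return s
    return max(priors, key=log_score)
```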
13. Previous knowledge
Split criteria for decision trees:
Info-Gain (Quinlan, 1986)
Selects the attribute variable with the highest positive value of IG(Xi, C) = H(C) - H(C|Xi)
H(C) = -∑_j P(cj) log P(cj)   (Shannon entropy)
H(C|Xi) = -∑_t P(xi^t) ∑_j P(cj | xi^t) log P(cj | xi^t)
Used in ID3
Works only with discrete databases
Has a tendency to select variables with a large number of states
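A plain-Python sketch of this criterion from empirical frequencies (our illustration, not the software used in the paper's experiments):

```python
# Sketch: Info-Gain from empirical frequencies.
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(C) of a sequence of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())

def info_gain(column, labels):
    """IG(Xi, C) = H(C) - sum_t P(xi^t) H(C | xi^t)."""
    n = len(labels)
    h_cond = 0.0
    for v in set(column):
        sub = [c for x, c in zip(column, labels) if x == v]
        h_cond += (len(sub) / n) * entropy(sub)
    return entropy(labels) - h_cond
```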
14. Previous knowledge
Split criteria for decision trees:
Info-Gain Ratio (Quinlan, 1993)
Selects the attribute variable with the highest positive value of IGR(Xi, C) = IG(Xi, C) / H(Xi)
Used in C4.5
Works with continuous databases
Applies a posterior pruning process
Penalizes variables with a higher number of states
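Reusing the helpers from the Info-Gain sketch, the ratio is a one-line normalization; guarding against H(Xi) = 0 is our addition.

```python
# Sketch: IGR(Xi, C) = IG(Xi, C) / H(Xi); dividing by the attribute's own
# entropy penalizes attributes that split the data into many values.
def info_gain_ratio(column, labels):
    h_attr = entropy(column)                 # H(Xi)
    return info_gain(column, labels) / h_attr if h_attr > 0 else 0.0
```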
15. Previous knowledge
Split criteria for decision trees:
Gini index (Breiman et al., 1984)
Selects the attribute variable with the highest positive value of GIx(Xi, C) = gini(C|Xi) - gini(C)
gini(C) = ∑_j (1 - P(cj))^2
gini(C|Xi) = ∑_t P(xi^t) gini(C | xi^t)
GINI INDEX: quantifies the impurity degree of a partition (a "pure partition" has all its values in only one case of C)
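A sketch of the criterion exactly as defined on the slide; note that this form, gini(C) = ∑_j (1 - P(cj))^2, grows as the partition gets purer, so the difference gini(C|Xi) - gini(C) is positive for informative splits (the more common impurity form is 1 - ∑_j P(cj)^2).

```python
# Sketch: Gini-based split score as the slides define it.
from collections import Counter

def gini(labels):
    n = len(labels)
    return sum((1 - k / n) ** 2 for k in Counter(labels).values())

def gini_gain(column, labels):
    """GIx(Xi, C) = gini(C|Xi) - gini(C); higher means purer children."""
    n = len(labels)
    g_cond = 0.0
    for v in set(column):
        sub = [c for x, c in zip(column, labels) if x == v]
        g_cond += (len(sub) / n) * gini(sub)
    return g_cond - gini(labels)
```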
16. Previous knowledge
Split criteria for decision trees:
Imprecise Info-Gain (Abellán & Moral, 2003)
Representing the information from a database
Imprecise Dirichlet Model (IDM) probability estimation:
P(cj) ∈ I_cj = [ n_cj / (N + s) , (n_cj + s) / (N + s) ]
(n_cj: frequency of cj in the database; N: sample size; s: IDM parameter)
Credal sets:
K(C) = { q | q(cj) ∈ I_cj }
K(C | Xi = xi^t) = { q | q(cj) ∈ I_cj^{xi^t} }
(the conditional intervals I_cj^{xi^t} are computed from the subsample where Xi = xi^t)
17. Previous knowledge
Split criteria for decision trees:
Imprecise Info-Gain (Abellán & Moral, 2003)
Selects the attribute variable with the highest positive value of:
IGI(Xi, C) = S(K(C)) - ∑_t P(xi^t) S(K(C | Xi = xi^t))
with S the maximum entropy function on a credal set.
S is a global uncertainty measure, comprising both conflict and non-specificity.
Conflict favors ramification; non-specificity tends to reduce ramification.
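A sketch of both pieces: the maximum entropy of an IDM credal set, computed by spreading the free mass s over the smallest class counts (following Abellán & Moral's procedure), and the resulting IGI score. The value s = 1 is a common choice for the IDM parameter and is assumed here.

```python
# Sketch: maximum entropy of the IDM credal set and the IGI criterion
# (s = 1 assumed; the slides do not fix the IDM parameter).
import math
from collections import Counter

def max_entropy_idm(counts, s=1.0):
    """S(K(C)): water-fill the free mass s over the smallest counts,
    then take the Shannon entropy of the resulting distribution."""
    n, mass, eps = list(map(float, counts)), s, 1e-9
    while mass > eps:
        lo = min(n)
        idx = [i for i, v in enumerate(n) if v < lo + eps]
        higher = [v for v in n if v >= lo + eps]
        step = min(min(higher) - lo if higher else mass, mass / len(idx))
        for i in idx:
            n[i] += step
        mass -= step * len(idx)
    total = sum(counts) + s
    return -sum((v / total) * math.log2(v / total) for v in n if v > 0)

def imprecise_info_gain(column, labels, s=1.0):
    """IGI(Xi, C) = S(K(C)) - sum_t P(xi^t) S(K(C | Xi = xi^t))."""
    classes, n = sorted(set(labels)), len(labels)
    def S(subset):
        counts = Counter(subset)
        return max_entropy_idm([counts[c] for c in classes], s)
    score = S(labels)
    for v in set(column):
        sub = [c for x, c in zip(column, labels) if x == v]
        score -= (len(sub) / n) * S(sub)
    return score
```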
19. Experimentation
Databases
Preprocessing:
- Filling in missing data (mean & mode)
- Discretization of continuous values (see the sketch after this list)
Application of the selection methods
Application of NB on the original databases with the set of selected variables
Measures:
• Percentage of correct classifications of NB before and after the selection process
• Number of variables selected
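A hedged sketch of this pipeline; the libraries, the imputation split by attribute type, and the 5-bin quantile discretization are our choices, as the slides do not specify them.

```python
# Sketch of the described preprocessing (tools and bin count are our choice).
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import KBinsDiscretizer

def preprocess(X_continuous, X_discrete):
    X_c = SimpleImputer(strategy="mean").fit_transform(X_continuous)
    X_d = SimpleImputer(strategy="most_frequent").fit_transform(X_discrete)
    X_c = KBinsDiscretizer(n_bins=5, encode="ordinal",
                           strategy="quantile").fit_transform(X_c)
    return np.hstack([X_c, X_d])
```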
20. Experimentation
Results with 3 levels. Correct classifications
[Table: NB comparison and accumulated comparison of the criteria on the test databases.]
Validation: 10-fold cross-validation repeated 10 times; corrected paired t-test at the 5% significance level.
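The "corrected paired t-test" is commonly the Nadeau & Bengio (2003) resampled variant; below is a sketch under that assumption, where the variance of the score differences is inflated by the train/test overlap term n_test/n_train.

```python
# Sketch: corrected paired t-test for repeated cross-validation results
# (Nadeau & Bengio, 2003 variant assumed; not stated on the slide).
import math
from scipy import stats

def corrected_paired_ttest(scores_a, scores_b, n_train, n_test):
    d = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(d)                                    # 100 runs for 10 x 10-fold CV
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)
    t = mean / math.sqrt((1.0 / n + n_test / n_train) * var)
    p_value = 2 * stats.t.sf(abs(t), df=n - 1)    # two-sided; compare to 0.05
    return t, p_value
```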
23. Experimentation
Results Analysis
1. Using only one tree, all the procedures obtain good results with a small number of variables.
2. The improvement from 3 to 4 levels is not very significant, except for IGR.
3. IGR excessively penalizes variables with a high number of states (Audiology, Optdigits, ...).
4. Using 3 levels, IGI obtains better results than the other criteria; the advantage is larger with 4 levels.
25. Conclusions & future work
Experiments over 27 databases show IGI to be an outperforming split criterion, considering the trade-off between accuracy and number of variables.
Apply the IGI criterion, and other criteria based on Bayesian scores, to the composed methods explained in the introduction.
Study the use of combined criteria, i.e. using one criterion or another depending on the characteristics of the database (size, number of variables, number of states, etc.) and on the level of the tree being split.