We present a first exploratory study of the effect of the parameter s of the imprecise Dirichlet model when the model is used to build classification trees. The tree-building method uses uncertainty measures on closed and convex sets of probability distributions, also known as credal sets. We use the imprecise Dirichlet model to obtain a credal set from a sample, where the set of probabilities obtained depends on s. We will see that, depending on the characteristics of the data set used, the results can be improved by varying the value of s.
Varying parameter in classification based on imprecise probabilities
SMPS
Bristol, September 2006
J. Abellán, S. Moral, M. Gómez, A. Masegosa
Department of Computer Science and AI
University of Granada
Index
1. Classifying with Decision Trees.
2. Decision Trees with Imprecise Probabilities.
2.1. Imprecise Dirichlet Model.
2.2. Classification Method.
3. Experimentation.
4. Conclusions and Future Work.
1. Classifying with Decision Trees
In a classification problem, there is a data set D with values of
a set of attribute variables X (petal width, petal length, sepal
width, sepal length) and a class variable C (Iris flower
type: Setosa, Versicolor, Virginica).
A decision tree is a tree-shaped model with an attribute variable in
each internal node and a value of the class variable in the leaves.
[Figure: example decision tree. The root node Petal-W splits into small and large; the small branch tests Sepal-W and the large branch tests Petal-L, each splitting into small and large. The four leaves are labelled Versicolor, Setosa, Virginica and Versicolor.]
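To make the structure concrete, here is a minimal Python sketch of such a tree (not the authors' code; the Node, Leaf and classify names are chosen here for illustration, and the leaf placement follows the figure as read above):

```python
from dataclasses import dataclass
from typing import Dict, Union

@dataclass
class Leaf:
    """A leaf holds a value of the class variable C."""
    class_value: str

@dataclass
class Node:
    """An internal node tests one attribute variable."""
    attribute: str
    children: Dict[str, Union["Node", Leaf]]  # one child per attribute value

def classify(tree: Union[Node, Leaf], instance: Dict[str, str]) -> str:
    """Follow the path defined by the instance's attribute values."""
    while isinstance(tree, Node):
        tree = tree.children[instance[tree.attribute]]
    return tree.class_value

# The example tree from the figure (leaf placement assumed for illustration)
tree = Node("Petal-W", {
    "small": Node("Sepal-W", {"small": Leaf("Versicolor"), "large": Leaf("Setosa")}),
    "large": Node("Petal-L", {"small": Leaf("Virginica"), "large": Leaf("Versicolor")}),
})
print(classify(tree, {"Petal-W": "small", "Sepal-W": "large"}))  # -> Setosa
```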
1. Classifying with Decision Trees
In a decision tree, the path from the root node to another node
defines a configuration. A configuration σ is an m-tuple with a
set of fixed values for several attribute variables (e.g. Petal-Width =
small, Sepal-Width = small).
Quinlan's ID3 is based on the Shannon entropy, which measures the
uncertainty about C. The criterion measures the reduction in
uncertainty produced by introducing one variable into the model:
$$\mathrm{InfoGain}(C \mid X) = H(C) - H(C \mid X)$$

$$\text{Entropy: } H(C \mid \sigma) = -\sum_j P(C = c_j \mid \sigma)\,\log_2 P(C = c_j \mid \sigma) = -\sum_j \frac{n^{\sigma}_{c_j}}{N^{\sigma}} \log_2 \frac{n^{\sigma}_{c_j}}{N^{\sigma}}$$

where $n^{\sigma}_{c_j}$ is the number of instances compatible with σ whose class is $c_j$, and $N^{\sigma} = \sum_j n^{\sigma}_{c_j}$.
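As an illustration of these formulas (not code from the paper), a short Python sketch that computes the entropy and the information gain from a list of records; the entropy and info_gain names are ours:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H(C) estimated from class frequencies."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(records, attribute, class_var):
    """InfoGain(C|X) = H(C) - H(C|X) for categorical data.

    `records` is a list of dicts mapping variable names to values.
    """
    labels = [r[class_var] for r in records]
    h_c = entropy(labels)
    n = len(records)
    h_c_given_x = 0.0
    for value in {r[attribute] for r in records}:
        subset = [r[class_var] for r in records if r[attribute] == value]
        h_c_given_x += (len(subset) / n) * entropy(subset)
    return h_c - h_c_given_x
```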
1. Classifying with Decision Trees
Example: Iris subtype classification.
IG(Class|Petal-W) = 0.26
IG(Class|Petal-L) = 0.15
IG(Class|Sepal-L) = 0.12
IG(Class|Sepal-W) = 0.11
[Figure: the decision tree built so far, as in the earlier figure.]
IG(Class|Petal-W=large, Petal-L) = 0.13
IG(Class|Petal-W=large, Sepal-W) = 0.08
IG(Class|Petal-W=large, Sepal-L) = -0.05
IG(Class|Petal-W=small, Sepal-W) = 0.11
IG(Class|Petal-W=small, Petal-L) = 0.04
IG(Class|Petal-W=small, Sepal-L) = -0.02
IG(Class|Petal-W=small, Sepal-W=small, Petal-L) = -0.08
IG(Class|Petal-W=small, Sepal-W=small, Sepal-L) = -0.12
? Both remaining gains are negative, so a leaf is introduced at this node.
2. Decision Trees with Imprecise Probabilities
Recently, Abellán and Moral [7] introduced a new
algorithm to build decision trees based on Walley's
Imprecise Dirichlet Model [22].
The authors use the maximum entropy on credal
sets as a measure of total uncertainty [7,16].
In this way, the structure of the decision tree is exactly
the same; the difference lies in the criterion used to select
variables for branching.
Using this new entropy criterion, the performance is
better than that of ID3.
2.1. Imprecise Dirichlet Model (IDM) [22]
In this model, the probabilities are estimated as intervals.
Given a data set D and a configuration σ, we consider a
credal set $\mathcal{P}^{\sigma}_C$ for the class variable C with respect to σ, defined by the
set of probability distributions p = (p1, p2, …, pk) such that:
$$p_j = P(C = c_j \mid \sigma) \in \left[ \frac{n^{\sigma}_{c_j}}{N^{\sigma} + s},\ \frac{n^{\sigma}_{c_j} + s}{N^{\sigma} + s} \right]$$

These intervals are obtained on the basis of the IDM, and they depend
on the real value s > 0, which is a hyperparameter of the model.
For the IDM, Walley [22] suggests a value of s between 1 and 2.
"Classification performance effects of varying the parameter s."
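A minimal sketch of these IDM intervals in Python (illustrative only; the function name and interface are ours):

```python
def idm_intervals(counts, s):
    """IDM probability intervals [n_j/(N+s), (n_j+s)/(N+s)] per class.

    `counts` maps each class value c_j to its count n_j under the
    current configuration; `s` is the IDM hyperparameter (s > 0).
    """
    total = sum(counts.values())
    return {c: (n / (total + s), (n + s) / (total + s))
            for c, n in counts.items()}

# e.g. 10 Setosa, 5 Versicolor, 0 Virginica under some configuration, s = 1
print(idm_intervals({"Setosa": 10, "Versicolor": 5, "Virginica": 0}, s=1))
# Virginica gets the interval [0, 1/16] rather than a point estimate of 0
```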
2.2 The Classification Method
This method [7] evaluates, at each step, both a single variable and a
pair of variables for branching.
The following functions are considered:

$$\mathrm{Inf}^1(X, \sigma) = \sum_i \frac{n^{\sigma}_{x_i}}{N^{\sigma}} \cdot TU\!\left( \mathcal{P}^{\sigma \cup \{X = x_i\}}_C \right)$$

$$\mathrm{Inf}^2(X, Y, \sigma) = \sum_{i,j} \frac{n^{\sigma}_{x_i, y_j}}{N^{\sigma}} \cdot TU\!\left( \mathcal{P}^{\sigma \cup \{X = x_i,\, Y = y_j\}}_C \right)$$

If Max{Inf1(X)} > Max{Inf2(Y,Z)}, then X is selected.
If Max{Inf2(Y,Z)} > Max{Inf1(X)}, then Y is selected if
Inf1(Y) > Inf1(Z), and Z is selected otherwise.
When the inclusion of no variable reduces the
uncertainty of the class variable, a leaf is introduced. The most
frequent class in the subset of data compatible with its
configuration is associated with this leaf.
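A sketch of this criterion in Python (not the authors' code), assuming the water-filling characterization of maximum entropy over an IDM credal set: the extra mass s is distributed among the least-frequent classes so as to level them up. The function names are ours:

```python
from math import log2

def max_entropy_idm(counts, s, eps=1e-12):
    """Maximum Shannon entropy over the IDM credal set for given counts.

    The maximizing distribution levels up the least-frequent classes:
    the extra mass s is water-filled onto the smallest counts.
    """
    masses = [float(n) for n in counts]
    remaining = float(s)
    while remaining > eps:
        low = min(masses)
        lowest = [i for i, m in enumerate(masses) if m <= low + eps]
        higher = [m for m in masses if m > low + eps]
        step = (min(higher) - low) if higher else float("inf")
        add = min(step * len(lowest), remaining)
        for i in lowest:
            masses[i] += add / len(lowest)
        remaining -= add
    total = sum(masses)
    return -sum((m / total) * log2(m / total) for m in masses if m > 0)

def inf1(records, attribute, class_var, classes, s):
    """Inf^1(X): frequency-weighted total uncertainty after branching on X."""
    n = len(records)
    score = 0.0
    for value in {r[attribute] for r in records}:
        subset = [r for r in records if r[attribute] == value]
        counts = [sum(1 for r in subset if r[class_var] == c) for c in classes]
        score += (len(subset) / n) * max_entropy_idm(counts, s)
    return score
```

Inf^2 follows the same pattern, grouping the records by pairs of attribute values instead of single values.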
3. Experimentation
The evaluation was carried out over 8 data sets from the
UCI repository, discretized with Fayyad and Irani's procedure [14].
The following set of values for s was considered:
{0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 5, 6, 8, k/2}
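A hypothetical sketch of such an evaluation loop in Python; the cross_val_accuracy helper and the num_classes attribute are assumptions for illustration, not part of the paper:

```python
# Hypothetical evaluation loop: accuracy of the credal-tree classifier
# for each candidate value of s on each data set.
S_VALUES = [0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 5, 6, 8]

def evaluate(datasets, cross_val_accuracy):
    """`cross_val_accuracy(dataset, s)` is an assumed helper that builds
    the credal decision tree with parameter s and returns its accuracy."""
    results = {}
    for name, dataset in datasets.items():
        k = dataset.num_classes  # k/2 is also tried, per the slide
        for s in S_VALUES + [k / 2]:
            results[(name, s)] = cross_val_accuracy(dataset, s)
    return results
```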
4. Conclusions and Future Work
First exploratory study of the results of our
classification method when varying the parameter s.
It is possible to improve the results by changing the value
of s.
We propose s = 1.5, which lies between the values
suggested by Walley [22].
More studies and experiments are necessary to
ascertain the ideal relationship between the value of
s and some characteristics of the data set.