Presentation slides of the Master's Thesis 'Multivalued Subsets Under Information Theory': an application of metaheuristic search algorithms to an ID3-generated decision tree.
Multivalued Subsets Under Information Theory
1. Multivalued Subsets under Information Theory
THESIS
[Title-slide graphic: a family pedigree chart fanning out from 'Me' through parents, grandparents, and great-grandparents]
Indraneel Dabhade
2. Key Data Mining Tools
• Regression
• Decision Tree
• Neural Network
• Clustering
• Association Rules
3. Information Gain
Gain(A_i, S) = Ent(S) − E(A_i, S)
0 ≤ Gain(A_i, S) ≤ log2 K
Entropy: Ent(S) = −∑_{i=1}^{n} p_i log2 p_i
E(A_i, S): the weighted entropy of the partitions induced by the values of A_i

Example:
A    Classes
A1   Class1
A1   Class2
A2   Class2
A3   Class3
A4   Class3
A5   Class1
A5   Class1
A5   Class2
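The quantities on this slide can be sketched in a few lines of Python; the function names are mine, and the data is the slide's example column A paired with its class column:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Ent(S) = -sum_i p_i log2 p_i over the class proportions."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(attribute, labels):
    """Gain(A, S) = Ent(S) - E(A, S): entropy minus the weighted entropy
    of the partitions induced by the attribute's values."""
    n = len(labels)
    e = 0.0
    for v in set(attribute):
        part = [y for a, y in zip(attribute, labels) if a == v]
        e += (len(part) / n) * entropy(part)
    return entropy(labels) - e

# The slide's example: attribute column A against the class column
A       = ["A1", "A1", "A2", "A3", "A4", "A5", "A5", "A5"]
classes = ["Class1", "Class2", "Class2", "Class3", "Class3",
           "Class1", "Class1", "Class2"]
print(round(entropy(classes), 4), round(gain(A, classes), 4))  # → 1.5613 0.9669
```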
5. Classifiers
Adaptive Boosting (Basic)
Adaptation from the slides of Michael Collins, Discriminative Reranking for Natural Language Parsing, ICML 2000
Given: m examples (x1, y1), …, (xm, ym) where xi ∈ X, yi ∈ Y = {−1, +1}
Initialize D1(i) = 1/m
For t = 1 to T:
  1. Train learner h_t with min error ε_t = Pr_{i~D_t}[h_t(x_i) ≠ y_i]
  2. Compute the hypothesis weight α_t = (1/2) ln((1 − ε_t)/ε_t)
  3. For each example i = 1 to m:
     D_{t+1}(i) = (D_t(i)/Z_t) · e^{−α_t}  if h_t(x_i) = y_i
     D_{t+1}(i) = (D_t(i)/Z_t) · e^{+α_t}  if h_t(x_i) ≠ y_i
Output: H(x) = sign(∑_{t=1}^{T} α_t h_t(x))
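The AdaBoost procedure on this slide can be exercised end to end on a tiny hand-made dataset; everything below (the data, the threshold-stump learners) is invented for illustration and is not from the thesis:

```python
import math

# Toy 1-D dataset: no single threshold stump classifies it, so boosting
# must combine several weak learners.
X = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1, 1, -1, -1, 1, 1, -1, -1]

def stumps():
    """Candidate weak learners: sign(x - threshold), in both polarities."""
    for thr in range(0, 9):
        for pol in (1, -1):
            yield lambda x, t=thr, p=pol: p if x > t else -p

def adaboost(X, y, T=10):
    m = len(X)
    D = [1.0 / m] * m                                   # D1(i) = 1/m
    ensemble = []
    for _ in range(T):
        # 1. Train learner h_t with minimum weighted error eps_t
        h_t, eps = min(((h, sum(d for d, x, yi in zip(D, X, y) if h(x) != yi))
                        for h in stumps()), key=lambda pair: pair[1])
        if eps >= 0.5 or eps == 0:                      # simplification: stop early
            break
        # 2. Hypothesis weight alpha_t = (1/2) ln((1 - eps_t)/eps_t)
        alpha = 0.5 * math.log((1 - eps) / eps)
        # 3. Reweight: D_{t+1}(i) = D_t(i)/Z_t * exp(-alpha_t * y_i * h_t(x_i))
        D = [d * math.exp(-alpha * yi * h_t(x)) for d, x, yi in zip(D, X, y)]
        Z = sum(D)
        D = [d / Z for d in D]
        ensemble.append((alpha, h_t))
    # Output H(x) = sign(sum_t alpha_t h_t(x))
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1

H = adaboost(X, y)
print([H(x) for x in X])  # → [1, 1, -1, -1, 1, 1, -1, -1]
```

After three rounds the weighted vote of the stumps already fits all eight points, even though the best single stump misclassifies two of them.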
6. Classifiers
Adaptive Boosting (Basic)
Adaptation from the slides of Freund and Schapire (1996)
Weak classifiers combined into a strong classifier:
  ε1 = 0.300, α1 = 0.424
  ε2 = 0.196, α2 = 0.704
  ε3 = 0.344, α3 = 0.323
• Need to extend the 2-class to multi-class learning
• Usage of AdaBoost.M1
7. Classifiers
Classification via Regression

Original data:
Instances  Att1  Att2  Att3  Att4  Classes
1          A1    B1    D1    C1    Class2
2          A1    B1    D1    C2    Class2
3          A2    B1    D1    C1    Class1
4          A3    B2    D1    C1    Class1
5          A3    B3    D2    C1    Class1
6          A3    B3    D2    C2    Class2
7          A2    B3    D2    C1    Class1

Binary regression target for Class 1:
Instances  Att1  Att2  Att3  Att4  Class
1          A1    B1    D1    C1    0
2          A1    B1    D1    C2    0
3          A2    B1    D1    C1    1
4          A3    B2    D1    C1    0
5          A3    B3    D2    C1    1
6          A3    B3    D2    C2    0
7          A2    B3    D2    C1    1

Binary regression target for Class 2 (and so on for each class):
Instances  Att1  Att2  Att3  Att4  Class
1          A1    B1    D1    C1    1
2          A1    B1    D1    C2    1
3          A2    B1    D1    C1    0
4          A3    B2    D1    C1    1
5          A3    B3    D2    C1    0
6          A3    B3    D2    C2    1
7          A2    B3    D2    C1    0

Test query: f(Att1 = A2, Att2 = B1, Att3 = D2, Att4 = C1) = ?
With f(Class1) = 0.1 and f(Class2) = 0.9:
Class(Test query) = Class 2
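One-vs-rest regression of the kind shown on this slide can be reproduced with a least-squares fit over one-hot encodings. The encoding and helper names are mine, and the fitted scores will not literally be the slide's illustrative f(Class1) = 0.1 and f(Class2) = 0.9; this tiny toy fit may even score the query differently:

```python
import numpy as np

# The slide's seven instances; each categorical value becomes one 0/1 column.
rows = [("A1", "B1", "D1", "C1", "Class2"), ("A1", "B1", "D1", "C2", "Class2"),
        ("A2", "B1", "D1", "C1", "Class1"), ("A3", "B2", "D1", "C1", "Class1"),
        ("A3", "B3", "D2", "C1", "Class1"), ("A3", "B3", "D2", "C2", "Class2"),
        ("A2", "B3", "D2", "C1", "Class1")]
values = sorted({v for r in rows for v in r[:4]})

def encode(attrs):
    """One-hot encode a 4-tuple of attribute values."""
    return [1.0 if v in attrs else 0.0 for v in values]

X = np.array([encode(r[:4]) for r in rows])
labels = ["Class1", "Class2"]
# One least-squares regression per class against its 0/1 membership indicator
W = {c: np.linalg.lstsq(X,
                        np.array([1.0 if r[4] == c else 0.0 for r in rows]),
                        rcond=None)[0]
     for c in labels}

query = encode(("A2", "B1", "D2", "C1"))          # the slide's test query
scores = {c: float(np.dot(query, W[c])) for c in labels}
print(scores, max(scores, key=scores.get))
```

The predicted class is simply the arg-max over the per-class regression outputs.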
11. Information Gain evaluation for ID3
Considering the 'Iris' Dataset

Data:
Instances  Att1  ….  Att n
1          A1    ….  N1
2          A1    ….  N2
3          A2    ….  N5
4          A3    ….  N4
5          A3    ….  N6
6          A3    ….  N7
7          A2    ….  N8
8          A1    ….  N4
9          A1    ….  N5
10         A3    ….  N3
11         A1    ….  N2
12         A2    ….  N5
13         A2    ….  N6
14         A3    ….  N8

Class Quanta Identity:
Att1  Class1  Class2  ….  Class n
A1    4       3       ….  4
A2    5       7       ….  5
A3    7       6       ….  0
…
Att n  Class1  Class2  ….  Class n
N1     1       0       ….  5
N2     3       2       ….  9
N3     4       6       ….  3

Information Gain (Iris):
Att  Information Gain
1    0.877
2    0.511
3    1.45
4    1.44
12. Information Gain evaluation for GID
Considering the 'Iris' Dataset

Data:
Instances  Att1  ….  Att n
1          A1    ….  N1
2          A1    ….  N2
3          A2    ….  N5
4          A3    ….  N4
5          A3    ….  N6
6          A3    ….  N7
7          A2    ….  N8
8          A1    ….  N4
9          A1    ….  N5
10         A3    ….  N3
11         A1    ….  N2
12         A2    ….  N5
13         A2    ….  N6
14         A3    ….  N8

Class Quanta Identity (one value vs. the rest):
Att1  Class1  Class2  ….  Class n
A1    4       3       ….  4
Rest  12      13      ….  5
…
Att1  Class1  Class2  ….  Class n
A m   4       3       ….  4
Rest  12      13      ….  5
…
Att n  Class1  Class2  ….  Class n
N1     1       0       ….  5
Rest   7       8       ….  12
…
Att n  Class1  Class2  ….  Class n
NM     1       0       ….  5
Rest   7       8       ….  12

Information Gain (Iris):
Att  Information Gain
1    0.06886155
2    0.06583857
3    0.162349
4    0.3645836
13. NP Hard
What makes the problem interesting? Why is it NP-Hard?

Att1 takes the values A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, which must be partitioned into Subset 1 and Subset 2.

Example data:
Att  Class
A1   1
A1   1
A2   1
A3   2
A3   2
A3   3
A4   4

GID binarization (A1 vs. the rest):
Att  Class
1    1
1    1
0    1
0    2
0    2
0    3
0    4
Information Gain = 0.4695652

MVS binarization ({A1, A2} vs. the rest):
Att  Class
1    1
1    1
1    1
0    2
0    2
0    3
0    4
Information Gain = 0.9852281
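The two gain values on this slide can be verified directly with a short script; the helper names are mine, and the data is the slide's seven-row example:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy over the class proportions."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels, subset):
    """Information gain of the binary split: value in `subset` vs. the rest."""
    in_l  = [y for v, y in zip(values, labels) if v in subset]
    out_l = [y for v, y in zip(values, labels) if v not in subset]
    n = len(labels)
    cond = (len(in_l) / n) * entropy(in_l) + (len(out_l) / n) * entropy(out_l)
    return entropy(labels) - cond

att     = ["A1", "A1", "A2", "A3", "A3", "A3", "A4"]
classes = [1, 1, 1, 2, 2, 3, 4]

gid = info_gain(att, classes, {"A1"})         # GID: one value vs. the rest
mvs = info_gain(att, classes, {"A1", "A2"})   # MVS: a multivalued subset vs. the rest
print(round(gid, 7), round(mvs, 7))  # → 0.4695652 0.9852281
```

Allowing multivalued subsets more than doubles the gain here, but the number of candidate subsets grows exponentially in the number of attribute values, which is what makes the search hard.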
14. Information Gain evaluation for MVS
Considering the 'Iris' Dataset

Data:
Instances  Att1  ….  Att n
1          A1    ….  N1
2          A1    ….  N2
3          A2    ….  N5
4          A3    ….  N4
5          A3    ….  N6
6          A3    ….  N7
7          A2    ….  N8
8          A1    ….  N4
9          A1    ….  N5
10         A3    ….  N3
11         A1    ….  N2
12         A2    ….  N5
13         A2    ….  N6
14         A3    ….  N8

Class Quanta Identity (a multivalued subset vs. the rest, per attribute):
Att1      Class1  Class2  ….  Class n
Subset 1  4       3       ….  4
Subset 2  12      13      ….  5
…
Att n     Class1  Class2  ….  Class n
Subset 1  1       0       ….  5
Subset 2  7       8       ….  12

Information Gain (Iris):
Att  Information Gain
1    0.128627
2    0.120512
3    0.345634
4    0.618695
15. Testing Conditions

Datasets:
Dataset     Instances  Attributes  Unique Values  Data Type   Missing Values
Iris        150        4           22-43          Fractional  No
Glass       214        9           32-178         Fractional  No
Images      4435       36          49-104         Integer     Missing Class '6'
PenDigits*  7494*      16          96-101         Integer     No
Vehicles    846        18          13-424         Integer     No
* Reduced to 4350

Palmetto High Performance Computing
Wall-time runs of 50 hours (in parallel)
Use of the 'Mersenne Twister' pseudorandom number generator

Classification Algorithms:
Classifier      Time to compute  Nature                       Rule Generation
AdaBoost        Low              Deterministic/Probabilistic  Function of sample size and weighted predictions
ID3             Low              Deterministic                Robust Rule
Regression      High             Deterministic                Robust Rule
Naïve Bayesian  Low              Probabilistic                Robust Rule
16. Multisubset variant using the Adaptive Simulated Annealing
generate initial solution
initialize Fl, Fh, Ebest, Econfig
begin
  initialize To, Tend
  while To > Tend
  begin
    initialize Lb, I, Lt
    while Lt < (Lb + I)
    begin
      Binary-Rand(n x 1)
      form CQI for the binary subsets
      if solution < Fl then change Fl
      if solution >= Fh then change Fh
      evaluate Δ = Solcurr − L_Solcurr
      if Δ > 0 then L_Solcurr = Solcurr
      if Δ < 0 then if e^(Δ/To) > Rand(1) then L_Solcurr = Solcurr
      then Ebest = Solcurr
    end
    Lt = Lb + Lb·(1 − e^(−(Fh − Fl)/Fh))
    lower To
  end
end
Data:
Instances  Att1  Att2  Att3  Att4  Classes
1          A1    B1    D1    C1    Class2
2          A1    B1    D1    C2    Class2
3          A4    B1    D1    C3    Class1
4          A3    B2    D1    C4    Class1
5          A3    B3    D2    C5    Class1
6          A3    B3    D2    C6    Class2
7          A2    B3    D2    C5    Class1
8          A4    B4    D3    C3    Class2
9          A1    B3    D2    C1    Class1
10         A3    B2    D2    C2    Class1
11         A5    B2    D6    C7    Class1
12         A2    B2    D5    C4    Class1
13         A2    B1    D2    C3    Class1
14         A3    B2    D4    C2    Class2

Search Space (binary value subsets per attribute):
Att1: {A1, A2} | {A3, A4, A5}
Att2: {B1, B2} | {B1, B2}
Att3: {D1, D2} | {D3, D4, D5, D6}
Att4: {C2, C6, C7, C4} | {C3, C1, C5}

Proportion of the classes: 0.5 … 1 (Max)
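A minimal simulated-annealing search over binary value subsets, in the spirit of the pseudocode on this slide but not a line-for-line translation: it reuses the seven-row example from the NP-Hard slide, and the schedule constants (T0, cooling rate, moves per temperature) are arbitrary choices of mine:

```python
import math, random

random.seed(42)  # Python's PRNG is a Mersenne Twister, as on the Testing Conditions slide

# Seven-row example from the NP-Hard slide
att     = ["A1", "A1", "A2", "A3", "A3", "A3", "A4"]
classes = [1, 1, 1, 2, 2, 3, 4]
values  = sorted(set(att))

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in (labels.count(x) for x in set(labels)))

def gain(mask):
    """Information gain of the binary split: values with mask bit 1 vs. the rest."""
    chosen = {v for v, m in zip(values, mask) if m}
    s1 = [y for v, y in zip(att, classes) if v in chosen]
    s2 = [y for v, y in zip(att, classes) if v not in chosen]
    if not s1 or not s2:
        return 0.0
    n = len(classes)
    return entropy(classes) - (len(s1) / n) * entropy(s1) - (len(s2) / n) * entropy(s2)

def anneal(T0=1.0, T_end=1e-3, cooling=0.9, moves_per_temp=50):
    sol = [random.randint(0, 1) for _ in values]
    best, best_gain = sol[:], gain(sol)
    T = T0
    while T > T_end:
        for _ in range(moves_per_temp):
            cand = sol[:]
            cand[random.randrange(len(values))] ^= 1   # flip one membership bit
            delta = gain(cand) - gain(sol)
            # always accept improvements; accept worse moves with prob e^(delta/T)
            if delta > 0 or random.random() < math.exp(delta / T):
                sol = cand
            if gain(sol) > best_gain:
                best, best_gain = sol[:], gain(sol)
        T *= cooling                                   # lower the temperature
    return best, best_gain

best, g = anneal()
print(round(g, 7))  # → 0.9852281, the MVS gain from the NP-Hard slide
```

On this toy instance the search space is tiny, so the annealer reliably finds an optimal subset split; the point of the thesis is that the same machinery scales to value sets far too large to enumerate.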
18. What Next?
Application of Multivalue Subsets:
• Feature Selection: Supervised, Un-supervised, Semi-supervised
• Discretization: Varying Interval Sizes
• Un-supervising the Supervised
19. Feature Selection
What is Feature Selection?
Selecting a set of attributes that 'increase the predictive performance and builds more compact feature subsets'.

Two of the commonly used Feature Selection techniques:
• Filter: All features → Filter → Feature subset → Classifier
• Wrapper: All features → Wrapper (multiple feature subsets) → Classifier

Adaptation from the slides of 'Introduction to Feature Selection', Isabelle Guyon.
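The two pipelines can be contrasted in a few lines; every feature name, relevance score, and accuracy number below is hypothetical:

```python
# Hypothetical relevance scores; in a real filter these would be, e.g.,
# information-gain values computed once, independently of any classifier.
relevance = {"A": 0.9, "B": 0.7, "C": 0.2, "D": 0.1}

def filter_select(k):
    """Filter: rank features once by a classifier-independent score, keep top k."""
    return sorted(relevance, key=relevance.get, reverse=True)[:k]

def wrapper_select(features, evaluate):
    """Wrapper: greedily grow the subset that the classifier itself scores best."""
    chosen = []
    while True:
        candidates = [chosen + [f] for f in features if f not in chosen]
        if not candidates:
            return chosen
        best = max(candidates, key=evaluate)
        if evaluate(best) <= evaluate(chosen):
            return chosen
        chosen = best

def evaluate(subset):
    """Stand-in for training and scoring a classifier (hypothetical numbers)."""
    table = {(): 0.5, ("A",): 0.7, ("B",): 0.6, ("C",): 0.55, ("D",): 0.5,
             ("A", "C"): 0.75}
    return table.get(tuple(sorted(subset)), 0.6)

print(filter_select(2))                           # → ['A', 'B']
print(wrapper_select(list(relevance), evaluate))  # → ['A', 'C']
```

Note that the two disagree: the filter ranks B above C in isolation, while the wrapper discovers that A and C work well together, which is the usual motivation for paying the wrapper's extra classifier-training cost.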
20. Feature Selection
What is Feature Selection?
• Selecting the set of attributes that contribute most to the user's objective
• This research focuses on identifying a lower bound on the equivalent subset size while ranking (ID3-based gain criterion vs. MVS-based gain criterion)

Traditional Search Method
Objective: Maximizing Classification Accuracy
Search Space:
A     B     C     D     E     F     G     Class
A2    B3    C2    D4    E32   F45   G56   1
A34   B56   C34   C76   E45   F78   G143  2
A45   B67   C45   C89   E67   F89   G210  2
A56   B109  C76   C76   E78   F121  G301  1
…     …     …     …     …     …     …     …
A134  B231  C453  D456  E99   F201  G567  21

Proposed Search Method
Objective: Maximizing the Information Gain
Search Space: the same data table as above
21. Feature Selection
What is Feature Selection?

Sequential Elimination based Selection Process:
1. Rank the features as per the objective values
2. Eliminate the lowest ranked feature
3. Check for classifier performance

Feature Set                     Classifier  Performance
{J, H, E, D, A, C, B, I, G, F}  ID3         98%
{J, H, E, D, A, C, B, I, G}     ID3         97%
{J, H, E, D, A, C, B, I}        ID3         85%
{J, H, E, D, A, C, B}           ID3         80%
{J, H, E, D, A, C}              ID3         87%
{J, H, E, D, A}                 ID3         90%
{J, H, E, D}                    ID3         92%
{J, H, E}                       ID3         89%
{J, H}                          ID3         91%
{J}                             ID3         88%
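The elimination loop can be sketched as follows; the feature ranking and the accuracy-per-size numbers are taken from the slide's table, while the helper names and the stand-in evaluate() are mine:

```python
# The slide's ranking (J highest … F lowest) and its accuracy column,
# keyed here by subset size; evaluate() stands in for training an ID3 tree.
rank = {f: i for i, f in enumerate("FGIBCADEHJ")}      # F lowest, J highest
accuracy_by_size = {10: 98, 9: 97, 8: 85, 7: 80, 6: 87,
                    5: 90, 4: 92, 3: 89, 2: 91, 1: 88}

def evaluate(subset):
    return accuracy_by_size[len(subset)]

def sequential_elimination(features):
    """Rank features, then repeatedly drop the lowest-ranked one,
    recording classifier performance for every intermediate subset."""
    history = []
    current = sorted(features, key=rank.get, reverse=True)
    while current:
        history.append((list(current), evaluate(current)))
        current = current[:-1]                         # eliminate lowest-ranked
    return history

hist = sequential_elimination("JHEDACBIGF")
best_subset, best_acc = max(hist, key=lambda h: h[1])
print(len(best_subset), best_acc)  # → 10 98
```

Recording the whole history matters because, as the slide's table shows, the accuracy curve is not monotone: dropping features can hurt at one size and recover at a smaller one.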
24. Implications of the work
1. The research identified subsets that effectively provided better information gain values.
2. The Feature Selection process identified subsets that provided a lower bound on the classification error.

Contribution to the field of Industrial Engineering: Feature Selection.
When identifying a subset of factors that would provide better classifier performance, the proposed method can be used effectively, subject to additional testing.