Multivalued Subsets under Information Theory
THESIS
Indraneel Dabhade

[Title-slide graphic: a family tree from “Me” through father and mother, the four grandparents, and the eight great-grandparents, illustrating a tree structure]
Key Data Mining Tools
Regression
Decision Tree
Neural Network
Clustering
Association Rules
Information Gain

$Gain(A_i, S) = Ent(S) - E(A_i, S)$

$0 \le Gain(A_i, S) \le \log_2 K$

Entropy: $Ent(S) = -\sum_{i=1}^{n} p_i \log_2 p_i$

where $E(A_i, S)$ is the expected entropy of the class labels after splitting the set $S$ on attribute $A_i$.

Example (an attribute A and the class labels it partitions):

A    Classes
A1   Class1
A1   Class2
A2   Class2
A3   Class3
A4   Class3
A5   Class1
A5   Class1
A5   Class2
Terminology

[Decision tree diagram: Attribute 1 at the root with branches (A1), (A2), (A3); Attribute 4 with branches (C1), (C2) and Attribute 2 with branches (B1), (B2) as internal nodes; leaves labeled Class 1 and Class 2]
Instances Attribute 1 Attribute 2 Attribute 3 Attribute 4 Classes
1 A1 B1 D1 C1 Class2
2 A1 B1 D1 C2 Class2
3 A2 B1 D1 C1 Class1
4 A3 B2 D1 C1 Class1
5 A3 B3 D2 C1 Class1
6 A3 B3 D2 C2 Class2
7 A2 B3 D2 C1 Class1
8 A1 B4 D1 C1 Class2
9 A1 B3 D2 C2 Class1
10 A3 B2 D2 C2 Class1
11 A1 B2 D2 C1 Class1
12 A2 B2 D1 C1 Class1
13 A2 B1 D2 C2 Class1
14 A3 B2 D1 C1 Class2
Attribute   Unique Attribute-Values   Number of Unique Values   Information Gain
1           A1, A2, A3                3                         0.246
2           B1, B2, B3, B4            4                         0.029
3           D1, D2                    2                         0.151
4           C1, C2                    2                         0.048
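Since the slide walks through the gain computation, a minimal Python sketch of the two formulas may help. Applied to the 14-instance table above it reproduces the reported gains for attributes 1 and 3 (0.246 and 0.151); attributes 2 and 4 evaluate to roughly 0.165 and 0.003 against the reported 0.029 and 0.048, which suggests the example table was transcribed with minor variations from the one the slide's figures came from.

```python
from collections import Counter
from math import log2

# The 14-instance example table above: (Att1, Att2, Att3, Att4, Class).
data = [
    ("A1","B1","D1","C1","Class2"), ("A1","B1","D1","C2","Class2"),
    ("A2","B1","D1","C1","Class1"), ("A3","B2","D1","C1","Class1"),
    ("A3","B3","D2","C1","Class1"), ("A3","B3","D2","C2","Class2"),
    ("A2","B3","D2","C1","Class1"), ("A1","B4","D1","C1","Class2"),
    ("A1","B3","D2","C2","Class1"), ("A3","B2","D2","C2","Class1"),
    ("A1","B2","D2","C1","Class1"), ("A2","B2","D1","C1","Class1"),
    ("A2","B1","D2","C2","Class1"), ("A3","B2","D1","C1","Class2"),
]

def entropy(labels):
    """Ent(S) = -sum_i p_i log2(p_i) over the class proportions."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain(rows, att):
    """Gain(A, S) = Ent(S) - E(A, S): the entropy drop from splitting on att."""
    expected = 0.0
    for v in {r[att] for r in rows}:
        subset = [r[-1] for r in rows if r[att] == v]
        expected += len(subset) / len(rows) * entropy(subset)
    return entropy([r[-1] for r in rows]) - expected

for att in range(4):
    print(f"Attribute {att + 1}: Gain = {gain(data, att):.3f}")
```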
Classifiers
Adaptive Boosting (Basic)
Adaptation from the slides of Michael Collins, Discriminative Reranking for Natural Language Parsing, ICML 2000

Given: m examples $(x_1, y_1), \ldots, (x_m, y_m)$ where $x_i \in X$, $y_i \in Y = \{-1, +1\}$
Initialize $D_1(i) = 1/m$
For t = 1 to T:
1. Train learner $h_t$ with minimum error $\epsilon_t = \Pr_{i \sim D_t}[h_t(x_i) \neq y_i]$
2. Compute the hypothesis weight $\alpha_t = \frac{1}{2} \ln\frac{1 - \epsilon_t}{\epsilon_t}$
3. For each example i = 1 to m:
$D_{t+1}(i) = \frac{D_t(i)}{Z_t} \times \begin{cases} e^{-\alpha_t} & \text{if } h_t(x_i) = y_i \\ e^{\alpha_t} & \text{if } h_t(x_i) \neq y_i \end{cases}$
Output: $H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$
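A compact Python sketch of the four steps above, assuming labels in {-1, +1} and a caller-supplied weak learner (the `stump_learner` argument is a hypothetical stand-in for whatever weak learner is used); it omits guards such as stopping early when the weighted error reaches zero.

```python
import numpy as np

def adaboost(X, y, T, stump_learner):
    """Basic AdaBoost following the four slide steps.

    stump_learner(X, y, D) must return a classifier h: X -> {-1, +1}
    trained to minimize the D-weighted error (an assumption of this sketch).
    """
    m = len(y)
    D = np.full(m, 1.0 / m)            # D_1(i) = 1/m
    hypotheses, alphas = [], []
    for t in range(T):
        h = stump_learner(X, y, D)     # 1. train weak learner under D_t
        pred = h(X)
        eps = D[pred != y].sum()       # epsilon_t = Pr_{i~D_t}[h_t(x_i) != y_i]
        alpha = 0.5 * np.log((1 - eps) / eps)  # 2. hypothesis weight
        D = D * np.exp(-alpha * y * pred)      # 3. e^{-alpha} if correct, e^{alpha} if not
        D /= D.sum()                           #    Z_t normalizes D_{t+1}
        hypotheses.append(h)
        alphas.append(alpha)
    def H(X):                          # Output: sign(sum_t alpha_t h_t(x))
        return np.sign(sum(a * h(X) for a, h in zip(alphas, hypotheses)))
    return H
```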
Classifiers
Adaptive Boosting (Basic)
Adaptation from the slides of Freund and Schapire (1996)

[Diagram: three weak classifiers combined into a strong classifier]
e1 = 0.300, a1 = 0.424
e2 = 0.196, a2 = 0.704
e3 = 0.344, a3 = 0.323

Weak Classifiers -> Strong Classifier
• Need to extend the 2-class to multi-class learning
• Usage of AdaBoost.M1
Classifiers
Classification via Regression

Instances Att1 Att2 Att3 Att4 Classes
1 A1 B1 D1 C1 Class2
2 A1 B1 D1 C2 Class2
3 A2 B1 D1 C1 Class1
4 A3 B2 D1 C1 Class1
5 A3 B3 D2 C1 Class1
6 A3 B3 D2 C2 Class2
7 A2 B3 D2 C1 Class1

Binary regression target, Class 1:
Instances Att1 Att2 Att3 Att4 Classes
1 A1 B1 D1 C1 0
2 A1 B1 D1 C2 0
3 A2 B1 D1 C1 1
4 A3 B2 D1 C1 0
5 A3 B3 D2 C1 1
6 A3 B3 D2 C2 0
7 A2 B3 D2 C1 1

Binary regression target, Class 2 (…):
Instances Att1 Att2 Att3 Att4 Classes
1 A1 B1 D1 C1 1
2 A1 B1 D1 C2 1
3 A2 B1 D1 C1 0
4 A3 B2 D1 C1 1
5 A3 B3 D2 C1 0
6 A3 B3 D2 C2 1
7 A2 B3 D2 C1 0

Test query: f(ATT1 = A2, ATT2 = B1, ATT3 = D2, ATT4 = C1) = ?
f(Class1) = 0.1, f(Class2) = 0.9
Class(Test query) = Class 2
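A sketch of the scheme above, under the assumption that the symbolic attribute values are one-hot encoded into a numeric matrix (an encoding step the slide leaves implicit) and that ordinary least squares plays the role of the regressor:

```python
import numpy as np

def one_vs_rest_regression(X_onehot, y, classes):
    """Classification via regression: fit one least-squares regressor per class
    on a 0/1 target, then assign a query to the class whose regressor scores
    highest. X_onehot is an instances-by-features 0/1 matrix."""
    weights = {}
    for c in classes:
        target = (y == c).astype(float)            # 1 for the class, 0 for the rest
        w, *_ = np.linalg.lstsq(X_onehot, target, rcond=None)
        weights[c] = w
    def classify(x):
        scores = {c: float(x @ w) for c, w in weights.items()}
        return max(scores, key=scores.get)         # e.g. f(Class2)=0.9 beats f(Class1)=0.1
    return classify
```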
Classifiers
Iterative Dichotomizer 3 (ID3) Decision Tree

[Decision tree diagram: Attribute 1 at the root with branches (A1), (A2), (A3); Attribute 4 with branches (C1), (C2) and Attribute 2 with branches (B1), (B2) as internal nodes; leaves labeled Class 1 and Class 2]
Classifiers
Naïve Bayesian Classifier

Instances Att1 Att2 Att3 Att4 Classes
1 A1 B1 D1 C1 Class2
2 A1 B1 D1 C2 Class2
3 A2 B1 D1 C1 Class1
4 A3 B2 D1 C1 Class1
5 A3 B3 D2 C1 Class1
6 A3 B3 D2 C2 Class2
7 A2 B3 D2 C1 Class1
8 A1 B4 D1 C1 Class2
9 A1 B3 D2 C2 Class1
10 A3 B2 D2 C2 Class1
11 A1 B2 D2 C1 Class1
12 A2 B2 D1 C1 Class1
13 A2 B1 D2 C2 Class1
14 A3 B2 D1 C1 Class2
Query = <Att1 = A1, Att2 = B3, Att3 = D1, Att4 = C2>
P(Class1)*P(Att1=A1|Class1)*P(Att2=B3|Class1)*P(Att3=D1|Class1)*P(Att4=C2|Class1) = Prop1
P(Class2)*P(Att1=A1|Class2)*P(Att2=B3|Class2)*P(Att3=D1|Class2)*P(Att4=C2|Class2) = Prop2
If Prop1 > Prop2 => Class(Query) = Class1
If Prop1 < Prop2 => Class(Query) = Class2
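A minimal sketch of this comparison, using raw (unsmoothed) frequency estimates from the table; for the slide's query the two products work out to Prop1 ≈ 0.0053 and Prop2 ≈ 0.0137, so the query would be labeled Class2.

```python
# The slide's 14-instance table: (Att1, Att2, Att3, Att4, Class).
data = [
    ("A1","B1","D1","C1","Class2"), ("A1","B1","D1","C2","Class2"),
    ("A2","B1","D1","C1","Class1"), ("A3","B2","D1","C1","Class1"),
    ("A3","B3","D2","C1","Class1"), ("A3","B3","D2","C2","Class2"),
    ("A2","B3","D2","C1","Class1"), ("A1","B4","D1","C1","Class2"),
    ("A1","B3","D2","C2","Class1"), ("A3","B2","D2","C2","Class1"),
    ("A1","B2","D2","C1","Class1"), ("A2","B2","D1","C1","Class1"),
    ("A2","B1","D2","C2","Class1"), ("A3","B2","D1","C1","Class2"),
]

def naive_bayes(query, rows, classes=("Class1", "Class2")):
    """Evaluate Prop_c = P(c) * prod_j P(Att_j = v_j | c) per class and
    return the class with the larger product, as on the slide (no smoothing)."""
    props = {}
    for c in classes:
        in_class = [r for r in rows if r[-1] == c]
        prop = len(in_class) / len(rows)                   # P(c)
        for j, v in enumerate(query):
            prop *= sum(r[j] == v for r in in_class) / len(in_class)  # P(Att_j=v|c)
        props[c] = prop
    return max(props, key=props.get), props

label, props = naive_bayes(("A1", "B3", "D1", "C2"), data)
print(label, props)
```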
Information Gains

Instances Att1
1 A1
2 A1
3 A2
4 A3
5 A4
6 A3
7 A5
8 A1
9 A7
10 A3
11 A9
12 A10
13 A1
14 A6
15 A8
16 A9
17 A10
18 A2
19 A3
20 A6
21 A8
22 A2
23 A1
24 A4
[Diagram: the ten unique values A1 to A10 of Att1 partitioned under each scheme]
ID3: every unique value (A1 … A10) forms its own branch.
GID: a single value forms Subset 1; the remaining values form the Rest.
MVS: the values are grouped into two multivalued subsets, Subset 1 and Subset 2.
Information Gain evaluation for ID3

Instances Att1 …. Att n
1 A1 …. N1
2 A1 …. N2
3 A2 …. N5
4 A3 …. N4
5 A3 …. N6
6 A3 …. N7
7 A2 …. N8
8 A1 …. N4
9 A1 …. N5
10 A3 …. N3
11 A1 …. N2
12 A2 …. N5
13 A2 …. N6
14 A3 …. N8
Class Quanta Identity

Att1   Class1  Class2  ….  Class n
A1     4       3       ….  4
A2     5       7       ….  5
A3     7       6       ….  0
…
Att n  Class1  Class2  ….  Class n
N1     1       0       ….  5
N2     3       2       ….  9
N3     4       6       ….  3

Considering the ‘Iris’ Dataset:

Att  Information Gain
1    0.877
2    0.511
3    1.45
4    1.44
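‘Class Quanta Identity’ is the thesis's own term; reading it as a value-by-class count table, a minimal construction sketch:

```python
from collections import Counter, defaultdict

def class_quanta_identity(rows, att, classes):
    """Count, for each unique value of attribute `att`, how many instances of
    each class carry that value: one row of the CQI table per unique value."""
    counts = defaultdict(Counter)
    for r in rows:
        counts[r[att]][r[-1]] += 1
    return {value: [counts[value][c] for c in classes] for value in counts}

# On the 14-instance table from the entropy sketch earlier,
# class_quanta_identity(data, 0, ("Class1", "Class2")) gives
# {'A1': [2, 3], 'A2': [4, 0], 'A3': [3, 2]}.
```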
Information Gain evaluation for GID

Class Quanta Identity

Att1   Class1  Class2  ….  Class n
A1     4       3       ….  4
Rest   12      13      ….  5
…
Att n  Class1  Class2  ….  Class n
N1     1       0       ….  5
Rest   7       8       ….  12

Att1   Class1  Class2  ….  Class n
A m    4       3       ….  4
Rest   12      13      ….  5
…
Att n  Class1  Class2  ….  Class n
N m    1       0       ….  5
Rest   7       8       ….  12

Considering the ‘Iris’ Dataset:

Att  Information Gain
1    0.06886155
2    0.06583857
3    0.162349
4    0.3645836

Instances Att1 …. Att n
1 A1 …. N1
2 A1 …. N2
3 A2 …. N5
4 A3 …. N4
5 A3 …. N6
6 A3 …. N7
7 A2 …. N8
8 A1 …. N4
9 A1 …. N5
10 A3 …. N3
11 A1 …. N2
12 A2 …. N5
13 A2 …. N6
14 A3 …. N8
NP Hard
What makes the problem interesting? Why is it NP-Hard?

[Diagram: the ten unique values A1 to A10 of Att1 split into Subset 1 and Subset 2]

Example attribute and classes:

Att  Class
A1   1
A1   1
A2   1
A3   2
A3   2
A3   3
A4   4

GID encoding ({A1} = 1, Rest = 0):
Att  Class
1    1
1    1
0    1
0    2
0    2
0    3
0    4
Information Gain = 0.4695652

MVS encoding ({A1, A2} = 1, Rest = 0):
Att  Class
1    1
1    1
1    1
0    2
0    2
0    3
0    4
Information Gain = 0.9852281
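Both gains can be checked directly. A short sketch, assuming the standard information gain of a binary 0/1 split over the seven instances above:

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def binary_gain(membership, classes):
    """Information gain of a 0/1 subset indicator over the class labels."""
    total = entropy(classes)
    for bit in (0, 1):
        sub = [c for m, c in zip(membership, classes) if m == bit]
        total -= len(sub) / len(classes) * entropy(sub)
    return total

values  = ["A1", "A1", "A2", "A3", "A3", "A3", "A4"]
classes = [1, 1, 1, 2, 2, 3, 4]
gid = [1 if v == "A1" else 0 for v in values]          # GID: {A1} vs Rest
mvs = [1 if v in ("A1", "A2") else 0 for v in values]  # MVS: {A1, A2} vs Rest
print(binary_gain(gid, classes))   # 0.4695652..., as on the slide
print(binary_gain(mvs, classes))   # 0.9852281..., as on the slide
```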
Information Gain evaluation for MVS

Class Quanta Identity

Att1      Class1  Class2  ….  Class n
Subset 1  4       3       ….  4
Subset 2  12      13      ….  5
…
Att n     Class1  Class2  ….  Class n
Subset 1  1       0       ….  5
Subset 2  7       8       ….  12
…

Considering the ‘Iris’ Dataset:

Att  Information Gain
1    0.128627
2    0.120512
3    0.345634
4    0.618695

Instances Att1 …. Att n
1 A1 …. N1
2 A1 …. N2
3 A2 …. N5
4 A3 …. N4
5 A3 …. N6
6 A3 …. N7
7 A2 …. N8
8 A1 …. N4
9 A1 …. N5
10 A3 …. N3
11 A1 …. N2
12 A2 …. N5
13 A2 …. N6
14 A3 …. N8
Testing Conditions

Datasets:

Dataset     Instances  Attributes  Unique Values  Data Type   Missing Values
Iris        150        4           22-43          Fractional  No
Glass       214        9           32-178         Fractional  No
Images      4435       36          49-104         Integer     Missing Class ‘6’
PenDigits*  7494*      16          96-101         Integer     No
Vehicles    846        18          13-424         Integer     No
* Reduced to 4350

Palmetto High Performance Computing
Wall time runs of 50 hours (in parallel)
Use of ‘Mersenne Twister’ pseudorandom number generator

Classification Algorithms:

Classifier      Time to compute  Nature                       Rule Generation
AdaBoost        Low              Deterministic/Probabilistic  Function of sample size and weighted predictions
ID3             Low              Deterministic                Robust Rule
Regression      High             Deterministic                Robust Rule
Naïve Bayesian  Low              Probabilistic                Robust Rule
Multisubset variant using the Adaptive Simulated Annealing

generate initial solution
initialize Fl, Fh, Ebest, Econfig
begin
  initialize To, Tend
  while To > Tend {
  begin
    initialize Lb, I, Lt
    while Lt < (Lb + I)
    begin
      Binary-Rand(n x 1)
      form CQI for the binary subsets
      if solution < Fl then change Fl
      if solution >= Fh then change Fh
      evaluate Δ = Solcurr - L_Solcurr
      if Δ > 0 then L_Solcurr = Solcurr
      if Δ < 0 then if e^(Δ/To) > Rand(1) then L_Solcurr = Solcurr
      then Ebest = Solcurr
    end
    lower To
    Lt = Lb + Lb·(1 - e^(-(Fh - Fl)/Fh))
  end
end
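A simplified Python sketch of the annealing loop above: it keeps the Metropolis-style acceptance test e^(Δ/To) > Rand(1) but replaces the adaptive inner-loop-length update of Lt with a fixed sweep length, so the parameters T0, cool, and L are this sketch's own, not the thesis's.

```python
import math
import random

def anneal_subset(values, classes, gain, T0=1.0, T_end=1e-3, cool=0.95, L=50):
    """Search for a binary grouping of attribute values that maximizes the
    information gain, via simulated annealing. `gain(membership, classes)`
    scores a 0/1 membership list (e.g. binary_gain from the NP-Hard sketch)."""
    unique = sorted(set(values))
    idx = {v: i for i, v in enumerate(unique)}
    sol = [random.randint(0, 1) for _ in unique]        # Binary-Rand(n x 1)
    member = lambda s: [s[idx[v]] for v in values]
    cur = best = gain(member(sol), classes)
    best_sol, T = sol[:], T0
    while T > T_end:                                    # while To > Tend
        for _ in range(L):                              # fixed-length inner loop
            cand = sol[:]
            cand[random.randrange(len(cand))] ^= 1      # move one value across subsets
            delta = gain(member(cand), classes) - cur
            if delta > 0 or math.exp(delta / T) > random.random():
                sol, cur = cand, cur + delta            # accept the move
                if cur > best:
                    best, best_sol = cur, sol[:]        # track Ebest
        T *= cool                                       # lower To
    return dict(zip(unique, best_sol)), best
```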
Instances Att1 Att2 Att3 Att4 Classes
1 A1 B1 D1 C1 Class2
2 A1 B1 D1 C2 Class2
3 A4 B1 D1 C3 Class1
4 A3 B2 D1 C4 Class1
5 A3 B3 D2 C5 Class1
6 A3 B3 D2 C6 Class2
7 A2 B3 D2 C5 Class1
8 A4 B4 D3 C3 Class2
9 A1 B3 D2 C1 Class1
10 A3 B2 D2 C2 Class1
11 A5 B2 D6 C7 Class1
12 A2 B2 D5 C4 Class1
13 A2 B1 D2 C3 Class1
14 A3 B2 D4 C2 Class2
[Search-space diagram: each attribute’s values grouped into two candidate subsets]
Att1: {A1, A2} vs {A3, A4, A5}
Att2: {B1, B2} vs {B1, B2}
Att3: {D1, D2} vs {D3, D4, D5, D6}
Att4: {C2, C6, C7, C4} vs {C3, C1, C5}
[Chart: proportion of the classes per subset, on a scale from 0.5 to 1 (Max)]
Criterion for the Classifiers: Information Gain Decision Trees

[Diagram: the ten unique values A1 to A10 of Att1 partitioned under each scheme]
ID3: every unique value forms its own branch.
GID: Subset 1 (a single value) vs the Rest.
MVS: Subset 1 vs Subset 2; which grouping to choose is the open question (marked “? ?” on the slide).
What Next ?
Application of Multivalue Subsets
Feature Selection: Supervised, Unsupervised, Semi-supervised
Discretization: Varying Interval Sizes; Un-supervising the Supervised
Feature Selection
What is Feature Selection ?
Selecting a set of attributes that ‘increases the predictive performance and builds more compact feature subsets’.

Two of the commonly used Feature Selection techniques:
Filter:  All features -> Filter -> Feature subset -> Classifier
Wrapper: All features -> Wrapper (evaluating multiple feature subsets) -> Classifier

Adaptation from the slides of ‘Introduction to Feature Selection’, Isabelle Guyon.
Feature Selection
What is Feature Selection ?
• Selecting the set of attributes that contributes most to the user’s objective
• This research focuses on identifying a lower bound on the equivalent subset size while ranking (ID3-based gain criterion vs. MVS-based gain criterion)

Traditional Search Method. Objective: Maximizing Classification Accuracy
[Search space over attributes A B C D E F G and the Class column]

Proposed Search Method. Objective: Maximizing the Information Gain

A     B     C     D     E    F     G     Class
A2    B3    C2    D4    E32  F45   G56   1
A34   B56   C34   C76   E45  F78   G143  2
A45   B67   C45   C89   E67  F89   G210  2
A56   B109  C76   C76   E78  F121  G301  1
…     …     …     …     …    …     …     …
A134  B231  C453  D456  E99  F201  G567  21
Feature Selection
What is Feature Selection ?
Feature Set Classifier Performance
{J, H, E, D, A, C, B, I, G, F} ID3 98%
{J, H, E, D, A, C, B, I, G} ID3 97%
{J, H, E, D, A, C, B, I} ID3 85%
{J, H, E, D, A, C, B} ID3 80%
{J, H, E, D, A, C} ID3 87%
{J, H, E, D, A} ID3 90%
{J, H, E, D} ID3 92%
{J, H, E} ID3 89%
{J, H} ID3 91%
{J} ID3 88%
Sequential Elimination based Selection Process:
1. Rank the features as per the objective values
2. Eliminate the lowest-ranked feature
3. Check the classifier performance
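The three steps translate directly; a minimal sketch where `rank_value` and `evaluate` are assumed callables (the ranking objective and the classifier-error measurement, respectively) supplied by the experiment:

```python
def sequential_elimination(features, rank_value, evaluate):
    """Sequential elimination based selection, per the three steps above:
    rank features by the objective, drop the lowest-ranked one, and record
    classifier performance for each intermediate feature set."""
    current = sorted(features, key=rank_value, reverse=True)  # 1. rank by objective
    history = []
    while current:
        history.append((tuple(current), evaluate(current)))   # 3. check performance
        current = current[:-1]                                # 2. drop lowest-ranked
    return history
```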
Feature Selection: Feature Selection for ‘Iris’ Dataset

[Bar chart: Information Gain per attribute (1 to 4) for ID3, GID, and MVS; values as in the table below]

Att  ID3    GID         MVS
1    0.877  0.06886155  0.128627
2    0.511  0.06583857  0.120512
3    1.45   0.162349    0.345634
4    1.44   0.3645836   0.618695
Feature Selection: Feature Selection for ‘Vehicles’ Dataset

Information Gain for ID3   Attribute   Information Gain for MVS   Attribute
1.38E+00   12   0.252924   9
8.12E-01   7    0.212451   6
7.88E-01   11   0.144444   8
6.14E-01   4    0.093892   3
6.13E-01   8    0.083239   2
5.99E-01   13   0.066396   14
5.78E-01   3    0.058774   10
4.83E-01   9    0.057838   17
3.66E-01   10   0.051468   18
3.37E-01   6    0.048565   7
3.26E-01   1    0.045457   11
3.08E-01   2    0.037533   1
2.77E-01   14   0.033462   5
2.40E-01   17   0.033196   4
2.31E-01   18   0.03098    13
2.12E-01   5    0.025649   12
1.82E-01   16   0.018691   15
1.02E-01   15   0.015805   16

[Bar chart: Information Gain per attribute (1 to 18) for ID3, GID, and Multivalue Subset]
Information Gain rankings and resulting classifier errors:

ID3 Ranking   Classifier Error   MVS Ranking   Classifier Error
12,7,11,4,8,13,3,9,10,6,1,2,14,17,18,5,16,15 55.32% 9,6,8,3,2,14,10,17,18,7,11,1,5,4,13,12,15,16 55.32%
12,7,11,4,8,13,3,9,10,6,1,2,14,17,18,5,16 52.07% 9,6,8,3,2,14,10,17,18,7,11,1,5,4,13,12,15 55.67%
12,7,11,4,8,13,3,9,10,6,1,2,14,17,18,5 53.31% 9,6,8,3,2,14,10,17,18,7,11,1,5,4,13,12 53.42%
12,7,11,4,8,13,3,9,10,6,1,2,14,17,18 54.26% 9,6,8,3,2,14,10,17,18,7,11,1,5,4,13 54.13%
12,7,11,4,8,13,3,9,10,6,1,2,14,17 54.25% 9,6,8,3,2,14,10,17,18,7,11,1,5,4 53.90%
12,7,11,4,8,13,3,9,10,6,1,2,14 54.14% 9,6,8,3,2,14,10,17,18,7,11,1,5 53.90%
12,7,11,4,8,13,3,9,10,6,1,2 54.02% 9,6,8,3,2,14,10,17,18,7,11,1 53.90%
12,7,11,4,8,13,3,9,10,6,1 54.37% 9,6,8,3,2,14,10,17,18,7,11 54.72%
12,7,11,4,8,13,3,9,10,6 54.02% 9,6,8,3,2,14,10,17,18,7 54.72%
12,7,11,4,8,13,3,9,10 54.02% 9,6,8,3,2,14,10,17,18 55.56%
12,7,11,4,8,13,3,9 57.92% 9,6,8,3,2,14,10,17 55.55%
12,7,11,4,8,13,3 57.45% 9,6,8,3,2,14,10 55.67%
12,7,11,4,8,13 59.46% 9,6,8,3,2,14 58.86%
12,7,11,4,8 59.22% 9,6,8,3,2 58.75%
12,7,11,4 59.22% 9,6,8,3 59.57%
12,7,11 59.22% 9,6,8 61.35%
12,7 60.52% 9,6 62.65%
12 60.52% 9 62.64%
Implications of the work
1. The research identified subsets that provided better information gain values.
2. The Feature Selection process identified subsets that provided a lower bound on the classification error.

Contribution to the field of Industrial Engineering
Feature Selection: when identifying a subset of factors that would provide better classifier performance, the proposed method can be used effectively, subject to additional testing.