United International University
Assignment-01
Title: Classification by Clustering
Course: Pattern Recognition Lab
Course Code: CSI 416
Submitted To:
Dr. Dewan Md. Farid
Associate Professor, CSE
United International University
Submitted By:
Nazmul Hyder
ID: 011 131 085
Section: SB
Date: 06.08.2017
Dataset 1: Mushroom
Data set characteristics: Multivariate
Attribute characteristics: Categorical
Associated task: Classification
Number of instances: 8124
Number of attributes: 22
Missing values: Yes
Area: Life
Date donated: 1987-09-01
Number of web hits: 261721
Using Decision Tree (J4.8):
Accuracy: 66.9621%
Error: 33.0379%
After simple k-means clustering:
Number of clusters: 2
Using DT (J4.8):
Accuracy: 99.4953%
Error: 0.5047%
Number of clusters: 4
Using DT (J4.8):
Accuracy: 99.8523%
Error: 0.1477%
Number of clusters: 6
Using DT (J4.8):
Accuracy: 99.483%
Error: 0.517%
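Observation: Accuracy increased after clustering for every cluster count tested, from an initial 66.9621% to at least 99.483%. The highest accuracy, 99.8523%, was obtained with 4 clusters.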
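The clustering step named above can be illustrated with a minimal sketch of k-means (Lloyd's algorithm). This is a generic version for numeric data, not the implementation of the tool used for the reported results; the function names, the toy points, and the fixed seed are assumptions made for illustration. Categorical attributes such as those in the Mushroom data would first need an encoding or a different distance measure, and the report does not state how the cluster assignments were combined with the decision tree, so that step is not shown.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iterations=100, seed=0):
    """Cluster numeric tuples into k groups; returns (centroids, assignments).

    Hypothetical helper: initial centroids are k points drawn at random.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the result is stable
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(column) / len(members) for column in zip(*members)
                )
    return centroids, assignments


# Toy data (made up for this sketch): two well-separated groups.
toy_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
              (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
toy_centroids, toy_labels = k_means(toy_points, k=2)
print(toy_labels)
```

Running the sketch with k set to 2, 4, and 6 would mirror the cluster counts tried in this assignment.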
Dataset 2: Wine-Quality-Red
Data set characteristics: Multivariate
Attribute characteristics: Real
Associated task: Classification, Regression
Number of instances: 1599
Number of attributes: 12
Missing values: No
Area: Business
Date donated: 2009-10-07
Number of web hits: 475478
Using Decision Tree (J4.8):
Accuracy: 90.9944%
Error: 9.0056%
After simple k-means clustering:
Number of clusters: 2
Using DT (J4.8):
Accuracy: 99.1245%
Error: 0.8755%
Number of clusters: 4
Using DT (J4.8):
Accuracy: 97.8737%
Error: 2.1263%
Number of clusters: 6
Using DT (J4.8):
Accuracy: 97.8737%
Error: 2.1263%
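Observation: Accuracy increased after clustering for every cluster count tested, from an initial 90.9944% to at least 97.8737%. The highest accuracy, 99.1245%, was obtained with 2 clusters; 4 and 6 clusters gave identical results.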
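The accuracy and error percentages above can be turned back into instance counts, which makes the differences easier to judge. The sketch below assumes that every run was evaluated over all 1599 instances of this dataset; the report does not state the evaluation protocol, so this is an assumption, and the helper name is hypothetical. It also checks that each error figure is simply 100 minus the accuracy.

```python
N_INSTANCES = 1599  # number of instances listed for Wine-Quality-Red

# (label, accuracy %, error %) as reported for this dataset
wine_runs = [
    ("initial", 90.9944, 9.0056),
    ("2 clusters", 99.1245, 0.8755),
    ("4 clusters", 97.8737, 2.1263),
    ("6 clusters", 97.8737, 2.1263),
]


def correct_and_wrong(accuracy_percent, n_instances):
    """Convert an accuracy percentage into (correct, wrong) instance counts.

    Assumes the percentage was computed over exactly n_instances.
    """
    correct = round(accuracy_percent / 100 * n_instances)
    return correct, n_instances - correct


for label, accuracy, error in wine_runs:
    # Error is the complement of accuracy, up to rounding in the report.
    assert abs((accuracy + error) - 100.0) < 1e-6
    correct, wrong = correct_and_wrong(accuracy, N_INSTANCES)
    print(f"{label}: {correct} correct, {wrong} wrong")
```

Under that assumption, the initial tree misclassifies 144 instances, against 14 with 2 clusters and 34 with 4 or 6 clusters.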
Dataset 3: Zoo
Data set characteristics: Multivariate
Attribute characteristics: Categorical, Integer
Associated task: Classification
Number of instances: 101
Number of attributes: 17
Missing values: No
Area: Life
Date donated: 1990-05-15
Number of web hits: 172527
Using Decision Tree (J4.8):
Accuracy: 99.0196%
Error: 0.9804%
After simple k-means clustering:
Number of clusters: 2
Using DT (J4.8):
Accuracy: 100%
Error: 0%
Number of clusters: 4
Using DT (J4.8):
Accuracy: 97.0588%
Error: 2.9412%
Number of clusters: 6
Using DT (J4.8):
Accuracy: 96.0784%
Error: 3.9216%
Observation: Accuracy increased over the initial 99.0196% only with 2 clusters (100%). With 4 and 6 clusters it dropped to 97.0588% and 96.0784%, below the initial accuracy.
Dataset 4: Flags
Data set characteristics: Multivariate
Attribute characteristics: Categorical, Integer
Associated task: Classification
Number of instances: 194
Number of attributes: 30
Missing values: No
Area: N/A
Date donated: 1990-05-15
Number of web hits: 159411
Using Decision Tree (J4.8):
Accuracy: 81.9588%
Error: 18.0412%
After simple k-means clustering:
Number of clusters: 2
Using DT (J4.8):
Accuracy: 92.7835%
Error: 7.2165%
Number of clusters: 4
Using DT (J4.8):
Accuracy: 88.6598%
Error: 11.3402%
Number of clusters: 6
Using DT (J4.8):
Accuracy: 86.5979%
Error: 13.4021%
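Observation: Accuracy increased after clustering for every cluster count tested, but the gain shrank as the number of clusters grew: 92.7835% with 2 clusters, 88.6598% with 4, and 86.5979% with 6, against an initial 81.9588%.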
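The accuracies reported for the four datasets can be compared against their initial values in one pass. The sketch below only restates the figures from this report and computes, for each dataset, the change in percentage points for 2, 4, and 6 clusters and the cluster count with the highest accuracy. The dictionary layout and function names are choices made for this illustration.

```python
# dataset -> (initial accuracy %, {number of clusters: accuracy %}),
# copied from the results reported in this assignment
reported = {
    "Mushroom": (66.9621, {2: 99.4953, 4: 99.8523, 6: 99.483}),
    "Wine-Quality-Red": (90.9944, {2: 99.1245, 4: 97.8737, 6: 97.8737}),
    "Zoo": (99.0196, {2: 100.0, 4: 97.0588, 6: 96.0784}),
    "Flags": (81.9588, {2: 92.7835, 4: 88.6598, 6: 86.5979}),
}


def gains(initial, by_clusters):
    """Change in accuracy (percentage points) relative to the initial run."""
    return {k: round(acc - initial, 4) for k, acc in by_clusters.items()}


def best_cluster_count(by_clusters):
    """Cluster count with the highest accuracy (smallest count wins ties)."""
    return max(sorted(by_clusters), key=lambda k: by_clusters[k])


for name, (initial, by_clusters) in reported.items():
    print(name, gains(initial, by_clusters),
          "best k =", best_cluster_count(by_clusters))
```

A negative value in the output marks a cluster count for which accuracy fell below the initial decision tree, which here happens only for Zoo with 4 and 6 clusters.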
