Classification by clustering
United International University
Assignment-01
Title: Classification by Clustering.
Course : Pattern Recognition Lab
Course Code : CSI 416
Submitted To :
Dr. Dewan Md. Farid
Associate Professor, CSE
United International University.
Submitted By :
Nazmul Hyder
Id : 011 131 085
Section : SB
Date : 06.08.2017
Dataset 1: Mushroom
Data set characteristics : Multivariate
Attribute characteristics : Categorical
Associated task : Classification
Number of instances : 8124
Number of attributes : 22
Missing values : Yes
Area : Life
Date donated : 1987-09-01
Number of web hits : 261721
Using Decision Tree (J4.8) :
Accuracy : 66.9621%
Error : 33.0379%
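The two figures are complementary: the error is 100% minus the accuracy. As a rough check, the percentages can be converted back into instance counts. The sketch below assumes the accuracy was computed over all 8124 instances, which the report does not state; the function name `to_counts` is illustrative only.

```python
def to_counts(accuracy_pct, n_instances):
    """Convert an accuracy percentage into (correct, incorrect) instance counts."""
    correct = round(n_instances * accuracy_pct / 100)
    return correct, n_instances - correct

# Mushroom baseline from the report: 66.9621% accuracy on 8124 instances.
# Assumption: every instance was scored exactly once.
correct, incorrect = to_counts(66.9621, 8124)
print(correct, incorrect)                # 5440 2684
print(round(100 * incorrect / 8124, 4))  # 33.0379
```

Under that assumption the baseline tree classifies 5440 instances correctly and 2684 incorrectly, which reproduces the reported error of 33.0379%.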
After simple k-means clustering :
Number of clusters : 2
Using DT (J4.8) :
Accuracy : 99.4953%
Error : 0.5047%
Number of clusters : 4
Using DT (J4.8) :
Accuracy : 99.8523%
Error : 0.1477%
Number of clusters : 6
Using DT (J4.8) :
Accuracy : 99.483%
Error : 0.517%
Observation : Accuracy rose from the initial 66.9621% to above 99.4% for every cluster count tested, with the highest value (99.8523%) at 4 clusters.
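The clustering step itself can be sketched as follows. The report does not state how the cluster assignments were passed on to the decision tree, so this example assumes one possible reading: the cluster index is appended to each instance as an extra attribute. It is a plain-Python version of Lloyd's k-means algorithm on made-up numeric points, not the tool used in the report, and the tree-training step is left out. The names `kmeans` and `points` are illustrative.

```python
import math

def kmeans(points, k, max_iterations=100):
    """Lloyd's algorithm on numeric tuples; returns (centroids, assignments)."""
    # Assumption: the first k points seed the centroids. This keeps the
    # example deterministic but makes the result depend on the row order.
    centroids = [tuple(p) for p in points[:k]]
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the algorithm has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignments

# Made-up toy data with two visibly separate groups.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.8), (0.9, 1.1), (4.9, 5.1)]
centroids, assignments = kmeans(points, k=2)

# Append the cluster index as an additional attribute of every instance.
rows_with_cluster = [p + (a,) for p, a in zip(points, assignments)]
print(assignments)  # [0, 1, 0, 1, 0, 1]
```

The rows in `rows_with_cluster` would then be the input to the decision tree. Seeding from the first k rows is a simplification; a random or repeated initialisation is less sensitive to the order of the data.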
Dataset 2: Wine-Quality-Red
Data set characteristics : Multivariate
Attribute characteristics : Real
Associated tasks : Classification, Regression
Number of instances : 1599
Number of attributes : 12
Missing values : No
Area : Business
Date donated : 2009-10-07
Number of web hits : 475478
Using Decision Tree (J4.8) :
Accuracy : 90.9944%
Error : 9.0056%
After simple k-means clustering :
Number of clusters : 2
Using DT (J4.8) :
Accuracy : 99.1245%
Error : 0.8755%
Number of clusters : 4
Using DT (J4.8) :
Accuracy : 97.8737%
Error : 2.1263%
Number of clusters : 6
Using DT (J4.8) :
Accuracy : 97.8737%
Error : 2.1263%
Observation : Accuracy rose from the initial 90.9944% to 99.1245% with 2 clusters and to 97.8737% with both 4 and 6 clusters.
Dataset 3: Zoo
Data set characteristics : Multivariate
Attribute characteristics : Categorical, Integer
Associated task : Classification
Number of instances : 101
Number of attributes : 17
Missing values : No
Area : Life
Date donated : 1990-05-15
Number of web hits : 172527
Using Decision Tree (J4.8) :
Accuracy : 99.0196%
Error : 0.9804%
After simple k-means clustering :
Number of clusters : 2
Using DT (J4.8) :
Accuracy : 100%
Error : 0%
Number of clusters : 4
Using DT (J4.8) :
Accuracy : 97.0588%
Error : 2.9412%
Number of clusters : 6
Using DT (J4.8) :
Accuracy : 96.0784%
Error : 3.9216%
Observation : Accuracy increased only with 2 clusters (from 99.0196% to 100%). With 4 and 6 clusters it fell below the initial accuracy.
Dataset 4: Flags
Data set characteristics : Multivariate
Attribute characteristics : Categorical, Integer
Associated task : Classification
Number of instances : 194
Number of attributes : 30
Missing values : No
Area : N/A
Date donated : 1990-05-15
Number of web hits : 159411
Using Decision Tree (J4.8) :
Accuracy : 81.9588%
Error : 18.0412%
After simple k-means clustering :
Number of clusters : 2
Using DT (J4.8) :
Accuracy : 92.7835%
Error : 7.2165%
Number of clusters : 4
Using DT (J4.8) :
Accuracy : 88.6598%
Error : 11.3402%
Number of clusters : 6
Using DT (J4.8) :
Accuracy : 86.5979%
Error : 13.4021%
Observation : Accuracy was higher than the initial 81.9588% for every cluster count tested, but the gain shrank as the number of clusters grew.
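The four sets of results can be compared side by side with a short script. The sketch below uses only the accuracies reported above; the names `REPORTED` and `summarize` are illustrative, and the script draws no conclusion beyond the arithmetic.

```python
# Accuracies (%) as reported above: (baseline, {number of clusters: accuracy}).
REPORTED = {
    "Mushroom": (66.9621, {2: 99.4953, 4: 99.8523, 6: 99.483}),
    "Wine-Quality-Red": (90.9944, {2: 99.1245, 4: 97.8737, 6: 97.8737}),
    "Zoo": (99.0196, {2: 100.0, 4: 97.0588, 6: 96.0784}),
    "Flags": (81.9588, {2: 92.7835, 4: 88.6598, 6: 86.5979}),
}

def summarize(baseline, clustered):
    """Return (change in accuracy per cluster count, best cluster count)."""
    change = {k: round(acc - baseline, 4) for k, acc in clustered.items()}
    best_k = max(clustered, key=clustered.get)
    return change, best_k

for name, (baseline, clustered) in REPORTED.items():
    change, best_k = summarize(baseline, clustered)
    print(f"{name}: change vs. baseline = {change}, best k = {best_k}")
```

A negative change marks a configuration that scored below the baseline tree. In these results that happens only for Zoo with 4 and 6 clusters, and 2 clusters gives the highest accuracy on every dataset except Mushroom, where 4 clusters does.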