Welcome to Our Presentation
Data Analysis with Weka: zoo.arff
Dataset
◉ zoo.arff:
A simple database containing 17 Boolean-valued attributes. The "type" attribute appears to be the
class attribute. Here is a breakdown of which animals are in which type.
Dataset Visualization
Pre-Processing
Why?
◉ Incomplete Data
◉ Noisy Data
◉ Inconsistent Data
How?
◉ Discretize
◉ RemoveDuplicates
◉ RemoveUseless
◉ etc.
Case Study:
◉ It contains a duplicate instance (frog)
◉ One attribute is misleading (animal name)
Classification
Decision Trees
◉ J48
Bayes
◉ Naive Bayes
Rule Based
◉ JRip
Classifier Options Explained and Applied to the Dataset
◉ Training Data:
We use the whole dataset as the training set. This gives the best results on the dataset
itself but does not guarantee good performance on unseen data.
◉ Cross-Validation:
Divide the dataset into k subsamples, then use k-1 subsamples as training data and the
remaining subsample as test data. Repeat so that each subsample serves as the test set once.
◉ Percentage Split:
We divide the dataset into two parts: the first X% of the dataset is used for training
and the rest is used as the test set.
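The three options above can be sketched as plain index splits. This is a minimal illustration in Python, not Weka's implementation: the function names are hypothetical, and the sketch skips the shuffling and stratification that a real evaluation tool may apply before splitting.

```python
def training_set_split(n):
    """Training-set option: train and test on the same n instances."""
    indices = list(range(n))
    return indices, indices


def k_fold_splits(n, k):
    """Yield (train, test) index lists for k-fold cross-validation."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    indices = list(range(n))
    # Fold i takes every k-th index starting at i, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        # Train on the other k-1 folds, test on fold i.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def percentage_split(n, percent):
    """Use the first percent% of the instances for training, the rest for testing."""
    cut = round(n * percent / 100)
    return list(range(cut)), list(range(cut, n))


if __name__ == "__main__":
    # Example with a hypothetical dataset of 20 instances.
    for train, test in k_fold_splits(20, 5):
        print(len(train), "train /", len(test), "test")
    print(percentage_split(20, 70))
```

Each instance lands in exactly one test fold, which is why cross-validation gives a less optimistic estimate than testing on the training data.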
JRip Classifier Algorithm
Works fine for:
◉ Class: missing class values, binary and nominal classes
◉ Attributes: nominal, date, missing values, numeric and binary
Advantages
◉ As expressive as decision trees
◉ Easy to interpret
◉ Easy to generate
Classifier Evaluation: Default Options (JRip)
Pre-processed? No. Results:
Classifier Evaluation: Cross-Validation Folds Increased (JRip)
Classifier Evaluation: Cross-Validation (JRip)
Pre-processed? No. Results: no big change

Cross-validation folds    Correctly classified instances (%)
10                        87.12
20                        85.14
30                        86.13
40                        84.15
50                        87.12
60                        82.17
70                        85.14
Classifier Evaluation: Training Set (JRip)
Pre-processed? No. Results:
Pre-Processing: Why?
Pre-processing filters:
◉ RemoveDuplicates (instances)
◉ RemoveUseless (attributes)
RemoveDuplicates (instances), RemoveUseless (attributes)
◉ RemoveDuplicates:
Removes all duplicate instances from the first batch of data it receives.
◉ RemoveUseless:
Removes attributes that do not vary at all or that vary too much.
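The idea behind the two filters can be sketched in a few lines of Python. This is an illustration only, not Weka's code: the function names, the `max_variance_percent` parameter and its 99% default are assumptions made for the example, and the rows are made-up toy data rather than values from zoo.arff.

```python
def remove_duplicates(rows):
    """Keep the first occurrence of each row, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def remove_useless(rows, names, max_variance_percent=99.0):
    """Drop columns that never vary or whose share of distinct values is too high."""
    n = len(rows)
    keep = []
    for col in range(len(names)):
        distinct = len({row[col] for row in rows})
        if distinct <= 1:
            continue  # constant column: carries no information
        if distinct / n * 100 > max_variance_percent:
            continue  # (almost) unique per row, e.g. a name or ID column
        keep.append(col)
    kept_names = [names[c] for c in keep]
    reduced = [tuple(row[c] for c in keep) for row in rows]
    return kept_names, reduced


if __name__ == "__main__":
    # Made-up toy rows (not taken from zoo.arff).
    names = ["animal", "hair", "legs", "kingdom"]
    rows = [
        ("frog", False, 4, "animalia"),
        ("frog", False, 4, "animalia"),
        ("bear", True, 4, "animalia"),
        ("hawk", False, 2, "animalia"),
    ]
    unique = remove_duplicates(rows)
    print(remove_useless(unique, names))
```

In the toy data, the name column is unique per row once the duplicate is gone, so it is dropped along with the constant column. This mirrors why the animal name attribute is treated as misleading in the case study.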
Classifier Evaluation: Default Options (JRip)
Pre-processed? Yes. Results:
J48 Classifier Algorithm
Works fine for:
◉ Class: missing class values, binary and nominal classes
◉ Attributes: nominal, date, missing values, numeric and binary
Advantages
◉ Easy to implement
◉ Can use both categorical and continuous values
◉ Deals with noise
Default Options (J48)
Changing Cross-Validation Folds to 20 (J48)
Changing the Number of Cross-Validation Folds (J48)
Cross-validation folds    Correctly classified instances (%)
10                        92.079
20                        92.079
30                        92.079
40                        92.079
50                        92.079
60                        92.079
70                        92.079
Any questions?
Thanks!

Data Mining Zoo classification