How to Use the Weka Tool

The University of Poonch
Data Mining
BS(CS) 6th Semester

• Weka stands for Waikato Environment for
Knowledge Analysis.
• Weka contains tools for data preprocessing,
classification, regression and clustering.
• Weka is a collection of machine learning
algorithms for data mining tasks.

From the Windows desktop:
• Click Start, choose All Programs, then
choose Weka 3-7 to start Weka.
• The first interface window then
appears.

• Explorer is used for preprocessing,
attribute selection, learning and
visualization.
• When we select Explorer, the Explorer
environment opens.

• Now I click Open file to open a
data file from the folder where
the data files are stored.
• Then I select my dataset,
“CONTACT LENSES”.
• Every instance consists of a number
of attributes.
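
To make "instances" and "attributes" concrete, here is a minimal sketch of reading a file in Weka's ARFF text format. The excerpt is hand-typed in the style of the contact lenses file and holds only three illustrative rows; `parse_arff` is a hypothetical helper that handles nominal attributes only, not Weka's own loader.

```python
# Hand-typed ARFF-style excerpt (assumption: illustrative rows, not the full file).
ARFF_TEXT = """\
@relation contact-lenses
@attribute age {young, pre-presbyopic, presbyopic}
@attribute spectacle-prescrip {myope, hypermetrope}
@attribute astigmatism {no, yes}
@attribute tear-prod-rate {reduced, normal}
@attribute contact-lenses {soft, hard, none}
@data
young,myope,no,reduced,none
young,myope,no,normal,soft
young,myope,yes,normal,hard
"""


def parse_arff(text):
    """Return (attribute_names, instances) for a simple nominal-only ARFF text."""
    names, instances, in_data = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if lower.startswith("@attribute"):
            names.append(line.split()[1])  # second token is the attribute name
        elif lower.startswith("@data"):
            in_data = True
        elif in_data:
            values = [v.strip() for v in line.split(",")]
            instances.append(dict(zip(names, values)))  # one dict per instance
    return names, instances


names, instances = parse_arff(ARFF_TEXT)
print(names)
print(instances[0])
```

Each instance is one row after `@data`, with one value per declared attribute; the last attribute is the class we want to predict.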

• First we choose a filter.
• There are two kinds of filter:
• Supervised
• Unsupervised
• We selected the unsupervised filter.
• Under the unsupervised filter there are two options:
• Instance
• Attribute
• We selected attribute.
• There are many attribute filters; we chose
NominalToBinary.
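
The idea behind a nominal-to-binary conversion can be sketched as follows. This is a simplified illustration that creates one 0/1 indicator per value; `nominal_to_binary` is a hypothetical helper, and Weka's own filter has options and special cases (for example for two-valued attributes) that are not reproduced here.

```python
def nominal_to_binary(instances, attribute, values):
    """Replace one nominal attribute with one 0/1 indicator per possible value."""
    converted = []
    for inst in instances:
        # Copy every attribute except the one being converted.
        row = {k: v for k, v in inst.items() if k != attribute}
        for value in values:
            row[f"{attribute}={value}"] = 1 if inst[attribute] == value else 0
        converted.append(row)
    return converted


# Two made-up instances, for illustration only.
rows = [
    {"age": "young", "class": "none"},
    {"age": "presbyopic", "class": "soft"},
]
binary_rows = nominal_to_binary(rows, "age", ["young", "pre-presbyopic", "presbyopic"])
print(binary_rows[0])
```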

• First there is a simple classifier, ZeroR.
• It determines the most common class,
• or the mean (in the case of numeric
values).
• It tests how well the class can be predicted
without considering the other attributes.
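
A baseline of this kind can be sketched in a few lines. This is a minimal illustration, not Weka's code: the class below is hypothetical, it predicts the most common label for a nominal class and the mean for a numeric class (which is how Weka's ZeroR is documented to behave), and the label counts assume the usual contact lenses class distribution of 15 none, 5 soft and 4 hard.

```python
from collections import Counter


class ZeroR:
    """Baseline that ignores all attributes and always predicts one value."""

    def fit(self, labels):
        labels = list(labels)
        if all(isinstance(y, (int, float)) for y in labels):
            self.prediction = sum(labels) / len(labels)  # numeric class: mean
        else:
            self.prediction = Counter(labels).most_common(1)[0][0]  # nominal class: mode
        return self

    def predict(self, n):
        return [self.prediction] * n


# Assumed class distribution of the contact lenses data.
labels = ["none"] * 15 + ["soft"] * 5 + ["hard"] * 4
model = ZeroR().fit(labels)
predictions = model.predict(len(labels))
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(model.prediction, accuracy)
```

Any real classifier should beat this accuracy; if it does not, the other attributes are not helping.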

Use training set:
• The classifier is evaluated on how well it predicts
the class of the instances it was trained on.
Supplied test set:
• The classifier is evaluated on how well it
predicts the class of a set of instances loaded from a
file. Clicking the Set... button brings up a dialog
allowing you to choose the file to test on.

Percentage split:
• The classifier is evaluated on how well it
predicts a certain percentage of the data which
is held out for testing. The amount of data held
out depends on the value entered in the % field.
Cross-validation (CV):
• The classifier is evaluated by cross-validation,
using the number of folds that are entered in
the Folds text field.
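
The percentage split option described above can be sketched like this. It is a minimal illustration under two assumptions: the % value is taken as the share of data used for training (the rest is held out), and the data is shuffled with a fixed seed. `percentage_split` is a hypothetical helper, not a Weka function.

```python
import random


def percentage_split(instances, train_percent, seed=1):
    """Shuffle the instances, then split them into (train, test) lists."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = round(len(shuffled) * train_percent / 100)
    return shuffled[:cut], shuffled[cut:]


# 24 instance ids stand in for the 24 rows of a small dataset.
train, test = percentage_split(range(24), 66)
print(len(train), len(test))
```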

• Having 10 folds means that 90% of the full data is
used for training (and 10% for testing) in
each fold.
• Cross-validation produces a fair estimate of
test performance.
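
The fold arithmetic can be checked with a small sketch. `k_fold_indices` is a hypothetical helper that assigns every k-th instance to a fold; it does not shuffle or stratify the data, so it only illustrates the train/test proportions.

```python
def k_fold_indices(n_instances, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_instances))
    for fold in range(k):
        test = indices[fold::k]  # every k-th instance belongs to this fold
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


folds = list(k_fold_indices(100, 10))
train_share = len(folds[0][0]) / 100
print(len(folds), train_share)
```

Every instance is used for testing exactly once, and each model is trained on the remaining (k-1)/k of the data.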

• When we chose the supplied test set option, it
gave the same results as the training set
option. This is what one would expect if the
supplied file holds the same instances as the
training data.

• The True Positive (TP) rate is the proportion of
examples which were classified as class x, among all
examples which truly have class x, i.e. how much
of the class was captured. It is equivalent to Recall. In
the confusion matrix, this is the diagonal element
divided by the sum over the relevant row,
i.e. 4/(4+0+1)=0.8 for class soft, 1/(0+1+3)=0.25
for class hard and 12/(1+2+12)=0.8 for class none in our
example.
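
The row-wise calculation can be sketched in code. The confusion matrix below is an assumption: it is reconstructed from the row and column figures quoted in these slides (rows are the true class, columns the predicted class), not copied from Weka's output.

```python
CLASSES = ["soft", "hard", "none"]
# Assumed confusion matrix: rows = true class, columns = predicted class.
MATRIX = [
    [4, 0, 1],   # true soft
    [0, 1, 3],   # true hard
    [1, 2, 12],  # true none
]


def tp_rate(matrix, i):
    """Diagonal element divided by the sum over row i."""
    return matrix[i][i] / sum(matrix[i])


for i, name in enumerate(CLASSES):
    print(name, round(tp_rate(MATRIX, i), 3))
```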

• The False Positive (FP) rate is the proportion of
examples which were classified as class x, but belong
to a different class, among all examples which are not
of class x. In the matrix, this is the column sum of class
x minus the diagonal element, divided by the row
sums of all other classes; i.e. (0+1)/(4+15)=0.053 for
class soft and (0+2)/(5+15)=0.1 for class hard.
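
The same rule written as code, as a minimal sketch. The confusion matrix is an assumption reconstructed from the figures quoted in these slides (rows are the true class, columns the predicted class), and `fp_rate` is a hypothetical helper.

```python
CLASSES = ["soft", "hard", "none"]
# Assumed confusion matrix: rows = true class, columns = predicted class.
MATRIX = [
    [4, 0, 1],   # true soft
    [0, 1, 3],   # true hard
    [1, 2, 12],  # true none
]


def fp_rate(matrix, i):
    """(Column sum of class i minus the diagonal) / (row sums of all other classes)."""
    column_sum = sum(row[i] for row in matrix)
    false_positives = column_sum - matrix[i][i]
    negatives = sum(sum(row) for j, row in enumerate(matrix) if j != i)
    return false_positives / negatives


for i, name in enumerate(CLASSES):
    print(name, round(fp_rate(MATRIX, i), 3))
```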

• The Precision is the proportion of the examples
which truly have class x among all those which
were classified as class x. In the matrix, this is
the diagonal element divided by the sum over
the relevant column, i.e. 4/(4+0+1)=0.8 for
class soft, 1/(0+1+2)=0.333 for class hard
and 12/(1+3+12)=0.75 for class none.
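
The column-wise calculation can be sketched the same way. The confusion matrix is again an assumption reconstructed from the figures quoted in these slides (rows are the true class, columns the predicted class).

```python
CLASSES = ["soft", "hard", "none"]
# Assumed confusion matrix: rows = true class, columns = predicted class.
MATRIX = [
    [4, 0, 1],   # true soft
    [0, 1, 3],   # true hard
    [1, 2, 12],  # true none
]


def precision(matrix, i):
    """Diagonal element divided by the sum over column i."""
    return matrix[i][i] / sum(row[i] for row in matrix)


for i, name in enumerate(CLASSES):
    print(name, round(precision(MATRIX, i), 3))
```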

F-Measure = 2*Precision*Recall / (Precision + Recall)
A combined measure for precision and recall:
for class soft, (2*0.8*0.8)/(0.8+0.8)=0.8;
for class hard, (2*0.333*0.25)/(0.333+0.25)=0.286;
for class none, (2*0.75*0.8)/(0.75+0.8)=0.774.
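
The formula can be checked with a short sketch. The precision and recall pairs below are the per-class values worked out in these slides; `f_measure` is a hypothetical helper that also guards against division by zero.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) per class, taken from the worked figures in the slides.
scores = {
    "soft": (0.8, 0.8),
    "hard": (1 / 3, 0.25),
    "none": (0.75, 0.8),
}

for name, (p, r) in scores.items():
    print(name, round(f_measure(p, r), 3))
```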

• Accuracy is measured by the area under the
ROC curve. An area of 1 represents a perfect
test; an area of .5 represents a worthless test. A
rough guide for classifying the accuracy of a
diagnostic test is the traditional academic point
system: .90-1 = excellent (A)
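
To see why an area of 1 is perfect and .5 is worthless, here is a minimal sketch that computes the area under the ROC curve as the chance that a randomly chosen positive example is scored above a randomly chosen negative one (ties count as half). The scores and labels are made up for illustration, and `auc` is a hypothetical helper for a two-class problem.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Illustrative data: one perfect ranking and one uninformative scorer.
perfect = auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
worthless = auc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
print(perfect, worthless)
```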
Recall:
The proportion of all relevant documents that are
retrieved by the query. It is equivalent to the TP rate.

• I can change the number of folds in
cross-validation.
• If I change the folds from 10 to 5,
each fold trains on 80% of the data
(and tests on the remaining 20%).
