Machine Learning with WEKA
Eibe Frank
Department of Computer Science,
University of Waikato, New Zealand
• WEKA: A Machine Learning Toolkit
• The Explorer
  • Classification and Regression
  • Clustering
  • Association Rules
  • Attribute Selection
  • Data Visualization
• The Experimenter
• The Knowledge Flow GUI
• Conclusions
8/3/2022 University of Waikato 2
WEKA: the bird
Copyright: Martin Kramer (mkramer@wxs.nl)
WEKA: the software
• Machine learning/data mining software written in Java (distributed under the GNU General Public License)
• Used for research, education, and applications
• Complements “Data Mining” by Witten & Frank
• Main features:
  • Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
  • Graphical user interfaces (incl. data visualization)
  • Environment for comparing learning algorithms
WEKA: versions
• There are several versions of WEKA:
  • WEKA 3.0: “book version”, compatible with the description in the data mining book
  • WEKA 3.2: “GUI version”, adds graphical user interfaces (the book version is command-line only)
  • WEKA 3.3: “development version” with lots of improvements
• This talk is based on the latest snapshot of WEKA 3.3 (soon to be WEKA 3.4)
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
WEKA only deals with “flat” files
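To make the flat-file layout concrete, here is a minimal sketch of how such a file could be read. It is plain Python, not WEKA's own Java loader; the function name `parse_arff` and the returned structure are assumptions made for illustration. It handles only unquoted numeric and nominal values and treats `?` as a missing value.

```python
def parse_arff(text):
    """Parse a simple ARFF string into (relation, attributes, rows).

    Illustrative only: supports numeric and nominal attributes with
    unquoted values and maps '?' to None (missing value).
    """
    relation = None
    attributes = []   # (name, 'numeric') or (name, [nominal labels])
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith('%'):   # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(',')]
            row = []
            for (name, kind), value in zip(attributes, values):
                if value == '?':
                    row.append(None)           # missing value
                elif kind == 'numeric':
                    row.append(float(value))
                else:
                    row.append(value)          # nominal label kept as a string
            rows.append(row)
        elif lower.startswith('@relation'):
            relation = line.split(None, 1)[1]
        elif lower.startswith('@attribute'):
            _, name, spec = line.split(None, 2)
            if spec.startswith('{'):
                labels = [s.strip() for s in spec.strip('{}').split(',')]
                attributes.append((name, labels))
            else:
                attributes.append((name, spec.lower()))
        elif lower.startswith('@data'):
            in_data = True
    return relation, attributes, rows


# A shortened version of the heart-disease file shown on the slide.
sample = """@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute cholesterol numeric
@attribute class { present, not_present}
@data
63,male,233,not_present
38,female,?,not_present
"""
relation, attributes, rows = parse_arff(sample)
print(relation, attributes, rows)
```

Every instance becomes one row with one value per declared attribute, which is what “flat” means here: no nesting and no relations between tables.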
Explorer: pre-processing the data
• Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
• Data can also be read from a URL or from an SQL database (using JDBC)
• Pre-processing tools in WEKA are called “filters”
• WEKA contains filters for:
  • Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
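As an illustration of what two of these filter types do, the sketch below normalizes a numeric attribute to [0, 1] and discretizes it into equal-width bins. It is plain Python written for this note, not WEKA's filter classes; the function names are assumptions, and equal-width binning is only one of several possible discretization strategies. The input values are the cholesterol readings from the ARFF example.

```python
def normalize(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant attribute: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize_equal_width(values, bins):
    """Map each numeric value to a bin index 0..bins-1 using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), bins - 1) for v in values]


cholesterol = [233.0, 286.0, 229.0]
normalized = normalize(cholesterol)
binned = discretize_equal_width(cholesterol, 3)
print(normalized, binned)
```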
Explorer: building “classifiers”
• Classifiers in WEKA are models for predicting nominal or numeric quantities
• Implemented learning schemes include:
  • Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
• “Meta”-classifiers include:
  • Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …
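To show what “predicting a nominal quantity” looks like for one of the listed families, here is a minimal sketch of an instance-based (1-nearest-neighbour) classifier. It is plain Python, not WEKA code; the function name is an assumption. The training instances use the age and cholesterol values from the ARFF example, while the query points are hypothetical.

```python
import math


def nearest_neighbour_predict(train, query):
    """Return the label of the training instance closest to `query`.

    `train` is a list of (feature_vector, label) pairs with numeric features.
    """
    best_label, best_dist = None, math.inf
    for features, label in train:
        dist = math.dist(features, query)   # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# (age, cholesterol) -> class, taken from the first three ARFF rows.
train = [((63.0, 233.0), 'not_present'),
         ((67.0, 286.0), 'present'),
         ((67.0, 229.0), 'present')]

prediction = nearest_neighbour_predict(train, (66.0, 280.0))  # hypothetical patient
print(prediction)
```

In practice the attributes would usually be normalized first, since cholesterol otherwise dominates the distance simply because its values are larger.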
Explorer: clustering data
• WEKA contains “clusterers” for finding groups of similar instances in a dataset
• Implemented schemes are:
  • k-Means, EM, Cobweb, X-means, FarthestFirst
• Clusters can be visualized and compared to “true” clusters (if given)
• Evaluation is based on log-likelihood if the clustering scheme produces a probability distribution
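The first scheme in the list, k-means, can be sketched in a few lines. This is a plain-Python illustration of the standard assign-and-update loop, not WEKA's implementation; the function name, the fixed starting centroids and the data points are all hypothetical. Passing the initial centroids explicitly keeps the run deterministic.

```python
import math


def k_means(points, centroids, iterations=10):
    """Plain k-means (Lloyd's algorithm) starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(old)   # empty cluster keeps its centroid
        if new_centroids == centroids:      # nothing moved: converged
            break
        centroids = new_centroids
    return centroids, assignment


points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]   # hypothetical data
centroids, assignment = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids, assignment)
```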
Explorer: finding associations
• WEKA contains an implementation of the Apriori algorithm for learning association rules
  • Works only with discrete data
• Can identify statistical dependencies between groups of attributes:
  • milk, butter -> bread, eggs (with confidence 0.9 and support 2000)
• Apriori can compute all rules that have a given minimum support and exceed a given confidence
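The two measures quoted for the milk-and-butter rule can be computed directly. The sketch below is plain Python and covers only the measures, not the Apriori search itself. Support is the number of transactions containing every item of the rule; confidence is that count divided by the number of transactions containing the antecedent. The function names and the five transactions are hypothetical.

```python
def support_count(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t))


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent.

    Assumes the antecedent occurs in at least one transaction.
    """
    both = support_count(transactions, set(antecedent) | set(consequent))
    return both / support_count(transactions, antecedent)


# Hypothetical market-basket data.
transactions = [
    {'milk', 'butter', 'bread', 'eggs'},
    {'milk', 'butter', 'bread', 'eggs'},
    {'milk', 'butter', 'bread'},
    {'milk', 'eggs'},
    {'bread'},
]

rule_support = support_count(transactions, {'milk', 'butter', 'bread', 'eggs'})
rule_confidence = confidence(transactions, {'milk', 'butter'}, {'bread', 'eggs'})
print(rule_support, rule_confidence)
```

Here the rule holds in 2 of the 3 baskets that contain both milk and butter, so its support is 2 and its confidence is 2/3.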
Explorer: attribute selection
• Panel that can be used to investigate which (subsets of) attributes are the most predictive ones
• Attribute selection methods contain two parts:
  • A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
  • An evaluation method: correlation-based, wrapper, information gain, chi-squared, …
• Very flexible: WEKA allows (almost) arbitrary combinations of these two
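One such combination, ranking as the search method with information gain as the evaluation method, can be sketched as follows. This is plain Python for illustration, not WEKA's evaluator classes, and the function names are assumptions. The rows are the sex, exercise_induced_angina and class values of the four instances in the ARFF example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attribute_index, class_index=-1):
    """Reduction in class entropy obtained by splitting on one nominal attribute."""
    labels = [row[class_index] for row in rows]
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute_index], []).append(row[class_index])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# (sex, exercise_induced_angina, class)
rows = [
    ('male', 'no', 'not_present'),
    ('male', 'yes', 'present'),
    ('male', 'yes', 'present'),
    ('female', 'no', 'not_present'),
]

# "Ranking" search: score every attribute on its own and sort by score.
ranking = sorted(range(2), key=lambda i: information_gain(rows, i), reverse=True)
print(ranking, [information_gain(rows, i) for i in range(2)])
```

On these four instances exercise_induced_angina (index 1) separates the classes perfectly and is ranked ahead of sex (index 0); with so little data this says nothing about the real dataset.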
Explorer: data visualization
• Visualization is very useful in practice: e.g. it helps to determine the difficulty of the learning problem
• WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)
  • To do: rotating 3-d visualizations (Xgobi-style)
• Color-coded class values
• “Jitter” option to deal with nominal attributes (and to detect “hidden” data points)
• “Zoom-in” function
Performing experiments
• The Experimenter makes it easy to compare the performance of different learning schemes
  • For classification and regression problems
• Results can be written to a file or a database
• Evaluation options: cross-validation, learning curve, hold-out
• Can also iterate over different parameter settings
• Significance testing built in!
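Of the evaluation options listed, cross-validation is sketched below in plain Python. This is not the Experimenter's code: the function names, the majority-class learner used as a stand-in and the ten data points are all hypothetical, and the folds are taken in order without the shuffling or stratification a real experiment would normally use.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for k roughly equal folds over range(n)."""
    indices = list(range(n))
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_validate(instances, k, train_fn, predict_fn):
    """Accuracy of a learner estimated by k-fold cross-validation."""
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(instances), k):
        model = train_fn([instances[i] for i in train_idx])
        for i in test_idx:
            features, label = instances[i]
            correct += predict_fn(model, features) == label
    return correct / len(instances)


# Stand-in learner: always predicts the most frequent training label.
def train_majority(train):
    return Counter(label for _, label in train).most_common(1)[0][0]


def predict_majority(model, features):
    return model


# Hypothetical data: eight instances of class 'a', two of class 'b'.
data = [((i,), 'a') for i in range(8)] + [((i,), 'b') for i in range(2)]
accuracy = cross_validate(data, 5, train_majority, predict_majority)
print(accuracy)
```

Running the same folds for two learners yields paired per-fold results, which is the kind of input a significance test compares.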
The Knowledge Flow GUI
• New graphical user interface for WEKA
• Java-Beans-based interface for setting up and running machine learning experiments
• Data sources, classifiers, etc. are beans and can be connected graphically
• Data “flows” through components: e.g., “data source” -> “filter” -> “classifier” -> “evaluator”
• Layouts can be saved and loaded again later
Conclusion: try it yourself!
• WEKA is available at http://www.cs.waikato.ac.nz/ml/weka
  • Also has a list of projects based on WEKA
• WEKA contributors:
  Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer, Brent Martin, Peter Flach, Eibe Frank, Gabi Schmidberger, Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall, Remco Bouckaert, Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang
