2. Weka was developed at the University
of Waikato in New Zealand.
Weka is an open-source data mining tool
written in Java. It is used for research,
education, and applications. It runs
on Windows, Linux, and Mac.
4. Main features:
- Comprehensive set of data pre-processing tools, learning algorithms, and evaluation methods
- Graphical user interfaces (incl. data visualization)
- Environment for comparing learning algorithms
5. Weka is a collection of machine
learning algorithms for data mining
tasks. The algorithms can either be
applied directly to a dataset (using
the GUI) or called from your own Java
code (using the Weka Java library).
6. Weka contains tools for data pre-
processing, classification, regression,
clustering, association rules, and
visualization. It is also well-suited for
developing new machine learning
schemes.
7. Data Mining by Weka
[Diagram: Input (raw data); Weka tasks: pre-processing, classification, regression, clustering, association rules, visualization; Output (result)]
8. There are two main ways to use Weka to conduct your
data mining tasks.
Use the Weka graphical user interface (GUI).
The GUI is straightforward and easy to use, but it is
not flexible: it cannot be called from your
own application.
9. Import the Weka Java library into your own Java
application.
Developers can use the Weka Java library
to develop software, or modify the source code
to meet special requirements. This approach is more
flexible and advanced, but it is not as easy to
use as the GUI.
10. Tools (or functions) in Weka include:
- Data preprocessing (e.g., data filters)
- Classification (e.g., BayesNet, KNN, C4.5 decision tree, neural networks, SVM)
- Regression (e.g., linear regression, isotonic regression, SVM for regression)
- Clustering (e.g., simple k-means, expectation maximization (EM))
- Association rules (e.g., Apriori algorithm, predictive accuracy, confirmation guided)
- Feature selection (e.g., CFS subset evaluation, information gain, chi-squared statistic)
- Visualization (e.g., viewing different two-dimensional plots of the data)
11. Weka Data File Format (Input)
Weka for Data Mining
Sample Output from Weka (Output)
12. The most popular data input format for Weka is ARFF (with ".arff"
being the extension of your input data file).
FILE FORMAT
@relation RELATION_NAME
@attribute ATTRIBUTE_NAME ATTRIBUTE_TYPE
@attribute ATTRIBUTE_NAME ATTRIBUTE_TYPE
@attribute ATTRIBUTE_NAME ATTRIBUTE_TYPE
@attribute ATTRIBUTE_NAME ATTRIBUTE_TYPE
@data
DATAROW1
DATAROW2
DATAROW3
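To make the template above concrete, here is a minimal sketch in plain Java (no Weka dependency) that assembles a small ARFF document as text. The class name ArffSketch, the method toArff, and the tiny "weather" dataset are hypothetical and chosen only for illustration; the attribute types shown ("numeric" and a nominal value list in braces) are standard ARFF types.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative helper (hypothetical, not part of Weka) that builds ARFF text. */
final class ArffSketch {

    /**
     * Builds ARFF text: one @relation line, one @attribute line per
     * attribute (name -> type, in insertion order), then @data and the rows.
     */
    static String toArff(String relation, Map<String, String> attributes, List<String> dataRows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (Map.Entry<String, String> a : attributes.entrySet()) {
            sb.append("@attribute ").append(a.getKey()).append(' ').append(a.getValue()).append('\n');
        }
        sb.append("\n@data\n");
        for (String row : dataRows) {
            sb.append(row).append('\n'); // each row is a comma-separated list of values
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the attributes in declaration order,
        // which must match the column order of the data rows.
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("outlook", "{sunny,overcast,rainy}"); // nominal attribute
        attributes.put("temperature", "numeric");            // numeric attribute
        attributes.put("play", "{yes,no}");                  // nominal attribute

        List<String> rows = Arrays.asList("sunny,85,no", "overcast,83,yes", "rainy,70,yes");
        System.out.print(toArff("weather", attributes, rows));
    }
}
```

Saving the printed text to a file with the .arff extension should give an input file of the shape described above.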
13. [Screenshot callouts]
- Different analysis tools/functions
- Different attributes to choose
- The value set of the chosen attribute and the number of input items with each value
15. Three sets of classes you may need to use when
developing your own application
Classes for Loading Data
Classes for Classifiers
Classes for Evaluation
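As a rough illustration of how these three groups of classes fit together, the sketch below loads a dataset (weka.core.Instances), trains a classifier (weka.classifiers.trees.J48, Weka's C4.5 implementation), and evaluates it with cross-validation (weka.classifiers.Evaluation). It assumes the Weka jar is on the classpath. The class name WekaSketch and the inlined 14-row weather toy dataset are choices made here for illustration; in practice data is more commonly read from a file, for example with weka.core.converters.ConverterUtils.DataSource.read(path).

```java
import java.io.StringReader;
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;

/** Illustrative sketch (assumes weka.jar on the classpath). */
final class WekaSketch {

    // Small nominal "weather" toy dataset, inlined so the sketch needs no file.
    static final String ARFF = String.join("\n",
        "@relation weather.symbolic",
        "@attribute outlook {sunny,overcast,rainy}",
        "@attribute temperature {hot,mild,cool}",
        "@attribute humidity {high,normal}",
        "@attribute windy {TRUE,FALSE}",
        "@attribute play {yes,no}",
        "@data",
        "sunny,hot,high,FALSE,no",
        "sunny,hot,high,TRUE,no",
        "overcast,hot,high,FALSE,yes",
        "rainy,mild,high,FALSE,yes",
        "rainy,cool,normal,FALSE,yes",
        "rainy,cool,normal,TRUE,no",
        "overcast,cool,normal,TRUE,yes",
        "sunny,mild,high,FALSE,no",
        "sunny,cool,normal,FALSE,yes",
        "rainy,mild,normal,FALSE,yes",
        "sunny,mild,normal,TRUE,yes",
        "overcast,mild,high,TRUE,yes",
        "overcast,hot,normal,FALSE,yes",
        "rainy,mild,high,TRUE,no");

    /** Loading data: parse the ARFF text and mark the last attribute as the class. */
    static Instances loadData() throws Exception {
        Instances data = new Instances(new StringReader(ARFF));
        data.setClassIndex(data.numAttributes() - 1);
        return data;
    }

    public static void main(String[] args) throws Exception {
        Instances data = loadData();

        // Classifier: train a C4.5-style decision tree on the full dataset.
        J48 tree = new J48();
        tree.buildClassifier(data);
        System.out.println(tree);

        // Evaluation: 10-fold cross-validation of a fresh J48 with a fixed seed.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new J48(), data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}
```

Swapping J48 for another classifier class should leave the loading and evaluation steps unchanged, since they work against the general classifier interface.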
16. In sum, the overall goal of Weka is to build a state-
of-the-art facility for developing machine
learning (ML) techniques and to allow people to
apply them to real-world data mining problems.