An Introduction To Weka

Introduction
- A collection of open source ML algorithms:
  - pre-processing
  - classifiers
  - clustering
  - association rules
- Created by researchers at the University of Waikato in New Zealand
- Software platform: Java based

What is WEKA?
- Waikato Environment for Knowledge Analysis (WEKA)
- Developed by the Department of Computer Science, University of Waikato, New Zealand
- Machine learning/data mining software written in Java (distributed under the GNU General Public License)
- Used for research, education, and applications
- http://www.cs.waikato.ac.nz/ml/weka/

Installation
- Download the software from http://www.cs.waikato.ac.nz/ml/weka/
- If you are interested in modifying or extending WEKA, there is a developer version that includes the source code
- Set the WEKA environment variable for Java (typically this means adding weka.jar to the Java CLASSPATH)
- Download some ML data from http://mlearn.ics.uci.edu/MLRepository.html

Main Features
- 49 data preprocessing tools
- 76 classification/regression algorithms
- 8 clustering algorithms
- 15 attribute/subset evaluators + 10 search algorithms for feature selection
- 3 algorithms for finding association rules
- More algorithms are being added
- The Java source code is available, so the software can be customized
- Custom extensions and plug-ins can be developed
- Excellent mailing and discussion lists are available
- 3 graphical user interfaces:
  - “The Explorer” (exploratory data analysis)
  - “The Experimenter” (experimental environment)
  - “The KnowledgeFlow” (new process-model-inspired interface)

Weka Interfaces
- Command-line
- Explorer: preprocessing, attribute selection, learning, visualization
- Knowledge Flow: visual design of the KDD process, with capabilities roughly equivalent to the Explorer
- Experimenter: testing and evaluating machine learning algorithms

WEKA Data format
- Uses flat text files to describe the data
- Can work with a wide variety of data files, including its own “.arff” format and C4.5 file formats
- Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
- Data can also be read from a URL or from an SQL database (using JDBC)

WEKA: ARFF file format

    @relation heart-disease-simplified
    @attribute age numeric
    @attribute sex { female, male}
    @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
    @attribute cholesterol numeric
    @attribute exercise_induced_angina { no, yes}
    @attribute class { present, not_present}
    @data
    63,male,typ_angina,233,no,not_present
    67,male,asympt,286,yes,present
    67,male,asympt,229,yes,present
    38,female,non_anginal,?,no,not_present
    ...

Here age and cholesterol are numeric attributes; the remaining attributes are nominal, with their allowed values listed in braces. A "?" in the data section marks a missing value.

Attribute-Relation File Format (ARFF)
Weka reads ARFF files:

    @relation adult
    @attribute age numeric
    @attribute name string
    @attribute education {College, Masters, Doctorate}
    @attribute class {>50K,<=50K}
    @data
    50,Lisa,College,<=50K
    30,'Martin John',College,<=50K

- Supported attributes: numeric, nominal, string, date
- String values that contain spaces must be quoted, and nominal values must match the declared labels exactly
- Details at: www.cs.waikato.ac.nz/~ml/weka/arff.html

Weka Explorer
What we will use today in Weka:
- Pre-process: load, analyze, and filter data
- Visualize: compare pairs of attributes, plot matrices
- Classify: all algorithms seen in class (Naive Bayes, etc.)
- Feature selection: forward feature subset selection, etc.
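Forward feature subset selection, mentioned above, can be sketched as a greedy loop: start with no features and keep adding the single feature that most improves a score, stopping when nothing helps. The sketch below is plain Python, not Weka's search code; the function names and the merit numbers are made up for illustration, and in practice the score would come from evaluating a classifier on the candidate subset.

```python
def forward_selection(features, score):
    """Greedy forward selection over `features` using score(subset)."""
    selected = []
    best = score(selected)
    remaining = list(features)
    while remaining:
        candidate, candidate_score = None, best
        for feature in remaining:
            s = score(selected + [feature])
            if s > candidate_score:          # keep the best strict improvement
                candidate, candidate_score = feature, s
        if candidate is None:                # no feature improves the score
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best = candidate_score
    return selected, best

# Hypothetical merit per feature, with a cost of 2 for every feature kept.
merit = {"age": 6, "cholesterol": 3, "sex": 1}

def toy_score(subset):
    return sum(merit[f] for f in subset) - 2 * len(subset)

selected, best = forward_selection(["sex", "age", "cholesterol"], toy_score)
print(selected, best)  # ['age', 'cholesterol'] 5
```

Because the search is greedy, it evaluates far fewer subsets than an exhaustive search, at the price of possibly missing a better combination.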

Explorer: pre-processing the data
- Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
- Data can also be read from a URL or from an SQL database using JDBC
- Pre-processing tools in WEKA are called “filters”
- WEKA contains filters for: discretization, normalization, resampling, attribute selection, attribute combination, …
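Two of the filter types listed above, normalization and discretization, are simple enough to sketch in a few lines of plain Python. This is an illustration of what such filters compute, not Weka's implementation: the function names are assumptions, normalization here means min-max scaling to [0, 1], and discretization means equal-width binning.

```python
def normalize(values):
    """Rescale numeric values linearly to [0, 1]; None (missing) is kept."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    span = hi - lo
    return [None if v is None else (0.0 if span == 0 else (v - lo) / span)
            for v in values]

def discretize(values, bins):
    """Equal-width binning: map each value to a bin index in 0..bins-1."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    width = (hi - lo) / bins
    result = []
    for v in values:
        if v is None:
            result.append(None)
        elif width == 0:
            result.append(0)
        else:
            # The maximum value falls on the upper edge, so clamp it
            # into the last bin.
            result.append(min(int((v - lo) / width), bins - 1))
    return result

# Cholesterol column from the heart-disease example, with one missing value.
cholesterol = [233.0, 286.0, 229.0, None]
scaled = normalize(cholesterol)
binned = discretize(cholesterol, 3)
print(binned)  # [0, 2, 0, None]
```

Both functions take a column in and return a column of the same length, which mirrors how a filter transforms a dataset without changing the number of instances.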

Explorer: Building classification models
- “Classifiers” in WEKA are models for predicting nominal or numeric quantities
- Implemented schemes include: decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
- “Meta”-classifiers include: bagging, boosting, stacking, error-correcting output codes, data cleansing, …
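As an illustration of what a classifier does, here is a toy instance-based scheme (k nearest neighbours) in plain Python. It is a sketch of the idea only, not one of Weka's classifiers: the class name, the fit/predict interface, and the query values are assumptions, and the training rows are the (age, cholesterol) pairs from the heart-disease example.

```python
from collections import Counter
import math

class NearestNeighbours:
    """Toy instance-based classifier: stores the training data and lets
    the k closest stored rows vote on the label of a new row."""

    def __init__(self, k=1):
        self.k = k
        self.rows = []
        self.labels = []

    def fit(self, rows, labels):
        # "Training" is just remembering the instances.
        self.rows = list(rows)
        self.labels = list(labels)
        return self

    def predict(self, row):
        # Rank stored instances by Euclidean distance to the query row.
        ranked = sorted(zip(self.rows, self.labels),
                        key=lambda pair: math.dist(pair[0], row))
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]

# (age, cholesterol) -> class, taken from the ARFF example.
rows = [(63, 233), (67, 286), (67, 229)]
labels = ["not_present", "present", "present"]

model = NearestNeighbours(k=1).fit(rows, labels)
print(model.predict((66, 280)))  # present
print(model.predict((62, 235)))  # not_present
```

In a real analysis the attributes would usually be normalized first, since cholesterol spans a much wider range than age and would otherwise dominate the distance.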

Weka Experimenter
If you need to perform many experiments:
- The Experimenter makes it easy to compare the performance of different learning schemes
- Results can be written to a file or a database
- Evaluation options: cross-validation, learning curve, etc.
- Can also iterate over different parameter settings
- Significance testing is built in
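The core of such an experiment, comparing learning schemes by cross-validation, can be sketched in plain Python. This is not how the Experimenter is implemented: the function names, the two toy schemes, and the data are all made up for illustration, and the folds are taken in order without the shuffling or stratification a real experiment would normally use.

```python
from collections import Counter

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(train_fn, rows, labels, k):
    """Return the accuracy on each of the k held-out folds."""
    accuracies = []
    for test_idx in k_fold_indices(len(rows), k):
        held_out = set(test_idx)
        train_rows = [r for i, r in enumerate(rows) if i not in held_out]
        train_labels = [y for i, y in enumerate(labels) if i not in held_out]
        predict = train_fn(train_rows, train_labels)
        correct = sum(predict(rows[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies

def majority_class(rows, labels):
    """Baseline scheme: always predict the most frequent training label."""
    winner = Counter(labels).most_common(1)[0][0]
    return lambda row: winner

def threshold_rule(rows, labels):
    """Hypothetical scheme: 'yes' when the value exceeds the training mean."""
    mean = sum(r[0] for r in rows) / len(rows)
    return lambda row: "yes" if row[0] > mean else "no"

# Made-up one-attribute dataset.
rows = [(1,), (9,), (2,), (8,), (3,), (7,), (4,), (6,)]
labels = ["no", "yes", "no", "yes", "no", "yes", "no", "yes"]

baseline_acc = cross_validate(majority_class, rows, labels, k=4)
rule_acc = cross_validate(threshold_rule, rows, labels, k=4)
print(baseline_acc)  # [0.5, 0.5, 0.5, 0.5]
print(rule_acc)      # [1.0, 1.0, 1.0, 1.0]
```

The per-fold accuracies of the two schemes are the kind of paired results a significance test would then compare.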

Visit more self-help tutorials
- Pick a tutorial of your choice and browse through it at your own pace.
- The tutorials section is free and self-guiding, and does not involve any additional support.
- Visit us at www.dataminingtools.net
