An Introduction To Weka


Published in: Technology, Education

  • Talk about: hacking Weka, discretization, cross-validation
  • The Simple CLI provides a command-line interface to Weka’s routines. The Explorer provides a graphical front end to Weka’s routines and components. The Experimenter lets you build classification experiments. The KnowledgeFlow is an alternative to the Explorer as a graphical front end to Weka’s core algorithms.

    1. Weka: A Short Introduction
    2. Introduction
       • A collection of open-source ML algorithms for:
         • pre-processing
         • classification
         • clustering
         • association rules
       • Created by researchers at the University of Waikato in New Zealand
       • Software platform: Java-based
    3. What is WEKA
       • Waikato Environment for Knowledge Analysis (WEKA)
       • Developed by the Department of Computer Science, University of Waikato, New Zealand
       • Machine learning/data mining software written in Java (distributed under the GNU General Public License)
       • Used for research, education, and applications
    4. Installation
       • Download the software from
         • If you are interested in modifying or extending Weka, there is a developer version that includes the source code
       • Set the CLASSPATH environment variable so that Java can find weka.jar
       • Download some ML data from
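Before launching Weka from Java code or the command line, it can help to confirm that `weka.jar` is actually visible to the JVM. The sketch below assumes the environment-variable step means putting `weka.jar` on the `CLASSPATH`; the class name `WekaClasspathCheck` is hypothetical, while `weka.core.Instances` is one of Weka's core classes.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: checks whether Weka is reachable on the classpath. */
final class WekaClasspathCheck {

    /** Returns every classpath entry whose file name is weka.jar. */
    static List<String> wekaJarEntries(String classpath) {
        List<String> hits = new ArrayList<>();
        for (String entry : classpath.split(File.pathSeparator)) {
            if (new File(entry).getName().equalsIgnoreCase("weka.jar")) {
                hits.add(entry);
            }
        }
        return hits;
    }

    /** True if the JVM can load Weka's core Instances class. */
    static boolean wekaLoadable() {
        try {
            Class.forName("weka.core.Instances");
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String cp = System.getProperty("java.class.path");
        System.out.println("weka.jar entries on classpath: " + wekaJarEntries(cp));
        System.out.println("Weka classes loadable: " + wekaLoadable());
    }
}
```

Run it with the same classpath you intend to use for Weka. If both checks succeed, `java -cp weka.jar weka.gui.GUIChooser` should open Weka's GUI chooser.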
    5. Main Features
       • 49 data preprocessing tools
       • 76 classification/regression algorithms
       • 8 clustering algorithms
       • 15 attribute/subset evaluators + 10 search algorithms for feature selection
       • 3 algorithms for finding association rules
       • More algorithms being added
       • The Java source code is available for customization
       • Custom extensions and plug-ins can be developed
       • Active mailing lists and discussion forums
       • 3 graphical user interfaces:
         • “The Explorer” (exploratory data analysis)
         • “The Experimenter” (experimental environment)
         • “The KnowledgeFlow” (a newer interface inspired by process models)
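The association-rule learners counted above score candidate rules with measures such as support and confidence. Here is a minimal sketch of those two measures (not Weka's Apriori implementation), assuming each transaction is a set of item names; `RuleMetrics` is a hypothetical class name.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative support/confidence computation over a list of transactions. */
final class RuleMetrics {

    /** support(X): fraction of transactions that contain every item of X. */
    static double support(List<Set<String>> transactions, Set<String> itemset) {
        long hits = transactions.stream().filter(t -> t.containsAll(itemset)).count();
        return (double) hits / transactions.size();
    }

    /** confidence(lhs => rhs) = support(lhs and rhs together) / support(lhs). */
    static double confidence(List<Set<String>> transactions, Set<String> lhs, Set<String> rhs) {
        Set<String> both = new HashSet<>(lhs);
        both.addAll(rhs);
        double lhsSupport = support(transactions, lhs);
        return lhsSupport == 0 ? 0.0 : support(transactions, both) / lhsSupport;
    }

    public static void main(String[] args) {
        List<Set<String>> baskets = List.of(
            Set.of("bread", "milk"),
            Set.of("bread", "butter"),
            Set.of("bread", "milk", "butter"),
            Set.of("milk"));
        System.out.println(support(baskets, Set.of("bread", "milk")));            // 0.5
        System.out.println(confidence(baskets, Set.of("bread"), Set.of("milk"))); // 2 of the 3 bread baskets
    }
}
```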
    6. Weka Interfaces
       • Command line
       • Explorer
         • preprocessing, attribute selection, learning, visualization
       • Knowledge Flow
         • visual design of the KDD process
         • capabilities similar to the Explorer
       • Experimenter
         • testing and evaluating machine learning algorithms
    7. WEKA GUI Interface
    8. WEKA Data format
       • Uses flat text files to describe the data
       • Works with a wide variety of data files, including its own “.arff” format and the C4.5 file formats
       • Data can be imported from a file in various formats:
         • ARFF, CSV, C4.5, binary
       • Data can also be read from a URL or from an SQL database (using JDBC)
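To see how a flat CSV file relates to ARFF, here is a deliberately small converter sketch. It is not Weka's `CSVLoader`. It assumes a header row, plain comma separators with no quoted fields, rows of equal width, and `?` or an empty cell for missing values, and it treats a column as numeric only if every present value parses as a number. `CsvToArff` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Minimal CSV-to-ARFF sketch (hypothetical helper, not Weka's CSVLoader). */
final class CsvToArff {

    static String convert(String relation, List<String> csvLines) {
        String[] names = csvLines.get(0).split(",", -1);      // header row
        List<String[]> rows = new ArrayList<>();
        for (String line : csvLines.subList(1, csvLines.size())) {
            rows.add(line.split(",", -1));                     // -1 keeps trailing empty cells
        }

        StringBuilder out = new StringBuilder("@relation " + relation + "\n\n");
        for (int col = 0; col < names.length; col++) {
            Set<String> distinct = new LinkedHashSet<>();      // keeps first-seen order
            boolean numeric = true;
            for (String[] row : rows) {
                String v = row[col].trim();
                if (v.isEmpty() || v.equals("?")) continue;    // missing value
                distinct.add(v);
                try {
                    Double.parseDouble(v);
                } catch (NumberFormatException e) {
                    numeric = false;                           // any non-number makes it nominal
                }
            }
            String type = numeric ? "numeric" : "{" + String.join(",", distinct) + "}";
            out.append("@attribute ").append(names[col].trim()).append(' ').append(type).append('\n');
        }

        out.append("\n@data\n");
        for (String[] row : rows) {
            List<String> cells = new ArrayList<>();
            for (String v : row) {
                String t = v.trim();
                cells.add(t.isEmpty() ? "?" : t);              // empty cell becomes ARFF missing value
            }
            out.append(String.join(",", cells)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> csv = List.of(
            "age,sex,class",
            "63,male,not_present",
            "38,female,");
        System.out.print(convert("heart-disease-simplified", csv));
    }
}
```

For the rows in `main`, `age` becomes a numeric attribute, `sex` becomes the nominal attribute `{male,female}`, and the empty `class` cell is written as `?`.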
    9. WEKA: ARFF file format
         @relation heart-disease-simplified
         @attribute age numeric
         @attribute sex { female, male}
         @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
         @attribute cholesterol numeric
         @attribute exercise_induced_angina { no, yes}
         @attribute class { present, not_present}
         @data
         63,male,typ_angina,233,no,not_present
         67,male,asympt,286,yes,present
         67,male,asympt,229,yes,present
         38,female,non_anginal,?,no,not_present
         ...
       • age and cholesterol are numeric attributes; the attributes declared with a list of values in braces are nominal attributes. A “?” marks a missing value.
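The header above declares one attribute per line before `@data`. Here is a sketch of reading those declarations in Java, assuming unquoted attribute names without whitespace. `ArffHeader` and its nested `Attribute` class are hypothetical, and `real`/`integer` are mapped to numeric because ARFF treats them as synonyms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Sketch of reading ARFF @attribute declarations (not Weka's ArffLoader). */
final class ArffHeader {

    /** One @attribute declaration: name, kind and (for nominal) allowed values. */
    static final class Attribute {
        final String name;
        final String kind;                 // "numeric", "nominal", "string" or "date"
        final List<String> nominalValues;  // empty unless kind is "nominal"

        Attribute(String name, String kind, List<String> nominalValues) {
            this.name = name;
            this.kind = kind;
            this.nominalValues = nominalValues;
        }
    }

    static List<Attribute> parse(List<String> lines) {
        List<Attribute> attrs = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            String lower = line.toLowerCase(Locale.ROOT);
            if (lower.startsWith("@data")) break;              // header ends at @data
            if (!lower.startsWith("@attribute")) continue;     // skip @relation, % comments, blanks
            String[] parts = line.substring("@attribute".length()).trim().split("\\s+", 2);
            String name = parts[0];
            String type = parts[1].trim();
            if (type.startsWith("{")) {                        // nominal: {v1, v2, ...}
                List<String> values = new ArrayList<>();
                for (String v : type.substring(1, type.lastIndexOf('}')).split(",")) {
                    values.add(v.trim());
                }
                attrs.add(new Attribute(name, "nominal", values));
            } else {
                String kind = type.split("\\s+")[0].toLowerCase(Locale.ROOT);
                if (kind.equals("real") || kind.equals("integer")) kind = "numeric";
                attrs.add(new Attribute(name, kind, List.of()));
            }
        }
        return attrs;
    }

    public static void main(String[] args) {
        List<String> header = List.of(
            "@relation heart-disease-simplified",
            "@attribute age numeric",
            "@attribute sex { female, male}",
            "@attribute class { present, not_present}",
            "@data",
            "63,male,not_present");
        for (Attribute a : parse(header)) {
            System.out.println(a.name + " -> " + a.kind + " " + a.nominalValues);
        }
    }
}
```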
    10. Attribute-Relation File Format (ARFF)
       • Weka reads ARFF files:
           @relation adult
           @attribute age numeric
           @attribute name string
           @attribute education {College, Masters, Doctorate}
           @attribute class {>50K,<=50K}
           @data
           50,Lisa,College,<=50K
           30,'Martin John',College,<=50K
       • Supported attribute types:
         • numeric, nominal, string, date
       • Details at:
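Values that contain spaces, like `Martin John`, must be quoted in ARFF, and `?` marks a missing value. The sketch below splits a single dense data row accordingly. `ArffRow` is a hypothetical name, and escape sequences and sparse-format rows are out of scope.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: split one dense ARFF data row into values. Commas inside single or
 * double quotes do not split, outer quotes are removed, and a bare "?" becomes
 * null (missing). Escape sequences and sparse rows are not handled.
 */
final class ArffRow {

    static List<String> split(String row) {
        List<String> values = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        char quote = 0;                                   // active quote character, 0 if none
        for (char c : row.toCharArray()) {
            if (quote == 0 && c == ',') {                 // separator outside quotes
                values.add(finish(cur.toString()));
                cur.setLength(0);
                continue;
            }
            if (quote == 0 && (c == '\'' || c == '"')) {
                quote = c;                                // opening quote
            } else if (quote != 0 && c == quote) {
                quote = 0;                                // closing quote
            }
            cur.append(c);
        }
        values.add(finish(cur.toString()));
        return values;
    }

    /** Trims a raw token, strips matching outer quotes and maps "?" to null. */
    private static String finish(String token) {
        String t = token.trim();
        char first = t.isEmpty() ? 0 : t.charAt(0);
        if (t.length() >= 2 && (first == '\'' || first == '"') && t.charAt(t.length() - 1) == first) {
            return t.substring(1, t.length() - 1);        // quoted value: keep inner text as-is
        }
        return t.equals("?") ? null : t;                  // bare ? marks a missing value
    }

    public static void main(String[] args) {
        System.out.println(split("30,'Martin John',College,<=50K"));
        System.out.println(split("38,female,non_anginal,?,no,not_present"));
    }
}
```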
    11. Weka Explorer
       • What we will use today in Weka:
       • Pre-process:
         • Load, analyze, and filter data
       • Visualize:
         • Compare pairs of attributes
         • Plot matrices
       • Classify:
         • All algorithms seen in class (Naive Bayes, etc.)
       • Feature selection:
         • Forward feature subset selection, etc.
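Forward feature subset selection can be sketched as a greedy loop: start from no attributes, repeatedly add the single attribute that most improves a subset score, and stop when no addition helps. The score function is an assumption here (in practice it might be the cross-validated accuracy of a classifier); in Weka, a forward `GreedyStepwise` search combined with a subset evaluator plays a similar role. `ForwardSelection` is a hypothetical name.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.ToDoubleFunction;

/** Greedy forward subset selection over attribute indices 0..numFeatures-1. */
final class ForwardSelection {

    static Set<Integer> select(int numFeatures, ToDoubleFunction<Set<Integer>> score) {
        Set<Integer> chosen = new LinkedHashSet<>();
        double best = score.applyAsDouble(chosen);     // score of the empty subset
        while (true) {
            int bestFeature = -1;
            double bestScore = best;
            for (int f = 0; f < numFeatures; f++) {
                if (chosen.contains(f)) continue;
                Set<Integer> candidate = new LinkedHashSet<>(chosen);
                candidate.add(f);
                double s = score.applyAsDouble(candidate);
                if (s > bestScore) {                   // strictly better than current subset
                    bestScore = s;
                    bestFeature = f;
                }
            }
            if (bestFeature < 0) return chosen;        // no single addition improves: stop
            chosen.add(bestFeature);
            best = bestScore;
        }
    }

    public static void main(String[] args) {
        // Toy score: each attribute adds a fixed gain minus a small per-attribute cost.
        double[] gain = {0.30, 0.00, 0.20, 0.01};
        ToDoubleFunction<Set<Integer>> toy = s -> s.stream().mapToDouble(f -> gain[f] - 0.05).sum();
        System.out.println(select(gain.length, toy));  // [0, 2]
    }
}
```

Greedy forward search is cheap compared with trying every subset, but it can miss combinations of attributes that are only useful together.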
    12. Explorer: pre-processing the data
       • Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
       • Data can also be read from a URL or from an SQL database using JDBC
       • Pre-processing tools in WEKA are called “filters”
       • WEKA contains filters for:
         • Discretization, normalization, resampling, attribute selection, attribute combination, …
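Two of the filter types listed above are easy to sketch for a single numeric attribute: min-max normalization into [0, 1] and equal-width discretization into a fixed number of bins. This only illustrates the idea behind Weka's unsupervised `Normalize` and `Discretize` filters; it is not their code, and `SimpleFilters` is a hypothetical name.

```java
import java.util.Arrays;

/** Single-attribute sketches of min-max normalization and equal-width binning. */
final class SimpleFilters {

    /** Min-max scaling to [0, 1]; a constant column maps to all zeros. */
    static double[] normalize(double[] x) {
        double min = Arrays.stream(x).min().orElse(0);
        double max = Arrays.stream(x).max().orElse(0);
        double range = max - min;
        return Arrays.stream(x).map(v -> range == 0 ? 0 : (v - min) / range).toArray();
    }

    /** Equal-width binning: returns a bin index in [0, bins - 1] for each value. */
    static int[] discretize(double[] x, int bins) {
        double min = Arrays.stream(x).min().orElse(0);
        double max = Arrays.stream(x).max().orElse(0);
        double width = (max - min) / bins;
        int[] out = new int[x.length];
        for (int i = 0; i < x.length; i++) {
            int b = width == 0 ? 0 : (int) ((x[i] - min) / width);
            out[i] = Math.min(b, bins - 1);            // the maximum value goes into the last bin
        }
        return out;
    }

    public static void main(String[] args) {
        double[] cholesterol = {233, 286, 229, 250};   // values from the heart-disease example
        System.out.println(Arrays.toString(normalize(cholesterol)));
        System.out.println(Arrays.toString(discretize(cholesterol, 3)));  // [0, 2, 0, 1]
    }
}
```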
    13. Explorer: Building classification models
       • “Classifiers” in WEKA are models for predicting nominal or numeric quantities
       • Implemented schemes include:
         • Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes nets, …
       • “Meta”-classifiers include:
         • Bagging, boosting, stacking, error-correcting output codes, data cleansing, …
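Instance-based classifiers predict from stored training instances rather than from an explicit model. Here is a minimal 1-nearest-neighbour sketch over numeric attributes, using the age and cholesterol values from the heart-disease example. `NearestNeighbour` is a hypothetical class, not Weka's `IBk`.

```java
/** Sketch of a 1-nearest-neighbour classifier over numeric attributes. */
final class NearestNeighbour {
    private final double[][] train;
    private final String[] labels;

    NearestNeighbour(double[][] train, String[] labels) {
        this.train = train;
        this.labels = labels;
    }

    /** Predicts the label of the training instance closest in Euclidean distance. */
    String classify(double[] query) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < train.length; i++) {
            double d = 0;
            for (int j = 0; j < query.length; j++) {
                double diff = train[i][j] - query[j];
                d += diff * diff;                  // squared distance is enough for ranking
            }
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        return labels[best];
    }

    public static void main(String[] args) {
        double[][] x = {{63, 233}, {67, 286}, {67, 229}};   // {age, cholesterol}
        String[] y = {"not_present", "present", "present"};
        NearestNeighbour knn = new NearestNeighbour(x, y);
        System.out.println(knn.classify(new double[]{66, 280}));  // present
    }
}
```

Because the attributes are not normalized here, cholesterol dominates the distance; in practice attributes are usually normalized first.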
    14. Load, filter, analyze
    15. Visualize attributes
    16. Weka Experimenter
       • If you need to perform many experiments:
         • The Experimenter makes it easy to compare the performance of different learning schemes
         • Results can be written to a file or a database
         • Evaluation options: cross-validation, learning curve, etc.
         • Can also iterate over different parameter settings
         • Significance testing is built in
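Cross-validation, one of the Experimenter's evaluation options, splits the data into k folds, trains on k - 1 of them, tests on the remaining one, and repeats so that every instance is tested exactly once. The sketch below uses plain unstratified folds and a majority-class learner in the spirit of Weka's ZeroR baseline. Weka's own cross-validation stratifies folds for nominal classes, and `CrossValidation` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Unstratified k-fold cross-validation of a majority-class baseline. */
final class CrossValidation {

    /** Most frequent label among the given instance indices. */
    static String majority(List<String> labels, List<Integer> idx) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i : idx) counts.merge(labels.get(i), 1, Integer::sum);
        return Collections.max(counts.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    /** Accuracy pooled over all k folds (requires 2 <= k <= number of instances). */
    static double accuracy(List<String> labels, int k, long seed) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < labels.size(); i++) order.add(i);
        Collections.shuffle(order, new Random(seed));      // random but reproducible fold assignment

        int correct = 0;
        for (int fold = 0; fold < k; fold++) {
            List<Integer> train = new ArrayList<>(), test = new ArrayList<>();
            for (int pos = 0; pos < order.size(); pos++) {
                (pos % k == fold ? test : train).add(order.get(pos));
            }
            String prediction = majority(labels, train);   // "train" on the other k - 1 folds
            for (int i : test) if (labels.get(i).equals(prediction)) correct++;
        }
        return (double) correct / labels.size();
    }

    public static void main(String[] args) {
        List<String> labels = new ArrayList<>(Collections.nCopies(8, "present"));
        labels.addAll(List.of("not_present", "not_present"));
        System.out.println(accuracy(labels, 5, 42L));      // 0.8
    }
}
```

With 8 `present` and 2 `not_present` labels and k = 5, every training split keeps a `present` majority, so the baseline scores 0.8 whatever the shuffle; a real learning scheme should beat such a baseline to be worth comparing.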
    17. Visit more self-help tutorials
       • Pick a tutorial of your choice and browse through it at your own pace.
       • The tutorials section is free and self-guided, and does not include any additional support.
       • Visit us at