An Introduction To Weka
Published in Technology, Education
  • Talk about: hacking Weka, discretization, cross-validation
  • Simple CLI provides a command-line interface to Weka’s routines. Explorer provides a graphical front end to Weka’s routines and components. Experimenter allows you to build classification experiments. KnowledgeFlow provides an alternative to the Explorer as a graphical front end to Weka’s core algorithms.

Transcript

  • 1. Weka: A Short Introduction
  • 2. Introduction
    • A collection of open-source machine learning (ML) algorithms
      • pre-processing
      • classifiers
      • clustering
      • association rules
    • Created by researchers at the University of Waikato in New Zealand
    • Software platform: Java-based
  • 3. What is WEKA
    • Waikato Environment for Knowledge Analysis (WEKA)
    • Developed by the Department of Computer Science, University of Waikato, New Zealand
    • Machine learning/data mining software written in Java (distributed under the GNU General Public License)
    • Used for research, education, and applications
    • http://www.cs.waikato.ac.nz/ml/weka/
  • 4. Installation
    • Download software from http://www.cs.waikato.ac.nz/ml/weka/
      • If you are interested in modifying or extending Weka, there is a developer version that includes the source code
    • Add weka.jar to the Java CLASSPATH environment variable
    • Download some ML data from http://mlearn.ics.uci.edu/MLRepository.html
  • 5. Main Features
    • 49 data preprocessing tools
    • 76 classification/regression algorithms
    • 8 clustering algorithms
    • 15 attribute/subset evaluators + 10 search algorithms for feature selection
    • 3 algorithms for finding association rules
    • More algorithms being added
    • The Java source code is available, so the algorithms can be customized
    • Custom extensions and plug-ins can be developed
    • Excellent mailing and discussion lists are available
    • 3 graphical user interfaces
      • “The Explorer” (exploratory data analysis)
      • “The Experimenter” (experimental environment)
      • “The KnowledgeFlow” (new process-model-inspired interface)
  • 6. Weka Interfaces
    • Command-line
    • Explorer
      • preprocessing, attribute selection, learning, visualization
    • Knowledge Flow
      • visual design of KDD process
      • capabilities similar to the Explorer’s
    • Experimenter
      • testing and evaluating machine learning algorithms
  • 7. WEKA GUI Interface
  • 8. WEKA Data format
    • Uses flat text files to describe the data
    • Can work with a wide variety of data files including its own “.arff” format and C4.5 file formats
    • Data can be imported from a file in various formats:
      • ARFF , CSV, C4.5, binary
    • Data can also be read from a URL or from an SQL database (using JDBC)
  • 9. WEKA: ARFF file format
    • @relation heart-disease-simplified
    • @attribute age numeric
    • @attribute sex { female, male}
    • @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
    • @attribute cholesterol numeric
    • @attribute exercise_induced_angina { no, yes}
    • @attribute class { present, not_present}
    • @data
    • 63,male,typ_angina,233,no,not_present
    • 67,male,asympt,286,yes,present
    • 67,male,asympt,229,yes,present
    • 38,female,non_anginal,?,no,not_present
    • ...
    (Slide callouts: “numeric attribute”, “nominal attribute”)
  • 10. Attribute-Relation File Format (ARFF)
    • Weka reads ARFF files:
        • @relation adult
        • @attribute age numeric
        • @attribute name string
        • @attribute education {College, Masters, Doctorate}
        • @attribute class {>50K,<=50K}
        • @data
        • 50,Lisa,College,<=50K
        • 30,'Martin John',College,<=50K
    • Supported attribute types:
      • numeric, nominal, string, date
    • Details at:
      • www.cs.waikato.ac.nz/~ml/weka/arff.html
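
To make the ARFF layout above concrete, here is a minimal reader sketched in Python. It is not Weka's own loader: the function name `parse_arff` is hypothetical, and the sketch assumes a dense (non-sparse) file with one instance per line, `%` comments, and single quotes around values that contain spaces.

```python
import csv

NUMERIC_TYPES = ("numeric", "real", "integer")

def parse_arff(text):
    """Parse a small ARFF document into (relation, attributes, rows).

    Simplified sketch: no sparse format, no date parsing.
    """
    relation = None
    attributes = []   # (name, type); type is a str or a list of nominal values
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):   # skip blanks and comments
            continue
        if in_data:
            # One instance per line; single quotes protect embedded spaces.
            values = next(csv.reader([line], quotechar="'",
                                     skipinitialspace=True))
            row = []
            for (_, spec), value in zip(attributes, values):
                value = value.strip()
                if value == "?":               # "?" marks a missing value
                    row.append(None)
                elif spec in NUMERIC_TYPES:
                    row.append(float(value))
                else:
                    row.append(value)
            rows.append(row)
            continue
        lower = line.lower()
        if lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):           # nominal: list of allowed values
                spec = [v.strip() for v in spec.strip("{}").split(",")]
            else:
                spec = spec.lower()
            attributes.append((name, spec))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows

sample = """@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute cholesterol numeric
@attribute class { present, not_present}
@data
63,male,233,not_present
38,female,?,not_present
"""
relation, attributes, rows = parse_arff(sample)
print(relation)       # heart-disease-simplified
print(attributes[1])  # ('sex', ['female', 'male'])
print(rows[1])        # [38.0, 'female', None, 'not_present']
```

The sample reuses a few attributes and rows from the heart-disease slide. Numeric columns become floats and `?` becomes `None`, which mirrors how ARFF separates numeric from nominal attributes and marks missing values.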
  • 11. Weka Explorer
    • What we will use today in Weka:
    • Pre-process:
      • Load, analyze, and filter data
    • Visualize:
      • Compare pairs of attributes
      • Plot matrices
    • Classify:
      • All algorithms seen in class (Naive Bayes, etc.)
    • Feature selection:
      • Forward feature subset selection, etc.
  • 12. Explorer: pre-processing the data
    • Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
    • Data can also be read from a URL or from an SQL database using JDBC
    • Pre-processing tools in WEKA are called “filters”
    • WEKA contains filters for:
      • Discretization, normalization, resampling, attribute selection, attribute combination, …
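
As a rough illustration of what two of these filters do, the sketch below shows min-max normalization and equal-width discretization in plain Python. The function names `normalize` and `discretize` are hypothetical, and this is not Weka's implementation; missing values and Weka's filter options are left out on purpose.

```python
def normalize(values):
    """Min-max scale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant column: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def discretize(values, bins=3):
    """Equal-width binning: map each number to a bin index in 0..bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / bins
    # The maximum value would land in bin `bins`, so clamp it to the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]

ages = [63, 67, 67, 38]               # the age column from the ARFF slide
print(normalize(ages))
print(discretize(ages, bins=3))       # [2, 2, 2, 0]
```

Normalization keeps the attribute numeric, while discretization turns it into a small set of ordered bins that can be treated as a nominal attribute.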
  • 13. Explorer: Building classification models
    • “Classifiers” in WEKA are models for predicting nominal or numeric quantities
    • Implemented schemes include:
      • Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
    • “Meta”-classifiers include:
      • Bagging, boosting, stacking, error-correcting output codes, data cleansing, …
  • 14. Load, filter, analyze
  • 15. visualize attributes
  • 16. Weka Experimenter
    • If you need to perform many experiments:
      • Experimenter makes it easy to compare the performance of different learning schemes
      • Results can be written to a file or database
      • Evaluation options: cross-validation, learning curve, etc.
      • Can also iterate over different parameter settings
      • Significance testing is built in
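
The cross-validation option mentioned above can be sketched in a few lines of Python. This is an illustration of the idea only, not how the Experimenter is implemented. The helper names `k_fold_indices` and `cross_validate` are hypothetical, the folds are not stratified, and the "learner" is a simple majority-class baseline chosen to keep the example short.

```python
import random

def k_fold_indices(n, k, seed=1):
    """Shuffle the indices 0..n-1 and deal them into k near-equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(labels, k=3, seed=1):
    """Estimate the accuracy of a majority-class baseline with k-fold CV."""
    folds = k_fold_indices(len(labels), k, seed)
    correct = 0
    for test_fold in folds:
        held_out = set(test_fold)
        # Train on every instance that is not in the held-out fold.
        train = [labels[i] for i in range(len(labels)) if i not in held_out]
        # "Training" here is just picking the most frequent class
        # (sorted() makes tie-breaking deterministic).
        majority = max(sorted(set(train)), key=train.count)
        # Test on the held-out fold only.
        correct += sum(1 for i in test_fold if labels[i] == majority)
    return correct / len(labels)

labels = ["present"] * 9 + ["not_present"]
print(cross_validate(labels, k=3))    # 0.9
```

Each instance is used for testing exactly once and for training in the other k-1 rounds, which is why the combined accuracy is a fairer estimate than testing on the training data.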
  • 17. Visit more self-help tutorials
    • Pick a tutorial of your choice and browse through it at your own pace.
    • The tutorials section is free, self-guiding and will not involve any additional support.
    • Visit us at www.dataminingtools.net