Lecture 4: The Weka Package


The Weka workbench is a collection of state-of-the-art machine learning algorithms and data preprocessing tools. It includes virtually all the algorithms described in this book. It is designed so that you can quickly try out existing methods on new datasets in flexible ways. It provides extensive support for the whole process of experimental data mining, including preparing the input data, evaluating learning schemes statistically, and visualizing the input data and the result of learning. As well as a wide variety of learning algorithms, it includes a wide range of preprocessing tools. This diverse and comprehensive toolkit is accessed through a common interface so that its users can compare different methods and identify those that are most appropriate for the problem at hand. (Witten and Frank, 2005)
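
To make the idea of evaluating and comparing learning schemes through a common interface concrete, here is a minimal Python sketch of k-fold cross-validation. It is not Weka code: the function names (`majority_class`, `one_nearest_neighbor`, `cross_validate`) and the toy data are assumptions made for illustration, and the folds are not stratified.

```python
import random
from collections import Counter

def majority_class(train):
    """Baseline learner: always predict the most frequent training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label

def one_nearest_neighbor(train):
    """1-NN learner using squared Euclidean distance."""
    def predict(x):
        nearest = min(
            train,
            key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x)),
        )
        return nearest[1]
    return predict

def cross_validate(learner, data, folds=10, seed=1):
    """Return accuracy pooled over all folds (plain, unstratified k-fold)."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    correct = 0
    for i in range(folds):
        # Every folds-th example (offset i) is held out for testing.
        test = shuffled[i::folds]
        train = [ex for j, ex in enumerate(shuffled) if j % folds != i]
        model = learner(train)
        correct += sum(model(x) == y for x, y in test)
    return correct / len(shuffled)

# Hypothetical toy data: two well-separated clusters on a line.
data = [((float(i), 0.0), "a") for i in range(10)]
data += [((float(i) + 100.0, 0.0), "b") for i in range(10)]

# Both learners share one interface (train -> predict function),
# so the same evaluation routine can compare them.
results = {
    learner.__name__: cross_validate(learner, data)
    for learner in (majority_class, one_nearest_neighbor)
}
print(results)
```

Because every learner is just a function from training data to a predictor, adding a third method to the comparison only means adding it to the tuple.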

Published in: Education, Technology

  1. Lecture 4: The Weka Package. Marina Santini, Uppsala University, Department of Linguistics and Philology, September 2013. Machine Learning for Language Technology
  2. Outline. Re: Witten & Frank (2005)
     • Introduction to Weka (Ch. 9)
     • Getting Started: The Explorer (Ch. 10)
     • The basic methods (4.3, 4.6, 4.7)
     • Implementations (6.1, 6.3, 6.4)
     • Evaluation (5.1-5.6)
     • Assignment 1
  3. Introduction: What is Weka?
     • WEKA: Waikato Environment for Knowledge Analysis
     • Weka: the name of a flightless bird living in New Zealand
     • The Weka workbench is a collection of state-of-the-art machine learning algorithms and data preprocessing tools;
     • Open source code (GNU General Public License) written in Java
     • http://www.cs.waikato.ac.nz/ml/weka/downloading.html
  4. The interface: The Explorer
     • Uploading the input (ARFF format);
     • Preprocessing;
     • Building a classifier;
     • Tuning the parameters;
     • Examining the output (evaluation)
  5. Uploading the input (2nd_set_7webgenres.arff)
  6. Preprocessing
  7. Building a classifier
  8. Methods & Implementations
     • Decision Trees
       • J4.8 is Weka’s implementation of C4.5 revision 8.
     • Instance-Based Learning
       • IBk is a k-nearest-neighbor classifier that uses Euclidean distance by default; other options include Manhattan, Chebyshev, and Minkowski distances. The number of nearest neighbors (default k=1) can be specified explicitly in the parameter window.
     • Linear Models
       • In VotedPerceptron, each weight vector contributes a certain number of votes.
       • SMO implements the sequential minimal optimization algorithm for training a support vector classifier (SVM), using polynomial or Gaussian kernels (Platt 1998, Keerthi et al. 2001).
       • Logistic builds linear logistic regression models.
  9. Tuning Parameters
  10. Evaluation
  11. Compare Results
  12. Assignment 1
      • Classification: Decision Trees, Nearest Neighbors and a linear classifier of your choice;
      • Software package: Weka;
      • Data sets:
        • German plural
        • English past tense
      • Send WRITTEN REPORT to: santinim@stp.lingfil.uu.se
      • Report deadline: Fri 4 Oct 2013, week 40.
  13. Thank you and Good Luck!
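
The distance options listed for IBk on the Methods & Implementations slide can be sketched in a few lines of Python. This is an illustrative k-nearest-neighbor classifier, not Weka's IBk implementation: the function names and the toy training set are assumptions, and the sketch applies no attribute normalization or distance weighting.

```python
from collections import Counter

def minkowski(a, b, p=2.0):
    """Minkowski distance of order p between two numeric vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def euclidean(a, b):
    """Minkowski distance with p = 2."""
    return minkowski(a, b, 2.0)

def manhattan(a, b):
    """Minkowski distance with p = 1."""
    return minkowski(a, b, 1.0)

def chebyshev(a, b):
    """Largest absolute difference along any single dimension."""
    return max(abs(x - y) for x, y in zip(a, b))

def knn_predict(train, query, k=1, distance=euclidean):
    """Majority label among the k training examples nearest to the query."""
    neighbors = sorted(train, key=lambda ex: distance(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical training set: (feature vector, class label).
train = [((3, 3), "a"), ((0, 4), "b"), ((10, 10), "a")]
query = (0, 0)

# With k=1 the chosen metric decides which example counts as nearest:
# (0, 4) is nearer under Euclidean and Manhattan distance,
# (3, 3) is nearer under Chebyshev distance.
for metric in (euclidean, manhattan, chebyshev):
    print(metric.__name__, knn_predict(train, query, k=1, distance=metric))

# With k=3 all three examples vote, so the majority label "a" wins.
print(knn_predict(train, query, k=3))
```

The example shows why the choice of distance function matters: with the same data and k=1, the prediction changes when the metric changes.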