• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
Microsoft PowerPoint - weka [Read-Only]
 

Microsoft PowerPoint - weka [Read-Only]

on

  • 2,299 views

 

Statistics

Views

Total Views
2,299
Views on SlideShare
2,299
Embed Views
0

Actions

Likes
0
Downloads
118
Comments
0

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Microsoft PowerPoint - weka [Read-Only] Microsoft PowerPoint - weka [Read-Only] Document Transcript

    • WEKA: A Machine Machine Learning with Learning Toolkit WEKA The Explorer • Classification and Regression • Clustering Eibe Frank • Association Rules • Attribute Selection Department of Computer Science, University of Waikato, New Zealand • Data Visualization The Experimenter The Knowledge Flow GUI Conclusions WEKA: the bird Copyright: Martin Kramer (mkramer@wxs.nl) 2/4/2004 University of Waikato 2 Machine Learning for Data Mining 1
    • WEKA: the software Machine learning/data mining software written in Java (distributed under the GNU Public License) Used for research, education, and applications Complements “Data Mining” by Witten & Frank Main features: Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods Graphical user interfaces (incl. data visualization) Environment for comparing learning algorithms 2/4/2004 University of Waikato 3 WEKA: versions There are several versions of WEKA: WEKA 3.0: “book version” compatible with description in data mining book WEKA 3.2: “GUI version” adds graphical user interfaces (book version is command-line only) WEKA 3.3: “development version” with lots of improvements This talk is based on the latest snapshot of WEKA 3.3 (soon to be WEKA 3.4) 2/4/2004 University of Waikato 4 Machine Learning for Data Mining 2
    • WEKA only deals with “flat” files @relation heart-disease-simplified @attribute age numeric @attribute sex { female, male} @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina} @attribute cholesterol numeric @attribute exercise_induced_angina { no, yes} @attribute class { present, not_present} @data 63,male,typ_angina,233,no,not_present 67,male,asympt,286,yes,present 67,male,asympt,229,yes,present 38,female,non_anginal,?,no,not_present ... 2/4/2004 University of Waikato 5 WEKA only deals with “flat” files @relation heart-disease-simplified @attribute age numeric @attribute sex { female, male} @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina} @attribute cholesterol numeric @attribute exercise_induced_angina { no, yes} @attribute class { present, not_present} @data 63,male,typ_angina,233,no,not_present 67,male,asympt,286,yes,present 67,male,asympt,229,yes,present 38,female,non_anginal,?,no,not_present ... 2/4/2004 University of Waikato 6 Machine Learning for Data Mining 3
    • 2/4/2004 University of Waikato 7 2/4/2004 University of Waikato 8 Machine Learning for Data Mining 4
    • 2/4/2004 University of Waikato 9 Explorer: pre-processing the data Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary Data can also be read from a URL or from an SQL database (using JDBC) Pre-processing tools in WEKA are called “filters” WEKA contains filters for: Discretization, normalization, resampling, attribute selection, transforming and combining attributes, … 2/4/2004 University of Waikato 10 Machine Learning for Data Mining 5
    • 2/4/2004 University of Waikato 11 2/4/2004 University of Waikato 12 Machine Learning for Data Mining 6
    • 2/4/2004 University of Waikato 13 2/4/2004 University of Waikato 14 Machine Learning for Data Mining 7
    • 2/4/2004 University of Waikato 15 2/4/2004 University of Waikato 16 Machine Learning for Data Mining 8
    • 2/4/2004 University of Waikato 17 2/4/2004 University of Waikato 18 Machine Learning for Data Mining 9
    • 2/4/2004 University of Waikato 19 2/4/2004 University of Waikato 20 Machine Learning for Data Mining 10
    • 2/4/2004 University of Waikato 21 2/4/2004 University of Waikato 22 Machine Learning for Data Mining 11
    • 2/4/2004 University of Waikato 23 2/4/2004 University of Waikato 24 Machine Learning for Data Mining 12
    • 2/4/2004 University of Waikato 25 2/4/2004 University of Waikato 26 Machine Learning for Data Mining 13
    • 2/4/2004 University of Waikato 27 2/4/2004 University of Waikato 28 Machine Learning for Data Mining 14
    • 2/4/2004 University of Waikato 29 2/4/2004 University of Waikato 30 Machine Learning for Data Mining 15
    • 2/4/2004 University of Waikato 31 Explorer: building “classifiers” Classifiers in WEKA are models for predicting nominal or numeric quantities Implemented learning schemes include: Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, … “Meta”-classifiers include: Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, … 2/4/2004 University of Waikato 32 Machine Learning for Data Mining 16
    • 2/4/2004 University of Waikato 33 2/4/2004 University of Waikato 34 Machine Learning for Data Mining 17
    • 2/4/2004 University of Waikato 35 2/4/2004 University of Waikato 36 Machine Learning for Data Mining 18
    • 2/4/2004 University of Waikato 37 2/4/2004 University of Waikato 38 Machine Learning for Data Mining 19
    • 2/4/2004 University of Waikato 53 Explorer: clustering data WEKA contains “clusterers” for finding groups of similar instances in a dataset Implemented schemes are: k-Means, EM, Cobweb, X-means, FarthestFirst Clusters can be visualized and compared to “true” clusters (if given) Evaluation based on loglikelihood if clustering scheme produces a probability distribution 2/4/2004 University of Waikato 92 Machine Learning for Data Mining 20
    • Explorer: finding associations WEKA contains an implementation of the Apriori algorithm for learning association rules Works only with discrete data Can identify statistical dependencies between groups of attributes: milk, butter ⇒ bread, eggs (with confidence 0.9 and support 2000) Apriori can compute all rules that have a given minimum support and exceed a given confidence 2/4/2004 University of Waikato 108 2/4/2004 University of Waikato 109 Machine Learning for Data Mining 21
    • 2/4/2004 University of Waikato 110 2/4/2004 University of Waikato 111 Machine Learning for Data Mining 22
    • 2/4/2004 University of Waikato 112 2/4/2004 University of Waikato 113 Machine Learning for Data Mining 23
    • 2/4/2004 University of Waikato 114 2/4/2004 University of Waikato 115 Machine Learning for Data Mining 24
    • Explorer: attribute selection Panel that can be used to investigate which (subsets of) attributes are the most predictive ones Attribute selection methods contain two parts: A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking An evaluation method: correlation-based, wrapper, information gain, chi-squared, … Very flexible: WEKA allows (almost) arbitrary combinations of these two 2/4/2004 University of Waikato 116 2/4/2004 University of Waikato 117 Machine Learning for Data Mining 25
    • 2/4/2004 University of Waikato 118 2/4/2004 University of Waikato 119 Machine Learning for Data Mining 26
    • 2/4/2004 University of Waikato 120 2/4/2004 University of Waikato 121 Machine Learning for Data Mining 27
    • 2/4/2004 University of Waikato 122 2/4/2004 University of Waikato 123 Machine Learning for Data Mining 28
    • 2/4/2004 University of Waikato 124 Explorer: data visualization Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem WEKA can visualize single attributes (1-d) and pairs of attributes (2-d) To do: rotating 3-d visualizations (Xgobi-style) Color-coded class values “Jitter” option to deal with nominal attributes (and to detect “hidden” data points) “Zoom-in” function 2/4/2004 University of Waikato 125 Machine Learning for Data Mining 29
    • 2/4/2004 University of Waikato 126 2/4/2004 University of Waikato 127 Machine Learning for Data Mining 30
    • 2/4/2004 University of Waikato 128 2/4/2004 University of Waikato 129 Machine Learning for Data Mining 31
    • 2/4/2004 University of Waikato 130 2/4/2004 University of Waikato 131 Machine Learning for Data Mining 32
    • 2/4/2004 University of Waikato 132 2/4/2004 University of Waikato 133 Machine Learning for Data Mining 33
    • 2/4/2004 University of Waikato 134 2/4/2004 University of Waikato 135 Machine Learning for Data Mining 34
    • 2/4/2004 University of Waikato 136 2/4/2004 University of Waikato 137 Machine Learning for Data Mining 35
    • Performing experiments Experimenter makes it easy to compare the performance of different learning schemes For classification and regression problems Results can be written into file or database Evaluation options: cross-validation, learning curve, hold-out Can also iterate over different parameter settings Significance-testing built in! 2/4/2004 University of Waikato 138 The Knowledge Flow GUI New graphical user interface for WEKA Java-Beans-based interface for setting up and running machine learning experiments Data sources, classifiers, etc. are beans and can be connected graphically Data “flows” through components: e.g., “data source” -> “filter” -> “classifier” -> “evaluator” Layouts can be saved and loaded again later 2/4/2004 University of Waikato 152 Machine Learning for Data Mining 36
    • 2/4/2004 University of Waikato 153 2/4/2004 University of Waikato 154 Machine Learning for Data Mining 37
    • 2/4/2004 University of Waikato 155 2/4/2004 University of Waikato 156 Machine Learning for Data Mining 38
    • 2/4/2004 University of Waikato 157 2/4/2004 University of Waikato 158 Machine Learning for Data Mining 39
    • 2/4/2004 University of Waikato 159 2/4/2004 University of Waikato 160 Machine Learning for Data Mining 40
    • 2/4/2004 University of Waikato 161 2/4/2004 University of Waikato 162 Machine Learning for Data Mining 41
    • 2/4/2004 University of Waikato 163 2/4/2004 University of Waikato 164 Machine Learning for Data Mining 42
    • 2/4/2004 University of Waikato 165 2/4/2004 University of Waikato 166 Machine Learning for Data Mining 43
    • 2/4/2004 University of Waikato 167 2/4/2004 University of Waikato 168 Machine Learning for Data Mining 44
    • 2/4/2004 University of Waikato 169 2/4/2004 University of Waikato 170 Machine Learning for Data Mining 45
    • 2/4/2004 University of Waikato 171 2/4/2004 University of Waikato 172 Machine Learning for Data Mining 46
    • Conclusion: try it yourself! WEKA is available at http://www.cs.waikato.ac.nz/ml/weka Also has a list of projects based on WEKA WEKA contributors: Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer , Brent Martin, Peter Flach, Eibe Frank ,Gabi Schmidberger ,Ian H. Witten , J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall ,Remco Bouckaert , Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang 2/4/2004 University of Waikato 173 Machine Learning for Data Mining 47