Microsoft PowerPoint - weka [Read-Only]

2,381 views

Published on

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
2,381
On SlideShare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
130
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Microsoft PowerPoint - weka [Read-Only]

  1. 1. WEKA: A Machine Machine Learning with Learning Toolkit WEKA The Explorer • Classification and Regression • Clustering Eibe Frank • Association Rules • Attribute Selection Department of Computer Science, University of Waikato, New Zealand • Data Visualization The Experimenter The Knowledge Flow GUI Conclusions WEKA: the bird Copyright: Martin Kramer (mkramer@wxs.nl) 2/4/2004 University of Waikato 2 Machine Learning for Data Mining 1
  2. 2. WEKA: the software Machine learning/data mining software written in Java (distributed under the GNU Public License) Used for research, education, and applications Complements “Data Mining” by Witten & Frank Main features: Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods Graphical user interfaces (incl. data visualization) Environment for comparing learning algorithms 2/4/2004 University of Waikato 3 WEKA: versions There are several versions of WEKA: WEKA 3.0: “book version” compatible with description in data mining book WEKA 3.2: “GUI version” adds graphical user interfaces (book version is command-line only) WEKA 3.3: “development version” with lots of improvements This talk is based on the latest snapshot of WEKA 3.3 (soon to be WEKA 3.4) 2/4/2004 University of Waikato 4 Machine Learning for Data Mining 2
  3. 3. WEKA only deals with “flat” files @relation heart-disease-simplified @attribute age numeric @attribute sex { female, male} @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina} @attribute cholesterol numeric @attribute exercise_induced_angina { no, yes} @attribute class { present, not_present} @data 63,male,typ_angina,233,no,not_present 67,male,asympt,286,yes,present 67,male,asympt,229,yes,present 38,female,non_anginal,?,no,not_present ... 2/4/2004 University of Waikato 5 WEKA only deals with “flat” files @relation heart-disease-simplified @attribute age numeric @attribute sex { female, male} @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina} @attribute cholesterol numeric @attribute exercise_induced_angina { no, yes} @attribute class { present, not_present} @data 63,male,typ_angina,233,no,not_present 67,male,asympt,286,yes,present 67,male,asympt,229,yes,present 38,female,non_anginal,?,no,not_present ... 2/4/2004 University of Waikato 6 Machine Learning for Data Mining 3
  4. 4. 2/4/2004 University of Waikato 7 2/4/2004 University of Waikato 8 Machine Learning for Data Mining 4
  5. 5. 2/4/2004 University of Waikato 9 Explorer: pre-processing the data Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary Data can also be read from a URL or from an SQL database (using JDBC) Pre-processing tools in WEKA are called “filters” WEKA contains filters for: Discretization, normalization, resampling, attribute selection, transforming and combining attributes, … 2/4/2004 University of Waikato 10 Machine Learning for Data Mining 5
  6. 6. 2/4/2004 University of Waikato 11 2/4/2004 University of Waikato 12 Machine Learning for Data Mining 6
  7. 7. 2/4/2004 University of Waikato 13 2/4/2004 University of Waikato 14 Machine Learning for Data Mining 7
  8. 8. 2/4/2004 University of Waikato 15 2/4/2004 University of Waikato 16 Machine Learning for Data Mining 8
  9. 9. 2/4/2004 University of Waikato 17 2/4/2004 University of Waikato 18 Machine Learning for Data Mining 9
  10. 10. 2/4/2004 University of Waikato 19 2/4/2004 University of Waikato 20 Machine Learning for Data Mining 10
  11. 11. 2/4/2004 University of Waikato 21 2/4/2004 University of Waikato 22 Machine Learning for Data Mining 11
  12. 12. 2/4/2004 University of Waikato 23 2/4/2004 University of Waikato 24 Machine Learning for Data Mining 12
  13. 13. 2/4/2004 University of Waikato 25 2/4/2004 University of Waikato 26 Machine Learning for Data Mining 13
  14. 14. 2/4/2004 University of Waikato 27 2/4/2004 University of Waikato 28 Machine Learning for Data Mining 14
  15. 15. 2/4/2004 University of Waikato 29 2/4/2004 University of Waikato 30 Machine Learning for Data Mining 15
  16. 16. 2/4/2004 University of Waikato 31 Explorer: building “classifiers” Classifiers in WEKA are models for predicting nominal or numeric quantities Implemented learning schemes include: Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, … “Meta”-classifiers include: Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, … 2/4/2004 University of Waikato 32 Machine Learning for Data Mining 16
  17. 17. 2/4/2004 University of Waikato 33 2/4/2004 University of Waikato 34 Machine Learning for Data Mining 17
  18. 18. 2/4/2004 University of Waikato 35 2/4/2004 University of Waikato 36 Machine Learning for Data Mining 18
  19. 19. 2/4/2004 University of Waikato 37 2/4/2004 University of Waikato 38 Machine Learning for Data Mining 19
  20. 20. 2/4/2004 University of Waikato 53 Explorer: clustering data WEKA contains “clusterers” for finding groups of similar instances in a dataset Implemented schemes are: k-Means, EM, Cobweb, X-means, FarthestFirst Clusters can be visualized and compared to “true” clusters (if given) Evaluation based on loglikelihood if clustering scheme produces a probability distribution 2/4/2004 University of Waikato 92 Machine Learning for Data Mining 20
  21. 21. Explorer: finding associations WEKA contains an implementation of the Apriori algorithm for learning association rules Works only with discrete data Can identify statistical dependencies between groups of attributes: milk, butter ⇒ bread, eggs (with confidence 0.9 and support 2000) Apriori can compute all rules that have a given minimum support and exceed a given confidence 2/4/2004 University of Waikato 108 2/4/2004 University of Waikato 109 Machine Learning for Data Mining 21
  22. 22. 2/4/2004 University of Waikato 110 2/4/2004 University of Waikato 111 Machine Learning for Data Mining 22
  23. 23. 2/4/2004 University of Waikato 112 2/4/2004 University of Waikato 113 Machine Learning for Data Mining 23
  24. 24. 2/4/2004 University of Waikato 114 2/4/2004 University of Waikato 115 Machine Learning for Data Mining 24
  25. 25. Explorer: attribute selection Panel that can be used to investigate which (subsets of) attributes are the most predictive ones Attribute selection methods contain two parts: A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking An evaluation method: correlation-based, wrapper, information gain, chi-squared, … Very flexible: WEKA allows (almost) arbitrary combinations of these two 2/4/2004 University of Waikato 116 2/4/2004 University of Waikato 117 Machine Learning for Data Mining 25
  26. 26. 2/4/2004 University of Waikato 118 2/4/2004 University of Waikato 119 Machine Learning for Data Mining 26
  27. 27. 2/4/2004 University of Waikato 120 2/4/2004 University of Waikato 121 Machine Learning for Data Mining 27
  28. 28. 2/4/2004 University of Waikato 122 2/4/2004 University of Waikato 123 Machine Learning for Data Mining 28
  29. 29. 2/4/2004 University of Waikato 124 Explorer: data visualization Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem WEKA can visualize single attributes (1-d) and pairs of attributes (2-d) To do: rotating 3-d visualizations (Xgobi-style) Color-coded class values “Jitter” option to deal with nominal attributes (and to detect “hidden” data points) “Zoom-in” function 2/4/2004 University of Waikato 125 Machine Learning for Data Mining 29
  30. 30. 2/4/2004 University of Waikato 126 2/4/2004 University of Waikato 127 Machine Learning for Data Mining 30
  31. 31. 2/4/2004 University of Waikato 128 2/4/2004 University of Waikato 129 Machine Learning for Data Mining 31
  32. 32. 2/4/2004 University of Waikato 130 2/4/2004 University of Waikato 131 Machine Learning for Data Mining 32
  33. 33. 2/4/2004 University of Waikato 132 2/4/2004 University of Waikato 133 Machine Learning for Data Mining 33
  34. 34. 2/4/2004 University of Waikato 134 2/4/2004 University of Waikato 135 Machine Learning for Data Mining 34
  35. 35. 2/4/2004 University of Waikato 136 2/4/2004 University of Waikato 137 Machine Learning for Data Mining 35
  36. 36. Performing experiments Experimenter makes it easy to compare the performance of different learning schemes For classification and regression problems Results can be written into file or database Evaluation options: cross-validation, learning curve, hold-out Can also iterate over different parameter settings Significance-testing built in! 2/4/2004 University of Waikato 138 The Knowledge Flow GUI New graphical user interface for WEKA Java-Beans-based interface for setting up and running machine learning experiments Data sources, classifiers, etc. are beans and can be connected graphically Data “flows” through components: e.g., “data source” -> “filter” -> “classifier” -> “evaluator” Layouts can be saved and loaded again later 2/4/2004 University of Waikato 152 Machine Learning for Data Mining 36
  37. 37. 2/4/2004 University of Waikato 153 2/4/2004 University of Waikato 154 Machine Learning for Data Mining 37
  38. 38. 2/4/2004 University of Waikato 155 2/4/2004 University of Waikato 156 Machine Learning for Data Mining 38
  39. 39. 2/4/2004 University of Waikato 157 2/4/2004 University of Waikato 158 Machine Learning for Data Mining 39
  40. 40. 2/4/2004 University of Waikato 159 2/4/2004 University of Waikato 160 Machine Learning for Data Mining 40
  41. 41. 2/4/2004 University of Waikato 161 2/4/2004 University of Waikato 162 Machine Learning for Data Mining 41
  42. 42. 2/4/2004 University of Waikato 163 2/4/2004 University of Waikato 164 Machine Learning for Data Mining 42
  43. 43. 2/4/2004 University of Waikato 165 2/4/2004 University of Waikato 166 Machine Learning for Data Mining 43
  44. 44. 2/4/2004 University of Waikato 167 2/4/2004 University of Waikato 168 Machine Learning for Data Mining 44
  45. 45. 2/4/2004 University of Waikato 169 2/4/2004 University of Waikato 170 Machine Learning for Data Mining 45
  46. 46. 2/4/2004 University of Waikato 171 2/4/2004 University of Waikato 172 Machine Learning for Data Mining 46
  47. 47. Conclusion: try it yourself! WEKA is available at http://www.cs.waikato.ac.nz/ml/weka Also has a list of projects based on WEKA WEKA contributors: Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer , Brent Martin, Peter Flach, Eibe Frank ,Gabi Schmidberger ,Ian H. Witten , J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall ,Remco Bouckaert , Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang 2/4/2004 University of Waikato 173 Machine Learning for Data Mining 47

×