Incremental Learning using WEKA

How to incrementally train a model using the WEKA 3.7 developer version. The model used is Stochastic Gradient Descent.



  1. Incremental Learning using WEKA
     CS267: Data Mining Presentation
     Guided by: Dr. Tran
     Rohit Vobbilisetty
  2. Overview
     - WEKA: definition
     - Incremental learning: definition
     - Incremental learning in WEKA
     - Steps to train an UpdateableClassifier
     - Stochastic Gradient Descent
     - Sample code, result, and demo
  3. What is WEKA?
     - Weka (Waikato Environment for Knowledge Analysis) is a collection of machine learning algorithms for data mining tasks.
     - This presentation uses Weka 3.7 (developer version).
  4. Incremental Learning: Definition and Need
     - The model is trained one instance at a time, rather than on the whole dataset at once.
     - Suitable for large datasets that do not fit into the computer's memory.
  5. Incremental Learning in Weka
     - Applicable to models implementing the interface weka.classifiers.UpdateableClassifier (http://weka.sourceforge.net/doc.dev/weka/classifiers/UpdateableClassifier.html).
     - Models implementing this interface: HoeffdingTree, IBk, KStar, LWL, MultiClassClassifierUpdateable, NaiveBayesMultinomialText, NaiveBayesMultinomialUpdateable, NaiveBayesUpdateable, SGD, SGDText.
  6. Steps to Train an UpdateableClassifier
     - Initialize an ArffLoader object.
     - Retrieve the dataset structure from the loader and set its class index (the feature to be predicted) with setClassIndex().
     - Iteratively retrieve an instance from the training set and update the classifier with updateClassifier().
     - Evaluate the trained model against the test dataset.
  7. Stochastic Gradient Descent
     - Stochastic gradient descent is a gradient descent optimization method for minimizing an objective function that is written as a sum of differentiable functions.
     - Suited to large datasets, since each iteration processes only a single instance of the training dataset.
     - Notation: w is the parameter vector to be estimated; Qi(w) is the loss on a single training instance.
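As a rough sketch of the per-instance update the slide describes, the snippet below applies one stochastic gradient step of the textbook rule w := w - eta * grad(Qi(w)), assuming hinge loss (the loss the sample code later selects). This is plain Java for illustration only; the class and method names are hypothetical, and WEKA's SGD class implements a more elaborate version (bias term, learning-rate schedule, regularization).

```java
// Illustrative sketch only, not WEKA code.
public class HingeSgdStep {
    // For hinge loss, Qi(w) = max(0, 1 - y * (w . x)) with y in {-1, +1}.
    // Its subgradient is -y * x when y * (w . x) < 1, and 0 otherwise.
    static void update(double[] w, double[] x, double y, double eta) {
        double margin = 0.0;
        for (int j = 0; j < w.length; j++) margin += w[j] * x[j];
        if (y * margin < 1.0) {
            // w := w - eta * (-y * x) = w + eta * y * x
            for (int j = 0; j < w.length; j++) w[j] += eta * y * x[j];
        }
    }

    public static void main(String[] args) {
        double[] w = {0.0, 0.0};
        // One positive instance; the margin is 0 < 1, so the weights move toward y * x.
        update(w, new double[]{1.0, 2.0}, +1.0, 0.1);
        System.out.println(w[0] + " " + w[1]); // prints 0.1 0.2
    }
}
```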
  8. Sample Dataset Description
     - Name: vote.arff (17 features)
     - Class: 2 values (democrat, republican)
     - The remaining 16 features are all binary (y, n):
       handicapped-infants, water-project-cost-sharing, adoption-of-the-budget-resolution, physician-fee-freeze, el-salvador-aid, religious-groups-in-schools, anti-satellite-test-ban, aid-to-nicaraguan-contras, mx-missile, immigration, synfuels-corporation-cutback, education-spending, superfund-right-to-sue, crime, duty-free-exports, export-administration-act-south-africa
  9. Sample Code - SGD

     import java.io.File;
     import weka.classifiers.UpdateableClassifier;
     import weka.classifiers.functions.SGD;
     import weka.core.Instance;
     import weka.core.Instances;
     import weka.core.SelectedTag;
     import weka.core.converters.ArffLoader;

     ArffLoader loader = new ArffLoader();
     loader.setFile(new File("Training File Path"));
     Instances structure = loader.getStructure();

     SGD classifier = new SGD();
     // Configure the classifier
     classifier.setEpochs(500);
     classifier.setEpsilon(0.001);
     // Required if dealing with a binary class
     classifier.setLossFunction(new SelectedTag(SGD.HINGE, SGD.TAGS_SELECTION));

     structure.setClassIndex(16); // Set the feature to be predicted
     classifier.buildClassifier(structure);

     // Incrementally update the classifier, one instance at a time
     Instance current;
     while ((current = loader.getNextInstance(structure)) != null) {
         ((UpdateableClassifier) classifier).updateClassifier(current);
     }
  10. Sample Output

      Learned model (linear weights):

      Class =
          -0.26 * handicapped-infants +
          -0.09 * water-project-cost-sharing +
          -0.51 * adoption-of-the-budget-resolution +
           0.73 * physician-fee-freeze +
           0.33 * el-salvador-aid +
           0.04 * religious-groups-in-schools +
          -0.14 * anti-satellite-test-ban +
          -0.33 * aid-to-nicaraguan-contras +
          -0.28 * mx-missile +
           0.1  * immigration +
          -0.37 * synfuels-corporation-cutback +
           0.33 * education-spending +
           0.15 * superfund-right-to-sue +
           0.18 * crime +
          -0.25 * duty-free-exports +
           0.02 * export-administration-act-south-africa
          - 0.11

      Evaluation on the test set:

      Correctly Classified Instances      401      92.1839 %
      Incorrectly Classified Instances     34       7.8161 %
      Kappa statistic                       0.838
      Mean absolute error                   0.0782
      Root mean squared error               0.2796
      Relative absolute error              16.482  %
      Root relative squared error          57.4214 %
      Coverage of cases (0.95 level)       92.1839 %
      Mean rel. region size (0.95 level)   50      %
      Total Number of Instances           435

      Confusion matrix:
          242.0    25.0
            9.0   159.0
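As a sanity check on the summary statistics, the accuracy and the Kappa statistic can be recomputed from the confusion matrix alone. This plain-Java snippet (illustration only, not WEKA code) reproduces the 92.18 % accuracy and 0.838 Kappa reported above:

```java
// Recompute accuracy and Cohen's kappa from the confusion matrix
//   [ 242  25 ]
//   [   9 159 ]
public class ConfusionStats {
    public static void main(String[] args) {
        double a = 242, b = 25, c = 9, d = 159;
        double total = a + b + c + d;       // 435 instances
        double accuracy = (a + d) / total;  // observed agreement (401 / 435)
        // Expected agreement by chance: products of row and column marginals
        double pe = ((a + b) * (a + c) + (c + d) * (b + d)) / (total * total);
        double kappa = (accuracy - pe) / (1 - pe);
        System.out.printf("accuracy = %.4f%n", accuracy); // prints accuracy = 0.9218
        System.out.printf("kappa    = %.3f%n", kappa);    // prints kappa    = 0.838
    }
}
```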
  11. Challenges Faced
     - The SGD class does not support a numeric class attribute unless it is configured to use Huber loss or squared loss.
     - The learning rate should not be too small (slow convergence) or too large (overshooting the minimum).
     - Some errors had to be resolved by consulting the WEKA Java source code.
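The learning-rate point can be illustrated with a toy, self-contained example (plain Java, not WEKA): gradient descent on f(w) = w^2, whose gradient is 2w and whose minimum is at w = 0. A tiny rate barely moves, a moderate rate converges, and an overly large rate makes |w| grow instead of shrink.

```java
// Toy illustration of learning-rate choice; names here are hypothetical.
public class LearningRateDemo {
    // Run n gradient steps on f(w) = w^2 from w = 1.0 and return the final w.
    static double descend(double eta, int n) {
        double w = 1.0;
        for (int i = 0; i < n; i++) w -= eta * 2.0 * w; // w := w - eta * f'(w)
        return w;
    }

    public static void main(String[] args) {
        System.out.println(descend(0.001, 100)); // too small: still far from 0
        System.out.println(descend(0.1, 100));   // reasonable: very close to 0
        System.out.println(descend(1.5, 100));   // too large: |w| blows up
    }
}
```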
  12. References
     - Wikipedia: Stochastic gradient descent. http://en.wikipedia.org/wiki/Stochastic_gradient_descent
     - Weka Wiki: Use Weka in your Java code. http://weka.wikispaces.com/Use+Weka+in+your+Java+code
  13. Thank You
