Incremental Learning using WEKA
CS267: Data Mining Presentation
Guided By: Dr. Tran
- Rohit Vobbilisetty
 WEKA - Definition
 Incremental Learning – Definition
 Incremental Learning in WEKA
 Steps to train an UpdateableClassifier
 Stochastic Gradient Descent
 Sample Code, Result and Demo
Overview
 Weka (Waikato Environment for Knowledge Analysis)
is a collection of machine learning algorithms for data
mining tasks.
 Weka 3.7 (Developer version)
What is WEKA?
 Train the model on each instance within the dataset, one at a time
 Suitable for large datasets that do not fit into the
computer's memory.
Incremental Learning
Definition and Need
 Applicable to Models implementing the interface:
weka.classifiers.UpdateableClassifier
(http://weka.sourceforge.net/doc.dev/weka/classifiers/UpdateableClas
sifier.html)
 Models implementing this interface:
HoeffdingTree, IBk, KStar, LWL,
MultiClassClassifierUpdateable, NaiveBayesMultinomialText,
NaiveBayesMultinomialUpdateable, NaiveBayesUpdateable,
SGD, SGDText
Incremental Learning - Weka
 Initialize an object of ArffLoader.
 Retrieve this object's structure and set its class index
(the feature that needs to be predicted –
setClassIndex() ).
 Iteratively retrieve an instance from the training set
and update the classifier ( updateClassifier() ).
 Evaluate the trained model against the test dataset.
Steps to train an
UpdateableClassifier
 Stochastic gradient descent is a gradient descent
optimization method for minimizing an objective
function that is written as a sum of differentiable
functions.
 Applicable to large datasets, since each iteration
involves processing only a single instance of the
training dataset.
Stochastic Gradient Descent
w: Parameter (weight vector) to be estimated.
Qi(w): Loss contributed by the i-th training instance.
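The legend above refers to an equation that did not survive the export. In the standard SGD formulation (consistent with the slide's symbols w and Qi(w)), the objective and the per-instance update rule are:

```latex
Q(w) = \frac{1}{n} \sum_{i=1}^{n} Q_i(w),
\qquad
w \leftarrow w - \eta \, \nabla Q_i(w)
```

Each step picks a single instance i and follows only that instance's gradient, scaled by the learning rate \eta — which is why one pass over the data touches one instance at a time.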
 Name: vote.arff (17 features)
 Features:
 Class Name: 2 (democrat, republican)
 handicapped-infants: 2 (y,n)
 water-project-cost-sharing: 2 (y,n)
 adoption-of-the-budget-resolution: 2 (y,n)
 physician-fee-freeze: 2 (y,n)
 el-salvador-aid: 2 (y,n)
 religious-groups-in-schools: 2 (y,n)
 anti-satellite-test-ban: 2 (y,n)
 aid-to-nicaraguan-contras: 2 (y,n)
 mx-missile: 2 (y,n)
 immigration: 2 (y,n)
 synfuels-corporation-cutback: 2 (y,n)
 education-spending: 2 (y,n)
 superfund-right-to-sue: 2 (y,n)
 crime: 2 (y,n)
 duty-free-exports: 2 (y,n)
 export-administration-act-south-africa: 2 (y,n)
Sample Dataset Description
import java.io.File;
import weka.classifiers.UpdateableClassifier;
import weka.classifiers.functions.SGD;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.SelectedTag;
import weka.core.converters.ArffLoader;

ArffLoader loader = new ArffLoader();
loader.setFile(new File("Training File Path"));
Instances structure = loader.getStructure();
SGD classifier = new SGD(); // Configure the classifier
classifier.setEpochs(500);
classifier.setEpsilon(0.001);
// Hinge loss is required when dealing with a binary (nominal) class
classifier.setLossFunction(new SelectedTag(SGD.HINGE, SGD.TAGS_SELECTION));
structure.setClassIndex(16); // Set the feature to be predicted
classifier.buildClassifier(structure);
Instance current;
// Incrementally update the classifier, one instance at a time
while ((current = loader.getNextInstance(structure)) != null) {
    ((UpdateableClassifier) classifier).updateClassifier(current);
}
Sample Code - SGD
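As a plain-Java illustration of the incremental loop above (a hypothetical sketch, not WEKA's actual implementation), the following one-weight SGD updates on one (x, y) instance at a time, assuming a least-squares objective:

```java
public class IncrementalSgdSketch {
    // One stochastic update on a single (x, y) instance:
    // Q_i(w) = (w*x - y)^2 / 2, so dQ_i/dw = (w*x - y) * x
    static double update(double w, double x, double y, double eta) {
        return w - eta * (w * x - y) * x;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4};
        double[] y = {2, 4, 6, 8};      // underlying relation: y = 2x
        double w = 0.0;
        double eta = 0.05;              // learning rate
        // Analogue of the while-loop over getNextInstance():
        // each update sees exactly one instance
        for (int epoch = 0; epoch < 100; epoch++) {
            for (int i = 0; i < x.length; i++) {
                w = update(w, x[i], y[i], eta);
            }
        }
        System.out.println(w);          // converges to ~2.0
    }
}
```

Only one instance is ever in play per update, which is exactly the property that makes the approach viable when the full dataset cannot be held in memory.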
Class =
-0.26 handicapped-infants
+ -0.09 water-project-cost-sharing
+ -0.51 adoption-of-the-budget-resolution
+ 0.73 physician-fee-freeze
+ 0.33 el-salvador-aid
+ 0.04 religious-groups-in-schools
+ -0.14 anti-satellite-test-ban
+ -0.33 aid-to-nicaraguan-contras
+ -0.28 mx-missile
+ 0.1 immigration
+ -0.37 synfuels-corporation-cutback
+ 0.33 education-spending
+ 0.15 superfund-right-to-sue
+ 0.18 crime
+ -0.25 duty-free-exports
+ 0.02 export-administration-act-south-africa
- 0.11
Sample Output
Correctly Classified Instances 401 92.1839 %
Incorrectly Classified Instances 34 7.8161 %
Kappa statistic 0.838
Mean absolute error 0.0782
Root mean squared error 0.2796
Relative absolute error 16.482 %
Root relative squared error 57.4214 %
Coverage of cases (0.95 level) 92.1839 %
Mean rel. region size (0.95 level) 50 %
Total Number of Instances 435
Confusion Matrix:
242.0 25.0
9.0 159.0
 The SGD class does not support a numeric class attribute
unless it is configured to use the Huber loss or squared
loss.
 The learning rate should not be too small (slow
convergence) or too large (overshooting the minimum).
 Some errors had to be resolved by consulting the
WEKA Java code.
Challenges Faced
 Wikipedia:
http://en.wikipedia.org/wiki/Stochastic_gradient_descent
 Weka Wiki:
http://weka.wikispaces.com/Use+Weka+in+your+Java+code
References
Thank You