Incremental Learning using WEKA
CS267: Data Mining Presentation
Guided By: Dr. Tran
- Rohit Vobbilisetty
Overview
- WEKA: Definition
- Incremental Learning: Definition
- Incremental Learning in WEKA
- Steps to Train an UpdateableClassifier
- Stochastic Gradient Descent
- Sample Code, Results and Demo
What is WEKA?
- Weka (Waikato Environment for Knowledge Analysis) is a collection of machine learning algorithms for data mining tasks.
- Version used: Weka 3.7 (developer version).
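Incremental Learning: Definition and Need
- The model is trained one instance at a time: each instance in the dataset updates the existing model, instead of the model being built from the whole dataset at once.
- This suits large datasets that do not fit into the computer's memory, since only the current instance has to be loaded.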
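To make "train on one instance at a time" concrete, here is a minimal sketch (plain Java, not WEKA code) of the simplest incremental estimator: a running mean. Each value updates the estimate through mean_n = mean_(n-1) + (x - mean_(n-1)) / n and can then be discarded, so memory stays constant however many values stream past. The class name `RunningMean` is hypothetical.

```java
/** Running (incremental) mean: each value updates the estimate and is then discarded. */
final class RunningMean {
    private long n = 0;        // number of values seen so far
    private double mean = 0.0; // current estimate of the mean

    /** Folds one new value into the estimate in O(1) time and memory. */
    void update(double x) {
        n++;
        mean += (x - mean) / n; // mean_n = mean_(n-1) + (x - mean_(n-1)) / n
    }

    double mean() { return mean; }

    long count() { return n; }

    public static void main(String[] args) {
        RunningMean m = new RunningMean();
        for (double x : new double[] {1, 2, 3, 4}) {
            m.update(x); // the value is not stored anywhere after this call
        }
        System.out.println(m.count() + " values, mean = " + m.mean());
    }
}
```

The result equals the batch mean (2.5 for the values 1 to 4); only the access pattern differs, which is exactly the property incremental classifiers rely on.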
Incremental Learning in WEKA
- Applies to models that implement the interface weka.classifiers.UpdateableClassifier (http://weka.sourceforge.net/doc.dev/weka/classifiers/UpdateableClassifier.html).
- Models implementing this interface: HoeffdingTree, IBk, KStar, LWL, MultiClassClassifierUpdateable, NaiveBayesMultinomialText, NaiveBayesMultinomialUpdateable, NaiveBayesUpdateable, SGD, SGDText.
Steps to Train an UpdateableClassifier
1. Create an ArffLoader object and point it at the training file.
2. Retrieve the loader's structure (the dataset header, via getStructure()) and set its class index, i.e. the attribute to be predicted (setClassIndex()).
3. Initialize the classifier on this structure (buildClassifier()).
4. Iteratively retrieve one instance at a time from the training set (getNextInstance()) and update the classifier with it (updateClassifier()).
5. Evaluate the trained model against the test dataset.
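The steps above can be mimicked without WEKA to show why memory use stays flat. The sketch below is hypothetical plain Java, not WEKA's API: `StreamingTrainer` reads a comma-separated header (standing in for getStructure()), takes the last column as the class (standing in for setClassIndex()), and then consumes rows one at a time (standing in for getNextInstance() and updateClassifier()). Its "model" is only a per-class counter, i.e. an incremental majority-class predictor.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical streaming trainer: header first, then one row at a time. */
final class StreamingTrainer {
    private final Map<String, Integer> classCounts = new HashMap<>();

    /** Consumes a CSV stream without ever holding all rows in memory. */
    void train(Reader source) {
        try (BufferedReader in = new BufferedReader(source)) {
            String header = in.readLine();                       // like getStructure()
            if (header == null) throw new IllegalArgumentException("empty input");
            int classIndex = header.split(",", -1).length - 1;   // like setClassIndex(last attribute)
            String line;
            while ((line = in.readLine()) != null) {             // like getNextInstance()
                if (line.isBlank()) continue;
                String[] cols = line.split(",", -1);
                if (cols.length <= classIndex) throw new IllegalArgumentException("short row: " + line);
                classCounts.merge(cols[classIndex], 1, Integer::sum); // like updateClassifier()
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Predicts the most frequent class seen so far. */
    String predict() {
        return classCounts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElseThrow();
    }

    int count(String label) { return classCounts.getOrDefault(label, 0); }

    public static void main(String[] args) {
        String csv = "a1,a2,Class\ny,n,democrat\nn,y,republican\ny,y,democrat\n";
        StreamingTrainer t = new StreamingTrainer();
        t.train(new StringReader(csv));
        System.out.println("Majority class: " + t.predict());
    }
}
```

Swapping the counter for any model with a per-instance update method leaves the loop unchanged, which is the point of the UpdateableClassifier contract.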
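Stochastic Gradient Descent
- Stochastic gradient descent (SGD) is a gradient descent optimization method for minimizing an objective function that is written as a sum of differentiable functions.
- It is well suited to large datasets, since each iteration processes only a single instance of the training dataset.
- Objective and per-instance update:
  $Q(w) = \frac{1}{n}\sum_{i=1}^{n} Q_i(w)$
  $w := w - \eta \nabla Q_i(w)$
  - $w$: parameter (weight vector) to be estimated
  - $Q_i(w)$: the term of the objective contributed by the $i$-th training instance
  - $\eta$: learning rate (step size)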
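A minimal sketch of the update rule applied to a linear classifier, assuming the per-instance objective $Q_i(w) = \frac{\lambda}{2}\lVert w\rVert^2 + \max(0,\, 1 - y_i(w \cdot x_i + b))$ (regularized hinge loss, labels $y_i \in \{-1, +1\}$). This is an illustrative plain-Java implementation, not WEKA's SGD class; the class name `HingeSgd`, the learning rate 0.1, $\lambda = 10^{-4}$ and the toy data are all hypothetical choices.

```java
import java.util.Arrays;

/** Linear classifier trained by SGD on the L2-regularized hinge loss (illustrative sketch). */
final class HingeSgd {
    private final double[] w;    // weight vector
    private double b = 0.0;      // bias term (not regularized)
    private final double eta;    // learning rate
    private final double lambda; // L2 regularization strength

    HingeSgd(int numFeatures, double eta, double lambda) {
        this.w = new double[numFeatures];
        this.eta = eta;
        this.lambda = lambda;
    }

    double score(double[] x) {
        double s = b;
        for (int j = 0; j < w.length; j++) s += w[j] * x[j];
        return s;
    }

    /** One step w := w - eta * (sub)gradient of Q_i(w); y must be +1 or -1. */
    void update(double[] x, int y) {
        boolean violatesMargin = y * score(x) < 1.0; // decided before w changes
        for (int j = 0; j < w.length; j++) {
            double grad = lambda * w[j] - (violatesMargin ? y * x[j] : 0.0);
            w[j] -= eta * grad;
        }
        if (violatesMargin) b += eta * y; // gradient of the hinge term w.r.t. b is -y
    }

    int predict(double[] x) { return score(x) >= 0 ? 1 : -1; }

    public static void main(String[] args) {
        double[][] xs = {{1, 0}, {1, 1}, {0, 1}, {0, 0}};
        int[] ys = {1, 1, -1, -1}; // label follows the first feature
        HingeSgd model = new HingeSgd(2, 0.1, 1e-4);
        for (int epoch = 0; epoch < 50; epoch++) {
            for (int i = 0; i < xs.length; i++) model.update(xs[i], ys[i]); // one instance per step
        }
        for (double[] x : xs) System.out.println(Arrays.toString(x) + " -> " + model.predict(x));
    }
}
```

Each call to `update` touches a single instance, so the same method works whether the instances come from an array, as here, or from a streaming loader.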
Sample Dataset Description
- Name: vote.arff (17 attributes)
- Class: 2 values (democrat, republican); this is the attribute to be predicted
- 16 vote attributes, each with 2 values (y, n):
  - handicapped-infants
  - water-project-cost-sharing
  - adoption-of-the-budget-resolution
  - physician-fee-freeze
  - el-salvador-aid
  - religious-groups-in-schools
  - anti-satellite-test-ban
  - aid-to-nicaraguan-contras
  - mx-missile
  - immigration
  - synfuels-corporation-cutback
  - education-spending
  - superfund-right-to-sue
  - crime
  - duty-free-exports
  - export-administration-act-south-africa
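Sample Code - SGD
import java.io.File;
import weka.classifiers.functions.SGD;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.SelectedTag;
import weka.core.converters.ArffLoader;

public class IncrementalSGD {
    public static void main(String[] args) throws Exception {
        // Read only the header (structure); instances are streamed one at a time below
        ArffLoader loader = new ArffLoader();
        loader.setFile(new File("Training File Path"));
        Instances structure = loader.getStructure();
        structure.setClassIndex(16); // Attribute to be predicted: Class, the last attribute in vote.arff
        // Configure the classifier
        SGD classifier = new SGD();
        classifier.setEpochs(500);
        classifier.setEpsilon(0.001);
        // Hinge loss (linear SVM); applies to a binary (two-valued) class
        classifier.setLossFunction(new SelectedTag(SGD.HINGE, SGD.TAGS_SELECTION));
        classifier.buildClassifier(structure); // Initialize on the empty structure
        // Incrementally update the classifier, one instance at a time
        Instance current;
        while ((current = loader.getNextInstance(structure)) != null) {
            classifier.updateClassifier(current); // SGD implements UpdateableClassifier
        }
        System.out.println(classifier); // Print the learned linear model
    }
}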
Sample Output
Class =
      -0.26 handicapped-infants
  + -0.09 water-project-cost-sharing
  + -0.51 adoption-of-the-budget-resolution
  + 0.73 physician-fee-freeze
  + 0.33 el-salvador-aid
  + 0.04 religious-groups-in-schools
  + -0.14 anti-satellite-test-ban
  + -0.33 aid-to-nicaraguan-contras
  + -0.28 mx-missile
  + 0.1 immigration
  + -0.37 synfuels-corporation-cutback
  + 0.33 education-spending
  + 0.15 superfund-right-to-sue
  + 0.18 crime
  + -0.25 duty-free-exports
  + 0.02 export-administration-act-south-africa
  - 0.11
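The printed model is a linear function: each attribute's weight is added to a constant term (-0.11). A sketch of how such a function scores one voting record, with the weights copied from the output above. Assumptions, all hypothetical: each vote is encoded as y = 1 and n = 0, a missing vote contributes 0, and the class name `VoteScorer` is invented. Which sign maps to which class is not stated in the output and is left open here; WEKA's SGD may also filter attributes internally, so this need not reproduce WEKA's predictions exactly.

```java
import java.util.Map;

/** Illustrative scorer for the linear model printed above (assumed y = 1, n = 0 encoding). */
final class VoteScorer {
    static final Map<String, Double> WEIGHTS = Map.ofEntries(
            Map.entry("handicapped-infants", -0.26),
            Map.entry("water-project-cost-sharing", -0.09),
            Map.entry("adoption-of-the-budget-resolution", -0.51),
            Map.entry("physician-fee-freeze", 0.73),
            Map.entry("el-salvador-aid", 0.33),
            Map.entry("religious-groups-in-schools", 0.04),
            Map.entry("anti-satellite-test-ban", -0.14),
            Map.entry("aid-to-nicaraguan-contras", -0.33),
            Map.entry("mx-missile", -0.28),
            Map.entry("immigration", 0.1),
            Map.entry("synfuels-corporation-cutback", -0.37),
            Map.entry("education-spending", 0.33),
            Map.entry("superfund-right-to-sue", 0.15),
            Map.entry("crime", 0.18),
            Map.entry("duty-free-exports", -0.25),
            Map.entry("export-administration-act-south-africa", 0.02));
    static final double BIAS = -0.11; // the trailing "- 0.11" term

    /** Sums the weights of all "y" votes plus the bias; "n" or missing votes add 0 (assumption). */
    static double score(Map<String, String> votes) {
        double s = BIAS;
        for (Map.Entry<String, String> e : votes.entrySet()) {
            Double w = WEIGHTS.get(e.getKey());
            if (w == null) throw new IllegalArgumentException("unknown attribute: " + e.getKey());
            if (e.getValue().equals("y")) s += w; // y -> 1; anything else -> 0
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(score(Map.of("physician-fee-freeze", "y", "el-salvador-aid", "y")));
    }
}
```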
Correctly Classified Instances 401 92.1839 %
Incorrectly Classified Instances 34 7.8161 %
Kappa statistic 0.838
Mean absolute error 0.0782
Root mean squared error 0.2796
Relative absolute error 16.482 %
Root relative squared error 57.4214 %
Coverage of cases (0.95 level) 92.1839 %
Mean rel. region size (0.95 level) 50 %
Total Number of Instances 435
Confusion Matrix (rows: actual class, columns: predicted class; order: democrat, republican):
242.0 25.0
9.0 159.0
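The accuracy and kappa figures above can be cross-checked from the confusion matrix alone. The sketch below is plain Java (not WEKA's Evaluation class; the class name `ConfusionStats` is hypothetical) and computes accuracy and Cohen's kappa, where kappa compares the observed agreement with the agreement expected from the row and column totals. For the matrix above it gives 401/435 ≈ 92.18% and κ ≈ 0.838, matching the reported values.

```java
/** Accuracy and Cohen's kappa from a square confusion matrix (rows = actual, columns = predicted). */
final class ConfusionStats {
    static double accuracy(long[][] m) {
        long correct = 0, total = 0;
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
                total += m[i][j];
                if (i == j) correct += m[i][j]; // diagonal = correctly classified
            }
        }
        return (double) correct / total;
    }

    static double kappa(long[][] m) {
        int k = m.length;
        long total = 0;
        long[] rowSum = new long[k];
        long[] colSum = new long[k];
        for (int i = 0; i < k; i++) {
            for (int j = 0; j < k; j++) {
                rowSum[i] += m[i][j];
                colSum[j] += m[i][j];
                total += m[i][j];
            }
        }
        double observed = accuracy(m);
        double expected = 0.0; // chance agreement from the marginal totals
        for (int i = 0; i < k; i++) expected += (double) rowSum[i] * colSum[i];
        expected /= (double) total * total;
        return (observed - expected) / (1.0 - expected);
    }

    public static void main(String[] args) {
        long[][] m = {{242, 25}, {9, 159}};
        System.out.printf("accuracy = %.4f%%, kappa = %.3f%n", 100 * accuracy(m), kappa(m));
    }
}
```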
Challenges Faced
- The SGD class does not support a numeric class attribute unless it is configured to use the Huber loss or the squared loss.
- The learning rate must be neither too small (slow convergence) nor too large (the updates overshoot the minimum).
- Some errors could only be resolved by reading the WEKA Java source code.
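The learning-rate trade-off can be seen exactly on a toy problem. The sketch below runs plain (full-gradient) descent on the hypothetical one-dimensional loss f(w) = w², whose gradient is 2w, so each step computes w := (1 - 2η)w. It is not SGD on real data, but the role of η is the same; the class name and the three η values are illustrative choices.

```java
/** Gradient descent on f(w) = w^2 to show how the learning rate eta affects convergence. */
final class LearningRateDemo {
    /** Runs the given number of steps w := w - eta * f'(w), with f'(w) = 2w. */
    static double run(double eta, double w0, int steps) {
        double w = w0;
        for (int t = 0; t < steps; t++) {
            w -= eta * 2 * w; // equivalent to w = (1 - 2 * eta) * w
        }
        return w;
    }

    public static void main(String[] args) {
        for (double eta : new double[] {0.01, 0.4, 1.1}) {
            System.out.printf("eta = %.2f -> w after 20 steps = %.4g%n", eta, run(eta, 1.0, 20));
        }
    }
}
```

Starting from w = 1, η = 0.01 leaves w at about 0.67 after 20 steps (slow), η = 0.4 drives it to essentially 0, and η = 1.1 makes every step jump past the minimum so that |w| grows to about 38.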
References
- Wikipedia, "Stochastic gradient descent": http://en.wikipedia.org/wiki/Stochastic_gradient_descent
- Weka Wiki, "Use Weka in your Java code": http://weka.wikispaces.com/Use+Weka+in+your+Java+code
Thank You
