MOA: Massive Online Analysis




             University of Waikato
            Hamilton, New Zealand
What is MOA?

  Massive Online Analysis is a framework for online learning
  from data streams.




      It is closely related to WEKA
      It includes a collection of offline and online learning
      algorithms, as well as tools for evaluation:
           boosting and bagging
           Hoeffding Trees
      with and without Naïve Bayes classifiers at the leaves.



                                                                       2 / 21
WEKA

   Waikato Environment for Knowledge Analysis
   Collection of state-of-the-art machine learning algorithms
   and data processing tools implemented in Java
       Released under the GPL
   Support for the whole process of experimental data mining
       Preparation of input data
       Statistical evaluation of learning schemes
       Visualization of input data and the result of learning




   Used for education, research and applications
   Complements “Data Mining” by Witten & Frank



                                                                3 / 21
WEKA: the bird




                 4 / 21
MOA: the bird

   The Moa (another native NZ bird) is not only flightless, like the
                     Weka, but also extinct.




                                                                      5 / 21
Data stream classification cycle

  1   Process an example at a time,
      and inspect it only once (at
      most)
  2   Use a limited amount of
      memory
  3   Work in a limited amount of
      time
  4   Be ready to predict at any
      point
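
  The four requirements above can be sketched as a single test-then-train
  loop; the `MajorityLearner` and the synthetic binary stream below are
  illustrative stand-ins, not MOA's actual API.

  ```java
  import java.util.Random;

  // Minimal sketch of the data stream classification cycle: each
  // example is inspected once, tested on first, then trained on,
  // using constant memory and constant time per example.
  public class PrequentialSketch {

      // Hypothetical learner: predicts the majority class seen so far.
      static class MajorityLearner {
          private final long[] counts = new long[2];
          int predict() { return counts[1] > counts[0] ? 1 : 0; } // ready at any point
          void train(int label) { counts[label]++; }              // bounded memory and time
      }

      // Runs test-then-train over a synthetic binary stream and
      // returns the resulting prequential accuracy.
      public static double prequentialAccuracy(int n, long seed) {
          Random rnd = new Random(seed);
          MajorityLearner learner = new MajorityLearner();
          long correct = 0;
          for (int t = 0; t < n; t++) {
              int label = rnd.nextDouble() < 0.7 ? 1 : 0; // 70% of labels are 1
              if (learner.predict() == label) correct++;  // test first (inspect once)
              learner.train(label);                       // then train on the same example
          }
          return (double) correct / n;
      }

      public static void main(String[] args) {
          System.out.println(prequentialAccuracy(100_000, 42));
      }
  }
  ```

  Because each example is tested on before it is used for training, the
  accuracy estimate needs no separate holdout set.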




                                      6 / 21
Experimental setting

 Evaluation procedures for Data
 Streams
    Holdout
    Interleaved Test-Then-Train or
    Prequential

 Environments
    Sensor Network: 100 KB
    Handheld Computer: 32 MB
    Server: 400 MB




                                     7 / 21
Experimental setting

 Data Sources
    Random Tree Generator
    Random RBF Generator
    LED Generator
    Waveform Generator
    Function Generator




                            7 / 21
Experimental setting

 Classifiers
     Naive Bayes
     Decision stumps
     Hoeffding Tree
     Hoeffding Option Tree
     Bagging and Boosting

 Prediction strategies
     Majority class
     Naive Bayes Leaves
     Adaptive Hybrid



                             7 / 21
Hoeffding Option Tree
  Hoeffding Option Trees
  Regular Hoeffding tree containing additional option nodes that
  allow several tests to be applied, leading to multiple Hoeffding
  trees as separate paths.
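
  Hoeffding trees (and their option-tree variants) decide when to split
  using the Hoeffding bound ε = sqrt(R² ln(1/δ) / (2n)): a leaf splits
  once the gap between the two best attributes' gains exceeds ε. A
  minimal sketch, with R (range of the split criterion) and δ chosen
  for illustration:

  ```java
  public class HoeffdingBound {
      // Hoeffding bound: with probability 1 - delta, the true mean of a
      // random variable with range R differs from its average over n
      // observations by at most this epsilon.
      public static double epsilon(double range, double delta, long n) {
          return Math.sqrt(range * range * Math.log(1.0 / delta) / (2.0 * n));
      }

      public static void main(String[] args) {
          // The bound shrinks as more examples arrive, so ties between
          // near-equal attributes are eventually resolved.
          System.out.println(epsilon(1.0, 1e-7, 1000));
      }
  }
  ```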




                                                                     8 / 21
Extension to Evolving Data Streams




  New Evolving Data Stream Extensions

      New Stream Generators
      New UNION of Streams
      New Classifiers




                                        9 / 21
Extension to Evolving Data Streams




  New Evolving Data Stream Generators

     Random RBF with Drift        Hyperplane
     LED with Drift               SEA Generator
     Waveform with Drift          STAGGER Generator




                                                      9 / 21
Concept Drift Framework
  [Figure: sigmoid function f(t) rising from 0 to 1, centred at t0,
  with transition width W and slope angle α]

  Definition
  Given two data streams a, b, we define c = a ⊕_{t0}^{W} b as the data
  stream built by joining the two data streams a and b, where
      Pr[c(t) = b(t)] = 1/(1 + e^(−4(t−t0)/W))
      Pr[c(t) = a(t)] = 1 − Pr[c(t) = b(t)]
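
  The ⊕ operator can be sketched directly from the definition: at time t,
  emit b's example with probability 1/(1 + e^(−4(t−t0)/W)), otherwise a's.
  Representing a stream as a function from time to an example value is an
  assumption for illustration only.

  ```java
  import java.util.Random;
  import java.util.function.LongToIntFunction;

  // Sketch of c = a (+)_{t0}^{W} b: a smooth sigmoid hand-over from
  // stream a to stream b, centred at t0, with transition width W.
  public class DriftJoin {
      // Probability that the joined stream emits b's example at time t.
      public static double probB(long t, long t0, double w) {
          return 1.0 / (1.0 + Math.exp(-4.0 * (t - t0) / w));
      }

      public static LongToIntFunction join(LongToIntFunction a, LongToIntFunction b,
                                           long t0, double w, long seed) {
          Random rnd = new Random(seed);
          return t -> rnd.nextDouble() < probB(t, t0, w) ? b.applyAsInt(t) : a.applyAsInt(t);
      }

      public static void main(String[] args) {
          LongToIntFunction c = join(t -> 0, t -> 1, 5000, 1000, 7);
          // Far before t0 the joined stream behaves like a; far after, like b.
          System.out.println(c.applyAsInt(0) + " " + c.applyAsInt(10000));
      }
  }
  ```

  Nesting `join` reproduces the composed examples on the next slide,
  such as (((a ⊕ b) ⊕ c) ⊕ d).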

                                                                 10 / 21
Concept Drift Framework
  [Figure: sigmoid function f(t) rising from 0 to 1, centred at t0,
  with transition width W and slope angle α]

  Example
      (((a ⊕_{t0}^{W0} b) ⊕_{t1}^{W1} c) ⊕_{t2}^{W2} d) . . .
      (((SEA9 ⊕_{t0}^{W} SEA8) ⊕_{2t0}^{W} SEA7) ⊕_{3t0}^{W} SEA9.5)
      CovPokElec = (CoverType ⊕_{581,012}^{5,000} Poker) ⊕_{1,000,000}^{5,000} ELEC2


                                                               10 / 21
Extension to Evolving Data Streams




  New Evolving Data Stream Classifiers

     Adaptive Hoeffding Option Tree     OCBoost
     DDM Hoeffding Tree                 FLBoost
     EDDM Hoeffding Tree




                                                  11 / 21
Ensemble Methods




  New ensemble methods:
     Adaptive-Size Hoeffding Tree bagging:
         each tree has a maximum size
         after one node splits, it deletes some nodes to reduce its
         size if the size of the tree is higher than the maximum value
     ADWIN bagging:
         When a change is detected, the worst classifier is removed
         and a new classifier is added.
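
  The reaction step of ADWIN bagging can be sketched independently of the
  change detector itself; the per-member error estimates and the reset
  convention below are hypothetical stand-ins for what the detector and
  evaluator would supply.

  ```java
  import java.util.Arrays;

  // Sketch of ADWIN bagging's reaction to drift: when the change
  // detector fires, the ensemble member with the highest estimated
  // error is dropped and replaced by a fresh one (error 0.0 here
  // stands in for "no evidence yet" on the new classifier).
  public class ReplaceWorst {
      public static double[] onChangeDetected(double[] errorEstimates) {
          double[] next = Arrays.copyOf(errorEstimates, errorEstimates.length);
          int worst = 0;
          for (int i = 1; i < next.length; i++)
              if (next[i] > next[worst]) worst = i;
          next[worst] = 0.0; // reset slot = newly created classifier
          return next;
      }

      public static void main(String[] args) {
          System.out.println(Arrays.toString(
              onChangeDetected(new double[]{0.12, 0.31, 0.08})));
      }
  }
  ```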



                                                                         12 / 21
Adaptive-Size Hoeffding Tree
   [Figure: four Hoeffding trees T1–T4 of increasing maximum size]




   Ensemble of trees of different size
       smaller trees adapt more quickly to changes,
       larger trees do better during periods with little change
       diversity

                                                                  13 / 21
Adaptive-Size Hoeffding Tree


   [Two Kappa-Error scatter plots; x-axis: Kappa, y-axis: Error]




   Figure: Kappa-Error diagrams for ASHT bagging (left) and bagging
   (right) on dataset RandomRBF with drift, plotting 90 pairs of
   classifiers.




                                                                                                                                                          14 / 21
ADWIN Bagging

  ADWIN
  An adaptive sliding window whose size is recomputed online
  according to the rate of change observed.

  ADWIN has rigorous guarantees (theorems)
      On ratio of false positives and negatives
      On the relation of the size of the current window and
      change rates

  ADWIN Bagging
  When a change is detected, the worst classifier is removed and
  a new classifier is added.
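
  The core of ADWIN is its cut test: split the window into a prefix of
  n0 values and a suffix of n1 values and declare a change when their
  means differ by more than a threshold. The threshold below follows one
  published form of the bound (values assumed in [0, 1], with a union
  bound over split points); MOA's implementation differs in details, so
  treat this as a sketch.

  ```java
  // Sketch of ADWIN's cut condition: report a change when the means of
  // the two subwindows differ by at least epsilonCut, in which case the
  // older subwindow is dropped and the window shrinks.
  public class AdwinCutSketch {
      public static double epsilonCut(long n0, long n1, double delta) {
          double m = 1.0 / (1.0 / n0 + 1.0 / n1);  // harmonic-mean-style term
          double deltaPrime = delta / (n0 + n1);   // union bound over split points
          return Math.sqrt(1.0 / (2.0 * m) * Math.log(4.0 / deltaPrime));
      }

      public static boolean cut(double mu0, long n0, double mu1, long n1, double delta) {
          return Math.abs(mu0 - mu1) >= epsilonCut(n0, n1, delta);
      }

      public static void main(String[] args) {
          // Error rate jumps from 0.2 to 0.5: the window should be cut.
          System.out.println(cut(0.2, 2000, 0.5, 2000, 0.01));
      }
  }
  ```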



                                                                  15 / 21
Web

      http://www.cs.waikato.ac.nz/∼abifet/MOA/




                                                 16 / 21
GUI
  java -cp .:moa.jar:weka.jar
  -javaagent:sizeofag.jar moa.gui.TaskLauncher




                                                 17 / 21
Command Line

  EvaluatePeriodicHeldOutTest
  java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar
  moa.DoTask "EvaluatePeriodicHeldOutTest
  -l DecisionStump -s generators.WaveformGenerator
  -n 100000 -i 100000000 -f 1000000" > dsresult.csv


  This command creates a comma-separated values file by:
      training the DecisionStump classifier on the WaveformGenerator
      data,
      using the first 100 thousand examples for testing,
      training on a total of 100 million examples, and
      testing every one million examples.



                                                                      18 / 21
Easy Design of a MOA classifier




      void resetLearningImpl ()
      void trainOnInstanceImpl (Instance inst)
      double[] getVotesForInstance (Instance i)
      void getModelDescription (StringBuilder
      out, int indent)




                                                  19 / 21
Example of a classifier: MajorityClass
   public class MajorityClass extends AbstractClassifier {

       protected DoubleVector observedClassDistribution;

       @Override
       public void resetLearningImpl() {
         this.observedClassDistribution = new DoubleVector();
       }

       @Override
       public void trainOnInstanceImpl(Instance inst) {
         this.observedClassDistribution.addToValue(
             (int) inst.classValue(), inst.weight());
       }

       public double[] getVotesForInstance(Instance i) {
         return this.observedClassDistribution.getArrayCopy();
       }
       ...

   }
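
   The class above depends on MOA's Instance and DoubleVector types. A
   dependency-free sketch of the same majority-class logic (all names
   here are illustrative, not MOA's):

   ```java
   // Self-contained sketch of the majority-class behaviour above,
   // without MOA's Instance/DoubleVector types.
   public class MajoritySketch {
       private double[] classCounts = new double[0];

       // Mirrors resetLearningImpl.
       public void reset() { classCounts = new double[0]; }

       // Mirrors trainOnInstanceImpl: add the instance weight to the
       // count of its class, growing the array as new classes appear.
       public void train(int classValue, double weight) {
           if (classValue >= classCounts.length) {
               double[] grown = new double[classValue + 1];
               System.arraycopy(classCounts, 0, grown, 0, classCounts.length);
               classCounts = grown;
           }
           classCounts[classValue] += weight;
       }

       // Mirrors getVotesForInstance: the observed class distribution.
       public double[] votes() { return classCounts.clone(); }

       // The predicted class is simply the one with the highest count.
       public int predict() {
           int best = 0;
           for (int i = 1; i < classCounts.length; i++)
               if (classCounts[i] > classCounts[best]) best = i;
           return best;
       }
   }
   ```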




                                                                 20 / 21
Summary
  Massive Online Analysis is a framework for online learning
  from data streams.




      http://www.cs.waikato.ac.nz/∼abifet/MOA/

      It is closely related to WEKA
      It includes a collection of offline and online learning
      algorithms, as well as tools for evaluation:
           boosting and bagging
           Hoeffding Trees
      MOA deals with evolving data streams
      MOA is easy to use and extend


                                                                       21 / 21
