MOA : Massive Online Analysis
 

Talk about MOA, a framework similar to WEKA dealing with massive data and data streams.

Usage Rights

© All Rights Reserved

    MOA : Massive Online Analysis Presentation Transcript

    • MOA: Massive Online Analysis. University of Waikato, Hamilton, New Zealand
    • What is MOA? Massive Online Analysis (MOA) is a framework for online learning from data streams. It is closely related to WEKA. It includes a collection of offline and online methods as well as tools for evaluation: boosting and bagging, and Hoeffding Trees with and without Naïve Bayes classifiers at the leaves.
    • WEKA: Waikato Environment for Knowledge Analysis. A collection of state-of-the-art machine learning algorithms and data processing tools implemented in Java, released under the GPL. It supports the whole process of experimental data mining: preparation of input data, statistical evaluation of learning schemes, and visualization of input data and the result of learning. Used for education, research and applications. Complements “Data Mining” by Witten & Frank.
    • WEKA: the bird
    • MOA: the bird. The Moa (another native NZ bird) is not only flightless, like the Weka, but also extinct.
    • Data stream classification cycle: (1) process an example at a time, and inspect it only once (at most); (2) use a limited amount of memory; (3) work in a limited amount of time; (4) be ready to predict at any point.
    • Experimental setting. Evaluation procedures for data streams: Holdout; Interleaved Test-Then-Train (Prequential). Environments (memory limits): Sensor Network, 100 KB; Handheld Computer, 32 MB; Server, 400 MB.
    • Experimental setting. Data sources: Random Tree Generator, Random RBF Generator, LED Generator, Waveform Generator, Function Generator.
    • Experimental setting. Classifiers: Naive Bayes, Decision stumps, Hoeffding Tree, Hoeffding Option Tree, Bagging and Boosting. Prediction strategies: Majority class, Naive Bayes Leaves, Adaptive Hybrid.
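The interleaved test-then-train (prequential) procedure named in the experimental setting can be sketched as follows. This is a minimal illustration and not MOA code: the class `PrequentialSketch`, the nested `MajorityLearner`, and the synthetic label stream are all assumptions made for the example, and the learner ignores attributes entirely (it only counts class labels, in the spirit of the majority-class prediction strategy). Each example is used to test the model first and only then to train it, so every example is inspected once.

```java
import java.util.Random;

/** Sketch of interleaved test-then-train (prequential) evaluation; not MOA code. */
final class PrequentialSketch {

    /** Hypothetical minimal learner: predicts the class label seen most often so far. */
    static final class MajorityLearner {
        private final long[] counts;

        MajorityLearner(int numClasses) {
            this.counts = new long[numClasses];
        }

        /** Returns the most frequent class so far (class 0 before any training). */
        int predict() {
            int best = 0;
            for (int c = 1; c < counts.length; c++) {
                if (counts[c] > counts[best]) {
                    best = c;
                }
            }
            return best;
        }

        void train(int label) {
            counts[label]++;
        }
    }

    /** Tests on each label first, then trains on it; returns the fraction predicted correctly. */
    static double prequentialAccuracy(int[] labels, int numClasses) {
        MajorityLearner learner = new MajorityLearner(numClasses);
        long correct = 0;
        for (int label : labels) {
            if (learner.predict() == label) { // test first ...
                correct++;
            }
            learner.train(label);             // ... then train on the same example
        }
        return labels.length == 0 ? 0.0 : (double) correct / labels.length;
    }

    public static void main(String[] args) {
        // Synthetic stream (an assumption for the demo): class 1 with probability 0.7.
        Random rng = new Random(42);
        int[] labels = new int[10_000];
        for (int i = 0; i < labels.length; i++) {
            labels[i] = rng.nextDouble() < 0.7 ? 1 : 0;
        }
        System.out.println("Prequential accuracy: " + prequentialAccuracy(labels, 2));
    }
}
```

Unlike holdout evaluation, no examples are set aside for testing, which is why the procedure suits streams where every example arrives only once.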
    • Hoeffding Option Tree. A Hoeffding Option Tree is a regular Hoeffding tree containing additional option nodes that allow several tests to be applied, leading to multiple Hoeffding trees as separate paths.
    • Extension to Evolving Data Streams. New evolving data stream extensions: new stream generators, new UNION of streams, new classifiers.
    • Extension to Evolving Data Streams. New evolving data stream generators: Random RBF with Drift, LED with Drift, Waveform with Drift, Hyperplane, SEA Generator, STAGGER Generator.
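    • Concept Drift Framework. [Figure: sigmoid $f(t)$ rising from 0 to 1, passing through 0.5 at $t = t_0$ with slope angle $\alpha$, over a transition of width $W$.] Definition: given two data streams $a$, $b$, we define $c = a \oplus^{t_0}_{W} b$ as the data stream built by joining the two data streams $a$ and $b$, where $\Pr[c(t) = b(t)] = 1/(1 + e^{-4(t - t_0)/W})$ and $\Pr[c(t) = a(t)] = 1 - \Pr[c(t) = b(t)]$.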
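    • Concept Drift Framework. [Figure: sigmoid $f(t)$ rising from 0 to 1, passing through 0.5 at $t = t_0$ with slope angle $\alpha$, over a transition of width $W$.] Example: $(((a \oplus^{t_0}_{W_0} b) \oplus^{t_1}_{W_1} c) \oplus^{t_2}_{W_2} d) \ldots$; $(((SEA_9 \oplus^{t_0}_{W} SEA_8) \oplus^{2t_0}_{W} SEA_7) \oplus^{3t_0}_{W} SEA_{9.5})$; $CovPokElec = (CoverType \oplus^{581{,}012}_{5{,}000} Poker) \oplus^{1{,}000{,}000}_{5{,}000} ELEC2$.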
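The join operator from the definition can be sketched in a few lines. This is a rough illustration under stated assumptions, not MOA's stream API: `DriftJoinSketch`, `probabilityOfB`, and `join` are hypothetical names, and a stream is modelled simply as a function from a time step to an example. At each time t the joined stream emits b(t) with probability 1/(1 + e^(-4(t - t0)/W)) and a(t) otherwise. The derivative of that sigmoid at t0 is 1/W, so a larger W gives a more gradual drift.

```java
import java.util.Random;
import java.util.function.LongFunction;

/** Sketch of the sigmoid join of two streams; names are illustrative, not MOA's API. */
final class DriftJoinSketch {

    /** Pr[c(t) = b(t)] = 1 / (1 + e^(-4 (t - t0) / w)). */
    static double probabilityOfB(double t, double t0, double w) {
        return 1.0 / (1.0 + Math.exp(-4.0 * (t - t0) / w));
    }

    /** Builds c = a (+) b with change point t0 and width w: emits b(t) with probabilityOfB, else a(t). */
    static <T> LongFunction<T> join(LongFunction<T> a, LongFunction<T> b,
                                    double t0, double w, Random rng) {
        return t -> rng.nextDouble() < probabilityOfB(t, t0, w) ? b.apply(t) : a.apply(t);
    }

    public static void main(String[] args) {
        Random rng = new Random(1);
        // Placeholder streams that emit only their own name at every time step.
        LongFunction<String> a = t -> "a";
        LongFunction<String> b = t -> "b";
        LongFunction<String> c = t -> "c";

        // ((a (+) b) (+) c): drift to b around t = 1000, then to c around t = 2000.
        LongFunction<String> joined = join(join(a, b, 1_000, 100, rng), c, 2_000, 100, rng);
        for (long t = 0; t <= 3_000; t += 500) {
            System.out.println("t=" + t + " -> " + joined.apply(t));
        }
    }
}
```

Because `join` returns the same function type it accepts, joins nest directly, which mirrors the chained examples on the slide.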
    • Extension to Evolving Data Streams. New evolving data stream classifiers: Adaptive Hoeffding Option Tree, DDM Hoeffding Tree, EDDM Hoeffding Tree, OCBoost, FLBoost.
    • Ensemble Methods. New ensemble methods: (1) Adaptive-Size Hoeffding Tree bagging: each tree has a maximum size; after one node splits, the tree deletes some nodes to reduce its size if its size exceeds the maximum value. (2) ADWIN bagging: when a change is detected, the worst classifier is removed and a new classifier is added.
    • Adaptive-Size Hoeffding Tree. [Figure: trees T1, T2, T3, T4.] An ensemble of trees of different sizes: smaller trees adapt more quickly to changes, larger trees do better during periods with little change, and the varied sizes provide diversity.
    • Adaptive-Size Hoeffding Tree. Figure: Kappa-Error diagrams for ASHT bagging (left) and bagging (right) on dataset RandomRBF with drift, plotting 90 pairs of classifiers. [Axes: Kappa (horizontal), Error (vertical).]
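For readers unfamiliar with the Kappa axis of these diagrams, the sketch below computes Cohen's kappa statistic between the predictions of two classifiers on the same examples: kappa = (θ1 − θ2) / (1 − θ2), where θ1 is the observed fraction of examples on which the two agree and θ2 is the agreement expected by chance from their individual prediction frequencies. A kappa of 1 means the pair always agrees and a kappa near 0 means it agrees no more than chance, so lower kappa indicates a more diverse pair. The class and method names are assumptions for illustration; the slides do not show how the diagrams were computed.

```java
/** Sketch of Cohen's kappa between two classifiers' predictions; illustrative only. */
final class KappaSketch {

    /**
     * Computes kappa = (theta1 - theta2) / (1 - theta2) from two prediction arrays.
     * Returns NaN when both classifiers always predict the same single class,
     * because chance agreement theta2 is then 1.
     */
    static double kappa(int[] predA, int[] predB, int numClasses) {
        if (predA.length != predB.length || predA.length == 0) {
            throw new IllegalArgumentException("prediction arrays must be non-empty and equally long");
        }
        int m = predA.length;

        // contingency[i][j] counts examples where A predicted class i and B predicted class j.
        long[][] contingency = new long[numClasses][numClasses];
        for (int n = 0; n < m; n++) {
            contingency[predA[n]][predB[n]]++;
        }

        double theta1 = 0.0; // observed agreement
        double theta2 = 0.0; // agreement expected by chance
        for (int c = 0; c < numClasses; c++) {
            long row = 0;
            long col = 0;
            for (int k = 0; k < numClasses; k++) {
                row += contingency[c][k];
                col += contingency[k][c];
            }
            theta1 += (double) contingency[c][c] / m;
            theta2 += ((double) row / m) * ((double) col / m);
        }
        return (theta1 - theta2) / (1.0 - theta2);
    }

    public static void main(String[] args) {
        int[] a = {0, 1, 0, 1, 1, 0};
        int[] b = {0, 1, 1, 1, 0, 0};
        System.out.println("kappa = " + kappa(a, b, 2));
    }
}
```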
    • ADWIN Bagging. ADWIN: an adaptive sliding window whose size is recomputed online according to the rate of change observed. ADWIN has rigorous guarantees (theorems) on the ratio of false positives and negatives, and on the relation between the size of the current window and change rates. ADWIN Bagging: when a change is detected, the worst classifier is removed and a new classifier is added.
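The replacement rule stated for ADWIN Bagging ("the worst classifier is removed and a new classifier is added") can be sketched as below. This covers only that one step and rests on several assumptions: ADWIN itself is not implemented here, the change signal is taken as given by the caller, and each ensemble member's quality is tracked with a plain running error rate instead of an ADWIN window. `ReplaceWorstSketch`, `Member`, and `onChangeDetected` are hypothetical names, not MOA classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Sketch of the replace-worst step of a change-aware bagging ensemble; not MOA code. */
final class ReplaceWorstSketch {

    /** Hypothetical ensemble member that only tracks how often it was wrong. */
    static final class Member {
        final String name;
        private long seen;
        private long wrong;

        Member(String name) {
            this.name = name;
        }

        void record(boolean correct) {
            seen++;
            if (!correct) {
                wrong++;
            }
        }

        /** Running error rate (an assumption; the slides describe ADWIN-based monitoring). */
        double errorRate() {
            return seen == 0 ? 0.0 : (double) wrong / seen;
        }
    }

    /** When a change is signalled, swaps the member with the highest error for a fresh one. */
    static void onChangeDetected(List<Member> ensemble, Supplier<Member> freshMember) {
        if (ensemble.isEmpty()) {
            return;
        }
        int worst = 0;
        for (int i = 1; i < ensemble.size(); i++) {
            if (ensemble.get(i).errorRate() > ensemble.get(worst).errorRate()) {
                worst = i;
            }
        }
        ensemble.set(worst, freshMember.get()); // ensemble size stays constant
    }

    public static void main(String[] args) {
        List<Member> ensemble = new ArrayList<>();
        Member good = new Member("good");
        good.record(true);
        Member bad = new Member("bad");
        bad.record(false);
        ensemble.add(good);
        ensemble.add(bad);

        onChangeDetected(ensemble, () -> new Member("fresh"));
        for (Member m : ensemble) {
            System.out.println(m.name + " error=" + m.errorRate());
        }
    }
}
```

Keeping the ensemble size fixed means the freshly added member starts learning from the post-change data while the remaining members preserve what was learned earlier.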
    • Web: http://www.cs.waikato.ac.nz/~abifet/MOA/
    • GUI: java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar moa.gui.TaskLauncher
    • Command Line: EvaluatePeriodicHeldOutTest
      java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar moa.DoTask "EvaluatePeriodicHeldOutTest -l DecisionStump -s generators.WaveformGenerator -n 100000 -i 100000000 -f 1000000" > dsresult.csv
      This command creates a comma-separated values file: it trains the DecisionStump classifier on the WaveformGenerator data, using the first 100 thousand examples for testing, training on a total of 100 million examples, and testing every one million examples.
    • Easy Design of a MOA classifier: void resetLearningImpl(); void trainOnInstanceImpl(Instance inst); double[] getVotesForInstance(Instance i); void getModelDescription(StringBuilder out, int indent)
    • Example of a classifier: MajorityClass

        public class MajorityClass extends AbstractClassifier {

            // Accumulated weight observed for each class so far.
            protected DoubleVector observedClassDistribution;

            @Override
            public void resetLearningImpl() {
                this.observedClassDistribution = new DoubleVector();
            }

            @Override
            public void trainOnInstanceImpl(Instance inst) {
                // Add the instance's weight to the total of its class.
                this.observedClassDistribution.addToValue(
                        (int) inst.classValue(), inst.weight());
            }

            // The votes are the class weights seen so far, so the majority class wins.
            public double[] getVotesForInstance(Instance i) {
                return this.observedClassDistribution.getArrayCopy();
            }

            ...
        }
    • Summary. Massive Online Analysis (MOA) is a framework for online learning from data streams (http://www.cs.waikato.ac.nz/~abifet/MOA/). It is closely related to WEKA. It includes a collection of offline and online methods as well as tools for evaluation: boosting and bagging, and Hoeffding Trees. MOA deals with evolving data streams. MOA is easy to use and extend.