New ensemble methods for evolving data streams

A talk about MOA, its extension to evolving data streams, bagging using ADWIN, and bagging using trees of different sizes.

  1. New ensemble methods for evolving data streams. A. Bifet, G. Holmes, B. Pfahringer, R. Kirkby, and R. Gavaldà. Laboratory for Relational Algorithmics, Complexity and Learning (LARCA), UPC-Barcelona Tech, Catalonia; University of Waikato, Hamilton, New Zealand. 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2009), Paris, 29 June 2009.
  2. New Ensemble Methods for Evolving Data Streams. Outline of the talk: a new experimental data stream framework for studying concept drift; two new variants of bagging, ADWIN Bagging and Adaptive-Size Hoeffding Tree (ASHT) Bagging; and an evaluation study on synthetic and real-world datasets.
  3. Outline: 1 MOA: Massive Online Analysis; 2 Concept Drift Framework; 3 New Ensemble Methods; 4 Empirical Evaluation.
  4. What is MOA? Massive Online Analysis is a framework for online learning from data streams. It is closely related to WEKA. It includes a collection of offline and online learning algorithms as well as tools for evaluation: boosting and bagging, and Hoeffding Trees with and without Naïve Bayes classifiers at the leaves.
  5. WEKA: Waikato Environment for Knowledge Analysis. A collection of state-of-the-art machine learning algorithms and data processing tools implemented in Java, released under the GPL. It supports the whole process of experimental data mining: preparation of input data, statistical evaluation of learning schemes, and visualization of the input data and of the results of learning. It is used for education, research, and applications, and complements the book “Data Mining” by Witten & Frank.
  6. WEKA: the bird.
  7. MOA: the bird. The Moa (another native New Zealand bird) is not only flightless, like the Weka, but also extinct.
  8. Data stream classification cycle: (1) process one example at a time, and inspect it only once (at most); (2) use a limited amount of memory; (3) work in a limited amount of time; (4) be ready to predict at any point.
  9. Experimental setting. Evaluation procedures for data streams: holdout, and Interleaved Test-Then-Train (prequential); a sketch of the prequential loop follows below. Memory environments: sensor network (100 KB), handheld computer (32 MB), server (400 MB).
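In the Interleaved Test-Then-Train (prequential) procedure, each incoming example is first used to test the current model and then to train it, so the stream itself supplies the test data. A minimal sketch in Java, assuming simplified LabeledExample and IncrementalLearner stand-ins rather than MOA's actual interfaces:

```java
import java.util.Iterator;

/** Simplified stand-ins for a stream example and an incremental learner (not MOA's API). */
interface LabeledExample {
    double[] features();
    int label();
}

interface IncrementalLearner {
    int predict(LabeledExample ex);   // test first ...
    void train(LabeledExample ex);    // ... then train on the same example
}

final class PrequentialEvaluation {
    /** Returns prequential accuracy: every example is used for testing, then for training. */
    static double run(IncrementalLearner learner, Iterator<LabeledExample> stream) {
        long seen = 0, correct = 0;
        while (stream.hasNext()) {
            LabeledExample ex = stream.next();
            if (learner.predict(ex) == ex.label()) {  // test on the incoming example
                correct++;
            }
            learner.train(ex);                        // then learn from it and discard it
            seen++;
        }
        return seen == 0 ? 0.0 : (double) correct / seen;
    }
}
```

Because every example is inspected once, used, and discarded, this loop also respects the one-pass and bounded-memory requirements of the data stream classification cycle above.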
  10. Experimental setting. Data sources: Random Tree Generator, Random RBF Generator, LED Generator, Waveform Generator, and Function Generator.
  11. Experimental setting. Classifiers: Naive Bayes, decision stumps, Hoeffding Tree, Hoeffding Option Tree, bagging, and boosting. Prediction strategies at the leaves: majority class, Naive Bayes leaves, and adaptive hybrid.
  12. Easy design of a MOA classifier: implement void resetLearningImpl(), void trainOnInstanceImpl(Instance inst), double[] getVotesForInstance(Instance i), and void getModelDescription(StringBuilder out, int indent); a minimal skeleton is sketched below.
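To make the four hooks concrete, here is a sketch of a majority-class learner written against the MOA/WEKA API as I recall it; the package names, the AbstractClassifier base class, and the two extra overrides at the end are assumptions that may differ between MOA versions, so treat this as an illustration rather than a drop-in class.

```java
import moa.classifiers.AbstractClassifier;
import moa.core.Measurement;
import weka.core.Instance;

/** A majority-class learner built from the four hooks listed on the slide. */
public class MajorityClassLearner extends AbstractClassifier {

    private double[] classCounts;                   // observed frequency of each class

    @Override
    public void resetLearningImpl() {
        classCounts = null;                         // forget everything learned so far
    }

    @Override
    public void trainOnInstanceImpl(Instance inst) {
        if (classCounts == null) {
            classCounts = new double[inst.numClasses()];
        }
        classCounts[(int) inst.classValue()]++;     // count the label of this example
    }

    @Override
    public double[] getVotesForInstance(Instance inst) {
        // Vote with the class frequencies seen so far; empty votes before any training.
        return classCounts == null ? new double[0] : classCounts.clone();
    }

    @Override
    public void getModelDescription(StringBuilder out, int indent) {
        out.append("Majority-class model, class counts: ");
        out.append(java.util.Arrays.toString(classCounts));
    }

    // The two overrides below are required by the AbstractClassifier base class in the
    // MOA versions I have seen; they are not part of the four hooks on the slide.
    @Override
    public boolean isRandomizable() {
        return false;                               // no internal randomization
    }

    @Override
    protected Measurement[] getModelMeasurementsImpl() {
        return null;                                // no extra measurements for this sketch
    }
}
```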
  13. Outline (next section: Concept Drift Framework).
  14. Extension to evolving data streams. New evolving data stream extensions: new stream generators, a new UNION of streams, and new classifiers.
  15. Extension to evolving data streams. New evolving data stream generators: Random RBF with drift, Hyperplane, LED with drift, SEA Generator, Waveform with drift, and STAGGER Generator.
  16. Concept Drift Framework. (Figure: the sigmoid change function f(t), rising from 0 to 1 around t0 over a transition of width W, with slope angle α.) Definition: given two data streams a and b, we define c = a ⊕_{t0}^{W} b as the data stream built by joining a and b, where Pr[c(t) = b(t)] = 1 / (1 + e^{-4(t - t0)/W}) and Pr[c(t) = a(t)] = 1 - Pr[c(t) = b(t)].
  17. Concept Drift Framework: examples. (((a ⊕_{t0}^{W0} b) ⊕_{t1}^{W1} c) ⊕_{t2}^{W2} d) ...; (((SEA_9 ⊕_{t0}^{W} SEA_8) ⊕_{2t0}^{W} SEA_7) ⊕_{3t0}^{W} SEA_{9.5}); CovPokElec = (CoverType ⊕_{581,012}^{5,000} Poker) ⊕_{1,000,000}^{5,000} ELEC2. A minimal sketch of this stream-join operator follows below.
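The ⊕ operator can be implemented directly from the definition above: at step t, emit the next example of stream b with probability 1/(1 + e^{-4(t - t0)/W}) and the next example of stream a otherwise. A self-contained Java sketch, where the Supplier-based stream type is a simplified stand-in rather than MOA's generator interface:

```java
import java.util.Random;
import java.util.function.Supplier;

/** Joins two streams a and b with a sigmoid hand-over centred at t0 with width w. */
final class ConceptDriftStream<T> implements Supplier<T> {
    private final Supplier<T> a;
    private final Supplier<T> b;
    private final long t0;        // centre of the change
    private final double w;       // width of the change
    private final Random rng;
    private long t = 0;           // number of examples emitted so far

    ConceptDriftStream(Supplier<T> a, Supplier<T> b, long t0, double w, long seed) {
        this.a = a;
        this.b = b;
        this.t0 = t0;
        this.w = w;
        this.rng = new Random(seed);
    }

    @Override
    public T get() {
        // Pr[next example comes from b] = 1 / (1 + e^{-4 (t - t0) / W})
        double pB = 1.0 / (1.0 + Math.exp(-4.0 * (t - t0) / w));
        t++;
        return rng.nextDouble() < pB ? b.get() : a.get();
    }
}
```

Because the joined stream is itself a Supplier, nested drifts such as (((a ⊕ b) ⊕ c) ⊕ d) are obtained by wrapping one ConceptDriftStream inside another.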
  18. Extension to evolving data streams. New evolving data stream classifiers: Adaptive Hoeffding Option Tree, DDM Hoeffding Tree, EDDM Hoeffding Tree, OCBoost, and FLBoost.
  19. Outline (next section: New Ensemble Methods).
  20. Ensemble methods. Two new ensemble methods. Adaptive-Size Hoeffding Tree (ASHT) bagging: each tree has a maximum size; after a node splits, if the size of the tree exceeds the maximum value, the tree deletes some nodes to reduce its size. ADWIN bagging: when a change is detected, the worst classifier is removed and a new classifier is added.
  21. Adaptive-Size Hoeffding Tree. (Figure: an ensemble of trees T1–T4 of increasing size.) An ensemble of trees of different sizes: smaller trees adapt more quickly to changes, larger trees do better during periods with little change, and the mix of sizes adds diversity. A sketch of the size-control idea follows below.
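A rough sketch of that size control: each ensemble member gets its own node budget, and whenever training pushes a tree over its budget, the tree is shrunk back. The SizeLimitedTree interface is hypothetical, and the doubling of budgets from one tree to the next is an assumption made only to produce trees of clearly different sizes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical interface for a Hoeffding tree whose size can be queried and reduced. */
interface SizeLimitedTree {
    void trainOnExample(double[] x, int y);  // may split a leaf and grow the tree
    int splitNodeCount();                    // current number of split nodes
    void shrinkTo(int maxSplitNodes);        // delete nodes until the budget is respected
}

final class AshtSizeControl {
    /** Node budgets for an ensemble of n trees; assumed here to double from tree to tree. */
    static List<Integer> budgets(int ensembleSize, int firstBudget) {
        List<Integer> caps = new ArrayList<>();
        int cap = firstBudget;
        for (int i = 0; i < ensembleSize; i++) {
            caps.add(cap);
            cap *= 2;                        // each tree may grow larger than the previous one
        }
        return caps;
    }

    /** Train one member on one example, then enforce its individual size budget. */
    static void trainWithBudget(SizeLimitedTree tree, int budget, double[] x, int y) {
        tree.trainOnExample(x, y);
        if (tree.splitNodeCount() > budget) {
            tree.shrinkTo(budget);           // "deletes some nodes to reduce its size"
        }
    }
}
```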
  22. Adaptive-Size Hoeffding Tree. Figure: Kappa-Error diagrams (error vs. kappa) for ASHT bagging (left) and bagging (right) on the RandomRBF dataset with drift, plotting 90 pairs of classifiers.
  23. ADWIN Bagging. ADWIN is an adaptive sliding window whose size is recomputed online according to the rate of change observed. ADWIN has rigorous guarantees (theorems) on the rates of false positives and false negatives, and on the relation between the size of the current window and the rate of change. ADWIN bagging: when a change is detected, the worst classifier is removed and a new classifier is added; a sketch of this replacement rule follows below.
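The replacement rule can be sketched as follows: each member's 0/1 loss is fed to its own change detector, and as soon as any detector fires, the member with the highest estimated error is replaced by a fresh learner. OnlineLearner and ChangeDetector are simplified stand-ins for MOA's classifier and ADWIN types, and the Poisson(1) example weighting used by online bagging is omitted for brevity.

```java
import java.util.List;
import java.util.function.Supplier;

/** Simplified stand-ins for an online classifier and for ADWIN used as a change detector. */
interface OnlineLearner {
    int predict(double[] x);
    void train(double[] x, int y);
}

interface ChangeDetector {
    boolean addAndCheck(double loss);  // feed the 0/1 loss; returns true if change is detected
    double errorEstimate();            // current estimate of the member's error rate
    void reset();
}

final class AdwinBaggingSketch {
    /** Process one labelled example: test, monitor, train, and replace the worst member on change. */
    static void processExample(List<OnlineLearner> members,
                               List<ChangeDetector> detectors,
                               Supplier<OnlineLearner> freshLearner,
                               double[] x, int y) {
        boolean change = false;
        for (int i = 0; i < members.size(); i++) {
            double loss = members.get(i).predict(x) == y ? 0.0 : 1.0;
            change |= detectors.get(i).addAndCheck(loss);  // each member has its own detector
            members.get(i).train(x, y);
        }
        if (change) {
            int worst = 0;                                  // find the member with the highest error
            for (int i = 1; i < members.size(); i++) {
                if (detectors.get(i).errorEstimate() > detectors.get(worst).errorEstimate()) {
                    worst = i;
                }
            }
            members.set(worst, freshLearner.get());         // remove the worst, add a new classifier
            detectors.get(worst).reset();
        }
    }
}
```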
  24. Outline (next section: Empirical Evaluation).
  25. Empirical evaluation: most accurate method per dataset.
      Hyperplane Drift 0.0001: Bag10 ASHT W+R
      Hyperplane Drift 0.001: Bag10 ASHT W+R
      SEA W = 50: Bag10 ASHT W+R
      SEA W = 50000: BagADWIN 10 HT
      RandomRBF No Drift, 50 centers: Bag10 HT
      RandomRBF Drift .0001, 50 centers: BagADWIN 10 HT
      RandomRBF Drift .001, 50 centers: Bag10 ASHT W+R
      RandomRBF Drift .001, 10 centers: BagADWIN 10 HT
      Cover Type: Bag10 ASHT W+R
      Poker: OzaBoost
      Electricity: OCBoost
      CovPokElec: BagADWIN 10 HT
  26. Empirical evaluation. Figure: accuracy on the LED dataset with three concept drifts.
  27. Empirical evaluation: SEA with W = 50 (time in seconds, accuracy in %, memory in MB).
      Method            Time   Acc.   Mem.
      Bag10 ASHT W+R    33.20  88.89  0.84
      BagADWIN 10 HT    54.51  88.58  1.90
      Bag5 ASHT W+R     19.78  88.55  0.01
      HT DDM             8.30  88.27  0.17
      HT EDDM            8.56  87.97  0.18
      OCBoost           59.12  87.21  2.41
      OzaBoost          39.40  86.28  4.03
      Bag10 HT          31.06  85.45  3.38
      AdaHOT50          22.70  85.35  0.86
      HOT50             22.54  85.20  0.84
      AdaHOT5           11.46  84.94  0.38
      HOT5              11.46  84.92  0.38
      HT                 6.96  84.89  0.34
      NaiveBayes         5.32  83.87  0.00
  28. Empirical evaluation: SEA with W = 50000 (time in seconds, accuracy in %, memory in MB).
      Method            Time   Acc.   Mem.
      BagADWIN 10 HT    53.15  88.53  0.88
      Bag10 ASHT W+R    33.56  88.30  0.84
      HT DDM             7.88  88.07  0.16
      Bag5 ASHT W+R     20.00  87.99  0.05
      HT EDDM            8.52  87.64  0.06
      OCBoost           60.33  86.97  2.44
      OzaBoost          39.97  86.17  4.00
      Bag10 HT          30.88  85.34  3.36
      AdaHOT50          22.80  85.30  0.84
      HOT50             22.78  85.18  0.83
      AdaHOT5           12.48  84.94  0.38
      HOT5              12.46  84.91  0.37
      HT                 7.20  84.87  0.33
      NaiveBayes         5.52  83.87  0.00
  29. Summary. http://www.cs.waikato.ac.nz/~abifet/MOA/ Conclusions: extension of MOA to evolving data streams; MOA is easy to use and extend; new ensemble bagging methods, Adaptive-Size Hoeffding Tree bagging and ADWIN bagging. Future work: extend MOA to more data mining and learning methods.
