Leveraging Bagging for Evolving Data Streams

This talk presents Leveraging Bagging, a new ensemble method for evolving data streams.



  1. Leveraging Bagging for Evolving Data Streams. Albert Bifet, Geoff Holmes, and Bernhard Pfahringer, University of Waikato, Hamilton, New Zealand. ECML PKDD 2010, Barcelona, 21 September 2010.
  2. Mining Data Streams with Concept Drift. Extract information from a potentially infinite sequence of data, possibly varying over time, using few resources. Adaptively: with no prior knowledge of the type or rate of change.
  3. Mining Data Streams with Concept Drift. Extract information from a potentially infinite sequence of data, possibly varying over time, using few resources. Leveraging Bagging: new improvements for adaptive bagging methods using input randomization and output randomization.
  4. Outline: 1. Data stream constraints; 2. Leveraging Bagging for Evolving Data Streams; 3. Empirical evaluation.
  6. Mining Massive Data. Eric Schmidt, August 2010: “Every two days now we create as much information as we did from the dawn of civilization up until 2003.” 5 exabytes of data.
  7. Data stream classification cycle: (1) process an example at a time, and inspect it only once (at most); (2) use a limited amount of memory; (3) work in a limited amount of time; (4) be ready to predict at any point.
  8. Mining Massive Data. Koichi Kawana: “Simplicity means the achievement of maximum effect with minimum means.” For data streams, the means are time, accuracy, and memory.
  9. Evaluation Example.

                    Accuracy  Time  Memory
      Classifier A     70%     100    20
      Classifier B     80%      20    40

      Which classifier is performing better?
  10. RAM-Hours. One RAM-Hour: every GB of RAM deployed for 1 hour, following cloud computing rental cost options.
  11. Evaluation Example.

                    Accuracy  Time  Memory  RAM-Hours
      Classifier A     70%     100    20      2,000
      Classifier B     80%      20    40        800

      Which classifier is performing better?
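The RAM-Hours figures in the table are simply memory multiplied by time. A minimal sketch, using the abstract units from the slide (the function name is illustrative):

```python
def ram_hours(memory_gb, hours):
    """RAM-Hours: every GB of RAM deployed for 1 hour costs 1 RAM-Hour."""
    return memory_gb * hours

# Classifier A: 70% accuracy, 100 time units, 20 memory units -> 2,000 RAM-Hours
# Classifier B: 80% accuracy,  20 time units, 40 memory units ->   800 RAM-Hours
cost_a = ram_hours(20, 100)  # 2000
cost_b = ram_hours(40, 20)   # 800
```

By this measure Classifier B is cheaper as well as more accurate, which is the point the slide is making.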
  12. Outline: 1. Data stream constraints; 2. Leveraging Bagging for Evolving Data Streams; 3. Empirical evaluation.
  13. Hoeffding Trees. Hoeffding Tree (VFDT): Pedro Domingos and Geoff Hulten, “Mining high-speed data streams,” 2000. With high probability, it constructs a model identical to the one a traditional (greedy) batch method would learn, with theoretical guarantees on the error rate. (Figure: an example tree splitting first on whether the text contains “Money” and then on Time, with Day and Night leaves.)
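The guarantee mentioned above rests on the Hoeffding bound: after n observations of a variable with range R, the true mean is within ε = sqrt(R² ln(1/δ) / 2n) of the observed mean with probability 1 − δ. A sketch of how this drives the split decision in a Hoeffding tree (the function names are illustrative, but the split rule is the standard one: split when the observed gap between the two best attributes exceeds ε):

```python
import math

def hoeffding_bound(value_range, delta, n):
    """epsilon such that |true mean - observed mean| <= epsilon w.p. 1 - delta."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_best_gain, value_range, delta, n):
    # Split only when the observed gain difference cannot be explained by chance.
    return best_gain - second_best_gain > hoeffding_bound(value_range, delta, n)
```

Because ε shrinks as n grows, the tree waits for just enough examples at a leaf to be confident about the best split attribute, which is why one pass suffices.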
  14. Hoeffding Naive Bayes Tree. The basic Hoeffding Tree uses a Majority Class learner at the leaves. The Hoeffding Naive Bayes Tree (G. Holmes, R. Kirkby, and B. Pfahringer, “Stress-testing Hoeffding trees,” 2005) monitors the accuracy of a Majority Class learner and of a Naive Bayes learner at each leaf, and predicts using the more accurate of the two.
  15. Bagging. (Figure: the Poisson(1) distribution.) Bagging builds a set of M base models, each trained on a bootstrap sample created by drawing random samples with replacement.
  16. Bagging. (Figure: the Poisson(1) distribution.) Each base model’s training set contains each of the original training examples K times, where P(K = k) follows a binomial distribution that tends to Poisson(1) for large training sets.
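Concretely, with n training examples and bootstrap samples of size n, K ~ Binomial(n, 1/n), whose pmf converges to Poisson(1) as n grows; this is what justifies the online approximation on the next slides. A quick numerical check (function names are mine):

```python
import math

def binomial_pmf(n, k, p):
    """P(K = k) for K ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam=1.0):
    """P(K = k) for K ~ Poisson(lam)."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

# Binomial(n, 1/n) is already very close to Poisson(1) for moderate n.
for k in range(5):
    assert abs(binomial_pmf(1000, k, 1 / 1000) - poisson_pmf(k)) < 1e-3
```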
  17. Oza and Russell’s Online Bagging for M models.

      1: Initialize base models hm for all m ∈ {1, 2, ..., M}
      2: for all training examples do
      3:   for m = 1, 2, ..., M do
      4:     Set w = Poisson(1)
      5:     Update hm with the current example with weight w
      6: anytime output:
      7: return hypothesis: hfin(x) = argmax_{y∈Y} Σ_{t=1}^{T} I(ht(x) = y)
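A minimal runnable sketch of the listing above. The MajorityClass base learner and the partial_fit interface are illustrative stand-ins (in MOA the base models would be Hoeffding trees):

```python
import math
import random
from collections import Counter

class MajorityClass:
    """Toy base learner: predicts the most frequent class seen so far."""
    def __init__(self):
        self.counts = Counter()
    def partial_fit(self, x, y, weight=1):
        self.counts[y] += weight
    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None

def poisson(lam, rng):
    """Knuth's method: sample K ~ Poisson(lam)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

class OnlineBagging:
    """Oza & Russell: each of M models sees each example w ~ Poisson(1) times."""
    def __init__(self, make_model, M=10, seed=1):
        self.models = [make_model() for _ in range(M)]
        self.rng = random.Random(seed)
    def partial_fit(self, x, y):
        for h in self.models:
            w = poisson(1.0, self.rng)
            if w > 0:
                h.partial_fit(x, y, weight=w)
    def predict(self, x):
        # Plain majority vote, as in step 7 of the listing.
        votes = Counter(h.predict(x) for h in self.models)
        return votes.most_common(1)[0][0]
```

Weighting each update by a Poisson(1) draw reproduces, in one pass, the effect of bootstrap resampling with replacement.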
  18. ADWIN Bagging (KDD’09). ADWIN is an adaptive sliding window whose size is recomputed online according to the rate of change observed. ADWIN has rigorous guarantees (theorems) on the ratio of false positives and false negatives, and on the relation between the size of the current window and the rate of change. ADWIN Bagging: when a change is detected, the worst classifier is removed and a new classifier is added.
  19. ADWIN Bagging for M models.

      1: Initialize base models hm for all m ∈ {1, 2, ..., M}
      2: for all training examples do
      3:   for m = 1, 2, ..., M do
      4:     Set w = Poisson(1)
      5:     Update hm with the current example with weight w
      6:   if ADWIN detects change in the error of one of the classifiers then
      7:     Replace the classifier with the highest error with a new one
      8: anytime output:
      9: return hypothesis: hfin(x) = argmax_{y∈Y} Σ_{t=1}^{T} I(ht(x) = y)
  20. Leveraging Bagging for Evolving Data Streams. Randomization is a powerful tool to increase accuracy and diversity. There are three ways of using randomization: manipulating the input data; manipulating the classifier algorithms; manipulating the output targets.
  21. Input Randomization. (Figure: Poisson distributions P(X = k) for λ = 1, 6, and 10.)
  22. ECOC Output Randomization. Example matrix of random output codes for 3 classes and 6 classifiers:

                     Class 1  Class 2  Class 3
      Classifier 1      0        0        1
      Classifier 2      0        1        1
      Classifier 3      1        0        0
      Classifier 4      1        1        0
      Classifier 5      1        0        1
      Classifier 6      0        1        0
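A sketch of how such a code matrix could be built and decoded. The function names are illustrative; the idea is that each binary classifier m is trained on the relabeled target µ_m(y), and a multi-class prediction is recovered by Hamming decoding:

```python
import random

def random_output_codes(n_classifiers, n_classes, seed=1):
    """One random binary code word per classifier: mu_m maps class -> {0, 1}."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(n_classes)]
            for _ in range(n_classifiers)]

def decode(codes, binary_predictions):
    """Pick the class whose column of the code matrix agrees with
    the most binary predictions (Hamming decoding)."""
    n_classes = len(codes[0])
    return max(range(n_classes),
               key=lambda c: sum(p == row[c]
                                 for row, p in zip(codes, binary_predictions)))

# The 6 x 3 matrix from the slide (rows: classifiers, columns: classes 1-3).
codes = [[0, 0, 1], [0, 1, 1], [1, 0, 0], [1, 1, 0], [1, 0, 1], [0, 1, 0]]
```

For example, if all six binary classifiers output exactly the code word of class 2, decoding recovers it: decode(codes, [0, 1, 0, 1, 0, 1]) returns 1, the 0-based index of class 2.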
  23. Leveraging Bagging for Evolving Data Streams. Leveraging Bagging: using Poisson(λ). Leveraging Bagging MC: using Poisson(λ) and random output codes. Fast Leveraging Bagging ME: if an instance is misclassified, weight = 1; if not, weight = eT / (1 − eT).
  24. Input Randomization. Bagging: resampling with replacement using Poisson(1). Other strategies: subagging (resampling without replacement); half subagging (resampling without replacement of half of the instances); bagging without taking out any instance (using 1 + Poisson(1)).
  25. Leveraging Bagging for Evolving Data Streams.

      1: Initialize base models hm for all m ∈ {1, 2, ..., M}
      2: for all training examples (x, y) do
      3:   for m = 1, 2, ..., M do
      4:     Set w = Poisson(λ)
      5:     Update hm with the current example with weight w
      6:   if ADWIN detects change in the error of one of the classifiers then
      7:     Replace the classifier with the highest error with a new one
      8: anytime output:
      9: return hfin(x) = argmax_{y∈Y} Σ_{t=1}^{T} I(ht(x) = y)
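A compact runnable sketch of the loop above: it is online bagging with Poisson(λ), λ = 6 by default, plus a per-model change detector. ADWIN itself is nontrivial, so a naive fixed-window error comparison stands in for it here (an assumption of this sketch; it has none of ADWIN's guarantees), and MajorityClass is a toy stand-in for a Hoeffding tree:

```python
import math
import random
from collections import Counter, deque

class MajorityClass:
    """Toy base learner standing in for a Hoeffding tree."""
    def __init__(self):
        self.counts = Counter()
    def partial_fit(self, x, y, weight=1):
        self.counts[y] += weight
    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None

class NaiveChangeDetector:
    """Stand-in for ADWIN: flags change when the recent error rate
    exceeds the overall error rate by a fixed margin."""
    def __init__(self, window=50, margin=0.2):
        self.recent = deque(maxlen=window)
        self.errors = 0
        self.n = 0
        self.margin = margin
    def add(self, error):  # error is 0 or 1
        self.recent.append(error)
        self.errors += error
        self.n += 1
        if self.n < 2 * self.recent.maxlen:
            return False
        return (sum(self.recent) / len(self.recent)
                - self.errors / self.n) > self.margin

class LeveragingBagging:
    def __init__(self, make_model, M=10, lam=6.0, seed=1):
        self.make_model = make_model
        self.models = [make_model() for _ in range(M)]
        self.detectors = [NaiveChangeDetector() for _ in range(M)]
        self.lam = lam
        self.rng = random.Random(seed)
    def _poisson(self, lam):
        threshold, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= self.rng.random()
            if p <= threshold:
                return k
            k += 1
    def partial_fit(self, x, y):
        change = False
        for h, d in zip(self.models, self.detectors):
            if d.add(int(h.predict(x) != y)):  # test-then-train
                change = True
            w = self._poisson(self.lam)
            if w > 0:
                h.partial_fit(x, y, weight=w)
        if change:
            # Replace the classifier with the highest recent error.
            worst = max(range(len(self.models)),
                        key=lambda m: sum(self.detectors[m].recent))
            self.models[worst] = self.make_model()
            self.detectors[worst] = NaiveChangeDetector()
    def predict(self, x):
        votes = Counter(h.predict(x) for h in self.models)
        return votes.most_common(1)[0][0]
```

The only differences from online bagging are the larger Poisson parameter (more aggressive resampling, hence more diversity) and the detect-and-replace step.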
  26. Leveraging Bagging for Evolving Data Streams MC.

      1: Initialize base models hm for all m ∈ {1, 2, ..., M}
      2: Compute the coloring µm(y)
      3: for all training examples (x, y) do
      4:   for m = 1, 2, ..., M do
      5:     Set w = Poisson(λ)
      6:     Update hm with the current example with weight w and class µm(y)
      7:   if ADWIN detects change in the error of one of the classifiers then
      8:     Replace the classifier with the highest error with a new one
      9: anytime output:
      10: return hfin(x) = argmax_{y∈Y} Σ_{t=1}^{T} I(ht(x) = µt(y))
  27. Leveraging Bagging for Evolving Data Streams ME.

      1: Initialize base models hm for all m ∈ {1, 2, ..., M}
      2: for all training examples (x, y) do
      3:   for m = 1, 2, ..., M do
      4:     Set w = 1 if the example is misclassified, otherwise eT / (1 − eT)
      5:     Update hm with the current example with weight w
      6:   if ADWIN detects change in the error of one of the classifiers then
      7:     Replace the classifier with the highest error with a new one
      8: anytime output:
      9: return hfin(x) = argmax_{y∈Y} Σ_{t=1}^{T} I(ht(x) = y)
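The ME weighting from the listing above as a one-liner (the function name is illustrative; eT is the running error estimate maintained per classifier):

```python
def me_weight(misclassified, error_rate):
    """Leveraging Bagging ME: weight 1 on a mistake, e_T / (1 - e_T) otherwise.
    With error_rate < 0.5 the weight on correctly classified examples is
    below 1, so most updates are cheap: this is where the RAM-Hours
    savings reported later come from."""
    return 1.0 if misclassified else error_rate / (1.0 - error_rate)
```

For example, at a 20% error rate a correctly classified example is updated with weight 0.2 / 0.8 = 0.25.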
  28. Outline: 1. Data stream constraints; 2. Leveraging Bagging for Evolving Data Streams; 3. Empirical evaluation.
  29. What is MOA? {M}assive {O}nline {A}nalysis is a framework for mining data streams, based on experience with Weka and VFML. It is focussed on classification trees, but there is lots of active development: clustering, item set and sequence mining, regression. It is easy to extend, and easy to design and run experiments with.
  30. MOA: the bird. The moa (another native New Zealand bird) is not only flightless, like the weka, but also extinct.
  31. Leveraging Bagging: Empirical evaluation. (Figure: accuracy (%) over one million instances on the SEA dataset with three concept drifts, comparing Leveraging Bagging, Leveraging Bagging MC, ADWIN Bagging, and Online Bagging.)
  32. Empirical evaluation.

                              Accuracy  RAM-Hours
      Hoeffding Tree           74.03%     0.01
      Online Bagging           77.15%     2.98
      ADWIN Bagging            79.24%     1.48
      ADWIN Half Subagging     78.36%     1.04
      ADWIN Subagging          78.68%     1.13
      ADWIN Bagging WT         81.49%     2.74

      ADWIN Bagging strategies: half subagging (resampling without replacement of half of the instances); subagging (resampling without replacement); WT (bagging without taking out any instance, using 1 + Poisson(1)).
  33. Empirical evaluation.

                               Accuracy  RAM-Hours
      Hoeffding Tree            74.03%     0.01
      Online Bagging            77.15%     2.98
      ADWIN Bagging             79.24%     1.48
      Leveraging Bagging        85.54%    20.17
      Leveraging Bagging MC     85.37%    22.04
      Leveraging Bagging ME     80.77%     0.87

      Leveraging Bagging: using Poisson(λ). Leveraging Bagging MC: using Poisson(λ) and random output codes. Leveraging Bagging ME: using weight 1 if misclassified, otherwise eT / (1 − eT).
  34. Empirical evaluation.

                                         Accuracy  RAM-Hours
      Hoeffding Tree                      74.03%     0.01
      Online Bagging                      77.15%     2.98
      ADWIN Bagging                       79.24%     1.48
      Random Forest Leveraging Bagging    80.69%     5.51
      Random Forest Online Bagging        72.91%     1.30
      Random Forest ADWIN Bagging         74.24%     0.89

      Random Forests: the input training set is obtained by sampling with replacement; the nodes of the tree use only n random attributes to split; we only keep statistics for these attributes.
  35. Leveraging Bagging: Diversity. (Figure: Kappa-error diagrams for Leveraging Bagging and Online Bagging on the SEA data with three concept drifts, plotting 576 pairs of classifiers.)
  36. Summary. http://moa.cs.waikato.ac.nz/ Conclusions: new improvements for bagging methods using input randomization (improving accuracy by using Poisson(λ); improving RAM-Hours by using weight 1 if misclassified, otherwise eT / (1 − eT)), and new improvements for bagging methods using output randomization, with no need for multi-class base classifiers.
