Efficient Data Stream Classification via Probabilistic Adaptive Windows


  1. Efficient Data Stream Classification via Probabilistic Adaptive Windows. Albert Bifet (Yahoo! Research Barcelona), Jesse Read (Universidad Carlos III, Madrid, Spain), Bernhard Pfahringer and Geoff Holmes (University of Waikato, Hamilton, New Zealand). SAC 2013, 19 March 2013.
  2. Data Streams: Big Data & Real Time.
  3. Data Streams:
     • The sequence is potentially infinite.
     • High volume of data: only sublinear space is available.
     • High speed of arrival: only sublinear time per example is available.
     • Once an element from a data stream has been processed, it is discarded or archived.
  4. Data Streams: approximation algorithms give a small error rate with high probability. An algorithm (ε, δ)-approximates F if it outputs an estimate F̃ for which Pr[|F̃ − F| > εF] < δ. For example, with ε = 0.01 and δ = 0.05, the estimate is within 1% of F with probability at least 95%.
  5. Data Stream Sliding Window: sampling algorithms.
     • Giving equal weight to old and new examples: RESERVOIR SAMPLING.
     • Giving more weight to recent examples: PROBABILISTIC APPROXIMATE WINDOW.
     (A sketch of reservoir sampling follows this slide.)
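Reservoir sampling keeps a fixed-size uniform sample from a stream of unknown length, which is what "equal weight to old and new examples" means in practice. A minimal Python sketch of the classic Algorithm R; the function name and parameters are illustrative, not taken from the slides:

    import random

    def reservoir_sample(stream, k):
        """Maintain a uniform random sample of k items from a stream."""
        reservoir = []
        for n, item in enumerate(stream, start=1):
            if n <= k:
                reservoir.append(item)       # fill the reservoir first
            else:
                j = random.randrange(n)      # uniform index in [0, n)
                if j < k:                    # new item replaces one with prob k/n
                    reservoir[j] = item
        return reservoir

Every item seen so far ends up in the reservoir with the same probability k/n, so the sample never favours recent data.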
  6. 8-Bit Counter: 1 0 1 0 1 0 1 0. What is the largest number we can store in 8 bits?
  7. 8-Bit Counter. What is the largest number we can store in 8 bits?
  8. 8-Bit Counter. [Plot of f(x) = log(1 + x)/log 2 for x in [0, 100]; f(0) = 0, f(1) = 1.]
  9. 8-Bit Counter. [Plot of f(x) = log(1 + x)/log 2 for x in [0, 10]; f(0) = 0, f(1) = 1.]
  10. 8-Bit Counter. [Plot of f(x) = log(1 + x/30)/log(1 + 1/30) for x in [0, 10]; f(0) = 0, f(1) = 1.]
  11. 8-Bit Counter. [Plot of f(x) = log(1 + x/30)/log(1 + 1/30) for x in [0, 100]; f(0) = 0, f(1) = 1.] (A sketch of this logarithmic encoding follows this slide.)
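The plots suggest storing counts on a logarithmic scale: an 8-bit code can then represent counts far beyond 255. A minimal sketch under that reading of the slides; the function names and the default scale d = 30 are illustrative assumptions:

    import math

    def encode(x, d=30):
        """Map a count x to an 8-bit code via f(x) = log(1 + x/d) / log(1 + 1/d)."""
        c = round(math.log(1 + x / d) / math.log(1 + 1 / d))
        return min(c, 255)                   # clamp to 8 bits

    def decode(c, d=30):
        """Invert f: recover the approximate count represented by code c."""
        return d * ((1 + 1 / d) ** c - 1)

    # With d = 30, code 255 decodes to roughly 30 * ((31/30)**255 - 1) ≈ 1.3e5,
    # so the 8-bit counter reaches far beyond 255 at the cost of rounding error.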
  12. 8-Bit Counter: MORRIS APPROXIMATE COUNTING ALGORITHM.
      Init counter c ← 0
      for every event in the stream:
          rand ← random number between 0 and 1
          if rand < p then c ← c + 1
      What is the largest number we can store in 8 bits?
  13. 8-Bit Counter: Morris approximate counting (as above). With p = 1/2 we can store up to 2 × 256 with standard deviation σ = √n/2.
  14. 8-Bit Counter: Morris approximate counting (as above). With p = 2^(−c), E[2^c] = n + 2 with variance σ² = n(n + 1)/2.
  15. 8-Bit Counter: Morris approximate counting (as above). If p = b^(−c), then E[b^c] = n(b − 1) + b and σ² = (b − 1)n(n + 1)/2. (A Python sketch follows this slide.)
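A minimal Python sketch of the Morris counter with a general base b, following the pseudocode and the expectation E[b^c] = n(b − 1) + b from the slides; the class name and default base are illustrative:

    import random

    class MorrisCounter:
        """Approximate counting: store only c, which grows logarithmically in n."""

        def __init__(self, b=2.0):
            self.b = b
            self.c = 0

        def increment(self):
            # Increment c with probability p = b**(-c), as in the pseudocode.
            if random.random() < self.b ** (-self.c):
                self.c += 1

        def estimate(self):
            # Invert E[b**c] = n(b - 1) + b to estimate the true count n.
            return max(0.0, (self.b ** self.c - self.b) / (self.b - 1))

With b = 2 this reduces to the classic estimate 2^c − 2, and an 8-bit counter c can represent counts up to about 2^255.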
  16. PROBABILISTIC APPROXIMATE WINDOW (PAW).
      Init window w ← ∅
      for every instance i in the stream:
          store the new instance i in window w
          for every instance j in the window:
              rand ← random number between 0 and 1
              if rand > b^(−1) then remove instance j from window w
      PAW maintains a sample of instances in logarithmic memory, giving greater weight to newer instances. (A Python sketch follows this slide.)
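A minimal Python sketch of one PAW step, directly following the pseudocode above; the function name is illustrative, and the default b is an assumption (a value slightly above 1, so instances survive many steps), not a parameter given on the slide:

    import random

    def paw_update(window, instance, b=1.001):
        """Add a new instance, then probabilistically evict stored ones.

        Each stored instance survives a step with probability b**(-1), so an
        instance of age a remains with probability b**(-a): newer instances
        get more weight and the window stays small.
        """
        window.append(instance)
        window[:] = [j for j in window if random.random() <= b ** (-1)]
        return window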
  17. Experiments: Methods.
      Abbr.      Classifier                       Parameters
      NB         Naive Bayes
      HT         Hoeffding Tree
      HT_LB      Leveraging Bagging with HT       n = 10
      kNN        k Nearest Neighbour              w = 1000, k = 10
      kNN_W      kNN with PAW                     w = 1000, k = 10
      kNN_WA     kNN with PAW+ADWIN               w = 1000, k = 10
      kNN_W^LB   Leveraging Bagging with kNN_W    n = 10
      The methods we consider. Leveraging Bagging methods use n models. kNN_WA empties its window (of max size w) when drift is detected (using the ADWIN drift detector). (A sketch of kNN over a PAW window follows this slide.)
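To make the kNN_W rows concrete, here is a hedged sketch of prediction with kNN over a PAW-maintained window of (features, label) pairs, reusing paw_update from above; the Euclidean metric and majority vote are standard choices, not details taken from the slides:

    import math
    from collections import Counter

    def knn_predict(window, x, k=10):
        """Classify x by majority vote among its k nearest neighbours."""
        neighbours = sorted(
            window, key=lambda item: math.dist(item[0], x)  # Euclidean distance
        )[:k]
        votes = Counter(label for _, label in neighbours)
        return votes.most_common(1)[0][0]

Because the window is sampled by PAW, the vote is automatically biased toward recent instances, which is how the classifier adapts to drift.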
  18. Experimental Evaluation. Table: the window size w for kNN and corresponding accuracy (%).
                      w = 100   w = 500   w = 1000   w = 5000
      Real Avg.         77.88     77.78      79.59      78.23
      Synth. Avg.       57.99     81.93      84.74      86.03
      Overall Avg.      62.53     80.28      82.59      83.11
  19. Experimental Evaluation. Table: the window size w for kNN and corresponding time (seconds).
                      w = 100   w = 500   w = 1000   w = 5000
      Real Tot.           297       998       1754       7900
      Synth. Tot.         371      1297       2313      10671
      Overall Tot.        668      2295       4067      18570
  20. Experimental Evaluation. Table: the window size w for kNN and corresponding RAM-Hours.
                      w = 100   w = 500   w = 1000   w = 5000
      Real Tot.         0.007     0.082      0.269      5.884
      Synth. Tot.       0.002     0.026      0.088      1.988
      Overall Tot.      0.009     0.108      0.357      7.872
  21. Experimental Evaluation. Table: summary of efficiency: accuracy (%) and RAM-Hours.
                   NB      HT    HT_LB    kNN   kNN_W   kNN_WA   kNN_W^LB
      Accuracy   56.19   73.95    83.75  82.59   82.92    83.19      84.67
      RAM-Hrs     0.02    1.57   300.02   0.36    8.08     8.80     250.98
  22. Conclusions: sampling algorithms for kNN.
      • Giving equal weight to old and new examples: RESERVOIR SAMPLING.
      • Giving more weight to recent examples: PROBABILISTIC APPROXIMATE WINDOW.
  23. Thanks!
