Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing

292 views

Published on

Paper: Scalable Detection of Concept Drifts on Data Streams
with Parallel Adaptive Windowing
Abstract: Machine learning techniques for data stream analysis suffer from concept drifts such as changed user preferences, varying weather conditions, or economic changes. These concept drifts cause wrong predictions and lead to incorrect business decisions. Concept drift detection methods such as adaptive windowing (Adwin) allow for adapting to concept drifts on the fly.
In this paper, we examine Adwin in detail and point out its throughput bottlenecks. We then introduce several parallelization alternatives to address these bottlenecks. Our optimizations lead
to a speedup of two orders of magnitude over the original Adwin implementation. Thus, we explore parallel adaptive windowing to provide scalable concept detection for high-velocity data streams with millions of tuples per second.

Published in: Software
  • Be the first to comment

  • Be the first to like this

Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing

  1. 1. P. M. Grulich, R. Saitenmacher, J. Traub, S. Breß, T. Rabl, V. Markl - Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing , EDBT 2018 1 Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing Presentation at 18.04.2018 Philipp M. Grulich, René Saitenmacher, Jonas Traub, Sebastian Breß, Tilmann Rabl, Volker Markl EDBT, Vienna, 2018
  2. 2. P. M. Grulich, R. Saitenmacher, J. Traub, S. Breß, T. Rabl, V. Markl - Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing , EDBT 2018 2 Background: Adaptive Windowing (ADWIN) [Bifet ’07] • Maintain an adaptive window which is the basis for computing a ML model. • Grow the window as long as there is no concept drift detected. • As a result, the model can rely on growing training data.
  3. 3. P. M. Grulich, R. Saitenmacher, J. Traub, S. Breß, T. Rabl, V. Markl - Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing , EDBT 2018 3 Background: Adaptive Windowing (ADWIN) [Bifet ’07] • Maintain an adaptive window which is the basis for computing a ML model. • Grow the window as long as there is no concept drift detected. • As a result, the model can rely on growing training data.
  4. 4. P. M. Grulich, R. Saitenmacher, J. Traub, S. Breß, T. Rabl, V. Markl - Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing , EDBT 2018 4 Background: Adaptive Windowing (ADWIN) [Bifet ’07]
  5. 5. P. M. Grulich, R. Saitenmacher, J. Traub, S. Breß, T. Rabl, V. Markl - Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing , EDBT 2018 5 Background: Adaptive Windowing (ADWIN) [Bifet ’07]
  6. 6. P. M. Grulich, R. Saitenmacher, J. Traub, S. Breß, T. Rabl, V. Markl - Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing , EDBT 2018 6 Initial Performance Study: Where do we Spend Execution Time?
  7. 7. P. M. Grulich, R. Saitenmacher, J. Traub, S. Breß, T. Rabl, V. Markl - Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing , EDBT 2018 7 Initial Performance Study: Where do we Spend Execution Time?  Parallelize Cut Detection!
  8. 8. P. M. Grulich, R. Saitenmacher, J. Traub, S. Breß, T. Rabl, V. Markl - Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing , EDBT 2018 8 Scope of Parallelization Cut-Check Decoupling • Cut Checks do not block processing. INTRA Cut-Check Parallelization • Parallelize the execution of one iteration. INTER Cut-Check Parallelization • Run multiple iterations concurrently.
  9. 9. P. M. Grulich, R. Saitenmacher, J. Traub, S. Breß, T. Rabl, V. Markl - Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing , EDBT 2018 9 INTRA Cut Check Parallelization • Halve the latency / execution time of the cut detection. • Introducing a second thread which moves opposite to the first thread and performs half of the cut checks. • Can be extended to more threads.
  10. 10. P. M. Grulich, R. Saitenmacher, J. Traub, S. Breß, T. Rabl, V. Markl - Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing , EDBT 2018 10 INTRA Cut Check Parallelization • Decouple cut detection from the insertion of stream tuples. • Execute cut detection on a snapshot histogram. • Use multi-version concurrency control (MVCC). • roll back in case of concept drifts.
  11. 11. P. M. Grulich, R. Saitenmacher, J. Traub, S. Breß, T. Rabl, V. Markl - Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing , EDBT 2018 11 Experiment Results 1) Optimistic Adwin is two orders of magnitude faster than the original Adwin implementation and 54 times faster than an optimized sequential implementation. 2) Half-Cut Adwin is 10 times faster than the original Adwin implementation and almost 2 times faster than an optimized sequential implementation. 3) Both have a low microseconds latency.
  12. 12. P. M. Grulich, R. Saitenmacher, J. Traub, S. Breß, T. Rabl, V. Markl - Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing , EDBT 2018 12 Open Source Repositories • Parallel Adwin Open Source Repository – github.com/TU-Berlin-DIMA/parallel-ADWIN • Original Adwin Repository – github.com/abifet/adwin

×