PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions

8,005 views
7,971 views

Published on

Presenters: Indrė Žliobaitė, João Gama, Albert Bifet, and Mykola Pechenizkiy.

Published in: Technology
0 Comments
3 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
8,005
On SlideShare
0
From Embeds
0
Number of Embeds
4
Actions
Shares
0
Downloads
119
Comments
0
Likes
3
Embeds 0
No embeds

No notes for slide

PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions

  1. 1. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Handling Concept Drift: Importance, Challenges & Solutions A. Bifet, J. Gama, M. Pechenizkiy, I. Žliobaitė PAKDD-2011 Tutorial May 27, Shenzhen, China http://www.cs.waikato.ac.nz/~abifet/PAKDD2011/
  2. 2. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Motivation for the Tutorial • in the real world data – more and more data is organized as data streams – data comes from complex environment, and it is evolving over time – concept drift = underlying distribution of data is changing • attention to handling concept drift is increasing, since predictions become less accurate over time • to prevent that, learning models need to adapt to changes quickly and accurately PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 2 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  3. 3. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Objectives of the Tutorial • many approaches to handle drifts in a wide range of applications have been proposed • the tutorial aims to provide an integrated view of the adaptive learning methods and application needs PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 3 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  4. 4. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Outline Block 1: PROBLEM Block 2: TECHNIQUES Block 3: EVALUATION Block 4: APPLICATIONS OUTLOOK PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 4 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  5. 5. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Presenters & ScheduleIndrė Žliobaitė Block 1: what concept drift is, why it needs 14:05 – 14:50 to be handled and how João Gama 14:50 – 15:40 Block 2: approaches for adaptive machine learning to handle concept drift PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 5 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  6. 6. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Presenters & Schedule Albert Bifet Block 3: training/evaluation 16:00 – 16:45 of adaptive learning approaches and a software demoMykola Pechenizkiy Block 4: challenges for handling concept drift 16:45 – 17:30 in different applications PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 6 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  7. 7. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 7 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  8. 8. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Block 1: introduction PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 8 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  9. 9. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook The goals: • to introduce the problem of concept drift in supervised learning • to overview and categorize the main principles how concept drift can be (and is) handled PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 9 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  10. 10. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Example: production PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 10 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  11. 11. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook many processWhat CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook many process parameters parameters Example: production prediction predict quality reduced and transformed parameters t1, t2 q 3 79 monitor 78 3S Konz. (INA&INAL) 2 2S 77 1 76 75 Target t[2] 0 74 -1 73 2S 72 -2 3S 0 100 200 300 400 -3 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 Num t[1] SIMCA-P+ 11 - 13.12.2005 10:17:16 PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 11 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  12. 12. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook many processWhat CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook many process parameters parameters Example: production prediction predict quality reduced and transformed parameters t1, t2 q 3 79 monitor 78 3S Konz. (INA&INAL) 2 2S 77 1 76 75 Target t[2] 0 74 -1 73 2S 72 -2 3S 0 100 200 300 400 -3 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 Num t[1] SIMCA-P+ 11 - 13.12.2005 10:17:16 after 2-3 months predictions get “out of tune” PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 12 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  13. 13. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Static model Model does not change Process changes source: Evonik Industries PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 13 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  14. 14. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook the supplier of raw a sensor breaks/ new operating material changes “wears off”/ crew is replaced Model does not change Process changes source: Evonik Industries new regulations to new production save electricity procedures PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 14 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  15. 15. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Desired Properties of a System To Handle Concept Drift • Adapt to concept drift asap • Distinguish noise from changes – robust to noise, but adaptive to changes • Recognizing and reacting to reoccurring contexts • Adapting with limited resources – time and memory PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 15 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  16. 16. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Setting: adaptive learning PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 16 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  17. 17. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Supervised Learning Training: Learning Testing X a mapping function population data y = L (x) 2. Application: Applying L to unseen data X Historical data Learner y = L (X) 1. training y labels 2. application testing labels ? y PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 17 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  18. 18. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Learning with concept drift population Training: = ?? Testing data X y = L (x) population Application: y = L?? (X) X Historical data Learner y labels testing labels ? y PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 18 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  19. 19. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Online setting ? time Data arrives in a stream PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 19 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  20. 20. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Adaptive Learning ? time updated model as time passes the model updates/retrains prediction itself if needed PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 20 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  21. 21. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook How to Adapt Select the right training data and or adjust retrain the model incrementally ? PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 21 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  22. 22. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook The main strategies how to handle concept drift Images.com PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 22 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  23. 23. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Adaptive learning strategies Triggering EvolvingSingle classifier Detectors Forgetting variable windows fixed windows, Instance weighting Ensemble Dynamic Contextual ensemble adaptive dynamic integration, combination rules meta learning PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 23 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  24. 24. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Adaptive learning strategies Triggering Evolvingchange detection and adapting at every stepa follow up reactionSingle classifier Detectors Forgetting variable windows fixed windows, Instance weighting Ensemble Dynamic Contextual ensemble adaptive dynamic integration, combination rules meta learning PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 24 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  25. 25. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Adaptive learning strategies reactive, Triggering Evolving forgettingSingle classifier Detectors Forgetting variable windows fixed windows, Instance weighting Ensemble Dynamic Contextual ensemble maintain adaptive dynamic integration, some combination rules meta learning memory PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 25 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  26. 26. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Adaptive learning strategies Triggering Evolving forget old data and retrain at a fixed rateSingle classifier Detectors Forgetting fixed windows, Instance weighting Ensemble Dynamic Contextual ensemble PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 26 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  27. 27. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Fixed Training Window time train predict PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 27 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  28. 28. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Fixed Training Window time PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 28 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  29. 29. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Adaptive learning strategies Triggering Evolving detect a change and cutSingle classifier Detectors Forgetting variable windows Ensemble Dynamic Contextual ensemble PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 29 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  30. 30. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Variable Training Window detect a change PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 30 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  31. 31. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Variable Training Window drop old data PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 31 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  32. 32. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Variable Training Window retrain the model predict PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 32 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  33. 33. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Adaptive learning strategies Triggering EvolvingSingle classifier Detectors Forgetting Ensemble Dynamic Contextual ensemble adaptive build many models, combination rules dynamically combine PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 33 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  34. 34. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Dynamic Ensemble Classifier 1 Classifier 2 vote Classifier 3 Classifier 4 PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 34 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  35. 35. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Dynamic Ensemble voter 1 time voter 2 voter 3 voter 4 TRUE PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 35 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  36. 36. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Dynamic Ensemble punish voter 1 voter 1 reward voter 2 voter 2 punish voter 3 voter 3 reward voter 4 voter 4 TRUE PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 36 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  37. 37. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Dynamic Ensemble punish voter 1 voter 1 reward voter 2 voter 2 punish voter 3 voter 3 reward voter 4 voter 4 TRUE TRUE PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 37 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  38. 38. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Dynamic Ensemble punish voter 1 voter 1 voter 1 reward voter 2 voter 2 voter 2 punish voter 3 voter 3 voter 3 reward voter 4 voter 4 voter 4 TRUE TRUE PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 38 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  39. 39. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Adaptive learning strategies Triggering Evolving Single classifier Detectors Forgetting Ensemble Dynamic Contextual ensemblebuild many models, dynamic integration,switch models according meta learningto the observed incoming data PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 39 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  40. 40. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Contextual (Meta) Approaches partition the training data Group 1 = Classifier 1 build classifiers Group 3 = Classifier 3 Group 2 = Classifier 2 PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 40 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  41. 41. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Contextual (Meta) Approaches find to which partition the new instance belongs and assign the classifier Group 1 = Classifier 1 Group 3 = Classifier 3 Group 2 = Classifier 2 PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 41 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  42. 42. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook • adaptive learning approaches implicitly or explicitly assume some type of change PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 42 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  43. 43. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: OutlookWhat CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Types of Changes: speed mean SUDDEN time mean GRADUAL time PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 43 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  44. 44. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Types of Changes: reoccurrence meanALWAYS NEW* time mean time meanREOCCURING time PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 44 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  45. 45. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: OutlookWhat CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Typically Used* Triggering Evolving (change detection) (adapting every step) sudden driftSingle classifier Detectors Forgetting variable windows fixed windows, Instance weighting Ensemble Dynamic Contextual ensemble adaptive * this mapping is dynamic integration, fusion rules not strict or crisp meta learning PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 45 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  46. 46. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: OutlookWhat CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Typically Used Triggering Evolving (change detection) (adapting every step)Single classifier Detectors Forgetting variable windows fixed windows, Instance weighting Ensemble Dynamic Contextual ensemble adaptive dynamic integration, fusion rules meta learning PAKDD-2011 Tutorial, May 27, Shenzhen, China gradual drift 46
  47. 47. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Typically Used Triggering Evolving (change detection) (adapting every step) Single classifier Detectors Forgettingreoccuring drift variable windows fixed windows, Instance weighting Ensemble Dynamic Contextual ensemble adaptive dynamic integration, fusion rules meta learning PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 47 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  48. 48. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: OutlookWhat CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Which approach to use? • changes occur over time • we need models that evolve over time • choice of technique depends on – what type of change is expected – user goals/ applications PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 48 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  49. 49. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Block summary Images.com PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 49 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  50. 50. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Block summary • Concept drift is a problem in data stream applications, since static models lose accuracy • Different adaptive techniques exist • The choice of technique depends on – what change is expected – peculiarities of the task and application PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 50 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  51. 51. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Block 2: methods and techniques PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 51 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  52. 52. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook The goal: to discuss selected popular approaches from each of the major types in more detail • Data management Forgetting • Detection Methods Detectors Dynamic • Adaptation methods ensembles • Decision model management Contextual PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 52 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  53. 53. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Input Data Short term Memory Data Management Control Learning Detection Adaptation Memory Model Management Model Adaptation PAKDD-2011 Tutorial, Output Model 53 May 27, Shenzhen, China
  54. 54. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Data Management • Characterize the information stored in memory to maintain a decision model consistent with the actual state of the nature. – Full Memory. – Partial Memory. PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 54 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  55. 55. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Weighting examples• Full Memory. – Store in memory sufficient statistics over all the examples. – Weighting the examples accordingly to their age. – Oldest examples are less important.• Weighting examples based on the age: wλ ( x) = exp(−λt x ) where example x was found tx time steps ago PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 55 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  56. 56. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Weight as a Function of Age PAKDD-2011 Tutorial, 56 May 27, Shenzhen, China
  57. 57. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Partial Memory • Partial Memory. – Store in memory only the most recent examples. – Examples are stored in a first-in first-out data structure. – At each time step the learner induces a decision model using only the examples that are included in the window. PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 57 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  58. 58. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Partial Memory • The key difficulty is how to select the appropriate window size: – Small window • can assure a fast adaptability in phases with concept changes • In more stable phases it can affect the learner performance – Large window • produce good and stable learning results in stable phases • can not react quickly to concept changes. PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 58 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  59. 59. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Partial Memory - Windows • Fixed Size windows. – Store in memory a fixed number of the most recent examples. – Whenever a new example is available: • it is stored in memory and • the oldest one is discarded. – This is the simplest method to deal with concept drift and can be used as a baseline for comparisons. • Adaptive Size windows. – the set of examples in the window is variable. – They are used in conjunction with a detection model. – Decreasing the size of the window whenever the detection model signals drift and increasing otherwise. PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 59 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  60. 60. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Discussion • The data management model also indicates the forgetting mechanism. – Weighting examples: • Corresponds to a gradual forgetting. • The relevance of old information is less and less important. – Time windows • Corresponds to abrupt forgetting; • The examples are deleted from memory. – Of course we can combine both forgetting mechanisms by weighting the examples in a time window. PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 60 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  61. 61. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Discussion • These methods are blind adaptation models: – There is no explicit change detection – Do not provide: • Any indication about change points • Dynamics of the process generating data PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite 61 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  62. 62. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Detection Methods • The Detection Model characterizes the techniques and mechanisms for explicit drift detection. – An advantage of the detection model is they can provide: • Meaningful descriptions: – indicating change-points or – small time-windows where the change occurs • Quantification of the changes. PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite 62 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  63. 63. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Detection Methods • Monitoring the evolution of performance indicators. (See Klinkenberg (98) for a good overview of these indicators) – Performance measures, – Properties of the data, etc • Monitoring distributions on two different time- windows (See D.Kifer, S.Ben-David, J.Gehrke; VLDB 04) – A reference window over past examples – A window over the most recent examples PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite 63 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  64. 64. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: OutlookMonitoring the evolution of performance indicators • Klinkenberg (1998)proposed monitoring the values of three performance indicators: – Accuracy, recall and precision over time, – and then, comparing it to a confidence interval of standard sample errors – for a moving average value using the last M batches of each particular indicator. • FLORA family of algorithms (Widmer and Kubat, 1996) monitors accuracy and model size. – includes a window adjustment heuristic for a rule- based classifier. – FLORA3: dealing with recurring concepts. PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite 64 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  65. 65. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Monitoring distributions on two different time-windows• D.Kifer, S.Ben-David, J.Gehrke (VLDB 04) – Propose algorithms (statistical tests based on Chernoff bound) that examine samples drawn from two probability distributions and decide whether these distributions are different.• Gama, Fernandes, Rocha: VFDTc (IDA 2006) – For streaming data decision tree induction • Exploits tree structure: nested trees – Continuously monitoring differences between two class- distribution of the examples: • the distribution when a node was built and the class-distribution when a node was a leaf and • the weighted sum of the class-distributions in the leaves descendant of that node. PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite 65 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  66. 66. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Hoeffding Trees • Multiple windows – Each node recives information from different windows • Root node: oldest data • Leaves: most recent data PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite 66 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  67. 67. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook The RS Method • Implemented in the VFDTc system (IDA 2006) • For each decision node i, compute two estimates of the classification error. – Static error (SEi): • the distribution of the error at node i ; – Backed up error (BUEi): • the sum of the error distributions of all the descending leaves of the node i ; – With these two distributions: • we can detect the concept change, • by verifying the condition SEi ≤ BUEi PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite 67 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  68. 68. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook RS Method• Each new example traverses the tree from the root to a leaf – At the leaf: update the value of SEi – It makes the opposite path, and update the values of SEi and BUEi for each internal node, – Verify the regularization condition SEi ≤ BUEi : • If SEi ≤ BUEi then the node i is pruned to a new leaf. PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite 68 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  69. 69. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite 69 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  70. 70. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook ADAPTATION METHODS PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite 70 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  71. 71. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Adaptation Methods • The Adaptation model characterizes the changes in the decision model do adapt to the most recent examples. – Blind Methods: • Methods that adapt the learner at regular intervals without considering whether changes have really occurred. – Informed Methods: • Methods that only change the decision model after a change was detected. They are used in conjunction with a detection model. PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite 71 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  72. 72. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Granularity of Decision Models • Occurrences of drift can have impact in part of the instance space. – Global models: Require the reconstruction of all the decision model. • like naive Bayes, SVM, etc – Granular decision models: Require the reconstruction of parts of the decision model • like decision rules, decision trees PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite 72 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  73. 73. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook CVFDT algorithm• Mining time-changing data streams , G. Hulten, L. Spencer, and P. Domingos; ACM SIGKDD 2001• Process examples from the stream indefinitely. – For each example (x, y), • Pass (x, y) down to a set of leaves using HT and all alternate trees of the nodes (x, y) passes through. • Add (x, y) to the sliding window of examples.• Periodically scans the internal nodes of HT.• Start a new alternate tree when a new winning attribute is found. – Tighter criteria to avoid excessive alternate tree creation. – Limit the total number of alternate trees PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite 73 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  74. 74. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Smoothly adjust to concept drift• Alternate trees are grown the Age<30? same way HT is. Yes No• Periodically each node with non-empty alternate trees enter a testing mode. Married? Car Type= Yes Sports Car? – M training examples to Yes No compare accuracy. Yes No – Prune alternate trees with Yes No Experience No non-increasing accuracy over <1 year? time. Yes No – Replace if an alternate tree is more accurate. No Yes PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite 74 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  75. 75. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Decision model management • Model management characterizes the number of decision models needed to maintain in memory. • The key issue here is the assumption that data generated comes from multiple distributions, – at least in the transition between contexts. – Instead of maintaining a single decision model several authors propose the use of multiple decision models. PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite 75 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  76. 76. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Dynamic Weighted Majority • A seminal work, is the system presented by Kolter and Maloof (ICDM03, ICML05). • The Dynamic Weighted Majority algorithm (DWM) is an ensemble method for tracking concept drift. – Maintains an ensemble of base learners, – Predicts using a weighted-majority vote of these experts. – Dynamically creates and deletes experts in response to changes in performance. PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite 76 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  77. 77. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook DETECTION ALGORITHMS PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite 77 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  78. 78. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: OutlookChange Detection in Predictive Learning • When there is a change in the class-distribution of the examples: – The actual model does not correspond any more to the actual distribution. – The error-rate increases • Basic Idea: – Learning is a process. – Monitor the quality of the learning process: • Using Statistical Process Control techniques. – Monitor the evolution of the error rate. – Main Problems: • How to distinguish Change from Noise? • How to React to drift? PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite 78 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  79. 79. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook SPC-Statistical Process Control Learning with Drift Detection ,Gama, Medas, Gladys, Rodrigues; SBIA- LNCS Springer, 2004. • Suppose a sequence of examples in the form < xi,yi> • The actual decision model classifies each example in the sequence – In the 0-1 loss function, predictions are either True or False – The predictions of the learning algorithm are sequences: T, F,T, F,T, F,T,T,T, F, … – The Error is a random variable from Bernoulli trials. – The Binomial distribution gives the general form of the probability of observing a F: • pi = (#F/i) • where i is the number of trials. si = pi (1 − pi ) / i PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite 79 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  80. 80. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: OutlookStatistical Processing Control Algorithm • Maintains two registers: – Pmin and Smin such that Pmin + Smin = min(pi + si ) – Minimum of the error rate taking into account the variance of the estimator. • At example j : the error of the learning algorithm will be – Out-control if pj + sj > pmin + α smin – In-control if pj + sj < pmin + β smin – Warning Level: if pmin + α smin > pj + sj > pmin + smin • The constants α and β depend on the desired confidence level. – Admissible values are = 2 and = 3. PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite 80 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  81. 81. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Statistical Processing Control PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite 81 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  82. 82. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Illustrative Example: SEA Concepts• The stream: PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite 82 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  83. 83. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook The Individual Error of the Models PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite 83 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  84. 84. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: OutlookWhat CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook System Error PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite 84 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  85. 85. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Sequential Analysis • Sequential analysis developed a set of methods for monitoring change detection: – Sequential Probability Ratio Test - SPRT algorithm, D.Wald, Sequential Tests of Statistical Hypotheses. Annals of Mathematical Statistics 16 (2): 117–186; 1945 – Cumulative Sums - CUSUM: E. Page, Continuous Inspection Scheme. Biometrika 41,1954 PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite 85 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  86. 86. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook CUSUM Algorithms The CUSUM test is used to detect significant increases (or decreases) in the successive observations of a random variable x • Detecting significant increases: • Detecting decreases: – g0 = 0 – g0 = 0 – gt = max(0; gt-1+ (xt - α)) – gt = min(0; gt-1+ (xt - α)) • The decision rule is: • The decision rule is: – if gt > λ then alarm and gt = 0. – if gt < λ then alarm and gt = 0. PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite 86 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  87. 87. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Page-Hinckley Test • The PH test is a sequential adaptation of the detection of an abrupt change in the average of a Gaussian signal. – It considers a cumulative variable mT , defined as the cumulated difference between the observed values and their mean till the current moment: PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite 87 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  88. 88. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook The Page-Hinckley Test • The minimum value of mt is also computed: – MT = min(mt ; t = 1, …,T). • The test monitors the difference between MT and mT : – PHT = mT - MT . • When this difference is greater than a given threshold (λ) we alarm a change in the distribution. PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite 88 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  89. 89. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Illustrative Example – The left figure plots the on-line error rate of a learning algorithm. – The center plot is the accumulated on-line error. The slope increases after the occurrence of a change. – The right plot presents the evolution of the PH statistic. PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite 89 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  90. 90. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Page-Hinckley Test • The PHt test is memoryless, and its accuracy depends on the choice of parameters α and λ. – Both parameters are relevant to control the trade- off between earlier detecting true alarms by allowing some false alarms. • The threshold λ depends on the admissible false alarm rate. – Increasing λ will entail fewer false alarms, but might miss some changes. PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite 90 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  91. 91. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: OutlookAlgorithm Adaptive Window-ADWIN• Learning from Time-Changing Data with Adaptive Windowing, A.Bifet, R.Gavaldà (SDM07)• Whenever two “large enough” subwindows of W exhibit “distinct enough” averages, – we can conclude that the corresponding expected values are different, and – the older portion of the window is dropped PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite 91 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  92. 92. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Algorithm ADWIN• The value of εcut for a partition W0 . W1 of W is computed as follows: – Let n0 and n1 be the lengths of W0 and W1 and – Let n be the length of W, so n = n0 + n1. – Let ηW0 and ηW1 be the averages of the values in W0 and W1, and W0 and W1 their expected values. – To obtain totally rigorous performance guarantees we define: PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite 92 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  93. 93. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Characteristics of Algorithm ADWIN • False positive rate bound: – If ηt remains constant within W, the probability that ADWIN shrinks the window at this step is at most δ. • False negative rate bound. – Suppose that for some partition of W in two parts W0W1 (where W1 contains the most recent items) we have: |η W0 - η W1| > 2*εcut. Then with probability 1 - δ ADWIN shrinks W to W1, or shorter. PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite 93 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  94. 94. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Illustrative Examples Abrupt Change Gradual Change PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite 94 May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
  95. 95. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: OutlookWhat CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook ADWIN – Compressing Space • Using exponential histograms • Requires O(log W) space and time • It can provide exact counts for all the subwindows Example: Counting 1’s in a bit stream Compress: Compress again:A new 1 arrives: There are 3 buckets of size 1: There are 3 buckets of size 2: Remove buckets when a drift is detect:

×