While the state of the art in Machine Learning offers practitioners effective techniques for dealing with static data sets, only academic results are available for data streams. In this presentation for the 4th Stream Reasoning workshop, I report on an effort by Alessio Bernardo (a student of mine) to set up a benchmark environment to (i) replicate academic results, (ii) run studies on real data to confirm those results, and (iii) study the research problem of "incremental rebalancing learning on evolving data streams".
4. Inductive Stream Reasoning
Introduction
An early attempt of mine
• How can we determine the optimal size of the window?
• What’s the correct way to perform evaluation?
D.F. Barbieri, D. Braga, S. Ceri, E. Della Valle, Y. Huang, V. Tresp,
A. Rettinger, H. Wermser: Deductive and Inductive Stream
Reasoning for Semantic Social Media Analytics.
IEEE Intelligent Systems 25(6): 32-41 (2010)
17/04/2019 Emanuele Della Valle & Alessio Bernardo
5. Inductive Stream Reasoning
Introduction
Adaptive Sliding Window (ADWIN)
Bifet, A. and Gavaldà, R., 2009, August. Adaptive learning from evolving data streams. In
International Symposium on Intelligent Data Analysis (pp. 249-260). Springer
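The idea behind ADWIN can be sketched in a few lines: keep a window of recent values and drop the oldest ones whenever two sub-windows have means that differ by more than a Hoeffding-style threshold. The sketch below is a naive illustration, not the bucket-based algorithm from the paper or the MOA implementation. The class name `SimpleAdwin`, the default `delta`, and the assumption that values lie in [0, 1] are choices made here for illustration only.

```python
import math
from collections import deque


class SimpleAdwin:
    """Naive adaptive window (hypothetical helper, O(window) work per update)."""

    def __init__(self, delta=0.002):
        self.delta = delta      # confidence parameter (illustrative default)
        self.window = deque()   # oldest value on the left, newest on the right

    @property
    def mean(self):
        return sum(self.window) / len(self.window) if self.window else 0.0

    def update(self, value):
        """Add a value assumed to be in [0, 1]; return True if the window shrank."""
        self.window.append(value)
        shrunk = False
        # Keep dropping the oldest value while some split still signals a change.
        while self._drop_oldest_if_changed():
            shrunk = True
        return shrunk

    def _drop_oldest_if_changed(self):
        n = len(self.window)
        if n < 2:
            return False
        total = sum(self.window)
        delta_prime = self.delta / n
        head_sum = 0.0
        for i, v in enumerate(self.window, start=1):
            if i == n:
                break  # the newer sub-window must hold at least one value
            head_sum += v
            n0, n1 = i, n - i
            mean0 = head_sum / n0
            mean1 = (total - head_sum) / n1
            m = 1.0 / (1.0 / n0 + 1.0 / n1)  # harmonic-style combination of sizes
            eps_cut = math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))
            if abs(mean0 - mean1) > eps_cut:
                self.window.popleft()
                return True
        return False


# Synthetic stream with an abrupt change from 0 to 1 at position 200.
stream = [0.0] * 200 + [1.0] * 200
detector = SimpleAdwin()
shrink_points = [t for t, x in enumerate(stream) if detector.update(x)]
```

On this synthetic stream the window only starts shrinking after the change point, and the window that remains is dominated by the new concept. A production implementation would compress the window into exponentially sized buckets to avoid the linear scan.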
6. Inductive Stream Reasoning
Introduction
Concept Drift and Streaming Machine Learning
• Hoeffding Adaptive Tree
• Adaptive Random Forest
• Temporally Augmented Classifier
A. Bifet, R. Gavaldà, G. Holmes, B. Pfahringer: Machine Learning for Data Streams: with
Practical Examples in MOA. The MIT Press (March 2, 2018)
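Streaming classifiers like these are commonly evaluated prequentially (test-then-train): each instance is first used to test the current model and only then to update it. The sketch below shows that loop under simple assumptions. `MajorityClass` is a hypothetical stand-in for a real streaming learner, and the `predict`/`learn` method names are illustrative rather than the MOA API.

```python
from collections import Counter


class MajorityClass:
    """Hypothetical baseline learner: always predicts the most frequent label seen."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        # No prediction is possible before the first label has been seen.
        return self.counts.most_common(1)[0][0] if self.counts else None

    def learn(self, x, y):
        self.counts[y] += 1


def prequential_accuracy(stream, model):
    """Test on each instance first, then train on it (interleaved test-then-train)."""
    correct = 0
    total = 0
    for x, y in stream:
        if model.predict(x) == y:   # test with the model built so far
            correct += 1
        model.learn(x, y)           # only afterwards reveal the label
        total += 1
    return correct / total if total else 0.0


# Tiny imbalanced stream: eight instances of class 0 followed by two of class 1.
stream = [((i,), 0) for i in range(8)] + [((i,), 1) for i in range(2)]
accuracy = prequential_accuracy(stream, MajorityClass())
```

Here the baseline reaches an accuracy of 0.7 while never predicting the minority class, which illustrates why accuracy alone can be misleading on imbalanced streams.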
9. Inductive Stream Reasoning
Work in progress
• A benchmarking environment
• Usage of the benchmark environment for
  • a replication study
  • a confirmation study
• Research on Incremental Rebalancing Learning on Evolving Data Streams
18. Inductive Stream Reasoning
Work in progress
Incremental Rebalancing Learning on Evolving Data Streams
Assumption: availability of a finite static batch of data
Chawla, Nitesh V., et al. SMOTE: synthetic minority over-sampling technique. Journal of artificial
intelligence research 16 (2002): 321-357.
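The core step of SMOTE helps explain the static-batch assumption: each synthetic sample is interpolated between a minority instance and one of its k nearest minority neighbours, so the whole minority set has to be available for the neighbour search. The sketch below is a simplified illustration. It picks base instances at random rather than generating a fixed number per instance as in the original paper, and the function name `smote_sketch` and its parameters are hypothetical.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority points by interpolating towards near neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority instances")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # k nearest minority neighbours of the base point (excluding itself).
        others = [p for j, p in enumerate(minority) if j != base_idx]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Four minority points on the corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sketch(minority, n_synthetic=10, k=2)
```

Every synthetic point lies on a segment between two existing minority points, so it stays inside their bounding box. On an evolving stream the minority set is never complete, which is the gap that incremental rebalancing has to address.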
23. Inductive Stream Reasoning
Conclusions
• Benchmarking environment that respects all the key features
• Replication study results are similar to the existing ones
• Confirmation study results are similar to standard ones for prequential evaluation, but validation tells a different story
• RebalanceStream appears as a small but promising step forward
24. Inductive Stream Reasoning
Future Work
• In the benchmarking environment, introduce throughput and latency as metrics
• In the replication study, make all the experiments presented in the literature replicable
• In the confirmation study, understand why the validation stream tells such a different story
• In the long term, add back deductive reasoning