OpenML.org: Networked Science and IoT Data Streams
Jan N. van Rijn
University of Freiburg
November 24, 2016
Motivation
Galileo Galilei (1564–1642)
Created the best telescopes
Discovered the rings of Saturn
Sent anagrams of his dis...
OpenML.org
Datasets
Data (ARFF) uploaded or referenced, versioned
Analysed, characterized, organized online
Indexed based on name, m...
Tasks
Data alone does not define an experiment
Tasks contain: data, target attribute, goals, procedures
Readable by tools, ...
Flows (algorithms)
Run locally, auto-registered by tools
Integrations + APIs (REST, Java, R, Python, . . . )
Runs
Flow uploads predictions
Predictions are evaluated on OpenML
Reproducible, linked to data, flows and researcher
Contai...
Analysis
Answer basic questions about performance of algorithms to study . . .
the effect / behaviour of parameters on a gi...
Effect of parameter
[Figure: predictive accuracy (%), axis 93 to 99, for RBFKernel(1), J48(2), IBk(1), Logistic(1), RandomForest(1) and REPTree(1)]
Effect of Feature Selection
[Figure: number of instances (256 to 65536) plotted against a second axis running from 1 to 16384]
Performance of Algorithms
[Figure: per-classifier results on a 0 to 1 axis; visible labels include JRip, LMT, HoeffdingTree, RandomTree and RandomForest]
Performance of Algorithms
105 datasets, 30 classifiers
Friedman - Nemenyi test (α = 0.05)
[Figure: classifier ranking diagram with a rank axis starting at 1]
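The Friedman-Nemenyi comparison named on this slide can be illustrated with a minimal sketch. It assumes the usual formulation based on average ranks; the accuracy matrix is made up, ties are not handled, and `q_alpha` has to be looked up in a table of critical values (2.343 is the commonly tabulated value for three classifiers at α = 0.05). None of the function names come from the slides.

```python
import math

def average_ranks(scores):
    """scores: one row per dataset, one accuracy per classifier (no ties assumed)."""
    k = len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: row[j], reverse=True)
        for rank, j in enumerate(order, start=1):  # rank 1 = best on this dataset
            totals[j] += rank
    return [t / len(scores) for t in totals]

def friedman_statistic(ranks, n):
    """Friedman chi-square computed from average ranks over n datasets."""
    k = len(ranks)
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4)

def nemenyi_cd(q_alpha, k, n):
    """Critical difference: average ranks further apart than this differ significantly."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))

# Hypothetical accuracies: 4 datasets (rows) x 3 classifiers (columns).
scores = [
    [0.90, 0.85, 0.70],
    [0.80, 0.75, 0.60],
    [0.95, 0.96, 0.80],
    [0.70, 0.65, 0.50],
]
ranks = average_ranks(scores)
chi2 = friedman_statistic(ranks, n=len(scores))
cd = nemenyi_cd(q_alpha=2.343, k=3, n=len(scores))
print(ranks, chi2, round(cd, 3))
```

With 105 datasets and 30 classifiers, as on the slide, the same functions apply once the matching critical value for 30 classifiers is supplied.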
Data Streams
Online learning
Many IoT applications in this paradigm
Example: Predict the electricity price for the next d...
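The learning loop behind this slide (predict, receive feedback, update, and possibly become obsolete after concept drift) can be sketched as a test-then-train evaluation. This is only an illustration: the learner is a majority-class predictor over a sliding memory, the stream is synthetic, and the names `prequential_accuracy` and `memory` are hypothetical.

```python
from collections import Counter, deque

def prequential_accuracy(labels, memory):
    """Test-then-train: predict each label, then learn from the revealed truth.

    The model predicts the majority class among the last `memory` labels seen.
    """
    recent = deque(maxlen=memory)
    correct = 0
    for actual in labels:
        prediction = Counter(recent).most_common(1)[0][0] if recent else None
        correct += prediction == actual  # test first ...
        recent.append(actual)            # ... then train on the feedback
    return correct / len(labels)

# Synthetic stream with an abrupt drift: the label changes halfway through.
stream = ["up"] * 50 + ["down"] * 50
forgetful = prequential_accuracy(stream, memory=5)
never_forgets = prequential_accuracy(stream, memory=len(stream))
print(forgetful, never_forgets)
```

The learner with a short memory recovers a few observations after the drift, while the one that never forgets keeps predicting the old concept, which is the sense in which a model can become obsolete.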
Performance of Data Streams Algorithms
[Figure: two plots comparing data stream classifiers; the second is a ranking over ranks 1 to 25 that lists HoeffdingOptionTree first]
Goal
Can we build a classifier that does better?
How can we use the experimental results in OpenML for this?
Probably! By co...
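The slides describe the approach as working on intervals of 1,000 observations and predicting, for the next interval, which classifier to use. A deliberately simple sketch of that idea follows. It assumes the selector picks whichever classifier scored best on the previous interval, which is a baseline chosen for illustration and not necessarily the author's method; the accuracies are invented.

```python
def select_per_interval(interval_accuracy, default):
    """For each interval, pick the classifier that scored best on the previous one.

    interval_accuracy maps classifier name -> list of accuracies, one per interval.
    The first interval has no history, so `default` is used there.
    """
    n_intervals = len(next(iter(interval_accuracy.values())))
    chosen = [default]
    for t in range(1, n_intervals):
        best = max(interval_accuracy, key=lambda name: interval_accuracy[name][t - 1])
        chosen.append(best)
    return chosen

# Hypothetical accuracies over four intervals of 1,000 observations each.
history = {
    "HoeffdingTree": [0.80, 0.82, 0.60, 0.61],
    "NaiveBayes": [0.75, 0.76, 0.77, 0.78],
}
picks = select_per_interval(history, default="HoeffdingTree")
print(picks)
```

The selector switches one interval late, because it only sees a drop in accuracy after the interval in which it happened.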
The OpenML approach
Many data streams (and tasks) from various sources
Real world: electricity, forest covertype, airline...
Meta-Features
Category Meta-features
Simple # Instances, # Attributes, # Classes, Dimensionality, Default Accuracy, # Obse...
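A few of the meta-features named in the slides can be computed directly from the class labels. The sketch below assumes that default accuracy means the share of the majority class and that class entropy is measured in bits; the dictionary keys are illustrative names, not a confirmed OpenML schema.

```python
import math
from collections import Counter

def simple_meta_features(labels):
    """A few meta-features of a classification dataset, computed from its class labels."""
    counts = Counter(labels)
    n = len(labels)
    majority = max(counts.values())
    minority = min(counts.values())
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "NumberOfInstances": n,
        "NumberOfClasses": len(counts),
        "MajorityClassSize": majority,
        "MinorityClassSize": minority,
        "DefaultAccuracy": majority / n,  # accuracy of always predicting the majority class
        "ClassEntropy": entropy,          # in bits
    }

features = simple_meta_features(["spam", "ham", "ham", "ham"])
print(features)
```

Attribute-level meta-features (numeric means, skewness, mutual information) would need the feature columns as well and are left out here.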
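Stream Landmarkers
Each row marks whether a landmarker was correct (✓) or wrong (✗) on the observations around position c inside window w; the last column is the fraction of ✓ marks.
l1   ✓ ✓ ✗ ✓ ✗ ✓ ✓ ✓ ✓ ✗   0.7
l2   ✓ ✓ ✗ ✓ ✓ ✓ ✓ ✗ ✗ ✓   0.7
l3   ✓ ✓ ✗ ✓ ✓ ✓ ✓ ✓ ✗ ✓   0.8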
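As a minimal sketch, assuming each row of the table above records whether a landmarker classified each of the last w = 10 observations correctly, the final column can be reproduced as the fraction of correct predictions in the window. The function name `window_accuracy` and the 0/1 encoding are illustrative and not taken from the slides.

```python
from collections import deque

def window_accuracy(outcomes, w):
    """Fraction of correct predictions among the last w outcomes (outcomes must be non-empty)."""
    window = deque(outcomes, maxlen=w)  # keeps only the most recent w entries
    return sum(window) / len(window)

# Rows of the table: 1 = correct (✓), 0 = wrong (✗).
rows = {
    "l1": [1, 1, 0, 1, 0, 1, 1, 1, 1, 0],
    "l2": [1, 1, 0, 1, 1, 1, 1, 0, 0, 1],
    "l3": [1, 1, 0, 1, 1, 1, 1, 1, 0, 1],
}
scores = {name: window_accuracy(marks, w=10) for name, marks in rows.items()}
print(scores)  # {'l1': 0.7, 'l2': 0.7, 'l3': 0.8}
```

A smaller `w` reacts faster to change but gives a noisier estimate, which is why the window size is worth tuning.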
Stream Landmarkers
$$
P(l', c, \alpha, L) =
\begin{cases}
1 & \text{iff } c = 0 \\
P(l', c - 1, \alpha, L) \cdot \alpha + \bigl(1 - L(l'(PS_c), l(PS_c))\bigr) \cdot (1 - \alpha) & \text{otherwise}
\end{cases}
\tag{1}
$$
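Equation (1) can be evaluated iteratively instead of recursively. The sketch below assumes that L is the zero-one loss and that the two arguments of L are the classifier's prediction and the true label for observation c; the value α = 0.9 is only an example.

```python
def faded_performance(predictions, labels, alpha):
    """Iterate equation (1): P starts at 1 and is updated once per observation."""
    p = 1.0  # base case, c = 0
    for predicted, actual in zip(predictions, labels):
        loss = 0.0 if predicted == actual else 1.0  # zero-one loss (an assumption)
        p = p * alpha + (1.0 - loss) * (1.0 - alpha)
    return p

always_right = faded_performance(["a", "b"], ["a", "b"], alpha=0.9)
one_mistake = faded_performance(["a", "a"], ["a", "b"], alpha=0.9)
print(always_right, one_mistake)
```

Each mistake pulls the estimate down by a factor α, and older observations fade geometrically, so no window of past predictions has to be stored.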
Classifier Output Difference
25 online classifiers (data streams)
[Figure: comparison of the classifiers; visible labels include NoChange, SGDHINGELOSS, SGDLOGLOSS and SPegasosHINGELOSS]
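Classifier Output Difference is commonly defined as the fraction of instances on which two classifiers give different predictions. Under that assumption, a pairwise comparison can be sketched as follows; the predictions are invented and only the classifier names are borrowed from the slide.

```python
from itertools import combinations

def output_difference(pred_a, pred_b):
    """Fraction of instances on which two classifiers disagree."""
    if len(pred_a) != len(pred_b):
        raise ValueError("prediction lists must have the same length")
    return sum(a != b for a, b in zip(pred_a, pred_b)) / len(pred_a)

# Hypothetical predictions of three classifiers on the same five instances.
predictions = {
    "NoChange": ["a", "a", "b", "b", "a"],
    "MajorityClass": ["a", "a", "a", "a", "a"],
    "NaiveBayes": ["a", "b", "b", "b", "a"],
}
pairwise = {
    (x, y): output_difference(predictions[x], predictions[y])
    for x, y in combinations(predictions, 2)
}
print(pairwise)
```

Classifiers with a low pairwise difference add little diversity to an ensemble, so a matrix like `pairwise` is a natural input for choosing the ensemble composition.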
Results
[Figure: per-method comparison on an axis from 0.25 to 1.00; visible labels include Majority Vote Ensemble and AWE(J48)]
[Figure: a second per-method comparison on a logarithmic axis (1, 10, 100, 1000, ...)]
Conclusions
Two techniques
Online Performance Estimation
Ensemble of heterogeneous classifiers
Individual performances are ...
Thank you for your attention
OpenML enables truly collaborative machine learning. Scientists can post important data, inviting anyone to help analyze it. OpenML structures and organizes all results online to show the state of the art and push progress.

OpenML is being integrated in most popular machine learning environments, so you can automatically upload all your data, code, and experiments. And if you develop new tools, there's an API for that, plus people to help you.

OpenML allows you to search, compare, visualize, analyze and download all combined results online. Explore the state of the art, improve it, build on it, ask questions and start discussions.


  1. 1. OpenML.org: Networked Science and IoT Data Streams. Jan N. van Rijn, University of Freiburg, November 24, 2016.
  3. 3. Motivation. Galileo Galilei (1564–1642): created the best telescopes; discovered the rings of Saturn; sent anagrams of his discoveries, instead of publishing the results.
  4. 4. OpenML.org
  5. 5. Datasets. Data (ARFF) uploaded or referenced, versioned; analysed, characterized, organized online; indexed based on name, meta-features, tags, etc.; support for other data formats (on request).
  6. 6. Tasks. Data alone does not define an experiment; tasks contain: data, target attribute, goals, procedures; readable by tools, automates experimentation; real-time 'leaderboard' and overview.
  8. 8. Flows (algorithms). Run locally, auto-registered by tools; integrations + APIs (REST, Java, R, Python, ...). Python example from the slide:
        # Call names as shown on the 2016 slide; newer openml-python releases may differ.
        from sklearn import tree
        from openml import tasks, runs

        task = tasks.get(59)                    # fetch task 59
        clf = tree.DecisionTreeClassifier()
        run = runs.run_task(task, clf)          # run the classifier on the task locally
        return_task, response = run.publish()   # upload the run to OpenML
  10. 10. Runs. Flow uploads predictions; predictions are evaluated on OpenML; reproducible, linked to data, flows and researcher. Contains: predictions, parameter settings, model information, evaluation measures.
  11. 11. Analysis. Answer basic questions about performance of algorithms to study: the effect / behaviour of parameters on a given algorithm; the effect of feature selection on a given algorithm; how algorithms behave with respect to each other; which algorithms perform well on a wide range of datasets.
  14. 14. Effect of parameter. [Figure: predictive accuracy (%), axis 93 to 99, for RBFKernel(1), J48(2), IBk(1), Logistic(1), RandomForest(1) and REPTree(1); second panel: optimal value (2^1 to 2^8) against number of features (4 to 16384)]
  16. 16. Effect of Feature Selection. [Figure: four panels, for k-NN (k = 1), Naive Bayes, Decision Tree (C4.5) and SVM (RBF Kernel); each plots number of instances (256 to 65536) against number of features (1 to 16384), with points marked Better, Equal or Worse]
  18. 18. Performance of Algorithms. [Figure: two panels, accuracy (0 to 1) and area under the ROC curve (0.4 to 1), per classifier, for example JRip, LMT, HoeffdingTree, RandomForest, NaiveBayes, J48 and IBk]
  19. 19. Performance of Algorithms. 105 datasets, 30 classifiers; Friedman-Nemenyi test (α = 0.05). [Figure: critical-difference (CD) diagram over ranks 1 to 30.] Classifiers in the order shown: Logistic Model Tree, Random Forest, Bagging(REP Tree), AdaBoost(J48), FURIA, SMO(Poly Kernel), Simple Cart, LogitBoost(Decision Stump), Multilayer Perceptron (20), J48, Logistic, JRip, Multilayer Perceptron (10), REP Tree, k-NN (k=10), LAD Tree, Multilayer Perc. (10, 10), k-NN (k=1), Decision Table, Hoeffding Tree, SMO(RBF Kernel), Bayesian Network, AdaBoost(NaiveBayes), NaiveBayes, AdaBoost(DecisionStump), Random Tree, OneR, Conjunctive Rule, Hyper Pipes, OLM.
  20. 20. Data Streams. Online learning; many IoT applications in this paradigm. Example: predict the electricity price for the next day; feedback whether the prediction was correct; model can become obsolete (concept drift). [Figure: accuracy (0.5 to 0.95) per interval (0 to 40) for Hoeffding Tree, Naive Bayes, SPegasos and k-NN]
  21. 21. Performance of Data Streams Algorithms. [Figure: predictive accuracy (0.00 to 1.00) per classifier: NoChange, MajorityClass, SPegasos logloss, SPegasos hingeloss, SGD logloss, SGD hingeloss, DecisionStump, Perceptron, AWE(OneR), AWE(DecisionStump), RuleClassifier, RandomHoeffdingTree, NaiveBayes, kNN k = 1, AWE(REPTree), kNN k = 10, AWE(SMO(PolyKernel)), AWE(Logistic), kNNwithPAW k = 10, AWE(J48), AWE(JRip), HoeffdingTree, ASHoeffdingTree, HoeffdingOptionTree, HoeffdingAdaptiveTree]
  22. 22. Performance of Data Streams Algorithms. [Figure: critical-difference (CD) diagram over ranks 1 to 25.] Classifiers in the order shown: HoeffdingOptionTree, HoeffdingAdaptiveTree, HoeffdingTree, ASHoeffdingTree, AWE(J48), AWE(JRip), AWE(SMO(PolyKernel)), AWE(Logistic), kNNwithPAW k = 10, AWE(REPTree), kNN k = 10, kNN k = 1, NaiveBayes, RandomHoeffdingTree, Perceptron, RuleClassifier, AWE(DecisionStump), AWE(OneR), SPegasos logloss, DecisionStump, SPegasos hingeloss, SGD hingeloss, SGD logloss, MajorityClass, NoChange.
  24. 24. Goal. Can we build a classifier that does better? How can we use the experimental results in OpenML for this? Probably! By combining them in a smart way (ensembles). Approach: work on intervals of 1,000 observations. Task: try to predict for the next interval which classifier to use.
  25. 25. The OpenML approach. Many data streams (and tasks) from various sources. Real world: electricity, forest covertype, airlines. Synthetic: Bayesian Network Generator, Moving Hyperplanes, LED. Meta-features per data stream. Direct access to all MOA classifiers. Experimental results: models, predictions, measured performance.
  28. 28. Meta-Features, by category:
        Simple: # Instances, # Attributes, # Classes, Dimensionality, Default Accuracy, # Observations with Missing Values, # Missing Values, % Observations With Missing Values, % Missing Values, # Numeric Attributes, # Nominal Attributes, # Binary Attributes, % Numeric Attributes, % Nominal Attributes, % Binary Attributes, Majority Class Size, % Majority Class, Minority Class Size, % Minority Class
        Statistical: Mean of Means of Numeric Attributes, Mean Standard Deviation of Numeric Attributes, Mean Kurtosis of Numeric Attributes, Mean Skewness of Numeric Attributes
        Information Theoretic: Class Entropy, Mean Attribute Entropy, Mean Mutual Information, Equivalent Number Of Attributes, Noise to Signal Ratio
        Landmarkers: Accuracy, Kappa and Area under the ROC Curve of the following classifiers: Decision Stump, J48 (confidence factor: 0.01), k-NN, NaiveBayes, REP Tree (maximum depth: 3)
        Drift detection: Changes by Adwin (Hoeffding Tree), Warnings by Adwin (Hoeffding Tree), Changes by DDM (Hoeffding Tree), Warnings by DDM (Hoeffding Tree), Changes by Adwin (Naive Bayes), Warnings by Adwin (Naive Bayes), Changes by DDM (Naive Bayes), Warnings by DDM (Naive Bayes)
        Stream Landmarkers: Accuracy Naive Bayes on previous window, Accuracy k-NN on previous window, ...
  29. 29. Stream Landmarkers. Each row marks whether a landmarker was correct (✓) or wrong (✗) on the observations around position c inside window w; the last column is the fraction of ✓ marks.
        l1   ✓ ✓ ✗ ✓ ✗ ✓ ✓ ✓ ✓ ✗   0.7
        l2   ✓ ✓ ✗ ✓ ✓ ✓ ✓ ✗ ✗ ✓   0.7
        l3   ✓ ✓ ✗ ✓ ✓ ✓ ✓ ✓ ✗ ✓   0.8
  31. 31. Stream Landmarkers
        $$
        P(l', c, \alpha, L) =
        \begin{cases}
        1 & \text{iff } c = 0 \\
        P(l', c - 1, \alpha, L) \cdot \alpha + \bigl(1 - L(l'(PS_c), l(PS_c))\bigr) \cdot (1 - \alpha) & \text{otherwise}
        \end{cases}
        \tag{1}
        $$
  32. 32. Classifier Output Difference. 25 online classifiers (data streams). [Figure: classifiers compared on a 0.0 to 0.6 scale, in the order shown: NoChange, SGDHINGELOSS, SGDLOGLOSS, SPegasosHINGELOSS, SPegasosLOGLOSS, MajorityClass, Perceptron, AWE(OneRule), DecisionStump, AWE(DecisionStump), RuleClassifier, 1-NN, k-NNwithPAW, k-NN, RandomHoeffdingTree, HoeffdingAdaptiveTree, HoeffdingOptionTree, ASHoeffdingTree, HoeffdingTree, AWE(JRip), AWE(REPTree), AWE(J48), NaiveBayes, AWE(SMO), AWE(Logistic)]
  34. 34. Results. [Figure: predictive accuracy (0.25 to 1.00) per method, with a critical-difference (CD) diagram over ranks 1 to 8.] Methods in the order shown in the diagram: Leveraging Bagging, BLAST (FF), Online Bagging, BLAST (Window), Meta-learning Ensemble, Best Single Classifier, AWE(J48), Majority Vote Ensemble.
  36. 36. Results. [Figure: run CPU time per method on a logarithmic axis (1 to 10000), with a critical-difference (CD) diagram over ranks 1 to 7.] Methods in the order shown in the diagram: Best Single Classifier, AWE(J48), Majority Vote Ensemble, BLAST (Window), BLAST (FF), Online Bagging, Leveraging Bagging.
  37. 37. Conclusions. Two techniques: Online Performance Estimation; ensemble of heterogeneous classifiers. Individual performances are average; combination (BLAST) boosts performance considerably. Parameters to optimize: ensemble composition, window size, voting policy.
  38. 38. Thank you for your attention
