Towards Automatic Composition of
MultiComponent Predictive Systems
Manuel Martin Salvador, Marcin Budka, Bogdan Gabrys
Data Science Institute, Bournemouth University, UK
April 18th, 2016
Seville, Spain
Predictive modelling
Labelled
Data
Supervised
Learning
Algorithm
Predictive
Model
Data is imperfect
Missing
Values
Noise
High
dimensionality
Outliers
Question Mark: http://commons.wikimedia.org/wiki/File:Question_mark_road_sign,_Australia.jpg
Noise: http://www.flickr.com/photos/benleto/3223155821/
Outliers: http://commons.wikimedia.org/wiki/File:Diagrama_de_caixa_com_outliers_and_whisker.png
3D plot: http://salsahpc.indiana.edu/plotviz/
MultiComponent Predictive Systems
Data → Preprocessing → Predictive Model → Postprocessing → Predictions
MultiComponent Predictive Systems
Data → [multiple Preprocessing components] → [multiple Predictive Models] → Postprocessing → Predictions
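The flow above can be sketched as plain functions composed into a pipeline. The sketch below is illustrative Python (not the WEKA components used in the talk): mean imputation stands in for preprocessing, a toy linear scorer for the predictive model, and label mapping for postprocessing.

```python
# Illustrative sketch of a multicomponent predictive system (MCPS):
# Data -> Preprocessing -> Predictive Model -> Postprocessing -> Predictions.
# All three components are hypothetical stand-ins, not WEKA methods.

def impute_mean(rows):
    """Preprocessing: replace None values with the column mean."""
    cols = list(zip(*rows))
    filled = []
    for col in cols:
        known = [v for v in col if v is not None]
        mean = sum(known) / len(known)
        filled.append([mean if v is None else v for v in col])
    return [list(r) for r in zip(*filled)]

def predict(weights, row):
    """Predictive model: a toy linear scorer thresholded at zero."""
    return 1 if sum(w * x for w, x in zip(weights, row)) > 0 else 0

def postprocess(label):
    """Postprocessing: map the raw class label to a readable prediction."""
    return "positive" if label == 1 else "negative"

def mcps(rows, weights):
    """Compose the three components into one predictive system."""
    return [postprocess(predict(weights, r)) for r in impute_mean(rows)]

print(mcps([[1.0, None], [-1.0, 0.5]], [1.0, 1.0]))
```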
Algorithm Selection
What are the best
algorithms to
process my data?
Hyperparameter Optimisation
How to tune the
hyperparameters to get
the best performance?
CASH problem
Combined Algorithm Selection and Hyperparameter optimisation problem:
find the algorithms and hyperparameters that minimise an objective function
(e.g. classification error), estimated by k-fold cross validation over
training and validation datasets.
Thornton, C., Hutter, F., Hoos, H.H., Leyton-Brown, K.: Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms.
In: Proc. of the 19th ACM SIGKDD. (2013) 847–855
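In code, the CASH objective amounts to: for every candidate (algorithm, hyperparameter setting) pair, estimate the objective by k-fold cross validation, and keep the configuration with the lowest validation error. A minimal pure-Python sketch; the two candidate "algorithms" (a threshold classifier and a constant classifier) are toy stand-ins, not WEKA methods.

```python
# Minimal sketch of the CASH objective: pick the (algorithm, hyperparameters)
# pair minimising k-fold cross-validation error.

def cv_error(fit, data, k=3):
    """k-fold cross-validation error of a fitting function on (x, y) pairs."""
    folds = [data[i::k] for i in range(k)]
    errors = []
    for i in range(k):
        valid = folds[i]
        train = [p for j, f in enumerate(folds) if j != i for p in f]
        model = fit(train)
        errors.append(sum(model(x) != y for x, y in valid) / len(valid))
    return sum(errors) / k

def make_threshold(t):
    """Toy algorithm 1: classify by comparing x against threshold t."""
    return lambda train: (lambda x: int(x > t))

def make_constant(c):
    """Toy algorithm 2: always predict the constant class c."""
    return lambda train: (lambda x: c)

# Search space: algorithms paired with hyperparameter settings.
search_space = ([("threshold t=%.1f" % t, make_threshold(t)) for t in (0.0, 0.5, 1.0)]
                + [("constant c=%d" % c, make_constant(c)) for c in (0, 1)])

data = [(x / 10, int(x / 10 > 0.5)) for x in range(10)]
best_name, best_fit = min(search_space, key=lambda cand: cv_error(cand[1], data))
print(best_name, cv_error(best_fit, data))
```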
Auto-WEKA
WEKA methods as search space
One-click black box
Data + Time Budget → MCPS
Our contribution
Recursive extension of complex
hyperparameters in the search space.
Code available at https://github.com/dsibournemouth/autoweka
Search space
Number of hyperparameters: PREV 756 → NEW 1186
Optimisation strategies
● Grid search: exhaustive exploration of the whole search space. Not feasible in high-dimensional spaces.
● Random search: explores the search space randomly during a given time budget.
● Bayesian optimisation: assumes a functional relationship between the hyperparameters and the objective, and tries to explore the most promising parts of the search space.
Hutter, F., Hoos, H.H., Leyton-Brown, K.: Sequential Model-Based Optimization for General Algorithm Configuration. Learning and Intelligent Optimization, LNCS 6683 (2011) 507–523.
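Random search, the simplest non-trivial baseline above, can be sketched in a few lines. The objective below is a hypothetical smooth function standing in for cross-validated error; in the talk the budget is wall-clock time rather than a fixed number of iterations.

```python
import random

def objective(cfg):
    """Hypothetical stand-in for CV error, minimised near lr=0.1, depth=4."""
    lr, depth = cfg
    return (lr - 0.1) ** 2 + (depth - 4) ** 2 / 100

def random_search(budget, seed=0):
    """Evaluate `budget` random configurations; return the best one found."""
    rng = random.Random(seed)
    best_cfg, best_err = None, float("inf")
    for _ in range(budget):
        cfg = (rng.uniform(0.001, 1.0), rng.randint(1, 10))
        err = objective(cfg)
        if err < best_err:
            best_cfg, best_err = cfg, err
    return best_cfg, best_err

cfg, err = random_search(200)
print(cfg, err)
```

With a fixed seed, a larger budget can only improve on a smaller one, since the first draws are shared.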
Evaluated strategies
1. WEKA-Def: All the predictors and meta-predictors are run using WEKA’s
default hyperparameter values.
2. Random search: The search space is randomly explored.
3. SMAC: Sequential Model-based Algorithm Configuration incrementally
builds a Random Forest as its inner model.
4. TPE: Tree-structured Parzen Estimator incrementally builds kernel
density estimators of good and bad configurations as its inner model.
Hutter, F., Hoos, H.H., Leyton-Brown, K.: Sequential Model-Based Optimization for General Algorithm Configuration. Learning and Intelligent Optimization, LNCS 6683 (2011) 507–523.
Bergstra, J., Bardenet, R., Bengio, Y., Kégl, B.: Algorithms for Hyper-Parameter Optimization. In: Advances in NIPS 24 (2011) 1–9.
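The model-based strategies share one loop: fit a surrogate model to the configurations evaluated so far, use it to pick a promising candidate, evaluate that candidate, and refit. The sketch below is neither SMAC (whose surrogate is a random forest) nor TPE; a 1-nearest-neighbour surrogate stands in, and the objective is again a hypothetical stand-in for CV error.

```python
import random

def objective(x):
    """Hypothetical stand-in for cross-validated error, minimised at x=0.3."""
    return (x - 0.3) ** 2

def surrogate(history, x):
    """Predict the error at x from the nearest evaluated configuration
    (a toy stand-in for SMAC's random-forest surrogate)."""
    return min(history, key=lambda h: abs(h[0] - x))[1]

def model_based_search(n_iter, seed=0):
    """Sequential model-based optimisation loop: propose random candidates,
    evaluate the one the surrogate predicts to be best, and refit."""
    rng = random.Random(seed)
    x0 = rng.uniform(0, 1)
    history = [(x0, objective(x0))]
    for _ in range(n_iter):
        candidates = [rng.uniform(0, 1) for _ in range(20)]
        x = min(candidates, key=lambda c: surrogate(history, c))
        history.append((x, objective(x)))
    return min(history, key=lambda h: h[1])

best_x, best_err = model_based_search(30)
print(best_x, best_err)
```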
Experiments
21 datasets (classification problems)
Budget: 30 CPU-hours (per run)
25 runs with different seeds
Timeout: 30 minutes
Memout: 3GB RAM
Results
Classification error on test set (datasets, out of 21, where each strategy performed best):
● WEKA-Def (best): 1/21
● Random search (mean): 4/21
● SMAC (mean): 10/21
● TPE (mean): 6/21
Search spaces
● NEW > PREV: 52/63
Best MCPSs found
Conclusion and future work
Automation of composition and optimisation of MCPSs is feasible
Extending the search space has helped to find better solutions
Bayesian optimisation strategies have performed better than random search in
most cases
Future work:
● There is still room for improvement in Bayesian optimisation strategies.
● Multi-objective optimisation (e.g. time and error).
● Adaptive optimisation in changing environments.
Thank you!
msalvador@bournemouth.ac.uk
Paper available at https://dx.doi.org/10.1007/978-3-319-32034-2_3
Slides available at http://slideshare.net/draxus
