Automating Machine Learning - Is it feasible?

Manuel Martin Salvador
Smart Technology Research Group
Bournemouth University
June 2nd, 2016

Index
1. Recent life-changing applications of Machine Learning
2. Multicomponent Predictive Systems (MCPS)
3. Automating the composition and optimisation of MCPS
4. Adapting MCPS to changing environments
5. Conclusion and future work

Recent life-changing applications of Machine Learning

Gene Discovery
Source: http://msgeneticslab.med.ubc.ca/gene-discovery/
Dessa Sadovnick and Carles Vilariño-Güell
University of British Columbia
A mutation in the NR1H3 gene can trigger Multiple Sclerosis

Microsoft Seeing AI
Source: https://www.youtube.com/watch?v=R2mC-NUAmMk

Autonomous Vehicles
Source: https://www.youtube.com/watch?v=dk3oc1Hr62g

Instant Translation
Source: https://www.skype.com/en/features/skype-translator/

Multicomponent Predictive Systems

Predictive Modelling
Labelled Data → Supervised Learning Algorithm → Predictive Model

Data is imperfect
Missing values
Noise
High dimensionality
Outliers
Image credits:
Question mark: http://commons.wikimedia.org/wiki/File:Question_mark_road_sign,_Australia.jpg
Noise: http://www.flickr.com/photos/benleto/3223155821/
Outliers: http://commons.wikimedia.org/wiki/File:Diagrama_de_caixa_com_outliers_and_whisker.png
3D plot: http://salsahpc.indiana.edu/plotviz/

Multicomponent Predictive System (MCPS)
Data → Preprocessing → Predictive Model → Postprocessing → Predictions

Multicomponent Predictive System (MCPS)
[Diagram: Data → multiple Preprocessing components → multiple Predictive Model components → Postprocessing → Predictions]

How to model MCPS?
Function composition: Not enough for modelling parallel paths. Example: Y = h(g(f(X))), drawn as X → f → g → h → Y.
Directed Acyclic Graph: Not enough to model process state.
Petri net: Very flexible, with a robust mathematical background.
(Listed in order of increasing expressive power.)
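As a minimal sketch of the function-composition view, a purely sequential MCPS can be written as Y = h(g(f(X))). The three components below (`impute_missing`, `scale`, `threshold_predict`) and the helper `compose` are hypothetical stand-ins chosen for illustration, not components from the talk.

```python
from functools import reduce


def compose(*steps):
    """Chain steps left to right: compose(f, g, h)(x) == h(g(f(x)))."""
    return lambda x: reduce(lambda acc, step: step(acc), steps, x)


def impute_missing(values):
    """f: replace missing values (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def scale(values):
    """g: rescale values to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def threshold_predict(values):
    """h: toy predictor that labels values above 0.5 as class 1."""
    return [1 if v > 0.5 else 0 for v in values]


# Preprocessing -> preprocessing -> predictor, applied as one function.
mcps = compose(impute_missing, scale, threshold_predict)
predictions = mcps([2.0, None, 10.0, 4.0])
```

A single chain like this has no way to express two branches that run side by side and are merged later, which is the limitation noted for function composition.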

Petri net
Mathematical modelling language invented in 1939 by Carl Adam Petri
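Elements: place, transition, arc, token
N = (P, T, F), where P is the set of places, T the set of transitions and F the set of arcs (flow relation)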

Example of Petri net
Places: Reception, Waiting Room, Consulting Room, Exit
Transitions: Check in, Call in, Examination and diagnosis
Token: Patient
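The clinic example above can be sketched in a few lines of Python. This is only an illustration of the token-firing rule: the class `PetriNet` is hypothetical, and the arc layout (a single path from Reception to Exit) is an assumption, since the original diagram is not available in the text.

```python
class PetriNet:
    """Minimal Petri net N = (P, T, F) with a marking (tokens per place)."""

    def __init__(self, places, transitions, arcs):
        self.places = set(places)
        self.transitions = set(transitions)
        self.arcs = set(arcs)  # F: (place, transition) or (transition, place)
        self.marking = {p: 0 for p in self.places}

    def inputs(self, t):
        """Places with an arc into transition t."""
        return [src for (src, dst) in self.arcs if dst == t]

    def outputs(self, t):
        """Places with an arc coming out of transition t."""
        return [dst for (src, dst) in self.arcs if src == t]

    def enabled(self, t):
        """A transition is enabled when every input place holds a token."""
        return all(self.marking[p] > 0 for p in self.inputs(t))

    def fire(self, t):
        """Consume one token per input place, produce one per output place."""
        if not self.enabled(t):
            raise ValueError(f"transition {t!r} is not enabled")
        for p in self.inputs(t):
            self.marking[p] -= 1
        for p in self.outputs(t):
            self.marking[p] += 1


clinic = PetriNet(
    places=["Reception", "Waiting Room", "Consulting Room", "Exit"],
    transitions=["Check in", "Call in", "Examination and diagnosis"],
    arcs=[  # assumed layout: one path from Reception to Exit
        ("Reception", "Check in"), ("Check in", "Waiting Room"),
        ("Waiting Room", "Call in"), ("Call in", "Consulting Room"),
        ("Consulting Room", "Examination and diagnosis"),
        ("Examination and diagnosis", "Exit"),
    ],
)
clinic.marking["Reception"] = 1  # one patient token arrives
for step in ["Check in", "Call in", "Examination and diagnosis"]:
    clinic.fire(step)
```

After the three transitions fire, the patient token has moved from Reception to Exit, and "Check in" is no longer enabled because Reception is empty.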

Petri nets can be more complex
Source: http://bit.ly/1XZQhYZ

Modelling MCPS as Petri net
A Petri net is an MCPS iff all the following conditions apply:
The Petri net is a workflow net.
The Petri net is well-handled and acyclic.
The places P \ {i, o} (all places except the input place i and the output place o) have only a single input and a single output.
The Petri net is 1-sound.
The Petri net is safe.
All the transitions with multiple inputs or outputs are AND-join or AND-split, respectively.
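Some of these conditions are purely structural and can be checked directly on the arc list. The sketch below covers only a subset, under stated assumptions: it treats the unique place with no incoming arcs as i and the unique place with no outgoing arcs as o, checks that every other place has exactly one input and one output, and checks that the net is acyclic. It does not check soundness, safeness, well-handledness or the full workflow-net definition. The function name `check_mcps_structure` is hypothetical.

```python
from collections import defaultdict


def check_mcps_structure(places, transitions, arcs):
    """Check a subset of the structural conditions; returns a dict of booleans."""
    preset = defaultdict(set)   # node -> nodes with an arc into it
    postset = defaultdict(set)  # node -> nodes it has an arc to
    for src, dst in arcs:
        postset[src].add(dst)
        preset[dst].add(src)

    # Candidate input place i (no incoming arcs) and output place o (no outgoing arcs).
    sources = [p for p in places if not preset[p]]
    sinks = [p for p in places if not postset[p]]
    single_source_and_sink = len(sources) == 1 and len(sinks) == 1

    # Every place other than i and o has exactly one input and one output.
    inner = [p for p in places if p not in sources and p not in sinks]
    inner_single_io = all(
        len(preset[p]) == 1 and len(postset[p]) == 1 for p in inner
    )

    # Kahn's algorithm: the net is acyclic iff every node can be removed
    # in topological order.
    nodes = list(places) + list(transitions)
    indegree = {n: len(preset[n]) for n in nodes}
    ready = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while ready:
        node = ready.pop()
        visited += 1
        for successor in postset[node]:
            indegree[successor] -= 1
            if indegree[successor] == 0:
                ready.append(successor)

    return {
        "single_source_and_sink": single_source_and_sink,
        "inner_places_single_io": inner_single_io,
        "acyclic": visited == len(nodes),
    }


# A two-step sequential net: i -> t1 -> p1 -> t2 -> o
sequential = check_mcps_structure(
    places=["i", "p1", "o"],
    transitions=["t1", "t2"],
    arcs=[("i", "t1"), ("t1", "p1"), ("p1", "t2"), ("t2", "o")],
)
```

For the sequential net all three checks pass; adding a loop on p1 would make both the single input/output check and the acyclicity check fail.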

Hierarchical MCPS with parallel paths
[Diagram: Petri net from input place i to output place o, with dummy transitions splitting and joining the parallel paths. Components shown: Random Feature Selection, Decision Tree, Mean, RandomSubspace.]

Automating the composition and optimisation of MCPS

Algorithm Selection
What are the best algorithms to process my data?

Hyperparameter Optimisation
How to tune the hyperparameters to get the best performance?

CASH problem for MCPS
Combined Algorithm Selection and Hyperparameter configuration problem
[Equation annotated with: MCPSs, hyperparameters, objective function (e.g. classification error), k-fold cross validation, training dataset, validation dataset]
Thornton, C., Hutter, F., Hoos, H. H., & Leyton-Brown, K. (2013). Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms. In: Proc. of the 19th ACM SIGKDD, 847–855.
Martin Salvador, M., Budka, M., & Gabrys, B. Automatic composition and optimisation of multicomponent predictive systems. IEEE Transactions on Knowledge and Data Engineering, under review (submitted on 01/04/2016). Available at http://bit.ly/automatic-mcps-paper
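For reference, the CASH objective as formulated by Thornton et al. (2013) can be written as follows. The symbols are taken from that formulation and the notation on the slide may differ; in the MCPS setting, each candidate $A^{(j)}$ would be a whole MCPS rather than a single algorithm.

```latex
A^{*}_{\lambda^{*}} \in \operatorname*{arg\,min}_{A^{(j)} \in \mathcal{A},\ \lambda \in \Lambda^{(j)}}
\frac{1}{k} \sum_{i=1}^{k}
\mathcal{L}\left(A^{(j)}_{\lambda},\ \mathcal{D}^{(i)}_{\mathrm{train}},\ \mathcal{D}^{(i)}_{\mathrm{valid}}\right)
```

Here $\mathcal{A}$ is the set of candidates, $\Lambda^{(j)}$ the hyperparameter space of candidate $A^{(j)}$, $\mathcal{L}$ the objective function (e.g. classification error), and $\mathcal{D}^{(i)}_{\mathrm{train}}$, $\mathcal{D}^{(i)}_{\mathrm{valid}}$ the training and validation parts of the i-th of k cross-validation folds.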

Search space
Component types: Missing Value Handling, Outlier Detection and Handling, Data Transformation, Dimensionality Reduction, Sampling, Predictor, Meta-Predictor
Search space      PREV   NEW    FULL
Hyperparameters   756    1186   1564

Optimisation strategies
Grid search: exhaustive exploration of the whole search space. Not feasible in high-dimensional spaces.
Random search: explores the search space randomly within a given time budget.
Bayesian optimisation: assumes that a function maps the hyperparameters to the objective, approximates it with a surrogate model, and explores the most promising parts of the search space.
Hutter, F., Hoos, H. H., & Leyton-Brown, K. (2011). Sequential Model-Based Optimization for General Algorithm Configuration. Learning and Intelligent Optimization, 6683 LNCS, 507–523.
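The difference between the first two strategies can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `objective` function is a toy stand-in for a cross-validated error, the hyperparameter names `learning_rate` and `depth` are hypothetical, and a fixed number of trials stands in for the time budget.

```python
import itertools
import random


def objective(config):
    """Toy validation error; a stand-in for the k-fold CV error of a real model."""
    return (config["learning_rate"] - 0.1) ** 2 + (config["depth"] - 5) ** 2 / 100


def grid_search(grid):
    """Evaluate every combination of the listed values and keep the best."""
    names = list(grid)
    candidates = [
        dict(zip(names, values)) for values in itertools.product(*grid.values())
    ]
    return min(candidates, key=objective)


def random_search(sample, n_trials, seed=0):
    """Evaluate n_trials randomly drawn configurations and keep the best."""
    rng = random.Random(seed)
    candidates = [sample(rng) for _ in range(n_trials)]
    return min(candidates, key=objective)


def sample(rng):
    """Draw one configuration: log-uniform learning rate, integer depth."""
    return {
        "learning_rate": 10 ** rng.uniform(-3, 0),
        "depth": rng.randint(1, 10),
    }


grid = {"learning_rate": [0.01, 0.1, 1.0], "depth": [1, 5, 10]}
best_grid = grid_search(grid)            # 3 x 3 = 9 evaluations
best_random = random_search(sample, 50)  # 50 evaluations, any values in range
```

The grid needs one evaluation per combination, so its cost multiplies with every added hyperparameter, whereas the random search cost is set by the budget alone.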

Auto-WEKA for MCPS
WEKA methods as search space
One-click black box
Data + Time Budget → MCPS
Our contribution
● Recursive extension of complex hyperparameters in the search space.
● Composition and optimisation of MCPSs (including WEKA filters, predictors and meta-predictors)
https://github.com/dsibournemouth/autoweka

Evaluated strategies
1. WEKA-Def: All the predictors and meta-predictors are run using WEKA’s default hyperparameter values.
2. Random search: The search space is randomly explored.
3. SMAC: Sequential Model-based Algorithm Configuration incrementally builds a Random Forest as surrogate model.
4. TPE: Tree-structured Parzen Estimator incrementally builds a surrogate model from Parzen (kernel density) estimators.
Hutter, F., Hoos, H. H., & Leyton-Brown, K. (2011). Sequential Model-Based Optimization for General Algorithm Configuration. Learning and Intelligent Optimization, 6683 LNCS, 507–523.
Bergstra, J., Bardenet, R., Bengio, Y., & Kegl, B. (2011). Algorithms for Hyper-Parameter Optimization. In Advances in NIPS 24, pp. 1–9.

Experiments
21 datasets (classification problems)
Budget: 30 CPU-hours (per run)
25 runs with different seeds
Timeout: 30 minutes
Memout: 3GB RAM

Holdout error (% misclassification)

Convergence analysis
10-fold CV error of best solutions over time (each color is a different run/seed)

MCPS similarity analysis
[Equation annotated with: weight for the i-th transition, Hamming distance at the i-th transition]
For FULL search space:
Low error variance and high MCPS similarity
Low error variance and low MCPS similarity
High error variance and low MCPS similarity
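The distance formula itself is not recoverable from the text, only its two annotated terms. One plausible reading, shown here purely as an assumption, is a weighted sum of per-transition Hamming distances. The function `mcps_distance`, the weights and the two example configurations are hypothetical; the component names are used only as labels.

```python
def mcps_distance(mcps_a, mcps_b, weights):
    """Weighted sum of per-transition Hamming distances (assumed form).

    The Hamming distance at a transition is 0 when both MCPSs use the
    same component there and 1 otherwise.
    """
    if not (len(mcps_a) == len(mcps_b) == len(weights)):
        raise ValueError("both MCPSs and the weights must have the same length")
    return sum(w * (a != b) for a, b, w in zip(mcps_a, mcps_b, weights))


# One slot per transition: missing values, outliers, transformation,
# dimensionality reduction, sampling, predictor, meta-predictor.
weights = [1, 1, 1, 1, 1, 2, 2]  # hypothetical weights
mcps_a = ("ReplaceMissingValues", "None", "Normalize", "PCA", "None", "J48", "Bagging")
mcps_b = ("ReplaceMissingValues", "None", "Standardize", "PCA", "None", "RandomForest", "Bagging")

distance = mcps_distance(mcps_a, mcps_b, weights)
```

The two configurations differ at the transformation slot (weight 1) and the predictor slot (weight 2), so under these assumed weights the distance is 3.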

MCPS similarity analysis: clustering
Waveform dataset and SMAC strategy

Available software for Bayesian optimisation
SMAC: Sequential Model-based Algorithm Configuration.
Auto-WEKA: toolbox including random search, SMAC and TPE for WEKA predictors.
Auto-WEKA for MCPS: extension of Auto-WEKA for MCPSs.
Auto-Sklearn: toolbox for automating scikit-learn.
Spearmint: Python library for Bayesian optimisation with Gaussian Processes.
Hyperopt: Python library for random search and TPE.
HPOLib: common interface for SMAC, Spearmint and Hyperopt.

Adapting MCPS to changing environments

Maintaining an MCPS
Data distribution can change over time and affect predictions
External factors (e.g. weather conditions, new regulations)
Internal factors (e.g. quality of materials, equipment deterioration)
Source: INFER project

Training and testing process
1. Training data is provided
2. Best MCPS found is selected
3. New batch of unlabelled data requires prediction
4. MCPS generates predictions
5. True labels are provided
6. Predictive accuracy is reported
7. MCPS is adapted using the last batch of labelled data
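The seven steps above can be sketched as a simple loop. This is a toy illustration under stated assumptions: `MajorityClassModel` is a hypothetical stand-in for an MCPS that ignores input features and predicts the most frequent label, and adaptation is reduced to refitting on the last labelled batch.

```python
from collections import Counter


class MajorityClassModel:
    """Toy stand-in for an MCPS: always predicts the most frequent training label."""

    def fit(self, labels):
        self.prediction = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, n):
        return [self.prediction] * n


def run_batches(initial_labels, batches, adapt=True):
    """Predict each batch, report its accuracy, then optionally adapt the model."""
    model = MajorityClassModel().fit(initial_labels)  # steps 1-2
    accuracies = []
    for true_labels in batches:                       # step 3: new batch arrives
        predicted = model.predict(len(true_labels))   # step 4: predictions
        # Steps 5-6: true labels become available and accuracy is reported.
        correct = sum(p == t for p, t in zip(predicted, true_labels))
        accuracies.append(correct / len(true_labels))
        if adapt:                                     # step 7: adapt on last batch
            model = MajorityClassModel().fit(true_labels)
    return accuracies


initial = ["a", "a", "b"]
batches = [
    ["a", "a", "b", "a"],
    ["b", "b", "b", "a"],  # the label distribution shifts here
    ["b", "b", "a", "b"],
]
adapted = run_batches(initial, batches, adapt=True)
static = run_batches(initial, batches, adapt=False)
```

In this toy run the adapted model recovers on the third batch after the shift, while the static model keeps predicting the old majority class.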

Datasets from chemical production processes

Average classification error (%)

Average classification error per batch (%)
Strategies: Baseline, Batch, Batch+SMAC, Cumulative, Cumulative+SMAC
Datasets shown: drier, thermalox
Batch adaptation doesn’t help! :(
Batch adaptation does help! :)

MCPS similarity analysis
Panels: Batch+SMAC (catalyst), Cumulative+SMAC (catalyst)
Same components, only hyperparameters are adapted
Large difference between batches

Conclusion and future work
Automatic machine learning is becoming a reality. There is a variety of open-source software as well as commercial products (e.g. SigOpt and IBM Watson).
Domain experts still play a crucial role (e.g. defining the search space).
Smart techniques to reduce the search space are needed.
Maintaining MCPSs in a production environment is key to success.
There is a gap in adaptive surrogate models for Bayesian optimisation methods.

Thanks!
Publications with Marcin Budka and Bogdan Gabrys:
● “Towards automatic composition of Multicomponent Predictive Systems” - HAIS 2016 (published) http://bit.ly/towards-mcps-paper
● “Automatic composition and optimisation of Multicomponent Predictive Systems” - IEEE TKDE (under review) http://bit.ly/automatic-mcps-paper
● “Adapting Multicomponent Predictive Systems using hybrid adaptation Strategies with Auto-WEKA in process industry” - AutoML at ICML 2016 (accepted) http://bit.ly/adapting-mcps-paper
● “Effects of change propagation resulting from adaptive preprocessing in Multicomponent Predictive Systems” - KES 2016 (accepted) http://bit.ly/change-propagation-mcps-paper
Slides available at http://www.slideshare.net/draxus
Contact: Manuel Martin Salvador msalvador@bournemouth.ac.uk
