Automating Machine Learning
Is it feasible?
Manuel Martin Salvador
Smart Technology Research Group
Bournemouth University
June 2nd, 2016
Index
1. Recent life-changing applications of Machine Learning
2. Multicomponent Predictive Systems (MCPS)
3. Automating the composition and optimisation of MCPS
4. Adapting MCPS to changing environments
5. Conclusion and future work
Recent life-changing applications of Machine Learning
Gene Discovery
Source: http://msgeneticslab.med.ubc.ca/gene-discovery/
Dessa Sadovnick and Carles Vilariño-Güell
University of British Columbia
A mutation in the NR1H3 gene can trigger Multiple Sclerosis
Microsoft Seeing AI
Source: https://www.youtube.com/watch?v=R2mC-NUAmMk
Autonomous Vehicles
Source: https://www.youtube.com/watch?v=dk3oc1Hr62g
Instant Translation
Source: https://www.skype.com/en/features/skype-translator/
Multicomponent Predictive Systems
Predictive Modelling
Labelled Data → Supervised Learning Algorithm → Predictive Model
Classification and Regression
Data is imperfect
Missing Values
Noise
High dimensionality
Outliers
Question Mark: http://commons.wikimedia.org/wiki/File:Question_mark_road_sign,_Australia.jpg
Noise: http://www.flickr.com/photos/benleto/3223155821/
Outliers: http://commons.wikimedia.org/wiki/File:Diagrama_de_caixa_com_outliers_and_whisker.png
3D plot: http://salsahpc.indiana.edu/plotviz/
Multicomponent Predictive System (MCPS)
Data → Preprocessing → Predictive Model → Postprocessing → Predictions
Multicomponent Predictive System (MCPS)
Data → Preprocessing (several components) → Predictive Model (several models) → Postprocessing → Predictions
How to model MCPS?
Function composition: Not enough for modelling parallel paths.
Directed Acyclic Graph: Not enough to model process state.
Petri net: Very flexible and robust mathematical background.
Expressive power increases from function composition to Petri net.
Y = h(g(f(X)))
X → f → g → h → Y
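As a minimal sketch of the function-composition view, the snippet below chains three placeholder components in Python. The names `compose`, `impute`, `scale` and `predict` are hypothetical and do not come from the slides; they only illustrate Y = h(g(f(X))). Each step receives exactly one input and hands over exactly one output, which is why this form cannot express parallel paths.

```python
from functools import reduce
from typing import Callable, List, Optional


def compose(*steps: Callable) -> Callable:
    """Apply steps left to right: compose(f, g, h)(x) == h(g(f(x)))."""
    return lambda x: reduce(lambda acc, step: step(acc), steps, x)


def impute(xs: List[Optional[float]]) -> List[float]:
    """f: replace missing values (None) with 0.0 (hypothetical preprocessing)."""
    return [0.0 if v is None else v for v in xs]


def scale(xs: List[float]) -> List[float]:
    """g: divide by the largest absolute value (hypothetical preprocessing)."""
    largest = max(abs(v) for v in xs) or 1.0
    return [v / largest for v in xs]


def predict(xs: List[float]) -> List[int]:
    """h: threshold at 0.5 (stand-in for a predictive model)."""
    return [1 if v > 0.5 else 0 for v in xs]


pipeline = compose(impute, scale, predict)
result = pipeline([2.0, None, 4.0, 1.0])
```

With this input the imputed values are `[2.0, 0.0, 4.0, 1.0]`, the scaled values are `[0.5, 0.0, 1.0, 0.25]`, and only the third value exceeds the threshold.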
Petri net
Mathematical modelling language invented in 1939 by Carl Adam Petri
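Elements: place, transition, arc, token
N = (P, T, F), where P is the set of places, T the set of transitions and F the set of arcs (flow relation)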
Example of Petri net
Places: Reception, Waiting Room, Consulting Room, Exit
Transitions: Check in, Call in, Examination and diagnosis
Token: Patient
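The patient example can be played through in code. The following is a minimal sketch, assuming the usual firing rule for ordinary Petri nets (a transition is enabled when every input place holds a token; firing moves one token from each input place to each output place). The class `PetriNet` and its method names are illustrative and not taken from any library.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PetriNet:
    places: List[str]
    transitions: List[str]
    arcs: List[Tuple[str, str]]  # flow relation F as (source, target) pairs
    marking: Dict[str, int] = field(default_factory=dict)  # tokens per place

    def inputs(self, transition: str) -> List[str]:
        return [src for src, dst in self.arcs if dst == transition]

    def outputs(self, transition: str) -> List[str]:
        return [dst for src, dst in self.arcs if src == transition]

    def enabled(self, transition: str) -> bool:
        # Enabled when every input place holds at least one token.
        return all(self.marking.get(p, 0) >= 1 for p in self.inputs(transition))

    def fire(self, transition: str) -> None:
        if not self.enabled(transition):
            raise ValueError(f"transition {transition!r} is not enabled")
        for p in self.inputs(transition):
            self.marking[p] -= 1
        for p in self.outputs(transition):
            self.marking[p] = self.marking.get(p, 0) + 1


net = PetriNet(
    places=["Reception", "Waiting Room", "Consulting Room", "Exit"],
    transitions=["Check in", "Call in", "Examination and diagnosis"],
    arcs=[
        ("Reception", "Check in"),
        ("Check in", "Waiting Room"),
        ("Waiting Room", "Call in"),
        ("Call in", "Consulting Room"),
        ("Consulting Room", "Examination and diagnosis"),
        ("Examination and diagnosis", "Exit"),
    ],
    marking={"Reception": 1},  # one patient token at Reception
)

for transition in net.transitions:
    net.fire(transition)  # the patient moves one step per firing
```

After the three firings the single token sits in Exit and no transition is enabled any more.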
Petri nets can be more complex
Source: http://bit.ly/1XZQhYZ
Modelling MCPS as Petri net
A Petri net is an MCPS iff all the following conditions apply:
The Petri net is a workflow net.
The Petri net is well-handled and acyclic.
The places P \ {i, o} have only a single input and a single output.
The Petri net is 1-sound.
The Petri net is safe.
All the transitions with multiple inputs or outputs are AND-join or AND-split, respectively.
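Some of these conditions are purely structural and can be checked on the graph alone. The sketch below covers only two of them, under stated assumptions: the workflow-net property (exactly one source place, exactly one sink place, every node on a path between them) and acyclicity. Well-handledness, 1-soundness and safeness are not checked, and the function name `is_acyclic_workflow_net` is hypothetical.

```python
from typing import Dict, List, Set, Tuple


def reachable(start: str, adjacency: Dict[str, List[str]]) -> Set[str]:
    """Return all nodes reachable from start by depth-first search."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_acyclic_workflow_net(places: List[str], transitions: List[str],
                            arcs: List[Tuple[str, str]]) -> bool:
    nodes = set(places) | set(transitions)
    fwd: Dict[str, List[str]] = {n: [] for n in nodes}
    bwd: Dict[str, List[str]] = {n: [] for n in nodes}
    for src, dst in arcs:
        fwd[src].append(dst)
        bwd[dst].append(src)

    # Exactly one source place (i) and one sink place (o).
    sources = [p for p in places if not bwd[p]]
    sinks = [p for p in places if not fwd[p]]
    if len(sources) != 1 or len(sinks) != 1:
        return False

    # Every node must lie on a path from i to o.
    if reachable(sources[0], fwd) != nodes or reachable(sinks[0], bwd) != nodes:
        return False

    # Kahn's algorithm: the net is acyclic iff all nodes can be ordered.
    indegree = {n: len(bwd[n]) for n in nodes}
    queue = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while queue:
        node = queue.pop()
        visited += 1
        for nxt in fwd[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return visited == len(nodes)


linear_arcs = [("i", "t1"), ("t1", "p"), ("p", "t2"), ("t2", "o")]
ok = is_acyclic_workflow_net(["i", "p", "o"], ["t1", "t2"], linear_arcs)
cyclic = is_acyclic_workflow_net(["i", "p", "o"], ["t1", "t2"],
                                 linear_arcs + [("t2", "p")])
```

The linear net passes both checks, while adding the arc from `t2` back to `p` introduces a cycle and the check fails.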
Hierarchical MCPS with parallel paths
Diagram labels: places i and o; dummy transitions; RandomSubspace, composed of Random Feature Selection, Decision Tree and Mean
Any questions so far?
Automating the composition and optimisation of MCPS
Algorithm Selection
What are the best algorithms to process my data?
Hyperparameter Optimisation
How to tune the hyperparameters to get the best performance?
CASH problem for MCPS
Combined Algorithm Selection and Hyperparameter configuration problem
Equation labels: MCPSs, hyperparameters, k-fold cross validation, objective function (e.g. classification error), training dataset, validation dataset
Thornton, C., Hutter, F., Hoos, H.H., Leyton-Brown, K.: Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms.
In: Proc. of the 19th ACM SIGKDD. (2013) 847–855
Martin Salvador M., Budka M., Gabrys B.: Automatic composition and optimisation of multicomponent predictive systems. IEEE Transactions on Knowledge and
Data Engineering. under review - available at http://bit.ly/automatic-mcps-paper (submitted on 01/04/2016)
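For reference, the CASH objective can be written out as below. This is a sketch in the notation of Thornton et al. (2013), adapted on the assumption that the set of algorithms is replaced by a set of MCPSs; the symbols chosen here may differ from those on the original slide.

```latex
\[
\theta^{*}_{\lambda^{*}} \in
\operatorname*{arg\,min}_{\theta^{(j)} \in \Theta,\; \lambda \in \Lambda^{(j)}}
\frac{1}{k} \sum_{i=1}^{k}
\mathcal{L}\!\left(\theta^{(j)}_{\lambda},\,
\mathcal{D}^{(i)}_{\mathrm{train}},\,
\mathcal{D}^{(i)}_{\mathrm{valid}}\right)
\]
```

Here $\Theta$ is the set of candidate MCPSs, $\Lambda^{(j)}$ is the hyperparameter space of the $j$-th MCPS, $\mathcal{L}$ is the objective function (e.g. classification error) obtained when the MCPS is trained on $\mathcal{D}^{(i)}_{\mathrm{train}}$ and evaluated on $\mathcal{D}^{(i)}_{\mathrm{valid}}$, and $k$ is the number of cross-validation folds.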
Search space
Search spaces: PREV, NEW, FULL
Components: Missing Value Handling, Outlier Detection and Handling, Data Transformation, Dimensionality Reduction, Sampling, Predictor, Meta-Predictor
Predictor and Meta-Predictor appear in all three search spaces.
Hyperparameters
PREV   NEW    FULL
756    1186   1564
Optimisation strategies
Grid search: exhaustive exploration of the whole search space. Not feasible in high-dimensional spaces.
Random search: explores the search space randomly during a given time.
Bayesian optimisation: assumes that there is a function between the hyperparameters and the objective and tries to explore the most promising parts of the search space.
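The first two strategies can be contrasted on a toy problem. The sketch below is illustrative only: `objective` is a made-up stand-in for a validation error, the hyperparameter names `lr` and `depth` are hypothetical, and the random-search budget is a number of trials rather than the time budget mentioned above.

```python
import itertools
import random
from typing import Dict, List, Tuple


def objective(config: Dict[str, float]) -> float:
    """Hypothetical validation error, smallest at lr=0.1 and depth=5."""
    return (config["lr"] - 0.1) ** 2 + ((config["depth"] - 5) / 10) ** 2


def grid_search(grid: Dict[str, List[float]]) -> Dict[str, float]:
    """Evaluate every combination; the cost is the product of all grid sizes."""
    keys = list(grid)
    candidates = (dict(zip(keys, values))
                  for values in itertools.product(*(grid[k] for k in keys)))
    return min(candidates, key=objective)


def random_search(space: Dict[str, Tuple[float, float]], n_trials: int,
                  seed: int = 0) -> Dict[str, float]:
    """Sample configurations at random and keep the best one seen."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        config = {
            "lr": rng.uniform(*space["lr"]),
            "depth": rng.randint(int(space["depth"][0]), int(space["depth"][1])),
        }
        if best is None or objective(config) < objective(best):
            best = config
    return best


grid_best = grid_search({"lr": [0.01, 0.1, 1.0], "depth": [1, 5, 10]})
random_best = random_search({"lr": (0.0, 1.0), "depth": (1, 10)}, n_trials=50)
```

The grid has 3 × 3 = 9 points here; with d hyperparameters and 3 values each it would have 3^d points, which is what makes grid search impractical in high-dimensional spaces.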
Hutter, F., Hoos, H. H., & Leyton-Brown, K. (2011). Sequential Model-Based Optimization for General Algorithm Configuration. Learning and Intelligent Optimization, 6683 LNCS, 507–523.
Auto-WEKA for MCPS
WEKA methods as search space
One-click black box
Data + Time Budget → MCPS
Our contribution
● Recursive extension of complex
hyperparameters in the search space.
● Composition and optimisation of
MCPSs (including WEKA filters,
predictors and meta-predictors)
https://github.com/dsibournemouth/autoweka
Evaluated strategies
1. WEKA-Def: All the predictors and meta-predictors are run using WEKA’s
default hyperparameter values.
2. Random search: The search space is randomly explored.
3. SMAC: Sequential Model-based Algorithm Configuration incrementally builds a
Random Forest as surrogate model.
4. TPE: Tree-structured Parzen Estimator uses Parzen (kernel density) estimators, rather than Gaussian Processes, to incrementally build a surrogate model.
Hutter, F., Hoos, H. H., & Leyton-Brown, K. (2011). Sequential Model-Based Optimization for General Algorithm Configuration. Learning and Intelligent Optimization,
6683 LNCS, 507–523.
J. Bergstra, R. Bardenet, Y. Bengio, and B. Kegl, Algorithms for Hyper-Parameter Optimization. in Advances in NIPS 24, 2011, pp. 1–9.
Experiments
21 datasets (classification problems)
Budget: 30 CPU-hours (per run)
25 runs with different seeds
Timeout: 30 minutes
Memout: 3GB RAM
Training and testing process
Holdout error (% misclassification)
Convergence analysis
10-fold CV error of best solutions over time (each color is a different run/seed)
MCPS similarity analysis
Weight for the i-th transition
Hamming distance at the i-th transition
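A weighted Hamming distance of this kind can be sketched as follows. The normalisation by the sum of the weights is an assumption made here so that the result lies in [0, 1]; the original formula is not recoverable from the slide text. The function name `weighted_hamming` is hypothetical, and the component labels in the example are only illustrative.

```python
from typing import Sequence


def weighted_hamming(a: Sequence[str], b: Sequence[str],
                     weights: Sequence[float]) -> float:
    """Weighted share of transitions at which two MCPSs differ (0 = identical)."""
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("both MCPSs and the weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # The Hamming distance at position i is 1 if the components differ, else 0.
    mismatch = sum(w for x, y, w in zip(a, b, weights) if x != y)
    return mismatch / total


# Two illustrative MCPSs described by the component chosen at each transition.
mcps_a = ["ReplaceMissingValues", "Normalize", "PCA", "J48"]
mcps_b = ["ReplaceMissingValues", "Standardize", "PCA", "RandomForest"]
weights = [1, 1, 1, 2]  # assumption: the predictor counts twice as much

distance = weighted_hamming(mcps_a, mcps_b, weights)
```

The two configurations differ at the second and fourth transition, so the weighted mismatch is 1 + 2 = 3 out of a total weight of 5.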
Low error variance and high MCPS similarity
Low error variance and low MCPS similarity
High error variance and low MCPS similarity
For FULL search space
MCPS similarity analysis: clustering
Waveform dataset and SMAC strategy
Available software for Bayesian optimisation
SMAC: Sequential Model-based Algorithm Configuration.
Auto-WEKA: toolbox including random search, SMAC and TPE for WEKA predictors.
Auto-WEKA for MCPS: extension of Auto-WEKA for MCPSs.
Auto-Sklearn: toolbox for automating scikit-learn.
Spearmint: Python library for Bayesian optimisation with Gaussian Processes.
Hyperopt: Python library for random search and TPE.
HPOLib: common interface for SMAC, Spearmint and Hyperopt.
Any questions so far?
Adapting MCPS to changing environments
Maintaining an MCPS
Data distribution can change over time and affect predictions
External factors (e.g. weather conditions, new regulations)
Internal factors (e.g. quality of materials, equipment deterioration)
Source: INFER project
Training and testing process
1. Training data is provided
2. Best MCPS found is selected
3. New batch of unlabelled data requires prediction
4. MCPS generates predictions
5. True labels are provided
6. Predictive accuracy is reported
7. MCPS is adapted using the last batch of labelled data
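The seven steps above can be sketched as a test-then-adapt loop. This is an illustration under assumptions, not the authors' implementation: `MajorityClassModel` is a trivial stand-in for an MCPS, and the `mode` values "baseline" (no adaptation), "batch" (refit on the last batch only) and "cumulative" (refit on all data seen so far) are one reading of how adaptation could be carried out. Re-running an optimiser such as SMAC at step 7 is not covered.

```python
from collections import Counter
from typing import List, Tuple

Batch = List[Tuple[float, int]]  # (feature value, label) pairs


class MajorityClassModel:
    """Stand-in for an MCPS: always predicts the most frequent training label."""

    def fit(self, data: Batch) -> "MajorityClassModel":
        self.label = Counter(y for _, y in data).most_common(1)[0][0]
        return self

    def predict(self, xs: List[float]) -> List[int]:
        return [self.label for _ in xs]


def run_batches(train: Batch, batches: List[Batch], mode: str = "batch") -> List[float]:
    if mode not in ("baseline", "batch", "cumulative"):
        raise ValueError(f"unknown mode: {mode!r}")
    model = MajorityClassModel().fit(train)      # steps 1-2
    history = list(train)
    errors = []
    for batch in batches:
        xs = [x for x, _ in batch]               # step 3: unlabelled batch
        predictions = model.predict(xs)          # step 4
        labels = [y for _, y in batch]           # step 5: true labels arrive
        wrong = sum(p != y for p, y in zip(predictions, labels))
        errors.append(wrong / len(batch))        # step 6: report error
        if mode == "batch":                      # step 7: adapt
            model = MajorityClassModel().fit(batch)
        elif mode == "cumulative":
            history.extend(batch)
            model = MajorityClassModel().fit(history)
    return errors


# Toy data in which the majority label flips after training.
train = [(0.0, 0)] * 6
batches = [[(1.0, 1), (1.0, 1), (1.0, 1), (0.0, 0)], [(1.0, 1)] * 4]

baseline_errors = run_batches(train, batches, mode="baseline")
batch_errors = run_batches(train, batches, mode="batch")
cumulative_errors = run_batches(train, batches, mode="cumulative")
```

The toy data only exercises the loop; it says nothing about how the strategies compare on the real datasets.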
Evaluated strategies
Datasets from chemical production processes
Average classification error (%)
Average classification error per batch (%)
Strategies: Baseline, Batch, Batch+SMAC, Cumulative, Cumulative+SMAC
Datasets shown: drier, thermalox
Batch adaptation doesn’t help! :(
Batch adaptation does help! :)
MCPS similarity analysis
Batch+SMAC and Cumulative+SMAC on the catalyst dataset
Same components, only hyperparameters are adapted
Large difference between batches
Conclusion and future work
Automatic machine learning is becoming a reality. There is a variety of open-source software but also commercial products (e.g. SigOpt and IBM Watson).
Domain experts still play a crucial role (e.g. defining the search space).
Smart techniques to reduce the search space are needed.
Maintaining MCPSs in a production environment is key for success.
There is a gap in adaptive surrogate models for Bayesian optimisation methods.
Thanks!
Publications with Marcin Budka and Bogdan Gabrys:
● “Towards automatic composition of Multicomponent Predictive Systems” - HAIS 2016 (published)
http://bit.ly/towards-mcps-paper
● “Automatic composition and optimisation of Multicomponent Predictive Systems” - IEEE TKDE (under
review) http://bit.ly/automatic-mcps-paper
● “Adapting Multicomponent Predictive Systems using hybrid adaptation Strategies with Auto-WEKA in
process industry” - AutoML at ICML 2016 (accepted) http://bit.ly/adapting-mcps-paper
● “Effects of change propagation resulting from adaptive preprocessing in Multicomponent Predictive
Systems” - KES 2016 (accepted) http://bit.ly/change-propagation-mcps-paper
Slides available at http://www.slideshare.net/draxus
Contact: Manuel Martin Salvador msalvador@bournemouth.ac.uk

More Related Content

What's hot

Le Machine Learning de A à Z
Le Machine Learning de A à ZLe Machine Learning de A à Z
Le Machine Learning de A à ZAlexia Audevart
 
Data Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural NetworksData Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural NetworksBICA Labs
 
Brief introduction to Machine Learning
Brief introduction to Machine LearningBrief introduction to Machine Learning
Brief introduction to Machine LearningCodeForFrankfurt
 
Machine Learning for Everyone
Machine Learning for EveryoneMachine Learning for Everyone
Machine Learning for EveryoneAly Abdelkareem
 
Primer to Machine Learning
Primer to Machine LearningPrimer to Machine Learning
Primer to Machine LearningJeff Tanner
 
Target Leakage in Machine Learning
Target Leakage in Machine LearningTarget Leakage in Machine Learning
Target Leakage in Machine LearningYuriy Guts
 
Meetup sthlm - introduction to Machine Learning with demo cases
Meetup sthlm - introduction to Machine Learning with demo casesMeetup sthlm - introduction to Machine Learning with demo cases
Meetup sthlm - introduction to Machine Learning with demo casesZenodia Charpy
 
Intro to Mahout -- DC Hadoop
Intro to Mahout -- DC HadoopIntro to Mahout -- DC Hadoop
Intro to Mahout -- DC HadoopGrant Ingersoll
 
Computational decision making
Computational decision makingComputational decision making
Computational decision makingBoris Adryan
 
Cost-effective Interactive Attention Learning with Neural Attention Process
Cost-effective Interactive Attention Learning with Neural Attention ProcessCost-effective Interactive Attention Learning with Neural Attention Process
Cost-effective Interactive Attention Learning with Neural Attention ProcessMLAI2
 
Week 4 advanced labeling, augmentation and data preprocessing
Week 4   advanced labeling, augmentation and data preprocessingWeek 4   advanced labeling, augmentation and data preprocessing
Week 4 advanced labeling, augmentation and data preprocessingAjay Taneja
 
Machine Learning Fundamentals
Machine Learning FundamentalsMachine Learning Fundamentals
Machine Learning FundamentalsSigOpt
 
Machine Learning for .NET Developers - ADC21
Machine Learning for .NET Developers - ADC21Machine Learning for .NET Developers - ADC21
Machine Learning for .NET Developers - ADC21Gülden Bilgütay
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...PyData
 
Computation graphs - Tensorflow & CNTK
Computation graphs - Tensorflow & CNTKComputation graphs - Tensorflow & CNTK
Computation graphs - Tensorflow & CNTKA H M Forhadul Islam
 
Model Drift Monitoring using Tensorflow Model Analysis
Model Drift Monitoring using Tensorflow Model AnalysisModel Drift Monitoring using Tensorflow Model Analysis
Model Drift Monitoring using Tensorflow Model AnalysisVivek Raja P S
 
Building Data Apps with Python
Building Data Apps with PythonBuilding Data Apps with Python
Building Data Apps with PythonBenjamin Bengfort
 

What's hot (20)

Le Machine Learning de A à Z
Le Machine Learning de A à ZLe Machine Learning de A à Z
Le Machine Learning de A à Z
 
Data Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural NetworksData Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural Networks
 
Meta-Learning Presentation
Meta-Learning PresentationMeta-Learning Presentation
Meta-Learning Presentation
 
Brief introduction to Machine Learning
Brief introduction to Machine LearningBrief introduction to Machine Learning
Brief introduction to Machine Learning
 
Machine Learning for Everyone
Machine Learning for EveryoneMachine Learning for Everyone
Machine Learning for Everyone
 
Primer to Machine Learning
Primer to Machine LearningPrimer to Machine Learning
Primer to Machine Learning
 
Target Leakage in Machine Learning
Target Leakage in Machine LearningTarget Leakage in Machine Learning
Target Leakage in Machine Learning
 
Meetup sthlm - introduction to Machine Learning with demo cases
Meetup sthlm - introduction to Machine Learning with demo casesMeetup sthlm - introduction to Machine Learning with demo cases
Meetup sthlm - introduction to Machine Learning with demo cases
 
Intro to Mahout -- DC Hadoop
Intro to Mahout -- DC HadoopIntro to Mahout -- DC Hadoop
Intro to Mahout -- DC Hadoop
 
Computational decision making
Computational decision makingComputational decision making
Computational decision making
 
Cost-effective Interactive Attention Learning with Neural Attention Process
Cost-effective Interactive Attention Learning with Neural Attention ProcessCost-effective Interactive Attention Learning with Neural Attention Process
Cost-effective Interactive Attention Learning with Neural Attention Process
 
Week 4 advanced labeling, augmentation and data preprocessing
Week 4   advanced labeling, augmentation and data preprocessingWeek 4   advanced labeling, augmentation and data preprocessing
Week 4 advanced labeling, augmentation and data preprocessing
 
Machine Learning Fundamentals
Machine Learning FundamentalsMachine Learning Fundamentals
Machine Learning Fundamentals
 
Machine Learning for .NET Developers - ADC21
Machine Learning for .NET Developers - ADC21Machine Learning for .NET Developers - ADC21
Machine Learning for .NET Developers - ADC21
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
 
Bayesian Global Optimization
Bayesian Global OptimizationBayesian Global Optimization
Bayesian Global Optimization
 
Computation graphs - Tensorflow & CNTK
Computation graphs - Tensorflow & CNTKComputation graphs - Tensorflow & CNTK
Computation graphs - Tensorflow & CNTK
 
Model Drift Monitoring using Tensorflow Model Analysis
Model Drift Monitoring using Tensorflow Model AnalysisModel Drift Monitoring using Tensorflow Model Analysis
Model Drift Monitoring using Tensorflow Model Analysis
 
Building Data Apps with Python
Building Data Apps with PythonBuilding Data Apps with Python
Building Data Apps with Python
 
Introduction to Auto ML
Introduction to Auto MLIntroduction to Auto ML
Introduction to Auto ML
 

Similar to Automating Machine Learning - Is it feasible?

Automated-tuned hyper-parameter deep neural network by using arithmetic optim...
Automated-tuned hyper-parameter deep neural network by using arithmetic optim...Automated-tuned hyper-parameter deep neural network by using arithmetic optim...
Automated-tuned hyper-parameter deep neural network by using arithmetic optim...IJECEIAES
 
Modelling Multi-Component Predictive Systems as Petri Nets
Modelling Multi-Component Predictive Systems as Petri NetsModelling Multi-Component Predictive Systems as Petri Nets
Modelling Multi-Component Predictive Systems as Petri NetsManuel Martín
 
Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...Anubhav Jain
 
PNNL April 2011 ogce
PNNL April 2011 ogcePNNL April 2011 ogce
PNNL April 2011 ogcemarpierc
 
Integrative information management for systems biology
Integrative information management for systems biologyIntegrative information management for systems biology
Integrative information management for systems biologyNeil Swainston
 
Physics inspired artificial intelligence/machine learning
Physics inspired artificial intelligence/machine learningPhysics inspired artificial intelligence/machine learning
Physics inspired artificial intelligence/machine learningKAMAL CHOUDHARY
 
Automated Machine Learning Applied to Diverse Materials Design Problems
Automated Machine Learning Applied to Diverse Materials Design ProblemsAutomated Machine Learning Applied to Diverse Materials Design Problems
Automated Machine Learning Applied to Diverse Materials Design ProblemsAnubhav Jain
 
The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...Anubhav Jain
 
Bioinformatics_1_ChenS.pptx
Bioinformatics_1_ChenS.pptxBioinformatics_1_ChenS.pptx
Bioinformatics_1_ChenS.pptxxRowlet
 
A Threshold fuzzy entropy based feature selection method applied in various b...
A Threshold fuzzy entropy based feature selection method applied in various b...A Threshold fuzzy entropy based feature selection method applied in various b...
A Threshold fuzzy entropy based feature selection method applied in various b...IJMER
 
Ijricit 01-002 enhanced replica detection in short time for large data sets
Ijricit 01-002 enhanced replica detection in  short time for large data setsIjricit 01-002 enhanced replica detection in  short time for large data sets
Ijricit 01-002 enhanced replica detection in short time for large data setsIjripublishers Ijri
 
An interactive approach to multiobjective clustering of gene expression patterns
An interactive approach to multiobjective clustering of gene expression patternsAn interactive approach to multiobjective clustering of gene expression patterns
An interactive approach to multiobjective clustering of gene expression patternsRavi Kumar
 
Model-Simulation-and-Measurement-Based Systems Engineering of Power System Sy...
Model-Simulation-and-Measurement-Based Systems Engineering of Power System Sy...Model-Simulation-and-Measurement-Based Systems Engineering of Power System Sy...
Model-Simulation-and-Measurement-Based Systems Engineering of Power System Sy...Luigi Vanfretti
 
Mpp Rsv 2008 Public
Mpp Rsv 2008 PublicMpp Rsv 2008 Public
Mpp Rsv 2008 Publiclab13unisa
 
Machine Learning and Data Analytics in Semiconductor Yield Management.pptx
Machine Learning and Data Analytics in Semiconductor Yield Management.pptxMachine Learning and Data Analytics in Semiconductor Yield Management.pptx
Machine Learning and Data Analytics in Semiconductor Yield Management.pptxyieldWerx Semiconductor
 

Similar to Automating Machine Learning - Is it feasible? (20)

Automated-tuned hyper-parameter deep neural network by using arithmetic optim...
Automated-tuned hyper-parameter deep neural network by using arithmetic optim...Automated-tuned hyper-parameter deep neural network by using arithmetic optim...
Automated-tuned hyper-parameter deep neural network by using arithmetic optim...
 
Modelling Multi-Component Predictive Systems as Petri Nets
Modelling Multi-Component Predictive Systems as Petri NetsModelling Multi-Component Predictive Systems as Petri Nets
Modelling Multi-Component Predictive Systems as Petri Nets
 
Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...
 
ReComp for genomics
ReComp for genomicsReComp for genomics
ReComp for genomics
 
Folker Meyer: Metagenomic Data Annotation
Folker Meyer: Metagenomic Data AnnotationFolker Meyer: Metagenomic Data Annotation
Folker Meyer: Metagenomic Data Annotation
 
PNNL April 2011 ogce
PNNL April 2011 ogcePNNL April 2011 ogce
PNNL April 2011 ogce
 
Integrative information management for systems biology
Integrative information management for systems biologyIntegrative information management for systems biology
Integrative information management for systems biology
 
Physics inspired artificial intelligence/machine learning
Physics inspired artificial intelligence/machine learningPhysics inspired artificial intelligence/machine learning
Physics inspired artificial intelligence/machine learning
 
Automated Machine Learning Applied to Diverse Materials Design Problems
Automated Machine Learning Applied to Diverse Materials Design ProblemsAutomated Machine Learning Applied to Diverse Materials Design Problems
Automated Machine Learning Applied to Diverse Materials Design Problems
 
The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...
 
Research Proposal
Research ProposalResearch Proposal
Research Proposal
 
Bioinformatics_1_ChenS.pptx
Bioinformatics_1_ChenS.pptxBioinformatics_1_ChenS.pptx
Bioinformatics_1_ChenS.pptx
 
"Agro-Market Prediction by Fuzzy based Neuro-Genetic Algorithm"
"Agro-Market Prediction by Fuzzy based Neuro-Genetic Algorithm""Agro-Market Prediction by Fuzzy based Neuro-Genetic Algorithm"
"Agro-Market Prediction by Fuzzy based Neuro-Genetic Algorithm"
 
A Threshold fuzzy entropy based feature selection method applied in various b...
A Threshold fuzzy entropy based feature selection method applied in various b...A Threshold fuzzy entropy based feature selection method applied in various b...
A Threshold fuzzy entropy based feature selection method applied in various b...
 
Ijricit 01-002 enhanced replica detection in short time for large data sets
Ijricit 01-002 enhanced replica detection in  short time for large data setsIjricit 01-002 enhanced replica detection in  short time for large data sets
Ijricit 01-002 enhanced replica detection in short time for large data sets
 
An interactive approach to multiobjective clustering of gene expression patterns
An interactive approach to multiobjective clustering of gene expression patternsAn interactive approach to multiobjective clustering of gene expression patterns
An interactive approach to multiobjective clustering of gene expression patterns
 
Model-Simulation-and-Measurement-Based Systems Engineering of Power System Sy...
Model-Simulation-and-Measurement-Based Systems Engineering of Power System Sy...Model-Simulation-and-Measurement-Based Systems Engineering of Power System Sy...
Model-Simulation-and-Measurement-Based Systems Engineering of Power System Sy...
 
Mpp Rsv 2008 Public
Mpp Rsv 2008 PublicMpp Rsv 2008 Public
Mpp Rsv 2008 Public
 
PPT
PPTPPT
PPT
 
Machine Learning and Data Analytics in Semiconductor Yield Management.pptx
Machine Learning and Data Analytics in Semiconductor Yield Management.pptxMachine Learning and Data Analytics in Semiconductor Yield Management.pptx
Machine Learning and Data Analytics in Semiconductor Yield Management.pptx
 

More from Manuel Martín

Automatizando el aprendizaje basado en datos
Automatizando el aprendizaje basado en datosAutomatizando el aprendizaje basado en datos
Automatizando el aprendizaje basado en datosManuel Martín
 
Brand engagement with mobile gamification apps from a developer perspective
Brand engagement with mobile gamification apps from a developer perspectiveBrand engagement with mobile gamification apps from a developer perspective
Brand engagement with mobile gamification apps from a developer perspectiveManuel Martín
 
Effects of change propagation resulting from adaptive preprocessing in multic...
Effects of change propagation resulting from adaptive preprocessing in multic...Effects of change propagation resulting from adaptive preprocessing in multic...
Effects of change propagation resulting from adaptive preprocessing in multic...Manuel Martín
 
Improving transport timetables usability for mobile devices
Improving transport timetables usability for mobile devicesImproving transport timetables usability for mobile devices
Improving transport timetables usability for mobile devicesManuel Martín
 
From sensor readings to prediction: on the process of developing practical so...
From sensor readings to prediction: on the process of developing practical so...From sensor readings to prediction: on the process of developing practical so...
From sensor readings to prediction: on the process of developing practical so...Manuel Martín
 
Online Detection of Shutdown Periods in Chemical Plants: A Case Study
Online Detection of Shutdown Periods in Chemical Plants: A Case StudyOnline Detection of Shutdown Periods in Chemical Plants: A Case Study
Online Detection of Shutdown Periods in Chemical Plants: A Case StudyManuel Martín
 
Artificial Intelligence for Automating Data Analysis
Artificial Intelligence for Automating Data AnalysisArtificial Intelligence for Automating Data Analysis
Artificial Intelligence for Automating Data AnalysisManuel Martín
 
Handling concept drift in data stream mining
Handling concept drift in data stream miningHandling concept drift in data stream mining
Handling concept drift in data stream miningManuel Martín
 
Minería de secuencias de datos
Minería de secuencias de datosMinería de secuencias de datos
Minería de secuencias de datosManuel Martín
 
Minería de secuencias de datos
Minería de secuencias de datosMinería de secuencias de datos
Minería de secuencias de datosManuel Martín
 
AndalucíaPeople: Un sistema de recomendación para sitios de ocio de Andalucía
AndalucíaPeople: Un sistema de recomendación para sitios de ocio de AndalucíaAndalucíaPeople: Un sistema de recomendación para sitios de ocio de Andalucía
AndalucíaPeople: Un sistema de recomendación para sitios de ocio de AndalucíaManuel Martín
 
Operaciones Colectivas en MPI
Operaciones Colectivas en MPIOperaciones Colectivas en MPI
Operaciones Colectivas en MPIManuel Martín
 
Introducción a GNU/Linux
Introducción a GNU/LinuxIntroducción a GNU/Linux
Introducción a GNU/LinuxManuel Martín
 
Presentación Día de la Libertad del Software 2011
Presentación Día de la Libertad del Software 2011Presentación Día de la Libertad del Software 2011
Presentación Día de la Libertad del Software 2011Manuel Martín
 
Presentacion Taller de Introducción a Linux SFD2010
Presentacion Taller de Introducción a Linux SFD2010Presentacion Taller de Introducción a Linux SFD2010
Presentacion Taller de Introducción a Linux SFD2010Manuel Martín
 
Presentación Gnome 3.0 en Granada
Presentación Gnome 3.0 en GranadaPresentación Gnome 3.0 en Granada
Presentación Gnome 3.0 en GranadaManuel Martín
 
AndalucíaPeople: Un sistema de recomendación para sitios de ocio de Andalucía
AndalucíaPeople: Un sistema de recomendación para sitios de ocio de AndalucíaAndalucíaPeople: Un sistema de recomendación para sitios de ocio de Andalucía
AndalucíaPeople: Un sistema de recomendación para sitios de ocio de AndalucíaManuel Martín
 
Pintando gráficas con Python
Pintando gráficas con PythonPintando gráficas con Python
Pintando gráficas con PythonManuel Martín
 

More from Manuel Martín (20)

Hogar (Des)Conectado
Hogar (Des)ConectadoHogar (Des)Conectado
Hogar (Des)Conectado
 
Automatizando el aprendizaje basado en datos
Automatizando el aprendizaje basado en datosAutomatizando el aprendizaje basado en datos
Automatizando el aprendizaje basado en datos
 
Brand engagement with mobile gamification apps from a developer perspective
Brand engagement with mobile gamification apps from a developer perspectiveBrand engagement with mobile gamification apps from a developer perspective
Brand engagement with mobile gamification apps from a developer perspective
 
Effects of change propagation resulting from adaptive preprocessing in multic...
Effects of change propagation resulting from adaptive preprocessing in multic...Effects of change propagation resulting from adaptive preprocessing in multic...
Effects of change propagation resulting from adaptive preprocessing in multic...
 
Improving transport timetables usability for mobile devices
Improving transport timetables usability for mobile devicesImproving transport timetables usability for mobile devices
Automating Machine Learning - Is it feasible?

  • 1. Automating Machine Learning Is it feasible? Manuel Martin Salvador Smart Technology Research Group Bournemouth University June 2nd, 2016
  • 2. Index 1. Recent life-changing applications of Machine Learning 2. Multicomponent Predictive Systems (MCPS) 3. Automating the composition and optimisation of MCPS 4. Adapting MCPS to changing environments 5. Conclusion and future work
  • 4. Gene Discovery Source: http://msgeneticslab.med.ubc.ca/gene-discovery/ Dessa Sadovnick and Carles Vilariño-Güell, University of British Columbia. A mutation in the NR1H3 gene can trigger Multiple Sclerosis.
  • 5. Microsoft Seeing AI Source: https://www.youtube.com/watch?v=R2mC-NUAmMk
  • 11. Data is imperfect Missing Values Noise High dimensionality Outliers Question Mark: http://commons.wikimedia.org/wiki/File:Question_mark_road_sign,_Australia.jpg Noise: http://www.flickr.com/photos/benleto/3223155821/ Outliers: http://commons.wikimedia.org/wiki/File:Diagrama_de_caixa_com_outliers_and_whisker.png 3D plot: http://salsahpc.indiana.edu/plotviz/
  • 12. Multicomponent Predictive System (MCPS): Data → Preprocessing → Predictive Model → Postprocessing → Predictions
  • 13. Multicomponent Predictive System (MCPS): Data, Preprocessing (×3), Predictive Model (×3), Postprocessing, Predictions
  • 14. How to model MCPS? In increasing order of expressive power: Function composition, Y = h(g(f(X))): not enough for modelling parallel paths. Directed Acyclic Graph: not enough to model process state. Petri net: very flexible, with a robust mathematical background.
  • 15. Petri net: mathematical modelling language invented in 1939 by Carl Adam Petri. Elements: token, place, transition, arc. N = (P, T, F)
  • 16. Example of Petri net. Places: Reception, Waiting Room, Consulting Room, Exit. Transitions: Check in, Call in, Examination and diagnosis. Token: Patient.
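The token game in this example can be sketched in a few lines of Python. This is only an illustrative sketch: the `PetriNet` class is hypothetical, and the wiring of the arcs (Reception → Check in → Waiting Room → Call in → Consulting Room → Examination and diagnosis → Exit) is an assumption read off the slide labels, not something the transcript states.

```python
from dataclasses import dataclass, field


@dataclass
class PetriNet:
    """Minimal place/transition net N = (P, T, F) with a token marking."""
    places: set
    transitions: set
    arcs: set                                     # F: (source, target) pairs
    marking: dict = field(default_factory=dict)   # tokens held by each place

    def inputs(self, t):
        return [s for (s, d) in self.arcs if d == t]

    def outputs(self, t):
        return [d for (s, d) in self.arcs if s == t]

    def enabled(self, t):
        # A transition is enabled when every input place holds a token.
        return all(self.marking.get(p, 0) > 0 for p in self.inputs(t))

    def fire(self, t):
        if not self.enabled(t):
            raise ValueError(f"transition {t!r} is not enabled")
        for p in self.inputs(t):       # consume one token per input place
            self.marking[p] -= 1
        for p in self.outputs(t):      # produce one token per output place
            self.marking[p] = self.marking.get(p, 0) + 1


# Assumed wiring of the patient example; one token (the patient) at Reception.
net = PetriNet(
    places={"Reception", "Waiting Room", "Consulting Room", "Exit"},
    transitions={"Check in", "Call in", "Examination and diagnosis"},
    arcs={
        ("Reception", "Check in"), ("Check in", "Waiting Room"),
        ("Waiting Room", "Call in"), ("Call in", "Consulting Room"),
        ("Consulting Room", "Examination and diagnosis"),
        ("Examination and diagnosis", "Exit"),
    },
    marking={"Reception": 1},
)
for t in ["Check in", "Call in", "Examination and diagnosis"]:
    net.fire(t)
```

After the three firings the single token has moved from Reception to Exit, and no transition remains enabled.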
  • 24. Petri nets can be more complex Source: http://bit.ly/1XZQhYZ
  • 25. Modelling MCPS as Petri net A Petri net is an MCPS iff all the following conditions apply: The Petri net is a workflow net. The Petri net is well-handled and acyclic. The places P{i,o} have only a single input and a single output. The Petri net is 1-sound. The Petri net is safe. All the transitions with multiple inputs or outputs are AND-join or AND-split, respectively.
  • 26. Hierarchical MCPS with parallel paths dummy dummy i o
  • 27. Hierarchical MCPS with parallel paths dummy dummy i o Random Feature Selection RandomSubspace Decision Tree Mean
  • 29. Automating the composition and optimisation of MCPS
  • 30. Algorithm Selection What are the best algorithms to process my data?
  • 31. Hyperparameter Optimisation How to tune the hyperparameters to get the best performance?
  • 32. CASH problem for MCPS: Combined Algorithm Selection and Hyperparameter configuration problem. Equation labels: MCPSs, hyperparameters, training dataset, validation dataset, k-fold cross validation, objective function (e.g. classification error). References: Thornton, C., Hutter, F., Hoos, H.H., Leyton-Brown, K.: Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms. In: Proc. of the 19th ACM SIGKDD. (2013) 847–855. Martin Salvador M., Budka M., Gabrys B.: Automatic composition and optimisation of multicomponent predictive systems. IEEE Transactions on Knowledge and Data Engineering. Under review, available at http://bit.ly/automatic-mcps-paper (submitted on 01/04/2016)
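The equation on this slide did not survive extraction. The following is a sketch of how the CASH objective is usually written, following the formulation in Thornton et al. (2013) with the algorithm replaced by an MCPS. The symbols are assumed notation: \theta^{(j)} is a candidate MCPS from the set \Theta, \lambda its hyperparameters from \Lambda^{(j)}, \mathcal{L} the objective function, and \mathcal{D}^{(i)}_{train}, \mathcal{D}^{(i)}_{valid} the i-th of k cross-validation folds.

```latex
\theta^{*}_{\lambda^{*}} \in
  \operatorname*{argmin}_{\theta^{(j)} \in \Theta,\; \lambda \in \Lambda^{(j)}}
  \frac{1}{k} \sum_{i=1}^{k}
  \mathcal{L}\!\left(\theta^{(j)}_{\lambda},\,
                     \mathcal{D}^{(i)}_{\mathrm{train}},\,
                     \mathcal{D}^{(i)}_{\mathrm{valid}}\right)
```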
  • 33. Search space (three variants: PREV, NEW, FULL). Component types: Predictor, Meta-Predictor, Missing Value Handling, Outlier Detection and Handling, Data Transformation, Dimensionality Reduction, Sampling. Number of hyperparameters: PREV 756, NEW 1186, FULL 1564.
  • 34. Optimisation strategies. Grid search: exhaustive exploration of the whole search space; not feasible in high-dimensional spaces. Random search: explores the search space randomly for a given time. Bayesian optimisation: assumes that there is a function between the hyperparameters and the objective, and tries to explore the most promising parts of the search space. Hutter, F., Hoos, H. H., & Leyton-Brown, K. (2011). Sequential Model-Based Optimization for General Algorithm Configuration. Learning and Intelligent Optimization, 6683 LNCS, 507–523.
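The difference between the first two strategies can be illustrated with a toy sketch. Everything here is hypothetical: `objective` stands in for a cross-validated error, the two-parameter `space` is invented for illustration, and random search is limited by a trial count rather than by the time budget the slide mentions.

```python
import itertools
import random


def objective(config):
    """Hypothetical stand-in for a cross-validated error (lower is better)."""
    return (config["learning_rate"] - 0.1) ** 2 + (config["depth"] - 5) ** 2 / 100


# Illustrative search space: every combination is a candidate configuration.
space = {
    "learning_rate": [0.001, 0.01, 0.1, 1.0],
    "depth": [1, 3, 5, 7, 9],
}


def grid_search(space, objective):
    """Evaluate every combination; cost grows exponentially with dimensions."""
    keys = list(space)
    candidates = (dict(zip(keys, values))
                  for values in itertools.product(*(space[k] for k in keys)))
    return min(candidates, key=objective)


def random_search(space, objective, n_trials, seed=0):
    """Evaluate n_trials randomly sampled configurations and keep the best."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        config = {k: rng.choice(v) for k, v in space.items()}
        if best is None or objective(config) < objective(best):
            best = config
    return best


best_grid = grid_search(space, objective)
best_random = random_search(space, objective, n_trials=10)
```

Grid search evaluates all 20 combinations here, while random search evaluates only the 10 it samples, so it may or may not hit the optimum. Bayesian optimisation would replace the uniform sampling with a surrogate model that proposes the next configuration.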
  • 35. Auto-WEKA for MCPS. WEKA methods as search space. One-click black box: Data + Time Budget → MCPS. Our contribution: ● Recursive extension of complex hyperparameters in the search space. ● Composition and optimisation of MCPSs (including WEKA filters, predictors and meta-predictors). https://github.com/dsibournemouth/autoweka
  • 36. Evaluated strategies 1. WEKA-Def: All the predictors and meta-predictors are run using WEKA’s default hyperparameter values. 2. Random search: The search space is randomly explored. 3. SMAC: Sequential Model-based Algorithm Configuration incrementally builds a Random Forest as surrogate model. 4. TPE: the Tree-structured Parzen Estimator incrementally builds a surrogate model from Parzen (kernel density) estimators rather than Gaussian Processes. Hutter, F., Hoos, H. H., & Leyton-Brown, K. (2011). Sequential Model-Based Optimization for General Algorithm Configuration. Learning and Intelligent Optimization, 6683 LNCS, 507–523. J. Bergstra, R. Bardenet, Y. Bengio, and B. Kegl, Algorithms for Hyper-Parameter Optimization. In Advances in NIPS 24, 2011, pp. 1–9.
  • 37. Experiments: 21 datasets (classification problems). Budget: 30 CPU-hours (per run). 25 runs with different seeds. Timeout: 30 minutes. Memout: 3 GB RAM.
  • 39. Holdout error (% misclassification)
  • 40. Convergence analysis 10-fold CV error of best solutions over time (each color is a different run/seed)
  • 41. MCPS similarity analysis (FULL search space). Equation labels: weight for the i-th transition; Hamming distance at the i-th transition. Plot annotations: low error variance and high MCPS similarity; low error variance and low MCPS similarity; high error variance and low MCPS similarity.
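The distance formula itself is missing from the transcript. One plausible reading of the two surviving labels is a weighted Hamming distance over the transitions of two MCPSs, sketched below. The function name, the normalisation by the total weight, and the example component lists and weights are all assumptions made for illustration.

```python
def mcps_distance(a, b, weights):
    """Weighted Hamming distance between two equal-length component lists.

    a, b:     the component chosen at each transition of two MCPSs
    weights:  the weight of each transition (assumed to be non-negative)
    """
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("a, b and weights must have the same length")
    # The Hamming term at transition i is 1 when the components differ, else 0,
    # so only the weights of mismatching transitions are summed.
    mismatch_weight = sum(w for x, y, w in zip(a, b, weights) if x != y)
    # Dividing by the total weight (an assumption) keeps the result in [0, 1].
    return mismatch_weight / sum(weights)


# Hypothetical MCPSs: five preprocessing slots followed by a predictor.
mcps_a = ["ReplaceMissingValues", "None", "Normalize", "PCA", "None", "J48"]
mcps_b = ["ReplaceMissingValues", "None", "Standardize", "PCA", "None", "RandomForest"]
weights = [1, 1, 1, 1, 1, 2]   # illustrative: the predictor counts double

distance = mcps_distance(mcps_a, mcps_b, weights)
```

A distance of 0 means both runs selected identical components, and values near 1 mean they share almost nothing.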
  • 42. MCPS similarity analysis: clustering Waveform dataset and SMAC strategy
  • 43. Available software for Bayesian optimisation. SMAC: Sequential Model-based Algorithm Configuration. Auto-WEKA: toolbox including random search, SMAC and TPE for WEKA predictors. Auto-WEKA for MCPS: extension of Auto-WEKA for MCPSs. Auto-Sklearn: toolbox for automating scikit-learn. Spearmint: Python library for Bayesian optimisation with Gaussian Processes. Hyperopt: Python library for random search and TPE. HPOLib: common interface for SMAC, Spearmint and Hyperopt.
  • 46. Maintaining an MCPS. Data distribution can change over time and affect predictions. External factors (e.g. weather conditions, new regulations). Internal factors (e.g. quality of materials, equipment deterioration). Source: INFER project
  • 47. Training and testing process 1. Training data is provided 2. Best MCPS found is selected 3. New batch of unlabelled data requires prediction 4. MCPS generates predictions 5. True labels are provided 6. Predictive accuracy is reported 7. MCPS is adapted using the last batch of labelled data
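The seven steps above can be sketched as a simple evaluation loop. This is a toy illustration under stated assumptions: `MajorityClassModel` is a hypothetical stand-in for the MCPS found in step 2, and the `cumulative` flag reflects one reading of the Batch and Cumulative strategies named later in the deck (retrain on the last batch only, or on all labelled data seen so far).

```python
from collections import Counter


class MajorityClassModel:
    """Toy stand-in for an MCPS: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def run_stream(model, X_train, y_train, batches, cumulative=False):
    """Predict each batch, score it once labels arrive, then adapt the model."""
    model.fit(X_train, y_train)                            # steps 1-2
    seen_X, seen_y = list(X_train), list(y_train)
    errors = []
    for X_batch, y_batch in batches:
        predictions = model.predict(X_batch)               # steps 3-4
        wrong = sum(p != t for p, t in zip(predictions, y_batch))
        errors.append(wrong / len(y_batch))                # steps 5-6
        if cumulative:                                     # step 7, all data so far
            seen_X += X_batch
            seen_y += y_batch
            model.fit(seen_X, seen_y)
        else:                                              # step 7, last batch only
            model.fit(X_batch, y_batch)
    return errors


# The label distribution shifts from "a" to "b" after training.
X_train, y_train = [[0], [0], [0]], ["a", "a", "a"]
batches = [([[0], [0]], ["b", "b"]), ([[0], [0]], ["b", "b"])]

batch_errors = run_stream(MajorityClassModel(), X_train, y_train, batches)
cumulative_errors = run_stream(MajorityClassModel(), X_train, y_train, batches,
                               cumulative=True)
```

In this contrived stream the batch strategy recovers after one batch, whereas the cumulative strategy is still dominated by the older labels. Which strategy works better on real data depends on how the distribution changes.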
  • 49. Datasets from chemical production processes
  • 51. Average classification error per batch (%). Strategies: Baseline, Batch, Batch+SMAC, Cumulative, Cumulative+SMAC. Datasets: drier, thermalox. Annotations: "Batch adaptation doesn’t help! :(" and "Batch adaptation does help! :)"
  • 52. MCPS similarity analysis (catalyst dataset). Panels: Batch+SMAC, Cumulative+SMAC. Annotations: "Same components, only hyperparameters are adapted"; "Large difference between batches".
  • 53. Conclusion and future work. Automatic machine learning is becoming a reality. There is a variety of open-source software, as well as commercial products (e.g. SigOpt and IBM Watson). Domain experts still play a crucial role (e.g. defining the search space). Smart techniques to reduce the search space are needed. Maintaining MCPSs in a production environment is key to success. There is a gap in adaptive surrogate models for Bayesian optimisation methods.
  • 54. Thanks! Publications with Marcin Budka and Bogdan Gabrys: ● “Towards automatic composition of Multicomponent Predictive Systems” - HAIS 2016 (published) http://bit.ly/towards-mcps-paper ● “Automatic composition and optimisation of Multicomponent Predictive Systems” - IEEE TKDE (under review) http://bit.ly/automatic-mcps-paper ● “Adapting Multicomponent Predictive Systems using hybrid adaptation Strategies with Auto-WEKA in process industry” - AutoML at ICML 2016 (accepted) http://bit.ly/adapting-mcps-paper ● “Effects of change propagation resulting from adaptive preprocessing in Multicomponent Predictive Systems” - KES 2016 (accepted) http://bit.ly/change-propagation-mcps-paper Slides available at http://www.slideshare.net/draxus Contact: Manuel Martin Salvador, msalvador@bournemouth.ac.uk