By Hayim Makabee
Automated Machine Learning (AutoML)
systems find the right algorithm and
hyperparameters in a data-driven way
without any human intervention.
AutoML allows data scientists
to increase their productivity
without adding more members
to the data science team.
AutoML addresses the skills gap
between the demand for data
science talent and the
availability of this talent.
Used 165 classification data sets from a variety of sources
and 13 different classification algorithms from scikit-learn.
Compared classification accuracy using default parameters
for each algorithm to a tuned version of those algorithms.
On average, got 5–10% improvement in classification
accuracy from tuning algorithms from default parameters.
However, there is no parameter combination that works best
for all problems.
Tuning is mandatory to see improvement and this feature is
built into all AutoML solutions.
Build a probabilistic model to capture
the relationship between hyperparameter
settings and their measured performance.
Use the model to select useful hyperparameter
settings to try next by trading off exploration
(searching in parts of the space where the model is
uncertain) and exploitation (focusing on parts of the
space predicted to perform well).
Run the machine learning algorithm with those
hyperparameter settings, measure the performance
and update the probabilistic model.
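The loop above can be sketched in plain Python. This is a deliberately simplified illustration, not Auto-sklearn's internals: the surrogate here is a crude nearest-neighbour model (prediction = value of the closest observed point, uncertainty = distance to it) standing in for a real probabilistic model, and the objective is a toy stand-in for "train the algorithm and measure accuracy".

```python
import random

def objective(x):
    # Toy stand-in for "run the ML algorithm and measure performance":
    # accuracy peaks at x = 0.3.
    return 1.0 - (x - 0.3) ** 2

def surrogate(x, observations):
    # Crude probabilistic model: predicted mean = value of the nearest
    # observed point; uncertainty grows with distance to it.
    nearest_x, nearest_y = min(observations, key=lambda o: abs(o[0] - x))
    return nearest_y, abs(nearest_x - x)

def propose(observations, candidates, kappa=1.0):
    # Upper-confidence-bound acquisition: trade off exploitation (predicted
    # mean) against exploration (uncertainty).
    def ucb(x):
        mean, uncertainty = surrogate(x, observations)
        return mean + kappa * uncertainty
    return max(candidates, key=ucb)

random.seed(0)
candidates = [i / 100 for i in range(101)]                 # hyperparameter grid
observations = [(x, objective(x)) for x in random.sample(candidates, 2)]

for _ in range(20):
    x = propose(observations, candidates)                  # select setting to try
    observations.append((x, objective(x)))                 # run, measure, update model

best_x, best_y = max(observations, key=lambda o: o[1])
print(best_x, best_y)
```

The loop quickly concentrates its evaluations around the well-performing region while still probing unexplored parts of the grid.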
Auto-sklearn is open source, implemented
in Python, and built around the scikit-learn library.
It contains a machine learning pipeline
which takes care of missing values,
categorical features, sparse and dense data,
and rescaling the data.
Next, the pipeline applies a preprocessing
algorithm and an ML algorithm.
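As a hedged illustration of those stages, here is a hand-written scikit-learn pipeline covering missing-value imputation, rescaling, and a final ML algorithm. Auto-sklearn assembles and tunes such pipelines automatically; the concrete steps, classifier, and tiny dataset below are only an example.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Tiny dataset with a missing value; the pipeline handles it end to end.
X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, 8.0], [9.0, 6.0]])
y = np.array([0, 0, 1, 1])

pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),   # take care of missing values
    ("rescale", StandardScaler()),                # rescale the data
    ("model", LogisticRegression()),              # the ML algorithm
])

pipeline.fit(X, y)
print(pipeline.predict(X))
```

Because every stage lives in one Pipeline object, the whole chain can be fit, applied, and tuned as a single estimator.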
Generalizing Bayesian Optimization
Bayesian Optimization can be generalized to jointly select algorithms,
preprocessing methods, and their hyperparameters as follows:
• The choices of classifier / regressor and preprocessing methods are top-
level, categorical hyperparameters, and based on their settings the
hyperparameters of the selected methods become active.
• The combined space can then be searched with Bayesian optimization
methods that handle such high-dimensional, conditional spaces.
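One way to picture such a conditional space is the sketch below: the top-level categorical hyperparameter chooses the algorithm, and only that algorithm's hyperparameters become active. The algorithms and value ranges are invented for illustration, not Auto-sklearn's real space.

```python
import random

# Top-level categorical hyperparameter: which classifier to use.
# Each choice activates its own conditional hyperparameters.
SPACE = {
    "random_forest": {"n_estimators": range(10, 201), "max_depth": range(1, 21)},
    "svm": {"C": [0.01, 0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]},
    "knn": {"n_neighbors": range(1, 31)},
}

def sample_configuration(rng):
    algorithm = rng.choice(sorted(SPACE))            # top-level categorical choice
    config = {"algorithm": algorithm}
    for name, values in SPACE[algorithm].items():    # only active hyperparameters
        config[name] = rng.choice(list(values))
    return config

print(sample_configuration(random.Random(42)))
```

A Bayesian optimizer for this space only needs to score configurations of this shape; inactive hyperparameters simply never appear in the dictionary.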
Auto-sklearn includes 15 ML algorithms, 14
preprocessing methods, and all their respective
hyperparameters, yielding a total of 110 hyperparameters.
Optimizing performance in Auto-sklearn’s space
of 110 hyperparameters can of course be slow.
To jumpstart this process it uses meta-learning
to start from good hyperparameter settings for
previous similar datasets.
Specifically, Auto-sklearn comes with a
database of previous optimization runs on 140
diverse datasets from OpenML.
For a new dataset, it first identifies the most
similar datasets and starts from the saved best
settings for those.
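The warm-start idea can be sketched as a nearest-neighbour lookup over dataset meta-features. The meta-features, normalization, and stored configurations below are all made up for illustration; Auto-sklearn's real database covers 140 OpenML datasets with a richer set of meta-features.

```python
# Database of previous runs: dataset meta-features -> best known configuration.
# Meta-features here are (n_samples, n_features, class_balance) - illustrative only.
META_DB = [
    ((1000, 20, 0.50), {"algorithm": "random_forest", "n_estimators": 100}),
    ((50000, 300, 0.10), {"algorithm": "svm", "C": 1.0}),
    ((200, 5, 0.45), {"algorithm": "knn", "n_neighbors": 5}),
]

def normalize(features):
    # Put meta-features on comparable scales before measuring distance.
    n_samples, n_features, balance = features
    return (n_samples / 50000, n_features / 300, balance)

def warm_start_config(new_dataset_features, db=META_DB):
    # Identify the most similar previous dataset and reuse its best settings.
    target = normalize(new_dataset_features)
    def distance(entry):
        meta, _ = entry
        return sum((a - b) ** 2 for a, b in zip(target, normalize(meta)))
    _, best_config = min(db, key=distance)
    return best_config

print(warm_start_config((1200, 25, 0.48)))
```

The returned configuration is only the starting point; Bayesian optimization then refines it for the new dataset.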
• Instead of returning a single model with one
hyperparameter setting, Auto-sklearn automatically
constructs ensembles from the models trained
during the Bayesian optimization.
• Specifically, Auto-sklearn uses Ensemble Selection
to create small, powerful ensembles with
increased predictive power and robustness.
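Ensemble Selection greedily adds, with replacement, whichever trained model most improves validation performance of the growing ensemble. Here is a minimal pure-Python sketch; the three models' predicted probabilities and the validation labels are invented for illustration.

```python
def accuracy(probs, labels):
    preds = [1 if p >= 0.5 else 0 for p in probs]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def ensemble_probs(members, model_probs):
    # Ensemble prediction: average predicted probability over members
    # (members may repeat, which implicitly weights models).
    n = len(model_probs[members[0]])
    return [sum(model_probs[m][i] for m in members) / len(members) for i in range(n)]

def ensemble_selection(model_probs, labels, rounds=3):
    ensemble = []
    for _ in range(rounds):
        # Greedily add, with replacement, the model that helps accuracy most.
        best = max(model_probs,
                   key=lambda m: accuracy(ensemble_probs(ensemble + [m], model_probs), labels))
        ensemble.append(best)
    return ensemble

# Validation labels and each trained model's predicted probability of class 1
# (all numbers illustrative).
labels = [0, 1, 1, 0, 1, 0]
model_probs = {
    "m1": [0.1, 0.9, 0.8, 0.2, 0.4, 0.3],   # 5/6 correct alone
    "m2": [0.2, 0.8, 0.3, 0.1, 0.9, 0.6],   # 4/6 correct alone
    "m3": [0.6, 0.7, 0.9, 0.4, 0.7, 0.2],   # 5/6 correct alone
}

ensemble = ensemble_selection(model_probs, labels)
print(ensemble, accuracy(ensemble_probs(ensemble, model_probs), labels))
```

In this toy example the greedy ensemble reaches perfect validation accuracy even though no single model does, which is the point of the technique.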
The ChaLearn AutoML challenge was a machine learning competition.
Auto-sklearn placed in the top three for nine out of
ten phases and won six of them.
Particularly in the last two phases, Auto-sklearn won
both the auto track and the tweakathon.
During the last two phases of the tweakathon the
team combined Auto-sklearn with Auto-Net for
several datasets to further boost performance.
• TPOT = Tree-based Pipeline Optimization Tool.
• TPOT is a Python Automated Machine
Learning tool that optimizes machine
learning pipelines using genetic programming.
TPOT uses Genetic Algorithms to
find the best ML model and
hyperparameters based on the
training / validation set.
The model options include all the
algorithms implemented in the scikit-learn library.
Parameters include population size
and number of generations to run
the Genetic Algorithm.
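A genetic hyperparameter search in this spirit can be sketched in plain Python. The encoding (a pair of hyperparameters) and the fitness function (a toy stand-in for validation accuracy) are assumptions for illustration; TPOT itself evolves entire pipelines with genetic programming.

```python
import random

def fitness(config):
    # Toy stand-in for validation accuracy of a model trained with `config`:
    # best around n_estimators=120, max_depth=8.
    n, d = config
    return 1.0 - ((n - 120) / 200) ** 2 - ((d - 8) / 20) ** 2

def mutate(config, rng):
    n, d = config
    return (max(1, n + rng.randint(-20, 20)), max(1, d + rng.randint(-2, 2)))

def crossover(a, b):
    return (a[0], b[1])   # child takes one hyperparameter from each parent

def evolve(rng, population_size=12, generations=15):
    population = [(rng.randint(1, 300), rng.randint(1, 30))
                  for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: population_size // 2]         # selection
        children = []
        while len(survivors) + len(children) < population_size:
            a, b = rng.sample(survivors, 2)
            children.append(mutate(crossover(a, b), rng))      # crossover + mutation
        population = survivors + children
    return max(population, key=fitness)

best = evolve(random.Random(7))
print(best, fitness(best))
```

Population size and number of generations play exactly the role described above: larger values explore more configurations at the cost of more training runs.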
Can we really move AutoML from the Lab to Production?
What would be the latency of using an
Ensemble of models in production?
Would the AutoML training time be prohibitive
for big datasets?
I think we need Incremental AutoML: in which
the previous model (together with new data)
serves as an input to find the next best model.
My experience at Yahoo Labs
Finite (large) number of manually pre-defined
model configurations (hyperparameters).
Incremental Learning: previous model was used
as input for training new models.
Used Hadoop Map-Reduce: each Reducer used
one configuration, trained a model and
measured its performance (parallel training).
The model with the best performance was chosen.
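That scheme, a finite set of pre-defined configurations, each evaluated independently with the previous model as input and the winner kept for the next round, can be sketched without Hadoop as a plain map step. The configurations and the scoring function below are invented for illustration.

```python
def train_and_score(config, previous_model, data):
    # Stand-in for one Reducer: "train" a model with `config`, warm-started
    # from `previous_model`, and measure its performance on `data`.
    learning_rate, regularization = config
    base = 0.0 if previous_model is None else previous_model["score"] * 0.1
    score = base + learning_rate * (1.0 - regularization) * sum(data) / len(data)
    return {"config": config, "score": score}

# Finite (large, in practice) set of manually pre-defined configurations.
CONFIGS = [(0.1, 0.0), (0.1, 0.5), (0.5, 0.0), (0.5, 0.5), (1.0, 0.1)]

def incremental_round(previous_model, data):
    # "Map" step: evaluate every configuration independently (one Reducer
    # each on Hadoop; a plain loop here), then keep the best model.
    results = [train_and_score(c, previous_model, data) for c in CONFIGS]
    return max(results, key=lambda r: r["score"])

model = None
for batch in ([0.2, 0.4, 0.6], [0.5, 0.5, 0.8]):   # new data arriving over time
    model = incremental_round(model, batch)
print(model)
```

Each round is embarrassingly parallel, and the previous winner feeds into the next round, which is the incremental-learning part.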
What next? My thoughts:
Automated ML will not replace the Data Scientist but
will enable the Data Scientist to produce more models
in less time with higher quality.
This is probably the end of “good enough models” using
standard parameters because the Data Scientist did not
have time to check different parameters.
The main advantage is not saving time. The main
benefit is doing things that were never done because of
lack of time.
Data scientists will have more time to collaborate with
business experts to get domain knowledge and use it in their models.