
PythonとAutoML at PyCon JP 2019


Presentation slides for PyCon JP 2019: "PythonとAutoML" (Python and AutoML)

As data analysis finds ever wider application, AutoML has been growing in importance. This session covers the fundamentals of AutoML, current research trends, and notable Python OSS libraries.



  1. PythonとAutoML / Automated Machine Learning in Python (PyCon JP 2019, AI Lab). [Pipeline diagram: Raw Data → Data Cleaning → Feature Preprocessing → Feature Selection → Model Selection]
  2. Masashi SHIBATA, CyberAgent AI Lab (GitHub: c-bata, Twitter: c_bata_)
  3. [Pipeline diagram: Data Cleaning → Feature Preprocessing → Feature Selection → Model Selection → Parameter Optimization → Model Validation]
  4. [Same pipeline diagram as the previous slide]
  5. The four topics of this talk: 1. AutoML, 2. Automated Hyperparameter Optimization, 3. Automated Feature Engineering, 4. Automated Algorithm (Model) Selection
  6. Topic 1: AutoML. [Pipeline diagram: Data Cleaning → Feature Preprocessing → Feature Selection → Feature Construction → Model Selection → Parameter Optimization → Model Validation]
  7. Jeff Dean, "An Overview of Google's Work on AutoML and Future Directions", ICML 2019. https://slideslive.com/38917182/an-overview-of-googles-work-on-automl-and-future-directions
  8. Automated Hyperparameter Optimization: Hyperopt, Optuna, SMAC3, scikit-optimize, … (Jeff Dean, "An Overview of Google's Work on AutoML and Future Directions", ICML 2019. https://slideslive.com/38917182/an-overview-of-googles-work-on-automl-and-future-directions)
  9. HPO + Automated Feature Engineering: featuretools, tsfresh, boruta, … (Jeff Dean, "An Overview of Google's Work on AutoML and Future Directions", ICML 2019. https://slideslive.com/38917182/an-overview-of-googles-work-on-automl-and-future-directions)
  10. Automated Algorithm (Model) Selection: Auto-sklearn, TPOT, H2O, auto_ml, MLBox, … (Jeff Dean, "An Overview of Google's Work on AutoML and Future Directions", ICML 2019. https://slideslive.com/38917182/an-overview-of-googles-work-on-automl-and-future-directions)
  11. Topic 2: Automated Hyperparameter Optimization. [Pipeline diagram: Data Cleaning → Feature Preprocessing → Feature Selection → Feature Construction → Model Selection → Parameter Optimization → Model Validation]
  12. Grid Search / Random Search
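Both baseline strategies are available in scikit-learn; a minimal sketch (the SVC, dataset, and search spaces here are illustrative):

    from scipy.stats import loguniform  # SciPy >= 1.4
    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    # Grid search: exhaustively evaluates every combination in the grid.
    grid = GridSearchCV(SVC(), {'C': [0.1, 1, 10], 'gamma': [0.01, 0.1, 1]}, cv=3)
    grid.fit(X, y)
    print(grid.best_params_, grid.best_score_)

    # Random search: samples a fixed budget of configurations from distributions.
    rand = RandomizedSearchCV(SVC(), {'C': loguniform(1e-3, 1e3)},
                              n_iter=20, cv=3, random_state=0)
    rand.fit(X, y)
    print(rand.best_params_, rand.best_score_)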
  13. Bayesian optimization. [Figure from: Eric Brochu, Vlad M. Cora, and Nando de Freitas. A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. 2010. arXiv:1012.2599.]
  14. [Bayesian optimization, figure continued; same source]
  15. [Bayesian optimization, figure continued; same source]
  16. [Bayesian optimization, figure continued; same source]
  17. Successive Halving. [Diagram: trials #1–#9 trained for 10, 30, then 90 epochs, with only the best-performing trials promoted to the next budget.] Jamieson, K. G. and Talwalkar, A. S.: Non-stochastic Best Arm Identification and Hyperparameter Optimization, in AISTATS (2016). Liam Li, Kevin Jamieson, Afshin Rostamizadeh, Ekaterina Gonina, Moritz Hardt, Benjamin Recht, and Ameet Talwalkar. Massively parallel hyperparameter tuning. arXiv preprint arXiv:1810.05934, 2018.
  18.
  19. Optuna: TPE sampler; Asynchronous Successive Halving and Median Stopping Rule pruners; Define-by-Run interface. https://github.com/pfnet/optuna
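A sketch of how those pieces combine, against a recent Optuna API (the quadratic objective and the step loop stand in for a real training run):

    import optuna

    def objective(trial):
        # Define-by-Run: the search space is declared while the objective executes.
        x = trial.suggest_float('x', -10, 10)
        for step in range(100):
            value = (x - 2) ** 2 + 1.0 / (step + 1)  # stand-in for per-epoch validation loss
            trial.report(value, step)     # report intermediate values...
            if trial.should_prune():      # ...so the pruner can stop unpromising trials early
                raise optuna.TrialPruned()
        return value

    study = optuna.create_study(
        sampler=optuna.samplers.TPESampler(),
        pruner=optuna.pruners.SuccessiveHalvingPruner(),  # or optuna.pruners.MedianPruner()
    )
    study.optimize(objective, n_trials=50)
    print(study.best_params)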
  20. [Diagram: Asynchronous Successive Halving rungs. Trials report intermediate objective values at rungs 0–3; the best-performing trials are promoted to the next rung.]

  21. [Distributed optimization with 10 workers]

  22. scikit-optimize
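For comparison, a minimal scikit-optimize sketch: Bayesian optimization with a Gaussian-process surrogate over a toy one-dimensional objective (the function is illustrative; in practice it would wrap a cross-validation score):

    from skopt import gp_minimize

    def f(params):
        x = params[0]
        return (x - 2.0) ** 2  # toy objective to minimize

    # gp_minimize fits a Gaussian process to the evaluated points and picks
    # the next candidate by maximizing an acquisition function.
    res = gp_minimize(f, dimensions=[(-5.0, 5.0)], n_calls=30, random_state=0)
    print(res.x, res.fun)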

  23. Topic 3: Automated Feature Engineering. [Pipeline diagram: Data Cleaning → Feature Preprocessing → Feature Selection → Feature Construction → Model Selection → Parameter Optimization → Model Validation]
  24. AutoML feature preprocessing. 1. Feature Preprocessing Operators: StandardScaler, RobustScaler, MinMaxScaler, MaxAbsScaler, RandomizedPCA, Binarizer, and PolynomialFeatures. 2. Feature Selection Operators: VarianceThreshold, SelectKBest, SelectPercentile, SelectFwe, and Recursive Feature Elimination (RFE). (Auto-sklearn: M. Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum, and F. Hutter. Efficient and robust automated machine learning. In Neural Information Processing Systems (NIPS), 2015. TPOT: R. S. Olson and J. H. Moore. TPOT: a tree-based pipeline optimization tool for automating machine learning. In Workshop on Automatic Machine Learning, 2016.)
  25. [Same content as the previous slide]
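Operators like these compose into scikit-learn pipelines, which is the search space such tools explore; a small hand-built sketch (the operator choices are illustrative):

    from sklearn.datasets import load_iris
    from sklearn.feature_selection import SelectKBest
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import PolynomialFeatures, StandardScaler
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    # Feature preprocessing -> feature construction -> feature selection -> model:
    # one fixed instance of the pipeline shapes that TPOT and Auto-sklearn search over.
    pipe = Pipeline([
        ('scale', StandardScaler()),
        ('poly', PolynomialFeatures(degree=2, include_bias=False)),
        ('select', SelectKBest(k=5)),
        ('clf', SVC(gamma='auto')),
    ])
    print(pipe.fit(X, y).score(X, y))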
  26. Feature engineering beyond TPOT / Auto-sklearn: • featuretools: deep feature synthesis • tsfresh: time-series feature extraction • Feature selection: wrapper methods, filter methods, embedded methods (scikit-learn, Boruta)
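A hedged featuretools sketch of deep feature synthesis, using the mock dataset bundled with the library (API of the 0.x releases current at the time of the talk):

    import featuretools as ft

    # Mock retail data: customers, sessions, and transactions linked by keys.
    es = ft.demo.load_mock_customer(return_entityset=True)

    # Deep feature synthesis stacks aggregation and transform primitives across
    # the entity relationships to build features for each customer.
    feature_matrix, feature_defs = ft.dfs(
        entityset=es,
        target_entity='customers',
        max_depth=2,
    )
    print(feature_matrix.head())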
  27. [Pipeline diagram: Data Cleaning → Feature Preprocessing → Feature Selection → Model Selection → Parameter Optimization → Model Validation]
  28.

  29. tsfresh: Time Series FeatuRE extraction based on Scalable Hypothesis tests; 63 feature-extraction methods yielding 794 features. https://github.com/blue-yonder/tsfresh
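A minimal tsfresh sketch: features are extracted from a long-format DataFrame with one row per observation (the toy frame is illustrative; tsfresh.select_features can then filter the result by hypothesis tests):

    import pandas as pd
    from tsfresh import extract_features

    # Long format: a series id, a time index, and the observed value.
    df = pd.DataFrame({
        'id':    [1, 1, 1, 2, 2, 2],
        'time':  [0, 1, 2, 0, 1, 2],
        'value': [1.0, 2.0, 3.0, 5.0, 4.0, 3.0],
    })

    # One row of features per series id, hundreds of columns by default.
    features = extract_features(df, column_id='id', column_sort='time')
    print(features.shape)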
  30. [Pipeline diagram: Data Cleaning → Feature Preprocessing → Feature Selection → Model Selection → Parameter Optimization → Model Validation]
  31. Guyon, I. and Elisseeff, A. An introduction to variable and feature selection. Journal of Machine Learning Research, 3:1157–1182, 2003.
  32. Feature selection methods. Filter methods. Wrapper methods: sklearn.feature_selection.RFE (Recursive Feature Elimination), Boruta (boruta_py). Embedded methods: e.g. feature_importances_ in scikit-learn. (Guyon, I. and Elisseeff, A. An introduction to variable and feature selection. Journal of Machine Learning Research, 3:1157–1182, 2003.)
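A wrapper-method sketch using scikit-learn's RFE (boruta_py exposes a similar fit / support_ interface; the data here is synthetic):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import RFE

    X, y = make_classification(n_samples=200, n_features=20,
                               n_informative=5, random_state=0)

    # Wrapper method: repeatedly refit the model, dropping the weakest features.
    selector = RFE(RandomForestClassifier(n_estimators=50, random_state=0),
                   n_features_to_select=5)
    selector.fit(X, y)
    print(selector.support_)   # boolean mask of the selected features
    print(selector.ranking_)   # 1 = selected; larger values were eliminated earlier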
  33. Topic 4: Automated Algorithm (Model) Selection. [Pipeline diagram: Data Cleaning → Feature Preprocessing → Feature Selection → Feature Construction → Model Selection → Parameter Optimization → Model Validation]
  34. The scikit-learn algorithm cheat-sheet: https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html
  35. AutoML as a CASH Problem: Combined Algorithm Selection and Hyperparameter optimization. (M. Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum, and F. Hutter. Efficient and robust automated machine learning. In Neural Information Processing Systems (NIPS), 2015.)
  36. Using Optuna for CASH Problems (https://github.com/pfnet/optuna/blob/v0.16.0/examples/sklearn_simple.py):

      def objective(trial):
          iris = sklearn.datasets.load_iris()
          x, y = iris.data, iris.target
          classifier_name = trial.suggest_categorical('classifier', ['SVC', 'RandomForest'])
          if classifier_name == 'SVC':
              svc_c = trial.suggest_loguniform('svc_c', 1e-10, 1e10)
              classifier_obj = sklearn.svm.SVC(C=svc_c, gamma='auto')
          else:
              rf_max_depth = int(trial.suggest_loguniform('rf_max_depth', 2, 32))
              classifier_obj = sklearn.ensemble.RandomForestClassifier(
                  max_depth=rf_max_depth, n_estimators=10)
          score = sklearn.model_selection.cross_val_score(
              classifier_obj, x, y, n_jobs=-1, cv=3)
          accuracy = score.mean()
          return accuracy
  37. Optuna for CASH Problem: the same code, with the algorithm-selection part highlighted (trial.suggest_categorical('classifier', ['SVC', 'RandomForest']) chooses the model).
  38. Optuna for CASH Problem: the same code, with the hyperparameter-optimization part highlighted (trial.suggest_loguniform tunes svc_c or rf_max_depth for the chosen model).
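The slides show only the objective; running it takes a study, sketched here (direction='maximize' because the objective returns accuracy; the trial count is arbitrary):

    import optuna

    study = optuna.create_study(direction='maximize')
    study.optimize(objective, n_trials=100)
    print(study.best_trial.params)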
  39. AutoML frameworks: • auto-sklearn • TPOT • h2o-3 • auto_ml (unmaintained) • MLBox. (Adithya Balaji, Alexander Allen. Benchmarking Automatic Machine Learning Frameworks. https://arxiv.org/pdf/1808.06492v1.pdf)
  40. Auto-sklearn: AutoML built on SMAC3; winner of 2 tracks of the ChaLearn AutoML challenge. https://github.com/automl/auto-sklearn

      import sklearn.metrics
      import autosklearn.classification

      X_train, X_test, y_train, y_test = train_test_split(…)
      automl = autosklearn.classification.AutoSklearnClassifier(…)
      automl.fit(X_train.copy(), y_train.copy(), dataset_name='breast_cancer')
      print(automl.show_models())
      predictions = automl.predict(X_test)
      print("Accuracy score", sklearn.metrics.accuracy_score(y_test, predictions))
  41. TPOT: Tree-based Pipeline Optimization Tool for Automating Data Science. https://github.com/EpistasisLab/tpot
  42. TPOT (https://github.com/EpistasisLab/tpot):

      from tpot import TPOTClassifier
      from sklearn.model_selection import train_test_split

      X_train, X_test, y_train, y_test = train_test_split(…)
      tpot = TPOTClassifier(verbosity=2, max_time_mins=2)
      tpot.fit(X_train, y_train)
      print(tpot.score(X_test, y_test))
      tpot.export('tpot_iris_pipeline.py')
  43. Auto-sklearn vs. TPOT. Auto-sklearn: 20 preprocessors; 16 feature selectors; 1-hot encoding, missing value imputation, balancing, scaling; 17 classifiers; pre-defined hyperparameter spaces; fixed pipeline search space. TPOT: 20 preprocessors; 12 classifiers; pre-defined hyperparameter spaces; flexible pipeline search space (combining tree-shaped pipelines).
  44. Automated Neural Architecture Search
  45. [Pipeline diagram: Data Cleaning → Feature Preprocessing → Feature Selection → Model Selection → Parameter Optimization → Model Validation]
  46.
  47. THANK YOU
  48.
  49.
  50.
