
PythonとAutoML at PyCon JP 2019


Presentation slides for PyCon JP 2019: "PythonとAutoML" (Python and AutoML)

As data analysis finds ever wider application, AutoML has been growing in importance. This session covers the fundamentals of AutoML, current research trends, and notable Python OSS libraries.



  1. PythonとAutoML / Automated Machine Learning in Python (PyCon JP 2019, AI Lab). [Pipeline diagram: Raw Data → Data Cleaning → Feature Preprocessing → Feature Selection → Model Selection]
  2. Masashi SHIBATA, CyberAgent AI Lab (GitHub: c-bata, Twitter: c_bata_)
  3. [Pipeline diagram: Data Cleaning → Feature Preprocessing → Feature Selection → Model Selection → Parameter Optimization → Model Validation]
  4. [Same pipeline diagram as the previous slide]
  5. The four topics of this talk: 1. AutoML, 2. Automated Hyperparameter Optimization, 3. Automated Feature Engineering, 4. Automated Algorithm (Model) Selection
  6. Topic 1: AutoML. [Pipeline diagram: Data Cleaning → Feature Preprocessing → Feature Selection → Feature Construction → Model Selection → Parameter Optimization → Model Validation]
  7. Jeff Dean, "An Overview of Google's Work on AutoML and Future Directions", ICML 2019. https://slideslive.com/38917182/an-overview-of-googles-work-on-automl-and-future-directions
  8. Automated Hyperparameter Optimization: Hyperopt, Optuna, SMAC3, scikit-optimize, … (Jeff Dean, "An Overview of Google's Work on AutoML and Future Directions", ICML 2019. https://slideslive.com/38917182/an-overview-of-googles-work-on-automl-and-future-directions)
  9. HPO + Automated Feature Engineering: featuretools, tsfresh, boruta, … (Jeff Dean, "An Overview of Google's Work on AutoML and Future Directions", ICML 2019. https://slideslive.com/38917182/an-overview-of-googles-work-on-automl-and-future-directions)
  10. Automated Algorithm (Model) Selection: Auto-sklearn, TPOT, H2O, auto_ml, MLBox, … (Jeff Dean, "An Overview of Google's Work on AutoML and Future Directions", ICML 2019. https://slideslive.com/38917182/an-overview-of-googles-work-on-automl-and-future-directions)
  11. Topic 2: Automated Hyperparameter Optimization. [Pipeline diagram: Data Cleaning → Feature Preprocessing → Feature Selection → Feature Construction → Model Selection → Parameter Optimization → Model Validation]
  12. Grid Search / Random Search
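Both baseline strategies are available in scikit-learn; a minimal sketch (the SVC, dataset, and search spaces here are illustrative):

    from scipy.stats import loguniform  # SciPy >= 1.4
    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    # Grid search: exhaustively evaluates every combination in the grid.
    grid = GridSearchCV(SVC(), {'C': [0.1, 1, 10], 'gamma': [0.01, 0.1, 1]}, cv=3)
    grid.fit(X, y)
    print(grid.best_params_, grid.best_score_)

    # Random search: samples a fixed budget of configurations from distributions.
    rand = RandomizedSearchCV(SVC(), {'C': loguniform(1e-3, 1e3)},
                              n_iter=20, cv=3, random_state=0)
    rand.fit(X, y)
    print(rand.best_params_, rand.best_score_)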
  13. Bayesian optimization. [Figure from: Eric Brochu, Vlad M. Cora, and Nando de Freitas. A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. 2010. arXiv:1012.2599.]
  14. [Bayesian optimization, figure continued; same source]
  15. [Bayesian optimization, figure continued; same source]
  16. [Bayesian optimization, figure continued; same source]
  17. Successive Halving. [Diagram: trials #1–#9 trained for 10, 30, then 90 epochs, with only the best-performing trials promoted to the next budget.] Jamieson, K. G. and Talwalkar, A. S.: Non-stochastic Best Arm Identification and Hyperparameter Optimization, in AISTATS (2016). Liam Li, Kevin Jamieson, Afshin Rostamizadeh, Ekaterina Gonina, Moritz Hardt, Benjamin Recht, and Ameet Talwalkar. Massively parallel hyperparameter tuning. arXiv preprint arXiv:1810.05934, 2018.
  18.
  19. Optuna: TPE sampler; Asynchronous Successive Halving and Median Stopping Rule pruners; Define-by-Run interface. https://github.com/pfnet/optuna
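A sketch of how those pieces combine, against a recent Optuna API (the quadratic objective and the step loop stand in for a real training run):

    import optuna

    def objective(trial):
        # Define-by-Run: the search space is declared while the objective executes.
        x = trial.suggest_float('x', -10, 10)
        for step in range(100):
            value = (x - 2) ** 2 + 1.0 / (step + 1)  # stand-in for per-epoch validation loss
            trial.report(value, step)     # report intermediate values...
            if trial.should_prune():      # ...so the pruner can stop unpromising trials early
                raise optuna.TrialPruned()
        return value

    study = optuna.create_study(
        sampler=optuna.samplers.TPESampler(),
        pruner=optuna.pruners.SuccessiveHalvingPruner(),  # or optuna.pruners.MedianPruner()
    )
    study.optimize(objective, n_trials=50)
    print(study.best_params)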
  20. [Diagram: Asynchronous Successive Halving rungs. Trials report intermediate objective values at rungs 0–3; the best-performing trials are promoted to the next rung.]

  21. [Distributed optimization with 10 workers]

  22. scikit-optimize
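For comparison, a minimal scikit-optimize sketch: Bayesian optimization with a Gaussian-process surrogate over a toy one-dimensional objective (the function is illustrative; in practice it would wrap a cross-validation score):

    from skopt import gp_minimize

    def f(params):
        x = params[0]
        return (x - 2.0) ** 2  # toy objective to minimize

    # gp_minimize fits a Gaussian process to the evaluated points and picks
    # the next candidate by maximizing an acquisition function.
    res = gp_minimize(f, dimensions=[(-5.0, 5.0)], n_calls=30, random_state=0)
    print(res.x, res.fun)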

  23. Topic 3: Automated Feature Engineering. [Pipeline diagram: Data Cleaning → Feature Preprocessing → Feature Selection → Feature Construction → Model Selection → Parameter Optimization → Model Validation]
  24. AutoML feature preprocessing. 1. Feature Preprocessing Operators: StandardScaler, RobustScaler, MinMaxScaler, MaxAbsScaler, RandomizedPCA, Binarizer, and PolynomialFeatures. 2. Feature Selection Operators: VarianceThreshold, SelectKBest, SelectPercentile, SelectFwe, and Recursive Feature Elimination (RFE). (Auto-sklearn: M. Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum, and F. Hutter. Efficient and robust automated machine learning. In Neural Information Processing Systems (NIPS), 2015. TPOT: R. S. Olson and J. H. Moore. TPOT: a tree-based pipeline optimization tool for automating machine learning. In Workshop on Automatic Machine Learning, 2016.)
  25. [Same content as the previous slide]
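Operators like these compose into scikit-learn pipelines, which is the search space such tools explore; a small hand-built sketch (the operator choices are illustrative):

    from sklearn.datasets import load_iris
    from sklearn.feature_selection import SelectKBest
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import PolynomialFeatures, StandardScaler
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    # Feature preprocessing -> feature construction -> feature selection -> model:
    # one fixed instance of the pipeline shapes that TPOT and Auto-sklearn search over.
    pipe = Pipeline([
        ('scale', StandardScaler()),
        ('poly', PolynomialFeatures(degree=2, include_bias=False)),
        ('select', SelectKBest(k=5)),
        ('clf', SVC(gamma='auto')),
    ])
    print(pipe.fit(X, y).score(X, y))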
  26. Feature engineering beyond TPOT / Auto-sklearn: • featuretools: deep feature synthesis • tsfresh: time-series feature extraction • Feature selection: wrapper methods, filter methods, embedded methods (scikit-learn, Boruta)
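A hedged featuretools sketch of deep feature synthesis, using the mock dataset bundled with the library (API of the 0.x releases current at the time of the talk):

    import featuretools as ft

    # Mock retail data: customers, sessions, and transactions linked by keys.
    es = ft.demo.load_mock_customer(return_entityset=True)

    # Deep feature synthesis stacks aggregation and transform primitives across
    # the entity relationships to build features for each customer.
    feature_matrix, feature_defs = ft.dfs(
        entityset=es,
        target_entity='customers',
        max_depth=2,
    )
    print(feature_matrix.head())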
  27. [Pipeline diagram: Data Cleaning → Feature Preprocessing → Feature Selection → Model Selection → Parameter Optimization → Model Validation]
  28.

  29. tsfresh: Time Series FeatuRE extraction based on Scalable Hypothesis tests; 63 feature-extraction methods yielding 794 features. https://github.com/blue-yonder/tsfresh
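A minimal tsfresh sketch: features are extracted from a long-format DataFrame with one row per observation (the toy frame is illustrative; tsfresh.select_features can then filter the result by hypothesis tests):

    import pandas as pd
    from tsfresh import extract_features

    # Long format: a series id, a time index, and the observed value.
    df = pd.DataFrame({
        'id':    [1, 1, 1, 2, 2, 2],
        'time':  [0, 1, 2, 0, 1, 2],
        'value': [1.0, 2.0, 3.0, 5.0, 4.0, 3.0],
    })

    # One row of features per series id, hundreds of columns by default.
    features = extract_features(df, column_id='id', column_sort='time')
    print(features.shape)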
  30. [Pipeline diagram: Data Cleaning → Feature Preprocessing → Feature Selection → Model Selection → Parameter Optimization → Model Validation]
  31. Guyon, I. and Elisseeff, A. An introduction to variable and feature selection. Journal of Machine Learning Research, 3:1157–1182, 2003.
  32. Feature selection methods. Filter methods. Wrapper methods: sklearn.feature_selection.RFE (Recursive Feature Elimination), Boruta (boruta_py). Embedded methods: e.g. feature_importances_ in scikit-learn. (Guyon, I. and Elisseeff, A. An introduction to variable and feature selection. Journal of Machine Learning Research, 3:1157–1182, 2003.)
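A wrapper-method sketch using scikit-learn's RFE (boruta_py exposes a similar fit / support_ interface; the data here is synthetic):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import RFE

    X, y = make_classification(n_samples=200, n_features=20,
                               n_informative=5, random_state=0)

    # Wrapper method: repeatedly refit the model, dropping the weakest features.
    selector = RFE(RandomForestClassifier(n_estimators=50, random_state=0),
                   n_features_to_select=5)
    selector.fit(X, y)
    print(selector.support_)   # boolean mask of the selected features
    print(selector.ranking_)   # 1 = selected; larger values were eliminated earlier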
  33. Topic 4: Automated Algorithm (Model) Selection. [Pipeline diagram: Data Cleaning → Feature Preprocessing → Feature Selection → Feature Construction → Model Selection → Parameter Optimization → Model Validation]
  34. The scikit-learn algorithm cheat-sheet: https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html
  35. AutoML as a CASH Problem: Combined Algorithm Selection and Hyperparameter optimization. (M. Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum, and F. Hutter. Efficient and robust automated machine learning. In Neural Information Processing Systems (NIPS), 2015.)
  36. Using Optuna for CASH Problems (https://github.com/pfnet/optuna/blob/v0.16.0/examples/sklearn_simple.py):

      def objective(trial):
          iris = sklearn.datasets.load_iris()
          x, y = iris.data, iris.target
          classifier_name = trial.suggest_categorical('classifier', ['SVC', 'RandomForest'])
          if classifier_name == 'SVC':
              svc_c = trial.suggest_loguniform('svc_c', 1e-10, 1e10)
              classifier_obj = sklearn.svm.SVC(C=svc_c, gamma='auto')
          else:
              rf_max_depth = int(trial.suggest_loguniform('rf_max_depth', 2, 32))
              classifier_obj = sklearn.ensemble.RandomForestClassifier(
                  max_depth=rf_max_depth, n_estimators=10)
          score = sklearn.model_selection.cross_val_score(
              classifier_obj, x, y, n_jobs=-1, cv=3)
          accuracy = score.mean()
          return accuracy
  37. Optuna for CASH Problem: the same code, with the algorithm-selection part highlighted (trial.suggest_categorical('classifier', ['SVC', 'RandomForest']) chooses the model).
  38. Optuna for CASH Problem: the same code, with the hyperparameter-optimization part highlighted (trial.suggest_loguniform tunes svc_c or rf_max_depth for the chosen model).
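The slides show only the objective; running it takes a study, sketched here (direction='maximize' because the objective returns accuracy; the trial count is arbitrary):

    import optuna

    study = optuna.create_study(direction='maximize')
    study.optimize(objective, n_trials=100)
    print(study.best_trial.params)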
  39. AutoML frameworks: • auto-sklearn • TPOT • h2o-3 • auto_ml (unmaintained) • MLBox. (Adithya Balaji, Alexander Allen. Benchmarking Automatic Machine Learning Frameworks. https://arxiv.org/pdf/1808.06492v1.pdf)
  40. Auto-sklearn: AutoML built on SMAC3; winner of 2 tracks of the ChaLearn AutoML challenge. https://github.com/automl/auto-sklearn

      import sklearn.metrics
      import autosklearn.classification

      X_train, X_test, y_train, y_test = train_test_split(…)
      automl = autosklearn.classification.AutoSklearnClassifier(…)
      automl.fit(X_train.copy(), y_train.copy(), dataset_name='breast_cancer')
      print(automl.show_models())
      predictions = automl.predict(X_test)
      print("Accuracy score", sklearn.metrics.accuracy_score(y_test, predictions))
  41. TPOT: Tree-based Pipeline Optimization Tool for Automating Data Science. https://github.com/EpistasisLab/tpot
  42. TPOT (https://github.com/EpistasisLab/tpot):

      from tpot import TPOTClassifier
      from sklearn.model_selection import train_test_split

      X_train, X_test, y_train, y_test = train_test_split(…)
      tpot = TPOTClassifier(verbosity=2, max_time_mins=2)
      tpot.fit(X_train, y_train)
      print(tpot.score(X_test, y_test))
      tpot.export('tpot_iris_pipeline.py')
  43. Auto-sklearn vs. TPOT. Auto-sklearn: 20 preprocessors; 16 feature selectors; 1-hot encoding, missing value imputation, balancing, scaling; 17 classifiers; pre-defined hyperparameter spaces; fixed pipeline search space. TPOT: 20 preprocessors; 12 classifiers; pre-defined hyperparameter spaces; flexible pipeline search space (combining tree-shaped pipelines).
  44. Automated Neural Architecture Search
  45. [Pipeline diagram: Data Cleaning → Feature Preprocessing → Feature Selection → Model Selection → Parameter Optimization → Model Validation]
  46.
  47. THANK YOU
  48.
  49.
  50.
