Automated Machine Learning (AutoML)

  1. Automated Machine Learning (AutoML) By Hayim Makabee, November 2019
  2. Automated Machine Learning Automated Machine Learning (AutoML) systems find the right algorithm and hyperparameters in a data-driven way without any human intervention.
  3. AutoML Benefits AutoML allows data scientists to increase their productivity without adding more members to the data science team. It also helps address the skills gap between the demand for data science talent and its availability.
  4. Olson Experiment on Parameter Tuning Used 165 classification data sets from a variety of sources and 13 different classification algorithms from scikit-learn. Compared classification accuracy using default parameters for each algorithm to a tuned version of those algorithms. On average, tuning improved classification accuracy by 5–10% over the defaults. However, no single parameter combination works best for all problems: tuning is mandatory to see this improvement, and it is built into all AutoML solutions.
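As a minimal sketch of the tuning effect described above (not Olson's actual protocol; the dataset and the small grid are assumptions for illustration), compare a scikit-learn classifier with default parameters against a grid-searched version:

    import sklearn.datasets
    import sklearn.model_selection
    from sklearn.ensemble import RandomForestClassifier

    X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
        X, y, test_size=0.3, random_state=0)

    # Baseline: default hyperparameters
    default_clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    print("default accuracy:", default_clf.score(X_test, y_test))

    # Tuned: small grid over a few influential hyperparameters
    grid = {"n_estimators": [100, 300], "max_depth": [None, 5, 10],
            "max_features": ["sqrt", 0.5]}
    search = sklearn.model_selection.GridSearchCV(
        RandomForestClassifier(random_state=0), grid, cv=5)
    search.fit(X_train, y_train)
    print("tuned accuracy:", search.score(X_test, y_test))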
  5. Example: Learning Rate
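To make the learning-rate example concrete, here is a toy sketch (the quadratic objective and step sizes are assumptions): gradient descent with too small a rate barely moves, a moderate rate converges, and too large a rate diverges.

    def gradient_descent(lr, steps=20, x=5.0):
        """Minimize f(x) = x**2, whose gradient is f'(x) = 2*x."""
        for _ in range(steps):
            x = x - lr * 2 * x
        return x

    for lr in [0.01, 0.1, 0.9, 1.1]:
        print(f"lr={lr}: x after 20 steps = {gradient_descent(lr):.4f}")
    # lr=0.01 barely moves, lr=0.1 converges, lr=0.9 oscillates but converges,
    # lr=1.1 oscillates and diverges.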
  6. Bayesian Optimization
  7. Bayesian Optimization for Hyperparameter Selection Build a probabilistic model to capture the relationship between hyperparameter settings and their performance. Use the model to select useful hyperparameter settings to try next by trading off exploration (searching in parts of the space where the model is uncertain) and exploitation (focusing on parts of the space predicted to perform well). Run the machine learning algorithm with those hyperparameter settings, measure the performance and update the probabilistic model.
  8. Bayesian Optimization Algorithm
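A minimal sketch of this loop, using a Gaussian process surrogate and an expected-improvement acquisition over a single learning-rate hyperparameter (the objective function here is a stand-in for "train the model and measure its score"; the candidate grid is an assumption):

    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor

    def objective(lr):                       # stand-in for training + evaluation
        return -(np.log10(lr) + 2) ** 2      # score peaks at lr = 0.01

    candidates = np.logspace(-5, 0, 200).reshape(-1, 1)
    X_obs = np.array([[1e-5], [1.0]])        # two initial evaluations
    y_obs = np.array([objective(x[0]) for x in X_obs])

    gp = GaussianProcessRegressor()
    for _ in range(10):
        gp.fit(np.log10(X_obs), y_obs)       # probabilistic model of the score
        mu, sigma = gp.predict(np.log10(candidates), return_std=True)
        # Expected improvement trades off exploitation (high mu)
        # against exploration (high sigma)
        best = y_obs.max()
        z = (mu - best) / np.maximum(sigma, 1e-9)
        ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)
        x_next = candidates[np.argmax(ei)]   # most promising setting to try next
        X_obs = np.vstack([X_obs, x_next])
        y_obs = np.append(y_obs, objective(x_next[0]))

    print("best lr found:", X_obs[np.argmax(y_obs)][0])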
  9. Auto-sklearn Auto-sklearn is open source, implemented in Python, and built around the scikit-learn library. It contains a machine learning pipeline which takes care of missing values, categorical features, sparse and dense data, and rescaling the data. Next, the pipeline applies a preprocessing algorithm and an ML algorithm.
  10. Generalizing the Bayesian Algorithm Bayesian Optimization can be generalized to jointly select algorithms, preprocessing methods, and their hyperparameters as follows: • The choices of classifier / regressor and preprocessing methods are top-level, categorical hyperparameters, and based on their settings the hyperparameters of the selected methods become active. • The combined space can then be searched with Bayesian optimization methods that handle such high-dimensional, conditional spaces.
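A hedged sketch of such a conditional space in plain Python (the space definition below is an assumption for illustration, not Auto-sklearn's actual configuration space): the classifier choice is the top-level categorical hyperparameter, and only the chosen branch's hyperparameters become active.

    import random

    # The top-level categorical choice activates a different sub-space per branch
    SPACE = {
        "random_forest": {"n_estimators": [100, 300, 500], "max_depth": [None, 5, 10]},
        "svm":           {"C": [0.1, 1.0, 10.0], "kernel": ["rbf", "linear"]},
    }

    def sample_configuration():
        classifier = random.choice(list(SPACE))
        # Only the selected classifier's hyperparameters are sampled (conditional)
        params = {name: random.choice(values)
                  for name, values in SPACE[classifier].items()}
        return {"classifier": classifier, **params}

    print(sample_configuration())  # e.g. {'classifier': 'svm', 'C': 1.0, 'kernel': 'rbf'}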
  11. Hyperparameters Auto-sklearn includes 15 ML algorithms, 14 preprocessing methods, and all their respective hyperparameters, yielding a total of 110 hyperparameters.
  12. Meta-learning Optimizing performance in Auto-sklearn’s space of 110 hyperparameters can of course be slow. To jumpstart this process, it uses meta-learning to start from good hyperparameter settings for previous similar datasets. Specifically, Auto-sklearn comes with a database of previous optimization runs on 140 diverse datasets from OpenML. For a new dataset, it first identifies the most similar datasets and starts from the saved best settings for those.
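A toy sketch of this warm start (the meta-features and the stored database below are assumptions; Auto-sklearn uses a richer set of meta-features over its 140 OpenML datasets):

    import numpy as np

    def meta_features(X, y):
        """Toy meta-features: dataset size, dimensionality, class balance."""
        return np.array([np.log(len(X)), X.shape[1], np.bincount(y).min() / len(y)])

    # Stored database: meta-features of past datasets -> their best configurations
    past = [
        (np.array([9.2, 30, 0.37]), {"classifier": "random_forest", "max_depth": 10}),
        (np.array([6.1, 4,  0.33]), {"classifier": "svm", "C": 1.0}),
    ]

    def warm_start_config(X, y):
        # Start Bayesian optimization from the best config of the most similar dataset
        mf = meta_features(X, y)
        distances = [np.linalg.norm(mf - past_mf) for past_mf, _ in past]
        return past[int(np.argmin(distances))][1]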
  13. Ensemble Selection • Auto-sklearn automatically constructs ensembles. • Instead of returning a single model (the best hyperparameter setting), it automatically constructs ensembles from the models trained during the Bayesian optimization. • Specifically, Auto-sklearn uses Ensemble Selection to create small, powerful ensembles with increased predictive power and robustness.
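A hedged sketch of greedy Ensemble Selection (the procedure of Caruana et al., which Auto-sklearn adapts; the metric and inputs here are assumptions): repeatedly add, with replacement, the model whose inclusion most improves the ensemble's validation score.

    import numpy as np

    def ensemble_selection(val_predictions, y_val, ensemble_size=50):
        """val_predictions: list of arrays of class-1 probabilities on a validation set."""
        chosen = []
        for _ in range(ensemble_size):
            scores = []
            for preds in val_predictions:
                # Candidate ensemble = current members plus this model, averaged
                candidate = np.mean([val_predictions[j] for j in chosen] + [preds], axis=0)
                scores.append(np.mean((candidate > 0.5) == y_val))
            chosen.append(int(np.argmax(scores)))  # with replacement: models can repeat
        # Ensemble weights are how often each model was picked
        return np.bincount(chosen, minlength=len(val_predictions)) / ensemble_size

Under this scheme, the weights shown in the next two slides (0.52 for the random forest pipeline, 0.02 for the k-NN pipeline) can be read as selection frequencies of this kind.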
  14. Winning the AutoML challenge The ChaLearn AutoML challenge was a machine learning competition. Auto-sklearn placed in the top three for nine out of ten phases and won six of them. Particularly in the last two phases, Auto-sklearn won both the auto track and the tweakathon. During the last two phases of the tweakathon the team combined Auto-sklearn with Auto-Net for several datasets to further boost performance.
  15. Auto-sklearn Example

    import sklearn.model_selection
    import autosklearn.classification

    # Assumes the feature matrix X and labels y are already loaded
    X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
        X, y, test_size=0.3)
    automl = autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=3600,  # total optimization budget: one hour
        per_run_time_limit=360)        # at most six minutes per candidate model
    automl.fit(X_train, y_train)
    print(automl.show_models())        # inspect the ensemble that was built
    predictions = automl.predict(X_test)
    probabilities = automl.predict_proba(X_test)[:, 1]
  16. Result = Ensemble (0.520000, SimpleClassificationPipeline({
      'balancing:strategy': 'weighting', 'categorical_encoding:__choice__': 'one_hot_encoding',
      'classifier:__choice__': 'random_forest', 'imputation:strategy': 'most_frequent',
      'preprocessor:__choice__': 'no_preprocessing', 'rescaling:__choice__': 'quantile_transformer',
      'categorical_encoding:one_hot_encoding:use_minimum_fraction': 'True',
      'categorical_encoding:one_hot_encoding:minimum_fraction': 0.002615346832354839,
      'classifier:random_forest:bootstrap': 'True', 'classifier:random_forest:criterion': 'entropy',
      'classifier:random_forest:max_depth': 'None',
      'classifier:random_forest:max_features': 0.7884268823432835,
      'classifier:random_forest:max_leaf_nodes': 'None',
      'classifier:random_forest:min_impurity_decrease': 0.0,
      'classifier:random_forest:min_samples_leaf': 20,
      'classifier:random_forest:min_samples_split': 15,
      'classifier:random_forest:min_weight_fraction_leaf': 0.0,
      'classifier:random_forest:n_estimators': 100,
      'rescaling:quantile_transformer:n_quantiles': 1000,
      'rescaling:quantile_transformer:output_distribution': 'uniform'}))
  17. Result = Ensemble (0.020000, SimpleClassificationPipeline({
      'balancing:strategy': 'none', 'categorical_encoding:__choice__': 'no_encoding',
      'classifier:__choice__': 'k_nearest_neighbors', 'imputation:strategy': 'mean',
      'preprocessor:__choice__': 'no_preprocessing', 'rescaling:__choice__': 'standardize',
      'classifier:k_nearest_neighbors:n_neighbors': 1, 'classifier:k_nearest_neighbors:p': 2,
      'classifier:k_nearest_neighbors:weights': 'uniform'}))
  18. Performance – Impact of Time
      AutoML, 20-minute run: Accuracy 0.89, Precision 0.89, Recall 1.00, ROC AUC 0.61
      AutoML, 60-minute run: Accuracy 0.89, Precision 0.90, Recall 0.99, ROC AUC 0.72
  19. Performance – Over-Fitting
      AutoML, 60-minute run: Accuracy 0.89, Precision 0.90, Recall 0.99, ROC AUC 0.72
      AutoML, 120-minute run: Accuracy 0.87, Precision 0.91, Recall 0.95, ROC AUC 0.70
  20. Performance vs. Non-AutoML (training data 1)
      AutoML, 60-minute run: Accuracy 0.89, Precision 0.90, Recall 0.99, ROC AUC 0.72
      XGBoost: Accuracy 0.89, Precision 0.90, Recall 0.99, AUC 0.71
  21. Performance vs. Non-AutoML (training data 2)
      AutoML, 60-minute run: Accuracy 0.73, Precision 0.69, Recall 0.56, ROC AUC 0.79
      XGBoost: Accuracy 0.66, Precision 0.61, Recall 0.42, AUC 0.62
  22. Performance – Class Balance
      Negatives = 8× positives: Accuracy 0.99, Precision 1.00, Recall 0.91, ROC AUC 0.97
      Negatives = 20× positives: Accuracy 0.99, Precision 0.99, Recall 0.76, ROC AUC 0.94
  23. TPOT • TPOT = Tree-based Pipeline Optimization Tool • TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using Genetic Algorithms.
  24. TPOT uses Genetic Algorithms to find the best ML model and hyperparameters based on the training / validation set. The model options include all the algorithms implemented in the scikit-learn library. Its parameters include the population size and the number of generations to run the Genetic Algorithm.
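A minimal usage sketch (the dataset and budget values are assumptions; TPOTClassifier, its generations and population_size parameters, and export are TPOT's documented API):

    from tpot import TPOTClassifier
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    # Evolve pipelines for 5 generations with a population of 20 candidates
    tpot = TPOTClassifier(generations=5, population_size=20,
                          verbosity=2, random_state=0)
    tpot.fit(X_train, y_train)
    print(tpot.score(X_test, y_test))
    tpot.export('best_pipeline.py')  # writes the winning pipeline as plain Python code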
  25. Genetic Algorithm
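A hedged sketch of the genetic-algorithm idea behind TPOT (toy fitness function and operators; real TPOT evolves whole tree-structured pipelines): a population of hyperparameter candidates evolves through selection, crossover, and mutation.

    import random

    def fitness(cfg):                                  # stand-in for a cross-validated score
        return -abs(cfg["max_depth"] - 7) - abs(cfg["n_estimators"] - 300) / 100

    def random_cfg():
        return {"max_depth": random.randint(1, 20),
                "n_estimators": random.choice(range(50, 550, 50))}

    population = [random_cfg() for _ in range(20)]
    for generation in range(10):
        population.sort(key=fitness, reverse=True)
        survivors = population[:10]                    # selection: keep the fittest half
        children = []
        while len(children) < 10:
            a, b = random.sample(survivors, 2)
            child = {k: random.choice([a[k], b[k]]) for k in a}  # crossover
            if random.random() < 0.2:                  # mutation
                child["max_depth"] = random.randint(1, 20)
            children.append(child)
        population = survivors + children

    print("best configuration:", max(population, key=fitness))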
  26. Practical Questions Can we really move AutoML from the lab to production environments? What would be the latency of using an ensemble of models in production? Would the AutoML training time be prohibitive for big datasets? I think we need Incremental AutoML, in which the previous model (together with new data) serves as an input for finding the next best model.
  27. My personal experience: Semi-AutoML at Yahoo Labs A finite (large) number of manually pre-defined model configurations (hyperparameters). Incremental learning: the previous model was used as input for training new models. Used Hadoop Map-Reduce: each Reducer took one configuration, trained a model, and measured its performance (parallel training). The model with the best performance was chosen for deployment.
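A hedged sketch of the same pattern on a single machine (multiprocessing stands in for Hadoop Map-Reduce; the configurations, dataset, and scorer are assumptions): each worker trains one pre-defined configuration, and the best one is chosen.

    from multiprocessing import Pool
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = load_breast_cancer(return_X_y=True)
    CONFIGS = [{"n_estimators": n, "max_depth": d}
               for n in (100, 300) for d in (5, 10, None)]

    def evaluate(config):            # the "Reducer": train one configuration, score it
        clf = RandomForestClassifier(random_state=0, **config)
        return cross_val_score(clf, X, y, cv=3).mean(), config

    if __name__ == "__main__":
        with Pool() as pool:
            results = pool.map(evaluate, CONFIGS)    # parallel training
        score, best = max(results, key=lambda r: r[0])
        print(score, best)           # deploy the best-performing model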
  28. What next? My personal opinion Automated ML will not replace the data scientist, but it will enable the data scientist to produce more models in less time, with higher quality. This is probably the end of “good enough” models built with default parameters because the data scientist had no time to try different ones. The main advantage is not saving time; the main benefit is doing things that were never done for lack of time. Data scientists will have more time to collaborate with business experts, acquiring domain knowledge and using it in feature engineering.
  29. References
      • https://medium.com/@ODSC/the-past-present-and-future-of-automated-machine-learning-5e081ca4b71a
      • https://softwareengineeringdaily.com/2019/05/15/introduction-to-automated-machine-learning-automl/
      • https://medium.com/georgian-impact-blog/automatic-machine-learning-aml-landscape-survey-f75c3ae3bbf2
      • https://medium.com/@MLJARofficial/automl-comparison-4b01229fae5e
      • https://www.fast.ai/2018/07/16/auto-ml2/
      • https://www.kdnuggets.com/2016/10/interview-auto-sklearn-automated-data-science-machine-learning-team.html
      • https://www.kdnuggets.com/2016/08/winning-automl-challenge-auto-sklearn.html
      • https://www.kdnuggets.com/2017/07/design-evolution-evolve-neural-network-automl.html
      • https://www.slideshare.net/JoaquinVanschoren/automl-lectures-acdl-2019
      • https://www.youtube.com/watch?v=QrJlj0VCHys
      • https://www.youtube.com/watch?v=jn-22XyKsgo
      • https://cloud.google.com/automl/
  30. Thanks! Questions? Comments?