- 1. Best Practices for Hyperparameter Tuning with Joseph Bradley April 24, 2019 Spark + AI Summit
- 2. About me Joseph Bradley • Software engineer at Databricks • Apache Spark committer & PMC member
- 3. About Databricks • TEAM: Started Spark project (now Apache Spark) at UC Berkeley in 2009 • PRODUCT: Unified Analytics Platform • MISSION: Making Big Data Simple Try for free today: databricks.com
- 4. Hyperparameters • Express high-level concepts, such as statistical assumptions • Are fixed before training or are hard to learn from data • Affect objective, test time performance, computational cost E.g.: • Linear Regression: regularization, # iterations of optimization • Neural Network: learning rate, # hidden layers
- 5. Tuning hyperparameters E.g.: Fitting a polynomial Common goals: • More flexible modeling process • Reduced generalization error • Faster training • Plug & play ML
- 6. Challenges in tuning Curse of dimensionality Non-convex optimization Computational cost Unintuitive hyperparameters
- 7. Tuning in the Data Science workflow Data
- 8. Tuning in the Data Science workflow Training Data Test Data ML Model
- 9. Tuning in the Data Science workflow Training Data Validation Data Test Data Final ML Model ML Model 1 ML Model 2 ML Model 3
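The workflow on slides 7-9 can be sketched in a few lines. This is a minimal illustration, not the speaker's code: the synthetic data, the 1-D ridge model, and the candidate values of `lam` are all assumptions chosen so the example runs with the standard library only.

```python
import random

def make_data(n, rng):
    """Hypothetical data: y = 2x + Gaussian noise."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0.0, 0.3)) for x in xs]

def fit_ridge(data, lam):
    """1-D ridge regression through the origin: w = sum(xy) / (sum(x^2) + lam)."""
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / (sxx + lam)

def mse(w, data):
    """Mean squared error of the model y = w * x on `data`."""
    return sum((y - w * x) ** 2 for x, y in data) / len(data)

rng = random.Random(0)
data = make_data(300, rng)
train, valid, test = data[:200], data[200:250], data[250:]

# One candidate model per hyperparameter setting, fit on training data only.
candidates = {lam: fit_ridge(train, lam) for lam in (0.0, 10.0, 1000.0)}

# Select the setting with the lowest validation error.
best_lam = min(candidates, key=lambda lam: mse(candidates[lam], valid))

# The test set is touched once, for the selected model only.
final_w = candidates[best_lam]
test_mse = mse(final_w, test)
print(f"best lam={best_lam}, w={final_w:.3f}, test MSE={test_mse:.3f}")
```

The regularization strength `lam` plays the role of the hyperparameter: it is fixed before each fit, and the validation set, not the training set, decides between settings.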
- 10. Tuning in the Data Science workflow ML Model Featurization Model family selection Hyperparameter tuning “AutoML” includes hyperparameter tuning.
- 11. This talk Popular methods for hyperparameter tuning • Overview of methods • Comparing methods • Open-source tools Tuning in practice with MLflow • Instrument tuning • Analyze results • Productionize models Beyond this talk
- 13. Overview of tuning methods • Manual search • Grid search • Random search • Population-based algorithms • Bayesian algorithms
- 14. Manual search Select hyperparameter settings to try based on human intuition. 2 hyperparameters: • [0, ..., 5] • {A, B, ..., F} Expert knowledge tells us to try: (2,C), (2,D), (2,E), (3,C), (3,D), (3,E) [Figure: grid with x-axis A-F and y-axis 0-5]
- 15. Grid Search Try points on a grid defined by ranges and step sizes X-axis: {A,...,F} Y-axis: 0-5, step = 1 [Figure: grid with x-axis A-F and y-axis 0-5]
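A minimal sketch of grid search over the slide's two hyperparameters. The `validation_loss` function is a hypothetical stand-in for "train a model and score it on validation data"; its minimum at (3, D) is an arbitrary choice for illustration.

```python
from itertools import product

OPTIONS = "ABCDEF"

def validation_loss(level, option):
    """Hypothetical objective; the minimum is placed at (3, 'D') for illustration."""
    return (level - 3) ** 2 + (OPTIONS.index(option) - 3) ** 2

levels = range(0, 6)   # Y-axis: 0-5, step = 1
options = OPTIONS      # X-axis: {A, ..., F}

# Evaluate every point on the grid: 6 x 6 = 36 evaluations.
results = {(lv, op): validation_loss(lv, op) for lv, op in product(levels, options)}
best = min(results, key=results.get)
print(len(results), best, results[best])
```

Every grid point is evaluated regardless of how earlier points scored, which is why the method is simple to parallelize but not adaptive.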
- 16. Random Search Sample from distributions over ranges X-axis: Uniform({A,...,F}) Y-axis: Uniform([0,5]) [Figure: grid with x-axis A-F and y-axis 0-5]
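Random search over the same space might look like the sketch below. The objective, the seed, and the budget of 20 trials are assumptions; only the two sampling distributions come from the slide.

```python
import random

OPTIONS = "ABCDEF"

def validation_loss(level, option):
    """Hypothetical objective with its minimum at level 3.2, option 'D'."""
    return (level - 3.2) ** 2 + (OPTIONS.index(option) - 3) ** 2

rng = random.Random(42)
n_trials = 20  # the budget k is chosen up front

trials = []
for _ in range(n_trials):
    level = rng.uniform(0.0, 5.0)   # Y-axis: Uniform([0, 5])
    option = rng.choice(OPTIONS)    # X-axis: Uniform({A, ..., F})
    trials.append((validation_loss(level, option), level, option))

best_loss, best_level, best_option = min(trials)
print(f"best: level={best_level:.2f}, option={best_option}, loss={best_loss:.3f}")
```

Because the numeric axis is sampled continuously, random search can land near 3.2, a value that a grid with step 1 never tries.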
- 17. Population Based Algorithms Start with random search, then iterate: • Use the previous “generation” to inform the next generation • E.g., sample from best performers & then perturb them [Figure: grid with x-axis A-F and y-axis 0-5]
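The "sample from best performers and perturb them" loop can be sketched as follows. The population size, the number of survivors, the perturbation scheme, and the objective are all illustrative assumptions, not a specific published algorithm.

```python
import random

OPTIONS = "ABCDEF"

def validation_loss(level, option):
    """Hypothetical objective with its minimum at level 3.2, option 'D'."""
    return (level - 3.2) ** 2 + (OPTIONS.index(option) - 3) ** 2

def perturb(level, option, rng):
    """Jitter the numeric value; sometimes step to a neighbouring category."""
    new_level = min(5.0, max(0.0, level + rng.gauss(0.0, 0.5)))
    idx = OPTIONS.index(option)
    if rng.random() < 0.3:
        idx = min(len(OPTIONS) - 1, max(0, idx + rng.choice((-1, 1))))
    return new_level, OPTIONS[idx]

rng = random.Random(0)
pop_size, n_keep, n_generations = 12, 4, 5

# Generation 0 is plain random search.
population = [(rng.uniform(0.0, 5.0), rng.choice(OPTIONS)) for _ in range(pop_size)]
best_per_generation = []

for _ in range(n_generations):
    ranked = sorted(population, key=lambda p: validation_loss(*p))
    best_per_generation.append(validation_loss(*ranked[0]))
    survivors = ranked[:n_keep]
    # Next generation: keep the best performers, fill up with perturbed copies.
    children = [perturb(*rng.choice(survivors), rng) for _ in range(pop_size - n_keep)]
    population = survivors + children

best = min(population, key=lambda p: validation_loss(*p))
print(best, validation_loss(*best))
```

Keeping the survivors unchanged means the best loss can never get worse from one generation to the next. A real implementation would also cache losses instead of re-evaluating them when sorting.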
- 20. Bayesian Optimization Model the loss function: Hyperparameters → loss Iteratively search space, trading off between exploration and exploitation [Figure: grid with x-axis A-F and y-axis 0-5]
- 21. Bayesian Optimization [Figure sequence, slides 21-35: performance vs. parameter space]
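One way to make "model the loss function, then trade off exploration and exploitation" concrete is a Gaussian-process surrogate with a lower-confidence-bound rule. The slides do not name a surrogate or an acquisition function, so both are assumptions here, as are the 1-D objective, the kernel length scale, and `kappa`. The sketch uses only the standard library, so the linear solver is hand-written.

```python
import math

def loss(x):
    """Hypothetical expensive objective (e.g. validation loss) on [0, 5]."""
    return (x - 3.2) ** 2

def rbf(a, b, length=1.0):
    """Squared-exponential kernel."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z

def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Posterior mean and std at x_new for a GP fit to (xs, ys), targets centered."""
    mean_y = sum(ys) / len(ys)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    alpha = solve(K, [y - mean_y for y in ys])
    v = solve(K, k_star)
    mu = mean_y + sum(k * a for k, a in zip(k_star, alpha))
    var = max(0.0, rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, v)))
    return mu, math.sqrt(var)

def lower_confidence_bound(xs, ys, x, kappa=2.0):
    """Low predicted loss (exploit) or high uncertainty (explore) both score well."""
    mu, sigma = gp_posterior(xs, ys, x)
    return mu - kappa * sigma

candidates = [i * 0.1 for i in range(51)]             # 0.0, 0.1, ..., 5.0
xs = [candidates[0], candidates[25], candidates[50]]  # small initial design
ys = [loss(x) for x in xs]

for _ in range(7):
    remaining = [c for c in candidates if c not in xs]
    x_next = min(remaining, key=lambda c: lower_confidence_bound(xs, ys, c))
    xs.append(x_next)
    ys.append(loss(x_next))

best_x = xs[ys.index(min(ys))]
print(f"best x={best_x:.1f}, loss={min(ys):.3f} after {len(xs)} evaluations")
```

Each iteration refits the surrogate to all results so far, which is what makes the method adaptive: the next point depends on every previous evaluation.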
- 36. Comparing tuning methods

  | Method | Iterative / adaptive? | # evaluations for P params | Model of param space |
  |---|---|---|---|
  | Grid search | No | O(c^P) | none |
  | Random search | No | O(k) | none |
  | Population-based | Yes | O(k) | implicit |
  | Bayesian | Yes | O(k) | explicit |
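The O(c^P) entry for grid search can be checked by counting grid points directly. In this sketch `c` is the number of values tried per hyperparameter and `P` the number of hyperparameters; the value c = 6 mirrors the 0-5 axis used earlier and is otherwise arbitrary.

```python
from itertools import product

def grid_size(values_per_param, n_params):
    """Count the grid points when each of P params takes c values."""
    grid = product(range(values_per_param), repeat=n_params)
    return sum(1 for _ in grid)

for p in (1, 2, 3, 4):
    print(p, grid_size(6, p))  # 6, 36, 216, 1296

# For the other methods the budget k is set by the user, independent of P.
random_search_budget = 50
```

Adding one more hyperparameter multiplies the grid by c, while a random, population-based, or Bayesian search keeps whatever budget it was given.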
- 37. Open-source tools for tuning

  | Tool | Grid search | Random search | Population-based | Bayesian | PyPI downloads last month | GitHub stars | License |
  |---|---|---|---|---|---|---|---|
  | scikit-learn | Yes | Yes | | | --- | --- | BSD |
  | MLlib | Yes | | | | --- | --- | Apache 2.0 |
  | scikit-optimize | | | | Yes | 49,189 | 1,278 | BSD |
  | Hyperopt | | Yes | | Yes | 98,282 | 3,286 | BSD |
  | DEAP | | | Yes | | 26,700 | 2,789 | LGPL v3 |
  | TPOT | | | Yes | | 9,057 | 5,609 | LGPL v3 |
  | GPyOpt | | | | Yes | 4,959 | 451 | BSD |

  As of mid-April 2019
- 40. Tracking • Experiments • Runs • Parameters • Metrics • Tags & artifacts Projects • Directory or git repository • Entry points • Environments Models • Storage format • Flavors • Deployment tools
- 41. Organizing with MLflow [Diagram: the tuning workflow (Training Data, Validation Data, Test Data, ML Models 1-3, Final ML Model) annotated with Experiment, Main run, Child runs]
- 42. Instrumenting tuning with MLflow What to track in a run for a model • Hyperparameters: all vs. ones being tuned • Metric(s): training & validation, loss & objective, multiple objectives • Tags: provenance, simple metadata • Artifacts: serialized model, large metadata Tip: Tune full pipeline, not 1 model.
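A sketch of the main-run / child-run layout using the MLflow tracking calls `start_run`, `log_params`, `log_metric`, and `set_tag`. The helper name `tune_with_tracking`, the run name, the tag, and the metric names are hypothetical. The tracker is passed in as an argument so the function can be exercised without a tracking server; passing the `mlflow` module itself is the intended use.

```python
def tune_with_tracking(tracker, settings, train_and_evaluate):
    """Log one main run for the tuning job and one nested child run per setting.

    `tracker` must expose start_run, log_params, log_metric and set_tag with
    the same call shapes as the `mlflow` module. `settings` is a non-empty
    list of hyperparameter dicts; `train_and_evaluate(**params)` returns
    (train_loss, val_loss).
    """
    best = None
    with tracker.start_run(run_name="tuning"):               # main run
        tracker.set_tag("purpose", "hyperparameter tuning")  # simple metadata
        for params in settings:
            with tracker.start_run(nested=True):             # child run
                train_loss, val_loss = train_and_evaluate(**params)
                tracker.log_params(params)
                tracker.log_metric("train_loss", train_loss)
                tracker.log_metric("val_loss", val_loss)
            if best is None or val_loss < best[0]:
                best = (val_loss, params)
        # Summarize the winner on the main run so it is easy to find later.
        tracker.log_params({f"best_{k}": v for k, v in best[1].items()})
        tracker.log_metric("best_val_loss", best[0])
    return best
```

With MLflow installed, a call might look like `import mlflow` followed by `tune_with_tracking(mlflow, [{"lam": 0.1}, {"lam": 1.0}], my_train_fn)`, where `my_train_fn` is your own training function. Logging both training and validation loss per child run follows the slide's advice on metrics.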
- 43. Analyzing how tuning performs Questions to answer • Am I tuning the right hyperparameters? • Am I exploring the right parts of the search space? • Do I need to do another round of tuning? Examining results • Simple case: visualize param vs metric • Challenges: multiple params and metrics, iterative experimentation
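One simple check for "am I exploring the right parts of the search space?" is whether the best run sits at the edge of the searched range. This heuristic, the helper name `best_at_edge`, the 5% tolerance, and the run records below are illustrative assumptions, not something taken from the talk.

```python
def best_at_edge(runs, param, low, high, tol=0.05):
    """Return True if the best run's value for `param` lies within `tol`
    (as a fraction of the range) of either end of the searched range."""
    best = min(runs, key=lambda r: r["val_loss"])
    margin = tol * (high - low)
    return best[param] <= low + margin or best[param] >= high - margin

# Hypothetical tuning results, e.g. exported from a tracking tool.
runs = [
    {"learning_rate": 0.01, "val_loss": 0.90},
    {"learning_rate": 0.05, "val_loss": 0.62},
    {"learning_rate": 0.10, "val_loss": 0.48},
]
print(best_at_edge(runs, "learning_rate", low=0.01, high=0.10))  # True
```

A `True` result suggests the optimum may lie outside the range, which is one reason to run another round of tuning with wider bounds.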
- 44. Moving models to production Repeatable experiments via MLflow Projects • Code checkpoints • Environments Model serialization via MLflow Models • Flavors: TensorFlow, Keras, Spark, MLeap, ... Deployment to prediction services • Azure ML, Amazon SageMaker, Spark UDF
- 45. Auto-tracking MLlib with MLflow [Diagram: the tuning workflow (Training Data, Validation Data, Test Data, ML Models 1-3, Final ML Model) annotated with Experiment, Main run, Child runs] In Databricks • CrossValidator & TrainValidationSplit • 1 run per setting of hyperparameters • Avg metrics for CV folds (demo)
- 48. Advanced topics Efficient tuning • Parallelizing hyperparameter search • Early stopping • Transfer learning Fancy tuning • Multi-metric optimization • Conditional/awkward parameter spaces Check out Maneesh Bhide’s talk: "Advanced Hyperparameter Optimization for Deep Learning" to hear about early stopping, multi-metric, & conditionals Thursday @ 3:30pm, Room 3014
- 49. Advanced topics Efficient tuning: Parallelizing hyperparameter search Challenge in analyzing results: multiple parameters or multiple metrics Hyperopt + Apache Spark + MLflow integration • Hyperopt: general tuning library for ML in Python • Spark integration: parallelize model tuning in batches • MLflow integration: track runs, analogous to MLlib + MLflow integration (demo)
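The idea of parallelizing tuning in batches can be illustrated without a cluster. In the sketch below a thread pool stands in for Spark executors; this is an assumption made so the example runs locally, not the Hyperopt + Spark integration itself. The objective and the uniform proposal step are also hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import random

def validation_loss(params):
    """Hypothetical stand-in for training and validating one model."""
    return (params["x"] - 3.2) ** 2

def tune_in_batches(n_batches, batch_size, rng):
    """Propose a batch of settings, evaluate the batch concurrently, repeat."""
    results = []
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        for _ in range(n_batches):
            # An adaptive method would use `results` from earlier batches
            # to propose this batch; uniform sampling keeps the sketch short.
            batch = [{"x": rng.uniform(0.0, 5.0)} for _ in range(batch_size)]
            losses = list(pool.map(validation_loss, batch))
            results.extend(zip(losses, batch))
    return results

results = tune_in_batches(n_batches=3, batch_size=4, rng=random.Random(0))
best_loss, best_params = min(results, key=lambda r: r[0])
print(len(results), best_params, best_loss)
```

Batch size is a trade-off for adaptive methods: larger batches give more parallelism, but each proposal is then based on fewer completed evaluations.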
- 50. Getting started MLflow: http://mlflow.org MLlib tuning • Databricks auto-tracking with MLflow in private preview now, public preview mid-May Hyperopt • Distributed tuning via Apache Spark: working to open-source the code • Databricks auto-tracking with MLflow in public preview mid-May
- 51. Thank You! Questions? AMA @ DevLounge Theater Thursday @ 10:30-11am Thanks to Maneesh Bhide for material for this talk!