Efficient Distributed Hyperparameter Tuning with Apache Spark


Hyperparameter tuning is a key step in achieving and maintaining optimal performance from Machine Learning (ML) models. Today, there are many open-source frameworks which help automate the process and employ statistical algorithms to efficiently search the parameter space. However, optimizing these parameters over a sufficiently large dataset or search space can be computationally infeasible on a single machine. Apache Spark is a natural candidate to accelerate such workloads, but naive parallelization can actually impede the overall search speed and accuracy.



In this talk, we’ll discuss how to efficiently leverage Spark to distribute our tuning workload and go over some common pitfalls. Specifically, we’ll provide a brief introduction to tuning and the motivation for moving to a distributed workflow. Next, we’ll demonstrate best practices for using Spark with Hyperopt, a popular, flexible, open-source tool for hyperparameter tuning, including how to distribute the training data and how to size the cluster for the problem at hand. We’ll also touch on the tension between parallel computation and Sequential Model-Based Optimization methods, such as the Tree-structured Parzen Estimator (TPE) implemented in Hyperopt. Afterwards, we’ll demonstrate these practices with Hyperopt’s SparkTrials API. Additionally, we’ll showcase joblib-spark, an extension our team recently developed that uses Spark as a distributed backend for scikit-learn to accelerate tuning and training.



This talk will be generally accessible to those familiar with ML and particularly useful for those looking to scale up their training with Spark.


Efficient Distributed Hyperparameter Tuning with Apache Spark

  1. Efficient Distributed Hyperparameter Tuning with Apache Spark (Viswesh Periyasamy, Software Engineer)
  2. About me ▪ Software engineer at Databricks ▪ Previously software engineer at Confluent ▪ MS in Machine Learning & Bioinformatics from UW-Madison
  3. Databricks is the data and AI company: unified data, analytics, and AI on one Lakehouse platform ● 5000+ customers ● Creators of popular data and machine learning OSS projects
  4. Scenario: Krabby Patties at scale 1. “The cluster keeps crashing” 2. “It’s slower than running it on a single machine” 3. “It’s performing worse than before”
  5. This talk ▪ Hyperparameter tuning ▪ Challenges ▪ Distributed tuning ▪ Best practices ▪ Parallelism vs. performance ▪ Broadcasting data ▪ Allocating cluster resources ▪ Demo ▪ Hyperopt with Spark ▪ joblib-spark
  6. What are hyperparameters? They control how the model derives its parameters • Cannot be learned via training • In practice: function parameters set by the user 1. Model hyperparameters • Problem-specific configurations • Affect model selection or architecture 2. Optimization hyperparameters • Affect speed or quality of training • Example: in k-means, each centroid is a parameter and k is a hyperparameter
  7. Hyperparameter tuning is essential • Optimizing hyperparameters can significantly impact model performance • Should be tuned routinely before productionizing a model • Can be tuned systematically or with expertise
  8. Hyperparameter tuning is challenging • Hyperparameters are hard to reason about: selecting which hyperparameters to optimize, defining the search space, choosing the sampling distribution • Computational complexity: non-convex optimization, NP-hard, curse of dimensionality
  9. Distributed hyperparameter tuning • Spark is a natural candidate for distributing workloads • Accelerate tuning by evaluating hyperparameters in parallel
  10. Best Practices 1. Parallelism vs. performance 2. Broadcasting data 3. Allocating cluster resources
  11. Best Practices 1. Parallelism vs. performance 2. Broadcasting data 3. Allocating cluster resources
  12. “Embarrassingly non-parallel” • Parallelism impedes Bayesian optimization • Bayesian optimization learns from previous trials to traverse the search space • Parallel execution ignores that potential information • Trade-off: parallelism of 1 gives better optimization but worse speed; parallelism equal to the # of evaluations gives better speed but worse optimization • Good rule of thumb: parallelism ≤ # of cores ≪ # of evaluations (a minimal SparkTrials sketch follows the slide list)
  13. “Embarrassingly non-parallel” • What if my training library is already distributed (e.g. MLlib)? • Distributed libraries should be tuned sequentially: each trial is already accelerated by the cluster • Can take full advantage of Bayesian optimization, launching trials from the driver
  14. Best Practices 1. Parallelism vs. performance 2. Broadcasting data 3. Allocating cluster resources
  15. Broadcast, broadcast, broadcast! • The hyperparameters vary, the data does not • When the data is large, caching is crucial • The data needs to be shared with each trial • Broadcasting reduces this to once per worker (see the broadcast sketch after the slide list)
  16. Three ways to get the data to each trial:
      • Serialize: referenced objects get serialized by default; sent once per task
            data = read_csv("/path/to/dataset")
            def objective(alpha):
                return train_and_eval(data, alpha)
      • Load: load the data inside the task from a distributed file system; loaded into memory once per task
            def objective(alpha):
                data = read_csv("/path/to/dataset")
                return train_and_eval(data, alpha)
      • Broadcast: broadcast the data and cache it on the worker nodes; sent and loaded once per worker
            bc_data = sc.broadcast(data)
            def objective(alpha):
                data = bc_data.value
                return train_and_eval(data, alpha)
  17. Benchmarking Hyperopt + SparkTrials: broadcast whenever possible (bounded by spark.driver.maxResultSize)
  18. Best Practices 1. Parallelism vs. performance 2. Broadcasting data 3. Allocating cluster resources
  19. Size the cluster to match your search 1. Choose instance types appropriate for your data • Compute- or memory-optimized instances for large datasets • GPU- or TPU-accelerated instances for deep learning workloads 2. Allocate additional cores for models that can use them • e.g. n_jobs in scikit-learn or nthread in XGBoost (see the core-allocation sketch after the slide list) 3. Increase the # of worker nodes for a larger search space • or, conversely, reduce the search budget • For auto-scaling clusters, set a higher parallelism in advance
  20. Demo
  21. Summary: distributed hyperparameter tuning, when done right, can alleviate the curse of dimensionality 1. Don’t over-parallelize • parallelism ≤ # of cores ≪ # of evaluations 2. Broadcast your data • sc.broadcast(data) 3. Size the cluster to match your search • Choose the appropriate instance types, # of cores, and # of nodes to boost performance
  22. Distributed tuning methods • Hyperopt + SparkTrials: + any model type + customized objective function + Bayesian optimization (TPE) • scikit-learn + joblib-spark: + scikit-learn models + low instrumentation + random search • Demo notebooks: tinyurl.com/viswesh-dais21-demo (a joblib-spark sketch follows the slide list)
  23. Resources • Recent talks: Tuning ML Models: Scaling, Workflows, and Architecture (Joseph Bradley); Best Practices for Hyperparameter Tuning with MLflow (Joseph Bradley); Advanced Hyperparameter Optimization for Deep Learning with MLflow (Maneesh Bhide) • Blog posts: How (Not) to Tune Your Model with Hyperopt (Sean Owen); Hyperparameter Tuning with MLflow, Apache Spark MLlib and Hyperopt (Joseph Bradley); Scaling Hyperopt to Tune Machine Learning Models in Python (Joseph Bradley) • Documentation: hyperopt, joblib-spark, Hyperopt concepts, Hyperopt best practices and troubleshooting (* Special thanks to Joseph Bradley and Sean Owen for content and references)
  24. Thanks! Questions? Don’t forget to rate and review this session and others!
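
A minimal sketch of the parallelism guidance from slide 12, using Hyperopt's SparkTrials. The Ridge model, the diabetes dataset, and the alpha search range are illustrative assumptions, not part of the talk; only the fmin/SparkTrials usage pattern follows Hyperopt's documented API.

    from hyperopt import fmin, tpe, hp, SparkTrials
    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_val_score

    X, y = load_diabetes(return_X_y=True)  # small illustrative dataset

    def objective(alpha):
        # Hyperopt minimizes, so return the negated cross-validation score.
        return -cross_val_score(Ridge(alpha=alpha), X, y, cv=3).mean()

    # Rule of thumb from slide 12: parallelism <= # of cores << # of evaluations.
    spark_trials = SparkTrials(parallelism=8)
    best = fmin(
        fn=objective,
        space=hp.loguniform("alpha", -5, 2),  # assumed search range
        algo=tpe.suggest,                     # Tree-structured Parzen Estimator
        max_evals=128,
        trials=spark_trials,
    )
    print(best)

If the model itself is already distributed (e.g. an MLlib estimator, slide 13), the same fmin call with Hyperopt's default Trials() runs the trials sequentially from the driver, letting each trial use the whole cluster.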
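
A minimal sketch of the broadcast pattern from slides 15 and 16, assuming the dataset fits in driver memory and, per slide 17, that spark.driver.maxResultSize permits the broadcast. The CSV path, the pandas read_csv call, and the "label" column are placeholders.

    import pandas as pd
    from pyspark.sql import SparkSession
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_val_score

    spark = SparkSession.builder.getOrCreate()
    sc = spark.sparkContext

    data = pd.read_csv("/path/to/dataset")  # placeholder path
    bc_data = sc.broadcast(data)            # sent and cached once per worker

    def objective(alpha):
        df = bc_data.value                  # read the worker-local cached copy
        X, y = df.drop(columns=["label"]), df["label"]  # "label" is a placeholder column
        return -cross_val_score(Ridge(alpha=alpha), X, y, cv=3).mean()

This objective can be passed to fmin with SparkTrials exactly as in the previous sketch; the only change from the non-broadcast variants on slide 16 is that workers read bc_data.value instead of re-serializing or re-loading the dataset once per task.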
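
A sketch of point 2 on slide 19 (allocating extra cores to models that can use them). spark.task.cpus is standard Spark configuration, but whether it can be set at session-creation time depends on your environment; the RandomForest model, the max_depth search, and the 4-core choice are assumptions for illustration.

    from pyspark.sql import SparkSession
    from sklearn.datasets import load_diabetes
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score

    # Reserve 4 cores per Spark task so each trial can run a 4-thread model
    # without oversubscribing its executor slot.
    spark = (
        SparkSession.builder
        .config("spark.task.cpus", "4")
        .getOrCreate()
    )

    X, y = load_diabetes(return_X_y=True)

    def objective(max_depth):
        # n_jobs matches spark.task.cpus, so one trial uses exactly one slot's cores.
        model = RandomForestRegressor(n_estimators=200, max_depth=int(max_depth), n_jobs=4)
        return -cross_val_score(model, X, y, cv=3).mean()

With SparkTrials, keep parallelism × cores-per-trial within the cluster's total core count.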
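
A minimal sketch of the scikit-learn + joblib-spark path from slide 22, assuming the joblibspark package is installed alongside pyspark and scikit-learn. The Ridge model, the alpha distribution, and the n_jobs value are illustrative choices.

    from joblib import parallel_backend
    from joblibspark import register_spark
    from scipy.stats import loguniform
    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import RandomizedSearchCV

    register_spark()  # register Spark as a joblib backend

    X, y = load_diabetes(return_X_y=True)
    search = RandomizedSearchCV(
        Ridge(),
        param_distributions={"alpha": loguniform(1e-3, 1e2)},
        n_iter=50,
        cv=3,
    )

    # Each of the 50 x 3 = 150 fits is dispatched to the cluster as a Spark task.
    with parallel_backend("spark", n_jobs=16):
        search.fit(X, y)

    print(search.best_params_)

As slide 22 notes, this path trades Hyperopt's TPE for scikit-learn's random (or grid) search, but it requires very little instrumentation of existing scikit-learn code.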
