
Alexandra Johnson, Software Engineer, SigOpt, at MLconf NYC 2017



Alexandra Johnson, Software Engineer, SigOpt
Alexandra works on everything from infrastructure to product features to blog posts. Previously, she worked on growth, APIs, and recommender systems at Polyvore (acquired by Yahoo). She majored in computer science at Carnegie Mellon University with a minor in discrete mathematics and logic, and during the summers she A/B tested recommendations at internships with Facebook and Rent the Runway.

Abstract Summary:

Common Problems in Hyperparameter Optimization: All large machine learning pipelines have tunable parameters, commonly referred to as hyperparameters. Hyperparameter optimization is the process of finding the values of these parameters that make our system perform best. SigOpt provides a Bayesian optimization platform that is commonly used for hyperparameter optimization, and I’m going to share some of the common problems we’ve seen when integrating it into machine learning pipelines.
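The tuning process described in the abstract can be sketched as a suggest, evaluate, record loop. This is a minimal illustration, not SigOpt's API or method: the objective `toy_validation_score`, its hyperparameter names, and its optimum are hypothetical stand-ins for training and validating a real model, and suggestions come from plain random sampling rather than Bayesian optimization.

```python
import random
from typing import Callable, Dict, Tuple

Params = Dict[str, float]


def toy_validation_score(params: Params) -> float:
    """Hypothetical stand-in for "train a model, return its validation score".

    Higher is better; the invented optimum is learning_rate=0.1, dropout=0.3.
    """
    return -((params["learning_rate"] - 0.1) ** 2) - (params["dropout"] - 0.3) ** 2


def tune(
    objective: Callable[[Params], float],
    bounds: Dict[str, Tuple[float, float]],
    budget: int,
    seed: int = 0,
) -> Tuple[Params, float]:
    """Evaluate `budget` suggested configurations and return the best one found."""
    if budget < 1:
        raise ValueError("budget must be at least 1")
    rng = random.Random(seed)
    best_params: Params = {}
    best_score = float("-inf")
    for _ in range(budget):
        # 1. Suggest a configuration (here: uniform random within each bound).
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        # 2. Evaluate the model under that configuration.
        score = objective(params)
        # 3. Record the observation, keeping the best result seen so far.
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    space = {"learning_rate": (0.0, 1.0), "dropout": (0.0, 0.9)}
    best, best_value = tune(toy_validation_score, space, budget=200)
    print(best, best_value)
```

Only the "suggest" step changes between methods such as random search and Bayesian optimization; the surrounding loop stays the same.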



  1. Common Problems in Hyperparameter Optimization (Alexandra Johnson, @alexandraj777)
  2. What Are Hyperparameters?
  3. Hyperparameter Optimization
     ● Also called hyperparameter tuning, model tuning, or model selection
     ● Finding "the best" values for the hyperparameters of your model
  4. Better Performance
     ● +315% accuracy boost for TensorFlow
     ● +49% accuracy boost for xgboost
     ● 41% error reduction for a recommender system
  5. #1 Trusting the Defaults
  6. Default Values
     ● Default values are an implicit choice
     ● Defaults are not always appropriate for your model
     ● You may build a classifier that looks like this:
  7. #2 Using the Wrong Metric
  8. Choosing a Metric
     ● Balance long-term and short-term goals
     ● Question underlying assumptions
     ● Example from Microsoft
  9. Choose Multiple Metrics
     ● Composite metric
     ● Multi-metric optimization
  10. #3 Overfitting
  11. Metric Generalization
      ● Cross-validation
      ● Backtesting
      ● Regularization terms
  14. #4 Too Few Hyperparameters
  15. Optimize All Parameters at Once
  16. Include Feature Parameters
  18. Example: xgboost
      ● The optimized model always performed better with tuned feature parameters, no matter which optimization method was used
  19. #5 Hand Tuning
  20. What Is an Optimization Method?
  21. You Are Not an Optimization Method
      ● Hand tuning is time-consuming and expensive
      ● Algorithms can quickly and cheaply beat expert tuning
  22. Use an Algorithm
      ● Grid search
      ● Random search
      ● Bayesian optimization
  23. #6 Grid Search
  24. No Grid Search
      Hyperparameters | Model Evaluations
      2               | 100
      3               | 1,000
      4               | 10,000
      5               | 100,000
  25. #7 Random Search
  26. Random Search
      ● Theoretically more effective than grid search
      ● Large variance in results
      ● No intelligence
  27. Use an Intelligent Method
      ● Genetic algorithms
      ● Bayesian optimization
      ● Particle-based methods
      ● Convex optimizers
      ● Simulated annealing
      To name a few...
  28. SigOpt: Bayesian Optimization Service
      Three API calls:
      1. Define hyperparameters
      2. Receive suggested hyperparameters
      3. Report observed performance
  29. Thank You!
  30. References by Section
      Intro
      ● Ian Dewancker. SigOpt for ML: TensorFlow ConvNets on a Budget with Bayesian Optimization.
      ● Ian Dewancker. SigOpt for ML: Unsupervised Learning with Even Less Supervision Using Bayesian Optimization.
      ● Ian Dewancker. SigOpt for ML: Bayesian Optimization for Collaborative Filtering with MLlib.
      #1 Trusting the Defaults
      ● Keras recurrent layers documentation.
      #2 Using the Wrong Metric
      ● Ron Kohavi et al. Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained.
      ● Xavier Amatriain. 10 Lessons Learned from Building ML Systems [video at 19:03].
      ● Image from PhD Comics.
      ● See also: SigOpt in Depth: Intro to Multicriteria Optimization.
      #4 Too Few Hyperparameters
      ● Image from TensorFlow Playground.
      ● Ian Dewancker. SigOpt for ML: Unsupervised Learning with Even Less Supervision Using Bayesian Optimization.
      #5 Hand Tuning
      ● On algorithms beating experts: Scott Clark, Ian Dewancker, and Sathish Nagappan. Deep Neural Network Optimization with SigOpt and Nervana Cloud.
      #6 Grid Search
  31. References by Section
      #7 Random Search
      ● James Bergstra and Yoshua Bengio. Random Search for Hyper-Parameter Optimization.
      ● Ian Dewancker, Michael McCourt, Scott Clark, Patrick Hayes, Alexandra Johnson, and George Ke. A Stratified Analysis of Bayesian Optimization Methods.
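The "#2 Using the Wrong Metric" slides suggest choosing multiple metrics, either as a composite metric or through multi-metric optimization. A minimal sketch of the composite option, assuming a simple weighted sum; the metric names (`accuracy`, `latency_s`), the weights, and the model numbers are all hypothetical.

```python
from typing import Dict


def composite_metric(metrics: Dict[str, float], weights: Dict[str, float]) -> float:
    """Collapse several metrics into one scalar with a weighted sum.

    Positive weights reward a metric; negative weights penalize it.
    Raises KeyError if a weighted metric was not reported.
    """
    missing = set(weights) - set(metrics)
    if missing:
        raise KeyError(f"missing metrics: {sorted(missing)}")
    return sum(weights[name] * metrics[name] for name in weights)


if __name__ == "__main__":
    # Hypothetical trade-off: accuracy versus inference latency in seconds.
    weights = {"accuracy": 1.0, "latency_s": -0.5}
    model_a = {"accuracy": 0.92, "latency_s": 0.20}
    model_b = {"accuracy": 0.90, "latency_s": 0.05}
    print(composite_metric(model_a, weights), composite_metric(model_b, weights))
```

With these made-up weights the slower but more accurate model scores about 0.82 and the faster one about 0.875, so the optimizer would prefer the faster model. The weights fix the trade-off before tuning starts, which is one reason the slide also lists multi-metric optimization as an alternative.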
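The "#3 Overfitting" slides list cross-validation as one way to make a tuned metric generalize. A minimal k-fold sketch in plain Python, assuming a caller-supplied `train_and_score(train_idx, val_idx)` function (hypothetical; in practice it would fit the model on one index set and score it on the other). The optimizer would then receive the mean validation score across folds instead of a score from a single split.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds; return (train, validation) pairs."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    # The first n % k folds get one extra index so every index is used exactly once.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, val))
        start += size
    return splits


def cross_validated_score(
    n: int,
    k: int,
    train_and_score: Callable[[Sequence[int], Sequence[int]], float],
) -> float:
    """Average the validation score over all k folds."""
    return mean(train_and_score(train, val) for train, val in k_fold_indices(n, k))


if __name__ == "__main__":
    # Dummy scorer for illustration only: reports each fold's training fraction.
    print(cross_validated_score(10, 5, lambda train, val: len(train) / 10))
```

The folds here are contiguous and unshuffled for clarity; libraries such as scikit-learn provide the same split as `KFold`. For time-ordered data, the slide's backtesting option, which always validates on data later than the training data, is the closer fit.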
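The "No Grid Search" table grows tenfold with each added hyperparameter, which matches a grid of 10 candidate values per hyperparameter (that per-axis count is inferred from the table, not stated on the slide). A short sketch that enumerates such a grid and reproduces the counts; the hyperparameter names `h0`, `h1`, ... and their values are hypothetical.

```python
import itertools
from typing import Dict, Iterator, List


def grid(values_per_param: Dict[str, List[float]]) -> Iterator[Dict[str, float]]:
    """Yield every combination in a full grid, one dict per model evaluation."""
    names = list(values_per_param)
    for combo in itertools.product(*(values_per_param[name] for name in names)):
        yield dict(zip(names, combo))


def grid_size(num_hyperparameters: int, values_per_param: int = 10) -> int:
    """Model evaluations a full grid needs: values_per_param ** num_hyperparameters."""
    return values_per_param ** num_hyperparameters


if __name__ == "__main__":
    for k in range(2, 6):
        # Hypothetical hyperparameters h0..h{k-1}, each with 10 candidate values.
        space = {f"h{i}": [j / 10 for j in range(10)] for i in range(k)}
        enumerated = sum(1 for _ in grid(space))
        assert enumerated == grid_size(k)
        print(f"{k} hyperparameters -> {enumerated:,} evaluations")
```

Running it prints 100, 1,000, 10,000, and 100,000 evaluations for 2 through 5 hyperparameters, matching the table: every extra hyperparameter multiplies the cost of the search.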