
Scott Clark, CEO, SigOpt, at MLconf Seattle 2017


Scott is co-founder and CEO of SigOpt, a YC and a16z backed “Optimization as a Service” startup in San Francisco. Scott has been applying optimal learning techniques in industry and academia for years, from bioinformatics to production advertising systems. Before SigOpt, Scott worked on the Ad Targeting team at Yelp leading the charge on academic research and outreach with projects like the Yelp Dataset Challenge and open sourcing MOE. Scott holds a PhD in Applied Mathematics and an MS in Computer Science from Cornell University and BS degrees in Mathematics, Physics, and Computational Physics from Oregon State University. Scott was chosen as one of Forbes’ 30 under 30 in 2016.

Abstract summary

Bayesian Global Optimization: Using Optimal Learning to Tune Deep Learning Models
In this talk we introduce Bayesian Optimization as an efficient way to optimize machine learning model parameters, especially when evaluating different configurations is time-consuming or expensive. Deep learning pipelines are notoriously expensive to train and often have many tunable parameters, including hyperparameters, the architecture, and feature transformations, that can have a large impact on the efficacy of the model.

We will motivate the problem by giving several example applications using open source deep learning frameworks and open datasets. We’ll compare the results of Bayesian Optimization to standard techniques like grid search, random search, and expert tuning.



  1. BAYESIAN GLOBAL OPTIMIZATION Using Optimal Learning to Tune Deep Learning / AI Models Scott Clark scott@sigopt.com
  2. OUTLINE 1. Why is Tuning Models Hard? 2. Comparison of Tuning Methods 3. Bayesian Global Optimization 4. Deep Learning Examples
  3. Deep Learning / AI is extremely powerful. Tuning these systems is extremely non-intuitive.
  4. https://www.quora.com/What-is-the-most-important-unresolved-problem-in-machine-learning-3 What is the most important unresolved problem in machine learning? “...we still don't really know why some configurations of deep neural networks work in some case and not others, let alone having a more or less automatic approach to determining the architectures and the hyperparameters.” Xavier Amatriain, VP Engineering at Quora (former Director of Research at Netflix)
  5. Photo: Joe Ross
  6. TUNABLE PARAMETERS IN DEEP LEARNING
  7. TUNABLE PARAMETERS IN DEEP LEARNING
  8. Photo: Tammy Strobel
  9. STANDARD METHODS FOR HYPERPARAMETER SEARCH
  10. STANDARD TUNING METHODS (diagram): parameter configurations (weights, thresholds, window sizes, transformations) are proposed by grid search, random search, or manual search, fed into the ML / AI model, and scored with cross validation on training and testing data.
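As a concrete point of reference for the grid search and random search boxes in this diagram, here is a minimal scikit-learn sketch; the estimator, dataset, and parameter ranges are illustrative assumptions, not the configuration from the talk.

```python
# Minimal sketch of the "standard" tuning methods from this slide using
# scikit-learn; the estimator, dataset, and ranges are illustrative only.
from scipy.stats import loguniform
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Grid search: exhaustively evaluates every combination with cross validation.
grid = GridSearchCV(
    SVC(),
    param_grid={"C": [0.1, 1, 10], "gamma": [1e-4, 1e-3, 1e-2]},
    cv=5,
)
grid.fit(X, y)

# Random search: samples a fixed budget of configurations from distributions.
rand = RandomizedSearchCV(
    SVC(),
    param_distributions={"C": loguniform(1e-2, 1e2), "gamma": loguniform(1e-5, 1e-1)},
    n_iter=20,
    cv=5,
    random_state=0,
)
rand.fit(X, y)

print(grid.best_params_, grid.best_score_)
print(rand.best_params_, rand.best_score_)
```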
  11. OPTIMIZATION FEEDBACK LOOP (diagram): the ML / AI model is trained on training data and scored by cross validation on testing data; the resulting objective metric is reported to the REST API, which returns new configurations, yielding better results.
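A hedged sketch of this feedback loop against SigOpt's documented Python client of the time: create an experiment, then repeatedly pull a suggested configuration over the REST API, evaluate it, and report the observation. The token, the parameter definitions, and `train_and_evaluate` are placeholders for your own setup.

```python
# Sketch of the suggest / evaluate / observe loop over the SigOpt REST API.
from sigopt import Connection

conn = Connection(client_token="YOUR_TOKEN")  # placeholder token

experiment = conn.experiments().create(
    name="example tuning loop",
    parameters=[
        dict(name="learning_rate", type="double", bounds=dict(min=1e-5, max=1e-1)),
        dict(name="num_hidden", type="int", bounds=dict(min=16, max=256)),
    ],
)

for _ in range(30):
    # New configuration from the service.
    suggestion = conn.experiments(experiment.id).suggestions().create()
    # Your training + cross-validation code goes here (placeholder).
    value = train_and_evaluate(suggestion.assignments)
    # Report the objective metric back to close the loop.
    conn.experiments(experiment.id).observations().create(
        suggestion=suggestion.id, value=value
    )
```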
  12. BAYESIAN GLOBAL OPTIMIZATION
  13. OPTIMAL LEARNING: “...the challenge of how to collect information as efficiently as possible, primarily for settings where collecting information is time consuming and expensive.” (Prof. Warren Powell, Princeton) “What is the most efficient way to collect information?” (Prof. Peter Frazier, Cornell) “How do we make the most money, as fast as possible?” (Scott Clark, CEO, SigOpt)
  14. BAYESIAN GLOBAL OPTIMIZATION ● Optimize objective function ○ Loss, Accuracy, Likelihood ● Given parameters ○ Hyperparameters, feature/architecture params ● Find the best hyperparameters ○ Sample function as few times as possible ○ Training on big data is expensive
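Written out, the problem on this slide is black-box maximization of an expensive objective; the notation below is a standard formulation, not taken from the slides.

```latex
% Find the configuration x (hyperparameters, feature / architecture parameters)
% in the domain X maximizing an expensive objective f (accuracy, likelihood, or
% the negative of a loss), using as few evaluations of f as possible.
\[
  x^{*} \;=\; \operatorname*{arg\,max}_{x \in \mathcal{X}} \; f(x),
  \qquad \text{where each evaluation of } f \text{ costs a full train/validate cycle.}
\]
```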
  15. HOW DOES IT WORK? SMBO: Sequential Model-Based Optimization
  16. GP/EI SMBO 1. Build Gaussian Process (GP) with points sampled so far 2. Optimize the fit of the GP (covariance hyperparameters) 3. Find the point(s) of highest Expected Improvement within parameter domain 4. Return optimal next best point(s) to sample
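A minimal, illustrative Python sketch of one iteration of this GP/EI loop on a toy 1-D objective, using scikit-learn's Gaussian process; this is not SigOpt's implementation, and the kernel, candidate grid, and objective are assumptions made for the example.

```python
# One GP/EI (SMBO) iteration on a toy 1-D problem, for illustration only.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):                      # stand-in for an expensive training run
    return -np.sin(3 * x) - x ** 2 + 0.7 * x

X = np.array([[-0.9], [0.1], [1.1]])   # points sampled so far
y = objective(X).ravel()

# Steps 1-2: build the GP and fit its covariance hyperparameters to the data.
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp.fit(X, y)

# Step 3: find the candidate with the highest Expected Improvement.
candidates = np.linspace(-2, 2, 500).reshape(-1, 1)
mu, sigma = gp.predict(candidates, return_std=True)
best_so_far = y.max()
z = (mu - best_so_far) / np.maximum(sigma, 1e-9)
ei = (mu - best_so_far) * norm.cdf(z) + sigma * norm.pdf(z)

# Step 4: return the next point to sample (then evaluate it and repeat).
x_next = candidates[np.argmax(ei)]
print("next point to evaluate:", x_next)
```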
  17. GAUSSIAN PROCESSES
  18. GAUSSIAN PROCESSES
  19. GAUSSIAN PROCESSES (figure: overfit / good fit / underfit covariance hyperparameters)
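For reference, the standard Gaussian process regression posterior behind these fits (step 1 of the loop above), with kernel k, noise variance \sigma_n^2, and observations (X, y):

```latex
% GP posterior mean and variance at a new point x; the kernel hyperparameters
% are the ones fit in step 2 of the SMBO loop.
\begin{align*}
  \mu(x)        &= k(x, X)\,\bigl[K(X, X) + \sigma_n^{2} I\bigr]^{-1} y, \\
  \sigma^{2}(x) &= k(x, x) - k(x, X)\,\bigl[K(X, X) + \sigma_n^{2} I\bigr]^{-1} k(X, x).
\end{align*}
```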
  20. EXPECTED IMPROVEMENT
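The usual closed form of Expected Improvement under that GP posterior, for maximization, where f* is the best value observed so far, z = (\mu(x) - f^{*}) / \sigma(x), and \Phi, \varphi are the standard normal CDF and PDF:

```latex
\[
  \mathrm{EI}(x)
    = \mathbb{E}\bigl[\max\bigl(f(x) - f^{*},\, 0\bigr)\bigr]
    = \bigl(\mu(x) - f^{*}\bigr)\,\Phi(z) + \sigma(x)\,\varphi(z).
\]
```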
  21. DEEP LEARNING EXAMPLES
  22. SIGOPT + MXNET ● Classify movie reviews using a CNN in MXNet
  23. TEXT CLASSIFICATION PIPELINE (diagram): the ML / AI model (MXNet) is trained on training text and scored by validation accuracy on testing text; the metric is reported to the REST API, which returns new hyperparameter configurations and feature transformations, yielding better results.
  24. TUNABLE PARAMETERS IN DEEP LEARNING
  25. STOCHASTIC GRADIENT DESCENT ● Comparison of several RMSProp SGD parametrizations
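For context, the standard RMSProp update; its step size \eta, decay \gamma, and \epsilon are exactly the kind of optimizer parameters being compared on this slide (the specific parametrizations plotted are in the slide's figure and are not reproduced here).

```latex
% RMSProp: an exponentially weighted average of squared gradients scales each step.
\begin{align*}
  v_t          &= \gamma\, v_{t-1} + (1 - \gamma)\, g_t^{2}, \\
  \theta_{t+1} &= \theta_t - \frac{\eta}{\sqrt{v_t} + \epsilon}\, g_t.
\end{align*}
```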
  26. ARCHITECTURE PARAMETERS
  27. TUNING METHODS (figure: Grid Search, Random Search, ?)
  28. SPEED UP #2: RANDOM/GRID -> SIGOPT
  29. CONSISTENTLY BETTER AND FASTER
  30. SIGOPT + TENSORFLOW ● Classify house numbers in an image dataset (SVHN)
  31. COMPUTER VISION PIPELINE (diagram): the ML / AI model (TensorFlow) is trained on training images and scored by cross-validation accuracy on testing images; the metric is reported to the REST API, which returns new hyperparameter configurations and feature transformations, yielding better results.
  32. METRIC OPTIMIZATION
  33. SIGOPT + NEON http://arxiv.org/pdf/1412.6806.pdf ● All convolutional neural network ● Multiple convolutional and dropout layers ● Hyperparameter optimization mixture of domain expertise and grid search (brute force)
  34. COMPARATIVE PERFORMANCE ● Expert baseline: 0.8995 ○ (using neon) ● SigOpt best: 0.9011 ○ 1.6% reduction in error rate ○ No expert time wasted in tuning
  35. SIGOPT + NEON http://arxiv.org/pdf/1512.03385v1.pdf ● Explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions ● Variable depth ● Hyperparameter optimization mixture of domain expertise and grid search (brute force)
  36. COMPARATIVE PERFORMANCE Standard Method ● Expert baseline: 0.9339 ○ (from paper) ● SigOpt best: 0.9436 ○ 15% relative error rate reduction ○ No expert time wasted in tuning
  37. SIGOPT SERVICE
  38. OPTIMIZATION FEEDBACK LOOP (diagram): the ML / AI model is trained on training data and scored by cross validation on testing data; the objective metric is reported to the REST API, which returns new configurations, yielding better results.
  39. SIMPLIFIED OPTIMIZATION Client Libraries ● Python ● Java ● R ● Matlab ● And more... Framework Integrations ● TensorFlow ● scikit-learn ● xgboost ● Keras ● Neon ● And more... Live Demo
  40. DISTRIBUTED TRAINING ● SigOpt serves as a distributed scheduler for training models across workers ● Workers access the SigOpt API for the latest parameters to try for each model ● Enables easy distributed training of non-distributed algorithms across any number of models
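A hedged sketch of the worker pattern described here, again based on SigOpt's documented Python client; each worker polls the API for a suggestion, trains one model, and reports the result. `train_and_evaluate` and the experiment id are placeholders.

```python
# Each worker process/machine runs this loop independently; the service
# coordinates which parameters each worker receives.
from sigopt import Connection

def worker(experiment_id, budget):
    conn = Connection(client_token="YOUR_TOKEN")  # placeholder token
    for _ in range(budget):
        suggestion = conn.experiments(experiment_id).suggestions().create()
        value = train_and_evaluate(suggestion.assignments)  # your training code
        conn.experiments(experiment_id).observations().create(
            suggestion=suggestion.id, value=value
        )
```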
  41. https://sigopt.com/getstarted Try it yourself!
  42. Questions? contact@sigopt.com https://sigopt.com @SigOpt
