
Tuning 2.0: Advanced Optimization Techniques Webinar


This webinar, hosted by SigOpt co-founder and CEO Scott Clark, explains how advanced features can help you achieve your modeling goals. These features include metric definition and multimetric optimization, conditional parameters, and multitask optimization for long training cycles.

Published in: Data & Analytics


  1. SigOpt. Confidential. Tuning 2.0: Advanced Optimization Techniques. Scott Clark, PhD, Founder and CEO. Tuesday, September 10, 2019
  2. Accelerate and amplify the impact of modelers everywhere
  3. Enterprise AI. Value per model by modeler segment: Analytics 2.0 models (1x), tailored models (100x), differentiated models (10,000x). Goals: differentiate products, generate revenue. Requirements: modelers with expertise, best-in-class solutions.
  4. How SigOpt works: behind your firewall, training and testing data feed your model (AI, ML, DL, or simulation); model evaluation or backtest reports an objective metric over the REST API, and SigOpt returns new configurations (parameters or hyperparameters) for better results. Components: Optimization Engine (explore and exploit with a variety of techniques), Experiment Insights (track, organize, analyze and reproduce any model), Enterprise Platform (built to fit any stack and scale with your needs). Your data and models stay private; optimization is iterative and automated; integrates with any modeling stack.
  5. Current SigOpt algorithmic trading customers represent $300B+ in assets under management. Current SigOpt enterprise customers across six industries represent $500B+ in market capitalization.
  6. Modeling lifecycle: Data Engineering, Feature Engineering, Metric Definition, Model Search, Model Training, Model Tuning, Model Evaluation, Model Deployment
  7. Modeling lifecycle, highlighting: Hyperparameter Optimization (including long training cycles)
  8. Modeling lifecycle, highlighting: Early Stopping and Convergence Monitoring
  9. Modeling lifecycle, highlighting: HPO with Conditional Parameters
  10. Modeling lifecycle, highlighting: Tuning Transformations
  11. Modeling lifecycle, highlighting: Multimetric HPO
  12. Modeling lifecycle, highlighting: Re-tuning
  13. Opportunity: iteratively tune at all stages of the modeling lifecycle, with Tuning Transformations, Balancing Metrics, Tuning Architecture, Early Stopping, HPO (long training cycles), and Re-tuning
  14. Benefits. Learn fast, fail fast: give yourself the best chance at finding good use cases while avoiding false negatives. Connect outputs to outcomes: define, select and iterate on your metrics with end-to-end evaluation. Find the global maximum: early non-optimized decisions in the process limit your ability to maximize performance. Boost productivity: automate modeling tasks so modelers spend more time applying their expertise.
  15. Focus for today: Metric Definition, Model Search, and Long Training Cycles within the modeling lifecycle
  16. Techniques: (1) metric definition with multimetric optimization; (2) model search with conditional parameters; (3) long training cycles with multitask optimization
  17. Technique 1: Metric definition with multimetric optimization
  18. How it works: multimetric optimization (with thresholds). Define two metrics instead of one; optimize against both metrics automatically and simultaneously; set thresholds on each individual metric to reflect business or modeling needs; compare a Pareto frontier of best model configurations that balance these two metrics. See the relevant docs and blog post.
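As a concrete illustration of the setup this slide describes, a two-metric experiment with thresholds can be written as an experiment definition in the style of SigOpt's documented create-experiment payload. This is a minimal sketch: the metric names, thresholds, bounds, and budget below are illustrative placeholders, not the webinar's actual configuration.

```python
# Sketch of a multimetric experiment definition with thresholds.
# Field names follow SigOpt's public experiment payload; all values
# here are illustrative.
multimetric_config = {
    "name": "cnn-accuracy-vs-inference-time",
    "parameters": [
        {"name": "log_learning_rate", "type": "double",
         "bounds": {"min": 1e-4, "max": 1.0}},
        {"name": "num_filters", "type": "int",
         "bounds": {"min": 16, "max": 256}},
    ],
    # Two metrics, optimized simultaneously; each threshold encodes a
    # business or modeling constraint that metric must satisfy.
    "metrics": [
        {"name": "accuracy", "objective": "maximize", "threshold": 0.90},
        {"name": "inference_time_ms", "objective": "minimize", "threshold": 50.0},
    ],
    "observation_budget": 100,
}

# With the official Python client this would be submitted roughly as:
#   conn = sigopt.Connection(client_token="...")
#   experiment = conn.experiments().create(**multimetric_config)
```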
  19. Potential applications of multimetric optimization: balance competing objectives, define and select metrics, connect metrics to outcomes. See: https://sigopt.com/blog/intro-to-multicriteria-optimization/, https://sigopt.com/blog/multimetric-updates-in-the-experiment-insights-dashboard/, https://sigopt.com/blog/metric-thresholds-a-new-feature-to-supercharge-multimetric-optimization/
  20. Use case: balancing speed and accuracy in deep learning. Multimetric use case 1: time series, sequence classification, CNN, diatom images; accuracy-time tradeoff; result: similar accuracy at 33% of the inference time. Multimetric use case 2: NLP, sentiment analysis, CNN, Rotten Tomatoes; accuracy-time tradeoff; result: a ~2% accuracy tradeoff for 50% of the training time. Learn more: https://devblogs.nvidia.com/sigopt-deep-learning-hyperparameter-optimization/
  21. Experiment design for sequence classification. Data: diatom images (source: UCR Time Series Classification). Model: convolutional neural network (source: Wang et al., paper and code), TensorFlow via Keras. Metrics: inference time, accuracy. HPO methods (implemented via SigOpt): random search, Bayesian optimization. Note: experiment code available here.
  22. Process: tune a variety of parameters (network architecture, stochastic gradient descent) while maximizing both metrics
  23. Result: Bayesian optimization outperforms random search. Both methods were executed via the SigOpt API. Bayesian optimization required 90% fewer training runs than random search, and found 85.7% of the combined Pareto frontier of optimal model configurations, almost 6x as many choices as random search.
  24. Result: minimal accuracy loss for a 66% inference-time gain. (Chart panels: maximize accuracy; minimize inference time; balance both.)
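The accuracy-versus-inference-time tradeoff behind this result can be made concrete with a small Pareto-frontier computation: a configuration is on the frontier if no other configuration is at least as accurate and at least as fast, and strictly better on one of the two. This is an illustrative sketch with made-up observation values, not the experiment's actual data.

```python
def pareto_frontier(points):
    """Return the points not dominated under
    (maximize accuracy, minimize inference time).

    A point (a, t) is dominated if some other point has accuracy >= a
    and time <= t, with at least one inequality strict.
    """
    frontier = []
    for a, t in points:
        dominated = any(
            (a2 >= a and t2 <= t) and (a2 > a or t2 < t)
            for a2, t2 in points
        )
        if not dominated:
            frontier.append((a, t))
    return frontier

# Illustrative observations: (accuracy, inference time in ms).
observations = [(0.95, 30.0), (0.94, 10.0), (0.90, 9.0), (0.85, 25.0)]
# (0.85, 25.0) is dominated by (0.94, 10.0); the other three are
# Pareto-optimal choices a modeler can pick between.
best = pareto_frontier(observations)
```

The O(n^2) scan is plenty for the few hundred observations a tuning run produces; sorting by one metric first would make it O(n log n) if needed.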
  25. Technique 2: Model search with conditional parameters
  26. How it works: conditional parameters. Take into account the conditionality of certain parameter types in the optimization process: establish conditionality between various parameters; use this conditionality to improve the Bayesian optimization process; boost results from the hyperparameter optimization process. Examples: architecture parameters for deep learning models, parameter types for SGD variants. See the relevant docs.
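The SGD-variants example this slide mentions can be sketched as a conditional search space in the style of SigOpt's experiment definition: a conditional names the optimizer variant, and each variant-specific parameter declares which values of that conditional activate it. The optimizer choices, bounds, and budget below are illustrative assumptions, not the webinar's configuration.

```python
# Sketch of a conditional-parameter search space. Field names follow
# SigOpt's public payload ("conditionals" on the experiment,
# "conditions" on each dependent parameter); values are illustrative.
conditional_config = {
    "name": "sgd-variants-with-conditionals",
    # The conditional: which optimizer variant is in play.
    "conditionals": [
        {"name": "optimizer", "values": ["sgd", "rmsprop", "adam"]},
    ],
    "parameters": [
        # Shared parameter: active for every optimizer.
        {"name": "log_learning_rate", "type": "double",
         "bounds": {"min": 1e-5, "max": 1.0}},
        # Momentum is only meaningful for plain SGD.
        {"name": "momentum", "type": "double",
         "bounds": {"min": 0.0, "max": 0.99},
         "conditions": {"optimizer": ["sgd"]}},
        # Adam-only first-moment decay rate.
        {"name": "beta1", "type": "double",
         "bounds": {"min": 0.8, "max": 0.999},
         "conditions": {"optimizer": ["adam"]}},
    ],
    "observation_budget": 150,
}
```

Declaring the dependency this way lets the optimizer avoid wasting observations on, say, momentum values when Adam is selected, which is the efficiency gain the slide describes.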
  27. Use case: effective and efficient NLP optimization. Category: NLP; task: question answering; model: MemN2N; data: bAbI; analysis: performance benchmark; result: 4.84% gain at 30% of the cost. Learn more: https://devblogs.nvidia.com/optimizing-end-to-end-memory-networks-using-sigopt-gpus/
  28. Design: question answering data and memory networks. Sources: Facebook AI Research (FAIR) bAbI dataset: https://research.fb.com/downloads/babi/; Sukhbaatar et al.: https://arxiv.org/abs/1503.08895
  29. Hyperparameter optimization experiment setup: comparison of Bayesian optimization and random search, with standard and conditional parameters
  30. Result: significant boost in consistency and accuracy (random search versus Bayesian optimization with conditionals)
  31. Result: highly cost-efficient accuracy gains (random search versus Bayesian optimization with conditionals); SigOpt is 18.5x as efficient
  32. Technique 3: Long training cycles with multitask optimization in parallel
  33. How it works: multitask optimization. Introduce a variety of cheap (partial) and expensive (full) tasks in a hyperparameter optimization experiment; use cheaper tasks earlier in the tuning process (explore) to inform more expensive tasks later (exploit); in the process, reduce the full time required to tune an expensive model. See the relevant docs. Sources: Matthias Poloczek, Jialei Wang, Peter I. Frazier: https://arxiv.org/abs/1603.00389; Aaron Klein, Frank Hutter, et al.: https://arxiv.org/abs/1605.07079
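The cheap-versus-expensive task idea can be sketched as an experiment definition with a `tasks` list, each task carrying a relative cost, plus a stub showing one common way a cheap task is realized: training for proportionally fewer epochs. The task names, costs, budget, and epoch count below are illustrative assumptions in the style of SigOpt's multitask payload.

```python
# Sketch of a multitask experiment definition. The "tasks" field
# mirrors SigOpt's public payload; costs are relative to the full
# task, and all values here are illustrative.
multitask_config = {
    "name": "resnet18-multitask",
    "parameters": [
        {"name": "log_learning_rate", "type": "double",
         "bounds": {"min": 1.2e-4, "max": 1.0}},
    ],
    "metrics": [{"name": "accuracy", "objective": "maximize"}],
    # The optimizer spends cheap observations early (explore) and
    # expensive ones late (exploit).
    "tasks": [
        {"name": "tenth", "cost": 0.1},
        {"name": "half", "cost": 0.5},
        {"name": "full", "cost": 1.0},
    ],
    "observation_budget": 220,
}

FULL_EPOCHS = 60  # illustrative full training length

def epochs_for_task(task_cost, full_epochs=FULL_EPOCHS):
    # One simple realization of a cheap task: a proportionally
    # shorter training run (never fewer than one epoch).
    return max(1, int(round(full_epochs * task_cost)))
```

Each suggestion from the optimizer then arrives tagged with a task, and the training loop picks its epoch budget from that task's cost.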
  34. How it works: combine multitask with parallelization. Behind your firewall, multiple workers evaluate new configurations in parallel and report the objective metric over the REST API for better results; the Optimization Engine (explore and exploit with a variety of techniques), Experiment Insights (track, organize, analyze and reproduce any model), and Enterprise Platform (built to fit any stack and scale with your needs) operate as before.
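The worker pattern on this slide can be sketched as a toy suggest/evaluate/report loop. Everything here is a stand-in: the suggestions come from a local queue and the objective is a fake score, whereas a real deployment would request suggestions from and report observations to the optimizer's REST API.

```python
import threading
import queue
from concurrent.futures import ThreadPoolExecutor

# Toy pool of "suggested" configurations; a real worker would instead
# call the optimizer's suggestion endpoint.
suggestions = queue.Queue()
for lr in [0.001, 0.01, 0.1, 0.3]:
    suggestions.put({"learning_rate": lr})

results = []
results_lock = threading.Lock()

def evaluate(config):
    # Stand-in objective: pretend learning rates near 0.05 do best.
    return 1.0 - abs(config["learning_rate"] - 0.05)

def worker():
    # Each worker drains configurations until none remain,
    # evaluating and reporting each observation.
    while True:
        try:
            config = suggestions.get_nowait()
        except queue.Empty:
            return
        metric = evaluate(config)
        with results_lock:  # report the observation
            results.append((config, metric))

# Run four workers in parallel, mirroring the slide's worker boxes.
with ThreadPoolExecutor(max_workers=4) as pool:
    for _ in range(4):
        pool.submit(worker)

best_config, best_metric = max(results, key=lambda r: r[1])
```

Threads suffice here because the toy objective is instant; for real GPU training runs, each worker would be a separate machine or process, as in the 20-instance setup shown later in the deck.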
  35. Use case: image classification on a budget. Category: computer vision; task: image classification; model: CNN; data: Stanford Cars; analysis: architecture comparison; result: 2.4% accuracy gain for a much cheaper model. Learn more: https://mlconf.com/blog/insights-for-building-high-performing-image-classification-models/
  36. Data: cars image classification. Stanford Cars dataset: 16,185 images, 196 classes; labels include car, make, and year. Source: https://ai.stanford.edu/~jkrause/cars/car_dataset.html
  37. Architecture: classifying images of cars using ResNet (convolutions followed by classification; input image to output label, e.g. Acura TLX 2015). Source: Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun: https://arxiv.org/abs/1512.03385
  38. Experiment design scenarios (baseline versus SigOpt multitask). ResNet 50: scenario 1a, pre-train on ImageNet and tune the fully connected layer; scenario 1b, optimize hyperparameters to tune the fully connected layer. ResNet 18: scenario 2a, fine-tune the full network; scenario 2b, optimize hyperparameters to fine-tune the full network.
  39. Training setup comparison. Feature extractor: ImageNet-pretrained convolutional layers provide convolutional features, and only the fully connected classification layer is tuned. Fine-tuning: the full network, convolutional layers included, is tuned.
  40. Hyperparameter setup:
      Hyperparameter              Lower Bound   Upper Bound
      Log Learning Rate           1.2e-4        1.0
      Learning Rate Scheduler     0             0.99
      Batch Size (powers of 2)    16            256
      Nesterov                    False         True
      Log Weight Decay            1.2e-5        1.0
      Momentum                    0             0.9
      Scheduler Step              1             20
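The table above can be expressed as a parameter-space definition in the style of SigOpt's experiment payload. The bounds come straight from the slide; the encodings (an explicit grid for the power-of-2 batch size, a categorical for the Nesterov flag) are one plausible way to write them, not necessarily the experiment's exact configuration.

```python
# The slide's hyperparameter table as a parameter-space definition.
# Bounds are taken from the slide; encoding choices are assumptions.
resnet_params = [
    {"name": "log_learning_rate", "type": "double",
     "bounds": {"min": 1.2e-4, "max": 1.0}},
    {"name": "lr_scheduler", "type": "double",
     "bounds": {"min": 0.0, "max": 0.99}},
    # Powers of 2 from 16 to 256, encoded as an explicit grid.
    {"name": "batch_size", "type": "int",
     "grid": [16, 32, 64, 128, 256]},
    {"name": "nesterov", "type": "categorical",
     "categorical_values": ["False", "True"]},
    {"name": "log_weight_decay", "type": "double",
     "bounds": {"min": 1.2e-5, "max": 1.0}},
    {"name": "momentum", "type": "double",
     "bounds": {"min": 0.0, "max": 0.9}},
    {"name": "scheduler_step", "type": "int",
     "bounds": {"min": 1, "max": 20}},
]
```

The log-scaled parameters (learning rate, weight decay) span several orders of magnitude, which is why the slide tunes their logarithms rather than the raw values.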
  41. Results: optimizing and tuning the full network outperforms. Fine-tuning the smaller network (+3.92%) significantly outperforms feature extraction on a bigger network (+1.58%); multitask optimization drives significant performance gains.
  42. Insight: multitask efficiency at the hyperparameter level. Example: learning rate accuracy and values by cost of task over time (charts: progression of observations over time; accuracy and value for each observation; parameter importance analysis).
  43. Insight: parallelization further accelerates wall-clock time. 928 total hours to optimize ResNet 18; 220 observations per experiment; 20 AWS EC2 p2.xlarge instances; 45 hours actual wall-clock time.
  44. Implication: fine-tuning significantly outperforms. Cost breakdown for multitask optimization:
      Cost efficiency          Feature Extractor (ResNet 50)   Fine-Tuning (ResNet 18)
      Hours per training       4.08                            4.2
      Observations             220                             220
      Number of runs           1                               1
      Total compute hours      898                             924
      Cost per GPU-hour        $0.90                           $0.90
      % improvement            1.58%                           3.92%
      Total compute cost       $808                            $832
      Cost ($) per % improvement  $509                         $20
      Similar compute cost and wall-clock time; fine-tuning is significantly more efficient and effective.
  45. Implication: multiple benefits from multitask (tuning ResNet-18).
      Cost efficiency          Multitask   Bayesian   Random
      Hours per training       4.2         4.2        4.2
      Observations             220         646        646
      Number of runs           1           1          20
      Total compute hours      924         2,713      54,264
      Cost per GPU-hour        $0.90       $0.90      $0.90
      Total compute cost       $832        $2,442     $48,838
      Time to optimize         Multitask   Bayesian   Random
      Total compute hours      924         2,713      54,264
      # of machines            20          20         20
      Wall-clock time (hrs)    46          136        2,713
      Multitask achieves similar performance at 1.7% the cost of random search, and 58x faster wall-clock time to optimize versus random search.
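The cost and wall-clock figures in this comparison follow from simple arithmetic, reproduced here as a sanity check using the slide's own inputs (4.2 hours per training, $0.90 per GPU-hour, 20 machines): total compute hours are hours-per-training times observations times runs, cost is hours times price, and wall-clock time divides total hours across the machines.

```python
# Reproducing the slide's cost arithmetic from its stated inputs.
HOURS_PER_TRAINING = 4.2
PRICE_PER_GPU_HOUR = 0.90
MACHINES = 20

def compute_hours(observations, runs=1):
    return HOURS_PER_TRAINING * observations * runs

multitask_hours = compute_hours(220)           # 924 hours
random_hours = compute_hours(646, runs=20)     # 54,264 hours

multitask_cost = multitask_hours * PRICE_PER_GPU_HOUR  # ~$832
random_cost = random_hours * PRICE_PER_GPU_HOUR        # ~$48,838

# Multitask costs ~1.7% of random search...
cost_ratio = multitask_cost / random_cost
# ...and, on the same 20 machines, finishes ~58x faster.
speedup = (random_hours / MACHINES) / (multitask_hours / MACHINES)
```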
  46. Techniques: 1. Metric definition: multimetric optimization. Read the blog here. 2. Model search: conditional parameters. Read the blog here. 3. Long training cycles: multitask optimization. Read the blog here.
  47. Try our solution: sign up at sigopt.com/try-it today. The AI Summit San Francisco (https://sanfrancisco.theaisummit.com), September 25-26, 2019, register with code 1SFSPON25. TWIMLcon (https://twimlcon.com/), October 1-2, 2019, register with code SIGOPT20. Download the eBook: https://twimlai.com/announcing-our-ai-platforms-series-and-ebooks/
