
Nov. 6, 2019

Building on the TWIML eBook, TWIMLcon event and TWIML podcast series that explore machine learning platforms in great detail, this webinar examines the machine learning platforms that power enterprise leaders in AI. SigOpt CEO Scott Clark provides an overview of critical technical capabilities that SigOpt's customers have prioritized in their ML platforms. Review these slides to learn about:

- Critical capabilities for data, experiment and model management
- Tradeoffs between building and buying these capabilities
- Lessons from the implementation of these platforms by AI leaders

Why focus on these platforms and the capabilities that power them? Nearly every company is investing in machine learning that differentiates products or generates revenue. These so-called "differentiated models" represent the biggest opportunity for AI to transform the business. Most of these teams find success hiring expert data scientists and machine learning engineers who can build these models. But most of these teams also struggle to create a sustainable, scalable and reproducible process for model development, and have begun building ML platforms to tackle this challenge.


- SigOpt. Conﬁdential. Advanced Optimization for the Enterprise Considerations and use cases Scott Clark — Co-Founder and CEO, SigOpt Tuesday, November 5, 2019
- SigOpt. Conﬁdential. 2 Accelerate and amplify the impact of modelers everywhere
- SigOpt. Confidential. 3 Abstract: SigOpt provides an extensive set of advanced features which help you, the expert, save time while increasing performance. Today, we will share some of the intuition behind these features while combining and applying them to tackle real-world problems.
- SigOpt. Confidential. 4 How experimentation impacts your modeling (pipeline diagram): Data Preparation (Transformation; Labeling; Pre-Processing; Pipeline Dev.; Feature Eng.; Feature Stores); Experimentation, Training, Evaluation (Notebook & Model Framework; Experimentation & Model Optimization; Insights, Tracking, Collaboration; Model Search, Hyperparameter Tuning; Resource Scheduler, Management; ...and more); Model Productionalization (Validation; Serving; Deploying; Monitoring; Managing; Inference; Online Testing); Hardware Environment (On-Premise; Hybrid; Multi-Cloud)
- SigOpt. Conﬁdential. 5 Motivation 1. How to solve a black box optimization problem 2. Why you should optimize using multiple competing metrics 3. How to continuously and eﬃciently employ your project’s dedicated compute infrastructure 4. How to tune models with expensive training costs
- SigOpt. Confidential. 6 SigOpt Features: Optimization Engine (multimetric optimization; continuous, categorical, or integer parameters; constraints and failure regions; up to 10k observations, 100 parameters; multitask optimization and high parallelism; conditional parameters), Experiment Insights (intuitive web dashboards; cross-team permissions and collaboration; advanced experiment visualizations; usage insights; parameter importance analysis), Enterprise Platform (infrastructure-agnostic REST API; parallel resource scheduler; black-box interface that tunes without accessing any data; libraries for Python, Java, R, and MATLAB), and Reproducibility
- SigOpt. Confidential. 1. How to solve a black box optimization problem
- SigOpt. Confidential. Why black box optimization? SigOpt was designed to empower you, the practitioner, to re-define most machine learning problems as black box optimization problems, with added side benefits: • Amplified performance — incremental gains in accuracy or other success metrics • Productivity gains — a consistent platform across tasks that facilitates sharing • Accelerated modeling — early elimination of non-scalable tasks • Compute efficiency — continuous, full utilization of infrastructure. SigOpt uses an ensemble of Bayesian and global optimization methods to solve these black box optimization problems. 8 Black Box Optimization
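The black-box framing above can be sketched as an ask/tell loop: the optimizer proposes a configuration, the model is trained and evaluated behind the firewall, and only the resulting metric is reported back. A minimal pure-Python sketch, with random search standing in for SigOpt's Bayesian ensemble (the toy metric and all names are illustrative):

```python
import random

def suggest(bounds, rng):
    """Propose a configuration; random search stands in for a Bayesian engine."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}

def optimize(objective, bounds, budget, seed=0):
    """Ask/tell loop: only (configuration, metric) pairs cross the black-box boundary."""
    rng = random.Random(seed)
    best_config, best_value = None, float("-inf")
    for _ in range(budget):
        config = suggest(bounds, rng)   # "ask": a new configuration to try
        value = objective(**config)     # train and evaluate behind the firewall
        if value > best_value:          # "tell": report the objective metric back
            best_config, best_value = config, value
    return best_config, best_value

# Illustrative metric with its peak at log_lr = -1.0, momentum = 0.9.
def toy_metric(log_lr, momentum):
    return -(log_lr + 1.0) ** 2 - (momentum - 0.9) ** 2

bounds = {"log_lr": (-5.0, 0.0), "momentum": (0.0, 1.0)}
config, value = optimize(toy_metric, bounds, budget=200)
```

Because the objective is only ever called, never inspected, the same harness wraps any model, simulation, or backtest.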
- SigOpt. Conﬁdential. Your ﬁrewall Training Data AI, ML, DL, Simulation Model Model Evaluation or Backtest Testing Data New Conﬁgurations Objective Metric Better Results EXPERIMENT INSIGHTS Track, organize, analyze and reproduce any model ENTERPRISE PLATFORM Built to ﬁt any stack and scale with your needs OPTIMIZATION ENGINE Explore and exploit with a variety of techniques RESTAPI Conﬁguration Parameters or Hyperparameters Black Box Optimization
- SigOpt. Conﬁdential. EXPERIMENT INSIGHTS Track, organize, analyze and reproduce any model ENTERPRISE PLATFORM Built to ﬁt any stack and scale with your needs OPTIMIZATION ENGINE Explore and exploit with a variety of techniques RESTAPI Black Box Optimization Better Results
- SigOpt. Confidential. 11 A graphical depiction of the iterative process: build a statistical model, then choose the next point to maximize the acquisition function, and repeat. Black Box Optimization
- SigOpt. Confidential. Gaussian processes: a powerful tool for modeling in spatial statistics. A standard tool for building statistical models is the Gaussian process [Fasshauer et al, 2015; Frazier, 2018]. • Assume that function values are jointly normally distributed. • Apply prior beliefs about mean behavior and covariance between observations. • Posterior beliefs about unobserved locations can then be computed rather easily. Different prior assumptions produce different statistical models. 12 Black Box Optimization
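As an illustration of those posterior computations, here is a dependency-free sketch of a 1D Gaussian process with a squared-exponential kernel and two noise-free observations (the kernel, data, and length scale are illustrative, not SigOpt internals):

```python
import math

def k(x1, x2, length=1.0):
    """Squared-exponential covariance: nearby inputs have correlated outputs."""
    return math.exp(-0.5 * ((x1 - x2) / length) ** 2)

def gp_posterior(x_train, y_train, x_star):
    """Posterior mean/variance at x_star given two noise-free observations.

    Jointly normal prior => mean = k_* K^-1 y, variance = k(x*,x*) - k_* K^-1 k_*^T.
    The 2x2 inverse is written out explicitly to keep the sketch dependency-free.
    """
    (x1, x2), (y1, y2) = x_train, y_train
    a, b = k(x1, x1), k(x1, x2)
    c, d = k(x2, x1), k(x2, x2)
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    ks = [k(x_star, x1), k(x_star, x2)]
    # alpha = K^-1 y
    alpha = [inv[0][0] * y1 + inv[0][1] * y2, inv[1][0] * y1 + inv[1][1] * y2]
    mean = ks[0] * alpha[0] + ks[1] * alpha[1]
    # v = K^-1 k_*
    v = [inv[0][0] * ks[0] + inv[0][1] * ks[1], inv[1][0] * ks[0] + inv[1][1] * ks[1]]
    var = k(x_star, x_star) - (ks[0] * v[0] + ks[1] * v[1])
    return mean, var

x_train, y_train = [0.0, 2.0], [1.0, -1.0]
mean, var = gp_posterior(x_train, y_train, 0.0)       # at an observed point
mean_mid, var_mid = gp_posterior(x_train, y_train, 1.0)  # between observations
```

At an observed point the noise-free posterior interpolates exactly (mean equals the observation, variance collapses to zero); between observations the variance is strictly positive, which is what the acquisition function later exploits.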
- SigOpt. Conﬁdential. Acquisition function: given a model, how should we choose the next point? An acquisition function is a strategy for deﬁning the utility of a future sample, given the current samples, while balancing exploration and exploitation [Shahriari et al, 2016]. Diﬀerent acquisition functions choose diﬀerent points (EI, PI, KG, etc.). 13 Black Box Optimization Exploration: Learning about the whole function f Exploitation: Further resolving regions where good f values have already been observed
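One common acquisition function, expected improvement (EI), can be written in closed form for a Gaussian posterior. This sketch (illustrative, not SigOpt's implementation) shows how it rewards both a high posterior mean (exploitation) and high posterior uncertainty (exploration):

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mean, std, best, xi=0.0):
    """EI under a Gaussian posterior at one point, for maximization.

    The first term rewards a mean above the incumbent (exploitation);
    the second rewards posterior uncertainty (exploration).
    """
    if std <= 0.0:
        return max(mean - best - xi, 0.0)
    z = (mean - best - xi) / std
    return (mean - best - xi) * normal_cdf(z) + std * normal_pdf(z)

# Same posterior mean, different uncertainty: the uncertain point is more
# attractive even though both means sit below the incumbent best = 1.0.
ei_uncertain = expected_improvement(mean=0.5, std=1.0, best=1.0)
ei_confident = expected_improvement(mean=0.5, std=0.1, best=1.0)
```

Other acquisition functions (PI, KG, etc.) trade off these two terms differently, which is why they choose different next points.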
- SigOpt Blog Posts: Some relevant blog posts: ● Intuition Behind Bayesian Optimization ● Intuition Behind Covariance Kernels ● Approximation of Data ● Likelihood for Gaussian Processes ● Profile Likelihood vs. Kriging Variance ● Intuition Behind Gaussian Processes ● Dealing with Troublesome Metrics. To find more blog posts, visit https://sigopt.com/blog/
- SigOpt. Confidential. 2. Why you should optimize using multiple competing metrics
- SigOpt. Confidential. Why optimize against multiple competing metrics? SigOpt allows the user to specify multiple competing metrics, for either optimization or tracking, to better align modeling success with business value, with the additional benefits of: • Multiple metrics — the ability to define multiple metrics, which can yield new and interesting results • Insights, metric storage — insights through tracking of both optimized and unoptimized metrics • Thresholds — the ability to define thresholds for success to better guide the optimizer. We believe this process yields models that deliver more reliable business outcomes and are better tied to real-world applications. 16 Multiple Competing Metrics
- SigOpt. Conﬁdential. Your ﬁrewall Training Data AI, ML, DL, Simulation Models Model Evaluation or Backtest Testing Data New Conﬁgurations Objective Metric Better Results EXPERIMENT INSIGHTS Track, organize, analyze and reproduce any model ENTERPRISE PLATFORM Built to ﬁt any stack and scale with your needs OPTIMIZATION ENGINE Explore and exploit with a variety of techniques RESTAPI Conﬁguration Parameters or Hyperparameters Multiple Competing Metrics
- SigOpt. Conﬁdential. Better Results EXPERIMENT INSIGHTS Track, organize, analyze and reproduce any model ENTERPRISE PLATFORM Built to ﬁt any stack and scale with your needs OPTIMIZATION ENGINE Explore and exploit with a variety of techniques RESTAPI Multiple Competing Metrics Multiple Optimized and Unoptimized Metrics
- SigOpt. Conﬁdential. Multiple Competing Metrics Balancing competing metrics to ﬁnd the Pareto frontier Most problems of practical relevance involve 2 or more competing metrics. • Neural networks — Balancing accuracy and inference time • Materials design — Balancing performance and maintenance cost • Control systems — Balancing performance and safety In a situation with Competing Metrics, the set of all eﬃcient points (the Pareto frontier) is the solution. 19 Pareto Frontier Feasible Region
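The efficient-point idea can be made concrete with a small sketch: a point is on the Pareto frontier exactly when no other observation dominates it. The example metrics are illustrative, with inference time negated so both metrics are maximized:

```python
def dominates(p, q):
    """p dominates q if p is at least as good in every metric and strictly better in one."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

def pareto_frontier(points):
    """Keep the efficient points: those no other observation dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# (accuracy, -inference_time): maximize accuracy, minimize inference time.
observations = [(0.90, -5.0), (0.85, -2.0), (0.80, -1.0), (0.84, -3.0), (0.70, -4.0)]
frontier = pareto_frontier(observations)
```

Here (0.84, -3.0) is dominated by (0.85, -2.0), which is better on both metrics, so it drops out; the three surviving points form the frontier the optimizer tries to resolve.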
- SigOpt. Confidential. Balancing competing metrics to find the Pareto frontier. As shown before, the goal in multi-objective (or multi-criteria) optimization is to find the optimal set of solutions across a set of functions [Knowles, 2006]. • This is formulated as finding the maxima of the functions f1 to fn over the same domain x • No single point exists as the solution; instead we actively try to maximize the size of the efficient frontier, which represents the set of solutions • The solution is found through scalarization methods such as convex combination and epsilon-constraint. 20 Multiple Competing Metrics
- SigOpt. Conﬁdential. Multiple Competing Metrics Intuition: Convex Combination Scalarization Idea: If we can convert the multimetric problem into a scalar problem, we can solve this problem using Bayesian optimization. One possible scalarization is through a convex combination of the objectives. 21
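A minimal sketch of convex-combination scalarization (illustrative candidate points, both metrics maximized): each weight turns the multimetric problem into a scalar one, and sweeping the weight recovers different efficient points:

```python
def convex_scalarize(weight):
    """Collapse (f1, f2) into weight*f1 + (1-weight)*f2, a single-metric problem."""
    def scalar(f1, f2):
        return weight * f1 + (1.0 - weight) * f2
    return scalar

# Illustrative (accuracy, speed) candidates; each is efficient in some direction.
candidates = [(1.0, 0.0), (0.8, 0.6), (0.0, 1.0)]

def best_for_weight(w):
    scalar = convex_scalarize(w)
    return max(candidates, key=lambda p: scalar(*p))

acc_pick = best_for_weight(0.95)    # weighting accuracy picks one end
speed_pick = best_for_weight(0.05)  # weighting speed picks the other
```

Each scalarized sub-problem can then be handed to the Bayesian optimizer unchanged, which is the whole point of the reduction.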
- SigOpt. Confidential. Balancing competing metrics to find the Pareto frontier with thresholds. As shown before, the goal in multi-objective (or multi-criteria) optimization is to find the optimal set of solutions across a set of functions [Knowles, 2006]. • This is formulated as finding the maxima of the functions f1 to fn over the same domain x • No single point exists as the solution; instead we actively try to maximize the size of the efficient frontier, which represents the set of solutions • The solution is found through constrained scalarization methods such as convex combination and epsilon-constraint • Users can change constraints as the search progresses [Letham et al, 2019]. 22 Multiple Competing Metrics
- SigOpt. Conﬁdential. Multiple Competing Metrics Constrained Scalarization 1. Model all metrics independently. • Requires no prior beliefs of how metrics interact. • Missing data removed on a per metric basis if unrecorded. 2. Expose the eﬃcient frontier through constrained scalar optimization. • Enforce user constraints when given. • Iterate through sub constraints to better resolve eﬃcient frontier, if desired. • Consider diﬀerent regions of the frontier when parallelism is possible. 3. Allow users to change constraints as the search progresses. • Allow the problems/goals to evolve as the user’s understanding changes. Constraints give customers more control over the circumstances and more ability to understand our actions. 23 Variation on Expected Improvement [Letham et al, 2019]
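The epsilon-constraint idea can be sketched the same way (illustrative numbers): fix a threshold on one metric, maximize the other over the feasible set, and sweep the threshold to walk the frontier, much like the user-defined metric thresholds described above:

```python
def epsilon_constraint(points, f2_threshold):
    """Maximize f1 subject to f2 >= threshold; sweeping the threshold walks the frontier."""
    feasible = [p for p in points if p[1] >= f2_threshold]
    return max(feasible, key=lambda p: p[0]) if feasible else None

# Illustrative (accuracy, throughput) observations from a tuning run.
obs = [(0.92, 100.0), (0.88, 250.0), (0.80, 400.0)]

# Tightening the throughput requirement trades accuracy for speed.
relaxed = epsilon_constraint(obs, f2_threshold=50.0)   # all feasible -> (0.92, 100.0)
strict = epsilon_constraint(obs, f2_threshold=300.0)   # only one feasible -> (0.80, 400.0)
```

Because the constraint is just a filter on observations, the user can tighten or relax it mid-search as their understanding of the problem evolves.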
- SigOpt. Conﬁdential. Intuition: Scalarization and Epsilon Constraints 24 Multiple Competing Metrics
- SigOpt. Conﬁdential. Intuition: Constrained Scalarization and Epsilon Constraints 25 Multiple Competing Metrics
- Multimetric Use Case 1 ● Category: Time Series ● Task: Sequence Classiﬁcation ● Model: CNN ● Data: Diatom Images ● Analysis: Accuracy-Time Tradeoﬀ ● Result: Similar accuracy, 33% the inference time Multimetric Use Case 2 ● Category: NLP ● Task: Sentiment Analysis ● Model: CNN ● Data: Rotten Tomatoes Movie Reviews ● Analysis: Accuracy-Time Tradeoﬀ ● Result: ~2% in accuracy versus 50% of training time Learn more https://devblogs.nvidia.com/sigopt-deep-learning- hyperparameter-optimization/ Use Case: Balancing Speed & Accuracy in Deep Learning
- Design: Question answering data and memory networks Data Model Sources: Facebook AI Research (FAIR) bAbI dataset: https://research.fb.com/downloads/babi/ Sukhbaatar et al.: https://arxiv.org/abs/1503.08895
- Comparison of Bayesian Optimization and Random Search Setup: Hyperparameter Optimization Standard Parameters Conditional Parameters
- Result: Signiﬁcant boost in consistency, accuracy Comparison across random search versus Bayesian optimization with conditionals
- Result: Highly cost eﬃcient accuracy gains Comparison across random search versus Bayesian optimization with conditionals SigOpt is 18.5x as efficient
- SigOpt. Confidential. 3. How to continuously and efficiently utilize your project's allotted compute infrastructure
- SigOpt. Conﬁdential. Utilize compute by asynchronous parallel optimization SigOpt natively handles Parallel Function Evaluation with the primary goal of minimizing the Overall Wall-Clock Time. Parallelism also provides: • Faster time-to-results — minimized overall wall-clock time • Full resource utilization — asynchronous parallel optimization • Scaling with infrastructure — optimize across the number of available compute resources We believe this is essential to increase Research Productivity by lowering the time-to-results and scaling with available infrastructure. 32 Continuously and eﬃciently utilize infrastructure
- SigOpt. Conﬁdential. Your ﬁrewall Training Data AI, ML, DL, Simulation Model Model Evaluation or Backtest Testing Data New Conﬁgurations Objective Metric Better Results EXPERIMENT INSIGHTS Track, organize, analyze and reproduce any model ENTERPRISE PLATFORM Built to ﬁt any stack and scale with your needs OPTIMIZATION ENGINE Explore and exploit with a variety of techniques RESTAPI Conﬁguration Parameters or Hyperparameters Continuously and eﬃciently utilize infrastructure
- SigOpt. Conﬁdential. Better Results EXPERIMENT INSIGHTS Track, organize, analyze and reproduce any model ENTERPRISE PLATFORM Built to ﬁt any stack and scale with your needs OPTIMIZATION ENGINE Explore and exploit with a variety of techniques RESTAPI Worker Continuously and eﬃciently utilize infrastructure
- SigOpt. Conﬁdential. Better Results EXPERIMENT INSIGHTS Track, organize, analyze and reproduce any model ENTERPRISE PLATFORM Built to ﬁt any stack and scale with your needs OPTIMIZATION ENGINE Explore and exploit with a variety of techniques RESTAPI Worker Worker Worker Worker Continuously and eﬃciently utilize infrastructure
- SigOpt. Conﬁdential. Parallel function evaluations 36 Parallel function evaluations are a way of eﬃciently maximizing a function while using all available compute resources [Ginsbourger et al, 2008, Garcia-Barcos et al. 2019]. • Choosing points by jointly maximizing criteria over the entire set • Asynchronously evaluating over a collection of points • Fixing points which are currently being evaluated while sampling new ones Continuously and eﬃciently utilize infrastructure 1D - Acquisition Function 2D - Acquisition Function
- SigOpt. Confidential. 37 Parallel optimization: multiple worker nodes jointly optimize over a given function. Parallel bandwidth represents the number of available compute resources; the figure shows the statistical model and the next point(s) to evaluate for parallel bandwidths 1 through 5. Continuously and efficiently utilize infrastructure
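The "fix points which are currently being evaluated" idea can be sketched with a constant-liar-style heuristic: pending suggestions are treated as committed, so a batch of parallel workers receives well-spread configurations. Farthest-point selection below is a deliberately crude stand-in for jointly maximizing an acquisition function (all names illustrative):

```python
def next_suggestion(candidates, observed, pending):
    """Pick the candidate farthest from all committed points.

    Pending evaluations are 'fixed' (treated like observations) so that
    parallel workers do not receive near-duplicate configurations.
    """
    committed = observed + pending
    def min_distance(x):
        return min(abs(x - c) for c in committed)
    return max(candidates, key=min_distance)

candidates = [0.0, 0.25, 0.5, 0.75, 1.0]
observed = [0.0]
pending = []

# Dispatch to 3 parallel workers: each suggestion joins `pending` before the
# next is drawn, so the batch spreads out across the domain.
batch = []
for _ in range(3):
    x = next_suggestion(candidates, observed, pending)
    pending.append(x)
    batch.append(x)
```

When a worker finishes, its point moves from `pending` to `observed` with its real metric value, and the loop continues asynchronously, which is what keeps all compute resources busy.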
- Parallelism Use Case ● Category: NLP ● Task: Sentiment Analysis ● Model: CNN ● Data: Rotten Tomatoes Movie Reviews ● Analysis: Predicting Positive vs. Negative Sentiment ● Result: 400x speedup Learn more https://aws.amazon.com/blogs/machine-learning/fast-cnn-tuni ng-with-aws-gpu-instances-and-sigopt/ Use Case: Fast CNN Tuning with AWS GPU Instances
- Variables we tested in this experiment Axes of exploration: • 6 hyperparameters versus 10 parameters (including SGD and alternate architectures) • CPU versus GPU compute • Grid search versus random search versus Bayesian optimization Results that we considered: • Accuracy • Compute cost • Wall clock time
- The parameters to tune A deep dive
- Results Speed and accuracy SigOpt helps you train your model faster and achieve higher accuracy This results in higher practitioner productivity and better business outcomes While random and grid search for hyperparameters do yield an accuracy improvement, SigOpt achieves better results on both dimensions
- Results: 8x more cost-efficient performance boost. Detailed performance across different optimization processes (% Δ per Comp $ is calculated using GPU compute):

| Experiment Type | Accuracy | Trials | Epochs | CPU Time | CPU Cost | GPU Time | GPU Cost | Percent Change | % Δ per Comp $ |
|---|---|---|---|---|---|---|---|---|---|
| Default (No Tuning) | 75.70 | 1 | 50 | 2 hours | $1.82 | 0 hours | $0.04 | 0 | 0 |
| Grid Search (SGD Only) | 79.30 | 729 | 38394 | 64 days | $1401.38 | 32 hours | $27.58 | 4.60 | 0.13 |
| Random Search (SGD Only) | 79.94 | 2400 | 127092 | 214 days | $4638.86 | 106 hours | $91.29 | 4.24 | 0.05 |
| SigOpt Search (SGD Only) | 80.40 | 240 | 15803 | 27 days | $576.81 | 13 hours | $11.35 | 4.70 | 0.42 |
| Grid Search (SGD + Architecture) | Not Feasible | 59049 | 3109914 | 5255 days | $113511.86 | 107 days | $2233.95 | NA | NA |
| Random Search (SGD + Architecture) | 80.12 | 4000 | 208729 | 353 days | $7618.61 | 174 hours | $149.94 | 4.42 | 0.03 |
| SigOpt Search (SGD + Architecture) | 81.00 | 400 | 30060 | 51 days | $1097.19 | 25 hours | $21.59 | 5.30 | 0.25 |
- Results: the experimentation loop The training pipeline: from data to optimization and evaluation
- SigOpt. Confidential. 4. How to tune models with expensive training costs
- SigOpt. Confidential. How to efficiently minimize the time to optimize any function. SigOpt's multitask feature is an efficient way for modelers to tune models with expensive training costs, with the benefits of: • Faster time-to-market — the ability to bring expensive models into production faster • Reduction in infrastructure cost — intelligently leverage infrastructure while reducing cost. Through novel research, SigOpt helps the user lower overall time-to-market while reducing the overall compute budget. 45 Expensive Training Cost
- SigOpt. Conﬁdential. Your ﬁrewall Training Data AI, ML, DL, Simulation Model Model Evaluation or Backtest Testing Data New Conﬁgurations Objective Metric Better Results EXPERIMENT INSIGHTS Track, organize, analyze and reproduce any model ENTERPRISE PLATFORM Built to ﬁt any stack and scale with your needs OPTIMIZATION ENGINE Explore and exploit with a variety of techniques RESTAPI Conﬁguration Parameters or Hyperparameters Expensive Training Cost
- SigOpt. Conﬁdential. Better Results EXPERIMENT INSIGHTS Track, organize, analyze and reproduce any model ENTERPRISE PLATFORM Built to ﬁt any stack and scale with your needs OPTIMIZATION ENGINE Explore and exploit with a variety of techniques RESTAPI Expensive Training Cost
- SigOpt. Confidential. Using cheap or free information to speed learning. 48 Sources: Aaron Klein, Frank Hutter, et al.: https://arxiv.org/abs/1605.07079 SigOpt allows the user to define lower-cost functions in order to quickly optimize expensive functions • Cheaper-cost functions can be flexible (fewer epochs, subsampled data, other custom features) • Use cheaper tasks earlier in the tuning process to explore • Inform more expensive tasks later by exploiting what we learn • In the process, reduce the full time required to tune an expensive model. Expensive Training Cost
- SigOpt. Conﬁdential. Using cheap or free information to speed learning We can build better models using inaccurate data to help point the actual optimization in the right direction with less cost. • Using a warm start through multi-task learning logic [Swersky et al, 2014] • Combining good anytime performance with active learning [Klein et al, 2018] • Accepting data from multiple sources without priors [Poloczek et al, 2017] 49 Expensive Training Cost
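A toy sketch of the multitask budget split (not SigOpt's actual algorithm): spend most observations on a cheap, correlated proxy task to explore, then spend the small full-cost budget only on the most promising configurations. The objective, the proxy's bias term, and the budgets are all illustrative:

```python
import random

def full_task(x):
    """Expensive objective (e.g. a full training run); cost 1.0 per evaluation."""
    return -(x - 0.7) ** 2

def cheap_task(x):
    """Cheap proxy (e.g. fewer epochs): correlated with the full task but biased; cost 0.1."""
    return -(x - 0.7) ** 2 + 0.05 * x

def multitask_search(cheap_budget=30, full_budget=3, seed=0):
    rng = random.Random(seed)
    # Explore broadly with the cheap task...
    xs = [rng.uniform(0.0, 1.0) for _ in range(cheap_budget)]
    cheap_obs = sorted(((x, cheap_task(x)) for x in xs), key=lambda xy: xy[1], reverse=True)
    # ...then spend the expensive budget only on the most promising configurations.
    finalists = [x for x, _ in cheap_obs[:full_budget]]
    best_x = max(finalists, key=full_task)
    total_cost = 0.1 * cheap_budget + 1.0 * full_budget
    return best_x, total_cost

best_x, cost = multitask_search()
```

Even with the proxy's systematic bias, the cheap observations narrow the search enough that three full-cost evaluations land near the true optimum, at a fraction of the cost of running every trial at full fidelity.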
- Use Case: Image Classiﬁcation on a Budget Use Case ● Category: Computer Vision ● Task: Image Classiﬁcation ● Model: CNN ● Data: Stanford Cars Dataset ● Analysis: Architecture Comparison ● Result: 2.4% accuracy gain with a much shallower model Learn more https://mlconf.com/blog/insights-for-building-high-performing- image-classiﬁcation-models/
- SigOpt. Conﬁdential. Architecture: Classifying images of cars using ResNet 51 Convolutions Classiﬁcation ResNet Input Acura TLX 2015 Output Label Sources: Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun: https://arxiv.org/abs/1512.03385
- SigOpt. Confidential. Training setup comparison: both setups use ImageNet-pretrained convolutional layers feeding a fully connected classification layer. Fine-tuning tunes the full network (convolutional features and classifier), while the feature-extractor setup tunes only the fully connected layer on top of fixed convolutional features.
- SigOpt. Confidential. 53 Hyperparameter setup:

| Hyperparameter | Lower Bound | Upper Bound |
|---|---|---|
| Log Learning Rate | 1.2e-4 | 1.0 |
| Learning Rate Scheduler | 0 | 0.99 |
| Batch Size (powers of 2) | 16 | 256 |
| Nesterov | False | True |
| Log Weight Decay | 1.2e-5 | 1.0 |
| Momentum | 0 | 0.9 |
| Scheduler Step | 1 | 20 |
- SigOpt. Confidential. 54 Insight: Multitask efficiency at the hyperparameter level. Example: learning rate accuracy and values by cost of task over time (charts show the progression of observations over time, the accuracy and value for each observation, and parameter importance analysis)
- SigOpt. Confidential. 55 Results: Fine-tuning the smaller network (+3.92%) significantly outperforms feature extraction on a bigger network (+1.58%); optimizing and tuning the full network wins, and multitask optimization drives significant performance gains
- SigOpt. Confidential. Implication: Fine-tuning significantly outperforms. Cost breakdown for multitask optimization:

| Cost efficiency | Feature Extractor (ResNet 50) | Fine-Tuning (ResNet 18) |
|---|---|---|
| Hours per training | 4.08 | 4.2 |
| Observations | 220 | 220 |
| Number of Runs | 1 | 1 |
| Total compute hours | 898 | 924 |
| Cost per GPU-hour | $0.90 | $0.90 |
| % Improvement | 1.58% | 3.92% |
| Total compute cost | $808 | $832 |
| Cost ($) per % improvement | $509 | $212 |

Similar compute cost and similar wall-clock time; fine-tuning is significantly more efficient and effective.
- SigOpt. Conﬁdential. Thank you
