Sep. 11, 2019

This webinar, hosted by SigOpt co-founder and CEO Scott Clark, explains how advanced features can help you achieve your modeling goals. These features include metric definition and multimetric optimization, conditional parameters, and multitask optimization for long training cycles.


- SigOpt. Confidential. Tuning 2.0: Advanced Optimization Techniques. Scott Clark, PhD, Founder and CEO. Tuesday, September 10, 2019
- Accelerate and amplify the impact of modelers everywhere
- Enterprise AI. Goals: differentiate products, generate revenue. Requirements: modelers with expertise, best-in-class solutions. Value per model by modeler segment: Analytics 2.0 models (1x), tailored models (100x), differentiated models (10,000x)
- How SigOpt works: your data and models stay private, behind your firewall. Training data feeds an AI, ML, DL, or simulation model; model evaluation or backtest on testing data produces an objective metric; the REST API returns new configurations (parameters or hyperparameters) for iterative, automated optimization and better results. Components: EXPERIMENT INSIGHTS (track, organize, analyze and reproduce any model), ENTERPRISE PLATFORM (built to fit any stack and scale with your needs), OPTIMIZATION ENGINE (explore and exploit with a variety of techniques). Integrates with any modeling stack
- Current SigOpt algorithmic trading customers represent $300B+ in assets under management. Current SigOpt enterprise customers across six industries represent $500B+ in market capitalization
- Modeling lifecycle: Data Engineering → Feature Engineering → Metric Definition → Model Search → Model Training → Model Tuning → Model Evaluation → Model Deployment
- Lifecycle application: Hyperparameter Optimization (including long training cycles)
- Lifecycle application: Early Stopping, Convergence Monitoring
- Lifecycle application: HPO with Conditional Parameters
- Lifecycle application: Tuning Transformations
- Lifecycle application: Multimetric HPO
- Lifecycle application: Re-tuning
- Opportunity: iteratively tune at all stages of the modeling lifecycle (Tuning Transformations, Balancing Metrics, Tuning Architecture, Early Stopping, HPO for long training cycles, Re-tuning)
- Benefits: ● Learn fast, fail fast: give yourself the best chance at finding good use cases while avoiding false negatives ● Connect outputs to outcomes: define, select and iterate on your metrics with end-to-end evaluation ● Find the global maximum: early non-optimized decisions in the process limit your ability to maximize performance ● Boost productivity: automate modeling tasks so modelers spend more time applying their expertise
- Focus for today: Metric Definition, Model Search, and Long Training Cycles within the modeling lifecycle
- Techniques: 1. Metric definition: multimetric optimization. 2. Model search: conditional parameters. 3. Long training cycles: multitask optimization
- 1. Metric definition with multimetric optimization
- How it works: Multimetric optimization (with thresholds) ● Define two metrics instead of one ● Optimize against both metrics automatically and simultaneously ● Set thresholds on each individual metric to reflect business or modeling needs ● Compare a Pareto frontier of best model configurations that balance these two metrics ● Relevant docs ● Relevant blog post
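The steps above translate into a single experiment definition. Below is a sketch of the create-experiment payload for a two-metric, thresholded experiment, based on the SigOpt multimetric docs referenced on this slide; the experiment name, parameter names, bounds, budget, and threshold values are illustrative placeholders, not the webinar's actual experiment.

```python
# Sketch of a SigOpt multimetric experiment definition with metric
# thresholds. All names, bounds, and thresholds are placeholders.

def multimetric_experiment_config():
    """Build the experiment-create payload for a two-metric optimization."""
    return {
        "name": "CNN accuracy vs. inference time",
        "parameters": [
            {"name": "log_learning_rate", "type": "double",
             "bounds": {"min": -9.2, "max": 0.0}},
            {"name": "batch_size", "type": "int",
             "bounds": {"min": 16, "max": 256}},
        ],
        # Two metrics optimized automatically and simultaneously; the
        # optimizer returns a Pareto frontier of configurations that
        # trade them off. Thresholds encode business/modeling needs.
        "metrics": [
            {"name": "accuracy", "objective": "maximize", "threshold": 0.90},
            {"name": "inference_time", "objective": "minimize", "threshold": 0.5},
        ],
        "observation_budget": 100,
    }
```

With the real client, this dict would be passed to `conn.experiments().create(**config)`; configurations on the resulting Pareto frontier can then be compared against the thresholds.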
- Potential applications of multimetric optimization: balance competing objectives, define and select metrics, connect metrics to outcomes. See: https://sigopt.com/blog/intro-to-multicriteria-optimization/ ; https://sigopt.com/blog/multimetric-updates-in-the-experiment-insights-dashboard/ ; https://sigopt.com/blog/metric-thresholds-a-new-feature-to-supercharge-multimetric-optimization/
- Use Case: Balancing Speed & Accuracy in Deep Learning. Multimetric Use Case 1 ● Category: Time Series ● Task: Sequence Classification ● Model: CNN ● Data: Diatom Images ● Analysis: Accuracy-Time Tradeoff ● Result: Similar accuracy at 33% of the inference time. Multimetric Use Case 2 ● Category: NLP ● Task: Sentiment Analysis ● Model: CNN ● Data: Rotten Tomatoes ● Analysis: Accuracy-Time Tradeoff ● Result: ~2% accuracy gain at 50% of the training time. Learn more: https://devblogs.nvidia.com/sigopt-deep-learning-hyperparameter-optimization/
- Experiment design for sequence classification. Data ● Diatom Images ● Source: UCR Time Series Classification. Model ● Convolutional Neural Network ● Source: Wang et al. (paper, code) ● TensorFlow via Keras. Metrics ● Inference Time ● Accuracy. HPO methods (implemented via SigOpt) ● Random Search ● Bayesian Optimization. Note: experiment code available here
- Process: Tune a variety of parameters (network architecture, stochastic gradient descent) and maximize the metrics
- Result: Bayesian optimization outperforms random search ● Both methods were executed via the SigOpt API ● Bayesian optimization required 90% fewer training runs than random search ● Bayesian optimization found 85.7% of the combined Pareto frontier of optimal model configurations, almost 6x as many choices as random search with 10x the training runs
- Result: Minimal accuracy loss for a 66% inference-time gain, whether you maximize accuracy, minimize inference time, or balance both
- 2. Model search with conditional parameters
- How it works: Conditional parameters. Take into account the conditionality of certain parameter types in the optimization process ● Establish conditionality between various parameters ● Use this conditionality to improve the Bayesian optimization process ● Boost results from the hyperparameter optimization process ● Example: architecture parameters for deep learning models ● Example: parameter types for SGD variants ● Relevant docs
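A sketch of what "establishing conditionality" looks like in an experiment definition, based on the SigOpt conditionals docs referenced above. It mirrors the SGD-variants example on this slide: momentum only exists for the momentum variant, beta_2 only for Adam. The conditional name, values, parameter names, and bounds are illustrative assumptions.

```python
# Sketch of conditional parameters in a SigOpt experiment definition.
# Names, values, and bounds are placeholders, not the webinar's setup.

def conditional_experiment_config():
    return {
        "name": "SGD variants with conditional parameters",
        # The conditional variable itself: which optimizer family to use.
        "conditionals": [
            {"name": "optimizer", "values": ["sgd", "momentum", "adam"]},
        ],
        "parameters": [
            # Unconditional: applies to every optimizer variant.
            {"name": "log_learning_rate", "type": "double",
             "bounds": {"min": -9.2, "max": 0.0}},
            # Only meaningful for the momentum variant, so it is sampled
            # only when the conditional takes that value.
            {"name": "momentum", "type": "double",
             "bounds": {"min": 0.0, "max": 0.9},
             "conditions": {"optimizer": ["momentum"]}},
            # Only applies to adam.
            {"name": "beta_2", "type": "double",
             "bounds": {"min": 0.8, "max": 0.999},
             "conditions": {"optimizer": ["adam"]}},
        ],
        "observation_budget": 60,
    }
```

Declaring the structure this way lets the Bayesian optimizer avoid wasting observations on parameters that are inactive for a given configuration, which is the boost the slide describes.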
- Use Case: Effective and Efficient NLP Optimization ● Category: NLP ● Task: Question Answering ● Model: MemN2N ● Data: bAbI ● Analysis: Performance benchmark ● Result: 4.84% gain, 30% the cost. Learn more: https://devblogs.nvidia.com/optimizing-end-to-end-memory-networks-using-sigopt-gpus/
- Design: Question answering data and memory networks. Data: Facebook AI Research (FAIR) bAbI dataset, https://research.fb.com/downloads/babi/. Model: Sukhbaatar et al., https://arxiv.org/abs/1503.08895
- Hyperparameter optimization experiment setup: comparison of Bayesian optimization and random search, with standard parameters and conditional parameters
- Result: Significant boost in consistency and accuracy, comparing random search versus Bayesian optimization with conditionals
- Result: Highly cost-efficient accuracy gains, comparing random search versus Bayesian optimization with conditionals; SigOpt is 18.5x as efficient
- 3. Long training cycles with multitask optimization in parallel
- How it works: Multitask optimization (partial and full tasks) ● Introduce a variety of cheap and expensive tasks in a hyperparameter optimization experiment ● Use cheaper tasks earlier (explore) in the tuning process to inform more expensive tasks later (exploit) ● In the process, reduce the full time required to tune an expensive model ● Relevant docs. Sources: Matthias Poloczek, Jialei Wang, Peter I. Frazier: https://arxiv.org/abs/1603.00389 ; Aaron Klein, Frank Hutter, et al.: https://arxiv.org/abs/1605.07079
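The cheap-versus-expensive structure above is declared as a list of tasks with relative costs, following the SigOpt multitask docs referenced on this slide. The sketch below is illustrative: the task names, their costs, and the single tuned parameter are placeholders (cheap tasks might correspond to fewer epochs or a data subset; the full task has cost 1.0).

```python
# Sketch of a SigOpt multitask experiment definition. Task names, costs,
# parameter bounds, and the budget are placeholder assumptions.

def multitask_experiment_config():
    return {
        "name": "ResNet tuning with multitask optimization",
        "parameters": [
            {"name": "log_learning_rate", "type": "double",
             "bounds": {"min": -9.0, "max": 0.0}},
        ],
        "metrics": [{"name": "accuracy", "objective": "maximize"}],
        # Each suggestion arrives tagged with one of these tasks; the
        # optimizer spends early budget on cheap tasks (explore) and
        # later budget on the full-cost task (exploit).
        "tasks": [
            {"name": "tenth_epochs", "cost": 0.1},
            {"name": "half_epochs", "cost": 0.5},
            {"name": "full_training", "cost": 1.0},
        ],
        "observation_budget": 220,
    }
```

At evaluation time, the worker reads the suggested task, trains at that fidelity (for example, that fraction of the epochs), and reports the metric back as usual.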
- How it works: Combine multitask with parallelization. Multiple workers behind your firewall evaluate new configurations (parameters or hyperparameters) simultaneously, each reporting its objective metric back through the REST API, while the optimization engine, experiment insights, and enterprise platform coordinate the search
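The parallel pattern on this slide is just the suggest/observe loop run on every worker at once. Below is a sketch of that loop, written against a minimal connection interface so it can be exercised without credentials; with the real client you would pass `sigopt.Connection(client_token=...)`. The function and argument names are illustrative, not SigOpt's.

```python
# Sketch of the per-worker suggest/evaluate/observe loop. `conn` is any
# object exposing the suggestion/observation endpoints; `evaluate` is the
# user's training-and-scoring function. Names here are placeholders.

def run_worker(conn, experiment_id, evaluate, max_observations):
    """Evaluate suggestions until this worker's share of the budget is spent."""
    for _ in range(max_observations):
        # Ask the optimizer for the next configuration to try.
        suggestion = conn.experiments(experiment_id).suggestions().create()
        # Train the model with these assignments and measure the metric.
        value = evaluate(suggestion.assignments)
        # Report the result so the optimizer can refine its model.
        conn.experiments(experiment_id).observations().create(
            suggestion=suggestion.id,
            value=value,
        )
```

Running this same loop on N machines gives N concurrent open suggestions; the optimization engine accounts for outstanding suggestions when proposing new ones, which is what makes the wall-clock speedups later in the deck possible.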
- Use Case: Image Classification on a Budget ● Category: Computer Vision ● Task: Image Classification ● Model: CNN ● Data: Stanford Cars ● Analysis: Architecture Comparison ● Result: 2.4% accuracy gain for a much cheaper model. Learn more: https://mlconf.com/blog/insights-for-building-high-performing-image-classification-models/
- Data: Cars image classification. Stanford Cars dataset: 16,185 images, 196 classes; labels: car, make, year. Source: https://ai.stanford.edu/~jkrause/cars/car_dataset.html
- Architecture: Classifying images of cars using ResNet (input image → convolutions → classification → output label, e.g. Acura TLX 2015). Source: Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun: https://arxiv.org/abs/1512.03385
- Experiment design scenarios (baseline versus SigOpt multitask). ResNet 50: Scenario 1a, pre-train on ImageNet and tune the fully connected layer; Scenario 1b, optimize hyperparameters to tune the fully connected layer. ResNet 18: Scenario 2a, fine-tune the full network; Scenario 2b, optimize hyperparameters to fine-tune the full network
- Training setup comparison: feature extraction (ImageNet-pretrained convolutional layers frozen, tuning only the fully connected layer) versus fine-tuning (tuning the full network, convolutional layers included)
- Hyperparameter setup (lower and upper bounds): Log Learning Rate [1.2e-4, 1.0]; Learning Rate Scheduler [0, 0.99]; Batch Size, powers of 2 [16, 256]; Nesterov {False, True}; Log Weight Decay [1.2e-5, 1.0]; Momentum [0, 0.9]; Scheduler Step [1, 20]
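The bounds above can be expressed as a SigOpt parameter list. The encoding below is a sketch, not the experiment's actual code (which the deck links separately): batch size is tuned as an integer exponent so sampled values stay powers of two, and Nesterov as a categorical; both encodings, and all parameter names, are assumptions.

```python
# The hyperparameter table, expressed as a SigOpt-style parameter list.
# Parameter names and the batch-size/Nesterov encodings are assumptions.

def cars_tuning_parameters():
    return [
        {"name": "log_learning_rate", "type": "double",
         "bounds": {"min": 1.2e-4, "max": 1.0}},
        {"name": "lr_scheduler", "type": "double",
         "bounds": {"min": 0.0, "max": 0.99}},
        # 2**4 = 16 up to 2**8 = 256, matching "powers of 2" in the table.
        {"name": "log2_batch_size", "type": "int",
         "bounds": {"min": 4, "max": 8}},
        {"name": "nesterov", "type": "categorical",
         "categorical_values": ["False", "True"]},
        {"name": "log_weight_decay", "type": "double",
         "bounds": {"min": 1.2e-5, "max": 1.0}},
        {"name": "momentum", "type": "double",
         "bounds": {"min": 0.0, "max": 0.9}},
        {"name": "scheduler_step", "type": "int",
         "bounds": {"min": 1, "max": 20}},
    ]
```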
- Results: Optimizing and tuning the full network outperforms. Fine-tuning the smaller network (+3.92%) significantly outperforms feature extraction on a bigger network (+1.58%); multitask optimization drives significant performance gains
- Insight: Multitask efficiency at the hyperparameter level. Example: learning-rate accuracy and values by cost of task over time (progression of observations over time, accuracy and value for each observation, parameter importance analysis)
- Insight: Parallelization further accelerates wall-clock time. 928 total hours to optimize ResNet 18; 220 observations per experiment; 20 p2.xlarge AWS EC2 instances; 45 hours actual wall-clock time
- Implication: Fine-tuning significantly outperforms. Cost breakdown for multitask optimization, Feature Extractor ResNet 50 versus Fine-Tuning ResNet 18: hours per training 4.08 vs. 4.2; observations 220 vs. 220; number of runs 1 vs. 1; total compute hours 898 vs. 924; cost per GPU-hour $0.90 for both; total compute cost $808 vs. $832; % improvement 1.58% vs. 3.92%; cost per % improvement $509 vs. $20. Similar compute cost and wall-clock time, with fine-tuning significantly more efficient and effective
- Implication: Multiple benefits from multitask. Tuning ResNet-18, Multitask vs. Bayesian vs. Random: hours per training 4.2 for each; observations 220 vs. 646 vs. 646; number of runs 1 vs. 1 vs. 20; total compute hours 924 vs. 2,713 vs. 54,264; cost per GPU-hour $0.90; total compute cost $832 vs. $2,442 vs. $48,838. Time to optimize on 20 machines: wall-clock time 46 vs. 136 vs. 2,713 hours. Multitask achieves similar performance at 1.7% the cost of random search, with 58x faster wall-clock time to optimize
- Techniques recap: 1. Metric definition: multimetric optimization (read the blog). 2. Model search: conditional parameters (read the blog). 3. Long training cycles: multitask optimization (read the blog)
- Try our solution: sign up at sigopt.com/try-it today. The AI Summit San Francisco, September 25-26, 2019 (https://sanfrancisco.theaisummit.com), register with code 1SFSPON25. TWIMLcon, October 1-2, 2019 (https://twimlcon.com/), register with code SIGOPT20. Download the eBook: https://twimlai.com/announcing-our-ai-platforms-series-and-ebooks/
