Tuning the Untunable
Techniques for Deep Learning Optimization
Patrick Hayes, CTO
November 2018
Empower experts everywhere to amplify and accelerate their modeling impact
Our model management philosophy
● Experts Focus on Data Science: tasks that benefit from domain expertise (e.g., metric-function selection)
● Software Automates Repeatable Tasks: tasks that do not benefit from domain expertise (e.g., training orchestration, model tuning)
● DevOps Builds and Maintains Proprietary Infrastructure: tasks that depend on your particular infrastructure (e.g., model lifecycle management, model deployment)
Model tuning
Model tuning, also called hyperparameter optimization or hyperparameter search, is one part of training & tuning; deep learning architecture search is a closely related problem.
Common methods: grid search, random search, Bayesian optimization, evolutionary algorithms.
How we optimize models
● We never access your data or models
● Iterative, automated optimization
● Built specifically for scalable enterprise use cases
1. Install the client library
2. Create the experiment
3. Parameterize your model
3. Parameterize your model (continued…)
4. Run the optimization loop
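Taken together, steps 1 through 4 might look like the minimal sketch below, written against the SigOpt Python client; the API token, parameter choices, metric name, and the train_and_evaluate stub are illustrative placeholders rather than content from the original slides.

# Step 1: install the client library (pip install sigopt)
import sigopt

conn = sigopt.Connection(client_token="YOUR_API_TOKEN")  # placeholder token

def train_and_evaluate(assignments):
    # Placeholder for your model code: train with the suggested hyperparameters
    # (assignments["learning_rate"], assignments["batch_size"]) and return the
    # validation metric to be maximized.
    raise NotImplementedError

# Steps 2 and 3: create the experiment and parameterize your model.
experiment = conn.experiments().create(
    name="CNN tuning (sketch)",
    parameters=[
        dict(name="learning_rate", type="double", bounds=dict(min=1e-5, max=1e-1)),
        dict(name="batch_size", type="int", bounds=dict(min=32, max=256)),
    ],
    metrics=[dict(name="accuracy")],
    observation_budget=60,
)

# Step 4: run the optimization loop.
while experiment.progress.observation_count < experiment.observation_budget:
    suggestion = conn.experiments(experiment.id).suggestions().create()
    accuracy = train_and_evaluate(suggestion.assignments)
    conn.experiments(experiment.id).observations().create(
        suggestion=suggestion.id,
        value=accuracy,
    )
    experiment = conn.experiments(experiment.id).fetch()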
Easily track, manage and reproduce experiments
● Uncover model insights with parameter importance
● Monitor performance improvement as the experiment progresses via API, the web, or your mobile phone
● Cycle through analysis, suggestions, history, and other experiment insights
Benefits: Better, cheaper, faster model development
● 90% Cost Savings: maximize utilization of compute (https://aws.amazon.com/blogs/machine-learning/fast-cnn-tuning-with-aws-gpu-instances-and-sigopt/)
● 10x Faster Time to Tune: less expert time per model (https://devblogs.nvidia.com/sigopt-deep-learning-hyperparameter-optimization/)
● Better Performance: no free lunch, but optimize any model (https://arxiv.org/pdf/1603.09441.pdf)
Overview of features behind SigOpt

Optimization Engine
● Multimetric optimization
● Continuous, categorical, or integer parameters
● Constraints and failure regions
● Up to 10k observations, 100 parameters
● Multitask optimization and high parallelism
● Conditional parameters

Experiment Insights
● Reproducibility
● Intuitive web dashboards
● Cross-team permissions and collaboration
● Advanced experiment visualizations
● Organizational experiment analysis
● Parameter importance analysis

Enterprise Platform
● Infrastructure agnostic
● REST API
● Model agnostic
● Black-box interface (doesn't touch data)
● Libraries for Python, Java, R, and MATLAB
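To make the parameter types above concrete, here is a hypothetical experiment definition using the Python client; the parameter names, bounds, and categories are invented for illustration.

import sigopt

conn = sigopt.Connection(client_token="YOUR_API_TOKEN")  # placeholder token

# Hypothetical experiment mixing continuous, integer, and categorical parameters.
experiment = conn.experiments().create(
    name="Parameter types (sketch)",
    parameters=[
        dict(name="learning_rate", type="double",
             bounds=dict(min=1e-6, max=1e-1)),                 # continuous
        dict(name="num_layers", type="int",
             bounds=dict(min=2, max=8)),                       # integer
        dict(name="optimizer", type="categorical",
             categorical_values=["adam", "sgd", "rmsprop"]),   # categorical
    ],
    observation_budget=100,
)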
Applied AI introduces unique challenges
● Failed observations
● Constraints
● Uncertainty
● Competing objectives
● Lengthy training cycles
● Cluster orchestration
sigopt.com/blog
How do you more efficiently tune models
that take days (or weeks) to train?
Source: AI & Compute, OpenAI Blog, May 2018 (chart spanning speech recognition, computer vision, and deep reinforcement learning)
Training ResNet-50 on ImageNet takes 12 hours
Tuning 12 parameters requires at least 120 distinct models (on the order of 10 observations per parameter)
That equals 1,440 hours, or 60 days, of training time
Tuning & training inefficiency
Training cluster management
Multitask Optimization
Start with a simple idea: we can use information about “partially trained” models to more efficiently inform hyperparameter tuning.
Building on prior research related to successive halving and Bayesian techniques, multitask optimization samples lower-cost tasks to inexpensively learn about the model and accelerate full Bayesian optimization.
(Swersky, Snoek, and Adams, “Multi-Task Bayesian Optimization”, http://papers.nips.cc/paper/5086-multi-task-bayesian-optimization.pdf)
“Cheap approximations promise a route to tractability, but bias and noise complicate their use. An unknown bias arises whenever a computational model incompletely models a real-world phenomenon, and is pervasive in applications.”
(Poloczek, Wang, and Frazier, “Multi-Information Source Optimization”, https://papers.nips.cc/paper/7016-multi-information-source-optimization.pdf)
Visualizing multitask: learning from approximation
(Figure from Klein et al., https://arxiv.org/pdf/1605.07079.pdf, comparing partial and full training)
Visualizing multitask: Power of correlated functions
Source: Swersky, Snoek, & Adams, http://papers.nips.cc/paper/5086-multi-task-bayesian-optimization
Alternative approaches to lengthy training cycles
Early Termination
(e.g., Hyperband)
Multitask Optimization
Case: Putting multitask optimization to the test
Source: Klein et al., https://arxiv.org/pdf/1605.07079.pdf
Goal: Benchmark the performance of Multitask and Early Termination methods
Model: SVM
Dataset: Covertype, Vehicle, MNIST
Methods:
● Multitask Enhanced (Fabolas)
● Multitask Basic (MTBO)
● Early Termination (Hyperband)
● Baseline 1 (Expected Improvement)
● Baseline 2 (Entropy Search)
Result: Multitask outperforms other methods
(Results figure from Klein et al., https://arxiv.org/pdf/1605.07079.pdf)
Multitask Optimization in Practice
Making multitask optimization accessible for anyone
Allow users to flexibly define low-cost tasks, as sketched below
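A rough sketch of how a user might declare low-cost tasks with the Python client follows; the task names, relative costs, and the mapping from cost to training epochs are assumptions made for illustration, not details from the slides.

import sigopt

conn = sigopt.Connection(client_token="YOUR_API_TOKEN")  # placeholder token

# Hypothetical multitask experiment: each task is a cheaper approximation of the
# full training run, identified by its cost relative to the full task (cost 1.0).
experiment = conn.experiments().create(
    name="Multitask CNN tuning (sketch)",
    parameters=[
        dict(name="learning_rate", type="double", bounds=dict(min=1e-5, max=1e-1)),
        dict(name="batch_size", type="int", bounds=dict(min=32, max=256)),
    ],
    tasks=[
        dict(name="cheapest", cost=0.1),  # e.g., a small slice of the epochs or data
        dict(name="medium", cost=0.3),
        dict(name="full", cost=1.0),      # the full training run
    ],
    observation_budget=120,
)

# Each suggestion specifies which task to run; scale the training budget accordingly
# (the 90-epoch full budget below is an assumption for the example).
suggestion = conn.experiments(experiment.id).suggestions().create()
epochs = max(1, int(90 * suggestion.task.cost))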
Multitask experiment insights
Case: Putting multitask optimization to the test
Goal: Benchmark the performance of Multitask and Early Termination methods across a broad variety of tasks and strategies to get a more complete sense of performance
Model: CNN
Dataset: CIFAR-10
Methods:
● Multitask Optimization
● Early Termination (Hyperband)
● Random Search
Multitask shows best performance
Benchmark: Which optimization technique most efficiently tunes 10 hyperparameters under compute constraints?
Tuning & training inefficiency
Training cluster management
Complexity of deep learning DevOps
Basic case: training one model, no optimization
Advanced case: multiple users running concurrent optimization experiments, with concurrent model configuration evaluations and multiple GPUs per model
Problems: infrastructure, scheduling, dependencies, code, logging
Solution: SigOpt Orchestrate is a CLI for managing training infrastructure and running optimization experiments
How it works: Command-line orchestration with SigOpt Orchestrate
1. Spin up and share training clusters ($ sigopt create cluster)
2. Schedule optimization experiments for your containerized model ($ sigopt run -f orchestrate.yml)
3. Integrate with the optimization API
4. Monitor experiment and infrastructure
Demo
Seamless integration into your model code
Easily define optimization experiments
Easily kick off optimization experiment jobs
Check the status of active and completed experiments
View experiment logs across multiple workers
Track metadata and monitor your results
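In terms of commands, the demo comes down to the two CLI calls already shown on the "How it works" slide; the comments below simply map them onto the demo steps, while checking status, viewing logs across workers, and tracking metadata happen through additional Orchestrate commands and the SigOpt dashboard (not reproduced here).

# Spin up (and share) a training cluster for the team
$ sigopt create cluster

# Kick off an optimization experiment job described by orchestrate.yml,
# the experiment definition for the containerized model code
$ sigopt run -f orchestrate.yml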
Automated cluster management
Efficient training and tuning
Training ResNet-50 on ImageNet takes 4 hours (down from 12)
Tuning 12 parameters requires at least 120 distinct models
That equals 480 hours (down from 1,440), or 20 days (down from 60), of training time
While training on 10 machines, wall-clock time is 2 days
● Failure regions
● Constraints
● Uncertainty
● Competing objectives
● Lengthy training cycles
● Cluster orchestration
sigopt.com/blog
Thank you
Try SigOpt Orchestrate: https://sigopt.com/orchestrate
Free access for Academics & Nonprofits: https://sigopt.com/edu
Solution-oriented program for the Enterprise: https://sigopt.com/pricing
Leading applied optimization research: https://sigopt.com/research
… and we're hiring! https://sigopt.com/careers
