
SigOpt at GTC - Reducing operational barriers to optimization



Advanced hardware like NVIDIA GPUs lowers technical barriers to model size and scope, but challenges remain in areas like model performance and training infrastructure management. We'll discuss the operational challenges of training models at scale, with a particular focus on how training management and hyperparameter tuning can inform each other to accomplish specific goals. We'll explore techniques like parallelism and scheduling, compare their impact on model optimization, and evaluate the results of this approach. In particular, we'll focus on how new tools that automate training orchestration accelerate model development and increase the volume and quality of models in production.


  1. Accelerating Model Development by Reducing Operational Barriers. Patrick Hayes, Cofounder & CTO, SigOpt. Talk ID: S9556
  2. Accelerate and amplify the impact of modelers everywhere
  4. SigOpt automates experimentation and optimization within the broader ML workflow: data preparation (transformation, labeling, pre-processing, pipeline development, feature engineering, feature stores); experimentation, training, and evaluation (notebook and model framework; experimentation and model optimization; insights, tracking, and collaboration; model search and hyperparameter tuning; resource scheduling and management); and model deployment (validation, serving, deploying, monitoring, managing inference, online testing), running in on-premise, hybrid, or multi-cloud hardware environments.
  5. Hyperparameter optimization goes by many names: model tuning, training and tuning, hyperparameter search, deep learning architecture search. Common methods include grid search, random search, Bayesian optimization, and evolutionary algorithms.
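The search strategies named on this slide differ mainly in how they sample the hyperparameter space. A minimal sketch contrasting grid and random search on a toy objective (the objective function and parameter ranges are illustrative, not from the talk):

```python
import itertools
import random

def objective(lr, momentum):
    # Toy stand-in for a model evaluation: peaks at lr=0.1, momentum=0.9.
    return -((lr - 0.1) ** 2 + (momentum - 0.9) ** 2)

def grid_search(steps=5):
    # Exhaustively evaluate a fixed lattice of configurations.
    lrs = [0.001 + i * (0.5 - 0.001) / (steps - 1) for i in range(steps)]
    momenta = [0.5 + i * (0.99 - 0.5) / (steps - 1) for i in range(steps)]
    return max(itertools.product(lrs, momenta), key=lambda p: objective(*p))

def random_search(budget=25, seed=0):
    # Sample the same ranges uniformly at random instead of on a lattice.
    rng = random.Random(seed)
    candidates = [(rng.uniform(0.001, 0.5), rng.uniform(0.5, 0.99))
                  for _ in range(budget)]
    return max(candidates, key=lambda p: objective(*p))
```

Bayesian optimization (SigOpt's focus) replaces blind sampling with a model of past observations, so each new configuration is chosen where improvement is most likely.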
  6. How it works: seamlessly tune any model (ML, DL, or simulation) through its model evaluation or backtest on your training and testing data. SigOpt never accesses your data or models.
  7. How it works: seamless implementation for any stack. (1) Install SigOpt, (2) create experiment, (3) parameterize model, (4) run optimization loop, (5) analyze experiments. Example notebook: https://bit.ly/sigopt-notebook
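The five-step loop above follows a suggest/observe pattern: the optimizer proposes a configuration, the model is trained and evaluated, and the resulting metric is reported back. Below is a minimal sketch of that pattern; `StandInOptimizer` is a hypothetical in-memory stand-in (it just suggests random points) so the loop runs without a SigOpt account, but the real client's optimization loop has the same shape.

```python
import random

class StandInOptimizer:
    """Hypothetical stand-in for the SigOpt client: suggests random
    configurations so the loop runs without an account or API token."""
    def __init__(self, parameters, budget, seed=0):
        self.parameters = parameters
        self.budget = budget
        self.rng = random.Random(seed)
        self.observations = []

    def suggest(self):
        # Step 4a: ask the optimizer for the next configuration to try.
        return {p["name"]: self.rng.uniform(p["min"], p["max"])
                for p in self.parameters}

    def observe(self, assignments, value):
        # Step 4b: report the measured metric back to the optimizer.
        self.observations.append((assignments, value))

def evaluate_model(assignments):
    # Step 3: the parameterized model evaluation; a toy metric here.
    return -(assignments["log_lr"] + 3.0) ** 2

# Step 2: create the experiment (parameter space and budget).
opt = StandInOptimizer(
    parameters=[{"name": "log_lr", "min": -6.0, "max": 0.0}],
    budget=20,
)

# Step 4: the optimization loop.
for _ in range(opt.budget):
    assignments = opt.suggest()
    opt.observe(assignments, evaluate_model(assignments))

# Step 5: analyze experiments; here, just pick the best observation.
best = max(opt.observations, key=lambda obs: obs[1])
```

The key property of this interface is that the optimizer only ever sees parameter assignments and metric values, which is what lets SigOpt stay black-box and never touch your data or model.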
  13. Benefits: better, cheaper, faster model development. 90% cost savings by maximizing utilization of compute (https://aws.amazon.com/blogs/machine-learning/fast-cnn-tuning-with-aws-gpu-instances-and-sigopt/). 10x faster time to tune, with less expert time per model (https://devblogs.nvidia.com/sigopt-deep-learning-hyperparameter-optimization/). Better performance: no free lunch, but optimize any model (https://arxiv.org/pdf/1603.09441.pdf).
  14. Overview of features behind SigOpt. Enterprise platform: infrastructure agnostic, REST API, model agnostic, black-box interface, never touches your data, libraries for Python, Java, R, and MATLAB. Optimization engine: multimetric optimization; continuous, categorical, or integer parameters; constraints and failure regions; up to 10k observations and 100 parameters; multitask optimization and high parallelism; conditional parameters. Experiment insights: reproducibility, intuitive web dashboards, cross-team permissions and collaboration, advanced experiment visualizations, organizational experiment analysis, parameter importance analysis.
  15. Applied deep learning introduces unique challenges
  16. These challenges include failed observations, constraints, uncertainty, competing objectives, lengthy training cycles, and cluster orchestration. More at sigopt.com/blog
  17. How do you more efficiently tune models that take days (or weeks) to train?
  18. AlexNet to AlphaGo Zero: a 300,000x increase in compute. (Chart: petaflop/s-days of training per year, 2012 to 2019, spanning roughly .00001 to 10,000.) Milestones plotted include AlexNet, Dropout, Visualizing and Understanding Conv Nets, DQN, VGG, Seq2Seq, GoogleNet, DeepSpeech2, ResNets, Xception, Neural Architecture Search, Neural Machine Translation, TI7 Dota 1v1, AlphaGo Zero, and AlphaZero.
  19. Examples span speech recognition, deep reinforcement learning, and computer vision.
  20. Training ResNet-50 on ImageNet takes 10 hours. Tuning 12 parameters requires at least 120 distinct models. That equals 1,200 hours, or 50 days, of training time.
  21. Running optimization tasks in parallel is critical to tuning expensive deep learning models
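In this setting, parallelism means evaluating several suggested configurations concurrently across workers while the optimizer serializes the suggest/observe bookkeeping. A minimal sketch using a thread pool; the objective, parameter name, and batch-of-random-suggestions strategy are illustrative stand-ins (a Bayesian optimizer would generate open suggestions from its surrogate model instead).

```python
import random
from concurrent.futures import ThreadPoolExecutor

def train_and_evaluate(assignments):
    # Stand-in for one expensive training run (hours in practice).
    return -(assignments["lr"] - 0.1) ** 2

def run_parallel_tuning(n_workers=4, budget=16, seed=0):
    rng = random.Random(seed)
    # Generate a batch of open suggestions up front (random here).
    suggestions = [{"lr": rng.uniform(1e-4, 1.0)} for _ in range(budget)]
    # Evaluate suggestions concurrently; each worker handles one config.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        values = list(pool.map(train_and_evaluate, suggestions))
    observations = list(zip(suggestions, values))
    return max(observations, key=lambda obs: obs[1])
```

With n_workers machines, wall-clock time shrinks by roughly that factor, which is the speedup the ResNet-50 arithmetic in this deck relies on.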
  22. Complexity of deep learning DevOps ranges from the basic case (training one model, no optimization) to the advanced case (multiple users, concurrent optimization experiments, concurrent model configuration evaluations, multiple GPUs per model).
  23. Cluster orchestration: (1) spin up and share training clusters, (2) schedule optimization experiments, (3) integrate with the optimization API, (4) monitor experiment and infrastructure.
  24. Problems: infrastructure, scheduling, dependencies, code, monitoring. Solution: SigOpt Orchestrate, a CLI for managing training infrastructure and running optimization experiments.
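A cluster-backed optimization experiment of this kind is typically described declaratively in a config file the CLI consumes. The sketch below is illustrative only: every field name here is an assumption for the example, not the documented Orchestrate configuration schema.

```yaml
# Illustrative sketch only; field names are assumptions, not the
# documented SigOpt Orchestrate schema.
name: resnet50-tuning
run: python train.py
resources_per_model:
  gpus: 1
optimization:
  metrics:
    - name: validation_accuracy
  parameters:
    - name: learning_rate
      type: double
      bounds:
        min: 1.0e-4
        max: 1.0
  observation_budget: 120
```

The idea is that one file captures the model command, per-model resources, and the search space, so the CLI can schedule workers and wire them to the optimization API without per-job scripting.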
  25. How it Works
  26. Seamless Integration into Your Model Code
  27. Easily Define Optimization Experiments
  28. Easily Kick Off Optimization Experiment Jobs
  29. Check the Status of Active and Completed Experiments
  30. View Experiment Logs Across Multiple Workers
  31. Track Metadata and Monitor Your Results
  32. Automated Cluster Management
  33. Training ResNet-50 on ImageNet takes 10 hours. Tuning 12 parameters requires at least 120 distinct models. That equals 1,200 hours, or 50 days, of training time. While training on 20 machines, wall-clock time drops from 50 days to 2.5 days.
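The wall-clock arithmetic on this slide can be checked directly, using the talk's own numbers (10 hours per training run, at least 120 distinct models for 12 parameters, 20 machines):

```python
hours_per_model = 10    # one ResNet-50 training run on ImageNet
models_to_tune = 120    # at least 120 distinct models for 12 parameters
machines = 20

# Sequential tuning: every model trains one after another.
sequential_hours = hours_per_model * models_to_tune  # 1,200 hours
sequential_days = sequential_hours / 24              # 50 days

# With 20 machines training concurrently, wall-clock time divides by 20.
parallel_hours = sequential_hours / machines         # 60 hours
parallel_days = parallel_hours / 24                  # 2.5 days
```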
  34. Recap: failed observations, constraints, uncertainty, competing objectives, lengthy training cycles, cluster orchestration. sigopt.com/blog
  35. Thank you! Email patrick@sigopt.com for additional questions. Try SigOpt Orchestrate: https://sigopt.com/orchestrate. Free access for academics and nonprofits: https://sigopt.com/edu. Solution-oriented program for the enterprise: https://sigopt.com/pricing. Leading applied optimization research: https://sigopt.com/research. And we're hiring! https://sigopt.com/careers
