SlideShare a Scribd company logo
SigOpt. Confidential.
Reducing Operational Barriers
to Model Training
Alexandra Johnson
alexandra@sigopt.com @alexandraj777
SigOpt. Confidential.
Alexandra Johnson
Software Engineer
SigOpt. Confidential.3
Operational Barriers
Machine learning experts specialize in:
• Gathering data
• Building models
• Extracting insights
Infrastructure engineers specialize in:
• Building shared tools
• Application scalability and performance
• Keeping track of interactions between large
distributed systems
The Challenge:
• Machine learning experts want to maximize
the performance of their models
• SigOpt provides an API for hyperparameter
optimization (HPO)
• SigOpt HPO helps ML experts maximize the
performance of their models!
• ML experts need to use clusters to properly
perform HPO
SigOpt. Confidential.4
Machine Learning Infrastructure
Model building
workflow problems
Infrastructure /
devops solutions+ = Machine learning
infrastructure
SigOpt. Confidential.5
Case Study: Building SigOpt Orchestrate
• Project started in 2018 to bridge ML and infrastructure
• What problems did our customers ask us to solve?
• How did a challenge for the user turn into a technical problem?
• Which tools / technologies did we use?
SigOpt. Confidential.6
Challenge #1: Can't Train Model on Laptop
Problem: Setup each remote machine
Initial Solution:
• Write a setup script to install dependencies
• SCP data, code, and setup script to every
remote machine
SigOpt. Confidential.7
Solution #1: Containerize!
Problem: Setup each remote machine
New Solution:
• Containerize code and dependencies
on the user's local environment
• Push the image to a registry
• Each machine pulls the image from a
registry
Registry
SigOpt. Confidential.8
Challenge #2: Start Training in Parallel
Problem: Kick off the hyperparameter
optimization job on six machines at once
Initial Solution:
• Open a tmux window on every
remote instance
• SSH over command to run setup
script into each tmux window
• SSH over command to train model
into each tmux window
SigOpt. Confidential.9
Solution #2: Kubernetes!
Problem: Kick off the hyperparameter
optimization job on six machines at once
New Solution:
• Spin up AWS EKS (Kubernetes) cluster
• Create a job spec
• "run 6 copies of this container at the
same time"
• Submit job spec to Kubernetes API
• Kubernetes starts the job on the cluster
SigOpt. Confidential.10
Challenge #3: View Progress and Debug
Problem: View the status of a hyperparameter
optimization job at a glance
Initial Solution:
• Save hostname and error information as
metadata in calls to external API
• SSH into machines and view the logs
directly (pre-Kubernetes)
• Use Kubernetes CLI to view logs
SigOpt. Confidential.11
Solution #3: Build a CLI!
Problem: View the status of a hyperparameter
optimization job at a glance
New Solution:
• Write an interface for the data scientist to
interact with the infrastructure tool
• We chose a command line interface
• Serves as an abstraction on top of
Kubernetes APIs + externals APIs
• Screenshots (top and bottom)
○ sigopt logs <experiment_id>
○ sigopt status <experiment_id>
SigOpt. Confidential.12
Final Thoughts...
• We're hiring!
• Connect with us
• Paper: Orchestrate: Infrastructure for Enabling Parallelism
during Hyperparameter Optimization, Alexandra Johnson
and Michael McCourt
SigOpt. Confidential.
Thank you!
Any questions?
Alexandra Johnson
alexandra@sigopt.com @alexandraj777

More Related Content

What's hot

Serverless Orchestration with Azure Durable Functions
Serverless Orchestration with Azure Durable FunctionsServerless Orchestration with Azure Durable Functions
Serverless Orchestration with Azure Durable FunctionsCallon Campbell
 
A Cloud Native Walkthrough: Infrastructure + Operations
A Cloud Native Walkthrough: Infrastructure + OperationsA Cloud Native Walkthrough: Infrastructure + Operations
A Cloud Native Walkthrough: Infrastructure + OperationsFrancisco Arturo Viveros
 
A Consul Service Mesh Integration Case Study with Presto
A Consul Service Mesh Integration Case Study with PrestoA Consul Service Mesh Integration Case Study with Presto
A Consul Service Mesh Integration Case Study with PrestoFrancisco Arturo Viveros
 
Knative, Serverless on Kubernetes, and Openshift
Knative, Serverless on Kubernetes, and OpenshiftKnative, Serverless on Kubernetes, and Openshift
Knative, Serverless on Kubernetes, and OpenshiftChris Suszyński
 
CREATING REAL TIME DASHBOARD WITH BLAZOR, AZURE FUNCTION COSMOS DB AN AZURE S...
CREATING REAL TIME DASHBOARD WITH BLAZOR, AZURE FUNCTION COSMOS DB AN AZURE S...CREATING REAL TIME DASHBOARD WITH BLAZOR, AZURE FUNCTION COSMOS DB AN AZURE S...
CREATING REAL TIME DASHBOARD WITH BLAZOR, AZURE FUNCTION COSMOS DB AN AZURE S...CodeOps Technologies LLP
 
Aoyagi Lab Colloquium - 2015-06-01
Aoyagi Lab Colloquium - 2015-06-01Aoyagi Lab Colloquium - 2015-06-01
Aoyagi Lab Colloquium - 2015-06-01Michele Bianchi
 
Machine learning on kubernetes
Machine learning on kubernetesMachine learning on kubernetes
Machine learning on kubernetesAnirudh Ramanathan
 
AWS Community Day - Amy Negrette - Gateways to Gateways
AWS Community Day - Amy Negrette - Gateways to GatewaysAWS Community Day - Amy Negrette - Gateways to Gateways
AWS Community Day - Amy Negrette - Gateways to GatewaysAWS Chicago
 
Service Meshes, but at what cost?
Service Meshes, but at what cost?Service Meshes, but at what cost?
Service Meshes, but at what cost?Lee Calcote
 
ODSC webinar "Kubeflow, MLFlow and Beyond — augmenting ML delivery" Stepan Pu...
ODSC webinar "Kubeflow, MLFlow and Beyond — augmenting ML delivery" Stepan Pu...ODSC webinar "Kubeflow, MLFlow and Beyond — augmenting ML delivery" Stepan Pu...
ODSC webinar "Kubeflow, MLFlow and Beyond — augmenting ML delivery" Stepan Pu...Provectus
 
Reactive microservices with eclipse vert.x
Reactive microservices with eclipse vert.xReactive microservices with eclipse vert.x
Reactive microservices with eclipse vert.xRam Maddali
 
App Mod 04: Reactive microservices with eclipse vert.x
App Mod 04: Reactive microservices with eclipse vert.xApp Mod 04: Reactive microservices with eclipse vert.x
App Mod 04: Reactive microservices with eclipse vert.xJudy Breedlove
 
Azure Integration DTAP Series, How to go from Development to Production – Par...
Azure Integration DTAP Series, How to go from Development to Production – Par...Azure Integration DTAP Series, How to go from Development to Production – Par...
Azure Integration DTAP Series, How to go from Development to Production – Par...BizTalk360
 
Ray distributed python framework
Ray distributed python framework Ray distributed python framework
Ray distributed python framework AryanJadon3
 
BUILD, TEST & DEPLOY .NET CORE APPS IN AZURE DEVOPS
BUILD, TEST & DEPLOY .NET CORE APPS IN AZURE DEVOPSBUILD, TEST & DEPLOY .NET CORE APPS IN AZURE DEVOPS
BUILD, TEST & DEPLOY .NET CORE APPS IN AZURE DEVOPSCodeOps Technologies LLP
 
Bodywork - GitOps for Machine Learning
Bodywork - GitOps for Machine LearningBodywork - GitOps for Machine Learning
Bodywork - GitOps for Machine LearningAlex Ioannides
 
Serverless Computing with Azure
Serverless Computing with AzureServerless Computing with Azure
Serverless Computing with AzureAnalben Mehta
 
GAB 2017 - Logic Apps and Azure Functions
GAB 2017 - Logic Apps and Azure FunctionsGAB 2017 - Logic Apps and Azure Functions
GAB 2017 - Logic Apps and Azure FunctionsWagner Silveira
 
Pivotal tracker getting started
Pivotal tracker getting startedPivotal tracker getting started
Pivotal tracker getting startedAhmed Amer
 

What's hot (20)

Serverless Orchestration with Azure Durable Functions
Serverless Orchestration with Azure Durable FunctionsServerless Orchestration with Azure Durable Functions
Serverless Orchestration with Azure Durable Functions
 
A Cloud Native Walkthrough: Infrastructure + Operations
A Cloud Native Walkthrough: Infrastructure + OperationsA Cloud Native Walkthrough: Infrastructure + Operations
A Cloud Native Walkthrough: Infrastructure + Operations
 
A Consul Service Mesh Integration Case Study with Presto
A Consul Service Mesh Integration Case Study with PrestoA Consul Service Mesh Integration Case Study with Presto
A Consul Service Mesh Integration Case Study with Presto
 
Knative, Serverless on Kubernetes, and Openshift
Knative, Serverless on Kubernetes, and OpenshiftKnative, Serverless on Kubernetes, and Openshift
Knative, Serverless on Kubernetes, and Openshift
 
CREATING REAL TIME DASHBOARD WITH BLAZOR, AZURE FUNCTION COSMOS DB AN AZURE S...
CREATING REAL TIME DASHBOARD WITH BLAZOR, AZURE FUNCTION COSMOS DB AN AZURE S...CREATING REAL TIME DASHBOARD WITH BLAZOR, AZURE FUNCTION COSMOS DB AN AZURE S...
CREATING REAL TIME DASHBOARD WITH BLAZOR, AZURE FUNCTION COSMOS DB AN AZURE S...
 
Knative serving
Knative servingKnative serving
Knative serving
 
Aoyagi Lab Colloquium - 2015-06-01
Aoyagi Lab Colloquium - 2015-06-01Aoyagi Lab Colloquium - 2015-06-01
Aoyagi Lab Colloquium - 2015-06-01
 
Machine learning on kubernetes
Machine learning on kubernetesMachine learning on kubernetes
Machine learning on kubernetes
 
AWS Community Day - Amy Negrette - Gateways to Gateways
AWS Community Day - Amy Negrette - Gateways to GatewaysAWS Community Day - Amy Negrette - Gateways to Gateways
AWS Community Day - Amy Negrette - Gateways to Gateways
 
Service Meshes, but at what cost?
Service Meshes, but at what cost?Service Meshes, but at what cost?
Service Meshes, but at what cost?
 
ODSC webinar "Kubeflow, MLFlow and Beyond — augmenting ML delivery" Stepan Pu...
ODSC webinar "Kubeflow, MLFlow and Beyond — augmenting ML delivery" Stepan Pu...ODSC webinar "Kubeflow, MLFlow and Beyond — augmenting ML delivery" Stepan Pu...
ODSC webinar "Kubeflow, MLFlow and Beyond — augmenting ML delivery" Stepan Pu...
 
Reactive microservices with eclipse vert.x
Reactive microservices with eclipse vert.xReactive microservices with eclipse vert.x
Reactive microservices with eclipse vert.x
 
App Mod 04: Reactive microservices with eclipse vert.x
App Mod 04: Reactive microservices with eclipse vert.xApp Mod 04: Reactive microservices with eclipse vert.x
App Mod 04: Reactive microservices with eclipse vert.x
 
Azure Integration DTAP Series, How to go from Development to Production – Par...
Azure Integration DTAP Series, How to go from Development to Production – Par...Azure Integration DTAP Series, How to go from Development to Production – Par...
Azure Integration DTAP Series, How to go from Development to Production – Par...
 
Ray distributed python framework
Ray distributed python framework Ray distributed python framework
Ray distributed python framework
 
BUILD, TEST & DEPLOY .NET CORE APPS IN AZURE DEVOPS
BUILD, TEST & DEPLOY .NET CORE APPS IN AZURE DEVOPSBUILD, TEST & DEPLOY .NET CORE APPS IN AZURE DEVOPS
BUILD, TEST & DEPLOY .NET CORE APPS IN AZURE DEVOPS
 
Bodywork - GitOps for Machine Learning
Bodywork - GitOps for Machine LearningBodywork - GitOps for Machine Learning
Bodywork - GitOps for Machine Learning
 
Serverless Computing with Azure
Serverless Computing with AzureServerless Computing with Azure
Serverless Computing with Azure
 
GAB 2017 - Logic Apps and Azure Functions
GAB 2017 - Logic Apps and Azure FunctionsGAB 2017 - Logic Apps and Azure Functions
GAB 2017 - Logic Apps and Azure Functions
 
Pivotal tracker getting started
Pivotal tracker getting startedPivotal tracker getting started
Pivotal tracker getting started
 

Similar to SigOpt at MLconf - Reducing Operational Barriers to Model Training

Machine Learning Infrastructure
Machine Learning InfrastructureMachine Learning Infrastructure
Machine Learning InfrastructureSigOpt
 
Effective .NET Core Unit Testing with SQLite and Dapper
Effective .NET Core Unit Testing with SQLite and DapperEffective .NET Core Unit Testing with SQLite and Dapper
Effective .NET Core Unit Testing with SQLite and DapperMike Melusky
 
Webinar: Começando seus trabalhos com Machine Learning utilizando ferramentas...
Webinar: Começando seus trabalhos com Machine Learning utilizando ferramentas...Webinar: Começando seus trabalhos com Machine Learning utilizando ferramentas...
Webinar: Começando seus trabalhos com Machine Learning utilizando ferramentas...Embarcados
 
Effective .NET Core Unit Testing with SQLite and Dapper
Effective .NET Core Unit Testing with SQLite and DapperEffective .NET Core Unit Testing with SQLite and Dapper
Effective .NET Core Unit Testing with SQLite and DapperMike Melusky
 
Pitfalls of machine learning in production
Pitfalls of machine learning in productionPitfalls of machine learning in production
Pitfalls of machine learning in productionAntoine Sauray
 
Py data scikit-production
Py data scikit-productionPy data scikit-production
Py data scikit-productionTuri, Inc.
 
Five Ways to Fix Your SQL Server Dev-Test Problems
Five Ways to Fix Your SQL Server Dev-Test Problems Five Ways to Fix Your SQL Server Dev-Test Problems
Five Ways to Fix Your SQL Server Dev-Test Problems Catalogic Software
 
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...Databricks
 
JS Fest 2018. Никита Галкин. Микросервисная архитектура с переиспользуемыми к...
JS Fest 2018. Никита Галкин. Микросервисная архитектура с переиспользуемыми к...JS Fest 2018. Никита Галкин. Микросервисная архитектура с переиспользуемыми к...
JS Fest 2018. Никита Галкин. Микросервисная архитектура с переиспользуемыми к...JSFestUA
 
Why Upgrade to v8.6?
Why Upgrade to v8.6?Why Upgrade to v8.6?
Why Upgrade to v8.6?BillCavaUs
 
Ship code like a keptn
Ship code like a keptnShip code like a keptn
Ship code like a keptnRob Jahn
 
Consolidating MLOps at One of Europe’s Biggest Airports
Consolidating MLOps at One of Europe’s Biggest AirportsConsolidating MLOps at One of Europe’s Biggest Airports
Consolidating MLOps at One of Europe’s Biggest AirportsDatabricks
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaData Science Milan
 
Using Puppet, Ansible, and MongoDB Ops Manager Together to Create Your Own On...
Using Puppet, Ansible, and MongoDB Ops Manager Together to Create Your Own On...Using Puppet, Ansible, and MongoDB Ops Manager Together to Create Your Own On...
Using Puppet, Ansible, and MongoDB Ops Manager Together to Create Your Own On...MongoDB
 
MongoDB World 2018: Using Puppet, Ansible and Ops Manager to Create Your Own ...
MongoDB World 2018: Using Puppet, Ansible and Ops Manager to Create Your Own ...MongoDB World 2018: Using Puppet, Ansible and Ops Manager to Create Your Own ...
MongoDB World 2018: Using Puppet, Ansible and Ops Manager to Create Your Own ...MongoDB
 
MLOps and Reproducible ML on AWS with Kubeflow and SageMaker
MLOps and Reproducible ML on AWS with Kubeflow and SageMakerMLOps and Reproducible ML on AWS with Kubeflow and SageMaker
MLOps and Reproducible ML on AWS with Kubeflow and SageMakerProvectus
 
DevOps Days Boston 2017: Real-world Kubernetes for DevOps
DevOps Days Boston 2017: Real-world Kubernetes for DevOpsDevOps Days Boston 2017: Real-world Kubernetes for DevOps
DevOps Days Boston 2017: Real-world Kubernetes for DevOpsAmbassador Labs
 
How to Build Single Page HTML5 Apps that Scale
How to Build Single Page HTML5 Apps that ScaleHow to Build Single Page HTML5 Apps that Scale
How to Build Single Page HTML5 Apps that ScalePhil Leggetter
 
Democratizing machine learning on kubernetes
Democratizing machine learning on kubernetesDemocratizing machine learning on kubernetes
Democratizing machine learning on kubernetesDocker, Inc.
 

Similar to SigOpt at MLconf - Reducing Operational Barriers to Model Training (20)

Machine Learning Infrastructure
Machine Learning InfrastructureMachine Learning Infrastructure
Machine Learning Infrastructure
 
Effective .NET Core Unit Testing with SQLite and Dapper
Effective .NET Core Unit Testing with SQLite and DapperEffective .NET Core Unit Testing with SQLite and Dapper
Effective .NET Core Unit Testing with SQLite and Dapper
 
Webinar: Começando seus trabalhos com Machine Learning utilizando ferramentas...
Webinar: Começando seus trabalhos com Machine Learning utilizando ferramentas...Webinar: Começando seus trabalhos com Machine Learning utilizando ferramentas...
Webinar: Começando seus trabalhos com Machine Learning utilizando ferramentas...
 
Effective .NET Core Unit Testing with SQLite and Dapper
Effective .NET Core Unit Testing with SQLite and DapperEffective .NET Core Unit Testing with SQLite and Dapper
Effective .NET Core Unit Testing with SQLite and Dapper
 
Developing Digital Twins
Developing Digital TwinsDeveloping Digital Twins
Developing Digital Twins
 
Pitfalls of machine learning in production
Pitfalls of machine learning in productionPitfalls of machine learning in production
Pitfalls of machine learning in production
 
Py data scikit-production
Py data scikit-productionPy data scikit-production
Py data scikit-production
 
Five Ways to Fix Your SQL Server Dev-Test Problems
Five Ways to Fix Your SQL Server Dev-Test Problems Five Ways to Fix Your SQL Server Dev-Test Problems
Five Ways to Fix Your SQL Server Dev-Test Problems
 
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
 
JS Fest 2018. Никита Галкин. Микросервисная архитектура с переиспользуемыми к...
JS Fest 2018. Никита Галкин. Микросервисная архитектура с переиспользуемыми к...JS Fest 2018. Никита Галкин. Микросервисная архитектура с переиспользуемыми к...
JS Fest 2018. Никита Галкин. Микросервисная архитектура с переиспользуемыми к...
 
Why Upgrade to v8.6?
Why Upgrade to v8.6?Why Upgrade to v8.6?
Why Upgrade to v8.6?
 
Ship code like a keptn
Ship code like a keptnShip code like a keptn
Ship code like a keptn
 
Consolidating MLOps at One of Europe’s Biggest Airports
Consolidating MLOps at One of Europe’s Biggest AirportsConsolidating MLOps at One of Europe’s Biggest Airports
Consolidating MLOps at One of Europe’s Biggest Airports
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at Helixa
 
Using Puppet, Ansible, and MongoDB Ops Manager Together to Create Your Own On...
Using Puppet, Ansible, and MongoDB Ops Manager Together to Create Your Own On...Using Puppet, Ansible, and MongoDB Ops Manager Together to Create Your Own On...
Using Puppet, Ansible, and MongoDB Ops Manager Together to Create Your Own On...
 
MongoDB World 2018: Using Puppet, Ansible and Ops Manager to Create Your Own ...
MongoDB World 2018: Using Puppet, Ansible and Ops Manager to Create Your Own ...MongoDB World 2018: Using Puppet, Ansible and Ops Manager to Create Your Own ...
MongoDB World 2018: Using Puppet, Ansible and Ops Manager to Create Your Own ...
 
MLOps and Reproducible ML on AWS with Kubeflow and SageMaker
MLOps and Reproducible ML on AWS with Kubeflow and SageMakerMLOps and Reproducible ML on AWS with Kubeflow and SageMaker
MLOps and Reproducible ML on AWS with Kubeflow and SageMaker
 
DevOps Days Boston 2017: Real-world Kubernetes for DevOps
DevOps Days Boston 2017: Real-world Kubernetes for DevOpsDevOps Days Boston 2017: Real-world Kubernetes for DevOps
DevOps Days Boston 2017: Real-world Kubernetes for DevOps
 
How to Build Single Page HTML5 Apps that Scale
How to Build Single Page HTML5 Apps that ScaleHow to Build Single Page HTML5 Apps that Scale
How to Build Single Page HTML5 Apps that Scale
 
Democratizing machine learning on kubernetes
Democratizing machine learning on kubernetesDemocratizing machine learning on kubernetes
Democratizing machine learning on kubernetes
 

More from SigOpt

Optimizing BERT and Natural Language Models with SigOpt Experiment Management
Optimizing BERT and Natural Language Models with SigOpt Experiment ManagementOptimizing BERT and Natural Language Models with SigOpt Experiment Management
Optimizing BERT and Natural Language Models with SigOpt Experiment ManagementSigOpt
 
Experiment Management for the Enterprise
Experiment Management for the EnterpriseExperiment Management for the Enterprise
Experiment Management for the EnterpriseSigOpt
 
Efficient NLP by Distilling BERT and Multimetric Optimization
Efficient NLP by Distilling BERT and Multimetric OptimizationEfficient NLP by Distilling BERT and Multimetric Optimization
Efficient NLP by Distilling BERT and Multimetric OptimizationSigOpt
 
Detecting COVID-19 Cases with Deep Learning
Detecting COVID-19 Cases with Deep LearningDetecting COVID-19 Cases with Deep Learning
Detecting COVID-19 Cases with Deep LearningSigOpt
 
Metric Management: a SigOpt Applied Use Case
Metric Management: a SigOpt Applied Use CaseMetric Management: a SigOpt Applied Use Case
Metric Management: a SigOpt Applied Use CaseSigOpt
 
Tuning for Systematic Trading: Talk 3: Training, Tuning, and Metric Strategy
Tuning for Systematic Trading: Talk 3: Training, Tuning, and Metric StrategyTuning for Systematic Trading: Talk 3: Training, Tuning, and Metric Strategy
Tuning for Systematic Trading: Talk 3: Training, Tuning, and Metric StrategySigOpt
 
Tuning for Systematic Trading: Talk 2: Deep Learning
Tuning for Systematic Trading: Talk 2: Deep LearningTuning for Systematic Trading: Talk 2: Deep Learning
Tuning for Systematic Trading: Talk 2: Deep LearningSigOpt
 
Tuning for Systematic Trading: Talk 1
Tuning for Systematic Trading: Talk 1Tuning for Systematic Trading: Talk 1
Tuning for Systematic Trading: Talk 1SigOpt
 
Tuning Data Augmentation to Boost Model Performance
Tuning Data Augmentation to Boost Model PerformanceTuning Data Augmentation to Boost Model Performance
Tuning Data Augmentation to Boost Model PerformanceSigOpt
 
Advanced Optimization for the Enterprise Webinar
Advanced Optimization for the Enterprise WebinarAdvanced Optimization for the Enterprise Webinar
Advanced Optimization for the Enterprise WebinarSigOpt
 
Modeling at Scale: SigOpt at TWIMLcon 2019
Modeling at Scale: SigOpt at TWIMLcon 2019Modeling at Scale: SigOpt at TWIMLcon 2019
Modeling at Scale: SigOpt at TWIMLcon 2019SigOpt
 
Tuning 2.0: Advanced Optimization Techniques Webinar
Tuning 2.0: Advanced Optimization Techniques WebinarTuning 2.0: Advanced Optimization Techniques Webinar
Tuning 2.0: Advanced Optimization Techniques WebinarSigOpt
 
SigOpt at Ai4 Finance—Modeling at Scale
SigOpt at Ai4 Finance—Modeling at Scale SigOpt at Ai4 Finance—Modeling at Scale
SigOpt at Ai4 Finance—Modeling at Scale SigOpt
 
Interactive Tradeoffs Between Competing Offline Metrics with Bayesian Optimiz...
Interactive Tradeoffs Between Competing Offline Metrics with Bayesian Optimiz...Interactive Tradeoffs Between Competing Offline Metrics with Bayesian Optimiz...
Interactive Tradeoffs Between Competing Offline Metrics with Bayesian Optimiz...SigOpt
 
SigOpt at Uber Science Symposium - Exploring the spectrum of black-box optimi...
SigOpt at Uber Science Symposium - Exploring the spectrum of black-box optimi...SigOpt at Uber Science Symposium - Exploring the spectrum of black-box optimi...
SigOpt at Uber Science Symposium - Exploring the spectrum of black-box optimi...SigOpt
 
SigOpt at O'Reilly - Best Practices for Scaling Modeling Platforms
SigOpt at O'Reilly - Best Practices for Scaling Modeling PlatformsSigOpt at O'Reilly - Best Practices for Scaling Modeling Platforms
SigOpt at O'Reilly - Best Practices for Scaling Modeling PlatformsSigOpt
 
SigOpt at GTC - Tuning the Untunable
SigOpt at GTC - Tuning the UntunableSigOpt at GTC - Tuning the Untunable
SigOpt at GTC - Tuning the UntunableSigOpt
 
SigOpt at GTC - Reducing operational barriers to optimization
SigOpt at GTC - Reducing operational barriers to optimizationSigOpt at GTC - Reducing operational barriers to optimization
SigOpt at GTC - Reducing operational barriers to optimizationSigOpt
 
Lessons for an enterprise approach to modeling at scale
Lessons for an enterprise approach to modeling at scaleLessons for an enterprise approach to modeling at scale
Lessons for an enterprise approach to modeling at scaleSigOpt
 
Modeling at scale in systematic trading
Modeling at scale in systematic tradingModeling at scale in systematic trading
Modeling at scale in systematic tradingSigOpt
 

More from SigOpt (20)

Optimizing BERT and Natural Language Models with SigOpt Experiment Management
Optimizing BERT and Natural Language Models with SigOpt Experiment ManagementOptimizing BERT and Natural Language Models with SigOpt Experiment Management
Optimizing BERT and Natural Language Models with SigOpt Experiment Management
 
Experiment Management for the Enterprise
Experiment Management for the EnterpriseExperiment Management for the Enterprise
Experiment Management for the Enterprise
 
Efficient NLP by Distilling BERT and Multimetric Optimization
Efficient NLP by Distilling BERT and Multimetric OptimizationEfficient NLP by Distilling BERT and Multimetric Optimization
Efficient NLP by Distilling BERT and Multimetric Optimization
 
Detecting COVID-19 Cases with Deep Learning
Detecting COVID-19 Cases with Deep LearningDetecting COVID-19 Cases with Deep Learning
Detecting COVID-19 Cases with Deep Learning
 
Metric Management: a SigOpt Applied Use Case
Metric Management: a SigOpt Applied Use CaseMetric Management: a SigOpt Applied Use Case
Metric Management: a SigOpt Applied Use Case
 
Tuning for Systematic Trading: Talk 3: Training, Tuning, and Metric Strategy
Tuning for Systematic Trading: Talk 3: Training, Tuning, and Metric StrategyTuning for Systematic Trading: Talk 3: Training, Tuning, and Metric Strategy
Tuning for Systematic Trading: Talk 3: Training, Tuning, and Metric Strategy
 
Tuning for Systematic Trading: Talk 2: Deep Learning
Tuning for Systematic Trading: Talk 2: Deep LearningTuning for Systematic Trading: Talk 2: Deep Learning
Tuning for Systematic Trading: Talk 2: Deep Learning
 
Tuning for Systematic Trading: Talk 1
Tuning for Systematic Trading: Talk 1Tuning for Systematic Trading: Talk 1
Tuning for Systematic Trading: Talk 1
 
Tuning Data Augmentation to Boost Model Performance
Tuning Data Augmentation to Boost Model PerformanceTuning Data Augmentation to Boost Model Performance
Tuning Data Augmentation to Boost Model Performance
 
Advanced Optimization for the Enterprise Webinar
Advanced Optimization for the Enterprise WebinarAdvanced Optimization for the Enterprise Webinar
Advanced Optimization for the Enterprise Webinar
 
Modeling at Scale: SigOpt at TWIMLcon 2019
Modeling at Scale: SigOpt at TWIMLcon 2019Modeling at Scale: SigOpt at TWIMLcon 2019
Modeling at Scale: SigOpt at TWIMLcon 2019
 
Tuning 2.0: Advanced Optimization Techniques Webinar
Tuning 2.0: Advanced Optimization Techniques WebinarTuning 2.0: Advanced Optimization Techniques Webinar
Tuning 2.0: Advanced Optimization Techniques Webinar
 
SigOpt at Ai4 Finance—Modeling at Scale
SigOpt at Ai4 Finance—Modeling at Scale SigOpt at Ai4 Finance—Modeling at Scale
SigOpt at Ai4 Finance—Modeling at Scale
 
Interactive Tradeoffs Between Competing Offline Metrics with Bayesian Optimiz...
Interactive Tradeoffs Between Competing Offline Metrics with Bayesian Optimiz...Interactive Tradeoffs Between Competing Offline Metrics with Bayesian Optimiz...
Interactive Tradeoffs Between Competing Offline Metrics with Bayesian Optimiz...
 
SigOpt at Uber Science Symposium - Exploring the spectrum of black-box optimi...
SigOpt at Uber Science Symposium - Exploring the spectrum of black-box optimi...SigOpt at Uber Science Symposium - Exploring the spectrum of black-box optimi...
SigOpt at Uber Science Symposium - Exploring the spectrum of black-box optimi...
 
SigOpt at O'Reilly - Best Practices for Scaling Modeling Platforms
SigOpt at O'Reilly - Best Practices for Scaling Modeling PlatformsSigOpt at O'Reilly - Best Practices for Scaling Modeling Platforms
SigOpt at O'Reilly - Best Practices for Scaling Modeling Platforms
 
SigOpt at GTC - Tuning the Untunable
SigOpt at GTC - Tuning the UntunableSigOpt at GTC - Tuning the Untunable
SigOpt at GTC - Tuning the Untunable
 
SigOpt at GTC - Reducing operational barriers to optimization
SigOpt at GTC - Reducing operational barriers to optimizationSigOpt at GTC - Reducing operational barriers to optimization
SigOpt at GTC - Reducing operational barriers to optimization
 
Lessons for an enterprise approach to modeling at scale
Lessons for an enterprise approach to modeling at scaleLessons for an enterprise approach to modeling at scale
Lessons for an enterprise approach to modeling at scale
 
Modeling at scale in systematic trading
Modeling at scale in systematic tradingModeling at scale in systematic trading
Modeling at scale in systematic trading
 

Recently uploaded

Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessWSO2
 
INGKA DIGITAL: Linked Metadata by Design
INGKA DIGITAL: Linked Metadata by DesignINGKA DIGITAL: Linked Metadata by Design
INGKA DIGITAL: Linked Metadata by DesignNeo4j
 
De mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FMEDe mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FMEJelle | Nordend
 
JustNaik Solution Deck (stage bus sector)
JustNaik Solution Deck (stage bus sector)JustNaik Solution Deck (stage bus sector)
JustNaik Solution Deck (stage bus sector)Max Lee
 
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)Gáspár Nagy
 
Agnieszka Andrzejewska - BIM School Course in Kraków
Agnieszka Andrzejewska - BIM School Course in KrakówAgnieszka Andrzejewska - BIM School Course in Kraków
Agnieszka Andrzejewska - BIM School Course in Krakówbim.edu.pl
 
GraphAware - Transforming policing with graph-based intelligence analysis
GraphAware - Transforming policing with graph-based intelligence analysisGraphAware - Transforming policing with graph-based intelligence analysis
GraphAware - Transforming policing with graph-based intelligence analysisNeo4j
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTier1 app
 
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Anthony Dahanne
 
Mastering Windows 7 A Comprehensive Guide for Power Users .pdf
Mastering Windows 7 A Comprehensive Guide for Power Users .pdfMastering Windows 7 A Comprehensive Guide for Power Users .pdf
Mastering Windows 7 A Comprehensive Guide for Power Users .pdfmbmh111980
 
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAGAI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAGAlluxio, Inc.
 
A Guideline to Zendesk to Re:amaze Data Migration
A Guideline to Zendesk to Re:amaze Data MigrationA Guideline to Zendesk to Re:amaze Data Migration
A Guideline to Zendesk to Re:amaze Data MigrationHelp Desk Migration
 
Studiovity film pre-production and screenwriting software
Studiovity film pre-production and screenwriting softwareStudiovity film pre-production and screenwriting software
Studiovity film pre-production and screenwriting softwareinfo611746
 
How to install and activate eGrabber JobGrabber
How to install and activate eGrabber JobGrabberHow to install and activate eGrabber JobGrabber
How to install and activate eGrabber JobGrabbereGrabber
 
Crafting the Perfect Measurement Sheet with PLM Integration
Crafting the Perfect Measurement Sheet with PLM IntegrationCrafting the Perfect Measurement Sheet with PLM Integration
Crafting the Perfect Measurement Sheet with PLM IntegrationWave PLM
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2
 
Secure Software Ecosystem Teqnation 2024
Secure Software Ecosystem Teqnation 2024Secure Software Ecosystem Teqnation 2024
Secure Software Ecosystem Teqnation 2024Soroosh Khodami
 
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdfA Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdfkalichargn70th171
 

Recently uploaded (20)

Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
 
INGKA DIGITAL: Linked Metadata by Design
INGKA DIGITAL: Linked Metadata by DesignINGKA DIGITAL: Linked Metadata by Design
INGKA DIGITAL: Linked Metadata by Design
 
De mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FMEDe mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FME
 
Top Mobile App Development Companies 2024
Top Mobile App Development Companies 2024Top Mobile App Development Companies 2024
Top Mobile App Development Companies 2024
 
JustNaik Solution Deck (stage bus sector)
JustNaik Solution Deck (stage bus sector)JustNaik Solution Deck (stage bus sector)
JustNaik Solution Deck (stage bus sector)
 
Corporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMSCorporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMS
 
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
 
Agnieszka Andrzejewska - BIM School Course in Kraków
Agnieszka Andrzejewska - BIM School Course in KrakówAgnieszka Andrzejewska - BIM School Course in Kraków
Agnieszka Andrzejewska - BIM School Course in Kraków
 
GraphAware - Transforming policing with graph-based intelligence analysis
GraphAware - Transforming policing with graph-based intelligence analysisGraphAware - Transforming policing with graph-based intelligence analysis
GraphAware - Transforming policing with graph-based intelligence analysis
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
 
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
 
Mastering Windows 7 A Comprehensive Guide for Power Users .pdf
Mastering Windows 7 A Comprehensive Guide for Power Users .pdfMastering Windows 7 A Comprehensive Guide for Power Users .pdf
Mastering Windows 7 A Comprehensive Guide for Power Users .pdf
 
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAGAI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
 
A Guideline to Zendesk to Re:amaze Data Migration
A Guideline to Zendesk to Re:amaze Data MigrationA Guideline to Zendesk to Re:amaze Data Migration
A Guideline to Zendesk to Re:amaze Data Migration
 
Studiovity film pre-production and screenwriting software
Studiovity film pre-production and screenwriting softwareStudiovity film pre-production and screenwriting software
Studiovity film pre-production and screenwriting software
 
How to install and activate eGrabber JobGrabber
How to install and activate eGrabber JobGrabberHow to install and activate eGrabber JobGrabber
How to install and activate eGrabber JobGrabber
 
Crafting the Perfect Measurement Sheet with PLM Integration
Crafting the Perfect Measurement Sheet with PLM IntegrationCrafting the Perfect Measurement Sheet with PLM Integration
Crafting the Perfect Measurement Sheet with PLM Integration
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
Secure Software Ecosystem Teqnation 2024
Secure Software Ecosystem Teqnation 2024Secure Software Ecosystem Teqnation 2024
Secure Software Ecosystem Teqnation 2024
 
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdfA Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
 

SigOpt at MLconf - Reducing Operational Barriers to Model Training

  • 1. SigOpt. Confidential. Reducing Operational Barriers to Model Training Alexandra Johnson alexandra@sigopt.com @alexandraj777
  • 3. SigOpt. Confidential.3 Operational Barriers Machine learning experts specialize in: • Gathering data • Building models • Extracting insights Infrastructure engineers specialize in: • Building shared tools • Application scalability and performance • Keeping track of interactions between large distributed systems The Challenge: • Machine learning experts want to maximize the performance of their models • SigOpt provides an API for hyperparameter optimization (HPO) • SigOpt HPO helps ML experts maximize the performance of their models! • ML experts need to use clusters to properly perform HPO
  • 4. SigOpt. Confidential.4 Machine Learning Infrastructure Model building workflow problems Infrastructure / devops solutions+ = Machine learning infrastructure
  • 5. SigOpt. Confidential.5 Case Study: Building SigOpt Orchestrate • Project started in 2018 to bridge ML and infrastructure • What problems did our customers ask us to solve? • How did a challenge for the user turn into a technical problem? • Which tools / technologies did we use?
  • 6. SigOpt. Confidential.6 Challenge #1: Can't Train Model on Laptop Problem: Setup each remote machine Initial Solution: • Write a setup script to install dependencies • SCP data, code, and setup script to every remote machine
  • 7. SigOpt. Confidential.7 Solution #1: Containerize! Problem: Setup each remote machine New Solution: • Containerize code and dependencies on the user's local environment • Push the image to a registry • Each machine pulls the image from a registry Registry
  • 8. SigOpt. Confidential.8 Challenge #2: Start Training in Parallel Problem: Kick off the hyperparameter optimization job on six machines at once Initial Solution: • Open a tmux window on every remote instance • SSH over command to run setup script into each tmux window • SSH over command to train model into each tmux window
  • 9. SigOpt. Confidential.9 Solution #2: Kubernetes! Problem: Kick off the hyperparameter optimization job on six machines at once New Solution: • Spin up AWS EKS (Kubernetes) cluster • Create a job spec • "run 6 copies of this container at the same time" • Submit job spec to Kubernetes API • Kubernetes starts the job on the cluster
  • 10. SigOpt. Confidential.10 Challenge #3: View Progress and Debug Problem: View the status of a hyperparameter optimization job at a glance Initial Solution: • Save hostname and error information as metadata in calls to external API • SSH into machines and view the logs directly (pre-Kubernetes) • Use Kubernetes CLI to view logs
  • 11. SigOpt. Confidential.11 Solution #3: Build a CLI! Problem: View the status of a hyperparameter optimization job at a glance New Solution: • Write an interface for the data scientist to interact with the infrastructure tool • We chose a command line interface • Serves as an abstraction on top of Kubernetes APIs + externals APIs • Screenshots (top and bottom) ○ sigopt logs <experiment_id> ○ sigopt status <experiment_id>
  • 12. SigOpt. Confidential.12 Final Thoughts... • We're hiring! • Connect with us • Paper: Orchestrate: Infrastructure for Enabling Parallelism during Hyperparameter Optimization, Alexandra Johnson and Michael McCourt
  • 13. SigOpt. Confidential. Thank you! Any questions? Alexandra Johnson alexandra@sigopt.com @alexandraj777