Evaluation System for a
Bayesian Optimization Service
Ian Dewancker1, Michael McCourt1, Scott Clark1, Patrick Hayes1, Alexandra Johnson1 and George Ke2
1 SigOpt, 2 University of Waterloo
Objectives
An evaluation framework for rigorously testing and
quickly understanding the impact of changes to the
SigOpt optimization service.
• Perform end-to-end testing of service
• Compare between algorithm versions
• Compare against external baselines
Introduction
SigOpt offers an optimization service to help customers tune complex systems, simulations and models. Our optimization engine applies several concepts from Bayesian optimization [1] and machine learning to optimize customers' metrics as quickly as possible. In particular, we consider problems where the maximum is sought for an expensive function f : X → R,

x_opt = argmax_{x ∈ X} f(x).
SigOpt’s core optimization engine is a closed-source
fork of the open-source MOE project [2]. The SigOpt
service supports a succinct set of HTTP API end-
points for optimizing objective functions.
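The API-driven workflow amounts to a suggest/observe loop: the service proposes a point, the customer evaluates the expensive function, and the observation is reported back. The sketch below is a minimal, self-contained illustration of that loop; the `suggest` function and its random-search behavior are stand-ins for the actual SigOpt engine and API, not its real endpoints.

```python
import random

def suggest(bounds, rng):
    """Stand-in for the engine's suggestion step: here, uniform random search."""
    return [rng.uniform(lo, hi) for lo, hi in bounds]

def optimize(f, bounds, budget, seed=0):
    """Suggest/observe loop: request a point, evaluate the expensive function
    locally, and track the best value seen so far."""
    rng = random.Random(seed)
    best_x, best_y = None, float("-inf")
    for _ in range(budget):
        x = suggest(bounds, rng)
        y = f(x)  # the expensive evaluation happens on the customer's side
        if y > best_y:
            best_x, best_y = x, y
    return best_x, best_y

# Maximize a simple concave objective on [-5, 5]^2.
x_best, y_best = optimize(lambda x: -(x[0] ** 2 + x[1] ** 2),
                          [(-5, 5), (-5, 5)], budget=40)
```

A Bayesian engine would replace the random sampler with a model-guided suggestion, but the surrounding loop and bookkeeping are the same.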
Figure 1: Hypothetical optimization methods A and B both
achieve the same Best Found of 0.97 after 40 evaluations.
Method A however finds the optimum in fewer evaluations.
Metrics
The performance metrics we consider for comparisons
on a given objective function are the best value seen
by the end of the optimization (Best Found), and
the area under the best seen curve (AUC). The
AUC metric can help to better differentiate performance, as shown in Figure 1. Optimization algorithms are run at least 20 times on each function.
The distributions of the performance metrics are com-
pared using the non-parametric Mann-Whitney U
test, used in previous studies of Bayesian optimiza-
tion methods [3] [4].
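Both metrics derive from the best-seen curve (the running maximum of observed objective values). The sketch below computes Best Found and AUC from a raw trace, and evaluates the Mann-Whitney U statistic directly from its pairwise definition; in practice a library routine (e.g. scipy.stats.mannwhitneyu) would be used to also obtain p-values. The example traces are illustrative, not real benchmark data.

```python
def best_seen(trace):
    """Running maximum: best objective value observed up to each evaluation."""
    best, curve = float("-inf"), []
    for y in trace:
        best = max(best, y)
        curve.append(best)
    return curve

def best_found(trace):
    """Best value seen by the end of the optimization."""
    return best_seen(trace)[-1]

def auc(trace):
    """Area under the best-seen curve (simple sum; higher means faster progress)."""
    return sum(best_seen(trace))

def mann_whitney_u(xs, ys):
    """U statistic from its pairwise definition (ties count 0.5)."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in xs for y in ys)

# Two hypothetical traces with the same Best Found but different speed,
# mirroring methods A and B in Figure 1.
a = [0.1, 0.5, 0.4, 0.9, 0.97]   # reaches high values quickly
b = [0.1, 0.2, 0.3, 0.6, 0.97]   # same final value, slower progress
```

Here best_found(a) == best_found(b), but auc(a) > auc(b), which is exactly the distinction the AUC metric is meant to capture.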
Benchmark Suite
The tests for our evaluation system consist of closed-
form optimization functions [5] which are extensions
of an earlier set proposed for black-box optimization
evaluation. The benchmark suite is open source.
Infrastructure
Lightweight function evaluation processes are run
concurrently on a master machine with many cores.
External baseline methods including TPE, SMAC,
Spearmint, particle swarm optimization and random
search are run locally on master.
Figure 2: Architecture of evaluation system infrastructure
Each evaluation process communicates with an on-
demand cluster of SigOpt API workers, which in turn
co-ordinate each optimization request with a cluster
of instances running the SigOpt optimization engine
as well as a database used by the service. This design
closely follows the architecture of the production sys-
tem. Result data is archived in a simple, extensible
JSON format. The result data and summarizations
are then visualized on a small web application.
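A simple, extensible archive format can be realized as one JSON object per benchmark run. The record fields below are hypothetical, chosen for illustration; the poster does not publish the actual schema.

```python
import json

# Hypothetical result record for one run of one method on one test function.
record = {
    "function": "example_function_5d",
    "method": "sigopt",
    "run": 3,
    "observations": [
        {"evaluation": 1, "value": 0.42},
        {"evaluation": 2, "value": 0.61},
    ],
    "best_found": 0.61,
}

def archive(records, path):
    """Append-friendly archive: one JSON object per line (JSON Lines)."""
    with open(path, "w") as fh:
        for r in records:
            fh.write(json.dumps(r) + "\n")

def load(path):
    """Read the archive back into a list of result records."""
    with open(path) as fh:
        return [json.loads(line) for line in fh]
```

A line-oriented layout keeps the archive easy to append to from concurrent evaluation processes and easy to stream into a visualization front end.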
Visualization Tools
To better summarize the relative performance be-
tween two methods (A, B) using metric M, we cre-
ate histograms of the p-values returned by the Mann-
Whitney U tests. Test functions are first partitioned
by the relative expected value of the metric M, then
histograms are generated for the two sets.
wins(A > B)_M = { func | E[M_A] > E[M_B] }
wins(B > A)_M = { func | E[M_B] > E[M_A] }
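The partition above can be computed by estimating E[M] with the sample mean over repeated runs. The sketch below is a minimal version with made-up metric samples; function names and values are illustrative only.

```python
def mean(xs):
    return sum(xs) / len(xs)

def partition_wins(results_a, results_b):
    """Split test functions into wins(A > B) and wins(B > A) by comparing
    the sample mean of metric M for each method on each function."""
    wins_a = {f for f in results_a if mean(results_a[f]) > mean(results_b[f])}
    wins_b = {f for f in results_b if mean(results_b[f]) > mean(results_a[f])}
    return wins_a, wins_b

# Metric samples per test function (hypothetical, 2 runs each for brevity).
A = {"func_1": [0.9, 0.8], "func_2": [0.4, 0.5]}
B = {"func_1": [0.6, 0.7], "func_2": [0.7, 0.8]}
```

With these samples, A wins on func_1 and B wins on func_2; the p-value histogram for each partition then shows how significant those wins are.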
Figure 3: Above: Histograms for two comparable methods.
Below: Method B (in green) has highly significant wins on test
functions, method A (in red) has only low significance wins
For inspecting comparative performance on individ-
ual functions, we plot the best value of the objective
metric seen after each function evaluation.
Figure 4: Comparative best seen trace on test function
Figure 4 shows the best seen traces of SigOpt (in red)
and particle swarm optimization (in blue) on a given
test function.
Conclusions
Our evaluation system has become a valuable analysis tool when considering algorithm or system changes to the SigOpt optimization service. Data-driven performance analysis is an effective way to enable faster iteration and evaluation of a wide spectrum of ideas. The system continues to guide improvements to the core SigOpt service by providing empirical comparisons between internal changes and alternative optimization methods, as well as helping to expose errors and bugs.
References
[1] Jasper Snoek, Hugo Larochelle, and Ryan P Adams.
Practical Bayesian optimization of machine learning
algorithms.
In Advances in neural information processing systems,
pages 2951–2959, 2012.
[2] Scott Clark, Eric Liu, Peter Frazier, JiaLei Wang, Deniz
Oktay, and Norases Vesdapunt.
MOE: A global, black box optimization engine for real
world metric optimization.
https://github.com/Yelp/MOE, 2014.
[3] Frank Hutter, Holger H Hoos, and Kevin Leyton-Brown.
Sequential model-based optimization for general algorithm
configuration.
In Learning and Intelligent Optimization, pages 507–523.
Springer, 2011.
[4] Ian Dewancker, Michael McCourt, Scott Clark, Patrick
Hayes, Alexandra Johnson, and George Ke.
A strategy for ranking optimization methods using multiple
criteria.
In ICML AutoML Workshop, 2016.
[5] Michael McCourt.
Optimization Test Functions.
https://github.com/sigopt/evalset, 2016.
Contact Information
• Web: http://www.sigopt.com
• Email: contact@sigopt.com
