In this talk, Tobias Andreasen, who supports a number of our systematic trading customers, presents the intuition behind Bayesian optimization for model tuning with a single metric or with multiple, often competing, metrics. It often makes sense to analyze a second metric to avoid myopic training runs that overfit your data or otherwise fail to reflect, or even impede, performance in real-world scenarios.

Tuning for Systematic Trading: Talk 1

  1. SigOpt. Confidential. Talk #1: Intuition behind Bayesian optimization with and without multiple metrics. SigOpt Talk Series: Tuning for Systematic Trading. Tobias Andreasen, Machine Learning Engineer. Tuesday, March 24, 2020
  2. Abstract: SigOpt provides an extensive set of advanced features that help you, the expert, save time while increasing model performance through experimentation. Today we start this talk series with an overview of general black box optimization and efficient Bayesian optimization, and then extend this to multiple competing metrics.
  3. Motivation: 1. Overview of SigOpt 2. How to solve a black box optimization problem 3. Why you should optimize using multiple competing metrics
  4. Section 1: Overview of SigOpt
  5. Accelerate and amplify the impact of modelers everywhere
  6. Solution: Experiment, optimize and analyze at scale. • Experiment Insights: Track, analyze and reproduce any model to improve the productivity of your modeling • Optimization Engine: Automate hyperparameter tuning to maximize the performance and impact of your models • Enterprise Platform: Standardize experimentation across any combination of library, infrastructure, model or task • On-Premise, Hybrid/Multi
  7. SigOpt Features. • Experiment Insights: reproducibility; intuitive web dashboards; cross-team permissions and collaboration; advanced experiment visualizations; usage insights; parameter importance analysis • Optimization Engine: multimetric optimization; continuous, categorical, or integer parameters; constraints and failure regions; up to 10k observations, 100 parameters; multitask optimization and high parallelism; conditional parameters • Enterprise Platform: infrastructure agnostic; REST API; parallel resource scheduler; black-box interface (tunes without accessing any data); libraries for Python, Java, R, and MATLAB
  8. Section 2: How to solve a black box optimization problem
  9. Black Box Optimization: Why black box optimization? SigOpt was designed to empower you, the practitioner, to redefine most machine learning problems as black box optimization problems, with the benefits of: • Amplified performance: incremental gains in accuracy or other success metrics • Productivity gains: a consistent platform across tasks that facilitates sharing • Accelerated modeling: early elimination of non-scalable tasks • Compute efficiency: continuous, full utilization of infrastructure. SigOpt uses an ensemble of Bayesian and global optimization methods to solve these black box optimization problems.
  10. Black Box Optimization (diagram). Behind your firewall: training data and testing data feed an AI, ML, DL, or simulation model and its model evaluation or backtest. Across the REST API: SigOpt sends new configurations (configuration parameters or hyperparameters) and receives the objective metric, leading to better results. SigOpt components: Experiment Insights (track, organize, analyze and reproduce any model), Enterprise Platform (built to fit any stack and scale with your needs), Optimization Engine (explore and exploit with a variety of techniques).
  11. Black Box Optimization (diagram, simplified): the REST API connects your model to Experiment Insights, the Enterprise Platform, and the Optimization Engine, leading to better results.
  12. Hyperparameter Optimization (diagram labels): Model Tuning; Hyperparameter Search; Grid Search; Random Search; Bayesian Optimization; Training & Tuning; Evolutionary Algorithms; Deep Learning Architecture Search
  13. Comparison of methods (method: pro; con). • Manual Search: leverages expertise; not scalable, inconsistent • Grid Search: simple to implement; not scalable, often infeasible • Random Search: scalable; inefficient • Evolutionary Algorithms: effective at architecture search; very resource intensive • Bayesian Optimization: efficient, effective; can be tough to parallelize
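To make the scalability contrast in the comparison above concrete, here is a minimal sketch of grid search versus random search. The objective `score` and its two parameters (`learning_rate`, `momentum`) are hypothetical stand-ins for a model evaluation or backtest; nothing here comes from the talk or from SigOpt itself.

```python
import itertools
import random


def score(learning_rate, momentum):
    """Hypothetical objective standing in for a model evaluation or backtest."""
    return -(learning_rate - 0.1) ** 2 - (momentum - 0.9) ** 2


def grid_search(values_per_dim):
    """Evaluate every combination on a regular grid over [0, 1] x [0, 1]."""
    axis = [i / (values_per_dim - 1) for i in range(values_per_dim)]
    candidates = list(itertools.product(axis, axis))
    best = max(candidates, key=lambda p: score(*p))
    return best, len(candidates)


def random_search(budget, seed=0):
    """Evaluate a fixed budget of uniformly sampled points."""
    rng = random.Random(seed)
    candidates = [(rng.random(), rng.random()) for _ in range(budget)]
    best = max(candidates, key=lambda p: score(*p))
    return best, len(candidates)


# Grid cost is values_per_dim ** n_dims, so it grows exponentially with dimension.
grid_best, grid_evals = grid_search(values_per_dim=5)
# Random search cost is whatever budget the user picks, regardless of dimension.
random_best, random_evals = random_search(budget=25)
```

With five values per dimension the grid already needs 25 evaluations in two dimensions and would need 5 ** 10 in ten, which is the sense in which grid search is "often infeasible". Random search keeps the budget fixed but ignores everything learned from earlier evaluations.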
  21. Sequential Model Based Optimization (SMBO): a graphical depiction of the iterative process. Build a statistical model, choose the next point to maximize the acquisition function, then repeat: rebuild the statistical model and choose the next point again.
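The iterative process on this slide can be sketched as a short loop. This is only an illustration under loud assumptions: `objective` is a toy function, and `fit_surrogate` is a crude inverse-distance model standing in for the statistical model (a Gaussian process would normally play that role). It is not how SigOpt implements SMBO.

```python
import math


def objective(x):
    """Hypothetical expensive black box (e.g. a backtest); here a cheap toy function."""
    return math.sin(3.0 * x) - (x - 0.7) ** 2


def fit_surrogate(observations):
    """Crude stand-in for a statistical model: returns predict(x) -> (mean, uncertainty).

    The mean is an inverse-distance-weighted average of observed values and the
    uncertainty is the distance to the nearest observed point.
    """
    def predict(x):
        nearest = min(abs(x - xi) for xi, _ in observations)
        if nearest == 0.0:
            return next(yi for xi, yi in observations if xi == x), 0.0
        weights = [1.0 / (x - xi) ** 2 for xi, _ in observations]
        mean = sum(w * yi for w, (_, yi) in zip(weights, observations)) / sum(weights)
        return mean, nearest
    return predict


def acquisition(predict, x, kappa=2.0):
    """Upper-confidence-bound style utility: reward high mean and high uncertainty."""
    mean, uncertainty = predict(x)
    return mean + kappa * uncertainty


def smbo(budget=15, n_candidates=201):
    candidates = [2.0 * i / (n_candidates - 1) for i in range(n_candidates)]  # domain [0, 2]
    observations = [(x, objective(x)) for x in (0.0, 2.0)]  # initial design
    for _ in range(budget):
        predict = fit_surrogate(observations)  # 1. build a statistical model
        x_next = max(candidates, key=lambda x: acquisition(predict, x))  # 2. maximize acquisition
        observations.append((x_next, objective(x_next)))  # 3. evaluate and record
    return max(observations, key=lambda obs: obs[1])


best_x, best_y = smbo()
```

The structure is the point: each pass refits the model on all observations so far, and only the single most promising candidate is sent to the expensive objective.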
  22. Complexity #1: Optimizing the Model in SMBO. Gaussian processes: a powerful tool for modeling in spatial statistics. A standard tool for building statistical models is the Gaussian process [Fasshauer et al., 2015; Frazier, 2018]. • Assume that function values are jointly normally distributed. • Apply prior beliefs about mean behavior and covariance between observations. • Posterior beliefs about unobserved locations can then be computed in closed form. Different prior assumptions produce different statistical models.
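The three bullets above correspond to the standard Gaussian process posterior formulas, which can be sketched in plain Python. The zero prior mean, the squared-exponential kernel, and the `length_scale` and `noise` values are assumptions chosen for illustration; the talk does not say which prior SigOpt uses.

```python
import math


def rbf_kernel(a, b, length_scale=0.5, variance=1.0):
    """Squared-exponential covariance: the prior belief about how nearby values co-vary."""
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def gp_posterior(x_obs, y_obs, x_new, length_scale=0.5, noise=1e-8):
    """Posterior mean and variance of a zero-mean GP at a single point x_new."""
    K = [[rbf_kernel(a, b, length_scale) + (noise if i == j else 0.0)
          for j, b in enumerate(x_obs)] for i, a in enumerate(x_obs)]
    k_star = [rbf_kernel(a, x_new, length_scale) for a in x_obs]
    alpha = solve(K, y_obs)   # K^{-1} y
    v = solve(K, k_star)      # K^{-1} k_*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    variance = rbf_kernel(x_new, x_new, length_scale) - sum(k * vi for k, vi in zip(k_star, v))
    return mean, max(variance, 0.0)


x_obs, y_obs = [0.0, 1.0, 2.0], [0.0, 0.8, 0.1]
mean_at_obs, var_at_obs = gp_posterior(x_obs, y_obs, 1.0)   # at an observed location
mean_far, var_far = gp_posterior(x_obs, y_obs, 10.0)        # far from all observations
```

At an observed location the posterior collapses onto the data, while far from any observation it reverts to the prior (mean 0, variance 1 here). Changing `length_scale` changes how quickly that reversion happens, which is one way different prior assumptions produce different models.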
  23. Complexity #2: Optimizing the SMBO Process. Acquisition function: given a model, how should we choose the next point? An acquisition function is a strategy for defining the utility of a future sample, given the current samples, while balancing exploration and exploitation [Shahriari et al., 2016]. Different acquisition functions (EI, PI, KG, etc.) choose different points. • Exploration: learning about the whole function f • Exploitation: further resolving regions where good f values have already been observed
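Two of the acquisition functions named above, expected improvement (EI) and probability of improvement (PI), have simple closed forms when the model's prediction at a point is Gaussian. The sketch below uses the textbook formulas for maximization; the example means and standard deviations are made up for illustration.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean, std, best_so_far):
    """EI for maximization, given a Gaussian posterior N(mean, std**2) at a point."""
    if std <= 0.0:
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    return (mean - best_so_far) * normal_cdf(z) + std * normal_pdf(z)


def probability_of_improvement(mean, std, best_so_far):
    """PI for maximization: probability that the point beats the best value so far."""
    if std <= 0.0:
        return 1.0 if mean > best_so_far else 0.0
    return normal_cdf((mean - best_so_far) / std)


# Exploitation candidate: slightly better predicted mean, almost no uncertainty.
ei_exploit = expected_improvement(mean=1.05, std=0.01, best_so_far=1.0)
pi_exploit = probability_of_improvement(mean=1.05, std=0.01, best_so_far=1.0)
# Exploration candidate: worse predicted mean, but large uncertainty.
ei_explore = expected_improvement(mean=0.8, std=0.5, best_so_far=1.0)
pi_explore = probability_of_improvement(mean=0.8, std=0.5, best_so_far=1.0)
```

On these two candidates EI prefers the uncertain point while PI prefers the near-certain small gain, which illustrates how different acquisition functions choose different points from the same model.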
  24. The SigOpt API handles the complexity (diagram: REST API, better results)
  25. SigOpt Blog Posts: Intuition Behind Bayesian Optimization. Some relevant blog posts: ● Intuition Behind Covariance Kernels ● Approximation of Data ● Likelihood for Gaussian Processes ● Profile Likelihood vs. Kriging Variance ● Intuition behind Gaussian Processes ● Dealing with Troublesome Metrics. To find more blog posts, visit: https://sigopt.com/blog/
  26. Section 3: Why you should optimize using multiple competing metrics
  27. Optimizing Multiple Competing Metrics: Why optimize against multiple competing metrics? SigOpt allows the user to specify multiple competing metrics for either optimization or tracking, to better align the success of the experimentation with business value: • Multiple metrics: the option to define multiple metrics, which can yield new and interesting results • Insights, metric storage: insights through tracking of optimized and unoptimized metrics • Thresholds: the ability to define thresholds for success to better guide the optimizer. This process yields models that deliver more reliable business outcomes by helping you make the tradeoffs inherent in the modeling process and in real-world applications as well as possible.
  28. Optimizing Multiple Competing Metrics (diagram). Behind your firewall: training data and testing data feed AI, ML, DL, or simulation models and their model evaluation or backtest. Across the REST API: SigOpt sends new configurations (configuration parameters or hyperparameters) and receives the objective metric, leading to better results. SigOpt components: Experiment Insights (track, organize, analyze and reproduce any model), Enterprise Platform (built to fit any stack and scale with your needs), Optimization Engine (explore and exploit with a variety of techniques).
  29. Optimizing Multiple Competing Metrics (diagram, simplified): multiple optimized and unoptimized metrics are reported across the REST API to Experiment Insights, the Enterprise Platform, and the Optimization Engine, leading to better results.
  30. Balancing competing metrics to find the Pareto frontier. Most problems of practical relevance involve two or more competing metrics. • Neural networks: balancing accuracy and inference time • Materials design: balancing performance and maintenance cost • Algo trading: balancing Sharpe ratio and book size. In a situation with competing metrics, the solution is the set of all efficient points (the Pareto frontier). (Figure, "Looking for the right balance": the Pareto frontier on the boundary of the feasible region.)
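The notion of an efficient point can be made precise with a dominance check. The sketch below assumes every metric is to be maximized; the `(sharpe_ratio, book_size)` pairs are invented numbers used only to show the mechanics.

```python
def dominates(a, b):
    """True if a is at least as good as b on every metric and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_frontier(points):
    """Keep the points that no other point dominates (all metrics maximized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (sharpe_ratio, book_size) results from five configurations.
results = [(1.0, 5.0), (2.0, 4.0), (1.5, 3.0), (3.0, 1.0), (2.5, 0.5)]
frontier = pareto_frontier(results)
```

Here `(1.5, 3.0)` is dropped because `(2.0, 4.0)` is better on both metrics, and `(2.5, 0.5)` is dropped because `(3.0, 1.0)` is. The three remaining points are all valid answers; choosing among them is a business decision rather than an optimization one.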
  31. Finding the Best Tradeoffs: Balancing competing metrics to find the Pareto frontier. As shown before, the goal in multi-objective (multi-criteria) optimization is to find the optimal set of solutions across a set of functions [Knowles, 2006]. • This is formulated as finding the maximum of the set of functions f1 to fn over the same domain x • No single point exists as the solution; instead we actively try to maximize the size of the efficient frontier, which represents the set of solutions • The solution is found through scalarization methods such as convex combination and epsilon-constraint
  32. Optimizing Multiple Competing Metrics: First Approach. Intuition: Convex Combination Scalarization. Idea: If we can convert the multimetric problem into a scalar problem, we can solve it using Bayesian optimization. One possible scalarization is a convex combination of the objectives.
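A minimal sketch of the convex combination idea, assuming two hypothetical competing metrics `f1` and `f2` over a one-dimensional domain. For brevity the scalar problem is solved by brute force over a candidate grid; in the approach the slide describes, that inner step would be Bayesian optimization instead.

```python
def f1(x):
    """Hypothetical metric 1 (to maximize); best at x = 0.2."""
    return -(x - 0.2) ** 2


def f2(x):
    """Hypothetical metric 2 (to maximize); best at x = 0.8, so it competes with f1."""
    return -(x - 0.8) ** 2


def scalarized_argmax(weight, candidates):
    """Maximize the convex combination weight * f1 + (1 - weight) * f2."""
    return max(candidates, key=lambda x: weight * f1(x) + (1.0 - weight) * f2(x))


candidates = [i / 100 for i in range(101)]   # domain [0, 1]
weights = [0.0, 0.25, 0.5, 0.75, 1.0]
# Each weight turns the two-metric problem into one scalar problem with one solution.
tradeoffs = [(w, scalarized_argmax(w, candidates)) for w in weights]
```

Sweeping the weight from 0 to 1 moves the solution from the optimum of `f2` to the optimum of `f1`, tracing out efficient points along the way. A known limitation of this scalarization is that it cannot reach points on non-convex parts of a Pareto frontier, however the weights are chosen.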
  33. Optimizing Multiple Competing Metrics: Deeper Approach. Balancing competing metrics to find the Pareto frontier with thresholds. As shown before, the goal in multi-objective (multi-criteria) optimization is to find the optimal set of solutions across a set of functions [Knowles, 2006]. • This is formulated as finding the maximum of the set of functions f1 to fn over the same domain x • No single point exists as the solution; instead we actively try to maximize the size of the efficient frontier, which represents the set of solutions • The solution is found through constrained scalarization methods such as convex combination and epsilon-constraint • Users can change constraints as the search progresses [Letham et al., 2019]
  34. Rephrase the Entire Problem: Constrained Scalarization (a variation on Expected Improvement [Letham et al., 2019]). 1. Model all metrics independently. • Requires no prior beliefs about how metrics interact. • Missing data is removed on a per-metric basis if unrecorded. 2. Expose the efficient frontier through constrained scalar optimization. • Enforce user constraints when given. • Iterate through sub-constraints to better resolve the efficient frontier, if desired. • Consider different regions of the frontier when parallelism is possible. 3. Allow users to change constraints as the search progresses. • Allow the problems and goals to evolve as the user's understanding changes. Constraints give customers more control over the circumstances and more ability to understand our actions.
  35. Intuition: Scalarization and Epsilon Constraints (figure). Use epsilon constraints to focus the search.
  36. Intuition: Constrained Scalarization and Epsilon Constraints (figure). Add thresholds to limit the frontier.
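The epsilon-constraint intuition can be sketched as follows: maximize one metric while requiring the other to stay above a threshold, then move the threshold to expose different parts of the frontier. The metrics `f1` and `f2`, the candidate grid, and the epsilon values are hypothetical, and the brute-force search stands in for the constrained Bayesian optimization the slides refer to.

```python
def f1(x):
    """Hypothetical primary metric (to maximize); best at x = 0.2."""
    return -(x - 0.2) ** 2


def f2(x):
    """Hypothetical secondary metric (to maximize); best at x = 0.8."""
    return -(x - 0.8) ** 2


def epsilon_constraint_argmax(primary, secondary, epsilon, candidates):
    """Maximize primary(x) subject to secondary(x) >= epsilon; None if nothing is feasible."""
    feasible = [x for x in candidates if secondary(x) >= epsilon]
    if not feasible:
        return None
    return max(feasible, key=primary)


candidates = [i / 100 for i in range(101)]   # domain [0, 1]
# Tightening the threshold on f2 pushes the solution away from the optimum of f1.
epsilons = [-1.0, -0.30, -0.10, -0.02, 0.05]
solutions = [epsilon_constraint_argmax(f1, f2, eps, candidates) for eps in epsilons]
```

A loose threshold leaves the unconstrained optimum of `f1` feasible, tighter thresholds move the answer along the frontier toward the optimum of `f2`, and a threshold that no candidate can meet returns `None`. That last case is why it helps to let users revise thresholds as the search progresses.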
  37. Use Case: Balancing Speed & Accuracy in Deep Learning. Multimetric Use Case 1: ● Category: Time Series ● Task: Sequence Classification ● Model: CNN ● Data: Diatom Images ● Analysis: Accuracy-Time Tradeoff ● Result: Similar accuracy, 33% of the inference time. Multimetric Use Case 2: ● Category: NLP ● Task: Sentiment Analysis ● Model: CNN ● Data: Rotten Tomatoes Movie Reviews ● Analysis: Accuracy-Time Tradeoff ● Result: ~2% in accuracy versus 50% of training time. Learn more: https://devblogs.nvidia.com/sigopt-deep-learning-hyperparameter-optimization/
  38. Questions? Tobias Andreasen | tobias@sigopt.com | For more information, visit: https://sigopt.com/research/

