Practical Solutions to Exploration Problems
Sam Daulton
Core Data Science, Facebook
Adaptive Experimentation Practical Solutions to Exploration Problems 1 / 68
Overview
1 Adaptive Experimentation
Introduction
2 Direct policy search via Bayesian optimization
Motivating Example
Gaussian Process Regression
Bayesian Optimization
3 Combining online and offline experiments
Value Model Tuning
Multi-Task Bayesian Optimization
4 Open Source Tools
Ax
BoTorch
5 Constrained Bayesian Contextual Bandits
Video Upload Transcoding Optimization
Constrained Thompson Sampling (CTS)
Reward Shaping and Hyperparameter Optimization
Adaptive Experimentation Team
• Horizontal R&D team within Facebook
• Goal: radically change the way people run experiments and develop systems:
• Reduce the threshold for experimentation
• Use RL to robustly solve explore/exploit problems
• Develop tools to improve and automate decision-making under multiple and/or constrained objectives
Spectrum of Automation
2 Direct policy search via Bayesian optimization
Heterogeneous Connections and Devices
Homogeneous Status Quo Policy
Idea: What if we loaded different numbers of stories depending on the
connection type?
Potential Contextualized Policy
Idea: What if we loaded more posts for better connection types?
Potential Contextualized Policy - Opposite
Idea: What if we loaded fewer posts for better connection types?
Potential Contextualized Policies
Suppose that for each connection type $c$:
• We could fetch any number of posts $x_c \in [2, 24]$
• Then, with four connection types, there are $22^4 = 234{,}256$ possible configurations to test!
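The configuration count is a product over connection types. A minimal sketch of the arithmetic, assuming (as the exponent in $22^4$ implies) four connection types with 22 candidate post counts each; if every integer in [2, 24] were allowed, there would be 23 values per type instead. The helper name `count_configurations` is illustrative.

```python
from itertools import product


def count_configurations(values_per_context: int, num_contexts: int) -> int:
    """Number of distinct policy tables when each context independently
    picks one of `values_per_context` options."""
    return values_per_context ** num_contexts


# Assumption: 4 connection types, 22 candidate values each (the slide's 22^4).
print(count_configurations(22, 4))  # 234256

# If all 23 integers in [2, 24] were allowed per connection type instead:
print(count_configurations(23, 4))  # 279841

# Brute-force sanity check on a tiny case: 3 values x 2 contexts = 9 tables.
assert len(list(product(range(3), repeat=2))) == count_configurations(3, 2)
```

Either way, the space is far too large to A/B test exhaustively, which motivates treating the policy table as a black-box function to optimize.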
Policies as Black-box Functions
The average treatment effect over all individuals can be expected to be some smooth function of the policy table $\mathbf{x} = [x_1, \ldots, x_k]$:
$$f(\mathbf{x}): \mathbb{R}^k \to \mathbb{R}$$
Black-box Function View of RL
• Turns the "full RL" problem into an infinite-armed bandit problem:
$$\pi_{\mathbf{x}^*}, \quad \mathbf{x}^* = \arg\max_{\mathbf{x}} g(f(\mathbf{x}))$$
• Advantages:
• Does not require estimating value functions, state transition functions,
or inference about unobserved states
• Involves virtually no logging of actions, states, or intermediate rewards
• Allows for direct maximization of multiple, delayed rewards
Question: How can we make predictions about long-term outcomes from a limited number of vector-valued policies?
Gaussian Process (GP) Priors
Gaussian Process (GP) Posteriors
GP regression gives well-calibrated posterior predictive intervals that are
easy to compute
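To make "easy to compute" concrete, here is a minimal NumPy sketch of the exact GP regression posterior, assuming a zero-mean prior, a squared-exponential (RBF) kernel with fixed hyperparameters, and Gaussian observation noise. In practice the hyperparameters would be fit by maximizing the marginal likelihood; the names `rbf_kernel` and `gp_posterior` are illustrative, not from any library.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, outputscale=1.0):
    """Squared-exponential kernel between the rows of a (n, d) and b (m, d)."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return outputscale * np.exp(-0.5 * sq_dists / lengthscale**2)


def gp_posterior(x_train, y_train, x_test, noise_var=1e-2, **kernel_kwargs):
    """Posterior mean and variance of the latent function at x_test for a
    zero-mean GP observed with Gaussian noise of variance noise_var."""
    k_xx = rbf_kernel(x_train, x_train, **kernel_kwargs)
    k_xs = rbf_kernel(x_train, x_test, **kernel_kwargs)
    k_ss_diag = np.diag(rbf_kernel(x_test, x_test, **kernel_kwargs))
    # Cholesky factor of (K + sigma^2 I) gives numerically stable solves.
    chol = np.linalg.cholesky(k_xx + noise_var * np.eye(len(x_train)))
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_xs.T @ alpha
    v = np.linalg.solve(chol, k_xs)
    var = np.clip(k_ss_diag - (v**2).sum(axis=0), 0.0, None)
    return mean, var


# Usage: noisy observations of a 1-D function, 95% intervals for f.
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, size=(8, 1))
y_train = np.sin(6.0 * x_train[:, 0]) + 0.1 * rng.standard_normal(8)
x_test = np.linspace(0.0, 1.0, 5)[:, None]
mean, var = gp_posterior(x_train, y_train, x_test, noise_var=0.01, lengthscale=0.2)
lower, upper = mean - 1.96 * np.sqrt(var), mean + 1.96 * np.sqrt(var)
```

The intervals above are for the latent function; adding `noise_var` to `var` gives intervals for a new noisy observation. Everything reduces to one Cholesky factorization and a few triangular solves.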
Gaussian Process (GP) Regression
In practice, we find that GP surrogate models fit the data well for many
online experiments.
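One way to check model fit on a given experiment is leave-one-out (LOO) cross-validation: predict each arm from all of the others and compare with what was observed. For an exact GP, this needs no refitting. The sketch below uses the standard closed-form LOO identities for exact GP regression with fixed RBF hyperparameters; the helper names are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0):
    """Squared-exponential kernel between the rows of a (n, d) and b (m, d)."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)


def gp_loo_predictions(x, y, noise_var=1e-2, lengthscale=1.0):
    """Closed-form leave-one-out predictions for a zero-mean exact GP.

    With K = K_f + noise_var * I:
        loo_mean_i = y_i - [K^{-1} y]_i / [K^{-1}]_ii
        loo_var_i  = 1 / [K^{-1}]_ii   (predictive variance of a noisy y_i)
    """
    k = rbf_kernel(x, x, lengthscale) + noise_var * np.eye(len(x))
    k_inv = np.linalg.inv(k)
    diag = np.diag(k_inv)
    loo_mean = y - (k_inv @ y) / diag
    loo_var = 1.0 / diag
    return loo_mean, loo_var


# Usage: standardized LOO residuals. If the model is well calibrated,
# roughly 95% of them should fall within +/- 1.96.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(30, 2))
y = np.sin(3.0 * x[:, 0]) + np.cos(2.0 * x[:, 1]) + 0.1 * rng.standard_normal(30)
loo_mean, loo_var = gp_loo_predictions(x, y, noise_var=0.01, lengthscale=0.5)
z = (y - loo_mean) / np.sqrt(loo_var)
coverage = np.mean(np.abs(z) < 1.96)
```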
Other Examples with Continuous Action Spaces
• Value models governing ranking policies, e.g.
$$\text{rank}(Z) = x_1 P(\text{click} \mid Z) + x_2 Z_{\text{num friends}}^{x_3} + f\big(P(\text{spam} \mid Z) / x_4\big) + \ldots$$
• Bit-rate controllers for video and audio streaming
• Data retrieval policies for ML backends
Question: How do we use GP surrogate models to guide the
explore-exploit trade-off?
Bayesian Optimization
Setup
Bayesian Optimization
Round 1
Bayesian Optimization
Round 2
Bayesian Optimization
Round 3
Bayesian Optimization
q-Batch Bayesian Optimization
Bayesian Optimization
Response surface is maximized sequentially
• Models tell us which regions should be considered for further
assessment
Bayesian Optimization
Algorithm 1 BayesianOptimization
1: Run N random initial arms
2: for t = 0 to T do
3: Fit GP model to data
4: Use acquisition function to select candidates C
5: Evaluate C on black box function
6: Add new observations to dataset
7: end for
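Algorithm 1 can be sketched end to end on a toy problem. This is a minimal NumPy version, assuming a 1-D search space discretized into a grid, a GP with a fixed RBF kernel, and analytic Expected Improvement as the acquisition function with one candidate per round (q = 1). It is not intended to mirror Ax or BoTorch internals, and all names are illustrative.

```python
import math

import numpy as np


def rbf_kernel(a, b, lengthscale=0.15):
    """Squared-exponential kernel for 1-D inputs a (n,) and b (m,)."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)


def fit_and_predict(x_obs, y_obs, x_cand, noise_var=1e-4):
    """Fit a zero-mean GP to standardized y and predict at x_cand."""
    mu, sd = y_obs.mean(), y_obs.std() + 1e-9
    y_std = (y_obs - mu) / sd
    k = rbf_kernel(x_obs, x_obs) + noise_var * np.eye(len(x_obs))
    k_s = rbf_kernel(x_obs, x_cand)
    mean = k_s.T @ np.linalg.solve(k, y_std)
    var = 1.0 - np.einsum("ij,ij->j", k_s, np.linalg.solve(k, k_s))
    return mean * sd + mu, np.sqrt(np.clip(var, 1e-12, None)) * sd


def expected_improvement(mean, std, best):
    """Analytic EI for maximization under a Gaussian posterior."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def bayesian_optimization(f, n_init=3, n_rounds=10, seed=0):
    rng = np.random.default_rng(seed)
    x_cand = np.linspace(0.0, 1.0, 201)         # discretized search space
    x_obs = rng.uniform(0.0, 1.0, size=n_init)  # 1: N random initial arms
    y_obs = np.array([f(x) for x in x_obs])
    for _ in range(n_rounds):                   # 2: for t = 0 to T
        mean, std = fit_and_predict(x_obs, y_obs, x_cand)   # 3: fit GP
        ei = expected_improvement(mean, std, y_obs.max())   # 4: acquisition
        x_next = x_cand[np.argmax(ei)]
        x_obs = np.append(x_obs, x_next)                    # 5: evaluate and
        y_obs = np.append(y_obs, f(x_next))                 # 6: add to data
    best = int(np.argmax(y_obs))
    return x_obs[best], y_obs[best]


x_best, y_best = bayesian_optimization(lambda x: -(x - 0.3) ** 2)
```

Standardizing the outcomes before fitting lets a unit-variance kernel work across objectives of different scales; a real system would also fit the lengthscale and noise level rather than fixing them.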
Alternatives
Grid Search (Expensive - 81 arms)
Alternatives
Random Search (Cheaper - 25 arms)
• Maxima can be located with only a few, smartly chosen arms
Competing Objectives
• Product teams are used to running an A/B test and observing the
outcomes.
• Often, there are multiple competing objectives
Competing Objectives
If we want full automation, we need to specify more information in advance: ideally, "the" scalarized objective
Competing Objectives
Decision Makers Have Multiple Objectives
Competing Objectives
Decision makers don’t like scalarizations: e.g.
objective = −0.8 · cpu + 1.1 · time spent
Competing Objectives
Decision makers prefer constraints:
min(cpu) subject to time spent > 0.7
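A toy comparison shows how the two framings can disagree. The sketch below uses three hypothetical arms with normalized `cpu` and `time_spent` outcomes (made-up numbers, not data from the talk) and applies the scalarization and the constraint from the previous two slides.

```python
from dataclasses import dataclass


@dataclass
class Arm:
    name: str
    cpu: float         # hypothetical normalized CPU cost (lower is better)
    time_spent: float  # hypothetical normalized time spent (higher is better)


def best_by_scalarization(arms, w_cpu=-0.8, w_time=1.1):
    """Maximize the scalarized objective -0.8 * cpu + 1.1 * time_spent."""
    return max(arms, key=lambda a: w_cpu * a.cpu + w_time * a.time_spent)


def best_by_constraint(arms, min_time_spent=0.7):
    """Minimize cpu subject to time_spent > min_time_spent (None if infeasible)."""
    feasible = [a for a in arms if a.time_spent > min_time_spent]
    return min(feasible, key=lambda a: a.cpu) if feasible else None


arms = [
    Arm("A", cpu=0.2, time_spent=0.60),
    Arm("B", cpu=0.5, time_spent=0.75),
    Arm("C", cpu=0.9, time_spent=1.00),
]
print(best_by_scalarization(arms).name)  # A (violates time_spent > 0.7)
print(best_by_constraint(arms).name)     # B
```

The scalarization quietly trades time spent for CPU savings; the constraint states the acceptable trade-off explicitly.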
Practical Challenges
• Constrained optimization
• Observations often have high variance, leading to potentially large
measurement error
• High noise levels can degrade the performance of many common
acquisition functions including Expected Improvement
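For reference, a common baseline for constrained BO weights Expected Improvement by the posterior probability that the constraint is satisfied. The sketch below implements that classic constrained EI for a single outcome constraint of the form c(x) > threshold, assuming independent Gaussian posteriors for the objective and the constraint. It is not the noisy EI method of Letham et al.: it plugs in a single "best feasible value", which is exactly the quantity that becomes unreliable when observations are noisy.

```python
import math


def norm_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean: float, std: float, best: float) -> float:
    """EI for maximization under a Gaussian posterior N(mean, std^2)."""
    if std <= 0.0:
        return max(mean - best, 0.0)
    z = (mean - best) / std
    return (mean - best) * norm_cdf(z) + std * norm_pdf(z)


def prob_feasible(c_mean: float, c_std: float, threshold: float) -> float:
    """P(c(x) > threshold) when c(x) ~ N(c_mean, c_std^2)."""
    if c_std <= 0.0:
        return 1.0 if c_mean > threshold else 0.0
    return 1.0 - norm_cdf((threshold - c_mean) / c_std)


def constrained_ei(obj_mean, obj_std, best_feasible, c_mean, c_std, threshold):
    """Classic constrained EI: EI on the objective times P(feasible)."""
    ei = expected_improvement(obj_mean, obj_std, best_feasible)
    return ei * prob_feasible(c_mean, c_std, threshold)


# The cpu/time spent example recast as maximization: objective = -cpu,
# constraint time_spent > 0.7. The posterior numbers are hypothetical.
score = constrained_ei(obj_mean=-0.4, obj_std=0.1, best_feasible=-0.5,
                       c_mean=0.72, c_std=0.05, threshold=0.7)
```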
Solution
For more details, see
• Constrained Bayesian Optimization with Noisy Experiments. Letham, Karrer, Ottoni & Bakshy, 2019. Bayesian Analysis.
3 Combining online and offline experiments
Value Model Tuning
• Ranking teams use value models that combine multiple predictive models and features, e.g.
$$\text{rank}(Z) = x_1 P(\text{click} \mid Z) + x_2 Z_{\text{num friends}}^{x_3} + f\big(P(\text{spam} \mid Z) / x_4\big) + \ldots$$
• It is not feasible to run sufficiently powered experiments with 20+ arms, so the team developed a simulator
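To make the value model concrete, here is a hypothetical Python sketch that scores and ranks candidates with the formula above for a given weight vector x = (x1, x2, x3, x4). The feature names, the candidate values, and the choice f(u) = -u for the spam term are assumptions for illustration; the slide does not specify f, and the trailing "..." terms are omitted.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    item_id: str
    p_click: float    # predicted P(click | Z)
    num_friends: int  # hypothetical feature Z_{num friends}
    p_spam: float     # predicted P(spam | Z)


def spam_term(u: float) -> float:
    """Hypothetical choice of f: a linear penalty (the slide leaves f open)."""
    return -u


def value_model_score(c: Candidate, x) -> float:
    """x1 P(click|Z) + x2 Z_nf^x3 + f(P(spam|Z) / x4); '...' terms omitted."""
    x1, x2, x3, x4 = x
    return x1 * c.p_click + x2 * c.num_friends**x3 + spam_term(c.p_spam / x4)


def rank(candidates, x):
    return sorted(candidates, key=lambda c: value_model_score(c, x), reverse=True)


candidates = [
    Candidate("a", p_click=0.10, num_friends=100, p_spam=0.01),
    Candidate("b", p_click=0.30, num_friends=4, p_spam=0.05),
    Candidate("c", p_click=0.20, num_friends=25, p_spam=0.00),
]
print([c.item_id for c in rank(candidates, (1.0, 0.01, 0.5, 0.1))])  # ['c', 'a', 'b']
print([c.item_id for c in rank(candidates, (1.0, 0.01, 0.5, 1.0))])  # ['b', 'c', 'a']
```

Changing a single weight (here x4) reorders the feed, so each x defines a different ranking policy; tuning x is therefore a policy search problem of the kind described earlier.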
Simulation Setup
Biased Simulator
Debiasing Simulations with Multi-Task Models
Multi-Task Bayesian Optimization Loop
Algorithm 2 MultiTaskBayesianOptimization
1: Run N random arms online
2: Run M random arms offline with M > N
3: for t = 0 to T do
4: Fit MT-GP model to all data, with each batch as separate task
5: Use noisy expected improvement (NEI) to generate q candidates C (e.g., q = 30)
6: Run C on the simulator, fit GP model again
7: Use NEI to generate candidates to run online
8: end for
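Step 4 relies on a multi-task GP that lets cheap simulator (offline) observations inform predictions for the online task. The slides do not specify the kernel; a common construction, assumed here, is the intrinsic coregionalization model (ICM), k((x, i), (x', j)) = B[i, j] k_x(x, x'), where B is a task covariance matrix. In this minimal NumPy sketch B is fixed by hand rather than learned, task 0 is online, task 1 is offline, and the "biased simulator" is a made-up scaled copy of the true response.

```python
import numpy as np


def rbf(a, b, lengthscale):
    """Squared-exponential kernel between the rows of a (n, d) and b (m, d)."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)


def mtgp_posterior_mean(x, task, y, x_test, task_test, task_cov,
                        lengthscale=0.3, noise_var=1e-3):
    """Posterior mean under an ICM kernel k((x,i),(x',j)) = B[i,j] * k_x(x,x')."""
    k = task_cov[np.ix_(task, task)] * rbf(x, x, lengthscale)
    k = k + noise_var * np.eye(len(x))
    k_s = task_cov[np.ix_(task, task_test)] * rbf(x, x_test, lengthscale)
    return k_s.T @ np.linalg.solve(k, y)


def true_f(x):
    return np.sin(5.0 * x[:, 0])


# Many simulator arms, few online arms (M > N, as in Algorithm 2).
rng = np.random.default_rng(0)
x_off = rng.uniform(0.0, 1.0, size=(20, 1))
x_on = rng.uniform(0.0, 1.0, size=(3, 1))
y_off = 0.8 * true_f(x_off)                          # hypothetical biased simulator
y_on = true_f(x_on) + 0.05 * rng.standard_normal(3)  # noisy online outcomes
x_all = np.vstack([x_on, x_off])
task_all = np.array([0] * 3 + [1] * 20)
y_all = np.concatenate([y_on, y_off])
b = np.array([[1.0, 0.8], [0.8, 1.0]])  # assumed task covariance
x_test = np.linspace(0.0, 1.0, 5)[:, None]
online_pred = mtgp_posterior_mean(x_all, task_all, y_all, x_test,
                                  np.zeros(5, dtype=int), b)
```

With B set to the identity the tasks decouple and the simulator data has no effect on online predictions; the off-diagonal entries control how much the offline arms are trusted.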
Example of Multi-Task Bayesian Optimization
Figure: Outcome vs. iteration (0 to 40) during multi-task Bayesian optimization, with panels for the objective and the constraints.
Paper
For more details, see
• Bayesian Optimization for Policy Search via Online-Offline Experimentation. Letham & Bakshy, 2019. Forthcoming; arXiv:1904.01049.
4 Open Source Tools
Open Source Tools
Research to Production
Simple APIs
Adaptive Experimentation in Practice
Experiment Understanding
BoTorch
BoTorch: Building Blocks
Improving Researcher Efficiency
5 Constrained Bayesian Contextual Bandits
Video Upload Transcoding Optimization
Problem
• System receives requests to upload videos of different source qualities
and file sizes from a variety of network connections and devices.
• To ensure high reliability, a video may be transcoded to be uploaded
at a lower quality
• For each video upload request, we have features about
• the video: file size, duration, source resolution
• the network: country, network type, download speed
• the device
Goal
• Maximize quality preserved without decreasing reliability
Video Upload Transcoding - CB Problem
• Context: features about video, network, device
• Actions: 360p, 480p, 720p, 1080p
• Outcomes: reliability y(x, a)
• Rewards: ?? some function R(x, a, y)
Approach - Bandit Algorithm
Thompson Sampling
• Works well in batch mode
• Hyperparameter-free exploration
• Always "picks the best" codec: selects each codec with probability equal to the posterior probability that it is the best
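The "probability of being the best" property is easiest to see in the simplest setting. The sketch below assumes a non-contextual Bernoulli bandit with independent Beta(1, 1) priors per codec, which is much simpler than the contextual model used here. Each Thompson draw picks the arm whose sampled success rate is highest, so across a batch of requests served from one posterior, each arm is chosen with frequency close to its posterior probability of being best.

```python
import numpy as np


def thompson_select(successes, failures, rng):
    """One Thompson sampling decision under independent Beta(1, 1) priors."""
    samples = rng.beta(successes + 1, failures + 1)  # one draw per arm
    return int(np.argmax(samples))


def prob_best(successes, failures, rng, n_draws=20_000):
    """Monte Carlo estimate of each arm's posterior probability of being best."""
    draws = rng.beta(successes + 1, failures + 1,
                     size=(n_draws, len(successes)))
    return np.bincount(draws.argmax(axis=1), minlength=len(successes)) / n_draws


rng = np.random.default_rng(0)
successes = np.array([12, 10, 3])
failures = np.array([8, 10, 7])
# Batch mode: serve many requests from the same posterior before updating.
batch = [thompson_select(successes, failures, rng) for _ in range(5_000)]
batch_freq = np.bincount(batch, minlength=3) / len(batch)
p_best = prob_best(successes, failures, rng)
```

Because the randomness comes from the posterior itself, there is no exploration rate to tune, and a batch naturally spreads traffic across plausible winners.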
Approach - Modeling
Bayesian Linear Model
• Bernoulli likelihood to predict reliability
• Using a neural network feature extractor
• Simple two-layer MLP (50, 4) trained via SGD
• Last layer is a stochastic variational GP with a linear kernel
• Trained via stochastic variational inference using 1000 inducing points placed according to a space-filling design
Thompson Sampling
Algorithm 3 ThompsonSampling
Input: discrete set of actions $A$, distribution over models $P_0(f)$
1: for t = 0 to T do
2: Sample model $\tilde{f}_t \sim P_t(f \mid X, y)$
3: Select an action $a_t \leftarrow \arg\max_{a \in A} \mathbb{E}(r_t \mid x_t, a, \tilde{f}_t)$
4: Observe reward $r_t$
5: Update distribution $P_{t+1}(f)$
6: end for
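Algorithm 3 can be sketched with a conjugate model so that sampling $\tilde{f}_t$ is exact. This hypothetical version keeps an independent Bayesian linear regression per action with a Gaussian likelihood, a simplification of the model on the previous slide (Bernoulli likelihood, learned features); a GP with a linear kernel over fixed features is equivalent to Bayesian linear regression over those features. When the per-action posteriors are independent, sampling each action's weights separately is the same as sampling one joint model.

```python
import numpy as np


class LinearThompsonSampler:
    """Thompson sampling with an independent Bayesian linear model per action.

    Model (an assumption of this sketch): r = w_a^T x + noise, with
    noise ~ N(0, noise_var) and prior w_a ~ N(0, I / prior_precision).
    """

    def __init__(self, n_actions, dim, prior_precision=1.0, noise_var=1.0, seed=0):
        self.noise_var = noise_var
        self.precision = [prior_precision * np.eye(dim) for _ in range(n_actions)]
        self.b = [np.zeros(dim) for _ in range(n_actions)]  # sum of x * r / noise_var
        self.rng = np.random.default_rng(seed)

    def posterior(self, a):
        cov = np.linalg.inv(self.precision[a])
        cov = 0.5 * (cov + cov.T)  # symmetrize against round-off
        return cov @ self.b[a], cov

    def select(self, context):
        # Steps 2-3: sample a model from the posterior, act greedily under it.
        scores = []
        for a in range(len(self.precision)):
            mean, cov = self.posterior(a)
            w = self.rng.multivariate_normal(mean, cov)
            scores.append(context @ w)
        return int(np.argmax(scores))

    def update(self, context, action, reward):
        # Step 5: conjugate update of the chosen action's posterior only.
        self.precision[action] += np.outer(context, context) / self.noise_var
        self.b[action] += context * reward / self.noise_var


# Usage on a synthetic problem with hypothetical true weights.
true_w = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
env_rng = np.random.default_rng(1)
agent = LinearThompsonSampler(n_actions=3, dim=2, noise_var=0.01)
regrets = []
for t in range(2_000):
    x = env_rng.uniform(0.0, 1.0, size=2)                 # receive context
    a = agent.select(x)
    r = x @ true_w[a] + 0.1 * env_rng.standard_normal()   # observe reward
    agent.update(x, a, r)
    regrets.append(np.max(true_w @ x) - x @ true_w[a])
```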
Issues with Vanilla Thompson Sampling
• Thompson sampling does not account for the constraint
• Change in reliability must be non-negative
• Unclear how best to parameterize the reward
Constrained Thompson Sampling
Algorithm 4 ConstrainedThompsonSampling
1: Input: discrete set of actions $A$, distribution over models $P_0(f)$
2: for t = 0 to T do
3: Receive context $x_t$
4: Sample model $\tilde{f}_t \sim P_t(f \mid X, y)$
5: for $a \in A$ do
6: Estimate outcomes $\tilde{f}_t(x_t, a)$
7: end for
8: Fetch action under baseline policy $b \leftarrow \pi_b(x_t)$
9: Filter feasible actions: $A_{\text{feas}} \leftarrow \{a \in A \mid \tilde{f}_t(x_t, a) \ge \epsilon \cdot \tilde{f}_t(x_t, b)\}$
10: Select an action $a_t \leftarrow \arg\max_{a \in A_{\text{feas}}} \mathbb{E}(r_t \mid x_t, a, \tilde{f}_t)$
11: Observe outcome $y_t$
12: Update distribution $P_{t+1}(f)$
13: end for
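Once a model has been sampled, steps 6 to 10 of Algorithm 4 reduce to a filter followed by an argmax. The sketch below assumes the sampled model's outcome is the predicted upload reliability for each quality level of the current request, and that the expected reward is reliability × R(1, a), since a failed upload earns 0. The reliabilities, rewards, and baseline action are hypothetical. Because ε ≤ 1, the baseline action always passes the filter, so the feasible set is never empty.

```python
ACTIONS = ["360p", "480p", "720p", "1080p"]


def constrained_ts_step(sampled_reliability, baseline_action, success_reward, epsilon):
    """Pick an action given one sampled model (steps 6-10 of Algorithm 4).

    sampled_reliability: action -> reliability predicted by the sampled model.
    success_reward: action -> R(y=1, a); failures are assumed to earn 0.
    """
    base = sampled_reliability[baseline_action]
    # Step 9: keep actions whose sampled reliability is within epsilon of baseline.
    feasible = [a for a in ACTIONS if sampled_reliability[a] >= epsilon * base]
    # Step 10: maximize expected reward P(success) * R(1, a) under the sample.
    return max(feasible, key=lambda a: sampled_reliability[a] * success_reward[a])


sampled = {"360p": 0.99, "480p": 0.985, "720p": 0.95, "1080p": 0.90}
reward = {"360p": 1.0, "480p": 1.1, "720p": 1.2, "1080p": 1.3}
for eps in (1.0, 0.95, 0.90):
    print(eps, constrained_ts_step(sampled, "360p", reward, eps))
# 1.0 360p
# 0.95 720p
# 0.9 1080p
```

Loosening ε admits higher-quality but less reliable actions, which is why ε is tuned together with the reward weights.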
Reward Shaping Setup
Reward Shaping:
• Reward is 0 if the upload is a failure
• Reward is fixed at 1 for a 360p upload success
• Reward is monotonically increasing with quality:
$$R(y = 1, a) = 1 + \sum_{a' \le a} w_{a'}, \quad \text{where } w_{a'} \in (0.0, 0.2]$$
Safety Constraint: $\epsilon \in [0.95, 1.0]$
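The shaped reward can be written as a small function of the success flag, the chosen quality, and the weights. This sketch assumes weights are defined only for the qualities above 360p, so that a 360p success earns exactly 1 as stated above, and it validates them against the (0.0, 0.2] range. The weight values in the example are hypothetical; one policy in the hyperparameter search then corresponds to one choice of (w_480p, w_720p, w_1080p, ε).

```python
ACTIONS = ["360p", "480p", "720p", "1080p"]  # ordered by increasing quality


def shaped_reward(success: bool, action: str, weights: dict) -> float:
    """R(y, a): 0 on failure, else 1 + sum of w_{a'} over qualities a' <= a.

    Assumption: `weights` has entries only for qualities above 360p, so a
    successful 360p upload earns exactly 1.
    """
    for quality, w in weights.items():
        if quality not in ACTIONS[1:] or not 0.0 < w <= 0.2:
            raise ValueError(f"invalid weight {quality}={w}")
    if not success:
        return 0.0
    up_to = ACTIONS[: ACTIONS.index(action) + 1]
    return 1.0 + sum(weights.get(q, 0.0) for q in up_to)


weights = {"480p": 0.1, "720p": 0.05, "1080p": 0.2}
print([round(shaped_reward(True, a, weights), 2) for a in ACTIONS])
# [1.0, 1.1, 1.15, 1.35]
```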
Reward Shaping Optimization
• Teams care about top-line outcomes:
• Reliability: mean reliability per user
• Quality preserved: mean quality (e.g., 1080p preserved, HD) per user
• Other outcomes: watch time, content production
• Difficult to evaluate these outcomes from purely offline data
Solution: Use Bayesian optimization (via Ax) with online experiments
Reward Shaping Optimization
(a) 1080p quality preserved (b) Reliability
Figure: GP-modeled response surface of mean percent change in video quality
and reliability relative to the baseline policy. Each point represents a policy
parameterized by reward function hyperparameters and constraint parameter ε.
Thanks
Adaptive Experimentation Team
• Manager: Eytan Bakshy
• Constrained TS: Sam Daulton, Shaun Singh, Drew Dimmery
• BoTorch: Max Balandat, Sam Daulton, Daniel Jiang, Brian Karrer,
Ben Letham
• Ax: Kostya Kashin, Lili Dworkin, Lena Kashtelyan, Ben Letham,
Ashwin Murthy, Shaun Singh, and Drew Dimmery
Papers
• Constrained Bayesian Optimization with Noisy Experiments. Letham et al., 2019. Bayesian Analysis.
• Bayesian Optimization for Policy Search via Online-Offline Experimentation. Letham & Bakshy, 2019. Forthcoming; arXiv:1904.01049.

Facebook Talk at Netflix ML Platform meetup Sep 2019

  • 1. Practical Solutions to Exploration Problems Sam Daulton Core Data Science, Facebook Adaptive Experimentation Practical Solutions to Exploration Problems 1 / 68
  • 2. Overview 1 Adaptive Experimentation Introduction 2 Direct policy search via Bayesian optimization Motivating Example Gaussian Process Regression Bayesian Optimization 3 Combining online and offline experiments Value Model Tuning Multi-Task Bayesian Optimization 4 Open Source Tools Ax BoTorch 5 Constrained Bayesian Contextual Bandits Video Upload Transcoding Optimization Constrained Thompson Sampling (CTS) Reward Shaping and Hyperparameter Optimization Adaptive Experimentation Practical Solutions to Exploration Problems 2 / 68
  • 3. Adaptive Experimentation Team • Horizontal R&D team within Facebook • Goal: radically change the way people run experiments and develop systems: • Reduce threshold for experimentation • Use RL to robustly solve explore/exploit problems • Develop tools to improve and automate decision-making under multiple and/or constrained objectives
  • 4. Adaptive Experimentation Team
  • 5. Spectrum of Automation
  • 7. Heterogeneous Connections and Devices
  • 8. Homogeneous Status Quo Policy
  • 9. Homogeneous Status Quo Policy Idea: What if we loaded different numbers of stories depending on the connection type?
  • 10. Potential Contextualized Policy Idea: What if we loaded more posts for better connection types?
  • 11. Potential Contextualized Policy: Opposite Idea: What if we loaded fewer posts for better connection types?
  • 12. Potential Contextualized Policies Suppose that for each of the four connection types $c$: • We could fetch any number of posts $x_c \in [2, 24]$ • Then there are $22^4 = 234{,}256$ possible configurations to test!
  • 13. Policies as Black-box Functions The average treatment effect over all individuals can be expected to be some smooth function of the policy table $\mathbf{x} = [x_1, \dots, x_k]$: $f(\mathbf{x}): \mathbb{R}^k \to \mathbb{R}$
  • 14. Black-box Function View of RL • Turns the "full RL" problem into an infinite-armed bandit problem: choose the policy $\pi_{\mathbf{x}^*}$, where $\mathbf{x}^* = \arg\max_{\mathbf{x}} g(f(\mathbf{x}))$ • Advantages: • Does not require estimating value functions or state transition functions, or inference about unobserved states • Involves virtually no logging of actions, states, or intermediate rewards • Allows for direct maximization of multiple, delayed rewards Question: How can we make predictions about long-term outcomes from a limited number of vector-valued policies?
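The black-box view can be made concrete with a small sketch of direct policy search over the connection-type policy table. Everything here is hypothetical: the four connection-type names, the made-up response function `toy_effect` standing in for an online experiment, and the choice of g as the identity. Random search is used only because it is the simplest black-box optimizer; the rest of the talk replaces it with Bayesian optimization.

```python
import random

# Hypothetical connection types; the talk does not list the actual policy-table keys.
CONNECTION_TYPES = ["2g", "3g", "4g", "wifi"]
LOW, HIGH = 2, 24  # each x_c is an integer number of posts to fetch, LOW <= x_c <= HIGH


def toy_effect(policy):
    """Hypothetical smooth treatment effect f(x) of a policy table x.

    Made-up optimum: faster connections prefer fetching more posts. In a real
    system each evaluation would be a (noisy) online experiment.
    """
    targets = {"2g": 4, "3g": 8, "4g": 14, "wifi": 20}
    return -sum((policy[c] - targets[c]) ** 2 for c in CONNECTION_TYPES) / 100.0


def random_search(n_arms, seed=0):
    """Evaluate n_arms random policy tables and return the best one observed.

    g is taken to be the identity, so we maximize f(x) directly.
    """
    rng = random.Random(seed)
    best_policy, best_value = None, float("-inf")
    for _ in range(n_arms):
        policy = {c: rng.randint(LOW, HIGH) for c in CONNECTION_TYPES}
        value = toy_effect(policy)
        if value > best_value:
            best_policy, best_value = policy, value
    return best_policy, best_value


best, val = random_search(n_arms=25)
print(best, round(val, 3))
```

Note that nothing here logs states or intermediate rewards: each arm is just a vector `x` and one aggregate outcome, which is the property the slide highlights.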
  • 15. Gaussian Process (GP) Priors
  • 16. Gaussian Process (GP) Priors
  • 17. Gaussian Process (GP) Posteriors
  • 18. Gaussian Process (GP) Posteriors
  • 19. Gaussian Process (GP) Posteriors GP regression gives well-calibrated posterior predictive intervals that are easy to compute
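To show why GP posterior intervals are "easy to compute", here is a minimal sketch of exact GP regression in NumPy. The squared-exponential kernel, its fixed lengthscale and outputscale, the noise variance, and the toy data are all assumptions; real tools fit these hyperparameters to the data.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.2, outputscale=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return outputscale * np.exp(-0.5 * sq_dist / lengthscale**2)


def gp_posterior(x_train, y_train, x_test, noise_var=1e-2, **kernel_kwargs):
    """Posterior mean and variance of a zero-mean GP at x_test."""
    k_tt = rbf_kernel(x_train, x_train, **kernel_kwargs) + noise_var * np.eye(len(x_train))
    k_ts = rbf_kernel(x_train, x_test, **kernel_kwargs)
    k_ss_diag = np.diag(rbf_kernel(x_test, x_test, **kernel_kwargs))
    chol = np.linalg.cholesky(k_tt)  # k_tt = L L^T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_ts.T @ alpha
    v = np.linalg.solve(chol, k_ts)
    var = k_ss_diag - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)  # clip tiny negatives from round-off


x_obs = np.array([0.1, 0.4, 0.5, 0.9])
y_obs = np.sin(6 * x_obs)
x_grid = np.linspace(0.0, 1.0, 101)
mean, var = gp_posterior(x_obs, y_obs, x_grid)
# Approximate 95% posterior predictive band for the latent function.
lower, upper = mean - 1.96 * np.sqrt(var), mean + 1.96 * np.sqrt(var)
```

The whole posterior comes from one Cholesky factorization of an n-by-n matrix, which is cheap for the tens of arms typical of an online experiment.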
  • 20. Gaussian Process (GP) Regression In practice, we find that GP surrogate models fit the data well for many online experiments.
  • 21. Other Examples with Continuous Action Spaces • Value models governing ranking policies, e.g. $\mathrm{rank}(Z) = x_1 P(\text{click} \mid Z) + x_2 Z_{\text{num friends}}^{x_3} + f(P(\text{spam} \mid Z)/x_4) + \dots$ • Bit-rate controllers for video and audio streaming • Data retrieval policies for ML backends Question: How do we use GP surrogate models to guide the explore-exploit trade-off?
  • 22. Bayesian Optimization Setup
  • 23. Bayesian Optimization Setup
  • 24. Bayesian Optimization Round 1
  • 25. Bayesian Optimization Round 2
  • 26. Bayesian Optimization Round 3
  • 27. Bayesian Optimization: q-Batch Bayesian Optimization
  • 28. Bayesian Optimization The response surface is maximized sequentially • Models tell us which regions should be considered for further assessment
  • 29. Bayesian Optimization
    Algorithm 1 BayesianOptimization
    1: Run N random initial arms
    2: for t = 0 to T do
    3:   Fit a GP model to the data
    4:   Use an acquisition function to select candidates C
    5:   Evaluate C on the black-box function
    6:   Add the new observations to the dataset
    7: end for
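The loop above can be sketched for a 1-D toy problem. This is a minimal illustration, not how Ax or BoTorch implement it: it assumes a zero-mean GP on standardized outcomes with a fixed squared-exponential kernel (no hyperparameter fitting), analytic Expected Improvement as the acquisition function, one candidate per round, and a discretized candidate grid instead of continuous acquisition optimization. The objective and all numeric settings are hypothetical.

```python
import math

import numpy as np


def rbf_kernel(a, b, lengthscale=0.15):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)


def gp_posterior(x_train, y_train, x_test, noise_var=1e-4):
    """Zero-mean GP on standardized outcomes; returns mean and std in original units."""
    mu, sd = y_train.mean(), y_train.std() + 1e-9
    y = (y_train - mu) / sd
    chol = np.linalg.cholesky(rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train)))
    k_ts = rbf_kernel(x_train, x_test)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))
    v = np.linalg.solve(chol, k_ts)
    var = np.maximum(1.0 - np.sum(v**2, axis=0), 1e-12)
    return mu + sd * (k_ts.T @ alpha), sd * np.sqrt(var)


_erf = np.vectorize(math.erf)


def expected_improvement(mean, std, best):
    """Analytic EI for maximization under a Gaussian posterior."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def bayesian_optimization(f, n_init=3, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    candidates = np.linspace(0.0, 1.0, 201)  # discretized search space
    x = rng.uniform(0.0, 1.0, size=n_init)  # N random initial arms
    y = np.array([f(xi) for xi in x])
    for _ in range(n_iter):
        mean, std = gp_posterior(x, y, candidates)  # fit GP model to data
        ei = expected_improvement(mean, std, y.max())  # acquisition function
        x_next = candidates[np.argmax(ei)]
        y_next = f(x_next)  # evaluate on the black-box function
        x, y = np.append(x, x_next), np.append(y, y_next)  # add observation
    return x[np.argmax(y)], y.max()


x_best, y_best = bayesian_optimization(lambda x: -(x - 0.62) ** 2)
print(round(float(x_best), 3), round(float(y_best), 5))
```

The EI term `std * pdf` rewards uncertainty (exploration) while `(mean - best) * cdf` rewards a high predicted value (exploitation), which is how the surrogate model guides the explore-exploit trade-off.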
  • 30. Alternatives: Grid Search (expensive: 81 arms)
  • 31. Alternatives: Random Search (cheaper: 25 arms) • Maxima can be deduced with only a few, smartly chosen arms
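A minimal sketch of the two baselines, assuming two parameters scaled to [0, 1] and a 9 × 9 grid (consistent with 81 = 9², though the slide does not state the layout). It also shows one reason random search can be cheaper: each parameter gets many more distinct values per arm.

```python
import itertools
import random


def grid_arms(points_per_dim=9, dims=2):
    """Full factorial grid on [0, 1]^dims: points_per_dim ** dims arms."""
    axis = [i / (points_per_dim - 1) for i in range(points_per_dim)]
    return list(itertools.product(axis, repeat=dims))


def random_arms(n_arms=25, dims=2, seed=0):
    """Uniformly random arms on [0, 1]^dims."""
    rng = random.Random(seed)
    return [tuple(rng.random() for _ in range(dims)) for _ in range(n_arms)]


grid = grid_arms()
rand = random_arms()
print(len(grid), len(rand))  # 81 25
# The grid tries only 9 distinct values of each parameter, while the 25 random
# arms try 25 distinct values of each, which helps when one parameter matters more.
print(len({g[0] for g in grid}), len({r[0] for r in rand}))  # 9 25
```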
  • 32. Competing Objectives • Product teams are used to running an A/B test and observing the outcomes. • Often, there are multiple competing objectives.
  • 33. Competing Objectives If we want full automation, we need to specify more information in advance: ideally, "the" scalarized objective
  • 34. Competing Objectives Decision makers have multiple objectives
  • 35. Competing Objectives Decision makers don't like scalarizations, e.g. $\text{objective} = -0.8 \cdot \text{cpu} + 1.1 \cdot \text{time spent}$
  • 36. Competing Objectives Decision makers prefer constraints, e.g. $\min(\text{cpu})$ subject to $\text{time spent} > 0.7$
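To see why the choice matters, here is a minimal sketch that applies the slide's scalarization and its constraint formulation to the same hypothetical arm results. The three arms and their values are made up; the point is that the two formulations can select different arms.

```python
# Hypothetical arm results, as relative levels of CPU usage and time spent.
ARMS = {
    "A": {"cpu": 0.90, "time_spent": 0.65},
    "B": {"cpu": 1.00, "time_spent": 0.75},
    "C": {"cpu": 1.10, "time_spent": 0.95},
}


def best_by_scalarization(arms, w_cpu=-0.8, w_time=1.1):
    """Maximize the weighted sum -0.8 * cpu + 1.1 * time spent."""
    return max(arms, key=lambda a: w_cpu * arms[a]["cpu"] + w_time * arms[a]["time_spent"])


def best_by_constraint(arms, min_time_spent=0.7):
    """Minimize cpu subject to time spent > min_time_spent (None if infeasible)."""
    feasible = [a for a in arms if arms[a]["time_spent"] > min_time_spent]
    if not feasible:
        return None
    return min(feasible, key=lambda a: arms[a]["cpu"])


print(best_by_scalarization(ARMS))  # C
print(best_by_constraint(ARMS))  # B
```

The constraint version states the trade-off in terms a decision maker can check directly ("time spent stays above 0.7"), whereas the weights in the scalarization have no direct interpretation.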
  • 37. Practical Challenges • Constrained optimization • Observations often have high variance, leading to potentially large measurement error • High noise levels can degrade the performance of many common acquisition functions, including Expected Improvement
  • 38. Solution For more details, see • Letham, Karrer, Ottoni, & Bakshy. Constrained Bayesian Optimization with Noisy Experiments. Bayesian Analysis, 2019.
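The cited paper proposes noisy expected improvement (NEI). The key idea is that with noisy observations the incumbent best value is itself uncertain. A minimal Monte Carlo sketch of that idea: draw the latent function jointly at the observed points and the candidate, and average the improvement of the candidate over the sampled incumbent. This follows the joint-sampling estimator used by BoTorch's `qNoisyExpectedImprovement` for a single candidate, not the paper's quasi-Monte Carlo procedure. The kernel, lengthscale, noise variance, and sample count are assumptions.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.2):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)


def noisy_ei(x_obs, y_obs, x_cand, noise_var=0.1, n_samples=2000, seed=0):
    """Monte Carlo noisy EI at each 1-D candidate in x_cand (maximization)."""
    rng = np.random.default_rng(seed)
    n = len(x_obs)
    chol_obs = np.linalg.cholesky(rbf_kernel(x_obs, x_obs) + noise_var * np.eye(n))
    alpha = np.linalg.solve(chol_obs.T, np.linalg.solve(chol_obs, y_obs))
    values = np.empty(len(x_cand))
    for i, xc in enumerate(x_cand):
        pts = np.append(x_obs, xc)
        k_op = rbf_kernel(x_obs, pts)
        mean = k_op.T @ alpha
        v = np.linalg.solve(chol_obs, k_op)
        cov = rbf_kernel(pts, pts) - v.T @ v  # posterior of the noise-free f
        chol_post = np.linalg.cholesky(cov + 1e-6 * np.eye(n + 1))  # jitter
        draws = mean + rng.standard_normal((n_samples, n + 1)) @ chol_post.T
        best_obs = draws[:, :n].max(axis=1)  # sampled (uncertain) incumbent
        values[i] = np.maximum(draws[:, n] - best_obs, 0.0).mean()
    return values


x_obs = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
y_obs = np.array([0.0, 0.5, 1.0, 0.5, 0.0])
print(noisy_ei(x_obs, y_obs, np.array([0.5, 2.0])))
```

Re-evaluating an already observed point gets almost no credit here, because its latent value and the incumbent are sampled jointly; plain EI with a noisy `y.max()` as the incumbent has no such correction.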
  • 40. Value Model Tuning • Ranking teams use value models that combine multiple predictive models and features, e.g. $\mathrm{rank}(Z) = x_1 P(\text{click} \mid Z) + x_2 Z_{\text{num friends}}^{x_3} + f(P(\text{spam} \mid Z)/x_4) + \dots$ • It is not feasible to run sufficiently powered experiments with 20+ arms, so the team developed a simulator
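A minimal sketch of a value model with the slide's form, showing why its parameters x are worth tuning: small changes reorder the feed. The items, their predictions, and the penalty f(s) = -s are all hypothetical; the slide leaves f unspecified.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical per-item model predictions and features."""

    name: str
    p_click: float  # P(click | Z)
    num_friends: int  # Z_{num friends}
    p_spam: float  # P(spam | Z)


def value_score(item, x):
    """x1*P(click) + x2*num_friends**x3 + f(P(spam)/x4), with assumed f(s) = -s."""
    x1, x2, x3, x4 = x
    return x1 * item.p_click + x2 * item.num_friends**x3 - item.p_spam / x4


def rank(items, x):
    """Item names sorted by descending value score."""
    return [it.name for it in sorted(items, key=lambda it: value_score(it, x), reverse=True)]


ITEMS = [
    Item("a", p_click=0.10, num_friends=4, p_spam=0.01),
    Item("b", p_click=0.30, num_friends=0, p_spam=0.20),
    Item("c", p_click=0.05, num_friends=25, p_spam=0.00),
]
print(rank(ITEMS, (1.0, 0.02, 0.5, 1.0)))  # ['c', 'a', 'b']
print(rank(ITEMS, (1.0, 0.02, 0.5, 4.0)))  # ['b', 'c', 'a']
```

Changing only x4 (the spam penalty scale) moves item "b" from last to first, so each candidate parameter vector is effectively a different ranking policy, and evaluating one requires an experiment or a simulator.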
  • 41. Simulation Setup
  • 42. Biased Simulator
  • 43. Debiasing Simulations with Multi-Task Models
  • 44. Debiasing Simulations with Multi-Task Models
  • 45. Multi-Task Bayesian Optimization Loop
    Algorithm 2 MultiTaskBayesianOptimization
    1: Run N random arms online
    2: Run M random arms offline, with M > N
    3: for t = 0 to T do
    4:   Fit an MT-GP model to all data, treating each batch as a separate task
    5:   Use NEI to generate q candidates C (e.g. q = 30)
    6:   Run C on the simulator and fit the GP model again
    7:   Use NEI to generate candidates to run online
    8: end for
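The modeling step behind this loop can be sketched with an intrinsic coregionalization (ICM) kernel, one common multi-task GP kernel: $k((x, t), (x', t')) = B_{t t'} \, k_x(x, x')$. The talk does not say which multi-task kernel its MT-GP uses. Here the task covariance B, the lengthscale, and the toy "online" and "biased simulator" functions are assumptions, and B is fixed rather than learned. The sketch only shows how many cheap simulator points can sharpen predictions for the expensive online task.

```python
import numpy as np


def rbf(a, b, lengthscale=0.2):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)


def icm_kernel(x1, t1, x2, t2, task_cov):
    """ICM kernel: task covariance B[t, t'] times an input kernel k_x(x, x')."""
    return task_cov[np.ix_(t1, t2)] * rbf(x1, x2)


def mtgp_predict(x_train, t_train, y_train, x_test, t_test, task_cov, noise_var=1e-2):
    """Posterior mean of a zero-mean multi-task GP with a fixed task covariance."""
    k_tt = icm_kernel(x_train, t_train, x_train, t_train, task_cov)
    k_tt = k_tt + noise_var * np.eye(len(x_train))
    k_ts = icm_kernel(x_train, t_train, x_test, t_test, task_cov)
    return k_ts.T @ np.linalg.solve(k_tt, y_train)


B = np.array([[1.0, 0.9], [0.9, 1.0]])  # task 0 = online, task 1 = simulator
x_online = np.array([0.1, 0.9])  # few expensive online arms
x_sim = np.linspace(0.0, 1.0, 21)  # many cheap simulator arms
y_online = np.sin(6 * x_online)
y_sim = 0.9 * np.sin(6 * x_sim)  # toy simulator with a multiplicative bias
x_all = np.concatenate([x_online, x_sim])
t_all = np.array([0] * len(x_online) + [1] * len(x_sim))
y_all = np.concatenate([y_online, y_sim])

x_new, t_new = np.array([0.25]), np.array([0])  # predict the online task
mt_pred = mtgp_predict(x_all, t_all, y_all, x_new, t_new, B)
st_pred = mtgp_predict(x_online, np.zeros(2, dtype=int), y_online, x_new, t_new, B)
print(float(mt_pred[0]), float(st_pred[0]), float(np.sin(6 * 0.25)))
```

With only the two online points, the single-task prediction at x = 0.25 is pulled toward zero; borrowing strength from the correlated simulator task brings it much closer to the true online value.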
  • 46. Example of Multi-Task Bayesian Optimization [Figure: outcomes by iteration (0 to 40); panels include Objective and Constraints.]
  • 47. Paper For more details, see • Letham & Bakshy. Bayesian Optimization for Policy Search via Online-Offline Experimentation. 2019. Forthcoming; arXiv:1904.01049.
  • 49. Open Source Tools
  • 50. Research to Production
  • 51. Simple APIs
  • 52. Adaptive Experimentation in Practice
  • 53. Experiment Understanding
  • 54. BoTorch
  • 55. BoTorch: Building Blocks
  • 56. Improving Researcher Efficiency
  • 58. Video Upload Transcoding Optimization Problem • The system receives requests to upload videos of different source qualities and file sizes from a variety of network connections and devices. • To ensure high reliability, a video may be transcoded to a lower quality before upload. • For each video upload request, we have features about • the video: file size, duration, source resolution • the network: country, network type, download speed • the device Goal: maximize quality preserved without decreasing reliability
  • 59. Video Upload Transcoding: Contextual Bandit (CB) Problem • Context: features about the video, network, and device • Actions: 360p, 480p, 720p, 1080p • Outcomes: reliability $y(x, a)$ • Rewards: an open question; some function $R(x, a, y)$
  • 60. Approach: Bandit Algorithm Thompson Sampling • Works well in batch mode • Hyperparameter-free exploration • Always "picks the best" codec: picks each codec with probability equal to the posterior probability that it is the best
  • 61. Approach: Modeling Bayesian Linear Model • Bernoulli likelihood to predict reliability • Uses a neural network feature extractor • Simple two-layer MLP (50, 4) trained via SGD • The last layer is a stochastic variational GP with a linear kernel • Trained via stochastic variational inference using 1000 inducing points chosen according to a space-filling design
  • 62. Thompson Sampling
    Algorithm 3 ThompsonSampling
    Input: discrete set of actions $A$, distribution over models $P_0(f)$
    1: for t = 0 to T do
    2:   Sample a model $\tilde{f}_t \sim P_t(f \mid X, y)$
    3:   Select an action $a_t \leftarrow \arg\max_{a \in A} \mathbb{E}(r_t \mid x_t, a, \tilde{f}_t)$
    4:   Observe reward $r_t$
    5:   Update distribution $P_{t+1}(f)$
    6: end for
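Algorithm 3 can be sketched end to end with a much simpler model than the one in the talk: a context-free Beta-Bernoulli model of upload success per quality, in place of the neural-network feature extractor with a variational GP last layer. The per-quality success rewards and true success rates are hypothetical.

```python
import random

ACTIONS = ["360p", "480p", "720p", "1080p"]
# Hypothetical reward for a successful upload at each quality (failure gives 0).
SUCCESS_REWARD = {"360p": 1.0, "480p": 1.1, "720p": 1.2, "1080p": 1.3}


def thompson_sampling(true_success, n_rounds=5000, seed=0):
    """Beta-Bernoulli Thompson sampling over upload qualities (no context)."""
    rng = random.Random(seed)
    alpha = {a: 1.0 for a in ACTIONS}  # Beta(1, 1) prior on each success rate
    beta = {a: 1.0 for a in ACTIONS}
    counts = {a: 0 for a in ACTIONS}
    for _ in range(n_rounds):
        # Sample a model: one plausible success probability per action.
        sampled = {a: rng.betavariate(alpha[a], beta[a]) for a in ACTIONS}
        # Select the action with the highest expected reward under the sample.
        action = max(ACTIONS, key=lambda a: sampled[a] * SUCCESS_REWARD[a])
        success = rng.random() < true_success[action]  # observe the outcome
        # Update the posterior with the conjugate Beta update.
        alpha[action] += success
        beta[action] += 1 - success
        counts[action] += 1
    return counts


# Hypothetical true success rates; expected rewards are 0.99, 1.045, 1.14, 0.91.
TRUE_SUCCESS = {"360p": 0.99, "480p": 0.95, "720p": 0.95, "1080p": 0.70}
counts = thompson_sampling(TRUE_SUCCESS)
print(counts)
```

Exploration comes entirely from posterior sampling, which is why the slide calls it hyperparameter-free: an action is chosen exactly as often as its sampled model says it is the best.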
  • 63. Issues with Vanilla Thompson Sampling • Thompson sampling does not account for the constraint: • the change in reliability must be non-negative • It is unclear how best to specify the reward parameterization
  • 64. Constrained Thompson Sampling
    Algorithm 4 ConstrainedThompsonSampling
    1: Input: discrete set of actions $A$, distribution over models $P_0(f)$
    2: for t = 0 to T do
    3:   Receive context $x_t$
    4:   Sample a model $\tilde{f}_t \sim P_t(f \mid X, y)$
    5:   for $a \in A$ do
    6:     Estimate outcomes $\tilde{f}_t(x_t, a)$
    7:   end for
    8:   Fetch the action under the baseline policy $b \leftarrow \pi_b(x_t)$
    9:   Filter feasible actions: $A_{\text{feas}} \leftarrow \{a \in A \mid \tilde{f}_t(x_t, a) \ge \varepsilon \cdot \tilde{f}_t(x_t, b)\}$
    10:  Select an action $a_t \leftarrow \arg\max_{a \in A_{\text{feas}}} \mathbb{E}(r_t \mid x_t, a, \tilde{f}_t)$
    11:  Observe outcome $y_t$
    12:  Update distribution $P_{t+1}(f)$
    13: end for
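A minimal sketch of the constrained selection step, again assuming a context-free Beta-Bernoulli reliability model per quality instead of the talk's contextual model. The baseline action is fixed at 360p here purely for illustration, and the shaped success rewards, ε, and true reliabilities are hypothetical.

```python
import random

ACTIONS = ["360p", "480p", "720p", "1080p"]
# Hypothetical shaped reward for a successful upload (failure gives 0).
SUCCESS_REWARD = {"360p": 1.0, "480p": 1.1, "720p": 1.2, "1080p": 1.3}


def constrained_select(sampled_reliability, baseline_action, epsilon):
    """Keep actions whose sampled reliability is at least epsilon times the
    baseline's, then pick the feasible action with the highest expected reward."""
    floor = epsilon * sampled_reliability[baseline_action]
    feasible = [a for a in ACTIONS if sampled_reliability[a] >= floor]
    # For epsilon <= 1 the baseline itself is always feasible, so this is non-empty.
    return max(feasible, key=lambda a: sampled_reliability[a] * SUCCESS_REWARD[a])


def constrained_thompson_sampling(true_reliability, baseline_action="360p",
                                  epsilon=0.98, n_rounds=5000, seed=0):
    rng = random.Random(seed)
    alpha = {a: 1.0 for a in ACTIONS}  # Beta(1, 1) priors on reliability
    beta = {a: 1.0 for a in ACTIONS}
    counts = {a: 0 for a in ACTIONS}
    for _ in range(n_rounds):
        sampled = {a: rng.betavariate(alpha[a], beta[a]) for a in ACTIONS}
        action = constrained_select(sampled, baseline_action, epsilon)
        success = rng.random() < true_reliability[action]
        alpha[action] += success
        beta[action] += 1 - success
        counts[action] += 1
    return counts


TRUE_RELIABILITY = {"360p": 0.99, "480p": 0.97, "720p": 0.94, "1080p": 0.85}
print(constrained_thompson_sampling(TRUE_RELIABILITY))
```

Because the filter is applied to each posterior sample, an action that only looks unsafe under some samples is still explored occasionally; ε trades quality against how strictly reliability must match the baseline.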
  • 65. Reward Shaping Setup Reward Shaping: • Reward is 0 if the upload fails • Reward is fixed at 1 for a successful 360p upload • Reward increases monotonically with quality: $R(y = 1, a) = 1 + \sum_{360\text{p} < a' \le a} w_{a'}$, where $w_{a'} \in (0.0, 0.2]$ Safety Constraint: $\varepsilon \in [0.95, 1.0]$
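A minimal sketch of the shaped reward. It reads the sum as running over the qualities above 360p up to the chosen one, so that a successful 360p upload earns exactly 1 and each step up in quality adds a weight in (0, 0.2]. The weight values below are hypothetical. These weights, together with ε, are the hyperparameters that the talk then tunes with online Bayesian optimization.

```python
ACTIONS = ["360p", "480p", "720p", "1080p"]  # ordered from lowest to highest quality


def shaped_reward(success, action, weights):
    """R(y, a): 0 on failure; otherwise 1 plus the weights of qualities above 360p up to a."""
    if not success:
        return 0.0
    reward = 1.0
    for level in ACTIONS[1 : ACTIONS.index(action) + 1]:
        w = weights[level]
        if not 0.0 < w <= 0.2:
            raise ValueError(f"weight for {level} must lie in (0, 0.2], got {w}")
        reward += w
    return reward


weights = {"480p": 0.1, "720p": 0.2, "1080p": 0.05}  # hypothetical values
print([round(shaped_reward(True, a, weights), 2) for a in ACTIONS])  # [1.0, 1.1, 1.3, 1.35]
```

Since every weight is strictly positive, the reward is strictly increasing in quality regardless of the values chosen, which matches the monotonicity requirement on the slide.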
  • 66. Reward Shaping Optimization • Teams care about top-line outcomes: • Reliability: mean reliability per user • Quality preserved: mean quality (e.g., 1080p preserved, HD) per user • Other outcomes: watch time, content production • These outcomes are difficult to evaluate from purely offline data Solution: Use Bayesian optimization (via Ax) with online experiments
  • 67. Reward Shaping Optimization [Figure: (a) 1080p quality preserved; (b) reliability.] GP-modeled response surface of the mean percent change in video quality and reliability relative to the baseline policy. Each point represents a policy parameterized by the reward function hyperparameters and the constraint parameter ε.
  • 68. Reward Shaping Optimization
  • 69. Thanks Adaptive Experimentation Team • Manager: Eytan Bakshy • Constrained TS: Sam Daulton, Shaun Singh, Drew Dimmery • BoTorch: Max Balandat, Sam Daulton, Daniel Jiang, Brian Karrer, Ben Letham • Ax: Kostya Kashin, Lili Dworkin, Lena Kashtelyan, Ben Letham, Ashwin Murthy, Shaun Singh, and Drew Dimmery Papers • Letham, Karrer, Ottoni, & Bakshy. Constrained Bayesian Optimization with Noisy Experiments. Bayesian Analysis, 2019. • Letham & Bakshy. Bayesian Optimization for Policy Search via Online-Offline Experimentation. 2019. Forthcoming; arXiv:1904.01049.