Practical Solutions to Exploration Problems
Sam Daulton
Core Data Science, Facebook
Overview
1 Adaptive Experimentation
Introduction
2 Direct policy search via Bayesian optimization
Motivating Example
Gaussian Process Regression
Bayesian Optimization
3 Combining online and offline experiments
Value Model Tuning
Multi-Task Bayesian Optimization
4 Open Source Tools
Ax
BoTorch
5 Constrained Bayesian Contextual Bandits
Video Upload Transcoding Optimization
Constrained Thompson Sampling (CTS)
Reward Shaping and Hyperparameter Optimization
Adaptive Experimentation Team
• Horizontal R&D team within Facebook
• Goal: radically change the way people run experiments and develop systems:
  • Reduce threshold for experimentation
  • Use RL to robustly solve explore/exploit problems
  • Develop tools to improve and automate decision-making under multiple and/or constrained objectives
Spectrum of Automation
Heterogeneous Connections and Devices
Homogeneous Status Quo Policy
Idea: What if we loaded different numbers of stories depending on the connection type?
Potential Contextualized Policy
Idea: What if we loaded more posts for better connection types?
Potential Contextualized Policy - Opposite
Idea: What if we loaded fewer posts for better connection types?
Potential Contextualized Policies
Suppose that for each connection type c:
• We could fetch any number of posts xc ∈ [2, 24]
• Then there are 22^4 = 234,256 possible configurations to test!
Policies as Black-box Functions
The average treatment effect over all individuals can be expected to be some smooth function of the policy table x = [x1, ..., xk]:
f(x) : R^k → R
Black-box Function View of RL
• Turns the "full RL" problem into an infinite-armed bandit problem: find the policy π_{x*}, where
  x* = argmax_x g(f(x))
• Advantages:
  • Does not require estimating value functions, state transition functions, or inference about unobserved states
  • Involves virtually no logging of actions, states, or intermediate rewards
  • Allows for direct maximization of multiple, delayed rewards
Question: How can we make predictions about long-term outcomes from a limited number of vector-valued policies?
Gaussian Process (GP) Priors
Gaussian Process (GP) Posteriors
GP regression gives well-calibrated posterior predictive intervals that are easy to compute.
Gaussian Process (GP) Regression
In practice, we find that GP surrogate models fit the data well for many online experiments.
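To make the surrogate concrete, here is a minimal numpy sketch of GP posterior inference with an RBF kernel and Gaussian observation noise. It is a toy stand-in for the production models, not the code used in these experiments.

# Minimal GP regression: posterior mean and variance under an RBF kernel.
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    # k(a, b) = variance * exp(-||a - b||^2 / (2 * lengthscale^2))
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)

def gp_posterior(X_train, y_train, X_test, noise=1e-2):
    # Standard GP predictive equations, computed via a Cholesky solve.
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    L = np.linalg.cholesky(K)
    K_s = rbf_kernel(X_train, X_test)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf_kernel(X_test, X_test)) - (v ** 2).sum(0)
    return mean, var

# Toy usage: a few noisy observations of a smooth function.
X = np.array([[0.1], [0.4], [0.9]])
y = np.sin(2 * np.pi * X).ravel() + 0.01 * np.random.randn(3)
mu, var = gp_posterior(X, y, np.linspace(0, 1, 5)[:, None])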
Other Examples with Continuous Action Spaces
• Value models governing ranking policies, e.g.
  rank(Z) = x1 · P(click | Z) + x2 · Z_num_friends^x3 + f(P(spam | Z) / x4) + ...
• Bit-rate controllers for video and audio streaming
• Data retrieval policies for ML backends
Question: How do we use GP surrogate models to guide the explore-exploit trade-off?
Bayesian Optimization
(Figures: setup; rounds 1-3 of sequential evaluation; q-batch Bayesian optimization.)
The response surface is maximized sequentially:
• Models tell us which regions should be considered for further assessment
Algorithm 1 BayesianOptimization
1: Run N random initial arms
2: for t = 0 to T do
3:   Fit GP model to data
4:   Use acquisition function to select candidates C
5:   Evaluate C on the black-box function
6:   Add new observations to the dataset
7: end for
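Below is a runnable transcription of Algorithm 1 that assumes scikit-learn's GP for step 3 and plain Expected Improvement over random candidate points for step 4; the production setups described in this deck use noisy EI instead, and evaluate is a toy stand-in for an online experiment.

import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_improvement(mu, sigma, best):
    z = (mu - best) / np.maximum(sigma, 1e-9)
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

def evaluate(x):
    # Toy black box: unknown to the optimizer, observed with noise.
    return -np.sum((x - 0.3) ** 2) + 0.01 * np.random.randn()

dim = 2
X = np.random.rand(5, dim)                           # 1: N random initial arms
y = np.array([evaluate(x) for x in X])
for t in range(10):                                  # 2: for t = 0 to T
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)   # 3: fit GP
    cands = np.random.rand(1000, dim)
    mu, sigma = gp.predict(cands, return_std=True)
    x_next = cands[np.argmax(expected_improvement(mu, sigma, y.max()))]  # 4
    X = np.vstack([X, x_next])                       # 5-6: evaluate and append
    y = np.append(y, evaluate(x_next))
print("best arm:", X[np.argmax(y)], "value:", y.max())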
Alternatives
Grid Search (expensive: 81 arms)
Random Search (cheaper: 25 arms)
• Maxima can be deduced with only a few, smartly chosen arms
Competing Objectives
• Product teams are used to running an A/B test and observing the outcomes
• Often, there are multiple competing objectives
If we want full automation, we need to specify more information in advance: ideally, "the" scalarized objective.
Decision Makers Have Multiple Objectives
Decision makers don’t like scalarizations, e.g.
objective = −0.8 · cpu + 1.1 · time spent
Decision makers prefer constraints:
min(cpu) subject to time spent > 0.7
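A sketch of how such a constrained objective can be declared through Ax's Service API. The metric names cpu and time_spent and the single range parameter are illustrative; note that Ax expresses outcome constraints as >= / <= bounds.

from ax.service.ax_client import AxClient

ax_client = AxClient()
ax_client.create_experiment(
    name="cpu_vs_time_spent",  # hypothetical experiment
    parameters=[{"name": "x1", "type": "range", "bounds": [0.0, 1.0]}],
    objective_name="cpu",
    minimize=True,                              # min(cpu)
    outcome_constraints=["time_spent >= 0.7"],  # subject to time spent > 0.7
)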
Practical Challenges
• Constrained optimization
• Observations often have high variance, leading to potentially large measurement error
• High noise levels can degrade the performance of many common acquisition functions, including Expected Improvement
Solution
For more details, see
• Constrained Bayesian Optimization with Noisy Experiments. Letham, Karrer, Ottoni, & Bakshy. Bayesian Analysis, 2019.
Value Model Tuning
• Ranking teams use value models, which combine multiple predictive models and features, e.g.
  rank(Z) = x1 · P(click | Z) + x2 · Z_num_friends^x3 + f(P(spam | Z) / x4) + ...
• Not feasible to run sufficiently powered experiments with 20+ arms, so the team developed a simulator
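To make the parameterization concrete, here is a hypothetical value model in Python: the tunable policy is the vector x = [x1, x2, x3, x4], while the predictive models and features come from the ranking system. The transform f is taken to be tanh purely for illustration.

import math

def value_model(x, p_click, num_friends, p_spam, f=math.tanh):
    # rank(Z) = x1*P(click|Z) + x2*Z_num_friends^x3 + f(P(spam|Z)/x4) + ...
    x1, x2, x3, x4 = x
    return x1 * p_click + x2 * num_friends ** x3 + f(p_spam / x4)

score = value_model([1.0, 0.5, 0.3, 2.0], p_click=0.12, num_friends=250, p_spam=0.01)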
Simulation Setup
Biased Simulator
Debiasing Simulations with Multi-Task Models
Multi-Task Bayesian Optimization Loop
Algorithm 2 MultiTaskBayesianOptimization
1: Run N random arms online
2: Run M random arms offline, with M > N
3: for t = 0 to T do
4:   Fit MT-GP model to all data, with each batch as a separate task
5:   Use NEI to generate q candidates C (e.g., q = 30)
6:   Run C on the simulator; fit the GP model again
7:   Use NEI to generate candidates to run online
8: end for
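A minimal sketch of one pass through steps 4-5 using BoTorch's MultiTaskGP and qNoisyExpectedImprovement, assuming a recent BoTorch version. The toy online and simulator functions stand in for real experiments, with the simulator deliberately biased; task 0 is the online task being optimized.

import torch
from botorch.models import MultiTaskGP
from botorch.fit import fit_gpytorch_mll
from botorch.acquisition import qNoisyExpectedImprovement
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood

def online(x):                                  # task 0: "online experiment"
    return -((x - 0.3) ** 2).sum(-1, keepdim=True)

def simulator(x):                               # task 1: biased offline simulator
    return online(x) + 0.2 * x.sum(-1, keepdim=True)

X_on = torch.rand(8, 2, dtype=torch.double)     # N online arms
X_off = torch.rand(64, 2, dtype=torch.double)   # M offline arms, M > N

def with_task(X, i):
    # Append the task-id column that MultiTaskGP expects in its inputs.
    return torch.cat([X, torch.full((len(X), 1), float(i), dtype=torch.double)], -1)

X = torch.cat([with_task(X_on, 0), with_task(X_off, 1)])
Y = torch.cat([online(X_on), simulator(X_off)])

model = MultiTaskGP(X, Y, task_feature=-1, output_tasks=[0])  # 4: fit MT-GP
fit_gpytorch_mll(ExactMarginalLogLikelihood(model.likelihood, model))
acqf = qNoisyExpectedImprovement(model, X_baseline=X_on)      # 5: NEI
candidates, _ = optimize_acqf(
    acqf, bounds=torch.tensor([[0.0, 0.0], [1.0, 1.0]], dtype=torch.double),
    q=4, num_restarts=10, raw_samples=128,
)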
Example of Multi-Task Bayesian Optimization
(Figure: Objective and Constraints panels, plotting Outcome against Iteration over 40 iterations.)
Paper
For more details, see
• Bayesian Optimization for Policy Search via Online-Offline Experimentation. Letham & Bakshy, 2019. arXiv:1904.01049 (forthcoming).
Open Source Tools
Research to Production
Simple APIs
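As an example of the kind of simple API meant here, Ax's Service API reduces the whole optimization to an ask/tell loop. Everything below is hypothetical: the experiment name, the two parameters, and the run_experiment evaluator, which would deploy the arm and return the measured metric.

from ax.service.ax_client import AxClient

def run_experiment(params):
    # Hypothetical evaluator: launch the arm online and return, e.g.,
    # {"engagement": (mean, sem)} computed from the experiment's results.
    ...

ax_client = AxClient()
ax_client.create_experiment(
    name="posts_to_fetch",
    parameters=[
        {"name": "x_wifi", "type": "range", "bounds": [2, 24], "value_type": "int"},
        {"name": "x_2g", "type": "range", "bounds": [2, 24], "value_type": "int"},
    ],
    objective_name="engagement",
)
for _ in range(20):
    params, trial_index = ax_client.get_next_trial()                        # ask
    ax_client.complete_trial(trial_index, raw_data=run_experiment(params))  # tell
best_parameters, values = ax_client.get_best_parameters()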
Adaptive Experimentation in Practice
Experiment Understanding
BoTorch
BoTorch: Building Blocks
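The building blocks compose directly: a GPyTorch surrogate model, a Monte Carlo acquisition function, and a candidate optimizer. A minimal single-task sketch, assuming a recent BoTorch version (fit_gpytorch_mll is the current name of the fitting helper); the multi-task sketch earlier swaps in MultiTaskGP.

import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll
from botorch.acquisition import qNoisyExpectedImprovement
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood

train_X = torch.rand(20, 2, dtype=torch.double)
train_Y = -((train_X - 0.3) ** 2).sum(-1, keepdim=True)   # toy objective

model = SingleTaskGP(train_X, train_Y)                    # surrogate model
fit_gpytorch_mll(ExactMarginalLogLikelihood(model.likelihood, model))
acqf = qNoisyExpectedImprovement(model, X_baseline=train_X)
candidates, acq_value = optimize_acqf(
    acqf, bounds=torch.tensor([[0.0, 0.0], [1.0, 1.0]], dtype=torch.double),
    q=4, num_restarts=10, raw_samples=256,                # q-batch of 4 arms
)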
Improving Researcher Efficiency
Video Upload Transcoding Optimization
Problem
• The system receives requests to upload videos of different source qualities and file sizes from a variety of network connections and devices
• To ensure high reliability, a video may be transcoded to be uploaded at a lower quality
• For each video upload request, we have features about:
  • the video: file size, duration, source resolution
  • the network: country, network type, download speed
  • the device
Goal
• Maximize quality preserved without decreasing reliability
Video Upload Transcoding - CB Problem
• Context: features about the video, network, and device
• Actions: 360p, 480p, 720p, 1080p
• Outcomes: reliability y(x, a)
• Rewards: some function R(x, a, y), to be designed (see reward shaping below)
Approach - Bandit Algorithm
Thompson Sampling
• Works well in batch mode
• Hyperparameter-free exploration
• Always "picks the best" codec: selects each codec with probability proportional to its being the best
Approach - Modeling
Bayesian Linear Model
• Bernoulli likelihood to predict reliability
• Uses a neural network feature extractor: a simple two-layer MLP (50, 4) trained via SGD
• The last layer is a stochastic variational GP with a linear kernel
• Trained via stochastic variational inference using 1000 inducing points placed according to a space-filling design
Thompson Sampling
Algorithm 3 ThompsonSampling
Input: discrete set of actions A, distribution over models P0(f)
1: for t = 0 to T do
2:   Sample model f̃t ∼ Pt(f | X, y)
3:   Select an action at ← argmax_{a ∈ A} E[rt | xt, a, f̃t]
4:   Observe reward rt
5:   Update distribution Pt+1(f)
6: end for
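Algorithm 3 in its simplest runnable form: Thompson sampling for Bernoulli rewards with conjugate Beta posteriors. This is a toy stand-in for the GP-based model above, but it makes the sample-then-argmax structure explicit.

import numpy as np

def thompson_sampling(true_probs, T=1000, seed=0):
    rng = np.random.default_rng(seed)
    k = len(true_probs)
    alpha, beta = np.ones(k), np.ones(k)   # Beta(1, 1) priors, i.e. P0(f)
    for t in range(T):
        theta = rng.beta(alpha, beta)      # 2: sample a model from the posterior
        a = int(np.argmax(theta))          # 3: act greedily on the sample
        r = rng.random() < true_probs[a]   # 4: observe a Bernoulli reward
        alpha[a] += r                      # 5: conjugate posterior update
        beta[a] += 1 - r
    return alpha, beta

alpha, beta = thompson_sampling([0.85, 0.90, 0.80, 0.75])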
Issues with Vanilla Thompson Sampling
• Thompson sampling does not account for the constraint:
  • the change in reliability must be non-negative
• Unclear how to optimally specify the reward parameterization
Constrained Thompson Sampling
Algorithm 4 ConstrainedThompsonSampling
1: Input: discrete set of actions A, distribution over models P0(f)
2: for t = 0 to T do
3:   Receive context xt
4:   Sample model f̃t ∼ Pt(f | X, y)
5:   for a ∈ A do
6:     Estimate outcomes f̃t(xt, a)
7:   end for
8:   Fetch the action under the baseline policy: b ← πb(xt)
9:   Filter feasible actions: Afeas ← {a ∈ A | f̃t(xt, a) ≥ ε · f̃t(xt, b)}
10:  Select an action at ← argmax_{a ∈ Afeas} E[rt | xt, a, f̃t]
11:  Observe outcome yt
12:  Update distribution Pt+1(f)
13: end for
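One step of Algorithm 4 as a Python sketch. The posterior sampler, baseline policy, and reward function are hypothetical stand-ins for the GP model and shaped reward described here. The baseline action is always feasible when ε ≤ 1 and outcomes are non-negative, so the filtered set is never empty.

def cts_step(x_t, actions, sample_model, pi_b, R, eps=0.98):
    f_t = sample_model()                       # 4: sample model from posterior
    y_hat = {a: f_t(x_t, a) for a in actions}  # 5-7: estimate outcomes per action
    b = pi_b(x_t)                              # 8: baseline policy's action
    feas = [a for a in actions if y_hat[a] >= eps * y_hat[b]]  # 9: filter
    return max(feas, key=lambda a: R(x_t, a, y_hat[a]))        # 10: argmax reward

# Toy usage with a fixed sampled model; higher quality is slightly less reliable.
actions = ["360p", "480p", "720p", "1080p"]
rel = {"360p": 0.97, "480p": 0.96, "720p": 0.95, "1080p": 0.90}
a_t = cts_step(
    x_t=None, actions=actions,
    sample_model=lambda: (lambda x, a: rel[a]),
    pi_b=lambda x: "360p",
    R=lambda x, a, y: y * (1 + 0.1 * actions.index(a)),
    eps=0.98,
)  # -> "480p": best shaped reward among actions within 2% of baseline reliability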
Reward Shaping Setup
Reward Shaping:
• Reward is 0 if the upload fails
• Reward is fixed at 1 for a successful 360p upload
• Reward is monotonically increasing with quality:
  R(y = 1, a) = 1 + Σ_{a′ ≤ a} w_{a′}, where each w_{a′} ∈ (0.0, 0.2]
Safety Constraint: ε ∈ [0.95, 1.0]
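A worked example of the shaped reward with illustrative (not tuned) weights, checking the properties above:

QUALITIES = ["360p", "480p", "720p", "1080p"]
w = {"480p": 0.1, "720p": 0.1, "1080p": 0.2}  # hypothetical weights, each in (0, 0.2]

def reward(success, quality):
    if not success:
        return 0.0                             # failed upload earns 0
    idx = QUALITIES.index(quality)
    return 1.0 + sum(w[q] for q in QUALITIES[1:idx + 1])  # 1 + sum of w up to a

print(reward(True, "360p"))    # 1.0: fixed at 1 for a 360p success
print(reward(True, "1080p"))   # 1.4: monotonically increasing in quality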
Reward Shaping Optimization
• Teams care about top-line outcomes:
  • Reliability: mean reliability per user
  • Quality preserved: mean quality (e.g., 1080p preserved, HD) per user
  • Other outcomes: watch time, content production
• It is difficult to evaluate these outcomes from purely offline data
Solution: use Bayesian optimization (via Ax) with online experiments
Reward Shaping Optimization
(a) 1080p quality preserved (b) Reliability
Figure: GP-modeled response surfaces of the mean percent change in video quality and reliability relative to the baseline policy. Each point represents a policy parameterized by the reward-function hyperparameters and the constraint parameter ε.
Reward Shaping Optimization
Thanks
Adaptive Experimentation Team
• Manager: Eytan Bakshy
• Constrained TS: Sam Daulton, Shaun Singh, Drew Dimmery
• BoTorch: Max Balandat, Sam Daulton, Daniel Jiang, Brian Karrer, Ben Letham
• Ax: Kostya Kashin, Lili Dworkin, Lena Kashtelyan, Ben Letham, Ashwin Murthy, Shaun Singh, and Drew Dimmery
Papers
• Constrained Bayesian Optimization with Noisy Experiments. Letham, Karrer, Ottoni, & Bakshy. Bayesian Analysis, 2019.
• Bayesian Optimization for Policy Search via Online-Offline Experimentation. Letham & Bakshy, 2019. arXiv:1904.01049 (forthcoming).

Talk given at the Netflix ML Platform meetup, September 2019.