What are Multi-armed Bandits?
What are they? Why are we talking about them?
Source: VWO Blog
Source: Medium Article
MAB vs A/B Test
How are they different? Pros and cons
Source: VWO Blog
Source: VWO Blog
Business Use Cases
Source: Vector Stock
Short-lived campaigns
Multiple variations
Source: Bob WP
Just saving Conversions
Source: Search Engine Journal
Core Components of VWO’s Bandit Algorithm
● Weight Initialization
● Weight Update
● Traffic Split Computation
● Add exploration factor
Refer to "Understanding the Working of Multi Armed Bandit in VWO" for the
mathematics in more detail.
Weight Initialization
Content Weights
Individual performance of each variant
Content Interaction Weights (MVT only)
Paired performance of each variant
Every weight is a distribution, not a single number
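As a rough illustration of the two kinds of weights, the sketch below stores each weight as a Gaussian with a mean and a variance. The Gaussian form, the prior values, the `GaussianWeight` and `init_weights` names, and the example section names are all assumptions made for this sketch; the slides only state that each weight is a distribution.

```python
from dataclasses import dataclass
from itertools import combinations, product


@dataclass
class GaussianWeight:
    """One weight, held as a distribution (assumed Gaussian here)."""
    mean: float = 0.0
    variance: float = 1.0


def init_weights(sections, prior_mean=0.0, prior_variance=1.0):
    """Create content weights and, for MVT, pairwise interaction weights.

    sections maps a page section to its list of variants (hypothetical layout).
    """
    # Content weights: one per variant, capturing its individual performance.
    content = {
        (section, variant): GaussianWeight(prior_mean, prior_variance)
        for section, variants in sections.items()
        for variant in variants
    }
    # Interaction weights: one per pair of variants from different sections.
    interaction = {}
    for (sec_a, vars_a), (sec_b, vars_b) in combinations(sections.items(), 2):
        for var_a, var_b in product(vars_a, vars_b):
            key = ((sec_a, var_a), (sec_b, var_b))
            interaction[key] = GaussianWeight(prior_mean, prior_variance)
    return content, interaction


sections = {"headline": ["h1", "h2"], "button": ["b1", "b2", "b3"]}
content, interaction = init_weights(sections)
```

With a single section (a plain A/B style test) the interaction dictionary stays empty, which matches the "MVT only" note on the slide.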
Weight Update
We use the Assumed Density Filtering algorithm
to model a layout’s conversion rate/revenue and
the message passing algorithm of Bayesian factor
graphs to update its corresponding weight
distributions.
This approach has helped us model conversion
rate/average revenue in MAB analytically,
helping us build a scalable solution for our
customers.
[Figure: message passing on the factor graph, showing the forward pass and the backward pass]
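To make the update step more concrete, here is a minimal sketch of one Assumed Density Filtering update for Gaussian weights and a binary conversion outcome. It assumes a probit likelihood with noise scale `beta`, in the style of published Bayesian click-prediction models; the slides do not confirm that this is the exact likelihood VWO uses, and the function name and `beta` value are hypothetical.

```python
import math


def _normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def _normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def adf_update(weights, converted, beta=1.0):
    """One ADF step for the weights active in the layout a visitor saw.

    weights is a list of (mean, variance) pairs; beta is an assumed noise scale.
    """
    y = 1.0 if converted else -1.0
    # Forward pass: combine the active weights into a score distribution.
    total_variance = beta ** 2 + sum(var for _, var in weights)
    total_sd = math.sqrt(total_variance)
    t = y * sum(mean for mean, _ in weights) / total_sd
    # Moment-matching corrections for the truncated Gaussian.
    v = _normal_pdf(t) / _normal_cdf(t)
    w = v * (v + t)
    # Backward pass: send the correction back to every active weight.
    return [
        (mean + y * (var / total_sd) * v, var * (1.0 - (var / total_variance) * w))
        for mean, var in weights
    ]


updated = adf_update([(0.0, 1.0), (0.0, 1.0)], converted=True)
```

Each update is a closed-form expression, which is the sense in which the model is handled analytically rather than by sampling or iterative optimization.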
Thompson Sampling (1933)
● Fewer trials mean more uncertainty in the
estimates. The spread/variance captures this
uncertainty: it enables exploration.
● With more trials, posteriors concentrate
on the true parameter: this enables exploitation.
To sustain itself and adapt to a changing environment, any system needs to keep exploring
while it exploits.
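The effect of the number of trials on uncertainty can be seen with a small worked example. The sketch below uses a Beta posterior over a conversion rate, which is the textbook setting for Thompson sampling and is used here purely for illustration; it is an assumption, not the weight model described in these slides, and the visitor counts are made up.

```python
def beta_posterior(conversions, visitors, prior_a=1.0, prior_b=1.0):
    """Mean and variance of a Beta posterior over a conversion rate."""
    a = prior_a + conversions
    b = prior_b + (visitors - conversions)
    mean = a / (a + b)
    variance = (a * b) / ((a + b) ** 2 * (a + b + 1.0))
    return mean, variance


# Same observed conversion rate (10%), very different amounts of evidence.
few_trials = beta_posterior(conversions=2, visitors=20)
many_trials = beta_posterior(conversions=200, visitors=2000)
```

The posterior for 2,000 visitors is far narrower than the one for 20 visitors, so samples drawn from it rarely stray from the mean. That is why a lightly tested variation still wins some sampling rounds (exploration) while a well-tested one is judged mostly on its estimated performance (exploitation).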
Traffic Split Computation
1. Compute a layout's score:
○ Draw a sample from each weight
distribution that corresponds to the
layout.
○ Add up all the sampled weights.
2. Repeat step 1 for each layout.
3. Find the layout with the maximum score.
4. Repeat steps 1-3 many times. The proportion
of rounds each layout wins is the traffic
split produced by the
algorithm.
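The steps above can be sketched as a small Monte Carlo loop. The Gaussian weight form, the number of rounds, the fixed seed, and the example layouts are assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def traffic_split(layouts, n_rounds=10000, seed=0):
    """Estimate each layout's winning proportion by repeated sampling.

    layouts maps a layout name to the (mean, variance) pairs of its weights.
    """
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(n_rounds):
        # Steps 1-2: sample every weight of every layout and sum per layout.
        scores = {
            name: sum(rng.gauss(mean, math.sqrt(var)) for mean, var in weights)
            for name, weights in layouts.items()
        }
        # Step 3: the layout with the highest sampled score wins this round.
        wins[max(scores, key=scores.get)] += 1
    # Step 4: winning proportions become the traffic split.
    return {name: wins[name] / n_rounds for name in layouts}


layouts = {
    "control": [(0.0, 0.04)],
    "variation": [(0.3, 0.04)],
}
split = traffic_split(layouts)
```

Because the split is estimated from random draws, more rounds give a steadier estimate at the cost of more computation.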
Add Exploration Factor
With Thompson sampling alone, once many data points have been collected the posteriors become
narrow, learning effectively comes to a halt, and the model runs in full exploitation mode. To keep
the model from converging to a single variation, we combine epsilon-greedy with Thompson sampling.
After obtaining the traffic split from Thompson sampling, we adjust it by
applying a fixed epsilon factor for exploration.
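One simple way to apply a fixed epsilon is to blend the Thompson sampling split with a uniform split. The slides do not spell out the exact adjustment, so the uniform blend, the `add_exploration` name, and the value `epsilon=0.1` are assumptions for this sketch.

```python
def add_exploration(split, epsilon=0.1):
    """Blend a traffic split with a uniform split so no share drops to zero."""
    n_layouts = len(split)
    return {
        name: (1.0 - epsilon) * share + epsilon / n_layouts
        for name, share in split.items()
    }


# A split that has fully converged on one variation.
converged = {"control": 0.0, "variation": 1.0}
adjusted = add_exploration(converged, epsilon=0.1)
```

Under this blend every layout keeps at least `epsilon / n_layouts` of the traffic, so the model continues to receive data on variations it currently considers weak.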
Performing Statistical Analysis in the MAB Report
To ensure an equal proportion of visitors across variations when
computing statistical reports:
● We take the minimum of the traffic
proportions obtained from MAB and use
it as the probability that a visitor is
considered for statistical analysis.
● We mark each visitor based on the result of
a Bernoulli trial.
When performing statistical analysis, we
consider only the marked visitors.
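A minimal sketch of the marking step follows. The slides do not say how the minimum proportion is turned into a per-variation probability; one reading that equalizes the expected number of marked visitors is to mark a visitor in a variation with probability (minimum proportion) / (that variation's proportion). That normalization, the function names, and the example split are assumptions.

```python
import random


def marking_probabilities(split):
    """Per-variation probability that a visitor is marked for analysis.

    Assumes every traffic share is positive.
    """
    smallest = min(split.values())
    return {name: smallest / share for name, share in split.items()}


def mark_visitor(variation, probabilities, rng):
    """Bernoulli trial deciding whether this visitor enters the analysis."""
    return rng.random() < probabilities[variation]


split = {"control": 0.2, "variation": 0.8}
probabilities = marking_probabilities(split)

rng = random.Random(0)
marked = mark_visitor("variation", probabilities, rng)
```

Under this reading, the variation with the smallest share has all of its visitors marked, while visitors of larger variations are thinned down so that the expected marked counts match.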
Pros/Cons of Our Algorithm
MAX-OUT AT
%
Revisiting the benefits and the tradeoffs
Questions?

Minimize conversion loss in cro testing with multi armed bandits (1)
