
Mar. 18, 2018

A talk I gave at 99designs about website optimisation using Multi-Armed Bandit algorithms. It's based on a talk and ebook by John Myles White, "Bandit Algorithms for Website Optimization".

Dennis Hotson, Senior Developer at 99designs

- A/B testing
- Multi Armed Bandit Algorithms
- Why do we run A/B tests?
- [Chart: conversion rate vs. time for an A/B test: Logo 1 before, Logo 1 + Logo 2 during, Logo 2 after]
- [Chart: conversion rate vs. time: shifting from Logo 1 to Logo 2 across before/during/after]
- Explore or exploit
- Objectively best: the option that will be the best in the future
- Subjectively best: the option that has been the best in the past
- Explore: choose any option. Exploit: choose the subjectively best option
- Regret
- Some classic Multi Armed Bandits...
- Epsilon greedy
- ((1−ε) × 100)% to the subjectively best; (ε/2 × 100)% to the subjectively best; (ε/2 × 100)% to the subjectively worst
- Monte Carlo Run random simulations 1,000’s of times
- Weaknesses of ε-greedy. Situation 1: A: 99%, B: 0.001%. Situation 2: A: 0.001%, B: 0.002%
- Softmax
- P(A) = 0.1, P(B) = 0.2
- Weaknesses of softmax. Situation 1: A: 0.01% after 100 trials, B: 0.02% after 100 trials. Situation 2: A: 0.01% after 100,000,000 trials, B: 0.02% after 100,000,000 trials
- UCB Upper Confidence Bound
- Weakness of UCB1. Gotcha: rewards have to be between 0.0 and 1.0. Works best on conversion rates; not as well on arbitrary dollar rewards.
- Further reading...
- Other UCB* algorithms; LinUCB / GLM-UCB; Exp3 and other Exp* algorithms
- Thanks!

- hello
- Why run A/B tests? Because we might have some idea that one option is better than another.
- The promise of MAB algorithms is that we can do something more like this. Ideally we want to be able to take advantage of what we learn as we go.
- So there's a dilemma: do we exploit what we think is the best option based on what we've seen, or explore the other options to find out more about them?
- MABs introduce the concept of regret: how often you had to try objectively worse options in order to figure out the objectively best one.
- Let's look at a few classic MAB algorithms.
- Epsilon greedy works by alternating between exploration and exploitation. The name comes from the parameter epsilon, which determines how much exploring to do versus exploiting.
- So one of the weaknesses of epsilon greedy is that it doesn't take into account the proportional differences between variations.
- Softmax attempts to address this by exploring options in proportion to how good they appear to be.
- Let's say we've got two options and one is twice as good as the other.
- So we could use straight proportionality, but instead softmax does a trick with exponentials: you can have rewards of arbitrary size, even negative ones, and still get back values between 0 and 1. The exponential squishes everything into a known range.
- Softmax also has a concept of "temperature". A bigger temperature means more "energy": more random, closer to a 50/50 A/B split. A lower temperature, closer to 0, favours the best option more in proportion to how good it is. A temperature of 0 is 100% exploitation.
- So one of the weaknesses of softmax is that it doesn't take into account how much you know about the different options.
- The idea is to keep track of how much you know about each option; this gives you a measure of how confident you are about each one.
- There’s a whole family of UCB algorithms, but this is one called UCB1
- So we take the observed conversion rate and add the confidence bound as a kind of bonus
- So the main gotcha of UCB1 is that the payoff has to be between 0 and 1.
- There are a bunch of variations of UCB algorithms. There are also contextual bandit algorithms that can take into account information about visitors. Exp3 algorithms are useful in adversarial settings.
- Thanks to Lars. Also, thanks to John Myles White. This talk is based on a presentation of his.
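The talk describes ε-greedy only in prose; here's a minimal Python sketch of the selection and update steps as described above (function and variable names are illustrative, not from the talk):

```python
import random

def epsilon_greedy(counts, values, epsilon=0.1):
    """Pick an arm index: explore with probability epsilon, else exploit.

    counts[i] is how many times arm i was played; values[i] is its
    observed mean reward so far.
    """
    if random.random() < epsilon:
        # Explore: pick any arm uniformly at random
        return random.randrange(len(values))
    # Exploit: pick the subjectively best arm
    return max(range(len(values)), key=lambda i: values[i])

def update(counts, values, arm, reward):
    """Incrementally update the running mean reward for the chosen arm."""
    counts[arm] += 1
    n = counts[arm]
    values[arm] += (reward - values[arm]) / n
```

With ε = 0.1, roughly 90% of traffic goes to the subjectively best option and the remaining 10% is spread uniformly over all arms, matching the percentages on the ε-greedy slide.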
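The softmax selection from the notes, with the exponential "squishing" trick and the temperature parameter, can be sketched like this (again, names are illustrative):

```python
import math
import random

def softmax_probs(values, temperature=0.1):
    """Selection probabilities proportional to exp(value / temperature)."""
    m = max(values)  # subtract the max so exp() can't overflow
    weights = [math.exp((v - m) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]

def softmax_select(values, temperature=0.1):
    """Draw an arm index according to the softmax probabilities."""
    r = random.random()
    cumulative = 0.0
    for i, p in enumerate(softmax_probs(values, temperature)):
        cumulative += p
        if r < cumulative:
            return i
    return len(values) - 1  # guard against floating-point round-off
```

A high temperature drives the probabilities toward a 50/50 split; a temperature near 0 concentrates almost all traffic on the best-looking arm, which is the behaviour described on the temperature slide.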
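And a sketch of UCB1 as described in the notes: the observed mean plus a confidence bonus that shrinks as an arm accumulates trials. Rewards are assumed to lie in [0, 1], per the gotcha above (names illustrative):

```python
import math

def ucb1_select(counts, values):
    """UCB1: observed mean plus a confidence-bound "bonus".

    counts[i] is the number of trials of arm i; values[i] is its mean
    reward, which must lie in [0, 1] (e.g. a conversion rate).
    """
    # First, make sure every arm has been tried at least once
    for i, n in enumerate(counts):
        if n == 0:
            return i
    total = sum(counts)

    def score(i):
        # Mean reward plus the upper confidence bound bonus;
        # rarely-tried arms get a bigger bonus
        return values[i] + math.sqrt(2 * math.log(total) / counts[i])

    return max(range(len(counts)), key=score)
```

The bonus term means an arm with few trials can beat an arm with a slightly higher observed mean, so UCB1 keeps exploring options it's still uncertain about without any explicit randomness.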
