Bandit Algorithms for Website Optimization
by John Myles White
summary by Kyle (Kwanghee Choi)
1. Two Characters: Exploration and Exploitation
- Need to balance exploration and exploitation
- or, experimentation and profit-maximization
- or, learning new ideas and taking advantage of the best of old ideas,
- or, gathering data and acting on that data
2. Why Use Multiarmed Bandit Algorithms?
- Examples of measurable achievements
  - Traffic, Conversions, Sales, CTRs
- Definitions
  - Reward: a measure of success
  - Arms: the list of potential changes being tested
- Standard A/B testing viewed as an exploration-exploitation tradeoff
  - A short period of pure exploration (assigning equal numbers of users to A and B)
  - A long period of pure exploitation (sending all users to the more successful option)
- Why might A/B testing be a bad strategy?
  - Abrupt transition from pure exploration to pure exploitation
  - Wastes resources exploring inferior options
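To make the "short exploration, then long exploitation" structure concrete, here is a minimal Python sketch of an A/B test written as a bandit policy. The class name `ABTest`, its `select_arm`/`update` interface, and the `test_size` parameter are illustrative assumptions, not code from the book: the first `test_size` users are split round-robin across arms, and every later user goes to the arm with the best observed average.

```python
class ABTest:
    """Hypothetical explore-then-exploit policy modeling a standard A/B test."""

    def __init__(self, n_arms, test_size):
        self.test_size = test_size      # number of users in the pure-exploration phase
        self.counts = [0] * n_arms      # pulls per arm
        self.values = [0.0] * n_arms    # running average reward per arm
        self.total = 0                  # total pulls so far

    def select_arm(self):
        if self.total < self.test_size:
            # Pure exploration: assign users round-robin so each arm gets equal traffic.
            return self.total % len(self.counts)
        # Pure exploitation: send everyone to the arm with the best observed average.
        return max(range(len(self.values)), key=lambda i: self.values[i])

    def update(self, arm, reward):
        self.total += 1
        self.counts[arm] += 1
        # Incremental mean: v <- v + (reward - v) / n
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

The hard switch at `test_size` is the abrupt transition, and every round-robin pull spent on a clearly worse arm is wasted exploration.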
3. The ϵ-Greedy Algorithm
- Tries to be fair to the two opposing goals of exploration and exploitation
  - With probability ϵ, explore (pick an arm uniformly at random); otherwise, exploit (pick the arm with the highest estimated value).
  - ϵ = 0: pure exploitation
  - ϵ = 1: pure exploration
- Problems with a fixed ϵ
  - May need more exploration at the start and more exploitation later on.
  - Explores arms completely at random, without any concern for their merits.
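The ϵ-Greedy rule can be sketched in a few lines of Python. This is a minimal illustration, not the book's code; the class and method names (`EpsilonGreedy`, `select_arm`, `update`) are assumptions, and each arm's value is tracked as a running average of its rewards.

```python
import random


class EpsilonGreedy:
    """Minimal epsilon-greedy bandit (a sketch; names are illustrative)."""

    def __init__(self, epsilon, n_arms):
        self.epsilon = epsilon          # probability of exploring on each pull
        self.counts = [0] * n_arms      # number of pulls per arm
        self.values = [0.0] * n_arms    # average reward per arm

    def select_arm(self):
        if random.random() > self.epsilon:
            # Exploit: pick the arm with the highest estimated value.
            return max(range(len(self.values)), key=lambda i: self.values[i])
        # Explore: pick any arm uniformly at random, ignoring its estimate.
        return random.randrange(len(self.values))

    def update(self, arm, reward):
        self.counts[arm] += 1
        n = self.counts[arm]
        # Incremental mean: v <- v + (reward - v) / n
        self.values[arm] += (reward - self.values[arm]) / n
```

Because `epsilon` is fixed at construction, the sketch exhibits both problems listed above: it explores at the same rate forever, and an exploratory pull is as likely to land on the worst arm as on the second-best one.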
4. Debugging Bandit Algorithms
- Bandit algorithms are not black-box functions.
- A bandit algorithm has to actively select which data it acquires (active learning) and analyze that data in real time (online learning).
- Bandit data and bandit analysis are inseparable: they form a “feedback cycle.”
- Use Monte Carlo simulation to generate simulated data that arrives as it would in real time.
- Analyzing results
  - Track the probability of choosing the best arm, since both the bandit algorithm and the rewards are probabilistic.
  - Track the average reward at each point in time.
  - Track the cumulative reward at each point in time, to see the bigger picture of lifetime performance.
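A minimal Monte Carlo harness along these lines might look like the following Python sketch. It assumes Bernoulli arms (reward 1 with probability `p`, else 0) and any algorithm object exposing `select_arm()` and `update(arm, reward)`. `BernoulliArm`, `RandomChoice`, and `run_simulations` are hypothetical names, and `RandomChoice` is only a pure-exploration placeholder to plug in before a real algorithm.

```python
import random


class BernoulliArm:
    """Simulated arm that pays reward 1 with probability p, else 0."""

    def __init__(self, p):
        self.p = p

    def draw(self):
        return 1.0 if random.random() < self.p else 0.0


class RandomChoice:
    """Hypothetical baseline that always explores (like epsilon = 1)."""

    def __init__(self, n_arms):
        self.n_arms = n_arms

    def select_arm(self):
        return random.randrange(self.n_arms)

    def update(self, arm, reward):
        pass  # a pure-exploration policy learns nothing


def run_simulations(make_algo, arms, num_sims, horizon):
    """Run num_sims independent simulations of `horizon` steps each.

    Returns three per-time-step curves averaged over simulations:
    P(chose best arm), average reward, and average cumulative reward.
    """
    best = max(range(len(arms)), key=lambda i: arms[i].p)
    p_best = [0.0] * horizon
    avg_reward = [0.0] * horizon
    avg_cumulative = [0.0] * horizon
    for _ in range(num_sims):
        algo = make_algo()  # fresh algorithm state for each simulation
        cumulative = 0.0
        for t in range(horizon):
            arm = algo.select_arm()
            reward = arms[arm].draw()
            algo.update(arm, reward)
            cumulative += reward
            p_best[t] += (1.0 if arm == best else 0.0) / num_sims
            avg_reward[t] += reward / num_sims
            avg_cumulative[t] += cumulative / num_sims
    return p_best, avg_reward, avg_cumulative
```

Averaging over many independent simulations is what turns single noisy runs into smooth curves; computing them for several algorithms on the same arms gives a side-by-side comparison.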
5. The Softmax Algorithm
- Problems with a fixed ϵ, revisited
  - If the difference in rewards between two arms is small, more exploration is needed, and vice versa.
  - ϵ-Greedy never gets past the intrinsic errors caused by its purely random exploration strategy.
- Set the probability of choosing arm A, whose estimated (average) reward is $r_A$, as
  $P(A) = \frac{e^{r_A/\tau}}{\sum_i e^{r_i/\tau}}$
- The temperature parameter τ shifts the behavior along a continuum between pure exploration (τ = ∞) and pure exploitation (τ = 0).
- Negative rewards are fine, because exponentiation maps every reward to a positive weight.
- Annealing: encourage the algorithm to explore less over time by slowly decreasing τ.
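The softmax rule and annealing can be sketched in Python as follows. This is an illustration under assumptions: `Softmax` and its methods are hypothetical names, τ must be positive, and the annealing schedule (τ roughly 1 / ln t, with t the number of pulls so far plus one) is just one possible choice of a slowly decreasing temperature.

```python
import math
import random


class Softmax:
    """Softmax (Boltzmann) exploration with an optional annealing schedule."""

    def __init__(self, n_arms, temperature=None):
        self.temperature = temperature  # fixed tau > 0, or None to anneal over time
        self.counts = [0] * n_arms      # pulls per arm
        self.values = [0.0] * n_arms    # average reward per arm (r_i)

    def current_temperature(self):
        if self.temperature is not None:
            return self.temperature
        # Hypothetical annealing schedule: tau shrinks like 1 / ln(t).
        t = sum(self.counts) + 1
        return 1.0 / math.log(t + 1e-7)

    def probabilities(self):
        tau = self.current_temperature()
        # Subtract the max before exponentiating for numerical stability;
        # this does not change the resulting probabilities.
        m = max(self.values)
        weights = [math.exp((v - m) / tau) for v in self.values]
        z = sum(weights)
        return [w / z for w in weights]

    def select_arm(self):
        probs = self.probabilities()
        return random.choices(range(len(probs)), weights=probs)[0]

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

Subtracting the maximum value before exponentiating avoids overflow for large rewards or small τ, and a very small τ drives almost all probability onto the best-looking arm.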
6. UCB: The Upper Confidence Bound Algorithm
- Problems with Softmax algorithms
  - They only pay attention to how much reward they have gotten from each arm.
  - Gullible: easily misled by a few negative experiences, because they do not keep track of how much they know about each arm (i.e., how confident they are in their estimates).
- UCB algorithms avoid being gullible by keeping track of their confidence in the estimated values of all the arms.
- UCB algorithms do not use randomness and have no free parameters.
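- UCB1 (one of the variants of UCB) chooses the arm $i$ that maximizes $r_i + b_i$, where $r_i$ is the estimated (average) reward of arm $i$, $n_i$ is the number of times arm $i$ has been played, $n = \sum_j n_j$ is the total number of plays, and
  $b_i = \sqrt{\frac{2 \ln n}{n_i}}$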
- Cold start is prevented by setting $b_i = \infty$ for any arm with $n_i = 0$, so every arm is played at least once.
- UCB algorithms are explicitly curious.
  - Curiosity is implemented through the bonus $b_i$, which gets bigger when $n_i$ is small.
  - As a result, UCB will occasionally revisit even the worst arms.
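Here is a minimal UCB1 sketch in Python. Unplayed arms are chosen first; afterwards it picks the arm maximizing the average reward plus a bonus of √(2 ln n / n_i), where n is the total number of plays and n_i the plays of arm i. The class and method names are illustrative assumptions, and the sketch assumes rewards lie in [0, 1].

```python
import math


class UCB1:
    """Minimal UCB1 sketch: deterministic, no free parameters."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms     # n_i: pulls of each arm
        self.values = [0.0] * n_arms   # r_i: average reward of each arm

    def select_arm(self):
        # Cold start: an unplayed arm has an infinite bonus, so play it first.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        n = sum(self.counts)
        ucb = [
            r + math.sqrt(2 * math.log(n) / n_i)  # r_i + b_i
            for r, n_i in zip(self.values, self.counts)
        ]
        return max(range(len(ucb)), key=lambda i: ucb[i])

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

For example, with `counts = [100, 1]` and `values = [0.5, 0.1]`, the rarely played second arm gets a bonus of about 3.04 and is chosen despite its lower average; this is the curiosity that makes UCB1 revisit weak arms.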
- Comparing bandit algorithms side by side
  - UCB1 is much noisier than ϵ-Greedy or Softmax.
  - ϵ-Greedy doesn't converge as quickly as Softmax.
  - UCB1 takes a while to catch up with Softmax.
  - UCB1 finds the best arm quickly, but the backpedaling it does causes it to underperform Softmax.
7. Bandits in the Real World: Complexity and Complications
- A/A testing
  - Test the bandit algorithm itself by running it on identical arms.
  - Estimate the actual variability in real-time data.
- Running concurrent experiments
  - Experiments may interact in strange ways (e.g., one experiment on logos and another on fonts).
- Continuous experimentation vs. periodic testing
  - Bandit algorithms look much better than A/B testing when you are willing to let them run for a very long time.
- Metrics of success
  - Optimizing short-term CTR may destroy long-term retention.
  - Rescaling metrics into the [0, 1] range helps the algorithms work well.
- Moving worlds
  - Arms with changing rewards raise serious problems.
  - Plain average (no parameter to tune) vs. weighted average (flexibility in moving worlds, at the cost of a parameter to tune)
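The difference between the two estimators can be sketched in Python. `update_average` gives every past reward equal weight, while `update_weighted` uses a constant step size `alpha` (a hypothetical parameter you would have to tune) so that older rewards fade exponentially. The drifting arm below, whose reward jumps from 0.2 to 0.8 halfway through, is made-up illustration data.

```python
def update_average(value, count, reward):
    """Sample average: every past reward weighs the same (no parameter)."""
    return value + (reward - value) / count


def update_weighted(value, reward, alpha):
    """Exponentially weighted average: recent rewards weigh more.

    alpha in (0, 1] is a hypothetical step size to tune; a larger alpha
    forgets old data faster, which helps when arm rewards drift.
    """
    return value + alpha * (reward - value)


# Simulate an arm whose true reward jumps from 0.2 to 0.8 halfway through.
rewards = [0.2] * 500 + [0.8] * 500
avg, wavg = 0.0, 0.0
for count, r in enumerate(rewards, start=1):
    avg = update_average(avg, count, r)
    wavg = update_weighted(wavg, r, alpha=0.1)
print(f"sample average: {avg:.3f}, weighted average: {wavg:.3f}")
```

After the jump, the plain average ends at 0.5, halfway between the old and new values, while the weighted average ends at about 0.8 and so tracks the moved world.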
8. Conclusion
- There is no universal bandit algorithm that will always do the best job.
- Domain expertise and good judgement will always be necessary.
- There is always a trade-off between exploration and exploitation.
- Initialization of an algorithm matters a lot; initial biases can either help or hurt.
- Make sure you explore less over time.
