SlideShare a Scribd company logo
Bandit Algorithms
for Website
Optimization
by John Myles White
summary by Kyle (Kwanghee Choi)
Reference
1. Two Characters:
Exploration and Exploitation
- Need to balance exploration and exploitation
- or, experimentation and profit-maximization
- or, learning new ideas and taking advantage of the best of old ideas,
- or, gathering data and acting on that data
2. Why Use
Multiarmed Bandit Algorithms?
- Measurable achievements examples
- Traffic, Conversions, Sales, CTRs
- Definitions
- Reward: Measure of success
- Arms: List of potential changes
- Explaining standard A/B testing as an exploration - exploitation tradeoff
- Short period of pure exploration (Assigning equal numbers of users to A/B)
- Long period of pure exploitation (Send all of the users to successful option
- Why A/B testing might be a bad strategy?
- Abrupt transition
- Wastes resources exploring inferior options
3. The ϵ-Greedy Algorithm
- Tries to be fair to the two opposite goals of exploration & exploitation
- ϵ=0: Pure exploitation
- ϵ=1: Pure exploration
- Problem of fixed ϵ
- May need more exploration at the start, may need more exploitation after some time.
- Explores arms completely at random without any concern about their merits.
4. Debugging
Bandit Algorithms
- Bandit algorithms are not black-box functions.
- Bandit algorithms have to actively select which data it should acquire (Active Learning)
and analyze that data at real time (Online Learning).
- Bandit data and bandit analysis are inseparable. “Feedback cycle.”
4. Debugging
Bandit Algorithms
- Use Monte Carlo simulation to provide simulated data in real-time.
- Analyzing results
- Tracking the probability of choosing the best arm,
as both bandit algorithms and rewards are probabilistic.
- Tracking the average reward at each point in time.
- Tracking the cumulative reward at each point in time,
to look at the bigger picture of the lifetime performance.
5. The Softmax Algorithm
- Problem of fixed ϵ revisited
- If the difference in rewards between two arms is small,
more exploration is needed, and vice versa.
- Never get past the intrinsic errors caused by the purely random exploration strategy.
- Set the probability of choosing arm A with accumulative reward rA
as …
-
- Temperature parameter τ shifts the behavior along a continuum
between pure exploration ( τ = ∞ ) and exploitation ( τ = 0 ) .
- Negative rewards are okay thanks to exponential rescaling.
- Annealing: Encouraging to explore less over time by slowly decreasing τ .
6. UCB
The Upper Confidence Bound Algorithm
- Problems of softmax algorithms
- Only pay attention on how much reward they’ve gotten from the arms.
- Gullible: easily misled by a few negative experiences, as the algorithm do not keep track of how
much they know about the arms (how much confident).
- UCBs avoid being gullible by keeping track of confidence in assessments of the
estimated values of all the arms.
- UCBs doesn’t use randomness, and doesn’t have any free parameters.
6. UCB
The Upper Confidence Bound Algorithm
- UCB1 (one of the variants of UCBs) chooses arm i
with accumulative rewards ri
, bonus bi
, and number of times ni
as …
-
- Cold start is prevented by bi
= ∞
- UCBs are explicitly curious algorithms.
- Curiousness are implemented with bonus bi
, where bi
gets bigger when ni
is too small.
- So, we will occasionally visit the worst of the arms.
6. UCB
The Upper Confidence Bound Algorithm
- Comparing bandit algorithms side-by-side
- UCB1 is much noisier than ϵ -Greedy or Softmax.
- ϵ -Greedy doesn’t converge as quickly as Softmax.
- UCB1 takes a while to catch up with Softmax.
- UCB1 finds the best arm quickly,
but the backpedaling it does causes it to underperform the Softmax.
7. Bandits in the Real World:
Complexity and Complications
- A/A Testing
- Testing of bandit algorithms itself
- Estimation of the actual variability in real-time data.
- Running concurrent experiments
- May have strange interactions between experiments (ex. different logos and fonts)
- Continuous experimentation vs. Periodic testing
- Bandit algorithms look much better than A/B testing when you are willing to let them run for a
very long time.
- Metrics of Success
- Optimizing short-term CTR may destroy long-term retainability.
- Rescaling metrics into 0-1 space helps algorithms to work well.
- Moving worlds
- Arms with changing rewards raise serious problems.
- Average (No parameter to tune) vs. Weighted Average (Flexibility towards moving worlds)
8. Conclusion
- There is no universal bandit algorithm that will always do the best job.
- Domain expertise and good judgement will always be necessary.
- There is always a trade-off between exploration & exploitation.
Initialization of an algorithm matters a lot. Biases may both help or hurt.
- Make sure you explore less over time.

More Related Content

Similar to Bandit algorithms for website optimization - A summary

algorithm_2algorithm_analysis.pdf
algorithm_2algorithm_analysis.pdfalgorithm_2algorithm_analysis.pdf
algorithm_2algorithm_analysis.pdf
HsuChi Chen
 
NON-STATIONARY BANDIT CHANGE DETECTION-BASED THOMPSON SAMPLING ALGORITHM
NON-STATIONARY BANDIT CHANGE DETECTION-BASED THOMPSON SAMPLING ALGORITHMNON-STATIONARY BANDIT CHANGE DETECTION-BASED THOMPSON SAMPLING ALGORITHM
NON-STATIONARY BANDIT CHANGE DETECTION-BASED THOMPSON SAMPLING ALGORITHM
IRJET Journal
 
Kaggle presentation
Kaggle presentationKaggle presentation
Kaggle presentation
HJ van Veen
 
The monte carlo method
The monte carlo methodThe monte carlo method
The monte carlo method
Saurabh Sood
 
John Maxwell, Data Scientist, Nordstrom at MLconf Seattle 2017
John Maxwell, Data Scientist, Nordstrom at MLconf Seattle 2017 John Maxwell, Data Scientist, Nordstrom at MLconf Seattle 2017
John Maxwell, Data Scientist, Nordstrom at MLconf Seattle 2017
MLconf
 
GA.-.Presentation
GA.-.PresentationGA.-.Presentation
GA.-.Presentation
oldmanpat
 
cs1538.ppt
cs1538.pptcs1538.ppt
cs1538.ppt
TaraLeander
 
GANs for Anti Money Laundering
GANs for Anti Money LaunderingGANs for Anti Money Laundering
GANs for Anti Money Laundering
Jim Dowling
 
Apriori-Eclat-Upper-Confidence-Bound-in-Machine-Learning.pptx
Apriori-Eclat-Upper-Confidence-Bound-in-Machine-Learning.pptxApriori-Eclat-Upper-Confidence-Bound-in-Machine-Learning.pptx
Apriori-Eclat-Upper-Confidence-Bound-in-Machine-Learning.pptx
NingthoujamMahesh1
 
Bandit Algorithms
Bandit AlgorithmsBandit Algorithms
Bandit Algorithms
SC5.io
 
Smartphone Activity Prediction
Smartphone Activity PredictionSmartphone Activity Prediction
Smartphone Activity Prediction
Triskelion_Kaggle
 
Multi Armed Bandits
Multi Armed BanditsMulti Armed Bandits
Multi Armed Bandits
Alireza Shafaei
 
Ensemble Contextual Bandits for Personalized Recommendation
Ensemble Contextual Bandits for Personalized RecommendationEnsemble Contextual Bandits for Personalized Recommendation
Ensemble Contextual Bandits for Personalized Recommendation
Liang Tang
 
Uncertainties in large scale power systems
Uncertainties in large scale power systemsUncertainties in large scale power systems
Uncertainties in large scale power systems
Olivier Teytaud
 
Bias correction, and other uncertainty management techniques
Bias correction, and other uncertainty management techniquesBias correction, and other uncertainty management techniques
Bias correction, and other uncertainty management techniques
Olivier Teytaud
 
Using Java & Genetic Algorithms to Beat the Market
Using Java & Genetic Algorithms to Beat the MarketUsing Java & Genetic Algorithms to Beat the Market
Using Java & Genetic Algorithms to Beat the Market
Matthew Ring
 
Cmpe 255 cross validation
Cmpe 255 cross validationCmpe 255 cross validation
Cmpe 255 cross validation
Abraham Kong
 
Is Production RL at a tipping point?
Is Production RL at a tipping point?Is Production RL at a tipping point?
Is Production RL at a tipping point?
M Waleed Kadous
 
September 11, Deliberative Algorithms II
September 11, Deliberative Algorithms IISeptember 11, Deliberative Algorithms II
September 11, Deliberative Algorithms II
University of Colorado at Boulder
 
AWS Cost Opt Meetup 2 - News corp - Spot On deep dive
AWS Cost Opt Meetup 2 - News corp - Spot On deep diveAWS Cost Opt Meetup 2 - News corp - Spot On deep dive
AWS Cost Opt Meetup 2 - News corp - Spot On deep dive
Peter Shi
 

Similar to Bandit algorithms for website optimization - A summary (20)

algorithm_2algorithm_analysis.pdf
algorithm_2algorithm_analysis.pdfalgorithm_2algorithm_analysis.pdf
algorithm_2algorithm_analysis.pdf
 
NON-STATIONARY BANDIT CHANGE DETECTION-BASED THOMPSON SAMPLING ALGORITHM
NON-STATIONARY BANDIT CHANGE DETECTION-BASED THOMPSON SAMPLING ALGORITHMNON-STATIONARY BANDIT CHANGE DETECTION-BASED THOMPSON SAMPLING ALGORITHM
NON-STATIONARY BANDIT CHANGE DETECTION-BASED THOMPSON SAMPLING ALGORITHM
 
Kaggle presentation
Kaggle presentationKaggle presentation
Kaggle presentation
 
The monte carlo method
The monte carlo methodThe monte carlo method
The monte carlo method
 
John Maxwell, Data Scientist, Nordstrom at MLconf Seattle 2017
John Maxwell, Data Scientist, Nordstrom at MLconf Seattle 2017 John Maxwell, Data Scientist, Nordstrom at MLconf Seattle 2017
John Maxwell, Data Scientist, Nordstrom at MLconf Seattle 2017
 
GA.-.Presentation
GA.-.PresentationGA.-.Presentation
GA.-.Presentation
 
cs1538.ppt
cs1538.pptcs1538.ppt
cs1538.ppt
 
GANs for Anti Money Laundering
GANs for Anti Money LaunderingGANs for Anti Money Laundering
GANs for Anti Money Laundering
 
Apriori-Eclat-Upper-Confidence-Bound-in-Machine-Learning.pptx
Apriori-Eclat-Upper-Confidence-Bound-in-Machine-Learning.pptxApriori-Eclat-Upper-Confidence-Bound-in-Machine-Learning.pptx
Apriori-Eclat-Upper-Confidence-Bound-in-Machine-Learning.pptx
 
Bandit Algorithms
Bandit AlgorithmsBandit Algorithms
Bandit Algorithms
 
Smartphone Activity Prediction
Smartphone Activity PredictionSmartphone Activity Prediction
Smartphone Activity Prediction
 
Multi Armed Bandits
Multi Armed BanditsMulti Armed Bandits
Multi Armed Bandits
 
Ensemble Contextual Bandits for Personalized Recommendation
Ensemble Contextual Bandits for Personalized RecommendationEnsemble Contextual Bandits for Personalized Recommendation
Ensemble Contextual Bandits for Personalized Recommendation
 
Uncertainties in large scale power systems
Uncertainties in large scale power systemsUncertainties in large scale power systems
Uncertainties in large scale power systems
 
Bias correction, and other uncertainty management techniques
Bias correction, and other uncertainty management techniquesBias correction, and other uncertainty management techniques
Bias correction, and other uncertainty management techniques
 
Using Java & Genetic Algorithms to Beat the Market
Using Java & Genetic Algorithms to Beat the MarketUsing Java & Genetic Algorithms to Beat the Market
Using Java & Genetic Algorithms to Beat the Market
 
Cmpe 255 cross validation
Cmpe 255 cross validationCmpe 255 cross validation
Cmpe 255 cross validation
 
Is Production RL at a tipping point?
Is Production RL at a tipping point?Is Production RL at a tipping point?
Is Production RL at a tipping point?
 
September 11, Deliberative Algorithms II
September 11, Deliberative Algorithms IISeptember 11, Deliberative Algorithms II
September 11, Deliberative Algorithms II
 
AWS Cost Opt Meetup 2 - News corp - Spot On deep dive
AWS Cost Opt Meetup 2 - News corp - Spot On deep diveAWS Cost Opt Meetup 2 - News corp - Spot On deep dive
AWS Cost Opt Meetup 2 - News corp - Spot On deep dive
 

More from Kwanghee Choi

Visual Transformers
Visual TransformersVisual Transformers
Visual Transformers
Kwanghee Choi
 
Trends of ICASSP 2022
Trends of ICASSP 2022Trends of ICASSP 2022
Trends of ICASSP 2022
Kwanghee Choi
 
추천 시스템 한 발짝 떨어져 살펴보기 (3)
추천 시스템 한 발짝 떨어져 살펴보기 (3)추천 시스템 한 발짝 떨어져 살펴보기 (3)
추천 시스템 한 발짝 떨어져 살펴보기 (3)
Kwanghee Choi
 
Recommendation systems: Vertical and Horizontal Scrolls
Recommendation systems: Vertical and Horizontal ScrollsRecommendation systems: Vertical and Horizontal Scrolls
Recommendation systems: Vertical and Horizontal Scrolls
Kwanghee Choi
 
추천 시스템 한 발짝 떨어져 살펴보기 (1)
추천 시스템 한 발짝 떨어져 살펴보기 (1)추천 시스템 한 발짝 떨어져 살펴보기 (1)
추천 시스템 한 발짝 떨어져 살펴보기 (1)
Kwanghee Choi
 
추천 시스템 한 발짝 떨어져 살펴보기 (2)
추천 시스템 한 발짝 떨어져 살펴보기 (2)추천 시스템 한 발짝 떨어져 살펴보기 (2)
추천 시스템 한 발짝 떨어져 살펴보기 (2)
Kwanghee Choi
 
Before and After the AI Winter - Recap
Before and After the AI Winter - RecapBefore and After the AI Winter - Recap
Before and After the AI Winter - Recap
Kwanghee Choi
 
Mastering Gomoku - Recap
Mastering Gomoku - RecapMastering Gomoku - Recap
Mastering Gomoku - Recap
Kwanghee Choi
 
Teachings of Ada Lovelace
Teachings of Ada LovelaceTeachings of Ada Lovelace
Teachings of Ada Lovelace
Kwanghee Choi
 
div, grad, curl, and all that - a review
div, grad, curl, and all that - a reviewdiv, grad, curl, and all that - a review
div, grad, curl, and all that - a review
Kwanghee Choi
 
Gaussian processes
Gaussian processesGaussian processes
Gaussian processes
Kwanghee Choi
 
Neural Architecture Search: Learning How to Learn
Neural Architecture Search: Learning How to LearnNeural Architecture Search: Learning How to Learn
Neural Architecture Search: Learning How to Learn
Kwanghee Choi
 
Duality between OOP and RL
Duality between OOP and RLDuality between OOP and RL
Duality between OOP and RL
Kwanghee Choi
 
JFEF encoding
JFEF encodingJFEF encoding
JFEF encoding
Kwanghee Choi
 
Dummy log generation using poisson sampling
 Dummy log generation using poisson sampling Dummy log generation using poisson sampling
Dummy log generation using poisson sampling
Kwanghee Choi
 
Azure functions: Quickstart
Azure functions: QuickstartAzure functions: Quickstart
Azure functions: Quickstart
Kwanghee Choi
 
Modern convolutional object detectors
Modern convolutional object detectorsModern convolutional object detectors
Modern convolutional object detectors
Kwanghee Choi
 
Usage of Moving Average
Usage of Moving AverageUsage of Moving Average
Usage of Moving Average
Kwanghee Choi
 
Jpl coding standard for the c programming language
Jpl coding standard for the c programming languageJpl coding standard for the c programming language
Jpl coding standard for the c programming language
Kwanghee Choi
 

More from Kwanghee Choi (19)

Visual Transformers
Visual TransformersVisual Transformers
Visual Transformers
 
Trends of ICASSP 2022
Trends of ICASSP 2022Trends of ICASSP 2022
Trends of ICASSP 2022
 
추천 시스템 한 발짝 떨어져 살펴보기 (3)
추천 시스템 한 발짝 떨어져 살펴보기 (3)추천 시스템 한 발짝 떨어져 살펴보기 (3)
추천 시스템 한 발짝 떨어져 살펴보기 (3)
 
Recommendation systems: Vertical and Horizontal Scrolls
Recommendation systems: Vertical and Horizontal ScrollsRecommendation systems: Vertical and Horizontal Scrolls
Recommendation systems: Vertical and Horizontal Scrolls
 
추천 시스템 한 발짝 떨어져 살펴보기 (1)
추천 시스템 한 발짝 떨어져 살펴보기 (1)추천 시스템 한 발짝 떨어져 살펴보기 (1)
추천 시스템 한 발짝 떨어져 살펴보기 (1)
 
추천 시스템 한 발짝 떨어져 살펴보기 (2)
추천 시스템 한 발짝 떨어져 살펴보기 (2)추천 시스템 한 발짝 떨어져 살펴보기 (2)
추천 시스템 한 발짝 떨어져 살펴보기 (2)
 
Before and After the AI Winter - Recap
Before and After the AI Winter - RecapBefore and After the AI Winter - Recap
Before and After the AI Winter - Recap
 
Mastering Gomoku - Recap
Mastering Gomoku - RecapMastering Gomoku - Recap
Mastering Gomoku - Recap
 
Teachings of Ada Lovelace
Teachings of Ada LovelaceTeachings of Ada Lovelace
Teachings of Ada Lovelace
 
div, grad, curl, and all that - a review
div, grad, curl, and all that - a reviewdiv, grad, curl, and all that - a review
div, grad, curl, and all that - a review
 
Gaussian processes
Gaussian processesGaussian processes
Gaussian processes
 
Neural Architecture Search: Learning How to Learn
Neural Architecture Search: Learning How to LearnNeural Architecture Search: Learning How to Learn
Neural Architecture Search: Learning How to Learn
 
Duality between OOP and RL
Duality between OOP and RLDuality between OOP and RL
Duality between OOP and RL
 
JFEF encoding
JFEF encodingJFEF encoding
JFEF encoding
 
Dummy log generation using poisson sampling
 Dummy log generation using poisson sampling Dummy log generation using poisson sampling
Dummy log generation using poisson sampling
 
Azure functions: Quickstart
Azure functions: QuickstartAzure functions: Quickstart
Azure functions: Quickstart
 
Modern convolutional object detectors
Modern convolutional object detectorsModern convolutional object detectors
Modern convolutional object detectors
 
Usage of Moving Average
Usage of Moving AverageUsage of Moving Average
Usage of Moving Average
 
Jpl coding standard for the c programming language
Jpl coding standard for the c programming languageJpl coding standard for the c programming language
Jpl coding standard for the c programming language
 

Recently uploaded

The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
Bill641377
 
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
zsjl4mimo
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
mzpolocfi
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
74nqk8xf
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
AndrzejJarynowski
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
74nqk8xf
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
GetInData
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 

Recently uploaded (20)

The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 

Bandit algorithms for website optimization - A summary

  • 1. Bandit Algorithms for Website Optimization by John Myles White summary by Kyle (Kwanghee Choi)
  • 3. 1. Two Characters: Exploration and Exploitation - Need to balance exploration and exploitation - or, experimentation and profit-maximization - or, learning new ideas and taking advantage of the best of old ideas, - or, gathering data and acting on that data
  • 4. 2. Why Use Multiarmed Bandit Algorithms? - Measurable achievements examples - Traffic, Conversions, Sales, CTRs - Definitions - Reward: Measure of success - Arms: List of potential changes - Explaining standard A/B testing as an exploration - exploitation tradeoff - Short period of pure exploration (Assigning equal numbers of users to A/B) - Long period of pure exploitation (Send all of the users to successful option - Why A/B testing might be a bad strategy? - Abrupt transition - Wastes resources exploring inferior options
  • 5. 3. The ϵ-Greedy Algorithm - Tries to be fair to the two opposite goals of exploration & exploitation - ϵ=0: Pure exploitation - ϵ=1: Pure exploration - Problem of fixed ϵ - May need more exploration at the start, may need more exploitation after some time. - Explores arms completely at random without any concern about their merits.
  • 6. 4. Debugging Bandit Algorithms - Bandit algorithms are not black-box functions. - Bandit algorithms have to actively select which data it should acquire (Active Learning) and analyze that data at real time (Online Learning). - Bandit data and bandit analysis are inseparable. “Feedback cycle.”
  • 7. 4. Debugging Bandit Algorithms - Use Monte Carlo simulation to provide simulated data in real-time. - Analyzing results - Tracking the probability of choosing the best arm, as both bandit algorithms and rewards are probabilistic. - Tracking the average reward at each point in time. - Tracking the cumulative reward at each point in time, to look at the bigger picture of the lifetime performance.
  • 8. 5. The Softmax Algorithm - Problem of fixed ϵ revisited - If the difference in rewards between two arms is small, more exploration is needed, and vice versa. - Never get past the intrinsic errors caused by the purely random exploration strategy. - Set the probability of choosing arm A with accumulative reward rA as … - - Temperature parameter τ shifts the behavior along a continuum between pure exploration ( τ = ∞ ) and exploitation ( τ = 0 ) . - Negative rewards are okay thanks to exponential rescaling. - Annealing: Encouraging to explore less over time by slowly decreasing τ .
  • 9. 6. UCB The Upper Confidence Bound Algorithm - Problems of softmax algorithms - Only pay attention on how much reward they’ve gotten from the arms. - Gullible: easily misled by a few negative experiences, as the algorithm do not keep track of how much they know about the arms (how much confident). - UCBs avoid being gullible by keeping track of confidence in assessments of the estimated values of all the arms. - UCBs doesn’t use randomness, and doesn’t have any free parameters.
  • 10. 6. UCB The Upper Confidence Bound Algorithm - UCB1 (one of the variants of UCBs) chooses arm i with accumulative rewards ri , bonus bi , and number of times ni as … - - Cold start is prevented by bi = ∞ - UCBs are explicitly curious algorithms. - Curiousness are implemented with bonus bi , where bi gets bigger when ni is too small. - So, we will occasionally visit the worst of the arms.
  • 11. 6. UCB The Upper Confidence Bound Algorithm - Comparing bandit algorithms side-by-side - UCB1 is much noisier than ϵ -Greedy or Softmax. - ϵ -Greedy doesn’t converge as quickly as Softmax. - UCB1 takes a while to catch up with Softmax. - UCB1 finds the best arm quickly, but the backpedaling it does causes it to underperform the Softmax.
  • 12. 7. Bandits in the Real World: Complexity and Complications - A/A Testing - Testing of bandit algorithms itself - Estimation of the actual variability in real-time data. - Running concurrent experiments - May have strange interactions between experiments (ex. different logos and fonts) - Continuous experimentation vs. Periodic testing - Bandit algorithms look much better than A/B testing when you are willing to let them run for a very long time. - Metrics of Success - Optimizing short-term CTR may destroy long-term retainability. - Rescaling metrics into 0-1 space helps algorithms to work well. - Moving worlds - Arms with changing rewards raise serious problems. - Average (No parameter to tune) vs. Weighted Average (Flexibility towards moving worlds)
  • 13. 8. Conclusion - There is no universal bandit algorithm that will always do the best job. - Domain expertise and good judgement will always be necessary. - There is always a trade-off between exploration & exploitation. Initialization of an algorithm matters a lot. Biases may both help or hurt. - Make sure you explore less over time.