UCB
Upper Confidence Bound
Challenge Description
• RL serves the option that maximizes the expected reward
(e.g. if we measure clicks, we wish to serve the option that is most
likely to be clicked).
Problem: After a certain duration, one option becomes the strongest and is
always served, so the other options are no longer explored.
Epsilon-Greedy
• What is epsilon greedy?
• Causata’s implementation
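As a point of reference for the question above, here is a minimal sketch of the textbook epsilon-greedy rule. It is not Causata's implementation, which the slides do not describe. The function name `epsilon_greedy_choice` and the example rates are assumptions made for illustration.

```python
import random

def epsilon_greedy_choice(rates, epsilon, rng=random):
    """Pick an arm index: explore with probability epsilon, else exploit.

    `rates` holds the measured pay-out rate of each option (hypothetical input).
    """
    if rng.random() < epsilon:
        # Explore: serve a uniformly random option.
        return rng.randrange(len(rates))
    # Exploit: serve the option with the best measured rate.
    return max(range(len(rates)), key=lambda i: rates[i])

# Example call with illustrative rates and a fixed seed.
choice = epsilon_greedy_choice([0.6, 0.3, 0.4], epsilon=0.1, rng=random.Random(0))
```

With `epsilon=0` this reduces to always serving the current best option, which is exactly the problem described in the challenge slide.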
Multi-Armed Bandit (bandit)
• The problem:
Consider a casino with many slot machines, each with an unknown
pay-out rate (e.g. 0.6, 0.3, 0.4).
We aim to maximize our reward, hence we should learn the rates.
Exploration – We try the machines to learn their pay-out rates
Exploitation – We assume that we have learned the rates and play the best machine
Q: How do we balance exploration and exploitation?
Bandit algorithms ensure that exploration always takes place
Bandit (Cont.)
• We can do A/B testing
1. Consider K machines
2. Play each of them randomly and measure the reward
3. Take the best measured rate.
• We can do UCB
• Impressions
• Responses (Positive responses)
• Opportunities
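The A/B procedure above can be sketched as a small simulation. This is an illustration under stated assumptions, not the presenter's code: the function name `ab_test`, the simulated Bernoulli rewards, and the reading of "impressions" as the number of times a machine was played and "responses" as its positive outcomes are all assumptions.

```python
import random

def ab_test(payout_rates, total_plays, seed=0):
    """Play K machines uniformly at random, then return the best measured arm."""
    rng = random.Random(seed)
    k = len(payout_rates)
    impressions = [0] * k   # how often each machine was played
    responses = [0] * k     # how often each machine paid out
    for _ in range(total_plays):
        arm = rng.randrange(k)                  # step 2: play at random
        impressions[arm] += 1
        if rng.random() < payout_rates[arm]:    # simulated Bernoulli reward
            responses[arm] += 1
    # Measured rate per machine (0.0 if a machine was never played).
    rates = [r / n if n else 0.0 for r, n in zip(responses, impressions)]
    best = max(range(k), key=lambda i: rates[i])  # step 3: take the best rate
    return best, rates, impressions

best, rates, impressions = ab_test([0.6, 0.3, 0.4], total_plays=3000)
```

Note that every play spent on a weak machine during the test is lost reward, which is the cost that bandit methods such as UCB try to reduce.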
UCB – How does it work?
• We measure the pay-out rate of each option as in A/B
• Rather than taking the biggest rate, we take the biggest rate + std
• It can be used as an exploration mechanism (we follow this approach)
• It can be used in exploitation (keep exploring while exploiting with this
mechanism)
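A small worked example of the rate + std rule may help. It assumes the std of a measured rate p over n impressions is sqrt(p(1 - p)/n), and it treats an option with no impressions as having an infinite score so that it gets tried at least once; that last convention and the names `ucb_score` and `pick_by_ucb` are assumptions for illustration.

```python
import math

def ucb_score(responses, impressions):
    """Measured rate plus one standard deviation of that rate."""
    if impressions == 0:
        return math.inf  # assumption: an untried option is always tried first
    p = responses / impressions
    return p + math.sqrt(p * (1 - p) / impressions)

def pick_by_ucb(stats):
    """stats is a list of (responses, impressions) pairs, one per option."""
    return max(range(len(stats)), key=lambda i: ucb_score(*stats[i]))

# Option 0: rate 0.6 over 1000 impressions. Option 1: rate 0.5 over 10 impressions.
stats = [(600, 1000), (5, 10)]
scores = [ucb_score(r, n) for r, n in stats]
chosen = pick_by_ucb(stats)
```

Here option 0 scores about 0.615 and option 1 about 0.658, so the less-sampled option is served even though its measured rate is lower. As it gathers impressions, its std shrinks and the choice reverts to the best rate unless the option really is better.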
Visual Example
Chernoff/Hoeffding
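• Chernoff/Hoeffding inequality
• Let $X_i \in [a_i, b_i]$ be independent random variables with $\mu_i = E[X_i]$. Then
$P\left(\left|\sum_i (X_i - \mu_i)\right| \ge \varepsilon\right) \le 2\exp\left(\frac{-2\varepsilon^2}{\sum_i (b_i - a_i)^2}\right)$
for every $\varepsilon > 0$.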
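Chernoff/Hoeffding (Cont.)
• For UCB we take:
• $\varepsilon = \sqrt{2\log(t)/s}$, where $t$ is the total number of samples and $s$ is the number of
impressions for a single arm.
• Applying the one-sided bound to the sample mean $\hat{\mu}_i$ of $s$ rewards in $[0, 1]$, we get
• $P\left(\hat{\mu}_i + \sqrt{2\log(t)/s} \le \mu_i\right) \le \exp(-4\log(t)) = t^{-4}$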
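The manipulations can be spelled out as follows. This is a sketch under the usual assumptions for this bound: the $s$ rewards $X_1, \dots, X_s$ of arm $i$ are independent and lie in $[0, 1]$, $\hat{\mu}_i$ denotes their sample mean, and $\log$ is the natural logarithm. The symbol $\hat{\mu}_i$ is introduced here for clarity.

```latex
% One-sided Hoeffding bound on the sum, with a_j = 0 and b_j = 1:
P\left(\sum_{j=1}^{s} (X_j - \mu_i) \le -s\varepsilon\right)
  \le \exp\left(\frac{-2 (s\varepsilon)^2}{\sum_{j=1}^{s} (1 - 0)^2}\right)
  = \exp\left(-2 s \varepsilon^2\right)

% Dividing the event by s turns the sum into the sample mean:
P\left(\hat{\mu}_i + \varepsilon \le \mu_i\right) \le \exp\left(-2 s \varepsilon^2\right)

% Substituting \varepsilon = \sqrt{2 \log(t) / s}:
\exp\left(-2 s \cdot \frac{2 \log t}{s}\right) = \exp(-4 \log t) = t^{-4}
```

So the probability that the upper bound falls below the true rate shrinks like $t^{-4}$ as the total number of samples grows.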
Formulas
• UCB = p + sqrt((1 - p) * p / impressions)
• Auer improvement
UCB = p + sqrt((1 - p) * p * log(opportunities) / impressions)
• Next improvement
• UCB = p + sqrt((1 - p) * p * log(opportunities) / impressions) + log(opportunities) / impressions
• Note that this correction term may go to infinity, thus we have a window
• Further reading – Chernoff/Hoeffding inequality
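The three formulas can be written side by side as a short sketch. The function names are hypothetical, `log` is assumed to be the natural logarithm, and the code assumes `impressions > 0` and `opportunities >= 1`. The window mentioned in the note is not specified in the slides, so it is not implemented here.

```python
import math

def _std_term(p, impressions, scale=1.0):
    """sqrt((1 - p) * p * scale / impressions), shared by all three variants."""
    return math.sqrt((1 - p) * p * scale / impressions)

def ucb_basic(p, impressions):
    """Rate plus std."""
    return p + _std_term(p, impressions)

def ucb_auer(p, impressions, opportunities):
    """Std term scaled by log(opportunities)."""
    return p + _std_term(p, impressions, math.log(opportunities))

def ucb_next(p, impressions, opportunities):
    """Auer variant plus the correction term log(opportunities) / impressions."""
    return ucb_auer(p, impressions, opportunities) + math.log(opportunities) / impressions

# Illustrative numbers: rate 0.5, 100 impressions, 1000 opportunities.
basic = ucb_basic(0.5, 100)
auer = ucb_auer(0.5, 100, 1000)
nxt = ucb_next(0.5, 100, 1000)
```

For these inputs the values are roughly 0.55, 0.63 and 0.70. With only one impression, the correction term alone is log(1000), about 6.9, which shows how the bound can grow far beyond any valid rate when impressions are scarce.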
Where is it used?
• In Causata’s engine – exploration, and solely exploration
• One can keep the current exploration mechanism and use UCB for
exploitation (i.e. rather than taking the best mean, take the best UCB)
