UCB
Upper Confidence Bound
Challenge Description
• RL serves the option that is expected to maximize the reward
(e.g. if we measure clicks, we wish to serve the option that is most
likely to be clicked)
Problem: After a certain duration, one option looks stronger and is
always served.
Epsilon-Greedy
• What is epsilon greedy?
• Causata’s implementation
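The slide asks what epsilon-greedy is without spelling it out. As a minimal sketch of the textbook rule (not Causata's implementation, which the slides do not describe): with probability epsilon serve a random option, otherwise serve the option with the best measured rate. The function name and parameters below are hypothetical.

```python
import random

def epsilon_greedy_choice(rates, epsilon, rng=random):
    """Return the index of the option to serve.

    rates   -- measured pay-out rate per option (assumed already estimated)
    epsilon -- probability of exploring instead of exploiting
    """
    if rng.random() < epsilon:
        # Explore: pick any option uniformly at random.
        return rng.randrange(len(rates))
    # Exploit: pick the option with the highest measured rate
    # (ties resolve to the lowest index).
    return max(range(len(rates)), key=lambda i: rates[i])
```

With epsilon = 0 the rule never explores, which is exactly the problem described on the Challenge Description slide.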
Multi-Armed Bandit (bandit)
• The problem:
Consider a casino with many slot machines, each with an unknown
pay-out rate (e.g. 0.6, 0.3, 0.4).
We aim to maximize our reward, hence we should learn the rates.
Exploration – We play the machines in order to learn their pay-outs
Exploitation – We assume that we have learned the rates and we play the best machine
Q: How do we balance exploration and exploitation?
Bandit algorithms ensure that exploration always takes place
Bandit (Cont.)
• We can do A/B testing
1. Consider K machines
2. Play each of them randomly and measure the reward
3. Take the machine with the best measured rate.
• We can do UCB, which uses the counters:
• Impressions
• Responses (positive responses)
• Opportunities
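The three A/B steps above can be sketched as follows. This is an illustrative simulation only: the Bernoulli pay-outs, the `ab_test` name, and the `seed` parameter are assumptions, not part of the slides.

```python
import random

def ab_test(true_rates, plays, seed=0):
    """Play K machines uniformly at random and return (best_index, measured_rates)."""
    rng = random.Random(seed)
    k = len(true_rates)              # step 1: consider K machines
    impressions = [0] * k            # times each machine was played
    responses = [0] * k              # positive pay-outs per machine
    for _ in range(plays):
        arm = rng.randrange(k)       # step 2: play a machine at random
        impressions[arm] += 1
        if rng.random() < true_rates[arm]:
            responses[arm] += 1
    # Measured rate; a machine that was never played gets 0.0.
    rates = [r / n if n else 0.0 for r, n in zip(responses, impressions)]
    best = max(range(k), key=lambda i: rates[i])  # step 3: best measured rate
    return best, rates
```

For example, `ab_test([0.6, 0.3, 0.4], plays=3000)` returns the index of the machine with the best measured rate together with the measured rates.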
UCB – How does it work?
• We measure the pay-out rate of each option, as in A/B
• Rather than taking the biggest rate, we take the biggest rate + std
• It can be used as an exploration mechanism (we follow this approach)
• It can be used in exploitation (we keep exploring while exploiting
through this mechanism)
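A small sketch of the "rate + std" rule, assuming binary responses so that the standard deviation of the measured rate is sqrt(p(1-p)/impressions). The function name is hypothetical.

```python
import math

def pick_by_rate_plus_std(responses, impressions):
    """Return the index of the option with the largest rate + std.

    Assumes every option has at least one impression.
    """
    def score(i):
        p = responses[i] / impressions[i]                  # measured rate
        return p + math.sqrt(p * (1 - p) / impressions[i])  # rate + std
    return max(range(len(impressions)), key=score)
```

With responses `[60, 5]` and impressions `[100, 10]`, the second option has the lower rate (0.5 against 0.6) but is still chosen, because it has been served far less and its rate is therefore less certain.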
Visual Example
Chernoff/Hoeffding
• Let $X_i \in [a_i, b_i]$ be independent random variables with $\mu_i = E[X_i]$.
Then for every $\varepsilon > 0$:
$P\left(\left|\sum_i (X_i - \mu_i)\right| \ge \varepsilon\right) \le 2\exp\left(\frac{-2\varepsilon^2}{\sum_i (b_i - a_i)^2}\right)$
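Chernoff/Hoeffding (Cont.)
• For UCB needs we take:
• $\varepsilon = \sqrt{2\log(t)/s}$, where t is the total number of samples and s is the
number of impressions for a single arm.
• With some manipulations (Hoeffding applied to the mean of s rewards in [0, 1]) we get
• $P\left(\hat{\mu}_i + \sqrt{2\log(t)/s} \le \mu_i\right) \le \exp(-4\log t) = t^{-4}$, where $\hat{\mu}_i$ is the measured rate of arm i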
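The manipulation can be spelled out as follows. This is a sketch under stated assumptions: the s rewards $X_1, \dots, X_s$ of arm i are independent and lie in [0, 1] (so $\sum_j (b_j - a_j)^2 = s$), $\hat{\mu}_i$ denotes their average, and the one-sided form of the Hoeffding inequality (without the factor 2) is used with the radius $\varepsilon = \sqrt{2\log(t)/s}$ of UCB1.

```latex
\begin{align*}
P\left(\hat{\mu}_i + \varepsilon \le \mu_i\right)
  &= P\left(\sum_{j=1}^{s} (X_j - \mu_i) \le -s\varepsilon\right) \\
  &\le \exp\left(\frac{-2 (s\varepsilon)^2}{s}\right)
   = \exp\left(-2 s \varepsilon^2\right) \\
  &= \exp\left(-2 s \cdot \frac{2\log t}{s}\right)
   = \exp(-4 \log t) = t^{-4}.
\end{align*}
```

So the probability that the upper bound falls below the true rate shrinks polynomially in the number of samples t.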
Formulas
• UCB = p + sqrt((1 - p) * p / impressions), where p is the measured rate (responses / impressions)
• Auer improvement
UCB = p + sqrt((1 - p) * p * log(opportunities) / impressions)
• Next improvement
• UCB = p + sqrt((1 - p) * p * log(opportunities) / impressions)
+ log(opportunities) / impressions
• Note that this correction term may grow without bound, thus we use a window
• Further reading – Chernoff/Hoeffding inequality
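The three formulas on this slide can be written as small functions to compare them side by side. This is a sketch: the function names are hypothetical, `log` is assumed to be the natural logarithm, and `impressions` must be positive. The window mentioned in the note is not modelled here.

```python
import math

def ucb_basic(p, impressions):
    """Rate plus the standard deviation of the measured rate."""
    return p + math.sqrt((1 - p) * p / impressions)

def ucb_auer(p, impressions, opportunities):
    """'Auer improvement': the variance term is scaled by log(opportunities)."""
    return p + math.sqrt((1 - p) * p * math.log(opportunities) / impressions)

def ucb_next(p, impressions, opportunities):
    """'Next improvement': adds the correction term log(opportunities) / impressions."""
    return ucb_auer(p, impressions, opportunities) + math.log(opportunities) / impressions
```

For p = 0.5 and 100 impressions, `ucb_basic` gives 0.55. Once log(opportunities) exceeds 1, the three bounds are ordered `ucb_basic < ucb_auer < ucb_next`, so each refinement adds more room for exploration.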
Where is it used?
• In Causata’s engine: for exploration, and solely for exploration
• One can keep the current exploration mechanism and use UCB for
exploitation (i.e. rather than taking the best mean, take the best UCB)
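Taking the best UCB rather than the best mean can be sketched as a serving loop. This is an illustration under assumptions, not Causata's engine: pay-outs are simulated as Bernoulli draws, the round number stands in for "opportunities", the score is the formula with the log(opportunities) / impressions correction term, and no window is applied.

```python
import math
import random

def run_ucb(true_rates, rounds, seed=0):
    """Serve the option with the best UCB each round; return (impressions, responses)."""
    rng = random.Random(seed)
    k = len(true_rates)
    impressions = [0] * k
    responses = [0] * k
    for t in range(1, rounds + 1):
        if t <= k:
            arm = t - 1  # serve every option once so that impressions > 0
        else:
            def score(i):
                p = responses[i] / impressions[i]
                spread = math.sqrt((1 - p) * p * math.log(t) / impressions[i])
                return p + spread + math.log(t) / impressions[i]
            arm = max(range(k), key=score)  # best UCB, not best mean
        impressions[arm] += 1
        if rng.random() < true_rates[arm]:
            responses[arm] += 1
    return impressions, responses
```

The correction term matters here: an option whose measured rate is exactly 0 or 1 has zero spread, and without the extra term it would never be revisited.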
