Byron Galbraith, Chief Data Scientist, Talla, at MLconf NYC 2017

Byron Galbraith is the Chief Data Scientist and co-founder of Talla, where he works to translate the latest advancements in machine learning and natural language processing to build AI-powered conversational agents. Byron has a PhD in Cognitive and Neural Systems from Boston University and an MS in Bioinformatics from Marquette University. His research expertise includes brain-computer interfaces, neuromorphic robotics, spiking neural networks, high-performance computing, and natural language processing. Byron has also held several software engineering roles including back-end system engineer, full stack web developer, office automation consultant, and game engine developer at companies ranging in size from a two-person startup to a multi-national enterprise.

Abstract Summary:

Bayesian Bandits:
What color should that button be to convert more sales? What ad will most likely get clicked on? What movie recommendations should be displayed to keep subscribers engaged? What should we have for lunch? These are all examples of iterated decision problems: the same choice has to be made repeatedly, and the goal is to arrive at an optimal decision strategy by incorporating the results of previous decisions. In this talk I will describe the Bayesian Bandit solution to these types of problems, how it adaptively learns to minimize regret, how additional contextual information can be incorporated, and how it compares to the more traditional A/B testing solution.
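The Bayesian Bandit solution described in the talk (Beta-Bernoulli Thompson sampling, laid out step by step on the algorithm slide) can be sketched in a few lines of Python. This is a minimal simulation under stated assumptions, not the speaker's code: `true_rates` holds hypothetical engagement probabilities used only to generate rewards, and the function name and parameters are illustrative.

```python
import random


def thompson_sampling(true_rates, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on simulated options.

    true_rates: hypothetical per-option engagement probabilities, used only
    to simulate user responses (the algorithm never sees them directly).
    Returns the per-option alpha and beta counts and the pulls per option.
    """
    rng = random.Random(seed)
    k = len(true_rates)
    alpha = [1] * k  # 1 + number of hits (uniform Beta(1, 1) prior)
    beta = [1] * k   # 1 + number of misses
    pulls = [0] * k

    for _ in range(n_rounds):
        # Draw a plausible engagement rate for each option from its posterior.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        # Choose the option whose sampled rate is largest.
        a = max(range(k), key=lambda i: samples[i])
        # Simulate a Bernoulli reward: 1 = engaged, 0 = not engaged.
        r = 1 if rng.random() < true_rates[a] else 0
        # Conjugate update: a hit increments alpha, a miss increments beta.
        alpha[a] += r
        beta[a] += 1 - r
        pulls[a] += 1

    return alpha, beta, pulls


if __name__ == "__main__":
    rates = [0.04, 0.05, 0.07]  # hypothetical click-through rates
    alpha, beta, pulls = thompson_sampling(rates, n_rounds=10_000)
    for i, (a, b, n) in enumerate(zip(alpha, beta, pulls)):
        print(f"option {i}: pulls={n}, posterior mean={a / (a + b):.3f}")
```

Only two integers per option are stored, matching the "hits and misses" bookkeeping in the talk. As an option's posterior concentrates, it is sampled as the maximum more often, so exploration fades on its own; exact pull counts depend on the seed.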

  1. Bayesian Bandits: Byron Galbraith, PhD, Cofounder / Chief Data Scientist, Talla, 2017.03.24
  2. Bayesian Bandits for the Impatient
     • Online adaptive learning: "Earn while you Learn" [1, 2, 3]
     • Powerful alternative to A/B testing optimization
     • Can be efficient and easy to implement
  3. Dining Ware VR Experiences on Demand
  4. Dining Ware VR Experiences on Demand
  5. Iterated Decision Problems: What product recommendations should we present to subscribers to keep them engaged?
  6. A/B Testing
  7. Exploit vs. Explore: What should we do?
     • Exploit: choose what seems best so far
       🙂 Feel good about our decision
       🙂 There still may be something better
     • Explore: try something new
       😄 Discover a superior approach
       😧 Regret our choice
  8. A/B/n Testing
  9. Regret: What did that experiment cost us?
  10. The Multi-Armed Bandit Problem
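  11. Bandit Solutions
      • Regret: $R_T = \sum_{t=1}^{T} \left[ r\big(Y_t(a^*)\big) - r\big(Y_t(a_t)\big) \right]$
      • k-armed bandit: $\text{k-MAB} = \langle A, Y, P, r \rangle$
      • Incremental mean estimate: $\bar{r}_a^{\,n+1} = \bar{r}_a^{\,n} + \frac{1}{n_a}\left( r_a^t - \bar{r}_a^{\,n} \right)$
      • Upper confidence bound: $a_t = \operatorname{argmax}_i \left[ \bar{r}_i^{\,t} + c \sqrt{\frac{\log t}{n_i}} \right]$
      • Gradient bandit (softmax): $P(A_t = a) = \frac{e^{h_a^n}}{\sum_{b=1}^{k} e^{h_b^n}} = \pi_t(a)$
        $h_a^{n+1} = h_a^n + \alpha \left( r_a^t - \bar{r}_a^{\,n} \right)\left( 1 - \pi_t(a) \right)$
        $h_b^{n+1} = h_b^n - \alpha \left( r_a^t - \bar{r}_a^{\,n} \right) \pi_t(b), \quad b \neq a$
      • Beta distribution: $P(X = x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}$
      • Binomial distribution: $P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$
      • Posterior for option $a$: $\mathrm{Beta}_a(\alpha + r_a,\ \beta + N - r_a)$
      • Bayes' rule: $P(X \mid Y, Z) = \frac{P(Y \mid X, Z)\, P(X \mid Z)}{P(Y \mid Z)}$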
  12. Thompson Sampling
      $P(\theta \mid r, a) \propto P(r \mid \theta, a)\, P(\theta \mid a)$ (posterior ∝ likelihood × prior)
  13. Bayesian Bandits – The Model
      • Model whether a recommendation will result in user engagement
        • Bernoulli distribution: $p$ is the probability of the event occurring
      • How do we find $p$?
        • Conjugate prior
        • Beta distribution: $\alpha$ = number of hits, $\beta$ = number of misses
      • Only need to keep track of two numbers per option
        • # of hits, # of misses
  14. Bayesian Bandits – The Algorithm
      1. Initialize $\alpha_i = \beta_i = 1$ (uniform prior)
      2. For each user request for recommendations $t$:
         1. Sample $p_i \sim \mathrm{Beta}(\alpha_i, \beta_i)$ for each option $i$
         2. Choose the action $a_t$ corresponding to the largest $p_i$
         3. Observe reward $r_t$
         4. Update $\alpha_{a_t} \leftarrow \alpha_{a_t} + r_t$ and $\beta_{a_t} \leftarrow \beta_{a_t} + (1 - r_t)$
  15. Belief Adaptation
  16. Belief Adaptation
  17. Belief Adaptation
  18. Belief Adaptation
  19. Belief Adaptation
  20. Bandit Regret
  21. But behavior is dependent on context
      • Categorical contexts
        • One bandit model per category
        • One-hot context vector
      • Real-valued contexts
        • Can capture interrelatedness of context dimensions
        • More difficult to incorporate effectively
  22. So why would I ever A/B test again?
      • Test intent
        • Optimization vs. understanding
      • Difficulty with non-stationarity
        • Monday vs. Friday behavior
      • Deployment
        • Few turnkey options
        • Specialized skill set
  23. Bayesian Bandits for the Patient
      • Thompson sampling balances exploitation and exploration while minimizing decision regret [1, 2, 3]
      • No need to pre-specify decision splits or a time horizon for experiments
      • Can model a variety of problems and complex interactions
  24. Resources
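For contrast with the Bayesian approach, the upper-confidence-bound rule and the incremental mean update from the Bandit Solutions slide can be simulated the same way. This is a minimal sketch, not code from the talk. It assumes Bernoulli rewards drawn from hypothetical `true_rates` and tracks expected (pseudo-)regret, meaning the sum of gaps between the best option's mean and the chosen option's mean rather than realized rewards. The default `c = sqrt(2)` is an assumption that makes the bonus match the classic UCB1 form sqrt(2 ln t / n_i).

```python
import math
import random


def ucb_bandit(true_rates, n_rounds, c=math.sqrt(2), seed=0):
    """UCB action selection with incremental sample-mean estimates.

    true_rates: hypothetical Bernoulli reward probabilities, used only to
    simulate feedback. Returns the estimates, the pull counts, and the
    cumulative expected regret sum_t (mu_best - mu_{a_t}).
    """
    rng = random.Random(seed)
    k = len(true_rates)
    estimates = [0.0] * k  # running mean reward per option
    counts = [0] * k       # n_i: number of times option i was chosen
    best = max(true_rates)
    regret = 0.0

    for t in range(1, n_rounds + 1):
        if t <= k:
            a = t - 1  # try each option once so log(t) / n_i is defined
        else:
            # Pick the option with the largest estimate plus exploration bonus.
            a = max(
                range(k),
                key=lambda i: estimates[i] + c * math.sqrt(math.log(t) / counts[i]),
            )
        r = 1.0 if rng.random() < true_rates[a] else 0.0
        counts[a] += 1
        # Incremental mean: r_bar <- r_bar + (r - r_bar) / n_a
        estimates[a] += (r - estimates[a]) / counts[a]
        # Expected regret of this choice relative to the best option.
        regret += best - true_rates[a]

    return estimates, counts, regret


if __name__ == "__main__":
    est, counts, regret = ucb_bandit([0.04, 0.05, 0.07], n_rounds=10_000)
    print("pulls per option:", counts, "expected regret:", round(regret, 1))
```

Unlike Thompson sampling, UCB is deterministic given the observed rewards. It explores through an explicit confidence bonus that shrinks as an option is tried more often, and the constant `c` has to be tuned rather than coming from a prior.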