Sequential Learning in the Position-Based Model

Talk given by Claire Vernade, Télécom ParisTech, during the RecsysFR meetup on February 1st 2017.

  1. Sequential Learning in the Position-Based Model. Claire Vernade, Olivier Cappé, Paul Lagrée (Télécom ParisTech); B. Kveton, S. Katariya, Z. Wen, C. Szepesvári (Adobe Research, U. Alberta)
  2. « Don’t use bandit algorithms, they probably don’t work for you. » — Chris Stucchio. C. Stucchio’s blog: https://www.chrisstucchio.com/blog/2015/dont_use_bandits.html
  3. Position-Based Model. Items are displayed in positions 1–4; the click on the item in position $l$ is drawn as $X_t \sim \mathcal{B}(\kappa_l \times \theta_k)$, where $\kappa_l$ is the examination probability of position $l$ and $\theta_k$ the attraction probability of item $k$. See the click-model literature (Chuklin et al.): Cascade Model, User Browsing Model, DBN, CCM, DCM, … (A simulation sketch of this click model follows the slide list.)
  4. Multi-Armed Bandit. [Illustration: three arms with unobserved expected rewards 0.53, 0.61, 0.42 and estimated empirical averages 0.40, 0.60, 0.55 after a few pulls]
  5. Multi-Armed Bandit. [Same illustration, with the arm means labelled $\theta_1$, $\theta_2$, $\theta_3$]
  6. Two Bandit Games. 1. Website optimization: you are the website manager! 2. Ad placement: you want to place the right ad in the right location. [Illustration: a page with positions 1–4 and two candidate items, Balzac and Zola]
  7. Website Optimization. The action is a slate $A_t = (a_1, a_2, a_3, a_4)$ of items placed in positions 1–4, and the expected reward is $r_t = \sum_{l=1}^{4} \kappa_l \, \theta_{a_l}$. Multiple-Plays Bandits in the Position-Based Model, NIPS 2016. (A sketch of the expected reward and of the optimal slate follows the slide list.)
  8. Website Optimization: the C-KLUCB algorithm, built on KL-UCB. The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond, Garivier & Cappé, COLT 2011. (A sketch of the KL-UCB index computation follows the slide list.)
  9. Website Optimization: Complexity. Theorem (Lower Bound on the Regret): for any uniformly efficient algorithm, the regret is asymptotically bounded from below; for $T$ large enough, $R(T) \gtrsim \log(T) \times C(\kappa, \theta)$. [Plot: regret $R(T)$ versus round $t$ (log scale), comparing the lower bound, C-KLUCB and Ranked-UCB]
  10. Ad Placement. The rewards form a $K \times L$ matrix $(\theta_{kl})$; the action is a (row, column) pair $A_t = (k, l)$ and the reward $r_t$ has mean $\theta_{kl}$, with the rank-1 structure $\theta_{kl} = u_k v_l$: $K \times L$ arms but only $K + L$ parameters! Stochastic Rank-1 Bandits, AISTATS 2017. (A sketch of this reward structure follows the slide list.)
  11. Ad Placement. Stochastic Rank-1 Bandits, AISTATS 2017. Theorem (Lower Bound on the Regret): for any uniformly efficient algorithm, the regret is asymptotically bounded from below by $\liminf_{T \to \infty} R(T)/\log(T) \ge \sum_{k=2}^{K} (\theta_{11} - \theta_{k1})/d(\theta_{k1}; \theta_{11}) + \sum_{l=2}^{L} (\theta_{11} - \theta_{1l})/d(\theta_{1l}; \theta_{11})$, which can be rewritten: for any $T$ sufficiently large, $R(T) \gtrsim \log(T) \, (C_{\mathrm{row}}(\theta) + C_{\mathrm{col}}(\theta))$.
  12. Ad Placement: BM-KLUCB. Idea: alternately explore the rows and the columns of the matrix using KL-UCB. [Plot: regret $R(T)$ versus round $t$ (log scale), $K = 3$, $L = 3$, comparing the lower bound and R1-KLUCB] (A simplified sketch of this alternation follows the slide list.)
  13. Take-Home Message. ‘Real-life’ bandit algorithms are getting real… but are not there yet. What comes next in bandit models for recommendation and conversion optimization: stochastic bandits with delays, rank-1 best-arm identification, higher-rank models? No-free-lunch theorems: exploration comes at a price that depends on the complexity of the problem. In the end, the existing ‘super theoretical’ work on bandits provides us with super efficient algorithms…
  14. @vernadec
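
A few illustrative code sketches follow. First, a minimal simulation of the position-based model from slide 3, assuming Bernoulli clicks; the values of the examination probabilities `kappa` and attraction probabilities `theta` are made up for illustration and do not come from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical PBM parameters (not from the talk).
kappa = np.array([1.0, 0.7, 0.5, 0.3])       # examination probability of positions 1..4
theta = np.array([0.3, 0.5, 0.2, 0.4, 0.1])  # attraction probability of items 1..5

def simulate_clicks(slate):
    """Draw one click vector for a slate (one item index per position).

    In the PBM, the click in position l is Bernoulli with mean
    kappa[l] * theta[slate[l]]: the user must both examine the position
    and be attracted by the item displayed there.
    """
    probs = kappa[: len(slate)] * theta[np.asarray(slate)]
    return rng.binomial(1, probs)

print(simulate_clicks([1, 3, 0, 2]))  # one 0/1 click indicator per position
```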
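
For the website-optimization game of slide 7, the expected reward of a slate is $\sum_l \kappa_l \theta_{a_l}$, so with known parameters the best slate places the most attractive items in the most examined positions. A sketch, reusing the hypothetical `kappa` and `theta` above (here `kappa` is already decreasing, so positions 1..4 are sorted by examination probability):

```python
def expected_reward(slate):
    """Expected number of clicks of a slate under the PBM."""
    return float(np.sum(kappa[: len(slate)] * theta[np.asarray(slate)]))

# With kappa decreasing, the optimal slate is simply the len(kappa) most
# attractive items, ordered by decreasing theta.
best_slate = np.argsort(theta)[::-1][: len(kappa)]
print(best_slate, expected_reward(best_slate))
```

The learning problem, of course, is that $\theta$ is unknown, which is what C-KLUCB must explore to estimate.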
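
The KL-UCB index of slide 8 upper-bounds an arm’s mean by the largest $q$ whose Bernoulli KL divergence from the empirical mean stays within a $\log(t)$ exploration budget. A minimal sketch of the index computation by bisection (this uses the plain $\log(t)$ budget; the COLT 2011 paper also analyses refinements):

```python
import math

def kl_bernoulli(p, q, eps=1e-12):
    """KL divergence d(p; q) between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def klucb_index(mean, pulls, t, iters=30):
    """Largest q >= mean with pulls * d(mean; q) <= log(t), by bisection."""
    budget = math.log(max(t, 2)) / pulls
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo

# An arm pulled 20 times with empirical mean 0.4, queried at round t = 1000.
print(klucb_index(0.4, 20, 1000))
```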
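
The rank-1 structure of slide 10: the $K \times L$ mean-reward matrix factorizes as $\theta_{kl} = u_k v_l$, so $K \times L$ arms are driven by only $K + L$ parameters, and the largest entry sits at the intersection of the best row and the best column. A sketch with hypothetical `u` and `v`:

```python
import numpy as np

rng = np.random.default_rng(1)

u = np.array([0.8, 0.5, 0.3])  # hypothetical row (ad) parameters, K = 3
v = np.array([0.9, 0.6, 0.4])  # hypothetical column (position) parameters, L = 3
theta = np.outer(u, v)         # theta[k, l] = u[k] * v[l]: a rank-1 matrix

def pull(k, l):
    """Playing arm (k, l) yields a Bernoulli reward with mean theta[k, l]."""
    return rng.binomial(1, theta[k, l])

# Best arm = (argmax u, argmax v), i.e. (0, 0) for the values above.
print(np.unravel_index(theta.argmax(), theta.shape))
```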
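
Finally, a simplified sketch of the alternation idea of slide 12: at odd rounds, fix the current best column and pick the row with the largest KL-UCB index; at even rounds, fix the current best row and explore columns the same way. This reuses `klucb_index` and `pull` from the sketches above, and it is an illustrative simplification of the idea, not the exact algorithm from the paper:

```python
import numpy as np

K, L, T = 3, 3, 2000
counts = np.zeros((K, L))  # pull counts per (row, column) arm
sums = np.zeros((K, L))    # cumulative rewards per arm
best_k, best_l = 0, 0

def index(k, l, t):
    """KL-UCB index of arm (k, l); unpulled arms get an optimistic index."""
    n = max(counts[k, l], 1)
    return klucb_index(sums[k, l] / n, n, t)

for t in range(1, T + 1):
    if t % 2 == 1:
        # Odd round: explore rows along the current best column.
        best_k = max(range(K), key=lambda k: index(k, best_l, t))
    else:
        # Even round: explore columns along the current best row.
        best_l = max(range(L), key=lambda l: index(best_k, l, t))
    r = pull(best_k, best_l)
    counts[best_k, best_l] += 1
    sums[best_k, best_l] += r

print("most played arm:", np.unravel_index(counts.argmax(), counts.shape))
```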
