Efficient Approximate Thompson Sampling for Search Query Recommendation in SAC'15


Query suggestions have been a valuable feature for e-commerce sites in helping shoppers refine their search intent. In this work, we develop an algorithm that helps e-commerce sites like eBay blend the output of different recommendation algorithms. Our algorithm is based on "Thompson Sampling", a technique designed for solving multi-armed bandit problems where the best results are not known in advance but instead are tried out to gather feedback. Our approach is to treat query suggestions as a competition among data resources: we have many query suggestion candidates competing for limited space on the search results page. An "arm" is played when a query suggestion candidate is chosen for display, and our goal is to maximize the expected reward (user clicks on a suggestion). Our experiments show promising results: click-based user feedback can be used to enhance the quality of query suggestions.


Slide 1: Efficient Approximate Thompson Sampling for Search Query Recommendation. Chu-Cheng Hsieh.
Slide 2: Training date vs. training data: Nov. 13, Dec. 13, Jan. 13, Feb. 13.
Slide 3: What's wrong?
Slide 4: Resources are limited.
Slide 5: Before vs. after Xmas.
Slide 6: Outline: the multi-armed bandit problem (MAB), Thompson sampling, query recommendation, experiments.
Slide 7: The k-armed bandit problem: three slot machines, A, B, and C.
Slide 8: #Goal# Play the slot machine that maximizes your profit.
Slide 9: #Simple Strategy# 1) 30% of the time, play each machine evenly; 2) 70% of the time, play the best one so far (a sketch follows).
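
Slide 9's baseline is what is usually called an epsilon-greedy strategy. A minimal sketch, assuming "the best one" means the machine with the highest empirical win rate, with S and F as per-machine success/failure counts (names are illustrative, not from the paper):

    import random

    def simple_strategy(S, F, eps=0.30):
        """Slide 9's baseline: explore uniformly eps of the time, else exploit."""
        n = len(S)
        if random.random() < eps:
            return random.randrange(n)  # 30%: play each machine evenly
        # 70%: play the machine with the best observed win rate so far
        rates = [S[x] / (S[x] + F[x]) if (S[x] + F[x]) > 0 else 0.0 for x in range(n)]
        return max(range(n), key=rates.__getitem__)
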
Slide 10: Related Search => a MAB problem (M=1), with candidate suggestions A, B, C, D, E.
Slide 11: Outline: the multi-armed bandit problem, Thompson sampling (TS), query recommendation, experiments.
Slide 12: Exploration (learn) vs. exploitation (earn).
Slide 13: #Learn# Model a machine as a biased coin with a hidden win rate (made fully runnable in the sketch below):

    x = random.random()  # 0 <= x < 1
    y = 0.49
    if x < y:
        return True
    else:
        return False

(Observed) I played 10 times: won 5, lost 5. I played 100 times: won 45, lost 55. (Estimate) [Plot: the chance of each value of the win rate μ, i.e., the ``prior''.]
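
A minimal runnable version of slide 13's coin-flip model. The hidden win rate 0.49 comes from the slide; the helper names are illustrative:

    import random

    def play(win_rate=0.49):
        """One pull: win (True) with probability win_rate, else lose."""
        return random.random() < win_rate  # random.random() is uniform on [0, 1)

    def estimate(n_plays, win_rate=0.49):
        """Play n_plays times and return the observed estimate of the win rate."""
        wins = sum(play(win_rate) for _ in range(n_plays))
        return wins / n_plays

    print(estimate(10))   # noisy, e.g. 0.50 (5 wins, 5 losses)
    print(estimate(100))  # tighter, e.g. 0.45 (45 wins, 55 losses)
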
Slide 14: Data observed. Action = the machine played (the refinements being displayed); Reward = win/loss (CTR, BBOWA, ...).
Slide 15: #Question# Learn or earn? Red: 19 wins, 9 losses. Blue: 59 wins, 39 losses.
Slide 16: Thompson Sampling. Select a candidate by the following rule (assuming a Beta prior; notation as on slide 25): draw θx ~ Beta(Sx + a, Fx + b) for each candidate x, and show the candidate with the largest draw.
Slide 17: [Plot: PDF of Beta(0+1, 0+1), the flat prior before any plays.]
Slide 18: [Plot: PDF of Beta(5+1, 5+1), after 5 wins and 5 losses.]
Slide 19: [Plot: PDF of Beta(45+1, 55+1), after 45 wins and 55 losses; reproduced in the sketch below.]
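
The three PDFs on slides 17-19 can be reproduced with a few lines of SciPy/Matplotlib; a sketch, with the win/loss counts taken straight from the slides:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.stats import beta

    x = np.linspace(0.0, 1.0, 500)
    for wins, losses in [(0, 0), (5, 5), (45, 55)]:
        # Posterior over the win rate after `wins` successes and `losses`
        # failures, starting from the uniform Beta(1, 1) prior.
        plt.plot(x, beta.pdf(x, wins + 1, losses + 1),
                 label=f"Beta({wins}+1,{losses}+1)")
    plt.xlabel("x (win rate)")
    plt.ylabel("PDF")
    plt.legend()
    plt.show()

With no data the posterior is flat; after 100 plays it concentrates sharply near 0.45.
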
Slide 20: #Example# Learn or earn? Red: 19 wins, 9 losses; Blue: 59 wins, 39 losses. [Plots: the two posterior PDFs.] Thompson sampling shows Red about 75% of the time and Blue about 25% (verified below).
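
Slide 20's 75%/25% split can be checked by Monte Carlo: sample both posteriors and count how often Red's draw beats Blue's. A sketch, assuming the uniform prior of slide 25:

    import random

    trials = 100_000
    red_wins = sum(
        random.betavariate(19 + 1, 9 + 1) > random.betavariate(59 + 1, 39 + 1)
        for _ in range(trials)
    )
    print(red_wins / trials)  # roughly 0.75: Red (fewer plays, higher rate) is shown more
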
Slide 21: The motivation of Thompson sampling (2). [Plots: PDFs of Beta(20,10) and Beta(60,40).] See a good one; "learn more".
Slide 22: Intuition (an underdog, but worth learning about). [Plots: PDFs of Beta(4,6) and Beta(60,40).]
Slide 23: The motivation of Thompson sampling (1). [Plots: PDFs of Beta(10,15) and Beta(60,40).] Avoid exploring a "low potential" arm early on.
Slide 24: Intuition (equal exploration). [Plots: PDFs of Beta(40,60) and Beta(60,40).]
Slide 25: Algorithm. Init: a = 1, b = 1, Sx = Fx = 0 for all x; each arm x carries a Beta(Sx + a, Fx + b) prior. 1) Draw a random number from each arm based on Beta(Sx + a, Fx + b); 2) play the arm x' with the highest number; 3) if a reward is seen, Sx' += 1, else Fx' += 1. (A runnable transcription follows.)
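
A direct Python transcription of slide 25's algorithm, run against a simulated bandit. The true win rates are illustrative, not from the paper:

    import random

    def thompson_step(S, F, a=1, b=1):
        """Draw from each arm's Beta(S[x]+a, F[x]+b) posterior; play the argmax."""
        draws = [random.betavariate(S[x] + a, F[x] + b) for x in range(len(S))]
        return max(range(len(S)), key=draws.__getitem__)

    true_rates = [0.30, 0.45, 0.60]  # hidden from the algorithm
    S = [0] * len(true_rates)        # Sx: observed successes per arm
    F = [0] * len(true_rates)        # Fx: observed failures per arm
    for _ in range(10_000):
        x = thompson_step(S, F)
        if random.random() < true_rates[x]:  # step 3: reward seen?
            S[x] += 1
        else:
            F[x] += 1
    print(S, F)  # plays concentrate on the 0.60 arm over time
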
Slide 26: [figure only]
Slide 27: Outline: the multi-armed bandit problem, Thompson sampling, query recommendation, experiments.
Slide 28: Ugly truth #1: M > 1 (M = 5 here), i.e., several suggestions are displayed at once (see the top-M sketch below).
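
Slide 28's first complication is that M suggestions are shown per query, not one. One natural extension of Thompson sampling, sketched below (the paper's approximation may differ): draw once from every arm's posterior and display the arms with the M largest draws.

    import random

    def thompson_top_m(S, F, m=5, a=1, b=1):
        """Return the indices of the m arms with the largest posterior draws."""
        draws = [(random.betavariate(S[x] + a, F[x] + b), x) for x in range(len(S))]
        return [x for _, x in sorted(draws, reverse=True)[:m]]
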
Slide 29: [figure only]
Slide 30: Ugly truth #2: does no response count as a failure? (A hypothetical γ-weighted update follows.)
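
Slide 30's second complication: an impression with no click is not necessarily a loss. The slides do not define γ, but since slides 35-36 sweep γ from 0 to 2, one hypothetical reading is that a no-click impression counts as a fractional failure of weight γ. The sketch below encodes that assumption; it is a guess, not the paper's confirmed update rule (the default 0.02 is the value used on slide 35):

    def update(S, F, x, clicked, gamma=0.02):
        """Hypothetical update: a no-click impression adds gamma to the failures."""
        if clicked:
            S[x] += 1
        else:
            # gamma = 0 ignores non-responses; gamma = 1 treats them as full losses.
            # random.betavariate accepts non-integer parameters, so floats are fine.
            F[x] += gamma
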
Slide 31: [figure only]
Slide 32: Outline: the multi-armed bandit problem, Thompson sampling, query recommendation, experiments.
Slide 33: #Experiments# Target: the 100 most popular queries. Date: 2 weeks (Nov. 2013). Goal: identify the top M. Measurement: regret (best vs. picked; sketched below).
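
Slide 33 measures regret as the gap between the best arms and the ones actually picked. A minimal sketch under the usual definition (expected reward of the true top M minus that of the M arms shown; names are illustrative):

    def step_regret(true_rates, picked, m):
        """Expected-reward gap between the true top-m arms and the arms shown."""
        best = sorted(true_rates, reverse=True)[:m]
        return sum(best) - sum(true_rates[x] for x in picked)

Cumulative regret over a run is the sum of per-step regrets; a good policy's cumulative regret flattens out once it has identified the top M.
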
Slide 34: #Goal# Identify the top M quickly. [Plots: results for M=1 and M=2.]
Slide 35: [Plot: results for M = 1, 2, 3 with γ = 0.02.]
Slide 36: [Plot: results for M = 2, γ ranging from 0 to 2.]
Slide 37: Chu-Cheng Hsieh.
Slide 38: Hsieh, C., Neufeld, J., Holloway King, T., and Cho, J. J. Efficient Approximate Thompson Sampling for Search Query Recommendation. The 30th ACM/SIGAPP Symposium on Applied Computing (SAC 2015). About the author & download: http://oak.cs.ucla.edu/~chucheng/
