Efficient Approximate Thompson Sampling
for Search Query Recommendation
Chu-Cheng Hsieh
1
2
[Figure: timeline of the training window: Nov. 13, Dec. 13, Jan. 13, Feb. 13; labels "Training Date" and "Training Data"]
3
What’s wrong?
4
Resources are limited
5
Before vs. After Xmas
6
Multi-armed Bandit Problem (MAB)
Thompson Sampling
Query recommendation
Experiments
The k-armed Bandit Problem
7
[Figure: three slot machines labeled A, B, and C]
8
Play the right slot machine
that maximizes your profit
#Goal#
9
1. 30% of the time: play each arm evenly (explore)
2. 70% of the time: play the best one so far (exploit)
#Simple Strategy#
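A minimal sketch of this simple strategy (essentially ε-greedy with ε = 0.3); the arm names and win rates below are illustrative assumptions, not from the slides:

import random

# 30/70 strategy: 30% explore evenly, 70% exploit the best arm so far.
true_win_rate = {"A": 0.3, "B": 0.5, "C": 0.6}   # assumed for simulation
wins = {arm: 0 for arm in true_win_rate}
plays = {arm: 0 for arm in true_win_rate}

def pick_arm():
    if random.random() < 0.3:                     # 30%: play each evenly
        return random.choice(list(true_win_rate))
    # 70%: play the arm with the best observed win rate
    return max(true_win_rate,
               key=lambda a: wins[a] / plays[a] if plays[a] else 0.0)

for _ in range(1000):
    arm = pick_arm()
    plays[arm] += 1
    if random.random() < true_win_rate[arm]:      # simulated reward
        wins[arm] += 1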
10
Related Search => MAB problem (M=1)
A B C D E
11
Multi-armed Bandit Problem
Thompson Sampling (TS)
Query recommendation
Experiments
12
Exploration (learn) vs. Exploitation (earn)
import random

def play_machine():
    x = random.random()  # uniform draw, 0 <= x < 1
    y = 0.49             # the machine's hidden win rate
    return x < y         # True on a win, False on a loss
(Observed)
13
(Estimate)
I played 10 times -- won 5 and lost 5
I played 100 times -- won 45 and lost 55
#Learn#
[Figure: the ``prior'': chance of having μ, as a function of μ]
14
Data Observed:
Action (machine): the refinements being displayed
Reward (win/loss): CTR, BBOWA, ...
15
#Question#
Learn or Earn?
Red: 19 wins, 9 losses
Blue: 59 wins, 39 losses
16
Thompson Sampling
Selecting a candidate based on the following rule:
for each arm x, draw a sample θx ~ Beta(Sx + 1, Fx + 1), where Sx and Fx are the wins and losses observed on x, then play the arm with the highest θx.
Assuming a Beta prior over each arm's win rate μ:
[Figure: PDF of the Beta prior over x in [0, 1]]
17
Beta(0+1, 0+1)
[Figure: its PDF, uniform on [0, 1]]
18
Beta(5+1, 5+1)
[Figure: its PDF, a broad bump centered at 0.5]
19
Beta(45+1, 55+1)
[Figure: its PDF, concentrated around 0.45]
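These three posteriors, Beta(1, 1), Beta(6, 6), and Beta(46, 56), can be sampled directly with Python's standard library; a minimal sketch (the sample count is an arbitrary choice):

import random

# Sample from Beta(S + 1, F + 1) after observing S wins and F losses.
# As data accumulates, the samples concentrate around the observed win rate.
for wins_, losses in [(0, 0), (5, 5), (45, 55)]:
    samples = [random.betavariate(wins_ + 1, losses + 1) for _ in range(100000)]
    mean = sum(samples) / len(samples)
    print(f"Beta({wins_}+1, {losses}+1): sample mean ~ {mean:.3f}")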
20
#Example#
Learn or Earn?
Red: 19 wins, 9 losses -> picked ~75% of the time
Blue: 59 wins, 39 losses -> picked ~25% of the time
[Figure: PDFs of Beta(20, 10) and Beta(60, 40)]
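The 75% / 25% split is the probability that a draw from Red's posterior beats a draw from Blue's; a Monte Carlo sketch (the trial count is arbitrary):

import random

# Red ~ Beta(19+1, 9+1) = Beta(20, 10); Blue ~ Beta(59+1, 39+1) = Beta(60, 40).
trials = 100000
red_wins = sum(
    random.betavariate(20, 10) > random.betavariate(60, 40)
    for _ in range(trials)
)
print(f"P(pick Red) ~ {red_wins / trials:.3f}")  # ~0.75 (75.46% per the notes)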
The motivation of Thompson Sampling (2)
21
Beta(20,10)
Beta(60,40)
See a good one; “learn more”
[Figure: PDFs of Beta(4, 6) and Beta(60, 40)]
Intuition (an underdog, but worth learning about)
22
Beta(4,6)
Beta(60,40)
[Figure: PDFs of Beta(10, 15) and Beta(60, 40)]
The motivation of Thompson Sampling (1)
23
Beta(10,15)
Beta(60,40)
Avoid exploring a “low potential” arm early on
Intuition (Equal exploration)
24
[Figure: PDFs of Beta(40, 60) and Beta(60, 40)]
Beta(40,60) Beta(60,40)
25
Init: a = 1, b = 1; Sx = Fx = 0 for all x
(each arm x corresponds to a Beta(Sx + a, Fx + b) prior)
1. Draw a random number from each arm's Beta(Sx + a, Fx + b)
2. Play the arm x' with the highest number
3. If a reward is seen:
       Sx' += 1
   else:
       Fx' += 1
Algorithm
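A runnable Python version of the algorithm above; the simulated win rates are illustrative assumptions:

import random

def thompson_sampling(true_win_rate, rounds=10000, a=1, b=1):
    """Thompson Sampling as on the slide: Beta(Sx + a, Fx + b) per arm."""
    S = {x: 0 for x in true_win_rate}   # successes
    F = {x: 0 for x in true_win_rate}   # failures
    for _ in range(rounds):
        # 1. Draw a random number from each arm's posterior
        draws = {x: random.betavariate(S[x] + a, F[x] + b) for x in S}
        # 2. Play the arm with the highest draw
        best = max(draws, key=draws.get)
        # 3. Update success/failure counts for the played arm
        if random.random() < true_win_rate[best]:   # simulated reward
            S[best] += 1
        else:
            F[best] += 1
    return S, F

# Illustrative win rates; arm "C" should end up played most often.
S, F = thompson_sampling({"A": 0.3, "B": 0.5, "C": 0.6})
print({x: S[x] + F[x] for x in S})  # plays per arm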
26
27
Multi-armed Bandit Problem
Thompson Sampling
Query Recommendation
Experiments
28
Ugly truth #1
M > 1 (M=5 here)
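With M = 5 slots to fill at once, one natural extension is to draw one posterior sample per arm and display the M highest. A sketch of that multi-play idea under our assumptions, not necessarily the paper's exact scheme:

import random

def pick_top_m(S, F, m=5, a=1, b=1):
    """Draw one Beta sample per arm; return the M arms with the highest
    draws. A sketch of multi-play Thompson Sampling (our reading)."""
    draws = {x: random.betavariate(S[x] + a, F[x] + b) for x in S}
    return sorted(draws, key=draws.get, reverse=True)[:m]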
29
Ugly truth #2
No response => failure?
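Following the notes on γ (slide 37), a displayed refinement that gets no response can be treated as only a fractional failure rather than a full one; a sketch of that reading (the function shape and the fractional-increment interpretation are our assumptions):

def update(S, F, arm, clicked, gamma=0.02):
    """Update counts for one displayed arm. A click is a success; a
    no-response adds only gamma to the failure count instead of 1
    (our reading of the slides' gamma parameter)."""
    if clicked:
        S[arm] += 1
    else:
        F[arm] += gamma   # discounted penalty for no response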
31
32
Multi-armed Bandit Problem
Thompson Sampling
Query Recommendation
Experiments
33
#Experiments#
 Target: the 100 most popular queries
 Date: 2 weeks (Nov. 2013)
 Goal: identify the top M
 Measurement: Regret = sum over rounds of (reward of the Best arms - reward of the Picked arms)
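A sketch of how this regret measure can be computed from the arms' true mean rewards (the function shape is our assumption):

def cumulative_regret(mu, picked_per_round, m):
    """Sum over rounds of (reward of the true top-M arms) minus
    (reward of the M arms actually picked), using true means mu."""
    best = sum(sorted(mu.values(), reverse=True)[:m])
    return sum(best - sum(mu[x] for x in picked)
               for picked in picked_per_round)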
34
[Figure: regret curves for M = 1 and M = 2]
#Goal#
Identify top M quickly.
35
M = 1, 2, 3 and γ = 0.02
36
M = 2, γ = 0 to 2
37
 Chu-Cheng Hsieh
38
Hsieh, C., Neufeld, J., Holloway King, T., and Cho, J.
Efficient Approximate Thompson Sampling for Search Query Recommendation.
In Proceedings of the 30th ACM/SIGAPP Symposium on Applied Computing (SAC 2015).
About the author & download:
http://oak.cs.ucla.edu/~chucheng/
Editor's Notes

  • #3 Small training period? => low recall
  • #5 One related keyword is allowed
  • #6 (2) We have more items than we can display.
  • #7 Find an old problem so that you can stand on the shoulders of giants.
  • #8 Each turn, you are given the choice of which slot machine to play.
  • #10 You are assuming the info you learn from the 30% exploration is reliable.
  • #11 One related keyword is allowed
  • #12 Use the slot machine as the example; hold off on Related Search for now. William R. Thompson
  • #13 “Learn” is more interesting; earn is simple. You learn from observing “rewriting”.
  • #14 Use 80% as the example. You don't know the code; you only observe what happens afterwards, so you need to make an assumption. The simple assumption is that the prior is a “normal distribution”.
  • #15 a_x: which machine is played; r: 1 if win, otherwise 0.
  • #17 Beta is the conjugate prior of the binomial likelihood function.
  • #18 Beta(1,1) is uniform: if you randomly draw a number from Beta(1,1), the chance of seeing 0, 0.5, or 1 is the same.
  • #20 When alpha and beta are large enough, the PDF is approximately a bell curve (normal distribution).
  • #22 75.46%
  • #23 10.9%
  • #24 3.5%
  • #25 0.188%
  • #28 TS is a strategy for the MAB problem.
  • #29 Jello
  • #31 Multiple choices to display.
  • #33 TS is a strategy for the MAB problem.
  • #35 M = 1: how quickly can we find the best arm?
  • #37 Gamma: when there is no response, how much discount to apply to the penalty (the Beta failure count)?