Efficient Approximate Thompson Sampling
for Search Query Recommendation
Chu-Cheng Hsieh
1
2
[Figure: timeline of the training window: Nov. 13, Dec. 13, Jan. 13, Feb. 13; labels "Training Date" and "Training Data"]
3
What’s wrong?
4
Resources are limited
5
Before vs. After Xmas
6
Multi-armed Bandit Problem (MAB)
Thompson Sampling
Query recommendation
Experiments
The k-armed Bandit Problem
7
[Figure: three slot machines labeled A, B, and C]
8
Play the right slot machine
that maximizes your profit
#Goal#
9
1. 30% of the time: play each arm evenly (explore)
2. 70% of the time: play the best one so far (exploit)
#Simple Strategy#
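A minimal sketch of this simple strategy (essentially ε-greedy with ε = 0.3); the arm names and win rates below are illustrative assumptions, not from the slides:

import random

# 30/70 strategy: 30% explore evenly, 70% exploit the best arm so far.
true_win_rate = {"A": 0.3, "B": 0.5, "C": 0.6}   # assumed for simulation
wins = {arm: 0 for arm in true_win_rate}
plays = {arm: 0 for arm in true_win_rate}

def pick_arm():
    if random.random() < 0.3:                     # 30%: play each evenly
        return random.choice(list(true_win_rate))
    # 70%: play the arm with the best observed win rate
    return max(true_win_rate,
               key=lambda a: wins[a] / plays[a] if plays[a] else 0.0)

for _ in range(1000):
    arm = pick_arm()
    plays[arm] += 1
    if random.random() < true_win_rate[arm]:      # simulated reward
        wins[arm] += 1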
10
Related Search => MAB problem (M=1)
A B C D E
11
Multi-armed Bandit Problem
Thompson Sampling (TS)
Query recommendation
Experiments
12
Exploration (learn) vs. Exploitation (earn)
import random

def play_machine():
    x = random.random()  # uniform draw, 0 <= x < 1
    y = 0.49             # the machine's hidden win rate
    return x < y         # True on a win, False on a loss
(Observed)
13
(Estimate)
I played 10 times -- won 5 and lost 5
I played 100 times -- won 45 and lost 55
#Learn#
[Figure: the ``prior'': chance of having μ, as a function of μ]
14
Data Observed:
Action (machine): the refinements being displayed
Reward (win/loss): CTR, BBOWA, ...
15
#Question#
Learn or Earn?
Red: 19 wins, 9 losses
Blue: 59 wins, 39 losses
16
Thompson Sampling
Selecting a candidate based on the following rule:
for each arm x, draw a sample θx ~ Beta(Sx + 1, Fx + 1), where Sx and Fx are the wins and losses observed on x, then play the arm with the highest θx.
Assuming a Beta prior over each arm's win rate μ:
[Figure: PDF of the Beta prior over x in [0, 1]]
17
Beta(0+1, 0+1)
[Figure: its PDF, uniform on [0, 1]]
18
Beta(5+1, 5+1)
[Figure: its PDF, a broad bump centered at 0.5]
19
Beta(45+1, 55+1)
[Figure: its PDF, concentrated around 0.45]
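These three posteriors, Beta(1, 1), Beta(6, 6), and Beta(46, 56), can be sampled directly with Python's standard library; a minimal sketch (the sample count is an arbitrary choice):

import random

# Sample from Beta(S + 1, F + 1) after observing S wins and F losses.
# As data accumulates, the samples concentrate around the observed win rate.
for wins_, losses in [(0, 0), (5, 5), (45, 55)]:
    samples = [random.betavariate(wins_ + 1, losses + 1) for _ in range(100000)]
    mean = sum(samples) / len(samples)
    print(f"Beta({wins_}+1, {losses}+1): sample mean ~ {mean:.3f}")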
20
#Example#
Learn or Earn?
Red: 19 wins, 9 losses -> picked ~75% of the time
Blue: 59 wins, 39 losses -> picked ~25% of the time
[Figure: PDFs of Beta(20, 10) and Beta(60, 40)]
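The 75% / 25% split is the probability that a draw from Red's posterior beats a draw from Blue's; a Monte Carlo sketch (the trial count is arbitrary):

import random

# Red ~ Beta(19+1, 9+1) = Beta(20, 10); Blue ~ Beta(59+1, 39+1) = Beta(60, 40).
trials = 100000
red_wins = sum(
    random.betavariate(20, 10) > random.betavariate(60, 40)
    for _ in range(trials)
)
print(f"P(pick Red) ~ {red_wins / trials:.3f}")  # ~0.75 (75.46% per the notes)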
The motivation of Thompson Sampling (2)
21
Beta(20,10)
Beta(60,40)
See a good one; “learn more”
[Figure: PDFs of Beta(4, 6) and Beta(60, 40)]
Intuition (an underdog, but worth learning about)
22
Beta(4,6)
Beta(60,40)
[Figure: PDFs of Beta(10, 15) and Beta(60, 40)]
The motivation of Thompson Sampling (1)
23
Beta(10,15)
Beta(60,40)
Avoid exploring a “low potential” arm early on
Intuition (Equal exploration)
24
[Figure: PDFs of Beta(40, 60) and Beta(60, 40)]
Beta(40,60) Beta(60,40)
25
Init: a = 1, b = 1; Sx = Fx = 0 for all x
(each arm x corresponds to a Beta(Sx + a, Fx + b) prior)
1. Draw a random number from each arm's Beta(Sx + a, Fx + b)
2. Play the arm x' with the highest number
3. If a reward is seen:
       Sx' += 1
   else:
       Fx' += 1
Algorithm
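A runnable Python version of the algorithm above; the simulated win rates are illustrative assumptions:

import random

def thompson_sampling(true_win_rate, rounds=10000, a=1, b=1):
    """Thompson Sampling as on the slide: Beta(Sx + a, Fx + b) per arm."""
    S = {x: 0 for x in true_win_rate}   # successes
    F = {x: 0 for x in true_win_rate}   # failures
    for _ in range(rounds):
        # 1. Draw a random number from each arm's posterior
        draws = {x: random.betavariate(S[x] + a, F[x] + b) for x in S}
        # 2. Play the arm with the highest draw
        best = max(draws, key=draws.get)
        # 3. Update success/failure counts for the played arm
        if random.random() < true_win_rate[best]:   # simulated reward
            S[best] += 1
        else:
            F[best] += 1
    return S, F

# Illustrative win rates; arm "C" should end up played most often.
S, F = thompson_sampling({"A": 0.3, "B": 0.5, "C": 0.6})
print({x: S[x] + F[x] for x in S})  # plays per arm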
26
27
Multi-armed Bandit Problem
Thompson Sampling
Query Recommendation
Experiments
28
Ugly truth #1
M > 1 (M=5 here)
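With M = 5 slots to fill at once, one natural extension is to draw one posterior sample per arm and display the M highest. A sketch of that multi-play idea under our assumptions, not necessarily the paper's exact scheme:

import random

def pick_top_m(S, F, m=5, a=1, b=1):
    """Draw one Beta sample per arm; return the M arms with the highest
    draws. A sketch of multi-play Thompson Sampling (our reading)."""
    draws = {x: random.betavariate(S[x] + a, F[x] + b) for x in S}
    return sorted(draws, key=draws.get, reverse=True)[:m]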
29
Ugly truth #2
No response => failure?
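Following the notes on γ (slide 37), a displayed refinement that gets no response can be treated as only a fractional failure rather than a full one; a sketch of that reading (the function shape and the fractional-increment interpretation are our assumptions):

def update(S, F, arm, clicked, gamma=0.02):
    """Update counts for one displayed arm. A click is a success; a
    no-response adds only gamma to the failure count instead of 1
    (our reading of the slides' gamma parameter)."""
    if clicked:
        S[arm] += 1
    else:
        F[arm] += gamma   # discounted penalty for no response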
31
32
Multi-armed Bandit Problem
Thompson Sampling
Query Recommendation
Experiments
33
#Experiments#
 Target: the 100 most popular queries
 Date: 2 weeks (Nov. 2013)
 Goal: identify the top M
 Measurement: Regret = sum over rounds of (reward of the Best arms - reward of the Picked arms)
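A sketch of how this regret measure can be computed from the arms' true mean rewards (the function shape is our assumption):

def cumulative_regret(mu, picked_per_round, m):
    """Sum over rounds of (reward of the true top-M arms) minus
    (reward of the M arms actually picked), using true means mu."""
    best = sum(sorted(mu.values(), reverse=True)[:m])
    return sum(best - sum(mu[x] for x in picked)
               for picked in picked_per_round)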
34
[Figure: regret curves for M = 1 and M = 2]
#Goal#
Identify top M quickly.
35
M = 1, 2, 3 and γ = 0.02
36
M = 2, γ = 0 to 2
37
 Chu-Cheng Hsieh
38
Hsieh, C., Neufeld, J., Holloway King, T., and Cho, J.
Efficient Approximate Thompson Sampling for Search Query Recommendation.
In Proceedings of the 30th ACM/SIGAPP Symposium on Applied Computing (SAC 2015).
About the author & download:
http://oak.cs.ucla.edu/~chucheng/
Editor's Notes

  • #3 Small training period? => low recall
  • #5 One related keyword is allowed
  • #6 (2) We have more items than we can display.
  • #7 Find an old problem so that you can stand on the shoulders of giants.
  • #8 Each turn, you are given the choice of which slot machine to play.
  • #10 You are assuming the info you learn from the 30% exploration is reliable.
  • #11 One related keyword is allowed
  • #12 Use the slot machine as the example; hold off on Related Search for now. William R. Thompson
  • #13 “Learn” is more interesting; earn is simple. You learn from observing “rewriting”.
  • #14 Use 80% as the example. You don't know the code; you only observe what happens afterwards, so you need to make an assumption. The simple assumption is that the prior is a “normal distribution”.
  • #15 a_x: which machine is played; r: 1 if win, otherwise 0.
  • #17 Beta is the conjugate prior of the binomial likelihood function.
  • #18 Beta(1,1) is uniform: if you randomly draw a number from Beta(1,1), the chance of seeing 0, 0.5, or 1 is the same.
  • #20 When alpha and beta are large enough, the PDF is approximately a bell curve (normal distribution).
  • #22 75.46%
  • #23 10.9%
  • #24 3.5%
  • #25 0.188%
  • #28 TS is a strategy for the MAB problem.
  • #29 Jello
  • #31 Multiple choices to display.
  • #33 TS is a strategy for the MAB problem.
  • #35 M = 1: how quickly can we find the best arm?
  • #37 Gamma: when there is no response, how much discount to apply to the penalty (the Beta failure count)?