
William R. Thompson (the namesake of Thompson sampling)

You learn from observing "rewards."

You don't know the underlying code; you only observe what happens afterwards, so you need to make assumptions.

The simplest assumption is to take the prior to be a normal distribution.

Reward: r = 1 if win; otherwise, r = 0.


- 1. Efficient Approximate Thompson Sampling for Search Query Recommendation. Chu-Cheng Hsieh
- 2. Training data by date: Nov. 13, Dec. 13, Jan. 13, Feb. 13
- 3. What's wrong?
- 4. Resources are limited
- 5. Before vs. after Xmas
- 6. Multi-arm bandit problem (MAB) / Thompson sampling / Query recommendation / Experiments
- 7. The k-arm bandit problem: arms A, B, C
- 8. #Goal# Play the right slot machine, the one that maximizes your profit
- 9. #Simple Strategy# 1. 30%: play each arm evenly 2. 70%: play the best one so far
- 10. Related Search => a MAB problem (M=1), with candidates A, B, C, D, E
- 11. Multi-arm bandit problem / Thompson sampling (TS) / Query recommendation / Experiments
- 12. Exploration (learn) vs. exploitation (earn)
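The 30/70 strategy on slide 9 can be sketched as a small function. This is my own framing of the slide's idea, not the talk's code: the function name, the win-rate list, and the `explore_rate` parameter are assumptions.

```python
import random

def simple_strategy(win_rates, explore_rate=0.3):
    """The slide's simple strategy: with probability `explore_rate`
    (30% on the slide) explore by playing an arm uniformly at random;
    otherwise exploit the arm with the best observed win rate."""
    if random.random() < explore_rate:
        return random.randrange(len(win_rates))               # explore evenly
    return max(range(len(win_rates)), key=win_rates.__getitem__)  # exploit
```

With `explore_rate=0` this is pure exploitation; raising it trades earning for learning, which is the tension the next slides address.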
- 13. #Learn#
    x = random.random()  # 0 <= x < 1
    y = 0.49
    if x < y:
        return True
    else:
        return False
  (Observed) Played 10 times: won 5, lost 5. Played 100 times: won 45, lost 55. (Estimate) The chance of having a given μ: the ``prior''.
- 14. Data observed: Action = the machine played (the refinements being displayed); Reward = win/loss (CTR, BBOWA, ...)
- 15. #Question# Learn or earn? Red: 19 wins, 9 losses; Blue: 59 wins, 39 losses
- 16. Thompson Sampling: select a candidate based on the formula and assumptions shown on the slide
- 17. [Plot: PDF of Beta(0+1, 0+1)]
- 18. [Plot: PDF of Beta(5+1, 5+1)]
- 19. [Plot: PDF of Beta(45+1, 55+1)]
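Slides 17-19 show the Beta posterior tightening as observations accumulate. A quick check with the standard library's `betavariate` (the draw counts and the 95%-interval summary are mine; the Beta(wins+1, losses+1) parameters are from the slides):

```python
import random

# Draw from the three posteriors on slides 17-19 and watch the spread
# shrink as the win/loss counts grow: Beta(0+1,0+1) is flat, while
# Beta(45+1,55+1) is concentrated near 45/100.
random.seed(0)
for wins, losses in [(0, 0), (5, 5), (45, 55)]:
    draws = sorted(random.betavariate(wins + 1, losses + 1)
                   for _ in range(10_000))
    lo, hi = draws[250], draws[-250]   # central 95% of the draws
    print(f"Beta({wins}+1,{losses}+1): 95% of mass in [{lo:.2f}, {hi:.2f}]")
```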
- 20. #Example# Learn or earn? Red: 19 wins, 9 losses; Blue: 59 wins, 39 losses; picked 75% vs. 25% of the time
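The 75%/25% split on slide 20 can be reproduced by Monte Carlo: sample once from each arm's Beta posterior and count how often each arm wins the draw. The posteriors follow the slide's counts; the trial count and seed are mine.

```python
import random

# Monte Carlo estimate of how often Thompson sampling would pick each arm.
# Posteriors from the slide: Red = Beta(19+1, 9+1), Blue = Beta(59+1, 39+1).
random.seed(0)
trials = 100_000
red_wins = 0
for _ in range(trials):
    red = random.betavariate(20, 10)
    blue = random.betavariate(60, 40)
    if red > blue:
        red_wins += 1
print(f"Red picked {red_wins / trials:.0%} of the time")  # roughly 75%
```

Red's empirical rate (19/28 ≈ 0.68) beats Blue's (59/98 ≈ 0.60), but Red's posterior is wider, so Blue still gets picked about a quarter of the time.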
- 21. [Plots: Beta(20,10) vs. Beta(60,40)] The motivation of Thompson-S (2): see a good one, "learn more"
- 22. [Plots: Beta(4,6) vs. Beta(60,40)] Intuition: an underdog, but worth learning about
- 23. [Plots: Beta(10,15) vs. Beta(60,40)] The motivation of Thompson-S (1): avoid exploring a "low potential" arm early on
- 24. [Plots: Beta(40,60) vs. Beta(60,40)] Intuition: equal exploration
- 25. Algorithm
    Init: a = 1, b = 1; Sx = Fx = 0 for all x (each arm x corresponds to a Beta(Sx + a, Fx + b) prior)
    1. Draw a random number from each arm based on Beta(Sx + a, Fx + b)
    2. Play the arm (x') with the highest number
    3. If (see a reward) Sx' += 1 else Fx' += 1
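The algorithm on slide 25 translates almost line for line into Python. A minimal sketch: the simulated arms and their hidden win rates `p` are assumptions for the demo, not the paper's data.

```python
import random

def thompson_pick(S, F, a=1, b=1):
    """Steps 1-2: sample each arm's Beta(Sx + a, Fx + b) posterior and
    play the arm with the highest draw."""
    draws = [random.betavariate(S[x] + a, F[x] + b) for x in range(len(S))]
    return max(range(len(S)), key=draws.__getitem__)

random.seed(1)
p = [0.3, 0.5, 0.6]                 # hidden win probabilities (demo only)
S, F = [0] * len(p), [0] * len(p)
for _ in range(2000):
    x = thompson_pick(S, F)
    if random.random() < p[x]:      # step 3: reward observed or not
        S[x] += 1
    else:
        F[x] += 1

plays = [S[x] + F[x] for x in range(len(p))]
# The best arm should attract the bulk of the plays over time.
```

Note how exploration falls out for free: an arm with few observations has a wide posterior, so it occasionally produces the highest draw even when its mean is lower.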
- 26. 26
- 27. Multi-arm bandit problem / Thompson sampling / Query recommendation / Experiments
- 28. Ugly truth #1: M > 1 (M = 5 here)
- 29. 29
- 30. Ugly truth #2: no response => failure?
- 31. 31
- 32. Multi-arm bandit problem / Thompson sampling / Query recommendation / Experiments
- 33. #Experiments# Target: the 100 most popular queries; Date: 2 weeks (Nov. 2013); Goal: identify the top M; Measurement: regret (best minus picked)
- 34. #Goal# Identify the top M quickly (M = 1, M = 2)
- 35. M = 1, 2, 3 and γ = 0.02
- 36. M = 2, γ = 0~2
- 37. Chu-Cheng Hsieh
- 38. Hsieh, C., Neufeld, J., Holloway King, T., and Cho, J. Efficient Approximate Thompson Sampling for Search Query Recommendation. The 30th ACM/SIGAPP Symposium on Applied Computing (SAC 2015). About the author & download: http://oak.cs.ucla.edu/~chucheng/
