Real Time Learning

1,888 views

Published on

A talk about real-time learning, especially using

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
1,888
On SlideShare
0
From Embeds
0
Number of Embeds
95
Actions
Shares
0
Downloads
18
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Real Time Learning

  1. 1. Real-time Learning©MapR Technologies - Confidential 1
  2. 2.  Contact: – tdunning@maprtech.com – @ted_dunning Slides and such (available late tonight): – http://slideshare.net/tdunning Hash tags: #mapr #hivedata©MapR Technologies - Confidential 2
  3. 3. We have a product to sell … from a web-site©MapR Technologies - Confidential 3
  4. 4. What tag- What line? picture? Bogus Dog Food is the Best! Now available in handy 1 ton bags! Buy 5! What call to action?©MapR Technologies - Confidential 4
  5. 5. The Challenge Design decisions affect probability of success – Cheesy web-sites don’t even sell cheese The best designers do better when allowed to fail – Exploration juices creativity But failing is expensive – If only because we could have succeeded – But also because offending or disappointing customers is bad©MapR Technologies - Confidential 5
  6. 6. More Challenges Too many designs – 5 pictures – 10 tag-lines – 4 calls to action – 3 back-ground colors => 5 x 10 x 4 x 3 = 600 designs It gets worse quickly – What about changes on the back-end? – Search engine variants? – Checkout process variants?©MapR Technologies - Confidential 6
  7. 7. Example – AB testing in real-time I have 15 versions of my landing page Each visitor is assigned to a version – Which version? A conversion or sale or whatever can happen – How long to wait? Some versions of the landing page are horrible – Don’t want to give them traffic©MapR Technologies - Confidential 7
  8. 8. A Quick Diversion You see a coin – What is the probability of heads? – Could it be larger or smaller than that? I flip the coin and while it is in the air ask again I catch the coin and ask again I look at the coin (and you don’t) and ask again Why does the answer change? – And did it ever have a single value?©MapR Technologies - Confidential 8
  9. 9. A Philosophical Conclusion Probability as expressed by humans is subjective and depends on information and experience©MapR Technologies - Confidential 9
  10. 10. I Dunno©MapR Technologies - Confidential 10
  11. 11. 5 heads out of 10 throws©MapR Technologies - Confidential 11
  12. 12. 2 heads out of 12 throws©MapR Technologies - Confidential 12
  13. 13. So now you understand Bayesian probability©MapR Technologies - Confidential 13
  14. 14. Another Quick Diversion Let’s play a shell game This is a special shell game It costs you nothing to play The pea has constant probability of being under each shell (trust me) How do you find the best shell? How do you find it while maximizing the number of wins?©MapR Technologies - Confidential 14
  15. 15. Pause for short con-game©MapR Technologies - Confidential 15
  16. 16. Interim Thoughts Can you identify winners or losers without trying them out? Can you ever completely eliminate a shell with a bad streak? Should you keep trying apparent losers?©MapR Technologies - Confidential 16
  17. 17. Pause for second con-game©MapR Technologies - Confidential 17
  18. 18. So now you understand multi-armed bandits©MapR Technologies - Confidential 18
  19. 19. Conclusions Can you identify winners or losers without trying them out? No Can you ever completely eliminate a shell with a bad streak? No Should you keep trying apparent losers? Yes, but at a decreasing rate©MapR Technologies - Confidential 19
  20. 20. Is there an optimum strategy?©MapR Technologies - Confidential 20
  21. 21. Bayesian Bandit Compute distributions based on data so far Sample p1, p2 and p2 from these distributions Pick shell i where i = argmaxi pi Lemma 1: The probability of picking shell i will match the probability it is the best shell Lemma 2: This is as good as it gets©MapR Technologies - Confidential 21
  22. 22. And it works! 0.12 0.11 0.1 0.09 0.08 0.07 regret 0.06 ε- greedy, ε = 0.05 0.05 0.04 Bayesian Bandit with Gam m a- Norm al 0.03 0.02 0.01 0 0 100 200 300 400 500 600 700 800 900 1000 1100 n©MapR Technologies - Confidential 22
  23. 23. Video Demo©MapR Technologies - Confidential 23
  24. 24. The Code Select an alternative n = dim(k)[1] p0 = rep(0, length.out=n) for (i in 1:n) { p0[i] = rbeta(1, k[i,2]+1, k[i,1]+1) } return (which(p0 == max(p0))) Select and learn for (z in 1:steps) { i = select(k) j = test(i) k[i,j] = k[i,j]+1 } return (k) But we already know how to count!©MapR Technologies - Confidential 24
  25. 25. The Basic Idea We can encode a distribution by sampling Sampling allows unification of exploration and exploitation Can be extended to more general response models©MapR Technologies - Confidential 25
  26. 26. The Original Problem x2 x1 Bogus Dog Food is the Best! Now available in handy 1 ton bags! Buy 5! x3©MapR Technologies - Confidential 26
  27. 27. Response Function æ ö p(win) = w çåqi xi ÷ è i ø 1 0.5 y 0 -6 -4 -2 0 2 4 6 x©MapR Technologies - Confidential 27
  28. 28. Generalized Banditry Suppose we have an infinite number of bandits – suppose they are each labeled by two real numbers x and y in [0,1] – also that expected payoff is a parameterized function of x and y E [ z ] = f (x, y | q ) – now assume a distribution for θ that we can learn online Selection works by sampling θ, then computing f Learning works by propagating updates back to θ – If f is linear, this is very easy Don’t just have to have two labels, could have labels and context©MapR Technologies - Confidential 28
  29. 29. Context Variables x2 x1 Bogus Dog Food is the Best! Now available in handy 1 ton bags! Buy 5! x3 user.geo env.time env.day_of_week env.weekend©MapR Technologies - Confidential 29
  30. 30. Caveats Original Bayesian Bandit only requires real-time Generalized Bandit may require access to long history for learning – Pseudo online learning may be easier than true online Bandit variables can include content, time of day, day of week Context variables can include user id, user features Bandit × context variables provide the real power©MapR Technologies - Confidential 30
  31. 31. You can do this yourself!©MapR Technologies - Confidential 31
  32. 32. Thank You©MapR Technologies - Confidential 32
  33. 33.  Contact: – tdunning@maprtech.com – @ted_dunning Slides and such (available late tonight): – http://slideshare.net/tdunning Hash tags: #mapr #hivedata©MapR Technologies - Confidential 33

×