Real Time Learning


A talk about real-time learning, especially using



  1. Real-time Learning (©MapR Technologies - Confidential)
  2. Contact:
     – tdunning@maprtech.com
     – @ted_dunning
     Slides and such (available late tonight):
     – http://slideshare.net/tdunning
     Hash tags: #mapr #hivedata
  3. We have a product to sell … from a web-site
  4. What tag-line? What picture? What call to action?
     • Mock ad: “Bogus Dog Food is the Best! Now available in handy 1 ton bags! Buy 5!”
  5. The Challenge
     • Design decisions affect probability of success
       – Cheesy web-sites don’t even sell cheese
     • The best designers do better when allowed to fail
       – Exploration juices creativity
     • But failing is expensive
       – If only because we could have succeeded
       – But also because offending or disappointing customers is bad
  6. More Challenges
     • Too many designs
       – 5 pictures
       – 10 tag-lines
       – 4 calls to action
       – 3 back-ground colors
       => 5 x 10 x 4 x 3 = 600 designs
     • It gets worse quickly
       – What about changes on the back-end?
       – Search engine variants?
       – Checkout process variants?
  7. Example – A/B testing in real-time
     • I have 15 versions of my landing page
     • Each visitor is assigned to a version
       – Which version?
     • A conversion or sale or whatever can happen
       – How long to wait?
     • Some versions of the landing page are horrible
       – Don’t want to give them traffic
  8. A Quick Diversion
     • You see a coin
       – What is the probability of heads?
       – Could it be larger or smaller than that?
     • I flip the coin and while it is in the air ask again
     • I catch the coin and ask again
     • I look at the coin (and you don’t) and ask again
     • Why does the answer change?
       – And did it ever have a single value?
  9. A Philosophical Conclusion
     • Probability as expressed by humans is subjective and depends on information and experience
  10. I Dunno
  11. 5 heads out of 10 throws
  12. 2 heads out of 12 throws
  13. So now you understand Bayesian probability
  14. Another Quick Diversion
      • Let’s play a shell game
      • This is a special shell game
      • It costs you nothing to play
      • The pea has constant probability of being under each shell (trust me)
      • How do you find the best shell?
      • How do you find it while maximizing the number of wins?
  15. Pause for short con-game
  16. Interim Thoughts
      • Can you identify winners or losers without trying them out?
      • Can you ever completely eliminate a shell with a bad streak?
      • Should you keep trying apparent losers?
  17. Pause for second con-game
  18. So now you understand multi-armed bandits
  19. Conclusions
      • Can you identify winners or losers without trying them out?
        – No
      • Can you ever completely eliminate a shell with a bad streak?
        – No
      • Should you keep trying apparent losers?
        – Yes, but at a decreasing rate
  20. Is there an optimum strategy?
  21. Bayesian Bandit
      • Compute distributions based on data so far
      • Sample p1, p2 and p3 from these distributions
      • Pick shell i where i = argmax_i p_i
      • Lemma 1: The probability of picking shell i will match the probability it is the best shell
      • Lemma 2: This is as good as it gets
  22. And it works!
      [Figure: regret (y-axis, 0 to 0.12) versus n (x-axis, 0 to 1100), comparing ε-greedy with ε = 0.05 against the Bayesian Bandit with Gamma-Normal.]
  23. Video Demo
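  24. The Code
      • Select an alternative

            # k[i,1] and k[i,2] are the counts of the two outcomes seen for alternative i
            n = dim(k)[1]
            p0 = rep(0, length.out=n)
            for (i in 1:n) {
              # sample a plausible rate for alternative i from its Beta posterior
              p0[i] = rbeta(1, k[i,2]+1, k[i,1]+1)
            }
            # pick the alternative with the largest sampled rate
            return (which(p0 == max(p0)))

      • Select and learn

            for (z in 1:steps) {
              i = select(k)      # choose an alternative by sampling
              j = test(i)        # observe the outcome, used as a column index of k
              k[i,j] = k[i,j]+1  # count it
            }
            return (k)

      • But we already know how to count!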
  25. The Basic Idea
      • We can encode a distribution by sampling
      • Sampling allows unification of exploration and exploitation
      • Can be extended to more general response models
  26. The Original Problem
      • The mock ad (“Bogus Dog Food is the Best! Now available in handy 1 ton bags! Buy 5!”) with its design elements labeled x1, x2 and x3
  27. Response Function
      $p(\mathrm{win}) = w\left(\sum_i \theta_i x_i\right)$
      [Figure: plot of y against x, with x from -6 to 6 and y from 0 to 1.]
  28. Generalized Banditry
      • Suppose we have an infinite number of bandits
        – suppose they are each labeled by two real numbers x and y in [0,1]
        – also that expected payoff is a parameterized function of x and y: $E[z] = f(x, y \mid \theta)$
        – now assume a distribution for θ that we can learn online
      • Selection works by sampling θ, then computing f
      • Learning works by propagating updates back to θ
        – If f is linear, this is very easy
      • Don’t just have to have two labels, could have labels and context
  29. Context Variables
      • The mock ad (“Bogus Dog Food is the Best! Now available in handy 1 ton bags! Buy 5!”) with its design elements labeled x1, x2 and x3
      • Context: user.geo, env.time, env.day_of_week, env.weekend
  30. Caveats
      • Original Bayesian Bandit only requires real-time
      • Generalized Bandit may require access to long history for learning
        – Pseudo online learning may be easier than true online
      • Bandit variables can include content, time of day, day of week
      • Context variables can include user id, user features
      • Bandit × context variables provide the real power
  31. You can do this yourself!
  32. Thank You
  33. Contact:
      – tdunning@maprtech.com
      – @ted_dunning
      Slides and such (available late tonight):
      – http://slideshare.net/tdunning
      Hash tags: #mapr #hivedata
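
Slides 10 to 12 ("I Dunno", "5 heads out of 10 throws", "2 heads out of 12 throws") can be read as three states of belief about the same coin. The sketch below is one way to put numbers on them. It assumes a uniform Beta(1, 1) prior, which the slides do not state, and the helper name `posterior_summary` is made up for this example.

```r
# Posterior for the probability of heads after observing coin flips,
# assuming a uniform Beta(1, 1) prior (an assumption of this sketch).
posterior_summary <- function(heads, throws) {
  a <- heads + 1            # Beta shape1: heads plus one prior pseudo-count
  b <- throws - heads + 1   # Beta shape2: tails plus one prior pseudo-count
  c(mean  = a / (a + b),
    lower = qbeta(0.025, a, b),   # lower end of a 95% credible interval
    upper = qbeta(0.975, a, b))   # upper end of a 95% credible interval
}

no_data    <- posterior_summary(0, 0)    # "I dunno": flat over [0, 1]
five_of_10 <- posterior_summary(5, 10)
two_of_12  <- posterior_summary(2, 12)
print(round(rbind(no_data, five_of_10, two_of_12), 3))
```

With no data the interval covers almost all of [0, 1]. It narrows around 0.5 after 5 heads in 10 throws, and after 2 heads in 12 throws it lies entirely below 0.5.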

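The selection and learning steps from slides 21 and 24 can be wrapped into a small simulation, sketched below. The names `select` and `test` and the count matrix `k` come from the slides. The conversion rates in `true_rates`, the simulated visitor inside `test`, and the wrapper `run_bandit` are hypothetical, added only so the example runs end to end.

```r
# Sample one plausible conversion rate per alternative from its Beta
# posterior, then pick the alternative with the largest sample.
# k is an n x 2 count matrix: column 1 = failures, column 2 = successes.
select <- function(k) {
  p0 <- rbeta(nrow(k), k[, 2] + 1, k[, 1] + 1)
  which.max(p0)
}

# Hypothetical stand-in for a real visitor: returns 2 on a conversion
# and 1 otherwise, using made-up true rates.
true_rates <- c(0.02, 0.05, 0.10)
test <- function(i) {
  if (runif(1) < true_rates[i]) 2 else 1
}

# Select and learn: the only state kept between visitors is the counts.
run_bandit <- function(n_alternatives, steps) {
  k <- matrix(0, nrow = n_alternatives, ncol = 2)
  for (z in seq_len(steps)) {
    i <- select(k)
    j <- test(i)
    k[i, j] <- k[i, j] + 1
  }
  k
}

set.seed(42)
k <- run_bandit(length(true_rates), steps = 5000)
trials <- rowSums(k)
print(cbind(failures = k[, 1], successes = k[, 2], trials = trials))
```

The `trials` column shows how traffic was split. As more data arrives, the weaker alternatives should get a shrinking share of visitors without ever being ruled out, which matches the conclusions on slide 19.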
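
Slide 28 says that selection works by sampling θ and computing f, and that learning is very easy when f is linear. The sketch below is one possible reading of that case, not the talk's own implementation. It assumes f(x, y | θ) = θ · (1, x, y), a Gaussian prior θ ~ N(0, I), and payoff noise with variance 1. The grid of candidate bandits and the values in `theta_true` are invented for the simulation.

```r
# Candidate bandits on a grid of labels (x, y) in [0, 1]^2, with
# features (1, x, y) so that f(x, y | theta) = theta . (1, x, y).
grid <- expand.grid(x = seq(0, 1, by = 0.25), y = seq(0, 1, by = 0.25))
features <- cbind(1, grid$x, grid$y)
d <- ncol(features)

# Hypothetical "true" parameters, used only to simulate payoffs.
theta_true <- c(0.2, 0.5, -0.3)

# Gaussian posterior over theta, stored as precision matrix A and vector b
# (assumed prior: theta ~ N(0, I); assumed payoff noise variance: 1).
A <- diag(d)
b <- rep(0, d)

set.seed(1)
steps <- 2000
chosen <- integer(steps)
for (step in seq_len(steps)) {
  mu <- solve(A, b)                                  # posterior mean
  R <- chol(A)                                       # A = t(R) %*% R
  theta <- mu + as.vector(backsolve(R, rnorm(d)))    # draw from N(mu, A^-1)
  i <- which.max(features %*% theta)                 # selection: sample, then compute f
  x <- features[i, ]
  z <- sum(x * theta_true) + rnorm(1)                # simulated noisy payoff
  A <- A + tcrossprod(x)                             # learning: update the posterior
  b <- b + z * x
  chosen[step] <- i
}

theta_hat <- solve(A, b)
most_chosen <- as.integer(names(which.max(table(chosen))))
print(round(theta_hat, 2))
print(grid[most_chosen, ])
```

Because the update only adds `tcrossprod(x)` and `z * x` to running totals, it fits the real-time setting of the original bandit. A non-linear response function would need an approximate update instead.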