Your SlideShare is downloading. ×
Real Time Learning
Real Time Learning
Real Time Learning
Real Time Learning
Real Time Learning
Real Time Learning
Real Time Learning
Real Time Learning
Real Time Learning
Real Time Learning
Real Time Learning
Real Time Learning
Real Time Learning
Real Time Learning
Real Time Learning
Real Time Learning
Real Time Learning
Real Time Learning
Real Time Learning
Real Time Learning
Real Time Learning
Real Time Learning
Real Time Learning
Real Time Learning
Real Time Learning
Real Time Learning
Real Time Learning
Real Time Learning
Real Time Learning
Real Time Learning
Real Time Learning
Real Time Learning
Real Time Learning
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Real Time Learning

1,388

Published on

A talk about real-time learning, especially using

A talk about real-time learning, especially using

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
1,388
On Slideshare
0
From Embeds
0
Number of Embeds
3
Actions
Shares
0
Downloads
16
Comments
0
Likes
0
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Real-time Learning©MapR Technologies - Confidential 1
  • 2.  Contact: – tdunning@maprtech.com – @ted_dunning Slides and such (available late tonight): – http://slideshare.net/tdunning Hash tags: #mapr #hivedata©MapR Technologies - Confidential 2
  • 3. We have a product to sell … from a web-site©MapR Technologies - Confidential 3
  • 4. What tag- What line? picture? Bogus Dog Food is the Best! Now available in handy 1 ton bags! Buy 5! What call to action?©MapR Technologies - Confidential 4
  • 5. The Challenge Design decisions affect probability of success – Cheesy web-sites don’t even sell cheese The best designers do better when allowed to fail – Exploration juices creativity But failing is expensive – If only because we could have succeeded – But also because offending or disappointing customers is bad©MapR Technologies - Confidential 5
  • 6. More Challenges Too many designs – 5 pictures – 10 tag-lines – 4 calls to action – 3 back-ground colors => 5 x 10 x 4 x 3 = 600 designs It gets worse quickly – What about changes on the back-end? – Search engine variants? – Checkout process variants?©MapR Technologies - Confidential 6
  • 7. Example – AB testing in real-time I have 15 versions of my landing page Each visitor is assigned to a version – Which version? A conversion or sale or whatever can happen – How long to wait? Some versions of the landing page are horrible – Don’t want to give them traffic©MapR Technologies - Confidential 7
  • 8. A Quick Diversion You see a coin – What is the probability of heads? – Could it be larger or smaller than that? I flip the coin and while it is in the air ask again I catch the coin and ask again I look at the coin (and you don’t) and ask again Why does the answer change? – And did it ever have a single value?©MapR Technologies - Confidential 8
  • 9. A Philosophical Conclusion Probability as expressed by humans is subjective and depends on information and experience©MapR Technologies - Confidential 9
  • 10. I Dunno©MapR Technologies - Confidential 10
  • 11. 5 heads out of 10 throws©MapR Technologies - Confidential 11
  • 12. 2 heads out of 12 throws©MapR Technologies - Confidential 12
  • 13. So now you understand Bayesian probability©MapR Technologies - Confidential 13
  • 14. Another Quick Diversion Let’s play a shell game This is a special shell game It costs you nothing to play The pea has constant probability of being under each shell (trust me) How do you find the best shell? How do you find it while maximizing the number of wins?©MapR Technologies - Confidential 14
  • 15. Pause for short con-game©MapR Technologies - Confidential 15
  • 16. Interim Thoughts Can you identify winners or losers without trying them out? Can you ever completely eliminate a shell with a bad streak? Should you keep trying apparent losers?©MapR Technologies - Confidential 16
  • 17. Pause for second con-game©MapR Technologies - Confidential 17
  • 18. So now you understand multi-armed bandits©MapR Technologies - Confidential 18
  • 19. Conclusions Can you identify winners or losers without trying them out? No Can you ever completely eliminate a shell with a bad streak? No Should you keep trying apparent losers? Yes, but at a decreasing rate©MapR Technologies - Confidential 19
  • 20. Is there an optimum strategy?©MapR Technologies - Confidential 20
  • 21. Bayesian Bandit Compute distributions based on data so far Sample p1, p2 and p2 from these distributions Pick shell i where i = argmaxi pi Lemma 1: The probability of picking shell i will match the probability it is the best shell Lemma 2: This is as good as it gets©MapR Technologies - Confidential 21
  • 22. And it works! 0.12 0.11 0.1 0.09 0.08 0.07 regret 0.06 ε- greedy, ε = 0.05 0.05 0.04 Bayesian Bandit with Gam m a- Norm al 0.03 0.02 0.01 0 0 100 200 300 400 500 600 700 800 900 1000 1100 n©MapR Technologies - Confidential 22
  • 23. Video Demo©MapR Technologies - Confidential 23
  • 24. The Code Select an alternative n = dim(k)[1] p0 = rep(0, length.out=n) for (i in 1:n) { p0[i] = rbeta(1, k[i,2]+1, k[i,1]+1) } return (which(p0 == max(p0))) Select and learn for (z in 1:steps) { i = select(k) j = test(i) k[i,j] = k[i,j]+1 } return (k) But we already know how to count!©MapR Technologies - Confidential 24
  • 25. The Basic Idea We can encode a distribution by sampling Sampling allows unification of exploration and exploitation Can be extended to more general response models©MapR Technologies - Confidential 25
  • 26. The Original Problem x2 x1 Bogus Dog Food is the Best! Now available in handy 1 ton bags! Buy 5! x3©MapR Technologies - Confidential 26
  • 27. Response Function æ ö p(win) = w çåqi xi ÷ è i ø 1 0.5 y 0 -6 -4 -2 0 2 4 6 x©MapR Technologies - Confidential 27
  • 28. Generalized Banditry Suppose we have an infinite number of bandits – suppose they are each labeled by two real numbers x and y in [0,1] – also that expected payoff is a parameterized function of x and y E [ z ] = f (x, y | q ) – now assume a distribution for θ that we can learn online Selection works by sampling θ, then computing f Learning works by propagating updates back to θ – If f is linear, this is very easy Don’t just have to have two labels, could have labels and context©MapR Technologies - Confidential 28
  • 29. Context Variables x2 x1 Bogus Dog Food is the Best! Now available in handy 1 ton bags! Buy 5! x3 user.geo env.time env.day_of_week env.weekend©MapR Technologies - Confidential 29
  • 30. Caveats Original Bayesian Bandit only requires real-time Generalized Bandit may require access to long history for learning – Pseudo online learning may be easier than true online Bandit variables can include content, time of day, day of week Context variables can include user id, user features Bandit × context variables provide the real power©MapR Technologies - Confidential 30
  • 31. You can do this yourself!©MapR Technologies - Confidential 31
  • 32. Thank You©MapR Technologies - Confidential 32
  • 33.  Contact: – tdunning@maprtech.com – @ted_dunning Slides and such (available late tonight): – http://slideshare.net/tdunning Hash tags: #mapr #hivedata©MapR Technologies - Confidential 33

×