1©MapR Technologies - Confidential
Real-time Learning
2©MapR Technologies - Confidential
whoami – Ted Dunning
• Chief Application Architect, MapR Technologies
• Committer, member, Apache Software Foundation
– particularly Mahout, ZooKeeper and Drill
(we’re hiring)
• Contact me at
tdunning@maprtech.com
tdunning@apache.org
ted.dunning@gmail.com
@ted_dunning
3©MapR Technologies - Confidential
• Slides and such (available late tonight):
– http://www.mapr.com/company/events/devoxx-3-29-2013
• Hash tags: #mapr #devoxxfr
4©MapR Technologies - Confidential
Agenda
• What is real-time learning?
• A sample problem
• Philosophy, statistics and the nature of knowledge
• A solution
• System design
5©MapR Technologies - Confidential
What is Real-time Learning?
• Training data arrives one record at a time
• The system updates a mathematical model using only a small
amount of training data at each step
• We retain at most a fixed amount of state
• Each learning step takes O(1) time and memory
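As an illustration of the O(1) constraint (not code from the talk), here is a minimal Python sketch of Welford's running mean and variance: each record updates three numbers, so time and memory per step stay constant however much data has arrived. The class name `RunningStats` is hypothetical.

```python
import math


class RunningStats:
    """Online mean and variance (Welford's method): O(1) time and memory per record."""

    def __init__(self) -> None:
        self.n = 0        # number of records seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # One learning step: fixed state, constant work.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self) -> float:
        # Sample variance; undefined (NaN) for fewer than two records.
        return self.m2 / (self.n - 1) if self.n > 1 else math.nan


stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(value)
print(stats.mean, stats.variance())
```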
6©MapR Technologies - Confidential
We have a product
to sell …
from a web-site
7©MapR Technologies - Confidential
Bogus Dog Food is the Best!
Now available in handy 1 ton
bags!
Buy 5!
What
picture?
What tag-
line?
What call to
action?
8©MapR Technologies - Confidential
The Challenge
• Design decisions affect probability of success
– Cheesy web-sites don’t even sell cheese
• The best designers do better when allowed to fail
– Exploration juices creativity
• But failing is expensive
– If only because we could have succeeded
– But also because offending or disappointing customers is bad
9©MapR Technologies - Confidential
A Quick Diversion
• You see a coin
– What is the probability of heads?
– Could it be larger or smaller than that?
• I flip the coin and, while it is in the air, ask again
• I catch the coin and ask again
• I look at the coin (and you don’t) and ask again
• Why does the answer change?
– And did it ever have a single value?
10©MapR Technologies - Confidential
A Philosophical Conclusion
• Probability as expressed by humans is subjective and depends on
information and experience
11©MapR Technologies - Confidential
So now you understand
Bayesian probability
12©MapR Technologies - Confidential
Another Quick Diversion
• Let’s play a shell game
• This is a special shell game
• It costs you nothing to play
• The pea has a constant probability of being under each shell
(trust me)
• How do you find the best shell?
• How do you find it while maximizing the number of wins?
13©MapR Technologies - Confidential
Pause for short
con-game
14©MapR Technologies - Confidential
Conclusions
• Can you identify winners or losers without trying them out?
No
• Can you ever completely eliminate a shell with a bad streak?
No
• Should you keep trying apparent losers?
Yes, but at a decreasing rate
15©MapR Technologies - Confidential
So now you understand
multi-armed bandits
16©MapR Technologies - Confidential
Is there an optimum
strategy?
17©MapR Technologies - Confidential
Thompson Sampling
• Select each shell according to the probability that it is the best
• The probability that it is the best can be computed using the posterior
• But I promised a simple answer
$$P(i \text{ is best}) = \int \mathbb{I}\Big[ E[r_i \mid \theta] = \max_j E[r_j \mid \theta] \Big] \, P(\theta \mid D) \, d\theta$$
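The integral can be estimated by Monte Carlo: draw θ from the posterior many times and count how often each shell comes out on top. The sketch below assumes Bernoulli shells with uniform Beta(1, 1) priors, so shell i's posterior is Beta(1 + wins, 1 + losses) and E[r_i | θ] = θ_i; the function `prob_best` and the counts are hypothetical.

```python
import random


def prob_best(wins, losses, draws=20_000, seed=0):
    """Monte Carlo estimate of P(i is best) for each Bernoulli shell.

    Assumes a Beta(1, 1) prior, so shell i's posterior is
    Beta(1 + wins[i], 1 + losses[i]) and E[r_i | theta] = theta_i.
    """
    rng = random.Random(seed)
    counts = [0] * len(wins)
    for _ in range(draws):
        # One draw theta ~ P(theta | D)
        theta = [rng.betavariate(1 + w, 1 + lose) for w, lose in zip(wins, losses)]
        # The indicator is 1 only for the shell with the largest E[r_i | theta]
        best = max(range(len(theta)), key=theta.__getitem__)
        counts[best] += 1
    return [c / draws for c in counts]


print(prob_best(wins=[3, 10, 1], losses=[7, 10, 1]))
```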
18©MapR Technologies - Confidential
Thompson Sampling – Take 2
• Sample θ
• Pick i to maximize reward
• Record result from using i
$$\theta \sim P(\theta \mid D)$$
$$i = \arg\max_j E[r_j \mid \theta]$$
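The three steps can be written as one generic decision function. This is a sketch under the assumption that the caller supplies a posterior sampler and an expected-reward function (both hypothetical hooks); recording the result is left to the caller, who updates the data D.

```python
import random
from typing import Callable, TypeVar

Theta = TypeVar("Theta")


def thompson_choose(
    sample_posterior: Callable[[], Theta],
    expected_reward: Callable[[int, Theta], float],
    n_arms: int,
) -> int:
    # Step 1: sample theta ~ P(theta | D)
    theta = sample_posterior()
    # Step 2: pick i = argmax_j E[r_j | theta]
    return max(range(n_arms), key=lambda j: expected_reward(j, theta))


# Usage with Beta posteriors for two Bernoulli arms (hypothetical counts).
rng = random.Random(1)
wins, losses = [2, 5], [8, 5]
arm = thompson_choose(
    lambda: [rng.betavariate(1 + w, 1 + lose) for w, lose in zip(wins, losses)],
    lambda j, theta: theta[j],
    n_arms=2,
)
# Step 3: record the result from using `arm` by updating wins or losses.
```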
19©MapR Technologies - Confidential
Nearly Forgotten until Recently
• Citations for Thompson sampling
20©MapR Technologies - Confidential
Bayesian Bandit for the Shells
• Compute distributions based on data so far
• Sample p1, p2 and p3 from these distributions
• Pick shell i where $i = \arg\max_j p_j$
• Lemma 1: The probability of picking shell i will match the
probability it is the best shell
• Lemma 2: This is as good as it gets
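For the shell game itself, a minimal Python sketch of the Bayesian bandit might look like this, assuming each shell pays off with a fixed unknown probability and each posterior is Beta(1 + wins, 1 + losses). The class name `BetaBandit` and the pea probabilities are made up for illustration.

```python
import random


class BetaBandit:
    """Thompson sampling for Bernoulli shells with Beta(1, 1) priors."""

    def __init__(self, n_shells: int, seed: int = 0) -> None:
        self.wins = [0] * n_shells
        self.losses = [0] * n_shells
        self.rng = random.Random(seed)

    def choose(self) -> int:
        # Sample p_i from each shell's posterior, then pick argmax_i p_i
        samples = [self.rng.betavariate(1 + w, 1 + lose)
                   for w, lose in zip(self.wins, self.losses)]
        return max(range(len(samples)), key=samples.__getitem__)

    def record(self, shell: int, won: bool) -> None:
        # Add the outcome to the data D for the chosen shell only
        if won:
            self.wins[shell] += 1
        else:
            self.losses[shell] += 1


true_p = [0.10, 0.15, 0.30]   # hypothetical pea probabilities, unknown to the player
bandit = BetaBandit(len(true_p), seed=42)
world = random.Random(7)
for _ in range(2000):
    shell = bandit.choose()
    bandit.record(shell, world.random() < true_p[shell])

plays = [w + lose for w, lose in zip(bandit.wins, bandit.losses)]
print(plays)
```

Because each choice comes from a posterior draw, apparent losers keep being tried, just less and less often.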
21©MapR Technologies - Confidential
And it works!
[Figure: regret vs. n (0 to 1000) for ε-greedy with ε = 0.05 and for the Bayesian Bandit with Gamma-Normal]
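The figure's comparison can be reproduced in spirit with a small simulation. Note the swap: the slide's bandit uses a Gamma-Normal model, while this sketch assumes Bernoulli rewards with Beta posteriors. The conversion rates, the function name `simulate_regret`, and the choice to treat untried arms as 1.0 under ε-greedy are hypothetical, and the printed numbers are not the plotted results.

```python
import random


def simulate_regret(policy, true_p, n_steps, seed, epsilon=0.05):
    """Cumulative expected regret of one run of `policy` on Bernoulli arms."""
    if policy not in ("thompson", "epsilon-greedy"):
        raise ValueError(policy)
    rng = random.Random(seed)
    k = len(true_p)
    wins, plays = [0] * k, [0] * k
    best = max(true_p)
    regret = 0.0
    for _ in range(n_steps):
        if policy == "thompson":
            # One Beta(1 + wins, 1 + losses) posterior sample per arm
            draws = [rng.betavariate(1 + wins[i], 1 + plays[i] - wins[i]) for i in range(k)]
            arm = max(range(k), key=draws.__getitem__)
        elif rng.random() < epsilon:
            arm = rng.randrange(k)          # explore uniformly at random
        else:
            # Exploit the best observed rate; untried arms count as 1.0 so each gets tried
            arm = max(range(k), key=lambda i: wins[i] / plays[i] if plays[i] else 1.0)
        plays[arm] += 1
        wins[arm] += rng.random() < true_p[arm]
        regret += best - true_p[arm]        # expected loss vs. always playing the best arm
    return regret


true_p = [0.05, 0.08, 0.12]   # hypothetical conversion rates
for policy in ("epsilon-greedy", "thompson"):
    runs = [simulate_regret(policy, true_p, 1000, seed) for seed in range(20)]
    print(policy, sum(runs) / len(runs))
```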
22©MapR Technologies - Confidential
Video Demo
23©MapR Technologies - Confidential
The Basic Idea
• We can encode a distribution by sampling
• Sampling allows unification of exploration and exploitation
• Can be extended to more general response models
24©MapR Technologies - Confidential
The Original Problem
Bogus Dog Food is the Best!
Now available in handy 1 ton
bags!
Buy 5!
x1
x2
x3
25©MapR Technologies - Confidential
Mathematical Statement
• Logistic or probit regression
$$P(\text{conversion}) = w\Big(\sum_i x_i \theta_i\Big)$$
$$w(x) = \frac{1}{1 + e^{-x}}$$
$$w(x) = \frac{\operatorname{erf}(x) + 1}{2}$$
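Both link functions translate directly to Python's `math` module; the function names are hypothetical. One note: the standard normal CDF is (1 + erf(x/√2))/2, so the slide's probit form is the same curve with its input rescaled by √2, a factor the learned θ can absorb.

```python
import math


def logistic(x: float) -> float:
    """w(x) = 1 / (1 + e^{-x}), computed without overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def probit_like(x: float) -> float:
    """w(x) = (erf(x) + 1) / 2, as written on the slide."""
    return (math.erf(x) + 1.0) / 2.0


def p_conversion(x, theta, link=logistic) -> float:
    """P(conversion) = w(sum_i x_i * theta_i)."""
    return link(sum(xi * ti for xi, ti in zip(x, theta)))


print(p_conversion([1, 0, 1], [0.3, 2.0, 0.4]))
print(p_conversion([1, 0, 1], [0.3, 2.0, 0.4], link=probit_like))
```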
26©MapR Technologies - Confidential
Same Algorithm
• Sample θ
• Pick design x to maximize reward
$$\theta \sim P(\theta \mid D)$$
$$x^* = \arg\max_x E[r_x \mid \theta] = \arg\max_x \sum_i x_i \theta_i$$
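A sketch of the design-picking step, assuming (hypothetically) binary design variables x_i ∈ {0, 1} and independent Gaussian approximations to the posterior of each θ_i. Since w is increasing, the design that maximizes Σ x_i θ_i also maximizes P(conversion), so the link function never needs to be evaluated here. The function name `choose_design` is hypothetical.

```python
import itertools
import random


def choose_design(post_mean, post_sd, rng):
    """One Thompson step for the design problem.

    Assumes independent Gaussian posteriors N(post_mean[i], post_sd[i]^2)
    for theta_i and binary design variables x_i in {0, 1}.
    """
    # theta ~ P(theta | D)
    theta = [rng.gauss(m, s) for m, s in zip(post_mean, post_sd)]
    # Exhaustive search over all 2^n designs
    candidates = itertools.product((0, 1), repeat=len(theta))
    return max(candidates, key=lambda x: sum(xi * ti for xi, ti in zip(x, theta)))


rng = random.Random(3)
print(choose_design([0.4, -0.2, 0.1], [0.05, 0.05, 0.3], rng))
```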
27©MapR Technologies - Confidential
Context Variables
Bogus Dog Food is the Best!
Now available in handy 1 ton
bags!
Buy 5!
x1
x2
x3
y1=user.geo y2=env.time y3=env.day_of_week y4=env.weekend
28©MapR Technologies - Confidential
Two Kinds of Variables
• The web-site design: x1, x2, x3
– We can change these
– Different values give different web-site designs
• The environment or context: y1, y2, y3, y4
– We can’t change these
– They change on their own
• Our model should include interactions between x and y
29©MapR Technologies - Confidential
Same Algorithm, More Greek Letters
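• Sample θ, π, φ
• Pick design x to maximize reward; the y’s are held constant
• This looks very fancy, but is actually pretty simple
$$(\theta, \pi, \phi) \sim P(\theta, \pi, \phi \mid D)$$
$$x^* = \arg\max_x E[r_x \mid \theta, \pi, \phi] = \arg\max_x \Big( \sum_i x_i \theta_i + \sum_{i,j} x_i y_j \pi_{ij} + \sum_i y_i \phi_i \Big)$$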
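With context, the same search just adds the interaction term. The Σ_i y_i φ_i term is identical for every design, so it cannot change the argmax; the sketch below therefore skips sampling φ. It assumes binary x_i, numeric context values y_j (for example 1.0 when y4 = env.weekend is true), and independent Gaussian posteriors given as hypothetical (mean, sd) pairs; the function name is also hypothetical.

```python
import itertools
import random


def choose_design_in_context(y, theta_post, pi_post, rng):
    """Thompson step with design/context interactions.

    theta_post[i] and pi_post[i][j] are (mean, sd) pairs for independent
    Gaussian posteriors. phi is not sampled: sum_i y_i phi_i is the same
    for every design x, so it drops out of the argmax.
    """
    theta = [rng.gauss(m, s) for m, s in theta_post]
    pi = [[rng.gauss(m, s) for m, s in row] for row in pi_post]

    def score(x):
        main = sum(x[i] * theta[i] for i in range(len(x)))
        inter = sum(x[i] * y[j] * pi[i][j] for i in range(len(x)) for j in range(len(y)))
        return main + inter

    return max(itertools.product((0, 1), repeat=len(theta)), key=score)


rng = random.Random(0)
weekend = [1.0]                              # y: one context variable
theta_post = [(-0.1, 0.05), (0.2, 0.05)]     # two design variables
pi_post = [[(0.5, 0.05)], [(-0.5, 0.05)]]    # interaction of each x_i with y
print(choose_design_in_context(weekend, theta_post, pi_post, rng))
```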
30©MapR Technologies - Confidential
Surprises
• We cannot record a non-conversion until we have waited long enough
• We cannot record a conversion until the same wait has passed, or recent training data would be biased toward conversions
• Learning from conversions therefore requires a delay
• We don’t have to wait very long
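One way to honor both waits is to hold every impression for a fixed window and only then emit its label, converted or not. This is a sketch; the class name, the window length, and the assumption that impressions arrive in time order are all hypothetical.

```python
from collections import deque


class DelayedLabeler:
    """Emit (impression_id, converted) only after a fixed observation window.

    Conversions and non-conversions are released at the same delay, so
    every training example has had the same chance to convert.
    Assumes impressions are reported in non-decreasing time order.
    """

    def __init__(self, wait: float) -> None:
        self.wait = wait
        self.pending = deque()   # (shown_at, impression_id) in arrival order
        self.shown_at = {}       # impression_id -> time shown, while still open
        self.converted = set()   # open impressions that have converted

    def impression(self, impression_id: str, t: float) -> None:
        self.shown_at[impression_id] = t
        self.pending.append((t, impression_id))

    def conversion(self, impression_id: str, t: float) -> None:
        shown = self.shown_at.get(impression_id)
        # Count only conversions that land inside the observation window
        if shown is not None and t - shown <= self.wait:
            self.converted.add(impression_id)

    def ready(self, now: float):
        """Return labels for every impression whose window has closed."""
        out = []
        while self.pending and self.pending[0][0] + self.wait <= now:
            _, iid = self.pending.popleft()
            del self.shown_at[iid]
            out.append((iid, iid in self.converted))
            self.converted.discard(iid)
        return out


lab = DelayedLabeler(wait=10.0)
lab.impression("a", 0.0)
lab.impression("b", 1.0)
lab.conversion("a", 5.0)
early = lab.ready(9.0)      # both windows still open
first = lab.ready(10.0)     # "a" closes with a conversion
lab.conversion("b", 12.0)   # too late: "b"'s window closed at 11.0
second = lab.ready(12.0)
```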
31©MapR Technologies - Confidential
32©MapR Technologies - Confidential
33©MapR Technologies - Confidential
34©MapR Technologies - Confidential
35©MapR Technologies - Confidential
Required Steps
• Learn distribution of parameters from data
– Logistic regression or probit regression (can be on-line!)
– Needs a Bayesian learning algorithm
• Sample from posterior distribution
– Generally included in the Bayesian learning algorithm
• Pick design
– Simple sequential search
• Record data
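The four steps fit in one loop. The learner below is a stand-in chosen for brevity, not the author's algorithm: it assumes a diagonal Gaussian posterior over logistic-regression weights and updates it with a heuristic, Laplace-style step (each weight gains precision p(1 − p)x_i² and its mean moves by a precision-scaled gradient step). The design space, the true weights, and all names are hypothetical.

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function w(z) = 1 / (1 + e^{-z})
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class DiagonalBayesLogistic:
    """Online logistic regression with an independent Gaussian per weight.

    Heuristic approximation (an assumption, not an exact posterior):
    each weight keeps a mean and a precision.
    """

    def __init__(self, dim: int, prior_var: float = 1.0) -> None:
        self.mean = [0.0] * dim
        self.prec = [1.0 / prior_var] * dim

    def sample(self, rng: random.Random) -> list:
        # Sample from posterior: theta_i ~ N(mean_i, 1 / prec_i)
        return [rng.gauss(m, 1.0 / math.sqrt(q)) for m, q in zip(self.mean, self.prec)]

    def update(self, x, converted: bool) -> None:
        # Learn from one recorded observation
        p = sigmoid(sum(m * xi for m, xi in zip(self.mean, x)))
        for i, xi in enumerate(x):
            self.prec[i] += p * (1.0 - p) * xi * xi                     # log-likelihood curvature
            self.mean[i] += (float(converted) - p) * xi / self.prec[i]  # precision-scaled step


# Hypothetical world: a bias term plus two binary design choices.
true_theta = [-2.0, 0.8, -0.5]
designs = [(1, a, b) for a in (0, 1) for b in (0, 1)]
model, rng = DiagonalBayesLogistic(3), random.Random(0)
for _ in range(5000):
    theta = model.sample(rng)
    # Pick design: search the candidates for the highest sampled score
    x = max(designs, key=lambda d: sum(di * ti for di, ti in zip(d, theta)))
    # Record data: simulate the visitor and feed the outcome back
    model.update(x, rng.random() < sigmoid(sum(di * ti for di, ti in zip(x, true_theta))))
print([round(m, 2) for m in model.mean])
```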
36©MapR Technologies - Confidential
Required system
design
37©MapR Technologies - Confidential
Hadoop is Not Very Real-time
[Figure: timeline t ending at now; data up to the latest full period is fully processed, while newer data stays unprocessed because the Hadoop job takes this long for this data]
38©MapR Technologies - Confidential
Real-time and Long-time together
[Figure: timeline t ending at now; Hadoop works great back here (older data), Storm works here (recent data), and the two combine into a blended view]
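The blended view amounts to adding real-time results for events newer than the batch cutoff to the batch results for everything older. A sketch with counts, assuming (hypothetically) that the batch job produces a `Counter` up to `batch_cutoff` and the real-time path yields `(timestamp, key)` events; the function name and keys are made up.

```python
from collections import Counter


def blended_counts(batch_counts: Counter, batch_cutoff: float, recent_events) -> Counter:
    """Combine a batch view with real-time events.

    batch_counts covers everything up to batch_cutoff (the latest full
    period); only real-time events after the cutoff are added, so no
    event is counted twice.
    """
    blended = Counter(batch_counts)   # copy; leave the batch view untouched
    for t, key in recent_events:
        if t > batch_cutoff:
            blended[key] += 1
    return blended


batch = Counter({"design-A": 10, "design-B": 3})
recent = [(5.0, "design-A"), (12.0, "design-A"), (13.0, "design-C")]
view = blended_counts(batch, 10.0, recent)
print(view)
```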
39©MapR Technologies - Confidential
Traditional Hadoop Design
• Can use Kafka cluster to queue log lines
• Can use Storm cluster to do real-time learning
• Can host web site on NAS
• Can use Flume cluster to import data from Kafka to Hadoop
• Can record long-term history on Hadoop Cluster
• How many clusters?
40©MapR Technologies - Confidential
[Figure: traditional design with Users, Web Site, Kafka API, three Kafka clusters, Storm, a Web Service on NAS (Design, Targeting), Flume, and Hadoop HDFS (Data)]
41©MapR Technologies - Confidential
That is a lot of
moving parts!
42©MapR Technologies - Confidential
Alternative Design
• Can host log catcher on MapR via NFS
• Storm can read data directly from queue
• Can host web server directly on cluster
• Only one cluster needed
– Total instance count drops by 3x
– Admin burden drops massively
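If the log catcher writes to the cluster through NFS, as the slide suggests, the queue can be an ordinary append-only file that a consumer reads from a saved byte offset. The sketch below shows that pattern in Python; it is not a MapR or Storm API, the file and function names are hypothetical, and a temporary file stands in for a path on the NFS-mounted cluster.

```python
import json
import os
import tempfile


def append_event(path: str, event: dict) -> None:
    """Catcher side: append one JSON log line to the topic file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")


def read_new_events(path: str, offset: int):
    """Consumer side: return (events, new_offset) for complete lines after `offset`."""
    events = []
    with open(path, "rb") as f:
        f.seek(offset)
        for line in f:
            if not line.endswith(b"\n"):   # a line still being written; retry later
                break
            events.append(json.loads(line))
            offset += len(line)
    return events, offset


# Demo: a temporary file stands in for a file on the NFS-mounted cluster.
topic = os.path.join(tempfile.mkdtemp(), "clicks.log")
append_event(topic, {"user": "u1", "design": [1, 0, 1]})
batch, pos = read_new_events(topic, 0)
append_event(topic, {"user": "u2", "design": [0, 1, 1]})
batch2, pos2 = read_new_events(topic, pos)
```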
43©MapR Technologies - Confidential
[Figure: a single MapR cluster hosting the Web-server (http, Web Data), the Catcher, the Topic Queue and Storm, serving Users]
44©MapR Technologies - Confidential
You can do this
yourself!
45©MapR Technologies - Confidential
Contact Me!
• We’re hiring at MapR in the US and Europe
• MapR software available for research use
• Contact me at tdunning@maprtech.com or @ted_dunning
• Share news with @apachemahout
• Tweet #devoxxfr #mapr #mahout @ted_dunning

Devoxx Real-Time Learning
