Devoxx Real-Time Learning

©MapR Technologies - Confidential
Real-time Learning

whoami – Ted Dunning
 Chief Application Architect, MapR Technologies
 Committer, member, Apache Software Foundation
– particularly Mahout, Zookeeper and Drill
(we’re hiring)
 Contact me at
tdunning@maprtech.com
tdunning@apache.com
ted.dunning@gmail.com
@ted_dunning

 Slides and such (available late tonight):
– http://www.mapr.com/company/events/devoxx-3-29-2013
 Hash tags: #mapr #devoxxfr

Agenda
 What is real-time learning?
 A sample problem
 Philosophy, statistics and the nature of the knowledge
 A solution
 System design

What is Real-time Learning?
 Training data arrives one record at a time
 The system improves a mathematical model based on a small
amount of training data
 We retain at most a fixed amount of state
 Each learning step takes O(1) time and memory
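A minimal sketch of what "fixed state, O(1) per step" can look like, using a running mean and variance (Welford's method) as a stand-in for the model. The class name `RunningStats` is made up for illustration and is not from the talk.

```python
class RunningStats:
    """Online mean/variance with fixed-size state (Welford's method)."""

    def __init__(self):
        self.n = 0        # records seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        # One record at a time, constant time and memory per record.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # Sample variance; defined as 0.0 until there are two records.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(x)
# The mean is 5.0 and the sample variance is 32/7 (up to rounding).
```

The state is three numbers no matter how many records arrive, which is the property the bullets above ask for.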

We have a product
to sell …
from a web-site

Bogus Dog Food is the Best!
Now available in handy 1 ton bags!
Buy 5!
What picture?
What tag-line?
What call to action?

The Challenge
 Design decisions affect probability of success
– Cheesy web-sites don’t even sell cheese
 The best designers do better when allowed to fail
– Exploration juices creativity
 But failing is expensive
– If only because we could have succeeded
– But also because offending or disappointing customers is bad

A Quick Diversion
 You see a coin
– What is the probability of heads?
– Could it be larger or smaller than that?
 I flip the coin and while it is in the air ask again
 I catch the coin and ask again
 I look at the coin (and you don’t) and ask again
 Why does the answer change?
– And did it ever have a single value?

A Philosophical Conclusion
 Probability as expressed by humans is subjective and depends on
information and experience

So now you understand
Bayesian probability

Another Quick Diversion
 Let’s play a shell game
 This is a special shell game
 It costs you nothing to play
 The pea has constant probability of being under each shell
(trust me)
 How do you find the best shell?
 How do you find it while maximizing the number of wins?

Pause for short
con-game

Conclusions
 Can you identify winners or losers without trying them out?
No
 Can you ever completely eliminate a shell with a bad streak?
No
 Should you keep trying apparent losers?
Yes, but at a decreasing rate

So now you understand
multi-armed bandits

Is there an optimum
strategy?

Thompson Sampling
 Select each shell according to the probability that it is the best
 Probability that it is the best can be computed using posterior
 But I promised a simple answer
$$P(i \text{ is best}) = \int I\left[ E[r_i \mid \theta] = \max_j E[r_j \mid \theta] \right] P(\theta \mid D) \, d\theta$$
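The integral above can be estimated by sampling. This is a rough sketch that assumes win/lose rewards and a Beta(1, 1) prior for each shell, neither of which the slide specifies. The function name `prob_best` and the counts are invented for illustration.

```python
import random


def prob_best(wins, losses, draws=20000, seed=0):
    """Monte Carlo estimate of P(i is best) under independent Beta posteriors."""
    rng = random.Random(seed)
    k = len(wins)
    best_counts = [0] * k
    for _ in range(draws):
        # One draw of theta from the posterior P(theta | D).
        theta = [rng.betavariate(1 + wins[i], 1 + losses[i]) for i in range(k)]
        # The indicator I[...] is 1 only for the shell with the largest expected reward.
        best_counts[theta.index(max(theta))] += 1
    return [c / draws for c in best_counts]


# Hypothetical data: shell 1 has won 5 of 10, the others 2 of 10 and 1 of 10.
probs = prob_best(wins=[2, 5, 1], losses=[8, 5, 9])
```

Each entry of `probs` is the fraction of posterior draws in which that shell came out on top.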

Thompson Sampling – Take 2
 Sample θ
 Pick i to maximize reward
 Record result from using i
$$\theta \sim P(\theta \mid D)$$
$$i = \arg\max_j E[r_j \mid \theta]$$

Nearly Forgotten until Recently
 Citations for Thompson sampling

Bayesian Bandit for the Shells
 Compute distributions based on data so far
 Sample p1, p2 and p3 from these distributions
 Pick shell i where i = argmax_j p_j
 Lemma 1: The probability of picking shell i will match the
probability it is the best shell
 Lemma 2: This is as good as it gets

And it works!
[Figure: regret versus n, for n from 0 to 1000 and regret from 0 to 0.12. Curves: ε-greedy with ε = 0.05, and Bayesian Bandit with Gamma-Normal]
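For reference, the ε-greedy baseline and the regret measure can be sketched as below. This sketch uses win/lose rewards rather than the Gamma-Normal model named in the plot, and the function name and win rates are invented, so it does not reproduce the figure.

```python
import random


def epsilon_greedy_regret(true_p, rounds=1000, epsilon=0.05, seed=2):
    """Run epsilon-greedy and return the average per-round regret."""
    rng = random.Random(seed)
    k = len(true_p)
    pulls = [0] * k
    wins = [0] * k
    best = max(true_p)
    regret = 0.0
    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: pick a shell uniformly at random.
            i = rng.randrange(k)
        else:
            # Exploit: pick the best observed win rate.
            # Untried shells get an optimistic 1.0 so each is tried at least once.
            rates = [wins[j] / pulls[j] if pulls[j] else 1.0 for j in range(k)]
            i = rates.index(max(rates))
        pulls[i] += 1
        wins[i] += rng.random() < true_p[i]
        # Regret: expected reward given up by not playing the best shell.
        regret += best - true_p[i]
    return regret / rounds


avg_regret = epsilon_greedy_regret([0.2, 0.5, 0.3])
```

Because ε stays fixed, this strategy keeps exploring at a constant rate, so its per-round regret cannot shrink to zero.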

Video Demo

The Basic Idea
 We can encode a distribution by sampling
 Sampling allows unification of exploration and exploitation
 Can be extended to more general response models

The Original Problem
[Figure: the Bogus Dog Food ad, with its design choices labeled x1, x2, x3]

Mathematical Statement
 Logistic or probit regression
$$P(\text{conversion}) = w\left( \sum_i x_i \theta_i \right)$$
Logistic: $$w(x) = \frac{1}{1 + e^{-x}}$$
Probit: $$w(x) = \frac{\operatorname{erf}(x) + 1}{2}$$
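The two link functions can be written directly from the formulas above. The function names and the example weights are made up for illustration.

```python
import math


def logistic(x):
    """Logistic link: 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


def erf_link(x):
    """Probit-style link as written on the slide: (erf(x) + 1) / 2."""
    return (math.erf(x) + 1.0) / 2.0


def p_conversion(x, theta, link=logistic):
    """P(conversion) = w(sum_i x_i * theta_i) for design x and weights theta."""
    return link(sum(xi * ti for xi, ti in zip(x, theta)))


# Hypothetical design and weights; the linear score here is 0.5 + (-0.5) = 0.
p = p_conversion([1, 0, 1], [0.5, 2.0, -0.5])
```

Both links map a score of zero to a probability of 0.5 and squash large scores toward 0 or 1.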

Same Algorithm
 Sample θ
 Pick design x to maximize reward
$$\theta \sim P(\theta \mid D)$$
$$x^* = \arg\max_x E[r_x \mid \theta] = \arg\max_x \sum_i x_i \theta_i$$
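A sketch of the two steps for the web-site problem. It assumes each design slot (picture, tag-line, call to action) has a few options with one weight per option, and that the posterior is an independent Gaussian per weight. That encoding, the name `pick_design`, and the numbers are assumptions for illustration.

```python
import itertools
import random


def pick_design(post_mean, post_sd, rng):
    """Sample theta, then search all designs for the best sampled score.

    post_mean[slot][option] and post_sd[slot][option] describe an assumed
    independent Gaussian posterior over one weight per option.
    """
    # Sample theta ~ P(theta | D).
    theta = [[rng.gauss(m, s) for m, s in zip(ms, ss)]
             for ms, ss in zip(post_mean, post_sd)]
    # A design is one option index per slot; score it by summing its weights.
    designs = itertools.product(*[range(len(slot)) for slot in theta])
    return max(designs, key=lambda d: sum(theta[slot][opt] for slot, opt in enumerate(d)))


# Hypothetical posterior: 2 pictures, 3 tag-lines, 2 calls to action.
means = [[0.1, 0.9], [0.5, 0.2, 0.3], [0.0, 1.0]]
wide = [[1.0] * len(slot) for slot in means]
design = pick_design(means, wide, random.Random(3))
```

With wide posteriors the chosen design varies from call to call, which is the exploration. As the posteriors tighten, the choice settles on the best design.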

Context Variables
[Figure: the Bogus Dog Food ad, with its design choices labeled x1, x2, x3]
Context variables: y1 = user.geo, y2 = env.time, y3 = env.day_of_week, y4 = env.weekend

Two Kinds of Variables
 The web-site design - x1, x2, x3
– We can change these
– Different values give different web-site designs
 The environment or context – y1, y2, y3, y4
– We can’t change these
– They can change themselves
 Our model should include interactions between x and y

Same Algorithm, More Greek Letters
 Sample θ, π, φ
 Pick design x to maximize reward, y’s are constant
 This looks very fancy, but is actually pretty simple
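$$(\theta, \pi, \phi) \sim P(\theta, \pi, \phi \mid D)$$
$$x^* = \arg\max_x E[r_x \mid \theta, \pi, \phi] = \arg\max_x \left( \sum_i x_i \theta_i + \sum_{i,j} x_i y_j \pi_{ij} + \sum_i y_i \phi_i \right)$$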
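The score with interactions can be sketched as a plain function of the already-sampled parameters. The function names, the candidate designs, and all numbers are made up for illustration.

```python
def score(x, y, theta, pi, phi):
    """Linear score: design terms + design-by-context interactions + context terms."""
    main = sum(xi * ti for xi, ti in zip(x, theta))
    interaction = sum(x[i] * y[j] * pi[i][j]
                      for i in range(len(x)) for j in range(len(y)))
    context = sum(yj * pj for yj, pj in zip(y, phi))
    return main + interaction + context


def pick_design_in_context(candidates, y, theta, pi, phi):
    """The context y is fixed; only the design x is searched over."""
    return max(candidates, key=lambda x: score(x, y, theta, pi, phi))


# Hypothetical sampled parameters for 2 design features and 2 context features.
theta = [0.5, 0.4]
pi = [[0.0, -1.0], [0.0, 1.0]]
phi = [0.3, 0.3]
candidates = [(1, 0), (0, 1)]
weekday_choice = pick_design_in_context(candidates, (1, 0), theta, pi, phi)
weekend_choice = pick_design_in_context(candidates, (0, 1), theta, pi, phi)
```

The context-only term is the same for every candidate, so it never changes which design wins. Only the interaction term lets the best design depend on the context.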

Surprises
 We cannot record a non-conversion until we have waited a while
 We cannot record a conversion until we have waited the same amount of time
 Learning from conversions requires delay
 We don’t have to wait very long
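One way to sketch the waiting rule: hold every impression for a fixed window, then release it as a conversion or a non-conversion. The class name, the 60-second window, and the assumption that impressions arrive in time order are all invented for this example.

```python
from collections import OrderedDict


class DelayedOutcomes:
    """Hold each impression for a fixed wait before releasing its outcome."""

    def __init__(self, wait_seconds):
        self.wait = wait_seconds
        # impression id -> [shown_at, design, converted], in arrival order
        self.pending = OrderedDict()

    def impression(self, imp_id, design, now):
        self.pending[imp_id] = [now, design, False]

    def conversion(self, imp_id):
        # Conversions that arrive after the window has closed are ignored.
        if imp_id in self.pending:
            self.pending[imp_id][2] = True

    def flush(self, now):
        """Return (design, converted) pairs whose wait has fully elapsed."""
        ready = []
        while self.pending:
            imp_id, (shown_at, design, converted) = next(iter(self.pending.items()))
            if now - shown_at < self.wait:
                break  # everything after this one is newer still
            del self.pending[imp_id]
            ready.append((design, converted))
        return ready


outcomes = DelayedOutcomes(wait_seconds=60)
outcomes.impression("a", "design-A", now=0)
outcomes.impression("b", "design-B", now=10)
outcomes.conversion("b")
```

Conversions and non-conversions wait the same amount of time, so neither kind of record reaches the learner early.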

Required Steps
 Learn distribution of parameters from data
– Logistic regression or probit regression (can be on-line!)
– Need Bayesian learning algorithm
 Sample from posterior distribution
– Generally included in Bayesian learning algorithm
 Pick design
– Simple sequential search
 Record data
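The steps above can be tied together in a rough sketch. The learner is an on-line logistic regression with a crude diagonal Gaussian approximation to the posterior, updated with one Newton-style step per record. That approximation is an assumption of this sketch, not necessarily the algorithm used in the talk, and the names `OnlineBayesLogit` and `choose` are made up.

```python
import math
import random


class OnlineBayesLogit:
    """On-line logistic regression with a diagonal Gaussian posterior approximation."""

    def __init__(self, dim, prior_precision=1.0):
        self.mean = [0.0] * dim
        self.precision = [prior_precision] * dim

    def sample(self, rng):
        # Sample from posterior: one Gaussian draw per weight.
        return [rng.gauss(m, 1.0 / math.sqrt(q))
                for m, q in zip(self.mean, self.precision)]

    def update(self, x, converted):
        # Learn from one record: sharpen the precision, then nudge the mean.
        z = sum(m * xi for m, xi in zip(self.mean, x))
        p = 1.0 / (1.0 + math.exp(-z))
        for i, xi in enumerate(x):
            self.precision[i] += xi * xi * p * (1.0 - p)
            self.mean[i] += xi * (converted - p) / self.precision[i]


def choose(model, designs, rng):
    # Pick design: simple sequential search over the candidates.
    w = model.sample(rng)
    return max(designs, key=lambda x: sum(wi * xi for wi, xi in zip(w, x)))


model = OnlineBayesLogit(dim=2)
designs = [[1, 0], [0, 1]]
picked = choose(model, designs, random.Random(4))
model.update(picked, converted=0)  # record data once the outcome is known
```

State stays at two numbers per weight, so each record costs constant time and memory for a fixed number of features.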

Required system
design

Hadoop is Not Very Real-time
[Figure: timeline t up to now. Labels: Fully processed, Latest full period, Unprocessed Data, "Hadoop job takes this long for this data"]

Real-time and Long-time together
[Figure: timeline t up to now. Labels: "Hadoop works great back here", "Storm works here", Blended view]
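A toy sketch of a blended view: batch totals up to the last cutoff, plus whatever the real-time side has seen since. The function name, the cutoff convention, and the data are assumptions for illustration; the talk does not spell out how the blend is computed.

```python
from collections import Counter


def blended_view(batch_counts, batch_cutoff, recent_events):
    """Combine batch totals with real-time events newer than the batch cutoff.

    recent_events is an iterable of (timestamp, key) pairs.
    """
    # Only count events the batch job has not already covered.
    realtime = Counter(key for ts, key in recent_events if ts > batch_cutoff)
    return Counter(batch_counts) + realtime


# Hypothetical conversions per design: batch covers everything up to t = 1000.
view = blended_view(
    batch_counts={"A": 100, "B": 40},
    batch_cutoff=1000,
    recent_events=[(990, "A"), (1001, "A"), (1005, "C")],
)
```

The cutoff check keeps an event from being counted twice when it shows up in both the batch output and the real-time stream.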

Traditional Hadoop Design
 Can use Kafka cluster to queue log lines
 Can use Storm cluster to do real time learning
 Can host web site on NAS
 Can use Flume cluster to import data from Kafka to Hadoop
 Can record long-term history on Hadoop Cluster
 How many clusters?

[Diagram: Users, Web Site, Web Service, NAS, Kafka API, Kafka Cluster (three of them), Storm, Design Targeting, Flume, Hadoop, HDFS, Data]

That is a lot of
moving parts!

Alternative Design
 Can host log catcher on MapR via NFS
 Storm can read data directly from queue
 Can host web server directly on cluster
 Only one cluster needed
– Total number of instances drops by 3x
– Admin burden massively decreased

[Diagram: Users, Web-server (http), Catcher, Topic Queue, Storm, Web Data, MapR]

You can do this
yourself!

Contact Me!
 We’re hiring at MapR in US and Europe
 MapR software available for research use
 Contact me at tdunning@maprtech.com or @ted_dunning
 Share news with @apachemahout
 Tweet #devoxxfr #mapr #mahout @ted_dunning
