TokyoR #35
Multi-Armed Bandit Problem
Masayuki Isobe
Adfive, Inc.
Who am I?
• Masayuki Isobe
– MS in computer science.
• My graduation research built a statistical model of
game players in logic programming, fitted by EM learning.

– Now I head my own company, Adfive Inc.
• dedicated to ad-tech consulting and related system
development.

– Interested in “science and business”.

• Twitter: @chiral, Facebook: masayuki.isobe.14
– Anyone in this room is welcome to reach me on
these SNS.
What is the multi-armed bandit?
• A problem in search theory.
– Given: a finite set of investment targets (like the
arms of slot machines in a casino), each with its own
reward probability, finite resources to bet, and a
limited number of trials.
– The problem: “what is the best bet in each trial?”

• Typical dilemma:
– explore vs. harvest (the classic exploration/
exploitation trade-off): it is hard to judge how much
to trust the partial knowledge gathered so far versus
keep exploring for future rewards.
– Detailed in the English Wikipedia entry
“Multi-armed_bandit”.
Formulation
• M arms, N units of resources in each trial, T trials.
– M, N, T are natural numbers.

• P(Reward): each arm’s reward probability.
– Binomial, Poisson, etc.
• In the usual formulation, arms have no mutual
relations (rewards are independent across arms).

– Parameters may be allowed to vary across trials
in more complex models.

• Goal: maximize the sum of rewards over all
trials (a minimal R sketch of this setup follows).
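
A minimal sketch of this setup in R, assuming Bernoulli (0/1)
rewards and a bet of N = 1 unit per trial; the probabilities in p
are made-up values for illustration, not from the talk.

set.seed(1)
n_arms   <- 5     # M in the formulation
n_trials <- 1000  # T in the formulation
p <- c(0.10, 0.15, 0.20, 0.25, 0.30)  # made-up hidden P(Reward) per arm

# pull(i): bet one unit (N = 1) on arm i and observe a 0/1 reward
pull <- function(i) rbinom(1, 1, p[i])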
Betting strategy
• Strategies can be put into four typical categories
(per Wikipedia):
– Semi-uniform
• epsilon-greedy and its more sophisticated variants
(a minimal sketch follows this list).
• Sample source from an O’Reilly book:
– https://github.com/johnmyleswhite/BanditsBook

– Probability matching
• Assume some reward distributions and bet according
to the inferred parameters (e.g. Thompson sampling,
also sketched after this list).
• Package ‘bandit’ on CRAN.

– Pricing
• evaluate the worth of current knowledge for future
rewards.

– Strategies with ethical constraints
• avoid arms found to be inferior.
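
Minimal R sketches of the first two categories, reusing pull(),
n_arms and n_trials from the formulation sketch. The epsilon
default and the Beta(1, 1) priors are illustrative choices, not
the implementations referenced above.

# Semi-uniform: epsilon-greedy. With probability epsilon explore a
# random arm, otherwise exploit the best empirical mean so far.
eps_greedy <- function(n_trials, n_arms, epsilon = 0.1) {
  wins  <- rep(0, n_arms)
  pulls <- rep(0, n_arms)
  total <- 0
  for (t in seq_len(n_trials)) {
    if (runif(1) < epsilon || all(pulls == 0)) {
      i <- sample(n_arms, 1)
    } else {
      i <- which.max(wins / pmax(pulls, 1))
    }
    r <- pull(i)
    pulls[i] <- pulls[i] + 1
    wins[i]  <- wins[i]  + r
    total    <- total + r
  }
  total
}

# Probability matching: Thompson sampling with illustrative Beta(1, 1)
# priors. Draw a plausible reward rate per arm, bet on the largest draw.
thompson <- function(n_trials, n_arms) {
  wins  <- rep(0, n_arms)
  pulls <- rep(0, n_arms)
  total <- 0
  for (t in seq_len(n_trials)) {
    theta <- rbeta(n_arms, 1 + wins, 1 + pulls - wins)
    i <- which.max(theta)
    r <- pull(i)
    pulls[i] <- pulls[i] + 1
    wins[i]  <- wins[i]  + r
    total    <- total + r
  }
  total
}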
Simulation in R
• Code is in chiral’s Gists.
– https://gist.github.com/chiral/7340722
– Includes the simulation code and two simple
strategies: “random” and “epsilon-greedy” (a rough
stand-in harness is sketched below).
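
The gist above is the authoritative code; as a rough stand-in, a
comparable harness could average total reward over repeated runs.
run_many and random_bet are hypothetical helpers, not from the gist.

# Average total reward of a strategy over repeated simulated runs.
run_many <- function(strategy, runs = 100, ...) {
  mean(replicate(runs, strategy(n_trials, n_arms, ...)))
}

# Baseline: bet on a uniformly random arm in every trial.
random_bet <- function(n_trials, n_arms) {
  sum(replicate(n_trials, pull(sample(n_arms, 1))))
}

run_many(random_bet)
sapply(c(0.1, 0.3, 0.5), function(e) run_many(eps_greedy, epsilon = e))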

• Results:
[Plots: cumulative reward of the simple random strategy, and of
epsilon-greedy with epsilon parameters from 0.1 to 0.5]

epsilon-greedy seems hardly worth applying here.
Conclusion & future works
• Compared to simple random betting, the simple
epsilon-greedy strategy shows little difference.
– More sophisticated strategies are needed (one
standard candidate, UCB1, is sketched after this list).
– Some useful algorithms are implemented in
Google Analytics.

• A high-performing algorithm for the bandit problem
could be applied to programmatic marketing
such as RTB.
– So I’ll continue exploring in R.
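
As one example of a more sophisticated strategy, a minimal sketch
of UCB1 (Auer et al.), again reusing pull() and run_many() from the
sketches above; UCB1 is a standard textbook algorithm, not something
covered in the talk itself.

# UCB1: try each arm once, then pick the arm maximizing the
# empirical mean plus an optimism bonus sqrt(2 * log(t) / pulls).
ucb1 <- function(n_trials, n_arms) {
  wins  <- rep(0, n_arms)
  pulls <- rep(0, n_arms)
  total <- 0
  for (t in seq_len(n_trials)) {
    if (any(pulls == 0)) {
      i <- which(pulls == 0)[1]  # initialization: one pull per arm
    } else {
      i <- which.max(wins / pulls + sqrt(2 * log(t) / pulls))
    }
    r <- pull(i)
    pulls[i] <- pulls[i] + 1
    wins[i]  <- wins[i]  + r
    total    <- total + r
  }
  total
}

run_many(ucb1)  # compare against the random and epsilon-greedy numbers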
