Who am I?
• Masayuki Isobe
– MS in computer science.
• Graduation research is modeling statistical model for
game player in logic programing with EM-learning.
– Now I’m a head of own company - Adfive Inc.
• dedicated in ad-tech consulting, related system
development.
– Interested in “scientific and business”.
• Twitter: @chiral, Facebook: masayuki.isobe.14
– Anyone in this room is welcome to touch me in
these SNS.
3.
What is multi-armed-bundit?
•A problem in search-theory.
– Finite targets of investment (like arms of slot
machine in casino) which have own reward
probabilities respectively, finite resources to be
bet, and limited number of trials are given.
– Problem is: “what is the best bets in each trials?”
• Typical dilemma:
– explore vs harvest: difficulty in evaluation for
partial knowledge possibly converged to rewards
future.
- Detailed in wikipedia:en “Multi-armed_bandit” entry
4.
formulation
• M arms,N units of res in each trial, T trials.
– M, N, T are natural number.
• P(Reward): Each arm’s reward probability
– Binomial, poisson, etc.
• In usual formulation, arms don’t have mutual relations.
– Parameter is allowed to vary in trials for a complex
model.
• Goal: maximize the sum of rewards for all
trials.
5.
Betting strategy
• Strategiescan be put into four typical categories.
(by wikipedia)
– Semi-uniform
• epsilon-greedy and its sophisticated versions.
• sample source in an O’reilly book.
– https://github.com/johnmyleswhite/BanditsBook
– Probability matching
• Suppose some distributions and bet infered parameters
• Packages ‘bandit’ in CRAN
– Pricing
• evaluate knowleges for future rewards.
– Like ethical constraints
• avoid arms found out to be inferior.
6.
Simulation in R
•Code is in chiral’s Gists.
– https://gist.github.com/chiral/7340722
– Including simulation code, and simple two
strategy: “random”, “epsilon-greedy”
• Results:
Simple random strategy
epsilon-greedy,
with epsilon parametes
from 0.1 to 0.5
e-greedy seems to be worthless to apply..
7.
conclusion & futureworks
• Compared to simple random betting, simple egreedy strategy seems to have little difference.
– More sophisticated strategy is needed.
– Some useful algorithms are implemented in
Google Analytics.
• High performing algorithm for bandit problem
could be applied to programmatic marketing
such as RTB.
– So I’ll continue exploring in R.