PRACTICAL BANDITS
FOR BUSINESS
Yan Xu
Houston Machine Learning Meetup
June 22, 2019
OUTLINE
- Recap on Bandit Problem
- A Contextual-Bandit Approach to Personalized News Article Recommendation
http://rob.schapire.net/papers/www10.pdf
- An efficient bandit algorithm for realtime multivariate optimization
https://www.kdd.org/kdd2017/papers/view/an-efficient-bandit-algorithm-for-realtime-multivariate-optimization
MULTI-ARMED BANDITS
DILEMMA: EXPLORATION VS.
EXPLOITATION
The exploration/exploitation trade-off is a dilemma we
frequently face in choosing between options.
Take the same route to drive home, or try a new one?
Choose your favorite restaurant, or the new one?
Listen to your favorite music channel, or try a new artist?
Attend a new meetup?
HOW TO RESOLVE THE
DILEMMA
https://pavlov.tech/2019/03/02/animated-multi-armed-bandit-policies/
Epsilon Greedy
UCB (Upper Confidence Bound)
Thompson Sampling
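The three policies above trade off exploration against exploitation in different ways. A minimal sketch of each for Bernoulli-reward arms (my illustration, not code from the talk; arm statistics are kept as plain lists):

```python
import math
import random

def epsilon_greedy(counts, rewards, epsilon=0.1):
    """With probability epsilon pick a random arm, else the best empirical arm."""
    if random.random() < epsilon:
        return random.randrange(len(counts))
    means = [r / c if c > 0 else 0.0 for r, c in zip(rewards, counts)]
    return max(range(len(means)), key=means.__getitem__)

def ucb1(counts, rewards):
    """UCB1: empirical mean plus an exploration bonus that shrinks as an
    arm accumulates pulls."""
    for arm, c in enumerate(counts):
        if c == 0:
            return arm  # pull every arm once before trusting the bounds
    t = sum(counts)
    return max(range(len(counts)),
               key=lambda a: rewards[a] / counts[a]
               + math.sqrt(2 * math.log(t) / counts[a]))

def thompson(successes, failures):
    """Thompson sampling for Bernoulli arms with Beta(1, 1) priors:
    sample a plausible mean per arm, play the argmax."""
    samples = [random.betavariate(1 + s, 1 + f)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)
```

Epsilon-greedy explores blindly; UCB1 and Thompson sampling both concentrate exploration on arms whose estimates are still uncertain.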
REWARD AND REGRET
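Reward is what the policy collects; regret is what it gives up relative to always playing the best arm. The standard textbook definition (my notation, with mu* the best arm's mean reward over horizon T) is:

```latex
R_T \;=\; T\,\mu^{*} \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} r_{a_t}\right],
\qquad \mu^{*} \;=\; \max_a \mu_a
```

A good policy keeps R_T growing sublinearly in T, i.e. its average per-round loss versus the best arm goes to zero.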
MULTI-ARMED BANDITS
FORMULATION
PRACTICAL BANDITS
APPLICATION
BANDITS FOR PERSONALIZED
RECOMMENDATION
BANDITS FOR NEWS
RECOMMENDATION
CONTEXTUAL BANDITS
[0.1, 0.6]
[0.6, 0.4]
[0.7, 0.1]
[0.4, 0.2]
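The four vectors above appear to give each arm's expected reward under two different user contexts (my reading of the slide). The point of contextual bandits is that the best arm flips with the context, as a tiny sketch shows:

```python
# Hypothetical expected-reward table: one row per arm,
# one column per user context (values taken from the slide).
REWARDS = [
    [0.1, 0.6],
    [0.6, 0.4],
    [0.7, 0.1],
    [0.4, 0.2],
]

def best_arm(context):
    """Pick the arm with the highest expected reward for this context."""
    return max(range(len(REWARDS)), key=lambda a: REWARDS[a][context])
```

Under context 0 the best arm is arm 2 (0.7), but under context 1 it is arm 0 (0.6); a context-free bandit would be forced to pick one arm for everyone.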
LINUCB ALGORITHM
LINEAR DISJOINT MODEL
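The disjoint model in LinUCB (Li et al., WWW 2010) fits an independent ridge regression per arm and adds a confidence width derived from the inverse Gram matrix. A compact sketch of the published algorithm (my implementation, using numpy):

```python
import numpy as np

class LinUCB:
    """Disjoint-model LinUCB: one ridge-regression model per arm,
    plus a UCB exploration term from the ridge covariance."""

    def __init__(self, n_arms, d, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(d) for _ in range(n_arms)]    # ridge Gram matrices
        self.b = [np.zeros(d) for _ in range(n_arms)]  # reward-weighted features

    def select(self, x):
        """Score each arm as (predicted payoff) + alpha * (confidence width)."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                          # per-arm ridge estimate
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Rank-one update of the chosen arm's sufficient statistics."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

The confidence width sqrt(x^T A^-1 x) is large for contexts an arm has rarely seen, so the policy explores exactly where its linear estimate is still uncertain.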
UPPER BOUND ILLUSTRATION
FEATURE FREE VS LINEAR
CONTEXTUAL BANDIT
BANDITS EVALUATION
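The WWW 2010 paper evaluates candidate policies offline by replaying logged events collected under a uniformly random policy, keeping only the rounds where the candidate would have chosen the same arm the logger chose. A sketch of that replay estimator (function name is mine):

```python
def replay_evaluate(policy, log):
    """Offline replay evaluation of a bandit policy.

    `log` holds (context, logged_arm, reward) tuples. An event counts only
    when the candidate policy agrees with the logged action; the estimator
    is unbiased when the logging policy chose arms uniformly at random.
    """
    matched, total_reward = 0, 0.0
    for context, logged_arm, reward in log:
        if policy(context) == logged_arm:
            matched += 1
            total_reward += reward
    return total_reward / matched if matched else 0.0
```

This lets new policies be compared on historical click logs without deploying them, at the cost of discarding the non-matching events.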
DEALING WITH HIGH
DIMENSIONALITY
~1000 binary features per user; ~100 binary features per article
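The paper compresses these raw binary vectors down to a handful of dimensions (cluster membership plus a constant). As a simplified illustration of that idea only, here is plain k-means membership used as a low-dimensional user feature; the paper additionally projects users through a supervised model before clustering, which this sketch omits:

```python
import numpy as np

def kmeans_features(X, k=5, iters=20, seed=0):
    """Map high-dimensional binary user vectors to a (k + 1)-dimensional
    feature: one-hot cluster membership plus a constant intercept term."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(iters):
        # assign each user to its nearest center, then recompute centers
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    onehot = np.eye(k)[labels]
    return np.hstack([onehot, np.ones((len(X), 1))])  # k + 1 features
```

With k = 5 this reduces a ~1000-dimensional user vector to 6 numbers, which keeps the per-arm ridge regressions in LinUCB small and fast.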
RESULT: PERSONALIZED
NEWS
Omniscient: always chooses the article with the highest empirical click-through rate (CTR)
CONCLUSION
AMAZON: BANDITS FOR
MULTIVARIATE OPTIMIZATION
Published at KDD 2017, KDD 2019 is in Alaska!
OPTIMIZING WEB LAYOUT
PROBLEM FORMULATION
STEP 1: PROBIT REGRESSION
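Step 1 models the success probability through a probit link, i.e. the standard normal CDF applied to a linear score of the layout features. A minimal stdlib sketch of the link itself; the Amazon paper's actual model is a Bayesian linear probit with a full posterior over the weights, which this omits:

```python
from math import erf, sqrt

def probit(z):
    """Standard normal CDF Phi(z), the probit link mapping a linear
    score to a probability in (0, 1)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def success_prob(w, x):
    """P(conversion) = Phi(w . x) for weights w and layout features x."""
    return probit(sum(wi * xi for wi, xi in zip(w, x)))
```

A zero score maps to probability 0.5, and the link saturates smoothly toward 0 and 1, like the logistic function but with Gaussian tails.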
STEP 2: THOMPSON
SAMPLING
STEP 3: HILL-CLIMBING TO DECIDE
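Steps 2 and 3 work together: draw one weight vector from the posterior (the Thompson step), then greedily improve the layout one slot at a time under those sampled weights instead of scoring every combination. A sketch under simplifying assumptions (independent Gaussian posterior per weight; function names are mine):

```python
import random

def thompson_sample(mu, sigma, rng):
    """Thompson step: draw one plausible weight vector from an assumed
    independent Gaussian posterior over the model weights."""
    return [rng.gauss(m, s) for m, s in zip(mu, sigma)]

def hill_climb(score, n_slots, n_options, rng, restarts=5, sweeps=10):
    """Greedy coordinate ascent over layouts: re-optimize one slot at a
    time under the sampled weights, from several random restarts.
    Avoids scoring all n_options ** n_slots layouts exhaustively."""
    best_layout, best_val = None, float("-inf")
    for _ in range(restarts):
        layout = [rng.randrange(n_options) for _ in range(n_slots)]
        for _ in range(sweeps):
            for slot in range(n_slots):
                layout[slot] = max(
                    range(n_options),
                    key=lambda o: score(layout[:slot] + [o] + layout[slot + 1:]))
        val = score(layout)
        if val > best_val:
            best_layout, best_val = list(layout), val
    return best_layout
```

Each round of the live system would sample fresh weights, hill-climb to a layout, show it, and fold the observed conversion back into the posterior.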
SIMULATION RESULT
Widget interaction in the simulation is controlled through alpha_2.
EXPERIMENT ON REAL
TRAFFIC
- After only a single week of online optimization, we saw a 21% conversion increase compared to the median layout.
SUMMARY
Contextual bandits
- Linear payoff
- Add interaction components
- UCB: variance estimation of expected rewards
- Thompson sampling: sample weights from the posterior distribution
Applications
- Recommendation
- Multivariate optimization
For more details
- A Contextual-Bandit Approach to Personalized News Article Recommendation
http://rob.schapire.net/papers/www10.pdf
- An efficient bandit algorithm for realtime multivariate optimization
https://www.kdd.org/kdd2017/papers/view/an-efficient-bandit-algorithm-for-realtime-multivariate-optimization
