This document summarizes Yan Xu's presentation on practical applications of multi-armed bandits. Bandits can be used for personalized recommendation, such as recommending news articles, by balancing exploration of new articles with exploitation of known good articles. Amazon's bandit algorithm allows for real-time optimization of multiple variables by modeling interactions between variables. The algorithm was able to increase website conversion by 21% after a single week of optimization.
DILEMMA: EXPLORATION VS. EXPLOITATION
The exploration/exploitation trade-off is a dilemma we frequently face when choosing between options.
Take the same route to drive home, or try a new route?
Choose your favorite restaurant, or the new one?
Listen to your favorite music channel, or try a new artist?
Attend a new meetup?
HOW TO RESOLVE THE DILEMMA
https://pavlov.tech/2019/03/02/animated-multi-armed-bandit-policies/
Epsilon Greedy
UCB (Upper Confidence Bound)
Thompson Sampling
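The three policies above can be sketched for a Bernoulli bandit (each arm pays 0 or 1). This is a minimal illustration, not the implementation from the talk; the function and variable names are my own, and the standard-library `random` module stands in for a proper RNG.

```python
import math
import random

def epsilon_greedy(counts, values, epsilon=0.1):
    """With probability epsilon explore a random arm; otherwise
    exploit the arm with the highest estimated value."""
    if random.random() < epsilon:
        return random.randrange(len(values))
    return max(range(len(values)), key=lambda a: values[a])

def ucb1(counts, values):
    """UCB1: play the arm with the highest upper confidence bound,
    after playing every arm at least once."""
    for a, n in enumerate(counts):
        if n == 0:
            return a  # each arm needs one pull before the bound is defined
    total = sum(counts)
    return max(
        range(len(values)),
        key=lambda a: values[a] + math.sqrt(2 * math.log(total) / counts[a]),
    )

def thompson_sampling(successes, failures):
    """Sample a win rate from each arm's Beta posterior (with a
    Beta(1, 1) prior) and play the arm with the best sample."""
    samples = [
        random.betavariate(s + 1, f + 1)
        for s, f in zip(successes, failures)
    ]
    return max(range(len(samples)), key=lambda a: samples[a])
```

Each function returns the index of the arm to pull next; the caller updates `counts`/`values` (or `successes`/`failures`) with the observed reward after each pull. Epsilon Greedy explores at a fixed rate, UCB explores arms whose estimates are still uncertain, and Thompson Sampling explores in proportion to the posterior probability that an arm is best.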