This document provides an algorithmic overview of multi-armed bandits. It defines the multi-armed bandit problem as a reinforcement learning problem in which an agent must balance exploring unknown arms against exploiting the best-known arm to maximize cumulative reward over time. The document summarizes several common bandit algorithms, including epsilon-greedy, upper confidence bound (UCB), and Thompson sampling, and discusses extensions to non-stationary and contextual bandit problems.
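Since the summary only names the algorithms, a minimal sketch of the simplest one, epsilon-greedy, may help illustrate the explore/exploit trade-off it describes. This is an illustrative example, not code from the document; the names `pull`, `n_arms`, `n_rounds`, and `epsilon` are assumptions chosen for clarity.

```python
import random

def epsilon_greedy(n_arms, n_rounds, epsilon, pull):
    """With probability epsilon pull a random arm (explore);
    otherwise pull the arm with the highest estimated mean
    reward so far (exploit)."""
    counts = [0] * n_arms    # number of pulls per arm
    values = [0.0] * n_arms  # running mean reward per arm
    total_reward = 0.0
    for _ in range(n_rounds):
        if random.random() < epsilon:
            arm = random.randrange(n_arms)                      # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])   # exploit
        reward = pull(arm)
        counts[arm] += 1
        # incremental update of the running mean for the pulled arm
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward
    return total_reward, values

# Usage: three Bernoulli arms with hidden success probabilities.
probs = [0.2, 0.5, 0.7]
total, estimates = epsilon_greedy(
    n_arms=3, n_rounds=10_000, epsilon=0.1,
    pull=lambda a: 1.0 if random.random() < probs[a] else 0.0,
)
print(total, estimates)  # estimates should approach [0.2, 0.5, 0.7]
```

A fixed `epsilon` keeps exploring forever, which is exactly the weakness that UCB and Thompson sampling address: they concentrate pulls on promising arms as uncertainty shrinks rather than exploring at a constant rate.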