Bandit algorithms balance exploration of new options against exploitation of the current best option. The ε-greedy algorithm divides its choices between exploration and exploitation, but its fixed ε means it keeps exploring at the same rate even after the best option has become clear. The softmax algorithm assigns each option a choice probability based on its accumulated reward, with a temperature parameter controlling how strongly it favors the leaders over exploration. The UCB algorithm selects the option with the highest accumulated reward plus an exploration bonus that shrinks as an option is tried more often, making it explicitly curious while guarding against being misled by noisy early results. Real-world use adds further complexity: concurrent experiments, shifting metrics, and changing environments. Applying bandit algorithms well therefore requires domain expertise and judgment.
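The three selection rules can be sketched side by side. This is a minimal illustration, assuming each arm is tracked by parallel `values` (running mean rewards) and `counts` lists; the function names, parameter defaults, and the `update` helper are illustrative, not taken from any particular library.

```python
import math
import random

def epsilon_greedy(values, counts, epsilon=0.1):
    """With probability epsilon pick a random arm; otherwise exploit the best."""
    if random.random() < epsilon:
        return random.randrange(len(values))
    return max(range(len(values)), key=lambda i: values[i])

def softmax_select(values, counts, temperature=0.2):
    """Pick each arm with probability proportional to exp(value / temperature).

    Low temperature -> near-greedy; high temperature -> near-uniform.
    """
    m = max(values)  # subtract the max for numerical stability
    weights = [math.exp((v - m) / temperature) for v in values]
    r = random.random() * sum(weights)
    cumulative = 0.0
    for i, w in enumerate(weights):
        cumulative += w
        if r <= cumulative:
            return i
    return len(weights) - 1

def ucb1(values, counts):
    """Pick the arm maximizing estimated value plus an exploration bonus.

    The bonus sqrt(2 ln N / n_i) shrinks as arm i is tried more often.
    """
    for i, n in enumerate(counts):
        if n == 0:          # play every arm once before applying the formula
            return i
    total = sum(counts)
    return max(range(len(values)),
               key=lambda i: values[i] + math.sqrt(2 * math.log(total) / counts[i]))

def update(values, counts, arm, reward):
    """Incrementally update the running mean reward for the chosen arm."""
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]
```

In a simulation loop, any of the three selectors can be called to pick an arm, a (possibly stochastic) reward observed, and `update` applied; swapping the selector while keeping the loop fixed is a simple way to compare their exploration behavior.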