This document provides an overview of reinforcement learning and AlphaZero. It covers the math behind core reinforcement learning concepts: policy evaluation, policy improvement, and their combination in policy iteration. It then explains how AlphaZero combines these concepts with a deep neural network and self-play to master the game of Go without human data. It also covers key algorithms such as Monte Carlo tree search and shows how AlphaZero implements them in code, learning directly from games played between copies of itself.
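
As a small illustration of how policy evaluation and policy improvement alternate inside policy iteration, here is a minimal sketch in Python. The two-state MDP, its rewards, the discount factor, and every name in the code (`TRANSITIONS`, `GAMMA`, `evaluate_policy`, `improve_policy`, `policy_iteration`) are assumptions made up for this example; it is the textbook tabular method on a toy problem, not AlphaZero's implementation.

```python
GAMMA = 0.9  # discount factor (assumed value for this toy example)

# Hypothetical deterministic MDP: TRANSITIONS[state][action] = (next_state, reward)
TRANSITIONS = {
    0: {0: (0, 0.0), 1: (1, 1.0)},
    1: {0: (0, 0.0), 1: (1, 2.0)},
}


def evaluate_policy(policy, tol=1e-10):
    """Policy evaluation: sweep the Bellman expectation update until values settle."""
    values = {s: 0.0 for s in TRANSITIONS}
    while True:
        delta = 0.0
        for s in TRANSITIONS:
            next_state, reward = TRANSITIONS[s][policy[s]]
            new_value = reward + GAMMA * values[next_state]
            delta = max(delta, abs(new_value - values[s]))
            values[s] = new_value
        if delta < tol:
            return values


def improve_policy(values):
    """Policy improvement: act greedily with respect to the current value estimates."""
    policy = {}
    for s, actions in TRANSITIONS.items():
        policy[s] = max(
            actions,
            key=lambda a: actions[a][1] + GAMMA * values[actions[a][0]],
        )
    return policy


def policy_iteration():
    """Alternate evaluation and improvement until the policy stops changing."""
    policy = {s: 0 for s in TRANSITIONS}  # arbitrary starting policy
    while True:
        values = evaluate_policy(policy)
        new_policy = improve_policy(values)
        if new_policy == policy:
            return policy, values
        policy = new_policy


if __name__ == "__main__":
    final_policy, final_values = policy_iteration()
    print(final_policy, final_values)
```

In this toy problem the loop settles on taking action 1 in both states. AlphaZero follows the same evaluate-then-improve pattern only loosely: roughly speaking, tree search plays the role of the improvement step and the neural network supplies the value estimates, so the table-based code above should be read as background rather than as a description of that system.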