The document describes reinforcement learning algorithms. It first defines the policy, reward, and value functions of a reinforcement learning problem. It then derives the policy gradient theorem, which expresses the gradient of the expected return with respect to the policy parameters; this gradient can be used to optimize the policy by gradient ascent. Subsequent equations adapt the policy gradient derivation for use in actor-critic methods.
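In its standard form, the policy gradient theorem states that ∇θ J(θ) = E_π[∇θ log πθ(a|s) Q^π(s,a)]. Actor-critic methods replace Q with an estimate from a learned critic and commonly subtract a baseline V(s), which reduces variance without biasing the gradient. The sketch below is a minimal, hypothetical illustration of that idea and is not the document's own algorithm. It assumes a single-state, one-step task (a two-armed bandit with made-up deterministic rewards), a softmax actor with one preference per action, and a critic that is a single scalar value estimate. The function names `softmax` and `train_actor_critic`, the learning rates, and the step count are all illustrative choices.

```python
import math
import random


def softmax(prefs):
    """Convert action preferences into probabilities (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_actor_critic(rewards, steps=2000, actor_lr=0.1, critic_lr=0.1, seed=0):
    """Hypothetical one-state, one-step actor-critic on a deterministic bandit.

    rewards[a] is the (assumed) reward for action a. The actor is a softmax
    policy over per-action preferences theta; the critic is a single value
    estimate v used as a baseline, so the advantage is r - v.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)  # actor parameters (action preferences)
    v = 0.0                       # critic's estimate of the state value
    for _ in range(steps):
        probs = softmax(theta)
        a = rng.choices(range(len(rewards)), weights=probs)[0]  # sample a ~ pi
        r = rewards[a]
        # One-step TD error: r + gamma * V(terminal) - V(s), with V(terminal) = 0.
        advantage = r - v
        for k in range(len(theta)):
            # d/d theta[k] of log softmax(theta)[a] is 1[k == a] - probs[k].
            grad_log_pi = (1.0 if k == a else 0.0) - probs[k]
            theta[k] += actor_lr * advantage * grad_log_pi  # gradient ascent step
        v += critic_lr * advantage  # move the critic toward the observed return
    return softmax(theta), v


if __name__ == "__main__":
    final_probs, value = train_actor_critic([0.0, 1.0])
    # The probability of the rewarding action (index 1) should end close to 1.
    print(final_probs, value)
```

Because the baseline v does not depend on the sampled action, subtracting it leaves the expected update equal to the true gradient; it only changes the variance of each sampled step. In a full actor-critic method with multiple states, v would become a function V(s) and the advantage would include the discounted value of the next state.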