This document summarizes the policy gradient approach to reinforcement learning. It begins with the objective of directly maximizing expected reward as a function of the policy parameters. It then derives the policy gradient theorem, which gives an analytical expression for the gradient of the expected reward with respect to those parameters. This result is used to develop the REINFORCE algorithm, which approximates the policy gradient from sampled episodes: the sampled returns serve as Monte Carlo estimates of the state-action values, and the policy parameters are updated in the direction that increases expected reward. Finally, a baseline function can be subtracted from these return estimates to reduce the variance of the gradient estimate without introducing bias.
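To make the summary concrete, the standard form of the policy gradient theorem can be written as follows (the notation here is the conventional one and is assumed, not taken from the summarized document):

```latex
% Policy gradient theorem: gradient of expected return J(theta) under policy pi_theta.
\nabla_\theta J(\theta)
  = \mathbb{E}_{\pi_\theta}\!\left[\, \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, Q^{\pi_\theta}(s_t, a_t) \right]

% REINFORCE with a baseline b(s_t) replaces Q with the sampled return G_t:
\theta \leftarrow \theta + \alpha \,\big(G_t - b(s_t)\big)\, \nabla_\theta \log \pi_\theta(a_t \mid s_t)
```

The sketch below illustrates these ideas in code. It is a minimal, self-contained REINFORCE implementation with a running-average baseline, using a tabular softmax policy on a toy chain environment. The environment, hyperparameters, and helper names (`ToyChainEnv`, `run_episode`, `reinforce`) are assumptions made for illustration; they are not from the original document.

```python
# Minimal REINFORCE sketch with a simple average-return baseline.
# Toy environment and all names/hyperparameters are illustrative assumptions.
import numpy as np

class ToyChainEnv:
    """Tiny episodic chain: action 1 moves right, action 0 resets to state 0.
    Reaching the last state ends the episode with reward +1; each step costs -0.01."""
    def __init__(self, n_states=4):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state = self.state + 1 if action == 1 else 0
        done = self.state == self.n_states - 1
        reward = 1.0 if done else -0.01
        return self.state, reward, done

def softmax(logits):
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

def run_episode(env, theta, max_steps=50):
    """Sample one episode with the softmax policy pi_theta(a|s) = softmax(theta[s])."""
    states, actions, rewards = [], [], []
    s = env.reset()
    for _ in range(max_steps):
        probs = softmax(theta[s])
        a = np.random.choice(len(probs), p=probs)
        s_next, r, done = env.step(a)
        states.append(s); actions.append(a); rewards.append(r)
        s = s_next
        if done:
            break
    return states, actions, rewards

def reinforce(num_episodes=2000, alpha=0.1, gamma=0.99, seed=0):
    np.random.seed(seed)
    env = ToyChainEnv()
    theta = np.zeros((env.n_states, 2))  # tabular policy parameters, 2 actions
    baseline = 0.0                       # running average of episode returns
    for _ in range(num_episodes):
        states, actions, rewards = run_episode(env, theta)
        # Discounted returns G_t: Monte Carlo estimates of the state-action values.
        G, returns = 0.0, np.zeros(len(rewards))
        for t in reversed(range(len(rewards))):
            G = rewards[t] + gamma * G
            returns[t] = G
        b = baseline                                 # baseline from earlier episodes
        baseline += 0.05 * (returns[0] - baseline)   # update the running average
        # Policy gradient step: theta += alpha * (G_t - b) * grad log pi(a_t|s_t)
        for s, a, G_t in zip(states, actions, returns):
            probs = softmax(theta[s])
            grad_log_pi = -probs
            grad_log_pi[a] += 1.0        # gradient of log-softmax w.r.t. theta[s]
            theta[s] += alpha * (G_t - b) * grad_log_pi
    return theta

if __name__ == "__main__":
    theta = reinforce()
    print("Learned action probabilities per state:")
    for s in range(4):
        print(s, softmax(theta[s]).round(3))
```

Because the baseline here depends only on the state of learning and not on the chosen action, subtracting it leaves the gradient estimate unbiased while shrinking its variance, which is the role the baseline plays in the summary above.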