This document summarizes key concepts from Andrew Ng's lecture notes on reinforcement learning and control. It introduces reinforcement learning as a framework where learning agents receive rewards to indicate good or bad behavior, rather than explicit labels as in supervised learning. Markov decision processes (MDPs) provide a formalism for modeling reinforcement learning problems. The document describes value iteration and policy iteration algorithms for solving finite state MDPs to find the optimal value function and policy. It also discusses estimating transition probabilities and rewards from experience to learn a model of an unknown MDP.