The document discusses intrinsically motivated reinforcement learning. It presents a graphical model in which an agent learns a policy by jointly optimizing expected extrinsic reward (value) and intrinsic information gain through Bellman-like equations. The agent interacts with an environment over discrete time steps, taking actions and receiving rewards while estimating internal states and the information gained about the environment. The value and information equations are combined through a temperature parameter that trades off the two goals of minimizing decision complexity and maximizing information gained about the environment. Examples on a grid world demonstrate how the agent's behavior changes as the temperature parameter shifts the trade-off between value and information.
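To make the temperature-weighted trade-off concrete, below is a minimal sketch, not the document's exact formulation: soft (free-energy-style) value iteration on a small deterministic grid world, where a temperature parameter `beta` trades expected reward against the KL "decision complexity" of the policy relative to a uniform prior. The grid size, goal placement, discount factor, and `beta` values are illustrative assumptions, and the environmental information-gain bonus is omitted here; it could be folded into the reward term as an additional intrinsic bonus.

```python
# Hedged sketch: temperature-weighted Bellman-like backup on a grid world.
# Assumptions (not from the source): 4x4 deterministic grid, reward 1 at the
# goal cell, uniform prior policy rho, and a free-energy backup of the form
#   F(s) = (1/beta) * log E_rho[ exp(beta * Q(s, a)) ],
#   Q(s, a) = r(s') + gamma * F(s').
import numpy as np

N = 4                                    # grid is N x N
GOAL = (N - 1, N - 1)                    # assumed goal cell with extrinsic reward 1
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
GAMMA = 0.95

def step(s, a):
    """Deterministic transition: move if the target is inside the grid, else stay."""
    r, c = s[0] + a[0], s[1] + a[1]
    return (r, c) if 0 <= r < N and 0 <= c < N else s

def reward(s):
    """Extrinsic reward: 1 when the next state is the goal, else 0."""
    return 1.0 if s == GOAL else 0.0

def log_mean_exp(x):
    """Numerically stable log of the mean of exp(x)."""
    m = x.max()
    return m + np.log(np.mean(np.exp(x - m)))

def soft_value_iteration(beta, iters=200):
    """Bellman-like backup of the free energy F under a uniform prior policy.
    Large beta -> nearly greedy on value; small beta -> the policy stays close
    to the prior (low decision complexity)."""
    states = [(r, c) for r in range(N) for c in range(N)]
    F = {s: 0.0 for s in states}
    for _ in range(iters):
        F_new = {}
        for s in states:
            q = np.array([reward(step(s, a)) + GAMMA * F[step(s, a)] for a in ACTIONS])
            F_new[s] = log_mean_exp(beta * q) / beta
        F = F_new
    return F

def policy(F, s, beta):
    """Trade-off policy: pi(a|s) proportional to rho(a|s) * exp(beta * Q(s, a))."""
    q = np.array([reward(step(s, a)) + GAMMA * F[step(s, a)] for a in ACTIONS])
    p = np.exp(beta * (q - q.max()))
    return p / p.sum()

# Low vs. high temperature parameter: the action distribution at the start
# state moves from near-uniform toward greedy as beta grows.
for beta in (0.1, 10.0):
    F = soft_value_iteration(beta)
    p = policy(F, (0, 0), beta)
    print(f"beta={beta:5.1f}  pi(a | start)={np.round(p, 3)}")
```

Running the sketch prints the action probabilities at the start state for a low and a high temperature, mirroring the grid-world examples in the document where behavior shifts from prior-like (cheap, low-information) to value-driven as the temperature changes.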