This document summarizes approaches to multi-agent deep reinforcement learning. It covers training multiple independent agents concurrently, centralized training with decentralized execution, and approaches that involve communication between agents, such as parameter sharing and multi-agent deep deterministic policy gradient (MADDPG). MADDPG lets each agent have its own reward function; agents are trained centrally but make their decisions in a decentralized manner at execution time. The document applies these methods to problems such as predator-prey and uses the prisoner's dilemma to illustrate how agents can learn communication protocols.
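
As a rough illustration of the centralized-training, decentralized-execution split described above, the following sketch shows only the information flow: each actor sees its own observation, while each agent's critic sees the observations and actions of all agents. The names `Actor`, `CentralCritic`, and `build`, the linear function approximators, and the dimensions are assumptions made for illustration and do not come from the summarized document. No learning update is shown.

```python
import random


class Actor:
    """Decentralized policy: uses only this agent's own observation."""

    def __init__(self, obs_dim, rng):
        self.w = [rng.uniform(-1, 1) for _ in range(obs_dim)]

    def act(self, obs):
        # Deterministic linear policy, clipped to the range [-1, 1].
        a = sum(w * o for w, o in zip(self.w, obs))
        return max(-1.0, min(1.0, a))


class CentralCritic:
    """Training-time critic: sees every agent's observation and action."""

    def __init__(self, n_agents, obs_dim, rng):
        in_dim = n_agents * obs_dim + n_agents  # all observations + all actions
        self.w = [rng.uniform(-1, 1) for _ in range(in_dim)]

    def q_value(self, all_obs, all_actions):
        # Flatten the joint observation and append the joint action.
        joint = [x for obs in all_obs for x in obs] + list(all_actions)
        return sum(w * x for w, x in zip(self.w, joint))


def build(n_agents=3, obs_dim=4, seed=0):
    """Create one actor and one critic per agent (hypothetical setup)."""
    rng = random.Random(seed)
    actors = [Actor(obs_dim, rng) for _ in range(n_agents)]
    critics = [CentralCritic(n_agents, obs_dim, rng) for _ in range(n_agents)]
    return actors, critics


actors, critics = build()
obs_rng = random.Random(1)
all_obs = [[obs_rng.uniform(-1, 1) for _ in range(4)] for _ in actors]

# Execution: each agent acts on its own observation only.
actions = [actor.act(obs) for actor, obs in zip(actors, all_obs)]

# Training: each agent's critic scores the joint observations and actions,
# so each agent can be trained against its own reward signal.
q_values = [critic.q_value(all_obs, actions) for critic in critics]
```

Keeping one critic per agent is what allows each agent to have its own reward function; at execution time the critics are discarded and only the actors are needed.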