This document discusses reinforcement learning with deep energy-based policies. It motivates maximum entropy reinforcement learning as a way to find policies that maximize reward while staying stochastic enough to keep exploring multiple ways of solving a task. The approach represents the policy as an energy-based model and uses soft Q-learning to find the optimal maximum entropy policy. Neural networks approximate the soft Q-function, and a separate sampling network draws actions from the resulting policy. Experiments show that maximum entropy policies provide better exploration, initialization, compositionality, and robustness than deterministic policies.
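To make the relationship between the soft Q-function and the energy-based policy concrete, the following is a minimal sketch, not the method described above. It assumes a small discrete action set, so the policy can be normalized exactly, whereas the summarized method targets settings where a sampling network is needed. The function names, the temperature `alpha`, the discount `gamma`, and the example Q-values are all illustrative assumptions. The sketch uses the standard maximum entropy relations: the soft value is `alpha * log sum_a exp(Q(s, a) / alpha)`, and the policy is proportional to `exp(Q(s, a) / alpha)`.

```python
import numpy as np


def soft_value(q_values, alpha=1.0):
    """Soft state value: alpha * log(sum_a exp(Q(s, a) / alpha)).

    Computed with the max-subtraction trick for numerical stability.
    The last axis of q_values indexes actions.
    """
    z = np.asarray(q_values, dtype=float) / alpha
    m = z.max(axis=-1, keepdims=True)
    return alpha * (m.squeeze(-1) + np.log(np.exp(z - m).sum(axis=-1)))


def energy_based_policy(q_values, alpha=1.0):
    """Policy pi(a | s) = exp((Q(s, a) - V(s)) / alpha).

    Here -Q / alpha plays the role of the energy, and the soft value
    V(s) acts as the log-normalizer.
    """
    q = np.asarray(q_values, dtype=float)
    v = soft_value(q, alpha)
    return np.exp((q - v[..., None]) / alpha)


def soft_q_target(reward, next_q_values, gamma=0.99, alpha=1.0):
    """Soft Bellman backup target: r + gamma * V_soft(s')."""
    return reward + gamma * soft_value(next_q_values, alpha)


# Hypothetical Q-values for a single state with three actions.
q = np.array([1.0, 2.0, 0.0])
pi_warm = energy_based_policy(q, alpha=1.0)    # stochastic, spreads mass
pi_cold = energy_based_policy(q, alpha=0.01)   # nearly greedy
target = soft_q_target(reward=0.5, next_q_values=q, gamma=0.9, alpha=1.0)
```

With a large temperature the policy keeps probability on every action, which is the exploratory behavior the maximum entropy objective rewards. As `alpha` shrinks, the soft value approaches the maximum Q-value and the policy approaches the greedy, deterministic one.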