QMIX is a deep multi-agent reinforcement learning method for centralized training with decentralized execution. It represents the joint action-value function as a factored, monotonic combination of per-agent value functions, meaning the joint value never decreases when any single agent's value increases. This constraint guarantees that each agent acting greedily on its own value function yields the same joint action as acting greedily on the joint value function, so the maximizing joint action can be found without searching the joint action space. In experiments on StarCraft II micromanagement tasks, QMIX outperforms independent learners and value decomposition networks, learning cooperative behaviors while remaining scalable.
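
The link between monotonicity and greedy action selection can be checked on a toy example. The following is a minimal pure-Python sketch, not the authors' implementation. It assumes a small two-layer mixer whose weights are made non-negative with an absolute value, and it uses fixed random weights in place of whatever would produce them during training. All function names (`mix`, `joint_value`, `greedy_decentralized`, `greedy_joint`, `random_problem`) and all sizes are hypothetical, chosen only for illustration.

```python
import itertools
import math
import random


def elu(x):
    """ELU activation; it is increasing, so it preserves monotonicity."""
    return x if x > 0 else math.exp(x) - 1.0


def mix(agent_qs, w1, b1, w2, b2):
    """Combine per-agent Q-values into Q_tot using non-negative weights."""
    # abs() keeps every weight non-negative, so Q_tot cannot decrease
    # when any single agent's Q-value increases. Biases are unconstrained.
    hidden = [
        elu(sum(abs(w1[i][j]) * q for i, q in enumerate(agent_qs)) + b1[j])
        for j in range(len(b1))
    ]
    return sum(abs(w2[j]) * h for j, h in enumerate(hidden)) + b2


def joint_value(q_tables, actions, params):
    """Q_tot for one joint action, given each agent's list of Q-values."""
    return mix([q[a] for q, a in zip(q_tables, actions)], *params)


def greedy_decentralized(q_tables):
    """Each agent picks the argmax of its own Q-values, ignoring the others."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in q_tables)


def greedy_joint(q_tables, params):
    """Brute-force argmax of Q_tot over every joint action (for checking only)."""
    joint_actions = itertools.product(*(range(len(q)) for q in q_tables))
    return max(joint_actions, key=lambda a: joint_value(q_tables, a, params))


def random_problem(rng, n_agents=3, n_actions=4, n_hidden=5):
    """Hypothetical toy setup: random Q-tables and random mixer parameters."""
    q_tables = [[rng.gauss(0, 1) for _ in range(n_actions)]
                for _ in range(n_agents)]
    w1 = [[rng.gauss(0, 1) for _ in range(n_hidden)] for _ in range(n_agents)]
    b1 = [rng.gauss(0, 1) for _ in range(n_hidden)]
    w2 = [rng.gauss(0, 1) for _ in range(n_hidden)]
    b2 = rng.gauss(0, 1)
    return q_tables, (w1, b1, w2, b2)


if __name__ == "__main__":
    q_tables, params = random_problem(random.Random(0))
    local = greedy_decentralized(q_tables)
    best = greedy_joint(q_tables, params)
    # Both joint actions should reach the same (maximal) Q_tot.
    print("decentralized greedy:", local)
    print("brute-force joint greedy:", best)
    print("Q_tot:", joint_value(q_tables, local, params),
          joint_value(q_tables, best, params))
```

The brute-force search enumerates every joint action and is there only to verify the claim. Decentralized selection costs one argmax per agent, which is why the monotonicity constraint matters for scalability.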