COMPASS is a framework that trains a latent space of diverse reinforcement learning policies to solve combinatorial optimization problems. It operates in two phases: (1) a training phase, which samples latent vectors and trains the corresponding policies, and (2) an inference phase, which searches the latent space within a fixed budget to find high-performing policies for a given instance. COMPASS achieves state-of-the-art results on 29 tasks, generalizes better than baselines on out-of-distribution instances, and its search strategy reliably reaches high-performing regions of the latent space.
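The inference phase can be illustrated with a minimal sketch: treat each latent vector as selecting a policy, evaluate it on the instance at hand, and iteratively move the search distribution toward the best candidates until the evaluation budget is spent. This is a simplified evolutionary search, not COMPASS's actual implementation; `evaluate_policy` is a hypothetical stand-in for rolling out the latent-conditioned policy on a problem instance.

```python
import random

def evaluate_policy(latent):
    # Hypothetical stand-in for decoding `latent` into a policy and
    # rolling it out on an instance; here, the score is just negative
    # squared distance to a hidden "good" region of the latent space.
    target = [0.7, -0.3, 0.5]
    return -sum((z - t) ** 2 for z, t in zip(latent, target))

def latent_search(budget=200, dim=3, pop=10, sigma=0.3, seed=0):
    """Simple evolutionary search over the latent space within a budget.

    Each generation samples `pop` latent vectors around the current mean,
    scores them, and re-centers the mean on the top half (the elites).
    """
    rng = random.Random(seed)
    mean = [0.0] * dim
    best_latent, best_score = None, float("-inf")
    evals = 0
    while evals < budget:
        candidates = [[m + rng.gauss(0, sigma) for m in mean]
                      for _ in range(pop)]
        scored = sorted(((evaluate_policy(c), c) for c in candidates),
                        reverse=True)
        evals += pop
        if scored[0][0] > best_score:
            best_score, best_latent = scored[0]
        # Move the search distribution toward the high-performing region.
        elites = [c for _, c in scored[: pop // 2]]
        mean = [sum(vals) / len(elites) for vals in zip(*elites)]
    return best_latent, best_score

best, score = latent_search()
```

In practice the search would re-run per instance, since different instances may favor different regions of the latent space; more sophisticated searchers (e.g. CMA-ES, which also adapts the sampling covariance) can replace the fixed-`sigma` update here.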