This document presents an optimization model for scheduling deep learning (DL) training jobs on on-demand GPU instances in the cloud. The model jointly plans virtual machine (VM) capacity and schedules DL training jobs so as to minimize total cost. In preliminary simulations with multiple nodes and multiple jobs, the proposed model reduces total cost by over 90% compared to first-in-first-out (FIFO), priority, and earliest-deadline-first (EDF) scheduling. The performance models used to predict the behavior of GPU-based deep learning applications are described in a referenced paper. The work is co-funded by the European Commission under the Horizon 2020 program.
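To make the three baseline policies concrete, the following minimal Python sketch shows how FIFO, priority, and EDF would each order the same set of jobs. It illustrates only the baselines, not the proposed optimization model. The `Job` fields, the convention that a lower priority value means a more urgent job, the tie-breaking by arrival time, and the sample values are all assumptions made for illustration and do not come from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical job record; field names are illustrative assumptions."""
    name: str
    arrival: float   # submission time
    priority: int    # assumption: lower value = more urgent
    deadline: float  # due date, same time unit as arrival


def fifo_order(jobs):
    # FIFO: serve jobs in order of submission time.
    return sorted(jobs, key=lambda j: j.arrival)


def priority_order(jobs):
    # Priority: most urgent first; ties broken by arrival (assumption).
    return sorted(jobs, key=lambda j: (j.priority, j.arrival))


def edf_order(jobs):
    # EDF: earliest deadline first; ties broken by arrival (assumption).
    return sorted(jobs, key=lambda j: (j.deadline, j.arrival))


# Made-up sample data, chosen so that the three policies disagree.
jobs = [
    Job("a", arrival=0.0, priority=2, deadline=30.0),
    Job("b", arrival=1.0, priority=1, deadline=20.0),
    Job("c", arrival=2.0, priority=3, deadline=10.0),
]

for label, policy in [("FIFO", fifo_order),
                      ("priority", priority_order),
                      ("EDF", edf_order)]:
    print(label, [j.name for j in policy(jobs)])
```

Each baseline looks at a single job attribute and ignores the price of the VMs the jobs run on. A model that plans capacity and the schedule together can also take that cost into account.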