The proof that the best response time in queuing systems is obtained by scheduling the jobs with the shortest remaining processing time dates back to 1966; since then, other size-based scheduling protocols that pair near-optimal response times with strong fairness guarantees have been proposed. Yet, despite these very desirable properties, size-based scheduling policies are almost never used in practice: a key reason is that, in real systems, it is prohibitive to know a priori exact job sizes.
In this talk, I will first describe our efforts to put in practice concepts coming from theory, developing HFSP: a size-based scheduler for Hadoop MapReduce that uses estimations rather than exact size information. We obtained results that were surprisingly good even with very inaccurate size estimations: this motivated us to return to theory, and perform an in-depth study of scheduling based on estimated sizes. We obtained very promising results: for a large class of workloads, size-based scheduling performs well even with very rough size estimations; for the other workloads, simple modifications to the existing scheduling protocols are sufficient to greatly enhance performance.