We have been working to get Xen up and running on self-boot Intel® Xeon Phi processors to build HPC clouds. We see several challenges arising from the unique (but not unusual for HPC) hardware technologies and performance requirements. These hardware technologies include 1) more than 256 CPUs, 2) MCDRAM (high-bandwidth memory), and 3) an integrated fabric (i.e., Intel® Omni-Path). Unlike the “coprocessor” model, supporting self-boot with more than 256 CPUs has various implications for Xen, including scheduling and scalability. User applications need to use MCDRAM directly to perform optimally. We also need to make the integrated HPC fabric available to the VM through direct I/O assignment.
In addition, we run only a single VM on each node to meet the high-performance requirements of HPC clouds. This non-shared model has allowed us to optimize Xen further. In this talk, we share our design and the lessons we learned, and discuss the options we considered for achieving high-performance virtualization for HPC.