RT-Xen: Real-Time Virtualization 
Sisu Xi, Meng Xu, Chenyang Lu, Linh T.X. Phan, Christopher D. Gill, Insup Lee, Oleg Sokolsky
Real-Time Virtualization 
Cars: Consolidate ~100 ECUs -> ~10 multicore processors 
Infotainment on Linux or Android 
Safety-critical control on AUTOSAR 
Cloud Computing’s Killer App: Gaming [IEEE Spectrum] 
Need to compute and stream 30 to 50 frames per second 
Applications must meet real-time performance constraints on virtualized platforms!
RT-Xen: Real-Time Virtualization 
Real-time hypervisor scheduling framework in Xen 
Implement a suite of real-time scheduling algorithms 
Based on compositional scheduling theory 
VMs specify resource interfaces 
Real-time guarantees to tasks in VMs 
Open source 
RT-Xen patch submitted 
https://sites.google.com/site/realtimexen/ 
Xen Virtualization Architecture 
Guest OS runs on VCPUs 
Hypervisor schedules VCPUs on PCPUs 
Credit scheduler 
[Weight, Cap] per VM 
Round robin 
RT-Xen Interface 
VM resource interface 
A set of VCPUs, each characterized by <period, budget> 
Optional: use cpumask to specify VCPU affinity with PCPUs 
Hide task-specific information 
Real-Time scheduling algorithms 
Ordering of VCPUs? Priority scheme
Placement of VCPUs? Global vs. partitioned
Resource isolation? Server mechanisms
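The resource interface above can be pictured as a small data structure. The sketch below is illustrative only: the class name, field names, and the microsecond unit are assumptions made for this example and are not RT-Xen's actual data structures or tool syntax. It shows that the interface carries only per-VCPU <period, budget> pairs (plus an optional cpumask) and no task-level information.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class VCPUInterface:
    """One VCPU of a VM's resource interface (hypothetical names, times in us)."""
    period_us: int
    budget_us: int
    cpumask: Optional[Set[int]] = None  # optional PCPU affinity; None means any PCPU

    def __post_init__(self) -> None:
        if not 0 < self.budget_us <= self.period_us:
            raise ValueError("need 0 < budget <= period")

    @property
    def bandwidth(self) -> float:
        """Fraction of one PCPU that this VCPU is entitled to."""
        return self.budget_us / self.period_us


def vm_bandwidth(vcpus: List[VCPUInterface]) -> float:
    """Total CPU bandwidth requested by a VM, in units of PCPUs."""
    return sum(v.bandwidth for v in vcpus)


# A VM with two VCPUs; the second one is pinned to PCPUs 1 and 2.
vm = [VCPUInterface(10_000, 4_000), VCPUInterface(5_000, 1_000, cpumask={1, 2})]
print(vm_bandwidth(vm))
```

The hypervisor only needs these numbers to schedule the VM; how the guest OS divides the budget among its tasks stays hidden inside the VM.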
Real-Time Scheduling Policies 
Priority scheme 
Static priority: Deadline Monotonic (DM) 
Dynamic priority: Earliest Deadline First (EDF) 
Global scheduling 
Schedule VCPUs based on global information 
Allow VCPU migration across cores 
Flexible use of multiple cores 
Migration overhead and cache penalty 
Partitioned scheduling 
Assign and bind VCPUs to PCPUs 
Schedule VCPUs on each core independently 
May underutilize PCPUs 
No migration overhead or associated cache penalty 
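The trade-off between the two placement schemes can be illustrated with a small sketch. It assumes, for illustration only, that each VCPU is a (period, budget) pair and that a core can host VCPUs as long as their total bandwidth stays at or below 1; the first-fit heuristic and all names are hypothetical and not taken from RT-Xen.

```python
from typing import List, Optional, Tuple

VCPU = Tuple[int, int]  # (period, budget)


def first_fit_partition(vcpus: List[VCPU], n_pcpus: int) -> Optional[List[List[VCPU]]]:
    """Bind each VCPU to the first PCPU whose total bandwidth stays <= 1.

    Returns the per-PCPU assignment, or None if some VCPU fits on no PCPU.
    """
    cores: List[List[VCPU]] = [[] for _ in range(n_pcpus)]
    load = [0.0] * n_pcpus
    for period, budget in vcpus:
        bw = budget / period
        for i in range(n_pcpus):
            if load[i] + bw <= 1.0 + 1e-9:
                cores[i].append((period, budget))
                load[i] += bw
                break
        else:
            return None  # no single core has room for this VCPU
    return cores


def within_total_capacity(vcpus: List[VCPU], n_pcpus: int) -> bool:
    """Necessary (not sufficient) condition: total bandwidth <= number of PCPUs."""
    return sum(b / p for p, b in vcpus) <= n_pcpus + 1e-9


demo = [(10, 6), (10, 6), (10, 6)]      # three VCPUs, bandwidth 0.6 each
print(first_fit_partition(demo, 2))     # partitioning onto 2 PCPUs fails
print(within_total_capacity(demo, 2))   # yet total demand is below 2 PCPUs
```

In this example 0.2 PCPU of capacity remains idle on each core under partitioning, while a global scheduler is at least not ruled out by capacity, since it may migrate a VCPU between cores.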
Scheduling a VCPU as a Deferrable Server 
[Figure: timelines from 0 to 15 showing tasks T1 (10, 3) and T2 (10, 3) running on a deferrable server (5, 3), together with the server's budget over time]
A VCPU receives <budget> us of CPU resources every <period> us
Budget is replenished at every start of period 
VCPU consumes budget when running, suspends when no budget left 
Preserves budget when there is no task
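The four rules above can be traced with a minimal tick-based simulation. This is a sketch under simplifying assumptions, not RT-Xen code: time advances in whole ticks, the VCPU always gets the PCPU when it has both work and budget, and the function and symbol names are made up for this example.

```python
from typing import Dict, List


def simulate_deferrable_server(period: int, budget: int,
                               arrivals: Dict[int, int], horizon: int) -> List[str]:
    """Tick-by-tick trace of one VCPU run as a deferrable server.

    arrivals maps a tick to the units of work released at that tick.
    Trace symbols: 'R' = running, 'S' = work pending but suspended
    (no budget left), '.' = no task to run (budget preserved).
    """
    remaining = 0   # budget left in the current period
    pending = 0     # released but unfinished work
    trace: List[str] = []
    for t in range(horizon):
        if t % period == 0:
            remaining = budget       # replenish at every start of period
        pending += arrivals.get(t, 0)
        if pending > 0 and remaining > 0:
            pending -= 1
            remaining -= 1           # budget is consumed only while running
            trace.append("R")
        elif pending > 0:
            trace.append("S")        # out of budget: wait for replenishment
        else:
            trace.append(".")        # idle: budget is kept, not drained
    return trace


# Server (5, 3); T1 (10, 3) and T2 (10, 3) assumed to release together at 0 and 10.
print("".join(simulate_deferrable_server(5, 3, {0: 6, 10: 6}, 15)))
# Work arriving late in a period still finds the preserved budget.
print("".join(simulate_deferrable_server(5, 3, {3: 2}, 5)))
```

In the second call the server idles for three ticks and then runs the late work immediately, which is the behavior the last bullet describes.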
RT-Xen Investigation Roadmap 
Single-core: RT-Xen 1.0
Single-core enhanced: RT-Xen 1.1
Multi-core: RT-Xen 2.0
[Figure: roadmap diagram. The schedulers gDM, gEDF, pEDF and pDM combine global or partitioned scheduling with fixed priority (DM) or dynamic priority (EDF), each paired with periodic and deferrable servers. Other server labels in the figure: sporadic, polling, work-conserving periodic, capacity-reclaiming periodic]
RT-Xen 2.0: Run Queues 
A run queue 
holds VCPUs that are runnable (have task to run) 
has two parts: VCPUs with budget and out of budget 
is sorted by priority (DM or EDF) within each part 
rt-global: all cores share one run queue with a spinlock 
rt-partition: one run queue per core 
Patches for more efficient implementation on the way! 
[Figure: RunQ layout. VCPUs with budget come first, followed by VCPUs out of budget; each part is sorted by priority]
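The two-part run queue can be sketched as follows. This is an illustrative model rather than the RT-Xen implementation: the VCPU fields, the key functions, and the assumption that a VCPU's relative deadline equals its period (so DM orders by period) are all choices made for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class VCPU:
    name: str
    period: int        # also used as the relative deadline (assumption)
    budget_left: int
    deadline: int      # absolute deadline of the current period


def dm_key(v: VCPU) -> int:
    return v.period    # static priority: shorter relative deadline first


def edf_key(v: VCPU) -> int:
    return v.deadline  # dynamic priority: earlier absolute deadline first


def sort_runq(runnable: List[VCPU], key: Callable[[VCPU], int]) -> List[VCPU]:
    """Order runnable VCPUs: those with budget first, then depleted ones,
    each part sorted by the chosen priority key."""
    with_budget = sorted((v for v in runnable if v.budget_left > 0), key=key)
    depleted = sorted((v for v in runnable if v.budget_left <= 0), key=key)
    return with_budget + depleted


def pick_next(runq: List[VCPU]) -> Optional[VCPU]:
    """The head of the queue runs next, provided it still has budget."""
    return runq[0] if runq and runq[0].budget_left > 0 else None


vcpus = [VCPU("a", 10, 2, 20), VCPU("b", 5, 0, 15), VCPU("c", 20, 1, 12)]
print([v.name for v in sort_runq(vcpus, edf_key)])
print([v.name for v in sort_runq(vcpus, dm_key)])
```

Under rt-global a single such queue would be shared by all cores and protected by a lock, while under rt-partition each core would keep its own; the sketch leaves locking out.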
Experimental Setup 
Hardware: Intel i7 processor, six cores running at 3.33 GHz 
Dedicate one PCPU to domain 0 
All guest VMs use the remaining cores 
Cache architecture 
Each core has dedicated L1 cache (32 KB) and L2 cache (256 KB) 
All six cores share L3 cache (12 MB) 
Inclusive L3 cache, all data in L2 cache must also be in L3 cache 
Software 
Xen 4.3 patched with RT-Xen 
Guest OS: Linux patched with LITMUS^RT
RT-Xen 2.0: Scheduling Overhead 
rt-global has extra overhead due to global lock 
credit has high max overhead due to load balancing
RT-Xen 2.0: Credit Scheduler 
credit missed deadline at 22% CPU capacity 
RT-Xen delivers real-time performance up to 78%
RT-Xen 2.0: Real-Time Performance 
Global scheduling wins empirically! 
gEDF + deferrable server -> best real-time performance
Demo 
YouTube: “RT-Xen Demonstration” 
https://www.youtube.com/watch?v=wisxWn3mR5s 
Patch Status 
Patch RFC v1 & v2: July 10th & July 29th
gEDF + deferrable server
cpupool support
Patch RFC v3: expected on Aug 24th
Scheduling trace support
Performance improvement (splitting RunQ)
Patch RFC v4: expected before Sep 10th
Performance improvement
• Timer-based budget replenishment
• Improve the timing resolution of budget
Conclusion 
Diverse applications demand real-time virtualization 
Real-time virtualization in embedded area 
Cloud gaming 
RT-Xen provides real-time performance 
Efficient implementation of diverse real-time scheduling policies 
Leverage compositional scheduling theory -> analytical guarantee 
gEDF + deferrable server wins empirically 
Research Contributions 
RT-Xen 1.0: S. Xi, J. Wilson, C. Lu, and C.D. Gill, RT-Xen: Towards Real-Time Hypervisor Scheduling in Xen, ACM International Conference on Embedded Software (EMSOFT), 2011
RT-Xen 1.1: J. Lee, S. Xi, S. Chen, L.T.X. Phan, C. Gill, I. Lee, C. Lu, and O. Sokolsky, Realizing Compositional Scheduling through Virtualization, IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), 2012
RT-Xen 2.0: S. Xi, M. Xu, C. Lu, L.T.X. Phan, C. Gill, I. Lee, and O. Sokolsky, Real-Time Multi-Core Virtual Machine Scheduling in Xen, ACM International Conference on Embedded Software (EMSOFT), 2014
RT-Xen 2.1 + RT-OpenStack: S. Xi, C. Li, C. Lu, C. Gill, M. Xu, L.T.X. Phan, I. Lee, and O. Sokolsky, RT-OpenStack: Co-Hosting RT VM with non RT VMs, in submission
RTCA: S. Xi, C. Li, C. Lu, and C. Gill, Prioritizing Local Inter-Domain Communication in Xen, ACM/IEEE International Symposium on Quality of Service (IWQoS), 2013
RT-Xen patch (gEDF with deferrable server)
RT-Xen: Real-Time Virtualization in Xen, Xen Blog, 2013
RT-Xen: Real-Time Virtualization in Xen, Xen Developer Summit, 2014
Backup Slides 
RT-Xen 1.0 
Implementation – PCPU
Three queues within one physical core
VCPU parameters: (period, budget, priority)
[Figure: VCPU position (RunQ or RdyQ) depending on whether the VCPU has budget and whether it has a task to run]
Server Design – Deferrable & Polling
Servers: (Period, Budget, Priority)
S1 (5, 3) with two tasks: T1 (10, 3) and T2 (10, 3)
[Figure: timelines from 0 to 15 of the budget in S1 and the actual execution under a deferrable server (annotated "back-to-back") and under a polling server]
Design questions:
1. Replenish?
2. Budget but NO task?
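The second design question, what happens to budget when the server has no task, is where these two servers differ. The sketch below encodes the textbook behavior as an assumption: a deferrable server keeps unused budget until the end of the period, while a polling server gives it up as soon as it finds no task. Both are replenished at the start of every period here. The function name and trace symbols are hypothetical.

```python
from typing import Dict


def run_server(policy: str, period: int, budget: int,
               arrivals: Dict[int, int], horizon: int) -> str:
    """Trace one server tick by tick.

    'R' = running, 'S' = work pending but no budget, '.' = no task to run.
    """
    if policy not in ("deferrable", "polling"):
        raise ValueError("unknown policy: " + policy)
    remaining, pending, trace = 0, 0, []
    for t in range(horizon):
        if t % period == 0:
            remaining = budget           # periodic replenishment for both servers
        pending += arrivals.get(t, 0)
        if pending > 0 and remaining > 0:
            pending -= 1
            remaining -= 1
            trace.append("R")
        elif pending > 0:
            trace.append("S")
        else:
            if policy == "polling":
                remaining = 0            # budget but no task: discard it
            trace.append(".")            # deferrable: budget is preserved
    return "".join(trace)


# Server (5, 3); two units of work arrive at tick 1, just after the period starts.
print(run_server("deferrable", 5, 3, {1: 2}, 8))
print(run_server("polling", 5, 3, {1: 2}, 8))
```

With the deferrable server the late work runs right away; with the polling server it has to wait for the next replenishment at tick 5.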
Server Design – Periodic & Sporadic
Servers: (Period, Budget, Priority)
S1 (5, 3) with two tasks: T1 (10, 3) and T2 (10, 3)
[Figure: timelines from 0 to 15 of the budget in S1 and the actual execution under a sporadic server (replenishments of +3 annotated, "overhead ++") and under a periodic server (annotated "theory favored")]
Design questions:
1. Replenish?
2. Budget but NO task?
RT-Xen 2.0 
RT-Xen 2.0: Workload 
Periodic task sets: [period, execution time, deadline] 
CPU-intensive, independent tasks 
Randomly generate tasks until the task set reaches a target total utilization, then distribute the tasks across four VMs and apply compositional scheduling theory to compute each VM’s resource interface
25 task sets per data point; measure the fraction of schedulable task sets
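The workload procedure above could be sketched roughly as follows. The slides do not give the period and utilization distributions or the rule for distributing tasks to VMs, so the choices below (a fixed set of periods, uniform per-task utilization, implicit deadlines, round-robin distribution) are placeholders for illustration; the step that derives each VM's resource interface from compositional scheduling theory is not shown.

```python
import random
from typing import List, Tuple

Task = Tuple[int, int, int]  # (period, execution time, deadline)


def generate_task_set(target_util: float, rng: random.Random) -> List[Task]:
    """Add random tasks until the total utilization reaches target_util."""
    tasks: List[Task] = []
    total = 0.0
    while total < target_util:
        period = rng.choice([10, 20, 50, 100])    # placeholder period set
        util = rng.uniform(0.05, 0.3)             # placeholder utilization range
        wcet = max(1, round(util * period))
        tasks.append((period, wcet, period))      # deadline = period (assumption)
        total += wcet / period
    return tasks


def distribute(tasks: List[Task], n_vms: int = 4) -> List[List[Task]]:
    """Spread tasks over the VMs round-robin (placeholder policy)."""
    vms: List[List[Task]] = [[] for _ in range(n_vms)]
    for i, task in enumerate(tasks):
        vms[i % n_vms].append(task)
    return vms


def schedulable_fraction(missed_deadline: List[bool]) -> float:
    """Fraction of task sets in one data point that missed no deadline."""
    return sum(1 for missed in missed_deadline if not missed) / len(missed_deadline)


rng = random.Random(0)
task_set = generate_task_set(2.0, rng)
vms = distribute(task_set)
print(len(task_set), [len(vm) for vm in vms])
```

Each data point would then repeat this for 25 task sets and report `schedulable_fraction` over the 25 measured outcomes.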
RT-Xen 2.0: Context Saved 
Less than 1 us overhead for the spinlock
RT-Xen 2.0: Theory vs. Experiments 
gEDF < pEDF theoretically due to pessimistic analysis 
gEDF > pEDF empirically, thanks to global scheduling
RT-Xen 2.0: How about Cache? 
The benefit of global scheduling dominates the migration cost on a platform with a shared L3 cache.
RT-Xen 2.0: Context Switch 
“Fake” context switch with idle VCPUs

XPDS14 - RT-Xen: Real-Time Virtualization in Xen - Sisu Xi, Washington University
