More Related Content
Similar to Shalini xs10 (20)
More from The Linux Foundation
More from The Linux Foundation (20)
Shalini xs10
- 1. Supporting Soft Real-Time Tasks
Min Lee, A. S. Krishnakumar, P. Krishnan, Navjot Singh, Shalini Yajnik
Paper published in VEE 2010
- 2. Problem Statement
Given a mix of workloads how do you schedule the
workload such that
– Real-time workloads get the needed resources
– Non-real-time workloads are not starved
Main goal:
– Present a new scheduler based on the credit scheduler
which meets the demands of a real-time workload
– Target application studied: Media in an IP telephony system
© 2009 Avaya Inc. All rights reserved. 2
- 3. Real-Time Applications
Requirements of a Real-time application like Media Server
– Highly I/O bound
– Needs timely allocation of compute resources
Challenges when deployed on Xen
– I/O virtualization overhead
– Variable scheduling latency with mixed workloads
– Cache contention between workloads
Scheduler is central to all these issues!
© 2009 Avaya Inc. All rights reserved. 3
- 4. Target Application: Enterprise IP Telephony System
Call-C and Sip-S
(Call setup/tear down)
Gateway/ Media
Server
(stream
IP Endpoint encode/decode)
. .
. Backbone .
. Local Area Network Network .
Local Area Network
Gateway/
Media Server
Media-S
Call-C
Sip-S
Dom0
IP Endpoint
Xen Hypervisor
Hardware
© 2009 Avaya Inc. All rights reserved. 4
- 5. Credit Scheduler: Credit Handling
Consumes credits to run
CPU0 VCPU4 VCPU2 VCPU7 VCPU1 VCPU0 VCPU5
Under Over
priority priority
Every 30ms, distribute credits based on weights
– E.g. 20% CPU to VM 1, 40% CPU to VM 2
– Proportional distribution
– Default weight (256)
© 2009 Avaya Inc. All rights reserved. 5
- 6. Credit Scheduler: I/O Handling
VCPUs woken up and boosted in priority when event
arrives
CPU0 VCPU3 VCPU4 VCPU2 VCPU7 VCPU1 VCPU0 VCPU5
Boost Under Over
priority priority priority
CPU0 VCPU4 VCPU2 VCPU7 VCPU1 VCPU3 VCPU0 VCPU5
Only one time boost (<10ms)
© 2009 Avaya Inc. All rights reserved. 6
- 7. Credit Scheduler: Credit Crisis
Credits
– Good for CPU-bound
Compute-
task intensive Tasks
(supported by
Boost Priority credit)
– Good for Low-latency
tasks, e.g. I/O in Dom0
– Short period boost
Media Servers Low-latency
Tasks (I/O Multi-media
– Need both CPU and low processing, Tasks
latency Interactive) (Soft Real-time)
– Timely CPU CPU- bound
– No support in current
scheduler © 2009 Avaya Inc. All rights reserved. 7
- 8. Laxity-based Scheduler
Four Components in the scheduler
– Laxity (A)
– Boost with event (B)
– Simple Load Balance (L)
– Cache-aware Real-time load balancing (LL)
© 2009 Avaya Inc. All rights reserved. 8
- 9. Experimental Setup
Enterprise Telephony system
– Media server – ‘Media-S’
– Signaling server – ‘Call-C,Sip-S’
– Other VMs
• Dom0
• Cdom [Computational domain]
• Two more domains [Licensing, management]
Two workload scenarios
– Standard (Cdom with no load)
– Cpuload (Cdom has 4 cpu-bound tasks)
© 2009 Avaya Inc. All rights reserved. 9
- 10. Experimental Setup
Media-S
C-Dom
Call-C
Dom0
…
SIP-S
Dell 2950 server
– 2 quad-core Xeon processors
Xen Hypervisor
– 4 GB of RAM
Server
4 cores used: 2 cores from each COMPACT
socket
– So private cache
4cps, sample one out of 4calls
IP network
IP network
– G.711, 20ms packetization
– 30sec hold time
– Max 240
streams through Media-S
tcpdump
PESQ
– Quality of voice metric SIPp and RTP Clients
– Compare the reference
with stream from Media-S © 2009 Avaya Inc. All rights reserved. 10
- 11. Instrumentation
Custom events for Xentrace
CPU utilization
Worst/average wait time
– Each priority
Scheduled Time
Average wait time
– Each priority
Cache misses
– By xenoprof
L2 cache missesAvaya Inc. All rights reserved.
© 2009 11
- 12. Reading the plots
PESQ
– Quality of voice
– 0 (bad)~4.5
– 4.0 (toll quality)
Cumulative Density
Boxplots
– Min/Max
– 25%,75%
percentile
– Median
Poor quality Good quality
© 2009 Avaya Inc. All rights reserved. 12
- 13. Default Credit Scheduler: Performance
Add weights – 512, 1024
– Insignificant impact
Pinning
– Media-S/Call-C require
significant CPU
– Pinned Media-S/Call-C to
CPU 0,1 respectively
– Pinned others to CPU 2,3
– Dom0 is floating
– Significant performance gain Weights Pinning
– Underutilizing CPU!
© 2009 Avaya Inc. All rights reserved. 13
- 14. Laxity (A)
A form of priority
– Target scheduling latency or deadline
– E.g. less than 5ms wait time in the run queue
– Parameter specified by user
Laxity values for Real-time domains
No value specified for Non real-time domains
– Conceptually infinite laxity
© 2009 Avaya Inc. All rights reserved. 14
- 15. Implementation of laxity
Where to insert real-time tasks to meet their deadline
– In over priority, laxity value is ignored.
VCPU4 VCPU2 VCPU7 VCPU1 VCPU0 VCPU5
CPU0 (1us) (7us) (4us) (13us) (40us) (21us)
VCPU#
Boost Under Over (Length of
priority priority priority previous time slice)
5us
20us
20us&boost
© 2009 Avaya Inc. All rights reserved. 15
- 16. Prediction of wait time in runqueue
Each VCPU maintains an
expected run time
– The amount of CPU time it
utilized in its previous run
Works reasonably well
– Min/max
– 25%,75% percentile
More sophisticated formula can
be used
Average difference between
expected and actual wait times
© 2009 Avaya Inc. All rights reserved. 16
- 18. Boost with event (B)
Credit scheduler boosts only waiting VCPUs
– Media-S in run queue doesn’t get boosted
Idea: Boost a VCPU that receives an event
– Even if VCPU with under priority in runqueue
VCPU4 VCPU2 VCPU7 VCPU1 VCPU0 VCPU5
CPU0 (1us) (7us) (4us) (13us) (40us) (21us)
(1) Receiving event
(2) Get boosted within queue
VCPU4 VCPU1 VCPU2 VCPU7 VCPU0 VCPU5
CPU0 (1us) (13us) (7us) (4us) (40us) (21us)
© 2009 Avaya Inc. All rights reserved. 18
- 19. Credit Scheduler: Load Balance
CPU0 VCPU4 VCPU2 VCPU7 VCPU1 VCPU0 VCPU5
(2) Peek peer’s Q
(3) Steal higher priority task
(1) Over or idle task?
CPU1 VCPU10 VCPU15
© 2009 Avaya Inc. All rights reserved. 19
- 20. Simple Load Balance (L)
Laxity-based scheduler (Simple Load Balance)
CPU0 VCPU4 VCPU2 VCPU7 VCPU1 VCPU0 VCPU5
(2) Peek peer’s Q
(3) Steal higher priority task
(4) If same priority
(a) If mine is real time task, don’t steal
(b) If peer’s is real time task, steal
(c) Both non-real-time, compare entrance time into queue
- with some delay (2ms)
(1) Under, over or idle task?
CPU1 VCPU17 VCPU11 VCPU10 VCPU15
© 2009 Avaya Inc. All rights reserved. 20
- 21. Simple Load Balance (L)
Laxity-based scheduler (Simple Load Balance)
CPU0 Media-S VCPU2 VCPU7 VCPU1 VCPU0 VCPU5
(1) This effectively distributes real-time tasks over CPUs
(2) Also prevents starvation
CPU1 VCPU17 VCPU11 VCPU10 VCPU15
© 2009 Avaya Inc. All rights reserved. 21
- 22. Simple Load Balance (L)
Laxity-based scheduler (Simple Load Balance)
CPU0 Media-S VCPU2 VCPU7 VCPU1 VCPU0 VCPU5
(1) Moves non-real-time task if they waited for some time
CPU1 VCPU2 VCPU17 VCPU11 VCPU10 VCPU15
© 2009 Avaya Inc. All rights reserved. 22
- 23. Cache-aware Real-time load balancing (LL)
L good, but…
– Ping-ponging tasks trash cache
Solution: Bind RT-tasks to CPU
– Fix RT-tasks to its initial CPU
– Disable load-balancing for RT-tasks
Creates unbalance of multiple RT-tasks
– Need a new load-balancer for RT-tasks
© 2009 Avaya Inc. All rights reserved. 23
- 24. Cache-aware Real-time load balancing (LL)
Don’t steal peer’s real-time tasks in L
– Prefer hot cache (to low wait time)
New x-sec-load balancer
– Balance of real time task’s cpu utilization via bin-packing
CPU0 90% utilized by one real time task
CPU1 80% utilized by three real time tasks
© 2009 Avaya Inc. All rights reserved. 24
- 26. Result (CPU utilization)
We’re fully utilizing CPUs!
14000
Amount of work done by cpuhogging
12000
10000
8000
process
6000
4000
2000
0
baseline(pinned) baseline A AL ALL
Policies
© 2009 Avaya Inc. All rights reserved. 26
- 27. Result (Cache misses)
Total cache misses down Cache misses for Media Server down
4000 1200
3500
1000
3000
800
2500
Misses (x10000)
2000 600
1500
400
1000
200
500
0
0
baseline
baseline
baseline
baseline
baseline
A
A
A
A
A
AL
ALL
AL
ALL
AL
ALL
AL
ALL
AL
ALL
baseline
A
AL
ALL
Total misses C-dom Domain-0 Call-C Sip-S Media-S
Cache misses for standard configuration
© 2009 Avaya Inc. All rights reserved. 27
- 28. Conclusion
New soft real-time aware scheduler for Xen
– Better support for real-time applications without penalizing
non-real time tasks
– Fully utilizing CPU resources
– Timeliness requirement expressed by laxity
© 2009 Avaya Inc. All rights reserved. 28
- 29. Backup
© 2009 Avaya Inc. All rights reserved.
- 30. Result (Various laxity value)
Lower laxity more realtime better quality
various laxity values - standard configuration
© 2009 Avaya Inc. All rights reserved. 30
- 31. Result (Various laxity value)
Lower laxity more realtime better quality
various laxity values - cpuload configuration
© 2009 Avaya Inc. All rights reserved. 31
- 32. Result (Load balance)
Load balancing is essential for multicore
PESQ improvement through load balancer in standard configuration
© 2009 Avaya Inc. All rights reserved. 32
- 33. Result (Boost with event)
Some impact
Adding Boost with event
© 2009 Avaya Inc. All rights reserved. 33