Demand-Based Coordinated
Scheduling for SMP VMs
Hwanju Kim1, Sangwook Kim2, Jinkyu Jeong1, Joonwon Lee2,
and Seungryoul Maeng1
Korea Advanced Institute of Science and Technology (KAIST)1
Sungkyunkwan University2
The 18th International Conference on Architectural Support for Programming Languages and Operating
Systems (ASPLOS)
Houston, Texas, March 16-20 2013
1
Software Trends in Multi-core Era
• Making the best use of HW parallelism
• Increasing “thread-level parallelism”
[Figure: software stack of apps (SW) running on an OS on a multi-core processor (HW)]
• SW: Apps increasingly being multithreaded
• RMS apps are “emerging killer apps”
(“Convergence of Recognition, Mining, and Synthesis Workloads and Its Implications”, Proceedings of the IEEE, 2008)
• HW: Processors increasingly adding more cores
2/28
Software Trends in Multi-core Era
• Synchronization (communication)
• The greatest obstacle to the performance of
multithreaded workloads
[Figure: threads of multithreaded apps waiting at barriers and on locks; a CPU spin-waiting on a spinlock]
3/28
Software Trends in Multi-core Era
• Virtualization
• Ubiquitous for consolidating multiple workloads
• “Even OSes are workloads to be handled by VMM”
[Figure: a VMM on the processor hosting several SMP VMs, each with its own OS and apps]
• Virtual CPU (vCPU) as a software entity dictated by the VMM scheduler
• “Synchronization-conscious coordination” is essential for the VMM to improve efficiency
4/28
Coordinated Scheduling
[Figure: vCPUs of several VMs time-sharing four pCPUs under the VMM scheduler]
Uncoordinated scheduling
→ A vCPU is treated as an independent entity
Coordinated scheduling
→ Sibling vCPUs (those belonging to the same VM) are treated as a coordinated group
[Figure: a lock-holder vCPU and two lock-waiter vCPUs time-sharing pCPUs; state labels: Running, Waiting, Waiting]
Uncoordinated scheduling makes
inter-vCPU synchronization ineffective
5/28
Prior Efforts for Coordination
• Coscheduling [Ousterhout82]: Synchronizing execution
→ Illusion of dedicated multi-core, but CPU fragmentation
• Relaxed coscheduling [VMware10]: Balancing execution time
→ Stop execution for siblings to catch up
→ Good CPU utilization & coordination, but not based on synchronization demands
• Balance scheduling [Sukwong11]: Balancing pCPU allocation
→ Good CPU utilization & coordination, but not based on synchronization demands
• Selective coscheduling [Weng09,11]…: Coscheduling selected vCPUs
→ Better coordination through explicit information, but relying on user or OS support
[Figures: vCPU execution timelines on four pCPUs for each scheme]
Need for VMM scheduling based on
synchronization (coordination) demands
6/28
Overview
• Demand-based coordinated scheduling
• Identifying synchronization demands
• With non-intrusive design
• Not compromising inter-VM fairness
[Figure: timeline of four pCPUs showing a demand of coscheduling for synchronization and, at a preemption attempt, a demand of delayed preemption for synchronization]
7/28
Coordination Space
• Time and space domains
• Independent scheduling decision for each domain
• Space: Where to schedule?
→ pCPU assignment policy
• Time: When to schedule?
→ Preemptive scheduling policy
→ Coscheduling
→ Delayed preemption
[Figure: a coordinated group of sibling vCPUs among the vCPUs time-sharing four pCPUs]
8/28
Outline
• Motivation
• Coordination in time domain
• Kernel-level coordination demands
• User-level coordination demands
• Coordination in space domain
• Load-conscious balance scheduling
• Evaluation
9/28
Synchronization to be Coordinated
• Synchronization based on “busy-waiting”
• Unnecessary CPU consumption by busy-waiting
for a descheduled vCPU
• Significant performance degradation
• Semantic gap
• “OSes make liberal use of busy-waiting (e.g., spinlock)
since they believe their vCPUs are dedicated”
→ Serious problem in kernel
[Figure: vCPUs time-sharing four pCPUs]
• When and where to demand synchronization?
• How to identify coordination demands?
10/28
Kernel-Level Coordination Demands
• Does kernel really need coordination?
• Experimental analysis
• Multithreaded applications in the PARSEC suite
• Measuring “kernel time” when uncoordinated
[Figure: CPU time (%) split into kernel time and user time for each PARSEC workload (blackscholes, bodytrack, canneal, dedup, facesim, ferret, fluidanimate, freqmine, raytrace, streamcluster, swaptions, vips, x264), measured in an 8-vCPU VM on 8 pCPUs; panels: Solorun (no consolidation) and Corun (w/ 1 VM running streamcluster)]
Kernel time ratio is largely amplified by 1.3x-30x
→ “Newly introduced kernel-level contention”
11/28
Kernel-Level Coordination Demands
• Where is the kernel time amplified?
[Figure: CPU time (%) split into kernel time and user time per PARSEC workload]
[Figure: kernel time breakdown by functions: CPU usage for kernel time (%), split into TLB shootdown, lock spinning, and others]
Dominant sources
1) TLB shootdown
2) Lock spinning
How to identify?
12/28
How to Identify TLB Shootdown?
• TLB shootdown
• Notification of TLB invalidation to a remote CPU
[Figure: two CPUs whose threads share a virtual address space; both TLBs cache V->P1. One CPU modifies (V->P2) or unmaps (V->Null) the mapping, sends an inter-processor interrupt (IPI) to the other CPU, and busy-waits until all corresponding TLB entries are invalidated]
“Busy-waiting for TLB synchronization” is efficient in native systems,
but not in virtualized systems if target vCPUs are not scheduled.
(Even worse if TLBs are synchronized in a broadcast manner)
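The cost argument above can be made concrete with a toy model. This is only a sketch under stated assumptions: the function name `shootdown_wait_us`, the per-recipient acknowledgement cost, and the 30 ms descheduling delay are all hypothetical values chosen for illustration, not measurements from the talk.

```python
def shootdown_wait_us(recipients, ack_cost_us=1.0):
    """Return how long the sender busy-waits for all acknowledgements.

    `recipients` maps a (v)CPU name to the delay in microseconds until that
    (v)CPU next runs on a physical CPU; 0.0 means it is running right now.
    Both the mapping and `ack_cost_us` are illustrative assumptions.
    """
    if not recipients:
        return 0.0
    # The sender spins until the slowest recipient has invalidated its entry.
    return max(delay + ack_cost_us for delay in recipients.values())


# Native system: every target CPU is running, so the wait is one short ack.
native = shootdown_wait_us({"cpu1": 0.0, "cpu2": 0.0, "cpu3": 0.0})

# Virtualized system: one target vCPU is descheduled for a hypothetical 30 ms.
virtualized = shootdown_wait_us({"vcpu1": 0.0, "vcpu2": 30000.0, "vcpu3": 0.0})
```

In this model a single descheduled recipient dominates the sender's wait, which is why broadcast-style shootdowns (more recipients, more chances that one is descheduled) make things worse.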
13/28
How to Identify TLB Shootdown?
• TLB shootdown IPI
• Virtualized by VMM
• Used in x86-based Windows and Linux
[Figure: CPU usage for kernel time (%) per workload, broken down into TLB shootdown, lock spinning, and others]
[Figure: TLB shootdown IPI traffic, in # of IPIs/vCPU/sec per workload]
“A TLB shootdown IPI is a signal for coordination demand!”
→ Co-schedule IPI-recipient vCPUs with their sender vCPU
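A minimal sketch of how a VMM might react to this signal, assuming it can observe the sender and recipients of a virtualized IPI. The `VCpu` class, its `urgent` flag, and the hook name `on_tlb_shootdown_ipi` are hypothetical names for illustration; they are not taken from KVM or from the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class VCpu:
    vm: str               # owning VM; siblings share this value
    index: int            # vCPU number inside the VM
    running: bool = False
    urgent: bool = False  # hypothetical flag: "schedule me with my sender"


def on_tlb_shootdown_ipi(sender, recipients):
    """Flag descheduled recipient vCPUs so they are coscheduled with the sender.

    Returns the list of vCPUs that were flagged.
    """
    boosted = []
    for vcpu in recipients:
        # Only siblings of the sender (same VM) are coordinated.
        if vcpu is sender or vcpu.vm != sender.vm:
            continue
        # A recipient that is already running will acknowledge promptly.
        if not vcpu.running:
            vcpu.urgent = True
            boosted.append(vcpu)
    return boosted
```

Only descheduled siblings are flagged, so running recipients cost nothing extra and vCPUs of other VMs are never boosted by this path.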
14/28
How to Identify Lock Spinning?
• Why excessive lock spinning?
• “Lock-holder preemption (LHP)”
• Short critical section can be unpredictably prolonged by
vCPU preemption
• Which spinlock is problematic?
[Figure: vCPUs time-sharing four pCPUs]
[Figure: spinlock wait time breakdown: lock wait time (%) split into futex wait-queue lock, semaphore wait-queue lock, pagetable lock, runqueue lock, and other locks; annotated values: 82%, 93%]
15/28
How to Identify Lock Spinning?
• Futex
• Linux kernel support for user-level synchronization
(e.g., mutex, barrier, condition variables, etc.)

vCPU1:
mutex_lock(mutex)                /* user-level contention */
/* critical section */
mutex_unlock(mutex)
futex_wake(mutex) {              /* kernel space */
  spin_lock(queue->lock)         /* kernel-level contention */
  thread=dequeue(queue)
  wake_up(thread)                /* sends a reschedule IPI to vCPU2 */
  spin_unlock(queue->lock)
}

vCPU2:
mutex_lock(mutex)                /* user-level contention */
futex_wait(mutex) {              /* kernel space */
  spin_lock(queue->lock)
  enqueue(queue, me)
  spin_unlock(queue->lock)
  schedule() /* blocked */
  /* wake-up */
}
/* critical section */
mutex_unlock(mutex)
futex_wake(mutex) {
  spin_lock(queue->lock)         /* spins if vCPU1 still holds the lock */

If vCPU1 is preempted before releasing its spinlock,
vCPU2 starts busy-waiting on the spinlock held by the preempted vCPU
→ LHP!
16/28
How to Identify Lock Spinning?
• Why preemption-prone?
[Figure: remote thread wake-up on a pCPU with vCPU0 and vCPU1. Between wait-queue lock and wait-queue unlock, IPI emulation causes repeated VMExit / APIC reg access / VMEntry round trips; the sibling vCPU is left spinning on the wait-queue lock]
No more short critical section!
→ Prolonged by VMM intervention
→ Multiple VMM interventions for one IPI transmission
→ Repeated by iterative wake-up
→ Likelihood of preemption
→ Preemption by woken-up sibling
→ Serious issue
17/28
How to Identify Lock Spinning?
• Generalization: “Wait-queue locks”
• Not limited to futex wake-up
• Many wake-up functions in the Linux kernel
• General wake-up
• __wake_up*()
• Semaphore or mutex unlock
• rwsem_wake(), __mutex_unlock_common_slowpath(), …
• “Multithreaded workloads usually communicate
and synchronize on wait-queues”
“A Reschedule IPI is a signal for coordination demand!”
Delay preemption of an IPI-sender vCPU
until a likely-held spinlock is released
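The delayed-preemption rule can be sketched as a small piece of scheduler state. This is an illustrative sketch only: `VCpuState`, `on_reschedule_ipi_sent`, and `may_preempt` are hypothetical names, and the 500 us default simply mirrors the time slice quoted later in the evaluation slides.

```python
class VCpuState:
    """Per-vCPU bookkeeping kept by a hypothetical VMM scheduler."""

    def __init__(self):
        # Time (in microseconds) before which this vCPU should not be preempted.
        self.no_preempt_until_us = 0.0


def on_reschedule_ipi_sent(sender, now_us, utslice_us=500.0):
    """The sender likely holds a wait-queue spinlock: shield it for a short slice."""
    sender.no_preempt_until_us = now_us + utslice_us


def may_preempt(vcpu, now_us):
    """Return True once the protection window has expired."""
    return now_us >= vcpu.no_preempt_until_us
```

Because the window is short and bounded, a vCPU that never releases the lock within it is still preempted afterwards, which limits how much the shield can be abused.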
18/28
Outline
• Motivation
• Coordination in time domain
• Kernel-level coordination demands
• User-level coordination demands
• Coordination in space domain
• Load-conscious balance scheduling
• Evaluation
19/28
vCPU-to-pCPU Assignment
• Balance scheduling [Sukwong11]
• Spreading sibling vCPUs on different pCPUs
• Increase in likelihood of coscheduling
• No coordination in time domain
[Figure: vCPU placement on four pCPUs under uncoordinated scheduling (vCPU stacking) and under balance scheduling (no vCPU stacking)]
Likelihood of coscheduling: uncoordinated scheduling < balance scheduling
20/28
vCPU-to-pCPU Assignment
• Balance scheduling [Sukwong11]
• Limitation
• Assumes that “global CPU loads are well balanced”
• In practice, VMs with fair CPU shares can have
a different # of vCPUs or different TLP (thread-level parallelism)
[Figure: different # of vCPUs: an SMP VM and a UP VM (label: x4 shares)]
[Figure: different TLP: SMP VMs running a multithreaded workload and a single-threaded workload that leaves vCPUs inactive]
[Figure: CPU usage (%) over time (sec) for canneal and dedup]
→ TLP can change within a multithreaded app
[Figure: balance scheduling on imbalanced loads]
→ High scheduling latency
21/28
Proposed Scheme
• Load-conscious balance scheduling
• Adaptive scheme based on pCPU loads
• When assigning a vCPU, check pCPU loads
• If load is balanced
→ Balance scheduling
[Figure: sibling vCPUs spread over four evenly loaded pCPUs]
• If load is imbalanced
→ Favoring underloaded pCPUs
→ CPU load > Avg. CPU load: overloaded
[Figure: vCPUs placed on underloaded pCPUs while one pCPU is overloaded]
Handled by coordination
in time domain
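The placement rule on this slide can be sketched as follows. This is a sketch under assumptions rather than the authors' code: the function name `pick_pcpu`, the dictionary-based load representation, and the `slack` tolerance are hypothetical; the overload test (load above the average) and the choice of the lowest-loaded candidate follow the slides.

```python
def pick_pcpu(loads, sibling_pcpus, slack=0.0):
    """Choose a pCPU for a vCPU that is about to be (re)assigned.

    loads         -- dict mapping pCPU id to its current load
    sibling_pcpus -- set of pCPU ids that already host a sibling vCPU
    slack         -- hypothetical tolerance before a pCPU counts as overloaded
    """
    avg = sum(loads.values()) / len(loads)
    overloaded = {p for p, load in loads.items() if load > avg + slack}

    if overloaded:
        # Imbalanced: favor underloaded pCPUs, even if a sibling is there.
        # Any resulting stacking is left to time-domain coordination.
        candidates = list(loads)
    else:
        # Balanced: plain balance scheduling, i.e. avoid stacking siblings.
        candidates = [p for p in loads if p not in sibling_pcpus] or list(loads)

    # Lowest load wins; the pCPU id breaks ties deterministically.
    return min(candidates, key=lambda p: (loads[p], p))
```

For example, with equal loads and siblings on pCPU0 and pCPU1 the sketch picks pCPU2, whereas with pCPU3 overloaded it picks the least-loaded pCPU regardless of where siblings run.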
22/28
Outline
• Motivation
• Coordination in time domain
• Kernel-level coordination demands
• User-level coordination demands
• Coordination in space domain
• Load-conscious balance scheduling
• Evaluation
23/28
Evaluation
• Implementation
• Based on Linux KVM and CFS
• Evaluation
• Effective time slice
• For coscheduling & delayed preemption
• 500us decided by sensitivity analysis
• Performance improvement
• Alternative
• OS re-engineering
24/28
Evaluation
• SMP VM with UP VMs
• One 8-vCPU VM + four 1-vCPU VMs (x264)
[Figure: performance of 8-vCPU VM: normalized execution time per workload of the 8-vCPU VM for Baseline, Balance, LC-Balance, LC-Balance+Resched-DP, and LC-Balance+Resched-DP+TLB-Co; workloads grouped as non-synchronization-intensive, futex-intensive, and TLB-intensive]
• Futex-intensive
→ 5-53% improvement
• TLB-intensive
→ 20-90% improvement
• Balance scheduling
→ High scheduling latency
LC-Balance: Load-conscious balance scheduling
Resched-DP: Delayed preemption for reschedule IPI
TLB-Co: Coscheduling for TLB shootdown IPI
25/28
Alternative: OS Re-engineering
• Virtualization-friendly re-engineering
• Decoupling reschedule IPI transmission from
thread wake-up
wake_up(queue) {
  spin_lock(queue->lock)
  thread=dequeue(queue)
  wake_up(thread)          /* normally triggers a reschedule IPI here */
  spin_unlock(queue->lock)
}
Delayed reschedule IPI transmission
• Modified wake_up func
• Using per-cpu bitmap
• Applied to futex_wake & futex_requeue
Workloads: one 8-vCPU VM + four 1-vCPU VMs (x264)
Delayed reschedule IPI is a virtualization-friendly way to resolve LHP problems
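The decoupling idea can be illustrated with a user-space simulation. This is a sketch only, not kernel code: `WaitQueue`, `wake_up_all`, and the `send_resched_ipi` callback are hypothetical stand-ins, a `threading.Lock` plays the role of the wait-queue spinlock, and an integer bitmask plays the role of the per-cpu bitmap.

```python
import threading


class WaitQueue:
    """Toy wait queue: a lock plus (thread_name, target_cpu) entries."""

    def __init__(self, waiters):
        self.lock = threading.Lock()
        self.waiters = list(waiters)


def wake_up_all(queue, send_resched_ipi):
    """Wake every waiter, but send reschedule IPIs only after unlocking."""
    pending = 0   # bit i set means "CPU i needs a reschedule IPI"
    woken = []
    with queue.lock:
        while queue.waiters:
            thread, cpu = queue.waiters.pop(0)
            woken.append(thread)
            pending |= 1 << cpu   # record the target instead of sending now
    # The lock is released here, so VMM intervention for IPI emulation
    # no longer lengthens the critical section.
    cpu = 0
    while pending:
        if pending & 1:
            send_resched_ipi(cpu)
        pending >>= 1
        cpu += 1
    return woken
```

A side effect of the bitmap is that several wake-ups targeting the same CPU collapse into a single IPI.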
[Figure: normalized execution time of facesim and streamcluster for Baseline, Baseline w/ DelayedResched, LC_Balance, LC_Balance w/ DelayedResched, and LC_Balance w/ Resched-DP]
26/28
Conclusions & Future Work
• Demand-based coordinated scheduling
• IPI as an effective signal for coordination
• pCPU assignment conscious of dynamic CPU loads
• Limitation
• Cannot cover ALL types of synchronization demands
• Kernel spinlock contention w/o VMM intervention
• Future work
• Cooperation with HW (e.g., PLE) & paravirt
27/28
Thank You!
• Questions and comments
• Contacts
• hjukim@calab.kaist.ac.kr
• http://calab.kaist.ac.kr/~hjukim
28/28
EXTRA SLIDES
29
User-Level Coordination Demands
• Coscheduling-friendly workloads
• SPMD, bulk-synchronous, etc.
• Busy-waiting synchronization
• “Spin-then-block”
[Figure: four threads synchronizing at barriers with spin-then-block waiting (Spin, Block), shown under coscheduling (balanced execution), uncoordinated (skewed execution), and uncoordinated (largely skewed execution) scheduling; skewed execution adds wake-ups and an additional barrier]
→ More blocking operations when uncoordinated
30/28
User-Level Coordination Demands
• Coscheduling
• Avoiding more expensive blocking in a VM
• VMExits for CPU yielding and wake-up
• Halt (HLT) and Reschedule IPI
• When to coschedule?
• User-level synchronization involves reschedule IPIs
[Figure: reschedule IPI traffic of streamcluster, with bursts at barriers]
“A Reschedule IPI is a signal for coordination demand!”
→ Co-schedule IPI-recipient vCPUs with a sender vCPU
→ Providing a knob to selectively enable this coscheduling for coscheduling-friendly VMs
31/28
Urgent vCPU First (UVF) Scheduling
• Urgent vCPU
• 1. Preemptively scheduled if fairness is kept
• 2. Protected from preemption once scheduled
• During “Urgent time slice (utslice)”
[Figure: each pCPU has an urgent queue (FIFO order) and a runqueue (proportional shares order), plus a wait queue. If inter-VM fairness is kept, an urgent vCPU is coscheduled through the urgent queue and protected from preemption]
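The two-queue selection can be sketched like this. It is a simplified sketch under assumptions: `pick_next` and the `fairness_kept` callback are hypothetical names, the runqueue is ordered by a CFS-style virtual runtime as one possible reading of "proportional shares order", and only the head of the urgent queue is examined.

```python
from collections import deque


def pick_next(urgent_queue, runqueue, fairness_kept):
    """Pick the next vCPU for a pCPU.

    urgent_queue  -- deque of vCPU names, served in FIFO order
    runqueue      -- list of (virtual_runtime, name) pairs
    fairness_kept -- callable(name) -> bool, True if running this vCPU now
                     does not violate inter-VM fairness (assumed predicate)

    Returns (name, protected); `protected` means the vCPU should not be
    preempted during its urgent time slice (utslice).
    """
    if urgent_queue and fairness_kept(urgent_queue[0]):
        return urgent_queue.popleft(), True
    if runqueue:
        runqueue.sort()   # smallest virtual runtime first
        return runqueue.pop(0)[1], False
    return None, False
```

An urgent vCPU that would break fairness simply stays queued, and the pCPU falls back to the ordinary proportional-share choice.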
32/28
Proposed Scheme
• Load-conscious balance scheduling
• Adaptive scheme based on pCPU loads
• Balanced loads
→ Balance scheduling
• Imbalanced loads
→ Favoring underloaded pCPUs
• Example
[Figure: vCPUs running on pCPU0-pCPU3 and a vCPU in the wait queue that is about to be assigned]
Candidate pCPU set = {pCPU0, pCPU1, pCPU2, pCPU3}
(Scheduler assigns a lowest-loaded pCPU in this set)
pCPU3 is overloaded
(i.e., CPU load > Avg. CPU load)
Handled by coordination in time domain
(UVF scheduling)
33/28
Evaluation
• Urgent time slice (utslice)
• 1. Utslice for reducing LHP
• 2. Utslice for quickly serving multiple urgent vCPUs
[Figure: # of futex queue LHP vs. utslice (0, 100, 300, 500, 700, 1000 usec) for bodytrack, facesim, and streamcluster]
Workloads:
A futex-intensive workload in one VM
+ dedup in another VM as a preempting VM
>300us utslice
→ 2x-3.8x LHP reduction
Remaining LHPs occur during local wake-up or
before reschedule IPI transmission
→ Unlikely to lead to lock contention
34/28
Evaluation
• Urgent time slice (utslice)
• 1. utslice for reducing LHP
• 2. utslice for quickly serving multiple urgent vCPUs
[Figure: spinlock cycles (%), TLB cycles (%), and average execution time (sec) vs. utslice (100, 500, 1000, 3000, 5000 usec); annotation: ~11% degradation]
Workloads:
3 VMs, each of which runs vips
(vips - TLB-IPI-intensive application)
As utslice increases,
TLB shootdown cycles increase
500usec is an appropriate utslice for both
LHP reduction and multiple urgent vCPUs
35/28
Evaluation
• Urgent allowance
• Improving overall efficiency with fairness
[Figure: spinlock cycles and TLB cycles (CPU cycles (%)) and slowdown of vips and of facesim x 2 vs. urgent allowance (No UVF, 0, 6, 12, 18, 24 msec)]
Workloads:
vips (TLB-IPI-intensive) VM + two facesim VMs
→ Efficient TLB synchronization
→ No performance drop
36/28
Evaluation
• Impact of kernel-level coordination
• One 8-vCPU VM + four 1-vCPU VMs (x264)
[Figure: performance of 1-vCPU VM: normalized execution time of the 1-vCPU VM (x264) per co-running workload, for Baseline, Balance, LC-Balance, LC-Balance+Resched-DP, and LC-Balance+Resched-DP+TLB-Co; annotation: unfair contention under balance scheduling]
LC-Balance: Load-conscious balance scheduling
Resched-DP: Delayed preemption for reschedule IPI
TLB-Co: Coscheduling for TLB shootdown IPI
Balance scheduling → Up to 26% degradation
37/28
Evaluation: Two SMP VMs
[Figure: execution over time of two SMP VMs, solorun vs. corun, w/ dedup and w/ freqmine, for schemes a-e]
a: baseline
b: balance
c: LC-balance
d: LC-balance+Resched-DP
e: LC-balance+Resched-DP+TLB-Co
38/28
Evaluation
• Effectiveness on HW-assisted feature
• CPU feature to reduce the amount of busy-waiting
• VMExit in response to excessive busy-waiting
• Intel Pause-Loop-Exiting (PLE), AMD Pause Filter
• Inevitable cost of some busy-waiting and VMExit
[Figure: under LHP, a vCPU executes PAUSE repeatedly; once a threshold is reached, a VMExit occurs and the vCPU yields]
[Figure: TLB cycles (%), spinlock cycles (%), and normalized execution time for Baseline, LC_Balance, and LC_Balance w/ UVF; panels: streamcluster (futex-intensive) and ferret (TLB-IPI-intensive)]
Reduction in Pause-loop VMExits (%)
Apps            streamcluster  facesim  ferret  vips
Reduction (%)   44.5           97.7     74.0    37.9
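The threshold mechanism can be sketched as a small detector over PAUSE timestamps. This is a simplified model, loosely following the gap/window pair used by pause-loop exiting; the function name `first_ple_exit` and the numeric values below are illustrative assumptions, not settings from the talk.

```python
def first_ple_exit(pause_tsc, ple_gap, ple_window):
    """Return the index of the PAUSE that triggers a VMExit, or None.

    pause_tsc  -- increasing timestamps (cycles) at which PAUSE executes
    ple_gap    -- max distance between two PAUSEs of the same spin loop
    ple_window -- max time a guest may stay in one spin loop
    """
    first = last = None
    for i, now in enumerate(pause_tsc):
        if last is None or now - last > ple_gap:
            # Too far from the previous PAUSE: treat this as a new spin loop.
            first = now
        elif now - first > ple_window:
            # The same loop has been spinning for too long: exit to the VMM.
            return i
        last = now
    return None


# A tight loop issuing PAUSE every 100 cycles (hypothetical numbers).
tight_loop = list(range(0, 5000, 100))
exit_index = first_ple_exit(tight_loop, ple_gap=128, ple_window=4096)

# Sparse PAUSEs 1000 cycles apart never count as one loop.
sparse = list(range(0, 50000, 1000))
no_exit = first_ple_exit(sparse, ple_gap=128, ple_window=4096)
```

The model also shows the inevitable cost noted above: the guest still spins for a full window before the VMExit fires, so avoiding the spin in the first place is cheaper.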
39/28
Evaluation
• Coscheduling-friendly user-level workload
• Streamcluster
• Spin-then-block barrier intensive workload
[Figure: barrier breakdown: # of barrier synchronizations, split into arrival (spin), arrival (block), departure (spin), and departure (block), for UVF w/o Resched-Co and UVF w/ Resched-Co; annotation: additional barriers]
Resched-Co: Coscheduling for reschedule IPI
Blocking: 38%
Reschedule IPIs (3 VMExits): 21%
Additional (departure) barriers: 29%
[Figure: normalized execution time (corunning w/ bodytrack) for 0.1ms spin wait (default), 10x spin wait, and 20x spin wait, UVF w/o Resched-Co vs. UVF w/ Resched-Co]
More performance improvement
as the time of spin-waiting increases
40/28

More Related Content

What's hot

Application Live Migration in LAN/WAN Environment
Application Live Migration in LAN/WAN EnvironmentApplication Live Migration in LAN/WAN Environment
Application Live Migration in LAN/WAN Environment
Mahendra Kutare
 
5. IO virtualization
5. IO virtualization5. IO virtualization
5. IO virtualization
Hwanju Kim
 
VM Live Migration Speedup in Xen
VM Live Migration Speedup in XenVM Live Migration Speedup in Xen
VM Live Migration Speedup in Xen
The Linux Foundation
 
Building a KVM-based Hypervisor for a Heterogeneous System Architecture Compl...
Building a KVM-based Hypervisor for a Heterogeneous System Architecture Compl...Building a KVM-based Hypervisor for a Heterogeneous System Architecture Compl...
Building a KVM-based Hypervisor for a Heterogeneous System Architecture Compl...
Hann Yu-Ju Huang
 
Xen Memory Management
Xen Memory ManagementXen Memory Management
Xen Memory Management
The Linux Foundation
 
Memory Virtualization
Memory VirtualizationMemory Virtualization
Memory Virtualization
Tsuyoshi OZAWA
 
4. Memory virtualization and management
4. Memory virtualization and management4. Memory virtualization and management
4. Memory virtualization and management
Hwanju Kim
 
Virtual Machine Migration Techniques in Cloud Environment: A Survey
Virtual Machine Migration Techniques in Cloud Environment: A SurveyVirtual Machine Migration Techniques in Cloud Environment: A Survey
Virtual Machine Migration Techniques in Cloud Environment: A Survey
ijsrd.com
 
Redesigning Xen Memory Sharing (Grant) Mechanism
Redesigning Xen Memory Sharing (Grant) MechanismRedesigning Xen Memory Sharing (Grant) Mechanism
Redesigning Xen Memory Sharing (Grant) Mechanism
The Linux Foundation
 
webinar vmware v-sphere performance management Challenges and Best Practices
webinar vmware v-sphere performance management Challenges and Best Practiceswebinar vmware v-sphere performance management Challenges and Best Practices
webinar vmware v-sphere performance management Challenges and Best Practices
Metron
 
XS Boston 2008 Quantitative
XS Boston 2008 QuantitativeXS Boston 2008 Quantitative
XS Boston 2008 Quantitative
The Linux Foundation
 
Vm migration techniques
Vm migration techniquesVm migration techniques
Vm migration techniques
garishma bhatia
 
XPDDS18: Memory Overcommitment in XEN - Huang Zhichao, Huawei
XPDDS18: Memory Overcommitment in XEN - Huang Zhichao, HuaweiXPDDS18: Memory Overcommitment in XEN - Huang Zhichao, Huawei
XPDDS18: Memory Overcommitment in XEN - Huang Zhichao, Huawei
The Linux Foundation
 
Virtual Asymmetric Multiprocessor for Interactive Performance of Consolidated...
Virtual Asymmetric Multiprocessor for Interactive Performance of Consolidated...Virtual Asymmetric Multiprocessor for Interactive Performance of Consolidated...
Virtual Asymmetric Multiprocessor for Interactive Performance of Consolidated...
Sangwook Kim
 
Introduction to Virtualization, Virsh and Virt-Manager
Introduction to Virtualization, Virsh and Virt-ManagerIntroduction to Virtualization, Virsh and Virt-Manager
Introduction to Virtualization, Virsh and Virt-Manager
walkerchang
 
Xen PV Performance Status and Optimization Opportunities
Xen PV Performance Status and Optimization OpportunitiesXen PV Performance Status and Optimization Opportunities
Xen PV Performance Status and Optimization Opportunities
The Linux Foundation
 
cloud computing: Vm migration
cloud computing: Vm migrationcloud computing: Vm migration
cloud computing: Vm migration
Dr.Neeraj Kumar Pandey
 
Hypervisors and Virtualization - VMware, Hyper-V, XenServer, and KVM
Hypervisors and Virtualization - VMware, Hyper-V, XenServer, and KVMHypervisors and Virtualization - VMware, Hyper-V, XenServer, and KVM
Hypervisors and Virtualization - VMware, Hyper-V, XenServer, and KVM
vwchu
 
Xen io
Xen ioXen io
Xen io
wangyuanzhf
 
Hardware supports for Virtualization
Hardware supports for VirtualizationHardware supports for Virtualization
Hardware supports for Virtualization
Yoonje Choi
 

What's hot (20)

Application Live Migration in LAN/WAN Environment
Application Live Migration in LAN/WAN EnvironmentApplication Live Migration in LAN/WAN Environment
Application Live Migration in LAN/WAN Environment
 
5. IO virtualization
5. IO virtualization5. IO virtualization
5. IO virtualization
 
VM Live Migration Speedup in Xen
VM Live Migration Speedup in XenVM Live Migration Speedup in Xen
VM Live Migration Speedup in Xen
 
Building a KVM-based Hypervisor for a Heterogeneous System Architecture Compl...
Building a KVM-based Hypervisor for a Heterogeneous System Architecture Compl...Building a KVM-based Hypervisor for a Heterogeneous System Architecture Compl...
Building a KVM-based Hypervisor for a Heterogeneous System Architecture Compl...
 
Xen Memory Management
Xen Memory ManagementXen Memory Management
Xen Memory Management
 
Memory Virtualization
Memory VirtualizationMemory Virtualization
Memory Virtualization
 
4. Memory virtualization and management
4. Memory virtualization and management4. Memory virtualization and management
4. Memory virtualization and management
 
Virtual Machine Migration Techniques in Cloud Environment: A Survey
Virtual Machine Migration Techniques in Cloud Environment: A SurveyVirtual Machine Migration Techniques in Cloud Environment: A Survey
Virtual Machine Migration Techniques in Cloud Environment: A Survey
 
Redesigning Xen Memory Sharing (Grant) Mechanism
Redesigning Xen Memory Sharing (Grant) MechanismRedesigning Xen Memory Sharing (Grant) Mechanism
Redesigning Xen Memory Sharing (Grant) Mechanism
 
webinar vmware v-sphere performance management Challenges and Best Practices
webinar vmware v-sphere performance management Challenges and Best Practiceswebinar vmware v-sphere performance management Challenges and Best Practices
webinar vmware v-sphere performance management Challenges and Best Practices
 
XS Boston 2008 Quantitative
XS Boston 2008 QuantitativeXS Boston 2008 Quantitative
XS Boston 2008 Quantitative
 
Vm migration techniques
Vm migration techniquesVm migration techniques
Vm migration techniques
 
XPDDS18: Memory Overcommitment in XEN - Huang Zhichao, Huawei
XPDDS18: Memory Overcommitment in XEN - Huang Zhichao, HuaweiXPDDS18: Memory Overcommitment in XEN - Huang Zhichao, Huawei
XPDDS18: Memory Overcommitment in XEN - Huang Zhichao, Huawei
 
Virtual Asymmetric Multiprocessor for Interactive Performance of Consolidated...
Virtual Asymmetric Multiprocessor for Interactive Performance of Consolidated...Virtual Asymmetric Multiprocessor for Interactive Performance of Consolidated...
Virtual Asymmetric Multiprocessor for Interactive Performance of Consolidated...
 
Introduction to Virtualization, Virsh and Virt-Manager
Introduction to Virtualization, Virsh and Virt-ManagerIntroduction to Virtualization, Virsh and Virt-Manager
Introduction to Virtualization, Virsh and Virt-Manager
 
Xen PV Performance Status and Optimization Opportunities
Xen PV Performance Status and Optimization OpportunitiesXen PV Performance Status and Optimization Opportunities
Xen PV Performance Status and Optimization Opportunities
 
cloud computing: Vm migration
cloud computing: Vm migrationcloud computing: Vm migration
cloud computing: Vm migration
 
Hypervisors and Virtualization - VMware, Hyper-V, XenServer, and KVM
Hypervisors and Virtualization - VMware, Hyper-V, XenServer, and KVMHypervisors and Virtualization - VMware, Hyper-V, XenServer, and KVM
Hypervisors and Virtualization - VMware, Hyper-V, XenServer, and KVM
 
Xen io
Xen ioXen io
Xen io
 
Hardware supports for Virtualization
Hardware supports for VirtualizationHardware supports for Virtualization
Hardware supports for Virtualization
 

Viewers also liked

Heat在企业中的应用实践
Heat在企业中的应用实践Heat在企业中的应用实践
Heat在企业中的应用实践
xuanlangjian
 
Task-aware Virtual Machine Scheduling for I/O Performance
Task-aware Virtual Machine Scheduling for I/O PerformanceTask-aware Virtual Machine Scheduling for I/O Performance
Task-aware Virtual Machine Scheduling for I/O Performance
Hwanju Kim
 
Extending TripleO for OpenStack Management
Extending TripleO for OpenStack ManagementExtending TripleO for OpenStack Management
Extending TripleO for OpenStack Management
Keith Basil
 
Become An OpenStack TripleO ATC - Easy As ABC
Become An OpenStack TripleO ATC - Easy As ABCBecome An OpenStack TripleO ATC - Easy As ABC
Become An OpenStack TripleO ATC - Easy As ABC
K Rain Leander
 
TripleO
 TripleO TripleO
TripleO
Kiran Murari
 
Heat optimization
Heat optimizationHeat optimization
Heat optimization
Rico Lin
 
1.Introduction to virtualization
1.Introduction to virtualization1.Introduction to virtualization
1.Introduction to virtualization
Hwanju Kim
 

Viewers also liked (7)

Heat在企业中的应用实践
Heat在企业中的应用实践Heat在企业中的应用实践
Heat在企业中的应用实践
 
Task-aware Virtual Machine Scheduling for I/O Performance
Task-aware Virtual Machine Scheduling for I/O PerformanceTask-aware Virtual Machine Scheduling for I/O Performance
Task-aware Virtual Machine Scheduling for I/O Performance
 
Extending TripleO for OpenStack Management
Extending TripleO for OpenStack ManagementExtending TripleO for OpenStack Management
Extending TripleO for OpenStack Management
 
Become An OpenStack TripleO ATC - Easy As ABC
Become An OpenStack TripleO ATC - Easy As ABCBecome An OpenStack TripleO ATC - Easy As ABC
Become An OpenStack TripleO ATC - Easy As ABC
 
TripleO
 TripleO TripleO
TripleO
 
Heat optimization
Heat optimizationHeat optimization
Heat optimization
 
1.Introduction to virtualization
1.Introduction to virtualization1.Introduction to virtualization
1.Introduction to virtualization
 

Similar to Demand-Based Coordinated Scheduling for SMP VMs

RTOS Material hfffffffffffffffffffffffffffffffffffff
RTOS Material hfffffffffffffffffffffffffffffffffffffRTOS Material hfffffffffffffffffffffffffffffffffffff
RTOS Material hfffffffffffffffffffffffffffffffffffff
adugnanegero
 
Mastering Real-time Linux
Mastering Real-time LinuxMastering Real-time Linux
Mastering Real-time Linux
Jean-François Deverge
 
mTCP使ってみた
mTCP使ってみたmTCP使ってみた
mTCP使ってみた
Hajime Tazaki
 
Achieving Performance Isolation with Lightweight Co-Kernels
Achieving Performance Isolation with Lightweight Co-KernelsAchieving Performance Isolation with Lightweight Co-Kernels
Achieving Performance Isolation with Lightweight Co-Kernels
Jiannan Ouyang, PhD
 
Project ACRN CPU sharing BVT scheduler in ACRN hypervisor
Project ACRN CPU sharing BVT scheduler in ACRN hypervisorProject ACRN CPU sharing BVT scheduler in ACRN hypervisor
Project ACRN CPU sharing BVT scheduler in ACRN hypervisor
Project ACRN
 
Shoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUs
Shoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUsShoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUs
Shoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUs
Jiannan Ouyang, PhD
 
Network Stack in Userspace (NUSE)
Network Stack in Userspace (NUSE)Network Stack in Userspace (NUSE)
Network Stack in Userspace (NUSE)
Hajime Tazaki
 
PFQ@ 9th Italian Networking Workshop (Courmayeur)
PFQ@ 9th Italian Networking Workshop (Courmayeur)PFQ@ 9th Italian Networking Workshop (Courmayeur)
PFQ@ 9th Italian Networking Workshop (Courmayeur)
Nicola Bonelli
 
Advanced performance troubleshooting using esxtop
Advanced performance troubleshooting using esxtopAdvanced performance troubleshooting using esxtop
Advanced performance troubleshooting using esxtop
Alan Renouf
 
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kevin Lynch
 
Ensuring performance for real time packet processing in open stack white paper
Demand-Based Coordinated Scheduling for SMP VMs

  • 1. Demand-Based Coordinated Scheduling for SMP VMs • Hwanju Kim (1), Sangwook Kim (2), Jinkyu Jeong (1), Joonwon Lee (2), and Seungryoul Maeng (1) • (1) Korea Advanced Institute of Science and Technology (KAIST), (2) Sungkyunkwan University • The 18th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Houston, Texas, March 16-20, 2013
  • 2. Software Trends in Multi-core Era • Making the best use of HW parallelism • Increasing “thread-level parallelism” • Processors are increasingly adding more cores • Apps are increasingly multithreaded • RMS apps are “emerging killer apps” (source: “Convergence of Recognition, Mining, and Synthesis Workloads and Its Implications”, Proceedings of the IEEE, 2008) • [Figure: HW/SW stack with a processor, an OS, and multiple apps]
  • 3. Software Trends in Multi-core Era • Synchronization (communication) • The greatest obstacle to the performance of multithreaded workloads • [Figure: HW/SW stack; threads on CPUs waiting at barriers, on lock waits, and in spinlock spin waits]
  • 4. Software Trends in Multi-core Era • Virtualization • Ubiquitous for consolidating multiple workloads • “Even OSes are workloads to be handled by VMM” • A virtual CPU (vCPU) is a software entity dictated by the VMM scheduler • “Synchronization-conscious coordination” is essential for the VMM to improve efficiency • [Figure: HW/SW stack with several SMP VMs, each running an OS and apps, on top of a VMM]
  • 5. Coordinated Scheduling • Uncoordinated scheduling: a vCPU is treated as an independent entity • Coordinated scheduling: sibling vCPUs (those that belong to the same VM) are treated as a group • Uncoordinated scheduling makes inter-vCPU synchronization ineffective • [Figure: vCPUs time-shared on four pCPUs by the VMM scheduler, first as independent entities and then as a coordinated group; a lock holder (running) and two lock waiters (waiting)]
  • 6. Prior Efforts for Coordination • Coscheduling [Ousterhout82]: synchronizing execution • Illusion of dedicated multi-core, but CPU fragmentation • Relaxed coscheduling [VMware10]: balancing execution time • Stop execution for siblings to catch up • Good CPU utilization & coordination, but not based on synchronization demands • Balance scheduling [Sukwong11]: balancing pCPU allocation • Good CPU utilization & coordination, but not based on synchronization demands • Selective coscheduling [Weng09,11]…: coscheduling selected vCPUs • Better coordination through explicit information, but relying on user or OS support • Need for VMM scheduling based on synchronization (coordination) demands • [Figures: vCPU execution timelines on four pCPUs for each scheme]
  • 7. Overview • Demand-based coordinated scheduling • Identifying synchronization demands • With non-intrusive design • Not compromising inter-VM fairness • [Figure: timeline on four pCPUs showing a demand of coscheduling for synchronization and, at a preemption attempt, a demand of delayed preemption for synchronization]
  • 8. Coordination Space • Time and space domains • Independent scheduling decision for each domain • Time (when to schedule?): preemptive scheduling policy, namely coscheduling and delayed preemption • Space (where to schedule?): pCPU assignment policy • [Figure: a coordinated group of vCPUs placed on four pCPUs]
  • 9. Outline • Motivation • Coordination in time domain • Kernel-level coordination demands • User-level coordination demands • Coordination in space domain • Load-conscious balance scheduling • Evaluation
  • 10. Synchronization to be Coordinated • Synchronization based on “busy-waiting” • Unnecessary CPU consumption by busy-waiting for a descheduled vCPU • Significant performance degradation • Semantic gap • “OSes make liberal use of busy-waiting (e.g., spinlock) since they believe their vCPUs are dedicated” → Serious problem in kernel • When and where to demand synchronization? • How to identify coordination demands?
  • 11. Kernel-Level Coordination Demands • Does the kernel really need coordination? • Experimental analysis • Multithreaded applications in the PARSEC suite • Measuring “kernel time” when uncoordinated • Setup: an 8-vCPU VM on 8 pCPUs, solorun (no consolidation) versus corun (w/ 1 VM running streamcluster) • Kernel time ratio is largely amplified by x1.3-x30 → “Newly introduced kernel-level contention” • [Figure: two stacked bar charts (solorun and corun) of CPU time (%) split into kernel time and user time for blackscholes, bodytrack, canneal, dedup, facesim, ferret, fluidanimate, freqmine, raytrace, streamcluster, swaptions, vips, and x264]
  • 12. Kernel-Level Coordination Demands • Where is the kernel time amplified? • Dominant sources: 1) TLB shootdown 2) Lock spinning • How to identify? • [Figure: CPU time (%) split into kernel time and user time per PARSEC application, next to a kernel time breakdown by functions (TLB shootdown, lock spinning, others)]
  • 13. How to Identify TLB Shootdown? • TLB shootdown • Notification of TLB invalidation to a remote CPU • Sent as an inter-processor interrupt (IPI) when a mapping is modified or unmapped (V->P1 becomes V->P2 or V->Null) • Busy-waiting until all corresponding TLB entries are invalidated • “Busy-waiting for TLB synchronization” is efficient in native systems, but not in virtualized systems if target vCPUs are not scheduled (even worse if TLBs are synchronized in a broadcast manner) • [Figure: two CPUs, each running a thread of the same virtual address space and caching V->P1 in its TLB]
  • 14. How to Identify TLB Shootdown? • TLB shootdown IPI • Virtualized by VMM • Used in x86-based Windows and Linux • “A TLB shootdown IPI is a signal for coordination demand!” → Co-schedule IPI-recipient vCPUs with their sender vCPU • [Figure: CPU usage for kernel time (%) broken down into TLB shootdown, lock spinning, and others, next to TLB shootdown IPI traffic (# of IPIs/vCPU/sec, axis 0-2000), per PARSEC application]
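The rule on slide 14 (treat a TLB shootdown IPI as a coscheduling demand) can be illustrated with a small Python model. This is only a sketch under stated assumptions, not the authors' KVM implementation: the `VCpu` class, the `on_ipi` hook, and the string constants standing in for IPI vectors are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for IPI vectors; real vector numbers are OS-specific.
TLB_SHOOTDOWN = "tlb_shootdown"
RESCHEDULE = "reschedule"


@dataclass
class VCpu:
    vm: str                # name of the owning VM
    index: int             # vCPU number inside that VM
    running: bool = False  # currently scheduled on a pCPU?
    urgent: bool = False   # flagged for coscheduling with the IPI sender


def on_ipi(vector, sender, recipients):
    """Modelled VMM hook, called while an IPI from a guest is being emulated.

    Returns the recipient vCPUs that were flagged as urgent.
    """
    flagged = []
    if vector != TLB_SHOOTDOWN:
        return flagged
    for vcpu in recipients:
        # Only descheduled siblings keep the sender busy-waiting.
        if vcpu.vm == sender.vm and not vcpu.running:
            vcpu.urgent = True
            flagged.append(vcpu)
    return flagged
```

In this model only descheduled recipients are flagged, since recipients that are already running can acknowledge the invalidation without scheduler help.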
  • 15. How to Identify Lock Spinning? • Why excessive lock spinning? • “Lock-holder preemption (LHP)” • Short critical section can be unpredictably prolonged by vCPU preemption • Which spinlock is problematic? • [Figure: spinlock wait time breakdown (lock wait time, %) into futex wait-queue lock, semaphore wait-queue lock, pagetable lock, runqueue lock, and other locks; chart callouts of 82% and 93%]
  • 16. How to Identify Lock Spinning? • Futex • Linux kernel support for user-level synchronization (e.g., mutex, barrier, condition variables, etc.) • User-level contention: vCPU1 runs mutex_lock(mutex), its critical section, and mutex_unlock(mutex), while vCPU2 calls mutex_lock(mutex) and blocks in futex_wait(mutex) { spin_lock(queue->lock); enqueue(queue, me); spin_unlock(queue->lock); schedule() /* blocked */ } • Kernel-level contention: vCPU1's unlock enters futex_wake(mutex) { spin_lock(queue->lock); thread = dequeue(queue); wake_up(thread); spin_unlock(queue->lock) }, where wake_up(thread) sends a reschedule IPI to vCPU2; the woken vCPU2 runs its critical section, calls mutex_unlock(mutex), and enters futex_wake(mutex), which starts with spin_lock(queue->lock) • If vCPU1 is preempted before releasing its spinlock, vCPU2 starts busy-waiting on the preempted spinlock → LHP!
  • 17. How to Identify Lock Spinning? • Why preemption-prone? • Wait-queue lock spinning • Critical section prolonged by VMM intervention • Multiple VMM interventions for one IPI transmission (VMExits for APIC register accesses and IPI emulation, each followed by a VMEntry) • Repeated by iterative wake-up • No more short critical section! → Likelihood of preemption • Preemption by woken-up sibling → Serious issue • [Figure: timeline of vCPU0 and vCPU1 on one pCPU; a remote thread wake-up between wait-queue lock and wait-queue unlock triggers repeated VMExit/VMEntry pairs]
  • 18. How to Identify Lock Spinning? • Generalization: “Wait-queue locks” • Not limited to futex wake-up • Many wake-up functions in the Linux kernel • General wake-up • __wake_up*() • Semaphore or mutex unlock • rwsem_wake(), __mutex_unlock_common_slowpath(), … • “Multithreaded workloads usually communicate and synchronize on wait-queues” • “A Reschedule IPI is a signal for coordination demand!” → Delay preemption of an IPI-sender vCPU until a likely-held spinlock is released
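Delayed preemption for a reschedule-IPI sender (slide 18) might be modelled as a short protection window that starts when the IPI is observed. The sketch below is a simplification and not the authors' code: `VCpuState` and its methods are hypothetical, and the 500 us window is borrowed from the effective time slice quoted on the evaluation slide.

```python
# Protection window in microseconds. The 500 us value is the effective time
# slice quoted on the evaluation slide; treat it as a tunable assumption.
UTSLICE_US = 500


class VCpuState:
    """Per-vCPU bookkeeping kept by a modelled VMM scheduler (hypothetical)."""

    def __init__(self):
        self.protected_until_us = 0

    def on_reschedule_ipi_sent(self, now_us):
        # The sender is likely still inside a wait-queue critical section,
        # so shield it from preemption for a short window.
        self.protected_until_us = now_us + UTSLICE_US

    def may_preempt(self, now_us):
        # A preemption attempt succeeds only after the window has expired.
        return now_us >= self.protected_until_us
```

Because the VMM cannot see when the guest actually releases the spinlock, a time-bounded window stands in for "until a likely-held spinlock is released".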
  • 19. Outline • Motivation • Coordination in time domain • Kernel-level coordination demands • User-level coordination demands • Coordination in space domain • Load-conscious balance scheduling • Evaluation
  • 20. vCPU-to-pCPU Assignment • Balance scheduling [Sukwong11] • Spreading sibling vCPUs on different pCPUs • Increase in likelihood of coscheduling • No coordination in time domain • Likelihood of coscheduling: uncoordinated scheduling (vCPU stacking) < balance scheduling (no vCPU stacking) • [Figure: vCPU placement on four pCPUs under uncoordinated scheduling and under balance scheduling]
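A minimal sketch of the placement rule behind balance scheduling (slide 20), written as a standalone Python function. The function name, the runqueue representation, the shortest-runqueue tie-break, and the fallback when every pCPU already hosts a sibling are illustrative assumptions; the slides only state that sibling vCPUs are spread over different pCPUs.

```python
def balance_assign(vm, runqueues):
    """Pick a pCPU for a vCPU of `vm`, avoiding pCPUs that queue a sibling.

    runqueues maps a pCPU id to the list of VM names whose vCPUs are queued
    there.
    """
    # pCPUs that hold no sibling vCPU of this VM (no vCPU stacking).
    free = [p for p, queue in runqueues.items() if vm not in queue]
    # Assumed fallback: if siblings are everywhere, consider every pCPU.
    candidates = free or list(runqueues)
    # Assumed tie-break: shortest runqueue first, then lowest pCPU id.
    return min(candidates, key=lambda p: (len(runqueues[p]), p))
```

For example, with runqueues `{0: ["a"], 1: ["a", "b"], 2: ["b"], 3: []}`, a new vCPU of VM `a` lands on pCPU 3, the emptiest pCPU without a sibling.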
  • 21. vCPU-to-pCPU Assignment • Balance scheduling [Sukwong11] • Limitation • Based on the assumption that “global CPU loads are well balanced” • In practice, VMs with fair CPU shares can have • Different # of vCPUs (an SMP VM next to a UP VM; the figure labels a vCPU with “x4 shares”) • Different TLP (an SMP VM running a multithreaded workload next to an SMP VM running a single-threaded workload with inactive vCPUs) • TLP can be changed in a multithreaded app • TLP: Thread-level parallelism • Balance scheduling on imbalanced loads → High scheduling latency • [Figure: CPU usage (%, axis 0-800) over time (sec) for canneal and dedup]
  • 22. Proposed Scheme • Load-conscious balance scheduling • Adaptive scheme based on pCPU loads • When assigning a vCPU, check pCPU loads • If load is balanced → Balance scheduling • If load is imbalanced → Favoring underloaded pCPUs (figure annotation for this case: handled by coordination in time domain) • CPU load > Avg. CPU load → overloaded • [Figure: vCPU placement on four pCPUs in the balanced and imbalanced cases]
  • 23. Outline • Motivation • Coordination in time domain • Kernel-level coordination demands • User-level coordination demands • Coordination in space domain • Load-conscious balance scheduling • Evaluation
  • 24. Evaluation • Implementation • Based on Linux KVM and CFS • Evaluation • Effective time slice • For coscheduling & delayed preemption • 500us decided by sensitivity analysis • Performance improvement • Alternative • OS re-engineering
  • 25. Evaluation • SMP VM with UP VMs • One 8-vCPU VM + four 1-vCPU VMs (x264) • Performance of 8-vCPU VM • Futex-intensive → 5-53% improvement • TLB-intensive → 20-90% improvement • Non-synchronization-intensive workloads: chart callouts “Balance scheduling” and “High scheduling latency” • LC-Balance: Load-conscious balance scheduling • Resched-DP: Delayed preemption for reschedule IPI • TLB-Co: Coscheduling for TLB shootdown IPI • [Figure: normalized execution time (axis 0.00-2.00) per workload of the 8-vCPU VM for Baseline, Balance, LC-Balance, LC-Balance+Resched-DP, and LC-Balance+Resched-DP+TLB-Co]
  • 26. Alternative: OS Re-engineering • Virtualization-friendly re-engineering • Decoupling reschedule IPI transmission from thread wake-up • Wake-up path: wake_up(queue) { spin_lock(queue->lock); thread = dequeue(queue); wake_up(thread); spin_unlock(queue->lock) }, annotated with “Reschedule IPI” and “Delayed reschedule IPI transmission” • Modified wake_up func • Using per-cpu bitmap • Applied to futex_wakeup & futex_requeue • Delayed reschedule IPI is virtualization-friendly to resolve LHP problems • [Figure: normalized execution time (axis 0.00-1.20) of facesim and streamcluster with one 8-vCPU VM + four 1-vCPU VMs (x264), for Baseline, Baseline w/ DelayedResched, LC_Balance, LC_Balance w/ DelayedResched, and LC_Balance w/ Resched-DP]
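The idea of decoupling reschedule IPI transmission from thread wake-up (slide 26) can be sketched in Python as a wait queue that records target CPUs in a bitmap while the lock is held and sends the IPIs only after the lock is released. This is an illustrative model, not the modified Linux `wake_up` path: `WaitQueue`, `wake_up_all`, and the `send_ipi` callback are hypothetical, a local integer stands in for the per-cpu bitmap mentioned on the slide, and sending right after the unlock is an assumption about when the delayed transmission happens.

```python
import threading


class WaitQueue:
    """Toy wait queue whose wake-up path defers reschedule IPIs."""

    def __init__(self, send_ipi):
        self.lock = threading.Lock()
        self.waiters = []         # (thread_name, cpu) pairs, oldest first
        self.send_ipi = send_ipi  # hypothetical callback: send_ipi(cpu)

    def wake_up_all(self):
        pending = 0  # bitmap: bit n set means CPU n needs a reschedule IPI
        with self.lock:
            while self.waiters:
                _thread, cpu = self.waiters.pop(0)
                pending |= 1 << cpu  # record the target instead of sending
        # The lock is released here, so IPI emulation by the VMM no longer
        # stretches the critical section.
        cpu = 0
        while pending:
            if pending & 1:
                self.send_ipi(cpu)
            pending >>= 1
            cpu += 1
```

A side effect of the bitmap is that several wake-ups targeting the same CPU collapse into a single IPI.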
  • 27. Conclusions & Future Work • Demand-based coordinated scheduling • IPI as an effective signal for coordination • pCPU assignment conscious of dynamic CPU loads • Limitation • Cannot cover ALL types of synchronization demands • Kernel spinlock contention w/o VMM intervention • Future work • Cooperation with HW (e.g., PLE) & paravirt • [Figure: barrier or lock within an address space]
  • 28. Thank You! • Questions and comments • Contacts • hjukim@calab.kaist.ac.kr • http://calab.kaist.ac.kr/~hjukim
  • 30. User-Level Coordination Demands • Coscheduling-friendly workloads • SPMD, bulk-synchronous, etc. • Busy-waiting synchronization • “Spin-then-block” • More blocking operations when uncoordinated • [Figure: four threads synchronizing at barriers by spinning and then blocking, under coscheduling (balanced execution), uncoordinated (skewed execution), and uncoordinated (largely skewed execution); the skewed cases show more wake-ups and an additional barrier]
  • 31. User-Level Coordination Demands • Coscheduling • Avoiding more expensive blocking in a VM • VMExits for CPU yielding and wake-up • Halt (HLT) and Reschedule IPI • When to coschedule? • User-level synchronization involves reschedule IPIs • “A Reschedule IPI is a signal for coordination demand!” → Co-schedule IPI-recipient vCPUs with a sender vCPU • Providing a knob to selectively enable this coscheduling for coscheduling-friendly VMs • [Figure: reschedule IPI traffic of streamcluster, with bursts labeled “Barriers”]
  • 32. Urgent vCPU First (UVF) Scheduling • Urgent vCPU • 1. Preemptively scheduled if fairness is kept • 2. Protected from preemption once scheduled • During “Urgent time slice (utslice)” • Urgent queue: FIFO order • Runqueue: proportional shares order • If inter-VM fairness is kept, urgent vCPUs are coscheduled and protected from preemption • [Figure: two pCPUs, each with an urgent queue and a runqueue, plus a wait queue; urgent vCPUs are highlighted]
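The two queues on slide 32 suggest a pick-next routine along the following lines. This is a hedged Python sketch rather than the authors' scheduler: the `PCpu` class, the `fairness_ok` callback, and the numeric runqueue key are hypothetical, and an urgent vCPU is modelled as living only in the urgent queue, which a real implementation may handle differently.

```python
from collections import deque


class PCpu:
    """Toy per-pCPU scheduler state with an urgent queue and a runqueue."""

    def __init__(self):
        self.urgent_queue = deque()  # urgent vCPUs in FIFO order
        self.runqueue = []           # (share_key, vcpu); lowest key runs first

    def pick_next(self, fairness_ok):
        """Return the next vCPU to run, or None if nothing is runnable.

        fairness_ok(vcpu) is a hypothetical predicate that says whether
        running this urgent vCPU now keeps inter-VM fairness.
        """
        # Urgent vCPUs go first, in FIFO order, but only if fairness is kept.
        urgent = next((v for v in self.urgent_queue if fairness_ok(v)), None)
        if urgent is not None:
            self.urgent_queue.remove(urgent)
            return urgent
        # Otherwise fall back to proportional-share order.
        if self.runqueue:
            self.runqueue.sort(key=lambda entry: entry[0])
            return self.runqueue.pop(0)[1]
        return None
```

An urgent vCPU that fails the fairness check stays queued, so it can still be picked later once the predicate allows it.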
  • 33. Proposed Scheme • Load-conscious balance scheduling • Adaptive scheme based on pCPU loads • Balanced loads → Balance scheduling • Imbalanced loads → Favoring underloaded pCPUs • Example: candidate pCPU set = {pCPU0, pCPU1, pCPU2, pCPU3} (scheduler assigns a lowest-loaded pCPU in this set); pCPU3 is overloaded (i.e., CPU load > Avg. CPU load) • Handled by coordination in time domain (UVF scheduling) • [Figure: vCPUs leaving the wait queue and being placed on pCPU0-pCPU3]
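The example on slide 33 can be turned into a short Python sketch of load-conscious placement. Only the overload test (CPU load above the average) and the choice of the lowest-loaded candidate come from the slides; the function name, the data layout, the `tolerance` parameter, and the decision to allow sibling stacking on underloaded pCPUs when loads are imbalanced are assumptions made for illustration.

```python
def lc_balance_assign(vm, loads, siblings_on, tolerance=0.0):
    """Pick a pCPU for a waking vCPU of `vm`.

    loads maps a pCPU id to its CPU load; siblings_on maps a pCPU id to the
    set of VM names that already have a vCPU queued there.
    """
    avg = sum(loads.values()) / len(loads)
    # Slide rule: a pCPU is overloaded when its load exceeds the average.
    # `tolerance` is a hypothetical slack so tiny differences do not count.
    underloaded = [p for p in loads if loads[p] <= avg + tolerance]
    if len(underloaded) == len(loads):
        # Balanced loads: plain balance scheduling, avoid sibling stacking.
        no_sibling = [p for p in loads if vm not in siblings_on.get(p, set())]
        candidates = no_sibling or list(loads)
    else:
        # Imbalanced loads: favor underloaded pCPUs even if a sibling is
        # there (assumed), leaving that case to time-domain coordination.
        candidates = underloaded
    # Lowest-loaded pCPU in the candidate set, ties broken by pCPU id.
    return min(candidates, key=lambda p: (loads[p], p))
```

With loads `{0: 1.0, 1: 1.0, 2: 0.5, 3: 3.0}` the average is 1.375, so pCPU3 is overloaded and the function returns pCPU2, the lowest-loaded remaining candidate.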
  • 34. Evaluation • Urgent time slice (utslice) • 1. Utslice for reducing LHP • 2. Utslice for quickly serving multiple urgent vCPUs • Workloads: a futex-intensive workload in one VM + dedup in another VM as a preempting VM • >300us utslice → 2x-3.8x LHP reduction • Remaining LHPs occur during local wake-up or before reschedule IPI transmission → Unlikely to lead to lock contention • [Figure: # of futex queue LHPs (axis 0-9000) versus utslice (0, 100, 300, 500, 700, 1000 usec) for bodytrack, facesim, and streamcluster]
  • 35. Evaluation • Urgent time slice (utslice) • 1. Utslice for reducing LHP • 2. Utslice for quickly serving multiple urgent vCPUs • Workloads: 3 VMs, each of which runs vips (a TLB-IPI-intensive application) • As utslice increases, TLB shootdown cycles increase (chart callout: ~11% degradation) • 500usec is an appropriate utslice for both LHP reduction and multiple urgent vCPUs • [Figure: spinlock cycles (%), TLB cycles (%), and average execution time (sec) versus utslice (100, 500, 1000, 3000, 5000 usec)]
  • 36. Evaluation • Urgent allowance • Improving overall efficiency with fairness • Workloads: vips (TLB-IPI-intensive) VM + two facesim VMs • Chart callouts: efficient TLB synchronization; no performance drop • [Figure: spinlock cycles (%), TLB cycles (%), and slowdown of vips and of the two facesim VMs versus urgent allowance (No UVF, 0, 6, 12, 18, 24 msec)]
  • 37. Evaluation • Impact of kernel-level coordination • One 8-vCPU VM + four 1-vCPU VMs (x264) • Performance of 1-vCPU VM • Chart callouts: unfair contention; balance scheduling → up to 26% degradation • LC-Balance: Load-conscious balance scheduling • Resched-DP: Delayed preemption for reschedule IPI • TLB-Co: Coscheduling for TLB shootdown IPI • [Figure: normalized execution time (axis 0.00-1.50) of the 1-vCPU VM (x264) per co-running workload for Baseline, Balance, LC-Balance, LC-Balance+Resched-DP, and LC-Balance+Resched-DP+TLB-Co]
  • 38. Evaluation: Two SMP VMs • a: baseline • b: balance • c: LC-balance • d: LC-balance+Resched-DP • e: LC-balance+Resched-DP+TLB-Co • [Figure: solorun and corun results over time for configurations a-e, with panels for co-running w/ dedup and w/ freqmine]
  • 39. Evaluation • Effectiveness on HW-assisted feature • CPU feature to reduce the amount of busy-waiting • VMExit in response to excessive busy-waiting • Intel Pause-Loop-Exiting (PLE), AMD Pause Filter • Inevitable cost of some busy-waiting and VMExit (on LHP, PAUSE repeats until a threshold, then a VMExit and yielding) • Reduction in pause-loop VMExits (%): streamcluster 44.5, facesim 97.7, ferret 74.0, vips 37.9 • [Figure: TLB cycles (%), spinlock cycles (%), and normalized execution time for Baseline, LC_Balance, and LC_Balance w/ UVF; panels for streamcluster (futex-intensive) and ferret (TLB-IPI-intensive)]
  • 40. Evaluation • Coscheduling-friendly user-level workload • Streamcluster • Spin-then-block barrier intensive workload • Resched-Co: Coscheduling for reschedule IPI • Barrier breakdown chart callouts: blocking 38%, reschedule IPIs (3 VMExits) 21%, additional (departure) barriers 29% • More performance improvement as the time of spin-waiting increases • [Figure: barrier breakdown (# of barrier synchronizations, axis 0-900000) into arrival (spin), arrival (block), departure (spin), and departure (block) for UVF w/o Resched-Co and UVF w/ Resched-Co; normalized execution time (co-running w/ bodytrack) for 0.1ms spin wait (default), 10x spin wait, and 20x spin wait]