Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Preemptable ticket spinlocks: improving consolidated performance in the cloud

654 views

Published on

This slides were presented at the 9th ACM SIGPLAN/SIGOPS international conference on Virtual Execution Environments (VEE '13).
When executing inside a virtual machine environment, OS level synchronization primitives are faced with significant challenges due to the scheduling behavior of the underlying virtual machine monitor. Operations that are ensured to last only a short amount of time on real hardware, are capable of taking considerably longer when running virtualized. This change in assumptions has significant impact when an OS is executing inside a critical region that is protected by a spinlock. The interaction between OS level spinlocks and VMM scheduling is known as the Lock Holder Preemption problem and has a significant impact on overall VM performance. However, with the use of ticket locks instead of generic spinlocks, virtual environments must also contend with waiters being preempted before they are able to acquire the lock. This has the effect of blocking access to a lock, even if the lock itself is available. We identify this scenario as the Lock Waiter Preemption problem. In order to solve both problems we introduce Preemptable Ticket spinlocks, a new locking primitive that is designed to enable a VM to always make forward progress by relaxing the ordering guarantees offered by ticket locks. We show that the use of Preemptable Ticket spinlocks improves VM performance by 5.32X on average, when running on a non paravirtual VMM, and by 7.91X when running on a VMM that supports a paravirtual locking interface, when executing a set of microbenchmarks as well as a realistic e-commerce benchmark.

Published in: Software
  • Be the first to comment

  • Be the first to like this

Preemptable ticket spinlocks: improving consolidated performance in the cloud

  1. 1. Jiannan Ouyang, John Lange Department of Computer Science University of Pittsburgh VEE ’13 03/17/2013 Preemptable Ticket Spinlocks Improving Consolidated Performance in the Cloud
  2. 2. Motivation 2 —  VM interference in overcommitted environments —  OS synchronization overhead —  Lock holder preemption (LHP)
  3. 3. —  Lock Waiter Preemption —  significance analysis of lock waiter preemption —  PreemptableTicket Spinlock —  implementation inside Linux —  Evaluation —  significant speedup over Linux Contributions 3
  4. 4. Spinlocks 4 —  Basics —  lock() & unlock() —  Busy waiting lock —  generic spinlock: random order, unfair (starvation) —  ticket spinlock: FIFO order, fair —  Designed for fast mutual exclusion —  busy waiting vs. sleep/wakeup —  spinlocks for short & fast critical sections (~1us) —  OS assumptions —  use spinlocks for short critical section only —  never preempt a thread holding or waiting a kernel spinlock
  5. 5. Preemption in VMs 5 —  Lock Holder Preemption (LHP) —  virtualization breaks the OS assumption —  vCPU holding a lock is unscheduled byVMM —  preemption prolongs critical section (~1m v.s. ~1us) —  Proposed Solutions —  Co-scheduling and variants —  Hardware-assisted scheme (Pause Loop Exiting) —  Paravirtual spinlocks
  6. 6. Preemption in Ticket Lock 6 0 1 2 head = 0 tail = 2 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1
  7. 7. Preemption in Ticket Lock 7 0 1 2 head = 0 tail = 2 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1
  8. 8. Preemption in Ticket Lock 8 0 1 2 3 head = 0 tail = 3 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1
  9. 9. Preemption in Ticket Lock 9 0 1 2 3 4 head = 0 tail = 4 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1
  10. 10. Preemption in Ticket Lock 10 1 2 3 4 tail = 4head = 1 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1
  11. 11. Preemption in Ticket Lock 11 1 2 3 4 tail = 4head = 1 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1 Lock Holder Preemption!
  12. 12. Preemption in Ticket Lock 12 1 2 3 4 tail = 4head = 1 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1
  13. 13. Preemption in Ticket Lock 13 1 2 3 4 tail = 4head = 1 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1
  14. 14. Preemption in Ticket Lock 14 2 3 4 tail = 4head = 2 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1
  15. 15. Preemption in Ticket Lock 15 3 4 head = 3 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1 tail = 4
  16. 16. Preemption in Ticket Lock 16 3 4 5 head = 3 tail = 5 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1
  17. 17. Preemption in Ticket Lock 17 3 4 5 6 head = 3 tail = 6 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1 LockWaiter Preemption wait on available resource
  18. 18. Lock Waiter Preemption 18 —  Lock waiter is preempted —  Later waiters wait on an available lock —  Possible to adapt to it, if we —  detect preempted waiter —  acquire lock out of order How significant is it??
  19. 19. Waiter Preemption Dominates 19 LHP + LWP LWP ​ 𝐋 𝐖𝐏/ 𝐋𝐇𝐏 +𝐋𝐖𝐏  hackbench x1 1089 452 41.5% hackbench x2 44342 39221 88.5% ebizzy x1 294 166 56.5% ebizzy x2 1017 980 96.4% Table 2: LockWaiter Preemption Problem in the Linux Kernel Lock waiter preemption dominates in overcommitted environments
  20. 20. Challenges & Approach 20 —  How to identify a preempted waiter? —  timeout threshold —  How to violate order constraints? —  allow timed out waiters get the lock randomly —  ensure mutual exclusion between them —  How NOT to break the whole ordering mechanism? —  timeout threshold proportional to queue position
  21. 21. Queue Position Index 21 N = ticket – queue_head —  ticket: copy of queue tail value upon enqueue —  N: number of earlier waiters n n+1 n+2 head = n tail = n+2 ticket = n+2 N = 2
  22. 22. Proportional Timeout Threshold 22 T = N x t —  t is a constant parameter —  large enough to avoid false detection —  small enough to save waiting time —  Performance is NOT t value sensitive —  most locks take ~1us & most spinning time wasted on locks that wait ~1ms —  larger t does not harm & smaller t does not gain much n n+1 n+2 head = n tail = n+2 0 t 2tTimeout Threshold
  23. 23. Preemptable Ticket Spinlock 23 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1 0 1 2 3 4 5 head = 0 tail = 5 0 Timeout Threshold t 2t 3t 4t 5t
  24. 24. Preemptable Ticket Spinlock 24 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1 1 2 3 4 5 head = 1 tail = 5 0 Timeout Threshold t 2t 3t 4t
  25. 25. Preemptable Ticket Spinlock 25 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1 1 2 3 4 5 head = 1 tail = 5 0 Timeout Threshold t 2t 3t 4t
  26. 26. Preemptable Ticket Spinlock 26 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1 1 3 4 5 head = 2 tail = 5 Timeout Threshold t 2t 3t N = ticket – head
  27. 27. Preemptable Ticket Spinlock 27 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1 1 3 4 5 head = 2 tail = 5 Timeout Threshold t 2t 3t
  28. 28. Preemptable Ticket Spinlock 28 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1 1 3 5 head = 3 tail = 5 Timeout Threshold 0 2t
  29. 29. Preemptable Ticket Spinlock 29 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1 1 3 5 head = 3 tail = 5 Timeout Threshold 0 2t
  30. 30. Preemptable Ticket Spinlock 30 0 1 a scheduled waiter with ticket 0 a preempted waiter with ticket 1 1 3 5 head = 3 tail = 5 Timeout Threshold 0 2t
  31. 31. Summary 31 —  PreemptableTicket Lock adapts to preemption —  preserve order in absence of preemption —  violate order upon preemption —  PreemptableTicket Lock preserves fairness —  order violations are restricted —  priority is always given to timed out waiters —  timed out waiters bounded by vCPU numbers of aVM
  32. 32. Implementation 32 —  Drop-in replacement —  lock(), unlock(), is_locked(), trylock(), etc. —  Correct —  race condition free: atomic updates —  Fast —  performance is sensitive to lock efficiency —  ~60 lines of C/inline-assembly in Linux 3.5.0
  33. 33. Paravirtual Spinlocks 33 —  Lock holder preemption is unaddressed —  semantic gap between guest and host —  paravirtualization: guest/host cooperation —  signal long waiting lock / put a vCPU to sleep —  notify to wake up a vCPU / wake up a vCPU —  paravirtual preemptable ticket spinlock —  sleep when waiting too long after timed out —  wake up all sleeping waiters upon lock releasing
  34. 34. Evaluation 34 —  Host —  8 core 2.6GHz Intel Core i7 CPU, 8 GB RAM, 1Gbit NIC, Fedora 17 (Linux 3.5.0) —  Guest —  8 core, 1G RAM, Fedora 17 (Linux 3.5.0) —  Benchmarks —  hackbench, ebizzy, dell dvd store —  Lock implementations —  baseline: ticket lock, paravirtual ticket lock (pv-lock) —  preemptable ticket lock —  paravirtual (pv) preemptable ticket lock
  35. 35. Hackbench 35 —  Average Speedup —  preemptable-lock vs. ticket lock: 4.82X —  pv-preemptable-lock v.s. ticket lock: 7.08X —  pv-preemptable-lock v.s. pv-lock: 1.03X
  36. 36. Ebizzy 36 Less variance over ticket lock and pv-lock —  in-VM preemption adaptivity —  lessVM interference variance 80.36 vs. 10.94 variance 131.62 vs. 16.09
  37. 37. Dell DVD Store (apache/mysql) 37 —  Average Speedup —  preemptable-lock vs. ticket lock: 11.68X —  pv-preemptable-lock v.s. ticket lock: 19.52X —  pv-preemptable-lock v.s. pv-lock: 1.11X
  38. 38. Evaluation Summary 38 —  PreemptableTicket Spinlocks speedup —  5.32X over ticket lock —  Paravirtual PreemptableTicket Spinlocks speedup —  7.91X over ticket lock —  1.08X over paravirtual ticket lock Average speedup across cases for all benchmarks
  39. 39. —  LockWaiter Preemption —  most significant preemption problem in queue based lock under overcommitted environment —  PreemptableTicket Spinlock —  Implementation with ~60 lines of code in Linux —  Better performance in overcommitted environment —  5.32X average speedup up over ticket lock w/oVMM support —  1.08X average speedup over pv-lock with less variance Conclusion 39
  40. 40. Thank You 40 — Jiannan Ouyang —  ouyang@cs.pitt.edu —  http://www.cs.pitt.edu/~ouyang/ 1 2 3 4 5 0 t 2t 3t 4t PreemptableTicket Spinlock

×