Locks? We Don't Need No Stinkin' Locks - Michael Barker

Embrace the dark side. As a developer you'll often be advised that writing concurrent code should be the purview of the genius coders alone. In this talk Michael Barker will discard that notion into the cesspits of logic and reason and attempt to present on the less understood area of non-blocking concurrency, i.e. concurrency without locks. We'll look at the modern Intel CPU architecture, why we need a memory model, and the performance costs of various non-blocking constructs, then delve into the implementation details of the latest version of the Disruptor to see how non-blocking concurrency can be applied to build high-performance data structures.
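
As a rough illustration of what "concurrency without locks" can look like in Java, here is a minimal sketch of a counter updated with a compare-and-set retry loop. It is not taken from the talk: the class name `CasCounter` and its methods are hypothetical, and only `java.util.concurrent.atomic.AtomicLong` from the JDK is used.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example class, not part of the talk or of the Disruptor.
final class CasCounter {
    private final AtomicLong value = new AtomicLong();

    // Lock-free increment: read the current value, compute the next one,
    // and retry if another thread changed the value in between.
    long increment() {
        long current;
        long next;
        do {
            current = value.get();
            next = current + 1;
        } while (!value.compareAndSet(current, next));
        return next;
    }

    long get() {
        return value.get();
    }

    public static void main(String[] args) throws InterruptedException {
        CasCounter counter = new CasCounter();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 100_000; j++) {
                    counter.increment();
                }
            });
            threads[i].start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        System.out.println(counter.get()); // prints 400000
    }
}
```

No thread ever blocks while holding a lock here; a thread that loses the race simply re-reads the value and tries again.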

Published in: Technology

  1. Locks? We don’t need no stinkin’ Locks! @mikeb2701 http://bad-concurrency.blogspot.com Image: http://subcirlce.co.uk
  2. Memory Models
  3. Happens-Before
  4. Causality: "Fear will keep the local systems (instructions) in line." - Grand Moff Wilhuff Tarkin
  5. Memory ordering rules:
     • Loads are not reordered with other loads.
     • Stores are not reordered with other stores.
     • Stores are not reordered with older loads.
     • In a multiprocessor system, memory ordering obeys causality (memory ordering respects transitive visibility).
     • In a multiprocessor system, stores to the same location have a total order.
     • In a multiprocessor system, locked instructions to the same location have a total order.
     • Loads and stores are not reordered with locked instructions.
  6. Non-Blocking Primitives
  7. Unsafe
  8. public class AtomicLong extends Number implements Serializable {
         // ...
         private volatile long value;
         // ...

         /**
          * Sets to the given value.
          *
          * @param newValue the new value
          */
         public final void set(long newValue) {
             value = newValue;
         }
         // ...
     }
  9. # {method} set (J)V in java/util/concurrent/atomic/AtomicLong
     # this:  rsi:rsi = java/util/concurrent/atomic/AtomicLong
     # parm0: rdx:rdx = long
     # [sp+0x20] (sp of caller)
     mov    0x8(%rsi),%r10d
     shl    $0x3,%r10
     cmp    %r10,%rax
     jne    0x00007f1f410378a0      ; {runtime_call}
     xchg   %ax,%ax
     nopl   0x0(%rax,%rax,1)
     xchg   %ax,%ax
     push   %rbp
     sub    $0x10,%rsp
     nop
     mov    %rdx,0x10(%rsi)
     lock addl $0x0,(%rsp)          ;*putfield value
                                    ; - j.u.c.a.AtomicLong::set@2 (line 112)
     add    $0x10,%rsp
     pop    %rbp
     test   %eax,0xa40fd06(%rip)    # 0x00007f1f4b471000
                                    ; {poll_return}
  10. public class AtomicLong extends Number implements Serializable {
          // setup to use Unsafe.compareAndSwapLong for updates
          private static final Unsafe unsafe = Unsafe.getUnsafe();
          private static final long valueOffset;
          // ...

          /**
           * Eventually sets to the given value.
           *
           * @param newValue the new value
           * @since 1.6
           */
          public final void lazySet(long newValue) {
              unsafe.putOrderedLong(this, valueOffset, newValue);
          }
          // ...
      }
  11. # {method} lazySet (J)V in java/util/concurrent/atomic/AtomicLong
      # this:  rsi:rsi = java/util/concurrent/atomic/AtomicLong
      # parm0: rdx:rdx = long
      # [sp+0x20] (sp of caller)
      mov    0x8(%rsi),%r10d
      shl    $0x3,%r10
      cmp    %r10,%rax
      jne    0x00007f1f410378a0      ; {runtime_call}
      xchg   %ax,%ax
      nopl   0x0(%rax,%rax,1)
      xchg   %ax,%ax
      push   %rbp
      sub    $0x10,%rsp
      nop
      mov    %rdx,0x10(%rsi)         ;*invokevirtual putOrderedLong
                                     ; - AtomicLong::lazySet@8 (line 122)
      add    $0x10,%rsp
      pop    %rbp
      test   %eax,0xa41204b(%rip)    # 0x00007f1f4b471000
                                     ; {poll_return}
  12. public class AtomicInteger extends Number implements Serializable {
          // setup to use Unsafe.compareAndSwapInt for updates
          private static final Unsafe unsafe = Unsafe.getUnsafe();
          private static final long valueOffset;
          private volatile int value;
          //...

          public final boolean compareAndSet(int expect, int update) {
              return unsafe.compareAndSwapInt(this, valueOffset, expect, update);
          }
      }
  13. # {method} compareAndSet (JJ)Z in java/util/concurrent/atomic/AtomicLong
      # this:  rsi:rsi = java/util/concurrent/atomic/AtomicLong
      # parm0: rdx:rdx = long
      # parm1: rcx:rcx = long
      # [sp+0x20] (sp of caller)
      mov    0x8(%rsi),%r10d
      shl    $0x3,%r10
      cmp    %r10,%rax
      jne    0x00007f6699037a60      ; {runtime_call}
      xchg   %ax,%ax
      nopl   0x0(%rax,%rax,1)
      xchg   %ax,%ax
      sub    $0x18,%rsp
      mov    %rbp,0x10(%rsp)
      mov    %rdx,%rax
      lock cmpxchg %rcx,0x10(%rsi)
      sete   %r11b
      movzbl %r11b,%r11d             ;*invokevirtual compareAndSwapLong
                                     ; - j.u.c.a.AtomicLong::compareAndSet@9 (line 149)
      mov    %r11d,%eax
      add    $0x10,%rsp
      pop    %rbp
      test   %eax,0x91df935(%rip)    # 0x00007f66a223e000
                                     ; {poll_return}
  14. [Bar chart: nanoseconds/op for set(), compareAndSet and lazySet()]
  15. Example - Disruptor Multi-producer
      private void publish(Disruptor disruptor, long value) {
          long next = disruptor.next();
          disruptor.setValue(next, value);
          disruptor.publish(next);
      }
  16. Example - Disruptor Multi-producer
      public long next() {
          long next;
          long current;
          do {
              current = nextSequence.get();
              next = current + 1;
              while (next > (readSequence.get() + size)) {
                  LockSupport.parkNanos(1L);
                  continue;
              }
          } while (!nextSequence.compareAndSet(current, next));
          return next;
      }
  17. Algorithm: Spin - 1
      public void publish(long sequence) {
          long sequenceMinusOne = sequence - 1;
          while (cursor.get() != sequenceMinusOne) {
              // Spin
          }
          cursor.lazySet(sequence);
      }
  18. [Chart: Spin - 1, throughput in million ops/sec against producer threads (1 to 8)]
  19. Algorithm: Co-Op
      public void publish(long sequence) {
          int counter = RETRIES;
          while (sequence - cursor.get() > pendingPublication.length()) {
              if (--counter == 0) {
                  Thread.yield();
                  counter = RETRIES;
              }
          }
          long expectedSequence = sequence - 1;
          pendingPublication.set((int) sequence & pendingMask, sequence);
          if (cursor.get() >= sequence) {
              return;
          }
          long nextSequence = sequence;
          while (cursor.compareAndSet(expectedSequence, nextSequence)) {
              expectedSequence = nextSequence;
              nextSequence++;
              if (pendingPublication.get((int) nextSequence & pendingMask) != nextSequence) {
                  break;
              }
          }
      }
  20. [Chart: Spin - 1 and Co-Op, throughput in million ops/sec against producer threads (1 to 8)]
  21. Algorithm: Buffer
      public long next() {
          long next;
          long current;
          do {
              current = cursor.get();
              next = current + 1;
              while (next > (readSequence.get() + size)) {
                  LockSupport.parkNanos(1L);
                  continue;
              }
          } while (!cursor.compareAndSet(current, next));
          return next;
      }
  22. Algorithm: Buffer
      public void publish(long sequence) {
          int publishedValue = (int) (sequence >>> indexShift);
          published.set(indexOf(sequence), publishedValue);
      }

      // Get Value
      int availableValue = (int) (current >>> indexShift);
      int index = indexOf(current);
      while (published.get(index) != availableValue) {
          // Spin
      }
  23. [Chart: Spin - 1, Co-Op and Buffer, throughput in million ops/sec against threads (1 to 8)]
  24. Stuff that sucks...
  25. Q&A
      • https://github.com/mikeb01/jax2012
      • http://www.lmax.com/careers
      • http://www.infoq.com/presentations/Lock-free-Algorithms
      • http://www.youtube.com/watch?v=DCdGlxBbKU4
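
The Buffer algorithm on slides 21 and 22 can be pulled together into one self-contained sketch, shown here under some loud assumptions. The class name `AvailabilityBufferSketch`, the `isAvailable` helper, the power-of-two sizing and the way `indexShift` and `indexOf` are derived are all guesses made for illustration, not the Disruptor's actual code. The wait on `readSequence` from slide 21 is left out, so nothing here stops producers from wrapping past a slow consumer.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the "Buffer" publication scheme; not the Disruptor source.
final class AvailabilityBufferSketch {
    private final int indexMask;                 // assumed: size - 1, size is a power of two
    private final int indexShift;                // assumed: log2(size)
    private final AtomicIntegerArray published;  // per-slot "round" number last published
    private final AtomicLong cursor = new AtomicLong(-1);

    AvailabilityBufferSketch(int size) {
        if (size <= 0 || Integer.bitCount(size) != 1) {
            throw new IllegalArgumentException("size must be a power of two");
        }
        indexMask = size - 1;
        indexShift = Integer.numberOfTrailingZeros(size);
        published = new AtomicIntegerArray(size);
        for (int i = 0; i < size; i++) {
            published.set(i, -1); // no round has been published in any slot yet
        }
    }

    // Claim the next sequence with a CAS retry loop (consumer gating omitted).
    long next() {
        long current;
        long next;
        do {
            current = cursor.get();
            next = current + 1;
        } while (!cursor.compareAndSet(current, next));
        return next;
    }

    private int indexOf(long sequence) {
        return (int) sequence & indexMask;
    }

    // Mark a claimed sequence as published by storing its round number in its slot.
    void publish(long sequence) {
        published.set(indexOf(sequence), (int) (sequence >>> indexShift));
    }

    // A reader treats a sequence as available once its slot holds the matching round.
    boolean isAvailable(long sequence) {
        return published.get(indexOf(sequence)) == (int) (sequence >>> indexShift);
    }

    public static void main(String[] args) {
        AvailabilityBufferSketch buffer = new AvailabilityBufferSketch(8);
        long first = buffer.next();
        long second = buffer.next();
        buffer.publish(second); // publishing out of order does not wait on `first`
        System.out.println(buffer.isAvailable(first));  // prints false
        System.out.println(buffer.isAvailable(second)); // prints true
        buffer.publish(first);
        System.out.println(buffer.isAvailable(first));  // prints true
    }
}
```

Unlike the Spin - 1 version, a producer in this sketch never waits for the producer holding the previous sequence; each one writes only to its own slot and the reader checks slots in order.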
