Locks? We Don't Need No Stinkin' Locks - Michael Barker
Embrace the dark side. As a developer you'll often be advised that writing concurrent code should be the purview of the genius coders alone. In this talk Michael Barker will discard that notion into the cesspits of logic and reason and attempt to present on the less understood area of non-blocking concurrency, i.e. concurrency without locks. We'll look at the modern Intel CPU architecture, why we need a memory model, and the performance costs of various non-blocking constructs, then delve into the implementation details of the latest version of the Disruptor to see how non-blocking concurrency can be applied to build high-performance data structures.

Published in: Technology
Notes
  • - Concurrency is taught all wrong.
    - What is non-blocking concurrency?
    - Mechanical Sympathy: locks/mutexes are a completely artificial construct.
    - MTs concurrency course: blocking vs. non-blocking.
    - The tools for non-blocking concurrency are functions of the CPU, so we need to look at CPU architecture first.
  • - Causality
    - Why CPUs/compilers reorder
  • - The Java Memory Model provides sequential consistency for data-race-free programs
    - As-if-serial
    - Disallows out-of-thin-air values
    - First mainstream programming language to include a memory model (in C/C++ the behaviour is a combination of the CPU and whatever the compiler happens to do)
  • - volatile
    - java.util.concurrent.atomic.*
      - Atomic<Long|Integer|Reference>
      - Atomic<Long|Integer|Reference>Array (why use over an array of atomics)
      - Atomic<Long|Integer|Reference>FieldUpdater (can be a bit slow)
  • - Fight club
    - If you’re smart enough
  • - Thread wake-ups
    - Hard spin
    - Spin with yield
    - PAUSE instruction: please add to Java
    - MONITOR and MWAIT
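The waiting options listed in the last note can be compared side by side. The following is only a sketch under stated assumptions: the class and method names are hypothetical, and the spin-hint variant relies on Thread.onSpinWait(), which was added in Java 9 (after this talk) as a hint that the JVM may compile to PAUSE on x86. MONITOR and MWAIT are not covered.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;
import java.util.function.Consumer;

// Hypothetical names; a sketch of the waiting styles, not code from the talk.
final class WaitStrategies {
    // Hard spin: burn the core until the flag flips.
    static void hardSpin(AtomicBoolean flag) {
        while (!flag.get()) { /* spin */ }
    }

    // Spin with a hint to the CPU that this is a busy-wait loop (Java 9+).
    static void spinWithHint(AtomicBoolean flag) {
        while (!flag.get()) { Thread.onSpinWait(); }
    }

    // Spin, but offer the core to other runnable threads on each pass.
    static void spinWithYield(AtomicBoolean flag) {
        while (!flag.get()) { Thread.yield(); }
    }

    // Sleep briefly between checks, as the Disruptor slides do with parkNanos.
    static void park(AtomicBoolean flag) {
        while (!flag.get()) { LockSupport.parkNanos(1L); }
    }

    // Runs one strategy on its own thread and reports whether it returned
    // after the flag was set.
    private static boolean completes(Consumer<AtomicBoolean> strategy) {
        AtomicBoolean flag = new AtomicBoolean(false);
        Thread waiter = new Thread(() -> strategy.accept(flag));
        waiter.start();
        flag.set(true);
        try {
            waiter.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !waiter.isAlive();
    }

    // Returns how many of the four strategies woke up.
    static int demo() {
        int done = 0;
        if (completes(WaitStrategies::hardSpin)) done++;
        if (completes(WaitStrategies::spinWithHint)) done++;
        if (completes(WaitStrategies::spinWithYield)) done++;
        if (completes(WaitStrategies::park)) done++;
        return done;
    }

    public static void main(String[] args) {
        System.out.println("strategies completed: " + demo());
    }
}
```

The trade-off is latency against CPU use: the spinning variants react fastest but occupy a core, while parking frees the core at the cost of a slower wake-up.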
  • Transcript of "Locks? We Don't Need No Stinkin' Locks - Michael Barker"

    1. Locks? We don’t need no stinkin’ Locks! @mikeb2701 http://bad-concurrency.blogspot.com Image: http://subcirlce.co.uk
    2. Memory Models
    3. Happens-Before
    4. Causality. "Fear will keep the local systems [instructions] in line." - Grand Moff Wilhuff Tarkin
    5. • Loads are not reordered with other loads.
       • Stores are not reordered with other stores.
       • Stores are not reordered with older loads.
       • In a multiprocessor system, memory ordering obeys causality (memory ordering respects transitive visibility).
       • In a multiprocessor system, stores to the same location have a total order.
       • In a multiprocessor system, locked instructions to the same location have a total order.
       • Loads and stores are not reordered with locked instructions.
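At the Java level these hardware guarantees surface as happens-before edges. As a minimal sketch (the class and field names are hypothetical, not from the talk), a plain write followed by a volatile write is visible to any thread that then reads the volatile field and sees the new value:

```java
// Hypothetical illustration of a happens-before edge through a volatile field.
final class SafePublication {
    private int payload;             // plain field, published via the flag
    private volatile boolean ready;  // volatile write/read pair orders the accesses

    void publish(int value) {
        payload = value;  // 1. plain store
        ready = true;     // 2. volatile store: the store above cannot move below it
    }

    int awaitPayload() {
        while (!ready) {  // volatile load: spin until the writer's store is seen
            // busy-wait
        }
        return payload;   // sees the value written before ready = true
    }

    static int demo() {
        SafePublication box = new SafePublication();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = box.awaitPayload());
        reader.start();
        box.publish(42);
        try {
            reader.join(); // join also orders the reader's write to seen[0] before our read
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

If `ready` were not volatile, the reader could legally spin forever or return a stale `payload`.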
    6. Non-Blocking Primitives
    7. Unsafe
    8.
        public class AtomicLong extends Number
                                implements Serializable {
            // ...
            private volatile long value;
            // ...

            /**
             * Sets to the given value.
             *
             * @param newValue the new value
             */
            public final void set(long newValue) {
                value = newValue;
            }
            // ...
        }
    9.
        # {method} set (J)V in java/util/concurrent/atomic/AtomicLong
        # this:     rsi:rsi   = java/util/concurrent/atomic/AtomicLong
        # parm0:    rdx:rdx   = long
        #           [sp+0x20]  (sp of caller)
          mov    0x8(%rsi),%r10d
          shl    $0x3,%r10
          cmp    %r10,%rax
          jne    0x00007f1f410378a0   ; {runtime_call}
          xchg   %ax,%ax
          nopl   0x0(%rax,%rax,1)
          xchg   %ax,%ax
          push   %rbp
          sub    $0x10,%rsp
          nop
          mov    %rdx,0x10(%rsi)
          lock addl $0x0,(%rsp)       ;*putfield value
                                      ; - j.u.c.a.AtomicLong::set@2 (line 112)
          add    $0x10,%rsp
          pop    %rbp
          test   %eax,0xa40fd06(%rip) # 0x00007f1f4b471000
                                      ; {poll_return}
    10.
        public class AtomicLong extends Number
                                implements Serializable {
            // setup to use Unsafe.compareAndSwapLong for updates
            private static final Unsafe unsafe = Unsafe.getUnsafe();
            private static final long valueOffset;
            // ...

            /**
             * Eventually sets to the given value.
             *
             * @param newValue the new value
             * @since 1.6
             */
            public final void lazySet(long newValue) {
                unsafe.putOrderedLong(this, valueOffset, newValue);
            }
            // ...
        }
    11.
        # {method} lazySet (J)V in java/util/concurrent/atomic/AtomicLong
        # this:     rsi:rsi   = java/util/concurrent/atomic/AtomicLong
        # parm0:    rdx:rdx   = long
        #           [sp+0x20]  (sp of caller)
          mov    0x8(%rsi),%r10d
          shl    $0x3,%r10
          cmp    %r10,%rax
          jne    0x00007f1f410378a0   ; {runtime_call}
          xchg   %ax,%ax
          nopl   0x0(%rax,%rax,1)
          xchg   %ax,%ax
          push   %rbp
          sub    $0x10,%rsp
          nop
          mov    %rdx,0x10(%rsi)      ;*invokevirtual putOrderedLong
                                      ; - AtomicLong::lazySet@8 (line 122)
          add    $0x10,%rsp
          pop    %rbp
          test   %eax,0xa41204b(%rip) # 0x00007f1f4b471000
                                      ; {poll_return}
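The set() and lazySet() listings differ by a single instruction: set() follows the store with `lock addl $0x0,(%rsp)`, while lazySet() emits only the plain `mov`. A typical place to exploit this is a value with exactly one writer. The sketch below is an illustration under that single-writer assumption; the class name and the demo harness are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical single-writer counter: only one thread may call increment().
final class SingleWriterCounter {
    private final AtomicLong value = new AtomicLong();

    // With a single writer, a plain read-modify-write cannot lose updates,
    // and lazySet publishes the result without the full fence set() pays for.
    void increment() {
        value.lazySet(value.get() + 1);
    }

    long get() {
        return value.get();
    }

    static long demo(int increments) {
        SingleWriterCounter counter = new SingleWriterCounter();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < increments; i++) {
                counter.increment();
            }
        });
        writer.start();

        // Reader: observed values must never go backwards.
        long last = 0;
        while (last < increments) {
            long now = counter.get();
            if (now < last) {
                throw new IllegalStateException("counter went backwards");
            }
            last = now;
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(demo(100_000));
    }
}
```

With more than one writer this increment would lose updates; that case needs compareAndSet or one of the atomic add methods.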
    12.
        public class AtomicInteger extends Number
                                   implements Serializable {
            // setup to use Unsafe.compareAndSwapInt for updates
            private static final Unsafe unsafe = Unsafe.getUnsafe();
            private static final long valueOffset;
            private volatile int value;
            //...

            public final boolean compareAndSet(int expect, int update) {
                return unsafe.compareAndSwapInt(this, valueOffset, expect, update);
            }
        }
    13.
        # {method} compareAndSet (JJ)Z in java/util/concurrent/atomic/AtomicLong
        # this:     rsi:rsi   = java/util/concurrent/atomic/AtomicLong
        # parm0:    rdx:rdx   = long
        # parm1:    rcx:rcx   = long
        #           [sp+0x20]  (sp of caller)
          mov    0x8(%rsi),%r10d
          shl    $0x3,%r10
          cmp    %r10,%rax
          jne    0x00007f6699037a60   ; {runtime_call}
          xchg   %ax,%ax
          nopl   0x0(%rax,%rax,1)
          xchg   %ax,%ax
          sub    $0x18,%rsp
          mov    %rbp,0x10(%rsp)
          mov    %rdx,%rax
          lock cmpxchg %rcx,0x10(%rsi)
          sete   %r11b
          movzbl %r11b,%r11d          ;*invokevirtual compareAndSwapLong
                                      ; - j.u.c.a.AtomicLong::compareAndSet@9 (line 149)
          mov    %r11d,%eax
          add    $0x10,%rsp
          pop    %rbp
          test   %eax,0x91df935(%rip) # 0x00007f66a223e000
                                      ; {poll_return}
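compareAndSet is almost always used inside a retry loop: read the current value, compute the replacement, and try again if another thread got in first. As a small sketch (the class name and the "running maximum" use case are illustrative assumptions, not from the talk):

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example of the read / compute / CAS-retry pattern.
final class CasMax {
    // Raises 'max' to 'candidate' if candidate is larger; safe under contention.
    static void updateMax(AtomicLong max, long candidate) {
        long current;
        do {
            current = max.get();
            if (candidate <= current) {
                return; // someone already stored a value at least as large
            }
        } while (!max.compareAndSet(current, candidate)); // lost the race: retry
    }

    static long demo(int threads, int perThread) {
        AtomicLong max = new AtomicLong(Long.MIN_VALUE);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    updateMax(max, base + i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return max.get();
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 1_000)); // largest candidate offered is 3999
    }
}
```

No thread ever blocks here; a thread that loses the race simply re-reads and tries again.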
    14. [Chart: nanoseconds/op for set(), compareAndSet and lazySet(); y-axis 0 to 9]
    15. Example - Disruptor Multi-producer
        private void publish(Disruptor disruptor, long value) {
            long next = disruptor.next();
            disruptor.setValue(next, value);
            disruptor.publish(next);
        }
    16. Example - Disruptor Multi-producer
        public long next() {
            long next;
            long current;
            do {
                current = nextSequence.get();
                next = current + 1;
                while (next > (readSequence.get() + size)) {
                    LockSupport.parkNanos(1L);
                    continue;
                }
            } while (!nextSequence.compareAndSet(current, next));
            return next;
        }
    17. Algorithm: Spin - 1
        public void publish(long sequence) {
            long sequenceMinusOne = sequence - 1;
            while (cursor.get() != sequenceMinusOne) {
                // Spin
            }
            cursor.lazySet(sequence);
        }
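The claim and publish steps from the last two slides can be put together into something runnable. This is a cut-down sketch, not the Disruptor itself: the class name is hypothetical, there is no ring buffer or readSequence capacity check, both sequences are assumed to start at -1, and the spin uses Thread.yield() so the demo behaves on machines with few cores.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical mini-sequencer combining next() (CAS claim) with Spin - 1 publish.
final class SpinSequencer {
    private final AtomicLong nextSequence = new AtomicLong(-1); // last claimed
    private final AtomicLong cursor = new AtomicLong(-1);       // last published

    // Claim the next sequence; the capacity check against consumers is omitted.
    long next() {
        long current;
        long next;
        do {
            current = nextSequence.get();
            next = current + 1;
        } while (!nextSequence.compareAndSet(current, next));
        return next;
    }

    // Publish strictly in order: wait until the previous sequence is visible.
    void publish(long sequence) {
        long sequenceMinusOne = sequence - 1;
        while (cursor.get() != sequenceMinusOne) {
            Thread.yield(); // assumption: yield instead of the slide's hard spin
        }
        cursor.lazySet(sequence);
    }

    long cursor() {
        return cursor.get();
    }

    static long demo(int producers, int perProducer) {
        SpinSequencer sequencer = new SpinSequencer();
        Thread[] threads = new Thread[producers];
        for (int i = 0; i < producers; i++) {
            threads[i] = new Thread(() -> {
                for (int n = 0; n < perProducer; n++) {
                    long sequence = sequencer.next();
                    // a real producer would write its event for 'sequence' here
                    sequencer.publish(sequence);
                }
            });
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return sequencer.cursor();
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 10_000)); // last published sequence
    }
}
```

Because every publisher waits on its predecessor, one slow or descheduled producer stalls all the producers behind it.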
    18. [Chart: Spin - 1; million ops/sec (y-axis 0 to 25) against producer threads (1 to 8)]
    19. Algorithm: Co-Op
        public void publish(long sequence) {
            int counter = RETRIES;
            while (sequence - cursor.get() > pendingPublication.length()) {
                if (--counter == 0) {
                    Thread.yield();
                    counter = RETRIES;
                }
            }

            long expectedSequence = sequence - 1;
            pendingPublication.set((int) sequence & pendingMask, sequence);

            if (cursor.get() >= sequence) {
                return;
            }

            long nextSequence = sequence;
            while (cursor.compareAndSet(expectedSequence, nextSequence)) {
                expectedSequence = nextSequence;
                nextSequence++;
                if (pendingPublication.get((int) nextSequence & pendingMask) != nextSequence) {
                    break;
                }
            }
        }
    20. [Chart: Spin - 1 and Co-Op; million ops/sec (y-axis 0 to 30) against producer threads (1 to 8)]
    21. Algorithm: Buffer
        public long next() {
            long next;
            long current;
            do {
                current = cursor.get();
                next = current + 1;
                while (next > (readSequence.get() + size)) {
                    LockSupport.parkNanos(1L);
                    continue;
                }
            } while (!cursor.compareAndSet(current, next));
            return next;
        }
    22. Algorithm: Buffer
        public void publish(long sequence) {
            int publishedValue = (int) (sequence >>> indexShift);
            published.set(indexOf(sequence), publishedValue);
        }

        // Get Value
        int availableValue = (int) (current >>> indexShift);
        int index = indexOf(current);
        while (published.get(index) != availableValue) {
            // Spin
        }
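The slide leaves out how `published`, `indexOf` and `indexShift` are set up. One plausible reading, sketched below, is a power-of-two array that stores, per slot, the lap number of the last sequence published there. Everything beyond the two slide snippets is an assumption: the class name, the -1 initial fill, the mask-based indexOf and the highestPublished helper.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical availability buffer: slot i holds the lap number (sequence >>> indexShift)
// of the most recent sequence published into slot i.
final class AvailabilityBuffer {
    private final AtomicIntegerArray published;
    private final int indexMask;
    private final int indexShift;

    AvailabilityBuffer(int size) {
        if (Integer.bitCount(size) != 1) {
            throw new IllegalArgumentException("size must be a power of two");
        }
        published = new AtomicIntegerArray(size);
        indexMask = size - 1;
        indexShift = Integer.numberOfTrailingZeros(size); // log2(size)
        for (int i = 0; i < size; i++) {
            published.set(i, -1); // assumption: -1 means "nothing published yet"
        }
    }

    private int indexOf(long sequence) {
        return ((int) sequence) & indexMask;
    }

    // Producers publish independently; no waiting on earlier sequences.
    void publish(long sequence) {
        published.set(indexOf(sequence), (int) (sequence >>> indexShift));
    }

    // Consumer side: a sequence is readable once its slot carries its lap number.
    boolean isAvailable(long sequence) {
        return published.get(indexOf(sequence)) == (int) (sequence >>> indexShift);
    }

    // Highest sequence such that everything in [from, result] is available,
    // or from - 1 if 'from' itself is not yet published.
    long highestPublished(long from, long to) {
        long sequence = from;
        while (sequence <= to && isAvailable(sequence)) {
            sequence++;
        }
        return sequence - 1;
    }

    public static void main(String[] args) {
        AvailabilityBuffer buffer = new AvailabilityBuffer(8);
        buffer.publish(0);
        buffer.publish(2); // out of order: 1 is still missing
        System.out.println(buffer.highestPublished(0, 7));
        buffer.publish(1);
        System.out.println(buffer.highestPublished(0, 7));
    }
}
```

Compared with Spin - 1 and Co-Op, producers no longer contend on a shared cursor when publishing; the cost moves to the consumer, which has to scan for the contiguous published range.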
    23. [Chart: Spin - 1, Co-Op and Buffer; million ops/sec (y-axis 0 to 70) against threads (1 to 8)]
    24. Stuff that sucks...
    25. Q&A
        • https://github.com/mikeb01/jax2012
        • http://www.lmax.com/careers
        • http://www.infoq.com/presentations/Lock-free-Algorithms
        • http://www.youtube.com/watch?v=DCdGlxBbKU4
