Locks? We don’t need no
       stinkin’ Locks!

            @mikeb2701
http://bad-concurrency.blogspot.com
                                Image: http://subcirlce.co.uk
Memory Models
Happens-Before
Causality
  Fear will keep the
 local instructions in line.
           - Grand Moff Wilhuff Tarkin
•   Loads are not reordered with other loads.


•   Stores are not reordered with other stores.


•   Stores are not reordered with older loads.


•   In a multiprocessor system, memory ordering obeys causality (memory
    ordering respects transitive visibility).


•   In a multiprocessor system, stores to the same location have a total order.


•   In a multiprocessor system, locked instructions to the same
    location have a total order.


•   Loads and Stores are not reordered with locked instructions.
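The rules above describe the x86 hardware; Java code relies on the Java Memory Model, which the JIT maps onto them. A minimal sketch of a happens-before handoff through a volatile flag follows. The names Handoff, payload and ready are hypothetical and do not come from the talk.

```java
// Hypothetical example: class and field names are illustrative only.
class Handoff {
    private long payload;            // plain field, written before the flag
    private volatile boolean ready;  // volatile write/read creates the happens-before edge

    void produce(long value) {
        payload = value;  // (1) plain store
        ready = true;     // (2) volatile store: (1) may not be reordered after it
    }

    long consume() {
        while (!ready) {
            // (3) volatile load: spin until the flag becomes visible
        }
        return payload;   // (4) sees the value stored at (1)
    }

    // Runs one reader thread against one producer and returns what the reader saw.
    static long demo() {
        Handoff handoff = new Handoff();
        long[] seen = new long[1];
        Thread reader = new Thread(() -> seen[0] = handoff.consume());
        reader.start();
        handoff.produce(42L);
        try {
            reader.join(); // join() makes the reader's write to seen[0] visible here
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```

Without volatile on ready, the JIT would be free to hoist the load out of the loop, and the reader might never terminate.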
Non-Blocking
 Primitives
Unsafe
public class AtomicLong extends Number
                        implements Serializable {

    // ...
    private volatile long value;

    // ...
    /**
      * Sets to the given value.
      *
      * @param newValue the new value
      */
    public final void set(long newValue) {
        value = newValue;
    }

    // ...
}
# {method} 'set' '(J)V' in 'java/util/concurrent/atomic/AtomicLong'
# this:       rsi:rsi   = 'java/util/concurrent/atomic/AtomicLong'
# parm0:      rdx:rdx   = long
#             [sp+0x20] (sp of caller)
  mov    0x8(%rsi),%r10d
  shl    $0x3,%r10
  cmp    %r10,%rax
  jne    0x00007f1f410378a0 ;     {runtime_call}
  xchg   %ax,%ax
  nopl   0x0(%rax,%rax,1)
  xchg   %ax,%ax
  push   %rbp
  sub    $0x10,%rsp
  nop
  mov    %rdx,0x10(%rsi)
  lock addl $0x0,(%rsp)     ;*putfield value
                            ; - j.u.c.a.AtomicLong::set@2 (line 112)
  add    $0x10,%rsp
  pop    %rbp
  test   %eax,0xa40fd06(%rip)         # 0x00007f1f4b471000
                            ;   {poll_return}
public class AtomicLong extends Number
                        implements Serializable {


    // setup to use Unsafe.compareAndSwapLong for updates
    private static final Unsafe unsafe = Unsafe.getUnsafe();
    private static final long valueOffset;

    // ...
    /**
      * Eventually sets to the given value.
      *
      * @param newValue the new value
      * @since 1.6
      */
    public final void lazySet(long newValue) {
        unsafe.putOrderedLong(this, valueOffset, newValue);
    }

    // ...
}
# {method} 'lazySet' '(J)V' in 'java/util/concurrent/atomic/AtomicLong'
# this:       rsi:rsi   = 'java/util/concurrent/atomic/AtomicLong'
# parm0:      rdx:rdx   = long
#             [sp+0x20] (sp of caller)
  mov    0x8(%rsi),%r10d
  shl    $0x3,%r10
  cmp    %r10,%rax
  jne    0x00007f1f410378a0 ;     {runtime_call}
  xchg   %ax,%ax
  nopl   0x0(%rax,%rax,1)
  xchg   %ax,%ax
  push   %rbp
  sub    $0x10,%rsp
  nop
  mov    %rdx,0x10(%rsi)     ;*invokevirtual putOrderedLong
                             ; - AtomicLong::lazySet@8 (line 122)
  add    $0x10,%rsp
  pop    %rbp
  test   %eax,0xa41204b(%rip)         # 0x00007f1f4b471000
                             ;   {poll_return}
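The two listings differ by one instruction: set() follows the store with lock addl, while lazySet() emits only the mov. A minimal sketch of a case where lazySet() is usually considered sufficient is a counter with exactly one writer thread. SingleWriterCounter is a hypothetical class, not something from the talk.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example: a counter that is only ever incremented by one thread.
class SingleWriterCounter {
    private final AtomicLong count = new AtomicLong();

    // Single writer only: get() followed by lazySet() is NOT an atomic increment.
    void increment() {
        count.lazySet(count.get() + 1); // ordered store, no full fence
    }

    // Any thread may read; it may briefly observe a slightly stale value.
    long read() {
        return count.get();
    }

    // Increments n times on one writer thread and returns the final count.
    static long demo(int n) {
        SingleWriterCounter counter = new SingleWriterCounter();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < n; i++) {
                counter.increment();
            }
        });
        writer.start();
        try {
            writer.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return counter.read();
    }
}
```

With more than one writer this loses updates; that case needs compareAndSet() or incrementAndGet().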
public class AtomicInteger extends Number
                           implements Serializable {

    // setup to use Unsafe.compareAndSwapInt for updates
    private static final Unsafe unsafe = Unsafe.getUnsafe();
    private static final long valueOffset;

    private volatile int value;

    //...

    public final boolean compareAndSet(int expect,
                                       int update) {
        return unsafe.compareAndSwapInt(this, valueOffset,
                                        expect, update);
    }
}
# {method} 'compareAndSet' '(JJ)Z' in 'java/util/concurrent/atomic/AtomicLong'
  # this:       rsi:rsi    = 'java/util/concurrent/atomic/AtomicLong'
  # parm0:      rdx:rdx    = long
  # parm1:      rcx:rcx    = long
  #             [sp+0x20] (sp of caller)
  mov     0x8(%rsi),%r10d
  shl     $0x3,%r10
  cmp     %r10,%rax
  jne     0x00007f6699037a60 ;      {runtime_call}
  xchg    %ax,%ax
  nopl    0x0(%rax,%rax,1)
  xchg    %ax,%ax
  sub     $0x18,%rsp
  mov     %rbp,0x10(%rsp)
  mov     %rdx,%rax
  lock cmpxchg %rcx,0x10(%rsi)
  sete    %r11b
  movzbl %r11b,%r11d ;*invokevirtual compareAndSwapLong
                        ; - j.u.c.a.AtomicLong::compareAndSet@9 (line 149)
  mov     %r11d,%eax
  add     $0x10,%rsp
  pop     %rbp
  test    %eax,0x91df935(%rip)          # 0x00007f66a223e000
                        ;   {poll_return}
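compareAndSet() compiles down to a single lock cmpxchg, and most non-blocking code wraps it in a read, compute, retry loop. A minimal sketch of that loop, assuming a hypothetical CasMax helper that keeps a running maximum across threads:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example: lock-free "store the maximum" built on a CAS retry loop.
class CasMax {
    static void updateMax(AtomicLong max, long candidate) {
        long current;
        do {
            current = max.get();
            if (candidate <= current) {
                return; // nothing to do, another value is already at least as large
            }
            // Retry if another thread changed the value between get() and the CAS.
        } while (!max.compareAndSet(current, candidate));
    }

    // Four threads offer the values 0..3999 between them; returns the final maximum.
    static long demo() {
        AtomicLong max = new AtomicLong(Long.MIN_VALUE);
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            final int base = t * 1000;
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    updateMax(max, base + i);
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return max.get();
    }
}
```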
[Chart: cost of set(), compareAndSet() and lazySet() in nanoseconds/op, y-axis 0 to 9]
Example - Disruptor Multi-producer




private void publish(Disruptor disruptor, long value) {
    long next = disruptor.next();
    disruptor.setValue(next, value);
    disruptor.publish(next);
}
Example - Disruptor Multi-producer
public long next() {
    long next;
    long current;

    do {
        current = nextSequence.get();
        next = current + 1;
        while (next > (readSequence.get() + size)) {
            LockSupport.parkNanos(1L);
            continue;
        }
    } while (!nextSequence.compareAndSet(current, next));

    return next;
}
Algorithm: Spin - 1



public void publish(long sequence) {
    long sequenceMinusOne = sequence - 1;
    while (cursor.get() != sequenceMinusOne) {
        // Spin
    }

    cursor.lazySet(sequence);
}
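The claim and publish steps above can be put together into something runnable. This is a sketch under stated assumptions, not the Disruptor's actual code: the class name SpinSequencer, the initial values of -1, and the demo method are hypothetical, and there is no consumer, so the buffer is sized larger than the number of claims and the gate in next() never triggers.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

// Hypothetical wrapper around the slides' next()/publish() pair.
class SpinSequencer {
    private final AtomicLong nextSequence = new AtomicLong(-1); // last claimed
    private final AtomicLong cursor = new AtomicLong(-1);       // last published
    private final AtomicLong readSequence = new AtomicLong(-1); // last consumed (never advanced here)
    private final long size;

    SpinSequencer(long size) {
        this.size = size;
    }

    long next() {
        long current;
        long next;
        do {
            current = nextSequence.get();
            next = current + 1;
            while (next > readSequence.get() + size) {
                LockSupport.parkNanos(1L); // buffer full: wait for the consumer
            }
        } while (!nextSequence.compareAndSet(current, next));
        return next;
    }

    void publish(long sequence) {
        while (cursor.get() != sequence - 1) {
            // Spin until every earlier sequence has been published.
        }
        cursor.lazySet(sequence);
    }

    // Each producer claims and publishes perProducer slots; returns the final cursor.
    static long demo(int producers, int perProducer) {
        SpinSequencer sequencer = new SpinSequencer(1L << 20);
        Thread[] threads = new Thread[producers];
        for (int t = 0; t < producers; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < perProducer; i++) {
                    sequencer.publish(sequencer.next());
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return sequencer.cursor.get();
    }
}
```

Because every producer must wait for its predecessor in publish(), one descheduled thread stalls all the others.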
[Chart: Spin - 1 throughput in million ops/sec (y-axis 0 to 25) against 1 to 8 producer threads]
Algorithm: Co-Op
public void publish(long sequence) {
    int counter = RETRIES;
    while (sequence - cursor.get() > pendingPublication.length()) {
        if (--counter == 0) {
            Thread.yield();
            counter = RETRIES;
        }
    }

    long expectedSequence = sequence - 1;
    pendingPublication.set((int) sequence & pendingMask, sequence);

    if (cursor.get() >= sequence) { return; }

    long nextSequence = sequence;
    while (cursor.compareAndSet(expectedSequence, nextSequence)) {
        expectedSequence = nextSequence;
        nextSequence++;
        if (pendingPublication.get((int) nextSequence & pendingMask) != nextSequence) {
            break;
        }
    }
}
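A runnable sketch of the Co-Op publish step, to show that a producer can return as soon as its slot is marked pending and leave the cursor update to whichever thread gets there first. The class name CoOpSequencer, the RETRIES value, the -1 initial values and the simplified incrementAndGet() claim (no consumer gating) are assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical wrapper around the slide's publish(); claim step is simplified.
class CoOpSequencer {
    private static final int RETRIES = 1000; // assumed value
    private final AtomicLong nextSequence = new AtomicLong(-1);
    private final AtomicLong cursor = new AtomicLong(-1);
    private final AtomicLongArray pendingPublication;
    private final int pendingMask;

    CoOpSequencer(int pendingSize) { // pendingSize must be a power of two
        pendingPublication = new AtomicLongArray(pendingSize);
        for (int i = 0; i < pendingSize; i++) {
            pendingPublication.set(i, -1L); // no sequence pending yet
        }
        pendingMask = pendingSize - 1;
    }

    long next() {
        return nextSequence.incrementAndGet(); // simplified: no wrap check
    }

    void publish(long sequence) {
        int counter = RETRIES;
        // Wait until this sequence fits in the pending window.
        while (sequence - cursor.get() > pendingPublication.length()) {
            if (--counter == 0) {
                Thread.yield();
                counter = RETRIES;
            }
        }
        long expectedSequence = sequence - 1;
        pendingPublication.set((int) sequence & pendingMask, sequence);
        if (cursor.get() >= sequence) {
            return; // another producer already published on our behalf
        }
        long nextSequence = sequence;
        // Advance the cursor over every consecutive pending sequence.
        while (cursor.compareAndSet(expectedSequence, nextSequence)) {
            expectedSequence = nextSequence;
            nextSequence++;
            if (pendingPublication.get((int) nextSequence & pendingMask) != nextSequence) {
                break;
            }
        }
    }

    // Each producer claims and publishes perProducer slots; returns the final cursor.
    static long demo(int producers, int perProducer) {
        CoOpSequencer sequencer = new CoOpSequencer(64);
        Thread[] threads = new Thread[producers];
        for (int t = 0; t < producers; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < perProducer; i++) {
                    sequencer.publish(sequencer.next());
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return sequencer.cursor.get();
    }
}
```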
[Chart: Spin - 1 and Co-Op throughput in million ops/sec (y-axis 0 to 30) against 1 to 8 producer threads]
Algorithm: Buffer
public long next() {
    long next;
    long current;

    do {
        current = cursor.get();
        next = current + 1;
        while (next > (readSequence.get() + size)) {
            LockSupport.parkNanos(1L);
            continue;
        }
    } while (!cursor.compareAndSet(current, next));

    return next;
}
Algorithm: Buffer


public void publish(long sequence) {
    int publishedValue = (int) (sequence >>> indexShift);
    published.set(indexOf(sequence), publishedValue);
}



// Get Value
int availableValue = (int) (current >>> indexShift);
int index = indexOf(current);
while (published.get(index) != availableValue) {
     // Spin
}
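The slides do not show indexOf, indexShift or how the published array is initialised. A minimal sketch of one way to fill those in, assuming a power-of-two buffer size, a mask-based indexOf, and slots initialised to -1 so that nothing looks published before the first lap. AvailabilityBuffer and isAvailable are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical example: each slot stores the "lap number" of the last sequence published into it.
class AvailabilityBuffer {
    private final AtomicIntegerArray published;
    private final int indexMask;
    private final int indexShift;

    AvailabilityBuffer(int size) {
        if (Integer.bitCount(size) != 1) {
            throw new IllegalArgumentException("size must be a power of two");
        }
        published = new AtomicIntegerArray(size);
        for (int i = 0; i < size; i++) {
            published.set(i, -1); // lap -1: nothing published yet
        }
        indexMask = size - 1;
        indexShift = Integer.numberOfTrailingZeros(size); // log2(size)
    }

    int indexOf(long sequence) {
        return (int) sequence & indexMask;
    }

    void publish(long sequence) {
        int publishedValue = (int) (sequence >>> indexShift); // which lap of the buffer
        published.set(indexOf(sequence), publishedValue);
    }

    // Consumer-side check; the slide spins on this condition.
    boolean isAvailable(long sequence) {
        int availableValue = (int) (sequence >>> indexShift);
        return published.get(indexOf(sequence)) == availableValue;
    }
}
```

With a size of 8, sequences 0 and 8 share slot 0 but carry lap values 0 and 1, so a consumer waiting for sequence 8 is not fooled by the entry left over from sequence 0. Producers no longer wait on each other in publish(), which is consistent with the higher throughput in the chart that follows.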
[Chart: Spin - 1, Co-Op and Buffer throughput in million ops/sec (y-axis 0 to 70) against 1 to 8 threads]
Stuff that sucks...
Q&A
• https://github.com/mikeb01/jax2012
• http://www.lmax.com/careers
• http://www.infoq.com/presentations/Lock-free-Algorithms
• http://www.youtube.com/watch?v=DCdGlxBbKU4

Locks? We Don't Need No Stinkin' Locks - Michael Barker

Editor's Notes

  • #2
    - Concurrency is taught all wrong.
    - What is non-blocking concurrency?
    - Mechanical Sympathy: locks/mutexes are a completely artificial construct.
    - MTs concurrency course: blocking v. non-blocking.
    - The tools for non-blocking concurrency are functions of the CPU, so we need to look at CPU architecture first.
  • #3
    - Causality
    - Why CPUs/compilers reorder
  • #4
    - The Java Memory Model provides sequential consistency for race-free programs
    - As-if-serial
    - Disallows out-of-thin-air values
    - First mainstream programming language to include a memory model (C/C++: a combination of the CPU and whatever the compiler happens to do)
  • #8
    - volatile
    - java.util.concurrent.atomic.*
      - Atomic<Long|Integer|Reference>
      - Atomic<Long|Integer|Reference>Array (why use over an array of atomics)
      - Atomic<Long|Integer|Reference>FieldUpdater (can be a bit slow)
  • #9
    - Fight club
    - If you’re smart enough
  • #26
    - Thread wake ups
    - Hard spin
    - Spin with yield
    - PAUSE instruction - please add to Java
    - MONITOR and MWAIT
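The waiting options listed in note #26 can be sketched side by side. This is an illustration only: WaitStrategies and its method names are hypothetical. The PAUSE request has since been addressed to a degree, as Thread.onSpinWait() (Java 9 and later) is a spin-loop hint that HotSpot on x86 is generally understood to compile to PAUSE. MONITOR/MWAIT has no Java equivalent that I am aware of, so it is not shown.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

// Hypothetical example: four ways to wait for a cursor to reach a target value.
class WaitStrategies {
    static void hardSpin(AtomicLong cursor, long target) {
        while (cursor.get() < target) {
            // Burn the core: lowest latency, highest CPU cost.
        }
    }

    static void spinWithHint(AtomicLong cursor, long target) {
        while (cursor.get() < target) {
            Thread.onSpinWait(); // spin-loop hint, Java 9+
        }
    }

    static void spinWithYield(AtomicLong cursor, long target, int retries) {
        int counter = retries;
        while (cursor.get() < target) {
            if (--counter == 0) {
                Thread.yield(); // let another runnable thread have the core
                counter = retries;
            }
        }
    }

    static void parkBriefly(AtomicLong cursor, long target) {
        while (cursor.get() < target) {
            LockSupport.parkNanos(1L); // sleep: pays a thread wake-up on resume
        }
    }

    // Runs each strategy once against a publisher thread; returns the sum of observed cursors.
    static long demo() {
        long total = 0;
        for (int strategy = 0; strategy < 4; strategy++) {
            AtomicLong cursor = new AtomicLong(0);
            Thread publisher = new Thread(() -> cursor.set(1));
            publisher.start();
            if (strategy == 0) {
                hardSpin(cursor, 1);
            } else if (strategy == 1) {
                spinWithHint(cursor, 1);
            } else if (strategy == 2) {
                spinWithYield(cursor, 1, 100);
            } else {
                parkBriefly(cursor, 1);
            }
            total += cursor.get();
            try {
                publisher.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return total;
    }
}
```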