Lockless Programming
Tomasz Barański
IBM Research
Me
Making software for 15 years
IBM Research @ KRK
Lockless?
Programming with multiple threads that access
shared memory in such a way that the threads
cannot block each other.
Why?
And also
(Dead|Live)locks
Priority inversion
Lock convoy
How?
Atomic operations (ἄτομος: indivisible)
CAS, FAA | AAF
Memory barriers
LoadLoad, LoadStore
StoreLoad, StoreStore
Compare-And-Swap
cas(val, old, new) =
  if val == old
    val = new
    return SUCCESS
  else
    return FAIL
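In C++11 the pseudocode above corresponds roughly to std::atomic<T>::compare_exchange_strong. The following is a minimal sketch, not a definitive implementation; the helper names cas and atomic_max are made up for this example and do not come from the talk.

```cpp
#include <atomic>

// Hypothetical helper mirroring cas(val, old, new) from the slide.
inline bool cas(std::atomic<int>& val, int old_value, int new_value) {
    // compare_exchange_strong overwrites its first argument with the
    // current value on failure; old_value is a local copy, so the
    // caller's variable is not affected.
    return val.compare_exchange_strong(old_value, new_value);
}

// Typical use of CAS: a retry loop. Raises val to at least candidate.
inline void atomic_max(std::atomic<int>& val, int candidate) {
    int seen = val.load();
    while (seen < candidate && !cas(val, seen, candidate)) {
        seen = val.load();  // another thread changed val, so look again
    }
}
```

The retry loop is the usual shape of CAS-based code: read, compute, try to publish, and start over if someone else got there first.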
Fetch-And-Add
faa(val, i) =
  tmp = val
  val += i
  return tmp
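In C++11, fetch-and-add is available as std::atomic<T>::fetch_add, which returns the previous value just like faa above. The sketch below uses it for a shared counter; the function name count_concurrently and its parameters are assumptions made for this example.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: `threads` threads each add 1 to a shared
// counter `per_thread` times; returns the final counter value.
inline int count_concurrently(int threads, int per_thread) {
    std::atomic<int> counter(0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                counter.fetch_add(1);  // FAA: returns the value before the add
            }
        });
    }
    for (std::thread& th : pool) {
        th.join();
    }
    return counter.load();
}
```

No increment is lost because each fetch_add is a single indivisible read-modify-write. With a plain int and ++ the same program would be a data race.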
Sequential consistency
acquire lock
read X
read Y
(…)
store Y
store X
release lock
Pseudo-assembly
As written:
acquire lock
read X
read Y
(…)
store Y
store X
release lock
After reordering (compiler, JVM, CPU):
acquire lock
read Y
(…)
store X
(…)
read X
(…)
store Y
release lock
Thread 1:
read Y
(…)
store X
(…)
read X
(…)
store Y
Thread 2:
read Y
(…)
store X
(…)
read X
(…)
store Y
What are X and Y?
Sequential consistency
All threads (on all CPUs) agree on the order of all
memory operations, and that order is consistent
with the order of operations in the source code.
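Sequential consistency is the default memory order of std::atomic in C++11 (std::memory_order_seq_cst). A small sketch of what the definition rules out is the classic store-buffering test: each thread stores to one variable and then reads the other. Under sequential consistency at least one thread must see the other's store. The function name both_read_zero is an assumption made for this example.

```cpp
#include <atomic>
#include <thread>

// Runs the store-buffering test once.
// Returns true if both threads read 0, which sequential
// consistency forbids.
inline bool both_read_zero() {
    std::atomic<int> x(0), y(0);
    int r1 = -1, r2 = -1;
    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}
```

With weaker orderings (for example std::memory_order_relaxed) the "both read 0" outcome becomes legal, because the store and the following load may then be reordered.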
Memory barriers
With a LoadLoad barrier:
read X
LoadLoad Barrier
read Y
(…)
store Y
store X
A reordering (compiler, JVM, CPU) that keeps read X before read Y:
read X
(…)
store X
(…)
read Y
(…)
store Y
With a StoreStore barrier:
read X
read Y
(…)
store Y
StoreStore Barrier
store X
A reordering (compiler, JVM, CPU) that keeps store Y before store X:
read Y
(…)
store Y
(…)
read X
(…)
store X
With a LoadStore barrier:
read X
read Y
(…)
LoadStore Barrier
store Y
store X
A reordering (compiler, JVM, CPU) that keeps both reads before both stores:
read Y
(…)
read X
(…)
store X
(…)
store Y
With a StoreLoad barrier:
store X
store Y
(…)
StoreLoad Barrier
read X
read Y
A reordering (compiler, JVM, CPU) that keeps both stores before both reads:
store Y
(…)
store X
(…)
read X
(…)
read Y
Full barrier
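In C++11, barriers are written with std::atomic_thread_fence. As a rough mapping (an approximation, not the standard's wording): a release fence acts like LoadStore + StoreStore, an acquire fence like LoadLoad + LoadStore, and a seq_cst fence like a full barrier that also covers StoreLoad. The sketch below passes a plain value between two threads using fences; the function name pass_message and the value 42 are made up for this example.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: hands a plain int from one thread to another.
inline int pass_message() {
    int payload = 0;                  // ordinary, non-atomic data
    std::atomic<bool> ready(false);   // flag that publishes the data
    int received = -1;

    std::thread producer([&] {
        payload = 42;
        // Release fence: the write to payload may not be reordered
        // after the store to ready below.
        std::atomic_thread_fence(std::memory_order_release);
        ready.store(true, std::memory_order_relaxed);
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_relaxed)) {
            std::this_thread::yield();
        }
        // Acquire fence: the read of payload may not be reordered
        // before the load of ready above.
        std::atomic_thread_fence(std::memory_order_acquire);
        received = payload;
    });
    producer.join();
    consumer.join();
    return received;
}
```

Without the two fences the relaxed flag alone would not order the accesses to payload, and the consumer could legally read a stale value.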
Let's get practical!
Lock-free (FIFO) queue
(by John D. Valois)
enqueue(x) =
  acquire(lock)
  q = new Node
  q.value = x
  q.next = NULL
  tail.next = q
  tail = q
  release(lock)
enqueue(x) =
  q = new Node
  q.value = x
  q.next = NULL
  do
    p = tail
    succ = CAS(p.next, NULL, q)
    if !succ
      CAS(tail, p, p.next)
  while !succ
  CAS(tail, p, q)
dequeue() =
  do
    p = head
    if p.next == NULL
      error QUEUE_EMPTY
  while !CAS(head, p, p.next)
  return p.next.value
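The enqueue and dequeue pseudocode above can be sketched in C++11 as follows. This is a simplified illustration under stated assumptions, not a production queue: it stores int values, starts with a dummy node, reports an empty queue by returning false instead of raising QUEUE_EMPTY, and never frees nodes. Leaking nodes is a deliberate shortcut that sidesteps memory reclamation and the ABA problem. The class name LockFreeQueue is made up for this example.

```cpp
#include <atomic>

struct Node {
    int value;
    std::atomic<Node*> next;
    explicit Node(int v = 0) : value(v), next(nullptr) {}
};

class LockFreeQueue {
    std::atomic<Node*> head;
    std::atomic<Node*> tail;

public:
    LockFreeQueue() {
        Node* dummy = new Node();  // head always points at a dummy node
        head.store(dummy);
        tail.store(dummy);
    }

    void enqueue(int x) {
        Node* q = new Node(x);
        Node* p;
        bool succ;
        do {
            p = tail.load();
            Node* expected = nullptr;
            // Try to link q after the node we believe is last.
            succ = p->next.compare_exchange_strong(expected, q);
            if (!succ) {
                // tail lags behind; expected now holds p->next.
                // Help move tail forward, then retry.
                Node* t = p;
                tail.compare_exchange_strong(t, expected);
            }
        } while (!succ);
        // Swing tail to the new node; failure means someone helped.
        Node* t = p;
        tail.compare_exchange_strong(t, q);
    }

    // Returns false if the queue is empty, otherwise writes to out.
    bool dequeue(int& out) {
        Node* p;
        Node* next;
        do {
            p = head.load();
            next = p->next.load();
            if (next == nullptr) {
                return false;
            }
        } while (!head.compare_exchange_weak(p, next));
        out = next->value;  // next is the new dummy node
        return true;
    }
};
```

Because nodes are never deleted, a pointer read from head or tail always stays valid. A real implementation needs a reclamation scheme such as reference counting or hazard pointers.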
Never waits
Never blocks
Silver bullet?
More difficult
ABA problem
Solution?
Tagged reference
Intermediate nodes
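A tagged reference pairs the reference with a version counter that is bumped on every successful update, so a value that went A, B, A no longer compares equal to the stale A. The sketch below packs a 32-bit slot index and a 32-bit tag into one 64-bit atomic word. Using an index instead of a pointer, and the names pack, index_of, tag_of and tagged_cas, are assumptions made for this example.

```cpp
#include <atomic>
#include <cstdint>

// Pack a 32-bit index (low half) and a 32-bit tag (high half).
inline std::uint64_t pack(std::uint32_t index, std::uint32_t tag) {
    return (static_cast<std::uint64_t>(tag) << 32) | index;
}
inline std::uint32_t index_of(std::uint64_t word) {
    return static_cast<std::uint32_t>(word);
}
inline std::uint32_t tag_of(std::uint64_t word) {
    return static_cast<std::uint32_t>(word >> 32);
}

// Replace the index only if ref still equals `seen`, tag included.
// Every successful update increments the tag.
inline bool tagged_cas(std::atomic<std::uint64_t>& ref,
                       std::uint64_t seen,
                       std::uint32_t new_index) {
    std::uint64_t desired = pack(new_index, tag_of(seen) + 1);
    return ref.compare_exchange_strong(seen, desired);
}
```

A thread holding a snapshot taken before an A, B, A sequence fails its CAS, because the index matches but the tag has moved on. The tag can wrap around after 2^32 updates, which this sketch ignores.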
LL/SC
Load-Link / Store-Conditional
Separates "storage has value"
from "storage has been changed".
PowerPC, ARM
but NOT: x86, SPARC
LoadLink(x) =
  read(x)
  mark(x)
StoreConditional(x) =
  if x marked
    store(x)
    unmark(x)
    return SUCCESS
  else
    return FAILURE
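C++ does not expose LL/SC directly. The closest portable tool is compare_exchange_weak, which is allowed to fail spuriously so that it can be implemented with a load-link/store-conditional pair on architectures that have one. It is therefore always used in a loop. A minimal sketch follows; the helper name fetch_multiply is made up for this example.

```cpp
#include <atomic>

// Hypothetical helper: atomically multiplies val by factor and
// returns the previous value.
inline int fetch_multiply(std::atomic<int>& val, int factor) {
    int seen = val.load();
    // On any failure, real or spurious, compare_exchange_weak
    // reloads the current value into seen, so the loop body is empty.
    while (!val.compare_exchange_weak(seen, seen * factor)) {
    }
    return seen;
}
```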
Language support
C (gcc)
__sync_fetch_and_add (_sub, _or...)
__sync_add_and_fetch (_sub, _or...)
__sync_bool_compare_and_swap
__sync_val_compare_and_swap
__sync_synchronize
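A short sketch of how the builtins listed above behave. They need no header and work on plain variables, but they are compiler extensions, so this assumes GCC or Clang. The code is written as C++ and compiles with g++. The function name demo_sync_builtins is made up for this example.

```cpp
// Requires GCC or Clang; the __sync builtins are compiler intrinsics.
inline bool demo_sync_builtins() {
    int counter = 10;

    // Returns the old value (10); counter becomes 15.
    int before = __sync_fetch_and_add(&counter, 5);

    // Returns the new value (20); counter becomes 20.
    int after = __sync_add_and_fetch(&counter, 5);

    // Swaps 20 -> 1 and reports success as a bool.
    bool swapped = __sync_bool_compare_and_swap(&counter, 20, 1);

    // Expected value 99 does not match, so nothing is stored;
    // returns the value actually found (1).
    int seen = __sync_val_compare_and_swap(&counter, 99, 2);

    // Full memory barrier.
    __sync_synchronize();

    return before == 10 && after == 20 && swapped
        && seen == 1 && counter == 1;
}
```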
C++11
#include <atomic>
template <class T> struct atomic;
atomic_thread_fence(...)
::store(...)
::load(...)
::compare_exchange_weak(...)
::compare_exchange_strong(...)
::fetch_add(...)
Java
java.util.concurrent.atomic
AtomicInteger
.addAndGet
.getAndAdd
.compareAndSet
AtomicIntegerArray
AtomicReference
AtomicStampedReference
Questions?

Atmosphere 2014: Lockless programming - Tomasz Barański