This document discusses trends towards lock-free concurrency in Java. It begins with an introduction to concurrency and Java's memory model. It then covers the concurrency primitives in JDK 8, such as volatile variables, atomics, concurrent collections, and explicit locks, and discusses how techniques like biased locking improve the performance of atomics. Finally, it explores trends like software transactional memory (STM) that aim to make lock-free programming more practical.
2. agenda
intro to concurrency & memory model on jvm
reordering -> barriers -> happens-before
jdk 8 concurrency primitives
volatile -> atomics, collections, explicit locks, fork/join
trends in this area (to make all of this practical)
lock-free, STM, TM
3. background
for control and performance, sometimes there are valid
reasons to use locks (like a mutex) for concurrency control
in most other situations, primitive synchronization constructs buried inside individual modules lead to unreliable & incorrect programs, because most non-trivial systems are composed from such modules
the current best practice, therefore, is to write single-threaded programs
4. ‘automatic’ concurrency
there are platforms that take your single-threaded program and run it concurrently — most web servers do this, for example
on the other hand, there are times when you
really must use multiple threads
5. practicality
concurrency control techniques have been studied for a while, but since 2005 they have been studied intensely* to make them more practical for more widespread (and safer) use
simpler software techniques, and also hardware level
support for those techniques are being developed
before we see how to write safe code using these new
techniques, let’s look into some basics
* https://scholar.google.com/scholar?as_ylo=2005&q=%22software+transactional+memory%22
6. why concurrency control?
when dealing with multiple threads,
concurrency control/synchronization is
necessary not only to guard critical sections
from multiple threads using a mutex…
but also to ensure that the memory updates
(through mutable variables) are made visible
to all threads ‘correctly’
7. memory model
as a platform, jvm guarantees that ‘correctly
synchronized’ programs have a very well
defined memory behavior
let’s look into the jvm memory model which
defines those guarantees
8. memory model
your code manipulates memory by using variables and
objects
the memory is separated by a few layers of caches from
the cpu
on a multi-core cpu when a write happens in one cpu’s
cache, we need to make it visible to other cpus as well
and then there is the topic of re-ordering…
* http://en.wikipedia.org/wiki/Memory_barrier
9. memory model
to improve performance, the hardware (cpu, caches, …) dynamically reorders memory accesses using its own memory model (set of rules)*
the visibility of a value in a memory location is further complicated by the code reordering performed statically by the compiler
http://en.wikipedia.org/wiki/Memory_ordering
10. memory model
the static and dynamic reordering strive to
ensure an ‘as-if serial’ semantics
i.e., the program appears to be executing
sequentially as per the lines in your source
code
11. memory model
memory reordering is transparent in single-
threaded use-cases because of that as-if-
serial guarantee
but logic quickly falls apart and causes
surprises in incorrectly synchronized multi-
threaded programs
12. memory model
while the jvm’s OOTA safety (out of thin air) guarantees that a thread always reads a value written by *some* thread, and not some value out of thin air…
with all the reordering, it’s good to have a slightly stronger guarantee…
13. the need for memory barriers
in the following code, say reader is called after writer (from different threads):

class Reordering {
  int x = 0, y = 0;

  public void writer() {
    x = 1;
    y = 2;
  }

  public void reader() {
    int r1 = y;
    int r2 = x;
    // use r1 and r2
  }
}

in reader, even if r1 == 2, r2 can be 0 or 1
synchronization is needed if we want to control the ordering (and ensure r2 == 1) using a memory barrier
14. memory barrier
the jvm memory model essentially defines the
relationship between the variables in your
code
the semantics also define a partial ordering on
the memory operations so certain actions are
guaranteed to ‘happen before’ others
15. happens-before
happens-before is a visibility guarantee for
memory provided through synchronization
such as locking, volatiles, atomics, etc
…and for completeness, through Thread
start() & join()
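a minimal sketch of those start()/join() edges (the class and field names here are ours, for illustration): a plain, non-volatile field written by a worker thread is reliably visible after join() returns, with no volatile or lock involved.

```java
public class JoinVisibility {
    static int result = 0; // deliberately non-volatile

    static int computeInWorker() {
        Thread worker = new Thread(() -> result = 42); // write happens in the worker
        worker.start();  // Thread.start() happens-before the worker's first action
        try {
            worker.join(); // worker termination happens-before join() returning
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result; // guaranteed to observe 42
    }

    public static void main(String[] args) {
        System.out.println(computeInWorker()); // prints 42
    }
}
```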
16. Concurrency control on jvm with
JDK 8
with that background, let’s look at some
specific tools & mechanisms available on the
jvm & jdk 8..
17. Concurrency control on jvm with
JDK 8
volatiles
atomics
concurrent collections/data-structures
synchronizers
fork/join framework
18. volatiles
volatiles are typically used as state variables across threads
writing to & reading from a volatile is like releasing and acquiring a monitor (lock), respectively
i.e., it guarantees a happens-before relationship not just with other volatiles but also with non-volatile memory
19. volatiles
typical use of volatiles with reader and writer called from
different threads:
class VolatileExample {
  int x = 0;
  volatile boolean v = false;

  public void writer() {
    x = 42;
    v = true;
  }

  public void reader() {
    if (v == true) {
      // uses x — guaranteed to see 42
    }
  }
}
the happens-before guarantee in the jvm memory model makes it simpler to reason about the value in x, even though x is non-volatile!
code: https://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html
20. volatiles
guaranteeing a happens-before relationship for non-volatile memory is a performance overhead, so like any other synchronization primitive, volatiles must be used judiciously
but they greatly simplify the program, by aligning the dynamic and static reordering with most programmers’ expectations
21. atomics
atomics* extend the notion of volatiles, and support
conditional updates
being an extension to volatiles, they guarantee
happens-before relationship on memory operations
the updates are performed through a CAS cpu
instruction
* http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/atomic/package-summary.html
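a tiny sketch of such a conditional update (the CasDemo class and the ‘claim’ scenario are illustrative, not from the api docs): compareAndSet writes the new value only if the current value matches the expected one, so exactly one caller wins.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasDemo {
    // 0 = free, 1 = claimed; compareAndSet succeeds for exactly one caller
    static int claimTwice() {
        AtomicInteger state = new AtomicInteger(0);
        boolean first = state.compareAndSet(0, 1);  // succeeds: 0 -> 1
        boolean second = state.compareAndSet(0, 1); // fails: value is already 1
        return (first && !second) ? state.get() : -1;
    }

    public static void main(String[] args) {
        System.out.println(claimTwice()); // prints 1
    }
}
```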
22. atomics
atomics/cas allow designing non-blocking algorithms where the critical section is around a single variable
if there is more than one variable, other forms of synchronization are needed
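the classic single-variable non-blocking shape is a retry loop (the CasLoop class and getAndDouble operation are illustrative names): read the current value, compute the new one locally, then compareAndSet; if another thread raced ahead, loop and retry.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasLoop {
    private final AtomicInteger value;

    public CasLoop(int initial) { value = new AtomicInteger(initial); }

    // lock-free read-modify-write: retry until our CAS wins the race
    public int getAndDouble() {
        for (;;) {
            int current = value.get();  // read the current value
            int next = current * 2;     // compute the new value locally
            if (value.compareAndSet(current, next)) {
                return current;         // our update won; no lock was held
            }
            // another thread changed 'value' in between; retry
        }
    }

    public int get() { return value.get(); }

    public static void main(String[] args) {
        CasLoop c = new CasLoop(3);
        System.out.println(c.getAndDouble()); // prints 3
        System.out.println(c.get());          // prints 6
    }
}
```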
23. CAS
JDK 8 uses CAS for ‘lock-free’ operation
at a high level, it piggybacks on a cpu-provided CAS* instruction — like lock cmpxchg on x86
let’s see how jvm dynamically improves the
performance of the hardware provided CAS
*CAS: http://en.wikipedia.org/wiki/Compare-and-swap
24. CAS/atomics
CAS in recent cpu implementations doesn’t assert the lock# signal to gain exclusive bus access, but rather relies on efficient cache-coherence protocols* — unless the memory address is not cache-line aligned
even though that helps CAS scale on many-core systems, CAS still adds a lot to local latency, sometimes nearly halting the cpu
to address that local latency, a biased-locking* approach is used — where uncontended usage of atomics is recompiled dynamically to not use CAS instructions!
* more about MESI: https://courses.engr.illinois.edu/cs232/sp2009/lectures/x24.pdf
* biased locking in jvm: https://blogs.oracle.com/dave/entry/biased_locking_in_hotspot
26. atomics
before we move on: the atomic classes also provide a ‘weakCompareAndSet’ api, which relaxes the happens-before ordering guarantee
relaxing the ordering makes it very hard to reason
about the program’s execution so its use is limited to
debugging counters, etc
there are better ways of doing this ‘fast’ — which
brings us to…
27. adders & accumulators
under high contention, if we used atomics, biased locking would spend too much time revoking the bias from one thread after another
in these high-contention situations, adders* help gather counts by actively reducing contention, ‘gathering’ the value only when sum() or longValue() is called
* http://download.java.net/lambda/b78/docs/api/java/util/concurrent/atomic/LongAdder.html
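a sketch of that usage (the AdderDemo class and thread counts are illustrative): several threads increment a shared LongAdder, which spreads contention over internal cells, and sum() gathers the total only at the end.

```java
import java.util.concurrent.atomic.LongAdder;

public class AdderDemo {
    // many threads increment concurrently; contention is spread across
    // internal cells, and sum() gathers the total only when asked
    static long countAcrossThreads(int threads, int perThread) {
        LongAdder adder = new LongAdder();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) adder.increment();
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try { t.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return adder.sum();
    }

    public static void main(String[] args) {
        System.out.println(countAcrossThreads(4, 100_000)); // prints 400000
    }
}
```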
28. concurrent collections
the JDK also comes with a handful of lock-
free collections
these help in correctly synchronizing larger
data sets than single variables
29. concurrent collections
ConcurrentHashMap (CHM) uses some of the concepts listed so far and provides lock-free reads, and mostly lock-free writes, in java 8
it relies on a good hashCode to reduce collisions, after which it falls back to using a lock for that bin
30. concurrent collections
CHM — in general — allows concurrent use of
a Map which can be pretty useful especially to
represent a shared ‘mutating’ state, and such
CHM, together with adders for example,
enable concurrent, lock-free, histogram
generation across threads
more about CHM here, of course: https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/package-summary.html
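a sketch of that histogram idea, combining CHM and adders (the Histogram class and its method names are illustrative): computeIfAbsent installs a per-key LongAdder once, after which increments proceed without any common lock.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class Histogram {
    private final ConcurrentHashMap<String, LongAdder> counts = new ConcurrentHashMap<>();

    // computeIfAbsent installs a per-key LongAdder once; after that,
    // increments on different keys never contend on a common lock
    public void record(String key) {
        counts.computeIfAbsent(key, k -> new LongAdder()).increment();
    }

    public long count(String key) {
        LongAdder a = counts.get(key);
        return a == null ? 0L : a.sum();
    }

    public static void main(String[] args) {
        Histogram h = new Histogram();
        h.record("GET"); h.record("GET"); h.record("POST");
        System.out.println(h.count("GET"));  // prints 2
        System.out.println(h.count("PUT"));  // prints 0
    }
}
```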
33. synchronized
the synchronized keyword is a coarse-grained locking scheme
you acquire & release locks at method or block level, typically holding the lock longer than needed
it translates directly to jvm (intrinsic) synchronization & a hardware monitor
so its use is currently discouraged (might change in java 9)
34. explicit locks
Locks* enable fine-grained locking
these extend intrinsic locks, and allow unconditional,
polled, timed & interruptible lock acquisition
allow ‘custom’ wait/notify queues (Condition*) on the
same lock
nice features, but …
* http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/Lock.html
* http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/Condition.html
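one of those features, polled acquisition, sketched below (the PolledLocking class is an illustrative name): tryLock() returns immediately instead of blocking, so the caller can back off or do other work when the lock is busy.

```java
import java.util.concurrent.locks.ReentrantLock;

public class PolledLocking {
    private final ReentrantLock lock = new ReentrantLock();
    private int shared = 0;

    // polled acquisition: return immediately instead of blocking
    public boolean incrementIfFree() {
        if (lock.tryLock()) {
            try {
                shared++;          // the protected update
                return true;
            } finally {
                lock.unlock();     // always release in finally
            }
        }
        return false;              // lock was busy; caller decides what to do
    }

    public int value() { return shared; }

    public static void main(String[] args) {
        PolledLocking p = new PolledLocking();
        System.out.println(p.incrementIfFree()); // prints true (uncontended)
        System.out.println(p.value());           // prints 1
    }
}
```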
35. explicit locks
developers need to remember to release locks, so the following style is encouraged:

Lock l = ...;
l.lock();
try {
  // access the resource protected by this lock
} finally {
  l.unlock();
}

it gets *very* complicated when we have to deal with more than one lock
…a source of all kinds of bugs & surprises
* http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/Lock.html
36. ReentrantLock
an implementation of the Lock interface described earlier
supports a fairness policy to deal with lock starvation — ‘fair’, not ‘fast’
there is nothing special in this lock that makes it ‘reentrant’; all intrinsic locks on the jvm are per-thread and reentrant, unlike POSIX locks, which are invocation-based
* http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/ReentrantLock.html
37. a note about reentrancy
reentrancy helps encapsulate locking behavior & helps write
cleaner (oop) concurrent code
in simpler cases (using single ‘resource’ but multiple methods)
this also helps avoid deadlocks:
class A {
  synchronized void run() {
    //..
  }
}

class B extends A {
  synchronized void run() {
    super.run();
  }
}
if intrinsic locks were not reentrant on jvm, the call to
super.run() would be deadlocked
39. StampedLock
supports optimistic reads & lock upgrades
is not reentrant — needs the stamp, so not
usable across calls to unknown methods
for internal use in thread safe components,
where you fully understand the data, objects
& methods involved
40. StampedLock
for very short read-only code, optimistic
reads improve throughput by reducing
contention
useful when reading multiple fields of an
object from memory without locking
must call validate() later to ensure consistency
41. StampedLock
along with optimistic reads, the lock upgrade
capability enables many useful idioms:
StampedLock sl = new StampedLock();
double x, y;
..

double distanceFromOrigin() { // a read-only method
  long stamp = sl.tryOptimisticRead();
  double currentX = x, currentY = y; // read without locking
  if (!sl.validate(stamp)) {
    stamp = sl.readLock(); // upgrade to a read-lock if the values are dirty
    try {
      currentX = x;
      currentY = y;
    } finally {
      sl.unlockRead(stamp);
    }
  }
  return Math.sqrt(currentX * currentX + currentY * currentY);
}
42. fork/join
unlike regular java.lang.Threads (which are mostly backed by POSIX threads), fork/join tasks never ‘block’
for simple tasks, the overhead of constructing and/or
managing a thread is more expensive than the task
itself
programming on fork/join, in essence, allows
frameworks to optimize such tasks ‘behind the scenes’
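a sketch of the usual divide-and-conquer shape on fork/join (the SumTask class and the threshold value are illustrative choices, not prescribed by the framework): small ranges are computed directly, larger ones fork a subtask that the pool’s work-stealing scheduler runs.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // illustrative cut-off
    private final long[] data;
    private final int lo, hi;

    SumTask(long[] data, int lo, int hi) {
        this.data = data; this.lo = lo; this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {             // small range: compute directly
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += data[i];
            return sum;
        }
        int mid = (lo + hi) >>> 1;              // split the range in half
        SumTask left = new SumTask(data, lo, mid);
        left.fork();                            // schedule the left half
        long rightSum = new SumTask(data, mid, hi).compute(); // do right half here
        return rightSum + left.join();          // combine the two halves
    }

    public static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[10_000];
        java.util.Arrays.fill(data, 1L);
        System.out.println(sum(data)); // prints 10000
    }
}
```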
43. fork/join
going beyond performance, the framework does
nothing to ensure concurrency control
the framework is also only usable in the few scenarios where a task can be easily decomposed
in a sense, this is not making it easier to create correct (and fast) programs
44. lambdas & streams
framework available on jdk 8 for data-
processing workloads
looks ‘functional’ — but due to type erasure these aren’t fully typed at runtime
lambdas ‘look’ like anonymous inner classes but are fundamentally different from the ground up — enabling jvm optimizations for concurrency & gc
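for comparison, the same data-parallel idea as a stream pipeline (the StreamSum class is an illustrative name): parallel() hands the work to the common fork/join pool behind the scenes.

```java
import java.util.stream.LongStream;

public class StreamSum {
    // a declarative pipeline; parallel() splits the range across the
    // common fork/join pool without any explicit thread management
    static long sumTo(long n) {
        return LongStream.rangeClosed(1, n).parallel().sum();
    }

    public static void main(String[] args) {
        System.out.println(sumTo(100)); // prints 5050
    }
}
```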
45. lock-free
we’ve looked at a few lock-free concepts at a
single-variable level, using CAS
and atomics, which rely on CAS
and optimizations to make CAS faster…
46. lock-free
but how do we write ‘real-world’ concurrent
applications using lock-free concepts?
i.e., more than just CAS?
47. lock-free
that brings us to software transactional
memory (STM)!
STM is to concurrency control what garbage collection is to memory management
48. STM
brings DB transaction concept to regular
memory access
read & write ‘as-if’ there is no contention…
during commit time the system ensures sanity
under the hood
… no locks in the code!
49. STM
in low contention use-cases (i.e., well-
designed programs), the absence of
synchronization makes execution very fast!
even in poorly designed programs, the
absence of locks makes it easier to focus on
correctness
50. STM implementation
multiverse[1] is a popular jvm implementation of
STM (groovy and Scala/Akka use it in their STM)
in essence, multiverse implements multiversion
concurrency control (MVCC[2])
Clojure has a language built-in STM feature
[1] http://multiverse.codehaus.org/overview.html
[2] http://en.wikipedia.org/wiki/Multiversion_concurrency_control
51. STM & composability
the biggest benefit of STM is composability (software
reuse)
class Account {
  private final TxnRef<Date> lastUpdate = …;
  private final TxnInteger balance = …;

  public void incBalance(int amount, Date date) {
    atomic(new Runnable() {
      public void run() {
        balance.inc(amount);
        lastUpdate.set(date);
        if (balance.get() < 0) {
          throw new IllegalStateException("Not enough money");
        }
      }
    });
  }
}

class Teller {
  static void transfer(Account from, Account to, int amount) {
    atomic(new Runnable() {
      public void run() {
        Date date = new Date();
        from.incBalance(-amount, date);
        to.incBalance(amount, date);
      }
    });
  }
}
52. STM & composability
the Teller class is able to ‘compose’ over other
atomic operations without knowing their internal
details (i.e., what locks they use to synchronize)
so if to.incBalance() fails, the memory effects of
from.incBalance() are not committed so will
never be visible to other threads!
this is a pretty big deal…
53. Simplicity
STM makes composing concurrent software
modules appear very trivial
in the absence of locks, it is easier to
conceptualize the code flow
the ability to code atomic operations this way
essentially nullifies the challenges typically
associated with concurrent programming
54. performance
as stated earlier, stm allows optimistic execution: ‘as
though’ there are no other threads running, so it
increases concurrency
STM synchronizes only when required and falls back to
slower (serialized) executions when necessary
STM performs better than explicit locks as the number of cores increases beyond 4*
* http://en.wikipedia.org/wiki/Software_transactional_memory
http://channel9.msdn.com/Shows/Going+Deep/Programming-in-the-Age-of-Concurrency-Software-Transactional-Memory
55. more performance
apart from just software improvements, cpu
makers have started looking into hardware
support for TM
this is an emerging area, and more advances are being made beyond Intel’s Haswell and its TSX extensions
* https://software.intel.com/en-us/blogs/2012/02/07/transactional-synchronization-in-haswell
* http://en.wikipedia.org/wiki/Transactional_Synchronization_Extensions
56. STM & Practicality
concurrent programming is getting more
practical
stm brings the benefits of fine-grained locking
to coarse-grained locking without using locks
57. Summary
lock-free concurrency control techniques like
STM not only make it easier to write correct
code…
but also allow platforms (like the jvm) to make your correct code run faster
58. References
Being a long slideshow with dense content,
I’ve put references on each slide so you can
read through
Reach out to me on LinkedIn if you’d like more
info or just to discuss!