Advanced
Java Concurrency
arno.haase@haase-consulting.com
Premature Optimization is
the root of all evil.
Donald Knuth
We should forget about small
efficiencies, say about 97% of
the time:
Yet we should not pass up our
opportunities in that critical 3%.
JDK: Atomic*, Streams, synchronized,
Locks, Fork/Join, ConcHashMap, ...
Libraries: JEE / Spring,
Akka, LMAX, Servlet 3, ...
Java Memory Model
Architecture
Algorithms
Concepts
Ideas
1. Java Memory Model
2. Concepts and Paradigms
3. Performance
Threads (naive)
● Threads take turns doing work.
● Each Thread does what the source code says.
● When a thread is interrupted, other threads see
the intermediate state.
● Synchronization serves only to make several
changes atomic.
Threads (a little less naive)
● There are several CPUs / cores executing
threads really at the same time.
● Each thread does what the source code says.
● Data access goes to RAM.
● When different threads access the same data,
they do that one after the other.
Threads: the truth
● Within a single thread, the JVM makes it look
as if the thread does what the source code
says.
● There is a defined ordering between things
happening in different threads if
synchronization prescribes that („happens-before“).
● And that is all!
Does your computer do
what you programmed?
yes no
Compiler: Instruction Reordering,
expression caching, …
Hotspot: Register Allocation, Instruction
Reordering, Escape Analysis, …
Processor: Branch Prediction, Speculative
Evaluation, Prefetch, …
Caches: Store Buffers, Private Shared
Caches, …
Transformations
Visible effects are the same at all levels.
Example: Loop termination
// As written in the source code:
boolean shutdown = false;
…
void doIt() {
    while (!shutdown) {
        …
    }
}

// A transformation the compiler / JIT may legally perform: the read is hoisted
// out of the loop, so the loop may never see a later write to shutdown.
boolean shutdown = false;
…
void doIt() {
    boolean b = shutdown;
    while (!b) {
        …
    }
}
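The hoisted variant never re-reads shutdown, so the loop can spin forever. A minimal fix (a sketch, assuming the flag is the only shared state) is to declare the flag volatile, which forbids that transformation:

class Worker {
    private volatile boolean shutdown = false;   // volatile: every iteration must re-read it

    void requestShutdown() { shutdown = true; }  // illustrative setter, not from the slides

    void doIt() {
        while (!shutdown) {
            // … work …
        }
    }
}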
Correct Synchronization
● No Race Conditions:
● If several threads access the same variable,
● and at least one of them writes,
● then access must be ordered by „happens-before“.
● If that is true for a program, then it will run as if
● all memory access was sequential
● and really went to RAM
● in the order defined by „happens-before“.
Example: volatile
● If
● Thread T1 writes a volatile variable v, and
● Thread T2 then reads v,
● then
● all changes T1 made up to writing v are 'before' all code in T2 after the read from v (see the sketch below).
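A minimal sketch of this guarantee (class and field names are illustrative, not from the slides): a plain field written before the volatile write is visible to any thread that subsequently observes the volatile read.

class Handoff {
    private int payload;                      // plain, non-volatile field
    private volatile boolean ready = false;

    void produce() {                          // runs in T1
        payload = 42;
        ready = true;                         // volatile write publishes payload as well
    }

    void consume() {                          // runs in T2
        if (ready) {                          // volatile read
            System.out.println(payload);      // guaranteed to print 42, never 0
        }
    }
}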
Memory Barriers
● Tools used to implement JMM
● Assembler instructions
● „synchronizing“ CPUs and Caches
● Constrain Reordering
● Potentially expensive
● Direct Cost: Cache Dirtying
● Limit Optimizations
● Especially between processors
● Where does Java use them?
● synchronized, Locks
● volatile
● after constructors (for final fields), …
Concurrency
is
complex!
A simple Logger
public class Logger {
    public void log (String msg, Object... params) {
        String s = doFormat (msg, params);
        doLog (s);
    }

    private String doFormat (String msg, Object... params) {…}
    private void doLog (String msg) {…}
}
thread safe?
public class Logger {
    public synchronized void log (String msg, Object... params) {
        String s = doFormat (msg, params);
        doLog (s);
    }

    private String doFormat (String msg, Object... params) {…}
    private void doLog (String msg) {…}
}
Hidden Deadlocks
public class X {
    public synchronized void doIt () {
        log.log ("doing it");
        …
    }
    public synchronized String toString () { … }
    …
}
Thread 1: x.doIt();
  lock (x)
  log.log ("...") → waits for lock (log)

Thread 2: log.log ("%s", x);
  lock (log)
  x.toString() → waits for lock (x)
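One common remedy (a sketch, not from the slides): never call an "alien" method such as log.log(...) while holding your own lock. Copy what you need inside the synchronized block and log outside it, so no thread ever holds both locks at once and the cycle disappears.

public class X {
    public void doIt () {
        String snapshot;
        synchronized (this) {
            snapshot = buildState();          // illustrative helper, not from the slides
            // … mutate state …
        }
        log.log ("doing it: %s", snapshot);   // no lock on 'this' held here
    }
    …
}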
Asynchronous Logging
final ExecutorService exec =
    Executors.newSingleThreadExecutor();

public void log (String msg, Object... args) {
    exec.execute (() -> {
        String formatted = doFormat (msg, args);
        doLog (formatted);
    });
}
Parallel Execution?
● Formatting can be expensive
● toString(): Callbacks to application code!
● Thread Pool for Logging?
● Message ordering is lost!
● … and doLog() is not thread safe
Future<String>
final ExecutorService exec =
    Executors.newSingleThreadExecutor();

public void log (String msg, Object... args) {
    final Future<String> formatted =
        CompletableFuture.supplyAsync(() -> doFormat(msg, args));
    exec.execute (() -> {
        try {
            doLog (formatted.get());
        }
        catch (Exception exc) {
            exc.printStackTrace();
        }
    });
}
Concurrency:
Shared Mutable State
● Locks: Blocking
● Read/Write and other optimizations
● Single Worker-Thread with a Message-Queue:
Non-Blocking
● Actors as a variant
● „Lock Free“ has shared mutable state
● Every approach has its specific cost!
Worker Threads
● Queues
● Bounded / Unbounded
● Blocking / Non-Blocking
● Single / Multi Producers / Consumers
● Optimized for reading or writing (or a mix)
● Special features: Priority, remove(), Batch, …
● Future for results
● .thenAccept(...) / .thenAcceptAsync(...) (see the sketch below)
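A small sketch of the "Future for results" bullet (format and write are illustrative stand-ins for doFormat/doLog): the result is produced on one pool and consumed, in order, on a dedicated worker.

import java.util.concurrent.*;

class AsyncResultDemo {
    static String format(String msg) { return "[log] " + msg; }   // stand-in for doFormat
    static void write(String s)      { System.out.println(s); }   // stand-in for doLog

    public static void main(String[] args) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        CompletableFuture<Void> done = CompletableFuture
            .supplyAsync(() -> format("hello"))               // runs on ForkJoinPool.commonPool()
            .thenAcceptAsync(AsyncResultDemo::write, worker); // consumed on the single worker thread
        done.join();
        worker.shutdown();
    }
}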
Which Thread Pool to use?
● Code with side effects
● (usually) ordering is important → Executors.newSingleThreadExecutor()
● non-blocking, without side effects
● ForkJoinPool.commonPool()
● blocking
● Specific ExecutorService per context (see the sketch below)
● Tuning, Monitoring, …
● Divide and conquer for big problems
● Measure to verify assumptions!
● Never ad hoc!
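For the "specific ExecutorService per context" case, a hedged sketch of a named, bounded pool for blocking work; the sizes, names and queue bound are illustrative and should come from measurement:

import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

class IoPool {
    private static final AtomicInteger COUNTER = new AtomicInteger();

    static ExecutorService create() {
        ThreadFactory named = r -> new Thread(r, "io-pool-" + COUNTER.incrementAndGet());
        return new ThreadPoolExecutor(
                8, 8,                              // fixed size, chosen by measurement
                0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(1_000),  // bounded queue: back pressure instead of OOM
                named);
    }
}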
Amdahl's Law
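With parallel fraction p and n processors, Amdahl's law bounds the speedup: Speedup(n) = 1 / ((1 − p) + p / n). For example, with p = 0.95 the speedup on 32 cores is only about 12.6, and it can never exceed 1 / (1 − p) = 20, no matter how many cores are added.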
Sharing is the root
of all contention
● more precisely: sharing mutable resources
● Locks, I/O, …
● Hidden Sharing
● ReadLock: Counter
● UUID.randomUUID()
● volatile: Shared main memory
● Cache Lines (Locality, Poisoning)
● Disk, DVDs, Network
● …
● Solutions: Isolation or Immutability (see the isolation sketch below)
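As an example of removing hidden sharing through isolation (a sketch; the random-ID use case is illustrative): java.util.Random shares one CAS-updated seed across all threads, while ThreadLocalRandom gives each thread its own generator.

import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;

class Ids {
    // Hidden sharing: every call CASes the single shared seed, so threads contend.
    static final Random SHARED = new Random();
    static long contendedId()  { return SHARED.nextLong(); }

    // Isolation: a per-thread generator, no shared mutable state, no contention.
    static long isolatedId()   { return ThreadLocalRandom.current().nextLong(); }
}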
Lock-free Programming
● Shared Mutable State
● Lock Free
● Callers need never wait
● Processing may be delayed however
● e.g. asynchronous decoupling
● Responsive, saves resources
● Wait Free
● Processing without a delay
● Special algorithms and data structures
● Highly complex; Ongoing research, work in progress
● ConcurrentHashMap, ConcurrentLinkedQueue
CAS loop
final AtomicInteger n = new AtomicInteger (0);

int max (int i) {
    int prev, next;
    do {
        prev = n.get();
        next = Math.max (prev, i);
    } while (! n.compareAndSet (prev, next));   // retry if another thread changed n in the meantime
    return next;
}
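Since Java 8 the same retry loop can be written more compactly with accumulateAndGet, which performs the CAS loop internally:

int max (int i) {
    return n.accumulateAndGet (i, Math::max);   // retries internally until the CAS succeeds
}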
Functional Programming
● No side effects
● All Objects are immutable, changes create new objects
● not just using Lambdas: JDK collections, Guice, …
● Scala, Clojure; a-foundation
● different style of programming
● efficient copying: partial reuse of objects (see the sketch below)
● functional algorithms
● State is stable automatically, even concurrently
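A minimal sketch of "changes create new objects" with partial reuse (an illustrative persistent list, not from the slides): prepending builds one new node and shares the entire old list as its tail.

final class PList<T> {
    final T head;
    final PList<T> tail;                        // shared, unchanged, with all older versions

    PList (T head, PList<T> tail) { this.head = head; this.tail = tail; }

    PList<T> prepend (T value) { return new PList<> (value, this); }   // no copying, no locking
}

Readers holding a reference to an older version keep seeing a stable snapshot while writers build new versions concurrently.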
Performance-Tuning
for Concurrency
public class StockExchange {
    private final Map<Currency, Double> rates = new HashMap<>();
    private final Map<String, Double> pricesInEuro = new HashMap<>();

    public void updateRate (Currency currency, double fromEuro) {
        rates.put (currency, fromEuro);
    }
    public void updatePrice (String wkz, double euros) {
        pricesInEuro.put (wkz, euros);
    }
    public double currentPrice (String wkz, Currency currency) {
        return pricesInEuro.get (wkz) * rates.get (currency);
    }
}
Wait-Free:
ConcurrentHashMap
public class StockExchange {
    private final Map<Currency, Double> rates =
        new ConcurrentHashMap<>();
    private final Map<String, Double> pricesInEuro =
        new ConcurrentHashMap<>();
    …
}
Fine-grained Locks:
Collections.synchronizedMap
public class StockExchange {
    private final Map<Currency, Double> rates =
        Collections.synchronizedMap (new HashMap<>());
    private final Map<String, Double> pricesInEuro =
        Collections.synchronizedMap (new HashMap<>());
    …
}
Coarse Locks
public class StockExchange {
    …
    public synchronized void updateRate (…) {
        …
    }
    public synchronized void updatePrice (…) {
        …
    }
    public synchronized double currentPrice (…) {
        …
    }
}
Functional: Immutable Maps
public class StockExchange {
    private final AtomicReference<AMap<Currency, Double>> rates =
        new AtomicReference<> (AHashMap.empty ());
    …
    public void updateRate (Currency currency, double fromEuro) {
        AMap<Currency, Double> prev, next;
        do {
            prev = rates.get ();
            next = prev.updated (currency, fromEuro);
        }
        while (! rates.compareAndSet (prev, next));
    }
    …
}
Variant:
reduced update guarantees
public class StockExchange {
    private volatile AMap<Currency, Double> rates =
        AHashMap.empty ();
    …
    public void updateRate (Currency currency, double fromEuro) {
        rates = rates.updated (currency, fromEuro);   // read-modify-write: concurrent updates may be lost
    }
    …
}
Queue with a Worker-Thread
public class StockExchange {
    private final Map<Currency, Double> rates = new HashMap<>();
    …
    private final BlockingQueue<Runnable> queue = …;

    public StockExchange() {
        new Thread(() -> {
            try {
                while (true) queue.take().run();   // all mutations run on this single thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }).start();
    }

    public void updateRate (Currency currency, double fromEuro) {
        try {
            queue.put (() -> rates.put (currency, fromEuro));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
    …
}
Tests for comparison
// Variant 1: raw update calls
for (int i=0; i<1_000_000; i++) {
    stockExchange.updatePrice ("abc", 1.23);
}

// Variant 2: a volatile read + write between calls adds memory barriers
volatile int v=0;
…
for (int i=0; i<1_000_000; i++) {
    v=v;
    stockExchange.updatePrice ("abc", 1.23);
}

// Variant 3: additionally allocates and removes a list element per call
volatile int v=0;
final LinkedList<String> l = new LinkedList<>();
…
for (int i=0; i<1_000_000; i++) {
    l.add ("abc");
    v=v;
    stockExchange.updatePrice (l.remove(), 1.23);
}
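Loops like these are sensitive to JIT warm-up, dead-code elimination and surrounding memory traffic; a harness such as JMH is the usual tool for this kind of measurement. A hedged sketch of the update test as a JMH benchmark (assumes JMH is on the classpath; the thread count is illustrative):

import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
public class StockExchangeBench {
    StockExchange stockExchange;

    @Setup
    public void setup() { stockExchange = new StockExchange(); }

    @Benchmark
    @Threads(4)                                  // measure under contention
    public void updatePrice() {
        stockExchange.updatePrice("abc", 1.23);
    }
}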
[Benchmark result charts: Mixed Load, Mixed Load, Mostly Updates]
And the Winner is...
● ConcurrentHashMap
● No consistency requirements, mostly reads, algorithmic access
● Queue with Worker Thread
● Transactional access, centralized Event-Queue
● Data locality in multi-processor systems
● Immutable Map
● Stable data during read operations
● long running reads
● „lossy“ → fast update
● Locks
● never particularly fast → prescribed threading and data model
● Granularity: Lock per operation
Testing (1): Correctness
● „it works“ is not enough
● JMM vs. JVM, Hardware, …
● Reviews
● controlled interruptions
● Shotgun
Testing (2): Performance
● Reserve time for testing!
● realistic Hardware
● big Hardware
● Multi-Core vs. Multi-Processor → HW optimizations for cache exchange
● Scaling effects
● realistic load scenarios (→ Know your load!!! Document assumptions!!!)
● Virtualized hardware
● Ask specific questions – there are many variables
● Pool sizes, cut-off for serial processing
● Hardware sizes: RAM, #Cores, …
● Series of measurements to evaluate alternatives
Practice: Local Parallelization
● e.g. searching in a large collection (see the sketch below)
● Fork/Join examples
● Streams API
● Verify improvements
● APIs are simple and invite „ad hoc“ use
● apparent improvements – at least without background load
● overall CPU load is increased – use only if CPUs would otherwise be idling!
● Measure, measure, measure: real hardware, real load scenarios
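A minimal sketch of stream-based local parallelization for the search example (class and method names are illustrative): parallelStream() runs on ForkJoinPool.commonPool(), so it only pays off for CPU-bound work on otherwise idle cores, and the gain must be measured.

import java.util.List;

class Search {
    static boolean containsMatch (List<String> data, String needle) {
        // splits the work across ForkJoinPool.commonPool() worker threads
        return data.parallelStream().anyMatch(s -> s.contains(needle));
    }
}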
Wrap-Up
● Concurrency is difficult
● Really understand your problems
● Measure, measure, measure!
● OS and hardware influence behavior
● Correctness before performance
● Concurrency as coarse-grained as possible
● Share Nothing
The End
● Links:
http://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012-Herb-Sutter-atomic-Weapons-1-of-2
● http://github.com/arnohaase/a-foundation
