Java memory model

Java Memory Model
Michał Warecki

Outline
● Introduction to JMM
● Happens-before
● Memory barriers
● Performance issues
● Atomicity
● JEP 171
● Non blocking algorithms
Java
C++
ASM

Java Memory Model
● Instructions reordering
● Visibility
● Final fields
● Interaction with atomic instructions

Java Memory Model
● The Java memory model (JMM) describes how threads in
the Java programming language interact through
memory.
● Provides sequential consistency for data race free
programs.

Instructions reordering
Program order:
int a = 1;
int b = 2;
int c = 3;
int d = 4;
int e = a + b;
int f = c – d;
Execution order:
int d = 4;
int c = 3;
int f = c – d;
int b = 2;
int a = 1;
int e = a + b;

Quiz
x = y = 0
x = 1
j = y
y = 1
i = x
What could be the result?
Thread 1 Thread 2

Answer(s)
● i = 1; j = 1
● i = 0; j = 1
● i = 1; j = 0
● i = 0; j = 0

Happens-before order
Two actions can be ordered by a happens-before
relationship. If one action happens-before another, then the
first is visible to and ordered before the second.
Java Language Specification, Java SE 7 Edition

Happens-before rules
● A monitor release and matching later monitor acquire
establish a happens before ordering.
● A write to a volatile field happens-before every
subsequent read of that field.
● Execution order within a thread also establishes a
happens before order.
● Happens before order is transitive.

Java tools
● Volatile variables
volatile boolean running = true;
● Monitors
synchronized (this) {
i = a;
a = i;
}
ReentrantLock lock = new ReentrantLock();
lock.lock();
lock.unlock();

What does volatile do?
● Volatile reads/writes can not be reordered
● Compilers and runtime are not allowed to allocate volatile
variables in registers
● Volatile longs and doubles are atomic

Volatiles and monitors ordering
Can Reorder 2nd operation
1st operation Normal Load
Normal Store
Volatile Load
MonitorEnter
Volatile Store
MonitorExit
Normal Load
Normal Store
No
Volatile Load
MonitorEnter
No No No
Volatile store
MonitorExit
No No
The JSR-133 Cookbook for Compiler Writers

Visibility
Thread 1:
public void run() {
int counter = 0;
while (running) {
counter++;
}
System.out.println("Counted up
to " + counter);
}
Thread 2:
public void run() {
try {
Thread.sleep(100);
} catch (InterruptedException
ignored) { }
running = false;
}
LoopFlag

How is it possible?
● Compiler can reorder instructions.
● Compiler can keep values in registers.
● Processor can reorder instructions.
● Values may not be synchronized to main memory.
● JMM is designed to allow aggressive optimizations.
LoopFlag - volatile

Visibility
LoopFlag – asm - loop

Memory access time
● Registers / Buffers: < 1ns
● L1: ~1ns (3-4 cycles)
● L2: ~3ns (10-12 cycles)
● L3: ~15ns (40-45 cycles)
● DRAM: ~65ns
● QPI: ~40ns

Memory barriers
● LoadLoad
● StoreStore
● LoadStore
● StoreLoad

Memory barrier - LoadLoad
The sequence: Load1; LoadLoad; Load2
Ensures that Load1's data are loaded before data accessed
by Load2 and all subsequent load instructions are loaded. In
general, explicit LoadLoad barriers are needed on
processors that perform speculative loads and/or out-of-
order processing in which waiting load instructions can
bypass waiting stores. On processors that guarantee to
always preserve load ordering, the barriers amount to no-
ops.

Memory barrier - StoreStore
The sequence: Store1; StoreStore; Store2
Ensures that Store1's data are visible to other processors
(i.e., flushed to memory) before the data associated with
Store2 and all subsequent store instructions. In general,
StoreStore barriers are needed on processors that do not
otherwise guarantee strict ordering of flushes from write
buffers and/or caches to other processors or main memory.

Memory barrier - LoadStore
The sequence: Load1; LoadStore; Store2
Ensures that Load1's data are loaded before all data
associated with Store2 and subsequent store instructions
are flushed. LoadStore barriers are needed only on those
out-of-order procesors in which waiting store instructions
can bypass loads.

Memory barrier - StoreLoad
The sequence: Store1; StoreLoad; Load2
Ensures that Store1's data are made visible to other processors (i.e., flushed to
main memory) before data accessed by Load2 and all subsequent load instructions
are loaded. StoreLoad barriers protect against a subsequent load incorrectly using
Store1's data value rather than that from a more recent store to the same location
performed by a different processor. Because of this, on the processors discussed
below, a StoreLoad is strictly necessary only for separating stores from subsequent
loads of the same location(s) as were stored before the barrier. StoreLoad barriers
are needed on nearly all recent multiprocessors, and are usually the most
expensive kind. Part of the reason they are expensive is that they must disable
mechanisms that ordinarily bypass cache to satisfy loads from write-buffers. This
might be implemented by letting the buffer fully flush, among other possible stalls.

Memory barriers
Required
barriers
2nd operation
1st operation
Normal Load Normal Store Volatile Load
MonitorEnter
Volatile Store
MonitorExit
Normal Load LoadStore
Normal Store StoreStore
Volatile Load
MonitorEnter
LoadLoad LoadStore LoadLoad LoadStore
Volatile Store
MonitorExit
StoreLoad StoreStore

Intel X86/64 Memory Model
● Loads are not reordered with other loads.
● Stores are not reordered with other stores.
● Stores are not reordered with older loads.
● Loads may be reordered with older stores to different locations but
not with older stores to the same location.
● In a multiprocessor system, memory ordering obeys causality (memory
ordering respects transitive visibility).
● In a multiprocessor system, stores to the same location have a total order.
● In a multiprocessor system, locked instructions have a total order.
● Loads and stores are not reordered with locked instructions.
LoopFlag – asm - store, MemoryBarriers – asm

StoreLoad on Intel Ivy Bridge
lock addl $0x0,(%rsp)
Intel's IA-32 developer manual: Locked operations are
atomic with respect to all other memory operations and all
externally visible events. [...] Locked instructions can be
used to synchronize data written by one processor and read
by another processor.

Volatile performance
Normal write Volatile write Normal read Volatile read
0
200000000
400000000
600000000
800000000
1000000000
1200000000
1000000000operations
JiT - asm

Memory barriers - architecture
Processor LoadStore LoadLoad StoreStore StoreLoad Data
dependency
orders
loads?
Atomic
Conditional
Other
Atomics
Atomics
provide
barrier?
sparc-TSO no-op no-op no-op membar
(StoreLoad)
yes CAS:
casa
swap,
ldstub
full
x86 no-op no-op no-op mfence or
cpuid or
locked
insn
yes CAS:
cmpxchg
xchg,
locked
insn
full
ia64 combine
with
st.rel or
ld.acq
ld.acq st.rel mf yes CAS:
cmpxchg
xchg,
fetchadd
target +
acq/rel
arm dmb
(see below)
dmb
(see below)
dmb-st dmb indirection
only
LL/SC:
ldrex/strex
target
only
ppc lwsync
(see below)
lwsync
(see below)
lwsync hwsync indirection
only
LL/SC:
ldarx/stwcx
target
only
alpha mb mb wmb mb no LL/SC:
ldx_l/stx_c
target
only
pa-risc no-op no-op no-op no-op yes build
from
ldcw
ldcw (NA)
* The x86 processors supporting "streaming SIMD" SSE2 extensions require LoadLoad "lfence" only only in connection with these
streaming instructions.

Final fields
● Act as a normal field, but:
– A store of a final field (inside a constructor) and, if the field
is a reference, any store that this final can reference, cannot
be reordered with a subsequent store (outside that
constructor) of the reference to the object holding that field
into a variable accessible to other threads. (x.finalField =
v; ... ; sharedRef = x;)
– The initial load (i.e., the very first encounter by a thread) of
a final field cannot be reordered with the initial load of the
reference to the object containing the final field. (v.afield =
1; x.finalField = v; ... ; sharedRef = x;)

Final field example
class FinalFieldExample {
final int x;
int y;
static FinalFieldExample f;
public FinalFieldExample() {
x = 3;
y = 4;
}
static void writer() {
f = new FinalFieldExample();
}
static void reader() {
if (f != null) {
int i = f.x;
int j = f.y;
}
}
}

Final field example
class FinalFieldExample {
final int x;
int y;
static FinalFieldExample f;
public FinalFieldExample() {
x = 3;
y = 4;
}
static void writer() {
f = new FinalFieldExample();
}
static void reader() {
if (f != null) {
int i = f.x;
int j = f.y;
}
}
}
Guaranteed value 3
4 or 0 !!

●Atomicity
● java.util.concurrent.atomic
– AtomicBoolean
– AtomicInteger
– AtomicIntegerArray
– AtomicIntegerFieldUpdater<T>
– AtomicLong
– AtomicLongArray
– AtomicLongFieldUpdater<T>
– AtomicMarkableReference<V>
– AtomicReference<V>
– AtomicReferenceArray<E>
– AtomicReferenceFieldUpdater<T,V>
– AtomicStampedReference<V>

AtomicInteger
public class AtomicInteger extends Number implements java.io.Serializable {
//...
private volatile int value;
public final void set(int newValue) {
value = newValue;
}
//...
public final void lazySet(int newValue) {
unsafe.putOrderedInt(this, valueOffset, newValue);
}
//...
public final boolean compareAndSet(int expect, int update) {
return unsafe.compareAndSwapInt(this, valueOffset, expect, update);
}
Atomic - asm

Unsafe.putOrdered*
StoreStore barrier

JEP 171: Fence Intrinsics
● loadFence: { OrderAccess::acquire(); }
● storeFence: { OrderAccess::release(); }
● fullFence: { OrderAccess::fence(); }
NonBlocking

Java memory model

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (20)

Similar to Java memory model

Similar to Java memory model (20)

Recently uploaded

Recently uploaded (20)

Java memory model