Java memory model


Java Memory Model, atomic instructions, non-blocking algorithms

Published in: Technology

  1. 1. Java Memory Model
     Michał Warecki
  2. 2. Outline
     ● Introduction to JMM
     ● Happens-before
     ● Memory barriers
     ● Performance issues
     ● Atomicity
     ● JEP 171
     ● Non-blocking algorithms
     Java, C++, ASM
  3. 3. Java Memory Model
     ● Instruction reordering
     ● Visibility
     ● Final fields
     ● Interaction with atomic instructions
  4. 4. Java Memory Model
     ● The Java memory model (JMM) describes how threads in the Java programming language interact through memory.
     ● Provides sequential consistency for data-race-free programs.
  5. 5. Instruction reordering
     Program order:
         int a = 1;
         int b = 2;
         int c = 3;
         int d = 4;
         int e = a + b;
         int f = c - d;
     Execution order:
         int d = 4;
         int c = 3;
         int f = c - d;
         int b = 2;
         int a = 1;
         int e = a + b;
  6. 6. Quiz
     Initially: x = y = 0
     Thread 1:
         x = 1
         j = y
     Thread 2:
         y = 1
         i = x
     What could be the result?
  7. 7. Answer(s)
     ● i = 1; j = 1
     ● i = 0; j = 1
     ● i = 1; j = 0
     ● i = 0; j = 0
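The quiz can be tried out with a small harness. The following is only a sketch: it assumes thread 1 runs `x = 1; j = y` and thread 2 runs `y = 1; i = x`, and the class and method names (`ReorderingQuiz`, `runOnce`, `observe`) are made up for illustration. Which of the four outcomes actually show up depends on the JVM, the hardware, and timing, so a run that never prints `i = 0; j = 0` does not prove the outcome is forbidden.

```java
import java.util.Set;
import java.util.TreeSet;

/** Runs the quiz repeatedly and collects every (i, j) outcome that was observed. */
class ReorderingQuiz {
    // Deliberately plain (non-volatile) fields: the two threads race on x and y.
    static int x, y, i, j;

    /** One trial: thread 1 does x = 1; j = y and thread 2 does y = 1; i = x. */
    static String runOnce() {
        x = 0; y = 0; i = 0; j = 0;           // visible to both threads via Thread.start()
        Thread t1 = new Thread(() -> { x = 1; j = y; });
        Thread t2 = new Thread(() -> { y = 1; i = x; });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // join() orders the threads' writes before these reads.
        return "i = " + i + "; j = " + j;
    }

    /** Repeats the trial and returns the distinct outcomes seen. */
    static Set<String> observe(int trials) {
        Set<String> seen = new TreeSet<>();
        for (int n = 0; n < trials; n++) {
            seen.add(runOnce());
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(observe(2_000));
    }
}
```

Making `x` and `y` volatile would rule out `i = 0; j = 0`, because volatile accesses cannot be reordered with each other.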
  8. 8. Happens-before order
     Two actions can be ordered by a happens-before relationship. If one action happens-before another, then the first is visible to and ordered before the second.
     Java Language Specification, Java SE 7 Edition
  9. 9. Happens-before rules
     ● A monitor release and a matching later monitor acquire establish a happens-before ordering.
     ● A write to a volatile field happens-before every subsequent read of that field.
     ● Execution order within a thread also establishes a happens-before order.
     ● Happens-before order is transitive.
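The volatile rule, program order, and transitivity combine into the usual "publish through a flag" idiom. The sketch below uses invented names (`VolatilePublish`, `data`, `ready`): the plain write to `data` precedes the volatile write to `ready` in program order, and the reader's volatile read of `ready` precedes its read of `data`, so the reader is guaranteed to see 42.

```java
/** Publishes a plain field through a volatile flag. */
class VolatilePublish {
    static int data;               // plain field, written before the flag
    static volatile boolean ready; // volatile flag that publishes the write

    static int publishAndRead() {
        data = 0;
        ready = false;
        int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            while (!ready) {
                Thread.yield();    // spin until the volatile read returns true
            }
            seen[0] = data;        // ordered after the writer's data = 42
        });
        Thread writer = new Thread(() -> {
            data = 42;             // 1. plain write
            ready = true;          // 2. volatile write, happens-before the read that sees it
        });
        reader.start();
        writer.start();
        try {
            reader.join();
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(publishAndRead()); // prints 42
    }
}
```

Without `volatile` on `ready`, neither termination of the loop nor the value read from `data` would be guaranteed.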
  10. 10. Java tools
      ● Volatile variables
          volatile boolean running = true;
      ● Monitors
          synchronized (this) {
              i = a;
              a = i;
          }
          ReentrantLock lock = new ReentrantLock();
          lock.lock();
          lock.unlock();
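In real code the `lock()` / `unlock()` pair is normally wrapped in `try` / `finally` so the lock is released even if the guarded code throws. A minimal sketch, with a hypothetical `LockedCounter` class standing in for whatever state the lock protects:

```java
import java.util.concurrent.locks.ReentrantLock;

/** Counter guarded by a ReentrantLock. */
class LockedCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private int count;             // only accessed while holding the lock

    void increment() {
        lock.lock();
        try {
            count++;
        } finally {
            lock.unlock();         // always release, even on exceptions
        }
    }

    int get() {
        lock.lock();
        try {
            return count;
        } finally {
            lock.unlock();
        }
    }

    /** Starts the given number of threads, each incrementing perThread times. */
    static int countWith(int threads, int perThread) {
        LockedCounter counter = new LockedCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    counter.increment();
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWith(4, 10_000)); // prints 40000
    }
}
```

A `synchronized` block gives the same mutual exclusion and the same happens-before edges between release and acquire.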
  11. 11. What does volatile do?
      ● Volatile reads/writes cannot be reordered
      ● Compilers and runtime are not allowed to allocate volatile variables in registers
      ● Volatile longs and doubles are atomic
  12. 12. Happens-before, volatile
  13. 13. Happens-before, Monitors
  14. 14. Volatiles and monitors ordering
      Can reorder?
      | 1st operation \ 2nd operation | Normal Load, Normal Store | Volatile Load, MonitorEnter | Volatile Store, MonitorExit |
      | Normal Load, Normal Store | | | No |
      | Volatile Load, MonitorEnter | No | No | No |
      | Volatile Store, MonitorExit | | No | No |
      The JSR-133 Cookbook for Compiler Writers
  15. 15. Visibility
      Thread 1:
          public void run() {
              int counter = 0;
              while (running) {
                  counter++;
              }
              System.out.println("Counted upto " + counter);
          }
      Thread 2:
          public void run() {
              try {
                  Thread.sleep(100);
              } catch (InterruptedException ignored) { }
              running = false;
          }
      LoopFlag
  16. 16. Visibility
  17. 17. How is it possible?
      ● Compiler can reorder instructions.
      ● Compiler can keep values in registers.
      ● Processor can reorder instructions.
      ● Values may not be synchronized to main memory.
      ● JMM is designed to allow aggressive optimizations.
      LoopFlag - volatile
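Declaring the flag `volatile` removes the problem: the loop must re-read `running` on every iteration, and the write from the other thread becomes visible. The sketch below is a guess at what the "LoopFlag - volatile" demo looks like, not the original code; `VolatileLoopFlag` and `stopsWithin` are hypothetical names.

```java
/** Loop-flag example with a volatile flag, so the worker is guaranteed to stop. */
class VolatileLoopFlag {
    private volatile boolean running = true;

    /** Returns true if the worker observed running = false within the timeout. */
    static boolean stopsWithin(long timeoutMillis) {
        VolatileLoopFlag flag = new VolatileLoopFlag();
        Thread worker = new Thread(() -> {
            long counter = 0;
            while (flag.running) {     // volatile read on every iteration
                counter++;
            }
            System.out.println("Counted up to " + counter);
        });
        worker.setDaemon(true);        // do not keep the JVM alive if something goes wrong
        worker.start();
        try {
            Thread.sleep(100);
            flag.running = false;      // volatile write, visible to the worker
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(stopsWithin(5_000));
    }
}
```

With a plain `boolean`, the JIT compiler may hoist the read out of the loop, and the worker can spin forever.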
  18. 18. Visibility
      LoopFlag - asm - loop
  19. 19. Intel processor
  20. 20. Processor
  21. 21. Memory access time
      ● Registers / Buffers: < 1ns
      ● L1: ~1ns (3-4 cycles)
      ● L2: ~3ns (10-12 cycles)
      ● L3: ~15ns (40-45 cycles)
      ● DRAM: ~65ns
      ● QPI: ~40ns
  22. 22. Memory barriers
      ● LoadLoad
      ● StoreStore
      ● LoadStore
      ● StoreLoad
  23. 23. Memory barrier - LoadLoad
      The sequence: Load1; LoadLoad; Load2
      Ensures that Load1's data are loaded before data accessed by Load2 and all subsequent load instructions are loaded. In general, explicit LoadLoad barriers are needed on processors that perform speculative loads and/or out-of-order processing in which waiting load instructions can bypass waiting stores. On processors that guarantee to always preserve load ordering, the barriers amount to no-ops.
      The JSR-133 Cookbook for Compiler Writers
  24. 24. Memory barrier - StoreStore
      The sequence: Store1; StoreStore; Store2
      Ensures that Store1's data are visible to other processors (i.e., flushed to memory) before the data associated with Store2 and all subsequent store instructions. In general, StoreStore barriers are needed on processors that do not otherwise guarantee strict ordering of flushes from write buffers and/or caches to other processors or main memory.
      The JSR-133 Cookbook for Compiler Writers
  25. 25. Memory barrier - LoadStore
      The sequence: Load1; LoadStore; Store2
      Ensures that Load1's data are loaded before all data associated with Store2 and subsequent store instructions are flushed. LoadStore barriers are needed only on those out-of-order processors in which waiting store instructions can bypass loads.
      The JSR-133 Cookbook for Compiler Writers
  26. 26. Memory barrier - StoreLoad
      The sequence: Store1; StoreLoad; Load2
      Ensures that Store1's data are made visible to other processors (i.e., flushed to main memory) before data accessed by Load2 and all subsequent load instructions are loaded. StoreLoad barriers protect against a subsequent load incorrectly using Store1's data value rather than that from a more recent store to the same location performed by a different processor. Because of this, on the processors discussed below, a StoreLoad is strictly necessary only for separating stores from subsequent loads of the same location(s) as were stored before the barrier. StoreLoad barriers are needed on nearly all recent multiprocessors, and are usually the most expensive kind. Part of the reason they are expensive is that they must disable mechanisms that ordinarily bypass cache to satisfy loads from write-buffers. This might be implemented by letting the buffer fully flush, among other possible stalls.
      The JSR-133 Cookbook for Compiler Writers
  27. 27. Memory barriers
      Required barriers
      | 1st operation \ 2nd operation | Normal Load | Normal Store | Volatile Load, MonitorEnter | Volatile Store, MonitorExit |
      | Normal Load | | | | LoadStore |
      | Normal Store | | | | StoreStore |
      | Volatile Load, MonitorEnter | LoadLoad | LoadStore | LoadLoad | LoadStore |
      | Volatile Store, MonitorExit | | | StoreLoad | StoreStore |
      The JSR-133 Cookbook for Compiler Writers
  28. 28. Intel X86/64 Memory Model
      ● Loads are not reordered with other loads.
      ● Stores are not reordered with other stores.
      ● Stores are not reordered with older loads.
      ● Loads may be reordered with older stores to different locations but not with older stores to the same location.
      ● In a multiprocessor system, memory ordering obeys causality (memory ordering respects transitive visibility).
      ● In a multiprocessor system, stores to the same location have a total order.
      ● In a multiprocessor system, locked instructions have a total order.
      ● Loads and stores are not reordered with locked instructions.
      LoopFlag - asm - store, MemoryBarriers - asm
  29. 29. StoreLoad on Intel Ivy Bridge
      lock addl $0x0,(%rsp)
      Intel's IA-32 developer manual: Locked operations are atomic with respect to all other memory operations and all externally visible events. [...] Locked instructions can be used to synchronize data written by one processor and read by another processor.
  30. 30. Volatile performance
      [Bar chart: number of operations for normal write, volatile write, normal read, and volatile read]
      JiT - asm
  31. 31. Memory barriers - architecture
      | Processor | LoadStore | LoadLoad | StoreStore | StoreLoad | Data dependency orders loads? | Atomic Conditional | Other Atomics | Atomics provide barrier? |
      | sparc-TSO | no-op | no-op | no-op | membar (StoreLoad) | yes | CAS: casa | swap, ldstub | full |
      | x86 | no-op | no-op | no-op | mfence or cpuid or locked insn | yes | CAS: cmpxchg | xchg, locked insn | full |
      | ia64 | combine with st.rel or ld.acq | ld.acq | st.rel | mf | yes | CAS: cmpxchg | xchg, fetchadd | target + acq/rel |
      | arm | dmb (see below) | dmb (see below) | dmb-st | dmb | indirection only | LL/SC: ldrex/strex | | target only |
      | ppc | lwsync (see below) | lwsync (see below) | lwsync | hwsync | indirection only | LL/SC: ldarx/stwcx | | target only |
      | alpha | mb | mb | wmb | mb | no | LL/SC: ldx_l/stx_c | | target only |
      | pa-risc | no-op | no-op | no-op | no-op | yes | build from ldcw | ldcw | (NA) |
      The JSR-133 Cookbook for Compiler Writers
      * The x86 processors supporting "streaming SIMD" SSE2 extensions require LoadLoad "lfence" only in connection with these streaming instructions.
  32. 32. Final fields
      ● Act as a normal field, but:
        – A store of a final field (inside a constructor) and, if the field is a reference, any store that this final can reference, cannot be reordered with a subsequent store (outside that constructor) of the reference to the object holding that field into a variable accessible to other threads. (x.finalField = v; ... ; sharedRef = x; and likewise v.afield = 1; x.finalField = v; ... ; sharedRef = x;)
        – The initial load (i.e., the very first encounter by a thread) of a final field cannot be reordered with the initial load of the reference to the object containing the final field. (x = sharedRef; ... ; i = x.finalField;)
  33. 33. Final field example
      class FinalFieldExample {
          final int x;
          int y;
          static FinalFieldExample f;
          public FinalFieldExample() {
              x = 3;
              y = 4;
          }
          static void writer() {
              f = new FinalFieldExample();
          }
          static void reader() {
              if (f != null) {
                  int i = f.x;
                  int j = f.y;
              }
          }
      }
  34. 34. Final field example
      class FinalFieldExample {
          final int x;
          int y;
          static FinalFieldExample f;
          public FinalFieldExample() {
              x = 3;
              y = 4;
          }
          static void writer() {
              f = new FinalFieldExample();
          }
          static void reader() {
              if (f != null) {
                  int i = f.x;  // guaranteed value 3
                  int j = f.y;  // 4 or 0 !!
              }
          }
      }
  35. 35. Atomicity
      ● java.util.concurrent.atomic
        – AtomicBoolean
        – AtomicInteger
        – AtomicIntegerArray
        – AtomicIntegerFieldUpdater<T>
        – AtomicLong
        – AtomicLongArray
        – AtomicLongFieldUpdater<T>
        – AtomicMarkableReference<V>
        – AtomicReference<V>
        – AtomicReferenceArray<E>
        – AtomicReferenceFieldUpdater<T,V>
        – AtomicStampedReference<V>
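One class from this list, `AtomicReference`, is enough to build a simple non-blocking data structure of the kind the outline mentions. The sketch below is a textbook Treiber stack, offered as an illustration and not as the code from the talk's demos; `TreiberStack` and `Node` are names chosen here.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Lock-free LIFO stack: every update is a compare-and-set on the head reference. */
class TreiberStack<T> {
    private static final class Node<T> {
        final T item;
        final Node<T> next;

        Node(T item, Node<T> next) {
            this.item = item;
            this.next = next;
        }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    void push(T item) {
        Node<T> oldHead;
        Node<T> newHead;
        do {
            oldHead = head.get();
            newHead = new Node<>(item, oldHead);
        } while (!head.compareAndSet(oldHead, newHead)); // retry if another thread won
    }

    /** Returns the most recently pushed item, or null if the stack is empty. */
    T pop() {
        Node<T> oldHead;
        do {
            oldHead = head.get();
            if (oldHead == null) {
                return null;
            }
        } while (!head.compareAndSet(oldHead, oldHead.next));
        return oldHead.item;
    }

    public static void main(String[] args) {
        TreiberStack<String> stack = new TreiberStack<>();
        stack.push("a");
        stack.push("b");
        System.out.println(stack.pop()); // prints b
        System.out.println(stack.pop()); // prints a
    }
}
```

No thread ever blocks: a thread that loses the race on `compareAndSet` simply re-reads the head and tries again.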
  36. 36. AtomicInteger
      public class AtomicInteger extends Number implements java.io.Serializable {
          //...
          private volatile int value;
          public final void set(int newValue) {
              value = newValue;
          }
          //...
          public final void lazySet(int newValue) {
              unsafe.putOrderedInt(this, valueOffset, newValue);
          }
          //...
          public final boolean compareAndSet(int expect, int update) {
              return unsafe.compareAndSwapInt(this, valueOffset, expect, update);
          }
      }
      Atomic - asm
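`compareAndSet` is the building block for lock-free read-modify-write operations: read the current value, compute the new one, and retry if another thread changed the value in between. A minimal sketch with a hypothetical `CasCounter` class (in practice `AtomicInteger.incrementAndGet()` already does this):

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Counter whose increment is a hand-written compare-and-set retry loop. */
class CasCounter {
    private final AtomicInteger value = new AtomicInteger();

    int increment() {
        while (true) {
            int current = value.get();
            int next = current + 1;
            if (value.compareAndSet(current, next)) {
                return next;       // nobody interfered between get() and the CAS
            }
            // Another thread updated the value first: loop and retry.
        }
    }

    int get() {
        return value.get();
    }

    /** Starts the given number of threads, each incrementing perThread times. */
    static int countWith(int threads, int perThread) {
        CasCounter counter = new CasCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    counter.increment();
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWith(4, 10_000)); // prints 40000
    }
}
```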
  37. 37. Unsafe.putOrdered*
      StoreStore barrier
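Application code reaches `putOrderedInt` through `AtomicInteger.lazySet`, as the class excerpt on slide 36 shows. A typical use is a single writer publishing progress that readers only need to see eventually. The sketch below uses invented names (`LazySetProgress`, `runWriter`) and checks only the final value, which is guaranteed once the writer thread has been joined.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Single writer that publishes its progress with lazySet. */
class LazySetProgress {
    static int runWriter(int steps) {
        AtomicInteger progress = new AtomicInteger();
        Thread writer = new Thread(() -> {
            for (int step = 1; step <= steps; step++) {
                // Ordered store: not reordered with earlier writes, but it
                // avoids the StoreLoad barrier that a volatile set() pays for.
                progress.lazySet(step);
            }
        });
        writer.start();
        try {
            writer.join();         // join() makes the last store visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return progress.get();
    }

    public static void main(String[] args) {
        System.out.println(runWriter(1_000)); // prints 1000
    }
}
```

The trade-off is that a concurrent reader may observe a `lazySet` value somewhat later than it would observe a value written with `set`.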
  38. 38. JEP 171: Fence Intrinsics
      ● loadFence: { OrderAccess::acquire(); }
      ● storeFence: { OrderAccess::release(); }
      ● fullFence: { OrderAccess::fence(); }
      NonBlocking
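JEP 171 added these fences to `sun.misc.Unsafe`, which is awkward to call from application code. Since Java 9, comparable static fence methods are available on `java.lang.invoke.VarHandle`, and the sketch below uses `VarHandle.fullFence()` as a stand-in for `Unsafe.fullFence()`. It assumes Java 9 or newer and the quiz layout in which thread 1 runs `x = 1; j = y` and thread 2 runs `y = 1; i = x`; the class name `FencedQuiz` is made up. With a full fence between the store and the load in each thread, the outcome `i = 0; j = 0` should no longer occur.

```java
import java.lang.invoke.VarHandle;

/** The quiz from slide 6 with a full fence between each store and load. */
class FencedQuiz {
    static int x, y, i, j;         // plain fields, ordering comes from the fences

    /** Runs one trial and reports whether both threads read 0. */
    static boolean bothZeroOnce() {
        x = 0; y = 0; i = 0; j = 0;
        Thread t1 = new Thread(() -> {
            x = 1;
            VarHandle.fullFence(); // store to x may not move below the load of y
            j = y;
        });
        Thread t2 = new Thread(() -> {
            y = 1;
            VarHandle.fullFence(); // store to y may not move below the load of x
            i = x;
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return i == 0 && j == 0;
    }

    /** Counts how many trials ended with i = 0 and j = 0. */
    static int countBothZero(int trials) {
        int count = 0;
        for (int n = 0; n < trials; n++) {
            if (bothZeroOnce()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countBothZero(2_000)); // prints 0
    }
}
```

On x86 this is the StoreLoad case from slide 29: the fence is what stops the load from being satisfied before the earlier store has become visible.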
  39. 39. Thanks! Questions?