● When programming we use memory all the
– Reading/Writing data structures on the heap, stack
or data segment.
– Reading/Writing from/to hardware
– By “Memory” I do not refer to registers in this
presentation (since every core has it's own
What kinds of guarantees do we
want from memory operations?
● That the operation is not optimized away completely
● That the operation does not take place in registers (that are not
visible to other cores by definition)
● Visibility to other cores (bypass/flush/sync CPU caches)
● Visibility to hardware
● Order between such two or more memory operations
● Any combination of the above (possibly none of them...)
● Regular C programming (without using special features, that is), the
compiler and the CPU provides none of the above guarantees
So what guarantees does volatile
● It's just not clear!
● The first two, yes
● The others, maybe. No in a lot of the architectures.
● Depends on your compiler, it's version, your compilation flags,
the astrological sign of the compiler authors best friend...
● To be specific: most volatile implementations do not imply
atomicity or ordering.
● And what does volatile mean for bigger than word or int or
long structures? Pass me the joint as things are getting hazy...
● Don't use volatile (true in most cases!)
● Imagine the following code (with no compiler
● What states do you expect other cores to see?
● Or maybe
● Yes! The CPU does this. (well, not Intel, but others)
Is the Compiler/CPU allowed to do
● Yes. Actually there are many types of reordering that the
Compiler/CPU is allowed to perform
● Common CPU reordering include:
– Load reordered after load
– Load reordered after store
– Store reordered after load
– Store reordered after store
– Store reordered after atomics
– Load reordered after atomics
– Dependant load reordered (YES! Alpha does this, they should all
be locked up...)
So what do the compiler/CPU
They guarantee results in one thread.
This means that they may alter your code, reorder it, discard parts of it, use
different operations than the ones you use and more.
● But all of these guarantee that the results you will get will be the same, in
the same thread that they are in.
But sometimes you want your code to be left unaltered.
This is especially true when other threads or hardware is involved.
In these cases the order matters, the specific operations matter, etc.
Enter memory barrier/fence
● A machine memory barrier is a special machine instruction or a
special type of memory access instruction that guarantees order
of execution between memory instructions before it and after it.
● __sync_synchronize() in gcc (user space).
● asm volatile ("mfence" ::: "memory")
● (smp_?)mb(),(smp_?)rmb(),(smp_?)wmb() in kernel development.
● In most cases atomic operations imply a memory barrier of some sort
and new C++11 has nice API with memory model included.
OK, prove it to me...
● Time for a demo.
● Two threads, when we start we have:
● Could it be that R1==R2==0 at the end?
Hey, but I need volatile to overcome
● No, you don't
● There is something called a “compiler barrier”
● Compiler barriers usually offer several features:
– Forces the compiler to sync unsynchronized registers with memory so that memory writes
before the barrier will go to memory (no cache flush, no memory barrier)
– Forces the compiler to read from memory after the barrier even if the compiler thinks it knows
the value of certain memory locations.
– Forces order of memory operations at the compiler level (not machine level) in relation to the
barrier location in the code
● A compiler barrier is not a machine instruction (as opposed to memory barrier
● It is a compiler directive, influencing how to the compiler will generate machine code
after the directive is given.
● The compiler may emit machine instructions or it may not (depends on many factors)
● Time for another demo again...
● “What every programmer should know about memory” by
● “memory-barries.txt” from the Linux kernel.
● The example for memory barriers shown is derived from
“Memory Reordering Caught in the Act” by Jeff Preshing
● “Volatile_variable” from wikipedia
● “Memory_barrier” from wikipedia
● All examples can be found at linuxapi project at GitHub by