
  • 1. 'volatile' is volatile
  • 2. Memory... ● When programming we use memory all the time – Reading/writing data structures on the heap, stack, or data segment – Reading/writing from/to hardware – By “memory” I do not refer to registers in this presentation (since every core has its own registers)
  • 3. What kinds of guarantees do we want from memory operations? ● That the operation is not optimized away completely ● That the operation does not take place only in registers (which are not visible to other cores, by definition) ● Visibility to other cores (bypass/flush/sync CPU caches) ● Visibility to hardware ● Atomicity ● Order between two or more such memory operations ● Any combination of the above (possibly none of them...) ● In regular C programming (without using special features, that is), the compiler and the CPU provide none of the above guarantees
  • 4. So what guarantees does volatile provide? ● It's just not clear! ● The first two, yes ● The others, maybe. No on a lot of architectures. ● It depends on your compiler, its version, your compilation flags, the astrological sign of the compiler author's best friend... ● To be specific: most volatile implementations do not imply atomicity or ordering. ● And what does volatile mean for structures bigger than a word or an int or a long? Pass me the joint, as things are getting hazy... ● Don't use volatile (true in most cases!)
  • 5. Memory reordering ● Imagine the following code (with no compiler optimizations): X=5; Y=6; X=7; Y=8 ● What states do you expect other cores to see? X=5,Y=6 → X=7,Y=6 → X=7,Y=8 ● Or maybe: X=5,Y=6 → X=5,Y=8 → X=7,Y=8 ● Yes! The CPU does this. (well, not Intel, but others)
  • 6. Is the compiler/CPU allowed to do that? ● Yes. Actually there are many types of reordering that the compiler/CPU is allowed to perform ● Common CPU reorderings include: – Load reordered after load – Load reordered after store – Store reordered after load – Store reordered after store – Store reordered after atomics – Load reordered after atomics – Dependent loads reordered (YES! Alpha does this; they should all be locked up...)
  • 7. So what do the compiler/CPU guarantee? ● They guarantee results within a single thread. ● This means that they may alter your code, reorder it, discard parts of it, use different operations than the ones you wrote, and more. ● But all of these transformations guarantee that the results will be the same, as observed from the thread they occur in. ● But sometimes you want your code to be left unaltered. ● This is especially true when other threads or hardware are involved. ● In these cases the order matters, the specific operations matter, etc.
  • 8. Enter the memory barrier/fence ● A machine memory barrier is a special machine instruction, or a special type of memory access instruction, that guarantees order of execution between memory instructions before it and after it. ● __sync_synchronize() in gcc (user space). ● asm volatile ("mfence" ::: "memory") ● (smp_)mb(), (smp_)rmb(), (smp_)wmb() in kernel development. ● In most cases atomic operations imply a memory barrier of some sort, and the new C++11 standard has a nice API with a memory model included.
  • 9. OK, prove it to me... ● Time for a demo. ● Two threads; when we start we have X=0, Y=0. ● Thread 1 runs: X=1; R1=Y ● Thread 2 runs: Y=1; R2=X ● Could it be that R1==R2==0 at the end?
  • 10. Hey, but I need volatile to overcome the compiler! ● No, you don't ● There is something called a “compiler barrier” ● Compiler barriers usually offer several features: – Forcing the compiler to sync unsynchronized registers with memory, so that memory writes before the barrier actually go to memory (no cache flush, no memory barrier) – Forcing the compiler to read from memory after the barrier, even if it thinks it knows the value of certain memory locations. – Forcing the order of memory operations at the compiler level (not the machine level) relative to the barrier's location in the code ● A compiler barrier is not a machine instruction (as opposed to a memory barrier) ● It is a compiler directive, influencing how the compiler will generate machine code after the directive is given. ● The compiler may or may not emit machine instructions as a result (depends on many factors) ● Time for another demo...
  • 11. References ● “What Every Programmer Should Know About Memory” by Ulrich Drepper ● “memory-barriers.txt” from the Linux kernel documentation ● The memory barrier example shown is derived from “Memory Reordering Caught in the Act” by Jeff Preshing ● “Volatile_variable” from Wikipedia ● “Memory_barrier” from Wikipedia ● All examples can be found in my linuxapi project on GitHub.
  • 12. Questions?