Memory Model 
Mingdong Liao
Overview 
• Simple definition of memory model. 
• Optimizations. 
• HW: SC & TSO & RC: Strong and Weak. 
• SW: Ordering & C++11 memory model. 
• Further reading.
Lock-free programming 
Juggling razor blades. 
--Herb Sutter 
Just don’t do it, use locks!
Memory model (consistency model) 
• “The memory model specifies the allowed 
behavior of multithreaded programs executing 
with shared memory.”[1] 
• “Consistency (the memory model) provides rules 
about loads and stores and how they act upon 
memory.”[1] 
→ A contract between software and hardware.
Does the computer execute the program you wrote? 
NO! 
Source code → compiler → processor → caches → execution
Real-world example 
g_a == 0, g_b == 42 
g_a == 24, g_b == 0 
g_a == 24, g_b == 42 
g_a == 0, g_b == 0 ??
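A minimal sketch of the kind of two-thread program that can produce all four outcomes above, including the surprising 0/0 one. The names g_a and g_b come from the slide; the structure (each thread stores one variable, then reads the other) and the use of relaxed atomics are assumptions for illustration:

#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> g_a{0}, g_b{0};

int main() {
    int seen_a = -1, seen_b = -1;
    std::thread t1([&] {
        g_a.store(24, std::memory_order_relaxed);      // write g_a ...
        seen_b = g_b.load(std::memory_order_relaxed);  // ... then read g_b
    });
    std::thread t2([&] {
        g_b.store(42, std::memory_order_relaxed);      // write g_b ...
        seen_a = g_a.load(std::memory_order_relaxed);  // ... then read g_a
    });
    t1.join();
    t2.join();
    // Because stores and loads may be reordered (compiler, store buffers),
    // seen_a == 0 && seen_b == 0 is a permitted outcome, even though it looks
    // impossible if each thread executed exactly the program you wrote.
    std::printf("g_a observed: %d, g_b observed: %d\n", seen_a, seen_b);
}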
How dare they change my code! 
• The program you wrote is not, literally, what you want; what you want is its observable effects. 
• Transformations are made for better performance. 
• As long as they have the same effects.
Optimizations 
Reordering: 
Z = 3 
Y = 2 
X = 1 
// use X, Y, Z 
may be reordered to 
X = 1 
Y = 2 
Z = 3 
// use X, Y, Z 
Dead-store elimination: 
X = 1 
Y = 2 
X = 3 
// use X and Y 
becomes 
Y = 2 
X = 3 
// use X and Y 
Loop interchange: 
for(i = 0; i < cols; ++i) 
for(j = 0; j < rows; ++j) 
a[j*rows + i] += 42; 
becomes 
for(j = 0; j < rows; ++j) 
for(i = 0; i < cols; ++i) 
a[j*rows + i] += 42; 
Optimizations are ubiquitous: the compiler and the processor will do whatever they 
see fit to optimize your code for better performance.
Memory model from HW’s perspective 
Shared-memory support for multicore computer 
systems is the source of all these difficulties.
Memory architecture 
• The effect of a memory operation. 
[Diagram: a single core attached to memory, then multiple cores (Core1, Core2, Core3) sharing one memory.] 
Accesses to memory are serialized.
Cache(and store buffer) 
[Diagram: Core 1 and Core 2, each with its own store buffer and cache, in front of shared memory.] 
2 issues arise: 
a. coherence (invisible to software). 
b. consistency: how to order stores and loads to memory? 
core 1 
S1: store data = d1 
S2: store flag = d2 
core 2 
L1: load r1 = flag 
B1: if (r1 != d2) goto L1 
L2: load r2 = data 
Key points: 
→ Writes are not automatically visible. 
→ Reads/writes are not necessarily performed in order.
Program order & memory order 
• Program order: the order of operations as written in the 
program. 
→ what the programmer wants. 
• Memory order: the order of the corresponding 
operations with respect to memory. 
→ the observed order.
Sequential consistency 
Program order is the same as memory order for 
every single thread. 
If L(a) <p L(b) ⇒ L(a) <m L(b) 
If L(a) <p S(b) ⇒ L(a) <m S(b) 
If S(a) <p S(b) ⇒ S(a) <m S(b) 
If S(a) <p L(b) ⇒ S(a) <m L(b) 
Every load gets its value from the last 
store before it in memory order. 
[Diagram: the per-core streams of stores and loads (core1, core2) interleaved into one memory order.] 
+ simple & easy to program with. 
− performance optimizations are constrained.
Total store order (TSO) 
Also known as “processor consistency”, used in x86/64, SPARC, etc. 
If L(a) <p L(b) ⇒ L(a) <m L(b) 
If L(a) <p S(b) ⇒ L(a) <m S(b) 
If S(a) <p S(b) ⇒ S(a) <m S(b) 
If S(a) <p L(b) ⇒ S(a) <m L(b)  ← NOT enforced: a store may be reordered after a later load (the store buffer). 
Every load gets its value from the last 
store before it in memory order or in 
program order. 
[Diagram: per-core streams of stores and loads (core1, core2) merged into memory order, with the store→load ordering relaxed.] 
Need a fence to accomplish SC.
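A hedged sketch (not from the slides) of the store→load relaxation and the fence that restores SC for this pattern. Under TSO each core's store may still sit in its store buffer when the following load executes, so without the fences both r1 and r2 can end up 0:

#include <atomic>
#include <thread>

std::atomic<int> x{0}, y{0};
int r1 = -1, r2 = -1;

void t1() {
    x.store(1, std::memory_order_relaxed);
    // Full (seq_cst) fence: the store above is ordered before the load below.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    r1 = y.load(std::memory_order_relaxed);
}

void t2() {
    y.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    r2 = x.load(std::memory_order_relaxed);
}

int main() {
    std::thread a(t1), b(t2);
    a.join();
    b.join();
    // With the seq_cst fences, r1 == 0 && r2 == 0 is ruled out; remove them
    // and TSO hardware (e.g. x86) may produce exactly that outcome.
}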
Memory fence 
• Independent memory operations are effectively 
performed in random order. 
• We need a way to instruct the compiler and processor to 
restrict the order. 
→ Memory fence: a per-CPU intervention. 
• Fences are not guaranteed to have any effect on 
other CPUs. 
• Fences do not guarantee in what order other CPUs will 
see operations.
Release consistency 
• Provides 2 types of operations (fences): 
a) acquire operation. 
b) release operation. 
Acquire operation: memory operations after it are not 
allowed to move up across it. 
Release operation: memory operations before it are not 
allowed to move down across it. 
Key observations: 
→ An acquire operation indicates the start of a critical section. 
→ A release operation indicates the end of a critical section.
Memory model from SW’s perspective 
The software memory model sits on top of the hardware models (x86/64, PowerPC, ARM). 
It is the other part of the contract, the part for SW to obey.
Ordering 
Down to earth: it is all about the side effects of your program's execution 
with respect to memory. 
a) Memory operations in program order are not necessarily performed in that order in memory order. 
b) Use fences to prevent unwanted reordering.
How does ordering matter? 
1: load(g_y) 
2: load(g_x) 
3: store(g_x) 
4: store(g_y) 
Non-deterministic reordering makes the program nearly impossible to reason about.
How does ordering matter? 
• One more try, Peterson’s algorithm on x86/64. 
int g_victim; 
bool g_flag[2]; 
void lock1() 
{ 
g_flag[0] = true; 
g_victim = 0; 
while (g_flag[1] && g_victim == 0); 
// lock acquired. 
} 
void unlock1() 
{ 
g_flag[0] = false; 
} 
void lock2() 
{ 
g_flag[1] = true; 
g_victim = 1; 
while (g_flag[0] && g_victim == 1); 
// lock acquired. 
} 
void unlock2() 
{ 
g_flag[1] = false; 
} 
Thread 0 
Store(g_flag[0]) 
Store(g_victim) 
Load(g_flag[1]) 
Load(g_victim) 
Thread 1 
Store(g_flag[1]) 
Store(g_victim) 
Load(g_flag[0]) 
Load(g_victim)
Is reordering that bad? 
→ Yes. → No. → It depends. 
As long as we don’t observe the reordering, it doesn’t matter what it is! 
→ Hardware loves to do reordering in order to optimize 
performance. 
→ Software, however, needs SC to ensure correct code.
SC-DRF 
• Fully sequential consistency: the ideal world. 
→ execute the code you wrote. 
→ what most programmers expect. 
• SC-DRF: sequential consistency for data-race-free programs, 
the reality. 
→ a compromise between software and hardware! 
As long as you don’t write code with data races, the HW guarantees you the illusion 
of fully sequential consistency.
Race condition 
• A memory location is simultaneously accessed 
by two or more threads, and at least one 
thread is a writer. 
• Key point: transactions. 
1) atomicity: no torn reads or torn writes. 
2) visibility: side effects propagate from thread 
to thread.
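A hedged illustration (not from the slides) of the two points above: a plain read-modify-write from two threads is a data race and loses updates, while an atomic RMW is torn-free and its effect is visible to the other thread:

#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<long> safe_counter{0};
long racy_counter = 0; // accessed from two threads without synchronization: data race (UB)

void work() {
    for (int i = 0; i < 100000; ++i) {
        racy_counter += 1;                                     // non-atomic read-modify-write
        safe_counter.fetch_add(1, std::memory_order_relaxed);  // atomic RMW: no torn read/write
    }
}

int main() {
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    // safe_counter is always 200000; racy_counter typically is not.
    std::printf("racy=%ld safe=%ld\n", racy_counter, safe_counter.load());
}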
Critical section 
• Race conditions occur only when we have to 
manipulate shared variables. 
• Create a critical region to serialize the accesses. 
→ a way to implement transactions.
Critical section 
[Diagram: accesses to shared variables enclosed between an acquire fence above and a release fence below.] 
Good fences make good neighbors. 
Reordering within the critical section? 
→ Fine, as long as operations don’t move out 
of the section. 
→ A full fence will work, but acquire and release operations are better.
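A minimal sketch (assumed, not from the slides) of how the acquire/release pair delimits a critical section: the spinlock's lock() is an acquire operation and unlock() is a release operation, so accesses inside the section cannot migrate out of it:

#include <atomic>

class spinlock {
    std::atomic<bool> locked_{false};
public:
    void lock() {
        // acquire: later reads/writes may not move above this point
        while (locked_.exchange(true, std::memory_order_acquire)) { /* spin */ }
    }
    void unlock() {
        // release: earlier reads/writes may not move below this point
        locked_.store(false, std::memory_order_release);
    }
};

spinlock g_lock;
int g_shared = 0;

void increment() {
    g_lock.lock();
    ++g_shared;      // protected access stays inside the critical section
    g_lock.unlock();
}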
c++11 atomic 
• Operations on atomic types are performed atomically; they are synchronization operations. 
• The user can specify the memory ordering for every load & store. 
template <class T> struct atomic { 
bool is_lock_free() const noexcept; 
void store(T, memory_order = memory_order_seq_cst) noexcept; 
T load(memory_order = memory_order_seq_cst) const noexcept; 
T exchange(T, memory_order = memory_order_seq_cst) noexcept; 
bool compare_exchange_weak(T&, T, memory_order, memory_order) noexcept; 
bool compare_exchange_strong(T&, T, memory_order, memory_order) noexcept; 
bool compare_exchange_weak(T&, T, memory_order = memory_order_seq_cst) noexcept; 
bool compare_exchange_strong(T&, T, memory_order = memory_order_seq_cst) noexcept; 
}; 
Synchronization operations specify how assignments in one thread become visible to another. 
[C++ standard: 1.10.5]
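A small usage sketch for the interface above (all names are standard; the defaults are memory_order_seq_cst):

#include <atomic>

std::atomic<int> counter{0};

void examples() {
    counter.store(1);                 // seq_cst store
    int v   = counter.load();         // seq_cst load
    int old = counter.exchange(5);    // atomically swap in 5, return the previous value

    int expected = 5;
    // Compare-and-swap: if counter == expected, write 6 and return true;
    // otherwise copy the current value into 'expected' and return false.
    bool swapped = counter.compare_exchange_strong(expected, 6);

    (void)v; (void)old; (void)swapped;
}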
C++11 memory order 
namespace std { 
typedef enum memory_order { 
memory_order_relaxed, // no ordering constraint. 
memory_order_consume, // a weaker version of the acquire semantics. 
memory_order_acquire, // a load using this order is an acquire operation. 
memory_order_release, // a store using this order is a release operation. 
memory_order_acq_rel, // both, for RMW operations: e.g., exchange(). 
memory_order_seq_cst // sequential consistency: like memory_order_acq_rel, 
// plus a single total order on all memory_order_seq_cst operations. 
} memory_order; 
} 
Note: applies only to a read and a write performed on the same memory location.
Acquire/release and Consume/release 
atomic<int> guard(0); 
int pay_load = 0; 
// thread 0 
pay_load = 1; 
guard.store(1, memory_order_release); 
// thread 1 
int pay; 
int g = guard.load(memory_order_acquire); 
if (g) pay = pay_load; 

atomic<int*> guard(nullptr); 
int pay_load = 0; 
// thread 0 
pay_load = 1; 
guard.store(&pay_load, memory_order_release); 
// thread 1 
int pay; 
int* g = guard.load(memory_order_consume); 
if (g) pay = *g; 
→ g must carry a dependency to pay = *g 
→ data dependency 
On most weakly-ordered architectures, memory ordering between data-dependent 
instructions is preserved; in such cases an explicit memory fence is not necessary.[7]
memory_order_seq_cst 
• Orders memory operations the same way as release and 
acquire. 
• Establishes a single total order on all memory_order_seq_cst 
operations. 
Suppose x, y are atomic variables initialized to 0.[6] 
Thread 1 
x = 1 
Thread 2 
y = 1 
Thread 3 
if (y == 1 && x == 0) 
cout << “y first”; 
Thread 4 
if (y == 0 && x == 1) 
cout << “x first”; 
→ It must not be possible to print both messages.
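The same example written out as a runnable sketch (the thread bodies are an assumed elaboration of the slide; the stores and loads use the seq_cst default, which is what makes printing both messages impossible):

#include <atomic>
#include <iostream>
#include <thread>

std::atomic<int> x{0}, y{0};

int main() {
    std::thread t1([] { x.store(1); });   // seq_cst store
    std::thread t2([] { y.store(1); });   // seq_cst store
    std::thread t3([] {
        if (y.load() == 1 && x.load() == 0) std::cout << "y first\n";
    });
    std::thread t4([] {
        if (x.load() == 1 && y.load() == 0) std::cout << "x first\n";
    });
    t1.join(); t2.join(); t3.join(); t4.join();
    // The single total order on seq_cst operations means t3 and t4 cannot
    // disagree about which store came first, so at most one line is printed.
}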
C++11 memory fence 
data.store(3, std::memory_order_relaxed); 
std::atomic_thread_fence(std::memory_order_release); 
flag.store(1, std::memory_order_relaxed); 
flag2.store(2, std::memory_order_relaxed); 
• It is different from what you might expect of a 
traditional fence. 
• It is more like a way to do synchronization. 
extern "C" void atomic_thread_fence(memory_order order) noexcept; 
The code above is NOT equivalent to the following: 
data.store(3, std::memory_order_relaxed); 
flag.store(1, std::memory_order_release); 
flag2.store(2, std::memory_order_relaxed); 
→ A release fence prevents all preceding memory operations from being reordered past any 
subsequent write. 
→ In the second snippet, flag2.store() is allowed to reorder before data.store(). 
Release-fence pattern: 
// other memory operations preceding the fence. 
std::atomic_thread_fence(std::memory_order_release); 
flag.store(1, std::memory_order_relaxed); 
Acquire-fence pattern (an acquire fence prevents all subsequent memory operations from being 
reordered before any preceding read): 
flag.load(std::memory_order_relaxed); 
std::atomic_thread_fence(std::memory_order_acquire); 
// other memory operations.
Quiz 
Hint: 
a. Need an acquire before the load of g_y in foo1(). 
b. Need an acquire before the load of g_x in foo2(). 
Can we accomplish that? 
→ acquire/release are pairwise operations. 
State what ordering is needed to prevent the reordering (a sketch of foo1()/foo2() follows below).
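The foo1()/foo2() code from the slide image is not reproduced in this transcript; a hypothetical reconstruction consistent with the hint (each function stores one variable and then loads the other) might look like this:

#include <atomic>

std::atomic<int> g_x{0}, g_y{0};

int foo1() {
    g_x.store(1, std::memory_order_release);
    // An acquire on this load orders it against *later* operations only;
    // it cannot stop the store above from being reordered past it.
    return g_y.load(std::memory_order_acquire);
}

int foo2() {
    g_y.store(1, std::memory_order_release);
    return g_x.load(std::memory_order_acquire);
}

// For this store->load pattern, acquire/release pairs are not enough: both
// calls may still return 0. memory_order_seq_cst on these stores and loads
// (or seq_cst fences between them) is what rules that outcome out.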
Quiz: Peterson’s algo again. 
atomic<int> g_victim; 
atomic<bool> g_flag[2]; 
void lock1() 
{ 
g_flag[0].store(true, ?); 
g_victim.store(0, ?); 
while (g_flag[1].load(?) && g_victim.load(?) == 0); 
// lock acquired. 
} 
void unlock1() 
{ 
g_flag[0].store(false, ?); 
} 
Thread 0 
Store(g_flag[0]) 
Store(g_victim) 
Load(g_flag[1]) 
Load(g_victim) 
Thread 1 
Store(g_flag[1]) 
Store(g_victim) 
Load(g_flag[0]) 
Load(g_victim) 
atomic<int> g_victim; 
atomic<bool> g_flag[2]; 
void lock1() 
{ 
g_flag[0].store(true, memory_order_relaxed); 
g_victim.exchange(0, memory_order_acq_rel); 
while (g_flag[1].load(memory_order_acquire) 
&& g_victim.load(memory_order_relaxed) == 0); 
// lock acquired. 
} 
void unlock1() 
{ 
g_flag[0].store(false, memory_order_release); 
} 
Atomic read-modify-write operations shall always read the last value (in the modification order) written 
before the write associated with the read-modify-write operation.[standard §29.3.12]
A few terms: synchronize with 
• An operation A synchronizes-with an operation B if: 
1) A is a store to some atomic variable m, with an ordering 
of std::memory_order_release or std::memory_order_seq_cst. 
2) B is a load from the same variable m, with an ordering 
of std::memory_order_acquire or std::memory_order_seq_cst. 
3) and B reads the value stored by A. 
Thread 1: 
Data = 42 
Flag = 1 
Thread 2: 
R1 = Flag 
if (R1 == 1) use Data
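The embedded Data/Flag example, written out as C++ (Data and Flag come from the slide; the atomic types and the assert are a minimal sketch):

#include <atomic>
#include <cassert>

int Data = 0;
std::atomic<int> Flag{0};

void thread1() {
    Data = 42;                                 // sequenced before the release store
    Flag.store(1, std::memory_order_release);  // A: release store to Flag
}

void thread2() {
    int r1 = Flag.load(std::memory_order_acquire);  // B: acquire load from Flag
    if (r1 == 1)          // B read the value stored by A, so A synchronizes-with B
        assert(Data == 42);  // ... and the write to Data is guaranteed visible here
}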
A few terms: dependency-ordered before 
An operation A is dependency-ordered before an operation B if: 
1) A is a store to some atomic variable m, with an ordering 
of std::memory_order_release or std::memory_order_seq_cst. 
2) B is a load from the same variable m, with an ordering 
of std::memory_order_consume. 
3) and B reads the value stored by the release sequence headed 
by A. 
Thread 1: 
Data = 42 
Flag = &Data 
Thread 2: 
R1 = Flag 
if (R1) use *R1
A few terms: happens-before 
→ Sequenced before: the order of evaluations within a single thread. 
→ Or synchronizes-with. 
→ Or dependency-ordered before. 
→ Or concatenations of the above 3 relationships, 
with 2 exceptions. [standard 1.10.11] 
→ happens-before indicates visibility.
volatile 
• A compiler-level semantic. 
→ the compiler guarantees it will not reorder or 
eliminate accesses to this variable. 
→ other threads may not see that guarantee (the hardware may still reorder). 
→ it has nothing to do with inter-thread synchronization. 
• Not an atomic operation.
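A short sketch (assumed, not from the slides) of why volatile is not a substitute for std::atomic:

#include <thread>

volatile bool ready = false;  // volatile: the compiler keeps every access,
int payload = 0;              // but there is no atomicity and no ordering fence.

void producer() {
    payload = 1;
    ready = true;     // the hardware may make this visible before 'payload'
}

void consumer() {
    while (!ready) { }        // also a data race: undefined behavior in C++11
    // payload may still be observed as 0 here; use std::atomic<bool> with
    // release/acquire instead.
}

int main() {
    std::thread p(producer), c(consumer);
    p.join(); c.join();
}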
Further reading 
• [1]https://class.stanford.edu/c4x/Engineering/CS316/asset/A_Primer_on_M 
emory_Consistency_and_Coherence.pdf 
• [2]http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=F2BAAAED623 
D54B73C5FF41DF14D5864?doi=10.1.1.17.8112&rep=rep1&type=pdf 
• [3]http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2075.pdf 
• [4]http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n1942.html 
• [5]http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2664.htm 
• [5]http://open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2427.html 
• [6]http://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012- 
Herb-Sutter-atomic-Weapons-1-of-2 
• [7]https://www.kernel.org/doc/Documentation/memory-barriers.txt 
• [8]www.preshing.com

Editor's Notes

  1. At the source-code level, C++ programs appear sequential: first do this, then do that.
  2. The top 2 most difficult tasks in programming: naming things, and cache invalidation.
  3. Before I explain why we have these optimizations, let's discuss why they are allowed.
  4. http://preshing.com/20131125/acquire-and-release-fences-dont-work-the-way-youd-expect/