Overview
● Transitive intuitions (use-case view)
  – Singular intuitive bliss, locking intuitions, release-acquire intuitions, RCU intuitions, and fully-ordered phantasms
● Rules of thumb (memory-model view)
  – Singular intuitive bliss, load buffering, almost load buffering, multiple non-store-to-load links
Transitive Intuitions
Singular Intuitive Bliss
● Single thread: All accesses seen in order
● Single shared variable: All threads agree on at least one order for all accesses
  – But be careful of mixed-size accesses!!!
  – And of “at least one order”...
Test of Singular Intuitive Bliss
● CPU 0 controls test running on PowerPC
● CPUs 1-15:
  – Write their CPU number to shared variable
  – Loop re-reading variable and recording timestamp
● Next slide presents results
Results of Singular Intuitive Bliss
(Figure: test results. Each tick is 5.3 nanoseconds on 1.5 GHz POWER5 system.)
Orderings of Singular Intuitive Bliss
(Figure: orderings; image not recoverable.)
Singular Intuitive Bliss Cautions
● Compilers assume that plain C-language memory accesses are not to shared variables!!!
● Mark your accesses to disabuse the compiler
  – Linux-kernel, C, C++ atomics
  – Linux-kernel READ_ONCE() and WRITE_ONCE()
  – Extremely careful use of volatile
Singular Intuitive Bliss Cautions
● Compiler is within its rights to transform this:
    while (a)
        do_something();
● Into this (and this really happens!!!):
    if (a)
        for (;;)
            do_something();
● Use READ_ONCE(a) to constrain compiler
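The loop hazard above can be sketched in user-space C. The READ_ONCE() and WRITE_ONCE() macros below are simplified stand-ins built on a volatile cast and the GCC/Clang `__typeof__` extension (an assumption, not the kernel's actual definitions), and `poll_flag()`, `work_done`, and the body of `do_something()` are hypothetical names added only so the sketch terminates when run in a single thread.

```c
#include <assert.h>

/* Simplified user-space stand-ins for the kernel macros (assumption):
 * the volatile cast forces a real load or store at each use. */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

static int a = 1;       /* shared flag; another thread would clear it in real use */
static int work_done;   /* hypothetical counter of do_something() calls */

/* Hypothetical work function: clears the flag on its third call so that
 * a single-threaded run of this sketch terminates. */
static void do_something(void)
{
	if (++work_done == 3)
		WRITE_ONCE(a, 0);
}

/* READ_ONCE() obliges the compiler to reload 'a' on every pass, so the
 * load cannot be hoisted out of the loop as in the slide's transformation. */
static int poll_flag(void)
{
	while (READ_ONCE(a))
		do_something();
	return work_done;
}
```

With a plain `while (a)`, the compiler could legally load `a` once and spin forever; the marked access rules that out.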
Singular Intuitive Bliss Cautions
● Compiler is within its rights to transform this:
    x = 0x10002;
● Into this (and this really happens!!!):
    *(unsigned short *)(((uintptr_t)&x) + 2) = 0x1;
    *(unsigned short *)((uintptr_t)&x) = 0x2;
● Use WRITE_ONCE(x, 0x10002) to constrain compiler
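As a rough user-space illustration of the store-tearing caution, the sketch below marks both the store and the load. The macros are simplified stand-ins (an assumption, not the kernel's definitions), and `publish()` and `value_is_whole()` are hypothetical helpers. The sketch relies on the common expectation that compilers emit a single access for an aligned, machine-word-sized volatile object.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified user-space stand-ins for the kernel macros (assumption). */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

static uint32_t x;   /* shared variable, initially zero */

/* Writer: one marked 32-bit store instead of two 16-bit stores. */
static void publish(void)
{
	WRITE_ONCE(x, 0x10002);
}

/* Reader: returns 1 if the value is either the old value (0) or the
 * complete new value, and 0 if it looks like a half-written mixture. */
static int value_is_whole(void)
{
	uint32_t v = READ_ONCE(x);

	return v == 0 || ((v >> 16) == 0x1 && (v & 0xffff) == 0x2);
}
```

A concurrent reader using a plain, torn store could observe 0x10000 or 0x2; with the marked store it is expected to see only 0 or 0x10002.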
More Compiler Cautionary Tales
● “Who's afraid of a big bad optimizing compiler?” (series)
  – https://lwn.net/Articles/793253, https://lwn.net/Articles/799218
● “An introduction to lockless algorithms” (Paolo Bonzini series)
  – https://lwn.net/Articles/844224, https://lwn.net/Articles/846700, https://lwn.net/Articles/847481, https://lwn.net/Articles/847973, https://lwn.net/Articles/849237, https://lwn.net/Articles/850202
● “Is Parallel Programming Hard, And, If So, What Can You Do About It?” Section 4.3.4 (“Accessing Shared Variables”)
  – https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html
Locking Intuitions
Locking Intuitions: Graphical
(Figure: timelines for CPU 0, CPU 1, and CPU 2, each showing Before Critical Section, Lock, Critical Section, Unlock, and After Critical Section.)
Locking Intuitions: Wall o’Text
● While holding or after releasing a lock:
  – A load will:
    ● See values stored during or before earlier critical sections (or some later value)
    ● Not see values stored during or after later critical sections
  – A store will:
    ● Overwrite values stored during or before earlier critical sections (or some later value)
    ● Not overwrite values stored during or after later critical sections
    ● Not affect values loaded during or before earlier critical sections
● In other words, mutual exclusion plus a bit
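The locking intuitions can be illustrated with a toy user-space spinlock. This is a minimal sketch using C11 `atomic_flag`, not the kernel's spinlock implementation; `toy_lock()`, `toy_unlock()`, `add_under_lock()`, and `counter` are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_flag lock = ATOMIC_FLAG_INIT;
static long counter;   /* hypothetical shared data, protected by 'lock' */

/* Acquire: spin until the flag was previously clear.  The acquire
 * ordering makes stores from earlier critical sections visible. */
static void toy_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
		;   /* spin */
}

/* Release: publish this critical section's stores to the next holder. */
static void toy_unlock(void)
{
	atomic_flag_clear_explicit(&lock, memory_order_release);
}

/* Adds n to the counter and returns the value seen on entry.  The load
 * sees the store of every earlier critical section, and the store
 * overwrites it, matching the wall of text above. */
static long add_under_lock(long n)
{
	long seen;

	toy_lock();
	seen = counter;
	counter = seen + n;
	toy_unlock();
	return seen;
}
```

Because only one thread can hold the flag at a time, plain accesses to `counter` inside the critical section are safe; that is the "mutual exclusion" half, and the acquire and release orderings are the "plus a bit".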
Release-Acquire Intuitions
Release-Acquire Intuitions: Graphical
(Figure, shown in two builds: timelines for CPU 0, CPU 1, and CPU 2, each showing Before Acquire, Acquire, Release, and After Release.)
Acquire must load value stored by release, or that of some later release in the same acquire-release chain
Release-Acquire Intuitions: Caution!!!
(Figure: timelines for CPU 0, CPU 1, and CPU 2, each showing Before Acquire, Acquire, Release, and After Release, plus a second CPU 0 timeline.)
Acquire must load value stored by release, or that of some later release in the same acquire-release chain
No mutual exclusion, so running concurrently!!!
Release-Acquire Intuitions: Wall o’Text
● After an acquire that loads the value stored by a prior release (or the value stored by some later release in that release-acquire chain):
  – A load will:
    ● See values stored before earlier releases (or some later value)
    ● Not see values stored after later acquires
  – A store will:
    ● Overwrite values stored before earlier releases (or some later value)
    ● Not overwrite values stored after later acquires
    ● Not affect values loaded before earlier releases
● But there is no mutual exclusion unless you provide it separately!!!
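A minimal message-passing sketch of these release-acquire intuitions follows. It uses C11 atomics as a user-space analogue of smp_store_release() and smp_load_acquire() (an assumption; the kernel primitives are not available outside the kernel), and `payload`, `ready`, `producer()`, and `consumer()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>

static int payload;        /* plain data, ordered by the flag below */
static atomic_int ready;   /* release-acquire flag, initially 0 */

/* Producer: store the data, then release the flag
 * (analogue of smp_store_release(&ready, 1)). */
static void producer(int value)
{
	payload = value;
	atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Consumer: acquire the flag (analogue of smp_load_acquire(&ready)).
 * If the acquire saw the release, the load of payload sees the value
 * stored before that release.  Returns -1 if the flag is not yet set. */
static int consumer(void)
{
	if (!atomic_load_explicit(&ready, memory_order_acquire))
		return -1;
	return payload;
}
```

Nothing here prevents the producer and consumer from running concurrently; the ordering only applies once the acquire has actually loaded the released value.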
RCU Intuitions
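RCU Intuitions: Insertion
(Figure: the updater performs initialization of the new data and then publishes the pointer with rcu_assign_pointer(); the reader fetches the pointer with rcu_dereference() and then reads the data. The pointer refers to either the old data or the new data. Initialization happens before publication, so if the reader sees the new pointer, it also sees the initialized data.)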
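The insertion guarantee can be approximated in user-space C. In this sketch a release store stands in for rcu_assign_pointer() and an acquire load stands in for rcu_dereference(); acquire is a conservative substitute for the dependency ordering that rcu_dereference() relies on. Read-side critical sections and grace periods are deliberately omitted, and `struct item`, `gp`, `insert_item()`, and `lookup_value()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct item {
	int key;
	int value;
};

static struct item new_item;          /* storage for the inserted element */
static _Atomic(struct item *) gp;     /* shared pointer, initially NULL */

/* Updater: initialize first, then publish the pointer
 * (stand-in for rcu_assign_pointer(gp, &new_item)). */
static void insert_item(int key, int value)
{
	new_item.key = key;
	new_item.value = value;
	atomic_store_explicit(&gp, &new_item, memory_order_release);
}

/* Reader: fetch the pointer (stand-in for rcu_dereference(gp)) and then
 * read the data.  A reader that sees the new pointer also sees the
 * initialized fields; otherwise it returns the caller's fallback. */
static int lookup_value(int fallback)
{
	struct item *p = atomic_load_explicit(&gp, memory_order_acquire);

	return p ? p->value : fallback;
}
```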
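RCU Intuitions: Earlier Reader
CPU 0:
    rcu_read_lock()
    r1 = READ_ONCE(x)
    WRITE_ONCE(y, 1)
    rcu_read_unlock()
CPU 1:
    WRITE_ONCE(x, 1)
    synchronize_rcu()
    r2 = READ_ONCE(y)
Given this ordering... CPU 0's r1 = READ_ONCE(x) precedes CPU 1's WRITE_ONCE(x, 1), so r1 == 0
...RCU guarantees this ordering: CPU 0's WRITE_ONCE(y, 1) precedes CPU 1's r2 = READ_ONCE(y), so r2 == 1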
RCU Intuitions: Later Reader
CPU 0:
    rcu_read_lock()
    r1 = READ_ONCE(x)
    WRITE_ONCE(y, 1)
    rcu_read_unlock()
CPU 1:
    WRITE_ONCE(x, 1)
    synchronize_rcu()
    r2 = READ_ONCE(y)
Given this ordering... CPU 1's r2 = READ_ONCE(y) precedes CPU 0's WRITE_ONCE(y, 1), so r2 == 0
...RCU guarantees this ordering: CPU 1's WRITE_ONCE(x, 1) precedes CPU 0's r1 = READ_ONCE(x), so r1 == 1
Fully Ordered Phantasms
● In each thread, place an smp_mb() between each pair of successive accesses to shared variables
● All threads will agree on at least one order for all threads’ accesses
  – Give or take mixed-size accesses
  – And give or take performance degradation
  – And again, no mutual exclusion!!!
Memory-Model Rules of Thumb
● Based on communication properties
● Three link types:
  – Store-to-load (temporal)
  – Store-to-store (non-temporal)
  – Load-to-store (non-temporal)
The Three Link Types
Thread 0:
    WRITE_ONCE(a, 1);
    WRITE_ONCE(b, 1);
Thread 1:
    r1 = READ_ONCE(b);
    WRITE_ONCE(c, 1);
Thread 2:
    WRITE_ONCE(c, 2);
    r1 = READ_ONCE(a);
Store-to-load link: WRITE_ONCE(b, 1) to r1 = READ_ONCE(b)
Store-to-store link: WRITE_ONCE(c, 1) to WRITE_ONCE(c, 2)
Load-to-store link: r1 = READ_ONCE(a) to WRITE_ONCE(a, 1)
Memory Model and Laws of Physics
● Following the footsteps of Admiral Hopper:
  – Light goes 11.803 inches/ns in a vacuum
    ● Or, if you prefer, 1.0097 lengths of A4 paper per nanosecond
    ● Light goes 1 width of A4 paper per nanosecond in 50% sugar solution
  – But over and back: 5.9015 in/ns
  – But not 1GHz! Instead, ~2GHz: ~3in/ns
  – But Cu: ~1 in/ns, or Si transistors: ~0.1 in/ns
  – Plus other slowdowns: protocols, electronics, ...
“One nanosecond per foot” courtesy of Grace Hopper (https://www.youtube.com/watch?v=9eyFDBPk4Yw)
https://en.wikipedia.org/wiki/List_of_refractive_indices A 50% sugar solution is “light syrup”.
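The first few figures on this slide can be reproduced with a short calculation. The sketch below assumes the defined speed of light (299,792,458 m/s) and 0.0254 m per inch, and it reads the "~2GHz" figure as the round-trip distance covered in one clock cycle, which is an interpretation rather than something the slide states. The function names are hypothetical.

```c
#include <assert.h>

#define LIGHT_M_PER_S  299792458.0   /* speed of light in a vacuum */
#define M_PER_INCH     0.0254        /* metres per inch */

/* Distance light travels in a vacuum in one nanosecond, in inches. */
static double light_in_per_ns(void)
{
	return LIGHT_M_PER_S / 1e9 / M_PER_INCH;
}

/* "Over and back": a round trip halves the reachable distance. */
static double round_trip_in_per_ns(void)
{
	return light_in_per_ns() / 2.0;
}

/* Round-trip reach during one clock cycle at the given frequency in GHz
 * (one cycle lasts 1/ghz nanoseconds). */
static double round_trip_in_per_cycle(double ghz)
{
	return round_trip_in_per_ns() / ghz;
}
```

Under these assumptions the results are roughly 11.803 in/ns one way, about 5.90 in/ns over and back, and a little under 3 inches per cycle at 2 GHz, before accounting for the slower propagation in copper and silicon.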
Store-to-Load is Temporal
(Figure: timelines for CPU 0 through CPU 3 against a time axis, with an “rf” arrow from WRITE_ONCE(b, 1) to r1 = READ_ONCE(b).)
“rf” stands for “Reads From”, and its arrow points forward in time
Store-to-Store is Non-Temporal
(Figure: timelines for CPU 0 through CPU 3 against a time axis, with a “co” arrow from WRITE_ONCE(c, 1) to WRITE_ONCE(c, 2).)
“co” stands for “Coherence”, and its arrow really can point backwards in time!
Load-to-Store is Non-Temporal
(Figure: timelines for CPU 0 through CPU 3 against a time axis, with an “fr” arrow from r1 = READ_ONCE(a) to WRITE_ONCE(a, 1).)
“fr” stands for “From Read”, and its arrow really can point backwards in time!
Load Buffering
Load-Buffering Example
Thread 0:
    r1 = READ_ONCE(a);
    WRITE_ONCE(b, 1);
Thread 1:
    r1 = READ_ONCE(b);
    WRITE_ONCE(c, 1);
Thread 2:
    r1 = READ_ONCE(c);
    WRITE_ONCE(a, 1);
Store-to-load link: WRITE_ONCE(b, 1) to r1 = READ_ONCE(b)
Store-to-load link: WRITE_ONCE(c, 1) to r1 = READ_ONCE(c)
Store-to-load link: WRITE_ONCE(a, 1) to r1 = READ_ONCE(a)
Ordered Load-Buffering Example
Thread 0:
    r1 = READ_ONCE(a);
    if (r1)
        WRITE_ONCE(b, 1);
Thread 1:
    r1 = READ_ONCE(b);
    if (r1)
        WRITE_ONCE(c, 1);
Thread 2:
    r1 = READ_ONCE(c);
    if (r1)
        WRITE_ONCE(a, 1);
Store-to-load link: WRITE_ONCE(b, 1) to r1 = READ_ONCE(b)
Store-to-load link: WRITE_ONCE(c, 1) to r1 = READ_ONCE(c)
Store-to-load link: WRITE_ONCE(a, 1) to r1 = READ_ONCE(a)
Each if (r1) is a control dependency that orders the load before the dependent store.
Plus no mutual exclusion!!!
Almost Load Buffering
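Almost Load-Buffering Example #1
Thread 0:
    WRITE_ONCE(a, 1);
    smp_store_release(&b, 1);
Thread 1:
    r1 = smp_load_acquire(&b);
    smp_store_release(&c, 1);
Thread 2:
    r1 = smp_load_acquire(&c);
    WRITE_ONCE(a, 2);
Release-acquire store-to-load link: smp_store_release(&b, 1) to r1 = smp_load_acquire(&b)
Release-acquire store-to-load link: smp_store_release(&c, 1) to r1 = smp_load_acquire(&c)
Store-to-store link: WRITE_ONCE(a, 1) to WRITE_ONCE(a, 2)
C-Z6.2+o-r+a-r+a-o.litmus: If both instances of r1==1, then at end a==2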
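The first almost-load-buffering example can be mirrored in user-space C. This sketch assumes C11 atomics as analogues: relaxed stores for WRITE_ONCE(), release stores for smp_store_release(), and acquire loads for smp_load_acquire(). The function names are hypothetical. The C11 memory model differs from LKMM in general, but for this particular shape the release-acquire chain gives the same result.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int a, b, c;   /* all initially zero */

static void thread0(void)
{
	atomic_store_explicit(&a, 1, memory_order_relaxed);   /* WRITE_ONCE(a, 1) */
	atomic_store_explicit(&b, 1, memory_order_release);   /* smp_store_release(&b, 1) */
}

static int thread1(void)
{
	int r1 = atomic_load_explicit(&b, memory_order_acquire);   /* smp_load_acquire(&b) */

	atomic_store_explicit(&c, 1, memory_order_release);        /* smp_store_release(&c, 1) */
	return r1;
}

static int thread2(void)
{
	int r1 = atomic_load_explicit(&c, memory_order_acquire);   /* smp_load_acquire(&c) */

	atomic_store_explicit(&a, 2, memory_order_relaxed);        /* WRITE_ONCE(a, 2) */
	return r1;
}

/* Final value of a, read after all three threads have finished. */
static int final_a(void)
{
	return atomic_load_explicit(&a, memory_order_relaxed);
}
```

If both acquire loads return 1, the store of 1 to `a` is ordered before the store of 2 through the release-acquire chain, so the final value of `a` is 2. Running the three functions one after another, as a test might, exercises only the intuitive interleaving; a real check of all interleavings needs a tool such as herd7 with the litmus test from tools/memory-model.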
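Almost Load-Buffering Example #2
Thread 0:
    WRITE_ONCE(a, 1);
    smp_store_release(&b, 1);
Thread 1:
    r1 = smp_load_acquire(&b);
    smp_store_release(&c, 1);
Thread 2:
    r1 = smp_load_acquire(&c);
    r2 = READ_ONCE(a);
Release-acquire store-to-load link: smp_store_release(&b, 1) to r1 = smp_load_acquire(&b)
Release-acquire store-to-load link: smp_store_release(&c, 1) to r1 = smp_load_acquire(&c)
Load-to-store link: r2 = READ_ONCE(a) to WRITE_ONCE(a, 1)
C-Z6.4+o-r+a-r+a-o.litmus: If both instances of r1==1, then r2==1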
Almost Load Buffering
● If all but one of the links are store-to-load links, then you can avoid the counterintuitive outcome by making all the store-to-load links be release-acquire links
● But there is still no mutual exclusion!!!
Multiple Non-Store-to-Load Links
● If your communications pattern has more than one non-store-to-load link, you need at least one smp_mb() between each pair of non-store-to-load links
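The classic store-buffering pattern is a small instance of this rule: each thread's load can be linked to the other thread's store by a load-to-store link, so there are two non-store-to-load links and each thread needs a full barrier between its store and its load. The sketch below is a user-space approximation that assumes `atomic_thread_fence(memory_order_seq_cst)` as a stand-in for smp_mb() and relaxed atomics for READ_ONCE() and WRITE_ONCE(); the variable and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int x, y;   /* both initially zero */

/* Thread 0: store x, full barrier, load y. */
static int sb_thread0(void)
{
	atomic_store_explicit(&x, 1, memory_order_relaxed);      /* WRITE_ONCE(x, 1) */
	atomic_thread_fence(memory_order_seq_cst);               /* stand-in for smp_mb() */
	return atomic_load_explicit(&y, memory_order_relaxed);   /* r0 = READ_ONCE(y) */
}

/* Thread 1: store y, full barrier, load x. */
static int sb_thread1(void)
{
	atomic_store_explicit(&y, 1, memory_order_relaxed);      /* WRITE_ONCE(y, 1) */
	atomic_thread_fence(memory_order_seq_cst);               /* stand-in for smp_mb() */
	return atomic_load_explicit(&x, memory_order_relaxed);   /* r1 = READ_ONCE(x) */
}
```

With both fences in place, the outcome in which both functions return 0 is forbidden, however the two threads interleave. Removing either fence allows it again, which is why the rule asks for a barrier between each pair of such links.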
Multiple Non-Store-to-Load Links
Thread 0:
    WRITE_ONCE(a, 1);
    smp_mb();
    WRITE_ONCE(b, 1);
Thread 1:
    r1 = smp_load_acquire(&b);
    smp_store_release(&c, 1);
Thread 2:
    WRITE_ONCE(c, 2);
    smp_mb();
    r2 = READ_ONCE(a);
Release-acquire store-to-load link: WRITE_ONCE(b, 1) to r1 = smp_load_acquire(&b)
Release-to-store store-to-store link + smp_mb(): smp_store_release(&c, 1) to WRITE_ONCE(c, 2)
Load-to-store link + smp_mb(): r2 = READ_ONCE(a) to WRITE_ONCE(a, 1)
C-Z6.1+o-mb-o+a-r+o-mb-o.litmus: If r1==1 and c==2, then r2==1
Multiple Non-Store-to-Load Links
Same example, with Thread 1's release store made conditional on the acquire load:
Thread 1:
    r1 = smp_load_acquire(&b);
    if (r1)
        smp_store_release(&c, 1);
Release-to-store store-to-store link + smp_mb(): smp_store_release(&c, 1) to WRITE_ONCE(c, 2)
Load-to-store link + smp_mb(): r2 = READ_ONCE(a) to WRITE_ONCE(a, 1)
C-Z6.1+o-mb-o+a-r+o-mb-o.litmus: If r1==1 and c==2, then r2==1
Rules-of-Thumb Summary
● One thread or one variable: Auto-ordered!!!
● All store-to-load links: Minimal ordering
● All but one store-to-load links: Release-acquire
● More than one non-store-to-load link: Put an smp_mb() between each pair of such links
Summary
Summary: Avoiding Learning LKMM
● Use prepackaged primitives
  – Single thread and/or variable, locking, release-acquire chains, RCU, full ordering
● Use known-good patterns
  – Single thread and/or variable, load buffering, release-acquire chains, smp_mb() between each pair of non-store-to-load links
For More Information
● “Who's afraid of a big bad optimizing compiler?” (series)
  – https://lwn.net/Articles/793253, https://lwn.net/Articles/799218
● “An introduction to lockless algorithms” (Paolo Bonzini series)
  – https://lwn.net/Articles/844224, https://lwn.net/Articles/846700, https://lwn.net/Articles/847481, https://lwn.net/Articles/847973, https://lwn.net/Articles/849237, https://lwn.net/Articles/850202
● “Concurrency bugs should fear the big bad data-race detector” (series)
  – https://lwn.net/Articles/816850, https://lwn.net/Articles/816854
● Linux kernel source tree: tools/memory-model
● “Is Parallel Programming Hard, And, If So, What Can You Do About It?”
  – Section 4.3.4 (“Accessing Shared Variables”)
  – Chapter 15 (“Advanced Synchronization: Memory Ordering”)
  – Appendix C (“Why Memory Barriers?”)
  – https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html
Thank you! Let’s connect.
Paul E. McKenney
paulmck@kernel.org
@paulmckrcu
www.rdrop.com/~paulmck

How to Avoid Learning the Linux-Kernel Memory Model
