Introduction to memory_order_consume
2015
issue.hsu@gmail.com
Outline
• Quick recap of acquire and release semantics
• The purpose of consume semantics
• Today’s compiler support
Quick Recap of Acquire and Release Semantics
Memory Order
• In the C++11 standard atomic library, most functions accept a
memory_order argument
• Both consume and acquire serve the same purpose
– To help pass non-atomic information safely between threads
• Like acquire operations, a consume operation must be combined with
a release operation in another thread
enum memory_order {
memory_order_relaxed,
memory_order_consume,
memory_order_acquire,
memory_order_release,
memory_order_acq_rel,
memory_order_seq_cst
};
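As a minimal single-threaded sketch of the memory_order argument in use, the calls below pass an explicit ordering to store, fetch_add and load. The helper name demo_memory_order_args is hypothetical. With only one thread the orderings have no observable effect; if the argument is omitted, memory_order_seq_cst is used.

```cpp
#include <atomic>
#include <utility>

// Hypothetical helper: three operations on one atomic, each with an explicit
// std::memory_order argument. Returns {value returned by fetch_add, value
// returned by the final load}.
std::pair<int, int> demo_memory_order_args() {
    std::atomic<int> counter(0);
    counter.store(5, std::memory_order_release);                      // store: release
    int old_value = counter.fetch_add(1, std::memory_order_relaxed);  // RMW: relaxed, returns the old value
    int new_value = counter.load(std::memory_order_acquire);          // load: acquire
    return {old_value, new_value};
}
```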
Example of Acquire and Release
• Declare two shared variables
• The main thread sits in a loop, repeatedly attempting the following sequence of read operations
• An asynchronous task running in another thread tries to perform a write operation
atomic<int> Guard(0);
int Payload = 0;

// Main thread
for (…)
{
    …
    g = Guard.load(memory_order_acquire);   // read-acquire
    if (g != 0)
        p = Payload;
    …
}

// Asynchronous task, in another thread
Payload = 42;
Guard.store(1, memory_order_release);     // write-release
Example of Acquire and Release
• Once the main thread reads the value 1 that the asynchronous task wrote to Guard:
– The write-release synchronizes-with the read-acquire
– We are guaranteed that p will equal 42, no matter what platform we run this example on
• We’ve used acquire and release semantics to pass a simple non-atomic integer, Payload, between threads
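The example above can be turned into a runnable sketch, assuming std::thread is available. The harness name run_acquire_release is hypothetical, a std::thread stands in for the asynchronous task, and Guard and Payload are locals captured by reference rather than globals.

```cpp
#include <atomic>
#include <thread>

// Hypothetical harness: returns the value of Payload that the main thread
// observes once it sees Guard != 0.
int run_acquire_release() {
    std::atomic<int> Guard(0);
    int Payload = 0;

    // Asynchronous task: plain write, then write-release.
    std::thread task([&] {
        Payload = 42;
        Guard.store(1, std::memory_order_release);
    });

    // Main thread: poll with a read-acquire until the guard is set.
    int p = 0;
    for (;;) {
        int g = Guard.load(std::memory_order_acquire);
        if (g != 0) {
            p = Payload;  // the store-release synchronizes-with this load, so this reads 42
            break;
        }
    }
    task.join();
    return p;
}
```

Because the acquire load that saw 1 synchronizes-with the release store, the plain read of Payload is not a data race and must observe 42.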
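The Cost of Acquire Semantics
g = Guard.load(memory_order_acquire);
if (g != 0)
    p = Payload;
[Figure: machine code for this acquire load on a strong memory model versus a weakly-ordered CPU]
• On a strong memory model a plain load is enough, but on a weakly-ordered CPU the compiler must emit an explicit memory barrier instruction to give the load acquire semantics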
The Purpose of Consume Semantics
Data Dependency
• PowerPC and ARM are weakly-ordered CPUs, but there are some cases where they do enforce memory ordering at the machine-instruction level without explicit memory barrier instructions
– These processors always preserve memory ordering between data-dependent instructions
• When multiple instructions are data-dependent on each other, they form a data dependency chain
– In the following PowerPC listing, there are two independent data dependency chains
[Figure: PowerPC listing containing two independent data dependency chains]
Data Dependency
• Consume semantics are designed to exploit data dependency ordering
• At the source code level, a dependency chain is a sequence of expressions whose evaluations all carry-a-dependency to one another
– Carries-a-dependency is defined in §1.10.9 of the C++11 standard
– In essence, one evaluation carries-a-dependency to another if the value of the first is used as an operand of the second
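To make the definition concrete, here is a sketch of a source-level dependency chain. Node and follow_chain are hypothetical names; std::kill_dependency is the standard <atomic> function that explicitly ends a chain.

```cpp
#include <atomic>

// Hypothetical node type, used only for this illustration.
struct Node {
    int value;
    Node* next;
};

// Hypothetical reader: the comments mark which evaluations carry a
// dependency from the consume load.
int follow_chain(const std::atomic<Node*>& head) {
    Node* n = head.load(std::memory_order_consume);  // start of the dependency chain
    if (n == nullptr)
        return -1;
    int a = n->value;                       // n is an operand: carries a dependency
    Node* m = n->next;                      // likewise carries a dependency
    int b = (m != nullptr) ? m->value : 0;  // m->value uses m, so it stays in the chain
    int c = std::kill_dependency(a) + 1;    // c does not carry the dependency
    return b + c;
}
```

Loads through n and m are ordered after the consume load by the chain; c is computed from a value that std::kill_dependency has explicitly taken out of the chain.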
Example of Consume and Release
• Declare two shared variables
• The main thread sits in a loop, repeatedly attempting the following sequence of read operations
• An asynchronous task running in another thread tries to perform a write operation
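// Acquire version (from the previous example), for comparison
atomic<int> Guard(0);
int Payload = 0;

// Main thread
for (…)
{
    …
    g = Guard.load(memory_order_acquire);
    if (g != 0)
        p = Payload;
    …
}

// Asynchronous task
Payload = 42;
Guard.store(1, memory_order_release);

// Consume version: Guard now holds a pointer to Payload
atomic<int*> Guard(nullptr);
int Payload = 0;

// Asynchronous task
Payload = 42;
Guard.store(&Payload, memory_order_release);

// Main thread
for (…)
{
    …
    g = Guard.load(memory_order_consume);
    if (g != nullptr)
        p = *g;    // *g carries a dependency from the consume load
    …
}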
Example of Consume and Release
• This time, there is no synchronizes-with relationship anywhere. Instead, we have a dependency-ordered-before relationship
• In any dependency-ordered-before relationship, there’s a dependency chain starting at the consume operation, and all memory operations performed before the write-release are guaranteed to be visible to that chain
– Here the chain runs from g to *g, so p is guaranteed to equal 42; memory operations outside the chain get no such guarantee
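A runnable sketch of the consume version, under the same assumptions as the acquire harness (std::thread, hypothetical harness name run_consume_release). Mainstream compilers treat consume as acquire, so this code is correct but does not by itself produce barrier-free machine code.

```cpp
#include <atomic>
#include <thread>

// Hypothetical harness: publishes &Payload through Guard and returns the
// value read through the consumed pointer.
int run_consume_release() {
    std::atomic<int*> Guard(nullptr);
    int Payload = 0;

    // Asynchronous task: plain write, then publish the pointer with release.
    std::thread task([&] {
        Payload = 42;
        Guard.store(&Payload, std::memory_order_release);
    });

    // Main thread: poll with a consume load until the pointer is published.
    int p = 0;
    for (;;) {
        int* g = Guard.load(std::memory_order_consume);  // head of the dependency chain
        if (g != nullptr) {
            p = *g;  // *g carries a dependency from the load, so it reads 42
            break;
        }
    }
    task.join();
    return p;
}
```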
Example of Consume and Release
[Figure: PowerPC and ARMv7 machine code for the consume example]
Today’s Compiler Support
Current Compiler Status
• The PowerPC and ARMv7 assembly listings just shown were fabricated
– Sorry, but GCC 4.8.3 and Clang 4.6 don’t actually generate that machine code for consume operations
• Current versions of GCC and Clang/LLVM use the heavy strategy all the time: they simply promote every consume operation to acquire
– As a result, if you compile memory_order_consume for PowerPC or ARMv7 using today’s compilers, you’ll end up with unnecessary memory barrier instructions
Efficient Compiler Strategy in GCC
• GCC 4.9.2 does use an efficient strategy in its implementation of memory_order_consume, as described in GCC Bug 59448
– Only on the AArch64 target
Example That Illustrates the Compiler Bug
• In this example, we are admittedly abusing C++11’s definition of carries-a-dependency by using f in an expression that cancels it out (f - f). Nonetheless, we are still technically playing by the standard’s current rules, and thus its ordering guarantees should still apply
#include <atomic>

std::atomic<int> Guard(0);
int Payload[1] = { 0xbadf00d };

int read()
{
    int f = Guard.load(std::memory_order_consume);   // load-consume
    if (f != 0)
        return Payload[f - f];                       // plain load from Payload[f - f]
    return 0;
}

void write()
{
    Payload[0] = 42;                                 // plain store to Payload[0]
    Guard.store(1, std::memory_order_release);       // store-release
}

$ aarch64-linux-g++ -std=c++11 -O2 -S consumetest.cpp
• GCC folds f - f to the constant 0, so the load from Payload no longer depends on the value loaded from Guard, yet it still emits the consume load as a plain ldr with no barrier. The CPU may then perform the two loads out of order, and read() can return 0xbadf00d instead of 42
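A test harness for this example might look like the sketch below, in the spirit of (but not copied from) the referenced demo program. read_payload, write_payload and run_bug_harness are hypothetical names; the first two have the same bodies as read() and write() above. A single run is very unlikely to expose the reordering; a real test would repeat it many times on an AArch64 machine, and on x86 the bug cannot be observed at all.

```cpp
#include <atomic>
#include <thread>

std::atomic<int> Guard(0);
int Payload[1] = { 0xbadf00d };

// Same body as read() above, under a hypothetical name.
int read_payload() {
    int f = Guard.load(std::memory_order_consume);  // load-consume
    if (f != 0)
        return Payload[f - f];                      // plain load from Payload[0]
    return 0;
}

// Same body as write() above, under a hypothetical name.
void write_payload() {
    Payload[0] = 42;                                // plain store
    Guard.store(1, std::memory_order_release);      // store-release
}

// Hypothetical harness: poll until the guard is seen, then return what was
// read. 42 is the correct result; 0xbadf00d would expose the bug.
int run_bug_harness() {
    std::thread writer(write_payload);
    int r = 0;
    while ((r = read_payload()) == 0) {
        // keep polling until Guard != 0
    }
    writer.join();
    return r;
}
```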
A Patch for This Bug
• Andrew MacLeod posted a patch for this issue in the bug report. His patch adds the following lines near the end of the get_memmodel function in gcc/builtins.c:
/* Workaround for Bugzilla 59448. GCC doesn't track consume properly, so
   be conservative and promote consume to acquire. */
if (val == MEMMODEL_CONSUME)
  val = MEMMODEL_ACQUIRE;
• After patching, recompile:
– $ aarch64-linux-g++ -std=c++11 -O2 -S consumetest.cpp
– The consume load is now emitted as ldar (load-acquire), since consume is treated as acquire
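At the source level, the patch’s effect corresponds to the reader requesting acquire itself. The sketch below does exactly that; read_acquire and write_release are hypothetical names. Because acquire orders every later memory access, the f - f trick no longer matters. The test only checks the logic in a single thread.

```cpp
#include <atomic>

std::atomic<int> Guard(0);
int Payload[1] = { 0xbadf00d };

// Same reader as read() above, but with acquire instead of consume.
// With GCC 4.9.2 on AArch64, an acquire load is emitted as ldar.
int read_acquire() {
    int f = Guard.load(std::memory_order_acquire);
    if (f != 0)
        return Payload[f - f];  // ordered by acquire; no dependency needed
    return 0;
}

// Same writer as write() above.
void write_release() {
    Payload[0] = 42;
    Guard.store(1, std::memory_order_release);
}
```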
This Bug Doesn’t Happen on PowerPC
• Interestingly, if you compile the same example for PowerPC, there is no bug. This uses the same GCC 4.9.2, without Andrew’s patch applied
– $ powerpc-linux-g++ -std=c++11 -O2 -S consumetest.cpp
• The two back ends explain the difference. The AArch64 back end emits consume like relaxed (a plain ldr), while the PowerPC back end emits a loadsync sequence after a consume load, just as for acquire and seq_cst. In other words, on PowerPC GCC uses the heavy strategy
gcc-4.9.2/gcc/config/aarch64/atomics.md:
if (model == MEMMODEL_RELAXED
    || model == MEMMODEL_CONSUME
    || model == MEMMODEL_RELEASE)
  return "ldr<atomic_sfx>\t%<w>0, %1";
else
  return "ldar<atomic_sfx>\t%<w>0, %1";
gcc-4.9.2/gcc/config/rs6000/sync.md (excerpt):
switch (model)
  {
  case MEMMODEL_RELAXED:
    break;
  case MEMMODEL_CONSUME:
  case MEMMODEL_ACQUIRE:
  case MEMMODEL_SEQ_CST:
    emit_insn (gen_loadsync_<mode> (operands[0]));
    break;
The Uncertain Future of memory_order_consume
• The C++ standard committee is still deciding what to do with memory_order_consume in future revisions of C++
• The author’s opinion is that the definition of carries-a-dependency should be narrowed to require that different return values from a load-consume result in different behavior for any dependent statements that are executed
– Using f - f as a dependency is nonsense, and narrowing the definition would free the compiler from having to support such nonsense “dependencies” if it chooses to implement the efficient strategy
– This idea was first proposed by Torvald Riegel on the Linux Kernel Mailing List and is captured among the various alternatives described in Paul McKenney’s proposal N4036
int my_array[MY_ARRAY_SIZE];
i = atomic_load_explicit(gi, memory_order_consume);
r1 = my_array[i];
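The proposed narrowing can be illustrated with two hypothetical helpers, meaningful_dependency and nonsense_dependency. Both carry a dependency under the C++11 wording. Under the narrowed definition, only the first would still count, assuming the table has more than one element, because only there do different loaded values lead to different behavior.

```cpp
#include <atomic>

// Different values of i select different elements (for a table with more
// than one element), so the loaded value genuinely affects what is read.
int meaningful_dependency(const int* table, const std::atomic<int>& gi) {
    int i = gi.load(std::memory_order_consume);
    return table[i];
}

// Always reads table[0], whatever f is: a dependency in name only.
int nonsense_dependency(const int* table, const std::atomic<int>& gi) {
    int f = gi.load(std::memory_order_consume);
    return table[f - f];
}
```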
References
• The Purpose of memory_order_consume in C++11
– http://preshing.com/20140709/the-purpose-of-memory_order_consume-in-cpp11/
• Fixing GCC's Implementation of memory_order_consume
– http://preshing.com/20141124/fixing-gccs-implementation-of-memory_order_consume/
• std::memory_order (cppreference.com)
– http://en.cppreference.com/w/cpp/atomic/memory_order
• Bug 59448 - Code generation doesn't respect C11 address-dependency
– https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448
• N4036: Towards Implementation and Use of memory_order_consume
– https://isocpp.org/files/papers/n4036.pdf
• Demo program
– https://github.com/preshing/ConsumeDemo
