Introduction to memory order consume
2015
issue.hsu@gmail.com
Outline
• Quick recap of acquire and release semantics
• The purpose of consume semantics
• Today’s compiler support
Quick Recap of Acquire and Release Semantics
Memory Order
• In the C++11 standard atomic library, most functions accept a
memory_order argument
• Both consume and acquire serve the same purpose
– To help pass non-atomic information safely between threads
• Like acquire operations, a consume operation must be combined with
a release operation in another thread
enum memory_order {
memory_order_relaxed,
memory_order_consume,
memory_order_acquire,
memory_order_release,
memory_order_acq_rel,
memory_order_seq_cst
};
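As a minimal sketch of how the memory_order argument is passed in practice, the function below calls a few std::atomic member functions with and without an explicit ordering. The function name demo_memory_order_arguments is hypothetical and only serves the illustration; omitting the argument selects memory_order_seq_cst.

```cpp
#include <atomic>

// Hypothetical demo function (not from the slides): shows where the
// memory_order argument goes on common std::atomic operations.
int demo_memory_order_arguments()
{
    std::atomic<int> x(0);

    x.store(1);                                         // no argument: defaults to memory_order_seq_cst
    x.store(2, std::memory_order_release);              // explicit release store
    int a = x.load(std::memory_order_acquire);          // explicit acquire load, reads 2
    int b = x.fetch_add(1, std::memory_order_acq_rel);  // read-modify-write, returns the old value 2
    int c = x.load(std::memory_order_relaxed);          // relaxed load, reads 3

    return a + b + c;
}
```

Called from a single thread, this only demonstrates the syntax; the ordering arguments start to matter once another thread touches the same atomic.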
Example of Acquire and Release
• Declare two shared variables
• The main thread sits in a loop, repeatedly attempting the following
sequence of read operations
• An asynchronous task running in another thread attempts a write
operation
atomic<int> Guard(0);
int Payload = 0;

// Main thread: read side
for(…)
{
    …
    g = Guard.load(memory_order_acquire);
    if (g != 0)
        p = Payload;
    …
}

// Asynchronous task: write side
Payload = 42;
Guard.store(1, memory_order_release);
Example of Acquire and Release
• Once the asynchronous task writes to Guard and the main thread reads the written value
– It means that the write-release synchronizes-with the read-acquire
– We are guaranteed that p will equal 42, no matter what platform we run this
example on
• We’ve used acquire and release semantics to pass a simple non-
atomic integer Payload between threads
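The example above can be made self-contained roughly as follows. This is a sketch, assuming one writer thread and a reader that spins until it sees the flag; the function name run_acquire_release_demo and the use of std::thread are choices made for this illustration, not part of the original slides.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: runs one writer and one reader, returns what the reader saw.
int run_acquire_release_demo()
{
    std::atomic<int> guard(0);
    int payload = 0;  // plain, non-atomic data shared between the threads

    // Asynchronous task: write side
    std::thread writer([&] {
        payload = 42;                               // plain store
        guard.store(1, std::memory_order_release);  // write-release
    });

    // Main thread: read side. Spin until the read-acquire observes the store.
    while (guard.load(std::memory_order_acquire) == 0) {
        std::this_thread::yield();
    }
    int p = payload;  // ordered after the acquire load, so it reads 42

    writer.join();
    return p;
}
```

Because the read-acquire that sees 1 synchronizes-with the write-release, the plain read of payload cannot observe the old value 0.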
The Cost of Acquire Semantics
g = Guard.load(memory_order_acquire);
if (g != 0)
p = Payload;
(Figure: machine code generated for this load-acquire on a strong memory model versus a weakly-ordered CPU)
The Purpose of Consume Semantics
Data Dependency
• The PowerPC and ARM are weakly-ordered CPUs, but in fact, there
are some cases where they do enforce memory ordering at the
machine instruction level without the need for explicit memory barrier
instructions
– These processors always preserve memory ordering between data-dependent
instructions
• When multiple instructions are data-dependent on each other, we call
it a data dependency chain
– In the following PowerPC listing, there are two independent data dependency
chains
Data Dependency
• Consume semantics are designed to exploit the data dependency
ordering
• At the source code level, a dependency chain is a sequence of
expressions whose evaluations all carry-a-dependency to one
another
– Carries-a-dependency is defined in §1.10, paragraph 9 of the C++11 standard
– It mainly says that one evaluation carries-a-dependency to another if the value of
the first is used as an operand of the second
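To make the source-level notion concrete, here is a small sketch of a dependency chain that starts at a load-consume. The Node type and the names g_head, publish and read_through_chain are hypothetical and exist only for this illustration.

```cpp
#include <atomic>

// Hypothetical type for illustration only.
struct Node {
    int  value;
    int* extra;
};

std::atomic<Node*> g_head(nullptr);

// Write side: publish a fully initialized node with a write-release.
void publish(Node* n)
{
    g_head.store(n, std::memory_order_release);
}

// Read side: every evaluation below uses the result of the previous one
// as an operand, so together they form a single dependency chain.
int read_through_chain()
{
    Node* n = g_head.load(std::memory_order_consume);  // head of the chain
    if (n == nullptr)
        return -1;          // nothing published yet
    int  v = n->value;      // n is an operand: carries a dependency
    int* e = n->extra;      // n is an operand: carries a dependency
    int  w = *e;            // e is an operand: the chain continues
    return v + w;
}
```

The null check is only a guard for the illustration; the chain itself consists of the loads whose addresses are computed from n and e.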
Example of Consume and Release
• Declare two shared variables
• The main thread sits in a loop, repeatedly attempting the following
sequence of read operations
• An asynchronous task running in another thread attempts a write
operation
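// Previous example: acquire and release
atomic<int> Guard(0);
int Payload = 0;

// Main thread: read side
for(…)
{
    …
    g = Guard.load(memory_order_acquire);
    if (g != 0)
        p = Payload;
    …
}

// Asynchronous task: write side
Payload = 42;
Guard.store(1, memory_order_release);

// New example: consume and release
atomic<int*> Guard(nullptr);
int Payload = 0;

// Asynchronous task: write side
Payload = 42;
Guard.store(&Payload, memory_order_release);

// Main thread: read side
for(…)
{
    …
    g = Guard.load(memory_order_consume);
    if (g != nullptr)
        p = *g;
    …
}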
Example of Consume and Release
• This time, we don’t have a synchronizes-with relationship anywhere.
What we have this time is called a dependency-ordered-before
relationship
• In any dependency-ordered-before relationship, there’s a
dependency chain starting at the consume operation, and all memory
operations performed before the write-release are guaranteed to be
visible to that chain.
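A self-contained version of the consume and release example might look like the sketch below. The function name run_consume_release_demo, the spin loop and the use of std::thread are assumptions made for this illustration.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: the writer publishes a pointer with a write-release,
// the reader picks it up with a read-consume and dereferences it.
int run_consume_release_demo()
{
    std::atomic<int*> guard(nullptr);
    int payload = 0;  // plain, non-atomic data

    // Asynchronous task: write side
    std::thread writer([&] {
        payload = 42;                                      // plain store
        guard.store(&payload, std::memory_order_release);  // write-release
    });

    // Main thread: read side. Spin until a non-null pointer is consumed.
    int* g = nullptr;
    while ((g = guard.load(std::memory_order_consume)) == nullptr) {
        std::this_thread::yield();
    }
    int p = *g;  // *g depends on g, so the store of 42 is visible here

    writer.join();
    return p;
}
```

The guarantee covers only reads that are part of the dependency chain starting at g; a read of some unrelated variable after the consume would not be ordered by it.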
Example of Consume and Release
(Figure omitted: assembly code listings for PowerPC and ARMv7)
Today’s Compiler Support
Current Compiler Status
• The assembly code listings just shown for PowerPC and
ARMv7 were fabricated
– Sorry, but GCC 4.8.3 and Clang 4.6 don’t actually generate that machine code for
consume operations
• Current versions of GCC and Clang/LLVM always use the heavy strategy:
they treat consume as if it were acquire
– As a result, if you compile memory_order_consume for PowerPC or ARMv7 using
today’s compilers, you’ll end up with unnecessary memory barrier instructions
Efficient Compiler Strategy in GCC
• GCC 4.9.2 actually has an efficient compiler strategy in its
implementation of memory_order_consume, as described in
GCC bug report 59448
– Only available in the AArch64 target of GCC 4.9.2
Example That Illustrates the Compiler Bug
• In this example, we are admittedly abusing C++11’s definition of carries-a-dependency
by using f in an expression that cancels it out (f - f). Nonetheless, we are still
technically playing by the standard’s current rules, and thus, its ordering guarantees
should still apply
#include <atomic>
std::atomic<int> Guard(0);
int Payload[1] = { 0xbadf00d };

void write()
{
    Payload[0] = 42; // plain store to Payload[0]
    Guard.store(1, std::memory_order_release); // store-release
}

int read()
{
    int f = Guard.load(std::memory_order_consume); // load-consume
    if (f != 0)
        return Payload[f - f]; // plain load from Payload[f - f]
    return 0;
}
$ aarch64-linux-g++ -std=c++11 -O2 -S consumetest.cpp
A Patch for This Bug
• Andrew Macleod posted a patch for this issue in the bug report. His
patch adds the following lines near the end of the get_memmodel
function in gcc/builtins.c
/* Workaround for Bugzilla 59448. GCC doesn't track consume properly, so
   be conservative and promote consume to acquire. */
if (val == MEMMODEL_CONSUME)
  val = MEMMODEL_ACQUIRE;
• After patching
– $ aarch64-linux-g++ -std=c++11 -O2 -S consumetest.cpp
This Bug Doesn’t Happen on PowerPC
• Interestingly, if you compile the same example for PowerPC, there is
no bug. This is using the same GCC version 4.9.2 without Andrew’s
patch applied
– $ powerpc-linux-g++ -std=c++11 -O2 -S consumetest.cpp
gcc-4.9.2/gcc/config/aarch64/atomics.md
if (model == MEMMODEL_RELAXED
    || model == MEMMODEL_CONSUME
    || model == MEMMODEL_RELEASE)
  return "ldr<atomic_sfx>\t%<w>0, %1";
else
  return "ldar<atomic_sfx>\t%<w>0, %1";
gcc-4.9.2/gcc/config/rs6000/sync.md
switch (model)
  {
  case MEMMODEL_RELAXED:
    break;
  case MEMMODEL_CONSUME:
  case MEMMODEL_ACQUIRE:
  case MEMMODEL_SEQ_CST:
    emit_insn (gen_loadsync_<mode> (operands[0]));
    break;
The Uncertain Future of memory order consume
• The C++ standards committee is considering what to do with
memory_order_consume in future revisions of C++
• The author’s opinion is that the definition of carries-a-dependency
should be narrowed to require that different return values from a load-
consume result in different behavior for any dependent statements
that are executed
– Using f - f as a dependency is nonsense, and narrowing the definition would free
the compiler from having to support such nonsense “dependencies” if it chooses
to implement the efficient strategy
– This idea was first proposed by Torvald Riegel in the Linux Kernel Mailing List and
is captured among various alternatives described in Paul McKenney’s proposal
N4036
int my_array[MY_ARRAY_SIZE];
i = atomic_load_explicit(gi, memory_order_consume);
r1 = my_array[i];
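The distinction behind the narrowing argument can be sketched with two readers. In the first, different values of the consumed index select different elements; in the second, the index cancels out and the same element is read whatever was loaded. The names g_index, g_table, read_real_dependency and read_cancelled_dependency are hypothetical and chosen only for this illustration.

```cpp
#include <atomic>

std::atomic<int> g_index(0);
int g_table[4] = {0, 0, 0, 0};

// The loaded value changes which element is read: a meaningful dependency.
int read_real_dependency()
{
    int i = g_index.load(std::memory_order_consume);
    return g_table[i];
}

// The loaded value cancels out: this always reads g_table[0], yet under the
// current C++11 wording i - i still carries a dependency from the load.
int read_cancelled_dependency()
{
    int i = g_index.load(std::memory_order_consume);
    return g_table[i - i];
}
```

Under the narrowed definition described above, only the first function would contain a dependency that a compiler using the efficient strategy has to preserve.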
References
• The Purpose of memory_order_consume in C++11
– http://preshing.com/20140709/the-purpose-of-memory_order_consume-in-cpp11/
• Fixing GCC's Implementation of memory_order_consume
– http://preshing.com/20141124/fixing-gccs-implementation-of-memory_order_consume/
• http://en.cppreference.com/w/cpp/atomic/memory_order
• Bug 59448 - Code generation doesn't respect C11 address-dependency
– https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448
• N4036: Towards Implementation and Use of memory order consume
– https://isocpp.org/files/papers/n4036.pdf
• Demo program
– https://github.com/preshing/ConsumeDemo
22

More Related Content

What's hot

Introduction to armv8 aarch64
Introduction to armv8 aarch64Introduction to armv8 aarch64
Introduction to armv8 aarch64
Yi-Hsiu Hsu
 
ARM architcture
ARM architcture ARM architcture
ARM architcture
Hossam Adel
 
RISC-V & SoC Architectural Exploration for AI and ML Accelerators
RISC-V & SoC Architectural Exploration for AI and ML AcceleratorsRISC-V & SoC Architectural Exploration for AI and ML Accelerators
RISC-V & SoC Architectural Exploration for AI and ML Accelerators
RISC-V International
 
ARM Architecture for Kernel Development
ARM Architecture for Kernel DevelopmentARM Architecture for Kernel Development
ARM Architecture for Kernel Development
GlobalLogic Ukraine
 
Moving NEON to 64 bits
Moving NEON to 64 bitsMoving NEON to 64 bits
Moving NEON to 64 bits
Chiou-Nan Chen
 
LLVM Backend Porting
LLVM Backend PortingLLVM Backend Porting
LLVM Backend Porting
Shiva Chen
 
VLIW(Very Long Instruction Word)
VLIW(Very Long Instruction Word)VLIW(Very Long Instruction Word)
VLIW(Very Long Instruction Word)
Pragnya Dash
 
TensorRT survey
TensorRT surveyTensorRT survey
TensorRT survey
Yi-Hsiu Hsu
 
LCU14-410: How to build an Energy Model for your SoC
LCU14-410: How to build an Energy Model for your SoCLCU14-410: How to build an Energy Model for your SoC
LCU14-410: How to build an Energy Model for your SoC
Linaro
 
Introduction to multi core
Introduction to multi coreIntroduction to multi core
Introduction to multi coremukul bhardwaj
 
Experience on porting HIGHMEM and KASAN to RISC-V at COSCUP 2020
Experience on porting HIGHMEM and KASAN to RISC-V at COSCUP 2020Experience on porting HIGHMEM and KASAN to RISC-V at COSCUP 2020
Experience on porting HIGHMEM and KASAN to RISC-V at COSCUP 2020
Eric Lin
 
Vulkan introduction
Vulkan introductionVulkan introduction
Vulkan introduction
Jiahan Su
 
Parallel Processors (SIMD)
Parallel Processors (SIMD) Parallel Processors (SIMD)
Parallel Processors (SIMD)
Ali Raza
 
Processamento paralelo
Processamento paraleloProcessamento paralelo
Processamento paralelo
Gabriel Nepomuceno
 
Versatile tensor accelerator (vta) introduction and usage
Versatile tensor accelerator (vta) introduction and usage Versatile tensor accelerator (vta) introduction and usage
Versatile tensor accelerator (vta) introduction and usage
jemin lee
 
Linux MMAP & Ioremap introduction
Linux MMAP & Ioremap introductionLinux MMAP & Ioremap introduction
Linux MMAP & Ioremap introduction
Gene Chang
 
Timer
TimerTimer
ARM Architecture in Details
ARM Architecture in Details ARM Architecture in Details
ARM Architecture in Details
GlobalLogic Ukraine
 
Attacking Windows NDIS Drivers
Attacking Windows NDIS DriversAttacking Windows NDIS Drivers
Attacking Windows NDIS Drivers
Kique Nissim
 

What's hot (20)

Introduction to armv8 aarch64
Introduction to armv8 aarch64Introduction to armv8 aarch64
Introduction to armv8 aarch64
 
ARM architcture
ARM architcture ARM architcture
ARM architcture
 
RISC-V & SoC Architectural Exploration for AI and ML Accelerators
RISC-V & SoC Architectural Exploration for AI and ML AcceleratorsRISC-V & SoC Architectural Exploration for AI and ML Accelerators
RISC-V & SoC Architectural Exploration for AI and ML Accelerators
 
ARM Architecture for Kernel Development
ARM Architecture for Kernel DevelopmentARM Architecture for Kernel Development
ARM Architecture for Kernel Development
 
Moving NEON to 64 bits
Moving NEON to 64 bitsMoving NEON to 64 bits
Moving NEON to 64 bits
 
LLVM Backend Porting
LLVM Backend PortingLLVM Backend Porting
LLVM Backend Porting
 
Oop principles
Oop principlesOop principles
Oop principles
 
VLIW(Very Long Instruction Word)
VLIW(Very Long Instruction Word)VLIW(Very Long Instruction Word)
VLIW(Very Long Instruction Word)
 
TensorRT survey
TensorRT surveyTensorRT survey
TensorRT survey
 
LCU14-410: How to build an Energy Model for your SoC
LCU14-410: How to build an Energy Model for your SoCLCU14-410: How to build an Energy Model for your SoC
LCU14-410: How to build an Energy Model for your SoC
 
Introduction to multi core
Introduction to multi coreIntroduction to multi core
Introduction to multi core
 
Experience on porting HIGHMEM and KASAN to RISC-V at COSCUP 2020
Experience on porting HIGHMEM and KASAN to RISC-V at COSCUP 2020Experience on porting HIGHMEM and KASAN to RISC-V at COSCUP 2020
Experience on porting HIGHMEM and KASAN to RISC-V at COSCUP 2020
 
Vulkan introduction
Vulkan introductionVulkan introduction
Vulkan introduction
 
Parallel Processors (SIMD)
Parallel Processors (SIMD) Parallel Processors (SIMD)
Parallel Processors (SIMD)
 
Processamento paralelo
Processamento paraleloProcessamento paralelo
Processamento paralelo
 
Versatile tensor accelerator (vta) introduction and usage
Versatile tensor accelerator (vta) introduction and usage Versatile tensor accelerator (vta) introduction and usage
Versatile tensor accelerator (vta) introduction and usage
 
Linux MMAP & Ioremap introduction
Linux MMAP & Ioremap introductionLinux MMAP & Ioremap introduction
Linux MMAP & Ioremap introduction
 
Timer
TimerTimer
Timer
 
ARM Architecture in Details
ARM Architecture in Details ARM Architecture in Details
ARM Architecture in Details
 
Attacking Windows NDIS Drivers
Attacking Windows NDIS DriversAttacking Windows NDIS Drivers
Attacking Windows NDIS Drivers
 

Similar to Introduction to memory order consume

Parallelization of Coupled Cluster Code with OpenMP
Parallelization of Coupled Cluster Code with OpenMPParallelization of Coupled Cluster Code with OpenMP
Parallelization of Coupled Cluster Code with OpenMP
Anil Bohare
 
Memory model
Memory modelMemory model
Memory model
MingdongLiao
 
OpenPOWER Application Optimization
OpenPOWER Application Optimization OpenPOWER Application Optimization
OpenPOWER Application Optimization
Ganesan Narayanasamy
 
Q4.11: Sched_mc on dual / quad cores
Q4.11: Sched_mc on dual / quad coresQ4.11: Sched_mc on dual / quad cores
Q4.11: Sched_mc on dual / quad cores
Linaro
 
Understanding of linux kernel memory model
Understanding of linux kernel memory modelUnderstanding of linux kernel memory model
Understanding of linux kernel memory model
SeongJae Park
 
cache2k, Java Caching, Turbo Charged, FOSDEM 2015
cache2k, Java Caching, Turbo Charged, FOSDEM 2015cache2k, Java Caching, Turbo Charged, FOSDEM 2015
cache2k, Java Caching, Turbo Charged, FOSDEM 2015
cruftex
 
Open Dayligth usando SDN-NFV
Open Dayligth usando SDN-NFVOpen Dayligth usando SDN-NFV
Open Dayligth usando SDN-NFV
Open Networking Perú (Opennetsoft)
 
slides8 SharedMemory.ppt
slides8 SharedMemory.pptslides8 SharedMemory.ppt
slides8 SharedMemory.ppt
aminnezarat
 
Java memory model
Java memory modelJava memory model
Java memory model
Michał Warecki
 
Clug 2012 March web server optimisation
Clug 2012 March   web server optimisationClug 2012 March   web server optimisation
Clug 2012 March web server optimisation
grooverdan
 
spinlock.pdf
spinlock.pdfspinlock.pdf
spinlock.pdf
Adrian Huang
 
Q2.12: Existing Linux Mechanisms to Support big.LITTLE
Q2.12: Existing Linux Mechanisms to Support big.LITTLEQ2.12: Existing Linux Mechanisms to Support big.LITTLE
Q2.12: Existing Linux Mechanisms to Support big.LITTLE
Linaro
 
01 oracle architecture
01 oracle architecture01 oracle architecture
01 oracle architecture
Smitha Padmanabhan
 
LCU13: Power-efficient scheduling, and the latest news from the kernel summit
LCU13: Power-efficient scheduling, and the latest news from the kernel summitLCU13: Power-efficient scheduling, and the latest news from the kernel summit
LCU13: Power-efficient scheduling, and the latest news from the kernel summit
Linaro
 
ECECS 472572 Final Exam ProjectRemember to check the errat.docx
ECECS 472572 Final Exam ProjectRemember to check the errat.docxECECS 472572 Final Exam ProjectRemember to check the errat.docx
ECECS 472572 Final Exam ProjectRemember to check the errat.docx
tidwellveronique
 
ECECS 472572 Final Exam ProjectRemember to check the err.docx
ECECS 472572 Final Exam ProjectRemember to check the err.docxECECS 472572 Final Exam ProjectRemember to check the err.docx
ECECS 472572 Final Exam ProjectRemember to check the err.docx
tidwellveronique
 
Deep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instancesDeep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instances
Amazon Web Services
 
jvm/java - towards lock-free concurrency
jvm/java - towards lock-free concurrencyjvm/java - towards lock-free concurrency
jvm/java - towards lock-free concurrency
Arvind Kalyan
 
ECECS 472572 Final Exam ProjectRemember to check the errata
ECECS 472572 Final Exam ProjectRemember to check the errata ECECS 472572 Final Exam ProjectRemember to check the errata
ECECS 472572 Final Exam ProjectRemember to check the errata
EvonCanales257
 
Please do ECE572 requirementECECS 472572 Final Exam Project (W.docx
Please do ECE572 requirementECECS 472572 Final Exam Project (W.docxPlease do ECE572 requirementECECS 472572 Final Exam Project (W.docx
Please do ECE572 requirementECECS 472572 Final Exam Project (W.docx
ARIV4
 

Similar to Introduction to memory order consume (20)

Parallelization of Coupled Cluster Code with OpenMP
Parallelization of Coupled Cluster Code with OpenMPParallelization of Coupled Cluster Code with OpenMP
Parallelization of Coupled Cluster Code with OpenMP
 
Memory model
Memory modelMemory model
Memory model
 
OpenPOWER Application Optimization
OpenPOWER Application Optimization OpenPOWER Application Optimization
OpenPOWER Application Optimization
 
Q4.11: Sched_mc on dual / quad cores
Q4.11: Sched_mc on dual / quad coresQ4.11: Sched_mc on dual / quad cores
Q4.11: Sched_mc on dual / quad cores
 
Understanding of linux kernel memory model
Understanding of linux kernel memory modelUnderstanding of linux kernel memory model
Understanding of linux kernel memory model
 
cache2k, Java Caching, Turbo Charged, FOSDEM 2015
cache2k, Java Caching, Turbo Charged, FOSDEM 2015cache2k, Java Caching, Turbo Charged, FOSDEM 2015
cache2k, Java Caching, Turbo Charged, FOSDEM 2015
 
Open Dayligth usando SDN-NFV
Open Dayligth usando SDN-NFVOpen Dayligth usando SDN-NFV
Open Dayligth usando SDN-NFV
 
slides8 SharedMemory.ppt
slides8 SharedMemory.pptslides8 SharedMemory.ppt
slides8 SharedMemory.ppt
 
Java memory model
Java memory modelJava memory model
Java memory model
 
Clug 2012 March web server optimisation
Clug 2012 March   web server optimisationClug 2012 March   web server optimisation
Clug 2012 March web server optimisation
 
spinlock.pdf
spinlock.pdfspinlock.pdf
spinlock.pdf
 
Q2.12: Existing Linux Mechanisms to Support big.LITTLE
Q2.12: Existing Linux Mechanisms to Support big.LITTLEQ2.12: Existing Linux Mechanisms to Support big.LITTLE
Q2.12: Existing Linux Mechanisms to Support big.LITTLE
 
01 oracle architecture
01 oracle architecture01 oracle architecture
01 oracle architecture
 
LCU13: Power-efficient scheduling, and the latest news from the kernel summit
LCU13: Power-efficient scheduling, and the latest news from the kernel summitLCU13: Power-efficient scheduling, and the latest news from the kernel summit
LCU13: Power-efficient scheduling, and the latest news from the kernel summit
 
ECECS 472572 Final Exam ProjectRemember to check the errat.docx
ECECS 472572 Final Exam ProjectRemember to check the errat.docxECECS 472572 Final Exam ProjectRemember to check the errat.docx
ECECS 472572 Final Exam ProjectRemember to check the errat.docx
 
ECECS 472572 Final Exam ProjectRemember to check the err.docx
ECECS 472572 Final Exam ProjectRemember to check the err.docxECECS 472572 Final Exam ProjectRemember to check the err.docx
ECECS 472572 Final Exam ProjectRemember to check the err.docx
 
Deep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instancesDeep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instances
 
jvm/java - towards lock-free concurrency
jvm/java - towards lock-free concurrencyjvm/java - towards lock-free concurrency
jvm/java - towards lock-free concurrency
 
ECECS 472572 Final Exam ProjectRemember to check the errata
ECECS 472572 Final Exam ProjectRemember to check the errata ECECS 472572 Final Exam ProjectRemember to check the errata
ECECS 472572 Final Exam ProjectRemember to check the errata
 
Please do ECE572 requirementECECS 472572 Final Exam Project (W.docx
Please do ECE572 requirementECECS 472572 Final Exam Project (W.docxPlease do ECE572 requirementECECS 472572 Final Exam Project (W.docx
Please do ECE572 requirementECECS 472572 Final Exam Project (W.docx
 

Recently uploaded

GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
Alina Yurenko
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Crescat
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Mind IT Systems
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
Rakesh Kumar R
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
Donna Lenk
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
lorraineandreiamcidl
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
Shane Coughlan
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
Aftab Hussain
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
Fermin Galan
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
NYGGS Automation Suite
 
Transform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR SolutionsTransform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR Solutions
TheSMSPoint
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Neo4j
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
Boni García
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
Philip Schwarz
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
Ayan Halder
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
Google
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
timtebeek1
 

Recently uploaded (20)

GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
 
Transform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR SolutionsTransform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR Solutions
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
 

Introduction to memory order consume

  • 1. Introduction to memory order consume 2015 issue.hsu@gmail.com
  • 2. Outline • Quick recap of acquire and release semantics • The purpose of consume semantics • Today’s compiler support 2
  • 3. Quick Recap of Acquire and Release Semantics 3
  • 4. Memory Order • In the C++11 standard atomic library, most functions accept a memory_order argument • Both consume and acquire serve the same purpose – To help pass non-atomic information safely between threads • Like acquire operations, a consume operation must be combined with a release operation in another thread 4 enum memory_order { memory_order_relaxed, memory_order_consume, memory_order_acquire, memory_order_release, memory_order_acq_rel, memory_order_seq_cst };
  • 5. Example of Acquire and Release • Declare two shared variables • The main thread sits in a loop, repeatedly attempting the following sequence of read operations • Another asynchronous task running in another thread try to do a write operation 5 atomic<int> Guard(0); int Payload = 0; for(…) { …. g = Guard.load(memory_order_acquire); if (g != 0) p = Payload; … } Payload = 42; Guard.store(1, memory_order_release);
  • 6. Example of Acquire and Release • Once the asynchronous task writes to Guard, the main thread reads it – It means that the write-release synchronized-with the read-acquire – We are guaranteed that p will equal 42, no matter what platform we run this example on • We’ve used acquire and release semantics to pass a simple non- atomic integer Payload between threads 6
  • 7. The Cost of Acquire Semantics 7 g = Guard.load(memory_order_acquire); if (g != 0) p = Payload; strong memory model weakly-ordered CPU
  • 8. The Purpose of Consume Semantics 8
  • 9. Data Dependency • The PowerPC and ARM are weakly-ordered CPUs, but in fact, there are some cases where they do enforce memory ordering at the machine instruction level without the need for explicit memory barrier instructions – These processors always preserve memory ordering between data-dependent instructions • When multiple instructions are data-dependent on each other, we call it a data dependency chain – In the following PowerPC listing, there are two independent data dependency chains 9
  • 10. Data Dependency
    • Consume semantics are designed to exploit the data dependency ordering
    • At the source code level, a dependency chain is a sequence of expressions whose evaluations all carry-a-dependency to one another
      – Carries-a-dependency is defined in §1.10, paragraph 9 of the C++11 standard
      – It mainly says that one evaluation carries-a-dependency to another if the value of the first is used as an operand of the second
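To make the source-level notion concrete, here is a small sketch of a dependency chain. The Node type, the table array and the function follow_chain are hypothetical names invented for this illustration; the code runs on a single thread and only shows which evaluations belong to the chain.

```cpp
#include <atomic>

// Hypothetical data, used only to illustrate a dependency chain.
struct Node { int index; };
int table[4] = {10, 20, 30, 40};
Node node = {2};
std::atomic<Node*> head(&node);

int follow_chain() {
    // The load-consume is the start of the dependency chain.
    Node* n = head.load(std::memory_order_consume);

    // n is an operand of ->, so the load carries a dependency to this read.
    int i = n->index;

    // i is an operand of the subscript, so the chain continues here.
    int v = table[i];

    // This read uses no value derived from n: it is outside the chain.
    int unrelated = table[0];

    return v + unrelated;
}
```

Only the reads of n->index and table[i] are covered by consume ordering; the read of table[0] gets no ordering guarantee from the load-consume.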
  • 11. Example of Consume and Release
    • Declare two shared variables
    • The main thread sits in a loop, repeatedly attempting the following sequence of read operations
    • An asynchronous task running in another thread tries to perform a write operation

      // Acquire and release version, for comparison
      atomic<int> Guard(0);
      int Payload = 0;

      for(…)
      {
          …
          g = Guard.load(memory_order_acquire);
          if (g != 0)
              p = Payload;
          …
      }

      Payload = 42;
      Guard.store(1, memory_order_release);

      // Consume and release version
      atomic<int*> Guard(nullptr);
      int Payload = 0;

      Payload = 42;
      Guard.store(&Payload, memory_order_release);

      for(…)
      {
          …
          g = Guard.load(memory_order_consume);
          if (g != nullptr)
              p = *g;
          …
      }
  • 12. Example of Consume and Release
    • This time, we don’t have a synchronizes-with relationship anywhere. What we have instead is called a dependency-ordered-before relationship
    • In any dependency-ordered-before relationship, there’s a dependency chain starting at the consume operation, and all memory operations performed before the write-release are guaranteed to be visible to that chain
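The consume and release example can be completed into a runnable sketch as follows. As before, std::thread, the spin loop and the function names writer, reader and run_consume_release are assumptions added for illustration and do not appear in the slides.

```cpp
#include <atomic>
#include <thread>

std::atomic<int*> Guard(nullptr);
int Payload = 0;

// Asynchronous task: plain store to Payload, then publish its address.
void writer() {
    Payload = 42;
    Guard.store(&Payload, std::memory_order_release);
}

// Main-thread logic: load-consume the pointer, then dereference it.
int reader() {
    int* g;
    while ((g = Guard.load(std::memory_order_consume)) == nullptr) {
        std::this_thread::yield();
    }
    // *g uses g as an operand, so this read is on the dependency chain
    // and is guaranteed to see the store made before the write-release.
    return *g;
}

// Hypothetical driver that runs both sides and returns what the reader saw.
int run_consume_release() {
    Guard.store(nullptr, std::memory_order_relaxed);
    Payload = 0;
    int p = 0;
    std::thread consumer([&p] { p = reader(); });
    std::thread producer(writer);
    producer.join();
    consumer.join();
    return p;
}
```

The key difference from the acquire version is that the payload is reached through the loaded value itself. Reading the global Payload by name instead of through g would fall outside the dependency chain.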
  • 13. Example of Consume and Release
  • 15. Current Compiler Status
    • The assembly listings just shown for PowerPC and ARMv7 were fabricated
      – Sorry, but GCC 4.8.3 and Clang 4.6 don’t actually generate that machine code for consume operations
    • Current versions of GCC and Clang/LLVM use the heavy strategy all the time, treating every consume operation as an acquire
      – As a result, if you compile memory_order_consume for PowerPC or ARMv7 using today’s compilers, you’ll end up with unnecessary memory barrier instructions
  • 16. Efficient Compiler Strategy in GCC
    • GCC 4.9.2 actually implements an efficient strategy for memory_order_consume, as described in GCC bug report 59448
      – Only available for the AArch64 target in GCC 4.9.2
  • 17. Example That Illustrates the Compiler Bug
    • In this example, we are admittedly abusing C++11’s definition of carries-a-dependency by using f in an expression that cancels it out (f - f). Nonetheless, we are still technically playing by the standard’s current rules, and thus its ordering guarantees should still apply

      #include <atomic>

      std::atomic<int> Guard(0);
      int Payload[1] = { 0xbadf00d };

      int read()
      {
          int f = Guard.load(std::memory_order_consume);    // load-consume
          if (f != 0)
              return Payload[f - f];                        // plain load from Payload[f - f]
          return 0;
      }

      void write()
      {
          Payload[0] = 42;                                  // plain store to Payload[0]
          Guard.store(1, std::memory_order_release);        // store-release
      }

      $ aarch64-linux-g++ -std=c++11 -O2 -S consumetest.cpp
  • 18. A Patch for This Bug
    • Andrew MacLeod posted a patch for this issue in the bug report. His patch adds the following lines near the end of the get_memmodel function in gcc/builtins.c

      /* Workaround for Bugzilla 59448. GCC doesn't track consume properly, so
         be conservative and promote consume to acquire. */
      if (val == MEMMODEL_CONSUME)
        val = MEMMODEL_ACQUIRE;

    • After patching
      – $ aarch64-linux-g++ -std=c++11 -O2 -S consumetest.cpp
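The same conservative idea can be mimicked at the source level, without touching the compiler. The sketch below is not GCC's code: promote_consume and load_conservative are hypothetical helpers written for this illustration, and they simply map consume to acquire before calling std::atomic::load.

```cpp
#include <atomic>

// Hypothetical helper: map consume to acquire, leave other orderings alone.
constexpr std::memory_order promote_consume(std::memory_order order) {
    return order == std::memory_order_consume ? std::memory_order_acquire
                                              : order;
}

// Hypothetical wrapper: a load that never relies on dependency tracking.
template <typename T>
T load_conservative(const std::atomic<T>& a, std::memory_order order) {
    return a.load(promote_consume(order));
}
```

Acquire is at least as strong as consume, so this substitution is always safe. The price is the extra memory barrier on weakly-ordered CPUs that consume was meant to avoid.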
  • 19. This Bug Doesn’t Happen on PowerPC
    • Interestingly, if you compile the same example for PowerPC, there is no bug. This is using the same GCC version 4.9.2, without Andrew’s patch applied
      – $ powerpc-linux-g++ -std=c++11 -O2 -S consumetest.cpp
    • gcc-4.9.2/gcc/config/aarch64/atomics.md

      if (model == MEMMODEL_RELAXED
          || model == MEMMODEL_CONSUME
          || model == MEMMODEL_RELEASE)
        return "ldr<atomic_sfx>\t%<w>0, %1";
      else
        return "ldar<atomic_sfx>\t%<w>0, %1";

    • gcc-4.9.2/gcc/config/rs6000/sync.md

      switch (model)
        {
        case MEMMODEL_RELAXED:
          break;
        case MEMMODEL_CONSUME:
        case MEMMODEL_ACQUIRE:
        case MEMMODEL_SEQ_CST:
          emit_insn (gen_loadsync_<mode> (operands[0]));
          break;
  • 20. The Uncertain Future of memory_order_consume
    • The C++ standards committee is wondering what to do with memory_order_consume in future revisions of C++
    • The author’s opinion is that the definition of carries-a-dependency should be narrowed to require that different return values from a load-consume result in different behavior for any dependent statements that are executed
      – Using f - f as a dependency is nonsense, and narrowing the definition would free the compiler from having to support such nonsense “dependencies” if it chooses to implement the efficient strategy
      – This idea was first proposed by Torvald Riegel on the Linux Kernel Mailing List and is captured among various alternatives described in Paul McKenney’s proposal N4036

      int my_array[MY_ARRAY_SIZE];
      i = atomic_load_explicit(gi, memory_order_consume);
      r1 = my_array[i];
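A dependency that would survive the narrowed definition looks like the array-index fragment above: different loaded values select different elements. The following is a sketch that fills that fragment out into C++. The -1 "nothing published yet" convention, the use of std::thread, and the names publish, read_published and run_index_demo are assumptions made for this illustration.

```cpp
#include <atomic>
#include <thread>

constexpr int MY_ARRAY_SIZE = 8;
int my_array[MY_ARRAY_SIZE] = {};

// -1 means "nothing published yet" (a convention assumed for this sketch).
std::atomic<int> gi(-1);

// Writer: fill a slot, then publish its index with a store-release.
void publish(int slot, int value) {
    my_array[slot] = value;
    gi.store(slot, std::memory_order_release);
}

// Reader: load-consume the index, then use it as a real subscript.
int read_published() {
    int i;
    while ((i = gi.load(std::memory_order_consume)) < 0) {
        std::this_thread::yield();
    }
    // Different values of i read different elements, so this is a
    // genuine dependency, unlike my_array[i - i].
    return my_array[i];
}

// Hypothetical driver that runs both sides and returns what the reader saw.
int run_index_demo() {
    gi.store(-1, std::memory_order_relaxed);
    int r1 = 0;
    std::thread reader([&r1] { r1 = read_published(); });
    std::thread writer(publish, 3, 99);
    writer.join();
    reader.join();
    return r1;
}
```

Here the hardware cannot compute the address of my_array[i] before it knows i, which is exactly the ordering that the efficient strategy relies on.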
  • 22. References
    • The Purpose of memory_order_consume in C++11
      – http://preshing.com/20140709/the-purpose-of-memory_order_consume-in-cpp11/
    • Fixing GCC's Implementation of memory_order_consume
      – http://preshing.com/20141124/fixing-gccs-implementation-of-memory_order_consume/
    • http://en.cppreference.com/w/cpp/atomic/memory_order
    • Bug 59448 - Code generation doesn't respect C11 address-dependency
      – https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448
    • N4036: Towards Implementation and Use of memory_order_consume
      – https://isocpp.org/files/papers/n4036.pdf
    • Demo program
      – https://github.com/preshing/ConsumeDemo