Lock-Free Programming
chenxiaojie@qiyi.com
2016.08.18
Content
• Parallel
• Barrier
• Memory Order
• Volatile
• Atomic
• Lock-Free
• ABA Problem
• Reference
Parallel Computing
• Cache Coherence
• https://en.wikipedia.org/wiki/Cache_coherence
• False sharing
• Sequential Consistency
• https://en.wikipedia.org/wiki/Sequential_consistency
• Compiler, CPU, multicore
• Cache load, register
False Sharing
• Solution: Padding
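A minimal sketch of the padding idea, assuming a 64-byte cache line (typical on x86-64, but not guaranteed everywhere). The names kCacheLineSize, PackedCounters, PaddedCounter, PaddedCounters and padded_distance are illustrative only and do not come from the slides.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Assumed cache-line size; 64 bytes is common on x86-64 but not universal.
constexpr std::size_t kCacheLineSize = 64;

// Unpadded: both counters are likely to land on the same cache line, so
// two threads updating a and b would keep invalidating each other's line.
struct PackedCounters {
  std::atomic<long> a{0};
  std::atomic<long> b{0};
};

// Padded: alignas rounds the size and alignment up to a full cache line.
struct PaddedCounter {
  alignas(kCacheLineSize) std::atomic<long> value{0};
};

struct PaddedCounters {
  PaddedCounter a;
  PaddedCounter b;
};

// Distance in bytes between the two counters of a PaddedCounters object.
inline std::size_t padded_distance() {
  PaddedCounters c;
  return static_cast<std::size_t>(
      reinterpret_cast<char*>(&c.b) - reinterpret_cast<char*>(&c.a));
}
```

With this layout each thread can update its own counter without touching the line that holds the other one; the cost is the memory spent on padding.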
Processor Guaranteed Atomic
• Bus Lock
• https://software.intel.com/en-us/node/544402
• LOCK# signal
• Cache Lock
• Between CPU and Memory
• Cache Coherence
Memory Barrier
• Memory Barrier
• https://en.wikipedia.org/wiki/Memory_barrier
• Causes a CPU or compiler to enforce an ordering constraint on
memory operations issued before and after the barrier instruction
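• Compile-time Memory Ordering
• atomic_thread_fence(memory_order_acq_rel);
• Forbids the compiler from reordering reads and writes across it; on x86 it emits no instruction, on weaker architectures it also emits a CPU fence
• atomic_signal_fence is the compiler-only variant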
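A small sketch of how fences enforce ordering around a relaxed flag. The function name fence_message_passing and the variables payload, ready and observed are made up for this illustration; only std::atomic_thread_fence itself comes from the slides.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Publishes `payload` through a relaxed flag guarded by explicit fences and
// returns the value the reader thread observed.
inline int fence_message_passing() {
  int payload = 0;                  // plain, non-atomic data
  std::atomic<bool> ready{false};
  int observed = -1;

  std::thread writer([&] {
    payload = 42;
    // Release fence: the write above cannot move below the store that follows.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
  });
  std::thread reader([&] {
    while (!ready.load(std::memory_order_relaxed)) {
      std::this_thread::yield();
    }
    // Acquire fence: the read below cannot move above the load that precedes it.
    std::atomic_thread_fence(std::memory_order_acquire);
    observed = payload;
  });
  writer.join();
  reader.join();
  return observed;
}
```

Without the two fences the relaxed flag alone would not order the accesses to payload, and reading it would be a data race.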
Memory Ordering
• Memory Ordering
• https://en.wikipedia.org/wiki/Memory_ordering
• The runtime order of accesses to computer memory by a CPU
• Sequential Consistency
• All reads and all writes are in-order
• Relaxed consistency
• Some types of reordering are allowed
• Weak consistency
• Reads and writes are arbitrarily reordered, limited only by explicit
memory barriers
Volatile
• Volatile
• https://en.wikipedia.org/wiki/Volatile_(computer_programming)
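• Every access goes to memory: the compiler may not keep the value in a register or drop the access (the CPU cache is not bypassed)
• Prevents the compiler from reordering accesses to volatile variables relative to each other
• Not sufficient when
• The new value depends on other variables
• The new value depends on the old value
• Enhanced in Java
• Write: release semantics
• Read: acquire semantics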
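The "depends on the old value" case can be sketched in C++ with std::atomic: an increment is a read-modify-write, so it has to be one atomic step. The function count_with_atomic and its parameters are hypothetical names for this illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Increments a shared counter from several threads. The new value depends on
// the old one, so the read-modify-write has to be a single atomic step.
inline long count_with_atomic(int threads, int increments_per_thread) {
  std::atomic<long> counter{0};
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&counter, increments_per_thread] {
      for (int i = 0; i < increments_per_thread; ++i) {
        // With a plain `volatile long` and `++counter` this would be a
        // separate load, add and store, and updates could be lost.
        counter.fetch_add(1, std::memory_order_relaxed);
      }
    });
  }
  for (auto& w : workers) w.join();
  return counter.load();
}
```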
Volatile in Java
[Diagram: barrier placement around volatile accesses: Read / Write → StoreStore → Volatile Write → StoreLoad → Volatile Read → LoadLoad, LoadStore → Read / Write]
Memory Ordering in C++11
• memory_order_relaxed
• memory_order_acquire
• memory_order_release
• memory_order_consume
• memory_order_acq_rel
• memory_order_seq_cst
Relaxed Ordering
• Guarantees only atomicity
• and modification order consistency
• Example
• A is sequenced-before B, C is sequenced-before D
• Is r1 == r2 == 42 allowed? Yes: nothing prevents D from becoming visible before A
• Typical use: reference counters of std::shared_ptr
Relaxed Ordering
// thread 1
r1 = y.load(memory_order_relaxed); // A
x.store(r1, memory_order_relaxed); // B
// thread 2
r2 = x.load(memory_order_relaxed); // C
y.store(42, memory_order_relaxed); // D
// possible order
y.store(42, memory_order_relaxed);
r1 = y.load(memory_order_relaxed);
x.store(r1, memory_order_relaxed);
r2 = x.load(memory_order_relaxed);
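The reference-counter use case can be sketched as follows. RefCount and refcount_demo are hypothetical names; this is not std::shared_ptr's actual implementation, only the commonly described pattern of a relaxed increment and a stronger decrement.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical intrusive reference count (not the real shared_ptr code).
class RefCount {
 public:
  // Taking a reference needs atomicity only, so relaxed is enough.
  void acquire() { count_.fetch_add(1, std::memory_order_relaxed); }

  // Returns true for the caller that dropped the last reference. acq_rel
  // makes earlier uses of the object visible to the thread that destroys it.
  bool release() {
    return count_.fetch_sub(1, std::memory_order_acq_rel) == 1;
  }

  long value() const { return count_.load(std::memory_order_relaxed); }

 private:
  std::atomic<long> count_{1};  // the creator holds the first reference
};

// Each worker takes and drops `rounds` references; returns the final count.
inline long refcount_demo(int threads, int rounds) {
  RefCount rc;
  std::atomic<int> last_drops{0};
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&] {
      for (int i = 0; i < rounds; ++i) {
        rc.acquire();
        if (rc.release()) last_drops.fetch_add(1);
      }
    });
  }
  for (auto& w : workers) w.join();
  assert(last_drops.load() == 0);  // the creator's reference is still held
  return rc.value();
}
```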
Release-Acquire Ordering
• Synchronizes the threads that release and acquire the same
atomic variable
• All memory writes sequenced before the release store happen-before it
• The acquire load that reads the stored value happens-before all later memory reads in its thread
• Example
• A sequenced-before B sequenced-before C
• C synchronizes-with D
• D sequenced-before E sequenced-before F
Release-Acquire Ordering
atomic<string*> ptr { nullptr };
int data;
void producer() {
  string* p = new string("Hello"); // A
  data = 42; // B
  ptr.store(p, memory_order_release); // C
}
void consumer() {
  string* p2;
  while (!(p2 = ptr.load(memory_order_acquire))); // D
  assert(*p2 == "Hello"); // E: never fires
  assert(data == 42); // F: never fires
}
thread t1(producer);
thread t2(consumer);
t1.join();
t2.join();
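Release-Consume Ordering
• Data-dependency relationship
• Example
• A sequenced-before B sequenced-before C
• C dependency-ordered-before D
• D sequenced-before E sequenced-before F
• A happens-before E? Yes: E carries a data dependency on the loaded pointer
• B happens-before F? Not guaranteed: data does not depend on ptr
• Discouraged: compilers typically implement consume as acquire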
Release-Consume Ordering
atomic<string*> ptr { nullptr };
int data;
void producer() {
  string* p = new string("Hello"); // A
  data = 42; // B
  ptr.store(p, memory_order_release); // C
}
void consumer() {
  string* p2;
  while (!(p2 = ptr.load(memory_order_consume))); // D
  assert(*p2 == "Hello"); // E: never fires, *p2 carries a dependency from ptr
  assert(data == 42); // F: may or may not fire, data carries no dependency from ptr
}
thread t1(producer);
thread t2(consumer);
t1.join();
t2.join();
Sequentially-Consistent Ordering
• Orders memory the same way as release/acquire ordering
• Additionally establishes a single total order of all
seq_cst operations
• Example
• Is r1 == r2 == 0 possible? Not with seq_cst operations or seq_cst fences; yes with release/acquire alone
Sequentially-Consistent Ordering
atomic<int> x { 0 }, y { 0 };
// Variant 1: seq_cst operations
// thread 1
x.store(1, memory_order_seq_cst);
r1 = y.load(memory_order_seq_cst);
// thread 2
y.store(1, memory_order_seq_cst);
r2 = x.load(memory_order_seq_cst);
// Variant 2: relaxed operations separated by seq_cst fences
// thread 1
x.store(1, memory_order_relaxed);
atomic_thread_fence(memory_order_seq_cst);
r1 = y.load(memory_order_relaxed);
// thread 2
y.store(1, memory_order_relaxed);
atomic_thread_fence(memory_order_seq_cst);
r2 = x.load(memory_order_relaxed);
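Sequentially-Consistent Ordering
atomic<int> x { 0 }, y { 0 };
// Counter-example 1: release stores and acquire loads, r1 == r2 == 0 is possible
// (memory_order_acq_rel is not a valid order for a plain store or a plain load)
// thread 1
x.store(1, memory_order_release);
r1 = y.load(memory_order_acquire);
// thread 2
y.store(1, memory_order_release);
r2 = x.load(memory_order_acquire);
// Counter-example 2: acq_rel fences do not order a store before a later load,
// so r1 == r2 == 0 is still possible
// thread 1
x.store(1, memory_order_relaxed);
atomic_thread_fence(memory_order_acq_rel);
r1 = y.load(memory_order_relaxed);
// thread 2
y.store(1, memory_order_relaxed);
atomic_thread_fence(memory_order_acq_rel);
r2 = x.load(memory_order_relaxed);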
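The store-buffering test can be run as a small harness. This is a sketch only: count_both_zero_seq_cst is a hypothetical name, and a finite number of rounds can show that the forbidden outcome did not occur, not prove that it never can.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the store-buffering test `rounds` times with seq_cst operations and
// counts how often both threads read 0.
inline int count_both_zero_seq_cst(int rounds) {
  int both_zero = 0;
  for (int i = 0; i < rounds; ++i) {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;
    std::thread t1([&] {
      x.store(1, std::memory_order_seq_cst);
      r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread t2([&] {
      y.store(1, std::memory_order_seq_cst);
      r2 = x.load(std::memory_order_seq_cst);
    });
    t1.join();
    t2.join();
    if (r1 == 0 && r2 == 0) ++both_zero;
  }
  return both_zero;
}
```

Because all four operations take part in one total order, whichever store comes first in that order is visible to the other thread's load, so the count stays at zero.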
Atomic Operations
• atomic_store/load
• atomic_exchange
• atomic_compare_exchange_weak/strong
• atomic_fetch_add/sub/and/or/xor
• atomic_thread_fence
• atomic_signal_fence
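A short single-threaded walk through the free-function forms listed above, using the default seq_cst ordering. The function name atomic_ops_demo is made up for the example.

```cpp
#include <atomic>
#include <cassert>

// Applies a fixed sequence of atomic operations and returns the final value.
inline int atomic_ops_demo() {
  std::atomic<int> v{0};
  std::atomic_store(&v, 5);                 // v == 5
  int old = std::atomic_exchange(&v, 8);    // returns 5, v == 8
  assert(old == 5);
  old = std::atomic_fetch_add(&v, 2);       // returns 8, v == 10
  assert(old == 8);
  old = std::atomic_fetch_sub(&v, 3);       // returns 10, v == 7
  assert(old == 10);
  std::atomic_fetch_and(&v, 0x5);           // 7 & 5 == 5
  std::atomic_fetch_or(&v, 0x8);            // 5 | 8 == 13
  std::atomic_fetch_xor(&v, 0x1);           // 13 ^ 1 == 12
  return std::atomic_load(&v);
}
```

Every fetch_* function returns the value held before the operation, which is what makes them usable as building blocks for counters and flags.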
Atomic Compare and Exchange
• compare_exchange_weak
• Allowed to fail spuriously
• May act as if (actual value != expected) even when they are equal
• Usually called in a loop
• compare_exchange_strong
• Fails only when the actual value really differs from expected (no spurious failure)
• May cost more: on LL/SC architectures it retries internally after a spurious failure
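A sketch contrasting the two forms. The helper names atomic_double and try_claim, and the convention that 0 means "unowned", are assumptions made for the example.

```cpp
#include <atomic>
#include <cassert>

// Atomically doubles `value` and returns the value it replaced.
// compare_exchange_weak may fail spuriously, so it sits in a loop; on
// failure it reloads `expected` with the current value.
inline int atomic_double(std::atomic<int>& value) {
  int expected = value.load(std::memory_order_relaxed);
  while (!value.compare_exchange_weak(expected, expected * 2)) {
    // expected now holds the latest value; the loop recomputes the target.
  }
  return expected;
}

// A one-shot attempt: compare_exchange_strong fails only when the stored
// value really differs from `expected`, so no loop is needed.
inline bool try_claim(std::atomic<int>& owner, int my_id) {
  int expected = 0;  // 0 means "unowned" in this sketch
  return owner.compare_exchange_strong(expected, my_id);
}
```

The weak form suits code that loops anyway; the strong form suits a single attempt whose failure has to mean "someone else got there first".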
Concurrency Control
• Pessimistic
• Blocks until the possibility of violation disappears
• Optimistic
• Assumes collisions between transactions rarely occur
• Uses resources without acquiring locks
• On conflict, the committing transaction rolls back and restarts
• Compare and Swap
do {
  expected = resource;
  new_value = some_operation(expected);
} while (!compare_and_swap(resource, expected, new_value));
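The optimistic read, compute, commit pattern can be sketched in C++ as an atomic maximum. update_max and max_of_range are hypothetical helpers for this illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Optimistic update: read, compute, then commit only if nobody interfered.
inline void update_max(std::atomic<int>& maximum, int candidate) {
  int seen = maximum.load(std::memory_order_relaxed);
  // Retry while the candidate is still larger and the commit lost a race.
  while (candidate > seen &&
         !maximum.compare_exchange_weak(seen, candidate)) {
    // A failed CAS refreshed `seen`; the condition re-checks against it.
  }
}

// Feeds the values 1..threads*per_thread through update_max concurrently.
inline int max_of_range(int threads, int per_thread) {
  std::atomic<int> maximum{0};
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&maximum, t, per_thread] {
      for (int i = 1; i <= per_thread; ++i) {
        update_max(maximum, t * per_thread + i);
      }
    });
  }
  for (auto& w : workers) w.join();
  return maximum.load();
}
```

No thread ever waits for another: a lost race only costs one more pass through the loop, and the loop exits early once the candidate is no longer larger.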
Progress Condition
• Blocking
• Obstruction-Free
• http://cs.brown.edu/people/mph/HerlihyLM03/main.pdf
• Lock-Free
• Wait-Free
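// Blocking: spin lock (lock is an atomic<int>, 0 = free); a stalled holder blocks everyone
int expected = 0;
while (!lock.compare_exchange_weak(expected, 1)) {
  expected = 0; // a failed CAS overwrote expected with the current value
  this_thread::yield();
}
// Lock-free: CAS retry loop; a real conflict means another thread made progress
int local_value = atomic_value.load();
while (!atomic_value.compare_exchange_weak(local_value, local_value + 1)) {
  // a failed CAS reloaded local_value, so simply retry
}
// Wait-free: finishes in a bounded number of steps
counter.fetch_add(1); // XADD on x86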
Lock-Free Stack
• Treiber (1986) Algorithm
• https://en.wikipedia.org/wiki/Treiber_Stack
• Treiber, R.K., 1986. Systems programming: Coping with parallelism. International Business Machines Incorporated, Thomas J. Watson Research Center.
Lock-Free Stack
// Copyright 2016, Xiaojie Chen. All rights reserved.
// https://github.com/vorfeed/naesala
struct IStackNode {
  IStackNode* next;
};
template <class T>
class LockfreeStack {
 public:
  void Push(T* node);
  T* Pop();
 private:
  static_assert(is_base_of<IStackNode, T>::value, "T must derive from IStackNode");
  // Top node pointer stored as an integer; 0 means the stack is empty
  atomic<uint64_t> top_ { 0 };
};
Lock-Free Stack
void Push(T* node) {
  uint64_t last_top = 0;
  uint64_t node_ptr = reinterpret_cast<uint64_t>(node);
  do {
    // Take out the top node of the stack
    last_top = top_.load(memory_order_acquire);
    // Add a new node as the top of the stack, and point to the old top
    node->next = reinterpret_cast<T*>(last_top);
    // If the top node is modified by other threads, discard this operation and retry
  } while (!top_.compare_exchange_weak(last_top, node_ptr));
}
Lock-Free Stack
[Diagram: Push in three steps on Node2 → Node1. Top points to Node2; NewNode is linked in front of Node2; Top is swung to NewNode.]
Lock-Free Stack
T* Pop() {
  T* top = nullptr;
  uint64_t top_ptr = 0, new_top_ptr = 0;
  do {
    // Take out the top node of the stack
    top_ptr = top_.load(memory_order_acquire);
    top = reinterpret_cast<T*>(top_ptr);
    // Empty stack
    if (!top) {
      return nullptr;
    }
    // Set the next node of the top node as the new top of the stack
    new_top_ptr = reinterpret_cast<uint64_t>(top->next);
    // If the top node is modified by other threads, discard this operation and retry
  } while (!top_.compare_exchange_weak(top_ptr, new_top_ptr));
  return top;
}
Lock-Free Stack
[Diagram: Pop in three steps on Node3 → Node2 → Node1. Top points to Node3; Node3's successor Node2 is read; Top is swung to Node2.]
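For comparison, a compact sketch of the same Treiber stack written directly over std::atomic<Node*> instead of a uint64_t. Node, TreiberStack and push_then_pop_sum are illustrative names, not the naesala code; ABA protection and memory reclamation are left out, and the demo pops only after all pushes have finished.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

struct Node {
  int value;
  Node* next;
};

// Sketch of a Treiber stack over caller-owned nodes.
class TreiberStack {
 public:
  void Push(Node* node) {
    node->next = top_.load(std::memory_order_relaxed);
    // On failure, node->next is refreshed with the current top.
    while (!top_.compare_exchange_weak(node->next, node,
                                       std::memory_order_release,
                                       std::memory_order_relaxed)) {
    }
  }

  Node* Pop() {
    Node* top = top_.load(std::memory_order_acquire);
    // On failure, top is refreshed; stop when the stack is empty.
    while (top && !top_.compare_exchange_weak(top, top->next,
                                              std::memory_order_acquire,
                                              std::memory_order_acquire)) {
    }
    return top;
  }

 private:
  std::atomic<Node*> top_{nullptr};
};

// Pushes threads * per_thread nodes concurrently, then pops them all on the
// calling thread and returns the sum of the popped values.
inline long push_then_pop_sum(int threads, int per_thread) {
  std::vector<Node> nodes(static_cast<std::size_t>(threads) * per_thread);
  TreiberStack stack;
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&, t] {
      for (int i = 0; i < per_thread; ++i) {
        Node& n = nodes[static_cast<std::size_t>(t) * per_thread + i];
        n.value = 1;
        stack.Push(&n);
      }
    });
  }
  for (auto& w : workers) w.join();
  long sum = 0;
  while (Node* n = stack.Pop()) sum += n->value;
  return sum;
}
```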
Lock-Free Queue
• Michael & Scott (1996) Algorithm
• Java ConcurrentLinkedQueue
• Michael, Maged; Scott, Michael (1996). Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms. Proc. 15th Annual ACM Symp. on Principles of Distributed Computing (PODC). pp. 267–275. doi:10.1145/248052.248106. ISBN 0-89791-800-2.
Lock-Free Queue
// Copyright 2016, Xiaojie Chen. All rights reserved.
// https://github.com/vorfeed/naesala
struct IListNode {
  IListNode(uint64_t next) : next(next) {}
  atomic<uint64_t> next;
};
template <class T>
class LockfreeList {
 public:
  // Both head and tail point to a dummy if queue is empty
  LockfreeList() : dummy_(reinterpret_cast<uint64_t>(new T())),
      head_(dummy_), tail_(dummy_) {}
 private:
  // IListNode is not a template, so it is named without template arguments
  static_assert(is_base_of<IListNode, T>::value, "T must derive from IListNode");
  uint64_t dummy_;
  atomic<uint64_t> head_, tail_;
};
Lock-Free Queue
void Put(T* node) {
  while (true) {
    // The tail node of the queue
    uint64_t tail_ptr = tail_.load(memory_order_acquire);
    T* tail = reinterpret_cast<T*>(tail_ptr);
    // The next node of the tail node
    uint64_t tail_next_ptr = tail->next.load(memory_order_acquire);
    T* tail_next = reinterpret_cast<T*>(tail_next_ptr);
    // If the next node of tail node is modified by other threads
    if (tail_next) {
      // Try to help other threads to swing tail to the next node, and then retry
      tail_.compare_exchange_strong(tail_ptr, reinterpret_cast<uint64_t>(tail_next));
    // Else try to link node at the end of the queue
    } else if (tail->next.compare_exchange_weak(tail_next_ptr,
        reinterpret_cast<uint64_t>(node))) {
      // If successful, try to swing Tail to the inserted node
      // Can also be done by other threads
      tail_.compare_exchange_strong(tail_ptr, reinterpret_cast<uint64_t>(node));
      break;
    }
  }
}
Lock-Free Queue
[Diagram: Put in three steps on Dummy → Node1 → Node2. Head points to Dummy and Tail to Node2; Node3 is linked after Node2; Tail is swung to Node3.]
Lock-Free Queue
T* Take() {
  while (true) {
    // The head node of the queue
    uint64_t head_ptr = head_.load(memory_order_acquire);
    T* head = reinterpret_cast<T*>(head_ptr);
    // The tail node of the queue
    uint64_t tail_ptr = tail_.load(memory_order_acquire);
    T* tail = reinterpret_cast<T*>(tail_ptr);
    // The next node of the head node
    uint64_t head_next_ptr = head->next.load(memory_order_acquire);
    T* head_next = reinterpret_cast<T*>(head_next_ptr);
    // Empty queue or the tail falling behind
    if (head == tail) {
      // Empty queue, nothing to pop
      if (!head_next) {
        return nullptr;
      }
      // Another thread is pushing and the tail is falling behind, try to advance it
      tail_.compare_exchange_strong(tail_ptr, reinterpret_cast<uint64_t>(head_next));
    } else {
      // Queue is not empty, do pop operation
    }
  }
  return nullptr;
}
Lock-Free Queue
// Pop operation (body of the else branch in Take)
// Another thread had just taken a node
if (!head_next) {
  continue;
}
// Copy the next node of the head node to a buffer
T data(*head_next);
// Try to swing head to the next node
if (head_.compare_exchange_weak(head_ptr, reinterpret_cast<uint64_t>(head_next))) {
  // If successful, copy the buffer data to the head node
  *head = move(data);
  // Clear the next node pointer of the head node
  head->next.store(0, memory_order_release);
  // Return the head node
  return head;
}
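Lock-Free Queue
[Diagram: Take in three steps on Dummy → Node1 → Node2. Head points to Dummy; Head is swung to Node1; Node1's data is moved into the old dummy, which is returned, and Node1 becomes the new dummy.]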
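A self-contained sketch of a Michael-Scott style queue, simplified to return values instead of nodes. QNode, MSQueue and produce_consume_sum are hypothetical names, nodes are owned by the caller and never reused, and safe memory reclamation and ABA protection are deliberately left out.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

struct QNode {
  int value{0};
  std::atomic<QNode*> next{nullptr};
};

// Sketch of a Michael-Scott queue over caller-owned nodes.
class MSQueue {
 public:
  MSQueue() : head_(&dummy_), tail_(&dummy_) {}

  void Put(QNode* node) {
    node->next.store(nullptr, std::memory_order_relaxed);
    while (true) {
      QNode* tail = tail_.load(std::memory_order_acquire);
      QNode* next = tail->next.load(std::memory_order_acquire);
      if (next) {
        // Tail is lagging: help swing it forward, then retry.
        tail_.compare_exchange_strong(tail, next);
      } else if (tail->next.compare_exchange_weak(next, node)) {
        // Linked; swinging tail may fail if another thread already helped.
        tail_.compare_exchange_strong(tail, node);
        return;
      }
    }
  }

  // Stores the front value in `out`; returns false if the queue is empty.
  bool Take(int& out) {
    while (true) {
      QNode* head = head_.load(std::memory_order_acquire);
      QNode* tail = tail_.load(std::memory_order_acquire);
      QNode* next = head->next.load(std::memory_order_acquire);
      if (!next) return false;  // only the dummy is left
      if (head == tail) {
        tail_.compare_exchange_strong(tail, next);  // help the lagging tail
        continue;
      }
      int value = next->value;  // read before head moves
      if (head_.compare_exchange_weak(head, next)) {
        out = value;  // `next` is the new dummy
        return true;
      }
    }
  }

 private:
  QNode dummy_;
  std::atomic<QNode*> head_, tail_;
};

// Runs `producers` producer threads against one consumer thread and returns
// the sum of all values the consumer took.
inline long produce_consume_sum(int producers, int per_producer) {
  const std::size_t total = static_cast<std::size_t>(producers) * per_producer;
  std::vector<QNode> nodes(total);
  MSQueue queue;
  long sum = 0;
  std::vector<std::thread> workers;
  for (int p = 0; p < producers; ++p) {
    workers.emplace_back([&, p] {
      for (int i = 0; i < per_producer; ++i) {
        QNode& n = nodes[static_cast<std::size_t>(p) * per_producer + i];
        n.value = i + 1;
        queue.Put(&n);
      }
    });
  }
  std::thread consumer([&] {
    std::size_t taken = 0;
    int v = 0;
    while (taken < total) {
      if (queue.Take(v)) { sum += v; ++taken; }
      else std::this_thread::yield();
    }
  });
  for (auto& w : workers) w.join();
  consumer.join();
  return sum;
}
```

Unlike the slide version, this sketch leaves the dequeued node as the new dummy rather than moving its data into the old one; a production queue would also need hazard pointers, epochs or tagged pointers before nodes could be freed or reused.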
ABA Problem
• https://en.wikipedia.org/wiki/ABA_problem
• Another thread changes the value, does other work, then
changes the value back
• This fools the first thread into thinking "nothing has
changed"
ABA Problem
// Assumes pointers fit in the low 48 bits, leaving the high 16 bits for a version tag
template <class T>
T* Pointer(uint64_t combine) {
  return reinterpret_cast<T*>(combine & 0x0000FFFFFFFFFFFF);
}
template <class T>
uint64_t Combine(T* pointer) {
  // Every call stamps the pointer with a fresh version number
  static atomic_short version(0);
  return reinterpret_cast<uint64_t>(pointer) |
      (static_cast<uint64_t>(version.fetch_add(1, memory_order_acq_rel)) << 48);
}
ABA Problem
void Push(T* node) {
  uint64_t last_top_combine = 0;
  uint64_t node_combine = Combine(node);
  do {
    last_top_combine = top_.load(memory_order_acquire);
    node->next = Pointer<T>(last_top_combine);
    // If top_ still equals last_top_combine (pointer and version), no one has changed the stack
    // (Comparing the pointer alone is not enough because of the ABA problem)
    // Atomically replace top with new node
  } while (!top_.compare_exchange_weak(last_top_combine, node_combine));
}
ABA Problem
T* Pop() {
  T* top = nullptr;
  uint64_t top_combine = 0, new_top_combine = 0;
  do {
    top_combine = top_.load(memory_order_acquire);
    top = Pointer<T>(top_combine);
    if (!top) {
      return nullptr;
    }
    new_top_combine = Combine(top->next);
    // If top_ still equals top_combine (pointer and version), no one has changed the stack
    // (Comparing the pointer alone is not enough because of the ABA problem)
    // Atomically replace top with next
  } while (!top_.compare_exchange_weak(top_combine, new_top_combine));
  return top;
}
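The effect of the version tag can be replayed on a single thread. Pack, Unpack, kPointerMask and stale_cas_succeeds are hypothetical helpers in the spirit of Pointer and Combine; the 48-bit pointer assumption is the same one the slide code makes.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Assumption: the low 48 bits hold the pointer (true for typical x86-64
// user-space addresses) and the high 16 bits hold a version tag.
constexpr std::uint64_t kPointerMask = 0x0000FFFFFFFFFFFFull;

inline std::uint64_t Pack(void* pointer, std::uint16_t version) {
  return (reinterpret_cast<std::uint64_t>(pointer) & kPointerMask) |
         (static_cast<std::uint64_t>(version) << 48);
}

inline void* Unpack(std::uint64_t combined) {
  return reinterpret_cast<void*>(combined & kPointerMask);
}

// Replays an A -> B -> A sequence on one thread. Returns true if a CAS that
// still expects the original A succeeds afterwards.
inline bool stale_cas_succeeds(bool use_version_tag) {
  int a = 0, b = 0;
  std::uint16_t version = 0;
  auto make = [&](void* p) {
    return use_version_tag ? Pack(p, version++) : Pack(p, 0);
  };
  std::atomic<std::uint64_t> top{make(&a)};
  std::uint64_t stale = top.load();  // "thread 1" reads A and is paused
  top.store(make(&b));               // "thread 2" changes A to B ...
  top.store(make(&a));               // ... and back to A
  assert(Unpack(top.load()) == &a);  // the pointer looks unchanged
  return top.compare_exchange_strong(stale, make(&b));
}
```

Without the tag the stale compare-and-swap goes through even though the stack changed twice in between; with the tag the version bits differ and the operation is forced to retry. A 16-bit tag can still wrap around, so this narrows the ABA window rather than closing it completely.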
Benchmark
[Charts: Condition Variable Queue vs. Lock-Free Queue, four panels; the y-axis unit is not given in the slides]
• SPSC: 1 producer, 1 consumer (y-axis 0 to 1.5E+09)
• SPMC: 1P1C, 1P2C, 1P4C, 1P8C, 1P16C, 1P32C (y-axis 0 to 1.5E+09)
• MPSC: 1P1C, 2P1C, 4P1C, 8P1C, 16P1C, 32P1C (y-axis 0 to 1.2E+09)
• MPMC: 1P1C, 2P2C, 4P4C, 8P8C, 16P16C, 32P32C (y-axis 0 to 1.2E+09)
Reference
• Java Concurrency in Practice
• The Art of Multiprocessor Programming
• C++ Concurrency in Action
• http://open-std.org
• java.util.concurrent
• https://github.com/vorfeed/naesala/lockfree
Thank you

无锁编程

  • 2. Content • Parallel • Barrier • Memory Order • Volatile • Atomic • Lock-Free • ABA Problem • Reference
  • 3. Parallel Computing • Cache Coherence • https://en.wikipedia.org/wiki/Cache_coherence • False sharing • Sequential Consistency • https://en.wikipedia.org/wiki/Sequential_consistency • Compiler, CPU, multicore • Cache load, register
  • 5. Processor Guaranteed Atomic • Bus Lock • https://software.intel.com/en-us/node/544402 • LOCK# signal • Cache Lock • Between CPU and Memory • Cache Coherence
  • 6. Memory Barrier • Memory Barrier • https://en.wikipedia.org/wiki/Memory_barrier • Causes a CPU or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction • Compile-time Memory Ordering • atomic_thread_fence(memory_order_acq_rel); • Forbids compiler to reorder read and write commands around it
  • 7. Memory Ordering • Memory Ordering • https://en.wikipedia.org/wiki/Memory_ordering • The runtime order of accesses to computer memory by a CPU • Sequential Consistency • All reads and all writes are in-order • Relaxed consistency • Some types of reordering are allowed • Weak consistency • Reads and writes are arbitrarily reordered, limited only by explicit memory barriers
  • 8. Volatile • Volatile • https://en.wikipedia.org/wiki/Volatile_(computer_programming) • Un-cacheable variable • Prevents reordering between volatile variables • Not applicable • Depend on other variable • Depend on old value • Enhanced in Java • Write: release • Read: acquire
  • 9. Volatile in Java Read Write StoreStore Volatile Write StoreLoad Volatile Read LoadStore Read LoadLoad Write
  • 10. Memory Ordering in C++11 • memory_order_relaxed • memory_order_acquire • memory_order_release • memory_order_consume • memory_order_acq_rel • memory_order_seq_cst
  • 11. Relaxed Ordering • Atomicity • Modification order consistency • Example • A is sequenced-before B, C is sequenced before D • Is allowed to produce r1 == r2 == 42 ? • Reference counters of std::shared_ptr
  • 12. Relaxed Ordering // thread 1 r1 = y.load(memory_order_relaxed); // A x.store(r1, memory_order_relaxed); // B // thread 2 r2 = x.load(memory_order_relaxed); // C y.store(42, memory_order_relaxed); // D // possible order y.store(42, memory_order_relaxed); r1 = y.load(memory_order_relaxed); x.store(r1, memory_order_relaxed); r2 = x.load(memory_order_relaxed);
  • 13. Release-Acquire Ordering • Between the threads releasing and acquiring the same atomic variable • All memory writes happened-before the atomic store • The atomic load happened-before all memory loads • Example • A sequenced-before B sequenced-before C • C synchronizes-with D • D sequenced-before E sequenced-before F
  • 14. Release-Acquire Ordering atomic<string*> ptr; int data; void producer() { string* p = new string("Hello"); // A data = 42; // B ptr.store(p, memory_order_release); // C } void consumer() { string* p2; while (!(p2 = ptr.load(memory_order_acquire))); // D assert(*p2 == "Hello"); // E assert(data == 42); // F } thread t1(producer); thread t2(consumer);
  • 15. Release-Consume ordering • Data-dependency relationship • Example • A sequenced-before B sequenced-before C • C dependency-ordered-before D • D sequenced-before E sequenced-before F • A happens-before E ? • B happens-before F ? • Discouraged
  • 16. Release-Consume ordering atomic<string*> ptr; int data; void producer() { string* p = new string("Hello"); // A data = 42; // B ptr.store(p, memory_order_release); // C } void consumer() { string* p2; while (!(p2 = ptr.load(memory_order_consume))); // D assert(*p2 == "Hello"); // E assert(data == 42); // F } thread t1(producer); thread t2(consumer);
  • 17. Sequentially-Consistent Ordering • Order memory the same way as release/acquire ordering • Establish a single total modification order of all atomic operations • Example • Is r1 == r2 == 0 possible ?
  • 18. Sequentially-Consistent Ordering atomic<int> x { 0 }, y { 0 }; // thread 1 x.store(1, memory_order_seq_cst); r1 = y.load(memory_order_seq_cst); // thread 2 y.store(1, memory_order_seq_cst); r2 = x.load(memory_order_seq_cst); // thread 1 x.store(1, memory_order_relaxed); atomic_thread_fence(memory_order_seq_cst); r1 = y.load(memory_order_relaxed); // thread 2 y.store(1, memory_order_relaxed); atomic_thread_fence(memory_order_seq_cst); r2 = x.load(memory_order_relaxed);
  • 19. Sequentially-Consistent Ordering atomic<int> x { 0 }, y { 0 }; // thread 1 x.store(1, memory_order_acq_rel); r1 = y.load(memory_order_acq_rel); // thread 2 y.store(1, memory_order_acq_rel); r2 = x.load(memory_order_acq_rel); // thread 1 x.store(1, memory_order_relaxed); atomic_thread_fence(memory_order_acq_rel); r1 = y.load(memory_order_relaxed); // thread 2 y.store(1, memory_order_relaxed); atomic_thread_fence(memory_order_acq_rel); r2 = x.load(memory_order_relaxed);
  • 20. Atomic Operations • atomic_store/load • atomic_exchange • atomic_compare_exchange_weak/strong • atomic_fetch_add/sub/and/or/xor • atomic_thread_fence • atomic_signal_fence
  • 21. Atomic Compare and Exchange • compare_exchange_weak • Allow to fail spuriously • Act as if (actual value != expected) even if they are equal • May require a loop • compare_exchange_strong • Distinguish spurious failure and concurrent acces • Needs extra overhead to retry in the case of failure
  • 22. Concurrency Control • Pessimistic • Blocking until the possibility of violation disappears • Optimistic • Collisions between transactions will rarely occur • Use resources without acquiring locks • If conflict, the committing rolls back and restart • Compare and Swap do { expected = resource; some operation; } while (compare_and_swap(resource, expected, new_value) == false);
  • 23. Progress Condition • Blocking • Obstruction-Free • http://cs.brown.edu/people/mph/HerlihyLM03/main.pdf • Lock-Free • Wait-Free while (!lock.compare_and_set(0, 1)) { this_thread::yield(); } while (!atomic_value.compare_and_set(local_value, local_value + 1)) { local_value = atomic_value.load(); } counter.fetch_add(1); // XADD
  • 24. Lock-Free Stack • Treiber (1986) Algorithm • https://en.wikipedia.org/wiki/Treiber_Stack • 《Treiber, R.K., 1986. Systems programming: Coping with parallelism. International Business Machines Incorporated, Thomas J. Watson Research Center.》
  • 25. Lock-Free Stack
      // Copyright 2016, Xiaojie Chen. All rights reserved.
      // https://github.com/vorfeed/naesala
      struct IStackNode {
        IStackNode* next;
      };

      template <class T>
      class LockfreeStack {
       public:
        void Push(T* node);
        T* Pop();
       private:
        static_assert(is_base_of<IStackNode, T>::value, "");
        atomic<uint64_t> top_ { 0 };
      };
  • 26. Lock-Free Stack
      void Push(T* node) {
        uint64_t last_top = 0;
        uint64_t node_ptr = reinterpret_cast<uint64_t>(node);
        do {
          // Take out the top node of the stack
          last_top = top_.load(memory_order_acquire);
          // Add a new node as the top of the stack, and point to the old top
          node->next = reinterpret_cast<T*>(last_top);
          // If the top node is modified by other threads, discard this operation and retry
        } while (!top_.compare_exchange_weak(last_top, node_ptr));
      }
  • 27. Lock-Free Stack [Figure: Push in three panels. Top, Node2, Node1 before the push; NewNode linked in front of Node2; Top swung to NewNode.]
  • 28. Lock-Free Stack
      T* Pop() {
        T* top = nullptr;
        uint64_t top_ptr = 0, new_top_ptr = 0;
        do {
          // Take out the top node of the stack
          top_ptr = top_.load(memory_order_acquire);
          top = reinterpret_cast<T*>(top_ptr);
          // Empty stack
          if (!top) {
            return nullptr;
          }
          // Set the next node of the top node as the new top of the stack
          new_top_ptr = reinterpret_cast<uint64_t>(top->next);
          // If the top node is modified by other threads, discard this operation and retry
        } while (!top_.compare_exchange_weak(top_ptr, new_top_ptr));
        return top;
      }
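For experimenting with the Push and Pop logic above, here is a compact variant that stores a typed pointer in `std::atomic<Node*>` instead of a `uint64_t`. It is a sketch, not the naesala implementation: `Node`, `TreiberStack`, and `push_then_pop_concurrently` are names chosen for this example, nodes are never freed or reused while the stack is live, and that restriction is what keeps it clear of the ABA problem and of use-after-free.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

struct Node {
    int value = 0;
    Node* next = nullptr;
};

class TreiberStack {
public:
    void push(Node* node) {
        Node* old_top = top_.load(std::memory_order_relaxed);
        do {
            node->next = old_top;  // link to the top we last observed
        } while (!top_.compare_exchange_weak(old_top, node,
                                             std::memory_order_release,
                                             std::memory_order_relaxed));
    }

    Node* pop() {
        Node* old_top = top_.load(std::memory_order_acquire);
        // On failure the CAS refreshes old_top, so old_top->next is re-read.
        while (old_top &&
               !top_.compare_exchange_weak(old_top, old_top->next,
                                           std::memory_order_acquire,
                                           std::memory_order_acquire)) {
        }
        return old_top;  // nullptr when the stack is empty
    }

private:
    std::atomic<Node*> top_{nullptr};
};

// Hypothetical driver: pushes from several threads, then pops from
// several threads, and returns how many nodes were popped in total.
int push_then_pop_concurrently(int threads, int per_thread) {
    std::vector<Node> nodes(threads * per_thread);  // outlives the stack use
    TreiberStack stack;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&, t] {
            for (int i = 0; i < per_thread; ++i) {
                stack.push(&nodes[t * per_thread + i]);
            }
        });
    }
    for (auto& th : pool) th.join();
    pool.clear();

    std::atomic<int> popped{0};
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            while (stack.pop()) popped.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& th : pool) th.join();
    return popped.load();
}
```

Every pushed node should be popped exactly once, so `push_then_pop_concurrently(4, 1000)` is expected to return 4000.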
  • 29. Lock-Free Stack [Figure: Pop in three panels. Top, Node3, Node2, Node1 before the pop; Top swung from Node3 to Node2; Node3 returned.]
  • 30. Lock-Free Queue • Michael & Scott (1996) Algorithm • Java ConcurrentLinkedQueue • 《Michael, Maged; Scott, Michael (1996). Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms. Proc. 15th Annual ACM Symp. on Principles of Distributed Computing (PODC). pp. 267–275. doi:10.1145/248052.248106. ISBN 0-89791-800-2.》
  • 31. Lock-Free Queue
      // Copyright 2016, Xiaojie Chen. All rights reserved.
      // https://github.com/vorfeed/naesala
      struct IListNode {
        IListNode(uint64_t next) : next(next) {}
        atomic<uint64_t> next;
      };

      template <class T>
      class LockfreeList {
       public:
        // Both head and tail point to a dummy if queue is empty
        LockfreeList() :
          dummy_(reinterpret_cast<uint64_t>(new T())),
          head_(dummy_), tail_(dummy_) {}
       private:
        // IListNode is not a template, so it is named without <T>
        static_assert(is_base_of<IListNode, T>::value, "");
        uint64_t dummy_;
        atomic<uint64_t> head_, tail_;
      };
  • 32. Lock-Free Queue
      void Put(T* node) {
        while (true) {
          // The tail node of the queue
          uint64_t tail_ptr = tail_.load(memory_order_acquire);
          T* tail = reinterpret_cast<T*>(tail_ptr);
          // The next node of the tail node
          uint64_t tail_next_ptr = tail->next.load(memory_order_acquire);
          T* tail_next = reinterpret_cast<T*>(tail_next_ptr);
          // If the next node of tail node is modified by other threads
          if (tail_next) {
            // Try to help other threads to swing tail to the next node, and then retry
            tail_.compare_exchange_strong(tail_ptr, reinterpret_cast<uint64_t>(tail_next));
          // Else try to link node at the end of the queue
          } else if (tail->next.compare_exchange_weak(tail_next_ptr,
                         reinterpret_cast<uint64_t>(node))) {
            // If successful, try to swing Tail to the inserted node
            // Can also be done by other threads
            tail_.compare_exchange_strong(tail_ptr, reinterpret_cast<uint64_t>(node));
            break;
          }
        }
      }
  • 33. Lock-Free Queue [Figure: Put in three panels. Queue Dummy, Node1, Node2 with Head and Tail; Node3 linked after Node2; Tail swung to Node3.]
  • 34. Lock-Free Queue
      T* Take() {
        while (true) {
          // The head node of the queue
          uint64_t head_ptr = head_.load(memory_order_acquire);
          T* head = reinterpret_cast<T*>(head_ptr);
          // The tail node of the queue
          uint64_t tail_ptr = tail_.load(memory_order_acquire);
          T* tail = reinterpret_cast<T*>(tail_ptr);
          // The next node of the head node
          uint64_t head_next_ptr = head->next.load(memory_order_acquire);
          T* head_next = reinterpret_cast<T*>(head_next_ptr);
          // Empty queue or the tail falling behind
          if (head == tail) {
            // Empty queue, cannot pop
            if (!head_next) {
              return nullptr;
            }
            // Another thread is pushing and the tail is falling behind, try to advance it
            tail_.compare_exchange_strong(tail_ptr, reinterpret_cast<uint64_t>(head_next));
          } else {
            // Queue is not empty, do pop operation
          }
        }
        return nullptr;
      }
  • 35. Lock-Free Queue
      // Pop operation
      // Another thread had just taken a node
      if (!head_next) {
        continue;
      }
      // Copy the next node of the head node to a buffer
      T data(*head_next);
      // Try to swing head to the next node
      if (head_.compare_exchange_weak(head_ptr, reinterpret_cast<uint64_t>(head_next))) {
        // If successful, copy the buffer data to the head node
        *head = move(data);
        // Clear the next node pointer of the head node
        head->next.store(0, memory_order_release);
        // Return the head node
        return head;
      }
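  • 36. Lock-Free Queue [Figure: Take in three panels. Queue Dummy, Node1, Node2 with Head and Tail; Head swung to the next node; the old head node is returned as Node1 and the next node becomes the new Dummy in front of Node2.]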
  • 37. ABA Problem • https://en.wikipedia.org/wiki/ABA_problem • Another thread changes the value, does other work, then changes the value back • This fools the first thread into thinking "nothing has changed"
  • 38. ABA Problem
      template <class T>
      T* Pointer(uint64_t combine) {
        return reinterpret_cast<T*>(combine & 0x0000FFFFFFFFFFFF);
      }

      template <class T>
      uint64_t Combine(T* pointer) {
        static atomic_short version(0);
        return reinterpret_cast<uint64_t>(pointer) |
            (static_cast<uint64_t>(version.fetch_add(1, memory_order_acq_rel)) << 48);
      }
  • 39. ABA Problem
      void Push(T* node) {
        uint64_t last_top_combine = 0;
        uint64_t node_combine = Combine(node);
        do {
          last_top_combine = top_.load(memory_order_acquire);
          node->next = Pointer<T>(last_top_combine);
          // If the top node is still next, then assume no one has changed the stack
          // (That statement is not always true because of the ABA problem)
          // Atomically replace top with new node
        } while (!top_.compare_exchange_weak(last_top_combine, node_combine));
      }
  • 40. ABA Problem
      T* Pop() {
        T* top = nullptr;
        uint64_t top_combine = 0, new_top_combine = 0;
        do {
          top_combine = top_.load(memory_order_acquire);
          top = Pointer<T>(top_combine);
          if (!top) {
            return nullptr;
          }
          new_top_combine = Combine(top->next);
          // If the top node is still top, then assume no one has changed the stack
          // (That statement is not always true because of the ABA problem)
          // Atomically replace top with next
        } while (!top_.compare_exchange_weak(top_combine, new_top_combine));
        return top;
      }
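The effect of the version tag can be seen in isolation, without a full stack. The sketch below assumes, as the slides do, that pointers fit in the low 48 bits of a 64-bit word (true for typical x86-64 user-space addresses, not guaranteed everywhere). The names `pack`, `unpack_pointer`, `unpack_tag`, and `stale_cas_succeeds` are hypothetical, and the tag is passed in explicitly rather than drawn from a shared counter.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kPointerMask = 0x0000FFFFFFFFFFFFULL;

// Stores a 16-bit tag in the upper bits of a 64-bit word next to the pointer.
template <class T>
std::uint64_t pack(T* pointer, std::uint16_t tag) {
    return (reinterpret_cast<std::uint64_t>(pointer) & kPointerMask) |
           (static_cast<std::uint64_t>(tag) << 48);
}

template <class T>
T* unpack_pointer(std::uint64_t word) {
    return reinterpret_cast<T*>(word & kPointerMask);
}

inline std::uint16_t unpack_tag(std::uint64_t word) {
    return static_cast<std::uint16_t>(word >> 48);
}

// Replays an A -> B -> A sequence on one thread and reports whether a
// CAS that still expects the first A (tag 0) would succeed.
bool stale_cas_succeeds() {
    static int a = 0, b = 0;
    std::atomic<std::uint64_t> top{pack(&a, 0)};
    std::uint64_t stale = top.load();  // "thread 1" reads A with tag 0
    top.store(pack(&b, 1));            // "thread 2" replaces A with B
    top.store(pack(&a, 2));            // and then puts A back, with a new tag
    // The pointer bits match again, but the tag does not, so the CAS fails.
    return top.compare_exchange_strong(stale, pack(&b, 3));
}
```

With a bare pointer the final CAS would succeed, which is exactly the ABA hazard. A 16-bit tag wraps after 65536 updates, so it makes the problem unlikely rather than impossible.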
  • 41. Benchmark [Figure: four charts comparing Condition Variable Queue and Lock-Free Queue. SPSC: 1 producer, 1 consumer, axis 0 to 1.5E+09. SPMC: 1P1C, 1P2C, 1P4C, 1P8C, 1P16C, 1P32C, axis 0 to 1.5E+09. MPSC: 1P1C, 2P1C, 4P1C, 8P1C, 16P1C, 32P1C, axis 0 to 1.2E+09. MPMC: 1P1C, 2P2C, 4P4C, 8P8C, 16P16C, 32P32C, axis 0 to 1.2E+09.]
  • 42. Reference • 《Java Concurrency in Practice》 • 《The Art of Multiprocessor Programming》 • 《C++ Concurrency In Action》 • http://open-std.org • java.util.concurrent • https://github.com/vorfeed/naesala/lockfree