Beyond The Critical Section
An introduction to parallel programming; from basic problems and parallel primitives to patterns and lock free programming.

Introduction
- Tony Albrecht
- Senior Programmer for Pandemic Studios Brisbane
- Email: Tony.Albrecht0(at)gmail(dot)com
Overview
- Justify myself
- Start at the bottom
- Continue from the top
- Quick look in the middle
Parallel Programming: Why?
- Moore's Law
  - Limits to sequential CPUs – parallel processing is how we avoid those limits.
- Programs must be parallel to get Moore-level speedups.
- Applies to programming in general.
Moore’s Law
"Waaaah!"
- "Parallel programming is hard."
- "My code already runs incredibly fast – it doesn't need to go any faster."
- "It's impossible to parallelise this algorithm."
- "Only the rendering pipeline needs to be parallel."
- "That's only for supercomputers."
Console trends
So?
- ~2011
- ~6TFlop machine
- Next console will have between 64 and 128 processors
- 4 to 8GB of memory
- 128 processors!!!!
How can we utilise 100+ CPUs?
- Start now
  - Design
  - Implement
  - Iterate
  - Learn
The Problems
- Race conditions
Race Condition Example
- Threads A and B both execute x++ on a shared x, initially x=0. What is x afterwards?
- Thread A reads x into a register: R1 = 0.
- Thread A computes R1 = 0+1.
- Thread B reads x before A writes back, so its R1 = 0, while A holds R1 = 1.
- Thread A writes x=1; Thread B computes R1 = 0+1.
- Thread B writes its R1 back: x=1. One increment has been lost.
- Solution requires atomics or locking (see the sketch below).
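Not from the slides, but to make the fix concrete: a minimal C++11 sketch of the same two-thread increment. With a plain int, increments are lost exactly as above; making the counter std::atomic<int> turns each increment into a single uninterruptable operation.

#include <atomic>
#include <cstdio>
#include <thread>

static std::atomic<int> x(0);   // with a plain int, increments can be lost

static void Increment()
{
    for (int i = 0; i < 100000; ++i)
        x.fetch_add(1);         // atomic read-modify-write: no lost updates
}

int main()
{
    std::thread a(Increment);
    std::thread b(Increment);
    a.join();
    b.join();
    std::printf("x = %d\n", x.load());   // always 200000 with the atomic
    return 0;
}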
Atomics
- Atomic operations are uninterruptable, singular operations:
  - Get/Set
  - Inc/Dec (Add/Sub)
  - Compare And Swap
  - Plus other variations
Compare And Swap
- CAS(memory, oldValue, newValue)
  - if (memory == oldValue) memory = newValue;
- Surprisingly useful.
- Simple locking primitive (sketched below):
  - while (CAS(&lock, 0, 1) != 0) ;
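A sketch of how this CAS might be expressed on C++11 atomics. The slide's CAS returns the value that was actually in memory, so the spinlock loop can tell whether it won the swap; this is an illustration, not the deck's code.

#include <atomic>

// Returns the value that was in memory, like the slide's CAS.
int CAS(std::atomic<int>* memory, int oldValue, int newValue)
{
    // On failure, compare_exchange_strong writes the observed value back
    // into oldValue, which is exactly the "return what was there" behaviour
    // the spinlock loop relies on.
    memory->compare_exchange_strong(oldValue, newValue);
    return oldValue;
}

std::atomic<int> lock(0);

void LockWithCAS()
{
    while (CAS(&lock, 0, 1) != 0)
        ;                 // spin until we are the thread that swapped 0 -> 1
}

void UnlockWithCAS()
{
    lock.store(0);
}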
Race Condition Solution A
- Threads A and B both call AtomicInc(x), starting from x=0.
- Whatever the interleaving, x goes 0, 1, 2: both increments survive.
Locking
- Used to serialise access to code.
  - Like a key to a coffee shop toilet:
    - one key,
    - one toilet,
    - queue for access.
- Lock()/Unlock()

  ...code...
  Lock();
  // protected region
  Unlock();
  ...more code...
Race Condition Solution B
- Threads A and B each execute: Lock A; x++; Unlock A.
- Starting from x=0, one thread acquires lock A first and increments (x=1); the other blocks on the lock, then increments in turn (x=2).
The Problems
- Race conditions
- Deadlocks
Deadlock
- "When two trains approach each other at a crossing, both shall come to a full stop and neither shall start up again until the other has gone." — Kansas Legislature
- Deadlock can occur when 2 or more processes require resource(s) from another.
Deadlock

    Thread 1    Thread 2
    Lock A      Lock B
    Lock B      Lock A
    Unlock A    Unlock B

- Generally can be considered to be a logic error.
- Can be painfully subtle and rare.
The Problems
- Race conditions
- Deadlocks
- Read/write tearing
Read/write tearing
- More than one thread writing to the same memory at the same time.
- The more data, the more likely.
- Solve with synchronisation primitives.
- e.g. "AAAAAAAA" and "BBBBBBBB" written concurrently can leave "AAAABBBB".
The Problems
- Race conditions
- Deadlocks
- Read/write tearing
- Priority Inversion
Priority Inversion
- Consider threads with different priorities.
- A low priority thread holds a shared resource.
- A high priority thread tries to acquire that resource.
- The high priority thread is blocked by the low.
- Medium priority threads will execute at the expense of both the low and the high threads.
The Problems
- Race conditions
- Deadlocks
- Read/write tearing
- Priority Inversion
- The ABA Problem
The ABA problem
- Thread 1 reads 'A' from memory.
- Thread 2 modifies memory value 'A' to 'B' and back to 'A' again.
- Thread 1 resumes and assumes nothing has changed (using CAS).
- Often associated with dynamically allocated memory.
ABA example: consider a list (head -> a -> b -> c -> ...) and a thread pool.
- Thread A is about to CAS head's next pointer from a to b: CAS(&head->next, a, b);
- Thread B dequeues a and b; both nodes are released into thread-local pools.
- Thread B then enqueues a again: the node is reused and sits at the front once more.
- Thread A resumes and its CAS(&head->next, a, b) succeeds, since head->next is 'a' again; but it now points head at 'b', a node that is no longer in the list.
ABA Solution
- Tag each pointer with a count.
- Each time you use the ptr, inc the tag.
- Must do it atomically (a sketch follows).
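A sketch of the tagged-pointer idea, assuming a platform where a double-width CAS is available (std::atomic over a pointer-plus-counter struct; whether this is actually lock-free depends on the hardware). The names here are illustrative, not the deck's.

#include <atomic>
#include <cstdint>

struct Node;   // list node, details elided

struct TaggedPtr
{
    Node*     ptr;
    uintptr_t tag;   // bumped on every reuse, so A-B-A becomes A-B-A'
};

std::atomic<TaggedPtr> head;   // 16 bytes on 64-bit: needs a double-width CAS to be lock-free

// Swing head from expectedPtr to newPtr. If another thread removed and
// re-inserted the very same node in between, its tag has changed and the
// CAS fails instead of silently succeeding.
bool AdvanceHead(Node* expectedPtr, Node* newPtr)
{
    TaggedPtr expected = head.load();
    if (expected.ptr != expectedPtr)
        return false;
    TaggedPtr desired = { newPtr, expected.tag + 1 };
    return head.compare_exchange_strong(expected, desired);
}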
The Problems
- Race conditions
- Deadlocks
- Read/write tearing
- Priority Inversion
- The ABA Problem
- Thread scheduling problems
Convoy/Stampede
- Convoy: multiple threads restricted by a bottleneck.
- Stampede: multiple threads being started at once.
Higher Level Locking Primitives
- SpinLock
- Mutex
- Barrier
- RWLock
- Semaphore
SpinLock
- Loop until a value is set.
- No OS overhead with thread management:
  - Doesn't sleep the thread.
  - Handy if you will never wait for long.
  - Very bad if you need to wait for a long time.
- Can embed Sleep() or Yield()
  - But these can be perilous.
- (A sketch on C++11 atomics follows.)
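A minimal spinlock sketch along these lines, using C++11's std::atomic_flag. This is not the slides' implementation; the yield() line is the "embed a Yield()" variant, with the caveat above.

#include <atomic>
#include <thread>

class SpinLock
{
public:
    void Lock()
    {
        // test_and_set returns the previous value: loop until we are
        // the thread that flipped the flag from clear to set.
        while (m_Flag.test_and_set(std::memory_order_acquire))
            std::this_thread::yield();   // optional; a plain spin also works
    }
    void Unlock()
    {
        m_Flag.clear(std::memory_order_release);
    }
private:
    std::atomic_flag m_Flag = ATOMIC_FLAG_INIT;
};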
Mutex
- Mutual Exclusion.
- A simple lock/unlock primitive:
  - Otherwise known as a CriticalSection.
- Used to serialise access to code.
- Often overused.
- More than just a spinlock:
  - can release the thread.
- Be aware of overhead.
Barrier
- Will block until 'n' threads signal it.
- Useful for ensuring that all threads have finished a particular task.
Barrier example
- Thread 1 blocks on Barrier(3) before it can use the results; Threads 2-4 each do their own stuff.
- As each worker finishes calculating, it signals the barrier and the count drops: 3, 2, 1, 0.
- Workers are free to move on to other work (or calc pi) once they have signalled.
- When the count reaches 0, Thread 1 unblocks and uses the results.
- (A condition-variable sketch of such a barrier follows.)
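A sketch of a counting barrier with the behaviour shown in the example, built on a mutex and condition variable. The deck does not show an implementation, so this is one assumed way to build it.

#include <condition_variable>
#include <mutex>

class Barrier
{
public:
    explicit Barrier(int count) : m_Count(count) {}

    // Workers call Signal() when their results are ready.
    void Signal()
    {
        std::lock_guard<std::mutex> lock(m_Mutex);
        if (--m_Count == 0)
            m_Cond.notify_all();
    }

    // The consuming thread blocks here until 'n' workers have signalled.
    void Wait()
    {
        std::unique_lock<std::mutex> lock(m_Mutex);
        m_Cond.wait(lock, [this] { return m_Count == 0; });
    }

private:
    std::mutex              m_Mutex;
    std::condition_variable m_Cond;
    int                     m_Count;
};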
RWLock
- Allows many readers.
- But exclusive writing:
  - Writing blocks writers and readers.
  - Writing waits until all readers have finished.
Semaphore
- Generalisation of mutex.
- Allows 'c' threads access to critical code at once.
- Basically an atomic integer:
  - Wait() will block if value == 0; then dec & continue.
  - Signal() increments value (allows a waiting thread to unblock).
- Conceptually:
  - Mutexes stop other threads from running code.
  - Semaphores tell other threads to run code.
- (Sketched below.)
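And the semaphore itself, sketched the same way: an integer guarded by a mutex, with Wait() and Signal() behaving as described above. An assumed implementation for illustration, not the deck's.

#include <condition_variable>
#include <mutex>

class Semaphore
{
public:
    explicit Semaphore(int initial) : m_Value(initial) {}

    // Blocks while the value is 0, then decrements and continues.
    void Wait()
    {
        std::unique_lock<std::mutex> lock(m_Mutex);
        m_Cond.wait(lock, [this] { return m_Value > 0; });
        --m_Value;
    }

    // Increments the value and allows a waiting thread to unblock.
    void Signal()
    {
        std::lock_guard<std::mutex> lock(m_Mutex);
        ++m_Value;
        m_Cond.notify_one();
    }

private:
    std::mutex              m_Mutex;
    std::condition_variable m_Cond;
    int                     m_Value;
};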
Parallel Patterns
- Why patterns?
- A set of templates to aid design.
- A common language.
- Aids education.
- Provides a familiar base to start implementation.
So, how do we start?
- Analyse your problem.
- Identify tasks that can execute concurrently.
- Identify data local to each task.
- Identify task order/schedule.
- Analyse dependencies between tasks.
- Consider the HW you are running on.
Problem Decomposition (from "Patterns for Parallel Programming")

  Problem
  +- Organise By Tasks
  |    +- Linear    -> Task Parallelism
  |    +- Recursive -> Divide and Conquer
  +- Organise By Data Decomposition
  |    +- Linear    -> Geometric Decomposition
  |    +- Recursive -> Recursive Data
  +- Organise By Data Flow
       +- Linear    -> Pipeline
       +- Recursive -> Event-Based Coordination
Task Parallelism
- Task dominant, linear.
- Functionally driven problem.
- Many tasks that may depend on each other:
  - Try to minimise dependencies.
- Key elements:
  - Tasks
  - Dependencies
  - Schedule
Divide and Conquer
- Task dominant, recursive.
- Problem solved by splitting it into smaller sub-problems and solving them independently.
- Generally it's easy to take a sequential Divide and Conquer implementation and parallelise it.
Geometric Decomposition
- Data dominant, linear.
- Decompose the data into chunks.
- Solve for chunks independently.
  - Beware of edge dependencies.
Recursive Data Pattern
- Data dominant, recursive.
- Operations on trees, lists, graphs:
  - Dependencies can often prohibit parallelism.
- Often requires tricky recasting of the problem:
  - i.e. operate on all tree elements in parallel.
  - More work, but distributed across more cores.
Pipeline Pattern
- Data flow dominant, linear.
- Sets of data flowing through a sequence of stages.
- Each stage is independent.
- Easy to understand - simple, dedicated code.
Event-Based Coordination
- Data flow dominant, recursive.
- Groups of semi-independent tasks interacting in an irregular fashion.
- Tasks send events to other tasks, which send events in turn...
- Can be highly complex.
- Tricky to load balance.
Supporting Structures
- Program Structures: SPMD, Master/Worker, Loop Parallelism, Fork/Join
- Data Structures: Shared Data, Distributed Array, Shared Queue

Program Structures (SPMD, Master/Worker, Loop Parallelism, Fork/Join)
SPMD
- Single Program, Multiple Data.
- A single source code image running on multiple threads.
- Very common.
- Easy to maintain.
- Easy to understand.
Master/Worker
- The dominant force is the need to dynamically load balance:
  - Tasks are highly variable, i.e. in duration/cost.
  - Program structure doesn't map onto loops.
  - Cores vary in performance.
- "Bag of Tasks":
  - The Master sets up tasks and waits for completion.
  - Workers grab a task from the queue, execute it and then grab the next one.
Loop Parallelism
- Dominated by computationally expensive loops.
- Split iterations of the loop out to threads.
- Be careful of memory use and process granularity.
- (A sketch follows.)
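A sketch of the technique, assuming the iterations are independent; Process() is a hypothetical stand-in for the real per-element work.

#include <algorithm>
#include <thread>
#include <vector>

// Hypothetical stand-in for the real per-element work.
static void Process(float& element) { element *= 2.0f; }

void ParallelFor(std::vector<float>& data, unsigned threadCount)
{
    std::vector<std::thread> threads;
    const std::size_t chunk = (data.size() + threadCount - 1) / threadCount;

    for (unsigned t = 0; t < threadCount; ++t)
    {
        const std::size_t begin = std::min(t * chunk, data.size());
        const std::size_t end   = std::min(begin + chunk, data.size());
        threads.emplace_back([&data, begin, end]
        {
            // Coarse, contiguous chunks keep threads off each other's cache lines.
            for (std::size_t i = begin; i < end; ++i)
                Process(data[i]);
        });
    }
    for (std::thread& th : threads)
        th.join();
}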
Fork/Join
- The number of concurrent tasks varies over the life of the execution.
- Complex or recursive relations between tasks.
- Either:
  - direct task/core mapping, or
  - a thread pool.
Supporting Data Structures (Shared Data, Distributed Array, Shared Queue)
Shared Data
- Required when:
  - At least one data structure is accessed by multiple tasks.
  - At least one task modifies the shared data.
  - The tasks potentially need to use the modified value.
- Solutions:
  - Serialise execution - mutual exclusion.
  - Noninterfering sets of operations.
  - RWLocks.
Distributed Array
- How can we distribute an array across many threads?
  - Used in Geometric Decomposition.
- Break the array into thread-specific parts.
- Maximise locality per thread.
- Be wary of cache line overlap:
  - Keep data distribution coarse.
Shared Queue
- Extremely valuable construct.
- Fundamental part of Master/Worker ("Bag of Tasks").
- Must be consistent and work with many competing threads.
- Must be as efficient as possible:
  - Preferably lock free.
- (A simple locked baseline is sketched below.)
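A mutex-guarded baseline sketch of such a queue. The deck ultimately argues for a lock-free version; this is the simple variant that workers in a Master/Worker setup would poll.

#include <mutex>
#include <queue>

template <typename T>
class SharedQueue
{
public:
    void Push(const T& item)
    {
        std::lock_guard<std::mutex> lock(m_Mutex);
        m_Queue.push(item);
    }

    // Returns false instead of blocking when the queue is empty,
    // so an idle worker can go and do something else.
    bool TryPop(T& out)
    {
        std::lock_guard<std::mutex> lock(m_Mutex);
        if (m_Queue.empty())
            return false;
        out = m_Queue.front();
        m_Queue.pop();
        return true;
    }

private:
    std::mutex    m_Mutex;
    std::queue<T> m_Queue;
};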
Lock free programming
- Locks:
  - Simple, easy to use and implement.
  - But they serialise code execution.
- Lock Free:
  - Tricky to implement and debug.
Lock Free linked list
- A lock free linked list (ordered).
- Easily generalised to other container classes:
  - Stacks
  - Queues
- Relatively simple to understand.
Adding a node to a list (single threaded): insert b between a and c.
- Step 1: Find where to insert.
- Step 2: newNode->Next = prev->Next;
- Step 3: prev->Next = newNode;
- (Sketched in full below.)
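The three steps as single-threaded C++, with the node type simplified from the deck's Node class.

struct Node
{
    int   m_Key;
    Node* m_Next;
};

// Insert into an ordered list starting at a dummy head node.
void Add(Node* head, Node* newNode)
{
    // Step 1: find where to insert (prev is the node before the slot).
    Node* prev = head;
    while (prev->m_Next && prev->m_Next->m_Key < newNode->m_Key)
        prev = prev->m_Next;

    // Step 2: point the new node at its successor.
    newNode->m_Next = prev->m_Next;

    // Step 3: splice it in.
    prev->m_Next = newNode;
}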
Extending to multiple threads
- What could go wrong?
Add 'b' and 'c' concurrently (both between a and d):
- Both threads find the same insertion point after a.
- Both set newNode->Next = prev->Next;
- Both set prev->Next = newNode; the second write overwrites the first, and one of the new nodes is silently lost.
Extending to multiple threads
- What could go wrong?
  - Another node could be added between a & c.
  - a or c could be deleted.
  - A concurrent read could reach a dangling pointer.
  - Any number of multiples of the above.
- If anything can go wrong, it will.
- So, how do we make it thread safe?
  - Let's examine some solutions.
Coarse Grained Locking
- Lock the list for each add or remove.
- Also lock for reads (find, iterators).
- Will effectively serialise the list:
  - Only one thread at a time can access the list.
  - Correctness at the expense of performance.
A concrete example
- 10 producers:
  - Add 500 random numbers in a tight loop.
- 10 consumers:
  - Remove the 500 numbers in a tight loop.
- Each in its own thread:
  - 21 threads.
- Running on PS3 using SNTuner to profile.
Coarse Grain: adding b
- Step 1: Lock the whole list.
- Steps 2 & 3: Find the insertion point, then insert.
- Step 4: Unlock.
- (Sketched below.)
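A sketch of the coarse-grained variant: a single mutex around the same find-and-insert, reusing the Node struct from the earlier sketch.

#include <mutex>

// Node as in the previous sketch: { int m_Key; Node* m_Next; }
class CoarseList
{
public:
    void Add(Node* newNode)
    {
        std::lock_guard<std::mutex> lock(m_Lock);   // Step 1: lock the list
        // Steps 2 & 3: the single-threaded code is safe under the lock.
        Node* prev = &m_Head;
        while (prev->m_Next && prev->m_Next->m_Key < newNode->m_Key)
            prev = prev->m_Next;
        newNode->m_Next = prev->m_Next;
        prev->m_Next = newNode;
    }   // Step 4: unlock (lock_guard releases on scope exit)

private:
    std::mutex m_Lock;
    Node       m_Head { 0, nullptr };
};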
Coarse Grained locking
- Wide green bars are active locks; little blips are adds or removes.
- Execution took 416ms (profiling significantly impacts performance).
Fine Grained Locking
- Add and Remove only affect their neighbours.
- Give each Node a lock:
  - (So creating a node creates a mutex.)
  - Lock only the neighbours when adding or removing.
- When iterating along the list you must lock/unlock as you go.
Fine Grained Locking: adding b
- Traverse hand-over-hand: lock head, lock a, then release head.
- With a still held, lock its neighbour c: the insertion point is now pinned.
- Link b in between a and c, then unlock a and c.
Fine Grained Locking
- Blocking is much longer, due to the overhead of creating a mutex per node.
- Very slow: > 1200ms.
- A better solution would have been a pool of reusable mutexes.
Optimistic Locking
- Search without locking.
- Lock the nodes once found, then validate them:
  - Valid if you can navigate to them from head.
- If invalid, search from head again.
Optimistic: Add("g") into head -> a -> c -> d -> ... -> f -> k -> m -> tail
- Step 1: Search from head, without locking, for the insertion window (f, k).
- Step 2: Lock f and k.
- Step 3: Validate by re-walking from head: f must still be reachable and f->next must still be k.
- Step 3 FAIL: the list changed underneath us (a concurrent remove); unlock and retry.
- Step 3a: Search and validate again; this time validation succeeds.
- Step 4: Add g between f and k.
- Step 5: Unlock.
Optimistic Caveat
- We can't delete nodes immediately:
  - Another thread could be reading them.
  - Can't rely on the memory not being changed.
- Use deferred garbage collection:
  - Delete in a 'safe' part of a frame.
- Or use invasive lists (supply your own nodes).
- Find() requires validation (locks).
Delete Caveat
- A validating thread re-walks the list from head while another thread deletes 'd'.
- Because 'd' is unlinked but not freed, a walker standing on 'd' can still follow its next pointer back into the list, and validation completes safely.
- Free 'd' immediately and that same walk would dereference dangling memory.
Optimistic Synchronisation
- ~540ms.
- Most time was spent validating.
- Plus there was the overhead of creating a mutex per node for the lock.
- Again, a pool of mutexes would help.
Lazy Synchronisation
- An attempt to speed up Optimistic validation.
- Store a deleted flag in each node.
- Find() is then lock free:
  - Just check the deleted flag on success.
Lazy: Add("g")
- Step 1: Search without locking (head, a, c, ...).
- Step 1a: Meanwhile another thread deletes c: it searches for c,
- Step 1b: locks c and its predecessor,
- Step 1c: sets c's deleted mark, then unlinks it and unlocks.
- Step 2: The adding thread locks f and k, skipping (and unlocking) past marked nodes.
- Step 3: Add/Validate: check that f and k are unmarked and that f->next == k (no re-walk from head), then link g in.
- Step 4: Unlock.
Lazy Synchronisation
- Still need to keep the deleted nodes around.
- Faster than Optimistic: ~330ms.
- But it still serialises.
Lock free (Non-Blocking)
- Can't we just modify Lazy Synchronisation to use CAS?
Delete 'a' and add 'b' concurrently:
- Thread 1 deletes 'a': prev->next = curr->next, i.e. head->next = a->next (which it read as c).
- Thread 2 adds 'b' after a: prev->next = b, i.e. a->next = b.
- Thread 1's write lands, and head now points straight at c. This effectively deletes 'a' and 'b'.
Introducing the AtomicMarkedPtr<>
- A wrapper on a uint32.
- Encapsulates an atomic pointer and a flag.
- Allows testing of a flag and updating of a pointer atomically.
- Uses the LSB for the flag.

  AtomicMarkedPtr<Node> next;
  next->CompareAndSet(eValue, nValue, eFlag, nFlag);
AtomicMarkedPtr<>
- We can now use CAS to set a pointer and check a flag in a single atomic action (one possible implementation is sketched below):
  - i.e. check the deleted status and change the pointer at the same time.

  class Node
  {
  public:
      Node();
      AtomicMarkedPtr<Node> m_Next;
      T                     m_Data;
      int32                 m_Key;
  };
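One plausible implementation, packing the mark into the pointer's least significant bit (valid since nodes are at least 2-byte aligned). The deck wraps a uint32; this sketch uses uintptr_t and C++11 atomics, and the details beyond the CompareAndSet interface are assumptions.

#include <atomic>
#include <cstdint>

template <typename T>
class AtomicMarkedPtr
{
public:
    AtomicMarkedPtr() : m_Bits(0) {}

    T*   GetPtr()  const { return reinterpret_cast<T*>(m_Bits.load() & ~uintptr_t(1)); }
    bool GetMark() const { return (m_Bits.load() & 1) != 0; }

    void Set(T* ptr, bool mark) { m_Bits.store(Pack(ptr, mark)); }

    // Succeeds only if both the pointer AND the mark match expectations.
    bool CompareAndSet(T* ePtr, T* nPtr, bool eMark, bool nMark)
    {
        uintptr_t expected = Pack(ePtr, eMark);
        return m_Bits.compare_exchange_strong(expected, Pack(nPtr, nMark));
    }

private:
    static uintptr_t Pack(T* ptr, bool mark)
    {
        return reinterpret_cast<uintptr_t>(ptr) | uintptr_t(mark ? 1 : 0);
    }
    std::atomic<uintptr_t> m_Bits;   // pointer with the mark in the LSB
};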
Lock Free: Remove 'd' from head -> a -> c -> d -> f -> k -> m -> tail
- Start loop:
- Step 1: Find 'd': if (!InternalFind('d')) continue;  (leaves pred = c, curr = d, succ = f)
- Step 2: Mark 'd' as deleted: if (!curr->next->CAS(succ, succ, false, true)) continue;
- Step 3: Skip 'd': pred->next->CAS(curr, succ, false, false);
- (The whole loop is sketched below.)
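Putting the steps together, the remove loop might look like this. This is a reconstruction using the AtomicMarkedPtr sketch above and the deck's Node class, with InternalFind() assumed to report pred/curr/succ; it is not the deck's exact code.

// Assumed helper: walks the list, skipping (and unlinking) marked nodes,
// and reports the window around 'key'. Declared only; body not shown here.
bool InternalFind(int key, Node*& pred, Node*& curr, Node*& succ);

bool Remove(int key)
{
    for (;;)                                        // "Start loop:"
    {
        Node* pred; Node* curr; Node* succ;
        if (!InternalFind(key, pred, curr, succ))   // Step 1: find the node
            return false;                           // not in the list

        // Step 2: logical delete - mark curr's next pointer. If the CAS
        // fails, someone changed curr->next under us: retry from the top.
        if (!curr->m_Next.CompareAndSet(succ, succ, false, true))
            continue;

        // Step 3: physical unlink. Failure is harmless: the next
        // InternalFind() (ours or another thread's) will skip 'curr'.
        pred->m_Next.CompareAndSet(curr, succ, false, false);
        return true;
    }
}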
LockFree: InternalFind()
- Finds pred and curr.
- Skips marked nodes.
- Consider the list at Step 2 in the previous example,
- and let's introduce a second thread calling InternalFind().
Second InternalFind()
- The second thread walks the list while 'd' is marked but not yet unlinked.
- When it finds that its succ ('d') is marked, it CASes pred->next past the marked node, physically removing it, and carries on.
Lock Free Synchronisation
- No blocking at all.
- The list is always in a consistent state.
- Faster threads help out slower ones.
Lock free
- Full thread usage.
- ~60ms.
- High thread coherency.
Performance comparison (times from the runs above):

  Coarse Grained   ~416ms
  Fine Grained     >1200ms
  Optimistic       ~540ms
  Lazy             ~330ms
  Lock Free        ~60ms
Real world considerations
- Cost of locking
- Context switching
- Memory coherency/latency
- Size/granularity of tasks
Advice
- Build a set of lock free containers.
- Design around data flow.
- Minimise locking.
- You can have more than 'n' threads on an 'n' core machine.
- Profile, profile, profile.
References
- Patterns for Parallel Programming - T. Mattson et al.
- The Art of Multiprocessor Programming - M. Herlihy and N. Shavit
- http://www.top500.org/
- Flow Based Programming - http://www.jpaulmorrison.com/fbp/index.shtml
- http://www.valvesoftware.com/publications/2007/GDC2007_SourceMulticore.pdf
- http://www.netrino.com/node/202
- http://blogs.intel.com/research/2007/08/what_makes_parallel_programmin.php
- The Little Book of Semaphores - http://www.greenteapress.com/semaphores/
- My Blog: 7DOF - http://seven-degrees-of-freedom.blogspot.com/