SlideShare a Scribd company logo
1 of 31
Solution Patterns for Parallel
Programming
CS4532 Concurrent Programming
Dilum Bandara
Dilum.Bandara@uom.lk
Some slides adapted from Dr. Srinath Perera
Outline
 Designing parallel algorithms
 Solution patterns for parallelism
 Loop Parallel
 Fork/Join
 Divide & Conquer
 Pipe Line
 Asynchronous Agents
 Producer/Consumer
 Load balancing
2
Building a Solution by Composition
 We often solve problems by reducing the problem
to a composition of known problems
 Finding the way to Habarana?
 Sorting 1 million integers
 Can we solve this with Mutex & Semaphores?
 Mutex for mutual exclusion
 Semaphores for signaling
 There is another level
3
Designing Parallel Algorithms
 Parallel algorithm design is not easily reduced to
simple recipes
 Parallel version of serial algorithm is not necessarily
optimum
 Good algorithms require creativity
 Goal
 Suggest a framework within which parallel algorithm
design can be explored
 Develop intuition as to what constitutes a good
parallel algorithm
4
Methodical Design
 Partitioning &
communication focus
on concurrency &
scalability
 Agglomeration &
mapping focus on
locality & other
performance issues
5
Source: www.drdobbs.com/parallel/designing-parallel-
algorithms-part-1/223100878
Methodical Design (Cont.)
1. Partitioning
 Decompose computation/data into small tasks/chunks
 Focus on recognizing opportunities for parallel
execution
 Practical issues such as no of CPUs are ignored
2. Communication
 Determine communication required to coordinate task
execution
 Define communication structures & algorithms
6
Methodical Design (Cont.)
3. Agglomeration
 Defined task & communication structures are
evaluated with respect to
 Performance requirements
 Implementation costs
 If necessary, tasks are combined into larger tasks to
improve
 Performance
 Reduce development costs
7
Source: www.drdobbs.com/architecture-and-design/designing-
parallel-algorithms-part-3/223500075
Methodical Design (Cont.)
4. Mapping
 Each task is assigned to a processor while attempting
to satisfy competing goals of
 Maximizing processor utilization
 Minimizing communication costs
 Static mapping
 At design/compile time
 Dynamic mapping
 At runtime by load-balancing algorithms
8
Parallel Algorithm Design Issues
 Efficiency
 Scalability
 Partitioning computations
 Domain decomposition – based on data
 Functional decomposition – based on computation
 Locality
 Spatial & temporal
 Synchronous & asynchronous communication
 Agglomeration to reduce communication
 Load-balancing
9
3 Ways to Parallelize
1. By Data
 Partition data & give it to different threads
2. By Task
 Partition task into smaller tasks & give it to different
threads
3. By Order
 Partition task into steps & give them to different threads
10
By Data
 Use SPMD model
 When data can be processed locally with lower
dependencies with other data
 Patterns
 Loop parallel, embarrassingly parallel
 Large data unit – under utilization
 Small data units – thrashing
 Chunk layout
 Based on dependencies & caching
 Example – Processing geographical data
11
By Task
 Task Parallel, Divide & Conquer
 Too many tasks – thrashing
 Too little tasks – under utilization
 Dependencies among tasks
 Removable
 Code transformations
 Separable
 Accumulation operations (average, sum, count)
 Extrema (max, min)
 Read only, Read/Write
12
By Order
 Pipeline & Asynchronous Agents
 Dependencies
 Temporal – before/after
 Same time
 None
13
Load Balancing
 Some threads will be busy while others are idle
 Counter by distributing load equally
 When cost of problem is well understood this is possible
 e.g., matrix multiplication, known tree walk
 Some other problems are not that simple
 Hard to predict how workload will be distributed  use
dynamic load balancing
 But require communication between threads/tasks
 2 methods for dynamic load balancing
 Task queues
 Work stealing
14
Task Queues
 Multiple instance of task queues (producer
consumer)
 Threads comes to the task queue after finishing a
task & grab next task
 Typically run with thread pool with fixed no of
threads
15
Source: http://blog.zenika.com
Work Stealing
 Every thread has a work/task queue
 When 1 thread runs out of work, it goes to other
task queue & “steal” the work
16
Source: http://karlsenchoi.blogspot.com
Efficiency = Maximizing Parallelism?
 Usually it is 2 things
 Run algorithm in MAX no of threads with minimal
communication/waiting
 When size of the problem grows, algorithm can handle
it by adding new resources
 It’s done by right architecture + tuning
 There are no clear way to do it
 Just like “design patterns” for OOP, people have
identified parallel programming patterns
17
Solution Patterns for Parallelism
 Loop Parallel
 Fork/Join
 Divide and Conquer
 Producer Consumer/ Pipe Line
 Asynchronous Agents
 Producer/Consumer
18
Loop Parallel
 If each iteration in a loop only depends on that
iteration results + read only data, each iteration
can run in a different thread
 As it’s based on data, also called data parallelism
int[] A = .. int[] B = .. int[] C = ..
for (int i; i<N; i++){
C[i] = F(A[i], B[i])
}
19
Which for These are Loop Parallel?
int[] A = .. int[] B = .. int[] C = ..
for (int i; i<N; i++){
C[i] = F(A[i], B[i-1])
}
int[] A = .. int[] B = .. int[] C = ..
for (int i; i<N; i++){
C[i] = F(A[i], C[i-1])
}
20
Implementing Loop Parallel
 OpenMP example
21
Fork/Join
 Fork job into smaller tasks (independent if
possible), perform them, & join them
 Examples
 Calculate the mean across an array
 Tree walk
 How to partition?
 By Data, e.g., SPMD
 By Task, e.g., MPSD
22
Source: http://en.wikipedia.org/wiki/Fork%E2%80%93join_model
Fork/Join (Cont.)
 Size of work Unit
 Small units – thrashing
 Big Unit – imbalance
 Balancing load among threads
 Static allocation
 If data/task is completely known
 E.g., matrix addition
 Dynamic allocation (tree walks)
 Task queues
 Work Stealing
23
Implementing Fork/Join
 Pthreads
 OpenMP
24
Divide & Conquer
 Break problem into recursive sub-problems &
assign them to different threads
 Examples
 Quick sort
 Search for a value in a tree
 Calculating Fibonacci Sequence
 Often fork again, leads to an execution tree
 Recursion
 May or may not have a join step
 Deep tree – thrashing
 Shallow tree – under utilization 25
Divide & Conquer – Fibonacci
Sequence
Source - Introduction to Algorithms (3rd Edition) by Cormen, Leiserson, Rivest and Stein
26
Producer Consumer
 This pattern is often used, as it helps
dynamically balance workload
 E.g., crawling the Web
 Place new links in a queue so others can pick it up
27
Source: http://vichargrave.com/multithreaded-work-queue-in-c/
Pipeline
 Break a task into small steps (which may have
dependencies) & assign execution of steps to
different threads
 Example
 Read file, sort file, & write to file
 Work hand off from step-to-step
 Each task doesn’t gain, but if there are many
instances of the task, we get a better throughput
 Gain come from tuning
 Example – read/write are slow but sort is fast, can
add more threads to read/write & less threads to sort 28
Pipeline (Cont.)
 Long pipeline – high throughput
 Short pipeline – low latency
 Passing data from one stage to another
 Message passing
 Shared queues
29
Asynchronous Agents
 Here task is done by a set of agents
 Working in P2P fashion
 No clear structure
 They talk to each other via asynchronous messages
 Example – Detecting storms using weather data
 Many agents, each know some aspects about storms
 Weather events are sent to them, which in turn fire
other events, leading to detection
30
Source: http://blogs.msdn.com/
31

More Related Content

Similar to Solution Patterns for Parallel Programming

Migration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming ModelsMigration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming ModelsZvi Avraham
 
Design and analysis of computer algorithms
Design and analysis of computer algorithmsDesign and analysis of computer algorithms
Design and analysis of computer algorithms Krishna Chaytaniah
 
Understanding Parallelization of Machine Learning Algorithms in Apache Spark™
Understanding Parallelization of Machine Learning Algorithms in Apache Spark™Understanding Parallelization of Machine Learning Algorithms in Apache Spark™
Understanding Parallelization of Machine Learning Algorithms in Apache Spark™Databricks
 
A Database-Hadoop Hybrid Approach to Scalable Machine Learning
A Database-Hadoop Hybrid Approach to Scalable Machine LearningA Database-Hadoop Hybrid Approach to Scalable Machine Learning
A Database-Hadoop Hybrid Approach to Scalable Machine LearningMakoto Yui
 
Natural Laws of Software Performance
Natural Laws of Software PerformanceNatural Laws of Software Performance
Natural Laws of Software PerformanceGibraltar Software
 
Producer consumer-problems
Producer consumer-problemsProducer consumer-problems
Producer consumer-problemsRichard Ashworth
 
mapReduce for machine learning
mapReduce for machine learning mapReduce for machine learning
mapReduce for machine learning Pranya Prabhakar
 
Moving Towards a Streaming Architecture
Moving Towards a Streaming ArchitectureMoving Towards a Streaming Architecture
Moving Towards a Streaming ArchitectureGabriele Modena
 
Waters Grid & HPC Course
Waters Grid & HPC CourseWaters Grid & HPC Course
Waters Grid & HPC Coursejimliddle
 
Divide and Conquer Case Study
Divide and Conquer Case StudyDivide and Conquer Case Study
Divide and Conquer Case StudyKushagraChadha1
 
Paralle Programming in Python
Paralle Programming in PythonParalle Programming in Python
Paralle Programming in PythonGhazal Tashakor
 
Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...
Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...
Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...Mumbai Academisc
 
Top 3 design patterns in Map Reduce
Top 3 design patterns in Map ReduceTop 3 design patterns in Map Reduce
Top 3 design patterns in Map ReduceEdureka!
 
A Survey of Machine Learning Methods Applied to Computer ...
A Survey of Machine Learning Methods Applied to Computer ...A Survey of Machine Learning Methods Applied to Computer ...
A Survey of Machine Learning Methods Applied to Computer ...butest
 

Similar to Solution Patterns for Parallel Programming (20)

Migration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming ModelsMigration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming Models
 
Design and analysis of computer algorithms
Design and analysis of computer algorithmsDesign and analysis of computer algorithms
Design and analysis of computer algorithms
 
Understanding Parallelization of Machine Learning Algorithms in Apache Spark™
Understanding Parallelization of Machine Learning Algorithms in Apache Spark™Understanding Parallelization of Machine Learning Algorithms in Apache Spark™
Understanding Parallelization of Machine Learning Algorithms in Apache Spark™
 
Resisting skew accumulation
Resisting skew accumulationResisting skew accumulation
Resisting skew accumulation
 
A Database-Hadoop Hybrid Approach to Scalable Machine Learning
A Database-Hadoop Hybrid Approach to Scalable Machine LearningA Database-Hadoop Hybrid Approach to Scalable Machine Learning
A Database-Hadoop Hybrid Approach to Scalable Machine Learning
 
Natural Laws of Software Performance
Natural Laws of Software PerformanceNatural Laws of Software Performance
Natural Laws of Software Performance
 
Producer consumer-problems
Producer consumer-problemsProducer consumer-problems
Producer consumer-problems
 
Hpc 6 7
Hpc 6 7Hpc 6 7
Hpc 6 7
 
mapReduce for machine learning
mapReduce for machine learning mapReduce for machine learning
mapReduce for machine learning
 
Moving Towards a Streaming Architecture
Moving Towards a Streaming ArchitectureMoving Towards a Streaming Architecture
Moving Towards a Streaming Architecture
 
Waters Grid & HPC Course
Waters Grid & HPC CourseWaters Grid & HPC Course
Waters Grid & HPC Course
 
Lect1.pptx
Lect1.pptxLect1.pptx
Lect1.pptx
 
Basic Terminology of Data Structure.pptx
Basic Terminology of Data Structure.pptxBasic Terminology of Data Structure.pptx
Basic Terminology of Data Structure.pptx
 
Divide and Conquer Case Study
Divide and Conquer Case StudyDivide and Conquer Case Study
Divide and Conquer Case Study
 
Ssbse10.ppt
Ssbse10.pptSsbse10.ppt
Ssbse10.ppt
 
Paralle Programming in Python
Paralle Programming in PythonParalle Programming in Python
Paralle Programming in Python
 
ML.pdf
ML.pdfML.pdf
ML.pdf
 
Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...
Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...
Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...
 
Top 3 design patterns in Map Reduce
Top 3 design patterns in Map ReduceTop 3 design patterns in Map Reduce
Top 3 design patterns in Map Reduce
 
A Survey of Machine Learning Methods Applied to Computer ...
A Survey of Machine Learning Methods Applied to Computer ...A Survey of Machine Learning Methods Applied to Computer ...
A Survey of Machine Learning Methods Applied to Computer ...
 

More from Dilum Bandara

Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningDilum Bandara
 
Time Series Analysis and Forecasting in Practice
Time Series Analysis and Forecasting in PracticeTime Series Analysis and Forecasting in Practice
Time Series Analysis and Forecasting in PracticeDilum Bandara
 
Introduction to Dimension Reduction with PCA
Introduction to Dimension Reduction with PCAIntroduction to Dimension Reduction with PCA
Introduction to Dimension Reduction with PCADilum Bandara
 
Introduction to Descriptive & Predictive Analytics
Introduction to Descriptive & Predictive AnalyticsIntroduction to Descriptive & Predictive Analytics
Introduction to Descriptive & Predictive AnalyticsDilum Bandara
 
Introduction to Concurrent Data Structures
Introduction to Concurrent Data StructuresIntroduction to Concurrent Data Structures
Introduction to Concurrent Data StructuresDilum Bandara
 
Hard to Paralelize Problems: Matrix-Vector and Matrix-Matrix
Hard to Paralelize Problems: Matrix-Vector and Matrix-MatrixHard to Paralelize Problems: Matrix-Vector and Matrix-Matrix
Hard to Paralelize Problems: Matrix-Vector and Matrix-MatrixDilum Bandara
 
Introduction to Map-Reduce Programming with Hadoop
Introduction to Map-Reduce Programming with HadoopIntroduction to Map-Reduce Programming with Hadoop
Introduction to Map-Reduce Programming with HadoopDilum Bandara
 
Embarrassingly/Delightfully Parallel Problems
Embarrassingly/Delightfully Parallel ProblemsEmbarrassingly/Delightfully Parallel Problems
Embarrassingly/Delightfully Parallel ProblemsDilum Bandara
 
Introduction to Warehouse-Scale Computers
Introduction to Warehouse-Scale ComputersIntroduction to Warehouse-Scale Computers
Introduction to Warehouse-Scale ComputersDilum Bandara
 
Introduction to Thread Level Parallelism
Introduction to Thread Level ParallelismIntroduction to Thread Level Parallelism
Introduction to Thread Level ParallelismDilum Bandara
 
CPU Memory Hierarchy and Caching Techniques
CPU Memory Hierarchy and Caching TechniquesCPU Memory Hierarchy and Caching Techniques
CPU Memory Hierarchy and Caching TechniquesDilum Bandara
 
Data-Level Parallelism in Microprocessors
Data-Level Parallelism in MicroprocessorsData-Level Parallelism in Microprocessors
Data-Level Parallelism in MicroprocessorsDilum Bandara
 
Instruction Level Parallelism – Hardware Techniques
Instruction Level Parallelism – Hardware TechniquesInstruction Level Parallelism – Hardware Techniques
Instruction Level Parallelism – Hardware TechniquesDilum Bandara
 
Instruction Level Parallelism – Compiler Techniques
Instruction Level Parallelism – Compiler TechniquesInstruction Level Parallelism – Compiler Techniques
Instruction Level Parallelism – Compiler TechniquesDilum Bandara
 
CPU Pipelining and Hazards - An Introduction
CPU Pipelining and Hazards - An IntroductionCPU Pipelining and Hazards - An Introduction
CPU Pipelining and Hazards - An IntroductionDilum Bandara
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
High Performance Networking with Advanced TCP
High Performance Networking with Advanced TCPHigh Performance Networking with Advanced TCP
High Performance Networking with Advanced TCPDilum Bandara
 
Introduction to Content Delivery Networks
Introduction to Content Delivery NetworksIntroduction to Content Delivery Networks
Introduction to Content Delivery NetworksDilum Bandara
 
Peer-to-Peer Networking Systems and Streaming
Peer-to-Peer Networking Systems and StreamingPeer-to-Peer Networking Systems and Streaming
Peer-to-Peer Networking Systems and StreamingDilum Bandara
 

More from Dilum Bandara (20)

Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Time Series Analysis and Forecasting in Practice
Time Series Analysis and Forecasting in PracticeTime Series Analysis and Forecasting in Practice
Time Series Analysis and Forecasting in Practice
 
Introduction to Dimension Reduction with PCA
Introduction to Dimension Reduction with PCAIntroduction to Dimension Reduction with PCA
Introduction to Dimension Reduction with PCA
 
Introduction to Descriptive & Predictive Analytics
Introduction to Descriptive & Predictive AnalyticsIntroduction to Descriptive & Predictive Analytics
Introduction to Descriptive & Predictive Analytics
 
Introduction to Concurrent Data Structures
Introduction to Concurrent Data StructuresIntroduction to Concurrent Data Structures
Introduction to Concurrent Data Structures
 
Hard to Paralelize Problems: Matrix-Vector and Matrix-Matrix
Hard to Paralelize Problems: Matrix-Vector and Matrix-MatrixHard to Paralelize Problems: Matrix-Vector and Matrix-Matrix
Hard to Paralelize Problems: Matrix-Vector and Matrix-Matrix
 
Introduction to Map-Reduce Programming with Hadoop
Introduction to Map-Reduce Programming with HadoopIntroduction to Map-Reduce Programming with Hadoop
Introduction to Map-Reduce Programming with Hadoop
 
Embarrassingly/Delightfully Parallel Problems
Embarrassingly/Delightfully Parallel ProblemsEmbarrassingly/Delightfully Parallel Problems
Embarrassingly/Delightfully Parallel Problems
 
Introduction to Warehouse-Scale Computers
Introduction to Warehouse-Scale ComputersIntroduction to Warehouse-Scale Computers
Introduction to Warehouse-Scale Computers
 
Introduction to Thread Level Parallelism
Introduction to Thread Level ParallelismIntroduction to Thread Level Parallelism
Introduction to Thread Level Parallelism
 
CPU Memory Hierarchy and Caching Techniques
CPU Memory Hierarchy and Caching TechniquesCPU Memory Hierarchy and Caching Techniques
CPU Memory Hierarchy and Caching Techniques
 
Data-Level Parallelism in Microprocessors
Data-Level Parallelism in MicroprocessorsData-Level Parallelism in Microprocessors
Data-Level Parallelism in Microprocessors
 
Instruction Level Parallelism – Hardware Techniques
Instruction Level Parallelism – Hardware TechniquesInstruction Level Parallelism – Hardware Techniques
Instruction Level Parallelism – Hardware Techniques
 
Instruction Level Parallelism – Compiler Techniques
Instruction Level Parallelism – Compiler TechniquesInstruction Level Parallelism – Compiler Techniques
Instruction Level Parallelism – Compiler Techniques
 
CPU Pipelining and Hazards - An Introduction
CPU Pipelining and Hazards - An IntroductionCPU Pipelining and Hazards - An Introduction
CPU Pipelining and Hazards - An Introduction
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
High Performance Networking with Advanced TCP
High Performance Networking with Advanced TCPHigh Performance Networking with Advanced TCP
High Performance Networking with Advanced TCP
 
Introduction to Content Delivery Networks
Introduction to Content Delivery NetworksIntroduction to Content Delivery Networks
Introduction to Content Delivery Networks
 
Peer-to-Peer Networking Systems and Streaming
Peer-to-Peer Networking Systems and StreamingPeer-to-Peer Networking Systems and Streaming
Peer-to-Peer Networking Systems and Streaming
 
Mobile Services
Mobile ServicesMobile Services
Mobile Services
 

Recently uploaded

Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfCionsystems
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 

Recently uploaded (20)

Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdf
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 

Solution Patterns for Parallel Programming

  • 1. Solution Patterns for Parallel Programming CS4532 Concurrent Programming Dilum Bandara Dilum.Bandara@uom.lk Some slides adapted from Dr. Srinath Perera
  • 2. Outline  Designing parallel algorithms  Solution patterns for parallelism  Loop Parallel  Fork/Join  Divide & Conquer  Pipe Line  Asynchronous Agents  Producer/Consumer  Load balancing 2
  • 3. Building a Solution by Composition  We often solve problems by reducing the problem to a composition of known problems  Finding the way to Habarana?  Sorting 1 million integers  Can we solve this with Mutex & Semaphores?  Mutex for mutual exclusion  Semaphores for signaling  There is another level 3
  • 4. Designing Parallel Algorithms  Parallel algorithm design is not easily reduced to simple recipes  Parallel version of serial algorithm is not necessarily optimum  Good algorithms require creativity  Goal  Suggest a framework within which parallel algorithm design can be explored  Develop intuition as to what constitutes a good parallel algorithm 4
  • 5. Methodical Design  Partitioning & communication focus on concurrency & scalability  Agglomeration & mapping focus on locality & other performance issues 5 Source: www.drdobbs.com/parallel/designing-parallel- algorithms-part-1/223100878
  • 6. Methodical Design (Cont.) 1. Partitioning  Decompose computation/data into small tasks/chunks  Focus on recognizing opportunities for parallel execution  Practical issues such as no of CPUs are ignored 2. Communication  Determine communication required to coordinate task execution  Define communication structures & algorithms 6
  • 7. Methodical Design (Cont.) 3. Agglomeration  Defined task & communication structures are evaluated with respect to  Performance requirements  Implementation costs  If necessary, tasks are combined into larger tasks to improve  Performance  Reduce development costs 7 Source: www.drdobbs.com/architecture-and-design/designing- parallel-algorithms-part-3/223500075
  • 8. Methodical Design (Cont.) 4. Mapping  Each task is assigned to a processor while attempting to satisfy competing goals of  Maximizing processor utilization  Minimizing communication costs  Static mapping  At design/compile time  Dynamic mapping  At runtime by load-balancing algorithms 8
  • 9. Parallel Algorithm Design Issues  Efficiency  Scalability  Partitioning computations  Domain decomposition – based on data  Functional decomposition – based on computation  Locality  Spatial & temporal  Synchronous & asynchronous communication  Agglomeration to reduce communication  Load-balancing 9
  • 10. 3 Ways to Parallelize 1. By Data  Partition data & give it to different threads 2. By Task  Partition task into smaller tasks & give it to different threads 3. By Order  Partition task into steps & give them to different threads 10
  • 11. By Data  Use SPMD model  When data can be processed locally with lower dependencies with other data  Patterns  Loop parallel, embarrassingly parallel  Large data unit – under utilization  Small data units – thrashing  Chunk layout  Based on dependencies & caching  Example – Processing geographical data 11
  • 12. By Task  Task Parallel, Divide & Conquer  Too many tasks – thrashing  Too little tasks – under utilization  Dependencies among tasks  Removable  Code transformations  Separable  Accumulation operations (average, sum, count)  Extrema (max, min)  Read only, Read/Write 12
  • 13. By Order  Pipeline & Asynchronous Agents  Dependencies  Temporal – before/after  Same time  None 13
  • 14. Load Balancing  Some threads will be busy while others are idle  Counter by distributing load equally  When cost of problem is well understood this is possible  e.g., matrix multiplication, known tree walk  Some other problems are not that simple  Hard to predict how workload will be distributed  use dynamic load balancing  But require communication between threads/tasks  2 methods for dynamic load balancing  Task queues  Work stealing 14
  • 15. Task Queues  Multiple instance of task queues (producer consumer)  Threads comes to the task queue after finishing a task & grab next task  Typically run with thread pool with fixed no of threads 15 Source: http://blog.zenika.com
  • 16. Work Stealing  Every thread has a work/task queue  When 1 thread runs out of work, it goes to other task queue & “steal” the work 16 Source: http://karlsenchoi.blogspot.com
  • 17. Efficiency = Maximizing Parallelism?  Usually it is 2 things  Run algorithm in MAX no of threads with minimal communication/waiting  When size of the problem grows, algorithm can handle it by adding new resources  It’s done by right architecture + tuning  There are no clear way to do it  Just like “design patterns” for OOP, people have identified parallel programming patterns 17
  • 18. Solution Patterns for Parallelism  Loop Parallel  Fork/Join  Divide and Conquer  Producer Consumer/ Pipe Line  Asynchronous Agents  Producer/Consumer 18
  • 19. Loop Parallel  If each iteration in a loop only depends on that iteration results + read only data, each iteration can run in a different thread  As it’s based on data, also called data parallelism int[] A = .. int[] B = .. int[] C = .. for (int i; i<N; i++){ C[i] = F(A[i], B[i]) } 19
  • 20. Which for These are Loop Parallel? int[] A = .. int[] B = .. int[] C = .. for (int i; i<N; i++){ C[i] = F(A[i], B[i-1]) } int[] A = .. int[] B = .. int[] C = .. for (int i; i<N; i++){ C[i] = F(A[i], C[i-1]) } 20
  • 21. Implementing Loop Parallel  OpenMP example 21
  • 22. Fork/Join  Fork job into smaller tasks (independent if possible), perform them, & join them  Examples  Calculate the mean across an array  Tree walk  How to partition?  By Data, e.g., SPMD  By Task, e.g., MPSD 22 Source: http://en.wikipedia.org/wiki/Fork%E2%80%93join_model
  • 23. Fork/Join (Cont.)  Size of work Unit  Small units – thrashing  Big Unit – imbalance  Balancing load among threads  Static allocation  If data/task is completely known  E.g., matrix addition  Dynamic allocation (tree walks)  Task queues  Work Stealing 23
  • 25. Divide & Conquer  Break problem into recursive sub-problems & assign them to different threads  Examples  Quick sort  Search for a value in a tree  Calculating Fibonacci Sequence  Often fork again, leads to an execution tree  Recursion  May or may not have a join step  Deep tree – thrashing  Shallow tree – under utilization 25
  • 26. Divide & Conquer – Fibonacci Sequence Source - Introduction to Algorithms (3rd Edition) by Cormen, Leiserson, Rivest and Stein 26
  • 27. Producer Consumer  This pattern is often used, as it helps dynamically balance workload  E.g., crawling the Web  Place new links in a queue so others can pick it up 27 Source: http://vichargrave.com/multithreaded-work-queue-in-c/
  • 28. Pipeline  Break a task into small steps (which may have dependencies) & assign execution of steps to different threads  Example  Read file, sort file, & write to file  Work hand off from step-to-step  Each task doesn’t gain, but if there are many instances of the task, we get a better throughput  Gain come from tuning  Example – read/write are slow but sort is fast, can add more threads to read/write & less threads to sort 28
  • 29. Pipeline (Cont.)  Long pipeline – high throughput  Short pipeline – low latency  Passing data from one stage to another  Message passing  Shared queues 29
  • 30. Asynchronous Agents  Here task is done by a set of agents  Working in P2P fashion  No clear structure  They talk to each other via asynchronous messages  Example – Detecting storms using weather data  Many agents, each know some aspects about storms  Weather events are sent to them, which in turn fire other events, leading to detection 30 Source: http://blogs.msdn.com/
  • 31. 31

Editor's Notes

  1. Shovel example
  2. Along A6 after Dambulla