Lecture 2
More about Parallel
Computing
Vajira Thambawita
Parallel Computer Memory Architectures
- Shared Memory
• Multiple processors can work independently but share the same
memory resources
• Shared memory machines can be divided into two groups based upon
memory access time:
UMA: Uniform Memory Access
NUMA: Non-Uniform Memory Access
Parallel Computer Memory Architectures
- Shared Memory
• All processors have equal access and equal access times to memory
• Most commonly represented today by Symmetric
Multiprocessor (SMP) machines
Uniform Memory Access (UMA)
Parallel Computer Memory Architectures
- Shared Memory
Non-Uniform Memory Access (NUMA)
• Not all processors have equal memory
access time
Parallel Computer Memory Architectures
- Distributed Memory
• Each processor has its own local memory
(there is no concept of a global address space)
• Each processor operates independently
• Communications in message passing systems
are performed via send and receive operations
Parallel Computer Memory Architectures
– Hybrid Distributed-Shared Memory
• Used in the largest and fastest computers in the
world today
Parallel Programming Models
Shared Memory Model (without threads)
• In this programming model, processes/tasks share a common address
space, which they read and write to asynchronously.
Parallel Programming Models
Threads Model
• This programming model is a type of
shared memory programming.
• In the threads model of parallel
programming, a single "heavy weight"
process can have multiple "light
weight", concurrent execution paths.
• Ex: POSIX Threads, OpenMP,
Microsoft threads, Java threads, Python
threads, CUDA threads for GPUs
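The threads model can be illustrated with a minimal Python sketch. The function name `parallel_sum` and the choice of a lock-protected shared total are assumptions made for illustration, not part of the lecture. Note also that in standard CPython builds the global interpreter lock limits the CPU speedup of pure-Python threads, so this shows the structure of the model rather than its performance.

```python
import threading

def parallel_sum(values, num_threads=4):
    """Sum `values` with several threads that share one address space."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)      # private work on this thread's chunk
        with lock:                # serialize updates to the shared variable
            total += partial

    # Cyclic split: thread i gets elements i, i + num_threads, ...
    chunks = [values[i::num_threads] for i in range(num_threads)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                  # wait for all "light weight" paths to finish
    return total
```

All threads read and write the same `total`, which is why the lock is needed; without it, concurrent updates could be lost.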
Parallel Programming Models
Distributed Memory / Message
Passing Model
• A set of tasks that use their
own local memory during
computation. Multiple tasks
can reside on the same physical
machine and/or across an
arbitrary number of machines.
• Tasks exchange data through
communications by sending
and receiving messages.
• Ex:
• Message Passing Interface
(MPI)
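The send/receive pattern can be sketched in Python as follows. This is not MPI: as a stand-in, each task is a thread with its own `queue.Queue` mailbox, and the names `run_message_passing`, `inboxes`, and `results` are hypothetical. The point is only that tasks work on private copies of data and cooperate through explicit messages.

```python
import queue
import threading

def run_message_passing(data, num_workers=3):
    """Each worker uses only its local data and communicates via messages."""
    inboxes = [queue.Queue() for _ in range(num_workers)]  # one mailbox per worker
    results = queue.Queue()                                # coordinator's mailbox

    def worker(rank):
        local = inboxes[rank].get()        # "receive" this task's data
        results.put((rank, sum(local)))    # "send" the partial result back

    threads = [threading.Thread(target=worker, args=(r,))
               for r in range(num_workers)]
    for t in threads:
        t.start()
    for rank in range(num_workers):
        # Slicing a list makes a copy, so each worker gets private data.
        inboxes[rank].put(data[rank::num_workers])
    for t in threads:
        t.join()
    partials = dict(results.get() for _ in range(num_workers))
    return sum(partials.values())
```

In a real MPI program the two `put`/`get` pairs would correspond to matching send and receive calls between ranks, possibly on different machines.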
Parallel Programming Models
Data Parallel Model
• May also be referred to as the Partitioned Global Address Space (PGAS)
model.
• Ex: Coarray Fortran, Unified Parallel C (UPC), X10
Parallel Programming Models
Hybrid Model
• A hybrid model combines more than one of the previously described
programming models.
Parallel Programming Models
SPMD and MPMD
Single Program Multiple Data (SPMD)
Multiple Program Multiple Data (MPMD)
Designing Parallel Programs
Automatic vs. Manual Parallelization
• Fully Automatic
• The compiler analyzes the source code and identifies opportunities for
parallelism.
• The analysis includes identifying inhibitors to parallelism and possibly a
cost weighting on whether or not the parallelism would actually improve
performance.
• Loops (do, for) are the most frequent target for automatic
parallelization.
• Programmer Directed
• Using "compiler directives" or possibly compiler flags, the programmer
explicitly tells the compiler how to parallelize the code.
• May be able to be used in conjunction with some degree of automatic
parallelization also.
Designing Parallel Programs
Understand the Problem and the Program
• An easy-to-parallelize problem
• Calculate the potential energy for each of several thousand independent
conformations of a molecule. When done, find the minimum energy
conformation.
• A problem with little-to-no parallelism
• Calculation of the Fibonacci series (0,1,1,2,3,5,8,13,21,...) by use of the
formula:
• F(n) = F(n-1) + F(n-2)
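The contrast between the two problems can be sketched in Python. The `energy` function is a hypothetical placeholder for a real potential-energy calculation, and the thread pool is just one possible way to run independent evaluations. The Fibonacci loop, by contrast, needs the two previous terms before it can compute the next one, so its iterations cannot run independently when this formula is used.

```python
from concurrent.futures import ThreadPoolExecutor

def energy(conformation):
    # Hypothetical stand-in for a real potential-energy calculation.
    return sum(x * x for x in conformation)

def min_energy_conformation(conformations, workers=4):
    """Evaluate every conformation independently, then pick the minimum."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        energies = list(pool.map(energy, conformations))  # independent tasks
    best = min(range(len(conformations)), key=energies.__getitem__)
    return conformations[best], energies[best]

def fibonacci(n):
    """F(n) with F(0) = 0, F(1) = 1; each step depends on the previous two."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b   # cannot start before the previous step finishes
    return a
```

Only the final `min` in the first function needs results from all tasks; everything before it is independent work.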
Designing Parallel Programs
Partitioning
• One of the first steps in designing a parallel program is to break the
problem into discrete "chunks" of work that can be distributed to
multiple tasks. This is known as decomposition or partitioning.
Two ways:
• Domain decomposition
• Functional decomposition
Designing Parallel Programs
Domain Decomposition
The data associated with a problem is decomposed
There are different ways to partition data:
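Two common ways to partition one-dimensional data, block and cyclic, can be sketched in Python. The function names are assumptions for illustration; each function returns, for every task, the list of data indices that task would own.

```python
def block_partition(n, tasks):
    """Give each task one contiguous block of indices 0..n-1."""
    base, extra = divmod(n, tasks)
    parts, start = [], 0
    for t in range(tasks):
        size = base + (1 if t < extra else 0)  # spread the remainder evenly
        parts.append(list(range(start, start + size)))
        start += size
    return parts

def cyclic_partition(n, tasks):
    """Deal indices out round-robin: task t gets t, t + tasks, ..."""
    return [list(range(t, n, tasks)) for t in range(tasks)]
```

For 10 items and 3 tasks, the block scheme yields `[0..3]`, `[4..6]`, `[7..9]`, while the cyclic scheme yields `[0, 3, 6, 9]`, `[1, 4, 7]`, `[2, 5, 8]`.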
Designing Parallel Programs
Functional Decomposition
The problem is decomposed according to the work that must be done
Designing Parallel Programs
You DON'T need communications
• Some types of problems can be
decomposed and executed in parallel with
virtually no need for tasks to share data.
• Ex: Every pixel in a black and white image
needs to have its color reversed
You DO need communications
• Most problems require tasks to share data with each
other
• A 2-D heat diffusion problem requires a
task to know the temperatures calculated
by the tasks that have neighboring data
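The difference between the two cases shows up directly in the data each update reads. The Python sketch below assumes an 8-bit grayscale image stored as nested lists and a simple explicit finite-difference update with a hypothetical coefficient `alpha`; neither detail comes from the slides.

```python
def invert_pixels(image):
    """Each output pixel depends only on the same input pixel."""
    return [[255 - p for p in row] for row in image]

def heat_step(grid, alpha=0.1):
    """One explicit update of the interior points of a 2-D grid."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]          # boundary values stay fixed
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Each point reads its four neighbours from the previous step.
            neighbours = (grid[i - 1][j] + grid[i + 1][j]
                          + grid[i][j - 1] + grid[i][j + 1])
            new[i][j] = grid[i][j] + alpha * (neighbours - 4 * grid[i][j])
    return new
```

If the grid were split among tasks, a task updating points on the edge of its block would need the neighbouring values owned by another task, which is where communication enters. The pixel inversion has no such dependency.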
Designing Parallel Programs
Factors to Consider (designing your program's inter-task
communications)
• Communication overhead
• Latency vs. Bandwidth
• Visibility of communications
• Synchronous vs. asynchronous communications
• Scope of communications
• Efficiency of communications
Designing Parallel Programs
Granularity
• In parallel computing, granularity is a qualitative measure of the ratio of
computation to communication. (Computation / Communication)
• Periods of computation are typically separated from periods of
communication by synchronization events.
• Fine-grain Parallelism
• Coarse-grain Parallelism
Designing Parallel Programs
• Fine-grain Parallelism
• Relatively small amounts of computational work
are done between communication events
• Low computation to communication ratio
• Facilitates load balancing
• Implies high communication overhead and less
opportunity for performance enhancement
• If granularity is too fine it is possible that the
overhead required for communications and
synchronization between tasks takes longer than
the computation.
• Coarse-grain Parallelism
• Relatively large amounts of computational work
are done between communication/synchronization
events
• High computation to communication ratio
• Implies more opportunity for performance increase
• Harder to load balance efficiently
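The trade-off above can be made concrete with a deliberately simple cost model in Python. Everything here is an assumption for illustration: work is split into equal chunks, each chunk costs `chunk_size * t_compute` of computation plus one communication event of cost `t_comm`, and chunks are dealt evenly to tasks. Real systems behave in more complicated ways.

```python
import math

def granularity_ratio(chunk_size, t_compute, t_comm):
    """Computation / communication for one chunk of work."""
    return (chunk_size * t_compute) / t_comm

def estimated_time(n_items, chunk_size, tasks, t_compute, t_comm):
    """Toy estimate of elapsed time for the most heavily loaded task."""
    chunks = math.ceil(n_items / chunk_size)
    rounds = math.ceil(chunks / tasks)   # chunks handled by the busiest task
    return rounds * (chunk_size * t_compute + t_comm)
```

With 1000 items, 4 tasks, `t_compute = 1.0` and `t_comm = 5.0`, a chunk size of 1 gives 1500.0 (communication dominates), a chunk size of 250 gives 255.0, and a chunk size of 1000 gives 1005.0 because a single task does all the work. This mirrors the points above: too fine a grain pays for overhead, too coarse a grain is hard to load balance.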
Designing Parallel Programs
I/O
• Rule #1: Reduce overall I/O as much as possible
• If you have access to a parallel file system, use it.
• Writing large chunks of data rather than small chunks is usually
significantly more efficient.
• Fewer, larger files perform better than many small files.
• Confine I/O to specific serial portions of the job, and then use parallel
communications to distribute data to parallel tasks. For example, Task 1
could read an input file and then communicate required data to other
tasks. Likewise, Task 1 could perform the write operation after receiving
required data from all other tasks.
• Aggregate I/O operations across tasks - rather than having many tasks
perform I/O, have a subset of tasks perform it.
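The "confine I/O to serial portions" guideline can be sketched in Python as follows. The function name, the one-integer-per-line input format, and the use of a thread pool as the parallel tasks are all assumptions for illustration; any file-like objects can be passed in.

```python
import io
from concurrent.futures import ThreadPoolExecutor

def process_with_serial_io(infile, outfile, workers=4):
    """One task reads, parallel tasks compute, one task writes."""
    # Serial portion: a single read of the whole input.
    numbers = [int(tok) for tok in infile.read().split()]
    # Parallel portion: distribute the data and compute partial sums.
    chunks = [numbers[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    # Serial portion: gather the results and write them in one large chunk.
    outfile.write("\n".join(str(s) for s in partial_sums) + "\n")
    return sum(partial_sums)
```

For example, calling it with `io.StringIO("1\n2\n3\n4\n5\n")` as input, an empty `io.StringIO()` as output, and `workers=2` returns 15 and writes the two partial sums in a single write call.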
Designing Parallel Programs
Debugging
• TotalView from RogueWave Software
• DDT from Allinea
• Inspector from Intel
Performance Analysis and Tuning
• LC's web pages at https://hpc.llnl.gov/software/development-environment-software
• TAU: http://www.cs.uoregon.edu/research/tau/docs.php
• HPCToolkit: http://hpctoolkit.org/documentation.html
• Open|Speedshop: http://www.openspeedshop.org/
• Vampir / Vampirtrace: http://vampir.eu/
• Valgrind: http://valgrind.org/
• PAPI: http://icl.cs.utk.edu/papi/
• mpitrace: https://computing.llnl.gov/tutorials/bgq/index.html#mpitrace
• mpiP: http://mpip.sourceforge.net/
• memP: http://memp.sourceforge.net/
Summary

Lecture 2 more about parallel computing

  • 1. Lecture 2 More about Parallel Computing Vajira Thambawita
  • 2. Parallel Computer Memory Architectures - Shared Memory • Multiple processors can work independently but share the same memory resources • Shared memory machines can be divided into two groups based upon memory access time: UMA: Uniform Memory Access NUMA: Non-Uniform Memory Access
  • 3. Parallel Computer Memory Architectures - Shared Memory • Equal accesses and access times to memory • Most commonly represented today by Symmetric Multiprocessor (SMP) machines Uniform Memory Access (UMA)
  • 4. Parallel Computer Memory Architectures - Shared Memory Non-Uniform Memory Access (NUMA) • Not all processors have equal memory access time
  • 5. Parallel Computer Memory Architectures - Distributed Memory • Each processor has its own memory (there is no concept of a global address space) • Each processor operates independently • Communications in message passing systems are performed via send and receive operations
  • 6. Parallel Computer Memory Architectures – Hybrid Distributed-Shared Memory • Used in the largest and fastest computers in the world today
  • 7. Parallel Programming Models Shared Memory Model (without threads) • In this programming model, processes/tasks share a common address space, which they read and write to asynchronously.
  • 8. Parallel Programming Models Threads Model • This programming model is a type of shared memory programming. • In the threads model of parallel programming, a single "heavy weight" process can have multiple "light weight", concurrent execution paths. • Ex: POSIX Threads, OpenMP, Microsoft threads, Java and Python threads, CUDA threads for GPUs
  • 9. Parallel Programming Models Distributed Memory / Message Passing Model • A set of tasks that use their own local memory during computation. Multiple tasks can reside on the same physical machine and/or across an arbitrary number of machines. • Tasks exchange data through communications by sending and receiving messages. • Ex: • Message Passing Interface (MPI)
  • 10. Parallel Programming Models Data Parallel Model • May also be referred to as the Partitioned Global Address Space (PGAS) model. • Ex: Coarray Fortran, Unified Parallel C (UPC), X10
  • 11. Parallel Programming Models Hybrid Model • A hybrid model combines more than one of the previously described programming models.
  • 12. Parallel Programming Models SPMD and MPMD Single Program Multiple Data (SPMD) Multiple Program Multiple Data (MPMD)
  • 13. Designing Parallel Programs Automatic vs. Manual Parallelization • Fully Automatic • The compiler analyzes the source code and identifies opportunities for parallelism. • The analysis includes identifying inhibitors to parallelism and possibly a cost weighting on whether or not the parallelism would actually improve performance. • Loops (do, for) are the most frequent target for automatic parallelization. • Programmer Directed • Using "compiler directives" or possibly compiler flags, the programmer explicitly tells the compiler how to parallelize the code. • Can also be combined with some degree of automatic parallelization.
  • 14. Designing Parallel Programs Understand the Problem and the Program • Easy to parallelize problem • Calculate the potential energy for each of several thousand independent conformations of a molecule. When done, find the minimum energy conformation. • A problem with little-to-no parallelism • Calculation of the Fibonacci series (0,1,1,2,3,5,8,13,21,...) by use of the formula: • F(n) = F(n-1) + F(n-2)
  • 15. Designing Parallel Programs Partitioning • One of the first steps in designing a parallel program is to break the problem into discrete "chunks" of work that can be distributed to multiple tasks. This is known as decomposition or partitioning. Two ways: • Domain decomposition • Functional decomposition
  • 16. Designing Parallel Programs Domain Decomposition The data associated with a problem is decomposed There are different ways to partition data:
  • 17. Designing Parallel Programs Functional Decomposition The problem is decomposed according to the work that must be done
  • 18. Designing Parallel Programs You DON'T need communications • Some types of problems can be decomposed and executed in parallel with virtually no need for tasks to share data. • Ex: Every pixel in a black and white image needs to have its color reversed You DO need communications • This requires tasks to share data with each other • Ex: A 2-D heat diffusion problem requires a task to know the temperatures calculated by the tasks that have neighboring data
  • 19. Designing Parallel Programs Factors to Consider (designing your program's inter-task communications) • Communication overhead • Latency vs. Bandwidth • Visibility of communications • Synchronous vs. asynchronous communications • Scope of communications • Efficiency of communications
  • 20. Designing Parallel Programs Granularity • In parallel computing, granularity is a qualitative measure of the ratio of computation to communication. (Computation / Communication) • Periods of computation are typically separated from periods of communication by synchronization events. • Fine-grain Parallelism • Coarse-grain Parallelism
  • 21. Designing Parallel Programs • Fine-grain Parallelism • Relatively small amounts of computational work are done between communication events • Low computation to communication ratio • Facilitates load balancing • Implies high communication overhead and less opportunity for performance enhancement • If granularity is too fine it is possible that the overhead required for communications and synchronization between tasks takes longer than the computation. • Coarse-grain Parallelism • Relatively large amounts of computational work are done between communication/synchronization events • High computation to communication ratio • Implies more opportunity for performance increase • Harder to load balance efficiently
  • 22. Designing Parallel Programs I/O • Rule #1: Reduce overall I/O as much as possible • If you have access to a parallel file system, use it. • Writing large chunks of data rather than small chunks is usually significantly more efficient. • Fewer, larger files perform better than many small files. • Confine I/O to specific serial portions of the job, and then use parallel communications to distribute data to parallel tasks. For example, Task 1 could read an input file and then communicate the required data to other tasks. Likewise, Task 1 could perform the write operation after receiving the required data from all other tasks. • Aggregate I/O operations across tasks: rather than having many tasks perform I/O, have a subset of tasks perform it.
  • 23. Designing Parallel Programs Debugging • TotalView from RogueWave Software • DDT from Allinea • Inspector from Intel Performance Analysis and Tuning • LC's web pages at https://hpc.llnl.gov/software/development-environment-software • TAU: http://www.cs.uoregon.edu/research/tau/docs.php • HPCToolkit: http://hpctoolkit.org/documentation.html • Open|Speedshop: http://www.openspeedshop.org/ • Vampir / Vampirtrace: http://vampir.eu/ • Valgrind: http://valgrind.org/ • PAPI: http://icl.cs.utk.edu/papi/ • mpitrace: https://computing.llnl.gov/tutorials/bgq/index.html#mpitrace • mpiP: http://mpip.sourceforge.net/ • memP: http://memp.sourceforge.net/
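
The contrast on slide 14 can be sketched in a few lines of Python. This is only an illustration, not code from the lecture: `potential_energy` is a hypothetical stand-in (a sum of squared coordinates), and `min_energy_conformation` and `fibonacci` are names invented here. The point is that every energy evaluation is independent and can be mapped across workers, while each Fibonacci term depends on the two terms before it.

```python
from concurrent.futures import ThreadPoolExecutor


def potential_energy(conformation):
    """Hypothetical stand-in for a real energy function (assumption)."""
    return sum(x * x for x in conformation)


def min_energy_conformation(conformations, workers=4):
    """Evaluate every conformation independently, then reduce to the minimum."""
    # No evaluation needs the result of another, so they can run concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        energies = list(pool.map(potential_energy, conformations))
    # The only step that needs all results is the final reduction.
    best = min(range(len(energies)), key=energies.__getitem__)
    return best, energies[best]


def fibonacci(n):
    """F(n) = F(n-1) + F(n-2): each step needs the two previous results."""
    a, b = 0, 1
    for _ in range(n):
        # This iteration cannot start before the previous one has finished.
        a, b = b, a + b
    return a
```

In CPython, threads do not speed up CPU-bound pure-Python work; `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface if real speedup is wanted. The thread pool is used here only to keep the sketch short.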
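
Slide 16 says there are different ways to partition data but the transcript does not list them. As an assumption, two commonly used schemes for a one-dimensional array are block partitioning (contiguous chunks) and cyclic partitioning (round-robin). The function names below are invented for this sketch.

```python
def block_partition(data, num_tasks):
    """Split data into contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(data), num_tasks)
    chunks, start = [], 0
    for rank in range(num_tasks):
        # The first `extra` tasks take one additional element.
        size = base + (1 if rank < extra else 0)
        chunks.append(data[start:start + size])
        start += size
    return chunks


def cyclic_partition(data, num_tasks):
    """Deal elements out round-robin: element i goes to task i % num_tasks."""
    return [data[rank::num_tasks] for rank in range(num_tasks)]
```

For ten elements and three tasks, `block_partition(list(range(10)), 3)` gives `[[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]`, while `cyclic_partition` gives `[[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]`. Block partitioning keeps neighboring data together; cyclic partitioning tends to spread uneven work more evenly.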
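
The I/O advice on slide 22 (one task reads the input, sends the required data to the other tasks, then collects their results and performs the single write) can be sketched as follows. This is a minimal illustration under stated assumptions: Python threads and `queue.Queue` objects stand in for tasks and for the send and receive operations of a message passing system, the input is a list of text lines rather than a real file, and `worker` and `run` are names invented here.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """A parallel task: receive a chunk, compute, send the result back."""
    chunk = inbox.get()               # "receive" from the root task
    outbox.put((rank, sum(chunk)))    # "send" the partial result to the root


def run(lines, num_workers=3):
    """Root task: the only place where input is read and output is produced."""
    # Serial input step (stands in for reading one input file).
    values = [int(line) for line in lines]

    inboxes = [queue.Queue() for _ in range(num_workers)]
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], results))
        for rank in range(num_workers)
    ]
    for t in threads:
        t.start()

    # Distribute the data: one message per worker.
    for rank in range(num_workers):
        inboxes[rank].put(values[rank::num_workers])

    # Collect exactly one partial result from every worker.
    partials = dict(results.get() for _ in range(num_workers))
    for t in threads:
        t.join()

    # Serial output step: the root alone combines and reports the result.
    return sum(partials[rank] for rank in range(num_workers))
```

With MPI, the distribute and collect steps would typically map onto collective operations rather than hand-written queues.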