CHAPTER-4
PARALLEL PROGRAMMING LANGUAGES
By Dr. Heman Pathak
Associate Professor
KGC - Dehradun
Parallel Processes
Multicomputer
• A multicomputer is constructed out of multiple
computers and an interconnection network. The
processors on different computers interact by
passing messages to each other.
Multiprocessor
• It is a computer system with two or more CPUs. It is a
highly integrated system in which all CPUs share
access to a single global memory.
Parallel Processes
Uniform Memory Access (UMA)
• It is a shared memory architecture in which
access time to a memory location is
independent of which processor makes the
request or which memory chip contains the
transferred data.
PROGRAMMING PARALLEL PROCESSES
Every parallel language must address certain issues, either explicitly
or implicitly
 There must be a way to create parallel processes.
 There must be a way to coordinate the activities of these processes.
 When processes exchange results, they must communicate and
synchronize with each other.
 Communication and synchronization can be accomplished by sharing
variables or by message passing.
PROGRAMMING PARALLEL PROCESSES
There are two methods of synchronization:
 Synchronization for precedence guarantees that one event does not
begin until another event has finished.
 Synchronization for mutual exclusion guarantees that only one process
at a time enters a critical section of code where a shared data structure
is manipulated.
Barrier Synchronization
Four processes P1, P2, P3 and P4 approach the barrier. All except P4
have arrived, so they wait. Once all four arrive, they continue past
the barrier.
Mutual Exclusion Synchronization
An Illustrative Example for UMA
Let's consider the problem of computing the variance of a list of real numbers.
The values given are r1, r2, r3, ..., rn.
Variance = Σ(ri − m)²/n, where m = Σri/n
 Parallel Architecture used is UMA with 4 Processors
 n real numbers are stored in shared memory
 Four variables are also stored in shared memory
 one will contain the grand total
 another the mean
 a third the global sum of squares
 a fourth the variance
An Illustrative Example for UMA
 Four processes are created, one for each processor.
 Each process has a local temporary variable.
 Each process adds its share of the n values, accumulating the sum
in its local temporary variable.
 When the process is through computing its subtotal, it adds its
subtotal to the shared variable, accumulating the grand total.
 Since multiple processes are accessing the same global variable,
that portion of code is a critical section, and the processes must
enforce mutual exclusion.
An Illustrative Example for UMA
 A barrier synchronization step, inserted in the algorithm after the critical
section, ensures that no process continues until all processes have added
their subtotals to the grand total.
 In sequential code one process computes the average by dividing the
grand total by n.
 To compute the global sum of squares, the processes go through a process
similar to that which computes the global sum.
 Again, the processes must enforce mutual exclusion.
 After another barrier synchronization, one process computes the variance
by dividing the sum of squares by the number of values.
An Illustrative Example for Multicomputer
Solve the same problem on a four-node multicomputer
 There is no shared memory; the n values are
distributed among the local memories of the
nodes.
 Each node process has four variables:
 two to accumulate sums
 one to store the mean
 and another to store the variance
 Each node process initializes the two accumulators
to 0, then adds its share of the values to one of
them.
 At this point every process has a subtotal; the
four subtotals must be combined to find the
grand total.
An Illustrative Example for Multicomputer
For example, suppose the subtotals at nodes 0, 1, 2 and 3 are 7, 8, 11
and 10. After the first exchange-and-add step, nodes 0 and 1 each hold
15 while nodes 2 and 3 each hold 21. After the second step, every node
holds the grand total 36.
An Illustrative Example for Multicomputer
The four subtotals must be
combined to find the grand total.
 After two exchange-and-add
steps, every process has the grand
total.
 Every process can divide the
grand total by n to determine the
mean.
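The exchange-and-add combining step can be simulated sequentially in C (a sketch; real nodes would exchange messages, and the function name is made up for illustration). In step d, every node pairs with the node whose number differs in bit d and adds the partner's running subtotal to its own:

```c
/* Sketch: simulating exchange-and-add on a 4-node hypercube (2 dims).
 * In each of log2(P) steps, node id pairs with node id XOR 2^d and
 * adds the partner's running subtotal to its own. */
#include <assert.h>

#define P 4   /* nodes */
#define D 2   /* log2(P) cube dimensions */

void exchange_and_add(long v[P])
{
    long next[P];
    for (int d = 0; d < D; d++) {
        for (int id = 0; id < P; id++)
            next[id] = v[id] + v[id ^ (1 << d)];  /* add partner's value */
        for (int id = 0; id < P; id++)
            v[id] = next[id];                     /* step completes at once */
    }
}
```

Starting from the subtotals 7, 8, 11, 10 used above, two steps leave every node holding 36.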
An Illustrative Example for Multicomputer
A similar set of steps allows the
processes to compute the variance of
the list values.
Every process has the result. One of the
processes can pass the answer back to the
program running on the front end, which
then de-allocates the hypercube,
surrendering access to the nodes.
A Sample Application
Integration: find the area under the curve
4/(1 + x²) between 0 and 1, which equals π.
 The interval [0, 1] is divided into n
subintervals of width 1/n.
 For each of these subintervals, the algorithm
computes the area of a rectangle whose
height is such that the curve intersects the top
of the rectangle at its midpoint.
 The sum of the areas of the n rectangles
approximates the area under the curve.
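The sequential kernel of this computation can be sketched in C (the function name and the choice of n are illustrative):

```c
/* Sketch: midpoint-rectangle approximation of pi, the integral of
 * 4/(1 + x^2) over [0, 1], written sequentially. */
double approximate_pi(int n)
{
    double h = 1.0 / n, sum = 0.0;
    for (int i = 0; i < n; i++) {
        double x = h * (i + 0.5);       /* midpoint of subinterval i */
        sum += 4.0 / (1.0 + x * x);     /* rectangle height at midpoint */
    }
    return h * sum;                     /* width times total height */
}
```

Each loop iteration is independent, which is precisely why the algorithm is data-parallel: the iterations can be dealt out to processors in any fashion.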
A Sample Application
 This algorithm is data-parallel, since the
areas of all the rectangles can be
computed simultaneously.
 Computing the area of each rectangle
requires the same amount of work;
hence load balancing is not a concern.
 If the language requires the programmer
to divide the work among the
processors, this can be done easily.
FORTRAN 90
Parallel Programming Language
FORTRAN 90
In 1978 the ANSI-accredited technical
committee X3J3 began working on a
new version of the FORTRAN language.
In the early 1990s the resulting language,
Fortran 90, was adopted as an ISO and
ANSI standard.
FORTRAN 90
Fortran 90 is a superset of FORTRAN 77. It includes all the features of FORTRAN 77, plus
 Array operations
 Improved facilities for numerical computations
 Syntax to allow processors to support short integers, packed logicals, very large
character sets, and high-precision real and complex numbers
 User-defined data types, structures, and pointers
 Dynamic storage allocation
 Modules to support abstract data types
 Internal procedures and recursive procedures
 Improvements to input-output facilities
 New control structures
 New intrinsic procedures
 Terminal-oriented source form
FORTRAN 90
The committee also marked many language features as obsolete,
including
 the arithmetic IF
 some DO construct variations
 the assigned GO TO
 assigned formats
 the H edit descriptor
The next revision of the FORTRAN standard may
not contain these features.
FORTRAN 90 Programmer‘s Model
The Fortran 90 programmer has a model of parallel computation similar to a
PRAM. A CPU and a vector unit share a single memory.
 The CPU executes sequential
instructions, accessing variables
stored in the shared memory.
 To execute parallel operations,
the CPU controls the vector unit,
which also stores and fetches
data to and from the shared
memory.
FORTRAN 90 Language Features
Fortran 90 gives the programmer the ability to specify the type of variables through type
declaration statements such as
• REAL A, B, C
• INTEGER I
Each type may have several kinds. For example, a real variable may be stored in 4 bytes or
8 bytes. The Fortran 90 programmer may specify explicitly the kind, as well as the type of
the variable, as in the following example:
• REAL ( KIND=LONG) PI
Fortran 90 introduces the notion of an array constant. For example,
• (/ 2, 3, 5, 7, 11 /)
denotes a one-dimensional array with five elements. It is possible to construct an array of
higher dimension by declaring an array constant, then changing its dimensions with the
RESHAPE function.
FORTRAN 90 Language Features
An implied DO notation can simplify the specification of array constants.
For example, the array constant
• (/ 2, 4, 6, 8, 10 /) may be specified as (/ (I, I = 2, 10, 2) /)
Fortran 90 also allows operations on arrays. When applied to an array, the unary intrinsic
operators + and - return an array of the same dimensions, where the elements in the result
array are found by applying the operator to the corresponding elements in the operand
array.
Numerical, relational, and logical binary intrinsic operators can manipulate arrays having the
same dimensions. Each element in the result array is found by applying the operator to the
corresponding elements in the operand arrays.
A binary intrinsic operator can also manipulate an array and a scalar variable, resulting in
an array of the same dimensions as the array operand.
FORTRAN 90 Language Features
For example, given the array declarations
• REAL, DIMENSION(100,50) : : X, Y
• REAL, DIMENSION(100) : : Z
the following are examples of legal array expressions:
X + Y Array of shape(100,50), elements X(I,J) + Y(I,J)
X + 1.0 Array of shape(100,50), elements X(I,J) + 1.0
X .EQ. Y Logical array of shape(100,50), elements .TRUE. where X(I,J) .EQ. Y(I,J) and .FALSE. otherwise
X(1:100,3) +Z Array of shape(100), elements X(I,3) + Z(I)
FORTRAN 90 Language Features
Sometimes it is important to be able to perform an
operation on a subset of the array elements. The WHERE
statement allows the programmer to specify which array
elements are to be active. For example, the statement
WHERE (A > 0.0) A = SQRT(A)
replaces every positive element of A with its square root.
FORTRAN 90 Language Features
The WHERE statement divides the array elements into two sets, first performing
one or more array assignments on the elements for which the expression is true,
then performing one or more array assignments on the elements for which the
expression is false. The syntax of the most general form of the WHERE statement is:
WHERE(logical-array-expression)
array-assignment-statements
ELSEWHERE
array-assignment-statements
END WHERE
Finally, new transformational functions allow the reduction of an array into a scalar
value. For example, the function SUM returns the sum of the elements of the array
passed to it as an argument.
SEQUENT C
Parallel Programming Language
Sequent C
 Sequent computers run the DYNIX operating system, a version of UNIX
tailored for the multiprocessor environment.
 In addition to the operating-system calls typically found in a UNIX
system, DYNIX provides a set of routines to facilitate parallel processing.
 The commercial parallel programming languages used on Sequent hardware
are simple extensions of sequential languages that allow programmers
to declare shared variables; processes interact via mutual exclusion
and barrier synchronization.
 The resulting languages are primitive.
Sequent C - Shared Data
 Parallel processes on the Sequent coordinate their activities by accessing
shared data structures.
 The keyword shared, placed before a global variable declaration,
indicates that all processes are to share a single instance of that
variable.
 For example, if a 10-element global array is declared int a[10], then
every active process has its own copy of the array; if one process
modifies a value in its copy of a, no other process's copy will change.
 On the other hand, if the array is declared shared int a[10], then all
active processes share a single instance of the array, and changes made
by one process can be detected by the other processes.
Sequent C - Parallel Processing Function
 A program begins execution as a single process. This process is
responsible for executing those parts of the program that are
inherently sequential.
 When control reaches a parallelizable portion of the program, the
original process forks a number of other processes, each process
performing its share of the work.
 The total number of processes accessing shared data cannot
exceed the number of physical processors less one. Because there
are at least as many CPUs as active processes, each process may
execute on its own CPU.
 This allows a major reduction in the execution time, assuming that
the computer is not executing any other jobs.
Sequent C - Parallel Processing Function
 When control reaches an inherently sequential
portion of the computation, only the original
process executes the code; the remaining processes
wait until control reaches another portion of the
computation that can be divided into pieces and
executed concurrently. The program cycles through
these two modes until termination.
 Parallel programs executing on the Sequent
alternate between sequential and parallel
execution.
 The transition from parallel to sequential execution
is always delimited by a barrier synchronization.
In addition, data dependencies may
require the insertion of barrier
synchronizations within parallel code.
Sequent C - Parallel Processing Function
Sequent's microtasking library has seven key functions:
1. m_set_procs(p):
The parent process initializes to value p a shared variable that
controls the number of processes created by a subsequent call to
m_fork.
The value of p cannot exceed the number of physical processors
in the system minus one.
The function also initializes barriers and locks.
Sequent C - Parallel Processing Function
2. m_fork(name[,arg,...]):
The parent process creates a number of child processes.
The parent process and the child processes begin executing
function name with the arguments (if any) also specified by the
call to m_fork.
After all the processes (the parent and all the children) have
completed execution of function name, the parent process
resumes execution with the code after m_fork, while the child
processes busy wait until the next call to m_fork.
Sequent C - Parallel Processing Function
2. m_fork(name[,arg,...]):
The parent process creates p child processes; all processes begin
executing function name; the parent process then resumes execution
with the code after m_fork, while the child processes busy-wait
until the next call to m_fork.
Therefore, the first call to m_fork is more expensive than subsequent
calls, because only the first call entails process creation.
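This fork/join pattern can be mimicked with POSIX threads (an illustrative sketch: the my_* names are invented stand-ins for the DYNIX routines, and unlike the real m_fork these workers are created afresh on every call rather than reused, so the "first call is more expensive" effect is not reproduced):

```c
/* Sketch (assumption): emulating m_set_procs/m_fork/m_get_numprocs
 * with POSIX threads; the my_* names are invented stand-ins. */
#include <assert.h>
#include <pthread.h>

#define MAXPROCS 16

static int nprocs = 1;

struct task { void (*fn)(int id, void *arg); void *arg; int id; };

static void *trampoline(void *p)
{
    struct task *t = p;
    t->fn(t->id, t->arg);          /* every process runs the same function */
    return NULL;
}

void my_set_procs(int p) { nprocs = p; }       /* cf. m_set_procs */
int  my_get_numprocs(void) { return nprocs; }  /* cf. m_get_numprocs */

void my_fork(void (*fn)(int, void *), void *arg)   /* cf. m_fork */
{
    pthread_t th[MAXPROCS];
    struct task t[MAXPROCS];
    for (int i = 1; i < nprocs; i++) {         /* children get ids 1..p-1 */
        t[i].fn = fn; t[i].arg = arg; t[i].id = i;
        pthread_create(&th[i], NULL, trampoline, &t[i]);
    }
    fn(0, arg);                                /* the parent runs as id 0 */
    for (int i = 1; i < nprocs; i++)
        pthread_join(th[i], NULL);             /* join before continuing */
}

/* Tiny demo: each process records (id + 1) in its own slot. */
static long hits[MAXPROCS];
static void mark(int id, void *arg) { (void)arg; hits[id] = id + 1; }

long fork_demo(void)
{
    my_set_procs(4);
    my_fork(mark, NULL);
    return hits[0] + hits[1] + hits[2] + hits[3];   /* 1 + 2 + 3 + 4 */
}
```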
Sequent C - Parallel Processing Function
3. m_get_myid: A process calls function m_get_myid to get its
unique process number. If the total number of active processes
is p, then the process number of the parent is 0, while the
process numbers of the child processes range from 1 to p-1.
4. m_get_numprocs: Function m_get_numprocs returns the
number of processes executing in parallel. Given the total
number of processes and its own process number, a process
can determine which portion of a computation is its
responsibility.
Sequent C - Parallel Processing Function
5. m_lock, m_unlock: Functions m_lock and m_unlock
ensure mutually exclusive execution of the code that
the two calls surround. Once a process has entered a
block of code delimited by m_lock and m_unlock, no
other process may enter until the first process has left.
7. m_kill_procs: Function m_kill_procs kills the child
processes created by the first call to m_fork.
Sequent C - Monitor
 Most parallel algorithms implemented on multiprocessors require a
process to perform a series of operations on a shared data structure, as
if it were an atomic operation.
 For example, a process may need to fetch the value at the beginning of
a linked list and advance the list pointer to the next list element.
 When the hardware cannot perform the entire series of operations as an
atomic operation, the process must have some way to enforce mutual
exclusion, keeping all other processes from referencing the resource while
it is being modified. The piece of code in which mutual exclusion must be
enforced is called a critical section.
Sequent C - Monitor
One way to structure accesses to shared resources is by using a monitor.
 A monitor consists of variables representing the state of some resource,
procedures that implement operations on it, and initialization code.
 The values of the variables are initialized before any procedure in the
monitor is called; these values are retained between procedure invocations
and may be accessed only by procedures in the monitor.
 Monitor procedures resemble ordinary procedures in the programming
language with one significant exception. The execution of the procedures in
the same monitor is guaranteed to be mutually exclusive. Hence monitors
are a structured way of implementing mutual exclusion.
Sequent C - Monitor
Programming languages that support monitors include Concurrent
Pascal (Brinch Hansen 1975, 1977) and Modula (Wirth 1977a,
1977b, 1977c). Even if your parallel programming language does
not support monitors, you can implement one yourself. For example, in
the Sequent C language, you can implement a monitor by declaring a
shared lock variable for each resource, putting an s_lock statement
that accesses the variable at the start of each procedure, and putting
an s_unlock statement at the end of each procedure. You also must
have enough self-discipline to use only these procedures to access the
shared resource.
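A sketch of that recipe for a shared counter, with a POSIX mutex standing in for the Sequent shared lock variable (the names are illustrative): each monitor procedure locks on entry and unlocks on exit, so all procedures of the monitor execute with mutual exclusion, and the state is touched only through them.

```c
/* Sketch: a hand-built monitor for a shared counter, with a POSIX mutex
 * standing in for the Sequent shared lock variable. */
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;                   /* the monitor's hidden state */

void counter_increment(long delta)         /* monitor procedure */
{
    pthread_mutex_lock(&counter_lock);     /* cf. s_lock on entry */
    counter += delta;
    pthread_mutex_unlock(&counter_lock);   /* cf. s_unlock on exit */
}

long counter_read(void)                    /* monitor procedure */
{
    pthread_mutex_lock(&counter_lock);
    long v = counter;
    pthread_mutex_unlock(&counter_lock);
    return v;
}
```

The self-discipline the text mentions corresponds here to never touching counter except through counter_increment and counter_read.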
Gen AI Study Jams _ For the GDSC Leads in India.pdf
gdsczhcet
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
MdTanvirMahtab2
 
Architectural Portfolio Sean Lockwood
Architectural Portfolio Sean LockwoodArchitectural Portfolio Sean Lockwood
Architectural Portfolio Sean Lockwood
seandesed
 
Railway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdfRailway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdf
TeeVichai
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
Massimo Talia
 
DESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docxDESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docx
FluxPrime1
 

Recently uploaded (20)

Halogenation process of chemical process industries
Halogenation process of chemical process industriesHalogenation process of chemical process industries
Halogenation process of chemical process industries
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
 
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang,  ICLR 2024, MLILAB, KAIST AI.pdfJ.Yang,  ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
 
ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdf
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
 
Vaccine management system project report documentation..pdf
Vaccine management system project report documentation..pdfVaccine management system project report documentation..pdf
Vaccine management system project report documentation..pdf
 
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
 
Gen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdfGen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdf
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
 
Architectural Portfolio Sean Lockwood
Architectural Portfolio Sean LockwoodArchitectural Portfolio Sean Lockwood
Architectural Portfolio Sean Lockwood
 
Railway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdfRailway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdf
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
 
DESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docxDESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docx
 

Chapter 4: Parallel Programming Languages

  • 1. CHAPTER-4 PARALLEL PROGRAMMING LANGUAGES By Dr. Heman Pathak Associate Professor KGC - Dehradun
  • 2. Parallel Processes Multicomputer • A multicomputer is constructed out of multiple computers and an interconnection network. The processors on different computers interact by passing messages to each other. Multiprocessor • It is a computer system with two or more CPUs. It is a highly integrated system in which all CPUs share access to a single global memory.
  • 3. Parallel Processes Uniform Memory Access (UMA) • It is a shared memory architecture in which access time to a memory location is independent of which processor makes the request or which memory chip contains the transferred data. 3
  • 4. PROGRAMMING PARALLEL PROCESSES Every parallel language must address certain issues, either explicitly or implicitly:  There must be a way to create parallel processes  There must be a way to coordinate the activities of these processes  When processes exchange results, they must communicate and synchronize with each other  Communication and synchronization can be accomplished by sharing variables or by message passing
  • 5. PROGRAMMING PARALLEL PROCESSES Communication and synchronization can be accomplished by sharing variables or by message passing. There are two methods of synchronization: synchronization for precedence It guarantees that one event does not begin until another event has finished synchronization for mutual exclusion. It guarantees that only one process at a time enters a critical section of code where a data structure to be shared is manipulated
  • 8. An Illustrative Example for UMA Let's consider the problem of computing the variance of a list of real numbers. The values given are r1, r2, r3, ..., rn. Variance = Σ(ri − m)²/n, where m = Σri/n  Parallel architecture used is UMA with 4 processors  n real numbers are stored in shared memory  Four variables are also stored in shared memory  one will contain the grand total  another the mean  a third the global sum of squares  a fourth the variance
  • 9. An Illustrative Example for UMA  Four processes are created, one for each processor.  Each process has a local temporary variable.  Each process adds its share of the n values,  accumulating the sum in its local temporary variable.  When a process has finished computing its subtotal, it adds the subtotal to the shared variable, accumulating the grand total.  Since multiple processes are accessing the same global variable, that portion of code is a critical section, and the processes must enforce mutual exclusion.
  • 11. An Illustrative Example for UMA  A barrier synchronization step, inserted in the algorithm after the critical section, ensures that no process continues until all processes have added their subtotals to the grand total.  In sequential code one process computes the mean by dividing the grand total by n.  To compute the global sum of squares, the processes go through steps similar to those that computed the global sum.  Again, the processes must enforce mutual exclusion.  After another barrier synchronization, one process computes the variance by dividing the sum of squares by the number of values.
  • 12. An Illustrative Example for Multicomputer Solve the same problem on a four-node multicomputer  There is no shared memory; the n values are distributed among the local memories of the nodes.  Each node process has four variables:  two to accumulate sums  one to store the mean  and another to store the variance  Each node process initializes the two accumulators to 0.  At this point every process has a subtotal; the four subtotals must be combined to find the grand total.
  • 13. An Illustrative Example for Multicomputer [Figure: exchange-and-add on four nodes. Nodes 0, 1, 2, 3 start with subtotals 7, 8, 11, 10; after the first exchange-and-add step the nodes hold 15, 15, 21, 21; after the second step every node holds the grand total 36.]
  • 14. An Illustrative Example for Multicomputer The four subtotals must be combined to find the grand total.  After two exchange-and-add steps, every process has the grand total.  Every process can divide the grand total by n to determine the mean.
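The exchange-and-add pattern from slides 13–14 can be simulated sequentially: in step d, node i pairs with node i XOR 2^d, and each partner adds the other's running sum to its own. The function name `exchange_and_add` is invented here for illustration.

```c
/* Simulate exchange-and-add among `nodes` hypercube nodes (a power
   of two, at most 16 here). After log2(nodes) steps, every entry of
   vals[] holds the grand total. */
void exchange_and_add(int vals[], int nodes) {
    for (int d = 1; d < nodes; d <<= 1) {     /* one dimension per step */
        int next[16];
        for (int i = 0; i < nodes; i++)
            next[i] = vals[i] + vals[i ^ d];  /* add partner's subtotal */
        for (int i = 0; i < nodes; i++)
            vals[i] = next[i];
    }
}
```

With the subtotals from slide 13 (7, 8, 11, 10), two steps leave 36 on every node, matching the figure.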
  • 15. An Illustrative Example for Multicomputer A similar set of steps allows the processes to compute the variance of the list values. Every process has the result. One of the processes can pass the answer back to the program running on the front end, which then de-allocates the hypercube, surrendering access to the nodes.
  • 16. A Sample Application Integration: Find the area under the curve 4/(1 + x²) between 0 and 1 (this area equals π)  The interval [0, 1] is divided into n subintervals of width 1/n.  For each of these intervals the algorithm computes the area of a rectangle whose height is such that the curve intersects the top of the rectangle at its midpoint.  The sum of the areas of the n rectangles approximates the area under the curve.
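The midpoint-rectangle rule described above can be sketched in a few lines of C (sequentially here; each iteration is independent, which is what makes the method data-parallel). The name `approx_pi` is an illustrative choice.

```c
/* Midpoint-rectangle approximation of the integral of 4/(1+x^2)
   over [0,1], which equals pi. */
double approx_pi(int n) {
    double h = 1.0 / n, sum = 0.0;
    for (int i = 0; i < n; i++) {
        double x = (i + 0.5) * h;      /* midpoint of subinterval i  */
        sum += 4.0 / (1.0 + x * x);    /* height of rectangle i      */
    }
    return sum * h;                     /* total area of n rectangles */
}
```

With n = 100,000 subintervals the result agrees with π to well beyond single precision.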
  • 17. A Sample Application  This algorithm is data-parallel,  since the areas of all the rectangles can be computed simultaneously.  Computing the area of each rectangle requires the same amount of work; hence load balancing is not an issue.  If the language requires the programmer to divide the work among the processors, it can be done easily.
  • 19. FORTRAN 90 In 1978 the ANSI-accredited technical committee X3J3 began working on a new version of the FORTRAN language. In the early 1990s the resulting language, Fortran 90, was adopted as an ISO and ANSI standard.
  • 20. FORTRAN 90 Fortran 90 is a superset of FORTRAN 77. It includes all the features of FORTRAN 77, plus  Array operations  Improved facilities for numerical computations  Syntax to allow processors to support short integers, packed logicals, very large character sets, and high-precision real and complex numbers  User-defined data types, structures, and pointers  Dynamic storage allocation  Modules to support abstract data types  Internal procedures and recursive procedures  Improvements to input-output facilities  New control structures  New intrinsic procedures  Terminal-oriented source form
  • 21. FORTRAN 90 The committee also marked many language features as obsolete, including  arithmetic IF  some DO construct variations  assigned GO TO  assigned formats  and the H edit descriptor The next revision of the FORTRAN standard may not contain these features.
  • 22. FORTRAN 90 Programmer's Model The Fortran 90 programmer has a model of parallel computation similar to a PRAM: a CPU and a vector unit share a single memory.  The CPU executes sequential instructions, accessing variables stored in the shared memory.  To execute parallel operations, the CPU controls the vector unit,  which also stores and fetches data to and from the shared memory.
  • 23. FORTRAN 90 Language Features Fortran 90 gives the programmer the ability to specify the type of variables through type declaration statements such as • REAL A, B, C • INTEGER I Each type may have several kinds. For example, a real variable may be stored in 4 bytes or 8 bytes. The Fortran 90 programmer may specify explicitly the kind, as well as the type, of a variable, as in the following example: • REAL (KIND=LONG) PI Fortran 90 introduces the notion of an array constant. For example, • (/ 2, 3, 5, 7, 11 /) denotes a one-dimensional array with five elements. It is possible to construct an array of higher dimension by declaring an array constant, then changing its dimension with the RESHAPE function.
  • 24. FORTRAN 90 Language Features An implied DO notation can simplify the specification of array constants. For example, the array constant • (/ 2, 4, 6, 8, 10 /) may be specified as (/ (I, I = 2, 10, 2) /) Fortran 90 also allows operations on arrays. When applied to an array, the unary intrinsic operators + and - return an array of the same dimensions, where the elements in the result array are found by applying the operator to the corresponding elements in the operand array. Numerical, relational, and logical binary intrinsic operators can manipulate arrays having the same dimensions. Each element in the result array is found by applying the operator to the corresponding elements in the operand arrays. A binary intrinsic operator can also manipulate an array and a scalar variable, resulting in an array of the same dimensions as the array operand.
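The elementwise semantics just described, including the scalar case, can be sketched in C: each result element comes from applying the operator to corresponding operand elements, and a scalar operand is in effect broadcast to every element. The helper names here are invented for illustration.

```c
/* What Fortran 90's elementwise array operators do, spelled out in C. */
void array_add(const double *a, const double *b, double *out, int n) {
    for (int i = 0; i < n; i++)
        out[i] = a[i] + b[i];        /* like A + B in Fortran 90    */
}

void array_add_scalar(const double *a, double s, double *out, int n) {
    for (int i = 0; i < n; i++)
        out[i] = a[i] + s;           /* like A + 1.0: scalar spread */
}
```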
  • 25. FORTRAN 90 Language Features For example, given the array declarations • REAL, DIMENSION(100,50) : : X, Y • REAL, DIMENSION(100) : : Z the following are examples of legal array expressions: X + Y — array of shape (100,50), elements X(I,J) + Y(I,J); X + 1.0 — array of shape (100,50), elements X(I,J) + 1.0; X .EQ. Y — value .TRUE. if X(I,J) .EQ. Y(I,J) and .FALSE. otherwise; X(1:100,3) + Z — array of shape (100), elements X(I,3) + Z(I)
  • 26. FORTRAN 90 Language Features Sometimes it is important to be able to perform an operation on a subset of the array elements. The WHERE statement allows the programmer to specify which array elements are to be active. For example, the statement WHERE (A > 0.0) A = SQRT(A) replaces every positive element of A with its square root.
  • 27. FORTRAN 90 Language Features The WHERE statement divides the array elements into two sets, first performing one or more array assignments on the elements for which the expression is true, then performing one or more array assignments on the elements for which the expression is false. The syntax of the most general form of the WHERE statement is: WHERE(logical-array-expression) array-assignment-statements ELSEWHERE array-assignment-statements END WHERE Finally, new transformational functions allow the reduction of an array into a scalar value. For example, the function SUM returns the sum of the elements of the array passed to it as an argument.
  • 30. Sequent C  Sequent computers run the DYNIX operating system, a version of UNIX tailored for the multiprocessor environment.  In addition to the operating-system calls typically found in a UNIX system, DYNIX provides a set of routines to facilitate parallel processing.  The commercial parallel programming languages the Sequent hardware uses are simple extensions of sequential languages that allow programmers to declare shared variables that interact via mutual exclusion and barrier synchronization.  The resulting languages are primitive.
  • 31. Sequent C - Shared Data  Parallel processes on the Sequent coordinate their activities by accessing shared data structures.  The keyword shared placed before a global variable declaration, indicates that all processes are to share a single instance of that variable.  For example, if a 10-element global array is declared int a [10], then every active process has its own copy of the array; if one process modifies a value in its copy of a, no other process's value will change.  On the other hand, if the array is declared shared int a [10] , then all active processes share a single instance of the array, and changes made by one process can be detected by the other processes.
  • 32. Sequent C - Parallel Processing Function  A program begins execution as a single process. This process is responsible for executing those parts of the program that are inherently sequential.  The original process forks a number of other processes, each process performing its share of the work.  The total number of processes accessing shared data cannot exceed the number of physical processors less one. Because there are at least as many CPUs as active processes, each process may execute on its own CPU.  This allows a major reduction in the execution time, assuming that the computer is not executing any other jobs.
  • 33. Sequent C - Parallel Processing Function  When control reaches an inherently sequential portion of the computation, only the original process executes the code; the remaining processes wait until control reaches another portion of the computation that can be divided into pieces and executed concurrently. The program cycles through these two modes until termination.  Parallel programs executing on the Sequent alternate between sequential and parallel execution.  The transition from parallel to sequential execution is always delimited by a barrier synchronization. In addition, data dependencies may require the insertion of barrier synchronizations within parallel code.
  • 34. Sequent C - Parallel Processing Function
  • 35. Sequent C - Parallel Processing Function Sequent's microtasking library has seven key functions: 1. m_set_procs(p): The parent process initializes to value p a shared variable that controls the number of processes created by a subsequent call to m_fork. The value of p cannot exceed the number of physical processors in the system minus one. The function also initializes barriers and locks.
  • 36. Sequent C - Parallel Processing Function 2. m_fork(name[,arg,...]): The parent process creates a number of child processes. The parent process and the child processes begin executing function name with the arguments (if any) also specified by the call to m_fork. After all the processes (the parent and all the children) have completed execution of function name, the parent process resumes execution with the code after m_fork, while the child processes busy wait until the next call to m_fork.
  • 37. Sequent C - Parallel Processing Function 2. m_fork(name[,arg,...]): The parent process creates p child processes. All processes begin executing function name. The parent process resumes execution with the code after m_fork. The child processes busy wait until the next call to m_fork. Therefore, the first call to m_fork is more expensive than subsequent calls, because only the first call entails process creation.
  • 38. Sequent C - Parallel Processing Function 3. m_get_myid: A process calls function m_get_myid to get its unique process number. If the total number of active processes is p, then the process number of the parent is 0, while the process numbers of the child processes range from 1 to p-1. 4. m_get_numprocs: Function m_get_numprocs returns the number of processes executing in parallel. Given the total number of processes and its own process number, a process can determine which portion of a computation is its responsibility.
  • 39. Sequent C - Parallel Processing Function 5. m_lock, m_unlock: Functions m_lock and m_unlock ensure mutually exclusive execution of the code that the two calls surround. Once a process has entered a block of code delimited by m_lock and m_unlock, no other process may enter until the first process has left. 7. m_kill_procs: Function m_kill_procs kills the child processes created by the first call to m_fork.
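Since DYNIX is long gone, the behavior of m_fork, m_get_myid, and m_get_numprocs described above can be emulated with POSIX threads. This is a hedged sketch: only the names and semantics follow the slides; the emulation itself (the `m_fork_emulated` and `trampoline` helpers, the demo counter) is invented here.

```c
/* Emulate Sequent's microtasking calls on pthreads. The parent is
   process 0; children get ids 1..p-1; all p processes run fn. */
#include <pthread.h>

static _Thread_local int my_id;       /* per-thread process number */
static int num_procs;

int m_get_myid(void)     { return my_id; }
int m_get_numprocs(void) { return num_procs; }

struct launch { void (*fn)(void); int id; };

static void *trampoline(void *arg) {
    struct launch *l = arg;
    my_id = l->id;                    /* child ids run from 1 to p-1 */
    l->fn();
    return NULL;
}

void m_fork_emulated(void (*fn)(void), int p) {
    pthread_t t[64];
    struct launch l[64];
    num_procs = p;
    for (int i = 1; i < p; i++) {
        l[i].fn = fn;
        l[i].id = i;
        pthread_create(&t[i], NULL, trampoline, &l[i]);
    }
    my_id = 0;                        /* the parent is process 0   */
    fn();                             /* parent runs the work too  */
    for (int i = 1; i < p; i++)
        pthread_join(t[i], NULL);
}

/* Demo worker: each process registers itself under mutual exclusion. */
static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
int counter = 0, id_sum = 0;

void count_self(void) {
    pthread_mutex_lock(&mu);          /* m_lock analogue   */
    counter++;
    id_sum += m_get_myid();
    pthread_mutex_unlock(&mu);        /* m_unlock analogue */
}
```

Calling `m_fork_emulated(count_self, 4)` runs `count_self` in four processes with ids 0 through 3, matching the numbering rule on slide 38.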
  • 40. Sequent C - Monitor  Most parallel algorithms implemented on multiprocessors require a process to perform a series of operations on a shared data structure, as if it were an atomic operation.  For example, a process may need to fetch the value at the beginning of a linked list and advance the list pointer to the next list element.  When the hardware cannot perform the entire series of operations as an atomic operation, the process must have some way to enforce mutual exclusion, keeping all other processes from referencing the resource while it is being modified. The piece of code in which mutual exclusion must be enforced is called a critical section.
  • 41. Sequent C - Monitor One way to structure accesses to shared resources is by using a monitor.  A monitor consists of variables representing the state of some resource, procedures that implement operations on it, and initialization code.  The values of the variables are initialized before any procedure in the monitor is called; these values are retained between procedure invocations and may be accessed only by procedures in the monitor.  Monitor procedures resemble ordinary procedures in the programming language with one significant exception. The execution of the procedures in the same monitor is guaranteed to be mutually exclusive. Hence monitors are a structured way of implementing mutual exclusion.
  • 42. Sequent C - Monitor
  • 43. Sequent C - Monitor
  • 44. Sequent C - Monitor Programming languages that support monitors include Concurrent Pascal (Brinch Hansen 1975, 1977) and Modula (Wirth 1977a, 1977b, 1977c). Even if your parallel programming language does not support monitors, you can implement one yourself. For example, in the Sequent C language, you can implement a monitor by declaring a shared lock variable for each resource, putting an s_lock statement that accesses the variable at the start of each procedure, and putting an s_unlock statement at the end of each procedure. You also must have enough self-discipline to use only these procedures to access the shared resource.
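The do-it-yourself monitor described above can be sketched in portable C with a pthread mutex standing in for the Sequent lock variable: the state is private to the file, every procedure locks on entry and unlocks on exit, so procedure executions are mutually exclusive. The bank-balance state and procedure names are illustrative only.

```c
/* A hand-built monitor: hidden state plus mutually exclusive procedures. */
#include <pthread.h>

static pthread_mutex_t monitor_lock = PTHREAD_MUTEX_INITIALIZER;
static long balance = 0;                   /* monitor state, set once */

void deposit(long amount) {
    pthread_mutex_lock(&monitor_lock);     /* s_lock analogue         */
    balance += amount;
    pthread_mutex_unlock(&monitor_lock);   /* s_unlock analogue       */
}

long get_balance(void) {
    pthread_mutex_lock(&monitor_lock);
    long b = balance;
    pthread_mutex_unlock(&monitor_lock);
    return b;
}
```

The self-discipline requirement from the slide translates to: callers must touch `balance` only through these two procedures, since C (unlike Concurrent Pascal or Modula) cannot enforce the encapsulation.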