2. Chapter 4
Parallel Processing Concepts
• 4.1 Program flow mechanism
• 4.2 Control flow versus data flow; A data flow
Architecture
• 4.3 Demand driven mechanism; Reduction machine
model
• 4.4 Comparison of flow mechanisms
• 4.5 Coroutines; Fork and Join, Data flow,
ParBegin and ParEnd
• 4.6 Processes; Remote Procedure Call
• 4.7 Implicit Parallelism; Explicit versus implicit
parallelism
3. Introduction
• Program flow mechanisms will be introduced.
• Data-driven, demand-driven, and control-driven approaches will be introduced.
• Typical architectures of those systems will be given in this chapter.
• Parallel processing concepts and fundamentals of parallel processing will be presented in this chapter.
4. 4.1 Program flow mechanism
• Conventional computers are based on a control-flow mechanism, by which the order of program execution is explicitly stated in user programs.
• Data-flow computers are based on a data-driven mechanism, by which the execution of any instruction is driven by data (operand) availability.
• Dataflow computers emphasize a high degree of parallelism at the fine-grain instruction level.
• Reduction computers are based on a demand-driven mechanism, which initiates an operation based on the demand for its results by other computations.
5. 4.2 Control flow versus data
flow
• Von Neumann computers use a program counter (PC) to sequence the execution of instructions in a program.
– The PC is sequenced by the instruction flow in the program.
– This sequential execution style has been called control-driven.
– Control-flow computers use shared memory to hold program instructions and data objects.
– Variables in the shared memory are updated by many instructions.
6. – This may produce side effects, since memory is shared.
– These side effects may prevent parallelism.
– Control flow can be made parallel by using parallel language constructs or parallelizing compilers.
• In data flow computers, the execution of an instruction is driven by data availability instead of being guided by a program counter.
– The instructions in a data-driven program are not ordered in any way.
– Computational results (data tokens) are passed directly between instructions.
– The data generated by an instruction is duplicated into many copies and forwarded directly to all needy instructions.
7. – The data-driven scheme requires no shared memory, no program counter, and no control sequencer.
– It requires a special mechanism to detect data availability and to match tokens with needy instructions.
– This implies the need for handshaking or token-matching operations.
– Data flow computers exploit fine-grain parallelism.
8. A data flow Architecture
• There are a few experimental data flow computer projects.
• MIT developed the tagged-token architecture for building data flow computers.
• Hwang fig 2.12, page 72
• n PEs are interconnected by an n×n routing network.
• The system supports pipelined dataflow operations in all n PEs.
10. • The machine provides a low-level token-matching mechanism. Instructions are stored in the program memory.
• Tagged tokens enter the PE through the local path.
• Tokens are also passed to other PEs through the routing network.
• Each instruction represents a synchronization operation.
• Another synchronization mechanism, called the I-structure, is a tagged memory unit that supports overlapped usage of a data structure by both the producer and consumer processes.
11. • The I-structure uses a 2-bit tag indicating whether a word is empty, full, or has a pending request.
• This compromises the pure data flow approach.
• A comparison of data and control flow machines is given in Hwang, page 73, fig 2.13.
• Data flow computers can absorb the communication latency and minimize the losses due to synchronization waits.
• Data flow offers an ideal model for massively parallel computations because all far-reaching side effects are removed.
13. 4.3 Demand driven mechanism
• In a reduction machine, the computation is triggered by the demand for an operation’s results, e.g.
• a = ((b+1)*c-(d/e))
• The data-driven computation chooses a bottom-up approach, starting from the innermost operations b+1 and d/e, then proceeding to the * operation and finally the -.
14. • Such a computation is called eager evaluation, because operations are carried out immediately after all their operands become available.
• A demand-driven computation chooses a top-down approach by first demanding the value of a, which triggers the demand for evaluating the next-level expressions (b+1)*c and d/e, and then b+1.
• A demand-driven computation corresponds to lazy evaluation, because operations are executed only when their results are required by another instruction.
• The demand-driven approach matches naturally with the functional programming concept.
• The removal of side effects in functional programming makes programs easier to parallelize.
15. Reduction machine model
• In a string reduction model, each demander gets a
separate copy of the expression for its own
evaluation.
• The operator is suspended while its input
arguments are being evaluated.
• Different parts of the program graph or sub-regions can be reduced or evaluated in parallel upon demand.
• A determined value (a copy) is returned to the original demanding instruction.
16. 4.4 Comparison of flow
mechanisms
• Data, control, and demand flow mechanisms are
compared.
• Hwang page 76, table 2.1
• The degree of explicit control decreases from control-driven to demand-driven to data-driven.
• Advantages and disadvantages are given in the table.
• Both data and demand flow mechanisms, despite their higher potential for parallelism, are still at the research stage.
• Control-flow machines still dominate the market.
18. 4.5 Coroutines
• The fundamental design characteristic is the single-processor model.
• There is only one instruction stream with sequential flow control.
• The processor system resources can be engaged and released by a set of coroutines in an orderly manner.
• A quasi-parallel execution takes place between two or more coroutines.
19. • Execution starts with the call of one particular coroutine (as a kind of procedure).
• Each coroutine may contain any number of transfer statements that switch the flow of control to a different coroutine.
• This is not a procedure call.
• The transfer of control has to be explicitly specified by the application programmer (who also makes sure that the flow of control is transferred at the correct points).
• The procedure TRANSFER is provided in Modula-2 in order to switch the flow of control between coroutines.
• PROCEDURE TRANSFER (VAR source, destination: ADDRESS);
21. Fork and Join
• The fork and join constructs are among the earliest parallel language constructs.
• It is possible to start parallel processes in the Unix operating system with the fork operation and to wait for their end with the wait operation.
• In this type of parallel programming, two fundamentally different concepts are mixed: first, the declaration of parallel processes; and second, the synchronization of the processes.
• Actually, the functionality of the fork operation in Unix is not as general as shown in figure 4.2.
• Instead, an identical copy of the calling process is generated, which then executes in parallel to the original.
23. • The only way for a process to determine its identity is its identification number.
• In order to start a different program and wait for its termination, the two Unix calls can be embedded in the C language in the following way:
int status;
if (fork() == 0)
    execlp("program_B", ...);   /* child process */
else
    wait(&status);              /* parent process */
25. Program Segments (2)
• Program code
• Any global declaration of variables is duplicated in each process, but local variables are not.
26. • The call to the fork operation returns the process number of the child process to the parent process (a value not equal to 0).
• For the child, fork returns 0.
• The child immediately executes the execlp operation.
• The parent process can wait for the termination of the child process (wait operation).
27. ParBegin and ParEnd
• Blocks of parallel code are defined with ParBegin and
ParEnd (cobegin and coend) in a manner analogous to the
sequential begin and end.
• However, the instructions in the block are to be carried out simultaneously.
• This language construct is used in AL to control several robots and coordinate them.
• Synchronization between processes is done through semaphores.
• Due to the restrictions mentioned (synchronization etc.), this concept of statement has no application in modern programming languages.
29. 4.6 Processes
• Processes are declared similarly to procedures and are started with a specific instruction.
• If several copies of a process need to be executed, then that process type must be started with multiple calls, possibly having different parameters.
• The synchronization between processes executing in parallel may be controlled through the concepts of semaphores or monitors with condition variables.
• The explicit synchronization of parallel processes exacts not only an additional control cost but is also extremely susceptible to errors and deadlocks.
30. • Communication and synchronization are accomplished in systems with shared memory (“tightly coupled”) via monitors with conditions.
• In systems without shared memory (“loosely coupled”), the concept is illustrated in figure 3.x (4.4).
31. Remote procedure call
• In order to extend the process concept to a parallel computer system without shared memory, the communication between processes located on different processors has to be carried out by message passing.
• The programming system is divided into multiple parallel processes, where each process takes on the role of either a client or a server.
• Each server can also become a client by using the services of another server.
• Each client confers tasks on one or more appropriately configured server processes.
32. • This type of parallel task distribution is implemented with the remote procedure call (RPC) mechanism.
33. • Here, a remote procedure call resembles just a task-deposit operation.
• Returning the results after the calculation by the server requires another explicit data exchange in the opposite direction.
• Problems with the remote procedure call include the application of error-tolerant protocols for resetting or restarting the client after a server failure.
35. 4.7 Implicit parallelism
• All parallel concepts covered so far use special, explicit language constructs for controlling the parallel execution.
• Several languages do not require any language constructs for parallelism, but nevertheless allow parallel processing.
• Such programming languages are called languages with implicit parallelism.
• The programmer is much more limited in controlling the parallel processors which are executing his program (efficient parallelization is left to an intelligent compiler).
36. • The compiler has no interaction with the application programmer (declarative languages represent knowledge or problems to be solved by using complex mathematical formulas).
• Consider, for example, the implicit parallelism of vector expressions in functional programming languages.
• As shown in figure 3.x (4.7), the mathematical notation of a matrix addition contains implicit parallelism that can quite easily be converted to a parallel computer architecture through automatic parallelization.
38. Explicit versus Implicit
parallelism
• A summary of the advantages and disadvantages of explicit and implicit parallelism is presented in figure 3.x (4.8).
• The programming actually occurs at a high level of abstraction; for this reason, implicit parallelism is often found in higher-level non-procedural languages.
• In contrast, explicit parallelism gives the programmer considerably more flexibility, which can lead to better processor utilization and higher performance.
• This advantage is paid for with a more complicated and more error-prone programming method.