Chapter 4
Parallel Processing Concepts
• 4.1 Program flow mechanisms
• 4.2 Control flow versus data flow; a data flow architecture
• 4.3 Demand-driven mechanism; reduction machine model
• 4.4 Comparison of flow mechanisms
• 4.5 Coroutines; Fork and Join; ParBegin and ParEnd
• 4.6 Processes; Remote Procedure Call
• 4.7 Implicit parallelism; explicit versus implicit parallelism
Introduction
• Program flow mechanisms will be introduced.
• The data-driven, demand-driven, and control-driven approaches will be introduced.
• Typical architectures of such systems will be given in this chapter.
• Parallel processing concepts and the fundamentals of parallel processing will be presented in this chapter.
4.1 Program flow mechanisms
• Conventional computers are based on a control-flow mechanism, by which the order of program execution is explicitly stated in the user program.
• Data-flow computers are based on a data-driven mechanism, in which the execution of any instruction is driven by data (operand) availability.
• Data-flow computers emphasize a high degree of parallelism at the fine-grain, instruction level.
• Reduction computers are based on a demand-driven mechanism, which initiates an operation based on the demand for its results by other computations.
4.2 Control flow versus data flow
• Von Neumann computers use a program counter (PC) to sequence the execution of instructions in a program.
– The PC is sequenced by the instruction flow in the program.
– This sequential execution style is called control-driven.
– Control-flow computers use shared memory to hold program instructions and data objects.
– Variables in the shared memory are updated by many instructions.
– This may produce side effects, since memory is shared.
– These side effects may prevent parallelism (a minimal sketch follows this list).
– Control flow can be made parallel by using parallel language constructs or parallelizing compilers.
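• As a minimal C illustration of such a side effect, assume two statements that communicate through a shared variable x; the second reads what the first wrote, so the two cannot execute in parallel (the variable names are arbitrary):

    #include <stdio.h>

    int main(void) {
        int a = 2, b = 3, c = 4;
        int x, y;
        x = a + b;   /* S1 updates the shared variable x (a side effect) */
        y = x * c;   /* S2 reads x, so it must execute after S1 */
        printf("y = %d\n", y);
        return 0;
    }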
• In data-flow computers, the execution of an instruction is driven by data availability instead of being guided by a program counter.
– The instructions in a data-driven program are not ordered in any way.
– Computational results (data tokens) are passed directly between instructions.
– The data generated by an instruction is duplicated into as many copies as needed and forwarded directly to all needy instructions.
– The data-driven scheme requires no shared memory, no program counter, and no control sequencer.
– It requires a special mechanism to detect data availability and to match tokens with needy instructions.
– This implies the need for handshaking or token-matching operations (a minimal sketch follows this list).
– Data-flow computers exploit fine-grain parallelism.
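• The following is a minimal, single-threaded C sketch of the token-matching idea: an instruction "fires" only once both of its operand tokens have arrived. The instruction record and the send_token function are hypothetical simplifications, not the MIT design.

    #include <stdio.h>
    #include <stdbool.h>

    /* A two-input dataflow instruction: it fires when both operands arrive. */
    typedef struct {
        const char *name;
        double operand[2];
        bool present[2];              /* availability flags for matching */
    } Instr;

    /* Deliver a token to one input port; fire when the operand set is complete. */
    static void send_token(Instr *i, int port, double value) {
        i->operand[port] = value;
        i->present[port] = true;
        if (i->present[0] && i->present[1]) {        /* data availability detected */
            printf("%s fires with %g and %g\n",
                   i->name, i->operand[0], i->operand[1]);
            i->present[0] = i->present[1] = false;   /* tokens are consumed */
        }
    }

    int main(void) {
        Instr add = { "add", {0, 0}, {false, false} };
        send_token(&add, 0, 2.0);    /* one token only: add cannot fire yet */
        send_token(&add, 1, 3.0);    /* second token arrives: add fires */
        return 0;
    }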
A data flow architecture
• There have been a few experimental data-flow computer projects.
• MIT developed a tagged-token architecture for building data-flow computers (see Hwang, Fig. 2.12, p. 72).
• n PEs are interconnected by an n×n routing network.
• The system supports pipelined dataflow operations in all n PEs.
• The machine provides a low-level token-matching mechanism; instructions are stored in the program memory.
• Tagged tokens enter a PE through the local path.
• Tokens are also passed to other PEs through the routing network.
• Each instruction represents a synchronization operation.
• Another synchronization mechanism, called the I-structure, is a tagged memory unit that allows overlapped usage of a data structure by both the producer and consumer processes.
• Each I-structure word uses a 2-bit tag indicating whether the word is empty, full, or has a pending request (a sketch of this protocol follows this list).
• This may threaten the pure data-flow approach.
• For a comparison of data-flow and control-flow machines, see Hwang, Fig. 2.13, p. 73.
• Data-flow computers can absorb communication latency and minimize the losses due to synchronization waits.
• Data flow offers an ideal model for massively parallel computations, because all far-reaching side effects are removed.
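• As a rough illustration, the 2-bit tag can be modeled as a three-state word; the state names and the read/write functions below are hypothetical, chosen only to show the empty/full/pending protocol.

    #include <stdio.h>

    /* The 2-bit tag states of an I-structure word. */
    typedef enum { EMPTY, FULL, PENDING } Tag;

    typedef struct {
        Tag tag;
        double value;
    } IWord;

    /* Consumer read: succeeds only if the producer has written the word;
       otherwise the request is recorded as pending. */
    static int i_read(IWord *w, double *out) {
        if (w->tag == FULL) { *out = w->value; return 1; }
        w->tag = PENDING;             /* defer the reader */
        return 0;
    }

    /* Producer write: fills the word and satisfies a deferred reader. */
    static void i_write(IWord *w, double v) {
        if (w->tag == PENDING)
            printf("waking deferred reader\n");
        w->value = v;
        w->tag = FULL;
    }

    int main(void) {
        IWord w = { EMPTY, 0.0 };
        double x;
        if (!i_read(&w, &x)) printf("read deferred: word still empty\n");
        i_write(&w, 42.0);            /* producer fills the word */
        if (i_read(&w, &x)) printf("read %g\n", x);
        return 0;
    }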
4.3 Demand-driven mechanism
• In a reduction machine, the computation is triggered by the demand for an operation's results. Consider:

a = ((b + 1) * c - (d / e))

• A data-driven computation chooses a bottom-up approach, starting from the innermost operations b+1 and d/e, then proceeding to the * operation, and finally the -.
• Such a computation is called eager evaluation, because operations are carried out immediately after all their operands become available.
• A demand-driven computation chooses a top-down approach by first demanding the value of a, which triggers the demand for evaluating the next-level expressions (b+1)*c and d/e, and then b+1.
• A demand-driven computation corresponds to lazy evaluation, because operations are executed only when their results are required by another instruction (see the sketch after this list).
• The demand-driven approach matches naturally with the functional programming concept.
• The removal of side effects in functional programming makes programs easier to parallelize.
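• The following is a minimal C sketch of the two evaluation orders for a = ((b+1)*c - (d/e)), modeling lazy evaluation with thunks (zero-argument functions evaluated only on demand). The names are hypothetical, and real lazy evaluators also cache the forced value.

    #include <stdio.h>

    static double b = 4, c = 5, d = 9, e = 3;

    /* Eager (data-driven): subexpressions are computed as soon as
       their operands are available. */
    static double eager(void) {
        double t1 = b + 1;            /* fires once b is available */
        double t2 = d / e;            /* fires once d and e are available */
        return t1 * c - t2;
    }

    /* Lazy (demand-driven): a thunk computes its value only when the
       enclosing expression demands it. */
    typedef double (*Thunk)(void);
    static double thunk_b1(void) { printf("demanding b+1\n"); return b + 1; }
    static double thunk_de(void) { printf("demanding d/e\n"); return d / e; }

    static double lazy(void) {
        Thunk t1 = thunk_b1, t2 = thunk_de;   /* nothing evaluated yet */
        return t1() * c - t2();               /* demanding a forces both thunks */
    }

    int main(void) {
        printf("eager: a = %g\n", eager());
        printf("lazy:  a = %g\n", lazy());
        return 0;
    }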
Reduction machine model
• In a string reduction model, each demander gets a separate copy of the expression for its own evaluation.
• The operator is suspended while its input arguments are being evaluated.
• Different parts of the program graph, or sub-regions, can be reduced or evaluated in parallel upon demand.
• A determined value (a copy) is returned to the original demanding instruction.
4.4 Comparison of flow mechanisms
• The data-, control-, and demand-flow mechanisms are compared in Hwang, Table 2.1, p. 76.
• The degree of explicit control decreases from control-driven to demand-driven to data-driven.
• Advantages and disadvantages are given in the table.
• Both the data-flow and demand-flow mechanisms, despite their higher potential for parallelism, are still at the research stage.
• Control-flow machines still dominate the market.
4.5 Coroutines
• The fundamental design characteristic is the single-processor model.
• There is only one instruction stream with sequential flow control.
• The processor's system resources can be engaged and released by a set of coroutines in an orderly manner.
• A quasi-parallel execution takes place between two or more coroutines.
• Execution starts with the call of one particular coroutine (as a kind of procedure).
• Each coroutine may contain any number of transfer statements that switch the flow of control to a different coroutine.
• This is not a procedure call.
• The transfer of control has to be explicitly specified by the application programmer (who must also make sure that the flow of control is transferred at the correct points).
• In Modula-2, the procedure TRANSFER is provided in order to switch the flow of control between coroutines:
• PROCEDURE TRANSFER (VAR source, destination: ADDRESS);
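• C has no built-in TRANSFER, but the explicit-transfer idea can be sketched with the POSIX ucontext routines. This is a minimal sketch under that assumption, not Modula-2's mechanism; the coroutine body and stack size are arbitrary choices.

    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t main_ctx, co_ctx;
    static char co_stack[64 * 1024];

    /* Coroutine body: runs until it explicitly transfers control back. */
    static void coroutine(void) {
        printf("coroutine: step 1\n");
        swapcontext(&co_ctx, &main_ctx);   /* explicit transfer, like TRANSFER */
        printf("coroutine: step 2\n");
    }                                      /* falling off the end resumes uc_link */

    int main(void) {
        getcontext(&co_ctx);
        co_ctx.uc_stack.ss_sp = co_stack;
        co_ctx.uc_stack.ss_size = sizeof co_stack;
        co_ctx.uc_link = &main_ctx;        /* where control goes when the body ends */
        makecontext(&co_ctx, coroutine, 0);

        printf("main: transferring to coroutine\n");
        swapcontext(&main_ctx, &co_ctx);
        printf("main: control is back, transferring again\n");
        swapcontext(&main_ctx, &co_ctx);
        printf("main: coroutine finished\n");
        return 0;
    }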
Fork and Join
• The fork and join constructs are among the earliest parallel language constructs.
• It is possible to start parallel processes in the Unix operating system with the fork operation and to wait for their termination with the wait operation.
• In this type of parallel programming, two fundamentally different concepts are mixed: first, the declaration of parallel processes; and second, the synchronization of the processes.
• Actually, the functionality of the fork operation in Unix is not as general as shown in figure 4.2.
• Instead, an identical copy of the calling process is generated, which then executes in parallel to the original.
• The only possibility for a process to determine its identity is the identification number returned by fork.
• In order to start a different program and wait for its termination, the two Unix calls can be embedded in the C language in the following way:
int status;
if (fork() == 0)                                   /* child process */
    execlp("program_B", "program_B", (char *)NULL);
...                                                /* parent process continues */
wait(&status);                                     /* wait for the child to end */
Program Segments
• Master-slave programming.
Program Segments (2)
• Program code:
• Any global declaration of variables is duplicated in each process, but local variables are not.
• The call to the fork operation returns the process number of the child process to the parent process (a value not equal to 0).
• For the child, fork returns 0.
• The child immediately executes the execlp operation.
• The parent process can wait for the termination of the child process (the wait operation); a complete sketch follows this list.
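• Putting the pieces together, a complete version of the fragment above might look as follows. Here "ls -l" stands in for the slide's program_B so the sketch runs anywhere, and the error handling is an addition not shown on the slide.

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        int status;
        pid_t pid = fork();               /* duplicate the calling process */

        if (pid < 0) {                    /* fork failed: no child exists */
            perror("fork");
            return 1;
        }
        if (pid == 0) {                   /* child: fork() returned 0 */
            execlp("ls", "ls", "-l", (char *)NULL);  /* replace the image */
            perror("execlp");             /* reached only if execlp fails */
            _exit(127);
        }
        /* parent: fork() returned the child's process number */
        wait(&status);                    /* block until the child terminates */
        printf("child exited with status %d\n", WEXITSTATUS(status));
        return 0;
    }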
ParBegin and ParEnd
• Blocks of parallel code are defined with ParBegin and ParEnd (cobegin and coend) in a manner analogous to the sequential begin and end.
• However, the instructions in the block should be carried out simultaneously (a thread-based sketch follows this list).
• This construct is used in the language AL to control several robots and coordinate them.
• Synchronization between the processes is achieved through semaphores.
• Due to the restrictions mentioned (synchronization, etc.), this kind of statement has found no application in modern programming languages.
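• As a rough C analogue, the block semantics can be sketched with POSIX threads: each statement of the block becomes a thread, and ParEnd corresponds to joining them all. The statement functions are hypothetical placeholders.

    #include <pthread.h>
    #include <stdio.h>

    /* Two independent "statements" placed between ParBegin and ParEnd. */
    static void *stmt_a(void *arg) { printf("statement A\n"); return NULL; }
    static void *stmt_b(void *arg) { printf("statement B\n"); return NULL; }

    int main(void) {
        pthread_t ta, tb;
        /* ParBegin: start all statements of the block simultaneously */
        pthread_create(&ta, NULL, stmt_a, NULL);
        pthread_create(&tb, NULL, stmt_b, NULL);
        /* ParEnd: continue only after every statement has finished */
        pthread_join(ta, NULL);
        pthread_join(tb, NULL);
        printf("past ParEnd\n");
        return 0;
    }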
4.6 Processes
• Processes are declared similarly to procedures and are started with a specific instruction.
• If several copies of a process need to be executed, then that process type must be started with multiple calls, possibly with different parameters.
• The synchronization between processes executing in parallel may be controlled through the concepts of semaphores or monitors with condition variables.
• The explicit synchronization of parallel processes exacts not only an additional control cost but is also extremely susceptible to errors and deadlocks.
• In systems with shared memory ("tightly coupled"), communication and synchronization are accomplished via monitors with conditions; a semaphore sketch follows this list.
• In systems without shared memory ("loosely coupled"), the concept is illustrated in figure 4.4.
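• Below is a minimal POSIX-semaphore sketch of explicit synchronization between two workers (threads standing in for processes); the shared counter and all names are illustrative only.

    #include <pthread.h>
    #include <semaphore.h>
    #include <stdio.h>

    static sem_t mutex;          /* binary semaphore guarding the counter */
    static int counter = 0;      /* shared state updated by both workers */

    static void *worker(void *arg) {
        for (int i = 0; i < 100000; i++) {
            sem_wait(&mutex);    /* P operation: enter the critical section */
            counter++;
            sem_post(&mutex);    /* V operation: leave the critical section */
        }
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        sem_init(&mutex, 0, 1);  /* initial value 1: one holder at a time */
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %d\n", counter);  /* 200000 with the semaphore in place */
        sem_destroy(&mutex);
        return 0;
    }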
Remote procedure call
• In order to extend the process concept to a parallel computer system without shared memory, the communication between processes located on different processors has to be carried out by message passing.
• The programming system is divided into multiple parallel processes, where each process takes on the role of either a client or a server.
• Each server can also become a client by using the services of another server.
• Each client confers tasks on one or more appropriately configured server processes.
• This type of parallel task distribution is implemented with the remote procedure call (RPC) mechanism; a toy sketch of the idea follows this list.
• Here, a remote procedure call resembles just a task-deposit operation.
• Returning the results after the calculation by the server requires another explicit data exchange in the opposite direction.
• Problems with the remote procedure call include the application of error-tolerant protocols for resetting or restarting the client after a server failure.
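• To make the request/reply pattern concrete, here is a toy, single-process C sketch of the marshalling idea behind RPC: the client stub packs arguments into a message, the "server" unpacks them and computes, and a reply message carries the result back. Real RPC adds stub generation, a network transport, and failure handling; all names here are hypothetical.

    #include <stdio.h>

    /* Wire format for one remote call: an opcode plus its arguments. */
    typedef struct { int op; double a, b; } Request;
    typedef struct { double result; } Reply;

    /* Server side: unpack the request, execute, pack the reply. */
    static Reply server_dispatch(Request req) {
        Reply rep = { 0 };
        if (req.op == 1) rep.result = req.a + req.b;   /* remote "add" service */
        return rep;
    }

    /* Client-side stub: marshals the arguments, "sends" the message,
       and blocks until the reply arrives (here simulated by a direct call). */
    static double rpc_add(double a, double b) {
        Request req = { 1, a, b };
        Reply rep = server_dispatch(req);   /* a real transport would go here */
        return rep.result;
    }

    int main(void) {
        printf("rpc_add(2, 3) = %g\n", rpc_add(2, 3));
        return 0;
    }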
4.7 Implicit parallelism
• All parallel concepts covered so far use special, explicit language constructs for controlling the parallel execution.
• Several languages do not require any language constructs for parallelism, but nevertheless allow parallel processing.
• Such programming languages are called languages with implicit parallelism.
• The programmer is much more limited in controlling the parallel processors that execute his program; efficient parallelization is left to an intelligent compiler.
• The compiler has no interaction with the application programmer (declarative languages represent knowledge, or the problems to be solved, using complex mathematical formulas).
• An example is the implicit parallelism of vector expressions, e.g. in functional programming languages.
• As shown in figure 4.7, the mathematical notation of a matrix addition contains implicit parallelism that can quite easily be mapped onto a parallel computer architecture through automatic parallelization; a small sketch follows this list.
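• Below is a plain C loop nest for matrix addition, sketching why the notation is implicitly parallel: every element C[i][j] = A[i][j] + B[i][j] is independent of all the others, so an auto-parallelizing compiler may distribute the iterations without any programmer-visible construct. The matrix size is an arbitrary choice.

    #include <stdio.h>

    #define N 4

    int main(void) {
        double A[N][N], B[N][N], C[N][N];

        /* Fill A and B with some sample values. */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                A[i][j] = i + j;
                B[i][j] = i * j;
            }

        /* Matrix addition: no iteration depends on any other, so all
           elements could be computed simultaneously by an automatically
           parallelizing compiler; the parallelism is implicit. */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                C[i][j] = A[i][j] + B[i][j];

        printf("C[1][2] = %g\n", C[1][2]);
        return 0;
    }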
Explicit versus implicit parallelism
• A summary of the advantages and disadvantages of explicit and implicit parallelism is presented in figure 4.8.
• The programming actually occurs at a high level of abstraction; for this reason implicit parallelism is often found in higher-level, non-procedural languages.
• In contrast, explicit parallelism gives the programmer considerably more flexibility, which can lead to better processor utilization and higher performance.
• This advantage is paid for with a more complicated and more error-prone programming method.