2. Classifications of Parallel Systems
• 2.1 Classification of the parallel computer systems
• 2.2 SISD: Single Instruction Single Data; the Cray-1 supercomputer
• 2.3 MISD
• 2.4 SIMD systems; synchronous parallelism: MPP (Massively Parallel Processors), data parallel systems, DAP (the Distributed Array Processor) and the Connection Machine
• 2.5 MIMD systems; asynchronous parallelism: transputers, SHARC and the Cray T3E
• 2.6 Hybrid parallel computer systems; multiple-pipeline, multiple-SIMD, systolic arrays, wavefront arrays, Very Long Instruction Word (VLIW) and Same Program Multiple Data (SPMD)
• 2.7 Some parameters in parallel computers; speedup, efficiency, latency and grain size
• 2.8 Levels of parallelism; bit-level parallelism, instruction-level parallelism, procedure level and job (program) level parallelism
• 2.9 Parallel operations; monadic and dyadic operations
2.1 Classification of the parallel computer systems
• Three different classification systems will be introduced:
– 1. Parallel computers classified according to their processing structure.
– 2. Concurrent actions classified according to their level of abstraction.
– 3. Parallel operations classified by their arguments.
Computer System Classification
• Flynn's classification divides the entire computer world into four groups:
– 1. SISD
– 2. SIMD
– 3. MISD
– 4. MIMD
2.2 SISD Systems
• The conventional von Neumann computer.
– A single processor executes instructions sequentially.
– The operations are ordered in time and may be easily traced from start to end.
– Modern uniprocessor systems use some form of pipelining and superscalar techniques.
– Pipelining introduces temporal parallelism by allowing sequential executions of instructions to be overlapped in time (using multiple functional units).
– The need for branching may reduce effectiveness.
– Very long instruction words can be used to reduce the impact of branching.
The Cray-1 Supercomputer
• A commercial supercomputer with multiple pipelines.
– Scalar and vector operations may be performed concurrently.
– The vector processor is capable of 160 Mflops.
– Well suited to matrix problems.
– There are twelve functional (pipelined) units performing address, scalar, vector and floating-point operations.
– Main memory is divided into sixteen memory banks, and the banks can be addressed concurrently.
– Each functional unit is pipelined and accepts a new set of operands each clock period.
– Special software is needed.
– A vectorizing Fortran compiler was developed.
– Some dependencies are removed by reformulating the Fortran programs (the software is important).
2.3 MISD Systems
• An MISD computer may consist of several instruction units supplying a similar number of processors, but these processors all still obtain their data from a single logical source.
– The concept is similar to a pipelined architecture consisting of a number of processors.
– A stream of data is passed from one processor to the next.
– Each processor possibly performs a different operation.
– The A, B and C stages correspond to different parts of a task.
– An n-stage pipeline is filled after a load phase of (n-1) steps; a worked example follows.
– Only applicable to specific tasks (for example, program loops).
– There are instruction interdependences.
– The list of instructions must be coordinated with the size of the pipeline.
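As a small worked illustration of the load-phase cost (the code and numbers are mine, not from the source): an n-stage pipeline needs n-1 cycles to fill, after which one result completes per cycle, so N items take (n-1) + N cycles in total.

    # Hypothetical sketch: cycles needed to push N items through an n-stage pipeline.
    def pipeline_cycles(n_stages: int, n_items: int) -> int:
        # (n_stages - 1) cycles to fill the pipe, then one result per cycle.
        return (n_stages - 1) + n_items

    # A 3-stage pipeline (stages A, B, C) processing 10 items:
    print(pipeline_cycles(3, 10))   # 12 cycles, versus 3 * 10 = 30 sequentially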
2.4 SIMD Systems
• SISD is the von Neumann model and MISD covers pipelined computer systems.
• (Figure 2.1, page 5.)
Synchronous parallelism
• Means that there is only a single thread of control.
– A special processor (the master processor) executes the program.
– A master instruction is applied over vectors of related operands.
– A number of processors obey the same instruction in strict lock-step.
– Spatial parallelism is provided.
MPP (Massively Parallel Processors)
• Consists of a control unit (a central processor, the ACU) and
• a large number of simple processors (such as bit processors).
• Each processor is independent but only operates on commands from the control unit.
• Each processor executes the same instruction on its own memory or data.
• An SIMD program is always synchronous.
Data parallel systems
• Data parallelism is the use of multiple functional units to apply the same operations simultaneously to the elements of a data set; a minimal sketch follows.
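The sketch below is my illustration (not from the source), using NumPy as a stand-in for an array of processing elements: one "master instruction" (an elementwise add) is applied to every element of the operand vectors at once.

    # Hypothetical sketch of data parallelism: one operation, many data elements.
    import numpy as np

    a = np.arange(8)          # operand vector spread across the PEs
    b = np.arange(8) * 10
    c = a + b                 # the same add is applied to all 8 elements in lock-step
    print(c)                  # [ 0 11 22 33 44 55 66 77]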
DAP (The Distributed Array Processor)
• Consists of 4096 one-bit processors,
• arranged in a 64x64 grid, each of which addresses 4 Kbits of memory.
• Two orthogonal highways are used to connect the rows and columns of the processing elements.
• Registers in the control unit are aligned with the highways.
• (Fig. 3, page 6, Alan.)
• The programmer must explicitly partition the data to ensure efficient processor utilization.
• The size of the instruction buffer is 60 words; this restricts the number of instructions that may constitute each loop executed by the array of processing elements.
• More recently the DAP has been updated (the DAP 500 with 32x32 PEs and the DAP 600 with 64x64 PEs, each with 32 Kbits of memory, hosted on SUN and VAX computers).
The Connection Machine
• The CM-1 provides up to 64K PEs (typically 4096 PEs, 32 Mbytes of memory, 2000 MIPS), each with 4 Kbits of memory.
• If the number of PEs specified exceeds the number of physical PEs, local memory is sliced and time slicing is employed as necessary (transparent to the user).
• Programmed in LISP, Fortran and C.
• The CM-2 (Fig. 4, page 8, Alan) adds 2048 floating-point execution chips to the 4096 PEs, with half a gigabyte of RAM.
• The Data Vault holds 10 gigabytes of data.
2.5 MIMD Systems
• The two most interesting classes are SIMD (synchronous) and MIMD (asynchronous).
Asynchronous parallelism
• Asynchronous parallelism means that there are multiple threads of control (processors exchange data, and each processor executes its own individual program).
• MIMD and SIMD systems are further classified according to their interconnection topology.
• (From page 8, Figs. 2.4 and 2.5, and Fig. 2.2 for the synchronous case.)
• (Alan, page 9, Fig. 5.)
– This class (MIMD) is the more general structure and always works asynchronously.
– MIMD computers with shared memory are known as tightly coupled.
– Synchronization and information exchange occur via memory areas which can be addressed by different processors in a coordinated manner.
– Simultaneous accesses to the same portion of shared memory require an arbitration mechanism to ensure that only one processor accesses that memory portion at a time; a sketch of such arbitration follows.
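The sketch below is mine (not from the source), using Python threads as stand-ins for tightly coupled processors: a lock serves as the arbitration mechanism that serializes access to the shared memory area.

    # Hypothetical sketch: a lock arbitrates access to shared memory.
    import threading

    shared = {"counter": 0}          # memory area addressed by all "processors"
    lock = threading.Lock()          # the arbitration mechanism

    def processor():
        for _ in range(10_000):
            with lock:               # only one thread touches the counter at a time
                shared["counter"] += 1

    threads = [threading.Thread(target=processor) for _ in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(shared["counter"])         # 40000, with no lost updates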
• This problem of memory contention may restrict the number of processors that can be interconnected using the shared memory model.
– MIMD computers without shared memory are known as loosely coupled.
– Each PE has its own local memory.
– Synchronization and communication are much more costly without shared memory, because messages must be exchanged over the network.
– If a PE wishes to access another PE's private memory, it can only do so by sending a message to the appropriate PE along the interconnection network; a message-passing sketch follows.
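A minimal message-passing sketch of mine (not from the source), with queues standing in for the interconnection network between loosely coupled PEs:

    # Hypothetical sketch: loosely coupled PEs exchanging messages over queues.
    import multiprocessing as mp

    def pe(inbox, outbox):
        # This PE sees only its own local memory; remote requests arrive as messages.
        local_value = 21
        request = inbox.get()            # wait for a request from the network
        if request == "read":
            outbox.put(local_value)      # reply with the requested datum

    if __name__ == "__main__":
        to_pe, from_pe = mp.Queue(), mp.Queue()
        p = mp.Process(target=pe, args=(to_pe, from_pe))
        p.start()
        to_pe.put("read")                # "access" the remote PE's private memory
        print(from_pe.get())             # 21, obtained only via message exchange
        p.join()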
2.6 Hybrid parallel computer systems
Wavefront arrays
• The central clock causes problems in large systolic arrays, so in wavefront arrays the systolic clocking is replaced by data flow.
Very Long Instruction Word (VLIW)
• A hybrid form of pipelined and MIMD computers.
• Parallelism is achieved through an unusually long instruction format, so that several arithmetic and logic operations are contained in one instruction word.
• Compiler support is used to schedule operations into the instruction words.
Same Program Multiple Data (SPMD)
• A mix of SIMD and MIMD is Same Program Multiple Data.
• The computer system is controlled by a single program, so the ease of SIMD and the flexibility of MIMD are combined; a sketch follows.
• (Synchronization takes place at data exchanges.)
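A minimal SPMD sketch of mine (real SPMD systems typically use MPI; here Python's multiprocessing stands in): every worker runs the same program text on its own slice of the data, and synchronization happens at the data-exchange (gather) point.

    # Hypothetical sketch: the same program applied to multiple data slices.
    import multiprocessing as mp

    def same_program(slice_):
        # Every worker executes this identical code on different data.
        return sum(x * x for x in slice_)

    if __name__ == "__main__":
        data = list(range(16))
        slices = [data[i::4] for i in range(4)]        # one slice per "processor"
        with mp.Pool(4) as pool:
            partials = pool.map(same_program, slices)  # implicit sync at the gather
        print(sum(partials))                           # 1240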
2.7 Some parameters in parallel computers
• PEs, network, memory, speedup, efficiency, latency, etc.
Speedup and Efficiency
• Two important metrics are used to measure the performance of a parallel system (a worked example follows):
• Speedup = elapsed time on a uniprocessor (or single functional unit) / elapsed time on the multiprocessor (or multiple functional units).
• Efficiency = speedup x 100 / number of processors (or functional units).
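The timings below are assumed values of mine, purely to illustrate the two formulas above:

    # Hypothetical sketch: computing speedup and efficiency from elapsed times.
    def speedup(t_serial: float, t_parallel: float) -> float:
        return t_serial / t_parallel

    def efficiency(s: float, n_processors: int) -> float:
        return s * 100 / n_processors              # as a percentage

    s = speedup(t_serial=100.0, t_parallel=20.0)   # assumed timings in seconds
    print(s)                                       # 5.0x using 8 processors
    print(efficiency(s, n_processors=8))           # 62.5 (percent)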
Latency
• is a time measure of the communication overhead incurred between machine subsystems:
– memory latency,
– synchronization latency,
– communication latency.
• In general, the execution of a program may involve combinations of these, depending on the application, formulation, algorithm, language, compilation, and hardware limitations.
Grain size
• is a measure of the amount of computation involved in a software process.
– The simplest measure is to count the number of instructions in a grain (program segment).
– Grain size is commonly described as fine, medium or coarse; a toy classifier follows.
• The levels are: bit-level parallelism; instruction-level or expression-level parallelism; procedure level (subroutines, tasks or coroutines); and program level (jobs, tasks or programs).
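The cut-offs below are the instruction counts quoted later in these notes (fewer than 20 for fine grains, fewer than 2000 for medium, above ten thousand for coarse); the code itself is my sketch.

    # Hypothetical sketch: classifying a grain by its instruction count.
    def grain_class(n_instructions: int) -> str:
        if n_instructions < 20:       # instruction/expression level
            return "fine"
        if n_instructions < 2000:     # procedure level
            return "medium"
        return "coarse"               # program (job) level

    for n in (5, 500, 50_000):
        print(n, grain_class(n))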
2.8 Levels of Parallelism
• Computational granularity, or the level of parallelism in programs, is finer grained at the lower levels and coarser grained at the higher levels.
Bit level parallelism
• Parallel execution of operations at the bit level.
• In the ALU, individual bits are operated on simultaneously; see the sketch below.
• (Fig. 2.13, page 14, Brunnel.)
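One way to see this from software (my illustration): a single bitwise AND on two machine words operates on all of their bit pairs at once in the ALU.

    # Hypothetical sketch: one instruction, many bit-operations in parallel.
    x = 0b1100_1010_1100_1010
    y = 0b1010_1100_1010_1100
    print(bin(x & y))   # all 16 bit pairs are ANDed simultaneously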
Expression (Instruction) level parallelism
• (Fig. 2.12, page 14, Brunnel.)
– A typical grain contains fewer than 20 instructions.
– The advantage of fine-grain computation lies in the abundance of parallelism.
– An optimizing compiler can detect the parallelism automatically.
– Communication overhead is a problem.
– Suited to simple synchronous calculations, such as matrix calculations.
Procedure level
• Medium grain size: fewer than 2000 instructions.
• Inter-procedure analysis is much more involved.
• The programmer may need to restructure the program.
• Multitasking belongs to this category.
• The communication requirement is lower in the MIMD execution mode.
• (Fig. 2.11, page 13, Brunnel.)
• Fields of application:
– Real-time programming.
– Control of time-critical techniques, e.g. a power plant.
– Process control systems.
– Simultaneous control of multiple physical components, e.g. robot control.
– General purpose parallel processing:
• breaking a problem down into sub-tasks, which are distributed onto several processing elements for performance enhancement (see the example figure).
Program (Job) level parallelism
• Grain size is larger than ten thousand instructions.
• Multitasking is required.
• Time-sharing and space-sharing multiprocessors exploit this level of parallelism.
• Processes may be queued.
• Complete programs are executed simultaneously.
• Demands a significant role from the programmer, plus operating system support.
• Less communication.
• Less parallelism.
• Less compiler support.
• Communication latency may increase.
• This delay may be hidden or tolerated by using techniques such as caching, prefetching and multithreading.
• Message passing suits medium- and coarse-grain computations.
• Shared variables are often used to support communication.
• (Figure 2.10, page 12, Brunnel.)
• In general:
– The finer the grain size, the higher the degree of parallelism, but also the higher the communication and scheduling overhead.
– Fine grain thus provides a higher degree of parallelism but heavier communication overhead compared with coarse-grain computation.
• Massive parallelism is explored at the fine-grain level, such as data parallelism on SIMD and MIMD computers.
2.9 Parallel operations
• A totally different way of viewing parallelism comes from analyzing mathematical operations on individual data elements or groups of data.
– Distinguish scalar from vector data, and then whether the processing is carried out in parallel or sequentially.
– Simple operations on vectors (e.g. the addition of two vectors) are parallel operations; a sketch of monadic and dyadic operations follows.
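To make the monadic/dyadic distinction from the chapter outline concrete (the example is mine): a monadic operation takes one vector argument, a dyadic operation takes two, and in both cases every element can be processed in parallel.

    # Hypothetical sketch: monadic (one-argument) vs dyadic (two-argument) vector ops.
    import numpy as np

    v = np.array([1.0, 4.0, 9.0])
    w = np.array([2.0, 2.0, 2.0])

    print(np.sqrt(v))   # monadic: one vector in, elementwise square root
    print(v + w)        # dyadic: two vectors in, elementwise addition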