2. Classifications of Parallel Systems
• 2.1 Classification of the parallel computer systems
• 2.2 SISD: Single Instruction Single Data; the Cray-1 supercomputer
• 2.3 MISD
• 2.4 SIMD systems; synchronous parallelism: MPP (Massively Parallel Processors), data parallel systems, DAP (the Distributed Array Processor) and the Connection Machine
• 2.5 MIMD systems; asynchronous parallelism: transputers, SHARC and the Cray T3E
• 2.6 Hybrid parallel computer systems; multiple-pipeline, multiple-SIMD, systolic arrays, wavefront arrays, Very Long Instruction Word (VLIW) and Same Program Multiple Data (SPMD)
• 2.7 Some parameters in parallel computers; speedup, efficiency, latency and grain size
• 2.8 Levels of parallelism; bit-level parallelism, instruction-level parallelism, procedure level and job (program) level parallelism
• 2.9 Parallel operations; monadic and dyadic operations
2.1 Classification of the parallel computer systems
• Three different classification systems will be introduced:
– 1. Parallel computers classified according to their processing structure.
– 2. Concurrent actions classified according to their level of abstraction.
– 3. Parallel operations classified by their arguments.
Computer System Classification
• Flynn's classification divides the entire computer world into four groups:
– 1. SISD
– 2. SIMD
– 3. MISD
– 4. MIMD
2.2 SISD Systems
• The conventional von Neumann computer.
– A single processor executes instructions sequentially.
– The operations are ordered in time and may be easily traced from start to end.
– Modern uniprocessor systems use some form of pipelining and superscalar techniques.
– Pipelining introduces temporal parallelism by allowing sequential executions of instructions to be overlapped in time (using multiple functional units).
– The need for branching may reduce effectiveness.
– Very long instruction words can be used to reduce the impact of branching.
The Cray-1 Supercomputer
• A commercial supercomputer with multiple pipelines.
– Scalar and vector operations may be performed concurrently.
– The vector processor is capable of 160 Mflops.
– Well suited to matrix problems.
– There are twelve functional (pipelined) units performing address, scalar, vector and floating-point operations.
– Main memory is divided into sixteen memory banks, and the banks can be addressed concurrently.
– Each functional unit is pipelined and accepts a new set of operands each clock period.
– Special software is needed.
– A vectorizing Fortran compiler was developed.
– Some dependencies are removed by reformulating the Fortran programs (the software is important).
2.3 MISD Systems
• An MISD computer may consist of several instruction units supplying a similar number of processors, but these processors all still obtain their data from a single logical source.
– The concept is similar to a pipelined architecture consisting of a number of processors.
– A stream of data is passed from one processor to the next.
– Each processor possibly performs a different operation.
– The A, B and C stages correspond to different parts of a task.
– An n-stage pipeline is filled after a load phase of (n-1) steps; a worked example follows.
– Only applicable to specific tasks (for example, program loops).
– There are instruction interdependences.
– The list of instructions must be coordinated with the size of the pipeline.
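As a small worked illustration of the load-phase cost (the code and numbers are mine, not from the source): an n-stage pipeline needs n-1 cycles to fill, after which one result completes per cycle, so N items take (n-1) + N cycles in total.

    # Hypothetical sketch: cycles needed to push N items through an n-stage pipeline.
    def pipeline_cycles(n_stages: int, n_items: int) -> int:
        # (n_stages - 1) cycles to fill the pipe, then one result per cycle.
        return (n_stages - 1) + n_items

    # A 3-stage pipeline (stages A, B, C) processing 10 items:
    print(pipeline_cycles(3, 10))   # 12 cycles, versus 3 * 10 = 30 sequentially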
2.4 SIMD Systems
• SISD is the von Neumann model and MISD covers pipelined computer systems.
• (Figure 2.1, page 5.)
Synchronous parallelism
• Means that there is only a single thread of control.
– A special processor (the master processor) executes the program.
– A master instruction is applied over vectors of related operands.
– A number of processors obey the same instruction in strict lock-step.
– Spatial parallelism is provided.
MPP (Massively Parallel Processors)
• Consists of a control unit (a central processor, the ACU) and
• a large number of simple processors (such as bit processors).
• Each processor is independent but only operates on commands from the control unit.
• Each processor executes the same instruction on its own memory or data.
• An SIMD program is always synchronous.
Data parallel systems
• Data parallelism is the use of multiple functional units to apply the same operations simultaneously to the elements of a data set; a minimal sketch follows.
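The sketch below is my illustration (not from the source), using NumPy as a stand-in for an array of processing elements: one "master instruction" (an elementwise add) is applied to every element of the operand vectors at once.

    # Hypothetical sketch of data parallelism: one operation, many data elements.
    import numpy as np

    a = np.arange(8)          # operand vector spread across the PEs
    b = np.arange(8) * 10
    c = a + b                 # the same add is applied to all 8 elements in lock-step
    print(c)                  # [ 0 11 22 33 44 55 66 77]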
DAP (The Distributed Array Processor)
• Consists of 4096 one-bit processors,
• arranged in a 64x64 grid, each of which addresses 4 Kbits of memory.
• Two orthogonal highways are used to connect the rows and columns of the processing elements.
• Registers in the control unit are aligned with the highways.
• (Fig. 3, page 6, Alan.)
• The programmer must explicitly partition the data to ensure efficient processor utilization.
• The size of the instruction buffer is 60 words; this restricts the number of instructions that may constitute each loop executed by the array of processing elements.
• More recently the DAP has been updated (the DAP 500 with 32x32 PEs and the DAP 600 with 64x64 PEs, each with 32 Kbits of memory, hosted on SUN and VAX computers).
The Connection Machine
• The CM-1 provides up to 64K PEs (typically 4096 PEs, 32 Mbytes of memory, 2000 MIPS), each with 4 Kbits of memory.
• If the number of PEs specified exceeds the number of physical PEs, local memory is sliced and time slicing is employed as necessary (transparent to the user).
• Programmed in LISP, Fortran and C.
• The CM-2 (Fig. 4, page 8, Alan) adds 2048 floating-point execution chips to the 4096 PEs, with half a gigabyte of RAM.
• The Data Vault holds 10 gigabytes of data.
2.5 MIMD Systems
• The two most interesting classes are SIMD (synchronous) and MIMD (asynchronous).
Asynchronous parallelism
• Asynchronous parallelism means that there are multiple threads of control (processors exchange data, and each processor executes its own individual program).
• MIMD and SIMD systems are further classified according to their interconnection topology.
• (From page 8, Figs. 2.4 and 2.5, and Fig. 2.2 for the synchronous case.)
• (Alan, page 9, Fig. 5.)
– This class (MIMD) is the more general structure and always works asynchronously.
– MIMD computers with shared memory are known as tightly coupled.
– Synchronization and information exchange occur via memory areas which can be addressed by different processors in a coordinated manner.
– Simultaneous accesses to the same portion of shared memory require an arbitration mechanism to ensure that only one processor accesses that memory portion at a time; a sketch of such arbitration follows.
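The sketch below is mine (not from the source), using Python threads as stand-ins for tightly coupled processors: a lock serves as the arbitration mechanism that serializes access to the shared memory area.

    # Hypothetical sketch: a lock arbitrates access to shared memory.
    import threading

    shared = {"counter": 0}          # memory area addressed by all "processors"
    lock = threading.Lock()          # the arbitration mechanism

    def processor():
        for _ in range(10_000):
            with lock:               # only one thread touches the counter at a time
                shared["counter"] += 1

    threads = [threading.Thread(target=processor) for _ in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(shared["counter"])         # 40000, with no lost updates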
• This problem of memory contention may restrict the number of processors that can be interconnected using the shared memory model.
– MIMD computers without shared memory are known as loosely coupled.
– Each PE has its own local memory.
– Synchronization and communication are much more costly without shared memory, because messages must be exchanged over the network.
– If a PE wishes to access another PE's private memory, it can only do so by sending a message to the appropriate PE along the interconnection network; a message-passing sketch follows.
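A minimal message-passing sketch of mine (not from the source), with queues standing in for the interconnection network between loosely coupled PEs:

    # Hypothetical sketch: loosely coupled PEs exchanging messages over queues.
    import multiprocessing as mp

    def pe(inbox, outbox):
        # This PE sees only its own local memory; remote requests arrive as messages.
        local_value = 21
        request = inbox.get()            # wait for a request from the network
        if request == "read":
            outbox.put(local_value)      # reply with the requested datum

    if __name__ == "__main__":
        to_pe, from_pe = mp.Queue(), mp.Queue()
        p = mp.Process(target=pe, args=(to_pe, from_pe))
        p.start()
        to_pe.put("read")                # "access" the remote PE's private memory
        print(from_pe.get())             # 21, obtained only via message exchange
        p.join()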
2.6 Hybrid parallel computer systems
Wavefront arrays
• The central clock causes problems in large systolic arrays, so in wavefront arrays the systolic clocking is replaced by data flow.
Very Long Instruction Word (VLIW)
• A hybrid form of pipelined and MIMD computers.
• Parallelism is achieved through an unusually long instruction format, so that several arithmetic and logic operations are contained in one instruction word.
• Compiler support is used to schedule operations into the instruction words.
Same Program Multiple Data (SPMD)
• A mix of SIMD and MIMD is Same Program Multiple Data.
• The computer system is controlled by a single program, so the ease of SIMD and the flexibility of MIMD are combined; a sketch follows.
• (Synchronization takes place at data exchanges.)
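A minimal SPMD sketch of mine (real SPMD systems typically use MPI; here Python's multiprocessing stands in): every worker runs the same program text on its own slice of the data, and synchronization happens at the data-exchange (gather) point.

    # Hypothetical sketch: the same program applied to multiple data slices.
    import multiprocessing as mp

    def same_program(slice_):
        # Every worker executes this identical code on different data.
        return sum(x * x for x in slice_)

    if __name__ == "__main__":
        data = list(range(16))
        slices = [data[i::4] for i in range(4)]        # one slice per "processor"
        with mp.Pool(4) as pool:
            partials = pool.map(same_program, slices)  # implicit sync at the gather
        print(sum(partials))                           # 1240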
2.7 Some parameters in parallel computers
• PEs, network, memory, speedup, efficiency, latency, etc.
Speedup and Efficiency
• Two important metrics are used to measure the performance of a parallel system (a worked example follows):
• Speedup = elapsed time on a uniprocessor (or single functional unit) / elapsed time on the multiprocessor (or multiple functional units).
• Efficiency = speedup x 100 / number of processors (or functional units).
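The timings below are assumed values of mine, purely to illustrate the two formulas above:

    # Hypothetical sketch: computing speedup and efficiency from elapsed times.
    def speedup(t_serial: float, t_parallel: float) -> float:
        return t_serial / t_parallel

    def efficiency(s: float, n_processors: int) -> float:
        return s * 100 / n_processors              # as a percentage

    s = speedup(t_serial=100.0, t_parallel=20.0)   # assumed timings in seconds
    print(s)                                       # 5.0x using 8 processors
    print(efficiency(s, n_processors=8))           # 62.5 (percent)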
Latency
• is a time measure of the communication overhead incurred between machine subsystems:
– memory latency,
– synchronization latency,
– communication latency.
• In general, the execution of a program may involve combinations of these, depending on the application, formulation, algorithm, language, compilation, and hardware limitations.
Grain size
• is a measure of the amount of computation involved in a software process.
– The simplest measure is to count the number of instructions in a grain (program segment).
– Grain size is commonly described as fine, medium or coarse; a toy classifier follows.
• The levels are: bit-level parallelism; instruction-level or expression-level parallelism; procedure level (subroutines, tasks or coroutines); and program level (jobs, tasks or programs).
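The cut-offs below are the instruction counts quoted later in these notes (fewer than 20 for fine grains, fewer than 2000 for medium, above ten thousand for coarse); the code itself is my sketch.

    # Hypothetical sketch: classifying a grain by its instruction count.
    def grain_class(n_instructions: int) -> str:
        if n_instructions < 20:       # instruction/expression level
            return "fine"
        if n_instructions < 2000:     # procedure level
            return "medium"
        return "coarse"               # program (job) level

    for n in (5, 500, 50_000):
        print(n, grain_class(n))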
2.8 Levels of Parallelism
• Computational granularity, or the level of parallelism in programs, is finer grained at the lower levels and coarser grained at the higher levels.
Bit level parallelism
• Parallel execution of operations at the bit level.
• In the ALU, individual bits are operated on simultaneously; see the sketch below.
• (Fig. 2.13, page 14, Brunnel.)
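One way to see this from software (my illustration): a single bitwise AND on two machine words operates on all of their bit pairs at once in the ALU.

    # Hypothetical sketch: one instruction, many bit-operations in parallel.
    x = 0b1100_1010_1100_1010
    y = 0b1010_1100_1010_1100
    print(bin(x & y))   # all 16 bit pairs are ANDed simultaneously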
Expression (Instruction) level parallelism
• (Fig. 2.12, page 14, Brunnel.)
– A typical grain contains fewer than 20 instructions.
– The advantage of fine-grain computation lies in the abundance of parallelism.
– An optimizing compiler can detect the parallelism automatically.
– Communication overhead is a problem.
– Suited to simple synchronous calculations, such as matrix calculations.
Procedure level
• Medium grain size: fewer than 2000 instructions.
• Inter-procedure analysis is much more involved.
• The programmer may need to restructure the program.
• Multitasking belongs to this category.
• The communication requirement is lower in the MIMD execution mode.
• (Fig. 2.11, page 13, Brunnel.)
• Fields of application:
– Real-time programming.
– Control of time-critical techniques, e.g. a power plant.
– Process control systems.
– Simultaneous control of multiple physical components, e.g. robot control.
– General purpose parallel processing:
• breaking a problem down into sub-tasks, which are distributed onto several processing elements for performance enhancement (see the example figure).
Program (Job) level parallelism
• Grain size is larger than ten thousand instructions.
• Multitasking is required.
• Time-sharing and space-sharing multiprocessors exploit this level of parallelism.
• Processes may be queued.
• Complete programs are executed simultaneously.
• Demands a significant role from the programmer, plus operating system support.
• Less communication.
• Less parallelism.
• Less compiler support.
• Communication latency may increase.
• This delay may be hidden or tolerated by using techniques such as caching, prefetching and multithreading.
• Message passing suits medium- and coarse-grain computations.
• Shared variables are often used to support communication.
• (Figure 2.10, page 12, Brunnel.)
• In general:
– The finer the grain size, the higher the degree of parallelism, but also the higher the communication and scheduling overhead.
– Fine grain thus provides a higher degree of parallelism but heavier communication overhead compared with coarse-grain computation.
• Massive parallelism is explored at the fine-grain level, such as data parallelism on SIMD and MIMD computers.
2.9 Parallel operations
• A totally different way of viewing parallelism comes from analyzing mathematical operations on individual data elements or groups of data.
– Distinguish scalar from vector data, and then whether the processing is carried out in parallel or sequentially.
– Simple operations on vectors (e.g. the addition of two vectors) are parallel operations; a sketch of monadic and dyadic operations follows.
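To make the monadic/dyadic distinction from the chapter outline concrete (the example is mine): a monadic operation takes one vector argument, a dyadic operation takes two, and in both cases every element can be processed in parallel.

    # Hypothetical sketch: monadic (one-argument) vs dyadic (two-argument) vector ops.
    import numpy as np

    v = np.array([1.0, 4.0, 9.0])
    w = np.array([2.0, 2.0, 2.0])

    print(np.sqrt(v))   # monadic: one vector in, elementwise square root
    print(v + w)        # dyadic: two vectors in, elementwise addition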