BIL406-Chapter-6-Basic Parallelism and CPU.ppt

Chapter 6
Basic Parallelism and CPU

• 6.1 Introduction
• 6.2 SISD Computers
• 6.3 Hardware and software parallelism (hardware
parallelism, software parallelism)
• 6.4 The role of compilers
• 6.5 Communication latency
• 6.6 Grain packing and scheduling
• 6.7 Static multiprocessor scheduling
• 6.8 Node duplication

6.1 Introduction
• SISD CPUs
– Parallelism in a conventional CPU
– Multiple issue CPUs
– Multiple functional units
– Parallelism with multiple CPUs
– Grain packing and node duplication
– Scheduling

6.2 SISD computers
• A simple processing element executes a single
instruction stream on a single data stream.
• To recap:
• Conventional von Neumann computer.
• A single processor executes instructions
sequentially.
• The operations are ordered in time and may be
easily traced from start to end.
• Modern uniprocessor systems use some form of
pipelining and superscalar techniques.

• Pipelining introduces temporal parallelism by
allowing the execution of successive instructions to be
overlapped in time (using multiple functional
units).
• The need for branching may reduce its
effectiveness.
• Very long instruction words can be used to
reduce the impact of branching.
• Review
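
As a rough illustration of the overlap described above, the sketch below counts cycles for an ideal k-stage pipeline. The function names, the one-cycle-per-stage assumption, and the `stall_cycles` parameter (a lump-sum stand-in for cycles lost to branches) are assumptions made for this example and do not come from the slides.

```python
def sequential_cycles(stages: int, instructions: int) -> int:
    """Cycles when every instruction finishes all stages before the next starts."""
    return stages * instructions


def pipelined_cycles(stages: int, instructions: int, stall_cycles: int = 0) -> int:
    """Cycles for an ideal pipeline with one cycle per stage.

    The first instruction needs `stages` cycles to fill the pipeline; after
    that one instruction completes per cycle. `stall_cycles` is a hypothetical
    total of cycles lost to branches.
    """
    if instructions == 0:
        return 0
    return stages + instructions - 1 + stall_cycles


def speedup(stages: int, instructions: int, stall_cycles: int = 0) -> float:
    """Ratio of sequential to pipelined cycle counts."""
    return sequential_cycles(stages, instructions) / pipelined_cycles(
        stages, instructions, stall_cycles
    )
```

For a long instruction stream the speedup approaches the number of stages, and any stall cycles pull it back down, which is the branching effect noted in the bullets.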

6.3 Hardware and software parallelism
• For implementation of parallelism, we need special
hardware and software support.
• Distinguish between hardware and software parallelism.
• The mismatch problem between hardware and software.
• Compilation support needed to close the gap between
hardware and software.
• Parallelism cannot be achieved for free.
• Details of special hardware functions and software support.

Hardware parallelism
• Defined by machine hardware and hardware multiplicity.
• Cost and performance tradeoffs.
• Indicates the peak performance of the processor resources.
• A processor that issues k instructions per machine cycle is
called a k-issue processor.
• A conventional processor takes one or more cycles to issue a
single instruction.
• Such a processor is a one-issue machine.
• Examples: the i960CA is a three-issue processor, the Pentium 4 a
4-issue processor, etc.

Software parallelism
• Defined by control and data dependence of programs
• Degree of parallelism is revealed in the program profile or
in the program flow graph.
• Software parallelism is a function of the algorithm.
• The program flow graph displays the patterns of simultaneously
executable operations.
• Example: Hwang (fig 2.3, page 58 and fig 2.4, page 59)
• Control and data parallelism (control parallelism appears in
pipelining or in multiple functional units; data
parallelism offers higher potential for concurrency on SIMD and
MIMD systems)

• Two approaches to solving the mismatch problem between software
parallelism and hardware parallelism:
– Develop compilation support.
– Redesign the hardware for better exploitation by an intelligent or optimizing compiler.
• The instruction scheduler exploits pipeline hardware by
filling branch and load delay slots (using cache and
dynamic scheduling).
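
The mismatch between software and hardware parallelism can be sketched with a small greedy scheduler. The dependence graph below (four loads, two multiplies, an add, and a subtract), the one-cycle latency per operation, and the name `schedule_cycles` are all assumptions chosen for illustration; this is not a reproduction of the Hwang figures.

```python
from typing import Dict, List


def schedule_cycles(deps: Dict[str, List[str]], issue_width: int) -> int:
    """Greedy cycle-by-cycle schedule; every operation takes one cycle.

    deps maps each operation to the operations it depends on.
    issue_width is the k of a k-issue processor.
    """
    done: set = set()
    cycles = 0
    while len(done) < len(deps):
        # Operations whose inputs were all produced in earlier cycles.
        ready = [op for op in deps
                 if op not in done and all(d in done for d in deps[op])]
        if not ready:
            raise ValueError("dependence graph contains a cycle")
        done.update(ready[:issue_width])  # issue at most k operations
        cycles += 1
    return cycles


# Hypothetical program: loads feed multiplies, which feed an add and a subtract.
example_deps = {
    "L1": [], "L2": [], "L3": [], "L4": [],
    "M1": ["L1", "L2"], "M2": ["L3", "L4"],
    "A": ["M1", "M2"], "S": ["M1", "M2"],
}

for k in (1, 2, 8):
    cycles = schedule_cycles(example_deps, k)
    print(f"{k}-issue: {cycles} cycles, "
          f"{len(example_deps) / cycles:.2f} operations per cycle")
```

With a very wide issue width the cycle count is limited only by the dependences (software parallelism); with a narrow issue width the hardware becomes the limit.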

6.4 The role of compilers
• Compiler techniques used to exploit hardware features to
improve performance.
• Loop transformations, software pipelining, and other features
developed in existing optimizing compilers support
parallelism.
• Hardware and software should be designed jointly.
• Hardware and software design tradeoffs also exist in terms
of cost, complexity, expandability, compatibility, and
performance.
• Granularity and communication latency play an important
role in code optimization and scheduling.
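
One loop transformation of the kind mentioned above is unrolling with independent accumulators. The sketch below writes it out by hand in Python purely to show the shape of the transformation; an optimizing compiler would apply it to machine code, and the Python interpreter itself gains no parallelism from it. The function names are hypothetical.

```python
from typing import List


def dot_rolled(a: List[int], b: List[int]) -> int:
    """Original loop: every iteration depends on the previous value of total."""
    total = 0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total


def dot_unrolled4(a: List[int], b: List[int]) -> int:
    """Loop unrolled by 4 with independent partial sums.

    The four accumulators have no dependences on each other, so a
    multiple-issue processor could update them in the same cycle.
    """
    s0 = s1 = s2 = s3 = 0
    n4 = len(a) - len(a) % 4          # largest multiple of 4 not above len(a)
    for i in range(0, n4, 4):
        s0 += a[i] * b[i]
        s1 += a[i + 1] * b[i + 1]
        s2 += a[i + 2] * b[i + 2]
        s3 += a[i + 3] * b[i + 3]
    tail = sum(a[i] * b[i] for i in range(n4, len(a)))  # leftover elements
    return s0 + s1 + s2 + s3 + tail
```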

6.5 Communication latency
• Balancing granularity and latency achieves better
performance (depends on technology, scalability, and
machine size).
• Memory latency increases with memory capacity.
• Various latency-hiding and latency-tolerating techniques exist.
• Inter-process communication latency is another important
parameter.
• n tasks communicating with each other require
n(n-1)/2 communication links (grows quadratically with n).
• Communication patterns.
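
The n(n-1)/2 link count can be checked by enumerating every unordered pair of tasks. This is a minimal sketch; the function names are chosen for the example.

```python
from itertools import combinations
from typing import List, Tuple


def link_count(n: int) -> int:
    """Closed-form number of links for n fully connected tasks."""
    return n * (n - 1) // 2


def enumerate_links(n: int) -> List[Tuple[int, int]]:
    """Every unordered pair of task indices, one entry per link."""
    return list(combinations(range(n), 2))


for n in (2, 4, 8, 16):
    # The enumeration and the formula must agree.
    assert len(enumerate_links(n)) == link_count(n)
    print(n, "tasks need", link_count(n), "links")
```

Doubling the number of tasks roughly quadruples the number of links, which is why communication demand limits how finely a program can be partitioned.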

• Patterns include permutation, broadcast, multicast,
and conference.
• Communication demand may limit the granularity of
parallelism.
• Trade-off between communication and granularity.
• Reduce latency and complexity of communication.
• Prevention of deadlock.
• Minimization of blocking in communication.

6.6 Grain packing and scheduling
• Two fundamental questions:
– 1. How can we partition a program into parallel branches,
program modules, or grains to yield the shortest
possible execution time?
– 2. What is the optimal size of concurrent grains in a
computation?
• Both problems are machine-dependent
• The goal is a short schedule for fast execution of
subdivided program modules.
• Tradeoffs between parallelism and
scheduling/synchronization overhead.
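
The trade-off between parallelism and overhead can be made concrete with a toy cost model. Everything here is an assumption for illustration: the work is perfectly divisible, each grain pays a fixed scheduling/synchronization overhead, and grains run in rounds of one grain per processor. Real machines are more complicated.

```python
import math


def parallel_time(work: int, grain: int, processors: int, overhead: int) -> int:
    """Toy execution time for `work` units split into grains of size `grain`.

    Each grain costs its size plus a fixed per-grain overhead, and the
    grains are executed in rounds of at most `processors` grains.
    """
    grains = math.ceil(work / grain)
    rounds = math.ceil(grains / processors)
    return rounds * (grain + overhead)


# Hypothetical workload: 1200 work units, 4 processors, overhead of 10 per grain.
candidates = [1, 100, 300, 1200]
times = {g: parallel_time(1200, g, 4, 10) for g in candidates}
best = min(candidates, key=lambda g: times[g])
print(times, "best grain size:", best)
```

Very fine grains are dominated by overhead, while a single coarse grain leaves three processors idle; in this model the best choice lies in between.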

• Partitioning involves the algorithm designer, programmer,
compiler, operating system support, etc.
• Hwang (fig 2.6, page 65)
• Node label (n, s): n is the node name, s is the grain size
• Edge label (v, d): v is the output variable, d is the communication delay
• Fine grain, coarse grain, and grain packing
• Hwang (fig 2.7, page 66)
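
The (n, s) and (v, d) notation maps naturally onto two dictionaries, and grain packing then becomes a regrouping of nodes. The sketch below uses a made-up four-node graph, not the one in the Hwang figures; `pack` and `grain_of` are hypothetical names. It assumes that communication between nodes packed into the same grain costs nothing.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def pack(
    sizes: Dict[str, int],
    delays: Dict[Edge, int],
    grain_of: Dict[str, str],
) -> Tuple[Dict[str, int], Dict[Edge, List[int]], int]:
    """Merge fine-grain nodes into coarse grains.

    Returns the coarse grain sizes, the delays of the edges that still
    cross grain boundaries, and the total delay removed by packing.
    """
    coarse_sizes: Dict[str, int] = {}
    for node, size in sizes.items():
        grain = grain_of[node]
        coarse_sizes[grain] = coarse_sizes.get(grain, 0) + size

    coarse_edges: Dict[Edge, List[int]] = {}
    removed_delay = 0
    for (src, dst), delay in delays.items():
        g_src, g_dst = grain_of[src], grain_of[dst]
        if g_src == g_dst:
            removed_delay += delay      # edge is now internal to one grain
        else:
            coarse_edges.setdefault((g_src, g_dst), []).append(delay)
    return coarse_sizes, coarse_edges, removed_delay


# Hypothetical fine-grain graph: node -> grain size s, edge -> delay d.
sizes = {"1": 1, "2": 1, "3": 2, "4": 2}
delays = {("1", "3"): 6, ("2", "3"): 6, ("3", "4"): 4}
grain_of = {"1": "A", "2": "A", "3": "A", "4": "B"}

coarse_sizes, coarse_edges, removed_delay = pack(sizes, delays, grain_of)
print(coarse_sizes, coarse_edges, removed_delay)
```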

6.7 Static multiprocessor scheduling
• Grain packing may not always produce a shorter
schedule.
• Dynamic multiprocessor scheduling is an
NP-hard problem.

6.8 Node duplication
• Used to eliminate idle time and reduce communication
delay.
• Four major steps for grain packing and scheduling:
– 1. Construct a fine-grain program graph.
– 2. Schedule the fine-grain computation.
– 3. Perform grain packing to produce the coarse grains.
– 4. Generate a parallel schedule based on the packed
graph.
• Hwang (fig 2.8, page 67)
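
The effect of node duplication on idle time can be sketched with a small schedule evaluator. The three-node graph, the sizes and delays, and the processor names P1 and P2 are invented for this example and are not the values in the Hwang figure. The evaluator assumes that results pass between nodes on the same processor with zero delay and that nodes are listed in a valid execution order.

```python
from typing import Dict, List, Tuple

Placement = Tuple[str, str]  # (node name, processor name)


def makespan(
    sizes: Dict[str, int],
    delays: Dict[Tuple[str, str], int],
    order: List[Placement],
) -> int:
    """Finish time of the last node for a given placement order.

    A node may be placed on several processors (duplication); a consumer
    uses whichever copy of its predecessor delivers the result earliest.
    """
    proc_free: Dict[str, int] = {}
    finish: Dict[Placement, int] = {}
    for node, proc in order:
        start = proc_free.get(proc, 0)
        for (src, dst), delay in delays.items():
            if dst != node:
                continue
            # Earliest arrival of src's result over all copies placed so far.
            arrival = min(
                t + (0 if p == proc else delay)
                for (n, p), t in finish.items()
                if n == src
            )
            start = max(start, arrival)
        end = start + sizes[node]
        finish[(node, proc)] = end
        proc_free[proc] = end
    return max(finish.values())


# Hypothetical graph: A (size 4) feeds B and C (size 1 each), delay 8 per edge.
sizes = {"A": 4, "B": 1, "C": 1}
delays = {("A", "B"): 8, ("A", "C"): 8}

without_dup = [("A", "P1"), ("B", "P1"), ("C", "P2")]
with_dup = [("A", "P1"), ("A", "P2"), ("B", "P1"), ("C", "P2")]

print("without duplication:", makespan(sizes, delays, without_dup))
print("with duplication:   ", makespan(sizes, delays, with_dup))
```

Without duplication, P2 sits idle while the result of A crosses the interconnect; running a second copy of A on P2 trades redundant computation for the removed delay.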

• Calculation of grain size and communication delay
• Hwang (fig 2.9, page 68)

• Sequential versus parallel scheduling
• Hwang (fig 2.10, page 69)

• Grain packing for the problem in fig 2.9(c)
• Hwang (fig 2.11, page 70)
