1. UNIT IV – PARALLELISM
Parallel processing challenges – Flynn's classification – SISD, MIMD, SIMD, SPMD, and Vector Architectures – Hardware multithreading – Multicore processors and other Shared Memory Multiprocessors – Introduction to Graphics Processing Units, Clusters, Warehouse-Scale Computers and other Message-Passing Multiprocessors
2. Introduction:
• Processing data concurrently is known as Parallel Processing.
• Consider a multiprocessor system with 'n' processors. If a processor fails, the system would continue to provide service with the remaining 'n−1' processors.
• Parallelism is a mode of operation in which a process is split into parts, which are executed simultaneously on different processors attached to the same computer.
3. • Two ways:
• Multiple functional units – two or more ALUs
• Multiple processors – two or more processors
• Multiprocessor system
• Task-level parallelism or process-level parallelism
• Parallel processing program
• Cluster
Multicore:
• Architecture design that places multiple processors on a single die (computer chip).
• E.g., dual, quad, hexa, octa core.
Necessity:
• Reduce power consumption
• Cut cost
4. Goals of Parallelism:
• It increases the computational speed.
• It increases throughput by allowing two or more ALUs in the CPU to work concurrently.
[Throughput – the amount of processing that can be accomplished during a given interval of time]
• It improves the performance of the computer for a given clock speed.
6. Instruction Level Parallelism
• When instructions in a sequence are independent and can therefore be executed in parallel, there is instruction-level parallelism (see the short example after this list).
• Two primary methods are:
1. Increasing the depth of the pipeline
2. Replicating the internal components
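A tiny illustrative snippet (function and variable names are invented here, not from the slides): the first three statements are mutually independent, so a multiple-issue processor can execute them in the same clock cycle, while the final statement depends on all three and must wait for their results.

int ilp_demo(int x, int y, int z) {
    int a = x + y;    // independent of b and c
    int b = x * z;    // independent of a and c
    int c = y - z;    // independent of a and b -> a, b, c can issue in parallel
    return a + b + c; // depends on a, b, and c -> must issue after them
}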
7. 1. Implementing a multiple-issue processor
- Static and dynamic
2. Speculation
- An approach in which the processor guesses the outcome of an instruction so that execution can proceed without waiting
3. Recovery mechanisms
- Exception handling
4. Instruction issue policy
- In-order issue with in-order completion
- In-order issue with out-of-order completion
- Out-of-order issue with out-of-order completion
5. Register renaming
6. Branch prediction
8. Parallel Processing Challenges
• The challenge faced by the industry is to create hardware and software that make it easy to write correct parallel processing programs that execute efficiently in both performance and energy.
• Challenges:
• Writing parallel programs
• Scheduling
• Partitioning the task
• Balancing the load between processors
9. Parallel Processing Challenges
• Amdahl's Law:
Amdahl's law is used to calculate the performance gain that can be obtained by improving some portion of a computer.

Speedup = 1 / [ (1 − Fe) + (Fe / Se) ]

where Fe is the fraction of execution time that benefits from the enhancement and Se is the speedup of the enhanced fraction.
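A quick worked example (numbers chosen purely for illustration): suppose the fraction Fe = 0.9 of a program can be enhanced (parallelized) and that portion runs Se = 10 times faster.

Speedup = 1 / [ (1 − 0.9) + (0.9 / 10) ] = 1 / (0.1 + 0.09) = 1 / 0.19 ≈ 5.26

Even though 90% of the program is sped up 10×, the untouched 10% limits the overall gain to about 5.3×.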
11. SISD (Single Instruction Single Data)
• A single processor executes one instruction stream on one data stream; it can only do one job at a time from start to finish.
12. SIMD (Single Instruction Multiple Data)
• They have multiple processing/execution units and one control unit.
• SPMD (Single Program Multiple Data): a single program runs across all processing units, each operating on different data.
13. MISD (Multiple Instruction Single Data)
• There are N control units and processing units operating over the same data stream; the result of one processor becomes the input of the next processor.
14. MIMD (Multiple Instruction Multiple Data)
• Most multiprocessor systems and multiple-computer systems come under this category.
• Multiple SISD (MSISD).
15. Vector Architecture
• An efficient implementation of SIMD.
• It collects data elements from memory, places them in order into a large set of registers, operates on them sequentially in the registers, and then writes the results back to memory.
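A minimal sketch of the kind of loop vector hardware handles well (DAXPY; the function name and signature are illustrative): every iteration is independent, so the machine can load whole blocks of x and y into vector registers, multiply-add them, and store the results back, instead of making one scalar pass per element.

// y = a*x + y: on a vector machine, this loop body maps to a few vector
// instructions (vector load, vector multiply-add, vector store).
void daxpy(int n, double a, const double *x, double *y) {
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}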
17. Hardware Multithreading
• The instruction stream is divided into several smaller streams called threads (a minimal software sketch follows the list of terms below).
• Viewed another way, it provides a higher degree of instruction-level parallelism.
Some terms:
• Process
• Resource ownership
• Scheduling/execution
• Process switch
• Thread
• Thread switch
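A minimal software sketch of the idea (using C++ std::thread purely for illustration; hardware multithreading itself interleaves thread contexts inside the core): one stream of work, summing an array, is divided into two smaller streams that can run on different hardware thread contexts.

#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

int main() {
    std::vector<int> data(1000, 1);
    long lo_sum = 0, hi_sum = 0;
    // Split one instruction stream (the summation) into two threads.
    std::thread lo([&] { lo_sum = std::accumulate(data.begin(), data.begin() + 500, 0L); });
    std::thread hi([&] { hi_sum = std::accumulate(data.begin() + 500, data.end(), 0L); });
    lo.join();
    hi.join();
    std::cout << lo_sum + hi_sum << "\n"; // prints 1000
}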
25. Multicore Processors and Other Shared Memory Multiprocessors
• Multicore architectures are classified into 3 types:
1. Type 1 (Hyper-threading technology)
2. Type 2 (Classic multiprocessor)
3. Type 3 (Multicore system)
28. Shared Memory Multiprocessor (SMP)
• An SMP is one that offers the programmer a single physical address space across all processors.
• Classified as:
1. Uniform memory access (UMA) multiprocessor
2. Non-uniform memory access (NUMA) multiprocessor
30. UMA vs. NUMA
1. Definition: UMA stands for Uniform Memory Access; NUMA stands for Non-Uniform Memory Access.
2. Memory controller: UMA has a single memory controller; NUMA has multiple memory controllers.
3. Memory access: UMA memory access is slow; NUMA memory access is faster than UMA.
4. Bandwidth: UMA has limited bandwidth; NUMA has more bandwidth than UMA.
5. Suitability: UMA is used in general-purpose and time-sharing applications; NUMA is used in real-time and time-critical applications.
6. Memory access time: UMA has equal memory access time; NUMA has varying memory access time.
7. Bus types: UMA supports 3 bus types (single, multiple, crossbar); NUMA supports 2 bus types (tree, hierarchical).
31. Graphics Processing Unit (GPU)
1. GPUs vs CPUs
• Programming interfaces to the GPU are high-level application programming interfaces (APIs) such as DirectX, OpenGL, NVIDIA's C for Graphics (Cg), etc.
• The CPU supports sequential coding while the GPU supports parallel coding.
34. 3. GPU Architecture
o SIMD
One instruction operates on multiple data.
o Multithreading
Most graphics workloads have this property, since they need to process many objects (pixels, vertices, polygons) simultaneously.
o NVIDIA GPU architecture
1. GPUs integrated on the motherboard
2. Tesla-based GPUs – 900 MHz, 128 MB DDR3 RAM
36. CUDA Programming
o Compute Unified Device Architecture
o CUDA is a parallel computing platform
and programming model developed by Nvidia for
general computing on its own GPUs (graphics
processing units).
o CUDA enables developers to speed up compute-intensive applications by harnessing the power of GPUs for the parallelizable part of the computation.
o Heterogeneous CPU and GPU system.
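A minimal CUDA sketch of this heterogeneous CPU+GPU model (array size, names, and launch configuration are illustrative): the CPU (host) allocates device memory, copies inputs over, launches a kernel that many GPU threads execute in parallel, and copies the result back.

#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each GPU thread adds one pair of elements (the parallelizable part).
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1024;
    static float ha[n], hb[n], hc[n];
    for (int i = 0; i < n; ++i) { ha[i] = i; hb[i] = 2.0f * i; }

    float *da, *db, *dc;
    cudaMalloc(&da, n * sizeof(float));
    cudaMalloc(&db, n * sizeof(float));
    cudaMalloc(&dc, n * sizeof(float));
    cudaMemcpy(da, ha, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, n * sizeof(float), cudaMemcpyHostToDevice);

    vecAdd<<<(n + 255) / 256, 256>>>(da, db, dc, n); // 4 blocks of 256 threads

    cudaMemcpy(hc, dc, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("hc[10] = %.1f\n", hc[10]); // expect 30.0
    cudaFree(da); cudaFree(db); cudaFree(dc);
    return 0;
}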
37. Message-passing multiprocessors
o With no shared memory space, the alternative way to build a multiprocessor is via an explicit message-passing technique.
o This is done by establishing a communication channel between two processors (a minimal sketch follows).
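A minimal message-passing sketch using MPI (assuming an MPI installation; launched with, e.g., mpirun -np 2): process 0 sends a value over the communication channel and process 1 receives it; no memory is shared between them.

#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int value = 0;
    if (rank == 0) {
        value = 42;
        // Explicitly send the value to process 1: no shared address space.
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}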
39. Shared memory multiprocessor
o A shared memory multiprocessor is a computer system composed of multiple independent processors that execute different instruction streams.
o Processors share a common memory address space and communicate with each other via memory.
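In contrast to the message-passing sketch above, a minimal shared-memory sketch (std::thread and std::atomic used for illustration): both threads communicate simply by updating the same memory location.

#include <atomic>
#include <iostream>
#include <thread>

int main() {
    std::atomic<int> counter{0}; // one address space, visible to both threads
    auto work = [&] {
        for (int i = 0; i < 100000; ++i)
            counter.fetch_add(1); // communicate through shared memory
    };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    std::cout << counter.load() << "\n"; // always 200000
}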
40. Clusters
o Clusters are collections of desktop computers or
servers connected by local area networks to act as
a single large computer.
41. Warehouse-Scale Computers
o The largest form of clusters is called warehouse-scale computers (WSCs).
o WSCs provide internet services, e.g.:
1. Google
2. Facebook
3. YouTube
4. Amazon
42. Goals and requirements shared with servers:
• Cost-performance
• Energy efficiency
• Dependability
• Network I/O
• Interactive workloads
Characteristics not shared with servers:
• Ample parallelism
• Operational costs count
• Scale
43. Questions
o List the four major groups of computers defined by Michael J. Flynn.
o State Amdahl's law.
o Define parallel processing.
o What is speculation?
o State coarse-grained multithreading.
o Write a note on SIMD processors.
o Define VLIW.
o Compare UMA and NUMA multiprocessors.
o What is a multicore processor?
44. Part B
o What is hardware multithreading? Compare and contrast fine-grained and coarse-grained multithreading.
o Discuss instruction-level parallelism in detail.
o Explain Flynn's classification of parallel hardware in detail.
45. Part B
o Explain:
(i) Shared memory multiprocessors. (3)
(ii) Warehouse-scale computers. (7)
(iii) Message-passing multiprocessors. (4)
(iv) Parallel processing challenges. (3)
(v) Clusters and message-passing systems. (7)
o Describe the GPU architecture in detail.