1. UNIT IV – PARALLELISM
Parallel processing challenges – Flynn's classification – SISD, MIMD, SIMD, SPMD, and Vector Architectures – Hardware multithreading – Multicore processors and other Shared Memory Multiprocessors – Introduction to Graphics Processing Units, Clusters, Warehouse-Scale Computers and other Message-Passing Multiprocessors
2. Introduction:
• Processing data concurrently is known as Parallel Processing.
• Consider a multiprocessor system with 'n' processors. If a processor fails, the system would continue to provide service with the remaining 'n−1' processors.
• Parallelism is a mode of operation in which a process is split into parts, which are executed simultaneously on different processors attached to the same computer.
3. • Two ways:
• Multiple functional units – two or more ALUs
• Multiple processors – two or more processors
• Multiprocessor system
• Task-level parallelism or process-level parallelism
• Parallel processing program
• Cluster
Multicore:
• Architecture design that places multiple processors on a single die (computer chip).
• E.g., dual, quad, hexa, octa core.
Necessity:
• Reduce power consumption
• Cut cost
4. Goals of Parallelism:
• It increases the computational speed.
• It increases throughput by allowing two or more ALUs in the CPU to work concurrently.
[Throughput – the amount of processing that can be accomplished during a given interval of time]
• It improves the performance of the computer for a given clock speed.
6. Instruction Level Parallelism
• When instructions in a sequence are independent and can therefore be executed in parallel, there is instruction-level parallelism (see the short example after this list).
• Two primary methods are:
1. Increasing the depth of the pipeline
2. Replicating the internal components
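A tiny illustrative snippet (function and variable names are invented here, not from the slides): the first three statements are mutually independent, so a multiple-issue processor can execute them in the same clock cycle, while the final statement depends on all three and must wait for their results.

int ilp_demo(int x, int y, int z) {
    int a = x + y;    // independent of b and c
    int b = x * z;    // independent of a and c
    int c = y - z;    // independent of a and b -> a, b, c can issue in parallel
    return a + b + c; // depends on a, b, and c -> must issue after them
}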
7. 1. Implementing a multiple-issue processor
- Static and dynamic
2. Speculation
- An approach in which the processor guesses the outcome of an instruction so that execution can proceed without waiting
3. Recovery mechanisms
- Exception handling
4. Instruction issue policy
- In-order issue with in-order completion
- In-order issue with out-of-order completion
- Out-of-order issue with out-of-order completion
5. Register renaming
6. Branch prediction
8. Parallel Processing Challenges
• The challenge faced by the industry is to create hardware and software that make it easy to write correct parallel processing programs that execute efficiently in both performance and energy.
• Challenges:
• Writing parallel programs
• Scheduling
• Partitioning the task
• Balancing the load between processors
9. Parallel Processing Challenges
• Amdahl's Law:
Amdahl's law is used to calculate the performance gain that can be obtained by improving some portion of a computer.

Speedup = 1 / [ (1 − Fe) + (Fe / Se) ]

where Fe is the fraction of execution time that benefits from the enhancement and Se is the speedup of the enhanced fraction.
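A quick worked example (numbers chosen purely for illustration): suppose the fraction Fe = 0.9 of a program can be enhanced (parallelized) and that portion runs Se = 10 times faster.

Speedup = 1 / [ (1 − 0.9) + (0.9 / 10) ] = 1 / (0.1 + 0.09) = 1 / 0.19 ≈ 5.26

Even though 90% of the program is sped up 10×, the untouched 10% limits the overall gain to about 5.3×.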
11. SISD (Single Instruction Single Data)
• A single processor executes one instruction stream on one data stream; it can only do one job at a time from start to finish.
12. SIMD (Single Instruction Multiple Data)
• They have multiple processing/execution units and one control unit.
• SPMD (Single Program Multiple Data): a single program runs across all processing units, each operating on different data.
13. MISD (Multiple Instruction Single Data)
• There are N control units and processing units operating over the same data stream; the result of one processor becomes the input of the next processor.
14. MIMD (Multiple Instruction Multiple Data)
• Most multiprocessor systems and multiple-computer systems come under this category.
• Multiple SISD (MSISD).
15. Vector Architecture
• An efficient implementation of SIMD.
• It collects data elements from memory, places them in order into a large set of registers, operates on them sequentially in the registers, and then writes the results back to memory.
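A minimal sketch of the kind of loop vector hardware handles well (DAXPY; the function name and signature are illustrative): every iteration is independent, so the machine can load whole blocks of x and y into vector registers, multiply-add them, and store the results back, instead of making one scalar pass per element.

// y = a*x + y: on a vector machine, this loop body maps to a few vector
// instructions (vector load, vector multiply-add, vector store).
void daxpy(int n, double a, const double *x, double *y) {
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}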
17. Hardware Multithreading
• The instruction stream is divided into several smaller streams called threads (a minimal software sketch follows the list of terms below).
• Viewed another way, it provides a higher degree of instruction-level parallelism.
Some terms:
• Process
• Resource ownership
• Scheduling/execution
• Process switch
• Thread
• Thread switch
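A minimal software sketch of the idea (using C++ std::thread purely for illustration; hardware multithreading itself interleaves thread contexts inside the core): one stream of work, summing an array, is divided into two smaller streams that can run on different hardware thread contexts.

#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

int main() {
    std::vector<int> data(1000, 1);
    long lo_sum = 0, hi_sum = 0;
    // Split one instruction stream (the summation) into two threads.
    std::thread lo([&] { lo_sum = std::accumulate(data.begin(), data.begin() + 500, 0L); });
    std::thread hi([&] { hi_sum = std::accumulate(data.begin() + 500, data.end(), 0L); });
    lo.join();
    hi.join();
    std::cout << lo_sum + hi_sum << "\n"; // prints 1000
}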
25. Multicore Processors and Other Shared Memory Multiprocessors
• Multicore architectures are classified into 3 types:
1. Type 1 (Hyper-threading technology)
2. Type 2 (Classic multiprocessor)
3. Type 3 (Multicore system)
28. Shared Memory Multiprocessor (SMP)
• An SMP is one that offers the programmer a single physical address space across all processors.
• Classified as:
1. Uniform memory access (UMA) multiprocessor
2. Non-uniform memory access (NUMA) multiprocessor
30. UMA vs. NUMA
1. Definition: UMA stands for Uniform Memory Access; NUMA stands for Non-Uniform Memory Access.
2. Memory controller: UMA has a single memory controller; NUMA has multiple memory controllers.
3. Memory access: UMA memory access is slow; NUMA memory access is faster than UMA.
4. Bandwidth: UMA has limited bandwidth; NUMA has more bandwidth than UMA.
5. Suitability: UMA is used in general-purpose and time-sharing applications; NUMA is used in real-time and time-critical applications.
6. Memory access time: UMA has equal memory access time; NUMA has varying memory access time.
7. Bus types: UMA supports 3 bus types (single, multiple, crossbar); NUMA supports 2 bus types (tree, hierarchical).
31. Graphics Processing Unit (GPU)
1. GPUs vs CPUs
• Programming interfaces to the GPU are high-level application programming interfaces (APIs) such as DirectX, OpenGL, NVIDIA's C for Graphics (Cg), etc.
• The CPU supports sequential coding while the GPU supports parallel coding.
34. 3. GPU Architecture
o SIMD
One instruction operates on multiple data.
o Multithreading
Most graphics workloads have this property, since they need to process many objects (pixels, vertices, polygons) simultaneously.
o NVIDIA GPU architecture
1. GPUs integrated on the motherboard
2. Tesla-based GPUs – 900 MHz, 128 MB DDR3 RAM
36. CUDA Programming
o Compute Unified Device Architecture
o CUDA is a parallel computing platform
and programming model developed by Nvidia for
general computing on its own GPUs (graphics
processing units).
o CUDA enables developers to speed up compute-intensive applications by harnessing the power of GPUs for the parallelizable part of the computation.
o Heterogeneous CPU and GPU system.
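A minimal CUDA sketch of this heterogeneous CPU+GPU model (array size, names, and launch configuration are illustrative): the CPU (host) allocates device memory, copies inputs over, launches a kernel that many GPU threads execute in parallel, and copies the result back.

#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each GPU thread adds one pair of elements (the parallelizable part).
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1024;
    static float ha[n], hb[n], hc[n];
    for (int i = 0; i < n; ++i) { ha[i] = i; hb[i] = 2.0f * i; }

    float *da, *db, *dc;
    cudaMalloc(&da, n * sizeof(float));
    cudaMalloc(&db, n * sizeof(float));
    cudaMalloc(&dc, n * sizeof(float));
    cudaMemcpy(da, ha, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, n * sizeof(float), cudaMemcpyHostToDevice);

    vecAdd<<<(n + 255) / 256, 256>>>(da, db, dc, n); // 4 blocks of 256 threads

    cudaMemcpy(hc, dc, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("hc[10] = %.1f\n", hc[10]); // expect 30.0
    cudaFree(da); cudaFree(db); cudaFree(dc);
    return 0;
}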
37. Message-passing multiprocessors
o With no shared memory space, the alternative way to build a multiprocessor is via an explicit message-passing technique.
o This is done by establishing a communication channel between two processors (a minimal sketch follows).
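A minimal message-passing sketch using MPI (assuming an MPI installation; launched with, e.g., mpirun -np 2): process 0 sends a value over the communication channel and process 1 receives it; no memory is shared between them.

#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int value = 0;
    if (rank == 0) {
        value = 42;
        // Explicitly send the value to process 1: no shared address space.
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}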
39. Shared memory multiprocessor
o A shared memory multiprocessor is a computer system composed of multiple independent processors that execute different instruction streams.
o Processors share a common memory address space and communicate with each other via memory.
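In contrast to the message-passing sketch above, a minimal shared-memory sketch (std::thread and std::atomic used for illustration): both threads communicate simply by updating the same memory location.

#include <atomic>
#include <iostream>
#include <thread>

int main() {
    std::atomic<int> counter{0}; // one address space, visible to both threads
    auto work = [&] {
        for (int i = 0; i < 100000; ++i)
            counter.fetch_add(1); // communicate through shared memory
    };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    std::cout << counter.load() << "\n"; // always 200000
}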
40. Clusters
o Clusters are collections of desktop computers or
servers connected by local area networks to act as
a single large computer.
41. Warehouse-Scale Computers
o The largest form of clusters is called warehouse-scale computers (WSCs).
o WSCs provide internet services, e.g.:
1. Google
2. Facebook
3. YouTube
4. Amazon
42. Goals and requirements shared with servers:
• Cost-performance
• Energy efficiency
• Dependability
• Network I/O
• Interactive workloads
Characteristics not shared with servers:
• Ample parallelism
• Operational costs count
• Scale
43. Questions
o List the four major groups of computers defined by Michael J. Flynn.
o State Amdahl's law.
o Define parallel processing.
o What is speculation?
o State coarse-grained multithreading.
o Write a note on SIMD processors.
o Define VLIW.
o Compare UMA and NUMA multiprocessors.
o What is a multicore processor?
44. Part B
o What is hardware multithreading? Compare and contrast fine-grained and coarse-grained multithreading.
o Discuss instruction-level parallelism in detail.
o Explain Flynn's classification of parallel hardware in detail.
45. Part B
o Explain:
(i) Shared memory multiprocessors. (3)
(ii) Warehouse-scale computers. (7)
(iii) Message-passing multiprocessors. (4)
(iv) Parallel processing challenges. (3)
(v) Clusters and message-passing systems. (7)
o Describe the GPU architecture in detail.