PARALLEL
PROCESSING
CONCEPTS
Prof. Shashikant V. Athawale
Assistant Professor | Computer Engineering
Department | AISSMS College of Engineering,
Kennedy Road, Pune, MH, India - 411001
Contents
 Introduction to Parallel Computing
 Motivating Parallelism
 Scope of Parallel Computing
 Parallel Programming Platforms
 Implicit Parallelism
 Trends in Microprocessor Architectures
 Limitations of Memory System Performance
 Dichotomy of Parallel Computing Platforms
 Physical Organization of Parallel Platforms
 Communication Costs in Parallel Machines
 Scalable design principles
 Architectures: N-wide superscalar architectures
 Multi-core architectures.
Introduction to Parallel
Computing
A parallel computer is a “collection of processing
elements that communicate and co-operate to solve large
problems fast”.
Processing multiple tasks simultaneously on
multiple processors is called parallel processing.
What is Parallel Computing?
Traditionally, software has been written for serial computation:
To be run on a single computer having a single Central Processing Unit (CPU)
What is Parallel Computing?
In the simplest sense, parallel computing is the simultaneous use of
multiple compute resources to solve a computational problem.
Serial vs. Parallel Computing
[Figure: serial execution alternates Fetch/Store and Compute on a single processor; parallel execution adds a Communicate step, with processors cooperating like players in a team game.]
Motivating Parallelism
The role of parallelism in accelerating computing
speeds has been recognized for several decades.
Its role in providing multiplicity of datapaths and
increased access to storage elements has been
significant in commercial applications.
The scalable performance and lower cost of parallel
platforms is reflected in the wide variety of applications.
Developing parallel hardware and software has traditionally
been time- and effort-intensive.
Viewed against rapidly improving uniprocessor speeds, one is
tempted to question the need for parallel computing.
That improvement cannot continue indefinitely, however: it is
constrained by a number of fundamental physical and
computational limitations.
Meanwhile, the emergence of standardized parallel programming
environments, libraries, and hardware has significantly
reduced the time to (parallel) solution.
In short
1. Overcome limits to serial computing
2. Limits to increasing transistor density
3. Limits to data transmission speed
4. Faster turn-around time
5. Solve larger problems
Scope of Parallel Computing
 Parallel computing has a great impact on a wide range of
applications.
 Commercial
 Scientific
 Turnaround time should be minimal
 High performance
 Resource management
 Load balancing
 Dynamic libraries
 Minimal network congestion and latency
Applications
 Commercial computing.
- Weather forecasting
- Remote sensors, Image processing
- Process optimization, operations research.
 Scientific and Engineering application.
- Computational chemistry
- Molecular modelling
- Structural mechanics
 Business application.
- E-Governance
- Medical Imaging
 Internet applications.
- Internet server
- Digital Libraries
Parallel Programming
Platforms
 The main objective is to give the programmer
sufficient detail to write efficient code on a
variety of platforms.
 Performance of various parallel
algorithms.
Implicit Parallelism
A programming language is said to be
implicitly parallel if its compiler or interpreter
can recognize opportunities for
parallelization and implement them without
being told to do so.
Implicitly parallel programming
languages
 Microsoft Axum
 MATLAB's M-code
 ZPL
 Laboratory Virtual Instrument Engineering
Workbench (LabVIEW)
 NESL
 SISAL
 High-Performance Fortran (HPF)
Dichotomy of Parallel
Computing Platforms
 We first explore a dichotomy based on the logical and
physical organization of parallel platforms.
 The logical organization refers to a programmer's
view of the platform while the physical organization
refers to the actual hardware organization of the
platform.
 The two critical components of parallel computing
from a programmer's perspective are ways of
expressing parallel tasks and mechanisms for
specifying interaction between these tasks.
 The former is sometimes also referred to as the
control structure and the latter as the communication
model.
Control Structure of Parallel Platforms
Parallel tasks can be specified at various levels of granularity.
At one extreme, each program in a set of programs can be viewed
as one parallel task; at the other extreme, individual instructions
within a program can be viewed as parallel tasks. Between these
extremes lies a range of models for specifying the control structure
of programs, and the corresponding architectural support for them.
Parallelism from single instruction on multiple processors
Consider the following code segment, which adds two vectors:

for (i = 0; i < 1000; i++)
    c[i] = a[i] + b[i];

In this example, the iterations of the loop are independent of
each other; i.e., c[0] = a[0] + b[0], c[1] = a[1] + b[1], etc., can all be
executed independently. Consequently, if there is a mechanism for executing the same
instruction, in this case an add, on all the processors with the appropriate data, we
could execute this loop much faster.
Figure: A typical SIMD architecture (a) and a typical MIMD architecture (b).
Executing a conditional statement on an SIMD computer
with four processors: (a) the conditional statement; (b) the
execution of the statement in two steps
Communication Model of Parallel Platforms
Shared-Address-Space Platforms
Typical shared-address-space architectures: (a) Uniform-memory-access
shared-address-space computer; (b) Uniform-memory-access shared-
address-space computer with caches and memories; (c) Non-uniform-
memory-access shared-address-space computer with local memory only.
Message-Passing Platforms
The logical machine view of a message-passing platform
consists of p processing nodes.
Instances of such platforms include clustered workstations and
non-shared-address-space multicomputers.
On such platforms, interactions between processes running
on different nodes must be accomplished using messages,
hence the name message passing.
This exchange of messages is used to transfer data, work,
and to synchronize actions among the processes.
In its most general form, message-passing paradigms
support execution of a different program on each of the p
nodes.
Physical Organization of
Parallel Platforms
Architecture of an Ideal Parallel Computer
Exclusive-read, exclusive-write (EREW) PRAM. In this class,
access to a memory location is exclusive. No concurrent read or
write operations are allowed.
Concurrent-read, exclusive-write (CREW) PRAM. In this class,
multiple read accesses to a memory location are allowed, but
multiple write accesses are serialized.
Exclusive-read, concurrent-write (ERCW) PRAM. Multiple write
accesses are allowed to a memory location, but multiple read
accesses are serialized.
Concurrent-read, concurrent-write (CRCW) PRAM. This class
allows multiple read and write accesses to a common memory
location. This is the most powerful PRAM model.
Interconnection Networks for Parallel Computers
▹ Interconnection networks can be classified
as static or dynamic. Static networks consist of point-to-point
communication links among processing nodes
and are also referred to as direct networks.
Figure: Classification of interconnection networks: (a) a static network; and (b) a dynamic network.
Network Topology
Linear Arrays
Linear arrays: (a) with no wraparound links; (b) with
wraparound link.
Two and three dimensional meshes: (a) 2-D mesh with no
wraparound; (b) 2-D mesh with wraparound link (2-D
torus); and (c) a 3-D mesh with no wraparound.
Construction of hypercubes from hypercubes of lower
dimension.
Tree-Based Networks
Complete binary tree networks: (a) a static tree network;
and (b) a dynamic tree network.
Scalable Design principles
❖ Avoid the single point of failure.
❖ Scale horizontally, not vertically.
❖ Push work as far away from the core as possible.
❖ API first.
❖ Cache everything, always.
❖ Provide data that is only as fresh as needed.
❖ Design for maintenance and automation.
❖ Asynchronous rather than synchronous.
❖ Strive for statelessness.
N-wide superscalar architectures:
❖ A superscalar architecture is called N-wide if it can fetch
and dispatch N instructions in every cycle.
Multi-core architectures:
❖ Many cores fit on a single processor socket.
❖ Also called a chip multiprocessor (CMP).
❖ These cores run in parallel.
❖ The architecture of a multicore processor enables
communication between all available cores, so that
processing tasks can be divided and assigned efficiently.
THANK YOU!
