This document provides an overview of parallel and distributed computing concepts including Flynn's taxonomy, shared memory organization, message passing organization, and interconnection networks. Flynn's taxonomy classifies computer architectures into four categories based on their instruction and data streams: SISD, SIMD, MISD, and MIMD. Shared memory systems can be UMA, NUMA, or COMA based on memory access uniformity. Message passing systems use send/receive operations for communication between nodes with local memory. Interconnection networks are discussed in terms of their mode of operation, control strategy, switching technique, and topology for connecting processors in shared and message passing systems.
PARALLEL & DISTRIBUTED COMPUTING LECTURE NOTES
Author: Rizwan Fazal
Date: 06.10.2017
FLYNN’S TAXONOMY OF COMPUTER ARCHITECTURE
Two types of information flow into the processor: instructions and data.
Instruction Stream:
The instruction stream is defined as the sequence of instructions performed by the processing unit.
Data Stream:
The data stream is defined as the data traffic exchanged between the memory and the processing unit.
Computer architectures can be classified into the following four distinct categories:
Single-instruction single-data streams (SISD)
o Conventional single-processor von Neumann computers are classified as SISD systems
Single-instruction multiple-data streams (SIMD)
o There is only one control unit and all processors execute the same instruction in a
synchronized fashion.
Multiple-instruction single-data streams (MISD)
o The same stream of data flows through a linear array of processors executing different
instruction streams
Multiple-instruction multiple-data streams (MIMD)
o Each processor has its own control unit and can execute different instructions on different
data
o Shared memory systems are also called Symmetric Multiprocessor (SMP) systems because of
balanced access to memory: every processor has an equal opportunity to read/write any memory location at equal speed
o Distributed (message-passing) systems scale better, while shared memory systems are easier
to program
o In a distributed-shared-memory (DSM) system, memory is physically distributed but the
programming model follows the shared memory school of thought
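The MIMD category above can be sketched in code. The following is an illustrative example (thread names and functions are mine, not from the notes): each "processor" is modeled as a thread executing its own instruction stream on its own data.

```python
import threading

# MIMD sketch: each "processor" (here, a thread) executes a different
# instruction stream on different data. Real MIMD machines run these
# streams on separate hardware processors.
results = {}

def square(x):            # instruction stream 1
    results["square"] = x * x

def negate(x):            # instruction stream 2
    results["negate"] = -x

t1 = threading.Thread(target=square, args=(4,))
t2 = threading.Thread(target=negate, args=(7,))
t1.start(); t2.start()
t1.join(); t2.join()

print(results["square"], results["negate"])  # 16 -7
```

By contrast, an SIMD machine would have a single control unit issuing one instruction (say, `square`) that all processors apply to their own data elements in lockstep.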
SHARED MEMORY ORGANIZATION
Issues in shared memory systems include access control, synchronization, protection and security. They
can be classified as Uniform Memory Access (UMA), Non-Uniform Memory Access (NUMA) and Cache-Only
Memory Architecture (COMA).
UMA:
Shared memory is accessible to all processors through an interconnection network, much as a
single processor accesses its own memory
Equal access time to any memory location by all processors
A single bus, multiple busses, crossbar or a multiport memory can be the options for
interconnection network
NUMA:
Each processor has part of the shared memory attached
Any processor can access any memory location directly using its real address
The access time to a memory module depends on its distance from the processor, hence the
non-uniform memory access time
COMA:
The shared memory consists of cache memory
Like NUMA, each processor has part of the shared memory
Data is migrated to the processor requesting it
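The access control and synchronization issues mentioned above can be illustrated with a minimal sketch (threads stand in for processors; the lock and counter are illustrative, not from the notes): several threads update one shared variable, and the lock prevents the read-modify-write races that would otherwise lose updates.

```python
import threading

# Shared-memory sketch: four "processors" (threads) increment one shared
# counter. Without the lock, concurrent read-modify-write sequences could
# interleave and lose increments -- the synchronization problem noted above.
counter = 0
lock = threading.Lock()

def worker(n):
    global counter
    for _ in range(n):
        with lock:          # access control: one writer at a time
            counter += 1

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 4000
```

The lock here plays the role that hardware synchronization primitives (e.g. atomic read-modify-write instructions) play in a real shared memory multiprocessor.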
MESSAGE PASSING ORGANIZATION
A class of multiprocessors in which each processor has access to its own local memory. Communication is
performed via Send and Receive operations.
A node in such a system consists of a processor and its local memory
Buffers are temporary memory locations where messages wait until they can be sent or received
Processing nodes can communicate over anything from architecture-specific interconnection
structures to geographically dispersed networks
The message passing systems are scalable to large proportions
Hypercube networks and nearest neighbor 2 and 3 dimensional mesh networks have been used
Link bandwidth and network latency are important design factors to consider for interconnection
Bandwidth refers to the number of bits transmitted per unit time (bits/s)
Time to complete a message transfer is called network latency
Wormhole routing was introduced in 1987 as an alternative to traditional store-and-forward routing, in
order to reduce the size of the required buffers and to decrease message latency. A packet is divided into
smaller units called flits (flow control digits) that move through the network in a pipeline fashion. The
header flit leads the way to the destination node; if it is blocked due to network congestion, the
remaining flits are blocked too.
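The Send/Receive model above can be sketched as follows. This is a simplification (a thread-safe queue stands in for the interconnection network buffer, and threads stand in for nodes; real message passing systems such as MPI use separate address spaces):

```python
import queue
import threading

# Message-passing sketch: each "node" keeps its data in local memory and
# communicates only through explicit Send/Receive operations. The queue
# models the buffer where messages wait until they can be received.
channel = queue.Queue()

def sender():
    local_data = [3, 1, 2]            # sender's local memory
    channel.put(sorted(local_data))   # Send operation

def receiver(out):
    msg = channel.get()               # Receive operation (blocks until a message arrives)
    out.append(msg)

received = []
t1 = threading.Thread(target=sender)
t2 = threading.Thread(target=receiver, args=(received,))
t2.start(); t1.start()
t1.join(); t2.join()

print(received)  # [[1, 2, 3]]
```

Note that the receiver blocks in `channel.get()` until a message arrives, which is how the buffer decouples sender and receiver timing.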
INTERCONNECTION NETWORKS
Criteria to classify interconnection networks (INs) are as follows:
Mode of operation (synchronous vs. asynchronous)
o Synchronous
A single global clock is used by all components in the system
Tends to be slower but is race- and hazard-free
o Asynchronous
Handshaking signals are used in order to coordinate the operation
Comparatively faster than synchronous systems
Control strategy (Centralized vs. decentralized)
o Centralized
A single central control unit is used to oversee and control the operation of the
components of the system
Crossbar is a centralized system
o Decentralized
Control function is distributed among different components in the system
The multistage interconnection networks are decentralized
Switching technique (circuit vs. packet)
o Circuit
A complete path has to be established prior to the start of communication
between a source and a destination
The path will remain in existence during the whole communication period
o Packet
Messages are divided into packets that travel to their destination node by
node in a store-and-forward fashion
Packet switching uses network resources more efficiently but suffers from variable packet delays
Topology (static vs. dynamic)
A topology describes how to connect processors and memories to other processors and memories
o Static
Direct fixed links are established among nodes to form a fixed network
o Dynamic
Connections are established as needed
Switching elements are used to establish connections among inputs and outputs. Depending on
the switch settings, different interconnections can be established.
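The switching elements mentioned above are typically 2x2 switches. A minimal sketch of such an element (function name and settings are illustrative; real switches usually also support broadcast settings):

```python
# A 2x2 switching element, the building block of dynamic INs such as the
# omega network. Depending on its setting, it either passes its two inputs
# straight through or exchanges them.
def switch_2x2(in0, in1, exchange):
    return (in1, in0) if exchange else (in0, in1)

print(switch_2x2("A", "B", exchange=False))  # ('A', 'B')  straight setting
print(switch_2x2("A", "B", exchange=True))   # ('B', 'A')  exchange setting
```

Setting each such switch independently is how a dynamic network establishes different interconnections from the same hardware.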
INs FOR SHARED AND MESSAGE PASSING SYSTEMS
Shared Memory System
Can be designed using bus-based or switch-based INs
The bus may become saturated if multiple processors try to access the shared memory
simultaneously
Caches are used to alleviate the bus contention problem
Crossbar switch can be used to connect multiple processors to multiple memory modules
Crossbar switch is a mesh of wires with switches at the points of intersection
Message Passing System
Can be divided into static and dynamic
Static networks form all connections when the system is designed rather than when the
connection is needed
Messages must be routed using established links in static networks
Dynamic links are established on the fly as messages are routed
The number of point-to-point links a message traverses to reach its destination is called the hop count
Popular static topologies are:
Linear array, ring, mesh, tree and hypercube
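For the hypercube topology listed above, hop counts have a neat closed form: node labels are binary addresses, a link joins nodes whose addresses differ in exactly one bit, and so the minimum hop count between two nodes is the Hamming distance of their labels. A small sketch (function name is mine):

```python
# Minimum hop count between two hypercube nodes = Hamming distance of
# their binary addresses (number of 1-bits in the XOR).
def hypercube_hops(src, dst):
    return bin(src ^ dst).count("1")

# 3-dimensional hypercube (8 nodes):
print(hypercube_hops(0b000, 0b111))  # 3 hops (opposite corners; the diameter)
print(hypercube_hops(0b010, 0b011))  # 1 hop (direct neighbours)
```

This is why an n-dimensional hypercube of 2^n nodes has diameter n: the farthest pair of nodes differ in all n address bits.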
A single-stage interconnection network (dynamic) may connect each of its inputs to some but not all
outputs. If we cascade enough single-stage networks together, we can form a completely connected
Multistage Interconnection Network (MIN). Bits of the source and destination addresses are used to
dynamically select a path through the switches between source and destination.
N^2 components are needed to connect N x N source/destination pairs in a crossbar switch. On the other
hand, (N/2) log2 N components are required to connect N x N pairs in an omega MIN.
The crossbar switch can establish a connection between source and destination in one clock, which is its
major advantage.
The diameter of a network of N nodes is defined as the maximum shortest path between any two nodes.
The diameter of the crossbar is therefore one, while the omega MIN requires log2 N clocks to make a
connection, so its diameter is log2 N.
A network that can handle all possible connections without blocking is called a nonblocking network;
otherwise, it is said to be a blocking network.