PARALLEL & DISTRIBUTED COMPUTING LECTURE NOTES
Author: Rizwan Fazal
Date: 06.10.2017
FLYNN’S TAXONOMY OF COMPUTER ARCHITECTURE
Two types of information flow into the processor: instructions and data.
Instruction Stream:
The instruction stream is defined as the sequence of instructions performed by the processing unit.
Data Stream:
The data stream is defined as the data traffic exchanged between the memory and the processing unit.
Computer architectures can be classified into the following four distinct categories:
 Single-instruction single-data streams (SISD)
o Conventional single-processor von Neumann computers are classified as SISD systems
 Single-instruction multiple-data streams (SIMD)
o There is only one control unit and all processors execute the same instruction in a
synchronized fashion.
 Multiple-instruction single-data streams (MISD)
o The same stream of data flows through a linear array of processors executing different
instruction streams
 Multiple-instruction multiple-data streams (MIMD)
o Each processor has its own control unit and can execute different instructions on different
data
o Shared memory systems are also called symmetric multiprocessors (SMP) because every processor has
balanced access to memory: an equal opportunity to read and write, at equal speed
o Distributed (message-passing) systems scale better, while shared memory systems are easier to
program
o In a distributed-shared-memory (DSM) system, memory is physically distributed but the
programming model follows the shared memory school of thought
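The differences between the SIMD, MISD and MIMD categories can be illustrated with a minimal sketch. The function names (simd_step, misd_pipeline, mimd_step) are hypothetical and the "processors" are plain Python callables, so this only models how instructions and data are paired, not real hardware.

```python
from typing import Callable, List

Op = Callable[[int], int]  # an "instruction" is modelled as a function on one integer


def simd_step(instruction: Op, data: List[int]) -> List[int]:
    """SIMD: one control unit, the same instruction applied to every data element."""
    return [instruction(x) for x in data]


def misd_pipeline(instructions: List[Op], stream: List[int]) -> List[int]:
    """MISD: the same data stream flows through a linear array of processors,
    each of which executes a different instruction."""
    results = []
    for x in stream:
        for op in instructions:  # pass the item through every processor in order
            x = op(x)
        results.append(x)
    return results


def mimd_step(instructions: List[Op], data: List[int]) -> List[int]:
    """MIMD: processor i has its own control unit and runs instruction i on data item i."""
    if len(instructions) != len(data):
        raise ValueError("need exactly one instruction per data item")
    return [op(x) for op, x in zip(instructions, data)]


def increment(x: int) -> int:
    return x + 1


def double(x: int) -> int:
    return x * 2
```

For example, simd_step(increment, [1, 2, 3]) applies one instruction to three data items, while mimd_step([increment, double], [10, 20]) pairs each data item with its own instruction.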
SHARED MEMORY ORGANIZATION
Issues in shared memory systems include access control, synchronization, protection and security. Shared
memory systems can be classified as Uniform Memory Access (UMA), Non-uniform Memory Access (NUMA) and
Cache-only Memory Architecture (COMA).
UMA:
 Shared memory is accessible to all processors through an interconnection network, in the same way
a single processor accesses its memory
 Equal access time to any memory location by all processors
 The interconnection network can be a single bus, multiple buses, a crossbar or a multiport memory
NUMA:
 Each processor has part of the shared memory attached
 Any processor can access any memory location directly using its real address
 The access time to a memory module depends on its distance from the processor, which results in
non-uniform memory access time
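A minimal sketch of the difference between UMA and NUMA access times follows. The linear node layout, the per-hop penalty and the nanosecond figures are all assumptions made for illustration; they are not taken from these notes or from any real machine.

```python
def uma_access_time(base_ns: float) -> float:
    """UMA: every processor sees the same access time for every location."""
    return base_ns


def numa_access_time(requesting_node: int, home_node: int,
                     local_ns: float, per_hop_ns: float) -> float:
    """NUMA: access time grows with the distance to the memory module.

    Assumption: nodes sit on a line, so the distance is |requesting - home|,
    and every hop adds a fixed (hypothetical) penalty.
    """
    hops = abs(requesting_node - home_node)
    return local_ns + hops * per_hop_ns
```

With these assumed numbers, numa_access_time(0, 0, 100, 50) models a local access and numa_access_time(0, 3, 100, 50) models an access to memory attached three nodes away.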
COMA:
 The shared memory consists of cache memory
 Like NUMA, each processor has part of the shared memory
 Data migrates to the processor requesting it
MESSAGE PASSING ORGANIZATION
A class of multiprocessors in which each processor has access to its own local memory. Communication is
performed via Send and Receive operations.
 A node in such a system consists of a processor and its local memory
 Buffers are temporary memory locations where messages wait until they can be sent or received
 Processing nodes may communicate over anything from architecture-specific interconnection
structures to geographically dispersed networks
 The message passing systems are scalable to large proportions
 Hypercube networks and nearest-neighbor two- and three-dimensional mesh networks have been used
 Link bandwidth and network latency are important design factors for the interconnection network
 Bandwidth refers to the number of bits transmitted per unit time (bits/s)
 The time to complete a message transfer is called network latency
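The Send and Receive model described above can be sketched with two threads standing in for nodes. The Node class and run_demo function are hypothetical names, and a bounded queue.Queue plays the role of the message buffer; a real message-passing system would use a network and a library such as MPI instead.

```python
import queue
import threading


class Node:
    """A node: a processor with its own local memory and an incoming message buffer."""

    def __init__(self, name: str, buffer_size: int = 4):
        self.name = name
        self.local_memory = {}                          # private, never shared directly
        self.inbox = queue.Queue(maxsize=buffer_size)   # buffer where messages wait

    def send(self, dest: "Node", message) -> None:
        # Blocks while the destination buffer is full.
        dest.inbox.put((self.name, message))

    def receive(self):
        # Blocks until a message is available; returns (sender_name, message).
        return self.inbox.get()


def run_demo():
    a, b = Node("A"), Node("B")
    a.local_memory["x"] = 21

    def node_b_program():
        _sender, value = b.receive()           # B can only see x through a message
        b.local_memory["x_doubled"] = value * 2
        b.send(a, b.local_memory["x_doubled"])

    worker = threading.Thread(target=node_b_program)
    worker.start()
    a.send(b, a.local_memory["x"])
    reply = a.receive()
    worker.join()
    return reply
```

Calling run_demo() returns the pair received by node A: the sender name and the doubled value. Neither node ever reads the other's local_memory directly; all data moves through send and receive.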
Wormhole routing was introduced in 1987 as an alternative to traditional store-and-forward routing in order
to reduce the size of the required buffers and to decrease message latency. A packet is divided into smaller
units called flits (flow control digits) that move through the network in a pipelined fashion. The header flit
leads the way to the destination node; if it is blocked due to network congestion, the remaining flits are
blocked behind it.
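The latency benefit of pipelining flits can be estimated with a simplified model. The sketch below assumes no contention, zero routing delay at each node, and equal bandwidth on every link; the function and parameter names are hypothetical.

```python
import math


def store_and_forward_latency(message_bits: int, hops: int, bandwidth: float) -> float:
    """Each node receives the whole packet before forwarding it,
    so the full transmission time is paid once per hop."""
    return hops * message_bits / bandwidth


def wormhole_latency(message_bits: int, flit_bits: int, hops: int, bandwidth: float) -> float:
    """Flits are pipelined: the header flit needs `hops` flit-times to arrive,
    and each remaining flit arrives one flit-time after the previous one."""
    flits = math.ceil(message_bits / flit_bits)
    flit_time = flit_bits / bandwidth
    return (hops + flits - 1) * flit_time
```

Under this model a 1024-bit message split into 32-bit flits and sent over 4 hops at 32 bits per time unit takes 128 time units with store-and-forward but 35 with wormhole routing. The buffer saving follows from the same picture: a store-and-forward node must hold a whole packet, while a wormhole node only needs room for a few flits.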
INTERCONNECTION NETWORKS
Criteria to classify interconnection networks (INs) are as follows:
 Mode of operation (synchronous vs. asynchronous)
o Synchronous
 A single global clock is used by all components in the system
 Tends to be slower, but is free of races and hazards
o Asynchronous
 Handshaking signals are used in order to coordinate the operation
 Comparatively faster than synchronous systems
 Control strategy (centralized vs. decentralized)
o Centralized
 A single central control unit is used to oversee and control the operation of the
components of the system
 Crossbar is a centralized system
o Decentralized
 Control function is distributed among different components in the system
 The multistage interconnection networks are decentralized
 Switching technique (circuit vs. packet)
o Circuit
 A complete path has to be established prior to the start of communication
between a source and a destination
 The path will remain in existence during the whole communication period
o Packet
 Messages are divided into packets that travel to their destination node by node in a
store-and-forward fashion
 Uses network resources more efficiently but suffers from variable packet delays
 Topology (static vs. dynamic)
A topology describes how to connect processors and memories to other processors and memories
o Static
 Direct fixed links are established among nodes to form a fixed network
o Dynamic
 Connections are established as needed
Switching elements are used to establish connections among inputs and outputs. Depending on
the switch settings, different interconnections can be established.
INs FOR SHARED AND MESSAGE PASSING SYSTEMS
Shared Memory System
 Can be designed using bus-based or switch-based INs
 The bus may get saturated if multiple processors try to access the shared memory
simultaneously
 Caches are used to reduce bus contention
 Crossbar switch can be used to connect multiple processors to multiple memory modules
 Crossbar switch is a mesh of wires with switches at the points of intersection
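The crossbar described above can be sketched as a grid of crosspoint switches. The Crossbar class is a hypothetical model that assumes each processor talks to at most one memory module at a time and each memory module serves at most one processor at a time.

```python
class Crossbar:
    """A processors-by-memories grid with one switch at every intersection."""

    def __init__(self, processors: int, memories: int):
        self.processors = processors
        self.memories = memories
        self.closed = {}  # memory index -> processor index for each closed switch

    def connect(self, processor: int, memory: int) -> bool:
        """Close the switch at (processor, memory) if both lines are free."""
        if not (0 <= processor < self.processors and 0 <= memory < self.memories):
            raise ValueError("processor or memory index out of range")
        if memory in self.closed or processor in self.closed.values():
            return False  # that row or column is already in use
        self.closed[memory] = processor
        return True

    def disconnect(self, memory: int) -> None:
        """Open whichever switch is currently closed on this memory's line."""
        self.closed.pop(memory, None)
```

Requests that target different memory modules succeed at the same time, which is what distinguishes the crossbar from a single shared bus; only two requests for the same module have to wait for each other.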
Message Passing System
 Can be divided into static and dynamic
 Static networks form all connections when the system is designed rather than when the
connection is needed
 Messages must be routed using established links in static networks
 Dynamic links are established on the fly as messages are routed
 The number of point-to-point links a message traverses to reach its destination is called the hop count
Popular static topologies are the linear array, ring, mesh, tree and hypercube.
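Hop counts for several of these static topologies follow directly from the node addresses. The sketch below assumes shortest-path routing, nodes numbered from 0, and a 2D mesh without wraparound links; the function names are hypothetical, and the tree is omitted because its hop count depends on the tree shape.

```python
from typing import Tuple


def linear_array_hops(a: int, b: int) -> int:
    """Nodes in a line: the only path walks through every node in between."""
    return abs(a - b)


def ring_hops(a: int, b: int, n: int) -> int:
    """Ring of n nodes: go whichever way round is shorter."""
    d = abs(a - b) % n
    return min(d, n - d)


def mesh_hops(a: Tuple[int, int], b: Tuple[int, int]) -> int:
    """2D mesh with (row, column) addresses: row distance plus column distance."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def hypercube_hops(a: int, b: int) -> int:
    """Hypercube: neighbours differ in one address bit,
    so the hop count is the number of differing bits."""
    return bin(a ^ b).count("1")
```

For eight nodes, the farthest pair is 7 hops apart in a linear array, 4 in a ring, and 3 in a hypercube, which is one reason hypercubes were attractive for message passing machines.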
A single-stage interconnection network (dynamic) may connect each of the inputs to some but not all
outputs. If we cascade enough single-stage networks together, we can form a completely connected
Multi-stage Interconnection Network (MIN). The bits of the source and destination addresses act as routing
instructions that dynamically select a path through the switches between source and destination.
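One common way to use address bits as routing instructions is destination-tag routing in an omega network. The notes do not spell out a scheme, so this is an assumed example: each stage performs a perfect shuffle (a one-bit left rotation of the line number) and then a 2x2 switch picks its upper or lower output according to one destination bit, most significant bit first. The function name omega_route is hypothetical.

```python
from typing import List


def omega_route(source: int, dest: int, n_bits: int) -> List[int]:
    """Return the line number occupied after each of the n_bits stages
    of an omega network with 2**n_bits inputs and outputs."""
    size = 1 << n_bits
    if not (0 <= source < size and 0 <= dest < size):
        raise ValueError("source and dest must fit in n_bits")
    mask = size - 1
    position = source
    path = []
    for stage in range(n_bits):
        # Perfect shuffle between stages: rotate the line number left by one bit.
        position = ((position << 1) | (position >> (n_bits - 1))) & mask
        # The 2x2 switch sets the lowest bit from the destination address (MSB first).
        bit = (dest >> (n_bits - 1 - stage)) & 1
        position = (position & ~1) | bit
        path.append(position)
    return path
```

After n_bits stages every source bit has been shifted out and replaced by a destination bit, so the last entry of the path always equals dest. For instance, omega_route(5, 2, 3) passes through lines 2, 5 and 2.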
N² components are needed to connect N x N source/destination pairs in a crossbar switch. On the other
hand, (N/2) log2 N components are required to connect N x N pairs in an omega MIN.
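The two component counts can be compared numerically. This sketch assumes N is a power of two and that the omega network is built from 2x2 switches, with N/2 switches in each of its log2 N stages; the function names are hypothetical.

```python
def crossbar_components(n: int) -> int:
    """One crosspoint switch for every source/destination pair."""
    return n * n


def omega_components(n: int) -> int:
    """N/2 two-by-two switches per stage, log2(N) stages."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two, at least 2")
    stages = n.bit_length() - 1  # exact log2 for a power of two
    return (n // 2) * stages
```

For N = 8 this gives 64 crosspoints against 12 switches, and for N = 1024 it gives 1,048,576 against 5,120, which shows how quickly the crossbar cost grows.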
The crossbar switch can establish a connection between source and destination in one clock, which is its
major advantage.
The diameter of a network of N nodes is defined as the maximum, over all pairs of nodes, of the length of the
shortest path between them. The diameter of the crossbar is one. The omega MIN requires log2 N clocks to
make a connection, and hence its diameter is log2 N.
A network that can handle all possible connections without blocking is called a nonblocking network.
Otherwise, the network is said to be blocking.
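Blocking can be demonstrated on a small omega network. The sketch below assumes destination-tag routing (a shuffle followed by a 2x2 switch at each stage) and treats two connections as conflicting when they need the same switch output at the same stage; omega_path and is_conflict_free are hypothetical names.

```python
from typing import Iterable, List, Tuple


def omega_path(source: int, dest: int, n_bits: int) -> List[int]:
    """Line number used after each stage under destination-tag routing."""
    mask = (1 << n_bits) - 1
    position = source
    path = []
    for stage in range(n_bits):
        position = ((position << 1) | (position >> (n_bits - 1))) & mask  # shuffle
        bit = (dest >> (n_bits - 1 - stage)) & 1                          # switch setting
        position = (position & ~1) | bit
        path.append(position)
    return path


def is_conflict_free(requests: Iterable[Tuple[int, int]], n_bits: int) -> bool:
    """True if all (source, dest) connections can be set up at the same time."""
    used = set()  # (stage, line) pairs already claimed by an earlier connection
    for source, dest in requests:
        for stage, line in enumerate(omega_path(source, dest, n_bits)):
            if (stage, line) in used:
                return False  # two connections need the same switch output
            used.add((stage, line))
    return True
```

In an 8 x 8 omega network the identity pattern (every source i to destination i) is conflict free, but the two connections 0 to 0 and 4 to 1 collide on the same output of the first stage, so the omega network is blocking. A crossbar would serve both at once, since they use different sources and different destinations.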
