A multiprocessor system consists of multiple processing units connected via some interconnection network plus the software needed to make the processing units work together.
1. NADAR SARASWATHI COLLEGE OF ARTS & SCIENCE, THENI.
DEPARTMENT OF CS & IT
ADVANCED COMPUTER ARCHITECTURE
PRESENTED BY
G.KAVIYA
M.SC(IT)
TOPIC: MULTIPROCESSOR SYSTEM INTERCONNECTS.
2. SYNOPSIS
What is Multiprocessor System Interconnects?
Hierarchical Bus System.
Hierarchical Buses and Caches.
Crossbar Switch and Multiport Memory.
Multistage and Combining Networks.
3. WHAT IS MULTIPROCESSOR SYSTEM INTERCONNECTS?
Parallel processing demands the use of efficient system
interconnects for fast communication among multiple processors
and shared memory, I/O and peripheral devices.
Commonly used interconnects are hierarchical buses, crossbar
switches and multistage networks.
Dynamic networks are used in multiprocessors in which the
interconnections are under program control.
4. Continued:
The three major operational characteristics of an interconnection
network are:
Timing.
Switching.
Control.
Timing control can be synchronous or asynchronous.
A synchronous network is controlled by a global clock that synchronizes all
network activities.
An asynchronous network uses handshaking or interlocking mechanisms to
coordinate fast and slow devices.
A network can transfer data using either circuit switching or packet
switching.
5. Continued:
In circuit switching, once a device is granted a path in the
network, it occupies the path for the entire duration of
the data transfer.
In packet switching, the information is broken into small
packets that individually compete for a path in the network.
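A minimal sketch of the packet-switching idea, assuming each packet carries a sequence number so that packets which took different paths can be put back in order at the destination. The `(seq, payload)` format and the names `packetize` and `reassemble` are hypothetical, chosen only for illustration.

```python
import random

def packetize(message, payload_size):
    """Break a message (bytes) into (sequence number, payload) packets."""
    return [
        (seq, message[start:start + payload_size])
        for seq, start in enumerate(range(0, len(message), payload_size))
    ]

def reassemble(packets):
    """Packets may arrive in any order; the sequence numbers restore the message."""
    return b"".join(payload for _, payload in sorted(packets))

msg = b"interconnection network"
packets = packetize(msg, payload_size=4)
arrived = packets[:]
random.Random(0).shuffle(arrived)   # each packet competed for its own path
assert reassemble(arrived) == msg
```

In circuit switching, by contrast, the whole message would travel over one reserved path, so no reordering step would be needed.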
Network control strategy is classified as centralized or
distributed.
In centralized control, a global controller receives requests
from all devices attached to the network and grants
network access to one or more requesters.
In distributed control, requests are handled by local devices
independently.
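Centralized control can be sketched as a single object that sees every request and decides who gets the network. The first-come-first-served policy and the `grants_per_cycle` limit are assumptions for illustration; the slides do not specify a grant policy, and `CentralController` is a hypothetical name.

```python
from collections import deque

class CentralController:
    """Global controller that sees every request and grants network access.

    Assumed policy: first come, first served, with at most
    grants_per_cycle requesters admitted in one cycle.
    """

    def __init__(self, grants_per_cycle):
        self.grants_per_cycle = grants_per_cycle
        self.pending = deque()

    def request(self, device):
        self.pending.append(device)

    def cycle(self):
        granted = []
        while self.pending and len(granted) < self.grants_per_cycle:
            granted.append(self.pending.popleft())
        return granted

ctrl = CentralController(grants_per_cycle=2)
for dev in ["P0", "P1", "P2"]:
    ctrl.request(dev)
print(ctrl.cycle())   # ['P0', 'P1']
print(ctrl.cycle())   # ['P2']
```

Under distributed control there would be no such single object; each local device would make its own decision.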
7. HIERARCHICAL BUS SYSTEM:
• Buses connect various system and subsystem
components in a computer.
• Each bus is formed with a number of signal, control
and power lines.
• Different buses are used to perform different
interconnection functions.
• Different levels of bus system are local buses on
boards, backplane buses, and I/O buses.
8. LOCAL BUS
o Buses implemented within processor chips or
on printed circuit boards.
o It provides a common communication path
among the major components (chips) mounted on
the board.
o A memory board uses a memory bus. An I/O or
network interface chip or board uses a data bus.
9. BACKPLANE BUS
o It is a printed circuit on which many connectors are
used to plug in functional boards.
o Example => VME bus, Multibus II, Futurebus+.
10. I/O BUS
o I/O devices are connected to a computer system through
an I/O bus such as the SCSI (Small Computer System Interface) bus.
o This bus is made of co-axial cables with taps connecting
disks, printer and other devices to a processor through an
I/O controller.
12. HIERARCHICAL BUSES AND
CACHES
Wilson (1987) has proposed a hierarchical cache bus
architecture.
This is a multilevel tree structure in which the leaf nodes
are processors and their private caches
(denoted Pj and C1j).
These are divided into several clusters, each of which is connected through
a cluster bus.
An intercluster bus is used to provide communications
among the clusters. Second-level caches
(denoted C2i) are used between each cluster and the
intercluster bus.
13. Continued:
A single cluster operates as a single-bus system.
Second level caches are used to extend consistency
from each local cluster to the upper level.
The upper level caches form another level of shared
memory between each cluster and the main memory
modules connected to the intercluster bus.
Most memory requests should be satisfied at the
lower-level caches.
Intercluster cache coherence is controlled among the
second-level caches, and the resulting effects are passed
to the lower level.
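The read path through this hierarchy can be sketched as: try the private cache C1j, then the cluster's C2i, and only then go over the intercluster bus to main memory. Simplifying assumptions: caches are unbounded dictionaries, only reads are modelled, and no coherence actions are performed. `ClusterCacheHierarchy` is a hypothetical name.

```python
class ClusterCacheHierarchy:
    """Read path: C1 (private) -> C2 (cluster) -> main memory.

    Assumptions: unbounded dict caches, reads only, no coherence.
    """

    def __init__(self, n_clusters, procs_per_cluster, memory):
        self.c1 = {(i, j): {} for i in range(n_clusters)
                   for j in range(procs_per_cluster)}
        self.c2 = {i: {} for i in range(n_clusters)}
        self.memory = memory
        self.served_by = {"C1": 0, "C2": 0, "memory": 0}

    def read(self, cluster, proc, addr):
        c1 = self.c1[(cluster, proc)]
        if addr in c1:                      # hit in the private cache
            self.served_by["C1"] += 1
            return c1[addr]
        c2 = self.c2[cluster]
        if addr in c2:                      # hit in the cluster's second-level cache
            self.served_by["C2"] += 1
        else:                               # miss in both: use the intercluster bus
            self.served_by["memory"] += 1
            c2[addr] = self.memory[addr]
        c1[addr] = c2[addr]                 # fill the private cache on the way back
        return c1[addr]

system = ClusterCacheHierarchy(n_clusters=2, procs_per_cluster=2,
                               memory={a: a * 10 for a in range(16)})
system.read(0, 0, 5)   # served by memory
system.read(0, 1, 5)   # C2 hit: same cluster
system.read(0, 0, 5)   # C1 hit
system.read(1, 0, 5)   # served by memory: different cluster
print(system.served_by)  # {'C1': 1, 'C2': 1, 'memory': 2}
```

The second read shows why most requests should be satisfied at the lower levels: a neighbour in the same cluster already pulled the line into C2.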
15. CROSSBAR SWITCH AND MULTIPORT
MEMORY
SWITCHED NETWORK:
Dynamic interconnection between inputs and outputs.
Enables configuring the connection structure in a system.
Switched networks are classified based on the following aspects:
Number of stages in the network.
Blocking and non-blocking networks.
16. No. of NETWORK STAGES
Single stage interconnection network:
The input nodes are connected to the outputs through a single stage of
switching elements.
Input to output connection is made using a single cross-point
switch.
Not all inputs can reach all outputs in one pass.
They are called recirculating networks: data may have to recirculate
through the single stage repeatedly before reaching their destination.
Cheaper to implement.
Eg: crossbar switch, multiport memory.
18. No. of NETWORK STAGES
Multi stage interconnection network:
High-speed interconnection networks
Composed of processing elements (PEs) on one end of the network
and memory elements (MEs) on the other end,
connected by switching elements (SEs).
It consists of more than one stage of switch boxes.
Able to connect any input to any output.
Formed by cascading multiple single stage switches.
Eg: Omega network, flip network, baseline network.
19. BLOCKING NETWORK
A network is called BLOCKING if simultaneous connections
of some multiple input-output pairs result in conflicts in the
use of the connection links.
Multiple passes through the network may be required to achieve
certain input-output connections.
Eg: Omega networks, baseline networks, delta networks, Banyan
networks.
20. NON BLOCKING NETWORK
A network is called NON BLOCKING if it can perform all
possible connections between any I/O pairs by rearranging
its connections.
A connection path can always be established between any
I/O pair.
Eg: Benes networks, Clos networks.
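A counting argument shows why an Omega network must block. Assuming each 2*2 switch has only two states (straight or exchange, ignoring broadcast settings), an N-input Omega network with log2 N stages of N/2 switches has 2^((N/2)·log2 N) settings, fewer than the N! permutations, so some permutations cannot be realized in a single pass. A Benes network (2·log2 N − 1 stages) has at least N! settings; this is consistent with it being rearrangeable, though the count alone does not prove it. The function names are hypothetical.

```python
import math

def switch_count(n, stages):
    """Number of 2*2 switches in a network with n inputs and the given stage count."""
    return (n // 2) * stages

def omega_settings(n):
    # Omega: log2(n) stages of n/2 switches, each straight or exchange.
    return 2 ** switch_count(n, n.bit_length() - 1)

def benes_settings(n):
    # Benes: 2*log2(n) - 1 stages of n/2 switches.
    return 2 ** switch_count(n, 2 * (n.bit_length() - 1) - 1)

for n in (8, 16):
    perms = math.factorial(n)
    print(n, omega_settings(n) < perms, benes_settings(n) >= perms)
# 8 True True
# 16 True True
```

For N = 8 this is 4096 Omega settings against 40320 permutations.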
21. CROSSBAR SWITCH
A crossbar network is a single stage network built with unary
switches at the cross points.
Every input port is connected to a free output port through a
cross-point switch without blocking.
It is a single stage, non blocking permutation network.
22. CROSS-POINT SWITCHES
A cross-point switch is a unary switch which can be
set open or closed.
These switches provide dynamic connections
between source and destination pairs.
Each cross-point switch provides a dedicated
connection path between a source and a destination.
23. CROSS-POINT SWITCHES EXAMPLE
16*16 crossbar network: 16 processors and 16 memory modules.
The 16 memory modules can be accessed by the processors in parallel.
Each memory module can satisfy only one processor request at a time.
A problem arises when multiple requests arrive for the same memory
module at the same time; in such a case, only one request is
serviced at a time.
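The effect described above can be simulated: all modules work in parallel, but each module serves one request per cycle, so requests that collide on a module are serialized. The in-arrival-order service rule and the name `crossbar_schedule` are assumptions for this sketch.

```python
from collections import defaultdict

def crossbar_schedule(requests):
    """Cycle-by-cycle service of (processor, module) requests in a crossbar.

    Modules work in parallel; each module serves one request per cycle.
    Assumed rule: requests to the same module are served in arrival order.
    """
    queues = defaultdict(list)
    for proc, module in requests:
        queues[module].append(proc)
    schedule = []
    while any(queues.values()):
        served = []
        for m in sorted(queues):
            if queues[m]:
                served.append((queues[m].pop(0), m))
        schedule.append(served)
    return schedule

# 16 processors, 16 modules, no two requests for the same module: one cycle.
print(len(crossbar_schedule([(p, p) for p in range(16)])))   # 1
# All 16 processors want module 0: the requests are serialized.
print(len(crossbar_schedule([(p, 0) for p in range(16)])))   # 16
```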
24. CROSS POINT SWITCH DESIGN
An n*n crossbar network requires n^2 sets of crosspoint
switches and a large number of lines.
It also requires extensive hardware when n is very large.
So crossbar networks have been built into commercial
machines only with n <= 16.
25. Schematic Design Of A Row Of Crosspoint Switches
In A Crossbar Network
The crosspoint switches of one row all lead to the same memory
module, so several of them may be requested simultaneously.
Each processor sends an independent request.
Arbitration logic makes the selection based on
certain priority rules.
Acknowledgement signals are used to indicate the
arbitration results to all requesting processors.
These signals initiate the data transfer and avoid conflicts.
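The arbitration for one row can be sketched as a function that takes the request lines and returns a one-hot acknowledgement vector. The slides only say "certain priority rules"; round-robin is an assumed rule here, and `round_robin_arbiter` is a hypothetical name.

```python
def round_robin_arbiter(requests, last_granted):
    """Arbitration for one row (one memory module) of the crossbar.

    requests[i] is True if processor i wants this module. The search starts
    just after the processor granted last time (assumed round-robin rule).
    Returns the acknowledgement vector and the granted processor (or None).
    """
    n = len(requests)
    for offset in range(1, n + 1):
        i = (last_granted + offset) % n
        if requests[i]:
            return [j == i for j in range(n)], i
    return [False] * n, None

reqs = [True, False, True, True]
last = len(reqs) - 1          # so processor 0 has top priority at first
for _ in range(3):
    ack, last = round_robin_arbiter(reqs, last)
    print(last, ack)
# 0 [True, False, False, False]
# 2 [False, False, True, False]
# 3 [False, False, False, True]
```

Rotating the priority keeps one processor from monopolizing a busy module, which a fixed-priority rule would allow.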
26. Schematic Design Of A Row Of Crosspoint
Switches In A Crossbar Network
A multiplexer module selects one of the n read or write requests.
n sets of data, address and read/write lines are connected to the inputs
of the multiplexer tree.
Based on the control signal received, only one of the n sets of
information lines is selected as the output of the MUX tree.
The memory address is entered for both read and write accesses.
Read: data fetched from memory is returned to the selected processor
using the data path established.
Write: data on the data path is stored in memory. The data path is
bidirectional.
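A multiplexer tree can be sketched as log2 n levels of 2:1 multiplexers, each level steered by one bit of the control signal. Modelling each processor's lines as an `(address, data, read/write)` tuple is an assumption for illustration; `mux2` and `mux_tree` are hypothetical names.

```python
def mux2(a, b, sel):
    """2:1 multiplexer: pass a when sel is 0, b when sel is 1."""
    return b if sel else a

def mux_tree(inputs, select):
    """Pick inputs[select] using log2(n) levels of 2:1 multiplexers.

    len(inputs) must be a power of two; level k is steered by bit k of select.
    """
    level = list(inputs)
    bit = 0
    while len(level) > 1:
        sel = (select >> bit) & 1
        level = [mux2(level[i], level[i + 1], sel) for i in range(0, len(level), 2)]
        bit += 1
    return level[0]

# Each processor presents one set of (address, data, read/write) lines.
lines = [(0x100 + p, p * 11, "W" if p % 2 else "R") for p in range(8)]
print(mux_tree(lines, select=5))   # (261, 55, 'W')
```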
27. ADVANTAGES AND DISADVANTAGES OF
CROSSBAR
ADVANTAGES
o A single processor can send many requests
to multiple memory modules.
o n memory words can be delivered to
n processors in each cycle.
o Offers the highest bandwidth.
DISADVANTAGES
o Hardware complexity.
o Cost-effective only for small
multiprocessors with few processors and few
memory modules.
28. MULTIPORT MEMORY
This network is used by mainframe multiprocessors.
All crosspoint arbitration and switching functions associated with
each memory module are moved into the memory controller.
Memory modules become more expensive due to the added access
ports.
n switches are tied to the n input ports of a memory module.
Only one of the n processor requests can be honoured at a time.
The multiport memory resolves the conflicts among processors.
30. DRAWBACKS
The memory structure becomes more expensive when m and n
(memory modules and processors) become large.
Not scalable: once the ports are fixed, no more processors can be added
without redesigning the memory controller.
A large number of interconnection cables and connectors is needed when
the configuration becomes large.
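A multiport memory module can be sketched as a controller with a fixed number of ports that honours one request per cycle. The lowest-port-wins priority rule is an assumption; the exception on `attach` models the scalability drawback above. `MultiportMemory` is a hypothetical name.

```python
class MultiportMemory:
    """A memory module whose controller owns a fixed number of ports.

    Assumed priority rule: the lowest-numbered port wins when several
    attached processors request the module in the same cycle.
    """

    def __init__(self, n_ports):
        self.n_ports = n_ports
        self.port_of = {}                      # processor -> port number

    def attach(self, proc):
        if len(self.port_of) == self.n_ports:
            # The scalability drawback: no free port without a new controller.
            raise RuntimeError("no free port; the memory controller must be redesigned")
        self.port_of[proc] = len(self.port_of)
        return self.port_of[proc]

    def arbitrate(self, requesting):
        """Honour exactly one of this cycle's requests (or none)."""
        attached = [p for p in requesting if p in self.port_of]
        return min(attached, key=self.port_of.get) if attached else None

mem = MultiportMemory(n_ports=4)
for p in ["P0", "P1", "P2", "P3"]:
    mem.attach(p)
print(mem.arbitrate({"P3", "P1"}))   # P1
```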
31. MULTISTAGE NETWORKS
Used to build large multiprocessor systems.
Eg: Omega network, Butterfly network
Combining networks are a special class of multistage networks.
They are used for resolving access conflicts automatically through the
network.
A combining network was built in the NYU Ultracomputer.
32. Omega Multi-stage Interconnection Network
The Omega MIN
o Another popular MIN is the Omega MIN.
o The interconnections between the stages in an Omega network are
defined by the “rotate left” of the bits used in the port IDs.
o EXAMPLE:
An 8*8 Omega network is interconnected as follows:
000 ---> 000 ---> 000 ---> 000
001 ---> 010 ---> 100 ---> 001
010 ---> 100 ---> 001 ---> 010
011 ---> 110 ---> 101 ---> 011
100 ---> 001 ---> 010 ---> 100
101 ---> 011 ---> 110 ---> 101
110 ---> 101 ---> 011 ---> 110
111 ---> 111 ---> 111 ---> 111
33. PICTORIALLY
HOW TO READ THE FIGURE:
o Pick a number at the left (eg. 4=100)
o Rotate left: 100 ---> 001(=1)
o Connect 4 to 1
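The "rotate left" rule can be checked mechanically. The sketch below implements the perfect shuffle as a one-bit left rotation and rebuilds the 8*8 table by following each port ID through three successive shuffles. `shuffle` and `omega_table` are hypothetical names.

```python
def shuffle(port, bits):
    """Perfect shuffle: rotate a bits-wide port ID left by one position."""
    msb = (port >> (bits - 1)) & 1
    return ((port << 1) & ((1 << bits) - 1)) | msb

def omega_table(bits=3):
    """Rebuild the wiring table: each row follows one ID through successive shuffles."""
    rows = []
    for port in range(1 << bits):
        chain = [port]
        for _ in range(bits):
            chain.append(shuffle(chain[-1], bits))
        rows.append(" ---> ".join(format(p, f"0{bits}b") for p in chain))
    return rows

print(shuffle(4, 3))   # 1: 100 rotated left is 001
for row in omega_table():
    print(row)
```

After three rotations a 3-bit ID returns to itself, which is why the last column of the table repeats the first.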
34. The Omega MIN in action
o The Omega network operates in the same manner as the
Delta network, so the examples here are brief.
Example: forwarding in an Omega network
o CPU 1 sends a request (address value) to memory module 4.
• (Figure: forwarding the request stage by stage, from Step 0, the initial situation, to Step 3.)
36. How does the Omega network work?
It works in a similar manner to the Delta network.
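The forwarding example can be sketched with destination-tag routing, the usual way Omega networks of 2*2 switches are routed (assumed here; the slides show the steps only as figures). Before each stage the line label is shuffled, and the switch then picks its upper (0) or lower (1) output from the next destination bit, most significant first. The `conflicts` helper shows blocking: two connections that need the same link at the same stage cannot both be made in one pass. Function names are hypothetical.

```python
def route(src, dst, bits):
    """Destination-tag routing through an Omega network of 2*2 switches.

    Returns the line label leaving each stage.
    """
    mask = (1 << bits) - 1
    line, path = src, []
    for stage in range(bits):
        line = ((line << 1) & mask) | (line >> (bits - 1))   # perfect shuffle
        d = (dst >> (bits - 1 - stage)) & 1                   # destination bit, MSB first
        line = (line & ~1) | d                                # 0: upper output, 1: lower output
        path.append(line)
    return path

def conflicts(pairs, bits):
    """List (holder, newcomer, stage) for every link two connections both need."""
    owner = {}
    clashes = []
    for pair in pairs:
        for stage, link in enumerate(route(*pair, bits)):
            if (stage, link) in owner:
                clashes.append((owner[(stage, link)], pair, stage))
            else:
                owner[(stage, link)] = pair
    return clashes

print(route(1, 4, 3))                            # [3, 6, 4]: CPU 1 reaches memory module 4
print(conflicts([(i, i) for i in range(8)], 3))  # []: the identity passes in one pass
print(conflicts([(0, 0), (4, 1)], 3)[0])         # ((0, 0), (4, 1), 0): blocked at stage 0
```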
37. ROUTING IN BUTTERFLY NETWORK
64-input Butterfly network built with two stages (2 = log8 64) of
8*8 crossbar switches. The eight-way shuffle function is used to
establish the interstage connections between stage 0 and stage 1.
38. TWO STAGE:
Sixteen 8*8 crossbar switches are used, eight per stage.
39. THREE STAGE:
A three-stage Butterfly network is constructed for 512 inputs, again with
8*8 crossbar switches.
Each of the 64*64 boxes in this figure is identical to the two-stage
Butterfly network.
16*8 + 8*8 = 192 crossbar switches are used in this method.
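The switch counts above follow from a simple rule: an N-input Butterfly of 8*8 crossbars has log8 N stages of N/8 switches each. The sketch checks both examples and also models the eight-way shuffle for 64 lines, assuming the common reading in which output j of stage-0 switch i feeds input i of stage-1 switch j (line 8i + j goes to line 8j + i). Function names are hypothetical.

```python
def butterfly_size(n_inputs, radix=8):
    """Stages and radix*radix crossbars needed for an n_inputs Butterfly network."""
    stages, reach = 0, 1
    while reach < n_inputs:
        reach *= radix
        stages += 1
    if reach != n_inputs:
        raise ValueError("n_inputs must be a power of the radix")
    return stages, stages * (n_inputs // radix)

def eight_way_shuffle(line, radix=8):
    """64-line interstage wiring: output j of stage-0 switch i -> input i of stage-1 switch j."""
    switch, port = divmod(line, radix)
    return port * radix + switch

print(butterfly_size(64))     # (2, 16)
print(butterfly_size(512))    # (3, 192)
print(eight_way_shuffle(21))  # 42: switch 2, port 5 -> switch 5, port 2
```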
40. Continued:
Larger Butterfly networks can be modularly constructed using more
stages.
No broadcast connections are allowed in a Butterfly network, making these
networks a restricted subclass of Omega networks.
THE HOT SPOT PROBLEM
When the network traffic is nonuniform, a hot spot may appear
corresponding to a certain memory module being excessively accessed by
many processors at the same time.
o Eg: a semaphore variable
A hot spot may degrade the network performance significantly.
An atomic read-modify-write primitive Fetch&Add (x, e) has been
developed to perform parallel memory updates using the combining
network.
41. FETCH&ADD
In a Fetch&Add(x, e) operation, x is an integer variable in shared
memory and e is an integer increment.
When a single processor executes this operation, the semantics are as follows:
Fetch&Add(x, e)
{ temp ← x;
  x ← temp + e;
  return temp }
When N processors attempt Fetch&Add(x, e) on the same memory
word simultaneously, the memory is updated only once, following a
serialization principle.
The sum of the N increments, e1 + e2 + ... + eN, is produced in any
arbitrary serialization of the N requests.
42. Continued:
This sum is added to the memory word x, resulting in the new value
x + e1 + e2 + ... + eN.
The values returned to the N requests are all unique, depending on the
serialization order followed.
The net result is similar to a sequential execution of N Fetch&Adds but
is performed in one indivisible operation.
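These semantics can be imitated in software, assuming a `threading.Lock` stands in for the hardware's indivisible memory operation (this emulates the behaviour only and says nothing about how a combining network achieves it). With N threads each adding 1, the word ends at N and every thread gets a different old value, as the serialization principle requires. `SharedWord` and `simultaneous_fetch_and_add` are hypothetical names.

```python
import threading

class SharedWord:
    """A shared integer x whose Fetch&Add is made indivisible with a lock."""

    def __init__(self, value=0):
        self.value = value
        self._lock = threading.Lock()

    def fetch_and_add(self, e):
        with self._lock:          # temp <- x; x <- temp + e; return temp
            temp = self.value
            self.value = temp + e
            return temp

def simultaneous_fetch_and_add(x0, increments):
    """N threads each issue Fetch&Add(x, e_i) at once; returns (final x, answers)."""
    word = SharedWord(x0)
    answers = [None] * len(increments)

    def worker(i):
        answers[i] = word.fetch_and_add(increments[i])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(increments))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return word.value, answers

final, answers = simultaneous_fetch_and_add(0, [1] * 8)
print(final, sorted(answers))   # 8 [0, 1, 2, 3, 4, 5, 6, 7]
```

Which thread receives which old value depends on the serialization order, but the final value never does.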
One of the following outcomes will occur if processor
P1 executes Ans1 ← Fetch&Add(x, e1) and
P2 executes Ans2 ← Fetch&Add(x, e2) simultaneously on the shared
variable x.
43. Continued:
If the request from P1 is executed ahead of that from P2, the
following values are returned:
Ans1 ← x
Ans2 ← x + e1
If the execution order is reversed, the following values are
returned:
Ans1 ← x + e2
Ans2 ← x
Regardless of the execution order, the value x + e1 + e2 is stored in
memory.
Producing these results is the responsibility of the switch box.
44. Continued:
The switch box must:
Form the sum e1+e2,
Transmit the combined request Fetch&Add(x, e1+e2),
Store the value e1 (or e2) in a wait buffer of the switch, and
Return the values x and x+e1 to satisfy the original requests
Fetch&Add(x, e1) and Fetch&Add(x, e2).
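The four switch-box steps can be sketched as three small functions: combine the two requests while saving e1 in a wait buffer, let the memory module perform one Fetch&Add with e1 + e2, then split the single reply into x and x + e1. Assumptions: requests are `(address, increment)` tuples and memory is a dict; a real switch would also track which output port each answer returns to. All function names are hypothetical.

```python
def switch_combine(req1, req2):
    """Combine two Fetch&Add requests (addr, e) aimed at the same word."""
    addr1, e1 = req1
    addr2, e2 = req2
    if addr1 != addr2:
        raise ValueError("only requests to the same address can be combined")
    wait_buffer = e1                  # kept in the switch until the reply returns
    return (addr1, e1 + e2), wait_buffer

def memory_fetch_and_add(memory, addr, e):
    """The memory module performs one indivisible Fetch&Add."""
    old = memory[addr]
    memory[addr] = old + e
    return old

def switch_split(reply, wait_buffer):
    """Decombine the reply: the first request sees x, the second x + e1."""
    return reply, reply + wait_buffer

memory = {"x": 100}
combined, buf = switch_combine(("x", 3), ("x", 5))
reply = memory_fetch_and_add(memory, *combined)
ans1, ans2 = switch_split(reply, buf)
print(ans1, ans2, memory["x"])   # 100 103 108
```

Memory sees only one request, which is how combining relieves a hot spot such as a shared semaphore.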
45. APPLICATION AND DRAWBACKS
The Fetch&Add primitive is very effective in accessing sequentially
allocated queue structures in parallel, or in forking out parallel
processes with identical code that operate on different data sets.
The advantage of using a combining network to implement the
Fetch&Add operation is achieved at a significant increase in network
cost.
Additional switch cycles are also needed to make the entire operation
an atomic memory operation. This may increase the network latency
significantly.
Multistage combining networks have the potential of supporting
large-scale multiprocessors with thousands of processors.
The problem of increased cost and latency may be alleviated with the
use of faster and cheaper switching technology in the future.
46. Multistage Networks in Real Systems
Examples:
o IBM RP3 using a high-speed Omega network
o The Cedar multiprocessor and the NYU Ultracomputer using multistage Omega
networks.
o BBN Butterfly processor (TC2000) using Butterfly networks
o The Cray Y-MP multiprocessor and the Alliant FX/2800 using crossbar
networks.