Aca2 07 new

CSE539: Advanced Computer Architecture

Chapter 7

Multiprocessors and Multicomputers
Book: “Advanced Computer Architecture – Parallelism, Scalability, Programmability”, Hwang & Jotwani

Sumit Mittu
Assistant Professor, CSE/IT
Lovely Professional University
sumit.12735@lpu.co.in

In this chapter…
• Multiprocessor System Interconnects
• Cache Coherence and Synchronization Mechanisms
• Three Generations of Multi-computers
• Message Routing Schemes


MULTIPROCESSOR SYSTEM INTERCONNECTS



• Network Characteristics
o Topology
• Dynamic Networks
o Timing control protocol
• Synchronous (with global clock)
• Asynchronous (with handshake or interlocking mechanism)
o Switching method
• Circuit switching
• Packet switching
o Control Strategy
• Centralized (global controller to receive requests from all devices and grant network access)
• Distributed (requests handled by local devices independently)


• Hierarchical Bus System
o Local Bus (board level)
• Memory bus, data bus
o Backplane Bus (backplane level)
• VME bus (IEEE 1014-1987), Multibus II (IEEE 1296-1987), Futurebus+ (IEEE 896.1-1991)
o I/O Bus (I/O level)
o E.g. Encore Multimax multiprocessor’s Nanobus
• 20 slots
• 32-bit address path
• 64-bit data path
• Clock rate: 12.5 MHz
• Total Memory bandwidth: 100 Megabytes per second
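
The Nanobus figures above are consistent with each other: peak bandwidth is the data-path width in bytes multiplied by the clock rate. A minimal sketch of that arithmetic, assuming one transfer per clock cycle (the function name is illustrative, not from the book):

```python
def peak_bus_bandwidth_mb_per_s(data_path_bits: int, clock_mhz: float) -> float:
    """Peak bandwidth in megabytes per second, assuming one transfer per clock."""
    bytes_per_transfer = data_path_bits / 8
    return bytes_per_transfer * clock_mhz


# Nanobus: 64-bit data path at 12.5 MHz
nanobus = peak_bus_bandwidth_mb_per_s(data_path_bits=64, clock_mhz=12.5)
print(nanobus)  # 100.0
```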


• Hierarchical Buses and Caches
o Cache Levels
• First level caches
• Second level caches
o Buses
• (Intra) Cluster Bus
• Inter-cluster bus
o Cache coherence
• Snoopy cache protocol for coherence among first level caches of same cluster
• Inter-cluster cache coherence controlled among second level caches, with results passed to first level caches
o Use of Bridges between multiprocessor clusters


• Crossbar Switch Design
o Based on number of network stages
• Single stage (or recirculating) networks
• Multistage networks
o Blocking networks
o Non-blocking (re-arranging) networks
• Crossbar networks
o n × m and n² cross-point switch design
o Crossbar benefits and limitations

• Multiport Memory Design
o Multiport Memory



CACHE COHERENCE MECHANISMS
• Cache Coherence Problem
o Inconsistent copies of same memory block in different caches
o Sources of inconsistency:
• Sharing of writable data
• Process migration
• I/O activity

• Protocol Approaches
o Snoopy Bus Protocol
o Directory Based Protocol

• Write Policies
o (Write-back, Write-through) x (Write-invalidate, Write-update)


• Snoopy Bus Protocols
o Write-through caches
• Write invalidate coherence protocol for write-through caches
• Write-update coherence protocol for write-through caches
• Data item states:
o VALID
o INVALID
• Possible operations:
o Read by same processor: R(i)
o Read by different processor: R(j)
o Write by same processor: W(i)
o Write by different processor: W(j)
o Replace by same processor: Z(i)
o Replace by different processor: Z(j)



o Write-through caches – write invalidate scheme

New state of the block copy, by current state:

Operation    Current state: Valid    Current state: Invalid
R(i)         Valid                   Valid
W(i)         Valid                   Valid
Z(i)         Invalid                 Invalid
R(j)         Valid                   Invalid
W(j)         Invalid                 Invalid
Z(j)         Valid                   Invalid
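
The write-through, write-invalidate transitions can be sketched as a lookup table. This is a minimal illustration, assuming the state of one block copy in processor i's cache is tracked; the names `WT_INVALIDATE` and `run_wt` are made up for this example.

```python
# (current state, operation) -> new state for one block copy in cache i.
# R/W/Z = read/write/replace; (i) = local processor, (j) = a remote processor.
WT_INVALIDATE = {
    ("Valid", "R(i)"): "Valid",
    ("Valid", "W(i)"): "Valid",
    ("Valid", "Z(i)"): "Invalid",
    ("Valid", "R(j)"): "Valid",
    ("Valid", "W(j)"): "Invalid",   # a remote write invalidates the local copy
    ("Valid", "Z(j)"): "Valid",
    ("Invalid", "R(i)"): "Valid",
    ("Invalid", "W(i)"): "Valid",
    ("Invalid", "Z(i)"): "Invalid",
    ("Invalid", "R(j)"): "Invalid",
    ("Invalid", "W(j)"): "Invalid",
    ("Invalid", "Z(j)"): "Invalid",
}


def run_wt(state, operations):
    """Apply a sequence of operations and return the final state."""
    for op in operations:
        state = WT_INVALIDATE[(state, op)]
    return state


# Local read loads the block, a remote write invalidates it, a local write reloads it.
print(run_wt("Invalid", ["R(i)", "W(j)", "W(i)"]))  # Valid
```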



o Write-back caches
• Ownership protocol: Write invalidate coherence protocol for write-back caches
o RO : Read Only (Valid state)
o RW : Read Write (Valid state)
o INV : Invalid state
• Possible operations:
o Read by same processor: R(i)
o Read by different processor: R(j)
o Write by same processor: W(i)
o Write by different processor: W(j)
o Replace by same processor: Z(i)
o Replace by different processor: Z(j)



o Write-back caches – write invalidate (ownership protocol) scheme

New state of the block copy, by current state:

Operation    Current: RO (Valid)    Current: RW (Valid)    Current: INV (Invalid)
R(i)         RO                     RW                     RO
W(i)         RW                     RW                     RW
Z(i)         INV                    INV                    INV
R(j)         RO                     RO                     INV
W(j)         INV                    INV                    INV
Z(j)         RO                     RW                     INV
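
The ownership protocol can be exercised the same way. The sketch below encodes the RO/RW/INV transitions and applies each operation to every cache at once: the issuing processor sees it as an (i) operation and all others see it as a (j) operation. `OWNERSHIP` and `step` are hypothetical names used only for illustration.

```python
# New state of a block copy, indexed by current state and then by operation.
OWNERSHIP = {
    "RO":  {"R(i)": "RO", "W(i)": "RW", "Z(i)": "INV",
            "R(j)": "RO", "W(j)": "INV", "Z(j)": "RO"},
    "RW":  {"R(i)": "RW", "W(i)": "RW", "Z(i)": "INV",
            "R(j)": "RO", "W(j)": "INV", "Z(j)": "RW"},
    "INV": {"R(i)": "RO", "W(i)": "RW", "Z(i)": "INV",
            "R(j)": "INV", "W(j)": "INV", "Z(j)": "INV"},
}


def step(states, proc, kind):
    """Apply operation kind ('R', 'W' or 'Z') issued by processor proc.

    states[p] is the state of the block copy in processor p's cache.
    """
    return [
        OWNERSHIP[s][f"{kind}(i)" if p == proc else f"{kind}(j)"]
        for p, s in enumerate(states)
    ]


states = ["INV", "INV"]
for proc, kind in [(0, "R"), (1, "R"), (0, "W"), (1, "R")]:
    states = step(states, proc, kind)
    # At most one cache may own the block in RW state at any time.
    assert states.count("RW") <= 1
    print(proc, kind, states)
```

After the write by processor 0 the states are `['RW', 'INV']`; the following read by processor 1 downgrades the owner, giving `['RO', 'RO']`.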


o Write-once Protocol
• First write using write-through policy
• Subsequent writes using write-back policy
• In both cases, data item copy in remote caches is invalidated
• Cache block states:
o Valid: cache block is consistent with the main memory copy
o Reserved: data has been written exactly once and is consistent with the main memory copy
o Dirty: data has been written more than once and is not consistent with the main memory copy
o Invalid: block is not found in the cache or is inconsistent with the main memory copy
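
The effect of repeated local writes on these states can be sketched as follows. This covers write hits only; write-miss handling is not spelled out above and is deliberately left out. The function name `write_hit` is an assumption made for this example.

```python
from typing import Tuple


def write_hit(state: str) -> Tuple[str, bool]:
    """Return (new state, main memory updated?) for a local write hit."""
    if state == "Valid":
        # First write: write-through, so memory stays consistent.
        return "Reserved", True
    if state in ("Reserved", "Dirty"):
        # Later writes: write-back, so memory is not updated now.
        return "Dirty", False
    raise ValueError(f"no write hit possible in state {state!r}")


state = "Valid"
trace = []
for _ in range(3):
    state, memory_updated = write_hit(state)
    trace.append((state, memory_updated))

print(trace)  # [('Reserved', True), ('Dirty', False), ('Dirty', False)]
```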



o Write-once Protocol
• Cache events and actions:
o Read-miss
o Read-hit
o Write-miss
o Write-hit
o Block replacement



• Multilevel Cache Coherence



• Protocol Performance issues
o Snoopy Cache Protocol Performance determinants:
• Workload Patterns
• Implementation Efficiency
o Goals/Motivation behind using snooping mechanism
• Reduce bus traffic
• Reduce effective memory access time
o Data Pollution Point
• Miss ratio decreases as block size increases, up to the data pollution point, because larger blocks raise the probability of finding a desired data item in the cache.
• Beyond the data pollution point, the miss ratio starts to increase as the block size increases further.
o Ping-Pong effect on data shared between multiple caches
• If two processes update a data item alternately, the data will continually migrate between the two caches, giving a high miss rate
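
The ping-pong effect can be illustrated with a toy model. The sketch assumes a single shared block and a write-invalidate scheme in which a write leaves the writer with the only valid copy; `count_write_misses` is a hypothetical helper, not a measurement of any real system.

```python
def count_write_misses(writers):
    """Count misses when the listed processors write one shared block in order.

    Assumes write-invalidate: after a write, only the writer holds a valid copy.
    """
    holder = None  # processor currently holding the only valid copy
    misses = 0
    for p in writers:
        if holder != p:
            misses += 1  # copy was invalidated or never loaded: block migrates
        holder = p
    return misses


alternating = [0, 1] * 4          # P0, P1, P0, P1, ...
batched = [0] * 4 + [1] * 4       # P0 four times, then P1 four times

print(count_write_misses(alternating))  # 8
print(count_write_misses(batched))      # 2
```

With the same eight writes, alternating access misses every time, while batching the writes per processor misses only twice.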



THREE GENERATIONS OF MULTICOMPUTERS
• Multicomputer v/s Multiprocessor
• Design Choices for Multi-computers
o Processors
• Low cost commodity (off-the-shelf) processors
o Memory Structure
• Distributed memory organization
• Local memory with each processor
o Interconnection Schemes
• Message passing over point-to-point, direct networks with send/receive semantics, with or without uniform message communication speed
o Control Strategy
• Asynchronous MIMD, MPMD and SPMD operations



• The Past, Present and Future Development
o First Generation
• Example Systems: Caltech’s Cosmic Cube, Intel iPSC/1, Ametek S/14, nCube/10
o Second Generation
• Example Systems: iPSC/2, i860, Delta, nCube/2, Supernode 1000, Ametek Series 2010
o Third Generation
• Example Systems: Caltech’s Mosaic C, J-Machine, Intel Paragon
o First and second generation multi-computers are regarded as medium-grain systems
o Third generation multi-computers are regarded as fine-grain systems
o A fine-grain, shared-memory approach can, in theory, combine the relative merits of multiprocessors and multi-computers in a heterogeneous processing environment



THREE GENERATIONS OF MULTICOMPUTERS

Attribute                           1st Generation    2nd Generation    3rd Generation
Typical node attributes
  MIPS                              1                 10                100
  MFLOPS (scalar)                   0.1               2                 40
  MFLOPS (vector)                   10                40                200
  Memory size (in MB)               0.5               4                 32
Typical system attributes
  Number of nodes (N)               64                256               1024
  MIPS                              64                2560              100 K
  MFLOPS (scalar)                   6.4               512               40 K
  MFLOPS (vector)                   640               10 K              200 K
  Memory size (in MB)               32                1 K               32 K
Communication latency
  Local neighbour (microseconds)    2000              5                 0.5
  Non-local node (microseconds)     6000              5                 0.5
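
The system attributes in this table are the node attributes scaled by the number of nodes. The sketch below reproduces that arithmetic from the tabulated node values (first-generation node MIPS taken as 1); the dictionary and function names are illustrative only.

```python
# Per-node attributes: (MIPS, scalar MFLOPS, vector MFLOPS, memory in MB)
NODE_ATTRIBUTES = {
    "1st": (1, 0.1, 10, 0.5),
    "2nd": (10, 2, 40, 4),
    "3rd": (100, 40, 200, 32),
}
NODE_COUNT = {"1st": 64, "2nd": 256, "3rd": 1024}


def system_attributes(generation):
    """Scale each per-node attribute by the number of nodes N."""
    n = NODE_COUNT[generation]
    return tuple(n * value for value in NODE_ATTRIBUTES[generation])


for generation in ("1st", "2nd", "3rd"):
    print(generation, system_attributes(generation))
```

The third-generation results (102400, 40960, 204800, 32768) are the exact values that the table rounds to 100 K, 40 K, 200 K and 32 K.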



MESSAGE PASSING SCHEMES
• Message Routing Schemes
• Message Formats
o Messages
o Packets
o Flits (flow control digits)
• Data Only Flits
• Sequence Number
• Routing Information

• Store-and-forward routing
• Wormhole routing


• Asynchronous Pipelining



• Latency Analysis
o L: Packet length (in bits)
o W: Channel bandwidth (in bits per second)
o D: Distance (number of nodes traversed minus 1)
o F: Flit length (in bits)
o Communication latency in store-and-forward routing
• T_SF = L (D + 1) / W
o Communication latency in wormhole routing
• T_WH = L / W + F D / W
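
The two latency formulas can be compared numerically. In the sketch below the function names and the sample values (a 1024-bit packet, 8-bit flits, a 1 Mbit/s channel) are assumptions chosen only to show the trend, not figures from the book.

```python
def store_and_forward_latency(L, W, D):
    """T_SF = L (D + 1) / W, in seconds."""
    return L * (D + 1) / W


def wormhole_latency(L, W, D, F):
    """T_WH = L / W + F D / W, in seconds."""
    return L / W + F * D / W


L = 1024        # packet length in bits (assumed)
F = 8           # flit length in bits (assumed)
W = 1_000_000   # channel bandwidth in bits per second (assumed)

for D in (1, 4, 16):
    t_sf = store_and_forward_latency(L, W, D)
    t_wh = wormhole_latency(L, W, D, F)
    print(f"D={D:2d}  T_SF={t_sf * 1e6:8.1f} us  T_WH={t_wh * 1e6:8.1f} us")
```

Store-and-forward latency grows in proportion to D + 1, while wormhole latency grows only by F / W per extra hop, so it is nearly independent of distance when F is much smaller than L.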

