Multiprocessor & Multicomputer Organisation
Parallel and Distributed Computing
Multiprocessing :: David Rye :: MTRX 3700

Multiprocessors and Multicomputers
• A multiprocessor system has more than one processor, with common memory shared between processors
• A multicomputer system has more than one processor, with each processor having local memory
• In either case, processors may be on a common bus (closely coupled), or distributed on a network (loosely coupled)

Multiprocessing Systems
• Generally accepted definition of a multiprocessing/multicomputing system:
  • Multiple processors, each with its own CPU and memory
  • Interconnection hardware
  • Processors fail independently
  • There exists a shared state
  • Appears to users as a single system

Flynn’s Taxonomy
• Computer system organisation described by two characteristics
  • Number of instruction streams
  • Number of data streams
• SISD (PC)
• SIMD (Supercomputer)
• MISD (??)
• MIMD (network of processors or network of computers)
  • Tightly coupled (backplane)
  • Loosely coupled (network)
• Limited usefulness, but serves to categorise…

SISD
• Single Instruction stream, Single Data stream
• All conventional uniprocessor systems are SISD, from PCs to mainframes
• Examples: 8080, M6800, M68000, i8086, etc.

SISD
• Can include Harvard memory organisation, pipelined units
• May execute more than one instruction simultaneously (superscalar processor)
[Figure: Processor ‘P’ with fetch, decode and execute units, connected to separate instruction memory (Minstr) and data memory (Mdata), and to I/O]

SIMD
• Single Instruction stream, Multiple Data stream
• Often called “Array Processor” or “Vector Architecture”
• One instruction unit that fetches an instruction, then commands many processing elements to execute the same instruction simultaneously on many different data sets

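The SIMD idea can be modelled in a few lines of Python. This is only an illustrative sketch: the function name simd_execute and the list-of-lanes representation are assumptions made here, and Python runs the loop sequentially, whereas real processing elements would act in lock step.

```python
import operator


def simd_execute(instruction, lanes_a, lanes_b):
    """Apply ONE instruction to every pair of operands.

    Each position in the lists stands for one processing element
    holding its own data (hypothetical model, not real hardware).
    """
    return [instruction(a, b) for a, b in zip(lanes_a, lanes_b)]


# The single "instruction unit" issues an add; four elements execute it.
sums = simd_execute(operator.add, [1, 2, 3, 4], [10, 20, 30, 40])
print(sums)
```

The same call with operator.mul would issue a different single instruction to the same set of processing elements.
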
SIMD
• Organisation is usually in the form of a network of processing elements with local memory
• Various topologies are used, and may be dynamically configured - e.g. 64k processors in the CM-2
[Figure: Master CPU with Memory and I/O, driving an array of Processing Elements with Local Memory]

[Figure: two processing-element networks]
• Nearest neighbour network (elements P11 ... Pxy arranged in a grid)
  • May be end-around connected
• 3-cube network (elements P000 ... P111)

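A small sketch of how neighbours could be computed in the two networks shown. The function names are hypothetical, and indices are zero-based here (the figure labels elements from 1). For the 3-cube, two elements are neighbours when their binary labels differ in exactly one bit.

```python
def mesh_neighbours(row, col, rows, cols, end_around=True):
    """Nearest neighbours of element (row, col) in a rows x cols grid.

    With end_around=True the edges wrap, so every element has four
    neighbours; otherwise edge elements have fewer.
    """
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    result = []
    for r, c in candidates:
        if end_around:
            result.append((r % rows, c % cols))
        elif 0 <= r < rows and 0 <= c < cols:
            result.append((r, c))
    return result


def cube_neighbours(node, dimensions=3):
    """Neighbours of a node in a hypercube: flip one address bit at a time."""
    return [node ^ (1 << bit) for bit in range(dimensions)]


print(mesh_neighbours(0, 0, 4, 4))
print([format(n, "03b") for n in cube_neighbours(0b000)])
```
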
SIMD
• Examples - mainly Supercomputers
  • Goodyear Aerospace MPP (Massively Parallel Processor)
  • ICL DAP (Distributed Array Processor)
  • Thinking Machines Corp CM-1 and CM-2
• Uses are computational rather than for control
• Comment: In 2011, only 1 of the world’s top 500 supercomputers (see TOP500) had a vector architecture

Dead (Super) Computer Society
• ACRI
• Alliant
• American Supercomputer
• Ametek
• Applied Dynamics
• Astronautics
• BBN
• CDC
• Convex
• Cray Computer
• Cray Research
• Culler-Harris
• Culler Scientific
• Cydrome
• Dana/Ardent/Stellar/Stardent
• Denelcor
• Elxsi
• ETA Systems
• Evans and Sutherland Computer Division
• Floating Point Systems
• Galaxy YH-1
• Goodyear Aerospace MPP
• Gould NPL
• Guiltech
• Intel Scientific Computers
• International Parallel Machines
• Kendall Square Research
• Key Computer Laboratories
• MasPar
• Meiko
• Multiflow
• Myrias
• Numerix
• nCube
• Prisma
• Thinking Machines
• Saxpy
• Scientific Computer Systems (SCS)
• Soviet Supercomputers
• Supertek
• Supercomputer Systems (SSI)
• Suprenum
• Vitesse Electronics
(from http://www.paralogos.com/DeadSuper/ ; see also their Architectural Themes page)

MISD
• Multiple Instruction stream, Single Data stream
• No true implementations
• Pipelined processors are sometimes regarded as MISD (each data element is processed by sequential segments of the pipeline)
  Fetch -> Decode -> Execute -> Write
• Examples: Cray-1, CDC Cyber 205, PIC18...

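To make the pipeline view concrete, the sketch below tabulates which item occupies which segment on each clock cycle. It assumes an idealised pipeline with one cycle per stage and no stalls; the function name pipeline_schedule is made up for this illustration.

```python
STAGES = ["Fetch", "Decode", "Execute", "Write"]


def pipeline_schedule(n_items, stages=STAGES):
    """Return one dict per clock cycle mapping stage name -> item index.

    Item i enters the first stage on cycle i and advances one stage
    per cycle (idealised: no stalls or hazards).
    """
    schedule = []
    total_cycles = n_items + len(stages) - 1
    for cycle in range(total_cycles):
        occupancy = {}
        for depth, stage in enumerate(stages):
            item = cycle - depth
            if 0 <= item < n_items:
                occupancy[stage] = item
        schedule.append(occupancy)
    return schedule


for cycle, occupancy in enumerate(pipeline_schedule(3)):
    print(cycle, occupancy)
```

On any cycle where the pipeline is full, each segment is doing different work on a different item, which is the basis for the MISD reading mentioned above.
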
MIMD
• Multiple Instruction stream, Multiple Data stream
• Essentially a group of independent computers
• All distributed systems are MIMD

Parallel and Distributed Computers
• Parallel & distributed computers
  • Tightly coupled: Multiprocessors (shared memory)
    • Bus: Sequent, Encore
    • Switched: Ultracomputer, RP3
  • Loosely coupled: Multicomputers (private memory)
    • Bus: Workstations on a LAN
    • Switched: Hypercube, Transputer
A taxonomy of parallel & distributed computer systems

Structural Classification
• Computer system is essentially
  • ‘p’ processing elements = (CPU + registers + cache)
  • ‘m’ memory units
  • joined by an interconnection network
• Memory may be local to a processor, shared or both
[Figure: ‘p’ Processors P1, P2, ... Pp and ‘m’ Memories M1, M2, ... Mm joined by an Interconnection Network]

Shared Memory (Multiprocessor)
[Figure: ‘p’ Processors P1, P2, ... Pp connected through Interconnection Network N to Memory M]
Distributed Memory (Multicomputer or distributed computer system)
[Figure: ‘c’ Computers C1, C2, ... Cc (c = P and M), each with a processor P1, P2, ... Pc and local memory M1, M2, ... Mc, connected through Interconnection Network N]

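A rough sketch of the distributed-memory case: each computer has its own private memory, so a value written by one becomes visible to another only through an explicit message over the network. Here threads stand in for computers and a queue.Queue stands in for the interconnection network; both, along with the function name, are assumptions made for illustration.

```python
import queue
import threading


def run_multicomputer():
    """Return (value B saw before the message, value B holds after it)."""
    network = queue.Queue()        # stand-in for the interconnection network
    memory_a = bytearray(4096)     # private to computer A
    memory_b = bytearray(4096)     # private to computer B
    seen = []

    def computer_a():
        memory_a[2000] = 0x55                  # local write, invisible to B
        network.put((2000, memory_a[2000]))    # explicit message to B

    def computer_b():
        seen.append(memory_b[2000])            # B's own memory is still 0
        address, value = network.get()         # blocks until A's message arrives
        memory_b[address] = value
        seen.append(memory_b[2000])

    threads = [threading.Thread(target=computer_b),
               threading.Thread(target=computer_a)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return tuple(seen)


print(run_multicomputer())
```
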
Shared Memory
• If processor A writes 0x55 to its address 2000, then processor B will read 0x55 from its address 2000. This is a multiprocessor
• Obviously, some mechanism is needed to resolve contention for the shared resource

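The behaviour described above can be imitated with two threads sharing one bytearray. This is a minimal sketch, not a model of real hardware: the threads stand in for processors A and B, the Lock is one possible contention mechanism, and the Event only makes the demonstration deterministic. All names are hypothetical.

```python
import threading


def run_shared_memory():
    """Processor A writes 0x55 to address 2000; return what B reads there."""
    memory = bytearray(4096)           # single memory shared by A and B
    memory_lock = threading.Lock()     # resolves contention for the memory
    written = threading.Event()        # lets B wait until A has written
    observed = []

    def processor_a():
        with memory_lock:
            memory[2000] = 0x55
        written.set()

    def processor_b():
        written.wait()
        with memory_lock:
            observed.append(memory[2000])

    threads = [threading.Thread(target=processor_b),
               threading.Thread(target=processor_a)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return observed[0]


print(hex(run_shared_memory()))
```

No message is sent between the two: B sees the value simply because both address the same memory.
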
Multiprocessor interconnections may be
• Bussed (time shared)
  • only one bus write at any time
  • must prevent bus contention at the bus interface ports
  • BREQ signals etc
  • limited to about 64 processors
• Switched
  • multiple simultaneous writes
  • requires fast (parallel) bus switches - not cheap!

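A sketch of time-shared bus arbitration: processors raise a bus request, and an arbiter grants the bus to exactly one of them per cycle. The round-robin policy and the function names are assumptions chosen for illustration; the slides do not specify an arbitration scheme.

```python
def arbitrate(requests, last_granted, n_processors):
    """Round-robin arbiter: grant the first requester after last_granted.

    requests is the set of processor indices currently asserting a
    bus request. Returns the winner, or None if nobody is requesting.
    """
    for offset in range(1, n_processors + 1):
        candidate = (last_granted + offset) % n_processors
        if candidate in requests:
            return candidate
    return None


def run_bus(pending, n_processors):
    """pending maps processor -> number of writes it wants to make.

    Returns the grant order; one entry per bus cycle, because only
    one write can use the bus at any time.
    """
    pending = dict(pending)
    grants = []
    last = n_processors - 1
    while any(pending.values()):
        requesting = {p for p, n in pending.items() if n > 0}
        winner = arbitrate(requesting, last, n_processors)
        grants.append(winner)
        pending[winner] -= 1
        last = winner
    return grants


print(run_bus({0: 2, 1: 1, 2: 1}, 3))
```

Four writes take four bus cycles however many processors are waiting, which is why a single bus stops scaling beyond a modest number of processors.
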
Bussed Systems
• Single shared bus
  • widely used
[Figure: ‘p’ Processors P1, P2, ... Pp and ‘m’ Memories M1, M2, ... Mm on a single System bus B]
• Multiple busses
  • relieve bus contention
  • provide some redundancy
[Figure: ‘p’ Processors P1, P2, ... Pp and ‘m’ Memories M1, M2, ... Mm connected to busses B1, B2, ... Bb]

Switched Systems
• Crossbar switch
  • up to min(m,p) simultaneous writes at any time (each memory serves one processor at a time)
  • requires fast m × p bus switch
[Figure: Crossbar network joining ‘p’ Processors P1, P2, ... Pp (rows) to ‘m’ Memories M1, M2, ... Mm (columns)]

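A sketch of one crossbar cycle: several processors can be connected at once provided they address different memory modules, while requests for the same module must be serialised. The fixed lowest-index-first priority and the function name are assumptions for illustration only.

```python
def crossbar_grant(requests):
    """requests maps processor -> memory module it wants this cycle.

    Returns the connections that can be closed simultaneously: each
    memory module is connected to at most one processor. Ties go to
    the lowest-numbered processor (an arbitrary choice made here).
    """
    granted = {}
    busy_memories = set()
    for processor in sorted(requests):
        memory = requests[processor]
        if memory not in busy_memories:
            granted[processor] = memory
            busy_memories.add(memory)
    return granted


# All four processors address different memories: all four proceed.
print(crossbar_grant({0: 0, 1: 1, 2: 2, 3: 3}))
# Processors 0 and 1 both want memory 1: only one of them proceeds.
print(crossbar_grant({0: 1, 1: 1, 2: 0}))
```
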
Switched Systems
• Crosspoint switch
  • cheaper but slower!!
  • used in “Omega” networks
[Figure: Processors P1 to P4 connected to Memories M1 to M4 through stages of 2x2 switches]

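A sketch of destination-tag routing through an omega network, under the usual textbook assumptions: N = 2^n lines, n stages of 2x2 switches, a perfect shuffle in front of each stage, and each switch steering by one bit of the destination address (most significant bit first). Lines are numbered from 0 here rather than from 1 as in the figure, and the function names are hypothetical.

```python
def omega_route(source, destination, n_bits):
    """Return the line occupied after each stage on the way to destination."""
    mask = (1 << n_bits) - 1
    position = source
    path = []
    for stage in range(n_bits):
        # Perfect shuffle: rotate the line address left by one bit.
        position = ((position << 1) | (position >> (n_bits - 1))) & mask
        # The 2x2 switch picks its upper (0) or lower (1) output
        # according to the next destination bit.
        bit = (destination >> (n_bits - 1 - stage)) & 1
        position = (position & ~1) | bit
        path.append(position)
    return path


def routes_conflict(path_a, path_b):
    """Two routes collide if they need the same line after the same stage."""
    return any(a == b for a, b in zip(path_a, path_b))


print(omega_route(0, 3, 2))
print(routes_conflict(omega_route(0, 0, 2), omega_route(2, 1, 2)))
```

The second call shows two requests for different memories that still collide inside the network, something that cannot happen in a crossbar.
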
Interconnections (topology) may be either
• Static – fixed by hardware
• Dynamic – re-configurable in software, perhaps even during program execution

Static Topologies
• Common arrangements are array, ring, star, cube, tree, and complete interconnection of processors.
[Figure: Linear Array, Ring, Star, Cube, Tree, Fully connected]

Static Topology
• Cube (or hypercube) gives good balance between
  • internode length (communications latency)
  • number of neighbouring nodes (cost of switching circuitry).
• Several commercial hypercube implementations exist

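The balance can be seen by comparing two standard graph measures for N nodes: degree (links per node, a proxy for switching cost) and diameter (worst-case hops, a proxy for latency). The sketch below uses the well-known closed forms for a ring, a hypercube and a fully connected network; the function names are chosen here for illustration.

```python
def ring(n):
    """Ring of n >= 3 nodes: two links each, worst case halfway round."""
    return {"degree": 2, "diameter": n // 2}


def hypercube(n):
    """Hypercube of n = 2**d nodes: d links each, at most d hops."""
    if n < 2 or n & (n - 1):
        raise ValueError("hypercube size must be a power of two")
    dimension = n.bit_length() - 1
    return {"degree": dimension, "diameter": dimension}


def fully_connected(n):
    """Every node linked to every other: one hop, but n - 1 links each."""
    return {"degree": n - 1, "diameter": 1}


def hypercube_distance(a, b):
    """Hops between two hypercube nodes = number of differing address bits."""
    return bin(a ^ b).count("1")


for name, topology in [("ring", ring), ("hypercube", hypercube),
                       ("fully connected", fully_connected)]:
    print(name, topology(64))
```

For 64 nodes the hypercube sits between the two extremes on both measures, which is the trade-off the slide refers to.
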
Dynamic Topology
• Single bus, multiple bus, crossbar-switched and omega networks are all examples of dynamic topologies.
