2. FINE-GRAIN MULTICOMPUTERS
Shared-memory multiprocessors like the Cray Y-MP
are used to perform coarse-grain computations in
which each processor executes programs having a
few tasks of 20 s or longer.
Message-passing multicomputers are used to execute
medium-grain programs with approximately 10-ms
task size as in the iPSC/1.
3. FINE-GRAIN MULTICOMPUTERS
Fine-grain parallelism appears in SIMD or
data-parallel computers like the CM-2, or on the
message-driven J-Machine and Mosaic C.
5. • Communication latency Tc measures the data or
message transfer time on a system interconnect.
• Examples of Tc: the shared-memory access time on
the Cray Y-MP, or the time to send a 32-bit value.
• Synchronization overhead Ts is the processing time
required on a processor or PE.
• The sum Tc + Ts gives the total time required for
IPC.
• The shared-memory Cray Y-MP has a short Tc but a
long Ts.
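A minimal sketch of the sum above: the snippet adds a communication latency and a synchronization overhead to get the IPC time. The function name `ipc_time` and the sample values are hypothetical and chosen only for illustration; they are not measurements of any machine.

```python
def ipc_time(tc: float, ts: float) -> float:
    """Total IPC time as the sum Tc + Ts (both in the same time unit)."""
    if tc < 0 or ts < 0:
        raise ValueError("latencies must be non-negative")
    return tc + ts


# Hypothetical values in microseconds: a short Tc paired with a long Ts.
print(ipc_time(tc=0.5, ts=40.0))
```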
FINE-GRAIN PARALLELISM
6. The grain size Tg is measured by the execution time
of a typical program, including both the computing
time and the communication time involved.
Large grain implies lower concurrency or a lower
DOP.
Fine grain leads to a much higher DOP and also to
higher communication overhead.
FINE-GRAIN PARALLELISM
7. THE MIT J-MACHINE
This part covers the architecture and building block
of the MIT J-Machine, its instruction set, and system
design considerations.
It is based on the paper by Dally et al. (1992).
The building block is the message-driven
processor (MDP), which is a 36-bit microprocessor
custom-designed for a fine-grain multicomputer.
8. J-Machine Architecture
The k-ary n-cube networks have been applied in the
MIT J-Machine.
The J-Machine uses a 1024-node network (8*8*16),
which is a reduced 16-ary 3-cube with 8 nodes along
the X- and Y-dimensions and 16 nodes along the
Z-dimension.
A 4096-node J-Machine is a full 16-ary 3-cube with
16*16*16 nodes.
Every node has a constant node degree of 6 in the
3-dimensional network.
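The node counts and the degree of 6 can be checked with a short sketch. The helper names `node_count` and `neighbors` are hypothetical. Whether links wrap around at the edges is not stated here, so it is left as a `wrap` parameter: with wraparound every node has 6 neighbors, and without it only interior nodes do.

```python
from math import prod

# The six cardinal directions: +x, -x, +y, -y, +z, -z.
DIRECTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def node_count(dims):
    """Number of nodes in a network with the given extent per dimension."""
    return prod(dims)


def neighbors(node, dims, wrap=False):
    """Neighbors of `node` along the six cardinal directions.

    wrap=True models wraparound links (an assumption for illustration);
    wrap=False drops any link that would leave the mesh.
    """
    result = []
    for step in DIRECTIONS:
        coord = tuple(c + s for c, s in zip(node, step))
        if wrap:
            coord = tuple(c % d for c, d in zip(coord, dims))
        elif not all(0 <= c < d for c, d in zip(coord, dims)):
            continue
        result.append(coord)
    return result


print(node_count((8, 8, 16)))                   # 1024
print(node_count((16, 16, 16)))                 # 4096
print(len(neighbors((3, 3, 3), (8, 8, 16))))    # 6 for an interior node
```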
9. MDP (Message Driven Processor) design
• The MDP chip includes a processor, a 4096-word by
36-bit memory, and a built-in router with network
ports.
• The MDP handles each arriving message.
• It provides the communication, synchronization, and
global naming mechanisms required to efficiently
support fine-grain, concurrent programming models.
10. MDP (Message Driven Processor) design
The grain size is as small as 8-word objects or
20-instruction tasks.
Fine-grain programs typically execute from 10 to 100
instructions between communication and
synchronization actions.
The MDP appears as a component with a memory port,
six two-way network ports, and a diagnostic port.
11. MDP (Message Driven Processor) design
The memory port provides a direct interface to 1M
words of ECC DRAM, consisting of 11 multiplexed
address lines, a 12-bit data bus, and 3 control signals.
The network ports connect MDPs in a 3-dimensional
mesh network.
Each of the 6 ports corresponds to one of the 6
cardinal directions.
Each port connects directly to the opposite port on
an adjacent MDP.
12. The chip includes a conventional microprocessor
with control, register file, and ALU, plus memory blocks.
The address arithmetic unit (AAU) provides
addressing functions.
13. INSTRUCTION-SET ARCHITECTURE
The instruction set contains fixed-format, 3-address
instructions.
Two 17-bit instructions fit into each 36-bit word with
2 bits reserved for type checking.
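The arithmetic behind this is 2 * 17 + 2 = 36 bits. A rough sketch of such packing follows; the field order (tag in the top bits, then the first and second instruction) and the names `pack_word` and `unpack_word` are assumptions for illustration, not the documented MDP word layout.

```python
INST_BITS = 17
TAG_BITS = 2
INST_MASK = (1 << INST_BITS) - 1
TAG_MASK = (1 << TAG_BITS) - 1


def pack_word(tag, first, second):
    """Pack a 2-bit tag and two 17-bit instructions into one 36-bit word.

    Assumed layout (illustrative only): [tag | first | second].
    """
    if not 0 <= tag <= TAG_MASK:
        raise ValueError("tag must fit in 2 bits")
    if not (0 <= first <= INST_MASK and 0 <= second <= INST_MASK):
        raise ValueError("each instruction must fit in 17 bits")
    return (tag << (2 * INST_BITS)) | (first << INST_BITS) | second


def unpack_word(word):
    """Split a 36-bit word back into (tag, first, second)."""
    second = word & INST_MASK
    first = (word >> INST_BITS) & INST_MASK
    tag = (word >> (2 * INST_BITS)) & TAG_MASK
    return tag, first, second


word = pack_word(0b01, 0x1ABCD, 0x00042)
print(word.bit_length() <= 36, unpack_word(word))
```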
There are 3 execution levels:
Background
Priority 0 (P0)
Priority 1 (P1)
14. INSTRUCTION-SET ARCHITECTURE
The MDP executes at the background level until an
arriving message creates a task, which initiates
execution at P0 or P1.
The P1 level has higher priority than the P0 level.
Each priority level has 4 GPRs, 4 address registers,
4 ID registers, and 1 instruction pointer (IP).
The ID registers are not used at the background level.
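The level rules above can be sketched as a small selection function. The queue representation and the name `select_level` are hypothetical; the sketch only shows the ordering (P1 over P0 over background), not how the hardware dispatches tasks.

```python
from collections import deque


def select_level(p0_queue, p1_queue):
    """Pick the execution level from the pending-message queues.

    P1 outranks P0; the background level runs only when neither
    queue holds a message.
    """
    if p1_queue:
        return "P1"
    if p0_queue:
        return "P0"
    return "background"


p0, p1 = deque(), deque()
print(select_level(p0, p1))      # background
p0.append("msg-a")               # hypothetical message labels
print(select_level(p0, p1))      # P0
p1.append("msg-b")
print(select_level(p0, p1))      # P1
```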
15. COMMUNICATION SUPPORT
The MDP provides support for end-to-end message
delivery, including formatting, injection, delivery,
buffer allocation, buffering, and task scheduling.
For example, a 4-word message can be sent using 3
variants of the SEND instruction.
16. THE ROUTER DESIGN
The routers form the switches in a J-Machine
network and deliver messages to their destinations.
Each MDP contains three independent routers, one for
each bidirectional dimension of the network.
Each router contains two separate virtual networks
with different priorities that share the same physical
channels.
17. THE ROUTER DESIGN
Each of the 18 router paths contains buffers,
comparators, and output arbitration.
A message entering the dimension competes with
messages continuing in the dimension at a two-to-one
switch.
Once a message is granted this switch, all other
input is locked out for the duration of the message.
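The lock-out behavior of the two-to-one switch can be modeled with a tiny state machine. This is a sketch under assumptions: the class name `TwoToOneSwitch`, the input labels, and the tie-break that favors the continuing message when both request at once are all illustrative choices, not the documented arbitration policy.

```python
class TwoToOneSwitch:
    """Two inputs ("entering", "continuing") sharing one output."""

    def __init__(self):
        self.owner = None  # input currently holding the switch, if any

    def request(self, entering, continuing):
        """Arbitrate once; each argument says whether that input has a
        message waiting. Returns the input that owns the switch."""
        if self.owner is None:
            if continuing:          # assumed tie-break, see lead-in
                self.owner = "continuing"
            elif entering:
                self.owner = "entering"
        # While owned, the other input stays locked out.
        return self.owner

    def release(self):
        """Call when the end of the current message has passed."""
        self.owner = None


sw = TwoToOneSwitch()
print(sw.request(entering=True, continuing=False))  # entering
print(sw.request(entering=True, continuing=True))   # entering (still locked)
sw.release()
print(sw.request(entering=True, continuing=True))   # continuing
```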
18. GLOBAL NAMING
The AAU, the largest logic block in the MDP,
performs all functions associated with memory
addressing.
To support naming and relocation, the AAU
contains the address and ID registers.
A translation base register defines an area of
memory to be a two-way, set-associative
translation buffer.
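For readers unfamiliar with the term, here is a generic sketch of a two-way set-associative lookup: each key maps to one set, and each set holds at most two entries. The class and method names, the modulo index function, the number of sets, and the evict-the-older-entry replacement rule are all assumptions for illustration; they are not the MDP's actual mechanism.

```python
class TranslationBuffer:
    """Illustrative two-way set-associative key -> value buffer."""

    WAYS = 2

    def __init__(self, num_sets=64):
        self.num_sets = num_sets
        # Each set is a list of up to two (key, value) pairs.
        self.sets = [[] for _ in range(num_sets)]

    def _index(self, key):
        return key % self.num_sets  # assumed index function

    def lookup(self, key):
        """Return the stored value, or None on a miss."""
        for k, v in self.sets[self._index(key)]:
            if k == key:
                return v
        return None

    def enter(self, key, value):
        """Insert or update a translation, evicting the older entry
        of the set when both ways are occupied (assumed policy)."""
        ways = self.sets[self._index(key)]
        for i, (k, _) in enumerate(ways):
            if k == key:
                ways[i] = (key, value)
                return
        if len(ways) == self.WAYS:
            ways.pop(0)
        ways.append((key, value))


tb = TranslationBuffer(num_sets=4)
tb.enter(1, "node-a")   # keys 1, 5, 9 all fall into set 1
tb.enter(5, "node-b")
tb.enter(9, "node-c")   # evicts key 1
print(tb.lookup(1), tb.lookup(5), tb.lookup(9))
```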
19. THE CALTECH MOSAIC C
The project at Caltech was led by Seitz (1992).
The progress in microelectronics over the past
decade has been such that the Mosaic nodes are ~60
times faster, use ~20 times less power, are ~100
times smaller, and are ~25 times less expensive to
manufacture than the Cosmic Cube nodes.
Each Mosaic node includes 64 Kbytes of memory
and an 11-MIPS processor, a packet interface, and a
router.
20. MOSAIC C NODES
The processor also includes two program counters
and two sets of general-purpose registers to allow
zero-time context switching between user programs
and message handling.
Instead of several hundred instructions for handling
a packet, the Mosaic typically requires only about 10
instructions.
21. MOSAIC C 8*8 MESH BOARDS
The choice of a two-dimensional mesh for the Mosaic
was based on a 1989 engineering analysis; originally,
a three-dimensional mesh network was planned.
64 Mosaic chips are packaged by tape-automated
bonding in an 8*8 array on a circuit board.
Host-interface boards are also used to connect the
Mosaic arrays and workstations.