Lect3.pptx

 Reconfigurable computing mostly stresses the use of
coarse-grained reconfigurable arrays (RAs) with datapaths
wider than one bit, because fine-grained architectures
are much less efficient owing to their huge routing-area
overhead and poor routability.
 Since computational datapaths have a regular structure,
full-custom design of reconfigurable datapath units can
be drastically more area-efficient than assembling them
the FPGA way from single-bit CLBs.
 Coarse-grained architectures provide operator-level CFBs
(complex functional blocks), word-level datapaths, and
powerful, very area-efficient datapath routing
switches.

 A significant benefit of this approach is the
massive reduction of configuration memory
and configuration time, as well as the
reduction of complexity in the place and
route step.
 The obvious drawback is that algorithm
mapping and interconnect resolution, although
certainly simpler than in the case of FPGAs, are
necessarily non-standard and very
architecture-specific.

 The PACT XPP is composed of a set of Processing Array Clusters (PACs),
each composed of an array of heterogeneous Processing Array
Elements (PAEs) and a low-level Configuration Manager (CM).
 Configuration Managers are organized in a hierarchical tree that
handles the bit-stream loading mechanism.
 Communication between PAEs is handled by a packet-oriented
interconnect network.
 Each PAE has 16-bit granularity and is composed of a
synchronization register and arithmetic/logic operators,
including multiplication.
 Data exchange is performed by transmission of packets through
the communication network, while I/O is handled by specific
ports located at the four corners of the array.

 In normal operation mode, PAE objects are self-
synchronizing: an operation is performed as
soon as all necessary data input packets are
available, and results are forwarded as soon as
they are available.
 Because exploiting parallelism at all levels is
critical to reaching the full computational
potential of the architecture, the PACT XPP is
programmed through the Native Mapping
Language (NML), a structural, event-based netlist
description language.
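
The self-synchronizing firing rule described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the class name `ToyPAE`, its methods, and the `demo` function are hypothetical and are not part of NML or of any PACT tool; they only mimic the behaviour "fire when all input packets are present, forward the result immediately" on 16-bit values.

```python
from collections import deque
import operator

MASK16 = 0xFFFF  # the slides state 16-bit PAE granularity


class ToyPAE:
    """Hypothetical model of a self-synchronizing processing element."""

    def __init__(self, op, n_inputs=2):
        self.op = op
        self.inputs = [deque() for _ in range(n_inputs)]  # one packet queue per port
        self.consumers = []  # (downstream element, input port) pairs
        self.results = []    # every result produced, kept for inspection

    def connect(self, consumer, port):
        self.consumers.append((consumer, port))

    def receive(self, port, value):
        self.inputs[port].append(value & MASK16)
        self._try_fire()

    def _try_fire(self):
        # Fire only while every input port holds at least one packet.
        while all(self.inputs):
            args = [queue.popleft() for queue in self.inputs]
            result = self.op(*args) & MASK16
            self.results.append(result)
            # Forward the result as soon as it is available.
            for consumer, port in self.consumers:
                consumer.receive(port, result)


def demo():
    """Compute (2 + 3) * 4 with packets arriving in arbitrary order."""
    adder = ToyPAE(operator.add)
    multiplier = ToyPAE(operator.mul)
    adder.connect(multiplier, 0)
    multiplier.receive(1, 4)  # multiplier waits: port 0 is still empty
    adder.receive(0, 2)       # adder waits: port 1 is still empty
    adder.receive(1, 3)       # adder fires, which in turn fires the multiplier
    return multiplier.results
```

No global clock or scheduler appears in the model: the order of the three `receive` calls in `demo` can be permuted without changing the result, which is the point of the event-based style.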

 Other coarse-grained devices are based on the concept of
instruction set metamorphosis, only using a different architectural
support for mapping extension segments. One example is MorphoSys.
 It is a very successful RP that has also been the basis for a few
successful commercial implementations.
 It is composed of a small 32-bit RISC core (TinyRISC), coupled to a
so-called Reconfigurable Cell Array.
 The array is composed of 8×8 identical Reconfigurable
Cells (RCs). Cells are very coarse: each operates on 16-bit words and
contains a multiplier, an ALU, a shifter, a small local register file,
and input multiplexing logic.
 The architecture comprises a multi-context configuration memory,
which can overlap computation and configuration in order to
minimize the reconfiguration penalty, and a multi-bank frame buffer,
which is used to overlap computation on one set of data with
concurrent transfers of another set, enhancing overall data
throughput.
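
The frame-buffer overlap described above resembles ordinary double buffering. The following is a minimal sketch, assuming just two banks and a sequential software model; the function name `process_stream` and the `schedule` log are hypothetical illustrations and do not describe the actual MorphoSys hardware interface. In hardware, the load of the next block and the computation on the current block would run at the same time; here they are only recorded as belonging to the same step.

```python
def process_stream(blocks, compute):
    """Process data blocks using two alternating banks.

    Returns (results, schedule), where schedule lists
    ("load" | "compute", block_index, bank_index) events.
    """
    banks = [None, None]
    results, schedule = [], []
    if not blocks:
        return results, schedule

    active = 0
    banks[active] = blocks[0]
    schedule.append(("load", 0, active))  # the first load cannot be hidden

    for i in range(len(blocks)):
        nxt = i + 1
        if nxt < len(blocks):
            # Fill the idle bank while the active bank is being processed.
            banks[1 - active] = blocks[nxt]
            schedule.append(("load", nxt, 1 - active))
        results.append(compute(banks[active]))
        schedule.append(("compute", i, active))
        active = 1 - active  # swap the roles of the two banks

    return results, schedule
```

For example, `process_stream([[1, 2], [3, 4], [5]], sum)` computes on bank 0 while block 1 is placed in bank 1, then swaps, so only the very first load is not paired with a computation.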

 Across the RCs, computation is performed in a
purely Single Instruction Multiple Data (SIMD)
fashion: all cells belonging to the same row
receive the same control word, and thus
perform the same operation over extended
128-bit words (8 cells × 16 bits).
 It is thus evident that the MorphoSys
reconfigurable cell array offers high performance
and much higher area efficiency than
FPGA-based solutions.
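
The row-wise SIMD behaviour can be sketched in software as follows. This is an illustration under assumptions, not the MorphoSys context-word format: the operation names in `OPS`, the functions `run_rows` and `pack_row`, and the choice of placing lane 0 in the low bits of the 128-bit word are all hypothetical.

```python
MASK16 = 0xFFFF
ROWS, COLS = 8, 8  # 8x8 array of 16-bit cells, as stated in the slides

# Hypothetical operation set; the real cell supports more than this.
OPS = {
    "add": lambda x, y: x + y,
    "sub": lambda x, y: x - y,
    "mul": lambda x, y: x * y,
}


def run_rows(control_words, a, b):
    """Apply one control word per row; every cell in a row does the same op."""
    out = []
    for r in range(ROWS):
        op = OPS[control_words[r]]
        # Each cell wraps its result to 16 bits.
        out.append([op(a[r][c], b[r][c]) & MASK16 for c in range(COLS)])
    return out


def pack_row(row):
    """Pack eight 16-bit lanes into one 128-bit integer (lane 0 in the low bits)."""
    word = 0
    for i, lane in enumerate(row):
        word |= (lane & MASK16) << (16 * i)
    return word
```

A row of eight 16-bit cells sharing one control word therefore behaves like a single operator on a 128-bit packed word, with no carries crossing lane boundaries.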

 In this landscape, the fundamental parameters in
the evaluation of a candidate RP for inclusion as
IP in SoC design can be classified as follows:
 a) The design/choice of the reconfigurable fabric
(Computation Grain, Interconnect Infrastructure)
 b) The application mapping flow and its entry
language
 c) The interaction between the fabric and the
processor core (operand feed, synchronization)
