 Reconfigurable computing mostly stresses the use of
coarse-grained reconfigurable arrays (RAs) with datapaths
wider than one bit, because fine-grained architectures
are much less efficient owing to their huge routing-area
overhead and poor routability.
 Since computational datapaths have a regular structure,
full-custom design of reconfigurable datapath units can
be drastically more area-efficient than assembling them
the FPGA way from single-bit CLBs.
 Coarse-grained architectures provide operator-level CFBs
(complex functional blocks), word-level datapaths, and
powerful, very area-efficient datapath routing
switches.
 A significant benefit of this approach is the
massive reduction of configuration memory
and configuration time, as well as the
reduction of complexity in the place and
route step.
 The obvious drawback is that algorithm
mapping and interconnect resolution, although
certainly simpler than in the case of FPGAs, are
necessarily non-standard and very
architecture-specific.
 The PACT XPP architecture is composed of a set of Processing Array Clusters (PACs), each
composed of an array of heterogeneous Processing Array
Elements (PAEs) and a low-level Configuration Manager (CM).
 Configuration Managers are organized in a hierarchical tree that
handles the bit-stream loading mechanism.
 Communication between PAEs is handled by a packet-oriented
interconnect network.
 Each PAE has 16-bit granularity and is composed of
synchronization registers and arithmetic/logic operators,
including multiplication.
 Data exchange is performed by transmission of packets through
the communication network, while I/O is handled by specific
ports located at the four corners of the array.
 In normal operation mode, PAE objects are
self-synchronizing: an operation is performed as
soon as all necessary input data packets are
available, and results are forwarded as soon as
they are available.
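 The self-synchronizing rule above can be illustrated with a minimal Python sketch. It is not PACT code: the class name `PAE`, its `push` method, and the per-port queues are hypothetical, chosen only to show an operation firing once every input port holds a packet.

```python
from collections import deque

MASK16 = 0xFFFF  # PAEs operate on 16-bit words


class PAE:
    """Hypothetical model of a self-synchronizing processing element."""

    def __init__(self, op, n_inputs=2):
        self.op = op                                    # the configured operation
        self.inputs = [deque() for _ in range(n_inputs)]  # one packet queue per port
        self.outputs = deque()                          # results waiting to be forwarded

    def push(self, port, value):
        """Deliver one data packet to an input port, then try to fire."""
        self.inputs[port].append(value & MASK16)
        self._fire()

    def _fire(self):
        # Fire as long as every input port holds at least one packet.
        while all(self.inputs):
            args = [q.popleft() for q in self.inputs]
            self.outputs.append(self.op(*args) & MASK16)


adder = PAE(lambda a, b: a + b)
adder.push(0, 3)   # only one operand has arrived: nothing fires
adder.push(1, 4)   # second operand arrives: the addition fires
```

 No global clock or controller decides when the addition runs; the arrival of the last missing packet is the only trigger, which is the behaviour the slide describes.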
 Because exploiting parallelism at all levels
is critical to reaching the full
computational potential of the architecture, PACT
XPP is programmed through the Native Mapping
Language (NML), a structural event-based netlist
description language.
 Other coarse-grained devices are based on the concept of
instruction set metamorphosis, differing only in the architectural
support used for mapping extension segments. One example is MorphoSys.
 It is a very successful RP that has also been the basis for a few successful
commercial implementations.
 It is composed of a small 32-bit RISC core (TinyRISC), coupled to a
so-called Reconfigurable Cell Array.
 The array is an 8×8 grid of identical Reconfigurable
Cells (RCs). Cells are very coarse: each operates on 16-bit words and
contains a multiplier, an ALU, a shifter, a small local register file, and
input multiplexing logic.
 The architecture comprises a multi-context configuration memory,
which can overlap computation and configuration in order to
minimize the reconfiguration penalty, and a multi-bank frame buffer,
which is used to overlap computation on one set of data with
concurrent transfers of another set to enhance overall data
throughput.
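 The benefit of overlapping transfers with computation can be sketched with a simple timing model. This is an illustration under stated assumptions, not a MorphoSys measurement: it assumes exactly two frame-buffer banks, a fixed per-block load time and compute time, and the function names and cycle counts are hypothetical.

```python
def serial_time(n_blocks, load, compute):
    """Single bank: every block is loaded, then computed, with no overlap."""
    return n_blocks * (load + compute)


def overlapped_time(n_blocks, load, compute):
    """Two banks (assumed): block i+1 is loaded while block i is computed."""
    if n_blocks == 0:
        return 0
    # First load is exposed, last compute is exposed; in between, each step
    # costs whichever of the two overlapped activities takes longer.
    return load + (n_blocks - 1) * max(load, compute) + compute


# Hypothetical figures: 4 data blocks, 3 cycles to load, 5 cycles to compute.
print(serial_time(4, 3, 5))      # 32
print(overlapped_time(4, 3, 5))  # 23
```

 When loading is faster than computing, the transfer cost is hidden for every block but the first. The same reasoning applies to the multi-context configuration memory, with configuration loads taking the place of data transfers.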
 On the RCs, computation is performed in a
purely Single Instruction Multiple Data (SIMD)
fashion: all cells belonging to the same row
receive the same control word, and thus
perform the same operation over extended
128-bit words (eight 16-bit cells per row).
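 A minimal sketch of this row-wise SIMD behaviour follows. The control-word names (`"add"`, `"sub"`, `"mul"`), the helper names, and the choice of placing lane 0 in the least-significant bits are assumptions made for illustration; they do not reflect the actual MorphoSys context-word encoding.

```python
MASK16 = 0xFFFF   # each Reconfigurable Cell handles a 16-bit word
ROW_WIDTH = 8     # eight cells per row, so 8 x 16 = 128 bits

# Hypothetical control words mapped to cell operations.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


def simd_row(control_word, a_row, b_row):
    """Apply the one operation selected by control_word in every cell of a row."""
    assert len(a_row) == len(b_row) == ROW_WIDTH
    op = OPS[control_word]
    return [op(a, b) & MASK16 for a, b in zip(a_row, b_row)]


def pack128(lanes):
    """View eight 16-bit lanes as one 128-bit word (lane 0 least significant)."""
    word = 0
    for i, lane in enumerate(lanes):
        word |= (lane & MASK16) << (16 * i)
    return word


result = simd_row("add", [1, 2, 3, 4, 5, 6, 7, 8], [10] * 8)
print(result)  # [11, 12, 13, 14, 15, 16, 17, 18]
```

 A single control word drives all eight lanes, so the row behaves like one 128-bit operator even though each cell only sees 16 bits.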
 It is thus evident that the MorphoSys
reconfigurable cell array delivers high performance
and much higher area efficiency than
FPGA-based solutions.
 In this landscape, the fundamental parameters in
the evaluation of a candidate RP for inclusion as
IP in SoC design can be classified as follows:
 a) The design/choice of the reconfigurable fabric
(Computation Grain, Interconnect Infrastructure)
 b) The application mapping flow and its entry
language
 c) The interaction between the fabric and the
processor core (operand feed, synchronization)

Lect3.pptx
