Design Flow – Computation Flow
Computation Flow

• Applies to both run-time and compile-time reconfiguration

• For some applications, the steps must be iterated




Computation flow
▪ If many reconfigurations have to be done, some of the steps are
  repeated according to the application's needs.
▪ A synchronization mechanism is usually used between the processor
  and the reconfigurable device (RD).
▪ Blocking access should also be used for memory accesses between
  the two devices.
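
The synchronization and blocking access described above can be sketched with two Python threads standing in for the processor and the RD. This is only an illustration under stated assumptions: `SharedMemory`, `rd_task`, and `run_once` are hypothetical names, and a real system would use the device's own handshake signals or driver calls.

```python
import threading

class SharedMemory:
    """Memory shared by the processor and the RD (hypothetical model)."""
    def __init__(self):
        self._lock = threading.Lock()   # makes every access blocking
        self._data = {}

    def write(self, addr, value):
        with self._lock:                # waits while the other side holds the lock
            self._data[addr] = value

    def read(self, addr):
        with self._lock:
            return self._data.get(addr)

def rd_task(mem, start, done):
    """Stand-in for the function configured on the RD: doubles the input."""
    start.wait()                        # wait until the processor signals "go"
    mem.write("result", 2 * mem.read("input"))
    done.set()                          # tell the processor the result is ready

def run_once(value):
    mem = SharedMemory()
    start, done = threading.Event(), threading.Event()
    rd = threading.Thread(target=rd_task, args=(mem, start, done))
    rd.start()
    mem.write("input", value)           # processor prepares the data
    start.set()                         # synchronization: start the RD
    done.wait()                         # synchronization: wait for completion
    rd.join()
    return mem.read("result")
```

Calling `run_once(21)` runs one processor/RD exchange; iterating the flow for several reconfigurations would repeat this call with a different `rd_task`.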




Computation flow
▪ Devices like the Xilinx Virtex II/II-Pro (and later) and the Altera
  Excalibur feature one or more soft or hard-macro processors.
    − The complete system can be integrated in a single device.

▪ The reconfiguration process can be:
    − Full: the complete device has to be reconfigured.
    − Partial: only part of the device is reconfigured while the rest
      keeps running.


Computation flow
▪ Full reconfiguration devices
    − Functions to be downloaded at run-time are developed and stored
      in a database.
    − No geometrical constraints are required for the functions.

▪ Partial reconfiguration capabilities
    − Modules, represented as rectangular boxes, are pre-computed and
      stored in a database.
    − With relocation, the modules are assigned to a position on the
      device at run-time.

[Figure: tasks 1 to N issue task requests to the O.S. services (module database M1 to M4, scheduler, placer); tasks T1 to TN are placed on the reconfigurable device]




RTR (Run-Time Reconfiguration) Challenges
•   Management of the reconfigurable device:
        − Usually done as part of the OS running on a processor
     ▪ Scheduler
        − Decides when a task must be executed
        − Tasks are stored in a database
        − Each task is characterized by (bounding box, run time)
     ▪ Placer
        − Temporal placement: management of tasks at run time
        − Allocates a set of resources for the task
        − If it cannot find a site, the task is rejected
•   Challenges:
     ▪ Fragmentation
     ▪ Communication between new and old tasks

[Figure: tasks, O.S. services (module database, scheduler, placer), and the reconfigurable device]

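The placer behaviour and the fragmentation problem listed above can be illustrated with a small first-fit sketch. First-fit is an assumption chosen for brevity, not necessarily the strategy the slides have in mind; `first_fit`, `place`, and `free_cells` are hypothetical names, and the device is modelled as a grid of booleans (True = occupied).

```python
def first_fit(grid, width, height):
    """Return (row, col) of the first free site for a width x height box, or None."""
    rows, cols = len(grid), len(grid[0])
    for r in range(rows - height + 1):
        for c in range(cols - width + 1):
            if all(not grid[r + dr][c + dc]
                   for dr in range(height) for dc in range(width)):
                return r, c
    return None

def place(grid, width, height):
    """Allocate resources for a task; return its site, or None if rejected."""
    site = first_fit(grid, width, height)
    if site is None:
        return None                      # no site found: the task is rejected
    r, c = site
    for dr in range(height):
        for dc in range(width):
            grid[r + dr][c + dc] = True  # mark the module's bounding box as used
    return site

def free_cells(grid):
    """Count unoccupied cells on the device."""
    return sum(not cell for row in grid for cell in row)

# A 3x3 device: after one 2x2 module, five cells remain free,
# but they do not form a 2x2 box, so a second 2x2 task is rejected.
device = [[False] * 3 for _ in range(3)]
first = place(device, 2, 2)
second = place(device, 2, 2)
```

Here `second` is None although `free_cells(device)` is larger than the four cells requested, which is the fragmentation effect named above.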
Design Flow
Hardware/Software Partitioning
•   Implementation of a reconfigurable system:
      ▪ A hardware/software co-design process
•   Software part (code segments to be executed on the processor):
      ▪ Developed in a software language with common tools
        (C, C++, Java, etc.)
•   Hardware part (to be executed on the RD):
      ▪ Developed in an HDL (VHDL, Verilog, Handel-C, etc.)
•   Interface:
      ▪ HDL or system-level languages

[Figure: an interface block joining the software side (C, C++, Java, ...) and the hardware side (VHDL, Verilog, Handel-C, ...)]

FPGA Architecture
•   FPGA architecture from the CAD tools’ point of view:
      ▪ N BLEs (Basic Logic Elements)
      ▪ K-LUT: K-input LUT
      ▪ I inputs, N outputs
      ▪ Inputs and outputs fully connected to the inputs of each LUT
        through MUXes
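
A minimal sketch of this logic-block model, assuming each LUT is stored as a truth table of 2**K bits and each LUT input has a multiplexer that selects one of the I block inputs or N fed-back outputs. The function names and the data layout are hypothetical, chosen only to make the structure concrete.

```python
def lut_eval(lut_bits, inputs):
    """Look up the output of a K-LUT; inputs[0] is the least significant bit."""
    index = sum(bit << i for i, bit in enumerate(inputs))
    return lut_bits[index]

def logic_block_step(luts, mux_select, block_inputs, prev_outputs):
    """Evaluate all N LUTs of one logic block.

    luts:        N truth tables, each with 2**K entries
    mux_select:  mux_select[n][j] is the source chosen for input j of LUT n,
                 an index into block_inputs + prev_outputs (I + N candidates)
    """
    sources = list(block_inputs) + list(prev_outputs)
    return [lut_eval(bits, [sources[s] for s in select])
            for bits, select in zip(luts, mux_select)]

# N = 2, K = 2, I = 2: LUT 0 is an AND gate, LUT 1 is an XOR gate.
AND2 = [0, 0, 0, 1]
XOR2 = [0, 1, 1, 0]
outputs = logic_block_step([AND2, XOR2], [[0, 1], [0, 1]], [1, 0], [0, 0])
```

Changing an entry of `mux_select` to 2 or 3 would route a previous block output back into a LUT, which is the "fully connected" property in the list above.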




Design Flow for H/w Part
      ▪ Almost the same as for any digital circuit design
•   Synthesis
      ▪ Differs mainly in technology mapping
        − LUT-based technology mapping
        − Specific to the target technology (device)




Design Flow for H/w Part



•   Design Entry
      ▪ Schematic Netlist
      ▪ HDL
      ▪ Waveform
      ▪ State Diagram




Textual or Schematic

•   Most designers today use textual languages rather than schematics.
    Schematic entry has drawbacks:
      ▪ Poor use of screen space.
      ▪ Not appropriate for large designs.
      ▪ Hard to build tools for (parsing).




What is Synthesis?
 •   Transformation of an abstract description into a more detailed
     description
       ▪ The "+" operator is transformed into a gate netlist
       ▪ "if (VEC_A = VEC_B) then" → a comparator that controls a
         multiplexer
 •   Transformation depends on several factors:
       ▪ Algorithm, constraints, library
 •   Simple operators (such as AND, OR, comparison) are converted
     into specific gates, while more complex operators such as
     multiplication are first converted into macrocells specific to
     the synthesis tool.
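To make the two examples above concrete, here is a sketch of what a gate-level version of "+" and of the comparator-plus-multiplexer could look like. A ripple-carry adder is assumed purely for illustration; a real synthesis tool picks the structure from its algorithm, constraints, and library. All function names are hypothetical.

```python
def full_adder(a, b, cin):
    """One-bit full adder built only from XOR, AND, and OR gates."""
    partial = a ^ b
    total = partial ^ cin
    carry = (a & b) | (partial & cin)
    return total, carry

def ripple_add(a_bits, b_bits):
    """Gate-level '+': add two equal-length bit lists (LSB first)."""
    carry, out = 0, []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out + [carry]                 # final carry is the extra top bit

def equals(a_bits, b_bits):
    """Comparator: 1 when every bit pair matches (XNOR, then AND)."""
    match = 1
    for a, b in zip(a_bits, b_bits):
        match &= 1 ^ (a ^ b)
    return match

def mux(select, if_one, if_zero):
    """2:1 multiplexer on single bits, written with AND, OR, and NOT."""
    return (select & if_one) | ((1 ^ select) & if_zero)

def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]

def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))
```

For instance, `mux(equals(vec_a, vec_b), x, y)` mirrors the "if (VEC_A = VEC_B) then" statement: the comparator output drives the multiplexer's select line.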
Synthesizability



 • Only a subset of VHDL is synthesizable
 • Different tools support different subsets
     ▪ records?
     ▪ arrays of integers?
     ▪ clock edge detection?
     ▪ sensitivity list?
     ▪ ...




Synthesis
•   Compilation and optimization:
      ▪ All non-synthesizable data types and operations are converted
        into synthesizable code
      ▪ The design is translated into a set of Boolean equations
      ▪ The equations are then minimized (technology-independent
        optimization)
•   Technology mapping:
      ▪ Assign functional modules to library elements.
      ▪ On FPGAs:
        − Map control logic and datapath to LUTs and BLEs
        − Map the optimized datapath to dedicated on-chip circuit
          structures (e.g. on-chip multipliers, adders with dedicated
          carry chains, embedded memory blocks)
      ▪ Technology-dependent optimization



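As a small illustration of LUT technology mapping, the sketch below turns one Boolean equation into the contents of a K-input LUT by enumerating its truth table. It assumes the equation already fits in a single LUT (at most K inputs); splitting larger functions across several LUTs is the harder part of mapping and is not shown. `lut_contents` and `majority` are hypothetical names.

```python
def lut_contents(func, k):
    """Return the 2**k truth-table bits of func; input 0 is the LSB of the index."""
    table = []
    for index in range(2 ** k):
        inputs = [(index >> i) & 1 for i in range(k)]
        table.append(func(*inputs) & 1)
    return table

def majority(a, b, c):
    """Carry-out equation of a full adder: true when at least two inputs are 1."""
    return (a & b) | (a & c) | (b & c)

# Contents to load into one 3-input LUT so that it implements the carry logic.
carry_lut = lut_contents(majority, 3)
```

The resulting list is exactly what a 3-LUT would store; on devices with dedicated carry chains, a mapper would more likely send this function to that structure instead.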
Synthesis
•   Result:
      ▪ Netlist: a list of components and their interconnections.
•   Netlist formats:
      ▪ EDIF (Electronic Design Interchange Format).
      ▪ Vendor-specific formats.
        − Example: XNF (Xilinx Netlist Format)
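
A netlist in the sense above can be sketched as a plain data structure. This is not EDIF or XNF syntax; the `Component` class, the pin names, and the component kinds are all hypothetical and only show "components plus interconnections".

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    name: str
    kind: str                                   # illustrative label, e.g. "LUT2" or "TFF"
    pins: dict = field(default_factory=dict)    # pin name -> net name

def nets(components):
    """Group pins by net: net name -> list of (component name, pin name)."""
    table = {}
    for comp in components:
        for pin, net in comp.pins.items():
            table.setdefault(net, []).append((comp.name, pin))
    return table

# Two components joined by the net "n1".
design = [
    Component("u1", "LUT2", {"i0": "a", "i1": "b", "o": "n1"}),
    Component("ff1", "TFF", {"t": "n1", "q": "q0"}),
]
connections = nets(design)
```

`connections["n1"]` lists both endpoints of the wire between the LUT output and the flip-flop input, which is the information place and route needs.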




Physical Design: Place and Route
•   Place:
      ▪ Assign locations to the components
      ▪ In hierarchical architectures:
        − A separate clustering step may be needed to group BLEs into
          logic blocks
        − Clustering is done prior to placement or during placement

•   Route:
      ▪ Assign communication paths (routing resources) to the
        interconnections.
•   Both are optimization problems: some cost must be minimized

•   Important factors:
      ▪ Clock frequency
      ▪ Power consumption
      ▪ Routing congestion
      ▪ ...



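One cost that placers commonly minimize is estimated wirelength, which serves as a rough proxy for the factors listed above. The sketch below uses half-perimeter wirelength (HPWL); that choice is an assumption for illustration, since the slides do not name a specific cost function. `hpwl` and the example data are hypothetical.

```python
def hpwl(placement, nets):
    """Sum, over all nets, of the half-perimeter of the net's bounding box.

    placement: component name -> (x, y) location
    nets:      list of nets, each a list of component names
    """
    total = 0
    for pins in nets:
        xs = [placement[name][0] for name in pins]
        ys = [placement[name][1] for name in pins]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

example_nets = [["a", "b"], ["b", "c"]]
compact = {"a": (0, 0), "b": (1, 0), "c": (1, 1)}
spread = {"a": (0, 0), "b": (1, 1), "c": (0, 1)}
```

Comparing `hpwl(compact, example_nets)` with `hpwl(spread, example_nets)` shows how a placer would rank two candidate placements of the same netlist.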
FPGA Placement & Routing




Field Programmable Gate Array (FPGA)




Configuration Bitstream
•   The bitstream contains:
      ▪ LUT contents,
      ▪ Multiplexer control lines,
      ▪ Interconnections,
      ▪ ...




Design Flow

•   Debug: the design cycle resembles the programming cycle.
      ▪ Programming: write program → compile → run, with an edit step
        looping back
      ▪ Hardware design: design entry → compile → simulation →
        synthesis → simulation, with edit steps looping back
FPGA Design Flow – Example
•   Design:
      ▪ Modulo-10 counter
•   Target device:
      ▪ FPGA with 2x2 logic blocks (LBs)
      ▪ Each LB contains:
        − Two 2-input LUTs
        − Two edge-triggered T flip-flops
•   Objectives:
      ▪ Area
      ▪ Latency




FPGA Design Flow – Example
•   Truth table:
      ▪ State transitions
      ▪ TFF inputs
•   Synthesis and optimization:
      ▪ Karnaugh maps




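The truth-table step for the modulo-10 counter can be sketched as follows: a T flip-flop toggles when its input is 1, so each T input is the XOR of the current and next value of that state bit. The minimized equations in `t_inputs_minimized` are one possible Karnaugh-map result derived here with states 10 to 15 as don't-cares; they are not taken from the original slides. All names are hypothetical.

```python
WIDTH = 4        # four flip-flops are enough for states 0..9
MODULUS = 10

def bits(value):
    """Split a state into [Q0, Q1, Q2, Q3], with Q0 the least significant bit."""
    return [(value >> i) & 1 for i in range(WIDTH)]

def t_inputs_from_table(state):
    """T_i = Q_i XOR Q_i(next): toggle exactly the bits that change."""
    nxt = (state + 1) % MODULUS
    return bits(state ^ nxt)

def t_inputs_minimized(state):
    """One possible minimized form (states 10..15 treated as don't-cares)."""
    q0, q1, q2, q3 = bits(state)
    t0 = 1
    t1 = q0 & (1 ^ q3)
    t2 = q0 & q1
    t3 = (q0 & q1 & q2) | (q0 & q3)
    return [t0, t1, t2, t3]

def clock(state):
    """Apply one clock edge: each T flip-flop toggles when its T input is 1."""
    toggles = t_inputs_minimized(state)
    return state ^ sum(bit << i for i, bit in enumerate(toggles))

def run(cycles, state=0):
    """Return the sequence of states visited over the given number of cycles."""
    seen = []
    for _ in range(cycles):
        seen.append(state)
        state = clock(state)
    return seen
```

Note that `t3` needs more than two inputs, so on the target device with 2-input LUTs it would have to be split across several LUTs during technology mapping.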
FPGA Design Flow – Example




FPGA Design Flow – Example




References
▪ [Bobda07] C. Bobda, “Introduction to Reconfigurable Computing:
  Architectures, Algorithms and Applications,” Springer, 2007.




