High Performance Processors and Systems. PdM – UIC joint master 2007. Instructor: Prof. Donatella Sciuto. HPPS @ PdM – March 2007
Outline. DReAMS: Alessandro Panella, Matteo Murgida. CITiES: Alessio Montone, Alessandro Meroni, Simone Corbetta. Operating System: Ivan Beretta. Design Flow: Antonio Piazzi. Polaris: Massimo Morandi, Marco Novati. HLR: Marco Maggioni.
Dynamic Reconfigurability Applied to Multi-FPGA Systems (DReAMS)
DReAMS: Dynamic Reconfigurability Applied to Multi-FPGA Systems. Branch of the DRESD project; inherits its architectures and tools. Automatic workflow from VHDL system description to FPGA implementation: VHDL parsing and system simulation; system creation over a specific architecture; bitstream creation and download onto FPGAs.
Multi-FPGA Partitioning Alessandro Panella [email_address]
Project Organization. First Phase (15 Mar – 15 Apr) [DONE]. Goals: state-of-the-art analysis; proposed approach (basic idea). Second Phase (15 Apr – 15 May) [PARTIALLY DONE]. Goal: partitioning algorithm development and implementation. Third Phase (15 May – 15 June) [TODO]. Goals: experimental evaluation of the algorithm; physical evaluation using the DReAMS architecture.
Partitioning. Two kinds of multi-FPGA partitioning. Topology-aware: the architecture topology is an input; no optimization of the number of FPGAs; association between the (larger) system graph and the (smaller) architecture graph => PARTITIONING. Topology-free: the architecture topology is not provided; input: dimension and communication features of the FPGAs; minimization of the number of FPGAs; place and route after partitioning.
The algorithm (1). Addresses the topology-free problem. Structural approach: exploits the design hierarchy and tries to preserve module integrity; several advantages, less work to be done. Objectives: minimize the number of FPGAs; minimize inter-FPGA communication. Greedy set-covering algorithm.
The algorithm (2). Nodes can be COVERED, UNCOVERED, or PARTIALLY COVERED. Stop condition: TOP = COVERED. In the exploration of the tree, siblings take precedence over children => module integrity is preserved. Procedure cover(set of nodes): called recursively, starting from TOP.
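A minimal sketch of how a recursive cover procedure of this kind could look, written in Python for brevity (the slides plan a C++ implementation). It is not the authors' algorithm: the `Module` class, the single `capacity` figure per FPGA, the largest-first ordering and the first-fit choice of FPGA are all assumptions made for illustration, and inter-FPGA communication is not modelled. Only leaf modules carry area in this sketch. The COVERED / PARTIALLY COVERED states are implicit: a module placed whole is covered, a module whose children are handled separately is partially covered until the recursion returns.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """Hypothetical node of the design hierarchy (name and area are assumptions)."""
    name: str
    area: int = 0                                  # only leaves carry area here
    children: list = field(default_factory=list)

    def total_area(self):
        return self.area + sum(c.total_area() for c in self.children)


def cover(nodes, capacity, fpgas):
    """Greedily assign modules to FPGAs, keeping a module whole when it fits."""
    too_big = []
    # Siblings first: place every sibling that fits as a whole module.
    for node in sorted(nodes, key=lambda n: n.total_area(), reverse=True):
        size = node.total_area()
        if size > capacity:
            too_big.append(node)
            continue
        for fpga in fpgas:                         # first FPGA with enough room
            if fpga["used"] + size <= capacity:
                break
        else:                                      # none fits: open a new FPGA
            fpga = {"used": 0, "modules": []}
            fpgas.append(fpga)
        fpga["used"] += size
        fpga["modules"].append(node.name)
    # Children only afterwards: split the modules that cannot stay whole.
    for node in too_big:
        if not node.children:
            raise ValueError(f"leaf {node.name} exceeds the FPGA capacity")
        cover(node.children, capacity, fpgas)
    return fpgas


top = Module("TOP", children=[
    Module("A", children=[Module("A1", 40), Module("A2", 50)]),
    Module("B", 30),
    Module("C", 60),
])
result = cover([top], capacity=100, fpgas=[])
for i, fpga in enumerate(result):
    print(i, fpga["modules"], fpga["used"])
```

In this example TOP does not fit on one device, so its children are considered; A is kept whole on one FPGA while B and C share a second one.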
What’s next? Data structure development. Algorithm C++ implementation. First verification and “tuning”: obtain hierarchical trees from the synthesis tool (Synplify); verification. Physical evaluation: bound with the other branch of DReAMS.
Chimera Multi-FPGAs Architecture Definition Matteo Murgida [email_address]
Project Organization. 1st Phase goals: study of the Digilent Spartan-3 Starter Board; connection of the boards. 2nd Phase goals: communication between two Microblaze soft-processors; GPIO integration in the architecture. 3rd Phase goals: interrupt handling; design of a simple distributed application to verify the correctness of the proposed approach.
Second Phase: results (1/2). Communication between two Microblaze soft-processors. Development of a display controller to visualize the data flow.
Second Phase: results (2/2) Higher architecture portability through the use of the GPIO IP-Core.
What’s next... Interrupt handling, also through the use of the Interrupt Controller. Development of a simple application to verify the correctness of the proposed approach.
CITiES
Processing Elements REconfiguration In Reconfigurable Architectures. Alessio Montone [email_address]
Second Phase Goals. Create software that: takes as input a .bmm file (BRAM used) and an .elf file (code); outputs a memory configuration bitstream; is device parametric; is tailored for Xilinx Virtex II Pro family FPGAs.
Second Phase: results - I
Second Phase: results - II
Second Phase: results - III. Output binary file is a downloadable bitstream (on a Core 2 Duo @ 2.33 GHz).
Target FPGA | Processor | #BRAM blocks | #BRAM columns involved | marBram execution time (ms) | Commands overhead (approx. %) | Bitstream size (Kbytes)
VP7 | Microblaze | 4 | 2 | 179 | 1.5 | 56
VP7 | PPC-405 | 8 | 3 | 203 | 1.5 | 84
VP7 | Microblaze | 8 | 5 | 263 | 1.5 | 136
VP20 | PPC-405 | 8 | 3 | 248 | 1.5 | 112
VP20 | Microblaze | 8 | 5 | 326 | 1.5 | 160
VP20 | Microblaze | 16 | 5 | 326 | 1.5 | 160
What’s next… Third phase in detail. Perform functional tests on a single output bitstream: debug both the bitstream structure and the software structure. Test a complete processing element: configure it independently from the rest of the architecture, swapping its memory content.
Reconfiguration Oriented Metrics. Alessandro Meroni [email_address]
Second Phase Objectives. Real-world applications analysis: applications analysis; common scenarios identification; characteristics evaluation. Metrics evaluation through graphics supported by a Prototype Analyzer (C/C++): Performance/Area; Master/Slave. Analysis of different network simulators: NS-2; OMNeT++; SSFNet; OPnet.
Application Analysis It’s possible to make a classification that binds together the majority of these applications:
Metrics Evaluation. We need to consider different metrics w.r.t. different scenarios: which FPGAs? how many elements? which configuration? For now, there is a qualitative estimation of some metrics’ trends, supported by a Prototype Analyzer: Throughput and Area w.r.t. the number of elements of the system (Master/Slave); no configuration information; no FPGA information; ...
Simulators Analysis. NS-2: good hardcoded modules; bad flexibility; models are “flat” (cannot create subnetworks); difficult separation of concepts (different parameters in the same Tcl script). OMNeT++: good not only for networks (MP systems and hw architectures); very flexible; support for hierarchical module structure; enforces the separation between model and experiments (all parameters in the omnet.ini file). SSFNet: no longer supported (last release on January 15, 2004). OPnet: not free.
Next Phase... Simulator exploitation: use of OMNeT++ to gain information w.r.t. throughput and other useful metrics. Graphics redefinition and expansion. Analyzer improvement.
REconfigurable Communication Infrastructure For Embedded-systems. Simone Corbetta [email_address]
April 2007/May 2007: objectives. Extend the survey: exploration of reconfigurable communication infrastructures. Analysis of the De Micheli Verilog description: XPIPES architecture analysis; XPIPES synthesis on Xilinx FPGAs (area requirements). Applications and scenarios of dynamic reconfigurability. Communication infrastructure model: first ideas; basis for next-step implementation.
April 2007/May 2007: work (1/3). XPIPES architecture: layered approach to decouple communication from computation; network switches and network interfaces. XPIPES methodology: XpipesCompiler is used to automatically generate a synthesizable Verilog-based architecture. Table 1: Area requirements of a single-master/single-slave Network-on-Chip
April 2007/May 2007: work (2/3). Scenarios and applications. RATIONALE: a concrete term of comparison is needed to evaluate the performance of our solution w.r.t. third-party ones. NO existing standard benchmark! Different applications and market segments: automotive; aerospace & defense; industrial; scientific & medical.
April 2007/May 2007: work (3/3). Communication infrastructure model (first ideas). Layered approach: flexibility and independent optimization; decoupling communication from computation; switching and interfacing elements are crucial. Physical and logical addressing methods: useful for task relocation. Adaptive architecture: achieving fault tolerance. Integrable with legacy systems: a bridge is required. Plugging IP-Cores in and out.
May 2007/June 2007: objectives. XPIPES: possible improvements in the context of dynamic reconfiguration. Implementation (Verilog): basic essential elements for the communication infrastructure (reconfigurable switch). Testing.
Operating System support for Reconfigurable SoC
Development of an OS architecture-independent layer for dynamic reconfiguration Ivan Beretta [email_address]
Project Overview. Study of current operating system support for dynamically reconfigurable architectures: two solutions inside the DRESD group. Definition of an intermediate layer for dynamic reconfiguration support: architecture independent; distribution independent.
Second Phase: Goals. Implementation of the DRESD operating system solution: old kernel recovery; hardware architecture replication using ISE and EDK version 9.1, on Xilinx Virtex II Pro VP7. Layer definition: comparison between existing solutions; basic definition of the boundaries of the new intermediate layer.
Second Phase: Results (1 of 2). Recovery of the DRESD solution for Caronte: static hardware architecture; bootmanager recovery; bootstrap from flash memory; base kernel. Hardware architecture upgrade: new synthesis tools (Xilinx ISE and EDK 9) and new cores. Kernel compilation: recovery of dynamic-reconfiguration support.
Second Phase: Results (2 of 2). Basic definition of the architecture-independent layer. Factorization of existing solutions: interface to the reconfiguration controller driver; address space manager module; driver loader module; core caching and placement module. Introduction of new elements: reconfiguration scheduler.
What’s next… Third phase. Complete definition of the boundaries of the new intermediate layer. Full implementation of the existing DRESD solutions: module-based reconfigurable architecture; Virtex II Pro VP7; synthesis flow based on Xilinx ISE and EDK 8.2 and 9.1. Porting of the YaRA solution to Virtex II Pro VP7.
Design Flow. Antonio Piazzi [email_address]
Project Organization. 1st phase (15 March – 15 April): budgeting; study of the state of the art. 2nd phase (15 April – 15 May): realization phase; construction of a complete tool based on previously separate tools; implementation of an innovative workflow. 3rd phase (15 May – 15 June): project validation; validation on a real architecture and performance evaluation.
Second Phase: results. Output files: system.vhd; inserted device wrapper, ngc project files. System.vhd decomposition (ArchGen based): output files fix.vhd and top.vhd. Communication infrastructure generation (ComIC based): output files <file name>.nmc and <file name>.xdl. Collection of information about the communication infrastructure from the xdl file: output file port.cfg. Adding information to top.vhd. Start of the related flow tool. Generation of the UCF file.
Second Phase: results. Pre-existing base tools: ArchGen; ComIC; YaRA script; InCA script. Generated tool: editing of the ArchGen output file (top.vhd); parsing of the xdl file to collect information on bus macros; translation of the YaRA script into a sequence of C++ instructions to be included in the Earendil tool chain.
Second Phase: results. State of progress. [Figure: the flow phases (Planning, VHDL gen., UCF and Com. Inf. gen., Bitstream gen., Merging phase) shown twice, with each phase marked as a manual or an automated process]
What’s next… Automated switching: the tool must be able to recognize, from the device type, the type of communication infrastructure to create and the appropriate design flow. Upgrade of the communication infrastructure with a deep integration of the ComIC tool in the project: ComIC may be considered an extension of ArchGen; this leads to a different approach that frees us from parsing the top file and the xdl file, which define the bus. Patch for ComIC to create a Wishbone-compatible bus: the idea is to create a complete bus which presents all the signals defined by the Wishbone protocol.
Polaris
Polaris. Create an integrated HW/SW system to manage 2D reconfiguration. SW side: maintain information on the FPGA status; decide how to efficiently allocate tasks. HW side: provide support for effective task allocation; perform 2D bitstream relocation.
Effects of 2D Reconfiguration in a Reconfigurable System Massimo Morandi [email_address]
2nd Phase Goals. Definition of a 2D reconfiguration allocation manager: evaluation of the desired features; definition of its structure. State of the art analysis: investigation of literature solutions; comparison of their costs, effectiveness, versatility… to propose a novel one representing a good compromise.
Allocation manager. Desired features: low TRR; low management overhead; high routing efficiency; low fragmentation. Structure: empty space manager (complete space or heuristic selection); fitter (general: FF, BL, BF, WF…; focused: FA, RA…).
Most relevant works. Maintain complete information on empty space: KAMER (Keep All Maximally Empty Rectangles), which applies a general fitting strategy; CUR (maintains the Contour of a Union of Rectangles), which applies a focused fitting strategy. Heuristically prune part of the information: KNER (Keep Non-overlapping Empty Rectangles), which applies a general fitting strategy; 2D-HASHING (keeps non-overlapping empty rectangles in an optimized data structure), which applies (exclusively) a general fitting strategy.
Evaluation. High placement quality => high complexity. Lowest complexity => no focused fitting (which is bad especially for routing).
Next Phase. The chosen approach is heuristic (KNER-like) but with a fitting strategy focused on minimizing routing costs. To be done: clearly define the interface for the allocation manager; design a KNER-like empty space manager; integrate a routing-aware fitting strategy (with Manhattan distance metric).
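The idea of an empty space manager that keeps non-overlapping empty rectangles, combined with a routing-aware fitter, can be sketched as follows. This is only an illustration under stated assumptions, not the allocation manager described in the slides and not the published KNER algorithm: the `(x, y, width, height)` rectangle tuples, the `target` point standing for the location a task communicates with, the bottom-left placement inside the chosen rectangle and the two-way split of the leftover space are all hypothetical choices. Merging of freed rectangles when a task is removed is omitted.

```python
def manhattan(p, q):
    """Manhattan distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def place(empty, w, h, target):
    """Place a w x h task in the list of empty rectangles.

    Among the rectangles that can host the task, pick the one where the task
    centre would be closest (Manhattan distance) to `target`.
    Returns the (x, y) origin of the task, or None if nothing fits.
    """
    best = None
    for rect in empty:
        x, y, rw, rh = rect
        if w <= rw and h <= rh:
            cost = manhattan((x + w / 2, y + h / 2), target)
            if best is None or cost < best[0]:
                best = (cost, rect)
    if best is None:
        return None
    x, y, rw, rh = best[1]
    empty.remove(best[1])
    # Keep the leftover space as two rectangles that do not overlap.
    if rw > w:
        empty.append((x + w, y, rw - w, h))    # strip to the right of the task
    if rh > h:
        empty.append((x, y + h, rw, rh - h))   # full-width strip above the task
    return (x, y)


empty = [(0, 0, 10, 10)]                        # whole device is free
first = place(empty, 4, 3, target=(0, 0))
second = place(empty, 2, 2, target=(9, 9))
third = place(empty, 20, 20, target=(0, 0))     # larger than the device
print(first, second, third)
```

Choosing which leftover strip keeps the full width is itself a heuristic; a different split can change later fragmentation, which is one of the trade-offs listed among the desired features.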
Relocation for 2D Reconfigurable Systems Marco Novati [email_address]
Goals of 2nd phase. Implementation of BiRF². Define the functionality: create the new bitstream parser; determine formulae for FAR calculation and CRC calculation. Design the structure. BiRF² HW implementation.
New Parser
CRC Calculation. Particular CRC value, used by Xilinx tools. Two versions of BiRF Square: one using the “predefined” value; one with actual CRC calculation. An optimized algorithm has been used.
Synthesis results. On a Virtex-4 with speed grade -12. General purpose version: max frequency of 160 MHz. Specific version: max frequency of 290 MHz.
What’s next… Simulation of BiRF Square. Interfacing with the OPB bus. Creation of a toy architecture for the validation. Actual validation on the new Virtex-4.
High Level Reconfiguration. Marco Maggioni marco.maggioni@dresd.org
Project Organization. First Phase: time window 1st month; goal: Clustering. Second Phase: time window 2nd month; goal: Coloring. Third Phase: time window 3rd month; goal: Scheduling. [Figure: flow diagram with labels Clustered Graph, Metric, Circuit Representation, Reconfigurable Clustered Graph, Area, Latency, Rec. Time, Power, Isomorphic, Target Architecture Database, Gcc Frontend, Partitioning Algorithm, PandA, Scheduling Algorithm]
Second Phase: Coloring. Theoretical work: from clusters to a reconfigurable graph; definition of the interfaces for the Coloring phase; study of a metric for cluster execution time. Implementation of the Coloring phase: coloring based on the delay of nodes; applied to the results of isomorphic clustering. GraphGen on Earendil: produces a graph from the specification automatically; integrated with PandA.
Second Phase: Coloring. Adds useful information for the next steps: execution time is mandatory for scheduling; Area/Power/Rec. Time can optimize the final result. Based on a target architecture. Interchangeable metrics. [Figure: Clustered Graph annotated with Latency, Area, Rec. Time, Power; legend: Needed, Useful]
Second Phase: GraphGen. Basically a tool for graph generation (DFG, SDG, CDF, BB)... Writes .dot files... Some benchmarks: AES, Whetstone.
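To make the ".dot files" output concrete, here is a small sketch of emitting a data flow graph in the Graphviz DOT language. It does not reproduce GraphGen: the `to_dot` helper, the node names and the example expression `d = (a + b) * c` are assumptions chosen only to show what such a file could contain.

```python
def to_dot(name, edges):
    """Return the DOT text of a directed graph given as (source, destination) pairs."""
    lines = [f"digraph {name} {{"]
    for src, dst in edges:
        lines.append(f'    "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines) + "\n"


# Hypothetical data flow graph for d = (a + b) * c
dfg_edges = [("a", "add"), ("b", "add"), ("add", "mul"), ("c", "mul"), ("mul", "d")]
dot_text = to_dot("dfg", dfg_edges)
print(dot_text)
```

The resulting text can be saved with a .dot extension and rendered with the Graphviz `dot` tool, for example `dot -Tpng dfg.dot -o dfg.png`.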
What’s next… Third phase in detail. Apply reconfigurable scheduling: adapts the specification to the reconfigurable architecture; uses the information obtained from coloring; different algorithms are possible. Define a schedule result structure. Implement the Salomone algorithm. Publish the entire work on Earendil.
Questions

Rev2 HPPS Project 2007
