Rev2 HPPS Project 2007

High Performance Processors and Systems. PdM – UIC joint master 2007. Instructor: Prof. Donatella Sciuto. HPPS @ PdM – March 2007

Outline
DReAMS: Alessandro Panella, Matteo Murgida
CITiES: Alessio Montone, Alessandro Meroni, Simone Corbetta
Operating System: Ivan Beretta
Design Flow: Antonio Piazzi
Polaris: Massimo Morandi, Marco Novati
HLR: Marco Maggioni

Dynamic Reconfigurability Applied to Multi-FPGA Systems (DReAMS)

DReAMS (Dynamic Reconfigurability Applied to Multi-FPGA Systems) is a branch of the DRESD project and inherits its architectures and tools. It provides an automatic workflow from a VHDL system description to the FPGA implementation: VHDL parsing and system simulation; system creation over a specific architecture; bitstream creation and download onto the FPGAs.

Multi-FPGA Partitioning Alessandro Panella [email_address]

Project Organization. First Phase (15 Mar – 15 Apr) [DONE]. Goals: state-of-the-art analysis; proposed approach (basic idea). Second Phase (15 Apr – 15 May) [PARTIALLY DONE]. Goal: development and implementation of the partitioning algorithm. Third Phase (15 May – 15 June) [TODO]. Goals: experimental evaluation of the algorithm; physical evaluation using the DReAMS architecture.

Partitioning. There are two kinds of multi-FPGA partitioning. Topology-aware: the architecture topology is an input, and the number of FPGAs is not optimized; partitioning means associating the (larger) system graph with the (smaller) architecture graph. Topology-free: the architecture topology is not provided; the inputs are the size and communication features of the FPGAs; the number of FPGAs is minimized, and place and route is performed after partitioning.

The algorithm (1). It addresses the topology-free problem with a structural approach. It exploits the design hierarchy and tries to keep modules intact, which has several advantages and leaves less work to be done. Objectives: minimize the number of FPGAs; minimize inter-FPGA communication. It is a greedy set-covering algorithm.

The algorithm (2). Each node can be COVERED, UNCOVERED or PARTIALLY COVERED. Stop condition: TOP = COVERED. While exploring the tree, siblings take precedence over children, so that module integrity is preserved. The procedure cover(set of nodes) is called recursively, starting from TOP.
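The recursive cover() procedure can be sketched as follows. This is a minimal illustration and not the project's implementation. It assumes that every node stores the total area of its subtree, that each FPGA has a single area budget, and that whole modules are packed first-fit. The names Node, Partitioner and placeWhole are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Coverage states named on the slide.
enum class State { Uncovered, PartiallyCovered, Covered };

struct Node {
    std::string name;
    int area;                      // assumed: total resource cost of the whole subtree
    std::vector<Node*> children;   // sub-modules in the design hierarchy
    State state = State::Uncovered;
    int fpga = -1;                 // FPGA index when the module is placed as a unit
};

struct Partitioner {
    int capacity;                  // hypothetical per-FPGA resource budget
    std::vector<int> used;         // resources already used on each opened FPGA

    // First-fit placement of a whole module; opens a new FPGA when none has room.
    bool placeWhole(Node* n) {
        if (n->area > capacity) return false;
        for (std::size_t i = 0; i < used.size(); ++i) {
            if (used[i] + n->area <= capacity) {
                used[i] += n->area;
                n->fpga = static_cast<int>(i);
                n->state = State::Covered;
                return true;
            }
        }
        used.push_back(n->area);
        n->fpga = static_cast<int>(used.size()) - 1;
        n->state = State::Covered;
        return true;
    }

    // cover(set of nodes): all siblings are first tried as whole modules
    // (precedence to siblings, keeping module integrity); only modules too
    // large for one FPGA are split by recursing on their children.
    void cover(const std::vector<Node*>& nodes) {
        for (Node* n : nodes)
            if (n->state == State::Uncovered) placeWhole(n);
        for (Node* n : nodes) {
            if (n->state == State::Covered) continue;
            assert(!n->children.empty() && "a leaf larger than one FPGA cannot be placed");
            n->state = State::PartiallyCovered;
            cover(n->children);
            n->state = State::Covered;   // all children are covered now
        }
    }

    // Runs until TOP is COVERED; returns the number of FPGAs used.
    std::size_t partition(Node* top) {
        cover({top});
        return used.size();
    }
};
```

First-fit is only one greedy choice. A real set-covering step would also weigh inter-FPGA communication when choosing where a module goes.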

What's next? Development of the data structures; C++ implementation of the algorithm; first verification and tuning; extraction of hierarchical trees from the synthesis tool (Synplify); verification; physical evaluation; integration with the other branch of DReAMS.

Chimera: Multi-FPGA Architecture Definition. Matteo Murgida [email_address]

Project Organization. 1st Phase goals: study of the Digilent Spartan-3 Starter Board; connection of the boards. 2nd Phase goals: communication between two MicroBlaze soft-processors; integration of GPIO in the architecture. 3rd Phase goals: interrupt handling; design of a simple distributed application to verify the correctness of the proposed approach.

Second Phase: results (1/2). Communication between two MicroBlaze soft-processors; development of a display controller to visualize the data flow.

Second Phase: results (2/2) Higher architecture portability through the use of the GPIO IP-Core.

What's next... Interrupt handling, also through the use of the Interrupt Controller. Development of a simple application to verify the correctness of the proposed approach.

Processing Elements Reconfiguration in Reconfigurable Architectures. Alessio Montone [email_address]

Second Phase Goals. Create a software tool that takes as input the .bmm (BRAMs used) and .elf (code) files and outputs a memory configuration bitstream. The tool is device-parametric and tailored to the Xilinx Virtex-II Pro family of FPGAs.
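A .bmm file describes how the processor's memory is spread over BRAMs, typically with each BRAM holding one bit lane of every word. The sketch below shows only that mapping step. It assumes one bus block, 32-bit words and 8-bit lanes. BramLane, wordIndex and splitWord are hypothetical names, not marBram's code, and the code does not build configuration frames.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of one BRAM in a .bmm bus block: it stores
// bits [msb:lsb] of every 32-bit word in the block's address range.
struct BramLane {
    std::string instance;
    int msb;
    int lsb;
};

// Position of a word inside each BRAM of the bus block starting at 'base'.
std::uint32_t wordIndex(std::uint32_t addr, std::uint32_t base) {
    assert(addr >= base && (addr - base) % 4 == 0);
    return (addr - base) / 4;
}

// Splits one 32-bit word (as read from the .elf) into the slice each BRAM stores.
std::vector<std::uint32_t> splitWord(std::uint32_t word, const std::vector<BramLane>& lanes) {
    std::vector<std::uint32_t> out;
    for (const BramLane& l : lanes) {
        int width = l.msb - l.lsb + 1;
        std::uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
        out.push_back((word >> l.lsb) & mask);
    }
    return out;
}
```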

Second Phase: results - III. The output binary file is a downloadable bitstream (marBram execution times measured on a Core 2 Duo @ 2.33 GHz).

Target FPGA | Processor | BRAM blocks | BRAM columns involved | marBram execution time (ms) | Commands overhead (approx. %) | Bitstream size (KB)
VP7 | MicroBlaze | 4 | 2 | 179 | 1.5 | 56
VP7 | PPC-405 | 8 | 3 | 203 | 1.5 | 84
VP7 | MicroBlaze | 8 | 5 | 263 | 1.5 | 136
VP20 | PPC-405 | 8 | 3 | 248 | 1.5 | 112
VP20 | MicroBlaze | 8 | 5 | 326 | 1.5 | 160
VP20 | MicroBlaze | 16 | 5 | 326 | 1.5 | 160

What's next... Third phase in detail: perform functional tests on a single output bitstream; debug both the bitstream structure and the software structure; test a complete processing element, configuring it independently from the rest of the architecture by swapping its memory content.

Reconfiguration Oriented Metrics. Alessandro Meroni [email_address]

Second Phase Objectives. Analysis of real-world applications: application analysis, identification of common scenarios, evaluation of their characteristics. Metrics evaluation through graphics supported by a prototype analyzer (C/C++): performance/area, master/slave. Analysis of different network simulators: NS2, OMNeT++, SSFNet, OPNET.

Application Analysis. The majority of these applications can be grouped into a single classification:

Metrics Evaluation. Different metrics must be considered for different scenarios: which FPGAs? How many elements? Which configuration? For now, there is a qualitative estimate of the trends of some metrics, supported by a prototype analyzer: throughput and area w.r.t. the number of elements of the system (master/slave), without configuration information and without FPGA information...

Simulators Analysis. NS-2: good: hardcoded modules; bad: limited flexibility, "flat" models (subnetworks cannot be created), poor separation of concepts (different parameters in the same Tcl script). OMNeT++: good: not only for networks (also MP systems and hardware architectures); very flexible; supports a hierarchical module structure; enforces the separation between model and experiments (all parameters in the omnet.ini file). SSFNet: no longer maintained (last release on January 15, 2004). OPNET: not free.

Next Phase... Simulator exploitation: use OMNeT++ to gather information on throughput and other useful metrics. Redefinition and expansion of the graphics. Improvement of the analyzer.

Reconfigurable Communication Infrastructure for Embedded Systems. Simone Corbetta [email_address]

April 2007/May 2007: objectives. Extend the survey. Explore reconfigurable communication infrastructures: analysis of De Micheli's Verilog description; analysis of the XPIPES architecture; synthesis of XPIPES on Xilinx FPGAs and its area requirements. Applications and scenarios of dynamic reconfigurability. Communication infrastructure model: first ideas, as a basis for the next-step implementation.

April 2007/May 2007: work (1/3). XPIPES architecture: layered approach that decouples communication from computation; network switches and network interfaces. XPIPES methodology: XpipesCompiler is used to automatically generate a synthesizable Verilog-based architecture. Table 1: Area requirements of a single-master/single-slave Network-on-Chip

April 2007/May 2007: work (2/3). Scenarios and applications. Rationale: we need a concrete basis for comparing the performance of our solution with that of third-party ones, but there is NO existing standard benchmark. Different applications and market segments: automotive; aerospace & defense; industrial; scientific & medical.

April 2007/May 2007: work (3/3). Communication infrastructure model (first ideas). Layered approach: flexibility and independent optimization; decoupling of communication from computation. Switching and interfacing elements are crucial. Physical and logical addressing methods, useful for task relocation. Adaptive architecture, to achieve fault tolerance. Integration with legacy systems (a bridge is required). Plugging IP-Cores in and out.
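Separating logical and physical addresses is what makes task relocation cheap. A relocated task keeps its logical address, and only the translation entry changes. The following is a minimal sketch of such a translation table. It assumes integer logical IDs and physical switch-port numbers, and AddressTable is a hypothetical name, not part of XPIPES.

```cpp
#include <cassert>
#include <map>
#include <optional>

// Hypothetical translation table: tasks address each other logically, and the
// infrastructure forwards traffic to the physical switch port hosting the task.
class AddressTable {
public:
    void bind(int logical, int physical) { table_[logical] = physical; }

    // Relocating a task rewrites only its entry; senders keep the same logical address.
    void relocate(int logical, int newPhysical) {
        assert(table_.count(logical) == 1);
        table_[logical] = newPhysical;
    }

    // Unplugging an IP-Core removes its entry.
    void unplug(int logical) { table_.erase(logical); }

    std::optional<int> resolve(int logical) const {
        auto it = table_.find(logical);
        if (it == table_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::map<int, int> table_;
};
```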

May 2007/June 2007: objectives. XPIPES: possible improvements in the context of dynamic reconfiguration. Implementation (Verilog) of the basic essential elements of the communication infrastructure (reconfigurable switch). Testing.

Operating System support for Reconfigurable SoC

Development of an OS architecture-independent layer for dynamic reconfiguration Ivan Beretta [email_address]

Project Overview. Study of current operating system support for dynamically reconfigurable architectures; two solutions exist inside the DRESD group. Definition of an intermediate layer for dynamic reconfiguration support that is both architecture-independent and distribution-independent.

Second Phase: Goals. Implementation of the DRESD operating system solution: recovery of the old kernel; replication of the hardware architecture using ISE and EDK version 9.1 on a Xilinx Virtex-II Pro VP7. Layer definition: comparison between existing solutions; basic definition of the boundaries of the new intermediate layer.

Second Phase: Results (1 of 2). Recovery of the DRESD solution for Caronte: static hardware architecture; boot manager recovery (bootstrap from flash memory); base kernel. Hardware architecture upgrade: new synthesis tools (Xilinx ISE and EDK 9) and new cores; kernel compilation; recovery of dynamic reconfiguration support.

Second Phase: Results (2 of 2). Basic definition of the architecture-independent layer. Factorization of existing solutions: interface to the reconfiguration controller driver; address space manager module; driver loader module; core caching and placement module. Introduction of new elements: reconfiguration scheduler.
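The core caching and placement module must decide whether a requested core is already configured or whether a slot has to be reconfigured. The sketch below is one possible policy, written under stated assumptions: the device has a fixed number of reconfigurable slots, each slot holds one core, and the least recently used slot is evicted. The LRU policy and the CoreCache name are assumptions, not the DRESD design.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical core cache: a request for a core already loaded needs no
// reconfiguration; otherwise the least recently used slot is reconfigured.
class CoreCache {
public:
    explicit CoreCache(std::size_t slots) : slots_(slots) { assert(slots > 0); }

    struct Result { std::size_t slot; bool reconfigure; };

    Result request(const std::string& core) {
        ++clock_;
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            if (slots_[i].core == core) {      // hit: reuse the loaded core
                slots_[i].lastUse = clock_;
                return {i, false};
            }
        }
        std::size_t victim = 0;                  // miss: pick the LRU slot (empty slots have lastUse 0)
        for (std::size_t i = 1; i < slots_.size(); ++i)
            if (slots_[i].lastUse < slots_[victim].lastUse) victim = i;
        slots_[victim] = {core, clock_};
        return {victim, true};                   // the caller would now invoke the reconfiguration driver
    }

private:
    struct Slot { std::string core; unsigned long lastUse = 0; };
    std::vector<Slot> slots_;
    unsigned long clock_ = 0;
};
```

A reconfiguration scheduler could sit on top of this and reorder pending requests to increase the hit rate.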

What's next... Third phase: complete definition of the boundaries of the new intermediate layer. Full implementation of the existing DRESD solutions: module-based reconfigurable architecture on a Virtex-II Pro VP7; synthesis flow based on Xilinx ISE and EDK 8.2 and 9.1; porting of the YaRA solution to the Virtex-II Pro VP7.

Design Flow. Antonio Piazzi [email_address]

Project Organization. 1st phase (15 March – 15 April): budgeting; study of the state of the art. 2nd phase (15 April – 15 May): realization; construction of the complete tool from previously separate tools; implementation of an innovative workflow. 3rd phase (15 May – 15 June): project validation; validation on a real architecture and performance evaluation.

Second Phase: results. Output files: system.vhd, the inserted device wrapper, and the .ngc project files. Decomposition of system.vhd (ArchGen-based); output files: fix.vhd and top.vhd. Communication infrastructure generation (ComIC-based); output files: <file name>.nmc and <file name>.xdl. Collection of information about the communication infrastructure from the .xdl file; output file: port.cfg. Addition of this information to top.vhd. Start of the related flow tool. Generation of the UCF file.

Second Phase: results. Previously existing base tools: ArchGen, ComIC, YaRA script, InCA script. Generated tool: editing of the ArchGen output file (top.vhd); parsing of the .xdl file to collect information on bus macros; translation of the YaRA script into a sequence of C++ instructions to be included in the Earendil tool chain.

Second Phase: results (state of the progress). Flow steps compared as a manual process versus an automated process: planning, VHDL generation, UCF and communication infrastructure generation, bitstream generation, merging phase.

What's next... Automated switching: from the device type, the tool must be able to recognize which kind of communication infrastructure to create and which design flow is appropriate. Upgrade of the communication infrastructure through a deep integration of the ComIC tool in the project: ComIC may be considered an extension of ArchGen. This leads to a different approach that frees us from parsing the top file and the .xdl file that describe the bus. Patch for ComIC to create a Wishbone-compatible bus: the idea is to create a complete bus that exposes all the signals defined by the Wishbone protocol.
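The target of the ComIC patch is a bus exposing the Wishbone signal set. The sketch below is a cycle-level C++ model of a single Wishbone classic-cycle transfer. It shows the CYC/STB request and the ACK/ERR termination handshake. It assumes 32-bit address and data and a toy slave that answers in one clock. The signal names follow the Wishbone specification, while WishboneBus, MemorySlave and transfer are hypothetical names, and the code is not HDL output from ComIC.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Classic-cycle Wishbone interface signals (widths assumed: 32-bit ADR/DAT, 4 byte selects).
struct WishboneBus {
    bool cyc = false;            // CYC: bus cycle in progress
    bool stb = false;            // STB: valid transfer requested
    bool we = false;             // WE: write enable
    std::uint32_t adr = 0;       // ADR: address
    std::uint32_t datMosi = 0;   // master DAT_O -> slave DAT_I
    std::uint32_t datMiso = 0;   // slave DAT_O -> master DAT_I
    std::uint8_t sel = 0xF;      // SEL: byte lane selects (ignored by the toy slave)
    bool ack = false;            // ACK: normal termination
    bool err = false;            // ERR: abnormal termination
    bool rty = false;            // RTY: retry request (unused here)
};

// Toy word-addressed memory slave, evaluated once per clock edge.
struct MemorySlave {
    std::vector<std::uint32_t> mem = std::vector<std::uint32_t>(256, 0);
    void clock(WishboneBus& bus) {
        bus.ack = bus.err = false;
        if (!(bus.cyc && bus.stb)) return;
        std::uint32_t idx = bus.adr >> 2;
        if (idx >= mem.size()) { bus.err = true; return; }
        if (bus.we) mem[idx] = bus.datMosi;
        else bus.datMiso = mem[idx];
        bus.ack = true;
    }
};

// Master side of one single read or write: raise CYC/STB, sample ACK or ERR, release the bus.
bool transfer(WishboneBus& bus, MemorySlave& slave, bool write, std::uint32_t adr, std::uint32_t& data) {
    bus.cyc = bus.stb = true;
    bus.we = write;
    bus.adr = adr;
    if (write) bus.datMosi = data;
    slave.clock(bus);            // this slave answers within one clock
    bool ok = bus.ack;
    if (ok && !write) data = bus.datMiso;
    bus.cyc = bus.stb = false;
    return ok;
}
```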

Polaris. Create an integrated HW/SW system to manage 2D reconfiguration. SW side: maintain information on the FPGA status; decide how to allocate tasks efficiently. HW side: provide support for effective task allocation; perform 2D bitstream relocation.

Effects of 2D Reconfiguration in a Reconfigurable System Massimo Morandi [email_address]

2nd Phase Goals. Definition of a 2D reconfiguration allocation manager: evaluation of the desired features; definition of its structure. State-of-the-art analysis: investigation of solutions in the literature; comparison of their costs, effectiveness, versatility... in order to propose a novel solution that represents a good compromise.

Allocation manager. Desired features: low TRR; low management overhead; high routing efficiency; low fragmentation. Structure: an empty space manager (complete space or heuristic selection) and a fitter, either general (FF, BL, BF, WF..., i.e. First Fit, Bottom-Left, Best Fit, Worst Fit) or focused (FA, RA...).

Most relevant works. Maintaining complete information on the empty space: KAMER (Keep All Maximally Empty Rectangles) applies a general fitting strategy; CUR (maintain the Contour of a Union of Rectangles) applies a focused fitting strategy. Heuristically pruning part of the information: KNER (Keep Non-overlapping Empty Rectangles) applies a general fitting strategy; 2D-HASHING keeps non-overlapping empty rectangles in an optimized data structure and applies (exclusively) a general fitting strategy.
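To make the KNER idea concrete, the sketch below implements a KNER-style empty-space manager with a general best-fit strategy. The free area is kept as a set of non-overlapping empty rectangles. A task goes at the bottom-left corner of the smallest rectangle that can hold it, and the leftover L-shaped area is split into two non-overlapping rectangles. Several assumptions apply: a CLB grid, a vertical cut every time (KNER variants choose the cut heuristically), and no handling of task removal or rectangle merging. KnerManager is a hypothetical name.

```cpp
#include <cassert>
#include <optional>
#include <vector>

struct Rect { int x, y, w, h; };   // position and size in CLB units (assumed grid)

class KnerManager {
public:
    KnerManager(int cols, int rows) {
        assert(cols > 0 && rows > 0);
        empty_.push_back({0, 0, cols, rows});
    }

    // Best fit: smallest empty rectangle that can host a w x h task.
    std::optional<Rect> place(int w, int h) {
        int best = -1;
        for (int i = 0; i < static_cast<int>(empty_.size()); ++i) {
            const Rect& r = empty_[i];
            if (r.w >= w && r.h >= h &&
                (best < 0 || r.w * r.h < empty_[best].w * empty_[best].h))
                best = i;
        }
        if (best < 0) return std::nullopt;
        Rect r = empty_[best];
        empty_.erase(empty_.begin() + best);
        // Vertical cut: the strip right of the task keeps the full height of r,
        // the strip above the task is only as wide as the task. They do not overlap.
        if (r.w > w) empty_.push_back({r.x + w, r.y, r.w - w, r.h});
        if (r.h > h) empty_.push_back({r.x, r.y + h, w, r.h - h});
        return Rect{r.x, r.y, w, h};
    }

    const std::vector<Rect>& emptyRects() const { return empty_; }

private:
    std::vector<Rect> empty_;
};
```

Because only non-overlapping rectangles are kept, some placements that KAMER would find are lost. This is the complexity-versus-quality trade-off noted in the evaluation.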

Evaluation. High placement quality implies high complexity; the lowest complexity implies no focused fitting (which is bad, especially for routing).

Next Phase. The chosen approach is heuristic (KNER-like) but uses a fitting strategy focused on minimizing routing costs. To be done: clearly define the interface of the allocation manager; design a KNER-like empty space manager; integrate a routing-aware fitting strategy (with a Manhattan distance metric).
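A routing-aware fitter based on the Manhattan distance could look like the sketch below. Among the empty rectangles that fit the task, it picks the one whose resulting task centre minimizes the traffic-weighted Manhattan distance to the communicating tasks. Several details are assumptions: the bottom-left placement inside each rectangle, the per-peer traffic weights, and the Peer structure. Centres are doubled so that everything stays in integers.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <vector>

struct Rect { int x, y, w, h; };          // CLB grid units (assumed)
struct Peer { int cx, cy, weight; };      // centre of a communicating task and its traffic weight (hypothetical)

int manhattan(int x1, int y1, int x2, int y2) { return std::abs(x1 - x2) + std::abs(y1 - y2); }

// Chooses, among the empty rectangles able to host a w x h task, the placement
// minimizing the weighted Manhattan distance to the peers.
std::optional<Rect> routingAwareFit(const std::vector<Rect>& freeRects, int w, int h,
                                    const std::vector<Peer>& peers) {
    assert(w > 0 && h > 0);
    std::optional<Rect> best;
    long bestCost = 0;
    for (const Rect& r : freeRects) {
        if (r.w < w || r.h < h) continue;
        int cx2 = 2 * r.x + w, cy2 = 2 * r.y + h;   // doubled centre of the placed task
        long cost = 0;
        for (const Peer& p : peers)
            cost += static_cast<long>(p.weight) * manhattan(cx2, cy2, 2 * p.cx, 2 * p.cy);
        if (!best || cost < bestCost) { best = Rect{r.x, r.y, w, h}; bestCost = cost; }
    }
    return best;
}
```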

Relocation for 2D Reconfigurable Systems Marco Novati [email_address]

Goals of 2nd phase. Implementation of BiRF²: define the functionality (create the new bitstream parser; determine the formulae for FAR calculation and CRC calculation); design the structure; hardware implementation of BiRF².
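The core of 2D relocation can be illustrated on the Frame Address Register (FAR). The address is unpacked into its fields, shifted by whole rows and columns, and repacked. The bit positions below are our reading of the Virtex-4 FAR layout and are an assumption to verify against the device configuration guide. The sketch also assumes that source and target lie in the same device half, so the top/bottom mirroring is ignored. This is not the BiRF² formula.

```cpp
#include <cassert>
#include <cstdint>

// FAR fields (layout assumed, see lead-in).
struct FarFields {
    std::uint32_t top;     // top/bottom half of the device
    std::uint32_t block;   // block type (CLB/IO, BRAM, ...)
    std::uint32_t row;     // row address
    std::uint32_t major;   // column (major) address
    std::uint32_t minor;   // frame within the column
};

constexpr unsigned kTopShift = 22, kBlockShift = 19, kRowShift = 14, kMajorShift = 6;
constexpr std::uint32_t kTopMask = 0x1, kBlockMask = 0x7, kRowMask = 0x1F,
                        kMajorMask = 0xFF, kMinorMask = 0x3F;

std::uint32_t packFar(const FarFields& f) {
    return ((f.top & kTopMask) << kTopShift) | ((f.block & kBlockMask) << kBlockShift) |
           ((f.row & kRowMask) << kRowShift) | ((f.major & kMajorMask) << kMajorShift) |
           (f.minor & kMinorMask);
}

FarFields unpackFar(std::uint32_t far) {
    return {(far >> kTopShift) & kTopMask, (far >> kBlockShift) & kBlockMask,
            (far >> kRowShift) & kRowMask, (far >> kMajorShift) & kMajorMask,
            far & kMinorMask};
}

// 2D relocation of one frame address: shift by whole rows and columns,
// keeping block type and minor (frame) address unchanged.
std::uint32_t relocateFar(std::uint32_t far, int dRow, int dMajor) {
    FarFields f = unpackFar(far);
    f.row = static_cast<std::uint32_t>(static_cast<int>(f.row) + dRow);
    f.major = static_cast<std::uint32_t>(static_cast<int>(f.major) + dMajor);
    assert(f.row <= kRowMask && f.major <= kMajorMask);   // target must stay on the device
    return packFar(f);
}
```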

CRC Calculation. Xilinx tools use a particular CRC value. There are two versions of BiRF Square: one uses the "predefined" value, the other performs the actual CRC calculation (an optimized algorithm has been used).
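The sketch below shows what "actual CRC calculation" involves, using a generic bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78) as the example. It is an assumption to check in the device configuration guide whether this polynomial and this input framing match a given Xilinx family's bitstream CRC, which may, for instance, also cover register addresses. The bitwise loop is the unoptimized reference form; a table-driven or parallel version computes the same value faster.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise, reflected CRC-32C update over a byte buffer.
std::uint32_t crc32c(const std::uint8_t* data, std::size_t len, std::uint32_t crc = 0xFFFFFFFFu) {
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc & 1u) ? (crc >> 1) ^ 0x82F63B78u : (crc >> 1);
    }
    return crc;
}

// Final XOR, applied once after all data has been fed in.
std::uint32_t crc32cFinal(std::uint32_t crc) { return crc ^ 0xFFFFFFFFu; }
```

As a check, the CRC-32C of the ASCII string "123456789" is 0xE3069283.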

Synthesis results. On a Virtex-4 with speed grade -12: general-purpose version, maximum frequency of 160 MHz; specific version, maximum frequency of 290 MHz.

What's next... Simulation of BiRF Square; interfacing with the OPB bus; creation of a toy architecture for validation; actual validation on the new Virtex-4.

High Level Reconfiguration. Marco Maggioni, marco.maggioni@dresd.org

Project Organization. First Phase (1st month): clustering. Second Phase (2nd month): coloring. Third Phase (3rd month): scheduling. Flow diagram elements: GCC frontend, circuit representation, partitioning algorithm (isomorphic), PandA, clustered graph, metric (area, latency, rec. time, power), target architecture database, reconfigurable clustered graph, scheduling algorithm.

Second Phase: Coloring. Theoretical work: from clusters to the reconfigurable graph; definition of the interfaces for the coloring phase; study of a metric for cluster execution time. Implementation of the coloring phase: coloring based on the delay of nodes, applied to the results of isomorphic clustering. GraphGen on Earendil: automatically produces graphs from the specification; integrated with PandA.
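One natural execution-time metric for a cluster built from node delays is the delay of its critical (longest) path. The sketch below computes this with Kahn's topological order. It assumes that a cluster is an acyclic data-flow graph with one delay per node taken from the target architecture. The Cluster structure is hypothetical, and this is an illustration, not necessarily the metric adopted in the project.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cluster: per-node delays and data-flow successors (a DAG).
struct Cluster {
    std::vector<int> delay;
    std::vector<std::vector<int>> succ;
};

// Critical-path latency: finish[v] = delay[v] + max over predecessors of finish[u].
int clusterLatency(const Cluster& c) {
    std::size_t n = c.delay.size();
    std::vector<int> indeg(n, 0), finish(n, 0);
    for (const auto& s : c.succ)
        for (int v : s) ++indeg[v];
    std::vector<int> ready;
    for (std::size_t i = 0; i < n; ++i)
        if (indeg[i] == 0) { ready.push_back(static_cast<int>(i)); finish[i] = c.delay[i]; }
    int latency = 0;
    std::size_t visited = 0;
    while (!ready.empty()) {
        int u = ready.back();
        ready.pop_back();
        ++visited;
        latency = std::max(latency, finish[u]);
        for (int v : c.succ[u]) {
            finish[v] = std::max(finish[v], finish[u] + c.delay[v]);
            if (--indeg[v] == 0) ready.push_back(v);
        }
    }
    assert(visited == n && "cluster graph must be acyclic");
    return latency;
}
```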

Second Phase: Coloring. Adds useful information for the next steps: execution time is mandatory for scheduling, while area, power and reconfiguration time can optimize the final result. It is based on a target architecture, with interchangeable metrics. Clustered graph annotations: latency (needed); area, rec. time, power (useful).

Second Phase: GraphGen. Basically a tool for graph generation (DFG, SDG, CDF, BB)... It writes .dot files... Here are some benchmarks: AES, Whetstone.
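GraphGen writes its graphs as .dot files. For reference, a minimal writer for that format is sketched below; the output can be rendered with Graphviz (e.g. `dot -Tpng`). The Graph structure is hypothetical and not GraphGen's internal representation. Labels are written unescaped, so labels containing quotes would need escaping.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical graph: one label per node, directed edges as (from, to) indices.
struct Graph {
    std::string name;
    std::vector<std::string> labels;
    std::vector<std::pair<int, int>> edges;
};

// Emits the graph in Graphviz DOT syntax.
void writeDot(const Graph& g, std::ostream& out) {
    out << "digraph " << g.name << " {\n";
    for (std::size_t i = 0; i < g.labels.size(); ++i)
        out << "  n" << i << " [label=\"" << g.labels[i] << "\"];\n";
    for (const auto& e : g.edges)
        out << "  n" << e.first << " -> n" << e.second << ";\n";
    out << "}\n";
}

// Convenience wrapper returning the DOT text as a string.
std::string toDot(const Graph& g) {
    std::ostringstream os;
    writeDot(g, os);
    return os.str();
}
```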

What's next... Third phase in detail: apply reconfigurable scheduling, which adapts the specification to the reconfigurable architecture using the information obtained from coloring (different algorithms are possible); define a structure for the schedule result; implement the Salomone algorithm; publish the entire work on Earendil.