HPPS - Final - 06/14/2007
Usage Rights

CC Attribution License

HPPS - Final - 06/14/2007 Presentation Transcript

  • High Performance Processors and Systems PdM – UIC joint master 2007 Instructor: Prof. Donatella Sciuto HPPS @ PdM – June 2007
  • General Outline
    • DRESD
    • DReAMS
      • Alessandro Panella
      • Matteo Murgida
    • Operating System
      • Ivan Beretta
    • Design Flow
      • Antonio Piazzi
    • Polaris
      • Massimo Morandi
      • Marco Novati
    • HLR
      • Marco Maggioni
  • DRESD in a Nutshell: Dynamic Reconfigurability in Embedded System Design DRESD @ PdM – June 2007
  • Outline
    • Reconfiguration
      • Motivations
      • Basic Definition
      • SoC
  • Motivations
    • Increasing need for behavioral flexibility in embedded systems design
      • Support of new standards, e.g. in media processing
      • Addition of new features
    • Applications too large to fit on the device all at once
    • Speedup the overall computation of the final system
  • Reconfiguration
    • The process of physically altering the location or functionality of network or system elements. Automatic configuration describes the way sophisticated networks can readjust themselves in the event of a link or device failing, enabling the network to continue operation.
    • Gerald Estrin, 1960
  • SoC Reconfiguration [figure labels: fix, Partial, Total, Embedded]
  • Different Scenarios... Single Device Distributed System
  • What’s next
    • DRESD
    • DReAMS
      • Alessandro Panella
      • Matteo Murgida
    • Operating System
      • Ivan Beretta
    • Design Flow
      • Antonio Piazzi
    • Polaris
      • Massimo Morandi
      • Marco Novati
    • HLR
      • Marco Maggioni
  • Dynamic Reconfigurability Applied to Multi-FPGA Systems
  • DReAMS
    • Dynamic Reconfigurability
    • Applied to Multi-FPGA Systems
      • Branch of DRESD project
      • Inherits architectures and tools
    • Automatic workflow from VHDL system description to FPGA implementation
      • VHDL parsing and system simulation
      • System creation over a specific architecture
      • Bitstream creation and download onto FPGAs
  • Multi-FPGA Partitioning Alessandro Panella [email_address]
  • Outline
    • Problem description
    • Project goals and contributions
    • Project phases
    • What is partitioning?
    • Existing approaches
    • Going deeper into the problem
    • SPartA
      • The framework
      • The idea
      • The algorithm
    • Experimental results
    • Future work
  • Problem description
    • Multi-FPGA - RATIONALE
      • Large designs do not fit into a single chip
      • High performance parallelized applications
      • Our case: apply dynamic reconfigurability
    • Need to break the initial design into several blocks
      • One block corresponds to a single FPGA chip
      • Which inputs/outputs?
      • Which objectives?
      • Which techniques?
  • Project goals and contributions
    • Analyze existing approaches
      • Obtain a deep knowledge of this well-explored field
      • Extract basic ideas for a new approach
      • Obtain some terms of comparison
    • Define precisely which problem(s) we cope with
      • Contextualize the problem
      • Focus on our needs
    • Develop a new solution
      • Theoretical background
      • Implementation and evaluation
  • Project phases
    • First Phase [15th March – 12th April]
      • Documentation: presentation (12/4), report
      • Goals:
        • Analysis of the state of the art
        • Produce some hints on a new approach
    • Second Phase [13th April – 17th May]
      • Documentation: presentation (17/5), report
      • Goals:
        • Precise definition of the problem
        • Propose a new solution
    • Third Phase [18th May – 14th June]
      • Documentation: presentation (14/6), final report
      • Goal
        • Implementation and evaluation of the proposed solution
  • What is partitioning?
    • Goal
      • Divide a set of interrelated objects into a set of subsets
      • Optimize a specific objective(s)
    • K-way partitioning
      • Given a graph $G=(V,E)$, partition it into $k$ subsets $V_1, \dots, V_k$ that are pairwise disjoint and whose union is $V$.
      • Balance constraint: $|V_i| \approx |V|/k$
      • Aims at minimizing (or maximizing) an objective function
        • Edge-cut
        • Other objectives
    • In general: NP-complete
      • Several heuristics that provide good results have been developed
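The two quantities named on this slide, the edge-cut objective and the balance constraint, can be checked on a toy graph. The following is a minimal sketch; the function names, the `tolerance` parameter, and the example graph are illustrative assumptions, not part of the original material.

```python
from collections import Counter

def edge_cut(edges, part):
    """Number of edges whose endpoints lie in different subsets."""
    return sum(1 for u, v in edges if part[u] != part[v])

def is_balanced(part, k, tolerance=1):
    """True if every subset size is within `tolerance` of |V|/k (assumed slack)."""
    sizes = Counter(part.values())
    target = len(part) / k
    return all(abs(sizes.get(i, 0) - target) <= tolerance for i in range(k))

# Toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
good = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}   # one triangle per subset
bad = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}    # vertices interleaved

print(edge_cut(edges, good), is_balanced(good, k=2))
print(edge_cut(edges, bad), is_balanced(bad, k=2))
```

Both partitions satisfy the balance constraint, so only the edge-cut distinguishes them; this is why the objective function, not the constraint, drives the heuristics.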
  • Existing approaches - a glance
    • Traditional methods
      • Kernighan – Lin and Fiduccia – Mattheyses heuristics
        • Iterative-improvement algorithms
        • Begin with an initial partition and iteratively improve it
        • $O(n^3)$ complexity
    • Iterative algorithms
      • Genetic
      • Simulated annealing
    • Multilevel algorithms
      • Clustering -> Initial partitioning -> Refining
      • METIS/hMETIS suite: best current results for partitioning large flattened graphs
  • Going deeper into the problem
    • Two kinds of multi-FPGA partition
      • Topology-aware
        • Architecture topology is an input
        • No optimization of the no. of FPGAs needed
        • Main task: association between the (larger) system graph and the (smaller) architectural graph
      • Topology-free
        • Architecture topology is not provided
        • Input: dimension and communication features of FPGAs
        • Minimization of the number of FPGAs
        • Place and route after partitioning
    • At the moment, we deal with the Topology-free problem
  • SPartA: the framework
    • Input: VHDL system description
    • Output: several VHDL files, one for each block (FPGA)
    • Three main phases:
      • Extract design from VHDL description
      • “Real” partitioning phase (core)
      • Build VHDL files
  • SPartA: the idea
    • Structural approach
      • Fully exploits the design hierarchy
      • Modules can be treated as single blocks
      • Bases for expansions toward dynamic reconfigurability
    • Objectives
      • Minimize cutsize
      • Minimize the number of used FPGAs
      • Preserve module integrity
  • SPartA: the algorithm 1/2
    • Recursive algorithm (deals with trees)
    • Starts from TOP node
    • Precondition
      • No leaves with dimension > FPGA size
    • At every moment, a node can be:
      • COVERED, UNCOVERED or PARTIALLY COVERED
    • Stop condition
      • Node TOP is COVERED
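The slides give only the outline of the algorithm (recursion over the design tree, the precondition on leaves, the three node states, and the stop condition). The sketch below illustrates those elements under a loudly stated assumption: the first-fit placement rule in `cover` is a placeholder chosen for this example, not the SPartA selection policy, and all class and function names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    size: int = 0                                  # area of a leaf module
    children: list = field(default_factory=list)   # empty for leaves

    def leaves(self):
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]

    def total(self):
        return sum(leaf.size for leaf in self.leaves())

def state(node, assigned):
    """COVERED if every leaf is assigned, UNCOVERED if none is, else PARTIALLY COVERED."""
    flags = [leaf.name in assigned for leaf in node.leaves()]
    if all(flags):
        return "COVERED"
    return "PARTIALLY COVERED" if any(flags) else "UNCOVERED"

def cover(node, fpga_size, partitions, assigned):
    """Assumed policy: keep a module whole if it fits somewhere, otherwise recurse."""
    need = node.total()
    for part in partitions:                        # first fit among open FPGAs
        if need <= fpga_size - sum(leaf.size for leaf in part):
            break
    else:
        if need > fpga_size:                       # too big for one FPGA: split the module
            for child in node.children:
                cover(child, fpga_size, partitions, assigned)
            return
        part = []
        partitions.append(part)                    # open a new FPGA
    part.extend(node.leaves())
    assigned.update(leaf.name for leaf in node.leaves())

def partition(top, fpga_size):
    # Precondition from the slide: no leaf is larger than one FPGA.
    assert all(leaf.size <= fpga_size for leaf in top.leaves())
    partitions, assigned = [], set()
    cover(top, fpga_size, partitions, assigned)
    assert state(top, assigned) == "COVERED"       # stop condition
    return partitions

top = Node("TOP", children=[
    Node("A", children=[Node("a1", 3), Node("a2", 3)]),
    Node("B", children=[Node("b1", 4), Node("b2", 4)]),
    Node("c", 2),
])
result = [[leaf.name for leaf in p] for p in partition(top, fpga_size=8)]
print(result)
```

In this example modules A and B are each kept intact on their own FPGA, which matches the stated objective of preserving module integrity; cutsize is not modeled here.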
  • SPartA: the algorithm 2/2
    • OPEN ISSUE: Selecting the first node to be inserted into an empty partition
      • Random node
      • Node with overall max communication
      • Node with max communication with its siblings
  • Results 2/2
    • Complexity: exponential, due to the recursive nature of the algorithm
    • Execution time is nevertheless low (tens of seconds for a reasonably large design)
    • EXAMPLE
    [Figure: original tree and partitioned tree]
  • Results 3/3
    • Evaluation metrics
      • EDGECUT, FILLING and SPLITS
    • Evaluation of the three policies for node selection
      • 18 different trees of varying size
  • Results 3/3
  • Future work
    • Algorithm improvement
      • Balancing of last partition
      • First node selection policies
      • More refined “score” function for selecting node
        • Use closeness metrics
      • Comparisons with existing algorithms
    • Expansion
      • SPartA framework development
      • Topology-aware partitioning
  • The end ANY QUESTIONS?
  • What’s next
    • DRESD
    • DReAMS
      • Alessandro Panella
      • Matteo Murgida
    • Operating System
      • Ivan Beretta
    • Design Flow
      • Antonio Piazzi
    • Polaris
      • Massimo Morandi
      • Marco Novati
    • HLR
      • Marco Maggioni
  • Chimera Multi-FPGAs Architecture Definition Matteo Murgida [email_address]
  • Outline
    • Introduction
      • Problem description
      • Project Goals
      • State of the Art
    • Project in details
      • Contributions
      • Phases
      • Results
    • What’s next
  • Problem Description
    • Architectural description of a distributed FPGAs environment
    • 3 layers architecture
  • Project Goals
    • Design the architecture of the most generic distributed system
      • Node definition
      • Interface definition
      • Communication channel definition
    • Design a communication protocol
      • Essential protocol
      • Interrupt based protocol
      • Timeout improvement
  • State of the Art
    • CONFigurable ElecTronic TIssue (CONFETTI) by EPFL
      • Cellular based architecture
      • PROs: high degree of parallelism, high computational power
      • CONs: no flexibility, oversized for small problems, small architectural customizations imply big cost/effort
    • Splash 2 by IDA Supercomputing Center
      • Architecture composed of a Sun Sparcstation host, an interface board and “Splash Array” boards
      • PROs: again high parallelism and power
      • CONs: a central host coordinates the computational units, no fault tolerance, no flexibility
  • Contributions
    • The proposed architecture:
      • Allows several Spartan-3 Starter Boards to communicate and exchange data
      • It is portable to different FPGAs with minimum effort
      • It is the basic infrastructure that will allow external partial dynamic reconfiguration
  • Project Phases
    • First Phase, time window: 15th March – 12th April
      • Documentation: prj presentation (12/4), prj report
      • Goals:
        • Digilent Spartan-3 Starter Board study
        • Boards connection
    • Second Phase, time window: 13th April – 17th May
      • Documentation: prj presentation (17/5), prj report
      • Goals:
        • Communication between two Microblaze soft-processors
        • GPIO integration in the architecture
    • Third Phase, time window: 18th May – 14th June
      • Documentation: prj presentation (14/6), prj report
      • Goals:
        • Interrupt handling, timeout handling
        • Simple application as example
  • Board Study
    • How to use resources like switches, leds and connectors in the board
    • How to map an IP-Core port with a physical pin of the board
    • Choice of the A2 Expansion Connector to connect two boards
  • Microblaze Communication
    • Communication between two Microblaze soft-processors
    • Development of a display controller to visualize the data flow
  • GPIO Insertion
    • Higher architecture portability through the use of the GPIO IP-Core.
  • Interrupt Controller Insertion
    • Communication protocol improvement by interrupt handling to prevent processor from busy waiting
    • Interrupt Controller is included in the architecture to permit multi-interrupt detection and handling
  • Timeout
    • Malfunctions due to interference on the communication channel lead to deadlocks
    • Communication protocol is not reliable at all
    • Counter implementation, including the driver used by the processor to clear raised interrupts
    • Development of a simple application to verify the correctness of the proposed approach
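The role of the timeout can be illustrated with a host-side simulation. This is only a sketch of the idea: the real design uses a hardware counter and interrupts on the MicroBlaze processors, whereas here Python queues stand in for the channel, and the retry policy, names, and timing values are assumptions made for the example.

```python
import queue
import threading

def receive(channel, timeout_s):
    """Wait for a word; give up after timeout_s instead of blocking forever."""
    try:
        return channel.get(timeout=timeout_s)
    except queue.Empty:
        return None                      # timeout expired, caller decides what to do

def send_with_retry(data_out, ack_in, word, timeout_s=0.05, retries=3):
    """Send a word and wait for an acknowledgement, resending on timeout."""
    for attempt in range(1, retries + 1):
        data_out.put(word)
        if receive(ack_in, timeout_s) is not None:
            return attempt               # number of attempts that were needed
    return None                          # peer unreachable: report instead of deadlocking

def lossy_peer(data_in, ack_out, drop_first=1):
    """Simulated remote node that loses the first `drop_first` words."""
    dropped = 0
    while True:
        word = data_in.get()
        if word is None:                 # shutdown marker for the simulation
            return
        if dropped < drop_first:
            dropped += 1                 # interference: word lost, no acknowledgement
            continue
        ack_out.put(("ACK", word))

data, acks = queue.Queue(), queue.Queue()
peer = threading.Thread(target=lossy_peer, args=(data, acks), daemon=True)
peer.start()
attempts = send_with_retry(data, acks, 0x2A)
data.put(None)
peer.join()
print(attempts)
```

Without the timeout, the sender would wait indefinitely for the acknowledgement of the lost word, which is the deadlock described on this slide.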
  • Results
    • A short Demo ...
  • Future Work
    • Apply the proposed approach to external partial dynamic reconfiguration
    • Develop a co-simulation framework based on the VHDL/SystemC descriptions of distributed systems
      • Receive as input the VHDL description of the system
      • Build the VHDL description for every node
      • Create the SystemC stub to allow inter node communication
      • Describe the communication in SystemC
      • Co-simulate the VHDL / SystemC description
  • Questions
  • What’s next
    • DRESD
    • DReAMS
      • Alessandro Panella
      • Matteo Murgida
    • Operating System
      • Ivan Beretta
    • Design Flow
      • Antonio Piazzi
    • Polaris
      • Massimo Morandi
      • Marco Novati
    • HLR
      • Marco Maggioni
  • Operating System support for Reconfigurable SoC
  • Development of an OS architecture-independent layer for dynamic reconfiguration Ivan Beretta [email_address]
  • Outline
    • Introduction
      • Problem description
      • Project Goals
      • State of the Art
    • Project in details
      • Contributions
      • Phases
      • Results
    • What’s next
  • Problem description
    • Need for an operating system support on Reconfigurable SoCs
      • Simplified software development process
      • Improved code portability
    • Lack of support for dynamic reconfigurable architectures
      • Specific solutions for specific architectures
    • Need for an architecture-independent abstraction layer
  • Project Goal
    • Primary goals:
      • Analysis of the State of the Art
      • Definition of the new intermediate layer
      • Physical implementation
    • Specific goals:
      • Study of the solutions developed inside the DRESD group
      • Comparison between existing solutions
      • Recovery of one of the two implementations
      • Hardware architectures generation using up-to-date tools on Xilinx Virtex II – Pro VP7
  • State of the Art
    • Caronte implementation (Alberto Donato, 2005)
    • Two kernel modules
      • ICAP device driver
      • IP-Core manager (IPCM)
  • State of the Art (cont’d)
    • YaRA implementation (Vincenzo Rana, 2006)
    • Multi-layered structure
      • Four modules: Reconfiguration controller driver, MAC, LOL, Reconfiguration Library
      • ROTFL architecture
  • Contributions
    • Limits of existing implementations
      • Lack of portability
        • E.g. YaRA solution implemented on RAPTOR2000
      • Reconfiguration process details visible from userspace
    • Definition of an architecture independent middleware
      • Improved portability
        • It works on different hardware architectures
        • It works with different Linux distributions
      • Opportunity to optimize latencies
  • Phases
    • First phase: Layer definition
      • Goal: Factorization of common features
        • Boundaries of the new middleware
        • Mapping of existing solutions on the functionalities
      • Motivation: Provide guidelines for actual implementation
    • Second phase: Implementation recovery
      • Goal: Recovery of bootstrap process and kernel images
      • Motivation: Full recovery of Caronte solution
    • Third phase: Architectures generation
      • Goal: Synthesis of hardware architectures using up-to-date Xilinx tools and cores
      • Motivation: Synthesis of hardware architectures using up-to-date Xilinx tools
  • First Phase: Layer definition
    • Definition of new layer boundaries
      • Factorization of existing features
      • Mapping of the required functionalities on existing implementations
    Legend: ● = Both hardware and software ● = Hardware independent
    Feature | Caronte Solution | YaRA Solution
    Reconfiguration controller support | ICAP device driver | Reconfiguration Controller Driver
    Dynamic address space assignment | IPCM Module | MAC module
    Dynamic device registration and driver loading | IPCM Module | LOL module
    API | Direct interaction with modules | Reconfiguration library
    Module management (caching, placement...) | Not implemented | ROTFL architecture
  • Second Phase: Implementation Recovery
    • Bootstrap process from flash memory
    [Figure: boot sequence (steps 1 to 6) involving BRAM, PowerPC and FPGA. 16 MB Flash at 0xe4000000 – 0xe4FFFFFF (marked addresses: 0xe42FFFFF, 0xe4F00000, 0xe4F80000) holding Bootloader, Bootmanager, Kernel and RAMDisk Image. 64 MB DDR SDRAM at 0x00000000 – 0x03FFFFFF (marked address: 0x00800000).]
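The address ranges in the memory map follow directly from the base addresses and the memory sizes. A small sketch of that arithmetic, where the helper names are illustrative:

```python
MB = 1024 * 1024

def region(base, size):
    """Inclusive (first, last) addresses of a memory region."""
    return base, base + size - 1

def contains(reg, addr):
    return reg[0] <= addr <= reg[1]

flash = region(0xE4000000, 16 * MB)   # 16 MB Flash
sdram = region(0x00000000, 64 * MB)   # 64 MB DDR SDRAM

print([hex(a) for a in flash])
print([hex(a) for a in sdram])
print(contains(flash, 0xE4F80000), contains(sdram, 0x00800000))
```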
  • Second Phase: Implementation Recovery (cont’d)
    • Several issues
      • Neither bootmanager nor Linux kernel on flash memory at the beginning
      • Flash memory seen as read-only memory at runtime
      • Need for an ad-hoc solution
    • Avmon command line interface
      • Executed from DDR SDRAM memory
      • FTP transfer of bootmanager and flash programming
      • Also useful for kernel download
    • Kernel executable image
      • Kernel image built using a cross-compiler
      • ICAP and IPCM modules loaded at runtime
  • Third Phase: Architecture generation
    • Hardware architecture used in Second Phase no longer useful
      • Synthesized with Xilinx ISE and EDK 6.1
    • Same hardware structure realized with updated cores and recent tool versions
      • Synthesis with Xilinx ISE and EDK 7.1
      • Synthesis with Xilinx ISE and EDK 9.1
    • Lack of device driver support and documentation to configure newest cores
  • Results: Implementation Recovery
    • Linux Bootstrap from flash memory
  • Results: Implementation Recovery
    • Design summary for hardware architectures on Xilinx Virtex II – Pro VP7
    • Two main limitations
      • Ethernet controller
      • Necessity of a top-level design
    • Design too large for module-based reconfiguration
    Resource | ISE/EDK 7.1: Used | Available | % | ISE/EDK 9.1: Used | Available | %
    Slices | 4926 | 4928 | 99% | 5318 | 4928 | 107%
    Flip-Flops | 5217 | 9856 | 52% | 5724 | 9856 | 58%
    4-in LUTs | 6974 | 9856 | 70% | 6993 | 9856 | 70%
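The percentages in the design summary can be recomputed from the used and available counts; they match integer truncation. The sketch below also makes explicit why the ISE/EDK 9.1 design does not fit the device. The dictionary layout and function names are assumptions for illustration.

```python
designs = {
    "ISE/EDK 7.1": {"Slices": (4926, 4928), "Flip-Flops": (5217, 9856), "4-in LUTs": (6974, 9856)},
    "ISE/EDK 9.1": {"Slices": (5318, 4928), "Flip-Flops": (5724, 9856), "4-in LUTs": (6993, 9856)},
}

def utilization(used, available):
    """Percentage of a resource in use, truncated to an integer."""
    return used * 100 // available

def fits(resources):
    """True if no resource exceeds what the device offers."""
    return all(used <= available for used, available in resources.values())

for tool, resources in designs.items():
    summary = {name: utilization(*counts) for name, counts in resources.items()}
    print(tool, summary, "fits" if fits(resources) else "does not fit")
```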
  • What’s next
    • Device driver updates to support newest architectures
    • Intermediate layer implementation
      • Opportunity to add some additional features
        • Reconfiguration scheduler
      • Opportunity to define a common device driver interface to simplify the creation of a new driver by the user
    • Integration of the middleware and the operating system support in a complete design flow
  • Questions
  • What’s next
    • DRESD
    • DReAMS
      • Alessandro Panella
      • Matteo Murgida
    • Operating System
      • Ivan Beretta
    • Design Flow
      • Antonio Piazzi
    • Polaris
      • Massimo Morandi
      • Marco Novati
    • HLR
      • Marco Maggioni
  • Design Flow Antonio Piazzi [email_address]
  • Outline
    • Introduction
      • Problem description
      • Project Goals
      • State of the Art
    • Project in details
      • Contributions
      • Phases
      • Results
    • What’s next
  • Problem description
    • The user has to divide their attention among many problems, some of them related to the implementation of the design.
    • Often users do not know anything about reconfigurable architecture generation and they haven’t.
  • Project Goals
    • New design methodology tailored to support partial dynamic reconfigurable architecture
    • Definition and implementation of a design framework able to
      • Support different design paradigms, e.g. Xilinx Module Based, Xilinx EAPR
      • Hide the dirty work (due to the reconfiguration) from the application designer
      • Support different architectural solutions, e.g. different communication infrastructures such as IBM CoreConnect or Wishbone
  • Contributions
    • With our framework, all users (novice or not) can develop and debug their functionality on a reconfigurable architecture without having to analyze all the problems related to that development methodology
  • Phases
    • 1st phase (15 March – 15 April): Budgeting
      • Study of the state of the art
    • 2nd phase (15 April – 15 May): Realization phase
      • Construction of the entire framework based on previously separate tools
      • Implementation of an innovative workflow
    • 3rd phase (15 May – 15 June): Project’s validation
      • Definition of a new communication infrastructure and transfer protocol for the reconfigurable part
      • Verify the integration of the new infrastructure in the project
  • First Phase
    • Study of the state of the art
      • Standard reconfigurable design flow
        • Xilinx Module Based and EAPR
      • Caronte Design Flow
      • EDK-based architecture
  • Self Reconfigurable Architecture
  • Second Phase 1/4
      • Construction of the entire framework based on previously separate tools
    • The user only has to focus on developing the IBM CoreConnect architecture and on writing the modules that implement the desired functionality
    • SYSTEM.VHD contains all the information about the IBM CoreConnect architecture
  • Second Phase 2/4
    • ArchGen takes the system.vhd file, processes the architecture it contains, and translates that static architecture into a dynamic one
    • FIX.VHD contains the instantiations of the processors (one or more) and of all the components present in the IBM CoreConnect architecture
    • TOP.VHD contains the instantiation of the fix component and the information about the communication infrastructure
  • Second Phase 3/4
    • COMiC generates an NCD file, which contains the information about the communication infrastructure, and an XDL file, which contains the same information in text form
  • Second Phase 4/4
    • At this point we only have to collect all the information we need and, through a parser, insert it into a new top.vhd, which will be the fix part of our architecture; what remains is to manage the reconfigurable modules written by the user
  • Third Phase 1/3
    • Definition of a new communication infrastructure and transfer protocol for the reconfigurable part
    • An OPB bus based on 3-state buffers, used to link one or more modules to the fix part (created with ISE)
  • Third Phase 2/3
    • Use the ncd2xdl converter to obtain an XDL file that contains all the parameters of our bus
  • Third Phase 3/3
    • Verify the integration of the new infrastructure in the project
    • Perfect integration in our process: we can use any bus type to connect the fix and reconfigurable parts
  • Results
    • The framework answers the novice user's need for automation and, more generally, helps all users who need a short time to market.
  • What’s next
    • Our plan for future work is to schedule one or two working days to fix some bugs present in the project and to adjust the output of COMiC, which has to create an OPB replay bus.
  • Questions?
  • What’s next
    • DRESD
    • DReAMS
      • Matteo Murgida
      • Alessandro Panella
    • Operating System
      • Ivan Beretta
    • Design Flow
      • Antonio Piazzi
    • Polaris
      • Massimo Morandi
      • Marco Novati
    • HLR
      • Marco Maggioni
  • Polaris
  • Polaris
    • Create an integrated HW/SW system to manage 2D reconfiguration
    • SW side:
      • Maintain information on FPGA status
      • Decide how to efficiently allocate tasks
    • HW side:
      • Provide support for effective task allocation
      • Perform 2D bitstream relocation
  • Management of 2D Reconfiguration in a Reconfigurable System Massimo Morandi [email_address]
  • Outline
    • Introduction
      • Problem description
      • Project Goals and Contributions
    • Project in details
      • Phases
      • Results
    • Future Work
  • Problem Description
    • New Generation of FPGAs
      • Virtex-4 and Virtex-5
      • Allow bi-dimensional reconfiguration
    • This makes it possible to:
      • Better exploit reconfigurable area
      • Obtain modules performance optimizations
    • More complex management:
      • Handle one more degree of freedom
      • Avoid more fragmentation
      • Make good placement choices to keep the Task Rejection Rate (TRR) low
      • Keep acceptable intra-module routing paths
  • Project Goals and Contributions
    • Analyze effects of 2D reconfiguration
      • New advantages
      • New problems
    • Examine possible solutions to new problems
      • Explore literature to find promising ideas
      • Evaluate those solutions in various scenarios
    • Propose a new solution
      • Combining ideas from literature with new ones
      • Obtaining good cost-quality tradeoff
  • Project Phases
    • First Phase, time window: 15th March – 12th April
      • Documentation: prj presentation (12/4), prj report
      • Goals:
        • General analysis of 2D reconfiguration
        • Detailed description of the new problems
    • Second Phase, time window: 13th April – 17th May
      • Documentation: prj presentation (17/5), prj report
      • Goals:
        • Definition of desired features for a solution
        • Analysis and evaluation of existing solutions
    • Third Phase, time window: 18th May – 14th June
      • Documentation: prj presentation (14/6), prj report
      • Goal: propose a new combined solution to effectively handle problems of 2D reconfiguration
  • Setting and Advantages Definition
    • Definition of the setting:
      • 2D self partial dynamical run-time reconfiguration
    • Analysis of the advantages of 2D Reconfiguration
      • In area usage and performance
  • 2D Fragmentation Problem
    • Analysis of the 2D-fragmentation problem
      • Area generally more fragmented
      • Can nullify the area optimizations obtained
  • Placement Decisions
    • Analysis of 2D placement choices effects:
      • Again, bad choices can lead to performance loss
  • Allocation manager
    • Definition of allocation manager desired features:
      • Low TRR
      • Low management overhead
      • High routing efficiency
      • Low fragmentation
    • Definition of allocation manager structure:
      • Empty space manager
        • Complete space
        • Heuristic selection
      • Fitter
        • General (FF,BL,BF,WF…)
        • Focused (FA,RA… )
  • Most relevant works
    • Maintain complete information on empty space:
      • KAMER:
        • Keep All Maximally Empty Rectangles
        • Apply a general fitting strategy
      • CUR:
        • Maintain the Contour of a Union of Rectangles
        • Apply a focused fitting strategy
    • Heuristically prune part of the information:
      • KNER:
        • Keep Non-overlapping Empty Rectangles
        • Apply a general fitting strategy
      • 2D-HASHING:
        • Keep Non-ov. Empty Rectangles in optimized data structure
        • Apply (exclusively) a general fitting strategy
  • Evaluation and Proposed Approach
    • Proposed Approach
      • Heuristic (KNER-like) empty space manager, to keep low complexity for use in a self-reconfigurable system
      • Fitting strategy focused on minimizing routing paths, to maintain high performance of the reconfigurable system (chosen metric to minimize: Manhattan distance)
    • High placement quality => high complexity
    • Lowest compl. => no focused fitting (bad especially for routing)
  • Structure of the allocation manager
    • Task, defined by:
      • Arrival time, ASAP, (ALAP), H, W, Latency, Communicating Tasks
      • Hosted in a queue which also adds a pointer to the rectangle where it is placed
    • Reconfigurable Device, represented as:
      • Binary Tree structure, each node is a Rectangle, each leaf is an empty Rectangle.
      • Navigation through pointers to left child, right child, next leaf and a function to find previous leaf (for bookkeeping after split or merge)
    • Rectangle, defined by:
      • X, Y, H, W
      • Initially one, (X,Y)=(0,0), H=FPGA Rows, W=FPGA Cols
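The data structures above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the empty rectangles are kept in a flat list rather than in the binary tree described on the slide, the direction of the split is an arbitrary choice, the Manhattan distance is measured between rectangle centers, and all names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    x: int   # column of the top-left corner
    y: int   # row of the top-left corner
    h: int   # height in rows
    w: int   # width in columns

    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)

def split(free, h, w):
    """Place an h x w task in the corner of `free`.
    Returns the placed rectangle and the non-overlapping empty leftovers."""
    placed = Rect(free.x, free.y, h, w)
    right = Rect(free.x + w, free.y, h, free.w - w)        # beside the task
    below = Rect(free.x, free.y + h, free.h - h, free.w)   # under the task, full width
    return placed, [r for r in (right, below) if r.h > 0 and r.w > 0]

def manhattan(a, b):
    (ax, ay), (bx, by) = a.center(), b.center()
    return abs(ax - bx) + abs(ay - by)

def place(empty, h, w, partners):
    """Choose the fitting empty rectangle that minimizes the total Manhattan
    distance to the already placed communicating tasks in `partners`."""
    best = None
    for free in empty:
        if free.h >= h and free.w >= w:
            candidate, _ = split(free, h, w)
            cost = sum(manhattan(candidate, p) for p in partners)
            if best is None or cost < best[0]:
                best = (cost, free)
    if best is None:
        return None                       # no room: the task is rejected
    placed, leftovers = split(best[1], h, w)
    empty.remove(best[1])
    empty.extend(leftovers)
    return placed

empty = [Rect(0, 0, h=10, w=10)]          # initially one rectangle covering the FPGA
t1 = place(empty, 4, 4, partners=[])
t2 = place(empty, 2, 4, partners=[t1])    # communicates with t1
print(t1, t2, empty)
```

Keeping only non-overlapping empty rectangles keeps the bookkeeping cheap, at the price of sometimes rejecting a task that would fit across two adjacent leftovers.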
  • The Placement Algorithm
  • Experimental Results
    • Benchmark of 100 randomly generated tasks:
      • Size (5% to 25% of FPGA), randomly interconnected
    • Execution time: 3x less than CUR, close to KNER
    • Communication cost: 3x less than KNER, close to CUR
    • Task Rejection Rate: all solutions quite close
  • Future Work
    • Apply the proposed solution to self reconfiguration:
      • Adapt the algorithm to run on the internal processor
      • Create a validation reconfigurable architecture
      • Integrate the architecture with relocation
    • Tune the algorithm to improve results:
      • Experiment techniques to reduce TRR
      • Try to optimize the code to have an algorithm with lower running time
  • Questions?
  • What’s next
    • DRESD
    • DReAMS
      • Alessandro Panella
      • Matteo Murgida
    • Operating System
      • Ivan Beretta
    • Design Flow
      • Antonio Piazzi
    • Polaris
      • Massimo Morandi
      • Marco Novati
    • HLR
      • Marco Maggioni
  • Relocation for 2D Reconfigurable Systems Marco Novati [email_address]
  • Outline
    • Introduction
      • Problem description
      • Project Goals
    • Project in details
      • Phases
      • Results
    • What’s next
  • Problem Description
    • Self Dynamical Runtime 2D Reconfiguration
      • Xilinx Virtex-4 and Virtex-5
    • Relocation, different solutions
      • Software (BAnMat, PARBIT)
      • Hardware (REPLICA, BiRF)
    • We chose a hardware solution
      • BiRF Square
  • Project Goals
    • Study of the new FPGA Families
      • Examination of Xilinx documentation on V4 and V5
    • Analysis of the new bitstream structure
      • Generation of V4 and V5 bitstream
    • Development of the new version of BiRF
      • Implementation
      • Validation
  • Phases
    • First Phase: 15th March – 12th April
      • Documentation: prj presentation (12/4), prj report
      • Goals:
        • Xilinx documentation examination
        • V4 & V5 bitstream structure analysis
    • Second Phase: 13th April – 17th May
      • Documentation: prj presentation (17/5), prj report
      • Goals:
        • Implementation of BiRF Square
        • Synthesis
    • Third Phase: 18th May – 14th June
      • Documentation: prj presentation (14/6), prj report
      • Goals:
        • Verification & Validation
  • Frame Addressing
    • New Frame Addressing:
      • Possibility of addressing rows and columns
  • New Parser
  • CRC Calculation
    • Particular CRC value, used by Xilinx tools
    • Two versions of BiRF Square:
      • By using the “predefined” value
      • With actual CRC calculation
    • An optimized algorithm has been used
  • Synthesis results
    • On a Virtex-4 with speed grade -12
      • General purpose version: max frequency of 160 MHz
      • Specific version: max frequency of 290 MHz
  • Target Device
  • Validation Architecture
  • Results 1/2
    • BiRF Square
      • Permits relocation to be applied in a self partially and dynamically 2D-reconfigurable system
      • The occupation ratio is relatively small
      • Frequency more than acceptable
      • Reduction of internal memory requirements
  • Results 2/2
    • Throughput of 7.3 MB/s:
      • Total configuration file size is about 1 MB
      • Considering an architecture:
        • 1/3 of the area as fixed part
        • 2/3 as reconfigurable part with 6 slots
      • Under these hypotheses
        • Size of a partial bitstream will be about 110 KB
        • Relocation time of about 15 ms
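The two estimates on this slide can be reproduced with a few lines of arithmetic. The sketch assumes decimal units (1 MB = 1000 KB) and that the reconfigurable area is shared equally among the slots; neither assumption is stated in the slides.

```python
total_kb = 1000.0                 # "about 1 MB" total configuration file
reconfigurable_fraction = 2 / 3   # 2/3 of the area is reconfigurable
slots = 6
throughput_kb_per_s = 7.3 * 1000  # 7.3 MB/s

# Size of one partial bitstream if the slots share the reconfigurable area equally.
partial_kb = total_kb * reconfigurable_fraction / slots

# Relocation time for the slide's rounded figure of 110 KB.
relocation_ms = 110 / throughput_kb_per_s * 1000

print(round(partial_kb), round(relocation_ms))
```

The computed partial bitstream size is close to the 110 KB quoted on the slide, and dividing it by the throughput gives the relocation time of about 15 ms.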
  • What’s Next
    • Future improvements:
      • Direct access to the memory (DMA)
        • Direct manipulation of the bitstream
        • Portability
      • Integration with ICAP
        • Elimination of the relocation overhead
        • Relocation time << reconfiguration time
    • The final goal:
      • Creation of a real architecture that exploits self partial and dynamical 2D-reconfiguration, with relocation
  • Questions
  • What’s next
    • DRESD
    • DReAMS
      • Alessandro Panella
      • Matteo Murgida
    • Operating System
      • Ivan Beretta
    • Design Flow
      • Antonio Piazzi
    • Polaris
      • Massimo Morandi
      • Marco Novati
    • HLR
      • Marco Maggioni
  • High Level Reconfiguration Marco Maggioni marco.maggioni@dresd.org
  • Outline
    • Introduction
      • Problem description
      • Project Goals
      • State of the Art
    • Project in details
      • Contributions
      • HLR workflow
        • GraphGen
        • IsomorphClustering
        • SimpleLatency
        • Salomone
      • Results
    • What’s next
  • Problem Description
    • What is High Level Reconfiguration...?
      • Theoretical approach to dynamic reconfiguration...
    • Vision...
      • Reconfigurability has many advantages...
    • Mission...
      • Exploit these advantages to obtain best performance...
    • How...?
      • Adapting a system to this execution model while managing its complexity and drawbacks...
  • Project Goal
    • Create a complete HLR workflow...
      • From a real system specification to its reconfigurable execution model...
    • Define precise interfaces for each phase...
      • To promote flexibility and future HLR research...
      • To develop a complete toolchain...
    • Apply some algorithms regarding reconfigurability...
      • To reuse past works...
  • State of the Art
    • Current status of HLR...
      • Some ideas/concepts regarding clustering and scheduling...
      • ... but no complete and well-defined workflow.
      • ... so there is still a lot of work to do.
    • System specifications analysis...
      • PandA HW/SW framework to promote new ideas...
      • Dynamic Reconfigurability can be considered as a branch of this research...
  • Contribution
    • Dynamic library loading system ...
      • Embedded into GNU compilation tool-chain
    • Porting of PandA libraries into Earendil...
      • Suitable for future analysis...
    • HLR tools deployed onto Earendil...
      • Cover each step of workflow...
  • HLR workflow
    • Clustering (with Analysis)...
      • 1st Month
    • Coloring...
      • 2nd Month
    • Scheduling...
      • 3rd Month
    [Workflow diagram. Labels: GCC Frontend, PandA, Partitioning Algorithm, Clustered Graph, Metric Evaluation, Reconfigurable Clustered Graph, Scheduling Algorithm, Area, Latency, Rec. Time, Power, Target Architecture Database]
  • GraphGen
    • GraphGen is the first step of the HLR toolchain ...
      • Takes as input a system specification or an algorithm...
      • Produces a graph (CFG/BB/DFG/SDG)
    • Performs a high-level analysis step...
      • Transforms the system description (C/C++/SystemC) to a representation suitable for further elaboration...
      • Based on GCC and compiler theory...
      • Uses PandA 0.4 functionalities to produce a statement-level graph...
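To make the idea of a statement-level graph concrete, here is a minimal sketch that derives data-flow edges from a straight-line list of statements. It does not reproduce GraphGen or PandA: the statement format, the variable names and the `build_dfg` helper are all hypothetical, and control flow is ignored.

```python
# Hypothetical statement list: (defined variable, operation, used variables).
statements = [
    ("a", "load", []),
    ("b", "load", []),
    ("c", "add", ["a", "b"]),
    ("d", "mul", ["c", "a"]),
    ("c", "sub", ["d", "b"]),  # redefines c
]


def build_dfg(stmts):
    """Return data-flow edges as (producer_index, consumer_index) pairs."""
    last_def = {}  # variable name -> index of the statement that last wrote it
    edges = []
    for idx, (target, _op, uses) in enumerate(stmts):
        for var in uses:
            if var in last_def:
                # The consumer depends on the most recent definition of var.
                edges.append((last_def[var], idx))
        last_def[target] = idx
    return edges


dfg_edges = build_dfg(statements)
print(dfg_edges)
```

Each edge says that one statement consumes a value produced by an earlier one, which is the kind of dependency information later phases such as clustering would need.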
  • IsomorphClustering
    • IsomorphClustering follows GraphGen in the HLR toolchain ...
      • Takes as input a statement level graph...
      • Produces a clustered graph...
    • Clustering phase...
      • Aggregates nodes into configurations (the basic unit of reconfigurable execution)...
      • Based on isomorphism, tries to find different instances of isomorphic templates...
      • Different algorithms can also be applied...
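One simple way to spot repeated templates is to give every node a canonical string describing its operand tree and group nodes whose strings match. The sketch below illustrates that idea only; it is not the IsomorphClustering algorithm. The graph, the node names and the helpers are hypothetical, and sorting the operands assumes that operand order does not matter.

```python
from collections import defaultdict

# Hypothetical statement-level graph: node -> (operation, operand nodes).
graph = {
    "n1": ("in", []),
    "n2": ("in", []),
    "n3": ("in", []),
    "n4": ("xor", ["n1", "n2"]),
    "n5": ("xor", ["n3", "n2"]),
    "n6": ("shift", ["n4"]),
    "n7": ("shift", ["n5"]),
}


def canonical_form(node, graph, cache):
    """String that is identical for isomorphic operand trees rooted at node."""
    if node not in cache:
        op, operands = graph[node]
        children = sorted(canonical_form(c, graph, cache) for c in operands)
        cache[node] = f"{op}({','.join(children)})"
    return cache[node]


def find_templates(graph):
    """Group non-input nodes whose operand trees share a canonical form."""
    cache = {}
    groups = defaultdict(list)
    for node in graph:
        groups[canonical_form(node, graph, cache)].append(node)
    # Keep only templates with at least two instances; skip bare inputs.
    return {form: nodes for form, nodes in groups.items()
            if len(nodes) > 1 and form != "in()"}


templates = find_templates(graph)
for form, nodes in templates.items():
    print(form, nodes)
```

Here `n4`/`n5` and `n6`/`n7` are reported as two instances each of the same template, so a single configuration could in principle serve both instances.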
  • SimpleLatency
    • SimpleLatency follows IsomorphClustering in the HLR toolchain ...
      • Takes as input a clustered graph...
      • Adds latency information to each configuration...
      • Produces a reconfigurable clustered graph with latency evaluations...
    • Coloring...
      • “Colors” each cluster with evaluations useful for reconfigurability...
      • Based on each cluster's internal critical path...
      • Different metrics for different architectures...
      • Connects HLR with real architectural parameters...
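A latency estimate based on a cluster's internal critical path amounts to finding the longest latency-weighted path through the cluster's acyclic graph. The following sketch shows that computation under assumed inputs: the cluster contents, the per-node latencies and the function name are hypothetical and not taken from SimpleLatency.

```python
# Hypothetical cluster: node -> (latency in cycles, successors in the cluster).
cluster = {
    "load": (2, ["xor", "add"]),
    "xor": (1, ["shift"]),
    "add": (1, ["store"]),
    "shift": (1, ["store"]),
    "store": (2, []),
}


def critical_path_latency(cluster):
    """Longest latency-weighted path through an acyclic cluster."""
    memo = {}

    def longest_from(node):
        # Latency of this node plus the slowest chain that follows it.
        if node not in memo:
            latency, successors = cluster[node]
            memo[node] = latency + max(
                (longest_from(s) for s in successors), default=0
            )
        return memo[node]

    return max(longest_from(node) for node in cluster)


cluster_latency = critical_path_latency(cluster)
print(cluster_latency)
```

For this example the critical path is load, xor, shift, store, for a total of 6 cycles. Per-node latencies would come from the target architecture, which is where different metrics for different architectures enter.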
  • Salomone
    • Salomone is the last step in the HLR toolchain ...
      • Takes as input a reconfigurable clustered graph...
      • Produces a schedule on an abstract reconfigurable architecture...
    • Scheduling...
      • It's considered the core task of HLR ...
      • Maps each configuration on an area portion...
      • Adapts the system execution to the reconfigurable model...
      • Based on a graph coloring algorithm...
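The link between scheduling and graph coloring can be illustrated as follows: configurations whose execution windows overlap conflict with each other, and each color stands for one area portion. This is a greedy sketch under assumed inputs, not Salomone itself; the windows, the names and the helpers are hypothetical, and reconfiguration time is ignored.

```python
# Hypothetical configurations with (start, end) execution windows.
windows = {
    "cfgA": (0, 4),
    "cfgB": (2, 6),
    "cfgC": (4, 8),
    "cfgD": (5, 9),
}


def overlaps(w1, w2):
    """True if two half-open windows [start, end) share any time."""
    return w1[0] < w2[1] and w2[0] < w1[1]


def assign_area_portions(windows):
    """Greedy coloring: give each configuration the lowest free area portion."""
    portion_of = {}
    for cfg in sorted(windows, key=lambda c: windows[c][0]):
        # Portions already used by configurations active at the same time.
        taken = {portion_of[other] for other in portion_of
                 if overlaps(windows[cfg], windows[other])}
        portion = 0
        while portion in taken:
            portion += 1
        portion_of[cfg] = portion
    return portion_of


portions = assign_area_portions(windows)
print(portions)
```

In this example `cfgA` and `cfgC` share one area portion because they never run at the same time, while `cfgD` needs a third portion. The number of colors used corresponds to the number of area portions required.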
  • Results 1/3
    • Based on AES encryption...
    • Templates found with IsomorphClustering...
      • Execution time... 123.94 s
  • Results 2/3
    • Salomone adapting and coloring...
      • Execution time... 113.55 s
  • Results 3/3
    • Final Scheduling...
  • What's next
    • Heuristic implementation for Salomone...
      • To improve result quality in terms of the number of area portions...
    • A new metric for area/latency...
      • Based on RTL logical synthesis evaluations...
    • Introduce feedback into HLR workflow...
      • Based on schedule evaluation...
    • New clustering and scheduling algorithms...
      • Such as Napoleon...
  • Questions