Published in: Technology, Design
  1. 1. DReAMS Dynamic Reconfigurability Applied to Multi-FPGA Systems Matteo Murgida, Alessandro Panella {matteo.murgida, alessandro.panella}@dresd.org
  2. 2. DReAMS <ul><li>Dynamic Reconfigurability Applied to Multi-FPGA Systems </li></ul><ul><ul><li>Branch of DRESD project </li></ul></ul><ul><ul><li>Inherits architectures and tools </li></ul></ul><ul><li>Automatic workflow from VHDL system description to FPGA implementation </li></ul><ul><ul><li>VHDL parsing and system simulation </li></ul></ul><ul><ul><li>System creation over a specific architecture </li></ul></ul><ul><ul><li>Bitstream creation and download onto FPGAs </li></ul></ul>
  3. 3. Workflow
  4. 4. SPartA A novel algorithm for multi-FPGA partitioning Alessandro Panella [email_address]
  5. 5. Outline <ul><li>Problem description </li></ul><ul><li>Project goals and contributions </li></ul><ul><li>What is partitioning? </li></ul><ul><li>Existing approaches </li></ul><ul><li>Going deeper into the problem </li></ul><ul><li>SPartA </li></ul><ul><ul><li>The framework </li></ul></ul><ul><ul><li>The idea </li></ul></ul><ul><ul><li>The algorithm </li></ul></ul><ul><li>Experimental results </li></ul><ul><li>Future work </li></ul>
  6. 6. Problem description <ul><li>Multi-FPGA - RATIONALE </li></ul><ul><ul><li>Large designs do not fit into a single chip </li></ul></ul><ul><ul><li>High performance parallelized applications </li></ul></ul><ul><ul><li>Our case: apply dynamic reconfigurability </li></ul></ul><ul><li>Need to break the initial design into several blocks </li></ul><ul><ul><li>One block corresponds to a single FPGA chip </li></ul></ul><ul><ul><li>Which inputs/outputs? </li></ul></ul><ul><ul><li>Which objectives? </li></ul></ul><ul><ul><li>Which techniques? </li></ul></ul>
  7. 7. Project goals and contributions <ul><li>Analyze existing approaches </li></ul><ul><ul><li>Obtain a deep knowledge of this well-explored field </li></ul></ul><ul><ul><li>Extract basic ideas for a new approach </li></ul></ul><ul><ul><li>Obtain some terms of comparison </li></ul></ul><ul><li>Define precisely which problem(s) we address </li></ul><ul><ul><li>Contextualize the problem </li></ul></ul><ul><ul><li>Focus on our needs </li></ul></ul><ul><li>Develop a new solution </li></ul><ul><ul><li>Theoretical background </li></ul></ul><ul><ul><li>Implementation and evaluation </li></ul></ul>
  8. 8. What is partitioning? <ul><li>Goal </li></ul><ul><ul><li>Divide a set of interrelated objects into a set of subsets </li></ul></ul><ul><ul><li>Optimize one or more specific objectives </li></ul></ul><ul><li>K-way partitioning </li></ul><ul><ul><li>Given a graph G=(V,E), partition V into k subsets V_1, ..., V_k such that they are pairwise disjoint and their union is V </li></ul></ul><ul><ul><li>Balance constraint: |V_i| ≈ |V|/k </li></ul></ul><ul><ul><li>Aims at minimizing (or maximizing) an objective function </li></ul></ul><ul><ul><ul><li>Edge-cut </li></ul></ul></ul><ul><ul><ul><li>Other objectives </li></ul></ul></ul><ul><li>In general: NP-complete </li></ul><ul><ul><li>Several heuristics that provide good results have been developed </li></ul></ul>
  9. 9. Existing approaches - a glance <ul><li>Traditional methods </li></ul><ul><ul><li>Kernighan-Lin and Fiduccia-Mattheyses heuristics </li></ul></ul><ul><ul><ul><li>Iterative-improvement algorithms </li></ul></ul></ul><ul><ul><ul><li>Begin with an initial partition and iteratively improve it </li></ul></ul></ul><ul><ul><ul><li>O(n^3) complexity </li></ul></ul></ul><ul><li>Iterative algorithms </li></ul><ul><ul><li>Genetic </li></ul></ul><ul><ul><li>Simulated annealing </li></ul></ul><ul><li>Multilevel algorithms </li></ul><ul><ul><li>Clustering -> Initial partitioning -> Refining </li></ul></ul><ul><ul><li>METIS/hMETIS suite: best current results for partitioning large flattened graphs </li></ul></ul>
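As an illustration of the iterative-improvement idea, the following sketch computes the swap gain used by Kernighan-Lin style heuristics: the reduction in edge-cut obtained by exchanging one vertex from each block. The adjacency dictionary and the tiny example graph are assumptions made for this sketch; a full pass (locking vertices, keeping the best prefix of swaps) is deliberately left out.

```python
def d_value(v, adj, part):
    """External minus internal connection weight of vertex v."""
    external = sum(w for u, w in adj[v].items() if part[u] != part[v])
    internal = sum(w for u, w in adj[v].items() if part[u] == part[v])
    return external - internal

def swap_gain(a, b, adj, part):
    """Edge-cut reduction if a and b (in different blocks) trade places."""
    return d_value(a, adj, part) + d_value(b, adj, part) - 2 * adj[a].get(b, 0)

# Hypothetical graph with two edges, a-b and c-d, both of weight 1.
adj = {"a": {"b": 1}, "b": {"a": 1}, "c": {"d": 1}, "d": {"c": 1}}
# Poor initial partition: both edges are cut.
part = {"a": 0, "c": 0, "b": 1, "d": 1}

print(swap_gain("c", "b", adj, part))  # exchanging c and b uncuts both edges
print(swap_gain("a", "b", adj, part))  # exchanging a and b changes nothing
```

An iterative-improvement pass would repeatedly pick the pair with the highest gain, which is where most of the running time goes.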
  10. 10. Going deeper into the problem <ul><li>Two kinds of multi-FPGA partition </li></ul><ul><ul><li>Topology-aware </li></ul></ul><ul><ul><ul><li>Architecture topology is an input </li></ul></ul></ul><ul><ul><ul><li>No optimization of the no. of FPGAs needed </li></ul></ul></ul><ul><ul><ul><li>Main task: association between the (larger) system graph and the (smaller) architectural graph </li></ul></ul></ul><ul><ul><li>Topology-free </li></ul></ul><ul><ul><ul><li>Architecture topology is not provided </li></ul></ul></ul><ul><ul><ul><li>Input: dimension and communication features of FPGAs </li></ul></ul></ul><ul><ul><ul><li>Minimization of the number of FPGAs </li></ul></ul></ul><ul><ul><ul><li>Place and route after partitioning </li></ul></ul></ul><ul><li>At the moment, we deal with the Topology-free problem </li></ul>
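For the topology-free case, a simple lower bound on the number of FPGAs follows from the total design size and the capacity of one device. The sketch below assumes a single scalar resource (for instance slices) and hypothetical module sizes; it ignores pin and communication limits, so a real partition may need more devices than this bound.

```python
import math

def min_fpgas(module_sizes, fpga_capacity):
    """Lower bound on the FPGA count: ceil(total size / capacity of one FPGA)."""
    if any(size > fpga_capacity for size in module_sizes):
        raise ValueError("a module is larger than a single FPGA")
    return math.ceil(sum(module_sizes) / fpga_capacity)

# Hypothetical module sizes and device capacity, in the same resource unit.
sizes = [300, 450, 200, 150]
print(min_fpgas(sizes, fpga_capacity=500))
```

With a total size of 1100 and a capacity of 500, at least three devices are needed.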
  11. 11. SPartA: the framework <ul><li>Input: VHDL system description </li></ul><ul><li>Output: several VHDL files, one for each block (FPGA) </li></ul><ul><li>Three main phases: </li></ul><ul><ul><li>Extract design from VHDL description </li></ul></ul><ul><ul><li>“Real” partitioning phase (core) </li></ul></ul><ul><ul><li>Build VHDL files </li></ul></ul>
  12. 12. SPartA: the idea <ul><li>Structural approach </li></ul><ul><ul><li>Fully exploits the design hierarchy (*) </li></ul></ul><ul><ul><li>Modules can be treated as single blocks </li></ul></ul><ul><ul><li>Bases for expansions toward dynamic reconfigurability </li></ul></ul><ul><li>Objectives </li></ul><ul><ul><li>Minimize cutsize </li></ul></ul><ul><ul><li>Minimize the number of FPGAs used </li></ul></ul><ul><ul><li>Preserve module integrity </li></ul></ul>(*) From Wen-Jong Fang and Allen C.-H. Wu, Multiway FPGA Partitioning by Fully Exploiting Design Hierarchy
  13. 13. SPartA: the algorithm 1/2 <ul><li>Recursive algorithm (deals with trees) </li></ul><ul><li>Starts from TOP node </li></ul><ul><li>Precondition </li></ul><ul><ul><li>No leaves with dimension > FPGA size </li></ul></ul><ul><li>At every moment, a node can be: </li></ul><ul><ul><li>COVERED, UNCOVERED or PARTIALLY COVERED </li></ul></ul><ul><li>Stop condition </li></ul><ul><ul><li>Node TOP is COVERED </li></ul></ul>
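The slide does not define the three node states precisely. The sketch below assumes one plausible reading: a node is COVERED when all leaves beneath it have been assigned to a partition, UNCOVERED when none have, and PARTIALLY COVERED otherwise. The `Node` class, the leaf sizes and the `assigned` set are hypothetical names introduced only for illustration, and the recursion shown is not the SPartA algorithm itself.

```python
from dataclasses import dataclass, field
from enum import Enum

class State(Enum):
    COVERED = "covered"
    UNCOVERED = "uncovered"
    PARTIALLY_COVERED = "partially covered"

@dataclass
class Node:
    name: str
    size: int = 0                      # resource usage, meaningful for leaves only
    children: list = field(default_factory=list)

def leaves(node):
    """Recursively collect the leaves of the design hierarchy below `node`."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]

def state(node, assigned):
    """Classify `node` by how many of its leaves are already assigned (assumed rule)."""
    below = leaves(node)
    done = sum(1 for leaf in below if leaf.name in assigned)
    if done == len(below):
        return State.COVERED
    if done == 0:
        return State.UNCOVERED
    return State.PARTIALLY_COVERED

def precondition_holds(top, fpga_size):
    """Precondition from the slide: no leaf is larger than one FPGA."""
    return all(leaf.size <= fpga_size for leaf in leaves(top))

# Hypothetical hierarchy: TOP -> m1(a, b), m2(c).
m1 = Node("m1", children=[Node("a", 40), Node("b", 30)])
m2 = Node("m2", children=[Node("c", 50)])
top = Node("TOP", children=[m1, m2])

assigned = {"a", "b"}
print(state(m1, assigned), state(m2, assigned), state(top, assigned))
```

Under this reading, the stop condition "Node TOP is COVERED" simply means every leaf of the hierarchy has been placed in some partition.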
  14. 14. SPartA: the algorithm 2/2 <ul><li>OPEN ISSUE: Selecting the first node to be inserted into an empty partition </li></ul><ul><ul><li>Random node </li></ul></ul><ul><ul><li>Node with overall max communication </li></ul></ul><ul><ul><li>Node with max communication with its siblings </li></ul></ul>
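The three candidate policies can be sketched as follows. The communication weights, the node names and the `siblings` mapping are hypothetical; how SPartA actually measures communication is not stated on the slide.

```python
import random

def total_comm(node, comm, among=None):
    """Sum the weights of the links touching `node`, optionally only towards `among`."""
    total = 0
    for (u, v), weight in comm.items():
        if node not in (u, v):
            continue
        other = v if u == node else u
        if among is None or other in among:
            total += weight
    return total

def pick_random(candidates, rng=random):
    """Policy 1: any node."""
    return rng.choice(sorted(candidates))

def pick_max_overall(candidates, comm):
    """Policy 2: node with the largest overall communication."""
    return max(sorted(candidates), key=lambda n: total_comm(n, comm))

def pick_max_sibling(candidates, comm, siblings):
    """Policy 3: node with the largest communication with its own siblings."""
    return max(sorted(candidates), key=lambda n: total_comm(n, comm, among=siblings[n]))

# Hypothetical weights: a, b, c are siblings; x lives elsewhere in the hierarchy.
comm = {("a", "b"): 5, ("a", "x"): 1, ("c", "x"): 9, ("b", "c"): 1}
siblings = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
candidates = ["a", "b", "c"]

print(pick_max_overall(candidates, comm))
print(pick_max_sibling(candidates, comm, siblings))
```

In this example the two deterministic policies disagree: `c` talks the most overall (because of its heavy link to `x`), while `b` talks the most with its siblings.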
  15. 15. Results 1/3 <ul><li>Complexity: exponential, due to the recursive nature of the algorithm </li></ul><ul><li>Execution time is nevertheless low (tens of seconds for a reasonably large design) </li></ul><ul><li>EXAMPLE </li></ul>ORIGINAL TREE PARTITIONED TREE
  16. 16. Results 2/3 <ul><li>Evaluation metrics </li></ul><ul><ul><li>EDGECUT, FILLING and SPLITS </li></ul></ul><ul><li>Evaluation of the three policies for node selection </li></ul><ul><ul><li>18 different trees of varying size </li></ul></ul>
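The slides name the metrics without defining them. The sketch below assumes that FILLING is the fraction of the used FPGAs' total capacity that is occupied, and that SPLITS counts the modules whose leaves end up in more than one partition (EDGECUT being the usual count of connections crossing partitions). These definitions, and all names and numbers in the example, are assumptions for illustration only.

```python
def filling(part, sizes, capacity):
    """Assumed FILLING: occupied resources / total capacity of the FPGAs in use."""
    used = {}
    for leaf, block in part.items():
        used[block] = used.get(block, 0) + sizes[leaf]
    return sum(used.values()) / (capacity * len(used))

def splits(modules, part):
    """Assumed SPLITS: number of modules spread over more than one partition."""
    return sum(1 for members in modules.values()
               if len({part[m] for m in members}) > 1)

# Hypothetical design: two modules of two leaves each, FPGAs of capacity 100.
sizes = {"a": 40, "b": 30, "c": 50, "d": 20}
modules = {"m1": ["a", "b"], "m2": ["c", "d"]}
intact = {"a": 0, "b": 0, "c": 1, "d": 1}    # every module stays whole
broken = {"a": 0, "b": 1, "c": 1, "d": 0}    # both modules are split

print(filling(intact, sizes, 100), splits(modules, intact))
print(filling(broken, sizes, 100), splits(modules, broken))
```

Both partitions use two FPGAs and therefore fill them equally, but only the first one preserves module integrity.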
  17. 17. Results 3/3
  18. 18. Future work <ul><li>Algorithm improvement </li></ul><ul><ul><li>Balancing of last partition </li></ul></ul><ul><ul><li>First node selection policies </li></ul></ul><ul><ul><li>More refined “score” function for selecting node </li></ul></ul><ul><ul><ul><li>Use closeness metrics </li></ul></ul></ul><ul><ul><li>Comparisons with existing algorithms </li></ul></ul><ul><li>Expansion </li></ul><ul><ul><li>SPartA framework development </li></ul></ul><ul><ul><li>Topology-aware partitioning </li></ul></ul>
  19. 19. The end ANY QUESTIONS?
  20. 20. Chimera Multi-FPGAs Architecture Definition Matteo Murgida [email_address]
  21. 21. Murgida - Outline <ul><li>Introduction </li></ul><ul><ul><li>Problem description </li></ul></ul><ul><ul><li>Project Goals </li></ul></ul><ul><ul><li>State of the Art </li></ul></ul><ul><li>Project in detail </li></ul><ul><ul><li>Contributions </li></ul></ul><ul><ul><li>Development </li></ul></ul><ul><ul><li>Results </li></ul></ul><ul><ul><li>Future Works </li></ul></ul><ul><li>Demo </li></ul>
  22. 22. Problem Description <ul><li>Architectural description of a distributed FPGA environment </li></ul><ul><li>Three-layer architecture </li></ul>
  23. 23. Project Goals <ul><li>Design the architecture of the most generic distributed system </li></ul><ul><ul><li>Node definition </li></ul></ul><ul><ul><li>Interface definition </li></ul></ul><ul><ul><li>Communication channel definition </li></ul></ul><ul><li>Design a communication protocol </li></ul><ul><ul><li>Essential protocol </li></ul></ul><ul><ul><li>Interrupt based protocol </li></ul></ul><ul><ul><li>Timeout improvement </li></ul></ul>
  24. 24. State of the Art <ul><li>CONFigurable ElecTronic TIssue (CONFETTI) by EPFL </li></ul><ul><ul><li>Cellular based architecture </li></ul></ul><ul><ul><li>PROs: high degree of parallelism, high computational power </li></ul></ul><ul><ul><li>CONs: no flexibility, oversized for small problems, small architectural customizations imply big cost/effort </li></ul></ul><ul><li>Splash 2 by IDA Supercomputing Center </li></ul><ul><ul><li>Architecture composed of a Sun SPARCstation host, an interface board and “Splash Array” boards </li></ul></ul><ul><ul><li>PROs: again high parallelism and power </li></ul></ul><ul><ul><li>CONs: a central host coordinates the computational units, no fault tolerance, no flexibility </li></ul></ul>
  25. 25. Contributions <ul><li>The proposed architecture: </li></ul><ul><ul><li>Allows several Spartan-3 Starter Boards to communicate and exchange data </li></ul></ul><ul><ul><li>It is portable to different FPGAs with minimum effort </li></ul></ul><ul><ul><li>It is the basic infrastructure that will allow external partial dynamic reconfiguration </li></ul></ul>
  26. 26. Board Study <ul><li>How to use resources like switches, LEDs and connectors on the board </li></ul><ul><li>How to map an IP-Core port to a physical pin of the board </li></ul><ul><li>Choice of the A2 Expansion Connector to connect two boards </li></ul>
  27. 27. MicroBlaze Communication <ul><li>Communication between two MicroBlaze soft-processors </li></ul><ul><li>Development of a display controller to visualize the data flow </li></ul>
  28. 28. GPIO Insertion <ul><li>Higher architecture portability through the use of the GPIO IP-Core. </li></ul>
  29. 29. Interrupt Controller Insertion <ul><li>Communication protocol improved with interrupt handling, so that the processor does not busy-wait </li></ul><ul><li>An Interrupt Controller is included in the architecture to allow detection and handling of multiple interrupts </li></ul>
  30. 30. Timeout <ul><li>Malfunctions due to interference on the communication channel lead to deadlocks </li></ul><ul><li>Communication protocol is not reliable at all </li></ul><ul><li>Counter implementation, including the driver used by the processor to clear raised interrupts </li></ul><ul><li>Development of a simple application to verify the correctness of the proposed approach </li></ul>
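The deadlock-avoidance idea can be illustrated with a small software analogue. The slides describe a hardware counter that raises an interrupt on the FPGA; the sketch below instead uses Python queues to stand in for the channel, and the function name, the acknowledgement token and the retry count are assumptions introduced for illustration, not part of the presented protocol.

```python
import queue

def send_with_timeout(data_out, ack_in, word, timeout_s=0.05, retries=3):
    """Send `word` and wait for an acknowledgement, giving up instead of deadlocking.

    Returns the attempt number on which the acknowledgement arrived,
    or None if every attempt timed out.
    """
    for attempt in range(1, retries + 1):
        data_out.put(word)                 # drive the word onto the channel
        try:
            ack_in.get(timeout=timeout_s)  # bounded wait replaces an endless one
            return attempt
        except queue.Empty:
            continue                       # timeout expired: retry the transfer
    return None

# Healthy channel: the peer has already acknowledged.
data, acks = queue.Queue(), queue.Queue()
acks.put("ACK")
print(send_with_timeout(data, acks, 0x2A))

# Disturbed channel: no acknowledgement ever arrives.
print(send_with_timeout(queue.Queue(), queue.Queue(), 0x2A, timeout_s=0.01))
```

Without the bounded wait, the second call would block forever, which is the deadlock the timeout counter is meant to prevent.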
  31. 31. Results <ul><li>A short Demo ... </li></ul>
  32. 32. Demo
  33. 33. Future Works <ul><li>Development of a SystemC/VHDL Co-Simulation Framework </li></ul><ul><li>Expert system integration </li></ul>
  34. 34. The end ANY QUESTIONS?