UIC Thesis Morandi

  1. Runtime Core Allocation Management for 2D Self Partially and Dynamically Reconfigurable Systems. By Massimo Morandi [email_address]. Thesis committee: John Lillis (Chair), Donatella Sciuto, Mitchell Theys. UIC Thesis Defense: May 9, 2008
  2. Rationale and Innovation <ul><li>Problem statement </li></ul><ul><ul><li>Providing runtime management support for 2D self, partial and dynamic reconfiguration, in particular for Core placement decisions </li></ul></ul><ul><li>Innovative contributions </li></ul><ul><ul><li>A fast and flexible solution </li></ul></ul><ul><ul><ul><li>Low complexity, to avoid introducing excessive overhead at runtime </li></ul></ul></ul><ul><ul><ul><li>Support for different scenarios and placement policies, according to user needs </li></ul></ul></ul><ul><ul><li>The possibility to exploit multiple shapes per Core, through integration with the area constraints definition </li></ul></ul>
  3. Aims <ul><li>The proposed solution must support different scenarios, placement policies and intervention from the designer </li></ul><ul><li>It must be fast compared to related solutions in the literature </li></ul><ul><li>The quality of the placement choices must be high, in terms of placement success rate, global application completion time or other user-defined metrics </li></ul>
  4. Outline <ul><li>Context Definition </li></ul><ul><li>Motivations and Goals </li></ul><ul><li>The Complete Polaris Workflow </li></ul><ul><li>Specific Contributions </li></ul><ul><li>Area Constraints Definition </li></ul><ul><ul><li>Proposed solution </li></ul></ul><ul><li>Runtime Core Allocation Management </li></ul><ul><ul><li>Features and Structure of an Allocation Manager </li></ul></ul><ul><ul><li>Relevant Works </li></ul></ul><ul><ul><li>Proposed Solution </li></ul></ul><ul><li>Results </li></ul><ul><li>Conclusions and Future Work </li></ul>
  5. Context Definition <ul><li>Reconfigurable hardware: </li></ul><ul><ul><li>Can change its configuration (functionality) according to user needs </li></ul></ul><ul><li>Self reconfiguration: </li></ul><ul><ul><li>The system must be completely autonomous at runtime </li></ul></ul><ul><li>Partial reconfiguration: </li></ul><ul><ul><li>Changes can also involve just a fraction of the device </li></ul></ul><ul><li>Dynamic reconfiguration: </li></ul><ul><ul><li>While a part of the hardware is reconfigured, the rest can continue its computation </li></ul></ul><ul><li>2D reconfiguration: </li></ul><ul><ul><li>Arbitrary rectangular slots can be dynamically reconfigured, as opposed to arbitrary columns in 1D </li></ul></ul>
  6. Field Programmable Gate Array <ul><li>Minimum granularity: </li></ul><ul><ul><li>Physical: the smallest unit that can be configured independently, which depends on the device (Tile) </li></ul></ul><ul><ul><li>Practical: since reconfiguration has a cost, it is reasonable to define a multiple of a Tile as the minimum reconfigurable unit (Slot) </li></ul></ul>
  7. A bit of Terminology <ul><li>Bitstream: </li></ul><ul><ul><li>Binary file defining the configuration of part or all of the reconfigurable device (FPGA) </li></ul></ul><ul><li>Core: </li></ul><ul><ul><li>Representation of a functionality, independent of shape and position (example: JPEG) </li></ul></ul><ul><li>RFU (Reconfigurable Functional Unit): </li></ul><ul><ul><li>A Core to which area constraints have been applied (example: JPEG constrained in a 2x3 rectangle) </li></ul></ul><ul><li>A partial bitstream defines an RFU implemented at a specific position, identified by its bottom-left corner </li></ul><ul><li>With bitstream relocation, the same bitstream can be reused for all positions </li></ul>
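To make the role of relocation concrete, the sketch below lists the bottom-left corners at which a single RFU could be placed on an empty device. It assumes the device can be treated as a homogeneous grid of slots with (0, 0) at the bottom-left corner and ignores occupied slots; the function name `feasible_origins` is hypothetical. Without relocation each of these positions would need its own partial bitstream, while with relocation one bitstream per RFU covers all of them.

```python
def feasible_origins(rows: int, cols: int, h: int, w: int) -> list[tuple[int, int]]:
    """Bottom-left corners (x, y) where an h x w RFU fits on an empty rows x cols slot grid.

    Hypothetical helper: assumes a homogeneous grid and no occupied slots.
    """
    if h > rows or w > cols:
        return []  # the RFU is larger than the device in at least one dimension
    # x ranges over columns, y over rows; the RFU must stay inside the grid.
    return [(x, y) for y in range(rows - h + 1) for x in range(cols - w + 1)]
```

For example, a 2x3 RFU (h = 2, w = 3) on a 4-row, 6-column grid has (6 - 3 + 1) * (4 - 2 + 1) = 12 candidate corners, all served by the same relocatable bitstream.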
  8. A bit of Terminology
  9. Virtual homogeneity
  10. What’s next
  11. Motivations and goals <ul><li>Creating and managing a self partially and dynamically reconfigurable system is a complex problem </li></ul><ul><ul><li>This is even more critical when exploiting the 2D reconfiguration paradigm </li></ul></ul><ul><ul><li>More issues arise in the definition of area constraints and in Core allocation decisions </li></ul></ul><ul><ul><li>Since the system must be autonomous, it also needs runtime management functionalities </li></ul></ul><ul><li>Need for automation in these processes </li></ul><ul><ul><li>To reduce the workload on the designer </li></ul></ul><ul><ul><li>To improve the efficiency of the final reconfigurable system </li></ul></ul>
  12. Motivations and goals <ul><li>Creation of an automated workflow to generate a self dynamically reconfigurable architecture that: </li></ul><ul><ul><li>Has “good” area constraints assigned to Cores </li></ul></ul><ul><ul><li>Is autonomous in making 2D runtime Core allocation decisions </li></ul></ul><ul><ul><li>Exploits relocation to ensure that the system can obtain the configuration bitstreams it needs at runtime </li></ul></ul><ul><ul><li>Supports intervention from the designer, to guide or constrain the decisions </li></ul></ul><ul><ul><li>Keeps high flexibility and generality </li></ul></ul>
  13. The Complete Workflow <ul><li>Workflow to automate the creation and management of self dynamically reconfigurable architectures </li></ul><ul><li>Input: user specifications </li></ul><ul><li>Final output: complete architecture generation </li></ul>
  14. Specific Contributions <ul><li>In particular, this thesis deals with the solution identification phase of the flow </li></ul><ul><li>This involves: </li></ul><ul><ul><li>Defining area constraints for Cores when the user does not specify them </li></ul></ul><ul><ul><li>Creating Core allocation management solutions able to efficiently manage runtime Core placement </li></ul></ul><ul><li>This last task includes: </li></ul><ul><ul><li>Offering high versatility, supporting different placement policies and different scenarios </li></ul></ul><ul><ul><li>Keeping complexity low, to avoid excessive overhead in the running time of the system </li></ul></ul><ul><ul><li>Experimenting with techniques to improve efficiency, for example allowing multiple shapes per Core </li></ul></ul>
  15. What’s Next
  16. Area Constraints Definition <ul><li>The designer can choose whether to specify the area constraints (AC) for each Core in the application </li></ul><ul><ul><li>If not specified, they are computed automatically </li></ul></ul><ul><li>The designer can also choose whether to allow multiple shapes per Core (and how many) </li></ul><ul><li>Finally, the last parameter represents the tightness of the constraints that will be defined, which impacts: </li></ul><ul><ul><li>The feasibility of the implementation </li></ul></ul><ul><ul><li>The performance of the RFU </li></ul></ul>(Figure: Core → RFU, or set of RFUs)
  17. Area Constraints Definition <ul><li>The constraints are defined with a simple heuristic </li></ul><ul><li>First, a square-like constraint is defined using these formulae: </li></ul><ul><ul><li>where H is the height (in slices), W is the width (in slices), S is the number of slices of the Core and m is the tightness </li></ul></ul>
  18. Area Constraints Definition <ul><li>Then, the constraints are converted from slices to slots </li></ul><ul><ul><li>where Vg is a granularity parameter, Vslices is the number of vertical slices in the device and avgH is the average height of all the RFUs defined with the square-like formula </li></ul></ul><ul><li>Finally, the constraints (in slots) are iteratively altered, stretching the Core horizontally or vertically to obtain multiple RFUs </li></ul>
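The slides do not spell out the stretching rule of the final step, so the sketch below is only one plausible interpretation, not the thesis' exact heuristic. It assumes the base shape is already expressed in slots, changes the width by +1, -1, +2, -2, ... and picks the smallest height that still covers the base area; the name `stretched_shapes` and this rule are hypothetical.

```python
import math


def stretched_shapes(h: int, w: int, n: int, rows: int, cols: int) -> list[tuple[int, int]]:
    """Return up to n (height, width) slot shapes for one Core, base shape first.

    Hypothetical rule: each variant widens or narrows the base shape and takes the
    smallest height whose area still covers the base area h * w.
    """
    area = h * w
    shapes = [(h, w)]
    delta = 1
    while len(shapes) < n and delta <= max(rows, cols):
        for new_w in (w + delta, w - delta):  # horizontal stretch, then vertical stretch
            if len(shapes) >= n or new_w < 1 or new_w > cols:
                continue
            new_h = math.ceil(area / new_w)
            if new_h <= rows and (new_h, new_w) not in shapes:
                shapes.append((new_h, new_w))
        delta += 1
    return shapes
```

For a 2x3 base shape, asking for 3 shapes yields [(2, 3), (2, 4), (3, 2)]: one wider and one taller variant, each covering at least the original 6 slots. Requesting 1, 3 or 5 shapes mirrors the configurations evaluated in Experiment 3.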
  19. What’s next
  20. Runtime Core Allocation Management <ul><li>The Problem: </li></ul><ul><ul><li>Choose where to place new Cores on the reconfigurable area </li></ul></ul><ul><ul><li>In an online scenario: self, partial and dynamic reconfiguration </li></ul></ul><ul><li>The Goal: </li></ul><ul><ul><li>Allow efficient usage of the FPGA area </li></ul></ul><ul><ul><li>This is critical in the 2D reconfiguration case </li></ul></ul><ul><li>This requires the creation of an allocation management solution and suitable policies </li></ul>
  21. Allocation Manager Desired Features <ul><li>Low Core Rejection Rate (CRR) </li></ul><ul><ul><li>Percentage of Cores that are not successfully placed in time </li></ul></ul><ul><li>Fast application completion time </li></ul><ul><ul><li>Time from the arrival of the first Core to the completion of the last one </li></ul></ul><ul><li>Low fragmentation grade </li></ul><ul><ul><li>Fraction of area that is unusable because it is too sparse </li></ul></ul><ul><li>Small management overhead </li></ul><ul><ul><li>A lightweight solution is needed to run inside the system </li></ul></ul><ul><li>High routing efficiency </li></ul><ul><ul><li>If interacting Cores are clustered, the system is more efficient </li></ul></ul><ul><li>A good compromise among these features is needed </li></ul>
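The first two metrics follow directly from their definitions. The sketch below computes them from per-Core outcomes, assuming integer time steps and representing a rejected Core with `end = None`; the `CoreOutcome` record is hypothetical. The fragmentation grade is left out because the slide gives no formula for it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CoreOutcome:
    arrival: int          # time step at which the Core arrived
    end: Optional[int]    # completion time step, or None if the Core was rejected


def core_rejection_rate(outcomes: list[CoreOutcome]) -> float:
    """Percentage of Cores that were not successfully placed in time."""
    rejected = sum(1 for o in outcomes if o.end is None)
    return 100.0 * rejected / len(outcomes)


def completion_time(outcomes: list[CoreOutcome]) -> int:
    """Time from the arrival of the first Core to the completion of the last placed one."""
    first = min(o.arrival for o in outcomes)
    last = max(o.end for o in outcomes if o.end is not None)
    return last - first
```

With four Cores of which one is rejected, the CRR is 25%; the completion time only considers Cores that were actually placed.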
  22. Example: 2D fragmentation <ul><li>The 2D fragmentation problem: </li></ul><ul><ul><li>The area is generally more fragmented </li></ul></ul><ul><ul><li>Fragmentation can nullify the area optimizations obtained </li></ul></ul>
  23. Example: Core Rejection <ul><li>Bad choices can lead to performance loss and rejection </li></ul><ul><ul><li>A: Core C is successfully placed at step 2 </li></ul></ul><ul><ul><li>B: Core C is delayed (possibly rejected, if deadline = 2) </li></ul></ul>
  24. Considered Scenarios <ul><li>Dynamic Schedule </li></ul><ul><ul><li>Cores can arrive at any time </li></ul></ul><ul><ul><li>Each Core has an ASAP and an ALAP time (derived from dependencies) </li></ul></ul><ul><ul><li>Rejection: failure to respect the ALAP time of a Core </li></ul></ul><ul><ul><li>Goal: respect the schedule; CRR is the most important metric and should tend to zero </li></ul></ul><ul><li>Blind Schedule </li></ul><ul><ul><li>Cores can either be available from the start or arrive at different times; no dependencies are assumed </li></ul></ul><ul><ul><li>No ASAP; Cores can optionally have a deadline </li></ul></ul><ul><ul><li>If a Core is not placed, placement is retried later </li></ul></ul><ul><ul><li>Goal: the application must complete as fast as possible; total time, not rejection, is the main issue </li></ul></ul>
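The two scenarios differ mainly in what happens to a Core that cannot be placed. A minimal sketch of that decision, assuming integer time steps and that a Core failing exactly at its ALAP time (or deadline) cannot be placed later; the `Scenario` enum and `on_placement_failure` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class Scenario(Enum):
    DYNAMIC = "dynamic schedule"
    BLIND = "blind schedule"


def on_placement_failure(scenario: Scenario, now: int,
                         alap: Optional[int] = None,
                         deadline: Optional[int] = None) -> str:
    """Return "retry" or "reject" for a Core that could not be placed at time `now`."""
    if scenario is Scenario.DYNAMIC:
        # The Core can still be placed while its ALAP time has not been reached.
        return "retry" if now < alap else "reject"
    # Blind schedule: always retry later, unless an optional deadline has been reached.
    if deadline is not None and now >= deadline:
        return "reject"
    return "retry"
```

In the blind schedule without deadlines a Core is never rejected, which is why total completion time, rather than CRR, becomes the metric of interest there.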
  25. Allocation Manager Creation <ul><li>Choose how to maintain information on empty space </li></ul><ul><ul><li>Keep all information (more expensive but more accurate) </li></ul></ul><ul><ul><li>Heuristically prune information (cheaper) </li></ul></ul><ul><li>Choose a placement policy: </li></ul><ul><ul><li>General (First Fit, Best Fit, Worst Fit…) </li></ul></ul><ul><ul><li>Focused (Fragmentation Aware, Routing Aware…) </li></ul></ul><ul><li>Define the scenario(s) in which the manager will work </li></ul><ul><li>It can also be useful to consider and exploit different shapes of a Core (multiple RFUs per Core scenario) </li></ul>
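The general policies differ only in which fitting empty rectangle they select. A sketch over a plain list of empty rectangles, assuming "best" and "worst" are measured by the leftover area; the `Rect` tuple and `choose_rect` are hypothetical, and other leftover measures (e.g. leftover perimeter) are equally possible.

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    x: int  # bottom-left column, in slots
    y: int  # bottom-left row, in slots
    h: int  # height, in slots
    w: int  # width, in slots


def choose_rect(empty: list[Rect], h: int, w: int, policy: str) -> Optional[Rect]:
    """Pick an empty rectangle able to host an h x w RFU according to a general policy."""
    fits = [r for r in empty if r.h >= h and r.w >= w]
    if not fits:
        return None
    if policy == "first_fit":
        return fits[0]  # first fitting rectangle in list order
    if policy == "best_fit":
        return min(fits, key=lambda r: r.h * r.w - h * w)  # least leftover area
    if policy == "worst_fit":
        return max(fits, key=lambda r: r.h * r.w - h * w)  # most leftover area
    raise ValueError(f"unknown policy: {policy}")
```

All three policies scan the candidate list once, so their cost is linear in the number of empty rectangles kept by the manager.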
  26. What’s next
  27. Relevant Works <ul><li>Maintain complete information on empty space: </li></ul><ul><li>KAMER: K. Bazargan, R. Kastner and M. Sarrafzadeh, “Fast template placement for reconfigurable computing systems”, IEEE Design and Test of Computers, Vol. 17, 2000. </li></ul><ul><ul><li>Keep All Maximally Empty Rectangles </li></ul></ul><ul><ul><li>Apply a general placement policy </li></ul></ul><ul><li>CUR: A. Ahmadinia, C. Bobda, S. P. Fekete, J. Teich and J. v.d. Veen, “Optimal Routing-Conscious Dynamic Placement for Reconfigurable Devices”, Field-Programmable Logic and Applications (FPL'04), 2004. </li></ul><ul><ul><li>Maintain the Contour of a Union of Rectangles </li></ul></ul><ul><ul><li>Apply a focused placement policy </li></ul></ul>
  28. Relevant Works <ul><li>Heuristically prune part of the information: </li></ul><ul><li>KNER: K. Bazargan, R. Kastner and M. Sarrafzadeh, “Fast template placement for reconfigurable computing systems”, IEEE Design and Test of Computers, Vol. 17, 2000. </li></ul><ul><ul><li>Keep Non-overlapping Empty Rectangles </li></ul></ul><ul><ul><li>Apply a general placement policy </li></ul></ul><ul><li>2D-HASHING: H. Walder, C. Steiger and M. Platzner, “Fast Online Task Placement on FPGAs: Free Space Partitioning and 2D-Hashing”, International Parallel and Distributed Processing Symposium (IPDPS'03), 2003. </li></ul><ul><ul><li>Keep Non-overlapping Empty Rectangles in an optimized data structure </li></ul></ul><ul><ul><li>Apply (exclusively) a general placement policy </li></ul></ul>
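The core of a KNER-style manager is how the free space left around a newly placed RFU is split into non-overlapping empty rectangles. The sketch below places the RFU at the bottom-left corner of an empty rectangle and splits the leftover L-shaped area with either a horizontal or a vertical cut. The rule used to pick the cut (keep the largest remaining rectangle as large as possible) is an assumption for illustration, not necessarily the one used by KNER; `Rect` and `kner_split` are hypothetical names.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: int
    y: int
    h: int
    w: int


def kner_split(r: Rect, h: int, w: int) -> list[Rect]:
    """Place an h x w RFU at the bottom-left of r; return the non-overlapping empty remainders."""
    if h > r.h or w > r.w:
        raise ValueError("RFU does not fit in the rectangle")
    # Horizontal cut: a full-width strip above the RFU plus a block to its right.
    horizontal = [Rect(r.x, r.y + h, r.h - h, r.w), Rect(r.x + w, r.y, h, r.w - w)]
    # Vertical cut: a full-height strip to the right plus a block above the RFU.
    vertical = [Rect(r.x + w, r.y, r.h, r.w - w), Rect(r.x, r.y + h, r.h - h, w)]

    def largest(parts: list[Rect]) -> int:
        return max(p.h * p.w for p in parts)

    # Assumed heuristic: keep the split whose biggest piece is larger.
    best = horizontal if largest(horizontal) >= largest(vertical) else vertical
    return [p for p in best if p.h > 0 and p.w > 0]  # drop zero-area pieces
```

Unlike KAMER, which keeps every maximal empty rectangle (possibly overlapping), this keeps at most two new rectangles per placement, trading some placement quality for a smaller, cheaper-to-scan list.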
  29. Example: Empty Space Information
  30. Evaluation <ul><li>The solutions with higher placement quality also have higher complexity </li></ul><ul><li>The fastest solution (2D-Hashing) cannot exploit focused policies, such as Routing Aware, and adds the overhead of maintaining the 2D hashing structure </li></ul><ul><li>CUR does not support all general policies; for example, Best Fit is not allowed </li></ul>
  31. What’s next
  32. Proposed Approach <ul><li>The choice is driven by: </li></ul><ul><ul><li>The need for a low complexity solution, introducing low overhead at runtime in the self reconfigurable system </li></ul></ul><ul><ul><li>The desire to keep high flexibility, to suit user needs also in terms of placement policies </li></ul></ul><ul><li>For these reasons we propose a heuristic (KNER-like) empty space manager: </li></ul><ul><ul><li>Supporting general and focused placement policies (in particular, First Fit, Best Fit and Routing Aware) </li></ul></ul><ul><ul><li>Suitable for both dynamic schedule and blind schedule scenarios </li></ul></ul><ul><ul><li>Exploiting multiple RFUs per Core, to improve results </li></ul></ul>
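A Routing Aware policy needs some cost that rewards placing a Core near the Cores it communicates with. As an illustrative assumption (the slides do not give the exact cost), the sketch below uses the total Manhattan distance between RFU centres; `routing_aware_choice` is a hypothetical name.

```python
from typing import Optional


def routing_aware_choice(
    candidates: list[tuple[int, int]],
    h: int,
    w: int,
    partners: list[tuple[int, int, int, int]],
) -> Optional[tuple[int, int]]:
    """Pick the bottom-left corner (x, y) that minimizes routing cost.

    candidates: feasible corners for an h x w RFU.
    partners: (x, y, h, w) of already placed Cores that communicate with the new one.
    Assumed cost: sum of Manhattan distances between RFU centres.
    """
    def cost(pos: tuple[int, int]) -> float:
        cx, cy = pos[0] + w / 2, pos[1] + h / 2
        return sum(abs(cx - (px + pw / 2)) + abs(cy - (py + ph / 2))
                   for px, py, ph, pw in partners)

    # min() keeps the first candidate on ties, so with no partners this reduces to First Fit.
    return min(candidates, key=cost) if candidates else None
```

Because the cost is evaluated once per candidate, the policy keeps the same linear scan as First Fit and Best Fit, which is what allows general and focused policies to share one empty space manager.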
  33. Data Representation <ul><li>Core, defined by: </li></ul><ul><ul><li>Arrival time </li></ul></ul><ul><ul><li>Set of RFUs, each one with: </li></ul></ul><ul><ul><ul><li>H, W, Latency </li></ul></ul></ul><ul><ul><li>Optional set of communicating Cores (if using a Routing Aware policy) </li></ul></ul><ul><ul><li>ASAP and ALAP (if in the dynamic schedule scenario) </li></ul></ul><ul><li>Two queues: </li></ul><ul><ul><li>One for new Cores </li></ul></ul><ul><ul><li>One for Cores that were not successfully placed and need re-examination </li></ul></ul>
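The Core description maps naturally onto a pair of record types plus two FIFO queues. A minimal sketch, assuming integer slot sizes and time steps; field names follow the slide where possible, while `name` and the queue variables are hypothetical additions.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RFU:
    h: int        # height, in slots
    w: int        # width, in slots
    latency: int  # execution time once configured


@dataclass
class Core:
    name: str                                          # hypothetical identifier
    arrival: int                                       # arrival time
    rfus: list[RFU]                                    # one entry per allowed shape
    partners: list[str] = field(default_factory=list)  # communicating Cores (Routing Aware only)
    asap: Optional[int] = None                         # dynamic schedule scenario only
    alap: Optional[int] = None                         # dynamic schedule scenario only


new_cores: deque[Core] = deque()    # Cores that have just arrived
retry_cores: deque[Core] = deque()  # Cores not yet placed, to be re-examined
```

A JPEG Core with two allowed shapes would be `Core("JPEG", arrival=0, rfus=[RFU(2, 3, 10), RFU(3, 2, 10)])`, appended to `new_cores` on arrival.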
  34. Data Representation <ul><li>Reconfigurable Device, represented as: </li></ul><ul><ul><li>A binary tree structure: each node is a Rectangle, each leaf is an empty Rectangle </li></ul></ul><ul><ul><li>Navigation through: </li></ul></ul><ul><ul><ul><li>Pointers to the left child, right child and next leaf </li></ul></ul></ul><ul><ul><ul><li>A function to find the previous leaf (used for bookkeeping after rectangle split and merge operations) </li></ul></ul></ul><ul><li>Rectangle, defined by: </li></ul><ul><ul><li>Coordinates on the device: X, Y </li></ul></ul><ul><ul><li>Size: H, W </li></ul></ul><ul><ul><li>Initially there is only one Rectangle, the root, with: </li></ul></ul><ul><ul><ul><li>(X,Y) = (0,0), H = FPGA rows, W = FPGA columns </li></ul></ul></ul>
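The device structure can be sketched as a binary tree whose empty leaves are chained through a `next_leaf` pointer, with a linear `previous_leaf` lookup for bookkeeping. In this sketch, placing an RFU always uses a horizontal cut and the merge operation is omitted. A Rectangle filled exactly ends up childless but is unlinked from the leaf list. These simplifications and all names (`Node`, `Device`, `place`) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(eq=False)  # identity comparison: nodes are distinct objects
class Node:
    """A Rectangle of the device; nodes reachable through next_leaf are the empty leaves."""
    x: int
    y: int
    h: int
    w: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    next_leaf: Optional["Node"] = None


class Device:
    def __init__(self, rows: int, cols: int) -> None:
        # Initially the whole FPGA is a single empty Rectangle: the root.
        self.root = Node(0, 0, rows, cols)
        self.head: Optional[Node] = self.root  # first empty leaf

    def leaves(self) -> Iterator[Node]:
        node = self.head
        while node is not None:
            yield node
            node = node.next_leaf

    def previous_leaf(self, target: Node) -> Optional[Node]:
        """Linear scan for the empty leaf preceding `target` (None if it is the first)."""
        prev = None
        for node in self.leaves():
            if node is target:
                return prev
            prev = node
        raise ValueError("target is not an empty leaf")

    def place(self, leaf: Node, h: int, w: int) -> None:
        """Occupy the bottom-left h x w corner of `leaf`; split the rest with a horizontal cut."""
        if h > leaf.h or w > leaf.w:
            raise ValueError("RFU does not fit in this leaf")
        prev = self.previous_leaf(leaf)
        top = Node(leaf.x, leaf.y + h, leaf.h - h, leaf.w)
        right = Node(leaf.x + w, leaf.y, h, leaf.w - w)
        parts = [n for n in (top, right) if n.h > 0 and n.w > 0]
        # The old leaf becomes an internal node whose children are its empty remainders.
        leaf.left = parts[0] if parts else None
        leaf.right = parts[1] if len(parts) > 1 else None
        # Splice the new empty leaves into the leaf list where the old leaf was.
        successor = leaf.next_leaf
        for a, b in zip(parts, parts[1:] + [successor]):
            a.next_leaf = b
        replacement = parts[0] if parts else successor
        if prev is None:
            self.head = replacement
        else:
            prev.next_leaf = replacement
        leaf.next_leaf = None
```

Every operation walks the leaf list at most once, so placing a Core stays linear in the number of empty Rectangles.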
  35. The Online Placement Algorithm: the whole processing of a Core is completed in linear time
  36. The Online Placement Algorithm
  37. The Online Placement Algorithm
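The algorithm itself is shown only as figures, so the loop below is a simplified sketch of one time step, not the thesis' exact procedure. It assumes a First Fit scan over a flat list of empty rectangles, tries every allowed shape of a Core, splits the used rectangle with a horizontal cut, and re-examines postponed Cores before new arrivals (that ordering is an assumption). `place_core` and `step` are hypothetical names. With a bounded number of shapes per Core, each Core costs one linear pass.

```python
from collections import deque
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    x: int
    y: int
    h: int
    w: int


def place_core(empty: list[Rect], shapes: list[tuple[int, int]]) -> Optional[tuple[int, int, int, int]]:
    """First Fit over (empty rectangle, shape) pairs; returns (x, y, h, w) or None."""
    for i, r in enumerate(empty):
        for h, w in shapes:
            if h <= r.h and w <= r.w:
                top = Rect(r.x, r.y + h, r.h - h, r.w)
                right = Rect(r.x + w, r.y, h, r.w - w)
                # Replace the used rectangle with its non-empty remainders.
                empty[i:i + 1] = [p for p in (top, right) if p.h > 0 and p.w > 0]
                return (r.x, r.y, h, w)
    return None


def step(empty: list[Rect], new_cores: deque, retry_cores: deque) -> list:
    """One time step: re-examine postponed Cores first, then the new arrivals."""
    placed, postponed = [], deque()
    for queue in (retry_cores, new_cores):
        while queue:
            name, shapes = queue.popleft()
            spot = place_core(empty, shapes)
            if spot is None:
                postponed.append((name, shapes))  # keep it for the next step
            else:
                placed.append((name, spot))
    retry_cores.extend(postponed)
    return placed
```

In this example, a second shape lets Core B fit where its first shape would not, which is the effect the multiple-shapes experiment measures through the CRR.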
  38. What’s next
  39. Evaluation of the proposed solution <ul><li>To evaluate the quality of the proposed approach in various scenarios and with different metrics, three kinds of experiments were performed: </li></ul><ul><li>1) A comparison against the literature solutions presented earlier </li></ul><ul><ul><li>In a dynamic schedule scenario </li></ul></ul><ul><ul><li>With a Routing Aware placement policy </li></ul></ul><ul><ul><li>Measuring CRR (and indirectly fragmentation), routing costs and computational overhead </li></ul></ul><ul><ul><li>Results published in: </li></ul></ul><ul><li>M. MORANDI, M. Novati, M. D. Santambrogio, D. Sciuto, “Core allocation and relocation management for a self dynamically reconfigurable architecture”, IEEE Computer Society Annual Symposium on VLSI, 2008 </li></ul>
  40. Evaluation of the proposed solution <ul><li>2) A measure of application completion time </li></ul><ul><ul><li>For applications composed of real Cores used as benchmarks </li></ul></ul><ul><ul><li>In a blind schedule scenario </li></ul></ul><ul><ul><li>Directly measuring application completion time, while gaining some insight on CRR and fragmentation </li></ul></ul><ul><li>3) Evaluation of the multiple shapes per Core approach </li></ul><ul><ul><li>Comparison between our solution with multiple shapes and KNER (adapted to the blind schedule scenario) </li></ul></ul><ul><ul><li>In a mixed scenario (blind schedule with deadlines and variable arrival times) </li></ul></ul><ul><ul><li>Using both First Fit and Best Fit </li></ul></ul><ul><ul><li>Measuring CRR and running time </li></ul></ul>
  41. Experiment 1: Routing Aware <ul><li>A version of our general solution: </li></ul><ul><ul><li>Tailored to minimize routing paths </li></ul></ul><ul><ul><li>Compared with similar solutions from the literature </li></ul></ul><ul><ul><li>Named RALP (Routing Aware Linear Placer) in the table </li></ul></ul><ul><li>Benchmark of 100 randomly generated tasks: </li></ul><ul><ul><li>Size from 5% to 20% of the FPGA, randomly interconnected </li></ul></ul>
  42. Experiment 2: Application Completion Time <ul><li>Benchmark applications composed of Cores taken from opencores.org, such as JPEG, AES and 3DES </li></ul><ul><li>Measure the number of time instants needed to complete the applications with different amounts of resources </li></ul><ul><li>Results with infinite resources are also shown, as the lower bound for comparison </li></ul>
  43. Experiment 3: Multiple Shapes <ul><li>Similar benchmark, but Cores have deadlines (to measure CRR) </li></ul><ul><li>Shapes defined using the heuristic described previously </li></ul><ul><li>Compared with 1 shape, running time increases on average by 30% with 3 shapes and by 40% with 5 shapes </li></ul><ul><li>CRR is more than halved, often reduced to one third </li></ul>
  44. Numerical Example <ul><li>To give an idea of the quality of the obtained results, it is useful to consider some numerical values for reconfiguration </li></ul><ul><li>Consider a JPEG Core, described by a 690 Kb configuration bitstream for a V4 device and using about 10% of the total area </li></ul><ul><ul><li>Reconfiguration time: 150 ms </li></ul></ul><ul><ul><li>Relocation time: 90 ms </li></ul></ul><ul><ul><li>Placement time: 0.4 ms </li></ul></ul><ul><li>The placement time is low compared with the reconfiguration and relocation times, and is suitable for actual usage in a real system </li></ul>
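As a rough check of the last bullet, assuming that relocation, reconfiguration and placement for the JPEG Core are performed one after the other, placement accounts for a negligible share of the time:

```latex
\frac{t_{\mathrm{place}}}{t_{\mathrm{reconf}}}
  = \frac{0.4\,\mathrm{ms}}{150\,\mathrm{ms}} \approx 0.27\%,
\qquad
\frac{t_{\mathrm{place}}}{t_{\mathrm{reloc}} + t_{\mathrm{reconf}} + t_{\mathrm{place}}}
  = \frac{0.4\,\mathrm{ms}}{(90 + 150 + 0.4)\,\mathrm{ms}} \approx 0.17\%
```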
  45. Concluding Remarks <ul><li>The proposed solution offers: </li></ul><ul><ul><li>High versatility, supporting different placement policies and scenarios, designer intervention and multiple shapes </li></ul></ul><ul><ul><li>Low overhead, always processing a Core in linear time and obtaining good results compared with the literature </li></ul></ul><ul><ul><li>Good CRR, especially when exploiting multiple shapes </li></ul></ul><ul><ul><li>Fast application completion time, as shown by Experiment 2 </li></ul></ul><ul><ul><li>Effective routing cost reduction, when used in conjunction with a Routing Aware policy (Experiment 1) </li></ul></ul><ul><li>The original goals were met </li></ul><ul><li>Under review: </li></ul><ul><li>S. Corbetta, M. MORANDI, M. Novati, M. D. Santambrogio, D. Sciuto, P. Spoletini, “Internal and External Bitstream Relocation for Partial Dynamic Reconfiguration”, IEEE Transactions on VLSI (2nd review) </li></ul>
  46. Future Work <ul><li>Future work will focus on integration with the rest of the workflow briefly introduced earlier </li></ul><ul><li>The parts described here achieved good results as stand-alone components in the runtime management of the reconfigurable system; it is important to evaluate them also inside the complete workflow </li></ul><ul><li>The final goal is to achieve complete automation in the creation of a self dynamically reconfigurable architecture, from user specification up to bitstream and processor code generation </li></ul>
  47. General Information <ul><li>Webpage </li></ul><ul><ul><li>www.dresd.org/polaris </li></ul></ul><ul><li>Mailing List </li></ul><ul><ul><li>[email_address] </li></ul></ul><ul><li>Contact </li></ul><ul><ul><li>For more information regarding Polaris: </li></ul></ul><ul><ul><ul><li>[email_address] </li></ul></ul></ul><ul><ul><li>For a complete list of information on how to contact us: </li></ul></ul><ul><ul><ul><li>www.dresd.org/contact_polaris </li></ul></ul></ul>
  48. Questions
