UIC Thesis Morandi


Transcript

  • 1. Runtime Core Allocation Management for 2D Self Partially and Dynamically Reconfigurable Systems
    • By Massimo Morandi [email_address]
    • Thesis committee: John Lillis (Chair), Donatella Sciuto, Mitchell Theys
    • UIC Thesis Defense: May 9, 2008
  • 2. Rationale and Innovation
    • Problem statement
      • Providing runtime management support for 2D self partial and dynamical reconfiguration, in particular for Core placement decisions
    • Innovative contributions
      • A fast and flexible solution
        • Low complexity, to avoid introducing too much overhead at runtime
        • Supporting different scenarios and placement policies, according to user needs
      • Allowing multiple shapes per Core to be exploited, through integration with area constraints definition
  • 3. Aims
    • Our proposed solution must support different scenarios, placement policies and intervention from the designer
    • It must be fast when compared to related solutions existing in literature
    • The quality of the placement choices must be high, in terms of percentage of placement success, global application completion time or other metrics, as defined by the user
  • 4. Outline
    • Context Definition
    • Motivations and Goals
    • The Complete Polaris Workflow
    • Specific Contributions
    • Area Constraints Definition
      • Proposed solution
    • Runtime Core Allocation Management
      • Features and Structure of an Allocation Manager
      • Relevant Works
      • Proposed Solution
    • Results
    • Conclusions and Future Work
  • 5. Context Definition
    • Reconfigurable hardware:
      • Has the capability of changing its configuration (functionality) according to user needs
    • Self reconfiguration:
      • the system must be completely autonomous at runtime
    • Partial reconfiguration:
      • the changes can also involve fractions of the device
    • Dynamical Reconfiguration:
      • if a part of the hardware is reconfigured, the rest can continue its computation
    • 2D Reconfiguration:
      • arbitrary rectangular slots can be dynamically reconfigured, as opposed to arbitrary columns in 1D
  • 6. Field Programmable Gate Array
    • Minimum Granularity:
      • Physical: there is a minimum unit that can be configured independently, depending on the device (Tile)
      • Practical: since reconfiguration has a cost, it is reasonable to define a multiple of a Tile as the minimum reconfigurable unit (Slot)
  • 7. A bit of Terminology
    • Bitstream:
      • Binary file defining the configuration of part or all of the reconfigurable device (FPGA)
    • Core:
      • Representation of a functionality, independent of shape and position (example: JPEG)
    • RFU (Reconfigurable Functional Unit):
      • A Core to which area constraints have been applied (example: JPEG constrained in a 2x3 rectangle)
    • A partial bitstream defines an RFU, implemented in a specific position defined by its bottom-left corner
    • The same bitstream can be reused for all positions if we exploit bitstream relocation
  • 8. A bit of Terminology
  • 9. Virtual homogeneity
  • 10. What’s next
    • Context Definition
    • Motivations and Goals
    • The Complete Polaris Workflow
    • Specific Contributions
    • Area Constraints Definition
      • Proposed solution
    • Runtime Core Allocation Management
      • Features and Structure of an Allocation Manager
      • Relevant Works
      • Proposed Solution
    • Results
    • Conclusions and Future Work
  • 11. Motivations and goals
    • The creation and management of a self partially and dynamically reconfigurable system is a complex problem
      • this is even more critical when exploiting the 2D reconfiguration paradigm
      • more issues arise in the definition of area constraints and in core allocation decisions
      • since the system must be autonomous, it also needs runtime management functionalities
    • Need for automation in those processes
      • to reduce the workload on the designer
      • to improve efficiency of the final reconfigurable system
  • 12. Motivations and goals
    • Creation of an automated workflow to generate a self dynamically reconfigurable architecture that:
      • Has “good” area constraints assigned to cores
      • Is autonomous in performing 2D runtime core allocation decisions
      • Exploits relocation to ensure that the system can obtain the configuration bitstreams it needs at runtime
      • Supports intervention from the designer, to guide or constrain the decisions
      • Keeps high flexibility and generality
  • 13. The Complete Workflow
    • Workflow to automate the creation and management of self dynamically reconfigurable architectures
    • Input: user specifications
    • Final output: complete architecture generation
  • 14. Specific Contributions
    • In particular, this thesis deals with the solution identification phase of the flow
    • This involves:
      • The definition of area constraints for Cores, when the user does not specify them
      • The creation of Core Allocation Management solutions, able to efficiently manage runtime Core placement
    • This last task includes:
      • Offering high versatility, supporting different placement policies and different scenarios
      • Keeping low complexity, to avoid too much overhead in the running time of the system
      • Experimenting with techniques to improve efficiency, for example allowing multiple shapes per Core
  • 15. What’s Next
    • Context Definition
    • Motivations and Goals
    • The Complete Polaris Workflow
    • Specific Contributions
    • Area Constraints Definition
      • Proposed solution
    • Runtime Core Allocation Management
      • Features and Structure of an Allocation Manager
      • Relevant Works
      • Proposed Solution
    • Results
    • Conclusions and Future Work
  • 16. Area Constraints Definition
    • The designer can choose whether or not to specify the area constraints (AC) for each Core in the application
      • If not specified, they are automatically computed
    • The designer can also choose whether to allow multiple shapes per Core (and how many)
    • Finally, the last parameter represents the tightness of the constraints that will be defined:
      • Impacts on feasibility of implementation
      • Impacts on performance of the RFU
    • [Figure: Core mapped to an RFU (or set of RFUs)]
  • 17. Area Constraints Definition
    • The constraints are defined with a simple heuristic
    • First a square-like constraint is defined, using these formulae:
      • Where H is the height (in slices) and W is the width, S is the number of slices of the Core and m is the tightness
  • 18. Area Constraints Definition
    • Then, the constraints are converted from slices to slots
      • Where Vg is a granularity parameter, Vslices is the number of vertical slices in the device and avgH is the average height of all the RFUs defined with the square-like formula
    • Finally, the constraints (in slots) are iteratively altered to horizontally or vertically stretch the Core and obtain multiple RFUs
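The stretching step can be illustrated with a small sketch. The slide's formulae are not preserved in this transcript, so the rule below is purely an assumption: starting from a base shape in slots, each alternative shape grows one side by a step and shrinks the other to the smallest value that still covers the original area. The function name `stretched_shapes` and its parameters are hypothetical.

```python
from math import ceil

def stretched_shapes(h, w, count, max_h, max_w):
    """Return up to `count` shapes (H, W) in slots, each covering at least h*w slots.

    Assumed rule (not from the slides): alternate vertical and horizontal
    stretches of increasing size, keeping shapes inside a max_h x max_w device.
    """
    area = h * w
    shapes = [(h, w)]  # the base, square-like shape comes first
    for step in range(1, max(max_h, max_w) + 1):
        candidates = [
            (h + step, ceil(area / (h + step))),  # vertical stretch
            (ceil(area / (w + step)), w + step),  # horizontal stretch
        ]
        for shape in candidates:
            if len(shapes) == count:
                return shapes
            fits_device = shape[0] <= max_h and shape[1] <= max_w
            if fits_device and shape not in shapes:
                shapes.append(shape)
    return shapes

# Three shapes for a 3x3 Core on a 10x10-slot device
print(stretched_shapes(3, 3, 3, 10, 10))
```

Each generated shape becomes one RFU of the same Core, so the allocation manager has several rectangles to try at runtime.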
  • 19. What’s next
    • Context Definition
    • Motivations and Goals
    • The Complete Polaris Workflow
    • Specific Contributions
    • Area Constraints Definition
      • Proposed solution
    • Runtime Core Allocation Management
      • Features and Structure of an Allocation Manager
      • Relevant Works
      • Proposed Solution
    • Results
    • Conclusions and Future Work
  • 20. Runtime Core Allocation Management
    • The Problem:
      • Perform the choice of where to place new cores on the reconfigurable area
      • In an online scenario: self partial and dynamical reconfiguration
    • The Goal:
      • Allow efficient usage of the FPGA area
      • Critical in the 2D reconfiguration case
    • This requires the creation of a solution for allocation management and suitable policies
  • 21. Allocation Manager Desired Features
    • Low Core Rejection Rate (CRR)
      • Percentage of Cores that are not successfully placed in time
    • Fast application completion time
      • Time from arrival of first Core to completion of last
    • Low fragmentation grade
      • Fraction of area that is unusable because the free space is too scattered
    • Small management overhead
      • We want a lightweight solution to run inside the system
    • High routing efficiency
      • If interacting cores are clustered, the system is more efficient
    • Need to find a good compromise between them
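As a minimal sketch, the first two metrics above can be computed as follows. The function names and the input representation (one boolean per Core, times in arbitrary units) are assumptions made for illustration, not part of the presentation.

```python
def core_rejection_rate(placed_in_time):
    """CRR: percentage of Cores that were not successfully placed in time.

    `placed_in_time` holds one boolean per Core (True = placed in time).
    """
    if not placed_in_time:
        return 0.0
    rejected = sum(1 for ok in placed_in_time if not ok)
    return 100.0 * rejected / len(placed_in_time)

def application_completion_time(first_arrival, last_completion):
    """Time from the arrival of the first Core to the completion of the last."""
    return last_completion - first_arrival

# One rejected Core out of four
print(core_rejection_rate([True, True, False, True]))
print(application_completion_time(first_arrival=0, last_completion=42))
```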
  • 22. Example: 2D fragmentation
    • the 2D-fragmentation problem:
      • Area generally more fragmented
      • Can nullify the area optimizations obtained
  • 23. Example: Core Rejection
    • Bad choices can lead to performance loss and rejection
      • A: Core C is successfully placed at step 2
      • B: Core C is delayed (possibly rejected, if deadline=2)
  • 24. Considered Scenarios
    • Dynamic Schedule
      • Cores can arrive at any time
      • Have an ASAP and an ALAP time (dependencies)
      • Rejection: failure to respect ALAP for a Core
      • Goal: respect the schedule, CRR is the most important metric and should tend to zero
    • Blind Schedule
      • Cores can be either available from the start or arrive at different times, no dependencies assumed
      • no ASAP, Cores can optionally have a deadline
      • If a Core is not placed, retry later
      • Goal: application must complete as fast as possible; rejection is not the main issue, total time is
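The difference between the two scenarios can be sketched as a single decision made after each placement attempt. This assumes discrete time steps and treats the ALAP time (dynamic schedule) and the optional deadline (blind schedule) uniformly as `deadline`; the names `Outcome` and `classify_attempt` are hypothetical.

```python
from enum import Enum

class Outcome(Enum):
    PLACED = "placed"
    RETRY = "retry"
    REJECTED = "rejected"

def classify_attempt(now, slot_found, deadline=None):
    """Decide what happens to a Core after one placement attempt.

    deadline=None models a blind-schedule Core without a deadline: it is
    simply retried later. With a deadline (or ALAP), a failed attempt at or
    after that time step counts as a rejection.
    """
    if slot_found:
        return Outcome.PLACED
    if deadline is not None and now >= deadline:
        return Outcome.REJECTED
    return Outcome.RETRY

print(classify_attempt(now=2, slot_found=False, deadline=2))
print(classify_attempt(now=2, slot_found=False))
```

With `deadline=2`, a Core that cannot be placed at step 2 is rejected, which matches the Core Rejection example on slide 23; without a deadline it goes back to be retried.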
  • 25. Allocation Manager Creation
    • Choose how to maintain information on empty space
      • Keep all information (Expensive but more accurate)
      • Heuristically prune information (Cheaper)
    • Which placement policy to choose:
      • General (First Fit, Best Fit, Worst Fit…)
      • Focused (Fragmentation Aware, Routing Aware… )
    • Define in which scenario(s) the manager will work
    • It can also be useful to consider and exploit different shapes of a Core (multiple RFUs per Core scenario)
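The general policies named above can be sketched over a plain list of empty rectangles. The tuple layout `(x, y, h, w)` and the use of leftover area as the fit measure are assumptions for illustration; the presentation does not specify them.

```python
def fits(rect, h, w):
    """True if a Core of h rows by w columns fits in rect = (x, y, h, w)."""
    return rect[2] >= h and rect[3] >= w

def leftover(rect, h, w):
    """Area left unused if the Core is placed in rect."""
    return rect[2] * rect[3] - h * w

def first_fit(free, h, w):
    """First empty rectangle, in list order, that can host the Core."""
    return next((r for r in free if fits(r, h, w)), None)

def best_fit(free, h, w):
    """Feasible rectangle that leaves the least area unused."""
    feasible = [r for r in free if fits(r, h, w)]
    return min(feasible, key=lambda r: leftover(r, h, w), default=None)

def worst_fit(free, h, w):
    """Feasible rectangle that leaves the most area unused."""
    feasible = [r for r in free if fits(r, h, w)]
    return max(feasible, key=lambda r: leftover(r, h, w), default=None)

free_space = [(0, 0, 4, 4), (4, 0, 2, 3), (0, 4, 6, 6)]
print(first_fit(free_space, 2, 3))
print(best_fit(free_space, 2, 3))
print(worst_fit(free_space, 2, 3))
```

First Fit stops at the first feasible rectangle, while Best Fit and Worst Fit inspect every candidate, which is part of the speed versus quality trade-off discussed in the following slides.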
  • 26. What’s next
    • Context Definition
    • Motivations and Goals
    • The Complete Polaris Workflow
    • Specific Contributions
    • Area Constraints Definition
      • Proposed solution
    • Runtime Core Allocation Management
      • Features and Structure of an Allocation Manager
      • Relevant Works
      • Proposed Solution
    • Results
    • Conclusions and Future Work
  • 27. Relevant Works
    • Maintain complete information on empty space:
    • KAMER: K. Bazargan, R. Kastner and M. Sarrafzadeh, ''Fast template placement for reconfigurable computing systems'', IEEE Design and Test of Computers, Vol.17, 2000.
      • Keep All Maximally Empty Rectangles
      • Apply a general placement policy
    • CUR: A. Ahmadinia and C. Bobda and S. P. Fekete and J. Teich and J. v.d. Veen, ''Optimal Routing-Conscious Dynamic Placement for Reconfigurable Devices'', Field-Programmable Logic and Applications (FPL'04), 2004.
      • Maintain the Contour of a Union of Rectangles
      • Apply a focused placement policy
  • 28. Relevant Works
    • Heuristically prune part of the information:
    • KNER: K. Bazargan, R. Kastner and M. Sarrafzadeh, ''Fast template placement for reconfigurable computing systems'', IEEE Design and Test of Computers, Vol.17, 2000.
      • Keep Non-overlapping Empty Rectangles
      • Apply a general placement policy
    • 2D-HASHING: H. Walder and C. Steiger and M. Platzner, ''Fast Online Task Placement on FPGAs: Free Space Partitioning and 2D-Hashing'', International Parallel and Distributed Processing Symposium (IPDPS'03), 2003.
      • Keep Non-overlapping Empty Rectangles in an optimized data structure
      • Apply (exclusively) a general placement policy
  • 29. Example: Empty Space Information
  • 30. Evaluation
    • The solutions with higher placement quality also have higher complexity
    • The fastest solution cannot exploit focused policies, for example routing aware, and adds the overhead of maintaining the 2D hashing structure
    • CUR does not support all general policies, for example Best Fit is not allowed
  • 31. What’s next
    • Context Definition
    • Motivations and Goals
    • The Complete Polaris Workflow
    • Specific Contributions
    • Area Constraints Definition
      • Proposed solution
    • Runtime Core Allocation Management
      • Features and Structure of an Allocation Manager
      • Relevant Works
      • Proposed Solution
    • Results
    • Conclusions and Future Work
  • 32. Proposed Approach
    • Choice driven by:
      • Need for a low complexity solution to introduce low overhead at runtime in the self reconfigurable system
      • Desire to keep high flexibility, to suit user needs also in terms of placement policies
    • For these reasons we propose a heuristic (KNER-like) empty space manager:
      • Supporting general and focused placement policies (in particular, First Fit, Best Fit and Routing Aware)
      • Suitable for both dynamic schedule and blind schedule scenarios
      • Exploiting multiple RFUs per Core, to improve results
  • 33. Data Representation
    • Core, defined by:
      • Arrival time,
      • Set of RFUs, each one with:
        • H, W, Latency
      • Optional set of communicating Cores (if using RA)
      • ASAP and ALAP (if in dynamic schedule scenario)
    • Two queues:
      • one for new Cores
      • one for Cores that were not successfully placed and need reexamination
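The Core description and the two queues could be modelled roughly as follows. The field names, the `name` identifier, and the rule of moving a Core to the retry queue after a failed attempt are assumptions; only the listed attributes come from the slide.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class RFU:
    h: int        # height, in slots
    w: int        # width, in slots
    latency: int  # execution latency of this shape

@dataclass
class Core:
    name: str                      # hypothetical identifier
    arrival_time: int
    rfus: List[RFU]                # one entry per allowed shape
    communicating_cores: List[str] = field(default_factory=list)  # used by Routing Aware only
    asap: Optional[int] = None     # dynamic schedule scenario only
    alap: Optional[int] = None     # dynamic schedule scenario only

new_cores = deque()    # Cores that have just arrived
retry_cores = deque()  # Cores that were not placed and need reexamination

jpeg = Core("JPEG", arrival_time=0, rfus=[RFU(2, 3, 10), RFU(3, 2, 12)])
new_cores.append(jpeg)

# A failed placement attempt moves the Core to the retry queue.
core = new_cores.popleft()
retry_cores.append(core)
print(len(new_cores), len(retry_cores))
```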
  • 34. Data Representation
    • Reconfigurable Device, represented as:
      • Binary Tree structure, each node is a Rectangle, each leaf is an empty Rectangle.
      • Navigation through:
        • pointers to left child, right child, next leaf
        • a function to find the previous leaf (used for bookkeeping after rectangle split and merge operations)
    • Rectangle, defined by:
      • Coordinates on device: X, Y
      • Size: H, W
      • Initially one, the root, with:
        • (X,Y)=(0,0), H=FPGA Rows, W=FPGA Cols
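A minimal sketch of such a tree is shown below. It makes several assumptions that are not in the slides: Cores are placed at the bottom-left corner of a leaf, the remaining L-shaped area is always split into a rectangle above the Core and a full-height rectangle to its right, a recursive traversal stands in for the next-leaf pointers, and merging after Core removal is omitted. Zero-area leaves are kept in the tree but skipped during the search.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple

@dataclass
class Node:
    x: int  # column of the bottom-left corner
    y: int  # row of the bottom-left corner
    h: int  # height in rows
    w: int  # width in columns
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def empty_leaves(node: Node) -> Iterator[Node]:
    """Yield every leaf with non-zero area; each one is an empty rectangle."""
    if node.left is None and node.right is None:
        if node.h > 0 and node.w > 0:
            yield node
        return
    for child in (node.left, node.right):
        if child is not None:
            yield from empty_leaves(child)

def place_first_fit(root: Node, h: int, w: int) -> Optional[Tuple[int, int]]:
    """Place an h x w Core in the first leaf that fits and split that leaf."""
    for leaf in empty_leaves(root):
        if leaf.h >= h and leaf.w >= w:
            # Remaining space: one rectangle above the Core, one to its right.
            leaf.left = Node(leaf.x, leaf.y + h, leaf.h - h, w)
            leaf.right = Node(leaf.x + w, leaf.y, leaf.h, leaf.w - w)
            return (leaf.x, leaf.y)
    return None

root = Node(0, 0, h=4, w=6)  # H = FPGA rows, W = FPGA columns
print(place_first_fit(root, 2, 3))
print(place_first_fit(root, 2, 3))
print(place_first_fit(root, 4, 3))
print(place_first_fit(root, 1, 1))
```

Because the two children never overlap, the search visits each empty rectangle at most once per Core, which is in the spirit of the KNER-like pruning described earlier.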
  • 35. The Online Placement Algorithm
    • The whole processing of a Core is completed in linear time
  • 36. The Online Placement Algorithm
  • 37. The Online Placement Algorithm
  • 38. What’s next
    • Context Definition
    • Motivations and Goals
    • The Complete Polaris Workflow
    • Specific Contributions
    • Area Constraints Definition
      • Proposed solution
    • Runtime Core Allocation Management
      • Features and Structure of an Allocation Manager
      • Relevant Works
      • Proposed Solution
    • Results
    • Conclusions and Future Work
  • 39. Evaluation of the proposed solution
    • To evaluate the quality of the proposed approach in various scenarios and with different metrics, three kinds of experiments were performed:
    • 1) A comparison against presented literature solutions
      • In a dynamic schedule scenario
      • With a Routing Aware placement policy
      • Measuring CRR (and indirectly fragmentation), routing costs and computational overhead
      • Results published in:
    • M. Morandi, M. Novati, M. D. Santambrogio, D. Sciuto, “Core allocation and relocation management for a self dynamically reconfigurable architecture”, IEEE Computer Society Annual Symposium on VLSI, 2008
  • 40. Evaluation of the proposed solution
    • 2) A measurement of application completion time
      • On applications composed of real Cores, used as benchmarks
      • In a blind schedule scenario
      • Directly measuring application completion time, gaining some insight on CRR and fragmentation
    • 3) Evaluation of the multiple shapes per Core approach
      • Comparison between our solution with multiple shapes and KNER (adapted to blind schedule scenario)
      • In a mixed scenario (blind schedule with deadlines and variable arrival times)
      • Using both First Fit and Best Fit
      • Measure of CRR and running time
  • 41. Experiment 1: Routing Aware
    • Version of our general solution:
      • Tailored to minimize routing paths
      • Compared with close solutions from literature
      • Named RALP (Routing Aware Linear Placer) in the table
    • Benchmark of 100 randomly generated tasks:
      • Size (5% to 20% of FPGA), randomly interconnected
  • 42. Experiment 2: Appl. Completion Time
    • Benchmark applications composed of cores taken from opencores.org like JPEG, AES, 3DES
    • Measure the number of time instants needed to complete the applications with different amounts of resources
    • The infinite-resources case is shown, to compare against the lower bound
  • 43. Experiment 3: Multiple Shapes
    • Similar benchmark, but Cores have deadlines (for CRR)
    • Shapes defined using the heuristic described previously
    • Runtime is on average 30% higher with 3 shapes and 40% higher with 5 shapes, compared with 1 shape
    • CRR is more than halved, often reduced to one third
  • 44. Numerical Example
    • To put the obtained results in perspective, it is useful to give some numerical values for reconfiguration
    • Let us consider a JPEG Core, described by a 690 Kb configuration bitstream for a V4 device and using about 10% of the total area
      • Reconfiguration time: 150 ms
      • Relocation time: 90 ms
      • Placement time: 0.4 ms
    • The obtained placement time is low and is suitable for use in a real system
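The figures above can be compared directly. The short calculation below only relates the placement time to the other two times quoted on the slide; it does not assume how the three phases are sequenced in the real system, and the helper name `overhead_percent` is hypothetical.

```python
RECONFIGURATION_MS = 150.0
RELOCATION_MS = 90.0
PLACEMENT_MS = 0.4

def overhead_percent(placement_ms, other_ms):
    """Placement time expressed as a percentage of another phase's time."""
    return 100.0 * placement_ms / other_ms

vs_reconfiguration = overhead_percent(PLACEMENT_MS, RECONFIGURATION_MS)
vs_relocation = overhead_percent(PLACEMENT_MS, RELOCATION_MS)
print(f"placement vs reconfiguration: {vs_reconfiguration:.2f}%")
print(f"placement vs relocation: {vs_relocation:.2f}%")
```

The placement decision therefore costs well under 1% of either the reconfiguration or the relocation time for this Core.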
  • 45. Concluding Remarks
    • The proposed solution offers:
      • High versatility, supporting different placement policies and scenarios, designer intervention, multiple shapes
      • Low overhead, always processing a Core in linear time and obtaining good results compared with literature
      • Good CRR, especially when exploiting multiple shapes
      • Fast application completion time, as shown by exp. 2
      • Effective routing costs reduction, when used in conjunction with a Routing Aware policy (exp. 1)
    • The original goals were met
    • Under Review:
    • S. Corbetta, M. Morandi, M. Novati, M. D. Santambrogio, D. Sciuto, P. Spoletini, “Internal and External Bitstream Relocation for Partial Dynamic Reconfiguration”, IEEE Transactions on VLSI (2nd review)
  • 46. Future Work
    • Future work will focus on integration with the rest of the workflow that was briefly introduced
    • The parts that were described achieved good results as stand-alone components for the runtime management of the reconfigurable system; it is important to evaluate them inside the complete workflow as well
    • The final goal is to achieve complete automation in the creation process of a self dynamically reconfigurable architecture, from user specification up to bitstreams and processor code generation
  • 47. General Information
    • Webpage
      • www.dresd.org/polaris
    • Mailing List
      • [email_address]
    • Contact
      • For more information regarding Polaris:
        • [email_address]
      • For a complete list of information on how to contact us:
        • www.dresd.org/contact_polaris
  • 48. Questions