3rd 3DDRESD: Polaris
Presentation Transcript

  • Runtime Core Allocation Management for 2D Self Partially and Dynamically Reconfigurable Systems, by Massimo Morandi [email_address], 3D DRESD, 29/07/08
  • Rationale and Innovation
    • Problem statement
      • Providing runtime management support for 2D self partial and dynamical reconfiguration, in particular for Core placement decisions
    • Innovative contributions
      • A fast and flexible solution
        • Low complexity, to avoid introducing too much overhead at runtime
        • Supporting different scenarios and placement policies, according to user needs
      • Allowing the possibility to exploit multiple shapes per Core by integration with area constraints definition
  • Aims
    • Our proposed solution must support different scenarios, placement policies and intervention from the designer
    • It must be fast when compared to related solutions existing in literature
    • The quality of the placement choices must be high, in terms of percentage of placement success, global application completion time or other metrics, as defined by the user
  • Outline
    • Context Definition
    • Motivations and Goals
    • Specific Contributions to Polaris
    • Area Constraints Definition
      • Proposed solution
    • Runtime Core Allocation Management
      • Features and Structure of an Allocation Manager
      • Relevant Works
      • Proposed Solution
    • Results
    • Conclusions and Future Work
  • Context Definition
    • Reconfigurable hardware:
      • Has the capability of changing its configuration (functionality) according to user needs
    • Self reconfiguration:
      • the system must be completely autonomous at runtime
    • Partial reconfiguration:
      • the changes can also involve fractions of the device
    • Dynamical Reconfiguration:
      • if a part of the hardware is reconfigured, the rest can continue its computation
    • 2D Reconfiguration:
      • arbitrary rectangular slots can be dynamically reconfigured, as opposed to arbitrary columns in 1D
  • A bit of Terminology
  • Motivations and goals
    • The creation and management of a self partially and dynamically reconfigurable system is a complex problem
      • this is even more critical when exploiting the 2D reconfiguration paradigm
      • more issues arise in the definition of area constraints and in the core allocation decisions
      • since the system must be autonomous, it also needs runtime management functionalities
    • Need for automation in those processes
      • to reduce the workload on the designer
      • to improve efficiency of the final reconfigurable system
  • Motivations and goals
    • Creation of an automated workflow to generate a self dynamically reconfigurable architecture that:
      • Has “good” area constraints assigned to cores
      • Is autonomous in performing 2D runtime core allocation decisions
      • Exploits relocation to ensure that the system can obtain the configuration bitstreams it needs at runtime
      • Supports intervention from the designer, to guide or constrain the decisions
      • Keeps high flexibility and generality
  • Specific Contributions to Polaris
    • Solution identification phase of the flow:
      • The definition of area constraints for Cores, when the user does not specify them
      • The creation of Core Allocation Management solutions, able to efficiently manage runtime Core placement
    • This last task includes:
      • Offering high versatility, supporting different placement policies and different scenarios
      • Keeping low complexity, to avoid too much overhead in the running time of the system
      • Experimenting with techniques to improve efficiency, for example allowing multiple shapes per Core
  • Area Constraints Definition
    • The designer can choose whether to specify the area constraints (AC) for each Core in the application
      • If not specified, they are automatically computed
    • The designer can also choose whether to allow multiple shapes per Core (and how many)
    • Finally, the last parameter represents the tightness of the constraints that will be defined:
      • Impacts on feasibility of implementation
      • Impacts on performance of the RFU
    • (Figure: Core → RFU, or set of RFUs)
  • Area Constraints Definition
    • The constraints are defined with a simple heuristic
    • First a square-like constraint is defined, using these formulae:
      • Where H is the height (in slices), W is the width, S is the number of slices of the Core and m is the tightness
  • Area Constraints Definition
    • Then, the constraints are converted from slices to slots
      • Where Vg is a granularity parameter, Vslices is the number of vertical slices in the device and avgH is the average height of all the RFUs defined with the square-like formula
    • Finally, the constraints (in slots) are iteratively altered to horizontally or vertically stretch the Core and obtain multiple RFUs
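The stretching step can be illustrated with a small sketch. The slides do not give the exact rule, so this is only one plausible reading: each iteration shrinks one side by a slot and grows the other just enough to keep at least the original area. The function name `stretch_shapes` and all of its parameters are hypothetical.

```python
import math

def stretch_shapes(h, w, max_shapes, max_h, max_w):
    """Return up to max_shapes (height, width) shapes, in slots.

    Assumed rule (not from the slides): shrink one side by k slots and
    grow the other so the area never drops below the original h * w.
    """
    area = h * w
    shapes = [(h, w)]
    k = 1
    while len(shapes) < max_shapes and (h - k >= 1 or w - k >= 1):
        candidates = []
        if h - k >= 1:  # horizontal stretch: shorter and wider
            candidates.append((h - k, math.ceil(area / (h - k))))
        if w - k >= 1:  # vertical stretch: narrower and taller
            candidates.append((math.ceil(area / (w - k)), w - k))
        for ch, cw in candidates:
            fits_device = ch <= max_h and cw <= max_w
            if fits_device and (ch, cw) not in shapes and len(shapes) < max_shapes:
                shapes.append((ch, cw))
        k += 1
    return shapes

# A 4x4-slot Core with three allowed shapes on a 10x10-slot device.
print(stretch_shapes(4, 4, 3, 10, 10))
```

Each returned shape would correspond to one RFU of the Core; the number of shapes is the designer-chosen parameter mentioned earlier.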
  • Runtime Core Allocation Management
    • The Problem:
      • Perform the choice of where to place new cores on the reconfigurable area
      • In an online scenario: self partial and dynamical reconfiguration
    • The Goal:
      • Allow efficient usage of the FPGA area
      • Critical in the 2D reconfiguration case
    • This requires the creation of a solution for allocation management and suitable policies
  • Allocation Manager Desired Features
    • Low Core Rejection Rate (CRR)
      • Percentage of cores that are not successfully placed in time
    • Fast application completion time
      • Time from arrival of first Core to completion of last
    • Low fragmentation grade
      • Fraction of the area that is unusable because it is too scattered
    • Small management overhead
      • We want a lightweight solution to run inside the system
    • High routing efficiency
      • If interacting cores are clustered, the system is more efficient
    • Need to find a good compromise between them
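Two of these metrics follow directly from their definitions above. The sketch below is illustrative only; the function names and the input format (one boolean per Core, plus lists of arrival and finish times) are assumptions, not part of the presentation.

```python
def core_rejection_rate(placed_flags):
    """CRR: percentage of cores that were not successfully placed in time."""
    if not placed_flags:
        return 0.0
    rejected = sum(1 for placed in placed_flags if not placed)
    return 100.0 * rejected / len(placed_flags)

def completion_time(arrivals, finishes):
    """Time from the arrival of the first Core to the completion of the last."""
    return max(finishes) - min(arrivals)

# Four cores, one of which missed its deadline.
print(core_rejection_rate([True, True, False, True]))
print(completion_time([0, 2, 5], [4, 9, 7]))
```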
  • Example: 2D fragmentation
    • the 2D-fragmentation problem:
      • Area generally more fragmented
      • Can nullify the area optimizations obtained
  • Example: Core Rejection
    • Bad choices can lead to performance loss and rejection
      • A: Core C is successfully placed at step 2
      • B: Core C is delayed (possibly rejected, if deadline=2)
  • Considered Scenarios
    • Dynamic Schedule
      • Cores can arrive at any time
      • Have an ASAP and an ALAP time (dependencies)
      • Rejection: failure to respect ALAP for a Core
      • Goal: respect the schedule, CRR is the most important metric and should tend to zero
    • Blind Schedule
      • Cores can be either available from the start or arrive at different times, no dependencies assumed
      • no ASAP, Cores can optionally have a deadline
      • If a Core is not placed, retry later
      • Goal: the application must complete as fast as possible; rejection is not the main issue, total time is
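The difference between the two scenarios shows up in what happens when a Core cannot be placed. The sketch below assumes discrete time steps and treats ALAP (dynamic schedule) or the optional deadline (blind schedule) as the latest step at which the Core may still be placed; the `Request` type and `on_placement_failure` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Request:
    name: str
    alap: Optional[int] = None      # dynamic schedule: latest allowed step
    deadline: Optional[int] = None  # blind schedule: optional deadline

def on_placement_failure(req, now, scenario):
    """Return 'reject' or 'retry' for a Core that did not fit at step `now`."""
    if scenario == "dynamic":
        limit = req.alap
    elif scenario == "blind":
        limit = req.deadline
    else:
        raise ValueError(f"unknown scenario: {scenario}")
    # A retry can succeed at step now + 1 at the earliest (assumption).
    if limit is not None and now + 1 > limit:
        return "reject"
    return "retry"

# Core C fails at step 2 with deadline 2, as in the Core Rejection example.
print(on_placement_failure(Request("C", alap=2), 2, "dynamic"))
```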
  • Allocation Manager Creation
    • Choose how to maintain information on empty space
      • Keep all information (Expensive but more accurate)
      • Heuristically prune information (Cheaper)
    • Which placement policy to choose:
      • General (First Fit, Best Fit, Worst Fit…)
      • Focused (Fragmentation Aware, Routing Aware… )
    • Define in which scenario(s) the manager will work
    • It can also be useful to consider and exploit different shapes of a Core (multiple RFUs per Core scenario)
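The general policies differ only in which fitting empty rectangle they pick. A minimal sketch, assuming empty space is kept as a list of `(x, y, height, width)` tuples (a representation chosen here for illustration):

```python
def _fits(rect, h, w):
    """True if an h x w core fits inside the free rectangle."""
    return rect[2] >= h and rect[3] >= w

def first_fit(free_rects, h, w):
    """First free rectangle, in list order, that can host the core."""
    for rect in free_rects:
        if _fits(rect, h, w):
            return rect
    return None

def best_fit(free_rects, h, w):
    """Smallest-area free rectangle that can host the core."""
    fitting = [r for r in free_rects if _fits(r, h, w)]
    return min(fitting, key=lambda r: r[2] * r[3], default=None)

def worst_fit(free_rects, h, w):
    """Largest-area free rectangle that can host the core."""
    fitting = [r for r in free_rects if _fits(r, h, w)]
    return max(fitting, key=lambda r: r[2] * r[3], default=None)

free = [(0, 0, 4, 8), (0, 8, 3, 3), (4, 0, 6, 6)]
print(first_fit(free, 3, 3), best_fit(free, 3, 3), worst_fit(free, 3, 3))
```

First Fit stops at the first match, so it is the cheapest; Best Fit and Worst Fit always scan every rectangle.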
  • Relevant Works
    • Maintain complete information on empty space:
    • KAMER: K. Bazargan, R. Kastner and M. Sarrafzadeh, ''Fast template placement for reconfigurable computing systems'', IEEE Design and Test of Computers, Vol.17, 2000.
      • Keep All Maximally Empty Rectangles
      • Apply a general placement policy
    • CUR: A. Ahmadinia, C. Bobda, S. P. Fekete, J. Teich and J. v.d. Veen, ''Optimal Routing-Conscious Dynamic Placement for Reconfigurable Devices'', Field-Programmable Logic and Applications (FPL'04), 2004.
      • Maintain the Contour of a Union of Rectangles
      • Apply a focused placement policy
  • Relevant Works
    • Heuristically prune part of the information:
    • KNER: K. Bazargan, R. Kastner and M. Sarrafzadeh, ''Fast template placement for reconfigurable computing systems'', IEEE Design and Test of Computers, Vol.17, 2000.
      • Keep Non-overlapping Empty Rectangles
      • Apply a general placement policy
    • 2D-HASHING: H. Walder, C. Steiger and M. Platzner, ''Fast Online Task Placement on FPGAs: Free Space Partitioning and 2D-Hashing'', International Parallel and Distributed Processing Symposium (IPDPS'03), 2003.
      • Keep Non-overlapping Empty Rectangles in an optimized data structure
      • Apply (exclusively) a general placement policy
  • Example: Empty Space Information
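The difference between keeping complete and pruned information can be seen when a core is placed in the corner of an empty rectangle. The sketch below is a simplified illustration, not the bookkeeping of the cited algorithms: rectangles are `(x, y, height, width)` tuples, the core goes in the corner at `(x, y)`, and both function names are hypothetical.

```python
def split_maximal(rect, h, w):
    """KAMER-like view: keep both maximal empty rectangles (they overlap)."""
    x, y, H, W = rect
    out = []
    if W > w:
        out.append((x + w, y, H, W - w))  # beside the core, full height
    if H > h:
        out.append((x, y + h, H - h, W))  # past the core, full width
    return out

def split_non_overlapping(rect, h, w, horizontal_cut=True):
    """KNER-like view: one cut, two empty rectangles that do not overlap."""
    x, y, H, W = rect
    out = []
    if horizontal_cut:
        if W > w:
            out.append((x + w, y, h, W - w))      # only as tall as the core
        if H > h:
            out.append((x, y + h, H - h, W))      # full width
    else:
        if W > w:
            out.append((x + w, y, H, W - w))      # full height
        if H > h:
            out.append((x, y + h, H - h, w))      # only as wide as the core
    return out

# A 4x3 core placed in an empty 10x10 area.
print(split_maximal((0, 0, 10, 10), 4, 3))
print(split_non_overlapping((0, 0, 10, 10), 4, 3))
```

With non-overlapping rectangles a core that would only fit across the cut is missed, which is the accuracy given up in exchange for cheaper updates.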
  • Evaluation
    • The solutions with higher placement quality also have higher complexity
    • The fastest solution cannot exploit focused policies, for example routing aware, and adds the overhead of maintaining the 2D hashing structure
    • CUR does not support all general policies, for example Best Fit is not allowed
  • Proposed Approach
    • Choice driven by:
      • Need for a low complexity solution to introduce low overhead at runtime in the self reconfigurable system
      • Desire to keep high flexibility, to suit user needs also in terms of placement policies
    • For these reasons we propose a heuristic (KNER-like) empty space manager:
      • Supporting general and focused placement policies (in particular, First Fit, Best Fit and Routing Aware)
      • Suitable for both dynamic schedule and blind schedule scenarios
      • Exploiting multiple RFUs per Core, to improve results
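A focused policy such as Routing Aware can reuse the same empty-space list and change only the selection criterion. The slides do not define the routing cost, so the sketch below assumes one: the sum of Manhattan distances between the centre of the new core and the centres of the cores it communicates with. All names are hypothetical.

```python
def routing_cost(rect, h, w, neighbours):
    """Assumed cost: Manhattan distance from the centre of an h x w core
    placed at the (x, y) corner of `rect` to each neighbour centre."""
    cx = rect[0] + w / 2
    cy = rect[1] + h / 2
    return sum(abs(cx - nx) + abs(cy - ny) for nx, ny in neighbours)

def routing_aware(free_rects, h, w, neighbours):
    """Pick the fitting free rectangle with the lowest assumed routing cost."""
    fitting = [r for r in free_rects if r[2] >= h and r[3] >= w]
    return min(fitting, key=lambda r: routing_cost(r, h, w, neighbours),
               default=None)

free = [(0, 0, 4, 4), (10, 10, 4, 4)]
# The new 2x2 core talks to a core centred at (12, 12).
print(routing_aware(free, 2, 2, [(12, 12)]))
```

With no communicating cores every candidate costs zero and the first fitting rectangle is returned, so the policy degrades to First Fit.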
  • Data Representation
    • Core, defined by:
      • Arrival time,
      • Set of RFUs, each one with:
        • H, W, Latency
      • Optional set of communicating Cores (if using RA)
      • ASAP and ALAP (if in dynamic schedule scenario)
    • Two queues:
      • one for new Cores
      • one for Cores that were not successfully placed and need reexamination
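The Core description and the two queues map naturally onto small record types. This is a sketch of one possible encoding; the field names are chosen here, and serving the re-examination queue before the new-Core queue is an assumption, since the slides do not state the order.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class RFU:
    h: int
    w: int
    latency: int

@dataclass
class Core:
    name: str
    arrival: int
    rfus: List[RFU]                                              # one entry per shape
    communicates_with: List[str] = field(default_factory=list)   # Routing Aware only
    asap: Optional[int] = None                                    # dynamic schedule only
    alap: Optional[int] = None                                    # dynamic schedule only

def next_core(new_cores, retry_cores):
    """Pop the next Core to examine (assumed order: retries first)."""
    if retry_cores:
        return retry_cores.popleft()
    if new_cores:
        return new_cores.popleft()
    return None

new_cores, retry_cores = deque(), deque()
new_cores.append(Core("A", 0, [RFU(4, 4, 10)]))
retry_cores.append(Core("B", 1, [RFU(2, 8, 5), RFU(8, 2, 5)], alap=6))
print(next_core(new_cores, retry_cores).name)
```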
  • Data Representation
    • Reconfigurable Device, represented as:
      • Binary Tree structure, each node is a Rectangle, each leaf is an empty Rectangle.
      • Navigation through:
        • pointers to left child, right child, next leaf
        • a function to find the previous leaf (used for bookkeeping after rectangle split and merge operations)
    • Rectangle, defined by:
      • Coordinates on device: X, Y
      • Size: H, W
      • Initially one, the root, with:
        • (X,Y)=(0,0), H=FPGA Rows, W=FPGA Cols
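The tree described above can be sketched as follows. Only the fields named in the slide are used; the `split` helper, which turns a leaf into an internal node with two empty children and relinks the leaf list, is an assumption about how the bookkeeping might work and covers only the two-children case.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(eq=False)
class Rect:
    x: int
    y: int
    h: int
    w: int
    left: Optional["Rect"] = None
    right: Optional["Rect"] = None
    next_leaf: Optional["Rect"] = None

    def is_leaf(self):
        return self.left is None and self.right is None

def make_device(rows, cols):
    """Root rectangle: the whole, initially empty, device."""
    return Rect(0, 0, rows, cols)

def leaves(first_leaf):
    """Walk the empty rectangles through the next-leaf pointers."""
    node = first_leaf
    while node is not None:
        yield node
        node = node.next_leaf

def previous_leaf(first_leaf, target):
    """Find the leaf preceding `target` (None if it is the first one)."""
    prev = None
    for leaf in leaves(first_leaf):
        if leaf is target:
            return prev
        prev = leaf
    raise ValueError("target is not in the leaf list")

def split(first_leaf, leaf, left_child, right_child):
    """Replace `leaf` by two empty children; return the new first leaf."""
    prev = previous_leaf(first_leaf, leaf)
    leaf.left, leaf.right = left_child, right_child
    left_child.next_leaf = right_child
    right_child.next_leaf = leaf.next_leaf
    leaf.next_leaf = None
    if prev is None:
        return left_child
    prev.next_leaf = left_child
    return first_leaf

root = make_device(10, 10)
first = split(root, root, Rect(3, 0, 4, 7), Rect(0, 4, 6, 10))
print([(r.x, r.y, r.h, r.w) for r in leaves(first)])
```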
  • The Online Placement Algorithm
    • The whole processing of a Core is completed in linear time
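The algorithm itself was shown in figures that the transcript does not preserve. The sketch below is therefore only a guess at its general shape, under several assumptions: a First Fit policy, a flat list of non-overlapping `(x, y, height, width)` empty rectangles standing in for the leaf list, and a fixed horizontal cut. It makes a single pass over the empty rectangles, trying each shape of the Core on each one.

```python
def place(free_rects, shapes):
    """Try to place a Core that has several candidate (h, w) shapes.

    Returns the occupied (x, y, h, w) area, or None if nothing fits.
    On success the chosen free rectangle is replaced in-place by its
    non-overlapping remainders.
    """
    for i, (x, y, H, W) in enumerate(free_rects):
        for h, w in shapes:
            if h <= H and w <= W:
                remainders = []
                if W > w:
                    remainders.append((x + w, y, h, W - w))
                if H > h:
                    remainders.append((x, y + h, H - h, W))
                free_rects[i:i + 1] = remainders
                return (x, y, h, w)
    return None

free = [(0, 0, 4, 4)]
print(place(free, [(2, 6), (3, 4)]))  # the first shape is too wide
print(free)
```

A Core for which `place` returns None would go to the re-examination queue, or be rejected if its ALAP or deadline has passed.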
  • Evaluation of the proposed solution
    • To evaluate the quality of the proposed approach in various scenarios and with different metrics, three kinds of experiments were performed:
    • 1) A comparison against the literature solutions presented earlier
      • In a dynamic schedule scenario
      • With a Routing Aware placement policy
      • Measuring CRR (and indirectly fragmentation), routing costs and computational overhead
      • Results published in:
    • M. Morandi, M. Novati, M. D. Santambrogio, D. Sciuto, “Core allocation and relocation management for a self dynamically reconfigurable architecture”, IEEE Computer Society Annual Symposium on VLSI, 2008
  • Evaluation of the proposed solution
    • 2) A measure of application completion time
      • Composed of real Cores used as benchmarks
      • In a blind schedule scenario
      • Directly measuring application completion time, gaining some insight on CRR and fragmentation
    • 3) Evaluation of the multiple shapes per Core approach
      • Comparison between our solution with multiple shapes and KNER (adapted to blind schedule scenario)
      • In a mixed scenario (blind schedule with deadlines and variable arrival times)
      • Using both First Fit and Best Fit
      • Measure of CRR and running time
  • Experiment 1: Routing Aware
    • Version of our general solution:
      • Tailored to minimize routing paths
      • Compared with close solutions from literature
      • Named in the table RALP (Routing Aware Linear Placer)
    • Benchmark of 100 randomly generated tasks:
      • Size (5% to 20% of FPGA), randomly interconnected
  • Experiment 2: Appl. Completion Time
    • Benchmark applications composed of cores taken from opencores.org like JPEG, AES, 3DES
    • Measure the time instants needed to complete the applications with different amounts of resources
    • The infinite-resources case is also shown, as a lower bound for comparison
  • Experiment 3: Multiple Shapes
    • Similar benchmark, but Cores have deadlines (for CRR)
    • Shapes defined using the heuristic described previously
    • Runtime is on average 30% higher with 3 shapes and 40% higher with 5 shapes than with 1 shape
    • CRR is more than halved, often reduced to one third
  • Numerical Example
    • To put the obtained results in perspective, it is useful to give some numerical values for reconfiguration
    • Let us consider a JPEG Core, described by a 690 Kb configuration bitstream for a V4 device and using about 10% of the total area
      • Reconfiguration time: 150 ms
      • Relocation time: 90 ms
      • Placement time: 0.4 ms
    • The placement time is low and suitable for use in a real system
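Using only the three figures quoted for the JPEG Core, the weight of the placement decision relative to the other two operations works out as:

```latex
\[
\frac{t_{\text{placement}}}{t_{\text{reconfiguration}}} = \frac{0.4\ \text{ms}}{150\ \text{ms}} \approx 0.27\%,
\qquad
\frac{t_{\text{placement}}}{t_{\text{relocation}}} = \frac{0.4\ \text{ms}}{90\ \text{ms}} \approx 0.44\%
\]
```

In this example the placement decision costs well under one percent of either operation.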
  • Concluding Remarks
    • The proposed solution offers:
      • High versatility, supporting different placement policies and scenarios, designer intervention, multiple shapes
      • Low overhead, always processing a Core in linear time and obtaining good results compared with literature
      • Good CRR, especially when exploiting multiple shapes
      • Fast application completion time, as shown by exp. 2
      • Effective routing costs reduction, when used in conjunction with a Routing Aware policy (exp. 1)
    • The original goals were met
    • Under Review:
    • S. Corbetta, M. Morandi, M. Novati, M. D. Santambrogio, D. Sciuto, P. Spoletini, “Internal and External Bitstream Relocation for Partial Dynamic Reconfiguration”, IEEE Transactions on VLSI (2nd review)
  • Future Work
    • Future work will be in the direction of integration with the rest of the workflow that was briefly introduced
    • The parts that were described achieved good results stand-alone in the runtime management of the reconfigurable system; it is important to evaluate them inside the complete workflow as well
    • The final goal is to achieve complete automation in the creation process of a self dynamically reconfigurable architecture, from user specification up to bitstream and processor code generation
  • Questions