3rd 3DDRESD: Polaris
 

    Presentation Transcript

    • Runtime Core Allocation Management for 2D Self Partially and Dynamically Reconfigurable Systems
      • By Massimo Morandi [email_address]
      • 3D DRESD, 29/07/08
    • Rationale and Innovation
      • Problem statement
        • Providing runtime management support for 2D self partial and dynamical reconfiguration, in particular for Core placement decisions
      • Innovative contributions
        • A fast and flexible solution
          • Low complexity, to avoid introducing too much overhead at runtime
          • Supporting different scenarios and placement policies, according to user needs
        • Allowing multiple shapes per Core to be exploited, through integration with the area constraints definition
    • Aims
      • Our proposed solution must support different scenarios, placement policies and intervention from the designer
      • It must be fast when compared to related solutions existing in literature
      • The quality of the placement choices must be high, in terms of percentage of placement success, global application completion time or other metrics, as defined by the user
    • Outline
      • Context Definition
      • Motivations and Goals
      • Specific Contributions to Polaris
      • Area Constraints Definition
        • Proposed solution
      • Runtime Core Allocation Management
        • Features and Structure of an Allocation Manager
        • Relevant Works
        • Proposed Solution
      • Results
      • Conclusions and Future Work
    • Context Definition
      • Reconfigurable hardware:
        • Has the capability of changing its configuration (functionality) according to user needs
      • Self reconfiguration:
        • the system must be completely autonomous at runtime
      • Partial reconfiguration:
        • the changes can also involve fractions of the device
      • Dynamical Reconfiguration:
        • if a part of the hardware is reconfigured, the rest can continue its computation
      • 2D Reconfiguration:
        • arbitrary rectangular slots can be dynamically reconfigured, as opposed to arbitrary columns in 1D
    • A bit of Terminology
    • Motivations and goals
      • The creation and management of a self partially and dynamically reconfigurable system is a complex problem
        • this is even more critical when exploiting the 2D reconfiguration paradigm
        • more issues arise in the definition of area constraints and in the core allocation decisions
        • since the system must be autonomous, it also needs runtime management functionalities
      • Need for automation in those processes
        • to reduce the workload on the designer
        • to improve efficiency of the final reconfigurable system
    • Motivations and goals
      • Creation of an automated workflow to generate a self dynamically reconfigurable architecture that:
        • Has “good” area constraints assigned to cores
        • Is autonomous in performing 2D runtime core allocation decisions
        • Exploits relocation to ensure that the system can obtain the configuration bitstreams it needs at runtime
        • Supports intervention from the designer, to guide or constrain the decisions
        • Keeps high flexibility and generality
    • Specific Contributions to Polaris
      • Solution identification phase of the flow:
        • The definition of area constraints for Cores, when the user does not specify them
        • The creation of Core Allocation Management solutions, able to efficiently manage runtime Core placement
      • This last task includes:
        • Offering high versatility, supporting different placement policies and different scenarios
        • Keeping low complexity, to avoid too much overhead in the running time of the system
        • Experimenting with techniques to improve efficiency, for example allowing multiple shapes per Core
    • Area Constraints Definition
      • The designer can choose whether or not to specify the area constraints (AC) for each Core in the application
        • If not specified, they are automatically computed
      • The designer can also choose whether to allow multiple shapes per Core (and how many)
      • Finally, the last parameter represents the tightness of the constraints that will be defined:
        • It affects the feasibility of the implementation
        • It affects the performance of the RFU
      • Figure labels: CORE, RFU (or set of RFUs)
    • Area Constraints Definition
      • The constraints are defined with a simple heuristic
      • First a square-like constraint is defined, using these formulae:
        • Where H is the height (in slices), W is the width, S is the number of slices of the Core and m is the tightness
    • Area Constraints Definition
      • Then, the constraints are converted from slices to slots
        • Where Vg is a granularity parameter, Vslices is the number of vertical slices in the device and avgH is the average height of all the RFUs defined with the square-like formula
      • Finally, the constraints (in slots) are iteratively altered to horizontally or vertically stretch the Core and obtain multiple RFUs
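
The formulae themselves did not survive in this transcript, so the following Python sketch only illustrates the general idea under stated assumptions. The target area is taken to be m times S, the square-like side is the rounded-up square root of that area, and alternative shapes are obtained by shrinking one side and growing the other so that the area never decreases. The function names and the stretching rule are hypothetical, and the conversion from slices to slots (Vg, Vslices, avgH) is not modelled.

```python
import math


def square_like(slices: int, tightness: float) -> tuple[int, int]:
    """Return (H, W) of a near-square region for a Core of `slices` slices.

    ASSUMPTION: target area = tightness * slices (not the original formula).
    """
    area = math.ceil(tightness * slices)
    h = math.isqrt(area)
    if h * h < area:
        h += 1  # round the side up so the region is large enough
    w = (area + h - 1) // h  # smallest width such that h * w >= area
    return h, w


def stretched_shapes(h: int, w: int, count: int) -> list[tuple[int, int]]:
    """Return up to `count` shapes, starting from (h, w).

    ASSUMPTION: step k shrinks one side by k units and enlarges the other,
    first horizontally and then vertically, so that no shape has a smaller
    area than the starting one.
    """
    area = h * w
    shapes = [(h, w)]
    k = 1
    while len(shapes) < count and k < max(h, w):
        candidates = []
        if h - k >= 1:  # horizontal stretch: shorter and wider
            candidates.append((h - k, (area + h - k - 1) // (h - k)))
        if w - k >= 1:  # vertical stretch: narrower and taller
            candidates.append(((area + w - k - 1) // (w - k), w - k))
        for shape in candidates:
            if shape not in shapes and len(shapes) < count:
                shapes.append(shape)
        k += 1
    return shapes


base = square_like(100, 1.5)            # a Core of 100 slices, tightness 1.5
shapes = stretched_shapes(*base, count=3)
```

With these assumptions, a larger tightness value gives a looser (bigger) region, and each extra shape is one more RFU the allocation manager can try at runtime.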
    • Runtime Core Allocation Management
      • The Problem:
        • Perform the choice of where to place new cores on the reconfigurable area
        • In an online scenario: self partial and dynamical reconfiguration
      • The Goal:
        • Allow efficient usage of the FPGA area
        • Critical in the 2D reconfiguration case
      • This requires the creation of a solution for allocation management and suitable policies
    • Allocation Manager Desired Features
      • Low Core Rejection Rate (CRR)
        • Percentage of Cores that are not successfully placed in time
      • Fast application completion time
        • Time from arrival of first Core to completion of last
      • Low fragmentation grade
        • Fraction of the area that is unusable because the free space is too scattered
      • Small management overhead
        • We want a lightweight solution to run inside the system
      • High routing efficiency
        • If interacting cores are clustered, the system is more efficient
      • Need to find a good compromise between them
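
As a minimal illustration of the first two metrics, the sketch below computes CRR and application completion time from a list of per-Core records. The `PlacementRecord` type and its fields are hypothetical, introduced only for this example; a Core with no finish time is treated as rejected.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PlacementRecord:
    arrival: int            # time at which the Core arrived
    finish: Optional[int]   # completion time, or None if the Core was rejected


def core_rejection_rate(records: Sequence[PlacementRecord]) -> float:
    """Percentage of Cores that were not successfully placed in time."""
    rejected = sum(1 for r in records if r.finish is None)
    return 100.0 * rejected / len(records)


def completion_time(records: Sequence[PlacementRecord]) -> int:
    """Time from the arrival of the first Core to the completion of the last."""
    first_arrival = min(r.arrival for r in records)
    last_finish = max(r.finish for r in records if r.finish is not None)
    return last_finish - first_arrival


records = [
    PlacementRecord(arrival=0, finish=5),
    PlacementRecord(arrival=1, finish=None),  # rejected
    PlacementRecord(arrival=2, finish=9),
    PlacementRecord(arrival=3, finish=7),
]
print(core_rejection_rate(records))  # 25.0
print(completion_time(records))      # 9
```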
    • Example: 2D fragmentation
      • the 2D-fragmentation problem:
        • Area generally more fragmented
        • Can nullify the area optimizations obtained
    • Example: Core Rejection
      • Bad choices can lead to performance loss and rejection
        • A: Core C is successfully placed at step 2
        • B: Core C is delayed (possibly rejected, if deadline=2)
    • Considered Scenarios
      • Dynamic Schedule
        • Cores can arrive at any time
        • Have an ASAP and an ALAP time (dependencies)
        • Rejection: failure to respect ALAP for a Core
        • Goal: respect the schedule; CRR is the most important metric and should tend to zero
      • Blind Schedule
        • Cores can be either available from the start or arrive at different times, no dependencies assumed
        • no ASAP, Cores can optionally have a deadline
        • If a Core is not placed, retry later
        • Goal: the application must complete as fast as possible; total time is the main issue, not rejection
    • Allocation Manager Creation
      • Choose how to maintain information on empty space
        • Keep all information (Expensive but more accurate)
        • Heuristically prune information (Cheaper)
      • Which placement policy to choose:
        • General (First Fit, Best Fit, Worst Fit…)
        • Focused (Fragmentation Aware, Routing Aware… )
      • Define in which scenario(s) the manager will work
      • It can also be useful to consider and exploit different shapes of a Core (multiple RFUs per Core scenario)
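
The general policies named above can be sketched over a plain list of empty rectangles. This is a simplified illustration, not the manager described in the slides: the `Rect` type is hypothetical, and "best" and "worst" are taken to mean the smallest and largest fitting rectangle by area.

```python
from typing import NamedTuple, Optional, Sequence


class Rect(NamedTuple):
    x: int
    y: int
    h: int
    w: int


def fits(r: Rect, h: int, w: int) -> bool:
    return r.h >= h and r.w >= w


def first_fit(free: Sequence[Rect], h: int, w: int) -> Optional[Rect]:
    """First empty rectangle, in list order, that can host the Core."""
    return next((r for r in free if fits(r, h, w)), None)


def best_fit(free: Sequence[Rect], h: int, w: int) -> Optional[Rect]:
    """Smallest (by area) empty rectangle that can host the Core."""
    return min((r for r in free if fits(r, h, w)),
               key=lambda r: r.h * r.w, default=None)


def worst_fit(free: Sequence[Rect], h: int, w: int) -> Optional[Rect]:
    """Largest (by area) empty rectangle that can host the Core."""
    return max((r for r in free if fits(r, h, w)),
               key=lambda r: r.h * r.w, default=None)


free = [Rect(4, 0, 6, 6), Rect(0, 10, 3, 3), Rect(0, 0, 4, 10)]
print(first_fit(free, 3, 3))  # Rect(x=4, y=0, h=6, w=6)
print(best_fit(free, 3, 3))   # Rect(x=0, y=10, h=3, w=3)
print(worst_fit(free, 3, 3))  # Rect(x=0, y=0, h=4, w=10)
```

First Fit stops at the first match and is the cheapest of the three; Best Fit and Worst Fit examine every candidate.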
    • Relevant Works
      • Maintain complete information on empty space:
      • KAMER: K. Bazargan, R. Kastner and M. Sarrafzadeh, ''Fast template placement for reconfigurable computing systems'', IEEE Design and Test of Computers, Vol.17, 2000.
        • Keep All Maximally Empty Rectangles
        • Apply a general placement policy
      • CUR: A. Ahmadinia, C. Bobda, S. P. Fekete, J. Teich and J. v.d. Veen, ''Optimal Routing-Conscious Dynamic Placement for Reconfigurable Devices'', Field-Programmable Logic and Applications (FPL'04), 2004.
        • Maintain the Contour of a Union of Rectangles
        • Apply a focused placement policy
    • Relevant Works
      • Heuristically prune part of the information:
      • KNER: K. Bazargan, R. Kastner and M. Sarrafzadeh, ''Fast template placement for reconfigurable computing systems'', IEEE Design and Test of Computers, Vol.17, 2000.
        • Keep Non-overlapping Empty Rectangles
        • Apply a general placement policy
      • 2D-HASHING: H. Walder, C. Steiger and M. Platzner, ''Fast Online Task Placement on FPGAs: Free Space Partitioning and 2D-Hashing'', International Parallel and Distributed Processing Symposium (IPDPS'03), 2003.
        • Keep Non-overlapping Empty Rectangles in an optimized data structure
        • Apply (exclusively) a general placement policy
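
The difference between keeping complete information and pruning it can be seen in the simplest case: one Core placed in the top-left corner of one empty rectangle. The sketch below is a simplified illustration of that single step, not the algorithms from the cited papers; `Rect` and the function names are hypothetical. The two maximal empty rectangles overlap, while a non-overlapping split must choose one of two cuts and loses some information either way.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: int  # column of the top-left corner
    y: int  # row of the top-left corner
    h: int
    w: int

    @property
    def area(self) -> int:
        return self.h * self.w


def maximal_empty(free: Rect, h: int, w: int) -> list[Rect]:
    """Both maximal empty rectangles left by an h x w Core (they overlap)."""
    right = Rect(free.x + w, free.y, free.h, free.w - w)   # full height
    below = Rect(free.x, free.y + h, free.h - h, free.w)   # full width
    return [r for r in (right, below) if r.area > 0]


def non_overlapping(free: Rect, h: int, w: int,
                    horizontal_cut: bool = True) -> list[Rect]:
    """Two disjoint empty rectangles; the cut direction is a heuristic choice."""
    if horizontal_cut:
        right = Rect(free.x + w, free.y, h, free.w - w)
        below = Rect(free.x, free.y + h, free.h - h, free.w)
    else:
        right = Rect(free.x + w, free.y, free.h, free.w - w)
        below = Rect(free.x, free.y + h, free.h - h, w)
    return [r for r in (right, below) if r.area > 0]


def overlap_area(a: Rect, b: Rect) -> int:
    dx = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    dy = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return max(0, dx) * max(0, dy)


device = Rect(0, 0, 10, 10)
mers = maximal_empty(device, 3, 6)
ners = non_overlapping(device, 3, 6)
```

In this example the free area is 82 after placing the 3 x 6 Core. Both non-overlapping splits cover exactly that area, but neither contains both of the maximal rectangles, so a later Core that only fits in the dropped one would be missed.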
    • Example: Empty Space Information
    • Evaluation
      • The solutions with higher placement quality also have higher complexity
      • The fastest solution cannot exploit focused policies, for example routing aware, and adds the overhead of maintaining the 2D hashing structure
      • CUR does not support all general policies, for example Best Fit is not allowed
    • Proposed Approach
      • Choice driven by:
        • Need for a low complexity solution to introduce low overhead at runtime in the self reconfigurable system
        • Desire to keep high flexibility, to suit user needs also in terms of placement policies
      • For these reasons we propose a heuristic (KNER-like) empty space manager:
        • Supporting general and focused placement policies (in particular, First Fit, Best Fit and Routing Aware)
        • Suitable for both dynamic schedule and blind schedule scenarios
        • Exploiting multiple RFUs per Core, to improve results
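
A Routing Aware policy can be sketched as a selection rule over the empty rectangles. The cost used here, the sum of Manhattan distances between the centre of the new Core and the centres of the Cores it communicates with, is an assumption made for illustration; the slides do not state the actual cost function, and all names below are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Rect(NamedTuple):
    x: int
    y: int
    h: int
    w: int


def centre(x: int, y: int, h: int, w: int) -> tuple[float, float]:
    return (x + w / 2, y + h / 2)


def routing_cost(slot: Rect, h: int, w: int, neighbours: Sequence[Rect]) -> float:
    """Assumed cost: Manhattan distance from the Core, placed at the corner
    of `slot`, to each already placed communicating Core."""
    cx, cy = centre(slot.x, slot.y, h, w)
    return sum(abs(cx - nx) + abs(cy - ny)
               for nx, ny in (centre(*n) for n in neighbours))


def routing_aware(free: Sequence[Rect], h: int, w: int,
                  neighbours: Sequence[Rect]) -> Optional[Rect]:
    """Fitting empty rectangle with the lowest assumed routing cost."""
    candidates = [r for r in free if r.h >= h and r.w >= w]
    return min(candidates,
               key=lambda r: routing_cost(r, h, w, neighbours),
               default=None)


free = [Rect(0, 0, 4, 4), Rect(10, 10, 4, 4)]
placed_neighbours = [Rect(12, 6, 2, 2)]
print(routing_aware(free, 2, 2, placed_neighbours))  # Rect(x=10, y=10, h=4, w=4)
```

When a Core has no communicating Cores, every candidate has cost zero and the rule degenerates to First Fit.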
    • Data Representation
      • Core, defined by:
        • Arrival time,
        • Set of RFUs, each one with:
          • H, W, Latency
        • Optional set of communicating Cores (if using RA)
        • ASAP and ALAP (if in dynamic schedule scenario)
      • Two queues:
        • one for new Cores
        • one for Cores that were not successfully placed and need reexamination
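
The Core description and the two queues map naturally onto a few small types. The sketch below follows the fields listed above; the `name` field, the helper `missed_alap` and the reading of "rejection" as the current time exceeding ALAP are assumptions added for the example.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class RFU:
    h: int
    w: int
    latency: int


@dataclass
class Core:
    name: str                      # hypothetical identifier
    arrival: int
    rfus: list[RFU]                # one entry per available shape
    communicating: list[str] = field(default_factory=list)  # only for Routing Aware
    asap: Optional[int] = None     # only in the dynamic schedule scenario
    alap: Optional[int] = None


new_cores: deque[Core] = deque()    # Cores that have just arrived
retry_cores: deque[Core] = deque()  # Cores that could not be placed yet


def missed_alap(core: Core, now: int) -> bool:
    """Assumed rejection test: the Core can no longer start by its ALAP time."""
    return core.alap is not None and now > core.alap


jpeg = Core("jpeg", arrival=0, rfus=[RFU(4, 6, 10), RFU(6, 4, 10)], asap=0, alap=5)
new_cores.append(jpeg)
retry_cores.append(new_cores.popleft())  # placement failed, re-examine later
```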
    • Data Representation
      • Reconfigurable Device, represented as:
        • Binary Tree structure, each node is a Rectangle, each leaf is an empty Rectangle.
        • Navigation through:
          • pointers to left child, right child, next leaf
          • a function to find the previous leaf (used for bookkeeping after rectangle split and merge operations)
      • Rectangle, defined by:
        • Coordinates on device: X, Y
        • Size: H, W
        • Initially one, the root, with:
          • (X,Y)=(0,0), H=FPGA Rows, W=FPGA Cols
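
A minimal sketch of such a tree, assuming that a Core is always placed in the top-left corner of an empty leaf and that the remaining space is split with a horizontal cut. The `used` flag, the recursive leaf traversal (instead of next-leaf pointers) and the omission of the merge step on Core removal are simplifications made for this example.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Node:
    x: int
    y: int
    h: int
    w: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    used: bool = False  # assumption: set once the rectangle has been split


def empty_leaves(node: Optional[Node]) -> Iterator[Node]:
    """Yield the empty rectangles (leaves) from left to right."""
    if node is None:
        return
    if not node.used:
        yield node
        return
    yield from empty_leaves(node.left)
    yield from empty_leaves(node.right)


def place(leaf: Node, h: int, w: int) -> tuple[int, int]:
    """Place an h x w Core in the corner of `leaf` and split the rest."""
    if leaf.used or h > leaf.h or w > leaf.w:
        raise ValueError("Core does not fit in this leaf")
    leaf.used = True
    if leaf.w > w:  # space to the right of the Core
        leaf.left = Node(leaf.x + w, leaf.y, h, leaf.w - w)
    if leaf.h > h:  # space below the Core, full width
        leaf.right = Node(leaf.x, leaf.y + h, leaf.h - h, leaf.w)
    return (leaf.x, leaf.y)


root = Node(0, 0, h=10, w=10)   # H = FPGA rows, W = FPGA columns
place(root, 3, 6)
first_leaf = next(empty_leaves(root))
place(first_leaf, 2, 2)         # First Fit on the leaf list
```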
    • The Online Placement Algorithm
      • The whole processing of a Core is completed in linear time
    • The Online Placement Algorithm
    • The Online Placement Algorithm
    • Evaluation of the proposed solution
      • To evaluate the quality of the proposed approach in various scenarios and with different metrics, three kinds of experiments were performed:
      • 1) A comparison against presented literature solutions
        • In a dynamic schedule scenario
        • With a Routing Aware placement policy
        • Measuring CRR (and indirectly fragmentation), routing costs and computational overhead
        • Results published in:
      • M. Morandi, M. Novati, M. D. Santambrogio, D. Sciuto, “Core allocation and relocation management for a self dynamically reconfigurable architecture”, IEEE Computer Society Annual Symposium on VLSI, 2008
    • Evaluation of the proposed solution
      • 2) A measure of application completion time
        • Composed of real Cores used as benchmarks
        • In a blind schedule scenario
        • Directly measuring application completion time, gaining some insight on CRR and fragmentation
      • 3) Evaluation of the multiple shapes per Core approach
        • Comparison between our solution with multiple shapes and KNER (adapted to blind schedule scenario)
        • In a mixed scenario (blind schedule with deadlines and variable arrival times)
        • Using both First Fit and Best Fit
        • Measure of CRR and running time
    • Experiment 1: Routing Aware
      • Version of our general solution:
        • Tailored to minimize routing paths
        • Compared with close solutions from literature
        • Named RALP (Routing Aware Linear Placer) in the table
      • Benchmark of 100 randomly generated tasks:
        • Size (5% to 20% of FPGA), randomly interconnected
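
A benchmark of this kind could be generated along the following lines. This is only a sketch: the slides do not say how sizes or interconnections were drawn, so the uniform choices, the `max_links` parameter, the 40 x 40 device and the dictionary layout are all assumptions.

```python
import random


def random_tasks(n: int, rows: int, cols: int,
                 seed: int = 0, max_links: int = 3) -> list[dict]:
    """Generate n tasks whose area is between 5% and 20% of the device."""
    rng = random.Random(seed)
    device_area = rows * cols
    lo = (5 * device_area + 99) // 100   # 5% of the area, rounded up
    hi = (20 * device_area) // 100       # 20% of the area, rounded down
    tasks = []
    for i in range(n):
        while True:  # retry until the chosen height admits a valid width
            h = rng.randint(1, rows)
            w_lo = (lo + h - 1) // h
            w_hi = min(cols, hi // h)
            if w_lo <= w_hi:
                w = rng.randint(w_lo, w_hi)
                break
        others = [j for j in range(n) if j != i]
        links = rng.sample(others, rng.randint(0, min(max_links, len(others))))
        tasks.append({"id": i, "h": h, "w": w, "links": links})
    return tasks


tasks = random_tasks(100, rows=40, cols=40)
```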
    • Experiment 2: Appl. Completion Time
      • Benchmark applications composed of cores taken from opencores.org, such as JPEG, AES and 3DES
      • Measure the time instants needed to complete the applications with different amounts of resources
      • The infinite-resources case is also shown, to compare against the lower bound
    • Experiment 3: Multiple Shapes
      • Similar benchmark, but Cores have deadlines (for CRR)
      • Shapes defined using the heuristic described previously
      • Runtime is on average 30% higher with 3 shapes and 40% higher with 5 shapes, compared with 1 shape
      • CRR is more than halved, often reduced to one third
    • Numerical Example
      • To give an idea of the quality of the obtained results, it is useful to give some numerical values for reconfiguration
      • Let us consider a JPEG Core, described by a 690 Kb configuration bitstream for a V4 device and using about 10% of the total area
        • Reconfiguration time: 150 ms
        • Relocation time: 90 ms
        • Placement time: 0.4 ms
      • The placement time is small compared with the reconfiguration and relocation times, and is suitable for use in a real system
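
Using only the three figures quoted above, the relative weight of the placement decision can be worked out directly:

```latex
\[
\frac{t_{\text{placement}}}{t_{\text{reconfiguration}}}
  = \frac{0.4\ \text{ms}}{150\ \text{ms}} \approx 0.27\%,
\qquad
\frac{t_{\text{placement}}}{t_{\text{relocation}}}
  = \frac{0.4\ \text{ms}}{90\ \text{ms}} \approx 0.44\%
\]
```

In both cases the placement decision costs well under 1% of the operation it accompanies.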
    • Concluding Remarks
      • The proposed solution offers:
        • High versatility, supporting different placement policies and scenarios, designer intervention, multiple shapes
        • Low overhead, always processing a Core in linear time and obtaining good results compared with literature
        • Good CRR, especially when exploiting multiple shapes
        • Fast application completion time, as shown by exp. 2
        • Effective routing costs reduction, when used in conjunction with a Routing Aware policy (exp. 1)
      • The original goals were met
      • Under Review:
      • S. Corbetta, M. Morandi, M. Novati, M. D. Santambrogio, D. Sciuto, P. Spoletini, “Internal and External Bitstream Relocation for Partial Dynamic Reconfiguration”, IEEE Transactions on VLSI (2nd review)
    • Future Work
      • Future work will focus on integration with the rest of the workflow that was briefly introduced
      • The parts that were described achieved good results as stand-alone components in the runtime management of the reconfigurable system; it is important to evaluate them inside the complete workflow as well
      • The final goal is to achieve complete automation in the creation process of a self dynamically reconfigurable architecture, from user specification up to bitstreams and processor code generation
    • Questions