Core Identification for Reconfigurable Systems driven by Specification Self-Similarity Matteo Giani Univ. ID # 651157728 F. Balasa - A. A. Khokhar – D. Sciuto
Summary
• Motivations
• Introduction
• Aims
• State of the Art
• The Proposed Approach
  - Rationale
  - Similarity Extraction
  - Specification Covering
• Implementation
• Experimental results
• Conclusions and Future Work
Motivations
• Area occupancy
  - Image processing / robotics applications
• Survivability to changing requirements
  - Evolving standards: cryptography, communications
• Reconfigurability for Reliability
  - Single Event Upsets, Permanent Faults
• Designer constraints
  - Unsatisfiable timing constraints given device area
Reconfigurability: introduction [figure sequence; surviving labels: Partial, Total, Embedded, and a fragment reading "f i x"]
MicroLAB’s System on Reconfigurable Chip Architecture
System on Reconfigurable Chip Architecture: physical constraints
• Area constraints: trade-off between the area used by fixed components and by reconfigurable ones
• Communication issues:
  - Bit-width of the communication infrastructure
  - Number of access points to the communication structure
The Proposed Approach
[Flow figure: Specification (C code fragment: int test_code( int io , int * o1) { int a = 2, b = 10;) -> DFG -> Partitioned DFG -> Reconfigurable Implementation]
Aims
Definition of a specification partitioning approach that:
• Aggregates elementary operations in the DFG into clusters suitable for implementation as configurable modules
• Identifies regular structures in the specification, with the aim of generating reusable modules
  - Saves device area
  - Saves reconfiguration time
State of the Art
• Temporal partitioning approaches
  - Reconfigure the whole device at once
  - Impossible to hide reconfiguration times
State of the Art Space-Time partitioning approaches Example
State of the Art
Common points among the different approaches:
• Reconfiguration times badly affect the system’s performance
  - Try to embed a loop in each partition
  - Try to minimize the need for reconfiguration
• Spatial partitioning approaches often rely on the designer for specification partitioning
The Proposed Approach - Rationale
• Reconfiguration times heavily impact the final solution’s latency
  - Reuse the configurable modules
• Our approach: automatically identify recurrent structures in the specification
The Proposed Approach: Specification -> DFG
• The PandA framework
  - Behavioral description layer
  - Graph layer
The Proposed Approach: DFG Partitioning
• Objective: partition the DFG, identifying clusters that are repeated throughout the specification
• Repeated structures -> Isomorphic Subgraphs
  - Extraction of isomorphic subgraphs from a given graph is NP-complete
  - Heuristics are needed to make the problem tractable
The Proposed Approach: DFG Partitioning
Our approach: two phases
• Template Identification
  - Produce a collection of isomorphism equivalence classes, each containing some isomorphic subgraphs of the original specification
• Graph covering (template choice)
  - Choose which of the identified templates are best suited for implementation as (re)configurable modules
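The covering phase can be illustrated with a minimal greedy sketch. It is not the PandA implementation: the `Template` struct, the `greedy_cover` function, the integer node ids, and the choice to visit templates from largest to smallest are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical representation: one template is one isomorphism equivalence
// class; each instance is the set of DFG node ids used by one occurrence.
struct Template {
    std::vector<std::set<int>> instances;
    std::size_t size() const {
        return instances.empty() ? 0 : instances.front().size();
    }
};

// Greedy covering (assumed policy): visit templates from largest to smallest
// and accept every instance that does not overlap already covered nodes.
// Returns the fraction of DFG nodes that end up covered, in [0, 1].
double greedy_cover(std::vector<Template> templates, std::size_t dfg_nodes) {
    assert(dfg_nodes > 0);
    std::stable_sort(templates.begin(), templates.end(),
                     [](const Template& a, const Template& b) {
                         return a.size() > b.size();
                     });
    std::set<int> covered;
    for (const Template& t : templates) {
        for (const std::set<int>& inst : t.instances) {
            bool overlaps = std::any_of(
                inst.begin(), inst.end(),
                [&](int n) { return covered.count(n) > 0; });
            if (!overlaps) covered.insert(inst.begin(), inst.end());
        }
    }
    return static_cast<double>(covered.size()) /
           static_cast<double>(dfg_nodes);
}
```

Swapping the comparison passed to `std::stable_sort` is enough to try a different template ordering on the same set of instances.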
The Proposed Approach: Template Identification
Two algorithms were considered for this phase:
• Reversed-tree templates
  - Copes with the complexity of the Isomorphic Subgraphs problem by restricting the shape of the subgraphs it identifies
• Free-shape templates
  - Copes with the complexity of the Isomorphic Subgraphs problem by expanding pairs of isomorphic subgraphs via a bipartite matching
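To show why restricting the shape helps, here is a hedged sketch of one well-known way to compare tree-shaped subgraphs: canonical signatures. The slides do not describe the reversed-tree algorithm in detail, so this is a swapped-in illustration, not the author's method. The `Node` struct, the `commutative` flag, the depth cut, and both function names are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical DFG node: an operation plus the nodes producing its operands.
struct Node {
    std::string op;             // operation label, e.g. "+", "*", "in"
    std::vector<int> operands;  // ids of the operand-producing nodes
    bool commutative;           // operand order is irrelevant when true
};

// Canonical string for the tree of operand producers rooted at `id`,
// cut at `depth` levels. Equal strings mean isomorphic labeled trees.
std::string signature(const std::vector<Node>& dfg, int id, int depth) {
    assert(id >= 0 && id < static_cast<int>(dfg.size()));
    const Node& n = dfg[id];
    if (depth == 0 || n.operands.empty()) return n.op;
    std::vector<std::string> parts;
    for (int child : n.operands)
        parts.push_back(signature(dfg, child, depth - 1));
    if (n.commutative) std::sort(parts.begin(), parts.end());
    std::string s = n.op + "(";
    for (std::size_t i = 0; i < parts.size(); ++i)
        s += (i ? "," : "") + parts[i];
    return s + ")";
}

// Groups every non-input node by signature; a group with two or more
// roots is a candidate template with that many instances.
std::map<std::string, std::vector<int>> group_by_signature(
        const std::vector<Node>& dfg, int depth) {
    std::map<std::string, std::vector<int>> classes;
    for (int id = 0; id < static_cast<int>(dfg.size()); ++id)
        if (!dfg[id].operands.empty())
            classes[signature(dfg, id, depth)].push_back(id);
    return classes;
}
```

Because trees admit a canonical form, grouping takes a single pass over the nodes, whereas comparing arbitrary subgraphs has no such shortcut.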
Template Identification: Reversed-tree templates
Template Identification: Free-shape templates
Template Identification: Free-shape templates
• Each run of the algorithm produces a pair of isomorphic subgraphs
• The produced pairs are used to build equivalence classes of isomorphic subgraphs, exploiting the transitivity of the isomorphism relation
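Merging pairs into equivalence classes by transitivity is a natural fit for a disjoint-set (union-find) structure. The sketch below assumes each discovered subgraph has already been assigned an integer id; the `DisjointSets` class and `build_classes` function are hypothetical names, not part of PandA.

```cpp
#include <cassert>
#include <map>
#include <numeric>
#include <utility>
#include <vector>

// Minimal union-find over subgraph ids 0 .. n-1.
class DisjointSets {
public:
    explicit DisjointSets(int n) : parent_(n) {
        std::iota(parent_.begin(), parent_.end(), 0);
    }
    int find(int x) {
        assert(x >= 0 && x < static_cast<int>(parent_.size()));
        while (parent_[x] != x) {
            parent_[x] = parent_[parent_[x]];  // path halving
            x = parent_[x];
        }
        return x;
    }
    void unite(int a, int b) {
        int ra = find(a);
        int rb = find(b);
        if (ra != rb) parent_[ra] = rb;
    }
private:
    std::vector<int> parent_;
};

// Each pair (a, b) states "subgraph a is isomorphic to subgraph b".
// Transitivity merges chained pairs into one equivalence class.
std::vector<std::vector<int>> build_classes(
        int subgraph_count, const std::vector<std::pair<int, int>>& pairs) {
    DisjointSets sets(subgraph_count);
    for (const auto& p : pairs) sets.unite(p.first, p.second);
    std::map<int, std::vector<int>> by_root;
    for (int s = 0; s < subgraph_count; ++s)
        by_root[sets.find(s)].push_back(s);
    std::vector<std::vector<int>> classes;
    for (auto& entry : by_root) classes.push_back(std::move(entry.second));
    return classes;
}
```

For example, the pairs (0,1) and (1,2) place subgraphs 0, 1 and 2 in the same class even though 0 and 2 were never compared directly.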
Template choice: metrics
• Largest Fit First (LFF)
  - Largest templates are best
• Most Frequent Fit First (MFF)
  - Templates with the largest number of instances are best
• Communication Weight metrics
  - E.g., ratio of #internal edges to #boundary edges
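The three metrics can be sketched as small comparison and counting functions. All names here (`EdgeCount`, `count_edges`, `communication_ratio`, `TemplateStats`) are assumptions, as is the convention of returning the internal edge count when a cluster has no boundary edges.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

struct EdgeCount {
    int internal = 0;  // both endpoints inside the cluster
    int boundary = 0;  // exactly one endpoint inside the cluster
};

// Classifies every DFG edge (source id, target id) against one cluster.
EdgeCount count_edges(const std::vector<std::pair<int, int>>& edges,
                      const std::set<int>& cluster) {
    EdgeCount c;
    for (const auto& e : edges) {
        bool src_in = cluster.count(e.first) > 0;
        bool dst_in = cluster.count(e.second) > 0;
        if (src_in && dst_in) ++c.internal;
        else if (src_in || dst_in) ++c.boundary;
    }
    return c;
}

// #internal / #boundary; higher means less communication per unit of logic.
// Assumed convention: with no boundary edges, return the internal count.
double communication_ratio(const EdgeCount& c) {
    assert(c.internal >= 0 && c.boundary >= 0);
    if (c.boundary == 0) return static_cast<double>(c.internal);
    return static_cast<double>(c.internal) / c.boundary;
}

struct TemplateStats {
    int nodes;      // template size
    int instances;  // number of occurrences in the specification
};

// Largest Fit First: a ranks before b when it has more nodes.
bool largest_fit_first(const TemplateStats& a, const TemplateStats& b) {
    return a.nodes > b.nodes;
}

// Most Frequent Fit First: a ranks before b when it has more instances.
bool most_frequent_fit_first(const TemplateStats& a, const TemplateStats& b) {
    return a.instances > b.instances;
}
```

Either comparison function can be passed to `std::sort` to order candidate templates before covering.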
Implementation
Implementation work was carried out as an extension to the PandA framework
• C++
• C++ STL
• Boost Graph Library
Experimental Results: Reversed-tree templates
Benchmark | Largest Template | Largest #Instances | #Templates
AES - encryptblock | 16 | 3 | 151
AES - decryptblock | 19 | 3 | 162
DES - des_encrypt | 38 | 4 | 57
FDCT | 6 | 6 | 40
Experimental Results: Free-shape templates
Benchmark | Largest Template | Largest #Instances | #Templates
AES - encryptblock | 132 | 2 | 6790
AES - decryptblock | 147 | 2 | 11006
DES - des_encrypt | 100 | 2 | 1802
FDCT | 62 | 2 | 1470
Experimental Results: Graph covering - free-shape
Benchmark | Cover % - LFF | Cover % - MFF | CPU Time
AES - encryptblock | 74.3 | 32.7 | 32.5 sec
AES - decryptblock | 85.31 | 51.7 | 61 sec
DES - des_encrypt | 90.5 | 59.6 | 8.3 sec
FDCT | 76.7 | 53.8 | 6.4 sec
Cover % - Comm (column extracted separately, row order uncertain): 73.3, 87.8, 70.8, 74.1
Experimental Results: Free-shape - AES - encryptblock Template size (nodes) vs. number of identified templates
Experimental Results: Free-shape - AES - encryptblock Template size (nodes) vs. number of instances of the most recurrent template
Experimental Results: Free-shape - AES - encryptblock Template size (nodes) vs. ratio between number of edges included in the clusters and number of edges cut by the cluster boundaries
Conclusions
[Flow figure: Specification (C code fragment: int test_code( int io , int * o1) { int a = 2, b = 10;) -> DFG -> Partitioned DFG -> Reconfigurable Implementation]
Conclusions, future work
• A partitioning approach was defined and implemented to expose recurrent computing patterns in a system specification
  - Starting point: C, SystemC specifications
  - Tests carried out on real-world examples
• Future Work
  - Refinement of the template choice metrics, e.g. area fragmentation
  - Heuristics for the choice between fixed and reconfigurable modules
  - Online scheduling and placement of the reconfigurable cores
