1. POLITECNICO DI MILANO
Core Identification for
Reconfigurable Systems driven by
Specification Self-Similarity
Reconfigurable Computing Italian Meeting
19 December 2008
Room S01, Politecnico di Milano, Milan (Italy)
Roberto Cordone: cordone@dti.unimi.it
Massimo Redaelli: mredaelli@elet.polimi.it
2. Outline
Introduction
General Problem
Rationale
Core Identification solutions
Results
Concluding Remarks
3. The problem
1. Partition a specification into subsets of operations (tasks)
2. Map each task onto a compatible circuit design (mode)
3. Assign a portion of the device to each task, compatibly with its mode (size, shape, heterogeneity)
4. Assign a reconfiguration time to each task
5. Assign an execution time to each task
4. The data (1)
A specification DFG = (O,P)
operations O, including os, oe for start and end
precedences P: (o, o’) means that o ends before o’ starts
A set M of modes, characterized by
size cm (number of CLBs, possibly shape)
reconfiguration time dm
A compatibility relation between modes and tasks
a task S can be implemented in different modes (MS)
a mode can implement different tasks
5. The data (2)
A latency lS,m associated to each task S and compatible
mode m
A set U of reconfigurable units (RUs)
size γu is the number of CLBs in unit u
A scheduling time horizon T (provided by a heuristic)
6. Decision variables
Partition O into tasks (set xS = 1 or 0 for each S ⊆ O)
Map each used task S onto a compatible mode mS ∈MS
Assign to each used task S a portion US ⊆ U
compatible with mS
Assign to each used task S a reconfiguration start time τS
Assign to each used task S an execution start time tS
7. A general model (1)
Minimize the completion time
Subject to
xS defines a partition of O, with singletons for os, oe
and no induced cyclic precedence
mode mS is compatible with task S
mode mS fits into portion US
portion US is connected (to minimize communication overhead)
further shape constraints on portion US
further compatibility constraints between mode mS and portion US
8. A general model (2)
the execution follows the reconfiguration
the precedences are respected:
for all S and S’ such that xS = xS’ = 1 and
two tasks cannot run together on the same RU
for all S and S’ such that xS = xS’ = 1
when a task is in execution, its RUs cannot be reconfigured
for all S and S’ such that xS = xS’ = 1
when a task is in reconfiguration, another task can share the
reconfiguration, but only using the same RUs and mode
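The first three scheduling constraints above can be illustrated with a small feasibility check. This is only a sketch under stated assumptions: the ScheduledTask record and the violations function are hypothetical names, times are integers, and the rule on shared reconfigurations is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduledTask:      # hypothetical record, not from the slides
    name: str
    units: frozenset      # U_S: reconfigurable units assigned to the task
    reconf_start: int     # tau_S
    reconf_time: int      # d_m of the chosen mode
    exec_start: int       # t_S
    latency: int          # l_{S,m}

    @property
    def exec_end(self):
        return self.exec_start + self.latency


def violations(tasks, precedences):
    """Return a list of human-readable constraint violations."""
    found = []
    by_name = {t.name: t for t in tasks}
    for t in tasks:
        # The execution follows the reconfiguration.
        if t.exec_start < t.reconf_start + t.reconf_time:
            found.append(f"{t.name}: executes before its reconfiguration ends")
    for before, after in precedences:
        # A predecessor must end before its successor starts.
        if by_name[before].exec_end > by_name[after].exec_start:
            found.append(f"{before} -> {after}: precedence violated")
    for i, a in enumerate(tasks):
        for b in tasks[i + 1:]:
            # Two tasks cannot run together on the same RU.
            shares_unit = bool(a.units & b.units)
            overlap = a.exec_start < b.exec_end and b.exec_start < a.exec_end
            if shares_unit and overlap:
                found.append(f"{a.name}, {b.name}: run together on a shared RU")
    return found
```

An empty list means that none of the three modelled constraints is violated; it says nothing about placement or shape constraints.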
9. Some remarks
The partition of O turns the DFG (O,P) into a
Task Dependency Graph TDG = (N,A)
The TDG is also acyclic (precedence constraints)
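A minimal sketch of how a partition induces the TDG and how the acyclicity requirement can be checked. The function names and the task_of mapping (operation to task label) are illustrative assumptions; the check relies on Python's standard graphlib module.

```python
from graphlib import CycleError, TopologicalSorter


def build_tdg(precedences, task_of):
    """Collapse DFG precedences (o, o') into arcs between distinct tasks."""
    arcs = set()
    for o, o2 in precedences:
        s, s2 = task_of[o], task_of[o2]
        if s != s2:
            arcs.add((s, s2))
    return arcs


def is_acyclic(nodes, arcs):
    """True if the task graph admits a topological order."""
    preds = {n: set() for n in nodes}
    for s, s2 in arcs:
        preds[s2].add(s)          # s must finish before s2 starts
    try:
        TopologicalSorter(preds).prepare()
    except CycleError:
        return False
    return True


# Chain a -> b -> c: grouping {a, c} against {b} induces a cycle.
P = [("a", "b"), ("b", "c")]
good = {"a": "T1", "b": "T1", "c": "T2"}
bad = {"a": "T1", "b": "T2", "c": "T1"}
print(is_acyclic({"T1", "T2"}, build_tdg(P, good)))  # True
print(is_acyclic({"T1", "T2"}, build_tdg(P, bad)))   # False
```

The second partition is rejected because T1 must both precede and follow T2, which is the "induced cyclic precedence" excluded by the model.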
Partitioning, mapping, placing and scheduling
are not independent
The size of the search space is overwhelming:
for each subset of operations, one must define
a mode, out of |M| available ones
a subset of RUs, out of |U| available ones
a reconfiguration start time out of |T| available ones
an execution start time out of |T| available ones
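To give a feeling for these numbers, the count can be sketched as follows. It is an unconstrained upper bound under assumptions that are not on the slide: the horizon is treated as a number of discrete time steps, and any nonempty subset of RUs is allowed.

```python
def choices_per_task(num_modes, num_units, horizon):
    """Mode x nonempty RU subset x reconfiguration start x execution start."""
    unit_subsets = 2 ** num_units - 1
    return num_modes * unit_subsets * horizon * horizon


def candidate_tasks(num_operations):
    """Number of nonempty subsets of operations that could become a task."""
    return 2 ** num_operations - 1


print(choices_per_task(num_modes=3, num_units=4, horizon=10))  # 4500
print(candidate_tasks(20))                                     # 1048575
```

Even a 20-operation specification already yields over a million candidate tasks, each with thousands of possible assignments.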
Decomposition approach: build a partition xS independent from the
10. The Proposed Approach Rationale
Reconfiguration times heavily impact the latency of the final
solution
Reuse the configurable modules!
Our approach: automatically identify recurrent structures in the
specification
11. The Proposed Approach
[Figure: overview of the approach, with panels Specification (C source beginning "int test_code(int io, int *o1) { int a = 2, b = 10;"), DFG, Partitioned DFG, and Reconfigurable Implementation]
12. The Proposed Approach: DFG Partitioning
Our approach: two phases
Template Identification
Produce a collection of isomorphism equivalence classes,
each containing some isomorphic subgraphs of the
original specification
Graph covering (template choice)
Choose which of the identified templates are best
suited for implementation as (re)configurable modules
13. Template identification
Problem: find operations that are performed repeatedly
in the specification.
In available literature (Software Engineering): extracting
procedures from flat (maybe legacy) code
Text-based matching approach (Ducasse et al. 1999,
Baker 1995)
AST approach (Baxter et al. 1998)
Source-based metrics approach (Higo et al. 2002, 2004)
15. Problems with Isomorphism
Several problems have been investigated:
1. Graph Isomorphism
2. Subgraph Isomorphism (GT48)
3. Largest Common Subgraph (GT49)
However, we are concerned with only one graph:
Isomorphic Subgraphs
Find two isomorphic subgraphs S1 and S2 of a given graph G
17. The Algorithm
1. Build a collection V of pairs of basic isomorphic subgraphs;
2. Extract one pair (S, S’) from V;
a) build the non-overlapping neighborhoods N(S) and N(S’), which include the nodes adjacent, respectively, to S and S’. If any of them is empty, goto 3;
b) perform a maximum cardinality bipartite matching between N(S) and N(S’);
c) for each matched pair, if adding the two nodes to S and S’ preserves the isomorphism, add them to S and S’. Goto 2(a)
3. Save the maximal isomorphic non-overlapping subgraphs S and S’. Goto 2.
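The steps above can be sketched in Python. This is an illustration, not the authors' implementation, and it rests on several assumptions: a "basic" pair is any two distinct nodes with the same operation label, the neighborhoods are made non-overlapping by dropping nodes that belong to both, the matching uses simple augmenting paths, and growth also stops when a round adds no pair. All function names are hypothetical.

```python
def extends(edges, label, phi, u, v):
    """True if adding u -> v keeps phi an isomorphism of induced subgraphs."""
    if label[u] != label[v]:
        return False
    return all(((s, u) in edges) == ((t, v) in edges) and
               ((u, s) in edges) == ((v, t) in edges)
               for s, t in phi.items())


def neighborhood(edges, inside, excluded):
    """Nodes adjacent to `inside` that are not in `excluded`."""
    adj = set()
    for a, b in edges:
        if a in inside and b not in excluded:
            adj.add(b)
        if b in inside and a not in excluded:
            adj.add(a)
    return adj


def max_matching(left, right, ok):
    """Maximum-cardinality bipartite matching via augmenting paths."""
    match_of_right = {}

    def try_assign(u, seen):
        for v in right:
            if v not in seen and ok(u, v):
                seen.add(v)
                if v not in match_of_right or try_assign(match_of_right[v], seen):
                    match_of_right[v] = u
                    return True
        return False

    for u in left:
        try_assign(u, set())
    return [(u, v) for v, u in match_of_right.items()]


def grow(edges, label, u0, v0):
    """Steps 2(a)-(c): grow the pair ({u0}, {v0}); phi maps S onto S'."""
    phi = {u0: v0}
    while True:
        used = set(phi) | set(phi.values())
        n1 = neighborhood(edges, set(phi), used)
        n2 = neighborhood(edges, set(phi.values()), used)
        n1, n2 = n1 - n2, n2 - n1          # keep the neighborhoods disjoint
        if not n1 or not n2:
            return phi
        pairs = max_matching(sorted(n1), sorted(n2),
                             lambda u, v: extends(edges, label, phi, u, v))
        added = False
        for u, v in pairs:
            if extends(edges, label, phi, u, v):   # re-check against new pairs
                phi[u] = v
                added = True
        if not added:
            return phi


def isomorphic_pairs(edges, label):
    """Steps 1 and 3: grow every basic pair and collect the results."""
    nodes = sorted(label)                  # node ids must be orderable
    return [grow(edges, label, u, v)
            for i, u in enumerate(nodes) for v in nodes[i + 1:]
            if label[u] == label[v]]


# Two add -> mul chains feeding one shared sub node.
label = {1: "add", 2: "mul", 3: "add", 4: "mul", 5: "sub"}
edges = {(1, 2), (3, 4), (2, 5), (4, 5)}
print(isomorphic_pairs(edges, label))
```

Both basic pairs, (1, 3) and (2, 4), grow into the same mapping {1: 3, 2: 4}, so a real implementation would also need to discard duplicate results. The shared node 5 is never added because it lies in both neighborhoods.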
20. Structuring the output
The algorithm returns a list of pairs:
{ (S1, S2), (S3, S4), (S5, S6), …}
Suppose S1 and S3 are isomorphic. Then so are S2 and S4!
Suppose S3 is isomorphic to a subgraph of S1. Then S2 has a subgraph isomorphic to S4!
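The first observation suggests grouping the pairs into equivalence classes while comparing only one representative per class. A minimal sketch follows, assuming subgraphs are stored as frozensets of node ids and that the caller supplies the isomorphism test; is_isomorphic and group_into_classes are hypothetical names, and the second observation (subgraph containment) is not handled.

```python
def group_into_classes(pairs, is_isomorphic):
    """Group the subgraphs of all pairs into isomorphism classes.

    Each new pair is compared with one representative per class only:
    by transitivity, if S1 ~ S3 then S2 and S4 belong to the same class.
    """
    classes = []                       # each class: list of subgraphs
    for s, s2 in pairs:
        for cls in classes:
            if is_isomorphic(cls[0], s):
                cls.extend(x for x in (s, s2) if x not in cls)
                break
        else:
            classes.append([s, s2])
    return classes


# Stand-in test for the example: a precomputed "shape" tag per subgraph.
S1, S2, S3, S4 = (frozenset(p) for p in ({1, 2}, {3, 4}, {5, 6}, {7, 8}))
S5, S6 = frozenset({9}), frozenset({10})
shape = {S1: "A", S2: "A", S3: "A", S4: "A", S5: "B", S6: "B"}
calls = []


def same_shape(a, b):
    calls.append((a, b))
    return shape[a] == shape[b]


classes = group_into_classes([(S1, S2), (S3, S4), (S5, S6)], same_shape)
print(len(classes), len(calls))  # 2 classes, 2 isomorphism tests
```

Three pairs are organised with only two isomorphism tests, since S2 and S4 are never tested directly.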
22. Template choice: metrics
Largest Fit First
Largest templates are best
Most Frequent Fit First
Templates with the largest number of instances are best
Communication Weight metrics
E.g., #internal edges vs. #boundary edges ratio
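The three metrics can be compared on a toy graph with a greedy cover. This is a sketch only: the slides do not state how the covering is computed, so the greedy policy (rank templates by the metric, then accept every instance that does not overlap nodes already covered) is an assumption, as are all names below. A template is represented as a list of instances, each a frozenset of node ids.

```python
def internal_boundary_ratio(instance, edges):
    """#internal edges / #boundary edges for one template instance."""
    internal = sum(1 for a, b in edges if a in instance and b in instance)
    boundary = sum(1 for a, b in edges if (a in instance) != (b in instance))
    return internal / boundary if boundary else float("inf")


METRICS = {
    "LFF": lambda tpl, edges: len(tpl[0]),    # Largest Fit First: template size
    "MFF": lambda tpl, edges: len(tpl),       # Most Frequent Fit First: #instances
    "Comm": lambda tpl, edges: internal_boundary_ratio(tpl[0], edges),
}


def greedy_cover(templates, edges, num_nodes, metric):
    """Return the percentage of nodes covered by non-overlapping instances."""
    score = METRICS[metric]
    covered = set()
    for tpl in sorted(templates, key=lambda t: score(t, edges), reverse=True):
        for instance in tpl:
            if covered.isdisjoint(instance):
                covered |= instance
    return 100.0 * len(covered) / num_nodes


# A six-node chain with one large and one frequent template.
edges = {(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)}
big = [frozenset({1, 2, 3}), frozenset({4, 5, 6})]
small = [frozenset({1}), frozenset({3}), frozenset({5})]
for name in ("LFF", "MFF", "Comm"):
    print(name, greedy_cover([big, small], edges, 6, name))
```

On this toy input LFF and Comm both pick the large template first and cover every node, while MFF commits to the single-node template and blocks the larger one, ending at 50%.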
23. Experimental Results: Reversed-tree templates
Benchmark           Largest Template   Largest #Instances   #Templates
AES encryptblock    16                 3                    151
AES decryptblock    19                 3                    162
DES des_encrypt     38                 4                    57
24. Experimental Results: Free-shape templates
Benchmark           Largest Template   Largest #Instances   #Templates
AES encryptblock    132                2                    6790
AES decryptblock    147                2                    11006
DES des_encrypt     100                2                    1802
25. Experimental Results: Graph covering
Benchmark           Cover % LFF   Cover % MFF   Cover % Comm   CPU Time
AES encryptblock    74.3          32.7          74.1           32.5 sec
AES decryptblock    85.31         51.7          70.8           61 sec
DES des_encrypt     90.5          59.6          87.8           8.3 sec