This project implements mincut placement with and without terminal propagation to reduce wirelength. Terminal propagation temporarily moves connected nodes between partitions during cuts to minimize distance. For four benchmark circuits, terminal propagation achieved an average 27.8% reduction in wirelength with minimal runtime impact. Experimental results showed that terminal propagation significantly improves wirelength especially for smaller partition sizes. The implementation faces memory issues for very large circuits. Several extensions are proposed but not implemented due to time constraints.
Introduction: Mincut placement is a placement algorithm that targets wirelength reduction. It works by
partitioning the nodes present in a cell, cutting the cell either horizontally or vertically. This project uses
breadth-first recursive bi-partitioning, which means that cuts of the same orientation are applied together to
similarly sized partitions. The partitioning algorithm used in our project is the Fiduccia-Mattheyses (FM)
algorithm. Terminal propagation is a type of bias that is taken into consideration while partitioning. It helps
keep connected nodes that lie in different partitions as close together as possible. When a cut is performed, a
window is created; nodes in other partitions that lie within this window are not terminally propagated. The
window size is the breadth of the window. This project uses a constant window size of 30%.
Problem Formulation: Recursive bisection was chosen as the partitioning method because terminal propagation can
only be used with such a method. The main reason for choosing the FM algorithm was to improve runtime, since the
circuits have large nets. The window size was fixed at 30% because previous year’s projects found experimentally
that it is a good size. Wherever possible, the complexity of the program has been reduced. Most of the
parameters required for mincut are stored in vectors. Each partition produced by recursive bisection is pushed
into a queue along with the orientation of the cut that will be applied to it when it is popped. Another queue
maintains the x and y coordinates of the partition. These are explained in detail in the next section.
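
The queue-driven, breadth-first bisection described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `Region`, `bisect` and `placeBreadthFirst` are hypothetical, a single queue of structs stands in for the two parallel queues of the report, `bisect` is a placeholder that splits the node list in half where the real program runs FM, and every cut is placed at the midpoint of the region, whereas the report lets the area constraint skew the cut location.

```cpp
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical types; the report does not show its actual declarations.
enum Orientation { VERTICAL = 0, HORIZONTAL = 1 };  // 0/1 encoding as in the report

struct Region {
    std::vector<int> nodes;  // ids of the nodes placed in this partition
    Orientation nextCut;     // orientation of the cut applied when popped
    double x0, y0, x1, y1;   // origin and opposite corner of the region
};

// Placeholder for FM: first half of the list goes left, the rest goes right.
std::pair<std::vector<int>, std::vector<int>> bisect(const std::vector<int>& nodes) {
    auto half = static_cast<std::ptrdiff_t>(nodes.size() / 2);
    return { std::vector<int>(nodes.begin(), nodes.begin() + half),
             std::vector<int>(nodes.begin() + half, nodes.end()) };
}

// Breadth-first recursive bisection. Regions with at most minNodes nodes are
// not cut further and are returned as leaves.
std::vector<Region> placeBreadthFirst(Region root, std::size_t minNodes) {
    std::vector<Region> leaves;
    std::queue<Region> pending;
    pending.push(std::move(root));
    while (!pending.empty()) {
        Region r = std::move(pending.front());
        pending.pop();
        if (r.nodes.size() <= minNodes || r.nodes.size() < 2) {
            leaves.push_back(std::move(r));
            continue;
        }
        auto halves = bisect(r.nodes);
        // Children get the opposite orientation for their own cut.
        Orientation next = (r.nextCut == VERTICAL) ? HORIZONTAL : VERTICAL;
        Region a{std::move(halves.first), next, r.x0, r.y0, r.x1, r.y1};
        Region b{std::move(halves.second), next, r.x0, r.y0, r.x1, r.y1};
        if (r.nextCut == VERTICAL) {   // vertical cut line splits the x range
            double mid = (r.x0 + r.x1) / 2.0;
            a.x1 = mid;
            b.x0 = mid;
        } else {                       // horizontal cut line splits the y range
            double mid = (r.y0 + r.y1) / 2.0;
            a.y1 = mid;
            b.y0 = mid;
        }
        pending.push(std::move(a));
        pending.push(std::move(b));
    }
    return leaves;
}
```

Because the queue is first-in first-out, all partitions of one level are cut before any partition of the next level, which is what makes cuts of the same orientation occur together.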
Algorithm Discussion: The netlist is read as it is, and an array of vectors called “net” is created. Each vector
stores the nodes connected to that net. Another array called “node” stores the structures that hold the
information about each node: the node number, the partition it belongs to, the index of its net number, its
degree, its gain and its (x,y) coordinates. A bucket, implemented as an array of vectors of nodes, is maintained
for each block that will result from the partition. Separate vectors called “unlocked_A” and “unlocked_B” hold
the nodes in each partition that are still to be moved. A gain vector records the gain resulting from each move
in FM.

After each node is moved, if the new cutsize is less than the previous cutsize, the gain is stored. Once all
nodes have been moved, the partition that had the lowest cutsize is restored, and the next pass of FM is called.
The passes continue as long as each pass produces a smaller cutsize than the previous pass. To calculate the gain
of a node, the nets to which the node belongs are identified. For each such net, if any node of that net is
present in the opposite partition, the gain increases by one. If all the nodes of that net are in the same
partition as the current node, the gain decreases by one.

Once FM is done, the newly obtained partitions are pushed into a queue. The queue is a segmented queue: alongside
each partition it stores the orientation of the next cut to be performed. Suppose the present cut is the first
vertical cut (represented as 0). The new partitions in the queue will have a “horizontal” entry (represented as
1) in the neighboring segment of the queue, so that when a partition is popped out to be cut, it carries the
orientation of the cut with it. There is another queue which runs in parallel to this one. It stores a vector of
the x and y coordinates of the origin and the extreme vertices of the partition.
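
The gain rule described above can be written out as a short sketch. The data layout (`nets`, `netsOf`, `side`) and both function names are assumptions made for illustration and are not taken from the project's code. `reportGain` follows the rule exactly as worded in this report. `textbookGain` is included only for contrast: it is the usual FM gain, which counts +1 only when the node is the sole node of a net on its own side.

```cpp
#include <vector>

// Hypothetical layout, not taken from the project's source:
//   nets[n]   = ids of the nodes on net n
//   netsOf[v] = ids of the nets that node v belongs to
//   side[v]   = 0 or 1, the partition currently holding node v

// Gain as worded in the report: +1 for every net with at least one node in
// the opposite partition, -1 for every net lying entirely on v's side.
int reportGain(int v, const std::vector<std::vector<int>>& nets,
               const std::vector<std::vector<int>>& netsOf,
               const std::vector<int>& side) {
    int gain = 0;
    for (int n : netsOf[v]) {
        bool anyOpposite = false;
        for (int u : nets[n]) {
            if (side[u] != side[v]) {
                anyOpposite = true;
                break;
            }
        }
        gain += anyOpposite ? 1 : -1;
    }
    return gain;
}

// Usual FM gain, for contrast: +1 when v is the only node of the net on its
// own side (moving v uncuts the net), -1 when the net has no node on the
// other side (moving v cuts the net).
int textbookGain(int v, const std::vector<std::vector<int>>& nets,
                 const std::vector<std::vector<int>>& netsOf,
                 const std::vector<int>& side) {
    int gain = 0;
    for (int n : netsOf[v]) {
        int sameSide = 0;
        int otherSide = 0;
        for (int u : nets[n]) {
            if (side[u] == side[v]) {
                ++sameSide;
            } else {
                ++otherSide;
            }
        }
        if (sameSide == 1) ++gain;
        if (otherSide == 0) --gain;
    }
    return gain;
}
```

The two rules agree on two-node nets but can differ on larger nets, where a node may have neighbors on both sides of the cut.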
The window size is fixed at 30%. Depending on the area constraint, however, the location of the cut may be
skewed such that it falls marginally outside the window. When the window size was set to 35% so that the cut
fell within the window, there was no change in the partitioning or the wirelength. For terminal propagation,
all the nodes that are connected to a given node in the partition to be cut, but that
1) lie outside both partitions that are to be cut, and
2) lie outside the window
are temporarily moved into the partition of the current node. After the cut, the temporary nodes are removed.
Wirelength is calculated using the Half Perimeter Bounding Box (HPBB). The final x and y coordinates of each
node are dumped into a text file, and MATLAB is used to open the text file and plot the nodes and their
connections. The program ends when there are 2 nodes in each partition. This number can be, and has been,
varied during the experimentation. The area constraint is the ratio of nodes allowed in each partition. It is
expressed in the range 0 to 0.5, where 0.5 means a 50/50 split of nodes between the partitions and 0 means one
partition can have all the nodes.
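
As an illustration of the HPBB measure mentioned above, the following sketch sums, over all nets, half the perimeter of the smallest axis-aligned box enclosing the net's nodes. The `Point` struct, the `nets` layout and the function name `hpbbWirelength` are hypothetical and chosen only for this example.

```cpp
#include <algorithm>
#include <vector>

struct Point {
    double x;
    double y;
};

// nets[n] holds the ids of the nodes on net n; pos[v] is the placed
// location of node v. Both layouts are assumptions for this sketch.
double hpbbWirelength(const std::vector<std::vector<int>>& nets,
                      const std::vector<Point>& pos) {
    double total = 0.0;
    for (const auto& net : nets) {
        if (net.empty()) continue;  // an empty net contributes nothing
        double minX = pos[net[0]].x, maxX = minX;
        double minY = pos[net[0]].y, maxY = minY;
        for (int v : net) {
            minX = std::min(minX, pos[v].x);
            maxX = std::max(maxX, pos[v].x);
            minY = std::min(minY, pos[v].y);
            maxY = std::max(maxY, pos[v].y);
        }
        // Half perimeter = width + height of the bounding box.
        total += (maxX - minX) + (maxY - minY);
    }
    return total;
}
```

For example, a net connecting nodes at (0,0) and (3,4) contributes 3 + 4 = 7, and a net with a single node contributes 0.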
Implementation Issues: The code was first written entirely in MATLAB. Even after reducing complexity as much as
possible, the runtime was found to be extremely high for a single pass of FM. For the fract.hgr benchmark
circuit, the runtime was approximately 30 seconds, around 1000 times slower than previous year’s runtime
results. The results for varying the area constraint were also arbitrary. The entire code was therefore
rewritten in C++. The runtime for fract.hgr in C++ for multiple passes was found to be a few hundred milliseconds.
Limitations: For circuits with 5000 or more nets (from biomed.hgr upwards), the program fails with a
segmentation fault. We were unable to find a solution to this fault, which prevented us from trying large
circuits. The minimum number of nodes per partition is 2.
Experimental Results:
FM Standalone Results:
Benchmark Circuit   Number of Passes   Initial Cutsize   Final Cutsize   Gain
Fract               4                  87                11              76
P1                  5                  122               77              45
StructP             3                  189               46              143
P2                  5                  411               206             205
The initial partition is kept constant: the first half of the nodes is placed in the first partition and
the remaining nodes are placed in the second partition.
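
A minimal sketch of this fixed initial partition, together with the cutsize it is measured by, is shown below. The function names and the `nets`/`side` layout are hypothetical, and rounding the first half down for an odd node count is an assumption that the report does not state.

```cpp
#include <cstddef>
#include <vector>

// side[v] = 0 for the first half of the nodes, 1 for the rest.
// For an odd node count the first partition gets the smaller half (assumption).
std::vector<int> initialPartition(std::size_t numNodes) {
    std::vector<int> side(numNodes, 1);
    for (std::size_t v = 0; v < numNodes / 2; ++v) {
        side[v] = 0;
    }
    return side;
}

// Cutsize = number of nets that have nodes in both partitions.
// nets[n] holds the ids of the nodes on net n.
int cutSize(const std::vector<std::vector<int>>& nets,
            const std::vector<int>& side) {
    int cut = 0;
    for (const auto& net : nets) {
        bool inFirst = false;
        bool inSecond = false;
        for (int v : net) {
            if (side[v] == 0) {
                inFirst = true;
            } else {
                inSecond = true;
            }
        }
        if (inFirst && inSecond) ++cut;
    }
    return cut;
}
```

Keeping the starting partition deterministic makes the initial cutsize reproducible from run to run, so the gain reported for each circuit depends only on the FM passes.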
For mincut, the four benchmark circuits were evaluated for wirelength and runtime while varying
the area constraint and the number of nodes per partition. The observed results are given in the table below.
Circuit   Area         Nodes per   Without TP                 With TP
name      constraint   partition   Wirelength   Runtime (s)   Wirelength   Runtime (s)
P2        0.45         2           42998.9      364.468       31874.3      390.226
P2        0.35         2           43708.2      521.062       33683.6      547.359
P2        0.45         16          39647.8      360.109       29019.5      386.296
P2        0.35         16          39655.5      513.703       30690.5      527.859
StructP   0.45         2           13244        70.328        9484.64      81.078
StructP   0.35         2           13447.6      55.172        9134.9       58.86
StructP   0.45         16          10150        70.422        7232.86      79.781
StructP   0.35         16          10268.2      54.938        6864.18      58.657
P1        0.45         2           7716.89      11.906        5151.78      14.766
P1        0.35         2           7588.93      8.954         5386.89      10.125
P1        0.45         16          6476.37      11.766        4105.9       14.703
P1        0.35         16          6062.98      8.422         4421.55      10.156
Fract     0.45         2           812.058      0.266         580.475      0.343
Fract     0.35         2           860.311      0.204         609.846      0.265
Fract     0.45         16          509.69       0.203         417.062      0.297
Fract     0.35         16          541.524      0.234         421.523      0.297
The results are plotted in the following graphs for easier interpretation.
[Figure: Wirelength for 2 nodes per partition, without TP and with TP, at area constraints 0.45 and 0.35, for P2, StructP, P1 and Fract]
[Figure: Wirelength for 16 nodes per partition, without TP and with TP, at area constraints 0.45 and 0.35, for P2, StructP, P1 and Fract]
[Figure: Runtime in seconds for 2 nodes per partition, without TP and with TP, at area constraints 0.45 and 0.35, for P2, StructP, P1 and Fract]
[Figure: Runtime for 16 nodes per partition, without TP and with TP, at area constraints 0.45 and 0.35, for P2, StructP, P1 and Fract]
[Figure: Gain in wirelength through terminal propagation for P2, StructP, P1 and Fract, for 2 and 16 nodes per partition at area constraints 0.45 and 0.35]
It can be seen from the graphs that the implemented mincut placement, with and without terminal
propagation, is reasonably fast: runtimes range from under a second for Fract to about 550 seconds for P2.
Terminal propagation also yields a significant reduction in wirelength. We obtained an average gain of
27.8% in wirelength with the help of terminal propagation.
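
The average gain can be cross-checked against the wirelength values in the results table. The sketch below assumes that the figure is the mean of the sixteen per-configuration percentage reductions, which the report does not state explicitly. Under that assumption the result comes out at roughly 27.8%. The names `Run`, `reportedRuns` and `averageReductionPercent` are hypothetical.

```cpp
#include <vector>

struct Run {
    double withoutTP;  // wirelength without terminal propagation
    double withTP;     // wirelength with terminal propagation
};

// Wirelength pairs copied from the results table, in table order
// (P2, StructP, P1, Fract; four configurations each).
std::vector<Run> reportedRuns() {
    return {
        {42998.9, 31874.3}, {43708.2, 33683.6}, {39647.8, 29019.5}, {39655.5, 30690.5},
        {13244.0, 9484.64}, {13447.6, 9134.9},  {10150.0, 7232.86}, {10268.2, 6864.18},
        {7716.89, 5151.78}, {7588.93, 5386.89}, {6476.37, 4105.9},  {6062.98, 4421.55},
        {812.058, 580.475}, {860.311, 609.846}, {509.69, 417.062},  {541.524, 421.523},
    };
}

// Mean of the per-run percentage reductions; returns 0 for an empty input.
double averageReductionPercent(const std::vector<Run>& runs) {
    if (runs.empty()) return 0.0;
    double sum = 0.0;
    for (const Run& r : runs) {
        sum += (r.withoutTP - r.withTP) / r.withoutTP * 100.0;
    }
    return sum / static_cast<double>(runs.size());
}
```

Averaging percentages per configuration weights every circuit equally. Computing the reduction from the summed wirelengths instead would let P2 dominate the result.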
Final Cells for benchmark circuits:
[Figures: final cell plots for Fract, without and with terminal propagation, for area constraint 0.35 with 2 and 16 nodes per partition, and for area constraint 0.45 with 2 and 16 nodes per partition]
*For full size images, please check the “output files” folder.
Conclusion and Extension: The implementation of the mincut placement algorithm is reasonably fast, and with the
help of terminal propagation, wirelength can be significantly improved. There has been an average gain of 27.8%
in wirelength with the help of terminal propagation. Many extensions and improvements are possible.
The following were considered but not implemented due to lack of time.
1) Implement weighted FM, where nodes from neighboring partitions carry more weight during
partitioning than nodes multiple partitions away.
2) Compare the Kernighan-Lin (KL) algorithm and the weighted FM algorithm in terms of wirelength for circuits of different sizes.
3) Observe the variation in wirelength and runtime as the number of nodes per partition changes.
4) Randomize the initial partitions for FM.
5) Observe the effect of changing the window size.