ECE 260B Mini Project 2 Report
Fanyu Yang (A53102865, fay012@eng.ucsd.edu)
Yan Yan (A53096778, yay079@eng.ucsd.edu)
All files used for data collection are available at:
/home/linux/ieng6/ee260b/fay012/mp_2/mp2_results
Metric 1 Design Methodology and Results
1.1 Design Methodology
In this project, we size cells carefully to reduce leakage power as much as possible while still
meeting the timing constraints. We consider two methods: Vt swapping and resizing.
We sort the cell list in descending order of slack and then modify the cells one by one,
accepting a modification only if it does not cause a violation. The flow is shown in Figure 1.
Figure 1 Flow Chart of Design Methodology
(1) First, we apply Vt swapping, since it reduces leakage more than downsizing does and is less
likely to cause violations. Inspired by sensitivity-guided metaheuristics [1], and to keep the
algorithm simple, we sort the cell list by cell slack. To avoid transition violations,
capacitance violations and block slack violations, we construct a feasibility function f.feasible()
that returns a Boolean value. When f.feasible() returns true, meaning that no violation occurs, we
raise the Vt by one step at a time (i.e. swapping a cell from LVT to NVT, etc.) and repeat until
as many cells as possible have been swapped to NVT and HVT.
(2) Second, we apply downsizing. As before, we first sort the cells by slack and use f.feasible() as
the constraint on each modification. When f.feasible() returns true, we move the cell one size
down (i.e. from 40 to 20, etc.). We then repeat the sorting and downsizing until every cell has reached
its smallest feasible size.
(3) Because the cells connected to a given cell have been modified in the meantime, a change that
caused a violation earlier may no longer do so, which means that certain cells can be modified
further. Thus, we repeat (1) after downsizing.
(4) We adjust the algorithm to the characteristics of each of the three benchmarks:
1) For usb_phy. Since the cell count is small, we modify all cells.
Comparing the pre-ECO and post-ECO results, we find that ECO has little impact on WNS, so
we set the slack bound to 0 in f.feasible().
2) For aes_cipher_top. The cell count in this benchmark is comparatively large, so we introduce a
limit CellCNT on the number of cells changed in each step in order to avoid
unnecessary running time. In the pre-ECO step, we notice that when CellCNT increases from 10000 to
19059 (the maximum), the leakage improvement increases by 2% with almost equal running time.
Hence, we use CellCNT=19059 to obtain the better result.
We also compared the pre-ECO and post-ECO results and found that with the slack bound in f.feasible()
set to 0, the post-ECO slack becomes slightly negative (about -0.3 ps). To avoid a violation beyond
-1 ps, we set the slack bound to 0 ps in f.feasible() and reran the simulation to obtain new results.
In addition, the sizing report shows that many cells are still at sizes 08 and 06 after step
(3). As these large cells may contribute more to leakage, we repeat step (2) after step (3).
3) For fpu_add_x3. The cell count in this benchmark is even larger. We simulate with
CellCNT1=10000 (approximately 30% of the total cell count) and CellCNT2=27829 (the total cell count).
The results show that when CellCNT increases from 10000 to 27829, the leakage
improvement rises by only a little more than 1% (from 92.4% to 93.6%), while the running times of the
ECO simulation are almost the same. Since the extra leakage improvement costs almost no additional
running time, we set CellCNT=27829.
For the same reasons as in aes_cipher_top, we set the slack bound to 0 ps in f.feasible() and repeat step
(2) after step (3) in simulation.
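Steps (1) to (3) can be sketched in Python as follows. This is a minimal toy model rather than the scripts used in the project: the Cell class, the VT_PENALTY and SIZE_PENALTY slack costs, the leakage formula and the size list are all hypothetical, and feasible() checks only a per-cell slack bound, whereas the real f.feasible() also checks transition and capacitance violations through the timer.

```python
from dataclasses import dataclass

VT_LEVELS = ["LVT", "NVT", "HVT"]  # ordered from leakiest to least leaky
SIZES = [1, 2, 4, 8]               # hypothetical drive strengths
VT_PENALTY = 2.0                   # hypothetical slack lost per Vt step (ps)
SIZE_PENALTY = 1.0                 # hypothetical slack lost per size step (ps)


@dataclass
class Cell:
    name: str
    vt: int       # index into VT_LEVELS
    size: int     # index into SIZES
    slack: float  # ps


def leakage(cell):
    # Toy model: leakage grows with size and drops 4x per Vt step.
    return SIZES[cell.size] / (4 ** cell.vt)


def feasible(cell, penalty, slack_bound=0.0):
    # Stand-in for f.feasible(): only the slack bound is checked here.
    return cell.slack - penalty >= slack_bound


def one_pass(cells, attr, limit, penalty, cell_cnt, slack_bound):
    """Visit cells in descending slack order and change each by one step."""
    changed = 0
    for cell in sorted(cells, key=lambda c: c.slack, reverse=True):
        if changed >= cell_cnt:
            break
        value = getattr(cell, attr)
        if value != limit and feasible(cell, penalty, slack_bound):
            # Vt moves up toward HVT, size moves down toward the smallest.
            setattr(cell, attr, value + (1 if limit > value else -1))
            cell.slack -= penalty
            changed += 1
    return changed


def optimize(cells, cell_cnt, slack_bound=0.0):
    top_vt = len(VT_LEVELS) - 1
    # (1) Vt swapping, (2) downsizing, (3) Vt swapping again.
    for attr, limit, penalty in (("vt", top_vt, VT_PENALTY),
                                 ("size", 0, SIZE_PENALTY),
                                 ("vt", top_vt, VT_PENALTY)):
        while one_pass(cells, attr, limit, penalty, cell_cnt, slack_bound):
            pass  # re-sort and repeat until no cell can be changed


cells = [Cell("U1", vt=0, size=3, slack=10.0),
         Cell("U2", vt=0, size=1, slack=1.5)]
before = sum(leakage(c) for c in cells)
optimize(cells, cell_cnt=len(cells))
after = sum(leakage(c) for c in cells)
print(before, after, [(c.name, VT_LEVELS[c.vt], SIZES[c.size]) for c in cells])
```

In this toy model a change to one cell never affects the slack of its neighbours, so the third phase finds nothing new; in the real flow that interaction is exactly why step (3) is worthwhile.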
1.2 Simulation Results And Analysis
The results are shown in Table 1.
Benchmark        Pre-ECO Slack (ps)   Post-ECO Slack (ps)   Pre-ECO Leakage (W)   Post-ECO Leakage (W)   Leakage Improvement, W (%)
usb_phy          1.528870             1.528870              0.000951              0.000951               0.013289 (93.32%)
aes_cipher_top   0.012146             -0.360962             0.0652345             0.0652345              0.65874 (90.997%)
fpu_add_x3       0.014832             0.025452              0.0851705             0.0851705              1.23693 (93.558%)
Table 1 Simulation Results
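The percentages in the last column of Table 1 can be checked with a short calculation. The sketch below assumes (this is our reading, not stated in the report) that the percentage is the leakage reduction divided by the original leakage, and that the original leakage equals the final leakage plus the reduction. The function name is hypothetical.

```python
def leakage_improvement_pct(reduction_w, final_leakage_w):
    """Reduction as a percentage of the original (pre-optimization) leakage."""
    original_w = reduction_w + final_leakage_w  # assumption: original = final + reduction
    return 100.0 * reduction_w / original_w


# Values taken from the usb_phy and fpu_add_x3 rows of Table 1.
usb_phy_pct = leakage_improvement_pct(0.013289, 0.000951)
fpu_add_x3_pct = leakage_improvement_pct(1.23693, 0.0851705)
print(round(usb_phy_pct, 2), round(fpu_add_x3_pct, 3))
```

Under this assumption the usb_phy and fpu_add_x3 rows reproduce the reported 93.32% and 93.558%.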
Analysis: Comparing the pre-ECO and post-ECO results, we find that after ECO the slack may become
worse while the leakage power remains the same. This is because parasitic extraction is taken into
account in ECO, so the slack degrades due to the parasitic effects of the devices and the interconnect
wires. In addition, placement, floorplanning, CTS and routing take place during ECO, which can also
worsen the slack. However, these effects mostly influence dynamic power.
Hence, leakage is affected very little by ECO.
Metric 2 Total Leakage Power Reduction
Total leakage power reduction is shown in Table 2.
Benchmark        Leakage Reduced (W)   Weightage   Weighted Leakage Reduction (W)
usb_phy          0.013289              10%         0.001329
aes_cipher_top   -0.360962             45%         0.296714
fpu_add_x3       0.025452              45%         0.556618
Total                                              0.854662
Table 2 Leakage Reduction
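The weighting in Table 2 is a simple weighted sum, which the following sketch illustrates. The function name and dictionary layout are hypothetical; the weights and the weighted values are those listed in Table 2, and the usb_phy reduction is the one reported in Table 1.

```python
WEIGHTS = {"usb_phy": 0.10, "aes_cipher_top": 0.45, "fpu_add_x3": 0.45}


def weighted_reduction(benchmark, reduction_w):
    """Leakage reduction (W) scaled by the benchmark's weight."""
    return WEIGHTS[benchmark] * reduction_w


# Single row: usb_phy reduction from Table 1 times its 10% weight.
usb_phy_weighted = weighted_reduction("usb_phy", 0.013289)

# Total: sum of the weighted column as listed in Table 2.
table2_weighted = {
    "usb_phy": 0.001329,
    "aes_cipher_top": 0.296714,
    "fpu_add_x3": 0.556618,
}
total = sum(table2_weighted.values())
print(usb_phy_weighted, total)
```

The sum of the rounded entries agrees with the reported total to within rounding in the last digit.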
Metric 3 Post-ECO Timing Quality
Post-ECO quality is shown in Table 3.
Benchmark        WNS (ps)     Capacitance Violations   Transition Violations
usb_phy          1.528870     0                        0
aes_cipher_top   -0.360962    0                        0
fpu_add_x3       0.992554     0                        0
Table 3 Post-ECO Timing Quality
Table 3 shows that every WNS is better than -1 ps, with no maximum capacitance violations
and no maximum transition violations.
Metric 4 PBA VS GBA; SI VS Non-SI
(1) In PBA
Benchmark        Startpoint                                            Endpoint                                      Slack in PBA (ps)   Slack in GBA (ps)
usb_phy          i_rx_phy_fs_ce_reg_u0                                 rst_cnt_reg_4__u0                             1.537797            1.537781
aes_cipher_top   sa03_reg_4_                                           sa33_reg_3_                                   1.275158            -0.360962
fpu_add_x3       u1_fpu_add_frac_dp_i_a4stg_rnd_frac_pre3_q_reg_11_    u1_fpu_add_frac_dp_i_a5stg_rndadd_q_reg_41_   0.195740            0.100281
Table 4 Timing Comparison Between PBA and GBA
From the report, we can see that the following cells have different delay values:
1) in usb_phy: U353 (in01s02); U516 (no02s02).
2) in aes_cipher_top: U16287 (oa22s02) ; FE_RC_58_0 (in01s02).
3) in fpu_add_x3: U28122 (in01s01).
Analysis: GBA assumes worst-case conditions at every cell (for example, the worst input slew among
all paths through the cell) when computing delays, which makes it pessimistic. PBA recomputes the delay
using the conditions along the specific path being analyzed. If the worst case does not occur on that
path, the cell delay differs between GBA and PBA.
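The source of the GBA pessimism can be illustrated with a toy two-gate example. The linear delay and slew models and all numbers below are hypothetical and are not taken from the benchmark libraries. The path of interest enters the first gate through a fast input, while another input of the same gate has a much slower transition.

```python
def cell_delay(input_slew_ps, base_ps=20.0, k=0.5):
    # Hypothetical linear delay model: slower input edges give longer delays.
    return base_ps + k * input_slew_ps


def output_slew(input_slew_ps, base_ps=10.0, k=0.2):
    # Hypothetical linear model for the slew produced at the cell output.
    return base_ps + k * input_slew_ps


def path_delay_gba(path_slew_ps, all_input_slews_ps):
    d1 = cell_delay(path_slew_ps)
    # GBA propagates the worst output slew over all inputs of the first gate.
    worst_slew = max(output_slew(s) for s in all_input_slews_ps)
    return d1 + cell_delay(worst_slew)


def path_delay_pba(path_slew_ps):
    d1 = cell_delay(path_slew_ps)
    # PBA propagates only the slew that actually occurs along this path.
    return d1 + cell_delay(output_slew(path_slew_ps))


gba = path_delay_gba(10.0, [10.0, 50.0])
pba = path_delay_pba(10.0)
print(gba, pba)
```

The second gate sees a slower input edge in GBA than in PBA, so its delay is larger in GBA and the PBA slack is better, which is the direction of the differences in Table 4.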
(2) SI-aware Timing Analysis
PrimeTime SI performs crosstalk analysis in conjunction with the regular PT analysis flow. First,
electrical filtering removes aggressor nets whose effects are small. PrimeTime SI then takes the nets
that were not filtered, analyzes the crosstalk effects on them and calculates delays that include crosstalk.
Since crosstalk analysis is an iterative process, SI performs it in multiple steps. In the first iteration, SI
uses a conservative model which assumes that every aggressor net has a worst-case transition at the
worst possible time. In the second iteration, SI takes into account the times at which victim transitions
can occur and the directions of the transitions, and it ignores crosstalk delays that can never
occur. Since the second iteration usually provides good results, SI normally exits the loop after
the second iteration.
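The two-iteration idea described above can be sketched with a toy model. This is an illustration under simplifying assumptions, not PrimeTime SI's algorithm: the Aggressor class, the filter threshold, the additive delay bumps and the timing windows are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Window = Tuple[float, float]  # (earliest, latest) switching time in ps


@dataclass
class Aggressor:
    name: str
    bump_ps: float   # hypothetical extra victim delay if it switches at the worst time
    window: Window   # when this aggressor can switch


def overlaps(a: Window, b: Window) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def crosstalk_delta(aggressors: List[Aggressor], victim: Window,
                    filter_ps: float = 0.5) -> Tuple[float, float]:
    # Electrical filtering: drop aggressors whose effect is too small.
    active = [a for a in aggressors if a.bump_ps >= filter_ps]
    # Iteration 1 (conservative): every remaining aggressor switches
    # at the worst possible time.
    delta1 = sum(a.bump_ps for a in active)
    # Iteration 2: the victim window is widened by the first estimate, and
    # only aggressors that can switch inside that window are kept.
    widened = (victim[0], victim[1] + delta1)
    delta2 = sum(a.bump_ps for a in active if overlaps(a.window, widened))
    return delta1, delta2


aggressors = [
    Aggressor("n1", 5.0, (100.0, 120.0)),  # overlaps the victim window
    Aggressor("n2", 3.0, (300.0, 320.0)),  # switches long after the victim
    Aggressor("n3", 0.1, (100.0, 120.0)),  # removed by electrical filtering
]
delta1, delta2 = crosstalk_delta(aggressors, victim=(90.0, 110.0))
print(delta1, delta2)
```

The second estimate is never larger than the first, because it only removes aggressors that cannot align with the victim transition.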
(3) In SI
Benchmark        Startpoint                                      Endpoint                                        Slack in SI (ps)   Slack in non-SI (ps)
usb_phy          i_tx_phy_sft_done_reg_u0                        i_tx_phy_append_eop_reg_u0                      -2.093628          1.528870
aes_cipher_top   sa30_reg_4_                                     sa31_reg_1_                                     -49.412964         5.280518
fpu_add_x3       u1_fpu_add_frac_dp_i_astg_xtra_regs_q_reg_7_    u1_fpu_add_frac_dp_i_a3stg_ld0_frac_q_reg_43_   -28.073547         7.205322
Table 5 Timing Comparison Between SI and non-SI
From the report, we can see the following cells have different delay values:
1) in usb_phy: U538 (in01s01).
2) in aes_cipher_top: U21759 (no02s02); U20150 (in01s02).
3) in fpu_add_x3: U21666 (na02s04); U33324 (oa22s02).
Analysis: SI calculates delays with crosstalk effects taken into account. Hence, the cell delays
computed in SI are usually longer than those in non-SI, and the slack is correspondingly worse.
References:
[1] J. Hu et al., "Sensitivity-guided metaheuristics for accurate discrete gate sizing," in Proc. IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2012.