We propose a methodology based on an iterative algorithm (IR) and simulated annealing (SA) for minimizing leakage power by means of gate sizing and threshold voltage assignment.
2. Outline
• Introduction
• Related Work
• Problem Formulation
• Proposed Methodology
• Experimental Results
• Conclusion and Future Work
3. Introduction
• Low Power and High Performance
• Mobile device
• Leakage Power Rise
• ITRS Roadmap 2009 [33]
• Technology scales down
4. Leakage Power Minimization Methods
• Gate Sizing
• 𝐺𝑎𝑡𝑒 𝑆𝑖𝑧𝑒 ∝ 𝐿𝑒𝑎𝑘𝑎𝑔𝑒 𝑃𝑜𝑤𝑒𝑟 ∝ 𝐷𝑟𝑖𝑣𝑖𝑛𝑔 𝑆𝑡𝑟𝑒𝑛𝑔𝑡ℎ
• Threshold Voltage Assignment
• 𝑉𝑡ℎ ∝ 1/𝐿𝑒𝑎𝑘𝑎𝑔𝑒 𝑃𝑜𝑤𝑒𝑟
• 𝑉𝑡ℎ ∝ 𝐷𝑒𝑙𝑎𝑦 𝑡𝑖𝑚𝑒
• Low Vth on critical path
• High Vth on non-critical path
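The last two bullets amount to a rule on slack. A minimal Python sketch, assuming each cell already has a slack value from a timing engine; the function name `assign_vth`, the `margin_ps` parameter, and the "low"/"high" labels are illustrative and not taken from the slides:

```python
def assign_vth(cell_slack_ps, margin_ps=0.0):
    """Map each cell to a Vth class from its slack.

    Cells whose slack is at or below margin_ps are treated as critical
    and get low Vth (faster, leakier); all others get high Vth.
    """
    return {
        cell: ("low" if slack <= margin_ps else "high")
        for cell, slack in cell_slack_ps.items()
    }


# Hypothetical slacks in ps: u1 is critical, u2 has spare slack.
vth = assign_vth({"u1": -2.0, "u2": 5.0})
print(vth)
```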
6. Related Work
Continuous methods
• Linear Programming (LP)
• Geometric Programming (GP)
Discrete methods
• Sensitivity-based Approach
• Slack and Delay Budgeting
• Dynamic Programming (DP)
• Lagrangian Relaxation (LR)
• Linear Programming (LP)
• Simulated Annealing (SA)
7. Continuous Methods
• Linear Programming (LP)
• Linear delay model
• The selection of gates is defined as a linear function
• Geometric programming (GP)
• Polynomial delay model
8. Discrete Methods
• Sensitivity-based approach
• Score and rank gates according to a defined sensitivity
• Iteratively select the best gate for optimization until no improvement can be made
• Slack and delay budgeting
• Allocate a slack budget to each gate
• Use each gate's slack budget to trade delay for lower power
• Dynamic Programming (DP)
• Use decision stages and a cost-to-go function
9. Discrete Methods (cont.)
• Lagrangian Relaxation (LR)
• Convert the constrained problem to an unconstrained one
• Lagrange multiplier
• Linear Programming (LP)
• The selection of gates is implemented by assigning a value to a binary variable: 1 if chosen and 0 otherwise
• Simulated Annealing (SA)
• Probabilistic method for finding a good approximation to the global optimum
10. Related Work Comparison
Methodology | Pros | Cons
Continuous sizing: LP, GP | Fast | Modeling error; mapping issue
Discrete sizing: Sensitivity | | Local optimal
Discrete sizing: Slack & Delay, LP | | Ignore delay interaction
Discrete sizing: DP | | Solution space explosion
Discrete sizing: LR | Large scale | Solution oscillation
Discrete sizing: SA | Global optimal approximation; fast solution space exploration |
12. Motivational Example
Solution | u1 | u2 | u3 | Timing Violation | Total Leakage Power
Solution 1 | s10 | s06 | s04 | -2.32 | 26
Solution 2 | s10 | s06 | f04 | 0 | 86
Solution 3 | s10 | s06 | m04 | 0 | 38
[Figure: three oa cells u1, u2, u3 connected through nets n1, n2, n3, n4, annotated with 50ps]
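The example shows why both constraints and cost matter: the lowest-leakage row violates timing, so the choice falls to the cheapest row without a violation. A small Python sketch over the three rows of the table, assuming a negative "Timing Violation" value means the timing constraint is not met (the helper name `best_feasible` is illustrative):

```python
# Rows copied from the motivational example:
# (name, (u1, u2, u3), timing violation, total leakage power)
solutions = [
    ("Solution 1", ("s10", "s06", "s04"), -2.32, 26),
    ("Solution 2", ("s10", "s06", "f04"), 0.0, 86),
    ("Solution 3", ("s10", "s06", "m04"), 0.0, 38),
]


def best_feasible(rows):
    """Return the lowest-leakage row with no timing violation, or None."""
    feasible = [row for row in rows if row[2] >= 0]
    return min(feasible, key=lambda row: row[3]) if feasible else None


best = best_feasible(solutions)
print(best[0], best[1], best[3])
```

Only the cell u3 differs between the three rows, so the example isolates the effect of one cell's threshold voltage type on both timing and leakage.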
13. Problem Formulation
• Inputs:
• Standard Cell Library
• Gate-level Netlist
• Timing Constraints
• Interconnect Parasitics
• Outputs:
• The selected size and threshold voltage of each cell
• Objective:
• Satisfy all performance constraints
• Minimize total leakage power
14. Performance constraints
• Slack violation:
• Negative slack exists at a primary output (PO) or DFF input.
• Slew (transition time) violation:
• At a PO or cell input pin, the transition time exceeds the maximum transition time limit.
• Max-load violation:
• At a cell output pin, the total fan-out load exceeds the cell’s max capacitance.
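The three checks can be sketched as one function. This is a minimal Python illustration, assuming the timing engine has already produced per-pin slack, transition time, and load; the function name, the dictionary layout, and the pin names are hypothetical:

```python
def find_violations(endpoint_slack, pin_slew, max_slew, output_load, max_cap):
    """Return a list of (kind, pin) violations.

    endpoint_slack: slack at each PO / DFF input pin
    pin_slew:       transition time at each PO / cell input pin
    max_slew:       maximum allowed transition time
    output_load:    summed fan-out load at each cell output pin
    max_cap:        max capacitance of the cell driving each output pin
    """
    violations = []
    for pin, slack in endpoint_slack.items():
        if slack < 0:  # slack violation
            violations.append(("slack", pin))
    for pin, slew in pin_slew.items():
        if slew > max_slew:  # slew violation
            violations.append(("slew", pin))
    for pin, load in output_load.items():
        if load > max_cap[pin]:  # max-load violation
            violations.append(("max_load", pin))
    return violations


# Hypothetical values for a tiny design.
found = find_violations(
    endpoint_slack={"po1": -1.5, "ff1/D": 0.2},
    pin_slew={"u1/A": 80.0, "po1": 120.0},
    max_slew=100.0,
    output_load={"u1/Z": 12.0, "u2/Z": 3.0},
    max_cap={"u1/Z": 10.0, "u2/Z": 5.0},
)
print(found)
```

A solution is feasible only when the returned list is empty.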
15. Problem Assumptions
• Interconnect parasitics are modeled as lumped capacitance.
• Sequential sizing is not allowed.
• Only one selection for sequential cells.
• Ideal clock network
• No clock buffer, zero skew, and clock net has zero lumped capacitance.
17. Proposed Methodology
• Phase I: Iterative Algorithm for Initial Solution
• Initial solution that satisfies the timing requirement
• Phase II: Simulated-Annealing-Based Algorithm
• Leakage power minimization
18. Phase I: Pseudo Code
Iterative Algorithm: upsize cells to reach a feasible solution
Inputs: netlist, cell library, timing constraints, and interconnect parasitics
Outputs: each cell’s size and threshold voltage assignment
Step 1: Count how many times each cell is visited when tracing negative-slack paths
Step 2: Sort the cells by counter
Step 3: Iterative upsizing in above-defined order
19. Phase I: Pseudo Code (Step 1)
Step 1: Count how many times each cell is visited when tracing negative-slack paths
Run timing engine to calculate each cell’s slack;
Initialize each cell’s counter to zero;
Initialize each cell to its smallest type-size;
foreach (negative-slack path)
    foreach (cell in the selected path)
        if (selected cell has negative slack)
            Increase selected cell’s counter;
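Step 1 can be sketched in Python as follows, assuming the negative-slack paths and the per-cell slacks have already been produced by a timing engine. Both inputs and the function name `count_visits` are hypothetical; the slides do not specify a data layout:

```python
from collections import Counter


def count_visits(negative_slack_paths, cell_slack):
    """Count, per cell, how many negative-slack paths pass through it.

    Only cells that themselves have negative slack are counted,
    mirroring the inner `if` of the pseudo code.
    """
    counter = Counter({cell: 0 for cell in cell_slack})
    for path in negative_slack_paths:
        for cell in path:
            if cell_slack[cell] < 0:
                counter[cell] += 1
    return counter


# Hypothetical example: two violating paths share cell u1.
paths = [["u1", "u2"], ["u1", "u3"]]
slack = {"u1": -3.0, "u2": -1.0, "u3": 2.0, "u4": 5.0}
counts = count_visits(paths, slack)
print(dict(counts))
```

Cells shared by many violating paths get the highest counts, which is what makes them the first candidates for upsizing in the later steps.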
20. Phase I: Pseudo Code (Step 2 & 3)
Step 2: Sort the cells by counter
Sort cells by counter, from largest to smallest;
Step 3: Iterative upsizing in above-defined order
do
    foreach (cell in above-defined order)
        if (selected cell has negative slack)
            while (selected cell has larger type-size)
                if (new Pleakage < old Pleakage)
                    Update type-size;
until (no negative slack)
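A hedged Python sketch of Steps 2 and 3 on a toy problem. Several things here are assumptions rather than the authors' method: the `timer` callback stands in for the timing engine, sizes are integer indices, the toy delay model is `base_delay / size` on a single path, and the Pleakage test of the pseudo code is left out because its exact role is not explained on the slide. A guard stops the loop when no cell can be upsized further, which the pseudo code does not mention:

```python
def upsize_until_feasible(order, sizes, max_size, timer):
    """Upsize cells in the given order until no negative slack remains.

    order:    cells sorted by counter, largest first (Step 2)
    sizes:    {cell: size index}, modified in place
    timer:    callback mapping sizes -> {cell: slack}
    Returns True if a feasible sizing was reached.
    """
    slack = timer(sizes)
    while min(slack.values()) < 0:
        changed = False
        for cell in order:
            # Step 3: upsize one step at a time while this cell violates.
            while slack[cell] < 0 and sizes[cell] < max_size:
                sizes[cell] += 1
                slack = timer(sizes)
                changed = True
        if not changed:  # every violating cell is already at max size
            return False
    return True


# Toy timing model (assumption): one path through all cells, 50 ps budget.
BASE_DELAY = {"u1": 30.0, "u2": 20.0, "u3": 10.0}


def toy_timer(sizes, budget=50.0):
    path_slack = budget - sum(BASE_DELAY[c] / sizes[c] for c in sizes)
    return {c: path_slack for c in sizes}


counters = {"u1": 2, "u2": 1, "u3": 0}
order = sorted(counters, key=counters.get, reverse=True)  # Step 2
sizes = {c: 1 for c in counters}                          # smallest size
ok = upsize_until_feasible(order, sizes, max_size=4, timer=toy_timer)
print(ok, sizes)
```

In this toy run only the most-visited cell needs to grow, which illustrates why the counter ordering tends to keep the number of upsized cells small.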
21. Phase II: Simulated-Annealing-Based
1. Solution representation:
• The size and threshold voltage type of every cell.
2. Solution perturbation:
• Randomly pick a cell and change its size and threshold voltage assignment.
3. Cost function:
• Total leakage power.
4. Annealing schedule: (next slide)
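Items 1 and 2 can be illustrated with a small Python sketch. The representation is an assumption: a solution is a dictionary from cell name to a `(vt, size_index)` pair, with three Vt types labelled "s", "m", "f" and ten size indices; the names `VT_TYPES`, `SIZE_INDICES`, and `perturb` are illustrative:

```python
import random

VT_TYPES = ("s", "m", "f")       # three threshold voltage types
SIZE_INDICES = tuple(range(10))  # ten gate sizes, as indices (assumption)


def perturb(solution, rng):
    """Return a copy of `solution` with one random cell reassigned.

    The chosen cell receives a (vt, size) option different from its
    current one, so every perturbation actually changes the solution.
    """
    new_solution = dict(solution)
    cell = rng.choice(sorted(new_solution))
    options = [
        (vt, size)
        for vt in VT_TYPES
        for size in SIZE_INDICES
        if (vt, size) != new_solution[cell]
    ]
    new_solution[cell] = rng.choice(options)
    return new_solution


current = {"u1": ("s", 0), "u2": ("m", 3), "u3": ("f", 9)}
candidate = perturb(current, random.Random(1))
changed = [c for c in current if candidate[c] != current[c]]
print(changed)
```

Returning a copy keeps the current solution intact, so a rejected move needs no undo step.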
22. Phase II: SA — Temperature check
IF T > ε
THEN NEXT_ITER
ELSE FINISHED
[Flowchart of the SA loop. Nodes: START, initialization, T > ε (decision), Find new solution, accept? (decision), Update current solution, update T? (decision), Update temperature(T), FINISHED]
23. Phase II: SA — New solution
1. Randomly pick a cell
2. Randomly pick a new type and size
3. Call the timer and recalculate the cost
24. Phase II: SA — Solution acceptance
IF Cnew < Clast
IF Cnew < Cbest
THEN state = UPD
ELSE state = NEW
ELSE IF A.Prob. > Random
THEN state = ACP
ELSE state = REJ
Accept. Prob. $= \exp\left(-\frac{\Delta C}{K \cdot T}\right) \in [0, 1]$
$\Delta C = \frac{C_{new} - C_{old}}{C_{old}}$
Random $\in [0, 1]$
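The four states of the acceptance rule can be sketched in Python. The probability used here is an assumption: the usual Metropolis form exp(-ΔC / (K·T)) with ΔC taken as the relative cost change (Cnew - Cold) / Cold, where Cold is the Clast of the pseudo code. The function names and the optional `rng` argument are illustrative:

```python
import math
import random


def accept_probability(c_new, c_old, k, t):
    """exp(-dC / (K*T)) with dC the relative cost increase (assumed form)."""
    delta = (c_new - c_old) / c_old
    return math.exp(-delta / (k * t))


def classify(c_new, c_last, c_best, k, t, rng=random):
    """Return one of the states UPD, NEW, ACP, REJ."""
    if c_new < c_last:
        # Improvement over the last solution; UPD if it also beats the best.
        return "UPD" if c_new < c_best else "NEW"
    if accept_probability(c_new, c_last, k, t) > rng.random():
        return "ACP"  # worse solution accepted probabilistically
    return "REJ"


print(classify(90.0, 100.0, 95.0, k=1.0, t=0.1))          # improves on the best
print(round(accept_probability(110.0, 100.0, 1.0, 0.1), 4))
```

Because ΔC is a relative change, the same K and T behave similarly on designs whose absolute leakage differs by orders of magnitude.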
25. Phase II: SA — Solution update
IF state = UPD or NEW or ACP
THEN Slast = Snew
ELSE Slast = Slast
26. Phase II: SA — Temperature update
IF γ > φ
THEN DROP_TEMP
ELSE NEXT_ITER
γ is the counter of successive “Reject” states
φ is a constant threshold
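Putting the temperature check and the reject-counter rule together gives a loop like the Python sketch below. It is a toy under stated assumptions, not the authors' implementation: the geometric cooling factor `alpha`, resetting γ after each drop, the `max_iters` guard, the parameter defaults, and the one-dimensional toy cost are all illustrative, and the acceptance probability uses the assumed form exp(-ΔC / (K·T)):

```python
import math
import random


def anneal(initial, cost, propose, t0=1.0, eps=1e-3, alpha=0.9,
           phi=5, k=0.05, seed=0, max_iters=100_000):
    """Simulated annealing that cools only after more than phi
    successive rejections (gamma > phi)."""
    rng = random.Random(seed)
    last = best = initial
    c_last = c_best = cost(initial)
    t, gamma = t0, 0
    for _ in range(max_iters):          # safety guard (assumption)
        if t <= eps:                    # temperature check: FINISHED
            break
        new = propose(last, rng)        # find new solution
        c_new = cost(new)
        if c_new < c_last:
            accepted = True
        else:
            delta = (c_new - c_last) / c_last
            accepted = math.exp(-delta / (k * t)) > rng.random()
        if accepted:                    # update current solution
            last, c_last = new, c_new
            gamma = 0
            if c_new < c_best:
                best, c_best = new, c_new
        else:
            gamma += 1
        if gamma > phi:                 # DROP_TEMP
            t *= alpha
            gamma = 0
    return best, c_best


# Toy problem: integer x with strictly positive cost, minimum at x = 7.
def toy_cost(x):
    return (x - 7) ** 2 + 1


def toy_step(x, rng):
    return x + rng.choice((-1, 1))


best_x, best_cost = anneal(0, toy_cost, toy_step)
print(best_x, best_cost)
```

Tying the temperature drop to successive rejections means the search stays at a temperature for as long as it keeps finding acceptable moves there.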
28. Experimental Results
• Experimental Setting
• Standard Library
• Timing Engine
• Acceptance Probability
• Benchmark
• The Trend of Leakage Power Minimization
• Cost Comparison
29. Standard Library
• Cell Library in Synopsys Liberty format
• Combinational cells:
• 11 footprints:
• in01, na02, na03, na04, no02, no03, no04, ao12, ao22, oa12, and oa22
• Each cell has 30 options
• 3 threshold voltage types and 10 gate sizes
• Sequential cells:
• 1 footprint: ms08
30. Power, Capacitance, & Delay LUBs
Footprint: in01
Gate Size | Leakage Power (uW), Vt type s, m, f | Capacitance (fF), Vt type s, m, f | Delay Time (ps), Vt type s, m, f
1 | 1, 4, 16 | 12.8, 14.4, 16 | 11.7, 10.7, 9.1
3 | 3, 12, 48 | 38.4, 43.2, 48 | 8.2, 7.2, 6.5
4 | 4, 16, 64 | 51.2, 57.6, 64 | 6.5, 5.7, 5.2
6 | 6, 24, 96 | 76.8, 86.4, 96 | 6.5, 5.7, 5.2
8 | 8, 32, 128 | 102.4, 115.2, 128 | 6.5, 5.7, 5.2
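The in01 values above show the trade-off the optimizer works with: moving from s to m to f lowers delay but multiplies leakage. A small Python sketch that picks the lowest-leakage option under a delay bound, assuming s, m, f are the three Vt types and treating each delay entry as a single fixed number (a simplification; the helper name `cheapest_option` is illustrative):

```python
# in01 values copied from the table:
# size -> {vt: (leakage uW, capacitance fF, delay ps)}
IN01 = {
    1: {"s": (1, 12.8, 11.7), "m": (4, 14.4, 10.7), "f": (16, 16, 9.1)},
    3: {"s": (3, 38.4, 8.2), "m": (12, 43.2, 7.2), "f": (48, 48, 6.5)},
    4: {"s": (4, 51.2, 6.5), "m": (16, 57.6, 5.7), "f": (64, 64, 5.2)},
    6: {"s": (6, 76.8, 6.5), "m": (24, 86.4, 5.7), "f": (96, 96, 5.2)},
    8: {"s": (8, 102.4, 6.5), "m": (32, 115.2, 5.7), "f": (128, 128, 5.2)},
}


def cheapest_option(lut, max_delay_ps):
    """Return (size, vt, leakage) of the lowest-leakage entry whose
    delay does not exceed max_delay_ps, or None if none qualifies."""
    candidates = [
        (leak, size, vt)
        for size, by_vt in lut.items()
        for vt, (leak, _cap, delay) in by_vt.items()
        if delay <= max_delay_ps
    ]
    if not candidates:
        return None
    leak, size, vt = min(candidates)
    return size, vt, leak


print(cheapest_option(IN01, 7.0))
print(cheapest_option(IN01, 6.0))
```

Tightening the bound from 7.0 ps to 6.0 ps forces a change of Vt type at the same size and raises leakage fourfold in this table.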
37. Cost Comparison
[Runtime formula: Runtime (h) in terms of Roundup(#gates / 35K); the remaining terms did not survive extraction]
Total Leakage Power (μWatt)
Method | DMA | pci_bridge32 | des_perf | vga_lcd
IR+SA | 3.71E+05 | 3.51E+05 | 1.54E+06 | 4.00E+05
IR | 1.54E+06 | 1.71E+06 | 4.15E+06 | 1.47E+06
NTUgs | 2.05E+05 | 2.03E+05 | 6.74E+05 | 4.15E+05
UFRGS-BRAZIL | 1.58E+05 | 1.15E+05 | 8.84E+05 | 3.78E+05
PowerValve | 1.47E+05 | 1.16E+05 | 6.97E+05 | 3.91E+05
Goldilocks | 2.15E+05 | 6.96E+05 | 9.47E+05 | 4.63E+05
eOPT | 4.51E+05 | 2.26E+05 | 2.28E+06 | 6.44E+05
CUsizer | 3.68E+05 | 2.88E+05 | 1.13E+06 | 7.53E+05
↓ 73% (vga_lcd, from IR to IR+SA)
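The 73% annotation can be reproduced from the IR and IR+SA values of the four benchmarks on this slide. The short Python check below assumes that, within each chart, the first two values belong to IR+SA and IR in the order the labels are listed; the function name is illustrative:

```python
# (IR, IR+SA) total leakage power in uW, per benchmark on this slide.
results = {
    "DMA": (1.54e6, 3.71e5),
    "pci_bridge32": (1.71e6, 3.51e5),
    "des_perf": (4.15e6, 1.54e6),
    "vga_lcd": (1.47e6, 4.00e5),
}


def reduction_percent(initial, final):
    """Leakage reduction of `final` relative to `initial`, in percent."""
    return 100.0 * (initial - final) / initial


for name, (ir, ir_sa) in results.items():
    print(f"{name}: {reduction_percent(ir, ir_sa):.0f}% below the IR solution")
```

The vga_lcd value rounds to 73%, matching the annotation, which supports the assumed ordering of the bars.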
38. Cost Comparison (cont.)
Total Leakage Power (μWatt)
Method | b19 | netcard | leon3mp
IR+SA | 7.32E+05 | 3.90E+06 | 2.28E+06
IR | 1.34E+06 | 4.78E+06 | 5.40E+06
NTUgs | 6.27E+05 | 1.77E+06 | 1.42E+06
UFRGS-BRAZIL | 6.14E+05 | 1.97E+06 | 1.79E+06
PowerValve | 7.36E+05 | 1.94E+06 | 2.96E+06
Goldilocks | 7.58E+05 | 1.81E+06 | 1.47E+06
eOPT | 8.62E+05 | 2.10E+06 | 1.88E+06
CUsizer | 5.02E+06 | 2.00E+06 | 1.92E+06
40. Conclusion
• The iterative algorithm is necessary for initialization. Without it, the SA approach may not converge within a fixed runtime.
• Our approach reaches feasible solutions within the same order of magnitude as related work on all benchmarks.
• In some cases, our approach yields a better solution than previous work and reduces leakage power by more than 70% from the initial solution in a short time.
41. Future Work
• A more realistic RC network model
• Leakage power minimization for sequential circuits
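In this paper, we adopt gate sizing and threshold voltage assignment as our methods for reducing leakage power.
Gate size determines driving strength and is also proportional to leakage current, so choosing an appropriate size reduces leakage.
Threshold voltage assignment exploits the properties of the threshold voltage: a high Vth gives a longer delay but a smaller leakage current, so it can be used on non-critical paths, while a low Vth can be used on critical paths to meet the timing requirement.
Next is the related work.
Research on gate sizing and threshold voltage assignment has received growing attention since the 1990s, and various methods have been proposed for different experimental goals. They fall into two main categories: continuous methods on the left and discrete methods on the right.
Continuous methods mainly comprise linear programming and geometric programming.
Discrete methods comprise six approaches, including the sensitivity-based approach and slack and delay budgeting.
I will briefly introduce each method below and give a short summary at the end of this part.
Among the continuous methods, linear programming defines the power model and the gate selection as linear functions.
Geometric programming goes further and defines the power model as a polynomial function.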
Modeling Error: misleads optimization due to the inaccuracy of delay and power models.
Mapping Issue: makes no guarantee on mapping a continuous solution to a discrete one.
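LR converts the constrained problem into an unconstrained problem and solves for the Lagrange multipliers.
LP, unlike the continuous method, represents the choice of size and threshold voltage with a binary variable in order to avoid the rounding problem.
SA conditionally accepts worse solutions during the annealing process in order to reach a solution close to the optimum, and it is suited to a large discrete search space.
For the second phase, SA, I explain it in four parts.
First, an SA solution is the type-size used by each cell.
Second, the perturbation randomly selects a cell and changes its type-size.
Third, the cost function is the total leakage power.
Finally, the annealing schedule is explained with the flow chart.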