Leakage Power Minimization using SA-Based
Gate Sizing and Threshold Voltage Assignment
Chih-Chuan, Yu
Outline
• Introduction
• Related Work
• Problem Formulation
• Proposed Methodology
• Experimental Results
• Conclusion and Future Work
Introduction
• Low Power and High Performance
• Mobile device
• Leakage Power Rise
• ITRS Roadmap 2009 [33]
• Technology scales down
Leakage Power Minimization Methods
• Gate Sizing
Gate Size ∝ Leakage Power ∝ Driving Strength
• Threshold Voltage Assignment
• 𝑉𝑡ℎ ∝ 1/𝐿𝑒𝑎𝑘𝑎𝑔𝑒 𝑃𝑜𝑤𝑒𝑟
• 𝑉𝑡ℎ ∝ 𝐷𝑒𝑙𝑎𝑦 𝑡𝑖𝑚𝑒
• Low Vth on critical path
• High Vth on non-critical path
Related Work
Continuous methods:
• Linear Programming (LP)
• Geometric Programming (GP)
Discrete methods:
• Sensitivity-based Approach
• Slack and Delay Budgeting
• Dynamic Programming (DP)
• Lagrangian Relaxation (LR)
• Linear Programming (LP)
• Simulated Annealing (SA)
Continuous Methods
• Linear Programming (LP)
• Linear delay model
• The selection of gates is defined as a linear function
• Geometric programming (GP)
• Polynomial delay model
Discrete Methods
• Sensitivity-based approach
• Score and rank gates according to a defined sensitivity
• Iteratively select the best gate for optimization until no improvement can be made
• Slack and delay budgeting
• Allocate a slack budget to each gate
• Use each gate’s slack budget to trade delay for lower power.
• Dynamic Programming (DP)
• Use decision stages and a cost-to-go function.
Discrete Methods (cont.)
• Lagrangian Relaxation (LR)
• Convert a constrained problem into an unconstrained one.
• Lagrange multiplier
• Linear Programming (LP)
• Gate selection is implemented by assigning a value to a binary variable: 1 if chosen and 0 otherwise.
• Simulated Annealing (SA)
• Probabilistic method for finding a good approximation to the global optimum
Related Work Comparison
Methodology | Pros | Cons
Continuous sizing: LP, GP | Fast | Modeling error; mapping issue
Discrete sizing: Sensitivity | | Local optimum
Discrete sizing: Slack & Delay, LP | | Ignores delay interaction
Discrete sizing: DP | | Solution space explosion
Discrete sizing: LR | Large scale | Solution oscillation
Discrete sizing: SA | Global optimum approximation; fast solution space exploration |
Motivational Example
Solution | u1 | u2 | u3 | Timing Violation | Total Leakage Power
Solution 1 | s10 | s06 | s04 | -2.32 | 26
Solution 2 | s10 | s06 | f04 | 0 | 86
Solution 3 | s10 | s06 | m04 | 0 | 38
[Figure: cells u1, u2, u3 (each of type oa) connected by nets n1 to n4, annotated with 50ps]
Problem Formulation
• Inputs:
• Standard Cell Library
• Gate-level Netlist
• Timing Constraints
• Interconnect Parasitics
• Outputs:
• The selected size and threshold voltage of each cell
• Objective:
• Satisfy all performance constraints
• Minimize total leakage power
Performance constraints
• Slack violation:
• Negative slack exists at a primary output (PO) or DFF input.
• Slew (transition time) violation:
• At a PO or cell input pin, the transition time exceeds the maximum transition time limit.
• Max-load violation:
• At a cell output pin, the total fan-out load exceeds the cell’s maximum capacitance.
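The three constraint checks above can be sketched as a small routine. This is only an illustration: the `Pin` and `CellOutput` records, their field names, and the numbers in the example are assumptions, not taken from the slides or from any timing engine.

```python
from dataclasses import dataclass


@dataclass
class Pin:
    name: str
    slack_ps: float      # slack at this pin (checked only at endpoints)
    slew_ps: float       # transition time observed at this pin
    is_endpoint: bool    # True for primary outputs and DFF inputs


@dataclass
class CellOutput:
    name: str
    fanout_load_ff: float  # sum of the capacitances driven by this output
    max_cap_ff: float      # maximum capacitance allowed for the cell


def find_violations(pins, outputs, max_slew_ps):
    """Return a (kind, name) pair for every violated constraint."""
    violations = []
    for pin in pins:
        if pin.is_endpoint and pin.slack_ps < 0:
            violations.append(("slack", pin.name))
        if pin.slew_ps > max_slew_ps:
            violations.append(("slew", pin.name))
    for out in outputs:
        if out.fanout_load_ff > out.max_cap_ff:
            violations.append(("max-load", out.name))
    return violations


# Hypothetical data: one endpoint with negative slack, one slow input pin,
# and one overloaded cell output.
pins = [Pin("po1", -2.32, 40.0, True), Pin("u2/a", 5.0, 120.0, False)]
outputs = [CellOutput("u1/o", 30.0, 25.0), CellOutput("u2/o", 10.0, 25.0)]
violations = find_violations(pins, outputs, max_slew_ps=100.0)
print(violations)
```

A solution is feasible in the sense of this slide only when the returned list is empty.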
Problem Assumptions
• Interconnect parasitics are modeled as lumped capacitance.
• Sequential sizing is not allowed.
• Only one selection for sequential cells.
• Ideal clock network
• No clock buffer, zero skew, and clock net has zero lumped capacitance.
Proposed Methodology
• Phase I: Iterative Algorithm for Initial Solution
• Initial solution that satisfies the timing requirement
• Phase II: Simulated-Annealing-Based Algorithm
• Leakage power minimization
Phase I: Pseudo Code
Iterative Algorithm: upsize cells for feasible solution
Inputs: netlist, cell library, timing constraints, and interconnect parasitics
Outputs: each cell’s size and threshold voltage assignment
Step 1: Count how many times each cell is visited when tracing negative-slack paths
Step 2: Sort the cells by counter
Step 3: Iteratively upsize the cells in the sorted order
Phase I: Pseudo Code (Step 1)
Step 1: Count how many times each cell is visited when tracing negative-slack paths
    Run timing engine to calculate each cell’s slack;
    Initialize each cell’s counter to zero;
    Initialize each cell to its smallest type-size;
    foreach (negative-slack path)
        foreach (cell in the selected path)
            if (selected cell has negative slack)
                Increase selected cell’s counter;
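A minimal Python sketch of the counting step, followed by the sort that Step 2 asks for. It assumes the negative-slack paths and per-cell slacks have already been produced by a timing engine; the function names and the example data are hypothetical.

```python
from collections import Counter


def count_negative_slack_visits(paths, slack):
    """Count, per cell, how often it appears on a negative-slack path.

    paths: list of paths, each a list of cell names (assumed timer output).
    slack: dict mapping cell name to its slack.
    """
    counter = Counter({cell: 0 for cell in slack})
    for path in paths:
        for cell in path:
            if slack[cell] < 0:
                counter[cell] += 1
    return counter


def upsizing_order(counter):
    """Cells sorted by counter, largest first (ties keep their original order)."""
    return sorted(counter, key=lambda cell: counter[cell], reverse=True)


# Hypothetical example: u3 has positive slack, so it is never counted.
slack = {"u1": -1.0, "u2": -0.5, "u3": 2.0, "u4": -0.2}
paths = [["u1", "u2", "u3"], ["u1", "u4"], ["u2", "u1"]]
counter = count_negative_slack_visits(paths, slack)
order = upsizing_order(counter)
print(order)  # ['u1', 'u2', 'u4', 'u3']
```

Cells that sit on many violating paths come first, so upsizing them is tried before cells that affect only a single path.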
Phase I: Pseudo Code (Step 2 & 3)
Step 2: Sort the cells by counter
    Sort the cells by counter, from largest to smallest;
Step 3: Iteratively upsize the cells in the sorted order
    do
        foreach (cell in the sorted order)
            if (selected cell has negative slack)
                while (selected cell has larger type-size)
                    if (new Pleakage < old Pleakage)
                        Update type-size;
    until (no negative slack)
Phase II: Simulated-Annealing-Based
1. Solution representation:
• The set of size and type of each cell.
2. Solution perturbation:
• Randomly pick a cell and change its size and threshold voltage assignment.
3. Cost function:
• Total leakage power.
4. Annealing schedule: (next slide)
Phase II: SA — Temperature check
IF T > ε
    THEN NEXT_ITER
    ELSE FINISHED
[Figure: SA flowchart with the steps START, initialization, "T > ε" check, Find new solution, "accept?" check, Update current solution, "update T?" check, Update temperature (T), and FINISHED]
Phase II: SA — New solution
1. Randomly pick a cell
2. Randomly pick a new type and size
3. Call the timer and recalculate the cost
Phase II: SA — Solution acceptance
IF Cnew < Clast
    IF Cnew < Cbest
        THEN state = UPD
        ELSE state = NEW
ELSE IF Accept. Prob. > Random
    THEN state = ACP
    ELSE state = REJ
$\text{Accept. Prob.} = \exp\left(-\frac{\Delta C}{K \cdot T}\right) \in [0, 1]$
$\Delta C = \frac{C_{new} - C_{old}}{C_{old}}$
$\text{Random} \in [0, 1]$
Phase II: SA — Solution update
IF state = UPD or NEW or ACP
    THEN Slast = Snew
    ELSE Slast = Slast
Phase II: SA — Temperature update
IF γ > φ
    THEN DROP_TEMP
    ELSE NEXT_ITER
γ is the counter of successive “Reject” states
φ is a constant
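The Phase II slides can be put together as one loop. The sketch below is a toy under stated assumptions, not the author's implementation: the acceptance rule is taken to be exp(-ΔC / (K*T)) with ΔC the relative cost change; the geometric cooling factor `alpha`, the reset of γ after a temperature drop, and the `max_iters` safeguard are additions for illustration; the timer call is omitted, so the toy cost is just the sum of per-cell leakage values.

```python
import math
import random


def anneal(initial, cost, perturb, t0=1.0, eps=1e-3, k=0.05,
           alpha=0.8, phi=5, max_iters=100_000, seed=0):
    """SA loop following the slides' flow; alpha and max_iters are assumptions."""
    rng = random.Random(seed)
    s_last, c_last = initial, cost(initial)
    s_best, c_best = s_last, c_last
    t, gamma, iters = t0, 0, 0
    while t > eps and iters < max_iters:          # temperature check
        iters += 1
        s_new = perturb(s_last, rng)              # find new solution
        c_new = cost(s_new)
        if c_new < c_last:                        # solution acceptance
            state = "UPD" if c_new < c_best else "NEW"
        else:
            delta_c = (c_new - c_last) / c_last
            accept_prob = math.exp(-delta_c / (k * t))
            state = "ACP" if accept_prob > rng.random() else "REJ"
        if state == "REJ":
            gamma += 1                            # successive rejects
        else:
            s_last, c_last = s_new, c_new         # solution update
            gamma = 0
            if state == "UPD":
                s_best, c_best = s_new, c_new
        if gamma > phi:                           # temperature update
            t *= alpha
            gamma = 0
    return s_best, c_best


# Toy problem: each cell picks one of three leakage values (hypothetical).
LEAKAGE = (1, 4, 16)


def total_leakage(solution):
    return sum(LEAKAGE[i] for i in solution)


def change_one_cell(solution, rng):
    """Randomly pick a cell and move it to a different option."""
    cell = rng.randrange(len(solution))
    choices = [i for i in range(len(LEAKAGE)) if i != solution[cell]]
    new = list(solution)
    new[cell] = rng.choice(choices)
    return tuple(new)


start = (2,) * 8
best, best_cost = anneal(start, total_leakage, change_one_cell)
print(best, best_cost)
```

Because the temperature only drops after more than φ successive rejects, a run in which most moves are accepted would stay at one temperature for a long time; the `max_iters` cap guards the sketch against that case.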
Experimental Results
• Experimental Setting
• Standard Library
• Timing Engine
• Acceptance Probability
• Benchmark
• The Trend of Leakage Power Minimization
• Cost Comparison
Standard Library
• Cell Library in Synopsys Liberty format
• Combinational cells:
• 11 Footprints:
• in01, na02, na03, na04, no02, no03, no04, ao12, ao22, oa12 and oa22
• Each cell has 30 options
• 3 threshold voltage types and 10 gate sizes
• Sequential cells:
• 1 footprint: ms08
Power, Capacitance, & Delay LUTs
Footprint: in01
Columns s, m, f are the Vt types.
Gate Size | Leakage Power (μW): s | m | f | Capacitance (fF): s | m | f | Delay Time (ps): s | m | f
1 | 1 | 4 | 16 | 12.8 | 14.4 | 16 | 11.7 | 10.7 | 9.1
3 | 3 | 12 | 48 | 38.4 | 43.2 | 48 | 8.2 | 7.2 | 6.5
4 | 4 | 16 | 64 | 51.2 | 57.6 | 64 | 6.5 | 5.7 | 5.2
6 | 6 | 24 | 96 | 76.8 | 86.4 | 96 | 6.5 | 5.7 | 5.2
8 | 8 | 32 | 128 | 102.4 | 115.2 | 128 | 6.5 | 5.7 | 5.2
Delay time Look-Up Table
• Delay time = f(input slew, output load)
• 2D Linear Interpolation
Slew (ps) \ Load (fF) | 5 | 10 | 15 | 20 | 25 | 30
0 | 6.5 | 7.6 | 8.8 | 10.0 | 11.1 | 12.3
1 | 7.8 | 9.0 | 10.2 | 11.4 | 12.6 | 13.8
2 | 9.1 | 10.3 | 11.5 | 12.8 | 14.0 | 15.2
3 | 10.4 | 11.7 | 12.9 | 14.2 | 15.5 | 16.7
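A short sketch of the 2D linear (bilinear) interpolation over this table. It assumes the rows are indexed by input slew and the columns by output load, as the axis labels suggest; the function names are hypothetical, and queries outside the table are extrapolated from the nearest edge cell, which is a choice made here rather than something the slide specifies.

```python
from bisect import bisect_right

SLEWS = [0, 1, 2, 3]                 # row axis: input slew (ps)
LOADS = [5, 10, 15, 20, 25, 30]      # column axis: output load (fF)
DELAY = [
    [6.5, 7.6, 8.8, 10.0, 11.1, 12.3],
    [7.8, 9.0, 10.2, 11.4, 12.6, 13.8],
    [9.1, 10.3, 11.5, 12.8, 14.0, 15.2],
    [10.4, 11.7, 12.9, 14.2, 15.5, 16.7],
]


def _bracket(axis, x):
    """Index i of the interval [axis[i], axis[i+1]] used for x, clamped to the table."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def delay(slew, load):
    """Bilinear interpolation of the delay table at (slew, load)."""
    i = _bracket(SLEWS, slew)
    j = _bracket(LOADS, load)
    u = (slew - SLEWS[i]) / (SLEWS[i + 1] - SLEWS[i])
    v = (load - LOADS[j]) / (LOADS[j + 1] - LOADS[j])
    return ((1 - u) * (1 - v) * DELAY[i][j]
            + (1 - u) * v * DELAY[i][j + 1]
            + u * (1 - v) * DELAY[i + 1][j]
            + u * v * DELAY[i + 1][j + 1])


print(delay(1, 10))      # a grid point, so the table entry itself
print(delay(0.5, 7.5))   # centre of the first table cell
```

At a grid point the function returns the table entry; at the centre of a cell it returns the average of the four surrounding entries.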
Timing Engine
Runtime (second/iteration)
Design | PrimeTime® | Full Functional Timer | Incremental Update and Full Functional Timer
DMA | 10.00 | 1.50 | 0.00087
pci_bridge32 | 11.00 | 1.90 | 0.00096
des_perf | 33.00 | 4.60 | 0.00067
vga_lcd | 44.00 | 6.80 | 0.00208
b19 | 63.00 | 9.70 | 0.00238
leon3mp | 375.00 | 26.30 | 0.01582
netcard | 393.00 | 38.40 | 0.06694
Acceptance Probability
[Figure: acceptance probability (y-axis, 0 to 1) versus ΔC (x-axis, 0.00 to 1.00), with one curve for high K and one for low K]
$\text{Accept. Prob.} = \exp\left(-\frac{\Delta C}{K \cdot T}\right)$
$\Delta C = \frac{C_{new} - C_{old}}{C_{old}}$
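The effect of K on the curve can be checked numerically. The sketch assumes the acceptance rule exp(-ΔC / (K*T)) with ΔC = (Cnew - Cold) / Cold; the function name and the K, T, and cost values are illustrative choices, not values from the experiments.

```python
import math


def acceptance_probability(c_new, c_old, k, t):
    """exp(-dC / (k * t)), where dC is the relative cost increase."""
    delta_c = (c_new - c_old) / c_old
    return math.exp(-delta_c / (k * t))


# Same relative cost increases evaluated for a low and a high K at T = 1.
c_old = 100.0
increases = (0.0, 0.1, 0.5, 1.0)
low_k = [acceptance_probability(c_old * (1 + d), c_old, k=0.1, t=1.0)
         for d in increases]
high_k = [acceptance_probability(c_old * (1 + d), c_old, k=1.0, t=1.0)
          for d in increases]

for d, lo, hi in zip(increases, low_k, high_k):
    print(f"dC={d:.1f}  low K: {lo:.3f}  high K: {hi:.3f}")
```

For any ΔC > 0 the high-K value is the larger one, so a higher K makes the search more willing to accept a worse solution at the same temperature.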
Benchmark
Design # IO pins # Comb cells # Seq Cells # Total Cells
DMA 959 23K 2K 25K
pci_bridge32 361 30K 3K 33K
des_perf 374 102K 9K 111K
vga_lcd 184 148K 17K 165K
b19 47 213K 7K 219K
leon3mp 333 540K 109K 649K
netcard 1,846 861K 98K 959K
The Trend of Leakage Power Minimization
[Figures: leakage power (μW) versus iteration for four designs: DMA (x-axis: iteration*18K), pci_bridge32 (iteration*16K), des_perf (iteration*23K), and vga_lcd (iteration*15K)]
The Trend of Leakage Power Minimization (cont.)
[Figures: leakage power (μW) versus iteration for three designs: b19 (x-axis: iteration*16K), netcard (iteration*14K), and leon3mp (iteration*15K)]
Cost Comparison
$\text{Runtime}_{hh} = \text{Roundup}\left(15 \times \frac{\#\text{gates}}{35\text{K}}\right)$
Total Leakage Power (μW)
Method | DMA | pci_bridge32 | des_perf | vga_lcd
IR+SA | 3.71E+05 | 3.51E+05 | 1.54E+06 | 4.00E+05
IR | 1.54E+06 | 1.71E+06 | 4.15E+06 | 1.47E+06
NTUgs | 2.05E+05 | 2.03E+05 | 6.74E+05 | 4.15E+05
UFRGS-BRAZIL | 1.58E+05 | 1.15E+05 | 8.84E+05 | 3.78E+05
PowerValve | 1.47E+05 | 1.16E+05 | 6.97E+05 | 3.91E+05
Goldilocks | 2.15E+05 | 6.96E+05 | 9.47E+05 | 4.63E+05
eOPT | 4.51E+05 | 2.26E+05 | 2.28E+06 | 6.44E+05
CUsizer | 3.68E+05 | 2.88E+05 | 1.13E+06 | 7.53E+05
↓ 73%
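The 73% figure is consistent with the vga_lcd numbers if the first two data labels of that chart are read as IR+SA (4.00E+05) and IR (1.47E+06). That reading is an assumption about the chart layout; the helper name below is hypothetical.

```python
def reduction_percent(initial, final):
    """Percentage by which `final` is lower than `initial`."""
    return 100.0 * (initial - final) / initial


# vga_lcd values in μW, read from the chart's data labels (assumed order).
ir_only = 1.47e6      # initial solution from the iterative algorithm (IR)
ir_plus_sa = 4.00e5   # after the SA phase (IR+SA)

reduction = reduction_percent(ir_only, ir_plus_sa)
print(round(reduction))  # 73
```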
Cost Comparison (cont.)
Total Leakage Power (μW)
Method | b19 | netcard | leon3mp
IR+SA | 7.32E+05 | 3.90E+06 | 2.28E+06
IR | 1.34E+06 | 4.78E+06 | 5.40E+06
NTUgs | 6.27E+05 | 1.77E+06 | 1.42E+06
UFRGS-BRAZIL | 6.14E+05 | 1.97E+06 | 1.79E+06
PowerValve | 7.36E+05 | 1.94E+06 | 2.96E+06
Goldilocks | 7.58E+05 | 1.81E+06 | 1.47E+06
eOPT | 8.62E+05 | 2.10E+06 | 1.88E+06
CUsizer | 5.02E+06 | 2.00E+06 | 1.92E+06
Conclusion
• The iterative algorithm is necessary for initialization. Without it, the SA approach may not converge within a fixed runtime.
• Our approach reaches feasible solutions within the same order of magnitude as related work on all benchmarks.
• In some cases, our approach yields a better solution than previous work and reduces leakage power by more than 70% from the initial solution in a short time.
Future Work
• A more realistic RC network model
• Leakage power minimization of sequential circuits
Q&A
Thank you!
Leakage Power Minimization using SA-Based Gate Sizing and Threshold Voltage Assignment

  • 1. Leakage Power Minimization using SA-Based Gate Sizing and Threshold Voltage Assignment Chih-Chuan, Yu
  • 2. Outline • Introduction • Related Work • Problem Formulation • Proposed Methodology • Experimental Results • Conclusion and Future Work 2
  • 3. Introduction • Low Power and High Performance • Mobile device • Leakage Power Rise • ITRS Roadmap 2009 [33] • Technology scales down 3
  • 4. Leakage Power Minimization Methods • Gate Sizing 𝐺𝑎𝑡𝑒 𝑆𝑖𝑧𝑒 ∝ 𝐿𝑒𝑎𝑘𝑎𝑔𝑒 𝑃𝑜𝑤𝑒𝑟 ∝ 𝐷𝑟𝑖𝑣𝑖𝑛𝑔 𝑆𝑡𝑟𝑒𝑛𝑔𝑡ℎ • Threshold Voltage Assignment • 𝑉𝑡ℎ ∝ 1/𝐿𝑒𝑎𝑘𝑎𝑔𝑒 𝑃𝑜𝑤𝑒𝑟 • 𝑉𝑡ℎ ∝ 𝐷𝑒𝑙𝑎𝑦 𝑡𝑖𝑚𝑒 • Low Vth on critical path • High Vth on non-critical path 4
  • 5. Outline • Introduction • Related Work • Problem Formulation • Proposed Methodology • Experimental Results • Conclusion and Future Work 5
  • 6. Related Work 6 Continuous methods Discrete methods • Linear Programming (LP) • Geometric programming (GP) • Sensitivity-based Approach • Slack and delay Budgeting • Dynamic Programming(DP) • Lagrangian Relaxation (LR) • Linear Programming (LP) • Simulated Annealing (SA)
  • 7. Continuous Methods • Linear Programming (LP) • Linear delay model • The selection of gates is defined as linear function • Geometric programming (GP) • Polynomial delay model 7
  • 8. Discrete Methods • Sensitivity-based approach • Score and Rank gates according to a defined sensitivity • Iteratively select the best gate for optimization until no improvement can be made • Slack and delay budgeting • Allocate a slack budget to each gate • Use the slack budget to trade the power for each gate. • Dynamic Programming (DP) • Use decision stage and cost-to-go function. 8
  • 9. Discrete Methods (cont.) • Lagrangian Relaxation (LR) • Covert constrained problem to unconstrained one. • Lagrange multiplier • Linear Programming (LP) • The selection of gates is implemented by assigning value to a binary variable: 1 is chosen and 0 otherwise. • Simulated Annealing (SA) • Probabilistic method for finding a good approximation to the global optimum 9
  • 10. Related Work Comparison Methodology Pros Cons Continuous Sizing LP Fast Modeling Error Mapping IssueGP Discrete Sizing Sensitivity Local optimal Slack & Delay Ignore delay interaction LP DP Solution space explosion LR Large scale Solution Oscillate SA Global optimal Approximation Fast solution space exploration 10
  • 11. Outline • Introduction • Related Work • Problem Formulation • Proposed Methodology • Experimental Results • Conclusion and Future Work 11
  • 12. Motivational Example 12 Solution u1 u2 u3 Timing Violation Total Leakage Power Solution 1 s10 s06 s04 -2.32 26 Solution 2 s10 s06 f04 0 86 Solution 3 s10 s06 m04 0 38 n2n1 oa oa oa n3 n4 50ps u1 u2 u3
  • 13. Problem Formulation • Inputs: • Standard Cell Library • Gate-level Netlist • Timing Constraints • Interconnect Parasitics • Outputs: • The selection of each cell’s sizes and threshold voltage • Objective: • Satisfy all performance constraints • Minimize total leakage power 13
  • 14. Performance constraints • Slack violation: • At PO and DFF inputs, it exists negative slack. • Slew(Transition time) violation: • At PO and cell input pins, the transition time is larger than the max limit transition time. • Max-load violation: • At cell output pins, the fan-out load summation is larger than the cell’s max capacitance. 14
  • 15. Problem Assumptions • Interconnect parasitics are modeled as lumped capacitance. • Sequential sizing is not allowed. • Only one selection for sequential cells. • Ideal clock network • No clock buffer, zero skew, and clock net has zero lumped capacitance. 15
  • 16. Outline • Introduction • Related Work • Problem Formulation • Proposed Methodology • Experimental Results • Conclusion and Future Work 16
  • 17. Proposed Methodology • Phase I: Iterative Algorithm for Initial Solution • Initial solution that satisfies the timing requirement • Phase II: Simulated-Annealing-Based Algorithm • Leakage power minimization 17
  • 18. Phase I: Pseudo Code Iterative Algorithm: upsize cells for feasible solution Inputs: netlist, cell library, timing constraints, and interconnect parasitics Outputs: each cell’s size and threshold voltage assignment Step 1: Count the visited times of the cells traced by negative-slack paths Step 2: Sort by each cell counter Step 3: Iterative upsizing in above-defined order 18
  • 19. Phase I: Pseudo Code (Step 1) • Step 1: Count how many times each cell is visited by negative-slack paths
      Run timing engine to calculate each cell’s slack;
      Initialize each cell’s counter to zero;
      Initialize each cell to its smallest type-size;
      foreach (negative-slack path)
        foreach (cell in the selected path)
          if (selected cell has negative slack)
            Increase selected cell’s counter;
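Step 1 can be sketched in a few lines of Python. The data layout is an assumption made for illustration: `paths` maps a path id to the cells on it, and `path_slack` and `cell_slack` hold values that a timing engine is assumed to have produced already. None of these names come from the slides.

```python
from collections import Counter


def count_negative_slack_visits(paths, path_slack, cell_slack):
    """Count, per cell, how many negative-slack paths pass through it.

    paths:      dict path_id -> list of cell names (hypothetical layout)
    path_slack: dict path_id -> slack of the path endpoint
    cell_slack: dict cell name -> slack of the cell
    """
    # Every cell starts with a counter of zero.
    counter = Counter({cell: 0 for cells in paths.values() for cell in cells})
    for path_id, cells in paths.items():
        if path_slack[path_id] >= 0:
            continue  # only negative-slack paths are traced
        for cell in cells:
            if cell_slack[cell] < 0:
                counter[cell] += 1
    return counter
```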
  • 20. Phase I: Pseudo Code (Step 2 & 3) • Step 2: Sort by each cell counter
      Sort cells by counter, from largest to smallest;
    • Step 3: Iterative upsizing in above-defined order
      do
        foreach (cell in the above-defined order)
          if (selected cell has negative slack)
            while (selected cell has a larger type-size)
              if (new Pleakage < old Pleakage)
                Update type-size;
      until (no negative slack)
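Steps 2 and 3 can be sketched as follows. This is a simplified variant under stated assumptions, not the slide's exact loop: each sweep upsizes a violating cell by one size step, the loop stops when a sweep changes nothing, and the leakage comparison in the pseudocode is omitted. The `cell_has_negative_slack` callback stands in for a real timing engine, and every name here is hypothetical.

```python
def sort_cells(counter):
    """Step 2: most-visited cells first; ties keep their input order."""
    return sorted(counter, key=lambda cell: counter[cell], reverse=True)


def iterative_upsize(order, sizes, max_size, cell_has_negative_slack):
    """Step 3 (simplified): upsize violating cells sweep by sweep.

    order:    cells in the order produced by sort_cells
    sizes:    dict cell -> size index (0 is the smallest), updated in place
    max_size: largest size index available (assumed equal for all cells)
    cell_has_negative_slack: callback (cell, sizes) -> bool, a stand-in
        for querying a timing engine after each change
    """
    changed = True
    while changed:
        changed = False
        for cell in order:
            if cell_has_negative_slack(cell, sizes) and sizes[cell] < max_size:
                sizes[cell] += 1  # move to the next larger size
                changed = True
    return sizes
```

The loop also ends when every violating cell is already at its largest size, so a caller should still check that no negative slack remains.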
  • 21. Phase II: Simulated-Annealing-Based • 1. Solution representation: • The size and threshold voltage type of each cell. • 2. Solution perturbation: • Randomly pick a cell and change its size and threshold voltage assignment. • 3. Cost function: • Total leakage power. • 4. Annealing schedule: (next slide)
  • 22. Phase II: SA - Temperature check • IF T > ε THEN NEXT_ITER ELSE FINISHED • [Flowchart nodes: START, initialization, "T > ε", Find new solution, "accept?", Update current solution, "update T?", Update temperature (T), FINISHED; decision branches are labeled Yes/No]
  • 23. Phase II: SA - New solution • 1. Randomly pick a cell • 2. Randomly pick a new type and size • 3. Call the timer and recalculate the cost • [Flowchart repeated from slide 22]
  • 24. Phase II: SA - Solution acceptance
      IF Cnew < Clast
        IF Cnew < Cbest THEN state = UPD
        ELSE state = NEW
      ELSE
        IF A.Prob. > Random THEN state = ACP
        ELSE state = REJ
    • $\text{Accept. Prob.} = \exp\left(-\frac{\Delta C}{K \cdot T}\right) \in [0, 1]$, where $\Delta C = \frac{C_{new} - C_{old}}{C_{old}}$ and $\text{Random} \in [0, 1]$
    • [Flowchart repeated from slide 22]
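The acceptance rule can be sketched as one function that returns the four states named on the slide. It assumes the acceptance probability has the usual form exp(-ΔC / (K·T)) with ΔC taken relative to the previous cost; the function name, argument names, and the injectable `rng` are illustrative choices, not part of the slides.

```python
import math
import random


def classify_move(c_new, c_last, c_best, temperature, k, rng=random):
    """Return "UPD", "NEW", "ACP", or "REJ" for a candidate cost.

    Assumes c_last > 0 so the relative cost change is well defined.
    """
    if c_new < c_last:
        # Improving move: UPD if it also beats the best cost so far.
        return "UPD" if c_new < c_best else "NEW"
    # Non-improving move: accept with probability exp(-dC / (K * T)).
    delta_c = (c_new - c_last) / c_last
    accept_prob = math.exp(-delta_c / (k * temperature))
    return "ACP" if accept_prob > rng.random() else "REJ"
```

With this form, a larger K or T raises the chance of accepting a worse solution, and a smaller K or T makes the search greedier.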
  • 25. Phase II: SA - Solution update • IF state = UPD or NEW or ACP THEN Slast = Snew ELSE Slast = Slast (current solution unchanged) • [Flowchart repeated from slide 22]
  • 26. Phase II: SA - Temperature update • IF γ > φ THEN DROP_TEMP ELSE NEXT_ITER • γ counts successive "Reject" (REJ) states • φ is a constant threshold • [Flowchart repeated from slide 22]
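Putting the five flowchart steps together gives a loop like the one below. It is a sketch under several assumptions that the slides do not state: the temperature drops geometrically by a factor `alpha`, the counter γ resets after each drop, `cost` returns a positive leakage value for a timing-feasible solution, and a `max_iters` guard bounds the run. The toy cost and perturbation functions are placeholders for a real netlist and timer.

```python
import math
import random


def anneal(initial, cost, perturb, t0=0.05, eps=1e-3, alpha=0.5,
           k=1.0, phi=20, seed=0, max_iters=200_000):
    """Simulated annealing with reject-count-driven cooling (sketch)."""
    rng = random.Random(seed)
    s_last, c_last = initial, cost(initial)
    s_best, c_best = s_last, c_last
    t, gamma = t0, 0
    for _ in range(max_iters):          # safety bound (assumption)
        if t <= eps:                    # temperature check
            break
        s_new = perturb(s_last, rng)    # find new solution
        c_new = cost(s_new)
        if c_new < c_last:              # solution acceptance
            accepted = True
        else:
            delta_c = (c_new - c_last) / c_last
            accepted = math.exp(-delta_c / (k * t)) > rng.random()
        if accepted:                    # solution update
            s_last, c_last, gamma = s_new, c_new, 0
            if c_new < c_best:
                s_best, c_best = s_new, c_new
        else:
            gamma += 1                  # one more successive reject
        if gamma > phi:                 # temperature update
            t *= alpha                  # geometric cooling (assumption)
            gamma = 0
    return s_best, c_best


def toy_cost(sizes):
    """Toy stand-in for total leakage: the sum of the size indices."""
    return sum(sizes)


def toy_perturb(sizes, rng):
    """Pick one cell at random and give it a different size in 1..10."""
    i = rng.randrange(len(sizes))
    new_size = rng.choice([s for s in range(1, 11) if s != sizes[i]])
    return sizes[:i] + (new_size,) + sizes[i + 1:]
```

For example, `anneal((10,) * 5, toy_cost, toy_perturb)` starts from the largest size on every cell and returns the best solution seen together with its cost.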
  • 27. Outline • Introduction • Related Work • Problem Formulation • Proposed Methodology • Experimental Results • Conclusion and Future Work 27
  • 28. Experimental Results • Experimental Setting • Standard Library • Timing Engine • Acceptance Probability • Benchmark • The Trend of Leakage Power Minimization • Cost Comparison 28
  • 29. Standard Library • Cell Library in Synopsys Liberty format • Combinational cells: • 11 footprints: • in01, na02, na03, na04, no02, no03, no04, ao12, ao22, oa12 and oa22 • Each footprint has 30 options • 3 threshold voltage types and 10 gate sizes • Sequential cells: • 1 footprint: ms08
  • 30. Power, Capacitance, & Delay LUTs • Footprint: in01 (Vt types: s, m, f)
                   Leakage Power (uW)    Capacitance (fF)       Delay Time (ps)
      Gate Size      s     m     f        s      m      f        s     m     f
      1              1     4    16     12.8   14.4     16     11.7  10.7   9.1
      3              3    12    48     38.4   43.2     48      8.2   7.2   6.5
      4              4    16    64     51.2   57.6     64      6.5   5.7   5.2
      6              6    24    96     76.8   86.4     96      6.5   5.7   5.2
      8              8    32   128    102.4  115.2    128      6.5   5.7   5.2
  • 31. Delay time Look-Up Table • Delay time = f(input slew, output load) • 2D Linear Interpolation
      Slew (ps) / Loads (fF)     5     10     15     20     25     30
      0                        6.5    7.6    8.8   10.0   11.1   12.3
      1                        7.8    9.0   10.2   11.4   12.6   13.8
      2                        9.1   10.3   11.5   12.8   14.0   15.2
      3                       10.4   11.7   12.9   14.2   15.5   16.7
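The 2D linear interpolation named on this slide can be sketched with the table values above. The extracted slide does not make clear which axis is slew and which is load, so the sketch uses the neutral names `ROWS` (0 to 3) and `COLS` (5 to 30). The helper names are hypothetical, and queries outside the table are extrapolated linearly from the nearest cell, which is an assumption.

```python
from bisect import bisect_right

ROWS = [0, 1, 2, 3]              # row axis of the slide's table
COLS = [5, 10, 15, 20, 25, 30]   # column axis of the slide's table
DELAY = [                        # delay time in ps
    [6.5, 7.6, 8.8, 10.0, 11.1, 12.3],
    [7.8, 9.0, 10.2, 11.4, 12.6, 13.8],
    [9.1, 10.3, 11.5, 12.8, 14.0, 15.2],
    [10.4, 11.7, 12.9, 14.2, 15.5, 16.7],
]


def _bracket(axis, x):
    """Index i of the interval [axis[i], axis[i + 1]] used for x."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))  # clamp to the first/last interval


def lookup_delay(row_value, col_value):
    """Bilinear interpolation of DELAY at (row_value, col_value)."""
    i = _bracket(ROWS, row_value)
    j = _bracket(COLS, col_value)
    u = (row_value - ROWS[i]) / (ROWS[i + 1] - ROWS[i])
    v = (col_value - COLS[j]) / (COLS[j + 1] - COLS[j])
    # Interpolate along the column axis on both bracketing rows,
    # then along the row axis between the two results.
    upper = DELAY[i][j] * (1 - v) + DELAY[i][j + 1] * v
    lower = DELAY[i + 1][j] * (1 - v) + DELAY[i + 1][j + 1] * v
    return upper * (1 - u) + lower * u
```

A query halfway between four grid points, such as `lookup_delay(1.5, 12.5)`, returns the average of the four surrounding entries (9.0, 10.2, 10.3, 11.5).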
  • 32. Timing Engine • Runtime (second/iteration)
      Design        PrimeTime®  Full Functional Timer  Incremental Update and Full Functional Timer
      DMA           10.00       1.50                   0.00087
      pci_bridge32  11.00       1.90                   0.00096
      des_perf      33.00       4.60                   0.00067
      vga_lcd       44.00       6.80                   0.00208
      b19           63.00       9.70                   0.00238
      leon3mp       375.00      26.30                  0.01582
      netcard       393.00      38.40                  0.06694
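The per-iteration runtimes on this slide imply the speedup of the incremental timer over PrimeTime. The short sketch below reproduces that arithmetic with the values copied from the slide; the dictionary and function names are illustrative only.

```python
# Seconds per iteration, copied from the slide:
# (PrimeTime, incremental update and full functional timer)
RUNTIME_S = {
    "DMA": (10.00, 0.00087),
    "pci_bridge32": (11.00, 0.00096),
    "des_perf": (33.00, 0.00067),
    "vga_lcd": (44.00, 0.00208),
    "b19": (63.00, 0.00238),
    "leon3mp": (375.00, 0.01582),
    "netcard": (393.00, 0.06694),
}


def speedup(primetime_s, incremental_s):
    """How many times faster the incremental timer is per iteration."""
    return primetime_s / incremental_s


SPEEDUP = {design: speedup(*times) for design, times in RUNTIME_S.items()}
```

The results range from roughly six thousand times (netcard) to roughly fifty thousand times (des_perf), in line with the figures quoted in the editor's notes.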
  • 34. Benchmark
      Design        # IO pins  # Comb cells  # Seq cells  # Total cells
      DMA           959        23K           2K           25K
      pci_bridge32  361        30K           3K           33K
      des_perf      374        102K          9K           111K
      vga_lcd       184        148K          17K          165K
      b19           47         213K          7K           219K
      leon3mp       333        540K          109K         649K
      netcard       1,846      861K          98K          959K
  • 35. The Trend of Leakage Power Minimization • [Plots: leakage power (μW) versus iteration for DMA (iteration × 18K), pci_bridge32 (iteration × 16K), des_perf (iteration × 23K), and vga_lcd (iteration × 15K)]
  • 36. The Trend of Leakage Power Minimization (cont.) • [Plots: leakage power (μW) versus iteration for b19 (iteration × 16K), netcard (iteration × 14K), and leon3mp (iteration × 15K)]
  • 37. Cost Comparison • [Runtime formula, garbled in extraction: ") 35 # (*15 K gates Roundup h h Runtime"] • Total Leakage Power (μWatt), read from the bar charts:
      Design        IR+SA     IR        NTUgs     UFRGS-BRAZIL  PowerValve  Goldilocks  eOPT      CUsizer
      DMA           3.71E+05  1.54E+06  2.05E+05  1.58E+05      1.47E+05    2.15E+05    4.51E+05  3.68E+05
      pci_bridge32  3.51E+05  1.71E+06  2.03E+05  1.15E+05      1.16E+05    6.96E+05    2.26E+05  2.88E+05
      des_perf      1.54E+06  4.15E+06  6.74E+05  8.84E+05      6.97E+05    9.47E+05    2.28E+06  1.13E+06
      vga_lcd       4.00E+05  1.47E+06  4.15E+05  3.78E+05      3.91E+05    4.63E+05    6.44E+05  7.53E+05
    • Chart annotation on vga_lcd: ↓ 73%
  • 38. Cost Comparison (cont.) • Total Leakage Power (μWatt), read from the bar charts:
      Design        IR+SA     IR        NTUgs     UFRGS-BRAZIL  PowerValve  Goldilocks  eOPT      CUsizer
      b19           7.32E+05  1.34E+06  6.27E+05  6.14E+05      7.36E+05    7.58E+05    8.62E+05  5.02E+06
      netcard       3.90E+06  4.78E+06  1.77E+06  1.97E+06      1.94E+06    1.81E+06    2.10E+06  2.00E+06
      leon3mp       2.28E+06  5.40E+06  1.42E+06  1.79E+06      2.96E+06    1.47E+06    1.88E+06  1.92E+06
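The reduction that SA achieves over the Phase I result can be worked out from the IR and IR+SA bars. The sketch below assumes that the first two values listed for each design on slides 37 and 38 belong to IR+SA and IR, in the order of the chart labels; the vga_lcd result matches the "↓ 73%" annotation, which supports that reading. All names are illustrative.

```python
# (IR+SA, IR) total leakage power in μW, copied from slides 37 and 38
LEAKAGE_UW = {
    "DMA": (3.71e5, 1.54e6),
    "pci_bridge32": (3.51e5, 1.71e6),
    "des_perf": (1.54e6, 4.15e6),
    "vga_lcd": (4.00e5, 1.47e6),
    "b19": (7.32e5, 1.34e6),
    "netcard": (3.90e6, 4.78e6),
    "leon3mp": (2.28e6, 5.40e6),
}


def reduction_percent(ir_sa, ir):
    """Leakage reduction of IR+SA relative to the IR initial solution."""
    return 100.0 * (1.0 - ir_sa / ir)


REDUCTION = {d: reduction_percent(*v) for d, v in LEAKAGE_UW.items()}
```

Under this reading, the reduction exceeds 70% on DMA, pci_bridge32, and vga_lcd, and it is smallest on netcard.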
  • 39. Outline • Introduction • Related Work • Problem Formulation • Proposed Methodology • Experimental Results • Conclusion and Future Work 39
  • 40. Conclusion • An iterative algorithm is necessary for initialization. Without it, the SA approach may not converge within a fixed runtime. • On all benchmarks, our approach reaches a feasible solution of the same order of magnitude as related work. • In some cases, our approach yields a better solution than previous work and reduces leakage power by more than 70% relative to the initial solution in a short time.
  • 41. Future Work • A more realistic RC network model • Leakage power minimization of sequential circuits

Editor's Notes

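  1. First, the Introduction.
  2. Today's electronic products, especially handheld devices, increasingly need to combine low power consumption with high performance. As technology advances and wafer process technology evolves, the impact of leakage current becomes ever more important. According to the 2009 ITRS roadmap report, the dissipation caused by leakage current will increase by 400% by 2016. Reducing leakage dissipation is therefore an important issue.
  3. In this work, we use gate sizing and threshold voltage assignment to reduce leakage power. Gate size determines driving strength and is also proportional to leakage current, so choosing an appropriate size reduces leakage. Threshold voltage assignment exploits the properties of the threshold voltage: a high Vth gives a longer delay but lower leakage, so it can be used on non-critical paths, while a low Vth can be used on critical paths to meet the timing requirement.
  4. Next is the related work.
  5. Research on gate sizing and threshold voltage assignment has drawn growing attention since the 1990s, and various methods have been proposed for different objectives. They fall into two main categories: continuous methods on the left and discrete methods on the right. The continuous methods are mainly linear programming and geometric programming. The discrete methods comprise six approaches, including the sensitivity-based approach and slack and delay budgeting. I will briefly introduce each method and close this part with a short summary.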
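  6. Among the continuous methods, linear programming defines the power model and the gate selection as linear functions, while geometric programming goes further and defines the power model as a polynomial function. Modeling Error: misleads optimization due to the inaccuracy of delay and power models. Mapping Issue: makes no guarantee on mapping a continuous solution to a discrete one.
  7. Among the discrete gate sizing methods, first is the sensitivity-based approach. Each work defines its own sensitivity, scores and ranks every gate by it, picks the highest-scoring gate, and performs leakage minimization until no further improvement is possible. Next is slack and delay budgeting, which finds each gate's slack and replaces gates that have larger slack with lower-leakage gates. The third approach is DP, which sweeps through the circuit and decides the assignment in order.
  8. LR converts the constrained problem into an unconstrained one and solves for the Lagrange multipliers. LP, unlike the continuous method, turns the choice of size and threshold voltage into a binary variable to avoid the rounding problem. SA conditionally accepts worse solutions during annealing in order to reach a solution close to the optimum, and it suits a large discrete search space.
  9. This table lists the strengths and weaknesses of each method. LP, GP, sensitivity, and slack & delay have short solve times; DP finds solutions efficiently; LR can handle large-scale problems; SA can find an approximation to the global optimum. However, continuous sizing suffers from modeling error and mapping issues: the former models delay and power as less accurate linear functions, and the latter may fail to find a discrete solution. Sensitivity is limited by local optima. Slack & delay and LP ignore the delay interaction caused by changing a gate. DP is limited by the size of the solution space. LR may oscillate during solving, fail to converge, or get stuck at a local optimum. SA needs a way to search the solution space quickly.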
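  10. Next is the Problem Formulation, where we quantify the problem to ease analysis and solving.
  11. First, a motivational example illustrates the idea behind gate size and threshold voltage assignment. We see three cascaded inverters, and the maximum time from n1 to n4 is 50 ps. We choose different threshold voltage types for u3. Solution 1 violates the timing requirement and has negative slack. Solutions 2 and 3 both meet the time limit, and of these, Solution 3 has the smaller leakage power. So an appropriate assignment not only satisfies the time limit but also reduces leakage.
  12. Next are the details of the problem formulation. The four inputs are the standard cell library (Synopsys format), the gate-level netlist (Verilog format, defining each cell's name and the net connections), the timing constraints (SDC format, constraining the clock period and clock input name, the input/output delays, the PI transition times, and the PO loads), and the interconnect parasitics (SPEF format, defining each net's parasitic capacitance). The output is the final gate size and threshold voltage chosen for each cell in the design. Our goal is to satisfy the performance constraints and minimize total leakage power. I will explain the performance constraints on the next slide.
  13. The performance constraints avoid three violations: slack, slew, and max-load violations. A slack violation means negative slack at a PO or a DFF input. A slew violation means the slew at a PO or a cell input pin is larger than the maximum slew limit defined in the library. Finally, a max-load violation occurs at a cell output pin when the sum of the cell's fan-out loads exceeds the cell's maximum capacitance. ps: picosecond (10^-12 s). fF: femtofarad (10^-15 F).
  14. We also made some assumptions about the experimental setup to keep it from becoming too complex. First, the interconnect parasitics are defined as lumped capacitance to simplify calculation. Second, we perform assignment only on the combinational circuit; sequential sizing is not considered. Third, the clock network is assumed ideal: no clock buffer, no skew, and no lumped capacitance on the clock net.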
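  15. Next, I introduce our proposed methodology.
  16. Our methodology has two phases. Phase I uses an iterative algorithm to find an initial solution that satisfies the timing requirement. That solution is handed to Phase II, the simulated annealing algorithm, which further reduces leakage.
  17. Phase I, the iterative algorithm. The inputs and outputs were covered in the formulation. It has three steps, which I explain on the following slides.
  18. In Step 1, the goal is to find how many negative-slack paths trace through each cell. First we compute each cell's slack, set each cell's counter to 0, and set each cell to its smallest type-size. We then find all negative-slack paths and look at every cell on each path; if the cell has negative slack, its counter is incremented by one.
  19. In Step 2, we sort by the values just computed. The more often a cell was traced, the earlier it appears. (Ties follow topological order.) In the last step, we follow that order and check whether each cell has negative slack; if so, we switch it to a larger type-size until the timing constraints are met. (Is it guaranteed to end with no violation? A: With plain topological order we ran into this problem; adding the first two steps avoids it.)
  20. Phase II, SA, which I explain in four parts. First, the SA solution is the type-size used by each cell. Second, the perturbation randomly picks a cell and changes its type-size. Third, the cost function is the total leakage power. Finally, the annealing schedule is explained with the flow chart.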
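  21. Check whether the temperature has dropped to the bound; if so, finish.
  22. Randomly pick a cell and change its type-size. Call the timer to recompute the cost.
  23. Then decide whether to accept the new solution. If the new cost is smaller than the old cost, check whether it is also better than the best cost; if so, set the state to UPD and update the best cost. If not, set the state to NEW. In the else branch, compute a random value between 0 and 1 and the acceptance probability. The acceptance probability is an exponential function, e to the power of (_____). Explain the delta cost (the change relative to the old cost, as a ratio).
  24. Depending on the state, decide whether to update the current solution into the last solution.
  25. Check whether the accumulated REJ states have reached the threshold; if so, lower the temperature. γ: gamma. φ: phi.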
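  26. Experimental setting (library, timing engine, AP, benchmark). Experimental data. Data comparison.
  27. Combinational cells. Sequential cells are not considered.
  28. Power and capacitance. Delay shows the opposite trend.
  29. Delay is looked up in a table by slew and load, using 2D linear interpolation.
  30. netcard: 6,000 times faster. DMA, pci: 10,000 times faster. des: 50,000 times faster. vga, b19, leon: 30,000 times faster.
  31. The AP formula was mentioned earlier; it is an exponential function. Low K, high K. On the right, different values of K. If K is too large, convergence is difficult under the runtime limit. If K is too small, SA tends toward a local optimum.
  32. Test cases: 7 sets. Cell counts range from 20 thousand to 950 thousand.
  33. RC network. Sequential gate sizing.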