Low Power ICD Talks
Vivek Shukla
October 24, 2007
Topics
• Introduction
• Power dissipation basics
• Existing low-power techniques and issues
• Advanced LP techniques (under exploration)
Introduction
[Figure: processor power (Watts, log scale from 0.1 to 1000) versus year (1970 to 2020) for the 8080, 8086, 386, Pentium® and Pentium® 4 processors.]
Unconstrained power will reach 1,000's of watts
Power Density will Get Even Worse
[Figure: power density (W/cm2, log scale from 1 to 10,000) versus year ('70 to '10) for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486 and Pentium® processors, with reference levels marked for a hot plate, a nuclear reactor, a rocket nozzle and the Sun's surface.]
Motivation
• Portability
– Extending battery life
• Battery technology scales up slowly: 150 Wh/kg today vs. 75 Wh/kg in 1990
• A 1 kg NiCd battery could power a P4 for 1 hour, or a Centrino for 4 hours
– Low power dissipation as a product feature in itself
– Enabling portable devices to be more powerful and feature-rich
• Packaging
– High power dissipation leads to expensive packaging and cooling systems
• ~ 1W: inexpensive plastic package limit
• ~ 10W: Ceramic package limit
• ~ 10W/cm2: limit for convection cooling
• ~ 50W/cm2: limit for forced-air cooling
• Reliability
– Long product lifetime
Sources of Power Dissipation in CMOS
Power in a CMOS inverter is governed by the three-part equation:
$$P_{total} = C_L V_{DD}^2 f_{clk}\, a_{0 \to 1} + V_{DD} I_{short\text{-}circuit} + V_{DD} I_{leakage}$$
• Dynamic (switching) power
– Currently the largest part, but its percentage is getting smaller
• Leakage power
– Subthreshold conduction: getting bigger due to aggressive scaling, temperature, etc.
– Reverse leakage of diodes (relatively small)
– Possible gate tunneling current in future technologies
• Short-circuit (crowbar) current
– Both pull-up and pull-down devices are partially conducting for a small, but finite amount of time
– Can be modeled as some fraction of dynamic current
Sources of Power Dissipation: Switching
• One half of the energy drawn from the supply is dissipated in the pull-up network (PMOS) and the remaining half is stored in CL when Vout makes a 0→1 transition
• During the 1→0 transition the charge stored in CL is dumped via the pull-down network (NMOS)
• Power = (Energy/Transition) × (Transition Rate)
$$P_{dyn} = C_L V_{DD}^2 f_{0 \to 1} = C_L V_{DD}^2 f_{clk}\, a_{0 \to 1} = C_{switched} V_{DD}^2 f_{clk}$$
where $C_{switched} = C_L\, a_{0 \to 1}$ and $a_{0 \to 1}$ = probability of a 0→1 transition
• Dynamic power can therefore be reduced by
– Scaling down the supply voltage VDD
– Reducing the switching probability through architectural means
– Scaling down the frequency as per throughput demands
– Optimizing/reducing the load capacitance (device scaling)
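As a rough numerical illustration of the switching-power relation on this slide, the Python sketch below evaluates P = CL × VDD² × fclk × a for a hypothetical block and shows the quadratic effect of supply scaling. The capacitance and activity values are invented for illustration; only the form of the equation comes from the slide.

```python
import math

def dynamic_power(c_load, vdd, f_clk, activity):
    """Switching power in watts: C_L * VDD^2 * f_clk * a(0->1)."""
    return c_load * vdd ** 2 * f_clk * activity

# Hypothetical block (assumed values): 1 nF total load capacitance,
# 125 MHz clock, 10% probability of a 0->1 transition per cycle.
p_high = dynamic_power(1e-9, 1.08, 125e6, 0.10)
p_low = dynamic_power(1e-9, 0.90, 125e6, 0.10)

# With C, f and a unchanged, the saving depends only on the voltage ratio squared.
saving = 1.0 - p_low / p_high
print(f"{p_high * 1e3:.2f} mW at 1.08 V, {p_low * 1e3:.2f} mW at 0.90 V, saving {saving:.1%}")
```

Because frequency usually has to drop along with the voltage, a real DVFS operating point would save more than this voltage-only comparison suggests.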
Sources of Power Dissipation: Short-Circuit
• Due to the finite input transition time, both NMOS and PMOS conduct for a small but finite duration, providing a resistive path between VDD and GND
• Typically less than 10% of the total dynamic power
• The short-circuit current Isc depends on the ratio of input to output transition times (the higher the ratio, the longer both devices are ON, and the higher the dissipation due to short-circuit current)
• Can be minimized by balancing the input and output rise times
• Can be virtually eliminated by making VDD less than (VTN+|VTP|)
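The last bullet can be checked with a small sketch. Both devices conduct only while the input lies between VTN and VDD − |VTP|, so that window closes once VDD drops below VTN + |VTP|. The function name and the threshold values below are hypothetical.

```python
def overlap_window(vdd, vtn, vtp_abs):
    """Input-voltage range (lo, hi) in which NMOS and PMOS both conduct.

    Returns None when the range is empty, i.e. when VDD <= VTN + |VTP|.
    """
    lo = vtn              # NMOS turns on above VTN
    hi = vdd - vtp_abs    # PMOS turns off above VDD - |VTP|
    return (lo, hi) if hi > lo else None

print(overlap_window(1.0, 0.3, 0.3))   # a window exists: crowbar current can flow
print(overlap_window(0.5, 0.3, 0.3))   # None: VDD < VTN + |VTP|
```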
Sources of Power Dissipation: Leakage
I1 : pn junction reverse bias current
I2 : Subthreshold conduction due to weak inversion
I3 : Drain-induced barrier lowering (DIBL)
I4 : Gate-induced drain leakage (GIDL)
I5 : Punchthrough
I6 : Narrow width effect
I7 : Gate oxide tunneling
I8 : Hot carrier injection (HCI)
• Significant contributor to standby power
• The dominant component is the subthreshold leakage current (I2), because VTH keeps being lowered with scaling (note the exponential dependence on VTH and the sensitivity to temperature)
• Several techniques contain it: dual-VT and multi-VT libraries, MTCMOS, VTCMOS, back-biasing, etc.
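To make the exponential dependence concrete, the sketch below uses the common first-order textbook model, in which the off-state subthreshold current is proportional to exp(−VTH / (n·kT/q)). This model is an assumption brought in for illustration, not a formula taken from the slide; the slope factor n = 1.5 and the threshold values are hypothetical, and the temperature dependence of VTH and of the pre-factor is ignored.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant, J/K
Q_E = 1.602176634e-19    # elementary charge, C

def relative_off_current(vth, temp_k=300.0, n=1.5):
    """Off-state subthreshold current up to a constant factor: exp(-VTH / (n * kT/q))."""
    v_thermal = K_B * temp_k / Q_E
    return math.exp(-vth / (n * v_thermal))

# Effect of lowering VTH by 100 mV at room temperature (hypothetical 0.4 V -> 0.3 V).
vth_ratio = relative_off_current(0.3) / relative_off_current(0.4)

# Effect of heating from 300 K to 398 K (125 C) at a fixed VTH of 0.3 V.
temp_ratio = relative_off_current(0.3, temp_k=398.0) / relative_off_current(0.3)

print(f"100 mV lower VTH: about {vth_ratio:.0f}x more leakage")
print(f"300 K -> 398 K:   about {temp_ratio:.0f}x more leakage")
```

Under these assumptions a 100 mV reduction in VTH costs roughly an order of magnitude in leakage, which is why multi-VT libraries restrict low-VT cells to timing-critical paths.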
Low-Power Techniques
The slide groups the techniques into Basic and Advanced.

| Power reduction technique | Leakage power | Dynamic power | Timing penalty | Area penalty | Verification impact | Implementation impact | Synth, Formal and Test impact |
|---|---|---|---|---|---|---|---|
| Area optimization | 1.1X | 10% | 0% | -10% | None | None | None |
| Multi-Vt optimization | 6X | 0% | 0% | 2 to -2% | None | Low | None |
| Clock gating | 0X | 20% | 0% | 2% | None | Low | Low |
| Multi-supply voltage (MSV) | 2X | 40-50% | 0% | 10% | Low | Medium | Medium |
| Power shut-off (PSO) | 10-50X | ~0% | 4-8% | 5-15% | High | Medium-high | High |
| Dynamic and Adaptive Voltage Frequency Scaling (DVFS and AVS) | 2-3X | 40-70% | 0% | 10% | High | High | High |
| Substrate biasing | 10X | - | 10% | 10% | Medium | High | Medium-High |
PSO in std cell based design
[Figure: fine-grain power switch example and coarse-grain power switches (buffered and un-buffered µSwitch), showing cells with a SLEEP control between virtual VSS and real VSS.]

| | Fine grain | Coarse grain |
|---|---|---|
| Power gate size | Worst case switching (30% area) | Actual switching (5% area) |
| Gate control slew rate | Always-on buffer network | Always-on buffer by abutment |
| Simultaneous switching capacitance | No issue | Needs to be addressed |
| Power gate leakage | 30% | 5% |
PSO in std cell based design (contd..)
[Figure: a shut-down power domain (PD3) next to an active domain, with state-retention flops (SRPG), isolation cells (iso_en), supplies Vss, Vdd and VddC, and the shutoff control.]
[Figure: switchable power domain (PD1) with switch cells at a 200um column pitch and a 150um left offset; the enable chain runs from En_in through En_out_1, En_out_2 and En_out_3.]
Note: the switch cell has 2 buffers built in, with different directions.
Ring cell types:
• Filler: forms a contiguous ring; prevents additional leakage
• Breaker: divides the ring into separate gate control groups; used with a feed-through enable signal
• Corner: acts like a corner cell in a pad ring
• Buffer-only (no switch) / switch-only (no buffer): allows flexible control of the buffer tree
PSO in std cell based design (contd..)
• Ring
– Ring(s) of switches enclose the power domain fully or partially
– Switches placed outside the power domain
– Switch cell treated as hard macro
– Often used with hard macros (not allowed to touch inside)
– Higher IR drop
– Better current distribution
• Column
– Columns of switches inside power domain
– Switches placed in the standard cell rows
– Switch is a standard cell
– Often used inside hierarchical (soft) blocks
– Lower IR drop
– More prone to rush current issues
– Needs careful EM checks
PSO in std cell based design (contd..)
• Key issues in PSO design, apart from PSO insertion
– Power-up and rush current
• Dynamic IR drop analysis is becoming a must
• Optimum number of switches
• Smooth power-up
– Verification
• RTL simulations
• Low power insertion checks
• CPF verification
– DFM
• More power rails
• Stacked via requirements
• EM
– Testability
• Coverage of the logic on the Restore and Save signals
– ESD
– IR-drop-aware timing analysis
PSO in std cell based design (contd…)
Low Power Insertion Checks
[Figure: signal crossings between power domains (an OFF/ON pair and a 0.8V/1.2V pair), marking good and missing isolation (ISO) and level shifter (LH) cells and the isolation enable ISO_EN; labels PDM1, PDM2, PMM, PD, iA, iB.]
Structural/Rule Checking
• User defines rules for crossings, isolation type, and location
• Conformal LP reports missing or redundant isolation/level shifter cells
• Conformal LP reports wrong isolation cell type
• Conformal LP reports bad level shifter direction
• Conformal LP reports wrong isolation cell/level shifter domain location
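The kind of structural rule check listed above can be sketched in a few lines of Python. This is not how Conformal LP works internally and does not use its rule language; the data model (Domain, Crossing) and the two rules are simplified assumptions made for illustration: a crossing needs isolation when its driving domain can be switched off, and needs a level shifter when the two domains run at different voltages.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Domain:
    name: str
    voltage: float       # nominal supply in volts
    switchable: bool     # True if the domain can be powered off

@dataclass(frozen=True)
class Crossing:
    net: str
    src: Domain          # driving domain
    dst: Domain          # receiving domain
    has_isolation: bool
    has_level_shifter: bool

def check_crossing(c):
    """Return the rule violations for one domain crossing (hypothetical rule set)."""
    needs_iso = c.src.switchable
    needs_ls = c.src.voltage != c.dst.voltage
    issues = []
    if needs_iso and not c.has_isolation:
        issues.append("missing isolation")
    if c.has_isolation and not needs_iso:
        issues.append("redundant isolation")
    if needs_ls and not c.has_level_shifter:
        issues.append("missing level shifter")
    if c.has_level_shifter and not needs_ls:
        issues.append("redundant level shifter")
    return issues

core = Domain("PD_core", 0.8, switchable=True)    # hypothetical domains
soc = Domain("PD_soc", 1.2, switchable=False)

for crossing in (
    Crossing("iA", core, soc, has_isolation=False, has_level_shifter=True),
    Crossing("iB", soc, core, has_isolation=True, has_level_shifter=True),
):
    print(crossing.net, check_crossing(crossing))
```

A production check would also cover isolation cell type, level shifter direction and cell location, as the bullets above note.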
PSO in std cell based design
• Equivalence checking for Low Power design
– Ensure low power optimizations do not introduce logical errors
– Verify gated clocks, gated signals, de-cloning, and re-cloning of
gated clocks
– Check State Retention mapping from RTL to gate
– Check corresponding presence of Isolation and level shifter during
implementation
[Figure: low-power implementation flow with Conformal Low Power checks alongside. Steps shown include top-down single-pass synthesis, silicon virtual prototype, placement including SRPG/level shifter/isolation cells, switch cell insertion (for MTCMOS), power grid synthesis, power routing, low-power clock tree synthesis, domain-aware post-CTS optimization, domain-aware NanoRoute, IR-aware timing/SI optimization, decap insertion and sign-off.]
• Power domain structural and functional checks
– Ensure proper insertion of low power cells
– Ensure proper connectivity of low power cells
– Formally validate isolation function
– Formally validate state retention function
– Supports both logical and physical (power aware) netlists
• Transistor Electrical Verification
– Detect Sneak (leakage) paths across power domain boundaries
Low Power Insertion Checks
PSO in std cell based design (contd...)
Test the Low Power Design, Reduce Power During Test
• Insert the required power-aware DFT
• Test Access Mechanism (PTAM)
• Power-aware scan chains
• Encounter Test model has test modes that reflect power modes
• Power domains verified for isolation and scan integrity
• ATPG can process each power mode
• Low-power scan vectors reduce scan-shift power
• Runtime MBIST scheduling reduces memory test power
• Limited-pin testing reduces IO switching power
• Level shifters are tested
• Isolation logic is stressed
• Retention flops
[Figure: Low Power Test. Panels: Power-Aware DFT (PTAM across PD1 and PD2), Power-Aware Test Model (Top, Core, Mem, PMU, PD1 to PD4, SR), Reduce Power during Test, ATPG for Power Structures (ISO between A/v1 and B/v2).]
PSO in std cell based design (cont…)
Power Analysis
• Power-gating – goals/tasks
– Power-switch on – overall IR drop
• Modelling the power-switch as on
• Running IR drop on entire power-grid, both global and switched at once
– Power-up
• Simulate as the power-switch is turned on
• Capture power-switch current behavior
– IR drop effect on global grid and neighbors when block powering-up
• Use captured current behavior from previous step and feed into rail analysis
PSO in std cell based design (cont…)
Power Analysis
• Impact on IR drop and EM
– Power switches modeled as resistors in power grid view
• Solution flags if switches enter saturation (I/Idsat – PI)
• Support steady-state on and off
• Off-state: use leakage value of switch
• Power consumption of steady-state on and off
– Power savings in different modes
• Power-up analysis
– Fastmos simulator used for power-up simulation (UltraSim)
– Dynamic currents captured through power switches
• Impact of power-up on global grid
– Dynamic VSDG rail analysis uses captured currents from power-up analysis to show impact of power-up on surrounding logic
How Many Power Switches?
• Two-part approach
1. Steady state analysis
– To monitor IR drop through switches
– VoltageStorm analyzes for IR drop
– VoltageStorm reports power switches
operating in saturation
2. Dynamic analysis
– To monitor and control power ramp-up
– VoltageStorm reports block “power-on” time
• Too fast → latch-up
• Too slow → limits performance
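For the steady-state half of this approach, a first-pass switch count can be estimated by hand before running the tools. The sketch below is only a back-of-the-envelope model, not the VoltageStorm algorithm: it assumes identical switches acting as parallel resistors that share the block current evenly, and all numeric values are hypothetical.

```python
import math

def switches_needed(block_current_a, r_on_ohm, drop_budget_v, idsat_a):
    """Smallest switch count that satisfies both steady-state constraints.

    - IR drop across the parallel switches stays within drop_budget_v
    - current per switch does not exceed idsat_a (switch stays out of saturation)
    """
    n_drop = math.ceil(block_current_a * r_on_ohm / drop_budget_v)
    n_sat = math.ceil(block_current_a / idsat_a)
    return max(n_drop, n_sat, 1)

# Hypothetical block: 0.5 A peak current, 100 ohm on-resistance per switch,
# 50 mV IR-drop budget, 1 mA saturation current per switch.
n = switches_needed(0.5, 100.0, 0.05, 1e-3)
drop = 0.5 * 100.0 / n
print(n, f"{drop * 1e3:.1f} mV")
```

The dynamic half (ramp-up time and rush current) still needs simulation, since it depends on how the switches are staggered rather than on their count alone.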
Block Power-up/Down Analysis and Global Grid Verification
1. Create circuit netlist (inputs clamped, outputs correctly loaded)
2. Simulate with UltraSim
3. Create dynamic Power Grid Views: capture dynamic current in PGV
4. Analyze top-level grid in VSDG: load full-chip power RC network with PGVs and analyze
MTCMOS Power-On
• PowerMeter generates data to drive spice
simulation using Ultrasim
– Netlist sensitized to the virtual power domain
• Use existing sub-circuit netlist
• Generate sub-circuit netlist from .cl
– Signal loading dspf (lumped C)
– Voltage Source file
– Template Stimulus file
• DC voltages
• PWL for control logic – derived from TWF file
• QX can generate RC network of Virtual power net
– Potential capacity limitations
– Analyse to see Ton differences
– Not used to date
• QX generates RC network of control signals
– Important to capture delay in controlling switches
• User simulates power-on conditions
– Analyzes ramp-up time to steady-state
• UltraSim also captures current behavior through
power-transistors
– Leverages existing UltraSim commands used
within integration inside VST (.usim_ir)
– Generates binary current data files (.pti)
[Figure: PowerMeter and QX produce the UltraSim inputs (RC grid, netlist, signal loading, voltage sources, template stimulus, top-level circuit file); UltraSim produces power-transistor dynamic currents (.pti) and Spice waveforms as results.]
PSO for memories
• Why memory shut-off
– On-chip memory is increasing
• More memory results in higher leakage
– The activity factor of large memories is low, so their active power is low
– Memories already use higher-L devices (lower subthreshold leakage)
– Below the 65nm process node, junction leakage starts to become the dominating factor
Reducing standby/average power through power-down is absolutely necessary
PSO for memories
• Memory shut-off can be
– Selective shut-off
– Retention memory
– Complete memory shut-off
• Memory shut-off is implemented at SoC level
– Tools are competent enough for implementation
• Key challenges
– Performance hit
– RTL functional verification
– Yield is an issue
– Testing is a big issue
– Support for IR-drop-aware timing models
IO power shut-off
• IOs are grouped together based on architecture
– A set of IO voltages can be shut down
• Issues
– Board design and pad selection
– ESD
Dynamic Voltage Frequency Scaling Requires Multi-Mode Analysis
• Multiple modes need to be analyzed/optimized for multiple corners
– Setup analysis for (WC, 1,125C) corner
• Multiple constraints (.sdc)
– Example: baseline.sdc, ios.sdc, dull.sdc, drowsy.sdc
• Libraries
– stdcell_1.08sl.lib, stdcell_0.9sl.lib, stdcell_1.08fs.lib, stdcell_0.9fs.lib
[Table: supply voltage and frequency per mode; blocks shown on the slide: CORE, DROWSY, DULL]
Standby: 0.0V; 1.08V, 125MHz; 0.0V
Drowsy: 1.08V, 125MHz; 1.08V, 125MHz
Dull: 0.9V, 66MHz; 1.08V, 125MHz
Slow: 1.08V, 125MHz
Baseline: 1.08V, 125MHz
DVFS: Multi-Mode Multi-Corner Flow
1. Create library set: the library set can be a single library or a collection of libraries (ECSM)
2. Define various RC corners: specify the PVT condition and the SPEF for each corner
3. Define constraint modes: specify an SDC file for each mode; the same SDC file may be used, or 1 SDC file per domain
4. Create analysis views: associate a corner with a mode; a design may have 5 corners and 3 modes, but only 10 views
5. optDesign/timeDesign: run optimization and timing checks with concurrent handling of views
Primary Concerns:
1. Timing Closure
2. Verification
3. Mixed scenario for power saving (DVFS and PSO together)
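The "5 corners and 3 modes, but only 10 views" remark can be illustrated with a short sketch: an analysis view is a (corner, mode) pair, and only the pairs worth analyzing are kept. The corner names and the pruning table below are hypothetical; the mode names follow the SDC file names mentioned earlier in the deck. This is plain Python, not the implementation tool's command syntax.

```python
from itertools import product

corners = ["wc_125C", "wc_m40C", "typ_25C", "bc_125C", "bc_m40C"]   # hypothetical names
modes = ["baseline", "dull", "drowsy"]

# Hypothetical pruning: the corners considered relevant for each constraint mode.
corners_for_mode = {
    "baseline": ["wc_125C", "wc_m40C", "bc_125C", "bc_m40C"],
    "dull": ["wc_125C", "typ_25C", "bc_m40C"],
    "drowsy": ["wc_125C", "typ_25C", "bc_m40C"],
}

all_pairs = list(product(corners, modes))
views = [(corner, mode) for mode in modes for corner in corners_for_mode[mode]]

print(len(all_pairs), "possible pairs,", len(views), "analysis views")
```

Keeping the view list short matters because optimization and timing analysis run over all views concurrently.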
Advanced Low Power Techniques
Pulsed Latch Design Methodology
• A traditional FF is replaced with a pulsed latch
• The pulse generator is shared by several pulsed latches
• A dummy clock delay cell is used to balance the clock tree
[Figure: timing waveforms (cp, d, q) of a traditional register versus a pulsed latch driven by a pulse clock; clock tree with pulse generator, pulsed latches, dummy delay cell, and negative-edge FF / memory.]
Advanced Low Power Techniques (Contd..)
Pulsed Latch: Results
• 25% active power reduction by swapping to pulsed latches
– 50% of active power is consumed by FFs; the pulsed latch cuts that in half
• Power consumption overhead:
– Slew control after the pulse generator cell: slew needs to be faster in the pulse clock tree
– Pulse generator cell insertion (addition): the required number of PG cells is controllable
– ~5% overhead
[Figure: clock-tree image. General clock-tree structure with the pulse generator insertion point; latency control: slow slew; skew control: fast slew.]
Advanced Low Power Techniques (Contd..)
Low Power Arithmetic Units:
Delay, power, PDP and area of 16-bit adders, normalized to the delay, the power, the PDP and the area, respectively, of the Ripple Carry Adder

| Topology | Delay | Power | PDP | Area |
|---|---|---|---|---|
| Ripple Carry | 1 | 1 | 1 | 1 |
| Constant Block Width Carry Skip | 0.56 | 1.06 | 0.59 | 1.27 |
| Variable Block Width Carry Skip | 0.44 | 1.29 | 0.57 | 1.88 |
| Carry Look Ahead | 0.44 | 1.59 | 0.70 | 2.04 |
| Carry Select | 0.36 | 2.24 | 0.81 | 3.38 |
| Conditional Sum | 0.41 | 3.18 | 1.30 | 4.38 |

Delay, power, PDP and area of 16-bit multipliers, normalized to the delay, the power, the PDP and the area, respectively, of the Array Multiplier

| Topology | Delay | Power | PDP | Area |
|---|---|---|---|---|
| Array | 1 | 1 | 1 | 1 |
| Split Array | 0.68 | 0.87 | 0.59 | 1.43 |
| Wallace Tree | 0.58 | 0.74 | 0.43 | 1.93 |
| Modified Booth | 0.49 | 0.95 | 0.47 | 2.02 |

Source: T. Callaway and E. Swartzlander, “The power consumption of CMOS adders and multipliers”
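Since PDP is the power-delay product, the PDP column of the adder and multiplier data should equal delay × power up to rounding. The sketch below checks that and picks the topology with the lowest PDP in each group; the numbers are the normalized values from the slide, while the helper name and the 0.01 tolerance are illustrative choices.

```python
# topology -> (delay, power, reported PDP), all normalized to the first entry
adders = {
    "Ripple Carry": (1.00, 1.00, 1.00),
    "Constant Block Width Carry Skip": (0.56, 1.06, 0.59),
    "Variable Block Width Carry Skip": (0.44, 1.29, 0.57),
    "Carry Look Ahead": (0.44, 1.59, 0.70),
    "Carry Select": (0.36, 2.24, 0.81),
    "Conditional Sum": (0.41, 3.18, 1.30),
}
multipliers = {
    "Array": (1.00, 1.00, 1.00),
    "Split Array": (0.68, 0.87, 0.59),
    "Wallace Tree": (0.58, 0.74, 0.43),
    "Modified Booth": (0.49, 0.95, 0.47),
}

def pdp_mismatches(table, tol=0.01):
    """Return topologies whose reported PDP differs from delay * power by more than tol."""
    return {
        name: round(delay * power, 4)
        for name, (delay, power, pdp) in table.items()
        if abs(delay * power - pdp) > tol
    }

best_adder = min(adders, key=lambda name: adders[name][2])
best_multiplier = min(multipliers, key=lambda name: multipliers[name][2])
print(pdp_mismatches(adders), pdp_mismatches(multipliers))
print(best_adder, "|", best_multiplier)
```

The lowest-PDP choices are not the fastest ones, which is the point of the comparison: the fastest adder here (Carry Select) pays for its speed with more than twice the power of the ripple-carry baseline.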
Advanced Low Power Techniques (Contd..)
Double-Edge Triggered F/Fs
• Double-edge triggered F/Fs (DETFF) can “ideally” save 50% of clock network power by halving the required clock frequency
• However, the stringent 50% duty-cycle constraint on the clock and the area overhead of DETFFs can significantly offset the power saved
• DETFFs are slower than normal F/Fs due to increased internal and/or output node capacitance
[Figure: clock for a single-edge F/F with period T versus clock for a DETFF with period 2T and 50% duty cycle.]
Advanced Low Power Techniques (Contd..)
• Several other techniques are in use or under exploration
– Thermal throttling
– Clock swing controls
– Clock-on-demand
– Dynamic threshold
– Generic bus power reduction IPs
Q&A
BACK-UP
Development goals
• ARM 1136JF-S IC
– Power optimization methodology leverageable to synthesized digital designs
– Collaborative development: Silicon design chain (Applied Materials, ARM,
Cadence, TSMC)
• ARM 1136JF-S IC PSO
– Power switch-off (PSO) enhancement: Methodology and implementation
• ARM 1176JZF-S IC
– PSO and dynamic voltage and frequency scaling (DVFS) enhancement:
Methodology and implementation
– Facilitate comprehensive methodology across design, verification and
implementation
• Power Forward Initiative (Common Power Format, CPF)
• ARM, AMD, ATI, Applied Materials, Cadence, Calypto, Freescale, Fujitsu, Golden Gate
Technology, NEC Electronics, NXP, Sequence, TSMC
ARM1136JF-S IC architecture
• ARM1136JF-S microprocessor
– 16k I+D cache, 16 kB TCM; Tag RAMs, TLBs
– ARM, Thumb, DSP instructions; Java
• ETM11 trace macro, ETB11 trace buffer
• Advanced high-performance bus (AHB)
– Core AHB Lite ports → AHB I/F (pin access.)
– Access to 128 KB on-chip test RAM: enables concurrent data transfers from any four ports
[Figure: chip floorplan. 1V VDD domain: ~100K cells + 44 SRAMs; 0.8V VDD domain: ~200K cells; ~3,400 voltage level-shifting cells. Labeled blocks: Trace, Full AHB, Fetch, LSU.]
• 300K standard cell instances; 22M transistors; 44 SRAMs
• IC: 355 MHz typical (90nm standard CMOS: TSMC 90G)
• Dual VDD domains, dual VT library
Design methodology overview (1)
• Microprocessor verification
– Set microprocessor code,
memory configurations
– Verify RAM functionality in 90nm process
– Verify microprocessor functionality (RTL)
• Test cases (135K vectors)
• The generated vector sets are used subsequently for power dissipation analysis
• VCD and TCF formats
– Fully verified RTL “golden reference” for Regression
tests / functional verification
• ARM1136JF-S IC
– VDD domain selection and voltage level
shifting cells (VLS) design considerations
– MSV RTL synthesis
– Clock gating
– Timing closure in multi-VDD designs
– Dynamic/static IR drop analysis/optimization
– System-level validation
[Figure label: Timing, Power and Area Optimization]
Design methodology overview (2)
• ARM1136JF-S IC PSO
– PSO design, verification
– Structured PSO ring methodology
– VLS/isolation cells insertion in synthesis
– Automated placement / insertion: VLS cells, switch cells,
state retention registers
– Automated power stitching
– Automated multi-domain clocks
– Power switch-off, switch-on voltage drop
and transients analysis
• ARM1176JZF-S IC
– PSO management, verification
– Integrate dynamic voltage and frequency scaling function
(DVFS)
– Physical synthesis / optimization and timing analysis
(DVFS)
– Functional integrity verification and test insertion with
power-optimization features
• Vsoc, Vram → 1.0V libraries; Vcore → 0.8V libraries; ~800 test cases
[Figure label: Timing, Power and Area Optimization]
Multiple Supply Voltage (MSV) RTL synthesis
• Multiple supply voltage synthesis
– Newly-developed technology
– Single-pass concurrent optimization for timing, area and power
– 0.8 and 1.0 VDD domains, dual-VT cell libraries
• Power optimization in synthesis
– Logic restructuring
– Logic resizing (before clock tree synthesis)
• Buffer removal/resizing
– Transition rate buffering (buffer slow-transition nets)
• Minimize the duration in which both pFET and nFET conduct simultaneously
– Pin swapping
• Apply high transition rate signal nets to low capacitance inputs
• ARM1136JF-S IC cells: 62% in 0.8V, 38% in 1.0V
[Figure: pin swapping (CACC) and buffering examples on gates with inputs A to E and outputs X, Y, Z; a buffer is introduced to reduce slew.]
VDD domains, clock gating
• 0.8V, 1.0V VDD domains
– Analyze standard cell delay, leakage, standby and dynamic power (2.5x delta)
– Adequate performance for timing-critical nets
– Customization → further improvements feasible
[Figure: cell delays (normalized)]
• Architectural clock gating included in uP RTL
• Automated design flow → additional clock gating
– Inferred from RTL through low-power synthesis
– ~1,000 clock gated cells identified and managed → 85% of registers gated
– Shut off dynamic current in quiescent logic
• Clock decloning: 1,112 → 703 cells (1136 IC)
– Move clock gating to the highest hierarchical node of the logic tree → reduce power, insertion delay
MSV electrical/timing closure (1)
• Automated (VLS) insertion
– For nets traversing VDD domains
– Align cells to avoid n-well spacing violations (domain
perimeter placements)
– Automated multi-VDD power distribution and cell
placements, antenna diode insertion
– ARM1176JZF-S IC: Automated in synthesis
• VLS placement directly affects electrical
performance
– Optimal or detoured routing
– Power-supply-aware timing and multi-VDD supply constraints → drive placement
– ARM1136JF-S IC: Netlist modified to insert VLS cells where needed
– ARM1136JF-S IC PSO, ARM1176JZF-S IC: Automated VLS cells insertion, placement, timing
MSV electrical/timing closure (2)
• Cell substitution with timing constraint
– Replace standard-VT with high-VT cells
• Net by net basis; same footprint as original cell
• Signal integrity addressed within PR
– ~10 of 500K nets required post-layout optimization
• Effective current source model (ECSM)
instance-specific multiple VDD delay
calculation
– Standard cell libraries characterized for multiple VDD
values at outset
– Numerical model 2% deviation vs. full circuit
simulation
[Figure: SPICE versus ECSM comparison, plotted as distribution (%) against length and Y/X ratio, for test routes between points A, B, C and D with segment lengths X mm and Y mm.]
IR drop analysis and optimization
• Grid-specific resistor meshes
• Dynamic power (manage di/dt)
[Figure: dynamic IR drop analysis. 1.0V VDD: 22 mV; 0.8V VDD: 19 mV; VSS.]
ARM1136JF-S IC validation
• ARM RealView® Validation
System (instrumented system)
• Run applications, measure performance
– ~15,000 system-level validation tests
– Linux (2.4.7, 2.4.19, v6 backport and 2.6.x), WinCE
.NET 4.2 and Symbian OS7 operating systems
– Applications: X-windows, Doom, Pocket Word, and
Pocket Explorer, etc.
• ~40% overall and 46% leakage power reduction
Dynamic Power Dissipation (mW/MHz)

| IC Block | Sim. Baseline (90nm) | Sim. LP (90nm) | Meas. LP (90nm) | Meas. Power (130 nm; ARM) |
|---|---|---|---|---|
| Core | 0.28 | 0.14 | 0.10 | 0.60 |
| Other | 0.36 | 0.32 | 0.21 | |
| Total | 0.64 | 0.46 | 0.31 | |

[Figure: normalized power dissipation (%), Std. Power versus Low Power, broken down into Leakage (Total), Switching (Total), Leakage (Logic) and Switching (Logic).]
ARM1136JF-S IC PSO design
• Automated PSO implementation
– PSO design, functional verification (VLS cells)
– Power, clock distribution
– Static and dynamic power analysis
• Structured ring methodology
– Filler, breaker, corner, switch- or buffer-only
[Figure: PSO switched blocks inside the 1.0V PSO domain; switches and fillers form the ring around the switchable power domain with its internal power mesh; En_in and En_out enable pins. The switch cell has 2 buffers built in, with different directions.]
VLS cells integration
• Multiple height VLS/Isolation cells
• Automatic placement (at domain edge)
• Automatic power/enable connection
[Figure: level shifter/isolation cell placement among standard cells; 3-row-high isolation cell; the 0.8V supply connects to M4; 1V VDD and VSS rails.]
MSV optimization
• Cross-domain timing optimization
– Automatically handle conditions shown
• Domain-aware clock tree synthesis
– Automatically handle multi-domain clocks
• Automatic insertion of state retention
registers
– RTL synthesis, implement., verification
– Capability not implemented in this work
[Figure: cross-domain timing cases between Power Domain 0.8V (Libraries A), Power Domain 1.0V (Libraries B) and 0.8V I/O, including don't-touch nets.]
[Figure: shutdown block in which FFs are replaced by SRPG FFs; VDD and VSS are switched, VDDC and VSSC are not switched; PG and RET control signals.]
ARM1176JZF-S IC architecture
• ARM1176JZF-S microprocessor
– 16k I+D cache, 16 kB TCM; Tag RAMs, TLBs
– ARM, Thumb, DSP instructions; Java, IEM
• ARM1176JZF-S IC
– ETM11 trace macro, ETB11 trace buffer
– AHB bus I/F through AXI to AHB bridge
[Figure: three supply regions, Vsoc (1.0V), Vcore (0.8V) and Vram (1.0V), separated by voltage level-shifting cells.]
• 360K standard cell instances; 22M transistors; 46 SRAMs
• IC: 340MHz typical (90nm standard CMOS: TSMC 90G)
• 3 power domains defined
• Dual VDD domains, dual VT library
[Figure: ARM 1176_IC block diagram. Blocks shown: ARM1176Main, ARM1176JZFSImp, cache and TCM RAMs (Vram) behind VLS/clamps, IARS (IEM Asynchronous Register Slices) on the Instruction, Data, DMA and Peripheral AXI ports, AXI to AHB bridge, TestChip RAM, VIC, 1176 MBIST, JTAG/TAP boundary scan test logic, PLL, clock and reset, validation coprocessors, dormant mode sequencer, ETM11 CS, ETBM11CS, ETB11 RAM, ETB11 MBIST, TPIU. Interfaces: clocks and resets, debug, TAP, trace, CP14.]
Intelligent energy manager (IEM)
ARM1176JZF-S RTL structure
• ARM1176 IEM: Ease of implementation in present design methodologies
– Asynchronous between voltage domains at different voltages, frequencies
• IEM Asynchronous AXI Register Slices required
– Has logical partitioning for voltage domains
• No logic at the top-level of the design
– Has logical partitioning for level shifters
• Implementer must replace with specific library cells or rely on implementation tools to add
– Has separate clocks and resets per voltage domain
ARM1176JZF-S IEM configuration overview
• RAM interface
– Clamps for dormant mode support
– Always synchronous
• IEM register slices
– Asynchronous for DVS
– Synchronous when Vsoc = Vcore
Additional IEM enabled components
• Level 2 Cache Controller
• Embedded Trace Macrocell
[Figure: Level 2 Cache Controller (L220) in the Vcore domain; the instruction and data IARS each have a Vcore half and a Vsoc half joined through VLS cells.]
VLS and standard cells placement and clock design
• Leverage ARM1136JF-S IC PSO design methodology
– Automatic placement (at domain edge)
– Non-integral multiple height rows
• 7, 9, 11-track cells, etc. in the same design
• Clock skew: 122ps (worst-case, global)
[Figure: layout showing VLS cells at the VCORE-VSOC boundary, PSO cells, and VLS cells at the VRAM-VSOC boundary.]
An effective power management solution
• Power Forward Initiative: Common Power Format (CPF)
– New method to capture design and constraint information
– Facilitates comprehensive methodology across design, verification, and implementation
– Enables automation and what-if exploration
– Collaboration/integration across design/supply chain
– Foundation for an integrated methodology
R. Goering, “EDA spec describes power,” EETimes, May 22, 2006
Design methodology with CPF
[Figure: design flow from spec to GDSII with CPF carried alongside the design (RTL+CPF, Gates+CPF). Stages shown: design creation (RTL coding, SDC constraint generation and validation), verification (simulation, formal analysis, acceleration and emulation, coverage, testbench automation), synthesis, design for test, SVP, chip integration and prototyping, physical implementation (physical synthesis, routing, DFT, ATPG, analysis, sign-off, LVS/DRC/Ext), with equivalence checking (EC) and iteration between stages.]
• Verify low power implementation
• MPD, MSV, DVFS
• Automatic partitioning of physical design
• Multiple supply voltage synthesis
• Level shifter and power gate insertion
• Automatic test scheduling and ATPG for power gating cells
• Automatic scan stitching for power domains
Summary
• Power optimization methodology
delivered ~40% overall and
46% leakage power reduction (ARM1136JF-S IC)
– Single-pass synthesis with concurrent optimization (timing,
power, area); multi-VDD, multi-VT designs
• ARM1136JF-S IC PSO implementation
– Normalized ~98.5% (66x) reduction of leakage power in the
low power region (typical conditions)
– Automated PSO implementation
– Structured ring methodology
• ARM1176JZF-S IC development
– Dynamic voltage and frequency scaling enhancement
methodology and implementation
– Power optimization methodology enhancements
• IEM; synthesis, test, formal verification, clocks, timing
closure, electrical/physical design; CPF
[Figure: chip layout showing the PSO region, the 0.8V and 1.0V VDD domains, and the VLS cells.]
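The summary quotes the PSO leakage saving both as a percentage (~98.5%) and as a factor (66x). The two forms are related by factor = 1 / (1 − reduction), which the small sketch below evaluates; the function name is illustrative.

```python
def reduction_factor(percent_reduction):
    """Convert a percentage reduction into an 'N times lower' factor."""
    remaining = 1.0 - percent_reduction / 100.0
    return 1.0 / remaining

factor = reduction_factor(98.5)
print(f"98.5% reduction is about {factor:.1f}x lower leakage")
```

The result is close to the quoted 66x; the small difference is consistent with the percentage having been rounded.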
Acknowledgments and references
• Acknowledgments
– We thank C. Chu, A. Gupta, J. Goodenough, A. Harry, C. Hopkins, L. Jensen, T. Valind, L.
Milano, A. Iyer, P. Mamtora, J. Willis, M. McAweeney, R. Williams,
I. Devereux and the ARM Physical IP team for their contributions
• References
– A. Khan et al., “A 90nm Power Optimization Methodology with Application to the ARM 1136JF-S Microprocessor,”
In IEEE Journal of Solid State Circuits, Vol. 41, No. 8, pp. 1707 – 1717, August 2006
– A. Khan et al., “A 90nm Power Optimization Methodology and its’ Application to the ARM 1136JF-S
Microprocessor,” Proceedings of the IEEE Custom Integrated Circuits Conference, San Jose, CA, September 21,
2005
– Gartner- WW ASIC/ASSP, FPGA/PLD and SLI/SOC App. Fcst., 1Q04
– B. Calhoun, “Ultra-Dynamic Voltage Scaling Using Sub-threshold Operation and Local Voltage Dithering in 90nm
CMOS,” ISSCC, 2/05
– S. Henzler, “Sleep Transistor Circuits for Fine-Grained Power Switch-Off with Short Power-Down Times,” ISSCC,
Feb. 05
– http://www.arm.com/pdfs/DUI0273B_core_tile_user_guide.pdf.
– A. Khan et al., “Design and Development of 130-nanometer ICs for a Multi-Gigabit Switching Network System,”
CICC, Oct. 04
– D. Desharnais, ”Nanometer IC routing requires new approaches,” EEDesign.com, Dec. 03
– A. Khan et al., “A 150 MHz Graphics Rendering Processor with 256Mb Embedded DRAM,“ ISSCC, Feb. 2001
– G. Paul, et al., “A Scalable 160Gb/s Switch Fabric Processor with 320Gb/s Memory Bandwidth,” ISSCC, Feb. 04
PSO in std cell based design
[Figure: RTL Model → Synthesis → Gate Netlist → Place & Route → Gate Netlist, with equivalence checking (EC) between stages.]
Logical netlist checks
• Level shifters: placement, location, connectivity
• Isolation cells: placement, isolation type, isolation function
• State retention cells: placement, retention function
• Miscellaneous: floating nets/pins
Physical netlist checks
• Level shifters: placement/location, power connectivity, level shifter function
• Isolation cells: placement/type, power connectivity, isolation function
• State retention cells: placement, power connectivity, retention function
• Miscellaneous: power switches, shorts between VDD/VSS

Chapter 10.pptx
Chapter 10.pptxChapter 10.pptx
Chapter 10.pptx
 
Track 2 session 6 - st dev con 2016 - wireless charging technologies
Track 2   session 6 - st dev con 2016 - wireless charging technologies Track 2   session 6 - st dev con 2016 - wireless charging technologies
Track 2 session 6 - st dev con 2016 - wireless charging technologies
 
FACTS Controller.
FACTS Controller.FACTS Controller.
FACTS Controller.
 
Sensors expo-2013-engineering-ultra-low-power-so c-sensors
Sensors expo-2013-engineering-ultra-low-power-so c-sensorsSensors expo-2013-engineering-ultra-low-power-so c-sensors
Sensors expo-2013-engineering-ultra-low-power-so c-sensors
 
Module 1 introduction
Module 1 introductionModule 1 introduction
Module 1 introduction
 
Module 1 introduction to Power Electronics
Module 1 introduction to Power ElectronicsModule 1 introduction to Power Electronics
Module 1 introduction to Power Electronics
 
Power Electronics and Switch Mode Power Supply
Power Electronics and Switch Mode Power SupplyPower Electronics and Switch Mode Power Supply
Power Electronics and Switch Mode Power Supply
 
Low power
Low powerLow power
Low power
 
VLSI Power Reduction
VLSI Power ReductionVLSI Power Reduction
VLSI Power Reduction
 
IEEE Blue Book
IEEE Blue BookIEEE Blue Book
IEEE Blue Book
 
Short-Circuit Protective Device Coordination & Arc Flash Analysis
Short-Circuit Protective Device Coordination & Arc Flash AnalysisShort-Circuit Protective Device Coordination & Arc Flash Analysis
Short-Circuit Protective Device Coordination & Arc Flash Analysis
 
SoC Power Reduction
SoC Power ReductionSoC Power Reduction
SoC Power Reduction
 
chapter_1 Intro. to electonic Devices.ppt
chapter_1 Intro. to electonic Devices.pptchapter_1 Intro. to electonic Devices.ppt
chapter_1 Intro. to electonic Devices.ppt
 
Power transformers rating
Power transformers ratingPower transformers rating
Power transformers rating
 
Next105 Ases Power Point Presentation Internal Final
Next105 Ases Power Point Presentation Internal FinalNext105 Ases Power Point Presentation Internal Final
Next105 Ases Power Point Presentation Internal Final
 
Selective Coodination
Selective CoodinationSelective Coodination
Selective Coodination
 
Short Circuit, Protective Device Coordination
Short Circuit, Protective Device CoordinationShort Circuit, Protective Device Coordination
Short Circuit, Protective Device Coordination
 
8891.ppt
8891.ppt8891.ppt
8891.ppt
 
Lect_01_Intro.ppt
Lect_01_Intro.pptLect_01_Intro.ppt
Lect_01_Intro.ppt
 
Logic families
Logic familiesLogic families
Logic families
 

Recently uploaded

Chapter 19_DDA_TOD Policy_First Draft 2012.pdf
Chapter 19_DDA_TOD Policy_First Draft 2012.pdfChapter 19_DDA_TOD Policy_First Draft 2012.pdf
Chapter 19_DDA_TOD Policy_First Draft 2012.pdfParomita Roy
 
CALL ON ➥8923113531 🔝Call Girls Aminabad Lucknow best Night Fun service
CALL ON ➥8923113531 🔝Call Girls Aminabad Lucknow best Night Fun serviceCALL ON ➥8923113531 🔝Call Girls Aminabad Lucknow best Night Fun service
CALL ON ➥8923113531 🔝Call Girls Aminabad Lucknow best Night Fun serviceanilsa9823
 
CBD Belapur Individual Call Girls In 08976425520 Panvel Only Genuine Call Girls
CBD Belapur Individual Call Girls In 08976425520 Panvel Only Genuine Call GirlsCBD Belapur Individual Call Girls In 08976425520 Panvel Only Genuine Call Girls
CBD Belapur Individual Call Girls In 08976425520 Panvel Only Genuine Call Girlsmodelanjalisharma4
 
Verified Trusted Call Girls Adugodi💘 9352852248 Good Looking standard Profil...
Verified Trusted Call Girls Adugodi💘 9352852248  Good Looking standard Profil...Verified Trusted Call Girls Adugodi💘 9352852248  Good Looking standard Profil...
Verified Trusted Call Girls Adugodi💘 9352852248 Good Looking standard Profil...kumaririma588
 
(AISHA) Ambegaon Khurd Call Girls Just Call 7001035870 [ Cash on Delivery ] P...
(AISHA) Ambegaon Khurd Call Girls Just Call 7001035870 [ Cash on Delivery ] P...(AISHA) Ambegaon Khurd Call Girls Just Call 7001035870 [ Cash on Delivery ] P...
(AISHA) Ambegaon Khurd Call Girls Just Call 7001035870 [ Cash on Delivery ] P...ranjana rawat
 
Kindergarten Assessment Questions Via LessonUp
Kindergarten Assessment Questions Via LessonUpKindergarten Assessment Questions Via LessonUp
Kindergarten Assessment Questions Via LessonUpmainac1
 
SD_The MATATAG Curriculum Training Design.pptx
SD_The MATATAG Curriculum Training Design.pptxSD_The MATATAG Curriculum Training Design.pptx
SD_The MATATAG Curriculum Training Design.pptxjanettecruzeiro1
 
Fashion trends before and after covid.pptx
Fashion trends before and after covid.pptxFashion trends before and after covid.pptx
Fashion trends before and after covid.pptxVanshNarang19
 
The history of music videos a level presentation
The history of music videos a level presentationThe history of music videos a level presentation
The history of music videos a level presentationamedia6
 
Best VIP Call Girls Noida Sector 44 Call Me: 8448380779
Best VIP Call Girls Noida Sector 44 Call Me: 8448380779Best VIP Call Girls Noida Sector 44 Call Me: 8448380779
Best VIP Call Girls Noida Sector 44 Call Me: 8448380779Delhi Call girls
 
VIP Russian Call Girls in Saharanpur Deepika 8250192130 Independent Escort Se...
VIP Russian Call Girls in Saharanpur Deepika 8250192130 Independent Escort Se...VIP Russian Call Girls in Saharanpur Deepika 8250192130 Independent Escort Se...
VIP Russian Call Girls in Saharanpur Deepika 8250192130 Independent Escort Se...Suhani Kapoor
 
AMBER GRAIN EMBROIDERY | Growing folklore elements | Root-based materials, w...
AMBER GRAIN EMBROIDERY | Growing folklore elements |  Root-based materials, w...AMBER GRAIN EMBROIDERY | Growing folklore elements |  Root-based materials, w...
AMBER GRAIN EMBROIDERY | Growing folklore elements | Root-based materials, w...BarusRa
 
Top Rated Pune Call Girls Koregaon Park ⟟ 6297143586 ⟟ Call Me For Genuine S...
Top Rated  Pune Call Girls Koregaon Park ⟟ 6297143586 ⟟ Call Me For Genuine S...Top Rated  Pune Call Girls Koregaon Park ⟟ 6297143586 ⟟ Call Me For Genuine S...
Top Rated Pune Call Girls Koregaon Park ⟟ 6297143586 ⟟ Call Me For Genuine S...Call Girls in Nagpur High Profile
 
Kala jadu for love marriage | Real amil baba | Famous amil baba | kala jadu n...
Kala jadu for love marriage | Real amil baba | Famous amil baba | kala jadu n...Kala jadu for love marriage | Real amil baba | Famous amil baba | kala jadu n...
Kala jadu for love marriage | Real amil baba | Famous amil baba | kala jadu n...babafaisel
 
Cheap Rate ➥8448380779 ▻Call Girls In Iffco Chowk Gurgaon
Cheap Rate ➥8448380779 ▻Call Girls In Iffco Chowk GurgaonCheap Rate ➥8448380779 ▻Call Girls In Iffco Chowk Gurgaon
Cheap Rate ➥8448380779 ▻Call Girls In Iffco Chowk GurgaonDelhi Call girls
 
VIP Call Girls Bhiwandi Ananya 8250192130 Independent Escort Service Bhiwandi
VIP Call Girls Bhiwandi Ananya 8250192130 Independent Escort Service BhiwandiVIP Call Girls Bhiwandi Ananya 8250192130 Independent Escort Service Bhiwandi
VIP Call Girls Bhiwandi Ananya 8250192130 Independent Escort Service BhiwandiSuhani Kapoor
 
Presentation.pptx about blender what is blender
Presentation.pptx about blender what is blenderPresentation.pptx about blender what is blender
Presentation.pptx about blender what is blenderUbaidurrehman997675
 
SCRIP Lua HTTP PROGRACMACION PLC WECON CA
SCRIP Lua HTTP PROGRACMACION PLC  WECON CASCRIP Lua HTTP PROGRACMACION PLC  WECON CA
SCRIP Lua HTTP PROGRACMACION PLC WECON CANestorGamez6
 

Recently uploaded (20)

Chapter 19_DDA_TOD Policy_First Draft 2012.pdf
Chapter 19_DDA_TOD Policy_First Draft 2012.pdfChapter 19_DDA_TOD Policy_First Draft 2012.pdf
Chapter 19_DDA_TOD Policy_First Draft 2012.pdf
 
CALL ON ➥8923113531 🔝Call Girls Aminabad Lucknow best Night Fun service
CALL ON ➥8923113531 🔝Call Girls Aminabad Lucknow best Night Fun serviceCALL ON ➥8923113531 🔝Call Girls Aminabad Lucknow best Night Fun service
CALL ON ➥8923113531 🔝Call Girls Aminabad Lucknow best Night Fun service
 
young call girls in Pandav nagar 🔝 9953056974 🔝 Delhi escort Service
young call girls in Pandav nagar 🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Pandav nagar 🔝 9953056974 🔝 Delhi escort Service
young call girls in Pandav nagar 🔝 9953056974 🔝 Delhi escort Service
 
CBD Belapur Individual Call Girls In 08976425520 Panvel Only Genuine Call Girls
CBD Belapur Individual Call Girls In 08976425520 Panvel Only Genuine Call GirlsCBD Belapur Individual Call Girls In 08976425520 Panvel Only Genuine Call Girls
CBD Belapur Individual Call Girls In 08976425520 Panvel Only Genuine Call Girls
 
Verified Trusted Call Girls Adugodi💘 9352852248 Good Looking standard Profil...
Verified Trusted Call Girls Adugodi💘 9352852248  Good Looking standard Profil...Verified Trusted Call Girls Adugodi💘 9352852248  Good Looking standard Profil...
Verified Trusted Call Girls Adugodi💘 9352852248 Good Looking standard Profil...
 
(AISHA) Ambegaon Khurd Call Girls Just Call 7001035870 [ Cash on Delivery ] P...
(AISHA) Ambegaon Khurd Call Girls Just Call 7001035870 [ Cash on Delivery ] P...(AISHA) Ambegaon Khurd Call Girls Just Call 7001035870 [ Cash on Delivery ] P...
(AISHA) Ambegaon Khurd Call Girls Just Call 7001035870 [ Cash on Delivery ] P...
 
Kindergarten Assessment Questions Via LessonUp
Kindergarten Assessment Questions Via LessonUpKindergarten Assessment Questions Via LessonUp
Kindergarten Assessment Questions Via LessonUp
 
escort service sasti (*~Call Girls in Prasad Nagar Metro❤️9953056974
escort service sasti (*~Call Girls in Prasad Nagar Metro❤️9953056974escort service sasti (*~Call Girls in Prasad Nagar Metro❤️9953056974
escort service sasti (*~Call Girls in Prasad Nagar Metro❤️9953056974
 
SD_The MATATAG Curriculum Training Design.pptx
SD_The MATATAG Curriculum Training Design.pptxSD_The MATATAG Curriculum Training Design.pptx
SD_The MATATAG Curriculum Training Design.pptx
 
Fashion trends before and after covid.pptx
Fashion trends before and after covid.pptxFashion trends before and after covid.pptx
Fashion trends before and after covid.pptx
 
The history of music videos a level presentation
The history of music videos a level presentationThe history of music videos a level presentation
The history of music videos a level presentation
 
Best VIP Call Girls Noida Sector 44 Call Me: 8448380779
Best VIP Call Girls Noida Sector 44 Call Me: 8448380779Best VIP Call Girls Noida Sector 44 Call Me: 8448380779
Best VIP Call Girls Noida Sector 44 Call Me: 8448380779
 
VIP Russian Call Girls in Saharanpur Deepika 8250192130 Independent Escort Se...
VIP Russian Call Girls in Saharanpur Deepika 8250192130 Independent Escort Se...VIP Russian Call Girls in Saharanpur Deepika 8250192130 Independent Escort Se...
VIP Russian Call Girls in Saharanpur Deepika 8250192130 Independent Escort Se...
 
AMBER GRAIN EMBROIDERY | Growing folklore elements | Root-based materials, w...
AMBER GRAIN EMBROIDERY | Growing folklore elements |  Root-based materials, w...AMBER GRAIN EMBROIDERY | Growing folklore elements |  Root-based materials, w...
AMBER GRAIN EMBROIDERY | Growing folklore elements | Root-based materials, w...
 
Top Rated Pune Call Girls Koregaon Park ⟟ 6297143586 ⟟ Call Me For Genuine S...
Top Rated  Pune Call Girls Koregaon Park ⟟ 6297143586 ⟟ Call Me For Genuine S...Top Rated  Pune Call Girls Koregaon Park ⟟ 6297143586 ⟟ Call Me For Genuine S...
Top Rated Pune Call Girls Koregaon Park ⟟ 6297143586 ⟟ Call Me For Genuine S...
 
Kala jadu for love marriage | Real amil baba | Famous amil baba | kala jadu n...
Kala jadu for love marriage | Real amil baba | Famous amil baba | kala jadu n...Kala jadu for love marriage | Real amil baba | Famous amil baba | kala jadu n...
Kala jadu for love marriage | Real amil baba | Famous amil baba | kala jadu n...
 
Cheap Rate ➥8448380779 ▻Call Girls In Iffco Chowk Gurgaon
Cheap Rate ➥8448380779 ▻Call Girls In Iffco Chowk GurgaonCheap Rate ➥8448380779 ▻Call Girls In Iffco Chowk Gurgaon
Cheap Rate ➥8448380779 ▻Call Girls In Iffco Chowk Gurgaon
 
VIP Call Girls Bhiwandi Ananya 8250192130 Independent Escort Service Bhiwandi
VIP Call Girls Bhiwandi Ananya 8250192130 Independent Escort Service BhiwandiVIP Call Girls Bhiwandi Ananya 8250192130 Independent Escort Service Bhiwandi
VIP Call Girls Bhiwandi Ananya 8250192130 Independent Escort Service Bhiwandi
 
Presentation.pptx about blender what is blender
Presentation.pptx about blender what is blenderPresentation.pptx about blender what is blender
Presentation.pptx about blender what is blender
 
SCRIP Lua HTTP PROGRACMACION PLC WECON CA
SCRIP Lua HTTP PROGRACMACION PLC  WECON CASCRIP Lua HTTP PROGRACMACION PLC  WECON CA
SCRIP Lua HTTP PROGRACMACION PLC WECON CA
 

lowpower consumption and details of dfferent power pdf

  • 1. I N VE N TI V E Low Power ICD Talks Vivek Shukla
  • 2. October 24, 2007 2 Topics Introduction Power Dissipation basic Existing Low Power Techniques and Issues for Advance LP Techniques (under exploration)
  • 3. October 24, 2007 3 Introduction 0.1 1 10 100 1000 1970 1980 1990 2000 2010 2020 Power (Watts) 1000's of Watts? 8080 8086 386 Pentium® proc Pentium® 4 proc Unconstrained power will reach 1,000’s of watts Unconstrained power will reach 1,000 Unconstrained power will reach 1,000’ ’s of watts s of watts
  • 4. October 24, 2007 4 Power Density will Get Even Worse Hot Plate Hot Plate Nuclear Reactor Nuclear Reactor Rocket Nozzle Rocket Nozzle Sun Sun’ ’s Surface s Surface 4004 4004 8008 8008 8080 8080 8085 8085 8086 8086 286 286 386 386 486 486 Pentium Pentium® ® processors processors 1 1 10 10 100 100 1,000 1,000 10,000 10,000 ’ ’70 70 ’ ’80 80 ’ ’90 90 ’ ’00 00 ’ ’10 10 Power Density Power Density (W/cm2) (W/cm2)
  • 5. October 24, 2007 5 Motivation • Portability – Extending battery life • Battery technologies scales-up slowly – 150Wh/kg today vs. 75Wh/kg in 1990 • 1 Kg Ni Cad battery could power 1 hrs for P4 can power Centrino for 4 Hour – Low power dissipation as a product feature in itself – Enabling portable devices to be more powerful and feature-rich • Packaging – High power dissipation leads to expensive packaging and cooling systems • ~ 1W: inexpensive plastic package limit • ~ 10W: Ceramic package limit • ~ 10W/cm2: limit for convection cooling • ~ 50W/cm2: limit for forced-air cooling • Reliability – High Product life time
  • 6. October 24, 2007 6 Sources of Power Dissipation in CMOS Power in a CMOS inverter is governed by the 3 part equation above • Dynamic (switching) power – Currently the largest part, but percentage getting smaller • Leakage Power – Subthreshold conduction – getting bigger due to aggressive scaling, temperature, etc. – Reverse leakage of diodes (relatively small) – Possible gate tunneling current in future technologies • Short-circuit (crowbar) current – Both pull-up and pull-down devices are partially conducting for a small, but finite amount of time – Can be modeled as some fraction of dynamic current Ptotal = CLVDD 2fclka01 + VDDIshort-circuit + VDDIleakage
  • 7. October 24, 2007 7 Sources of Power Dissipation: Switching • One half of the power from the supply is consumed in the pull-up network (PMOS) and the remaining half is stored in CL when Vout makes 10 transition • During 01 transition the charge stored in CL is dumped via the pull-down network (NMOS) • Power = (Energy/Transition)*(Transition Rate) = CLVDD 2 * f01 = CLVDD 2 * fclk* a01 = CswitchedVDD 2fclk where Cswitched = CL*a01 and a01 = probability of 01 transition • Dynamic power therefore can be reduced by – Scaling down the supply voltage VDD – Reducing the switching probability thru’ architectural means – Scaling down the frequency as per throughput demands – Optimizing/reducing the load capacitance (Device Scaling)
  • 8. October 24, 2007 8 Sources of Power Dissipation: Short-Circuit • Due to finite input transition time both NMOS and PMOS conduct for a small, but finite duration, thus providing a resistive path btw VDD and GND • Typically less than 10% of the total dynamic power • The short-circuit current Isc depends on the ratio of input to output transition times (higher the ratio, more is the duration for which both the devices are ON, higher the dissipation due to short-circuit current) • Can be minimized by balancing out the input and output rise times • Can be virtually eliminated by making VDD less than (VTN+|VTP|)
  • 9. October 24, 2007 9 Sources of Power Dissipation: Leakage I1 : pn junction reverse bias current I2 : Subthreshold conduction due to weak inversion I3 : Drain-induced barrier lowering (DIBL) I4 : Gate-induced drain leakage (GIDL) I5 : Punchthrough I6 : Narrow width effect I7 : Gate oxide tunneling I8 : Hot carrier injection (HCI) • Significant contributor to standby power • The most dominant one among these is the subthreshold leakage current (I2) due to constant lowering of VTH with scaling (see the exponential dependence over VTH and also see the sensitivity w/ temperature) • There are several techniques to contain this viz. using dual-VT, multi-VT libraries, using MTCMOS technology, using VTCMOS technologies, using Back-biasing etc.
  • 10. October 24, 2007 10 Medium-High High High Medium Low None None Synth, Formal Test Impact High High Medium- high Medium Low Low None Implement. impact Medium High High Low None None None Verification impact 10% 10% - 10X Substrate Biasing 10% 0% 40-70% 2-3X Dynamic and Adaptive Voltage Frequency Scaling (DVFS and AVS) 5-15% 4-8% ~0% 10-50X Power shut-off (PSO) 10% 0% 40-50% 2X Multi-supply voltage (MSV) 2% 0% 20% 0X Clock gating 2 to -2% 0% 0% 6X Multi-Vt optimization -10% 0% 10% 1.1X Area optimization Area penalty Timing penalty Dynamic power Leakage power Power reduction technique Low-Power Techniques Basic Advanced
  • 11. October 24, 2007 11 PSO in std cell based design Fine Grain Power Switches -Eg. Coarse Grain Power Switches Buffered µ µ µ µSwitch Un-buffered µ µ µ µSwitch Virtual Vss Real VSS Real VSS Virtual Vss A Z SLEEP A1 Z SLEEP A2 Real VSS Real VSS SLEEP SLEEP 5% 30% Power gate leakage Needs to be addressed No issue Simultaneous switching capacitance Always-on buffer by abutment Always-on buffer network Gate control slew rate Actual switching (5% area) Worst case switching (30% area) Power gate size Coarse grain Fine grain
  • 12. October 24, 2007 12 PSO in std cell based design (contd..) D: Active B: FF Vss Vdd VddC PD3 – Shut down FF SRP G FF Iso. iso_en shutoff PSE En_in Column Pitch (200um) Left Offset (150um) PD1 En_out_1 En_out_2 En_out_3 En_in Column Pitch (200um) Left Offset (150um) PD1 En_out_1 En_out_2 En_out_3 Switchable Power Domain (PD1) En_in En_out Note: switch cell has 2 buffers built-in with different directions Switchable Power Domain (PD1) En_in En_out Note: switch cell has 2 buffers built-in with different directions Filler Forms contiguous ring Prevents additional leakage Breaker Divides into separate gate control groups Used with feed-thru enable signal Corner Acts like corner cell in pad ring Buffer-only (no switch) / switch-only (no buffer) Allows flexible control of buffer tree
  • 13. October 24, 2007 13 PSO in std cell based design (contd..) • Ring – Ring(s) of switches enclose the power domain fully or partially – Switches placed outside the power domain – Switch cell treated as hard macro – Often used with hard macros (not allowed to touch inside) – More IRdrop – Better current distribution • Column – Columns of switches inside power domain – Switches placed in the standard cell rows – Switch is a standard cell – Often used inside hierarchical (soft) blocks – Lesser IRdrop – More prone to rush current issue – Needs careful EM checks
  • 14. October 24, 2007 14 • Key in PSO design apart of PSO insertion – Power up and rush current • Dynamic IRdrop becoming must • Optimum no’s of switch • Smooth power up – Verification • RTL simulations • Low power insertion checks • CPF verification – DFM • More power rails • Stacked via requirements • EM – Testability • Coverage on the logic on the Restore and Save signals – ESD – IRdrop aware timing analysis PSO in std cell based design (contd..)
  • 15. October 24, 2007 15 PSO in std cell based design (contd…) PDM2 PDM1 Good Missing OFF ON LH ISO 1.2V 0.8V iB PD ISO_EN PMM iA 1 0 X Structural/Rule Checking • User defines rules for crossings, isolation type, and location Conformal LP reports missing or redundant isolation/ level shifter cells Conformal LP reports wrong isolation cell type Conformal LP reports bad level shifter direction Conformal LP reports wrong isolation cell / level shifter domain location Low Power Insertion Checks
  • 16. October 24, 2007 16 PSO in std cell based design • Equivalence checking for Low Power design – Ensure low power optimizations do not introduce logical errors – Verify gated clocks, gated signals, de-cloning, and re-cloning of gated clocks – Check State Retention mapping from RTL to gate – Check corresponding presence of Isolation and level shifter during implementation Silicon Virtual Prototype Power Routing Low Power Clock Tree Synthesis Domain-aware Post-CTS Optimization IR-Aware Timing/SI Opt. Decap insertion Sign-off Switch cell Insertion (for MTCMOS) Placement including SRPG/Level shifters/Isol. cells Top-down Single-pass Synthesis Power Grid Synthesis Domain Aware NanoRoute Conformal Low Power • Power domain structural and functional checks – Ensure proper insertion of low power cells – Ensure proper connectivity of low power cells – Formally validate isolation function – Formally validate state retention function – Supports both logical and physical (power aware) netlists • Transistor Electrical Verification – Detect Sneak (leakage) paths across power domain boundaries Low Power Insertion Checks
  • 17. October 24, 2007 17 PSO in std cell based design (contd...) Test the Low Power Design, Reduce Power During Test • Insert the required Power- Aware test DFT • Test Access Mechanism (PTAM) • Power-aware scan chains • Encounter Test Model has test modes that reflects power modes • Power domains verified for isolation and scan integrity • ATPG can process each power mode • Low Power scan vectors reduce scan-shift power • Runtime MBIST scheduling reduces memory test power • Limited Pin testing reduces IO power switching • Level shifters are tested • Isolation logic stressed • Retention Flops Power Aware DFT Power-Aware Test Model PD1 PD2 PTAM Reduce Power during Test ATPG for Power Structures A B v1 v2 ISO Top PD1 Mem PD4 PD2 PMU Core PD3 SR Low Power Test
  • 18. October 24, 2007 18 PSO in std cell based design (cont…) Power Analysis • Power-gating – goals/tasks – Power-switch on – overall IR drop • Modelling the power-switch as on • Running IR drop on entire power-grid, both global and switched at once – Power-up • Simulate as the power-switch is turned on • Capture power-switch current behavior – IR drop effect on global grid and neighbors when block powering-up • Use captured current behavior from previous step and feed into rail analysis
  • 19. October 24, 2007 19 • Impact on IR drop and EM – Power switches modeled as resistors in power grid view • Solution flags if switches enter saturation (I/Idsat – PI) • Support steady-state on and off • Off-state – use leakage value of switch • Power Consumption of steady-state on and off – Power savings in different modes • Power-up analysis – Fastmos simulator used for power-up simulation (UltraSim) – Dynamic currents captured through power switches • Impact of power-up on global grid – Dynamic VSDG rail analysis uses captured currents from power-up analysis to show impact of power-up on surrounding logic PSO in std cell based design (cont…) Power Analysis
  • 20. October 24, 2007 20 How Many Power Switches? • Two-part approach 1. Steady state analysis – To monitor IR drop through switches – VoltageStorm analyzes for IR drop – VoltageStorm reports power switches operating in saturation 2. Dynamic analysis – To monitor control power ramp-up – VoltageStorm reports block “power-on” time • Too fast latch-up • Too slow limits performance
  • 21. October 24, 2007 21 Logic Circuit Netlist 1. Create circuit netlist Control VDD Circuit Netlist Inputs clamped Outputs correctly loaded 2. Simulate with UltraSim Load full-chip power RC network with PGVs and analyze VDD 4. Analyze top level grid in VSDG Block Power-up/Down Analysis and Global Grid Verification Capture Dynamic Current in PGV 3. Create Dynamic Power Grid Views Circuit Netlist
  • 22. October 24, 2007 22 MTCMOS Power-On • PowerMeter generates data to drive spice simulation using Ultrasim – Netlist sensitized to the virtual power domain • Use existing sub-circuit netlist • Generate sub-circuit netlist from .cl – Signal loading dspf (lumped C) – Voltage Source file – Template Stimulus file • DC voltages • PWL for control logic – derived from TWF file • QX can generate RC network of Virtual power net – Potential capacity limitations – Analyse to see Ton differences – Not used to date • QX generates RC network of control signals – Important to capture delay in controlling swithes • User simulates power-on conditions – Analyzes ramp-up time to steady-state • UltraSim also captures current behavior through power-transistors – Leverages existing UltraSim commands used within integration inside VST (.usim_ir) – Generates binary current data files (.pti) PowerMeter QX UltraSim RC grid Netlist Signal Loading Voltage Sources Template Stimulus Toplevel Circuit File Power-transistor dynamic currents (pti) Spice waveforms Results
  • 23. October 24, 2007 23 PSO for memories • Why Memory Shut-off – On-chip memory is increasing • Memory increase result in higher leakage – Activity factor for the large memories is less so less active power – Memories already have Higher L devices (lesser Sub-threshold leakage) – Below 65nm process, Junction leakage starts getting dominating factor Reduced standby/average power by power down is absolute necessary
  • 24. October 24, 2007 24 PSO for memories Memory Shut-off can be Selective shut-off Retention Memory Complete memory shut-off Memory Shut-off implemented at SOC level Tools are competent enough for implementation Key Challenges Performance Hit RTL functional verification Yield is an issue Testing is a big issue Support for the IRdrop aware timing models
  • 25. October 24, 2007 25 IO power shut-off • IO’s are to be grouped together based on architecture – Set of IO voltages can be shut-down • Issues – Board design and pad selection – ESD
  • 26. October 24, 2007 26 Dynamic Voltage Frequency Scaling Requires Multi- Mode Analysis • Multiple modes need to be analyzed/optimized for multiple corners – Setup analysis for (WC, 1,125C) corner 0.0V 1.08V 125MHz 0.0V Standby 1.08V 125MHz 1.08V 125 MHz Drowsy 0.9V 66MHz 1.08V 125 MHz Dull 1.08V 125MHz Slow 1.08V 125MHz Baseline Core Mode • Multiple constraints (.sdc) – Example: baseline.sdc, ios.sdc, dull.sdc, drowsy.sdc CORE DROWSY DULL • Libraries – stdcell_1.08sl.lib, stdcell_0.9sl.lib, stdcell_1.08fs.lib, stdcell_0.9fs.lib
  • 27. October 24, 2007 27 DVFS: Multi-Mode Multi-Corner Flow Create library set Define various RC corners Define constraint modes Create analysis views optDesign/ timeDesign The library set can be a single library or a collection of libraries (ECSM) Specify PVT condition for each corner. Specify spef for each corner Specify SDC file for each mode. Same SDC file may be used or specify 1 SDC file per domain Associate a corner with a mode; Design may have 5 corners and 3 modes, but only 10 views Run optimization and timing checks for concurrent handling of views Primary Concerns: 1. Timing Closure 2. Verification 3. Mixed Scenario for Power Saving (DVFS and PSO together)
  • 28. October 24, 2007 28 Pulsed Latch Design Methodology Traditional FF is replaced with a pulsed-latch Pulse generator is shared by several pulsed-latch Dummy clock delay cell is used to balance clock tree q t t t d t t cp q t t t t t t d t t cp pulse clock Pulsed latch Pulse Generator Traditional register Dummy delay Negative edge FF memory Advance Low Power Techniques
  • 29. October 24, 2007 29 Pulsed Latch: Results 25% active power reduction by swapping to pulsed latch 50% of active power is consumed by FF - cut half by pulsed latch Power consumption overhead : Slew control after pulse generator cell Slew need to be faster at pulse clock-tree Pulse generator cell insertion (addition) Required # of PG cell is controllable latency control : slow slew skew control : fast slew General clock-tree structure Pulse generator insertion point Clock-Tree image ~5% overhead Advance Low Power Techniques (Contd..)
  • 30. October 24, 2007 30 Advance Low Power Techniques (Contd..) 4.38 1.30 3.18 0.41 Conditional Sum 3.38 0.81 2.24 0.36 Carry Select 2.04 0.70 1.59 0.44 Carry Look Ahead 1.88 0.57 1.29 0.44 Variable Block Width Carry Skip 1.27 0.59 1.06 0.56 Constant Block Width Carry Skip 1 1 1 1 Ripple Carry Area PDP Power Delay Topology Delay, power, PDP and area of 16-bit adders normalized to the delay, the power, the PDP and the area, respectively, of the Ripple Carry Adder Source: T. Callaway and E. Swartzlander, ”The power consumption of CMOS adders and multipliers” 2.02 0.47 0.95 0.49 Modified Booth 1.93 0.43 0.74 0.58 Wallace Tree 1.43 0.59 0.87 0.68 Split Array 1 1 1 1 Array Area PDP Power Delay Topology Low Power Arithmetic Units: Delay, power, PDP and area of 16-bit multipliers normalized to the delay, the power, the PDP and the area, respectively, of the Array Multiplier
  • 31. October 24, 2007 31 Advance Low Power Techniques (Contd..)
  • 32. October 24, 2007 32 Advance Low Power Techniques (Contd..) • Double-edge triggered F/Fs (DETFF) can “ideally” save 50% of clock network power by reducing the clock frequency requirement to half • However stringent 50% duty-cycle constraint over clock and the area overhead of DETFF can significantly offset the amount of power saved • Slower than normal F/Fs due to increased internal and/or output node capacitance Clock for single-edge F/F with period T Clock for DTFF with period 2T and 50% duty-cycle Clock for DTFF with period 2T and 50% duty-cycle Clock for DTFF with period 2T and 50% duty-cycle Double-Edge Triggered F/Fs
  • 33. Advanced Low Power Techniques (contd.)
      Several other techniques are in use or under exploration:
      • Thermal throttling
      • Clock swing controls
      • Clock-on-demand
      • Dynamic threshold
      • Generic bus power reduction IPs
  • 36. Development goals
      • ARM 1136JF-S IC
        – Power optimization methodology leverageable to synthesized digital designs
        – Collaborative development: silicon design chain (Applied Materials, ARM, Cadence, TSMC)
      • ARM 1136JF-S IC PSO
        – Power switch-off (PSO) enhancement: methodology and implementation
      • ARM 1176JZF-S IC
        – PSO and dynamic voltage and frequency scaling (DVFS) enhancement: methodology and implementation
        – Facilitate comprehensive methodology across design, verification and implementation
          • Power Forward Initiative (Common Power Format, CPF)
          • ARM, AMD, ATI, Applied Materials, Cadence, Calypto, Freescale, Fujitsu, Golden Gate Technology, NEC Electronics, NXP, Sequence, TSMC
  • 37. ARM1136JF-S IC architecture
      • ARM1136JF-S microprocessor
        – 16k I+D cache, 16 kB TCM; Tag RAMs, TLBs
        – ARM, Thumb, DSP instructions; Java
      • ETM11 trace macro, ETB11 trace buffer
      • Advanced High-performance Bus (AHB)
        – Core AHB Lite ports; AHB I/F (pin access.)
        – Access to 128 KB on-chip test RAM: enables concurrent data transfers from any four ports
      • 300K standard cell instances; 22M transistors; 44 SRAMs
      • IC: 355 MHz typical (90nm standard CMOS: TSMC 90G)
      • Dual VDD domains, dual VT library
      • [Figure: floorplan with blocks Trace, Full AHB, Fetch, LSU; 1V VDD domain: ~100K cells + 44 SRAMs; 0.8V VDD domain: ~200K cells; ~3,400 voltage level-shifting cells]
  • 38. Design methodology overview (1): Timing, Power and Area Optimization
      • Microprocessor verification
        – Set microprocessor code, memory configurations
        – Verify RAM functionality in 90nm process
        – Verify microprocessor functionality (RTL)
          • Test cases (135K vectors)
          • Generated vector sets are subsequently used for power dissipation analysis
          • VCD and TCF formats
        – Fully verified RTL "golden reference" for regression tests / functional verification
      • ARM1136JF-S IC
        – VDD domain selection and voltage level shifting (VLS) cell design considerations
        – MSV RTL synthesis
        – Clock gating
        – Timing closure in multi-VDD designs
        – Dynamic/static IR drop analysis/optimization
        – System-level validation
  • 39. Design methodology overview (2): Timing, Power and Area Optimization
      • ARM1136JF-S IC PSO
        – PSO design, verification
        – Structured PSO ring methodology
        – VLS/isolation cell insertion in synthesis
        – Automated placement / insertion: VLS cells, switch cells, state retention registers
        – Automated power stitching
        – Automated multi-domain clocks
        – Power switch-off, switch-on voltage drop and transients analysis
      • ARM1176JZF-S IC
        – PSO management, verification
        – Integrate dynamic voltage and frequency scaling (DVFS) function
        – Physical synthesis / optimization and timing analysis (DVFS)
        – Functional integrity verification and test insertion with power-optimization features
          • Vsoc, Vram 1.0V libraries; Vcore 0.8V libraries; ~800 test cases
  • 40. Multiple Supply Voltage (MSV) RTL synthesis
      • Multiple supply voltage synthesis
        – Newly-developed technology
        – Single-pass concurrent optimization for timing, area and power
        – 0.8V and 1.0V VDD domains, dual-VT cell libraries
      • Power optimization in synthesis
        – Logic restructuring
        – Logic resizing (before clock tree synthesis)
          • Buffer removal/resizing
        – Transition rate buffering (buffer slow-transition nets)
          • Minimize the duration in which both pFET and nFET conduct simultaneously
        – Pin swapping
          • Apply high transition rate signal nets to low capacitance inputs
      • ARM1136JF-S IC cells: 62% in the 0.8V domain, 38% in the 1.0V domain
      • [Figures: pin swapping (CACC); buffer introduced to reduce slew]
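Pin swapping can be sketched as a small assignment problem: among logically equivalent inputs of a gate, route the most active signal to the lowest-capacitance pin. The following is a minimal illustration of that idea, not the synthesis tool's algorithm; the pin names, capacitances and activity values are hypothetical.

```python
def swap_pins(signal_activity: dict, pin_capacitance: dict) -> dict:
    """Map each pin to a signal so that the most active signals drive the
    lowest-capacitance pins. Assumes all pins are logically interchangeable."""
    signals = sorted(signal_activity, key=signal_activity.get, reverse=True)
    pins = sorted(pin_capacitance, key=pin_capacitance.get)
    return dict(zip(pins, signals))


def switched_capacitance(assignment: dict, signal_activity: dict,
                         pin_capacitance: dict) -> float:
    """Activity-weighted input capacitance, proportional to switching power."""
    return sum(signal_activity[sig] * pin_capacitance[pin]
               for pin, sig in assignment.items())


# Hypothetical 3-input gate: input capacitance in fF, activity in toggles/cycle.
pin_cap = {"A": 2.0, "B": 1.5, "C": 1.0}
activity = {"x": 0.1, "y": 0.5, "z": 0.3}

original = {"A": "y", "B": "z", "C": "x"}
swapped = swap_pins(activity, pin_cap)

print(swapped)
print(switched_capacitance(original, activity, pin_cap),
      switched_capacitance(swapped, activity, pin_cap))
```

A real flow would also check that the swap does not violate timing, since pins of the same gate usually differ in delay as well as capacitance.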
  • 41. VDD domains, clock gating
      • 0.8V, 1.0V VDD domains
        – Analyze standard cell delay, leakage, standby and dynamic power (2.5x delta)
        – Adequate performance for timing-critical nets
        – Customization: further improvements feasible
        – [Figure: cell delays (normalized)]
      • Architectural clock gating included in uP RTL
      • Automated design flow adds additional clock gating
        – Inferred from RTL through low-power synthesis
        – ~1,000 clock-gated cells identified and managed; 85% of registers gated
        – Shuts off dynamic current in quiescent logic
      • Clock decloning: reduced from 1,112 to 703 cells (1136 IC)
        – Move clock gating to the highest hierarchical node of the logic tree to reduce power and insertion delay
  • 42. MSV electrical/timing closure (1)
      • Automated VLS insertion
        – For nets traversing VDD domains
        – Align cells to avoid n-well spacing violations (domain perimeter placements)
        – Automated multi-VDD power distribution and cell placements, antenna diode insertion
        – ARM1176JZF-S IC: automated in synthesis
      • VLS placement directly affects electrical performance
        – Optimal or detoured routing
        – Power-supply-aware timing and multi-VDD supply constraints drive placement
        – ARM1136JF-S IC: netlist modified to insert VLS cells where needed
        – ARM1136JF-S IC PSO, ARM1176JZF-S IC: automated VLS cell insertion, placement, timing
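The first step of automated VLS insertion, finding nets that traverse VDD domains, can be sketched as a scan over driver and load domains. This is an illustrative model only: the `Net` structure, the domain names and the function name are hypothetical and do not reflect any tool's netlist API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Net:
    name: str
    driver_domain: str
    load_domains: tuple  # domains of the cells this net drives


def nets_needing_level_shifters(nets, domain_voltage):
    """Return (net, from_domain, to_domain) for every crossing between
    domains at different supply voltages."""
    crossings = []
    for net in nets:
        for load_domain in sorted(set(net.load_domains)):
            if domain_voltage[load_domain] != domain_voltage[net.driver_domain]:
                crossings.append((net.name, net.driver_domain, load_domain))
    return crossings


# Hypothetical two-domain design using the slide's 0.8V and 1.0V supplies.
voltages = {"core": 0.8, "soc": 1.0}
netlist = [
    Net("n1", "core", ("core",)),        # stays inside one domain
    Net("n2", "core", ("soc", "core")),  # low-to-high crossing
    Net("n3", "soc", ("core",)),         # high-to-low crossing
]
crossings = nets_needing_level_shifters(netlist, voltages)
print(crossings)
```

Placement of the inserted cells at the domain perimeter, which the slide stresses, is a separate physical step not modeled here.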
  • 43. MSV electrical/timing closure (2)
      • Cell substitution with timing constraint
        – Replace standard-VT with high-VT cells
          • Net-by-net basis; same footprint as original cell
      • Signal integrity addressed within place and route (PR)
        – ~10 of 500K nets required post-layout optimization
      • Effective current source model (ECSM): instance-specific multiple-VDD delay calculation
        – Standard cell libraries characterized for multiple VDD values at outset
        – Numerical model: 2% deviation vs. full circuit simulation
      • [Figure: distribution (%) vs. length, Y/X ratio; SPICE vs. ECSM]
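Cell substitution under a timing constraint can be sketched as a filter: swap a standard-VT cell for its high-VT equivalent only when the remaining slack stays non-negative. The sketch below makes a strong simplifying assumption, namely that each cell's slack is independent of other swaps on the same path, which a real timing-driven flow would not make; all names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    name: str
    slack_ps: float              # timing slack on the cell's worst path
    hvt_delay_penalty_ps: float  # extra delay if swapped to high-VT
    svt_leakage_nw: float        # leakage of the standard-VT version
    hvt_leakage_nw: float        # leakage of the high-VT version


def substitute_high_vt(cells):
    """Return the names of cells swapped to high-VT and the leakage saved (nW).

    A cell is swapped only if its slack absorbs the high-VT delay penalty.
    """
    swapped, saved_nw = [], 0.0
    for cell in cells:
        if cell.slack_ps - cell.hvt_delay_penalty_ps >= 0:
            swapped.append(cell.name)
            saved_nw += cell.svt_leakage_nw - cell.hvt_leakage_nw
    return swapped, saved_nw


cells = [
    Cell("u1", slack_ps=50, hvt_delay_penalty_ps=20, svt_leakage_nw=10, hvt_leakage_nw=2),
    Cell("u2", slack_ps=5, hvt_delay_penalty_ps=20, svt_leakage_nw=8, hvt_leakage_nw=1.5),
    Cell("u3", slack_ps=20, hvt_delay_penalty_ps=20, svt_leakage_nw=6, hvt_leakage_nw=1),
]
swapped, saved_nw = substitute_high_vt(cells)
print(swapped, saved_nw)
```

Because the slide notes that the replacement cell has the same footprint as the original, a swap of this kind does not disturb placement.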
  • 44. IR drop analysis and optimization
      • Grid-specific resistor meshes
      • Dynamic power (manage di/dt)
      • Dynamic IR drop analysis: 22 mV (1.0V VDD), 19 mV (0.8V VDD)
      • [Figure: dynamic IR drop maps; labels: 1.0V VDD, 0.8V VDD, VSS]
  • 45. ARM1136JF-S IC validation
      • ARM RealView® Validation System (instrumented system)
      • Run applications, measure performance
        – ~15,000 system-level validation tests
        – Operating systems: Linux (2.4.7, 2.4.19, v6 backport and 2.6.x), WinCE .NET 4.2 and Symbian OS7
        – Applications: X-windows, Doom, Pocket Word, Pocket Explorer, etc.
      • ~40% overall and 46% leakage power reduction

      Dynamic power dissipation (mW/MHz):

        IC Block   Sim. Baseline (90nm)   Sim. LP (90nm)   Meas. LP (90nm)   Meas. Power (130 nm; ARM)
        Core       0.28                   0.14             0.10              0.60
        Other      0.36                   0.32             0.21
        Total      0.64                   0.46             0.31

      [Figure: normalized power dissipation (%), Std. Power vs. Low Power; series: Leakage (Total), Switching (Total), Leakage (Logic), Switching (Logic)]
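The mW/MHz figures can be turned into relative reductions and into absolute dynamic power at a given clock. The sketch below uses the table values and, as an assumption, the 355 MHz typical frequency quoted on the architecture slide; function names are illustrative. These dynamic-only ratios are not expected to reproduce the slide's ~40% overall figure, which presumably also reflects leakage.

```python
# Dynamic power dissipation in mW/MHz, taken from the slide's table.
DYNAMIC_MW_PER_MHZ = {
    "Core": {"sim_baseline": 0.28, "sim_lp": 0.14, "meas_lp": 0.10},
    "Other": {"sim_baseline": 0.36, "sim_lp": 0.32, "meas_lp": 0.21},
    "Total": {"sim_baseline": 0.64, "sim_lp": 0.46, "meas_lp": 0.31},
}


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional reduction from baseline to improved."""
    return (baseline - improved) / baseline


def dynamic_power_mw(mw_per_mhz: float, freq_mhz: float) -> float:
    """Absolute dynamic power at a given clock frequency."""
    return mw_per_mhz * freq_mhz


for block, row in DYNAMIC_MW_PER_MHZ.items():
    reduction = relative_reduction(row["sim_baseline"], row["sim_lp"])
    print(f"{block}: simulated dynamic power reduction {reduction:.1%}")

# Assumes the IC runs at its 355 MHz typical frequency.
total_mw = dynamic_power_mw(DYNAMIC_MW_PER_MHZ["Total"]["meas_lp"], 355)
print(f"Measured total dynamic power at 355 MHz: {total_mw:.1f} mW")
```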
  • 46. ARM1136JF-S IC PSO design
      • Automated PSO implementation
        – PSO design, functional verification (VLS cells)
        – Power, clock distribution
        – Static and dynamic power analysis
      • Structured ring methodology
        – Filler, breaker, corner, switch- or buffer-only cells
      • [Figure: switches and fillers forming the ring around a switchable power domain with an internal power mesh; each switch cell has 2 built-in buffers with different directions (En_in, En_out); PSO switched blocks in the 1.0V PSO domain]
  • 47. VLS cells integration
      • Multiple-height VLS/isolation cells
      • Automatic placement (at domain edge)
      • Automatic power/enable connection
      • [Figure: level shifter/isolation cell placement; 3-row-high isolation cell beside standard cells; 0.8V supply connects to M4; 1V VDD and VSS rails]
  • 48. MSV optimization
      • Cross-domain timing optimization
        – Automatically handle conditions shown
      • Domain-aware clock tree synthesis
        – Automatically handle multi-domain clocks
      • Automatic insertion of state retention registers
        – RTL synthesis, implementation, verification
        – Capability not implemented in this work
      • [Figure: paths between power domains (0.8V, libraries A; 1.0V, libraries B), 0.8V I/O, and don't-touch nets]
      • [Figure: shutdown block with FFs and SRPG FFs; VDD (switched), VDDC (not switched), VSS, VSSC (not switched); PG and RET signals]
  • 49. ARM1176JZF-S IC architecture
      • ARM1176JZF-S microprocessor
        – 16k I+D cache, 16 kB TCM; Tag RAMs, TLBs
        – ARM, Thumb, DSP instructions; Java, IEM
      • ARM1176JZF-S IC
        – ETM11 trace macro, ETB11 trace buffer
        – AHB bus I/F through AXI to AHB bridge
      • 360K standard cell instances; 22M transistors; 46 SRAMs
      • IC: 340MHz typical (90nm standard CMOS: TSMC 90G)
      • 3 power domains defined
      • Dual VDD domains, dual VT library
      • [Figure: ARM 1176_IC block diagram with Vsoc 1.0V, Vcore 0.8V and Vram 1.0V domains joined by voltage level-shifting cells. Labeled blocks: ARM1176Main, ARM1176JZFSImp, TestChip RAM, AXI, AXI to AHB Bridge, VIC, 1176 MBIST, JTAG/TAP, Boundary Scan Test Logic, PLL, Clock, Reset, Validation Coprocessor, Dormant Mode Sequencer, TPIU, ETBM11CS, ETM11 CS, ETB11, ETB11 MBIST, ETB11 RAM, Cache and TCM RAMs, Vram VLS/Clamps. Interfaces: clocks and resets, debug interface, TAP I/F, Trace I/F, CP14 I/F, Peripheral AXI, DMA AXI, Data AXI, Instruction AXI. IARS: IEM Asynchronous Register Slices]
  • 50. Intelligent energy manager (IEM): ARM1176JZF-S RTL structure
      • ARM1176 IEM: ease of implementation in present design methodologies
        – Asynchronous between voltage domains at different voltages, frequencies
          • IEM Asynchronous AXI Register Slices required
        – Has logical partitioning for voltage domains
          • No logic at the top level of the design
        – Has logical partitioning for level shifters
          • Implementer must replace with specific library cells or rely on implementation tools to add them
        – Has separate clocks and resets per voltage domain
  • 51. ARM1176JZF-S IEM configuration overview
      • RAM interface
        – Clamps for dormant mode support
        – Always synchronous
      • IEM register slices
        – Asynchronous for DVS
        – Synchronous when Vsoc = Vcore
  • 52. Additional IEM enabled components
      • Level 2 Cache Controller
      • Embedded Trace Macrocell
      • [Figure: Level 2 Cache Controller L220 (Vcore); Instr. IARS Vcore, VLS, Instr. IARS Vsoc; Data IARS Vcore, VLS, Data IARS Vsoc]
  • 53. VLS and standard cells placement and clock design
      • Leverage ARM1136JF-S IC PSO design methodology
        – Automatic placement (at domain edge)
        – Non-integral multiple height rows
          • 7, 9, 11-track cells, etc. in the same design
      • Clock skew: 122ps (worst-case, global)
      • [Figure: layout showing VLS cells (VCORE-VSOC), PSO cells, and VLS cells (VRAM-VSOC)]
  • 54. An effective power management solution
      • Power Forward Initiative: Common Power Format (CPF)
        – New method to capture design and constraint information
        – Facilitates comprehensive methodology across design, verification, and implementation
        – Enables automation and what-if exploration
        – Collaboration/integration across design/supply chain
        – Foundation for an integrated methodology
      R. Goering, "EDA spec describes power," EETimes, May 22, 2006
  • 55. Design methodology with CPF
      • [Figure: CPF-based design flow. Design creation: Spec, CPF, RTL coding, RTL+CPF coding. Verification: simulation, emulation, acceleration, formal analysis, coverage, testbench automation. Synthesis: RTL+CPF to Gates+CPF, SDC constraint generation and validation, design for test, equivalence checking (EC), SVP. Physical implementation: chip integration, prototyping, physical synthesis, routing, DFT, ATPG, analysis, sign-off, LVS/DRC/Ext, GDSII. Iteration loops at RTL and gate level]
      • Verify low power implementation: MPD, MSV, DVFS
      • Automatic partitioning of physical design
      • Multiple supply voltage synthesis
      • Level shifter and power gate insertion
      • Automatic test scheduling; ATPG for power gating cells
      • Automatic scan stitching for power domains
  • 56. Summary
      • Power optimization methodology delivered ~40% overall and 46% leakage power reduction (ARM1136JF-S IC)
        – Single-pass synthesis with concurrent optimization (timing, power, area); multi-VDD, multi-VT designs
      • ARM1136JF-S IC PSO implementation
        – Normalized ~98.5% (66x) reduction of leakage power in the low power region (typical conditions)
        – Automated PSO implementation
        – Structured ring methodology
      • ARM1176JZF-S IC development
        – Dynamic voltage and frequency scaling enhancement: methodology and implementation
        – Power optimization methodology enhancements
          • IEM; synthesis, test, formal verification, clocks, timing closure, electrical/physical design; CPF
      • [Figure: PSO, VDD 0.8V and VDD 1.0V domains with VLS]
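The two ways the PSO leakage result is quoted, a percentage and a multiplicative factor, are related by percent = (1 − 1/factor) × 100. A short sketch of the conversion (function names are illustrative) shows that a 66x reduction corresponds to roughly 98.5%:

```python
def reduction_percent(factor: float) -> float:
    """Percentage reduction corresponding to an N-fold reduction."""
    return (1.0 - 1.0 / factor) * 100.0


def reduction_factor(percent: float) -> float:
    """N-fold reduction corresponding to a percentage reduction."""
    return 1.0 / (1.0 - percent / 100.0)


print(f"66x reduction = {reduction_percent(66):.1f}%")
print(f"98.5% reduction = {reduction_factor(98.5):.1f}x")
```

The second conversion gives about 66.7x, so the slide's two figures agree to within rounding.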
  • 57. Acknowledgments and references
      • Acknowledgments
        – We thank C. Chu, A. Gupta, J. Goodenough, A. Harry, C. Hopkins, L. Jensen, T. Valind, L. Milano, A. Iyer, P. Mamtora, J. Willis, M. McAweeney, R. Williams, I. Devereux and the ARM Physical IP team for their contributions
      • References
        – A. Khan et al., "A 90nm Power Optimization Methodology with Application to the ARM 1136JF-S Microprocessor," IEEE Journal of Solid State Circuits, Vol. 41, No. 8, pp. 1707–1717, August 2006
        – A. Khan et al., "A 90nm Power Optimization Methodology and its Application to the ARM 1136JF-S Microprocessor," Proceedings of the IEEE Custom Integrated Circuits Conference, San Jose, CA, September 21, 2005
        – Gartner, WW ASIC/ASSP, FPGA/PLD and SLI/SOC App. Fcst., 1Q04
        – B. Calhoun, "Ultra-Dynamic Voltage Scaling Using Sub-threshold Operation and Local Voltage Dithering in 90nm CMOS," ISSCC, Feb. 2005
        – S. Henzler, "Sleep Transistor Circuits for Fine-Grained Power Switch-Off with Short Power-Down Times," ISSCC, Feb. 2005
        – http://www.arm.com/pdfs/DUI0273B_core_tile_user_guide.pdf
        – A. Khan et al., "Design and Development of 130-nanometer ICs for a Multi-Gigabit Switching Network System," CICC, Oct. 2004
        – D. Desharnais, "Nanometer IC routing requires new approaches," EEDesign.com, Dec. 2003
        – A. Khan et al., "A 150 MHz Graphics Rendering Processor with 256Mb Embedded DRAM," ISSCC, Feb. 2001
        – G. Paul et al., "A Scalable 160Gb/s Switch Fabric Processor with 320Gb/s Memory Bandwidth," ISSCC, Feb. 2004
  • 58. PSO in std cell based design
      • [Figure: flow from RTL Model through Synthesis to Gate Netlist, then Place/Route to Gate Netlist, with equivalence checking (EC) at each netlist stage]
      • Logical netlist
        – Level shifters: placement, location, connectivity
        – Isolation cells: placement, isolation type, isolation function
        – State retention cells: placement, retention function
        – Miscellaneous: floating nets / pins
      • Physical netlist
        – Level shifters: placement/location, power connectivity, level shifter function
        – Isolation cells: placement/type, power connectivity, isolation function
        – State retention cells: placement, power connectivity, retention function
        – Miscellaneous: power switches, shorts between VDD/VSS