5378086.ppt

Low Power Design
of Integrated Systems
Assoc. Prof. Dimitrios Soudris
dsoudris@ee.duth.gr

Technology Directions: SIA Roadmap
Year                  1999   2002   2005   2008   2011    2014
Feature size (nm)     180    130    100    70     50      35
Logic trans/cm²       6.2M   18M    39M    84M    180M    390M
Cost/trans (mc)       1.735  0.580  0.255  0.110  0.049   0.022
#pads/chip            1867   2553   3492   4776   6532    8935
Clock (MHz)           1250   2100   3500   6000   10000   16900
Chip size (mm²)       340    430    520    620    750     900
Wiring levels         6-7    7      7-8    8-9    9       10
Power supply (V)      1.8    1.5    1.2    0.9    0.6     0.5
High-perf power (W)   90     130    160    170    175     183
Battery power (W)     1.4    2      2.4    2.8    3.2     3.7

Technology Process Evolution
Technology Directions: SIA Roadmap 2002

Power Consumption

Power Terminology
• Power is the rate at which energy is delivered
or exchanged
» electrical energy is converted to heat energy
during operation
• Power dissipation: the rate at which energy is taken from the source (Vdd) and converted into heat

Why Lower Power?
• Large market of portable devices
– e.g. laptops, mobile phones
• Enables larger transistor integration
– Pentium IV contains 42 million transistors
– Teraflops chip contains 1.9 billion transistors
• Need for “green” computers
– 10% of total electrical energy is consumed by PCs

Battery Technology Improvements

The Industry’s Reaction
• Reduce chip capacitance through process scaling
==> Expensive
• Reduce voltage levels from 5 V → 3.3 V → 2 V
==> The industry is slow to move (microprocessors, memory, ...)
• Better circuit techniques
==> Gated clocks, power-down of non-operational units, ...
• Example: IBM 80 MHz PowerPC RISC (3 W @ 3.3 V)
– Power management logic determines activity on a per-cycle basis
– Clocks of idle blocks are turned off → 12-30% savings
– Doze, Nap and Sleep modes (5 mW)

Example: Intel Pentium-II processor
• Pentium-1: 15 Watt (5V - 66MHz)
• Pentium-2: 8 Watt (3.3V- 133 MHz)

Where Does Power Go in CMOS?
• The power consumption in digital CMOS circuits is
$P_{avg} = P_{dynamic} + P_{short\text{-}circuit} + P_{leakage}$
• Dynamic power consumption: charging and discharging capacitors
• Short-circuit currents: short-circuit path between the supply rails during switching
• Leakage (static): leaking diodes and transistors

Present & Future in Power Consumption

Dynamic Power Consumption (1)
$P_{dynamic} = C_L \cdot V_{dd}^2 \cdot N \cdot f$
• where Vdd is the supply voltage, CL the load capacitance, N the average number of transitions per clock cycle, and f the operating frequency
[Figure: CMOS inverter (input IN, output OUT) driving load capacitance CL from Vdd: (a) the circuit, (b) charging current, (c) discharging current]
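As a quick numeric check of the formula, the Python sketch below evaluates P_dynamic directly. The function name and the example values (10 pF switched capacitance, 3.3 V, N = 0.5, 100 MHz) are hypothetical, chosen only to show the units and the arithmetic.

```python
def dynamic_power(c_load: float, vdd: float, n: float, f: float) -> float:
    """Dynamic power P = C_L * Vdd^2 * N * f.

    Units: c_load in farads, vdd in volts, n in transitions per clock cycle,
    f in hertz; the result is in watts.
    """
    return c_load * vdd ** 2 * n * f

# Hypothetical example: 10 pF switched load, 3.3 V supply,
# 0.5 transitions per cycle, 100 MHz clock.
p = dynamic_power(10e-12, 3.3, 0.5, 100e6)
print(f"P_dynamic = {p * 1e3:.2f} mW")
```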

Dynamic Power Consumption (2)
• For technologies down to 0.35 μm, dynamic consumption is about 80% of the total consumption
• Goal ===> reduce dynamic power consumption by
– reducing capacitance
– reducing the supply voltage
– reducing the frequency
– reducing switching activity
– or a combination of the above factors
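The reduction strategies can be compared by scaling one factor at a time. In this minimal sketch (the 20% reduction is an arbitrary illustrative choice, and the helper name is hypothetical) only the supply voltage yields more than a proportional saving, because it enters the formula squared.

```python
def power_ratio(c: float = 1.0, v: float = 1.0, f: float = 1.0, n: float = 1.0) -> float:
    """P_new / P_old for P = C_L * Vdd^2 * N * f when each factor is scaled."""
    return c * v ** 2 * f * n

scale = 0.8  # reduce one factor by 20% (illustrative)
options = [
    ("capacitance", {"c": scale}),
    ("supply voltage", {"v": scale}),
    ("frequency", {"f": scale}),
    ("switching activity", {"n": scale}),
]
for name, kwargs in options:
    # Remaining fraction of the original dynamic power
    print(f"{name:18s}: {power_ratio(**kwargs):.2f} of original power")
```

Reducing f alone lowers power but not the energy per operation, since each transition still draws the same C_L·Vdd² from the supply; lowering Vdd reduces both.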

Leakage current consumption
• Reverse-bias diode leakage at the transistor drains, and
• Sub-threshold current through a turned-off transistor channel
[Figure: cross-section of a pMOS transistor (gate, p+ diffusions in an n-type substrate tied to Vdd) showing leakage current through the reverse-biased drain-substrate diode. Caption: The leakage of a reverse-biased pMOS transistor.]
[Figure: log ID vs. VGS (volts), showing the subthreshold and saturated regions for decreasing VDS (Vdd). Caption: Subthreshold leakage with respect to gate-source voltage.]
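The exponential dependence of subthreshold current on VGS is often modeled as I = I0 · exp((VGS − Vt)/(n·kT/q)) · (1 − exp(−VDS/(kT/q))). That model is a standard textbook approximation, not taken from these slides, and the values of I0 and n below are illustrative assumptions. The sketch shows that lowering Vt by one subthreshold swing (about 89 mV/decade for n = 1.5 at 300 K) raises off-state leakage tenfold.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C

def subthreshold_current(vgs: float, vds: float, vt: float,
                         i0: float = 1e-7, n: float = 1.5, temp_k: float = 300.0) -> float:
    """Assumed textbook model (i0 and n are illustrative, not from the slides):
    I = i0 * exp((Vgs - Vt) / (n*kT/q)) * (1 - exp(-Vds / (kT/q)))."""
    v_thermal = K_B * temp_k / Q_E
    return i0 * math.exp((vgs - vt) / (n * v_thermal)) * (1.0 - math.exp(-vds / v_thermal))

def subthreshold_swing(n: float = 1.5, temp_k: float = 300.0) -> float:
    """Gate-voltage change (V) per decade of subthreshold current."""
    return n * K_B * temp_k / Q_E * math.log(10)

s = subthreshold_swing()
# Off-state (Vgs = 0) leakage for a hypothetical Vt = 0.4 V and for Vt lowered by one swing
off_high_vt = subthreshold_current(vgs=0.0, vds=1.0, vt=0.4)
off_low_vt = subthreshold_current(vgs=0.0, vds=1.0, vt=0.4 - s)
print(f"swing = {s * 1e3:.1f} mV/decade, leakage ratio = {off_low_vt / off_high_vt:.1f}")
```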

The Design Flow
[Figure (a): design flow from System Specifications through System-Level Design, Architecture-Level Design and Logic-Level Design to Circuit-Level Design / Layout Synthesis]
[Figure (b): the same flow with an analysis/estimation step at the system, architecture, logic and circuit levels, supported by power models for system-level components; for macrocells and control logic; and for gates and cells]

Power savings in terms of the design level
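[Figure: design levels from highest to lowest (system, behavior, RT, logic, transistor, layout), with power savings increasing toward the higher levels: 10-20 x at the top, 2-5 x in the middle, and 20-50% at the lowest levels]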

Lower Vdd Increases Delay
$T_d = \frac{C_L \cdot V_{dd}}{I}, \qquad I \sim (V_{dd} - V_t)^2$
$\frac{T_d(V_{dd}=2)}{T_d(V_{dd}=5)} = \frac{(2) \cdot (5 - 0.7)^2}{(5) \cdot (2 - 0.7)^2} \approx 4$ (with Vt = 0.7 V)
Relatively independent of logic function and style.
[Figure: normalized delay vs. Vdd (volts), 2.0 μm technology, for an adder (SPICE), a microcoded DSP chip, a multiplier, an adder, a ring oscillator and a clock generator]
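The delay ratio above can be checked numerically. The sketch assumes the same simplified square-law model and Vt = 0.7 V as the slide; C_L cancels in the ratio, so it is omitted. The model gives about 4.4, which the slide rounds to 4.

```python
def relative_delay(vdd: float, vt: float = 0.7) -> float:
    """Delay up to a constant factor: Td ~ C_L*Vdd / I with I ~ (Vdd - Vt)^2."""
    if vdd <= vt:
        raise ValueError("model only valid for Vdd > Vt")
    return vdd / (vdd - vt) ** 2

ratio = relative_delay(2.0) / relative_delay(5.0)
print(f"Td(2 V) / Td(5 V) = {ratio:.2f}")
```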

Reducing Vdd
$P \times t_d = E_t = C_L \cdot V_{dd}^2$
$\frac{E(V_{dd}=2)}{E(V_{dd}=5)} = \frac{C_L \cdot (2)^2}{C_L \cdot (5)^2} = 0.16$, i.e. $E(V_{dd}=2) \approx 0.16\,E(V_{dd}=5)$
Strong function of voltage (V² dependence).
Relatively independent of logic function and style.
[Figure: normalized power-delay product vs. Vdd (volts) for a 51-stage ring oscillator and an 8-bit adder, showing a quadratic dependence]
The power-delay product improves as Vdd is lowered.
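Combining the two simplified models on these slides, E ∝ Vdd² and Td ∝ Vdd/(Vdd − Vt)², the sketch below checks the 0.16 energy ratio and sweeps Vdd to find where the energy-delay product (EDP) is lowest. With Vt = 0.7 V (assumed, as on the delay slide) the minimum lands at 3·Vt = 2.1 V, which follows from setting d/dVdd[Vdd³/(Vdd − Vt)²] = 0; it is a property of the simplified model, not a measured optimum.

```python
def relative_energy(vdd: float) -> float:
    """Switching energy per transition up to C_L: E ~ C_L * Vdd^2."""
    return vdd ** 2

def relative_delay(vdd: float, vt: float = 0.7) -> float:
    """Square-law delay model Td ~ Vdd / (Vdd - Vt)^2 (C_L omitted)."""
    return vdd / (vdd - vt) ** 2

print(f"E(2 V) / E(5 V) = {relative_energy(2.0) / relative_energy(5.0):.2f}")

# Sweep Vdd from 0.80 V to 5.00 V in 10 mV steps and minimize energy x delay.
candidates = [i / 100 for i in range(80, 501)]
best_vdd = min(candidates, key=lambda v: relative_energy(v) * relative_delay(v))
print(f"Vdd minimizing the energy-delay product ~ {best_vdd:.2f} V")
```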

Lowering the Threshold
[Figure: ID vs. VGS for Vt = 0.2 V and Vt = 0; delay vs. Vdd, with a marker at 2Vt]
• Reduces the speed loss, but increases leakage
Interesting design approach: design for PLeakage ≈ PDynamic

Transistor Sizing for Power Minimization
• Small W/L's: lower capacitance, but a higher voltage is needed to reach a given speed
• Large W/L's: higher capacitance, but a lower voltage suffices
• Minimum-sized devices are usually optimal for low power.
• Larger devices are useful only when interconnect dominates.

Techniques to reduce supply voltage
• Algorithm: transformations to exploit concurrency
• Architecture: parallelism and pipelining
• Circuit/Logic: transistor sizing, fast logic structures
• Technology: threshold voltage reduction, feature size scaling

Techniques to minimize the switched capacitance
• System: partitioning, power-down, power states
• Algorithm: complexity, concurrency, regularity, locality, data representation
• Architecture: concurrency, instruction set selection, signal correlations, data representation, data encoding
• Circuit/Logic: transistor sizing, logic optimization, power-down, layout optimization
• Technology: advanced packaging, SOI

Relative energy per operation:
16-bit carry-select adder      1
16-bit multiplier              3.6
8x128x16 SRAM (read)           4.4
8x128x16 SRAM (write)          9
External I/O access            10
16-bit memory access           33
[Figure: relative energy (0.0 to 0.4) of storage, interconnect, clocks and other RISC components]
Power consumption of transfer and storage compared with datapath operations, in both hardware [Men95] and software [Tiw94, Gon96].
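To see why transfer and storage dominate, the relative energies in the table can be summed for an operation mix. The per-iteration mix below (one multiply, one add, two SRAM reads, one SRAM write, one 16-bit memory access) is a hypothetical example, not taken from the cited studies.

```python
# Relative energy per operation (16-bit carry-select adder = 1), from the table above.
REL_ENERGY = {
    "add": 1.0,
    "mul": 3.6,
    "sram_read": 4.4,
    "sram_write": 9.0,
    "ext_io": 10.0,
    "mem_access": 33.0,
}

def kernel_energy(op_counts: dict[str, int]) -> float:
    """Total relative energy for a given count of each operation type."""
    return sum(REL_ENERGY[op] * count for op, count in op_counts.items())

# Hypothetical per-iteration operation mix (illustrative only).
iteration = {"mul": 1, "add": 1, "sram_read": 2, "sram_write": 1, "mem_access": 1}
total = kernel_energy(iteration)
datapath = kernel_energy({"mul": 1, "add": 1})
print(f"total = {total:.1f}, datapath share = {datapath / total:.0%}")
```

In this hypothetical mix the arithmetic accounts for less than a tenth of the energy.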

Architecture Power Optimization Techniques
• Architecture-driven voltage reduction: the key idea is to speed up the circuit so that the supply voltage can be reduced while still meeting throughput constraints. Voltage reduction is enabled by introducing hardware parallelism or by inserting pipeline flip-flops
• Switching activity minimization: prevent the generation and propagation of spurious transitions, or reduce the number of transitions, e.g. by retiming, path balancing or data representation
• Switched capacitance minimization: minimize the capacitance that is switched
• Dynamic power management: under certain conditions part of a circuit becomes inactive, so unnecessary calculations are avoided, e.g. gated clocks, operand isolation, precomputation and guarded evaluation

Architecture Trade-offs: Reference Data Path
• Critical path delay → Tadder + Tcomparator (= 25 ns) → fref = 40 MHz
• Total capacitance being switched = Cref
• Vdd = Vref = 5 V
• Power for the reference datapath: $P_{ref} = C_{ref} V_{ref}^2 f_{ref}$

Voltage Reduction Technique: Parallelism
• The clock rate can be halved while keeping the same throughput → fpar = fref / 2
• Vpar = Vref / 1.7, Cpar = 2.15 Cref
• $P_{par} = (2.15\,C_{ref}) \left(\frac{V_{ref}}{1.7}\right)^2 \frac{f_{ref}}{2} \approx 0.36\,P_{ref}$
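The parallel-datapath estimate is simple arithmetic on the scaling factors. A small sketch (the helper name is hypothetical) shows that with the factor 1/1.7 the result is about 0.37; the slide's 0.36 is reproduced if the voltage is scaled to 0.58·Vref (about 2.9 V), so the difference comes from rounding the voltage factor.

```python
def scaled_power(c_scale: float, v_scale: float, f_scale: float) -> float:
    """P_new / P_ref for P = C * V^2 * f when C, V and f are scaled by the given factors."""
    return c_scale * v_scale ** 2 * f_scale

# Parallel datapath: 2.15x capacitance, Vref/1.7, half the clock rate
p_par = scaled_power(c_scale=2.15, v_scale=1 / 1.7, f_scale=0.5)
print(f"Ppar / Pref with Vref/1.7:  {p_par:.3f}")
print(f"Ppar / Pref with 0.58*Vref: {scaled_power(2.15, 0.58, 0.5):.3f}")
```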

Voltage Reduction Technique: Pipeline
• fpipe = fref, Cpipe = 1.1 Cref, Vpipe = Vref / 1.7
• The voltage can be lowered while maintaining the original throughput
• $P_{pipe} = C_{pipe} V_{pipe}^2 f_{pipe} = (1.1\,C_{ref}) \left(\frac{V_{ref}}{1.7}\right)^2 f_{ref} \approx 0.37\,P_{ref}$
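How far can the voltage drop? One estimate, assuming the square-law delay model from the "Lower Vdd Increases Delay" slide with Vt = 0.7 V, is the Vdd whose gate delay is twice the delay at 5 V: the parallel datapath runs at half the clock rate, and the pipelined one is assumed to have its critical path roughly halved, so both tolerate about twice the original logic delay. Bisection works because the modeled delay falls monotonically as Vdd rises above Vt. Function names are hypothetical.

```python
def relative_delay(vdd: float, vt: float = 0.7) -> float:
    """Square-law delay model Td ~ Vdd / (Vdd - Vt)^2 (load capacitance omitted)."""
    return vdd / (vdd - vt) ** 2

def min_vdd_for_slack(slack: float, v_ref: float = 5.0, vt: float = 0.7,
                      tol: float = 1e-6) -> float:
    """Lowest Vdd whose modeled delay is at most `slack` times the delay at v_ref.

    Bisection on [Vt, v_ref]; assumes slack >= 1 so the answer lies in that interval.
    """
    target = slack * relative_delay(v_ref, vt)
    lo, hi = vt + 1e-9, v_ref          # delay(lo) > target >= delay(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if relative_delay(mid, vt) > target:
            lo = mid                   # too slow: need more voltage
        else:
            hi = mid                   # fast enough: try a lower voltage
    return hi

v_new = min_vdd_for_slack(2.0)
print(f"Vdd ~ {v_new:.2f} V (Vref / {5.0 / v_new:.2f})")
```

Under this first-order model the voltage drops to about Vref/1.6, somewhat less than the factor of 1.7 quoted on the slides.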

Logic Style and Power Consumption
• Power-delay product improves as voltage decreases
• The “best” logic style minimizes power-delay for a given delay
constraint

The concept of gating clock signals
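[Figure (a)-(c): a register REG with a 0/1 multiplexer on inputs X and Y and a comparator (A < B); two clock-gating schemes (scheme 1 and scheme 2) in which the comparator output gates the clock; waveforms of the clock, the comparator output and the gated clocks of both schemes over one clock period]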
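A behavioral Python sketch (not an HDL implementation) of the idea: the comparator output plays the role of an enable, and the register only receives clock edges in cycles where it must load. The enable pattern is hypothetical, and the gating is modeled as latch-based so the gated clock never glitches; fewer edges at the register clock pin mean fewer charge/discharge cycles of its clock load.

```python
def count_rising_edges(wave: list[int]) -> int:
    """Number of 0->1 transitions in a sampled waveform."""
    return sum(1 for prev, cur in zip(wave, wave[1:]) if prev == 0 and cur == 1)

def gated_clock(enable_per_cycle: list[int]) -> tuple[list[int], list[int]]:
    """Build a free-running clock (low then high each cycle) and its gated version.

    The enable is assumed to be latched while the clock is low (glitch-free gating),
    so each gated cycle is either a full pulse or stays low.
    """
    clk, gclk = [], []
    for en in enable_per_cycle:
        clk += [0, 1]
        gclk += [0, 1 if en else 0]
    return clk, gclk

# Hypothetical enable pattern: the register needs a new value in 3 of 10 cycles.
enable = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0]
clk, gclk = gated_clock(enable)
print(count_rising_edges(clk), count_rising_edges(gclk))
```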

Resource Sharing Can Increase Activity

Reducing Effective Capacitance
[Figure: global bus architecture vs. local bus architecture]
Shared resources incur switching overhead
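A toggle count illustrates the switching overhead of sharing. The sketch sends two hypothetical, slowly varying 8-bit streams either over two local buses or time-multiplexed over one shared bus, and counts bit transitions (Hamming distance between consecutive words). It ignores the different wire capacitances of global and local buses; since global wires are typically longer, that would widen the gap further.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions that differ between two bus words."""
    return bin(a ^ b).count("1")

def bus_toggles(words: list[int]) -> int:
    """Total bit transitions on a bus that carries the given sequence of words."""
    return sum(hamming(prev, cur) for prev, cur in zip(words, words[1:]))

# Two hypothetical, slowly varying 8-bit data streams with unrelated values.
stream_a = [0x10, 0x11, 0x12, 0x13, 0x14, 0x15]
stream_b = [0xE0, 0xE1, 0xE2, 0xE3, 0xE4, 0xE5]

separate = bus_toggles(stream_a) + bus_toggles(stream_b)   # one local bus per stream
interleaved = [w for pair in zip(stream_a, stream_b) for w in pair]
shared = bus_toggles(interleaved)                         # one time-multiplexed bus
print(separate, shared)
```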

Data representation
• Sign-extension activity is significantly reduced by using sign-magnitude representation
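The sign-extension effect can be counted directly. For a hypothetical small signal that alternates between +1 and −1 on a 16-bit bus, two's complement flips almost every bit on each sign change (0x0001 ↔ 0xFFFF), whereas sign-magnitude flips only the sign bit (0x0001 ↔ 0x8001).

```python
WIDTH = 16

def twos_complement(x: int, width: int = WIDTH) -> int:
    """Bit pattern of x in two's complement."""
    return x & ((1 << width) - 1)

def sign_magnitude(x: int, width: int = WIDTH) -> int:
    """Bit pattern of x as a sign bit (MSB) followed by |x|."""
    if abs(x) >= 1 << (width - 1):
        raise ValueError("magnitude does not fit")
    return ((1 << (width - 1)) if x < 0 else 0) | abs(x)

def toggles(values: list[int], encode) -> int:
    """Total bit transitions when the encoded values are sent back to back."""
    codes = [encode(v) for v in values]
    return sum(bin(a ^ b).count("1") for a, b in zip(codes, codes[1:]))

# Hypothetical small-amplitude signal that crosses zero every sample.
samples = [1, -1, 1, -1]
print(toggles(samples, twos_complement), toggles(samples, sign_magnitude))
```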

Switching Activity in Multipliers

Signals and Operations Reordering
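• Example: complex multiplication Y = A · X, trading a multiplication for an addition
(a) Direct form, 4 multiplications and 2 additions/subtractions:
$Y_r = A_r X_r - A_i X_i, \qquad Y_i = A_r X_i + A_i X_r$
(b) Reordered form, 3 multiplications and 5 additions/subtractions:
$Y_r = A_r (X_r + X_i) - X_i (A_i + A_r), \qquad Y_i = A_r (X_r + X_i) + X_r (A_i - A_r)$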
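The two forms can be checked against each other in a few lines. The sketch below (function names are hypothetical) implements both and compares them with Python's built-in complex product. When A is a constant coefficient, Ai − Ar and Ai + Ar can be precomputed, leaving 3 multiplications and 3 additions, which pays off whenever a multiplication costs more energy than an addition (3.6 vs. 1 in the relative-energy table earlier).

```python
def complex_mul_4m(ar, ai, xr, xi):
    """Direct form: 4 multiplications, 2 additions/subtractions."""
    return ar * xr - ai * xi, ar * xi + ai * xr

def complex_mul_3m(ar, ai, xr, xi):
    """Reordered form: 3 multiplications, 5 additions/subtractions.

    If A is a constant coefficient, (ai - ar) and (ai + ar) can be precomputed.
    """
    k1 = ar * (xr + xi)
    k2 = xr * (ai - ar)
    k3 = xi * (ai + ar)
    return k1 - k3, k1 + k2

a, x = complex(2, -3), complex(5, 7)
print(complex_mul_3m(2, -3, 5, 7), a * x)   # (31, -1) (31-1j)
```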

Module Selection
[Figure (a), (c), (d): a data-flow graph with three multiplications (*i, *ii, *iii) and two additions (+i, +ii), each mapped onto a different selection of modules from the RTL library (b)]
RTL library (b):
Module                     Area     Latency   Power
Ripple-carry adder         2744     30 ns     1199 μW
Carry-lookahead adder      3959     20 ns     1467 μW
Array multiplier           16185    60 ns     18540 μW
Wallace-tree multiplier    18443    40 ns     23545 μW
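Module selection can be sketched as a small search over the library. The exact data-flow graph in the figure is not recoverable, so the sketch assumes a hypothetical path of one multiplication followed by one addition and, as a simplification, adds the module powers; it returns the lowest-power combination that meets a latency budget.

```python
from itertools import product

# RTL library from the slide: name -> (area, latency in ns, power in uW)
ADDERS = {"ripple": (2744, 30, 1199), "carry_lookahead": (3959, 20, 1467)}
MULTIPLIERS = {"array": (16185, 60, 18540), "wallace": (18443, 40, 23545)}

def select_modules(latency_budget_ns: float):
    """Pick one multiplier and one adder for a hypothetical multiply-then-add path.

    Minimizes summed power subject to the path latency fitting the budget.
    Returns (multiplier, adder, power_uW, latency_ns), or None if nothing fits.
    """
    best = None
    for (m_name, m), (a_name, a) in product(MULTIPLIERS.items(), ADDERS.items()):
        latency = m[1] + a[1]
        power = m[2] + a[2]
        if latency <= latency_budget_ns and (best is None or power < best[2]):
            best = (m_name, a_name, power, latency)
    return best

for budget in (100, 80, 60, 50):
    print(budget, select_modules(budget))
```

Tighter budgets force the faster, more power-hungry Wallace-tree multiplier and carry-lookahead adder; a relaxed budget lets the slower, lower-power modules be used.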

Glitching activity reduction (3)
Function:
if (x < y) then
  z = c + d
else
  z = a + b
[Figure: Architecture 1 and Architecture 2, two implementations of this function built from a comparator (x < y), 0/1 multiplexers and adders on operands a, b, c, d]
Power consumption      Without glitches   With glitches
Architecture 1         823.9 μW           1650 μW
Architecture 2         951.7 μW           1357.7 μW
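Ratios of the reported figures make the trade-off explicit: Architecture 1 is cheaper when glitches are ignored, but glitches roughly double its power, so Architecture 2 ends up about 18% lower in practice.

```latex
\frac{P_{1,\text{glitch}}}{P_{1,\text{no glitch}}} = \frac{1650}{823.9} \approx 2.00,
\qquad
\frac{P_{2,\text{glitch}}}{P_{2,\text{no glitch}}} = \frac{1357.7}{951.7} \approx 1.43,
\qquad
\frac{P_{2,\text{glitch}}}{P_{1,\text{glitch}}} = \frac{1357.7}{1650} \approx 0.82
```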

Two-Level Logic Circuits Switching Activity Minimization (1)
• Taking into account the static and transition probabilities (i.e. temporal correlation) of the primary inputs, we can add extra input signals to certain gates of the first logic level (i.e. the AND gates), which reduces switching activity
• Appropriately selected input signals force the outputs of the AND gates to logic 0 for a number of input combinations

Two-Level Logic Circuits Switching Activity Minimization (2)
• Example:
• Signal x3 exhibits low transition probability and high static-1 probability, while signals x0, x1 and x2 have high transition probabilities
• Initial logic circuit: AND gates g1, g2, g3 (outputs y1, y2, y3) feed OR gate g4, implementing
$F = x_0 x_1 + x_0 x_2 + x_0 x_3$
• Modified logic circuit: F' uses the same gates with an extra x3-derived input on the first-level AND gates (outputs y1', y2', y3') and computes the same function
[Figure: initial and modified logic circuits]
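A Monte Carlo sketch can check the claim. The exact gates that receive the extra input are not recoverable from the figure, so the sketch assumes x3' (the complement of x3) is added to g1 and g2; this leaves the function unchanged because x0·x3 already covers every case with x3 = 1. Input statistics are hypothetical: x0, x1, x2 toggle with probability 0.5 per cycle, and x3 follows a two-state Markov chain that stays at 1 about 91% of the time and rarely toggles.

```python
import random

def simulate(cycles: int = 100_000, seed: int = 1):
    """Count output toggles of F = x0x1 + x0x2 + x0x3 (initial) and of F' with
    x3' added to AND gates g1 and g2 (assumed modification)."""
    rng = random.Random(seed)
    x = [0, 0, 0, 1]                     # x0..x2 start at 0, x3 starts at 1
    prev_init = prev_mod = None
    toggles_init = [0, 0, 0, 0]          # y1, y2, y3, F
    toggles_mod = [0, 0, 0, 0]           # y1', y2', y3', F'
    for _ in range(cycles):
        for i in range(3):               # x0, x1, x2: toggle with probability 0.5
            if rng.random() < 0.5:
                x[i] ^= 1
        # x3: hypothetical Markov chain, P(1->0) = 0.01, P(0->1) = 0.1
        if rng.random() < (0.01 if x[3] else 0.1):
            x[3] ^= 1
        x0, x1, x2, x3 = x
        init = (x0 & x1, x0 & x2, x0 & x3)
        init += (init[0] | init[1] | init[2],)
        nx3 = 1 - x3
        mod = (x0 & x1 & nx3, x0 & x2 & nx3, x0 & x3)
        mod += (mod[0] | mod[1] | mod[2],)
        assert init[3] == mod[3]         # both circuits compute the same function
        if prev_init is not None:
            for k in range(4):
                toggles_init[k] += init[k] != prev_init[k]
                toggles_mod[k] += mod[k] != prev_mod[k]
        prev_init, prev_mod = init, mod
    return toggles_init, toggles_mod

t_init, t_mod = simulate()
print("initial  y1, y2, y3, F: ", t_init)
print("modified y1, y2, y3, F':", t_mod)
```

The sketch counts only gate-output transitions; it does not charge the extra inverter or the added gate-input capacitance, which a full analysis would include.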

