Instruction level power analysis

Layout
• Introduction
• Components of Power Consumption
• Power Characterization
• Instruction Level Power Analysis for RISC processors
• Extensions for VLIW/EPIC processors
• Register Files
• Caches

Introduction
• Why has the power consumption of nanoelectronics become so important?
  • Moore's law still holds, and applications keep growing more complex
  • Mobile systems: the battery is the “bottleneck”
  • High-performance computation: heat extraction
  • Operating cost and reliability
    • An ISP data warehouse with 8000 servers needs 2 MW

Introduction
• Power or energy? Don't they go hand in hand?
  • Power varies significantly with time!
  • A given battery holds a fixed amount of energy
• Average power consumption = Energy / Execution time
  • Determines the average chip and junction temperature
  • Determines battery life (if peak current < rated current)
• Peak power and current
  • Voltage drops, hot spots, rate of battery discharge
• Power-efficient, energy-efficient, and battery-efficient design paradigms all exist!
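
The distinction between energy, average power, and peak power can be illustrated with a small Python sketch. The function name, the sample values, and the 1 ms sampling step are hypothetical; the only relation taken from the slide is average power = energy / execution time.

```python
def summarize_power_trace(samples_w, dt_s):
    """Return (energy_J, average_power_W, peak_power_W) for evenly spaced samples."""
    if not samples_w or dt_s <= 0:
        raise ValueError("need at least one sample and a positive time step")
    energy_j = sum(samples_w) * dt_s          # rectangle-rule integral of P(t)
    duration_s = len(samples_w) * dt_s        # execution time covered by the trace
    return energy_j, energy_j / duration_s, max(samples_w)


# Hypothetical trace: one 2 W burst among lower-power samples, 1 ms apart.
trace_w = [0.5, 2.0, 0.5, 1.0]
energy_j, average_w, peak_w = summarize_power_trace(trace_w, dt_s=0.001)
print(energy_j, average_w, peak_w)
```

In this made-up trace the average is 1 W while the peak is 2 W, which is why the two quantities have to be tracked separately.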

Components of Power Consumption
• System = hardware platform + software (system and application)
• Software impacts hardware power consumption
• Static power
  • Sub-threshold leakage and reverse-biased junction leakage
  • Quiescent biasing power (in non-CMOS circuits)
• Dynamic power
  • Charging and discharging of capacitance (switching activity)
  • Short-circuit power during transitions (rate of change, delay)
• Alternative grouping (used at component/cell level)
  • Switching power at the boundaries of cells
  • Internal cell power
    • Short-circuit power
    • Switching power at internal nodes
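
As a rough illustration of the switching component, the sketch below uses the standard first-order CMOS relation P = α · C · V² · f. This formula is not stated on the slide, and the function name and all numeric values are hypothetical.

```python
def switching_power_w(activity, capacitance_f, vdd_v, freq_hz):
    """First-order switching power: activity factor * C * Vdd^2 * f."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


# Hypothetical block: 1 nF switched capacitance, 10% activity, 100 MHz clock.
p_nominal = switching_power_w(0.1, 1e-9, 1.2, 100e6)
p_half_vdd = switching_power_w(0.1, 1e-9, 0.6, 100e6)
print(p_nominal, p_half_vdd)
```

Under this approximation, halving the supply voltage cuts the switching power by a factor of four. Leakage and short-circuit power are not covered.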

System Abstractions - Power
[Figure: design abstraction levels, top to bottom: Functional Specifications and Constraints; System Level Netlist; Register Transfer Level (RTL) Netlist; Component/Cell Level Netlist; Layout or Configuration-bits; Chip. Arrows alongside the levels are labelled "Accuracy of power characterization", "Opportunities for optimization", and "Time complexity".]

Power Characterization
• Measurement (chip/board level)
  • Most accurate
  • Perhaps the fastest, if the setup and tools exist
  • Too late to change hardware details
  • Software/load control is still possible
  • Typically used for software optimizations

Power Characterization (cont…)
• Transistor level (estimation)
  • SPICE simulation of the transistor-level netlist
  • Most accurate in the simulation world
  • Requires complete implementation details
  • Unmanageable time complexity, even for simple designs
  • Typically used for cell/component characterization
  • Synopsys PowerMill (said to provide SPICE-like accuracy)

• Cell level (estimation)
  • After logic synthesis
    • Requires RTL implementation
  • Simulation to capture switching activity
    • Requires delay simulation if glitches need to be accounted for
  • Characterized cells: empirical formulas or table look-up
  • Interconnect power
    • Either unaccounted for, or
    • Estimated using wire load models (typically based on experience), or
    • Taken from the extracted layout (if done after physical synthesis)
  • Time complexity is still unmanageable, especially for use in design space exploration
  • Synopsys PrimePower
    • Netlist, interconnect capacitance, VCD traces, cell power library

• Register transfer level (estimation)
  • Requires a conceptual RTL description (detailed micro-architecture)
  • The data path is modeled as a netlist of macro cells, which are characterized offline
  • Control path and glue logic
    • Either unaccounted for or estimated based on I/O
  • Simulation to capture switching activity
    • Glitches are typically not considered, but methods to do so exist
  • Interconnect power
    • Typically unaccounted for, but can be estimated through floor-planning
  • Typically used in design space exploration (DSE), mostly with in-house tools

System Level Power Estimation
• For design space exploration
  • Least accurate, but the uncertainty of exploration results can be reduced if the models have good fidelity
• Purpose, target architecture, and available system details govern the system-level estimation models
  • Selecting an algorithm, or designing hardware for a given algorithm?
  • ASIC based or processor based?
  • Is the ISA fixed or extensible?
• System-level power estimation models are typically specific to a macro-architecture template
• Major constituents of power consumption
  • Computation, communication, storage units, and peripherals

Power Estimation Models
• Activity Based Models
• Instruction Level Energy Models

Activity Based Models
• Fixed Activity Model
• N-Transition Model
• Dual Bit Type Model

Fixed Activity Model

$P = \sum_i k_i G_i f_i$
where:
$k_i$ = PFA (power factor approximation) proportionality constant, extracted empirically from past designs
$G_i$ = measure of hardware complexity
$f_i$ = activation frequency

• Disadvantage: does not model the influence of data activity on power consumption
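
The sum above can be evaluated directly, as in this minimal sketch. The block list and every constant in it are hypothetical placeholders, not values from any characterized design.

```python
def fixed_activity_power(blocks):
    """Sum k_i * G_i * f_i over (k_i, G_i, f_i) tuples, one per hardware block."""
    return sum(k * g * f for k, g, f in blocks)


# Hypothetical blocks: (PFA constant, complexity measure, activation frequency in Hz).
blocks = [
    (2.0e-12, 1000, 50e6),   # e.g. a multiplier
    (1.0e-12, 200, 100e6),   # e.g. an adder
]
total_power = fixed_activity_power(blocks)
print(total_power)
```

No input data appears anywhere in the computation, which is exactly the disadvantage noted on the slide.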

N-Transition Model

$P = P_{const} + n \cdot P_{change}$

• Disadvantage: it does not differentiate between transitions on different inputs.
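
A small sketch of this model, assuming that n is the number of input bits that toggle between two consecutive input words (the slide does not define n). The function names and the numeric coefficients are hypothetical.

```python
def count_transitions(prev_word, curr_word):
    """Number of bit positions that differ between two input words."""
    return bin(prev_word ^ curr_word).count("1")


def n_transition_power(prev_word, curr_word, p_const, p_change):
    """P = P_const + n * P_change, with n counted by count_transitions."""
    return p_const + count_transitions(prev_word, curr_word) * p_change


# Two toggles on the low bits versus two toggles on the high bits.
p_low_bits = n_transition_power(0b0000, 0b0011, p_const=1.0, p_change=0.5)
p_high_bits = n_transition_power(0b0000, 0b1100, p_const=1.0, p_change=0.5)
print(p_low_bits, p_high_bits)
```

Both calls give the same power, even though the toggles happen on different inputs. That is the limitation the slide points out.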

Dual Bit Type Model
• Drawbacks of the previous approaches:
  • Less accurate
  • Characterize the module on the basis of uniform white noise (UWN) input
  • Lead to high error if the input dynamic range does not fully occupy the word length

Dual Bit Type Model
The Approach

• Combines the reduced complexity of the architecture level with the accuracy of the gate and circuit levels
• Black-box model of the capacitance switched in each module for various types of inputs
• Capacitance models are easy to parameterize to take into account size, etc.

Dual Bit Type Model
Modeling Complexity
• The power consumed by a module is a function of its complexity, as large modules contain more circuitry
• Example:
  • Capacitance of an N-bit ripple-carry subtracter: $C_T = C_{eff} \cdot N$
• Not restricted to linear models; more complex models can also be specified

Dual Bit Type Model
Capacitive Data Coefficients

• Describe the average amount of capacitance switched within a module during an input transition
• The LSB region sees random transitions and hence can be characterized by a single capacitive coefficient $C_{UU}$
• The MSB region experiences sign transitions and so is characterized by capacitive sign coefficients $C_{+-}$, $C_{++}$, etc.
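
One possible reading of these coefficients is sketched below for a single-input module. It assumes the word has already been split into a known number of LSB (random) and MSB (sign) bits, and that the coefficients are given per bit. The function names, the split, and all numeric values are hypothetical.

```python
def sign_of(value):
    """Return '+' for non-negative values and '-' for negative ones."""
    return "-" if value < 0 else "+"


def dbt_switched_capacitance(prev, curr, n_lsb, n_msb, c_uu, c_sign):
    """Capacitance switched by one input transition under a dual-bit-type split.

    n_lsb bits are treated as random (coefficient c_uu per bit); n_msb bits
    follow the sign, using the per-bit coefficient of the sign transition.
    """
    transition = sign_of(prev) + sign_of(curr)      # e.g. "+-"
    return n_lsb * c_uu + n_msb * c_sign[transition]


# Hypothetical per-bit coefficients (arbitrary units) for a 16-bit input.
c_sign = {"++": 0.25, "+-": 2.5, "-+": 2.5, "--": 0.25}
cap_sign_flip = dbt_switched_capacitance(100, -3, 10, 6, c_uu=1.0, c_sign=c_sign)
cap_same_sign = dbt_switched_capacitance(5, 7, 10, 6, c_uu=1.0, c_sign=c_sign)
print(cap_sign_flip, cap_same_sign)
```

With these made-up numbers a sign change switches more than twice the capacitance of a same-sign transition. A pure UWN characterization would not capture that difference.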

Instruction Level Power Estimation
• First introduced to characterize processor power consumption in order to drive software optimizations
• Each instruction is associated with some current
• Inter-instruction effects are included for better accuracy

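• $E = \sum_i (B_i \times N_i) + \sum_{i,j} (O_{i,j} \times N_{i,j}) + \sum_k E_k$
  • $B_i$: base energy cost of instruction $i$, which is executed $N_i$ times
  • $O_{i,j}$: inter-instruction effect energy cost of the pair $(i, j)$, which occurs $N_{i,j}$ times
  • $E_k$: additional energy penalties due to resource constraints
• Requires a cost for every pair of instructions: $O(N^2)$ entries, where $N$ = number of instructions in the ISA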
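
The energy sum can be evaluated over an instruction trace, as in this sketch. The opcode names, the cost tables, and the penalty value are hypothetical; pairs missing from the overhead table are assumed to cost nothing extra.

```python
from collections import Counter


def program_energy(trace, base, pair_overhead, penalties=()):
    """E = sum(B_i * N_i) + sum(O_ij * N_ij) + sum(E_k) for an instruction trace."""
    counts = Counter(trace)                       # N_i: executions of each opcode
    pair_counts = Counter(zip(trace, trace[1:]))  # N_ij: adjacent (i, j) pairs
    e_base = sum(base[op] * n for op, n in counts.items())
    # Assumption: pairs absent from the table contribute no overhead.
    e_pairs = sum(pair_overhead.get(p, 0.0) * n for p, n in pair_counts.items())
    return e_base + e_pairs + sum(penalties)


# Hypothetical cost tables in arbitrary energy units.
base = {"add": 1.0, "load": 2.0}
overhead = {("add", "load"): 0.5, ("load", "add"): 0.5}
trace = ["add", "load", "add", "add"]
energy = program_energy(trace, base, overhead, penalties=[1.5])
print(energy)
```

The `pair_overhead` table is the part that grows as $O(N^2)$ with the size of the ISA.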

JouleTrack
• Experiments on StrongARM by Amit Sinha and A. P. Chandrakasan
  • Current per instruction ≈ 0.2 A (averaged over all instructions)
  • Min-max variation is 38% of the average current
  • Variation due to addressing mode and data is smaller
  • But the maximum current variation across benchmarks is < 8%!
• Concluded that the first-order energy model of a given processor is $E = V \cdot I(V, f) \cdot T$
• Second-order effects can be significant for data-path-dominated processors such as DSPs and VLIWs
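
The first-order model amounts to a table lookup and a product, as sketched below. The 0.2 A average current comes from the slide. The operating point (1.5 V, 206 MHz), the cycle count, and the function and table names are illustrative assumptions.

```python
# Hypothetical characterization table: (Vdd in V, f in Hz) -> average current in A.
CURRENT_A = {(1.5, 206e6): 0.2}


def first_order_energy_j(vdd_v, freq_hz, cycles, current_table):
    """E = V * I(V, f) * T, with T derived from the cycle count and clock frequency."""
    exec_time_s = cycles / freq_hz
    return vdd_v * current_table[(vdd_v, freq_hz)] * exec_time_s


# A program that runs for 206 million cycles, i.e. one second at 206 MHz.
energy_j = first_order_energy_j(1.5, 206e6, 206_000_000, CURRENT_A)
print(energy_j)
```

The model needs only the execution time of the program, not its instruction mix. This matches the small current variation across benchmarks reported on the slide.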

• Impractical for CISC processors with very large instruction sets
• Higher average instruction energy
• Low energy-per-instruction variance
• Do not consider inter-instruction effects
• Cluster similar instructions into a single class
• Exponential storage problem for VLIW architectures
  • N operations in a K-wide VLIW give $N^K$ long instructions, so pairwise characterization needs $O(N^{2K})$ entries

Modified Energy Model for VLIW
• Assume independent energy dissipation for the different execution slots
• Consider nop as the base energy
• $E(W) = \sum_n U(w_n|w_{n-1}) + m \times p \times S + l \times q \times M$
• $U(w_n|w_{n-1}) = U(0|0) + \sum_k v(w_n^k|w_{n-1}^k)$
  • $w_n^k$ = operation issued on lane $k$ by instruction $w_n$
• Example
  • $w_n$ = [ALU NOP NOP NOP], $w_{n-1}$ = [LS NOP ALU NOP]
  • $U(w_n|w_{n-1}) = U(0|0) + v(ALU|LS) + v(NOP|ALU)$ (the two NOP|NOP lanes add nothing beyond the base cost)
• Memory requirement
  • $O(K \cdot N^2)$
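
The lane-wise sum for $U(w_n|w_{n-1})$ can be sketched as below, reusing the example from the slide. The numeric costs in the table are hypothetical, and a single table shared by all lanes is a simplification. A per-lane table would match the $O(K \cdot N^2)$ memory figure more closely.

```python
def long_instruction_energy(curr_word, prev_word, u_base, v):
    """U(w_n | w_n-1) = U(0|0) + sum over lanes of v(op_now | op_before).

    Lanes that hold NOP in both cycles are covered by the base cost u_base.
    """
    extra = sum(
        v[(now, before)]
        for now, before in zip(curr_word, prev_word)
        if (now, before) != ("NOP", "NOP")
    )
    return u_base + extra


# Hypothetical lane costs (arbitrary units) for the 4-wide example.
v = {("ALU", "LS"): 0.75, ("NOP", "ALU"): 0.25}
w_n = ["ALU", "NOP", "NOP", "NOP"]
w_prev = ["LS", "NOP", "ALU", "NOP"]
u = long_instruction_energy(w_n, w_prev, u_base=1.0, v=v)
print(u)
```

An uncharacterized operation pair raises a `KeyError` here rather than being silently treated as free.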

Modified Energy Model for VLIW
• Cluster similar instructions based on cost
  • $\Theta = \{e_1, e_2, \ldots, e_t\}$
  • $e_t$ = energy consumption of instruction $t$
  • Partition $\Theta$ into K clusters ($C_1, C_2, \ldots, C_K$) such that
  • $\sum_j \sum_i (x_{i,j} - c_j)^2$ is minimum, where $x_{i,j}$ is the $i$-th energy value in cluster $C_j$ and $c_j$ is the centroid of that cluster
• Large number of clusters
  • Good accuracy
  • Huge number of experiments
• Small number of clusters
  • Small number of experiments
  • High variance within clusters
  • Reduced accuracy
• Memory requirement
  • $O(C \cdot N^2)$
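
A minimal sketch of such a partition, using Lloyd's k-means iteration on scalar energies. This is one common way to minimize the squared-distance objective and is not necessarily the method used in the original work. The energy values and function names are hypothetical.

```python
def kmeans_1d(energies, k, iterations=20):
    """Lloyd's algorithm on scalar energies; returns (centroids, clusters)."""
    values = sorted(energies)
    if not 0 < k <= len(values):
        raise ValueError("k must be between 1 and the number of instructions")
    # Deterministic start: pick k values spread evenly over the sorted list.
    centroids = [values[(2 * j + 1) * len(values) // (2 * k)] for j in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for e in values:
            nearest = min(range(k), key=lambda j: (e - centroids[j]) ** 2)
            clusters[nearest].append(e)
        # Empty clusters keep their previous centroid.
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids, clusters


def clustering_cost(centroids, clusters):
    """Sum of squared distances between each energy and its cluster centroid."""
    return sum((x - c) ** 2 for c, xs in zip(centroids, clusters) for x in xs)


# Hypothetical per-instruction energies (arbitrary units).
energies = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centroids, clusters = kmeans_1d(energies, k=2)
cost_two = clustering_cost(centroids, clusters)
cost_one = clustering_cost(*kmeans_1d(energies, k=1))
print(centroids, cost_two, cost_one)
```

Comparing `cost_two` with `cost_one` shows the trade-off on the slide. Fewer clusters mean fewer characterization experiments but a larger spread inside each cluster.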

Limitations of ILPA
• Does not provide any insight into the causes of power consumption within the processor core
• Does not account for the power consumed in the memory system, which is often dominant
• To address the second limitation, power estimation frameworks that integrate processor and memory models are built around instruction set simulators

MicroArchitecture ILPA
• Pipeline-aware instruction-level energy model
• Divide the design into smaller architectural blocks
  • Usually the processor's pipeline stages
  • Fetch, Decode, RF, Execute, WB
• $E(w_n|w_{n-1}) = \sum_s A_s(w_n|w_{n-1}) + I(w_n|w_{n-1})$
  • $A_s$ = energy consumed by stage $s$ when executing $w_n$ after $w_{n-1}$
  • $I(w_n|w_{n-1})$ = energy of the inter-stage connections (pipeline registers + buses)
• Provides better insight into power bottlenecks
• Smoother energy behaviour than the black-box model
• Requires a pipeline-structure-aware ISS
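
The per-stage decomposition can be sketched as follows for a single instruction pair. The stage names follow the slide. The energy values and the function name are hypothetical.

```python
STAGES = ("Fetch", "Decode", "RF", "Execute", "WB")


def pipeline_energy(stage_energy, interconnect_energy):
    """Return (total energy, most expensive stage) for one instruction pair.

    stage_energy maps each stage s to A_s(w_n | w_n-1); interconnect_energy
    is the inter-stage term I(w_n | w_n-1).
    """
    total = sum(stage_energy[s] for s in STAGES) + interconnect_energy
    hottest = max(STAGES, key=lambda s: stage_energy[s])
    return total, hottest


# Hypothetical per-stage energies (arbitrary units) for one pair (w_n, w_n-1).
stage_energy = {"Fetch": 2.0, "Decode": 1.0, "RF": 1.5, "Execute": 3.0, "WB": 0.5}
total, hottest = pipeline_energy(stage_energy, interconnect_energy=1.0)
print(total, hottest)
```

Reporting the most expensive stage next to the total is one way to expose the power bottlenecks that a black-box model hides.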

Energy Models for Register File
• Assume linear power behaviour for accesses across different ports
• $P_{RF} = P_i + \frac{1}{T} \sum_n (E_{r,n} + E_{w,n})$
• $E_{r,n} = \sum_i H(RR_{i,n}, RR_{i,n-1}) \cdot E_{rb}$
• $E_{w,n} = \sum_i H(RW_{i,n}, old_{i,n}) \cdot E_{wb}$
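
The sketch below evaluates these three formulas under several assumptions the slide leaves implicit: $H$ is the Hamming distance, $RR_{i,n}$ is the value on read port $i$ in cycle $n$, $old_{i,n}$ is the register content being overwritten, $E_{rb}$ and $E_{wb}$ are per-bit energies, and $P_i$ is an idle power term. All names and numbers are hypothetical.

```python
def hamming(a, b):
    """Number of differing bits between two non-negative integers."""
    return bin(a ^ b).count("1")


def register_file_power(reads, writes, e_rb, e_wb, p_idle, total_time):
    """P_RF = P_i + (1/T) * sum_n (E_r,n + E_w,n).

    reads:  per-cycle lists of read-port values (one entry per port).
    writes: per-cycle lists of (new_value, old_value) pairs, one per write port.
    """
    energy = 0.0
    for n in range(1, len(reads)):
        # E_r,n: bits that toggle on each read port relative to the previous cycle.
        energy += sum(hamming(c, p) for c, p in zip(reads[n], reads[n - 1])) * e_rb
    for cycle in writes:
        # E_w,n: bits that differ between written data and the overwritten content.
        energy += sum(hamming(new, old) for new, old in cycle) * e_wb
    return p_idle + energy / total_time


# Hypothetical 3-cycle trace with two read ports and one write in cycle 1.
reads = [[0b0000, 0b1111], [0b0011, 0b1111], [0b0011, 0b0000]]
writes = [[], [(0b1010, 0b0000)], []]
p_rf = register_file_power(reads, writes, e_rb=0.5, e_wb=1.0,
                           p_idle=0.25, total_time=5.0)
print(p_rf)
```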

Energy Model for Caches
• Power consumption depends on the mode of operation (read, write, idle)
• The energy consumed in a given clock cycle is a function of the mode transition between the previous and current cycles
• Characterize energy as a function of state transitions (read-read, read-write, etc.)
• For a given state transition, the energy also depends on the transitions on the address lines
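
One way to encode this characterization is a table indexed by state transition, with a base energy plus a term proportional to the number of toggled address bits. The linear dependence on address toggles, the table values, and the function name are assumptions for illustration.

```python
def cache_energy(accesses, model):
    """Total energy over a sequence of (mode, address) cache cycles.

    model maps (previous_mode, current_mode) to (base_energy, energy_per_toggled
    address bit); both numbers would come from offline characterization.
    """
    energy = 0.0
    for (prev_mode, prev_addr), (mode, addr) in zip(accesses, accesses[1:]):
        base, per_bit = model[(prev_mode, mode)]
        toggled = bin(prev_addr ^ addr).count("1")   # address-line transitions
        energy += base + toggled * per_bit
    return energy


# Hypothetical characterization table (arbitrary energy units).
model = {
    ("read", "read"): (1.0, 0.25),
    ("read", "write"): (2.0, 0.5),
    ("write", "idle"): (0.5, 0.0),
}
accesses = [("read", 0x10), ("read", 0x11), ("write", 0x13), ("idle", 0x13)]
total_energy = cache_energy(accesses, model)
print(total_energy)
```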
