OK Labs - Virtualization as the Nexus of Multicore Power Management

November 9-11, 2010
The Santa Clara Convention Center
www.armtechcon.com

Energy Management for Mobile Devices
Power to the Microvisor!

> Energy-management
> Virtualization basics
> Enter multicore
> Summary
Overview

> Device uses energy
• Drains battery
> Goal of energy management:
• Maximize battery life
Energy in Mobile Devices

Dynamic voltage and frequency scaling
> CMOS power consumption:
• P = Pdyn + Pstat
• Pdyn ∝ f V²
• Vmin ∝ f (very approximately)
> Assuming execution time T ∝ 1/f
• Edyn = Pdyn T ∝ f V² / f = V² ∝ f²
• ⇒ lower frequency, lower dynamic energy
Energy-Management Mechanisms: DVFS
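
The scaling argument on this slide can be checked numerically. The sketch below is only an illustration of the slide's own assumptions (Pdyn ∝ f V², V ∝ f, T ∝ 1/f); the function name and the normalisation to the maximum frequency are assumptions made for the example.

```python
def relative_dynamic_energy(f_ratio: float) -> float:
    """Dynamic energy at frequency f, relative to running at f_max.

    Uses only the slide's proportionalities; f_ratio = f / f_max.
    """
    voltage_ratio = f_ratio                      # V ∝ f (very approximate)
    power_ratio = f_ratio * voltage_ratio ** 2   # Pdyn ∝ f V²
    time_ratio = 1.0 / f_ratio                   # T ∝ 1/f
    return power_ratio * time_ratio              # Edyn = Pdyn T ∝ f²


for ratio in (1.0, 0.75, 0.5):
    print(f"f/f_max = {ratio:.2f}  Edyn/Edyn_max = {relative_dynamic_energy(ratio):.4f}")
```

Halving the frequency quarters the dynamic energy in this model, which is the f² dependence stated above.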

> When CPU is idle, turn clock off
• Pdyn = 0 ⇒ P = Pstat
> Sleep states reduce power further:
• Psleep < Pstat
> Typically have multiple sleep states
• shallow sleep states save some energy but are fast to enter/exit
• deep sleep states save more energy but lose state and are expensive to enter/exit
> Complex tradeoff
Mechanisms: Sleep States
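
One way to picture the tradeoff is to compare, for a given idle interval, the energy of each state including its entry/exit cost. The following is a minimal sketch; the `SleepState` type and all numbers are hypothetical and do not describe any real platform.

```python
from dataclasses import dataclass


@dataclass
class SleepState:
    name: str
    power_w: float        # power drawn while in the state (hypothetical)
    transition_j: float   # energy spent entering and exiting
    transition_s: float   # time spent entering and exiting


def best_state(states, idle_s):
    """Pick the state with the lowest energy over an idle interval.

    States whose transition does not fit into the interval are skipped.
    """
    feasible = [s for s in states if s.transition_s <= idle_s]
    return min(
        feasible,
        key=lambda s: s.transition_j + s.power_w * (idle_s - s.transition_s),
    )


# Made-up states: clock gating, a shallow state and a deep state.
STATES = [
    SleepState("clock-off", power_w=0.1, transition_j=0.0, transition_s=0.0),
    SleepState("shallow", power_w=0.03, transition_j=0.001, transition_s=0.001),
    SleepState("deep", power_w=0.002, transition_j=0.05, transition_s=0.05),
]

for idle in (0.01, 0.1, 5.0):
    print(idle, best_state(STATES, idle).name)
```

With these numbers the preferred state moves from clock gating to the shallow and then the deep state as the idle interval grows, which is why the expected idle time matters for the choice.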

> Edyn ∝ f²
⇒ lowest frequency is best
> Ignores static energy!
• E = Edyn + Estat
• Edyn ∝ f²
• Estat = Pstat T ∝ 1/f
> Low f increases execution time
⇒ Estat increases at low f!
Popular Approach: Lowest Frequency
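
A small numeric sketch shows why the lowest frequency is not automatically the best once static energy is included. The constants `k_dyn` and `p_stat` are invented for illustration, and frequency is normalised so that f = 1 is the maximum.

```python
def total_energy(f, k_dyn=1.0, p_stat=0.5):
    """Energy of one fixed job at normalised frequency f (0 < f <= 1).

    Assumed model, following the slide:
      Edyn  = k_dyn * f**2   (Edyn ∝ f²)
      Estat = p_stat / f     (Estat = Pstat T, with T = 1/f)
    """
    return k_dyn * f ** 2 + p_stat / f


frequencies = [i / 10 for i in range(2, 11)]   # 0.2, 0.3, ..., 1.0
best = min(frequencies, key=total_energy)
print("best frequency:", best)
print("energy at lowest, best, highest:",
      total_energy(frequencies[0]), total_energy(best), total_energy(frequencies[-1]))
```

With these made-up constants the minimum lies at an intermediate setpoint: below it the static term dominates, above it the dynamic term does.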

> Run at maximum f, then go to sleep
• Tries to minimize static power — but:
• dynamic power isn’t irrelevant (yet)
– T ∝ 1/f isn’t correct either: it ignores memory!
• Effect of memory stalls
• T = TCPU + Tmem
• TCPU ∝ 1/f
• Tmem = const
• Estat ∝ T = 1/f + const
> Ignores sleep energy!
Other Approach: “Race to Halt”
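
The effect of memory stalls on execution time can be sketched directly from T = TCPU + Tmem. The workload numbers below are hypothetical and normalised so that each job takes 1 time unit at the maximum frequency.

```python
def execution_time(f, t_cpu_at_fmax, t_mem):
    """T = TCPU + Tmem, with TCPU ∝ 1/f and Tmem constant.

    f is the frequency normalised to the maximum (f = 1.0).
    """
    return t_cpu_at_fmax / f + t_mem


def slowdown_at_half_speed(t_cpu_at_fmax, t_mem):
    """How much longer the job takes at f = 0.5 than at f = 1.0."""
    return (execution_time(0.5, t_cpu_at_fmax, t_mem)
            / execution_time(1.0, t_cpu_at_fmax, t_mem))


print("CPU-bound:   ", slowdown_at_half_speed(t_cpu_at_fmax=1.0, t_mem=0.0))
print("memory-bound:", slowdown_at_half_speed(t_cpu_at_fmax=0.2, t_mem=0.8))
```

In this model the CPU-bound job takes twice as long at half the frequency, while the memory-bound job is only about 20% slower, so the naive T ∝ 1/f assumption overstates the cost of slowing down memory-bound code.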

> Run at maximum f, then go to sleep
> Earlier completion ⇒ longer sleep
• E = Edyn + Estat + Esleep
• Esleep = Psleep Tsleep
• Tsleep = T0 - T (T0: length of the period considered)
• Esleep = Psleep (T0 - T)
> Still ignores dynamic energy!
Other Approach: “Race to Halt” (2)
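
Putting the three terms together over a fixed period T0 gives a simple model in which the depth of the sleep state shifts the best frequency. This is a sketch under invented constants (`k_dyn`, `p_stat`, `t0` and both sleep powers are assumptions), for a CPU-bound job with T = 1/f.

```python
def period_energy(f, p_sleep, t0=6.0, k_dyn=1.0, p_stat=0.5):
    """E = Edyn + Estat + Esleep over a period of length t0.

    The job runs for T = 1/f at normalised frequency f, then the CPU
    sleeps for the rest of the period.
    """
    t = 1.0 / f
    if t > t0:
        raise ValueError("job does not finish within the period")
    e_dyn = k_dyn * f ** 2          # Edyn ∝ f²
    e_stat = p_stat * t             # Estat = Pstat T
    e_sleep = p_sleep * (t0 - t)    # Esleep = Psleep (T0 - T)
    return e_dyn + e_stat + e_sleep


def best_frequency(p_sleep):
    """Scan the setpoints 0.2 .. 1.0 and return the cheapest one."""
    freqs = [i / 10 for i in range(2, 11)]
    return min(freqs, key=lambda f: period_energy(f, p_sleep))


print("high-power sleep state:", best_frequency(p_sleep=0.4))
print("low-power sleep state: ", best_frequency(p_sleep=0.01))
```

In this toy model a low-power sleep state makes it worthwhile to run faster and sleep longer, whereas a high-power sleep state favours a slower setpoint; neither extreme frequency is optimal.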

Real Data: Execution Time
[Plot: curves labelled memory-bound and CPU-bound]

Real Data: Total Energy (Measured)
[Plot: curves labelled CPU-bound, memory-bound and naïve model]

Real Data: Including Sleep Energy
[Plot: curves labelled high-power sleep state and low-power sleep state]

> Energy management is complex!
> Optimal setting depends on:
• Workload
memory-bound vs CPU-bound vs in-between
• Hardware platform
static vs dynamic energy
CPU vs memory power
depth of sleep states and cost of entering
> Simple models don’t work!
Summary: Energy-Management Basics

> How to establish memory-boundedness?
> Easy way out: pre-characterization
• measure behavior off-line
• determine optimal power setting by model or trial-and-error
> Ok-ish for pre-defined workloads
> Unsuitable for open systems
• ... such as phones
> Tricky with apps which change behavior
Characterizing Workloads

> Need to observe app and adjust setting
• works for any app
• adjusts to changing behavior
> Solution by [Snowdon et al., EuroSys’09]
> Performance counters are your friends!
• e.g. cache misses indicate memory access
> Can systematically select best counters
• build model of platform
• Linear combination of performance-counter readings
• pre-characterize hardware
• pick counters which provide most accurate model
• using sound statistical methods
Better Way: On-Line Characterization
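
The counter-selection step can be illustrated with a deliberately simplified sketch: fit a one-counter linear model per candidate by least squares and keep the counter with the smallest residual. The slide describes a linear combination of several counters; the single-counter fit, the counter names and all sample values here are assumptions made for the example, not the published method.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y ≈ slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residual(xs, ys):
    """Sum of squared errors of the fitted line."""
    slope, intercept = fit_line(xs, ys)
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def best_counter(counter_samples, target):
    """Name of the counter whose linear model predicts the target best."""
    return min(counter_samples, key=lambda name: residual(counter_samples[name], target))


# Hypothetical pre-characterisation runs: counter readings per benchmark
# and the measured fraction of time spent waiting for memory.
COUNTERS = {
    "cache_misses": [1, 5, 10, 20],
    "branch_mispredicts": [7, 3, 9, 4],
}
MEMORY_TIME_FRACTION = [0.06, 0.24, 0.52, 0.98]

print(best_counter(COUNTERS, MEMORY_TIME_FRACTION))
```

A fuller version would fit several counters jointly and validate the model on held-out workloads, as the bullets on statistical method suggest.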

> Model predicts energy consumption and relative execution speed
• at present setpoint
• at different setpoints
> Accurately predicts energy and performance response to DVFS
• within a few %
> Can use this for informed energy-management decisions
On-Line Characterization & Modeling

Accuracy of Approach
[Plot: curves labelled memory-bound and CPU-bound]

Effect on Energy
[Plot: curves labelled CPU-bound and memory-bound]

> What is “best”?
• Maximal Performance?
• Minimal Energy?
• Minimal Power?
> Depends...
> May change
• battery depletes
> Need flexible policies
Energy Management Policies
[Diagram blocks: Workload Statistics, Workload Prediction, Energy/Performance Models, Candidate Setpoints, QoS Info, Selection Policy, Setting]
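
A selection policy of this kind can be sketched as a function that takes model predictions for the candidate setpoints plus a QoS requirement and returns a setting. The `Prediction` type, the fallback rule and the numbers are hypothetical; the slide only names the building blocks.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    freq_mhz: int
    rel_energy: float   # predicted energy, relative to the current setpoint
    rel_perf: float     # predicted performance, relative to the current setpoint


def select_setpoint(predictions, min_perf):
    """Minimise predicted energy among setpoints that meet the QoS floor.

    If no setpoint meets the floor, fall back to the fastest one
    (an assumed tie-breaking rule).
    """
    acceptable = [p for p in predictions if p.rel_perf >= min_perf]
    if not acceptable:
        return max(predictions, key=lambda p: p.rel_perf)
    return min(acceptable, key=lambda p: p.rel_energy)


# Made-up model output for three candidate setpoints.
CANDIDATES = [
    Prediction(200, rel_energy=0.80, rel_perf=0.55),
    Prediction(400, rel_energy=0.85, rel_perf=0.90),
    Prediction(800, rel_energy=1.00, rel_perf=1.00),
]

print(select_setpoint(CANDIDATES, min_perf=0.85).freq_mhz)
print(select_setpoint(CANDIDATES, min_perf=0.50).freq_mhz)
```

Swapping the selection function changes the policy without touching the models or the workload statistics feeding them.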

Generalized Energy-Delay Policy

Generalized Energy-Delay Policy
[Plot panels: Performance and Energy; curves labelled CPU-bound and memory-bound]
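
The formula for this policy did not survive in the slide text. The sketch below assumes a single-parameter metric of the form η = P^(1-α) · T^(1+α), which covers the three goals listed earlier (minimal power, minimal energy, maximal performance); treat both the formula and the setpoint numbers as assumptions rather than as the authors' definition.

```python
def eta(power_w, time_s, alpha):
    """Assumed generalized energy-delay metric: P^(1-alpha) * T^(1+alpha).

    alpha = -1 reduces to P² (minimal power),
    alpha =  0 reduces to P·T = E (minimal energy),
    alpha = +1 reduces to T² (maximal performance).
    """
    return power_w ** (1 - alpha) * time_s ** (1 + alpha)


def pick_setpoint(setpoints, alpha):
    """Return the frequency whose predicted (power, time) minimises eta."""
    return min(setpoints, key=lambda s: eta(s[1], s[2], alpha))[0]


# Hypothetical predictions: (frequency in MHz, power in W, time in s).
SETPOINTS = [(200, 0.3, 4.0), (400, 0.5, 2.0), (800, 1.5, 1.1)]

for alpha in (-1, 0, 1):
    print(alpha, pick_setpoint(SETPOINTS, alpha))
```

A single tunable parameter lets the policy drift from performance towards power saving, for example as the battery depletes.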

Multi-Tasking Workload
[Plot: curves labelled CPU-bound and memory-bound]

> Implementation of power model and policies
• once for platform vs once for each guest
• no guest has global view, hypervisor does
• integration with other cores
DSPs, baseband processor
• policy-mechanism separation
Why do it outside the OS?

> Controls all resources
• CPU, memory, devices
> De-privileged guest OSes
• execute in user mode
• prevents interference
with hypervisor
with other guests
• ensures hypervisor retains control over resources
The Hypervisor

> Subsystems compete for it
> Cannot let subsystems manage it
• just as with memory, CPU
> Needs trusted, central authority
> Needs to be done in virtualization layer
Energy is a Global Resource

> Mechanisms in hypervisor
> Policies in isolated management module
> Keep hypervisor policy-free
• HW-like
Policy-Mechanism Separation
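
The separation can be pictured as two pieces of code that only meet at a narrow interface. This is an illustrative sketch with hypothetical names (`Mechanism`, `low_power_policy`); it does not reflect the actual OKL4 interfaces.

```python
class Mechanism:
    """Stands in for the hypervisor side: exposes a knob, makes no decisions."""

    def __init__(self, freqs_mhz):
        self.freqs_mhz = sorted(freqs_mhz)
        self.current_mhz = self.freqs_mhz[-1]

    def set_frequency(self, mhz):
        if mhz not in self.freqs_mhz:
            raise ValueError(f"unsupported setpoint: {mhz}")
        self.current_mhz = mhz


def low_power_policy(freqs_mhz, battery_fraction):
    """Stands in for the isolated management module.

    Assumed rule: aim for minimal power when the battery is low,
    otherwise for maximal performance.
    """
    return min(freqs_mhz) if battery_fraction < 0.2 else max(freqs_mhz)


mechanism = Mechanism([200, 400, 800])
mechanism.set_frequency(low_power_policy(mechanism.freqs_mhz, battery_fraction=0.1))
print(mechanism.current_mhz)
```

Replacing `low_power_policy` with another function changes behaviour without touching the mechanism, which is the point of keeping the hypervisor policy-free.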

> Additional degree of freedom
• DVFS + sleep states + core shutdown
• Hypervisor supports transparent, temporary consolidation of cores
• Unneeded cores turned off to reduce power
> Different tradeoffs
• Performance vs power close to linear
> Important to manage cores globally
• On average more cores off than with per-guest management
• Can use deeper sleep state
• Less overall energy use
Enter Multicore
[Diagram: OKL4 Microvisor hosting Subsystem #1 and Subsystem #2; four VCPUs; four physical CPUs]
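
The benefit of managing cores globally can be sketched as a packing problem: place VCPU loads on as few physical cores as possible and switch the rest off. The first-fit-decreasing heuristic and the load figures are assumptions chosen for illustration, not the microvisor's algorithm.

```python
def consolidate(vcpu_loads, capacity=1.0):
    """Pack VCPU loads onto cores using first-fit decreasing.

    Returns a list of cores, each a list of the loads placed on it.
    Loads are fractions of one core's capacity.
    """
    cores = []
    for load in sorted(vcpu_loads, reverse=True):
        for core in cores:
            if sum(core) + load <= capacity:
                core.append(load)
                break
        else:
            cores.append([load])   # no existing core has room
    return cores


# Two guests with two lightly loaded VCPUs each (made-up numbers).
GUEST_1 = [0.30, 0.20]
GUEST_2 = [0.25, 0.15]

per_guest_cores = len(consolidate(GUEST_1)) + len(consolidate(GUEST_2))
global_cores = len(consolidate(GUEST_1 + GUEST_2))
print("cores on, per-guest management:", per_guest_cores)
print("cores on, global management:   ", global_cores)
```

Here each guest on its own keeps one core busy, while a global view fits all four VCPUs onto a single core, leaving more cores available for a deep sleep state.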

> Cache coherency couples clock frequencies of multiple cores
> OSes running on different cores cannot adjust clock independently
> Requires entity with global view
Enter Multicore: Architectural Constraints
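
When cores share a clock, some entity has to turn several per-guest requests into one setting. A minimal sketch of one possible arbitration rule follows: pick the lowest available setpoint that satisfies the most demanding request. The rule, the guest names and the frequencies are all assumptions.

```python
def shared_frequency(requests_mhz, available_mhz):
    """Resolve per-guest frequency requests for a shared clock domain.

    Returns the lowest available setpoint that is at least the highest
    request, or the maximum setpoint if no setpoint is high enough.
    """
    needed = max(requests_mhz.values())
    for freq in sorted(available_mhz):
        if freq >= needed:
            return freq
    return max(available_mhz)


REQUESTS = {"guest_a": 300, "guest_b": 550}   # hypothetical guests
print(shared_frequency(REQUESTS, available_mhz=[200, 400, 600, 800]))
```

No single guest can compute this value, because each sees only its own request; the arbitration has to happen in a layer that sees all of them.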

> Cores have same ISA but different clock rates
> Hypervisor can determine optimal mapping of subsystems to cores
• Using same infrastructure as for DVFS
• Integrate with temporary core consolidation
Asymmetric Multicore
[Diagram: OKL4 Microvisor mapping a CPU-bound subsystem and a memory-bound subsystem, via VCPUs, onto fast and slow CPUs]
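
A mapping decision for asymmetric cores could reuse the memory-boundedness estimate from the DVFS model. The sketch below simply gives the fast cores to the least memory-bound subsystems; the ranking rule, the subsystem names and the fractions are hypothetical.

```python
def map_to_cores(mem_bound_fraction, fast_cores, slow_cores):
    """Assign subsystems to cores, most CPU-bound first onto fast cores.

    mem_bound_fraction maps a subsystem name to its estimated share of
    time stalled on memory. Assumes there are at least as many cores as
    subsystems; extra subsystems would be left unmapped.
    """
    ranked = sorted(mem_bound_fraction, key=mem_bound_fraction.get)
    cores = list(fast_cores) + list(slow_cores)
    return dict(zip(ranked, cores))


ESTIMATES = {"browser": 0.7, "video_decoder": 0.1}   # made-up values
print(map_to_cores(ESTIMATES, fast_cores=["fast0"], slow_cores=["slow0"]))
```

A memory-bound subsystem loses little by running on the slow core, so this placement gives up little performance while the fast core goes to the code that can use it.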

> Move subsystems between cores
• including temporary consolidation of different subsystems on common core
> Architectural inter-core dependencies
• cannot manage core clocks independently
> Requires global control
• ... outside individual OSes
• indirection layer between OS and hardware
> No practical alternative to virtualization!
The Future is Multicore

> Virtualization is unavoidable long-term
> ... but provides other benefits short-term
> Early uptake maximises benefits
> Future-proof your designs!
Summary
