OK Labs - Virtualization as the Nexus of Multicore Power Management

ARM TechCon Session "Virtualization as the Nexus of Multicore Power Management"

Thursday, November 11, 2010

Adoption of multicore technology for desktop, data center, and embedded designs responds to comparable needs: to scale compute capacity without stepping up system clocks, and to attain more MIPS per watt for devices and applications. Multicore for the desktop and data center enjoys mature support from deployed OSes. Even as embedded OSes become more adept at running on multicore CPUs, applications and middleware still face challenges of thread safety, concurrency, and load balancing. Mobile virtualization is a means to get maximum value from multicore ARM designs, at both the architectural and application levels. This session examines multicore use cases for virtualization and how virtualization brings superior CPU utilization, greater security, smoother legacy migration, and smarter energy management to multicore designs.

Published in: Technology, Business

    1. November 9-11, 2010, The Santa Clara Convention Center, www.armtechcon.com
    2. Energy Management for Mobile Devices: Power to the Microvisor!
    3. Overview
       > Energy management
       > Virtualization basics
       > Enter multicore
       > Summary
    4. Energy in Mobile Devices
       > Device uses energy
         • Drains battery
       > Goal of energy management:
         • Maximize battery life
    5. Energy-Management Mechanisms: DVFS
       > Dynamic voltage and frequency scaling
       > CMOS power consumption:
         • $P = P_{dyn} + P_{stat}$
         • $P_{dyn} \propto f V^2$
         • $V_{min} \propto f$ (very approximately)
       > Assuming execution time $T \propto 1/f$:
         • $E_{dyn} = P_{dyn} T \propto f V^2 / f = V^2 \propto f^2$
         • lower frequency ⇒ lower dynamic energy
    6. Mechanisms: Sleep States
       > When CPU is idle, turn clock off
         • $P_{dyn} = 0 \Rightarrow P = P_{stat}$
       > Sleep states reduce power further:
         • $P_{sleep} < P_{stat}$
       > Typically have multiple sleep states
         • shallow sleep states save some energy but are fast to enter/exit
         • deep sleep states save more energy but lose state and are expensive to enter/exit
       > Complex tradeoff
    7. Popular Approach: Lowest Frequency
       > $E_{dyn} \propto f^2$ ⇒ lowest frequency is best
       > Ignores static energy!
         • $E = E_{dyn} + E_{stat}$
         • $E_{dyn} \propto f^2$
         • $E_{stat} = P_{stat} T \propto 1/f$
       > Low $f$ increases execution time ⇒ $E_{stat}$ increases at low $f$!
    8. Other Approach: “Race to Halt”
       > Run at maximum $f$, then go to sleep
         • Tries to minimize static power, but:
         • dynamic power isn’t irrelevant (yet)
         • $T \propto 1/f$ isn’t correct either: it ignores memory!
       > Effect of memory stalls
         • $T = T_{CPU} + T_{mem}$
         • $T_{CPU} \propto 1/f$
         • $T_{mem} = \mathrm{const}$
         • $E_{stat} \propto T = a/f + \mathrm{const}$ (with $a$ the proportionality constant of $T_{CPU}$)
       > Ignores sleep energy!
    9. Other Approach: “Race to Halt” (2)
       > Run at maximum $f$, then go to sleep
       > Earlier completion ⇒ longer sleep
         • $E = E_{dyn} + E_{stat} + E_{sleep}$
         • $E_{sleep} = P_{sleep} T_{sleep}$
         • $T_{sleep} = T_0 - T$
         • $E_{sleep} = P_{sleep} (T_0 - T)$
       > Still ignores dynamic energy!
    10. Real Data: Execution Time [plot: memory-bound and CPU-bound workloads]
    11. Real Data: Total Energy (Measured) [plot: CPU-bound and memory-bound workloads, compared with the naïve model]
    12. Real Data: Including Sleep Energy [plot: high-power and low-power sleep states]
    13. Summary: Energy-Management Basics
        > Energy management is complex!
        > Optimal setting depends on:
          • Workload
            – memory-bound vs CPU-bound vs in-between
          • Hardware platform
            – static vs dynamic energy
            – CPU vs memory power
            – depth of sleep states and cost of entering
        > Simple models don’t work!
    14. Characterizing Workloads
        > How to establish memory-boundedness?
        > Easy way out: pre-characterization
          • measure behavior off-line
          • determine optimal power setting by model or trial-and-error
        > Ok-ish for pre-defined workloads
        > Unsuitable for open systems
          • ... such as phones
        > Tricky with apps which change behavior
    15. Better Way: On-Line Characterization
        > Need to observe app and adjust setting
          • works for any app
          • adjusts to changing behavior
        > Solution by [Snowdon et al., EuroSys’09]
        > Performance counters are your friends!
          • e.g. cache misses indicate memory access
        > Can systematically select best counters
          • build model of platform
            – linear combination of performance-counter readings
          • pre-characterize hardware
          • pick counters which provide most accurate model
            – using sound statistical methods
    16. On-Line Characterization & Modeling
        > Model predicts energy consumption and relative execution speed
          • at present setpoint
          • at different setpoints
        > Accurately predicts energy and performance response to DVFS
          • within a few %
        > Can use this for informed energy-management decisions
    17. Accuracy of Approach [plot: memory-bound and CPU-bound workloads]
    18. Effect on Energy [plot: CPU-bound and memory-bound workloads]
    19. Energy Management Policies
        > What is “best”?
          • Maximal performance?
          • Minimal energy?
          • Minimal power?
        > Depends...
        > May change
          • battery depletes
        > Need flexible policies
        [Diagram: workload statistics, workload prediction, energy/performance models, candidate setpoints, QoS info, selection policy, setting]
    20. Generalized Energy-Delay Policy
    21. Generalized Energy-Delay Policy [plot: performance and energy for CPU-bound and memory-bound workloads]
    22. Multi-Tasking Workload [plot: CPU-bound and memory-bound workloads]
    23. Why do it outside the OS?
        > Implementation of power model and policies
          • once for platform vs once for each guest
          • no guest has global view, hypervisor does
          • integration with other cores: DSPs, baseband processor
          • policy-mechanism separation
    24. The Hypervisor
        > Controls all resources
          • CPU, memory, devices
        > De-privileged guest OSes
          • execute in user mode
          • prevents interference with hypervisor and with other guests
          • ensures hypervisor retains control over resources
    25. Energy is a Global Resource
        > Subsystems compete for it
        > Cannot let subsystems manage it
          • just as with memory, CPU
        > Needs trusted, central authority
        > Needs to be done in virtualization layer
    26. Policy-Mechanism Separation
        > Mechanisms in hypervisor
        > Policies in isolated management module
        > Keep hypervisor policy-free
          • HW-like
    27. Enter Multicore
        > Additional degree of freedom
          • DVFS + sleep states + core shutdown
          • Hypervisor supports transparent, temporary consolidation of cores
          • Unneeded cores turned off to reduce power
        > Different tradeoffs
          • Performance vs power close to linear
        > Important to manage cores globally
          • On average more cores off than with per-guest management
          • Can use deeper sleep state
          • Less overall energy use
        [Diagram: OKL4 Microvisor mapping the VCPUs of Subsystem #1 and Subsystem #2 onto physical CPUs]
    28. Enter Multicore: Architectural Constraints
        > Cache coherency couples clock frequencies of multiple cores
        > OSes running on different cores cannot adjust clock independently
        > Requires entity with global view
    29. Asymmetric Multicore
        > Cores have same ISA but different clock rates
        > Hypervisor can determine optimal mapping of subsystems to cores
          • Using same infrastructure as for DVFS
          • Integrate with temporary core consolidation
        [Diagram: OKL4 Microvisor placing the CPU-bound subsystem on the fast CPU and the memory-bound subsystem on the slow CPU]
    30. The Future is Multicore
        > Move subsystems between cores
          • including temporary consolidation of different subsystems on common core
        > Architectural inter-core dependencies
          • cannot manage core clocks independently
        > Requires global control
          • ... outside individual OSes
          • indirection layer between OS and hardware
        > No practical alternative to virtualization!
        [Diagram: OKL4 Microvisor mapping the VCPUs of Subsystem #1 and Subsystem #2 onto physical CPUs]
    31. Summary
        > Virtualization is unavoidable long-term
        > ... but provides other benefits short-term
        > Early uptake maximises benefits
        > Future-proof your designs!
    32. Thank You!
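
The energy model built up across slides 5 to 9 (E = Edyn + Estat + Esleep, with T = TCPU + Tmem) can be tried out numerically. The Python sketch below is only an illustration under stated assumptions: the function names, the constants k_dyn, p_stat and p_sleep, and the list of setpoints are all hypothetical and not taken from the talk, voltage is assumed to scale exactly with frequency, and dynamic power is charged for the whole execution time T as on slide 5.

```python
# Toy version of the energy model from slides 5-9.
# All constants are illustrative assumptions, not measurements from the talk.

def total_energy(f, cpu_time, t_mem, t0, k_dyn=1.0, p_stat=0.3, p_sleep=0.05):
    """Energy (arbitrary units) to run one job at frequency f, then sleep.

    f        -- clock frequency relative to the maximum (0 < f <= 1)
    cpu_time -- CPU-limited run time at f = 1 (T_CPU is proportional to 1/f)
    t_mem    -- memory stall time, independent of f (T_mem = const)
    t0       -- length of the period; the device sleeps for t0 - T
    """
    t = cpu_time / f + t_mem          # T = T_CPU + T_mem
    if t > t0:
        return float("inf")           # job does not finish within the period
    v = f                             # assumption: V_min proportional to f
    e_dyn = k_dyn * f * v ** 2 * t    # E_dyn = P_dyn * T, P_dyn ~ f * V^2
    e_stat = p_stat * t               # E_stat = P_stat * T
    e_sleep = p_sleep * (t0 - t)      # E_sleep = P_sleep * (T_0 - T)
    return e_dyn + e_stat + e_sleep


def best_setpoint(setpoints, cpu_time, t_mem, t0):
    """Return the frequency setpoint with the lowest total energy."""
    return min(setpoints, key=lambda f: total_energy(f, cpu_time, t_mem, t0))


if __name__ == "__main__":
    setpoints = [0.25, 0.5, 0.75, 1.0]   # hypothetical DVFS setpoints
    for name, cpu_time, t_mem in [("CPU-bound", 1.0, 0.0),
                                  ("memory-bound", 0.2, 0.8)]:
        f = best_setpoint(setpoints, cpu_time, t_mem, t0=5.0)
        print(name, "best relative frequency:", f)
```

With these made-up constants, the CPU-bound job is cheapest at an intermediate setpoint while the memory-bound job is cheapest at the lowest one, so neither "lowest frequency" nor "race to halt" wins for both workloads, which is the point the slides make about simple models.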
