Transcript

  • 1. Power Management in Embedded SOCs An overview of our research efforts in power, performance management in embedded computing systems (on chip) Rajesh K. Gupta Center for Embedded Computer Systems Department of Information & Computer Science University of California, Irvine. http://www.cecs.uci.edu/~rgupta
  • 2. Focus • Our focus is on – system-level power management: » pretty much everything other than the process, devices and circuits » architecture, compiler, operating system, middleware, communication, network protocols – “management” of power and energy » techniques that achieve specific functionality, performance goals within available (and dynamically changing) power, energy constraints – targeted at “On-chip Computing & Networked Systems” » systems with interesting network interfaces, computation and communication capabilities.
  • 3. Ongoing Projects • Ongoing projects – COPPER: Compiler-controlled power/performance optimization » compiler smarts (with Alex Nicolau, Nikil Dutt supported by DARPA) » architectural smarts (supported by DARPA) – FMPOWER: Formal methods in power modeling and optimization » what are the fundamental limits on the “goodness” of specific power management strategies (SRC, NSF) • applies competitive analysis and computational learning theory – PADS: Power-aware design in a sensor network environment » Intra-nodal and network-level power modeling and management (with M. Srivastava supported by DARPA) – OSDPM: Operating system strategies to manage power » supported by UC CoRe, NSF
  • 4. Power Aware API for Efficient Power/Performance Management Rajesh Gupta
  • 5. Motivation • Power management is an important component of overall power minimization in computing systems. • The run-time system (OS) is a good place to make tradeoffs between real-time performance, accuracy and power consumption due to its knowledge about the whole system state.
  • 6. Goal • Provide ways by which Application, Operating System and Hardware can exchange energy/power and performance related information efficiently. • Facilitate continuous dialogue / adaptation between OS and applications. • Facilitate the implementation of power-aware OS services by providing a software interface to low power devices – A power-aware API to the end user that enables one to implement energy-efficient RTOS services and applications
  • 7. Software Architecture • Consists of two levels of software – between RTOS and underlying hardware – between Application and RTOS [Diagram layers: Application; PA-API; PA-Middleware; POSIX; PA-OSL; Operating System with Modified OS Services; PA-HAL (Hardware Abstraction Layer); Hardware]
  • 8. Software Architecture • PA-API - Power aware function calls available to the application writer. – Some functions of this layer are specific to certain scheduling techniques. • PA-Middleware - Power aware services – implemented on the top of the OS (power management threads, data handling, etc...). • POSIX - Standard interface for OS system calls. – This isolates PA-API and PA-Middleware from OS. • PA-OSL - Power aware OS layer. – Calls related to modified OS services should go through this level. Also isolates OS from PA-API and PA-Middleware. • PA-HAL - Power Aware Hardware Abstraction Layer. – Isolates OS from underlying power aware hardware. • Modified OS services – Implementation / modification of OS services in a power related fashion. Ex: scheduler, memory manager, I/O, etc.
  • 9. Power-aware API Requirements • Independent of Hardware and RTOS implementations – enables its use in different hardware platforms » for this all routines should access the HAL (Hardware Abstraction Layer) rather than the Hardware directly – enables its use in different RTOS as well as its use with different scheduling strategies » do not count on specific RTOS info and/or specific schedulers • Services provided – processor frequency scaling and low-power state transitions » with costs of making such transitions – battery status (if the system is battery based) – appropriate routines to control energy-speed and energy-accuracy knobs available on I/O devices: » network interface, serial interface, LCD, etc.
  • 10. Power-aware API The applications interface provides the following services: • The application is able to – tell RT information to OS (period, deadlines, WCET, hardness) – create new threads – tell OS time predicted to finish a given task instance » depending on the conditions of the environment (application dependent and not yet implemented) • OS must be able to predict and tell applications the time estimated to finish the task – depends on the scheduling scheme used • A hard task must be killed if its deadline is missed.
  • 11. Current Status • API specification available from – http://www.ics.uci.edu/~cpereira/pads/ • Implementation – eCos RTOS: » open source, object-oriented and highly configurable RTOS (by means of a scripting language) – Hardware platforms we are currently working with: » Linux-synthetic (emulation of eCos over Linux, for debugging purposes only) » Compaq iPaq Pocket PC - StrongARM SA1110 based platform » Accelent IDP (Integrated Development Environment) - also StrongARM SA1110 based. » LRH Intel evaluation board 80200EVB - Intel XScale based
  • 12. DPM Algorithms Implemented • A predictive RMS low-power scheduling – It validates the power-aware API implementation » assumes periodic tasks and 'deadline = period' – The predictive scheduler implementation is divided as follows: » tables and variables manipulation » admission control and static slow down factor » dynamic slow down factor computation (time prediction) » deadline management (hard deadline tasks) – The processor frequency and voltage are scaled according to the time predicted by the OS – The application can also predict the execution time in order to enhance accuracy.
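The admission-control and static slow-down step listed on this slide can be sketched as follows. This is only an illustration: the slides do not give the formula, so the sketch assumes the Liu-Layland utilization bound as the admission test and takes the static slow-down factor to be the ratio of task-set utilization to that bound. The names `Task`, `rms_bound` and `static_slowdown` are hypothetical and are not part of the PA-API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    wcet: float    # worst-case execution time at full speed
    period: float  # period, assumed equal to the deadline


def rms_bound(n: int) -> float:
    """Liu-Layland utilization bound for n periodic tasks under RMS."""
    return n * (2 ** (1.0 / n) - 1)


def static_slowdown(tasks: List[Task]) -> Optional[float]:
    """Return a speed fraction in (0, 1], or None if admission fails.

    Assumption: running at speed s stretches every WCET by 1/s, so the
    (non-empty) task set stays schedulable while utilization / s <= bound.
    """
    utilization = sum(t.wcet / t.period for t in tasks)
    bound = rms_bound(len(tasks))
    if utilization > bound:
        return None  # rejected by admission control
    return utilization / bound
```

A dynamic slow-down factor would then refine this value at run time from the predicted execution times; that part is not sketched here.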
  • 13. Implementation • eCos Implementation: – All timing-related information is kept inside the eCos kernel in tables – Some eCos classes were extended with new members in order to access the tables efficiently – The code is inserted in the eCos kernel source code by means of symbol definitions » enables automatic kernel synthesis of code • Plans: – Extend eCos classes by means of inheritance instead of just adding code to them – Have a tool to generate the implementation of different scheduling schemes automatically (using the API) – API implementation on Embedded Linux
  • 14. Implementation [Photos: 80200EVB with voltage scaling board and the host system; Compaq iPAQ running eCos; Maxim board for voltage scaling]
  • 15. Experiments - XScale Processor: Frequency/Voltage versus Current (all measurements executing a busy loop) [Plot: current (mA) versus frequency (MHz), one curve for varying voltage and one for fixed voltage (1.5 V)]
      Operating points for the varying-voltage curve:
      Frequency (MHz)   Voltage (V)
      333               1.0
      400               1.1
      466               1.2
      533               1.25
      600               1.3
      666               1.4
      733               1.5
  • 16. Software/Static Optimizations for Power
  • 17. Software Power Optimizations • Code running on CPU – Code optimizations for power • Code accessing memory objects – SW optimizations for memory • Compiler-supported power management – Dynamic power/performance management [Diagram: CPU, ASIC, Cache, I/O, Memory]
  • 18. The COPPER Project Compiler-controlled Power-Performance Management • Develop efficient architectural support and compiler techniques for power management – continuously -- as an application runs – targeted for high performance/VLIW machines • Coordinated management of multiple techniques – reduction in power with little or no loss of performance. • Develop techniques for dynamic compilation to actively trade off performance and power consumption • Develop a retargetable, ADL-based, power-aware system simulation capability. With co-PIs: A. Nicolau, N. Dutt, A. Veidenbaum
  • 19. Approach • Compiler Strategies for Power Management – Compiler-directed architectural “configuration” » generate “configuration code” embedded in the application » code “adapts” to new architectural organization at runtime • JIT vs multi-version compilation techniques • dynamic, on-demand optimization – Code annotation for dynamic compilation » trade-off compilation overhead for quality of generated code – Power-use Estimation for Compiler Control » static analysis to select “optimal” configuration » profile-based selection techniques » static or dynamic prediction methods
  • 20. COPPER Framework [Block diagram with these labels: Application; Compiler (gcc); Code Versions; Cycle-by-Cycle Performance Simulator; Performance Estimate; Hardware Access Counts; Parameterizable Cycle-Level Power Models; Power Simulator; Power Estimate; Power Profiler; Power Scheduler; Available Power; Chosen Code Version; Hardware Config]
  • 21. Baseline Architecture • A MIPS R10K like processor – 4-wide issue, out-of-order (OOO) processor » 5-stage pipeline: fetch, dispatch, issue, writeback, commit – 32b integers, 64b f.p. numbers – register files: 32 integer and 32 FP registers – 32K L1 instruction cache, 32K L1 data cache » 32B L1 line size – 512K L2 unified cache » 64B L2 line size – 2 int ALUs, 1 FP adder, 1 FP multiplier – 512-entry BTB, 2K entry branch predictor
  • 22. Power/Performance “Knobs” Explored • Memory hierarchy • Instruction issue logic & issue width for VLIW machines • Dynamic Register File Reconfiguration • Frequency and Voltage scaling
  • 23. Dynamic Register Reconfiguration • Compiler generates different code versions – Code versions have different ILP, register need… • Power-performance profiling compiler decides on best code version • Compiler generates function code annotation – carrying the chosen code version – carrying the number of registers needed for each code version • At function calls, the run-time scheduler selects code version and adjusts register file size accordingly – all based on code annotation information
  • 24. Power Scheduling Heuristic • Select the code version dissipating below and closest to the limit • Switch to the selected version and reconfigure the register file • Invoke every N cycles to continuously track energy use
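The selection step of this heuristic can be sketched in a few lines. The tuple layout, the version names and the power figures are illustrative assumptions, and "below the limit" is read here as "at or below"; none of this comes from the COPPER implementation.

```python
from typing import Optional, Sequence, Tuple

# (version name, estimated power, registers needed); all values hypothetical
CodeVersion = Tuple[str, float, int]


def select_version(versions: Sequence[CodeVersion],
                   limit: float) -> Optional[CodeVersion]:
    """Pick the version whose power is at or below the limit and closest to it."""
    eligible = [v for v in versions if v[1] <= limit]
    if not eligible:
        return None  # no version fits the current power budget
    return max(eligible, key=lambda v: v[1])
```

The register count carried in the chosen tuple is what a run-time scheduler would use to resize the register file after switching versions.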
  • 25. Power Management Through DRR Register file power management without performance degradation
  • 26. Unconstrained DRR • Register file power management with performance degradation
  • 27. Frequency and Voltage Scaling • Code profiled for 4 clock frequency/voltage scaling configurations
  • 28. Power Management by F/V Scaling • 4 available versions (600 MHz at 2.2 V; 500 MHz at 2.0 V; 400 MHz at 1.8 V; 300 MHz at 1.6 V)
  • 29. Timing Constraints • We consider timing constraints as bounds on operation intervals – upper and lower bounds – (determination of optimum interval separation possible statically) • Time constraints specified via checkpoints – User-defined checkpoints are inserted in the source code and time constraints between checkpoints are defined.
  • 30. Constrained Dynamic Frequency & Voltage Scaling • Specify time and energy constraints – energy constraints specified via estimation of the varying power available throughout the whole program execution • Power-performance profiling compiler – estimates max energy/cycle ratio and cycle count between checkpoints • Run-time scheduler – Calculates the run-time frequency limit based on available power and the energy profile between the current checkpoint and all possible next checkpoints – Calculates the optimal target frequency based on both the time constraints and the run-time frequency limit between the current checkpoint and all possible next checkpoints – The final target frequency is selected so that the code runs as slowly as possible within the imposed time constraints.
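The final selection rule of the run-time scheduler can be sketched as below. The frequency set reuses the four settings named on the F/V scaling slide; the function name, its arguments and the single-segment view (one cycle count and one maximum time between two checkpoints) are simplifying assumptions, since the scheduler described above considers all possible next checkpoints.

```python
from typing import Optional, Sequence

FREQS_MHZ = (300, 400, 500, 600)  # the four profiled F/V settings


def target_frequency(cycles: int, max_time_ns: float, freq_limit_mhz: float,
                     freqs: Sequence[int] = FREQS_MHZ) -> Optional[int]:
    """Slowest frequency that meets the deadline without exceeding the limit."""
    for f in sorted(freqs):
        time_ns = cycles * 1000.0 / f  # one cycle at f MHz lasts 1000/f ns
        if f <= freq_limit_mhz and time_ns <= max_time_ns:
            return f
    return None  # time and energy constraints cannot both be met
```

For example, 2000 cycles with a 6000 ns upper bound and a 600 MHz limit select 400 MHz, because 300 MHz would take about 6667 ns.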
  • 31. Register usage during execution [Plot: code version and number of registers versus time (0 to 300 ms) for the paraffins benchmark, titled “Register File/Code Versioning and Clock Frequency/Voltage Power Management”, with checkpoint labels along the top]
  • 32. Results (Preliminary) [Plot: frequency limit and chosen frequency (MHz) versus time (0 to 300 ms) for the paraffins benchmark, titled “Register File/Code Versioning and Clock Frequency/Voltage Power Management”, with checkpoint labels along the top] Freq limit: max. allowed freq using energy constraints. Target frequency chosen based on time and energy constraints.
      Start checkp   End checkp   MinTime (ns)   MaxTime (ns)
      1              2            4000           4000
      2              3            8000           8000
      3              4            1000           1000
      4              4            6000           30000
      4              5            40000          60000
      5              6            70000          70000

      Time      Power
      2000      1.5
      27000     1.24
      54000     0.98
      81000     0.73
      109000    0.28
      136000    0.72
      163000    0.95
      190000    1.15
      218000    1.38
      245000    0.365
      272000    0.85
      300000    0.95
  • 33. (Explanation) • Green line – Maximum allowed freq calculated using energy constraints • Blue lines – Program checkpoints • Red line – Target freq chosen by the dynamic scheduler, respecting time constraints and allowing the program to run as slowly as possible to save power – Freq value = 0 means extra delay was inserted to satisfy minimum time constraints between checkpoints in the simulation
  • 34. Combined Register Reconfiguration, F&V Scaling [Plot: power consumption and predicted power profile versus time (0 to 300 ms) for the paraffins benchmark, titled “Register File/Code Versioning and Clock Frequency/Voltage Power Management”, with checkpoint labels along the top]
  • 35. Summary • While average power reduction is important, effective control of dynamic power consumption is essential – especially for software management of power and performance • The hard problem here is – identification of effective architectural mechanisms and their deterministic control through software • COPPER approach – use architectural features common to a range of processor architectures » memory hierarchy, register files, instruction issue. – Coordinate with technology and OS strategies » frequency and voltage scaling.
  • 36. Power Management in Networked Systems Rajesh Gupta
  • 37. Motivation • Networked SOCs are finding use far beyond traditional desktop machines – PDA, wireless pads, wireless sensors, cell phones,… – efficient power use is crucial to portability, reliability and thermal management • Energy and power usage of these devices is markedly different from laptop and notebook computers – much wider dynamic range of power demands – increasing share of memory, communication and signal processing (as opposed to disk and displays) – multiple power use modalities depending upon application » “immortal”, “paging-mode RX”, “lifeline TX”, “mission mode” Design of power-aware higher layer applications, protocols, OS services.
  • 38. Where Does Power Go? [Block diagram with these groups: Peripherals (disk, display); Processing (programmable µPs & DSPs for apps, protocols etc., ASICs, memory); Communication (radio modem with baseband DSP, RF transceiver); Power Supply (battery, DC-DC converter)] Communication power depends on signaling protocols, choice of modulation, TX/RX architecture, RF/IF circuits
  • 39. Example: Computer with Wireless NIC [Pie chart of power use: Display 36%; CPU/Memory 21%; Hard Drive 18%; Wireless LAN 18%; Other 7%]
  • 40. Berkeley InfoPad [Two pie charts of the power breakdown across DC/DC, µProc., I/O, Wireless, LCD, Video Display and Misc: with optional video display, total = 9.6 W; without optional video display, total = 6.8 W; both with the processor at 7% duty cycle]
  • 41. Rockwell WINS Node • Capabilities: vibration, acoustic, accelerometer, magnetometer, temperature sensing [Block diagram: Communication Subsystem (GPS, Radio Modem, Micro Controller); Rest of the Node (CPU, Sensor)]
      Processor   Seismic Sensor   Radio            Power (mW)
      Active      On               Rx               751.6
      Active      On               Idle             727.5
      Active      On               Sleep            416.3
      Active      On               Removed          383.3
      Active      Removed          Removed          360.0
      Active      On               Tx (36.3 mW)     1080.5
                                   Tx (27.5 mW)     1033.3
                                   Tx (19.1 mW)     986.0
                                   Tx (13.8 mW)     942.6
                                   Tx (10.0 mW)     910.9
                                   Tx (3.47 mW)     815.5
                                   Tx (2.51 mW)     807.5
                                   Tx (1.78 mW)     799.5
                                   Tx (1.32 mW)     791.5
                                   Tx (0.955 mW)    787.5
                                   Tx (0.437 mW)    775.5
                                   Tx (0.302 mW)    773.9
                                   Tx (0.229 mW)    772.7
                                   Tx (0.158 mW)    771.5
                                   Tx (0.117 mW)    771.1
      Summary • Processor = 360 mW – doing repeated transmit/receive • Sensor = 23 mW • Processor : Tx = 1 : 2 • Processor : Rx = 1 : 1 • Total Tx : Rx = 4 : 3 at maximum range
  • 42. Power Management in Wireless NES • Power modes: transmit, receive, idle, sleep, off – typically idle mode (ready but neither receiving nor transmitting) takes about as much power as receive mode – transmit power in WLAN is 2x to 3x the receive power » difference larger in WWAN (RF power dominates) » RF transmit power can often be varied, and with it the NIC's transmit-mode power, but at the cost of a varying BER – transition times may be significant » HP's HSDL-1001 IR transceiver takes 10 µs to enter sleep, and 40 µs to wake up » Wavelan takes about 100 ms to wake up » Ricochet takes about 5 s to wake up • Shutdown strategies similar to disks and CPUs – sleep <-> wakeup transition times are much shorter than in disks – could be done by MAC protocols, e.g., 802.11x
  • 43. Help From Upper Layers • Minimizing idle times matters the most – other factors secondary, such as specific protocol • Energy-efficient MAC – reduce time radio is in TX » (minimize collisions, use polling, slot reservation, …) – reduce time radio is in RX » (minimize listening for packets to arrive, broadcast schedule, …) – reduce TX-RX, ON/OFF turnaround; “voluntary” sleep • Transport – don’t leave the RX idle while there is congestion in the network • Data scheduling – coordinate data delivery to RX in bursts • Software control of NI for application-level optimizations
  • 44. Architectural Strategies • Example: radio often simply relays packets in multihop network • Traditional approach: main CPU woken up, packets sent to it across serial bus – power hungry computing and communication operations • Our approach: exploit programmable micro-controller in the Communication Subsystem to handle common cases of packet routing – can also do operations such as combining of packets with redundant information [Diagram: in the traditional approach a multihop packet passes through the Communication Subsystem (GPS, Radio Modem, Micro Controller) to the Rest of the Node (CPU, Sensor); in our approach the packet is handled in the Communication Subsystem while the rest of the node sleeps]
  • 45. Packet Routing [Diagram: Communication Subsystem (GPS, Radio Modem, Micro Controller) with a Packet Classifier and a Packet Modifier driven by Application-Defined Matching Rules & Actions] • Packet-classifier and packet-modifier driven by application defined matching rules and actions – Matching rules: and/or expressions using =, <, >, range operators on arbitrary packet fields (offset, length) – Actions: accept, forward, drop, field increment/decrement etc. • Rules and actions operate on arbitrary packet fields (any layer) – fields specified as (offset, length) – only simple, common cases handled at the radio » for complex cases packet sent to the main processor • Expressiveness: implemented the following as test cases – Node ID-based addressing and routing (IP-like) – Point-cast (send to a circular area specified as destination) • Current proof-of-concept prototype being done on Rockwell node
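A minimal sketch of such a classifier is given below. It is not the prototype's rule language: the `Rule` type, the big-endian field decoding, the first-match policy and the `"to_cpu"` default are all assumptions made for illustration, and only single comparisons are shown (no and/or composition or field modification).

```python
import operator
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    offset: int                     # byte offset of the field in the packet
    length: int                     # field length in bytes
    op: Callable[[int, int], bool]  # comparison, e.g. operator.eq or operator.lt
    value: int                      # constant the field is compared against
    action: str                     # "accept", "forward" or "drop"


def classify(packet: bytes, rules: List[Rule], default: str = "to_cpu") -> str:
    """Return the action of the first matching rule, else hand off to the CPU."""
    for r in rules:
        field = packet[r.offset:r.offset + r.length]
        if len(field) < r.length:
            continue  # packet too short for this rule
        # Assumption: fields are unsigned big-endian integers.
        if r.op(int.from_bytes(field, "big"), r.value):
            return r.action
    return default
```

Falling through to `"to_cpu"` mirrors the slide's point that only simple, common cases are handled at the radio and complex ones go to the main processor.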
  • 46. Coordinated Power/Performance Management PM coordinated across communication and computation tasks particularly for sensor applications (in collaboration with M. Srivastava)
  • 47. Canonical Power Management [Diagram layers: Hardware Resource; Power vs. Performance Control Knob; Power-aware Resource Manager; OS/Middleware/Application]
  • 48. Performance Impact • Observation: – Tuning power-performance control knobs often has a time and/or energy cost • Question: – How effective is a specific power-aware resource management policy in terms of energy saved and timing constraints violated? [Diagram layers: Hardware Resource; Power vs. Performance Control Knob; Power-aware Resource Manager; OS/Middleware/Application]
  • 49. Power Management in Networked Systems [Diagram: Computation Subsystem with control knob (e.g. Dynamic Voltage/Freq. Scaling) and Power-aware Task Scheduling; Communication Subsystem with control knob (?) and Power-aware Packet Scheduling (?); both under OS/Middleware/Application]
  • 50. Our Focus: Wirelessly Networked Systems • Why? Energy/Bit >> Energy/Op [Diagram: Computation Subsystem with control knob (e.g. Dynamic Voltage/Freq. Scaling) and Power-aware Task Scheduling; Radio with control knob (?) and Power-aware Packet Scheduling (?); both under OS/Middleware/Application]
  • 51. Our Research Agenda • How to power manage the communication subsystem? – What are the power-performance control knobs for communication subsystems such as radios? – How to use these knobs to do power-aware packet scheduling? • How to coordinate the power management of computing and communication subsystems in a networked system? • How to determine the effectiveness of various adaptive and non- adaptive power management policies? – How much energy is saved? – What is the penalty (slowdown, timing violations)? • Develop design-time power management analysis and synthesis tools, and run-time power management framework.
  • 52. Understanding Radio Power Consumption [Diagram: packet, transmit processing and transmit amplifier on the sender; distance d; receive processing and packet on the receiver] $E_{bit}(Tx) = (P_{RF} + P_E(Tx)) \cdot T_{bit}$, where $P_{RF}$ depends on the distance $d$; $E_{bit}(Rx) = P_E(Rx) \cdot T_{bit}$ • Observations – The RF power dominates over the electronic power associated with transmit or receive processing, except for short distances (5-10 m). – The RF power is not helped by voltage scaling – Much of the electronic power is in RF circuits, and that too is not helped by low power techniques for digital circuits
  • 53. Power Management in Radios • We can of course shutdown the radios just as we shutdown digital hardware… – Problem: how to wake up a remote radio? “wakeup paradox” • Another possibility: can we do a power-speed trade-off as in digital hardware? – Slowdown is better than shutdown in digital circuits (for small leakage currents) – Dynamic Voltage/Frequency Scaling (DVS) » exploits convex-shaped P vs. S curve » better than shutting down or selecting a fixed optimum voltage » power-aware task schedulers exploit DVS Question: are there energy-speed control knobs in radios that are analogous to DVS?
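The "slowdown is better than shutdown" point for digital circuits can be illustrated with a small calculation. The sketch assumes the standard CMOS switching-energy model (energy per cycle proportional to C·V²), zero leakage, and free transitions; the two operating points are taken from the F/V scaling slide, and the workload size is arbitrary.

```python
def dynamic_energy(cycles: float, voltage: float, capacitance: float = 1.0) -> float:
    """Switching energy for a fixed cycle count: E = C * V^2 * cycles."""
    return capacitance * voltage ** 2 * cycles


cycles = 3.0e8  # work that fills 1 s at 300 MHz, or 0.5 s at 600 MHz

# Option 1: run at 600 MHz / 2.2 V for 0.5 s, then shut down (zero power assumed).
fast_then_off = dynamic_energy(cycles, 2.2)

# Option 2: run at 300 MHz / 1.6 V for the whole second.
slow = dynamic_energy(cycles, 1.6)

saving = 1.0 - slow / fast_then_off  # fraction of energy saved by slowing down
```

Under these assumptions the slow option uses (1.6/2.2)² of the energy, a saving of roughly 47%, which is the convexity that power-aware task schedulers exploit.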
  • 54. Yes! • Two that we have formulated and studied – Dynamic Modulation Scaling (DMS) – Dynamic Code Scaling (DCS) • Hybrids also possible • They need to be integrated with higher layer power management policies – DMS and DCS integrated with packet schedulers, akin to DVS integrated with OS schedulers for CPUs • DMS and DCS are analogous to DVS, but many crucial differences – e.g. packets are sent non-preemptively, their settings cannot be changed in the middle etc.
  • 55. Modulation • Modulation codes bits into channel symbols, which correspond to different waveforms • Modulation level b = # of bits in one symbol – determines the # of possible waveforms • Modulation level affects – time to communicate a bit – energy to communicate a bit (for given error probability) • DMS exploits this!
  • 56. Principle of DMS [Plot: energy per bit (Ebit) versus time per bit (Tbit) with operating points b = 2, 4, 6, comparing shutdown and slowdown for L bits (L·Ebit versus L·Tbit)] • The energy-delay curve is convex – Slowing down is more energy efficient than shutting down • For energy efficiency, operate as slowly as possible
  • 57. Definition of DMS • Adapt the modulation level b on the fly to minimize the energy while satisfying performance constraints • Analogous to dynamic voltage scaling in digital circuits, where the operating voltage is varied to trade off speed versus energy [Plots: energy E versus delay td for varying voltage V; Ebit versus Tbit for modulation levels b = 2, 4, 6]
  • 58. Analogy with DVS • Scaling modulation on the fly results in energy awareness • Strong analogy between modulation scaling and voltage scaling – Low power techniques, like parallelism – Packet scheduling like task scheduling – Other power management techniques
      Voltage scaling   Modulation scaling
      V                 b
      f                 Rb
      Pswitching        Ptransmit
      Pleakage          Pelectronics
      Eoperation        Ebit
  • 59. Another Knob: DCS • Data is coded by introducing redundancy so that the receiver can correct errors • Code rate = ratio of the size of the original data to the size of the coded data • Reducing code rate (i.e. stronger code) – Longer delay – Reduced energy • Energy-delay trade-off much like DVS and DMS
  • 60. Energy-delay Trade-off for a Code [Plot: energy per bit Ebit versus time per bit Tbit]
  • 61. How to exploit DMS and DCS? • Packet scheduler in medium access control protocol decides when to send a packet • With DMS/DCS it can also decide, at what speed • Akin to integration of DVS in task schedulers in OSs with some crucial differences – Packets are sent non-preemptively – DMS/DCS settings can’t be changed in the middle of a packet – Receiver needs to know DMS/DCS settings – Optimal DMS/DCS settings also change as the condition of the communication channel changes (which is unpredictable) » Like doing DVS on a CPU whose MIPS is time-varying!
  • 62. Example: Queue-based DMS • Adapt modulation based on number of packets in the queue • Different {queue, b}-settings make the system operate at different points in the energy-delay curve [Diagram: Processor, Queue, Radio, R-DPM; plot: average energy Eav versus average delay Tav]
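One way to picture a {queue, b}-setting is as a threshold table that maps queue occupancy to a modulation level. The thresholds below are invented for illustration; only the levels b = 2, 4, 6 echo the earlier DMS slides, and the function name is hypothetical.

```python
from typing import Sequence, Tuple

# (minimum queue length, modulation level b); thresholds are illustrative only
DEFAULT_TABLE: Tuple[Tuple[int, int], ...] = ((0, 2), (4, 4), (8, 6))


def modulation_level(queue_len: int,
                     table: Sequence[Tuple[int, int]] = DEFAULT_TABLE) -> int:
    """Return b for the highest threshold that the queue length has reached."""
    ordered = sorted(table)
    b = ordered[0][1]
    for threshold, level in ordered:
        if queue_len >= threshold:
            b = level
    return b
```

A short queue keeps b low (slow, energy-efficient transmission), while a long queue raises b to drain the backlog faster. The level is chosen per packet, since settings cannot change in the middle of a packet.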
  • 63. Effectiveness of Power Management • Model – Functionality is composed of individual tasks – Each task dissipates power to service requests that arrive over time – Inter-arrival time of requests is unknown – Requests are of different sizes and must be served in the order received – Task can choose to move to power minimizing states • DPM is an on-line problem – input sequence is received dynamically during run-time – characteristics of the input sequence are not known – no algorithm that solves the problem can make static decisions about the input • Competitive analysis provides a framework for understanding online strategies
  • 64. Competitive Analysis • Strategy S has a Competitive Ratio r – if, for every input sequence $\sigma$, $C_S(\sigma) \le r \cdot C_{opt}(\sigma)$ • Think of a 2-player game against a malicious adversary – Adversary creates an input sequence dynamically – Adversary knows the strategy, and creates inputs that make the strategy perform as non-optimally as possible – Adversary is aware of the move of the strategy against every input • Competitive Ratio akin to Complexity Lower Bounds – Represents the worst possible scenario
  • 65. DPM Bounds: 2-state case • DPM strategies can be non-adaptive or adaptive • 2-state Non-Adaptive – CR = 2 – 1/k, where k is the discretized break-even time $t_k = E_r / P_i$ – It is also shown that this bound is tight » No non-adaptive strategy has a better CR • 2-State Adaptive (shutdown after variable time) – CR = e/(e-1) ~ 1.6 – No adaptive algorithm has a better CR
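The 2 - 1/k figure can be checked by brute force under one assumed discretization, which the slide itself does not spell out: each idle clock costs 1 unit, waking from sleep costs k units, and the non-adaptive strategy sleeps once k - 1 idle clocks have passed. The function names are hypothetical.

```python
def online_cost(idle: int, k: int) -> int:
    """Idle for up to k-1 clocks at 1 unit each, then sleep; waking costs k."""
    return idle if idle < k else (k - 1) + k


def offline_cost(idle: int, k: int) -> int:
    """With the idle length known in advance: stay idle, or sleep at once."""
    return min(idle, k)


def worst_ratio(k: int, horizon: int = 1000) -> float:
    """Largest online/offline cost ratio over idle periods of 1..horizon clocks."""
    return max(online_cost(t, k) / offline_cost(t, k)
               for t in range(1, horizon + 1))
```

In this model the worst case is an idle period of exactly k clocks, where the online strategy pays 2k - 1 against an offline cost of k.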
  • 66. Latency Consideration • If there was no DPM implemented – System would be always in active or idle state – A new job arrival at idle state, will be scheduled immediately – If the system goes to sleep or other low-power states » A latency is incurred due to power management • Affects the response time for the requests » 2-state case, an upper bound on this extraneous latency • Increase in latency is upper bounded by the “system recovery time due to PM”
  • 67. Multi-State DPM: CR Bound • Let there be k+1 states – Let State 0 be the shut-down state and State k be the active state – Let $\alpha_i$ be the power dissipation rate at state i – Let $\beta_i$ be the total energy dissipated to move back to State k – States are ordered such that $\alpha_i \le \alpha_{i+1}$ – $\alpha_0 = 0$ (without loss of generality) • Assume – Power down energy cost can be incorporated in the power up cost for analysis – Idle time duration unknown
  • 68. Lower Envelope Algorithm [Plot: energy versus time with one line per state (State 0 to State 3); the lower envelope changes state at times t1, t2, t3]
  • 69. Deterministic Algorithm (LEA) • With $\alpha_i$ the power dissipation rate of state i and $\beta_i$ the energy to move back to the active state, the optimal cost for an idle interval of length t is $E_L(t) = \min_i \{\alpha_i t + \beta_i\}$ • LEA will remain in state j as long as $\alpha_j t + \beta_j = \min_i \{\alpha_i t + \beta_i\}$ – The system will remain along the lower envelope of the curve. • The Lower Envelope Algorithm is 2-competitive. – The algorithm assumes nothing about the input pattern, hence a ratio of 2 is very good. • This ratio can be improved by considering the input distribution – Which can be learned on-line
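A minimal numerical sketch of the lower envelope follows. Each state is described by a power rate and a wake-up energy (called alpha and beta here); the three example states, the step size and the cost accounting (power integrated along the envelope plus the wake-up energy of the final state) are illustrative assumptions rather than values from the slides.

```python
from typing import Sequence, Tuple

State = Tuple[float, float]  # (alpha: power rate, beta: energy to return to active)


def lea_state(t: float, states: Sequence[State]) -> int:
    """Index of the state that lies on the lower envelope at idle time t."""
    return min(range(len(states)), key=lambda i: states[i][0] * t + states[i][1])


def envelope(t: float, states: Sequence[State]) -> float:
    """Offline-optimal energy for an idle period of length t."""
    return min(a * t + b for a, b in states)


def lea_cost(idle: float, states: Sequence[State], dt: float = 0.001) -> float:
    """Energy LEA spends: power along the envelope plus the final wake-up."""
    steps = int(round(idle / dt))
    energy = sum(states[lea_state(i * dt, states)][0] * dt for i in range(steps))
    return energy + states[lea_state(idle, states)][1]


# State 0 = shut-down, last state = active, as in the multi-state model.
EXAMPLE: Tuple[State, ...] = ((0.0, 5.0), (0.5, 2.0), (1.0, 0.0))
```

With these example states the envelope switches at t = 4 and t = 6; for a 10-unit idle period LEA spends about 10 units against an offline optimum of 5, so the ratio stays at the factor of 2.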
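  • 70. Probability Based Approach • Suppose we know the probability distribution of the idle time – Assume the p.d.f. to be $\pi(t)$, generated using online learning (not covered here) • In the 2-state case, let $\tau$ be the time to transition to the sleep state, $\alpha$ the idle power and $\beta$ the energy to return from sleep – Then the expected energy dissipation is $\int_0^{\tau} \alpha t \, \pi(t) \, dt + \int_{\tau}^{\infty} [\alpha \tau + \beta] \, \pi(t) \, dt$ • The offline algorithm uses the threshold $\beta/\alpha$, which gives the minimum expected energy $\int_0^{\beta/\alpha} \alpha t \, \pi(t) \, dt + \int_{\beta/\alpha}^{\infty} \beta \, \pi(t) \, dt$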
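  • 71. Probability based Approach • So the Competitive Ratio is $\min_{\tau} \left\{ \int_0^{\tau} \alpha t \, \pi(t) \, dt + \int_{\tau}^{\infty} [\alpha \tau + \beta] \, \pi(t) \, dt \right\} \Big/ \left( \int_0^{\beta/\alpha} \alpha t \, \pi(t) \, dt + \int_{\beta/\alpha}^{\infty} \beta \, \pi(t) \, dt \right)$, with $\pi(t)$ the idle-time p.d.f., $\tau$ the sleep threshold, $\alpha$ the idle power and $\beta$ the wake-up energy – It can be shown that this is <= e/(e-1) • Extended to the multi-state case – The algorithm determines k thresholds $\tau_i$ – At $\tau_i$ the algorithm transitions from state i to i-1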
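For the 2-state case, the threshold choice can be sketched with an empirical distribution standing in for the learned p.d.f. This is an assumption made for illustration: the slides obtain the distribution by online learning, which is not covered, whereas the sketch simply averages over observed idle times. Here `alpha` is the idle power and `beta` the wake-up energy, and the function names are hypothetical.

```python
from typing import List, Sequence


def expected_cost(tau: float, samples: Sequence[float],
                  alpha: float, beta: float) -> float:
    """Mean energy when the system sleeps after tau idle time units."""
    total = 0.0
    for t in samples:
        # Idle periods ending by tau cost alpha*t; longer ones pay the
        # idle energy up to tau plus the wake-up energy.
        total += alpha * t if t <= tau else alpha * tau + beta
    return total / len(samples)


def best_threshold(samples: Sequence[float], alpha: float, beta: float) -> float:
    """Threshold minimizing the empirical expected cost.

    Between two observed idle times the cost only grows with tau, so it is
    enough to try 0 and the observed idle times themselves.
    """
    candidates: List[float] = [0.0] + sorted(samples)
    return min(candidates, key=lambda tau: expected_cost(tau, samples, alpha, beta))
```

When most idle periods are short the threshold settles just past them, and when all are long it drops to zero (sleep immediately).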
  • 72. Experimental Evaluation [Bar chart: Comparison of Energy Dissipation between the Deterministic and Probability Based Strategies; energy in joules for traces 1, 3, 5, 7, 9, 11, 13, 15; series: Deterministic online, Optimal offline, Probability Based online]
  • 73. Competitive Ratio Comparison [Bar chart: average competitive ratio (CR, axis 0 to 2.5) for the traces t6.H1074.idle, t6.H2012.idle, t6.H2014.idle, t6.H2029.idle, t6.H2149.idle, t6.H2207.idle, t6.H2217.idle, t6.H3069.idle, t6.H3113.idle, t6.H4008.idle, t6.H4058.idle, t6.H4119.idle, t6.H4127.idle, t6.H4181.idle]
  • 74. Comparison with other DPM Strategies [Bar chart of the values below, axis 0 to 2500]
      Strategy   CR          Latency
      Optimal    1000.0000   0
      DET        1470.0000   870.91
      OPBA       1010.0000   1332.41
      LAST:P     1900.0000   1010.87
      LAST:NP    1090.0000   2183.92
      TREE:P     2250.0000   1285.01
      TREE:NP    1050.0000   2239.57
      EXP:P      1250.0000   1080.16
      EXP:NP     1040.0000   1897.29
  • 75. The Adversary • The adversary is the “worst case” scenario generator, • In adversarial analysis (competitive analysis) one assumes that – The adversary knows the future of the input sequence – Hence it can take a strategy to power down the system based on its future knowledge – That makes the adversary minimize power dissipation • We use non-determinism to make an adversary all powerful – If the adversary is allowed to choose not to go to sleep state non-deterministically, that is sufficient to allow it to minimize power dissipation – When asked to prove a bound, if the bound does not hold, the model checker will generate a sequence showing adversary’s schedule that falsifies the bound.
  • 76. Automatic Adversary Construction • Model the strategy as a finite state transition system • Model task arrivals at every clock by a Boolean variable “task”, » If “task” is 1 during a clock, the system is either working on a task or a new task arrived, if task = 0 then the system is idle » If task = 0, the system goes into “idle state” and dissipates 1 unit of power, and if the number of such consecutive intervals is greater than k, then system goes into “sleep” state, where it does not dissipate power • Model the adversary, by making it a non-deterministic machine » adversary non-deterministically decides if it wants to go to “sleep” state » Non-determinism makes the adversary all powerful to employ any strategy to minimize its power consumption
  • 77. Example: A Non-Adaptive Power Management Strategy • Strategy NON-ADAPTIVE shuts down after k idle system clocks, where k is the ratio of power needed to come back from sleep state to power dissipated during “idle” system state • Algorithm Non-Adaptive:
      if (idle_state) then idle_intervals = idle_intervals + 1;
      if (idle_intervals ≥ k) then go to sleep_state (zero power dissipation)
    • Property proven is: prop5: assert(G(((steps < STEPS)) -> (power <= 2*power2))) • We have proved a “small model theorem” that shows that bounding the number of steps is not unsafe.
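The strategy can be written as a small cycle-level state machine. The sketch reads the sleep condition as "idle_intervals has reached k" and resets the counter on every task; both readings are assumptions, as is the value k = 5 used in the note below. The function name is hypothetical.

```python
from typing import Sequence


def non_adaptive_states(tasks: Sequence[int], k: int) -> str:
    """Map per-clock task flags to states: B = busy, I = idle, S = sleep."""
    states = []
    idle_intervals = 0
    for task in tasks:
        if task:
            idle_intervals = 0  # a task resets the idle counter
            states.append("B")
        else:
            idle_intervals += 1
            # Sleep once k consecutive task-free clocks have been seen.
            states.append("S" if idle_intervals >= k else "I")
    return "".join(states)
```

With k = 5 and tasks arriving in cycles 3 to 5, 11 and 17 of a 22-cycle run, the sketch yields the sequence IIBBBIIIISBIIIISBIIIIS, which agrees with the NA row of the counterexample on slide 79.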
  • 78. SMV model checker based analysis • SMV Code
      module strategy(task){
        /* declarations */
        input task : boolean;
        steps : 0..STEPS;
        clock : 0..k;
        predict : boolean;
        mode : {sleep, busy, idle};    /* mode under NON-ADAPTIVE */
        mode2 : {sleep, busy, idle};   /* mode under adversary */
        power2 : 0..MAXINT;            /* power wasted by adversary */
        power : 0..MAXINT;             /* power wasted by NON-ADAPTIVE */

        /* initialization */
        init(mode) := idle;
        init(clock) := 0;
        init(power) := 0;
        init(steps) := 0;
        init(power2) := 0;
        init(mode2) := idle;

        if (steps <= (STEPS - 1)) next(steps) := steps + 1;
        else next(steps) := STEPS;

        /* if no task at this cycle (adversary may decide not to power down based on "predict") */
        if (~task){
          if ((clock = k-1)) next(mode) := sleep;
          else next(mode) := idle;
          if ((clock = k-1) & predict) next(mode2) := sleep;
          else next(mode2) := idle;
          next(power2) := power2 + 1;
          next(power) := power + 1;
          next(clock) := (clock + 1) mod k;
        }
        /* sleep to busy when task arrives */
        else {
          if (mode = sleep) {
            next(power) := power + k;
          }
          if (mode2 = sleep) {
            next(power2) := power2 + k;
          }
          next(mode2) := busy;
          next(mode) := busy;
          next(clock) := 0;
        }
      }
  • 79. Counter Example to See the Adversarial Strategy • If we try to prove a bound of 1.5 with respect to the adversary, we get the following 22-cycle counterexample, which shows how the adversary makes NON-ADAPTIVE lose more power
      Cycle      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
      NA state   I  I  B  B  B  I  I  I  I  S  B  I  I  I  I  S  B  I  I  I  I  S
      AD state   I  I  B  B  B  I  I  I  I  I  B  I  I  I  I  I  B  I  I  I  I  S
      I = idle, B = busy, S = sleep, AD = adversary, NA = non-adaptive strategy
    • POC implemented using SMV – Future plans include parametric and probabilistic model checking techniques.
  • 80. Current Limitations and Plans • Current implementation – POC implementation using SMV model checker – excessive memory hog: capacity limited by available memory (256 MB RAM used); we had to bound the model checking to 500 clock cycles – currently use fixed values of k and a discretized power ratio. • Plans – “parametric” model checking to enable variable k – integrate model checking into a simulation framework for RTOS.
  • 81. Summary • Accomplishments – Development of measures to quantify the effectiveness of system-level power management algorithms, and methods to analytically determine these measures » Competitive Ratio, a ratio of the power reduction by a given algorithm compared to that by an optimal offline algorithm • gives analytic bounds on achievable power reduction » Tighter bounds based on a model-checking strategy » Developed Java-based experimental tool interface at http://www.ics.uci.edu/~osdpm – Identification and characterization of control knobs for power management of radios » Developed and characterized two radio-level control knobs for power-performance scaling • “Dynamic Modulation Scaling” (DMS) • “Dynamic Code Scaling” (DCS) » Analogous to the Dynamic Voltage Scaling control-knob in digital computing circuits