
System-wide Energy Optimization for Multiple DVS Components and Real-time Tasks



This presentation covers system-wide energy optimization for multiple DVS components and real-time tasks. It describes a realistic energy model for embedded systems and presents a method to optimize energy consumption. It was presented at ECRTS 2010.

Published in: Technology

  1. 1. System-wide Energy Optimization for Multiple DVS Components and Real-time Tasks<br />Heechul Yun, Po-Liang Wu, Anshu Arya, Tarek Abdelzaher, Cheolgi Kim, and Lui Sha<br />
  2. 2. DVS in Real-time Systems<br />The Goal<br />Minimize energy consumption by adjusting frequency and voltage while still meeting deadlines<br />Most work considers the CPU only<br />Assumes execution time depends only on CPU frequency<br />But memory and bus are also important<br />They affect execution time (e.g., a memory-intensive application slows down if the memory or bus is slow)<br />They consume considerable energy (the same order of magnitude as the CPU)<br />They are DVS-capable in many recent embedded processors<br />
  3. 3. Motivation<br />Memxfer5b: memory benchmark program<br />CPU clock halved<br />Execution time increased by only 3%<br />Energy saved: 30%<br />
  4. 4. Motivation<br />Dhrystone: CPU benchmark program<br />Memory clock halved<br />Execution time increased by only 0.05%<br />Energy saved: 10%<br />
  5. 5. Contents<br />Motivation<br />Energy Model<br />Considers CPU, bus, and memory, and task characteristics<br />Evaluation (model validation)<br />Energy Optimization of Real-time Tasks<br />Static multi-DVS problem and solution<br />Evaluation<br />Conclusion<br />
  6. 6. Task Model<br />[Figure: power vs. time plots of a task, split into computation and memory fetch (cache stall)]<br />Task = Computation + Memory fetch<br />
  7. 7. Task Model (2)<br />[Figure: power vs. time plots of the C and M phases, including one with a lower CPU frequency and one with a lower memory frequency]<br />C: computation<br />M: off-chip memory fetch (cache-stall cycles)<br />
  8. 8. Task Model (3)<br />Execution time of a task: e = C/fc + M/fm<br />C: CPU cycles of a given task<br />M: memory cycles of a given task<br />fc: CPU clock frequency<br />fm: memory clock frequency<br />
  9. 9. Power Model<br />Power of a component (e.g., CPU)<br />k: capacitance constant<br />f: frequency of the component<br />V: supply voltage<br />R: leakage power<br />Different k for different modes:<br />kactive: active-mode capacitance<br />kstandby: standby-mode capacitance<br />
  10. 10. Energy Model<br />[Figure: power vs. time over one period P, showing the pure computation block, the memory fetch (cache stall) block, and the idle block; e marks the execution time]<br />Total system energy is the sum of the energies of these blocks<br />
  11. 11. Pure Computation Block<br />[Figure: power vs. time over one period P with the pure computation block highlighted: CPU active, bus and memory in standby, above system static power]<br />kca: capacitance constant for active CPU<br />kbs: capacitance constant for standby bus<br />kms: capacitance constant for standby memory<br />R: system-wide static power consumption<br />
  12. 12. Memory Fetch Block<br />[Figure: power vs. time over one period P with the memory fetch block highlighted: CPU in standby, bus and memory active, above system static power]<br />kcs: capacitance constant for standby CPU<br />kba: capacitance constant for active bus<br />kma: capacitance constant for active memory<br />
  13. 13. Idle Block<br />[Figure: power vs. time over one period P with the idle block highlighted: CPU, bus, and memory idle, above system static power]<br />I: idle-mode power consumption<br />e: execution time (C/fc + M/fm)<br />
  14. 14. Energy Model Summary<br />[Figure: power vs. time over one period P, annotated with Ecpu (pure exec block: CPU active; bus and memory standby), Emem (memory fetch block: CPU standby; bus and memory active), and Eidle (idle block: CPU, bus, and memory idle), above system static power]<br />System-wide energy model<br />Considers CPU, bus, and memory power consumption<br />Considers active, standby, and idle modes<br />Other components are assumed to be static (included in R)<br />
  15. 15. Energy Equation<br />System-wide energy consumption of a task during period P, with terms for the CPU block, the memory block, and the idle block<br />
  16. 16. Evaluation Platform<br />[Figure: board block diagram with labels: power supply, multi-meter, STMP3650 SoC, ARM926 (8K-I, 8K-D), PSRAM (256KB), system bus, external peripherals (flash, LCD, external DRAM, …)]<br />
  17. 17. Evaluation Platform (2)<br />ARM9-based SoC<br />CPU: up to 200 MHz; bus: up to 100 MHz<br />CPU and bus are synchronous (bus = CPU/N)<br />Memory (PSRAM) frequency equals the system bus frequency (fb = fm)<br />CPU, bus, and memory all share a common voltage<br />Vdd: 1.504V ~ 1.804V (0.32V step)<br />Energy equation<br />V: shared voltage for CPU, bus, and memory<br />Bus and memory share one combined active constant and one combined standby constant<br />
  18. 18. Validation<br />Methodology<br />4 synthetic programs with different cache stall ratios (0%, 10%, 25%, 55%)<br />8 clock configurations (fc, fm) for each program<br />Nonlinear least-squares fit of the energy equation to all 32 data points (4 programs × 8 configurations)<br />
  19. 19. Energy Model Fitting<br />Coefficient of determination (R²) is 99.97%<br />(100% is a perfect fit)<br />
  20. 20. Energy Equation for Our Platform<br />Coefficients obtained for the energy equation<br />
  21. 21. Contents<br />Motivation<br />Energy Model<br />Considers CPU, bus, and memory, and task characteristics<br />Evaluation<br />Energy Optimization of Real-time Tasks<br />Static multi-DVS problem and optimal solution<br />Evaluation<br />Conclusion<br />
  22. 22. Static Multi-DVS Problem<br />Given a set of periodic real-time tasks (T1, …, Tn), where each task invocation requires at most Ci CPU cycles and at most Mi memory cycles in the worst case<br />Find the energy-optimal static frequencies for multiple DVS-capable components (CPU, bus, and memory)<br />
  23. 23. Problem Formulation<br />Minimize<br />Subject to<br />where<br />H: hyper period<br />ei: execution time of task i<br />Ecomp,i: computation block energy of task i<br />Emem,i: cache stall block energy of task i<br />Eidle: idle block energy<br />
  24. 24. Optimal Solution<br />Intuitive procedure<br />Find an unconstrained minimum over fc and fm (fb = fm)<br />Check boundary conditions due to system-specific constraints (e.g., minimum and maximum clock range)<br />Details are in the paper<br />
  25. 25. Energy Plot<br />[Figure: energy plot over fc (MHz) and fm (MHz) with the deadline boundary marked; blue: less energy, red: more energy]<br />Task set: CH = 140×10^6, MH = 30×10^6, H = 3s<br />
  26. 26. Evaluation<br />Compare the following schemes:<br />MAX: CPU and memory are both set to maximum frequency<br />CPU-only static DVS: memory frequency is set to maximum<br />Baseline static multi-DVS: CPU and memory frequencies change proportionally<br />Optimal static multi-DVS: the proposed scheme<br />Optimal dynamic multi-DVS: can change frequencies each time a task is scheduled; found by brute-force search over all possible combinations<br />Simulation setup<br />Uses the energy equation obtained from measurements on our real hardware platform<br />
  27. 27. Energy vs Utilization<br />[Figure: normalized average power consumption vs. utilization]<br />Task set cache stall ratio (MH/(CH+MH)): 0.3<br />
  28. 28. Energy vs Cache Stall Ratio<br />[Figure: normalized average power consumption vs. cache stall ratio]<br />Task set utilization ratio (eH/H): 0.5<br />
  29. 29. Effect of Diversity of Cache Stall Ratio<br />[Figure: normalized energy consumption vs. diversity, from homogeneous to diverse]<br />Task set cache stall ratio: 0.45<br />Task set utilization ratio (eH/H): 0.5<br />
  30. 30. Conclusion<br />Energy model<br />Considers multiple DVS-capable components and task characteristics<br />Validated on a real hardware platform<br />Static multi-DVS problem<br />Assigns energy-optimal static frequencies to multiple DVS components for periodic real-time tasks<br />The optimal solution (static multi-DVS scheme) saves more energy than CPU-only DVS<br />
  31. 31. Thank you.<br />
  32. 32. Additional Slides<br />
  33. 33. CPU-only DVS<br />[Figure: energy (mJ) vs. fc (MHz), with the valid range (~200 MHz) marked]<br />Not effective in allowed range<br />(*) Based on the energy equation for our h/w platform. Memory clock was set to max<br />
  34. 34. Power Distribution<br />[Figure: power distribution at (cpu, bus) = (80, 80 MHz) for cache stall ratios of 55% and 10%]<br />(*) Based on the energy equation for our h/w platform: E = Ecpu + Emem + Estatic<br />
  35. 35. Active and Idle<br />[Figure: two plots of energy (mJ), one against fc (MHz) and one against fm (MHz)]<br />(*) Actual measurement result<br />
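
The problem formulation on slide 23 can be written out roughly as follows. This is only a sketch built from the symbols listed on that slide and the execution-time expression on slide 13. The per-task period P_i, the invocation count H/P_i, and the utilization-style deadline constraint are assumptions that do not appear in the slides; the paper holds the authoritative formulation.

```latex
\begin{aligned}
\min_{f_c,\, f_m} \quad
  & \sum_{i=1}^{n} \frac{H}{P_i}\,\bigl(E_{comp,i} + E_{mem,i}\bigr) + E_{idle} \\
\text{subject to} \quad
  & \sum_{i=1}^{n} \frac{e_i}{P_i} \le 1
  && \text{(assumed schedulability condition)} \\
\text{where} \quad
  & e_i = \frac{C_i}{f_c} + \frac{M_i}{f_m}
\end{aligned}
```

Under this reading, lowering either frequency stretches every e_i, so the constraint is what produces the deadline boundary drawn on the energy plot of slide 25.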
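
The energy model (slides 8 to 15) and the static frequency selection (slides 22 to 24) can be illustrated with a small Python sketch. Several things here are assumptions rather than content from the slides: each component's dynamic power is taken as k·f·V² (slide 9 lists the symbols but not the formula), bus and memory are folded into combined constants as on slide 17, the voltage V is held fixed, and all names (`Coeffs`, `exec_time`, `energy`, `best_static_frequencies`) are hypothetical. The search is a plain exhaustive scan over candidate frequencies, substituted for the analytic procedure outlined on slide 24. No fitted coefficients are included; callers must supply their own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coeffs:
    """Model constants (hypothetical container; values must come from a fit)."""
    k_ca: float    # CPU, active mode
    k_cs: float    # CPU, standby mode
    k_a: float     # bus + memory combined, active mode
    k_s: float     # bus + memory combined, standby mode
    idle: float    # I: idle-mode power
    static: float  # R: system-wide static power


def exec_time(C, M, fc, fm):
    """Execution time e = C/fc + M/fm (slide 13)."""
    return C / fc + M / fm


def energy(C, M, fc, fm, V, period, k):
    """Energy over one period: computation + memory fetch + idle + static.

    Assumes dynamic power of the form k * f * V^2 for each component.
    """
    t_comp = C / fc
    t_mem = M / fm
    e = t_comp + t_mem
    # Pure computation block: CPU active, bus/memory in standby.
    p_comp = (k.k_ca * fc + k.k_s * fm) * V ** 2
    # Memory fetch block: CPU in standby, bus/memory active.
    p_mem = (k.k_cs * fc + k.k_a * fm) * V ** 2
    return (p_comp * t_comp + p_mem * t_mem
            + k.idle * (period - e) + k.static * period)


def best_static_frequencies(C, M, period, cpu_freqs, mem_freqs, V, k):
    """Exhaustively scan (fc, fm) pairs; return (energy, fc, fm) or None.

    Pairs whose execution time exceeds the period are skipped.
    """
    best = None
    for fc in cpu_freqs:
        for fm in mem_freqs:
            if exec_time(C, M, fc, fm) > period:
                continue  # would miss the deadline
            e = energy(C, M, fc, fm, V, period, k)
            if best is None or e < best[0]:
                best = (e, fc, fm)
    return best
```

With the task set from slide 25 (CH = 140×10^6, MH = 30×10^6, H = 3 s), a call such as `best_static_frequencies(140e6, 30e6, 3.0, cpu_freqs, mem_freqs, V, k)` returns the lowest-energy feasible pair for whatever candidate lists and coefficients are passed in, or `None` if no pair meets the deadline.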