System-wide Energy Optimization for Multiple DVS Components and Real-time Tasks<br />HeechulYun, Po-Liang Wu, AnshuArya, T...
DVS in Real-time Systems<br />The Goal<br />To minimize energy consumption by adjusting freq. and voltage but still meet t...
Motivation<br />3<br />Memxfer5b : memory benchmark program<br />Half of CPU clock<br />Exec. time increased only 3%<br />...
Motivation<br />4<br />Dhrystone: CPU benchmark program<br />Half of Mem clock<br />Exec time increased only 0.05%<br />En...
Contents<br />Motivation <br />Energy Model<br />Considers CPU, BUS and Memory  and task characteristics<br />Evaluation (...
Task Model<br />6<br />computation<br />memory fetch<br />(cache stall)<br />power<br />power<br />Computation<br />Memory...
Task Model (2)<br />power<br />Lower CPU freq<br />M<br />power<br />C<br />time<br />C<br />M<br />time<br />power<br />C...
Task Model (3)<br />Execution time of a task<br />C : CPU cycles of a given task<br />M : memory cycles of a given task <b...
Power Model<br />Power of a component (i.e., CPU)<br />k : capacitance constant<br />f : frequency of the component <br />...
Energy Model<br />10<br />power<br />Pure <br />Computation<br />Memory <br />Fetch<br />(Cache stall)<br />idle<br />time...
Pure Computation Block<br />11<br />power<br />CPU active<br />Memory <br />Fetch<br />(Cache stall)<br />Bus, memstandby<...
Memory Fetch Block<br />12<br />power<br />Pure Computation<br />CPU standby<br />Bus, memactive<br />idle<br />System sta...
Idle Block<br />13<br />power<br />Pure<br />Computation<br />Memory <br />Fetch<br />(Cache stall)<br />CPU, bus, mem idl...
Energy Model Summary<br />14<br />power<br />Ecpu<br />Emem<br />Eidle<br />pure exec block<br />MEM fetch block<br />idle...
Energy Equation<br />15<br />CPU block<br />Memory block<br />Idle block<br />System-wide  energy consumption of a task du...
Power supply<br />ARM926<br />PSRAM<br />(256KB)<br />8K-I<br />8K-D<br />System bus<br />STMP3650 SoC<br />External perip...
Evaluation Platform (2)<br />ARM9 based SoC<br />CPU : up to 200Mhz, BUS : up to 100Mhz <br />CPU and BUS are synchronous ...
Validation<br />Methodology<br />4 synthetic programs with different cache stall ratio (0%, 10%, 25%, 55%) <br />8 clock c...
Energy Model Fitting<br />19<br />Coefficient of determination(R2) is 99.97%<br />(100% is a perfect fit)<br />
Energy Equation for Our Platform<br />20<br />Obtained coefficients in the energy equation<br />
Contents<br />Motivation <br />Energy Model<br />Considers CPU, BUS and Memory  and task characteristics<br />Evaluation<b...
Static Multi-DVS Problem<br />Given a set of periodic real-time tasks (T1, …,Tn), where each task invocation requires up t...
Problem Formulation<br />Minimize<br />Subjects to<br />where<br />23<br />H : hyper period<br />ei : execution time of ta...
Optimal Solution<br />Intuitive procedure  <br />Find an unconstrained minimal over fc and fm (fb= fm)<br />Check boundary...
Energy Plot<br />25<br />Blue : less energy<br />Red : more energy<br />fm(MHz)<br />Deadlineboundary<br />fc(MHz)<br />Ta...
Evaluation<br />Compare the following schemes: <br />MAX<br />CPU and memory are all set to maximum.<br />CPU-only static ...
Energy vs Utilization<br />27<br />Normalized average power consumption<br />utilization<br />Task set cache stall ratio (...
Energy vs Cache Stall Ratio<br />28<br />Normalized average power consumption<br />Cache stall ratio<br />Task set utiliza...
Effect of Diversity of Cache Stall Ratio<br />29<br />Normalized energy consumption<br />diversity<br />homogeneous<br />d...
Conclusion<br />Energy model <br />Considers multiple DVS capable components and task characteristic<br />Validated on a r...
Thank you.<br />31<br />
Additional Slides<br />32<br />
CPU-only DVS<br />33<br />Valid range (~200Mhz)<br />Energy<br />(mJ)<br />fc<br />(Mhz)<br />Not effective in allowed ran...
Power Distribution<br />34<br />Cache stall ratio = 55%<br />(cpu,bus)=(80,80Mhz)<br />Cache stall ratio = 10%<br />(cpu,b...
Active and Idle<br />35<br />mJ<br />mJ<br />fc<br />(Mhz)<br />fm<br />(Mhz)<br />(*) actual measurement result<br />
Upcoming SlideShare
Loading in …5
×

System-wide Energy Optimization for Multiple DVS Components and Real-time Tasks

928 views

Published on

This presentation is about system-wide energy optimization for multiple-DVS components and real-time tasks. It describes a realistic energy model for embedded systems, and present a method to optimize the energy consumption. It is presented at ECRTS 2010.

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
928
On SlideShare
0
From Embeds
0
Number of Embeds
5
Actions
Shares
0
Downloads
0
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide
  • DVS is extensively studied in real-time system. The goal is to minimize energy consumption and still meet the deadline of tasks. Most DVS schemes only considerCPU and assume execution time depends on CPU freq. In reality, there are other similarly important components such as memory and bus that affects execution time and energy consumption. Moreover, unlike desktop processors, in many embedded processors, we can control bus and memory clocks as well as cpu clock.
  • Here is one real example. We ran a memory intensive benchmark on our hardware platform and measured the power consumption of the entire system. When we lower the CPU clock to half (click), the execution time does not change much (click), only a 3% increase, instead of double the execution time. This is because most of the time, the task fetch data from memory which is independent from CPU clock speed. So, by reducing the CPU clock, we save 30% on energy consumption with only 3% of time increase.
  • In a second example, we ran a CPU benchmark program called dhrystone. Of course, if we change CPU clock to half, the execution time will double. However, if we change memory clock to half, its execution time does not change at all, because this program does not fetch data from memory much, but we save 10% on energy consumption. The amount of save energy is smaller than the previous experiment, but still significant.  These experiments shows that CPU only DVS is not enough and motivate to us develop a realistic model that can explain these behaviors.
  • Mathematically, the execution time of a task is defined in this eq. Here C is cpu cycles, M is memory cycles, fc is cpu clock, and fm is memory clock. Then the execution time e is C over fc plus M over fm.
  • And this equation is standard power equation. Here k is capacitance constant for the component and f is frequency, V is voltage, and R is leakage power of the component. that is Power consumption W equal to kfV square plus RHere, it is important to understand that if operation mode is different then k is different. For example, when CPU is busy, its power consumption is much bigger compared to when cpu is not doing anything even though its operating frequency is the same. It is because of low power h/w design such as clock gating technique reduce the number of active part of the component so that it lower its aggeregated capacitance constance.
  • Now, let’s consider a task energy consumption for a given period P. Task execution can be divided into two blocks: pure computation and cache stall which means time to fetch data from off chip memory to fill the cache line of the processor core. While memory fetch operations are scattered throughout the entire execution, we aggregate them into this single block. This is true for in-order processor in which processor must wait until data to be fetched when there is any single cache miss. Therefore there’s no overlap between CPU execution and memory fetch. This is not true for out-of-order processors because there is overlapping period due to out-of-order executions, but it is relatively small compared to long memory fetch time. When a task complete, cpu, bus and memory are all in idle state. Then, the energy consumption can be expressed as the sum of energy consumption of these three blocks.
  • Now, let’s look at the first block – pure computation block. As I said, we have three major component to consider, cpu, memory, and bus that each component is a single term in this equation.The cpu … The bus and memory It is important to note that each component is in different mode of operation. At this block, the CPU is actively executing instructions without any delay. But bus and memory are not doing anything. When a component is in different mode of operation, the capacitance constant can be quite different even if its operating frequency remain the same because of various power saving techniques, such as clock gating, used in recent hardware design. Therefore we use different capacitance constant for different for active and standby mode. Therefore, in this equation, kca is capacitance constant for active cpu, kbs is standby bus, and kms is also standby memory. R represents the sum of all static power components in the system. The execution time of this portion of the task is C over fc. So far, we described this block.
  • The next block is cache stall block. The difference, compared to pure computation block, is mode of operation of each component. In this block, CPU is do noting but wait data are being fetched from memory. Therefore, in this block, CPU is idle but bus and memory is active. Again we use different capacitance constant for each component at each mode of operation. Here, Kcs is standby cpu, kbs is active bus, kma is active memory. And the execution time of this portion is M over fm.
  • Final block is idle block. Since the task is finished, all components – cpu, bus, memory – are in idle mode. We assume there is special mode in the system that save power more aggressively, which can be found in many recent embedded processors. Therefore, we used a separate term I instead to represent the power consumption in that special mode of the system. The execution time is period minus the execution time of the task.
  • And this is the equation we derived which describes energy consumption of entire system running a periodic task.
  • In this hardware platform, we used 4 synthetic programs with different cache stall ratio (0-55%) and for each task, we used 8 different clock configurations and measured the energy consumption. After measuring total 32 data we performed non-linear-…
  • Hyper period EDF schedule (dynamic scheduling)
  • System-wide Energy Optimization for Multiple DVS Components and Real-time Tasks

    1. 1. System-wide Energy Optimization for Multiple DVS Components and Real-time Tasks<br />HeechulYun, Po-Liang Wu, AnshuArya, TarekAbdelzaher, Cheolgi Kim, and LuiSha<br />
    2. 2. DVS in Real-time Systems<br />The Goal<br />To minimize energy consumption by adjusting freq. and voltage but still meet the deadline<br />Most consider CPU only <br />Assume execution time depends on CPU freq. <br />But memory and bus are also important<br />Affect execution time (e.g., memory intensive app will be slowed if memory or bus is slow.)<br />Consume considerable energy (similar order of energy compared to CPU)<br />Are DVS capable in many recent embedded processors<br />2<br />
    3. 3. Motivation<br />3<br />Memxfer5b : memory benchmark program<br />Half of CPU clock<br />Exec. time increased only 3%<br />Energy saved 30%<br />
    4. 4. Motivation<br />4<br />Dhrystone: CPU benchmark program<br />Half of Mem clock<br />Exec time increased only 0.05%<br />Energy saved 10%<br />
    5. 5. Contents<br />Motivation <br />Energy Model<br />Considers CPU, BUS and Memory and task characteristics<br />Evaluation (Model validation)<br />Energy Optimization of Real-time Tasks<br />Static multi-DVS problem and solution<br />Evaluation <br />Conclusion<br />5<br />
    6. 6. Task Model<br />6<br />computation<br />memory fetch<br />(cache stall)<br />power<br />power<br />Computation<br />Memory<br />fetch<br />time<br />time<br />Task = Computation + Memory fetch<br />
    7. 7. Task Model (2)<br />power<br />Lower CPU freq<br />M<br />power<br />C<br />time<br />C<br />M<br />time<br />power<br />C : computation<br />M : off-chip memory fetch<br /> (cache-stall cycles)<br />C<br />Lower MEM freq<br />M<br />time<br />7<br />
    8. 8. Task Model (3)<br />Execution time of a task<br />C : CPU cycles of a given task<br />M : memory cycles of a given task <br />fc : CPU clock frequency<br />fm : Memory clock frequency <br />8<br />
    9. 9. Power Model<br />Power of a component (i.e., CPU)<br />k : capacitance constant<br />f : frequency of the component <br />V : supplying voltage <br />R : leakage power <br />9<br />Different k for different modes: kactive - active mode capacitance <br />kstandby- standby mode capacitance <br />
    10. 10. Energy Model<br />10<br />power<br />Pure <br />Computation<br />Memory <br />Fetch<br />(Cache stall)<br />idle<br />time<br />P (Period)<br />e(exec. time)<br />Total system energy is<br />
    11. 11. Pure Computation Block<br />11<br />power<br />CPU active<br />Memory <br />Fetch<br />(Cache stall)<br />Bus, memstandby<br />idle<br />System static<br />time<br />e<br />P<br />kca : capacitance constant for activecpu<br />kbs : capacitance constant for standby bus <br />kms : capacitance constant for standby memory<br />R : system wide static power consumption<br />
    12. 12. Memory Fetch Block<br />12<br />power<br />Pure Computation<br />CPU standby<br />Bus, memactive<br />idle<br />System static<br />time<br />e<br />P<br />kcs : capacitance constant for standbycpu<br />kba : capacitance constant for active bus <br />kma : capacitance constant for active memory<br />
    13. 13. Idle Block<br />13<br />power<br />Pure<br />Computation<br />Memory <br />Fetch<br />(Cache stall)<br />CPU, bus, mem idle<br />System static<br />time<br />e<br />P<br />I : idle mode power consumption. <br />e: execution time (C/fc + M/fm )<br />
    14. 14. Energy Model Summary<br />14<br />power<br />Ecpu<br />Emem<br />Eidle<br />pure exec block<br />MEM fetch block<br />idle block <br />CPU active<br />CPU standby<br />Memory <br />Fetch<br />Dynamic <br />power<br />Bus, memactive<br />Bus, memstandby<br />CPU, bus, mem idle<br />idle<br />System static<br />time<br />e<br />P<br />System wide energy model<br />Considers CPU, bus, and memory power consumption <br />Considers active, standby and idle modes<br />Other components are assumed to be static (included in R)<br />
    15. 15. Energy Equation<br />15<br />CPU block<br />Memory block<br />Idle block<br />System-wide energy consumption of a task during period P<br />
    16. 16. Power supply<br />ARM926<br />PSRAM<br />(256KB)<br />8K-I<br />8K-D<br />System bus<br />STMP3650 SoC<br />External peripherals<br /> (flash, LCD, External DRAM, …) <br />BOARD<br />16<br />Evaluation Platform<br />Multi-meter<br />
    17. 17. Evaluation Platform (2)<br />ARM9 based SoC<br />CPU : up to 200Mhz, BUS : up to 100Mhz <br />CPU and BUS are synchronous (BUS = CPU/N)<br />Memory (PSRAM) freq is equal to system bus frequency (fb=fm)<br />CPU, BUS, and memory all share the common voltage<br />Vdd : 1.504V ~ 1.804V (0.32V step)<br />Energy equation<br />V : shared voltage for CPU, bus, and memory <br /> : active bus and memory constant<br /> : standby bus and memory constant<br />17<br />
    18. 18. Validation<br />Methodology<br />4 synthetic programs with different cache stall ratio (0%, 10%, 25%, 55%) <br />8 clock configurations (fc, fm) for each program<br />Performed nonlinear least square analysis for total 32 data points against the energy equation<br />18<br />
    19. 19. Energy Model Fitting<br />19<br />Coefficient of determination(R2) is 99.97%<br />(100% is a perfect fit)<br />
    20. 20. Energy Equation for Our Platform<br />20<br />Obtained coefficients in the energy equation<br />
    21. 21. Contents<br />Motivation <br />Energy Model<br />Considers CPU, BUS and Memory and task characteristics<br />Evaluation<br />Energy Optimization of Real-time Tasks<br />Static Multi-DVS Problem and optimal solution <br />Evaluation<br />Conclusion<br />21<br />
    22. 22. Static Multi-DVS Problem<br />Given a set of periodic real-time tasks (T1, …,Tn), where each task invocation requires up to Ci CPU cycles and up to Mi memory cycles at worst. <br />Find the energy optimal static frequencies for multiple DVS capable components (CPU, bus, and memory)<br />22<br />
    23. 23. Problem Formulation<br />Minimize<br />Subjects to<br />where<br />23<br />H : hyper period<br />ei : execution time of task i<br />Ecomp,i: computation block energy of task i<br />Emem,i: cache stall block energy of task i<br />Eidle: idle block energy<br />
    24. 24. Optimal Solution<br />Intuitive procedure <br />Find an unconstrained minimal over fc and fm (fb= fm)<br />Check boundary conditions due to system specific constraints. (e.g., minimum and maximum clock range)<br />Details are in the paper<br />24<br />
    25. 25. Energy Plot<br />25<br />Blue : less energy<br />Red : more energy<br />fm(MHz)<br />Deadlineboundary<br />fc(MHz)<br />Task set : CH = 140*106, MH = 30*106 ,H = 3s<br />
    26. 26. Evaluation<br />Compare the following schemes: <br />MAX<br />CPU and memory are all set to maximum.<br />CPU-only static DVS<br />Memory frequency is set to maximum<br />Baseline static multi-DVS<br />CPU and memory frequencies change proportionally <br />Optimal static multi-DVS <br />Proposed scheme<br />Optimal dynamic multi-DVS<br />Can change frequencies at each task schedule<br />Brute force search among all the possible combination <br />Simulation setup <br />Use energy equation obtained from measurements on our real hardware platform<br />26<br />
    27. 27. Energy vs Utilization<br />27<br />Normalized average power consumption<br />utilization<br />Task set cache stall ratio (MH/(CH+MH) ):0.3<br />
    28. 28. Energy vs Cache Stall Ratio<br />28<br />Normalized average power consumption<br />Cache stall ratio<br />Task set utilization ratio(eH/H):0.5<br />
    29. 29. Effect of Diversity of Cache Stall Ratio<br />29<br />Normalized energy consumption<br />diversity<br />homogeneous<br />diverse<br />Task set cache stall ratio = 0.45, <br />Task set utilization ratio(eH/H):0.5<br />
    30. 30. Conclusion<br />Energy model <br />Considers multiple DVS capable components and task characteristic<br />Validated on a real hardware platform<br />Static multi-DVS problem <br />Assigns energy optimal static frequencies of multiple DVS components for periodic real-time tasks <br />Optimal solution (static multi-DVS scheme) shows better energy saving compared to CPU-only DVS<br />30<br />
    31. 31. Thank you.<br />31<br />
    32. 32. Additional Slides<br />32<br />
    33. 33. CPU-only DVS<br />33<br />Valid range (~200Mhz)<br />Energy<br />(mJ)<br />fc<br />(Mhz)<br />Not effective in allowed range<br />(*) based on energy equation for out h/w platform. Memory clock was set to max<br />
    34. 34. Power Distribution<br />34<br />Cache stall ratio = 55%<br />(cpu,bus)=(80,80Mhz)<br />Cache stall ratio = 10%<br />(cpu,bus)=(80,80Mhz)<br />(*) based on energy equation for our h/w platform E = Ecpu + Emem + Estatic<br />
    35. 35. Active and Idle<br />35<br />mJ<br />mJ<br />fc<br />(Mhz)<br />fm<br />(Mhz)<br />(*) actual measurement result<br />

    ×