Energy Efficiency in Large Scale Systems


June 2010 presentation by GreenLight researcher Tajana Rosing on her project research on system energy efficiency.

Published in: Technology

  • This figure shows a typical fan controller based on a classical closed-loop approach. The fan controller decides the required fan speed, and its output is fed to the actuator, which adjusts the fan speed. Feedback is collected from thermal sensors (each CPU core has a dedicated thermal sensor), and the fan speed is set in proportion to the highest temperature. <click> Cooling optimization techniques have so far focused mainly on the fan controller and have left out workload management; we show later that including workload management can yield large cooling savings.
  • Current load balancing does not consider cooling costs. <click> Read the example to the audience (stop when you reach the equation). [The figure is a visual representation of the example.] The figure shows a dual-socket case: each socket has 4 cores, and each core runs 1 thread (thr = workload thread or job). <click> Thermal imbalance leads to cooling inefficiencies because of the “cubic relation between fan speed and power”. <click> This indicates that better workload assignment can improve the thermal distribution and lower the cooling cost. The question is HOW and WHEN to schedule the workload.
  • We use the freedom to migrate workload between sockets to perform cooling-aware workload scheduling that minimizes cooling costs. <click> The good news is that the overhead of migrating threads between sockets is minor, since temperature changes slowly (on the order of seconds) compared with the migration time (on the order of microseconds). <click> This example shows a thermal imbalance between two sockets (one fan runs at high speed while the other runs at low speed). <click> The challenge is deciding which threads to migrate to get a better thermal and cooling balance. Then read the second bullet in the yellow box.
  • The question we need to answer is “when should we trigger workload rescheduling?” One way is to employ a reactive approach that acts once the system is already in a cooling-inefficient condition. The problem with this approach is that mitigating the inefficiency takes time (temperature changes slowly), which affects the cooling savings and noise and may cause instability in the fan system. <click> The alternative is a proactive approach that predicts cooling inefficiencies, avoids them at an earlier point in time, and reschedules accordingly. Quickly read the benefits in the green box. <click> Read the challenge sentence.
  • In this slide and the following one we illustrate the two fundamental ways to deliver cooling savings. This slide explains the “spreading the hot threads” concept, which obtains cooling savings by creating a better temperature distribution across the CPU sockets. This technique should be applied when the heat sink temperatures are imbalanced across the CPU sockets. To implement job spreading we can employ either job migration or swapping (read the two bullets briefly). <click> The example at the bottom shows how spreading works. On the left side we have a case of large imbalance. To resolve the imbalance we swap the hot threads (C, D) with the colder ones (W, X). The two fans now run at a moderate speed (savings are expected because of the cubic relation between fan power and speed).
  • Here we illustrate the second way to obtain cooling savings. The motivation is to concentrate more hot threads onto fewer sockets while keeping those sockets' fan speeds almost the same. We apply this method when the average temperatures across sockets are in a similar range (note that consolidation is not the opposite of spreading; it can be applied on top of it). Consolidation can be implemented in two ways. The first is to squeeze more hot jobs onto the socket whose fan is running faster than it needs to (fan speeds are discrete, e.g. 8 or 16 speeds). <click> The other way is to trade a hot thread from the socket with the lower fan speed for colder threads with a similar total power from the socket with the higher fan speed, in order to maintain temperature balance. This helps lower the fan speed of the socket that receives the cold threads while keeping the higher fan speed almost the same. The example below illustrates this case.
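
The savings from spreading hot threads follow from the cubic relation between fan speed and fan power mentioned in these notes. Below is a minimal Python sketch of that arithmetic, under assumptions that are not in the presentation: fan speed is taken to be proportional to the total power of the threads on a socket, the per-thread wattages, the 100 W normalization, and the 10 W full-speed fan power are made-up illustration values, and all function names are hypothetical.

```python
def fan_speed(socket_heat_w, max_heat_w=100.0):
    """Normalized fan speed in [0, 1]; assumed proportional to socket heat."""
    return min(socket_heat_w / max_heat_w, 1.0)


def fan_power(speed, full_speed_power_w=10.0):
    """Fan power grows with the cube of fan speed."""
    return full_speed_power_w * speed ** 3


def total_fan_power(sockets):
    """Sum fan power over sockets; each socket maps thread name -> watts."""
    return sum(fan_power(fan_speed(sum(threads.values()))) for threads in sockets)


def swap_threads(sockets, from_first, from_second):
    """Return a new two-socket layout with the named threads exchanged."""
    first, second = dict(sockets[0]), dict(sockets[1])
    for name in from_first:
        second[name] = first.pop(name)
    for name in from_second:
        first[name] = second.pop(name)
    return [first, second]


# Illustrative numbers only: four hot threads on one socket, four cold on the other.
before = [
    {"A": 20.0, "B": 20.0, "C": 20.0, "D": 20.0},
    {"W": 10.0, "X": 10.0, "Y": 10.0, "Z": 10.0},
]

# Spreading: swap hot threads C, D with colder threads W, X.
after = swap_threads(before, ["C", "D"], ["W", "X"])

print(f"before: {total_fan_power(before):.2f} W, after: {total_fan_power(after):.2f} W")
```

With these made-up numbers both sockets end up at 60 W of thread power, and the combined fan power drops from about 5.76 W to about 4.32 W, even though the total workload is unchanged. The real controller, thermal model, and scheduling policy in the project may differ from this sketch.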