Sustainable Data Centers: HP Labs



A presentation and paper delivered by HP Labs director Prith Banerjee at the 46th Design Automation Conference.

Full paper available here:

http://www.docstoc.com/docs/9173417/Sustainable-Data-Centers-Enabled-by-Supply-and-Demand-Side-Management

July 2009.

www.hplabs.com
www.twitter.com/hplabs

Speaker notes
  • The slide has all the text
  • Let us consider one example of flexible building blocks, focusing on server design for the cloud market. This picture shows where the total cost of ownership is spent in current servers. (We assume infrastructure operational costs and a three-year depreciation cycle; people costs are not considered given the market.) The right side focuses on hardware costs and the left on *burdened* power and cooling costs; each portion is further broken down into subcomponents such as CPU, memory, and so on. (Pause.) There is a lot of interesting data in this slide, but I want to highlight two key insights. First, the cost of power and cooling is roughly equal to the cost of hardware (similar to what others, including Google, have been saying). Second, and more importantly, there is no single dominating component that can give us a factor of 2X or 5X. We need a *holistic* solution! (A back-of-envelope TCO sketch follows these notes.)
  • This picture shows the holistic solution that we designed; we call our approach microblades and megaservers. There are three elements to the approach. The first (point to the figure) is a redesign of the server platform to be extremely disaggregated and efficient. In this case we replace the classical server processor with a more power-efficient, more cost-efficient embedded processor from the high-volume mobile market, for example an Atom or a mobile Core 2 Duo. The intuition is that many of these processors have significantly lower energy consumption (4X to 10X lower) but not correspondingly lower performance (only 1.5X to 2X lower), and given the throughput-centric nature of the workloads, we can use more of these cheaper, lower-power systems to get the same performance at lower power. We also simplify the rest of the components on the server as much as we can. The second element of our design is co-designed packaging and system architecture, where we do two things. The inset shows one of the ideas: we use bare packages in each microblade, put a sheet of aluminium between the microblades (rather like a sandwich), *aggregate* all the heat, and use a single heat sink at the top, which can improve cooling efficiency by about 25%. The second idea is shown by the blue and red arrows indicating airflow in the figure: we have designed a new kind of blade enclosure with a separate plenum that directs air more efficiently (vertically instead of horizontally, for example, among other things), which can further improve cooling efficiency by another 25% or more. The third element of our design is *memory disaggregation*. Just as we can think of storage area networks, what we have is memory across the network and a novel memory-hierarchy design that provides support for a remote level of memory, including optimizations to dynamically "swap in" content as appropriate. This design also enables us to inject currently more costly technologies such as non-volatile memory (or, in the future, memristors) without compromising commodity cost structures. Note that these three design aspects were *co-designed* with each other, going back to the holistic design motivation on the previous slide. (A sketch of the underlying performance, power and cost trade-off follows these notes.)
  • So how did we do? The figure shows the performance-per-dollar improvement of our design compared to current state-of-the-art servers. The x-axis shows different workloads common to the cloud market: web search, webmail (think Hotmail), video hosting (think YouTube), MapReduce running different kinds of data processing, and, as the last bar, the harmonic mean. We can see that this design does really well. On average we get a factor-of-two improvement; that is, for the same cost, we can double the performance. Interestingly, in some cases we get 400% to 600% better performance for the same cost! While I focused here on performance per TCO (total cost of ownership), we get similar results if you look at performance per watt as well. We can see that this notion of building flexible building blocks from a holistic point of view really works.
  • Application of sustainable data center principles to provision compute, power and cooling resources based on the needs of the user can dramatically reduce the cost of ownership. As an illustration, HP consolidated 14 laboratory data centres into one large site in Bangalore, India, and applied a dynamic control system that adjusts the utilisation of the air-conditioning system based on 7,500 temperature sensors deployed throughout the data centre. The 40 percent savings in cooling power consumption achieved in this facility translates to annual savings of approximately $1.2 million (U.S.) compared to the conventional approach. In addition, with 7,500 sensors gathering data every 10 seconds, we have built an enormous testbed for our knowledge-discovery research. Chandrakant can provide some anecdotes that serve as good examples if you like, e.g. "reverse outsourcing" to India by analyzing the data in the USA through knowledge-discovery techniques. (The sensing and savings arithmetic is sketched after these notes.)
  • How do we solve this problem? We call this the "no power struggles" architecture. As we mentioned earlier, the first part is a coordinated cross-layer monitoring framework. In addition to the pervasive sensing, a key challenge is to communicate information across layers. For example, the virtual machine layer needs to know what the hardware is doing, and the lowest-level hardware controller needs to know about SLAs. And we need to provide a way to communicate this without resorting to global information sharing or proprietary communication protocols that don't scale or add to cost. Our solution is something we call m-channels and m-brokers. This figure shows an example of the architecture: you can see the platform elements at the bottom and the virtualization elements at the top. We assume two key elements, shown in the shaded areas. The m-channel abstraction provides a registry and proxy service for different sensing agents and management agents to register and communicate information across layers; we leverage industry-standard protocols like CIM (the Common Information Model) to do that. The second key element, m-brokers, provides the mechanisms for coordinating the actuation. There are a lot of hard problems that need to be solved in the design of m-channels and m-brokers, including unified namespaces, multi-agent coordination, and doing all of that at scale, which I am not going to have time to talk about, but the paper has more details. (An illustrative registry/broker sketch follows these notes.)
  • Once we have the basic architecture, we need a coordination policy. The complicated (and colorful) figure summarizes our approach. The list on the right indicates the various controllers that we coordinate (using the m-broker abstractions from the last slide): an efficiency controller (think of it as classic voltage/frequency scaling), a local power capper (which caps power at a certain point), enclosure and group cappers that do the same thing at different levels, and a virtual machine controller that migrates virtual machines and turns machines off. One of the advantages of the interdisciplinary people we have in HP Labs is that we were able to walk to the next cube and talk to the control-theory folks, and you can see that at the core (in light blue) is a classical feedback controller. All we have done is redefine the computer as a container that dynamically gets resized when it is not fully utilized; in this case, CPU utilization is the measurement and voltage/frequency scaling is the resizing knob. The local capper, in red, is now implemented as a controller that changes the preference for "how full we would like the container to be." The elegance of this approach is that we minimize the communication between the two controllers and essentially overload classical control-theory terms to provide the coordination. The other controllers interface in the same way; you can see that the VM controller changes the inputs to the other controllers, for example. The advantage is that the same property of control theory that enables a controller to respond to dynamism and changing workload behaviour now lets it learn about and respond to the policies and actuations of other controllers! In addition, using control theory allows us to formally reason about stability. When you are talking to customers and you can give them a mathematical proof that the data center will not explode when they have 13 different controllers, you usually give them more confidence! (A minimal two-controller sketch follows these notes.)
  • More generally, we evaluated our approach on eight different customer traces, ranging from Web 2.0 customers to pharmaceutical, finance, and others. You can see three numbers: average power (how much of the electricity bill is saved, i.e. op-ex), peak power (how much peak provisioning can be reduced, i.e. capital expenditure), and performance. Red is the baseline and green is our new solution. We can see that green does really well: a staggering 65% reduction in electricity costs and a 20% reduction in capex, and, most importantly, with negligible change in performance. This really illustrates the value of the coordination architecture in improving efficiency while providing stability and correctness.
  • One last advantage of such a coordination architecture: when we start thinking of the solution holistically, we get a chance to rethink the individual controllers. Do we *really* need all of them? Do we *really* need all the complexity in the policies of the individual controllers, or can we make them simpler when we know that another controller is there to back them up? The paper talks about several interesting insights that we found, but here we show one example. As before, we have average power, peak power and performance, with the red baseline and the green proposed solution, but we now have two more bars, blue and yellow, corresponding to different permutations of controllers in the overall coordinated architecture. What we can see (looking at the yellow bar) is that, in this particular case, for the systems and workloads considered, a solution that emphasizes the virtual machine controller can achieve most of the benefits of the overall solution, at the expense of de-emphasizing some of the local control. More generally, such a coordination architecture allows us to get deep insights into the assignment of functionality at different layers of the stack and, going back to the original picture with the multiple verticals and horizontals, once again motivates why we want to look at this problem holistically!
  • In closing…
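
The TCO note above can be made concrete with a small back-of-envelope model. This is a minimal sketch in Python, assuming a three-year depreciation window and a burdened cost per watt that folds in power delivery and cooling; every number below is a hypothetical placeholder, not a figure from the HP chart.

```python
# Hypothetical back-of-envelope server TCO model (illustrative only).
# Assumptions: 3-year depreciation, burdened power & cooling priced per watt-year,
# people costs excluded, as in the speaker notes. All numbers are placeholders.

HW_COST = {"cpu": 600.0, "memory": 400.0, "storage": 300.0, "board_other": 200.0}  # $ per server
AVG_POWER_W = 250.0          # hypothetical average draw per server
BURDENED_RATE = 2.0          # $ per watt-year, including cooling and power-delivery overheads
YEARS = 3                    # depreciation window

hardware = sum(HW_COST.values())
power_cooling = AVG_POWER_W * BURDENED_RATE * YEARS
tco = hardware + power_cooling

for name, cost in {**HW_COST, "power+cooling": power_cooling}.items():
    print(f"{name:>13}: {cost:8.0f} $  ({100 * cost / tco:4.1f}% of TCO)")
print(f"{'total':>13}: {tco:8.0f} $")
```

With placeholder inputs of this shape, power and cooling lands in the same range as the hardware cost and no single line item dominates, which is the structural point the note makes.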
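The microblade argument in the notes is essentially a ratio argument: the embedded processors draw 4X to 10X less power but deliver only 1.5X to 2X less throughput, so for throughput-centric cloud workloads more of them can be deployed for the same budget. A minimal sketch of that trade-off, including the harmonic-mean summary used on the results slide; the per-workload ratios are hypothetical stand-ins for measured data.

```python
from statistics import harmonic_mean

# Hypothetical per-workload ratios (embedded vs. classical server processor).
# power_ratio: embedded power / server power; perf_ratio: embedded throughput / server throughput.
workloads = {
    "websearch": {"power_ratio": 1 / 6,  "perf_ratio": 1 / 2},
    "webmail":   {"power_ratio": 1 / 8,  "perf_ratio": 1 / 1.5},
    "video":     {"power_ratio": 1 / 4,  "perf_ratio": 1 / 2},
    "mapreduce": {"power_ratio": 1 / 10, "perf_ratio": 1 / 1.5},
}

improvements = []
for name, r in workloads.items():
    # Throughput per watt improves when performance falls more slowly than power.
    perf_per_watt_gain = r["perf_ratio"] / r["power_ratio"]
    improvements.append(perf_per_watt_gain)
    print(f"{name:>10}: {perf_per_watt_gain:.1f}x throughput/watt vs. classical server")

print(f"harmonic mean across workloads: {harmonic_mean(improvements):.1f}x")
```

Performance per dollar follows the same ratio structure once hardware and burdened power costs are folded into the denominator; the harmonic mean is the conventional way to summarize such ratio metrics across workloads.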
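Two pieces of arithmetic are implicit in the Bangalore note: the data volume produced by 7,500 sensors sampled every 10 seconds, and the baseline cooling bill implied by a 40% saving worth about $1.2M per year. A quick sketch; the bytes-per-reading figure is an assumption.

```python
SENSORS = 7_500
SAMPLE_PERIOD_S = 10
BYTES_PER_READING = 16          # assumption: timestamp + sensor id + value

readings_per_day = SENSORS * (24 * 3600 // SAMPLE_PERIOD_S)
print(f"readings/day : {readings_per_day:,}")                        # 64,800,000
print(f"raw data/day : {readings_per_day * BYTES_PER_READING / 1e9:.1f} GB")

SAVINGS_USD = 1.2e6             # annual saving quoted in the notes
SAVINGS_FRACTION = 0.40         # 40% reduction in cooling power
baseline_cooling_bill = SAVINGS_USD / SAVINGS_FRACTION
print(f"implied baseline cooling electricity bill: ${baseline_cooling_bill / 1e6:.1f}M/year")
```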
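The m-channel/m-broker description is, at its core, a cross-layer registry pattern: agents at any layer register sensors and actuators with a channel, and a broker coordinates actuation without global information sharing. The toy Python sketch below illustrates that pattern under assumed names (MChannel, MBroker, register_sensor, enforce_power_cap); it is not HP's implementation and does not speak CIM.

```python
from typing import Callable, Dict

class MChannel:
    """Toy registry/proxy: agents at any layer register named sensors and actuators."""
    def __init__(self) -> None:
        self.sensors: Dict[str, Callable[[], float]] = {}
        self.actuators: Dict[str, Callable[[float], None]] = {}

    def register_sensor(self, name: str, read: Callable[[], float]) -> None:
        self.sensors[name] = read

    def register_actuator(self, name: str, act: Callable[[float], None]) -> None:
        self.actuators[name] = act

class MBroker:
    """Toy coordinator: reads cross-layer sensors and drives actuators toward a policy."""
    def __init__(self, channel: MChannel) -> None:
        self.channel = channel

    def enforce_power_cap(self, cap_watts: float) -> None:
        power = self.channel.sensors["server.power_w"]()
        if power > cap_watts:
            # Prefer the platform knob (DVFS) before asking the VM layer to migrate.
            self.channel.actuators["server.dvfs_step_down"](power - cap_watts)
        else:
            self.channel.actuators["vm.allow_growth"](cap_watts - power)

# Hypothetical usage: a platform agent and a virtualization agent share one channel.
channel = MChannel()
channel.register_sensor("server.power_w", lambda: 320.0)
channel.register_actuator("server.dvfs_step_down", lambda excess: print(f"DVFS down, shed {excess:.0f} W"))
channel.register_actuator("vm.allow_growth", lambda headroom: print(f"VM layer may grow by {headroom:.0f} W"))

MBroker(channel).enforce_power_cap(cap_watts=300.0)   # -> DVFS down, shed 20 W
```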
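The coordination note describes the efficiency controller as a classical feedback loop (CPU utilization is the measurement, voltage/frequency scaling is the knob) and the local power capper as a second controller that only adjusts that loop's utilization reference. Below is a minimal discrete-time sketch of that interaction, with made-up gains, a made-up power model and simplified dynamics; it illustrates the coordination idea rather than HP's actual controllers.

```python
# Two coordinated controllers, per the speaker notes:
#   1) efficiency controller: feedback control of CPU utilization via frequency scaling
#   2) local power capper: nudges the utilization reference instead of touching frequency directly
# Everything numeric here (gains, power model, demand trace) is a placeholder.

F_MIN, F_MAX = 0.8, 2.4            # GHz range of the hypothetical DVFS knob
util_ref = 0.80                    # "how full we would like the container to be"
freq = F_MAX
power_cap_w = 95.0

def utilization(demand_ghz: float, freq_ghz: float) -> float:
    return min(1.0, demand_ghz / freq_ghz)

def power(freq_ghz: float) -> float:          # toy power model: idle plus frequency-dependent term
    return 40.0 + 25.0 * freq_ghz

demand_trace = [1.2, 1.6, 2.0, 2.2, 1.0, 0.6]  # hypothetical CPU demand in GHz-equivalents

for demand in demand_trace:
    util = utilization(demand, freq)
    # Efficiency controller: raise frequency when utilization exceeds the reference, lower it otherwise.
    freq = min(F_MAX, max(F_MIN, freq + 0.5 * (util - util_ref) * freq))
    # Power capper: if over the cap, ask for a fuller (i.e. slower) container via the reference.
    if power(freq) > power_cap_w:
        util_ref = min(0.95, util_ref + 0.05)
    print(f"demand={demand:.1f}  util={util:.2f}  freq={freq:.2f} GHz  "
          f"power={power(freq):.0f} W  util_ref={util_ref:.2f}")
```

The point of the design, as the note says, is that the capper never touches the frequency knob directly; it only changes the reference, so the two loops stay decoupled and easier to reason about.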

Transcript

  • 1. Sustainable Data Centers Enabled by Supply and Demand Side Management
    Prith Banerjee
    Senior Vice President of Research and
    Director, Hewlett Packard Laboratories
    (Co-authors: Cullen Bash, Chandrakant Patel, Partha Ranganathan)
  • 2. HP Labs Research Areas
    The next technology challenges and opportunities
    Digital Commercial Print
    Intelligent Infrastructure
    Content Transformation
    Sustainability
    Immersive Interaction
    Cloud
    Analytics
    Information Management
  • 3. [Chart: the IT industry accounts for about 2% of total carbon emissions; the rest of the global economy, 98%.]
    IT must play a central role in addressing the global sustainability challenge.
    Industry Challenge: create technologies, IT infrastructure and business models for the low-carbon economy.
    Total carbon emissions:
    • As much as the aviation industry
    • Projected to double by 2020
    • IT can play a role in reducing this impact
    • To do so, IT solutions must take a lifecycle perspective
  • 4. Role of the IT Ecosystem: Data Centers at the Hub
    Sustainable Data Centers enabled by supply and demand side management of power, cooling and IT resources
  • 5. Supply and Demand Side Management
    Supply Side:
    Design of physical infrastructure with focus on lifecycle engineering and management, and the available energy required to extract, manufacture, operate and reclaim components;
    Utilization of local resources to minimize destruction of available energy in transmission, and construction of transmission infrastructure.
    Demand Side:
    Provisioning data center resources based on the needs and service level agreement of the user through use of flexible building blocks, pervasive sensing, knowledge discovery and policy based control
  • 6. Sustainable Data Center: Key Elements
    [Figure: the verticals IT:SW, IT:HW, Power and Cooling are crossed by the horizontals Autonomous Control; Knowledge Discovery & Visualization; Pervasive Cross-layer Sensing; Flexible, Efficient, & Configurable Building Blocks; and Data Center Scale Lifecycle Design, all set within a lifecycle of extraction, manufacturing, operation and end of life.]
  • 7. Lifecycle Design
    [Figure: lifecycle stages of extraction, manufacturing, operation and end of life.]
  • 8. Lifecycle Design through Data Center Synthesis
    Automate design of datacenters based on lifecycle considerations
    Synthesis Process Flow
  • 9. Flexible Building Blocks
    From chips, to servers, to data centers
    [Figure: energy flow from chip to cooling grid. The power grid supplies W_ensemble, which drives W_chip, W_system, the blowers (W_blower), pumps (W_pump) and compressor (W_compressor); the resulting heat (Q_chip, Q_system, and at the top level Q_datacenter + ΣW) is rejected through the ground-coupled loop, outside air and cooling grid. An illustrative energy-balance sketch appears at the end of this transcript.]
  • 10. Microblades and Megaservers
    The Inefficiencies in the Cloud
    [Chart: server TCO split between hardware costs and burdened power & cooling costs, broken down by subcomponent.]
  • 11. Disaggregation
    Efficient building blocks



  • 12. 2X performance/$
  • 13. Pervasive Sensing: IT & Facilities
    Cooling Infrastructure
    IT Hardware
    External Environment
  • 14. Knowledge Discovery in the IT ecosystem
    Minimizing material and minimizing energy
    [Figure: raw data from the data center room, power micro-grid, cooling grid and client infrastructure flows along data aggregation pathways through pre-processing, event/episode detection, model creation and causality analysis into useful knowledge: visualization, expedient assessment, early warning, lifetime estimation, and performance and efficiency metrics for expert systems. Example applications include PCA-based anomaly detection, mobile-enabled diagnostics and collaborative fault analytics. An illustrative anomaly-detection sketch appears at the end of this transcript.]
  • 15. Example: Chiller Unit Ensemble
    Data center cooling infrastructure
    [Figure: the cooling tower loop (air mixture in/out, warm water, makeup and return water, pump work Wp) absorbs the condenser heat QCond; the chiller refrigerant loop adds compressor work Wcomp and removes evaporator heat QEvap; the chilled water loop (pump work Wp) serves the data center CRAC units. Clustering is applied across the ensemble.]
  • 16. Active Control of Cooling Resources
    [Figure: conventional mode vs. active control using rack inlet temperature.]
    35% energy savings; improved reliability
  • 17. Dynamic Workload Placement based on Cooling Efficiency
    Premise:
    • Hotspots exist that impact efficiency
    • Use LWPI to place workload in more efficient locations
    • Test using batch loads in real data center
    Local Workload Placement Index:
    LWPI_j = [(Thermal Management Margin)_j + (AC Margin)_j] / (Hot Air Recirculation)_j
           = [(T_set - T_in)_i + ((T_SAT - T_SAT,min)_j * TCI_j)_i] / (T_in - T_SAT)_i
    [Figure: snapshots at 8am, 10am and 3pm; example locations with LWPI = 0.33, 3.8 and 4.9; batch load used for the tests. An illustrative placement sketch appears at the end of this transcript.]
    Results:
    • 32% energy savings over random job placement
    • $1M/year in savings (for large DCs)
    • Improved thermal reliability of IT equipment
    • Increased uptime
  • 18. Demonstration at Scale
    • Software Operations, Bangalore
    • Consolidation of 14 lab data centers
    Facility Building Blocks
    IT Building Blocks
    5 floors @ 14k sq. ft.
    900 kW cooling per floor
    • Need-based provisioning of compute, power and cooling resources based on available energy consumed (supply side)
    • Dynamic cooling control implemented
    • Data Analysis, Visualization and Knowledge Discovery to detect anomalies, improve reliability and minimize redundancy
    40% reduction in AHU power
    20% reduction in Infrastructure Power
    7,500 tons of CO2 prevented annually
    27 July 2009
  • 19. CHAOS!! (“Power” Struggle)
    [Figure: uncoordinated power and thermal controllers at every layer of the stack (CPU, server iLO, enclosure IAM, rack, VM (VM-res.all, Vmotion), OS-wlm, OS-gwlm, LSF, SIM), each chasing its own metric (average power, peak electrical power, peak thermal power, performance) across heterogeneous hardware, landing in local optima rather than the global optimum: power struggles.]
  • 20. Cross Layer Monitoring and Management Framework
  • 21. [Figure-only slide.]
  • 22. It works!
  • 23. It works well!
    65% savings (OpEx)
    20% savings (CapEx)
    Similar performance
  • 24. Other interesting insights…
    No VMC
    VMC only
    Unified
  • 25. Conclusions
    • Environmental impact of IT is a growing worldwide concern
    • Governments are beginning to take notice and regulations are increasing
    • Management of available energy required to run cost-effective operation
    • An integrated, life-cycle approach to data center design and management is necessary to improve efficiency and reduce impact.
    • Demonstrated results with economic payback
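
A minimal sketch of the chip-to-cooling-grid energy accounting suggested by the flexible-building-blocks figure on slide 9: the cooling infrastructure ultimately has to reject the data center heat load plus the work put in by the cooling components themselves (Q_datacenter + ΣW). The magnitudes below are hypothetical.

```python
# Illustrative energy accounting for the chip -> system -> data center -> cooling grid chain.
# Work terms (W_*) follow the labels in the slide; the magnitudes are placeholders.

W_IT_KW = 1000.0                         # electrical power consumed by IT (chips + systems)
cooling_work_kw = {                      # work drawn by the cooling ensemble (hypothetical)
    "server_blowers": 80.0,
    "crac_blowers": 120.0,
    "chilled_water_pumps": 60.0,
    "condenser_pumps": 40.0,
    "compressor": 300.0,
}

Q_datacenter = W_IT_KW                   # essentially all IT power ends up as heat
W_ensemble = sum(cooling_work_kw.values())
Q_rejected = Q_datacenter + W_ensemble   # what the cooling grid / outside air must absorb

print(f"Q_datacenter          : {Q_datacenter:7.0f} kW")
print(f"W_ensemble (cooling)  : {W_ensemble:7.0f} kW")
print(f"Heat rejected (Q+ΣW)  : {Q_rejected:7.0f} kW")
print(f"Cooling overhead      : {W_ensemble / W_IT_KW:.0%} of IT power")
```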
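Slide 14 names PCA-based anomaly detection as one of the knowledge-discovery applications built on the sensor data. The sketch below shows the generic technique on synthetic temperature readings: fit principal components on normal data, then flag samples whose reconstruction error is unusually large. It uses NumPy only and illustrates the method named on the slide, not HP's pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "normal" sensor snapshots: 500 time steps x 20 correlated temperature sensors.
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 20))
normal = latent @ mixing + 0.1 * rng.normal(size=(500, 20))

# Fit PCA on normal data: keep the top components of the covariance structure.
mean = normal.mean(axis=0)
centered = normal - mean
_, _, vt = np.linalg.svd(centered, full_matrices=False)
components = vt[:3]                                   # 3 principal directions

def reconstruction_error(x: np.ndarray) -> np.ndarray:
    c = x - mean
    recon = (c @ components.T) @ components
    return np.linalg.norm(c - recon, axis=1)

threshold = np.percentile(reconstruction_error(normal), 99)

# A new snapshot with one sensor misbehaving (e.g. a blocked vent or failed fan).
snapshot = normal[-1].copy()
snapshot[7] += 8.0
for name, sample in [("normal snapshot", normal[-1]), ("anomalous snapshot", snapshot)]:
    err = reconstruction_error(sample[None, :])[0]
    print(f"{name}: error={err:.2f}  anomaly={err > threshold}")
```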
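Slide 17 uses the LWPI to rank candidate locations and steer batch work toward cooling-efficient ones. A small sketch of that placement step, computed from the temperature form of the index as reconstructed above; the candidate rack readings are made up and the exact formula should be checked against the full paper.

```python
# Local Workload Placement Index, per the (reconstructed) definition on slide 17:
#   LWPI = [(T_set - T_in) + (T_SAT - T_SAT_min) * TCI] / (T_in - T_SAT)
# Higher LWPI means more thermal margin and less hot-air recirculation, i.e. a better place for work.

def lwpi(t_set, t_in, t_sat, t_sat_min, tci):
    thermal_margin = t_set - t_in
    ac_margin = (t_sat - t_sat_min) * tci
    recirculation = max(t_in - t_sat, 1e-6)   # avoid division by zero
    return (thermal_margin + ac_margin) / recirculation

# Hypothetical candidate racks (temperatures in deg C, TCI dimensionless).
racks = {
    "rack_A": dict(t_set=30.0, t_in=28.5, t_sat=18.0, t_sat_min=15.0, tci=0.2),
    "rack_B": dict(t_set=30.0, t_in=24.0, t_sat=18.0, t_sat_min=15.0, tci=0.8),
    "rack_C": dict(t_set=30.0, t_in=22.0, t_sat=18.0, t_sat_min=15.0, tci=0.9),
}

ranked = sorted(racks, key=lambda r: lwpi(**racks[r]), reverse=True)
for r in ranked:
    print(f"{r}: LWPI = {lwpi(**racks[r]):.2f}")
print(f"place next batch job on {ranked[0]}")
```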