Enabling Energy-Efficient Data Centers for New York and the Nation
Center Vision: To create electronic systems that are self-sensing and self-regulating, and are optimized for energy efficiency at any desired performance level. E3S works in partnership with industry and academia to develop systematic methodologies for operating information technology, telecommunications, and electronic systems and cooling equipment.
E3S is part of the NYS Center of Excellence in Small Scale Systems Integration and Packaging: new electronics that improve the way people interact with their surroundings.
◦ E3S: Energy-Efficient Electronic Systems
◦ CASP: Autonomous Solar Power
◦ IEEC: Electronics Packaging
◦ CAMM: Flexible Electronics
◦ Supportive analytical and diagnostics infrastructure
S3IP: academia, industry, and government in collaboration
Binghamton University is a leader in microelectronics R&D in collaboration with government and industry. A new $30 million Center of Excellence building will house the I/UCRC in Energy-Efficient Electronic Systems, including a new data center research laboratory.
Binghamton University: Bahgat Sammakia, Kanad Ghose, Bruce Murray
The University of Texas at Arlington: Dereje Agonafer
Villanova University: Alfonso Ortega, Amy Fleischer
Additional collaborators: The Georgia Institute of Technology: Yogendra Joshi
History of successful partnerships with industry; research collaborations have a 10+ year history. Faculty are leaders in this research effort, with expertise in:
◦ Thermal management techniques at all levels: chips to racks to entire data centers
◦ Fast models for thermal analysis and trend prediction
◦ Scalable control systems for co-managing IT and cooling systems
◦ Sensing, controls, and model adjustments as needed
Endicott Interconnect Technologies, NYSERDA, Advanced Electronics, Corning Inc., CommScope, IBM, Sealco, General Electric, Verizon, Emerson Network Power, Comcast, Panduit, Steel Orca, DVL, Microsoft, Bloomberg
E3S Members: Emerson, DVL; Collaborating Partner: Facebook
Lifecycle: 90% of energy is expended during operations.
Breakdown of operating costs:
◦ Equipment energy consumption: 45%
◦ HVAC: 50% (about 25% to 30% in water-chilling facilities)
◦ Remainder: 5%
Breakdown inside a modern server:
◦ CPU energy: 20% to 25%
◦ Memory energy: 30% to 45%
◦ Interconnection energy: 10% to 15%
Total energy consumption:
◦ 2.8% of total national electricity consumption in the US
◦ 1.6% for Western Europe
(Pie charts: operating costs (Equip 45%, HVAC 50%, Others 5%); server (CPU 28%, Memory 38%, Interconnect 12%, other 22%).)
Server overprovisioning is a common practice.
◦ Performance is not proportional to power: poor energy efficiency (performance per watt) on average.
◦ The situation is becoming worse with smaller form factors.
(Figure: power and energy efficiency vs. individual server utilization, 0% to 100%; the typical operating region sits at low utilization.)
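The energy-proportionality problem above can be made concrete with a small numerical sketch. This is an illustrative model only; the idle and peak power figures are assumptions, not measurements from these slides.

```python
# Illustrative sketch (assumed power figures): why a server running in the
# typical low-utilization region has poor performance per watt when power
# is not proportional to load.

def server_power(utilization, p_idle=120.0, p_max=300.0):
    """Power draw (W) under a simple linear model with a large idle floor."""
    return p_idle + (p_max - p_idle) * utilization

def efficiency(utilization):
    """Relative performance per watt (performance taken ~ utilization)."""
    return 0.0 if utilization == 0 else utilization / server_power(utilization)

# Most of the power budget goes to idle overhead at low utilization:
for u in (0.1, 0.3, 1.0):
    print(f"u={u:.0%}  power={server_power(u):.0f} W  "
          f"perf/W={1000 * efficiency(u):.2f} (arb.)")
```

Under this assumed model, performance per watt at 10% utilization is less than a quarter of what the same server achieves at full load, which is the efficiency gap the figure illustrates.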
HVAC equipment typically uses water chillers to supply cold air; because water has a very high thermal capacity, it takes a long time to adjust the cooling system's output.
◦ Difficult for the cooling system to track instantaneous load changes
◦ Overprovisioning is common here as well, to permit handling of sudden load surges
◦ Using sensed temperature to adjust cooling system performance is a viable first step, but not enough
◦ Solutions for improving the energy efficiency of data centers have been fairly isolated
System-level, holistic solutions are a MUST.
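The thermal inertia described above can be illustrated with a first-order lag model. All parameter values below (time constant, thermal capacitance, load levels) are assumptions chosen only to show the qualitative behavior, not data from any real facility.

```python
# Minimal sketch (assumed parameters): a first-order lag model showing why
# a chilled-water cooling loop cannot track an instantaneous IT load surge.

TAU_COOL = 300.0   # s, cooling-loop time constant (high water thermal capacity)
C_AIR = 3.0e6      # J/K, lumped thermal capacitance of the room air
DT = 1.0           # s, simulation step

def simulate(seconds=1800, q_base=50e3, q_surge=80e3, t_surge=300):
    """Room air temperature after a step load surge (W) at t_surge (s)."""
    temp, q_cool = 22.0, q_base                 # start balanced at 22 C
    peak = temp
    for t in range(seconds):
        q_it = q_surge if t >= t_surge else q_base
        # cooling output lags the demand with first-order dynamics
        q_cool += (q_it - q_cool) * DT / TAU_COOL
        temp += (q_it - q_cool) * DT / C_AIR    # unmatched heat warms the room
        peak = max(peak, temp)
    return temp, peak

final, peak = simulate()
# the room climbs roughly 3 C before the slow cooling loop catches up
print(f"final={final:.2f} C  peak={peak:.2f} C")
```

The integrated temperature rise scales with the surge size times the loop time constant, which is why overprovisioning or predictive control is needed when the loop cannot be made faster.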
Addresses energy-efficiency improvements of computing and telecommunications systems at multiple spatial and temporal scales:
◦ Synergistic techniques for simultaneously controlling IT load allocation and thermal management in real time, to minimize overall energy expenditure; includes the development of the infrastructures and standards needed for synergistic control and actuation mechanisms at the system level
◦ From chip-level systems such as processors and operating systems to data centers, including IT equipment hardware, software, and cooling systems
◦ Multi-scale control systems, including scalability and stability issues for both traditional reactive control mechanisms and proactive control mechanisms based on reduced-order models for predicting workload and thermal trends; the resulting system is self-sensing and self-regulating
◦ Mechanisms for utilizing waste heat
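The proactive-control idea above, a reduced-order model predicting workload so cooling can act ahead of the load, can be sketched as a minimal control loop. The predictor and setpoint rule here are placeholder assumptions for illustration, not the center's actual algorithms.

```python
# Hedged sketch of proactive control: a trivial reduced-order predictor
# extrapolates the recent load trend, and the cooling setpoint is sized
# for the *predicted* load instead of the last measured one.

from collections import deque

class LoadPredictor:
    """Placeholder reduced-order model: linear extrapolation of recent load."""
    def __init__(self, window=3):
        self.history = deque(maxlen=window)

    def observe(self, load_kw):
        self.history.append(load_kw)

    def predict_next(self):
        if len(self.history) < 2:
            return self.history[-1] if self.history else 0.0
        # extrapolate the most recent trend one step ahead
        return self.history[-1] + (self.history[-1] - self.history[-2])

def cooling_setpoint(predicted_load_kw, margin=1.1):
    """Command enough cooling for the predicted load plus a safety margin."""
    return predicted_load_kw * margin

predictor = LoadPredictor()
for load in (40.0, 44.0, 48.0):       # rising IT load, kW
    predictor.observe(load)
# cooling is sized for the predicted 52 kW, before the load arrives
print(cooling_setpoint(predictor.predict_next()))
```

A reactive controller would still be sizing cooling for 48 kW when the 52 kW load arrived; acting on the prediction is what lets a slow cooling loop keep up without permanent overprovisioning.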
Five-Year Goals for thermal management infrastructure:
• Determining key intrinsic energy-consumption inefficiencies at every level, from devices to entire systems
• Techniques for managing data centers synergistically using predictive models for computing workload and thermal trends, driven by live transient models in the control loop
• Airflow management techniques based on fast, compact models that are continuously validated with live data, and using these models to reduce overall energy expenditures
• Techniques for improving the energy efficiency of buildings and containers
IAB = Industrial Advisory Board
(Five-year roadmap chart, 2011 to 2015:)
• Modeling: holistic exergy-based system modeling; back-end waste-energy recovery modeling; multi-scale subsystem models (e.g., aisle) and multi-scale system models (e.g., room/building); cyber models (compute loads, etc.); gaps and refinement
• Validation & verification: component testbed evaluation; subsystem testbed demos; subsystem validation demos; system validation
• Infrastructures: hardware and software infrastructures; integrated software
• Decision & control algorithms: job scheduling and energy management; airflow and water management algorithms; scalable control algorithms
• Center management & metrics: annual IAB portfolio and benchmark assessments
Typical raised-floor data center configuration with alternating hot- and cold-aisle arrangement.
• Model details:
▫ CFD software: FloTHERM
▫ 20 racks; 4 CRACs
▫ k-ε turbulence model
▫ Boussinesq approximation
▫ 230k grid cells
▫ Plenum depth: 0.3 m
▫ Tile resistance: 50%
▫ 6 monitor points at rack inlet
(Figures: plan view of the rack layout for the room; 3D view of the data center under study.)
Team: Sammakia and Murray (BU), Agonafer (UTA)
Goals:
◦ Develop verified thermal analysis tools to provide real-time assessment of data center operating conditions, enabling operation as dynamic, self-sensing, and self-regulating systems
◦ Enable data centers to run at optimal or near-optimal conditions in order to minimize energy consumption for a broad range of specified performance metrics
Outcomes:
◦ Data showing measured thermal capacity values for different servers
◦ Modeling methodology for data center transient response
◦ Design guidelines for thermal management of data centers, accounting for both steady-state and dynamic loads
(Figures: rack A1 inlet temperature vs. time; CRAC supply percentage vs. time.)
Assessing the Effectiveness of Cold Aisle Containment on the Efficiency of the Data Center Cooling Infrastructure
Purpose:
◦ Eliminate hot-air recirculation
◦ Eliminate short-circuiting of cold air
◦ Energy efficiency!
Previous problems with fire codes.
(Diagram of a data center with hot-air recirculation and cold-air short-circuiting; caption courtesy of www.42U.com.)
Cannot totally seal the aisle due to pressure and noise problems.
Three configurations are used for the perforated ceiling.
The aim is to keep the rack inlet temperature for a cold aisle uniform.
(Figures: (a) 3D model in FloTHERM showing the perforated ceiling; (b) the three configurations used for the perforated ceiling.)
All racks powered at 32 kW and CRAC supply at 100%.
Rapid drop in inlet temperature when sealing the cold aisle.
Very little variation in inlet temperature per rack.
Tile airflow uniformity and lower hot-air recirculation.
(Figures: maximum rack temperature for various perforations; difference between maximum and minimum temperature across each rack for various perforations.)
Average variation of 8 °C with no containment.
Configuration I: average 𝑇𝑣 is 0.6 °C for the 20% open ceiling.
(Figures: temperature profiles at a height of 1750 mm, without and with containment.)
◦ Fan and system curves for CRACs and servers
◦ Steady-state and transient models
◦ Impact of thermal capacity
◦ Impact of airflow leaks
◦ Overall thermodynamic models
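A minimal example of the transient models listed above, assuming a single lumped thermal capacitance per server. The thermal resistance and capacitance values are illustrative assumptions, not the measured server data from this project.

```python
# Sketch of a lumped-capacitance transient model: one thermal mass per
# server, driven by a power step at t = 0. R and C values are assumed.

import math

def server_temp(t, power_w, t_inlet=25.0, r_th=0.05, c_th=2000.0):
    """T(t) = T_ss - (T_ss - T_inlet) * exp(-t / (R*C)) for a power step."""
    t_ss = t_inlet + power_w * r_th     # steady-state temperature, C
    tau = r_th * c_th                   # thermal time constant, s (here 100 s)
    return t_ss - (t_ss - t_inlet) * math.exp(-t / tau)

print(round(server_temp(0.0, 300.0), 2))    # 25.0  (starts at inlet temperature)
print(round(server_temp(500.0, 300.0), 2))  # 39.9  (5 time constants, near T_ss = 40 C)
```

The time constant R·C is exactly the quantity the measured thermal-capacity data feeds: with R and C identified per server, the same closed form predicts how fast each machine heats up after a load change or a CRAC failure.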
Development of a Compact Model for a Server
(Figures: thermocouples attached to the server; top view of the server showing the thermocouple locations (red dots).)
(Figures: measured server power vs. time; measured speeds of fans 1–4 vs. time; measured vs. modeled temperatures of components inside the server (graphics card, HDD, RAM 1 and 2, CPU 1 and 2 heatsink top and bottom); zoomed plots of (a) fan speed and (b) RAM temperature.)
◦ Analyze time constants for variable power
◦ Analyze the impact of airflow and temperature fluctuations
◦ Analyze the impact of CRAC failures
◦ Collaborate with Georgia Tech and its partners on infrastructure measurements
◦ Collaborate with UT on developing a simple control model for airflow
The research agenda is unique in its use of techniques that synergistically combine: (a) computing, (b) innovative cooling solutions and thermal management techniques, and (c) adaptive, proactive, and scalable control-system concepts in devising a system-level solution. Relevant research is already supported by many sources, including NSF and industry. Designation as an NSF I/UCRC is formalizing and expanding current collaborations with industry and enabling the development of practical, deployable, high-impact solutions.
Bahgat Sammakia
Interim Vice President for Research
Director, The NSF I/UCRC in Energy-Efficient Electronic Systems

Kanad Ghose
Chair, Department of Computer Science
Site Director, The NSF I/UCRC in Energy-Efficient Electronic Systems

Binghamton University
Binghamton, NY 13902-6000
http://s3ip.binghamton.edu