1. Proactive and Reactive Thermal-Aware Optimization Techniques to Minimize the Environmental Impact of Data Centers
Marina Zapater Sancho
Laboratorio de Sistemas Integrados (LSI)
Universidad Politécnica de Madrid
2. ● About me
● Motivation
● Focus of this PhD Thesis
● Multi-level approach
➢ Server level
➢ Data Center level
➢ Application framework
● Conclusions
Outline
ENERGY OPTIMIZATION of DATA CENTERS at LSI
3. About me
● Telecommunication Engineering, 2010.
Electronic Engineering, 2010.
Universitat Politècnica de Catalunya (Barcelona)
● PICATA Pre-doctoral Fellowship, CEI Campus Moncloa
➢ Research in collaboration with:
ArTeCs Group, Facultad de Informática, UCM
● Research Stay at Performance and Energy-Aware Computing Lab.
➢ Boston University (BU)
➢ In collaboration with Oracle, Inc.
Timeline (2009 to 2014): PFC (final degree project) @LSI → PICATA → Research Stay @BU
5. ● Energy consumption of Data Centers
➢ 1.3% of worldwide electricity production in 2010
➢ USA: 2.0% of electricity production in 2011 (1.5 × the consumption of NYC)
➢ 1 data center ≈ electricity use of 25,000 houses
➢ 12 GW in 2007, 24 GW in 2011, 43 GW in 2013 worldwide
➢ By 2015, total worldwide electricity use of 400 TWh/year
● More than 43 million tons of CO2 emissions per year (2% worldwide)
● More water consumption than many industries (paper, automotive, petrol, wood, or plastic)
The energy challenge
MOTIVATION
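The power figures above (GW) can be related to annual energy use (TWh/year) by multiplying by the 8,760 hours in a year. The following is a minimal sketch of that arithmetic, assuming the quoted GW values are average, continuous worldwide demand rather than installed capacity; the function name `annual_energy_twh` is illustrative.

```python
# Convert an average continuous power draw into yearly energy.
# Assumption: the GW figures are yearly averages (not peak capacity).
HOURS_PER_YEAR = 24 * 365  # 8,760 h (leap years ignored)


def annual_energy_twh(avg_power_gw: float) -> float:
    """Energy (TWh) consumed in one year by a constant load of avg_power_gw."""
    return avg_power_gw * HOURS_PER_YEAR / 1000.0  # GWh -> TWh


for year, gw in [(2007, 12), (2011, 24), (2013, 43)]:
    print(f"{year}: {gw} GW ~ {annual_energy_twh(gw):.0f} TWh/year")
```

Under this assumption, 43 GW corresponds to roughly 377 TWh per year.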
6. ● From 30% to 50% of
energy costs devoted to
cooling:
➢ Air conditioning units
➢ Server fans
● PUE (Power Usage Effectiveness) = total facility energy / IT equipment energy
➢ Average PUE = 1.92
➢ State-of-the-art PUE = 1.3
● CeSViMa Data Center
@UPM:
➢ Cooling costs/year: 360k€
➢ IT costs/year: 240k€
The energy challenge
MOTIVATION
CeSViMa IT power consumption
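A rough PUE for CeSViMa can be estimated from the yearly costs above. This is a sketch under two loud assumptions: cost is proportional to energy (a single tariff), and cooling is the only non-IT consumer (lighting, UPS and distribution losses are ignored). The helper name `pue` is illustrative.

```python
def pue(total_facility_energy: float, it_energy: float) -> float:
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    if it_energy <= 0:
        raise ValueError("IT energy must be positive")
    return total_facility_energy / it_energy


# Yearly costs for CeSViMa (k€/year), used as a proxy for energy.
# Assumptions: single tariff, cooling is the only non-IT overhead.
cooling_keur, it_keur = 360.0, 240.0
estimate = pue(cooling_keur + it_keur, it_keur)
print(f"Estimated PUE ~ {estimate:.2f}")  # 600 / 240 = 2.50

# Fraction of facility energy that is not spent on IT, for a given PUE.
for value in (1.92, 1.3):
    print(f"PUE {value}: {1 - 1 / value:.0%} overhead")
```

Read this way, a PUE of 1.92 means almost half of the facility's energy (about 48%) does not reach the IT equipment.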
7. ● Cloud Computing
● The march towards the Internet of Everything
➢ e-Health, smart everything (cities, cars, offices...)
● Huge increase in computational needs
➢ ... which must be served by Data Centers
Future trends
MOTIVATION
[Figures: Global Data Center traffic growth (Cisco); Global M2M communication growth]
8. ● Industry focused on PUE
➢ Metric shifting to Performance Per Watt
➢ Costly CFD simulations of the Data Center
State-of-the-Art
MOTIVATION
● Academia
➢ Problem tackled from multiple perspectives
➢ Lack of a holistic approach
➢ Lack of scalable models
➢ No joint cooling + computing approaches
9. Proactive and reactive holistic approach:
● Use knowledge of the applications' energy demand and of the features of the computing and cooling resources to apply proactive optimization techniques
● Global strategy to integrate multiple information
sources and coordinate decisions to reduce overall
power consumption.
● Energy optimization beyond PUE
Our perspective
MOTIVATION
11. Global Framework
FOCUS OF THE PhD THESIS
[Figure: Datacenter → Model → Optimization]
We derive accurate and flexible models of the Data Center to predict power and energy consumption
We use the models and the knowledge of computing and cooling resources to jointly optimize cooling and computational costs
We propose control actions to reduce energy consumption
12. Data Center Energy Optimization
[Figure: framework diagram. Components: Datacenter, Workload, Model, Sensors, Actuators, Sensor configuration, Visualization, Power Model, Energy Model, Thermal Model, Dynamic Cooling Opt., Resource Alloc. Opt., Global DVFS, VM Opt., Anomaly Detection and Reputation Systems, Communication network, Sensor network, Application framework]
FOCUS OF THE PhD THESIS
13. Optimization Objectives
● Develop models and propose optimizations to minimize energy
● Leverage heterogeneity and application awareness
● Multi-level orthogonal optimizations:
➢ Server
➢ Data Center
➢ Application framework → emphasis on e-Health
[Figure: models and optimization at each level: Server, Data Center, Nodes, Application Framework]
FOCUS OF THE PhD THESIS
15. Server modeling and optimization
● Splitting contributors to power:
➢ Dynamic power → workload
➢ Static power → leakage (exp(T))
➢ Fan power → (RPM)³
SERVER LEVEL
Goal 1: Exploiting the leakage-cooling tradeoffs at the server level
Goal 2: Energy-efficient workload allocation policy
● Joint workload and cooling management policy to
minimize energy consumption at the server level
➢ CPU affinity
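The three contributors above suggest a simple server power model, P = P_dyn + P_leak(T) + P_fan(RPM), in which a faster fan lowers the CPU temperature (and therefore leakage) at the cost of cubic fan power. The sketch below searches a grid of fan speeds for the lowest total power. All coefficients, the thermal-resistance curve `thermal_resistance` and the speed grid are hypothetical illustrations, not the models fitted in this work.

```python
import math

# Hypothetical coefficients chosen only to illustrate the trade-off;
# they are NOT the fitted models of the SPARC T3 server.
T_INLET = 24.0                          # inlet air temperature (°C)
P_DYN = 200.0                           # dynamic power of the workload (W)
LEAK_A, LEAK_B = 5.0, 0.02              # leakage model: A * exp(B * T)
FAN_MAX_W, FAN_MAX_RPM = 40.0, 4000.0   # fan power at full speed
RPM_GRID = range(1000, 4001, 500)       # candidate fan speeds


def thermal_resistance(rpm: float) -> float:
    """Assumed CPU-to-air thermal resistance (°C/W); drops as airflow rises."""
    return 0.05 + 150.0 / rpm


def leakage_power(temp_c: float) -> float:
    """Static power grows exponentially with temperature."""
    return LEAK_A * math.exp(LEAK_B * temp_c)


def fan_power(rpm: float) -> float:
    """Fan affinity law: power scales with the cube of the fan speed."""
    return FAN_MAX_W * (rpm / FAN_MAX_RPM) ** 3


def steady_state(rpm: float, iters: int = 100) -> tuple[float, float]:
    """Return (CPU temperature, total power) for a fixed fan speed.

    Leakage depends on temperature and temperature depends on CPU power,
    so the steady state is found by fixed-point iteration.
    """
    temp = T_INLET
    for _ in range(iters):
        temp = T_INLET + thermal_resistance(rpm) * (P_DYN + leakage_power(temp))
    return temp, P_DYN + leakage_power(temp) + fan_power(rpm)


for rpm in RPM_GRID:
    temp, total = steady_state(rpm)
    print(f"{rpm:5d} RPM: T = {temp:5.1f} °C, total = {total:6.1f} W")

best_rpm = min(RPM_GRID, key=lambda r: steady_state(r)[1])
print("Lowest-power fan speed:", best_rpm, "RPM")
```

With these made-up numbers the minimum lies at 1,500 RPM: below it, the extra leakage outweighs the fan savings; above it, the extra fan power outweighs the leakage it removes.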
16. Server modeling (I)
● Experimental set-up:
➢ SPARC T3 server
■ 32 cores, 256 hw threads
■ 128GB RAM
■ Monitoring via IPMI (SP)
➢ Control over cooling subsystem
● Workloads:
➢ Training:
Synthetic workloads
(LoadGen, RandMem)
➢ Test set:
SPEC Power
SPEC CPU 2006
PARSEC
SERVER LEVEL
CPU thermal dynamics (training)
17. Server modeling (II)
SERVER LEVEL
[Figures: CPU steady-state temperature (RMSE < 2.1 °C); CPU leakage power modeling (RMSE < 0.5 W); sensor measurements vs. models]
● Modeling contributors to power consumption:
➢ Leakage power
➢ CPU steady-state temperature
➢ Memory dynamic power (via performance counters)
➢ CPU dynamic power (via perf. counters, WIP)
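Models driven by performance counters can be fitted by regression and scored by RMSE against sensor measurements. A minimal sketch, assuming a single-feature linear model (memory accesses per second → memory power) fitted by ordinary least squares; the sample values and function names are hypothetical, and the actual models may use more counters or a different functional form.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def rmse(predicted, measured):
    """Root-mean-square error between model output and sensor readings."""
    sq = sum((p - m) ** 2 for p, m in zip(predicted, measured))
    return math.sqrt(sq / len(measured))


# Hypothetical training samples: memory accesses (millions per second,
# from performance counters) vs. measured memory power (W).
mem_accesses = [10.0, 50.0, 120.0, 200.0, 310.0]
mem_power = [21.0, 24.1, 29.8, 36.2, 44.9]

a, b = fit_line(mem_accesses, mem_power)
predicted = [a * x + b for x in mem_accesses]
print(f"P_mem ~ {a:.3f} * accesses + {b:.2f} W")
print(f"Training RMSE = {rmse(predicted, mem_power):.2f} W")
```

In practice the RMSE would be reported on the test set (SPEC, PARSEC) rather than on the training samples.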
18. Optimization
● Optimal cooling management to improve energy efficiency
➢ Proactive fan control policy
➢ Tested with statistically different workloads (random power, Poisson arrival times)
➢ Up to 9% savings compared to the server's default policy
➢ Up to 6% savings compared to other state-of-the-art policies
SERVER LEVEL
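A test workload with random power and Poisson arrival times can be generated by drawing exponentially distributed inter-arrival times. A minimal sketch, assuming uniformly distributed power levels; the function name, arrival rate and power range are hypothetical, and the generators used in the experiments may differ.

```python
import random


def synthetic_workload(n_jobs, arrival_rate, p_min, p_max, seed=0):
    """Generate (arrival_time_s, power_w) pairs for n_jobs jobs.

    Poisson arrivals: inter-arrival times are exponentially distributed
    with mean 1 / arrival_rate. Power levels are drawn uniformly at random
    (an assumption; any other distribution could be plugged in here).
    """
    rng = random.Random(seed)  # seeded for reproducible experiments
    t = 0.0
    jobs = []
    for _ in range(n_jobs):
        t += rng.expovariate(arrival_rate)
        jobs.append((t, rng.uniform(p_min, p_max)))
    return jobs


# One job per minute on average, 80 W to 250 W per job (hypothetical).
for t, p in synthetic_workload(5, arrival_rate=1 / 60, p_min=80, p_max=250):
    print(f"t = {t:7.1f} s  power = {p:5.1f} W")
```

Changing the seed yields statistically equivalent but different workloads, which is one way to test a policy against several random instances.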
19. Optimization
● Energy-efficient workload allocation policy
➢ Comparing allocations: energy, power, EDP, temperature
➢ Guided by application parameters from performance counters (memory accesses, L1 misses, IPC…)
➢ Up to 13% energy savings when combining optimal allocation and cooling
SERVER LEVEL
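Power, energy and energy-delay product (EDP = energy × execution time) can rank the same candidate allocations differently, which is why allocations are compared under several metrics. The allocations and measurements below are hypothetical.

```python
# Hypothetical measurements for three candidate allocations of the same job:
# (average power in W, execution time in s). Values are illustrative only.
allocations = {
    "A": (300.0, 100.0),
    "B": (250.0, 130.0),
    "C": (280.0, 105.0),
}


def energy(power_w, time_s):
    """Energy in joules."""
    return power_w * time_s


def edp(power_w, time_s):
    """Energy-delay product in J·s: penalizes slow allocations."""
    return energy(power_w, time_s) * time_s


best_power = min(allocations, key=lambda k: allocations[k][0])
best_energy = min(allocations, key=lambda k: energy(*allocations[k]))
best_edp = min(allocations, key=lambda k: edp(*allocations[k]))
print(best_power, best_energy, best_edp)  # B C A
```

Here the lowest-power allocation (B) is the slowest and therefore wastes energy, while the lowest-energy one (C) loses to A once delay is weighted in.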
20. Work in progress
● Proactive workload allocation policy:
➢ So far we have used “qualitative” knowledge about workload behavior.
➢ Working on contention-aware models to develop co-assignment policies
➢ Goal: predict how several workloads should be combined in the same server to minimize energy.
➢ Proactive joint workload and cooling management.
SERVER LEVEL
M. Zapater, O. Tuncer, J. L. Ayala, J. M. Moya, K. Vaidyanathan, K. Gross, and A. K. Coskun, “Leakage-aware cooling management for improving server energy efficiency,” submitted to TPDS (JCR Q1), under review. In collaboration with Oracle, BU, UCM
M. Zapater, J. L. Ayala, J. M. Moya, K. Vaidyanathan, K. Gross, and A. K. Coskun, “Leakage and temperature aware server control for improving energy efficiency in data centers,” in DATE 2013. In collaboration with Oracle, BU, UCM
23. DC Modeling and optimization
DATA CENTER LEVEL
Goal: Energy efficient assignment of computational and cooling
resources of the DC to execute a workload
SLURM Resource Manager
24. Data Center Room modeling
● The maximum allowed CPU temperature determines the minimum cooling needed in the data room.
➢ Development of fast, accurate and flexible models to predict:
■ Server inlet temperature
■ CPU temperature
➢ The literature uses CFD simulation → complex non-linear models...
➢ Classical (linear) regression techniques are no longer sufficient…
● Use of a wireless sensor network (WSN) to gather environmental parameters
● Use of Genetic Programming techniques
DATA CENTER LEVEL
25. Data Center Room modeling
● Genetic programming techniques:
➢ Find the best model to predict a time series given a set of
variables and a fitness function.
➢ Each model is an individual with a genotype and a phenotype
➢ The fitness function is the RMSE of the prediction
➢ Models evolve → individuals with best fitness survive
● 1 minute ahead CPU temperature prediction:
DATA CENTER LEVEL
CPU temperature prediction in an Intel Xeon server (RMSE = 2.1 °C)
TS(k+1) = TS(k-6)-PS(k-8)+6.3+PS(k-6)-PS(k-25)/49.4
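In genetic programming, each individual's phenotype is a formula over lagged sensor values (like the evolved model above), scored by RMSE; lower RMSE means higher fitness. The sketch below shows only that evaluation and fitness step; selection, crossover and mutation are omitted. The tuple-based tree encoding, the operator set and the protected division are assumptions commonly used in GP, not necessarily those of the tool used in this work.

```python
import math

# Hypothetical phenotype encoding for a GP individual (expression tree):
#   ("var", name, lag) -> value of series `name` at time step k - lag
#   ("const", c)       -> numeric constant c
#   (op, left, right)  -> op in {"add", "sub", "mul", "div"}


def evaluate(node, series, k):
    """Evaluate the expression tree at time step k."""
    kind = node[0]
    if kind == "var":
        _, name, lag = node
        return series[name][k - lag]
    if kind == "const":
        return node[1]
    left, right = evaluate(node[1], series, k), evaluate(node[2], series, k)
    if kind == "add":
        return left + right
    if kind == "sub":
        return left - right
    if kind == "mul":
        return left * right
    if kind == "div":
        # Protected division (a common GP convention) avoids division by zero.
        return left / right if right != 0 else 1.0
    raise ValueError(f"unknown node type: {kind}")


def max_lag(node):
    """Largest time lag used anywhere in the tree."""
    if node[0] == "var":
        return node[2]
    if node[0] == "const":
        return 0
    return max(max_lag(node[1]), max_lag(node[2]))


def fitness_rmse(individual, series, target, horizon=1):
    """RMSE of predicting series[target][k + horizon] from data up to k."""
    start = max_lag(individual)
    errors = [
        evaluate(individual, series, k) - series[target][k + horizon]
        for k in range(start, len(series[target]) - horizon)
    ]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Toy data: a temperature series "T" and a power series "P".
series = {"T": [40.0, 41.0, 42.0, 43.0, 44.0], "P": [100.0] * 5}
persistence = ("var", "T", 0)                          # T(k+1) = T(k)
trend = ("add", ("var", "T", 0), ("const", 1.0))       # T(k+1) = T(k) + 1
print(fitness_rmse(persistence, series, "T"))  # 1.0
print(fitness_rmse(trend, series, "T"))        # 0.0
```

In a full GP run, the individuals with the lowest `fitness_rmse` would be selected to produce the next generation.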
26. Data Center Room modeling
● Work-in-Progress:
➢ Extending CPU temperature prediction to CeSViMa servers → Power7 architecture, blade center
■ 245 eServer BladeCenter PS702 servers, each with 2 CPUs × 8 cores @ 3.3 GHz
➢ Models for the inlet temperature of the LSI servers are currently evolving (GP runs in progress).
➢ To be extended to CeSViMa
DATA CENTER LEVEL
27. Optimizing IT allocation (I)
● Heterogeneity-aware and application-aware resource
management
➢ Energy profiling of the SPEC CPU 2006 benchmark tasks on 3 servers
➢ Static optimization: finding the best data center setup, given a number of heterogeneous servers
➢ Dynamic optimization: run-time allocation using the resource manager
● MILP algorithms to allocate tasks to servers:
➢ Minimize total IT energy
DATA CENTER LEVEL
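A typical assignment formulation of this kind uses binary variables x_ij (task i runs on server j), requires each task to be placed exactly once, and minimizes Σ_ij E_ij·x_ij, where E_ij is the profiled energy of task i on server j. For a toy instance the same optimum can be found by exhaustive enumeration, as sketched below; a MILP solver is needed at data-center scale. The per-server task capacity constraint and all energy values are hypothetical.

```python
from itertools import product


def best_assignment(energy, capacity):
    """Exhaustively solve a tiny task-to-server assignment problem.

    energy[i][j]: energy (J) of running task i on server j (from profiling).
    capacity[j]:  maximum number of tasks server j can host (assumed).
    Returns (assignment, total_energy); assignment[i] is task i's server.
    """
    n_tasks, n_servers = len(energy), len(capacity)
    best, best_e = None, float("inf")
    for assign in product(range(n_servers), repeat=n_tasks):
        if any(assign.count(j) > capacity[j] for j in range(n_servers)):
            continue  # violates a capacity constraint
        total = sum(energy[i][assign[i]] for i in range(n_tasks))
        if total < best_e:
            best, best_e = assign, total
    return best, best_e


# Hypothetical energy profiles of 3 tasks on 2 heterogeneous servers.
energy = [[10, 14], [20, 12], [8, 9]]
print(best_assignment(energy, capacity=[1, 2]))  # ((0, 1, 1), 31)
```

Without the capacity limit on server 0, tasks 0 and 2 would both go there (total 30); the constraint forces task 2 onto its second-best server.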
28. Optimizing IT allocation (II)
● Implemented in SLURM resource
manager:
➢ BSC SLURM Simulator
➢ Random arrival distribution (light, medium,
heavy load)
➢ Simulating around 1,000 cores
● Results show that the best solution is achieved with a heterogeneous data center:
➢ 5% to 22% energy savings for the static solution
➢ 7.5% to 24% energy savings (depending on the scenario) for the dynamic solution, compared to SLURM round-robin allocation
DATA CENTER LEVEL
M. Zapater, J. L. Ayala, and J. M. Moya, “Leveraging heterogeneity for energy minimization in data
centers,” in CCGRID 2012. CORE A, in collaboration with UCM
29. Cooling & IT optimization
● 15% cooling reduction in the LSI server room (August 2013)
➢ Leakage and temperature-aware control
● Work in progress:
➢ Development of joint cooling and IT optimizations, using the data room models of the CeSViMa and LSI rooms:
■ MILP
■ GA-based (genetic algorithms)
DATA CENTER LEVEL
31. e-Health scenarios
● Next-generation applications have higher computational demands for data analysis.
➢ We propose using other elements of the application framework (e.g., personal servers) to offload computation from the data center.
APPLICATION FRAMEWORK
32. Off-loading workload
● Tasks that do not have high computational demands can be executed in intermediate nodes:
➢ Not all computation is performed in the Data Center
➢ Clustering tasks according to IPC and memory boundedness
➢ Each node decides whether to:
a) execute a task or b) forward it to the data center
APPLICATION FRAMEWORK
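Clustering tasks by IPC and memory boundedness could, for example, be done with k-means. A minimal sketch, assuming two features per task (IPC and memory accesses per instruction) with hypothetical values; k-means is an illustrative choice, since the clustering method is not specified here, and in practice the features would normally be normalized first because they have different scales.

```python
import math


def kmeans(points, k, iters=20):
    """Plain k-means; initial centroids are evenly spaced input points."""
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each task joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids


# Hypothetical tasks: (IPC, memory accesses per instruction).
tasks = [(2.0, 0.01), (2.2, 0.02), (0.4, 0.30), (0.5, 0.35)]
labels, _ = kmeans(tasks, k=2)
print(labels)  # [0, 0, 1, 1]
```

Cluster 0 groups compute-bound tasks (high IPC, few memory accesses); cluster 1 groups memory-bound ones, which a node could treat differently when deciding whether to keep or forward a task.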
33. Off-loading workload
● Use of SMT (Satisfiability Modulo Theories) solvers
➢ SMT solvers determine whether a set of constraints can be satisfied
➢ Each node runs an SMT solver: if a task satisfies the following conditions, it is executed in the node:
■ Lower EDP in the node than in the DC
■ Minimum QoS (constrains the maximum execution time)
■ Maximum amount of battery used
● Tested with the Yices SMT solver
● Node capabilities differ depending on the scenario:
➢ Hardware equivalent to a Samsung Galaxy SII Smartphone
(ARM Cortex-A9, 1GB RAM)
➢ MIPS32 @500MHz, 256MB RAM
➢ Dual-core AMD PC @2GHz, 1GB RAM
APPLICATION FRAMEWORK
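When every quantity in the three conditions above is a fixed estimate, checking them reduces to evaluating a Boolean predicate; an SMT solver such as Yices becomes useful when some quantities are free variables (for example, a choice of operating frequency). The sketch below performs only the fixed-value check in plain Python, without an SMT solver; the class, function names, units and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskEstimate:
    energy_j: float  # estimated energy to run the task (J)
    time_s: float    # estimated execution time (s)


def edp(est: TaskEstimate) -> float:
    """Energy-delay product (J·s)."""
    return est.energy_j * est.time_s


def run_locally(node: TaskEstimate, dc: TaskEstimate,
                max_time_s: float, battery_budget_j: float) -> bool:
    """Run on the node only if all three conditions hold; else forward.

    (1) lower EDP on the node than in the data center,
    (2) QoS: execution time within the allowed maximum,
    (3) energy drawn from the battery within the budget.
    """
    return (edp(node) < edp(dc)
            and node.time_s <= max_time_s
            and node.energy_j <= battery_budget_j)


dc = TaskEstimate(energy_j=50.0, time_s=2.0)      # EDP = 100
small = TaskEstimate(energy_j=20.0, time_s=3.0)   # EDP = 60
heavy = TaskEstimate(energy_j=90.0, time_s=12.0)  # EDP = 1080
print(run_locally(small, dc, max_time_s=5.0, battery_budget_j=40.0))  # True
print(run_locally(heavy, dc, max_time_s=5.0, battery_budget_j=40.0))  # False
```

The light task stays on the node; the heavy one fails the EDP test and is forwarded to the data center.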
34. Off-loading workload
● The benefits depend on the number of nodes available to execute the workload and on the workload intensity (light, medium, heavy):
➢ 10% to 24% energy savings
➢ Up to 16% performance increase
M. Zapater, C. Sánchez, J. L. Ayala, J. M. Moya, and J. L. Risco-Martín, “Ubiquitous green computing techniques
for high demand applications in smart environments,” Sensors, 2012. JCR Q1, in collaboration with IMDEA
Software, UCM
M. Zapater, P. Arroba, J. L. Ayala, J. M. Moya, and K. Olcoz, “A novel energy-driven computing paradigm for e-
Health scenarios”, Future Generation Computer Systems, 2014. JCR Q1, in collaboration with UCM
APPLICATION FRAMEWORK
36. The energy challenge
● Unsustainable energy costs of Data Centers
● Proposal of multi-layer holistic approaches to the energy issue → energy as a first-class requirement
● Combining the proposed approaches at server, data center and application level yields high energy savings (up to 13%, 24% and 24%, respectively, in the experiments presented)
CONCLUSIONS
39. Most relevant publications
M. Zapater, P. Arroba, J. L. Ayala, J. M. Moya, and K. Olcoz, “A novel energy-driven computing paradigm for e-
Health scenarios”, Future Generation Computer Systems, 2014. JCR Q1, in collaboration with UCM
J. Pagán, M. Zapater, Ó. Cubo, P. Arroba, V. Martín, and J. M. Moya, “A Cyber-Physical approach to combined HW-SW monitoring for improving energy efficiency in data centers,” in DCIS 2013. In collaboration with CeSViMa
M. Zapater, J. L. Ayala, J. M. Moya, K. Vaidyanathan, K. Gross, and A. K. Coskun, “Leakage and temperature aware server control for improving energy efficiency in data centers,” in DATE 2013. In collaboration with Oracle, BU, UCM
P. Arroba, M. Zapater, J. L. Ayala, J. M. Moya, K. Olcoz, and R. Hermida, “On the Leakage-Power modeling for optimal server operation,” in IWIA, 2014. In collaboration with UCM
M. Zapater, C. Sánchez, J. L. Ayala, J. M. Moya, and J. L. Risco-Martín, “Ubiquitous green computing techniques
for high demand applications in smart environments,” Sensors, 2012. JCR Q1, in collaboration with IMDEA
Software, UCM
M. Zapater, J. L. Ayala, and J. M. Moya, “GreenDisc: a HW/SW energy optimization framework in globally
distributed computation,” LNCS, 2012, in collaboration with UCM
M. Zapater, J. L. Ayala, and J. M. Moya, “Leveraging heterogeneity for energy minimization in data centers,” in
CCGRID 2012. CORE A, in collaboration with UCM
40. Know-How and skills
● Methodologies to develop models
➢ Data sets, tests to perform, etc.
➢ Extracting useful information from large data sets
● Metaheuristics
➢ Genetic programming
● Benchmarks
➢ CPU and memory intensive, disk, etc.
● Collecting data from servers:
➢ Sensors, performance counters
CONCLUSIONS