Energy Conservation and Thermal                  Management in High-Performance                        Server Architecture...
Agenda                    • Background and Related Work                    • System Modeling                    • Effectiv...
What does this picture tell us?                                                                  (c) The New York Times, J...
Current Practice                          Completely Fair Scheduler                          Domain-based Load Balancing  ...
Thread Scheduling & Power Management                           DVFS:                          P = CV 2 f                  ...
Proactively Avoid Thermal Emergencies                                                                Thermally-        A F...
System ModelingTuesday, March 15, 2011
Model: Inputs & Components                                        Esystem = Eproc + Emem + Ehdd + Eboard + Eem .          ...
Model: Processor                                                                                                          ...
Model: Memory                                                                                      •   DRAM Read/Write    ...
Model: Storage                                                                                         Ehdd =Pspin−up × Ts...
Model: Board                                                                                                           Ebo...
Model: Electromechanical                                                          N                                       ...
Effective PredictionTuesday, March 15, 2011
gobmk         1.7%     9.0%     2.30                                               zeusmp       TABLE III 8.1%            ...
Prediction w/ Chaotic Time Series                     TABLE 4 dications of chaotic behavior in power time series          ...
Kernel weighting                                   1.                               −m        K(x) = (2π)            exp(−...
Forward prediction                •          Start with a Taylor series expansion                                         ...
Time Complexity                   n future observations               p past observations                                 ...
Initial Evaluation and                                  ResultsTuesday, March 15, 2011
gobmk         C     Artificial Intelligence: Go                                                                            ...
Results: AMD Opteron f10h                           (a) Astar/CAP.                                 (b) Astar/AR(1)).      ...
Results: Intel Nehalem                          (c) Zeusmp/CAP.                                        (d) Zeusmp/AR(1).  ...
Results: Error - Other BenchmarksTuesday, March 15, 2011
Observations and Analysis                    • Where does maximum error occur?                    • Choice of performance ...
Thermally-Aware                             SchedulingTuesday, March 15, 2011
Problem nature                    • Scheduling...                     • in time: who runs next                     • in sp...
Thermal Extensions to System Model                          Applications have                              a length:      ...
Extending CAP for Thermal Prediction                    • Thermal Chaotic Attractor Predictor                          (TC...
Reducing Processor Temperatures                    • Premise: Processor die temperature can be                          ma...
Reducing Ambient Temperature                    • Premise: Control system ambient                          temperature by ...
Status, Plans, and                              SummaryTuesday, March 15, 2011
Current Status                                               Thermally-     A Full-System             Effective           ...
Plan for Completion                          ID Task                          1   Respond to review comments for [Lewis 20...
Future Directions                    • Extend beyond a single blade                     • Cluster, Grid, and Cloud Schedul...
Questions?Tuesday, March 15, 2011
Additional MaterialTuesday, March 15, 2011
This work was supported in part by the U.S.                          Department of Energy and by the Louisiana            ...
Publications List                          Lewis, A., Ghosh, S., and Tzeng, N.-F. 2008. Run-                          time...
Upcoming SlideShare
Loading in …5
×

2011 Feb07 Lewis Prospectus

374 views

Published on

Presented at the Center for Advanced Computer Studies at The University of Louisiana at Lafayette on 07Feb2011

  • Be the first to comment

  • Be the first to like this

2011 Feb07 Lewis Prospectus

  1. 1. Energy Conservation and Thermal Management in High-Performance Server Architectures Adam Lewis The Center for Advanced Computer Studies The University of Louisiana at LafayetteTuesday, March 15, 2011
  2. 2. Agenda • Background and Related Work • System Modeling • Effective Prediction • Initial Evaluation and Results • Thermally-Aware Scheduling • Status, Plans, and SummaryTuesday, March 15, 2011
  3. 3. What does this picture tell us? (c) The New York Times, June 14, 2006 Source: McKinsey & Company 2008 Source: EPA 2008 A 20% projected increase Only ~50% in data center of power consumed emissions over next 5 years from IT equipmentTuesday, March 15, 2011
  4. 4. Current Practice Completely Fair Scheduler Domain-based Load Balancing Power-state aware Run-queue scheduling Domain-based Load Balancing Power-state aware (Solaris 11) Run-queue scheduling Interface w/ power manager?Tuesday, March 15, 2011
  5. 5. Thread Scheduling & Power Management DVFS: P = CV 2 f SpeedStep Multi-core/Many-core • Performance issues [LLBL 2007] •Cache affinity • Lack of slack •Load balancing • High load = No gain •Opportunity to turn • Reliability issues [Bircher 2008] off the lights? • Under-clocking & MTBF • Reactive rather than proactiveTuesday, March 15, 2011
  6. 6. Proactively Avoid Thermal Emergencies Thermally- A Full-System Effective + Aware Energy Model Prediction Scheduling • Possible approaches • Heat-and-Run and related approaches [Gomaa2004] [Coskun2009] [Zhou2010] • Memory-resource focused approaches [Merkel2010] • Control-theoretic techniquesTuesday, March 15, 2011
  7. 7. System ModelingTuesday, March 15, 2011
  8. 8. Model: Inputs & Components Esystem = Eproc + Emem + Ehdd + Eboard + Eem . • Processor • Memory • Hard disk & storage devices Edc = Esystem • Motherboard & peripherals • Three DC voltage domains • Electrical & Electromechanical • 12Vdc, 5.5Vdc, 3.3Vdc Components • 5.5V and 3.3V domains limited to 20% of rated voltageTuesday, March 15, 2011
  9. 9. Model: Processor t2 Eproc = (Pproc (t))dt t1 Memory Memory DDR2-DRAM DDR2-DRAM QPLC • Core 1 Core 2 Core 2 Core 1 Bus transactions I-cache D-cache I-cache D-cache D-cache I-cache D-cache I-cache L2 Cache L2 cache L2 cache L2 Cache Core Core system request interface system request interface Crossbar Switch Crossbar Switch • Coherent Integrated HyperTransport Integrated Host bridge (cHT) Host bridge Memory Memory Reflects amount of HyperTransport HyperTransport Controller Controller HyperTransport Bus QPLIO QPLIO data processed Input USB VGA Output SouthBridge Handler HDD • Ethernet Die temperature DVD Graphics Board - Level Power consumers PCI Express AMD Opteron Intel Nehalem • Computation per core • Processor as black box • Processor system • Power = f(workload) metrics • Manifests as heatTuesday, March 15, 2011
  10. 10. Model: Memory • DRAM Read/Write t2 N power + background Emem = t1 ( i=1 CMi (t) + DB(t)) × PDR + Pab dt power = known quantities • Performance counters exist for measuring the count of highest level cache miss and bus transactions • Combine these to compute the energy consumedTuesday, March 15, 2011
  11. 11. Model: Storage Ehdd =Pspin−up × Tsu + Pread N r × Tr + Pwrite N w × Tw + Pidle × Tid Parameter Value Interface Serial ATA Capacity 250 GB Rotational speed 7200 rpm Power (spin up) 5.25 W (max) Power (Random read, write) 9.4 W (typical) Power (Silent read, write) 7 W (typical) Power (idle) 5 W (typical) Power (low RPM idle) 2.3 W (typical for 4500 RPM) Power (standby) 0.8 W (typical) Power (sleep) 0.6 W (typical)Tuesday, March 15, 2011
  12. 12. Model: Board Eboard = Vpower−line × Ipower−line × tinterval • System components that support the operation of the machine • Typically in the 5.5Vdc and 3.3Vdc power domains • Measured by current probeTuesday, March 15, 2011
  13. 13. Model: Electromechanical N Tp i Eem = V (t) · I(t) + Pf an (t) dt 0 i=1 • Need to account for energy required to cool • No performance counters • Can measure power drawn by the fans • Derived from log data collected by OSTuesday, March 15, 2011
  14. 14. Effective PredictionTuesday, March 15, 2011
  15. 15. gobmk 1.7% 9.0% 2.30 zeusmp TABLE III 8.1% 2.8% 2.14 Linear ODEL ERRORS FOR CAP, AR(1),A good ON AN AMD OPTERON S M AR Time Series - AND MARS idea? AR Avg Max RMSE Benchmark Err % Err % astar 3.1% 8.9% 2.26 games 2.2% 9.3% 2.06 gobmk 1.7% 9.0% 2.30 zeusmp 2.8% 8.1% 2.14 TABLE IV M ODEL ERRORS AR Model: AMD Opteron SERVER Linear FOR AR ON I NTEL N EHALEM • Linear Regression • Easy, simple Benchmark Avg Err % Max Err % RMSE • Odd mis-predictions astar 5.9% 28.5% 4.94 • Corrective methods games gobmk 5.6% 5.3% 44.3% 27.8% 5.54 4.83 required zeusmp 7.7% 31.8% 7.24 TABLE IV M ODEL ERRORS FOR AR ON I NTEL Nehalem SERVER Linear AR Model: Intel N EHALEM Avg Max RMSE Benchmark Err % Err % astar 5.9% 28.5% 4.94 games 5.6% 44.3% 5.54Tuesday, March 15, 2011
  16. 16. Prediction w/ Chaotic Time Series TABLE 4 dications of chaotic behavior in power time series Chaotic behavior (AMD, Intel) Chaotic Time Series Benchmark Hurst Average • Time-delay reconstructed state space Parameter (H) Lyapunov Exponent • Uses Takens Embedding Theorem: bzip2 (0.96, 0.93) (0.28, 0.35) • Time-delayed partition of observations to build function that cactusadm (0.95, 0.97) (0.01, 0.04) preserves the topological and gromac (0.94, 0.95) (0.02, 0.03) leslie3d (0.93, 0.94) (0.05, 0.11) dynamical properties of our original omnetpp (0.96, 0.97) (0.05, 0.06) chaotic system perlbench (0.98, 0.95) (0.06, 0.04) • Find nearest neighbors on attractor to our observations • Perform least-square curve fit to find aponent can be calculated using: polynomial that approximates the attractor 1 N −1 λ = lim ln|f (Xn )|. N →∞ N n=0e found a positive Lyapunov exponent when per-ming this calculation on our data set ranging from1 to 0.28 (or 0.03 to 0.35) on the AMD (or Intel) testver, as listed in Table 4, where each pair indicates Tuesday, March 15, 2011
  17. 17. Kernel weighting 1. −m K(x) = (2π) exp(−x /2) 2 2 3. 1 x n+p Kβ (x) = K( ) Op ∗ Kβ (Xt−1 − x) β β t=p+1 ˆ f (x) = n+p 2. Kβ (Xt−1 − x) 1 t=p+1 4 5 T β= σ Op = (Xt−1 , . . . , Xt−p ) 3p σ = median(|xi − µ|)/0.6745 ¯ ¯Tuesday, March 15, 2011
  18. 18. Forward prediction • Start with a Taylor series expansion fˆ(X) = f (x) + f (x)T (X − x) ˆ ˆ • Find the coefficients of the polynomial by solving the linear least squares problem for a and b: n+p T 2 Xt − a − b (Xt−1 − x) ∗ Kβ (Xt−1 − x) t=p+1 • Explicit solution for our linear least squares problem: n+p ˆ 1 f (x) = (s2 − s1 ∗ (x − Xt−1 ))2 ∗ Kβ ((x − Xt−1 )/β) n t=p+1 n+p 1 si = (x − Xt−1 )i ∗ Kβ ((x − Xt−1 )/β) n t=p+1Tuesday, March 15, 2011
  19. 19. Time Complexity n future observations p past observations Creating a CAP: O(n ) 2 Predicting with a CAP: O(p)Tuesday, March 15, 2011
  20. 20. Initial Evaluation and ResultsTuesday, March 15, 2011
  21. 21. gobmk C Artificial Intelligence: Go ipmito FP Benchmarks are ava Initial Evaluation and Results calculix C++/F90 Structural Mechanics zeusmp F90 Computational Fluid Dynamics commo Solaris Linux. TABLE 7 dtrace Test hardware configuration tunable impact Sun Fire 2200 Dell PowerEdge R610 consum CPU 2 AMD Opteron 2 Intel Xeon (Nehalem) 5500 The p CPU L2 cache 2x2MB 4MB Memory 8GB 9GM power Internal disk 2060GB 500GM and th Network 2x1000Mbps 1x1000Mbps measur Video On-board NVIDA Quadro FX4600 Height 1 rack unit 1 rack unit ampera memor the run TABLE 5 CAT increases linearly, as can be obtained in Eq. (15). downlovalues, TABLE 6 SPEC CPU2006 benchmarks used for model interna Open Training Benchmarks calibration Evaluation Benchmarks The actual computation time results for our CAP SPEC CPU2006 benchmarks used for evaluation code implemented using MATLAB run on machines the dif coun (detailed in Table 7) with respect to different n and p sured u the O Integer Benchmarks Integer Benchmark values are provided in the next section. one Ag In bzip2 C Compression astar C++ Path Finding domain is de- mcf C Combinatorial Optimization gobmk C Artificial Intelligence: Go have from thaverage omnetpp C++ Discrete Event Simulation 5 FP E VALUATION ipmi Benchmarks a benchsed on host. a are FP Benchmarks . , Xt−p A calculix C++/F90 Structural Mechanics out to evaluate set of experiments was carried gromacs C/F90 Biochemistry/Molecular Dynamics the performance of power models built using CAP zeusmp F90 Computational Fluid Dynamics comm (u) = cacstusADM C/F90 Physics/General Relativity Solaras de- leslie3d F90 Fluid Dynamics techniques to approximate a solution for dynamic 5.2 R lbm C Fluid Dynamics systems following Eq. TABLE 7 purpose of the first (12). The Linu experiment was to confirm the time complexity CAP. Fig. 8 dtra Test hardware configuration The behavior of CAP was simulated using MATLAB from C tuna using two criteria: sufficient coverage of the functional on the hardware described in Table 7 withR610 varying tual po impa Sun Fire 2200 Dell PowerEdge system units in the processor and reasonable applicability values of n future observation and p past observa- cons to the problem space. Components of the processor tions. Fig. 6 illustrates the behavior of (Nehalem) 5500 CPU 2 AMD Opteron 2 Intel Xeon CAP as the three A Th affect the thermal envelope in different ways [40]. This CPU L2 cache 2x2MB 4MB over th value of n is varied and confirms the O(n2 ) behavior Memory 8GB 9GM pow issue is addressed by balancing the benchmark selec- of the predictor in this case. The behavior of CAP as Internal disk 2060GB 500GM betwee and tion between integer and floating point benchmarks pNetwork is shown in Fig. 71x1000Mbps is varied 2x1000Mbps and supports the claim benchm meas a local in the SPEC CPU2006 benchmark suite. Second, the Video On-board NVIDA Quadro FX4600 polyno Tuesday, March 15, 2011 of linear behavior. amp
  22. 22. Results: AMD Opteron f10h (a) Astar/CAP. (b) Astar/AR(1)). (c) Zeusmp/CAP. (d) Zeusmp/AR(1). Fig. 8. Actual power results versus predicted results for AMD Opteron.Tuesday, March 15, 2011
  23. 23. Results: Intel Nehalem (c) Zeusmp/CAP. (d) Zeusmp/AR(1). Fig. 8. Actual power results versus predicted results for AMD Opteron. (a) Astar/CAP. (b) Astar/AR(1)). (c) Zeusmp/CAP. (d) Zeusmp/AR(1). Fig. 9. Actual power results versus predicted results for an Intel Nehalem server.Tuesday, March 15, 2011
  24. 24. Results: Error - Other BenchmarksTuesday, March 15, 2011
  25. 25. Observations and Analysis • Where does maximum error occur? • Choice of performance counters • Difference in behavior between processors? • The right set of performance counters • Benchmark selectionTuesday, March 15, 2011
  26. 26. Thermally-Aware SchedulingTuesday, March 15, 2011
  27. 27. Problem nature • Scheduling... • in time: who runs next • in space: who runs where • Optimization problem • Who runs next: least use of energy with best performance quality of service • Who runs where: best utilization of resources with least increase in processor and/or ambient temperatureTuesday, March 15, 2011
  28. 28. Thermal Extensions to System Model Applications have a length: 2. 1. For which we define L(A, DA , t) Thermal Equivalent of Application and generate workload ΘA (A, DA , T, t) = U (A,DA ,t) lim Je × (T − Tnominal ) T →Tth U (A, DA , t) = lim n × W (pi , di , t) × Ln (An , DAn , t), 1 ≤ i ≤ p n→ke 3. Which is used to generate 4. That is used to compute Thermal Efficiency to Completion Cost of Performance per Unit Power ΘA (A,DA ,T,t) ΘA (A,DA ,T,t) η(A, DA , T, t) = ΘA (Ae ,DAe ,Tme ,Le ) Cθ (A, DA , T, t) = Esys (A,DA ,t)Tuesday, March 15, 2011
  29. 29. Extending CAP for Thermal Prediction • Thermal Chaotic Attractor Predictor (TCAP) • Extends CAP to thermal domain • Created and used in similar manner to CAP • Matching TCAP for each thermal metricTuesday, March 15, 2011
  30. 30. Reducing Processor Temperatures • Premise: Processor die temperature can be managed by controlling what threads execute over time • Predict the next thread to run a logical CPU using TCAP for processor die temperatureTuesday, March 15, 2011
  31. 31. Reducing Ambient Temperature • Premise: Control system ambient temperature by managing load on logical CPUs so that overheated resources have time to recover • Partition resources into categories based on predicted change in temperature • Move workload from “HOT” resources towards “COLD” resourcesTuesday, March 15, 2011
  32. 32. Status, Plans, and SummaryTuesday, March 15, 2011
  33. 33. Current Status Thermally- A Full-System Effective + Aware Energy Model Prediction Scheduling • Development Complete • Evaluation Complete • Design complete • Intel + AMD processors • Prototype under development • OpenSolaris (Solaris 11) • Peer-reviewed • Conference/Workshop: 3 • Journal: 1Tuesday, March 15, 2011
  34. 34. Plan for Completion ID Task 1 Respond to review comments for [Lewis 2011] 2 Implement scheduler prototype in FreeBSD 3 Evaluate scheduler performance using parallel benchmarks 4 Document results and submit to archival journal 5 Create dissertation from Prospectus + output from previous task 6 Defend dissertation 7 Respond to comments from committee and Graduate School editor 8 Submit final version of documentTuesday, March 15, 2011
  35. 35. Future Directions • Extend beyond a single blade • Cluster, Grid, and Cloud Scheduling • MPI, OpenMP, and other environments • Impact of operating system virtualization • Extension of the thermal model in terms of the thermodynamics of computationTuesday, March 15, 2011
  36. 36. Questions?Tuesday, March 15, 2011
  37. 37. Additional MaterialTuesday, March 15, 2011
  38. 38. This work was supported in part by the U.S. Department of Energy and by the Louisiana Board of RegentsTuesday, March 15, 2011
  39. 39. Publications List Lewis, A., Ghosh, S., and Tzeng, N.-F. 2008. Run- time energy consumption estimation based on workload in server systems. Proceedings of the 2008 conference on Power aware computing and systems. Lewis, A., Simon, J., and Tzeng, N.-F. 2010. Chaotic attractor prediction for server run-time energy consumption. Proc. of the 2010 Workshop on Power Aware Computing and Systems (Hotpower’10). Lewis, A., Tzeng, N.-F., and Ghosh, S. 2011. Time series approximation of run-time energy consumption based on server workload. Under review for publication in ACM Transactions on Architecture and Code Optimization.Tuesday, March 15, 2011

×