• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
Green it economics_aurora_tco_paper
 

Green it economics_aurora_tco_paper

on

  • 2,294 views

An analysis of a total cost of ownership case in supercomputers

An analysis of a total cost of ownership case in supercomputers

Statistics

Views

Total Views
2,294
Views on SlideShare
2,294
Embed Views
0

Actions

Likes
0
Downloads
15
Comments
0

0 Embeds 0

No embeds

Accessibility

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Green it economics_aurora_tco_paper Green it economics_aurora_tco_paper Document Transcript

    • The economics of green HPC a total cost of ownership approach an Eurotech HPC Division White Paper by Giovanbattista Mattiussi, MBA, MSc marketing manager, Eurotech S.p.a.
    • Table of contents Introduction 3 Remarks 4 Definitions 6 Data centers comparison example 9 Results 14 Economical conclusions 19 Aurora from Eurotech 21 References M.K. Patterson, D.G. Costello, & P. F. Grimm Intel Corporation, Hillsboro, Oregon, USA M. Loef- fler Intel Corporation, Santa Clara, California, USA - Data center TCO; a comparison of high- density and low-density spaces Rachel A. Dines, Forrester - Build Or Buy? The Economics Of Data Center Facilities Neil Rasmussen - Re-examining the Suitability of the Raised Floor for Data Center Applica- tions Jonathan Koomey, Ph.D. with Kenneth Brill, Pitt Turner, John Stanley, and Bruce Taylor - A Sim- ple Model for Determining True Total Cost of Ownershipfor Data Centers Eurotech S.p.a. – Aurora systems technical specifications Fabrizio Petrini, Kei Davis and Jos´e Carlos Sancho - System-Level Fault-Tolerance in Large- Scale Parallel Machines with Buffered Coscheduling Narasimha Raju, Gottumukkala, Yudan Liu, Chokchai Box Leangsuksun, Raja Nassar, Stephen Scott, College of Engineering & Science, Louisiana Tech University, Oak Ridge National Lab - Reliability Analysis in HPC clustersEconomics of Green HPC - A TCO approach 2
    • Introduction The evolution of supercomputers from very expensive special machines to cheaper high performing computing (HPC) has made very fast and advance systems available to an au- dience much vaster now than just 10 years ago. The availability of a lot more computing power is helping to progress in many strategic fields like science, applied research, na- tional security and business. This has fired the demand for even more computing power and more storage to stock larger and larger amounts of data. At the same time, HPC has be- come more pervasive, being progressively and increasingly adopted in fields like oil&gas, finance, manufacturing, digital media, chemistry, biology, pharmaceutics, cyber security, all sectors that together with the “historical” HPC users (scientific research, space and cli- mate modeling) have seen the demand for HPC services to soar. More computing power per installation and more installations mean more electrical power needed and more complex heat management. In the last decade there has been a dramatic raise in electrical power requirements to both feed and cool an increased number of serv- ers. According to the U.S. Environmental Protection Agency (EPA) the amount of energy con- sumed by data centers has doubled between 2000 and 2006 alone. Projecting this in the future means facing harsh problems, like electrical power availability, energy cost and car- bon footprint associated to datacenters. While the cost of powering and cooling a data center has been traditionally low compared to the cost of hardware, this is now changing. As shown in Figure 1, the cost of power could reach that of hardware in the next 3 years, especially in areas like Europe or northeast US, where utilities have a high price. This is particularly true in supercomputing, where computational power and utilization are taken to extreme levels. All of the above suggests that energy efficiency is becoming an unavoidable requirement not only for green IT but also for the economic health of the HPC data centers. Another trend with HPC systems is the increase in density. On the one hand, this raises the heat concentration and gives more problems in how to extract it. On the other hand, more density means also to gain benefits in terms of cost and green, especially when, as with liquid cooling, an effective way of heat extraction is found. High density means low space occupancy, which comes with capital and financial gains related to occupancy reduction and intangible gains related to the lower carbon footprint that saving real estate entitles. Supercomputers occupy the extreme side of computing, like F1 in cars. Such machines are often utilized to the limit of their computational power, with utilization rates close to 100%. Reliability of supercomputers becomes paramount to really exploit their main char- acteristic of being fast. If a HPC system breaks often, than the advantages of a lot of Flops may fade away quite quickly. Reliability is associated to the cost of maintenance, spare parts and to the generally much bigger business cost of an outage.Economics of green HPC - a TCO approach 3
    • Reliability means basically less faults or less major faults, and hence less spare parts. Low- ering parts replacements has a connection to sustainability, according to which waste and depletion of hearth resources should be limited. Green IT and economics are intimately interweaved. This paper wants to provide some food for thought to help organizations to take the right decisions when buying HPC sys- tems, considering both an economic and an ecologic reasoning. The approach chosen in the following paragraphs is to create a comparison among 3 different data centers using 3 different HPC systems and to detail a hypothesis of total cost of ownership for each of them. $70 New Server Costs $60 $50 Billion $40 Power Costs $30 $20 $10 0 2007 2008 2009 2010 2011 Figure 1: Raise in power cost (source Intel) Remarks This paper focuses on hardware, even if it recognizes the determinant importance of software in making HPC systems greener and at the same time economically more attractive. Software impacts both the capital and operational sides of a HPC return of investment calculation. Bet- ter software and architectures mean saving energy, reducing time to solution and time to mar- ket, better hardware utilization and improved operational efficiency. Ultimately, it is software that determines how HPC hardware is used and hence discriminates between simple hardware expenditures and full rounded strategic investments. It is the integrated approach to optimize hardware and software investments that allows reach- ing the best returns: fixing one side and neglecting the other won’t make any business case really sound.Economics of green HPC - a TCO approach 4
    • This is particularly true when attempting to make a data center greener. An integrated green IT approach means optimizing all parts in a data center, from processor to building. This paper cannot detail all actions needed to achieve greener results but wants to emphasize starting from a case that greener and cheaper often go together, side by side. While storage has its impact on green IT management and economics, the analysis presented in this paper concentrated on computing units, assuming that the same storage is installed in the 3 data centers of the example reported below.Economics of green HPC - a TCO approach 5
    • Definitions Some definitions may help the reader to better understand the remaining of this paper. TCO TCO stands for total cost of ownership. It is the sum, adjusted for the time value for money, of all of the costs that a customer incurs during the lifetime of a technology solution. Talking about an HPC data center, a common breakdown of those costs includes: • Purchase of hardware, including cabling and switches • Purchase of software licences • Building work for new constructions or extensions, including power adaptation • Air conditioning and cooling • Electrical equipment, , including power distribution units, transformers, patch pan- els, UPSes, auto transfer switches, generators… • Installation of hardware, software, electrical and cooling • Hardware, software, electrical and cooling maintenance • Software upgrades and recurring (monthly, yearly) software licences • Any additional form of lease • Energy costs • Disposal costs PUE PUE stands for power usage effectiveness. It measures how much of the electrical power enter- ing a data center is effectively used for the IT load, which is the energy absorbed by the servers and usefully transformed in computational power. The definition of PUE in formula is as follows: Total Facility Energy Consumption PUE = IT Equipment Energy Consumption The perfect theoretical PUE is equal to 1, that means all of the energy entering the data center is used to feed IT equipment and nothing is wasted. ERE ERE stands for energy reuse efficiency and it is the ratio between the energy balance of the data center and the energy absorbed by the IT equipment. The data center energy balance is the total energy consumed by the data center minus the part of this energy that is reused outside the data center. A typical example where ERE is a useful matric is liquid cooling, with which water, cooling the IT equipment, heats up and moves energy outside the data center, where it can be reused. The ERE can range between 0 and the PUE. As with the PUE the lower the value, the better is for the data center. Practically speaking, ERE as a metric, helps in those situations where the PUE is not enough to explain energy reuse. It now mends situations, where it was common habit to factor the energy recovery into the PUE, talking about a PUE lower than 1, which makes a mathematical non-sense.Economics of green HPC - a TCO approach 6
    • Total Facility Energy Consumption-Recovered Energy ERE = IT Equipment Energy Consumption CUE CUE stands for carbon usage effectiveness. It measures the total CO2 emissions caused by the data center divided by the IT load, which is the energy consumed by the servers. The formula can be expressed as follows: CO 2 emitted (Kg CO 2 eq) Total data center energy CUE = X unit of energy (Kwh) IT Equipment energy which, simplified, becomes: CUE = PUE X CEF The CEF is the carbon emission factor (kgCO2eq/kWh) of the site, based on the government’s published data for the region of operation for that year. The CEF depends on the mix of energy types a site is fed with. A site which is entirely powered by a hydroelectric or solar power sta- tion, it has a low CEF. A site which is entirely powered with electricity produced in an oil or coal power station, it bears a higher CEF. While CEF can vary with the site considered, the utility companies provide regional or national average values that can be used with good approxima- tion. Layout The layout efficiency measures the data center square footage utilization and the efficiency of efficiency the data center layout. This is defined as racks per thousand square feet and so it is a measure of space efficiency related to the electrically active floor in a data center. DC layout The most datacenter common layout in today’s data centers is a repeating of rows of racks side-by-side with alternating cold aisles and hot aisles. The cold aisle supplies air to the serv- ers, which heat it up and discharge it into a hot aisle, shared with the next row of server. The servers lay on a raised floor, which provides cool air to the cold aisle, while the hot air returns to the air conditioning system. For instance, in the data center in Figure 2, computer refrigerated air conditioning units (CRACs) pull hot air across chillers that distribute the cool air beneath the raised floor. In big buildings requiring more ventilation power, CRACs are normally replaced by air handling units (AHU). In this hot aisle / cold aisle configuration, a varying number of servers can be fit into each rack based on many factors; cooling capability, power availability, network availability, and floor loading capability (the rack’s loaded weight).Economics of green HPC - a TCO approach 7
    • Figure 2 - Server rack thermal example - Source: Intel IT, 2007 Work cell A work cell is a square footage area in a data center, which is directly attributable to a rack of servers. In air cooled data centers, a work cell is a repeating unit of hot aisle, rack and cold aisle. In a liquid cooled data center it is the rack, plus the liquid distribution and the maneuver space in front and on the back of the rack. Liquid If a data center uses server liquid cooling technology, the above layout may look obsolete. In cooling an existing data center, liquid cooling technology doesn’t require a change of layout, because plumbing, electrical and network works can blend into the existing one. In the case of a new data center that employs solely liquid cooling, most if not all the air conditioning infrastructure becomes redundant, leaving many opportunities for layout and space optimizations. Some of them would be the reduction of the work cell (as defined above), the reduction of the raised floor, the avoidance of chillers, CRACs and air handling units, to be replaced by the cooling circuit (pipes) and free coolers (liquid to air heat exchangers).Economics of green HPC - a TCO approach 8
    • Data centers comparison example The best way to get a good grip with the TCO concept is to draw an example, where 3 different data centers are compared: • A data center with a low/medium density HPC system, air cooled • A data center with an high density HPC system, air cooled • A data center with an high density HPC system, water cooled Table 1 summarizes the main differences and hypothesis. Air cooled low den- Air cooled high Liquid cooled high sity data center density data center density data center Location Temperate climate Temperate climate Temperate climate PUE 2.3 1.8 1.05 Power mix Various energy types Various energy types Various energy types Table 1 - Comparison between the 3 different datacenters The data centers are located in the same geographical area and hence the external tempera- ture profile is the same for all 3 cases throughout the year. The measurements for the PUE computation are taken, one at the plug from where the data center is powered, the other after the PSU, immediately before the circuit feeds the computa- tion units. In this way, there is a neat separation between the power absorbed by the IT equip- ment and the one used for cooling, lighting and other ancillary services. The type of energy mix that feeds the data center is important to calculate the CUE. The CUE can give an indication of how green a data center is on the basis of its carbon footprint. In terms of the actual HPC systems installed in the 3 data centers, as mentioned, the analysis will focus on computational nodes, considering the following 3 systems: • A low/medium density high performance cluster made up of 1U servers, air cooled • A high density blade HPC server, air cooled • A high density blade HPC server, water cooledComparing Comparing systems that are fundamentally different does not always produce fair results, be- cause the risk is to compare apples and oranges, so to compare systems with different features and value addition that fit best the customer depending on the case. So, in order to level out differences as much as possible, the comparison in this example is made between systems delivering the same computational power (500 Tflop/s), having the same cluster architecture, the same RAM, the same Interconnects and software stack. Regarding the processors, the com- parison makes 2 different hypotheses. The first is summarized in Table 2 and it assumes that 3 different processors are used. The second is summarized in Table 3 and posits that the same processor is used.Economics of green HPC - a TCO approach 9
    • Air cooled Air cooled Liquid cooled high low density high density density HPC HPC HPC TFLOP/S 500 500 500 1U servers,Intel Architecture Blades, Intel CPUs Blades, Intel CPUs CPUs 2xIntel Xeon E5, 2xIntel Xeon X5650 2xIntel Xeon X5690 Processors per node (Sandy Bridge) 8C 6C 2.66GHz 6C 3.46 GHz 2.60GHz Layout efficiency 22 22 40 GFLOPS/node 130 165 340 Total nodes needed 3846 3030 1471 Servers/blades per rack 35 100 256 Cores per rack 280 1200 4096 Server racks needed 110 30 6 Total network equipment 481 379 184 Total racks for network 11 9 4 equipment Total racks needed 121 39 10 Occupancy (ft 2) of electri- 5515 1787 253 cal active floor Total DC space (in ft 2) 11031 3575 953 (In m ) 2 1015 329 88 Energy consumption Mflops/W 280 450 800 Total IT Power (Kw) 1,786 1,111 625 Total DC power (Kw) 4107 2000 656 Reliability MTBF per node (hours) 670,000 670,000 804,000 MTBF per system (hours) 174 221 547 Price of outage per h ($) 2000 2000 2000 Table 2 – Characteristics of the 3 data centers: case of different processor typesEconomics of green HPC - a TCO approach 10
    • Air cooled Air cooled Liquid cooled high low density high density density HPC HPC HPC TFLOP/S 500 500 500 1U servers, Intel Architecture Blades, Intel CPUs Blades, Intel CPUs CPUs 2 x Intel Xeon X5690 2 x Intel Xeon 2 x Intel Xeon X5690 Processors per node 6C 3.46 GHz X5690 6C 3.46 GHz 6C 3.46 GHz Layout efficiency 22 22 40 GFLOPS/node 165 165 165 Total nodes needed 3030 3030 3030 Servers/blades per rack 35 90 256 cores per rack 420 1080 3072 Server racks needed 87 34 12 Total network equipment 379 379 379 Total racks for network 9 9 9 equipment Total racks needed 96 43 21 Occupancy (ft2) of electri- 4345 1940 521 cal active floor Total DC space (in ft2) 8691 3881 1221 (In m2) 800 357 112 Energy consumption mflops/w 450 450 450 Total IT Power (Kw) 1,111 1,111 1,111 Total DC power (Kw) 2556 2000 1167 Reliability MTBF per node (hours) 670,000 670,000 804,000 MTBF per system (hours) 221 221 265 price of outage per h ($) 2000 2000 2000 Table 3 – Characteristics of the 3 data centers: same processorsEconomics of green HPC - a TCO approach 11
    • The comparison involves CPU based systems. The same reasoning can be extended to GPUs based systems or hybrid (CPU and GPU based), but we think it would be not useful to compare a CPU system with a hybrid or GPU based one, due to their different technological nature. Density In terms of density, the comparison is made between a low density 1U servers system and 2 higher density blade servers systems, one air cooled and the other liquid cooled. According to Uptime institute, on average in data centers, 75% of a rack gets filled by computing nodes, the rest by power supplies, switches… In the current example, this would set a number of 35 and 100 servers per rack, respectively for the 1U and blade server configurations. For the third system, water cooled, we have taken as reference the Aurora HPC 10-10 from Eurotech, which mounts 256 blades per each 19 inches rack. Racks In terms of number of racks and servers needed, using different processor types or using the same processor type in the 3 systems being compared results in a different total number of servers and racks. In the case the 3 systems use different processors, it is assumed that each node in the 3 systems has 2 processors each, generating a number of Gflops that varies depending on the processor itself. It follows that, to reach the required total Flops, each system will need a different number of servers (nodes). If the same processor type is used and if we make the additional assumption that each node has 2 sockets and generates an equal number of Gflops per node, then the 3 systems have the same number of servers. Nevertheless, in both cases, the density of the systems, so how many cores can be packed into each rack, will determine how many racks are needed and consequently the number of racks needed is different in the 3 systems compared. Following the Uptime institute observations, it has been assumed that the network equipment is roughly 10% of the total IT equipment used in the data center. This has led to the numbers reported in the 2 tables above, where there are also the total racks needed to host the network equipment (assumed to be 1U). Layout With regards to the layout efficiency, it has been assumed as 22 standard racks per 1000 ft2, Efficiency number obtained from the standard size work cell (48sqft). We assumed that a liquid cooled system requires a smaller work cell, because it eliminates the plenum requirements for optimal air cooling. For the liquid cooled system, an assumption of work cell of 25 ft2 seems to be fair and leads to a layout efficiency of 40 racks for 1000 ft2. Footprint In terms of footprint, once the number of racks needed per data center type is determined, knowing the layout efficiency of the data center, a simple calculation gives the total square feet of electrical active floor. The electrical active occupancy has been doubled to include all infrastructure and data canter facilities, the additional space being at minimum on the order of 700 ft2Economics of green HPC - a TCO approach 12
    • Electrical The total power needed in the data center has been calculated making 3 assumptions re- power garding the efficiency of the IT equipment, efficiency expressed in Mflops/w. The TOP 500 list helped us to understand the typical values of Mflops/w per processor type, from which we derived the electrical power needed for the IT equipment to generate the required Flops. The value obtained for IT power of each datacenter was multiplied by the PUE of the datacenter to determine the total electrical power installed in each of them. Reliability System failure rates increase with higher temperature, larger footprints, and more IT compo- nents. Efficient cooling and a reduction of the number of components are essential to increase systems reliability. We have calculated an average nominal MTBF (mean time between failures) per each node, representative of the reliability of the entire system, aggregating the MTBF val- ues of single components (memory, processors…) interconnects, switches, power supplies. The overall system MTBF was derived from the single node MTBF through the formula: 1 MTBF = N is number of nodes in μ is the fault rate for each node, i=N ∑ the system: the higher, the higher, the lower is the MTBF μi the lower is the MTBF i=1 For the water cooled system we have again taken as reference the Eurotech Aurora super- computer family, in which the limitation of hot spots, absence of vibrations, soldered memory and 3 independent sensor networks help to reduce the fault rate of each node, leading to an increased MTBF of the total system (assumed 20% higher). To comprehensively consider the systems availability through their life, an analysis of the re- liability should be extended to all of the electrical and mechanical components in the data center, including auxiliary electrical equipment (like UPSs) and the cooling system. Capital To calculate the annualized investment costs, we have used the static annuity method, defin- recovery ing a capital recovery factor, which contains both the return of investment (depreciation) and factor the return on investment. The CRF is defined as: d(1+d) L CRF = (1+d) L - 1 Where d is the discount rate (assumed 7%) and L is the facility or equipment lifetime. The CRF converts a capital cost to a constant stream of annual payments over the lifetime of the invest- ment that has a present value at discount rate d that is equal to the initial investment.Economics of green HPC - a TCO approach 13
    • Results We have summarized the quantitative results of the analysis in the set of tables reported below. Table 4 reports the results of the comparison in terms of total TCO (3 years) Table 5 reports the annualized TCO results Table 6 shows the results of the total TCO (3 years) in the case the same processor types are used Table 7 shows the results of the annualized TCO in the case the same processors types are used CSA costs The calculations consider the costs of the entire data center, including civil, structural and ar- chitectural (CSA), IT equipment, cooling, electrical capital cost, the energy consumption, main- tenance and additional operating costs. The capital cost to purchase hardware and software is assumed to be the same for the 3 systems. This hypothesis will be relaxed in the next chapter, where some conclusions are presented. CSA costs are highly impacted by the density. According to Forrester research, the CSA cost per ft2 is around $220/ft2 (in USA). This is an average number, its variance depending on the real estate costs and building costs, which may vary greatly by area, region or nation. The permitting and fees for any project are often based on square footage. The specifics of the individual site location would dictate the magnitude of additional savings associated with higher density. Forrester estimates $70 per square foot in building permits and local taxes, which represents a moderate cost in the United States. CFD The cost of computational fluid dynamics (CFD) depends more on complexity than on the square footage. However, it is possible to assume that for a new and homogenous data center $5/ft2 is a fair number. Raised Air cooled data centers normally require a raised floor higher than the ones adopting liquid floor cooling technologies. In fact, liquid cooling may not require a raised floor at all, but it would be safer to assume that some raised floor is needed for cabling and piping. The presence of a higher raised floor requires a similar height increase in the return air plenum, so the overall building height is higher in the case of air cooled systems than in the case of liquid cooled ones. Although building costs are far less sensitive to height than to area, we can assume that the marginal cost to increase the height of the building as required is 10% of the total building cost. As for the raised floor per se, the cost associated to the height delta between the air and liquid cooling cases could be assumed on the order of 2$ per ft2. Racks The IT equipment is normally a large portion of the total budget. The cost of computational node cards, processors, memory, I/O is considered to be the same in the 3 systems being com- pared. What differs is the number of physical racks needed to deliver the 500 Tflop/s required. To correctly account for this difference, a per-rack cost of $1500 is assumed, with an additional $1500 to move in and install it. This value is conservative, considering that it depends of how advanced and interconnected is the data center. Each rack is normally associated to a rack management hardware which can be quoted a similar amount.Economics of green HPC - a TCO approach 14
    • Air cooled low Air cooled high Liquid cooled density density high density Initial investment costs Cost of IT (HW) $7,500 $7,500 $7,500 Building permits and local taxes $770 $250 $60 CSA capital costs $2,420 $780 $200 Capital cost of taller DC $240 $70 $0 Design cost for CFD $30 $10 $0 Delta cost raised floor $20 $10 $0 Fire suppression and detection $60 $30 $20 Racks $360 $110 $30 Rack management hardware $360 $110 $30 Liquid cooling $0 $0 $320 Total for network equipment $960 $750 $360 Cooling infrastructure/plumbing $5,350 $3,330 $280 Electrical $7,140 $4,440 $2,500 TOTAL CAPITAL COSTS $25,210 $17,390 $11,300 Annual costs (3 years) Cost of energy $8,660 $4,980 $1,630 Retuning and additional CFD $50 $20 $0 Reactive maintenance (cost of outages) $1,510 $1,190 $490 Preventive maintenance $450 $450 $450 Facility and infrastructure maintenance $2,040 $1,170 $420 Lighting $50 $20 $10 TOTAL TCO $37,770 $25,020 $14,100 Table 4 – 3 different processors, TCO comparison results (In K USD dollars) Air cooled low Air cooled high Liquid cooled density density high density Cost of energy $2,720 $1,560 $510 Retuning and additional CFD $20 $10 $0 Reactive maintenance (cost of outages) $500 $390 $160 Preventive maintenance $150 $150 $150 Facility and infrastructure maintenance. $670 $380 $130 Lighting $20 $10 $5 Annualized 3 years capital costs $3,440 $3,180 $2,960 Annualized 10 years capital costs $1,770 $1,100 $440 Annualized 15 years capital costs $380 $120 $30 ANNUALIZED TCO $9,670 $6,900 $4,385 Table 5 – 3 different processors, annualized TCO comparison results (In K USD dollars)Economics of green HPC - a TCO approach 15
    • Air cooled low Air cooled high Liquid cooled density density high density Initial investment costs Cost of IT (HW) $7,500 $7,500 $7,500 Building permits and local taxes $600 $270 $80 CSA capital costs $1,910 $850 $260 Capital cost of taller DC $190 $80 $0 Design cost for CFD $30 $10 $0 Delta cost raised floor $20 $10 $0 Fire suppression and detection $60 $30 $20 Racks $290 $130 $70 Rack management hardware $290 $130 $70 Liquid cooling $0 $0 $810 Total for network equipment $750 $750 $750 Cooling infrastructure/plumbing $3,330 $3,330 $500 Electrical $4,440 $4,440 $4,440 Annual costs (3 years) Cost of energy $5,390 $4,980 $2,900 Retuning and additional CFD $40 $20 $0 Reactive maintenance (cost of outages) $1,190 $1,190 $1,000 Preventive maintenance $150 $150 $150 Facility and infrastructure maintenance $1,360 $1,180 $730 Lighting $40 $20 $10 TOTAL TCO $27,880 $25,380 $19,590 Table 4 – 3 different processors, TCO comparison results (In K USD dollars) Air cooled low Air cooled high Liquid cooled density density high density Cost of energy $1,690 $1,560 $910 Retuning and additional CFD $20 $10 $0 Reactive maintenance (cost of outages) $390 $390 $330 Preventive maintenance $150 $150 $150 Facility and infrastructure maintenance $450 $390 $240 Lighting $20 $10 $5 Annualized 3 years capital costs $3,390 $3,270 $3,220 Annualized 10 years capital costs $1,100 $1,100 $820 Annualized 15 years capital costs $300 $130 $40 ANNUALIZED TCO $7,510 $7,010 $5,715 Table 5 – 3 different processors, annualized TCO comparison results (In K USD dollars)Economics of green HPC - a TCO approach 16
    • Liquid The third system in the current comparison is liquid cooled. The analysis takes in consideration Cooling the additional cost of the liquid cooling structure. The “liquid cooling” item in the tables above refers to the cost of actual cooling components within the rack. The external piping, dry coolers etc. are grouped under the “Cooling infrastructure” cost item. For instance, in the Eurotech Au- rora supercomputers the cooling parts within each rack include liquid distribution bars, piping and the aluminum cold plates that are coupled with each electronic board, allowing a direct and efficient heat removal. In other supercomputers or HPC clusters, the cooling system may consist in distribution pipes and micro channels. Cooling In the case of air cooled servers, the cost of the cooling infrastructure includes chillers, AHUs,infrastructure CRACs, piping, filters…. In the case of water cooling, it includes heat exchangers (free coolers), piping, filters and pumps. The dimensioning of the cooling depends on the data center location (which, in this example, is assumed the same) and its temperature profile across the year, the Kw/ft2 of IT equipment installed and the size of the building. With air cooled systems, It is assumed that an approximate 3000$ of cooling infrastructure is required for each Kw of IT equipment installed, while the liquid cooling infrastructure is gener- ally cheaper. This is particularly true in the case of a hot water cooled solution, like Aurora from Eurotech. Such technology allows the employment of free coolers instead of the more expen- sive chillers, in most of the world climate zones. Also, much of the ventilation, necessary in an air cooled data center, may be avoided. Hot liquid cooled allows for high density systems to be deployed and hence for less occupancy, reducing the volume of air that needs to be chilled. As a consequence, the use of water cooled systems can save a big chunk of the money dedicated to the cooling infrastructure. Electrical As regards the electrical equipment (e.g., power distribution units, transformers, patch panels, UPSes, auto transfer switches, generators), it is a cost that can be made, like cooling, propor- tional to the KWs of IT equipment installed. For the present example, it has been assumed a cost of $3500 per Kw of IT equipment installed. Energy Analyzing now the operating costs, particular attention should be dedicated to the cost of en- costs ergy. In the comparison being described, the cost of energy is calculated computing the energy consumed by the data center each year and during 3 years of operations. The cost per Kwh has been fixed as: • 0.11$/Kwh for System 1 • 0.13 $/Kwh for System 2 and 3 System 1 cost per Kwh is less than the one for the other 2 systems, because the first datacenter ends up consuming enough to most probably benefit from volume discounts. Both of the en- ergy unit costs chosen are relatively high for some parts of US, but generally low for Europe, where the average cost per Kwh for industrial use is 0.11 € (0.15$). The IT equipment is supposed to bear an average load of 80% and have an availability of 86%, an assumption conservative enough to counterbalance cases in which energy prices are lower than the ones assumed. The calculated cost takes in consideration all of the energy consumed by IT equipment, cooling, ventilation, power conversion and lighting. Economics of green HPC - a TCO approach 17
    • Two main aspects affect the difference in energy consumption: the efficiency of the single servers (in terms of flops/watt) and the efficiency of the whole data center, in terms of PUE.Maintenance We have divided the costs of maintenance in 3 different groups: preventive maintenance for costs the IT systems, reactive maintenance for the IT systems and the maintenance of all other infra- structure, including building. The IT system preventive maintenance includes cost of IT systems monitoring, fine tuning, soft- ware upgrades and patches, consultancy and personnel. It is assumed to be proportional to the flops installed and, for this reason, it is equal for the 3 systems compared. The reactive maintenance has been associated to outages. To take in consideration the cost of HW replacements, personnel, support, consultancy and business impact associated to the fault, a value of 2000$ per each hour of outage has been chosen. While we believe this value is a fair and conservative assumption, it is also true that it may vary a lot from sector to sector and company to company. In particular, the business impact cost of an hour of outage may span between few hundred dollars to millions depending on the type of outage and the nature of the business of the company using the HPC facility (an obvious difference could be comparing a small manufacturing company with a large investment bank). While the annual facility and infrastructure is an unpredictable ongoing costs of a data center, on average, companies should expect a base cost of 3% to 5% of the initial construction costAnnualized The initial investment costs (capital costs) have been annualized using 3 different year bases, 3, TCO 10 and 15 years. Capital costs with three year life include all IT equipment costs (which include internal routers and switches). Capital costs with 10 years life include equipment other than IT (electrical and cooling). Capital costs with 15 year lifetime include all other capital costs (build- ing). We have calculated a capital recovery factor, using the different life values and a 7% real dis- count rate. CHILLER 23% ELECTRICAL POWER IN HUMIDIFIER 3% CRAC/CRAH 15% HEAT OUT INDOOR IT EQUIPMENT 47% DATA CENTER PUE = 2.13 HEAT PDU 3% UPS 6% LIGHTING/AUX DEVICES 2% SWITCHGEAR/GENERATOR 1% Figure 3: Typical power distribution in a data center (source APC)Economics of green HPC - a TCO approach 18
    • Economical conclusions This brief study demonstrates that there is a sense in considering the overall total cost owner- ship of an HPC solution and so extending the cost analysis beyond the initial expenditure of hardware and software. Energy efficiency, density and higher reliability play an important role in determining the overall cost of a HPC solution and consequently its value, compounded as total benefits minus total costs. In the example reported in this paper, an assumption was made that the purchase cost of hard- ware was the same for the 3 systems being compared. We want now to relax this assumption, keeping a distinction between the 2 cases in which different and the same processor types are used. In the case the 3 systems have different processor types, the increased energy efficiency, den- sity and flops/$ that newer generation processors bring, put in advantage the system that adopted the latest Intel technology. There is a clear benefit in using the latest technology (at least comparing systems with the same total flops). If we compare 3 systems mounting different processor types, in order for the annualized TCO of the 3 systems to be the same: • The 1U server solution cannot equal the TCO of the water cooled solution, even if the IT hardware is given away for free • The IT hardware of the air cooled blade server solution must be 70-80% cheaper than the IT hardware of the liquid cooled one. In the case the 3 systems compared are using the same processors, in order for the annualized TCO to be the same: • The IT hardware of 1U server solution must be 50-55% cheaper than the IT hardware of the water cooled solution • The IT hardware of the air cooled blade server solution must be 35-40% cheaper than the IT hardware of the water cooled solution. It is important to remark that a TCO study is conditioned by the particular organization situa- tion. Important variables are location, availability of datacenter space, energy contracts, avail- ability of power…While this is certainly true, the example reported in this paper gives an idea of how useful it is to dig deeply enough to discover that a throughout analysis can lead to more sensible purchases.Economics of green HPC - a TCO approach 19
    • Green considerationsEconomics We have seen that energy efficiency, density and reliability are areas that contribute to im- and green prove the TCO of a high performance computing solution. Incidentally, they are also means to reach greener IT. Saving energy is a paramount component of any green IT policy. We have pointed out that the approach to energy savings should be multilevel, involving hardware, software, facilities and even human behavior in a data center. Some actions, more than others, can have a determinant impact in making datacenter greener, like, for instance, adopting liquid cooling, which lower the data center PUE down to limits not easily reachable with air cooling. Also, a reduction in space occupancy ticks many green boxes, when building avoidance or reduction of newly built is taken in consideration. High density HPC solutions help delaying the construction of new real estate or limiting the size of newly built, contributing to reduce carbon footprint and the depletion of hearth resources, as well as limiting the environmental impact on landscapes. High reliability means less faults, so less spare parts, making it possible to save on components that use hearth resources and need to be disposed, with the consequent creation of waste. Green value A part from the benefits brought to the cost side of the profit equation, Green IT can influence the revenue side, when organizations are able to capitalize their green image or posture. Many companies and public institutions can leverage the intangible value of being green, through increased reputation that pushes up sales or through the government contributions that are reserved to green policies/sustainability adopters. The biggest payback of green and sustainability will be seen overtime. According to the Natu- ral Step, a word renewed sustainability think tank, it is not a matter of if companies and gov- ernments need to adopt sustainability measures, it is a matter of when, within a context where who will be late to adapt to new requirements will pay a severe price. Carbon Moving to a more practical dimension, to complete the example made in the previous para- footprint graph, it would be useful to compute the total number of CO2 tons associated to the 3 data center operations throughout the lifetime of the solution adopted (assumed to be 5 years). Table 8 reports this calculation, in which it has been assumed a CEF (carbon efficiency factor) of 0.59 Kg CO2 per kg (kgCO2eq). This is average in the US and quite similar to Europen average. It is fair to mention that the CEF has a great variability depending on country, state, region, power mix of the data center. Air cooled low Air cooled high Liquid cooled density density high density Tons of CO2 51692 16385 9558 CUE 1.26 1.06 0.62 Table 8 – CO2 equivalent and CUE in the 3 cases under scrutinyEconomics of green HPC - a TCO approach 20
    • Thermal When in operations, the IT equipment of a data center transforms electrical energy in heat. Air energy cooled data centers remove the heat through air conditioning, while liquid cooled IT equip- recovery ment convey the heat out of the building using water. In the latter case, water can be heated enough to be utilized for producing air conditioning, through a absorption chiller, producing some electrical energy or simply heat a building. This practice is called thermal energy recovery and, while still rather marginal due to the average size of present data centers, it will become progressively more important while the average data center dimension increases. Reusing the energy is an interesting concept both from the economic and ecologic viewpoints. This is why a specific measure has been introduced, the ERE, to overcome the limitations of PUE, which is mathematically incapable of considering thermal energy recovery. Aurora from Eurotech Aurora is the family of green, liquid cooled supercomputers from Eurotech. Eurotech develops the Aurora products from boards to racks, maintaining production in house in the Eurotech plant in Japan and keeping a strict control over quality. Aurora product line is the synthesis of more than 10 years of Eurotech HPC history and investments in research projects, giving birth to supercomputers that set the scene for their advanced technology and forward-thinking concepts. Aurora supercomputers tick all boxes to take full advantage of the green economics: Energy efficiency, thanks to liquid cooling and a very high power conversion efficiency. The direct on component water cooling tech- nology allows the use of hot water and free coolers in any climate zone, with no need for expensive and power hungry chillers. In ad- dition, Aurora supercomputers employ a power conversion with ef- ficiency up to 97%. An Aurora data center can aim to a PUE of 1.05. High density, with 4096 cores and 100 Tflop/s per standard rack, Aurora supercomputers are among the densest in the world, per- mitting to save in space, cabling, air conditioning, power and data center complexity. High reliability, guaranteed by high quality, liquid cooling that reduces or eliminates the data center and on board hot spots, vi- bration less operations, redundancy of all critical components, 3 independent sensor networks, soldered memory, optional SSDs. Eurotech HPC division has inherited from Eurotech embedded and rugged PCs core business the ability to construct reliable systems. Eurotech excels in building bomb resistant computers or PCs that need to operate in very harsh conditions, so they know one thing or two…Economics of green HPC - a TCO approach 21
    • About Eurotech Eurotech is a listed global company (ETH.MI) that integrates hardware, software, services and expertise to deliver embedded computing platforms, sub-systems and high performance com- puting solutions to leading OEMs, system integrators and enterprise customers for successful and efficient deployment of their products and services. Drawing on concepts of minimalist computing, Eurotech lowers power draw, minimizes physical size and reduces coding complex- ity to bring sensors, embedded platforms, sub-systems, ready-to-use devices and high perfor- mance computers to market, specializing in defense, transportation, industrial and medical segments. By combining domain expertise in wireless connectivity as well as communications protocols, Eurotech architects platforms that simplify data capture, processing and transfer over unified communications networks. Our customers rely on us to simplify their access to state-of-art embedded technologies so they can focus on their core competencies. Learn more about Eurotech at www.eurotech.com For HPC visit www.eurotech.com/aurora www.eurotech.com For sales information about Eurotech Aurora product range: aurora@eurotech.com For more in depth technical information about Eurotech Aurora range: aurora_tech@eurotech.comEconomics of green HPC - a TCO approach 22