Holistic Approaches to Green IT, Gregynog Colloquium - June 2011



Cardiff University's Green IT presentation at Gregynog 2011.

Published in: Education
  • This talk will cover the major drivers to go green and some important metrics, describe some of the measures that we’ve taken at Cardiff (as well as an example from elsewhere in the UK), starting where the power enters the building, looking at the Datacentre, checking out PC consumption and use on the desktop, and thinking about how to influence people and the organisation.
    These are just examples that may or may not be relevant for your context; e.g. ‘free’ cooling might be relevant up a mountain in Scotland (and should be in Cardiff, but… more of that later) where the ambient temperature is low all year round. Free cooling might not be so relevant in warmer countries. However, what I hope to do is to show you what and how we have thought about the problem, and perhaps you will in turn find that useful.
    I’ll finish with a paradoxical claim - a tool that ensures that PCs use the maximum amount of electricity possible can be fairly considered to be as ‘Green’ as possible. I hope that I’ll be able to persuade you that this last claim is not ‘Greenwash’.
  • So, what do we mean by Green IT? There are many answers out there, which range from ‘minimal or no impact on the environment’, through ‘environmentally sustainable’, to ‘use of resources in an efficient way’.
    I’m not going to debate the definition with you, but I will attempt to persuade you that, whichever definition you choose, it may not be as hard as you think to become a little bit (or perhaps a lot) greener, even though you know that ‘zero environmental impact’ is a long way off.
    Going ‘Green’ is not a one-off set of actions: it is an on-going struggle to understand what is happening on a wide range of levels, measuring systems and analysing data, and then selecting the next target for improvement.
    I expect that we’re all familiar with ‘Greenwash’ – a hybrid engine that uses more petrol than conventional competitors, a two-stroke-engined lawn mower that hypes its recycled packaging, the claim that a product is free of CFCs (which have been banned for years), thin-client energy comparisons using Pentium 4 energy figures (CPU energy has gone down by >50% in recent years), and so on, and on, and on.
    Sometimes it results in using more energy for greatly improved services. Is that Green? Well, a bus uses more fuel per mile travelled than a car, but less fuel per passenger-mile travelled – more energy, yes, but also more efficient. So, when selecting metrics to target for improvement, make sure you select the appropriate one.
    So, there are large amounts of conflicting information and misinformation out there. How do we decide what to do? How do you distinguish the irrelevant claim from the useful?
  • There are a number of drivers for doing ‘Green IT’.
    Not all of these points apply to all environments. The relative priority of all of these drivers will vary, but energy prices are likely to become more significant as oil prices rise.
    Computers and computation are increasingly used in all areas of human endeavour, not just the hard sciences. This is being driven by a number of factors: growth in the quantities of data readily accessible over the Internet (census data, genome databases, etc.); growth in data capture and mining of various sorts (think health statistics or Amazon purchasing patterns); and ever-larger personal music and photo collections, so we now need terabyte disks at home.
    All of these trends lead to a growth in demand for CPU & Storage, which all uses more physical space and electrical power, neither of them always available where you need them. For example, building new Datacentres in London with very high property prices is no longer very practicable. In many locations, there is insufficient electricity generating capacity to power the new Datacentres that are required, and the costs associated with this are also increasing as oil reserves dwindle.
    It is not only these fundamental constraints on the costs and availability of space and power that produce drivers to ‘go green’.
    Organisations are becoming increasingly concerned with their reputation. Enhancing the reputation of Cardiff University and attracting better students by emphasising how ‘green’ we are is a definite driver for us. ‘Sustainable IT’ is another great phrase which often means “sort out your future funding”, but is often translated to “being as green as possible”.
    Legal compliance is arriving as another driver, as governments strive to reduce their carbon emissions in line with international agreements.
    To complete this run through the main drivers, and this is a more personal view, I believe that we need to develop and improve our capabilities in these areas because we should (for the Planet, climate change, finite nature of most natural resources, legacy for our descendants, etc).
  • So, what should we measure?
    The problem of deciding which metric to use is a bit like the old question of which came first, the chicken or the egg. You need to have some idea of what the numbers are before you can decide which numbers to measure (and thus target for improvement). So, how do you overcome that problem?
  • This is where your ability to produce “guesstimates” really matters.
    Let’s start with electrical power, and a useful observation to help you cost things. At UK prices, 1kW used for a year costs approximately £1000.
    In your organisation, how many of you know how much power is used by all desktop PCs, and how much is used by servers? For those of you who know (or have a rough idea), this could be the clue as to where to start your Green programme – part of your local context. If you don’t know these numbers, then perhaps you should?
    Perhaps you can do some rough modelling (a “guesstimate”) as follows:-
    Measure the consumption (at idle) of your newest and oldest PCs (and monitors), and split the difference. Say you get 150W. Assume a duty cycle of 20% (8 hours per day, 220 days per year) – or if you have hard data, use that. Estimate (or better still know) how many PCs you have. From that, estimate the total PC power consumption. For 10,000 PCs, that equates to an electricity bill of around £0.3M p.a. at UK prices.
    Estimate your average server consumption – say 300W. Multiply by 2 to allow for cooling and other losses (more discussion on that later). Assume a 100% duty cycle – most servers are on 24×7. Estimate (or count) your servers. For 500 servers, that equates to an electricity bill of around £0.3M.
    So, for the hypothetical numbers above, the servers are using about the same as the PCs.
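    As a sanity check, here is that guesstimate as a short Python sketch. The figures (150W per PC at a 20% duty cycle, 500 servers at 300W doubled for cooling, 12p/kWh) are the talk’s illustrative assumptions, not measurements:

```python
# Back-of-envelope annual electricity costs for PCs vs servers,
# using the talk's illustrative figures (12p/kWh assumed).

PRICE_PER_KWH = 0.12  # GBP, approximate UK price at the time

def annual_cost(watts, hours_per_year, n_units):
    """Annual electricity cost in GBP for n_units identical devices."""
    kwh = watts * hours_per_year * n_units / 1000
    return kwh * PRICE_PER_KWH

pc_cost = annual_cost(150, 8 * 220, 10_000)        # 10,000 PCs, 8 h/day, 220 days/yr
server_cost = annual_cost(300 * 2, 24 * 365, 500)  # 500 servers, x2 for cooling, 24x7

print(f"PCs:     ~£{pc_cost:,.0f} p.a.")      # ~£316,800
print(f"Servers: ~£{server_cost:,.0f} p.a.")  # ~£315,360
```

    Both come out at roughly £0.3M p.a., which is why the split between desktop and server estate is worth knowing before you pick a target.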
    Now ask yourself where there might be possibilities for savings. Perhaps you could halve the duty cycle of PCs by encouraging more people to switch them off, or by implementing Windows power saving methods – that could halve the power consumption. Maybe moving to a thin-client solution would do even better.
    Perhaps you could improve the efficiency of your server cooling (more later), or would server virtualisation help ?
    The most promising initial target may not be with the largest consumer of electricity, but perhaps with the opportunity that gets you the necessary backing from the Finance Director.
    I can’t tell you what the right answer is; it all depends on your local context, and how far you are down this particular road.
    So much for the electricity consumption side of things. What else might we measure that has a part to play in our decision-making processes?
    Lifetime carbon costs of equipment are notoriously difficult to measure, but you can get useful information out there on the net.
    Paper consumption is a good one to tackle, as the embedded energy costs are high and many chemicals used in production are hazardous to the environment. Toner costs per printed page vary widely, with smaller printers being more expensive to run.
    Now what about the benefits that you get (or don’t get) from all of these IT services that use so much electricity?
    Are you running old services that no-one really uses any more, and you just hadn’t noticed? Perhaps they are running on old Pentium 4 CPUs, which had some of the highest power ratings of the entire x86 series. Do you run multiple (physical) web servers based on Apache that are lightly loaded? If so, run them as multiple Apache instances on one physical box (another form of virtualisation).
  • So, let’s consider some specific metrics such as:-
    Cost (Watts of electricity, capital investment, staff effort)
    CPU hours (how many hours of a Pentium did my computer job use)
    MFlops/Watt (Millions of Floating Point Operations per second per Watt of electricity) – the Green 500 list
    June 2010 – 773.38 MFlops/Watt – Jülich supercomputer.
    Cardiff, 2008 – ~200 MFlops/Watt.
    GB/Watt (GigaBytes of data stored per Watt of electricity)
    Gbps/Watt (Gigabits per second of network bandwidth per Watt of electricity)
    Cooling and power losses (PUE)
    Lifecycle Assessment – energy in production, distribution, disposal, recycling – can be significant.
    Pages printed (or not)
    Cost per page printed (printer, toner, paper)
    Physical space saved – see SWaP = Performance/(Space × Power)
    For a research-based university, an important metric could be research papers published or cited. Note that this ‘Context’ affects your drivers.
    If you are running a supercomputer that supports some researchers, you might just use something as simplistic as the number of CPU hours or MFlops consumed during a research project. Alternatively, your Green IT metric might compare the papers published or cited with the total number of CPU hours or MFlops provided to support that research. In both cases, an enhanced MFlops/Watt rating for your supercomputer would be a ‘Green benefit’ and thus could be a good metric.
    For the whole university, an on-going reduction in pages printed would probably be a good thing, so are you currently recording this number ?
    Or suppose you think outside the box. Could you use all that waste heat from the servers to heat your swimming pool, or warm your greenhouse ? Good for the UK, perhaps, but maybe you don’t need this in Australia ? The importance of local context again.
    To summarise on Metrics:-
    Use broad brush guesstimates, and then go for more accurate measurement of selected metrics.
    Pick the low hanging fruit and/or the largest wins, implement, measure, start again.
    Watch for ‘unintended consequences’.
  • What is PUE?
    PUE = (Total power into the Datacentre) / (Total power into the computers in the Datacentre)
    A PUE of 1 is just about perfect; 2 is bad; 1.2 is ‘state of the art’.
    This graph shows how PUE was changing at Microsoft’s datacentres from 2004 to 2007.
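    The ratio itself is trivial to compute once you have the two meter readings. A minimal sketch, with invented readings of 260kW into the building and 200kW reaching the IT load:

```python
# PUE = total facility power / power delivered to the IT equipment.
# The 260/200 readings below are invented for illustration only.

def pue(total_facility_kw, it_load_kw):
    return total_facility_kw / it_load_kw

print(pue(260, 200))  # 1.3 -> 30% overhead in cooling, UPS and distribution losses
```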
  • So, where does all the power go? The first lesson is to consider every step of the path that your electricity follows, from the point where it enters your building, probably at a high voltage (HV) transformer.
    Our Estates department quotes an expected 3% saving for using the right HV transformer, at an additional cost of £500.
    For a new project, running at 50% load, the ROI is approximately 1 month!
    For a replacement, the work and disposal (old transformer) costs take that to a year or more – not too bad when the lifetime may be 10-20 years.
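    The payback sums behind that one-month claim, sketched in Python (the £500 premium, 3% saving and 200kW new-installation load are the figures quoted above; 12p/kWh is an assumption):

```python
# ROI of the more efficient HV transformer: 3 percentage points less loss,
# £500 extra capital, 200 kW new-installation load, 12p/kWh assumed.

PRICE_PER_KWH = 0.12  # GBP

extra_capital = 500     # GBP premium for the better transformer
saving_kw = 200 * 0.03  # 3% of a 200 kW load no longer wasted as heat

annual_saving = saving_kw * 24 * 365 * PRICE_PER_KWH  # ~£6,300 p.a.
payback_months = extra_capital / annual_saving * 12
print(f"~{payback_months:.1f} month payback")  # ~1 month
```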
  • Look at the losses once the power gets into the room itself, following the energy all the way to where it leaves (usually as hot air).
  • Shown here is a top down view of a NetShelter SX rack with two InRow RC cooling units on either side. Both the rear of the NetShelter SX rack and the 2 InRow units are contained creating a closed loop system. As you can see:
    Fans of InRow cooling unit(s) distribute cool air to front of IT equipment
    Air passes through servers and absorbs heat
    Server exhaust air is prevented from escaping by rear containment
    All exhaust air is returned to InRow cooling unit(s)
    Proper airflow through the enclosure is ensured
    This particular configuration can allow for upwards of 25kW per rack while providing 2N cooling redundancy.
    A variant of this configuration would be to use rear containment only, allowing the cold supply air leaving the InRow cooling unit to disperse into the cold aisle. This would allow cooling capacity to be shared with neighbouring IT equipment.
  • There has been a gradual recognition that, by the laws of thermodynamics, the less you mix the hot exhaust air with cold air before cooling it, the less energy the cooling takes. Thus hot aisle/cold aisle layouts became common in Datacentres, followed by solutions such as APC’s Hot Aisle Containment and cooled doors. These methods have varying pros and cons depending, again, on your context, but should all be considered.
  • Chilled water supplies are supposedly cheaper if they can use free cooling: below roughly 10°C ambient, a reduced refrigerant cycle is needed; below 3.5°C, no cycle at all. But what about your context? If it is hot all year round, free cooling may not be an option. Also, does it work in your environment anyway (perhaps not so well in Australia as halfway up a mountain in Wales)?
  • UPSs are not 100% efficient. When run at a low percentage of their maximum load, they can lose anything from 10% up to 25%, even if they are 95% efficient at full load.
    So, make sure you size your UPS at the Goldilocks level (just right). This is not as easy as it sounds. Our HPC cluster uses 115kW when performing the benchmarks used in acceptance tests, but under normal production use never exceeds 80% of that level.
    So, what’s the answer to that ? Perhaps it is to buy a modular UPS like the one we bought.
  • What does this slide tell us? There is a 6% difference in efficiency between 2 specific UPS models. Generally, the smaller the UPS, the less efficient it is. For a 100kW UPS running at 50% load, that 6% difference translates to £3,000 p.a.
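    A sketch of where that £3,000 comes from, using the £1,000-per-kW-year rule of thumb from earlier. Treating the 6% efficiency gap as 6% of the load lost is a simplification, but it reproduces the figure:

```python
# Annual cost of a 6-percentage-point efficiency gap between two 100 kW UPSs
# running at 50% load. Simplified model: the gap is treated as extra lost load.

COST_PER_KW_YEAR = 1000  # GBP; rule of thumb: 1 kW for a year ~ £1,000

load_kw = 100 * 0.5             # 50% load on a 100 kW UPS
extra_loss_kw = load_kw * 0.06  # 3 kW wasted by the less efficient model
print(f"~£{extra_loss_kw * COST_PER_KW_YEAR:,.0f} p.a.")  # ~£3,000
```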
  • What is our PUE, and why should it be better?
    What does the graph show? No correlation between temperature and PUE; Free Cooling is not working.
    We believe that this is because a buffer tank is needed, as the mechanical cooling was turning on and off too frequently, which wastes energy. We will shortly be fitting a buffer tank, and hope to see improvements as a result.
    Also, we have not yet raised operating temperatures, which will further reduce energy needed to cool everything.
    What are some of the other questions you might ask?
    Server power supplies – how efficient are yours (blades vs pizza boxes)?
    How do you choose the CPUs for your servers? Do you buy the fastest, or those with the best MFlops/Watt? What impact does the choice of memory or disks have on energy?
    How much might be wasted by the overnight running of services only used in the day ?
    Are your servers doing useful work all of the time ? (virtualise, etc)
    How much space is consumed ? Space has a cost, and so do newer cooling methods. Sweating the assets on your cooling infrastructure can reduce the per unit cost of cooling.
  • These new chillers are supposed to have a higher CoP (Coefficient of Performance) and will be used in conjunction with the existing Free Cooling chillers.
  • To summarise our procurement, here are the Key Power Saving Solutions (Slide 18) and the Key Business Benefits (Slide 19).
    Note that there is a good ROI on the large, efficient room-based UPS – partly from buying ‘just what you need’ rather than ‘the maximum you might need’.
    Cooling ROI, potentially very good. Time will tell when we have the numbers.
    Servers – ROI on quad-core difficult to assess, but we are VERY happy with a decision that effectively gave us twice the ‘grunt’ for the same money, took up less space, and will use less power.
  • How much energy can you save through server virtualisation and correct storage selection?
    What is server virtualisation? Running many virtual servers in one physical server.
    This relies on the fact that many physical servers are running at very low CPU utilisation levels, so you can run many virtual servers in a single physical server whilst delivering the same performance to users.
  • The general formula: Watts × hours per day = watt-hours per day; × days per year = watt-hours per year; ÷ 1,000 = kWh per year; × energy cost per kWh = total annual energy cost.
    200W × 24 hrs/day × 365 days = 1,752,000 watt-hours = 1,752 kWh per year; × 12p per kWh = £210.24.
    100W × 24 hrs/day × 365 days = 876,000 watt-hours = 876 kWh per year; × 12p per kWh = £105.12.
  • 500W × 24 hrs/day × 365 days = 4,380,000 watt-hours = 4,380 kWh per year; × 12p per kWh = £525.60.
    250W × 24 hrs/day × 365 days = 2,190,000 watt-hours = 2,190 kWh per year; × 12p per kWh = £262.80.
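    That same chain of arithmetic, wrapped in one reusable function (24×365 running and 12p/kWh as in the worked examples):

```python
# Watts -> kWh per year -> GBP per year, as in the worked examples above.

def annual_cost_gbp(watts, hours_per_day=24, days_per_year=365, pence_per_kwh=12):
    kwh_per_year = watts * hours_per_day * days_per_year / 1000
    return kwh_per_year * pence_per_kwh / 100

for w in (200, 100, 500, 250):
    print(f"{w}W -> £{annual_cost_gbp(w):.2f}")  # £210.24, £105.12, £525.60, £262.80
```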
  • Cardiff used the following estimates as part of the business case for virtualising:-
    Total electricity cost reduction over 3 years from £425k down to £57k (and capital from £1.15M down to £191k).
    So, having picked off the low hanging fruit by virtualising, where else can we look for savings in our server estate ?
    Can you make further energy reductions ?
    Yes. Look at automated VMotion & power down (save money at night and during the weekends).
    What about the newer CPUs and their ability to dynamically slow down the CPU ?
  • Cardiff won a grant to model and measure the energy impact of tiered storage – Project was called Planet Filestore.
    Capital Costs:-
    A 450GB 15k SAS disk costs about the same as a 2TB 7k2 SATA disk. So, if your top tier is RAID 10, SAS disks, mirrored across two sites, and your bottom tier is RAID 5 (8+1) SATA and not site-mirrored, you see an approximately ×16 difference in £/usable TB (~×4 from the disk size, ~×2 from RAID 5 vs RAID 10, and another ×2 from the site mirroring).
  • How much energy does the storage consume per usable TB? Start with the ×16 capital factor, and then note that a 7k2 disk typically uses 15W, and a 15k disk 25W.
    So if your disks are all Tier 1 (say 100TB usable = 400TB raw in RAID 10, site mirrored), that is ~1000 disks @ 25W = 25kW ≈ £25,000 p.a.
    For HSM, with 10TB of Tier 1 and 90TB of Tier 2: the Tier 1 needs ~100 SAS disks @ 25W = 2.5kW ≈ £2,500 p.a. A RAID 5 array (8+1) of 2TB SATA delivers 16TB usable, so Tier 2 needs 6 arrays = 54 SATA disks; if the RAID 5 array is (4+1), then 55 SATA disks. Use an average of 50 disks.
    Tiers 1 & 2 = £2,500 + (50 disks @ 15W = 0.75kW ≈ £750) = £3,250.
    Energy savings of £21,750 for every 100TB.
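    Those storage sums as a sketch, with the disk wattages and disk counts from above and the £1,000-per-kW-year rule of thumb:

```python
# Energy cost of 100 TB usable: everything on Tier 1 vs tiered (HSM),
# using the talk's figures: 15k SAS disk = 25 W, 7k2 SATA disk = 15 W.

COST_PER_KW_YEAR = 1000  # GBP per kW for a year (rule of thumb)

all_tier1_kw = 1000 * 25 / 1000          # ~1000 SAS disks          -> 25 kW
tiered_kw = (100 * 25 + 50 * 15) / 1000  # 100 SAS + ~50 SATA disks -> 3.25 kW

saving = (all_tier1_kw - tiered_kw) * COST_PER_KW_YEAR
print(f"~£{saving:,.0f} saved per 100 TB p.a.")  # £21,750
```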
  • Compare the energy use of a PC left on 24×7 with one that is only turned on during working hours (say 8 hours per day, 200 days per year – about 20% of the time). So, lesson number one is to turn them off when you are not using them.
    Context changes over time – PC energy use
    2005 PC – 200W-300W
    2010 PC – 50W-100W
    2005 CRT Monitor – 60W-150W
    2010 LCD Monitor – 15W-30W
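    A quick comparison for a single 2010-era PC and monitor; the 75W draw is an assumed mid-range figure from the ranges above:

```python
# One PC left on 24x7 vs switched on only for working hours
# (8 h/day, 200 days/year), at an assumed 75 W combined draw.

watts = 75
always_on_kwh = watts * 24 * 365 / 1000  # 657 kWh per year
work_hours_kwh = watts * 8 * 200 / 1000  # 120 kWh per year

print(f"Working-hours use is {work_hours_kwh / always_on_kwh:.0%} of always-on")  # ~18%
```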
  • Should you implement one of the many packages that provide Asset Management and Power Control & Recording (e.g. NightWatchman)? These are very useful for proving to management that your changes are having an effect.
    Can you influence your users? Perhaps the very recent Microsoft Research offering (Joulemeter, an alpha-release piece of software, so be cautious) would help change user behaviour. This is based on the idea that providing more information about energy usage can directly affect behaviour.
  • Don’t believe everything you read. Data from 2007 (Stanford Research) concerning thin-client energy benefits quoted a thin client @ 15W and a PC @ 300W.
    What’s wrong with this? Most PCs don’t use 300W any more; 50-75W would be a better figure.
    See the paper for an up to date estimate of savings for a thin client scenario.
  • Deskjet energy costs vs centralised, larger printers
    “Over the life of a laser printer, Lexmark calculates some 80% of the costs will be for the paper it uses, around 6% for toner cartridges and around 8% for the electricity. For an ink-jet, it’s 29% for the paper costs, 7% for ink cartridges and 16% for electricity.”
    Perhaps you should stop using ink-jet printers?
  • I’m presenting here some measured results in a UK University (Liverpool John Moores – LJMU).
    Around 20,000 undergraduates, with a more centralised management structure than Cardiff.
    This made it easier for them to build a business case to save significant money through a comprehensive change to all printing facilities. They implemented a new chargeable printing system, with students charged via software-based accounts that they can ‘top up’, and staff charged via departmental accounts.
    They use Print Release stations to reduce actual pages printed, and gradually removed deskjet-type printers.
  • There were difficulties in changing user attitudes, but once people got used to the new system, they generally preferred it.
  • Network switches and routers don’t use much electricity, do they? Well, that’s changing, as we start to see more devices using Power over Ethernet (PoE) and more desktops using Gbps connections. I’ve seen values for Gbps switches ranging from 0.5Gbps/Watt up to 3.5Gbps/Watt.
    So let’s do a guesstimate for Cardiff’s LAN. We have around 50,000 outlets, all of them connected to switches. Right now, they are mostly 100Mbps, but we’ll probably move to Gbps over the next few years, and unless switch designers do something about this, those 50,000 ports on edge switches could be using anything from 15kW up to around 100kW. That is a difference of around £85k p.a., and I haven’t added in the core and distribution switches.
    What about PoE? The newer PoE standards deliver up to 25.5W, with some manufacturers claiming 50W capability. A 100m Cat 5 cable run will lose around 5W. PoE typically delivers around 50V, which is then DC-DC converted at the powered device, incurring more losses – I’ve seen 10-20% quoted. Add that up and you could be wasting up to 10W out of the standard 25W. That’s a staggering 40% loss.
    One other thing to watch for is the cabling: at these power levels, you can’t tie together more than around 100 cables, or they’ll start to get too hot!
    So, although switch manufacturers are starting to catch up with the Green marketing push and offering power-saving features such as automated power-down of unused ports, they still have a long way to go, and some of the major players have the least efficient kit.
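    Both of those guesstimates in code. The 0.5-3.5 Gbps/Watt range, 50,000 ports, 25.5W PoE delivery, 5W cable loss and up-to-20% conversion loss are the figures quoted above; the costing uses the £1,000-per-kW-year rule:

```python
# Edge-switch guesstimate: 50,000 Gbps ports at the efficiency extremes
# seen in vendor data sheets, costed at ~£1,000 per kW-year.

COST_PER_KW_YEAR = 1000  # GBP

ports = 50_000
worst_kw = ports * (1 / 0.5) / 1000  # 2 W per Gbps port -> 100 kW
best_kw = ports * (1 / 3.5) / 1000   # ~0.29 W per port  -> ~14 kW
print(f"Gap: ~£{(worst_kw - best_kw) * COST_PER_KW_YEAR:,.0f} p.a.")  # ~£86k

# PoE worst case: ~5 W lost in 100 m of Cat 5, plus ~20% lost in the
# powered device's DC-DC conversion, against 25.5 W delivered.
delivered_w = 25.5
lost_w = 5 + 0.20 * delivered_w  # ~10 W
print(f"PoE loss: ~{lost_w / delivered_w:.0%}")  # ~40%
```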
  • Can your organisation help by changing your purchasing policies? Yes. Here are just a few examples of how you can do this.
    Insist on your suppliers presenting information regarding aspects like EnergyStar ratings, or end-of-life disposal for PCs.
    Buy the Greener PC models. We have found these to cost approx £30 more, which pays for itself within 2 to 3 years.
    Discourage the purchase of the very fastest PCs – they may use 50% more power for a 10% performance improvement.
    For High Performance Computing (HPC) and general server purchasing, insist on suppliers contractually committing to consumption, and use their figures as part of your evaluation, perhaps in a TCO figure.
    Require your suppliers to only publicise energy-efficient systems when providing quotes to your organisation. Get them to configure energy saving settings as part of your supplied configuration.
  • I have already covered some examples of how to influence both your organisation and users in the previous examples. Perhaps the most important point here is to consider the audience, and what you are trying to achieve.
    Local context comes into play yet again. For example, in Cardiff, all electricity bills are paid centrally, so Heads of Academic Schools are not particularly interested in how much energy their PCs use, but the Head of Estates is.
    In other organisations, the bills are devolved, so you need to target a different audience for this information.
    Individuals may have little interest in the bigger picture concerning the energy impact of virtualisation on the total energy bills, but your management probably do.
    However, telling each PC user how much energy they have used today, and how much they might save by changing the power saving settings has a real chance of changing things.
    Technical types like me are probably influenced more easily when told the detail of how some energy saving measure works, whereas the Finance Director just wants to know the impact on the bottom line and the IT Director may want reassuring that it won’t take lots of staff time.
  • Finally, here is a counter-intuitive example of ‘more is less’.
    There is a freely available package called Condor, which makes all those spare CPU cycles in your PCs available for certain types of HPC calculation. The effect on your PCs is to make them use more energy, all of the time.
    So why might this be green?
  • The basic answer is that you are getting a greater value from the extra energy that one of these PCs uses when doing some HPC work (provided that it is already turned on) than you would get from running a dedicated HPC service on a set of servers.
    Let’s show how that works by using a guesstimate again. We have measured the energy consumption of many different PC models to show this effect. Let’s consider a typical modern PC - a 2.4GHz Core2Duo PC. Its measured power usage when idle was 65W, and when running at 100% CPU utilisation was 81W.
    So, it only adds 16W to run your HPC work. Now compare that with a dedicated HPC system built from the same components – it will use the full 81W to do the same work; that is 5 times as much as the PC.
    Actually, it is not that bad, because the dedicated HPC system is probably more efficient, but note that it would have to be 5 times more efficient to match the PC.
    There is another environmental advantage to Condor that relates to the embodied manufacturing costs. You have this PC anyway, and all that Condor is doing is providing extra value by ‘sweating the asset’. It does not add to those costs in any way.
    It is also possible to run Condor in a fashion that does not get these benefits. I know of one University that only runs Condor on their PCs at night, by using ‘Wake on LAN’ to turn PCs on when a user submits jobs to the Condor system. They still ‘sweat their asset’ and have provided a useful HPC system with almost zero capital investment, but they don’t get the energy benefit identified above.
    What about a changing technical context ? We have observed that, as CPUs get more and more cores, the difference between 0% and 100% gets less, and so Condor becomes even more efficient compared with a dedicated cluster.
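    The measured figures above, as a two-line sanity check of the ‘5 times’ claim:

```python
# Incremental energy of a Condor job on a PC that is on anyway, vs a
# dedicated node built from the same parts (measured figures from the talk).

idle_w, busy_w = 65, 81  # 2.4 GHz Core2Duo PC: idle vs 100% CPU

incremental_w = busy_w - idle_w  # the Condor job only adds 16 W
ratio = busy_w / incremental_w   # a dedicated node bears the full 81 W
print(f"Dedicated node: ~{ratio:.1f}x the energy per unit of work")  # ~5.1x
```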
  • I have covered some of the many ways in which Cardiff University is trying to genuinely improve its Green credentials and presented overviews of some important ones.
    The implementation of Green policies will in many cases save money over the longer term, as well as benefiting the environment.
    Organisations should think carefully about the metrics they employ to measure success in this area, and should adopt a constant cycle of choosing the next target metric, implementing selected improvements, measuring the result, and then starting again.
    In all cases, it is important to understand the local context, whether that be the ambient temperature of your location, your organisation’s prime drivers, or the need to capitalise on opportunities to maximise the usage of components. Remember that the context (technical, political, environmental, etc) can change.
    People can help if you encourage and inform them, and finally, don’t forget the wacky idea. Sometimes it is a winner.
    Questions ?
  • How do you make your Data Centre greener?
    I’ll be talking here about the experience that we gained when procuring a new supercomputer for Cardiff, along with the Datacentre to house it. One of the early pointers I found was a report to the US Congress in Aug 2007 on Datacentres. If you are interested in the issues and trends, look it up.
  • We ran 2 simultaneous tenders – one for the HPC system, one for the environment. This was difficult, and required a good relationship and cooperation between the 2 suppliers (and our Estates dept).
    A crucial aspect was to make it clear that tenders were to be evaluated on the basis of a 3-year TCO, requiring suppliers to commit to power consumption figures, UPS efficiencies and cooling overheads. In addition, we wished to retro-fit additional metering to one of our existing machine rooms, so we will have a direct comparison.

    1. Holistic Approaches to Green IT. Dr Hugh Beedie, CTO ARCCA & INSRV, Cardiff University. Going Green Can Save Money.
    2. Introduction: What is Green IT? Drivers & Metrics. Data Centres, Power & Cooling. Servers, Virtualisation, and Storage. PCs, Monitors & Thin Clients. Printing and Networks. Purchasing policies. Influencing user behaviour & the organisation. ‘Thinking outside the box’ – save energy with Condor.
    3. What is Green IT? ‘No environmental impact’ – get real. ‘Use of resources in an efficient way’ – better. Avoid ‘Greenwash’ – often faulty comparisons. Importance of your Context – your situation, drivers & metrics can change over time.
    4. Drivers – Why do Green IT? Increasing demand for CPU & Storage. Lack of Space. Lack of Power. Increasing energy bills (oil prices doubled). Enhancing the reputation of your University & attracting better students. Sustainable IT & Legal Compliance. Because we should (for the Planet).
    5. Metrics – Chicken and Egg. You can’t measure everything. So, which numbers do you measure, when you don’t know which are important?
    6. Guesstimation – which use more, PCs or Servers? 10,000 PCs using 150W, 8 hrs/day, 220 days/yr – UK electricity £300,000 p.a. 500 servers using 300W + losses (×2), 24×7 – UK prices £300,000 p.a. So where do we start?
    7. Examples of Metrics: Cost (Watts, TCO, capital investment, etc.); Value (CPU Hrs, MFlops, Gbps, Papers); Value/Cost is better? (MFlops/Watt); Cooling & power losses (PUE); Average kWh per PC per year; Pages printed (or not printed?); Cost per page (electricity, paper); Space saved.
    8. Power Usage Effectiveness. PUE = Total Power / Computer Power.
    9. The Detail – what wastes the power? Power conversion in the HV transformer: before it gets to your room, you lose power. Transformer efficiency = 98% or 95%. Return On Investment (ROI)? Capital cost = £5,500 or £5,000 (650kVA). New installation (@200kW), ROI = 1 month. Replacement, ROI = 1 year. Lifetime of investment = 20+ yrs!
    10. 10. Data Centre Consumption Power delivery40% 80% 100% Loads Cooling Cumulativepower Bull View Source: Intel Corp. 7.3% Voltage Regulators 20W 5.5% Server fans 15W 7.3% UPS +PDU 20W 18.2% PSU 50W 36.4% Load CPU, Memory, Drives , I/O 100W 25.5% Room cooling system 70W Total 275W Intel View
    11. 11. Cooling Inside the Room Hot Aisle/Cold Aisle Layout
    12. 12. Hot Aisle Containment Hot Aisle Ceiling Tiles/Cable Trough Seals in hot air, prevents mixing with room air Chamber DoorsAccess to hot aisle, locks for security. Prevents hot air mixing InRow RC unit InRow air conditioner cools hot exhaust air directly
    13. 13. Rack Air Containment Airflow Diagram  Server exhaust air is prevented from escaping by rear containment  All exhaust air is returned to InRow cooling unit  Fans of InRow cooling unit distribute cool air to front of IT equipment Top Down View Front Rear InRow Cooling Unit InRow Cooling Unit NetShelter SX Rack Servers Rear Containment Front Containment
    14. 14. Cooling Inside the Room APC InRow RC units • Up to 20kW/rack • Uniform cooling allows raising the cold air temperature (Thermal map legend: <24 to >37.8 °C.)
    15. 15. Cooling – Outside the Room 3 Airedale chillers Ultima Compact Free Cool 120 Quiet model Variable speed fans N+1 arrangement (we needed that!)
    16. 16. And in real life (2008)
    17. 17. Free-cooling (Chart: cooling load vs ambient temperature, marked at -7°C, 3.5°C and 12.5°C.) System operated on 100% free-cooling for 12% of the year; on partial free-cooling (free-cooling plus mechanical compressor cooling) for 50% of the year; on mechanical cooling only for 38% of the year.
    18. 18. Backup Power  Full and half load efficiency >92%  Scalable & modular – could grow as we grew, keeping % max load high APC 160kW Symmetra UPS
    19. 19. UPS Efficiencies
    20. 20. High Density Servers Important if you spend on advanced cooling methods Can have shared power supplies (PSUs)  Higher efficiency Blade servers have shared PSUs  Not always good for HPC Bull supplied half-width 1U servers  Two quad-core 80W Harpertown CPUs
    21. 21. Merlin Cluster
    22. 22. Cardiff’s PUE vs Temperature Average PUE = 1.3 No correlation with temperature?
    23. 23. Expansion for HPC Wales Remove one 120kW free-cooling chiller Add one 300kW evaporative chiller
    24. 24. Key Power Saving Solutions Low power servers High density servers (sweating the asset) Modular UPS (not over-provisioned) Cooling infrastructure – chilled water “air handling” HV transformer – 98% efficiency
    25. 25. Virtualisation and Storage Virtualisation  Many virtual servers in one physical server Key questions  How much can you save?  How much of that is energy? Storage  Fast disk = 450GB, 25W; slow disk = 2TB, 15W  Tiering = energy saving
    26. 26. Business Case Before Virtualisation
        Qty | Equipment | FY | Unit | Total
        100 | RM servers – replacement cost, inc. installation and peripherals (assumes Dell PowerEdge 1950) | 08/09 | £2700 | £270K
        100 | ‘Out of warranty’ servers – 2-year extended warranty | 08/09 | £500 | £50K
        250 | Production servers – replacement cost, inc. installation and peripherals (assumes Dell PowerEdge 1950) | 09–11 | £2700 | £675K
        1 | Air conditioning upgrades in RWD 1.71 and Main 1.71 computer rooms | 08/09 | £100K | £100K
        6 | 12kVA and 16kVA UPS replacements | 09/10 | £9K | £54K
        Total hardware replacement costs (1-for-1): £1.15M
        450 | Cost of electricity for servers (200W per server @ 12p/kWh) | Annual | £94K | £283K
        450 | Cost of electricity for cooling (100W per server @ 12p/kWh)* | Annual | £47K | £142K
        Total electricity costs: £425K
        Total hardware and electricity costs: £1.57M
        * Inefficient raised floor and ceiling-mount air conditioning units
    27. 27. Business Case After Virtualisation
        Qty | Equipment | FY | Unit | Total
        5 | RM servers – replacement cost, inc. installation and peripherals (assumes Dell PowerEdge P900 with 32GB RAM, 2TB disk) | 08/09 | £5K | £25K
        5 | ‘Out of warranty’ servers – replacement cost, inc. installation and peripherals (assumes Dell PowerEdge P900 with 32GB RAM, 2TB disk) | 09/10 | £5K | £25K
        13 | Production servers – replacement cost, inc. installation and peripherals (assumes Dell PowerEdge P900 with 32GB RAM, 2TB disk) | 09/11 | £5K | £65K
        2 | 24TB central storage – assumes Sun Fire X4500 or equivalent | 08/09 | £15K | £30K
        46 | VMware ESX licences – one per CPU | 08–11 | £1K | £46K
        ? | Physical to Virtual (P2V) software | 08/09 | ? | ?
        ? | VMware backup software | 08/09 | ? | ?
        Total hardware costs (VMware infrastructure): £191K
        25 | Cost of electricity for servers (500W per server @ 12p/kWh) | Annual | £13K | £39K
        25 | Cost of electricity for cooling (250W per server @ 12p/kWh)* | Annual | £6K | £18K
        Total electricity costs: £57K
        Total hardware and electricity costs: £248K
        * Inefficient raised floor and ceiling-mount air conditioning units
    28. 28. Virtualisation Business Case – 3-Year ROI with Virtualisation
        Before: total hardware costs £1.15M + total electricity costs £425K = £1.57M
        After: total hardware costs (VMware infrastructure) £191K + total electricity costs £57K = £248K
        Savings: hardware £959K, electricity £368K. Total savings = £1.32M
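The arithmetic behind the summary is simple enough to sanity-check against the two business-case slides:

```python
# Three-year totals taken from the before/after business-case slides, in £
before_hw, before_elec = 1_150_000, 425_000
after_hw, after_elec = 191_000, 57_000

hw_saving = before_hw - after_hw              # £959,000
elec_saving = before_elec - after_elec        # £368,000
total_saving = hw_saving + elec_saving
print(f"£{total_saving:,}")                   # the slide's £1.32M, to 3 figures
```

Note that electricity is well over a quarter of the saving, which is the green-IT point of the exercise.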
    29. 29. Planet Filestore JISC-funded project to measure savings of… Tiered storage scenario  100TB needed for 30,000 staff and students  High performance for 7,000 concurrent users  Site-mirrored RAID 10 450GB SAS vs non-mirrored RAID 5 2TB SATA  ~factor of 16 in hardware cost  Therefore, use HSM (old data on Tier 2) What about energy?
    30. 30. Planet Filestore Project Fast disk (FC/SAS): 450GB, 25W. Slow disk (SATA): 2TB, 15W. Tiered storage: Tier 1 – DR, Tier 2 – non-DR. 100TB all on Tier 1 (DR): energy cost £25,000 p.a. 10TB Tier 1 (DR) + 90TB Tier 2 (non-DR): energy cost £3,250 p.a. Energy saving = £21,750 p.a.
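A sketch of how figures of this shape arise. The RAID overheads here are assumptions (4× raw capacity for site-mirrored RAID 10, 1.25× for RAID 5) along with the 12p/kWh price used elsewhere in the talk, so the totals land close to, but not exactly on, the slide's £25,000 and £3,250:

```python
import math

PRICE, HOURS = 0.12, 8760  # assumed £/kWh and hours per year

def annual_energy_cost(raw_tb, overhead, disk_tb, disk_watts):
    """Energy cost p.a. of enough spinning disks for raw_tb of usable data."""
    disks = math.ceil(raw_tb * overhead / disk_tb)
    return disks * disk_watts / 1000 * HOURS * PRICE

# 100TB entirely on Tier 1: site-mirrored RAID 10 on 450GB / 25W SAS disks
all_tier1 = annual_energy_cost(100, 4, 0.45, 25)
# Tiered: 10TB on Tier 1, 90TB on Tier 2 (RAID 5 on 2TB / 15W SATA)
tiered = (annual_energy_cost(10, 4, 0.45, 25)
          + annual_energy_cost(90, 1.25, 2, 15))
print(round(all_tier1), round(tiered))  # roughly £23,000 vs £3,200
```

Whatever the exact overheads, the ratio is dominated by disk count: fewer, bigger, slower disks for the cold 90% of the data.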
    31. 31. PCs & Monitors Where to start? Duty cycle guesstimate  Always on (100%)  On during working hours (20%) Turn it off at night!!
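Per PC, the duty-cycle difference is worth roughly £126 a year, assuming the 150W and 12p/kWh figures used earlier in the talk:

```python
WATTS, PRICE = 150, 0.12  # assumed PC draw and £/kWh

always_on_kwh = WATTS / 1000 * 8760        # 1314 kWh/yr at 100% duty cycle
office_hours_kwh = WATTS / 1000 * 8 * 220  # 264 kWh/yr (8 hrs/day, 220 days)
saving = (always_on_kwh - office_hours_kwh) * PRICE
print(f"£{saving:.0f} per PC per year")    # £126
```

Multiplied across a 10,000-PC estate, "turn it off at night" is a seven-figure conversation.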
    32. 32. What next for PCs ? More stringent power saving policies Asset Management (reporting) Influencing user behaviour (Joulemeter)
    33. 33. What about Thin Clients?
        Year | PC   | Thin Client
        2005 | 300W | 15W
        2010 | 50W  | 5W
        Still some scope, but not what there used to be.
    34. 34. Printing Thanks to Liverpool John Moores University (LJMU) for the next 2 slides
    35. 35. Carbon Footprint Savings • Hundreds of devices removed – Old lasers – Deskjets – Fax machines • Replaced with fleet of 220 devices • Automatic power saving after 2 hours • Use of recycled paper • FollowMe documents held for 18 hours
    36. 36. Actual Savings at LJMU
    37. 37. Networks Power used by switches  3.5Gbps/W to 0.5Gbps/W Cardiff LAN has 50,000 outlets  100kW down to 15kW  Potential £85k p.a. saving by choosing well PoE (up to 25W down a UTP cable)  5W in the cable, 5W DC–DC conversion  40% loss
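The slide's 100kW-to-15kW range follows directly from the quoted switch efficiencies; a sketch assuming 1Gbps per outlet and the 12p/kWh used elsewhere (the slide rounds the resulting saving to £85k p.a.):

```python
def port_watts(gbps, gbps_per_watt):
    """Power per port for a switch of the given efficiency."""
    return gbps / gbps_per_watt

OUTLETS = 50_000
worst_kw = port_watts(1, 0.5) * OUTLETS / 1000  # 100 kW across the LAN
best_kw = port_watts(1, 3.5) * OUTLETS / 1000   # ~14.3 kW
annual_saving = (worst_kw - best_kw) * 8760 * 0.12
print(round(worst_kw), round(best_kw), round(annual_saving))
```

At 0.5Gbps/W each outlet is a 2W port; at 3.5Gbps/W it drops below 0.3W, which is where the order-of-magnitude saving comes from.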
    38. 38. Purchasing Policies Use Energy Star PCs Discourage use of the fastest PCs Use energy in your TCO calculations Ensure suppliers only publicise green PCs
    39. 39. Influencing Behaviour Context – who pays for the electricity? Target your messages  Users • Their PC uses ???W • Pages printed last week ??  Technical types – detail • MFlops, kW, Geeks/Pint  Managers • Bottom line ££££ • Reputation.
    40. 40. Finally… What is green, but makes your PC use more electricity? Answer – Condor So, what is Condor? Condor takes spare CPU cycles for use in  research calculations  It runs at low priority so users don’t notice So why is it green?
    41. 41. Condor – 2.4GHz Core2Duo (Bar chart of power draw) Idle: 64.5W Max: 80.7W
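This is why Condor counts as green: the marginal power of running a Condor job on a PC that is switched on anyway is tiny compared with provisioning a dedicated machine for the same work. A sketch; the 600W dedicated-server figure is an assumption (the talk's 300W server with its 2× loss factor), not a measured value:

```python
idle_w, flat_out_w = 64.5, 80.7   # measured 2.4GHz Core2Duo figures from the slide

marginal_w = flat_out_w - idle_w  # extra power a Condor job actually costs
dedicated_w = 300 * 2             # assumed dedicated server incl. losses and cooling
print(f"{marginal_w:.1f}W marginal vs {dedicated_w}W for a dedicated node")
```

On these numbers the same research cycles cost well over thirty times more power on a dedicated node, so maximising the PC's electricity use is, paradoxically, the greener option.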
    42. 42. Conclusions Measure, Improve, Measure again Look everywhere for waste Context changes over time  Make sure you revisit your assumptions Don’t just do the technical stuff Think out of the box (Condor)
    43. 43. Questions ?
    44. 44. Cardiff’s New Datacentre £3M grant to purchase a supercomputer New datacentre required with appropriate power, cooling… Our context?  INSRV Sustainability Mission – to minimise CU’s IT environmental impact and to be a leader in delivering sustainable information services  Limited experience in datacentre build  Server & Data Center Energy Efficiency (US Congress report, 2007)
    45. 45. What did Cardiff do?  Ran 2 tenders, many contractors (& subs)  HPC & environment (power, cooling)  Strong project management  TCO as key evaluation criterion  Value = HPC benchmark figures  Winner = highest Value/TCO