The Future of Data-Centres – Prof. Ian Bitterlin, Emerson
  1. The future of Data-Centers?
     Prof Ian Bitterlin CEng PhD BSc(Hons) BA DipDesInn MIET MCIBSE MBCS MIEEE
     • Visiting Professor, School of Mechanical Engineering, University of Leeds
     • Chief Technology Officer, Emerson Network Power Systems, EMEA
     • Member, UK Expert Panel, EN 50600 – Data Centre Infrastructure – TCT7/-/3
     • UK National Body Representative, ISO/IEC JTC1 SC39 WG1 – Resource Efficient Data Centres
     • Project Editor for ISO/IEC 30134, General Requirements of KPIs, WUE, CUE & REC
     • Committee Member, BSI IST/46 – Sustainability for and by IT
     • Member, Data Centre Council of Intellect UK
     • SVP & Technical Director (Power), Data Centre Alliance – not-for-profit trade association
     • Chairman of Judges, DataCenterDynamics, USA & EMEA Awards
     • Chairman of The Green Grid's EMEA Technical Work Group
  2. Data is growing faster and faster
     • Capacity driven by exponential data growth – 80% CAGR compared to the 40% CAGR of Moore's Law
     • Virtualisation of hardware partly closes the gap
     • Growth in emerging markets is faster than in mature regions
     • Increasing capacity and efficiency of ICT hardware has always been outstripped by demand
  3. The Law of Accelerating Returns: Kurzweil
     • Information generation: 2009 = 50GB/s; 2020 = 500GB/s; a 10,000,000x increase
     • The Singularity is Near, Raymond Kurzweil, 2005, Viking – introduced the 'law of accelerating returns' and extended Moore's Law
     • Ray Kurzweil has been described as "the restless genius" by the Wall Street Journal, and "the ultimate thinking machine" by Forbes magazine, which ranked him #8 among entrepreneurs in the United States and called him the "rightful heir to Thomas Edison". PBS included Ray as one of 16 "revolutionaries who made America," along with other inventors of the past two centuries.
  4. Moore's Law
     • Gordon Moore was a founder of Intel; 30 years ago he wrote Moore's Law, which predicted the doubling of the number of transistors on a microprocessor every two years
     • Moore's Law has held true ever since
     • Applies as well to: doubling compute capacity; halving the Watts/FLOP; halving kWh per unit of compute load, etc
     • Kurzweil now suggests that the doubling is every 1.2 years
     • Encourages ever-shorter hardware refresh rates – Facebook 9-12 months, Google 24-30 months, etc
     • Keeping ICT hardware 3 years is energy profligate
  5. Five 'Moore' years? Is 3D graphene the fifth paradigm?
  6. Data generation growth
     • At Photonics West 2009 in San Jose, Cisco correctly predicted for 2012 that '20 US homes with FTTH will generate more traffic than the entire internet backbone carried in 1995'
     • Japanese average home with FTTH – download rate is 500MB per day, dominated by HD video
     • More video content is uploaded to YouTube every month than a TV station can broadcast in 300 years of 24/7/365 operation
     • Phones with 4G are huge data-generators. Even with 3G, in 2011 Vodafone reported 79% data growth in one year – was that all social networking?
     • 4K UHD-TV? A 3D 4K movie = a 2-hour download over fast broadband
  7. Jevons Paradox (Rebound Effect)
     'It is a confusion of ideas to suppose that the economical use of fuel is equivalent to diminished consumption. The very contrary is the truth' – William Stanley Jevons, The Coal Question, 1865, London, Macmillan & Co
     • Newcomen's engine was c2% thermally efficient and coal supplies in the UK were highly strained
     • Watt's engine replaced it with c5% efficiency – but the result was a rapid increase in coal consumption
     • Can the same be said of data generation and proliferation?
     • Don't forget that less than 30% of the world's population have access to the internet – and the rest want it.
  8. Infrastructure and energy!
     • Time magazine reported that it takes 0.0002kWh to stream 1 minute of video from the YouTube data centre
     • Based on Jay Walker's recent TED talk, 0.01kWh of energy is consumed on average in downloading 1MB over the Internet
     • The average Internet device energy consumption is around 0.001kWh for 1 minute of video streaming
     • For 1.7B downloads of this 17MB file, plus streaming for 4 minutes each, this gives the overall energy for just this one pop video in one year
  9. 310GWh in one year from 15th July 2012
     • c36MW of 24/7/365 diesel generation
     • 310GWh = more than the annual electricity consumption of Burundi, population 9 million (273GWh in 2008)
     • 100 million litres of fuel oil
     • 250,000 tons of CO2
     • 80,000 average UK car-years – 960 million miles (c8,000 cars, cradle to grave)
     • Just for one pop video on YouTube
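The figures on slides 8 and 9 can be sanity-checked with a quick back-of-envelope calculation. The sketch below uses only the round numbers quoted on the slides – 1.7 billion plays, a 17MB file, 0.01kWh/MB for transfer and 0.001kWh per minute of device streaming – and lands in the same region as the quoted 310GWh and c36MW; it is an order-of-magnitude check, not the author's original model.

```python
# Back-of-envelope check of the "one pop video" figures quoted on slides 8-9.
# All inputs are taken from the slides; the result should land near the ~310 GWh claim.

downloads          = 1.7e9    # downloads/streams in one year (slide 8)
file_size_mb       = 17       # MB per download (slide 8)
kwh_per_mb         = 0.01     # kWh per MB transferred over the Internet (Jay Walker / TED figure)
stream_minutes     = 4        # minutes of video per play
device_kwh_per_min = 0.001    # kWh per minute of streaming on an average device (slide 8)

network_kwh = downloads * file_size_mb * kwh_per_mb             # ~2.9e8 kWh
device_kwh  = downloads * stream_minutes * device_kwh_per_min   # ~6.8e6 kWh
total_gwh   = (network_kwh + device_kwh) / 1e6

print(f"Network transfer : {network_kwh/1e6:6.0f} GWh")
print(f"End devices      : {device_kwh/1e6:6.1f} GWh")
print(f"Total            : {total_gwh:6.0f} GWh (slide 9 quotes ~310 GWh)")

# Continuous power equivalent: ~3e8 kWh / 8760 h, in line with the
# "c36 MW of 24/7/365 diesel generation" figure on slide 9.
print(f"Average power    : {total_gwh*1e3/8760:6.1f} MW continuous")
```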
  10. Japanese IP router power consumption
     • Paper by S. Namiki, T. Hasama and H. Ishikawa, National Institute of Advanced Industrial Science and Technology, Network Photonics Research Center, 2009
     • Japanese traffic has grown exponentially: broadband subscribers Mar-00 to Jul-07, 0.22 to 27.76 million
     • 40% CAGR in daily average JPIX traffic: 11/04 324Gbps; 11/05 468Gbps; 11/06 637Gbps; 05/07 722Gbps
     • By Sep-07, 10.52 million FTTH subscribers; forecast c25 million subscribers by end-2010; forecast download per user per day = 225MB
     • The current technologies can't scale to the future traffic
     • Japan needs a new technology paradigm with 3-4 orders of magnitude energy reduction on today's technology
  11. Energy limitation on current technology: the current technology would consume the entire grid power capacity before 2030!
  12. Data has always outstripped Moore's Law – Vodafone experienced 69% annual growth in mobile data in 2011
  13. Choose your starting point: 10% of grid capacity consumed in 4-6 years? 100% in under 10 years? The result is unsustainable with any start-value
  14. Can data centres be 'sustainable'?
     • Never in isolation!
     • Data centres are the factories of the digital age: they convert power into digital services – it's impossible to calculate the 'efficiency' as there is no definition of 'work done'
     • All the energy is treated as waste and, in almost every case, is dumped into the local environment
     • Only if the application of the data centre can be shown to be an enabler of a low-carbon process can it be regarded as sustainable
     • Not 'sustainable', unless:
       – The load is a low-carbon solution
       – They have minimised consumption by best-in-class hardware
       – They have reduced their PUE to the minimum for the business case
       – They source power from renewable (or low-carbon) sources?
       – They re-use waste heat
     • Is a true 'parallel computing' model 'efficient'? If you build two ultra-low PUE facilities (close to PUE=1) to push redundancy and availability into the hardware-software layer, then could your peak overall power consumption be 2x?
  15. Fast broadband for all?
     • The EU has a digital agenda that involves super-fast broadband for all citizens at an affordable price, if not free to those who are less able to pay
     • Faster access will, according to Jevons Paradox, generate a power demand increase, but no government has yet appeared to understand the direct linkage between data generation and power demand
     • Faster access used for business or education is one thing, but for social networking?
     • Faster access used for education, medical services and security may be key to many third-world nations' development
     • 'Internet access will become a privilege, not a right' – Vint Cerf, 2011; inventor of the IP address and often regarded as one of the 'Fathers of the Internet'; now VP and Chief Internet Evangelist, Google – working on inter-galactic IP addresses
  16. Industry predictions that point the way
     • Nokia Siemens Networks: by 2015, 2,500% mobile data – 23 Exabytes/year (23,000,000,000,000,000,000 bytes); planning for a 1,000x increase in network storage capacity 2010-2020
     • Cisco: by 2015, 2,600% mobile data – 76 Exabytes/year; Internet traffic increases 32%/year to 966 Exabytes/year – 3,900% of the Internet traffic (by volume) in 2005
     • IDC: 2009-2020 data growth of 4,400%
     • A faster growth rate than Moore's Law and technology?
  17. But ICT infrastructure needs energy...
     • A viral-like spread and expansion of digital data – but how will it be transferred? By courier on hard-drives or via fibre?
     • At the moment, sending 2TB between Bristol and California is cheaper, faster and lower carbon footprint by DHL on a jumbo-jet
     • Is there a natural limit to growth? Or an un-natural one?
     • We all remember when Gartner (2008) said that energy consumption of data centres will grow by 1,600% from 2005 to 2025 and that ICT produces 2% of worldwide CO2 emissions
     • Could the 2% of 'today' grow into... Cisco 39x = 78% by 2015; Nokia Siemens 25x = 50% by 2015; IDC 44x = 88% by 2020; Gartner 16x = 32% by 2025
  18. Is 'The Cloud' an answer?
     • Partly. 'The Cloud' = 'someone else's data centre'
       – They will proliferate and get bigger
       – They will increase dramatically in ICT utilisation
       – Built in an increasingly modular/scalable fashion
     • They will strive for low costs via ultra-low PUE
       – They will innovate and move to sub-1.15 PUE
       – Low-energy cooling, 'thermal management' (major influence)
       – High-efficiency UPS with advanced eco-mode
       – Visibility and control via DCIM will be essential
  19. Big, virtualised, heavily loaded and 'greener'
     • UK data centres consume c1GW across 35-40,000 'data centres', ripe for consolidation/outsourcing
     • If average PUE = 2 then the ICT load = 500MW
     • 'Cloud' is an outsourced, flexible, pay-as-you-go compute and storage business with relaxed hardware SLAs in a highly virtualised environment
     • 'Cloud' = 'someone else's data centre'
     • 'Cloud' will (has already?) become a commodity service driven by the cost of power and its efficient use
     • Logically 'cloud' should be more efficient, with a cost-driven PUE of 1.2, cutting grid demand by 40%
     • But data growth will continue to demand more power
  20. Bitterlin's Law ☺ 2012
     • Don't pay for heavyweight reports on the growth rate of data centres
     • Choose your 'best guess' data growth rate – currently 80%? (e.g. mobile data, storage sales, etc.)
     • Deduct Moore's Law (40% CAGR) – e.g. 80% - 40% = 40% annual power growth
     • Compare virtualisation software sales to server sales and take a view on the impact – e.g. halving the 40% = 20%
     • So, the data-centre power-growth rate is currently 20% – and mostly in emerging markets rather than in the old economies
     • A paradigm shift will only extend exponential growth, not solve the power-growth problem
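The slide's rule of thumb is easy to reproduce. The sketch below uses the slide's own round figures; the 'halve it for virtualisation' factor is the slide's assumption. Strictly, compounding growth rates combine multiplicatively (1.80/1.40 ≈ 1.29 before the virtualisation adjustment), but the simple subtraction is the rule of thumb the slide intends.

```python
# A minimal sketch of the "Bitterlin's Law" estimate from slide 20.
# The inputs are the round numbers quoted on the slide; pick your own "best guess" values.

data_growth_cagr   = 0.80   # ~80% annual growth in data volume (slides 2 and 20)
moores_law_cagr    = 0.40   # ~40% annual gain from Moore's Law, as used on the slide
virtualisation_cut = 0.5    # assume virtualisation absorbs roughly half of the remainder

power_growth = (data_growth_cagr - moores_law_cagr) * virtualisation_cut
print(f"Estimated data-centre power growth: {power_growth:.0%} per year")  # -> 20%
```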
  21. It's all about the money: Power Usage Effectiveness – a universally accepted and harmonised metric that covers the infrastructure, soon to be embodied in an ISO/IEC Standard
  22. Power costs have become dominant
     • At UK power costs, 40-60% of the 10-year data centre TCO is the cost of electrical power – land, structure, ICT hardware and staffing are all subjugated by the cost of electricity
     • ICT hardware costs have fallen to less than 3 years of its own power consumption – refresh rates have fallen to 3 years, for some 1 year
     • Low PUE has become the dominant mantra
     • Monitoring and control have become vital
  23. An example of a UK colo cost model
     • Tier 3 build cost = £10k-£13k/kW
     • One 4kW cabinet lease = £27,500/year – c£6k/year/kW
     • Power cost for 4kW of IT at PUE 1.6 = £5,600 pa – over 10 years = 4x the infrastructure build cost
     • The cost of power dominates the TCO and a low PUE becomes a key enabler
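A worked version of the colo numbers is sketched below, assuming a UK electricity tariff of roughly £0.10/kWh; the slide quotes only the resulting £5,600 pa, so the tariff is an assumption.

```python
# Worked version of the UK colo cost example on slide 23.
# The electricity tariff is an assumption (~£0.10/kWh); the slide only quotes the resulting £5,600 pa.

it_load_kw = 4.0       # one cabinet of IT load
pue        = 1.6
tariff_gbp = 0.10      # £/kWh, assumed
hours_year = 8760

annual_power_cost = it_load_kw * pue * hours_year * tariff_gbp
print(f"Annual power cost: £{annual_power_cost:,.0f}")       # ~£5,600, as on the slide

build_cost_per_kw = (10_000, 13_000)                          # Tier 3 build cost range, £/kW
build_cost = tuple(c * it_load_kw for c in build_cost_per_kw)
print(f"Build cost for 4 kW: £{build_cost[0]:,.0f}-£{build_cost[1]:,.0f}")
print(f"10-year power cost : £{annual_power_cost*10:,.0f}")
# How the 10-year power bill compares with the build cost depends on tariff escalation
# and which costs are included; either way, power dominates the cabinet economics.
```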
  24. PUE = 1.7 (EU CoC Participant average)
     Breakdown of a 1MVA feed:
     • IT terminal load: 470 kW
     • Cooling fans, pumps and compressors: 250 kW
     • Distribution and conversion losses: 35 kW
     • Lighting and small power: 15 kW
     • Security, NOC, BMS and outdoor lighting: 13 kW
     • Ventilation – fresh air: 5 kW
     • Communications: 2 kW
     • Total: 800 kW
  25. The misuse of PUE for marketing?
     • Have Facebook, Google et al spoiled it for the mainstream data-centre industry?
     • Ultra-low PUEs set unachievable targets for enterprise facilities – 1.12 by Google to 1.07 'PUE shattering' by Facebook
  26. 'Horses for courses'
     • What is good for Google is not usually acceptable or possible for enterprise facilities, but it is not 'wrong' – it's 'right' for Google!
       – Fresh-air cooling but with a short refresh cycle; low ambient locations are preferable
       – No central UPS, but a ride-through battery built into the server; redundancy in the software/hardware layer
     • Resultant PUE 1.12 and going down – with very high processor utilisation from a single application like 'search'
  27. Is a low PUE 'sustainable' engineering?
     • Cooling efficiency
       – Site selection, latitude and local climate (water usage a limiting factor?)
       – Rigorous air management in the room
       – High server inlet temperature (avoiding fan ramp-up, 27°C?)
       – Minimum humidification and de-humidification (if any?)
       – Free-cooling coils for when the external ambient is cool
       – If possible, avoid compressor operation altogether
     • Power efficiency
       – Avoid high levels of redundancy and low partial loads in general
       – Design redundancy to always run at 60% load
       – Adopt high-efficiency, modular, transformer-less UPS where efficiency is 96% at 20% load
       – Adopt eco-mode UPS where peak efficiency is 99% with an annual average efficiency close to 98%
       – Apply high-efficiency lighting, etc
     • Best practice gets us to a PUE of 1.11-1.15; extreme data-centre 'engineering' gets us down to below 1.1
     • 'Risk' (perceived or real) increases as PUE goes sub-1.2
  28. Can ICT save the planet?
     • Will ICT lower our energy consumption and help to counter global warming?
       – Less travel: video conferencing, home working
       – Internet shopping, smarter logistics (no right-hand turns?)
       – Smarter buildings (sensors, sensors everywhere)
       – Better manufacturing
       – Smart-grid enablement
       – Better education and access to medical services
     • But we all seem to want more digital services and content
       – 24x7, forever; wherever the location, fixed and mobile
       – Increasingly HD video content
       – 4G mobile networks and 4K-TV will exacerbate the problem
       – Government plans for 'fast broadband for all' at low cost will only drive consumption up
     • Let's not forget that 25% of the world's population has access to the internet – and the rest want/need it
  29. Power and cooling in the past
     • Data centres have evolved from the mainframe machine-rooms of the mid-50s to the file-server and storage-array dominated mega-facilities of today
       – From 35W/m² in the 70s to 5,000W/m² in 2010
     • The power requirement hardly changed in 20 years
       – 1990: 441Hz, derived from aircraft technology
       – 1997: 50Hz, voltage & frequency ±1%, fidelity 10ms
       – 1997: 50Hz, voltage & frequency ±10%, fidelity 20ms
       – But in 2013 things may have regressed
     • The environmental requirements of IT hardware have changed drastically in very recent times
       – The original specification was based on humidity control for punch-cards and read/write accuracy on magnetic tape-heads: below 45%RH too much static electricity built up; above 55%RH the punch-cards absorbed too much moisture
       – Humidification and de-humidification were key elements of the thermal management design, and the result was precision air-con
       – Temperature was controlled to 22°C ±1°C (usually return air)
     • Until 2-3 years ago, and still for (far too) many facilities, this was/is the 'safe' SLA and avoids any conflict for legacy loads
  30. Cooling is the low-hanging fruit
     • pPUE = Partial Power Usage Effectiveness
     • The cooling system has become the most important target for saving energy in the data centre
  31. PUE only measures the infrastructure
     • PUE takes no account of the IT load or its 'efficiency'
     • PUE must never be used to compare facilities
     • PUE is annualised energy (kWh), not 'power' (kW)
     • PUE varies by location, season and load
     • A low PUE enables a bigger IT load
     • Peak power can be very different from PUE
  32. PUE varies with load and climate
     • PUE = the energy ratio of the annualised 'kWh-Facility' divided by 'kWh-ICT load'
     • In the slide's example, a PUE of 9 at 10% load improves to 1.4 at 100% load
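The mechanism behind the partial-load curve can be illustrated with a simple fixed-plus-proportional overhead model. The overhead figures below are assumptions chosen only to show the shape of the effect; the slide's own measured example (PUE 9 at 10% load, 1.4 at 100%) is steeper but has the same character.

```python
# A minimal fixed-overhead model (an assumption, not the measured curve on slide 32)
# showing why annualised PUE degrades sharply at partial IT load.

def pue(it_load_kw, fixed_overhead_kw=300.0, variable_overhead_frac=0.25):
    """PUE for a facility whose overhead has a fixed part (fans, pumps, losses,
    lighting) plus a part proportional to the IT load (cooling duty)."""
    facility_kw = it_load_kw + fixed_overhead_kw + variable_overhead_frac * it_load_kw
    return facility_kw / it_load_kw

design_it_kw = 1000.0
for frac in (0.1, 0.25, 0.5, 1.0):
    load = frac * design_it_kw
    print(f"{frac:4.0%} IT load -> PUE {pue(load):.2f}")
# 10% load -> PUE 4.25, 100% load -> PUE 1.55 with these assumed numbers;
# the slide's measured example (9.0 down to 1.4) shows the same shape, only steeper.
```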
  33. Partial-load performance is key
     • Partial load is endemic in data centres worldwide – 400MW of Trinergy delivered in the last 2 years is running with an average load of 29%
     • Partial load is the enemy of energy efficiency
       – Modular/scalable solutions are the key to keeping the system load high and efficiency maximised
       – Trinergy example: running at 97.8% efficiency
     • High redundancy often exacerbates partial load
  34. Compressor-free cooling?
     • UK examples, where the design peak external dry-bulb ambient is c33°C and wet-bulb c23°C:
       – Open fresh-air system with adiabatic cooling, limited to peak 26°C server inlet = 100 hours/year of compressor operation
       – Closed system with air-to-air heat exchanger and adiabatic spray, limited to peak 30°C server inlet = zero hours/year of compressor operation
     • Note! 'Free cooling' does not mean 'fresh air'
     • Wherever the peak external ambient is below 35°C and water for evaporation is available, it is possible to have compressor-free cooling 8,760 h/year and keep within the latest Class 2 'recommended' limits
     • An annualised PUE of 1.15 could be achieved Europe-wide – compared to an industry legacy of 3 in operation, more than a 60% reduction in power consumption
  35. The UK could avoid compressor operation...
     • Approach temperature of 7K (indirect or direct airside economisation)
     • Maximum server inlet temperature of 30°C for 50 hours/year, using water for adiabatic cooling – about 1,000T/MW/year
     • Average server inlet temperature of a 'traditional' 22°C
     (Chart: UK dry-bulb and wet-bulb monthly averages)
  36. Risk, real or perceived?
     • Complexity can be the enemy of reliability
     • Balancing redundancy and the chances for human error is key
  37. What is your appetite for risk?
     • This is the first question that a designer should ask a data-centre client
       – Thermal envelope for hardware: ASHRAE TC9.9 Class 1, 2, 3 or 4? Recommended or Allowable for 'X' hours per year?
       – Contamination and corrosion: air quality? Direct or indirect economisation?
       – Power quality and availability: high-efficiency UPS? Single-bus or dual-bus power?
     • High reliability usually costs energy
  38. Enabling factors for innovation
     • ASHRAE TC9.9 is slowly widening the 'recommended' and, faster, the 'allowable' thermal windows
       – Allowable A1 temperature 18-32°C, humidity 20-80%RH
       – Encouraging no refrigeration in data centres of the future
     • The Green Grid is pushing DCMM, the Maturity Model – eco-mode UPS plus no refrigeration, even in back-up
     • The EU CoC is reported to be considering +45°C?
     • ISO/IEC, ETSI and ITU will push energy efficiency of data centres to the top of the agenda
  39. The future: an ever wider thermal envelope
     • The critical change has been to concentrate on server inlet temperatures, maximising the return-air temperature
     • Rigorous air containment is 'best practice'
     • Does ASHRAE need to go further and expand the 'Recommended', not just the 'Allowable'?
  40. ASHRAE TC9.9 2011 Thermal Guidelines
  41. Our industry is like a comet
     • Facebook, Google et al are the bright white tip, but 99.5% of the matter is in the dark tail
     • Governed by paranoia rather than engineering
     • Not littered with early adopters; thermal SLAs are more often still based upon ASHRAE 2004 limits – 22°C (where?) and 45-55%RH
  42. Chilled Water, DX or Adiabatic?
     • Chilled water will remain dominant for 1MW multi-storey and larger city-centre locations where space and external wall runs are limited and flexibility of heat-rejection location is low
       – Latest technology from ENP will enable a pPUE of 1.4
       – Adiabatic coils likely to become a standard feature
       – Will remain dominant where ambient conditions are very hot and/or very humid
       – Will remain dominant for tight thermal-envelope SLAs
     • DX will remain dominant for smaller facilities and city-centre locations, up to c300kW
       – Latest technology from ENP enables a pPUE of 1.2
     • Adiabatic systems will dominate the new green-field mega-facilities
       – Latest technology from ENP will enable a pPUE of 1.06
       – Indirect economisation will dominate over direct (fresh-air) systems
       – Water consumption may be an issue for some locations
  43. Power architecture: 70% of all failures are human error
     • Reliability versus human error versus energy efficiency?
     • 2N power removes a lot of human error!
  44. The drive for higher availability leads to increasing complexity
  45. Site distribution: Tier topology
     • Uptime Institute Tier Ratings for Data Centres
     • ANSI/TIA-942 – Infrastructure Standard for Data Centres
     • ANSI/BICSI 002 – Data Centre Design and Implementation Best Practice
     • The new EN standard BS EN 50600 will be introduced in 2013 and uses the terminology 'Availability Class' in four discrete steps
  46. Why are there only four tiers/classes?
     • Before the founders of The Uptime Institute innovated the dual-cord load, critical loads only had one power connection (one active path)
     • With single-cord loads you can only have two tiers/classes:
       – Single path without redundant components
       – Single path with redundant components, e.g. N+1 UPS
     • Static Transfer Switches were first introduced in air traffic control applications to increase the power availability, but an STS is always a single point of failure
     • With dual-cord loads two more tiers/classes were made available:
       – Dual path with one 'active' (e.g. N+1 UPS) and one 'passive' – a wrap-around pathway that could be used in emergency or for covering routine maintenance in the 'active' path
       – Dual path with two 'active' paths (e.g. 2(N+1) or 2S) where no common point of failure exists between the two pathways and load availability is maximised
     • The (0) classification of BICSI doesn't really reflect a dedicated data centre
  47. UTI Tier Classifications: I to IV
     • The Tier classification system takes into account that 16 sub-systems contribute to the overall site availability
     • Tier I = 99.67% site
     • Tier II = 99.75% site
     • Tier III = 99.98% site
     • Tier IV = 99.99% site = 99.9994% power-system
     • Note that any system requiring 4h of maintenance per year = 99.95% maximum
     • All systems have to meet the criteria above; Tier IV was later revised to 2(N)
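Converting those availability percentages into downtime makes the maintenance point concrete:

```python
# Quick conversion of the Tier availability figures on slide 47 into hours of
# downtime per year, plus the 4-hour-maintenance observation.

hours_per_year = 8760

tiers = {"Tier I": 0.9967, "Tier II": 0.9975, "Tier III": 0.9998, "Tier IV": 0.9999}
for tier, availability in tiers.items():
    downtime_h = (1 - availability) * hours_per_year
    print(f"{tier}: {availability:.2%} -> {downtime_h:5.1f} h/year downtime")

# Any system needing 4 h of load-interrupting maintenance per year cannot exceed:
print(f"4 h/year maintenance cap: {(1 - 4/hours_per_year):.4%}")   # ~99.95%
```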
  48. Combinations of MTBF/MTTR = any Tier
  49. Levels of redundancy (a numerical sketch follows slide 53 below)
     • N – meets base load requirements with no redundancy
       – Note that where N>1 the reliability is rapidly degraded
     • N+1 – one additional unit/path/module more than the base requirement; the stoppage of a single unit will not disrupt operations
       – N+2 is also specified so that maintenance does not degrade resilience
       – An N+1 system running at partial load can become N+2
     • 2N – two complete units/paths/modules for every one required for the base system; failure of one entire system will not disrupt operations for dual-corded loads
     • 2(N+1) – two complete (N+1) units/paths/modules; failure of one system still leaves an entire system with resilient components for dual-corded loads
  50. Redundancy: what is 'N'?
     • N=1 (unitary string): module capacity = load; MTBF = X
     • N=2 (power-parallel): 2x module capacity = load; MTBF = 0.5X
     • N=3 (power-parallel): 3x module capacity = load; MTBF = 0.33X
  51. Redundancy: what is 'N+1'?
     • N=1: module capacity = 100% of load; MTBF = 10X
     • N=2: module capacity = 50% of load; MTBF = 9X
     • N=3: module capacity = 33.3% of load; MTBF = 8X
  52. Redundancy: what is '2N'?
     • Two complete systems, A and B
     • N=1: module capacity = 100% of load; MTBF = 100X
     • N=3: module capacity = 33.3% of load; MTBF = 50X
  53. Redundancy: what is '2(N+1)'?
     • Two complete (N+1) systems, A and B
     • N=1: module capacity = 100% of load; MTBF = 1000X
     • N=2: module capacity = 50% of load; MTBF = 800X
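A deliberately simplified numerical sketch of why these configurations rank the way they do is shown below. The module MTBF, the repair time and the textbook approximation MTBF(1-of-2, repairable) ≈ MTBF²/(2·MTTR) are illustrative assumptions; the round multipliers on slides 50-53 (10X, 100X, 1000X) are themselves only indicative.

```python
# Simplified repairable-redundancy arithmetic. This is NOT the model behind the
# multipliers on slides 50-53 (those are illustrative); it is the textbook
# approximation MTBF(1-of-2) ~ MTBF_module^2 / (2 * MTTR), valid when MTTR << MTBF,
# shown here only to make the ranking N < N+1 ~ 2N < 2(N+1) concrete.

module_mtbf_h = 500_000.0     # assumed MTBF of one UPS module / path element
mttr_h        = 24.0          # assumed mean time to repair a failed module

def series(n, mtbf=module_mtbf_h):
    """n identical elements, all required: failure rates add, MTBF divides by n."""
    return mtbf / n

def one_of_two(path_mtbf, mttr=mttr_h):
    """Two independent repairable paths/modules, either one can carry the load."""
    return path_mtbf ** 2 / (2 * mttr)

print(f"N        (1 module)         : {series(1):14,.0f} h")
print(f"N, N=3   (3 small modules)  : {series(3):14,.0f} h")
print(f"N+1/2N   (one spare module) : {one_of_two(series(1)):14,.0f} h")
print(f"2(N+1)   (two N+1 paths)    : {one_of_two(one_of_two(series(1))):14,.0f} h")
# In practice the common output distribution (slide 55) caps these figures long
# before the 2(N+1) value is reached.
```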
  54. Think smart: when N+1 = 2N for no cost
     • For dual-cord loads (or PoU STSs), and when N=1, two 100%-capacity modules can be arranged as 2N instead of N+1
     • Module capacity = 100% of load in both cases: N+1 gives R = 10X; 2N gives R = 100X
  55. Distribution limits the MTBF & Availability
     • Path elements: mains/generator feed, maintenance bypass, UPS input switchboard, critical load bus, UPS output switchboard
     • N+X does not improve things – the MTBF and availability are entirely dependent upon the output switches; only 2N offers high availability
     • MCCB/ACB MTBF = 250,000h, so two in series offer a 125,000h ceiling
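The 'ceiling' arithmetic is simply series failure rates adding:

```python
# The series-element ceiling quoted on slide 55: failure rates of elements that the
# load cannot avoid simply add, so two breakers in series halve the achievable MTBF
# regardless of how much redundancy sits upstream.

breaker_mtbf_h = 250_000.0                 # MCCB/ACB MTBF from the slide
rates = [1 / breaker_mtbf_h] * 2           # two breakers in series on the output path
ceiling_mtbf_h = 1 / sum(rates)
print(f"Ceiling MTBF: {ceiling_mtbf_h:,.0f} h")   # 125,000 h, as on the slide
```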
  56. Connection to the (one!) utility grid
     • Grid voltage levels: 230-400kV, 66kV, 33kV, 11kV, 400V to the data centre
     • Best = A+B feeds
     • The higher the connection voltage the better
     • Fewest shared connections
     • Diverse substations
     • Diverse routing
  57. Utility supply: power quality metrics
     • In the EU we have EN 50160:2000 – Voltage characteristics of electricity supplied by public distribution systems (see next slide)
     • In the USA:
       – The Sustained Average Interruption Frequency Index (SAIFI): the average number of sustained interruptions per customer per year; a SAIFI of 0.9 indicates that the utility's average customer experiences a sustained electric interruption roughly every 13 months (12 / 0.9)
       – The Customer Average Interruption Duration Index (CAIDI): an average of outage minutes experienced by each customer who experiences a sustained interruption
       – The Momentary Average Interruption Frequency Index (MAIFI): the average number of momentary interruptions experienced by utility customers
       – Depending upon state regulations, momentary interruptions are defined as any interruption lasting less than 2 to 5 minutes – NOT 20ms!
     • In all cases national regulations provide for a public power supply that is not suitable for compute loads with embedded microprocessors
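For readers unfamiliar with the US indices, a small example follows; the SAIDI value is an assumed illustration, not a figure from the slide.

```python
# Interpreting the US reliability indices on slide 57 with illustrative numbers
# (the SAIDI value below is an assumption, not a figure from the slide).

saifi = 0.9     # sustained interruptions per customer per year
saidi = 90.0    # total sustained interruption minutes per customer per year (assumed)

caidi = saidi / saifi                  # average minutes per sustained interruption
months_between = 12 / saifi            # mean months between sustained interruptions

print(f"Mean time between sustained interruptions: {months_between:.1f} months")
print(f"CAIDI (average outage duration)          : {caidi:.0f} minutes")
# Note: 'momentary' interruptions (MAIFI) of up to 2-5 minutes are excluded from
# these figures, yet even a 20 ms break will drop an unprotected compute load.
```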
  58. EN 50160:2000 – Voltage characteristics of electricity supplied by public distribution systems
     • Frequency: 49.5 to 50.5Hz (95% of one week) and 47 to 52Hz (100%); 10s measurement interval
     • Slow voltage changes: 230V ±10%, and outside of ±10% for 5% of the time; 10-minute interval, 95% of one week
     • Voltage sags (<1 min): 10 to 1,000 times per year (<85% nominal); 10ms interval, 100% of one year
     • Short interruptions (<3 min): 10 to 100 times per year (<1% nominal); 10ms interval, 100% of one year
     • Accidental, long interruptions (>3 min): 10 to 50 times per year (<1% nominal); 10ms interval, 100% of one year
     • Temporary over-voltage (line-ground): mostly <1.5kV; 10ms interval, 100% of one year
     • Transient over-voltage (line-ground): mostly <6kV; 100%
     • Voltage unbalance: mostly 2% but occasionally 3%; 10-minute interval, 95% of one week
     • Harmonic voltages: 8% Total Harmonic Distortion (THD); 10-minute interval, 95% of one week
     • If you try to plot this against the CBEMA Curve you get an MTBF of c50h
  59. Black-out at the 11kV distribution level – UK Electricity Council data, 1988
     • MDT (hours) vs MTBF (years), urban / rural:
       – 0.01 h (36 sec): 3.1 / 0.39
       – 0.02 h: 3.2 / 0.40
       – 0.08 h: 3.7 / 0.46
       – 0.20 h (12 min): 4.1 / 0.50
       – 0.33 h: 4.4 / 0.55
       – 0.50 h (30 min): 4.9 / 0.60
       – 0.65 h: 5.7 / 0.70
       – 0.80 h (48 min): 6.8 / 0.80
       – 1.00 h: 8.2 / 0.90
     • How much diesel fuel need you store?
     • Disturbance types: black-out – total loss of voltage on three phases; brown-out – depression of one or more phases; frequency – grid or standby-set generated; surges – switching, fault clearance, re-closure; voltage distortion – caused by consumer connection; micro-breaks – short-circuits, fault clearance; swells – over-voltage for several cycles; sags – under-voltage for several cycles
     • Quality of the grid supply – 34 German data centres, 1995; deviations (10ms, V±5%, 50Hz±1%) over 2,190 hours:
       – Worst: MTBF 43h, MDT 81.45s, availability 99.94738%
       – Average: MTBF 155h, MDT 1.72s, availability 99.99969%
       – Best: MTBF 685h, MDT 0.1s, availability 99.99999%
       – Typical connection voltage: 380V to 20kV
  60. Typical utility power quality
     • 107,834 MV deviations (RMS) over 24 months across 300 MV feeders
       – 49.90 events/connection/year
       – MTBF = 175h; MTTR = 3.6s
       – 2% better when closer to the sub-station feed
     • Over 60% of events are of 10 cycles duration (200ms) with a 50% voltage sag
  61. UPS requirements for big data centres
     • High efficiency below 40% load – pPUE of 1.03
     • Maximum protection when the grid is poor quality
     • Scalable for 'invest as you grow' CapEx, to MW scale
     • Low voltage distortion against non-linear load current
     • Emerson Trinergy provides all of the above:
       – 98% efficiency over a full year (average 97.8% at 29% load)
       – Three operating modes, from double-conversion upwards
       – 200kW blocks, 1,600kW modules, to multi-MW LV systems
       – THVD 3% with 100% distorted load
       – MV systems optional
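The link between UPS efficiency and its pPUE contribution is direct: the pPUE of the UPS alone is input power over delivered IT power. The efficiencies below are the figures quoted on this and the next slide.

```python
# Relationship between UPS efficiency and its contribution to PUE, using the
# efficiency figures quoted on slides 61-62 (97.8%, 98%, 99%).

def ups_ppue(efficiency):
    """Partial PUE of the UPS alone: input power divided by delivered IT power."""
    return 1.0 / efficiency

for eff in (0.978, 0.98, 0.99):
    print(f"UPS efficiency {eff:.1%} -> pPUE contribution {ups_ppue(eff):.3f}")
# 97.8% -> 1.022, 99% -> 1.010; the losses also reappear as extra cooling load,
# so the whole-facility impact is slightly larger than the pPUE figure alone.
```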
  62. Advanced eco-mode; pPUE = 1.02 (chart: 1,200kW load example)
  63. Server hardware developments? Relaxed cooling but increased demands for UPS?
  64. But now it's the turn of the 'One'!
     • Typical servers in 2013 consume 40% (from as low as 23% to as much as 80%) of their peak power when doing zero IT 'work'
     • Average microprocessor utilisation across the globe is c10%, whilst the best virtualisation takes it to c40% for (rare) homogeneous loads, and only 90% for HPC
     • If the IT hardware had a linear power-demand profile versus IT load we would only be using 10% of the grid power – in the UK that could mean 100MW instead of 1000MW
     • A PUE of 1.2 is a law of diminishing returns and increasing risk, so is it time to look at the ICT load?
     • DCIM can offer a path to high utilisation rates
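The proportionality argument can be made concrete with an assumed linear idle-to-peak power model; the 300W peak figure is an assumption, while the 40% idle fraction and 10% utilisation are the slide's.

```python
# Sketch of the proportionality argument on slide 64: what a server drawing 40% of
# peak power at idle consumes at ~10% average utilisation, versus a hypothetical
# perfectly load-proportional server. A linear idle-to-peak power model is assumed.

peak_w        = 300.0   # assumed peak power of one server, W
idle_fraction = 0.40    # typical 2013 server idles at ~40% of peak (slide 64)
utilisation   = 0.10    # global average microprocessor utilisation (slide 64)

actual_w       = peak_w * (idle_fraction + (1 - idle_fraction) * utilisation)
proportional_w = peak_w * utilisation

print(f"Real server at 10% load       : {actual_w:5.0f} W")
print(f"Perfectly proportional server : {proportional_w:5.0f} W")
print(f"Energy ratio                  : {actual_w/proportional_w:4.1f}x")
# With these assumptions the real server draws ~4.6x the energy of a proportional
# one, which illustrates the gap behind the slide's "100 MW instead of 1000 MW"
# aspiration for a fully load-proportional fleet.
```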
  65. SPECpower: OEMs' input data
     • Utility servers – in this small extract from the web-site, HP ProLiant models average 41% idle power and vary from 24% to 79%
     • HP is 'better' than 'worst'
     • http://www.spec.org/power_ssj2008/
  66. This is the real 'efficiency' battleground...
     • Average utilisation must increase
     • The IT load will become highly dynamic and the PUE may get 'worse', although the overall energy consumption will reduce!
  67. 13th-generation servers?
     • Optimised for 27°C inlet temperature – a 300W server would have a typical 20W fan load
     • Capable of 45°C inlet temperature – server power rises 60%, with a 200W fan load and a dramatic increase in noise
     • 20K delta-T, front-to-back
     • All terminations for power and connectivity brought to the front – nothing in the hot aisle
     • Disaggregation?
  68. High efficiency has consequences
  69. The problem with high harmonic loads
     • Kirchhoff's Law: the sum of the currents at a junction is zero
     • Even with a balanced load, harmonic phase currents sum into the neutral rather than cancelling
     • Measured N-E potential: 5.4V peak
     • Source: Visa International, London, 1995
  70. Neutral current induces noise in the Earth
     • High-frequency current flowing through the impedance of the neutral conductor causes voltage impulses (with respect to Earth) in the neutral
     • This "noise" on the Earth can cause communication errors
     (Chart: neutral-to-earth voltage impulses over a 20ms cycle)
  71. Utility supply? What the load needs
     (Chart: the CBEMA voltage-tolerance envelope – acceptable and unacceptable ranges of voltage, roughly 70-300% of nominal, against event duration from 0.02ms to 20s)
     • Electro-mechanical transfer switch: 60-80ms; STS: 4ms
     • Note! The IEEE-1100/CBEMA Curve was only ever issued for 120V/60Hz single-phase equipment
  72. Current and future power quality demands?
     • Pre-1997 CBEMA PQ Curve (IEEE 446/1100): 10ms zero-voltage immunity
     • Post-1997 CBEMA/ITIC PQ Curve: 20ms zero-voltage immunity
     • A 2012 typical server, when fully loaded, only meets the pre-97 10ms zero-voltage tolerance
       – In mature markets the MTBF of the grid to this spec = 250h
       – Leading PF of c0.95
       – Harmonics at low load 30% THiD, at full load c5% THiD
       – More need for UPS with leading-PF capacity and low THVD against load current distortion
  73. Standards in development
     • I am Spartacus! Everyone is involved in guides, white papers and standards
     • Governments are increasingly interested in energy efficiency
  74. International standards work
     • EN 50600 – Data Centre Infrastructure
       – Facility, power, cooling, cabling, fire, security, etc
       – Availability Class replaces Tiers
     • ISO/IEC JTC1 SC39 – Resource Efficient Data Centres – Sustainability for and by ICT
       – WG1 – Metrics: ISO/IEC 30134-1 Introduction to KPIs; -2 PUE, -3 ITEE, -4 ITEU, -5 WUE; then CUE, KREC, RWH and others; Korea favours an aggregated 'silver bullet' KPI
       – WG2 – Sustainability by ICT; low-carbon enablement
     • The Green Grid
       – Innovators in energy-efficient facilities
       – Original work being adopted in ISO
       – Technical work continues apace, so please come and join us!
  75. Why metrics?
     • You can't control what you don't measure
       – Identify areas that need improvement and take action
       – Monitor that improvement
       – Continuously move forward
     • Legislation has to be based on measurements
       – The CRC was to be based on PUE improvement
       – The best metrics are those suggested by the industry
       – Most facilities cannot be judged by the extremes of Google, Facebook et al
  76. 76. ʀͬ Conclusions or predictions?Conclusions or predictions?Conclusions or predictions?Conclusions or predictions? Data Centres are at the heart of the internet, enabling our digital economy. They will expand as our demands, for social, educational, medical and business purposes, for digital content and services grow – Facilities will become storage dominant and footprint will increase – Loads will become more load:power linear and, as a result, more dynamic. – Thermal management will become increasingly adopted and PUE’s will fall to c1.2 across all of Europe – Only larger, highly virtualised and heavily loaded facilities will enable low-cost digital services as the cost of power escalates Despite our best efforts power consumption will rise, not fallDespite our best efforts power consumption will rise, not fall – Data growth continues to outstrip Moore’s Law and a paradigm shift in network photonics and devices will be required but, even then, a change in usage behaviour will probably be required – Bitterlin’s Law forecasts a growth rate at c20% CAGR for the foreseeable future – often in connected locations where energy is cheap and taxes are low Only a restriction in access will moderate power consumption – Probably for ‘social’ applications rather than business, medical or education? – Through price, tax or legislation? Using DCIM to match load to capacity and maximising utilisation is one key component
  77. But predicting the future of IT is risky...
     • 1997 – the world's fastest super-computer: Sandia National Laboratories 'ASCI Red', 1.8 teraflops, 150m² of raised floor, 800kW
     • +9 years: 2006 Sony PlayStation 3, 1.8 teraflops, 0.08m², 0.2kW
     • Top500, June 2013: China, 33.86 petaFLOPS – c20,000x 1997
     • Source: www.top500.org
  78. Questions?
     Data centres are here to stay and will increase in number and power. We need to explain that this power-growth problem is of society's own making and not 'dirty data-centres'.