The Modern Data Center Topology - The High Availability Mantra


This is the slide deck for the first of a four-part webinar series, "DCIM for High Availability", conducted by GreenField Software. It argues that although data centers have been around for more than four decades, DCIM software is only now becoming an important tool for DC managers because of the need to maintain near-100% uptime. This High Availability Mantra has changed the topology of the data center, and new tools are required to manage the modern data center effectively.






Usage Rights

© All Rights Reserved


    The Modern Data Center Topology - The High Availability Mantra: Presentation Transcript

    • August 29, 2012 (Wednesday), 11:00 AM IST: Webinar 1: The Modern Data Center Topology: The High Availability Mantra
    • About GreenField Software
        - Our Company: incubated by a US$ 40 million engineering company. Founders: Shekhar Dasgupta, ex-MD, Oracle India; Abhijit Sen, Director, UD Group.
        - Our Mission: pioneering energy & environment management software for cost savings & energy optimization.
        - Our Solutions: Data Center Infrastructure Management (DCIM); Sustainability Management for Manufacturing.
    • Today’s Topics
        - The Modern Data Center Overview
        - The High Availability (HA) Mantra
        - Operating Challenges
        - A Solution
    • Modern Data Center Overview
    • Multiple Classes of Data Centers
        - Internet Data Center: used by external clients connecting from the Internet; supports the servers and devices required for B2C transaction-based applications (e-commerce).
        - Extranet Data Center: provides support and services for external B2B partner transactions; accessed over secure VPN connections or private WAN links between the partner network and the enterprise extranet.
        - Intranet Data Center: hosts applications and services mostly accessed by internal employees with connectivity to the internal enterprise network.
        - Special Purpose Data Center: serves specialized application areas such as geological & geophysical work in the oil & gas industry; may or may not be interconnected with the others.
    • Common Objective: Business Continuity
        - Disaster Recovery Data Center: each class may have a dedicated or shared DR center, usually located separately from the primary data center.
        - High Availability (HA) Data Center: each data center is provided with significant redundancies, and the DR center comes into play only when a disaster strikes. Component or system failures within any DC should either be self-healing or be taken over by redundancies within the DC.
        - Insurance Against Power & Network Outages: reliability through multiple service providers; internal back-ups.
        - Securing the Data Center: against malicious hacking that can bring down the data center and impact business continuity; implementing firewalls/virtual firewalls.
    • Common Complexity: Multitude of Assets
        - Divided between two worlds: IT & Facilities
        - Includes mission-critical applications
        - Like a manufacturing operation: raw material is power & networks, processing is data, output is an information service
        - Needs: asset management and resource optimization, à la manufacturing
    • The High Availability Mantra
    • Today’s High Availability Data Center
        - Extreme redundancies for 99.99% uptime -> higher power consumption
        - Huge population of N+1/N+2 equipment -> asset under-utilization; too complex to manage with spreadsheets & Visio tools
        - Chain of inter-dependent equipment -> multiple points of failure
        - kW per rack increases as more processing capacity is added -> trade-off: supporting more per rack versus extra space & heat loads
        - Growing heat loads, carbon emissions & e-waste -> sustainability issues
        High Availability Is Inversely Proportional to Asset Utilization & Energy Efficiency
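The trade-off on this slide (redundancy buys uptime, while a chain of inter-dependent equipment adds points of failure) can be made concrete with standard reliability arithmetic. The sketch below is not from the deck: it assumes components fail independently, and the per-unit availabilities and asset names are hypothetical. Equipment in series multiplies availabilities; N+1 redundancy is modeled as "at least k of n identical units must be up".

```python
from math import comb, prod


def series_availability(availabilities: list[float]) -> float:
    """Chain of inter-dependent equipment: every link must be up, so multiply."""
    return prod(availabilities)


def k_of_n_availability(a: float, n: int, k: int) -> float:
    """n identical, independently failing units of which at least k must be up.

    N+1 redundancy is k = N, n = N + 1; 2N redundancy is k = N, n = 2N.
    """
    return sum(comb(n, i) * a**i * (1 - a) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Hypothetical per-unit availabilities, for illustration only.
    ups, pdu, switch = 0.999, 0.9999, 0.999

    print(f"single chain:        {series_availability([ups, pdu, switch]):.6f}")

    # Same chain with the UPS stage made N+1 (N = 1: two units, one needed).
    ups_n1 = k_of_n_availability(ups, n=2, k=1)
    print(f"UPS stage, N+1:      {ups_n1:.6f}")
    print(f"chain with N+1 UPS:  {series_availability([ups_n1, pdu, switch]):.6f}")
```

Making the UPS stage N+1 with N = 1 lifts that stage from 0.999 to 0.999999, but the unprotected switch then dominates the chain, and the two UPS units together never carry more than 50% of their combined capacity: the asset under-utilization the slide points to.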
    • When HA Fails - A Tale of Two Disasters
        Amazon: "Amazon cloud outage takes down Netflix, Instagram, Pinterest, & more"
            With the critical Amazon outage, which is the second this month, we wouldn’t be surprised if these popular services started looking at other options, including Rackspace, SoftLayer, Microsoft’s Azure, and Google’s just-introduced Compute Engine. Some of Amazon’s biggest EC2 outages occurred in April and August of last year.
        RBS: "Tech fault at RBS and Natwest freezes millions of UK bank balances"
            RBS and Natwest have failed to register inbound payments for up to three days, customers have reported, leaving people unable to pay for bills, travel and even food. The banks - both owned by RBS Group - have confirmed that technical glitches have left bank accounts displaying the wrong balances and certain services unavailable. There is no fix date available.
        Which Will Be the Next One?
    • What’s the High Availability Mantra?

        Availability                Downtime per year   Downtime per month*   Downtime per week
        99% ("two nines")           3.65 days           7.20 hours            1.68 hours
        99.5%                       1.83 days           3.60 hours            50.4 minutes
        99.8%                       17.52 hours         86.4 minutes          20.16 minutes
        99.9% ("three nines")       8.76 hours          43.2 minutes          10.1 minutes
        99.95%                      4.38 hours          21.6 minutes          5.04 minutes
        99.99% ("four nines")       52.56 minutes       4.32 minutes          1.01 minutes
        99.999% ("five nines")      5.26 minutes        25.9 seconds          6.05 seconds
        99.9999% ("six nines")      31.5 seconds        2.59 seconds          0.605 seconds
        99.99999% ("seven nines")   3.15 seconds        0.259 seconds         0.0605 seconds
        *Monthly figures assume a 30-day month.

        Amazon data centers (built to Tier 4 standards, with an expected availability of 99.995%) have already had two outages in 2012, each lasting over 3 hours!
        - Tier 3/Tier 4 are defined only by hardware redundancies
        - Glaring gaps in operating procedures to prevent fatal human errors
        - Lack of purpose-built BCP software to predict failures
        - Lack of a chain of custody to detect root causes
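Every row of the table follows from one formula: allowed downtime = (1 - availability) × period length. A minimal Python sketch, assuming a 365-day year and the 30-day month used for the monthly column; the function and constant names are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600
MINUTES_PER_MONTH = 30 * 24 * 60   # 30-day month, as in the table
MINUTES_PER_WEEK = 7 * 24 * 60


def allowed_downtime_minutes(availability_pct: float, period_minutes: float) -> float:
    """Maximum downtime in a period for a given availability percentage."""
    return (1 - availability_pct / 100) * period_minutes


def achieved_availability_pct(downtime_minutes: float,
                              period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Availability actually delivered, given the observed downtime in a period."""
    return 100 * (1 - downtime_minutes / period_minutes)


if __name__ == "__main__":
    for pct in (99.9, 99.99, 99.995, 99.999):
        print(f"{pct}%: {allowed_downtime_minutes(pct, MINUTES_PER_YEAR):.2f} min/year")

    # Two outages of 3 hours each; a lower bound, since each lasted "over 3 hours".
    print(f"{achieved_availability_pct(2 * 3 * 60):.3f}%")
```

At the 99.995% quoted for Amazon, the annual downtime budget is about 26.3 minutes; two 3-hour outages already add up to 360 minutes, which on their own cap the year at roughly 99.93%.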
    • Delivering the High Availability Promise
        Adequate Redundancies
        - Are there any points of failure, besides power and external networks, that can impact uptime? (Not everything is N+1.)
        - What are my redundancy paths?
        - Are the relationships & dependencies among critical assets clearly defined?
        - Can I do an impact analysis on the outage/downtime of any equipment? Can I predict the cascading effect of such an outage on other assets/applications in the data center?
        Preventing Failures
        - Can any failure be predicted so that proactive measures can be taken? Do I get alerts on threshold breaches so that I can take preventive action before a failure happens?
        - Is there a history of Move-Add-Change (MAC) that I should be aware of?
        - What is the impact of a MAC on space, power and cooling?
        - Where can new devices/servers best be placed (floor -> rack -> cage)? How can this be determined from the current infrastructure and other dependencies so as to avoid a failure?
        - How do I prevent a fatal human error?
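The impact-analysis question ("Can I predict the cascading effect of such an outage?") amounts to a traversal of an asset dependency graph. The sketch below is a hypothetical illustration, not GreenField's implementation: the asset names are invented, and it deliberately ignores redundancy, so it reports the worst case in which any failed upstream asset takes all of its dependents down.

```python
from collections import deque

# Hypothetical dependency map: each asset lists the assets that depend on it.
DEPENDENTS = {
    "utility-feed-A": ["ups-1"],
    "ups-1": ["pdu-rack-12"],
    "pdu-rack-12": ["server-web-01", "tor-switch-12"],
    "tor-switch-12": ["server-db-01"],
    "server-web-01": ["app-ecommerce"],
    "server-db-01": ["app-ecommerce"],
}


def impacted_by(failed_asset: str, dependents: dict = DEPENDENTS) -> set:
    """Return every asset downstream of failed_asset (breadth-first search).

    Worst case, no redundancy: a dependent counts as impacted if ANY
    of its upstream assets fails.
    """
    seen = set()
    queue = deque([failed_asset])
    while queue:
        for child in dependents.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


if __name__ == "__main__":
    print(sorted(impacted_by("ups-1")))
```

Here `impacted_by("ups-1")` returns the PDU, both servers, the top-of-rack switch and `app-ecommerce`. A redundancy-aware analysis would additionally need to record which upstream dependencies are alternatives (either one suffices) rather than requirements.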
    • Operating Challenges
    • The High Availability Challenge
        Asset Over Provisioning:
        - Too many assets; two classes of assets
        - Absence of a software portfolio (even if hardware assets are tracked)
        - Move-Add-Change: decisions not based on simulations or analysis
        - Absence of change management
        - Absence of workflow approvals
        - Unable to predict failures
        - No chain of custody
        Lack of HA Management Tool:
        - IT assets tracked by a systems management tool
        - Facilities assets tracked by the BMS
        - The two are not interoperable: unable to determine the missing link for HA
        - Unable to track redundancy paths
        - HA fails if any equipment or software in the critical path fails
        - HA fails if there is a fatal human error
        - Health and history of equipment, or previous MAC impact, not tracked
        Need to Predict Failures
    • Beyond HA: Infrastructure & Operational Challenges
        Energy Problems:
        - Higher power consumption & growing power bills
        - Not monitoring power use at device level
        - Dissipation of enormous heat
        - Creation of hot spots
        - Drastic reduction in the expected life of computing equipment
        - Data center failures
        - Increase in CO2 emissions
        Operational Problems:
        - Only low-level asset tracking
        - Under-utilization of many computing resources
        - Running old, inefficient equipment
        - Decisions not based on analysis
        - Cooling not optimized
        - Floor & rack space: non-optimal placement of equipment
        - Increasing demand for rack space
        - Absence of capacity planning
        Need to Improve Energy & Operational Efficiencies
    • A Solution
    • Solution That Bridges the Gap Between IT & Facilities
        [Diagram: Data Center Infrastructure Management (DCIM) software sits between Facilities (Building Management System) and IT (IT System Performance Management).]
    • Solution That Addresses the High Availability Challenge
        Asset Over Provisioning:
        - Tracks and manages both IT and non-IT assets
        - Rationalizes the asset base and identifies assets for retirement, consolidation, replacement & repurposing
        - Tracks and records MAC of assets down to the component level
        - Provides change & workflow management for better manageability, control & chain of custody
        - Monitors performance trends of assets and predicts failures
        Lack of HA Management Tool:
        - A single tool manages both IT & Facilities; a single window helps in better monitoring and management
        - Tracks redundancy paths & identifies single points of failure across the DC ecosystem
        - Does trend analysis on device/application behavior & performance and predicts failures
        - Tracks MAC and prevents disruption due to unauthorized change
        - Change management prevents downtime due to human errors
        DCIM Helps to Predict Failures
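Identifying single points of failure across redundancy paths can be sketched as a node-removal test: an asset is a SPOF for a load if removing it leaves no path from any supply to that load. The topology below is hypothetical (dual utility feeds and dual UPS/PDU paths into a dual-corded rack, behind a single top-of-rack switch), and the test assumes the load stays up as long as any one path survives.

```python
from collections import deque

# Hypothetical redundancy topology: each key feeds the assets in its list.
FEEDS = {
    "utility-A": ["ups-1"],
    "utility-B": ["ups-2"],
    "ups-1": ["pdu-A"],
    "ups-2": ["pdu-B"],
    "pdu-A": ["rack-12"],
    "pdu-B": ["rack-12"],
    "rack-12": ["tor-switch"],
    "tor-switch": ["app"],
}


def reachable(sources, target, feeds, removed=None) -> bool:
    """True if target can be reached from any source while skipping `removed`."""
    queue = deque(s for s in sources if s != removed)
    seen = set(queue)
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in feeds.get(node, []):
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def single_points_of_failure(sources, target, feeds) -> list:
    """Assets whose individual removal cuts the target off from every source."""
    nodes = set(feeds) | {n for kids in feeds.values() for n in kids}
    return sorted(
        n for n in nodes
        if n != target and not reachable(sources, target, feeds, removed=n)
    )


if __name__ == "__main__":
    print(single_points_of_failure(["utility-A", "utility-B"], "app", FEEDS))
```

For this topology the dual power paths drop out and only `rack-12` and `tor-switch` remain as SPOFs. The brute-force check costs one traversal per asset, which is fine for a sketch; for large graphs the same question can be answered with a dominator-tree computation.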
    • Solution That Addresses Infrastructure & Operational Challenges
        Energy Problems:
        - Measures power consumption down to device level
        - Identifies devices in the data center with a lower performance-per-watt rating and recommends improvement methods
        - Optimizes cooling
        - Measures the PUE & DCiE of the data center and identifies inefficiencies
        - Monitors the health of the data center continuously and compares it with global benchmarks
        - Reports CO2 emissions
        Operational Problems:
        - In-depth asset tracking of IT & non-IT assets
        - Identifies under-utilized computing resources and recommends ways of optimization
        - Identifies old equipment and recommends replacement
        - Enables decision making based on data and analysis
        - Optimizes floor and rack space utilization
        - Enables more accurate capacity planning based on real-time data rather than assumptions
        DCIM Improves Energy & Operational Efficiencies
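PUE and DCiE are The Green Grid's efficiency metrics: PUE = total facility power / IT equipment power, and DCiE = IT equipment power / total facility power × 100%, so DCiE is simply 1/PUE expressed as a percentage. The sketch also adds a rough annual CO2 estimate; the emission factor is a hypothetical input, since real factors depend on the local grid, and the meter readings are invented for illustration.

```python
HOURS_PER_YEAR = 365 * 24


def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness = total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def dcie_pct(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Data Center infrastructure Efficiency = IT power / total facility power, in %."""
    if total_facility_kw <= 0:
        raise ValueError("total facility power must be positive")
    return 100 * it_equipment_kw / total_facility_kw


def annual_co2_tonnes(avg_facility_kw: float, kg_co2_per_kwh: float) -> float:
    """CO2 estimate from the average facility load and an assumed grid emission factor."""
    return avg_facility_kw * HOURS_PER_YEAR * kg_co2_per_kwh / 1000


if __name__ == "__main__":
    # Hypothetical meter readings: 1,200 kW at the utility feed, 600 kW at the IT load.
    print(pue(1200, 600), dcie_pct(1200, 600))  # 2.0 50.0
    print(annual_co2_tonnes(1200, kg_co2_per_kwh=0.8))  # emission factor assumed
```

A PUE of 2.0 (DCiE 50%) means that every watt delivered to IT equipment costs a second watt in cooling, power conversion and other overhead, which is the inefficiency these measurements are meant to expose.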
    • Anatomy of a DCIM Software: GFS Crane DC
        Enables a More Efficient, Higher Availability & Greener Data Center
    • Thank You and Q&A
        http://www.greenfieldsoft.com
        Email: sales@greenfieldsoft.com
    • Next Webinar: September 26, 2012 (Wednesday), 11:00 AM IST: Data Center Infrastructure Management: ERP for the Data Center Manager