Top Three Root Causes of Data Center Outages

UPS failure, accidents and human error, and cooling system failure are major causes of data center outages. DCIM tools can help detect issues that lead to downtime through constant monitoring.

The thought of unplanned downtime strikes fear into the heart of every data center operator. The
most recent Ponemon Institute “Cost of Data Center Outages” report, from January 2016, pegged the
average cost of downtime at nearly $8,000 per minute. The costliest outage in the report came to
nearly $2.5 million.
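
To put the per-minute figure in perspective, here is a minimal Python sketch of the arithmetic. It assumes a flat rate of $8,000 per minute, rounded from the report's "nearly $8,000" average; the function name and the sample durations are hypothetical, and real costs vary widely by site and industry.

```python
# Rough downtime cost estimate, for illustration only.
# Assumption: a flat per-minute rate rounded from the average quoted above.
AVG_COST_PER_MINUTE_USD = 8_000


def estimate_outage_cost(minutes: float,
                         cost_per_minute: float = AVG_COST_PER_MINUTE_USD) -> float:
    """Return an estimated outage cost in US dollars (duration x rate)."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


if __name__ == "__main__":
    # Hypothetical outage durations in minutes.
    for minutes in (15, 60, 90):
        print(f"{minutes:>3} min -> ${estimate_outage_cost(minutes):,.0f}")
```

Even at this rounded average rate, a one-hour outage lands in the hundreds of thousands of dollars.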
Clearly, minimizing the risk of downtime is a high priority for data center operators. The good news is
that the root cause of most outages can be traced to just a handful of problems. What’s more, human
error and data center infrastructure equipment failure are far more likely to cause outages than IT
equipment downtime.
- UPS failure. One-fourth of all outages are caused by UPS system failure, according to the Ponemon
report. UPS devices are indispensable to data center operations, but they’re often forgotten once
they’ve been installed. Battery failure is the chief cause of UPS problems, and rising data center heat
loads can reduce battery life substantially.
- Accident / human error. Human beings are the root cause of 22 percent of outages, either through
accident or negligence. Automation and artificial intelligence can help by eliminating many repetitive
tasks, but there’s no substitute for training and accountability.
- Cooling system failure. Although the share of outages attributable to cooling system failure has
decreased, from 15 percent in 2010 to 11 percent in 2016, the cost of such outages has increased
more than 20 percent over the same time period. Increasing data center heat loads have made
cooling system failure a more significant threat.
Armed with this knowledge, data center operators are in a better position to develop policies and
procedures that reduce risk. Standard operating procedures (SOPs), methods of procedure (MOPs)
and site configuration policies (SCPs) should focus on the most critical workloads and the most likely
causes of an outage. They should be reviewed and updated regularly, ideally by incorporating them
into day-to-day operations.
Emergency operating procedures (EOPs) should also be developed, tested and practiced. If staff can
respond quickly and appropriately to an incident, they can often prevent it from becoming a full-scale
outage.
Data center infrastructure management (DCIM) tools can help IT teams detect issues that could lead
to downtime by monitoring the health of various systems and presenting the data in easy-to-read
dashboards. Best-in-class DCIM tools also provide asset management, capacity management and
energy management capabilities, and can present a virtual 3-D view of the data center, including room
layouts, rack diagrams and cabling. This information can help IT teams assess the impact of changes
and fine-tune policies and procedures accordingly.
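
The kind of health monitoring described above can be sketched as a simple threshold check. This is an illustrative Python example, not the interface of any particular DCIM product: the `Reading` type, the metric names and the limits are all hypothetical, and real limits should come from equipment vendors and site policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    sensor: str   # hypothetical sensor identifier, e.g. "ups-1"
    metric: str   # hypothetical metric name, e.g. "battery_temp_c"
    value: float


# Hypothetical upper limits, chosen only for illustration.
THRESHOLDS = {
    "battery_temp_c": 30.0,  # UPS battery temperature
    "inlet_temp_c": 27.0,    # rack inlet air temperature
    "ups_load_pct": 80.0,    # UPS load as a percentage of capacity
}


def find_alerts(readings, thresholds=THRESHOLDS):
    """Return a message for every reading that exceeds its upper limit."""
    alerts = []
    for r in readings:
        limit = thresholds.get(r.metric)
        # Metrics without a configured limit are skipped, not flagged.
        if limit is not None and r.value > limit:
            alerts.append(f"{r.sensor}: {r.metric}={r.value} exceeds {limit}")
    return alerts


if __name__ == "__main__":
    sample = [
        Reading("ups-1", "battery_temp_c", 35.0),
        Reading("rack-4", "inlet_temp_c", 24.0),
    ]
    for message in find_alerts(sample):
        print(message)
```

In this sketch only the UPS battery reading is flagged, since it is the only sample value above its limit. A production tool would add trending, alert routing and dashboards on top of checks like this.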
Organizations should also refresh data center infrastructure components regularly to not only reduce

risk but increase efficiency. For example, UPS systems should be replaced every five to eight years to
ensure seamless operation, but organizations may want to upgrade more frequently to take advantage
of today’s compact, energy-efficient and feature-rich units.
Data center outages take a financial toll through business disruption, lost revenue and reduced
productivity. The trickle-down effects of brand damage and missed opportunities can haunt
organizations for years to come. However, the right policies, procedures and infrastructure
components can help reduce the frequency, duration and cost of downtime.
Contact Enconnex for help in optimizing your data center infrastructure to maximize availability.
