
Survival guide for a data center power shutdown final



  1. Survival guide for a data center power shutdown

     Imagine you are running a data center (DC) housing critical applications and solutions. Now imagine that you are told that this DC has to sustain an extended power outage. The big question for all involved is, "How much restful sleep can I expect until the DC is back online?"

     What is the impact of an improperly executed DC shutdown?

     Failing to shut down a DC gracefully can cause downtime, lost revenue and significant customer dissatisfaction. It is imperative to avoid DC outages. But if you find yourself in that situation, following these tips might save you a lot of headaches.

     Planning: Strong medication to preserve your sleep cycle

       • Visibility
       • Collaboration
       • Coordination
       • Contingency and business continuity
  2. For the first dose, you absolutely must have a complete understanding of the DC environment and its dependencies. It's tremendously beneficial to have visibility into your physical, virtual, network and storage infrastructure. Ideally, you are using IBM Cloud and Smarter Infrastructure (CSI) monitoring products such as IBM Tivoli Monitoring (ITM), IBM Tivoli Network Manager IP Edition (ITNM-ip) and IBM Tivoli Storage Productivity Center (TPC) to manage your DC.

     Such an arduous and complex task cannot be accomplished by a single individual or team; it requires a major collaborative effort and cross-team coordination to be successful. Knowing the roles and responsibilities of every DC service, application and asset owner will ensure that the data center shutdown and restore go as smoothly as possible.

     Even with the best-laid plans, some systems may not come back up. Having a strong business continuity plan and backups for critical services will go a long way to ensure restful sleep in the nights after the shutdown!

     Execution: Shutting down with grace and style

     If the team has planned and coordinated well, this step should flow easily. Knowing the sequence for shutting down services, servers, storage and network devices is crucial. For example, shutting down the domain name server (DNS) too early can break your monitoring database connectivity before you're ready to shut the database down gracefully!

     The following is the general sequence that the CSI-IT team follows whenever we go through a planned data center outage:

     1. Make sure you have a successful backup of all physical and virtual systems.
     2. For physical systems:
        a. Shut down the applications or services gracefully.
        b. Shut down databases and other middleware.
        c. Shut down the physical systems.
     3. For virtual systems:
        a. Shut down your applications or services.
        b. Shut down databases and other middleware.
        c. Shut down the virtual instances: virtual machines (VMs) or logical partitions (LPARs).
        d. Shut down your Virtual I/O Servers (VIOs), vCenter or kernel-based virtual machine (KVM) host.
        e. Shut down Hardware Management Consoles (HMCs) if in use.
        f. Power down cloud infrastructure racks.
     4. Shut down storage infrastructure.
  3. 5. Shut down network services infrastructure.

     Power's back up; is the data center?

     Just like the shutdown of the DC, powering it back up must be systematic and, essentially, the reverse of the shutdown sequence. This is where the value of CSI monitoring products shines through: they provide insight into service, server, appliance and device state, which helps expedite the restoration of services. A key caveat when powering racks back online is to pace yourself so that you do not trip a power breaker and risk an unexpected outage for the DC. It's never fun recovering from the chaos resulting from a lack of patience!

     Conclusion

     Given the magnitude and complexity of the CSI data center environments, power outage events require close coordination, communication and flawless execution to be successful. In our own experience with a recent power shutdown, CSI-IT followed these rules of thumb and managed to minimize impact to the DC. We shut down over 4,000 physical and virtual assets and experienced a failure rate of less than 0.001 percent!

     How will you avert sleepless nights when dealing with your own data center shutdowns?

     This blog post was a collaborative effort by the CSI-IT Lab and Network Services Organization. Please leave a comment or contact us on Twitter (@ShakeMan_A or @GigaManz) or by email (salmuabe@us.ibm.com, hestrada@us.ibm.com or mfarid@us.ibm.com).
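
The shutdown sequence in this post, its reversal at power-up, and the advice to pace rack power-on could be captured in a small runbook helper. The following is only a minimal sketch, not the CSI-IT team's tooling: the phase labels, function names and rack names are hypothetical, the physical and virtual tracks are merged into shared phases for brevity, and the batch size is an assumption you would tune to your own breaker capacity.

```python
from typing import List

# Phase labels are hypothetical names for the steps listed in the post;
# they are not identifiers from any monitoring product.
SHUTDOWN_PHASES: List[str] = [
    "verify_backups",
    "applications_and_services",
    "databases_and_middleware",
    "physical_and_virtual_systems",
    "hypervisor_hosts_and_consoles",
    "cloud_infrastructure_racks",
    "storage_infrastructure",
    "network_services",
]


def shutdown_order() -> List[str]:
    """Return the phases in the order they should be shut down."""
    return list(SHUTDOWN_PHASES)


def power_up_order() -> List[str]:
    """Return the restore order: the shutdown phases reversed.

    The backup check is dropped because it has no power-up counterpart.
    """
    return [p for p in reversed(SHUTDOWN_PHASES) if p != "verify_backups"]


def power_on_batches(racks: List[str], batch_size: int) -> List[List[str]]:
    """Split racks into small batches so they are not all energized at once."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [racks[i:i + batch_size] for i in range(0, len(racks), batch_size)]


if __name__ == "__main__":
    print("Shutdown:", " -> ".join(shutdown_order()))
    print("Power-up:", " -> ".join(power_up_order()))
    racks = ["rack-1", "rack-2", "rack-3", "rack-4", "rack-5"]
    for batch in power_on_batches(racks, 2):
        # Assumption: an operator confirms breaker load before the next batch.
        print("Power on:", batch)
```

Keeping the order in one list means the power-up plan is derived from the shutdown plan rather than maintained separately, so the two cannot drift apart.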
