Continuous Monitoring and Faster Service Restoration (CM and FSR)
How do we quickly restore our services after an incident or outage?
The Problem: The larger portfolio of applications and services consists mostly of third-party, off-the-shelf solutions. Because these solutions were designed and deployed in a wide variety of ways over time, stop-start procedures vary widely. There are often multiple upstream and downstream dependencies to be met for restarts. Traditionally, much of the stop-start or restart of applications is conducted manually or in a semi-automated fashion (within an app), which requires an ops engineer to log in to multiple systems to restore full service of a given application. This leads to applications being unavailable to the business for a prolonged time during a major incident.
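As an illustration of the dependency problem described above, here is a minimal sketch of deriving a safe restart order from declared dependencies. The service names and the `DEPENDS_ON` map are hypothetical, not taken from the EBS portfolio, and the sketch uses Python's standard `graphlib` module rather than any tooling named in this deck.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map (an assumption for illustration): each service
# lists the upstream services that must be running before it can start.
DEPENDS_ON = {
    "database": [],
    "message_queue": [],
    "app_server": ["database", "message_queue"],
    "web_frontend": ["app_server"],
}


def start_order(depends_on):
    """Return services ordered so that every dependency starts first."""
    return list(TopologicalSorter(depends_on).static_order())


def stop_order(depends_on):
    """Return services ordered so that dependents stop before dependencies."""
    return list(reversed(start_order(depends_on)))


if __name__ == "__main__":
    print("start:", start_order(DEPENDS_ON))
    print("stop: ", stop_order(DEPENDS_ON))
```

Declaring dependencies as data, rather than encoding them in per-application runbooks, lets one routine compute the order for every application; a circular dependency raises `graphlib.CycleError` instead of failing silently partway through a restart.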
Given the vast heterogeneity of the EBS portfolio of applications, it is important to provide a stable, consolidated solution for auto-restart. EBS App Ops has embarked on an initiative (EBS Faster Service Restoration, FSR) to improve restoration of its applications through automation. At this point the focus is on reducing time to recover, automating dependency management, and eliminating human errors rather than on self-healing (i.e. crawl, walk, run). The primary objectives are to achieve an RTO < 15 mins or to reduce the current time to restore by at least 80%. This standard framework needs to be adoptable across a variety of applications, be secure and compliant, and provide verification of the availability of capabilities, integrated into a dashboard.
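The two primary objectives above can be read as a simple check per application. The following is a rough sketch under that reading; the function name, constant names, and sample timings are assumptions for illustration and are not measurements from the initiative.

```python
# Targets as stated in the objectives: RTO under 15 minutes, or at least an
# 80% reduction in the current time to restore.
RTO_TARGET_MIN = 15.0
MIN_REDUCTION = 0.80


def meets_objective(baseline_min, restored_min):
    """Return True if a restore time satisfies either stated objective.

    baseline_min: current (manual) time to restore, in minutes.
    restored_min: time to restore with automation, in minutes.
    """
    if baseline_min <= 0:
        raise ValueError("baseline_min must be positive")
    reduction = (baseline_min - restored_min) / baseline_min
    return restored_min < RTO_TARGET_MIN or reduction >= MIN_REDUCTION


if __name__ == "__main__":
    # Hypothetical timings: 120 min manual restore reduced to 20 min.
    print(meets_objective(120, 20))
```

With these hypothetical numbers, a restore that drops from 120 to 20 minutes misses the 15-minute RTO but is a reduction of about 83%, so it meets the second objective.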