Continuity and Resilience (CORE)
ISO 22301 BCM Consulting Firm
Presentations by our partners and
extended team of industry experts
Our Contact Details:
INDIA
Continuity and Resilience
Level 15, Eros Corporate Tower
Nehru Place, New Delhi-110019
Tel: +91 11 41055534 / +91 11 41613033
Fax: +91 11 41055535
Email: neha@continuityandresilience.com
UAE
Continuity and Resilience
P. O. Box 127557
Abu Dhabi, United Arab Emirates
Mobile: +971 50 8460530
Tel: +971 2 8152831
Fax: +971 2 8152888
Email: info@continuityandresilience.com
HA & DR Design Concepts
S Seshadri
Head – IT DR & Service Management
Continuity and Resilience
10th Feb, 2014
Dubai
Outage Categorization
• Service failures that should not, or need not, be visible to end users
need ‘fault protection’ – such services continue to operate
despite failure scenarios
• Short interruptions (within a few hours) are referred to as
‘minor outages’
• Longer interruptions, where end users’ business services are
delayed for longer durations, are termed disaster situations
or ‘major outages’
Key Questions
1. Which systems should ‘never’ fail? We may need Fault Tolerant
systems in their place.
2. What failures should be handled transparently, where an outage
must not occur? Against such failures we need fault protection.
3. How long may a short-term interruption be that happens once a day,
once a week, or once a month? Such interruptions are called minor
outages.
4. How long may a long-term interruption be that happens very seldom
and is related to serious damage to the IT system? For instance, when
will this cause a big business impact, also called a major outage or
disaster?
5. How much data may be lost during a major outage, and in which state
– persistent or ephemeral?
6. What failures are deemed so improbable that they will not be
handled, or what failures are beyond the scope of a project?
Business Issues & Cost of IT Outage
• IT Fault Protection has to be driven by business
considerations
• Business Continuity is the overall goal
• Business imperatives manifest through BIA/RA and
MTPoD/RTO/RPO
• IT Outage is not the real issue, but the business
consequences are
• IT Outage affects revenues & costs adversely
• Direct Costs – repairs, penalties, lost revenue
• Indirect Costs – lost & additional work hours
Cost Vs Benefit
• IT Recovery has extensive cost implications – both in terms of
Capex and Opex
• Strategies developed should be cost effective
• ‘Technology for the sake of Technology’ approach should be
completely avoided
• Strategies should, as far as possible, be able to address
disruptions and impacts collectively
• Organizational objectives and risk appetite should direct
recovery strategies
• Legal, contractual and regulatory aspects play a major role
(SOX, SAS 70, Basel II/III, …)
IT Service Outage
• Importance of IT Services depends on
– Business relevance
– Revenues
– Functionality that they enable
– Amount of damage due to the outage
– Any regulatory aspect that demands the service
• Outage Categorization is dictated by the importance of the
service and hence the significance of its failure
High Availability
• High availability is the characteristic of a system to protect
against or recover from minor outages in a short time frame
with largely automated means.
• HA has 3 essential features
– Outage categorization is ‘minor’ – we need to envisage
potential failure scenarios for the service and the minor
outage requirements for them (robustness)
– System category covers Mission Critical, Business
Important and Business Foundation processes, which need
to be recovered within a very short time (RTO/RPO)
– Component-level (SPoF) protection, which facilitates
automatic recovery (redundancy)
• HA features are normally built within the primary data center
and data replication is synchronous
Continuous Availability
• Continuous Availability is the highest level of High Availability,
wherein every component failure is protected against and no ‘after
failure recovery’ takes place
• Such systems are known as Fault Tolerant systems; they provide automatic,
high-speed ‘failover’ in the case of h/w or s/w failures
• They have an ‘internal multi-computer systems architecture’ that has
no shared central components, including memory
• Tandem’s ‘non-stop’ systems and Stratus’s fault tolerant computers
are examples of this
• These are used by the leading stock exchanges globally (NSE in India
uses Stratus and BSE, Tandem), and by banks for their ATM related
transaction processing
• These systems scale extremely well to the largest commercial
workloads
• Comparable fault tolerant designs are used in avionics, e.g. the
redundant on-board flight control computers of the Airbus A-320
HA Components
Essential ingredients of High Availability are:
• Availability
• Reliability
• Serviceability
We will discuss the above three in the following
slides.
Availability & Metrics
• Availability – how long a service or system component is
available for use, and the features that help the system stay
operational despite failures, e.g. NIC, mirrored disks,
redundant power supply
• Availability = uptime / (uptime + downtime)
• Downtime includes scheduled downtime
• Elapsed time can be measured as wall clock time
• Availability can be expressed in absolute numbers (79 hrs out
of 80 hrs) or as a percentage (98.75%)
• Availability = MTBF / (MTBF + MTTR)
– MTBF: Mean Time Between Failures
– MTTR: Mean Time To Repair
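The two availability formulas above can be checked with a few lines of Python. This is a minimal sketch; the function names and the 8,760-hour year are illustrative assumptions, not part of the original slides.

```python
def availability_from_times(uptime_hours: float, downtime_hours: float) -> float:
    """Availability = uptime / (uptime + downtime)."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total elapsed time must be positive")
    return uptime_hours / total


def availability_from_mtbf(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability = MTBF / (MTBF + MTTR): the same ratio, using mean values."""
    return availability_from_times(mtbf_hours, mttr_hours)


def yearly_downtime_hours(availability: float, hours_per_year: float = 8760.0) -> float:
    """Expected downtime per year implied by an availability figure."""
    return (1.0 - availability) * hours_per_year


if __name__ == "__main__":
    # 79 hours up out of 80 elapsed hours
    print(f"{availability_from_times(79, 1):.2%}")
    # A component with an MTBF of 300,000 hours that takes 4 hours to repair
    print(f"{availability_from_mtbf(300_000, 4):.6f}")
    # Downtime budget per year at 99% availability
    print(f"{yearly_downtime_hours(0.99):.1f} hours")
```

Note that scheduled downtime counts as downtime here, in line with the bullet above.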
Reliability & Metrics
• Reliability is a measure of ‘fault avoidance’
• Refers to the ‘probability that a system will be available over a
time interval T’
• MTBF is a measure of Reliability
• Annual Failure Rate (AFR) is the inverse of MTBF expressed in years
(AFR = 8,760 / MTBF in hours)
• Reliability features help to ‘prevent’ and ‘detect’ failures
• Hardware reliability has improved tremendously over the last 30
years, and components are highly resilient nowadays
Component       MTBF (hours)   MTBF (years)   AFR (per year)
Disk Drive      300,000        34.2           0.0292
Power Supply    150,000        17.1           0.0584
Fan             250,000        28.5           0.0350
NIC             200,000        22.8           0.0438
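The relationship between MTBF and AFR in the table can be reproduced as follows. This is a small sketch that assumes a year of 8,760 hours (365 days, leap years ignored); the function names are illustrative.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hours; an assumption, leap years ignored

# MTBF figures in hours, taken from the table above
COMPONENT_MTBF_HOURS = {
    "Disk Drive": 300_000,
    "Power Supply": 150_000,
    "Fan": 250_000,
    "NIC": 200_000,
}


def mtbf_years(mtbf_hours: float) -> float:
    """Convert an MTBF given in hours to years."""
    return mtbf_hours / HOURS_PER_YEAR


def annual_failure_rate(mtbf_hours: float) -> float:
    """AFR is the inverse of the MTBF expressed in years."""
    return HOURS_PER_YEAR / mtbf_hours


if __name__ == "__main__":
    for name, hours in COMPONENT_MTBF_HOURS.items():
        print(f"{name:<13} {mtbf_years(hours):5.1f} years  AFR {annual_failure_rate(hours):.4f}")
```

An AFR of 0.0292 means that, in a population of 1,000 such disk drives, roughly 29 failures per year are to be expected.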
Serviceability
• A measure of how easily and quickly a system can be
serviced and repaired
• The lower the planned service time, the higher the
availability
• Planned serviceability goes into the architecture as a
design objective
• Actual service time should be lower than the planned
service time
• These clauses have to be carefully built into the
Service Level Agreements with IT vendors
• Murphy’s Law: Anything that can possibly go wrong,
does
HA/DR Strategy - Aspects
• Data – what is the architecture concerned with
• Function – how is the data worked with
• Location – where is the data worked with
• People – who works with the data and achieves the
functionality
• Time – when is the data processed
Each of the above aspects is run through 3 levels of abstraction
• Objectives – What will this achieve vis-à-vis organizational objectives
• Conceptual Model – Realization of the objectives on a
business process level
• System Model – Logical data model and the application
functions that must be implemented to realize the business
concepts
HA/DR Framework (Zachman)
Data (What)
– Objectives: Business Continuity / IT Service Continuity
– Conceptual Model: Availability of mission-critical and important business services
– System Model: ICT categories, dependency diagrams
Function (How)
– Objectives: Map biz processes to IT services, RTO, RPO, SLA
– Conceptual Model: ITIL processes, IT processes, projects
– System Model: Design patterns – RAS, redundancy, backup, replication, virtualization
Location (Where)
– Objectives: Internal (IT), Outsourced
– Conceptual Model: Data Center, Disaster Recovery Center
– System Model: All systems, all categories
People (Who)
– Objectives: Biz process owner
– Conceptual Model: CIO/IT dept
– System Model: IT PM, Architect, System Engineers, System Administrators
Time (When)
– Objectives: Implementation Plan
– Conceptual Model: Outage scenarios, categories
– System Model: Failure/Change/Incident/Problem/Disaster
HA/DR System Design
• System Model discussed earlier is the core of this activity
• ‘What’ and ‘How’ of the System Model will lay the foundation
for HA/DR System Design
• Protection against outages of computers, systems and
databases is in scope for HA
• Protection against infrastructure/building/city outages and
user/administrative errors is in scope for DR
• Sound processes, solid architecture, careful engineering and
an eye for detail are the hallmarks of a good HA/DR system
design
HA/DR Touch Points
• User Environment
• Administration Environment
• Application
• Middleware
• Network Infrastructure
• Operating System
• Hardware (Servers, Storage, Backups etc)
• Physical Environment (Power, Fire, Floods etc)
HA/DR Scoping
• Take into account regulatory aspects (SOX, SAS 70, Basel II)
• Identify the key applications (from business BIAs)
• Check out the various ICT environments required by these
applications (IT BIA)
• Identify the dependencies
• Carefully identify and document the component categories
that are not required – scope exclusions
• Prepare preliminary system scope – list of component
categories required for HA/DR
• Identify failure scenarios for each of these component
categories
• Document the failure scenarios that are outside the scope
• The component categories and the failure scenarios will
constitute the scope of HA/DR
Redundancy & Replication
• Redundancy is the ability to continue operations in the case of
component failures
• Recovery is done through ‘managed component repetition’
• Eliminating ‘single points of failure’ is the goal
• Just adding a second component is not enough
• Replicated component has to be ‘managed’ to take over in
case the original component fails (failover)
• This ‘management’ can be automated or manual
• Replication of the ‘state’ of the component is crucial
• Replication may be a duplicate part, an alternate system (HA)
or an alternate location (DR)
• 100% redundancy through replication is very expensive and
difficult to achieve
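The effect of redundancy on availability can be estimated with a simple model. The sketch below assumes that component failures are independent and that failover is instantaneous and always succeeds; neither assumption comes from the slides, and real systems fall short of both. The function names are illustrative.

```python
from math import prod
from typing import Iterable


def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` redundant components where one survivor is enough.

    Assumes independent failures and perfect failover (idealized).
    """
    if copies < 1:
        raise ValueError("at least one component is required")
    return 1.0 - (1.0 - component_availability) ** copies


def serial_availability(availabilities: Iterable[float]) -> float:
    """Availability of a chain in which every component must be up."""
    return prod(availabilities)


if __name__ == "__main__":
    for copies in (1, 2, 3):
        print(copies, f"{parallel_availability(0.99, copies):.6f}")
    # A server, a switch and a storage array that must all be available
    print(f"{serial_availability([0.99, 0.999, 0.995]):.4f}")
```

Each additional copy removes a smaller slice of the remaining downtime while adding a full component's cost, which is one way to see why 100% redundancy is so expensive.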
Data Replication
• Redundancy for Disk Drives means ‘data replication’ and is hence very
crucial
• Redundant disks provide multiple storage of data and/or OS
• Data disks carry one of the highest risks
• OS disks usually house the root file system and swap space
• Data Replication can be ‘synchronous’ or ‘asynchronous’
• RPO considerations should dictate data replication approach
• For very low or nil RPO, latency in data replication may not be
tolerated (synchronous vs asynchronous)
• Bandwidth considerations also impact replication
• Data Deduplication technology, along with data compression, has
in recent times removed many of the headaches involved with
data replication
• Two main types of data replication
– Host based
– Storage based
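How RPO, latency and bandwidth interact when choosing between synchronous and asynchronous replication can be sketched as a simple decision rule. The rule, the `ReplicationLink` type and its fields are hypothetical illustrations, not a method from the slides or from any product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationLink:
    """Hypothetical description of the link between primary and replica."""
    round_trip_ms: float      # latency added to every synchronous write
    async_lag_seconds: float  # typical delay before an async write reaches the replica


def choose_replication_mode(rpo_seconds: float,
                            link: ReplicationLink,
                            max_write_penalty_ms: float) -> str:
    """Return 'synchronous', 'asynchronous' or 'neither' for a given RPO.

    Simplified rule of thumb: prefer asynchronous replication when its lag
    fits within the RPO; fall back to synchronous replication when the
    application can tolerate the extra write latency.
    """
    sync_is_affordable = link.round_trip_ms <= max_write_penalty_ms
    if rpo_seconds <= 0:
        # Nil RPO: a write may only be acknowledged once the replica has it.
        return "synchronous" if sync_is_affordable else "neither"
    if link.async_lag_seconds <= rpo_seconds:
        return "asynchronous"
    return "synchronous" if sync_is_affordable else "neither"


if __name__ == "__main__":
    metro = ReplicationLink(round_trip_ms=2.0, async_lag_seconds=30.0)
    print(choose_replication_mode(0, metro, max_write_penalty_ms=5.0))
    print(choose_replication_mode(300, metro, max_write_penalty_ms=5.0))
```

A result of 'neither' signals that the RPO cannot be met with the given link, so either the link or the business requirement has to change.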
Virtualization
• Virtualization, as a concept, was demonstrated in the 1960s,
when IBM’s Thomas J Watson Research Center simulated
‘multiple pseudo machines’ on a single IBM 7044 mainframe (the M44/44X project)
• Virtualization allows multiple operating system (OS) instances
to run concurrently on a single computer.
• It is a means of separating hardware from a single OS, by
“inserting an abstraction layer” into the software stack.
• Each ‘Guest’ OS is managed by a Virtual Machine Monitor.
• Virtualization Software can also collect a number of separate
resources and “pool” them, even if the devices or resources
remain in separate physical locations.
• The end goal is sharing the resources and capabilities flexibly,
under software control.
• The part of the virtualization package that enables interaction
with and control of the VMs is referred to as the Virtual Machine
Monitor (VMM) or Hypervisor software.
Virtualization of Resources
• Virtualized resources are supplied in logical units to application programs,
freeing them from reliance on specific hardware
• Virtualization of Servers allows a business to consolidate the workloads
running on multiple servers onto just a few
• Storage Virtualization hides the physical storage from applications on host
systems and presents a simplified (logical) view, allowing them to
reference the storage resource by its common name even though the
actual storage could be on a complex, multilayered, multipath storage
network.
• RAID is an early example of storage virtualization.
• Virtual CPU is one of the oldest concepts, which has enabled
multiprocessing capability, handled by OS
• Virtual Memory is as old as Virtual CPU – again handled by the OS as part
of Virtual Memory Management
• Working within a virtualized environment may add some options and new
flexibility to your HA and DR plans.
Storage Virtualization
• With regard to storage, the objective is to bring together multiple
storage devices under unified command, whether they are from the
same manufacturer or not, and without regard for their physical
locations.
• Once accomplished, the now-unified band of storage systems can
be treated as a single, huge storage capacity that can be
provisioned, managed, backed up to tape, and even replicated to
offsite disaster recovery (DR) or high availability (HA) sites, with
greater visibility, synchronized automation, and reduced
management labour.
• Even archiving, multi-level storage, and information lifecycle
management (ILM) efforts can be made simpler, with older, slower,
or cheaper storage units provisioned to handle the near-line or
archival storage while newer, faster devices handle the current
production processes.
Host Clustering
• Host clustering increases availability through redundancy at the
host level: several hosts together supply a set of
services, and no service is strictly tied to a
specific computer
• Host Clustering addresses
– Hardware errors
– OS errors
– Application errors
• Failover clusters allow a service to migrate from one
host to another in the case of an error. They are the most
used technology for high availability.
• Load-balancing clusters run a service on multiple hosts
from the start and handle outages of a host – more relevant
for performance than HA.
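The failover idea described above, where a service is not tied to one host and migrates when its host fails, can be illustrated with a toy model. This is a deliberately simplified sketch: the `FailoverCluster` class is hypothetical, and it leaves out everything a real cluster manager must handle (heartbeats, fencing, shared storage, split-brain protection).

```python
class FailoverCluster:
    """Toy model: services run on one host and move to a healthy host on failure."""

    def __init__(self, hosts):
        if not hosts:
            raise ValueError("a cluster needs at least one host")
        self.healthy = {host: True for host in hosts}
        self.placement = {}  # service name -> host currently running it

    def start(self, service, host):
        """Start a service on a specific healthy host."""
        if not self.healthy.get(host, False):
            raise ValueError(f"{host} is unknown or not healthy")
        self.placement[service] = host

    def _pick_healthy_host(self):
        for host, is_healthy in self.healthy.items():
            if is_healthy:
                return host
        raise RuntimeError("no healthy host left; this is no longer a minor outage")

    def report_failure(self, failed_host):
        """Mark a host as failed and migrate its services; return the moves made."""
        if failed_host not in self.healthy:
            raise KeyError(failed_host)
        self.healthy[failed_host] = False
        moved = {}
        for service, host in self.placement.items():
            if host == failed_host:
                moved[service] = self._pick_healthy_host()
        self.placement.update(moved)
        return moved


if __name__ == "__main__":
    cluster = FailoverCluster(["node-a", "node-b"])
    cluster.start("database", "node-a")
    print(cluster.report_failure("node-a"))
```

When the last healthy host fails, the model raises an error rather than placing the service, which mirrors the point at which HA ends and DR begins.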
Middleware
• Generally considered to be the layer between the OS and the
applications
• They are independent of applications but carry application-specific
configuration and are used by multiple applications
• Database Servers, Web Servers, Application Servers,
Messaging Servers are some examples
• HA for these will include product specific clustering, data
replication, and even session state replication
• A properly configured failover cluster, sufficiently integrated
with the DB Server, provides HA
• Redo log file shipping (asynchronous) with commits delayed
by the RPO will provide the best DR
• HA for Web Servers and Messaging Servers is achieved
mostly through Load-balancing Clusters (stateless)
HA for Applications
• Application HA is the eventual goal
• Application categories – Off the Shelf, Bought & Customized,
In-house Built
• Failover cluster is an approach most commonly adopted for all
categories of applications
• Applications touch the nerve center of all the following
systems:
– Development
– Acceptance/Integration Test
– Staging & Release
– Production
– Disaster Recovery
• Suitable precautions must be taken during the coding and testing
stages to ensure HA
Networks
• Network is the backbone of ICT as it provides the linkages and
ability to communicate between component categories
• Various types of networks are
– LAN, VLAN, MAN, WAN, VPN, Intranet, Extranet, Internet
• And there are n/w components that help build and run the
networks – NIC, switches, routers, hubs, firewalls etc.
• Connectivity is the most important element of networks
• Data management on the network is done through encoding, data
compression & encryption/decryption
• Power supply, Heating, Ventilating & Air Conditioning (HVAC) are
two other important considerations
• It is essential to provide redundancy at both the network and the
component level for network HA
• Generally, there is no payload-based state for any of these – hence
two or more devices would ensure HA
Data Backup and Restoration
• A major requisite for HA & DR
• Management of backed up data is equally important
• Restoration of data must work effectively
• Automated mechanisms exist
• System/file/database backups are the key
• Full or incremental backup
• Consistency of the data state is crucial
• Checkpoint functionality is useful in this context
• Storage and handling of backup media is very significant
• Remote (including at the DR site) storage of backups including
Tape Vaulting should be institutionalized
• Testing/recycling and proper maintenance of backup media
• Backup on failover clusters should distinguish between
physical and logical hosts in the cluster
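The difference between full and incremental backups matters most at restoration time: a restore needs the latest full backup plus every incremental taken after it. The sketch below shows that selection, assuming the usual scheme in which each incremental holds the changes since the previous backup of any kind. The `Backup` type and `restore_chain` function are illustrative names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Backup:
    taken_at: int  # e.g. hours since the start of the backup schedule
    kind: str      # "full" or "incremental"


def restore_chain(backups: List[Backup], restore_to: int) -> List[Backup]:
    """Return the backups to apply, in order, to restore the state at `restore_to`."""
    candidates = sorted(
        (b for b in backups if b.taken_at <= restore_to),
        key=lambda b: b.taken_at,
    )
    full_positions = [i for i, b in enumerate(candidates) if b.kind == "full"]
    if not full_positions:
        raise ValueError("no full backup exists at or before the requested time")
    # Everything after the most recent full backup is an incremental by construction.
    return candidates[full_positions[-1]:]


if __name__ == "__main__":
    schedule = [
        Backup(0, "full"), Backup(24, "incremental"), Backup(48, "incremental"),
        Backup(168, "full"), Backup(192, "incremental"),
    ]
    for backup in restore_chain(schedule, restore_to=200):
        print(backup)
```

If any backup in the returned chain is unreadable, the restore fails from that point on, which is why testing and proper maintenance of backup media is listed as a requirement.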
HA & DR – Positioning
• HA and DR are two sides of the same coin
• Redundancy, Replication and Robustness are the key
characteristics of both HA & DR
• HA focuses on fault protection and is built on mostly
automated recovery techniques for minor outages
• HA is not built for environmental disasters like floods, fire and
earthquakes, or for manmade incidents like terrorist attacks and
human errors of huge magnitude
• These additional scenarios and major outages lead to the
need for DR, which focuses only on recovery
• DR is also associated with a large part of manual recovery in
terms of Emergency Management and Damage Assessment &
Recovery apart from IT Recovery
• When the primary data center is unavailable, migration to DR
site will be the only option
Disaster Recovery
• Disaster recovery is the ability to continue with services in the
case of major outages, often with reduced capabilities or
performance.
• Disaster recovery handles the disaster when either a single
point of failure is the defect or when many components are
damaged and the whole system is rendered non-functional.
• Operations cannot be resumed on the same system or at the
same site. Instead, a replacement or backup system, usually
located at another place is activated and operations continue
from there.
• Disaster recovery often restores only restricted resources and
thus restricted service levels.
• Continuation of service also does not happen instantly, but
will happen after some outage time.
DR in Context
• IT DR is activated when the likely recovery time is above the
least RTO and there is expected data loss
• IT recovery will be limited only by the agreed levels of service
by the business owners
• IT DR activities will be carried out from the DR site, which should
be fully equipped to handle IT services up to agreed levels
• Scaling up the IT services in due course of time will generally
be outside the purview of DR Planning
• Agreed levels of IT services are resumed in the DR site using
the infrastructure and backup data/tapes there
• The roles of primary and DR sites are interchangeable but not
in the strict sense of HA
• In the above scenario, both primary and DR sites will be
functional, even though they may cater to different business
activities/IT services
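The activation criterion in the first bullet can be written down as a simple check. This is only a sketch of that one rule as stated on the slide; the function name and the per-service RTO mapping are hypothetical, and a real invocation decision involves emergency management and damage assessment as well.

```python
from typing import Mapping


def should_activate_dr(estimated_recovery_hours: float,
                       rto_hours_by_service: Mapping[str, float],
                       data_loss_expected: bool) -> bool:
    """Activate DR when recovery would exceed the least RTO and data loss is expected."""
    if not rto_hours_by_service:
        raise ValueError("at least one service RTO is required")
    least_rto = min(rto_hours_by_service.values())
    return estimated_recovery_hours > least_rto and data_loss_expected


if __name__ == "__main__":
    rtos = {"payments": 4.0, "reporting": 24.0}  # illustrative values
    print(should_activate_dr(6.0, rtos, data_loss_expected=True))
    print(should_activate_dr(2.0, rtos, data_loss_expected=True))
```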
DR and the Cloud
• Cloud is the latest buzzword in the outsourced business model
• Leveraging cloud model can optimize DR procedures
• Reduces the high cost of maintaining stand-by sites
• Cloud service providers normally have state of the art systems
and infrastructure, huge bandwidth, exacting security setup,
apart from complying with relevant ISO guidelines and
industry best standards.
• According to a recent Aberdeen study, DR is the leading
‘use case’ for cloud
• The key advantages are recovery times, virtualization and
multi-site availability
• Concerns regarding security, identity and compliance with
various regulations do exist as the cloud model matures
• With data volumes growing at the rate of 10 times every 5
years, cloud computing is likely to see a huge growth
DR in the Supply Chain
• Supply Chain is basically a delineation of dependencies
depicting the various actors in the chain of a product or
service from the vendor until it reaches the consumer
• IT DR dependencies are manifold – internal customers, ICT
equipment, external vendors and service providers, IT staff,
etc.
• DR planning should judiciously take into account the inherent
risks in the supply chain and provision suitable mechanisms to
handle them effectively, so that the DR goal does not derail
• Typically, if Data Center support is outsourced, there is a huge
dependence on the Service Provider – timely availability of
people, spares, replacements etc.
• Supply chain glitches can emerge from something as innocuous as
consumable supplies
Thank You
S Seshadri

Top 10 DB2 Support Nightmares #9
 
Linux Disaster Recovery Solutions
Linux Disaster Recovery SolutionsLinux Disaster Recovery Solutions
Linux Disaster Recovery Solutions
 
A05
A05A05
A05
 
DB2 High Availability für IBM Connections, Sametime oder Traveler
DB2 High Availability für IBM Connections, Sametime oder TravelerDB2 High Availability für IBM Connections, Sametime oder Traveler
DB2 High Availability für IBM Connections, Sametime oder Traveler
 
Design patterns and plan for developing high available azure applications
Design patterns and plan for developing high available azure applicationsDesign patterns and plan for developing high available azure applications
Design patterns and plan for developing high available azure applications
 
High availability solutions bakostech
High availability solutions bakostechHigh availability solutions bakostech
High availability solutions bakostech
 

Similar to HA & DR System Design - Concepts and Solution

Best practices in networks and infrastructure
Best practices in networks and infrastructureBest practices in networks and infrastructure
Best practices in networks and infrastructure
nicholas njoroge
 
MGT3342BUS - Architecting Data Protection with Rubrik - VMworld 2017
MGT3342BUS - Architecting Data Protection with Rubrik - VMworld 2017MGT3342BUS - Architecting Data Protection with Rubrik - VMworld 2017
MGT3342BUS - Architecting Data Protection with Rubrik - VMworld 2017
Andrew Miller
 
Impact 2013 2963 - IBM Business Process Manager Top Practices
Impact 2013 2963 - IBM Business Process Manager Top PracticesImpact 2013 2963 - IBM Business Process Manager Top Practices
Impact 2013 2963 - IBM Business Process Manager Top Practices
Brian Petrini
 
RIMS: Remote Infrastructure Management Services
RIMS: Remote Infrastructure Management Services RIMS: Remote Infrastructure Management Services
RIMS: Remote Infrastructure Management Services
Abhishek Agnihotry
 
NZS-4555 - IT Analytics Keynote - IT Analytics for the Enterprise
NZS-4555 - IT Analytics Keynote - IT Analytics for the EnterpriseNZS-4555 - IT Analytics Keynote - IT Analytics for the Enterprise
NZS-4555 - IT Analytics Keynote - IT Analytics for the Enterprise
IBM z Systems Software - IT Service Management
 
MIRAI - Managing Industry Restructuring and Adoptions Inquisitively
MIRAI - Managing Industry Restructuring and Adoptions InquisitivelyMIRAI - Managing Industry Restructuring and Adoptions Inquisitively
MIRAI - Managing Industry Restructuring and Adoptions Inquisitively
QuEST Forum
 
VMworld 2013: SDDC IT Operations Transformation: Multi-customer Lessons Learned
VMworld 2013: SDDC IT Operations Transformation:  Multi-customer Lessons LearnedVMworld 2013: SDDC IT Operations Transformation:  Multi-customer Lessons Learned
VMworld 2013: SDDC IT Operations Transformation: Multi-customer Lessons Learned
VMworld
 
L10 Architecture Considerations
L10 Architecture ConsiderationsL10 Architecture Considerations
L10 Architecture Considerations
Ólafur Andri Ragnarsson
 
UnitOnePresentationSlides.pptx
UnitOnePresentationSlides.pptxUnitOnePresentationSlides.pptx
UnitOnePresentationSlides.pptx
BLACKSPAROW
 
Troux Presentation Austin Texas
Troux Presentation Austin TexasTroux Presentation Austin Texas
Troux Presentation Austin Texas
JoeFaghani
 
CMGT410 v19Business Requirements TemplateCMGT410 v19Page 2.docx
CMGT410 v19Business Requirements TemplateCMGT410 v19Page 2.docxCMGT410 v19Business Requirements TemplateCMGT410 v19Page 2.docx
CMGT410 v19Business Requirements TemplateCMGT410 v19Page 2.docx
mary772
 
Lessons Learned from AMI Deployments and Asset Management Readiness
Lessons Learned from AMI Deployments and Asset Management ReadinessLessons Learned from AMI Deployments and Asset Management Readiness
Lessons Learned from AMI Deployments and Asset Management Readiness
TESCO - The Eastern Specialty Company
 
What to expect from your IT People
What to expect from your IT PeopleWhat to expect from your IT People
What to expect from your IT People
Jason Caras
 
Building a Business Continuity Capability
Building a Business Continuity CapabilityBuilding a Business Continuity Capability
Building a Business Continuity Capability
Rod Davis
 
DATA CENTER AND BUSINESS COMMUNITY
DATA CENTER AND BUSINESS COMMUNITYDATA CENTER AND BUSINESS COMMUNITY
DATA CENTER AND BUSINESS COMMUNITY
Anil Chaurasiya
 
BiznetGio Presentation Business Continuity
BiznetGio Presentation Business ContinuityBiznetGio Presentation Business Continuity
BiznetGio Presentation Business Continuity
Yusuf Hadiwinata Sutandar
 
How much does it cost to be Secure?
How much does it cost to be Secure?How much does it cost to be Secure?
How much does it cost to be Secure?mbmobile
 
Top Down Network Design - ebrahma.com
Top Down Network Design - ebrahma.comTop Down Network Design - ebrahma.com
Top Down Network Design - ebrahma.com
Pawan Sharma
 
Expectations in DRAAS from CSP
Expectations in DRAAS from CSPExpectations in DRAAS from CSP
Expectations in DRAAS from CSP
Continuity and Resilience
 
Optimizing connected system performance md&m-anaheim-sandhi bhide 02-07-2017
Optimizing connected system performance md&m-anaheim-sandhi bhide 02-07-2017Optimizing connected system performance md&m-anaheim-sandhi bhide 02-07-2017
Optimizing connected system performance md&m-anaheim-sandhi bhide 02-07-2017
sandhibhide
 

Similar to HA & DR System Design - Concepts and Solution (20)

Best practices in networks and infrastructure
Best practices in networks and infrastructureBest practices in networks and infrastructure
Best practices in networks and infrastructure
 
MGT3342BUS - Architecting Data Protection with Rubrik - VMworld 2017
MGT3342BUS - Architecting Data Protection with Rubrik - VMworld 2017MGT3342BUS - Architecting Data Protection with Rubrik - VMworld 2017
MGT3342BUS - Architecting Data Protection with Rubrik - VMworld 2017
 
Impact 2013 2963 - IBM Business Process Manager Top Practices
Impact 2013 2963 - IBM Business Process Manager Top PracticesImpact 2013 2963 - IBM Business Process Manager Top Practices
Impact 2013 2963 - IBM Business Process Manager Top Practices
 
RIMS: Remote Infrastructure Management Services
RIMS: Remote Infrastructure Management Services RIMS: Remote Infrastructure Management Services
RIMS: Remote Infrastructure Management Services
 
NZS-4555 - IT Analytics Keynote - IT Analytics for the Enterprise
NZS-4555 - IT Analytics Keynote - IT Analytics for the EnterpriseNZS-4555 - IT Analytics Keynote - IT Analytics for the Enterprise
NZS-4555 - IT Analytics Keynote - IT Analytics for the Enterprise
 
MIRAI - Managing Industry Restructuring and Adoptions Inquisitively
MIRAI - Managing Industry Restructuring and Adoptions InquisitivelyMIRAI - Managing Industry Restructuring and Adoptions Inquisitively
MIRAI - Managing Industry Restructuring and Adoptions Inquisitively
 
VMworld 2013: SDDC IT Operations Transformation: Multi-customer Lessons Learned
VMworld 2013: SDDC IT Operations Transformation:  Multi-customer Lessons LearnedVMworld 2013: SDDC IT Operations Transformation:  Multi-customer Lessons Learned
VMworld 2013: SDDC IT Operations Transformation: Multi-customer Lessons Learned
 
L10 Architecture Considerations
L10 Architecture ConsiderationsL10 Architecture Considerations
L10 Architecture Considerations
 
UnitOnePresentationSlides.pptx
UnitOnePresentationSlides.pptxUnitOnePresentationSlides.pptx
UnitOnePresentationSlides.pptx
 
Troux Presentation Austin Texas
Troux Presentation Austin TexasTroux Presentation Austin Texas
Troux Presentation Austin Texas
 
CMGT410 v19Business Requirements TemplateCMGT410 v19Page 2.docx
CMGT410 v19Business Requirements TemplateCMGT410 v19Page 2.docxCMGT410 v19Business Requirements TemplateCMGT410 v19Page 2.docx
CMGT410 v19Business Requirements TemplateCMGT410 v19Page 2.docx
 
Lessons Learned from AMI Deployments and Asset Management Readiness
Lessons Learned from AMI Deployments and Asset Management ReadinessLessons Learned from AMI Deployments and Asset Management Readiness
Lessons Learned from AMI Deployments and Asset Management Readiness
 
What to expect from your IT People
What to expect from your IT PeopleWhat to expect from your IT People
What to expect from your IT People
 
Building a Business Continuity Capability
Building a Business Continuity CapabilityBuilding a Business Continuity Capability
Building a Business Continuity Capability
 
DATA CENTER AND BUSINESS COMMUNITY
DATA CENTER AND BUSINESS COMMUNITYDATA CENTER AND BUSINESS COMMUNITY
DATA CENTER AND BUSINESS COMMUNITY
 
BiznetGio Presentation Business Continuity
BiznetGio Presentation Business ContinuityBiznetGio Presentation Business Continuity
BiznetGio Presentation Business Continuity
 
How much does it cost to be Secure?
How much does it cost to be Secure?How much does it cost to be Secure?
How much does it cost to be Secure?
 
Top Down Network Design - ebrahma.com
Top Down Network Design - ebrahma.comTop Down Network Design - ebrahma.com
Top Down Network Design - ebrahma.com
 
Expectations in DRAAS from CSP
Expectations in DRAAS from CSPExpectations in DRAAS from CSP
Expectations in DRAAS from CSP
 
Optimizing connected system performance md&m-anaheim-sandhi bhide 02-07-2017
Optimizing connected system performance md&m-anaheim-sandhi bhide 02-07-2017Optimizing connected system performance md&m-anaheim-sandhi bhide 02-07-2017
Optimizing connected system performance md&m-anaheim-sandhi bhide 02-07-2017
 

More from Continuity and Resilience

The Business Continuity Conference, 25th October 2023 in Riyadh - Mr. Atiq Bajwa
The Business Continuity Conference, 25th October 2023 in Riyadh - Mr. Atiq BajwaThe Business Continuity Conference, 25th October 2023 in Riyadh - Mr. Atiq Bajwa
The Business Continuity Conference, 25th October 2023 in Riyadh - Mr. Atiq Bajwa
Continuity and Resilience
 
The Business Continuity Conference, 25th October 2023 in Riyadh - Nuha Eltinay
The Business Continuity Conference, 25th October 2023 in Riyadh - Nuha EltinayThe Business Continuity Conference, 25th October 2023 in Riyadh - Nuha Eltinay
The Business Continuity Conference, 25th October 2023 in Riyadh - Nuha Eltinay
Continuity and Resilience
 
The Business Continuity Conference, 25th October 2023 in Riyadh - Paul Gant
The Business Continuity Conference, 25th October 2023 in Riyadh -  Paul GantThe Business Continuity Conference, 25th October 2023 in Riyadh -  Paul Gant
The Business Continuity Conference, 25th October 2023 in Riyadh - Paul Gant
Continuity and Resilience
 
The Business Continuity Conference, 25th October 2023 in Riyadh - David Boll...
The Business Continuity Conference, 25th October 2023 in Riyadh - David Boll...The Business Continuity Conference, 25th October 2023 in Riyadh - David Boll...
The Business Continuity Conference, 25th October 2023 in Riyadh - David Boll...
Continuity and Resilience
 
The Business Continuity Conference, 25th October 2023 in Riyadh - Abdulrahma...
The Business Continuity Conference, 25th October 2023 in Riyadh - Abdulrahma...The Business Continuity Conference, 25th October 2023 in Riyadh - Abdulrahma...
The Business Continuity Conference, 25th October 2023 in Riyadh - Abdulrahma...
Continuity and Resilience
 
DEFLUFFING RESILIENCE
DEFLUFFING RESILIENCEDEFLUFFING RESILIENCE
DEFLUFFING RESILIENCE
Continuity and Resilience
 
CREATING AND MAINTAINING A BCM PROGRAM
CREATING AND MAINTAINING A BCM PROGRAMCREATING AND MAINTAINING A BCM PROGRAM
CREATING AND MAINTAINING A BCM PROGRAM
Continuity and Resilience
 
BCM Challenges and Compliance
BCM Challenges and Compliance BCM Challenges and Compliance
BCM Challenges and Compliance
Continuity and Resilience
 
Thriving in the Crisis Situation
Thriving in the Crisis SituationThriving in the Crisis Situation
Thriving in the Crisis Situation
Continuity and Resilience
 
Cyber Security & IT Resilience
Cyber Security & IT Resilience Cyber Security & IT Resilience
Cyber Security & IT Resilience
Continuity and Resilience
 
Enterprise Resilience
Enterprise ResilienceEnterprise Resilience
Enterprise Resilience
Continuity and Resilience
 
Advancing the Enterprise Towards Enterprise Resilience
Advancing the Enterprise Towards Enterprise ResilienceAdvancing the Enterprise Towards Enterprise Resilience
Advancing the Enterprise Towards Enterprise Resilience
Continuity and Resilience
 
Bcm is all about people!
Bcm   is all about people!Bcm   is all about people!
Bcm is all about people!
Continuity and Resilience
 
SAMA BCM Framework
SAMA BCM Framework SAMA BCM Framework
SAMA BCM Framework
Continuity and Resilience
 
Value of Work Place Services in the Middle East
Value of Work Place Services in the Middle EastValue of Work Place Services in the Middle East
Value of Work Place Services in the Middle East
Continuity and Resilience
 
Social Media Influence in the field of Crisis Management– Case Studies
Social Media Influence in the field of Crisis Management– Case StudiesSocial Media Influence in the field of Crisis Management– Case Studies
Social Media Influence in the field of Crisis Management– Case Studies
Continuity and Resilience
 
Cyber Resilience Tips and Techniques For Protection & Response
Cyber ResilienceTips and Techniques For Protection & Response Cyber ResilienceTips and Techniques For Protection & Response
Cyber Resilience Tips and Techniques For Protection & Response
Continuity and Resilience
 
Business Continuity and Information Security- An Excellent Fit!
Business Continuity and Information Security- An Excellent Fit!Business Continuity and Information Security- An Excellent Fit!
Business Continuity and Information Security- An Excellent Fit!
Continuity and Resilience
 
Crisis Communication & BCM in Aviation Sector
Crisis Communication & BCM in Aviation SectorCrisis Communication & BCM in Aviation Sector
Crisis Communication & BCM in Aviation Sector
Continuity and Resilience
 
Effectiveness of Disaster Management Ground Reality and Potential.
Effectiveness of Disaster Management Ground Reality and Potential.Effectiveness of Disaster Management Ground Reality and Potential.
Effectiveness of Disaster Management Ground Reality and Potential.
Continuity and Resilience
 

More from Continuity and Resilience (20)

The Business Continuity Conference, 25th October 2023 in Riyadh - Mr. Atiq Bajwa
The Business Continuity Conference, 25th October 2023 in Riyadh - Mr. Atiq BajwaThe Business Continuity Conference, 25th October 2023 in Riyadh - Mr. Atiq Bajwa
The Business Continuity Conference, 25th October 2023 in Riyadh - Mr. Atiq Bajwa
 
The Business Continuity Conference, 25th October 2023 in Riyadh - Nuha Eltinay
The Business Continuity Conference, 25th October 2023 in Riyadh - Nuha EltinayThe Business Continuity Conference, 25th October 2023 in Riyadh - Nuha Eltinay
The Business Continuity Conference, 25th October 2023 in Riyadh - Nuha Eltinay
 
The Business Continuity Conference, 25th October 2023 in Riyadh - Paul Gant
The Business Continuity Conference, 25th October 2023 in Riyadh -  Paul GantThe Business Continuity Conference, 25th October 2023 in Riyadh -  Paul Gant
The Business Continuity Conference, 25th October 2023 in Riyadh - Paul Gant
 
The Business Continuity Conference, 25th October 2023 in Riyadh - David Boll...
The Business Continuity Conference, 25th October 2023 in Riyadh - David Boll...The Business Continuity Conference, 25th October 2023 in Riyadh - David Boll...
The Business Continuity Conference, 25th October 2023 in Riyadh - David Boll...
 
The Business Continuity Conference, 25th October 2023 in Riyadh - Abdulrahma...
The Business Continuity Conference, 25th October 2023 in Riyadh - Abdulrahma...The Business Continuity Conference, 25th October 2023 in Riyadh - Abdulrahma...
The Business Continuity Conference, 25th October 2023 in Riyadh - Abdulrahma...
 
DEFLUFFING RESILIENCE
DEFLUFFING RESILIENCEDEFLUFFING RESILIENCE
DEFLUFFING RESILIENCE
 
CREATING AND MAINTAINING A BCM PROGRAM
CREATING AND MAINTAINING A BCM PROGRAMCREATING AND MAINTAINING A BCM PROGRAM
CREATING AND MAINTAINING A BCM PROGRAM
 
BCM Challenges and Compliance
BCM Challenges and Compliance BCM Challenges and Compliance
BCM Challenges and Compliance
 
Thriving in the Crisis Situation
Thriving in the Crisis SituationThriving in the Crisis Situation
Thriving in the Crisis Situation
 
Cyber Security & IT Resilience
Cyber Security & IT Resilience Cyber Security & IT Resilience
Cyber Security & IT Resilience
 
Enterprise Resilience
Enterprise ResilienceEnterprise Resilience
Enterprise Resilience
 
Advancing the Enterprise Towards Enterprise Resilience
Advancing the Enterprise Towards Enterprise ResilienceAdvancing the Enterprise Towards Enterprise Resilience
Advancing the Enterprise Towards Enterprise Resilience
 
Bcm is all about people!
Bcm   is all about people!Bcm   is all about people!
Bcm is all about people!
 
SAMA BCM Framework
SAMA BCM Framework SAMA BCM Framework
SAMA BCM Framework
 
Value of Work Place Services in the Middle East
Value of Work Place Services in the Middle EastValue of Work Place Services in the Middle East
Value of Work Place Services in the Middle East
 
Social Media Influence in the field of Crisis Management– Case Studies
Social Media Influence in the field of Crisis Management– Case StudiesSocial Media Influence in the field of Crisis Management– Case Studies
Social Media Influence in the field of Crisis Management– Case Studies
 
Cyber Resilience Tips and Techniques For Protection & Response
Cyber ResilienceTips and Techniques For Protection & Response Cyber ResilienceTips and Techniques For Protection & Response
Cyber Resilience Tips and Techniques For Protection & Response
 
Business Continuity and Information Security- An Excellent Fit!
Business Continuity and Information Security- An Excellent Fit!Business Continuity and Information Security- An Excellent Fit!
Business Continuity and Information Security- An Excellent Fit!
 
Crisis Communication & BCM in Aviation Sector
Crisis Communication & BCM in Aviation SectorCrisis Communication & BCM in Aviation Sector
Crisis Communication & BCM in Aviation Sector
 
Effectiveness of Disaster Management Ground Reality and Potential.
Effectiveness of Disaster Management Ground Reality and Potential.Effectiveness of Disaster Management Ground Reality and Potential.
Effectiveness of Disaster Management Ground Reality and Potential.
 

Recently uploaded

Best steel industrial company LLC in UAE
Best steel industrial company LLC in UAEBest steel industrial company LLC in UAE
Best steel industrial company LLC in UAE
alafnanmetals
 
Best Catering Event Planner Miso-Hungry.pptx
Best Catering Event Planner  Miso-Hungry.pptxBest Catering Event Planner  Miso-Hungry.pptx
Best Catering Event Planner Miso-Hungry.pptx
Miso Hungry
 
Are Gutters Necessary? Explore the details now!
Are Gutters Necessary? Explore the details now!Are Gutters Necessary? Explore the details now!
Are Gutters Necessary? Explore the details now!
AmeliaLauren3
 
What Are the Latest Trends in Endpoint Security for 2024?
What Are the Latest Trends in Endpoint Security for 2024?What Are the Latest Trends in Endpoint Security for 2024?
What Are the Latest Trends in Endpoint Security for 2024?
VRS Technologies
 
Importance of BWTS in the Maritime Industry
Importance of BWTS in the Maritime IndustryImportance of BWTS in the Maritime Industry
Importance of BWTS in the Maritime Industry
Blessed Marine Automation
 
All Trophies at Trophy-World Malaysia | Custom Trophies & Plaques Supplier
All Trophies at Trophy-World Malaysia | Custom Trophies & Plaques SupplierAll Trophies at Trophy-World Malaysia | Custom Trophies & Plaques Supplier
All Trophies at Trophy-World Malaysia | Custom Trophies & Plaques Supplier
Trophy-World Malaysia Your #1 Rated Trophy Supplier
 
How Does Littering Affect the Environment.
How Does Littering Affect the Environment.How Does Littering Affect the Environment.
How Does Littering Affect the Environment.
ClenliDirect
 
Top Email Marketing Trends to Watch in 2024
Top Email Marketing Trends to Watch in 2024Top Email Marketing Trends to Watch in 2024
Top Email Marketing Trends to Watch in 2024
time4servers technologies
 
Inspect Edge & NSPIRE Inspection Application - Streamline Housing Inspections
Inspect Edge & NSPIRE Inspection Application - Streamline Housing InspectionsInspect Edge & NSPIRE Inspection Application - Streamline Housing Inspections
Inspect Edge & NSPIRE Inspection Application - Streamline Housing Inspections
inspectedge1
 
The Jamstack Revolution: Building Dynamic Websites with Static Site Generator...
The Jamstack Revolution: Building Dynamic Websites with Static Site Generator...The Jamstack Revolution: Building Dynamic Websites with Static Site Generator...
The Jamstack Revolution: Building Dynamic Websites with Static Site Generator...
Softradix Technologies
 
Unlocking Insights: AI-powered Enhanced Due Diligence Strategies for Increase...
Unlocking Insights: AI-powered Enhanced Due Diligence Strategies for Increase...Unlocking Insights: AI-powered Enhanced Due Diligence Strategies for Increase...
Unlocking Insights: AI-powered Enhanced Due Diligence Strategies for Increase...
RNayak3
 
DOJO Training Center - Empowering Workforce Excellence
DOJO Training Center - Empowering Workforce ExcellenceDOJO Training Center - Empowering Workforce Excellence
DOJO Training Center - Empowering Workforce Excellence
Himanshu
 
SECUREX UK FOR SECURITY SERVICES AND MOBILE PATROL
SECUREX UK FOR SECURITY SERVICES AND MOBILE PATROLSECUREX UK FOR SECURITY SERVICES AND MOBILE PATROL
SECUREX UK FOR SECURITY SERVICES AND MOBILE PATROL
securexukweb
 
Colors of Wall Paint and Their Mentally Properties.pptx
Colors of Wall Paint and Their Mentally Properties.pptxColors of Wall Paint and Their Mentally Properties.pptx
Colors of Wall Paint and Their Mentally Properties.pptx
Brendon Jonathan
 
Waikiki Sunset Catamaran ! MAITAI Catamaran
Waikiki Sunset Catamaran !  MAITAI CatamaranWaikiki Sunset Catamaran !  MAITAI Catamaran
Waikiki Sunset Catamaran ! MAITAI Catamaran
maitaicatamaran
 
SIMBA SQUAD : Best seo company in perth
SIMBA SQUAD :  Best seo company in perthSIMBA SQUAD :  Best seo company in perth
SIMBA SQUAD : Best seo company in perth
ridebiler
 
BEst VASHIKARAN SPECIALIST 9463629203 in UK Baba ji Love Marriage problem sol...
BEst VASHIKARAN SPECIALIST 9463629203 in UK Baba ji Love Marriage problem sol...BEst VASHIKARAN SPECIALIST 9463629203 in UK Baba ji Love Marriage problem sol...
BEst VASHIKARAN SPECIALIST 9463629203 in UK Baba ji Love Marriage problem sol...
gitapress3
 
Comprehensive Water Damage Restoration Services
Comprehensive Water Damage Restoration ServicesComprehensive Water Damage Restoration Services
Comprehensive Water Damage Restoration Services
kleenupdisaster
 
Bulk SMS Service Provider In Mumbai | sms2orbit
Bulk SMS Service Provider In Mumbai | sms2orbitBulk SMS Service Provider In Mumbai | sms2orbit
Bulk SMS Service Provider In Mumbai | sms2orbit
Orbit Messaging Hub
 
Delightful Finds: Unveiling the Power of Gifts Under 100
Delightful Finds: Unveiling the Power of Gifts Under 100Delightful Finds: Unveiling the Power of Gifts Under 100
Delightful Finds: Unveiling the Power of Gifts Under 100
JoyTree Global
 

Recently uploaded (20)

Best steel industrial company LLC in UAE
Best steel industrial company LLC in UAEBest steel industrial company LLC in UAE
Best steel industrial company LLC in UAE
 
Best Catering Event Planner Miso-Hungry.pptx
Best Catering Event Planner  Miso-Hungry.pptxBest Catering Event Planner  Miso-Hungry.pptx
Best Catering Event Planner Miso-Hungry.pptx
 
Are Gutters Necessary? Explore the details now!
Are Gutters Necessary? Explore the details now!Are Gutters Necessary? Explore the details now!
Are Gutters Necessary? Explore the details now!
 
What Are the Latest Trends in Endpoint Security for 2024?
What Are the Latest Trends in Endpoint Security for 2024?What Are the Latest Trends in Endpoint Security for 2024?
What Are the Latest Trends in Endpoint Security for 2024?
 
Importance of BWTS in the Maritime Industry
Importance of BWTS in the Maritime IndustryImportance of BWTS in the Maritime Industry
Importance of BWTS in the Maritime Industry
 
All Trophies at Trophy-World Malaysia | Custom Trophies & Plaques Supplier
All Trophies at Trophy-World Malaysia | Custom Trophies & Plaques SupplierAll Trophies at Trophy-World Malaysia | Custom Trophies & Plaques Supplier
All Trophies at Trophy-World Malaysia | Custom Trophies & Plaques Supplier
 
How Does Littering Affect the Environment.
How Does Littering Affect the Environment.How Does Littering Affect the Environment.
How Does Littering Affect the Environment.
 
Top Email Marketing Trends to Watch in 2024
Top Email Marketing Trends to Watch in 2024Top Email Marketing Trends to Watch in 2024
Top Email Marketing Trends to Watch in 2024
 
Inspect Edge & NSPIRE Inspection Application - Streamline Housing Inspections
Inspect Edge & NSPIRE Inspection Application - Streamline Housing InspectionsInspect Edge & NSPIRE Inspection Application - Streamline Housing Inspections
Inspect Edge & NSPIRE Inspection Application - Streamline Housing Inspections
 
The Jamstack Revolution: Building Dynamic Websites with Static Site Generator...
The Jamstack Revolution: Building Dynamic Websites with Static Site Generator...The Jamstack Revolution: Building Dynamic Websites with Static Site Generator...
The Jamstack Revolution: Building Dynamic Websites with Static Site Generator...
 
Unlocking Insights: AI-powered Enhanced Due Diligence Strategies for Increase...
Unlocking Insights: AI-powered Enhanced Due Diligence Strategies for Increase...Unlocking Insights: AI-powered Enhanced Due Diligence Strategies for Increase...
Unlocking Insights: AI-powered Enhanced Due Diligence Strategies for Increase...
 
DOJO Training Center - Empowering Workforce Excellence
DOJO Training Center - Empowering Workforce ExcellenceDOJO Training Center - Empowering Workforce Excellence
DOJO Training Center - Empowering Workforce Excellence
 
SECUREX UK FOR SECURITY SERVICES AND MOBILE PATROL
SECUREX UK FOR SECURITY SERVICES AND MOBILE PATROLSECUREX UK FOR SECURITY SERVICES AND MOBILE PATROL
SECUREX UK FOR SECURITY SERVICES AND MOBILE PATROL
 
Colors of Wall Paint and Their Mentally Properties.pptx
Colors of Wall Paint and Their Mentally Properties.pptxColors of Wall Paint and Their Mentally Properties.pptx
Colors of Wall Paint and Their Mentally Properties.pptx
 
Waikiki Sunset Catamaran ! MAITAI Catamaran
Waikiki Sunset Catamaran !  MAITAI CatamaranWaikiki Sunset Catamaran !  MAITAI Catamaran
Waikiki Sunset Catamaran ! MAITAI Catamaran
 
SIMBA SQUAD : Best seo company in perth
SIMBA SQUAD :  Best seo company in perthSIMBA SQUAD :  Best seo company in perth
SIMBA SQUAD : Best seo company in perth
 
BEst VASHIKARAN SPECIALIST 9463629203 in UK Baba ji Love Marriage problem sol...
BEst VASHIKARAN SPECIALIST 9463629203 in UK Baba ji Love Marriage problem sol...BEst VASHIKARAN SPECIALIST 9463629203 in UK Baba ji Love Marriage problem sol...
BEst VASHIKARAN SPECIALIST 9463629203 in UK Baba ji Love Marriage problem sol...
 
Comprehensive Water Damage Restoration Services
Comprehensive Water Damage Restoration ServicesComprehensive Water Damage Restoration Services
Comprehensive Water Damage Restoration Services
 
Bulk SMS Service Provider In Mumbai | sms2orbit
Bulk SMS Service Provider In Mumbai | sms2orbitBulk SMS Service Provider In Mumbai | sms2orbit
Bulk SMS Service Provider In Mumbai | sms2orbit
 
Delightful Finds: Unveiling the Power of Gifts Under 100
Delightful Finds: Unveiling the Power of Gifts Under 100Delightful Finds: Unveiling the Power of Gifts Under 100
Delightful Finds: Unveiling the Power of Gifts Under 100
 

HA & DR System Design - Concepts and Solution

  • 1. Continuity and Resilience (CORE), ISO 22301 BCM Consulting Firm
      Presentations by our partners and extended team of industry experts
      Our Contact Details:
      India: Continuity and Resilience, Level 15, Eros Corporate Tower, Nehru Place, New Delhi-110019. Tel: +91 11 41055534 / +91 11 41613033. Fax: +91 11 41055535. Email: neha@continuityandresilience.com
      UAE: Continuity and Resilience, P. O. Box 127557, Abu Dhabi, United Arab Emirates. Mobile: +971 50 8460530. Tel: +971 2 8152831. Fax: +971 2 8152888. Email: info@continuityandresilience.com
  • 2. HA & DR Design Concepts
      S Seshadri, Head – IT DR & Service Management, Continuity and Resilience
      10th Feb, 2014, Dubai
  • 3. Outage Categorization
      • Service failures that should not, or need not, be visible to end users call for ‘fault protection’: such services keep operating continuously despite the failure
      • Short interruptions (within a few hours) are referred to as ‘minor outages’
      • Longer interruptions, in which end users’ business services are delayed for extended periods, are termed disaster situations or ‘major outages’
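The three categories above can be sketched as a small classifier. This is only an illustration and not part of the original slides: the 4-hour limit stands in for "within a few hours" and is an assumption, as are the names `OutageCategory` and `classify_outage`.

```python
from enum import Enum


class OutageCategory(Enum):
    FAULT_PROTECTED = "fault protected"   # failure never became visible to users
    MINOR = "minor outage"                # short interruption
    MAJOR = "major outage"                # disaster situation


def classify_outage(user_visible_downtime_hours: float,
                    minor_limit_hours: float = 4.0) -> OutageCategory:
    """Map user-visible downtime to one of the three categories.

    minor_limit_hours is a hypothetical threshold; each organisation
    would set its own value for "a few hours".
    """
    if user_visible_downtime_hours < 0:
        raise ValueError("downtime cannot be negative")
    if user_visible_downtime_hours == 0:
        return OutageCategory.FAULT_PROTECTED
    if user_visible_downtime_hours <= minor_limit_hours:
        return OutageCategory.MINOR
    return OutageCategory.MAJOR
```

In practice the threshold would differ per service, since the importance of the service decides how long an interruption can be tolerated.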
  • 4. Key Questions
      1. Which systems should ‘never’ fail? We may need Fault Tolerant systems in their place.
      2. What failures should be handled transparently, where an outage must not occur? Against such failures we need fault protection.
      3. How long may a short-term interruption be that happens once a day, once a week, or once a month? Such interruptions are called minor outages.
      4. How long may a long-term interruption be that happens very seldom and is related to serious damage to the IT system? For instance, when will this cause a big business impact, also called a major outage or disaster?
      5. How much data may be lost during a major outage? And in which state – persistent or ephemeral…
      6. What failures are deemed so improbable that they will not be handled, or what failures are beyond the scope of a project?
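One way to keep the answers to these six questions together is a per-service record. The sketch below is an assumption about how that might look, not something the slides prescribe: the class `ServiceRequirements`, its field names, and the choice to express tolerable data loss as a time window are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from typing import List


@dataclass
class ServiceRequirements:
    name: str
    must_never_fail: bool               # Q1: candidate for a fault-tolerant system
    transparent_failures: List[str]     # Q2: failures needing fault protection
    minor_outage_limit: timedelta       # Q3: longest acceptable short interruption
    major_outage_limit: timedelta       # Q4: longest acceptable long interruption
    max_data_loss: timedelta            # Q5: tolerable data loss, as a time window
    out_of_scope: List[str] = field(default_factory=list)  # Q6: failures not handled

    def validate(self) -> None:
        """Reject answers that contradict each other."""
        if self.max_data_loss < timedelta(0):
            raise ValueError("max_data_loss cannot be negative")
        if self.minor_outage_limit > self.major_outage_limit:
            raise ValueError("minor outage limit exceeds major outage limit")
```

A record like this makes gaps visible early: a service with no answer to question 6, for instance, has an undefined project scope.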
  • 5. Business Issues & Cost of IT Outage
      • IT Fault Protection has to be driven by business considerations
      • Business Continuity is the overall goal
      • Business imperatives are expressed through the BIA/RA (business impact analysis / risk assessment) and through MTPoD/RTO/RPO (maximum tolerable period of disruption / recovery time objective / recovery point objective)
      • The IT outage itself is not the real issue; its business consequences are
      • An IT outage affects revenues and costs adversely
      • Direct costs – repairs, penalties, lost revenue
      • Indirect costs – lost and additional work hours
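The split into direct and indirect costs can be illustrated with simple arithmetic. The function below is a rough sketch under stated assumptions: the name `outage_cost`, its parameters, and the idea of valuing lost and additional work hours at a single hourly wage are illustrative choices, and the figures in the usage note are made up.

```python
def outage_cost(hours_down: float,
                revenue_per_hour: float,
                repair_cost: float,
                penalties: float,
                staff_affected: int,
                hourly_wage: float,
                additional_work_hours: float = 0.0) -> dict:
    """Estimate the cost of one outage, split as on the slide."""
    # Direct costs: repairs, penalties, lost revenue.
    direct = repair_cost + penalties + hours_down * revenue_per_hour
    # Indirect costs: work hours lost during the outage plus
    # additional hours spent catching up afterwards.
    lost_work_hours = hours_down * staff_affected
    indirect = (lost_work_hours + additional_work_hours) * hourly_wage
    return {"direct": direct, "indirect": indirect, "total": direct + indirect}
```

For a hypothetical 3-hour outage with 10,000 per hour of lost revenue, 2,000 in repairs, 5,000 in penalties, 20 idle staff at 40 per hour and 10 catch-up hours, this gives 37,000 direct, 2,800 indirect and 39,800 in total.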
  • 6. Cost vs Benefit • IT Recovery has extensive cost implications, in terms of both Capex and Opex • Strategies developed should be cost effective • A ‘Technology for the sake of Technology’ approach should be completely avoided • Strategies should, as far as possible, be able to address disruptions and impacts collectively • Organizational objectives and risk appetite should direct recovery strategies • Legal, contractual and regulatory aspects play a major role (SOX, SAS 70, Basel II/III…)
  • 7. IT Service Outage • Importance of IT Services depends on – Business relevance – Revenues – Functionality that they enable – Amount of damage due to the outage – Any regulatory aspect that demands the service • Outage Categorization is dictated by the importance of the service and hence the significance of its failure
  • 8. High Availability • High availability is the characteristic of a system to protect against or recover from minor outages in a short time frame with largely automated means. • HA has 3 essential features – Outage categorization is ‘minor’: we need to envisage potential failure scenarios for the service and the minor outage requirements for them (robustness) – System category should involve Mission Critical, Business Important and Business Foundation processes which need to be recovered within a very short time (RTO/RPO) – Component (SPoF) level protection which will facilitate automatic recovery (redundancy) • HA features are normally built within the primary data center and data replication is synchronous
  • 9. Continuous Availability • Continuous Availability is the highest point of High Availability, wherein every component failure is protected against and no ‘after failure recovery’ takes place • These are known as Fault Tolerant systems, which provide automatic, high-speed ‘failover’ in the case of h/w or s/w failures • They have an ‘internal multi-computer systems architecture’ that has no shared central components, including memory • Tandem’s ‘non-stop’ systems and Stratus’s fault tolerant computers are examples of this • These are used by the leading stock exchanges globally (NSE in India uses Stratus and BSE uses Tandem), and by banks for their ATM-related transaction processing • These systems scale extremely well to the largest commercial workloads • These systems were introduced originally by Airbus for their A-320 planes for on-board flight controls in their long duration flights
  • 10. HA Components • Essential ingredients of High Availability are: • Availability • Reliability • Serviceability • We will discuss the above three in the following slides.
  • 11. Availability & Metrics • Availability – how long a service or system component is available for use, together with the features that help the system stay operational despite failures, e.g. NIC, mirrored disks, redundant power supply • Availability = uptime / (uptime + downtime) • Downtime includes scheduled downtime as well • Elapsed time can be measured as wall clock time • Availability can be expressed in absolute numbers (79 hrs out of 80 hrs) or as a percentage (99.89%) • Availability = MTBF / (MTBF + MTTR) – MTBF: Mean Time Between Failures – MTTR: Mean Time To Repair
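The two availability formulas on this slide can be checked with a few lines of Python. This is a minimal sketch, not part of the original deck: the function names and the 8760-hour year used for the downtime conversion are assumptions made for illustration.

```python
def availability_from_times(uptime_hours: float, downtime_hours: float) -> float:
    """Availability = uptime / (uptime + downtime); downtime includes scheduled downtime."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total elapsed time must be positive")
    return uptime_hours / total


def availability_from_mtbf(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability = MTBF / (MTBF + MTTR), both expressed in the same unit."""
    return availability_from_times(mtbf_hours, mttr_hours)


def yearly_downtime_hours(availability: float, hours_per_year: float = 8760.0) -> float:
    """Expected downtime per year implied by an availability figure (assumes a 365-day year)."""
    return (1.0 - availability) * hours_per_year


if __name__ == "__main__":
    # 79 hours of uptime in an 80-hour window.
    print(f"{availability_from_times(79, 1):.4%}")
    # Hypothetical component: MTBF of 300,000 hours and an assumed MTTR of 8 hours.
    print(f"{availability_from_mtbf(300_000, 8):.6%}")
    # Downtime per year implied by 99.9% availability.
    print(f"{yearly_downtime_hours(0.999):.2f} hours")
```

Note that the parentheses in the denominator matter: without them, uptime/uptime + downtime would evaluate to 1 + downtime.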
  • 12. Reliability & Metrics • Reliability is a measure of ‘fault avoidance’ • Refers to the ‘probability that a system will be available over a time interval T’ • MTBF is a measure of Reliability • Annual Failure Rate (AFR) is the inverse of MTBF expressed in years • Reliability features help to ‘prevent’ and ‘detect’ failures • H/w reliability has improved tremendously over the last 30 years, and hardware is highly resilient nowadays • Component: MTBF (hours) | MTBF (years) | AFR (per year) – Disk Drive: 300,000 | 34 | 0.0292 – Power Supply: 150,000 | 17 | 0.0584 – Fan: 250,000 | 28 | 0.0350 – NIC: 200,000 | 23 | 0.0438
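The relationship between the MTBF and AFR columns can be reproduced as follows. This is an illustrative sketch only; it assumes a year of 8760 hours (365 days, leap years ignored), and the function names are not from the deck.

```python
HOURS_PER_YEAR = 8760  # assumption: 365 days * 24 hours


def mtbf_years(mtbf_hours: float) -> float:
    """Convert an MTBF given in hours to years."""
    return mtbf_hours / HOURS_PER_YEAR


def annual_failure_rate(mtbf_hours: float) -> float:
    """AFR is the inverse of MTBF expressed in years."""
    return HOURS_PER_YEAR / mtbf_hours


if __name__ == "__main__":
    # MTBF figures (hours) taken from the slide.
    components = {
        "Disk Drive": 300_000,
        "Power Supply": 150_000,
        "Fan": 250_000,
        "NIC": 200_000,
    }
    for name, hours in components.items():
        print(f"{name}: {mtbf_years(hours):.1f} years, AFR {annual_failure_rate(hours):.4f}")
```

An AFR of 0.0292 can be read as roughly 29 expected failures per year in a population of 1,000 such disk drives.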
  • 13. Serviceability • A measurement that expresses how easily and quickly a system is serviced and repaired • The lower the planned service time, the higher the availability • Planned serviceability goes into the architecture as a design objective • Actual service time should be lower than planned service time • These clauses have to be carefully built into the Service Level Agreements with IT vendors • Murphy’s Law: Anything that can possibly go wrong, does
  • 14. HA/DR Strategy - Aspects • Data – what is the architecture concerned with • Function – how is the data worked with • Location – where is the data worked with • People – who works with the data and achieves the functionality • Time – when is the data processed • Each of the above aspects is run through 3 levels of abstraction • Objectives – what will this achieve vis-à-vis organizational objectives • Conceptual Model – realization of the objectives on a business process level • System Model – logical data model and the application functions that must be implemented to realize the business concepts
  • 15. HA/DR Framework (Zachman) • Aspect: Objectives | Conceptual Model | System Model – Data (What): Business Continuity / IT Service Continuity | Availability of mission-critical and important business services | ICT categories, dependency diagrams – Function (How): Map biz processes to IT services, RTO, RPO, SLA | ITIL processes, IT processes, projects | Design patterns – RAS, redundancy, backup, replication, virtualization – Location (Where): Internal (IT), Outsourced | Data Center, Disaster Recovery Center | All systems, all categories – People (Who): Biz process owner | CIO/IT dept | IT PM, Architect, System Engineers, System Administrators – Time (When): Implementation Plan | Outage scenarios, categories | Failure/Change/Incident/Problem/Disaster
  • 16. HA/DR System Design • The System Model discussed earlier is the core of this activity • The ‘What’ and ‘How’ of the System Model lay the foundation for HA/DR System Design • Protection against outages of computers, systems and databases is in scope for HA • Protection against infra/building/city outages and user/administrative errors is in scope for DR • Sound processes, solid architecture, careful engineering and an eye for detail are the hallmarks of a good HA/DR system design
  • 17. HA/DR Touch Points • User Environment • Administration Environment • Application • Middleware • Network Infrastructure • Operating System • Hardware (Servers, Storage, Backups etc.) • Physical Environment (Power, Fire, Floods etc.)
  • 18. HA/DR Scoping • Take into account regulatory aspects (SOX, SAS, Basel II) • Identify the key applications (from business BIAs) • Check the various ICT environments required by these applications (IT BIA) • Identify the dependencies • Carefully identify and document the component categories that are not required – scope exclusions • Prepare the preliminary system scope – list of component categories required for HA/DR • Identify failure scenarios for each of these component categories • Document the failure scenarios that are outside the scope • The component categories and the failure scenarios together constitute the scope of HA/DR
  • 19. Redundancy & Replication • Redundancy is the ability to continue operations in the case of component failures • Recovery is done through ‘managed component repetition’ • Eliminating ‘single points of failure’ is the goal • Just adding a second component is not enough • The replicated component has to be ‘managed’ to take over in case the original component fails (failover) • This ‘management’ can be automated or manual • Replication of the ‘state’ of the component is crucial • The replica may be a duplicate part, an alternate system (HA) or an alternate location (DR) • 100% redundancy through replication is very expensive and difficult to achieve
  • 20. Data Replication • Redundancy for disk drives means ‘data replication’ and is hence crucial • Redundant disks provide multiple storage of data and/or OS • Data disks carry one of the highest risks • OS disks usually house the root file system and swap space • Data replication can be ‘synchronous’ or ‘asynchronous’ • RPO considerations should dictate the data replication approach • For very low or nil RPO, latency in data replication may not be tolerated (synchronous vs asynchronous) • Bandwidth considerations also impact replication • Data deduplication technology, along with data compression, has in recent times reduced many of the headaches involved with data replication • Two main types of data replication – host based and storage based
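The link between replication latency and RPO can be illustrated with a deliberately simplified model. The sketch below assumes a fixed replication lag and treats each write as a timestamp in seconds; the function names and the numbers in the example are hypothetical and do not describe any particular replication product.

```python
def lost_writes(write_times, failure_time, replication_lag):
    """Writes committed on the primary before the failure that had not yet reached the replica.

    A write made at time t arrives at the replica at t + replication_lag.
    With synchronous replication the lag is modeled as 0, so nothing is lost.
    """
    return [
        t for t in write_times
        if t <= failure_time and t + replication_lag > failure_time
    ]


def meets_rpo(replication_lag, rpo):
    """In this model the worst-case data loss window equals the replication lag."""
    return replication_lag <= rpo


if __name__ == "__main__":
    writes = [1, 4, 8, 9]  # seconds at which the primary committed data
    print(lost_writes(writes, failure_time=10, replication_lag=0))  # synchronous
    print(lost_writes(writes, failure_time=10, replication_lag=5))  # asynchronous
    print(meets_rpo(replication_lag=5, rpo=0))
```

In this model a nil RPO can only be met with a lag of zero, which is why RPO considerations drive the choice between synchronous and asynchronous replication.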
  • 21. Virtualization • Virtualization, as a concept, was demonstrated in the 1960s, when IBM’s Thomas J Watson Research Center simulated ‘multiple pseudo machines’ on a single 7044 MX Mainframe • Virtualization allows multiple operating system (OS) instances to run concurrently on a single computer. • It is a means of separating hardware from a single OS, by “inserting an abstraction layer” into the software stack. • Each ‘Guest’ OS is managed by a Virtual Machine Monitor. • Virtualization software can also collect a number of separate resources and “pool” them, even if the devices or resources remain in separate physical locations. • The end goal is sharing the resources and capabilities flexibly, under software control. • The part of the virtualization package that enables interaction with and control of the VMs is referred to as the Virtual Machine Monitor (VMM) or Hypervisor software.
  • 22. Virtualization of Resources • Virtualized resources are supplied in logical units to application programs, freeing them from reliance on specific hardware • Virtualization of servers allows a business to consolidate the workloads running on multiple servers onto just a few • Storage virtualization hides the physical storage from applications on host systems, presents a simplified (logical) view to the applications and allows them to reference the storage resource by its common name, whereas the actual storage could be on a complex, multilayered, multipath storage network. • RAID is an early example of storage virtualization. • Virtual CPU is one of the oldest concepts, which has enabled multiprocessing capability, handled by the OS • Virtual Memory is as old as Virtual CPU – again handled by the OS as part of Virtual Memory Management • Working within a virtualized environment may add some options and new flexibility to your HA and DR plans.
  • 23. Storage Virtualization • With regard to storage, the objective is to bring together multiple storage devices under unified command, whether they are from the same manufacturer or not, and without regard for their physical locations. • Once accomplished, the now-unified band of storage systems can be treated as a single, huge storage capacity that can be provisioned, managed, backed up to tape, and even replicated to offsite disaster recovery (DR) or high availability (HA) sites, with greater visibility, synchronized automation, and reduced management labour. • Even archiving, multi-level storage, and information lifecycle management (ILM) efforts can be made simpler, with older, slower, or cheaper storage units provisioned to handle the near-line or archival storage while newer, faster devices handle the current production processes.
  • 24. Host Clustering • Host clustering increases availability through redundancy on the host level, by taking several hosts and using them to supply a set of services, where each service is not strictly associated with a specific computer • Host Clustering addresses – Hardware errors – OS errors – Application errors • Failover clusters allow a service to migrate from one host to another in the case of an error. They are the most used technology for high availability. • Load-balancing clusters run a service on multiple hosts from the start and handle outages of a host – more relevant for performance than HA.
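The failover behaviour described on this slide, where a service is not tied to a specific host and migrates on error, can be sketched as a toy model. The class, its methods and the least-loaded placement rule are assumptions made for illustration; real cluster managers additionally handle heartbeats, fencing and state replication.

```python
class FailoverCluster:
    """Toy model of a failover cluster: services are not tied to one host."""

    def __init__(self, hosts):
        self.healthy = {host: True for host in hosts}
        self.placement = {}  # service name -> host currently running it

    def start(self, service, host):
        if not self.healthy.get(host, False):
            raise ValueError(f"{host} is not a healthy cluster member")
        self.placement[service] = host

    def _load(self, host):
        # Number of services currently placed on the given host.
        return sum(1 for h in self.placement.values() if h == host)

    def fail_host(self, failed):
        """Mark a host as failed and migrate its services to the least-loaded healthy host."""
        self.healthy[failed] = False
        survivors = [h for h, ok in self.healthy.items() if ok]
        if not survivors:
            raise RuntimeError("no healthy host left in the cluster")
        moved = {}
        for service in sorted(self.placement):
            if self.placement[service] == failed:
                target = min(survivors, key=self._load)
                self.placement[service] = target
                moved[service] = target
        return moved


if __name__ == "__main__":
    cluster = FailoverCluster(["node-a", "node-b"])
    cluster.start("db", "node-a")
    cluster.start("web", "node-b")
    print(cluster.fail_host("node-a"))  # services that migrated and their new hosts
    print(cluster.placement)
```

When the last healthy host fails, the model raises an error: at that point the outage can no longer be absorbed inside the cluster.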
  • 25. Middleware • Generally considered to be the layer between the OS and the applications • Middleware components are independent of applications but carry application-specific configuration and are used by multiple applications • Database Servers, Web Servers, Application Servers and Messaging Servers are some examples • HA for these will include product-specific clustering, data replication, and even session state replication • A properly configured failover cluster sufficiently integrated with the DB Server provides HA • Redo log file shipping (asynchronous) with commits delayed by the RPO will provide the best DR • HA for Web Servers and Messaging Servers is achieved mostly through Load-balancing Clusters (stateless)
  • 26. HA for Applications • Application HA is the eventual goal • Application categories – Off the Shelf, Bought & Customized, In-house Built • A failover cluster is the approach most commonly adopted for all categories of applications • Applications touch the nerve center of all the following systems: – Development – Acceptance/Integration Test – Staging & Release – Production – Disaster Recovery • Suitable precautions must be taken during the coding and testing stages to ensure HA
  • 27. Networks • The network is the backbone of ICT, as it provides the linkages and the ability to communicate between component categories • Various types of networks are – LAN, VLAN, MAN, WAN, VPN, Intranet, Extranet, Internet • And there are n/w components that help build and run the networks – NICs, switches, routers, hubs, firewalls etc. • Connectivity is the most important element of networks • Data management on the network is done through encoding, data compression & encryption/decryption • Power supply and Heating, Ventilating & Air Conditioning (HVAC) are two other important considerations • It is absolutely essential to provide redundancy at both the network and component levels for network HA • Generally, there is no payload-based state for any of these – hence two or more devices would ensure HA
  • 28. Data Backup and Restoration • A major requisite for HA & DR • Management of backed up data is equally important • Restoration of data must work effectively • Automated mechanisms exist • System/file/database backups are the key • Full or incremental backup • Consistency of the data state is crucial • Checkpoint functionality is useful in this context • Storage and handling of backup media is very significant • Remote (including at the DR site) storage of backups, including Tape Vaulting, should be institutionalized • Testing/recycling and proper maintenance of backup media • Backup on failover clusters should distinguish between physical and logical hosts in the cluster
  • 29. HA & DR – Positioning • HA and DR are two sides of the same coin • Redundancy, Replication and Robustness are the key characteristics of both HA & DR • HA focuses on fault protection and is built on mostly automated recovery techniques for minor outages • HA is not built for environmental disasters like floods, fire and earthquakes, or for manmade incidents like terrorist attacks and human errors of huge magnitude • The above additional scenarios and major outages lead to the need for DR, which focuses only on recovery • DR is also associated with a large share of manual recovery in terms of Emergency Management and Damage Assessment & Recovery, apart from IT Recovery • When the primary data center is unavailable, migration to the DR site will be the only option
  • 30. Disaster Recovery • Disaster recovery is the ability to continue with services in the case of major outages, often with reduced capabilities or performance. • Disaster recovery handles the disaster either when a single point of failure is the defect or when many components are damaged and the whole system is rendered non-functional. • Operations cannot be resumed on the same system or at the same site. Instead, a replacement or backup system, usually located at another place, is activated and operations continue from there. • Disaster recovery often restores only restricted resources and thus restricted service levels. • Continuation of service also does not happen instantly, but only after some outage time.
  • 31. DR in Context • IT DR is activated when the likely recovery time is above the least RTO and there is expected data loss • IT recovery will be limited only by the levels of service agreed by the business owners • IT DR activities will be carried out from the DR site, which should be fully equipped to handle IT services up to agreed levels • Scaling up the IT services in due course of time will generally be outside the purview of DR Planning • Agreed levels of IT services are resumed at the DR site using the infrastructure and backup data/tapes there • The roles of primary and DR sites are interchangeable, but not in the strict sense of HA • In the above scenario, both primary and DR sites will be functional, even though they may cater to different business activities/IT services
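The activation rule in the first bullet of this slide can be written down as a small decision function. This is only a sketch of one literal reading of that bullet: the function name, the per-service RTO dictionary and the example values are hypothetical.

```python
def should_activate_dr(estimated_recovery_hours, rto_hours_by_service, data_loss_expected):
    """Activate IT DR when the likely recovery time exceeds the least RTO
    and data loss is expected (one literal reading of the slide's rule)."""
    if not rto_hours_by_service:
        raise ValueError("at least one service RTO is required")
    least_rto = min(rto_hours_by_service.values())
    return estimated_recovery_hours > least_rto and data_loss_expected


if __name__ == "__main__":
    # Hypothetical RTOs, in hours, agreed with the business owners.
    rtos = {"payments": 4, "reporting": 24}
    print(should_activate_dr(6, rtos, data_loss_expected=True))
    print(should_activate_dr(2, rtos, data_loss_expected=True))
```

In practice the decision would also involve damage assessment and emergency management, which this sketch leaves out.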
  • 32. DR and the Cloud • Cloud is the latest buzzword in the outsourced business model • Leveraging the cloud model can optimize DR procedures • It reduces the high cost of maintaining stand-by sites • Cloud service providers normally have state-of-the-art systems and infrastructure, huge bandwidth and an exacting security setup, apart from complying with relevant ISO guidelines and industry best standards. • According to a recent Aberdeen study, DR is the leading ‘use case’ for cloud • The key advantages are recovery times, virtualization and multi-site availability • Concerns regarding security, identity and compliance with various regulations do exist as the cloud model matures • With data volumes growing at the rate of 10 times every 5 years, cloud computing is likely to see huge growth
  • 33. DR in the Supply Chain • A supply chain is basically a delineation of dependencies depicting the various actors in the chain of a product or service, from the vendor until it reaches the consumer • IT DR dependencies are manifold – internal customers, ICT equipment, external vendors and service providers, IT staff, etc. • DR planning should judiciously take into account the inherent risks in the supply chain and provide suitable mechanisms to handle them effectively, so that the DR goal is not derailed • Typically, if Data Center support is outsourced, there is a huge dependence on the Service Provider – timely availability of people, spares, replacements etc. • Supply chain glitches can emerge from something as innocuous as the supply of consumables

Editor's Notes

  1. Division of our complete problem into the above layers enables us to think about potential problems and their solutions separately. This separation builds the base for HA/DR scoping.
  2. Eliminations have to be documented and sign-off obtained. All the ‘scope exclusions’ must be recognized during risk management and might need to be handled in separate projects.
  3. ‘State’ does not just refer to data. Data state in DR situations generally differs due to the accepted RPO. State also refers to files or registry entries in the case of s/w components, and firmware releases in the case of h/w. Disks are redundant via the volume manager. Primary and secondary databases are redundant via the system administrator. NICs are redundant through the OS (multipath configuration). In the above cases, the volume manager, system administrator or multipath configuration (VM/SA/MC) could be the SPoF.
  4. Through virtualization, Hewlett Packard consolidated no fewer than 86 of its own data centers to just three. The actual server counts and consolidation ratios vary; a ratio of 10:1 is not uncommon.
  5. H/w: no redundancy; redundant component had an error; redundancy activation did not work. OS: process scheduling error; processes hanging; memory management deficiency; network traffic glitches; file system corruption. Apps: memory leaks due to applications getting into endless loops; deadlocks in communication processes; other software errors. F/O clusters: active/active; active/passive. Suitable for applications with stateful data.
  6. E.g.: No application must use machine-specific configuration of the physical host (as it will not recognize a virtualized host or a cluster node). On exceptional conditions, apps must contain start/stop/restart actions. Long batch jobs should have checkpoints for validation at restart. The application needs to be designed for a cluster environment. Tiered development approach for applications: UI (front end), business logic (middleware), database (backend). ‘From scratch’ applications should incorporate fault-tolerance requirements. Code quality is of paramount importance. Testing: function point, non-functional properties and end-to-end.
  7. The Open Systems Interconnection Reference Model provides seven layers of abstraction for networks: Physical, Data Link, Network, Transport, Session, Presentation, Application. Popular network protocols are Ethernet, TCP/IP, Token Ring, Frame Relay, ATM, FC, etc. A network outage is generally considered a major outage. WAN outages are major and almost impossible to prevent totally; where multiple connections are used, the question remains whether they are independent or share some SPoF, typically the ‘last mile’ and the ‘proverbial digger’ syndrome. WAN virtualization dangers; ISPs, SLAs, penalties and so on. Other network-based services like DHCP, DNS, LDAP, AD, email, print etc. have to be redundant depending on the need.
  8. SOPs should be in place in detail for all backup and restoration processes. Specific personal responsibilities should be assigned for backup duties.
  9. Internal and external cloud; private and public cloud.