This document discusses concepts related to high availability and disaster recovery. It defines key terms like availability, reliability, outages, fault tolerance, and redundancy. It describes strategies for high availability including data replication, virtualization, host clustering, and ensuring reliability of network and middleware components. The document emphasizes the importance of basing HA/DR strategies and investments on business needs and conducting proper scoping and planning.
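To make the availability and redundancy terms concrete, here is a minimal Python sketch. It assumes the common steady-state definition of availability as MTBF / (MTBF + MTTR) and assumes redundant components fail independently of each other. The function names and the figures in the usage note are illustrative and are not taken from the presentation.

```python
HOURS_PER_YEAR = 24 * 365  # non-leap year, used only for a rough estimate


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures and mean time to repair."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def redundant_availability(single: float, copies: int) -> float:
    """Availability of `copies` parallel components, assuming independent failures.

    The group is down only when every copy is down at the same time.
    """
    return 1.0 - (1.0 - single) ** copies


def annual_downtime_hours(avail: float) -> float:
    """Expected downtime per year, in hours, for a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR
```

Under these assumptions, a component with 99 hours between failures and 1 hour to repair is 99% available (roughly 87.6 hours of downtime per year), and two such components in parallel reach about 99.99%. Real systems rarely fail fully independently, so treat the redundancy figure as an upper bound.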
Disaster Recovery Planning using Azure Site Recovery (Nitin Agarwal)
Disaster recovery and business continuity solutions have historically been expensive and time-consuming. Microsoft Azure Site Recovery (ASR) makes Disaster Recovery (DR) planning and implementation simpler and more affordable for all types of organizations.
Join our team of cloud experts for a walkthrough of DR and ASR basics. We'll highlight best practices for ASR deployments and help you get a sense of the costs of implementing a solution.
Customer migration to Azure SQL Database from on-premises SQL, for a SaaS app... (George Walters)
Why would someone take a working on-premises SaaS infrastructure and migrate it to Azure? We review the technology decisions behind this conversion and the business choices behind migrating to Azure. The SQL 2012 infrastructure and application were migrated to PaaS services. Finally, we consider how we would build this architecture in 2019.
Introduction to Microsoft Enterprise Mobility + Security (AntonioMaio2)
Microsoft has given us some amazing capabilities with the Microsoft Enterprise Mobility + Security (EM+S) suite to help protect both our information and our investments in Office 365. This collection of features gives you just about everything you need in the Microsoft Cloud for security, compliance and Information Protection. With such a vast array of services, tools and features, it's often challenging to understand everything this product provides or how it's layered on top of existing Office 365 security controls. In this session we’ll review the capabilities available to you in Microsoft EM+S, and you'll discover which ones may best fit your security and compliance needs. Come and join us as we also dive deep into some of the most useful Microsoft EM+S tools.
Azure IAAS architecture with High Availability for beginners and developers -... (Malleswar Reddy)
Covers end-to-end Azure architecture with high availability, describing its components in functional and non-functional terms.
https://youtu.be/SUb-J9vHqPE
In this presentation, I talk about resiliency in Azure.
I also cover Azure VM improvements and maintenance, along with disaster recovery with ASR.
Microsoft Azure Tutorial | Microsoft Cloud Computing | Microsoft Azure Traini... (Edureka!)
This Microsoft Azure Tutorial will get your basics right about Microsoft Azure. It starts from the basics, so it should also help a beginner who knows nothing about cloud computing. Below are the topics covered in this tutorial:
1) What is Cloud?
2) What is Microsoft Azure?
3) Azure Job Trends
4) Different Domains in Azure
5) Azure Services
6) Azure Pricing Options
7) Demo on Azure
8) Azure Certifications
To take a structured training on Microsoft Azure, you can check complete details of our Microsoft Azure Certification Training course here: https://goo.gl/585NMJ
Develop an Enterprise-wide Cloud Adoption Strategy – Chris Merrigan (Amazon Web Services)
A cloud-first strategy requires a different approach than the one you probably took for your first few workloads in the cloud. You’ll be deploying hybrid environments, and that means taking a broad view of your IT strategy, architecture, and organisational design. In this session, we cover how the AWS Cloud Adoption Framework (CAF) offers practical guidance and comprehensive guidelines to enterprise organisations, particularly around roles, governance, and efficiency.
Azure Site Recovery - BC/DR - Migrations & assessments in 60 minutes! (Johan Biere)
Gain a high-level understanding of the challenges organizations face in planning their migration and BC/DR strategy for applications in Azure and hybrid environments.
Learn about the capabilities that make Azure the ideal destination for your applications, data, and infrastructure. You will get clear, scenario-based guidance on how to approach your technical migration and innovation journeys. Understand how to migrate different on-premises applications to Azure, including moving them to Azure IaaS using ASR with Hyper-V assessments and agentless migration.
AWS re:Invent 2016: Building a Solid Business Case for Cloud Migration (ENT308) (Amazon Web Services)
Learn how to create a compelling business case for a large-scale migration to AWS. We present a framework and tools for creating your business case, and guidelines for using AWS services to maximize value and optimize cost for migrations to the AWS Cloud. Learn a new way of thinking about cost that includes automation, new technologies, organizational change, and other factors.
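To plan and rapidly launch new financial services, organizations must address regulatory requirements and set up an operating environment for next-generation application services. LG CNS contributed to accelerating Shinhan Bank's digital transformation by building finance-specific infrastructure on AWS, developing a developer-friendly service portal, and establishing a standard framework for application development. Through the case of Shinhan Bank's New Development Platform project, we share an IT direction for digital transformation (DX) in the financial sector.
Seminar: "Application fault tolerance: problems and simple solutions. Choosing the optimal protection method for applications of different classes."
More about the event: http://www.croc.ru/action/detail/1630/
Presentation by Ravil Safiullin, project manager at CROC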
High Availability can be a curiously nebulous term, and most people probably don't care about it until they can't access their online banking service, or their plane crashes.
This presentation examines some of the considerations necessary when building highly available computer systems, then focuses on the HA infrastructure software currently available from the Corosync/OpenAIS, Linux-HA and Pacemaker projects.
Originally presented at Linux Users Victoria in April 2010 (http://luv.asn.au/2010/04/06)
How should it be written, and how should it be read? Which information systems are critical, and which are not?
In what order should data be backed up and restored?
What counts as a disaster in your company? How do production incidents show up in IT?
A disaster recovery plan is not a ready-made document that suits any company. It is a set of methodologies combined with the experience of consultants, both in information technology and in risk assessment of production systems.
Modern data center architecture, emergency and pre-emergency data migration, the dispatcher's red button.
I gave this talk at the DevOps meetup in Krakow, Poland. It was a lightning talk covering high availability solutions, architecture, planning, and deployment.
AWS provides a platform that is ideally suited for building highly available systems, enabling you to build reliable, affordable, fault-tolerant systems that operate with a minimal amount of human interaction. This session covers many of the high-availability and fault-tolerance concepts and features of the various services that you can use to build highly reliable and highly available applications in the AWS Cloud: architectures involving multiple Availability Zones, including EC2 best practices and RDS Multi-AZ deployments; loosely coupled and self-healing systems involving SQS and Auto Scaling; networking best practices for high availability, including Elastic IP addresses, load balancing, and DNS; leveraging services that inherently are built with high-availability and fault tolerance in mind, including S3, Elastic Beanstalk and more.
The primary requirements for OpenStack-based clouds (public, private or hybrid) are that they must be massively scalable and highly available. A number of interrelated concepts make understanding and implementing HA complex. Implementing HA incorrectly could be disastrous.
This session was presented at the OpenStack Meetup in Boston in February 2014. We discussed these interrelated concepts as a basis for implementing HA, with examples of HA for MySQL, RabbitMQ, and the OpenStack APIs, primarily using Keepalived, VRRP, and HAProxy, to reinforce the concepts and show how to connect the dots.
The A to Z Guide to Business Continuity and Disaster Recovery (Sirius)
Companies often face challenges during business continuity and disaster recovery (BC/DR) planning. One of the key challenges is reaching consensus so that everyone at the company is on the same page. It is therefore important for the business and IT to have a comprehensive discussion about the company's current capabilities, needs, procedures and expectations for BC/DR.
To help with these conversations, we have developed an alphabetical guide and identified 26 important terms. This list is not meant to be exhaustive, but rather a good starting point for this discussion.
Business continuity and disaster recovery are not the same, but they complement each other. Business continuity planning (BCP) and disaster recovery planning (DRP) are necessary for every business. These slides explain how to achieve and maintain both.
What do you do when disaster strikes? In part 9 of our DB2 Support Nightmare series we look at another DB2 disaster scenario and how it was resolved by the experts at Triton Consulting.
Design patterns and plan for developing high available azure applications (Himanshu Sahu)
1. Design Patterns for High Availability of Azure Applications
2. Practical demo of the points to take care of for High Availability from an infrastructure point of view (the points we discussed in the last seminar)
3. Different Patterns for High Availability
3.1 Health Endpoint Monitoring Pattern
3.2 Queue-based Load Leveling Pattern
3.3 Throttling Pattern
3.4 Retry Pattern
3.5 Multiple Datacenter Deployment Guidance
4. Architecture for High Availability of Azure Applications
5. Best practices for developing Highly Available Azure Applications
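As an illustration of the Retry Pattern named in the outline above, here is a minimal Python sketch. It is a generic example under stated assumptions, not code from the deck: the helper name `retry`, the choice of exponential backoff with random jitter, the default attempt count and delay, and the set of exception types treated as transient are all hypothetical.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    operation: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying transient failures with exponential backoff.

    Only exceptions listed in `retry_on` are retried; anything else propagates
    immediately. The last failure is re-raised once attempts are exhausted.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            delay = base_delay * (2 ** attempt)  # 0.5s, 1s, 2s, ... by default
            # Random jitter spreads out clients that would otherwise retry together.
            sleep(delay + random.uniform(0, delay))
    raise AssertionError("unreachable")
```

A caller would wrap a remote call, for example `retry(lambda: fetch_order(order_id))`, where `fetch_order` is a hypothetical function that raises `ConnectionError` on transient network faults. Retries suit idempotent operations; for non-idempotent ones, retrying can duplicate side effects.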
Impact 2013 2963 - IBM Business Process Manager Top Practices (Brian Petrini)
Process efficiency remains the top priority of IT executives around the world. To help you succeed, IBM has collected a number of key top practices that have proven to be necessary ingredients of any success story with the market-leading process management solution, IBM Business Process Manager. Placed in the context of an end-to-end BPM solution lifecycle, this session focuses on key infrastructure, administration, and operational top practices for IBM BPM Standard and Advanced, as distilled by lead IBM practitioners based on experience with projects worldwide. By the end of the session you will have the top tips on setting up development environments, keeping your IBM BPM infrastructure scalable, and performance tuning, as well as access to top intellectual capital in this area.
MIRAI - Managing Industry Restructuring and Adoptions Inquisitively (QuEST Forum)
MIRAI - Managing Industry Restructuring and Adoptions Inquisitively, presented by S.M. Balasubramaniyan - Digital Core Technologies. Predicting the course of business in the short (manageable) or medium (profitable) term helps determine the organization’s success.
VMworld 2013: SDDC IT Operations Transformation: Multi-customer Lessons Learned (VMworld)
VMworld 2013
Bjoern Brundert, VMware
Valentin Hamburger, VMware
Learn more about VMworld and register at http://www.vmworld.com/index.jspa?src=socmed-vmworld-slideshare
CMGT/410 v19
Business Requirements Template
How to Use This Document
This document is a template for creating a Business Requirements Document (BRD); it includes instructions and examples for guidance. As you complete your BRD using the template, only include sections pertinent to your project.
Table of Contents
How to Use This Document
Table of Contents
1. Executive Summary
1.1 Project Overview
1.2 Purpose and Scope of this Specification
2. Product/Service Description
2.1 Product Context
2.2 User Characteristics
2.3 Assumptions
2.4 Constraints
2.5 Dependencies
3. Requirements
3.1 Functional Requirements
3.2 User Interface Requirements
3.3 Usability
3.4 Performance
3.4.1 Capacity
3.4.2 Availability
3.4.3 Latency
3.5 Manageability/Maintainability
3.5.1 Monitoring
3.5.2 Maintenance
3.5.3 Operations
3.6 System Interface/Integration
3.6.1 Network and Hardware Interfaces
3.6.2 Systems Interfaces
3.7 Security
3.7.1 Protection
3.7.2 Authorization and Authentication
3.8 Data Management
3.9 Standards Compliance
3.10 Portability
4. User Scenarios/Use Cases
5. Deleted or Deferred Requirements
6. Requirements Confirmation/Stakeholder Sign-Off
Appendices
Appendix A: Definitions, Acronyms, and Abbreviations
Appendix B: References
Appendix C: Requirements Traceability Matrix
Appendix D: Organizing the Requirements
1. Executive Summary
1.1 Project Overview
Describe this project or product and its intended audiences, or provide a link or reference to the project charter.
1.2 Purpose and Scope of this Specification
Describe the purpose of this specification and its intended audience. Include a description of what is within the scope and what is outside the scope of these specifications.
Example:
In Scope
This document addresses requirements related to Phase 2 of Project A:
· Modification of Classification Processing to meet legislative mandate ABC
· Modification of Labor Relations Processing to meet legislative mandate ABC
Out of Scope
The following items in Phase 3 of Project A are out of scope:
· Modification of Classification Processing to meet legislative mandate XYZ
· Modification of Labor Relations Processing to meet legislative mandate XYZ
(Phase 3 will be considered in the development of the requirements for Phase 2, but the Phase 3 requirements will be documented separately.)
2. Product/Service Description
In this section, describe the general factors that affect the product and its requirements. This section should contain background information, not state specific requirements (provide the reasons why certain specific requirements are later specified).
2.1 Product Context
How does this product relate to other products? Is it independent and self-contained? Does it interface with a variety of related systems? Describe these relationships or use a diagram to show the major components of the larger system, interconnections, and external interfaces.
2.2 User Characteristics
Create gen.
In this session, TESCO will review the Lessons Learned from AMI Deployments and Asset Management Readiness. One of the main objectives of any AMI smart meter initiative is to provide customers with increased visibility, insight, control, and convenience. The AMI smart meter initiative fundamentally transforms the relationship a utility has with its customers by enabling them to become more self-aware of their energy usage. Your organization’s view of assets under management, and how best to manage and control them, will be paramount to the on-going realization of your investment.
Whether you have your own I.T. department or you use an outside provider, there are certain things you should expect AND receive and also understand about I.T.
This short presentation outlines what you should expect and understand to ultimately manage your I.T. like every other department in your company with accountability through key performance indicators and metrics.
The following is a presentation that will help you manage your IT people, processes and technology by Jason Caras, Co-CEO of IT Authorities currently ranked 35th in the world by MSPMentor.net
Building a Business Continuity Capability (Rod Davis)
A detailed overview of the business continuity / disaster recovery planning process. Gives numerous tips for effective execution of plan development. Emphasizes development of a true recovery capability through exercises which reveal weaknesses in the plan or technology leading to improvements.
Optimizing connected system performance md&m-anaheim-sandhi bhide 02-07-2017 (sandhibhide)
Sandhiprakash Bhide presenting at the Smart Manufacturing Innovation Summit/Industry 4.0 event on "Optimizing Connected System Performance and Establishing Tangible Goals for Sensor Use"
Similar to HA & DR System Design - Concepts and Solution (20)
The Business Continuity Conference, 25th October 2023 in Riyadh - Mr. Atiq Bajwa (Continuity and Resilience)
Business Continuity Strategies
What is a Business Continuity Strategy?
Keeping the ISO-22301 definition of Business Continuity in mind, the aim of a Business Continuity Strategy should be:
“To continue the delivery of products and services at predefined capacity during a disruption”
So a Business Continuity strategy should:
Meet the Minimum Business Continuity Objectives (MBCO)
Legal and regulatory requirements
Contractual commitments
Quantity, Quality, time commitments with the customers
Practical
Cost Effective
An effective business continuity strategy should be specific to the needs of an organization
It should be:
Able to meet the MBCO
Practical
Cost effective
Business Continuity Strategies should be regularly reviewed and updated to remain relevant and effective.
A strategy considered effective today may not be effective in 6 months.
The Business Continuity Conference, 25th October 2023 in Riyadh - Nuha Eltinay (Continuity and Resilience)
Building Urban Resilience in Critical Infrastructure
Assets, systems, and networks that governments consider essential for the functioning of a society and economy and deserving of special protection for national security.
The ability of a system, community or society exposed to hazards to resist, absorb, accommodate, adapt to, transform and recover from the effects of a hazard in a timely and efficient manner, including through the preservation and restoration of its essential basic structures and functions through risk management (UNDRR).
The FIVE ICLEI PATHWAYS reflect ICLEI’s approach to achieving a sustainable city as well as local contributions to implementing the goals laid out in international frameworks such as the Sustainable Development Goals. Any of our individual projects or initiatives can be oriented along one or more specific pathways. We also look at how the pathways connect to bring about change in an INTEGRATED way. For example, we consider how nature-based development contributes to resilience, or how to bring equity into low emission development.
Cities need to look at resilience from a systemic governance perspective
Integrated management starts with wide-scale mobilization of support from stakeholders and robust facts and data.
Challenges often lie in the acceleration and upscaling of activities. Individual best practice is easier to achieve; follow-up funding and investment are challenging.
The Business Continuity Conference, 25th October 2023 in Riyadh - Paul Gant (Continuity and Resilience)
The five essential elements of optimising your BC programme through technology:
1. Securing Accurate Data
2. Delivering Programme Compliance
3. Turning Data into Intelligence
4. Enabling Continuous Improvement
5. Positioning in a Risk World
The Business Continuity Conference, 25th October 2023 in Riyadh - David Boll... (Continuity and Resilience)
IT Disaster Recovery – Challenges and Solutions.
What is IT DR?
1. The ability to respond and recover from disruptions to IT infrastructure, networking, systems, equipment and data to support business continuity.
2. Originated from the legacy environment of mainframes where IT was centralised and had a major impact.
3. Further improved to IT DR sites to manage failover:
Cold
Warm
Hot
4. Traditionally, strategies relied on data backup to tape only.
5. The introduction of cloud and SaaS solutions has improved resilience through decentralisation.
Next step cloud-to-cloud DR solutions?
Why IT DR?
IT DR is always critical, yet it is often not given enough focus in BCM programs
Critical component of resilience
IT DR and IT resilience are critical elements of a thorough BCM system and resilience program
High % of real disruptions
IT failures continue to be a leading cause of business continuity disruption.
Examples?
More important than ever
With increasing reliance on IT and digitisation, complexity and new risks, the requirement for IT DR continues to become even more important
Make or break your recovery
A well defined, implemented and exercised IT DR program is essential to the recovery of business delivery of products and services
The Business Continuity Conference, 25th October 2023 in Riyadh - Abdulrahma... (Continuity and Resilience)
Lessons from a Chief Continuity Officer
A Chief Continuity Officer (CCO) is responsible for ensuring that an organization's critical operations continue despite any disruptions or crises.
1. Build a robust business continuity plan.
2. Foster a culture of preparedness.
3. Establish clear roles and responsibilities.
4. Develop strong partnerships.
5. Implement robust technology systems.
6. Continuously assess and mitigate risks.
7. Communicate effectively.
8. Learn from incidents.
Remember, flexibility and adaptability are key in the ever-changing landscape of continuity management. As a CCO, it's essential to stay proactive, be prepared for unexpected events, and continuously improve the organization's ability to recover and thrive in the face of disruptions.
Business Resilience and its components often gather varied points of view and impressions from practitioners, champions, consultants, and other related stakeholders.
Over time, a few misconceptions have taken hold, and they often turn out to be counterproductive to the vision and goal of such programs.
CREATING should eventually lead to putting in place a comprehensive Program covering all phases of the full BCM Lifecycle – Plan, Do, Check and Act
MAINTAINING involves performing the activities to keep the BCM Program appropriate and relevant for the upcoming future – including Improvement. This covers:
Almost all BCM standards and guidelines make it mandatory to build a BCM culture. This is best done by ensuring ongoing and regular emphasis on the concept of Business Continuity, and its importance to the organization.
Business Continuity Compliance
Cycle
Regulatory
Internal
Third party
Industry Compliance
SecOps
Review and maintain
Regulatory Compliance
Meet the specific compliance requirements set by SAMA, NCA, CITC, etc.
Industry Specific Compliance
For BFSI – SAMA, NCA
For Telco – CITC, NCA
For hospitality - STA, NCA
Third Party
ISO 27001, 27021
COSO, NIST, NESA
HIPAA, 27005 RISK
Internal
Compliance with internal policies, procedures, and standards
InfoSec, Financial, HR, IT
SecOps
Adherence to specific cyber security first-line-of-defense policies
Vulnerability assessment.
Identification of BCM-related risks and compliance with the remediation
BCM Maintenance Plan
This phase maintains the BCP in a constant ready state. The maintenance process of a BCMS is constant and dynamic.
A crisis is an inherently abnormal, unstable, and complex situation that represents a threat to the strategic objectives, reputation or existence of an organization.
(ISO 22361 Crisis Management Guidelines)
Crisis management is the set of coordinated activities to lead, direct and control an organization with regard to a crisis.
(ISO 22361: Crisis Management Guidelines)
Cyber security and IT resilience are a journey, not a destination, and we need to consider how business continuity is integrated with them.
This is becoming more and more prevalent at Board level and is having significant impacts, particularly on sectors.
Enterprise resilience goes beyond organizational and operational resilience.
It indicates an organization's ability to:
Dynamically plan, prepare, and understand risks and critical functions;
Anticipate disruptions and potential downstream impacts;
Respond progressively in a coordinated, organized, and controlled manner; and
Recover, adapt, and evolve to improve future responses.
Enterprise resilience encompasses cyber and physical threats across all geographies.
“The best way to get management excited about a resiliency plan is to have a fire in one of your production data centers.”
Presented by Daman Dev Sood, Continuity & Resilience (CORE)
Introduction:
Over 33 years in the industry
Over 15 years in BCM and related domains
National and Global Winner of the BCI Awards
AFBCI
Mix of experience as Practitioner, Trainer, and Consultant
BCI Approved Instructor
Presented by Dhiraj Lal
About Continuity & Resilience (CORE)
Consulting Services (ISO 22301 Certified)
Cyber Security
Business Continuity Management
Crisis Management
IT Disaster Recovery
Information Security
Risk Management
Training Services
NCEMA developed Training (we are trainers for the NCEMA courses at GCAS, NCEMA licensed training entity)
CORE is an approved Global Training partner for the UK-based Business Continuity Institute, licensed to conduct BCI trainings anywhere in the world
Notification and Automation Tools
CORE acts as an enabler between the partner & client by providing support for:
Gather requirements
Shortlist Vendors
Subject matter expertise for tool selection
Perform Vendor Demos
Tool installation & implementation
Support for BC, ITDR & Notification
Assistance during tool testing
Presented by Kashish Jhamb, Cityinnovates
What’s a Social Media Crisis? CRISIS? Really?
If there’s a high volume of incoming social media messages on one particular topic or negative comments, chances are you have a social media crisis on your hands.
A communications crisis can strike at any time. It could be a faulty product, a lousy campaign, or a slip of the tongue from someone higher up.
It doesn’t matter the industry you’re in, or how popular you’ve been to this point. Sometimes, it just happens.
Waiting for a social media crisis to blow over is never an option. If you ignore it, it will likely get worse. Social media can be an asset in a crisis when used correctly, not an extra problem.
How to identify a Crisis on Social Media
When the public knows more about the issue than your company does and voices it on social media, that is your first sign of a social media crisis.
If you start receiving a series of negative reviews on a particular product or service, it is a sign of a social media crisis.
If you get more than 10 negative mentions per hour for more than three consecutive hours, it is a sign of a social media crisis.
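The mention-rate rule above (more than 10 negative mentions per hour, for more than three consecutive hours) can be sketched as a simple check over hourly counts. This is a minimal illustration, not a tool from the presentation: the function name and parameters are hypothetical, and it assumes "more than three consecutive hours" means at least four consecutive hourly buckets over the threshold.

```python
from typing import Sequence


def is_social_media_crisis(
    hourly_negative_mentions: Sequence[int],
    mention_threshold: int = 10,
    hour_threshold: int = 3,
) -> bool:
    """Return True if negative mentions exceed `mention_threshold` per hour
    for more than `hour_threshold` consecutive hours.

    `hourly_negative_mentions` holds one count per hour, oldest first.
    """
    run = 0  # length of the current streak of hours above the threshold
    for count in hourly_negative_mentions:
        run = run + 1 if count > mention_threshold else 0
        if run > hour_threshold:
            return True
    return False
```

For example, `is_social_media_crisis([11, 12, 15, 20])` is true because four consecutive hours exceed 10 mentions, while three such hours, or a streak interrupted by a quiet hour, do not trigger the rule.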
Presented by Ramesh Ramani (LRQA)
AGENDA
Introduction-BCMS and ISMS
International Standards, UAE Regulations (NCEMA, ADSIC, NESA, ISR, GDPR). Dubai Data Law
PDCA Cycle
Common Factors-BCMS and ISMS
Organisational Considerations
Joint Project Management
Where this will work?
Where this will not work
Q&A
Presented by -AWS AL KHANJARI
A serious threat which, under time pressure and highly uncertain circumstances, necessitates making critical decisions.
A Crisis Communication Plan outlines the procedures for collecting conveying information to interested parties during or immediately following an emergency or crisis.
Disaster and disruptive business incidents push people and organisation to their limits, and one of the first impacted elements are communication systems.
HA & DR System Design - Concepts and Solution
1. Continuity and Resilience (CORE)
ISO 22301 BCM Consulting Firm
Presentations by our partners and
extended team of industry experts
Our Contact Details:
INDIA
Continuity and Resilience
Level 15, Eros Corporate Tower
Nehru Place, New Delhi-110019
Tel: +91 11 41055534 / +91 11 41613033
Fax: +91 11 41055535
Email: neha@continuityandresilience.com
UAE
Continuity and Resilience
P. O. Box 127557
Abu Dhabi, United Arab Emirates
Mobile: +971 50 8460530
Tel: +971 2 8152831
Fax: +971 2 8152888
Email: info@continuityandresilience.com
2. HA & DR Design Concepts
S Seshadri
Head – IT DR & Service Management
Continuity and Resilience
10th Feb, 2014
Dubai
3. Outage Categorization
• Service failures that should/need not be known to end users
need ‘fault protection’ – the operation of such services will be
continuous despite failure scenarios
• Short interruptions (within a few hours) are referred to as
‘minor outages’
• Longer interruptions, when end users’ business services get
delayed for longer durations, are termed as disaster situations
or ‘major outages’
4. Key Questions
1. Which systems should ‘never’ fail – we may need Fault Tolerant
systems in their place
2. What failures should be handled transparently, where an outage
must not occur? Against such failures we need fault protection.
3. How long may a short-term interruption be that happens once a day,
once a week, or once a month? Such interruptions are called minor
outages.
4. How long may a long-term interruption be that happens very seldom
and is related to serious damage to the IT system? For instance, when
will this cause a big business impact, also called a major outage or
disaster?
5. How much data may be lost during a major outage? And in which state
– persistent or ephemeral…
6. What failures are deemed so improbable that they will not be
handled, or what failures are beyond the scope of a project?
5. Business Issues & Cost of IT Outage
• IT Fault Protection has to be driven by business
considerations
• Business Continuity is the overall goal
• Business imperatives manifest through BIA/RA and
MTPoD/RTO/RPO
• IT Outage is not the real issue, but the business
consequences are
• IT Outage affects revenues & costs adversely
• Direct Costs – repairs, penalties, lost revenue
• Indirect Costs – lost & additional work hours
6. Cost Vs Benefit
• IT Recovery has extensive cost implications – both in terms of
Capex and Opex
• Strategies developed should be cost effective
• ‘Technology for the sake of Technology’ approach should be
completely avoided
• Strategies should, as far as possible, be able to address
disruptions and impacts collectively
• Organizational objectives and risk appetite should direct
recovery strategies
• Legal, contractual and regulatory aspects play a major role
(SOX, SAS 70, Basel II/III, etc.)
7. IT Service Outage
• Importance of IT Services depends on
– Business relevance
– Revenues
– Functionality that they enable
– Amount of damage due to the outage
– Any regulatory aspect that demands the service
• Outage Categorization is dictated by the importance of the
service and hence the significance of its failure
8. High Availability
• High availability is the characteristic of a system to protect
against or recover from minor outages in a short time frame
with largely automated means.
• HA has 3 essential features
– Outage categorization is ‘minor’- we need to envisage
potential failure scenarios for the service and the minor
outage requirements for them - robustness
– System category should involve Mission Critical & Business
Important and Business Foundation processes which need
to be recovered within a very short time – RTO/RPO
– Component (SPoF) level protection which will facilitate
automatic recovery – redundancy
• HA features are normally built within the primary data center
and data replication is synchronous
9. Continuous Availability
• Continuous Availability is the highest point of High Availability,
wherein, every component failure is protected against, and no ‘after
failure recovery’ takes place
• These are known as Fault Tolerant systems, that provide automatic,
high-speed ‘failover’ in the case of h/w or s/w failures
• They have ‘internal multi-computer systems architecture’ that have
no shared central components, including memory
• Tandem’s ‘non-stop’ systems and Stratus’s fault tolerant computers
are examples of this
• These are used by the leading stock exchanges globally (NSE in India
uses Stratus and BSE, Tandem), and by banks for their ATM related
transaction processing
• These systems scale extremely well to the largest commercial
workloads
• These systems were introduced originally by Airbus for their A-320
planes for on-board flight controls in their long duration flights
10. HA Components
Essential ingredients of High Availability are:
• Availability
• Reliability
• Serviceability
We will discuss the above three in the following
slides.
11. Availability & Metrics
• Availability – How long a service or system component is
available for use and the features that help the system to stay
operational despite occurrence of failures, eg. NIC, Mirrored
Disks, Redundant Power Supply
• Availability = uptime / (uptime + downtime)
• Downtime will include scheduled downtime also
• Elapsed time can be measured as wall clock time
• Availability can be expressed in absolute numbers (79 hrs out
of 80 hrs) or as a percentage (98.75%)
• Availability = MTBF / (MTBF + MTTR)
– MTBF: Mean Time Between Failures
– MTTR: Mean Time To Repair
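The two availability formulas on this slide can be checked with a few lines of Python. This is a minimal sketch: the function names and the sample figures (a 720-hour month with one hour of downtime, and an MTTR of 8 hours) are illustrative assumptions, not values from the deck.

```python
def availability_from_times(uptime_hours: float, downtime_hours: float) -> float:
    """Availability = uptime / (uptime + downtime)."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total elapsed time must be positive")
    return uptime_hours / total


def availability_from_mtbf(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability = MTBF / (MTBF + MTTR), the same ratio over one failure cycle."""
    return availability_from_times(mtbf_hours, mttr_hours)


if __name__ == "__main__":
    # Assumed example: 719 hours up and 1 hour down in a 720-hour month.
    print(f"Measured availability: {availability_from_times(719, 1):.4%}")
    # Assumed example: MTBF of 300,000 hours with a hypothetical MTTR of 8 hours.
    print(f"Predicted availability: {availability_from_mtbf(300_000, 8):.4%}")
```

Both functions divide by the full elapsed time, so scheduled downtime must be included in the downtime figure if it is to count against availability.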
12. Reliability & Metrics
• Reliability is a measure of ‘fault avoidance’
• Refers to the ‘probability that a system will be available over a
time interval T’
• MTBF is a measure of Reliability
• Annual Failure Rate (AFR) is the inverse of MTBF expressed in years
• Reliability features help to ‘prevent’ and ‘detect’ failures
• Hardware reliability has improved tremendously over the last 30
years, and components are highly resilient nowadays
Component      MTBF (Hours)   MTBF (Years)   AFR (per year)
Disk Drive     300,000        34             0.0292
Power Supply   150,000        17             0.0584
Fan            250,000        28             0.0350
NIC            200,000        23             0.0438
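The AFR column can be reproduced from the MTBF hours. The sketch below assumes a year of 8,760 hours (365 x 24) and a constant failure rate; the function names are illustrative.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hours (assumption: non-leap year)


def mtbf_years(mtbf_hours: float) -> float:
    """Convert an MTBF given in hours to years."""
    return mtbf_hours / HOURS_PER_YEAR


def annual_failure_rate(mtbf_hours: float) -> float:
    """AFR is the inverse of the MTBF expressed in years."""
    return HOURS_PER_YEAR / mtbf_hours


# MTBF figures taken from the table on this slide.
COMPONENT_MTBF_HOURS = {
    "Disk Drive": 300_000,
    "Power Supply": 150_000,
    "Fan": 250_000,
    "NIC": 200_000,
}

if __name__ == "__main__":
    for name, hours in COMPONENT_MTBF_HOURS.items():
        print(f"{name:<13}{mtbf_years(hours):6.1f} years   AFR {annual_failure_rate(hours):.4f}")
```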
13. Serviceability
• Measurement that expresses how easily and quickly
a system is serviced and repaired
• The lower the planned service time, the higher is the
availability
• Planned serviceability goes into the architecture as a
design objective
• Actual service time should be lower than planned
service time
• These clauses have to be carefully built into the
Service Level Agreements with IT vendors
• Murphy’s Law: Anything that can possibly go wrong,
does
14. HA/DR Strategy - Aspects
• Data – what is the architecture concerned with
• Function – how is the data worked with
• Location – where is the data worked with
• People – who works with the data and achieves the
functionality
• Time – when is the data processed
Each of the above aspects is run through 3 levels of abstraction
• Objectives – What will this achieve vis a vis org objectives
• Conceptual Model – Realization of the objectives on a
business process level
• System Model – Logical data model and the application
functions that must be implemented to realize the business
concepts
15. HA/DR Framework (Zachman)
Data (What)
– Objectives: Business Continuity / IT Service Continuity
– Conceptual Model: Availability of mission-critical and important business services
– System Model: ICT categories, dependency diagrams
Function (How)
– Objectives: Map biz processes to IT services, RTO, RPO, SLA
– Conceptual Model: ITIL processes, IT processes, projects
– System Model: Design patterns – RAS, redundancy, backup, replication, virtualization
Location (Where)
– Objectives: Internal (IT), Outsourced
– Conceptual Model: Data Center, Disaster Recovery Center
– System Model: All systems, all categories
People (Who)
– Objectives: Biz process owner
– Conceptual Model: CIO/IT dept
– System Model: IT PM, Architect, System Engineers, System Administrators
Time (When)
– Objectives: Implementation Plan
– Conceptual Model: Outage scenarios, categories
– System Model: Failure/Change/Incident/Problem/Disaster
16. HA/DR System Design
• System Model discussed earlier is the core of this activity
• ‘What’ and ‘How’ of the System Model will lay the foundation
for HA/DR System Design
• Protection against outages of computers, systems and
databases is in scope for HA
• Protection against infrastructure/building/city outages and
user/administrative errors is in scope for DR
• Sound processes, solid architecture, careful engineering and
an eye for detail are the hallmarks of a good HA/DR system
design
18. HA/DR Scoping
• Take into account regulatory aspects (SOX, SAS 70, Basel II)
• Identify the key applications (from business BIAs)
• Check out the various ICT environments required by these
applications (IT BIA)
• Identify the dependencies
• Carefully identify and document the component categories
that are not required – scope exclusions
• Prepare preliminary system scope – list of component
categories required for HA/DR
• Identify failure scenarios for each of these component
categories
• Document the failure scenarios that are outside the scope
• The component categories and the failure scenarios will
constitute the scope of HA/DR
19. Redundancy & Replication
• Redundancy is the ability to continue operations in the case of
component failures
• Recovery is done through ‘managed component repetition’
• Eliminating ‘single points of failure’ is the goal
• Just adding a second component is not enough
• Replicated component has to be ‘managed’ to take over in
case the original component fails (failover)
• This ‘management’ can be automated or manual
• Replication of the ‘state’ of the component is crucial
• Replication may be a duplicate part, an alternate system (HA)
or an alternate location (DR)
• 100% redundancy through replication is very expensive and
difficult to achieve
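Why a second component helps, and why the thing that manages the failover must not itself become a single point of failure, can be shown with simple availability arithmetic. This is a sketch under strong assumptions that the slide does not state: component failures are independent, and failover is instantaneous and always succeeds. The function names and the 99% figures are illustrative.

```python
from math import prod


def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of redundant copies: the group fails only if every copy fails."""
    return 1.0 - (1.0 - component_availability) ** copies


def series_availability(availabilities) -> float:
    """Availability of a chain: every element must be up at the same time."""
    return prod(availabilities)


if __name__ == "__main__":
    single = 0.99  # assumed availability of one component
    mirrored = parallel_availability(single, 2)
    print(f"one component:        {single:.4%}")
    print(f"two redundant copies: {mirrored:.4%}")
    # If the failover manager is itself a 99%-available single point of
    # failure, it sits in series with the redundant pair and caps the result.
    print(f"pair behind a SPoF:   {series_availability([mirrored, 0.99]):.4%}")
```

Under these assumptions, the redundant pair is far more available than one component, but the overall figure drops back below the single-component level once an unprotected manager is placed in front of it.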
20. Data Replication
• Redundancy for Disk Drives means ‘data replication’ and is hence
crucial
• Redundant disks provide multiple storage of data and/or OS
• Data disks carry one of the highest risks
• OS disks usually house the root file system and swap space
• Data Replication can be ‘synchronous’ or ‘asynchronous’
• RPO considerations should dictate data replication approach
• For very low or nil RPO, latency in data replication may not be
tolerated (synchronous vs asynchronous)
• Bandwidth considerations also impact replication
• Data deduplication, along with data compression, has in recent
times eased many of the headaches involved with
data replication
• Two main types of data replication
– Host based/Storage based
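How the RPO drives the choice between synchronous and asynchronous replication can be sketched as a small decision helper. The model is deliberately simplified and is an assumption, not something stated in the deck: synchronous replication is treated as losing no committed data, and asynchronous replication as losing at most the current replication lag. All names are hypothetical.

```python
def worst_case_data_loss_seconds(mode: str, replication_lag_seconds: float) -> float:
    """Simplified model of how much committed data can be lost at failover."""
    if mode == "synchronous":
        return 0.0  # a write is acknowledged only after the replica has it
    if mode == "asynchronous":
        return replication_lag_seconds  # writes still in flight are lost
    raise ValueError(f"unknown replication mode: {mode}")


def choose_replication_mode(rpo_seconds: float, replication_lag_seconds: float) -> str:
    """Prefer asynchronous replication when its worst-case loss fits the RPO."""
    loss = worst_case_data_loss_seconds("asynchronous", replication_lag_seconds)
    return "asynchronous" if loss <= rpo_seconds else "synchronous"


if __name__ == "__main__":
    # Assumed figures: a 30-second replication lag over the WAN link.
    print(choose_replication_mode(rpo_seconds=900, replication_lag_seconds=30))
    print(choose_replication_mode(rpo_seconds=0, replication_lag_seconds=30))
```

The helper ignores bandwidth and the write latency that synchronous replication adds, both of which the slide lists as further considerations.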
21. Virtualization
• Virtualization, as a concept, was demonstrated in the 1960s,
when IBM’s Thomas J Watson Research Center simulated
‘multiple pseudo machines’ on a single 7044 mainframe (the M44/44X project)
• Virtualization allows multiple operating system (OS) instances
to run concurrently on a single computer.
• It is a means of separating hardware from a single OS, by
“inserting an abstraction layer” into the software stack.
• Each ‘Guest’ OS is managed by a Virtual Machine Monitor.
• Virtualization Software can also collect a number of separate
resources and “pool” them, even if the devices or resources
remain in separate physical locations.
• The end goal is sharing the resources and capabilities flexibly,
under software control.
• The part of the virtualization package that lets administrators
interact with and control the VMs is referred to as the Virtual Machine
Monitor (VMM) or Hypervisor software.
22. Virtualization of Resources
• Virtualized resources are supplied in logical units to application programs,
freeing them from reliance on specific hardware
• Virtualization of Servers allows businesses to consolidate the workloads
running on multiple servers onto just a few
• Storage Virtualization hides the physical storage from applications on host
systems and presents a simplified (logical) view that lets them
reference the storage resource by its common name,
whereas the actual storage could sit on a complex, multilayered,
multipath storage network.
• RAID is an early example of storage virtualization.
• Virtual CPU is one of the oldest concepts, which has enabled
multiprocessing capability, handled by OS
• Virtual Memory is as old as Virtual CPU – again handled by the OS as part
of Virtual Memory Management
• Working within a virtualized environment may add some options and new
flexibility to your HA and DR plans.
23. Storage Virtualization
• With regard to storage, the objective is to bring together multiple
storage devices under unified command, whether they are from the
same manufacturer or not, and without regard for their physical
locations.
• Once accomplished, the now-unified band of storage systems can
be treated as a single, huge storage capacity that can be
provisioned, managed, backed up to tape, and even replicated to
offsite disaster recovery (DR) or high availability (HA) sites, with
greater visibility, synchronized automation, and reduced
management labour.
• Even archiving, multi-level storage, and information lifecycle
management (ILM) efforts can be made simpler, with older, slower,
or cheaper storage units provisioned to handle the near-line or
archival storage while newer, faster devices handle the current
production processes.
24. Host Clustering
• Increasing availability through redundancy on the host level
by taking several hosts and using them to supply a bunch of
services, where each service is not strictly associated with a
specific computer
• Host Clustering addresses
– Hardware errors
– OS errors
– Application errors
• Failover clusters allow a service to migrate from one
host to another in the case of an error. They are the most
widely used technology for high availability.
• Load-balancing clusters run a service on multiple hosts
from the start and handle outages of a host; they are more
relevant for performance than HA.
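The failover idea, where a service is not tied to one computer and moves to a surviving host on error, can be illustrated with a toy model. This is a hypothetical sketch only: real cluster software also handles heartbeats, fencing, shared storage and service start/stop scripts, none of which is modelled here, and the class and host names are invented.

```python
class FailoverCluster:
    """Toy failover cluster: services are placed on hosts, not bound to them."""

    def __init__(self, hosts):
        self.healthy = {host: True for host in hosts}
        self.placement = {}  # service name -> host currently running it

    def start(self, service, host):
        if not self.healthy.get(host, False):
            raise ValueError(f"{host} is not a healthy cluster member")
        self.placement[service] = host

    def fail_host(self, failed_host):
        """Mark a host as failed and migrate its services to a healthy host."""
        self.healthy[failed_host] = False
        survivors = [host for host, ok in self.healthy.items() if ok]
        moved = []
        for service, host in list(self.placement.items()):
            if host != failed_host:
                continue
            if not survivors:
                raise RuntimeError("no healthy host left in the cluster")
            self.placement[service] = survivors[0]
            moved.append(service)
        return moved


if __name__ == "__main__":
    cluster = FailoverCluster(["node-a", "node-b"])
    cluster.start("database", "node-a")
    print("migrated:", cluster.fail_host("node-a"))
    print("database now runs on", cluster.placement["database"])
```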
25. Middleware
• Generally considered to be the layer between the OS and the
applications
• They are independent of applications but carry application-
specific configuration and are used by multiple applications
• Database Servers, Web Servers, Application Servers,
Messaging Servers are some examples
• HA for these will include product specific clustering, data
replication, and even session state replication
• Properly configured failover cluster sufficiently integrated
with the DB Server provides HA
• Redo log file shipping (asynchronous) with commits delayed
by the RPO will provide the best DR
• HA for Web Servers and Messaging Servers is achieved
mostly through Load-balancing Clusters (stateless)
26. HA for Applications
• Application HA is the eventual goal
• Application categories – Off the Shelf, Bought & Customized,
In-house Built
• Failover cluster is an approach most commonly adopted for all
categories of applications
• Applications touch the nerve center of all the following
systems:
– Development
– Acceptance/Integration Test
– Staging & Release
– Production
– Disaster Recovery
• Suitable precautions must be taken during the coding and testing
stages to ensure HA
27. Networks
• Network is the backbone of ICT as it provides the linkages and
ability to communicate between component categories
• Various types of networks are
– LAN, VLAN, MAN, WAN, VPN, Intranet, Extranet, Internet
• And there are n/w components that help build and run the
networks – NIC, switches, routers, hubs, firewalls etc.
• Connectivity is the most important element of networks
• Data management on the network is done through encoding, data
compression & encryption/decryption
• Power supply, Heating, Ventilating & Air Conditioning (HVAC) are
two other important considerations
• It is absolutely essential to provide redundancy at both the
network and component levels for network HA
• Generally, there is no pay-load based state for any of these – hence
two or more devices would ensure HA
28. Data Back up and Restoration
• A major requisite for HA & DR
• Management of backed up data is equally important
• Restoration of data must work effectively
• Automated mechanisms exist
• System/file/database backups are the key
• Full or incremental backup
• Consistency of the data state is crucial
• Checkpoint functionality is useful in this context
• Storage and handling of backup media is very significant
• Remote (including at the DR site) storage of backups including
Tape Vaulting should be institutionalized
• Testing/recycling and proper maintenance of backup media
• Backup on failover clusters should distinguish between
physical and logical hosts in the cluster
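For the full versus incremental distinction above, a restore has to start from a full backup and then apply the incrementals that follow it. The sketch below picks that chain for a given restore point. It assumes each incremental captures changes since the previous backup of any kind; the `Backup` type and the integer timestamps are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    taken_at: int  # hypothetical timestamp, e.g. day number
    kind: str      # "full" or "incremental"


def restore_chain(backups, restore_point):
    """Return the backups to apply, in order, to reach the restore point."""
    eligible = sorted(
        (b for b in backups if b.taken_at <= restore_point),
        key=lambda b: b.taken_at,
    )
    full_positions = [i for i, b in enumerate(eligible) if b.kind == "full"]
    if not full_positions:
        raise ValueError("no full backup exists at or before the restore point")
    # Start at the most recent full backup and keep every later incremental.
    return eligible[full_positions[-1]:]


if __name__ == "__main__":
    history = [
        Backup(1, "full"),
        Backup(2, "incremental"),
        Backup(3, "incremental"),
        Backup(4, "full"),
        Backup(5, "incremental"),
    ]
    for backup in restore_chain(history, restore_point=3):
        print(backup)
```

A missing or unreadable link in the chain makes every later incremental unusable, which is one reason the slide stresses testing and maintenance of backup media.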
29. HA & DR – Positioning
• HA and DR are two sides of the same coin
• Redundancy, Replication and Robustness are the key
characteristics of both HA & DR
• HA focuses on fault protection and is built on mostly
automated recovery techniques for minor outages
• HA is not built for environmental disasters like floods, fire,
earthquake and manmade incidents like terrorist attacks,
human errors of huge magnitude
• The above additional scenarios and major outages lead to the
need for DR, that focuses only on recovery
• DR is also associated with a large part of manual recovery in
terms of Emergency Management and Damage Assessment &
Recovery apart from IT Recovery
• When the primary data center is unavailable, migration to DR
site will be the only option
30. Disaster Recovery
• Disaster recovery is the ability to continue with services in the
case of major outages, often with reduced capabilities or
performance.
• Disaster recovery handles the disaster when either a single
point of failure is the defect or when many components are
damaged and the whole system is rendered non-functional.
• Operations cannot be resumed on the same system or at the
same site. Instead, a replacement or backup system, usually
located at another place is activated and operations continue
from there.
• Disaster recovery often restores only restricted resources and
thus restricted service levels.
• Continuation of service also does not happen instantly, but
will happen after some outage time.
31. DR in Context
• IT DR is activated when the likely recovery time exceeds the
lowest RTO and data loss is expected
• IT recovery will be limited only by the levels of service agreed
with the business owners
• IT DR activities will be carried out from the DR site, which should
be fully equipped to handle IT services up to the agreed levels
• Scaling up the IT services in due course of time will generally
be outside the purview of DR Planning
• Agreed levels of IT services are resumed in the DR site using
the infrastructure and back up data/tapes there
• The roles of primary and DR sites are interchangeable but not
in the strict sense of HA
• In the above scenario, both primary and DR sites will be
functional, even though they may cater to different business
activities/IT services
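The activation rule at the top of this slide (likely recovery time above the lowest RTO, with data loss expected) can be written as a one-line check. This is only a sketch of that rule as worded; in practice the decision is made by people following the invocation procedure, and the service names and RTO figures below are invented for illustration.

```python
def should_activate_dr(estimated_recovery_hours, rto_hours_by_service, data_loss_expected):
    """Apply the slide's rule: recovery estimate above the lowest RTO, plus expected data loss."""
    if not rto_hours_by_service:
        raise ValueError("at least one service RTO is required")
    lowest_rto = min(rto_hours_by_service.values())
    return bool(estimated_recovery_hours > lowest_rto and data_loss_expected)


if __name__ == "__main__":
    # Hypothetical RTOs, in hours, agreed with the business owners.
    rtos = {"payments": 4, "email": 24, "reporting": 72}
    print(should_activate_dr(6, rtos, data_loss_expected=True))
    print(should_activate_dr(2, rtos, data_loss_expected=True))
```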
32. DR and the Cloud
• Cloud is the latest buzzword in outsourced business models
• Leveraging cloud model can optimize DR procedures
• Reduces the high cost of maintaining stand-by sites
• Cloud service providers normally have state of the art systems
and infrastructure, huge bandwidth, exacting security setup,
apart from complying with relevant ISO guidelines and
industry best standards.
• According to a recent Aberdeen study, DR is the leading
‘use case’ for cloud
• The key advantages are recovery times, virtualization and
multi-site availability
• Concerns regarding security, identity and compliance with
various regulations do exist as the cloud model matures
• With data volumes growing at the rate of 10 times every 5
years, cloud computing is likely to see a huge growth
33. DR in the Supply Chain
• Supply Chain is basically a delineation of dependencies
depicting the various actors in the chain of a product or
service from a vendor till reaching a consumer
• IT DR dependencies are manifold – internal customers, ICT
equipment, external vendors and service providers, IT staff,
etc.
• DR planning should judiciously take into account the inherent
risks in the supply chain and provision suitable mechanisms to
handle them effectively, so that the DR goal does not derail
• Typically, if Data Center support is outsourced, there is a huge
dependence on the Service Provider – timely availability of
people, spares, replacements etc.
• Supply chain glitches can emerge from as innocuous a thing as
consumables supplies
Division of our complete problem into the above layers enables us to think about potential problems and their solution separately
This separation builds the base for HA/DR scoping
Eliminations have to be documented and sign off obtained
All the ‘scope exclusions’ must be recognized during risk management and might need to be handled in separate projects
‘State’ does not just refer to data
Data state in DR situations generally differs due to the accepted RPO
These refer to files or registry entries in the case of s/w components and firmware releases in the case of h/w
Disks are redundant via volume manager
Primary and secondary databases are redundant via system administrator
NIC is redundant through OS (multipath configuration)
In the above cases, VM/SA/MC could be the SPoF
Through virtualization, Hewlett Packard consolidated no fewer than 86 of its own data centers to just three.
The actual server counts and consolidation ratios vary; a ratio of 10:1 is not uncommon.
H/w: No redundancy/redundant component had an error/redundancy activation did not work
OS: Process scheduling error/processes hanging/memory management deficiency/network traffic glitches/file system corruption
Apps: Memory leaks due to applications getting into endless loop/deadlocks in communication processes/other software errors
F/O clusters – active/active; active/passive
Suitable for application with stateful data
Eg.: No application must use machine specific configuration of the physical host (as it will not recognize a virtualized host or a cluster node)
On exceptional conditions, apps must contain start/stop/restart actions
Long batch jobs should have check points for validation at restart
Applications need to be designed for a cluster environment
Tiered development approach for applications – UI (front end), business logic (middleware), database (backend)
Applications built from scratch should incorporate fault-tolerance requirements
Code quality is of paramount importance
Testing – function point, non-functional properties and end-to-end
Open Systems Interconnection Reference Model provides seven layers of abstraction for networks – Physical, Datalink, Network, Transport, Session, Presentation, Application
Popular network protocols are Ethernet, TCP/IP, Token Ring, Frame Relay, ATM, FC, etc.
Network Outage is generally considered a major outage
WAN outages are major and almost impossible to prevent totally – the question remains whether multiple connections are independent or share some SPoF
Typically the ‘last mile’ and the ‘proverbial digger’ syndrome
WAN Virtualization dangers
ISPs - SLAs – penalties – goes on and on
Other network based services like DHCP, DNS, LDAP, AD, Email, Print etc have to be redundant depending on the need
SOPs should be in place in details for all backup and restoration processes
Specific personal responsibilities should be assigned for backup duties
Internal and external cloud
Private and Public cloud