SharePoint 2013 DR solution overview

SHAREPOINT 2013 DR SOLUTION
(WARM STAND-BY)
An overview of a workable solution
for mid-size Enterprises
An example of implementation and
DR Documentation content
Emilio Gratton – ICT Project Manager
EG IT Services

OUTLINE
 Business Requirements
 Recovery Time Objective (RTO) and Recovery Point Objective (RPO)
 Prerequisites
 Activation Scenarios
 Schedule of events (workflows)
 Logical System overview
 Escalation matrix
 DR procedures
 Health checks
 DR validation exercise
 Event Summary and logs

BUSINESS REQUIREMENTS
 100% availability (24/7) of personal files
 Same availability (and permitted downtime) as the hosting Data Centre
 Service restored on the DR Data Centre in read/write mode within 30 minutes
of a major planned or unplanned Data Centre outage
 Service restored on the DR farm in read-only mode within 30 minutes of a
planned farm outage (for example, a service pack release)

RECOVERY TIME OBJECTIVE (RTO) AND
RECOVERY POINT OBJECTIVE (RPO)
 The RTO is the agreed time duration between a failure and the restoration of
service. In this solution, it has been defined as 30 minutes.
 The RPO has been defined as a minimum level of service, consisting of:
 Web Front End server restored (SharePoint 2013 main page accessible with links
operational)
 Search service restored (a query successfully displays results related to internal
documentation)
 Personal page displaying all links and documents
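The 30-minute RTO can be checked mechanically during an exercise. The following is a minimal sketch, assuming the failure and restoration timestamps are recorded by hand or by monitoring; the function name `rto_met` and the sample dates are illustrative and do not come from the original deck.

```python
from datetime import datetime, timedelta

# Agreed RTO from this solution: 30 minutes between failure and restoration.
RTO = timedelta(minutes=30)


def rto_met(failure_at: datetime, restored_at: datetime, rto: timedelta = RTO) -> bool:
    """Return True when service was restored within the agreed RTO."""
    if restored_at < failure_at:
        raise ValueError("restoration cannot precede the failure")
    return restored_at - failure_at <= rto


# Illustrative timestamps only (hypothetical exercise).
failure = datetime(2014, 1, 1, 9, 0)
print(rto_met(failure, datetime(2014, 1, 1, 9, 25)))  # restored after 25 minutes
print(rto_met(failure, datetime(2014, 1, 1, 9, 45)))  # restored after 45 minutes
```

A restoration exactly 30 minutes after the failure counts as meeting the objective in this sketch; tighten the comparison if the agreed wording is "less than 30 minutes".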

PREREQUISITES 1/2
 Data Centres connected with fast and reliable dedicated link
 Hosts are virtualized to ensure host high availability (HA)
 Windows servers are load balanced, SQL Servers are clustered
 Infrastructure patch level consistent across DCs (SCCM to monitor and
report)
 DR farm is kept updated as per latest updates applied to the Production
farm
 Customized code and solutions are kept updated in both Farms
(blogs.msdn.com/..../managing-custom-solutions-for-disaster-recovery-sharepoint-farms )
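Keeping patch levels and custom solutions consistent across both farms comes down to comparing two inventories. The sketch below assumes each farm's installed updates have already been exported (for example from an SCCM report) into a plain set of identifiers; the function name `patch_drift` and the `UPDATE-00x` identifiers are hypothetical placeholders, not real update names.

```python
from typing import Dict, Set


def patch_drift(production: Set[str], dr: Set[str]) -> Dict[str, Set[str]]:
    """Compare two update inventories and report what each farm lacks."""
    return {
        "missing_on_dr": production - dr,          # applied to Production only
        "missing_on_production": dr - production,  # applied to DR only
    }


# Hypothetical inventories for illustration.
prod_updates = {"UPDATE-001", "UPDATE-002", "UPDATE-003"}
dr_updates = {"UPDATE-001", "UPDATE-002"}

drift = patch_drift(prod_updates, dr_updates)
print(drift["missing_on_dr"])
print(drift["missing_on_production"])
```

Two empty sets mean the farms are aligned; anything else is a candidate action before the next DR exercise.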

PREREQUISITES 2/2
 Local DR DBs are maintained using the following approach:
 The SharePoint Admin manually maintains a local copy of the Configuration and
Administrative DBs on the DR farm
 This includes all DBs that are required at the DR farm but are not supported by
SQL Server AlwaysOn Availability Groups with asynchronous commit for disaster
recovery
 A full list of the supported high availability and disaster recovery options for
SharePoint 2013 databases is located here: http://technet.microsoft.com/ (high
availability and disaster recovery options for each SharePoint 2013 system and
service application database)

ACTIVATION SCENARIOS
 Four main cases:
 Data Centre Outage
 SharePoint farm incident
 SharePoint farm planned outage
 SharePoint farm standard maintenance
 For each case define:
 Rationale
 DR feature
 Actions (associated workflow)
 Escalation points
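The four cases and the four items to define for each can be captured in a small record, which makes gaps in the DR documentation easy to spot. This is a minimal sketch: the class name `ActivationScenario` and its field names are assumptions chosen to mirror the bullets above, and the fields start empty because the deck does not prescribe their content.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ActivationScenario:
    name: str
    rationale: str = ""
    dr_feature: str = ""
    actions: List[str] = field(default_factory=list)            # associated workflow
    escalation_points: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """List the items that still have to be defined for this case."""
        missing = []
        if not self.rationale:
            missing.append("rationale")
        if not self.dr_feature:
            missing.append("dr_feature")
        if not self.actions:
            missing.append("actions")
        if not self.escalation_points:
            missing.append("escalation_points")
        return missing


# The four main cases named on the slide, still to be filled in.
SCENARIOS = [
    ActivationScenario("Data Centre Outage"),
    ActivationScenario("SharePoint farm incident"),
    ActivationScenario("SharePoint farm planned outage"),
    ActivationScenario("SharePoint farm standard maintenance"),
]

for scenario in SCENARIOS:
    print(scenario.name, "->", scenario.missing_fields())
```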

SCHEDULE OF EVENTS (WORKFLOWS)
 Every scenario needs an associated workflow
 Each workflow contains at least 3 stages:
 Workflow activation, initial controls and notifications
 Remediation steps
 Final controls and notifications
 Tasks are tailored to the Enterprise IT Operations’ procedures and
teams
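The three-stage rule lends itself to a simple completeness check. The sketch below assumes a workflow is written down as a mapping from stage name to its list of tasks; the stage keys and the function name `missing_stages` are illustrative, and the sample tasks are hypothetical.

```python
from typing import Dict, List

# The three stages every workflow must contain, as listed above.
REQUIRED_STAGES = (
    "activation, initial controls and notifications",
    "remediation steps",
    "final controls and notifications",
)


def missing_stages(workflow: Dict[str, List[str]]) -> List[str]:
    """Return the required stages that are absent or have no tasks."""
    return [stage for stage in REQUIRED_STAGES if not workflow.get(stage)]


# Hypothetical draft workflow with the final stage not yet written.
draft = {
    "activation, initial controls and notifications": ["Notify IT Operations"],
    "remediation steps": ["Fail over databases", "Repoint DNS to DR farm"],
}
print(missing_stages(draft))
```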

LOGICAL SYSTEM OVERVIEW
[Diagram: internal users reach the farm through DNS.
Primary Data-Centre: F5 LTM load balancing, web front ends, Office Web Apps
servers, application servers, Microsoft SQL Server database with replica
(auto-failover).
Disaster recovery Data-Centre: F5 LTM load balancing, web front ends, Office Web
Apps servers, application servers, Microsoft SQL Server database with replica
(async), plus local Config DBs and Admin DBs.]

LOGICAL SYSTEM OVERVIEW - COMMENTS
 The previous slide is a simplified overview of the three servers’ tiers:
 Web
 Application
 Database
 The DNS servers point only to Production farm
 DR farm is not operational but servers are up and running
 DR DBs receive logs only when transactions are completed
(asynchronous replica)
 Config and Admin DBs are locally maintained on DR farm

ESCALATION MATRIX
 The escalation matrix has to be defined in accordance with
the Enterprise incident procedures.
 This should include escalation points outside the
organization (vendors or Microsoft)

DR PROCEDURES
 Use this section of the manual to detail all tasks contained
in the workflows:
 Communications
 Network tasks
 DNS/Server tasks
 DB tasks

HEALTH CHECKS
In this section, the SharePoint team lists the checks that are
performed to confirm that the service has been restored on
the other farm
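One way to make such checks repeatable is to run them as named functions and record a pass or fail for each. The sketch below is an assumption about how that could look, not the author's procedure: the check names reuse the minimum-service items from the RTO/RPO slide, and the probes are stubbed with fixed results because the real ones (HTTP requests, search queries) depend on the farm.

```python
from typing import Callable, Dict


def run_health_checks(checks: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Run every check; a check that raises is recorded as failed."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results


def service_restored(results: Dict[str, bool]) -> bool:
    """The service counts as restored only when at least one check ran and all passed."""
    return bool(results) and all(results.values())


# Stub probes for illustration; replace with real checks against the DR farm.
checks = {
    "main page accessible with links operational": lambda: True,
    "search query returns internal documentation": lambda: True,
    "personal page displays all links and documents": lambda: False,
}

results = run_health_checks(checks)
print(results)
print(service_restored(results))
```

Recording each result separately, rather than stopping at the first failure, gives the exercise log a full picture of what was and was not working at the time of the check.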

DR VALIDATION EXERCISE
 Once the SharePoint farms are configured, run a DR
exercise to validate the workflows and the associated tasks
 For each exercise arrange a specific Event Summary Log file
that contains:
 Overview of RTO and RPO under validation
 Tested scenarios
 Detailed event log for each test (see following slide)

EVENT SUMMARY AND LOGS
 For each test record:
 Participants (roles and names)
 Schedule of events:
 Activity progress (in minutes)
 Real activity progress as recorded
 Task Category
 Role performing the activity
 Action Required
 Comments/issues/notes
 If you have Lync, WebEx or another group chat solution, you can
create a conversation with all participants, record all events and save
the conversation for review or training purposes
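The schedule of events compares planned progress with progress as recorded, which is easy to analyse once each row is a structured record. The following is a minimal sketch: the class name `LogEntry`, the helper names, the roles and the minute values are all hypothetical, while the task categories are taken from the DR procedures slide.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LogEntry:
    planned_minute: int   # activity progress (in minutes) as planned
    actual_minute: int    # real activity progress as recorded
    task_category: str
    role: str
    action: str
    notes: str = ""


def late_tasks(entries: List[LogEntry]) -> List[Tuple[str, int]]:
    """Return (action, minutes late) for every task that ran behind plan."""
    return [
        (e.action, e.actual_minute - e.planned_minute)
        for e in entries
        if e.actual_minute > e.planned_minute
    ]


def finished_within(entries: List[LogEntry], limit_minutes: int) -> bool:
    """True when the last recorded activity completed within the limit."""
    return max(e.actual_minute for e in entries) <= limit_minutes


# Hypothetical exercise log for illustration.
exercise_log = [
    LogEntry(0, 0, "Communications", "Incident manager", "Notify participants"),
    LogEntry(10, 14, "DB tasks", "DBA", "Fail over databases", "waited for log sync"),
    LogEntry(20, 24, "DNS/Server tasks", "Network engineer", "Repoint DNS to DR farm"),
    LogEntry(28, 29, "Communications", "SharePoint admin", "Confirm health checks"),
]

print(late_tasks(exercise_log))
print(finished_within(exercise_log, 30))
```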

ABOUT THE AUTHOR
Emilio Gratton
 15+ years’ IT Infrastructure Project Management
 PRINCE2 Registered Practitioner
 Broad experience in SharePoint infrastructure and
solutions delivery
Email: management@emiliogratton.com
