SharePoint 2013 DR solution:
An overview of a workable solution
for mid-size Enterprises
An example of implementation and
DR Documentation content
Outline:
- Business Requirements
- Recovery Time Objective (RTO) and Recovery Point Objective (RPO)
- Prerequisites
- Activation Scenarios
- Schedule of events (workflows)
- Logical System overview
- Escalation matrix
- DR procedures
- Health checks
- DR validation exercise
- Event Summary and logs
1. SHAREPOINT 2013 DR SOLUTION
(WARM STAND-BY)
An overview of a workable solution
for mid-size Enterprises
An example of implementation and
DR Documentation content
Emilio Gratton – ICT Project Manager
EG IT Services
2. OUTLINE
Business Requirements
Recovery Time Objective (RTO) and Recovery Point Objective (RPO)
Prerequisites
Activation Scenarios
Schedule of events (workflows)
Logical System overview
Escalation matrix
DR procedures
Health checks
DR validation exercise
Event Summary and logs
3. BUSINESS REQUIREMENTS
100% availability (24/7) of personal files
Same availability (and permitted downtime) as the hosting Data Centre
Service restored on the DR Data Centre in read/write mode within 30 minutes of a major Data Centre outage, planned or unplanned
Service restored on the DR farm in read-only mode within 30 minutes during a planned farm outage (for example, a service pack release)
4. RECOVERY TIME OBJECTIVE (RTO) AND
RECOVERY POINT OBJECTIVE (RPO)
The RTO is the agreed maximum time between a failure and the restoration of service. In this solution, it has been defined as 30 minutes.
In this solution, the RPO has been defined as a minimum level of service, consisting of the following:
Web Front End server restored (SharePoint 2013 main page accessible, with links operational)
Search service restored (a query successfully returns results related to internal documentation)
Personal page displaying all links and documents
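As a rough illustration of how the two objectives above could be evaluated together after a failover, here is a minimal Python sketch. The 30-minute value and the three service criteria come from this slide; the class name, field names, function name and timestamps are assumptions made for the example only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

RTO = timedelta(minutes=30)  # value stated in this solution


@dataclass
class MinimumService:
    """The three minimum-service criteria listed above (field names are illustrative)."""
    web_front_end_ok: bool   # main page accessible, links operational
    search_ok: bool          # a query returns internal documentation
    personal_page_ok: bool   # personal page shows all links and documents

    def restored(self) -> bool:
        return self.web_front_end_ok and self.search_ok and self.personal_page_ok


def objectives_met(failure: datetime, restored_at: datetime,
                   service: MinimumService) -> bool:
    """True when the minimum service is back within the RTO."""
    return service.restored() and (restored_at - failure) <= RTO


# Hypothetical timestamps, for illustration only.
failure = datetime(2014, 1, 1, 9, 0)
print(objectives_met(failure, failure + timedelta(minutes=25),
                     MinimumService(True, True, True)))
print(objectives_met(failure, failure + timedelta(minutes=25),
                     MinimumService(True, False, True)))
```

The first call represents a recovery where all three criteria hold 25 minutes after the failure; the second represents the same timing with Search still unavailable.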
5. PREREQUISITES 1/2
Data Centres are connected by a fast and reliable dedicated link
Hosts are virtualized to ensure host high availability (HA)
Windows servers are load balanced; SQL Servers are clustered
Infrastructure patch level is consistent across DCs (SCCM is used to monitor and report)
DR farm is kept up to date with the latest updates applied to the Production farm
Customized code and solutions are kept up to date in both farms
(blogs.msdn.com/..../managing-custom-solutions-for-disaster-recovery-sharepoint-farms)
6. PREREQUISITES 2/2
Local DR DB maintenance is handled as follows:
The SharePoint Admin manually maintains a local copy of the Configuration and Administrative DBs on the DR farm
This includes all DBs that are required at the DR farm but are not supported by SQL Server AlwaysOn Availability Groups with asynchronous commit for disaster recovery.
A full list of the supported high availability and disaster recovery options for SharePoint 2013 databases is available on TechNet (http://technet.microsoft.com/): "high availability and disaster recovery options for each SharePoint 2013 system and service application database"
7. ACTIVATION SCENARIOS
Four main cases:
Data Centre Outage
SharePoint farm incident
SharePoint farm planned outage
SharePoint farm standard maintenance
For each case define:
Rationale
DR feature
Actions (associated workflow)
Escalation points
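One way to keep the four items consistent across scenarios is to hold them in a small structure and flag anything still undefined. The Python sketch below assumes that approach; the class and field names are hypothetical, and the rationale, action and escalation texts are placeholders rather than content from the DR documentation. Only the scenario names and the read/write and read-only modes come from the slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ActivationScenario:
    name: str
    rationale: str = ""
    dr_feature: str = ""
    actions: List[str] = field(default_factory=list)  # steps of the associated workflow
    escalation_points: List[str] = field(default_factory=list)

    def missing_items(self) -> List[str]:
        """Names of the four required items that are still empty."""
        required = {
            "rationale": self.rationale.strip(),
            "dr_feature": self.dr_feature.strip(),
            "actions": self.actions,
            "escalation_points": self.escalation_points,
        }
        return [item for item, value in required.items() if not value]


scenarios = [
    ActivationScenario(
        name="Data Centre Outage",
        rationale="Primary Data Centre unavailable",       # placeholder text
        dr_feature="DR farm in read/write mode",
        actions=["Run the Data Centre outage workflow"],   # placeholder text
        escalation_points=["IT Operations duty manager"],  # placeholder text
    ),
    ActivationScenario(name="SharePoint farm incident"),
    ActivationScenario(name="SharePoint farm planned outage",
                       dr_feature="DR farm in read-only mode"),
    ActivationScenario(name="SharePoint farm standard maintenance"),
]

for s in scenarios:
    gaps = s.missing_items()
    print(f"{s.name}: {'complete' if not gaps else 'missing ' + ', '.join(gaps)}")
```

Running the check before a DR exercise gives a quick list of scenarios whose documentation is still incomplete.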
8. SCHEDULE OF EVENTS (WORKFLOWS)
Every scenario needs an associated workflow
Each workflow contains at least 3 stages:
Workflow activation, initial controls and notifications
Remediation steps
Final controls and notifications
Tasks are tailored to the procedures and teams of the Enterprise's IT Operations
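The three-stage rule can be checked mechanically before a workflow is published. The following Python sketch is one possible way to do it; the `Stage` enum, the `ordered_tasks` helper and the sample task descriptions are all assumptions made for illustration, not part of the original workflows.

```python
from enum import IntEnum
from typing import List, Tuple


class Stage(IntEnum):
    ACTIVATION = 1   # workflow activation, initial controls and notifications
    REMEDIATION = 2  # remediation steps
    FINAL = 3        # final controls and notifications


Task = Tuple[Stage, str]


def ordered_tasks(tasks: List[Task]) -> List[Task]:
    """Return tasks sorted by stage; raise if any of the three stages has no task."""
    present = {stage for stage, _ in tasks}
    missing = [s.name for s in Stage if s not in present]
    if missing:
        raise ValueError("workflow is missing stage(s): " + ", ".join(missing))
    # sorted() is stable, so tasks keep their relative order within a stage.
    return sorted(tasks, key=lambda task: task[0])


# Hypothetical tasks, listed out of order on purpose.
tasks = [
    (Stage.REMEDIATION, "Point DNS to the DR farm"),
    (Stage.ACTIVATION, "Notify the service desk"),
    (Stage.FINAL, "Run health checks and notify users"),
    (Stage.REMEDIATION, "Fail over the databases"),
]

for stage, description in ordered_tasks(tasks):
    print(f"{stage.name:<12} {description}")
```

A workflow that omits one of the stages raises a `ValueError` naming the missing stage, which makes the gap visible at authoring time rather than during an outage.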
9. LOGICAL SYSTEM OVERVIEW
[Diagram: Internal users reach the farm through DNS and F5 LTM load balancing. The Primary Data-Centre and the Disaster Recovery Data-Centre each contain Web Front End, Office Web Apps and Application servers behind F5 LTM load balancing, plus a Microsoft SQL Server database tier. The SQL Server replicas are labelled "Replica (Auto-Failover)" and "Replica (Async)". Config DBs and Admin DBs are shown separately for the Disaster Recovery Data-Centre.]
10. LOGICAL SYSTEM OVERVIEW - COMMENTS
The previous slide is a simplified overview of the three server tiers:
Web
Application
Database
The DNS servers point only to the Production farm
The DR farm is not serving users, but its servers are up and running
DR DBs receive transaction logs asynchronously: the primary commits without waiting for the DR replica (asynchronous-commit replica)
Config and Admin DBs are maintained locally on the DR farm
11. ESCALATION MATRIX
The escalation matrix has to be defined in accordance with the Enterprise incident procedures.
It should include escalation points outside the organization (vendors or Microsoft).
12. DR PROCEDURES
Use this section of the manual to detail all tasks contained in the workflows:
Communications
Network tasks
DNS/Server tasks
DB tasks
13. HEALTH CHECKS
In this section the SharePoint team lists the checks performed to confirm that the service has been restored on the other farm
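The declared checks can also be kept as a small runner so that the same list is executed after every failover. The sketch below is an assumption about how that might look in Python: `run_checks`, `service_restored` and the check names are hypothetical, and the probes are stubs. Real probes (page requests, search queries) depend on the farm's authentication and URLs and are deliberately left out.

```python
import time
from typing import Callable, Dict, List, NamedTuple


class CheckResult(NamedTuple):
    name: str
    passed: bool
    seconds: float
    detail: str


def run_checks(checks: Dict[str, Callable[[], bool]]) -> List[CheckResult]:
    """Run every probe once and record outcome, duration and any error."""
    results = []
    for name, probe in checks.items():
        start = time.monotonic()
        try:
            passed, detail = bool(probe()), ""
        except Exception as exc:  # a probe that raises counts as a failed check
            passed, detail = False, repr(exc)
        results.append(CheckResult(name, passed, time.monotonic() - start, detail))
    return results


def service_restored(results: List[CheckResult]) -> bool:
    """The service counts as restored only if at least one check ran and all passed."""
    return bool(results) and all(r.passed for r in results)


# Stub probes standing in for environment-specific checks (hypothetical names).
checks = {
    "Main page reachable": lambda: True,
    "Search returns results": lambda: True,
    "Personal page shows documents": lambda: False,
}

results = run_checks(checks)
for r in results:
    print(f"{'PASS' if r.passed else 'FAIL'}  {r.name}  {r.detail}")
print("Service restored:", service_restored(results))
```

Treating an exception as a failed check, instead of letting it stop the run, means one unreachable server does not hide the status of the remaining checks.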
14. DR VALIDATION EXERCISE
Once the SharePoint farms are configured, run a DR exercise to validate the workflows and the associated tasks
For each exercise, prepare a dedicated Event Summary Log file that contains:
Overview of RTO and RPO under validation
Tested scenarios
Detailed event log for each test (see following slide)
15. EVENT SUMMARY AND LOGS
For each test record:
Participants (roles and names)
Schedule of events:
Planned activity progress (in minutes)
Actual activity progress as recorded
Task Category
Role performing the activity
Action Required
Comments/issues/notes
If you have Lync, WebEx or another group chat solution, you can create a conversation with all participants, record all events and save the conversation for review or training purposes
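The schedule-of-events fields listed above map naturally onto one record per task. The Python sketch below shows one possible layout, assuming the two progress fields are minutes since the start of the test (scheduled versus recorded). The `LogEntry` class, helper names and sample rows are hypothetical and are not taken from a real exercise.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import List


@dataclass
class LogEntry:
    planned_minute: int  # activity progress as scheduled
    actual_minute: int   # real activity progress as recorded
    category: str        # task category
    role: str            # role performing the activity
    action: str          # action required
    notes: str = ""      # comments/issues/notes

    @property
    def delay(self) -> int:
        return self.actual_minute - self.planned_minute


def late_entries(entries: List[LogEntry], tolerance: int = 0) -> List[LogEntry]:
    """Entries recorded more than `tolerance` minutes behind schedule."""
    return [e for e in entries if e.delay > tolerance]


def total_elapsed(entries: List[LogEntry]) -> int:
    """Minutes from the start of the test to the last recorded activity."""
    return max(e.actual_minute for e in entries)


def to_csv(entries: List[LogEntry]) -> str:
    """Serialize the log so it can be saved with the Event Summary."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(LogEntry)])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


# Hypothetical rows, for illustration only.
entries = [
    LogEntry(0, 0, "Communications", "Incident manager", "Declare the DR exercise started"),
    LogEntry(5, 8, "Network", "Network engineer", "Redirect traffic to the DR farm",
             "Change approval took longer than planned"),
    LogEntry(15, 15, "DB", "DBA", "Fail over the databases"),
    LogEntry(25, 28, "Communications", "SharePoint admin", "Confirm health checks passed"),
]

print("Late tasks:", [e.action for e in late_entries(entries)])
print("Total elapsed (minutes):", total_elapsed(entries))
print(to_csv(entries))
```

Comparing the total elapsed time with the RTO under validation, and reviewing the late tasks, gives the exercise a concrete outcome to discuss in the review.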
16. ABOUT THE AUTHOR
Emilio Gratton
15+ years’ IT Infrastructure Project Management
PRINCE2 Registered Practitioner
Experience in several SharePoint infrastructure and solution delivery projects
Email: management@emiliogratton.com