Ch05 high availability

Objectives
Understand database replication
Manage a database availability group
Understand Active Manager
Understand site resiliency for Exchange 2013

Overview of Critical Services
Messaging services have the following requirements
◦ Business-critical
◦ Always online with minimal to no data loss
◦ Capable of surviving a variety of failure scenarios
◦ Flexible
◦ Managed during business hours with minimal impact on end users
◦ Fast and secure

Database Availability Groups (DAGs)
Method of providing database resiliency by replicating the active Mailbox database to other
Mailbox servers
DAG configuration includes
◦ Add mailbox servers to the DAG as members
◦ Decide which databases will be replicated to which members
◦ One Mailbox server will have an “active” copy of the database while others store a “passive” copy
◦ If the active database fails a passive copy will become active
◦ Minimal to no interruption of Mailbox service to end users
◦ Business and technical requirements drive the DAG to stretch across multiple datacenters

Database Replication
The DAG is the boundary for replicating database content between Mailbox servers
Uses the Microsoft Exchange Replication Service to continuously replicate transaction logs
between the active and passive copies
TCP port 64327 used for replication
Mailbox databases write transactions to memory (log buffer) then to disk (transaction log) and
then commit the transactions to the Mailbox database
Uses file mode and block mode replication
◦ File mode replicates the transaction logs between servers (1 MB in size)
◦ Once file mode is up to date, block mode begins, replicating the log buffer to passive DAG
members. This minimizes losses in the event of a failure.

DAG Requirements
Requires the clustering components included with Windows Server 2008 R2 Enterprise or Datacenter, or Windows Server
2012 Standard or Datacenter
All DAG members must be running the same OS version
Supported by both Exchange 2013 editions, Standard and Enterprise
Maximum of 16 members in a single DAG
Must be members of the same domain
Installing the Exchange Mailbox server role on a domain controller is not supported by Microsoft for DAG membership
DAG name is limited to 15 characters
All DAG members should use the same number of NICs and have connectivity to all members
If using multiple NICs they must be on different networks
Round-trip latency limit of 500 milliseconds between members
Requires a non-DAG member to be used as a file share witness
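The numeric and membership limits above lend themselves to a quick pre-flight check. The following is a minimal sketch in Python, not an Exchange tool: the `DagPlan` structure, its field names, and the sample server names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

MAX_DAG_MEMBERS = 16       # maximum members in a single DAG
MAX_DAG_NAME_LENGTH = 15   # DAG name character limit
MAX_ROUND_TRIP_MS = 500    # round-trip latency limit between members


@dataclass
class DagPlan:
    """Hypothetical description of a planned DAG (not an Exchange object)."""
    name: str
    member_os_versions: dict   # member name -> OS version string
    member_domains: dict       # member name -> Active Directory domain
    worst_round_trip_ms: float
    witness_server: str


def check_dag_plan(plan):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    members = set(plan.member_os_versions)
    if len(plan.name) > MAX_DAG_NAME_LENGTH:
        problems.append("DAG name exceeds 15 characters")
    if len(members) > MAX_DAG_MEMBERS:
        problems.append("more than 16 members")
    if len(set(plan.member_os_versions.values())) > 1:
        problems.append("members run different OS versions")
    if len(set(plan.member_domains.values())) > 1:
        problems.append("members are in different domains")
    if plan.worst_round_trip_ms > MAX_ROUND_TRIP_MS:
        problems.append("round-trip latency above 500 ms")
    if plan.witness_server in members:
        problems.append("file share witness is a DAG member")
    return problems


# Illustrative plan; all names and values are made up.
good_plan = DagPlan(
    name="DAG01",
    member_os_versions={"MBX1": "2012", "MBX2": "2012"},
    member_domains={"MBX1": "example.local", "MBX2": "example.local"},
    worst_round_trip_ms=120,
    witness_server="FS01",
)
print(check_dag_plan(good_plan))
```

The check covers only the limits that can be expressed as simple comparisons; NIC layout and domain controller placement would need to be verified separately.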

DAG Quorum
DAG availability is based on quorum, which is maintained when a majority of the Mailbox
servers are online
Formula: N/2+1 (with N/2 rounded down) gives the number of servers required to maintain quorum
◦ 7 Mailbox server DAG: 7/2 (rounded down) + 1 = 4 DAG members must be online to maintain quorum
Traditionally, once quorum is lost the cluster is marked as offline and its mailbox databases
are dismounted
Windows Server 2012 uses “dynamic quorum” which adjusts the number of servers required to
maintain quorum after a failure making it more resilient
The quorum model depends on whether the DAG has an even or odd number of members
◦ Even number: Node and File Share Majority (uses a file share witness server)
◦ Odd number: Node Majority
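The quorum arithmetic above can be written out as a short Python sketch. The function names are illustrative only and do not correspond to any Exchange or Windows clustering API.

```python
def votes_needed(total_voters):
    """Majority of voters: N/2 rounded down, plus 1."""
    return total_voters // 2 + 1


def has_quorum(online_voters, total_voters):
    """True when the online voters form a majority."""
    return online_voters >= votes_needed(total_voters)


def quorum_model(member_count):
    """Quorum model by DAG member count, as listed above."""
    if member_count % 2 == 0:
        return "Node and File Share Majority"
    return "Node Majority"


print(votes_needed(7))   # 7 // 2 + 1 = 4
print(has_quorum(3, 7))  # False: 3 is below the 4 required
print(quorum_model(8))   # even member count, so a witness is used
```

This models the traditional static calculation only; dynamic quorum in Windows Server 2012 adjusts the voter count after failures and is not represented here.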

File Share Witness
A witness server is a domain joined computer that is not part of a DAG that can be used to
maintain quorum when a DAG contains an even number of Mailbox servers
Example:
◦ A DAG spanning two datacenters with 4 Mailbox servers in each site (8 in total) loses the connection between datacenters
◦ Formula: N/2+1 = 8/2+1 = 5 servers required to maintain quorum
◦ Without a witness server, once the connection is broken neither datacenter continues to function because
each has fewer than the number required for quorum
The site with the witness server gets an additional “vote” therefore maintaining quorum
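The split-site example above can be reproduced with a small vote count. This is a simplified Python sketch that assumes every link between sites fails at once; the site names and the `sites_with_quorum` helper are hypothetical.

```python
def sites_with_quorum(members_per_site, witness_site=None):
    """Return the sites that still hold a majority of votes when all
    inter-site links fail. members_per_site maps site name -> member count.
    The witness, if any, adds one vote to the site that hosts it."""
    votes = dict(members_per_site)
    if witness_site is not None:
        votes[witness_site] = votes.get(witness_site, 0) + 1
    total_votes = sum(votes.values())
    needed = total_votes // 2 + 1
    return [site for site, count in votes.items() if count >= needed]


layout = {"SiteA": 4, "SiteB": 4}

# No witness: 8 votes, 5 needed, each site holds only 4.
print(sites_with_quorum(layout))

# Witness in SiteA: 9 votes, 5 needed, SiteA holds 4 + 1 = 5.
print(sites_with_quorum(layout, witness_site="SiteA"))
```

Without the witness the list is empty, which matches the case where neither datacenter continues to function.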

Symmetric Database Copies
Microsoft recommendations for using a single volume
◦ Use a single volume for the entire disk
◦ Number of copies of each database should equal the number of copies per disk
◦ Activation preference should be balanced across DAG members
Note that Windows Server Backup targets the entire volume

DAG Member Network Interfaces
Microsoft recommends using multiple interfaces on DAG members
Maximum of one NIC used for client connections (MAPI NIC)
◦ Uses a default gateway
◦ Enable File and Printer Sharing for Microsoft Networks
◦ Enable Client for Microsoft Networks
◦ Register in DNS
◦ Highest priority in binding order
One or more NICs for DAG communication
◦ Separate subnet from MAPI NIC
◦ No default gateway
◦ Disable File and Printer Sharing for Microsoft Networks
◦ Disable Client for Microsoft Networks
◦ Do not register in DNS
◦ Do not use NICs configured for iSCSI

Lagged Mailbox Database Copies
A lagged mailbox database copy is a replication partner of an active database that delays
committing transactions to the database for a predetermined period of time (replay lag time)
◦ Replay lag time specifies how long to wait until the transaction log is committed to the mailbox
database
◦ Truncation lag time defines when the transaction log file will be deleted from disk (begins after replay
lag time has completed)
Note:
◦ The duration set with the ReplayLagTime parameter should match the length of time Safety Net stores
email messages (default is 2 days)
◦ The Safety Net is a message queue retention mechanism used to store a copy of each message
delivered to the active copy of a mailbox database
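The relationship between the two lag timers can be shown as a timeline calculation. This Python sketch assumes only what is stated above (truncation lag starts once replay lag has elapsed) and ignores any other conditions Exchange applies before deleting a log file; the function name and sample times are hypothetical.

```python
from datetime import datetime, timedelta


def lagged_copy_schedule(log_copied_at, replay_lag, truncation_lag):
    """Return (replay_at, truncate_at) for one transaction log file.

    replay_at:   earliest time the log is committed to the lagged copy
    truncate_at: earliest time the log file may be deleted from disk,
                 counted from the end of the replay lag
    """
    replay_at = log_copied_at + replay_lag
    truncate_at = replay_at + truncation_lag
    return replay_at, truncate_at


# Illustrative values: a log copied at 09:00 on 1 January,
# a 2-day replay lag and a 1-day truncation lag.
replay_at, truncate_at = lagged_copy_schedule(
    datetime(2024, 1, 1, 9, 0),
    replay_lag=timedelta(days=2),
    truncation_lag=timedelta(days=1),
)
print(replay_at)    # 2024-01-03 09:00:00
print(truncate_at)  # 2024-01-04 09:00:00
```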

Automatic Reseed
New to Exchange 2013
Involves pre-mapping volumes and using mount points so that a failed database copy can be reseeded automatically and returned to a healthy state

Active Manager
Active Manager runs on all Mailbox servers inside the Microsoft Exchange Replication Service
and is responsible for managing the active database copies in a DAG during failover
When the Mailbox server is in a DAG there are two Active Manager roles
◦ Primary Active Manager (PAM)
  ◦ Held by one DAG member
  ◦ Has ownership of the cluster quorum resources
◦ Standby Active Manager (SAM)
  ◦ Available to become primary should the current role holder fail
  ◦ Notifies the PAM of active local databases
  ◦ If an active database fails, the SAM notifies the PAM to begin a failover to a passive database copy
  ◦ If the entire server fails, the PAM is already aware of the active databases that were held by that Mailbox server and will begin failover to a passive copy of each database

Best Copy and Server Selection (BCSS)
Process used by the PAM for automatic selection of a new active database copy during failover/switchover when the
Administrator has not specified a target server
Uses 10 sets of criteria when determining a new active database
If the selected copy passes the 10 sets of criteria and the number of missing log files is within the acceptable limit, it
will become the new active database
The AutoDatabaseMountDial parameter (set with Set-MailboxServer) defines the acceptable number of missing transaction logs

Best Copy and Server Selection (BCSS)
The order used to select a new active database is as follows:
◦ Meets 10 sets of criteria
◦ Copy Queue Length: number of log files waiting to be copied and inspected
◦ Activation Preference: Administrative preference number
◦ Replay Queue Length: number of log files waiting to be replayed into this copy of the database
• MBX4DB1 becomes the Active database
– Meets the first set of criteria (as do the others)
– Lowest Copy Queue Length
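The ordering listed above can be modelled as a filter followed by a multi-key sort. This Python sketch is an approximation of the slide's ordering, not Active Manager's actual implementation: the 10 criteria sets are reduced to a single pass/fail flag, and all copy names and queue values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class CopyStatus:
    """Hypothetical snapshot of one passive database copy."""
    name: str
    meets_criteria: bool        # stands in for the 10 criteria sets
    copy_queue_length: int      # logs waiting to be copied and inspected
    activation_preference: int  # administrative preference number
    replay_queue_length: int    # logs waiting to be replayed


def pick_best_copy(copies):
    """Pick the copy to activate, or None if no copy qualifies.

    Candidates that meet the criteria are ranked by copy queue length,
    then activation preference, then replay queue length (lowest first).
    """
    eligible = [c for c in copies if c.meets_criteria]
    if not eligible:
        return None
    return min(
        eligible,
        key=lambda c: (c.copy_queue_length,
                       c.activation_preference,
                       c.replay_queue_length),
    )


# Illustrative data only.
candidates = [
    CopyStatus("MBX2DB1", True, 4, 2, 0),
    CopyStatus("MBX3DB1", True, 2, 3, 1),
    CopyStatus("MBX4DB1", True, 0, 4, 5),
]
print(pick_best_copy(candidates).name)  # MBX4DB1: lowest copy queue length
```

With these sample values the copy with the lowest copy queue length wins even though its activation preference number is the highest, which mirrors the MBX4DB1 outcome described above.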

Site Resiliency
Managing multi-site Exchange organizations has been simplified in Exchange 2013. Clients no longer connect to
specific namespaces or connect directly via RPC; instead they connect over HTTP and can use any Client Access server (CAS)
◦ Exchange 2010
  ◦ Clients connected to a CAS namespace for a database, which effectively made the CAS a point of failure. If a DAG failed over to another datacenter, the admin had to update the RpcClientAccessServer parameter to point to the CAS in the new location
Use multiple DNS A records resolving the same name to the IP addresses of multiple CAS in
different sites. If a local CAS doesn’t respond, a remote CAS can proxy the request back to
the local DAG

Page Patching
When the DAG spans multiple sites there is a greater delay in replaying transaction logs to the
passive database
This can result in divergence between datacenters in the event of failure
Replication service will attempt to update the transaction logs with information from the active
database
If the passive database becomes active without being fully updated there will be a loss in
content contained within the remaining transaction logs
In this event, the Administrator must determine which log files are missing and use a recovery
mailbox database to restore the missing log files and export the content into a PST file

Site Resiliency Scenarios
SINGLE DAG – TWO SITES
Primary datacenter contains the majority of Mailbox servers
If the number of Mailbox servers is even, a File Share Witness is placed at the primary datacenter
Issues
◦ If end users are located in both sites, failure of the WAN link will result in loss of email service for users in the secondary datacenter
◦ Requires manual failover to the secondary datacenter should the primary fail

Site Resiliency Scenarios
SINGLE DAG – THREE SITES
Microsoft preferred solution
Uses an even number of DAG members across the two main datacenters, with the File Share Witness located in the third datacenter
Benefit is that either datacenter can fail and quorum is maintained, keeping the DAG active
Issues
◦ Failure of the WAN link from the secondary datacenter will prevent end users at that location from accessing email services

Site Resiliency Scenarios
MULTIPLE DAGS – TWO SITES
Used when the WAN connection doesn’t support the throughput required for continuous replication
Allows mailbox services to be available at both locations in the event of a WAN link failure by having local users associated with a local active DAG
Issues
◦ Requires more servers, storage and support

Patch Management
Cumulative updates are released every 3 months (quarterly).
Each cumulative update is a full product release. It is possible to install Exchange 2013 from scratch using a
cumulative update download, as well as to upgrade a previous release to the latest software
level.
Cumulative updates may include schema updates, so they need to be planned with the entire forest in mind,
not just a single server, and they require Enterprise Admin and Schema Admin
privileges.

Maintenance Mode
Used when installing a cumulative update on a server within a DAG. Placing a DAG member in maintenance mode
moves all active databases off the server and blocks any other server from moving a database to it.
The DAG member is put into maintenance mode by using the following commands in EMS:
◦ CD $ExScripts
.\StartDagServerMaintenance.ps1 -Server AMS-EXCH01
When the DAG member is upgraded (and rebooted), it can be put back into normal operation using the following
commands in EMS:
◦ CD $ExScripts
.\StopDagServerMaintenance.ps1 -Server AMS-EXCH01
The last step is to redistribute the mailbox databases across all the DAG members. The
RedistributeActiveDatabases.ps1 script is also in the $ExScripts directory, so you can use the following command in
EMS. This redistributes the active mailbox databases across the DAG based on their activation preference.
◦ CD $ExScripts
.\RedistributeActiveDatabases.ps1 -DagName DAG01 -BalanceDbsByActivationPreference -Confirm:$False

References
Sybex, Mastering Microsoft Exchange 2013 by David Elfassy
