Exchange Server 2010 - Database Availability Group

Let's Begin DAG

The DAG is one of the major enhancements in Exchange 2010. LCR, CCR and SCR from Exchange 2007 are dropped in Exchange 2010, and the DAG is introduced as a single high availability solution. Exchange "14" uses the same continuous replication technology found in Exchange Server 2007, but unites on-site (CCR) and off-site (SCR) data replication into a single framework. Exchange Server manages all aspects of failover, and no Windows clustering knowledge is required because the DAG configures clustering by itself. A DAG can span up to 16 nodes, so a database can have as many as 15 passive copies, compared to the two-node CCR cluster. The DAG also makes failover more granular: database-level rather than server-level. A failure of one database in a DAG therefore doesn't force a failover of the entire server, which would affect the users of the other databases on that server. A server that is part of a DAG can still hold other server roles. This reduces the minimum number of servers required to build a redundant Exchange environment to two. A DAG can easily be stretched across sites to provide site resilience in a disaster. And whereas in CCR the passive server sits idle, in a DAG the active databases can be distributed among the nodes.

Some Clustering Basics

Before we begin with the DAG, it is worth looking at some clustering technologies; this background will help you understand the DAG very quickly.

The concept of a cluster involves taking two or more computers and organizing them to work together to provide higher availability, reliability and scalability than can be obtained by using a single system. When a failure occurs in a cluster, resources can be redirected and the workload can be redistributed. A server cluster provides high availability by making application software and data available on several servers linked together in a cluster configuration. If one server stops functioning, a process called failover automatically shifts the workload of the failed server to another server in the cluster.
The failover process is designed to ensure continuous availability of critical applications and data.
There are mainly three types of clustering in Windows Server.

Network Load Balancing provides failover support for IP-based applications and services that require high scalability and availability. With Network Load Balancing (NLB), organizations can build groups of clustered computers to support load balancing of Transmission Control Protocol (TCP), User Datagram Protocol (UDP) and Generic Routing Encapsulation (GRE) traffic requests. Web-tier and front-end services are ideal candidates for NLB.

Component Load Balancing, a feature of Microsoft Application Center 2000, provides dynamic load balancing of middle-tier application components that use COM+. With Component Load Balancing (CLB), COM+ components can be load balanced over multiple nodes to dramatically enhance the availability and scalability of software applications.

Server clustering provides failover support for applications and services that require high availability, scalability and reliability. With clustering, organizations can make applications and data available on multiple servers linked together in a cluster configuration. Back-end applications and services, such as those provided by database servers, are ideal candidates for a server cluster. Some of the components of server clusters are discussed below.

Quorum

A quorum is the cluster's configuration database; it tells the cluster which node should be active.

Standard quorum: A standard quorum is a configuration database for the cluster, stored on a shared hard disk that is accessible to all of the cluster's nodes. The other thing the quorum does is intervene when communications fail between nodes. Normally, each node within a cluster can communicate with every other node in the cluster over a dedicated network connection.
If this network connection were to fail, though, the cluster would be split into two pieces, each containing one or more functional nodes that cannot communicate with the nodes on the other side of the communications failure. When this type of communications failure occurs, the cluster is said to have been partitioned. The problem is that both partitions have the same goal: to keep the application running. The application can't
be run on multiple servers simultaneously, though, so there must be a way of determining which partition gets to run the application. This is where the quorum comes in. The partition that "owns" the quorum is allowed to continue running the application; the other partition is removed from the cluster.

Majority Node Set (MNS) quorum: The main difference between a standard quorum and an MNS quorum is that in MNS each node has its own, locally stored copy of the quorum database. The other way that an MNS quorum depends on majorities is in starting the nodes. A majority of the nodes ((number of nodes / 2) + 1, using integer division) must be online before the cluster will start the virtual server. If fewer than the majority of nodes are online, the cluster is said to "not have quorum". In such a case, the necessary services will keep restarting until a sufficient number of nodes are present.

One of the most important things about MNS is that you must have at least three nodes in the cluster. Remember that a majority of nodes must be running at all times. If a cluster has only two nodes, the majority is calculated to be 2 ((2 / 2) + 1 = 2). Therefore, if one node were to fail, the entire cluster would go down because it would not have quorum.

File share witness

The file share witness feature is an improvement to the Majority Node Set (MNS) quorum model. It lets you use a file share that is external to the cluster as an additional "vote" to determine the status of the cluster in a two-node MNS quorum cluster deployment.

Consider a two-node MNS quorum cluster. Because an MNS quorum cluster can only run when the majority of the cluster nodes are available, a two-node MNS quorum cluster is unable to sustain the failure of any cluster node: the majority of a two-node cluster is two. To sustain the failure of any one node in an MNS quorum cluster, you must have at least three devices that can be considered as available.
The file share witness feature enables you to use an external file share as a witness. This witness acts as the third available device in a two-node MNS quorum cluster, so with this feature enabled, a two-node MNS quorum cluster can sustain the failure of a single cluster node. Additionally, the file share witness provides two further protections: it helps protect the cluster against a problem known as split brain (a condition that occurs when all networks fail), and it helps protect the cluster against a problem known as a partition in time.
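The quorum arithmetic above can be sketched in a few lines. This is an illustrative sketch, not cluster code; the function name and the assumption that the witness share is reachable from the surviving partition are mine:

```python
# Illustrative sketch: MNS quorum majority arithmetic, with an optional
# file share witness contributing one extra "vote".

def has_quorum(total_nodes: int, online_nodes: int, witness: bool = False) -> bool:
    """Return True if the online members form a majority of all voters."""
    voters = total_nodes + (1 if witness else 0)
    # Assumption: the witness share is reachable from the online partition.
    votes = online_nodes + (1 if witness else 0)
    majority = voters // 2 + 1
    return votes >= majority

# Two-node MNS cluster without a witness: losing one node loses quorum.
print(has_quorum(2, 1))                # False
# With a file share witness as the third vote, one node failure is survivable.
print(has_quorum(2, 1, witness=True))  # True
```

This shows why a plain two-node MNS cluster cannot survive a single failure, while the same cluster with a witness can.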
Fundamentals of DAG

DAG

A database availability group (DAG) is the base component of the high availability and site resilience framework built into Microsoft Exchange Server 2010. A DAG is a group of up to 16 Mailbox servers that host a set of databases and provide automatic database-level recovery from failures that affect individual servers or databases.

A DAG is a boundary for mailbox database replication, for database and server switchovers and failovers, and for an internal component called Active Manager. Active Manager is an Exchange 2010 component that manages switchovers and failovers; it runs on every server in a DAG.

What DAG changes

1. No more Exchange Virtual Servers/Clustered Mailbox Servers.
2. A database is no longer associated with a server but is an organization-level resource.
3. There is no longer a requirement to choose clustered or non-clustered at installation; an Exchange 2010 server can move in and out of a DAG as needed.
4. The limitation of only hosting the mailbox role on a clustered Exchange server is removed.
5. Storage groups have been removed from Exchange.

Server

A server is the unit of membership for a DAG. A server hosts active and passive copies of multiple mailbox databases and executes various services against a mailbox database, such as the Information Store and the Mailbox Assistants. A server is also responsible for running the replication service for its passive mailbox database copies. The server provides the connection point between the Information Store and RPC Client Access. It defines very few server-level properties relevant to high availability (HA), such as the server's DAG and its activation policy.

Mailbox Database

A database is the unit of failover in a DAG. A database has only one active copy, which can be mounted or dismounted. A mailbox database can have as many as 15 passive copies, depending on the number of Mailbox servers available. Ideally, a database failover takes only about 30 seconds. A server failover/switchover involves moving all active databases to one or more other servers. Database names are unique across a forest. A mailbox database defines properties such as its GUID, the EDB file path and the names of the servers hosting copies.

Mailbox availability terms

Active mailbox database: provides mail services to the clients.
Passive mailbox database: available to provide mail services to the clients if the active copy fails.
Source mailbox database: provides data for copying to a separate location.
Target mailbox database: receives data from the source.
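The relationships described above (up to 16 servers per DAG, one active copy per database, at most one copy of a database per server) can be captured in a minimal data model. This is an illustrative sketch; the class and property names are mine, not Exchange APIs:

```python
# Illustrative data model of the DAG copy rules described above.

class DatabaseCopy:
    def __init__(self, server: str, active: bool = False):
        self.server = server
        self.active = active

class MailboxDatabase:
    MAX_COPIES = 16  # one active copy plus up to 15 passive copies

    def __init__(self, name: str):
        self.name = name
        self.copies = []

    def add_copy(self, server: str) -> DatabaseCopy:
        if len(self.copies) >= self.MAX_COPIES:
            raise ValueError("a database supports at most 16 copies")
        if any(c.server == server for c in self.copies):
            raise ValueError("a server may host at most one copy of a database")
        # The first copy added becomes the active one; the rest are passive.
        copy = DatabaseCopy(server, active=not self.copies)
        self.copies.append(copy)
        return copy

    @property
    def active_copy(self) -> DatabaseCopy:
        return next(c for c in self.copies if c.active)

db = MailboxDatabase("DB1")
db.add_copy("Server1")
db.add_copy("Server2")
print(db.active_copy.server)  # Server1
```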
Mailbox Database Copy

A database copy defines the scope of database replication. A database copy is either the source or the target of replication at any given time, and it is either active or passive at any given time. Only one copy of each database in a DAG is active at a time, and a server may host only one copy of any given database.

Active Manager

For Exchange Server, Active Directory is the primary source for configuration information, whereas Active Manager is the primary source for changeable state information, such as whether a copy is active and whether it is mounted. Active Manager is an Exchange-aware resource manager, known as the brain of high availability. Active Manager runs on every server in the DAG and manages which copies should be active and which should be passive. It is also the definitive source of information on where a database is active or mounted, and it provides this information to other Exchange components (e.g., RPC Client Access and Hub Transport). Active Manager information is stored in the cluster database.

In Exchange Server 2010, the Microsoft Exchange Replication service periodically monitors the health of all mounted databases. In addition, it monitors the Extensible Storage Engine (ESE) for any I/O errors or failures. When the service detects a failure, it notifies Active Manager. Active Manager then determines which database copy should be mounted and what is required to mount that database. It also tracks the active copy of a mailbox database (based on the last mounted copy of the database) and provides the tracking information to the RPC Client Access component on the Client Access server to which the client is connected.

When an administrator makes a database copy the active mailbox database, the process is known as a switchover. When a failure affecting a database occurs and a new database becomes the active copy, the process is known as a failover. The term also covers a server failure in which one or more servers bring online the databases that were previously online on the failed server.
When either a switchover or a failover occurs, the other Exchange Server 2010 server roles become aware of it almost immediately and redirect client and messaging traffic to the new active database. For example, if an active database in a DAG fails because of an underlying storage failure, Active Manager will automatically recover by failing over to a database copy on another Mailbox server in the DAG. In the
event that the database is outside the automatic mount criteria and cannot be automatically mounted, an administrator can manually perform a database failover.

Primary Active Manager (PAM): The PAM is the Active Manager in the DAG that decides which copies will be active and which passive. It moves to another server if the server hosting it is no longer able to run it; you need to move the PAM if you take a server offline for maintenance or an upgrade. The PAM is responsible for getting topology change notifications and reacting to server failures. PAM is a role of an Active Manager: if the server hosting the PAM fails, another instance of Active Manager adopts the role (the one that takes ownership of the cluster group). The PAM controls all movement of the active designation between a database's copies (only one copy can be active at any given time, and that copy may be mounted or dismounted). The PAM also performs the functions of the SAM role on the local system (detecting local database and local Information Store failures).

Standby Active Manager (SAM): The SAM provides information on which server hosts the active copy of a mailbox database to other components of Exchange, e.g., the RPC Client Access service or Hub Transport. The SAM detects failures of local databases and the local Information Store, and it reacts to failures by asking the PAM to initiate a failover (if the database is replicated). A SAM does not determine the target of the failover, nor does it update a database's location state in the PAM. It reads the active database copy location state to answer the queries for the active copy of a database that it receives from the Client Access server, the Hub Transport server, and so on.

Active Manager best copy selection

When a failure occurs that affects a replicated mailbox database, the PAM initiates failover logic and selects the best available database copy for activation. The PAM uses up to ten separate sets of criteria when locating the best copy to activate.
When a failure affecting the active database occurs, Active Manager uses several sets of selection criteria to determine which database copy should be activated. Active Manager attempts to locate a mailbox database copy that has a status of Healthy, DisconnectedAndHealthy, DisconnectedAndResynchronizing, or SeedingSource; then, depending on the status of the content index, the replay queue length and the copy queue length, it determines the best copy to activate.
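The selection described above can be sketched as a filter-then-rank pass. This is a simplified illustration (the real algorithm applies up to ten ordered criteria sets); the copy statuses are real, but the single ranking key and the function name are assumptions of mine:

```python
# Illustrative sketch of best copy selection: filter to activatable statuses,
# then rank by content index health, copy queue length, replay queue length.

ACTIVATABLE = {"Healthy", "DisconnectedAndHealthy",
               "DisconnectedAndResynchronizing", "SeedingSource"}

def best_copy(copies):
    """copies: dicts with status, content_index, copy_queue_length,
    replay_queue_length. Returns the preferred copy, or None."""
    candidates = [c for c in copies if c["status"] in ACTIVATABLE]
    # Prefer a healthy content index, then the fewest logs still to copy,
    # then the fewest logs still to replay.
    return min(candidates,
               key=lambda c: (c["content_index"] != "Healthy",
                              c["copy_queue_length"],
                              c["replay_queue_length"]),
               default=None)

copies = [
    {"server": "EX2", "status": "Healthy", "content_index": "Healthy",
     "copy_queue_length": 4, "replay_queue_length": 10},
    {"server": "EX3", "status": "Healthy", "content_index": "Healthy",
     "copy_queue_length": 0, "replay_queue_length": 2},
    {"server": "EX4", "status": "FailedAndSuspended", "content_index": "Failed",
     "copy_queue_length": 0, "replay_queue_length": 0},
]
print(best_copy(copies)["server"])  # EX3
```

EX4 is excluded outright by its status, even though its queues are empty; between the healthy copies, the one with the shorter queues wins.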
Continuous Replication

Continuous replication combines asynchronous log shipping and replay technology. It includes the following steps:

- Database copy seeding of the target
- Log copying from source to target
- Log inspection at the target
- Log replay into the database copy

Database seeding

Seeding is the process of making a baseline copy of a database available on the passive nodes. Depending on the situation, seeding can be an automatic process or a manual process in which you initiate the seeding.

Automatic seeding: An automatic seed produces a copy of a database in the target location. Automatic seeding requires that all log files, including the very first log file created by the database (it contains the database creation log record), be available on the source. Automatic seeding only occurs during the creation of a new server or the creation of a new database (or if the first log still exists, i.e., log truncation hasn't occurred).

Seeding using the Update-MailboxDatabaseCopy cmdlet: You can use the Update-MailboxDatabaseCopy cmdlet in the Exchange Management Shell to seed a database copy. This option uses the streaming copy backup API to copy the database from the active location to the target location.

Manually copying the offline database: This process dismounts the database and copies the database file to the same location on the passive node. If you use this method, there will be an interruption in service, because the procedure requires you to dismount the database.

Seeding is required under the following conditions:

- When a new passive node is introduced into a DAG environment and the first log file of the production database is not available.
- After a failover occurs in which data is lost as a result of the now passive copy having diverged and become unrecoverable.
- When the system has detected a corrupted log file that cannot be replayed into the passive copy.
- After an offline defragmentation of the database occurs.
- After a page scrubbing of the active copy of a database occurs, and you want to propagate the changes to the passive copy.
- After the log generation sequence for the database has been reset back to 1.

Log shipping

Log shipping automatically sends transaction logs from the active database copy to the servers hosting its passive copies. Log shipping in Exchange Server 2010 uses TCP sockets and supports encryption and compression, and the administrator can set the TCP port to be used for replication. The Replication service on the target notifies the active instance of the next log file it expects, based on the last log file it inspected. The Replication service on the source responds by sending the required log file(s). Copied log files are placed in the target's Inspector directory.

Log inspection

The log inspector is responsible for verifying that the log files are valid. The following actions are performed by the log inspector:

Physical integrity inspection: This validation runs ESEUTIL /K against the log file and verifies that the checksum recorded in the log file matches the checksum generated in memory.

Header inspection: The Replication service validates the following aspects of the log file's header:

- The generation is not higher than the highest generation recorded for the database in question.
- The generation recorded in the log header matches the generation recorded in the log filename.
- The log file signature recorded in the log header matches that of the log file.

Removal of Exx.log: Before the inspected log file can be moved into the log folder, the Replication service needs to remove any Exx.log files.
These log files are placed into another sub-directory of the log
directory, the ExxOutofDate directory. An Exx.log file would only exist on the target if it was previously running as a source. The Exx.log file needs to be removed before log replay occurs because it will contain old data that has been superseded by a full log file with the same generation. If the closed log file is not a superset of the existing Exx.log file, an incremental or full reseed will be required.

Log replay

After the log files have been inspected, they are placed in the log directory so that they can be replayed into the database copy. Before the Replication service replays the log files, it performs a series of validation tests. Once these validation checks have been completed, the Replication service replays the logs.

Lossy failure process

In the event of a failure, the following steps occur for the failed database:

1. Active Manager determines the best copy to activate.
2. The Replication service on the target server attempts to copy the missing log files from the source - ACLL (Attempt to Copy Last Logs).
3. If the attempt is successful (for example, because the server is online and the shares and necessary data are accessible), the database mounts with zero data loss.
4. If the attempt is unsuccessful (a lossy failure), the database mounts based on the AutoDatabaseMountDial setting.
5. The mounted database generates new log files (using the same log generation sequence).
6. Transport dumpster requests are initiated for the mounted database to recover lost messages.
7. When the original server or database recovers, it runs through divergence detection and performs an incremental reseed, or requires a full reseed.

AutoDatabaseMountDial

There are three possible values for the server setting AutoDatabaseMountDial.

Lossless: Lossless means zero logs lost. When the attribute is set to Lossless, under most circumstances the system waits for the failed node to come back online before databases are mounted.
Even then the failed system must return with all logs accessible and not corrupted. After the failure, the
passive node is made active, and the Microsoft Exchange Information Store service is brought online. It checks whether the databases can be mounted without any data loss; if so, the databases are mounted. If they cannot be automatically mounted, the system periodically attempts to copy the logs. If the server returns with its logs intact, this attempt will eventually succeed and the databases will mount. If the server returns without its logs intact, the remaining logs will not be available and the affected databases will not mount automatically. In this event, administrative action is required to force the databases to mount when logs are lost.

Good availability: Good availability means three logs lost. Good availability provides fully automatic recovery when replication is operating normally and replicating logs at the rate they are being generated.

Best availability: Best availability means six logs lost, and it is the default setting. Best availability operates similarly to Good availability, but it allows automatic recovery when replication is experiencing slightly more latency. Thus, the new active node might be slightly farther behind the state of the old active node after the failover, increasing the likelihood that database divergence occurs, which requires a reseed to correct.

Incremental resync

In Exchange Server 2007, LLR (Lost Log Resilience) delayed writes to the active database to minimize divergence between an old failed active copy and the new active copy, and thereby minimize the need for reseeds. Changes were written to the passive database before they were written to the active database. When the old failed active copy came back, it was unlikely to contain data that had never made it to the passive copy before the failure; only when it did contain such data would it have to receive a full reseed when it came back online.

In Exchange Server 2010, there are now two incremental resync solutions.
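The three AutoDatabaseMountDial settings above boil down to a threshold on how many log files may be lost while still allowing an automatic mount. A minimal sketch (the function name and dict are illustrative, not Exchange code):

```python
# Illustrative sketch of the AutoDatabaseMountDial decision: the maximum
# number of lost log files each setting tolerates for an automatic mount.

DIAL_MAX_LOST_LOGS = {
    "Lossless": 0,          # mount automatically only with zero loss
    "GoodAvailability": 3,
    "BestAvailability": 6,  # the default setting
}

def can_auto_mount(dial: str, lost_logs: int) -> bool:
    return lost_logs <= DIAL_MAX_LOST_LOGS[dial]

print(can_auto_mount("BestAvailability", 5))  # True
print(can_auto_mount("GoodAvailability", 5))  # False
print(can_auto_mount("Lossless", 1))          # False
```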
Incremental resync v1 is based on LLR depth and is only used when the waypoint = 1 (i.e., only one log has been lost). Incremental resync v2 is used when more than a single log is lost, and it follows this process:

1. Active DB1 on Server1 fails, and it is a lossy failure.
2. The passive DB1 copy on Server3 takes over service.
3. Some time later, the failed DB1 on Server1 comes back as a passive copy, but it contains inconsistent data.
4. The Replication service on Server1 compares the transaction logs on Server1 with those on Server3, starting with the newest generation and working backwards, to locate the divergence point.
5. Once the divergence point is located, the log records of the diverged logs on Server1 are scanned and a list of page records is built.
6. The Replication service then copies over the corresponding page records and logs from Server3. In addition, the database header's min/max required log generations are obtained from the active database copy on Server3.
7. The Replication service on Server1 then reverts the changes of the diverged logs by inserting the correct pages from Server3.
8. The database header of Server1's copy is updated with the appropriate min/max log generations.
9. Log recovery is then run to bring the database copy up to date.

Database Activation Coordination

DAC mode is used to control the activation behavior of a DAG when a catastrophic (disastrous or extremely harmful) failure occurs that affects the DAG (for example, a complete failure of one of the datacenters). When DAC mode isn't enabled and a failure affecting multiple servers in the DAG occurs, the DAG will restart and attempt to mount databases as soon as a majority of the servers are restored. In a multi-datacenter configuration, this behavior could cause split brain syndrome, a condition that occurs when all networks fail and DAG members can't receive heartbeat signals from each other. Split brain syndrome also occurs when network connectivity is severed between the datacenters. Split brain syndrome is prevented by always requiring a majority of the DAG members (and, in the case of DAGs with an even number of members, the DAG's witness server) to be available and interacting for the DAG to be operational. When a majority of the members are communicating, the DAG is said to have quorum.

DAC is designed to prevent this by implementing a "mommy, may I?" protocol, the Datacenter Activation Coordination Protocol (DACP).
In the event of a catastrophic loss, when the DAG recovers it cannot mount databases just because quorum is present in the DAG. Instead, it must coordinate with the other Active Managers in the DAG to determine state.
Consider the two-datacenter scenario. Suppose there is a complete power failure in the primary datacenter. In this event, all of the servers and the WAN are down, so the organization makes the decision to activate the standby datacenter. In almost all such recovery scenarios, when power is restored to the primary datacenter, WAN connectivity is typically not immediately restored. This means that the DAG members in the primary datacenter will power up but won't be able to communicate with the DAG members in the activated standby datacenter. The primary datacenter should always contain the majority of the DAG quorum voters, which means that when power is restored, even in the absence of WAN connectivity to the DAG members in the standby datacenter, the DAG members in the primary datacenter have a majority and therefore have quorum. This is a problem, because with quorum these servers may be able to mount their databases, which in turn would cause divergence from the actual active databases that are now mounted in the activated standby datacenter.

DACP was created to address this issue. Active Manager stores a bit in memory (either a 0 or a 1) that tells the DAG whether it's allowed to mount local databases that are assigned as active on the server. When a DAG is running in DAC mode (which would be any DAG with three or more members), each time Active Manager starts up, the bit is set to 0, meaning it isn't allowed to mount databases. Because it's in DAC mode, the server must try to communicate with all the other members of the DAG that it knows of, to get another DAG member to tell it whether it can mount the local databases that are assigned as active to it. The answer comes in the form of the bit setting of the other Active Managers in the DAG.
If another server responds that its bit is set to 1, it means servers are allowed to mount databases, so the server that is starting up sets its bit to 1 and mounts its databases. But when you recover from a primary datacenter power outage where the servers are recovered but WAN connectivity has not been restored, all of the DAG members in the primary datacenter will have a DACP bit value of 0; therefore, none of the servers starting back up in the recovered primary datacenter will mount databases, because none of them can communicate with a DAG member that has a DACP bit value of 1.
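The DACP "mommy, may I?" rule above reduces to a single check: a starting member may set its bit to 1 (and mount) only if it can reach a member whose bit is already 1. A minimal sketch, with an illustrative function name of my own:

```python
# Illustrative sketch of the DACP startup decision in DAC mode.

def may_mount(reachable_bits) -> bool:
    """reachable_bits: the DACP bits of the DAG members this server can
    contact. Mounting is allowed only if some reachable member's bit is 1."""
    return 1 in reachable_bits

# Recovered primary datacenter, WAN still down: every reachable member has
# just restarted with bit 0, so nobody mounts and split brain is avoided.
print(may_mount([0, 0, 0]))  # False
# Once the WAN returns, a standby-datacenter member with bit 1 is reachable.
print(may_mount([0, 0, 1]))  # True
```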
Transport Dumpster

You may be aware that the transport dumpster is part of the Hub Transport server. Although it belongs to the Hub Transport server, it works in conjunction with the DAG, so it is worth discussing its functionality here. The transport dumpster is a feature designed to minimize data loss by redelivering recently submitted messages back to the Mailbox server after a lossy failure.

Improvements in the transport dumpster

In Exchange 2007, messages were retained in the transport dumpster until an administrator-defined time limit or size limit was reached. In Exchange 2010, the transport dumpster now receives feedback from the replication pipeline to determine which messages have been delivered and replicated. As a message goes through Hub Transport servers on its way to a replicated mailbox database in a DAG, a copy is kept in the transport queue (mail.que) until the replication pipeline has notified the Hub Transport server that the transaction logs representing the message have been successfully replicated to and inspected by all copies of the mailbox database. After the logs have been replicated to and inspected by all database copies, the messages are removed from the transport dumpster. This keeps the transport dumpster queue smaller by maintaining only copies of messages whose transaction logs haven't yet been replicated.

The transport dumpster has also been enhanced to account for the changes to the Mailbox server role that enable a single mailbox database to move between Active Directory sites. DAGs can be extended to multiple Active Directory sites, and as a result, a single mailbox database in one Active Directory site can fail over to another Active Directory site. When this occurs, any transport dumpster redelivery requests are sent to both Active Directory sites: the original site and the new site.

Whenever a Hub Transport server receives a message, the message undergoes categorization.
Part of the categorization process involves querying Active Directory to determine whether the destination database containing the recipient's mailbox is part of a DAG. Once the message has been delivered to all recipients, the message is committed to the mail.que file on the Hub Transport server and stored in the transport dumpster inside the mail.que file. The transport dumpster is maintained for each database within each
Active Directory site that has a DAG enabled. There are two settings that define the life of a message within the transport dumpster:

MaxDumpsterSizePerDatabase: This parameter specifies the maximum size of the transport dumpster on a Hub Transport server for each database. The default value is 18 MB, and the valid input range is from 0 through 2147483647 KB. The recommendation is to set this to 1.5 times the maximum message size limit in your environment. If you do not have a maximum message size limit set, evaluate the messages delivered within your environment and set the value to 1.5 times the average message size in your organization. When you enter a value, qualify it with one of the following units: KB (kilobytes), MB (megabytes), GB (gigabytes) or TB (terabytes). Unqualified values are treated as kilobytes.

MaxDumpsterTime: This parameter defines the length of time that a message remains in the transport dumpster if the dumpster size limit is not reached. The default is seven days.

If either the time or the size limit is reached, messages are removed from the transport dumpster in first-in, first-out order.

When a failover (unscheduled outage) occurs, the Replication service attempts to copy the missing log files. If the copy attempt fails, this is known as a lossy failover, and the following steps are taken:

1. If the databases are within the AutoDatabaseMountDial value, they are automatically mounted.
2. The Replication service records in the cluster database that the database requires transport dumpster redelivery by setting the DumpsterRedeliveryRequired key to true.
3. The Replication service records in the cluster database the Hub Transport servers that exist within the mailbox server's Active Directory site.
4. The Replication service calculates the loss window, using the LastLogInspected marker as the start time and the current time as the end time. Because the transport dumpster is based on message delivery times, the loss window is generously padded by expanding it 12 hours back and 4 hours forward. The start time is recorded in DumpsterRedeliveryStartTime and the end time in DumpsterRedeliveryEndTime.
5. The Replication service makes an RPC call to the Hub Transport servers listed in DumpsterRedeliveryServers, requesting dumpster redelivery for the given loss window.
6. A Hub Transport server acknowledges the first redelivery request with a Retry response.
7. The Hub Transport server redelivers the messages it holds in its transport dumpster for the allotted time window. Once a message is resubmitted for delivery, it is removed from the transport dumpster.
8. The Replication service makes another RPC call to the Hub Transport servers listed in DumpsterRedeliveryServers, requesting dumpster redelivery for the given loss window.
9. The Hub Transport servers that have successfully redelivered the dumpster messages acknowledge the redelivery request with a Success response. At this point, the Replication service removes those Hub Transport servers from the DumpsterRedeliveryServers key.
10. This process continues until either all Hub Transport servers have redelivered the mail or the MaxDumpsterTime has been reached.

Note: If there are no message size limits in your organization, a single 18 MB message will purge all other messages for a given database on a given Hub Transport server.