UNC307 - Microsoft Exchange Server 2010 High Availability

Welcome to the future! The future of Exchange high availability, that is. In this session we reveal the changes and improvements to the built-in high availability platform in Exchange Server 2010. Exchange 2010 includes a unified solution for high availability and disaster recovery that is quick to deploy and easy to manage. Learn about all of the new features in Exchange 2010 that make it the most resilient, highly available version of Exchange ever.


  2. High Availability
     Scott Schnoll
     Principal Technical Writer
     Microsoft Corporation
     Session Code: UNC307
  3. Agenda
     • Exchange 2010 High Availability Vision/Goals
     • Exchange 2010 High Availability Features
     • Exchange 2010 High Availability Deep Dive
     • Deploying Exchange 2010 High Availability Features
     • Transitioning to Exchange 2010 High Availability
     • High Availability Design Examples
  4. Exchange 2010 High Availability Vision/Goals
  5. Exchange 2010 High Availability Vision and Goals
     Vision: Deliver a fast, easy-to-deploy-and-operate, economical solution that can provide messaging service continuity for all customers
     Goals:
     • Deliver a native solution for high availability/site resilience
     • Enable less expensive and less complex storage
     • Simplify administration and reduce support costs
     • Increase end-to-end availability
     • Support Exchange Server 2010 Online
     • Support large mailboxes at low cost
  6. Exchange Server 2003: Complex site resilience and recovery
     [Diagram: Outlook and OWA/ActiveSync/Outlook Anywhere clients connect through a Front End Server in San Jose to a two-node active/passive cluster; a standby cluster in Dallas holds DB1–DB3]
     • Clustered Mailbox Server had to be created manually
     • Third-party data replication needed for site resilience
     • Clustering knowledge required
     • Failover at Mailbox server level
  7. Exchange Server 2007: Complex activation for remote server/datacenter
     [Diagram: CCR between active and passive nodes in San Jose, with SCR replication to a standby cluster in Dallas; clients connect through a Client Access Server]
     • Clustered Mailbox Server can't co-exist with other roles
     • No GUI to manage SCR
     • Clustering knowledge required
     • Failover at Mailbox server level
  8. Exchange Server 2010
     [Diagram: clients connect through a Client Access Server to Mailbox Servers 1–6 spanning San Jose and Dallas, each hosting multiple replicated database copies]
     • All clients connect via CAS servers
     • Easy to extend across sites
     • Failover managed by/with Exchange
     • Database-level failover
  9. Exchange 2010 High Availability Features
  10. Exchange 2010 High Availability Terminology
      • High Availability – solution must provide data availability, service availability, and automatic recovery from failures
      • Disaster Recovery – process used to manually recover from a failure
      • Site Resilience – disaster recovery solution used for recovery from site failure
      • *over – short for switchover/failover; a switchover is a manual activation of one or more databases; a failover is an automatic activation of one or more databases after a failure
  11. Exchange 2010 High Availability Feature Names
      • Mailbox Resiliency – name of the unified high availability and site resilience solution
      • Database Mobility – the ability of a single mailbox database to be replicated to and mounted on other Mailbox servers
      • Incremental Deployment – the ability to deploy high availability/site resilience after Exchange is installed
      • Exchange Third Party Replication API – an Exchange-provided API that enables use of third-party replication for a DAG in lieu of continuous replication
  12. Exchange 2010 High Availability Feature Names
      • Database Availability Group – a group of up to 16 Mailbox servers that host a set of replicated databases
      • Mailbox Database Copy – a mailbox database (.edb file and logs) that is either active or passive
      • RPC Client Access service – a Client Access server feature that provides a MAPI endpoint for Outlook clients
      • Shadow Redundancy – a transport feature that provides redundancy for messages for the entire time they are in transit
  13. Exchange 2010 *overs
      • Within a datacenter: database or server *overs
      • Datacenter level: switchover
      • Between datacenters: database or server *overs
      • Assumptions:
        – Each datacenter is a separate Active Directory site
        – Each datacenter has live, active messaging services
        – Standby datacenter must be active to support a single database *over
  14. Exchange 2007 Concepts Brought Forward
      • Extensible Storage Engine (ESE): databases and log files
      • Continuous replication: log shipping and replay, database seeding
      • Store service/Replication service
      • Database health and status monitoring
      • Divergence
      • Automatic database mount behavior
      • Concepts of quorum and witness
      • Concepts of *overs
  15. Exchange 2010 Cut Concepts
      • Storage groups
      • Databases identified by the server on which they live (server names as part of database names)
      • Clustered Mailbox Servers:
        – Pre-installing a Windows failover cluster
        – Running Setup in clustered mode
        – Moving a CMS network identity between servers
      • Shared storage
      • The two-HA-copy limit
      • Requirement of two networks (concepts of public, private, and mixed networks)
  16. HA/Backup Strategy Changes
      Exchange 2010 failure scenarios, features, and benefits:
      • HW/SW failures; datacenter failures → Mailbox Resiliency (fast recovery) → fast recovery, data redundancy
      • Accidentally deleted items; administrator error → Single Item Recovery (data retention) → guaranteed item retention
      • Mailbox corruption → Lagged Copy → past point-in-time DB copy
      • Long-term data retention → Personal Archive + Retention Policies → alternate mailbox for older data
  17. Exchange 2010 High Availability Deep Dive
  18. Exchange 2010 HA Fundamentals
      [Diagram: a DAG containing servers (SVR), each running Active Manager (AM) and hosting databases (DB) with copies; RPC Client Access (RPC CAS) sits in front]
      • Database Availability Group
      • Server
      • Database
      • Database copy
      • Active Manager
      • RPC Client Access
  19. Database Availability Group (DAG)
      • Base component of high availability and site resilience
      • A group of up to 16 servers that host a set of replicated databases
      • "Wraps" a Windows failover cluster:
        – Manages membership (DAG member = node)
        – Provides heartbeat of DAG member servers
        – Active Manager stores data in the cluster database
      • Defines a boundary for:
        – Mailbox database replication
        – Database and server *overs
        – Active Manager
  20. DAG Requirements
      • Windows Server 2008 SP2 Enterprise Edition or Windows Server 2008 R2 Enterprise Edition
      • Exchange Server 2010 Standard Edition or Exchange Server 2010 Enterprise Edition
        – Standard supports up to 5 databases per server
        – Enterprise supports up to 100 databases per server
      • At least one network card per DAG member
  21. Active Manager
      • Exchange component that manages *overs
      • Runs on every server in the DAG
      • Selects the best available copy on failovers
      • Is the definitive source of information on where a database is active
        – Stores this information in the cluster database
        – Provides this information to other Exchange components (e.g., RPC Client Access and Hub Transport)
      • Two Active Manager roles: PAM and SAM
      • The Active Manager client runs on CAS and Hub
  22. Active Manager
      • Primary Active Manager (PAM)
        – Runs on the node that owns the cluster group
        – Gets topology change notifications
        – Reacts to server failures
        – Selects the best database copy on *overs
      • Standby Active Manager (SAM)
        – Runs on every other node in the DAG
        – Responds to queries about which server hosts the active copy of the mailbox database
      • Both roles are necessary for automatic recovery
        – If the Replication service is stopped, automatic recovery will not happen
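To see which DAG member currently holds the PAM role, the deck's cmdlet style suggests a query like the following sketch (run from the Exchange Management Shell; the DAG name DAG1 is the illustrative one used on the deployment slide):

```powershell
# Show the DAG member that currently owns the Primary Active Manager role.
# -Status makes Exchange query the cluster for live state rather than
# returning only the configuration stored in Active Directory.
Get-DatabaseAvailabilityGroup -Identity DAG1 -Status |
    Format-List Name, Servers, PrimaryActiveManager
```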
  23. Active Manager: Selection of Active Database Copy
      • Active Manager selects the "best" copy to become active when the existing active copy fails
      • Ignores servers that are unreachable or on which activation is temporarily or permanently blocked
      • Sorts copies by currency to minimize data loss
      • Breaks ties during the sort based on activation preference
      • Selects from the sorted list based on the copy status of each copy
  24. Active Manager: Selection of Active Database Copy
      [Diagram: the sorted copies are evaluated against ten progressively relaxed sets of criteria. Every set requires a copy status of Healthy, DisconnectedAndHealthy, DisconnectedAndResynchronizing, or SeedingSource; the stricter sets additionally require a Healthy or Crawling content index catalog, CopyQueueLength < 10, and/or ReplayQueueLength < 50. The first copy to satisfy a set is activated.]
  25. Automatic Recovery Process
      When a failure occurs that affects a database:
      • Active Manager determines the best copy to activate
      • The Replication service on the target server attempts to copy missing log files from the source (ACLL: Attempt Copy Last Logs)
      • If successful, the database mounts with zero data loss
      • If unsuccessful (lossy failure), the database mounts based on the AutoDatabaseMountDial setting
      • The mounted database generates new log files (using the same log generation sequence)
      • Transport dumpster requests are initiated for the mounted database to recover lost messages
      • When the original server or database recovers, it runs through divergence detection and either performs an incremental resync or requires a full reseed
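The lossy-failure threshold mentioned above is configured per Mailbox server. A sketch, assuming a server named EXMBX1 (the server name is illustrative; the setting and its values Lossless, GoodAvailability, and BestAvailability are real):

```powershell
# Allow a lossy mount only when 6 or fewer log files are missing
# (GoodAvailability). Lossless requires all logs to be present;
# BestAvailability, the default, allows up to 12 missing log files.
Set-MailboxServer -Identity EXMBX1 -AutoDatabaseMountDial GoodAvailability

# Verify the setting.
Get-MailboxServer -Identity EXMBX1 | Format-List Name, AutoDatabaseMountDial
```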
  26. Example: Database Failover
      [Diagram: a five-server DAG; the active copy of a database fails on one member and a passive copy on another member becomes active]
      • Database failure occurs
      • Failure item is raised
      • Active Manager moves the active database
      • Database copy is restored
      • Similar flow within and across datacenters
  27. Example: Server Failover
      [Diagram: a five-server DAG; one member fails and its active databases are activated on other members]
      • Server failure occurs
      • Cluster notification of node down
      • Active Manager moves the active databases
      • Server is restored
      • Cluster notification of node up
      • Database copies resynchronize with the active databases
      • Similar flow within and across datacenters
  28. Example: RPC Client Access Service and Active Manager
      [Diagram: Outlook clients connect through a load balancer to a CAS array (CAS1–CAS3), each running the Active Manager client; behind them, a DAG of Mailbox Servers 1–4, each running Store and Active Manager]
      • Outlook connects via MAPI RPC; the Active Manager client asks "Where's the DB mounted?" and Active Manager returns Mailbox Server 1
      • A disk fails; Outlook's reconnect triggers a new Active Manager request
      • If failover is still in progress, Active Manager returns the old server and the connect fails
      • Outlook tries again; once the database failover is complete, Active Manager returns the new server
      • If a CAS fails, Outlook reconnects through another CAS in the array
  29. DAG Lifecycle
      • A DAG is created initially as an empty object in Active Directory
        – Uses continuous replication, or third-party replication in Third Party Replication mode
        – The DAG is given a name and one or more IP addresses (or configured to use DHCP)
      • When the first Mailbox server is added to a DAG:
        – A Windows failover cluster is formed with a Node Majority quorum using the name of the DAG
        – The server is added to the DAG object in Active Directory
        – A cluster network object (CNO) for the DAG is created in the built-in Computers container
        – The name and IP address of the DAG are registered in DNS
        – The cluster database for the DAG is updated with info on configured databases, including whether they are locally active (which they should be)
  30. DAG Lifecycle
      • When the second and subsequent Mailbox servers are added to a DAG:
        – The server is joined to the cluster for the DAG
        – The quorum model is automatically adjusted:
          Node Majority for DAGs with an odd number of members
          Node and File Share Majority for DAGs with an even number of members
        – The file share witness cluster resource, directory, and share are automatically created by Exchange when needed
        – The server is added to the DAG object in Active Directory
        – The cluster database for the DAG is updated with info on configured databases, including whether they are locally active (which they should be)
  31. DAG Lifecycle
      After servers have been added to a DAG:
      • Configure the DAG: network encryption, network compression
      • Configure DAG networks: network subnets, enable/disable MAPI traffic/replication
      • Create mailbox database copies (seeding is performed automatically)
      • Monitor health and status of database copies
      • Perform switchovers as needed
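The configuration and monitoring steps above map onto Exchange Management Shell cmdlets. A sketch, reusing the illustrative names from the deployment slide (DAG1, MBXDB1, EXMBX2); the DAG network name DAGNetwork01 is hypothetical:

```powershell
# Configure DAG-wide replication behavior: encrypt and compress
# only cross-subnet replication traffic.
Set-DatabaseAvailabilityGroup -Identity DAG1 `
    -NetworkEncryption InterSubnetOnly -NetworkCompression InterSubnetOnly

# Enable a DAG network for log replication traffic.
Set-DatabaseAvailabilityGroupNetwork -Identity DAG1\DAGNetwork01 `
    -ReplicationEnabled:$true

# Monitor health and status of all copies of a database.
Get-MailboxDatabaseCopyStatus -Identity MBXDB1

# Perform a switchover: activate the copy hosted on another member.
Move-ActiveMailboxDatabase MBXDB1 -ActivateOnServer EXMBX2
```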
  32. DAG Lifecycle
      • Before you can remove a server from a DAG, you must first remove all replicated databases from the server
      • When a server is removed from a DAG:
        – The server is evicted from the cluster
        – The cluster quorum is adjusted as needed
        – The server is removed from the DAG object in Active Directory
      • Before you can remove a DAG, you must first remove all servers from the DAG
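In cmdlet form, the tear-down order above might look like this sketch (server and database names are the illustrative ones used on the deployment slide):

```powershell
# 1. Remove the replicated database copies hosted on the server.
Remove-MailboxDatabaseCopy -Identity MBXDB1\EXMBX3

# 2. Remove the server from the DAG. This evicts the node from the
#    cluster, adjusts quorum, and updates the DAG object in AD.
Remove-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer EXMBX3

# 3. Only after all members are removed can the DAG itself be deleted.
Remove-DatabaseAvailabilityGroup -Identity DAG1
```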
  33. Deploying Exchange 2010 HA Features
  34. Deploying Exchange 2010 HA Features
  35. Exchange 2010 Incremental Deployment
      • Create a DAG:
        New-DatabaseAvailabilityGroup -Name DAG1 -WitnessServer EXHUB1 -WitnessDirectory C:\DAG1FSW -DatabaseAvailabilityGroupIpAddresses 10.0.0.8
        New-DatabaseAvailabilityGroup -Name DAG2 -DatabaseAvailabilityGroupIpAddresses 10.0.0.8,192.168.0.8
      • Add the first Mailbox server to the DAG:
        Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer EXMBX1
      • Add second and subsequent Mailbox servers:
        Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer EXMBX2
      • Add a mailbox database copy:
        Add-MailboxDatabaseCopy -Identity MBXDB1 -MailboxServer EXMBX3
      • Extend as needed
  36. Transitioning to Exchange 2010 High Availability
  37. Transition Steps
      • Verify that you meet the requirements for Exchange 2010
      • Deploy Exchange 2010
      • Use Exchange 2010 mailbox move features to migrate
      Unsupported transitions:
      • In-place upgrade to Exchange 2010 from any previous version of Exchange
      • Using database portability between Exchange 2010 and non-Exchange 2010 databases
      • Backup and restore of earlier versions of Exchange databases on Exchange 2010
      • Using continuous replication between Exchange 2010 and Exchange 2007
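The mailbox move mentioned above is driven by the Exchange 2010 move-request cmdlets. A sketch (the mailbox identity and target database name are hypothetical):

```powershell
# Queue an online mailbox move from a legacy server into an
# Exchange 2010 database that is protected by a DAG.
New-MoveRequest -Identity "scott@contoso.com" -TargetDatabase MBXDB1

# Track progress of pending moves.
Get-MoveRequest | Get-MoveRequestStatistics
```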
  38. Exchange Server 2010 High Availability Design Examples
  39. High Availability Design Example: Branch/Small Office Design
      [Diagram: two multi-role servers (Client Access, Hub Transport, Mailbox) forming a two-member DAG behind a hardware load balancer, each hosting copies of DB1–DB3]
      • 8 processor cores recommended, with a maximum of 64 GB RAM
      • Member servers of a DAG can host other server roles (Client Access, Hub Transport)
      • UM role not recommended for co-location
      • 2-server DAGs should use RAID
  40. High Availability Design Example: Double Resilience – Maintenance + DB Failure
      [Diagram: AD site Dublin; single site, 3 nodes, 3 HA copies, JBOD (3 physical copies); a CAS NLB farm in front of Mailbox Servers 1–3, each hosting copies of DB1–DB6; two servers are marked failed]
      • With 2 servers out, manual activation of server 3 is required
      • In a 3-server DAG, quorum is lost
      • DAGs with more servers sustain more failures – greater resiliency
  41. High Availability Design Example: Double Node/Disk Failure Resilience
      [Diagram: AD site Dublin; a CAS NLB farm in front of Mailbox Servers 1–4, each hosting copies of DB1–DB8; server 1 is down for upgrade and server 2 has failed]
      • Single site
      • 4 nodes
      • 3 HA copies
      • JBOD -> 3 physical copies
      • Sequence: upgrade server 1; server 2 fails; server 1 upgrade is done; 2 active copies die
  49. High Availability on JBOD: 6 Servers, 3 Racks, 3-Copy DAG
      [Diagram: a 6-server DAG (8 cores, 48 GB RAM per server) with separate MAPI and replication networks, hosting databases DB1–DB90 in 3 copies (active, passive, spare disks marked in a legend)]
      • 24,000 mailboxes; heavy profile: 100 messages/day, 0.1 IOPS/mailbox, 2 GB mailbox size
      • 4,000 active mailboxes per server
      • 6 servers, 3 copies = double server failure resiliency
      • 1st failure: ~5,000 active; 2nd failure: 6,000 active; soft active limit: 24
      • 1 TB 7.2k SATA disks; JBOD: 48 disks per node plus 3 online spares; 288 disks total; 30 TB of database space
      • Battery-backed caching array controller
  50. Key Takeaways
      • Greater end-to-end availability with Mailbox Resiliency
      • Unified framework for high availability and site resilience
      • Faster and easier to deploy with Incremental Deployment
      • Reduced TCO with core ESE architecture changes and more storage options
      • Supports large mailboxes for less money
  51. Question & Answer
  52. Resources
      • Sessions On-Demand & Community: www.microsoft.com/teched
      • Microsoft Certification & Training Resources: www.microsoft.com/learning
      • Resources for IT Professionals: http://microsoft.com/technet
      • Resources for Developers: http://microsoft.com/msdn
      (Speaker note: TechEd 2009 is not producing a DVD; attendees can access session recordings at TechEd Online.)
  53. Complete an evaluation on CommNet and enter to win an Xbox 360 Elite!
  55. © 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
