ITCamp 2011 - Paul Roman - High Availability for Exchange 2010


  1. 1. High Availability for Exchange 2010
     Paul Roman, MVP, Managing Partner, PRAS Consulting
     E-mail: paul.roman@pras.ro
     Blog: paulroman.pras.ro
     Premium conference on Microsoft’s Dev and ITPro technologies @itcampro / #itcampro
  2. 2. IT Camp 2011
     • Thanks for coming!
     • ITCamp is made possible by our sponsors
  3. 3. Session agenda
     • Discuss different HA design dimensions:
       – Infrastructure design
       – Database Availability Group design
       – Client experiences
     • Implementation Examples
     • Q&A
     • Feedback & prizes
  4. 4. INFRASTRUCTURE DESIGN
     How should you design your IT infrastructure for Exchange HA
  5. 5. Infrastructure Design: Active Directory Sites
     • Active Directory site assignment controls the association of CAS to Mailbox and Hub to Mailbox
       – CAS/HUB service local mailbox servers, “mostly”
       – Could be for multiple DAGs
     • DAGs can span subnets without special action
       – IP address for each MAPI subnet used by DAG
       – Configured on DAG object
     • Question: When would an AD site span datacenters?
       – Answer: When datacenters have LAN-quality communication
     • Follow Active Directory guidance for AD site definition
  6. 6. Infrastructure Design: Cross-Datacenter Network Configuration
     • For site resilience configurations, use DHCP to assign addresses for the replication network
       – Enables delivery of the typically required static routes
       – If using static IP addresses, use netsh instead of route for configuring static routes
     • In terms of latency requirements, Exchange 2010 was designed with a target round-trip latency of 250 ms or less
       – Remember, the higher the latency, the more impact to replication
     • Configure a DNS TTL on “service access connection records” that is consistent with your SLA
       – E.g. ~5 minutes for a one-hour RTO SLA
       – Direct association between this time and recovery
       – Remember the records might be in different zones!
  7. 7. Infrastructure Design: Namespace Planning (Site Resilience)
     • Each datacenter should be considered active when planning for namespaces
     • Each datacenter needs the following namespaces
       – OWA/OA/EWS/EAS namespace
       – POP/IMAP namespace
       – RPC Client Access namespace
       – SMTP namespace
     • In addition, one of the datacenters will maintain the Autodiscover namespace
  8. 8. Infrastructure Design: Leverage Split-brain DNS
     • Best practice: Use “Split DNS” for Exchange hostnames used by clients
     • Goal: minimize number of hostnames
       – mail.contoso.com for Exchange connectivity on intranet and Internet
       – mail.contoso.com has different IP addresses in intranet/Internet DNS
     • Important: before moving down this path, be sure to map out all the hostnames (outside of Exchange) that you will want to create in the internal zone
  9. 9. Infrastructure Design: What does the namespace design look like?
     Datacenter 1 (CAS, HT, MBX, AD)
     • External DNS: Mail.contoso.com, Pop.contoso.com, Imap.contoso.com, Autodiscover.contoso.com, Smtp.contoso.com
     • ExternalURL = mail.contoso.com
     • CAS Array = outlook.contoso.com
     • OA endpoint = mail.contoso.com
     • Internal DNS: Mail.contoso.com, Pop.contoso.com, Imap.contoso.com, Autodiscover.contoso.com, Smtp.contoso.com, Outlook.contoso.com
     Datacenter 2 (CAS, HT, MBX, AD)
     • External DNS: Mail.region.contoso.com, Pop.region.contoso.com, Imap.region.contoso.com, Smtp.region.contoso.com
     • ExternalURL = mail.region.contoso.com
     • CAS Array = outlook.region.contoso.com
     • OA endpoint = mail.region.contoso.com
     • Internal DNS: Mail.region.contoso.com, Pop.region.contoso.com, Imap.region.contoso.com, Smtp.region.contoso.com, Outlook.region.contoso.com
  10. 10. Infrastructure Design: Certificate Planning
      • Best practice: minimize the number of certificates
        – 1 certificate for all CAS servers + reverse proxy + Edge/Hub
      • Use a “Subject Alternative Name” (SAN) certificate, which can cover multiple hostnames
      • If leveraging a certificate per datacenter, then ensure that the Certificate Principal Name is the same on all certificates
        – Outlook Anywhere won’t connect if the Principal Name on the certificate does not match the value configured in msstd: (default matches OA RPC End Point)
        – Set-OutlookProvider EXPR -CertPrincipalName msstd:mail.contoso.com
  11. 11. Infrastructure Design: Site Resilience Models
      • There are two key models you have to take into account when designing site resilient solutions
        – Datacenter / Namespace Model
        – User Distribution Model
      • As mentioned, when planning for site resilience, each datacenter needs to be considered active
  12. 12. Infrastructure Design: User Distribution Models
      • The locality of the users will ultimately determine your site resilience architecture
        – Are users primarily located in one datacenter?
        – Are users located in multiple datacenters?
        – Is there a requirement to maintain user population in a particular datacenter?
      • Active/Passive user distribution model
        – Database copies deployed in the secondary datacenter, but no active mailboxes are hosted there
      • Active/Active user distribution model
        – User population dispersed across both datacenters, with each datacenter being the primary datacenter for its specific user population
  13. 13. Infrastructure Design: Client Access Arrays
      • 1 CAS array per AD site
        – Multiple DAGs within an AD site can use the same CAS array
      • FQDN of the CAS array needs to resolve to a load-balanced virtual IP address in DNS
        – Should only resolve in internal DNS structure
      • CAS array does not provide any load balancing -> you need a load balancer!
      • Set the databases in the AD site to utilize the CAS array via the Set-MailboxDatabase RPCClientAccessServer property
      • By default, new databases will have the RPCClientAccessServer value set on creation
        – If database was created prior to creating CAS array, then it is set to random CAS FQDN (or local machine if role co-location)
        – If database is created after creating CAS array, then it is set to the CAS array FQDN
  14. 14. DATABASE AVAILABILITY GROUP DESIGN
      How should you design your DAGs
  15. 15. DAG Design: Database Copies
      • Each DAG member can host 1 copy of each mailbox database
      • Maximum number of databases within a 16-member DAG, by number of copies of each database:
        – 1 copy – 1600 databases
        – 2 copies – 800 databases
        – 3 copies – 533 databases
      • Two types of database copies
        – HA database copies
        – Lagged database copies
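The database counts on this slide follow from simple division. Below is a minimal Python sketch, assuming a ceiling of 100 database copies per DAG member (inferred from the slide's own 1600 / 16; the constant and function names are illustrative and not part of any Exchange tooling):

```python
# Assumption: each DAG member can host at most 100 database copies
# (active + passive), which is what the slide's 16 x 100 = 1600 implies.
MAX_COPIES_PER_SERVER = 100


def max_databases(dag_members: int, copies_per_database: int) -> int:
    """Largest number of databases a DAG can hold at a given copy count."""
    if copies_per_database < 1 or copies_per_database > dag_members:
        # A member hosts at most one copy of a database, so the copy
        # count can never exceed the number of members.
        raise ValueError("copies must be between 1 and the number of members")
    total_copy_slots = dag_members * MAX_COPIES_PER_SERVER
    return total_copy_slots // copies_per_database


if __name__ == "__main__":
    for copies in (1, 2, 3):
        print(copies, "copies:", max_databases(16, copies), "databases")
```

Integer division is used because a fractional database cannot be deployed, which is why three copies yields 533 rather than 533.3.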
  16. 16. DAG Design: Lagged Database Copies
      • Lagged copies are only for point-in-time protection
        – Logical corruption and/or mailbox deletion prevention scenarios
        – Provide a maximum of 14 days protection
      • When should you deploy a lagged copy?
        – Useful only to mitigate a risk
        – Not needed if deploying a backup solution (e.g. DPM 2010)
      • Lagged copies are not HA database copies
        – Lagged copies should never be activated!
      • Lagged copies have storage implications
  17. 17. DAG Design: Controlling Database Copy Activation
      • Various scenarios:
        – Don’t want to activate database copies on servers in standby because…
        – Want to preclude activation of copies on server X because of hardware issue or lagged copies…
        – Block activation of database copies on a server during upgrade
      • Two ways to block activation of copies
        – Set-MailboxServer <Server> -DatabaseCopyAutoActivationPolicy <Blocked,IntrasiteOnly,Unrestricted>
        – Suspend-MailboxDatabaseCopy <DB\Server> -ActivationOnly
  18. 18. DAG Design: Sizing
      • Question: How many members should be in a DAG?
        – Answer: It depends (maximum would be 16)
      • The larger the DAG, the better the resiliency
        – Consider the implications of a three-copy / six-server DAG vs. two DAGs with three servers and three copies of each database
        – Larger DAGs continue to provide as much service as they can after more failures
      • The larger the DAG, the better the efficiency of the hardware
        – Distribute active load across all members
      • For server count, consider a multiple of the number of copies you are deploying
  19. 19. DAG Design: Sizing
      • Question: How many DAGs should I deploy?
        – Answer: It depends
      • Obviously you will need to deploy multiple DAGs if you need more than 16 servers
      • You may also need multiple DAGs depending on your site resilience architecture
        – If deploying an Active/Active user distribution architecture, then you should consider deploying 2+ DAGs; this allows you to control locality and not perform a site activation in the event of a network failure between datacenters
  20. 20. DAG Design: Active/Active User Distribution Sizing
      [Diagram: a single DAG (DAG1) with one FSW, spanning the Primary Datacenter (Outlook clients, CAS-Pri, HT2010) and the Secondary Datacenter (Outlook clients, CAS-Sec, HT2010). DAG1 members MBX-A, MBX-B, MBX-C, MBX-D; databases are active in both datacenters.]
  21. 21. DAG Design: Active/Active User Distribution Sizing
      [Diagram: two DAGs, each with its own FSW, spanning the Primary Datacenter (Outlook clients, CAS-Pri, HT2010) and the Secondary Datacenter (Outlook clients, CAS-Sec, HT2010). DAG1 members MBX-A, MBX-B, MBX-C, MBX-D are active in one datacenter and passive in the other; DAG2 members MBX-E, MBX-F, MBX-G, MBX-H are the reverse.]
  22. 22. DAG Design: Two Failure Models
      • Design for all database copies activated
        – Design for the worst case: server architecture handles 100 percent of all hosted database copies becoming active
      • Design for targeted failure scenarios
        – Design server architecture to handle the active mailbox load during the worst failure case you plan to handle
          • 1 member failure requires 2 or more HA copies and 2 or more servers
          • 2 member failure requires 3 or more HA copies and 4 or more servers
        – Requires Set-MailboxServer <Server> -MaximumActiveDatabases <Number>
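For the targeted-failure model, a rough first estimate of the active load per surviving server can be computed as below. This is only a sketch in Python: it assumes the active databases can be redistributed perfectly evenly, which real copy layouts may not allow, so the result is a lower bound rather than a recommended MaximumActiveDatabases value. The function name and the example figures are illustrative.

```python
import math


def active_per_survivor(servers: int, databases: int, failed: int) -> int:
    """Active databases on the busiest surviving server, assuming the
    load can be spread perfectly evenly across the survivors."""
    survivors = servers - failed
    if survivors < 1:
        raise ValueError("at least one server must survive")
    return math.ceil(databases / survivors)


if __name__ == "__main__":
    # Hypothetical DAG: 8 servers hosting 40 databases.
    for failed in (0, 1, 2):
        print(failed, "failed ->", active_per_survivor(8, 40, failed), "active per server")
```

Because the estimate ignores where the passive copies actually live, a limit set exactly at this value can leave some databases unable to mount after a failure.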
  23. 23. DAG Design: It’s all in the layout
      • Consider this scenario
        – 8 servers, 40 databases with 2 copies

      Server 1  Server 2  Server 3  Server 4  Server 5  Server 6  Server 7  Server 8
      DB1       DB6       DB11      DB16      DB21      DB26      DB31      DB36
      DB2       DB7       DB12      DB17      DB22      DB27      DB32      DB37
      DB3       DB8       DB13      DB18      DB23      DB28      DB33      DB38
      DB4       DB9       DB14      DB19      DB24      DB29      DB34      DB39
      DB5       DB10      DB15      DB20      DB25      DB30      DB35      DB40
      DB36’     DB31’     DB26’     DB21’     DB16’     DB11’     DB6’      DB1’
      DB37’     DB32’     DB27’     DB22’     DB17’     DB12’     DB7’      DB2’
      DB38’     DB33’     DB28’     DB23’     DB18’     DB13’     DB8’      DB3’
      DB39’     DB34’     DB29’     DB24’     DB19’     DB14’     DB9’      DB4’
      DB40’     DB35’     DB30’     DB25’     DB20’     DB15’     DB10’     DB5’
  24. 24. DAG Design: It’s all in the layout
      • If I have a single server failure
        – Life is good
      (Same server/database layout as slide 23.)
  25. 25. DAG Design: It’s all in the layout
      • If I have a double server failure
        – Life could be good…
      (Same server/database layout as slide 23.)
  26. 26. DAG Design: It’s all in the layout
      • If I have a double server failure
        – Life could be bad…
      (Same server/database layout as slide 23.)
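The difference between the "good" and "bad" double failures can be checked by brute force. The Python sketch below rebuilds the 8-server, 40-database layout from slide 23, in which each server's five active databases have their second copy on the mirrored server (1 with 8, 2 with 7, and so on), and then tests every pair of failed servers. The function names are illustrative.

```python
from itertools import combinations

SERVERS = 8
DBS_PER_SERVER = 5


def build_layout():
    """Return {database number: set of servers holding a copy}."""
    layout = {}
    for server in range(1, SERVERS + 1):
        partner = SERVERS + 1 - server  # mirrored server holds the 2nd copy
        first = (server - 1) * DBS_PER_SERVER + 1
        for db in range(first, first + DBS_PER_SERVER):
            layout[db] = {server, partner}
    return layout


def lost_databases(layout, failed):
    """Databases whose every copy sits on a failed server."""
    failed = set(failed)
    return sorted(db for db, hosts in layout.items() if hosts <= failed)


if __name__ == "__main__":
    layout = build_layout()
    pairs = list(combinations(range(1, SERVERS + 1), 2))
    bad = [pair for pair in pairs if lost_databases(layout, pair)]
    print(len(bad), "of", len(pairs), "double failures lose databases:", bad)
```

With this mirrored layout only the four partner pairs are fatal, but each of them takes ten databases offline at once, which is the risk the slide is pointing at.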
  27. 27. DAG Design: It’s all in the layout
      • Now let’s consider this scenario
        – 4 servers, 12 databases with 3 copies
          [Layout diagram: Servers 1 to 4 hosting DB1 to DB12, each database with an active copy, a second copy (’) and a third copy (’’) on different servers]
        – With a single server failure: [same layout diagram]
        – With a double server failure: [same layout diagram]
  28. 28. DAG Design: It’s all in the layout – Over Subscription
      • If you plan to over subscribe the servers then:
        – Don’t plan to be perfect!
        – Set soft threshold for number of active databases per server
          • In some circumstances databases will fail to mount because of limit
        – Put processes in place for redistributing databases per server
          • After hardware maintenance
          • After software maintenance
          • Periodically, because of random failures
        – SP1 includes a script to provide automated load balancing
  29. 29. DAG Design: It’s all in the layout – Over Subscription
      • If you plan to over subscribe the servers then:
        – Educate your operations team on implication of over subscription
        – Periodically validate you are not too over subscribed
          • Run in your worst case scenario for a period of time
        – Have a plan on how you handle being too over subscribed
      • Reminders:
        – Design storage subsystems to handle all database copy I/O and capacity
        – Design CPU and memory to handle the max active database copies and the passive copies
        – Design memory to handle the max active database copies
        – Design network subsystem to handle the throughput required to sustain the active load, the number of target copies, and CI updates
  30. 30. DAG Design: It’s all in the layout
      • Consider physical hardware situations where practical (JBOD in particular)
        – If servers in DAG are in multiple racks, then spread copies across racks
        – If servers are in different rooms in datacenter, then factor that into distribution
        – If servers reside on the same network switch/router, then a network failure can take out multiple servers
        – In summary, minimize possible single points of failure
  31. 31. DAG Design: Storage Architecture
      • Deployment on RAID or JBOD will be based on several factors
        – Cost
        – Hardware
        – Number of copies
        – Types of copies
        – Single or multi-datacenter
  32. 32. DAG Design: Storage Architecture

      Copy configuration              Primary Datacenter Servers   Secondary Datacenter Servers
      2 HA Copies (Total)             RAID                         RAID
      3+ HA Copies (Total)            RAID or JBOD                 RAID
      2+ HA Copies / Datacenter       RAID or JBOD                 RAID or JBOD
      1 Lagged Copy                   RAID                         RAID
      2+ Lagged Copies / Datacenter   RAID or JBOD                 RAID or JBOD
  33. 33. DAG Design: Replication Concerns
      • Replication is always from source to target
        – Remember if you have multiple copies in a remote datacenter, you will have multiple log streams being shipped across the wire
      • Exchange 2010 offers compression for log shipping
        – Controllable setting for the DAG
        – Default is inter-subnet
        – MSIT sees 30% compression, but can vary for each customer based on message profile
      • Also have to factor in content indexing
        – While an index exists for every copy, the index for a passive copy is updated by getting changes from active copy’s index
        – This communication is not compressed
      • How do I size for replication and content indexing impact?
        – Use the Exchange 2010 Mailbox Server Role Requirements Calculator
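To get a feel for the "multiple log streams" point, the Python sketch below estimates average log-shipping throughput across the WAN. It is a back-of-the-envelope illustration, not a substitute for the Mailbox Server Role Requirements Calculator. Assumptions: log files are 1 MB each, "30% compression" means the shipped volume shrinks by 30%, and the log rate in the example is hypothetical. Content indexing traffic, which the slide notes is not compressed, is left out.

```python
LOG_FILE_MB = 1  # assumption: 1 MB transaction log files


def replication_mbps(logs_per_hour: float, remote_copies: int,
                     compression: float = 0.30) -> float:
    """Average log-shipping throughput in megabits per second.

    Every remote copy receives its own log stream, so the volume
    scales linearly with the number of copies in the remote datacenter.
    """
    if not 0 <= compression < 1:
        raise ValueError("compression must be in [0, 1)")
    megabytes_per_hour = logs_per_hour * LOG_FILE_MB * remote_copies
    shipped = megabytes_per_hour * (1 - compression)
    return shipped * 8 / 3600  # MB/hour -> Mbit/s


if __name__ == "__main__":
    # Hypothetical load: 9000 logs generated per hour, 2 remote copies.
    print(round(replication_mbps(9000, 2), 1), "Mbit/s on average")
```

This is an average over the hour; peak log generation and any reseed traffic would need additional headroom.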
  34. 34. DAG Design: Replication Networks
      • DAG members with a single network are fully supported
        – Recommendation: have a minimum of two networks on each member server
      • Initial DAG network configuration is based on the enumeration of cluster networks
        – Cluster enumerates networks based on subnet
        – One cluster network is created for each subnet / port
        – Recommendation: Collapse into single MAPI and Replication DAG networks
      • MAPI network may be replication disabled
        – Network will be utilized for replication if no other valid replication path exists
      • There is no preference order to replication networks; they are chosen at random by the Replication service
  35. 35. DAG Design: Small Scale Architectures
      • Small scale / branch office architectures that require high availability
        – 2-4 servers typically
        – Requires Windows Server Enterprise Edition
      • There are many different options:

      Option                                                                       Hardware                          Licensing
      2 physical servers (all-in-one)*                                             Requires Hardware Load Balancer   Fewer licenses
      2 physical server architecture utilizing Hyper-V (role separation via VMs)*  Less hardware                     More Exchange licenses
      4 physical servers (role separation – 2 MBX, 2 HT/CAS)                       More hardware                     More Exchange and Windows licenses
  36. 36. CLIENT EXPERIENCES
      How should you design your DAGs
  37. 37. Client Experiences: Typical Outlook Behavior
      • All Outlook versions behave consistently in a single datacenter HA scenario
        – Profile points to Client Access Server array
        – Profile is unchanged by failovers or loss of CAS
      • All Outlook versions should behave consistently in a datacenter failover scenario
        – Primary datacenter Client Access Server DNS name is bound to IP address of standby datacenter’s Client Access Server
        – Autodiscover continues to hand out primary datacenter CAS name as Outlook RPC endpoint
        – Profile remains unchanged
  38. 38. Client Experiences: Cross-Site DB Failover Redirect (Outlook Versions)
      [Diagram: Outlook 2003, Outlook 2007 and Outlook 2010 clients; Primary Datacenter (CAS-Pri, HT2010) and Secondary Datacenter (CAS-Sec, HT2010); DAG members MBX-A, MBX-B, MBX-C, MBX-D; key: Active / Passive]
      • Outlook 2003 updates due to ecWrongServer
      • Outlook 2003 can’t update if source CAS is unavailable
      • Autodiscover detects profile change and updates client
      • Preferred Database Site = PDC (RPCClientAccessServer = CAS-PRI)
      • Cross Site Connections = Not Allowed
  39. 39. Client Experiences: Other Clients
      • Other client behavior varies per technology and scenario:

      Client              In-Site *Over Scenario   Out-of-Site *Over Scenario   Datacenter Switchover
      OWA                 Reconnect                Manual Redirect              Reconnect
      Active Sync         Reconnect                Redirect or proxy            Reconnect
      POP/IMAP            Reconnect                Proxy                        Reconnect
      EWS                 Reconnect                Autodiscover                 Reconnect
      Autodiscover        N/A                      Seamless                     Reconnect
      SMTP / Powershell   N/A                      N/A                          Reconnect
  40. 40. IMPLEMENTATION EXAMPLES
      Real life implementations
  41. 41. Implementation Examples: Fully Redundant Infrastructure
  42. 42. Implementation Examples: Disaster recovery with lagged copy
  43. 43. Conclusion
      • There are many different design dimensions that have to be considered when designing for high availability and site resilience with Exchange 2010
      • The choices you make will determine the number of copies and hardware you deploy
        – Design choices should be based on customer requirements
        – Exchange 2010 allows you to take advantage of new options which can lower costs
  44. 44. Q&A
  45. 45. Don’t forget!
      • Get your free Azure pass!
        – 30+15 days, no CC req’d
        – http://bit.ly/ITCAMP11
        – Promo code: ITCAMP11
      • We want your feedback!
        – Win a WP7 smartphone
        – Fill in your feedback forms
        – Raffle: end of the day
