IBM Flex System: A Solid Foundation for Microsoft Exchange Server 2010



Performance metrics and reference architecture for 30,000 Exchange users on IBM Flex System
Systems and Technology Group
April 2013
Technical Whitepaper
© Copyright IBM Corporation 2013

Contents
 Overview
 Introduction
 Test Configuration
 Solution Validation Methodology
 Validation Results
 Reference Architecture

Overview

IBM® builds, tests, and publishes reference configurations and performance metrics to provide clients with guidelines for sizing their Microsoft® Exchange Server 2010 environments. This document highlights the IBM Flex System™ x240 Compute Node and IBM Flex System V7000 Storage Node and how they can be used as the foundation for your Exchange 2010 infrastructure.

To demonstrate the performance of the x240 Compute Node and Flex V7000 Storage Node, 9,000 mail users are hosted on the x240 with the mailbox databases residing on the Flex V7000. Multiple tests are run to validate both the storage and the server at that workload. The performance metrics are then used to design a highly available reference architecture for a fictitious organization with 30,000 employees.

IBM has tested and is releasing this configuration, which is built using these key components:
 IBM Flex System V7000 Storage Node
 IBM Flex System x240 Compute Node
 Microsoft Windows® Server 2008 R2 SP1
 Microsoft Exchange Server 2010 SP1

Introduction

This document provides performance characteristics and a reference architecture for an Exchange 2010 mail environment hosted on an IBM Flex System x240 Compute Node and IBM Flex System V7000 Storage Node.

Microsoft Exchange Server 2010

Exchange Server 2010 software gives you the flexibility to tailor your deployment to your unique needs and provides a simplified way to help keep e-mail continuously available for your users. This flexibility and simplified availability come from innovations in the core platform on which Exchange is built.
These innovations deliver numerous advances in performance, scalability, and reliability, while lowering the total cost of ownership compared to an Exchange Server 2007 environment. A new, unified approach to high availability and disaster recovery helps achieve improved levels of reliability by reducing the complexity and cost of delivering business continuity. With new features, such as Database Availability Groups and online mailbox moves, you can more easily and confidently implement mailbox resiliency with database-level replication and failover, all with familiar Exchange management tools.

Administrative advances in Exchange Server 2010 can help you save time and lower operational costs by reducing the burden on your IT staff. A new role-based security model, self-service capabilities, and the Web-based Exchange Control Panel allow you to delegate common or specialized tasks to your users without giving them full administrative rights or increasing help-desk call volume.

For more information
For more information about Microsoft Exchange Server 2010, visit the following URL:

IBM PureFlex System

To meet today's complex and ever-changing business demands, you need a solid foundation of server, storage, networking, and software resources. Furthermore, it needs to be simple to deploy, and able to quickly and automatically adapt to changing conditions. You also need access to, and the ability to take advantage of, broad expertise and proven guidelines in systems management, applications, hardware maintenance, and more.

IBM PureFlex System is a comprehensive infrastructure system that provides an expert integrated computing system. It combines servers, enterprise storage, networking, virtualization, and management into a single structure. Its built-in expertise enables organizations to manage and deploy integrated patterns of virtual and hardware resources through unified management.
These systems are ideally suited for customers who want a system that delivers the simplicity of an integrated solution while still being able to tune middleware and the runtime environment. IBM PureFlex System uses workload placement based on virtual machine compatibility and resource availability. Using built-in virtualization across servers, storage, and networking, the infrastructure system enables automated scaling of resources and true workload mobility.

IBM PureFlex System has undergone significant testing and validation so that it can mitigate IT complexity without compromising the flexibility to tune systems to the tasks businesses demand. By providing both flexibility and simplicity, IBM PureFlex System can provide extraordinary levels of IT control, efficiency, and operating agility. This combination enables businesses to rapidly deploy IT services at a reduced cost. Moreover, the system is built on decades of expertise. This expertise enables deep integration and central management of the comprehensive, open-choice infrastructure system. It also dramatically cuts down on the skills and training required for managing and deploying the system. The streamlined management console is easy to use and provides a single point of control to manage your physical and virtual resources for a vastly simplified management experience.

IBM PureFlex System combines advanced IBM hardware and software along with patterns of expertise. It integrates them into three optimized configurations that are simple to acquire and deploy so you get fast time to value. IBM PureFlex System has the following configurations:
 IBM PureFlex System Express, which is designed for small and medium businesses and is the most affordable entry point for PureFlex System.
 IBM PureFlex System Standard, which is optimized for application servers with supporting storage and networking, and is designed to support your key ISV solutions.
 IBM PureFlex System Enterprise, which is optimized for transactional and database systems. It has built-in redundancy for highly reliable and resilient operation to support your most critical workloads.

Figure 1: Front and rear view of the IBM PureFlex System Enterprise Chassis

IBM offers the easy-to-manage PureFlex System with the IBM Flex System V7000 Storage Node to tackle the most complex environments.
IBM Flex System V7000 Storage Node

IBM Flex System V7000 Storage Node is an integrated piece of the PureFlex System and is designed to be easy to use and to enable rapid deployment. The Flex V7000 storage node supports extraordinary performance and flexibility through built-in solid-state drive (SSD) optimization and thin-provisioning technologies. With non-disruptive migration of data from existing storage, you also get simplified implementation, minimizing disruption to users. In addition, advanced storage features such as automated tiering, storage virtualization, clustering, replication, and multi-protocol support are designed to help you improve the efficiency of your storage. As part of your Flex or PureFlex System, the Flex V7000 can become part of your highly efficient, highly capable, next-generation information infrastructure.

Highlights
 A single user interface to manage and virtualize internal and third-party storage, which can improve storage utilization
 Built-in tiering and advanced replication functions designed to improve performance and availability without constant administration
 A single user interface that simplifies storage administration, allowing your experts to focus on innovation
Figure 2: IBM Flex System V7000 Storage Node

Flex V7000 system details

Flex V7000 enclosures support up to twenty-four 2.5-inch drives or up to twelve 3.5-inch drives. Control enclosures contain drives, redundant dual-active intelligent controllers, and dual power supplies, batteries, and cooling components. Expansion enclosures contain drives, switches, power supplies, and cooling components. You can attach up to nine expansion enclosures to a control enclosure, supporting up to 240 drives. Key system characteristics are:
 Internal storage capacity: up to 36 TB of physical storage per enclosure
 Disk drives: SAS disk drives, near-line SAS disk drives, and solid-state drives can be mixed in an enclosure to give you extraordinary flexibility
 Cache memory: 16 GB of cache memory (8 GB per controller) as a base feature, designed to improve performance and availability
 Ports per control enclosure: eight 8 Gbps Fibre Channel host ports, four 1 Gbps and optionally four 10 Gbps iSCSI host ports
 Ports per File Module: two 1 Gbps and two 10 Gbps Ethernet ports for server attachment and management, and two 8 Gbps FC ports for attachment to Flex V7000 control enclosures

IBM Flex System x240 Compute Node

IBM Flex System x240 Compute Node, an element of the IBM PureFlex System, provides outstanding performance for your mission-critical applications. Its energy-efficient design supports up to 16 processor cores and 768 GB of memory capacity in a package that is easy to service and manage. With outstanding computing power per watt and the latest Intel® Xeon® processors, you can reduce costs while maintaining speed and availability.
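The enclosure limits above imply a simple maximum drive count per Flex V7000 string. The back-of-the-envelope sketch below uses only the figures stated in the system details and is illustrative, not an official sizing tool:

```python
# Capacity check for a Flex V7000 string, using the figures from the
# system details above (illustrative sketch only).

DRIVES_PER_ENCLOSURE = 24        # 2.5-inch small-form-factor bays
MAX_EXPANSION_ENCLOSURES = 9     # expansion enclosures per control enclosure

total_enclosures = 1 + MAX_EXPANSION_ENCLOSURES   # control + expansions
max_drives = total_enclosures * DRIVES_PER_ENCLOSURE
print(max_drives)  # 240 drives, matching the stated maximum
```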
Highlights
 Optimized for virtualization, performance, and highly scalable networking
 Embedded IBM Virtual Fabric allows I/O flexibility
 Designed for simplified deployment and management

To meet today's complex and ever-changing business demands, the x240 compute node is optimized for virtualization, performance, and highly scalable I/O, and is designed to run a wide variety of workloads. The Flex System x240 is available on either your PureFlex System or IBM Flex System solution.

Figure 3: IBM Flex System x240 Compute Node

For more information
For more information about IBM PureFlex System, visit the following URL:

Test Configuration

The test configuration described in this section is designed to demonstrate server performance for a preconfigured Exchange 2010 user population running a specific workload. It is not designed as an Exchange "solution" because it does not include high-availability features. If a production environment were implemented as described in the Test Configuration portion of this document, the server itself would be a single point of failure. For a valid Exchange solution based on the tests performed and illustrated in this section, see the Reference Architecture section.

To demonstrate the performance characteristics of the x240 compute node and the Flex V7000 storage node, the test configuration is designed to support 9,000 Exchange mailboxes. The Client Access Server (CAS) and Hub Transport Server (HUB) roles are deployed in a 1:1 ratio with the mailbox server in a multi-role assignment (that is, the CAS role and the Transport role are installed on the same physical server, with one CAS/HUB server per mailbox server). For testing purposes, the CAS/HUB server is deployed virtually on physical hardware separate from the mailbox server. This test configuration uses two domain controllers in a single Active Directory forest.

Exchange Load Generator 2010 (LoadGen) is used to generate user load for the server performance evaluation testing (see details below). Three LoadGen clients are required to generate sufficient load to simulate 9,000 mailboxes. Exchange Server Jetstress 2010 is used to run stress tests against the Flex V7000 storage node.

I/O Connectivity

Figure 4 shows the internal I/O links between the compute nodes in the Flex System Enterprise Chassis and the four I/O modules in the rear of the chassis. Each of these individual I/O links can be wired for 1 Gb or 10 Gb Ethernet, or 8 or 16 Gbps Fibre Channel. You can enable any number of these links. The application-specific integrated circuit (ASIC) type on the I/O expansion adapter dictates the number of links that can be enabled. Some ASICs are two-port and some are four-port. For a two-port ASIC, one port can go to one switch and one port to the other, providing high availability.
Figure 4: Internal connections to the I/O modules

Figure 5 illustrates the 8 Gbps Fibre Channel internal connections between the x240 compute node and the Flex V7000 storage node. A single dual-port HBA provides the 8 Gbps internal connections to the storage.

Figure 5: Internal 8 Gbps connections between the server and storage
Figure 6 illustrates the test environment's design and mail flow. Traffic originates from the LoadGen client. The network switch routes traffic to the ADX, which assigns the traffic to the CAS server (in a production environment, the ADX would route traffic to a load-balanced CAS server that is part of a CAS array). Traffic routes back through the network switch to the CAS server. The CAS server then routes the message to the mailbox server.

Figure 6: Mail flow

Server Configuration

The x240 compute node is equipped with two Intel Xeon E5-2670 2.6 GHz 8-core processors and 192 GB of memory. Hyperthreading is enabled.

Storage Configuration

The underlying storage design consists of multiple hard disk types, combined into logical groups called MDisks, which are then used to create storage pools. The storage pools are then divided into volumes, which are assigned to host systems.

An MDisk (managed disk) is a component of a storage pool that is comprised of a group of identical hard disks that are part of a RAID array of internal storage. Figure 7 lists the disks used to build the MDisks used in this test. MDisk0 is comprised of eight 300 GB 15k SAS hard drives. MDisk1 is comprised of two 400 GB SSDs. MDisk2 is comprised of eight 900 GB 10k SAS hard drives.

A storage pool is a collection of storage capacity that provides the capacity requirements for a volume. One or more MDisks make up a storage pool. A volume is a discrete unit of storage on disk, tape, or other data recording medium that supports some form of identifier and parameter list, such as a volume label or input/output control. By default, all volumes that you create are striped across all available MDisks in one storage pool.
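The MDisk-to-pool-to-volume hierarchy above can be sketched in a few lines. The drive counts and sizes come from the test configuration; the pool grouping in the example call is illustrative, not the actual configuration:

```python
# Minimal model of the MDisk -> storage pool -> volume hierarchy
# described above. Drive counts/sizes are from the test configuration;
# the example pool below is hypothetical.

mdisks = {
    "MDisk0": {"drives": 8, "size_gb": 300, "type": "SAS 15k"},
    "MDisk1": {"drives": 2, "size_gb": 400, "type": "SSD"},
    "MDisk2": {"drives": 8, "size_gb": 900, "type": "SAS 10k"},
}

def raw_capacity_gb(pool_mdisks):
    """Raw (pre-RAID) capacity of a storage pool built from the named MDisks."""
    return sum(mdisks[n]["drives"] * mdisks[n]["size_gb"] for n in pool_mdisks)

# e.g. a hypothetical pool combining the two SAS MDisks:
print(raw_capacity_gb(["MDisk0", "MDisk2"]))  # 9600 GB raw, before RAID overhead
```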
Figure 7: MDisks comprised of internal disks

Figure 8 illustrates the storage pools (and the MDisks that make up each particular pool) as well as the logical volumes created from each of the storage pools.
Figure 8: Storage pool design

The mailbox server supports 12 mailbox databases as well as 12 log files. The three volumes shown in Figure 8 are assigned to accommodate the mailbox databases, and three smaller volumes are assigned to accommodate the log files. Figure 9 illustrates the mailbox database and log distribution between the volumes.

Figure 9: Volumes assigned to the x240 compute node

Solution Validation Methodology

The testing required two phases: the storage performance evaluation phase and the server performance evaluation phase. Microsoft provides two tools to evaluate these aspects of an Exchange environment: Exchange Server Jetstress 2010 (Jetstress) for testing the performance of the storage system, and Microsoft Exchange Load Generator (LoadGen) for testing server performance.

Storage Validation

Storage performance is critical in any type of Exchange deployment. A poorly performing storage subsystem will result in high transaction latency, which will affect the end-user experience. It is important to correctly validate storage sizing and configuration when deploying Exchange in any real-world scenario.

To facilitate the validation of Exchange storage sizing and configuration, Microsoft provides a utility called Exchange Server Jetstress 2010. Jetstress simulates an Exchange I/O workload at the database level by interacting directly with the Extensible Storage Engine (ESE). The ESE is the database technology that Exchange uses to store messaging data on the mailbox server role. Jetstress can simulate a target profile of user count and per-user IOPS, and validate that the storage subsystem is capable of maintaining an acceptable level of performance with the target profile. Test duration is adjustable and can be set to an extended period of time to validate storage subsystem reliability.
Testing storage systems with Jetstress focuses primarily on database read latency, log write latency, processor time, and the number of transition pages repurposed per second (an indicator of ESE cache pressure). The Jetstress utility returns a Pass or Fail grade, which depends on the storage performance.

Test Execution and Data Collection

To ensure that the storage can handle the load generated by 9,000 users, Jetstress is installed and run on the mailbox server. The Jetstress instance running on the mailbox server simulates load for 9,000 mailboxes. Although Jetstress is installed and run from the mailbox server, it does not test the performance characteristics of the server itself; Jetstress is designed to test the performance characteristics of the storage system only. After the test completes, Jetstress generates a report with a Pass or Fail grade for the test.

Server Validation

For validating server performance, Microsoft provides a utility called Exchange Load Generator. LoadGen is a pre-deployment validation and stress-testing tool that introduces various types of workloads into a test (non-production) Exchange messaging system. LoadGen simulates the delivery of multiple MAPI client messaging requests to an Exchange mailbox server. To simulate the delivery of these messaging requests, LoadGen is installed and run from client computers that have network connectivity to the Exchange test environment. These tests send multiple messaging requests to the Exchange mailbox server, which causes a mail-based performance load on the environment.
After the tests are complete, you can use the results to assist with:
 Verifying the overall deployment plan
 Identifying bottlenecks on the server
 Validating Exchange settings and server configurations

LoadGen Profile

The LoadGen profile in the table below was used to validate the test environment.

LoadGen Configuration            Value
Messages Sent/Received Per Day   100
Average Message Size             75 KB
Mailbox Size                     250 MB

Testing for Peak Load

When validating your server design, it is important to test the solution under the anticipated peak workload rather than the average workload. Most companies experience peak workload in the morning when employees arrive and check e-mail; the workload then tapers off throughout the remainder of the day. Based on a number of data sets from Microsoft IT and other clients, peak load is generally equal to two times the average workload over the remainder of the work day.

LoadGen uses a task profile that defines the number of times each task will occur for an average user within a simulated day. The total number of tasks that need to run during a simulated day is calculated as the number of users multiplied by the sum of task counts in the configured task profile. LoadGen then determines the rate at which it should run tasks for the configured set of users by dividing the total number of tasks to run in the simulated day by the simulated-day length. For example, if LoadGen needs to run 1,000,000 tasks in a simulated day, and a simulated day is equal to 8 hours (28,800 seconds), LoadGen must run 1,000,000 ÷ 28,800 = 34.72 tasks per second to meet the required workload definition. By default, LoadGen spreads the tasks evenly throughout the simulated work day.

To ensure that the Exchange solution is capable of sustaining the workload generated during the peak average, modify the LoadGen settings to generate a constant amount of load at the peak-average level, rather than spreading the workload over the entire simulated work day.
To increase the amount of load to the desired peak average, divide the default simulated day length (8 hours) by the peak-to-average ratio (2) and use this as the new simulated-day length.
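The arithmetic above can be made concrete. This sketch mirrors the example figures from the text (1,000,000 tasks, an 8-hour default simulated day, and a peak-to-average ratio of 2):

```python
# Worked version of the LoadGen rate arithmetic described above
# (illustrative; the task total and day length mirror the text's example).

total_tasks = 1_000_000
day_seconds = 8 * 3600            # default 8-hour simulated day
peak_to_average = 2               # peak load is roughly 2x the average

average_rate = total_tasks / day_seconds
peak_day_seconds = day_seconds // peak_to_average   # shortened 4-hour day

print(round(average_rate, 2))                     # 34.72 tasks/sec at average
print(round(total_tasks / peak_day_seconds, 2))   # 69.44 tasks/sec at peak
```

Halving the simulated-day length doubles the task rate, which is exactly how the configuration change below drives the system at the peak-average level.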
To change the simulated-day length, modify the following section of the LoadGenConfig.xml file to reflect a 4-hour simulated day:

<SimulatedDayLengthDuration>P0Y0M0DT8H0M0S</SimulatedDayLengthDuration>

Change to:

<SimulatedDayLengthDuration>P0Y0M0DT4H0M0S</SimulatedDayLengthDuration>

Test Execution and Data Collection

Testing large Exchange environments requires a staged start of the LoadGen clients to prevent RPC requests from queuing up. When the RPC requests in queue exceed 1.5 times the number of simulated users, LoadGen initiates test shutdown. Therefore, a 4-hour ramp-up time is required to stage the LoadGen client startup, resulting in a 16-hour test duration. Performance data is collected for the full 16-hour duration of all five test runs. Performance summary data is then taken from an 8-hour stable period. Figure 10 shows the collected performance data for the entire test with the stable period highlighted in gray.

Figure 10: Sample Perfmon data showing the highlighted stable period

Content Indexing

Content indexing was disabled during the test period because of the added time between test runs required to completely index the databases. A production environment with content indexing enabled could experience an additional load of up to 20%.

Test Validation Criteria

To verify server performance, Microsoft Performance Monitor (Perfmon) was used to record the data points described in this section.

Mailbox Server

The following table lists the performance counters collected on the mailbox server as well as the target values.
Perfmon Counter                                               Target Value
MSExchangeIS Mailbox(_Total)\Messages Queued For Submission   < 50 at all times
MSExchangeIS Mailbox(_Total)\Messages Delivered/sec           Scales linearly with number of mailboxes
MSExchangeIS\RPC Requests                                     < 70 at all times
MSExchangeIS\RPC Averaged Latency                             Average < 10 msec
Processor(_Total)\% Processor Time                            < 80%

MSExchangeIS Mailbox(_Total)\Messages Queued For Submission

The messages queued for submission counter shows the current number of submitted messages not yet processed by the transport layer. This value should be below 50 at all times.

MSExchangeIS Mailbox(_Total)\Messages Delivered/sec

The messages delivered per second counter shows the rate at which messages are delivered to all recipients, indicating the current message delivery rate to the store. This is not so much a performance metric of the mailbox server as an indication that the LoadGen servers are generating the appropriate level of load. This number should scale in parallel with the number of simulated mailboxes.

MSExchangeIS\RPC Requests

The number of RPC requests indicates the overall RPC requests currently executing within the information store process. This should be below 70 at all times.

MSExchangeIS\RPC Averaged Latency

RPC averaged latency indicates the RPC latency in milliseconds (msec), averaged over the operations in the last 1,024 packets. This should not be higher than 10 msec on average.
For information about how clients are affected when overall server RPC averaged latencies increase, visit the following URL:

Processor(_Total)\% Processor Time

For physical Exchange deployments, use the Processor(_Total)\% Processor Time counter and verify that it is less than 80 percent on average.

For more information

For more information about Exchange 2010 performance monitoring, visit the following URL:

Validation Results

This section lists the results of the Jetstress and LoadGen tests run against the Flex V7000 storage node and the x240 compute node.

Storage Results

The storage passed rigorous testing to establish a baseline that would conclusively isolate and identify any potential bottleneck as a valid server performance-related issue. Figure 11 shows a report generated by Jetstress. The 9,000-mailbox test passed with low latencies.

Figure 11: Jetstress results

Figure 12 shows the data points collected during the Jetstress test run. The second column shows the averaged latency for I/O database reads on all databases. To pass, this number must be below 20 msec. The highest result for this test is 17.451 msec, which is somewhat of an outlier; the remaining database instances performed well below the 17.451 msec response time of instance 3. The third column shows the averaged latency for I/O log writes on all databases. To pass, this number must be below 10 msec. The highest result for this test is 0.647 msec.
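The pass criteria just described reduce to two threshold checks. This is a hedged sketch of that logic, not the Jetstress tool itself; the example values are the worst cases reported above:

```python
# Sketch of the Jetstress pass criteria described above: database read
# latency must stay below 20 msec and log write latency below 10 msec.
# Illustrative only -- the real Jetstress report checks more metrics.

DB_READ_LIMIT_MS = 20.0
LOG_WRITE_LIMIT_MS = 10.0

def jetstress_passes(read_latencies_ms, write_latencies_ms):
    """True when every database read and log write latency is within limits."""
    return (max(read_latencies_ms) < DB_READ_LIMIT_MS
            and max(write_latencies_ms) < LOG_WRITE_LIMIT_MS)

# Worst observed values in this test run:
print(jetstress_passes([17.451], [0.647]))  # True -> the test passed
```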
Figure 12: Transactional I/O performance

The Jetstress test results show that at 9,000 mailboxes, the storage performs exceedingly well and has headroom remaining for additional I/O.

Server Results

This section describes the performance results for the x240 compute node hosting 9,000 mailbox users. Figure 13 below shows the test results for the x240 compute node. The first column lists the performance counters and the expected target values. The second column lists the average recorded value for each counter. The third column lists the maximum recorded value for each counter. The test values of greatest interest are shown in boldface italics.

Figure 13: Test results for the x240 compute node

The x240 handled the load exceedingly well. The maximum Messages Queued value remains well below the recommended maximum of 50. The maximum RPC Requests value is also well below the recommended maximum of 70. The RPC Averaged Latency is 1.48 msec, well below the recommended maximum average of 10 msec. The last row in Figure 13 shows the processor load on the x240 compute node. Even under peak load, the processor does not exceed 24% utilization. The results of this test demonstrate that the x240 compute node is quite capable of handling this Exchange 2010 workload on this particular hardware configuration.
Reference Architecture

This section describes a highly available Exchange 2010 reference architecture that is based on the test results above.

Customer Profile

The example used for this reference architecture is a fictitious organization with 30,000 employees. The employee population is split evenly between two regions. Each region has a datacenter capable of housing server and storage hardware. The company has determined that the average number of e-mails sent and received per day for each user is approximately 100, with an average e-mail size of 75 KB. Each user will be assigned a 500 MB mailbox.

High availability

If an organization has multiple datacenters, the Exchange infrastructure can be deployed in one site or distributed across two or more sites. Typically, the service level agreement currently in place will determine the degree of high availability and the placement of the Exchange infrastructure. In this example, the organization has two datacenters with a user population that is evenly distributed between the two. The organization has determined that site resiliency is required; therefore, the Exchange 2010 design is based on a multiple-site deployment with site resiliency.

Backups

Exchange 2010 includes several new features that provide native data protection and that, when implemented correctly, can eliminate the need for traditional backups. Traditionally, backups are used for disaster recovery, recovery of accidentally deleted items, long-term data storage, and point-in-time database recovery. Each of these scenarios is addressed by new features in Exchange 2010 such as high availability database copies in a database availability group, recoverable items folders, archiving, multiple-mailbox search, message retention, and lagged database copies.
In this example, the organization has decided to forgo traditional backups in favor of an Exchange 2010 native data protection strategy.

Number of database copies

Before determining the number of database copies needed, it is important to understand the two types of database copies.

High availability database copy – This type of database copy has a log replay time of zero seconds. When a change is made in the active database copy, the change is immediately replicated to the passive database copies.

Lagged database copy – This type of database copy has a pre-configured delay built into the log replay time. When a change is implemented in the active database copy, the logs are copied over to the server hosting the lagged database copy but are not immediately replayed. This provides point-in-time protection, which can be used to recover from logical corruption of a database (logical corruption occurs when data has been added, deleted, or manipulated in a way the user or administrator did not expect).

Log replay time for lagged database copies

IBM recommends using a replay lag time of 72 hours. This gives administrators time to detect logical corruption that occurred at the start of a weekend.

Another factor to consider when choosing the number of database copies is serviceability of the hardware. If only one high availability database copy is present at each site, the administrator is required to switch over to a database copy hosted at a secondary datacenter every time a server needs to be powered off for servicing. To prevent this, maintaining a second database copy at the same geographic location as the active database copy is a valid option to maintain hardware serviceability and to reduce administrative overhead. Microsoft recommends having a minimum of three high availability database copies before removing traditional forms of backup.
Because our example organization chose to forgo traditional forms of backup, they require at least three copies of each database.
In addition to the three high availability database copies, the organization has chosen to add a fourth, lagged database copy to protect against logical corruption.

Database availability groups

With Exchange 2010, the former data protection methods in Exchange 2007 (Local Continuous Replication, Single Copy Clusters, Cluster Continuous Replication, and Standby Continuous Replication) have evolved into Database Availability Groups (DAGs). The DAG is the new building block for highly available and/or disaster-recoverable solutions. A DAG is a group of up to 16 mailbox servers that host a set of replicated databases and provide automatic database-level recovery from failures that affect individual servers or databases.

Microsoft recommends minimizing the number of DAGs deployed for administrative simplicity. However, in certain circumstances multiple DAGs are required:
 You deploy more than 16 mailbox servers
 You have active mailbox users in multiple sites (active/active site configuration)
 You require separate DAG-level administrative boundaries
 You have mailbox servers in separate domains (a DAG is domain bound)

In our example, the organization is deploying an active/active site configuration; therefore, at least two DAGs are required.

Mailbox Servers and Database Distribution

Given the decisions above, we can determine the number of mailbox servers and the mailbox database distribution. The organization needs at least four servers to support the three high availability database copies and one lagged copy (a server can host both lagged database copies and high availability copies simultaneously). Figure 14 below illustrates the mailbox database distribution amongst the required physical servers for one of the two DAGs.
The second DAG will be the mirror image of the first, with database copies one and two at Site B, and the third copy and the lagged copy at Site A.

Figure 14: Database distribution amongst servers (per DAG)

This design enables the organization to withstand up to two server failures without loss of data. For example, if Server 2 fails, the passive copies (number 2) for each database hosted by Server 2 will activate on Server 1. If Server 1 then fails, the third database copy hosted at the secondary site could be activated.

With two servers at each site hosting active mailboxes (15,000 users per site), the entire population of 30,000 users is divided equally amongst the four servers (two servers per DAG), resulting in 7,500 users per server at normal run time (no failed servers). The test results above have shown that the x240 compute node consumes roughly 24% of its processing power to handle the workload generated by 9,000 users. With a single server failure, one server would be required to handle the workload generated by 15,000 users. The additional 6,000 users beyond the tested 9,000-user workload are well within the remaining processing capacity of the x240 compute node.

Client Access Servers and Transport Servers
To ease deployments, the Client Access Server (CAS) role and the Hub Transport Server (HUB) role are often installed together on a single physical server, separate from the mailbox servers. The CAS/HUB servers are then deployed in a 1:1 ratio with mailbox servers (i.e., one CAS/HUB server per mailbox server). For example, this organization requires four mailbox servers per DAG for a total of eight mailbox servers. Therefore, eight additional servers, installed with both the CAS role and the HUB role, are required to handle the workload generated by 30,000 users.

Storage Sizing

For sizing the storage required by Exchange 2010, Microsoft has created the Mailbox Server Role Requirements Calculator, which is available for download, along with usage information, from Microsoft's website. To correctly estimate the number of disks required, and to align with the testing performed above, a few variables are configured in the calculator:

 Disk Type – 900 GB 10k 2.5” SAS
 RAID 1/0 Parity Grouping – 4+4 (to more closely simulate the 8-disk MDisk groups in a RAID10 configuration)
 Override RAID configuration – Yes; configured for RAID10 on the database and log LUNs

After customization is complete, the calculator determines 216 disks are required at each site to host the mailbox databases and logs and to provide a restore LUN for each server.

The Final Configuration

Figure 15 summarizes the end result of the sizing efforts. The two sites are labeled Site A and Site B. Each site has 15,000 local Exchange users. Two DAGs span the sites and are labeled DAG-1 and DAG-2 in the diagram. Site A is the primary site for DAG-1 and Site B is the primary site for DAG-2.

Primary sites

Primary site refers to the active copies of the mailbox databases being geographically co-located with the 15,000 Exchange users of that site.
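The 216-disk figure from the Storage Sizing section above can be cross-checked with simple capacity arithmetic. The decimal GB-to-TB conversion is an assumption for illustration; the calculator itself also accounts for IOPS, log space, and restore LUNs, which this sketch ignores.

```python
DISKS_PER_SITE = 216
DISK_CAPACITY_GB = 900    # 900 GB 10k 2.5" SAS drives, as configured above
RAID10_FACTOR = 0.5       # RAID 1/0 mirrors every drive, halving usable space

# Raw and usable capacity per site, in decimal TB (an assumption).
raw_tb = DISKS_PER_SITE * DISK_CAPACITY_GB / 1000
usable_tb = raw_tb * RAID10_FACTOR
print(raw_tb, usable_tb)  # 194.4 97.2
```

Roughly 97 TB of usable mirrored capacity per site is available for the databases, logs, and restore LUNs.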
Two network connections are required for the mailbox servers: one network for MAPI traffic and one network for database replication. The networks are labeled MAPI and Replication in the diagram.

Each of the DAGs has four mailbox servers. At the primary site (where the users are co-located with their Exchange mailboxes), two mailbox servers host the active database copy (Copy-1 in the diagram) and the first passive copy (Copy-2 in the diagram). At the secondary site, two mailbox servers host the third passive copy (Copy-3 in the diagram) and the lagged database copy (Lag Copy in the diagram).

The second DAG is a mirror of the first: the two mailbox servers hosting the active copy of the database and the first passive copy are located at Site B, while the third and fourth servers, hosting the third passive copy and the lagged database copy, are located at Site A.

Figure 15: The final configuration
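The failover arithmetic behind this layout, described earlier in the Mailbox Servers and Database Distribution section, can be sanity-checked in a few lines. The linear scaling of CPU utilization from the single measured data point is an assumption, but it bounds the expected load after a server failure.

```python
USERS_TOTAL = 30_000
ACTIVE_SERVERS = 4       # two servers per DAG host active mailboxes at normal runtime
TESTED_USERS = 9_000     # workload validated on a single x240 in the tests above
TESTED_CPU_UTIL = 0.24   # measured processor utilization at that workload

users_per_server = USERS_TOTAL // ACTIVE_SERVERS   # normal runtime load
failover_users = users_per_server * 2              # one server absorbs its peer's users

# Linear extrapolation from the measured point (an assumption):
projected_util = TESTED_CPU_UTIL * failover_users / TESTED_USERS
print(users_per_server, failover_users, round(projected_util, 2))  # 7500 15000 0.4
```

Even under the assumed linear scaling, a server carrying 15,000 users would sit near 40% CPU utilization, consistent with the paper's claim that the failover load is well within the x240's remaining capacity.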
In addition to the mailbox servers, each DAG has four CAS/HUB servers: two at the DAG’s primary site and two at the secondary site. Each site has a global catalog server to provide redundancy at the domain controller level.

Hardware Required

With redundant SAN switches, network switches, and power modules, the IBM Flex System Enterprise Chassis provides the high availability and fault tolerance necessary for an enterprise-class Exchange environment.

Take advantage of redundant power supplies

IBM recommends that multiple circuits be used to power the Flex System Enterprise Chassis so that, in the case of a tripped breaker, the chassis does not become a single point of failure.

Figure 16 illustrates the hardware required at each site to support the organization’s 30,000-user population. Each site requires four x240 compute nodes to host the mailbox role, four x240 compute nodes to host the CAS/HUB role, and, if not already present, an additional x240 compute node for a global catalog server. Each mailbox server should have at least 128 GB of memory installed, and each CAS/HUB server should have at least 32 GB. Each x240 compute node should have two Intel Xeon E5-2670 2.6 GHz 8-core processors.

To host the database files, each site’s Flex System Enterprise Chassis will require the Flex System V7000 storage node fully populated with 900 GB 10K SAS hard disk drives. In addition, eight fully populated (with the same drive type) V7000 or Flex V7000 expansion drawers are also required. Finally, IBM recommends using a hardware-based (rather than software-based) network load balancer, such as the Brocade ADX 1000 series shown in the figure below.

Figure 16: Hardware required at each site

Conclusion

The x240 compute node and Flex V7000 storage node performed well throughout the test durations.
These tests demonstrate the capability of the x240 and Flex V7000 to support 9,000 Exchange 2010 mailboxes on a single x240 compute node. Although the tests are not a true-to-life deployment scenario, the results can be used to build highly available Exchange architectures, as shown in the Reference Architecture section. With high availability and fault tolerance built into the platform, IBM Flex System is a solid foundation for an enterprise Exchange environment.
About the author

Roland G. Mueller works at the IBM Center for Microsoft Technologies in Kirkland, Washington, just five miles from the Microsoft main campus. He has a second office in Building 35 on the Microsoft main campus in Redmond, Washington, to facilitate close collaboration with Microsoft. Roland has been an IBM employee since 2002 and has specialized in a number of different technologies, including virtualization, bare-metal server deployment, and Exchange Server infrastructure sizing, design, and performance testing.
© Copyright IBM Corporation 2013

IBM Systems and Technology Group
Dept. U2SA
3039 Cornwallis Road
Research Triangle Park, NC 27709

Produced in the United States of America
April 2013
All Rights Reserved

IBM, the IBM logo and are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at

Other company, product, and service names may be trademarks or service marks of others.

References in this publication to IBM products and services do not imply that IBM intends to make them available in all countries in which IBM operates.